So, deep learning models are great at learning complex, non-linear patterns and seem to handle noise just fine. But under the hood they rely on IEEE 754 floating-point numbers, which can lose precision (e.g., rounding errors). Meanwhile, cryptography, such as elliptic-curve crypto, also deals with theoretically smooth, non-linear curves, but instead of floating point it uses modular arithmetic over finite fields to keep everything exact and robust.

Here’s the thing: crypto folks turn continuous math into discrete, finite fields to avoid floating-point messiness. But deep learning avoids modular arithmetic because it hates abrupt jumps—like how uint8 wraps 255 → 0. That discontinuity wrecks gradients during training.

What if we reimagined uint8 as a cyclic space, like a gear with 256 teeth? For example, the distance between 254 and 1 isn’t 253 (linear subtraction) but just 3 steps if you "rotate" through 255 → 0 → 1. It’s like choosing to rotate clockwise instead of counterclockwise. Could gradients work this way? Instead of "add/subtract," maybe gradients tell us how much to "rotate" the weights in this cyclic space. So the direction of the gradient wouldn’t be positive or negative; it would be clockwise or counterclockwise.
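
To make the idea concrete, here is a minimal sketch of what I mean by a signed rotation instead of a signed difference (cyclic_delta is a hypothetical helper, not something from an existing framework):

```python
def cyclic_delta(a, b, modulus=256):
    """Signed shortest rotation from a to b on a cycle with `modulus` steps.
    Positive means "clockwise" (increasing values), negative means the other way."""
    d = (b - a) % modulus
    return d if d <= modulus // 2 else d - modulus

print(cyclic_delta(254, 1))  # 3   (254 -> 255 -> 0 -> 1)
print(cyclic_delta(1, 254))  # -3  (rotate the other way)
```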

I’m not talking about security here, or suggesting deep learning needs to be secure (even if that would be a bonus). Just the engineering. Why stick to floating point when modular arithmetic avoids precision loss? Couldn’t cyclic gradients sidestep exploding/vanishing issues? Or is there a deeper reason deep learning avoids this?

Example: If a gradient says rotate 3 steps clockwise instead of subtract 3, wouldn’t that handle modular wraps more gracefully? Or does this break how neural networks update weights?
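
And a rough sketch of how a weight update could consume such a rotation (rotate_weight is again purely hypothetical):

```python
def rotate_weight(w, signed_steps, modulus=256):
    """Apply a gradient expressed as a signed rotation instead of an ordinary
    addition/subtraction; the result always stays inside [0, modulus)."""
    return (int(w) + int(signed_steps)) % modulus

print(rotate_weight(254, 3))  # 1   (wraps through 255 -> 0 -> 1)
print(rotate_weight(1, -3))   # 254
```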

So, the actual values we already know would have to be encoded into the cyclic space, and the literal value in that cyclic space would not directly represent the original absolute value.

For example, a continuous feature like temperature in Celsius would be encoded into the uint8 space, and that encoding would be treated as a hidden representation of the temperature.
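
For illustration, something like the following is what I have in mind; the mean of 30 °C, the standard deviation of 5 °C, and the 128/64 mapping are placeholder choices, not a fixed scheme:

```python
import numpy as np

def temperature_to_uint8(celsius, mean=30.0, std=5.0):
    """Map a temperature onto the 256-step cycle: the dataset mean lands on 128
    (the midpoint) and one standard deviation spans a quarter turn (64 steps).
    Values far from the mean wrap around the cycle."""
    steps = 128 + 64 * (celsius - mean) / std
    return np.uint8(int(round(steps)) % 256)

print(temperature_to_uint8(30.0))  # 128
print(temperature_to_uint8(35.0))  # 192
print(temperature_to_uint8(25.0))  # 64
```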


    1 Answer


    Cryptography benefits from the exactness of finite fields for reasons of security and exact discrete math, whereas DL depends on the continuous adjustments that floating-point arithmetic makes possible. For features that are inherently continuous, like temperature, encoding them into a cyclic space would only make sense if the underlying phenomenon were periodic, such as angles or time-of-day. Most continuous quantities do not naturally have a cyclic structure, so forcing them into a cyclic space distorts their true relationships. Instead, these features benefit from a linear, continuous representation that preserves their order and differences. For example, temperatures of 254°C and 1°C might be "close" by 3 steps in your forced cyclic angular space, but that closeness corresponds to nothing in reality, and a neural network would struggle to learn meaningful relationships from such an encoding. In addition, cyclic spaces create symmetric, repeating loss landscapes, so optimization could cycle endlessly or settle in suboptimal regions.
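
    To illustrate the distinction, here is a small sketch of the usual practice: a genuinely periodic feature such as hour-of-day gets a sin/cos encoding on the unit circle, while a non-periodic feature such as temperature is simply standardized (the mean and standard deviation below are placeholder values):

```python
import numpy as np

def encode_hour_of_day(hour):
    """Cyclic encoding for a genuinely periodic feature: 23:00 and 01:00
    end up close together on the unit circle."""
    angle = 2 * np.pi * hour / 24.0
    return np.sin(angle), np.cos(angle)

def encode_temperature(celsius, mean=30.0, std=5.0):
    """Linear standardization for a non-periodic feature: order and distances
    are preserved, so 1 °C and 254 °C stay far apart."""
    return (celsius - mean) / std
```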

    Neural networks rely on linear combinations (matrix operations) and smooth nonlinearities, and cyclic arithmetic would make these operations chaotic. Modern GPUs and other hardware accelerators are optimized for floating-point operations and deliver very high throughput for them, whereas modular arithmetic is not similarly optimized, making it impractical for large-scale deep learning. Efficient low-precision training, such as 8-bit integer training, uses quantization-aware training or post-training quantization without cyclic wraps precisely to preserve gradient flow. Although quantization represents weights and activations with as few as 8 bits, it still keeps gradients in floating point for stability during training.
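
    As a sketch of how low-precision training keeps gradients in floating point, here is a minimal fake-quantization op in PyTorch built on a straight-through estimator (the scale and zero point are arbitrary placeholder values, not a recommended configuration):

```python
import torch

class FakeQuantizeUint8(torch.autograd.Function):
    """QAT-style fake quantization: the forward pass rounds to one of 256
    levels and clamps (saturates, no wrap-around); the backward pass lets
    the floating-point gradient through unchanged."""
    SCALE, ZERO_POINT = 0.1, 128  # placeholder quantization parameters

    @staticmethod
    def forward(ctx, x):
        q = torch.round(x / FakeQuantizeUint8.SCALE) + FakeQuantizeUint8.ZERO_POINT
        q = torch.clamp(q, 0, 255)  # saturate instead of wrapping like uint8
        return (q - FakeQuantizeUint8.ZERO_POINT) * FakeQuantizeUint8.SCALE

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # straight-through estimator: gradient stays float

x = torch.randn(4, requires_grad=True)
FakeQuantizeUint8.apply(x).sum().backward()
print(x.grad)  # tensor([1., 1., 1., 1.]): the gradient never touches uint8
```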

    Finally, exploding/vanishing gradients arise from repeated multiplicative operations dictated by the network architecture and depth, not from floating-point precision. Cryptography prioritizes exactness and discreteness to enforce security, while DL prioritizes approximation and continuity to generalize from data that is already inherently noisy and imprecise.
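
    A toy illustration of that point: repeatedly multiplying activations by layers whose weights are scaled slightly below or above the stable regime makes the signal vanish or explode after a few dozen layers, regardless of the number format; backpropagated gradients suffer the same fate for the same multiplicative reason (the depth, width, and scales below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
depth, width = 50, 64
x = rng.standard_normal(width)

for scale in (0.5, 1.5):  # weight scale below / above the stable regime
    h = x.copy()
    for _ in range(depth):
        W = scale * rng.standard_normal((width, width)) / np.sqrt(width)
        h = W @ h
    print(scale, np.linalg.norm(h))  # tiny norm (vanishing) vs huge norm (exploding)
```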

    • "254°C and 1°C might be 'close' by 3 steps": that is exactly what I already said in my post. The uint8 space is intended to represent an angle embedding, not the actual absolute value. So temperature samples must be normalized first; e.g., with a dataset mean of 30 degrees Celsius, the mean is mapped to 128 as the midpoint, while values one standard deviation (5 degrees Celsius) away, such as 25 or 35 degrees, are mapped to 64 and 192 to represent orthogonality.
      Commented Mar 10 at 1:26
    • This can work well for naturally periodic data like angles; for non-cyclic quantities like temperature, however, this mapping would misleadingly imply that 255°C sits right next to 0°C, which distorts the true differences between temperature values, as mentioned in my answer above. This artificial wrap-around, inherited from the uint8 memory layout, creates incorrect proximity relationships and would likely hinder learning, because the enforced wrap-around adds an extra constraint on top of the usual Euclidean optimization.
      – cinch
      Commented Mar 10 at 1:47
    • Cryptographic algorithms rely on the exact algebraic properties of finite fields. The fact that numbers "wrap around" (e.g., in an 8-bit space, 255 + 1 becomes 0) is an inherent property of the field, and it is precisely what ensures the operations are closed and consistently repeatable, properties that are crucial for security proofs and the correctness of cryptographic schemes. There is no requirement to preserve a notion of magnitude or closeness, because the numbers are used as mere algebraic symbols within a mathematical structure.
      – cinch
      Commented Mar 10 at 1:57
