Implementing Cryptography in AI Systems

Interesting research: “How to Securely Implement Cryptography in Deep Neural Networks.”

Abstract: The wide adoption of deep neural networks (DNNs) raises the question of how can we equip them with a desired cryptographic functionality (e.g., to decrypt an encrypted input, to verify that this input is authorized, or to hide a secure watermark in the output). The problem is that cryptographic primitives are typically designed to run on digital computers that use Boolean gates to map sequences of bits to sequences of bits, whereas DNNs are a special type of analog computer that uses linear mappings and ReLUs to map vectors of real numbers to vectors of real numbers. This discrepancy between the discrete and continuous computational models raises the question of what is the best way to implement standard cryptographic primitives as DNNs, and whether DNN implementations of secure cryptosystems remain secure in the new setting, in which an attacker can ask the DNN to process a message whose “bits” are arbitrary real numbers.

In this paper we lay the foundations of this new theory, defining the meaning of correctness and security for implementations of cryptographic primitives as ReLU-based DNNs. We then show that the natural implementations of block ciphers as DNNs can be broken in linear time by using such nonstandard inputs. We tested our attack in the case of full round AES-128, and succeeded in finding randomly chosen keys. Finally, we develop a new method for implementing any desired cryptographic functionality as a standard ReLU-based DNN in a provably secure and correct way. Our protective technique has very low overhead (a constant number of additional layers and a linear number of additional neurons), and is completely practical.

Posted on February 21, 2025 at 10:33 AM

Comments

Clive Robinson February 21, 2025 9:28 PM

@ Bruce, ALL,

I have real problems with this view of DNNs expressed by the authors,

“The problem is that cryptographic primitives are typically designed to run on digital computers that use Boolean gates to map sequences of bits to sequences of bits, whereas DNNs are a special type of analog computer that uses linear mappings and ReLUs to map vectors of real numbers to vectors of real numbers.”

As far as I’m aware all current DNNs are fully digital, made of the same logic gates, and thus share the same failings as computer ALUs and the set of “countable numbers” (integers).

As for implementing “real numbers”: no, they are still the set of “countable numbers”, used to fake the other number sets such as the “real numbers”, and usually only very sparsely at best[1].

Whilst in theory you could use op-amps and similar components to form the integrators used in traditional analogue computers, any such network would fail well before you formed even a small neural network, due to 4kTBR thermal noise, distortion, and bias.

Further the “weights” in a DNN are based on the histogram of the input corpus mapped as a sparse multidimensional spectrum.

“This discrepancy between the discrete and continuous computational models…”

This is where the clouds in the air of the theory come crashing down to the ground under the lead boots of practical reality. As with an ALU, the basic data unit in a DNN is in reality a countable number. Worse, the vectors in the DNN form a very small subset of the countable numbers. In reality they are a discrete subset, not a continuous one.
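A quick check in any floating-point environment makes the point (my own throwaway example, nothing to do with the paper): the “reals” a network actually computes with are a finite, discrete set of values.

```python
# The "reals" a DNN actually computes with are a finite, discrete set of
# floating-point values, not the continuum.
import numpy as np

one = np.float32(1.0)
print(np.nextafter(one, np.float32(2.0)) - one)  # ~1.19e-07, the gap just above 1.0
print(np.finfo(np.float32).eps)                  # the same machine epsilon
# Exactly 2**23 float32 values lie in [1, 2), versus uncountably many reals.
print(0.1 + 0.2 == 0.3)                          # False even with 64-bit floats
```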

The authors go on to say,

“[Which] raises the question of what is the best way to implement standard cryptographic primitives as DNNs, and whether DNN implementations of secure cryptosystems remain secure in the new setting, in which an attacker can ask the DNN to process a message whose “bits” are arbitrary real numbers.”

In a number of respects this question has been asked and answered before many times over the past thousand years or so.

Consider LLMs by their inputs and outputs, not by whether they are bit representations from logic gates. The natural language input is based on a table of information we usually call a dictionary. The individual words within it can be placed on a line to form a very sparse series of spectrums, each of which represents one aspect of the natural language statistics.

The answer to this has always been “flatten the statistics” which basically involves two operations,

1, Remove the sparsity.
2, Remove symbol usage frequency disparity.

One simple but far from perfect way to do this by hand is via a “Straddling Checkerboard”,

https://en.m.wikipedia.org/wiki/Straddling_checkerboard

Which was used in the VIC hand cipher.
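For anyone who has not met it, here is a minimal sketch of the idea in Python (the row layout below is an arbitrary choice for illustration, not the actual VIC tables): high-frequency letters get one digit, the rest get two, so the resulting digit stream has flatter statistics than the letter stream it encodes.

```python
# Minimal straddling checkerboard sketch (illustrative layout, not the VIC tables).
TOP  = "AT ONE SIR"   # blanks at digits 2 and 6 become the row prefixes
ROW2 = "BCDFGHJKLM"   # two-digit codes 20..29
ROW6 = "PQUVWXYZ./"   # two-digit codes 60..69

ENCODE = {}
for d, ch in enumerate(TOP):
    if ch != " ":
        ENCODE[ch] = str(d)
for d, ch in enumerate(ROW2):
    ENCODE[ch] = "2" + str(d)
for d, ch in enumerate(ROW6):
    ENCODE[ch] = "6" + str(d)

DECODE = {v: k for k, v in ENCODE.items()}

def checkerboard_encode(text):
    return "".join(ENCODE[c] for c in text.upper() if c in ENCODE)

def checkerboard_decode(digits):
    out, i = [], 0
    while i < len(digits):
        if digits[i] in "26":             # straddling: two-digit code
            out.append(DECODE[digits[i:i + 2]])
            i += 2
        else:                             # single-digit code
            out.append(DECODE[digits[i]])
            i += 1
    return "".join(out)

enc = checkerboard_encode("ATTACK AT DAWN")
print(enc)                        # digit stream with flatter statistics
print(checkerboard_decode(enc))   # ATTACKATDAWN
```

In the VIC cipher that digit stream was then super-enciphered; the checkerboard on its own only does the flattening step.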

Thus the question arises as to where and how to do this, because LLMs are all about very sparse vector data sets and multiple layers of sparse spectrums.

Arguably, if there is reduced or no sparsity, the redundancy would be diminished accordingly, and the desired potential for “information capacity” would likewise diminish. Certainly the “stochastic nature” of any LLM would likewise be reduced.

But more importantly, for the output of an LLM to appear “natural” it depends very much on the shape of those multiple spectrums. Flatten the spectrum statistics and the “natural” illusion would quickly vanish.

So the answer to the question is an interesting one, because the required solution (compression and flattening) is the antithesis of the required function (faking normality by randomised token selection from a statistical curve).

Thus we find ourselves in,

“Rather more than ‘Somewhat of a quandary’.”

Oddly it’s a problem I’ve been thinking about for quite some years now and it started with,

“How to do both Voice compression and Encryption, and producing an output suitable for use with existing HF Radio systems.”

In theory it’s simple… In practice it’s really not, which is why, although I was hopeful for success, I could predict that JackPair’s attempt to encrypt voice in analogue form and send it over mobile phone audio channels through the GSM “standard” codec would fail (you can not reliably compress random data).

Which is why this longish paper limits itself.

So rather than read the “abstract”, start by reading the “Introduction”; it makes rather more sense, and you will find,

“In this paper, we study the central problem at the intersection of deep learning and cryptography, namely whether one can implement standard cryptographic primitives in deep neural networks while maintaining their security.”

Or a little further down,

“To clarify the research question, suppose that we want to implement the Advanced Encryption Standard (AES) block cipher as a DNN.”

In theory this is possible, as all you are doing is providing a pair of “reversible one to one mappings” between the input and the output, with as many pairs of mappings as there are keys in the “key space”.
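As a toy illustration of that framing (a throwaway 4-bit “cipher”, nothing to do with AES itself): each key value selects one invertible mapping of the block space, and decryption is simply the inverse mapping.

```python
# Toy keyed bijection: c = SBOX[p ^ k], decryption applies the inverse S-box
# then XORs the key back out. Purely illustrative, no security claimed.
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]   # an invertible 4-bit S-box
INV_SBOX = [SBOX.index(i) for i in range(16)]

def toy_encrypt(p, k):
    return SBOX[p ^ k]

def toy_decrypt(c, k):
    return INV_SBOX[c] ^ k

# For every key, encryption is a bijection on the 16 possible blocks,
# and decryption is its exact inverse.
for k in range(16):
    images = {toy_encrypt(p, k) for p in range(16)}
    assert len(images) == 16
    assert all(toy_decrypt(toy_encrypt(p, k), k) == p for p in range(16))
print("each of the 16 keys selects its own reversible mapping of the 16 blocks")
```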

But we also find,

“Finally, when we extend bits to qubits and consider quantum circuits which can manipulate such qubits…”

They have jumped from “classical computing” and the DNNs that actually exist to theory-only “Quantum Computing” and types of neural network that do not currently exist, and which have previously met with significant opposition when proposed as a construct to consider by the likes of Roger Penrose.

Anyway the paper is a bit of an uphill challenge for even those with a standard understanding of current AI DNNs.

[1] To see a consequence of this sparsity, and of the differences between the countable and real numbers, you just have to draw up a traditional “Times Table”.

You write 0..9 along the top and 0..9 down the side of a ten by ten grid.

You then fill in the multiplications but omit any duplicates. So 3×2 and 2×3 produce the same result of 6, but you only write it once, for whichever you do first. You will see that the results form a much smaller subset of 0..99. When placed on the 0..99 number line you clearly see a pattern of density at the low end and significant sparsity at the high end.

Interestingly, if you tabulate all the table’s multiplication results by their first digit you will see a result approximating the Newcomb–Benford Law.
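A few lines of Python make the exercise easy to repeat (my own throwaway script, counting each distinct product once as described above):

```python
from collections import Counter
from math import log10

# Distinct non-zero products of the 10x10 times table (duplicates omitted).
products = sorted({i * j for i in range(10) for j in range(10)} - {0})
print(len(products), "distinct non-zero products from 100 cells:", products)

# Tally the leading digits and compare with the Newcomb-Benford prediction.
# The fit is rough with only 36 values, but the decreasing shape is there.
leading = Counter(str(p)[0] for p in products)
for d in range(1, 10):
    share = leading[str(d)] / len(products)
    print(d, f"observed {share:.3f}", f"Benford {log10(1 + 1 / d):.3f}")
```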

But also consider that you would expect an LLM using word vectors to follow Zipf’s law.

Lozenge February 22, 2025 2:19 AM

The basic idea of the cryptanalysis is that continuously implementing the XOR function on the unit rectangle requires a function with a bend in it. By finding where the bend appears on the coordinate you control, and given knowledge of the function, you can infer the value of the other coordinate (the key). Once you know the key for one layer of XORing, you can compose it with its inverse and attack the next layer.
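Here is a minimal sketch of that idea (my own toy reconstruction, not the paper’s code): a single XOR-with-key gate built the natural way from one ReLU, and an attacker who sweeps the controllable input to locate the bend.

```python
# One XOR-with-key gate implemented the "natural" continuous way:
#     xor(x, k) = x + k - 2 * relu(x + k - 1)
# which agrees with XOR on the corners of the unit square. The key bit k is
# baked into the network; the attacker controls x and may feed arbitrary reals.

def relu(z):
    return max(0.0, z)

def xor_gate(x, k):
    """Continuous ReLU implementation of x XOR k (k is the hidden key bit)."""
    return x + k - 2.0 * relu(x + k - 1.0)

def recover_key_bit(gate, lo=-0.5, hi=1.5, steps=2000):
    """Locate the bend (slope change) of the piecewise-linear gate.
    The bend sits at x = 1 - k, so k = 1 - x_bend."""
    dx = (hi - lo) / steps
    xs = [lo + i * dx for i in range(steps + 1)]
    ys = [gate(x) for x in xs]
    slopes = [(ys[i + 1] - ys[i]) / dx for i in range(steps)]
    for i in range(1, steps):
        if abs(slopes[i] - slopes[i - 1]) > 1e-6:   # slope changes here
            return round(1.0 - xs[i])
    return None

for secret in (0, 1):
    guess = recover_key_bit(lambda x: xor_gate(x, secret))
    print(f"secret key bit = {secret}, recovered = {guess}")
```

Once a layer’s key bits are known you can, as described above, compose the layer with its inverse and move on to the next one.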

not important February 23, 2025 5:37 PM

Will quantum computers disrupt critical infrastructure?
https://www.bbc.com/news/articles/cpq9zxxn72qo

‘Today, some fear there is a new critical threat to the world’s digital infrastructure. But this time, we cannot predict exactly when it will move from theory to reality, while the ubiquity of digital technology means fixing the problem is even more complicated.

That’s because the arrival of quantum computing means that many of the encryption algorithms that underpin and secure our hyperconnected world will be trivially easy to crack.

Quantum computing is radically different to the “classical” computing used today. Instead of processing binary bits which exist in one of two states – one or zero, on or off – quantum computing uses qubits, which can exist in multiple states, or superpositions.

“The reason why it’s so powerful is because you’re doing all those possible computations simultaneously,” Prof Nishanth Sastry, director of research for computer science at the University of Surrey, explains. This means it’s “much, much more efficient, much, much more powerful.”

Today’s computers would take thousands, even millions of years, to crack current encryption standards, such as RSA. A suitably powerful quantum computer could, theoretically, do the job in minutes.’

More at the link if interested, and if the post is not deleted by the Moderator.

Clive Robinson February 24, 2025 12:08 AM

@ not important,

With regard to the BBC article you link to, the author gets quite a few things incorrect or exaggerated.

For instance,

“That’s because the arrival of quantum computing means that many of the encryption algorithms that underpin and secure our hyperconnected world will be trivially easy to crack.”

Firstly we’ve not got anywhere even close to a quantum computer, and it’s not unreasonable to think “not in my lifetime” or “the lifetime of my grandchildren”. Have a read of,

https://www.cs.auckland.ac.nz/~pgut001/pubs/bollocks.pdf

The author is fairly well respected for various good reasons, and he’s clearly not impressed with quantum cryptanalysis as a matter of practical engineering rather than as a consequence of theoretical ideas.

But getting back to the BBC article,

I find it funny that the picture the author uses for “critical infrastructure” is of “sewage sludge pits”… Not only are they literally full of “S41t”, they are probably not, nor ever likely to be, controlled by electronic systems that implement the type of cryptographic system affected by quantum cryptanalysis. And even if they were, they could be put back to traditional physical flow control by the turn of a switch.

As I’ve indicated before, the things you really need to worry about are,

1, Smart grid systems.
2, Implanted Medical Electronics.

Because they can kill you fairly quickly, and they are the logical endpoints for asymmetric cryptography that can not be easily upgraded, if at all (most implanted medical electronics’ control communications are in fact not encrypted at all currently, which is something you should think about before having them fitted for your heart etc).

But the author of the BBC article neglects to mention what type of encryption really will be affected by Quantum Computers, if they ever deliver (which is still very much in doubt).

It’s the part used for “key transfer” and it’s something that is in effect “logically impossible” to do with “Perfect Security”.

In fact less than a lifetime ago it did not exist as even a theoretical notion. The problem it tries to solve is the transfer of a “root of trust” between two parties “securely” in an “open communications channel” visible to any and all third parties.

The transfer of “roots of trust” has been done securely for a lot longer than that, by the simple process of using one or more “closed communications channel(s)” not visible to third parties.

So if Quantum Computers ever do become a threat, there are two ways to solve the issue of transferring “roots of trust”,

1, Return to the use of “Closed channels”.
2, Come up with new ways to use “Open channels”.

We already know how to do the first efficiently, quickly, and importantly effectively, but it has certain trust issues.

The second is at best “a work in progress” and may not get us where we want to go. In fact it may not be possible to do.

Behind the idea of asymmetric cryptography is the unproven notion of “One Way Functions with Trapdoors” (OWFTs).

We have no real proof that OWFs actually exist let alone OWFTs. In fact there is a degree of thought that says OWFTs are just a form of “backdoor”.

Gustavus Simmons showed that any system that had redundancy of some form could have a covert channel built using it. Shannon showed that to be able to communicate information you have to have redundancy in the communications channel (hence the reason they are sometimes called “Shannon Channels”). Together this means all communications channels can have either covert or overt side channels within them[1].

Further, there is work by Adam Young and Moti Yung that shows how to put “backdoors” into asymmetric cryptography, and putting them into symmetric “Stream Ciphers” is generally not difficult. You simply need sufficient redundancy to fit a Shannon channel, of sufficient bandwidth to leak the “root of trust”, inside the Shannon channel you are using as your communications channel. Symmetric block ciphers are somewhat harder, but it’s not impossible to do.
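As a toy illustration of what that means in practice (my own sketch, not Young and Yung’s actual construction): if a protocol carries a per-message random nonce, that nonce is redundancy the attacker can see, so a subverted sender can smuggle one bit of the long-term key into each nonce while every ciphertext stays perfectly valid.

```python
# Toy subliminal-channel sketch (illustrative only): the per-message nonce is
# redundant, attacker-visible data, so a subverted sender can hide one bit of
# the long-term key in the low bit of each nonce it "randomly" picks.
import os

SECRET_KEY = os.urandom(16)                        # the "root of trust" to leak
key_bits = [(b >> i) & 1 for b in SECRET_KEY for i in range(8)]

def subverted_nonce(i):
    """Pick a fresh random nonce, then force its low bit to key bit i."""
    n = bytearray(os.urandom(12))
    n[-1] = (n[-1] & 0xFE) | key_bits[i % len(key_bits)]
    return bytes(n)

# The attacker only ever sees the nonces that go over the wire:
seen = [subverted_nonce(i) for i in range(len(key_bits))]
bits = [n[-1] & 1 for n in seen]
leaked = bytes(sum(bit << j for j, bit in enumerate(bits[k * 8:(k + 1) * 8]))
               for k in range(16))
assert leaked == SECRET_KEY
print("key recovered from the nonces alone:", leaked.hex())
```

A real kleptographic construction hides the leak inside values that are supposed to look random anyway, which is exactly why the redundancy argument above is so hard to close off.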

The scary thing about current asymmetric encryption systems is just how much easily exploitable redundancy there is within them.

[1] Interestingly the logic of this argument comes out to,

For a channel to carry information it must have redundancy; where there is redundancy, a channel to carry information can be formed.

The consequence of this is,

1.1 All channels can have a backdoor channel within them.
1.2 All backdoored channels can have channels within them that are not backdoored.

With the consequence that if you create the innermost channel, then it can be backdoor free.

