Skip to content
tales from the digital vise

AI language models can exceed PNG and FLAC in lossless compression, says study

Is compression equivalent to general intelligence? DeepMind digs up more potential clues.

Benj Edwards | 95
Story text

Effective compression is about finding patterns to make data smaller without losing information. When an algorithm or model can accurately guess the next piece of data in a sequence, it shows it's good at spotting these patterns. This links the idea of making good guesses—which is what large language models like GPT-4 do very well—to achieving good compression.

In an arXiv research paper titled "Language Modeling Is Compression," researchers detail their discovery that the DeepMind large language model (LLM) called Chinchilla 70B can perform lossless compression on image patches from the ImageNet image database to 43.4 percent of their original size, beating the PNG algorithm, which compressed the same data to 58.5 percent. For audio, Chinchilla compressed samples from the LibriSpeech audio data set to just 16.4 percent of their raw size, outdoing FLAC compression at 30.3 percent.

In this case, lower numbers in the results mean more compression is taking place. And lossless compression means that no data is lost during the compression process. It stands in contrast to a lossy compression technique like JPEG, which sheds some data and reconstructs some of the data with approximations during the decoding process to significantly reduce file sizes.

The study's results suggest that even though Chinchilla 70B was mainly trained to deal with text, it's surprisingly effective at compressing other types of data as well, often better than algorithms specifically designed for those tasks. This opens the door for thinking about machine learning models as not just tools for text prediction and writing but also as effective ways to shrink the size of various types of data.

A chart of compression test results provided by DeepMind researchers in their paper.
A chart of compression test results provided by DeepMind researchers in their paper. The chart illustrates the efficiency of various data compression techniques on different data sets, all initially 1GB in size. It employs a lower-is-better ratio, comparing the compressed size to the original size.
A chart of compression test results provided by DeepMind researchers in their paper. The chart illustrates the efficiency of various data compression techniques on different data sets, all initially 1GB in size. It employs a lower-is-better ratio, comparing the compressed size to the original size. Credit: DeepMind

Over the past two decades, some computer scientists have proposed that the ability to compress data effectively is akin to a form of general intelligence. The idea is rooted in the notion that understanding the world often involves identifying patterns and making sense of complexity, which, as mentioned above, is similar to what good data compression does. By reducing a large set of data into a smaller, more manageable form while retaining its essential features, a compression algorithm demonstrates a form of understanding or representation of that data, proponents argue.

The Hutter Prize is an example that brings this idea of compression as a form of intelligence into focus. Named after Marcus Hutter, a researcher in the field of AI and one of the named authors of the DeepMind paper, the prize is awarded to anyone who can most effectively compress a fixed set of English text. The underlying premise is that a highly efficient compression of text would require understanding the semantic and syntactic patterns in language, similar to how a human understands it.

So theoretically, if a machine can compress this data extremely well, it might indicate a form of general intelligence—or at least a step in that direction. While not everyone in the field agrees that winning the Hutter Prize would indicate general intelligence, the competition highlights the overlap between the challenges of data compression and the goals of creating more intelligent systems.

Along these lines, the DeepMind researchers claim that the relationship between prediction and compression isn't a one-way street. They posit that if you have a good compression algorithm like gzip, you can flip it around and use it to generate new, original data based on what it has learned during the compression process.

In one section of the paper (Section 3.4), the researchers carried out an experiment to generate new data across different formats—text, image, and audio—by getting gzip and Chinchilla to predict what comes next in a sequence of data after conditioning on a sample. Understandably, gzip didn't do very well, producing completely nonsensical output—to a human mind, at least. It demonstrates that while gzip can be compelled to generate data, that data might not be very useful other than as an experimental curiosity. On the other hand, Chinchilla, which is designed with language processing in mind, predictably performed far better in the generative task.

An example from the DeepMind paper comparing the generative properties of gzip and Chinchilla on a sample text.
An example from the DeepMind paper comparing the generative properties of gzip and Chinchilla on a sample text. gzip's output is unreadable.
An example from the DeepMind paper comparing the generative properties of gzip and Chinchilla on a sample text. gzip's output is unreadable. Credit: DeepMind

While the DeepMind paper on AI language model compression has not been peer-reviewed, it provides an intriguing window into potential new applications for large language models. The relationship between compression and intelligence is a matter of ongoing debate and research, so we'll likely see more papers on the topic emerge soon.

Listing image: Getty Images

Photo of Benj Edwards
Benj EdwardsSenior AI Reporter
Benj Edwards is Ars Technica's Senior AI Reporter and founder of the site's dedicated AI beat in 2022. He's also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.
95 Comments
Staff Picks
r
But what about decompression rate? FLAC has always been noteworthy for being an asymmetrical codec which takes more computational power to compress than to decompress (potentially a lot more, depending on the settings used). If this new AI codec requires a lot of number crunching to decode, it may not be such a big win in all situations.
In terms of a practical format, FLAC/PNG are designed to be incredibly fast and lightweight because they have to be integrated into mobile devices, web browsers, etc without consuming huge amounts of memory and power. For example, FLAC is designed to be able to decode CD audio losslessly in realtime on DSP cores with single-digit MHz and tens of kilobytes of RAM while using the absolute lowest amount of battery. I'm not sure how much memory Chinchilla 70B requires, but seeing as the model has 70 billion parameters, I suspect it will not fit into 64 KB of memory on a low power embedded audio device.
close