Meta’s AI-based audio codec promises 10x compression over MP3

Illustrated representation of data in an audio wave. (Image: Meta AI)

Last week, Meta announced “EnCodec,” an AI-based audio compression method that the company says can compress audio to one-tenth the size of the MP3 format at 64 kbps with no loss in quality. Meta says the technique could dramatically improve the sound quality of speech over low-bandwidth connections, such as phone calls in areas with spotty service. The technique also works for music.
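Some back-of-the-envelope arithmetic puts the claim in perspective (the figures below are illustrative assumptions derived from the stated 10x ratio, not numbers from Meta's paper):

```python
# Rough math behind the "10x smaller than MP3 at 64 kbps" claim.

def file_size_kb(bitrate_kbps: float, seconds: float) -> float:
    """Size in kilobytes of audio stored at a constant bitrate."""
    return bitrate_kbps * seconds / 8  # kilobits -> kilobytes

minute = 60
mp3_64 = file_size_kb(64, minute)    # one minute of 64 kbps MP3: 480 KB
tenth = file_size_kb(6.4, minute)    # 10x smaller implies roughly 6.4 kbps: 48 KB

print(mp3_64, tenth, mp3_64 / tenth)
```

In other words, a 10x reduction from a 64 kbps stream implies an effective bitrate in the single-digit-kilobit range, which is why the technique matters most on very constrained connections.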

Meta AI researchers debuted the technology on October 25 in a paper titled “High Fidelity Neural Audio Compression” by Alexandre Défossez, Jade Copet, Gabriel Synnaeve, and Yossi Adi. Meta has also summarized the research in a blog post dedicated to EnCodec.

Meta claims its new audio encoder/decoder can compress audio 10x smaller than MP3. (Image: Meta AI)

Meta describes its method as a three-part system trained to compress audio to a desired target size. First, an encoder converts the uncompressed data into a lower-frame-rate “latent space” representation. Next, a “quantizer” compresses that representation to a target size while retaining the information most important for reconstructing the original signal. (This compressed signal is what gets sent over a network or saved to disk.) Finally, a decoder converts the compressed data back into audio in real time using a neural network running on a single CPU.
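The three-stage pipeline can be sketched with toy stand-ins for the neural networks. Everything below (function names, frame size, the scalar codebook) is an illustrative assumption, not Meta's code; real EnCodec uses convolutional networks and residual vector quantization:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(audio: np.ndarray, frame: int = 320) -> np.ndarray:
    """Toy 'encoder': collapse the waveform into a low-frame-rate latent
    representation (one value per 320-sample frame)."""
    n = len(audio) // frame
    return audio[: n * frame].reshape(n, frame).mean(axis=1, keepdims=True)

def quantize(latents: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Toy 'quantizer': map each latent to its nearest codebook entry's
    index. These indices are the compressed signal that gets transmitted."""
    return np.abs(latents - codebook[None, :]).argmin(axis=1)

def decode(indices: np.ndarray, codebook: np.ndarray, frame: int = 320) -> np.ndarray:
    """Toy 'decoder': look up codebook values and expand back to audio length."""
    return np.repeat(codebook[indices], frame)

codebook = np.linspace(-1, 1, 256)         # 256 entries -> 8 bits per frame
audio = rng.uniform(-1, 1, 16000)          # one second at a toy 16 kHz rate
codes = quantize(encode(audio), codebook)  # 16000 samples -> 50 indices
restored = decode(codes, codebook)
print(len(audio), len(codes), len(restored))
```

The key structural idea survives the simplification: only the small integer code stream crosses the network, and the decoder reconstructs a full-length waveform from it.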

A block diagram illustrating how Meta’s codec compression works. (Image: Meta AI)

Meta says the use of discriminators proved to be the key to designing a method that compresses the audio as much as possible without losing key elements of the signal:

“Because perfect reconstruction is impossible at low bit rates, the key to lossy compression is identifying changes that humans cannot perceive. To do so, we use discriminators to improve the perceptual quality of the generated samples. This creates a cat-and-mouse game in which the discriminator’s job is to distinguish between real samples and reconstructed samples. The compression model attempts to generate samples that fool the discriminators by pushing the reconstructed samples to be more perceptually similar to the originals.”
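The cat-and-mouse dynamic is the standard adversarial (GAN-style) setup, which can be sketched in miniature. The toy discriminator below scores signals by a single statistic; EnCodec's actual discriminators operate on multi-scale spectrograms, so everything here is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(x: np.ndarray, w: np.ndarray) -> float:
    """Toy discriminator: a score in (0, 1), high for 'real'-looking audio,
    based on crude summary statistics of the signal."""
    return 1 / (1 + np.exp(-np.dot(w, [x.mean(), x.std()])))

real = rng.normal(0, 1.0, 1000)   # stand-in for a real audio frame
fake = rng.normal(0, 0.5, 1000)   # stand-in for a lossy reconstruction

w = np.array([0.0, 5.0])          # weights a trained discriminator might hold

# Discriminator's objective: score real samples high, reconstructions low.
d_loss = -np.log(discriminator(real, w)) - np.log(1 - discriminator(fake, w))

# Compression model's adversarial objective: make its reconstructions
# score high, i.e. become indistinguishable from the real signal.
g_loss = -np.log(discriminator(fake, w))

print(d_loss, g_loss)
```

Training alternates between the two objectives: the discriminator sharpens its test, and the compression model learns to pass it, pushing reconstructions toward perceptual realism rather than exact waveform match.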

Using neural networks for audio compression and decompression isn’t new, especially for speech compression, but the Meta researchers claim to be the first group to apply the technique to 48 kHz stereo audio (slightly better than a CD’s 44.1 kHz sampling rate), which is typical of music files distributed over the Internet.
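Those sampling rates translate into substantial raw data rates, which is what any codec at this quality level is up against. Assuming 16-bit samples (a common bit depth, assumed here rather than stated in the article):

```python
# Raw bitrate of uncompressed PCM audio at the sampling rates mentioned.

def pcm_kbps(sample_rate_hz: int, channels: int, bits: int = 16) -> float:
    """Kilobits per second of uncompressed PCM audio."""
    return sample_rate_hz * channels * bits / 1000

cd_stereo = pcm_kbps(44_100, 2)   # CD audio: about 1411 kbps
hi_stereo = pcm_kbps(48_000, 2)   # 48 kHz stereo: 1536 kbps

print(cd_stereo, hi_stereo)
```

Squeezing a roughly 1.5 Mbps raw stream into a stream an order of magnitude smaller than a 64 kbps MP3 is what makes the stereo-music result notable compared with earlier speech-only neural codecs.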

As for applications, Meta says this AI-powered “hypercompression of audio” could support “faster, better-quality calls” in bad network conditions. And, of course, being Meta, the researchers also touted EnCodec’s metaverse implications, saying the technology could eventually deliver “rich metaverse experiences without requiring major bandwidth improvements.”

Beyond that, the technique could someday yield dramatically smaller music files. For now, Meta’s new technology remains in the research phase, but it points to a future where high-quality audio uses far less bandwidth, which would be welcome news for mobile broadband providers whose networks are strained by streaming media.
