SoundStream, Google's first AI codec to handle both speech and music, explained: efficient compression plus noise reduction

Author: Zhidongxi 2021-10-31 13:23:11

Zhidongxi (WeChat official account: zhidxcom)

Compiled by | Li Huinan

Edited by | Jiang Xinbai

According to VentureBeat, on August 12 (US time) Google published a detailed technical explanation of its audio codec SoundStream, which can process different types of sound while delivering high-quality audio. SoundStream is also the first AI codec to handle both speech and music, and it can run on smartphones.

SoundStream is an end-to-end "neural" audio codec that can process speech, music, and ambient sounds. It can also compress and enhance audio at the same time, removing background noise.

According to Google, SoundStream at 3 kbps performs close to the EVS (Enhanced Voice Services) codec at 9.6 kbps and outperforms the Opus codec at 12 kbps. In addition, at the same bitrate, SoundStream outperforms the current version of Lyra.

Users can compress audio with SoundStream, which reduces storage and bandwidth requirements, while the decoded audio remains perceptually close to the original.
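To put the quoted bitrates in perspective, the sketch below works out how many bytes one minute of audio occupies at each rate. It is plain arithmetic on the figures cited above, assuming a constant bitrate and 1 kbps = 1000 bits per second; the function names are illustrative and not part of any codec's API.

```python
# Bitrates quoted in the article, in bits per second
# (assumption: 1 kbps = 1000 bits/s, constant bitrate, no container overhead).
BITRATES_BPS = {"SoundStream": 3000, "EVS": 9600, "Opus": 12000}


def stream_bytes(bitrate_bps: int, seconds: float) -> float:
    """Size in bytes of a constant-bitrate stream lasting `seconds`."""
    return bitrate_bps * seconds / 8


def size_table(seconds: float = 60.0) -> dict:
    """Bytes needed per codec for `seconds` of audio at the quoted bitrates."""
    return {name: stream_bytes(rate, seconds) for name, rate in BITRATES_BPS.items()}


if __name__ == "__main__":
    for name, size in size_table().items():
        print(f"{name}: {size / 1000:.1f} kB per minute")
```

At these rates, a minute of 3 kbps audio takes a quarter of the space of a minute of 12 kbps audio, which is where the storage and bandwidth savings come from.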

In traditional audio processing pipelines, compression and enhancement are usually performed by separate modules. SoundStream handles both compression and enhancement.

In May, Google released Lyra, a neural audio codec for compressing audio at low bitrates. SoundStream is a system built from an encoder, a decoder, and a quantizer.
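The encoder, quantizer, and decoder structure mentioned above can be illustrated with a toy pipeline. This is only a sketch of the general idea, not SoundStream's method: the real encoder and decoder are learned neural networks, whereas here the "encoder" just averages frames and the quantizer is a uniform scalar one. All function names and parameters (`frame`, `levels`, `lo`, `hi`) are hypothetical.

```python
import math


def encode(samples, frame=4):
    """Toy encoder: reduce each frame of samples to one feature (its mean)."""
    feats = []
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        feats.append(sum(chunk) / len(chunk))
    return feats


def quantize(features, levels=16, lo=-1.0, hi=1.0):
    """Map each feature to the index of the nearest of `levels` evenly spaced values."""
    step = (hi - lo) / (levels - 1)
    return [min(levels - 1, max(0, round((f - lo) / step))) for f in features]


def decode(codes, frame=4, levels=16, lo=-1.0, hi=1.0):
    """Toy decoder: turn each code back into a value and hold it for one frame."""
    step = (hi - lo) / (levels - 1)
    out = []
    for c in codes:
        out.extend([lo + c * step] * frame)
    return out


if __name__ == "__main__":
    signal = [math.sin(2 * math.pi * i / 32) for i in range(64)]
    codes = quantize(encode(signal))
    restored = decode(codes)
    # Only the integer codes would need to be stored or transmitted:
    # 4 bits per code (16 levels) for every 4 input samples.
    print(len(signal), "samples ->", len(codes), "codes ->", len(restored), "samples")
```

Only the small integer codes need to be stored or sent, and the decoder rebuilds an approximation of the signal from them. A neural codec replaces the hand-written encoder and decoder with trained networks.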

However, Google said that SoundStream is still experimental. Its next step is to release an updated version of Lyra that incorporates SoundStream, with higher audio quality and lower complexity.

"Effective compression is necessary whenever people transmit audio. SoundStream is an important step in improving machine learning-driven audio codecs: it outperforms the state-of-the-art Opus and EVS codecs and can enhance audio on demand," said Google researcher Neil Zeghidour.

Marco Tagliasacchi, another Google researcher, also wrote in a blog post that integrating SoundStream with Lyra lets developers keep using existing tools, making good use of resources while delivering better sound quality.

<h2>Conclusion: SoundStream's follow-up version is worth looking forward to</h2>

SoundStream reportedly matches the functions and features of the best audio codecs currently on the market. Its efficient audio processing not only saves consumers time but also delivers better sound quality, so SoundStream may prove more popular with consumers than other audio codecs.

Although SoundStream is still experimental, it may come into wide use as Google continues to develop the technology.

Source: VentureBeat
