Behind the sound of Tianlai: explaining the Tianlai AI voice assisted listening/enhancement algorithm

2022-03-03 20:13:43

On March 3, National Ear Care Day, China Unicom and Tencent Conference's Tianlai Lab released the "Listening King Card Upgrade", which lets hearing-impaired users not only "hear clearly" but also "see clearly" in two scenarios: voice calls and real-time subtitles.

Behind this is the Tianlai AI voice assisted listening/enhancement algorithm, which Tianlai Lab built for hearing-impaired users. It takes the novel approach of "doing speech enhancement by enhancing speech". The approach grew out of Tencent Conference practice and, in the spirit of technology for good, keeps extending its technical value outward.

Source 丨Tencent Tianlai Laboratory

Hearing clearly = zero noise?

You may have experienced a scene like this:

In a noisy restaurant, two people at the same table are deep in conversation. Although there is a lot of noise around them, they hear only each other's voices and seem not to notice any sound other than the conversation itself.

This is the "cocktail party effect", well known in acoustics.

In fact, speech energy and its contribution to intelligibility differ across frequencies. So what kind of sound can be "heard clearly" and then "understood"?

Listen to the following set of sounds.

Voice A (original noisy voice)

The waveform clearly shows strong wind noise interference.

To the ear, the speech is barely intelligible because of the noise.

Voice B (voice after simple noise reduction processing)

After noise reduction is applied to voice A, the waveform becomes very clean. However, the intelligibility of the speech has not improved.

Simple noise reduction logic can suppress the noise, but it damages the structure of the speech: the volume fluctuates and intelligibility does not improve. Speech enhancement and noise reduction are therefore not the same thing.

For people with hearing impairments, this problem is particularly critical.

Hearing-impaired users can perceive only a small part of the speech signal and must understand speech through a limited frequency band. Handling this with a simple "noise reduction" mindset often creates a dilemma: "the noise is gone, but I still can't make out what you are saying".
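
The dilemma above can be illustrated with a toy hard gate. This is only a sketch under stated assumptions: the band magnitudes, the threshold, and the function name `hard_gate` are all made up for illustration and do not come from Tianlai Lab. It shows how a rule that only looks at loudness silences the weak speech bands together with the noise.

```python
# Toy illustration only: all numbers are hypothetical, in arbitrary integer units.

def hard_gate(noisy_bands, threshold):
    """Zero every frequency band whose magnitude is below the threshold."""
    return [m if m >= threshold else 0 for m in noisy_bands]

# Hypothetical per-band magnitudes, ordered from low to high frequency.
speech = [9, 6, 3, 2, 1]   # speech gets weaker toward the higher bands
noise = [3, 3, 3, 3, 3]    # roughly flat background noise
noisy = [s + n for s, n in zip(speech, noise)]

gated = hard_gate(noisy, threshold=7)

# Count bands that carried speech but were silenced by the gate.
lost_speech_bands = sum(1 for s, g in zip(speech, gated) if s > 0 and g == 0)

print(gated)              # the output looks "clean" because weak bands are zeroed
print(lost_speech_bands)  # speech-bearing bands that were removed with the noise
```

The gated output looks quiet, yet three of the five speech-bearing bands are gone, which is the "clean but unintelligible" result described above.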

Doing speech enhancement by enhancing speech

It seems like a case of wanting both the fish and the bear's paw: you cannot have both. But going back to the nature of human hearing, the problem appears solvable.

"Although how humans perceive and process sound signals remains to be explored, one thing is clear: the more accurately the speech component can be extracted from the received signal, the better the intelligibility. So we thought of approaching the problem from the perspective of 'speech', not 'noise'," Tianlai Lab researchers said.

To address the pain points that hearing-impaired users experience, Tianlai Lab researchers proposed the idea of "doing speech enhancement by enhancing speech" and developed the Tianlai AI voice assisted listening/enhancement algorithm, cSENN (a deep-learning speech enhancement method based on speech context relationships).

Tianlai AI voice assisted listening/enhancement algorithm

Tianlai's independently developed AI algorithm identifies the speech components in the noisy signal, first protects them appropriately, and then effectively suppresses the acoustic noise.

This approach effectively suppresses background interference while maintaining high speech intelligibility, so users can hear more clearly.
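
The "protect first, then suppress" order can be sketched as a per-band gain rule. This is a minimal sketch, not the cSENN implementation: the speech-presence scores stand in for whatever the AI model estimates, and the function name, the threshold `protect_above`, the attenuation `noise_gain`, and all numbers are hypothetical.

```python
# Minimal sketch, not the actual algorithm: every value here is hypothetical.

def protect_then_suppress(noisy_bands, speech_presence,
                          protect_above=0.5, noise_gain=0.5):
    """Keep bands judged to contain speech at full gain, attenuate the rest.

    speech_presence holds one score in [0, 1] per band. In a real system it
    would come from a trained model; here it is supplied by hand.
    """
    output = []
    for magnitude, presence in zip(noisy_bands, speech_presence):
        # Step 1: protect. Bands with likely speech are left untouched.
        # Step 2: suppress. Only the remaining bands are attenuated.
        gain = 1.0 if presence >= protect_above else noise_gain
        output.append(magnitude * gain)
    return output

# Hypothetical noisy magnitudes; the last band contains noise only.
noisy = [12, 9, 6, 5, 4, 3]
presence = [0.9, 0.9, 0.8, 0.7, 0.6, 0.1]

enhanced = protect_then_suppress(noisy, presence)
print(enhanced)
```

Unlike a gate driven by loudness alone, the decision here depends on whether a band is judged to carry speech, so weak speech bands survive while the noise-only band is turned down.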

Listen to voice A after enhancement by the Tianlai algorithm.

Voice C (voice after Tianlai technology enhancement)

The waveform looks similar to the result of simple "noise reduction" processing, but the speech is clearly better preserved: the output speech is smooth, and the noise is suppressed to the desired level.

As part of Tencent Tianlai Action, this technology has been applied to China Unicom's Listening King Card.

In the "Listening King Card Upgrade", powered by the Tianlai AI voice assisted listening/enhancement algorithm, users get a better experience in both voice calls and real-time subtitles: in typical noise scenarios, the recognition rate for single-syllable speech rises by 66%, and real-time subtitles improve by 5.5 to 9.9 percentage points.

The video below shows a call in which the downlink side is very noisy, yet call quality and subtitles remain excellent.

Note: The downlink side was captured by screen recording with the phone playing through its speaker, which affects the audio quality.

Born in Tencent Conference, heading toward good

Hearing clearly and hearing faithfully is the audio experience that Tencent Conference is committed to providing.

As the top real-time audio communication and processing R&D team under Tencent Conference, Tianlai Lab draws on a large number of real Tencent Conference scenarios and thousands of hours of speech and noise data. Using deep learning and AI algorithms, it has eliminated more than 300 kinds of ambient noise, and the results have been deployed in Tencent Conference.

The personalized voice enhancement feature that Tencent Conference launched earlier is another successful application of Tianlai Lab's idea of "doing speech enhancement by enhancing speech". On top of environmental noise elimination, it further removes interference from surrounding voices and highlights the speaker's voice, like a "microphone that recognizes its owner", creating a cleaner and purer communication experience.

This technology ranked first in the review of the personalized speech enhancement track of the ICASSP 2022 DNS Challenge organized by Microsoft, with a MOS 0.57 higher than the baseline provided by Microsoft and a voice MOS 1.41 higher than before processing.

"Tencent Tianlai Action" is the "technical value spillover" of Tianlai AI technology into the field of hearing impairment. The technology provides meeting noise reduction for 200 million Tencent Conference users and has been maturely verified on products at the 100-million scale. While ensuring a good video conferencing experience, it puts Tencent's concept of technology for good into practice and explores technology for public welfare. It is applied to cochlear implant noise reduction, AI assistive listening, subtitle recognition optimization and other scenarios, helping to solve social problems and integrating social responsibility into products and services.

In the future, Tianlai Lab will remain open, and we look forward to more partners joining us to create a purer, higher-quality audio experience for users.
