
The Tsinghua team has developed a brain-inspired AI model that provides a new paradigm for perceptual information processing

Author: DeepTech

Humans have an innate ability to separate audio signals, for example telling apart different speakers' voices or picking a voice out of background noise. This ability is known as the "cocktail party effect."

By analyzing the statistical structure of patterns in the sound stream, such as the spectrum or the envelope, the central auditory system can readily pick out a specific target sound within a mixture.
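To make these two cues concrete, the short Python sketch below computes the magnitude spectrum and the amplitude envelope of a sound. It is an illustration only; the synthetic noisy tone and the 16 kHz sample rate are assumptions, not anything from the study:

```python
import numpy as np
from scipy.signal import hilbert

# Hypothetical input: 1 s of a "mixed sound" (a tone buried in noise) at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
mixture = np.sin(2 * np.pi * 220 * t) + 0.5 * np.random.randn(fs)

# Spectrum: magnitude of the discrete Fourier transform.
spectrum = np.abs(np.fft.rfft(mixture))
freqs = np.fft.rfftfreq(len(mixture), d=1 / fs)

# Envelope: magnitude of the analytic signal (Hilbert transform).
envelope = np.abs(hilbert(mixture))

print(f"Dominant frequency: {freqs[spectrum.argmax()]:.1f} Hz")
```

Even with substantial noise, the 220 Hz component dominates the spectrum, which is the kind of statistical regularity the auditory system can exploit.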

In the field of AI, designing speech separation systems that are as powerful as humans has long been an important goal.

Previous neuroscience research suggests that the human brain often uses visual information to help the auditory system solve the "cocktail party problem."

Inspired by this finding, AI researchers began incorporating visual information to improve the quality of speech separation; the resulting approaches are known as multimodal speech separation methods.

If a system can capture lip movements, this extra cue aids speech processing, because it compensates for information lost from the voice signal in noisy environments.
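As a toy illustration of this idea, the PyTorch sketch below fuses per-frame audio features with lip-movement features before estimating a mask for the target voice. The module names, feature sizes, and simple concatenation-based fusion are illustrative assumptions, not the design of CTCNet or any specific published system:

```python
import torch
import torch.nn as nn

class NaiveAVFusion(nn.Module):
    """Toy audio-visual fusion: concatenate per-frame features, predict a mask."""
    def __init__(self, audio_dim=256, video_dim=128, hidden=256):
        super().__init__()
        self.fuse = nn.Linear(audio_dim + video_dim, hidden)
        self.separator = nn.GRU(hidden, hidden, batch_first=True)
        self.mask_head = nn.Linear(hidden, audio_dim)

    def forward(self, audio_feats, video_feats):
        # audio_feats: (batch, frames, audio_dim), extracted from the mixture.
        # video_feats: (batch, frames, video_dim), extracted from lip movements
        # and assumed already upsampled to the audio frame rate.
        fused = torch.relu(self.fuse(torch.cat([audio_feats, video_feats], dim=-1)))
        hidden, _ = self.separator(fused)
        # A sigmoid mask over the audio features selects the target voice.
        return torch.sigmoid(self.mask_head(hidden)) * audio_feats

model = NaiveAVFusion()
out = model(torch.randn(2, 100, 256), torch.randn(2, 100, 128))
print(out.shape)  # torch.Size([2, 100, 256])
```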

However, the separation power of existing multimodal speech separation methods is still far inferior to that of the human brain.

Against this backdrop, the team of Professor Yuan Kexin at the School of Biomedical Engineering, Tsinghua University, built a brain-inspired AI model: CTCNet (cortico-thalamo-cortical neural network).


Figure | Yuan Kexin (Source: Yuan Kexin)

The model's speech separation performance is significantly ahead of existing methods. The work not only provides a new brain-inspired paradigm for machine perceptual information processing, but also has the potential to play an important role in fields such as intelligent assistants and autonomous driving.

"CTCNet is the result of the cortical-thalamic-cortical circuit and A-FRCNN," Yuan said. ”

In recent years, Yuan Kexin's group has systematically studied the architecture and physiological characteristics of the higher-order auditory thalamus and its connections with the cortex.

On this basis, combined with a speech separation algorithm previously developed by Professor Hu Xiaolin's research group in the Department of Computer Science and Technology at Tsinghua University, they proposed a multimodal speech separation scheme.

They then carried out a series of speech separation tests and rounds of parameter tuning on public datasets, finally obtaining CTCNet with its excellent speech separation performance.
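The article does not name the evaluation metric used in these tests, but a standard score in speech separation research is the scale-invariant signal-to-noise ratio (SI-SNR). For illustration, a minimal NumPy implementation might look like this:

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between an estimated and a reference signal."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to remove scale differences.
    s_target = (np.dot(estimate, target) / (np.dot(target, target) + eps)) * target
    e_noise = estimate - s_target
    return 10 * np.log10(np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps))

# Toy check: a lightly perturbed copy of the target scores high.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
print(f"{si_snr(clean + 0.01 * rng.standard_normal(16000), clean):.1f} dB")
```

Higher SI-SNR means the separated signal is closer to the clean reference, regardless of its overall loudness.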


(Source: TPAMI)

In that sense, this is an applied study that grew out of mechanistic research.

"Overall, it's a two-way process. As an AI researcher, you may be inspired by reading the literature in the field of brain science, but direct communication with brain science researchers is always the most effective. Yuan Kexin said.

He went on to say that without a background in neuroscience, it is difficult for AI researchers to understand how the brain works just by reading the literature.

Brain science researchers, for their part, should be conscious of translating their findings into the AI field and take the initiative to reach out to and discuss with AI researchers; only then can sparks fly.

In fact, AI researchers are already attempting to simulate some of the brain's functions without formal training in brain science, while brain science researchers are often unaware of these efforts.

Through exposure and mutual understanding, brain scientists would have the opportunity to bring their research results into the brain-function-simulation experiments that AI researchers are already carrying out, thereby contributing to the development of truly effective brain-inspired AI research.

"Through this research, I deeply understand the importance of strengthening communication between researchers in the fields of neuroscience and AI to effectively carry out work related to brain-inspired AI," said Yuan Kexin. ”


Figure | Hu Xiaolin (Source: Baidu Encyclopedia)

Yuan Kexin and Hu Xiaolin are both part-time researchers at three brain-research-related centers at Tsinghua University, so they often have occasion to hear each other's research talks, which created the opportunity for their collaboration.

In addition, because neuroscience and AI are two very different disciplines, the success of the collaboration depended on close communication between team members on both sides.

Although the two sides often used the same words to mean different things, and at times could not understand each other at all, both had enough patience to get at what the other really meant, and this became an important guarantee of the collaboration's ultimate success.

Ultimately, the related paper was published in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) under the title "An Audio-Visual Speech Separation Model Inspired by Cortico-Thalamo-Cortical Circuits" [1].

Li Kai, a master's student in Hu Xiaolin's team, is the first author; Xie Fenghua, a postdoctoral fellow in Yuan Kexin's team, and Chen Hang, a doctoral student in Hu Xiaolin's team, are the second and third authors, respectively; Yuan Kexin and Hu Xiaolin are the co-corresponding authors.


Figure | Related papers (Source: TPAMI)

Next, the team plans to advance along three lines:

First, they will analyze how visual and auditory information is integrated in space and time at the level of single thalamic neurons, hoping to use these integration patterns to upgrade the AI model and further improve its speech separation performance, so that it can cope with more complex natural scenes.

Second, they will explore the model's potential in other application scenarios, such as detecting medical signals against noisy backgrounds.

Finally, they will unravel the anatomical and functional connectivity architectures of multimodal neurons in lower-level brain regions of the central sensory system, such as the midbrain, and explore the potential of these connectivity architectures to inspire AI model building.

They expect that this series of AI models will, in turn, help reveal the important roles and working mechanisms of different multimodal sensory nuclei, and of the neurons within them, in central sensory information processing.

References:

1. K. Li, F. Xie, H. Chen, K. Yuan and X. Hu, "An Audio-Visual Speech Separation Model Inspired by Cortico-Thalamo-Cortical Circuits," IEEE Transactions on Pattern Analysis and Machine Intelligence, early access, pp. 1-15.

Operation/Typesetting: He Chenlong
