laitimes

The "Winter Olympic Sign Language Broadcast Digital Human" will serve the hearing impaired

The "Winter Olympic Sign Language Broadcast Digital Human", created for the hearing impaired, recently made its official debut on Beijing Satellite TV. In the lower left corner of the TV screen, a digital human modeled on the virtual student "Hua Zhibing" signs the broadcast rapidly, fingers flying. The Beijing News reporter learned that the digital human uses China's first ultra-large-scale intelligent information model together with virtual digital human technology. It will provide sign language news broadcasts for the hearing impaired so that they can quickly get information while watching special coverage of the events.

The "Winter Olympic Sign Language Broadcast Digital Human" in use on the "Beijing You Morning" program. Photo courtesy of the interviewee

Build a high-quality sign language corpus with a vocabulary size of more than 100,000

At present, China has 27 million hearing-impaired people, and sign language services for professional settings are scarce. There is also a large gap between the supply of and demand for sign language interpreting, and the specialized terminology of the Winter Olympics is difficult to translate. With the support of the Beijing Municipal Science and Technology Commission and the Zhongguancun Administrative Committee, Zhipu AI, Ling Yunguang and Beijing Radio and Television Station jointly created the "Winter Olympic Sign Language Broadcast Digital Human".

Sign language generation sits at the intersection of several fields, including computer vision, natural language processing, cross-media computing and human-computer interaction, and the challenges are enormous. With an ultra-large-scale pre-trained model as its core technology, the system relies on a self-built capture system that records body movements, facial expressions and finger motion synchronously. It uses industry-leading technologies such as a cross-modal anthropomorphic generation algorithm and an ultra-high-precision realistic digital human to deliver professional sign language translation and broadcasting of event news during the Winter Olympics.

"We first built the largest multimodal sign language corpus in China." Zuo Jiaping, senior vice president of Zhipu AI, introduced that the "Winter Olympic Sign Language Broadcast Digital Person" system has completed the collection and recording of 8214 common sign languages included in the National General Sign Language Dictionary, and the grammar is based on the habitual playing method of the hearing impaired group to ensure the accuracy and professionalism of the sign language broadcast results and better serve the hearing impaired.

Because China currently lacks a relatively complete sign language corpus, the R&D team, with the support of the Beijing Disabled Persons' Federation and the Municipal Disabled Persons' Federation Deaf Association, invited more than 40 deaf teachers and sign language experts to transcribe text into sign language and record motion capture, and carried out wide-ranging evaluations with hearing-impaired users. The result is a high-quality sign language corpus with more than 100,000 words and sentences in total. "It covers not just sports vocabulary and Winter Olympic terminology, but all kinds of words used in newscasts."

Create a "smart digital brain" that distills the key spoken information and converts it into sign language

Sign language and speech are delivered at different speeds. An anchor can say more than two hundred words a minute, while sign language is expressed with the body and is relatively slow. To keep pace with the broadcast, the system has to summarize the content and express its core meaning.

"For example, the anchor said that today's Beijing is windy and sunny, the sky is clear, and the sign language broadcast should be matched with the voice, just express 'Today's Beijing weather is good'." Du Jizhong, CTO of the Intelligent Spectrum AI Digital People Division, said.

He also noted that the word order of sign language differs from that of spoken language; for example, negative words are habitually placed last. "I'm not happy" is signed in the order "I", "happy", "no".
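The reordering rule described above can be illustrated with a toy sketch. This is not the system's translation model, which the article says is built on a large pre-trained model. The function name `postpose_negation` and the small `NEGATION_WORDS` list are hypothetical and exist only to show how a negative word ends up at the end of the sign sequence.

```python
# Toy illustration only: a hand-written rule, not the model described in the article.
# NEGATION_WORDS is an assumed, deliberately tiny list of English glosses.
NEGATION_WORDS = {"not", "no", "never"}


def postpose_negation(glosses):
    """Move negation glosses to the end, keeping all other glosses in order."""
    kept = [g for g in glosses if g.lower() not in NEGATION_WORDS]
    negations = [g for g in glosses if g.lower() in NEGATION_WORDS]
    return kept + negations


# "I'm not happy" as a gloss sequence, reordered to sign-language order.
print(postpose_negation(["I", "not", "happy"]))  # ['I', 'happy', 'not']
```

A sequence without a negative word passes through unchanged, so the rule only affects sentences like the example in the text.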

How does the "Winter Olympic Sign Language Broadcast Digital Human" handle these characteristics of sign language? Reportedly, the system takes an ultra-large-scale pre-trained model as its core technology and, through semantic distillation and a fast-editing model for sign language translation, condenses the spoken news broadcast and translates it into a word order that conforms to sign language habits.

The digital human signs realistically and naturally, and its expression is 90% understandable

The digital human also needs to be approachable, natural and visually pleasing, and must not come across to the audience as "stiff".

To achieve a high-precision, highly natural character appearance and signing posture, the R&D team also independently built a capture system that records body movements, facial expressions and finger motion synchronously. The team collects multimodal motion capture data and uses the cross-modal anthropomorphic generation algorithm to drive and render the hyperrealistic digital human naturally and smoothly. For each piece of motion capture data, the algorithm automatically generates smooth transitions between adjacent actions.
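The article does not say how the transitions between adjacent actions are generated. As a rough sketch of the general idea, the example below uses plain linear interpolation, a much simpler technique swapped in for illustration. It assumes each pose is a flat list of joint values; the function name `transition_frames` and that pose format are hypothetical.

```python
# Illustrative sketch only: simple linear interpolation between two poses.
# The real system's generation algorithm is not described in the article.
def transition_frames(end_pose, start_pose, steps):
    """Return `steps` in-between poses from end_pose toward start_pose.

    end_pose   -- last pose of the previous motion clip (list of joint values)
    start_pose -- first pose of the next motion clip (same length)
    The two endpoint poses themselves are not included in the result.
    """
    if len(end_pose) != len(start_pose):
        raise ValueError("poses must have the same number of joint values")
    frames = []
    for i in range(1, steps + 1):
        t = i / (steps + 1)  # fraction of the way from end_pose to start_pose
        frames.append([a + (b - a) * t for a, b in zip(end_pose, start_pose)])
    return frames


# Three in-between poses for a two-joint toy skeleton.
print(transition_frames([0.0, 10.0], [4.0, 2.0], steps=3))
# [[1.0, 8.0], [2.0, 6.0], [3.0, 4.0]]
```

Linear blending of raw joint values is only a starting point. Production animation systems typically interpolate rotations and apply easing, which this sketch leaves out.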

Zuo Jiaping said that when a sign language teacher signs a broadcast, they also mouth the words so that hearing-impaired viewers can understand more clearly. The digital human's mouth shape therefore also changes as it broadcasts. "The system has been in development for only nine months, and it can already match gestures with lip shapes. With further research and development, the digital human will also have richer facial expressions."

At present, the "Winter Olympic Sign Language Broadcast Digital Human" has achieved both "accuracy of expression" and "intelligibility of expression". In evaluation, the understandability of its expression reached 90%.

The digital human can work around the clock (7×24) and may be used for sign language teaching in the future

Since the Winter Olympics opened, the "Winter Olympic Sign Language Broadcast Digital Human" has continued to broadcast "Winter Olympics Event Highlights" and "Watching the Winter Olympics Together" every day on Beijing Satellite TV's "Beijing You Morning" program.

In the post-Olympic era, how will the "Winter Olympic Sign Language Broadcast Digital Human" be used? Reportedly, the digital human can provide digital sign language generation services for news media, making it easier for hearing-impaired people to follow news bulletins quickly. The digital human can also work around the clock (7×24), which helps address the shortage of sign language interpreters.

Wang Yi, deputy director of the News Channel Center of Beijing Radio and Television Station, said that sign language is relatively complex and that Chinese sign language has regional "dialects". Given the shortage of sign language teachers, using artificial intelligence for standard sign language broadcasting and teaching would reduce errors, speed up the promotion of the national common sign language, and help create a barrier-free environment in which people with disabilities can participate in social life on an equal footing. He said that in the future, sign language broadcasts will be added to more channels and programs so that people with hearing impairments can access more information.

Zuo Jiaping said that the appearance of the sign language broadcast digital human is not limited to "Hua Zhibing" and can be replaced with other idol personas according to users' preferences.

Beijing News reporter Zhang Lu

Edited by Fan Yijing Proofreader Li Lijun
