laitimes

In four steps, we will deeply understand why Guo Degang can speak cross talk in fluent English

Author: Mobile phone photography with curious babies

Taylor Swift speaking fluent Chinese, Guo Degang performing cross talk in English... You have probably come across these videos recently. What core technologies are behind them? Don't worry: Curious Baby will walk you through them in four easy steps.

First, let's watch Guo Degang performing cross talk in English.

Guo Degang speaks cross talk in fluent English

Taylor Swift speaks fluent Chinese

Taylor Swift speaks fluent Chinese - AI synthetic version

The original version actually looked like this.

Taylor Swift speaks English - original version

Next, let's look at Watson speaking Chinese. Pay attention to how the mouth shapes in the Chinese version differ from those in the English version.

Watson speaks Chinese - AI synthetic version

Watson speaks English - original version

Think that's all? Trump and Mr. Bean also speak fluent Chinese.

Mr. Bean speaks English - original version

Mr. Bean speaks Chinese - AI synthetic version

So what are the core technologies behind these videos?

The technologies involved can be divided into four steps: 1. speech-to-text, 2. intelligent (machine) translation, 3. voice style conversion, and 4. lip-shape matching.
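To make the division of labor concrete, here is a minimal Python sketch of how the four steps chain together. Every stage function (speech_to_text, translate, convert_voice, match_lips) is a hypothetical placeholder that returns dummy data; in a real system each would be a separate trained model, and the tools behind these viral videos may organize the work differently.

```python
from dataclasses import dataclass

# All four stage functions are hypothetical placeholders standing in for
# real models (speech recognition, translation, voice cloning, lip sync).

@dataclass
class Clip:
    audio: list[float]   # raw audio samples
    frames: list[str]    # video frames (plain strings, for illustration only)

def speech_to_text(audio: list[float]) -> str:
    # Step 1 placeholder: a real system would run a speech recognition model.
    return "ni hao"

def translate(text: str) -> str:
    # Step 2 placeholder: a real system would call a translation model.
    return {"ni hao": "hello"}.get(text, text)

def convert_voice(text: str, speaker_audio: list[float]) -> list[float]:
    # Step 3 placeholder: synthesize `text` in the original speaker's voice,
    # using `speaker_audio` as the voice reference. Here: silent dummy audio.
    return [0.0] * (len(text) * 100)

def match_lips(frames: list[str], new_audio: list[float]) -> list[str]:
    # Step 4 placeholder: redraw the mouth in each frame to follow new_audio.
    return [f"{f}+lipsync" for f in frames]

def dub(clip: Clip) -> Clip:
    text = speech_to_text(clip.audio)               # 1. what was said
    english = translate(text)                       # 2. what to say instead
    new_audio = convert_voice(english, clip.audio)  # 3. say it in the same voice
    new_frames = match_lips(clip.frames, new_audio) # 4. make the mouth agree
    return Clip(audio=new_audio, frames=new_frames)

result = dub(Clip(audio=[0.1, 0.2], frames=["f0", "f1"]))
print(result.frames)      # ['f0+lipsync', 'f1+lipsync']
print(len(result.audio))  # 500
```

The point of the sketch is the data flow: the original audio is used twice, once to recover the words and once as a reference for the speaker's voice, and the newly generated audio is what drives the lip matching.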

Let's use Cai Ming's English talk show as an example to walk through the process:

Cai Ming's English talk show - AI synthetic version

Cai Ming's talk show - original version

Step 1: Speech-to-Text

In the original video, Cai Ming performs the sketch in Chinese. To make her speak English, we first need the machine to understand what she said, and speech-to-text technology does exactly that: given a piece of audio, it outputs the corresponding text. It works just like WeChat's voice-to-text feature and is already a fairly mature technology.
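One small piece of how speech-to-text works can be shown in a few lines. Many recognition models (an assumption here; the article does not say which model these videos use) output, for each short audio frame, a probability for every symbol plus a special "blank" symbol, and a CTC decoder turns that into text. The sketch below shows greedy CTC decoding on made-up probabilities: take the best symbol per frame, merge repeats, then drop blanks.

```python
BLANK = "_"  # the CTC "blank" symbol; using "_" for it is an arbitrary choice

def ctc_greedy_decode(frame_probs: list[dict[str, float]]) -> str:
    """Pick the most likely symbol per audio frame, merge repeats, drop blanks."""
    best = [max(p, key=p.get) for p in frame_probs]
    out = []
    prev = None
    for sym in best:
        # Keep a symbol only when it differs from the previous frame's symbol
        # and is not the blank.
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return "".join(out)

# Made-up per-frame probabilities for a 6-frame clip (not real model output).
frames = [
    {"h": 0.7, "i": 0.1, BLANK: 0.2},
    {"h": 0.6, "i": 0.1, BLANK: 0.3},
    {"h": 0.1, "i": 0.2, BLANK: 0.7},
    {"h": 0.1, "i": 0.8, BLANK: 0.1},
    {"h": 0.1, "i": 0.6, BLANK: 0.3},
    {"h": 0.0, "i": 0.1, BLANK: 0.9},
]
print(ctc_greedy_decode(frames))  # hi
```

The blank is what lets such a model spell double letters: "l, blank, l" decodes to "ll", while "l, l" collapses to a single "l".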

Step 2: Intelligent Translation

With speech-to-text we can easily obtain the content of Cai Ming's speech. But to have her speak English, we also need to translate that Chinese text into English. This works the same way as the translation software we use every day: type in Chinese, and English comes out automatically.
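Modern translation software generally produces its output one word (more precisely, one token) at a time, each time choosing based on the source sentence and the words already produced. The sketch below shows the simplest version of that loop, greedy decoding. The scoring table TOY_SCORES and the function next_word_scores are hypothetical stand-ins for a trained neural model, and the toy table ignores the source sentence entirely.

```python
# Hypothetical "model": given the words produced so far, return scores for
# the next word. A real translator computes these with a trained network
# conditioned on the source sentence; this made-up table ignores the source.
TOY_SCORES: dict[tuple[str, ...], dict[str, float]] = {
    (): {"I": 0.9, "me": 0.1},
    ("I",): {"am": 0.8, "is": 0.2},
    ("I", "am"): {"happy": 0.7, "glad": 0.3},
    ("I", "am", "happy"): {"<eos>": 0.95, "today": 0.05},
}

def next_word_scores(source: str, prefix: tuple[str, ...]) -> dict[str, float]:
    # Unknown prefixes fall back to "end of sentence".
    return TOY_SCORES.get(prefix, {"<eos>": 1.0})

def greedy_translate(source: str, max_len: int = 20) -> str:
    prefix: tuple[str, ...] = ()
    for _ in range(max_len):
        scores = next_word_scores(source, prefix)
        word = max(scores, key=scores.get)  # always take the single best word
        if word == "<eos>":                 # end-of-sentence token stops decoding
            break
        prefix += (word,)
    return " ".join(prefix)

print(greedy_translate("wo hen kai xin"))  # I am happy
```

Real systems often use beam search instead, which keeps several candidate sentences alive rather than committing to the single best word at each step.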

Step 3: Voice Style Conversion

As the video shows, Cai Ming's voice and intonation in the English version have a native English accent, very different from her original Chinese voice. So to make her speak natural English, we also need to convert her voice into an English-speaking style. This is where voice cloning, or style conversion, comes in: it transforms content from one style into another. You may remember the viral clip in which Yang Mi appeared as Huang Rong; it actually replaced Zhu Yin's likeness with Yang Mi's. That was a conversion of images, while here the same idea is applied to voice.
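Full voice cloning relies on neural networks far too large to show here, but one classical building block of voice conversion fits in a few lines: moving a pitch contour (F0, in Hz) into the target speaker's range by matching the mean and standard deviation of log-F0. This is only an illustration of "converting one style into another" for sound. It is not claimed to be what the tools in these videos use, and the pitch values below are made up.

```python
import math
from statistics import mean, stdev

def log_f0_stats(f0: list[float]) -> tuple[float, float]:
    """Mean and sample std of log-F0 over voiced frames (F0 == 0 marks unvoiced)."""
    logs = [math.log(x) for x in f0 if x > 0]
    return mean(logs), stdev(logs)

def convert_f0(src_f0: list[float],
               src_stats: tuple[float, float],
               tgt_stats: tuple[float, float]) -> list[float]:
    """Map each voiced frame's log-F0 from the source range to the target range."""
    mu_s, sd_s = src_stats
    mu_t, sd_t = tgt_stats
    out = []
    for x in src_f0:
        if x <= 0:
            out.append(0.0)  # keep unvoiced frames unvoiced
        else:
            z = (math.log(x) - mu_s) / sd_s       # standardize in source range
            out.append(math.exp(z * sd_t + mu_t))  # re-scale into target range
    return out

# Made-up pitch tracks: a higher-pitched source and a lower-pitched target.
source_f0 = [200.0, 220.0, 0.0, 240.0, 210.0]
target_f0 = [110.0, 120.0, 0.0, 130.0, 100.0]
converted = convert_f0(source_f0, log_f0_stats(source_f0), log_f0_stats(target_f0))
print([round(x, 1) for x in converted])
```

Because the mapping is monotonic, the rises and falls of the original intonation are preserved; only the overall pitch level and range shift to match the target voice.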

Huang Rong and Guo Jing - AI synthetic version

Huang Rong and Guo Jing - original video version

Step 4: Lip-Shape Matching

Besides the English accent, the mouth shapes in the video also match what is being said. Achieving this requires lip matching, another AI generation technique that has become popular in recent years. One example is GeneFace++; readers who want the details can read the paper. For our purposes, it is enough to know that such tools exist. If you look carefully at the video of Trump speaking Chinese, his mouth shapes differ from those in the English original: the key is making the mouth shapes match the new audio.
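Why the mouth shapes must change becomes clearer with a toy example. Different sounds require different mouth shapes (often called visemes), so once the audio changes from Chinese to English, the sequence of mouth shapes must change too. The sketch below turns a timed phoneme list into one mouth shape per video frame. The VISEME table, the phoneme timings and the 25 fps frame rate are all hypothetical simplifications; GeneFace++ learns the mapping from audio to facial motion from data rather than using a hand-written table.

```python
# Hypothetical, heavily simplified phoneme -> viseme (mouth shape) table.
# Real systems use finer-grained sets; this grouping is for illustration only.
VISEME = {
    "p": "closed", "b": "closed", "m": "closed",
    "f": "lip-teeth", "v": "lip-teeth",
    "aa": "open", "ae": "open",
    "uw": "rounded", "ow": "rounded",
    "iy": "spread", "eh": "spread",
}

def mouth_track(phonemes: list[tuple[str, float, float]], fps: int = 25) -> list[str]:
    """phonemes: (phoneme, start_sec, end_sec) triples; returns one shape per frame."""
    end = max(stop for _, _, stop in phonemes)
    n_frames = round(end * fps)
    track = ["rest"] * n_frames  # frames not covered by any phoneme stay at rest
    for ph, start, stop in phonemes:
        shape = VISEME.get(ph, "rest")  # unknown phonemes fall back to "rest"
        for i in range(round(start * fps), min(round(stop * fps), n_frames)):
            track[i] = shape
    return track

# "mama" as phonemes with made-up timings in seconds.
timeline = [("m", 0.0, 0.08), ("aa", 0.08, 0.2), ("m", 0.2, 0.28), ("aa", 0.28, 0.4)]
print(mouth_track(timeline))
# ['closed', 'closed', 'open', 'open', 'open', 'closed', 'closed', 'open', 'open', 'open']
```

In a lip-sync pipeline, per-frame mouth targets like these would then guide how the mouth region is redrawn in each video frame.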

GeneFace++ paper

GeneFace inference process

Having gone through these four steps, you should now have a good grasp of the core technologies. What other interesting applications do you think they could enable? And do these AI creations worry you?
