An era, after all, has come to an end.
On November 22, the Shanghai Third Intermediate People's Court held a public trial in the copyright-infringement case against the "Renren Film and Television Subtitle Group" and handed down a first-instance verdict in court.
The defendant, Liang Yongping, was sentenced to three years and six months in prison and fined RMB 1.5 million for copyright infringement.
His illegal gains were ordered recovered, and the personal property he used to commit the crime was confiscated.

Some time ago, the Korean dystopian drama "Squid Game" was a runaway hit, racking up 142 million views in its first month online and topping the charts in 90 countries and regions.
Netflix also offers subtitles and dubbing in up to 13 languages.
However, Korean-American comedian Youngmi Mayer found the official "Squid Game" subtitles so far off the mark as to be unacceptable.
For example, when an actress delivers a Korean line that roughly means "What are you looking at?", Netflix's English subtitles render it as "Go away."
With the rise of streaming platforms such as Netflix, non-English-language works like "Squid Game" are becoming more and more common.
However, the subtitling and dubbing industry is short of talent, especially for less widely spoken languages.
For example, to bring a show to the Spanish market, the English subtitles are usually exported first and the Spanish version is then translated from them.
That is, the quality of the subtitles in some languages depends entirely on the quality of the English translation, and this two-step conversion inevitably loses a great deal of nuance and detail.
According to statistics, more viewers watched the dubbed version of "Squid Game" than the subtitled version.
To this end, everyone from streaming giants like Netflix to small localization providers is exploring whether AI can replace human translators.
So, is AI up to the job?
Let's start with what Deepfake Voice actually is.
Deepfake Voice, also known as voice cloning or synthetic speech, is the commonly used technique for copying or cloning a person's voice: it aims to generate a person's voice using AI.
The technology has now advanced to the point where a human voice can be reproduced with striking accuracy in both tone and timbre.
Voice cloning is the process of using a computer to generate the voice of a real individual, with AI creating a replica of a specific, distinctive voice.
To clone someone's voice, training data must be fed to the AI model. This data usually consists of recordings of the target person speaking.
AI can use this data to render a realistic voice, for example generating a piece of speech from anything you can type, a process called text-to-speech (TTS).
In earlier text-to-speech systems, the training data was the key component controlling the generated speech output. In other words, the voice you hear is the voice captured in the dataset.
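To get a feel for plain TTS, here is a minimal sketch using the pyttsx3 library, which drives the operating system's built-in synthetic voices offline; it illustrates ordinary text-to-speech, not voice cloning:

```python
import pyttsx3

# pyttsx3 talks to the OS's native speech engine (SAPI5, NSSpeechSynthesizer, or eSpeak).
engine = pyttsx3.init()
engine.setProperty("rate", 160)  # speaking speed, in words per minute

engine.say("This sentence was typed, not recorded.")
engine.runAndWait()  # block until the utterance has been spoken
```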
Now, however, the latest AI techniques can analyze and extract deeper features of the target voice, such as properties of the speech waveform itself.
Synthetic voice is another term commonly associated with Deepfake Voice, and it is often used interchangeably with voice cloning.
Simply put, synthetic speech is computer-generated speech, also known as speech synthesis, and it is generally achieved through AI and deep learning.
There are two main ways to synthesize a voice: text-to-speech (TTS) and speech-to-speech (STS).
Text-to-speech has been described above; TTS software has long been used to help visually impaired people read digital text, as well as in applications such as voice assistants.
Speech-to-speech starts not from text but from an existing piece of speech, modifying its voice characteristics to create another, very realistic-sounding piece of synthesized speech.
Speech synthesis used to be unable to produce voices convincing enough to pass for the real thing. With advances in technology, that has changed.
Traditional speech synthesis typically relies on two basic techniques: concatenative synthesis and formant synthesis.
Concatenative synthesis stitches short samples of recorded speech, called units, together into a chain, which is then used to generate the desired sound patterns.
Formant synthesis, by contrast, is most commonly used to replicate the vowel-like sounds people make.
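To make the concatenative idea concrete, here is a toy sketch; the unit files and the sample rate are assumptions for illustration only:

```python
import numpy as np
import soundfile as sf

# Hypothetical unit inventory: one short recorded sample per speech unit,
# all recorded at the same sample rate (assumed 22.05 kHz here).
unit_names = ["heh", "lo"]
units = {name: sf.read(f"units/{name}.wav")[0] for name in unit_names}

# "Stitch" the units into a chain to form the target sound pattern.
word = np.concatenate([units["heh"], units["lo"]])
sf.write("hello.wav", word, 22050)
```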
The downside of these methods is that they occasionally produce sounds no human would actually make. The advent of deep learning and artificial intelligence, however, has taken TTS technology to new heights.
Often referred to as neural text-to-speech, AI text-to-speech uses neural networks and machine learning techniques to synthesize speech output from text.
First, the speech engine takes in audio and recognizes the sound waves produced by a human voice.
That information is then converted into linguistic data, a step known as automatic speech recognition (ASR). The engine must then analyze this data to understand the meaning of the words it has collected, which is where natural language processing (NLP) comes in.
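As a sketch of just the ASR step, the snippet below uses the SpeechRecognition library; the filename is a placeholder, and the NLP analysis would then operate on the returned text:

```python
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("clip.wav") as source:  # "clip.wav" is a placeholder recording
    audio = recognizer.record(source)     # capture the sound waves from the file

# The ASR step: convert audio into linguistic data (text),
# here via Google's free web recognizer (requires internet access).
print(recognizer.recognize_google(audio))
```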
Finding training data is the first prerequisite for synthesizing a voice. Without clean recordings, there is no way to train an AI model that captures all the intricate details of a person's speech.
The recording process can take many hours, and a voice-solutions team will provide a comprehensive list of phrases designed to capture every characteristic of a person's voice.
Usually this list runs to no more than 4,000 phrases, but the real goal is to capture as much data about someone's unique voice as possible: the more data captured, the more accurate the clone.
Next, the AI models the speech data.
A neural network takes an ordered sequence of phonemes and converts it into a series of spectrograms. A spectrogram is a visual representation of a signal's frequency spectrum over time.
The network selects spectrograms whose frequency bands best characterize the acoustic features the human brain uses to understand speech. A neural vocoder then converts these spectrograms into speech waveforms that sound natural and realistic.
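This text-to-spectrogram-to-waveform pipeline can be tried with NVIDIA's pretrained Tacotron 2 (text to mel spectrogram) and WaveGlow (neural vocoder) models; the sketch below follows their published PyTorch Hub example and assumes a CUDA GPU plus an internet connection to fetch the weights:

```python
import torch
from scipy.io.wavfile import write

# Pretrained models published on NVIDIA's PyTorch Hub.
tacotron2 = torch.hub.load("NVIDIA/DeepLearningExamples:torchhub", "nvidia_tacotron2", model_math="fp16")
tacotron2 = tacotron2.to("cuda").eval()

waveglow = torch.hub.load("NVIDIA/DeepLearningExamples:torchhub", "nvidia_waveglow", model_math="fp16")
waveglow = waveglow.remove_weightnorm(waveglow)
waveglow = waveglow.to("cuda").eval()

utils = torch.hub.load("NVIDIA/DeepLearningExamples:torchhub", "nvidia_tts_utils")
sequences, lengths = utils.prepare_input_sequence(["The clone sounds just like me."])

with torch.no_grad():
    mel, _, _ = tacotron2.infer(sequences, lengths)  # text -> mel spectrogram
    audio = waveglow.infer(mel)                      # neural vocoder: spectrogram -> waveform

write("audio.wav", 22050, audio[0].data.cpu().numpy())  # these models generate 22.05 kHz audio
```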
In October, a project on GitHub quickly racked up 13k stars.
From just 5 seconds of audio, it can use AI to mimic a voice and generate arbitrary speech content, and it supports Chinese.
Judging from the demonstration video the author uploaded, the result sounds remarkably lifelike.
Key features of MockingBird include:
Supports Mandarin, tested on multiple Chinese datasets: aidatatang_200zh, magicdata, aishell3, biaobei, MozillaCommonVoice, etc.
Built on PyTorch; tested with version 1.9.0 on Tesla T4 and GTX 2060 GPUs
Runs on Windows and Linux (the community has also reported success on Apple M1 machines)
Simply download or train a new synthesizer (the synthesizer works well; the pretrained encoder/vocoder can be reused, or real-time HiFi-GAN used as the vocoder)
Provides a web server for viewing training results and making remote calls
Beyond the nanny-level tutorials and training tips the author shares in a Zhihu column, MockingBird is also very simple to use.
Start by installing PyTorch, ffmpeg, webrtcvad-wheels, and the remaining packages listed in requirements.txt.
The second step is to prepare a pretrained model, either one provided by the author or one trained by others.
The key data-processing step is audio and mel-spectrogram preprocessing: python pre.py <datasets_root>, which takes a --dataset {dataset} parameter and supports aidatatang_200zh, magicdata, and aishell3.
The third step is to launch the web program and debug directly in the browser, or launch the more full-featured toolbox application, as sketched below.
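Pulled together, the flow looks roughly like this; the commands follow the repo's README at the time of writing and may have changed, and <datasets_root> stands in for your dataset directory:

```bash
pip install -r requirements.txt                           # after installing PyTorch, ffmpeg, webrtcvad-wheels
python pre.py <datasets_root> --dataset aidatatang_200zh  # preprocess audio and mel spectrograms
python web.py                                             # launch the web program, then debug in the browser
python demo_toolbox.py -d <datasets_root>                 # or launch the full toolbox
```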
The author has also thoughtfully linked all the relevant papers and original code repositories for further study.
The repository's name, MockingBird, refers to the mockingbird, known in Chinese as the "anti-tongue bird" (反舌鸟) for its ability to imitate the calls of other birds, insects, and amphibians; it is also a bird that appears frequently in Western literature, film, and television.
Incidentally, the common Chinese title of the famous novel To Kill a Mockingbird, "To Kill a Robin" (《杀死一只知更鸟》), is actually a mistranslation: the English name for that bird is "robin," not "mockingbird."
Voice fraud enabled by Deepfake Voice is a serious problem.
In 2019, criminals cloned the voice of the CEO of a UK energy company and swindled $240,000, because the fake CEO sounded utterly convincing in both accent and tone. The incident was the first known cybercrime in Europe to make direct use of artificial intelligence.
Another incident occurred in 2020: a bank manager in the United Arab Emirates took a phone call from someone he believed was a company director and, taken in by the voice scam, mistakenly approved a $35 million transfer.
As the technology has evolved, Deepfake Voice scams have grown more sophisticated, and many people may already have run into fake Deepfake Voice audio on social media.
So, how do you prevent Deepfake Voice fraud?
There are two ways.
The first method is to build a detector that analyzes audio and determines whether it was produced with deepfake technology. Unfortunately, because Deepfake Voice technology keeps evolving, detectors cannot stay reliable forever.
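A crude sketch of what such a detector might look like, assuming you have folders of labeled real and fake clips (the filenames here are hypothetical); real detectors use far richer features and models:

```python
import librosa
import numpy as np
from sklearn.linear_model import LogisticRegression

def mfcc_features(path):
    # Average MFCCs over time as a crude, fixed-length fingerprint of a clip.
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20).mean(axis=1)

# Hypothetical labeled training clips: 0 = real voice, 1 = synthetic voice.
real_clips, fake_clips = ["real_0.wav"], ["fake_0.wav"]
X = np.stack([mfcc_features(p) for p in real_clips + fake_clips])
y = np.array([0] * len(real_clips) + [1] * len(fake_clips))

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([mfcc_features("suspect.wav")]))  # 1 = flagged as synthetic
```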
The second, more practical method is an audio watermark that listeners cannot hear and that cannot be edited out. An audio watermark is essentially a record of when a piece of audio was created, edited, and used, making it easier to tell whether a sound is synthetic.
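As a toy illustration of the idea, one could stamp a creation record into the samples like this; note that real audio watermarks use robust, psychoacoustically hidden signals that survive re-encoding, whereas this naive least-significant-bit version does not:

```python
import numpy as np
import soundfile as sf

def embed_watermark(in_path, out_path, message: bytes):
    """Toy watermark: hide the message bits in the least significant bit of each sample."""
    samples, rate = sf.read(in_path, dtype="int16")
    bits = np.unpackbits(np.frombuffer(message, dtype=np.uint8))
    flat = samples.flatten()
    flat[: len(bits)] = (flat[: len(bits)] & ~1) | bits  # overwrite the LSBs (inaudible)
    sf.write(out_path, flat.reshape(samples.shape), rate, subtype="PCM_16")

embed_watermark("voice.wav", "voice_marked.wav", b"synthetic:2021-11")
```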
Resources:
https://www.axios.com/artificial-intelligence-voice-dubbing-synthetic-14bfb3c6-99db-4406-920d-91b37d00a99a.html
https://www.businesswire.com/news/home/20210514005132/en/Veritone-Launches-MARVEL.ai-a-Complete-End-to-End-Voice-as-a-Service-Solution-to-Create-and-Monetize-Hyper-Realistic-Synthetic-Voice-Content-at-Commercial-Scale
https://www.veritone.com/blog/combining-conversational-ai-and-synthetic-media/
https://www.veritone.com/blog/everything-you-need-to-know-about-deepfake-voice/
https://www.veritone.com/blog/how-ai-companies-are-tackling-deepfake-voice-fraud/
https://www.veritone.com/blog/how-to-create-a-synthetic-voice/
Special thanks to ifanr
https://www.ifanr.com/1454818