In tribute to Shan Tianfang, Himalaya uses AI voice to recreate the voice of the late master

Author: Cover News 2021-10-26 21:37:54

"Hello, dear listeners. Starting today, I will be performing a suspense novel for you, 'The History of the Demise of the Jianghu: The Dark Night of Beiping'. The story takes place in the city of Beiping, a decade or so into the Republic of China era."

September 11 this year marked the third anniversary of the death of Mr. Shan Tianfang, a master of pingshu, the traditional Chinese art of storytelling. Three years ago, countless people lamented that the world would never again hear his "to be continued in the next episode". Now his "voice" is heard once more on Himalaya, "picking up where the last episode left off" and returning to the jianghu.

Recently, with authorization from Beijing Shan Tian Fang Art Communication Co., Ltd., Himalaya used speech synthesis (TTS, text-to-speech) technology to faithfully restore Mr. Shan Tianfang's voice. For the first time, his AI-synthesized voice has been applied to six books of different styles, interpreting works familiar to audiences in the distinctive Shan-style storytelling tone. Shan Ruilin, Mr. Shan Tianfang's son, commented, "When I heard the TTS voice, my heart gave a sudden jolt, as if my father had returned to this world."

Picking up where the story left off

Mr. Shan Tianfang was a famous master of storytelling in China and an inheritor of national intangible cultural heritage. Over a career of more than half a century, he recorded and broadcast more than 100 storytelling works for radio and television, such as "Sui and Tang Dynasties", "Three Heroes and Five Righteousness", "The Tyrant of the Troubled World" and "The Great Hero of the White Eyebrows". They were broadcast on more than 500 radio and television stations across the country, with a total program time of about 6,000 hours. He also compiled 28 traditional storytelling manuscripts in 17 sets.

Shan Tianfang's storytelling has become an important symbol of traditional Chinese culture. His fans are found all over the country, from the elderly to children, and all of them love his performances. There is even a folk saying that "wherever there is well water, people listen to Shan Tianfang". Even today, if you take a taxi in northern China, the driver may still be listening to one of his storytelling programs.

The "Shan Tianfang Voice AI Reproduction Series Album" launched this time covers several genres: a martial arts novel that moves listeners to tears and portrays the sorrows and joys of life through extraordinary people and events, Zhao Chenguang's "The History of the Demise of the Jianghu: The Dark Night of Beiping"; a work of documentary literature that conveys the pulse of the times, Chen Tingyi's "Three Brothers of Mao: Three Brothers and the Foundation of the Republic"; a popular mystery novel with a twisting plot, Zi Jin Chen's "Undocumented Crimes"; and a continuation of a classic that Shan Lao left unfinished, Gong Baiyu's "Twelve Money Darts", among others.

The cooperation between Himalaya and Beijing Shan Tian Fang Art Communication Co., Ltd. has a long history. Himalaya has published more than 80 albums of Mr. Shan Tianfang's storytelling, comprising more than 5,000 audio tracks. These albums are well loved by Himalaya users, and many of them have long sat at the top of Himalaya's crosstalk and storytelling chart. For example, "Gone with the Wind" has been played 2.36 billion times on Himalaya, and "The Man with the White Eyebrows" has been played 1.97 billion times.

To pay tribute to Mr. Shan and carry the culture forward, Himalaya also launched the "Picking Up the Story: New Storytelling Inheritance Plan", hoping to bring more established and young storytellers into the creation of new works and to enrich and pass on storytelling as intangible cultural heritage.

Perfect reproduction

Three years on, listeners can once again hear the iconic "cloud shading the moon" voice thanks to the Himalaya Intelligent Speech Lab's dedicated research and development work on Mr. Shan Tianfang's voice. The lab put great effort into retaining as much as possible of Shan Lao's vigorous, hoarse, distinctive voice and his emotional storytelling tone.

The Himalaya Intelligent Speech Lab has long focused on research and development in speech synthesis, speech recognition, speech signal processing, audio coding and decoding, and intelligent sound effects, and is a core department of Himalaya.

To reproduce Shan Lao's voice and pay tribute to traditional art, the Himalaya Intelligent Speech Lab reproduced not only his vigorous, hoarse "cloud shading the moon" voice but also his emotionally charged, rising and falling intonation. When the AI-synthesized voice, which closely resembles Mr. Shan Tianfang's own in both sound and spirit, tells a story naturally and fluently, the storyteller who could hook listeners the moment he opened his mouth seems to have returned to us.

Himalaya also invited professional sound engineers to add soundtracks and sound effects to each "new Shan-style work", so that listeners get an immersive experience by ear. With the support of senior sound designers, the world Shan Lao describes has become more three-dimensional and vivid.

Compared with general synthesized audio, storytelling contains many scene descriptions and a wide range of emotional expression. Mr. Shan Tianfang in particular was skilled at shaping characters with his voice, and the rhythm of his storytelling varies greatly. It also contains many colloquial pronunciations that differ from standard Mandarin. For example, the word for "this" is pronounced "zhè" in Mandarin but is usually pronounced "zhèi" in storytelling. Relying only on a current mainstream TTS framework for extraction and synthesis would make the feeling and emotion of the synthesized storytelling very flat, without the ups and downs of the original.

To solve this problem, the Himalaya Intelligent Speech Lab designed a separate prosody extraction module and incorporated it into the HiTTS technology framework. This means that however rich and varied the prosody in Mr. Shan Tianfang's storytelling is, it can be extracted and reproduced, so that the AI-synthesized voice sounds like Shan Lao himself. In addition, to handle the pronunciations in Shan Lao's storytelling that differ from standard Mandarin, the team designed an accent module and annotated these special pronunciations, so that the synthesized voice keeps the original flavor.
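The idea of annotating special pronunciations can be illustrated with a small sketch. This is only an assumption about how such a lookup step might look, not Himalaya's accent module: the function name `apply_accent` and the table `STORYTELLING_OVERRIDES` are hypothetical, and the only entry is the "zhè" to "zhèi" example from the article, written here in tone-number pinyin ("zhe4", "zhei4").

```python
from typing import Dict, List

# Hypothetical override table: standard Mandarin syllable -> storytelling variant.
# The single entry is the example quoted in the article (zhè -> zhèi).
STORYTELLING_OVERRIDES: Dict[str, str] = {"zhe4": "zhei4"}


def apply_accent(syllables: List[str], overrides: Dict[str, str]) -> List[str]:
    """Swap each syllable for its annotated variant, leaving the rest unchanged."""
    return [overrides.get(syllable, syllable) for syllable in syllables]


if __name__ == "__main__":
    # "zhe4 ge4 ren2" ("this person") in tone-number pinyin.
    print(apply_accent(["zhe4", "ge4", "ren2"], STORYTELLING_OVERRIDES))
```

A real system would likely need context to decide when the variant applies; a flat table like this one is just the simplest form of the annotation described above.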

In this way, the original "voice" of Shan Tianfang reappeared.

The imagination of sound

The faithful reproduction of Shan Tianfang's "voice" is no accident. Himalaya has worked in the TTS field for many years, and TTS technology will help it extend its existing "UGC + PGC + PUGC" content ecosystem with the possibilities of AIGC.

Dr. Lu Heng of the Himalaya Intelligent Speech Lab said that TTS systems and voice selection tailored to novels are the highlight of Himalaya's TTS. Performing an audio novel with a realistic, natural TTS voice is very difficult. Unlike ordinary text-to-speech, it requires learning the novel's cadence, emotional expression and context, and distinguishing narration from dialogue, before the work can be interpreted well. "Himalaya has a natural advantage in this regard. After years of working in the audio sector, Himalaya has gathered a huge amount of audiobook content and many excellent anchors. The Himalaya Intelligent Speech Lab tries to use a variety of voices to express different emotions, themes and channels, so there is more room for experimentation."

Dr. Lu Heng explained that Himalaya's self-developed TTS front-end text analysis module can already perform polyphonic character recognition, prosody prediction and style classification on text with high accuracy and full automation. The lab has also developed a TTS model that supports multiple emotions, styles and languages. It can interpret text with different emotions, automatically distinguish narration from dialogue, and speak English, which greatly enriches the emotions and rhythms that TTS can express. Himalaya has applied for three patents related to TTS speech synthesis, including a technical framework that lets a TTS voice with no original English data speak English. For example, Himalaya's technology can already make Mr. Shan Tianfang's "voice" speak English.
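To make the narration and dialogue distinction concrete, here is a minimal rule-based sketch. It is a stand-in for illustration only: the article does not describe how Himalaya's module works, and a production system would more plausibly use a trained classifier. The function name `split_narration_dialogue` is hypothetical, and the sketch assumes dialogue is wrapped in straight double quotes.

```python
import re
from typing import List, Tuple

# Assumption: spoken lines are enclosed in straight double quotes.
_QUOTED = re.compile(r'"([^"]*)"')


def split_narration_dialogue(text: str) -> List[Tuple[str, str]]:
    """Split text into ("narration", ...) and ("dialogue", ...) segments in order."""
    segments: List[Tuple[str, str]] = []
    position = 0
    for match in _QUOTED.finditer(text):
        # Everything between the previous quote and this one is narration.
        before = text[position:match.start()].strip()
        if before:
            segments.append(("narration", before))
        spoken = match.group(1).strip()
        if spoken:
            segments.append(("dialogue", spoken))
        position = match.end()
    # Any text after the last quoted span is narration too.
    tail = text[position:].strip()
    if tail:
        segments.append(("narration", tail))
    return segments


if __name__ == "__main__":
    for label, segment in split_narration_dialogue('He said, "Stop!" and left.'):
        print(label, "|", segment)
```

Each labeled segment could then be sent to a different voice or speaking style, which is the kind of downstream use the paragraph above describes.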

At present, Himalaya uses TTS to produce a variety of content, helping creators move into audio and upgrade their offerings. For example, the "Whale Express" album launched by Himalaya and the Beijing News has ranked first on Himalaya's news album chart for many consecutive weeks. For users, the application of TTS technology will bring richer and better content. Himalaya will continue to open up the imagination of sound, letting technology empower sound and letting sound serve life.
