
Tsinghua Big Data Forum Concludes: Zheng Wen, Vice President of AI Technology at Kuaishou, Shares Deep Learning Applications

Author: China Youth Network

On April 27, on the occasion of the 108th anniversary of Tsinghua University, the Tsinghua University Big Data Research Center and the Tsinghua-Kuaishou Future Media Data Joint Research Institute co-hosted the "Tsinghua Big Data Forum: Deep Learning Technology and Application," where Tsinghua teachers, students, and alumni gathered to discuss and share the latest progress in deep learning technology and its applications.

The Tsinghua University-Kuaishou Future Media Data Joint Research Institute was officially established in April 2018. As a university-level research institution of Tsinghua University, the Institute combines Tsinghua's leading research with Kuaishou's years of industry experience to carry out basic and applied research, development, integration, and rapid iteration across many fields, jointly exploring a series of future media topics so that technology can better empower users and connect people more precisely.


Dr. Zheng Wen, a 2001 alumnus of the School of Software, vice president of the Tsinghua-Kuaishou Future Media Data Joint Research Institute, and vice president of AI technology at Kuaishou, gave a talk titled "Application and Prospects of Deep Learning in the Field of Short Video." The following is the core of the talk.

As a short video app with more than 160 million daily active users, Kuaishou's mission is to "use technology to enhance everyone's unique happiness." There are two key words here. One is "everyone," which shows that Kuaishou's values are inclusive; the other is that everyone's happiness is "unique." Serving everyone is difficult to achieve through manual operation alone; it has to be achieved through artificial intelligence, especially the deep learning technology that has made breakthroughs in recent years.

At present, Kuaishou promotes happiness through recording, which is reflected in two aspects. First, users want to see a wider world. Second, users also want to share themselves and be seen by that wider world.

But there is a challenge: Kuaishou has now accumulated more than 8 billion videos and hundreds of millions of users. Faced with these two massive numbers, how can attention be distributed effectively? In the past, attention was generally concentrated on so-called "blockbuster videos," but beneath them lies a large amount of content that can be rich in information and diverse in category. Such "long-tail videos" are often hard for others to notice, so groups with niche needs or more segmented interests often struggle to find the content they want.

This challenge means we must rely on deep-learning-based AI to match and distribute content, rather than doing it manually. Kuaishou has been accumulating AI-related technology from early on, and deep learning is applied extensively in every link from video production to distribution.

Content production

Kuaishou hopes to make recording richer and more interesting through AI. Based on this goal, we have developed a large number of multimedia and AI technologies, such as background segmentation, sky segmentation, hair segmentation, human body keypoint detection, face keypoint detection, and gesture keypoint detection, and applied them to "magic expression" effects.


The distribution of Kuaishou users closely matches that of Chinese Internet users, a large share of whom use low-end phones with limited computing power. To bring advanced technology to as many users as possible, Kuaishou customizes the underlying platform: built on Kuaishou's self-developed YCNN deep learning inference engine and media engine, the technologies above can run efficiently on most phone models, with adaptation and optimization for different models and hardware.
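YCNN itself is proprietary, so as a rough illustration of the general idea of shrinking a trained model so it runs on low-end phones, here is a minimal sketch using TensorFlow Lite post-training quantization; the model path is hypothetical and the actual engine and conversion pipeline at Kuaishou are not described in the talk.

```python
# Illustrative only: Kuaishou's YCNN engine is proprietary. This sketch uses
# TensorFlow Lite post-training quantization to show the general idea of
# shrinking a model for low-end mobile hardware. The SavedModel path is hypothetical.
import tensorflow as tf

def quantize_for_mobile(saved_model_dir: str, output_path: str) -> None:
    """Convert a trained model into a smaller, mobile-friendly TFLite flatbuffer."""
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    # Default optimizations quantize weights, trading a little accuracy
    # for a much smaller binary and faster CPU inference.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()
    with open(output_path, "wb") as f:
        f.write(tflite_model)

if __name__ == "__main__":
    quantize_for_mobile("segmentation_savedmodel", "segmentation_int8.tflite")
```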

Kuaishou also wants to raise content quality further and has developed and applied many image enhancement technologies. For example, when a user shoots in a very low-light environment, the resulting video often loses information and detail, which can be recovered through low-light enhancement.
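The talk does not specify Kuaishou's enhancement method, which is presumably a learned model; as a minimal classical baseline, the sketch below applies CLAHE to the luminance channel with OpenCV so dark regions regain local contrast. File names are hypothetical.

```python
# A minimal classical baseline for low-light enhancement: equalize local contrast
# on the luminance channel. (Kuaishou's actual method is not specified in the talk.)
import cv2

def enhance_low_light(bgr_frame):
    lab = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8))
    l_eq = clahe.apply(l)                                  # boost detail in dark areas
    return cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)

if __name__ == "__main__":
    frame = cv2.imread("dark_frame.jpg")                   # hypothetical input path
    cv2.imwrite("dark_frame_enhanced.jpg", enhance_low_light(frame))
```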

Next are some specific deep learning technologies Kuaishou has recently developed for content production. Three-dimensional face technology can recover 3D facial information from a single face image. On the one hand, this enables modifications to the face, such as relighting, adding expressions, and 3D face-swap effects. On the other hand, from the 3D face information we can extract a person's expression changes and transfer them to a virtual cartoon character. The effect is similar to the Animoji feature on the iPhone X, but the iPhone X has a structured-light camera and Animoji requires substantial computing power; through our own R&D, we can achieve similar functions with ordinary cameras on lower-end phones.
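One common way to realize this kind of expression retargeting (the talk does not name Kuaishou's exact formulation) is to estimate blendshape coefficients from the user's face and apply them to the cartoon character's blendshapes. The numpy sketch below shows that linear combination; all arrays are hypothetical placeholders.

```python
# Sketch of blendshape-based expression retargeting: drive a cartoon mesh from
# expression coefficients estimated by a face tracker. Arrays here are placeholders.
import numpy as np

def retarget_expression(neutral, blendshapes, weights):
    """
    neutral:     (V, 3) cartoon mesh at rest
    blendshapes: (K, V, 3) cartoon mesh posed for each expression unit
    weights:     (K,) coefficients from the user's face, typically in [0, 1]
    Returns the deformed cartoon mesh as neutral + weighted sum of deltas.
    """
    deltas = blendshapes - neutral[None, :, :]             # (K, V, 3)
    return neutral + np.tensordot(weights, deltas, axes=1)

if __name__ == "__main__":
    V, K = 5000, 52                                        # e.g. 52 expression units
    neutral = np.zeros((V, 3))
    blendshapes = np.random.randn(K, V, 3) * 0.01
    weights = np.clip(np.random.rand(K), 0.0, 1.0)         # from a hypothetical tracker
    posed = retarget_expression(neutral, blendshapes, weights)
    print(posed.shape)                                     # (5000, 3)
```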


Returning to the semantic segmentation mentioned earlier: portrait segmentation separates the person from the background, so special effects can be applied to each separately, the background can be replaced, or the portrait can be given a bokeh effect; hair segmentation isolates the hair region for hair-coloring effects; and sky segmentation can make the sky region look more surreal and dreamlike.
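Once a segmentation model has produced a soft portrait mask, background replacement reduces to per-pixel alpha compositing, as in the sketch below. The mask source is left abstract and the file names are hypothetical.

```python
# Background replacement given a soft segmentation mask: alpha-composite the
# person over a new background. The mask is assumed to come from any segmenter.
import cv2
import numpy as np

def replace_background(frame_bgr, new_bg_bgr, mask):
    """mask: float array in [0, 1], 1 = person, same H x W as the frame."""
    new_bg = cv2.resize(new_bg_bgr, (frame_bgr.shape[1], frame_bgr.shape[0]))
    alpha = mask[..., None].astype(np.float32)             # (H, W, 1)
    out = alpha * frame_bgr.astype(np.float32) + (1.0 - alpha) * new_bg.astype(np.float32)
    return out.astype(np.uint8)

if __name__ == "__main__":
    frame = cv2.imread("selfie.jpg")                       # hypothetical inputs
    beach = cv2.imread("beach.jpg")
    mask = cv2.imread("portrait_mask.png", cv2.IMREAD_GRAYSCALE) / 255.0
    cv2.imwrite("composited.jpg", replace_background(frame, beach, mask))
```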

Human pose estimation predicts the positions of a person's joints. With this technology we can add special effects to the limbs or modify body shape, for example a body-slimming feature. In addition, we can reconstruct the body's three-dimensional information to drive cartoon characters.
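Kuaishou's pose model is in-house; as a publicly available stand-in, the sketch below uses MediaPipe Pose to show what per-joint keypoints look like before any limb effect or body reshaping is applied. The image path is hypothetical.

```python
# Stand-in example: detect body joint keypoints with MediaPipe Pose.
# (Kuaishou's own detector is not publicly available.)
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

def detect_keypoints(image_path):
    image = cv2.imread(image_path)
    with mp_pose.Pose(static_image_mode=True) as pose:
        results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if not results.pose_landmarks:
        return []
    h, w = image.shape[:2]
    # Each landmark is a normalized (x, y) plus a visibility score.
    return [(int(lm.x * w), int(lm.y * h), lm.visibility)
            for lm in results.pose_landmarks.landmark]

if __name__ == "__main__":
    for joint in detect_keypoints("dancer.jpg"):           # hypothetical image
        print(joint)
```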

Gesture detection recognizes a variety of specific hand shapes to enable gameplay such as "controlling the rain." There is also AR camera pose estimation, behind which is Kuaishou's self-developed 3D engine; on top of it we added editor, rendering, limb, and sound modules to achieve refined, natural lighting and materials for the models.

In audio and video processing we apply many intelligent algorithms. For example, video should be as clear as possible while still streaming smoothly, which requires adaptive optimization for the complexity of the video. We also analyze the image: the face region usually has the greatest impact on perceived quality, so we detect it and allocate a higher bit rate there, which greatly improves the overall look and feel.
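One common way to "spend more bits on the face" (the talk does not specify Kuaishou's encoder integration) is to build a per-macroblock quantizer-offset map from detected face boxes, which an ROI-aware encoder can then consume. A small numpy sketch:

```python
# Build a per-macroblock QP-offset map that lowers quantization (raises quality)
# inside detected face boxes. How the map is passed to the encoder is omitted.
import numpy as np

def face_qp_offsets(frame_h, frame_w, face_boxes, block=16, face_delta=-6):
    """Return an (H/block, W/block) int map; negative values mean finer quantization."""
    rows, cols = frame_h // block, frame_w // block
    qp_map = np.zeros((rows, cols), dtype=np.int8)
    for (x, y, w, h) in face_boxes:                        # pixel coordinates
        r0, r1 = y // block, min(rows, (y + h) // block + 1)
        c0, c1 = x // block, min(cols, (x + w) // block + 1)
        qp_map[r0:r1, c0:c1] = face_delta                  # boost quality on faces
    return qp_map

if __name__ == "__main__":
    qp = face_qp_offsets(720, 1280, [(500, 200, 160, 200)])
    print(qp.shape, int(qp.min()))                         # (45, 80) -6
```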

We also check image quality. Some factors in the production process lower image quality, such as shooting out of focus, a lens that has not been wiped for a long time, or blocky artifacts after a video has been uploaded and compressed multiple times. We detect these problems with AI algorithms: on the one hand we remind users to pay attention to them when shooting, and on the other hand we tilt recommendations toward higher-quality videos.
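Kuaishou's quality detectors are presumably learned models; a classical proxy for "out of focus or smudged lens" is the variance of the Laplacian on a grayscale frame, which at least illustrates the kind of sharpness signal being measured. The threshold and file name below are guesses.

```python
# Classical sharpness proxy: variance of the Laplacian. Low values suggest a
# blurry or smudged-lens frame. Threshold and input path are illustrative guesses.
import cv2

def sharpness_score(frame_bgr) -> float:
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

if __name__ == "__main__":
    frame = cv2.imread("upload_frame.jpg")
    score = sharpness_score(frame)
    print("likely blurry" if score < 100.0 else "acceptable", score)
```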

Content understanding

After content production is completed, the video is uploaded to the back-end servers, where we need a deeper understanding of the video content. Content understanding is used in many areas, such as content safety, originality protection, recommendation, search, and advertising, and is roughly divided into two stages.


The first is the perception stage, where the machine understands the video along four dimensions: face, image, music, and speech.

The face is a very important dimension, because it often carries what viewers care about most. We detect the face region and identify age, gender, expression, and so on.

Another dimension is the image level, where we classify the image (for example, what scene it shows), detect what objects appear in it, evaluate image quality, and extract text from it using OCR.

Music is an important part of a video's appeal. We can identify the type of music in a video and even analyze its structure, separating the accompaniment from the vocals.

Speech is also a very important dimension of video. The information a video conveys often cannot be fully obtained from the images alone, so speech matters: we recognize speech as text, and we can also infer identity, age, gender, and so on from the voice.

The second stage is the reasoning stage, where we fuse information from the different modalities to infer higher-level semantic information or recognize the emotion of the video. We also use knowledge graph technology to store the knowledge in videos and express it as a knowledge graph; reasoning over the graph yields higher-level, deeper information.
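A minimal way to realize this multi-modal fusion (not Kuaishou's actual architecture, which the talk does not detail) is late fusion: concatenate per-modality embeddings for face, image, music, and speech, then classify. The PyTorch sketch below uses hypothetical embedding sizes and a simple classifier head.

```python
# Minimal late-fusion sketch: concatenate per-modality embeddings and classify.
# Dimensions and the head are hypothetical, for illustration only.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, dims=(128, 512, 128, 256), hidden=256, num_labels=50):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(sum(dims), hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_labels),
        )

    def forward(self, face, image, music, speech):
        fused = torch.cat([face, image, music, speech], dim=-1)
        return self.head(fused)                            # logits over video labels

if __name__ == "__main__":
    model = LateFusionClassifier()
    batch = [torch.randn(4, d) for d in (128, 512, 128, 256)]
    print(model(*batch).shape)                             # torch.Size([4, 50])
```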

In content understanding we have also developed some more specific technologies. For example, Kuaishou has built a video tagging system that can classify most of the content and scenes appearing in videos. In Kuaishou's speech recognition module, we combine deep learning with a context module, which greatly improves recognition accuracy.


On the one hand we need to understand the video content; on the other we also need to understand the user, including publicly available attributes such as age and gender, as well as behavioral data generated in real time as the user uses Kuaishou. This data is fed to back-end deep learning models to train vectors that represent the user. With these vectors, we can predict a user's interests and their relationships with other users.
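"Training vectors for user understanding" is commonly realized as embeddings learned from interaction logs; a common sketch (not Kuaishou's actual model) scores user-video pairs with a dot product of learned embeddings, as below. Sizes and training details are hypothetical.

```python
# Two-tower-style sketch: learn user and video embeddings and score pairs with a
# dot product. Sizes are placeholders; the real system and training are not shown.
import torch
import torch.nn as nn

class UserVideoMatcher(nn.Module):
    def __init__(self, num_users, num_videos, dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, dim)
        self.video_emb = nn.Embedding(num_videos, dim)

    def forward(self, user_ids, video_ids):
        u = self.user_emb(user_ids)                        # (B, dim)
        v = self.video_emb(video_ids)                      # (B, dim)
        return (u * v).sum(dim=-1)                         # interest score per pair

if __name__ == "__main__":
    model = UserVideoMatcher(num_users=1000, num_videos=5000)
    scores = model(torch.tensor([1, 2, 3]), torch.tensor([10, 20, 30]))
    print(scores.shape)                                    # torch.Size([3])
```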


Finally, we have a description of the user and an understanding of the video. Matching users against videos produces trillion-scale feature data, which the real-time online recommendation system uses to predict what kind of video a user will be interested in. We also rank content within the community: as mentioned earlier regarding attention allocation, we hope the gap in attention distribution is not too large, so we adjust the distribution of video content according to the Gini coefficient. Factors such as content safety, diversity, and originality protection are also taken into account.
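The Gini coefficient mentioned here is a standard inequality measure; computed over per-video view counts (the exact quantity Kuaishou measures is not specified), it is 0 when attention is spread evenly and approaches 1 when a few blockbusters absorb everything. A small numpy version:

```python
# Standard Gini coefficient over per-video attention (e.g. view counts).
# 0 = perfectly even distribution, values near 1 = attention concentrated on few videos.
import numpy as np

def gini(values) -> float:
    x = np.sort(np.asarray(values, dtype=np.float64))      # ascending order
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float((2 * ranks - n - 1).dot(x) / (n * x.sum()))

if __name__ == "__main__":
    blockbuster_heavy = [1, 1, 1, 1, 1000]                 # attention concentrated
    evenly_spread = [200, 201, 199, 200, 200]
    print(round(gini(blockbuster_heavy), 3), round(gini(evenly_spread), 3))
```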

Finally, we hope to further deepen cooperation with university teachers and students, make full use of Kuaishou's massive data and computing power, jointly advance deep learning technology, explore more possibilities for the future, and enhance public happiness. This is also the vision behind establishing the Tsinghua University-Kuaishou Future Media Data Joint Research Institute. Thank you.
