
Tsinghua Big Data Forum: Zheng Wen, vice president of AI technology of Kuaishou, shares deep learning applications

Author: China News Network

On April 27, on the occasion of Tsinghua University's 108th anniversary, the Tsinghua University Big Data Research Center and the Tsinghua-Kuaishou Future Media Data Joint Research Institute co-hosted the "Tsinghua Big Data Forum - Deep Learning Technology and Application," where teachers, students, and alumni of Tsinghua University gathered to discuss and share the latest progress in deep learning technology and its applications.


Dr. Zheng Wen, Vice President of AI Technology at Kuaishou, delivering the keynote speech

It is reported that the Tsinghua University-Kuaishou Future Media Data Joint Research Institute was officially established in April 2018. As a university-level research institution of Tsinghua University, the Institute draws on Tsinghua's leading technology and Kuaishou's years of industry experience to carry out basic and applied research, development, integration, and rapid iteration across many fields, jointly exploring a series of future media topics so that technology can better empower users and connect people more accurately.

Dr. Zheng Wen, a 2001 alumnus of the School of Software, vice president of the Tsinghua-Kuaishou Future Media Data Joint Research Institute, and vice president of AI technology at Kuaishou, spoke on "Application and Prospect of Deep Learning in the Field of Short Video."

Zheng Wen said that as a short video app with more than 160 million daily active users, Kuaishou's mission is to "use technology to enhance everyone's unique happiness." The mission contains two key words. One is "everyone," which shows that Kuaishou's values are highly inclusive; at the same time, it emphasizes that each person's happiness is "unique." Serving everyone is difficult to achieve through manual operations alone; it requires artificial intelligence, especially the deep learning technology that has made breakthroughs in recent years.

Zheng Wen said that at present Kuaishou enhances happiness through recording, which is reflected in two aspects. First, users want to see a wider world. Second, users also have a need to share themselves and be seen by that wider world.

But there is a challenge: Kuaishou has now accumulated more than 8 billion videos and hundreds of millions of users. Faced with these two massive numbers, how can attention be distributed effectively? In the past, attention generally concentrated on so-called "blockbuster" videos, but beneath them lies a large amount of content with rich information and diverse categories. Such "long-tail" videos are rarely noticed, so groups with niche needs or more segmented interests often struggle to find the content they want.


This challenge means that content matching and distribution must rely on deep learning-based AI technology rather than manual work. Kuaishou invested in AI-related technologies early on, and deep learning is applied extensively in every link from video production to distribution.

Content production

Zheng Wen said that Kuaishou hopes to make recording richer and more interesting through AI. Toward this goal, it has developed a large number of multimedia and AI technologies, such as background segmentation, sky segmentation, hair segmentation, human-body key points, face key points, and gesture key-point detection, and applied them in "magic expression" effects.

The distribution of Kuaishou users closely matches that of Chinese Internet users as a whole, and a large share of Chinese Internet users are on low-end phones with limited computing power. To bring advanced technology to as many users as possible, Kuaishou customized the underlying platform around its self-developed YCNN deep learning inference engine and media engine, so that the technologies above run efficiently on most models, with adaptations and optimizations for different phones and hardware.

Zheng Wen revealed that Kuaishou also wants content quality to be higher and has developed and applied a range of image enhancement technologies. For example, when a user shoots in a very low-light environment, the resulting video often loses information and detail, which low-light enhancement technology can recover.

Next came some specific deep learning technologies Kuaishou has recently developed for content production. Three-dimensional face technology can recover the 3D structure of a face from a single image. On the one hand, this enables modifications to the face, such as relighting, adding expressions, and 3D face-swap effects. On the other hand, the 3D information makes it possible to extract changes in a person's expression and transfer them to a virtual cartoon character, similar to the Animoji feature on the iPhone X. But the iPhone X has a structured-light camera, and running Animoji requires very strong computing power; through its own R&D, Kuaishou can achieve similar functions with ordinary cameras on lower-end phones.

Zheng Wen said that portrait segmentation technology can separate the subject from the background, apply effects to either, replace the background, or add background bokeh. Hair segmentation delineates the hair region for hair-coloring effects, and sky segmentation can make the sky look more surreal and dreamlike. Human pose estimation predicts the positions of a person's joints; with it, special effects can be attached to the limbs, or the body shape can be modified for a slimming feature. It can also reconstruct the body's 3D structure and use it to drive a cartoon character.
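The compositing step behind background replacement can be sketched as simple alpha blending with a soft segmentation mask (a minimal illustration, not Kuaishou's implementation; in practice the mask would come from a segmentation model):

```python
import numpy as np

def replace_background(frame, background, mask):
    """Composite a frame over a new background using a soft
    segmentation mask with values in [0, 1] (1 = person)."""
    alpha = mask[..., None]  # broadcast the mask over the RGB channels
    return (alpha * frame + (1.0 - alpha) * background).astype(frame.dtype)

# Toy 2x2 RGB example: the left column is "person", the right is background.
frame = np.full((2, 2, 3), 200, dtype=np.uint8)
background = np.zeros((2, 2, 3), dtype=np.uint8)
mask = np.array([[1.0, 0.0], [1.0, 0.0]])
out = replace_background(frame, background, mask)
```

With a soft (fractional) mask the same code blends person and background smoothly at the boundary, which is what makes segmentation-based effects look natural.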

Gesture detection recognizes a variety of specific hand shapes to enable gameplay such as "rain control." There is also AR camera pose estimation, backed by Kuaishou's self-developed 3D engine, on top of which editor, rendering, limb, and sound modules are built to give models refined, natural lighting and materials.

Many intelligent algorithms are also applied to audio and video: video needs to be as clear as possible while still streaming smoothly, which requires adaptive optimization based on video complexity. The image is analyzed as well; for example, the face region usually affects perceived quality the most, so it is detected and given a higher bit rate, which noticeably improves the overall look.
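The face-weighted bit allocation can be sketched as a per-block quality map (the grid size and QP offsets below are made up for illustration; real encoders express this as quantization-parameter offsets per macroblock):

```python
import numpy as np

# Hypothetical 4x4 grid of macroblocks; True marks blocks covered by a face.
face_mask = np.zeros((4, 4), dtype=bool)
face_mask[1:3, 1:3] = True  # detected face region in the center

base_qp = 30  # lower QP = more bits spent = higher quality
qp_map = np.where(face_mask, base_qp - 4, base_qp + 2)
```

Face blocks get a lower QP (more bits), non-face blocks a slightly higher one, so the total bit budget stays roughly constant while perceived quality improves.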

Image quality is also checked, for example for factors in the production process that degrade it: shooting out of focus, a lens that has not been wiped for a long time, or blocky artifacts after a video has been uploaded and compressed multiple times. AI algorithms detect these problems, both to remind users to pay attention to them when shooting and to favor high-quality videos when making recommendations.
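One classic heuristic for the out-of-focus check is the variance of the Laplacian response: blurry images have little high-frequency detail, so the variance is low. This is a sketch of the general idea only; the article does not say which detector Kuaishou actually uses:

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of the 3x3 Laplacian response -- a classic sharpness
    heuristic: low variance suggests a blurry / out-of-focus image."""
    lap = (-4.0 * gray[1:-1, 1:-1]
           + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return float(lap.var())

def box_blur(img):
    """3x3 box filter via shifted slices (crops a 1-pixel border)."""
    h, w = img.shape[0] - 2, img.shape[1] - 2
    return sum(img[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

rng = np.random.default_rng(0)
sharp = rng.random((64, 64))   # high-frequency "texture"
blurry = box_blur(sharp)       # smoothed version of the same image
```

Thresholding the score then separates usable frames from defocused ones.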

Content understanding

According to Zheng Wen, once production is complete, the video is uploaded to back-end servers, where a deeper understanding of its content is required. Video content understanding serves many purposes, such as content safety, originality protection, recommendation, search, and advertising, and it roughly falls into two stages.

The first is the perception stage, where the machine understands the video along four dimensions: face, image, music, and speech.

The face is a very important dimension, because it often carries the part of a person that viewers care about most; the face region is detected to identify age, gender, expression, and so on.

Another dimension is the image level: classifying the image, such as identifying the scene, detecting what objects appear, evaluating image quality, and extracting text with OCR.

Music is an important part of a video's appeal. The type of music can be identified from the video, and the music can even be analyzed structurally, separating the accompaniment from the vocals.

Speech is also a very important dimension of video. The images alone often fail to convey all the information in a video, so speech matters: it is recognized and transcribed into text, and the voice can also be used to identify the speaker's identity, age, gender, and so on.

The second stage is the reasoning stage, where information from the different dimensions is fused across modalities to infer higher-level semantic information or to recognize the emotion of the video. Knowledge graph technology is also used to store the knowledge in videos and express it as a graph; reasoning over the knowledge graph yields higher-level, deeper information.
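The fusion step can be sketched as late fusion: concatenate the per-modality embeddings and score them with a linear classifier. All the dimensions, the class count, and the random weights below are made up for illustration; the article does not describe the actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality embeddings for one video (dimensions invented).
face_vec   = rng.standard_normal(64)
image_vec  = rng.standard_normal(128)
music_vec  = rng.standard_normal(32)
speech_vec = rng.standard_normal(64)

# Late fusion by concatenation, followed by a linear scoring layer.
fused = np.concatenate([face_vec, image_vec, music_vec, speech_vec])  # 288-d
W = rng.standard_normal((10, fused.size))  # 10 hypothetical content classes
logits = W @ fused

# Softmax over the class scores.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
predicted_class = int(np.argmax(probs))
```

In a trained system `W` would be learned, and the fused vector could feed emotion recognition or tagging heads in the same way.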

More specific content understanding technologies have also been built. For example, Kuaishou has developed a video tagging system that can classify most of the content and scenes appearing in a video. Its speech recognition module uses deep learning algorithms combined with a context module, which greatly improves recognition accuracy.

Understanding the video is one side; understanding the user is the other, including information the user discloses, such as age and gender, and behavioral data generated in real time while using Kuaishou. This data feeds back-end deep learning models that train vectors representing the user. With these vectors, it is possible to predict the user's interests and their relationships with other users.
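A minimal sketch of interest prediction with such vectors, assuming a dot-product match between user and video embeddings (the actual model is not described in the article; the embeddings here are random stand-ins for learned ones):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical learned embeddings: one row per user, one per video.
user_emb  = rng.standard_normal((3, 16))   # 3 users, 16-d vectors
video_emb = rng.standard_normal((5, 16))   # 5 candidate videos

# Predicted interest = sigmoid of the user-video dot product.
scores = 1.0 / (1.0 + np.exp(-(user_emb @ video_emb.T)))  # shape (3, 5)

# Rank the candidates for user 0 by predicted interest, best first.
ranking_for_user0 = np.argsort(-scores[0])
```

The same dot-product trick between two user vectors gives a user-user similarity, which is one way to model relationships between users.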

Finally, matching the user profile against the video understanding produces trillion-scale feature data, which the real-time online recommendation system uses to predict which videos a user will find interesting. The content in the community is also ranked: as mentioned earlier with attention allocation, Kuaishou hopes the gap in attention distribution will not be too large, so the distribution of video content is adjusted according to the Gini coefficient. Factors such as content safety, diversity, and originality protection are also taken into account.
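The Gini coefficient mentioned here measures how unevenly attention (for example, video views) is distributed: 0 means perfectly even, and values approaching 1 mean a few blockbusters dominate. A minimal sketch of the computation (the view counts are invented):

```python
import numpy as np

def gini(x):
    """Gini coefficient of a non-negative distribution
    (0 = perfectly even, -> 1 = highly concentrated)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    cum = np.cumsum(x)
    # Standard formula via the normalized cumulative sum (Lorenz curve).
    return (n + 1 - 2 * (cum / cum[-1]).sum()) / n

even   = [100, 100, 100, 100]   # attention spread evenly across 4 videos
skewed = [1, 1, 1, 997]         # one "blockbuster" takes almost everything
```

A distribution system can monitor this number and shift exposure toward long-tail videos whenever it drifts too high.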

Zheng Wen said that he hopes to further deepen cooperation with university teachers and students, making full use of Kuaishou's massive data and powerful computing to advance deep learning technology together, explore more possibilities for the future, and enhance public happiness, which is also the vision behind the founding of the Tsinghua University-Kuaishou Future Media Data Joint Research Institute.
