
Tsinghua Big Data Forum Concludes: Zheng Wen, Vice President of AI Technology at Kuaishou, Shares Deep Learning Applications

Author: China Youth Network

On April 27, on the occasion of the 108th anniversary of Tsinghua University, the Tsinghua University Big Data Research Center and the Tsinghua-Kuaishou Future Media Data Joint Research Institute co-hosted the "Tsinghua Big Data Forum: Deep Learning Technology and Application," where Tsinghua teachers, students, and alumni gathered to discuss and share the latest progress in deep learning technology and its applications.

The Tsinghua University-Kuaishou Future Media Data Joint Research Institute was officially established in April 2018. As a university-level research institution of Tsinghua University, the Institute combines Tsinghua's leading research with Kuaishou's years of industry experience to carry out basic and applied research, development, integration, and rapid iteration across many fields, jointly exploring a series of future media topics so that technology can better empower users and connect people more precisely.


Dr. Zheng Wen, a 2001 alumnus of the School of Software, vice president of the Tsinghua-Kuaishou Future Media Data Joint Research Institute, and vice president of AI technology at Kuaishou, gave a talk titled "Application and Prospects of Deep Learning in the Field of Short Video." The following is the core of the talk.

As a short video app with more than 160 million daily active users, Kuaishou's mission is to "use technology to enhance everyone's unique happiness." There are two key words here. One is "everyone," which shows that Kuaishou's values are inclusive; the other is that everyone's happiness is "unique." Serving everyone is difficult to achieve through manual operation alone; it has to be achieved through artificial intelligence, especially the deep learning technology that has made breakthroughs in recent years.

At present, Kuaishou promotes happiness through recording, which is reflected in two aspects. First, users want to see a wider world. Second, users also want to share themselves and be seen by that wider world.

But there is a challenge: Kuaishou has now accumulated more than 8 billion videos and hundreds of millions of users. Faced with these two massive numbers, how can attention be distributed effectively? In the past, attention was generally concentrated on so-called "blockbuster videos," but beneath them lies a large amount of content that can be rich in information and diverse in category. Such "long-tail videos" are often hard for others to notice, so groups with niche needs or more segmented interests often struggle to find the content they want.

This challenge means we must rely on deep-learning-based AI to match and distribute content, rather than doing it manually. Kuaishou has been accumulating AI-related technology from early on, and deep learning is applied extensively in every link from video production to distribution.

Content production

Kuaishou hopes to make recording richer and more interesting through AI. Based on this goal, we have developed a large number of multimedia and AI technologies, such as background segmentation, sky segmentation, hair segmentation, human body keypoint detection, face keypoint detection, and gesture keypoint detection, and applied them to "magic expression" effects.


The distribution of Kuaishou users closely matches that of Chinese Internet users, a large share of whom use low-end phones with limited computing power. To bring advanced technology to as many users as possible, Kuaishou customizes the underlying platform: built on Kuaishou's self-developed YCNN deep learning inference engine and media engine, the technologies above can run efficiently on most phone models, with adaptation and optimization for different models and hardware.
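YCNN itself is proprietary, so as a rough illustration of the general idea of shrinking a trained model so it runs on low-end phones, here is a minimal sketch using TensorFlow Lite post-training quantization; the model path is hypothetical and the actual engine and conversion pipeline at Kuaishou are not described in the talk.

```python
# Illustrative only: Kuaishou's YCNN engine is proprietary. This sketch uses
# TensorFlow Lite post-training quantization to show the general idea of
# shrinking a model for low-end mobile hardware. The SavedModel path is hypothetical.
import tensorflow as tf

def quantize_for_mobile(saved_model_dir: str, output_path: str) -> None:
    """Convert a trained model into a smaller, mobile-friendly TFLite flatbuffer."""
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    # Default optimizations quantize weights, trading a little accuracy
    # for a much smaller binary and faster CPU inference.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()
    with open(output_path, "wb") as f:
        f.write(tflite_model)

if __name__ == "__main__":
    quantize_for_mobile("segmentation_savedmodel", "segmentation_int8.tflite")
```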

Kuaishou also wants to raise content quality further and has developed and applied many image enhancement technologies. For example, when a user shoots in a very low-light environment, the resulting video often loses information and detail, which can be recovered through low-light enhancement.
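The talk does not specify Kuaishou's enhancement method, which is presumably a learned model; as a minimal classical baseline, the sketch below applies CLAHE to the luminance channel with OpenCV so dark regions regain local contrast. File names are hypothetical.

```python
# A minimal classical baseline for low-light enhancement: equalize local contrast
# on the luminance channel. (Kuaishou's actual method is not specified in the talk.)
import cv2

def enhance_low_light(bgr_frame):
    lab = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8))
    l_eq = clahe.apply(l)                                  # boost detail in dark areas
    return cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)

if __name__ == "__main__":
    frame = cv2.imread("dark_frame.jpg")                   # hypothetical input path
    cv2.imwrite("dark_frame_enhanced.jpg", enhance_low_light(frame))
```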

Next are some specific deep learning technologies Kuaishou has recently developed for content production. Three-dimensional face technology can recover 3D facial information from a single face image. On the one hand, this enables modifications to the face, such as relighting, adding expressions, and 3D face-swap effects. On the other hand, from the 3D face information we can extract a person's expression changes and transfer them to a virtual cartoon character. The effect is similar to the Animoji feature on the iPhone X, but the iPhone X has a structured-light camera and Animoji requires substantial computing power; through our own R&D, we can achieve similar functions with ordinary cameras on lower-end phones.
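One common way to realize this kind of expression retargeting (the talk does not name Kuaishou's exact formulation) is to estimate blendshape coefficients from the user's face and apply them to the cartoon character's blendshapes. The numpy sketch below shows that linear combination; all arrays are hypothetical placeholders.

```python
# Sketch of blendshape-based expression retargeting: drive a cartoon mesh from
# expression coefficients estimated by a face tracker. Arrays here are placeholders.
import numpy as np

def retarget_expression(neutral, blendshapes, weights):
    """
    neutral:     (V, 3) cartoon mesh at rest
    blendshapes: (K, V, 3) cartoon mesh posed for each expression unit
    weights:     (K,) coefficients from the user's face, typically in [0, 1]
    Returns the deformed cartoon mesh as neutral + weighted sum of deltas.
    """
    deltas = blendshapes - neutral[None, :, :]             # (K, V, 3)
    return neutral + np.tensordot(weights, deltas, axes=1)

if __name__ == "__main__":
    V, K = 5000, 52                                        # e.g. 52 expression units
    neutral = np.zeros((V, 3))
    blendshapes = np.random.randn(K, V, 3) * 0.01
    weights = np.clip(np.random.rand(K), 0.0, 1.0)         # from a hypothetical tracker
    posed = retarget_expression(neutral, blendshapes, weights)
    print(posed.shape)                                     # (5000, 3)
```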


Returning to the semantic segmentation mentioned earlier: portrait segmentation separates the person from the background, so special effects can be applied to each separately, the background can be replaced, or the portrait can be given a bokeh effect; hair segmentation isolates the hair region for hair-coloring effects; and sky segmentation can make the sky region look more surreal and dreamlike.
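Once a segmentation model has produced a soft portrait mask, background replacement reduces to per-pixel alpha compositing, as in the sketch below. The mask source is left abstract and the file names are hypothetical.

```python
# Background replacement given a soft segmentation mask: alpha-composite the
# person over a new background. The mask is assumed to come from any segmenter.
import cv2
import numpy as np

def replace_background(frame_bgr, new_bg_bgr, mask):
    """mask: float array in [0, 1], 1 = person, same H x W as the frame."""
    new_bg = cv2.resize(new_bg_bgr, (frame_bgr.shape[1], frame_bgr.shape[0]))
    alpha = mask[..., None].astype(np.float32)             # (H, W, 1)
    out = alpha * frame_bgr.astype(np.float32) + (1.0 - alpha) * new_bg.astype(np.float32)
    return out.astype(np.uint8)

if __name__ == "__main__":
    frame = cv2.imread("selfie.jpg")                       # hypothetical inputs
    beach = cv2.imread("beach.jpg")
    mask = cv2.imread("portrait_mask.png", cv2.IMREAD_GRAYSCALE) / 255.0
    cv2.imwrite("composited.jpg", replace_background(frame, beach, mask))
```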

Human pose estimation predicts the positions of a person's joints. With this technology we can add special effects to the limbs or modify body shape, for example a body-slimming feature. In addition, we can reconstruct the body's three-dimensional information to drive cartoon characters.
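Kuaishou's pose model is in-house; as a publicly available stand-in, the sketch below uses MediaPipe Pose to show what per-joint keypoints look like before any limb effect or body reshaping is applied. The image path is hypothetical.

```python
# Stand-in example: detect body joint keypoints with MediaPipe Pose.
# (Kuaishou's own detector is not publicly available.)
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

def detect_keypoints(image_path):
    image = cv2.imread(image_path)
    with mp_pose.Pose(static_image_mode=True) as pose:
        results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if not results.pose_landmarks:
        return []
    h, w = image.shape[:2]
    # Each landmark is a normalized (x, y) plus a visibility score.
    return [(int(lm.x * w), int(lm.y * h), lm.visibility)
            for lm in results.pose_landmarks.landmark]

if __name__ == "__main__":
    for joint in detect_keypoints("dancer.jpg"):           # hypothetical image
        print(joint)
```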

Gesture detection recognizes a variety of specific hand shapes to enable gameplay such as "controlling the rain." There is also AR camera pose estimation, behind which is Kuaishou's self-developed 3D engine; on top of it we added editor, rendering, limb, and sound modules to achieve refined, natural lighting and materials for the models.

In audio and video processing we apply many intelligent algorithms. For example, video should be as clear as possible while still streaming smoothly, which requires adaptive optimization for the complexity of the video. We also analyze the image: the face region usually has the greatest impact on perceived quality, so we detect it and allocate a higher bit rate there, which greatly improves the overall look and feel.
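One common way to "spend more bits on the face" (the talk does not specify Kuaishou's encoder integration) is to build a per-macroblock quantizer-offset map from detected face boxes, which an ROI-aware encoder can then consume. A small numpy sketch:

```python
# Build a per-macroblock QP-offset map that lowers quantization (raises quality)
# inside detected face boxes. How the map is passed to the encoder is omitted.
import numpy as np

def face_qp_offsets(frame_h, frame_w, face_boxes, block=16, face_delta=-6):
    """Return an (H/block, W/block) int map; negative values mean finer quantization."""
    rows, cols = frame_h // block, frame_w // block
    qp_map = np.zeros((rows, cols), dtype=np.int8)
    for (x, y, w, h) in face_boxes:                        # pixel coordinates
        r0, r1 = y // block, min(rows, (y + h) // block + 1)
        c0, c1 = x // block, min(cols, (x + w) // block + 1)
        qp_map[r0:r1, c0:c1] = face_delta                  # boost quality on faces
    return qp_map

if __name__ == "__main__":
    qp = face_qp_offsets(720, 1280, [(500, 200, 160, 200)])
    print(qp.shape, int(qp.min()))                         # (45, 80) -6
```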

We also check image quality. Some factors in the production process lower image quality, such as shooting out of focus, a lens that has not been wiped for a long time, or blocky artifacts after a video has been uploaded and compressed multiple times. We detect these problems with AI algorithms: on the one hand we remind users to pay attention to them when shooting, and on the other hand we tilt recommendations toward higher-quality videos.
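Kuaishou's quality detectors are presumably learned models; a classical proxy for "out of focus or smudged lens" is the variance of the Laplacian on a grayscale frame, which at least illustrates the kind of sharpness signal being measured. The threshold and file name below are guesses.

```python
# Classical sharpness proxy: variance of the Laplacian. Low values suggest a
# blurry or smudged-lens frame. Threshold and input path are illustrative guesses.
import cv2

def sharpness_score(frame_bgr) -> float:
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

if __name__ == "__main__":
    frame = cv2.imread("upload_frame.jpg")
    score = sharpness_score(frame)
    print("likely blurry" if score < 100.0 else "acceptable", score)
```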

Content understanding

After content production is completed, the video is uploaded to the back-end servers, where we need a deeper understanding of the video content. Content understanding is used in many areas, such as content safety, originality protection, recommendation, search, and advertising, and is roughly divided into two stages.


The first is the perception stage, where the machine understands the video along four dimensions: face, image, music, and speech.

The face is a very important dimension, because it often carries what viewers care about most. We detect the face region and identify age, gender, expression, and so on.

Another dimension is the image level, where we classify the image (for example, what scene it shows), detect what objects appear in it, evaluate image quality, and extract text from it using OCR.

Music is an important part of a video's appeal. We can identify the type of music in a video and even analyze its structure, separating the accompaniment from the vocals.

Speech is also a very important dimension of video. The information a video conveys often cannot be fully obtained from the images alone, so speech matters: we recognize speech as text, and we can also infer identity, age, gender, and so on from the voice.

The second stage is the reasoning stage, where we fuse information from the different modalities to infer higher-level semantic information or recognize the emotion of the video. We also use knowledge graph technology to store the knowledge in videos and express it as a knowledge graph; reasoning over the graph yields higher-level, deeper information.
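A minimal way to realize this multi-modal fusion (not Kuaishou's actual architecture, which the talk does not detail) is late fusion: concatenate per-modality embeddings for face, image, music, and speech, then classify. The PyTorch sketch below uses hypothetical embedding sizes and a simple classifier head.

```python
# Minimal late-fusion sketch: concatenate per-modality embeddings and classify.
# Dimensions and the head are hypothetical, for illustration only.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, dims=(128, 512, 128, 256), hidden=256, num_labels=50):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(sum(dims), hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_labels),
        )

    def forward(self, face, image, music, speech):
        fused = torch.cat([face, image, music, speech], dim=-1)
        return self.head(fused)                            # logits over video labels

if __name__ == "__main__":
    model = LateFusionClassifier()
    batch = [torch.randn(4, d) for d in (128, 512, 128, 256)]
    print(model(*batch).shape)                             # torch.Size([4, 50])
```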

In content understanding we have also developed some more specific technologies. For example, Kuaishou has built a video tagging system that can classify most of the content and scenes appearing in videos. In Kuaishou's speech recognition module, we combine deep learning with a context module, which greatly improves recognition accuracy.


On the one hand we need to understand the video content; on the other we also need to understand the user, including publicly available attributes such as age and gender, as well as behavioral data generated in real time as the user uses Kuaishou. This data is fed to back-end deep learning models to train vectors that represent the user. With these vectors, we can predict a user's interests and their relationships with other users.
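"Training vectors for user understanding" is commonly realized as embeddings learned from interaction logs; a common sketch (not Kuaishou's actual model) scores user-video pairs with a dot product of learned embeddings, as below. Sizes and training details are hypothetical.

```python
# Two-tower-style sketch: learn user and video embeddings and score pairs with a
# dot product. Sizes are placeholders; the real system and training are not shown.
import torch
import torch.nn as nn

class UserVideoMatcher(nn.Module):
    def __init__(self, num_users, num_videos, dim=64):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, dim)
        self.video_emb = nn.Embedding(num_videos, dim)

    def forward(self, user_ids, video_ids):
        u = self.user_emb(user_ids)                        # (B, dim)
        v = self.video_emb(video_ids)                      # (B, dim)
        return (u * v).sum(dim=-1)                         # interest score per pair

if __name__ == "__main__":
    model = UserVideoMatcher(num_users=1000, num_videos=5000)
    scores = model(torch.tensor([1, 2, 3]), torch.tensor([10, 20, 30]))
    print(scores.shape)                                    # torch.Size([3])
```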


Finally, we have a description of the user and an understanding of the video. Matching users against videos produces trillion-scale feature data, which the real-time online recommendation system uses to predict what kind of video a user will be interested in. We also rank content within the community: as mentioned earlier regarding attention allocation, we hope the gap in attention distribution is not too large, so we adjust the distribution of video content according to the Gini coefficient. Factors such as content safety, diversity, and originality protection are also taken into account.
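The Gini coefficient mentioned here is a standard inequality measure; computed over per-video view counts (the exact quantity Kuaishou measures is not specified), it is 0 when attention is spread evenly and approaches 1 when a few blockbusters absorb everything. A small numpy version:

```python
# Standard Gini coefficient over per-video attention (e.g. view counts).
# 0 = perfectly even distribution, values near 1 = attention concentrated on few videos.
import numpy as np

def gini(values) -> float:
    x = np.sort(np.asarray(values, dtype=np.float64))      # ascending order
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float((2 * ranks - n - 1).dot(x) / (n * x.sum()))

if __name__ == "__main__":
    blockbuster_heavy = [1, 1, 1, 1, 1000]                 # attention concentrated
    evenly_spread = [200, 201, 199, 200, 200]
    print(round(gini(blockbuster_heavy), 3), round(gini(evenly_spread), 3))
```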

Finally, we hope to further deepen cooperation with university teachers and students, make full use of Kuaishou's massive data and computing power, jointly advance deep learning technology, explore more possibilities for the future, and enhance public happiness. This is also the vision behind establishing the Tsinghua University-Kuaishou Future Media Data Joint Research Institute. Thank you.
