A good media processing framework has these three characteristics

Guest | Zhao Jun

Editor | Zhongliang

Since 2017, audio and video platforms have increasingly focused on bandwidth costs and viewing experience, and Tencent began developing its Extreme Speed HD technology around that time. What challenges did the team encounter during research and development? What technical solutions does the industry have for high-definition video? For this issue we interviewed Tencent expert engineer Zhao Jun, who answers these questions from his practical experience. The following is an edited transcript of the interview; we hope you find it useful.

InfoQ: Can you tell us a little about what you are currently working on? Before 2018 you worked at Intel on hardware acceleration for video encoding, decoding, and transcoding. What is the biggest difference between that work and the media processing framework work you do today?

Zhao Jun: At present I am mainly responsible for the media processing framework and codec scenario optimization in Tencent Cloud's video cloud, with the aim of providing better media processing infrastructure for the business teams. The biggest difference from my hardware acceleration work at Intel is that hardware acceleration is only one part of a media processing framework, whereas the framework and codec scenario optimization work I do now is closer to the real problems. So my advice is that even when you work on low-level optimization, it is important to understand the business scenarios it serves; that gives you a more complete understanding of your own work.

InfoQ: What stages has Tencent Mingyan gone through in its development? Are there common ways to improve image quality? What were the biggest challenges you faced during development?

Zhao Jun: Development of Tencent Mingyan began in 2017, when we noticed that audio and video platforms were shifting their focus to bandwidth costs and viewing experience. That is when we started building Mingyan Extreme Speed HD, hoping to apply our long-accumulated audio and video capabilities to media scenarios, especially live streaming and video on demand. The most important parts of the work during this period were:

A: Continuous optimization of the encoder kernel. Developing a new coding standard is a breakthrough from 0 to 1, but a solution or product also has to go from 1 to 100, and that is a process of continuous encoder kernel optimization. However advanced a new standard is, an encoder cannot realize its full potential without long-term, practical optimization. After many rounds of optimization, O264, an encoder developed through internal open source collaboration, achieves a gain of more than 30% over the open source encoder on various metrics, and V265 achieves a 40% coding gain over the open source x265. We also took the lead in the industry in supporting AV1: our AV1 encoder, TXAV1 (entered in the competition as VAV1), ranked first on all metrics in the AV1 track of the MSU competition on its first entry. At the same time, Tencent is actively investing in H.266 and other next-generation encoder technologies.

B: A complete media processing pipeline. Mingyan's media processing pipeline combines conventional signal-processing algorithms with current AI techniques. It first runs pre-analysis steps such as scene analysis, glitch detection, noise detection, interlacing detection, quality assessment, and JND (just noticeable difference) analysis to evaluate the picture quality of the source video, and then applies the image enhancement or repair technique suited to each scene and quality condition. After repair, Mingyan analyzes the picture a second time to guide the subsequent video encoding.

Specifically, Tencent Mingyan uses deep learning to recognize more than a dozen mainstream scene categories and dozens of sub-categories, including games, sports, shows, outdoor content, animation, and film and television, and automatically matches each video stream to the corresponding scene model. After scene recognition, Mingyan combines information such as the source bitrate, frame rate, resolution, texture, and magnitude of motion to apply pre-processing such as sharpening, deblurring, deinterlacing, artifact removal, noise reduction, color scale compensation, frame reduction or interpolation, dark scene enhancement, and stabilization. It then performs a second analysis of the picture, covering ROI/JND and content-adaptive coding information, and tunes the encoding process to better match subjective human perception. Once customers enable Extreme Speed HD, they can reduce video bitrate by 30%-50% at the same picture quality, which preserves the viewing experience while greatly reducing costs.

C: Transport and container format considerations. In this diverse world we not only face different video coding formats such as H.264, H.265, AV1, and H.266, but also have to consider different distribution protocols, container formats, DRM, and so on. So while actively improving picture quality, we also need to keep exploring more compact, efficient, and universal container formats and combine them with network transport optimization. The goal is to cover multiple devices and screens with lower distribution bandwidth, deliver faster startup, reduce playback jitter, and resolve compatibility issues across devices and ecosystems.

InfoQ: For HD video, what comparable solutions are you aware of? How does Mingyan differ from them in performance and architectural design?

Zhao Jun: Most solutions in the industry are influenced by Netflix's Per-Title Encoding and Shot-based Encoding. Netflix proposed Per-Title Encoding in 2015. At a high level, it is a "brute force" technique: each source file is encoded at hundreds of combinations of resolution and bitrate to find the convex hull, the boundary that most tightly bounds all of the resulting data points, with VMAF as the target metric standing in for subjective human evaluation.
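
Finding the convex hull of the (bitrate, quality) points produced by trial encodes can be sketched as follows. This is only an illustration under stated assumptions, not Netflix's or Tencent's implementation: the function name `upper_convex_hull` and the sample numbers are invented for the example, and in practice the quality scores would come from measuring real encodes.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (bitrate_kbps, quality_score)


def _cross(o: Point, a: Point, b: Point) -> float:
    """Cross product of vectors OA and OB; positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_convex_hull(points: List[Point]) -> List[Point]:
    """Return the rate-quality points on the upper convex hull.

    Points below the hull waste bits, and points past the quality
    peak are dropped because a cheaper point already beats them.
    """
    # Keep only the best quality seen at each bitrate.
    best_by_rate: Dict[float, float] = {}
    for rate, quality in points:
        if rate not in best_by_rate or quality > best_by_rate[rate]:
            best_by_rate[rate] = quality

    hull: List[Point] = []
    for p in sorted(best_by_rate.items()):
        # Drop the previous point while it lies on or below the line
        # from hull[-2] to p (it is not on the upper hull).
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)

    if not hull:
        return []
    # Truncate after the highest quality: more bitrate no longer helps.
    peak = max(range(len(hull)), key=lambda i: hull[i][1])
    return hull[: peak + 1]


if __name__ == "__main__":
    # Hypothetical trial encodes across several resolutions.
    trials = [(100, 50.0), (200, 80.0), (300, 70.0), (400, 90.0), (500, 88.0)]
    print(upper_convex_hull(trials))
```

The encodes that survive are the candidates for the bitrate ladder; every other trial costs more bits for the quality it delivers.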


After 2018, Per-Title Encoding evolved into the shot-based Dynamic Optimizer. Instead of dividing the video into arbitrary 2- or 3-second GOPs or segments, dynamic optimization divides the video into shots and encodes each shot individually. Although GOP and segment lengths now vary from shot to shot, adaptive bitrate (ABR) stream switching continues to work efficiently because all rungs of the bitrate ladder use the same GOP and segment lengths as one another.
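
Splitting a video at shot boundaries rather than at fixed intervals can be sketched as follows. The sketch assumes a per-frame difference score is already available; the function name `split_into_shots`, the scores, and the threshold are all hypothetical, and a real shot detector is considerably more involved.

```python
from typing import List, Tuple


def split_into_shots(frame_diffs: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return shots as (start, end) frame index ranges, end exclusive.

    frame_diffs[i] is a difference score between frame i and frame i + 1,
    so a video with N frames supplies N - 1 scores.
    """
    n_frames = len(frame_diffs) + 1
    shots: List[Tuple[int, int]] = []
    start = 0
    for i, diff in enumerate(frame_diffs):
        if diff > threshold:
            # A large jump between frame i and i + 1 ends the current shot.
            shots.append((start, i + 1))
            start = i + 1
    shots.append((start, n_frames))
    return shots


if __name__ == "__main__":
    diffs = [0.1, 0.2, 0.9, 0.1, 0.8, 0.05]  # hypothetical scores for 7 frames
    print(split_into_shots(diffs, threshold=0.5))
```

Each resulting range can then be encoded with its own parameters, provided every rung of the ladder is cut at the same boundaries.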

Note that because of their complexity, techniques based on Per-Title Encoding and Shot-based Encoding can only be used for on-demand content, whereas Tencent Mingyan also supports live streaming.

In addition, Mingyan has embraced new technology more actively. It provides picture repair and enhancement that effectively removes noise and compression artifacts from the source, enhances detail, removes blur, improves color, and addresses problems caused by low resolution and low frame rate, such as stuttering. It also draws on the cloud, with the transport protocol and container format fully optimized, which makes the overall solution more complete.

To relieve the computing pressure introduced by the AI algorithms, Mingyan uses a newly designed computing power pool that handles the performance problem asynchronously.
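
The general idea of handing expensive work to a shared pool and collecting results asynchronously can be sketched as follows. The interview does not describe Mingyan's actual pool design, so this is an assumption-laden illustration: `enhance_frame` is a hypothetical stand-in for an AI enhancement step, and a thread pool stands in for whatever CPU or GPU workers a real system would use.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List


def enhance_frame(frame_id: int) -> str:
    """Hypothetical stand-in for an expensive AI enhancement step."""
    return f"frame-{frame_id}-enhanced"


def submit_frames(pool: ThreadPoolExecutor, frame_ids: List[int]) -> List[Future]:
    """Queue every frame on the shared pool and return immediately."""
    return [pool.submit(enhance_frame, frame_id) for frame_id in frame_ids]


def collect_in_order(futures: List[Future]) -> List[str]:
    """Wait for the results, preserving the order frames were submitted in."""
    return [future.result() for future in futures]


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        pending = submit_frames(pool, [0, 1, 2])
        # The caller is free to do other work here while the pool runs.
        print(collect_in_order(pending))
```

Because submission returns immediately, the rest of the pipeline does not block on the slowest enhancement step.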

InfoQ: What characteristics and requirements do you think a good media processing framework should have? How far is Tencent Cloud's media processing framework from your goal?

Zhao Jun: In my opinion, a good media processing framework needs the following three characteristics:

Simplicity: Making something simple is harder than making it complex, and simplicity makes things clearer and more unified, so it was our first priority when designing the media processing framework. Specifically, we use a pipeline design based on a directed acyclic graph, combined with loosely coupled layering, to meet the needs of different scenarios. In addition, an asynchronous computing power pool unifies CPU and GPU acceleration behind one mechanism.
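
A pipeline organized as a directed acyclic graph can be sketched as follows. This is a minimal illustration, not Mingyan's framework: the `Pipeline` class and the stage names `analyze`, `denoise`, and `encode` are hypothetical, and the ordering relies on the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Sequence

Stage = Callable[[dict], dict]


class Pipeline:
    """Runs named stages in an order that respects their dependencies."""

    def __init__(self) -> None:
        self._stages: Dict[str, Stage] = {}
        self._deps: Dict[str, List[str]] = {}

    def add_stage(self, name: str, fn: Stage, after: Sequence[str] = ()) -> None:
        self._stages[name] = fn
        self._deps[name] = list(after)

    def order(self) -> List[str]:
        unknown = {d for deps in self._deps.values() for d in deps} - set(self._stages)
        if unknown:
            raise ValueError(f"unknown stages: {sorted(unknown)}")
        # static_order() raises graphlib.CycleError if the graph has a cycle.
        return list(TopologicalSorter(self._deps).static_order())

    def run(self, info: dict) -> dict:
        for name in self.order():
            info = self._stages[name](info)
        return info


def analyze(info: dict) -> dict:
    return {**info, "noisy": info["noise_level"] > 0.5}


def denoise(info: dict) -> dict:
    return {**info, "denoised": info["noisy"]}


def encode(info: dict) -> dict:
    return {**info, "encoded": True}


if __name__ == "__main__":
    pipeline = Pipeline()
    # Stages can be declared in any order; the graph decides execution order.
    pipeline.add_stage("encode", encode, after=["denoise"])
    pipeline.add_stage("denoise", denoise, after=["analyze"])
    pipeline.add_stage("analyze", analyze)
    print(pipeline.order())
    print(pipeline.run({"noise_level": 0.8}))
```

Declaring dependencies instead of hard-coding a call sequence is what lets new stages be added for a new scenario without touching the existing ones.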

Extensibility: A good media processing framework must be extensible. B2B requirements change constantly, and both the underlying implementations and the computing resources they depend on change as well, so the framework needs enough extensibility to keep up with the business. Our extensibility design is modeled on the extension mechanisms of FFmpeg's AVCodec, AVFormat, and so on, so that when the underlying capabilities are extended, the API used by upper-level business teams does not change.
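
The idea of extending the underlying implementations without changing the caller-facing API can be sketched with a small registry. This is a Python analogy for the concept only; it is not FFmpeg's API, and `Encoder`, `register_encoder`, `create_encoder`, and the two example encoders are all hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type


class Encoder(ABC):
    """Stable interface that callers program against."""

    @abstractmethod
    def encode(self, frame: bytes) -> bytes:
        ...


_REGISTRY: Dict[str, Type[Encoder]] = {}


def register_encoder(name: str) -> Callable[[Type[Encoder]], Type[Encoder]]:
    """Class decorator that makes an implementation available by name."""

    def decorator(cls: Type[Encoder]) -> Type[Encoder]:
        _REGISTRY[name] = cls
        return cls

    return decorator


def create_encoder(name: str) -> Encoder:
    """The only entry point callers need; it never changes."""
    if name not in _REGISTRY:
        raise ValueError(f"no encoder registered as {name!r}")
    return _REGISTRY[name]()


@register_encoder("passthrough")
class PassthroughEncoder(Encoder):
    def encode(self, frame: bytes) -> bytes:
        return frame


# Adding a new implementation later touches no caller code.
@register_encoder("tagged")
class TaggedEncoder(Encoder):
    def encode(self, frame: bytes) -> bytes:
        return b"TAG" + frame


if __name__ == "__main__":
    for name in ("passthrough", "tagged"):
        print(name, create_encoder(name).encode(b"abc"))
```

Callers select an implementation by name and use the same `encode` call throughout, so new backends can be added underneath without any change on their side.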

Completeness: The media world is quite fragmented, not only because of differences in technical direction but also because of the interests of the companies and organizations behind them, among other factors. For a B2B vendor, providing a simple, easy-to-use, integrated media processing solution is a challenge. Mingyan classifies the underlying atomic capabilities and integrates them into the media processing framework in an orderly way, covering media diagnosis, media pre-analysis, media pre-processing, pre-encoding processing, container optimization, transport optimization, and more. This covers every stage of media processing, so the framework can cope with a diverse media world and fit into different scenarios.

Open source and growth

InfoQ: When did you first come into contact with open source? I see you previously gave a talk titled "FFmpeg Key Components and Hardware Acceleration"; are you still working in this area? Besides FFmpeg, which other open source projects do you follow now?

Zhao Jun: I came into contact with open source very early, more than ten years ago, and most of what I know I learned from open source communities and projects. I used to focus mainly on the network protocol stack of the Linux kernel and later moved to media processing. I still follow the FFmpeg project and set aside time each day to read the patches and discussions in the community. In terms of project positioning, I am not aware of any project that is exactly comparable to FFmpeg, but there are others I personally follow, such as GStreamer, GPAC, and SRS. Beyond FFmpeg, I also follow open source encoder projects and the Linux kernel, especially its networking subsystem.

InfoQ: How has taking part in open source projects changed you personally? Everyone is currently working on commercializing open source; what is your view on commercialization?

Zhao Jun: On the one hand, participating in open source projects requires building trust over a long period, which is a process of continuous investment. On the other hand, deep participation also requires more thought about communication. Most of the open source projects I take part in communicate through mailing lists, so you have to overcome cultural and language barriers to integrate into the project. As for commercializing open source, I have not thought about it much. I still hold a simple view of open source: "if you take, you must give." Since you have gained knowledge from the open source community, you should also actively give back to it.

InfoQ: What advice do you have for new developers who want to get involved in open source?

Zhao Jun: For new developers, my personal experience is to first become familiar with whatever relevant information you can find, such as email etiquette, coding style, the code submission process, the code review process, GitHub, and mailing lists. Most open source projects have some relatively small tasks, and you can start with those as a way into the project. It is worth stressing that you must strictly follow the community's etiquette. Because of development habits and personal engineering practices, many new developers in China overlook this and are easily frustrated when they try to join an open source community.

InfoQ: With new technologies iterating so quickly, how do you keep learning, and do you have any learning habits readers could borrow? Programmers seem to become anxious around age 35; what is your view on that?

Zhao Jun: On learning, I think it comes back to the simplest need, which is curiosity. When you encounter a problem or a new technology, are you curious enough to find the challenge behind it and then look for an answer that satisfies you? When I start learning a new technology, I usually begin with analogies to what I already know and try to understand it in my own way. In the initial stage I gather all the relevant material I can find and read through it without being selective; this gives a quick grasp of the jargon of the technology or industry and of the basic problems it faces. On the second pass I read the classics or important literature, code, and so on to get more detail. After all, the devil is in the details.

On anxiety at 35, Tencent has an internal forum called KM where a colleague answered a similar question with the line "read until you have worn out ten thousand books." It was partly a joke, but there is some truth in it. First, keep learning: society has changed, and lifelong learning has become a necessary habit; most of the excellent colleagues and friends I have met share this trait. Second, think about your core competitiveness, build your own advantage in two or three areas, and exchange ideas with excellent peers and friends outside your company. The third is something I have done relatively poorly and am currently trying to improve: exercise, keep your energy up, and avoid unnecessary physical strain.

Recommended Activities:

On June 19-20, the ArchSummit Global Architect Summit will be held in Shanghai, where Zhao Jun will give a talk in person on Tencent Mingyan's media processing architecture. The summit has fifteen tracks in total, including big data and artificial intelligence, middleware development practices, mobile development practices, and microservice architecture design. We look forward to talking with you on site.
