Who says AlphaGo can only play board games? DeepMind is using it to compress YouTube videos

Excerpt from the DeepMind blog

Author: The MuZero Applied Team

Compiled by Machine Heart

Machine Heart Editorial Department

With similar video quality, DeepMind's MuZero can reduce bitrates by about 4 percent.

In 2016, DeepMind introduced AlphaGo, the first AI program to defeat humans at Go. In the years that followed, its successors, AlphaZero and MuZero, moved steadily toward general-purpose algorithms, mastering more games with less predefined knowledge. MuZero, for example, mastered chess, Go, shogi, and Atari games without being told the rules.

So far, however, these applications have stayed within the realm of games, and whether they can solve real-world problems has been a constant focus of outside attention.

Yesterday, DeepMind announced on its blog that MuZero has taken its first step toward the real world, showing potential for optimizing video compression. The details are presented in a preprint.

Paper link: https://storage.googleapis.com/deepmind-media/MuZero/MuZero%20with%20self-competition.pdf

In this study, DeepMind researchers collaborated with YouTube to explore MuZero's potential in video compression. Analysts predict that streaming video will account for the vast majority of Internet traffic. To save bandwidth, video must be compressed before it is transmitted. Minimizing the loss of image quality and playback smoothness after compression has therefore become an important problem for video providers, and it is a problem that reinforcement learning is expected to help solve. DeepMind's MuZero can reduce bitrate by about 4% while keeping video quality similar.

Most online videos rely on codecs to compress or encode the video at its source, then transmit it to the audience over the Internet, and finally decompress or decode it for playback. These codecs make multiple decisions for each frame in the video. After decades of manual engineering, these codecs have achieved a certain degree of optimization, and have been applied in many fields such as video on demand, video calls, video games and virtual reality, but there is still a lot of room for optimization.

Because reinforcement learning is particularly well suited to sequential decision-making problems like those found in codecs, DeepMind set out to explore this area.

Their initial focus was the VP9 codec (specifically the open-source libvpx implementation), which is widely used by YouTube and other streaming services. As with other codecs, service providers using VP9 need to consider bitrate. Bitrate, the number of 1s and 0s required to send each frame of video, is a primary determinant of the computation and bandwidth required to serve and store video, and it affects many metrics such as video load time, resolution, buffering, and data usage.

When encoding video, the codec uses information from previous frames to reduce the number of bits required for future frames.

In VP9, the most direct way to optimize bitrate is through the quantization parameter (QP) in the rate control module. This parameter determines the level of compression applied to each frame. Given a target bitrate, the QPs of the video frames are chosen sequentially to maximize overall video quality. Intuitively, a higher bitrate (lower QP) should be assigned to complex scenes and a lower bitrate (higher QP) to static scenes. The QP selection algorithm has to reason about how the QP value of one frame affects the bitrate available to the remaining frames and the overall video quality. Reinforcement learning is especially helpful for solving this kind of sequential decision-making problem.
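The sequential nature of QP selection can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the rate model `toy_frame_bits`, the complexity scores, and the function `encode_clip` are hypothetical and are not taken from libvpx or from the DeepMind paper. The sketch only shows that bits spent on one frame are no longer available to later frames.

```python
def toy_frame_bits(complexity, qp):
    """Hypothetical rate model: a higher QP yields fewer bits. Not the libvpx model."""
    return complexity / (1.0 + qp)


def encode_clip(complexities, qps, target_bits):
    """Walk the frames in order and return the bit budget left after each frame.

    A negative value means the chosen QPs have overshot the target.
    """
    remaining = target_bits
    trace = []
    for complexity, qp in zip(complexities, qps):
        remaining -= toy_frame_bits(complexity, qp)
        trace.append(remaining)
    return trace


# Three frames: static, complex, static (toy complexity scores).
frames = [100, 400, 100]

print(encode_clip(frames, [9, 9, 9], 60))    # [50.0, 10.0, 0.0]   same QP everywhere
print(encode_clip(frames, [19, 7, 19], 60))  # [55.0, 5.0, 0.0]    more bits for the complex frame
print(encode_clip(frames, [4, 9, 9], 60))    # [40.0, 0.0, -10.0]  early overspend, budget missed
```

In this toy setting, lowering the QP of the complex frame only fits the budget because the static frames take a higher QP, which is the coupling between frames that a rate controller has to reason about.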

For each frame of video processed by VP9, MuZero-RC replaces VP9's default rate control mechanism, determining the compression level applied to achieve similar quality at lower bit rates.

MuZero achieves superhuman performance on a variety of tasks by combining search with the ability to learn a model of its environment and plan accordingly. This approach is particularly effective in large combinatorial action spaces, making it an ideal candidate for the rate control problem in video compression.

However, for MuZero to tackle this real-world problem, a whole new set of problems needs to be solved. For example, the set of videos uploaded to a platform like YouTube varies in content and quality; any agent needs to generalize to different videos, including new videos after deployment. In contrast, board games tend to have only one known environment. On video tasks, many other metrics and constraints affect the final user experience and bitrate savings, such as PSNR (peak signal-to-noise ratio) and bitrate constraints.
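For reference, PSNR is conventionally computed from the mean squared error between the original and the decoded frame. The following is a minimal sketch of that standard formula, assuming 8-bit pixel values passed as flat lists; the function name and its arguments are illustrative and do not come from the DeepMind work.

```python
import math


def psnr(original, decoded, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if not original or len(original) != len(decoded):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical frames: no distortion
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every pixel off by one gray level gives roughly 48.13 dB.
print(round(psnr([10, 20, 30], [11, 21, 31]), 2))
```

A higher PSNR means the decoded frame is closer to the original, so a rate controller aims to keep it high while staying within the bitrate constraint.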

To address these challenges, DeepMind created a mechanism for MuZero called "self-competition," which turns the complex objective of video compression into a simple win/loss signal by comparing the agent's current performance with its historical performance. In this way, a rich set of codec requirements is converted into a single signal that the agent can optimize.
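The idea of collapsing several requirements into a win/loss signal can be sketched roughly as follows. This is an illustration under stated assumptions, not the rule from the paper: the comparison order (bitrate overshoot first, then PSNR), the treatment of ties as losses, the moving-average history, and all names (`EpisodeResult`, `self_competition_reward`, `update_history`, `decay`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    psnr: float     # quality of the encoded clip, in dB
    bitrate: float  # achieved bitrate, in the same units as the target


def self_competition_reward(current, past, target_bitrate):
    """Return +1 if the current episode beats the historical one, else -1.

    Assumed rule: if either episode overshoots the target, the smaller
    overshoot wins; otherwise the higher PSNR wins. Ties count as a loss.
    """
    current_over = max(0.0, current.bitrate - target_bitrate)
    past_over = max(0.0, past.bitrate - target_bitrate)
    if current_over > 0 or past_over > 0:
        return 1 if current_over < past_over else -1
    return 1 if current.psnr > past.psnr else -1


def update_history(past, current, decay=0.9):
    """Assumed history tracker: exponential moving average of past results."""
    return EpisodeResult(
        psnr=decay * past.psnr + (1 - decay) * current.psnr,
        bitrate=decay * past.bitrate + (1 - decay) * current.bitrate,
    )


history = EpisodeResult(psnr=40.0, bitrate=1000.0)
better = EpisodeResult(psnr=40.5, bitrate=990.0)
print(self_competition_reward(better, history, target_bitrate=1000.0))  # 1
```

Because the reward is always +1 or -1, the agent does not need a hand-tuned weighting between quality and bitrate; it only needs to beat its own track record.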

By learning the dynamics of video encoding and determining how best to allocate bits, the MuZero Rate Controller (MuZero-RC) is able to reduce bitrate without losing quality. QP selection is just one of many decisions in the encoding process. While decades of research and engineering have produced efficient algorithms, DeepMind envisions a single algorithm that can automatically learn to make these encoding decisions and obtain the best rate-distortion trade-off.

Videos encoded using previous QP heuristics

Videos encoded using MuZero-RC. With MuZero-RC, each video achieves similar quality at a lower bitrate. Experiments showed an average bitrate reduction of 4% across a large, diverse set of YouTube videos.

Beyond video compression, the significance of this study is that it marks a first step in applying MuZero to the real world, showing that reinforcement learning agents can be used to solve real-world problems. DeepMind says that by creating agents with a range of new capabilities to improve products across domains, it can help various computer systems become faster and more automated. The company's long-term vision is to develop a single algorithm capable of optimizing thousands of real-world systems across a variety of domains.
