70% faster than human-designed algorithms! Google DeepMind uses AI to speed up data sorting, and the work lands in Nature

Zhidx

Compiled by | Cheng Qian

Edited by | Heart

Zhidx reported on June 8 that last night the artificial intelligence research lab Google DeepMind presented three AI tools: AlphaZero, MuZero, and AlphaDev. These tools can improve resource utilization in data centers, make video compression more efficient, and discover faster algorithms, helping to optimize the wider computing ecosystem.

Yesterday, the AlphaDev work was published in the leading academic journal Nature. AlphaDev, a specialized version of AlphaZero, discovered new sorting algorithms that sort short sequences of elements up to 70% faster.

AlphaDev's new sorting algorithms have now been added to the LLVM C++ standard library. According to Google DeepMind's blog, this is the first change to this part of the sorting library in more than a decade, and the first time an algorithm designed through reinforcement learning has been added to the library. AlphaDev's hashing algorithm has also been released in the open-source Abseil library.

LLVM C++ library code review: https://reviews.llvm.org/D118029

Link to paper: https://www.nature.com/articles/s41586-023-06004-9

Google DeepMind is working to create AI tools with a broad understanding of the world to optimize the computing ecosystem as part of building more powerful and general-purpose AI systems.

The researchers are also extending the capabilities of AlphaZero and MuZero, Google's reinforcement learning-based AI models, to help optimize data center resources and video compression. This has reduced the amount of underutilized hardware in data centers by up to 19% and further lowered video bitrates without any loss of quality.

These tools already deliver efficiency gains across the computing ecosystem, and the results also point to the transformative potential of more general AI tools in the future.

AlphaDev: Sorting up to 70% faster and hashing 30% faster, already used by millions of developers

Google DeepMind previously developed AlphaZero, an AI system that plays games such as Go. The researchers have now adapted this system to discover algorithms, producing AlphaDev. When translated into the standard programming language C++, the algorithms AlphaDev created sorted data three times faster than the human-written versions.

"We were a bit shocked," said Daniel Mankowitz, the Google DeepMind computer scientist who led the work. "At first we didn't believe [the result]."

AlphaDev has discovered faster sorting and hashing algorithms, routines that are run trillions of times a day to sort, store, and retrieve data.

1. From games to algorithm discovery: sorting short sequences up to 70% faster

Sorting algorithms affect how all digital devices process and display information, including the presentation of some online search results, the ranking of posts on social media, and some user recommendations.

Compared with the human-designed algorithms in the C++ library, the sorting algorithms AlphaDev found are up to 70% faster on short sequences of elements and about 1.7% faster on sequences of more than 250,000 elements. Faster sorting could, for example, help rank results more quickly when a user submits a search query, so that highly relevant answers appear sooner.

The researchers first applied AlphaDev to sorting numbers by size, starting with fixed-length sorts of just 3, 4, or 5 numbers at a time. These short sorts matter because larger sorting routines call them over and over as building blocks.

Sort two numbers
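Fixed-length sorts like these are commonly written as sorting networks: a fixed sequence of compare-and-swap steps that does not depend on the input. The sketch below is a hypothetical Python illustration of that idea, not AlphaDev's assembly output or the C++ library's code. `sort3` is a three-comparator network, and `hybrid_sort` (a name invented here) is a quicksort that hands ranges of two or three elements to the fixed routines, the kind of place where a faster short sort pays off many times over.

```python
def compare_exchange(a, i, j):
    """One comparator of a sorting network: ensure a[i] <= a[j]."""
    if a[i] > a[j]:
        a[i], a[j] = a[j], a[i]


def sort3(a, lo):
    """Fixed-length sort of a[lo:lo + 3] using three comparators."""
    compare_exchange(a, lo, lo + 1)
    compare_exchange(a, lo + 1, lo + 2)
    compare_exchange(a, lo, lo + 1)


def hybrid_sort(a, lo=0, hi=None):
    """Hypothetical quicksort that delegates ranges of 2 or 3 elements to fixed sorts."""
    if hi is None:
        hi = len(a)
    n = hi - lo
    if n <= 1:
        return
    if n == 2:
        compare_exchange(a, lo, lo + 1)
        return
    if n == 3:
        sort3(a, lo)
        return
    pivot = a[(lo + hi) // 2]
    # Three-way partition: a[lo:lt] < pivot, a[lt:gt + 1] == pivot, a[gt + 1:hi] > pivot.
    lt, i, gt = lo, lo, hi - 1
    while i <= gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    hybrid_sort(a, lo, lt)
    hybrid_sort(a, gt + 1, hi)


data = [5, 3, 5, 1, 4, 2]
hybrid_sort(data)
print(data)  # [1, 2, 3, 4, 5, 5]
```

Every call on a large input eventually bottoms out in many tiny `compare_exchange`/`sort3` calls, which is why shaving even one instruction off a short sort can matter.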

AlphaDev works much like AlphaZero, which combines a computational form of deliberation (search) and intuition (a learned neural network) to choose moves in a board game. Instead of choosing a move, however, AlphaDev chooses an instruction to add to the program it is building.

Rather than refining existing algorithms, AlphaDev builds new ones from scratch, working at the level of the computer's assembly instructions. Assembly instructions are translated into the binary code the computer executes, and the Google DeepMind researchers believed there was a lot of room for improvement at this lower level.

As AlphaDev builds an algorithm, it checks correctness by comparing the algorithm's output with the expected result. For sorting, this means unordered numbers go in and correctly sorted numbers come out. The researchers reward AlphaDev both for sorting the numbers correctly and for doing so quickly and efficiently.
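To make the "game" concrete, the sketch below treats a program as a sequence of moves drawn from a tiny, hypothetical instruction set loosely modeled on x86 `mov`, `cmp`, and `cmovg` (conditional move if greater). `is_correct` plays the role of the correctness check described above, and program length stands in for speed. As an assumption made for brevity, exhaustive enumeration replaces AlphaDev's neural-network-guided search; brute force like this only works for tiny problems such as sorting two numbers.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Instr = Tuple[str, int, int]  # (opcode, operand_x, operand_y)
REGS = (0, 1, 2)  # r0 and r1 hold the inputs; r2 is a scratch register (assumption)

# Hypothetical toy instruction set, loosely modeled on x86 mov / cmp / cmovg.
ACTIONS: List[Instr] = (
    [("mov", d, s) for d in REGS for s in REGS if d != s]      # rd = rs
    + [("cmp", a, b) for a in REGS for b in REGS if a != b]    # flag = ra > rb
    + [("cmovg", d, s) for d in REGS for s in REGS if d != s]  # if flag: rd = rs
)


def run(program: Sequence[Instr], inputs: Tuple[int, ...]) -> List[int]:
    """Execute a program on a small register file and return the final registers."""
    regs = list(inputs) + [0] * (len(REGS) - len(inputs))
    greater = False  # "greater" flag written by the most recent cmp
    for op, x, y in program:
        if op == "mov":
            regs[x] = regs[y]
        elif op == "cmp":
            greater = regs[x] > regs[y]
        elif op == "cmovg" and greater:
            regs[x] = regs[y]
    return regs


def is_correct(program: Sequence[Instr], tests: List[Tuple[int, int]]) -> bool:
    """Correctness check: unordered inputs in, sorted values out in r0 and r1."""
    return all(run(program, t)[:2] == sorted(t) for t in tests)


def shortest_sort2(max_len: int = 4) -> Optional[List[Instr]]:
    """Enumerate programs in order of length, so the first correct one is the shortest.

    Brute force is a stand-in for AlphaDev's learned, search-guided play, and
    program length is a stand-in for the speed AlphaDev is rewarded on.
    """
    tests = [(1, 2), (2, 1), (2, 2)]
    for length in range(1, max_len + 1):
        for program in product(ACTIONS, repeat=length):
            if is_correct(program, tests):
                return list(program)
    return None
```

On this instruction set the search returns a four-instruction program (one valid example is `mov r2, r0; cmp r0, r1; cmovg r0, r1; cmovg r1, r2`), since one instruction is always needed to save a value before it is overwritten.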

2. Nearly 70% less time for data conversion, and the algorithms have been open-sourced

The Google DeepMind team also applied AlphaDev to tasks other than sorting. Its version of an algorithm that converts data stored in a particular format into bytes takes 67% less time than the standard version, and its hashing algorithm, used to store and retrieve data, takes 30% less time than the standard algorithm.
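The article does not say which format is involved. As an assumed example of converting structured data to and from bytes, the sketch below implements base-128 variable-length integers (varints), the encoding Protocol Buffers uses for integer fields: each byte carries 7 bits of the value, and its high bit signals whether more bytes follow. Short, data-dependent loops like these are the kind of hot routine where instruction-level tuning can matter, but this code is only an illustration, not AlphaDev's algorithm.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a varint: 7 bits per byte, least significant first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)  # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at data[pos]; return (value, position after it)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


print(encode_varint(300).hex())  # ac02
```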

Hashing algorithms are widely used to store and retrieve data in databases. A hashing algorithm typically takes a key and computes a hash value that points to where the corresponding data is stored. For example, hashing the user name "Jane Doe" leads to its associated value, "order number 164335-87".

This is similar to how a librarian uses a classification system to find a specific book quickly: with the help of a hashing algorithm, a computer can quickly work out what it is looking for and where to find it.

Entering a key retrieves the corresponding data value
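A minimal sketch of the key-to-value lookup described above, assuming a simple hash table with separate chaining. The 64-bit FNV-1a function is a well-known, simple hash chosen here for illustration; it is not the Abseil hash that AlphaDev optimized, and `HashTable` is a hypothetical class. Because different keys can land in the same bucket, each bucket stores (key, value) pairs and a lookup compares keys before returning a value.

```python
from typing import Any, List, Tuple

FNV_OFFSET_BASIS = 0xCBF29CE484222325  # standard 64-bit FNV-1a parameters
FNV_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def fnv1a_64(key: str) -> int:
    """64-bit FNV-1a: XOR in each byte, then multiply by the FNV prime (mod 2**64)."""
    h = FNV_OFFSET_BASIS
    for byte in key.encode("utf-8"):
        h ^= byte
        h = (h * FNV_PRIME) & MASK64
    return h


class HashTable:
    """Hypothetical hash table with separate chaining (one list per bucket)."""

    def __init__(self, num_buckets: int = 8) -> None:
        self.buckets: List[List[Tuple[str, Any]]] = [[] for _ in range(num_buckets)]

    def _bucket(self, key: str) -> List[Tuple[str, Any]]:
        # The hash picks a bucket; different keys may share one (a collision).
        return self.buckets[fnv1a_64(key) % len(self.buckets)]

    def put(self, key: str, value: Any) -> None:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key: str) -> Any:
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)


orders = HashTable()
orders.put("Jane Doe", "order number 164335-87")
print(orders.get("Jane Doe"))  # order number 164335-87
```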

When applied to the 9 to 16 byte input range of a hashing function used in data centers, AlphaDev's algorithm improved efficiency by 30%.
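Hash functions often use separate code paths for different input lengths. The sketch below is a hypothetical illustration of a specialized path for 9 to 16 byte inputs: it reads the first 8 and the last 8 bytes as two (possibly overlapping) little-endian 64-bit words and combines them with one wide multiplication. The mixing constant and the `mix` and `hash_9_to_16` functions are assumptions made for illustration; this is not Abseil's implementation or AlphaDev's output.

```python
import struct

MASK64 = (1 << 64) - 1
MULTIPLIER = 0x9E3779B97F4A7C15  # arbitrary odd 64-bit constant (assumption)


def mix(a: int, b: int) -> int:
    """Multiply two 64-bit words into a 128-bit product and fold its halves together."""
    product = (a ^ MULTIPLIER) * (b ^ MULTIPLIER)  # Python ints do not overflow
    return (product ^ (product >> 64)) & MASK64


def hash_9_to_16(data: bytes, seed: int = 0) -> int:
    """Hypothetical fast path for inputs of 9 to 16 bytes.

    Two 8-byte loads cover every byte: the first 8 and the last 8 (they overlap
    when the input is shorter than 16 bytes). Mixing in the length helps
    distinguish inputs of different lengths that share those two words.
    """
    n = len(data)
    if not 9 <= n <= 16:
        raise ValueError("this path only handles inputs of 9 to 16 bytes")
    first = struct.unpack_from("<Q", data, 0)[0]
    last = struct.unpack_from("<Q", data, n - 8)[0]
    return mix(first ^ (seed & MASK64), last ^ n)
```

The point of such a path is that it avoids a per-byte loop entirely: a fixed number of loads and arithmetic steps handles every input length in the range.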

In January last year, Google DeepMind researchers contributed the new sorting algorithms to the LLVM project's C++ standard library and the hashing algorithm to the Abseil library. They are now used by millions of developers and by companies in industries such as cloud computing, online shopping, and supply chain management.

AlphaZero: Optimizing data center resources and reducing underutilized hardware by up to 19%

Data centers need to manage everything from providing search results to processing datasets. Borg, Google's large-scale cluster management system, manages Google's billions of tasks, distributes workloads to optimize the internal infrastructure of the data center, handles services used by users such as Google Search, and manages batch processing.

Distributing workloads is a bit like Borg playing Tetris: it has to fit as many blocks as possible into a limited space while making good use of the free space.

Distributing workloads is likened to a game of Tetris

Previously, Borg relied on hand-coded rules to schedule tasks and balance workloads. At the scale of billions of tasks, however, such rules cannot account for the variety of ever-changing workload distributions, so they were designed as "one size fits all" compromises tuned to an intermediate, average case.

This is where AlphaZero comes in: it automatically creates individual, optimized rules, so that Borg can distribute workloads more efficiently and apply the right rules to different tasks.
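A toy version of the scheduling problem helps show what a "one size fits all" rule looks like. In the sketch below, tasks are (CPU, memory) demands packed onto identical machines. `first_fit` and `best_fit` are textbook bin-packing heuristics assumed here for illustration, not Borg's actual rules. `choose_rule` only crudely mimics "finding the right rules for different tasks" by picking the better rule from a fixed menu for each workload, whereas AlphaZero learns its rules. `stranded_fraction` is a metric defined here as a rough proxy for underutilized hardware.

```python
from typing import Callable, Dict, List, Optional, Tuple

CAPACITY = 100  # each machine offers 100 units of CPU and of memory (assumption)
Task = Tuple[int, int]  # (cpu, memory) demand of one task, each <= CAPACITY
Free = Tuple[int, int]  # remaining (cpu, memory) on one machine
Rule = Callable[[List[Free], Task], Optional[int]]


def fits(free: Free, task: Task) -> bool:
    return free[0] >= task[0] and free[1] >= task[1]


def first_fit(machines: List[Free], task: Task) -> Optional[int]:
    """Hand-coded rule: place the task on the first machine with room."""
    for i, free in enumerate(machines):
        if fits(free, task):
            return i
    return None


def best_fit(machines: List[Free], task: Task) -> Optional[int]:
    """Hand-coded rule: place the task where the least capacity is left over."""
    best, best_left = None, None
    for i, free in enumerate(machines):
        if fits(free, task):
            left = (free[0] - task[0]) + (free[1] - task[1])
            if best_left is None or left < best_left:
                best, best_left = i, left
    return best


def pack(tasks: List[Task], rule: Rule) -> List[Free]:
    """Place tasks (largest first) using `rule`; open a new machine when nothing fits."""
    machines: List[Free] = []
    for task in sorted(tasks, key=max, reverse=True):
        i = rule(machines, task)
        if i is None:
            machines.append((CAPACITY, CAPACITY))
            i = len(machines) - 1
        cpu, mem = machines[i]
        machines[i] = (cpu - task[0], mem - task[1])
    return machines


def stranded_fraction(machines: List[Free]) -> float:
    """Share of the opened machines' capacity that is left unused."""
    if not machines:
        return 0.0
    return sum(cpu + mem for cpu, mem in machines) / (2 * CAPACITY * len(machines))


RULES: Dict[str, Rule] = {"first_fit": first_fit, "best_fit": best_fit}


def choose_rule(tasks: List[Task], rules: Dict[str, Rule]) -> str:
    """Try each candidate rule on this workload and keep the one needing the fewest machines."""
    return min(rules, key=lambda name: len(pack(tasks, rules[name])))
```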

During the experiment, the researchers found that AlphaZero was also able to identify patterns in tasks entering the data center, as well as predict the best way to manage capacity and make decisions with the best long-term outcomes.

When AlphaZero was applied to Borg, the researchers' experiments showed that this approach could reduce the amount of underutilized hardware by as much as 19 percent, optimizing resource utilization in Google's data centers.

MuZero: Grouping video frames during encoding, reducing bitrate by 4%

Video streaming accounts for a significant portion of the Internet's traffic, so improving the efficiency of video transmission could have a huge impact on the millions of people who watch video every day.

Last year, Google DeepMind partnered with the video site YouTube to use MuZero to optimize video compression for streaming, and the results showed that it could reduce bitrates by 4% without compromising video quality.

Early on, researchers applied MuZero to optimize the compression of each individual frame in a video, and now they extend it to decide how frames are grouped and referenced during the encoding process.

First, MuZero determines the GOP (group of pictures) to be compressed, grouping frames according to their visual similarity. It then compresses the keyframe of a group and compresses the group's other frames with reference to that keyframe. In the process, the encoder uses block search to find the regions of the image that have changed least, which improves compression while preserving video quality.

MuZero compresses groups of images
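The encoding steps above can be sketched with toy one-dimensional "frames" (lists of pixel values). This is an illustration under stated assumptions, not MuZero or a production codec: `group_frames` starts a new group whenever a frame differs too much from the current keyframe (a fixed threshold here, whereas MuZero learns such decisions), and `encode_group` stores the keyframe in full and encodes each other frame block by block as a shift into the keyframe plus a residual. The shift comes from a block search that minimizes the sum of absolute differences (SAD). All function names are hypothetical, and all frames are assumed to have the same length.

```python
from typing import List, Tuple

Frame = List[int]  # toy 1-D frame: one row of pixel intensities (assumption)


def mean_abs_diff(a: Frame, b: Frame) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def group_frames(frames: List[Frame], threshold: float) -> List[List[int]]:
    """Group frame indices; a frame too different from the current keyframe starts a new group."""
    groups: List[List[int]] = []
    for i, frame in enumerate(frames):
        if not groups or mean_abs_diff(frame, frames[groups[-1][0]]) > threshold:
            groups.append([i])  # frame i becomes the keyframe of a new group
        else:
            groups[-1].append(i)
    return groups


def block_search(ref: Frame, block: Frame, start: int, radius: int) -> int:
    """Return the shift in [-radius, radius] whose reference block matches best (lowest SAD)."""
    best_shift, best_cost = 0, None
    for shift in range(-radius, radius + 1):
        pos = start + shift
        if pos < 0 or pos + len(block) > len(ref):
            continue  # candidate would fall outside the reference frame
        cost = sum(abs(b - r) for b, r in zip(block, ref[pos:pos + len(block)]))
        if best_cost is None or cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift


def encode_group(frames: List[Frame], group: List[int], size: int = 4, radius: int = 2):
    """Keyframe stored whole; every other frame stored as (shift, residual) per block."""
    key = frames[group[0]]
    encoded = [("key", list(key))]
    for i in group[1:]:
        frame = frames[i]
        blocks: List[Tuple[int, List[int]]] = []
        for start in range(0, len(frame), size):
            block = frame[start:start + size]
            shift = block_search(key, block, start, radius)
            pred = key[start + shift:start + shift + len(block)]
            blocks.append((shift, [x - p for x, p in zip(block, pred)]))
        encoded.append(("inter", blocks))
    return encoded


def decode_group(encoded) -> List[Frame]:
    """Rebuild every frame of a group from its encoded form."""
    key = encoded[0][1]
    frames = [list(key)]
    for _, blocks in encoded[1:]:
        frame: Frame = []
        start = 0
        for shift, residual in blocks:
            pred = key[start + shift:start + shift + len(residual)]
            frame.extend(p + r for p, r in zip(pred, residual))
            start += len(residual)
        frames.append(frame)
    return frames
```

When a block has simply moved, its residual is all zeros, which a real encoder can store very cheaply; that is where the compression gain comes from.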

Finally, after a group of images is compressed, MuZero follows the same steps to compress the other parts of the video.

Early results from these studies suggest that MuZero has the potential to become a more versatile tool that helps researchers find the best solution in the video compression process.

Conclusion: The transformative potential of general-purpose AI tools is coming into view

Today, Google DeepMind's AI tools, which began by playing games and now tackle complex engineering problems at the heart of every computing device, are saving billions of users time and effort. The researchers believe this is just the beginning.

In the future, increasingly general-purpose AI tools may be able to optimize the entire computing ecosystem that powers the digital world. At the same time, the digital infrastructure underpinning these tools needs to become faster, more efficient, and more sustainable, so realizing this vision will require further theoretical and technical breakthroughs.

There's no denying that the transformative potential of general-purpose AI tools is already being felt, and researchers are already considering their applications in fields such as technology, science, and medicine.