LLM search code self-evolution on Nature, AI for the first time to overcome Tao Zhexuan's mathematical problems

Editor: Editorial Department

This is the first algorithm discovered by LLM in history, which can be called a milestone research, and it will be published in Nature as soon as it is released.

The upper bound set problem is an open problem that has plagued mathematicians for many years.

Tao Zhexuan, a famous mathematician, once described the upper bound set problem as his favorite open-ended problem.

LLM search code self-evolution on Nature, AI for the first time to overcome Tao Zhexuan's mathematical problems

Tao Zhexuan blog

And the large language model has made a new discovery on this problem.

Today, researchers from Google DeepMind, the University of Wisconsin-Madison, and the University of Lyon have teamed up to come up with a new approach, FunSearch, which uses LLMs to discover open questions in the mathematical sciences for the first time!

AI searches for "functions" written in computer code, hence the name FunSearch.

Address: https://www.nature.com/articles/s41586-023-06924-6

In simple terms, FunSearch pairs a pre-trained LLM with an automatic "evaluator". The goal of the former is to provide creative solutions in the form of computer code, while the latter prevents illusions and false ideas.

By iterating back and forth between these two components, the initial solution "evolves" into new knowledge.

In order for everyone to witness this historic moment, DeepMind first sent out the unedited version.

Nature's press release is even more blunt: DeepMind's AI outperforms human mathematicians in unsolved problems!

This is the first time that humans have used LLMs to challenge open questions in science or mathematics and make new discoveries.

In addition, to prove the usefulness of FunSearch, DeepMind experts also tried to solve the "boxing problem", which has a wide range of applications and can improve the efficiency of data centers.

And for this problem, FunSearh has also found a more efficient algorithm.

According to DeepMind experts, scientific progress relies heavily on the ability to analyze new knowledge, and what makes FunSearch a powerful scientific tool is that it outputs programs that not only come up with solutions, but also reveal how they were built.

In this way, scientists using FunSearch can be further inspired to develop new ideas, entering a virtuous cycle of "improvement and discovery".

LLMs drive scientific discovery through "evolution".

Large models are best at solving problems, but can they discover completely new knowledge?

Since LLMs cannot avoid "hallucinations" that output factually incorrect information, it is very difficult to rely on them to make new discoveries that are factually correct.

But what if we could identify and expand LLM's best ideas and push them to the limit?

FunSearch harnesses the power of large models to develop and retain the best creative ideas through an "evolutionary" approach.

These ideas are expressed in computer code that can be run and scored automatically.

First, the user describes the problem in the form of code. This description includes a process for evaluating programs, and a seeding program for initializing the program pool.

FunSearch is an iterative program, and with each iteration, the system selects some programs from the current program pool and enters them into the LLM.

On this basis, LLMs creatively generate new programs and automatically evaluate them.

The highest-rated programs are added back to the existing program pool, creating a self-improvement cycle.

It's worth mentioning that FunSearch uses Google's PaLM 2, but it's also compatible with other code-trained LLMs.

Schematic diagram of the overall process of FunSearch: Present the LLM with the best program it has generated so far (retrieved from the program database) and ask for a better program to be generated. The procedures proposed by the LLM are automatically executed and evaluated. The best programs are added to the database for subsequent cycle selection. Users can search for the programs with the highest scores to date at any time.

Discovering new mathematical knowledge and algorithms in different fields is a notoriously daunting task. To a large extent, this is far beyond the capabilities of state-of-the-art AI systems.

To solve such challenges with FunSearch, DeepMind researchers introduced several key components.

Rather than starting from 0, the process of "evolution" begins with common sense about the problem, allowing FunSearch to focus on finding the most critical ideas to make new discoveries.

In addition, the evolutionary process uses a strategy to increase the diversity of ideas in order to avoid stagnation. Finally, the DeepMind team ran the evolutionary process in parallel, which in turn improved the efficiency of the LLM.

Groundbreaking mathematical discoveries

The upper bound set problem is an open challenge that has puzzled mathematicians in several fields of study for decades.

This time, DeepMind researchers teamed up with Jordan Ellenberg, a professor of mathematics at the University of Wisconsin-Madison, who made an important breakthrough in the upper bound set problem.

Address: https://arxiv.org/abs/1605.09223

One of the keys to the upper limit set problem is to find the largest set of points (i.e., the upper limit set) in a high-dimensional network, in which no three points can be on the same line.

The upper bound set problem is so important because it can be used as a model for other problems in extreme combinatorics that investigate how large and small a set of numbers, figures, or other objects can be.

However, to solve the problem of the upper bound set, the calculation method by brute force will certainly not work, because there are so many possibilities to consider that the number of atoms in the universe will soon be exceeded.

Tao Zhexuan's explanation of why the problem of the upper limit set is important

FunSearch programmatically generated solutions to this, and in some settings, found the largest set of caps ever created.

This finding represents the largest increase in the size of the cap in the last 20 years!

Moreover, FunSearch also outperforms state-of-the-art computational solvers, as the problem scales far beyond the current capabilities of computational solvers.

Here's an interactive diagram showing the evolution of the new function from seeding at the top to higher-scoring at the bottom.

In it, each circle is a program whose size is proportional to the score assigned to it. On the right is the corresponding function generated by FunSearch for each node. (For the complete program of the function, please refer to the original paper)

Interactive Experience Links: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/funsearch/index.html

The above results show that FunSearch technology has the ability to break through the established research results of complex combination problems. In this type of problem, it is often very difficult to establish an intuitive understanding.

According to the researchers, it is highly anticipated that this approach will contribute to new discoveries in other similar theoretical problems in combinatorics, and even open up new possibilities in the field of communication theory in the future.

FunSearch opens the "black box" and collaborates with mathematicians to become a model

FunSearch prefers a program that is concise and human-explainable

While discovering new mathematical knowledge is important in itself, the FunSearch method has additional advantages over traditional computer search techniques.

This is because FunSearch is not a "black box" that simply generates solutions to problems.

Instead, it generates programs that describe "how these solutions are implemented".

This "show-your-working" approach, similar to how scientists work, can better explain and reproduce newly discovered processes.

FunSearch prefers solutions that are represented by "highly compact programs" – solutions with low Kolmogorov complexity.

Short programs can describe very large objects, allowing FunSearch to scale to the problem of finding small targets in massive amounts of data.

In addition, it also makes it easier for researchers to understand the program output of FunSearch.

Jordan S. Ellenberg, American mathematician Professor UW-Madison and author of the paper, said, "FunSearch provides a completely new mechanism for developing attack strategies. The solution generated by FunSearch is conceptually much richer than a mere list of numbers. When I studied them, I learned something."

What's more, this explainability of the FunSearch program can provide researchers with actionable insights.

For example, when using FunSearch, there is an intriguing symmetry in some of its high-scoring output code.

This gives researchers a new understanding of the problem and can use this insight to refine the problems introduced in FunSearch to arrive at better solutions.

According to DeepMind, "this is an example of collaboration between humans and FunSearch on many mathematical problems."

Left: By examining the code generated by FunSearch, the researchers gained more actionable insights (highlighted). Right: The original "acceptable" collection built using the shorter program on the left.

Solve the major challenge of "packing problem" in the computer field

Now that they have been able to succeed in the theoretical upper set problem, the DeepMind researchers tried to explore the flexibility of FunSearch in the field of computer science.

An important practical challenge in computer science is to explore the flexibility of new approaches.

Here, a challenging "bin packing" problem is used, which involves packing items of different sizes into a minimum number of boxes or containers.

This problem is at the heart of solving many practical problems, from loading items into containers, to distributing computing jobs in data centers to minimize costs.

Online boxing problems are usually solved using algorithmic rules of thumb (heuristics) based on human experience.

However, it can be challenging to find a set of rules for each specific case of different size, time, or capacity.

Although very different from the cap set issue, setting up FunSearch for this issue is easy.

FunSearch offers an auto-customized program that adapts to the specifics of the data and performs better than previous heuristics – using fewer boxes to pack the same number of items.

Existing heuristics: Examples of the best adaptation heuristic (left) and the FunSearch heuristic (right) "Packing Problem".

Difficult combination problems such as online boxing can also be solved using other AI methods, such as neural networks and reinforcement Xi. These methods have all proven to be effective, but to deploy them, it is likely that significant resources will be required.

The code output from FunSearch can be easily inspected and deployed, which means that the solution can be applied to real-world industrial systems and bring benefits quickly.

LLM-driven discoveries in science and beyond

FunSearch is designed to prevent LLMs from "hallucinating".

The power of these models can not only help with new discoveries in the field of mathematics, but also find the best solutions to real-world problems.

DeepMind believes that for many problems in science and industry, whether long-standing or new, it will be common practice to use LLM-driven methods to generate effective and customized algorithms.

In fact, FunSearch's pioneering work is just the beginning.

As the scope of LLMs expands further, FunSearch will naturally improve.

At the same time, DeepMind will also strive to expand its capabilities to address a variety of scientific and engineering challenges that society urgently needs to solve.

Netizens are hotly discussed

If all the illusions were accurate, new insights would accelerate the discovery of basic science.

There are also people who say that the threshold of AGI is to make new discoveries, so I guess we already have AGI now.

In 2007, Tao Zhexuan, the world's greatest mathematician, called the "upper bound set problem" his favorite open-ended problem. Now, Google's DeepMind's FunSearch has successfully solved this problem.

"LLMs can't discover anything new, they're just random parrots." FunSearch can actually discover new and useful things in math and computer science.

This sentence clearly names LeCun himself.

So, when will the proof of P=NP be realized?

Resources:

https://deepmind.google/discover/blog/funsearch-making-new-discoveries-in-mathematical-sciences-using-large-language-models/

LLM search code self-evolution on Nature, AI for the first time to overcome Tao Zhexuan's mathematical problems

Read on

In the fourth grade, in mathematics, the whole book of the test center is silently written on small notes, and one is practiced every day, and the memory is doubled 🔥

Jointly study "Mathematics +" and explore "interdisciplinary"

In order to pile the balls more densely, mathematicians came up with the idea of throwing the balls out at random

The top four best universities for mathematics in the world

Top 10 universities in the world for mathematics majors

In the third grade of mathematics, I often take the special test of application problems in the division method, and I don't know how to !️ do application problems

Pen and ink condense the fragrance of books and calculate the style of the exhibition - the test of Chinese character writing ability and mathematical calculation ability of Hongqi Primary School in Weiyang District

2024 College Entrance Examination Sprint Review: College Entrance Examination Mathematics Classic Example Questions and Analysis

"Counting" I talk about "questions" and practice wonderfully - Shuangqiao Primary School held the "Mathematics Topic Expert" activity

In junior high school, it is most necessary to accumulate subjects, and mathematics is the thinking English vocabulary

The brilliant French mathematical community

2024 College Entrance Examination Sprint Must Read: High School Mathematics Common Test High-frequency Test Centers are fully sorted out

Google Math Gemini solves the Olympiad puzzle and is comparable to a human mathematician!

Lecture 05 Binomial Distribution (Teacher's Edition)-2021-2022 School Year Senior 2 Mathematics Synchronous Lecture Notes (Renjiao A Edition 2019 Selective Compulsory Volume 3)

Lecture 05 Binomial Theorem (Teacher's Edition)-2021-2022 School Year Senior 2 Mathematics Synchronous Lecture Notes (Renjiao A2019 Selective Compulsory 3 Chapter 6)

Lecture 05: Sufficient Conditions, Necessary Conditions, and Sufficient Conditions (Teacher's Edition)-Senior One Mathematics Synchronous Lecture Notes (Sujiao Edition 2019 Compulsory Volume 1)