Nature：DeepMind大模型突破60年数学难题，解法超出人类已有认知

author：Quantum Position 2023-12-15 13:10:00

Cressy from the temple of Wafei

量子位 | 公众号 QbitAI

Using large models to solve problems that have plagued mathematicians for more than 60 years, the latest results of Google's DeepMind are published in Nature.

作者之一、谷歌DeepMind研究副总裁Pushmeet Kohli表示:

This scenario will not be in the training data, and it was not even known to humans before.

The technology is called FunSearch, where Fun is an abbreviation of the word "Function".

Leverage large models to solve long-standing scientific puzzles and generate verifiable and valuable* new information that didn't exist before.

In the news interpretation accompanying the Nature paper, the head of DeepMind said that "the way we use large models is as a creativity engine".

This is the first time that a system based on a large model has been shown to surpass the cognition of mathematicians and computer scientists.

Not only is it new, but it's more effective than anything else that exists today.

In response to this achievement, some netizens sighed:

If this is true, it is the most important discovery since the self-fire of mankind.

So, what problems does FunSearch solve?

Find a better solution to the NP-hard problem

DeepMind specifically shows two types of problems, both of which are NP-hard problems.

In the academic community, there is not, and probably will never be, an algorithm that can find an exact solution to the NP-hard problem in polynomial time in all cases.

Faced with such a problem, researchers usually look for approximate solutions or effective algorithms that are suitable for a specific situation.

Specific to FunSearch, the first type of NP-hard problem it solves is the Cap set problem, which is a type of upper set problem, and it is described like this:

In an n-dimensional space, there are equidistant n points in each dimension (a total of n^n, for example, 3 dimensions is 3*3*3), find as many points as possible to form a set, and require any 3 points in the set to be non-collinear, how many points are there in such a set?

If it seems a little confusing, let's take a look at the precursor to the Cap set problem, a card game invented by geneticist Marsha Falco in the '70s.

There are 81 cards in this card game, and each card has 1 to 3 color patterns, and the patterns in the same card are all the same color, shape, and shadow.

There are a total of 3 colors, 3 shapes and 3 shades in this deck, plus the number of patterns, a total of 3*3*3*3=81 cards, players need to turn over some cards to find a special combination of 3 cards.

If we express this "special combination" in a discrete geometric form, we get the Cap set problem.

The Cap set problem was also created in the 70s by the University of Oxford mathematician Ron Graham, and the first important results did not appear until the 90s.

In 2007, Tao Zhexuan mentioned in a blog post that it was his favorite open-ended math problem.

Before the advent of FunSearch, the most significant breakthrough in the Cap set problem was proposed in 2016 by American mathematician Jordan Ellenberg and Dutch mathematician Dion Gijswijt.

Using the polynomial method, Ellenberg and Gijswijt reduced the upper definite bound of the solution of such a problem to 2.756^n at n>6 (where the maximum set can be found precisely at n≤6).

Also at n>6, the newer figure for the lower bound is 2.218^n, proposed by Fred Tyrrell, a PhD student at the University of Bristol, in 2022.

But this lower bound exists only theoretically – at n = 8, there are only 496 points in the largest set that can be constructed, whereas according to Tyrrell, the number of points should be no less than 585.7.

FunSearch has expanded the size of the set to 512 points – a gap from the theoretical value, but still considered the most significant breakthrough on the issue in 20 years.

At the same time, the lower bound of the Cap set set size is also increased to 2.2202^n by FunSearch.

The second category is the online packing problem:

Suppose there is a set of standard containers with a capacity of C and a sequence of n items (items no larger than C) that arrive in a certain order.

"Online" means that the operator cannot see all the items in advance, but must decide which container to load the items into as soon as they arrive.

The ultimate goal is to keep the number of containers used as small as possible.

The problem of online packing has been widely studied since the 70s of the last century, and can be traced back to the layout problem studied by Gauss in 1831.

After nearly 200 years of research, there is still no mature theory and effective numerical calculation method.

Traditionally, there are two commonly used greedy algorithms: First Fit and Best Fit:

First Fit refers to putting each item in the first box that can hold it.
Best Fit puts each item into a box that can hold it and has the least amount of space left in the chest.

FunSearch, on the other hand, proposes a new algorithm that significantly reduces the number of containers used in both OR and Weibull test datasets.

In particular, when the number of items in the test set reached 100,000, the solution found by FunSearch consumed only 0.03% more containers than the theoretical lower bound.

(The data in the table below represent the difference from the theoretical lower bound, the lower the number, the better the performance)

So, how does FunSearch do it?

Search for "program" instead of "answer"

Overall, FunSearch's workflow is an iterative process, with the core of searching for a program that solves a problem, rather than the answer itself.

Search is exactly the route that DeepMind has been exploring since AlphaGo.

Co-founder Shane Egg explained in an interview:

Where does AlphaGo's key "37th move" to defeat Lee Sedol come from? Not from human game data, but from the search of probability space.

Today's large models are just mimicking and mixing different training data, and to generate real creativity and go beyond the current architecture, they need to incorporate search.

Going back to the latest achievement, FunSearch, there is a library in the system, from which the system searches for the initial program and enters the large model (PaLM2 for experiments, other codes are also compatible as long as the code is supported).

On this basis, the large model builds a new program and hands it over to the automatic evaluation system, and the program with the highest score will be added to the program library to achieve self-circulation.

The evaluation system generates test cases based on the user's questions and then determines whether the output of the candidate program is correct.

Depending on the level of complexity, the method of determining whether it is true or false can be either directly inspected or called by the relevant function.

At the same time, the evaluation system is also set up with fault-tolerant logic to avoid problems such as timeouts affecting the overall process.

Ultimately, an overall score is given based on the behavior of the candidate program on these test cases, which provides a basis for results generation and subsequent library updates.

According to co-author Jordan Ellenberg of the University of Wisconsin-Madison, an important feature of FunSearch is that people can see and Xi from the successful solutions generated by AI, which is completely different from the previous black box model of AI.

The most exciting thing for me is to build new models of human-robot collaboration, which I don't want to use as a substitute for human mathematicians, but as force multipliers.

Address:

https://www.nature.com/articles/s41586-023-06924-6

Reference Links:

[1]https://deepmind.google/discover/blog/funsearch-making-new-discoveries-in-mathematical-sciences-using-large-language-models/

[2]https://www.technologyreview.com/2023/12/14/1085318/google-deepmind-large-language-model-solve-unsolvable-math-problem-cap-set/

[3]https://www.nature.com/articles/d41586-023-04043-w

— END —

QbitAI · Headline number signed

Nature：DeepMind大模型突破60年数学难题，解法超出人类已有认知

Find a better solution to the NP-hard problem

Search for "program" instead of "answer"

Read on