
Google AI won the IMO silver medal, only one point short of gold! The fourth question takes only 19 seconds

Bai Jiao, Xi Feng | From Aofei Temple

QbitAI | WeChat official account QbitAI

Just now, large models have claimed another stronghold!

Google DeepMind announced that its math AI has reached silver-medal level at the IMO (International Mathematical Olympiad), just one point short of gold!

Yes, you read that right: these are Olympiad problems that stump the vast majority of humans. For perspective, only 58 of the 609 contestants at this year's IMO reached the gold-medal level.


This time, Google's AI solved 4 of the 6 problems in the 2024 IMO, earning full marks on each solved problem for a total of 28 points. (Each problem is worth 7 points, for a maximum of 42; the gold-medal cutoff this year was 29.)


Among them, Problem 4, a geometry question, took the AI only 19 seconds.

Problem 6, regarded as the hardest of this year's contest and solved by only five human contestants, was also answered completely correctly.


The results were also formally assessed on behalf of the IMO organisers by Professor Timothy Gowers, an IMO gold medalist and Fields Medal winner, and Dr. Joseph Myers, a two-time IMO gold medalist and chair of the IMO 2024 Problem Selection Committee.

Professor Timothy Gowers was impressed: "This is far beyond what I thought the state of the art was."

So how was it done?

A silver medal at the IMO, and a new member of the Alpha family

The IMO silver medal was won by two members of Google's Alpha family, each with its own specialty.

  • AlphaProof, a new member of the Alpha family, is a formal mathematical reasoning system based on reinforcement learning.
  • AlphaGeometry 2, an improved version of the earlier AlphaGeometry, is designed to solve geometry problems.

Let's start with the new member, AlphaProof.

It is a self-training system that proves mathematical statements using the formal language Lean. It combines a pre-trained language model with the AlphaZero reinforcement learning algorithm.

By fine-tuning Gemini, the team can automatically translate natural-language math statements into formal Lean statements, building up a large library of formal problems.
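For a sense of what such a formalized statement looks like, here is a minimal Lean 4 example. It is purely illustrative and not drawn from AlphaProof's actual problem bank; real IMO statements formalized this way are vastly more involved.

```lean
-- A toy formal statement and its proof in Lean 4 (illustrative only).
-- AlphaProof works on far harder, IMO-level statements in the same language.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```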

When a problem is encountered, AlphaProof generates solution candidates, and then proves or refutes those candidates by searching for possible proof steps in Lean.

Each found and verified proof is used to strengthen AlphaProof's language model, improving its ability to solve more challenging problems in the future.
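To make the loop concrete, here is a rough Python sketch of how such a propose-verify-reinforce cycle might be organized. It is based only on the public description; `Problem`, `lean_verify`, and `ProofModel` are illustrative placeholders, not DeepMind's actual API.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Problem:
    statement: str  # a formal Lean statement produced by the Gemini-based formalizer

def lean_verify(statement: str, proof: str) -> bool:
    """Stand-in for invoking the Lean proof checker on a candidate proof."""
    return proof.endswith("qed")  # placeholder check, not real verification

class ProofModel:
    def generate_candidates(self, problem: Problem, n: int) -> List[str]:
        # In AlphaProof this is a language-model-guided search over Lean proof steps.
        return [f"candidate_{i} qed" for i in range(n)]

    def reinforce(self, problem: Problem, verified_proofs: List[str]) -> None:
        # Verified proofs become fresh training signal, AlphaZero-style.
        pass

def training_step(model: ProofModel, problem: Problem) -> List[str]:
    candidates = model.generate_candidates(problem, n=8)
    verified = [p for p in candidates if lean_verify(problem.statement, p)]
    model.reinforce(problem, verified)  # the loop repeats on progressively harder problems
    return verified
```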

In the weeks leading up to the competition, it was trained over and over on millions of IMO-level problems.

The training loop was also applied during the contest itself, with the system reinforcing its own verified proofs until complete solutions were found.


Now let's look at the upgraded AlphaGeometry 2. It is a neuro-symbolic hybrid system whose language model is based on Gemini.

Its predecessor made it into Nature earlier this year: without any human demonstrations, it reached the geometry level of an IMO gold medalist.


Compared with the previous version, it is trained from scratch on an order of magnitude more synthetic data, and its symbolic engine is two orders of magnitude faster. When it encounters a new problem, a novel knowledge-sharing mechanism lets it combine different search trees at a high level, so it can tackle more complex problems.
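Since the article only gives a high-level description, here is a hedged Python sketch of how a neuro-symbolic geometry loop of this kind is commonly organized: a symbolic engine deduces as much as it can, and the language model proposes auxiliary constructions when progress stalls. The function names and signatures below are assumptions, not AlphaGeometry 2's real interface.

```python
from typing import Callable, Optional, Set

def solve_geometry(
    premises: Set[str],
    goal: str,
    deduce: Callable[[Set[str]], Set[str]],           # symbolic engine: facts -> newly derived facts
    suggest_construction: Callable[[Set[str]], str],  # language model: facts -> auxiliary point/line
    max_steps: int = 100,
) -> Optional[Set[str]]:
    """Alternate symbolic deduction with model-proposed constructions until the goal is reached."""
    facts = set(premises)
    for _ in range(max_steps):
        # 1. Exhaustive symbolic deduction (angle chasing, ratios, ...).
        facts |= deduce(facts)
        if goal in facts:
            return facts  # goal derived; a real system would extract the proof trace
        # 2. Otherwise, ask the language model for an auxiliary construction,
        #    which opens up new deductions in the next round.
        facts.add(suggest_construction(facts))
    return None  # not solved within the step budget
```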

Even before the official competition, it could already solve 83% of all IMO geometry problems from the past 25 years, up from 53% for its predecessor.

At this year's IMO, it solved Problem 4 in just 19 seconds.


Next, let's look at how the two systems worked together at this year's IMO.

First, the problems were manually translated into formal mathematical language so that the systems could understand them.

Human contestants, by comparison, submit their answers in two sessions of 4.5 hours each.

Google's two systems solved one problem within minutes; the others took up to three days.

In the end, AlphaProof solved two algebra problems and one number theory problem, in each case determining the answer and proving it correct.

These included the hardest problem in the competition, which only five human contestants solved at this year's IMO.


AlphaGeometry 2 solved the geometry problem, while the two combinatorics problems remained unsolved.

Beyond this, the Google team also experimented with a natural-language reasoning system based on Gemini, which needs no translation of problems into formal language and can be combined with other AI systems.

The team said they will explore more AI methods to advance mathematical reasoning in the future.

More technical details about AlphaProof are also scheduled to be released soon.

Netizens: "I don't understand the math, but I'm shocked"

Seeing these two systems perform, netizens said they "don't understand the math but are shocked anyway."

Scott Wu, co-founder of Cognition AI, the team behind the AI programmer Devin, said:

This result is truly amazing. Growing up, Olympiads were everything to me. I never imagined that within ten years they would be solved by AI.

OpenAI scientist Noam Brown also offered his congratulations.


However, some netizens pointed out that under the standard competition time limit (two days, 4.5 hours per day, three problems per day), the two AI systems would actually have solved only one of the six problems.


This was quickly countered by other netizens:

In this scenario, speed isn't the main issue. For a fixed number of floating-point operations (FLOPs), adding more compute reduces the wall-clock time needed to solve a problem.

In response to this, some netizens questioned:

If the two AI systems couldn't solve the combinatorics problems, was that a training issue, a lack of compute and time, or some other limitation?

Professor Timothy Gowers tweeted his take:

If human contestants were allowed more time per problem, their scores would undoubtedly be higher too. Still, this is already far beyond what previous automated theorem provers could do, and the time required should shrink further as efficiency improves.

But hold on: just two days ago, large models were still stumbling over the grade-school question "Which number is bigger, 9.11 or 9.9?" How can models of this kind now be solving Olympiad-level problems?

One moment they seem to have lost their minds; the next, they've had a flash of inspiration and come to their senses?


Nvidia scientist Jim Fan explains: it's a problem with the distribution of training data.

Google's systems are trained on formal proofs and domain-specific symbolic engines. In a sense, they are highly specialized Olympiad solvers, even though they are built on general-purpose large models.


A model like GPT-4o, by contrast, is trained on data with a large amount of GitHub code mixed in, likely far outweighing the math data. In software versioning, v9.11 > v9.9, which can badly skew the distribution, so the mistake is quite understandable.
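Jim Fan's point is easy to reproduce: numerically 9.11 is smaller than 9.9, but treated as software version components, "9.11" comes after "9.9". A tiny Python snippet makes the contrast explicit:

```python
# Numeric comparison: 9.11 is less than 9.9
print(9.11 > 9.9)  # False

# Version-style comparison: split on dots and compare component by component
version_a = tuple(int(x) for x in "9.11".split("."))  # (9, 11)
version_b = tuple(int(x) for x in "9.9".split("."))   # (9, 9)
print(version_a > version_b)  # True, since 11 > 9 in the second component
```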

He described this strange phenomenon as:

We found a very peculiar region, like an exoplanet that looks like Earth but is full of bizarre valleys.

Some enthusiastic netizens also tagged OpenAI: maybe you could give it a try too......

Sam Altman, for his part, also weighed in with a reply.


Reference Links:

[1]https://x.com/googledeepmind/status/1816498082860667086?s=46

[2]https://x.com/jeffdean/status/1816498336171753948?s=46

[3]https://x.com/quocleix/status/1816501362328494500?s=46

[4]https://x.com/drjimfan/status/1816521330298356181?s=46

[5]https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/

— END —

QbitAI · Contracted creator on Toutiao

Follow us and be the first to know about cutting-edge technology trends