
Chinese researchers at Caltech use AI to shake up mathematical proofs! The tool that stunned Terence Tao now automates 80% of proof steps

Author: New Zhiyuan

Editor: Editorial Department

【New Zhiyuan Introduction】Lean, the formalization tool that Terence Tao has praised repeatedly, had one sticking point: running LLM inference inside it. The Caltech team has now solved this problem, automating more than 80% of mathematical proof steps!

Lean Copilot, the formal mathematics tool that Terence Tao and many other mathematicians are full of praise for, has evolved again?

Just now, Caltech professor Anima Anandkumar announced that the team has released an extended version of the Lean Copilot paper and updated the codebase.


Address: https://arxiv.org/pdf/2404.12534.pdf

The latest experiments show that this Copilot tool can automate more than 80% of mathematical proof steps, 2.3 times better than the previous rule-based baseline, aesop.

And, as before, it is open source under the MIT license.


A major contributor is Peiyang Song (宋沛洋), a computer science honors undergraduate at UCSB and a SURF researcher in the Computing + Mathematical Sciences (CMS) department at the California Institute of Technology.

Netizens exclaimed: so Terence Tao's current mathematical research can be accelerated five-fold, just like that?


LLMs propose proof tactics, humans intervene seamlessly

The team released the Lean Copilot tool in the hope of enabling collaboration between humans and LLMs to produce formal mathematical proofs that are 100% accurate.

It solves a core technical challenge: running LLM inference inside Lean.

With this tool, LLMs can propose proof tactics in Lean, and humans can intervene and make modifications seamlessly.


The project was developed because automated theorem proving remains a daunting challenge today.

We all know that LLMs are often unreliable on math and reasoning tasks: they make mistakes and hallucinate.


Therefore, until now, mathematical proofs have mostly been derived by hand and carefully verified.

Proof assistants like Lean can formalize every step of a proof, but writing Lean takes considerable human effort.

In this context, the birth of Lean Copilot is of great significance.

The tool that has repeatedly stunned Terence Tao: mathematicians who don't use it will fall behind

The thesis that LLMs can serve as tools to assist humans in proving theorems has been confirmed by Terence Tao many times.

He recently predicted on his blog that by 2026, AI combined with search and symbolic math tools will become a trustworthy co-author in mathematical research.

Immediately afterward, studies supporting his view sprang up like mushrooms after rain.

In June last year, researchers from Caltech, NVIDIA, MIT and other institutions built LeanDojo, a theorem prover based on open-source LLMs.


In September, researchers from Microsoft Research Asia, Peking University, Beihang University and other institutions got GPT-4 to arrive at the conclusion "P≠NP" through 97 rounds of rigorous "Socratic" reasoning, taking on this millennium problem.


In the 97th round of dialogue, GPT-4 concluded that the constructed example could not be solved without exhaustive search, arriving at the conclusion P≠NP.

In October last year, with the help of GPT-4 and Copilot, Terence Tao discovered a hidden bug in his own paper.

While formalizing the argument on page 6 in Lean 4, he found that one of the expressions is actually divergent at n = 3, k = 2.

This not-so-obvious bug was caught in time thanks to Lean 4. The reason: Lean required him to establish 0 < n − 3, but he had only assumed n > 2, and that assumption does not rule out the case n = 3, where n − 3 = 0.


This discovery made Terence Tao's eyes pop.


At the end of last year, Terence Tao successfully used AI tools to complete the formalization of the proof of the polynomial Freiman-Ruzsa conjecture.


Finally, the dependency graph was entirely covered in green, and the Lean compiler reported that the conjecture follows exactly from the standard axioms.


In this process, front-line mathematical researchers felt for the first time the disruptive power that AI brings to mathematical research.

Lean Copilot: making Lean easier to use

And now, the Lean Copilot work has made Lean even more capable.

In this paper, the team uses Lean Copilot to build tools that use LLMs to suggest proof steps (tactic suggestion), complete intermediate proof goals (proof search), and select relevant premises (premise selection).

The experimental results fully demonstrate that, compared with Lean's existing rule-based proof automation, Lean Copilot is effective in assisting humans and in automating theorem proving.

Lean Copilot provides a general framework for running LLM inference, either locally via CTranslate2 or on a server.

Through this framework, users will be able to create a variety of automated proof tools.


Lean is a popular proof assistant among mathematicians. As shown in the diagram below, a proof in Lean is made up of a series of proof steps called tactics.


Starting from the entire theorem as the initial goal, tactics repeatedly transform the current goal into simpler sub-goals until all goals are solved.

The user writes tactics interactively in a VS Code-based IDE, which displays the goals in the infoview panel on the right.
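As a minimal illustration of this tactic-by-tactic style (a toy example of our own, not one of the paper's figures):

```lean
-- Each line after `by` is one tactic. The initial goal is the full
-- statement; tactics transform it until the infoview shows "No goals".
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  rw [Nat.add_comm]   -- rewrites the goal to `b + a = b + a`, which closes by rfl
```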

Generating tactic suggestions

Using Lean Copilot, the team built suggest_tactics, a tool that uses LLMs to generate tactic suggestions.

It is itself a Lean tactic.

When applied, it feeds the current goal to the LLM and obtains a list of candidate tactics generated by the model.

It then checks each candidate to see whether it 1) causes an error, 2) runs without error but does not complete the proof, or 3) completes the proof successfully.

Candidates in category 1) are removed.


Only error-free tactics are displayed in the infoview panel on the right.

Tactics that successfully complete the proof are marked in green (category 3), while tactics that run without error but do not complete the proof are marked in blue (category 2).

Note: even when all of the listed tactics fall into category 2, this information can be extremely valuable to the user.

In that case, the remaining goal information can directly help the user choose a tactic as the next intermediate proof step.

Once users see the suggestions, they can choose whether to accept one or use them as inspiration for a new tactic.

For example, we define a theorem add_abc in Lean code, and its initial goal is shown on the right in Figure 3.


When we enter suggest_tactics, the tactic suggestions appear on the right.

The first tactic is shown in green, indicating that it completes the proof successfully.

The next three suggestions are shown in blue, indicating that they do not complete the proof directly but do not cause errors either.

Therefore, they are likely to be valid intermediate proof steps!

At the same time, the remaining sub-goals are also displayed.

The tactic state field shows "No goals" because at least one of the suggested tactics completes the proof.
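In Lean source code, the interaction described above looks roughly like the following sketch (based on the add_abc example and the tactic name reported in the paper; the actual suggestions depend on the model):

```lean
import LeanCopilot

-- `suggest_tactics` sends the current goal to the LLM and lists the
-- candidate tactics in the infoview, colour-coded as described above.
theorem add_abc (a b c : Nat) : a + b + c = a + c + b := by
  suggest_tactics
  -- clicking a green suggestion (e.g. something like `omega`) inserts
  -- it here and completes the proof
```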


Searching for complete proofs

In addition, because neither humans nor machines can consistently produce the right tactic, it is necessary to backtrack and explore different alternatives; this process is proof search.

The suggest_tactics tool described above only generates tactics for the current step and has no ability to search for multi-tactic proofs.

To address this, the team combined it with aesop, a rule-based proof search tool, to build an LLM-based proof search tool, search_proof.

Aesop implements best-first search as a Lean tactic and allows users to configure how the search tree is expanded.


The search tree consists of goals as nodes.

Initially, it contains only the original goal as the root node. At each step, aesop selects the most promising unexpanded node, expands it by applying tactics, and adds the resulting goals as its children.


When aesop finds a path from the root to goals that can be trivially solved, the search succeeds!

The performance of aesop therefore depends critically on whether the user has configured an effective rule set.

This is where aesop lacks flexibility. search_proof therefore augments aesop's rule set with goal-dependent tactics generated by suggest_tactics at each step, making the search more flexible.

For the original goal in Figure 3, the user simply enters search_proof, and a complete proof that solves the goal is found and displayed in the infoview (Figure 5, right).

As you can see, the remaining tactic state is "No goals" because a successful proof was found.
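Invoked from Lean source, the search tool is a one-liner (again a sketch assuming the tactic name from the paper):

```lean
import LeanCopilot

-- `search_proof` runs aesop's best-first search, augmented at each node
-- with LLM-generated tactics; if a proof is found, it is displayed in
-- the infoview and can be inserted in place of this call.
theorem add_abc' (a b c : Nat) : a + b + c = a + c + b := by
  search_proof
```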


Selecting annotated premises

Another challenging and important task in theorem proving is finding the relevant premises that reduce or complete a proof.

Beyond the large number of premises in Lean's own source code and standard library, Lean also has a large mathematics library, Mathlib.

However, searching for candidate premises from all libraries is extremely difficult and time-consuming.

Hence there have been many attempts to assist with or automate premise selection, in Lean as well as in other proof assistants.


In Lean, the state-of-the-art premise selection method is a random forest-based framework implemented directly in Lean.

However, the premise selection task is well suited to retrieval-augmented LLMs, i.e., training a retrieval matrix (premise embeddings) alongside the model to estimate the relevance between a proof goal and candidate premises.

At inference time, the given proof goal is first encoded into a vector, and a matrix-vector multiplication is then performed between the premise-embedding matrix and the goal vector.

Then, to select the top k premises (where k is a hyperparameter determining how many premises the user wants returned), only the k premises with the highest scores are returned.
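In symbols, the retrieval step reads as follows (the notation here is ours, added only to make the description above concrete):

$$ s = P\,g \in \mathbb{R}^{N}, $$

where $P \in \mathbb{R}^{N \times d}$ holds the precomputed embeddings of the $N$ candidate premises, $g \in \mathbb{R}^{d}$ is the encoded proof goal, and the $k$ premises with the largest entries of $s$ are returned.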

To perform this inference in Lean, in addition to the fast LLM inference provided by Lean Copilot, an efficient matrix multiplication library and a C++ reader for NumPy matrices are needed.

The researchers used the matrix multiplication routine from CTranslate2 and the fast C++ NumPy file reader from libnpy.

Again, they link these functions into Lean through the FFI mechanism.

As a result, the premise selection tactic runs very efficiently: the premise embeddings can be precomputed, and all subsequent operations are carried out quickly in C++ using the libraries above.

After obtaining the returned premises, the researchers further annotate them with useful information.

All premises are divided into two categories: those that can be used directly in the current environment (in-scope premises) and those that cannot (out-of-scope premises).

Which category a premise falls into depends on whether the required package has been imported.

If the package a premise requires has already been imported, the premise can be used directly. Annotated in-scope premises are shown in Figure 6 below.


Figure 7 shows the annotated out-of-scope premise.


As an example of premise selection, for the theorem add_abc in Figure 3, the user can simply enter select_premises in the proof (Figure 8, left).

A list of relevant premises then appears in the infoview (Figure 8, right).

For this simple theorem, the selected premises are clearly relevant, since they all concern natural numbers and rules of addition.

In this case, the four selected premises are all in scope, meaning their modules have already been imported.
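In Lean source, this corresponds roughly to the following sketch (assuming the select_premises tactic name from the paper; the closing tactic is just one illustrative way to finish the proof afterwards):

```lean
import LeanCopilot

-- `select_premises` encodes the current goal, scores it against the
-- precomputed premise embeddings, and lists the top-k premises in the
-- infoview, annotated as in-scope or out-of-scope.
theorem add_abc'' (a b c : Nat) : a + b + c = a + c + b := by
  select_premises
  -- the user then finishes the proof, possibly with one of the
  -- suggested lemmas; here e.g. linear arithmetic suffices
  omega
```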


In summary, the researchers used Lean Copilot to build three practical proof automation tools: tactic suggestion, proof search, and premise selection.

81.2% of the proof steps are automated

With the Lean Copilot framework, the researchers empirically tested the hypothesis that human-AI collaboration in interactive theorem proving (ITP) in Lean is beneficial.

Because theorem proving in Lean is done mainly through tactic-style proofs, the experiments focus on evaluating the "tactic suggestion" and "proof search" tools.

Aesop is currently the state-of-the-art rule-based proof automation tool for proof search.

The researchers verified the effectiveness of the LLM-based search_proof against aesop in two settings:

(1) proving theorems autonomously (the LLM works independently)

(2) assisting humans in theorem proving (human-AI collaboration)

In addition, the researchers compared search_proof with suggest_tactics to demonstrate the benefit of proof search beyond single-tactic suggestion.

The way Lean Copilot helps humans in the ITP process mirrors the paradigm of programmers using Copilot in software development.

That is, when faced with a goal, we first call Copilot to see whether it can solve the problem directly.

If it cannot, we simplify the goal further and try Copilot again, repeating the process until Copilot solves the remaining goals.

Through this iterative collaboration paradigm, the researchers measure how much human effort each proof automation tool can automate away.

The specific results are shown in Table 1 below.

Proof search (search_proof) automatically proves 64% of the theorems (32 out of 50), significantly more than aesop and tactic suggestion (suggest_tactics).

When used to assist humans, proof search requires an average of only 1.02 manually entered tactics, also better than aesop (3.62) and tactic suggestion (2.72).


Finally, for each test theorem, the authors calculated the percentage of proof steps each of the three tools could automate.

The results show that proof search can automatically complete about 81.2% of the proof steps, significantly more than tactic suggestion (48.6%) and aesop (35.2%).

Overall, proof search performs 1.67 times better than tactic suggestion and 2.31 times better than the rule-based baseline aesop.

Local LLM inference in Lean via Copilot

Tactic suggestion, proof search, and premise selection in Lean Copilot may seem different in nature, but the requirements for user experience are similar.

They all need to generate responses quickly, with modest compute requirements, while running inside Lean.

The reason for these requirements is that Lean itself can, in most cases, provide feedback on the proof environment (e.g., remaining goals, error messages, type information) very quickly.

This speed matches the nature of theorem proving, which requires coherent, uninterrupted reasoning.

If Lean Copilot made users wait a long time, human-AI collaboration would be hard to sustain.

The low-compute requirement matters just as much: theorem proving in Lean itself does not require a GPU and runs on users' local laptops.

Therefore, being able to run efficiently on most hardware, including laptops without GPUs, is very important for Lean users.

This is because users may not be able to access CUDA-enabled GPUs when writing proofs.

Given the need for fast inference and low compute, and the fact that the popular, efficient deep learning frameworks are all in Python, a natural solution would be to host the model (locally or remotely) in Python and issue requests to it from Lean.

However, this approach suffers from inter-process communication overhead and requires extra setup steps from the user, which does not fit Lean's typical workflow.

To overcome these issues, Lean Copilot runs LLM inference natively in Lean through a foreign function interface (FFI).

FFI is a mechanism by which a program written in one language can invoke a subroutine of another language.

Lean is partly implemented in C++ and interoperates efficiently with C++.

A programmer can declare a function in Lean but implement its body in C++. The implementation is compiled into a shared library and dynamically linked into Lean.
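Concretely, the Lean side of such an FFI binding looks roughly like the following sketch (the C symbol name and function here are hypothetical, for illustration only):

```lean
-- Declare a function in Lean whose body is implemented in C++.
-- The `@[extern ...]` attribute names the C symbol; the C++ code is
-- compiled into a shared library and dynamically linked into Lean.
@[extern "lc_generate_tactics"]
opaque generateTactics (goal : String) : String
```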

By default, Lean Copilot uses LeanDojo's pretrained ReProver model. It is based on ByT5, an encoder-decoder Transformer that maps input strings to output strings.

Lean Copilot makes the model runnable in Lean by wrapping it into a C++ function that operates on strings, which can be called in Lean via FFI.


Chinese authors have made great contributions

The three-person team behind the latest paper is the same team that built the open-source platform LeanDojo in June 2023.


Address: https://arxiv.org/pdf/2306.15626.pdf

Peiyang Song (宋沛洋)


Peiyang Song is a computer science honors undergraduate in the College of Creative Studies (CCS) at the University of California, Santa Barbara, advised by Richert Wang and Phill Conrad.

He is also a SURF Fellow in the Department of Computational and Mathematical Sciences (CMS) at the California Institute of Technology, co-supervised by Prof. Anima Anandkumar and Dr. Kaiyu Yang.


In addition, he is a researcher at the UCSB Architecture Lab (ArchLab), collaborating with Prof. Tim Sherwood and Dr. Jeremy Lau (Google).

His research interests lie in machine learning (ML), spanning applications such as natural language processing (NLP) and computer vision (CV), as well as foundations such as systems and programming languages (PL).

Song's recent research has two main directions.

The first is neuro-symbolic reasoning and AI for mathematics (AI4Math), which combines large language models with interactive theorem provers (ITPs).

The other is energy-efficient machine learning based on temporal logic.

Kaiyu Yang (杨凯峪)


Kaiyu Yang is a postdoctoral researcher in the Computing + Mathematical Sciences (CMS) department at the California Institute of Technology, advised by Anima Anandkumar.

He received his Ph.D. from Princeton University under the supervision of Jia Deng, and has worked with Olga Russakovsky and Danqi Chen.

His research focuses on neuro-symbolic artificial intelligence, which aims to make machine learning capable of symbolic reasoning. He hopes to achieve this in two directions:

(1) applying machine learning to symbolic reasoning tasks, such as mathematical reasoning and theorem proving in formal logic or natural language;

(2) introducing symbolic components into machine learning models to make them more interpretable, verifiable, and data-efficient.

Currently, he is working on artificial intelligence that can understand and reason about mathematics. Mathematical reasoning is an important milestone of human intelligence, and it has the potential to transform many important problems in science and engineering, such as solving partial differential equations and formal verification.

Anima Anandkumar


Anima Anandkumar is a Professor of Computing and Mathematical Sciences at the California Institute of Technology.

Her research interests focus on large-scale machine learning, non-convex optimization, and high-dimensional statistics.

In particular, she has been spearheading the development and analysis of tensor algorithms for machine learning.

Tensor decomposition methods are highly parallel and scalable, and can be applied to massive amounts of data. They guarantee convergence to the optimal solution and output consistent estimates for many probabilistic models, such as Markov models.

More broadly, Professor Anandkumar has been working on efficient techniques to accelerate non-convex optimization.

Resources:

https://arxiv.org/abs/2404.12534

https://github.com/lean-dojo/LeanCopilot

https://twitter.com/animanandkumar/status/1782518528098353535
