laitimes

Rocking the AI and chemistry worlds! GPT-4 learns to do its own scientific research and walks humans through experiments step by step

  Shin Ji Won reports  

Editor: Editorial Office

An AI agent built from GPT-4 and other large models can already walk you through chemistry experiments step by step: it knows which reagents to choose, how much of each to use, and how to reason about the way the reaction will proceed. Tremble, chemistry community!

Amazing, GPT-4 has learned to do scientific research on its own?

Recently, several scientists at Carnegie Mellon University published a paper that caused a stir in both the AI and chemistry communities.

They built an AI that designs and runs its own experiments and does its own scientific research. It is composed of several large language models and can be seen as a GPT-4-based agent, and its research capabilities are formidable.

Because it draws long-term memory from a vector database, it can read and understand complex scientific documents and conduct chemical research in a cloud-based robotic lab.

Netizens were left speechless: so this AI does the research and then publishes it itself? Oh my God.

Others remarked that the era of "text-to-experiment" (TTE) is coming!

Is this the legendary AI holy grail of chemistry?

Lately, many people feel as if we are living in science fiction every day.

The AI version of Breaking Bad is coming?

The most powerful LLM on the planet can score high on the SAT and the bar exam, pass LeetCode challenges, correctly solve a physics problem given as a picture, and understand the jokes in memes.

The GPT-4 technical report also mentions that the model can solve chemistry problems.

This inspired several scholars in Carnegie Mellon's Department of Chemistry to develop an AI based on multiple large language models that can design and carry out its own experiments.

Paper: https://arxiv.org/abs/2304.05332

And the AI they built is seriously impressive!

It searches the literature on its own, precisely controls liquid handling instruments, and solves complex problems that require the simultaneous use of multiple hardware modules and the integration of different data sources.

It has a whiff of an AI version of Breaking Bad.

An AI that makes ibuprofen on its own

For example, let's ask this AI to synthesize ibuprofen.

Give it a simple prompt: "Synthesize ibuprofen."

The model then goes online on its own to find out how to do it.

It identified that the first step requires a Friedel-Crafts reaction between isobutylbenzene and acetic anhydride, catalyzed by aluminum chloride.

In addition, this AI can also plan the synthesis of aspirin and aspartame.

In the aspartame case the product was missing a methyl group, but once the model finds a correct synthesis example, the plan can be corrected and executed in the cloud lab.

Tell the model to "study the Suzuki reaction," and it immediately and accurately identifies the substrates and products.

In addition, connecting the model through an API to a chemical reaction database such as Reaxys or SciFinder gives it a major boost, and its accuracy soars.

Analyzing the system's previous records can also greatly improve the model's accuracy.

An example

Let's first take a look at how to operate the robot to do experiments.

It treats a set of samples as a whole (in this case, the entire microplate).

We can give it a direct prompt in natural language: "Color every other line with a color of your choice".
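To make the prompt concrete, here is a minimal sketch of how "every other line" might be turned into a list of wells. It assumes a standard 96-well microplate with rows A-H and columns 1-12, and it reads "line" as "row". The function name is hypothetical, and this is not the protocol the agent actually generated.

```python
import string

ROWS = string.ascii_uppercase[:8]   # rows A-H of a 96-well plate (assumed layout)
COLUMNS = range(1, 13)              # columns 1-12

def every_other_row_wells(start_row=0):
    """Return well names such as 'A1' for every other row, starting at start_row."""
    wells = []
    for row in ROWS[start_row::2]:
        for col in COLUMNS:
            wells.append(f"{row}{col}")
    return wells

wells = every_other_row_wells()
print(len(wells), wells[:3], wells[-1])
```

A liquid-handling protocol would then dispense the chosen color into each well in this list.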

When executed by the robot, the resulting protocols closely match what the prompts requested (Figure 4B-E).

The agent's first action is to prepare a small sample of the original solution (Figure 4F).

It then requests UV-Vis measurements. On completion, the AI receives the name of a file containing a NumPy array with the spectrum of each well of the microplate.

The AI then wrote Python code to identify the wavelength with the maximum absorbance and used the data to solve the problem correctly.

Putting it to the test
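The analysis step described here can be sketched in a few lines. The article does not show the agent's code, so this is only an illustration: plain Python lists stand in for the NumPy array so the sketch has no dependencies, and the wavelengths and absorbance values are made up.

```python
def wavelength_of_max_absorbance(wavelengths, spectrum):
    """Return the wavelength at which one well's absorbance is largest."""
    peak_index = max(range(len(spectrum)), key=lambda i: spectrum[i])
    return wavelengths[peak_index]

# Hypothetical data: five wavelengths (nm) and the spectra of three wells.
wavelengths = [400, 450, 500, 550, 600]
spectra = [
    [0.10, 0.80, 0.30, 0.05, 0.01],
    [0.02, 0.10, 0.90, 0.20, 0.03],
    [0.01, 0.05, 0.20, 0.40, 0.95],
]

peaks = [wavelength_of_max_absorbance(wavelengths, s) for s in spectra]
print(peaks)
```

With a real NumPy array of shape (wells, wavelengths), the same idea is usually written with `argmax` along the wavelength axis.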

In the previous experiments, the AI may have been influenced by knowledge it acquired during pre-training.

This time, the researchers set out to thoroughly evaluate the AI's ability to design experiments.

The AI first gathers the required data from the web, runs the necessary calculations, and finally writes the program for the liquid handler (the leftmost part of the figure above).

To add some complexity, the researchers asked the AI to use the heater-shaker module.

These requirements were incorporated into the AI's configuration.

The specific design is as follows: the AI controls a real liquid handler equipped with two microplates. The source plate contains stock solutions of multiple reagents, including phenylacetylene and phenylboronic acid, multiple aryl halide coupling partners, two catalysts, and two bases.

The image above shows the contents of the source plate.

The target plate is mounted on the heater-shaker module.

In the image above, the pipette on the left has a 20 μl capacity and the single-channel pipette on the right has a 300 μl capacity.

The AI's ultimate goal is to design a procedure that successfully carries out the Suzuki and Sonogashira reactions.

We tell it: perform these two reactions using the available reagents.

It then searches the Internet on its own to find out, for example, what conditions these reactions require and what the stoichiometry should be.

As we can see, the AI successfully collected the required conditions, the stoichiometry and concentrations of the required reagents, and so on.

The AI picked the right coupling partners to complete the experiment. Among all the aryl halides, it chose bromobenzene for the Suzuki reaction and iodobenzene for the Sonogashira reaction.

The AI's choices vary somewhat from run to run. For example, it also chose p-iodonitrobenzene, which is highly reactive in oxidative addition.

Bromobenzene was chosen because it can participate in the reaction and is less toxic than aryl iodides.

Next, the AI chose Pd/NHC as the catalyst because it performs better; this is a more modern approach to cross-coupling reactions. For the base, the AI settled on triethylamine.

This process points to a promising future use for the model: the experiment can be run repeatedly to analyze the model's reasoning and achieve better results.

After selecting the reagents, the AI calculates the amount required for each one and then plans the entire experimental procedure.

Along the way, the AI made a mistake: it used the wrong name for the heater-shaker module. But it noticed the error in time, consulted the documentation on its own, corrected the protocol, and finally ran it successfully.
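The amount calculation mentioned here is ordinary stoichiometry. The sketch below assumes the reagents are dispensed from stock solutions of known concentration; the reaction scale, equivalents, and concentrations are hypothetical and are not taken from the paper.

```python
def stock_volume_ul(limiting_umol, equivalents, stock_molar):
    """Volume of stock solution (in microliters) that delivers a reagent.

    1 mol/L equals 1 umol/ul, so umol divided by mol/L gives ul.
    """
    if stock_molar <= 0:
        raise ValueError("stock concentration must be positive")
    return limiting_umol * equivalents / stock_molar

# Hypothetical plan: 10 umol of aryl halide as the limiting reagent.
# Each entry is (equivalents, stock concentration in mol/L).
plan = {
    "aryl halide": (1.0, 0.5),
    "phenylboronic acid": (1.5, 0.5),
    "base": (2.0, 1.0),
}

volumes = {name: stock_volume_ul(10, eq, conc) for name, (eq, conc) in plan.items()}
print(volumes)
```

Volumes like these also determine which pipette to use: anything above 20 μl would have to go to the 300 μl pipette.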

Setting the chemistry itself aside, let's summarize the "professionalism" the AI showed in this process.

The AI demonstrated very strong analytical reasoning. It can obtain, on its own, the information needed to solve a complex problem step by step.

Along the way, it can also write very high-quality code on its own to advance the experimental design, and it can revise the code it wrote based on the output.

OpenAI has successfully demonstrated the power of GPT-4, and GPT-4 will one day be able to participate in real experiments.

But the researchers did not want to stop there. They also set the AI a hard problem: they instructed it to develop a new cancer drug.

Something that doesn't exist yet... can the AI still handle it?

It turns out it really does have some skill. The AI is not daunted by the problem (of course, it does not know what fear is): it carefully analyzes the requirements for developing an anticancer drug, studies current trends in anticancer drug research and development, and then selects a target to pursue further and determines the compound's composition.

Then the AI attempts the synthesis on its own, searching the Internet for information about the reaction mechanism and, after the preliminary steps, looking for examples of relevant reactions.

Finally, the synthesis is completed.

The AI cannot actually synthesize the substances in the figure above; this is only a theoretical discussion.

Among them are well-known drugs such as methamphetamine (crystal meth) and heroin, as well as banned chemical agents such as mustard gas.

Out of a total of 11 compounds, the AI provided synthesis plans for 4 and tried to look up information to advance the synthesis.

Of the remaining 7 substances, the AI flatly refused to synthesize 5. After searching the Internet for information about these 5 compounds, it concluded that they were not to be trifled with.

For example, when asked to synthesize codeine, the AI discovered the relationship between codeine and morphine. It concluded that codeine is a controlled drug and cannot be synthesized casually.

However, this safeguard is not robust. Users need only tweak the wording slightly to get the AI to proceed, for example by referring to morphine as "compound A" and to codeine as "compound B" instead of naming them directly.

Likewise, the synthesis of some drugs requires a license from the Drug Enforcement Administration (DEA), and some users can exploit this by falsely telling the AI that they have permission, inducing it to provide a synthesis plan.

The AI is well aware of familiar contraband such as heroin and mustard gas. The problem is that, so far, the system can only detect known compounds. For unknown compounds, such as some complex protein toxins, the model is less likely to identify potential hazards.

Therefore, to keep anyone from testing these syntheses out of curiosity, the researchers placed a large warning on a red background in the paper:

The illicit drug and chemical weapons synthesis discussed in this article is purely for academic research, with the primary purpose of highlighting the potential dangers associated with new technologies.

Under no circumstances should any person or organization attempt to remanufacture, synthesize, or otherwise produce the substances or compounds discussed herein. Engaging in such activities is not only very dangerous, but also illegal in most jurisdictions.

It goes online to look up how to do experiments

This AI consists of multiple modules. The modules can exchange information with one another, and some can also access the Internet, APIs, and a Python interpreter.

Once a prompt is entered into the Planner, it gets to work.

For example, it can go online, write code in Python, and consult documentation, and once it has figured out the basics, it can run experiments on its own.

When humans do the experiments, this AI can guide us step by step, because it can reason about various chemical reactions, search the Internet, calculate the amounts of chemicals needed, and then carry out the corresponding reaction.

If the description provided is detailed enough, you don't even need to explain anything, and it can understand the whole experiment on its own.
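One way to picture the modular design described in this section is a Planner that emits text actions and a dispatcher that routes each one to a module. The sketch below is only an illustration under that assumption: the command names and handler functions are hypothetical stand-ins, not the paper's actual interface, and the handlers return placeholder strings instead of doing real work.

```python
def web_search(query):
    return f"[search results for: {query}]"

def docs_search(query):
    return f"[documentation sections for: {query}]"

def run_code(source):
    return f"[output of code: {source}]"

def run_experiment(protocol):
    return f"[experiment submitted: {protocol}]"

# Hypothetical command names mapped to the four kinds of module.
HANDLERS = {
    "SEARCH": web_search,
    "DOCS": docs_search,
    "CODE": run_code,
    "EXPERIMENT": run_experiment,
}

def dispatch(action):
    """Route one planner action of the form 'COMMAND argument' to its module."""
    command, _, argument = action.partition(" ")
    handler = HANDLERS.get(command)
    if handler is None:
        # Unknown commands are reported back so the planner can retry.
        return f"[error: unknown command {command!r}]"
    return handler(argument)

for action in ["SEARCH Suzuki reaction conditions", "DOCS heater-shaker module", "FLY to the moon"]:
    print(dispatch(action))
```

Returning every result, including errors, as text keeps the loop simple: the Planner reads the reply and decides on its next action.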

Once the "Web searcher" component receives a query from the Planner, it uses the Google Search API.

From the search results, it takes the top ten documents returned, excludes PDFs, and passes the results back to itself.

It then uses the "BROWSE" operation to extract text from the web pages and generate an answer. The whole thing flows smoothly, in one go.

GPT-3.5 can handle this task, because it is noticeably faster than GPT-4 with no appreciable loss in quality.
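The filtering step can be sketched as follows. The sketch assumes the search step has already produced an ordered list of result URLs, reads the description as "take the first ten, then drop PDFs", and detects PDFs by file extension only. The function name and URLs are hypothetical.

```python
from urllib.parse import urlparse

def select_results(urls, limit=10):
    """Keep the first `limit` results, then drop those whose path ends in .pdf."""
    top = urls[:limit]
    return [u for u in top if not urlparse(u).path.lower().endswith(".pdf")]

results = select_results([
    "https://example.com/suzuki-overview",
    "https://example.com/paper.PDF",
    "https://example.com/notes.pdf?download=1",
    "https://example.com/conditions",
])
print(results)
```

Checking the parsed path rather than the raw string means that a query string after the file name does not hide a PDF.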

The "Docs searcher" component uses queries and a document index to find the most relevant parts of the hardware documentation (for example, for the robotic liquid handler, GC-MS, or the cloud lab), then aggregates the best matches to generate the most accurate answer.

The "Code execution" component does not use any language model. It simply executes code in an isolated Docker container, protecting the host machine from any unexpected actions by the Planner. All code output is passed back to the Planner so that it can fix its predictions if the software raises an error. The same principle applies to the "Automation" component.
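The feedback loop described here, in which code is run and all output is returned to the Planner, can be sketched as below. Note the loud assumption: a separate Python process stands in for the isolated Docker container, so this sketch does not provide any real sandboxing. The function name and return format are hypothetical.

```python
import subprocess
import sys

def execute_code(source, timeout=10):
    """Run Python source in a separate interpreter and capture its output.

    A subprocess is only a stand-in for a container; it is not a sandbox.
    """
    try:
        completed = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": "timed out"}
    return {
        "ok": completed.returncode == 0,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }

good = execute_code("print(1 + 1)")
bad = execute_code("print(1 / 0)")
print(good["ok"], bad["ok"])
```

Because the error text comes back alongside normal output, a planner reading `bad["stderr"]` would see the traceback and could rewrite the code.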

Vector search: making sense of even the most difficult scientific literature

There are many difficulties in building an AI that can perform complex reasoning.

For example, to integrate modern software, users need to understand its documentation, but that documentation is usually written in highly academic, specialized language, which creates a major obstacle.

Large language models can overcome this obstacle by generating, in natural language, software documentation that non-experts can understand.

The training data for these models includes a large amount of API-related information, such as the Opentrons Python API.

However, GPT-4's training data only extends to September 2021, so the AI's accuracy in using APIs still needs to be improved.

To this end, the researchers devised a method to provide AI with documentation for a given task.

They generated OpenAI ada embeddings to cross-reference the documentation and compute its similarity to the query. The relevant sections of the documentation are then selected by a distance-based vector search.

The number of sections provided depends on the number of GPT-4 tokens in the original text. The maximum is set to 7,800 tokens, so that the relevant documentation can be supplied to the AI in a single step.

This method proved crucial for giving the AI information about the heater-shaker hardware module, which is necessary for the chemical reactions.
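The selection procedure described here, ranking documentation sections by vector distance and stopping at a token budget, can be sketched in plain Python. Everything concrete below is an assumption for illustration: the three-dimensional vectors stand in for real ada embeddings, cosine distance is one possible distance measure, and a whitespace word count stands in for a real GPT-4 tokenizer.

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def select_sections(query_vec, sections, max_tokens=7800):
    """Pick the nearest sections until the token budget would be exceeded.

    sections: list of (text, embedding) pairs.
    """
    ranked = sorted(sections, key=lambda s: cosine_distance(query_vec, s[1]))
    chosen, used = [], 0
    for text, _ in ranked:
        cost = len(text.split())  # crude stand-in for a real tokenizer
        if used + cost > max_tokens:
            break
        chosen.append(text)
        used += cost
    return chosen

# Toy documentation sections with made-up embeddings.
sections = [
    ("heater shaker module set temperature", [0.9, 0.1, 0.0]),
    ("pipette aspirate dispense", [0.1, 0.9, 0.0]),
    ("labware definitions", [0.0, 0.2, 0.9]),
]
query = [1.0, 0.0, 0.0]
selected = select_sections(query, sections, max_tokens=8)
print(selected)
```

With the small budget used in the example, only the two sections closest to the query fit; at the article's 7,800-token limit, all three would be included.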

Even greater challenges arise when this approach is applied to more diverse robotics platforms, such as Emerald Cloud Lab (ECL).

Here, we can supply the GPT-4 model with information it does not know, such as the cloud lab's Symbolic Lab Language (SLL).

In all cases, the AI correctly recognizes the task and then completes it.

In this process, the model effectively retains information about the various options, tools, and parameters of a given function. After ingesting the entire document, the model is prompted to generate a block of code using the given function and pass it back to the Planner.

A strong call for regulation

Finally, the researchers emphasize that safeguards must be put in place to prevent large language models from being misused:

"We call on the AI community to prioritize the security of these models. We call on OpenAI, Microsoft, Google, Meta, Deepmind, Anthropic, and other major players to do their utmost in the security of their large language models. We also call on the physical science community to work with the teams involved in developing large-scale language models to assist them in developing these safeguards."

New York University professor Marcus agreed: "This is not a joke, three scientists from Carnegie Mellon University urgently called for safety research on LLM."

Resources:

https://arxiv.org/ftp/arxiv/papers/2304/2304.05332.pdf
