
AlphaFold3: A Unified Tool for Biomolecular Prediction?

Author: Return

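On May 8, 2024, Google DeepMind's AlphaFold team and Isomorphic Labs published a paper in Nature titled "Accurate structure prediction of biomolecular interactions with AlphaFold 3" [1]. The paper introduced AlphaFold3, a new structure prediction tool that can accurately predict the structures of proteins, DNA, RNA, and small-molecule ligands, as well as how these molecules interact. The authors hope it will transform scientific understanding of the biological world and of drug discovery.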

Written by | Liu Anji

Inside every plant, animal, and human cell, there are billions of molecular machines. These machines are made up of proteins, DNA, and other molecules, but none of them can work on their own. Only by observing how they interact in millions of combinations can we begin to truly understand the process of life.


AlphaFold3 is revolutionary in two respects: breadth and accuracy. First, breadth: earlier structure prediction tools, including AlphaFold2, usually targeted a single type of biomolecule, for example predicting only protein structures or only RNA structures, whereas AlphaFold3 can predict the structures and interactions of almost all of life's molecules. Second, accuracy has improved substantially alongside this breadth. For interactions between proteins and other molecule types, AlphaFold3 is at least 50% more accurate than existing prediction methods, and for some important categories of interaction its prediction accuracy has effectively doubled.


AlphaFold3 is currently available to the scientific community free of charge through the AlphaFold Server, which offers most of AlphaFold3's features for structure prediction. The AlphaFold Server is available at https://golgi.sandbox.google.com/about. Overall, the AlphaFold Server is simple to use and offers good visualization: users can submit sequences of various biomolecules on the website and easily run structure predictions. The figure below shows the website's sequence input page and results page:


(Top) AlphaFold Server sequence input page; (bottom) AlphaFold Server results page

In this article, we will answer three questions:

1. What improvements does AlphaFold3 make?

2. How much have AlphaFold3's predictions improved?

3. What aspects of AlphaFold3 still need improvement?

AlphaFold3 improvements

On July 15, 2021, Google's DeepMind published its AlphaFold2 paper in Nature [2]. AlphaFold2 is a deep learning-based tool that predicts protein structures with high accuracy. A protein's function depends largely on its structure, and determining the shape into which a protein folds, known as the "protein folding problem", had been a major challenge in biology for 50 years. AlphaFold2's remarkable results in the CASP structure prediction competition not only demonstrated the great potential of artificial intelligence for structure prediction but also set off a wave of AI-based protein modeling, greatly expanding the scope of protein modeling and design.


(Top left) Performance of past CASP winners; (top right) AlphaFold2 predictions compared with experimental structures

(Bottom) The enormous conformational search space of proteins. Images from the AlphaFold website: https://deepmind.google/technologies/alphafold/

After AlphaFold2's release, the field boomed, and many subsequent methods adopted AlphaFold2's ideas or techniques to a greater or lesser degree. For example, one study found that simply changing the input improved predictions of protein-protein interactions [3], and another found that retraining AlphaFold2 on protein complexes also gave good results for predicting protein interactions [4].


AlphaFold2 architecture [2]


AlphaFold3 architecture [1]

AlphaFold3 builds on AlphaFold2. Its goal is to unify the tools for different biomolecules into a single neural network framework that can predict the structures of all biomolecules. To cover a wider range of chemical structures and use training data more efficiently, the research team made the following improvements:

1. A smaller multiple sequence alignment (MSA) module: follow-up research on AlphaFold2 found that most of its computing time and resource usage came from MSA processing, so AlphaFold3 substantially reduces the number of MSA-processing blocks.

2. AlphaFold2's Evoformer encoder is replaced with a simpler encoder, the Pairformer, which relies less on MSA information and more on the pair representation.

3. A Diffusion Module replaces AlphaFold2's Structure Module. The diffusion module predicts raw atomic coordinates directly, whereas the Structure Module operated on amino-acid-specific frames and side-chain torsion angles. The multiscale nature of the diffusion process (low noise levels push the network to refine local structure) also lets AlphaFold3 dispense with stereochemical losses and most of the special handling of bonding patterns in the network, so it can easily accommodate arbitrary chemical components.
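To make the contrast between the Structure Module and the diffusion module concrete, the sketch below generates coordinates the diffusion way: start from random noise and repeatedly denoise every atom's (x, y, z) position. It is a minimal sketch, not AlphaFold3's implementation. The noise schedule follows the general EDM (Karras et al.) form with illustrative parameter values, the deterministic Euler sampler is a simplification, and `oracle_denoiser` is a hypothetical stand-in for the trained network that simply returns a known target structure.

```python
import random


def noise_schedule(num_steps: int, sigma_max: float = 160.0,
                   sigma_min: float = 4e-4, rho: float = 7.0) -> list[float]:
    """Decreasing noise levels in the EDM (Karras et al.) style, ending at 0.0.

    The default values are illustrative assumptions, not AlphaFold3's settings.
    """
    if num_steps < 2:
        raise ValueError("need at least two noise levels")
    hi, lo = sigma_max ** (1.0 / rho), sigma_min ** (1.0 / rho)
    sigmas = [(hi + i / (num_steps - 1) * (lo - hi)) ** rho for i in range(num_steps)]
    return sigmas + [0.0]


def oracle_denoiser(target: list[list[float]]):
    """Hypothetical stand-in for the trained network: always predicts `target`."""
    def denoise(x: list[list[float]], sigma: float) -> list[list[float]]:
        return [list(atom) for atom in target]
    return denoise


def sample(denoise, num_atoms: int, sigmas: list[float], seed: int = 0) -> list[list[float]]:
    """Generate atom coordinates by iteratively denoising Gaussian noise."""
    rng = random.Random(seed)
    # Start from pure noise: every coordinate of every atom is random.
    x = [[rng.gauss(0.0, sigmas[0]) for _ in range(3)] for _ in range(num_atoms)]
    for sigma, sigma_next in zip(sigmas, sigmas[1:]):
        denoised = denoise(x, sigma)
        # Euler step of the probability-flow ODE: dx/dsigma = (x - D(x, sigma)) / sigma.
        x = [
            [xi + (sigma_next - sigma) * (xi - di) / sigma for xi, di in zip(atom, den)]
            for atom, den in zip(x, denoised)
        ]
    return x
```

With the hypothetical oracle, each step shrinks the distance to the target by the factor `sigma_next / sigma`, and the final step to noise level 0 lands exactly on it. A real network only approximates the clean structure: its predictions at high noise levels settle the global arrangement, while low noise levels refine local geometry, which is the multiscale behaviour described above.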


AlphaFold2's Evoformer architecture [2]


AlphaFold3's Pairformer [1]
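Both the Evoformer and the Pairformer repeatedly refine a pair representation, which holds one feature vector for every pair of tokens (i, j). A signature operation in both is the triangle multiplicative update: the entry for pair (i, j) is updated from the entries for pairs involving a third token k, so that the three edges of the triangle i-j-k stay mutually consistent. The minimal sketch below shows only that core arithmetic on a single scalar channel. In the real modules the two inputs are learned, gated projections of the pair representation followed by normalization, all of which is omitted here, and the function names are hypothetical.

```python
Pair = list[list[float]]  # one scalar feature per token pair (i, j)


def triangle_update_outgoing(a: Pair, b: Pair) -> Pair:
    """Update for pair (i, j) from edges (i, k) and (j, k): sum_k a[i][k] * b[j][k]."""
    n = len(a)
    return [[sum(a[i][k] * b[j][k] for k in range(n)) for j in range(n)] for i in range(n)]


def triangle_update_incoming(a: Pair, b: Pair) -> Pair:
    """Update for pair (i, j) from edges (k, i) and (k, j): sum_k a[k][i] * b[k][j]."""
    n = len(a)
    return [[sum(a[k][i] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def apply_residual(z: Pair, update: Pair) -> Pair:
    """Pair-stack updates are added back onto the pair representation."""
    return [[zij + uij for zij, uij in zip(z_row, u_row)] for z_row, u_row in zip(z, update)]
```

In matrix terms, the outgoing update is A times B-transposed and the incoming update is A-transposed times B, computed separately for each feature channel; passing the same matrix as both inputs gives a symmetric result.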

AlphaFold3's predictions

AF3 predicts structures from input polymer sequences, residue modifications, and ligand SMILES strings. The figure below shows a series of examples that highlight AF3's ability to generalize across many biologically important and therapeutically relevant modalities.


(a) Protein-nucleic acid complex; (b) glycosylation; (c) antibody-peptide complex; (d-f) small-molecule inhibitor-protein complexes
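Because AF3 takes a set of entities rather than a single chain, it can help to picture what one prediction request contains. As a hypothetical illustration, the sketch below models a job as polymers (protein, DNA, or RNA sequences, each with optional residue modifications) plus ligands given as SMILES strings, and runs basic sanity checks. The class names and fields are invented for this example and are not the AlphaFold Server's job format. The SMILES check is deliberately shallow, since real validation needs a chemistry toolkit.

```python
from dataclasses import dataclass, field

# One-letter alphabets for the three polymer types (standard residues only).
ALPHABETS = {
    "protein": set("ACDEFGHIKLMNPQRSTVWY"),
    "dna": set("ACGT"),
    "rna": set("ACGU"),
}


@dataclass
class Polymer:
    kind: str  # "protein", "dna" or "rna"
    sequence: str
    # (1-based position, modification code) pairs; the codes are free-form here.
    modifications: list[tuple[int, str]] = field(default_factory=list)

    def validate(self) -> None:
        if self.kind not in ALPHABETS:
            raise ValueError(f"unknown polymer kind: {self.kind!r}")
        bad = set(self.sequence) - ALPHABETS[self.kind]
        if bad:
            raise ValueError(f"invalid {self.kind} letters: {sorted(bad)}")
        for position, code in self.modifications:
            if not 1 <= position <= len(self.sequence):
                raise ValueError(f"modification {code} at {position} is out of range")


@dataclass
class Ligand:
    smiles: str  # only checked for being non-empty; real parsing needs a chemistry toolkit

    def validate(self) -> None:
        if not self.smiles.strip():
            raise ValueError("empty SMILES string")


@dataclass
class PredictionJob:
    polymers: list[Polymer]
    ligands: list[Ligand] = field(default_factory=list)

    def validate(self) -> None:
        if not self.polymers and not self.ligands:
            raise ValueError("a job needs at least one entity")
        for entity in [*self.polymers, *self.ligands]:
            entity.validate()


job = PredictionJob(
    polymers=[
        # Hypothetical protein with a phosphoserine (PDB CCD code SEP) at serine 4.
        Polymer("protein", "MKTSAYIAK", modifications=[(4, "SEP")]),
        Polymer("dna", "ACGTACGT"),
    ],
    ligands=[Ligand("CC(=O)OC1=CC=CC=C1C(=O)O")],  # aspirin
)
job.validate()
```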

To measure AF3's performance across different kinds of biomolecular structures, the researchers evaluated its accuracy in five categories: protein-ligand interactions, protein-nucleic acid complexes, RNA structures, covalent modifications, and protein complexes.


For protein-ligand interaction prediction, AF3 was tested on the PoseBusters benchmark [5]. Models for protein-ligand interaction tasks fall into two main classes: one uses only the protein sequence and the ligand SMILES as inputs, while the other additionally uses the experimentally solved protein structure from the protein-ligand complex. AF3 belongs to the first class and uses only sequence information, whereas traditional molecular docking belongs to the second and uses structural information. Even so, AF3 far outperforms traditional docking methods such as AutoDock Vina [6] in the figure above. In March of this year, David Baker's lab released the RoseTTAFold All-Atom (RFAA) model [7], which also uses deep learning to predict the structures of various biomolecular assemblies. On the PoseBusters benchmark, however, AF3 performs much better than RFAA.
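The AlphaFold3 paper reports PoseBusters results as the percentage of ligand predictions whose pocket-aligned RMSD to the experimental pose is below 2 Å. The minimal sketch below computes that kind of success rate. It assumes, for simplicity, that each predicted ligand has already been superposed onto the reference via the binding pocket and that atoms are listed in the same order; a real evaluation also has to handle symmetry-equivalent atoms, which this sketch ignores.

```python
import math

Coords = list[list[float]]  # one [x, y, z] per ligand atom, in angstroms


def rmsd(pred: Coords, ref: Coords) -> float:
    """Root-mean-square deviation over matched atoms (no superposition, no symmetry)."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(squared / len(pred))


def success_rate(poses: list[tuple[Coords, Coords]], threshold: float = 2.0) -> float:
    """Fraction of (predicted, reference) poses with RMSD below `threshold` angstroms."""
    hits = sum(1 for pred, ref in poses if rmsd(pred, ref) < threshold)
    return hits / len(poses)
```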

For protein-nucleic acid complex structure prediction, the previous best method was RoseTTAFold2NA, developed in David Baker's lab [8]. As the figure above shows, AF3 performed much better than RoseTTAFold2NA on both protein-RNA and protein-double-stranded DNA complex prediction.

For RNA structure prediction, the best AI-based methods were RoseTTAFold2NA and AIchemy_RNA [9] (the latter being the best AI-based method in CASP15). AF3 was tested on the 10 publicly available RNA targets from CASP15. Although it did not match AIchemy_RNA2, which involved human expert intervention [10], it outperformed both RoseTTAFold2NA and AIchemy_RNA, as shown in the figure above.

AF3 also predicts covalent modifications well, including covalently bonded ligands, glycosylation, and modified protein residues.

For protein complexes, the earlier AlphaFold-Multimer [4] gave somewhat unsatisfactory results; AF3 improves prediction accuracy for protein complexes as well. Among protein complexes, AF3 places particular emphasis on antibody-protein complexes, where its prediction accuracy has improved substantially.

Limitations of AlphaFold3

AF3 is undeniably a major breakthrough in structure prediction, but it also has limitations, chiefly in four areas: stereochemistry, hallucination, dynamics, and accuracy on certain targets.

Stereochemistry raises two main problems. First, AF3 does not always output the correct chirality. Even though the model receives reference structures with correct chirality as input, and a penalty for chirality violations was added to the ranking of predictions, 4.4% of its predictions on the PoseBusters benchmark still contained chirality errors. Second, the model sometimes produces overlapping ("clashing") atoms. This is more common in homomeric complexes (assemblies of identical chains), where entire chains have sometimes been observed to overlap. Penalizing clashes when ranking predictions reduces this failure mode but cannot eliminate it completely.


Overlapping atoms in AF3 predictions
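Both stereochemical problems can be checked mechanically. In the minimal sketch below, handedness at a stereocenter is read from the sign of the triple product of three neighbour vectors (six times the signed volume of the tetrahedron they span), which flips under mirroring, and clashes are counted as non-bonded atom pairs closer than a cutoff. The function names and the 1.1 Å cutoff are illustrative assumptions, not the criteria used by AlphaFold3 or PoseBusters, and the neighbour order must be the same for the predicted and reference structures.

```python
import itertools
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _cross(u, v):
    return [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]


def signed_volume(coords, center, neighbors):
    """Triple product of three neighbour vectors around `center`; its sign encodes handedness."""
    c = coords[center]
    a, b, d = (_sub(coords[n], c) for n in neighbors)
    return sum(x * y for x, y in zip(a, _cross(b, d)))


def chirality_matches(pred, ref, center, neighbors):
    """True if the predicted stereocenter has the same handedness as the reference."""
    return (signed_volume(pred, center, neighbors) > 0) == (signed_volume(ref, center, neighbors) > 0)


def count_clashes(coords, bonded_pairs, cutoff=1.1):
    """Count non-bonded atom pairs closer than `cutoff` angstroms (hypothetical threshold)."""
    bonded = {tuple(sorted(p)) for p in bonded_pairs}
    clashes = 0
    for i, j in itertools.combinations(range(len(coords)), 2):
        if (i, j) not in bonded and math.dist(coords[i], coords[j]) < cutoff:
            clashes += 1
    return clashes
```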

Because AF3 switched to a generative, diffusion-based model, it is also prone to hallucination: in disordered regions of proteins it can produce spurious ordered structure. Although these hallucinated regions are usually marked with very low confidence, they can lack the distinctive ribbon-like appearance that AlphaFold2 produces in disordered regions. To encourage ribbon-like predictions in AF3, the researchers used distillation training on AlphaFold2 predictions and added a ranking term that favors predictions with more solvent-accessible surface area.


Hallucinatory effects in AF3
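AlphaFold3 chooses among its sampled structures with a ranking score, and the paragraph on hallucination describes one ingredient: a term that favours solvent-exposed, and hence more ribbon-like, structure. The sketch below shows how such a term could be folded into a ranking score. It is hypothetical throughout: the weight, the exposure cutoff, and the field names are placeholders rather than AlphaFold3's values, the term here covers all residues instead of only disordered ones, and relative solvent accessibility (which requires a surface-area calculation) is assumed to be precomputed.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    confidence: float           # model's own confidence score, assumed to be given
    relative_sasa: list[float]  # per-residue relative solvent accessibility, assumed precomputed


def exposed_fraction(rsa: list[float], threshold: float = 0.5) -> float:
    """Fraction of residues whose relative accessibility exceeds `threshold` (hypothetical cutoff)."""
    return sum(1 for value in rsa if value > threshold) / len(rsa)


def ranking_score(c: Candidate, exposure_weight: float = 0.5, threshold: float = 0.5) -> float:
    # Hypothetical linear combination: confidence plus a bonus for solvent exposure.
    return c.confidence + exposure_weight * exposed_fraction(c.relative_sasa, threshold)


def pick_best(candidates: list[Candidate], **kwargs) -> Candidate:
    """Return the candidate with the highest ranking score."""
    return max(candidates, key=lambda c: ranking_score(c, **kwargs))


compact = Candidate("compact", 0.80, [0.1, 0.2, 0.1, 0.9])
extended = Candidate("extended", 0.78, [0.9, 0.8, 0.7, 0.1])
```

Here the slightly more confident but compact candidate loses to the more exposed one once the exposure bonus is on; with `exposure_weight=0.0` the ranking falls back to confidence alone.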

Structure prediction methods usually output a single static structure, whereas proteins in biological systems are dynamic. The problem persists in AF3, which still predicts only static structures.

In some cases, the predicted conformation does not respond correctly to whether a ligand is present. For example, E3 ubiquitin ligases adopt an open conformation without a bound ligand and a closed conformation when bound to one, but AF3 predicts only the closed conformation, whether or not a ligand is provided.


AF3 cannot capture conformational dynamics

Finally, although AF3 greatly improves modeling accuracy, many targets remain difficult to model. For these, the best way to obtain the highest accuracy is to generate a large number of predictions and rank them. As the figure below shows, accuracy keeps improving as more predictions are generated, even up to 1,000, and the curve does not appear to have converged.

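Why does generating more predictions help? The toy simulation below is not AlphaFold3's procedure. It assumes each sample has a hypothetical "true" accuracy drawn uniformly at random and a confidence score equal to that accuracy plus Gaussian noise, and it always keeps the most confident sample. Under these assumptions, the expected accuracy of the kept sample rises with the number of samples, because more draws make a very good one more likely, while ranking noise limits how reliably it is picked. The numbers mean nothing beyond the toy model; in particular, the toy curve flattens quickly, whereas the real curve described above had not converged even at 1,000 predictions.

```python
import random
import statistics


def mean_picked_accuracy(n: int, trials: int = 2000, ranking_noise: float = 0.1,
                         seed: int = 0) -> float:
    """Toy model: average true accuracy of the most confident of `n` samples."""
    rng = random.Random(seed)
    picked = []
    for _ in range(trials):
        candidates = []
        for _ in range(n):
            accuracy = rng.random()                                # hypothetical true accuracy
            confidence = accuracy + rng.gauss(0.0, ranking_noise)  # imperfect self-assessment
            candidates.append((confidence, accuracy))
        _, best_accuracy = max(candidates)  # rank by confidence, keep the top sample
        picked.append(best_accuracy)
    return statistics.fmean(picked)


for n in (1, 10, 100):
    print(n, round(mean_picked_accuracy(n), 3))
```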

Discussion

The central challenge in molecular biology is to understand, and ultimately regulate, the complex atomic interactions of biological systems. AlphaFold3 is a big step forward in this regard, showing that the structures of diverse biomolecular systems can be predicted accurately within a single unified framework. Because AlphaFold3 depends less on MSAs, structure predictions can be completed quickly. AlphaFold3 still has limitations, and overcoming them will require advances in computational methods on the one hand and in experimental structure determination, such as cryo-electron microscopy (cryo-EM) and cryo-electron tomography (cryo-ET), on the other. Better experimental techniques will yield more high-quality structures of protein complexes, which can serve as training data to further improve the model's ability to generalize. Experimental and computational methods therefore go hand in hand, helping us better understand the biological world and develop more effective drugs.

Original link:

https://www.nature.com/articles/s41586-024-07487-w

AlphaFold Server link:

https://golgi.sandbox.google.com/about

References

[1] Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature (2024) doi:10.1038/s41586-024-07487-w.

[2] Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

[3] Bryant, P., Pozzati, G. & Elofsson, A. Improved prediction of protein-protein interactions using AlphaFold2. Nat. Commun. 13, 1265 (2022).

[4] Evans, R. et al. Protein complex prediction with AlphaFold-Multimer. Preprint at https://doi.org/10.1101/2021.10.04.463034 (2021).

[5] Buttenschoen, M., Morris, G. M. & Deane, C. M. PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences. Preprint at https://doi.org/10.48550/arXiv.2308.05777 (2023).

[6] Trott, O. & Olson, A. J. AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 31, 455–461 (2010).

[7] Krishna, R. et al. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Preprint at https://doi.org/10.1101/2023.10.09.561603 (2023).

[8] Baek, M. et al. Accurate prediction of protein–nucleic acid complexes using RoseTTAFoldNA. Nat. Methods 21, 117–121 (2024).

[9] Shen, T. et al. E2Efold-3D: End-to-end deep learning method for accurate de novo RNA 3D structure prediction. Preprint at https://doi.org/10.48550/arXiv.2207.01586 (2022).

[10] Chen, K., Zhou, Y., Wang, S. & Xiong, P. RNA tertiary structure modeling with BRiQ potential in CASP15. Proteins Struct. Funct. Bioinforma. 91, 1771–1778 (2023).

This article is reprinted with permission from the WeChat public account "Beijing Frontier Research Center for Biological Structure".
