AI can now quickly and reliably predict the three-dimensional shape of most proteins. Image source: DeepMind
Recently, the University of Washington in the United States and DeepMind in the United Kingdom each announced the results of years of work: advanced modeling programs that can predict the precise three-dimensional atomic structures of proteins and some molecular complexes. One of the groups reports that it has used its newly developed artificial intelligence (AI) program to predict 350,000 protein structures from humans and 20 model organisms, such as E. coli, yeast and fruit flies. In the coming months, the team plans to extend its predictions to all catalogued proteins, some 100 million molecules.
"It's pretty amazing," said John Moult, a protein expert at the University of Maryland in the United States who organizes a competition held every two years called the Critical Assessment of protein Structure Prediction (CASP). For decades, Moult said, structural biologists have dreamed of the day when computer models could supplement the highly precise protein structures obtained with experimental methods such as X-ray crystallography. "I never thought this dream would come true," Moult said.
The model, called AlphaFold, is the work of researchers at DeepMind, a British AI company owned by Google's parent company, Alphabet. In 2020, AlphaFold "swept" CASP. But the DeepMind researchers did not disclose the details of how it predicts protein shapes, including AlphaFold's underlying computer code.
This has begun to change. On July 15, a team led by Minkyung Baek and David Baker at the University of Washington reported that it had created a highly accurate protein structure prediction program called RoseTTAFold and made it publicly available. The results were published online in Science. Meanwhile, Nature published a paper by DeepMind researchers including Demis Hassabis and John Jumper that describes AlphaFold in detail.
Both programs use AI to identify folding patterns in a vast database of protein structures. They calculate the most likely structure of an unknown protein by taking into account the basic physical and biological rules governing how neighboring amino acids in the protein interact. In their paper, Baek and Baker report using RoseTTAFold to build a database of structures for hundreds of G protein-coupled receptors, a common class of drug targets.
The DeepMind researchers produced 350,000 predicted structures, more than twice the number previously determined by experimental methods. The researchers say AlphaFold predicted structures for nearly 44 percent of human proteins, covering nearly 60 percent of the amino acids encoded by the human genome. AlphaFold also determined that many other human proteins are "disordered," meaning that they do not adopt a single fixed shape.
In addition, DeepMind has partnered with the European Molecular Biology Laboratory to create a new database of predicted protein structures that can be accessed online for free. "It's great to be able to provide this kind of service," Baker said. "It's really going to speed up the pace of research. Because a protein's 3D structure largely determines its function, this database will allow biologists to work out how thousands of unknown proteins work." (Bunraku)
Source: China Science Daily