Editor | Radish Peel
"If this can be reproduced, this is the end of the world as we know it! A new era of functional modeling has begun," tweeted Jan Kosinski, a scientist at the European Molecular Biology Laboratory (EMBL). Immediately after the release of AlphaFold 3, he ran a series of simple tests with it and posted the results on X.
"I took a transcription factor with an unknown structure, folded it, and embedded its recognition sequence in a longer piece of DNA. AlphaFold 3 placed the transcription factor precisely on that sequence," Kosinski explained.
Video link: https://mp.weixin.qq.com/s/-T-c603cFN5BCmPk5M1JFg
Transcription factor he used: https://jaspar2020.genereg.net/matrix/MA0027.2/
DNA sequence used: CAGGATCCTAATTATGGATCCTGTGTATCTTCAGT
The model he used was: https://oc.embl.de/index.php/s/SsDKcS06P1xewAV
"Once an open-source copy of AlphaFold 3 is available and working well, we may be able to compute the sequence specificity of all transcription factors," he said. "I think that's going to be big, right?"
Ever rigorous, Kosinski also asked other users to double-check his result: "But I would be happy if someone checked that I have not mixed something up, that this holds for other transcription factors, and that the score can be used to distinguish between specific and non-specific sequences."
Kosinski also did other experiments.
"And one more, maybe no training bias. The promoter region should be specifically bound to the ATTTTAGTCGCGCCTAAAAT and it reappears! The crystal structure is on the left and the AlphaFold 3 model is on the right. The promoter is red." He explained.
"We don't know yet if that's the case, but if this sequence specificity is generalized to CRISPR, zinc finger structures, TALENs, restriction nucleases – how much revenue could be generated for AlphaFold 3 owners?" He teased in a tweet.
The next day, Kosinski ran another test, this time with restriction enzymes.
"It 'cannot' be generalized to my restriction nucleases. The recognition sequence is cyan and the cleavage site is red," he said. "But be careful: these are only two examples, both with long DNA. Someone could try sequences of exact length and compare specific with non-specific sequences."
"AlphaFold 3 cannot correctly predict the recognition sequence of the restriction nuclease BamHI, even though 5 structures are available, 4 of them with DNA," he tweeted. "On the left is the model, on the right is the crystal structure. Given that some of these structures may be included in the training and template databases, this suggests that the presence of structures in the training set does not guarantee accurate predictions."
BamHI: https://uniprot.org/uniprotkb/P23940/entry#struct
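The control Kosinski proposes, comparing specific and non-specific sequences of exactly the same length, could be prepared along the following lines. This is only a minimal sketch: the helper name `scrambled_control` and the 16-base fragment are hypothetical, GGATCC is BamHI's recognition site, and the two sequences would still have to be submitted to AlphaFold 3 separately and compared by score.

```python
import random


def scrambled_control(fragment: str, site: str, seed: int = 0) -> str:
    """Return a same-length copy of `fragment` whose first `site` is shuffled.

    Flanking bases are left untouched, so length and base composition
    stay identical to the original fragment.
    """
    start = fragment.find(site)
    if start < 0:
        raise ValueError("site not found in fragment")
    rng = random.Random(seed)  # fixed seed keeps the control reproducible
    bases = list(site)
    for _ in range(1000):
        rng.shuffle(bases)
        candidate = fragment[:start] + "".join(bases) + fragment[start + len(site):]
        # Reject shuffles that leave (or recreate) the site anywhere in the fragment.
        if site not in candidate:
            return candidate
    raise RuntimeError("could not remove the site by shuffling")


# Hypothetical fragment: one GGATCC site with arbitrary 5-base flanks.
specific = "ATGCA" + "GGATCC" + "TGACT"
control = scrambled_control(specific, "GGATCC")
print(specific)
print(control)
```

Shuffling preserves base composition but does not rule out a new palindrome appearing in the control, so the scrambled site is worth inspecting before use.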
Martin Pacesa of the École Polytechnique Fédérale de Lausanne (EPFL) commented: "Probably because of noise from the MSA! If too many restriction enzyme (RE) homologs recognize very different target sequences, the signal can get lost in the co-evolutionary noise."
Kosinski agreed with Pacesa: "Yes, you're probably right. REases usually have very few similar sequences in their MSAs... or at least they did when I studied them during my master's degree."
"What does it 'recognize'?" a user named Evgenii asked.
"CAAGCTTG. It just binds another palindromic sequence (REases like palindromic sequences; the original sequence, GGATCC, is also palindromic)," Kosinski replied. "I tried shuffling the sequence, but it still found and bound another, imperfect palindromic sequence (GCACGC). I think we need a better 'background bait sequence'."
"Interesting! AAGCTT, that's HindIII. Based on your experience so far, how does AlphaFold 3 handle longer dsDNA fragments?" commented Pim Huis in 't Veld of the Max Perutz Labs at the University of Vienna.
"Did this work before because it searches for any palindromic sequence, since those are bound by similar dimers? Does it sometimes look for one-dimensional DNA patterns similar to those it saw in training (e.g., palindromes, mismatches)?" commented Marcin J. Suskiewicz of the CNRS in Orléans, France.
"Maybe there's a reasonable explanation!" Kosinski said.
Karel Krápník Berka, a theoretical physical chemist and bioinformatician at Palacký University (Univerzita Palackého), used AlphaFold 3 to model proteins together with membrane lipid molecules.
"AlphaFold 3 can also be used to predict membrane position. This is an example of CYP2E1 with oleic acid (OLA)," he posted. "This is compared with my 2013 MD membrane model and a cryo-EM structure from a nanodisc."
Kosinski tried the approach as well and posted: "A candidate novel cannabinoid receptor modeled within the lipid membrane bilayer using this technique, with oleic acid (OLA) as a ligand."
ScienceAI will continue to follow further results.
Related content: https://twitter.com/jankosinski/status/1788532231939453015