Edit | Kaixia
Generative AI can chat, write poems, draw pictures, make videos, compose music, write code, and more.
So, can AI rewrite the human genome?
Now, new AI technology is generating blueprints for the microscopic biological mechanisms that edit DNA, heralding a future in which scientists can fight disease with greater precision and speed.
Recently, Profluent, an American AI protein design startup, launched the OpenCRISPR™ initiative and released the world's first open-source AI-generated gene editor.
Profluent showcased a customizable gene editor designed from the ground up with AI and used it to achieve the first successful precision editing of the human genome with such a system.
The technology is based on the same approach that drives ChatGPT. Just as ChatGPT learns to generate language by analyzing Wikipedia articles, books, and chat logs, Profluent's technology analyzes vast amounts of biological data, including the microscopic mechanisms that scientists already use to edit human DNA, and then creates new gene editors.
The study, titled "Design of highly functional genome editors by modeling the universe of CRISPR-Cas sequences", was posted on the preprint platform bioRxiv on April 22, 2024.
Paper link: https://doi.org/10.1101/2024.04.22.590591
OpenCRISPR-1 is an AI-created gene editor consisting of a Cas9-like protein and guide RNA, developed entirely using Profluent's large language models (LLMs).
During training, the company's AI learns from large-scale sequence data and its biological context, then generates millions of diverse CRISPR-like proteins that do not exist in nature, greatly expanding almost every known CRISPR family.
Structural analysis of OpenCRISPR-1.
In an effort to democratize the technology, Profluent released OpenCRISPR-1 as an initial open-source version, making an AI-designed gene editor freely licensed for ethical research and commercial use.
Ali Madani, Co-Founder and CEO of Profluent, said:
"Trying to edit human DNA with an AI-designed biological system is a scientific moonshot. Our success bodes well for a future where AI can design exactly what is needed to create a range of customized therapies for diseases. To spur innovation and democratization in gene editing and drive the field forward, we are open-sourcing the products of this initiative."
Explore the full range of protein sequence variations in just a few hours
Until now, the protein engineering community has typically relied on discovery-based methods that copy functional proteins from nature, or has modified them iteratively through a process known as directed evolution. Many transformative proteins were discovered by chance.
The core component of the CRISPR-Cas9 gene editing system is the Cas9 protein, an RNA-guided nuclease that searches all 3 billion nucleotides of the human genome and cleaves at only one specific site.
This nuclease binds to a single guide RNA (sgRNA), which consists of a scaffold that structurally interacts with the protein and a spacer sequence that can be programmed to target any location in the genome.
Construction of the CRISPR-Cas Atlas.
Given that most Cas9 proteins are more than 1000 amino acids in length, the overall design space contains 20^1000 possible sequences, which is orders of magnitude more than the number of atoms in the observable universe. However, because these proteins must coordinate many interactions in a precise order to achieve precise cleavage, even a single erroneous mutation can completely disrupt protein function.
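To make the scale of that comparison concrete, the following is a minimal sketch of the arithmetic in Python. The figure of roughly 10^80 atoms in the observable universe is a commonly cited estimate and an assumption here, not a number from the study; the function name is illustrative.

```python
import math

AMINO_ACIDS = 20       # size of the standard amino acid alphabet
LENGTH = 1000          # approximate Cas9 length used in the text
ATOMS_LOG10 = 80       # assumption: commonly cited rough estimate, not from the study


def log10_design_space(alphabet_size: int, length: int) -> float:
    """Return log10 of alphabet_size ** length without building the huge number."""
    return length * math.log10(alphabet_size)


space_log10 = log10_design_space(AMINO_ACIDS, LENGTH)
print(f"20^1000 is roughly 10^{space_log10:.0f}")
print(f"Gap to the atom estimate: about {space_log10 - ATOMS_LOG10:.0f} orders of magnitude")
```

Working in logarithms keeps the comparison readable: the design space is on the order of 10^1301, more than 1,200 orders of magnitude beyond the atom estimate.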
Experimentally exploring all possible sequence variants is not feasible, but in a matter of hours, AI systems can navigate this search space to discover functional gene editors.
Language models generate a variety of CRISPR-Cas proteins
Generative protein language models are typically pre-trained on large, diverse datasets of natural protein sequences that cover a wide range of functions. They can generate realistic protein sequences that reflect the properties of natural proteins. However, for specific applications, such as generating novel gene editors, generation needs to be steered toward the protein families of interest.
To this end, Profluent's research team conducted exhaustive data mining to construct the most extensive dataset of CRISPR systems to date. They call this resource the CRISPR-Cas Atlas.
All told, the study found 5.1 million CRISPR-Cas proteins, extending the known natural diversity of these systems by a factor of 2.7 overall, and Cas9 in particular by a factor of 4.1.
To generate novel CRISPR-Cas proteins, the researchers trained protein language models on the CRISPR-Cas Atlas. Four million sequences were generated from this model, and bioinformatics techniques were used to remove degenerate sequences and identify which CRISPR-Cas family each generated protein belonged to. This filtered set of generated sequences is 4.8 times more diverse than the natural proteins found in the CRISPR-Cas Atlas.
Measured by the number of protein clusters, the resulting sequences greatly expand the diversity of CRISPR-related protein families.
The resulting gene editor functions in human cells
The researchers then focused on the CRISPR-Cas9 system and trained a protein language model on 238,917 Cas9 proteins from the CRISPR-Cas Atlas.
Given the widespread adoption and clinical success of SpCas9, the models were used to generate Cas9-like proteins that are interoperable with SpCas9. In other words, they recognize the same protospacer adjacent motif (PAM) in the genome and are compatible with the same sgRNA, so they can be used for the same applications.
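To illustrate what sharing a PAM means in practice, here is a minimal Python sketch that lists candidate target sites for an SpCas9-compatible nuclease. It relies on the well-known convention that SpCas9 recognizes an NGG PAM immediately 3' of a 20-nucleotide protospacer. The function name `find_spcas9_sites` and the demo sequence are hypothetical, and the sketch scans only the forward strand; it is not the method used in the study.

```python
import re


def find_spcas9_sites(dna: str, spacer_len: int = 20):
    """List (start, protospacer, pam) for forward-strand sites followed by NGG."""
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


# Hypothetical demo sequence: a 20-nt stretch followed by a TGG PAM.
demo = "A" * 20 + "TGG" + "C"
for start, protospacer, pam in find_spcas9_sites(demo):
    print(start, protospacer, pam)
```

Any generated nuclease that keeps the NGG requirement and accepts the same sgRNA could, in principle, be pointed at the same list of sites, which is what makes it a drop-in candidate for existing SpCas9 workflows.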
Then, 48 of the generated sequences were selected for rigorous functional characterization in human cells. OpenCRISPR-1 is comparable to SpCas9 in activity at the target site (55.7% for OpenCRISPR-1 and 48.3% for SpCas9), but, strikingly, shows 95% less editing at off-target sites (0.32% for OpenCRISPR-1 compared to 6.1% for SpCas9).
In addition, OpenCRISPR-1 is a highly novel protein: it differs from SpCas9 by 403 mutations and from any natural protein in the CRISPR-Cas Atlas by at least 182 mutations.
Multiple generated nucleases (green), including OpenCRISPR-1 (dark green), have on-target activity comparable to or higher than SpCas9 (blue), but much lower off-target activity.
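The 95% figure follows directly from the reported editing rates. A minimal Python sketch of the arithmetic, using only the percentages quoted above (the helper name `relative_reduction` is illustrative):

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Fractional decrease of `new` relative to `baseline`."""
    return (baseline - new) / baseline


# Editing rates in percent, as reported in the article.
on_target = {"OpenCRISPR-1": 55.7, "SpCas9": 48.3}
off_target = {"OpenCRISPR-1": 0.32, "SpCas9": 6.1}

reduction = relative_reduction(off_target["SpCas9"], off_target["OpenCRISPR-1"])
print(f"Off-target reduction: {reduction:.1%}")  # prints 94.8%, i.e. about 95%
```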
Next, the study demonstrated that OpenCRISPR-1 and SpCas9 have similar activity and specificity when paired with a deaminase to precisely edit a single base in the target genome. The researchers also showed that base editing activity can be maintained, with increased specificity, by using deaminases generated by another protein language model trained by Profluent.
OpenCRISPR-1 functions very similarly to SpCas9 when base editing with ABE8.20, a highly active engineered deaminase, and the study-generated deaminases PF-DEAM-1 and PF-DEAM-2.
Finally, to further optimize the activity of the generated nucleases, the researchers also trained a model to generate compatible sgRNAs for any given Cas9-like protein. Compared with the SpCas9 sgRNA, these generated sgRNAs increased the activity of 4 of the 5 proteins tested.
For 4 of the 5 generated nucleases tested, the sgRNAs generated using the model improved editing efficiency.
OpenCRISPR-1 is just the tip of the iceberg
The study demonstrates the world's first successful editing of the human genome using a gene editing system in which every component is designed entirely by AI.
In addition, the platform is capable of generating more gene-editing systems at will, and OpenCRISPR-1 is just the tip of the iceberg.
The team publicly released OpenCRISPR-1 to promote widespread, ethical use in research and commercial applications. In making this molecule available to the wider community, researchers hope to reduce the cost and barrier to entry for therapeutic, agricultural, and scientific applications of CRISPR-based technologies.
Peter Cameron, Vice President and Head of Gene Editing at Profluent, said, "This is a watershed moment and the beginning of an iterative process that we hope will begin to build the next generation of gene medicines. We encourage the gene editing community to stress test OpenCRISPR-1. If there are specific features that can be improved for specific applications, we want to know and can work together to optimize those features."
References:
https://twitter.com/thesismadani/status/1782510590839406904
https://www.nytimes.com/2024/04/22/technology/generative-ai-gene-editing-crispr.html
https://www.profluent.bio/blog/editing-the-human-genome-with-ai
https://www.businesswire.com/news/home/20240422399482/en/Profluent-Successfully-Edits-Human-Genome-with-OpenCRISPR-1-the-World%E2%80%99s-First-AI-Created-and-Open-Source-Gene-Editor