AI has successfully rewritten human DNA, and the world's first gene editor is open source!

Author: New Zhiyuan

Editor: Editorial Department

Just now, bombshell news hit the molecular biology community: human DNA has been rewritten by AI! The startup Profluent announced that it has open-sourced the world's first AI-designed gene editor, which successfully edited DNA in human cells. It sounds like science fiction. If you had the chance, would you choose to "modify" your own DNA?

AI, capable of rewriting the human genome?

Just now, the startup Profluent announced that a gene editor, designed entirely by AI, has successfully edited DNA in human cells.

In other words, the world's first molecularly precise gene editor designed from scratch with AI has been born.

Just as ChatGPT can generate poetry, Profluent's new AI system can generate blueprints for the microscopic mechanisms that edit our DNA.

The researchers trained large language models (LLMs) on the most extensive dataset of CRISPR-based gene editing systems assembled to date. The proteins these models generate expand the diversity of nearly all naturally occurring CRISPR-Cas families, by a factor of 4.8 overall.

In addition, several of the generated gene editors showed activity and specificity comparable to or better than SpCas9 (the prototypical gene editor) in human cells, while being more than 400 mutations away from it in sequence.

In other words, our own genomic code is becoming editable by design. The scientists of tomorrow will be able to fight diseases more precisely and quickly than is possible today.

Moreover, the company has also decided to release these DNA molecules freely under the OpenCRISPR protocol.

The physical structure of OpenCRISPR-1, a gene editor created by Profluent's AI technology

Ali Madani, co-founder of Profluent, said, "Trying to edit human DNA with an AI-designed biological system was a scientific moonshot."

"Our success shows that in the future, AI can accurately design a range of customized treatment options."

Some netizens said, "Is it time to reprogram humans? AI-driven CRISPR technology advancements are challenging the boundaries of genetic ethics."

If you could change your DNA, would you do it?

Modifying the genes behind anemia and blindness ourselves

The startup Profluent describes the technology in detail in this just-published paper.

Address: https://www.biorxiv.org/content/10.1101/2024.04.22.590591v1.full.pdf

The paper is expected to be presented at the annual meeting of the American Society of Gene and Cell Therapy next month.

The technology relies on the same approach that powers ChatGPT: it analyzes vast amounts of biological data, including the microscopic mechanisms that scientists already use to edit human DNA, and uses what it learns to create new gene editors.

These gene editors are based on a Nobel Prize-winning method that involves a biological mechanism called CRISPR.

CRISPR-based technology caused a sensation when it first appeared, and it has changed the way scientists study disease.

In the past, if we were unfortunate enough to suffer from genetic diseases such as sickle cell anemia and blindness, we were often helpless, but now CRISPR technology allows us to modify the genes that cause these diseases!

The CRISPR method uses a mechanism found in nature: biological material collected from bacteria, which gives these microbes the ability to fight off invading germs such as viruses.

James Fraser, professor and chair of the Department of Bioengineering and Therapeutic Sciences at the University of California, San Francisco, said that Profluent's gene editors have never existed on Earth: the AI system learned from nature how to create them, but they are new.

If these technologies continue to evolve, the resulting gene editors may become more flexible and powerful than the ones nature has honed over billions of years of evolution.

Now, Profluent says it is open-sourcing the OpenCRISPR-1 editor, which means that the editor is free for individuals, academic labs, and companies to use.

Open source, which is common in the AI world, can accelerate the development of new technologies. For biology labs and pharmaceutical companies, though, open-source releases like OpenCRISPR-1 are uncommon.

Of course, Profluent only open-sourced the gene editor generated by its AI technology, not the AI technology itself.

Time-lapse photography of human cells edited by OpenCRISPR-1

Why AI-designed proteins matter

At present, protein engineers who want a functional protein usually have to copy one from nature, or iteratively modify a natural one through "directed evolution".

Many proteins of great importance to humans were discovered by chance, such as insulin in dogs, Cas9 in yogurt production facilities, and botulinum toxin, known for causing food poisoning.

Large generative protein language models aim to capture the basic blueprint that makes natural proteins work. They offer a shortcut that bypasses the random process of evolution and lets humans deliberately design proteins for specific purposes.

The Cas9 protein, a core component of the CRISPR-Cas9 gene editing system, is an RNA-guided nuclease that searches through all 3 billion nucleotides of the human genome and cleaves the DNA at a specific site.

This nuclease forms a complex with a single guide RNA (sgRNA), which consists of a scaffold that interacts structurally with the protein and a spacer sequence that can be programmed to target a chosen site in the genome.
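As a rough illustration of how a programmable spacer defines candidate target sites, the sketch below scans one strand of a DNA string for 20-nucleotide stretches followed by an NGG motif, the PAM recognized by SpCas9. The function name `find_target_sites` and the toy sequence are hypothetical, and real guide design would also check the reverse strand and potential off-target matches.

```python
def find_target_sites(dna, spacer_len=20):
    """Return (start, protospacer, pam) for each site followed by an NGG PAM.

    Simplified sketch: forward strand only, no off-target scoring.
    """
    dna = dna.upper()
    sites = []
    # Last index at which a protospacer plus a 3-nt PAM still fits.
    last_start = len(dna) - (spacer_len + 3)
    for start in range(last_start + 1):
        pam = dna[start + spacer_len:start + spacer_len + 3]
        if pam[1:] == "GG":  # "N" can be any base, so only the GG is checked
            sites.append((start, dna[start:start + spacer_len], pam))
    return sites


# Hypothetical toy sequence: 20 A's, then the PAM "TGG", then padding.
toy_dna = "A" * 20 + "TGG" + "C" * 5
for start, protospacer, pam in find_target_sites(toy_dna):
    print(start, protospacer, pam)
```

The spacer of the sgRNA would be chosen to match the reported protospacer, which is what makes the same protein reusable across many genomic sites.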

The tricky thing is that most Cas9 proteins are more than 1000 amino acids in length, and the entire design space contains 20^1000 possible sequences, which is orders of magnitude higher than the number of atoms in the observable universe!
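The size of that design space is easy to check with logarithms. The short calculation below assumes the commonly quoted estimate of roughly 10^80 atoms in the observable universe, a figure that does not come from the paper.

```python
import math

ALPHABET_SIZE = 20      # standard amino acids
PROTEIN_LENGTH = 1000   # approximate Cas9 length cited in the text
ATOMS_LOG10 = 80        # assumption: ~10^80 atoms in the observable universe

# log10(20^1000) = 1000 * log10(20)
log10_sequences = PROTEIN_LENGTH * math.log10(ALPHABET_SIZE)
excess_orders = log10_sequences - ATOMS_LOG10

print(f"20^{PROTEIN_LENGTH} is about 10^{log10_sequences:.0f}")
print(f"That is about {excess_orders:.0f} orders of magnitude more than 10^{ATOMS_LOG10}")
```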

Moreover, because these proteins must coordinate many interactions in a precise order to achieve accurate cleavage, even a single misplaced mutation may completely eliminate the protein's function.

Testing every possible sequence variant experimentally is out of the question: scientists could not do it in many lifetimes.

AI systems, however, can navigate this search space efficiently and discover functional gene editors, and doing so takes only a few hours.

The world's first open-source gene editor that rewrites human DNA

The gene editor OpenCRISPR-1 consists of a Cas9-like protein and a guide RNA.

As mentioned earlier, it was developed entirely by Profluent's large AI model.

To build it, the researchers mined 26 TB of assembled genomes and metagenomes to compile a dataset of more than 1 million CRISPR operons.

By training on this dataset, the AI learned from large-scale sequence and biological context to generate millions of CRISPR-like proteins that do not exist in nature.

According to the researchers, the AI-generated sequences represent 4.8 times as many protein clusters as the CRISPR-Cas families discovered in nature, a substantial expansion.

Moreover, the language model also customized a single guide RNA sequence for Cas9-like effector proteins.

Compared to the prototypical gene editing effector SpCas9, several generated gene editors showed comparable or improved activity and specificity, while differing from it by more than 400 mutations in sequence.

Finally, the researchers also demonstrated that the AI-generated gene editor OpenCRISPR-1 is compatible with base editing.

The key findings from this study are as follows:

AI generates a 4.8-fold larger "CRISPR-Cas" protein universe

Generative protein language models are typically pre-trained on large datasets of natural protein sequences spanning many phylogenies and functions.

These models can generate realistic protein sequences that reflect the distribution and properties of natural proteins.

However, for specific applications, such as the generation of novel gene editors, it is necessary to direct the generation process towards a specific subset of protein families of interest.

In response, the researchers conducted exhaustive data mining to build the database.

They searched 26.2 TB of assembled microbial genomes and metagenomes and found 1,246,163 CRISPR-Cas operons.

Compared with curated databases such as CRISPRCasDB and CasPDB, as well as UniProt, the world's largest protein resource, the newly created database shows greater diversity.

Building on this data, the researchers trained a single model covering all CRISPR-Cas proteins, capable of generating diverse sequences across families.

To generate novel CRISPR-Cas proteins, the authors fine-tuned the ProGen2-base language model on the CRISPR-Cas Atlas, balancing the representation of protein families and sequence cluster sizes.

From this model, the researchers generated 4 million sequences.

Half of these were generated directly from the model, and the other half were prompted with up to 50 residues from the N- or C-terminus of a natural protein to steer generation toward specific families.
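The prompting step can be pictured as cutting a short fragment from one end of a natural protein and handing it to the model as a starting point. The helper below is a minimal sketch of that cut only; the name `make_prompt` and the toy sequence are hypothetical, and the model call itself is deliberately left out because the article does not describe its interface.

```python
def make_prompt(native_seq, terminus="N", max_len=50):
    """Return up to max_len residues from one end of a natural protein.

    terminus="N" takes the start of the chain, "C" takes the end.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if terminus == "N":
        return native_seq[:max_len]
    if terminus == "C":
        return native_seq[-max_len:]
    raise ValueError("terminus must be 'N' or 'C'")


# Hypothetical 100-residue toy protein, not a real Cas sequence.
toy_protein = "M" + "A" * 98 + "K"
print(make_prompt(toy_protein, "N"))
print(make_prompt(toy_protein, "C"))
```

Sequences shorter than the limit are returned whole, which matches the "up to 50 residues" wording.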

To assess novelty and diversity, the authors clustered the generated and natural sequences of each family at 70% identity using MMseqs2.

The generated sequences achieved a 4.8-fold expansion in diversity compared to the natural proteins in the CRISPR-Cas Atlas.

For families with few native proteins, such as Cas13 and Cas12a, the diversity of generated sequences increased by 8.4-fold and 6.2-fold, respectively.
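To make the idea of a fold expansion concrete, here is a toy sketch that clusters sequences greedily at an identity threshold and compares cluster counts. It is not MMseqs2: the position-by-position identity measure, the greedy procedure, and the definition of fold expansion as (clusters in natural plus generated) divided by (clusters in natural alone) are all simplifying assumptions made for illustration.

```python
def identity(a, b):
    """Fraction of matching positions (toy stand-in for alignment identity)."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def greedy_cluster(seqs, threshold=0.7):
    """Return cluster representatives: a sequence joins the first
    representative it matches at or above the threshold, else founds a cluster."""
    reps = []
    for s in seqs:
        if not any(identity(s, r) >= threshold for r in reps):
            reps.append(s)
    return reps


def fold_expansion(native, generated, threshold=0.7):
    """Assumed definition: clusters(native + generated) / clusters(native)."""
    native_clusters = len(greedy_cluster(native, threshold))
    all_clusters = len(greedy_cluster(native + generated, threshold))
    return all_clusters / native_clusters


# Hypothetical toy sequences, not real proteins.
native = ["AAAAAAAAAA", "AAAAAAAAAC"]
generated = ["CCCCCCCCCC", "GGGGGGGGGG", "AAAAAAAACC"]
print(fold_expansion(native, generated))
```

Natural sequences are placed first so that generated sequences only add clusters where they fall outside existing natural ones.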

In addition, when generation was prompted for a particular family, minimal context (50 residues or fewer) was enough to keep the generated sequences consistent with the family of interest.

Generating 1,000,000 Cas9-like proteins

While many CRISPR-Cas proteins have been used for genome editing, Cas9 remains one of the most widely used.

To generate new Cas9-like sequences, the researchers first prompted the CRISPR-Cas model with 50 residues sampled from the N- or C-terminus of Cas9.

The authors then used 238,917 Cas9 sequences from the CRISPR-Cas Atlas to fine-tune another language model.

This model generates viable Cas9-like sequences at roughly twice the rate (54.2%) of the CRISPR-Cas model and does not require any prompting.

To explore the potential sequence distribution of type II effectors, the researchers generated 1 million Cas9 proteins using the Cas9 model.

The viable generated sequences (n = 542,042) were clustered with natural Cas9 at 40% identity and used as input for constructing a maximum likelihood phylogenetic tree (Figure 2a).

Strikingly, the generated proteins dominate the phylogenetic landscape, accounting for 94.1% of the total phylogenetic diversity.

This is a 10.3-fold increase in diversity compared to the entire CRISPR-Cas Atlas (Figure 2b).

The new phylogenetic groups are distributed throughout the tree, suggesting that the model captures the full diversity of Cas9 and has not overfit to any particular lineage.

The generated sequences differ greatly from those in the CRISPR-Cas Atlas, with an average identity of only 56.8% to the closest natural sequence (Figure 2c).

Overall, the generated sequences closely matched the length of natural proteins in the same protein cluster, with a Pearson correlation of 0.97 (Figure 2d).
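A Pearson correlation measures how closely two sets of values follow a straight-line relationship, with 1 meaning a perfect positive fit. The sketch below computes it from the standard formula on made-up length pairs; the numbers are hypothetical and are not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical lengths (in residues) of natural proteins and of generated
# proteins assigned to the same clusters.
natural_lengths = [1050, 1100, 1368, 1400, 1600]
generated_lengths = [1040, 1120, 1350, 1410, 1580]
print(f"r = {pearson(natural_lengths, generated_lengths):.2f}")
```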

In addition, Figure 2e shows the on-target and off-target editing efficiency of natural Cas9, ancestral sequence reconstructions, and 48 generated proteins. Figure 2f compares natural Cas9, ancestral sequence reconstructions, and generated proteins in terms of on-target editing efficiency and specificity.

Generated gene editors that function in human cells

The researchers then narrowed their focus to the CRISPR-Cas9 system and trained a protein language model on the 238,917 Cas9 proteins in the CRISPR-Cas Atlas.

Using these models, the researchers generated Cas9-like proteins that are interoperable with SpCas9. That is, they recognize the same protospacer-adjacent motif (PAM) in the genome and are compatible with the same sgRNA, so they can be used for the same applications.

The researchers selected 48 of these generated sequences for rigorous functional characterization in human cells.

OpenCRISPR-1, the top candidate, has on-target activity comparable to SpCas9 (55.7% for OpenCRISPR-1 and 48.3% for SpCas9), but 95% less editing at off-target sites (0.32% for OpenCRISPR-1 and 6.1% for SpCas9).
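The 95% figure follows directly from the two off-target rates quoted above. The short check below uses only those reported percentages; the on-target to off-target ratio it also prints is an informal comparison added here for illustration, not a specificity metric defined in the paper.

```python
def relative_reduction(baseline, new):
    """Fractional drop from baseline to new."""
    return (baseline - new) / baseline


# Editing rates in percent, as quoted in the article.
on_target = {"OpenCRISPR-1": 55.7, "SpCas9": 48.3}
off_target = {"OpenCRISPR-1": 0.32, "SpCas9": 6.1}

reduction = relative_reduction(off_target["SpCas9"], off_target["OpenCRISPR-1"])
print(f"Off-target reduction vs. SpCas9: {reduction:.1%}")

for name in on_target:
    ratio = on_target[name] / off_target[name]
    print(f"{name}: on-target / off-target ratio = {ratio:.1f}")
```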

In addition, OpenCRISPR-1 is a highly novel protein: it is 403 mutations away from SpCas9 and 182 mutations away from any natural protein in the CRISPR-Cas Atlas.

Multiple generated nucleases (green), including OpenCRISPR-1 (dark green), have on-target activity comparable to or higher than SpCas9 (blue), but much lower off-target activity

The researchers also found that OpenCRISPR-1 and SpCas9 have similar activity and specificity when paired with deaminases to precisely edit a single base in the target genome.

Base-editing activity was also maintained, with increased specificity, when using deaminases generated by another protein language model trained by Profluent.

OpenCRISPR-1 performs very similarly to SpCas9 in base editing with ABE8.20, a highly active engineered deaminase, and with the generated deaminases PF-DEAM-1 and PF-DEAM-2

Finally, to further optimize the activity of the generated nucleases, a model was trained to generate compatible sgRNAs for any given Cas9-like protein.

Compared with the sgRNA of SpCas9, these generated sgRNAs increased the activity of four of the five generated nucleases tested.

For 4 of the 5 generated nucleases tested, the sgRNA generated using the model improved editing efficiency

AI is improving healthcare

Now, there are many projects around the world that are using AI technology to improve healthcare.

For example, scientists at the University of Washington are using the methods behind ChatGPT and Midjourney to create entirely new proteins and are working to accelerate the development of new vaccines and drugs.

Much of today's red-hot generative AI is powered by neural networks, which acquire skills by analyzing large amounts of data.

Midjourney, for example, is built on a neural network that analyzed millions of digital images along with the captions describing each one. In this way, the system learned to recognize the connection between image and text, and it can draw a picture from a prompt like "a rhinoceros jumps off the Golden Gate Bridge".

Profluent's technology is also powered by a similar AI model.

This model learns from sequences of amino acids and nucleic acids, the chemical compounds that define the microscopic biological mechanisms scientists use to edit genes.

Essentially, it analyzes the behavior of CRISPR gene editors extracted from nature and learns how to generate entirely new gene editors.

According to Ali Madani, CEO of Profluent, these AI models learn from sequences, whether those are sequences of characters, words, computer code, or amino acids.

Mr. Madani at Profluent's lab in Berkeley, California. He previously worked at the AI lab of software giant Salesforce

How far are we from editing human genes?

Currently, Profluent has not conducted clinical trials on these synthetic gene editors, so it is unclear whether they can match or even surpass the performance of CRISPR.

But their research shows that AI models can produce something capable of editing the human genome.

Still, this outcome is unlikely to impact healthcare in the short term.

Fyodor Urnov, a gene editing pioneer and scientific director at the Innovative Genomics Institute at the University of California, Berkeley, said scientists have no shortage of naturally occurring gene editors to fight disease.

The real bottleneck, he said, is the extremely high cost of taking an editor through safety testing, manufacturing, and regulatory review before it can be used in clinical treatment.

However, as generative AI systems learn from more and more data, their potential should not be underestimated.

If Profluent's technology continues to improve, one day scientists will be able to edit genes in a more precise way.

At that point, we may be in a world where many medications and treatments can be quickly tailored to the individual, something people today hardly dare to imagine.

"I dream of a world where we can have CRISPR available on-demand in a matter of weeks," says Dr. Urnov.

Another big question: is CRISPR risky?

Scientists have long warned against using CRISPR for human augmentation.

Because this is a relatively new technology, it may have undesirable side effects, such as causing cancer. And some people could use it for unethical purposes, such as genetically modifying human embryos.

Synthetic gene editors face the same problem. Scientists today already have everything they need to edit embryos.

But Dr. Fraser says that if anyone really wants to do something bad with them, they will only use what they have, not AI-created editors.
