Moving towards a long-standing goal of biology

DNA is the "instruction manual" that guides the functioning of life. Every human cell contains a large number of genes, yet these coding DNA sequences, the ones that can be translated directly into proteins, make up only about 1% of our entire genome. The remaining 99% is non-coding DNA, which does not carry instructions for building proteins.

But these non-coding regions are far from useless. They guide the cell's machinery in various ways, which is why they are sometimes referred to as regulatory DNA.

For example, one important function of non-coding DNA is to help turn genes on and off. DNA segments called promoters control how much of a protein is made, or whether it is made at all.

Promoter. | Image credit: NIH

What's more, these non-coding regions often mutate over time as cells copy their DNA to grow and divide. Many of these mutations are trivial and have little effect, while others subtly alter how the regions control gene expression. Some mutations may be beneficial, but others are occasionally associated with an increased risk of diseases such as type 2 diabetes and cancer.

To better understand the effects of such mutations, researchers have been working on mathematical maps that let them look at an organism's genome, predict which genes will be expressed, and determine how that expression will affect the organism's observable traits.

These maps are known as fitness landscapes, a concept proposed about a century ago. Early fitness landscapes were very simple and usually focused on a limited number of mutations. Much richer datasets are available today, but researchers still need additional tools to characterize and visualize such complex data. Doing so not only helps explain how individual genes have evolved over time, but also helps predict what changes in sequence and expression may occur in the future.

In a new study published in Nature, a team of researchers developed a framework for studying the fitness landscapes of regulatory DNA. They created a neural network model that, when trained on hundreds of millions of experimental measurements, predicts how changes in non-coding sequences in yeast affect gene expression. Predicting gene expression from DNA sequence has long been one of the most important goals of biology, and this study takes another step in that direction.

Predicting the evolution of gene regulation

In the new study, the team set out to create an unbiased model capable of predicting an organism's fitness and gene expression from any possible DNA sequence, even sequences that have never been seen before.

They chose a neural network model. To generate the dataset for training it, they inserted millions of completely random, non-coding DNA sequences into yeast and observed how each random sequence affected gene expression. The study focused on promoters, a particular subset of non-coding sequences.
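As a rough illustration of the kind of input such a model consumes, the sketch below generates a few random DNA sequences and one-hot encodes them, a common way of presenting sequences to a neural network. It is not the study's pipeline: the sequence length, library size, and function names are assumptions chosen only for the example.

```python
import random

BASES = "ACGT"

def random_sequence(length, rng):
    """Draw a uniformly random DNA sequence of the given length."""
    return "".join(rng.choice(BASES) for _ in range(length))

def one_hot(seq):
    """Encode a DNA string as rows of four 0/1 values, ordered A, C, G, T."""
    return [[1 if base == b else 0 for b in BASES] for base in seq]

# Illustrative values only: a tiny "library" of 5 random sequences, 80 bases each.
rng = random.Random(0)
library = [random_sequence(80, rng) for _ in range(5)]
encoded = [one_hot(seq) for seq in library]
```

In a real experiment, each encoded sequence would be paired with a measured expression level, and those pairs would form the training data.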

Building an accurate model was an important achievement, but for the researchers it was only a starting point. In their study, they tested the model's predictive power in a variety of ways, showing how it could help unravel the evolutionary history of certain promoters, as well as their likely future.

First, to determine whether the model could help with synthetic biology applications, such as the production of antibiotics, enzymes, and food, the researchers used it to design promoters that produce desired expression levels for a gene of interest. They then combed through a large number of scientific papers to identify fundamental evolutionary questions and tested whether the model could help answer them.
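The idea of designing a sequence for a desired expression level can be sketched as a simple search guided by a predictor. The example below is a toy under stated assumptions: `toy_predict_expression` is a made-up stand-in (the fraction of A/T bases), not the study's neural network, and the greedy search is just one possible strategy, not necessarily the one the researchers used.

```python
BASES = "ACGT"

def toy_predict_expression(seq):
    """Hypothetical stand-in for a trained model: fraction of A/T bases."""
    return sum(base in "AT" for base in seq) / len(seq)

def design_toward_target(seq, target, predict, max_steps=100):
    """Greedy search: at each step, apply the single-base substitution that
    brings the predicted expression closest to the target; stop when no
    substitution improves on the current sequence."""
    current = seq
    for _ in range(max_steps):
        best, best_err = current, abs(predict(current) - target)
        for i, old in enumerate(current):
            for new in BASES:
                if new == old:
                    continue
                candidate = current[:i] + new + current[i + 1:]
                err = abs(predict(candidate) - target)
                if err < best_err:
                    best, best_err = candidate, err
        if best == current:
            break
        current = best
    return current

# Start from an arbitrary sequence and aim for a predicted level of 0.5.
designed = design_toward_target("GGGGCCCC", 0.5, toy_predict_expression)
```

With a real model in place of the toy predictor, the same loop would propose sequences for later testing in the lab. The predictor only ranks candidates and does not replace measurement.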

The team even fed their model a real-world population dataset containing genetic information from yeast strains around the world. In this way, they sketched out the selection pressures that, over thousands of years, shaped the genomes of today's yeast.

But that was not enough. To create a truly powerful tool that can probe any genome, the researchers also needed a way to predict the evolution of non-coding sequences even without such a comprehensive population dataset.

To achieve this, the team devised a computational technique that plots the predictions of the framework in a two-dimensional graph. This shows in a very simple and intuitive way how any non-coding DNA sequence will affect gene expression and fitness without requiring any time-consuming experiments on a lab bench.

This study shows that AI can not only predict the effects of changes in regulatory DNA, but also reveal the basic principles that have governed millions of years of evolution.

An "oracle" for predicting gene expression

The study's senior author, Aviv Regev, says it is like having an "oracle": "We can now ask it, what if we tried all the possible mutations in this sequence? Or, what new sequence should we design to get the gene expression we need? Scientists can apply this model to their own evolutionary questions or scenarios, and to other problems, such as creating sequences that control gene expression in the desired way."
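The "try all the possible mutations" question can be illustrated with a small mutation scan. This is only a sketch: `toy_predict_expression` is a hypothetical stand-in that counts occurrences of the motif TATA, chosen purely for illustration, and is not the model from the study.

```python
BASES = "ACGT"

def single_mutants(seq):
    """Yield (position, new_base, mutant) for every single-base substitution."""
    for i, old in enumerate(seq):
        for new in BASES:
            if new != old:
                yield i, new, seq[:i] + new + seq[i + 1:]

def toy_predict_expression(seq):
    """Hypothetical stand-in for a trained model: number of 'TATA' occurrences."""
    return sum(1 for i in range(len(seq) - 3) if seq[i:i + 4] == "TATA")

def mutation_scan(seq, predict):
    """Return (position, new_base, predicted change) for every single mutant."""
    baseline = predict(seq)
    return [(pos, new, predict(mutant) - baseline)
            for pos, new, mutant in single_mutants(seq)]

# Scan an arbitrary 8-base example sequence: 8 positions x 3 alternatives.
scan = mutation_scan("GCTATAGC", toy_predict_expression)
```

Each entry in `scan` reports how one substitution would shift the predicted expression, which is the sort of exhaustive "what if" query the quote describes.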

The team believes that even in the short term, some applications are already clear, such as designing custom regulatory DNA for yeast used in brewing, baking, and biotechnology. In the long run, extensions of this research are also expected to help identify disease-causing mutations in human regulatory DNA, which are currently difficult to find and largely overlooked in the clinic.

Even before the study was officially published, the team had begun to receive questions from other researchers who wanted to use the model to design non-coding DNA sequences for gene therapy.

This study shows the promise of training AI models of gene regulation on richer, more complex, and more diverse datasets. Such models are also expected to help answer more of the big open questions about basic biology and genetic evolution.

#Production team:

Compiled by: M ka

Layout: Wenwen

#Source:

https://news.mit.edu/2022/oracle-predicting-evolution-gene-regulation-0311

#Image credits:

Cover image: Martin Krzywinski

First image: Martin Krzywinski