
Nature – Transfer learning makes it possible to predict gene interaction networks

Author: Seishin Treasure Book

Mapping the gene regulatory networks that drive disease progression makes it possible to identify the core genes that regulate a disease, offering a more effective route to treatment.

On May 31, 2023, an article titled "Transfer learning enables predictions in network biology" was published in Nature.


Summary

Building gene networks requires large amounts of transcriptome data to learn the relationships between genes, which limits their use in settings where data are scarce, including rare diseases and diseases of clinically inaccessible tissues. Recently, transfer learning has revolutionized fields such as natural language understanding and computer vision: deep learning models pre-trained on large-scale general datasets are fine-tuned with limited task-specific data, enabling them to be applied to a wide range of downstream tasks. Here, the authors develop Geneformer, a context-aware deep learning model based on the attention mechanism. Geneformer was pre-trained on a massive corpus of approximately 30 million single-cell transcriptomes so that it could make predictions on downstream network-biology tasks with limited data. During pre-training, Geneformer gained a fundamental understanding of network dynamics, encoding the network hierarchy in the model's attention weights in a completely self-supervised manner. When fine-tuned for a diverse panel of downstream tasks relevant to chromatin and network dynamics, Geneformer consistently improved predictive accuracy using limited task-specific data. Applied to disease modeling with limited patient data, Geneformer identified candidate therapeutic targets for cardiomyopathy. Overall, Geneformer is a pre-trained deep learning model that can be fine-tuned for a broad range of downstream applications, facilitating the discovery of key network regulators and candidate therapeutic targets.
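To make the pre-train/fine-tune pattern concrete, below is a minimal PyTorch sketch of the transfer-learning workflow the summary describes: a pre-trained transformer encoder is reused as-is, a small randomly initialized head is attached, and the whole model is trained briefly on scarce labeled data. The class and function names are illustrative assumptions, not Geneformer's actual API.

```python
# Minimal sketch of the pre-train / fine-tune pattern described above.
# The encoder stands in for any BERT-style model whose weights were learned
# during large-scale self-supervised pre-training; names are illustrative.
import torch
import torch.nn as nn

class FineTuneClassifier(nn.Module):
    """Pre-trained transformer encoder plus a small task-specific head."""
    def __init__(self, pretrained_encoder: nn.Module, hidden_dim: int, n_classes: int):
        super().__init__()
        self.encoder = pretrained_encoder             # weights from pre-training
        self.head = nn.Linear(hidden_dim, n_classes)  # randomly initialized head

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.encoder(token_ids)        # (batch, seq_len, hidden_dim)
        cell_embedding = hidden.mean(dim=1)     # pool gene embeddings into a cell embedding
        return self.head(cell_embedding)        # task-specific logits

def fine_tune(model: nn.Module, loader, epochs: int = 3, lr: float = 5e-5):
    # A low learning rate nudges the pre-trained weights rather than overwriting them,
    # which is what lets a small task-specific dataset suffice.
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for token_ids, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(token_ids), labels)
            loss.backward()
            opt.step()
```

The design point is that only the small head starts from scratch; the encoder carries over the network knowledge gained during pre-training, so far fewer labeled examples are needed downstream.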


Figure 1 Geneformer architecture and transfer learning strategy. a, Flow chart of the transfer learning approach. b, Tissue distribution of the ~30 million transcriptomes in the pre-training corpus. c, The pre-trained Geneformer architecture.

Discussion

The researchers developed Geneformer, a context-aware deep learning model that, through pre-training on large-scale transcriptome data, can make predictions in data-limited settings. By observing a vast number of cellular states during pre-training, Geneformer gained a fundamental understanding of network dynamics and encoded the network hierarchy in its attention weights in a completely self-supervised manner. Geneformer's ability to predict dose-sensitive disease genes through context-aware in silico deletion provides a valuable resource for interpreting genetic variation, including prioritizing GWAS targets that drive complex traits and predicting the specific tissues in which they act. Experimental validation of the dose-sensitive candidate gene TEAD4 in fetal cardiomyocytes supports Geneformer's role in advancing research in human developmental biology.
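The in silico deletion idea mentioned above can be sketched as follows: remove one gene from a cell's tokenized gene encoding, re-embed the cell, and measure how far the embedding moves; genes whose removal shifts the embedding strongly are predicted to be dose-sensitive in that context. This is a hedged illustration of the concept, not the paper's exact procedure; the input format and pooling are assumptions carried over from the sketch above.

```python
# Hedged sketch of in silico deletion: delete one gene from a cell's
# tokenized encoding and measure the resulting shift in the cell embedding.
import torch
import torch.nn.functional as F

def embedding_shift(encoder: torch.nn.Module,
                    gene_ids: torch.Tensor,
                    target_gene: int) -> float:
    """Cosine distance between a cell's embedding before and after deleting one gene."""
    with torch.no_grad():
        original = encoder(gene_ids.unsqueeze(0)).mean(dim=1)      # baseline cell embedding
        deleted = gene_ids[gene_ids != target_gene]                # drop the gene's token
        perturbed = encoder(deleted.unsqueeze(0)).mean(dim=1)      # re-embed the perturbed cell
    # Larger distance means the gene's presence matters more in this cell context.
    return 1.0 - F.cosine_similarity(original, perturbed).item()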


Figure 2 With limited data, Geneformer improved the prediction of gene dosage sensitivity.

When modeling cardiomyopathy with limited patient samples, Geneformer predicted candidate therapeutic targets, which were then validated experimentally in an iPSC disease model: CRISPR-mediated knockout of the candidate gene TEAD4 in iPSC-derived cardiac microtissues significantly reduced their ability to generate contractile stress (force per unit area; Figure 2e).

Therefore, in silico analysis using limited data may be useful for discovering treatments for rare diseases that have previously been hampered by data scarcity, or for diseases affecting tissues that are clinically difficult to access. In addition, the authors found that pre-training with a larger and more diverse corpus consistently improved Geneformer's predictive power, and that exposure to hundreds of experimental datasets during pre-training also appears to improve the robustness of single-cell analyses that are otherwise susceptible to batch effects and inter-individual variation. These findings suggest that as publicly available transcriptome data continue to expand, future models pre-trained on even larger corpora may yield meaningful predictions for studies that are currently hard to explore.
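One way to turn the disease-modeling idea into a target-discovery loop, sketched below under the same assumptions as the earlier snippets: for each candidate gene, delete it in silico in diseased cells and score whether the perturbed embedding moves toward the healthy-cell centroid. All names here are illustrative, not the paper's implementation.

```python
# Illustrative sketch: rank candidate therapeutic targets by whether their
# in silico deletion shifts diseased-cell embeddings toward the healthy state.
import torch
import torch.nn.functional as F

def rank_targets(encoder: torch.nn.Module,
                 diseased_cells: list,          # list of 1-D gene-token tensors
                 healthy_centroid: torch.Tensor,  # (1, hidden_dim) mean healthy embedding
                 candidate_genes: list) -> list:
    """Return candidate genes sorted by mean embedding shift toward the healthy centroid."""
    scores = {}
    with torch.no_grad():
        for gene in candidate_genes:
            shifts = []
            for gene_ids in diseased_cells:
                base = encoder(gene_ids.unsqueeze(0)).mean(dim=1)
                pert = encoder(gene_ids[gene_ids != gene].unsqueeze(0)).mean(dim=1)
                # Positive score: the perturbed cell resembles healthy cells more than before.
                shifts.append((F.cosine_similarity(pert, healthy_centroid)
                               - F.cosine_similarity(base, healthy_centroid)).item())
            scores[gene] = sum(shifts) / len(shifts)
    return sorted(scores, key=scores.get, reverse=True)
```

Top-ranked genes from such a screen would then be candidates for experimental follow-up, analogous to the CRISPR validation of TEAD4 described above.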

Read the original article:

https://www.nature.com/articles/s41586-023-06139-9
