New research from the Chinese Academy of Sciences team: Artificial intelligence helps to identify tissue substructures from spatially resolved transcriptomics

Edit | Radish peel

Recent advances in spatially resolved transcriptomics have made it possible to comprehensively measure gene expression patterns while preserving the spatial context of tissue microenvironments. Deciphering the spatial context of spots in a tissue requires careful use of their spatial information.

To this end, researchers at the Chinese Academy of Sciences have developed STAGATE, a graph attention autoencoder framework that learns low-dimensional latent embeddings by integrating spatial information and gene expression profiles in order to identify spatial domains accurately. To better characterize spatial similarity at the boundaries of spatial domains, STAGATE uses an attention mechanism to adaptively learn the similarity of neighboring spots, together with an optional cell type-aware module built by pre-clustering gene expression.

The researchers validated STAGATE on spatial transcriptomics datasets generated by different platforms at different spatial resolutions. STAGATE can substantially improve the accuracy of spatial domain identification and denoise the data while preserving spatial expression patterns. Importantly, STAGATE can be extended to multiple consecutive sections to reduce batch effects between sections and to extract three-dimensional (3D) expression domains efficiently from the reconstructed 3D tissue.

The study, titled "Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder," was published in Nature Communications on April 1, 2022.

The function of complex tissues is fundamentally related to the spatial context of different cell types. The relative locations of transcriptional expression in a tissue are critical to understanding its biological function and describing interactive biological networks. Breakthrough technologies in spatially resolved transcriptomics (ST), such as 10x Visium, Slide-seq, Stereo-seq, and PIXEL-seq, have enabled genome-wide profiling of gene expression at captured locations (spots) at a resolution of several cells or even at subcellular levels.

Deciphering spatial domains (that is, regions with similar spatial expression patterns) is one of the great challenges facing ST. For example, the laminar organization of the human cerebral cortex is closely tied to its biological function: cells located in different cortical layers tend to differ in expression, morphology, and physiology. Most existing clustering methods do not use the available spatial information effectively.

These non-spatial methods fall roughly into two categories. The first uses traditional clustering methods such as k-means and the Louvain algorithm. Depending on the resolution of the ST technology, these methods are limited by the small number of spots or by data sparsity, and the resulting clusters may be discontinuous in the tissue section. The second category uses cell-type signatures defined by single-cell RNA-seq to deconvolve spots. These methods are not suitable for ST data at cellular or subcellular resolution.

Recently, some new algorithms have adapted clustering methods to consider similarities between adjacent spots and so better account for the spatial dependence of gene expression. These methods show a significant improvement in identifying spatial domains in parts of the brain and in cancer tissue. For example, BayesSpace is a Bayesian statistical method that encourages neighboring spots to belong to the same cluster by introducing the spatial neighbor structure into the prior. Giotto identifies spatial domains by implementing a hidden Markov random field (HMRF) model with a spatial neighbor prior. stLearn defines a morphological distance based on features extracted from histological images and uses this distance, together with the spatial neighbor structure, to smooth gene expression. SEDR uses a deep autoencoder network to learn gene representations and a variational graph autoencoder to embed spatial information at the same time.

SpaGCN also applies graph convolutional networks to integrate gene expression and spatial location, combined with a self-supervised module to identify domains. In addition, a recently developed method named RESEPT uses a supervised image segmentation method to identify tissue architecture. Although these methods take the spatial structure of ST data into account, they define the similarity of neighboring spots before training, so it cannot be learned adaptively.

In addition, these methods do not consider the spatial similarity of spots at the boundaries of spatial domains, and they do not integrate spatial information to impute and denoise gene expression. More importantly, these methods cannot be applied to multiple consecutive sections to reconstruct a three-dimensional (3D) ST model and extract 3D expression domains.

Figure: Overview of STAGATE. (Source: paper)

Here, the researchers developed STAGATE, a fast and user-friendly method for identifying spatial domains that integrates seamlessly into standard analysis workflows by taking the "AnnData" object of the Scanpy package as input. STAGATE converts spatial location information into a spatial neighbor network (SNN) and then uses a graph attention autoencoder to integrate the SNN with the expression profiles.
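The idea of turning spot coordinates into a spatial neighbor network can be sketched in a few lines. This is a minimal illustration only, not STAGATE's own code: the function name `build_spatial_network`, the radius-based rule, and the toy coordinates are all assumptions, and plain Python is used to keep the sketch dependency-free.

```python
import math

def build_spatial_network(coords, radius):
    """Return an undirected neighbor map: spot index -> list of neighbor indices.

    Two spots are linked when their Euclidean distance is at most `radius`.
    The O(n^2) loop is for clarity only; large datasets would need a spatial index.
    """
    neighbors = {i: [] for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= radius:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors

# Hypothetical 2D spot coordinates in arbitrary units (not taken from the paper).
spots = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
snn = build_spatial_network(spots, radius=1.2)
print(snn)
```

In this toy layout the first spot is linked to its two nearby spots, while the distant fourth spot has no neighbors. In practice the radius would be chosen per platform so that each spot has a reasonable number of neighbors.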

Figure: STAGATE improves the identification of layer structures in human dorsolateral prefrontal cortex (DLPFC) tissue. (Source: paper)

The researchers tested STAGATE's performance on various ST datasets generated by different platforms at different spatial resolutions. They found that it precisely revealed the laminar organization of the dorsolateral prefrontal cortex (DLPFC) and of the mouse olfactory bulb. In addition, STAGATE identified the known tissue structures of the hippocampus and clearly delineated its spatial domains. Its ability to denoise expression was also demonstrated by comparison with ISH images. Finally, they illustrated the ability of STAGATE to mitigate batch effects between consecutive sections of a pseudo-3D ST model and to extract 3D expression domains.

Figure: STAGATE improves the identification of known tissue structures in mouse hippocampal tissue.

STAGATE's success is largely due to its use of a graph attention mechanism to take spatial neighbor information into account. However, the current version of STAGATE focuses on integrating expression profiles with spatial information and does not use histological images. Existing methods that take histological images as input, such as stLearn, did not perform well in the comparison. stLearn uses a pre-trained neural network to extract image features and then computes a morphological distance as a cosine distance. The researchers argue that this predefined approach does not take advantage of the flexibility of deep learning, and that the attention mechanism could be extended to integrate histological image features conveniently.
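To make the role of attention concrete, here is a simplified sketch of how attention weights over a spot's neighbors can be computed and used for aggregation. It is an illustration under stated assumptions, not the paper's formulation: the dot-product score stands in for a learned scoring function, and the embeddings and function names are hypothetical.

```python
import math

def attention_weights(center, neighbor_feats):
    """Softmax-normalized dot-product scores between one spot and its neighbors."""
    scores = [sum(c * x for c, x in zip(center, feat)) for feat in neighbor_feats]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(neighbor_feats, weights):
    """Attention-weighted average of the neighbor embeddings."""
    dim = len(neighbor_feats[0])
    return [
        sum(w * feat[d] for w, feat in zip(weights, neighbor_feats))
        for d in range(dim)
    ]

# Hypothetical 2D embeddings: neighbors 0 and 1 resemble the center spot,
# neighbor 2 does not (for example, it lies across a domain boundary).
center = [1.0, 0.0]
neighbor_feats = [[0.9, 0.1], [1.0, 0.0], [-1.0, 0.2]]
weights = attention_weights(center, neighbor_feats)
smoothed = aggregate(neighbor_feats, weights)
print(weights, smoothed)
```

The dissimilar neighbor receives the smallest weight, so it contributes least to the aggregated representation. That is the intuition behind learning neighbor similarity adaptively rather than fixing it before training.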

In the study, the researchers focused primarily on sequencing-based ST data that are not at single-cell resolution. They further applied STAGATE to an image-based ST dataset at single-cell resolution, generated by the STARmap technology, that includes the expression of 1020 genes in 1207 cells. Using the expert-annotated structure as the gold standard, STAGATE achieved the highest clustering accuracy (ARI = 0.544) among the methods compared, ahead of the other five, with SpaGCN ranking second (ARI = 0.484).
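The ARI (adjusted Rand index) used above measures agreement between two labelings, corrected for chance: 1 means identical partitions and values near 0 mean chance-level agreement. The sketch below implements the standard pair-counting formula in plain Python. The labels in the example are hypothetical and unrelated to the STARmap data.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index computed from the contingency table of two labelings."""
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions trivially agree
    # Pairs of items that share a cluster in both labelings.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs of items that share a cluster in each labeling separately.
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings form a single cluster
    return (sum_cells - expected) / (max_index - expected)

# Hypothetical annotation versus predicted domains for six spots.
annotated = ["L1", "L1", "L2", "L2", "L3", "L3"]
predicted = [0, 0, 1, 1, 1, 2]
ari = adjusted_rand_index(annotated, predicted)
print(round(ari, 3))
```

Because the index depends only on which items are grouped together, renaming the predicted clusters does not change the score.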

In addition, given the link between spatial domain identification and single-cell segmentation of image-based ST data, the researchers expect that the ideas behind STAGATE could be extended in the near future to the single-cell segmentation tasks that arise in subcellular-resolution technologies. It would also be desirable to improve its applicability to datasets generated by new technologies.

Figure: STAGATE enhances the spatial patterns of layer-marker genes in the DLPFC dataset. (Source: paper)

STAGATE can handle ST data at different spatial resolutions. In general, STAGATE performs better on ST data of cellular or subcellular resolution because of the high similarity between neighboring spots. For technologies with relatively low spatial resolution, the team introduced a cell type-aware module to describe heterogeneous spatial similarity. However, one potential limitation of STAGATE is that, across multiple sections, it treats neighbors within the same section and neighbors from different sections in the same way. Future work may employ heterogeneous networks to better characterize 3D tissue models.
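One plausible way to picture the module that accounts for cell types is as a pruning step: after pre-clustering gene expression, edges between spots that fall into different pre-clusters are dropped from the neighbor network. The sketch below illustrates that reading only. The function name, the pruning rule as written, and the toy data are assumptions and may differ from the published implementation.

```python
def prune_by_precluster(neighbors, precluster):
    """Keep only the edges whose two endpoints share a pre-cluster label."""
    return {
        spot: [nb for nb in nbs if precluster[nb] == precluster[spot]]
        for spot, nbs in neighbors.items()
    }

# Hypothetical neighbor map (spot -> neighboring spots) along a strip of tissue.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
# Hypothetical pre-clustering of gene expression: spots 0-1 versus spots 2-3.
precluster = ["A", "A", "B", "B"]
pruned = prune_by_precluster(neighbors, precluster)
print(pruned)
```

Here the edge between spots 1 and 2 is removed because it crosses the pre-cluster boundary, so spots on either side no longer smooth each other's expression.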

As spatial resolution and data scale increase, computational methods must meet basic requirements for efficiency and scalability. The researchers recorded the running time of STAGATE on real datasets: it takes only about 40 minutes to process the largest real dataset, which has more than 50k spots. They also benchmarked STAGATE's running time and memory usage on simulated datasets of different sizes, with spots arranged as on the 10x Visium chip.

Numerical experiments show that STAGATE is fast, taking less than 40 minutes and using about 4GB of GPU memory to process datasets with 50k spots. However, GPU memory usage grows almost linearly with the number of spots and can become a bottleneck that limits the application of STAGATE to massive datasets. Future work is expected to improve the scalability of STAGATE by introducing a subgraph-based training strategy.

Figure: STAGATE can mitigate batch effects between consecutive sections by incorporating a 3D spatial network. (Source: paper)
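A 3D spatial network of this kind can be sketched by linking spots within each section and, separately, linking spots in adjacent sections whose aligned positions are close. The sketch below is illustrative only: the function name, the two radius parameters, the assumption that sections are already aligned, and the coordinates are all hypothetical.

```python
import math

def build_3d_network(spots, rad_2d, rad_z):
    """Build edges for spots given as (section_id, x, y) tuples.

    Spots in the same section are linked when their 2D distance is at most
    `rad_2d`; spots in adjacent sections are linked when their aligned 2D
    distance is at most `rad_z`. Each edge is tagged "within" or "between".
    """
    edges = []
    for i in range(len(spots)):
        for j in range(i + 1, len(spots)):
            sec_i, xi, yi = spots[i]
            sec_j, xj, yj = spots[j]
            dist = math.dist((xi, yi), (xj, yj))
            if sec_i == sec_j and dist <= rad_2d:
                edges.append((i, j, "within"))
            elif abs(sec_i - sec_j) == 1 and dist <= rad_z:
                edges.append((i, j, "between"))
    return edges

# Hypothetical aligned spots from two consecutive sections (section 0 and section 1).
spots = [(0, 0.0, 0.0), (0, 1.0, 0.0), (1, 0.0, 0.2), (1, 3.0, 0.0)]
edges = build_3d_network(spots, rad_2d=1.5, rad_z=0.5)
print(edges)
```

Tagging each edge with its type is a design choice of this sketch. A model that ignores the tag treats both kinds of neighbor identically, whereas keeping the tag would allow within-section and between-section neighbors to be modeled differently.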

In addition, STAGATE is capable of detecting spatially variable genes within spatial domains. Existing algorithms for identifying spatially variable genes, such as SPARK-X, do not consider spatial domain information, which makes it difficult to identify genes that are expressed specifically within small tissue structures. To illustrate this, the researchers compared the differentially expressed genes of STAGATE's spatial domains with the genes identified by SPARK-X on the Slide-seq V2 dataset from mouse olfactory bulb tissue.

Specifically, STAGATE identified 959 domain-specific genes and SPARK-X identified 2479 spatially variable genes.

The gene sets identified by the two methods overlap substantially, but SPARK-X misses genes specific to some small tissue structures. For example, the mitral cell marker Gabra1 is significantly enriched in the mitral cell layer (MCL) domain, but SPARK-X does not recognize its spatial pattern. The Nefh gene is also strongly expressed in the MCL domain. The researchers expect that STAGATE can facilitate the identification of tissue structures and the discovery of the corresponding gene markers.

"With the rapid development of spatial omics technologies and the continuous accumulation of data, this new model, STAGATE, can facilitate the accurate analysis of large-scale spatial transcriptomics data and advance our understanding of tissue substructure," the authors said.

STAGATE Open Source Links:

https://github.com/zhanglabtools/STAGATE

https://doi.org/10.5281/zenodo.6330702

Related: https://medicalxpress.com/news/2022-04-artificial-intelligence-tissue-substructure-identification.html
