
Teams led by Microsoft's Hoifung Poon and the University of Washington's Sheng Wang release the first whole-slide digital pathology foundation model


Editor | ScienceAI

In recent years, the rapid development of digital pathology has become a key driver of progress in precision medicine. In cancer care, whole-slide imaging, which converts tumor tissue samples into high-resolution digital images, has become routine. These gigapixel pathology images capture rich information about the tumor microenvironment, providing unprecedented opportunities for cancer subtyping and diagnosis, survival analysis, and precision immunotherapy.

Recently, the generative AI revolution has provided a powerful solution for accurately perceiving and analyzing massive amounts of information in pathology images. At the same time, breakthroughs in multimodal generative AI technology will help understand digital pathology images from multiple scales in time and space and integrate them with other biomedical modalities, so as to better depict the evolution and development process of patients' diseases, and assist doctors in clinical diagnosis and treatment.

However, the scale, resolution, and complexity of digital pathology images make them very challenging to process and understand computationally. Each digitized whole slide contains billions of pixels, covering an area more than 100,000 times that of a typical natural image, which makes existing computer vision models difficult to apply directly. The computational cost of standard vision models such as Vision Transformers grows rapidly with input size. At the same time, clinical medical data is cross-scale, multimodal, and noisy, while most existing pathology models are trained on standard public datasets and remain a considerable distance from real-world application.
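A back-of-the-envelope calculation makes the scaling problem concrete. The sketch below uses illustrative numbers (a roughly 1-gigapixel slide of 32,000 × 32,000 pixels, treated as a sequence of 256 × 256 tiles, versus a standard 224-pixel ViT input with 16-pixel patches) to show how full self-attention cost, which is quadratic in sequence length, blows up at slide scale:

```python
# Illustrative arithmetic: why full self-attention cannot scale to a
# whole slide treated as one token sequence. All sizes are hypothetical.

def num_tokens(image_px: int, patch_px: int) -> int:
    """Number of non-overlapping patches (tokens) covering a square image."""
    per_side = image_px // patch_px
    return per_side * per_side

# A ~1-gigapixel slide tiled at 256 px vs. a 224 px natural image at 16 px.
slide_tokens = num_tokens(32_000, 256)   # 15,625 tiles as tokens
image_tokens = num_tokens(224, 16)       # 196 patches as tokens

# Full attention cost grows with the *square* of the sequence length.
ratio = slide_tokens**2 // image_tokens**2
print(slide_tokens, image_tokens, ratio)
```

Even with each tile compressed to a single token, the slide's attention matrix is thousands of times larger than a natural image's, which is why GigaPath replaces full attention with LongNet's dilated attention at the slide level.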

To this end, researchers from Microsoft Research, the Providence health network, and the University of Washington have jointly proposed GigaPath, the first whole-slide-scale digital pathology foundation model.

The GigaPath model adopts a two-stage cascade built on the LongNet architecture recently developed by Microsoft Research, which efficiently addresses processing and understanding images at the gigapixel level. Providence researchers collected 170,000 whole-slide digital pathology images, with patient consent, from 30,000 patients across 28 of its U.S. hospitals, for a total of 1.3 billion pathology tiles. Researchers at Microsoft, the University of Washington, and Providence then collaborated to pre-train GigaPath at scale on this real-world data.

The experimental results showed that GigaPath achieved leading results on 25 of 26 tasks, spanning 9 cancer classification and 17 pathomics tasks, and significantly outperformed existing methods on 18 of them.

The researchers believe that the study demonstrates the importance of full-slide scale modeling and pre-training of large-scale real-world data, and that GigaPath will open up new possibilities for more advanced cancer care and clinical discovery.

It is worth mentioning that GigaPath's model and code have been open-sourced, and researchers from all over the world are welcome to explore and use GigaPath.

The research was published in Nature on 22 May under the title "A whole-slide foundation model for digital pathology from real-world data".


Paper link: https://www.nature.com/articles/s41586-024-07441-w

Model open source address: https://huggingface.co/prov-gigapath/prov-gigapath

Code open source address: https://github.com/prov-gigapath/prov-gigapath

Method


Figure 1: Schematic diagram of the GigaPath model.

GigaPath uses a two-stage curriculum: tile-level pre-training with DINOv2, followed by slide-level pre-training with a LongNet-based masked autoencoder (see Figure 1). DINOv2 is a standard self-supervised method that combines a contrastive loss with a masked-reconstruction loss to train teacher and student Vision Transformers. However, because of the computational cost of self-attention, its application is limited to small images, such as 256 × 256 tiles.
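Before stage 1 can run, each whole slide must be cut into the fixed-size tiles that the DINOv2-style tile encoder consumes. The sketch below (not the authors' pipeline; array sizes and the edge-dropping policy are illustrative) shows this tiling step with NumPy:

```python
# A minimal tiling sketch: split an (H, W, C) slide array into
# non-overlapping 256 x 256 tiles for a tile-level encoder.
import numpy as np

def tile_slide(slide: np.ndarray, tile: int = 256) -> np.ndarray:
    """Return an (N, tile, tile, C) stack of non-overlapping tiles.

    Border pixels that do not fill a whole tile are dropped, a common
    simplification in tiling pipelines.
    """
    h, w, c = slide.shape
    nh, nw = h // tile, w // tile
    cropped = slide[: nh * tile, : nw * tile]
    tiles = cropped.reshape(nh, tile, nw, tile, c).swapaxes(1, 2)
    return tiles.reshape(nh * nw, tile, tile, c)

# A toy 1024 x 768 "slide" yields a sequence of 12 tiles for the encoder.
toy = np.zeros((1024, 768, 3), dtype=np.uint8)
print(tile_slide(toy).shape)  # (12, 256, 256, 3)
```

In stage 2, the per-tile embeddings produced by the tile encoder form the long token sequence that the LongNet slide encoder operates on.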

For slide-level modeling, the researchers adapted dilated attention from LongNet (https://arxiv.org/abs/2307.02486) to digital pathology (see Figure 2).

To handle the long sequence of image tiles spanning an entire slide, the tile sequence is subdivided into segments of progressively increasing sizes. For larger segments, LongNet introduces sparse attention whose sparsity grows in proportion to segment length, counteracting the quadratic growth of full attention. The largest segment covers the entire slide. This captures long-range dependencies in a systematic way while keeping computation tractable (linear in context length).
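The bookkeeping behind this linear cost can be sketched in a few lines. In the toy configuration below (segment lengths and dilation rates are illustrative, not the paper's exact settings), segment length and dilation rate grow at the same geometric pace, so the number of keys each query attends to stays constant per branch and every branch costs the same, far below quadratic full attention:

```python
# Simplified LongNet-style cost accounting: each "branch" splits the
# sequence into segments and applies dilated attention inside each one.

def attended_pairs(seq_len: int, segment: int, dilation: int) -> int:
    """Query-key pairs for one branch: segments of `segment` tokens,
    each query attending to every `dilation`-th token in its segment."""
    n_segments = seq_len // segment
    keys_per_query = segment // dilation
    return n_segments * segment * keys_per_query

seq_len = 16_384  # e.g. the tile sequence of one whole slide
for segment, dilation in [(2048, 1), (4096, 2), (8192, 4), (16384, 8)]:
    # segment // dilation is constant, so each branch has equal cost.
    print(segment, attended_pairs(seq_len, segment, dilation))

print(seq_len**2)  # full self-attention cost, for comparison
```

Because per-branch cost is fixed, the total over a fixed number of branches scales linearly with sequence length rather than quadratically.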


Figure 2: Schematic diagram of the LongNet model.

Main experimental results

For cancer classification and diagnosis, the task is to identify fine-grained cancer subtypes from pathology slides. For ovarian cancer, for example, the model must distinguish six subtypes: clear cell ovarian cancer, endometrioid ovarian cancer, high-grade serous ovarian cancer, low-grade serous ovarian cancer, mucinous ovarian cancer, and ovarian carcinosarcoma.

GigaPath achieved leading results on all nine cancer classification tasks, with significant accuracy gains on six of them. For six cancer types (breast, kidney, liver, brain, ovarian, central nervous system), GigaPath achieved an AUROC of 90% or higher. This is a promising start for downstream precision-health applications such as cancer diagnosis and prognosis.

In the pathomics tasks, the goal is to predict whether a tumor carries a specific clinically relevant gene mutation from whole-slide images alone. Such predictions help uncover rich connections between tissue morphology and genetic pathways that are difficult for humans to detect. Apart from a few known cancer-type/mutation pairs, how much gene-mutation signal is present in whole-slide images remains an open question. In some experiments, the researchers also considered the pan-cancer setting, i.e., identifying universal signals of gene mutations across all cancer types and highly diverse tumor morphologies.

In this challenging setting, GigaPath again achieved leading performance on 16 of the 17 tasks and significantly outperformed the runner-up on 12 of them. GigaPath can extract genetically relevant pan-cancer and subtype-specific morphological signatures at the whole-slide level, opening the door to complex future research in real-world scenarios.

In addition, the researchers demonstrated GigaPath's potential on multimodal vision-language tasks by incorporating pathology reports. Previous work on pathology vision-language pre-training has tended to focus on small, tile-level images.

Instead, GigaPath explores slide-level vision-language pre-training. Through continued pre-training on slide-report pairs, the semantics of the report are used to align the latent representations of the pathology images. This is more challenging than conventional vision-language pre-training, yet without using any fine-grained alignment between individual image tiles and text fragments, GigaPath significantly outperforms three state-of-the-art pathology vision-language models on standard vision-language tasks.
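The alignment idea can be illustrated with a generic CLIP-style contrastive objective over paired slide and report embeddings; this is a minimal sketch of that family of losses, not the paper's exact training recipe, and the random embeddings stand in for the slide and text encoders' outputs:

```python
# Symmetric InfoNCE over a batch of paired (slide, report) embeddings:
# matching pairs sit on the diagonal of the similarity matrix.
import numpy as np

def contrastive_loss(slide_emb: np.ndarray, text_emb: np.ndarray,
                     temp: float = 0.07) -> float:
    """CLIP-style loss: pull each slide toward its own report and away
    from the other reports in the batch, and vice versa."""
    s = slide_emb / np.linalg.norm(slide_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = s @ t.T / temp                    # (B, B) cosine similarities

    def xent(l: np.ndarray) -> float:          # row-wise cross-entropy,
        l = l - l.max(axis=1, keepdims=True)   # targets on the diagonal
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        idx = np.arange(len(l))
        return -logp[idx, idx].mean()

    return (xent(logits) + xent(logits.T)) / 2

rng = np.random.default_rng(0)
slides = rng.normal(size=(8, 64))   # stand-in slide-encoder outputs
reports = rng.normal(size=(8, 64))  # stand-in text-encoder outputs
print(contrastive_loss(slides, reports))
```

Minimizing this loss drives each slide embedding toward the embedding of its own report, which is what "using report semantics to align the hidden representation of pathology images" amounts to in the contrastive formulation.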

Summary

Through rich and comprehensive experiments, the researchers showed that GigaPath is a strong proof point for both slide-level pre-training and multimodal vision-language modeling.

It is worth noting that although GigaPath achieved leading results across many tasks, there is still considerable room for improvement on specific tasks. Likewise, although the researchers explored vision-language multimodal tasks, many open questions remain on the road to building a multimodal dialogue assistant for pathology.

Author Information:

GigaPath is a collaboration spanning Microsoft Research, Providence Health Systems, and the Paul G. Allen School of Computer Science at the University of Washington. Hanwen Xu, a second-year Ph.D. student affiliated with Microsoft Research and the University of Washington, and Naoto Usuyama, a principal researcher at Microsoft Research, are the co-first authors of the paper. Dr. Hoifung Poon, General Manager of the Health Futures team at Microsoft Research, Professor Sheng Wang of the University of Washington, and Dr. Carlo Bifulco of Providence are the co-corresponding authors.

Hanwen Xu: second-year Ph.D. student at the University of Washington, working at the intersection of AI and medicine. His research has been published in Nature, Nature Communications, Nature Machine Intelligence, AAAI, and elsewhere, and he has served as a reviewer for Nature Communications, Nature Computational Science, and other journals.

Sheng Wang: Assistant Professor of Computer Science at the University of Washington, focusing on the intersection of AI and medicine. His research has been published in Nature, Science, Nature Biotechnology, Nature Machine Intelligence, and The Lancet Oncology, and its translational results are in use at medical institutions including the Mayo Clinic, Chan Zuckerberg Biohub, UW Medicine, and Providence.

Hoifung Poon is General Manager of Health Futures at Microsoft Research, with research interests in foundational generative AI research and precision-medicine applications. He has won best-paper awards at several top AI venues; his open-source biomedical models published on Hugging Face have been downloaded tens of millions of times, and some of his research results are being translated into applications with partner medical institutions and pharmaceutical companies.
