
Teams led by Microsoft's Hoifung Poon and the University of Washington's Sheng Wang release the first whole-slide digital pathology foundation model


Editor | ScienceAI

In recent years, the rapid development of digital pathology has become a key driver of progress in precision medicine. In cancer care, whole-slide imaging, which converts tumor tissue samples into high-resolution digital images, has become routine. These gigapixel pathology images capture rich information about the tumor microenvironment, providing unprecedented opportunities for cancer subtyping and diagnosis, survival analysis, and precision immunotherapy.

Recently, the generative AI revolution has provided a powerful solution for accurately perceiving and analyzing massive amounts of information in pathology images. At the same time, breakthroughs in multimodal generative AI technology will help understand digital pathology images from multiple scales in time and space and integrate them with other biomedical modalities, so as to better depict the evolution and development process of patients' diseases, and assist doctors in clinical diagnosis and treatment.

However, the scale, resolution, and complexity of digital pathology images make them very challenging to process and understand computationally. Each digitized whole slide contains billions of pixels, covering an area more than 100,000 times that of a typical natural image, which makes existing computer vision models difficult to apply directly. The computational cost of standard vision models such as Vision Transformers grows rapidly with input size. At the same time, clinical medical data is cross-scale, multimodal, and noisy, while most existing pathology models are trained on standard public datasets and remain a considerable distance from real-world application.
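A back-of-the-envelope calculation makes the scaling problem concrete. The sketch below uses illustrative numbers (a roughly 1-gigapixel slide of 32,000 × 32,000 pixels, treated as a sequence of 256 × 256 tiles, versus a standard 224-pixel ViT input with 16-pixel patches) to show how full self-attention cost, which is quadratic in sequence length, blows up at slide scale:

```python
# Illustrative arithmetic: why full self-attention cannot scale to a
# whole slide treated as one token sequence. All sizes are hypothetical.

def num_tokens(image_px: int, patch_px: int) -> int:
    """Number of non-overlapping patches (tokens) covering a square image."""
    per_side = image_px // patch_px
    return per_side * per_side

# A ~1-gigapixel slide tiled at 256 px vs. a 224 px natural image at 16 px.
slide_tokens = num_tokens(32_000, 256)   # 15,625 tiles as tokens
image_tokens = num_tokens(224, 16)       # 196 patches as tokens

# Full attention cost grows with the *square* of the sequence length.
ratio = slide_tokens**2 // image_tokens**2
print(slide_tokens, image_tokens, ratio)
```

Even with each tile compressed to a single token, the slide's attention matrix is thousands of times larger than a natural image's, which is why GigaPath replaces full attention with LongNet's dilated attention at the slide level.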

To this end, researchers from Microsoft Research, the Providence health network, and the University of Washington have jointly proposed GigaPath, the first whole-slide-scale digital pathology foundation model.

The GigaPath model adopts a two-stage cascade built on the LongNet architecture recently developed by Microsoft Research, which efficiently addresses processing and understanding images at the gigapixel level. Providence researchers collected 170,000 whole-slide digital pathology images, with patient consent, from 30,000 patients across 28 of its U.S. hospitals, for a total of 1.3 billion pathology tiles. Researchers at Microsoft, the University of Washington, and Providence then collaborated to pre-train GigaPath at scale on this real-world data.

The experimental results showed that GigaPath achieved leading results on 25 of 26 tasks, spanning 9 cancer classification and 17 pathomics tasks, and significantly outperformed existing methods on 18 of them.

The researchers believe that the study demonstrates the importance of full-slide scale modeling and pre-training of large-scale real-world data, and that GigaPath will open up new possibilities for more advanced cancer care and clinical discovery.

It is worth mentioning that GigaPath's model and code have been open-sourced, and researchers from all over the world are welcome to explore and use GigaPath.

The research was published in Nature on 22 May under the title "A whole-slide foundation model for digital pathology from real-world data".


Paper link: https://www.nature.com/articles/s41586-024-07441-w

Model open source address: https://huggingface.co/prov-gigapath/prov-gigapath

Code open source address: https://github.com/prov-gigapath/prov-gigapath

Method


Figure 1: Schematic diagram of the GigaPath model.

GigaPath uses a two-stage curriculum: tile-level pre-training with DINOv2, followed by slide-level pre-training with a LongNet-based masked autoencoder (see Figure 1). DINOv2 is a standard self-supervised method that combines a contrastive loss with a masked-reconstruction loss to train teacher and student Vision Transformers. However, because of the computational cost of self-attention, its application is limited to small images, such as 256 × 256 tiles.
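Before stage 1 can run, each whole slide must be cut into the fixed-size tiles that the DINOv2-style tile encoder consumes. The sketch below (not the authors' pipeline; array sizes and the edge-dropping policy are illustrative) shows this tiling step with NumPy:

```python
# A minimal tiling sketch: split an (H, W, C) slide array into
# non-overlapping 256 x 256 tiles for a tile-level encoder.
import numpy as np

def tile_slide(slide: np.ndarray, tile: int = 256) -> np.ndarray:
    """Return an (N, tile, tile, C) stack of non-overlapping tiles.

    Border pixels that do not fill a whole tile are dropped, a common
    simplification in tiling pipelines.
    """
    h, w, c = slide.shape
    nh, nw = h // tile, w // tile
    cropped = slide[: nh * tile, : nw * tile]
    tiles = cropped.reshape(nh, tile, nw, tile, c).swapaxes(1, 2)
    return tiles.reshape(nh * nw, tile, tile, c)

# A toy 1024 x 768 "slide" yields a sequence of 12 tiles for the encoder.
toy = np.zeros((1024, 768, 3), dtype=np.uint8)
print(tile_slide(toy).shape)  # (12, 256, 256, 3)
```

In stage 2, the per-tile embeddings produced by the tile encoder form the long token sequence that the LongNet slide encoder operates on.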

For slide-level modeling, the researchers adapted dilated attention from LongNet (https://arxiv.org/abs/2307.02486) to digital pathology (see Figure 2).

To handle the long sequence of image tiles spanning an entire slide, the tile sequence is subdivided into segments of progressively increasing sizes. For larger segments, LongNet introduces sparse attention whose sparsity grows in proportion to segment length, counteracting the quadratic growth of full attention. The largest segment covers the entire slide. This captures long-range dependencies in a systematic way while keeping computation tractable (linear in context length).
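The bookkeeping behind this linear cost can be sketched in a few lines. In the toy configuration below (segment lengths and dilation rates are illustrative, not the paper's exact settings), segment length and dilation rate grow at the same geometric pace, so the number of keys each query attends to stays constant per branch and every branch costs the same, far below quadratic full attention:

```python
# Simplified LongNet-style cost accounting: each "branch" splits the
# sequence into segments and applies dilated attention inside each one.

def attended_pairs(seq_len: int, segment: int, dilation: int) -> int:
    """Query-key pairs for one branch: segments of `segment` tokens,
    each query attending to every `dilation`-th token in its segment."""
    n_segments = seq_len // segment
    keys_per_query = segment // dilation
    return n_segments * segment * keys_per_query

seq_len = 16_384  # e.g. the tile sequence of one whole slide
for segment, dilation in [(2048, 1), (4096, 2), (8192, 4), (16384, 8)]:
    # segment // dilation is constant, so each branch has equal cost.
    print(segment, attended_pairs(seq_len, segment, dilation))

print(seq_len**2)  # full self-attention cost, for comparison
```

Because per-branch cost is fixed, the total over a fixed number of branches scales linearly with sequence length rather than quadratically.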


Figure 2: Schematic diagram of the LongNet model.

Main experimental results

For cancer classification and diagnosis, the task is to identify fine-grained cancer subtypes from pathology slides. For ovarian cancer, for example, the model must distinguish six subtypes: clear cell ovarian cancer, endometrioid ovarian cancer, high-grade serous ovarian cancer, low-grade serous ovarian cancer, mucinous ovarian cancer, and ovarian carcinosarcoma.

GigaPath achieved leading results on all nine cancer classification tasks, with significant accuracy gains on six of them. For six cancer types (breast, kidney, liver, brain, ovarian, central nervous system), GigaPath achieved an AUROC of 90% or higher. This is a promising start for downstream precision-health applications such as cancer diagnosis and prognosis.

In the pathomics tasks, the goal is to predict whether a tumor carries a specific clinically relevant gene mutation from whole-slide images alone. Such predictions help uncover rich connections between tissue morphology and genetic pathways that are difficult for humans to detect. Apart from a few known cancer-type/mutation pairs, how much gene-mutation signal is present in whole-slide images remains an open question. In some experiments, the researchers also considered the pan-cancer setting, i.e., identifying universal signals of gene mutations across all cancer types and highly diverse tumor morphologies.

In this challenging setting, GigaPath again achieved leading performance on 16 of the 17 tasks and significantly outperformed the runner-up on 12 of them. GigaPath can extract genetically relevant pan-cancer and subtype-specific morphological signatures at the whole-slide level, opening the door to complex future research in real-world scenarios.

In addition, the researchers demonstrated GigaPath's potential on multimodal vision-language tasks by incorporating pathology reports. Previous work on pathology vision-language pre-training has tended to focus on small, tile-level images.

Instead, GigaPath explores slide-level vision-language pre-training. Through continued pre-training on slide-report pairs, the semantics of the report are used to align the latent representations of the pathology images. This is more challenging than conventional vision-language pre-training, yet without using any fine-grained alignment between individual image tiles and text fragments, GigaPath significantly outperforms three state-of-the-art pathology vision-language models on standard vision-language tasks.
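The alignment idea can be illustrated with a generic CLIP-style contrastive objective over paired slide and report embeddings; this is a minimal sketch of that family of losses, not the paper's exact training recipe, and the random embeddings stand in for the slide and text encoders' outputs:

```python
# Symmetric InfoNCE over a batch of paired (slide, report) embeddings:
# matching pairs sit on the diagonal of the similarity matrix.
import numpy as np

def contrastive_loss(slide_emb: np.ndarray, text_emb: np.ndarray,
                     temp: float = 0.07) -> float:
    """CLIP-style loss: pull each slide toward its own report and away
    from the other reports in the batch, and vice versa."""
    s = slide_emb / np.linalg.norm(slide_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = s @ t.T / temp                    # (B, B) cosine similarities

    def xent(l: np.ndarray) -> float:          # row-wise cross-entropy,
        l = l - l.max(axis=1, keepdims=True)   # targets on the diagonal
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        idx = np.arange(len(l))
        return -logp[idx, idx].mean()

    return (xent(logits) + xent(logits.T)) / 2

rng = np.random.default_rng(0)
slides = rng.normal(size=(8, 64))   # stand-in slide-encoder outputs
reports = rng.normal(size=(8, 64))  # stand-in text-encoder outputs
print(contrastive_loss(slides, reports))
```

Minimizing this loss drives each slide embedding toward the embedding of its own report, which is what "using report semantics to align the hidden representation of pathology images" amounts to in the contrastive formulation.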

Summary

Through rich and comprehensive experiments, the researchers showed that GigaPath is a strong proof point for both slide-level pre-training and multimodal vision-language modeling.

It is worth noting that although GigaPath achieved leading results across many tasks, there is still considerable room for improvement on specific tasks. Likewise, although the researchers explored vision-language multimodal tasks, many open questions remain on the road to building a multimodal dialogue assistant for pathology.

Author Information:

GigaPath is a collaboration spanning Microsoft Research, Providence Health Systems, and the Paul G. Allen School of Computer Science at the University of Washington. Hanwen Xu, a second-year Ph.D. student affiliated with Microsoft Research and the University of Washington, and Naoto Usuyama, a principal researcher at Microsoft Research, are the co-first authors of the paper. Dr. Hoifung Poon, General Manager of the Health Futures team at Microsoft Research, Professor Sheng Wang of the University of Washington, and Dr. Carlo Bifulco of Providence are the co-corresponding authors.

Hanwen Xu: second-year Ph.D. student at the University of Washington, working at the intersection of AI and medicine. His research has been published in Nature, Nature Communications, Nature Machine Intelligence, AAAI, and elsewhere, and he has served as a reviewer for Nature Communications, Nature Computational Science, and other journals.

Sheng Wang: Assistant Professor of Computer Science at the University of Washington, focusing on the intersection of AI and medicine. His research has been published in Nature, Science, Nature Biotechnology, Nature Machine Intelligence, and The Lancet Oncology, and its translational results are in use at medical institutions including the Mayo Clinic, Chan Zuckerberg Biohub, UW Medicine, and Providence.

Hoifung Poon is General Manager of Health Futures at Microsoft Research, with research interests in foundational generative AI research and precision-medicine applications. He has won best-paper awards at several top AI venues; his open-source biomedical models published on Hugging Face have been downloaded tens of millions of times, and some of his research results are being translated into applications with partner medical institutions and pharmaceutical companies.
