
Multimodal AI is the future of medicine, and Google's three new models, Med-Gemini, ushered in a big upgrade

author:ScienceAI

Editor | Cabbage Leaves

Many clinical tasks require knowledge of specialized data, such as medical imaging and genomics, which is not typically included in the training of general-purpose multimodal large models.

As described in the previous paper, Med-Gemini surpassed the GPT-4 series of models, achieving SOTA on a variety of medical tasks.

Now, Google DeepMind has released a second paper on Med-Gemini.

Building on Gemini's multimodal models, the team developed several members of the Med-Gemini family. These models inherit Gemini's core capabilities and are optimized for medical use by fine-tuning on 2D and 3D radiology, histopathology, ophthalmology, dermatology, and genomic data:

1. Med-Gemini-2D: processes radiology, pathology, dermatology, and ophthalmology images;

2. Med-Gemini-3D: processes CT volumes;

3. Med-Gemini-Polygenic: processes genomic "images".

The study, titled "Advancing Multimodal Medical Capabilities of Gemini," was posted on the arXiv preprint server on May 6, 2024.


Medical data from different sources, such as biobanks, electronic health records, medical imaging, wearables, biosensors, and genome sequencing, is driving the development of multimodal AI solutions to better capture the complexity of human health and disease.

While AI in medicine has primarily focused on narrow tasks with a single input and output type, recent advances in generative AI show promise in solving multimodal, multi-task challenges in healthcare settings.

Multimodal generative AI, represented by powerful models such as Gemini, has great potential to revolutionize healthcare. However, because medical data is highly specialized, general-purpose models often perform poorly when applied to the medical field.

Building on Gemini's core capabilities, DeepMind has introduced three new models in the Med-Gemini family: Med-Gemini-2D, Med-Gemini-3D, and Med-Gemini-Polygenic.


Figure: Overview of Med-Gemini. (Source: Paper)

The models were trained on more than 7 million data samples drawn from 3.7 million medical images and cases, using a variety of visual question-answering and image-captioning datasets, including private datasets from hospitals.

To process 3D data (CT), the Gemini video encoder is used, with the depth dimension of the scan treated as the video's time dimension. To process genomic data, risk scores for various traits are encoded as RGB pixels in an image.
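The two encoding tricks above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the paper's exact recipe: the function names, image size, and normalization scheme are assumptions. The first function reshapes a CT volume so its slices look like video frames; the second packs a vector of polygenic risk scores into a grayscale-in-RGB image in row-major order.

```python
import numpy as np

def ct_volume_to_frames(volume):
    """Treat a CT volume's depth axis as a video's time axis.

    A (depth, height, width) volume becomes (depth, height, width, 3)
    by min-max normalizing the whole volume and repeating each slice
    across three channels, so a video encoder can consume it as frames.
    """
    vol = np.asarray(volume, dtype=np.float32)
    vol = (vol - vol.min()) / (vol.max() - vol.min() + 1e-8)
    return np.repeat(vol[..., None], 3, axis=-1)

def prs_to_rgb_image(risk_scores, side=96):
    """Pack a vector of polygenic risk scores into an RGB image.

    Scores are min-max scaled to [0, 255] and written into the pixel
    grid in row-major order as grayscale; unused pixels stay black.
    """
    scores = np.asarray(risk_scores, dtype=np.float64)
    lo, hi = scores.min(), scores.max()
    norm = (scores - lo) / (hi - lo) if hi > lo else np.zeros_like(scores)
    pixels = (norm * 255).astype(np.uint8)
    img = np.zeros((side, side, 3), dtype=np.uint8)
    flat = img.reshape(-1, 3)
    n = min(pixels.size, flat.shape[0])
    flat[:n, :] = pixels[:n, None]  # broadcast one value to all 3 channels
    return img
```

The appeal of both encodings is that they require no architectural change: the existing image and video pathways of the base model are reused as-is.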


Illustration: An example of predicting coronary artery disease using an individual's polygenic risk score (PRS) images and demographic information. (Source: Paper)

Med-Gemini-2D

Based on expert assessments, Med-Gemini-2D sets a new standard for AI-based chest X-ray (CXR) report generation, exceeding the previous best results on two independent datasets by absolute margins of 1% and 12%. On those two datasets, 57% and 96% of the AI-generated reports for normal cases, and 43% and 65% for abnormal cases, were rated "comparable" or even "better" in quality than the original radiologists' reports.


Illustration: Performance of Med-Gemini-2D on chest X-ray classification tasks. (Source: Paper)

Med-Gemini-2D outperforms the larger general-purpose Gemini 1.0 Ultra model on in-distribution chest X-ray classification tasks (examples from the same datasets were seen during training). Performance varies on out-of-distribution tasks.


Figure: Med-Gemini-2D histopathology image classification performance. (Source: Paper)

Med-Gemini mostly outperformed Gemini Ultra on histopathology classification tasks, but did not surpass pathology-specific foundation models.


Figure: Performance on the PAD-UFES-20 classification task. (Source: Paper)

A similar trend (domain-specific model > Med-Gemini > Gemini Ultra) was observed for skin-lesion classification, although Med-Gemini came very close to the domain-specific model.


Figure: Performance on fundus image classification, compared with a supervised model trained on additional data. (Source: Paper)

For ophthalmic classification, the picture is similar. Note that the domain-specific model was trained on roughly 200x more data, so Med-Gemini performs quite well by comparison.


Figure: Evaluation details for a VQA task. (Source: Paper)

The team also evaluated Med-Gemini-2D on medical visual question answering (VQA). The model is strong across many VQA tasks, often beating SOTA models: on CXR classification and radiology VQA, Med-Gemini-2D exceeded SOTA or baseline on 17 of 20 tasks.


Illustration: Evaluation details for chest X-ray report generation. (Source: Paper)

Beyond narrow interpretation of medical images, the authors evaluated Med-Gemini-2D on chest X-ray radiology report generation and observed that it achieved SOTA according to radiologist evaluation.

Med-Gemini-3D


Figure: Human evaluation results for head CT volume report generation. (Source: Paper)

Med-Gemini-3D goes beyond 2D images to automate end-to-end CT report generation. According to expert assessment, 53% of these AI-generated reports were considered clinically acceptable; although additional work is needed to match the quality of expert radiologists' reports, this is the first generative model capable of accomplishing this task.

Med-Gemini-Polygenic

Finally, Med-Gemini-Polygenic was evaluated on predicting health outcomes from polygenic risk scores for various traits. The model generally outperforms existing baselines.


Figure: Health-outcome prediction with Med-Gemini-Polygenic compared against two baselines, for in-distribution and out-of-distribution outcomes. (Source: Paper)

Here are some examples of multimodal conversations supported by Med-Gemini.


Illustration: Example of a 2D medical-image conversation via open-ended Q&A. (Source: Paper)

In histopathology, ophthalmology, and dermatology image classification, Med-Gemini-2D surpassed the baselines in 18 of 20 tasks and approached the performance of task-specific models.

Epilogue

Overall, this work makes useful progress toward general-purpose multimodal medical AI models, though there is clearly still plenty of room for improvement. Many domain-specific models outperform Med-Gemini, but Med-Gemini performs well with less data and a more generic approach. Interestingly, Med-Gemini appears to do better on tasks that lean more heavily on language understanding, such as VQA and radiology report generation.

The researchers envision a future in which all of these individual capabilities are integrated into a unified system that performs complex, multidisciplinary clinical tasks, with AI working alongside clinicians to maximize efficiency and improve patient outcomes.

Paper link: https://arxiv.org/abs/2405.03162

Related content: https://twitter.com/iScienceLuvr/status/1789216212704018469
