
Multimodal AI is the future of medicine, and Google's three new models, Med-Gemini, ushered in a big upgrade

author:ScienceAI

Editor | Cabbage Leaves

Many clinical tasks require knowledge of specialized data, such as medical imaging and genomics, which is not typically included in the training of general-purpose multimodal large models.

As described in the previous paper, Med-Gemini surpassed the GPT-4 series of models, achieving SOTA on a variety of medical tasks.

Now, Google DeepMind has released a second paper on Med-Gemini.

Building on Gemini's multimodal models, the team developed several members of the Med-Gemini family. These models inherit Gemini's core capabilities and are optimized for medical use by fine-tuning on 2D and 3D radiology, histopathology, ophthalmology, dermatology, and genomic data:

1. Med-Gemini-2D: processes radiology, pathology, dermatology, and ophthalmology images;

2. Med-Gemini-3D: processes CT volumes;

3. Med-Gemini-Polygenic: processes genomic "images".

The study, titled "Advancing Multimodal Medical Capabilities of Gemini," was posted on the arXiv preprint server on May 6, 2024.


Medical data from different sources, such as biobanks, electronic health records, medical imaging, wearables, biosensors, and genome sequencing, is driving the development of multimodal AI solutions to better capture the complexity of human health and disease.

While AI in medicine has primarily focused on narrow tasks with a single input and output type, recent advances in generative AI show promise in solving multimodal, multi-task challenges in healthcare settings.

Multimodal generative AI, represented by powerful models such as Gemini, has great potential to revolutionize healthcare. However, because medical data is highly specialized, general-purpose models often perform poorly when applied to the medical field.

Building on Gemini's core capabilities, DeepMind has introduced three new models in the Med-Gemini family: Med-Gemini-2D, Med-Gemini-3D, and Med-Gemini-Polygenic.


Figure: Overview of Med-Gemini. (Source: Paper)

The models were trained on more than 7 million data samples drawn from 3.7 million medical images and cases, using a variety of visual question-answering and image-captioning datasets, including private datasets from hospitals.

To process 3D data (CT), the Gemini video encoder is used, with the depth dimension of the scan treated as the video's time dimension. To process genomic data, risk scores for various traits are encoded as RGB pixels in an image.
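The two encoding tricks above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the paper's exact recipe: the function names, image size, and normalization scheme are assumptions. The first function reshapes a CT volume so its slices look like video frames; the second packs a vector of polygenic risk scores into a grayscale-in-RGB image in row-major order.

```python
import numpy as np

def ct_volume_to_frames(volume):
    """Treat a CT volume's depth axis as a video's time axis.

    A (depth, height, width) volume becomes (depth, height, width, 3)
    by min-max normalizing the whole volume and repeating each slice
    across three channels, so a video encoder can consume it as frames.
    """
    vol = np.asarray(volume, dtype=np.float32)
    vol = (vol - vol.min()) / (vol.max() - vol.min() + 1e-8)
    return np.repeat(vol[..., None], 3, axis=-1)

def prs_to_rgb_image(risk_scores, side=96):
    """Pack a vector of polygenic risk scores into an RGB image.

    Scores are min-max scaled to [0, 255] and written into the pixel
    grid in row-major order as grayscale; unused pixels stay black.
    """
    scores = np.asarray(risk_scores, dtype=np.float64)
    lo, hi = scores.min(), scores.max()
    norm = (scores - lo) / (hi - lo) if hi > lo else np.zeros_like(scores)
    pixels = (norm * 255).astype(np.uint8)
    img = np.zeros((side, side, 3), dtype=np.uint8)
    flat = img.reshape(-1, 3)
    n = min(pixels.size, flat.shape[0])
    flat[:n, :] = pixels[:n, None]  # broadcast one value to all 3 channels
    return img
```

The appeal of both encodings is that they require no architectural change: the existing image and video pathways of the base model are reused as-is.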


Illustration: An example of predicting coronary artery disease using an individual's polygenic risk score (PRS) images and demographic information. (Source: Paper)

Med-Gemini-2D

Based on expert assessments, Med-Gemini-2D sets a new standard for AI-based chest X-ray (CXR) report generation, exceeding the previous best results on two independent datasets by absolute margins of 1% and 12%. On those two datasets, 57% and 96% of the AI-generated reports for normal cases, and 43% and 65% for abnormal cases, were rated "comparable" or even "better" in quality than the original radiologists' reports.


Illustration: Performance of Med-Gemini-2D on chest X-ray classification tasks. (Source: Paper)

Med-Gemini-2D outperforms the larger general-purpose Gemini 1.0 Ultra model on in-distribution chest X-ray classification tasks (examples from the same datasets were seen during training). Performance varies on out-of-distribution tasks.


Figure: Med-Gemini-2D histopathology image classification performance. (Source: Paper)

Med-Gemini mostly outperformed Gemini Ultra on histopathology classification tasks, but did not surpass pathology-specific foundation models.


Figure: Performance on the PAD-UFES-20 classification task. (Source: Paper)

A similar trend (domain-specific model > Med-Gemini > Gemini Ultra) was observed for skin-lesion classification, although Med-Gemini came very close to the domain-specific model.


Figure: Performance on fundus image classification, compared with a supervised model trained on additional data. (Source: Paper)

For ophthalmic classification, the picture is similar. Note that the domain-specific model was trained on roughly 200x more data, so Med-Gemini performs quite well by comparison.


Figure: Evaluation details for a VQA task. (Source: Paper)

The team also evaluated Med-Gemini-2D on medical visual question answering (VQA). The model is strong across many VQA tasks, often beating SOTA models: on CXR classification and radiology VQA, Med-Gemini-2D exceeded SOTA or baseline on 17 of 20 tasks.


Illustration: Evaluation details for chest X-ray report generation. (Source: Paper)

Beyond narrow interpretation of medical images, the authors evaluated Med-Gemini-2D on chest X-ray radiology report generation and observed that it achieved SOTA according to radiologist evaluation.

Med-Gemini-3D


Figure: Human evaluation results for head CT volume report generation. (Source: Paper)

Med-Gemini-3D goes beyond 2D images to automate end-to-end CT report generation. According to expert assessment, 53% of these AI-generated reports were considered clinically acceptable; although additional work is needed to match the quality of expert radiologists' reports, this is the first generative model capable of accomplishing this task.

Med-Gemini-Polygenic

Finally, Med-Gemini-Polygenic was evaluated on predicting health outcomes from polygenic risk scores for various traits. The model generally outperforms existing baselines.


Figure: Health-outcome prediction with Med-Gemini-Polygenic compared against two baselines, for in-distribution and out-of-distribution outcomes. (Source: Paper)

Here are some examples of multimodal conversations supported by Med-Gemini.


Illustration: Example of a 2D medical-image conversation via open-ended Q&A. (Source: Paper)

In histopathology, ophthalmology, and dermatology image classification, Med-Gemini-2D surpassed the baselines in 18 of 20 tasks and approached the performance of task-specific models.

Epilogue

Overall, this work makes useful progress toward general-purpose multimodal medical AI models, though there is clearly still plenty of room for improvement. Many domain-specific models outperform Med-Gemini, but Med-Gemini performs well with less data and a more generic approach. Interestingly, Med-Gemini appears to do better on tasks that lean more heavily on language understanding, such as VQA and radiology report generation.

The researchers envision a future in which all of these individual capabilities are integrated into a unified system that performs complex, multidisciplinary clinical tasks, with AI working alongside clinicians to maximize efficiency and improve patient outcomes.

Paper link: https://arxiv.org/abs/2405.03162

Related content: https://twitter.com/iScienceLuvr/status/1789216212704018469
