
Literature Express丨Development and validation of an interpretable model integrating multimodal information to improve ovarian cancer diagnosis

Author: One Life

Ovarian cancer is a common gynecological malignancy, and early diagnosis is crucial for improving patient survival. However, owing to the lack of accurate non-invasive diagnostic tools, many patients with ovarian cancer are diagnosed at an advanced stage. Recently, a research team from mainland China published a study in Nature Communications describing the development and validation of a novel diagnostic model, OvcaFinder, which integrates ultrasound images, radiologists' assessments, and clinical parameters, and has shown potential to improve the accuracy and consistency of radiologists' diagnoses of ovarian cancer [1]. This article summarizes the core content of the study for readers.

Research Methods:

This retrospective study included patients from Sun Yat-sen University Cancer Center (SYSUCC) and Chongqing University Cancer Hospital who had at least one pathologically confirmed adnexal lesion detected on transvaginal ultrasound (TVUS). A total of 3972 B-mode and color Doppler ultrasound images of 724 lesions, acquired at SYSUCC between February 2011 and May 2021, were collected and randomly divided into a training set, a validation set, and an internal test set at a ratio of 7:1:2. The external test dataset consisted of 2200 images of 387 lesions from patients treated at Chongqing University Cancer Hospital between December 2018 and June 2021.

In the study, five experienced radiologists independently reviewed all anonymized images, blinded to any clinicopathological information, and risk-scored each lesion using the Ovarian-Adnexal Reporting and Data System (O-RADS) [2].

To build the image-based deep learning (DL) model, the research team used six convolutional neural network architectures: DenseNet121, DenseNet169, DenseNet201, ResNet34, EfficientNet-b5, and EfficientNet-b6 [3-5]. These models were initialized with ImageNet [6] pre-trained weights and fine-tuned on the training set. Data augmentation techniques such as random horizontal flipping, rotation, and color jitter were used to improve generalization. Finally, an ensemble DL model was formed by integrating the predictions of the individual networks.
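Below is a minimal Python sketch of what such fine-tuning and ensembling might look like. The backbone choices follow the paper, but the augmentation magnitudes, the single-logit classification head, and the torchvision weight identifiers are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# Augmentations named in the paper; the magnitudes here are assumptions.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

def build_model(name: str) -> nn.Module:
    """Load an ImageNet-pretrained backbone and replace its head with a
    single logit for benign-vs-malignant classification (an assumption)."""
    if name == "densenet121":
        m = models.densenet121(weights="IMAGENET1K_V1")
        m.classifier = nn.Linear(m.classifier.in_features, 1)
    elif name == "resnet34":
        m = models.resnet34(weights="IMAGENET1K_V1")
        m.fc = nn.Linear(m.fc.in_features, 1)
    elif name == "efficientnet_b5":
        m = models.efficientnet_b5(weights="IMAGENET1K_V1")
        m.classifier[-1] = nn.Linear(m.classifier[-1].in_features, 1)
    else:
        raise ValueError(name)
    return m

@torch.no_grad()
def ensemble_predict(models_list, images):
    """Average the sigmoid outputs of the fine-tuned networks, mirroring
    the paper's ensembling of per-model predictions."""
    probs = [torch.sigmoid(m(images)).squeeze(1) for m in models_list]
    return torch.stack(probs).mean(dim=0)
```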

OvcaFinder itself is a multimodal model based on the random forest (RF) algorithm. It integrates three clinical parameters (patient age, lesion diameter, and CA125 concentration), the radiologists' O-RADS scores, and the prediction of the DL model (Fig. 1). To optimize performance, the researchers trained multiple RF models and determined the optimal number of estimators. Finally, to enhance the interpretability of OvcaFinder, they applied heat maps and Shapley values. The heat maps were generated with gradient-weighted class activation mapping (Grad-CAM), highlighting the image regions most critical to the model's predictions; the Shapley values quantify the contribution of each input feature to the model output, providing both global and local explanations of the predictions.
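As a rough illustration of this integration step, the sketch below fits scikit-learn random forests on the five inputs and selects the number of estimators on the validation set. The feature order, hyperparameter grid, and function names are assumptions, not the authors' code.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Assumed feature order per lesion:
# X columns: [age, lesion_diameter_mm, ca125_u_ml, o_rads_score, dl_probability]

def fit_ovcafinder_like_rf(X_train, y_train, X_val, y_val):
    """Fit RF models over a grid of estimator counts and keep the one with
    the best validation AUC, mirroring the paper's tuning of n_estimators."""
    best_auc, best_model = -1.0, None
    for n_estimators in (50, 100, 200, 500):   # grid is an assumption
        rf = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
        rf.fit(X_train, y_train)
        auc = roc_auc_score(y_val, rf.predict_proba(X_val)[:, 1])
        if auc > best_auc:
            best_auc, best_model = auc, rf
    return best_model, best_auc
```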

[Fig. 1]

For statistical analysis, the researchers calculated the AUC, accuracy, sensitivity, specificity, positive predictive value, and negative predictive value of each model, with 95% confidence intervals estimated by the nonparametric bootstrap method. By comparing the performance of the different models and readers, they evaluated the diagnostic performance of OvcaFinder, assessing statistical significance with the pROC library and McNemar's test.
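A minimal sketch of these two procedures follows. The paper's analysis used the R pROC library; here roc_auc_score with a hand-rolled bootstrap and statsmodels' mcnemar stand in, and the resample count and pairing convention are assumptions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from statsmodels.stats.contingency_tables import mcnemar

def bootstrap_auc_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """Nonparametric bootstrap 95% CI for the AUC: resample lesions with
    replacement and take percentiles of the resulting AUC distribution."""
    rng = np.random.default_rng(seed)
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))
        if len(np.unique(y_true[idx])) < 2:   # AUC needs both classes present
            continue
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

def mcnemar_p(correct_a, correct_b):
    """McNemar's test on the paired per-lesion correctness of two models."""
    a, b = np.asarray(correct_a, bool), np.asarray(correct_b, bool)
    table = [[np.sum(a & b), np.sum(a & ~b)],
             [np.sum(~a & b), np.sum(~a & ~b)]]
    return mcnemar(table, exact=True).pvalue
```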

Findings:

Baseline information

As shown in Table 1, the SYSUCC dataset comprised 3972 B-mode and color Doppler ultrasound images covering 296 (40.9%) benign and 428 (59.1%) malignant lesions from 724 patients (mean age: 48 ± 13 years; range: 16-82 years). Lesion diameters ranged from 10 to 224 mm (mean: 74.3 mm; standard deviation (SD): 35.5 mm), and CA125 concentrations ranged from 4 to 37,827 U/mL. These images were randomly divided into a training set (2941 images, 532 lesions), a validation set (334 images, 63 lesions), and an internal test dataset (697 images, 129 lesions). The external cohort included 387 patients (mean age: 43 ± 12 years; range: 18-83 years); the mean lesion diameter was 71.2 mm (SD: 35.0 mm), and CA125 concentrations ranged from 2 to 46,090 U/mL. Of the 509 malignant lesions overall, 57 (11.2%) were borderline tumors, and the mean diameter of malignant lesions was 83.4 mm (range: 13-225 mm). With 35 U/mL as the threshold, CA125 was elevated in 88.2% (449/509) of malignant lesions. On ultrasound images, ascites and peritoneal thickening or nodules were found in 272 and 306 patients, respectively.

[Table 1]

O-RADS score performance

After completing O-RADS training, the five readers showed high diagnostic performance in adnexal tumor classification. O-RADS scores were normalized to a range of 0 to 1 to compute AUCs. The mean AUC was 0.927 in the internal test dataset and 0.904 in the external cohort; the mean sensitivity and specificity were 96.2% and 73.3% in the internal dataset and 85.7% and 81.8% in the external cohort, respectively.
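Since AUC depends only on the ranking of scores, any monotone mapping of the ordinal categories yields the same AUC; the normalization mainly places reader scores on the same 0-1 scale as the model's probabilities. A minimal sketch, assuming a linear mapping of categories 1-5 (the authors' exact mapping is not specified in this summary):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def normalize_orads(scores):
    """Map ordinal O-RADS categories (assumed 1-5) linearly onto [0, 1].
    The exact mapping used by the authors is an assumption here."""
    s = np.asarray(scores, dtype=float)
    return (s - 1.0) / 4.0

# AUC is rank-based, so normalization leaves it unchanged:
# roc_auc_score(y_true, normalize_orads(orads)) == roc_auc_score(y_true, orads)
```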

DL model performance

Using B-mode and color Doppler ultrasound images, DenseNet121, DenseNet169, DenseNet201, ResNet34, EfficientNet-b5, and EfficientNet-b6 achieved lesion-level AUCs of 0.898-0.923 in the internal test dataset and 0.806-0.851 in the external test dataset, all lower than the final ensemble DL model. The ensemble model showed an AUC of 0.970, a sensitivity of 97.3%, and a specificity of 74.1% in the internal dataset; in the external cohort, the AUC fell to 0.893, with a sensitivity of 88.9% and a specificity of 68.6% (Table 2). As shown in Fig. 2, the red areas of the heat maps contribute the most to the classification of lesions, while the blue areas are less important. Specifically, on B-mode images, irregular solid components or protrusions are highlighted in the heat maps and are valuable features for predicting malignancy; on color Doppler images, the heat maps focus on areas of abundant vascularity. This is consistent with the diagnostic criteria for ovarian tumors in clinical practice. Among benign lesions, 27.8% (15/54) and 19.8% (60/306) of cases in the internal and external test datasets, respectively, showed hotspots; among malignant lesions, no hotspots were observed in 4.0% (3/75) and 12.3% (10/81) of cases in the internal and external test datasets, respectively.
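For readers curious how such heat maps are produced, here is a minimal, self-contained Grad-CAM sketch using PyTorch hooks. The choice of target_layer and the single-logit output convention are assumptions, and production code would typically use a maintained Grad-CAM library instead.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer):
    """Minimal Grad-CAM: weight the target layer's activations by the
    spatially pooled gradients of the malignancy logit, ReLU the sum,
    then upsample to image size. `target_layer` would be, e.g., the
    last convolutional block of the backbone (an assumption)."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(v=o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(v=go[0]))

    model.eval()
    logit = model(image.unsqueeze(0)).squeeze()   # single malignancy logit
    model.zero_grad()
    logit.backward()
    h1.remove(); h2.remove()

    weights = grads["v"].mean(dim=(2, 3), keepdim=True)        # GAP over H, W
    cam = F.relu((weights * acts["v"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear",
                        align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # scale to [0, 1]
    return cam.squeeze().detach()
```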

[Table 2]
[Fig. 2]

Clinical model performance

In the internal test dataset, the clinical model achieved an AUC of 0.936, a sensitivity of 97.3%, and a specificity of 40.7%. In the external cohort, the clinical model yielded an AUC of 0.842, a sensitivity of 85.2%, and a specificity of 53.3% (Table 2).

OvcaFinder model performance

As shown in Fig. 3, by integrating clinical information, O-RADS scores, and image-based DL predictions, OvcaFinder achieved an AUC of 0.978 (95% CI: 0.953, 0.998) in the internal test dataset, higher than the clinical model (AUC: 0.936, p = 0.007) and the image-based DL model (AUC: 0.970, p = 0.152). In the external test dataset, OvcaFinder reached an AUC of 0.947 (95% CI: 0.917, 0.970), again outperforming the clinical model (AUC: 0.842, p = 4.65 × 10⁻⁵) and the image-based DL model (AUC: 0.893, p = 3.93 × 10⁻⁶).

[Fig. 3]

For a fair comparison, the investigators compared the specificities of the three models at similar sensitivities. In the internal test dataset, at a sensitivity of 97.3%, OvcaFinder showed higher specificity than the clinical model (40.7%, p = 1.52 × 10⁻⁵) and the DL model (74.1%, p = 0.062). In the external cohort, at a sensitivity similar to that of the other models, OvcaFinder reached a specificity of 90.5%, outperforming the clinical model (53.3%, p = 2.21 × 10⁻²⁹) and the image-based DL model (68.6%, p = 1.36 × 10⁻²⁰; Table 2). In addition, the image-based DL prediction was the most important feature in OvcaFinder's decisions, followed by the O-RADS score, CA125 concentration, patient age, and lesion diameter (Fig. 4).
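This feature-importance ranking comes from the Shapley values. A sketch using the shap library's TreeExplainer, which supports scikit-learn random forests, might look as follows; rf, X, and the feature names are carried over from the earlier RF sketch and remain assumptions.

```python
import numpy as np
import shap

feature_names = ["age", "diameter_mm", "CA125", "O-RADS", "DL_probability"]

explainer = shap.TreeExplainer(rf)
sv = explainer.shap_values(X)
# Depending on the shap version, binary classifiers return either a list of
# two arrays (one per class) or a single (n_samples, n_features, 2) array.
sv_malignant = sv[1] if isinstance(sv, list) else sv[..., 1]

# Global view: mean absolute SHAP value per feature approximates its
# importance; per-lesion rows of sv_malignant give local explanations.
global_importance = np.abs(sv_malignant).mean(axis=0)
for name, imp in sorted(zip(feature_names, global_importance),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.4f}")
```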

[Fig. 4]

Reader AUCs ranged from 0.900 to 0.958. With the help of OvcaFinder, AUCs increased significantly, ranging from 0.971 to 0.981 in the internal test dataset, without any reduction in sensitivity; a similar improvement was observed for all readers in the external cohort. OvcaFinder also improved readers' diagnostic accuracy and reduced false positives (Fig. 5 and Table 3). The mean false-positive rate decreased from 26.7% (range: 13.0-38.9%) to 13.3% (range: 7.4-18.5%, p = 0.029) in the internal cohort and from 18.2% (range: 10.8-29.4%) to 9.9% (range: 8.2-12.4%, p = 0.033) in the external cohort, which may help avoid unnecessary biopsies or surgeries.

[Fig. 5]
[Table 3]

Inter-reader agreement in ovarian cancer diagnosis is summarized in Table 4. Kappa values between readers were 0.711-0.924 in the internal test dataset and 0.588-0.796 in the external test dataset, indicating fair to excellent agreement. With OvcaFinder, inter-reader kappa values increased to 0.886-0.983 in the internal test dataset and 0.863-0.933 in the external cohort, indicating excellent agreement.
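Pairwise Cohen's kappa of this kind can be computed directly with scikit-learn; a small sketch follows, where reader_calls and its structure are hypothetical.

```python
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

def pairwise_kappa(reader_calls):
    """Inter-reader agreement: Cohen's kappa for every pair of readers.
    `reader_calls` maps a reader name to a list of benign/malignant calls."""
    results = {}
    for (a, preds_a), (b, preds_b) in combinations(reader_calls.items(), 2):
        results[(a, b)] = cohen_kappa_score(preds_a, preds_b)
    return results

# Kappas before and after OvcaFinder assistance can be compared by calling
# pairwise_kappa on the unassisted and assisted reads separately.
```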

[Table 4]

Conclusions of the study

As a novel model integrating multimodal information, OvcaFinder shows great potential for improving the accuracy of ovarian cancer diagnosis. It not only improves radiologists' diagnostic performance and may reduce unnecessary surgeries, but also offers interpretable decisions through heat maps and Shapley values. The research team noted that future work on OvcaFinder will include further optimization of the model and validation in broader patient populations.

References:

1. Xiang, H., Xiao, Y., Li, F. et al. Development and validation of an interpretable model integrating multimodal information for improving ovarian cancer diagnosis. Nat Commun 15, 2681 (2024).

2. Andreotti, R. et al. O-RADS US Risk Stratification and Management System: A Consensus Guideline from the ACR Ovarian-Adnexal Reporting and Data System Committee. Radiology 294, 168–185 (2020).

3. Huang, G., Liu, Z., van der Maaten, L. & Weinberger, K. Q. Densely Connected Convolutional Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2261–2269 (2017).

4. He K., Zhang X., Ren S. & Sun J. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–778 (2016).

5. Tan, M. & Le, Q. V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. International Conference on Machine Learning, 6105–6114 (2019).

6. Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 84–90 (2017).

Disclaimer: This article is published with the support of AstraZeneca and is intended for healthcare professionals only

Approval number: CN-134932

Valid until: 2025-5-7
