CJTER丨Introduction to the Technology of Medical-Engineering Integration Science and Engineering Experts (Phase 4): Precise Segmentation Technology of Thyroid Region in Ultrasound Video Based on Attention Mechanism

Preface:

In line with the strategic requirements of "Healthy China 2030" and "Made in China 2025", medicine is becoming a convergence point for knowledge and a focal point of innovation, and progress in medical-engineering integration depends on close collaboration between medicine and science and engineering. The journal "Chinese Tissue Engineering Research" and the Liaoning Society of Cell Biology have set up a special column, "Technical Introduction to Science and Engineering Experts in the Integration of Medicine and Engineering", to build a bridge for research cooperation and product translation between science and engineering experts and clinicians.

1. Background:

The thyroid gland is an important endocrine organ located below the thyroid cartilage at the front of the neck [1]. It secretes thyroid hormones and regulates the body's metabolism [2]. Over the past 10 years the incidence of thyroid disease has been increasing, with an annual incidence of more than 10% [3]. Assessment of the thyroid gland plays an important role in the diagnosis of Graves' disease, subacute thyroiditis, goiter, thyroid nodules, and thyroid cancer, because many of the typical signs of these diseases can be observed in the glandular region [4]. Thyroid nodules, located in the thyroid region, are among the most prevalent thyroid diseases: they occur in 4 to 7 percent of patients and are found incidentally on ultrasonography in 19 to 67 percent of people [5]. By echogenicity, thyroid nodules can be classified as hypoechoic, isoechoic, or hyperechoic. Previous studies have shown that the incidence of malignant thyroid nodules is 0.1% to 0.2%, and that hypoechoic nodules with irregular borders are more likely to be malignant [6]. Thyroid disease can be diagnosed with computed tomography, magnetic resonance imaging, ultrasound imaging, and radionuclide imaging. Among these tools, ultrasound is portable, low-cost, and real-time, and is therefore often used to obtain features such as margins, shape, size, and volume for assessing pathological changes in the thyroid gland and thyroid nodules. The Thyroid Imaging, Reporting and Data System (TI-RADS) [7], published by the American College of Radiology, groups the ultrasound features important for diagnosing thyroid nodules into five categories: composition, echogenicity, shape, margins, and echogenic foci. The shape and margins of a nodule are the key features for distinguishing benign from malignant nodules; in particular, a taller-than-wide shape and spiculated margins are the two main features suggesting malignancy. However, because B-mode ultrasound images are of relatively low quality and are corrupted by noise, reading thyroid features directly from a large number of ultrasound images is time-consuming and depends on the radiologist's experience.

With the continuous development of computer technology, artificial intelligence algorithms are increasingly part of daily life, and healthcare is gradually becoming intelligent. Robot-assisted diagnosis systems that apply intelligent algorithms to clinical diagnosis and surgical guidance can help doctors work more efficiently and reduce patients' suffering. Medical image segmentation is a complex and critical step in medical image processing and analysis. Its purpose is to delineate the clinically meaningful parts of a medical image and extract the relevant features, providing a reliable basis for clinical diagnosis and pathology research. By helping clinicians focus on the diseased region and extract detailed information, segmentation supports more accurate diagnosis and guides subsequent treatment planning. Thyroid nodule segmentation based on computer vision, used as a diagnostic aid, can facilitate the diagnosis of thyroid diseases and provide valuable auxiliary information for clinical decision-making.

2. Technical introduction

Automatically segmenting targets from medical images is difficult because medical images are highly complex and lack simple linear features. The accuracy of the segmentation is also affected by factors such as the partial volume effect, intensity inhomogeneity, artifacts, and the similar gray levels of different soft tissues [6]. Conventional medical image processing pipelines rely on grayscale-based and texture-based features. This project instead applies a deep learning framework to thyroid segmentation. The deep learning method mimics the behavior of a human radiologist: 1) describing the low-level features of the medical image, and 2) identifying high-level semantic differences between the thyroid gland and other anatomical structures.

In cooperation with the Department of Ultrasound Imaging of Shengjing Hospital, we examined the practicality of applying deep learning to thyroid segmentation as an aid to the clinical diagnosis of thyroid diseases, and developed H-TUNet, a deep learning algorithm that accurately segments thyroid regions.

In this work, we propose H-TUNet, a method that integrates the intra-frame semantic features and inter-frame contextual information of each case's thyroid ultrasound scan to accurately segment the thyroid gland in ultrasound images. The structure of the proposed H-TUNet algorithm is shown in Fig. 1:

Fig. 1 Structure of the H-TUNet algorithm

The algorithm fuses a 2D intra-frame feature extraction network, 2D Transformer UNet (2D TUNet), and a 3D inter-frame feature extraction network, 3D Transformer UNet (3D TUNet). First, 2D TUNet, combined with the Multi-scale Cross-Attention Transformer (MSCAT) module, extracts intra-frame features of the target from each individual thyroid ultrasound frame. Then the semantic probability map that 2D TUNet produces for each frame is stacked with the original image and fed into 3D TUNet to learn inter-frame features. Finally, the intra-frame and inter-frame features are fused by the hybrid feature fusion layer to achieve accurate thyroid segmentation.
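The three stages above can be pictured as a simple data flow. The sketch below is only an illustration of how the data is routed, not the authors' implementation: the function names (`h_tunet_data_flow`, `toy_net_2d`, `toy_net_3d`, `toy_fuse`) are hypothetical, the "networks" are trivial stand-ins, and for brevity the probability maps double as the intra-frame output.

```python
from typing import Callable, List

Frame = List[List[float]]   # one grayscale frame, indexed [row][col]
Volume = List[Frame]        # all frames of one scan, indexed [t][row][col]


def h_tunet_data_flow(
    frames: Volume,
    net_2d: Callable[[Frame], Frame],
    net_3d: Callable[[List[Volume]], Volume],
    fuse: Callable[[Volume, Volume], Volume],
) -> Volume:
    """Route one scan through the three stages described in the text."""
    # Stage 1: intra-frame pass, applied to every frame independently.
    prob_maps = [net_2d(frame) for frame in frames]
    # Stage 2: stack original frames and probability maps as two input
    # channels, indexed [channel][t][row][col], and run the inter-frame pass.
    inter_frame = net_3d([frames, prob_maps])
    # Stage 3: fuse the intra-frame and inter-frame outputs.
    return fuse(prob_maps, inter_frame)


# Hypothetical stand-ins (NOT the real networks), used only to run the flow.
def toy_net_2d(frame: Frame) -> Frame:
    # Clip intensities into [0, 1] and treat them as probabilities.
    return [[min(1.0, max(0.0, p)) for p in row] for row in frame]


def toy_net_3d(stacked: List[Volume]) -> Volume:
    # Average the probability channel over time, one value per pixel.
    _, probs = stacked
    n = len(probs)
    mean = [[sum(f[r][c] for f in probs) / n for c in range(len(probs[0][0]))]
            for r in range(len(probs[0]))]
    return [mean for _ in probs]


def toy_fuse(intra: Volume, inter: Volume) -> Volume:
    # Element-wise mean of the two outputs.
    return [[[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(fa, fb)]
            for fa, fb in zip(intra, inter)]


scan = [[[0.2, 0.8]], [[0.4, 0.6]]]   # two frames, each 1 x 2 pixels
fused = h_tunet_data_flow(scan, toy_net_2d, toy_net_3d, toy_fuse)
```

The point of the sketch is the routing: each frame is processed on its own first, and only the stacked volume gives the second stage access to neighbouring frames.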

Fig. 2 The specific structure of the 2D TUNet algorithm

Fig. 3 Structure of the MSCAT module

To make full use of the low-level and high-level features of each 2D ultrasound frame, we propose a U-shaped network, named TUNet, which consists of an encoder and a decoder. When segmenting objects in noisy ultrasound images, where the contrast between organs is low and their boundaries are ambiguous, the original UNet has two disadvantages: 1) the max-pooling operations in the encoder reduce the resolution of the feature maps, so information in the low-level features is lost and the semantic representation is insufficient; 2) the skip connections to the decoder reuse the low-level features directly, without refinement, so irrelevant or noisy information in the low-level feature maps is retained and interferes with the important high-level features. To solve these problems, we filter and fuse the shallow and deep features extracted by the network with the Multi-scale Cross-Attention Transformer (MSCAT) module in the encoder, which suppresses noise so that the useful low-level features are better retained in the intra-frame semantic feature representation.
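The first disadvantage can be seen on a toy example. The following sketch, with a hypothetical helper `max_pool_2x2` and made-up 4 x 4 feature maps, shows two maps whose bright boundary sits in different columns and yet pool to the same result, so the exact boundary position is no longer recoverable from the pooled map alone.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a map indexed [row][col].

    Assumes an even number of rows and columns.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[r][c], feature_map[r][c + 1],
             feature_map[r + 1][c], feature_map[r + 1][c + 1])
         for c in range(0, cols, 2)]
        for r in range(0, rows, 2)
    ]


# Toy maps: a one-pixel-wide bright boundary in column 1 and in column 0.
boundary_col1 = [
    [0, 9, 0, 0],
    [0, 9, 0, 0],
    [0, 9, 0, 0],
    [0, 9, 0, 0],
]
boundary_col0 = [
    [9, 0, 0, 0],
    [9, 0, 0, 0],
    [9, 0, 0, 0],
    [9, 0, 0, 0],
]

pooled_col1 = max_pool_2x2(boundary_col1)
pooled_col0 = max_pool_2x2(boundary_col0)
# Both inputs collapse to the same 2 x 2 map.
print(pooled_col1 == pooled_col0)
```

This is why U-shaped networks pass encoder features to the decoder, and why the quality of those passed features matters.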

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V \tag{1}$$

Equation 1 is the mathematical expression for the fusion of features in MSCAT, where $Q$, $K$ and $V$ denote the query, key and value obtained by applying embedding matrices to the extracted features, and $d_k$ is the feature dimension, used as a scaling factor to keep the gradients stable during training. The similarity between Q and K is computed to obtain the weight matrix, and the weighted sum of the values V under this weight matrix gives the final filtered and fused features. The decoder fuses features from the encoder through skip connections, reuses the encoder's feature maps to recover the spatial information lost in the high-level features, and produces the final 2D image features through its deconvolution operations.
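The query, key and value computation described above can be sketched in a few lines of plain Python. This is a minimal illustration of scaled dot-product cross-attention under stated assumptions, not the MSCAT module itself: the embedding matrices are taken to be the identity, the multi-scale structure is omitted, and the choice of which feature level supplies the queries and which supplies the keys and values is hypothetical.

```python
import math


def softmax(row):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention on lists of feature vectors.

    queries: n_q vectors of length d_k
    keys:    n_k vectors of length d_k
    values:  n_k vectors of length d_v
    Returns (outputs, weights) with n_q rows each.
    """
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        # Similarity between this query and every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights


# Hypothetical tokens: two from one feature level act as queries,
# three from another level act as keys and values.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

fused_tokens, attn = cross_attention(queries, keys, values)
```

Each row of `attn` sums to 1, so every output token is a convex combination of the value vectors; keys that resemble the query receive larger weights, which is the filtering effect the text describes.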

For the ultrasound image sequence of the same case obtained during an ultrasound scan, we use 3D Transformer UNet (3D TUNet) to extract inter-frame contextual features. At the end of the encoder of 3D TUNet, we design a self-attention module that assigns different weights to the channels. The semantic probability maps produced by 2D TUNet and the original ultrasound frames are stacked as the input to 3D TUNet, enabling better extraction of contextual information. Finally, we integrate the intra-frame semantic features and the inter-frame contextual features, so that the hybrid features can be jointly optimized for better thyroid segmentation.

Fig. 4 Schematic diagram of the structure of 3D TUNet

Fig. 5 Structure of the self-attention module

2D TUNet provides semantically representative intra-frame features, while 3D TUNet extracts inter-frame contextual features. Building on these two feature extractors, we propose a Hybrid Feature Fusion (HFF) layer to integrate, learn, and optimize the two kinds of features jointly. The fusion process is described mathematically as follows:

[Equation (2): hybrid feature fusion; formula image not available]

In summary, our work makes the following contributions:

1) A 2D TUNet is designed to extract hierarchical intra-frame features for thyroid segmentation. The high-level features from the decoder and the low-level features from the encoder are integrated through the proposed Multi-scale Cross-Attention Transformer module to make the extracted semantic features more comprehensive.

2) A 3D TUNet is proposed to further exploit inter-frame contextual features for thyroid segmentation. A purpose-designed 3D self-attention transformer module is integrated into the lowest layer of 3D TUNet to refine the contextual features.

3) 2D TUNet and 3D TUNet are unified into an end-to-end framework in which intra-frame and inter-frame features are integrated into hybrid features to achieve state-of-the-art thyroid segmentation performance.

3. Introduction of the project leader

Chi Jianning, male, born in November 1987 in Shenyang, Liaoning Province, is an associate professor and master's supervisor at the School of Robotics Science and Engineering, Northeastern University. In 2011 he was selected for the China Scholarship Council's state-sponsored doctoral program for building high-level universities, and in 2017 he received a Ph.D. in Computer Science from the University of Saskatchewan, Canada. His main research interests include image quality enhancement, medical image processing, object recognition, and scene understanding.

As project leader, he has presided over one Youth Project of the National Natural Science Foundation of China, one project funded by the Liaoning Provincial Department of Education, one cultivation project of the Fundamental Research Funds for the Central Universities, and one industry-sponsored (horizontal) project. He has participated in four General Projects of the National Natural Science Foundation of China, one project of the Natural Science Foundation of Liaoning Province, one innovation project of the Shenyang Key Laboratory of Intelligent Robots, and one medical imaging research project run jointly by the Medical Association of Saskatchewan Province, Canada, and the University of Saskatchewan.

He has published more than 20 research papers in IEEE Transactions journals and well-known conferences in image processing and computer vision, including 12 SCI-indexed journal papers in venues such as IEEE Transactions on Image Processing, Neurocomputing, Computer Vision and Image Understanding, and Journal of Digital Imaging. He serves as a reviewer for well-known journals and conferences including Neurocomputing, IEEE Journal of Biomedical Engineering, and IEEE ICRA.

4. Clinical pain points

Background research shows that the main problems in applying computer-aided diagnosis to thyroid disease in clinical practice are as follows:

1) The margins, shape, size and volume of the thyroid gland in an ultrasound image are crucial for diagnosing thyroid disease, but speckle noise in ultrasound images can blur the boundaries of the gland. As a result, nodules with irregular borders are segmented incompletely, which affects the evaluation of pathological changes in the thyroid.

2) The appearance of the gland varies greatly from patient to patient. Previous medical image segmentation methods split thyroid datasets so that part of each case's images is used for training, which means every individual case needs a large number of labels [8]. In clinical use, however, a segmentation-based diagnostic aid for thyroid nodules cannot be retrained on every case. Earlier segmentation algorithms could only segment cases they had seen during training; on new cases absent from the training set they performed very poorly, which limited their clinical value. Reducing the labeling burden while improving clinical applicability and segmentation quality on new cases is therefore an important issue.

3) During ultrasonography, the thyroid and any thyroid nodules of a given case deform to some extent under compression by the ultrasound probe. In clinical examination this deformation provides important reference information for diagnosing thyroid disease [9]. Previous thyroid and nodule segmentation algorithms, however, segment each thyroid image independently and so discard how the target changes over time and space.

5. Clinical cooperation needs

This medical-engineering integration project aims to bring computer-processed medical images into clinical practice as an aid that helps doctors diagnose disease. Medical image segmentation underpins related technologies in medical image processing, such as visualization and three-dimensional reconstruction, and occupies an extremely important position in biomedical image analysis. It enables doctors to diagnose more accurately and helps guide clinical diagnosis and follow-up treatment plans.

In view of the problems in clinical practice, the cooperation needs are as follows:

1) Because of the characteristics of ultrasound equipment and ultrasound image formation, clinical thyroid ultrasound images contain artifacts and noise, which sometimes affect radiologists' diagnoses. Computer vision algorithms can suppress the noise in ultrasound images and help radiologists see the thyroid gland more clearly, so that they can make more accurate judgments. At the same time, the output of these algorithms needs to be verified by radiologists, so we need medical experts to work with us to promote the clinical application of the algorithms.

2) Computer vision algorithms require large amounts of labeled data, but public medical datasets are scarce, so cooperation with medical imaging doctors is needed to build more datasets for training models. At the same time, engineers need to keep improving the generalization ability of the algorithms to cope with the diversity of clinical cases.

3) For clinical application of the algorithm, engineers and medical imaging experts need to jointly determine an appropriate and convenient form of deployment, jointly develop evaluation standards suited to computer-aided diagnosis, and make the computer's clinical assistance concrete and quantifiable. Both sides should actively communicate their needs.

6. References

[1] CHUAN-YU CHANG, YUE-FONG LEI, CHIN-HSIAO TSENG, et al. Thyroid Segmentation and Volume Estimation in Ultrasound Images[J/OL]. IEEE Transactions on Biomedical Engineering, 2010, 57(6): 1348-1357. DOI:10.1109/TBME.2010.2041003.

[2] KERAMIDAS E G, MAROULIS D, IAKOVIDIS D K. TND: A Thyroid Nodule Detection System for Analysis of Ultrasound Images and Videos[J/OL]. Journal of Medical Systems, 2012, 36(3): 1271-1281. DOI:10.1007/s10916-010-9588-7.

[3] KESARKAR X A, KULHALLI K V. Thyroid Nodule Detection using Artificial Neural Network[C/OL]//2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS). 2021: 11-15. DOI:10.1109/ICAIS50930.2021.9396035.

[4] FIGGE J J. Epidemiology of Thyroid Cancer[M/OL]//WARTOFSKY L, VAN NOSTRAND D. Thyroid Cancer: A Comprehensive Guide to Clinical Management. New York, NY: Springer, 2016: 9-15[2022-12-01]. DOI:10.1007/978-1-4939-3314-3_2.

[5] POPOVENIUC G, JONKLAAS J. Thyroid Nodules[J/OL]. Medical Clinics, 2012, 96(2): 329-349. DOI:10.1016/j.mcna.2012.02.002.

[6] Thyroid Nodule Classification in Ultrasound Images by Fine-Tuning Deep Convolutional Neural Network[EB/OL]. [2022-11-30]. https://link.springer.com/article/10.1007/s10278-017-9997-y.

[7] TESSLER F N, MIDDLETON W D, GRANT E G, et al. ACR Thyroid Imaging, Reporting and Data System (TI-RADS): White Paper of the ACR TI-RADS Committee[J/OL]. Journal of the American College of Radiology, 2017, 14(5): 587-595. DOI:10.1016/j.jacr.2017.01.046.

[8] CHEN J, YOU H, LI K. A review of thyroid gland segmentation and thyroid nodule segmentation methods for medical ultrasound images[J/OL]. Computer Methods and Programs in Biomedicine, 2020, 185: 105329. DOI:10.1016/j.cmpb.2020.105329.

[9] Chen Yuefeng, Cong Shuzhen, Wang Yu, et al. Differential diagnosis of benign and malignant small solid thyroid nodules by ultrasound elastography[J/OL]. Chinese Journal of Medical Imaging Technology, 2012, 28(02): 252-255. DOI:10.13929/j.1003-3289.2012.02.050.

7. Contact the project team: Experts who are interested in this technology can scan the QR code below to communicate!
