Abstract

Objective. To evaluate the ability of artificial neural network (ANN)-based ultrasound radiomics to predict large-volume lymph node metastasis (LNM) preoperatively in patients with clinically node-negative (cN0) papillary thyroid carcinoma (PTC). Methods. From January 2020 to April 2021, 306 cN0 PTC patients admitted to our hospital were retrospectively reviewed and randomly divided into a training cohort (n = 183) and a validation cohort (n = 123) at a ratio of 6 : 4. Radiomic features quantitatively extracted from ultrasound images were pruned and used to train one ANN-based radiomic model and three conventional machine learning-based classifiers in the training cohort. Furthermore, an integrated ANN model incorporating clinical risk factors was constructed to improve prediction. The performance of the two models was also evaluated in the papillary thyroid microcarcinoma (PTMC) and conventional papillary thyroid cancer (CPTC) subgroups. Results. In the validation cohort, the radiomic model discriminated large-volume LNM better than the other classifiers, with an area under the receiver operating characteristic curve (AUROC) of 0.856 and an area under the precision-recall curve (AUPR) of 0.381. The integrated model performed better still, with an AUROC of 0.883 and an AUPR of 0.494. Calibration curves and decision curve analysis indicated that both models were well calibrated and clinically useful. Moreover, both models showed good predictive performance in the PTMC and CPTC subgroups. Conclusion. ANN-based ultrasound radiomics may be a useful tool for predicting large-volume LNM preoperatively in cN0 PTC patients.

1. Introduction

Papillary thyroid carcinoma (PTC) is the most common pathological type of thyroid cancer. According to the World Health Organization (WHO), PTC with a maximum diameter of ≤10 mm is defined as papillary thyroid microcarcinoma (PTMC), and PTC with a maximum diameter of >10 mm is called conventional papillary thyroid cancer (CPTC) [1, 2]. Although PTC is considered an indolent tumor, approximately 30%–80% of PTC patients present with central lymph node metastasis (LNM). Clinically LN-positive (clinical N1 disease, cN1) cases are becoming more frequent because of the increased use of ultrasound examination and more meticulous examination of surgical specimens by pathologists [3]. Nonetheless, 30%–65% of clinically N-negative (clinical N0 disease, cN0) PTC patients are found to have LNM postoperatively [4].

However, not all PTCs with LNM have a poor prognosis. The recurrence rate of patients with small-volume LNM (≤5 involved LNs) (median 4%, range 3%–8%) is significantly lower than that of patients with large-volume LNM (>5 involved LNs) (median 19%, range 7%–21%) [5, 6]. In addition, upstaging PTC on the basis of detected microscopic locoregional metastases may result in more aggressive treatment [7, 8]. Even a single microscopic LN metastasis can upstage a patient with low-risk PTC to intermediate risk of recurrence in the American Thyroid Association (ATA) system and to an increased risk of death in the American Joint Committee on Cancer (AJCC) staging system [7, 8]. Therefore, an accurate preoperative tool for predicting large-volume LNM could guide treatment more precisely.

Previous studies reported that age <40 years and male sex were significantly associated with large-volume LNM in PTC patients [9–11]. Nevertheless, these studies focused on identifying risk factors for large-volume LNM rather than constructing a predictive model. Ultrasound is the first-line noninvasive imaging method for evaluating cervical LNM, with a specificity of 85.0%–97.4% but a sensitivity of only 20%–31% [12, 13]. Radiomics extracts high-throughput quantitative imaging features and can reveal information about the underlying pathophysiology that cannot be assessed by visual interpretation [14]. In recent years, ultrasound-based radiomics has shown good ability to predict cervical LNM in PTC patients [15].

An artificial neural network (ANN) is a network of many interconnected simple processing units that can perform complex logical operations and model nonlinear relationships; thus, it has been applied to the construction of machine learning-based radiomic models [16]. This study developed and validated two predictive models that combine an ANN with ultrasound radiomics to predict large-volume LNM in cN0 PTC patients.

2. Materials and Methods

The review board of the First Affiliated Hospital of Nanchang University approved this retrospective study. A retrospective review with deidentified data was used, and no protected health information was acquired. Thus, the need for informed consent from all patients was waived.

2.1. Patients

From January 2020 to April 2021, patients with PTC admitted to the Department of Otolaryngology of our hospital were enrolled. The inclusion criteria were as follows: (1) total thyroidectomy with bilateral central lymph node dissection (CND), with available pathological results; (2) ultrasound examination performed within two weeks before surgery; (3) available ultrasound images of the target nodule in the cross section of its longest axis; and (4) age older than 18 years. The exclusion criteria were as follows: (1) five or fewer lymph nodes (LNs) resected; (2) cN1 diagnostic criteria met preoperatively; (3) target nodule treated with radiofrequency ablation, radiotherapy, or chemotherapy before ultrasound examination; (4) target nodule unclear on ultrasound images because of artifacts; and (5) other concomitant diseases that can cause pathologically positive LNs. In this study, cN1 was defined by at least one of the following LN features on preoperative ultrasound examination: a transverse-to-long diameter ratio >0.5, a blurred corticomedullary boundary, loss of the medullary structure, microcalcification, cystic change, or a chaotic or peripheral vascular pattern [17–19].
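The eligibility rules and the study's outcome definition can be written as simple predicates. The sketch below is illustrative only: the record fields (`age_years`, `ln_findings`, and so on) and the finding labels are hypothetical, and the criteria that require image or chart review (image availability, prior ablation or therapy, artifacts, other diseases) are left out.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the suspicious LN findings that define cN1 above.
CN1_FINDINGS = {
    "blurred_corticomedullary_boundary",
    "loss_of_medullary_structure",
    "microcalcification",
    "cystic_change",
    "chaotic_or_peripheral_vascularity",
}

@dataclass
class Candidate:
    """Hypothetical per-patient record; not the study's actual data dictionary."""
    age_years: int
    total_thyroidectomy_with_bilateral_cnd: bool
    days_from_ultrasound_to_surgery: int
    lymph_nodes_resected: int
    max_ln_transverse_to_long_ratio: float  # largest ratio among visible LNs
    ln_findings: set = field(default_factory=set)

def is_cn1(c: Candidate) -> bool:
    """cN1 if at least one suspicious LN feature is seen on preoperative ultrasound."""
    return c.max_ln_transverse_to_long_ratio > 0.5 or bool(c.ln_findings & CN1_FINDINGS)

def is_eligible(c: Candidate) -> bool:
    included = (
        c.total_thyroidectomy_with_bilateral_cnd
        and c.days_from_ultrasound_to_surgery <= 14  # "within two weeks"
        and c.age_years > 18
    )
    excluded = c.lymph_nodes_resected <= 5 or is_cn1(c)
    return included and not excluded

def is_large_volume_lnm(n_involved_lns: int) -> bool:
    """Outcome label: large-volume LNM means more than 5 pathologically involved LNs."""
    return n_involved_lns > 5
```

For example, `is_eligible(Candidate(45, True, 7, 12, 0.4))` is true, while the same patient with `{"microcalcification"}` in `ln_findings` would be excluded as cN1.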

A total of 559 patients met the inclusion criteria, and 306 patients (median age 45 years, range 24–81 years; 65 men and 241 women) were enrolled after exclusion (Figure 1(a)). Of these, 156 patients were included in our previous study, which developed and validated an ultrasound radiomic model for predicting malignant thyroid nodules [20]. All patients were randomly divided into a training cohort (n = 183) and a validation cohort (n = 123) at a ratio of 6 : 4.
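Truncating 60% of 306 patients reproduces the reported cohort sizes. Whether the split was stratified by outcome is not stated, so this minimal sketch uses a plain shuffle; the function name and seed are hypothetical.

```python
import random

def split_6_4(patient_ids, seed=2021):
    """Unstratified random 6:4 split (the seed is an arbitrary, hypothetical choice)."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * 0.6)  # 306 * 0.6 = 183.6, truncated to 183
    return ids[:n_train], ids[n_train:]

training, validation = split_6_4(range(1, 307))
print(len(training), len(validation))  # 183 123
```

Note that with only about 10% positives, an unstratified split can leave the two cohorts with somewhat different outcome prevalences (here 10.4% vs 8.9%).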

2.2. Clinical and Ultrasound Information

Baseline clinicopathological data, including age, sex, and the pathology of the nodule and LNs, were obtained from medical records. Patients were divided into two age groups (<40 years and ≥40 years) [10, 11]. Ultrasound Digital Imaging and Communications in Medicine (DICOM) images were acquired with Philips iU Elite and EPIQ7 ultrasound systems (Philips Medical System, Bothell, WA, USA) using a 5–12 MHz linear transducer. Two radiologists with more than 5 and 8 years of experience, respectively, who were blinded to the pathological results, reviewed the images using Picture Archiving and Communication Systems (PACS). In consensus, they evaluated the 2017 American College of Radiology (ACR) Thyroid Imaging Reporting and Data System (TI-RADS) category [21], tumor size, and capsule invasion. In cases of multifocality, the nodule with the highest ACR TI-RADS score was selected as the target nodule; when nodules had the same score, the nodule with the larger diameter was selected.

2.3. Nodule Segmentation and Feature Extraction

Two radiologists with more than 3 and 10 years of experience, respectively, who were blinded to the pathological results and the corresponding LN images, manually segmented the region of interest (ROI) of the target nodule using the open-source software 3D Slicer (version 4.10.2; National Institutes of Health-funded; https://www.slicer.org) (Supplementary Materials 1) [22]. The ROI was delineated on a single representative section showing the largest nodule area. Intraobserver and interobserver agreement was evaluated on 30 randomly chosen nodules that were delineated twice by one radiologist within two weeks and once by the other radiologist. A mean intraclass correlation coefficient (ICC) > 0.75 indicated satisfactory agreement; once strong agreement (mean ICC > 0.90) was achieved, one radiologist delineated the remaining nodules. The open-source package PyRadiomics (version 3.0.1; http://pyradiomics.readthedocs.io/en/latest/index.html) [23] was used to extract 849 radiomic texture, shape, and intensity features from the original and wavelet-filtered images of each nodule (Supplementary Materials 2). Resampling and z-score normalization were performed as preprocessing steps.
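The paper does not state which ICC form was used. As an assumption, the sketch below uses ICC(2,1) in Shrout and Fleiss's notation (two-way random effects, absolute agreement, single measurement) and then keeps the features whose ICC exceeds 0.75, matching the cut-off described here and in Section 2.4. The function names are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.
    ratings: one row per nodule, each row holding the k readings of one feature."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between nodules
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between readings
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

def reproducible_features(first, second, cutoff=0.75):
    """first/second: dict feature name -> values per nodule from two delineations."""
    return [
        name for name in first
        if icc_2_1(list(zip(first[name], second[name]))) > cutoff
    ]
```

Because this form measures absolute agreement, a constant offset between readings lowers the ICC: readings `[[1, 2], [2, 3], [3, 4]]` give an ICC of 2/3, whereas a consistency-type ICC would give 1.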

2.4. Radiomic Feature Dimension Reduction and Selection

To address the class imbalance, we used the synthetic minority oversampling technique (SMOTE) to balance the training cohort [24]. Dimensionality reduction and radiomic feature selection were performed in the following steps: (1) radiomic features with an intraobserver or interobserver ICC ≤0.75 were removed; (2) features that did not differ significantly in univariate analysis (Mann–Whitney U test) were excluded; (3) Spearman's correlation coefficient (r) was used to assess the correlations among all remaining radiomic features, and for each pair of highly correlated features (r > 0.80), the feature with the lower area under the receiver operating characteristic curve (AUROC) was removed; and (4) the least absolute shrinkage and selection operator (LASSO) method [25, 26] was applied to select the most significant features.
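Two of these steps are easy to misread, so here is a minimal pure-Python sketch of each under stated assumptions. SMOTE is shown in its basic form: interpolate between a random minority sample and one of its k nearest minority neighbours (Euclidean distance, k = 5 by default as in the original SMOTE description). Step (3) is implemented greedily: features are ranked by a direction-agnostic AUROC, max(AUROC, 1 − AUROC), and a feature is kept only if its absolute Spearman correlation with every already-kept feature is at most 0.80. The greedy ordering and the use of the absolute correlation are assumptions; the paper does not give these details, and all function names are hypothetical.

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Basic SMOTE: new points on segments between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> partner
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic

def average_ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman(a, b):
    """Spearman's rho as Pearson's r of the ranks (undefined for constant inputs)."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(var_a * var_b)

def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def drop_correlated(features, labels, threshold=0.80):
    """features: dict name -> values. Keep the more discriminative feature of
    every pair whose |rho| exceeds the threshold (greedy pass)."""
    def discrimination(name):
        auc = auroc(labels, features[name])
        return max(auc, 1 - auc)
    kept = []
    for name in sorted(features, key=discrimination, reverse=True):
        if all(abs(spearman(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

In a real pipeline, synthetic samples should be generated from training data only, so that no information from the validation cohort leaks into model fitting.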

2.5. Radiomic Model and Integrated Model Construction

Using the selected radiomic features and the data of the training cohort, we built the radiomic model for large-volume LNM as a feed-forward ANN with a single hidden layer, trained with the backpropagation algorithm. The same radiomic features were used to train linear discriminant analysis, support vector machine, and random forest classifiers for comparison.
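To make the 25-15-1 architecture reported in Section 3.3 concrete, here is a minimal NumPy sketch of a single-hidden-layer feed-forward network trained by backpropagation. The tanh hidden activation, sigmoid output, binary cross-entropy loss, full-batch gradient descent, learning rate, and epoch count are all assumptions for illustration; the study's actual training settings are given in Supplementary Materials 3.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

class OneHiddenLayerANN:
    """n_inputs -> n_hidden (tanh) -> 1 (sigmoid), trained by full-batch
    backpropagation on mean binary cross-entropy. Hyperparameters are assumed."""

    def __init__(self, n_inputs=25, n_hidden=15, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_inputs), size=(n_inputs, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, 1))
        self.b2 = np.zeros(1)

    def _forward(self, X):
        hidden = np.tanh(X @ self.W1 + self.b1)
        prob = sigmoid(hidden @ self.W2 + self.b2).ravel()
        return hidden, prob

    def predict_proba(self, X):
        return self._forward(np.asarray(X, dtype=float))[1]

    def fit(self, X, y, lr=0.5, epochs=2000):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n = X.shape[0]
        for _ in range(epochs):
            hidden, prob = self._forward(X)
            # Gradient of mean BCE w.r.t. the output logit is (p - y) / n.
            d_out = ((prob - y) / n)[:, None]
            # Backpropagate through W2 and tanh (derivative 1 - tanh^2).
            d_hidden = (d_out @ self.W2.T) * (1.0 - hidden ** 2)
            self.W2 -= lr * hidden.T @ d_out
            self.b2 -= lr * d_out.sum(axis=0)
            self.W1 -= lr * X.T @ d_hidden
            self.b1 -= lr * d_hidden.sum(axis=0)
        return self

# Usage on synthetic data (25 features, label driven by two of them).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 25))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(float)
model = OneHiddenLayerANN().fit(X_demo, y_demo)
probs_demo = model.predict_proba(X_demo)
```

With 406 trainable weights and a training cohort of a few hundred (SMOTE-augmented) samples, this kind of network can overfit easily, which is why validation-cohort performance is the more informative measure.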

To provide a more practical prediction tool, we assessed the incremental value of clinical data as additional predictors. Clinical factors that were significant in both univariate and multivariate logistic regression analyses were considered independent risk factors. For comparison, we established a clinical model using multivariate logistic regression. Then, an integrated ANN model incorporating the radiomic features and the independent clinical risk factors was constructed. Supplementary Materials 3 describes the ANN training in detail. The probabilities predicted by the radiomic model and integrated model were termed Rad-prob and Inte-prob, respectively. Because the ROC curve may portray overly optimistic performance on imbalanced data such as ours, we also applied the precision-recall (PR) curve, which focuses on the minority class [27]. Based on the PR curve, the optimal cut-off value was defined as the probability that maximized the sum of precision and recall in the training cohort.
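The cut-off rule can be made precise with a short sketch: compute precision and recall at every distinct predicted probability (predicting positive when the probability is at or above the threshold), summarize the curve as average precision, and pick the threshold that maximizes precision + recall. Average precision is only one common AUPR estimator (trapezoidal integration is another), and the paper does not state which was used; the function names are hypothetical.

```python
def pr_points(labels, probs):
    """(threshold, precision, recall) at each distinct probability, highest first."""
    n_pos = sum(labels)
    points = []
    for t in sorted(set(probs), reverse=True):
        tp = sum(1 for y, p in zip(labels, probs) if p >= t and y == 1)
        fp = sum(1 for y, p in zip(labels, probs) if p >= t and y == 0)
        points.append((t, tp / (tp + fp), tp / n_pos))
    return points

def average_precision(labels, probs):
    """Step-wise AUPR: sum of precision weighted by each recall increment."""
    ap, prev_recall = 0.0, 0.0
    for _, precision, recall in pr_points(labels, probs):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

def optimal_cutoff(labels, probs):
    """Threshold maximizing precision + recall (computed on training data)."""
    threshold, _, _ = max(pr_points(labels, probs), key=lambda pt: pt[1] + pt[2])
    return threshold

labels = [0, 0, 1, 1, 0, 1]
probs = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(optimal_cutoff(labels, probs))  # 0.35
```

Here the threshold 0.35 gives precision 0.75 and recall 1.0 (sum 1.75), beating 0.8 (precision 1.0, recall 2/3). The cut-off is fixed on the training cohort and then applied unchanged to the validation cohort.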

2.6. Radiomic Model and Integrated Model Validation

The ANN-based classifier and the three conventional machine learning-based classifiers were evaluated in the validation cohort using the AUROC and the area under the PR curve (AUPR). The predictive performance of the radiomic and integrated models was assessed in terms of discrimination, calibration, and clinical usefulness. Calibration was evaluated with the Hosmer–Lemeshow test and calibration curves [28]. The discrimination metrics included accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), AUROC, and AUPR. Decision curve analysis (DCA) was conducted to determine clinical usefulness by quantifying net benefits across a range of threshold probabilities.
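A pure-Python sketch of the two calibration and usefulness procedures, under assumptions: the Hosmer–Lemeshow statistic uses 10 near-equal groups formed after sorting by predicted probability, with 10 − 2 = 8 degrees of freedom (the closed-form chi-square tail for even degrees of freedom is used so no statistics library is needed), and net benefit follows the standard decision-curve definition, net benefit = TP/n − (FP/n) × p_t/(1 − p_t), compared with a treat-all strategy. Function names are hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(X > x); closed form valid for even df only."""
    if df % 2:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(labels, probs, groups=10):
    """HL statistic over near-equal groups sorted by predicted probability.
    Assumes probabilities strictly in (0, 1) and at least `groups` patients."""
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    return stat, chi2_sf_even_df(stat, groups - 2)

def net_benefit(labels, probs, threshold):
    """Net benefit of treating every patient whose probability >= threshold."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat everyone regardless of the model."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

Evaluating `net_benefit` for each model over a grid of threshold probabilities, alongside treat-all and treat-none (net benefit 0), traces the decision curves; a model is clinically useful over the threshold range where its curve lies above both references.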

2.7. Model Validation in PTMC and CPTC

The entire cohort was divided by maximum tumor diameter into the PTMC subgroup (≤10 mm; n = 114) and the CPTC subgroup (>10 mm; n = 192). Through subgroup analysis, we investigated whether the radiomic and integrated models could predict large-volume LNM within each subgroup. The performance metrics included accuracy, sensitivity, specificity, PPV, NPV, AUROC, and AUPR.
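For reference, the threshold-dependent metrics listed here all come from one confusion matrix at the chosen cut-off (positive when the predicted probability is at or above it). A minimal sketch with hypothetical names; ratios with an empty denominator are returned as NaN.

```python
def _ratio(num, den):
    return num / den if den else float("nan")

def metrics_at_cutoff(labels, probs, cutoff):
    """Accuracy, sensitivity, specificity, PPV, and NPV at a probability cut-off."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        if p >= cutoff:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }
```

With a low prevalence, as in the PTMC subgroup, PPV is especially sensitive to a few false positives, which is one reason the AUPR can be low even when the AUROC is high.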

2.8. Statistical Analysis

Statistical analyses were performed using Python (version 3.8.8; https://www.python.org/) and R (version 4.0.1; https://www.r-project.org/). Continuous variables were expressed as medians with interquartile ranges (IQRs) and compared using the Mann–Whitney U test, and categorical data were expressed as numbers with percentages and compared using the chi-square test or Fisher's exact test. The DeLong test was used to compare AUROCs. All statistical tests were two-sided, and was considered statistically significant.
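The DeLong test compares two correlated AUROCs measured on the same patients by estimating the variance of their difference from per-patient "placement" values. Below is a compact NumPy sketch of that estimator using the direct pairwise formulation (faster rank-based algorithms exist); the function names are hypothetical, and in R the `pROC::roc.test` function offers a DeLong method. The sketch assumes at least two positives and two negatives.

```python
import math
import numpy as np

def _placements(pos, neg):
    """Structural components from psi(x, y) = 1 if x > y, 0.5 if tied, else 0."""
    psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    return psi.mean(axis=1), psi.mean(axis=0)  # V10 per positive, V01 per negative

def delong_test(labels, scores_a, scores_b):
    """Two-sided DeLong test; returns (AUROC A, AUROC B, p value)."""
    y = np.asarray(labels) == 1
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    v10_a, v01_a = _placements(a[y], a[~y])
    v10_b, v01_b = _placements(b[y], b[~y])
    auc_a, auc_b = v10_a.mean(), v10_b.mean()
    s10 = np.cov(np.vstack([v10_a, v10_b]))  # 2x2 covariance over positives
    s01 = np.cov(np.vstack([v01_a, v01_b]))  # 2x2 covariance over negatives
    var = ((s10[0, 0] + s10[1, 1] - 2 * s10[0, 1]) / y.sum()
           + (s01[0, 0] + s01[1, 1] - 2 * s01[0, 1]) / (~y).sum())
    if var <= 1e-15:  # identical or degenerate curves: no evidence of a difference
        return float(auc_a), float(auc_b), 1.0
    z = (auc_a - auc_b) / math.sqrt(var)
    return float(auc_a), float(auc_b), math.erfc(abs(z) / math.sqrt(2))
```

Because both models are scored on the same patients, the covariance term matters: ignoring it and treating the AUROCs as independent would usually overstate the variance and understate significance.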

3. Results

3.1. Patient Clinicopathological Characteristics

The baseline clinicopathological characteristics are presented in Table 1. The analysis showed no significant differences in clinicopathological characteristics between the training and validation cohorts. PTCs with large-volume LNM accounted for 10.4% (19/183) and 8.9% (11/123) of the training and validation cohorts, respectively (). The characteristics of the patients according to their large-volume LNM status are listed in Table 2. Younger age (<40 years) and male sex were significantly associated with a higher prevalence of large-volume LNM (all ).

3.2. Radiomic Feature Dimension Reduction and Selection

The rates of intraobserver and interobserver agreement for the radiomic features reached 95.1% (807/849; mean ICC = 0.950) and 95.6% (812/849; mean ICC = 0.941), respectively (Supplementary Materials 4). Forty-four radiomic features were excluded because of unsatisfactory agreement, and 145 were excluded because they did not differ significantly in univariate analysis. After correlation analysis, 58 features remained. LASSO regression then selected 25 radiomic features as the most significant predictors of large-volume LNM (Supplementary Materials 5). The feature names and a heatmap of the pairwise Spearman correlations are shown in Figure 2.

3.3. Radiomic Model and Integrated Model Construction

Our ANN-based radiomic model consisted of 25 input radiomic feature variables, one hidden layer of 15 neurons, and one output unit that yields the probability of large-volume LNM (Figure 1(b)). In the training cohort, Rad-prob had an accuracy of 86%, an AUROC of 0.890, and an AUPR of 0.348.

In multivariate logistic regression, age <40 years (odds ratio (OR) 3.59, 95% confidence interval (CI) 1.31–9.87; ) and male sex (OR 3.72, 95% CI 1.36–10.18; ) were independent risk factors for large-volume LNM. The integrated model was constructed by adding these two clinical factors as inputs (Figure 1(c); see Supplementary Materials 6 for the training and testing loss and accuracy curves). The accuracy, AUROC, and AUPR of Inte-prob significantly increased to 91%, 0.910, and 0.463, respectively (Table 3).
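Odds ratios from logistic regression are exponentiated coefficients, so a Wald-type 95% CI is symmetric on the log scale. Assuming Wald intervals were reported (the paper does not say), the age estimate can be checked and its standard error recovered from the published numbers:

```latex
\mathrm{OR} = e^{\hat{\beta}}, \qquad
95\%\ \mathrm{CI} = \exp\!\left(\hat{\beta} \pm 1.96\,\widehat{\mathrm{SE}}(\hat{\beta})\right)

\text{Age} < 40:\quad \hat{\beta} = \ln 3.59 \approx 1.278, \qquad
\tfrac{1}{2}\left(\ln 1.31 + \ln 9.87\right) \approx \tfrac{1}{2}\left(0.270 + 2.289\right) \approx 1.28

\widehat{\mathrm{SE}}(\hat{\beta}) \approx \frac{\ln 9.87 - \ln 1.31}{2 \times 1.96}
  \approx \frac{2.019}{3.92} \approx 0.515
```

The log-scale midpoint of the interval matches ln OR, consistent with a Wald interval; the wide CI reflects the small number of large-volume LNM events.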

3.4. Radiomic Model and Integrated Model Validation

In the validation cohort, the AUROC and AUPR of the ANN-based radiomic model were higher than those of the three conventional machine learning-based classifiers (detailed ROC and PR analyses are described in Supplementary Materials 7). The radiomic and integrated models showed good calibration in the validation cohort (Figure 3(e)). The accuracy, AUROC, and AUPR of Rad-prob were 83%, 0.856, and 0.381, respectively. Inte-prob achieved improved performance, with an accuracy of 93%, an AUROC of 0.883, and an AUPR of 0.494 (Table 3 and Figures 3(f) and 3(g)). The discrimination of the radiomic and integrated models was significantly better than that of the clinical model ( and 0.013). DCA showed that Inte-prob had the highest clinical value, followed by Rad-prob. Both Rad-prob and Inte-prob were significantly positively correlated with the number of involved LNs in the entire cohort (r = 0.442 and 0.464, both ) (Supplementary Materials 8).

3.5. Model Validation in PTMC and CPTC

In further subgroup analysis, Rad-prob (PTMC: OR 2.72, 95% CI 1.37–5.40; CPTC: OR 2.72, 95% CI 1.88–3.92) and Inte-prob (PTMC: OR 2.72, 95% CI 1.43–5.18; CPTC: OR 2.72, 95% CI 1.93–3.84) were independent predictors of large-volume LNM in the PTMC and CPTC subgroups (all ). In the PTMC subgroup, the predictive performance of Rad-prob (accuracy 87%; AUROC 0.875; AUPR 0.145) and Inte-prob (accuracy 96%; AUROC 0.901; AUPR 0.298) exceeded that of the clinical model ( and 0.075). In the CPTC subgroup, the predictive performance of Rad-prob (accuracy 83%; AUROC 0.877; AUPR 0.463) and Inte-prob (accuracy 92%; AUROC 0.897; AUPR 0.539) was significantly better than that of the clinical model ( and 0.003) (Table 4 and Figure 4).

4. Discussion

This study developed and validated ANN-based radiomic and integrated models to predict large-volume LNM in cN0 PTC patients. Both models showed good discrimination, calibration, and clinical usefulness and outperformed the clinical model. The integrated model, which combined ultrasound and clinical information, achieved better predictive performance than the radiomic model. Furthermore, the radiomic and integrated models had good predictive performance in both the PTMC and CPTC subgroups.

Specific LN characteristics, including the number, size, and extranodal extension of involved LNs, can stratify the risk of recurrence in PTC. Small-volume subclinical microscopic N1 disease conveys a much smaller risk of recurrence than large-volume clinically apparent macroscopic LNM [29]. Involvement of more than 5 LNs is defined as large-volume LNM, which is associated with a 19% risk of recurrence and correlated with lung metastasis [5, 6, 30]. Accurately identifying PTC patients with a poor prognosis is essential for selecting appropriate management strategies. However, the sensitivity of ultrasound for detecting cervical LNM preoperatively is low [12, 13]. Thus, preoperative predictors beyond conventional ultrasound features would help estimate the risk of large-volume LNM in cN0 PTC patients.

Age is the most important prognostic factor for thyroid carcinoma [9]. Large-volume LNM is more likely to occur in PTMC patients aged <40 years [10, 11]. Male sex has been identified as a risk factor for thyroid cancer [31], and PTC in men exhibits more aggressive behavior and a worse prognosis than PTC in women [32]. Similarly, in our study, young age and male sex were independent risk factors. However, previous studies did not construct a predictive model based on these clinical factors, and in our study, the predictive performance of the clinical model was limited.

Radiomics has recently been applied to thyroid nodules and performs well in predicting malignancy and LNM [33–36]. Park et al. [36] reported that radiomics could improve the discrimination of thyroid risk classification systems for malignant thyroid nodules and reduce the number of thyroid nodules recommended for biopsy. Li et al. [34] demonstrated that radiomics has good ability to predict pathologic LN stage in PTC patients. Jiang et al. [33] found that a shear wave elastography radiomic signature can accurately predict LNM in PTC patients. Radiomics is therefore a promising tool for predicting large-volume LNM.

Moreover, we combined radiomics with an ANN to improve the performance of the predictive models. An ANN is characterized by nonlinear statistical modeling, a highly interconnected set of processing units (neurons), and weighted connections [37]. As a commonly used machine learning method, the ANN has become a potential tool for predicting clinical outcomes. Hanai et al. [38] demonstrated that an ANN was more helpful than conventional statistical methods for predicting the survival of patients with non-small-cell lung cancer. Tong et al. [39] found that ANN-based models performed better than logistic regression models in predicting the survival of patients with unresectable pancreatic cancer. In our study, the most valuable radiomic features were fed into an ANN consisting of an input layer, a hidden layer, and an output layer to build prediction models for large-volume LNM. Although the differences were not statistically significant, possibly because of the data imbalance and the relatively small study population, the ROC and PR analyses suggested that the ANN-based radiomic model had better discrimination and was less prone to overfitting than the linear discriminant analysis, support vector machine, and random forest classifiers. The radiomic model showed favorable calibration and predictive value for large-volume LNM in cN0 PTC patients. Compared with the clinical model, it improved the AUROC, AUPR, accuracy, sensitivity, specificity, PPV, and NPV.

Furthermore, the integrated model achieved higher predictive performance by adding the independent clinical risk factors. It displayed good calibration and discrimination, with the highest AUROC, AUPR, accuracy, specificity, and PPV among the three models. Notably, the PPV of the integrated model was much higher than those of the radiomic and clinical models. DCA showed that the integrated model provided the highest overall net benefit, followed by the radiomic model. The better performance of the radiomic and integrated models in this study indicates that ANN-based models could accurately identify high-risk patients with large-volume LNM, thereby providing information to guide treatment and prognostic assessment in cN0 PTC.

Unlike the clinical factors, Rad-prob and Inte-prob were stable independent predictors in both the PTMC and CPTC subgroups. Young age (OR 3.44, 95% CI 1.40–8.43; ) and male sex (OR 3.26, 95% CI 1.31–8.08; ) were independent risk factors in the CPTC subgroup, whereas tumor size (OR 2.07, 95% CI 1.07–3.99; ) was an independent risk factor in the PTMC subgroup. The radiomic and integrated models had stronger predictive value than the clinical model in both subgroups, although the difference was not statistically significant in the PTMC subgroup, likely because of its small size. These results suggest that the radiomic and integrated models can predict large-volume LNM in PTC of different tumor sizes.

Our study has several limitations. First, this was a single-center retrospective study; thus, selection bias may be inevitable, and a prospective multicenter study is necessary to validate the models further. Second, the proportion of large-volume LNM was low, which led to imbalanced data. Although SMOTE was used, this imbalance inevitably affected model construction. Third, the prognostic value of the models should be further validated by follow-up of recurrence. Fourth, images were acquired only with Philips ultrasound systems; the influence of images from different ultrasound instruments should be investigated.

5. Conclusions

In conclusion, radiomics can add predictive value to independent clinical predictors. The ANN-based ultrasound radiomic model and the integrated model combining imaging and clinical information have the potential to predict large-volume LNM preoperatively in cN0 PTC patients.

Abbreviations

ANN: Artificial neural network
ACR TI-RADS: American College of Radiology Thyroid Imaging, Reporting and Data System
AUPR: Area under the precision-recall curve
AUROC: Area under the receiver operating characteristic curve
CI: Confidence interval
cN0: Clinically N-negative
cN1: Clinically LN-positive
CPTC: Conventional papillary thyroid cancer
DCA: Decision curve analysis
DICOM: Digital Imaging and Communications in Medicine
ICC: Intraclass correlation coefficient
IQR: Interquartile range
LASSO: Least absolute shrinkage and selection operator
LNM: Lymph node metastasis
NPV: Negative predictive value
OR: Odds ratio
PACS: Picture Archiving and Communication Systems
PPV: Positive predictive value
PTC: Papillary thyroid carcinoma
PTMC: Papillary thyroid microcarcinoma
ROI: Region of interest
WHO: World Health Organization

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Disclosure

Wan Zhu and Xingzhi Huang are the co-first authors.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Wan Zhu and Xingzhi Huang contributed equally to this paper.

Acknowledgments

This work was supported by the General Project of Interdisciplinary Innovation Fund of Nanchang University (no. 9167-28220007-YB2110).

Supplementary Materials

Supplementary Materials 1: manual segmentation of the target nodule region of interest. Supplementary Materials 2: details of the extracted radiomic features. Supplementary Materials 3: details of the artificial neural network. Supplementary Materials 4: intraobserver and interobserver agreement of radiomic features based on the intraclass correlation coefficient. Supplementary Materials 5: least absolute shrinkage and selection operator coefficient profiles of radiomic features. Supplementary Materials 6: training and validation accuracy and loss of the radiomic model and integrated model. Supplementary Materials 7: performance of the ANN-based and conventional machine learning-based classifiers for predicting large-volume LNM according to the ROC and PR analyses. Supplementary Materials 8: correlation of the probabilities predicted by the radiomic model and integrated model with the number of positive lymph nodes.