Machine Learning for Predicting Distant Metastasis of Medullary Thyroid Carcinoma Using the SEER Database

Guo, Zhen-Tian; Tian, Kun; Xie, Xi-Yuan; Zhang, Yu-Hang; Fang, De-Bao

DOI: https://doi.org/10.1155/2023/9965578

International Journal of Endocrinology


Research Article | Open Access

Volume 2023 | Article ID 9965578 | https://doi.org/10.1155/2023/9965578


Academic Editor: Alexander Schreiber

Received 24 May 2023

Revised 19 Dec 2023

Accepted 21 Dec 2023

Published 30 Dec 2023

Abstract

Objectives. We aimed to establish an effective machine learning (ML) model for predicting the risk of distant metastasis (DM) in medullary thyroid carcinoma (MTC). Methods. Demographic and clinicopathological data of patients with MTC diagnosed between 2004 and 2015 were extracted from the Surveillance, Epidemiology, and End Results (SEER) database of the National Institutes of Health and used to develop six ML models. Models were evaluated based on accuracy, precision, recall rate, F1-score, and area under the receiver operating characteristic curve (AUC). The associations between clinicopathological characteristics and the target variable were also analyzed using traditional logistic regression (LR). Results. In total, 2049 patients were included, of whom 138 developed DM. Multivariable LR showed that age, sex, tumor size, extrathyroidal extension, and lymph node metastasis were predictive features for DM in MTC. Among the six ML models, the random forest (RF) model had the best predictive performance for assessing the risk of DM in MTC, with accuracy, precision, recall rate, F1-score, and AUC higher than those of the traditional binary LR model. Conclusion. RF was superior to traditional LR in predicting the risk of DM in MTC and can provide a valuable reference for clinicians in decision-making.

1. Introduction

As a result of changes in living environments, heightened health awareness, and advances in detection technology, the incidence of thyroid cancer has increased considerably in most parts of the world [1]. Medullary thyroid carcinoma (MTC) is a relatively rare malignancy, constituting approximately 5% of all thyroid malignancies. Patients with MTC generally have a poorer prognosis than those with differentiated thyroid cancer (DTC), and MTC accounts for approximately 13% of all thyroid cancer-related deaths [2, 3]. Roughly 75% of MTC cases are sporadic, while around 25% are hereditary with autosomal dominant inheritance [4]. Germline mutations in the RET proto-oncogene are present in approximately 6% of patients with apparently sporadic MTC and in up to 98% of patients with hereditary MTC [5]. Extrathyroidal extension and distant metastasis (DM) have been shown to be significant predictors of poor prognosis in patients with MTC [6, 7]. At initial diagnosis, 10%–15% of MTC patients present with DM [8], which may involve the bones, lungs, and liver [9]. The American Thyroid Association guidelines for the management of medullary thyroid cancer recommend several imaging examinations for MTC patients who may have DM, including contrast-enhanced CT, MRI, abdominal ultrasound, and bone scans [10]. These diagnostic methods have a sensitivity of approximately 50%–80% for metastatic disease. In recent years, drugs targeting RET proto-oncogene mutations have proven effective in treating MTC patients with RET mutations [11]. Consequently, early diagnosis of DM in MTC and early intervention for high-risk patients may significantly improve patient survival.

Machine learning (ML) is a subfield of artificial intelligence. Compared with traditional predictive models, ML can improve model accuracy by uncovering nonlinear relationships in large datasets [12, 13]. Medical care generates vast amounts of patient data, and processing and analyzing these data with ML can offer clinicians a reliable reference for diagnosing diseases and predicting outcomes. Therefore, our study aimed to develop a model based on the Surveillance, Epidemiology, and End Results (SEER) database to predict the occurrence of DM in patients with MTC.

2. Materials and Methods

2.1. Data Sources and Study Population

Data for this study were obtained from the public SEER database using SEER*Stat software (version 8.4.0.1). Our study focused on patients diagnosed with MTC in the United States between 2004 and 2015. We excluded patients with missing data, unclear clinical or pathological information, uncertain histological classification, or other types of thyroid cancer (TC); histological types were restricted to medullary carcinoma. Following the International Classification of Diseases for Oncology, Third Edition (ICD-O-3), the histology codes 8345/3 and 8510/3 were used, and staging followed the AJCC 7th edition TNM system. Variables included age, sex (male or female), race (White, Black, or other), year of diagnosis, Spanish-Hispanic origin, laterality (unilateral or bilateral), multifocality (solitary or multifocal), tumor size, extrathyroidal extension, lymph node metastasis, MTC subtype, and DM. DM was defined as tumor spread to at least one distant organ, such as the brain, bone, liver, or lung. Because the SEER database contains public data, neither informed consent from patients nor ethical approval was required for its use in research. Our request for access to the SEER data was approved by the National Cancer Institute, USA (reference number 19238-Nov2021).
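
The cohort selection above can be sketched in pandas. This is a minimal illustration, not the authors' extraction script: it assumes the SEER*Stat case listing has already been exported to a DataFrame, and every column name (`histology_icdo3`, `year_of_diagnosis`, `dm`, and the predictor names) is a hypothetical placeholder rather than an actual SEER field label.

```python
import pandas as pd

# Hypothetical column names; a real SEER*Stat export uses its own field labels and codings.
MTC_HISTOLOGY_CODES = {"8345/3", "8510/3"}  # ICD-O-3 histology codes named in the text
PREDICTORS = [
    "age", "sex", "race", "year_of_diagnosis", "spanish_hispanic", "laterality",
    "multifocality", "tumor_size", "extrathyroidal_extension",
    "lymph_node_metastasis", "mtc_subtype",
]


def select_mtc_cohort(df: pd.DataFrame) -> pd.DataFrame:
    """Keep MTC cases diagnosed 2004-2015 with complete predictor and outcome data."""
    cohort = df[df["histology_icdo3"].isin(MTC_HISTOLOGY_CODES)]
    cohort = cohort[cohort["year_of_diagnosis"].between(2004, 2015)]
    # Exclude records with any missing predictor or an unknown DM status.
    cohort = cohort.dropna(subset=PREDICTORS + ["dm"])
    # Binary outcome: 1 = distant metastasis present, 0 = absent.
    cohort = cohort.assign(dm=cohort["dm"].astype(int))
    return cohort.reset_index(drop=True)


# Toy listing: one eligible MTC case, one non-MTC case, one diagnosed after 2015,
# and one MTC case with a missing tumor size.
toy = pd.DataFrame({
    "histology_icdo3": ["8345/3", "8260/3", "8510/3", "8510/3"],
    "year_of_diagnosis": [2010, 2010, 2018, 2012],
    "dm": [1, 0, 0, 0],
})
for col in PREDICTORS:
    if col not in toy.columns:
        toy[col] = 1.0
toy.loc[3, "tumor_size"] = None

cohort = select_mtc_cohort(toy)
print(len(cohort))  # 1
```

Only the first toy row survives: the second is not medullary, the third falls outside 2004–2015, and the fourth has a missing predictor.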

2.2. Screening for Risk Factors and Model Construction

Statistical analysis was conducted using SPSS software (version 26.0; IBM Corporation). As part of the univariable analysis, Pearson correlation analysis was used to examine the associations among predictor variables, and the results are presented as heat maps. Predictive factors related to DM were first screened by univariable analysis, and the variables that met the screening criterion were entered into a multivariable logistic regression (LR) analysis. A receiver operating characteristic (ROC) curve was then plotted and analyzed based on the multivariable results. An area under the ROC curve (AUC) greater than 0.5 was considered meaningful. All p values were two-sided, and statistical significance was set at P < 0.05.
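
The screening-then-multivariable workflow above was run in SPSS; a Python sketch of the same idea is shown below for readers working outside SPSS. It is illustrative only and rests on assumptions not taken from the text: univariable screening by a likelihood-ratio test, a screening threshold of `alpha=0.05`, and simulated binary predictors with made-up names. It uses scikit-learn's `LogisticRegression`, `log_loss`, and `roc_auc_score` and SciPy's `chi2` distribution.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, roc_auc_score


def log_likelihood(y, p):
    """Bernoulli log-likelihood of binary outcomes y given predicted probabilities p."""
    return -log_loss(y, p, normalize=False)


def fit_lr(X, y):
    # A very large C makes scikit-learn's default L2 penalty negligible,
    # so the fit approximates ordinary maximum-likelihood logistic regression.
    return LogisticRegression(C=1e9, max_iter=1000).fit(X, y)


def univariable_p_value(x, y):
    """Likelihood-ratio test: one-predictor logistic model vs intercept-only model."""
    model = fit_lr(x.reshape(-1, 1), y)
    ll_model = log_likelihood(y, model.predict_proba(x.reshape(-1, 1))[:, 1])
    ll_null = log_likelihood(y, np.full(len(y), y.mean()))
    return chi2.sf(2 * (ll_model - ll_null), df=1)


def screen_then_fit(X, y, names, alpha=0.05):
    """Univariable screening, then a multivariable logistic model on retained predictors."""
    keep = [j for j in range(X.shape[1]) if univariable_p_value(X[:, j], y) < alpha]
    model = fit_lr(X[:, keep], y)
    odds_ratios = {names[j]: float(np.exp(b)) for j, b in zip(keep, model.coef_[0])}
    auc = roc_auc_score(y, model.predict_proba(X[:, keep])[:, 1])
    return odds_ratios, auc


# Simulated binary predictors; "noise" has no true effect on the outcome.
rng = np.random.default_rng(0)
n = 2000
names = ["age_55_or_older", "male", "lymph_node_metastasis", "noise"]
X = rng.integers(0, 2, size=(n, len(names))).astype(float)
logit = -4.0 + 1.0 * X[:, 0] + 0.5 * X[:, 1] + 2.0 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

odds_ratios, auc = screen_then_fit(X, y, names)
print(odds_ratios, round(auc, 3))
```

Exponentiating a multivariable coefficient gives the odds ratio for that predictor, adjusted for the other retained predictors; statsmodels' `Logit` would additionally report Wald p values and confidence intervals.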

Because DM was uncommon among patients with MTC in the SEER database, the original dataset was imbalanced, and this imbalance must be addressed to establish a more accurate prediction model. In this study, we processed the original dataset with two techniques, oversampling with the synthetic minority oversampling technique (SMOTE) and undersampling, and then used a correlation matrix to analyze the original and processed data. SMOTE and undersampling are standard approaches for balancing class distributions in imbalanced datasets and are widely used to improve prediction models [14]. The distribution of the target variable after sampling is shown in Figure 1. After data processing, the correlations between variables became more apparent, as shown in Figure 2.
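
The two balancing strategies can be illustrated with a from-scratch NumPy sketch. It is a simplified version of SMOTE (each synthetic record is interpolated between a minority-class record and one of its k nearest minority-class neighbours) plus random undersampling, written under the assumptions that the minority class is labelled 1 and that predictors are numeric; it is not the implementation used in the study. In practice, the imbalanced-learn package provides `SMOTE` and `RandomUnderSampler`, both with a `fit_resample(X, y)` method.

```python
import numpy as np


def random_undersample(X, y, rng):
    """Randomly drop majority-class rows until both classes have the minority-class count."""
    minority = np.flatnonzero(y == 1)
    majority = rng.choice(np.flatnonzero(y == 0), size=len(minority), replace=False)
    idx = rng.permutation(np.concatenate([minority, majority]))
    return X[idx], y[idx]


def smote(X, y, rng, k=5):
    """Add synthetic minority rows until the classes are balanced (basic SMOTE)."""
    X_min = X[y == 1]
    n_new = int((y == 0).sum() - len(X_min))
    # Pairwise distances among minority rows; a row is never its own neighbour.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    # Each synthetic row lies on the segment between a minority row and one of its neighbours.
    X_new = X_min[base] + gap * (X_min[nbr] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.ones(n_new, dtype=y.dtype)])


rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = (np.arange(300) < 20).astype(int)  # 20 minority rows, 280 majority rows
X_over, y_over = smote(X, y, rng)
X_under, y_under = random_undersample(X, y, rng)
print(np.bincount(y_over), np.bincount(y_under))  # [280 280] [20 20]
```

Because basic SMOTE interpolates, it produces fractional values for binary or categorical predictors such as sex or extrathyroidal extension; imbalanced-learn's `SMOTENC` variant is designed for mixed numeric and categorical data. The output also shows why undersampling leaves far less training data than oversampling.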


We used Python (version 3.9.12; Python Software Foundation) to construct the prediction models, with all variables included in the ML models. The processed data (oversampled and undersampled) were randomly divided into a training set (80%) and a test set (20%). Six commonly used ML algorithms were trained on the training set: decision tree (DT), support vector machine (SVM), random forest (RF), k-nearest neighbors (KNN), extreme gradient boosting (XGBoost), and gradient boosting machine (GBM). Models were evaluated primarily on accuracy, precision, recall, F1-score, and AUC, and the model with the highest AUC was selected as the optimal model.
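
The training and evaluation loop described above can be sketched with scikit-learn. The data and settings are hypothetical: simulated, already balanced data from `make_classification` stand in for the resampled SEER cohort, and the hyperparameters are library defaults or arbitrary choices, not values reported in the study. XGBoost lives in the separate `xgboost` package, so it is included only if that package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(probability=True, random_state=0),  # probability=True enables predict_proba
    "RF": RandomForestClassifier(n_estimators=300, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "GBM": GradientBoostingClassifier(random_state=0),
}
try:  # XGBoost is optional here because it is a separate package.
    from xgboost import XGBClassifier
    models["XGBoost"] = XGBClassifier(n_estimators=300, random_state=0)
except ImportError:
    pass

# Simulated balanced data standing in for the resampled cohort (11 predictors).
X, y = make_classification(n_samples=1000, n_features=11, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)               # hard labels for threshold-based metrics
    prob = model.predict_proba(X_te)[:, 1]   # scores for the AUC
    results[name] = {
        "accuracy": accuracy_score(y_te, pred),
        "precision": precision_score(y_te, pred),
        "recall": recall_score(y_te, pred),
        "f1": f1_score(y_te, pred),
        "auc": roc_auc_score(y_te, prob),
    }

# Select the model with the highest test-set AUC.
best = max(results, key=lambda name: results[name]["auc"])
print(best, round(results[best]["auc"], 3))
```

One design consideration: when SMOTE is applied before the train/test split, some synthetic test records are interpolated from training records, which can make test-set metrics optimistic. Resampling only the training split (for example, inside imbalanced-learn's `Pipeline`) keeps the test set made of real records.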

3. Results

3.1. Analysis of Patient Information

This study included a total of 2049 MTC patients, of whom 138 (6.7%) developed DM and the remaining 1911 (93.3%) did not. The baseline characteristics of all patients are presented in Table 1.

In the univariable LR analysis, DM was significantly associated with age, sex, multifocality, tumor size, extrathyroidal extension, and lymph node metastasis (Table 2). These variables were entered into the multivariable LR analysis.

In the multivariable LR analysis, age [15], sex, extrathyroidal extension, lymph node metastasis, and tumor size were identified as independent predictors of DM in MTC, whereas multifocality was not (Table 2). The ROC curve plotted from the multivariable LR results yielded an AUC of 0.838 (95% confidence interval (CI): 0.808–0.868); details are summarized in Figure 3.
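
A 95% CI for an AUC can be estimated in several ways, and the text does not say which method produced the interval above. The NumPy sketch below shows one common option, a percentile bootstrap, together with the rank-based (Mann-Whitney) definition of the AUC; the simulated scores and the 1000 resamples are illustrative assumptions, not the study's data or settings.

```python
import numpy as np


def auc_mann_whitney(y, score):
    """AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos, neg = score[y == 1], score[y == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties


def bootstrap_auc_ci(y, score, n_boot=1000, level=0.95, seed=0):
    """Point AUC plus a percentile bootstrap confidence interval."""
    rng = np.random.default_rng(seed)
    n = len(y)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)  # resample patients with replacement
        if y[idx].min() == y[idx].max():  # resample lacks one class, so AUC is undefined
            continue
        aucs.append(auc_mann_whitney(y[idx], score[idx]))
    lo, hi = np.quantile(aucs, [(1 - level) / 2, (1 + level) / 2])
    return auc_mann_whitney(y, score), lo, hi


# Simulated predicted scores: about 10% events, positives score higher on average.
rng = np.random.default_rng(1)
y = (rng.random(500) < 0.1).astype(int)
score = rng.normal(loc=1.5 * y, scale=1.0)
auc, lo, hi = bootstrap_auc_ci(y, score)
print(round(auc, 3), round(lo, 3), round(hi, 3))
```

The pairwise comparison is quadratic in the class sizes, which is fine for a cohort of about 2000 patients; for much larger data, a rank-based formula is more efficient.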

Six ML models were constructed and evaluated based on accuracy, precision, recall rate, F1-score, and AUC. ML models constructed after oversampling outperformed those constructed after undersampling. Tables 3 and 4 provide details of the six ML models constructed from the oversampled and undersampled data, and Figure 4 shows their ROC curves in the training and test sets. Among the models built on oversampled data, all had an AUC greater than 0.850, and the RF model performed best, with an accuracy, precision, recall rate, F1-score, and AUC of 0.890, 0.847, 0.946, 0.894, and 0.946, respectively; its AUC was also higher than that of the LR model. This indicates that the RF algorithm had better discriminative ability than the traditional LR model and excellent predictive performance. RF feature-importance analysis (Figure 5) revealed that lymph node metastasis was the most important factor in determining whether MTC patients also had DM.
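
Feature rankings like the one in Figure 5 are commonly read from a fitted random forest's impurity-based importances. The sketch below uses scikit-learn's `feature_importances_` on simulated binary predictors with hypothetical effect sizes; the text does not state which importance measure was used, and permutation importance (`sklearn.inspection.permutation_importance`) is a common alternative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000
names = ["lymph_node_metastasis", "extrathyroidal_extension", "multifocality", "race"]
X = rng.integers(0, 2, size=(n, len(names)))
# Hypothetical outcome: driven mostly by the first feature, weakly by the second,
# and not at all by the last two.
logit = -2.0 + 3.0 * X[:, 0] + 1.0 * X[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
# Mean decrease in impurity, normalised by scikit-learn so the importances sum to 1.
ranking = sorted(zip(names, rf.feature_importances_), key=lambda item: item[1], reverse=True)
for name, importance in ranking:
    print(f"{name:26s} {importance:.3f}")
```

Impurity-based importances are computed on the training data and can favour predictors with many distinct values, such as continuous age or tumor size, so permutation importance on a held-out set is a useful cross-check.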


Figure 4

ROC curves of the six ML algorithms in different datasets. (a) Test set, oversampled data. (b) Training set, oversampled data. (c) Test set, undersampled data. (d) Training set, undersampled data. ROC, receiver operating characteristic; ML, machine learning; AUC, area under the receiver operating characteristic curve.

We also developed an online calculator for estimating the risk of DM in MTC patients, which can be applied in clinical practice (https://121.43.117.60:8000/).

4. Discussion

MTC accounts for only about 5% of new TC diagnoses, yet its global incidence is rising rapidly. Deaths from MTC account for approximately 13% of all TC-related deaths, and the 10-year overall survival rate of MTC ranges from 65% to 71%. However, when MTC is accompanied by DM, the 10-year overall survival rate can decrease to 40%–44% [15, 16]. MTC neither concentrates radioactive iodine nor is suppressed by thyroxine [17]. Total thyroidectomy is the primary treatment for MTC, and the decision to perform lymph node dissection depends on the specific situation. Adjuvant radiation therapy can be considered for MTC patients with incomplete resection, a high risk of local recurrence, or DM [10]. Radiotherapy can provide sustained disease control in patients with DM and prevent further progression [18]. However, the impact of radiotherapy on patient survival remains controversial, and in patients without DM, radiotherapy may cause more harm than good [19]. Some authors suggest that the role of radiation therapy in MTC is limited to patients who are ineligible for, or have contraindications to, surgical treatment or targeted drugs [20]. Targeted drugs are recommended for patients with DM; in particular, studies have demonstrated that the RET-specific inhibitors selpercatinib and pralsetinib are effective and promising therapies for MTC patients with DM and disease progression [11, 21]. The prognosis and treatment effectiveness of MTC are largely related to tumor stage; therefore, early diagnosis is a crucial objective in the management of MTC patients [22]. Previous research on MTC has mostly focused on prognosis and survival analysis [23, 24].

However, few studies have examined DM in MTC. Using independent predictors to predict DM can help physicians better evaluate patients with MTC and offer them more effective individualized treatment options.

Univariable analysis showed that age, sex, multifocality, tumor size, extrathyroidal extension, and lymph node metastasis were significantly associated with DM. However, multivariable analysis indicated that multifocality was not an independent predictor of DM in patients with MTC. This finding is consistent with the RF feature-importance results, although multifocality is generally believed to be an independent predictor of cervical lymph node metastasis in MTC [25]. Multifocality had a relatively small impact on predicting the occurrence of DM in patients with MTC, in line with previous research [25, 26]. RF feature selection revealed that extrathyroidal extension was a key factor in predicting DM, while lymph node metastasis was the most important predictor, consistent with a previous study [26]. We also identified tumor size as an important predictor. Compared with tumors larger than 4 cm, the odds ratios (ORs) for tumors of 2–4 cm and ≤2 cm were 0.555 and 0.287, respectively; that is, the risk of DM in MTC increases with tumor size. Tumor size significantly affects the recurrence and long-term survival rates of MTC [24]. Extrathyroidal extension and tumor size are also crucial predictors of lymph node metastasis and DM in MTC [6, 16]. Moreover, because extrathyroidal extension and tumor size are directly related to the T category of the TNM staging system, tumor stage may also serve as a predictor of DM. In contrast to a previous study [27], sex was an independent predictor of DM in our analysis, with female sex acting as a protective factor, similar to the findings of another study [26]. Using 55 years as the age cutoff [27], we found that older patients were more likely to develop DM than younger patients. Therefore, older patients should be actively followed up and regularly examined.
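
The tumor-size odds ratios above can be read through the standard logistic-regression relation between a coefficient and its odds ratio. The coefficients below are back-calculated from the reported ORs for illustration only; they are not values taken from Table 2.

```latex
\begin{aligned}
\mathrm{OR} &= e^{\beta} \quad\Longleftrightarrow\quad \beta = \ln \mathrm{OR},\\
\beta_{2\text{--}4\,\mathrm{cm}} &= \ln 0.555 \approx -0.589, \qquad 1 - 0.555 = 0.445 \;(44.5\%\ \text{lower odds than} > 4\,\mathrm{cm}),\\
\beta_{\le 2\,\mathrm{cm}} &= \ln 0.287 \approx -1.248, \qquad 1 - 0.287 = 0.713 \;(71.3\%\ \text{lower odds than} > 4\,\mathrm{cm}).
\end{aligned}
```

These figures describe relative odds of DM rather than relative probabilities; when DM is rare, the two are approximately equal.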

In this study, race could not independently predict DM in patients with MTC, which is consistent with previous research [26, 27]. In traditional LR, MTC subtype and Spanish-Hispanic origin were not independent predictors, and they also had little influence in the RF feature-importance analysis.

We constructed six predictive models based on the SEER database to predict DM in patients with MTC and evaluated them based on accuracy, precision, recall rate, F1-score, and AUC. To address the imbalanced dataset, we applied both SMOTE oversampling and undersampling, and the prediction models built on oversampled data outperformed those built on undersampled data, suggesting that SMOTE is preferable to undersampling for building ML models from imbalanced datasets [14]. This may be attributed to the small number of patients with DM: undersampling discards most of the non-DM records, leaving relatively few training records and limiting the model's ability to identify key predictive factors for DM. Among the six ML algorithms, RF demonstrated excellent predictive performance (AUC = 0.946), surpassing the traditional LR model (AUC = 0.838). Therefore, RF was the best of these models for predicting DM in MTC patients using the SEER database.

5. Limitations

This study has some limitations. First, because it is based on North American data, the model should be validated in other populations in future research. Second, the predictive performance of the model warrants further optimization, and additional factors potentially related to DM should be incorporated into the prediction model in future studies. Finally, because of database limitations, tumor markers such as CEA and AFP were not available for MTC patients. We will continue to improve and supplement the model in future studies.

6. Conclusions

In conclusion, this study aimed to identify independent predictors of DM in patients with MTC and to develop a prediction model utilizing ML algorithms. Our analysis, based on the SEER database, demonstrated that age, sex, tumor size, extrathyroidal extension, and lymph node metastasis were significant independent predictors of DM in MTC patients. The RF ML algorithm outperformed the traditional LR model in predicting DM, providing a more accurate and reliable tool for clinical use.

Applying SMOTE to address the imbalanced dataset proved effective in enhancing the performance of the prediction model. Our findings underscore the importance of early diagnosis and individualized treatment plans for MTC patients, which may ultimately contribute to improved patient outcomes.

Data Availability

The dataset presented in this study can be found at https://seer.cancer.gov. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

We are very grateful to Professor Xu Zhang, a biostatistician from the First Affiliated Hospital of Anhui Medical University, for evaluating the experimental design and analysis of this article and providing valuable feedback. We would like to thank Editage (https://www.editage.com/) for English language editing.

References

C. La Vecchia, M. Malvezzi, C. Bosetti et al., “Thyroid cancer mortality and incidence: a global overview,” International Journal of Cancer, vol. 136, no. 9, pp. 2187–2195, 2015.
T. Kondo, S. Ezzat, and S. L. Asa, “Pathogenetic mechanisms in thyroid follicular-cell neoplasia,” Nature Reviews Cancer, vol. 6, no. 4, pp. 292–306, 2006.
S. Roman, R. Lin, and J. A. Sosa, “Prognosis of medullary thyroid carcinoma: demographic, clinical, and pathologic predictors of survival in 1252 cases,” Cancer, vol. 107, no. 9, pp. 2134–2142, 2006.
A. Matrone, C. Gambale, A. Prete, and R. Elisei, “Sporadic medullary thyroid carcinoma: towards a precision medicine,” Frontiers in Endocrinology, vol. 13, Article ID 864253, 2022.
R. Elisei, A. Tacito, T. Ramone et al., “Twenty-five years experience on RET genetic screening on hereditary MTC: an update on the prevalence of germline RET mutations,” Genes, vol. 10, no. 9, p. 698, 2019.
A. Kotwal, D. Erickson, J. R. Geske, I. D. Hay, and M. R. Castro, “Predicting outcomes in sporadic and hereditary medullary thyroid carcinoma over two decades,” Thyroid, vol. 31, no. 4, pp. 616–626, 2021.
O. Twito, S. Grozinsky-Glasberg, S. Levy et al., “Clinico-pathologic and dynamic prognostic factors in sporadic and familial medullary thyroid carcinoma: an Israeli multi-center study,” European Journal of Endocrinology, vol. 181, no. 1, pp. 13–21, 2019.
R. S. Sippel, M. Kunnimalaiyaan, and H. Chen, “Current management of medullary thyroid cancer,” The Oncologist, vol. 13, no. 5, pp. 539–547, 2008.
C. Nashed, S. V. Sakpal, S. Cherneykin, and R. S. Chamberlain, “Medullary thyroid carcinoma metastatic to skin,” Journal of Cutaneous Pathology, vol. 37, no. 12, pp. 1237–1240, 2010.
S. A. Wells Jr, S. L. Asa, H. Dralle et al., “Revised American Thyroid Association guidelines for the management of medullary thyroid carcinoma,” Thyroid, vol. 25, no. 6, pp. 567–610, 2015.
L. J. Wirth, E. Sherman, B. Robinson et al., “Efficacy of selpercatinib in RET-altered thyroid cancers,” New England Journal of Medicine, vol. 383, no. 9, pp. 825–835, 2020.
A. M. Darcy, A. K. Louie, and L. W. Roberts, “Machine learning and the profession of medicine,” Journal of the American Medical Association, vol. 315, no. 6, pp. 551–552, 2016.
Z. Obermeyer and E. J. Emanuel, “Predicting the future: big data, machine learning, and clinical medicine,” New England Journal of Medicine, vol. 375, no. 13, pp. 1216–1219, 2016.
W. Liu, S. Wang, Z. Ye, P. Xu, X. Xia, and M. Guo, “Prediction of lung metastases in thyroid cancer using machine learning based on SEER database,” Cancer Medicine, vol. 11, no. 12, pp. 2503–2515, 2022.
Z. Chen, Y. Mao, T. You, and G. Chen, “Establishment and validation of a nomogram model for predicting distant metastasis in medullary thyroid carcinoma: an analysis of the SEER database based on the AJCC 8th TNM staging system,” Frontiers in Endocrinology, vol. 14, Article ID 1119656, 2023.
R. W. Randle, C. J. Balentine, G. E. Leverson et al., “Trends in the presentation, treatment, and survival of patients with medullary thyroid cancer over the past 30 years,” Surgery, vol. 161, no. 1, pp. 137–146, 2017.
Z. T. Sahli, J. K. Canner, M. A. Zeiger, and A. Mathur, “Association between age and disease specific mortality in medullary thyroid cancer,” The American Journal of Surgery, vol. 221, no. 2, pp. 478–484, 2021.
O. Hamdy, S. Awny, and I. H. Metwally, “Medullary thyroid cancer: epidemiological pattern and factors contributing to recurrence and metastasis,” Annals of the Royal College of Surgeons of England, vol. 102, no. 7, pp. 499–503, 2020.
J. A. Call, J. S. Caudill, B. McIver, and R. L. Foote, “A role for radiotherapy in the management of advanced medullary thyroid carcinoma: the Mayo Clinic experience,” Rare Tumors, vol. 5, no. 3, pp. 128–131, 2013.
S. Huang, J. Zhong, Z. Zhang et al., “Prognosis of radiotherapy in medullary thyroid carcinoma patients without distant metastasis,” Translational Cancer Research, vol. 10, no. 11, pp. 4714–4726, 2021.
A. Kukulska, J. Krajewska, Z. Kołosza et al., “Stereotactic radiotherapy is a useful treatment option for patients with medullary thyroid cancer,” BMC Endocrine Disorders, vol. 21, no. 1, p. 160, 2021.
V. Subbiah, D. Yang, V. Velcheti, A. Drilon, and F. Meric-Bernstam, “State-of-the-art strategies for targeting RET-dependent cancers,” Journal of Clinical Oncology, vol. 38, no. 11, pp. 1209–1221, 2020.
F. Orlandi, P. Caraci, A. Mussa, E. Saggiorato, G. Pancani, and A. Angeli, “Treatment of medullary thyroid carcinoma: an update,” Endocrine-Related Cancer, vol. 8, no. 2, pp. 135–147, 2001.
J. Tang, S. Jiang, L. Gao et al., “Construction and validation of a nomogram based on the log odds of positive lymph nodes to predict the prognosis of medullary thyroid carcinoma after surgery,” Annals of Surgical Oncology, vol. 28, no. 8, pp. 4360–4370, 2021.
L. Chen, Y. Wang, K. Zhao, Y. Wang, and X. He, “Postoperative nomogram for predicting cancer-specific and overall survival among patients with medullary thyroid cancer,” International Journal of Endocrinology, vol. 2020, Article ID 8888677, 13 pages, 2020.
W. Fan, C. Xiao, and F. Wu, “Analysis of risk factors for cervical lymph node metastases in patients with sporadic medullary thyroid carcinoma,” Journal of International Medical Research, vol. 46, no. 5, pp. 1982–1989, 2018.
M. K. Le, M. Kawai, T. Odate, H. G. Vuong, N. Oishi, and T. Kondo, “Metastatic risk stratification of 2526 medullary thyroid carcinoma patients: a study based on surveillance, epidemiology, and end results database,” Endocrine Pathology, vol. 33, no. 3, pp. 348–358, 2022.

Copyright

Copyright © 2023 Zhen-Tian Guo et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
