Abstract

Background. Metastases to regional lymph nodes occur readily in patients with high-histological-grade colorectal cancer (CRC). However, few models have been built on the basis of lymph node status to predict the outcome of patients with histological grade III-IV CRC. Methods. Data from the Surveillance, Epidemiology, and End Results (SEER) database were used. Univariate and multivariate analyses were performed, and a personalized prediction model was built in accordance with the results. The resulting nomogram was tested in two datasets and assessed using calibration curves, the concordance index (C-index), and the area under the curve (AUC). Results. A total of 14,039 cases were obtained from the database and separated into two groups (9828 cases for constructing the model and 4211 cases for validation). Logistic and Cox regression analyses were then conducted, and factors including the log odds of positive lymph nodes (LODDS) were used to establish a personalized prediction model. The C-index was 0.770 in both the construction and validation groups. The 1-, 3-, and 5-year AUCs were 0.793, 0.828, and 0.830, respectively, in the construction group and 0.796, 0.833, and 0.832, respectively, in the validation group. The calibration curves showed good agreement between predicted and observed 1-, 3-, and 5-year overall survival (OS) in both groups. Conclusion. The nomogram built on LODDS showed considerable reliability and accuracy.

1. Introduction

Colorectal cancer (CRC) has the third highest incidence among cancers in both men and women [1], and its predominant histological type is adenocarcinoma [2]. The status of the cancer influences the outcome of patients with CRC [3–5], and several studies have reported that patients with CRC of different histological grades have different outcomes [6–10]. A high histological grade increases the likelihood of bowel obstruction before surgery [11]. Moreover, metastases to regional lymph nodes occur more readily in patients with higher-grade CRC [12], and prognosis also depends on the condition of the regional lymph nodes [5]. Therefore, to ensure that lymph node dissection is adequate, guidelines recommend that at least 12 regional lymph nodes be obtained during surgery [13].

In the tumor (T), node (N), and metastasis (M) staging system of the American Joint Committee on Cancer (AJCC), the number of pathologically positive regional lymph nodes is used to stratify patients [5]. The indicators in this system are intuitive and easy to obtain. However, the N stage does not consider the number of examined regional lymph nodes (ELNs). Hence, the lymph node ratio (LNR) was proposed as a supplement to the N staging system [14].

LNR is the proportion of examined nodes that are positive, that is, the number of positive regional lymph nodes (PLNs) divided by the number of ELNs. It has been reported to be a good predictor of outcomes in several kinds of cancer [15–18], and a study of right colon cancer found that LNR was a potentially valuable factor for predicting tumor recurrence [15]. However, when the number of PLNs is 0, LNR is 0 regardless of the number of ELNs. The log odds of positive lymph nodes (LODDS), defined as the logarithm of the ratio of PLNs to negative lymph nodes (NLNs), showed better predictive ability than LNR in several studies [19–21] and can make up for this deficiency of LNR. Currently, few tools based on LNR or LODDS are available to evaluate the overall survival (OS) of patients with poorly differentiated or undifferentiated CRC, even though metastases to regional lymph nodes occur readily in these patients [12].
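To illustrate the deficiency of LNR described above, the following Python sketch (not the authors' R code) computes both measures for a node-negative patient with different numbers of examined nodes. It uses the LODDS formula with the 0.5 correction given in Section 2.3 and assumes a base-10 logarithm.

```python
import math


def lnr(pln: int, eln: int) -> float:
    """Lymph node ratio: positive nodes divided by examined nodes."""
    return pln / eln


def lodds(pln: int, eln: int) -> float:
    """Log odds of positive lymph nodes with the 0.5 correction.

    Assumption: base-10 logarithm.
    """
    nln = eln - pln  # negative nodes = examined - positive
    return math.log10((pln + 0.5) / (nln + 0.5))


# A node-negative patient (PLN = 0) with different numbers of examined nodes
for eln in (5, 12, 30):
    print(f"ELNs={eln:2d}  LNR={lnr(0, eln):.3f}  LODDS={lodds(0, eln):.3f}")
```

LNR is 0.000 in all three cases, whereas LODDS falls from about −1.04 (5 nodes) to −1.40 (12 nodes) to −1.79 (30 nodes), so it still separates patients by how many negative nodes were examined.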

Using the Surveillance, Epidemiology, and End Results (SEER) database [22], prognostic factors for patients with histological grade III-IV CRC were explored among clinical factors including LNR and LODDS. A personalized prediction model to support clinical decision-making was then constructed and validated.

2. Method

2.1. Data Sources

Cases from the SEER database were used in this study (details of the November 2020 submission are available at https://seer.cancer.gov/data-software/documentation/seerstat/nov2020/). The cases in this database come from 18 registries across the USA [22].

2.2. Included Participants

The inclusion criteria were as follows: (1) age 18–80 years; (2) diagnosis confirmed by positive histology; (3) complete and available clinical and follow-up data; (4) poorly differentiated or undifferentiated CRC (histological grade III-IV); and (5) only one primary cancer. The exclusion criteria were as follows: (1) diagnosis by autopsy or death certificate only; (2) overall survival time of less than 1 month; and (3) two or more primary cancers. Following a previous study [23], a multiple primary cancer was defined as a cancer whose site and histological type differ from those of the first primary cancer.
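The inclusion and exclusion criteria can be combined into a single filter. The sketch below is illustrative only: the `Case` record and all of its field names are hypothetical and do not correspond to actual SEER*Stat variable names.

```python
from dataclasses import dataclass


@dataclass
class Case:
    # Hypothetical fields for illustration; real SEER*Stat exports use other names.
    age: int                   # years at diagnosis
    histology_confirmed: bool  # diagnosis confirmed by positive histology
    complete_data: bool        # clinical and follow-up data available
    grade: int                 # histological grade, 1-4
    n_primaries: int           # number of primary cancers
    autopsy_or_dc_only: bool   # reported by autopsy or death certificate only
    survival_months: int


def eligible(case: Case) -> bool:
    """Return True if a case meets every inclusion criterion and no exclusion criterion."""
    included = (
        18 <= case.age <= 80
        and case.histology_confirmed
        and case.complete_data
        and case.grade in (3, 4)   # poorly differentiated or undifferentiated
        and case.n_primaries == 1  # one primary cancer only
    )
    excluded = case.autopsy_or_dc_only or case.survival_months < 1
    return included and not excluded
```

Inclusion criterion (5) and exclusion criterion (3) describe the same condition, so a single `n_primaries == 1` check covers both.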

2.3. Variables and Definitions

In this study, demographic information (age and sex) and cancer characteristics (primary site, histological type, histological grade, AJCC TNM stage, LNR, and LODDS) were considered. Following a previous study [24], age was categorized into three levels: <45, 45–60, and >60 years. The primary site was recoded on the basis of the second edition of the International Classification of Diseases for Oncology (ICD‐O‐2) and divided into the right colon (from the cecum to the transverse colon, excluding the appendix), the left colon (from the splenic flexure through the descending colon to the sigmoid colon), and the rectum (rectosigmoid junction and rectum). Histological types were also coded on the basis of ICD‐O‐2: codes 8140–8389 were classified as adenocarcinoma, 8480–8481 as mucinous adenocarcinoma/mucin-producing adenocarcinoma (AM/MPA), and 8490 as signet ring cell carcinoma (SRCC). Poorly differentiated cancer was defined as histological grade III, and undifferentiated cancer as histological grade IV. NLNs were calculated as NLNs = ELNs − PLNs. For every case, LNR was calculated as LNR = PLNs/ELNs [15–18], and LODDS as LODDS = log((PLNs + 0.5)/(NLNs + 0.5)) [20]. The cutoff values of LNR, ELNs, and NLNs were determined using the Kaplan–Meier method, and each of these variables was divided into two subgroups on the basis of its cutoff value. Following Lee et al. [25], LODDS was divided into three levels: <−1.3222, −1.3222 to −0.5863, and >−0.5863. Survival months were calculated as FLOOR((endpoint − date of diagnosis)/days in a month), as defined in the SEER database (details are available at https://seer.cancer.gov/survivaltime). OS was defined as the time from diagnosis to death.
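The derived variables defined above can be sketched as follows. This is a minimal Python illustration, not the authors' code, and two points are assumptions. First, the logarithm in LODDS is taken as base 10; with that choice the Lee et al. cutoffs match integer node counts to four decimal places (0 positive and 10 negative nodes gives −1.3222; 3 positive and 13 negative nodes gives −0.5863). Second, `DAYS_IN_MONTH` is set to 365.24/12, which should be checked against the SEER survival-time documentation. Histology codes outside the three listed ranges are mapped to a hypothetical "other" label.

```python
import math
from datetime import date

# Assumption: SEER's "days in a month" taken as 365.24 / 12; verify against SEER docs.
DAYS_IN_MONTH = 365.24 / 12


def age_group(age: int) -> str:
    if age < 45:
        return "<45"
    if age <= 60:
        return "45-60"
    return ">60"


def histology_type(code: int) -> str:
    """Map an ICD-O histology code to the groups used in this study."""
    if 8140 <= code <= 8389:
        return "adenocarcinoma"
    if code in (8480, 8481):
        return "AM/MPA"
    if code == 8490:
        return "SRCC"
    return "other"  # hypothetical label for codes not covered in Section 2.3


def lodds(pln: int, eln: int) -> float:
    nln = eln - pln  # NLNs = ELNs - PLNs
    # Assumption: base-10 logarithm (consistent with the Lee et al. cutoffs).
    return math.log10((pln + 0.5) / (nln + 0.5))


def lodds_level(value: float) -> str:
    """Three LODDS levels using the cutoffs of Lee et al. [25]."""
    if value < -1.3222:
        return "< -1.3222"
    if value <= -0.5863:
        return "-1.3222 to -0.5863"
    return "> -0.5863"


def survival_months(diagnosis: date, endpoint: date) -> int:
    """FLOOR((endpoint - date of diagnosis) / days in a month)."""
    return math.floor((endpoint - diagnosis).days / DAYS_IN_MONTH)
```

For example, a patient with 2 positive and 10 negative nodes has LODDS = log10(2.5/10.5) ≈ −0.62 and falls in the middle level.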

2.4. Risk Factors

All cases were randomly divided into construction and validation groups at a 7:3 ratio, and the two groups were compared. Continuous variables were described as mean and standard deviation (SD). Logistic regression analyses were conducted sequentially for the initial screening of risk factors associated with patients’ OS [26], using the least absolute shrinkage and selection operator (LASSO) regression algorithm. Cross-validation was performed to select the optimal tuning parameter (λ), and the most significant variables were screened out. Odds ratios (ORs) with 95% confidence intervals (CIs) were used to quantify the effect of each feature on OS. A generalized linear model was then constructed and displayed visually in a forest plot. Receiver operating characteristic (ROC) curves were drawn and the area under the curve (AUC) was calculated in the construction and validation groups to evaluate the model’s predictive accuracy. AUC values range from 0.5 (no better than chance) to 1.0 (perfect discrimination); the larger the AUC, the better the model discriminates between outcomes. Cox regression analyses were performed subsequently [26], with hazard ratios (HRs) and 95% CIs used to quantify the results. Schoenfeld’s global test [27] was used to verify whether the variables conformed to the proportional hazards (PH) assumption, and deviance residual plots were used to evaluate the distribution of the residuals for each variable.
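The random 7:3 split and the AUC can be sketched in a few lines of Python. This is an illustration of the ideas, not the authors' procedure (they list the R packages "caret" and "pROC"). The seed value is arbitrary, and rounding the construction group up is an assumption; with 14,039 cases it yields 9828 and 4211 cases, the group sizes reported in Section 3.1.

```python
import math
import random


def split_construction_validation(cases, frac=0.7, seed=2022):
    """Randomly split cases into construction and validation groups (7:3 by default)."""
    rng = random.Random(seed)  # fixed seed for reproducibility (arbitrary value)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    n_construct = math.ceil(frac * len(shuffled))
    return shuffled[:n_construct], shuffled[n_construct:]


def roc_auc(scores, labels):
    """Empirical ROC AUC: the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count as 0.5).
    O(n_pos * n_neg), which is fine for a sketch but slow for large samples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise definition makes the 0.5 lower reference point concrete: a model that ranks positives and negatives at random wins about half of the comparisons.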

2.5. Nomogram Construction and Validation

A nomogram was developed on the basis of the results of the above analyses. A nomogram is a widely used tool for predicting prognosis that displays the contribution of each risk factor visually. The concordance index (C-index) was calculated separately in the two datasets. Furthermore, 1-, 3-, and 5-year ROC analyses were performed, and AUCs were calculated to assess the nomogram’s predictive accuracy. Calibration curves were obtained in both groups by bootstrapping with 1000 resamples to test the agreement between the model’s predictions and the observed outcomes.
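The C-index generalizes the AUC to censored survival data. A minimal Python sketch of Harrell's version follows; it is an illustration, not the implementation the authors used in R. Here `risk_scores` could be, for example, the total nomogram points (higher meaning worse prognosis). Pairs with tied survival times are skipped, which is a simplification.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data.

    A pair (i, j) is usable when subject i had the event (events[i] == 1) and a
    shorter time than subject j. The pair is concordant when subject i also has
    the higher predicted risk; tied risks count as 0.5. Pairs with tied times
    are skipped in this simplified version.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / usable
```

A C-index of 0.770, as reported in Section 3.3, means that in about 77% of usable pairs the patient who died first had the higher predicted risk.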

2.6. Statistical Analysis

SEER*Stat (version 8.4.0) was used to collect data. To compare characteristics between the two groups, categorical variables were coded numerically and tested using the chi-square test or Fisher’s exact test, and continuous variables were tested using ANOVA. Logistic and Cox regression analyses were conducted for variable selection [26]. The C-index, ROC curves, AUCs, and calibration curves were calculated or plotted in both groups. All analyses and figures were performed or plotted using R software (version 4.1.2, https://www.r-project.org/) with packages including “survival,” “survminer,” “caret,” “tableone,” “glmnet,” “forestplot,” “pROC,” “ezcox,” and “timeROC.” Two-sided P values < 0.05 were considered statistically significant.
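For a binary characteristic compared between the construction and validation groups, the chi-square test reduces to a 2 × 2 table with one degree of freedom. The Python sketch below computes the uncorrected Pearson statistic and its P value; it is an illustration rather than the authors' code. Note that R's `chisq.test` applies a continuity correction to 2 × 2 tables by default, so its P values can differ slightly, and Fisher's exact test is preferred when expected counts are small.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction).

    Returns (statistic, p_value). With 1 degree of freedom,
    P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

A result would be called significant under the rule above when the returned P value is below 0.05.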

3. Results

3.1. Characteristics of Patients Identified

A total of 14,039 cases were downloaded from the SEER database and divided randomly into construction (9828 cases) and validation (4211 cases) groups. The process of patient selection is shown in Figure 1.

The cutoff values of LNR, ELNs, and NLNs were calculated separately, as shown in Figures 2(a)–2(c).
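The text states only that these cutoffs were based on the Kaplan–Meier method. One common way to do this, assumed here rather than confirmed by the source, is to split patients at each candidate cutoff, compare the two Kaplan–Meier curves with the log-rank test, and keep the cutoff with the largest statistic (the idea behind maximally selected rank statistics). A minimal Python sketch:

```python
def logrank_statistic(times, events, in_group1):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    o_minus_e = 0.0  # observed minus expected events in group 1
    variance = 0.0
    for t in event_times:
        n1 = sum(1 for ti, g in zip(times, in_group1) if ti >= t and g)
        n2 = sum(1 for ti, g in zip(times, in_group1) if ti >= t and not g)
        d1 = sum(1 for ti, e, g in zip(times, events, in_group1) if ti == t and e == 1 and g)
        d2 = sum(1 for ti, e, g in zip(times, events, in_group1) if ti == t and e == 1 and not g)
        n, d = n1 + n2, d1 + d2
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / variance if variance > 0 else 0.0


def best_cutoff(values, times, events, candidates):
    """Return (cutoff, statistic) maximizing the log-rank statistic for value <= cutoff vs > cutoff."""
    best = None
    for c in candidates:
        high = [v > c for v in values]
        stat = logrank_statistic(times, events, high)
        if best is None or stat > best[1]:
            best = (c, stat)
    return best
```

Because the cutoff is chosen to maximize the statistic, the resulting P value is optimistic; data-driven cutoffs of this kind benefit from validation in an independent cohort.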

Patients were separated into two subgroups by LNR (low, ≤0.24; high, >0.24), ELNs (low, ≤11; high, >11), and NLNs (low, ≤9; high, >9). The characteristics of the cases in the two groups are listed in Table 1. More than half of the patients were older than 60 years (7839 cases, 55.8%), were male (7167 cases, 51.1%), were at AJCC T3 stage (7775 cases, 55.4%), and had tumors located in the right colon (7645 cases, 54.5%). Most patients were White (11,219 cases, 79.9%), were at AJCC M0 stage (10,761 cases, 76.7%), had grade III tumors (11,721 cases, 84.5%), had LNR ≤0.24 (9507 cases, 67.7%), had high ELNs (12,563 cases, 89.5%) and high NLNs (10,810 cases, 77.0%), and were free of bone metastasis (13,938 cases, 99.3%), brain metastasis (14,004 cases, 99.8%), liver metastasis (11,953 cases, 85.1%), and lung metastasis (13,913 cases, 97.0%). The most common histological type was adenocarcinoma (12,112 cases, 84.5%). A total of 5418 patients (38.6%) died, while 8621 (61.45%) were alive. The mean survival time was 33.28 months (SD = 22.82 months) overall, 33.49 months (SD = 22.95 months) in the construction group, and 32.80 months (SD = 22.51 months) in the validation group. No significant difference was found in any variable between the two groups.

3.2. Exploration of Factors for Patients with Histological Grade III-IV CRC

Univariate and multivariate logistic regression analyses, using the LASSO regression algorithm, were first performed to screen the risk variables for patients with histological grade III-IV CRC. According to the multivariate logistic regression analysis (lambda.1se = 0.01087546; Figures 3(a) and 3(b)), the variables screened out were age, sex, race, primary site, histological grade, T, N, M, histological type, LNR, LODDS, NLN, bone metastasis, and lung metastasis.
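The reported lambda.1se comes from cross-validated LASSO ("glmnet" is listed among the packages). The one-standard-error rule behind it can be sketched in Python as follows; this is an illustration of the rule, not glmnet's code, and the error values in the test are hypothetical.

```python
def lambda_min_and_1se(lambdas, cv_mean, cv_se):
    """One-standard-error rule for choosing the LASSO penalty.

    lambda_min: the lambda with the smallest mean cross-validated error.
    lambda_1se: the largest lambda whose mean error is within one standard
    error of that minimum, giving a sparser model with comparable error.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)
    return lambdas[i_min], lambda_1se
```

Choosing the larger penalty trades a small, statistically indistinguishable increase in cross-validated error for fewer retained variables.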

The detailed results of the logistic regression analyses are shown in Table 2. Age 45–60 and >60 years, female sex, Black race, right colon, T3, T4, M1, AM/MPA, SRCC, LODDS from −1.3222 to −0.5863, LODDS ≥−0.5863, high NLN, bone metastasis, and liver metastasis were preliminarily identified. A generalized linear model was subsequently built, as shown in Figure 3(c). ROC curves were drawn and the corresponding AUC values calculated to assess the reliability of the model, as shown in Figures 3(d) and 3(e). The AUC values in the construction and validation groups were 0.821 and 0.818, respectively, indicating that the model had good predictive capacity. Cox regression analyses were then performed for further exploration (Table 3), and Schoenfeld’s global test was conducted, with the results shown in Figures 4(a) and 4(b).

Age, sex, primary site, and NLN did not conform to the PH assumption and were therefore excluded from the subsequent analyses. The remaining variables, including race, T, M, histological type, LODDS, liver metastasis, and bone metastasis, conformed to the PH assumption. The deviance residual plot in Figure 4(c) indicated that the residuals of all variables in the nomogram were symmetrically distributed with a constant, uniform spread throughout the fit. The multivariate Cox regression analysis showed that Black race, T3, T4, M1, SRCC, LODDS from −1.3222 to −0.5863, LODDS ≥−0.5863, NLN, bone metastasis, and liver metastasis were associated with worse OS, whereas other races were associated with better OS.

3.3. Construction and Verification of Nomogram

A nomogram was constructed, as shown in Figure 5.

The C-index of the nomogram was 0.770 in both the construction and validation groups. The results of the 1-, 3-, and 5-year ROC analyses in the two groups are displayed in Figures 6(a) and 6(b).

The 1-, 3-, and 5-year AUC values were 0.793, 0.828, and 0.830, respectively, in the construction group and 0.796, 0.833, and 0.832, respectively, in the validation group. The 1-, 3-, and 5-year OS calibration curves for the two groups are shown in Figures 7(a)–7(f); they showed good agreement between predicted and observed OS in both groups.

4. Discussion

CRC has the third highest incidence among cancers [1]. Even though nearly 75% of patients with CRC could potentially be treated surgically [28], CRC still has the third highest mortality among cancers and continues to seriously endanger human health. Clinicians therefore need to estimate patient outcomes in order to decide on subsequent treatment.

The TNM system is a common staging system in the diagnosis and treatment of patients with CRC [5]. It stages cancer on the basis of three aspects: the depth of tumor invasion, metastasis to regional lymph nodes, and metastasis to distant organs. The system is simple and easy to use, but it has shortcomings. Several studies have reported that the number of ELNs obtained during surgery can influence the prognosis of patients with CRC [29, 30]. Le Voyer et al. reported that the outcome of patients with more than 40 ELNs was markedly better than that of patients with fewer than 10 ELNs [29]. One explanation is that too few ELNs obtained during surgery can directly impair the accuracy of tumor staging [29] and thus influence the choice of subsequent treatment. Another study reported that the more ELNs obtained, the better the prognosis of patients with CRC, and recommended that at least 20 ELNs be obtained during surgery [31]. Guidelines also recommend that at least 12 ELNs be obtained during surgery [13]. However, the AJCC TNM staging system does not take the number of ELNs into consideration, and LNR and LODDS have therefore been proposed to address this gap [14, 32, 33].

LNR, the proportion of ELNs that are positive, has been reported to be a good predictor of outcomes in several kinds of cancer [15–18]. One study of right colon cancer found that LNR is a potentially valuable factor for predicting the probability of tumor recurrence [15]. However, LNR has an inherent shortcoming: when the number of PLNs is 0, LNR is 0 regardless of the number of ELNs. LODDS, the logarithm of the ratio of PLNs to NLNs, showed better predictive ability than LNR in several studies [19–21]. Even when no PLNs are found, LODDS can still distinguish patients with different numbers of ELNs. One study reported that LODDS could be a potential factor in predicting the outcome of patients with CRC [34], and Arslan et al. further indicated that the LODDS classification showed better predictive ability in patients with fewer than 12 ELNs obtained during surgery [35]. Additional studies are needed to determine which of the two measures performs better.

In this study, 14,039 cases of histological grade III-IV CRC were downloaded from the SEER database and randomly divided into two groups for model construction and validation. LODDS was identified as a prognostic factor in both the logistic and Cox regression analyses, whereas LNR and the AJCC N stage did not show a significant association with OS. Finally, a nomogram based on LODDS was created to visualize the results, and it showed good predictive performance. This result is consistent with the previous studies discussed above.

This study has limitations. First, the cutoff values of LODDS were based on tertiles, as in the study by Lee et al. [25]; optimal cutoff values should be explored in follow-up studies to improve the reliability of this predictive model. Second, all cases in this study were obtained from the SEER database, and validation in cases from additional sources is needed to confirm the accuracy of the model.
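Tertile-based cutoffs of the kind mentioned above can be sketched with Python's standard library. This is an illustration, not the procedure of Lee et al. [25], which is not described here; applied to a cohort's own LODDS values, it would give data-driven cutoffs that could be compared with the fixed values used in this study. Using `method="inclusive"` is an assumption, chosen because it is intended to mirror R's default `quantile()` definition (type 7).

```python
import statistics


def tertile_cutoffs(values):
    """Two cut points splitting the data into thirds.

    Assumption: method="inclusive" mirrors R's default quantile() (type 7).
    """
    lower, upper = statistics.quantiles(values, n=3, method="inclusive")
    return lower, upper


def tertile_group(value, cutoffs):
    """Assign a value to the lower, middle, or upper tertile."""
    lower, upper = cutoffs
    if value < lower:
        return "T1"
    if value <= upper:
        return "T2"
    return "T3"
```

The boundary convention (strictly below the lower cut point, up to and including the upper cut point, then above) follows the LODDS levels defined in Section 2.3.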

5. Conclusion

LODDS was found to be a valuable predictive factor and showed better predictive ability than LNR for the OS of patients with histological grade III-IV CRC. Race, AJCC T stage, AJCC M stage, LODDS, histological type, bone metastasis, and liver metastasis were selected as independent factors to construct a nomogram, which performed well in both groups. All variables in the nomogram are easily obtained in the routine diagnosis and treatment of patients with CRC. The nomogram could therefore serve as a reference for clinicians in assessing the outcome of patients with histological grade III-IV CRC and choosing subsequent treatment.

Data Availability

The primary data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Acknowledgments

This research was supported by the Huadong Medicine Joint Funds of the Zhejiang Provincial Natural Science Foundation of China (Grant no. LHDMY22H160002) and the Wenzhou Municipal Science and Technology Bureau (Grant no. Y20180085).