Abstract

Objective. To establish a model that predicts pneumonia risk in SARS-CoV-2-infected patients and thereby reduces unnecessary chest CT scans. Materials and Methods. The model was built from a retrospective cohort study. We selected SARS-CoV-2 test-positive patients and collected their clinical data and chest CT images from the outpatient and emergency departments of Hunan Provincial People’s Hospital, China. Univariate and multivariate logistic regression and least absolute shrinkage and selection operator (LASSO) regression were used to identify predictors of pneumonia risk in patients infected with SARS-CoV-2. These predictors were then incorporated into a nomogram to establish the model. The model was evaluated for discrimination, calibration, and clinical validity. In addition, a smoothed curve was fitted using a generalized additive model (GAM) to explore the association between the pneumonia grade and the model’s predicted probability of pneumonia. Results. We selected 299 SARS-CoV-2 test-positive patients, of whom 205 were in the training cohort and 94 were in the validation cohort. Age, the natural log-transformed CRP value (InCRP), and monocyte percentage (%Mon) were found to be valid predictors of pneumonia risk. The model discriminated well, with an AUC of 0.7820 (95% CI: 0.7254–0.8439) in the training cohort and 0.8432 (95% CI: 0.7588–0.9151) in the validation cohort. At the cut-off value of 0.5, its sensitivity and specificity were 70.75% and 66.33% in the training cohort and 76.09% and 73.91% in the validation cohort. Calibration curves showed suitable calibration accuracy, and decision curve analysis indicated high clinical value in predicting pneumonia probability in SARS-CoV-2-infected patients. The probability of pneumonia predicted by the model was positively correlated with the actual pneumonia grade. Conclusion. This study developed a pneumonia risk prediction model that can be used to predict the probability of pneumonia in patients infected with SARS-CoV-2.

1. Introduction

SARS-CoV-2 infection has prevailed globally since 2020, accounting for recurring quarantines in many countries. It has significantly impacted public health and the global economy [1, 2]. As of 10 February 2023, 755,385,709 confirmed cases of COVID-19, including 6,833,388 deaths, had been reported to WHO globally. Omicron, a mutant strain, entered the community in November 2021 and is far more contagious and better at immune escape than previous variants of concern (VOC), such as Delta [3–8]. At the beginning of 2022, the Omicron variant quickly surpassed the Delta variant as the prevalent strain worldwide [9].

During the early period of the COVID-19 pandemic, SARS-CoV-2 primarily affected the lungs and caused pneumonia [10–13]. As one of the most representative and accurate diagnostic methods for COVID-19 [14], chest computed tomography (CT) is widely used in mainland China.

However, recent studies have demonstrated that the most recent VOC, Omicron, is much less likely to cause pulmonary infections [3–5, 15, 16], suggesting that management strategies for these infections may need to be adapted.

In clinical practice, we found that many people with mild symptoms choose to undergo CT scans out of fear of developing severe pneumonia from SARS-CoV-2. This leads to excessive CT scanning and strains healthcare resources, particularly during localized SARS-CoV-2 outbreaks. Therefore, a strategy to evaluate the risk of pneumonia among recently infected people is essential to ensure the efficient use of medical resources and decrease unnecessary exposure to ionizing radiation.

This study aims to improve the classification of pneumonia risk in individuals infected with the most recent VOC of SARS-CoV-2. Such classification can reduce the overuse of CT scans and nonessential ionizing radiation, lessen the associated financial burden on patients, and optimize the allocation of medical resources. To this end, we developed and externally validated a pneumonia risk prediction model based on general patient data and blood routine tests, which meets the needs of the new phase of COVID-19 epidemic control.

2. Material and Methods

2.1. Materials

A retrospective analysis was performed on the clinical data of SARS-CoV-2 test-positive patients who visited outpatient and emergency departments and underwent chest CT scans at the Mawangdui Branch of Hunan Provincial People’s Hospital from 20 December 2022 to 23 December 2022 and at the Tianxinge Branch of Hunan Provincial People’s Hospital from 1 January 2023 to 4 January 2023. The inclusion criteria were as follows: (a) attendance as an outpatient or emergency patient (inpatients were not included); (b) a completed chest CT scan with available CT imaging data; (c) SARS-CoV-2 infection confirmed by antigen test or nucleic acid test within 3 days before the current chest CT; (d) complete blood routine examination results. The exclusion criteria were as follows: (a) inflammation of a body part other than the lungs diagnosed at the time of the current blood routine tests; (b) antiviral medication already in use at the time of the visit. The patient recruitment pathway is detailed in Figure 1.

The study was conducted in accordance with the Declaration of Helsinki. It was approved by the Medical Ethics Committee of Hunan Provincial People’s Hospital (The First Affiliated Hospital of Hunan Normal University), and patient informed consent was waived for this retrospective analysis.

2.2. Methods
2.2.1. Device Parameters and Image Analysis

At the Mawangdui Branch (training cohort) of Hunan Provincial People’s Hospital, CT scans were performed with a United Imaging uCT 760GE 128-slice CT using the following parameters: field of view (FOV), 230 mm × 230 mm; slice thickness, 5 mm; and slice spacing, 5 mm. At the Tianxinge Branch (validation cohort) of Hunan Provincial People’s Hospital, CT scans were performed with a United Imaging uCT 860 160-slice CT or a United Imaging uCT 960+ 640-slice CT using the following parameters: field of view (FOV), 230 mm × 230 mm; slice thickness, 5 mm; and slice spacing, 5 mm. Two attending radiologists analyzed the images independently, and disagreements were resolved by consultation between the two physicians. CT diagnosis of COVID-19 followed the report published by the RSNA [17]. Typical findings were as follows: peripheral distribution, ground-glass opacity, fine reticular opacity, vascular thickening, and reverse halo sign. Patients were also classified into grades 0, 1, 2, 3, and 4 according to the extent and distribution of lung involvement (no lung involvement was categorized as grade 0).

2.2.2. Statistical Analysis and Construction and Evaluation of Predictive Models

Statistical analysis was performed using EmpowerStats, version 5.0 (https://www.empowerstats.com, X&Y Solutions, Inc., Boston, MA, USA), R statistical software, version 4.2.0 (https://www.R-project.org, The R Foundation), and SPSS statistical software, version 27.0 (SPSS Inc., Chicago, IL, USA). Continuous variables were expressed as medians (min, max) and categorical variables as frequencies (percentages). Kruskal–Wallis rank sum test or Fisher’s exact probability test was used to compare differences between groups of continuous variables. The Chi-square test was used for comparisons of categorical variables. Some continuous variables were natural log-transformed. To reduce irrelevant and redundant information, the predictor variables of the training cohort were then screened by two methods: univariate followed by multivariate logistic regression, and least absolute shrinkage and selection operator (LASSO) regression. The variables selected by both screening methods were used as the final predictor variables. The prediction model was constructed based on multivariate logistic regression and presented as a nomogram. ROC curves with 500 internal bootstrap resamples were used to evaluate the discrimination of the pneumonia risk model in the training and validation cohorts. The DeLong test and the integrated discrimination improvement index (IDI) were used to compare the AUC of the pneumonia risk model with the AUCs of the individual predictors incorporated in the model. Calibration curves were plotted to assess the calibration of the model. The clinical validity of the model was evaluated by the net benefit in decision curve analysis (DCA) at different threshold probabilities. In addition, a smoothed curve was fitted using a generalized additive model (GAM) to explore the relationship between the pneumonia grade and the model’s predicted probability of pneumonia. A difference of was considered statistically significant.
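The bootstrap evaluation of discrimination described above can be illustrated with a minimal sketch. It assumes a percentile bootstrap with patient-level resampling, which the text does not specify, and the function names, seed, and toy data are hypothetical rather than taken from the study.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random positive case outscores a random
    negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_resamples=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method, see lead-in).
    Patients are resampled with replacement; single-class resamples are skipped."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi

# Hypothetical toy data (1 = pneumonia on CT), not the study cohort.
labels = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
scores = [0.10, 0.22, 0.30, 0.35, 0.40, 0.45, 0.55, 0.60, 0.62, 0.70, 0.81, 0.90]

point = auc(labels, scores)
low, high = bootstrap_auc_ci(labels, scores)
print(f"AUC = {point:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

With only a handful of cases the interval is wide; with cohorts of the size reported here it would be expected to narrow, although the exact bounds depend on the resampling scheme actually used.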

3. Results

3.1. General Information

A total of 205 patients were enrolled in the training cohort, of whom 105 (51.22%) were female and 100 (48.78%) were male; 99 (48.29%) had no pneumonia and 106 (51.71%) had pneumonia. The median age of the training cohort was 47 years (range, 14–97). A total of 94 patients were enrolled in the validation cohort, of whom 60 (63.83%) were female and 34 (36.17%) were male; 47 (50.00%) had no pneumonia and 47 (50.00%) had pneumonia. The median age of the validation cohort was 56 years (range, 2–89). The distribution of the remaining baseline indicators is shown in Table 1.

3.2. Predictor Variable Screening Results

Among the baseline indicators in the training cohort, univariate logistic regression identified the following factors as possible predictors (): age, white blood cells (WBC), red blood cells (RBC), neutrophil percentage (%Neu), neutrophil count (#Neu), lymphocyte percentage (%Lymph), monocyte percentage (%Mon), red cell distribution width-standard deviation (RDW-SD), platelet distribution width (PDW), mean platelet volume (MPV), platelet large cell ratio (P-LCR), CRP natural log-transformed value (InCRP), eosinophil percentage (%Eos), basophil percentage (%Bas), and basophil count (#Bas). Further multivariate logistic regression showed age, InCRP, %Neu, and %Mon to be independent predictors () (Table 2). LASSO regression selected three predictors with nonzero coefficients: age, InCRP, and %Mon (Figure 2). Lambda was chosen by 10-fold cross-validation based on lambda.1se, i.e., the largest lambda whose mean cross-validation error is within one standard error of the minimum. To lessen irrelevant and redundant information, the variables selected by both screening methods (age, InCRP, and %Mon) were taken as the final predictor variables.
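The lambda.1se rule and the "selected by both methods" step can be sketched as follows. The cross-validation summaries are hypothetical numbers chosen for illustration, and `lambda_1se` is an illustrative helper rather than the routine used in the study; the two variable sets are those reported above.

```python
def lambda_1se(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation summaries.

    lambda_min minimizes the mean CV error; lambda_1se is the largest lambda
    whose mean CV error lies within one standard error of that minimum.
    """
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[best] + cv_se[best]
    eligible = [lam for lam, err in zip(lambdas, cv_mean) if err <= threshold]
    return lambdas[best], max(eligible)

# Hypothetical CV summaries for five candidate penalties (not study output).
lambdas = [0.001, 0.01, 0.03, 0.10, 0.30]
cv_mean = [1.32, 1.25, 1.27, 1.29, 1.38]  # mean CV error per lambda
cv_se = [0.06, 0.05, 0.05, 0.06, 0.07]    # standard error across folds

lam_min, lam_1se = lambda_1se(lambdas, cv_mean, cv_se)

# Final predictors: variables retained by BOTH screening routes, as in the text.
multivariate_selected = {"age", "InCRP", "%Neu", "%Mon"}
lasso_selected = {"age", "InCRP", "%Mon"}
final_predictors = sorted(multivariate_selected & lasso_selected)

print(lam_min, lam_1se, final_predictors)
```

Choosing the larger penalty trades a small, statistically indistinguishable loss in cross-validated error for a sparser model, which matches the streamlined modeling strategy the study describes.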

3.3. Construction and Evaluation of the Nomogram Prediction Model

Multivariable logistic regression analysis established a nomogram model based on the final selected predictor variables (Figure 3(a)). The AUC of the pneumonia risk model was 0.7820 (95% CI: 0.7254–0.8439) in the training cohort and 0.8432 (95% CI: 0.7588–0.9151) in the validation cohort (Figures 3(b) and 3(c)). At the cut-off value of 0.5, the sensitivity and specificity of the pneumonia risk model were 70.75% and 66.33% in the training cohort and 76.09% and 73.91% in the validation cohort. The calibration curves showed good agreement between the probability of pneumonia predicted by the model and the observed probability. Decision curve analysis (DCA) showed good clinical validity of the pneumonia risk model in the training and validation cohorts (Figures 3(f) and 3(g)). Other diagnostic parameters of the model are shown in Table 3. Figure 4 compares the AUC and DCA of the pneumonia risk model with those of each predictor alone in the whole study cohort; the model combining multiple predictors has better diagnostic performance than any single predictor.
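For readers who want to see how the cut-off metrics and the DCA net benefit are computed, a minimal sketch follows. It uses the standard net benefit definition, TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The ten patients are hypothetical and the helper names are illustrative; the study's own figures cannot be reproduced from the text alone.

```python
def confusion_counts(labels, probs, cutoff=0.5):
    """Count TP, FP, TN, FN when probabilities >= cutoff are called positive."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        predicted_positive = p >= cutoff
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 0 and predicted_positive:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def sensitivity_specificity(labels, probs, cutoff=0.5):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(labels, probs, cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def net_benefit(labels, probs, threshold):
    """Decision-curve net benefit at one threshold probability."""
    tp, fp, _, _ = confusion_counts(labels, probs, threshold)
    n = len(labels)
    return tp / n - fp / n * threshold / (1 - threshold)

# Hypothetical predictions for ten patients (1 = pneumonia on CT).
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.91, 0.74, 0.66, 0.48, 0.35, 0.62, 0.41, 0.30, 0.22, 0.08]

sens, spec = sensitivity_specificity(labels, probs, cutoff=0.5)
nb_model = net_benefit(labels, probs, threshold=0.5)
# "Scan everyone" strategy: every patient is treated as positive.
nb_scan_all = net_benefit(labels, [1.0] * len(labels), threshold=0.5)

print(sens, spec, nb_model, nb_scan_all)
```

A decision curve repeats the net benefit calculation across a range of thresholds and compares the model with the "scan everyone" and "scan no one" (net benefit 0) strategies.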

3.4. Correlation between the Predicted Probability of Pneumonia Risk and Pneumonia Grade

We further explored the correlation between the probabilities predicted by the pneumonia risk model constructed in this study and the actual pneumonia severity grade. As described in the Methods, patients were classified into grades 0, 1, 2, 3, and 4 according to the extent and distribution of lung involvement (no lung involvement was categorized as grade 0). The actual pneumonia grading results are shown in Table 4. Using the GAM, a positive linear correlation was found between the model’s predicted pneumonia probability and the actual pneumonia grade (Figure 5); see Figure 6 for examples.
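As a simpler companion to the GAM fit, and not a replacement for it, the direction of the association between CT grade and predicted probability can be checked with a Spearman rank correlation and per-grade means. The sketch below swaps in these techniques for illustration only; the grades and probabilities are hypothetical, and the helper names are not from the study.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def mean_by_grade(grades, probs):
    """Mean predicted probability within each CT grade, ordered by grade."""
    groups = {}
    for g, p in zip(grades, probs):
        groups.setdefault(g, []).append(p)
    return {g: sum(v) / len(v) for g, v in sorted(groups.items())}

# Hypothetical grades (0-4) and predicted probabilities, not study data.
grades = [0, 0, 1, 1, 2, 2, 3, 4]
probs = [0.15, 0.30, 0.35, 0.50, 0.55, 0.70, 0.80, 0.92]

rho = spearman(grades, probs)
means = mean_by_grade(grades, probs)
print(rho, means)
```

A rank correlation only tests for a monotonic trend; the smoothed GAM curve additionally shows the shape of the relationship, which is why the study reports it.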

4. Discussion

In this study, we constructed a pneumonia risk prediction model based on common, easily obtainable, and inexpensive clinical indicators such as “age,” “InCRP,” and “%Mon” to classify the pneumonia risk of patients infected with SARS-CoV-2. It provides an appropriate reference for clinicians in selecting chest CT examinations to reduce unnecessary medical ionizing radiation and alleviate patients’ economic burden. The model performs well in discrimination, calibration, and clinical effectiveness and can be widely applied for clinical use.

4.1. Analysis of the Rationality of Including “Age” in the Pneumonia Risk Prediction Model in This Study

The severity and fatality rates of COVID-19 vary significantly with age group, and they rise sharply in older people [18–20]. According to recent studies, activation of the nucleotide-binding domain and leucine-rich repeat-containing family, pyrin domain-containing 3 (NLRP3) inflammasome plays a role in lung inflammation and fibrosis induced by SARS-CoV-2 infection [21]. The NLRP3 inflammasome is excessively activated in older individuals due to impaired mitochondrial function, elevated levels of mitochondrial reactive oxygen species (mtROS), and/or mitochondrial DNA. This results in an exaggerated response from classically activated macrophages and subsequent increases in IL-1β [22]. This explains, to some extent, why elderly patients are more likely to have pneumonia after being infected with SARS-CoV-2 and supports including age as a predictor in our model.

4.2. Analysis of the Rationality of Including “InCRP” in the Pneumonia Risk Prediction Model in This Study

As a general indicator of inflammation, CRP is associated with the clinical severity of COVID-19 [20, 23, 24]. CRP is an inflammatory biomarker synthesized by the liver. Our results show that CRP levels are significantly elevated in SARS-CoV-2-infected individuals, which is consistent with previous research [24, 25]. CRP may also indicate COVID-19 changes earlier than chest CT: in patients with severe COVID-19, CRP was significantly elevated before CT findings appeared [26].

4.3. Analysis of the Rationality of Including “%Mon” in the Pneumonia Risk Prediction Model in This Study

In our study, %Mon was partially associated with the risk of pneumonia, which accords with recent studies [27]. Monocytes are innate immune cells that participate in several immune functions, including phagocytosis, antigen presentation, and inflammatory responses [28]. Circulating monocytes extravasate into peripheral tissues during sterile and nonsterile inflammation and differentiate into macrophages or dendritic cells. A previous review discussed the buildup of monocytes and macrophages in the lungs; these cells are likely sources of the proinflammatory cytokines and chemokines linked to fatal disease in human coronavirus infections such as COVID-19 [29]. This suggests that migration of monocytes into lung tissue may cause the reduction of monocytes in peripheral blood.

In previous relevant studies, additional factors, such as cardiovascular disease, hypertension, chronic respiratory disease, diabetes, obesity, and high serum ferritin levels, were found to be associated with the progression of COVID-19 [30–32]. Angiotensin-converting enzyme 2 (ACE2) has been found to be a pathway by which SARS-CoV-2 enters cells. Angiotensin-converting enzyme inhibitors (ACEI) and angiotensin II receptor blockers (ARB), which are mainly used to treat cardiovascular disease and hypertension, may increase ACE2 expression and promote SARS-CoV-2 infection in hypertensive patients [33]. Moreover, smokers and COPD patients have higher levels of ACE2 expression in their lungs [34, 35]. This may partly explain why patients with chronic respiratory disease are more likely to progress after SARS-CoV-2 infection. Diabetes patients are more likely to develop severe COVID-19. This might be caused by hyperglycemia, which affects neutrophil activity, antioxidant system function, and humoral immunity, all contributing to immune dysfunction [36]. Obesity affects lung function by influencing lung volume and compliance, as well as narrowing peripheral airways [37]. Additionally, because angiotensin-converting enzyme 2 is more highly expressed in adipose tissue than in the lungs, it has been hypothesized that SARS-CoV-2 may be capable of entering and infecting adipocytes. This could contribute to the spread of the virus to other organs or create a reservoir that delays viral clearance [38]. Clinically applicable inflammatory marker panels now contain ferritin. Inflammation can cause ferritin to be released from macrophages or from cells damaged by tissue injury, which explains the abnormal ferritin levels seen in inflammation.

Because our study is a retrospective analysis, it is limited by missing information, so some of the valuable indicators reported by relevant studies are not included. In addition, some indicators were excluded because they were derived from patients’ self-reports rather than standard medical diagnoses and thus had low credibility.

From the standpoint of model promotion, the more streamlined a prediction model is, the less expensive, easier to use, and more suited to wide application it is. However, streamlining may also reduce prediction performance.

This is a matter of balance: whether the model should be applied mainly for primary screening of high-risk cases or whether it should prefer higher predictive accuracy. It depends on the application scenario of the constructed model.

In this study, the pneumonia risk prediction model we constructed was mainly applied to the primary screening of people at high risk of pneumonia in SARS-CoV-2-infected individuals, so we chose a more streamlined modeling strategy.

One unexpected finding was that the model performed better in the validation cohort than in the training cohort. This result may be explained by the relatively small sample size of the validation cohort and a certain degree of similarity to the training cohort.

5. Conclusion

In this study, a pneumonia risk prediction model was developed and externally validated based on simple clinical and blood test indicators. The model predicts the likelihood of pneumonia in patients infected with SARS-CoV-2 and performed well in terms of discrimination, calibration, and clinical validity. It can be used as a reference for the management of pneumonia risk classification in SARS-CoV-2-infected patients.

6. Limitations of This Study

Our study has several limitations. First, despite applying the inclusion criteria strictly, we could not completely rule out cases in which potential lesions in body parts other than the lungs influenced the predictors at study entry. This may have introduced confounding in constructing the model and made its predictive performance harder to evaluate.

Second, even though external validation was carried out, this was a single-center retrospective study, and the sample size was relatively small.

In later research, larger-sample and multicenter studies would be required to calibrate and validate the model.

Data Availability

The data used to support the findings of this study are restricted by the Medical Ethics Committee of Hunan Provincial People’s Hospital (The First Affiliated Hospital of Hunan Normal University) to protect patient privacy. Data are available from Xi Yi for researchers who meet the criteria for access to confidential data.

Disclosure

This is a preprint paper [39].

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Authors’ Contributions

Xi Yi and Jirong Li conceived and designed the study, had full access to all the data in the study, and take responsibility for the integrity of the data and the accuracy of the data analysis. Daiyan Fu and Lile Wang contributed to the revision of the manuscript. Guiliang Wang evaluated the quality of the literature. Xi Yi wrote the manuscript. All listed authors reviewed and approved the final manuscript.