Abstract

In this paper, the SIR epidemiological model for COVID-19 with unknown parameters is considered in the first strategy. Three curves (S, I, and R) are fitted to the real data of South Korea, based on a detailed analysis of the actual data taken from the Korea Disease Control and Prevention Agency (KDCA). Using the least squares method and minimizing the error between the fitted curve and the actual data, the unknown parameters, such as the transmission rate, recovery rate, and mortality rate, are estimated. The goodness of fit of the model is assessed with two criteria (SSE and RMSE), and the uncertainty range of the estimated parameters is also presented. Using the resulting model, the possible ending time and the turning point of the COVID-19 outbreak in the United States are predicted. Due to the lack of treatment and vaccine, in the next strategy, a new group called quarantined people is added to the proposed model. A hidden state of asymptomatic individuals, which is very common in COVID-19, is also considered to make the model more realistic. The SIR model is thereby developed into the SQAIR model. The delay in the recovery of infected people is also considered as an unknown parameter. As in the previous steps, the possible ending time and the turning point in the United States are predicted. The model obtained in each strategy for South Korea is compared with the actual data from KDCA to verify the accuracy of the parameter estimation.

1. Introduction and Problem Statement

The novel coronavirus disease (COVID-19) is an emerging disease that first spread from Wuhan, China. It has since spread across the entire world and has attracted worldwide attention. Since COVID-19 has affected more than 200 countries, it is important to model this disease correctly, identify how it spreads, and predict its course so that the necessary steps can be taken. In this regard, predictive mathematical models play a crucial role in investigating the spread of the epidemic in the absence of specific antivirals or an effective vaccine. Many articles have been published about disease modeling and estimating the unknown parameters of infectious diseases, including COVID-19. Hence, in the following, after reviewing other papers, we discuss our objectives and innovations in the present paper.

The authors in [1] estimated the parameters of the SIR model of COVID-19 in India using an actual data set. Bastos and Cajueiro [2] used two variations of the SIR-type model (SIR and SIAS) to forecast the evolution of the SARS-CoV-2 virus with real data from Brazil. Forecasting of the second wave of the spread of COVID-19 in Iran with the SIR model is considered in [3]. The authors in [4] also forecasted the trend of COVID-19 using the least square error (LSE) technique. Furthermore, in [5], the key epidemic parameters are estimated in the generalized SEIR model to forecast the epidemic size, peak time, and possible ending time of COVID-19 for five different regions. Asymptomatic and quarantined people have not been considered in these papers. Due to the nature of COVID-19, it is recommended to consider the groups of quarantined and asymptomatic individuals. Therefore, in our work, we modified the SIR model by considering asymptomatic people and people who were put into quarantine. Also, the author in [6] introduced a SIR-type model that only considered the asymptomatic individuals for COVID-19 in Northern Italy based on parameter estimation.

In [7], the parameters and initial values of the SIR epidemic model are estimated for reported case data of the Hong Kong seasonal influenza epidemic in New York City in 1968-1969, to identify the relationship between unreported and reported cases. The study by Liu et al. [8] aimed to develop a mathematical model with a new group of unreported cases for the COVID-19 epidemic in Wuhan, in which the parameters and the initial conditions of the proposed model are estimated. Using the parameterized model, the number of unreported cases is then identified. In the present study, by contrast, the initial values are known from real data, so there is no need to estimate them. The study by Hadeler [9] aimed to identify the time-dependent transmission rate in epidemic SIR, SIRS, and SEIRS models and to review and compare the various results. In addition, in [10], the authors estimate the infection rate of a SIR epidemic model based on input-output (IO) equations depending on the known quantity of output measurement and its derivatives. Furthermore, the authors in [11] introduced a more complete epidemic model for influenza that can be used for other diseases by parameter modification. In this regard, the authors in [12] applied optimal control to the proposed epidemic model for COVID-19 compared to Ebola and influenza.

There are many different methods to estimate parameters in various epidemic models that can be used as required. For example, if new data is added during the identification process, then the model should be based on the observations until the current time. Therefore, the parameter estimation should be computed recursively over time, as described in [13] in detail. Moreover, if the model is considered two-dimensional, the study by Shafieirad et al. [14] can be helpful. In addition to the continuous models considered for epidemic dynamics, discrete models can also be used, which are discussed in [15] in detail. Also, since some people who have previously been infected with COVID-19 have been reported to be resusceptible, the authors in [16] introduced a modified SEIRS model considering the possibility of susceptibility for recovered people for control action. In [17], a new mathematical model with time-dependent coefficients is used to characterize the dynamics of COVID-19 in three countries: S. Korea, Italy, and Brazil.

Since the prevalence of COVID-19 in the United States is on the rise, it is vital to make predictions on the possible ending time. The method mentioned in this article can be applied to other countries and similar diseases. Since the prevalence of COVID-19 in South Korea has decreased and there is a complete set of data, taken from KDCA, an accurate model can be obtained to predict the ending point of the disease in other countries (including the United States).

Our motivation is to evaluate our method’s efficiency on a classical SIR and SQAIR epidemiological model to predict the turning point and ending time of the COVID-19 disease in the United States. For this purpose, the method used in this study is the following.

Using actual data from South Korea, taken from KDCA, which has provided accurate and well-documented statistics on the prevalence of the coronavirus disease, the epidemic model’s unknown parameters can be estimated. Using the resulting model, the possible ending time of COVID-19 in the United States can be predicted. We use two strategies in this article to achieve this aim:

In the first strategy, the unknown parameters of a classical SIR (susceptible-infected-recovered) epidemiological model are estimated using the LS method, and the turning point and ending time of COVID-19 in the United States are then predicted. There may be asymptomatic carriers in the community who are in the incubation period and who, despite having the disease and even despite a negative COVID-19 test result, can transmit the disease to others. As a result, in this study, we also considered this group as asymptomatic people (A) in our model and modified the basic SIR model to the SAIR model. Additionally, since there is no cure or vaccine for COVID-19 yet, it is necessary to quarantine susceptible people, and since the SAIR model has no group (Q) to hold them, in the next step the SAIR model is developed into the SQAIR model by introducing a quarantine group. Besides, the delay in transferring people from the infected group to the recovered group is an essential factor added to the SQAIR model because it makes the model more realistic and closer to the natural process of the spread of COVID-19. The same steps are then applied to the SQAIR model to obtain the turning point and ending time of the COVID-19 outbreak in the United States. Since there is no proper viral treatment or effective vaccine yet to prevent and control the spread, the best options and most widely used strategies for decreasing the outbreak’s growth rate are currently social distancing, stay-at-home orders, self-quarantine, lockdowns, isolation, and wearing a face mask.

In this paper, the group of quarantined people (Q) covers all of the above strategies, which are referred to collectively as quarantined people for simplicity. As mentioned in the papers above, other groups can be added to the model, but this study aims to predict COVID-19 with a comprehensive yet straightforward model that incorporates the general features of the disease and can easily express its behavior. Furthermore, considering delay in systems is critical because it brings the model closer to the real world. Hence, in this study, the delay in transferring infected people to the group of recovered people is considered an unknown parameter.

In the following, the general structure of the paper is presented.

The first section of the paper has presented the introduction and problem statement. The paper continues with Section 2, which presents the SIR model with its dynamic equations and diagrams. In Section 3, the estimation of unknown parameters, the model upgrade, the prediction, and the comparison of results are presented. Finally, the conclusion is given in Section 4.

2. The SIR Epidemic Model

The SIR epidemic model used in this paper is described as follows: let S be the number of people susceptible to infection, I the number of infected people (people who have tested positive for COVID-19), and R the number of recovered people. The SIR epidemic model is given by Equations (1a)–(1c), whose parameters include the transmission rate, with given initial conditions for S, I, and R. All states take positive values. The total population N includes the individuals who have been tested. In other words, due to normalization, the total number of people considered as the statistical population, N, is equal to one.

Remark 1. The total population N includes the individuals who have been tested (it is a statistical population that can be generalized to the total size of the population), which is generally variable. In this work, however, it is fixed and equal to the total number of people on the last day on which data are taken (the 72nd day). In other studies, N is usually taken to be the whole population of the country; however, it is difficult to treat the entire population (almost 330 million in the U.S.) as involved with COVID-19 because the disease is not equally distributed across all the states of a vast country like the United States. Hence, we considered a smaller community (people who have been tested) as our statistical population, which contains all three groups of people (S, I, and R). The parameters are also identified more accurately in this way. For example, suppose the number of infected people is 500,000 and the total population is about 330 million. In that case, the ratio of infected people to the total population becomes very small (about 0.15%), and the estimated parameters are not obtained correctly. Of course, when the statistical population consists of tested individuals, the results can be generalized to the entire population.

As shown in Figure 1, infected people leave group I either by recovering, at the recovery rate, or through death caused by the infection, at the mortality (removal) rate.
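
As a minimal sketch of a discrete-time SIR model with recovery and mortality of the kind described above, the daily update can be written as follows. The update rule, the parameter names beta (transmission), gamma (recovery), and alpha (mortality), the extra bookkeeping state for deaths, and all numerical values are assumptions made for illustration; they are not the equations or estimates of this paper.

```python
def simulate_sir(s0, i0, r0, beta, gamma, alpha, days):
    """Iterate an assumed discrete-time SIR model with mortality.

    All states are fractions of a normalized population (N = 1).
    Returns a list of (S, I, R, D) tuples, one per day, where D is an
    illustrative bookkeeping state for deaths.
    """
    s, i, r, d = s0, i0, r0, 0.0
    history = [(s, i, r, d)]
    for _ in range(days):
        new_infections = beta * s * i   # S -> I
        new_recoveries = gamma * i      # I -> R
        new_deaths = alpha * i          # I -> removed by death
        s = s - new_infections
        i = i + new_infections - new_recoveries - new_deaths
        r = r + new_recoveries
        d = d + new_deaths
        history.append((s, i, r, d))
    return history


# Hypothetical parameter values, chosen only for illustration.
trajectory = simulate_sir(s0=0.99, i0=0.01, r0=0.0,
                          beta=0.4, gamma=0.1, alpha=0.01, days=200)
s_end, i_end, r_end, d_end = trajectory[-1]
print(f"final S={s_end:.4f}, I={i_end:.6f}, R={r_end:.4f}, D={d_end:.4f}")
```

In this sketch the four states always sum to one, so the normalization of the population is preserved at every step.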

Since the epidemic model parameters are unknown, estimating these parameters with the real data taken from South Korea is the main objective of this paper. As a result, using the known parameters, the spread of infection in the United States can be predicted by the method presented in Section 3.

Remark 2. Epidemic data are discrete in nature because they are collected and/or reported over discrete units of time, which makes it easier to compare the data with the output of a discrete model, and a discrete model can be easily implemented. For system identification, the input and output data must be measured in the time domain. Then, a model structure (usually a discrete model) is selected and an estimation method (the LS method in this paper) is applied to estimate the unknown parameter values. Since the identification data in this study, taken from medical reports, are daily, a discrete model structure is chosen. Such data may be weekly or daily in fast-spreading epidemics, such as influenza, SARS, Ebola, and especially the novel coronavirus disease (COVID-19). A discrete epidemic model can be regarded as continuous with a small step length, and after the parameters are estimated, it can simply be written in continuous form. Furthermore, the numerical investigation of discrete-time epidemic models is more straightforward. Some studies of discrete epidemic models are discussed in [15].

Remark 3. Assume that the initial values of S, I, and R and all parameters are positive. In the model, the change in the number of susceptible people is negative, which shows that susceptible people become infected at the transmission rate and move from group S to group I. Then, once the number of susceptible people reaches zero, its rate of change becomes zero and S remains unchanged. After the number of susceptible people reaches zero on a specific day, the equation for the infected people reduces to a difference equation whose solution eventually tends to zero. On the other hand, the number of recovered people increases at the recovery rate, and when the number of infected people reaches zero, the number of recovered people remains at its maximum value. As a result, the model reaches a stable equilibrium point at which S and I are zero and R is at its maximum.

3. Parameter Estimation, Prediction, and Comparing Results

According to the daily official reports of the Korea Disease Control and Prevention Agency (KDCA), the numbers of infected people and daily deaths are publicly available. The numbers of infected people (people who have tested positive for COVID-19) and of people who have died of the coronavirus disease are given in Table 1. Using Equations (1a)–(1c) and (2), the numbers of susceptible and recovered people can be computed.

According to Equations (3) and (4), the numbers of infected, susceptible, and recovered people are determined. The unknown parameters are then estimated by minimizing an objective function, as presented in the two strategies below.

3.1. SIR Strategy

In the first strategy, three curves (S, I, and R) are fitted to the real data of South Korea, given in Table 1. The goodness of fit, reported in Table 2, describes how well each function fits the set of actual data. It is assessed with two criteria, the sum of squared errors (SSE) and the root mean squared error (RMSE), which measure the deviation of the actual data from the curve fitted to the data. For both criteria, the smaller the value, the better the fit. According to Table 2, the fit results are reasonable because the SSE and RMSE values are small and close to zero. The error between the fitted curves and the actual data is taken as the objective function, and applying the least squares method to the objective functions yields estimates of the unknown parameters. There is one objective function per group, each summed over the total number of data points. For the susceptible group, the error at every time step is the difference between the number of susceptible individuals obtained from the fitted curve and the actual number of susceptible people. The objective functions for the infected and recovered groups are defined similarly. Finally, the optimal parameter vectors are obtained using the least squares method.
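
A least squares estimate and the two goodness-of-fit criteria can be sketched as follows. The sketch rests on several assumptions: it uses a discrete SIR update without mortality, it estimates each rate by a one-parameter least squares regression on the daily differences rather than on fitted curves, and it uses synthetic data generated from hypothetical parameter values instead of the data in Table 1.

```python
import math


def ls_slope(x, y):
    """Least squares slope theta for the model y = theta * x (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def sse(actual, fitted):
    """Sum of squared errors between actual and fitted values."""
    return sum((a - f) ** 2 for a, f in zip(actual, fitted))


def rmse(actual, fitted):
    """Root mean squared error between actual and fitted values."""
    return math.sqrt(sse(actual, fitted) / len(actual))


# Synthetic normalized data generated from hypothetical parameters.
beta_true, gamma_true, days = 0.3, 0.1, 60
s_data, i_data, r_data = [0.99], [0.01], [0.0]
for _ in range(days):
    s, i, r = s_data[-1], i_data[-1], r_data[-1]
    s_data.append(s - beta_true * s * i)
    i_data.append(i + beta_true * s * i - gamma_true * i)
    r_data.append(r + gamma_true * i)

# Assumed daily differences: dS(k) = -beta * S(k) * I(k), dR(k) = gamma * I(k).
d_s = [s_data[k + 1] - s_data[k] for k in range(days)]
d_r = [r_data[k + 1] - r_data[k] for k in range(days)]
beta_hat = -ls_slope([s_data[k] * i_data[k] for k in range(days)], d_s)
gamma_hat = ls_slope(i_data[:days], d_r)

# One-step-ahead fit of S using the estimated transmission rate.
s_fit = [s_data[k] - beta_hat * s_data[k] * i_data[k] for k in range(days)]
print(f"beta={beta_hat:.4f}, gamma={gamma_hat:.4f}")
print(f"SSE={sse(s_data[1:], s_fit):.3e}, RMSE={rmse(s_data[1:], s_fit):.3e}")
```

Because the synthetic data contain no noise, the estimates reproduce the generating parameters and both criteria are close to zero; with real data, the residual SSE and RMSE would indicate how well the model fits.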

Note that curve fitting is applied to the real data, based on a detailed analysis of the actual data of South Korea in Table 1, as mentioned. Then, using the least squares method and minimizing the error between the fitted curve and the actual data, the unknown parameters, such as the transmission rate, recovery rate, and mortality rate, were obtained. The uncertainty range of the estimated parameters is presented in Table 3.

The basic reproduction number can also be estimated from the estimated parameters (see [11] for details). The authors in [21] also estimated the reproduction number, which is a critical quantity in the outbreak of COVID-19, based on publicly available sources to investigate the growth rate of the COVID-19 outbreak in South Korea. According to Table 2, the uncertainty range of the basic reproduction number can be calculated, giving a lower and an upper bound. The desired basic reproduction number can be calculated using the mean of the parameters in Table 3. Then, the numbers of susceptible, infected, and recovered people are shown in Figures 2–4, respectively. The real data series of the susceptible, infected, and recovered people obtained from Table 1 are compared with the numbers of people given by the model with the estimated parameters.
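
One common expression for the basic reproduction number of an SIR model with mortality is the ratio of the transmission rate to the total removal rate. Whether this matches the expression taken from [11] is an assumption, and the parameter ranges below are hypothetical rather than the values of Table 3. Under these assumptions, the point value and its uncertainty range could be sketched as:

```python
def basic_reproduction_number(beta, gamma, alpha):
    """Assumed form: transmission rate over total removal rate."""
    return beta / (gamma + alpha)


def reproduction_number_range(beta_range, gamma_range, alpha_range):
    """Lower and upper bounds from (min, max) ranges of each parameter."""
    lower = basic_reproduction_number(
        min(beta_range), max(gamma_range), max(alpha_range))
    upper = basic_reproduction_number(
        max(beta_range), min(gamma_range), min(alpha_range))
    return lower, upper


def midpoint(bounds):
    """Mean of the lower and upper bound of a parameter range."""
    return (min(bounds) + max(bounds)) / 2.0


# Hypothetical uncertainty ranges, for illustration only.
beta_range = (0.30, 0.40)
gamma_range = (0.08, 0.12)
alpha_range = (0.005, 0.015)

r0_lower, r0_upper = reproduction_number_range(
    beta_range, gamma_range, alpha_range)
r0_mean = basic_reproduction_number(
    midpoint(beta_range), midpoint(gamma_range), midpoint(alpha_range))
print(f"R0 range: [{r0_lower:.2f}, {r0_upper:.2f}], at mean parameters: {r0_mean:.2f}")
```

The lower bound pairs the smallest transmission rate with the largest removal rates, and the upper bound does the opposite, so the value at the mean parameters always lies between them.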

As it turns out, the resulting SIR model fits the South Korean data well, so this model can be used to predict the possible ending point of COVID-19 in the United States. Because COVID-19 is spreading rapidly in the United States, knowing the turning (inflection) point and the possible ending time of the disease can be crucial for making effective decisions. As shown in Figures 5–7, in the simplest strategy (SIR), the epidemic situation for the United States is not hopeful for the next 50 days: the turning point of the disease is in the middle of June, and the number of infected people at the peak is about twice its current value (Apr. 28, 2020). Fortunately, however, the epidemic is expected to end completely within seven months (from Apr. 28, 2020).
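
Given a simulated infected curve, the turning point and the ending time can be read off numerically. In the sketch below, the turning point is taken as the day on which the infected fraction is largest, and the ending time as the first day after the peak on which it falls below a small threshold. The discrete update, the threshold, and the parameter values are all assumptions for illustration and do not reproduce the predictions of Figures 5–7.

```python
def infected_curve(s0, i0, beta, removal, days):
    """Infected fraction per day under an assumed discrete SIR update.

    `removal` is the combined recovery and mortality rate.
    """
    s, i = s0, i0
    curve = [i]
    for _ in range(days):
        new_infections = beta * s * i
        s -= new_infections
        i += new_infections - removal * i
        curve.append(i)
    return curve


def turning_point(curve):
    """Index of the day with the largest infected fraction."""
    return max(range(len(curve)), key=curve.__getitem__)


def ending_time(curve, threshold):
    """First day at or after the peak with infected fraction below threshold."""
    for day in range(turning_point(curve), len(curve)):
        if curve[day] < threshold:
            return day
    return None  # the outbreak has not ended within the simulated horizon


# Hypothetical values, chosen only for illustration.
curve = infected_curve(s0=0.99, i0=0.01, beta=0.4, removal=0.11, days=300)
peak_day = turning_point(curve)
end_day = ending_time(curve, threshold=1e-4)
print(f"turning point: day {peak_day}, ending time: day {end_day}")
```

The ending time depends on the chosen threshold, so in practice it would be fixed in advance, for example as one case in the normalized population.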

However, in order to get closer to the real world, the model can be developed. Therefore, our studies will be expanded in the following strategy.

3.2. SQAIR Strategy

Since coronavirus disease is currently incurable, quarantine is a priority in all countries. Therefore, a new group called quarantined people can be added to the proposed model. Considering a new hidden state can also make the model more realistic. This hidden state, denoted by A, includes asymptomatic people, who are very common in COVID-19. Delay in the transfer of infected people to the group of recovered people is also considered. These three conditions can be described as follows:

(1) The new group added consists of infected people who have a negative COVID-19 test and no symptoms. They are in their incubation period and can transmit the disease to others without any visible symptoms.

(2) In coronavirus disease, infected people continue to be carriers of the virus after recovery, so they remain in the infected group because they can continue to infect susceptible individuals at the transmission rate; they therefore move to the group of recovered people with a delay.

(3) The quarantined people are denoted by Q. Figure 8 shows the quarantine group and how individuals are transferred to it. The quarantine rate of susceptible individuals may vary between countries, so it is treated as a model parameter.

Equations (1a)–(1c) are reformulated as Equations (8a)–(8e), which include the rate of transfer of individuals between groups. The parameter vectors are unknown, and the model also includes the quarantine rate. Since two new groups have been added to the model, the total number N also changes.

Remark 4. Assume that the initial values of S, Q, A, I, and R and all parameters are positive. The number of susceptible people decreases over time and remains at zero once it reaches zero. The number of quarantined people Q increases and remains at its maximum once the number of susceptible individuals reaches zero. After the number of susceptible people reaches zero, the equation for the asymptomatic people reduces to a difference equation whose solution eventually tends to zero. After that day, the equation for the infected people also changes; its solution has a downward trend and converges to zero, and the recovered people reach their maximum value and remain stationary. As a result, the stable equilibrium point of the model is one in which S, A, and I are zero and Q and R are at their maximum values.
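
The qualitative behavior described in Remark 4 can be explored with a small simulation. The sketch below is one possible discrete-time model with quarantined, asymptomatic, and delayed-recovery dynamics; it is not the system of Equations (8a)–(8e). The flow structure (S to A by infection, S to Q by quarantine, A to I, I to R after a delay), the clamp that keeps the infected group from going negative, the bookkeeping state for deaths, and every parameter name and value are assumptions made for illustration.

```python
def simulate_sqair(state0, beta, q, sigma, gamma, alpha, delay, days):
    """Iterate an assumed discrete-time SQAIR model with a recovery delay.

    state0 is (S, Q, A, I, R) as fractions of a normalized population.
    Returns a list of (S, Q, A, I, R, D) tuples, one per day.
    """
    s, q_grp, a, i, r = state0
    d = 0.0
    infected_history = [i]
    states = [(s, q_grp, a, i, r, d)]
    for k in range(days):
        new_infections = beta * s * (a + i)   # S -> A (both A and I transmit)
        new_quarantined = q * s               # S -> Q
        new_symptomatic = sigma * a           # A -> I
        deaths = alpha * i                    # I -> removed by death
        remaining = i - deaths
        # Recoveries depend on the infected fraction `delay` days earlier.
        delayed_i = infected_history[k - delay] if k >= delay else 0.0
        recoveries = min(gamma * delayed_i, remaining)  # clamp keeps I >= 0
        s = s - new_infections - new_quarantined
        q_grp = q_grp + new_quarantined
        a = a + new_infections - new_symptomatic
        i = (remaining - recoveries) + new_symptomatic
        r = r + recoveries
        d = d + deaths
        infected_history.append(i)
        states.append((s, q_grp, a, i, r, d))
    return states


# Hypothetical values, chosen only for illustration.
states = simulate_sqair(state0=(0.98, 0.0, 0.01, 0.01, 0.0),
                        beta=0.4, q=0.05, sigma=0.2, gamma=0.1,
                        alpha=0.01, delay=3, days=200)
s_end, q_end, a_end, i_end, r_end, d_end = states[-1]
print(f"final S={s_end:.5f}, Q={q_end:.4f}, A={a_end:.6f}, "
      f"I={i_end:.6f}, R={r_end:.4f}")
```

In this sketch the quarantined group can only grow and the susceptible group can only shrink, which mirrors the monotone behavior of Q and S described in the remark.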

First, it is assumed that there is no group of quarantined people and the rate is not affected by quarantined people; therefore,

The total population N includes the individuals who have been tested (it is a statistical population that can be generalized to the total size of the population), which is generally variable. In this work, however, the total population is fixed and equal to the total number of people on the last day on which data were taken (the 72nd day). According to Equation (9), the numbers of I and R are obtained from the actual data directly. Since N is known, the number of S + A can be calculated. There is no reported data for the number of A alone. Since the normalized value of N is equal to one and the numbers of infected and recovered people for South Korea are known, the sum S + A follows from the known quantities.
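
The computation of the combined susceptible and asymptomatic group can be illustrated with a short sketch. It assumes that, without a quarantine group, the normalized population satisfies S + A + I + R = 1, so the sum S + A is what remains after subtracting the normalized infected and recovered fractions. The counts and the tested total below are hypothetical and are not the data of Table 1.

```python
def susceptible_plus_asymptomatic(infected_counts, recovered_counts, tested_total):
    """Normalized S + A per day, assuming S + A + I + R = 1 (no quarantine group)."""
    result = []
    for infected, recovered in zip(infected_counts, recovered_counts):
        i_norm = infected / tested_total   # normalized infected fraction
        r_norm = recovered / tested_total  # normalized recovered fraction
        result.append(1.0 - i_norm - r_norm)
    return result


# Hypothetical daily counts and a fixed tested population.
infected_counts = [100, 400, 900]
recovered_counts = [0, 50, 300]
tested_total = 10_000

s_plus_a = susceptible_plus_asymptomatic(
    infected_counts, recovered_counts, tested_total)
print([round(x, 4) for x in s_plus_a])
```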

Due to the incubation period of COVID-19, it is difficult to separate these two groups. As shown in Figure 9, with a quarantine rate of 95%, it is predicted that the total number of susceptible and asymptomatic people in the United States will eventually reach almost zero in two months. Since the number in each state cannot be negative, their sum cannot be negative either. If S + A reaches zero, then S and A must each become zero. Hence, even though we do not know the number of asymptomatic people, we can be sure that it eventually reaches zero. Using the actual data in Table 1, a function (or curve) is fitted to the vector S + A. As before, the goodness of fit is assessed with two criteria in Table 4. According to Table 4, the small SSE and RMSE indicate a close fit of the function to the data. Therefore, the fit of our model to the South Korean data is very encouraging.

Then, by differentiating the fitted function and equating the result with Equation (11), which is obtained from Equations (8a) and (8c), an unknown parameter value can be obtained.

To make the parameters more accurate and to choose the optimal parameter, the LS method is applied as in the previous strategy. The objective function is the error, summed over all data points, between the total number of S and A obtained from the fitted curve and the total number of actual S and A obtained from the data. Finally, the optimal parameter is obtained by minimizing this objective function.

Also, according to Equation (8e),

Since is determined, so the value of is determined, too. Also, the value of is known, and according to Table 1, data analysis, and comparing the differences between these two vectors, can be obtained approximately. Now, considering the two new groups (Q and A) and the delay, the unknown parameters are estimated in the same way and the new model is obtained. The uncertainty range of the estimated parameters is also presented in Tables 4 and 5.

Figures 10–12 show the comparison of the number of infected people, the number of recovered people, and the sum of the S and A groups, respectively, based on the actual data and the model obtained from the estimated parameters.

Finally, with the estimated parameters, the spread of COVID-19 in the United States can be predicted, as shown in Figures 13–15. In Figure 13, the actual data published by the CDC of the United States from Feb. 15 to Apr. 28 are marked with black spots, and the predicted number of infected people is shown by the blue line. If 95% of susceptible people (group S) had been quarantined from the beginning of the disease, then, based on the estimation in this study, the COVID-19 epidemic in the United States would end within approximately seven months, and although the population has almost tripled, the peak of the disease would not increase so much (black spots in Figure 13). The possible turning point of the COVID-19 epidemic in the United States will be at the end of November 2020. But even in the current situation, by applying this technique to the epidemic situation in the U.S., it can be conjectured that eventual eradication is reached in seven months, which maximizes the number of individuals who escape infection altogether.

As shown in Figure 14, there is a significant difference between the number of recovered people predicted by the proposed model and the real number of recovered people in the United States. Since in the proposed model 95% of people are quarantined at the beginning of the outbreak, fewer people get infected. As a result, fewer people recover from the disease, and fewer recovered people imply that convergence toward immunity will be faster, whereas the number of recovered people in the U.S. is on the rise, indicating the high number of infected people. If the United States had followed the procedure in this study and quarantined strictly and early, the number of recovered people would have been smaller (because there would have been fewer infected). In Table 6, there are no statistics on the numbers of asymptomatic and susceptible people. However, based on the proposed model, it can be predicted that S + A reaches zero (Figure 9), and since the number of people in each group cannot be negative and their sum reaches zero, both S and A reach zero. A zero number of susceptible people means that the quarantine has been carried out well, and a zero number of asymptomatic people means that all of them have recovered. Finally, Figure 15 shows the number of people in quarantine; as expected, the susceptible people are quarantined well. Although actual data are lacking for some groups (S and A), the SQAIR model with estimated parameters should help forecast the COVID-19 epidemic and prevent the spread of similar viruses in the future. Such forecasting matters because the 1918 influenza outbreak, or the “Spanish flu,” spread over the world (between 1918 and 1919), infecting about 500 million people and causing an estimated 50 million deaths or more, with about 675,000 in the United States.
Therefore, these simulation results are useful not only for the first peaks but also for predicting the second peaks of COVID-19 that have been observed in some countries or are expected soon. Accordingly, it can be estimated that the pandemic will peak during the second wave, in the fall of 2020. Hence, if the quarantine is not done correctly and is broken for any reason, highly fatal subsequent waves are possible, as happened with the Spanish flu in 1918. It should also be pointed out that the spread of COVID-19 in the United States would still be very severe. In addition to the high growth and mortality rates of the COVID-19 outbreak, the economic and social costs of the disease are a further problem, and if the quarantine of people is not emphasized, there will be many catastrophic economic consequences, as discussed in [22]. Therefore, the law can contribute to preventing COVID-19 by supporting access to treatment and allowing public health authorities to limit contact with infectious people in response to disease outbreaks. Accordingly, the government should intervene to reduce the number of people involved, which requires imposing martial law to strictly quarantine the population, efforts to treat infected people, and clinical research. Also, criminal penalties for breaking the quarantine and transmission of COVID-19 may create disincentives for individuals to stay home. Encouraging people to observe self-protection (such as wearing a face mask, social distancing, and limiting gatherings) is important for breaking the transmission chain, especially in countries where rates of COVID-19 are high.

4. Conclusion

In this paper, the efficiency of the proposed method in identifying the unknown parameters of the basic (SIR) and extended (SQAIR) epidemic models was evaluated. In this regard, first, the SIR-type model with unknown parameters was considered to investigate the dynamics of COVID-19. After that, based on the real data from the Korea Disease Control and Prevention Agency (KDCA), the unknown model parameters were estimated to predict the spreading process of COVID-19 in the United States. In the absence of an effective vaccine and treatment, the number of people infected with COVID-19 rises rapidly. Therefore, it is essential to consider the quarantine strategy for susceptible people to apply adequate control and decrease the risk of virus spread. Thus, by adding the new group called quarantined people, the model became more realistic. In this way, the SQIR model with unknown parameters was introduced to analyze the epidemic of COVID-19. Next, by incorporating the impact on the epidemic of asymptomatic people (A), who are almost impossible to distinguish in society, the model was developed into the SQAIR model so that the COVID-19 dynamics were modeled correctly. The model parameters and the delay considered for the complete recovery of the infected people were also regarded as unknown parameters in the model. In both strategies, the unknown parameters of the model were estimated using real data obtained from KDCA and the least squares method. This model was then compared with the United States’ actual data published by the Centers for Disease Control and Prevention (CDC), and the possible end point of the disease and its inflection point were predicted. Finally, the results were compared in the form of graphs. Although many countries try to break the transmission chain, travel continues to increase the prevalence of COVID-19.
Therefore, the connections between cities and countries as network-based issues and their impact on the final estimation result can also be examined in future work. After a vaccine is developed, controlling the spread of the disease by administering the right dosage at the correct times can also be an important topic for future studies.

Data Availability

The data used are included in the paper and cited accordingly.

Conflicts of Interest

The authors declare that they have no conflicts of interest.