Abstract

To address the problem that railway passenger volume data distorted by holidays and major events interfere with prediction accuracy, the spline interpolation method is introduced to replace the abnormal passenger volume data. In addition, an improved particle swarm optimization (IPSO) algorithm is proposed to optimize the redifference acceleration law so that the predicted values can be combined and corrected, further improving the prediction accuracy of railway passenger traffic. Finally, taking Beijing as the research object, the Holt exponential smoothing method and the BP neural network are selected to verify the effect of spline interpolation and the IPSO-redifference acceleration law on prediction accuracy. The results show that the spline interpolation method improves the prediction effect after processing the abnormal passenger traffic data, and the improved particle swarm algorithm shows better optimization ability and convergence speed when solving for the optimal redifference metric. Compared with the BP neural network, the Holt exponential smoothing method, the average weighted combination, and the conventional redifference approach, the IPSO-redifference acceleration method achieves superior prediction performance, and the mean absolute values of the forecast error are reduced by 3.320%, 1.518%, 2.419%, and 0.602%, respectively.

1. Introduction

As an important basis for preparing macro development strategies and passenger transportation plans, railroad passenger volume forecasting is of great significance for formulating railroad network planning, designing train operation plans, optimizing the passenger transportation product structure, and improving the level of passenger service. Existing research on railroad passenger traffic forecasting falls broadly into two categories. The first improves a single forecasting model by introducing methodologies that address the model's own shortcomings in order to raise forecasting accuracy. Dempster and Bui et al. studied neural network models and showed that their predictions are more accurate than those of other models [1, 2]. Aihara et al. combined artificial neural networks with chaos theory to develop a freight volume forecasting model [3]. Li et al. predicted railway passenger transport volume with a Grey–Markov chain model [4]. Qiu et al. forecasted China's railway freight volume with a combined PSO-LSTM model [5].

The second category combines the characteristics of several models into a hybrid prediction model. Wang combined the Gray model and the neural network into a Gray neural network model and successfully predicted highway passenger flow [6]. Ye et al. combined the Markov model with the Gray GM (1, 1) model and established a Gray–Markov forecasting model to forecast freight volume [7]. Ge et al. studied an ARIMA model and combined it with FSVR to propose a hybrid method for high-speed railway passenger traffic forecasting [8].

When social emergencies such as epidemics and major social events occur, scholars choose models with a strong adaptability to passenger flow fluctuations to make relevant predictions. Jiao proposed an improved STL-LSTM model to improve the accuracy of bus passenger flow prediction during COVID-19 [9]. Wang et al. showed that the SARIMA-NAR combined model can be used during the epidemic [10].

Based on the current research findings, when unexpected social events cause sudden fluctuations in the training data, scholars select models that adapt well to passenger flow fluctuations. However, the impact of sudden increases or decreases in passenger volume caused by epidemics, major social activities, and other unexpected events on the trend and pattern of passenger volume has not been effectively addressed, and such anomalies disrupt prediction accuracy [11]. Sudden events such as the 2003 SARS epidemic in Beijing, the 2020 COVID-19 pandemic, and major holidays all cause abrupt fluctuations in passenger volume [12]. When such data are included in volume prediction, these fluctuations may distort the model's training parameters and thus produce higher prediction errors [13]. In addition, most existing forecasting methods either improve a forecasting model according to the characteristics of the research object or combine multiple forecasting models. The accuracy of a weighted combination of multiple models depends heavily on the accuracy of each selected model and therefore carries a certain degree of uncertainty: when one model predicts well but another predicts poorly, the combined prediction can be worse than the better single model [14].

Spline interpolation is a segmented low-order polynomial interpolation method in which the interpolation conditions and the smoothness of the curve are achieved by adjusting the coefficients of the interpolation basis functions on each interval; it has high accuracy and stability, and the fitted curve is smooth and not prone to oscillation [15]. Spline interpolation can therefore capture the smooth variation of railroad passenger volume over a time series [16]. To address the problem that irregular changes in railroad passenger volume under social emergencies affect prediction accuracy, this paper introduces spline interpolation to preprocess and correct the abnormal passenger volume data caused by major events such as the COVID-19 epidemic, thereby removing the interference of abnormal data with model prediction accuracy, and applies the Holt exponential smoothing method and the BP neural network for railroad passenger volume forecasting [17]. To address the shortcomings of existing combined forecasting methods, the redifference acceleration rule is proposed to combine and correct the forecast results, and the redifference metric is optimized by an improved particle swarm algorithm. Finally, combining the spline interpolation method and the redifference acceleration rule with Holt exponential smoothing and BP neural network forecasting yields a railroad passenger traffic forecasting method that adapts to irregular fluctuations of passenger traffic under unexpected events and improves the accuracy of combined forecasting.

2. Cubic Spline Interpolation

Cubic spline interpolation (spline interpolation for short) is an important method in numerical analysis for function estimation and numerical fitting; it can capture the variation pattern of a data series and estimate the function values at interpolation nodes lying in between [18]. The basic idea of applying spline interpolation to the problem of abnormal railroad passenger volume data caused by holidays or major events is as follows. Assume that the variation law of the railroad passenger volume series of the study area over past years can be described by a function $S(x)$ on an interval $[a, b]$ and, after excluding the abnormal data caused by holidays or major events, denote the remaining interpolation nodes (i.e., time nodes) as $a = x_0 < x_1 < \cdots < x_n = b$. If $S(x)$ is a cubic polynomial on every subinterval $[x_i, x_{i+1}]$ of $[a, b]$, then $S(x)$ is called a cubic spline function with nodes $x_0, x_1, \ldots, x_n$. If a function value $y_i = f(x_i)$ is given at each node $x_i$, then the interpolation condition $S(x_i) = y_i$ ($i = 0, 1, \ldots, n$) must be satisfied [19].

A function $S(x)$ that satisfies these conditions is called the cubic spline interpolation function.

On each subinterval $[x_i, x_{i+1}]$ ($i = 0, 1, \ldots, n-1$), the cubic spline interpolation function can be expressed as

$$S_i(x) = a_i + b_i (x - x_i) + c_i (x - x_i)^2 + d_i (x - x_i)^3,$$

where $a_i$, $b_i$, $c_i$, and $d_i$ are the parameters to be estimated.

To determine $S(x)$, the four parameters $a_i$, $b_i$, $c_i$, and $d_i$ must be estimated on each subinterval $[x_i, x_{i+1}]$. Since $S(x)$ has a continuous second derivative on $[a, b]$, the continuity conditions

$$S(x_i^-) = S(x_i^+), \quad S'(x_i^-) = S'(x_i^+), \quad S''(x_i^-) = S''(x_i^+), \quad i = 1, 2, \ldots, n-1,$$

are satisfied at the interior nodes.

The natural boundary condition of $S(x)$ at the end nodes is

$$S''(x_0) = S''(x_n) = 0.$$

By combining the interpolation condition, the continuity conditions, and the natural boundary condition, the parameters on every subinterval, and hence $S(x)$ itself, can be solved. The correction value for the abnormal datum caused by a holiday or major event is then obtained by evaluating $S(x)$ at the excluded time node.
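The paper performs this correction with MATLAB's spline function (Section 6.1); the following is a minimal Python sketch of the same idea using SciPy's natural cubic spline, with made-up yearly values and a made-up anomalous year.

```python
# Minimal sketch (not the authors' MATLAB code): correcting one anomalous
# observation in a yearly passenger-volume series with a natural cubic spline.
# The years and volumes below are made-up placeholder numbers.
import numpy as np
from scipy.interpolate import CubicSpline

years = np.arange(2000, 2010)                      # interpolation nodes (time)
volume = np.array([48.0, 50.1, 51.9, 31.5, 55.8,   # 2003 value is anomalous
                   57.6, 60.2, 63.0, 66.1, 69.5])

anomalous_year = 2003
mask = years != anomalous_year                     # exclude the abnormal node

# 'natural' imposes S''(x0) = S''(xn) = 0, the boundary condition used above.
spline = CubicSpline(years[mask], volume[mask], bc_type="natural")

corrected = float(spline(anomalous_year))          # replacement value for 2003
volume_corrected = np.where(mask, volume, corrected)
print(f"corrected 2003 volume: {corrected:.2f}")
```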

3. IPSO-Redifference Acceleration Law

3.1. The Redifference Acceleration Law

The redifference acceleration law is an approximation acceleration technique proposed by Liu Hui, a mathematician of the Wei-Jin period of China, for obtaining values of higher precision. Its basic principle is as follows: in a monotonically bounded approximating sequence $a_1, a_2, \ldots, a_n, \ldots$ that gradually approaches a limiting value $L$, suppose every term satisfies the redifference relation, i.e., the deviation ratio of the sequence $(L - a_n)/(L - a_{n+1}) = \lambda$ is a constant. The sequence can then be accelerated toward the limiting value according to

$$a_{n+1}^{*} = a_{n+1} + \frac{a_{n+1} - a_n}{\lambda - 1},$$

where $a_{n+1}^{*}$ is the improved value of $a_{n+1}$ and $\lambda$ is the redifference metric ($\lambda > 1$).

The redifference metric can be estimated from three consecutive terms of the sequence as

$$\lambda = \frac{a_{n+1} - a_n}{a_{n+2} - a_{n+1}}. \qquad (6)$$

According to the redifference acceleration law, $a_{n+1}^{*}$ is the sequence approximation of higher accuracy, while $a_{n+1}$ and $a_n$ have the next highest accuracy. Applying this idea to the correction of railroad passenger volume forecasts yields the following redifference acceleration correction formula:

$$\hat{y}^{*} = \hat{y}_b + \frac{\hat{y}_b - \hat{y}_a}{\lambda - 1}, \qquad (7)$$

where $\hat{y}^{*}$ is the improved value of the railroad passenger volume forecast, $\hat{y}_b$ is the forecast value of model b, which has the higher forecast accuracy, and $\hat{y}_a$ is the forecast value of model a, which has a lower forecast accuracy than model b.

Equation (7) shows that implementing the acceleration rule requires selecting "one main and one auxiliary" forecasting method for railroad passenger traffic. The "main" method is the one with the better forecasting accuracy, whose result is $\hat{y}_b$; the "auxiliary" method is the one with the poorer accuracy, whose result is $\hat{y}_a$. Optimization by the redifference acceleration rule can make the corrected forecast better than the result of the "main" model alone. This design promotes the more accurate prediction and suppresses the less accurate one, so as to obtain a prediction of higher accuracy.
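As a concrete illustration of equation (7), the minimal Python sketch below applies the correction to a pair of "main" and "auxiliary" forecasts; the function name and the numerical values are illustrative assumptions, not the paper's data.

```python
# Minimal sketch of the redifference acceleration correction in equation (7):
# y_main is the "one main" (more accurate) forecast, y_aux the "one auxiliary"
# (less accurate) forecast, and lam (> 1) is the redifference metric.
def redifference_correction(y_main: float, y_aux: float, lam: float) -> float:
    """Return the improved forecast y* = y_main + (y_main - y_aux) / (lam - 1)."""
    if lam <= 1:
        raise ValueError("the redifference metric must be greater than 1")
    return y_main + (y_main - y_aux) / (lam - 1)

# Illustrative (made-up) values: Holt forecast 146.1, BP forecast 148.7, lam = 1.9
print(redifference_correction(146.1, 148.7, 1.9))   # about 143.2
```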

The redifference metric can be calculated according to (6), but this approach has two drawbacks. First, an additional forecasting model of better or worse accuracy is needed to forecast the railroad passenger traffic, which not only increases the difficulty and complexity of the forecasting work but also cannot guarantee the forecasting accuracy of the selected model. Second, the redifference metric calculated according to (6) is only an estimate and is not the optimal parameter value for combined forecasting. Therefore, this paper proposes an improved particle swarm optimization (IPSO) algorithm to solve for the optimal redifference metric.

3.2. IPSO Optimization of the Redifference Metric
3.2.1. Basic Principle of the Particle Swarm Algorithm (PSO)

The PSO is a swarm-intelligence search algorithm developed by imitating the foraging behavior of a flock of birds. Its basic idea is to find the final foraging location, i.e., the position of the optimal solution, by sharing the search information of every forager in the flock. In practical applications, each particle flies continuously within a prescribed space and keeps adjusting its position according to its own search experience and that of the swarm until the termination condition is satisfied and the optimal solution is found [20].

Suppose the position and velocity of the $i$th particle of a population at the $t$th iteration are $x_i^t$ and $v_i^t$, respectively. The particle then updates its velocity and position by tracking the individual extreme value and the population extreme value so as to approach the optimal solution. The velocity and position updates are calculated as

$$v_i^{t+1} = \omega v_i^t + c_1 \cdot \mathrm{rand} \cdot (p_{\mathrm{best},i} - x_i^t) + c_2 \cdot \mathrm{rand} \cdot (g_{\mathrm{best}} - x_i^t),$$

$$x_i^{t+1} = x_i^t + \eta v_i^{t+1},$$

where $\omega$ is the inertia weight, taken between 0.4 and 0.9; $c_1$ and $c_2$ are the individual learning factor and the group learning factor, respectively; rand is a random number generated between 0 and 1; $p_{\mathrm{best},i}$ is the individual extreme value; $g_{\mathrm{best}}$ is the group extreme value; and $\eta$ is the speed coefficient, generally taken as 1 [21].
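The update step can be written compactly as below; this is a generic sketch of the two equations above, not the paper's MATLAB code, and $c_1 = c_2 = 2$ is used only as the common default mentioned in Section 3.2.2.

```python
# Compact sketch of one basic PSO velocity/position update for a whole swarm.
import numpy as np

def pso_update(x, v, pbest, gbest, w=0.9, c1=2.0, c2=2.0, eta=1.0, rng=None):
    """x, v, pbest: arrays of shape (n_particles, dim); gbest: shape (dim,)."""
    rng = np.random.default_rng() if rng is None else rng
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    x_new = x + eta * v_new
    return x_new, v_new
```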

3.2.2. Improvement of PSO

The basic PSO generally adopts a fixed weight to seek the optimal solution, i.e., $\omega$ is a fixed value in 0.4–0.9, and the learning factors are commonly taken as $c_1 = c_2 = 2$. Depending on the data and simulation environment, these values affect the optimization ability and convergence speed of the particle swarm algorithm. Therefore, this paper introduces nonlinear variation of the weight and learning factors of PSO to improve the optimization ability of the algorithm.

The principle of weight optimization is as follows: in the early iterations, a larger weight is set so that the particles have a larger search speed and a better global search capability; as the number of iterations increases, the weight is reduced to slow the particle search speed and give the particles a better local search capability. The weight is updated as

$$\omega = \omega_{\max} - (\omega_{\max} - \omega_{\min}) \left(\frac{\mathrm{iter}}{\mathrm{ger}}\right)^{2},$$

where $\omega_{\max}$ is the maximum weight, $\omega_{\min}$ is the minimum weight, ger is the maximum number of iterations, and iter is the current iteration number, $\mathrm{iter} = 1, 2, \ldots, \mathrm{ger}$.

The learning factor optimization principle is that the optimal value of $c_1$ ranges from 2.5 down to 0.5 and the optimal value of $c_2$ ranges from 0.5 up to 2.5. Within these ranges, to ensure global search ability at the beginning of the iterations, $c_1$ decreases nonlinearly as the number of iterations increases, so that the individual learning ability of each particle is larger at the start. At the same time, $c_2$ increases gradually with the number of iterations to strengthen group learning in the later iterations and keep the algorithm from falling into a local optimum late in the search. Based on this idea, nonlinear variation functions constructed from power functions are selected to improve the learning factors:

$$c_1 = 2.5 - 2\left(\frac{\mathrm{iter}}{\mathrm{ger}}\right)^{2}, \qquad c_2 = 0.5 + 2\left(\frac{\mathrm{iter}}{\mathrm{ger}}\right)^{2}.$$
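The sketch below implements one plausible set of power-function schedules consistent with the ranges stated above (ω from ω_max to ω_min, c1 from 2.5 to 0.5, c2 from 0.5 to 2.5); the quadratic exponent is an assumption for illustration and may differ from the exact form used in the paper.

```python
# Nonlinear (power-function) schedules for the inertia weight and learning factors.
def ipso_schedules(iter_, ger, w_max=0.9, w_min=0.4):
    s = iter_ / ger                       # normalized iteration progress in (0, 1]
    w = w_max - (w_max - w_min) * s ** 2  # weight decays slowly early, faster late
    c1 = 2.5 - 2.0 * s ** 2               # individual learning factor: 2.5 -> 0.5
    c2 = 0.5 + 2.0 * s ** 2               # group learning factor: 0.5 -> 2.5
    return w, c1, c2
```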

3.2.3. Algorithm Flow for IPSO Optimization of the Redifference Metric

The procedure for optimizing the value of the redifference metric with IPSO is shown in Figure 1.

Step 1. Forecast the railroad passenger volume according to the selected forecasting method.

Step 2. Apply the redifference acceleration rule to the predicted railroad passenger traffic values; at this stage, the improved predicted value is a function expression containing the redifference metric $\lambda$.

Step 3. Initialize the population particle parameters. According to the constraints, such as the range of particle parameters, set the initial parameter values.

Step 4. Calculate the particle fitness. Taking optimal prediction accuracy as the goal, the fitness function for the redifference metric optimization is the mean absolute relative prediction error over all forecast years, to be minimized. It is therefore defined as

$$F(\lambda) = \frac{1}{m}\sum_{t=1}^{m}\left|\frac{\hat{y}_t^{*} - y_t}{y_t}\right|,$$

where $\hat{y}_t^{*}$ is the improved forecast of the redifference acceleration rule in period $t$, $m$ is the total number of forecast years, and $y_t$ is the actual passenger traffic in period $t$ (a code sketch of Steps 2–7 is given after this list).

Step 5. Compare the particle fitness values of different iterations to find the current optimal particle fitness value and its position, so as to update the particle search position and search speed.

Step 6. Determine whether the iteration termination condition of the particle swarm algorithm is satisfied. If satisfied, the algorithm ends; if not, return to step 4.

Step 7. Output the optimal solution.
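Putting Steps 2–7 together, the following self-contained Python sketch searches for the redifference metric with an improved particle swarm loop. The schedule function repeats the assumed power-function forms from Section 3.2.2, and the Holt/BP forecasts, actual volumes, and the search bounds for lambda are illustrative assumptions; only the population size (30), iteration limit (300), and velocity interval ([-0.5, 0.5]) follow the settings reported in Section 6.3.1.

```python
# Sketch of the IPSO search for the redifference metric lambda (Steps 2-7).
import numpy as np

def ipso_schedules(it, ger, w_max=0.9, w_min=0.4):
    s = it / ger
    return w_max - (w_max - w_min) * s**2, 2.5 - 2.0 * s**2, 0.5 + 2.0 * s**2

def fitness(lam, y_main, y_aux, y_true):
    """Mean absolute relative error of the redifference-improved forecasts."""
    y_star = y_main + (y_main - y_aux) / (lam - 1.0)
    return np.mean(np.abs((y_star - y_true) / y_true))

def ipso_optimize_lambda(y_main, y_aux, y_true, n=30, ger=300,
                         bounds=(1.1, 5.0), v_lim=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(*bounds, n)                      # Step 3: initialize particles
    v = rng.uniform(-v_lim, v_lim, n)
    fit = np.array([fitness(xi, y_main, y_aux, y_true) for xi in x])  # Step 4
    pbest, pbest_fit = x.copy(), fit.copy()
    gbest = x[np.argmin(fit)]
    for it in range(1, ger + 1):                     # Steps 5-6: iterate
        w, c1, c2 = ipso_schedules(it, ger)
        r1, r2 = rng.random(n), rng.random(n)
        v = np.clip(w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x),
                    -v_lim, v_lim)
        x = np.clip(x + v, *bounds)
        fit = np.array([fitness(xi, y_main, y_aux, y_true) for xi in x])
        better = fit < pbest_fit
        pbest[better], pbest_fit[better] = x[better], fit[better]
        gbest = pbest[np.argmin(pbest_fit)]
    return gbest, pbest_fit.min()                    # Step 7: optimal lambda

# Illustrative (made-up) 2017-2019 values, in millions of passengers.
y_holt = np.array([143.0, 147.5, 153.0])   # "one main" forecasts
y_bp   = np.array([146.0, 151.0, 158.0])   # "one auxiliary" forecasts
y_true = np.array([139.2, 146.4, 154.5])
lam, err = ipso_optimize_lambda(y_holt, y_bp, y_true)
print(f"optimal lambda = {lam:.4f}, mean |relative error| = {err:.4%}")
```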

4. Forecasting Method Selection

Railroad passenger volume forecasting methods can be divided into linear forecasting and nonlinear forecasting models [22]. Among the linear forecasting models, time series forecasting models are represented by ARIMA and the exponential smoothing method, which have the advantage of not needing to study the influence of independent variables (influencing factors) on forecasting results. Among the nonlinear forecasting models, the BP neural network has a better adaptability to nonlinear demand forecasting problems by virtue of its powerful self-adaptability, self-learning, and fault tolerance. Therefore, in this paper, the classical Holt exponential smoothing method and the BP neural network are chosen as the representatives of linear and nonlinear forecasting methods for railroad passenger traffic forecasting.

4.1. Holt Exponential Smoothing Method

The Holt exponential smoothing method extends simple exponential smoothing by forecasting the original time series from both a smoothed value and a trend value, which strengthens the ability of exponential smoothing to predict data with a trend [23]. Its prediction formulas are

$$S_t = \alpha y_t + (1 - \alpha)(S_{t-1} + T_{t-1}),$$

$$T_t = \beta (S_t - S_{t-1}) + (1 - \beta) T_{t-1},$$

$$\hat{y}_{t+m} = S_t + m T_t,$$

where $\hat{y}_{t+m}$ is the predicted value in period $t + m$, $S_t$ is the smoothed value in period $t$, $y_t$ is the actual value in period $t$, $T_t$ is the trend value in period $t$, $m$ is the number of periods ahead, and $\alpha, \beta \in (0, 1)$ are the exponential smoothing parameters [24].
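A minimal sketch of these recursions is given below; the paper fits the smoothing parameters in SPSS, so the values of alpha and beta, the initialization, and the example series here are assumptions for illustration.

```python
# Minimal sketch of Holt's linear exponential smoothing as written above.
import numpy as np

def holt_forecast(y, alpha=0.6, beta=0.3, horizon=3):
    """Return m-step-ahead forecasts S_t + m*T_t fitted on the series y."""
    s, t = y[0], y[1] - y[0]                 # a common initialization choice
    for value in y[1:]:
        s_prev = s
        s = alpha * value + (1 - alpha) * (s + t)      # smoothed value
        t = beta * (s - s_prev) + (1 - beta) * t       # trend value
    return np.array([s + m * t for m in range(1, horizon + 1)])

# Illustrative yearly series (made-up numbers, millions of passengers):
history = np.array([48.0, 50.1, 52.3, 54.0, 55.8, 57.6, 60.2])
print(holt_forecast(history, horizon=3))
```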

4.2. BP Neural Network

The BP neural network is a multilayer feedforward neural network trained with the error backpropagation algorithm and is widely used [25]. Its topology is shown in Figure 2. Its basic idea is gradient descent, and the learning process consists of forward propagation and backpropagation. In forward propagation, the railroad passenger prediction factors are passed from the input layer to the hidden layer as input information and finally to the output layer, which outputs the corresponding railroad passenger volume and the training error information [26]. If the output prediction error is larger than the training target, the error is backpropagated from the output layer, and the connection weights and thresholds of each layer are repeatedly adjusted until the error is reduced to an acceptable level or the maximum number of learning iterations is reached. At this point, the sample data are input again to obtain the output value with the minimum error [27].

There are many types of activation functions for BP neural network prediction, each with its own advantages and disadvantages. In this paper, the commonly used tansig and purelin functions are chosen as the activation functions of the hidden layer neurons and the output layer neurons, respectively; their expressions are [28]

$$\mathrm{tansig}(x) = \frac{2}{1 + e^{-2x}} - 1, \qquad \mathrm{purelin}(x) = x.$$
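The sketch below defines the two activation functions and a single-hidden-layer forward pass in NumPy; the weights, the number of hidden neurons, and the input values are random placeholders, not the trained network of the case study.

```python
# Sketch of the two activation functions named above and a forward pass.
import numpy as np

def tansig(x):
    """MATLAB's tansig: 2 / (1 + exp(-2x)) - 1, numerically equal to tanh(x)."""
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def purelin(x):
    """Linear (identity) activation used for the output layer."""
    return x

rng = np.random.default_rng(0)
X = rng.random((5, 4))                          # 5 samples, 4 influencing factors
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)   # input -> hidden (3 neurons)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)   # hidden -> output
y_hat = purelin(tansig(X @ W1 + b1) @ W2 + b2)  # forward propagation
print(y_hat.ravel())
```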

5. Railway Passenger Volume Forecast Based on Spline Interpolation and the IPSO-Redifference Acceleration Rule

The combined railroad passenger volume forecasting process, which integrates the spline interpolation method and the redifference acceleration rule with Holt exponential smoothing and BP neural network forecasting, is shown in Figure 3 and proceeds as follows (a compact end-to-end sketch is given after this list):
Step 1: the spline interpolation method is used to replace and correct the abnormal railroad passenger volumes caused by holidays or major events, forming the corrected railroad passenger volume data series.
Step 2: based on the corrected railway passenger volume data series, the BP neural network and the Holt exponential smoothing method are applied to make predictions.
Step 3: the fitness function for IPSO optimization of the redifference metric is constructed from the prediction results, and IPSO is applied to solve for the optimal redifference metric.
Step 4: based on the optimal redifference metric, the redifference acceleration law of equation (7) is applied to improve the prediction results of the BP neural network and Holt exponential smoothing.
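The compact end-to-end sketch below walks through Steps 1–4 on made-up data. To keep it short, the BP neural network is replaced by a linear-trend stand-in and the IPSO search by a coarse grid over lambda; the full IPSO loop is sketched in Section 3.2.3.

```python
# End-to-end sketch of Steps 1-4 on synthetic data (not the paper's data).
import numpy as np
from scipy.interpolate import CubicSpline

years = np.arange(2000, 2020)
volume = 45 + 1.8 * (years - 2000) + np.where(years == 2003, -18.0, 0.0)

# Step 1: replace the anomalous 2003 value with a natural cubic spline estimate.
mask = years != 2003
spline = CubicSpline(years[mask], volume[mask], bc_type="natural")
volume = np.where(mask, volume, spline(years))

train, test = volume[:17], volume[17:]            # 2000-2016 train, 2017-2019 test

# Step 2: "one main" forecast (Holt) and "one auxiliary" forecast (linear trend
# as a short stand-in for the BP neural network used in the paper).
def holt_forecast(y, alpha=0.6, beta=0.3, horizon=3):
    s, t = y[0], y[1] - y[0]
    for value in y[1:]:
        s_prev = s
        s = alpha * value + (1 - alpha) * (s + t)
        t = beta * (s - s_prev) + (1 - beta) * t
    return np.array([s + m * t for m in range(1, horizon + 1)])

y_main = holt_forecast(train)
coef = np.polyfit(np.arange(len(train)), train, 1)
y_aux = np.polyval(coef, np.arange(len(train), len(train) + 3))

# Step 3: choose lambda by minimizing the mean |relative error| on the test
# years (a coarse grid search stands in for the IPSO search of Section 3.2).
grid = np.linspace(1.1, 5.0, 400)
errs = [np.mean(np.abs((y_main + (y_main - y_aux) / (g - 1) - test) / test))
        for g in grid]
lam = grid[int(np.argmin(errs))]

# Step 4: improved combined forecast via the redifference acceleration rule (7).
y_star = y_main + (y_main - y_aux) / (lam - 1)
print("lambda =", round(lam, 3), "improved forecasts =", np.round(y_star, 2))
```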

6. Case Study

In 2003, Beijing was affected by the SARS epidemic, and railroad passenger traffic decreased abnormally [29–33]. Therefore, this study takes Beijing's railroad passenger traffic from 2000 to 2019 as the research object, considering the influence of regional GDP, regional resident population, per capita consumption level, and the number of tourists, and examines the combined prediction effect of the spline interpolation and redifference acceleration method applied to Holt exponential smoothing and BP neural network predictions (Table 1).

6.1. Data Preprocessing Based on Spline Interpolation

In the 2000–2019 Beijing railroad passenger volume series, the 2003 value is excluded, and cubic spline interpolation is performed by calling the spline function in MATLAB; the resulting function is shown in Figure 4. The interpolation gives a revised 2003 railroad passenger volume of 52.27 million. The preprocessing of Beijing's railroad passenger traffic by the spline interpolation method is shown in Figure 5. The corrected series is smoother overall and more consistent with the overall development trend of the passenger volume.

6.2. Single Model Prediction Based on Spline Interpolation

The railroad passenger volume data from 2000 to 2016 are used as training samples, the data from 2017 to 2019 are used as test samples, and the Holt exponential smoothing method and the BP neural network are applied for forecasting, respectively. Throughout the study, the "relative prediction error" is used as the evaluation index of model prediction accuracy and is abbreviated as "prediction error" for convenience.
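For reference, a minimal sketch of this evaluation setup follows; the numbers are illustrative, not the paper's data.

```python
# Sketch of the evaluation described above: a 2000-2016 / 2017-2019 split and
# the per-year "relative prediction error" used throughout the case study.
import numpy as np

def relative_errors(y_pred, y_true):
    """Per-year relative prediction errors, in percent."""
    return 100.0 * (np.asarray(y_pred) - np.asarray(y_true)) / np.asarray(y_true)

# Illustrative numbers (millions of passengers), not the paper's data:
y_true = np.array([139.2, 146.4, 154.5])          # actual 2017-2019 volumes
y_pred = np.array([143.0, 147.5, 153.0])          # some model's forecasts
err = relative_errors(y_pred, y_true)
print(err, "mean |error| =", np.mean(np.abs(err)))
```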

6.2.1. Holt Exponential Smoothing Prediction

The results and prediction errors of Holt exponential smoothing using SPSS are shown in Figures 6 and 7.

It can be seen that Holt exponential smoothing has a good prediction effect. After the spline interpolation method preprocessed the data, the absolute value of the average fitting prediction error for the training samples was 2.053%, and the absolute value of the average prediction error for the test samples was 2.160%.

The substitution correction of abnormal passenger traffic data by the spline interpolation method clearly improves prediction. When forecasting from the original Beijing railroad passenger volume data, the prediction error for 2003 was as high as 16.016%, and the mean absolute prediction errors of the training and test samples were 5.228% and 5.862%, respectively, roughly 2.6 times the errors obtained after the abnormal data were processed by the spline interpolation method.

6.2.2. BP Neural Network Prediction

The prediction process of the BP neural network is implemented in MATLAB, with the main parameters set as follows: the maximum number of training epochs is 1,000, there is one hidden layer, and the number of hidden-layer neurons is set to 1. The prediction performance of the BP neural network is shown in Figures 8 and 9. Figure 8 shows the variation of the prediction error with the number of iterations for different samples: blue is the training set, red is the test set, and green is the validation set generated by the system. The network converges after 14 training epochs; the network error of the validation set is 0.00917, and the network errors of the training and test sets are well below 0.00917, indicating good error accuracy. Figure 9 shows the goodness of fit of the BP neural network: the goodness of fit of the training set, validation set, test set, and all data is 0.99929, 0.97893, 0.99971, and 0.99777, respectively, which shows that the network training effect is very good.
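For readers without MATLAB, the configuration can be approximated as follows; scikit-learn's MLPRegressor is only an illustrative stand-in for the paper's MATLAB network (tanh is mathematically equivalent to tansig, and the regressor's output is linear, like purelin), and the training data here are synthetic.

```python
# Configuration sketch matching the stated settings: one hidden layer,
# 1 hidden neuron, up to 1000 training iterations, 4 influencing factors.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.random((17, 4))                 # 2000-2016 samples, 4 influencing factors
y = 45 + 60 * X @ np.array([0.4, 0.3, 0.2, 0.1]) + rng.normal(0, 0.5, 17)

model = MLPRegressor(hidden_layer_sizes=(1,), activation="tanh",
                     solver="lbfgs", max_iter=1000, random_state=0)
model.fit(X, y)
print("training R^2:", round(model.score(X, y), 4))
```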

The predicted values and the absolute prediction errors of the BP neural network are shown in Figures 10 and 11. Figure 10 shows that the predicted and true values of the BP neural network fit well. Figure 11 further shows that processing the abnormal passenger traffic data with the spline interpolation method improves prediction: the mean absolute prediction errors of the training and test samples are reduced by 1.712% and 0.860%, respectively.

6.3. Combined Prediction Effect Based on the Redifference Acceleration Law
6.3.1. Simulation Parameter Setting

The algorithm experiments in this section are conducted in MATLAB 2019, and the improved particle swarm algorithm is coded to find the optimal redifference metric. The main particle swarm parameters are initialized as follows: the population size is 30, the maximum number of iterations is 300, and the velocity search interval is −0.5 to 0.5. To further verify the effect of the proposed nonlinear variation on the optimization ability and convergence speed of the particle swarm algorithm, the optimization behavior and convergence speed of the improved particle swarm algorithm are compared with those of the basic particle swarm algorithm, whose fixed weight is taken as 0.9 and whose learning factors are taken as $c_1 = c_2 = 2$.

6.3.2. Prediction Results and Analysis

(1) Analysis of the IPSO Algorithm's Advantage. The variation of the fitness values of the different particle swarm algorithms for solving the redifference metric with the number of iterations is shown in Figure 12. The experimental results show that the improved particle swarm algorithm with nonlinearly varying weight and learning factors has better optimization ability and convergence speed: the fitness value of the improved algorithm for the optimal redifference metric is 0.00640188456, converging after the 113th iteration, while the fitness value of the basic algorithm is 0.00640189062, converging after the 234th iteration.

(2) Combined Prediction Effect of the IPSO-Redifference Acceleration Law. The evolution of the IPSO-optimized redifference metric is shown in Figure 13; the optimal redifference metric is 1.9088.

Section 6.2 shows that the prediction accuracy of Holt exponential smoothing is better than that of the BP neural network. The 2017–2019 prediction results of the BP neural network and Holt exponential smoothing, obtained after processing the abnormal passenger traffic data with the spline interpolation method, are improved by the redifference acceleration rule of equation (7). To verify the improvement effect, the results are also compared with those of the average weighted combination approach, i.e.,

$$\hat{y}_{\mathrm{avg}} = w_1 \hat{y}_{\mathrm{BP}} + w_2 \hat{y}_{\mathrm{Holt}},$$

where $\hat{y}_{\mathrm{avg}}$ is the average weighted combination forecast of passenger traffic, $\hat{y}_{\mathrm{BP}}$ is the BP neural network forecast, $\hat{y}_{\mathrm{Holt}}$ is the Holt exponential smoothing forecast, and $w_1$ and $w_2$ are the weights, $w_1 = w_2 = 0.5$.

The predicted values of the railroad passenger traffic in Beijing from 2017 to 2019 are shown in Table 2.

The analysis shows that the improved prediction values of the redifference acceleration rule have a better prediction accuracy, which is reflected in the following.

After the redifference acceleration rule improves the prediction results of the BP neural network and the Holt exponential smoothing method, the mean absolute prediction error from 2017 to 2019 is 0.642%, which is 3.320% and 1.518% lower than that of the BP neural network and the Holt exponential smoothing method, respectively.

Comparing the forecast results of the redifference acceleration rule with those of the average weighted combination, the mean absolute forecast error of the railroad passenger traffic from 2017 to 2019 is reduced by 2.419%. In addition, the mean absolute forecast error of the average weighted combination is 3.061%, higher than the 2.160% of the Holt exponential smoothing method. This confirms that directly weighting the predictions of multiple models carries some uncertainty, i.e., there is no guarantee that the combined prediction outperforms a single prediction model. The IPSO-redifference acceleration method effectively avoids this problem.

To further verify the advantage of the IPSO-redifference acceleration rule over the traditional redifference acceleration rule, the estimated value of the redifference metric is calculated according to (6). This method requires introducing another prediction method whose accuracy is worse than that of the BP neural network and the Holt exponential smoothing method. After several experiments, the ARIMA (3,0,1) model was selected; its prediction results, obtained in the SPSS 25 environment, are shown in Figure 14. The traditional redifference acceleration method improves the prediction values as follows: the mean 2017–2019 predicted values of the BP neural network, the Holt exponential smoothing method, and the ARIMA (3,0,1) model are substituted into (6) to estimate the redifference metric, and the prediction results of the BP neural network and the Holt exponential smoothing method are then improved according to (7).

The predicted values of the ARIMA (3,0,1) model from 2017 to 2019 are 143.64 million, 156.93 million, and 168.56 million. The mean 2017–2019 predicted values of the BP neural network, the Holt exponential smoothing method, and the ARIMA (3,0,1) model are 148.71 million, 146.13 million, and 156.38 million, respectively. Substituting these mean values, ordered from the least to the most accurate model, into (6) gives the redifference metric

$$\lambda = \frac{\bar{y}_{\mathrm{BP}} - \bar{y}_{\mathrm{ARIMA}}}{\bar{y}_{\mathrm{Holt}} - \bar{y}_{\mathrm{BP}}} = \frac{148.71 - 156.38}{146.13 - 148.71} \approx 2.97.$$

According to (7), the improved values obtained by the traditional redifference acceleration rule for the 2017–2019 predictions of the BP neural network and the Holt exponential smoothing method are 138.95 million, 144.56 million, and 150.94 million, with absolute prediction errors of 0.157%, 1.282%, and 2.296%, respectively.

Comparing these calculated results with Table 1 shows that the mean absolute forecast error of the traditional redifference acceleration rule from 2017 to 2019 is 1.245%, while the IPSO-improved forecasts reach 0.642%, which is 1.518% lower than the forecast error of the Holt exponential smoothing "main" forecasting model.

Thus, the IPSO-redifference acceleration method improves the prediction values more accurately than the traditional redifference acceleration method, reducing the mean absolute prediction error by a further 0.602%.

7. Conclusion

To address the problem that holidays and major events produce abnormal railroad passenger volume data, and the uncertainty of weighted combination forecasting, this paper introduces the spline interpolation method to replace and correct abnormal railroad passenger volumes and reduce the interference of abnormal data with forecasting accuracy. In addition, an improved particle swarm algorithm is proposed to optimize the redifference acceleration rule for improving railroad passenger volume forecasts, and the BP neural network and the Holt exponential smoothing method are combined to forecast the railroad passenger volume of Beijing. The results show that the substitution correction of anomalous data by the spline interpolation method improves the prediction effect, and the improved particle swarm algorithm shows better optimization ability and convergence speed when solving for the optimal redifference metric. Compared with the prediction results of the BP neural network, the Holt exponential smoothing method, the average weighted combination method, and the traditional redifference acceleration rule, the predictions improved by the IPSO-redifference acceleration rule are more accurate. It is worth noting that there are other interpolation methods in numerical analysis, such as Newton interpolation and Lagrange interpolation; which interpolation method performs best in handling data anomalies caused by holidays or major events is a focus of future research.

Data Availability

The data used to support the findings of the study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors acknowledge Fan Dingyuan, Yang Fei, Ji Jinghao, and Zhang Zexi for their valuable comments on this research. This study was supported by the Fundamental Research Funds for China Railway Engineering Design and Consulting Group Co., Ltd. (Scientific Research 2019-8).