Abstract

Timely and accurate prediction of bus passenger flow plays a crucial role in uncovering real-time traffic demand and is an essential yet formidable challenge in bus scheduling and management. Deep learning methods are widely applied to transit passenger flow prediction because they capture spatiotemporal features effectively and therefore perform well. However, prevailing deep learning models for transit passenger flow prediction tend to ignore data enhancement. In addition, their predominant focus on a single station makes it difficult to capture the spatiotemporal features of the entire network. A model named TSD-ST is proposed to better predict short-term transit passenger flow at multiple stations. The TSD-ST model leverages time series decomposition for data enhancement. In addition to the adjacency graph, it considers the similarity between all stations of the transit network and applies multigraph convolution and graph fusion modules. This approach enables the TSD-ST model to effectively capture spatiotemporal dependencies. Experiments on real-world bus transit datasets confirm that the TSD-ST model performs better in prediction tasks at 30-min, 60-min, and 90-min time scales, with an average improvement of 21.87%. The effectiveness of each component has been verified through ablation experiments.

1. Introduction

Accurate and timely multistation short-term passenger flow prediction is important for bus transit management and scheduling. Fluctuations in travel demand or the transportation system’s short-term changes can result in undesirable waiting times and congestion for passengers [1], thereby diminishing the appeal of the bus transit system to passengers. Short-term prediction of passenger flow enables bus transit managers to optimize schedules, improve station passenger flow regulation planning, facilitate vehicle operation scheduling, and achieve efficient resource allocation. This approach helps meet the demand for passenger flow while enhancing service quality.

Currently, studies on passenger flow prediction mostly focus on a single station. However, because of the complex spatial structure and time-varying features of transit networks, the bus passenger flow at a single station is simultaneously affected by the spatial and temporal features of the historical passenger flow at stations that are directly or indirectly connected to it across the entire network [2]. Although some methods have achieved good results in predicting passenger flow at specific stations, building a complex model for each station is inappropriate when the goal is to understand the state of the entire network, and it is especially unfavorable for future network adjustment and expansion. Therefore, single-station prediction models cannot dynamically and effectively predict the spatial and temporal distribution and congestion of the entire network, which limits real-time passenger flow organization and the formulation and adjustment of management strategies. Predicting passenger flow at multiple stations is challenging because transit passenger flow is affected by many complex factors [3], such as (1) temporal dependencies: bus passenger flow is affected by temporal features, and current passenger flow is correlated with historical passenger flow [4]; moreover, passenger flow on the same workday or on the same nonworkday shows similar trends; and (2) spatial dependencies: these exist not only between neighboring stations under the influence of network topology [5] but also between stations that are far away from each other but located within similar urban functional areas. In conclusion, there is a demand for a multistation short-term bus transit passenger flow prediction approach that adequately incorporates both temporal and spatial dependencies.

On the other hand, factors such as weather conditions and random events can affect bus passenger flow [6]. It is worth noting that automated fare collection (AFC) data are the main source of bus passenger flow data, and most of these data become available only after passengers have boarded the bus. Consequently, passenger data for a station can be obtained only when a bus arrives there, so bus arrival times introduce significant uncertainty into the data collection process [7]. Thus, it is necessary to develop a data enhancement methodology that accounts for both cyclic changes in passenger flow and deviations from cyclic patterns resulting from temporary factors.

In recent years, deep learning has been widely applied to short-term passenger flow prediction tasks [8]. Typical statistical models, such as autoregressive integrated moving average (ARIMA) and its variants [9, 10], are usually used for single time series prediction, which can easily compute and capture the linear features of the data, but ignore potential dependencies between multiple time series under relatively complex traffic conditions. Deep learning is much more complex in terms of structure, with a strong fitting ability for complex function processing, and is more suitable for solving short-term passenger flow prediction problems.

Recurrent neural networks (RNN) and their variants, such as long short-term memory (LSTM) and gated recurrent units (GRU), perform well in time series prediction and are widely used in mainstream research on traffic prediction [11, 12]. However, RNN-based models only focus on capturing local temporal features and ignore spatial features [13]. In order to consider the spatial dependencies of the network, some researchers have constructed prediction models using a convolutional neural network (CNN) after converting the traffic network into images [5, 14]. However, CNN-based models only consider the absolute distance relationship between stations in a two-dimensional Euclidean space and are not conducive to the expansion of network structures in the real world [15]. Graph convolutional network (GCN) provides a more suitable approach for modeling spatial similarity of transit networks compared to CNN, which maintains the real topology of the network and captures the spatial dependencies between stations [16]. Several researchers have made attempts to enhance the effectiveness of GCN through various approaches [17–21]. However, it is important to note that GCN-based models solely focus on spatial features. While some researchers have attempted to integrate GCN and RNN models to create combined prediction models [19, 22], most of these combined models ignore data enhancement and lack comprehensiveness in capturing temporal dependencies, spatial dependencies, and other influential factors. As a result, these models fail to effectively address the challenge of multistation short-term transit passenger flow prediction.

A novel model, named TSD-ST, is proposed for accurately predicting multistation short-term bus passenger flow. The proposed model uses time series decomposition (TSD) for data enhancement, capturing both cyclic variations in bus passenger flow patterns and the impact of temporary random events. Additionally, a dual-view multigraph convolution and graph fusion module is employed to consider global spatial dependencies. Consequently, this approach integrates temporal dependencies, spatial dependencies, and other influencing factors. The main contributions of this paper are summarized as follows:
(1) A spatiotemporal model (TSD-ST) is proposed to solve the problem of predicting multistation short-term bus passenger flow.
(2) The temporal dependence of bus transit passenger flow is effectively harnessed through a time series decomposition method, which also enhances the quality of the data.
(3) Two types of graphs, namely, an adjacency graph and a similarity graph, are constructed. Through multigraph convolution and graph fusion, a dual-view approach is employed to comprehensively consider the spatial dependencies of passenger flows between different stations.
(4) Experiments were conducted using a real-world bus transit dataset to evaluate the performance of the TSD-ST model in predicting multistation short-term bus passenger flow. The experimental results demonstrate that the TSD-ST model exhibits excellent predictive capability.

The remaining sections of this paper are structured as follows: Section 2 presents a comprehensive review of pertinent research in the domain of passenger flow prediction. In Section 3, a systematic overview of the components comprising the TSD-ST model is provided. To validate the performance of the proposed model, Section 4 conducts experiments using real-world bus transit datasets. Finally, Section 5 summarizes the findings of this study and discusses potential directions for future research.

2.1. Statistical Methods

Initial attempts at passenger flow prediction focused on linear models. Among the classical prediction models, linear regression models and Kalman filter-based methods are the most commonly used in the passenger flow prediction literature. In 2009, a general linear regression prediction of passenger flow was conducted using public transportation smart card data [23], and in 2011, a Kalman filter-based method was developed that uses the transaction records of the automated fare collection (AFC) system and the video surveillance systems installed in buses and stations to predict the short-term passenger flow at each bus station [24].

Due to the spatiotemporal nature of most of the data used for passenger flow prediction, among different linear models, autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models have received attention from researchers. One study proposed an interactive multiple model- (IMM-) based approach combined with time series methods to predict short-term passenger flow on bus lines [25]. There are several studies that address the problem of passenger flow prediction through ARIMA [9, 10, 26].

Statistical methods can easily compute and capture the linear characteristics of the data. However, such methods rely on static assumptions and are strongly affected by fluctuating traffic data [27]. In addition, they have difficulty reflecting the nonlinear and complex features of passenger flow.

2.2. Machine-Learning Methods

Traditional machine-learning methods are better at fitting complex data, and machine-learning techniques have been widely used by scholars in the field of short-term passenger flow prediction. Some scholars have developed a bus transit passenger flow prediction model based on least squares and support vector machines (LS-SVM) for predicting passenger flow on bus transit lines [28]. A Gaussian process-based approach is proposed to model and predict bus passenger flow [29]. Some scholars developed prediction models using random forests and regression trees and used automatic vehicle location (AVL) and automatic passenger counting (APC) data from bus fleets to predict passenger demand [30]. One scholar developed a hybrid model combining wavelet and SVM models [31]. Some scholars have used random forest regression to predict short-term passenger flow in railroad transportation systems [32].

Among different machine-learning models, artificial neural network (ANN) has gained the attention of scholars, and in 2014, some scholars applied two neural network models, Elman and BP, to solve the problem of passenger flow prediction, and the comparison of the prediction results of the two methods found that the Elman method can achieve higher prediction accuracy [33]. In 2019, some scholars used the radial basis function (RBF) network method to solve the problem of short-term passenger flow prediction at stations, and the research results show that the prediction accuracy evaluation indexes are all less than 1.5%, and the model prediction performance is better [34]. In addition, scholars have proposed various network models, such as multitask convolutional neural network (MTUNN) and parallel integrated neural network [35], and some scholars have combined neural networks with other linear and nonlinear methods simultaneously in their research [36, 37].

Machine-learning methods have difficulty achieving good results in complex networks with many nodes [38]. This is because most of them rely on complex manual feature engineering, which leads to insufficient robustness when modeling massive data, and they cannot handle raw spatiotemporal data.

2.3. Deep Learning Methods

The emergence of real-time data collection and information dissemination systems and the increasing complexity of transportation networks have accelerated the popularity of deep learning models. In 2017, some scholars proposed a deep neural network (DNN) approach, applying the SAE-DNN model to passenger flow prediction, and found that the model obtains good prediction results for stations with different passenger flow features [4]. In recent years, several scholars have applied the LSTM model to passenger flow prediction and found that it achieves a higher average prediction accuracy than other neural network models [4, 39–41]. The GRU model has the advantage of fewer parameters and faster training compared to the LSTM model [42].

The emergence of GCN provides a new idea for traffic network modeling. GCN views the whole network as a graph, which preserves the real topology of the network and extracts the spatial features more effectively [16, 18]. In addition, GCN greatly maintains the global nature of the network by convolving the entire structured graph, which is theoretically superior to CNNs that can only capture adjacent spatial patterns due to the limited size of the kernel window.

Many researchers have built on the GCN model and fused it with temporal features to achieve good results in traffic prediction problems. For example, a diffusion convolutional recurrent neural network (DCRNN) [17] has been proposed to model traffic flow as a diffusion process on a directed graph and to capture spatial dependencies using bidirectional random walks on the graph. A new architecture called Graph WaveNet has been developed, which uses an adaptive adjacency matrix to capture hidden spatial dependencies [43]. A novel enhanced dynamic graph convolutional network model (RDGCNI) [44] has been proposed, which generates graph adjacency matrices representing dynamic spatiotemporal dependencies between sites. A parallel-structured deep learning model consisting of a graph convolutional network and a stacked bidirectional and unidirectional long short-term memory network (GCN-SBULSTM) has been proposed [45], which treats the subway network as a structured graph and introduces a K-hop matrix that combines the travel distance, the population flow rate, and the adjacency to capture the dynamic spatial correlation between subway stations. Other proposals include a deep learning architecture combining residual networks (ResNet), GCN, and LSTM (called “ResLSTM”) [46]; a metro passenger flow prediction model based on optimized parameter estimation of a graph convolutional gated recurrent neural network (TMFO-AGGRU) [47]; and the T-GCN model, which combines GCN and GRU [18]. In T-GCN, the graph convolutional network is used to capture the topology of the road network and model the spatial dependencies, and the GRU is used to capture temporal variations of traffic data on the road.

3. Methodology

3.1. Problem Statement

The transit passenger flow prediction problem addressed in this study can be classified as a spatiotemporal prediction problem. As shown in Figure 1, the bus passenger flow data for each time interval form a spatiotemporal graph. First, the bus network structure can be represented as a graph $G = (V, E, W)$, where $V = \{v_1, v_2, \ldots, v_N\}$ denotes the set of nodes of all stations in the bus network, node $v_i$ corresponds to the real-world station $i$, and $N$ denotes the number of stations. It is important to note that the same station belonging to different lines is treated as two distinct stations. $E$ denotes the connecting edges between the bus stations, including the physical connecting edges and virtual connecting edges, and $W$ is the weight of the edges. In addition, the bus passenger flow is affected not only by the spatial structure of the bus network but also by the passenger flow in the historical period. At moment $t$, the passenger flow of each station in the network is $X_t$. To summarize, the forecasting problem can be expressed as follows:

$$\hat{X}_{t+\tau} = f\left(G;\ X_{t-n+1}, \ldots, X_{t};\ \theta\right) \tag{1}$$

where $\hat{X}_{t+\tau}$ is the predicted bus passenger flow at the stations at time $t+\tau$, $X_{t-n+1}, \ldots, X_{t}$ is the historical passenger flow at the stations, $f$ is the deep learning model, $G$ is the spatial graph structure of the bus network, and $\theta$ denotes the learnable parameters.

The general framework of the TSD-ST model is shown in Figure 2. First, in the feature engineering module, the dataset is decomposed into three terms (trend, cycle, and effect) using time series decomposition (TSD), and the trend and cycle terms are nested and then fed to a GRU. Next, two graph structures are constructed, one based on the spatial dependence of neighboring stations and one based on the spatial dependence of functionally similar stations. The GRU outputs are passed through a multilayer GCN on each graph, and the outputs of the two graphs are summed and passed to the temporal attention. Finally, the result is nested with the effect term and fed into the MLP decoder to complete the task of multistation passenger flow prediction.

3.2. Feature Engineering Module

Passenger flow at bus stations has obvious cyclical features [48], and unlike other modes of transportation, bus passenger flow is subject to greater uncertainty and can be affected by multiple contingencies. Therefore, in order to accurately represent the time series information, a time series decomposition (TSD) model is used to extract the temporal features [49, 50].

TSD is a common and flexible concept for mode decomposition, and here it is used to model three time series features, namely, trend, cycle, and effect. By using the backtracking mechanism of RNNs to define the trend vector, the trend term can model near-future changes, e.g., upward, downward, turning, and leveling. By fitting a Fourier series, which consists of a set of trigonometric (sine and cosine) signals, the cycle term can model the multiple periodic patterns in passenger flow changes, including peaks and off-peaks, day and night, and workdays and nonworkdays. By measuring how much the current passenger flow deviates from the “cycle,” the effect term can model planned or incidental events, e.g., holidays, extreme weather, and special events.

As shown in Figure 3, the temporal features are analyzed through trend, cycle, and effect. Equation (2) expresses the relationship between the three features: the predicted passenger flow at the target moment, which lies one prediction time step ahead of the current moment t, is determined by the trend term obtained from the change in the historical passenger flow data, the cycle term that contains the cyclical change, and the effect term that represents the random effect at moment t.

The “trend” feature models recent changes in passenger flow (e.g., upward, downward, turning, and flat). Future trends can be estimated from historical data at neighboring times, so a backtracking mechanism is used to define the trend term, as shown in equation (3), where the prediction interval denotes the size of the step.

The “cycle” feature models periodic variations with a standard Fourier series, as shown in equation (4). Theoretically, a Fourier series can fit a periodic function arbitrarily well as long as there are an infinite number of trigonometric signals; however, in practice, it is too costly to build such a model. Therefore, the number of sines and cosines is critical to the fit. Given the time, the number of trigonometric signals, and the period length, equation (4) can be converted to equation (5). In addition, replacing the sine and cosine terms in equation (5) with new variables yields equation (6), which is linear in the unknown coefficients. Thus, the periodicity fitting can be solved as a multivariate linear regression task, and the periodicity characteristic can be expressed by equation (7).
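The reduction of the Fourier fit to a linear regression can be illustrated with a minimal sketch. It assumes uniformly spaced samples that cover a whole number of periods, in which case the sine and cosine regressors are mutually orthogonal and the least-squares coefficients reduce to simple projections. The function names, the 96-interval daily period (15-minute bins), and the number of harmonics are illustrative assumptions, not the paper's implementation.

```python
import math


def fit_cycle(values, period, n_harmonics):
    """Least-squares Fourier fit for samples spanning whole periods.

    Assumption: samples are uniformly spaced and len(values) is a multiple
    of `period`, so the regression coefficients are plain projections.
    """
    n = len(values)
    if n % period != 0:
        raise ValueError("values must cover a whole number of periods")
    if not 0 < n_harmonics < period / 2:
        raise ValueError("n_harmonics must lie in (0, period / 2)")
    mean = sum(values) / n
    coeffs = []
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k / period
        a_k = 2.0 / n * sum(x * math.cos(w * t) for t, x in enumerate(values))
        b_k = 2.0 / n * sum(x * math.sin(w * t) for t, x in enumerate(values))
        coeffs.append((a_k, b_k))
    return mean, coeffs


def cycle_value(t, mean, coeffs, period):
    """Evaluate the fitted cycle term at time index t."""
    total = mean
    for k, (a_k, b_k) in enumerate(coeffs, start=1):
        w = 2.0 * math.pi * k / period
        total += a_k * math.cos(w * t) + b_k * math.sin(w * t)
    return total
```

For data that do not cover whole periods, the same design matrix of sines and cosines would instead be passed to a general least-squares solver.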

The “effect” feature accounts for the fact that, in addition to trends and cycles, bus transit passenger flow may be affected by scheduled or fortuitous events, such as holidays, extreme weather, and special events. If the passenger flow at the current time is larger than that of the past cycle, the passenger flow at subsequent moments is likely to be larger as well; the effect feature expresses this deviation, as given by equation (8). It should be noted in particular that the historical passenger flow data come from passengers swiping their cards when boarding the bus, so if no bus arrives at the station during a sampling interval, the recorded passenger flow is 0. In this case, a relatively small value is substituted for the zero value when calculating the effect term, and, to prevent the effect term from becoming too large, a parameter is chosen to constrain its upper limit.
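The zero-flow substitution and the upper bound can be sketched as follows. This is only an illustration under a stated assumption: the effect is taken to be the ratio of the current flow to its cyclic reference, which is one plausible reading of the description and is not confirmed to be the paper's equation (8). The function and parameter names are hypothetical; the defaults 0.01 and 50 are the values reported in Section 4.1.

```python
def effect_term(current_flow, cycle_flow, eps=0.01, upper=50.0):
    """Deviation of the observed flow from its cyclic reference.

    Assumption: the effect is modelled as a ratio. Zero (or tiny) flows are
    replaced by `eps` so the ratio stays defined, and the result is capped
    at `upper` so that a near-zero reference cannot produce a huge value.
    """
    numerator = max(current_flow, eps)
    denominator = max(cycle_flow, eps)
    return min(numerator / denominator, upper)
```

With these defaults, an interval with 40 boardings whose cyclic reference is 0 yields the cap of 50 rather than an undefined ratio.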

3.3. Spatiotemporal Module
3.3.1. Temporal Features

(1) GRU-Based Temporal Feature Mining. Because bus transit passenger flow data are large, cyclical, and complex, a gated recurrent unit (GRU) model is established to deal with temporal features. As an alternative to long short-term memory (LSTM), GRU is widely used for temporal modeling and is usually implemented with standard convolution or fully connected layers. GRU has a strong memory capability, which enables selective memory of the extracted time series features of passenger flow and backward propagation through hidden nodes. Compared with LSTM, GRU has fewer parameters and trains faster, which makes it well suited to capturing the intrinsic patterns within time series data. The specific calculation process is shown as follows:

In the spatiotemporal module, the trend and cycle embeddings produced by the feature engineering module are nested to form the input to the GRU, and the weights and biases are learned in the training process. The final output is obtained through the reset and update gates, which realize the selective forgetting and remembering of the passenger flow data and achieve effective mining of the temporal features of passenger flow.
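The gating described above can be illustrated with a single step of a standard GRU cell. This is a minimal sketch of the textbook formulation, not the authors' code: the parameter dictionary layout and its key names are hypothetical, and the convention used here for mixing the old state with the candidate state is one of two common, equivalent variants.

```python
import math


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def _affine(W, U, b, x, h):
    """Compute W x + U h + b for list-of-lists weight matrices."""
    return [
        sum(w * xi for w, xi in zip(W[i], x))
        + sum(u * hi for u, hi in zip(U[i], h))
        + b[i]
        for i in range(len(b))
    ]


def gru_step(x, h, p):
    """One GRU step; `p` maps hypothetical names to weights and biases."""
    # Update gate: how much of the candidate state replaces the old state.
    z = [_sigmoid(v) for v in _affine(p["Wz"], p["Uz"], p["bz"], x, h)]
    # Reset gate: how much of the old state feeds the candidate state.
    r = [_sigmoid(v) for v in _affine(p["Wr"], p["Ur"], p["br"], x, h)]
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(v) for v in _affine(p["Wh"], p["Uh"], p["bh"], x, reset_h)]
    # Convention assumed here: h_t = (1 - z) * h_{t-1} + z * candidate.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
```

In this sketch the input vector `x` would hold the nested trend and cycle values for one time interval, and the step would be applied once per interval of the input window.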

(2) Scaled Dot-Product Attention. Scaled dot-product attention is one part of the attention mechanism proposed in the transformer model [51], which allows the model to focus on the relevant part of the input sequence during self-attention. It is computed from three inputs: query, key, and value. In this study, scaled dot-product attention pays more attention to the historical time points that contribute most to the prediction and thus better captures the temporal relationships. The calculation formula is shown in equation (13). Specifically, the output hidden layer of the MLP graph embedding is processed by three linear functions to obtain $Q$, $K$, and $V$, respectively, as shown in Figure 4.

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V \tag{13}$$

where $Q$, $K$, and $V$ are the matrices of queries, keys, and values, respectively, and $d_k$ is the dimension of the queries and keys.
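As a concrete illustration of scaled dot-product attention as defined in the transformer model [51], the following sketch computes the attention output for small list-of-lists matrices. The function names are illustrative, and the three linear projections that produce the queries, keys, and values are assumed to have been applied already.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Return (output, weights) for query, key, and value matrices."""
    d_k = len(K[0])
    weights = []
    for q in Q:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K
        ]
        weights.append(softmax(scores))
    output = [
        [sum(w * v[j] for w, v in zip(w_row, V)) for j in range(len(V[0]))]
        for w_row in weights
    ]
    return output, weights
```

Each row of `weights` sums to 1, so each output row is a weighted average of the value rows; time steps whose keys align with the query receive the largest weights.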

3.3.2. Spatial Features

To comprehensively encode the relationships between stations, two graphs containing the physical neighborhood of sites and the correlation between stations were generated. Among them, the adjacency graph reflects the physical connectivity between stations, while the similarity graph represents the semantic or functional correlation between stations. To better utilize this graph information, multigraph convolution and graph fusion are used in the neural network model. These parallel graph convolution operators are able to process different types of graph data simultaneously and fuse the obtained results to capture the spatial relationships and features between stations more comprehensively.

(1) Adjacency Graph and Similarity Graph. This subsection describes how the adjacency graph and similarity graph are constructed. By definition, a graph consists of nodes, edges, and the weights of the edges, so the adjacency (physical) graph and the similarity graph are denoted as $G_p = (V, E_p, W_p)$ and $G_s = (V, E_s, W_s)$, respectively. It should be noted that, unlike ordinary road networks, stations with the same geographic coordinates but belonging to different bus routes are treated here as different nodes. The two graphs share the same nodes but have different edges and edge weights. $E_p$ and $E_s$ are the sets of edges of the two graphs, and for a particular graph $G_\alpha$ ($\alpha \in \{p, s\}$), $W_\alpha$ denotes the weights of all edges. Specifically, $W_\alpha(i, j)$ is the weight of the edge from node $j$ to node $i$.

Adjacency graph ($G_p$) is constructed directly from the physical topology of the bus transit system under study. An edge connecting nodes $v_i$ and $v_j$ is formed in $E_p$ if the corresponding stations $i$ and $j$ are connected in the real world. In order to compute the weights of these edges, we first construct a physical connectivity matrix $P$. As shown in Figure 5, if there exists an edge between nodes $v_i$ and $v_j$, $P(i, j) = 1$; otherwise, $P(i, j) = 0$. Finally, we obtain the edge weights $W_p$. Specifically, $W_p(i, j)$ is computed by the following equation:
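One common way to turn a 0/1 connectivity matrix into edge weights is row normalization, in which each node distributes a total weight of 1 across its neighbors. The sketch below illustrates that choice; whether it coincides with the paper's weighting equation is an assumption, as is the treatment of edges as undirected. The function name and the edge-list input format are hypothetical.

```python
def physical_weights(n_nodes, edges):
    """Build a 0/1 connectivity matrix and a row-normalised weight matrix.

    Assumptions: edges are undirected (i, j) index pairs, and weights are
    obtained by dividing each row by its sum (rows without edges stay zero).
    """
    P = [[0.0] * n_nodes for _ in range(n_nodes)]
    for i, j in edges:
        P[i][j] = 1.0
        P[j][i] = 1.0
    W = []
    for row in P:
        degree = sum(row)
        W.append([v / degree if degree else 0.0 for v in row])
    return P, W
```

For a line of three consecutive stops, the middle stop assigns weight 0.5 to each neighbor, while each terminal stop assigns weight 1 to its single neighbor.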

Similarity graph ($G_s$) is constructed based on similarities between stations. Typically, stations located within similar urban functional areas will have similar passenger flow trends. For example, the boarding passenger flow of bus stations in office areas will have a significant evening peak on weekdays, while stations in residential areas will have a significant morning peak on weekdays. To quantify this similarity, this paper uses the Spearman rank correlation coefficient to measure this relationship.

The Spearman correlation coefficient measures the monotonic relationship between two variables and has a value between −1 and 1. The closer the absolute value of the coefficient is to 1, the stronger the correlation between the two variables. Bus transit passenger flow datasets are often discrete and may not satisfy the assumption of a normal distribution. In contrast to the Pearson correlation coefficient, the Spearman correlation coefficient offers a nonparametric approach that does not rely on assumptions regarding the distribution of the data and can also deal with monotonic nonlinear relationships. Therefore, when analyzing such datasets, the Spearman correlation coefficient is deemed more appropriate. For a sample of size n, the historical passenger flow observations of each of the two stations are first assigned ranks according to their position in descending order, with tied values receiving the average of their positions. The correlation coefficient ρ can be expressed as follows:

The Spearman correlation matrix is defined as follows:

On the basis of the Spearman correlation matrix, the construction of the Spearman matrix is flexible. For example, the virtual edges of the similarity graph can be determined by a predetermined similarity threshold, or weights can be assigned based on the similarity values. Figure 5 shows a simple example of the Spearman matrix.
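The threshold-based construction can be sketched as follows. The sketch computes the Spearman coefficient as the Pearson correlation of average ranks, which handles ties, and keeps an edge only when the coefficient reaches the threshold. Using the coefficient itself as the edge weight is an illustrative choice among the options mentioned above, and all function names are hypothetical; the default threshold of 0.6 is the value reported in Section 4.1.

```python
import math


def average_ranks(values):
    """Rank in descending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0  # average of 1-based positions
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant series carries no rank information
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    return pearson(average_ranks(a), average_ranks(b))


def similarity_graph(series, threshold=0.6):
    """series[i] is the flow history of station i; returns a weight matrix."""
    n = len(series)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                rho = spearman(series[i], series[j])
                S[i][j] = rho if rho >= threshold else 0.0
    return S
```

Two stations whose flows rise and fall together receive a virtual edge even if they are far apart in the physical network, which is the kind of relationship the similarity graph is meant to capture.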

(2) GCN for Multigraph Convolution and Graph Fusion. Conventional approaches are unable to process graph-structured data efficiently, and it is difficult to perform deep mining from graph-structured data for prediction tasks. A widely used graph neural network, graph convolutional network (GCN), has emerged in recent years, which is capable of extracting useful feature information by performing graph convolution operations on graph-structured data. A major dependency of GCN is the adjacency matrix information; however, a single adjacency matrix may not be able to adequately account for the bus stops’ complex relationships. In order to consider these relationships more comprehensively, TSD-ST extends GCN using multigraph convolution and graph fusion.

First, two relationship graphs were constructed to represent the physical adjacencies of bus stops and the correlation relationships between stops, respectively. Together, these two graphs describe the multiple relationships between stops. Next, on each relational graph, multigraph convolutional operations and fusion are performed separately to capture richer feature information. Finally, the two graph relation structures are fused together to obtain multiple spatial relations in the transit network. Through multigraph convolution and graph fusion, TSD-ST is able to better utilize the graph structure data to fully exploit the spatial information between bus stops. The process of multigraph convolution and graph fusion is shown in Figure 2. In the equations, $\odot$ denotes the Hadamard product. Each of the two graph convolutions has its own set of parameters, the convolution is stacked for a fixed number of layers, the aggregation for each node runs over the set of indices of its neighbor nodes, and a custom parameter is set by hand; the output of each graph convolution layer serves as the input of the next. The specific process is as follows:
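The overall flow of multigraph convolution followed by fusion can be sketched as below. This is an illustrative simplification rather than the paper's exact equations: each layer combines a node's own features with a weighted sum over its neighbors using element-wise (Hadamard) parameters, the ReLU activation is an assumption, fusion is taken to be the sum of the per-graph outputs as described for the overall framework, and the custom parameter mentioned in the text is not modeled. All names are hypothetical.

```python
def graph_conv_layer(H, W, theta_self, theta_nb):
    """One graph convolution layer with Hadamard (element-wise) parameters.

    H: node features (n x d), W: edge weights (n x n),
    theta_self / theta_nb: length-d parameter vectors.
    """
    n, d = len(H), len(H[0])
    out = []
    for i in range(n):
        row = [theta_self[c] * H[i][c] for c in range(d)]
        for j in range(n):
            if W[i][j] != 0.0:  # j is a neighbour of i
                for c in range(d):
                    row[c] += W[i][j] * theta_nb[c] * H[j][c]
        out.append([max(0.0, v) for v in row])  # ReLU (assumed activation)
    return out


def multigraph_fusion(H, graphs, params, n_layers=2):
    """Run stacked convolutions on each graph and sum the results.

    graphs: list of weight matrices (e.g. adjacency and similarity);
    params[g][l] = (theta_self, theta_nb) for graph g and layer l.
    """
    fused = [[0.0] * len(H[0]) for _ in H]
    for W, layer_params in zip(graphs, params):
        Hg = H
        for layer in range(n_layers):
            theta_self, theta_nb = layer_params[layer]
            Hg = graph_conv_layer(Hg, W, theta_self, theta_nb)
        fused = [[f + h for f, h in zip(fr, hr)] for fr, hr in zip(fused, Hg)]
    return fused
```

Because each graph keeps its own parameters, the adjacency view and the similarity view can emphasize different feature channels before their outputs are combined.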

Both MLP embedding and MLP decoder use multilayer perceptron (MLP), which is capable of learning complex nonlinear relationships, and the specific process is as follows:

4. Experiments and Analyses

4.1. Dataset and Settings

The TSD-ST model was validated using the April 2018 Beijing bus transit dataset, obtained in real time through the automated fare collection (AFC) system. This dataset includes various information, such as bus route numbers, passenger boarding times, boarding stop numbers, and alighting stop numbers. Prior to analysis, the dataset underwent preprocessing to eliminate erroneous data, including instances where passengers boarded and alighted at the same stop, empty data records, and cases where alighting occurred before boarding. This preprocessing step was crucial to ensure data validity. Furthermore, the dynamic information, such as bus line numbers, boarding station numbers, and alighting station numbers, was matched with static line information, encompassing details such as line names and boarding and alighting station names, and the corresponding latitude and longitude coordinates. By combining these datasets, comprehensive bus passenger travel data are obtained. Finally, different running directions of buses on the same line are distinguished according to the order of boarding and alighting stations, and these traffic data are summarized into raw data every 15 minutes. The time series decomposition (TSD) method is utilized to decompose the data in advance. The training set encompasses data from the initial 24 days, while the test set comprises data from the final 6 days. The ratio of the training set to the test set is maintained at 8 : 2.
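The 15-minute aggregation and the chronological split described above can be sketched as follows. The record layout (a station identifier paired with a boarding timestamp) and the function names are hypothetical, since the raw AFC schema is not given; the bin width is assumed to divide 60 evenly.

```python
from collections import Counter
from datetime import datetime


def aggregate_boardings(records, bin_minutes=15):
    """Count boardings per (station, interval start).

    records: iterable of (station_id, datetime) pairs (hypothetical layout).
    Assumes bin_minutes divides 60, so bins never straddle an hour.
    """
    counts = Counter()
    for station, ts in records:
        floored = ts.replace(
            minute=ts.minute - ts.minute % bin_minutes, second=0, microsecond=0
        )
        counts[(station, floored)] += 1
    return counts


def chronological_split(days, n_train_days=24):
    """Split a collection of dates into earlier training and later test days."""
    ordered = sorted(days)
    return ordered[:n_train_days], ordered[n_train_days:]
```

Splitting by date rather than at random keeps every test interval strictly later than the training data, which matches the 24-day and 6-day partition of the 30-day month.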

Six different types of lines were selected for the experiment, as shown in Figure 6, and the basic information of these six experimental lines is described in Table 1. For these lines, the passenger flow at each station over three consecutive days and the Spearman rank correlation coefficients between the stations were statistically analyzed, as shown in Figure 7. It is evident that bus transit passenger flow exhibits pronounced temporal and spatial dependencies.

To evaluate the TSD-ST approach, all the experiments were conducted on a Windows workstation equipped with an NVIDIA Quadro RTX 4000 GPU, an Intel(R) Core(TM) i9-10900K CPU, and 64 GB of RAM. Based on several experiments, the parameters are set as follows: the number of epochs is 2000, the learning rate is 0.003, the model is optimized using the Adam optimizer, and the loss function is the mean squared error (MSE). To better train and measure the model, this paper chooses the root mean squared error (RMSE) and the mean absolute error (MAE) as the evaluation indexes. When calculating the effect term, the small substitute value is set to 0.01 and the upper limit is set to 50. The Spearman matrix was constructed with a threshold of 0.6, and the multigraph convolution and graph fusion process used a 2-layer graph convolution with the custom parameter set to 0.5.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

where $y_i$ and $\hat{y}_i$ are the true value and predicted value, and n is the number of all predicted values.
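For reference, the two evaluation indexes can be computed as in the following minimal sketch, which uses the standard definitions of RMSE and MAE; the function names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error over paired true and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error over paired true and predicted values."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
```

Because RMSE squares each error before averaging, it penalizes occasional large misses more heavily than MAE, so reporting both gives a fuller picture of prediction quality.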

4.2. Experimental Performance

Prediction tasks with different time scales, including 30-min prediction, 60-min prediction, and 90-min prediction, were conducted to evaluate the performance of the TSD-ST model. Some commonly used and latest prediction models were selected for comparison:

(1) Historical Average (HA): a method of averaging historical observations as a prediction for the future

(2) Least Absolute Shrinkage and Selection Operator (LASSO): a linear regression method for feature selection and sparse modeling

(3) Fully Connected Neural Network (FCNN): a traditional artificial neural network (ANN) architecture

(4) Graph Convolutional Network (GCN) [52]: a deep learning model for processing graph data

(5) Long Short-Term Memory (LSTM) [53]: a variant of recurrent neural networks (RNN), and LSTM is widely adopted in the literature for time series prediction

(6) Gated Recurrent Unit (GRU) [54]: a special recurrent neural network (RNN) model

(7) Graph Gated Recurrent Unit (GGRU) [47]: the linear operation in the gated recurrent neural network is replaced by the graph convolution operation, and GGRU uses attention mechanisms in graph convolution

(8) Temporal Graph Convolutional Network (T-GCN) [18]: a spatiotemporal graph convolution model for traffic prediction
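To make the simplest baseline concrete, the sketch below shows one common form of the Historical Average method, in which the forecast for each time slot is the mean of the observations at the same slot on previous days. The text does not specify which averaging scheme was used for HA, so the per-slot averaging, the function name, and the mapping of a 30-min horizon to two 15-minute steps are all assumptions.

```python
def historical_average(history, slots_per_day, horizon):
    """Forecast `horizon` future steps for one station.

    history: counts ordered in time and covering whole days, so the first
             forecast step corresponds to slot 0 of the next day.
    """
    if not history or len(history) % slots_per_day != 0:
        raise ValueError("history must cover a whole number of days")
    days = len(history) // slots_per_day
    # Mean of each time-of-day slot across all available days.
    slot_means = [
        sum(history[d * slots_per_day + s] for d in range(days)) / days
        for s in range(slots_per_day)
    ]
    return [slot_means[h % slots_per_day] for h in range(horizon)]


# Toy series: two "days" of four slots each (invented values).
history = [0, 4, 8, 2,
           2, 6, 10, 4]
forecast = historical_average(history, slots_per_day=4, horizon=2)
print(forecast)  # [1.0, 5.0]
```

A baseline of this kind reproduces the regular daily pattern but cannot react to day-specific fluctuations, which is consistent with the weak HA results discussed for Table 2.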

In the experiments, the simplest structure of the proposed model was established first, and the structures of the comparative models were then determined accordingly, ensuring that each model had the same number of learnable parameters and the same input information. In addition, for the model-agnostic parameters, such as the learning rate, number of epochs, and optimizer, we used the most common settings and applied the same configuration to every test for fairness. The experimental results are shown in Table 2.

The experimental results show that traditional time series prediction methods such as HA are not suitable for nonlinear data with complex features such as bus passenger flow. Deep learning models such as GRU and GCN can learn the trend in the temporal or spatial dimension to a certain extent, but because bus passenger flow is highly stochastic, learning from only a single temporal or spatial dimension does not capture the features effectively. GGRU and T-GCN, as combined models, integrate temporal and spatial features, but they did not achieve satisfactory results in the experiments. The TSD-ST model outperformed all other comparison models across all prediction tasks, with an average improvement of 18.08% in RMSE and 25.65% in MAE over the three time scales. Specifically, the RMSE for the 30-min, 60-min, and 90-min predictions was improved by 14.82%, 18.42%, and 20.73%, while the MAE was improved by 21.94%, 26.39%, and 28.00%. In comparison with the T-GCN model, the RMSE was improved by 15.02% and the MAE by 22.56% on average over the three time scales. Specifically, the RMSE for the 30-min, 60-min, and 90-min predictions was improved by 13.20%, 15.38%, and 16.43%, and the MAE was improved by 19.95%, 23.55%, and 23.80%.

As shown in Figure 8, the TSD-ST model handles the unavoidable zero values in the bus transit dataset effectively, responds quickly to changes in transit passenger flow, and accurately captures the peak periods. Conversely, as the time scale of the prediction task expands, the performance of the T-GCN model diminishes notably, particularly when forecasting peak passenger flows. In contrast, the TSD-ST model consistently maintains its superior predictive capability.

In summary, the real-world prediction tasks demonstrate that TSD-ST is able to learn multiple spatial feature relationships between the nodes of a bus transit network and to capture the fluctuating trend of passenger flow in the time dimension, which makes it more applicable to bus transit datasets. The model performs well on the task of multistop passenger flow prediction.

4.3. Ablation Experiments

To evaluate the effects of different components/modules in the TSD-ST, a series of ablation experiments were conducted to observe the change in overall performance by gradually removing certain design components/modules.

The results of these ablation experiments are presented in Table 3. Removing any single component affects the overall performance. In the 30-min prediction task, when the adjacency graph, similarity graph, cycle, effect, TSD, and AT & MLP decoder components were removed in turn, the RMSE increased from 10.13 to 10.62, 10.90, 10.85, 10.16, 11.28, and 10.23, and the MAE increased from 5.90 to 6.34, 6.53, 6.55, 6.04, 6.90, and 6.14, respectively. In particular, removing the similarity graph or the time series decomposition (TSD) component had a greater impact on the results. This observation further supports the conclusion of the previous analysis that bus passenger flow data have obvious temporal and spatial dependencies: the similarity graph captures the similarity between different stations to better utilize the spatial information, while the TSD module models the temporal evolution trend of the bus passenger flow to make full use of the temporal dependencies.
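The size of each degradation can be made explicit by expressing it relative to the full model. The sketch below uses only the 30-min RMSE and MAE values quoted above; the relative-increase formula is the standard one and the variable names are illustrative.

```python
# 30-min results of the full model, as quoted in the text.
FULL_RMSE, FULL_MAE = 10.13, 5.90

# (RMSE, MAE) after removing each component, as quoted in the text.
ABLATED = {
    "adjacency graph": (10.62, 6.34),
    "similarity graph": (10.90, 6.53),
    "cycle": (10.85, 6.55),
    "effect": (10.16, 6.04),
    "TSD": (11.28, 6.90),
    "AT & MLP decoder": (10.23, 6.14),
}


def relative_increase(full, ablated):
    """Percentage increase in error caused by removing a component."""
    return (ablated - full) / full * 100


increases = {
    name: (relative_increase(FULL_RMSE, r), relative_increase(FULL_MAE, m))
    for name, (r, m) in ABLATED.items()
}

# Components ordered by how much their removal increases the RMSE.
ranking = sorted(increases, key=lambda name: increases[name][0], reverse=True)

for name in ranking:
    rmse_inc, mae_inc = increases[name]
    print(f"{name:18s} RMSE +{rmse_inc:5.2f}%  MAE +{mae_inc:5.2f}%")
```

On these figures, removing TSD raises the RMSE by about 11.4% and removing the similarity graph raises it by about 7.6%, the two largest RMSE increases among the six components.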

5. Conclusions and Future Work

Accurate and effective passenger flow prediction can provide data support for bus transit operation planning and assist decision-making. In this study, a new deep learning framework (TSD-ST) is proposed for multistation short-term bus passenger flow prediction. The network effectively incorporates temporal dependencies, spatial dependencies, and internal and external influences on bus passenger flow. Because bus passenger flow shows different trends on weekdays and nonweekdays, time series decomposition (TSD) is used to account for the temporal variation of passenger flow, capturing its fluctuation and trend patterns in different periods as well as the influence of random events. A similarity graph is introduced in which the spatial similarity between stations is measured by the Spearman rank correlation coefficient, and multigraph convolution and graph fusion are performed to compensate for the limitation of traditional spatial learning, which is restricted to topological relations. Local and global features are modeled more accurately using GRU, temporal attention, and MLP.

Experimental results on real-world bus transit datasets show that the TSD-ST model proposed in this paper outperforms the current state-of-the-art models in prediction accuracy. The ablation experiments validate the effectiveness of the TSD module and the similarity graph in improving the overall accuracy, suggesting that considering the global spatial correlation of the network, rather than only the physical adjacency, has greater advantages in the task of bus transit passenger flow prediction. In addition, the relatively low accuracies of GGRU and T-GCN indicate that bus transit passenger flow prediction differs significantly from other traffic prediction tasks and that bus transit passenger flow datasets have unique features. The TSD-ST model maintains the integrity of temporal and spatial dependencies to a large extent and is therefore more applicable to bus transit passenger flow prediction.

Several improvements can be made in future work. One is the expansion and refinement of the “effect” component by incorporating more temporary factors, such as weather conditions, temporary events, and large meetings, to enhance the capability of the prediction model. In addition, the TSD-ST model can be further validated and optimized using bus transit passenger flow data from other cities.

Data Availability

The bus transit data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported financially by the Science and Technology Planning Project of Guangdong Province (grant no. 2023B1212060029).