Abstract

Transit-oriented development (TOD) areas are described as geographic units with multicircle structures. Most studies have analysed the impact of the built environment within station catchment areas on metro passenger flows from a macro perspective and have lacked analysis of circle heterogeneity. The few relevant studies have investigated the impact of the built environment on passenger flow in each circle independently and have neglected the systematic interaction among circles within the TOD area. In this study, the 800 m buffer around each station was equally divided into four circles. Based on the gravity model, the representative built environment features around the metro stations on both sides were extracted using the block attention module (BAM). Subsequently, Shapley Additive exPlanation (SHAP) was used to explore the influence of different built environment variables on passenger flow at each circle of the origin and destination stations. The results indicate the following: (1) the station-to-station passenger flow is significantly affected by the availability of transfers and the distance between the origin and destination stations; (2) the impact of different built environments on ridership varies significantly across circles; and (3) the built environment has a similar impact on average daily passenger flow on both sides. Therefore, this study proposes strategies to optimize the metro passenger flow by developing different land uses in different circles and updating the urban spatial structure.

1. Introduction

The metro system has been prioritized in China to address the traffic congestion and environmental pollution caused by automobile-based transportation. Between 2015 and 2021, the number of cities with metros in China increased from 26 to 49, and the total length of the network exceeded 9000 km. Despite the impact of the antiepidemic policy, the average daily traffic intensity still reached 4800 persons/km. Meanwhile, transit-oriented development (TOD), which integrates the metro system and land use development, has been applied in many cities. Many studies in this area have focused on the relationship between the built environment and metro ridership at the station or station-to-station level [1–3].

Since its inception, the TOD catchment area has been considered a geographical unit with a multicircle structure intended to enhance the use of public transport [4]. The previous literature has highlighted the significant roles of density, diversity, and design friendly to nonmotorized travel in metro ridership [5, 6]. However, one critical question that has surfaced is how the land use layout within the catchment area affects ridership. Exploring the impact of different land uses in different circles on ridership is key to addressing this question. Furthermore, while some studies on the delineation of the TOD catchment area have explored the circle heterogeneity of the built environment’s influence on passenger flows, two gaps remain. On the one hand, these studies generally developed a separate model for each circle to explore the impact of its built environment on passenger flow, thereby ignoring the mutual influence among circles within the TOD area, which acts as an independent geographical unit. On the other hand, due to the black-box nature of deep learning models, most of the studies have had to give up predictive power and revert to traditional statistical or machine learning models, which offer better explanatory capabilities but poorer predictive performance.

Against this background, this study aims to investigate the circle heterogeneity of the impact of different land use types on station-to-station passenger flow, considering the built environment factors of the station catchment area. To achieve this aim, the convolutional block attention module model was selected because it requires less sample data while ensuring prediction accuracy [7]. This type of attention mechanism was adapted to extract the combined characteristics of the origin and destination stations separately, and the average weekday metro passenger flow was modelled within the framework of a gravity model. Shapley Additive exPlanation (SHAP) was used to interpret the model results.

The main contributions of this study include the following aspects. First, this study strengthens the analysis of the impact of built environment factors on ridership at the station-to-station level and offers initial experience in using deep learning models to analyze and interpret travel behavior rather than only predict it accurately. Second, in view of the TOD multicircle structure, the circle heterogeneity of the impact of the built environment on ridership is studied to provide more feasible suggestions for planners. The remainder of this paper is organized as follows. The next section reviews the relevant literature. Section 3 presents an overview of the study area and data collection. The framework of the model is shown in Section 4. Section 5 presents and discusses the results. Finally, Section 6 provides the major conclusions and proposes potential applications of the study.

2. Literature Review

In the last three decades, TOD has become a focus in the field of urban planning and transportation [8, 9]. A better understanding of the impacts of station-area built environment factors on transit ridership can improve transit performance and inform land use planning within station catchments.

In terms of land use, almost all the existing studies have highlighted the influence of population [6, 10–12] and employment densities [11, 13, 14] on ridership. The findings on the relationship between diversity and metro ridership are not consistent. The entropy index, a measure of diversity, positively influences metro ridership [15, 16]. By contrast, Cervero [17] found that the land use mix has no evident impact on metro passenger flow. Several TOD-related studies also focused on land use variables. Many scholars investigated the effects of the difference in land use or the different types of POIs on metro ridership [11, 18–24]. For example, An et al. [25] suggested that the commercial building is the most critical factor for the metro ridership prediction and the effect of residential factors was inconclusive. Li et al. [26] concluded that only common residences effectively improve metro ridership and suggested that the residential and scenic spots have nontrivial effects on ridership.

In terms of transit service, variables associated with ridership, such as road density, intersection density, number of bus stops (or lines), terminal station, and transfer station, were explored. Some studies highlighted the influence of intersection density on ridership [13, 26]. However, a negative effect of intersection density was reported in other studies [25, 27]. Some previous studies also concluded that the number of bus stops around the metro station could influence the metro passenger flow because the bus is one of the primary egress and access modes to metro stations [28–30]. E. Chen et al. [31] concluded that the effects of the built environment around the terminal or adjacent stations are more significant than those around the normal station.

Research on the station-to-station level can be considered an extension of that on the station level. The mentioned explanatory variables were calculated for both the origin and destination. Moreover, the station-to-station level ridership is affected by traffic impedance factors, including transfer times, detours, and route distance [32–36].

Most TOD development guides and related studies described the TOD as an area with a multicircle spatial structure (shown in Figure 1) and emphasized the development of differences according to the different circles of TOD. Several studies have divided the station catchment into multiple buffer bands and modelled the impact of the built environment on passenger flows within each buffer band. Gutiérrez et al. [37] developed distance-decay weighted regression using the mobility survey to forecast the Madrid metro ridership. They found that the effect of built environment factors on ridership changed with different buffers. Similar experiments conducted by Manout et al. [38] in Lyon also yielded the same conclusion. The built environment factors in the buffer zones of 0–300 m, 300–600 m, and 600–900 m were counted by Jun et al. [15], and the geographically weighted regression models were calibrated to explore the impact of land use characteristics in these buffer zones on ridership at the station level. The results showed that only the population and land use mix significantly affect the ridership in the 0–300 m and 300–600 m buffers. Pan et al. [21] conducted a questionnaire survey of 33 sites and 11 neighborhood units in Beijing and collected 300 responses to analyze the influence of shopping facilities within seven circles on the willingness of residents to shop around the nearest metro station. The results from the model revealed that the influence of different shopping facilities on residents’ shopping trip willingness varies with different circles in three aspects: significance, sign, and value.

Regarding the models that examine the relationship with ridership, the direct ridership models (DRMs) are popular owing to the minimal data requirement and easy application. Initially, the relationship between the built environment and metro ridership was assumed to be linear or log-linear [15, 37, 39]. With the development of data mining methods, the traditional linear model has gradually been supplemented by machine learning models. These models have better predictive and explanatory capabilities. For example, tree-based models have been developed in related studies because they are more adept at dealing with nonlinear relationships [28, 34, 40, 41]. However, successful machine learning models require extensive expertise in capturing highly accurate features whenever possible. For metro ridership, it is difficult to achieve high-quality feature engineering manually. Thus, deep learning models are widely used in transportation because of their excellent performance in feature extraction. For example, recurrent neural network (RNN) models, which are suitable for modelling dynamic temporal dependency occurring in time series, were widely used to predict ridership in isolated metro stations or lines [42, 43]. As convolutional kernels of different sizes can extract spatial dependencies of features by automatically learning from the data, convolutional neural networks are often used for ridership prediction at large spatial scales [44].

Although previous studies explored the relationship between built environment factors and ridership at the station or station-to-station level, they have some limitations. The influence of various built environment factors in the station catchment area on passenger flow is not independent; it is the result of cross-influence within the TOD area as a system of independent geographical units. In other words, the spatial structure and land use layout of the catchment area also significantly impact passenger flows. Consequently, the results of studies that focus only on global built environment factors are likely biased. Moreover, although deep learning models perform well in extracting the implicit features of these interactions, they are rarely used to explain passenger flow relationships owing to their black-box nature. Combining effective feature extraction with rational interpretation is extremely important for TOD planning and construction practice.

Overall, this study aims to fill the gaps in the literature by (1) quantifying the circle heterogeneity of the built environment’s nonlinear effect on ridership and (2) using deep learning models to capture potential land use features within the station catchment and explaining the model results.

3. Dataset Description

As shown in Figure 2, the Xi’an metropolitan area was selected for the study. From 2011 to 2019, four metro lines and 57 stations were built and operated in Xi’an, ranking the city eighth among 40 Chinese cities with metro lines. A total of 3192 station-to-station pairs of average daily weekday passenger flow during November 2019 were counted using Automatic Fare Collection (AFC) data. Although the number of metro users is large, the development of the metro system is still confronted by the uneven distribution of passenger flow. In particular, the Zhonglou-Xiaozhai route has the largest average daily passenger flow with 2836 riders, whereas the passenger flow for the Xinjiamiao-Daminggongbei route is one rider. To optimize travel demand management, it is important to study the influencing factors of OD passenger flow.
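The count of 3192 pairs matches the number of ordered origin-destination pairs among 57 stations when a station is not paired with itself. The short check below assumes that this is how the pairs were formed; the text does not state it explicitly, and the variable names are illustrative.

```python
from itertools import permutations

N_STATIONS = 57  # stations in operation, as stated in the text

# Assumption: every ordered (origin, destination) pair with origin != destination
# is one sample. Stations are represented by placeholder integer ids.
od_pairs = list(permutations(range(N_STATIONS), 2))
n_pairs = len(od_pairs)  # equals 57 * 56

print(n_pairs)
```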

It should be noted that this study considered the land use factors and transit service within the 800 m buffers of both origin and destination stations. Moreover, considering the average scale of plots in Xi’an and the sample size of the dataset, the 800 m buffer was divided into four circles: 0–200 m, 200–400 m, 400–600 m, and 600–800 m. The number of bus stops (S) within the four circles between the origin and destination stations was counted as the transit service factor. The areas of the administration buildings (A), residential buildings (R), and commercial buildings (B) within the different circles were calculated to represent the land use factors of the origin and destination sides. The parks, squares, and scenic spots were classified into one category (G) to calculate the land use area. Some other studies also calculated building areas according to industrial and warehousing logistics [23]. However, most of the two types of land use are located outside the study area.
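The division into four circles can be sketched as a simple binning of distance to the station. The example below is a minimal illustration, not the authors' GIS workflow: it assumes each building is reduced to one distance value and one floor area, and that a distance exactly on a boundary (for example 200 m) belongs to the inner circle. The function names are hypothetical.

```python
import numpy as np

# Outer radii of the four circles in metres: 0-200, 200-400, 400-600, 600-800.
OUTER_RADII_M = np.array([200.0, 400.0, 600.0, 800.0])


def circle_index(distance_m):
    """Return 0..3 for the four circles and -1 for points outside the 800 m buffer.

    Assumption: a point exactly on a boundary is assigned to the inner circle.
    """
    d = np.asarray(distance_m, dtype=float)
    idx = np.digitize(d, OUTER_RADII_M, right=True)
    return np.where((d < 0) | (d > OUTER_RADII_M[-1]), -1, idx)


def area_by_circle(distance_m, floor_area_m2):
    """Sum floor area per circle for one land use type around one station."""
    idx = circle_index(distance_m)
    area = np.asarray(floor_area_m2, dtype=float)
    return np.array([area[idx == k].sum() for k in range(len(OUTER_RADII_M))])


# Illustrative buildings: (distance to station in m, floor area in m2).
print(area_by_circle([50, 250, 260, 900], [100, 200, 300, 400]))
```

The building at 900 m falls outside the buffer and is ignored, which mirrors the statement that only features within 800 m were considered.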

To preserve the location information of different circles, a matrix was constructed to represent the five features of the four circles, considering the built environment of the origin or destination stations (shown in Figure 3). In addition, the number of transfer times and interval (the number of stops) between the origin and destination stations was used to determine the impact of travel impedance factors on the metro ridership. Table 1 describes the details and source of the variables.
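One possible layout of this input is sketched below. Figure 3 is not reproduced here, so the orientation (rows as circles, columns as the features S, A, R, B, and G) is an assumption, as are the function names and the dictionary structure of one OD sample.

```python
import numpy as np

FEATURES = ["S", "A", "R", "B", "G"]  # bus stops, administration, residential, commercial, green
CIRCLES = ["0-200", "200-400", "400-600", "600-800"]


def station_matrix(values):
    """Build a (4 circles x 5 features) matrix for one station.

    `values` maps each feature code to its four per-circle values,
    ordered from the innermost to the outermost circle.
    """
    matrix = np.zeros((len(CIRCLES), len(FEATURES)))
    for col, code in enumerate(FEATURES):
        matrix[:, col] = values[code]
    return matrix


def od_sample(origin_values, destination_values, transfers, interval):
    """Bundle one origin-destination sample: two station matrices plus impedance."""
    return {
        "origin": station_matrix(origin_values),
        "destination": station_matrix(destination_values),
        "impedance": np.array([transfers, interval], dtype=float),
    }


# Illustrative values only; they are not taken from the Xi'an dataset.
demo = {code: [1.0, 2.0, 3.0, 4.0] for code in FEATURES}
sample = od_sample(demo, demo, transfers=1, interval=6)
print(sample["origin"].shape, sample["impedance"])
```

Keeping the circles as a separate axis is what allows an attention mechanism to weight circles and features independently.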

4. Method

As shown in equation (1), the original gravity model is expressed in the multiplicative form to examine the relationship with station-to-station ridership:

$$T_{ij} = k\, X_i^{\alpha}\, X_j^{\beta}\, C_{ij}^{\gamma} \qquad (1)$$

where $T_{ij}$ denotes the station-to-station ridership. $X_i$, $X_j$, and $C_{ij}$ represent the independent variable of the station $i$, the station $j$, or the traffic impedance between $i$ and $j$, respectively. $k$, $\alpha$, $\beta$, and $\gamma$ are coefficients that could be estimated via logarithmic transformation.
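The logarithmic transformation mentioned above turns the multiplicative model into a linear regression on log-transformed variables, which ordinary least squares can solve. The sketch below illustrates this on synthetic, noise-free data with one scalar variable per station; the names `k`, `alpha`, `beta`, and `gamma` for the constant and the three exponents are labels chosen here, and all data values are made up.

```python
import numpy as np


def fit_gravity(x_o, x_d, c, t):
    """Estimate k, alpha, beta, gamma in t = k * x_o**alpha * x_d**beta * c**gamma.

    Taking logs gives ln t = ln k + alpha ln x_o + beta ln x_d + gamma ln c,
    which is linear in the unknowns. All inputs must be strictly positive.
    """
    design = np.column_stack([np.ones_like(x_o), np.log(x_o), np.log(x_d), np.log(c)])
    coef, *_ = np.linalg.lstsq(design, np.log(t), rcond=None)
    ln_k, alpha, beta, gamma = coef
    return np.exp(ln_k), alpha, beta, gamma


# Synthetic data generated from known coefficients (illustrative only).
rng = np.random.default_rng(0)
x_o = rng.uniform(1.0, 10.0, size=200)
x_d = rng.uniform(1.0, 10.0, size=200)
c = rng.uniform(1.0, 5.0, size=200)
t = 2.0 * x_o**0.8 * x_d**0.6 * c**-1.2

k, alpha, beta, gamma = fit_gravity(x_o, x_d, c, t)
print(round(k, 3), round(alpha, 3), round(beta, 3), round(gamma, 3))
```

With real flows, zero-valued observations would need separate handling before taking logs, since the logarithm is undefined at zero.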

Different from the original gravity model, the block attention module-gravity (BAM-Gravity) model proposed in this study uses the block attention module to extract built environment features around the stations based on the multiplicative form of the original gravity model. As shown in Figure 4, the overall structure of the BAM-Gravity model can be divided into three stages: (1) block attention module stage, (2) fully connected stage for extracting the built environment representing the metro station, and (3) original gravity model stage for examining the relationship with the station-to-station level ridership.

As shown in Figure 5 and equation (2), along the circle layer dimension, the independent variables of the origin/destination station are forwarded to the built environment dimension neural network to produce the built environment attention weights $M_B(X)$. The built environment attention refined data $X'$ are generated by $M_B(X) \otimes X$. Subsequently, another neural network is connected along the built environment dimension of $X'$ to compute the circle attention weights $M_C(X')$. Finally, the attention refined data are defined as $X''$:

$$X' = M_B(X) \otimes X, \qquad X'' = M_C(X') \otimes X' \qquad (2)$$

where $X''$ denotes the attention refined variables, $X$ is the input dataset of the metro stations, and $\otimes$ denotes element-wise multiplication. The structures of $M_B$ and $M_C$ are shown in Figure 5.
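The two-step refinement can be illustrated with a small NumPy sketch. The exact network structures are given in Figure 5, which is not reproduced here, so the details below are assumptions: each attention network is a two-layer perceptron with a sigmoid output, the input to each network is a mean over the other dimension, and the weights are random rather than trained.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def attention_weights(pooled, w1, w2):
    """Two-layer perceptron (ReLU, then sigmoid) mapping a pooled vector to weights in [0, 1]."""
    return sigmoid(w2 @ np.maximum(w1 @ pooled, 0.0))


def block_attention(x, params):
    """Refine a (circles x features) matrix in two steps.

    Step 1 weights the built environment features; step 2 weights the circles
    of the already refined matrix. Both steps use element-wise multiplication.
    """
    feature_w = attention_weights(x.mean(axis=0), *params["feature"])  # one weight per feature
    x1 = x * feature_w[np.newaxis, :]
    circle_w = attention_weights(x1.mean(axis=1), *params["circle"])   # one weight per circle
    x2 = x1 * circle_w[:, np.newaxis]
    return x2, feature_w, circle_w


# Untrained, randomly initialised weights for illustration only.
rng = np.random.default_rng(0)
n_circles, n_features, hidden = 4, 5, 8
params = {
    "feature": (rng.normal(size=(hidden, n_features)), rng.normal(size=(n_features, hidden))),
    "circle": (rng.normal(size=(hidden, n_circles)), rng.normal(size=(n_circles, hidden))),
}
x = rng.uniform(size=(n_circles, n_features))
refined, feature_w, circle_w = block_attention(x, params)
print(refined.shape, feature_w.shape, circle_w.shape)
```

Because the circle weights are computed from the feature-refined matrix rather than from the raw input, the second step depends on the first, which is how the sequential design lets circles and features interact.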

The independent variables of the origin and destination stations are refined by the attention mechanism and then connected to the two fully connected layers ($F_O$, $F_D$), separately. After another fully connected layer ($F_C$), the two traffic impedance variables become a one-dimensional indicator representing traffic impedance. The overall BAM-Gravity model can be summarized as follows:

$$T_{ij} = k\,\big[F_O\big(A_O(X_i)\big)\big]^{\alpha}\,\big[F_D\big(A_D(X_j)\big)\big]^{\beta}\,\big[F_C(C_{ij})\big]^{\gamma} \qquad (3)$$

where $T_{ij}$ and $C_{ij}$ denote the ridership and the traffic impedance variables from the origin station $i$ to the destination station $j$, respectively. $X_i$ and $X_j$ represent the built environment variables of the stations on both sides. $A_O$ and $A_D$ denote the attention mechanisms for both stations. As shown in Figure 6, $F_O$ and $F_D$ refer to fully connected layers used to reduce the multidimensional built environment variables refined by the attention mechanism to one dimension. Similarly, $F_C$ is used to reduce the two-dimensional traffic impedance variables to one dimension. $k$, $\alpha$, $\beta$, $\gamma$, and other weights within the model are estimated by minimizing the MSE loss function with the Adam optimizer [45].
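The final stage, in which three one-dimensional indicators are combined multiplicatively, can be sketched as a forward pass. This is an illustration under stated assumptions rather than the authors' implementation: the layer sizes follow the 4 x 5 station matrices described earlier, a softplus activation is assumed so that each indicator stays positive before exponentiation, and the parameter names (`k`, `alpha`, `beta`, `gamma`, `w_o`, and so on) are hypothetical. Training with Adam is omitted.

```python
import numpy as np


def softplus(z):
    """Smooth positive activation, assumed here so that powers are well defined."""
    return np.log1p(np.exp(z))


def dense_scalar(v, w, b):
    """Fully connected layer reducing a vector to one positive scalar."""
    return softplus(w @ v + b)


def gravity_head(refined_o, refined_d, impedance, p):
    """Combine origin, destination, and impedance indicators multiplicatively."""
    g_o = dense_scalar(refined_o.ravel(), p["w_o"], p["b_o"])
    g_d = dense_scalar(refined_d.ravel(), p["w_d"], p["b_d"])
    g_c = dense_scalar(impedance, p["w_c"], p["b_c"])
    return p["k"] * g_o ** p["alpha"] * g_d ** p["beta"] * g_c ** p["gamma"]


def mse(pred, target):
    """Mean squared error, the loss named in the text."""
    return float(np.mean((np.asarray(pred) - np.asarray(target)) ** 2))


# Untrained, illustrative parameters and inputs.
rng = np.random.default_rng(0)
p = {
    "w_o": rng.normal(scale=0.1, size=20), "b_o": 0.0,
    "w_d": rng.normal(scale=0.1, size=20), "b_d": 0.0,
    "w_c": rng.normal(scale=0.1, size=2), "b_c": 0.0,
    "k": 1.0, "alpha": 1.0, "beta": 1.0, "gamma": -1.0,
}
refined_o = rng.uniform(size=(4, 5))   # attention-refined origin matrix
refined_d = rng.uniform(size=(4, 5))   # attention-refined destination matrix
impedance = np.array([1.0, 6.0])       # transfers, interval (number of stops)
flow = gravity_head(refined_o, refined_d, impedance, p)
print(flow > 0)
```

In a trainable version, every entry of `p` would be updated by gradient descent on the MSE between predicted and observed flows.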

Shapley Additive exPlanation (SHAP), a method from coalitional game theory, is an inspection technique that, like permutation feature importance, can be used for any model [46]. The SHAP values measure a feature’s importance by calculating the average magnitude of its effect on the prediction under different circumstances. Specifically, it can be defined as

$$I_v = \frac{1}{n}\sum_{m=1}^{n}\left|\phi_v^{(m)}\right| \qquad (4)$$

where $I_v$ represents the importance of the built environment variable $v$ on the station-to-station ridership and $n$ is the number of the samples. $\phi_v^{(m)}$ denotes the effect of the variable $v$ of the sample $m$ on ridership and is defined as follows:

$$\phi_v^{(m)} = \sum_{S \subseteq \{1,\dots,p\}\setminus\{v\}} \frac{|S|!\,(p-|S|-1)!}{p!}\Big[f\big(x^{(m)}_{S\cup\{v\}}\big) - f\big(x^{(m)}_{S}\big)\Big] \qquad (5)$$

where $S$ refers to a variable combination that does not contain the variable $v$ of the sample $m$, $p$ is the number of the variables, and $f$ represents the trained BAM-Gravity model.
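For a handful of variables, Shapley values can be computed exactly by enumerating every combination of the remaining variables. The sketch below does this for a toy linear model standing in for the trained network. Two choices are assumptions made here: variables outside a combination are replaced by a fixed baseline value, whereas SHAP implementations typically average over a background dataset, and the global importance is taken as the mean absolute per-sample value. The function names are hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(model, x, baseline):
    """Exact Shapley values of each variable for one sample.

    Variables outside a combination take their baseline value (an assumption).
    The cost grows as 2**p, so this is only practical for small p.
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    p = len(x)
    phi = np.zeros(p)
    for j in range(p):
        others = [m for m in range(p) if m != j]
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                z = baseline.copy()
                for m in subset:
                    z[m] = x[m]
                without_j = model(z)
                z[j] = x[j]
                phi[j] += weight * (model(z) - without_j)
    return phi


def mean_abs_importance(model, samples, baseline):
    """Average the absolute Shapley values over samples, one score per variable."""
    return np.mean([np.abs(shapley_values(model, s, baseline)) for s in samples], axis=0)


def toy_model(z):
    """Toy linear model used in place of the trained network."""
    return 3.0 * z[0] + 2.0 * z[1] - 1.0 * z[2]


phi = shapley_values(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(phi, phi.sum())
```

For a linear model, each value reduces to the coefficient times the deviation from the baseline, and the values sum to the difference between the prediction and the baseline prediction.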

5. Results and Discussion

To detect overfitting, 25% of the total samples were randomly selected as a test subset, and 5-fold cross validation was used during the training process. Before applying the BAM-Gravity model, all variables were standardized. After hyperparameter tuning, with a learning rate of 0.005 and 1200 training iterations, the R² of the BAM-Gravity model was 0.88.
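The evaluation protocol can be sketched with plain NumPy. The example covers only the data handling and the R² metric, not model training. Computing the standardization statistics from the fitting portion alone is an assumption made here to avoid information leaking from held-out data; the text does not say how the standardization was carried out. The function names are hypothetical.

```python
import numpy as np


def holdout_split(n_samples, test_fraction, rng):
    """Randomly split sample indices into a training part and a test part."""
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_fraction))
    return order[n_test:], order[:n_test]


def k_fold_indices(train_idx, k):
    """Yield (fit, validation) index arrays for k-fold cross validation."""
    folds = np.array_split(train_idx, k)
    for i in range(k):
        fit = np.concatenate([folds[m] for m in range(k) if m != i])
        yield fit, folds[i]


def standardize(fit_data, other_data):
    """Scale both arrays using the mean and standard deviation of `fit_data`."""
    mean = fit_data.mean(axis=0)
    std = fit_data.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant columns
    return (fit_data - mean) / std, (other_data - mean) / std


def r_squared(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)
train_idx, test_idx = holdout_split(3192, 0.25, rng)
fold_sizes = [len(val) for _, val in k_fold_indices(train_idx, 5)]
print(len(train_idx), len(test_idx), fold_sizes)
```

With 3192 samples, a 25% hold-out leaves 2394 samples for the cross-validated training process.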

5.1. Importance of the Independent Variables

The contributions of the independent variables to station-to-station ridership are shown in Table 2. Overall, two findings should be noted. First, the contribution of the commercial/business buildings (374.66) to the passenger flow is the largest, followed by the residential buildings (368.22) and traffic impedance (276.83). This result is consistent with most studies: most metro trips on weekdays are to and from residential and commercial/business buildings that provide primary residence, daily shopping, and job opportunities [6, 14, 18, 20, 24].

Second, the contribution of the built environment variables in the third circle layer to ridership (403.32) is the highest, followed by the second (353.22), fourth (298.43), and the first circles (258.57). To explain this counter-intuitive result, the relationship between built environment variables in each of the circles and ridership is discussed and analyzed in detail in the following subsections.

5.2. Relationship between Ridership and the Residential Buildings

Figure 7 displays the relationship between station-to-station ridership and the residential buildings in different circles of origin and destination sides. Except in the first circle, residential buildings were found to significantly contribute to passenger flow. In particular, as shown in Figures 7(a) and 7(b), the residential buildings in the second circle of the origin and destination stations show a similar trend of exponential contribution to passenger flow. When the residential buildings are below 500,000 m², the ridership is below the average and remains flat, with values in the 0–50 range. With the increase in residential floor area, ridership exceeds the average by a maximum of 150 persons. The residential buildings in the third circle of the origin (Figure 7(c)) and destination (Figure 7(d)) stations promote ridership with logarithmic trends. A gradual and slow increase in ridership is observed when residential buildings are greater than 1,250,000 m². Figures 7(e) and 7(f) depict the increase in ridership with an increasing amount of residential buildings in the fourth circle on both sides. When the residential buildings are less than 1,000,000 m², the ridership is below average.

Almost all the studies claimed that residential buildings or population density promote growth in passenger flow [25, 47, 48]. However, several further findings are worth noting. The residential buildings in the first circle do not significantly affect passenger flow. Residential buildings in the third circle have the most significant impact on ridership (103.39 for O, 73.21 for D), followed by the second (46.41 for O, 52.31 for D) and fourth circles (36.67 for O, 34.05 for D). The area of the first circle is small relative to the other circles, resulting in the development of a smaller amount of residential floor area within its boundaries (a maximum of approximately 500,000 m²). Several studies demonstrated that residential buildings or population densities within an appropriate threshold have a significant impact on passenger flow [34, 41, 47]. The smaller amount of residential buildings in the first circle does not fall within the effective threshold for influencing ridership. Moreover, the relationships between residential buildings in different circles and passenger flow are slightly different, exhibiting exponential, logarithmic, and linear growth in the second, third, and fourth circles, respectively. The combination of distance to the metro station and the area of land available for residential development leads to these results.

5.3. Relationship between Ridership and the Business/Commercial Buildings

As shown in Figure 8, the business/commercial buildings have a significant impact on ridership in the first (32.51 for O, 31.08 for D), third (60.48 for O, 35.62 for D), and fourth circles (96.98 for O, 85.05 for D). In particular, the business/commercial buildings in the first and third circles are positively correlated with ridership, whereas in the fourth circle, ridership is inhibited, regardless of the origin or destination station.

The results of this study differed slightly from those of previous studies [25, 49], and this difference is attributed to a combination of reasons. First, the development of large commercial facilities mainly for leisure and entertainment has a certain agglomeration effect and is mainly concentrated in the first circle, whereas the commercial facilities scattered in the fourth circle are mostly small retail businesses providing daily services within walking distance for residents; this not only reduces the dependence of surrounding residents on the metro but also makes the area less attractive to outside residents. Second, the first circle has higher land prices owing to its transportation advantages, and most of the large commercial or business office facilities within its boundaries are geared towards the middle and upper-income groups. The well-equipped parking facilities and the preference of this group to travel by private car result in a less-than-expected contribution of the commercial buildings in the first circle. As a result, business/commercial buildings have the most significant negative effect on ridership in the fourth circle, while in the first circle, they do not have the expected degree of impact.

5.4. Relationship between Ridership and the Public Green Land Area/Public Administration Buildings

As shown in Figures 9(a)–9(d), there are significant negative correlations between public green space and passenger flow in the second (48.39 for O, 66.48 for D) and third (27.53 for O, 30.51 for D) circles, regardless of whether it is at the origin or destination stations.

Although Du et al. [47] and similar studies [25, 50] concluded that the public green spaces would attract more passengers and the impact has no significant difference between weekdays and weekends, the results of this study are significantly different. Despite the fact that Xi’an is a well-known tourist destination with many places of interest, the corresponding AFC data analyzed for this study show that during low tourist seasons such as November, when temperatures are cooler, fewer people exercise in parks or visit scenic areas. Therefore, the larger public open space compresses the size of other facilities, leading to a reduction in ridership.

The public administration buildings significantly affect ridership in the first circle only (33.64 for O, 31.77 for D). The effective thresholds for public administration buildings on passenger flows at the origin (Figure 9(e)) and destination station (Figure 9(f)) are 20,000–80,000 m² and 40,000–80,000 m², respectively. The public administration buildings include primary and secondary schools, government offices, university campuses, hospitals, stadiums, and exhibition halls. Students and patients are reluctant to walk long distances to their destinations after reaching the metro station owing to safety and health concerns. Government offices only provide a small number of jobs, and therefore, they have little impact on passenger flow. During the working day, student travel on the university campus is largely spread around the university, with few long journeys by metro. At the same time, facilities such as stadiums and exhibition halls are hardly attractive to residents with busy daily work schedules.

5.5. Relationship between Ridership and the Traffic Impedance/Bus Stops

The number of bus stops in the first circle at the origin station has a different impact on ridership from that at the destination station. Figure 10(a) shows that when the number of bus stops in the first circle at the origin station is 3, the maximum number of passengers is reached, exceeding the average by approximately 70 persons. Figure 10(b) shows that the more the bus stops in the first circle on the destination side, the higher the passenger flow. The results are attributed to three reasons. First, the existing metro system does not yet fully meet Xi’an’s travel demand because a significant proportion of trips are made using the metro-bus connection or the bus-metro connection. Second, travelers prefer the metro-bus connection to the bus-metro connection. Eighty random interviews on the preference to metro-bus or bus-metro connections were conducted to explore the underlying reasons for this result. The majority of responses received include better punctuality and speed of the metro than bus and guaranteed subsequent travel arrangements after taking the metro first. Finally, some people would consider abandoning the metro if there was a direct bus to their destination, or if they do not intend to save time. Therefore, the optimal number of bus stops on the origin side can promote an increase in ridership; however, overly developed bus routes can lead to a decrease in ridership.

Other studies claimed that the number of bus stops on both sides positively affects passenger flow [11, 26, 41]. However, Figures 10(c) and 10(d) describe the impact of the number of bus stops in the second circle on ridership as negative, regardless of whether they are on the origin or destination side. The second circle of bus stops competes with the metro, with a greater number of stops leading to a higher chance of abandoning the metro for the bus.

The number of transfers (228.42, Rank 1) is the most important feature that affects OD flows. It corresponds to the findings of the other studies [34, 36]; people are more likely to abandon the metro because of the transfers (Figure 10(e)). Figure 10(f) shows that the relationship between the intervals and passenger flow is a parabolic curve. In particular, when the interval between two stations is in the range of 0–2 or 10–18, the OD passenger flow is lower than average. Furthermore, when it is in the range of 3–9, ridership is above average. It is consistent with the results of Gan et al. [34], as both studies conclude that most metro trips are taken to cover medium distances, and considerably short travel distance cannot reflect the advantages of metro reliability, thereby resulting in more alternative modes of travel options.

6. Conclusion

To investigate the influence of built environment factors in different circles of the origin and destination stations on passenger flow, five types of built environment variables (administration buildings, residential buildings, commercial/business buildings, land area of the parks/squares/scenic spots, and the number of bus stops) in four circles (0–200, 200–400, 400–600, and 600–800 m) of the origin and destination stations and the travel impedance variables (transfer times and intervals) were used for modelling the metro station-to-station ridership. The BAM-Gravity model was employed to detect the relationship with ridership, and SHAP was used to explain the modelling results. The results of this study are expected to provide planners with more actionable guidance.

First, the transfer is the most critical factor affecting the metro station-to-station passenger flow. The result indicates that the coupling between urban spatial structure planning and rail transit network planning should be strengthened. In other words, it suggests that policymakers can improve connective efficiency by directly connecting two important areas or gradually reducing the functional connection between the two regions in the process of urban renewal. Moreover, the result shows that people are more inclined to ride the metro for medium distances. Based on this result, planners should pay more attention to the spatial connection between the two areas at this scale in optimizing the urban structure.

Second, the bus stops within the first circle at the origin and destination exhibit a parabola and a positive correlation with the ridership, respectively. They exhibit negative correlations in the second circle, regardless of the origin or destination station. The results show that the existing metro network in Xi’an is not adequate to cover the city’s travel demands. The results also highlight the role of integrated development of bus and metro systems in regulating the metro passenger flow. When the utility of metro travel needs to be increased, the connection between the metro and the bus systems should be strengthened in the first circle on the destination side and the number of bus stops (at most 3) on the origin side should be appropriately increased. When the metro passenger flow pressure needs to be relieved, the number of bus stops within the second circle should be increased. Furthermore, transportation planners and operators can adjust metro ridership by optimizing the bus schedules of the first and second circles on both sides.

Finally, and most importantly, the results of this study reveal the differences and similarities in the impact of different land use factors within different circles on passenger flow. The findings provide some reference for the optimization of land use around the metro station, whether it is around the origin or destination station. The residential buildings contribute to noticeable improvements in passenger flow, and the improvement level arranged in the sequence is as follows: third, second, and fourth circles. It suggests that planners should focus more on the circle-level heterogeneity of the impact of residential buildings on ridership and rationally develop residential land in each circle. Similarly, planners can adjust the passenger flow by planning commercial or business office buildings in different circles. Although the land use of parks and squares has an inhibitory effect on the passenger flow in the second and third circles, it does not mean that the development of land use such as parks should be prohibited. Planners should make the most of broken spaces by arranging public open spaces along major pedestrian routes and enhancing the landscaping of pedestrian spaces.

Although this study enriches the existing research on TOD, some aspects should be further explored. First, the results show that the built environment has essentially the same impact on average daily ridership at the origin and destination stations, but there is no discussion regarding the impact on passenger flows at different periods of the day. Second, this study does not investigate the competition between travel modes and the metro in terms of traffic impedance. Third, based on the density of the road network in Xi’an, 200 m was used to delineate the station catchment area circles; however, a smaller buffer band would have led to more detailed results. These limitations of this study can be addressed in the future.

Data Availability

The datasets generated and analyzed in the current study are not publicly available due to privacy reasons but are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was partially supported by the Fundamental Research Funds for the Central Universities (300102210657).