Abstract

The staggered work hours (SWH) policy is a practical strategy for managing travel demand, aiming to spread out the temporal distribution of travel volume by adjusting the schedules of travelers’ activities. The influence of the SWH policy on the commuting patterns of passengers using bus transit is not yet clear. We addressed this issue in a many-to-one bus line, treating commuters as Q-learning agents learning to minimize regrets by selecting appropriate bus runs. The learning outcomes reveal a SWH-induced equilibrium, where commuters departing from the same station with the same work start time experience identical minimal commuting costs, regardless of the chosen bus. Subsequently, we investigate the effectiveness of SWH policy by manipulating two key control variables: the division of travel demand between two categories of travelers and the staggered time interval. The results confirm that congestion during peak hours can potentially be mitigated by carefully selecting the above two key parameters. Correspondingly, we provide optimal control boundaries for these two parameters to design an effective SWH policy. Furthermore, we explore the combined impact of physical distancing and SWH policy on traffic flow patterns during an epidemic outbreak. Concurrently, we assess the infection risk through a surrogate index, revealing that the SWH policy has a positive effect in mitigating the risk of contact exposure.

1. Introduction

One feasible solution to alleviate urban congestion is the implementation of travel demand management measures. These measures aim to redistribute travel demand in terms of space, mode of travel, or time by modifying the travel behavior of traffic participants. Usually, travel demand management policies include ridesharing, car sharing, on-demand services, tolls, and flexible work arrangements. The staggered work hours (SWH) policy is one type of flexible work arrangement, and it is the primary focus of this paper. Unlike rigid work schedules, the SWH policy permits employees to maintain the same number of daily working hours but with varying work schedules. One significant advantage of the SWH policy is its ability to disperse travel demand during the peak hours, thereby mitigating peak congestion and reducing commuting times. Comprehending how the SWH policy influences commuters’ travel behavior is essential for developing an effective SWH policy and optimizing the transport system during its implementation.

The optimal distribution of work start times was first examined by Henderson [1]. He carried out a theoretical study by considering traffic congestion and productivity effects. Arnott [2] generalized Henderson’s model by considering firm heterogeneity and analyzed optimal congestion tolls. Rather than relying on the flow congestion model (e.g., Refs. [1, 2]), subsequent studies depict the dynamic congestion pattern during peak periods through the bottleneck model introduced by Vickrey [3]. A considerable number of studies have expanded Vickrey’s model in diverse ways. For instance, scholars have taken into account limitations such as vehicle parking [4], broadened the model to networks with multiple bottlenecks [5], and evaluated the impact of pretrip information on the selection of departure times [6, 7]. For a thorough understanding of the extensions and applications of the bottleneck model, interested readers are directed to the review literature by Small [8] and Li et al. [9]. Then, using the bottleneck model, several researchers have studied the influences of the SWH policy on traffic congestion and urban productivity [10–12]. Recently, Yang et al. [13] explored the influence of the SWH policy on the departure time choice behavior of commuters through experimental studies.

Public transit networks handle substantial passenger loads, particularly during morning and evening peak hours. Therefore, it is essential to understand how commuters choose their departure times when utilizing urban mass transit services. Several studies have tackled this problem. For instance, Huang et al. [14] developed a departure time choice model for a one-to-one transit line and assumed that commuters choose their departure time by trading off the costs of in-vehicle crowding with the costs of schedule delays. In a subsequent study, Tian et al. [15] expanded commuting patterns to a many-to-one transit line and provided equilibrium properties. De Palma et al. [16, 17] discussed the formulation of in-vehicle crowding in public transport and obtained the optimal pricing and the optimal scheduling. Other than in-vehicle crowding, some researchers assume that the primary congestion cost of travelling is the waiting time at oversaturated stations. For example, Yang and Tang [18] depicted a rail transit bottleneck model, where commuters select their departure times by trading off between schedule delay costs and queuing time costs. Then, they proposed a fare-reward scheme to relieve queuing congestion at transit stations. Tang et al. [19] proposed a hybrid fare scheme by considering the heterogeneity in transit commuters’ scheduling flexibility. However, nearly all previous studies have focused on equilibrium departure rates, assuming that commuters share the same work start time. The travel behavior of public transit commuters remains unknown when implementing a SWH policy. Our work aims to fill this gap.

While mathematical equilibrium models are effective in examining equilibrium properties, they encounter analytical challenges when dealing with complex real-world factors, such as user heterogeneity, time-varying demand, and flexible transport services. In addition, these models overlook the crucial aspect that travelers learn and self-adjust their behavior over time. As such, agent-based simulation appears to be one feasible way to deal with user equilibrium. Such a method can depict individual responses to the work start time and the interactions among participants more realistically. Agent-based simulation has been shown to be effective, flexible, and extensible in traffic system modelling [20, 21]. Yang et al. [22] utilized a multiagent-based Q-learning algorithm for evaluating the influence of the SWH policy by simulating travelers’ time and location choices in their activity patterns. Xie et al. [23] simulated commuter departure time choices based on the BM reinforcement learning model in a many-to-one bus transit scenario. In our approach, passengers select their departure time guided by regret theory. This theory posits that an agent’s decision is influenced not only by the associated utility but also by the anticipated disutility (regret) for not making a better decision [24]. The adjustment process is modeled using a multiagent-based Q-learning method, where the regret value serves as the reinforcement learning signal to guide choices.

So far, the SWH policy has been applied to tackle issues like congestion, but it could also bear significance for addressing other societal challenges, such as public health crises during an epidemic outbreak. For instance, in the context of the COVID-19 pandemic, certain measures like lockdowns or travel interventions have been implemented to decrease interactions among travelers on public transport (Thomas et al. [25]). One such intervention is physical distancing, which mandates that the occupancy of vehicles or facilities never exceeds a predetermined threshold (e.g., 50% of the maximum vehicle capacity). Consequently, the initial problem transforms into peak-hour bus commuting with a capped bus capacity. Then, our primary concerns involve assessing the collective impact of capped bus capacity and SWH policy on traffic flow patterns and understanding the properties of the resulting equilibrium state. In addition, we seek to explore the role of the SWH policy in reducing the probability of infection. To our knowledge, these topics have not been previously discussed in the realm of public transit systems. The insights gained can offer valuable recommendations for adjusting public transportation operations and scheduling residents’ work hours during epidemic outbreaks.

In summary, the purpose of this study is to delineate the departure patterns of commuters travelling on a capacity-limited urban bus transit line. Specifically, we aim to understand how the combined effects of SWH policy, line characteristics, and physical distancing influence commuters’ choices of departure times. We make the following contributions:
(1) We derive the SWH-induced equilibrium using a multiagent-based Q-learning algorithm, in which the regret value is considered as the reinforcement learning signal guiding departure time choices.
(2) We evaluate the SWH policy by analyzing the properties of the equilibrium state of traffic flow in terms of the commute travel cost and time-space distribution of departure flows.
(3) We examine the combined effect of physical distancing and SWH policy on traffic flow patterns on public transit during an epidemic outbreak.
(4) We provide optimal values for the staggered time interval and the proportion of the staggered population in designing a SWH policy, both with and without the requirement of physical distancing.

We emphasize that we test the SWH policy only from the perspective of commuters, i.e., based on minimizing their travel costs. The benefit analyses of the other two participants—the bus transit operator and the company—are beyond the scope of our discussion.

The remainder of the paper consists of four sections. Section 2 defines the problem and provides an overview of the methodology. Section 3 introduces an agent-based Q-learning model. We experimentally evaluate the SWH policy in Section 4. Conclusions and future studies are discussed in Section 5.

2. Problem Definition

We consider a bus line connecting a central business district to several residential areas, as depicted in Figure 1. Commuters adjust their departure times in response to assigned work starting times, taking into account in-vehicle crowding costs and schedule delay costs. This scenario aligns with previous studies [15, 23], offering ideal reference lines for model verification.

More precisely, the bus line includes board-only stations and a destination station . We refer to stations near the start of the line as upstream stations, and stations near the end of the line as downstream stations. During the peak hours, a number of commuters travel through the bus line, where is the number of commuters departing from station . The bus company schedules buses during the peak period, and each bus has a maximum capacity of (passengers). The buses arrive at the destination station with fixed-time headway (h). Let be the index set of buses, where 1 denotes the first bus reaching the destination. For modelling tractability, the running time in each of two neighboring stations from to is assumed to be constant and is denoted by .

All commuters are considered to be frequent users who are acquainted with the bus timetable through day-to-day learning or have complete information about the schedule provided by the traffic authority. Under this assumption, passengers experience zero waiting time at the station. Therefore, the departure time choice problem transforms into a bus run choice problem, in which commuters select the bus that minimizes their total generalized commuting cost.

Figure 2 illustrates the scheme of the multiagent-based learning process. Commuters respond to the imposed work starting time by selecting the departure time that minimizes their disutility of travelling and arriving. In this dynamic system, an agent must alter his/her departure time in response to other agents’ decisions. When an agent takes a particular bus, it increases the degree of in-bus congestion, affects the ride experience of other agents, and consequently influences their decision-making. By considering the mutual interactions among commuters, all participants’ schedules can be calculated. As such, the cumulative volume distribution can be determined. For these reasons, such models are suitable for investigating how individual agents interact and learn to maximize their rewards. All agents are expected to converge to the state represented by the equilibrium if they are rational. In other words, each agent aims to choose the strategy that maximizes their utility function, creating a steady state (a combination of strategies for all agents) where no agent can benefit by unilaterally changing their strategy.

Here, we clarify two key components in Figure 2.

2.1. SWH Policy

As a preliminary analysis, we here consider a simple double-work start-time scenario. Then, the control variables relevant to the SWH policy are (i) the demand proportion of the two groups and (ii) the staggered time interval of the two groups.

Formally, bus commuters are divided into two groups: (1) commuters in Group 1 have the same work start time and (2) commuters in Group 2 have the same work start time . Let , and then, the staggered time interval equals . The number of commuters within the two groups follows the division of , where is the proportion of Group 1 among all commuters. For instance, means that 70% of the commuters belong to Group 1 and the remaining passengers are in Group 2.

2.2. Bus Operation Policy

Here, we refer to a bus operation policy concerning physical distancing during epidemics. We consider two scenarios: one without physical distancing and another with it. In the first scenario, we assume normal conditions where urban buses can be used up to their full physical capacity. In the second scenario, which pertains to epidemic conditions, the occupancy rate of vehicles must not exceed a predefined threshold to ensure safe social distancing, e.g., 50% of the maximum vehicle capacity. Therefore, the control variable relevant to the bus operation policy in this context is the bus occupancy.

3. Multiagent-Based Q-Learning Model

In our approach, commuters are viewed as Q-learning agents who make departure time decisions. In what follows, we use the words “commuter” and “agent” interchangeably. One agent’s decision will influence other agents’ decisions when travelling on the same bus line. For example, an agent choosing to take a certain bus will increase the degree of congestion on this bus, thus affecting other agents’ ride experience. To avoid congestion or capacity limitation, an agent who initially decided to take the same bus may select a new bus, which will again influence other agents.

The following basic concepts need to be defined in advance when implementing the Q-learning algorithm.
(i) Action Set: This corresponds to the set of bus runs, as we have transformed the problem of choosing departure times into a bus run selection issue.
(ii) Reward: This represents the immediate feedback received upon taking a bus, and in our study, it is the inverse of the generalized commuting cost. The commuting cost is bus-dependent, i.e., it depends on the number of agents who take the same bus.
(iii) Q-Table: Utilized for calculating the maximum expected future rewards associated with an action. In this paper, the regret value serves as the reinforcement learning signal. Each agent maintains a Q-table that stores regret values for each bus run. A lower regret for a particular action implies a higher reward or, equivalently, a lower cost associated with taking that action.

Algorithm 1 presents the pseudocode of such a learning process in a daily iterative manner. At the beginning of a learning episode, agents receive the average congestion cost of each bus run based on previous days. Afterward, each agent chooses a bus run by using the policy derived from the Q-table. Then, the agent takes the bus and records its commuting cost (Section 3.1). When the travel is finished, the agent estimates his/her regret using the actual commuting cost and the received history information (Section 3.2). As an intermediate step, each agent also estimates the costs of his/her nontaken buses to compute the regret. Eventually, the Q-table is updated and guides the bus run selection on the next day (Section 3.3).

(1)Initialize Q-table: ;
(2)Initialize history of estimates: ;
(3)Initialize learning and exploration rates: , ;
(4)For each day do
(5) Receive app recommendations ;
(6) Update learning and exploration rates: , ;
(7) Choose action using policy derived from Q-table;
(8) Take action and observe the commuting cost using equation (1);
(9) Update estimate using equation (5);
(10) Update regret of action using equation (6);
(11) Update Q value of using equation (8);
(12)End

3.1. Generalized Commuting Cost

We use the term “reward” instead of “cost” for consistency in our terminology. The reward is inversely associated with one agent’s generalized commuting cost from taking a bus run. Generally, commuting costs encompass ticket fare, crowding costs, in-bus travel costs, and penalties for schedule delays (early or late arrival). For simplicity, we assign a zero value to the ticket fare since commuters leaving from the same station incur identical fares, which do not impact their departure time choices. Likewise, the in-bus travel cost is the same for all commuters departing from the same station and does not influence their departure time choices. Therefore, without loss of generality, we set the travel cost from each station to zero in our discussion. In summary, commuters make their bus run choices solely by trading off their in-bus crowding costs against the schedule delay penalties.

Specifically, let denote the total commuting cost of a commuter who departs from station and takes a bus run . is given in the following equation:

In equation (1), is the commuter’s crowding cost of taking bus run at station , and its value is determined by the degree of crowding and the in-bus time. Then, can be calculated by the following equation:

where indicates the number of commuters from station taking bus and is the time spent on the bus between two neighboring stations and . The function calculates the crowding cost per unit of in-bus travel time, which is assumed to be monotonically increasing with the number of commuters on board.

In equation (1), indicates the schedule delay penalty with respect to the scheduled work start time by taking bus service . We assume that there is a bus arriving at the workplace punctually, and this bus run is labeled by . We also call the work start time. In this way, any bus run with index will ultimately arrive early with an early arrival time of , while any bus run with index will arrive late with a late arrival time of . Thus, the schedule delay cost is given as follows:

where the coefficients and are the costs of a unit schedule delay that is early and late, respectively. According to Small [8], we set .
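The cost structure described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: all function and parameter names are hypothetical, the crowding function is passed in by the caller because its form is not specified here, and the on-board load is assumed to accumulate as the bus moves downstream with a constant running time per segment.

```python
from typing import Callable, Sequence


def schedule_delay_cost(bus: int, on_time_bus: int, headway_h: float,
                        early_rate: float, late_rate: float) -> float:
    """Penalty for taking a bus other than the punctual one.

    Buses are indexed in arrival order, so a smaller index arrives earlier.
    """
    if bus <= on_time_bus:
        return early_rate * (on_time_bus - bus) * headway_h
    return late_rate * (bus - on_time_bus) * headway_h


def crowding_cost(station: int, boardings: Sequence[int], run_time_h: float,
                  crowding_fn: Callable[[int], float]) -> float:
    """Crowding cost from `station` (0-based) to the destination.

    boardings[k] is the number of commuters boarding this bus at station k,
    ordered from the most upstream station to the last board-only station.
    Assumption: nobody alights before the destination (many-to-one line).
    """
    cost = 0.0
    for segment in range(station, len(boardings)):
        on_board = sum(boardings[: segment + 1])  # load after boarding at `segment`
        cost += crowding_fn(on_board) * run_time_h
    return cost


def generalized_cost(station: int, bus: int, on_time_bus: int,
                     boardings: Sequence[int], run_time_h: float,
                     headway_h: float, early_rate: float, late_rate: float,
                     crowding_fn: Callable[[int], float]) -> float:
    """Total cost = crowding cost + schedule delay cost (fare and travel cost set to zero)."""
    return (crowding_cost(station, boardings, run_time_h, crowding_fn)
            + schedule_delay_cost(bus, on_time_bus, headway_h, early_rate, late_rate))
```

A linear crowding function such as `lambda n: 0.1 * n` is enough to experiment with the trade-off; any monotonically increasing function satisfies the assumption stated in the text.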

All commuters are assumed to be homogeneous regarding the value of time, the schedule delay coefficients, and the feeling of congestion. Heterogeneous commuters can be easily distinguished by their difference in the value of the travel time and schedule delay costs. To make the conclusions more concise, we do not consider the departure choices of heterogeneous commuters in this work.

3.2. Regret Estimation

Within the Q-learning algorithm, regret serves as a reinforcement signal, guiding commuters to minimize their estimated regret. To calculate regret, a commuter must possess comprehensive knowledge of (i) the average cost incurred by the commuter and (ii) the average cost of the best-fixed action in hindsight. Unfortunately, determining the latter necessitates advance knowledge of the commuting cost for all bus runs each day, a task typically impossible in reality. To solve this, Romas et al. [24] proposed an alternative definition of regret that describes the estimated regret of each action. A commuter can estimate regret according to this model by combining global and local information.

The global information refers to the mean estimated reward of all bus runs in the system. As suggested by Romas et al. [24], such information can be collected by a central authority at the end of each day and sent to terminal clients through a mobile app. Here, the app recommendations are merely used to calculate the agents’ regrets in their decision-making process. For a bus run , let be the reward for taking bus run at station on day . The value of is inversely associated with the commuting cost , i.e., . Using such information, the app can compute the mean reward for all bus runs. At a given station , let be the mean reward of taking bus at station up to time , and can be calculated by

The local information, on the other hand, is the actual reward an agent gained. The history estimate of an action can be defined as , where represents the most recent reward estimate of one agent for taking bus run on day departing from station . More specifically, the value of is given by equation (5), depending on whether or not the action is executed on the current day. We use to distinguish the bus run taken by the agent on day from any of the other buses . If , equals the reward experienced by taking bus run . Otherwise, we assume the reward of a nontaken action is the same as the previous day’s estimate. That is, can be approximated by the most recent observation:

Building upon the local and global information from the above definitions, we can now formulate the estimated action regret. Let denote the estimated regret of taking bus run at station up to day , with the formulation provided in equation (6). The former term on the right-hand side of equation (6) is a linear combination of the local average reward and the global average reward by taking a bus. By maximizing the reward, we can find the best estimated bus run with the maximum expected reward. The latter term on the right-hand side of equation (6) represents the history estimates of taking a bus. Thus, the estimated action regret can be seen as an estimate of the average amount lost up to time for not taking the best estimated action.
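The history update and the estimated action regret described above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact equations (5) and (6): the reward is taken to be the negative commuting cost, the blending weight `local_weight` between local and global means is a hypothetical parameter, and all names are invented for the example.

```python
from typing import Dict, List


def update_history(history: Dict[int, List[float]], taken_bus: int,
                   reward: float) -> None:
    """Append today's estimate for every bus run.

    The taken bus gets the experienced reward; every other bus repeats
    its most recent estimate.
    """
    for bus, estimates in history.items():
        estimates.append(reward if bus == taken_bus else estimates[-1])


def estimated_regrets(history: Dict[int, List[float]],
                      global_mean: Dict[int, float],
                      local_weight: float = 0.5) -> Dict[int, float]:
    """Estimated regret of each bus run up to the current day.

    First term: best blended (local + global) mean reward over all buses.
    Second term: the agent's own mean estimate for the bus in question.
    """
    local_mean = {bus: sum(r) / len(r) for bus, r in history.items()}
    blended = {
        bus: local_weight * local_mean[bus] + (1.0 - local_weight) * global_mean[bus]
        for bus in history
    }
    best = max(blended.values())
    return {bus: best - local_mean[bus] for bus in history}
```

With `local_weight=1.0` the app information is ignored and the bus with the best local mean always has zero regret, which is a convenient sanity check.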

3.3. Learning Process

A sketch of the learning process is illustrated in Algorithm 1. In the Q-learning model, we use the ε-greedy principle to balance exploration and exploitation. The principle works as in equation (7). A uniform random number between 0 and 1 is generated and then compared with ε. We call ε the exploration rate. If the newly generated number is smaller than ε, we choose to explore, i.e., not to exploit what we have learned so far. In this case, the bus run is selected randomly, independent of the action-value estimates. Otherwise, we exploit, and the bus run with the highest estimated reward is selected.
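The selection rule described above matches what is commonly called ε-greedy action selection. A minimal sketch follows; the function name, the dictionary representation of the action values, and the optional random generator argument are assumptions made for illustration.

```python
import random
from typing import Dict, Optional


def choose_bus(values: Dict[int, float], epsilon: float,
               rng: Optional[random.Random] = None) -> int:
    """Pick a bus run: explore with probability epsilon, otherwise exploit.

    `values` maps each bus index to its current estimated reward.
    """
    rng = rng or random.Random()
    if rng.random() < epsilon:
        # Explore: uniform choice, independent of the estimates.
        return rng.choice(list(values))
    # Exploit: bus run with the highest estimated reward.
    return max(values, key=values.get)
```

Passing a seeded `random.Random` instance makes a simulation run reproducible, which helps when averaging results over repeated runs.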

Taking a commuter departing from station for instance, his/her learning process works as follows. On each day , he/she receives recommendation information from the app. Then, he/she chooses a bus according to the exploration principle described above. Upon arriving, the commuter calculates his/her commuting cost immediately. Afterward, the commuter updates his/her history action estimation using equation (5) and calculates the estimated regret of taking a bus run using equation (6). Finally, the commuter updates the Q value of action using the estimated action regret for that action, as follows:

where is the learning rate.

In the iterative process, the learning rate and the exploration rate are updated by multiplying them by the decay rates and as and , respectively. Initially, according to [24], we set , , and .
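The value update and the multiplicative decay of the two rates can be sketched as below. The exact form of equation (8) is not reproduced here, so the update is assumed to be the standard stateless Q-learning rule (a weighted average of the old value and the new learning signal); the function names and the numeric values in the usage lines are illustrative only, not the paper's settings.

```python
def update_q(q_value: float, signal: float, alpha: float) -> float:
    """Stateless Q-learning update (assumed form): blend old value and new signal."""
    return (1.0 - alpha) * q_value + alpha * signal


def decayed_rate(initial_rate: float, decay: float, day: int) -> float:
    """Rate after `day` multiplicative decay steps."""
    return initial_rate * decay ** day


# Illustrative usage with made-up numbers.
q = update_q(q_value=0.0, signal=2.0, alpha=0.5)              # 1.0
alpha_day3 = decayed_rate(initial_rate=1.0, decay=0.5, day=3)  # 0.125
```

Because both rates shrink geometrically, early days are dominated by exploration and large updates, while later days mostly refine the choices already learned.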

4. Results and Discussion

4.1. Parameter Settings

The simulation conditions are set as follows: stations, buses, (h), , , , and (persons). According to Tian et al. [15], (RMB/h) and (RMB/h). The default bus capacity and time headway are set to (persons) and (h), unless otherwise stated. For the SWH policy, we fix the work start time of Group 2 to . Then, the control parameters are the travel demand division and work start time of Group 1. Here, the proportion is identical for the passengers boarding at all the stops. In one test, the iteration in the learning model is set to . When calculating the mean values of the related cost, 50 repetitions are used to guarantee accuracy.

4.2. Spatial-Temporal Characteristics

We first plot the aggregate travel profile in Figure 3 without implementing a SWH policy, i.e., , serving as a reference line for comparative experiments. In addition, the commuting costs of commuters on each bus are calculated at each station. The result is represented in box plots (25%–75% quartile, 1.5 IQR) as shown in Figure 4.

We make the following observations from Figures 3 and 4:
(1) Commuters from upstream stations utilize more bus services than those from downstream stations. For example, commuters departing from station take the bus services in a range of , and this range decreases to , , and for commuters departing from , , and , respectively. That is to say, the farther the station is from the workplace, the longer the duration of the commuting period.
(2) The profile of the cumulative number of departures exhibits a single-peak shape. Under the hypothesis that the per-unit-time cost of a late arrival is higher than that of an early arrival, the time-declining rate of the late-arriving commuters is higher than the time-increasing rate of early-arriving ones. Due to the limited bus capacity, buses around the on-time service () are fully occupied.
(3) The commuting costs at stations , , and exhibit centralized distributions. That is to say, commuters from the same departure station have almost identical and minimal commuting costs regardless of which bus they take. In other words, user equilibrium is almost achieved. We use the term “almost” because the standard deviation of commuting costs for users departing from the station is relatively high.

The aforementioned simulation results align closely with the referenced analytical results obtained by Tian et al. [15]. This confirms the reliability of the proposed learning model and thus enables us to apply it to a numerical evaluation of the effect of SWH policy.

The outcomes of applying the SWH policy are presented in Figures 5 and 6, depicting departure time profiles and the corresponding commuting costs, respectively. Here, the results are obtained from a typical SWH policy with , , and and other default parameters. The staggered time interval is 50 minutes, given that the time headway is (h).

From Figures 5 and 6, we draw the following findings:
(1) The SWH policy does influence commuters’ departure time choices and alters the cumulative departure flows. The profile of the cumulative number of departures exhibits a double-peak shape. The two peaks are at the work start times of and , respectively.
(2) The SWH-induced equilibrium is identified, where commuters departing from the same station with the same work start time encounter identical minimal costs, regardless of the bus run they choose. As shown in Figure 6, the cost variation among commuters from the same group at the same station is small.
(3) The separation of the two groups ultimately leads to a reduction in the mean commuting cost. For instance, when the SWH policy is implemented, the mean commuting cost for commuters from station is 10.58, compared to 14.26 when the SWH policy is not implemented.

It can be expected that as the staggered time interval increases, commuters from the two groups will gradually become more separated. When the time interval is significantly large, commuters from the two groups will not share the same bus run. To elucidate this segregation effect, we introduce a new index to measure the degree of mixing between the two categories of commuters.

Here, the mixed degree is defined as follows:

where and are two sets with the elements of the bus index serving the commuters from Groups 1 and 2, respectively. Thus, indicates the number of buses that are shared by both groups and refers to the total number of utilized buses. When , commuters from the two groups are totally separated from each other; when , all buses are shared by commuters from the two groups.
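As described, the mixed degree is the ratio of shared buses to all utilized buses, which has the same form as the Jaccard index of the two bus sets. A minimal sketch, with a hypothetical function name and an assumed convention of returning 0 when no bus is used:

```python
from typing import Iterable


def mixed_degree(buses_group1: Iterable[int], buses_group2: Iterable[int]) -> float:
    """Share of utilized buses that carry commuters from both groups."""
    set1, set2 = set(buses_group1), set(buses_group2)
    used = set1 | set2          # all utilized buses
    if not used:
        return 0.0              # assumption: no buses used means no mixing
    shared = set1 & set2        # buses shared by both groups
    return len(shared) / len(used)
```

For example, if Group 1 uses buses 1 to 3 and Group 2 uses buses 3 and 4, one of four utilized buses is shared, so the mixed degree is 0.25.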

Figure 7 shows how the ratio varies with the travel demand division and staggered time interval . Note that, for a given travel demand division, there is a critical staggered time interval that divides the curve of the mixed ratio into two regions: a volume-mixed region and a volume-separated region.

In the volume-mixed region, the value of the mixed ratio reduces as the staggered time interval increases. When the staggered time interval is larger than the aforementioned critical value, does not depend on the staggered time interval anymore, with a minimum value of 0. This means that in the volume-separated region, a commuter’s decision is not influenced by the commuters from the other group. Moreover, the results suggest that the value of such a critical staggered time interval depends on the travel demand division. Usually, it increases with the value of . That is to say, a smaller staggered time interval is enough to separate the two groups if the proportion of the group with the earlier work start time (i.e., Group 1) is larger.

4.3. Optimal Design

For the SWH policy, the relationship between the staggered time interval and the proportion of the staggered population needs to be determined appropriately to reduce the in-vehicle crowding. Depending on whether physical distancing is enforced, we solve the SWH policy design problem in a normal case (in Section 4.3.1) and in a pandemic outbreak case (in Section 4.3.2).

4.3.1. Normal Period

There is no passenger flow restriction in the normal period, and each bus can serve passengers to its maximum capacity, i.e., persons. Three types of costs—the mean total commuting cost, the in-vehicle crowding cost, and the schedule delay cost—are calculated by altering the staggered time interval and the demand division. The results are illustrated in Figure 8. Here, the mean values of the three related costs are calculated by averaging all the commuters’ costs in the transit system.

As indicated in Figure 8(b), achieving the minimum crowding cost requires satisfying two conditions: (1) ensuring a sufficiently large staggered time interval, and (2) equally dividing the staggered population proportion. To gain an exact solution, we address the following two subproblems: (I) determining if there exists an optimal time interval that minimizes crowding costs for a given demand division and (II) establishing whether there is an optimal division of commuters that minimizes crowding costs when the staggered time interval is specified.

(1) Fixing the Demand Proportion of the Two Groups. Figure 9 shows the relationship between the mean crowding cost and the staggered time interval for six selected travel demand divisions. The curve labeled by stands for a special case of SWH policy where two groups have the same work start time. This is used as the baseline for comparison.

The following conclusions can be drawn from Figure 9:

(1) There are two critical values of that divide the crowding cost curve into three regions: a policy-failure region, a cost-reduction region, and a minimum-cost region. We distinguish the above two critical values by and , and . In the policy-failure region, i.e., , the mean crowding cost is independent of the staggered time interval. Implementing the SWH policy in this region will not ease rush-hour congestion. In the cost-reduction region, i.e., , the mean crowding cost decreases as the staggered time interval increases. Then, when , the crowding cost no longer decreases and remains at its minimum value in the minimum-cost region. Taking for example, the two critical values of the staggered time interval are and , corresponding to 20 min and 85 min, respectively.

(2) Let denote the optimal staggered time interval, i.e., the smallest staggered time interval at which the minimum cost is achieved. By definition, . We find a tight relationship between the value of and the value of the demand mixed ratio . Recall that in Figure 7, the volume relationship of the two groups falls into one of two cases: a volume-mixed region and a volume-separated region. These two regions are separated exactly by the critical value . The minimum crowding cost is achieved by separating the two classes of commuters until they travel independently, i.e., in a volume-separated region. When the staggered time interval is smaller than , the two groups are in a volume-mixed region and their departure time decisions affect each other. A simple case is provided in Figure 10 with demand division .

(3) The optimal staggered time interval depends on the quantitative relationship between the two staggered groups. Generally, the optimal staggered time interval decreases as the demand division increases.

(4) The minimum cost is also sensitive to the demand division. Due to demand division symmetry, i.e., and , will finally have the same minimum cost. Moreover, among all of the divisions, the minimum crowding cost can be achieved when .
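The three-region structure described in conclusion (1) can be sketched as a simple threshold classification. This is only an illustrative sketch: the function name and argument names are hypothetical, the 20 min and 85 min thresholds are the example values quoted above for one demand division, and the treatment of the lower boundary (assigned here to the cost-reduction region) is an assumption.

```python
def classify_interval(interval_min: float, lower_min: float, upper_min: float) -> str:
    """Return the region of the crowding cost curve for a staggered time interval.

    lower_min and upper_min are the two critical intervals (in minutes).
    Assumption: an interval equal to lower_min counts as cost-reduction.
    An interval equal to upper_min counts as minimum-cost, since the optimal
    interval is defined as the smallest one that attains the minimum cost.
    """
    if lower_min > upper_min:
        raise ValueError("lower_min must not exceed upper_min")
    if interval_min < lower_min:
        return "policy-failure"   # cost independent of the interval
    if interval_min < upper_min:
        return "cost-reduction"   # cost falls as the interval grows
    return "minimum-cost"         # cost stays at its minimum


# Example with the critical values quoted in the text (20 min and 85 min).
for interval in (10, 45, 85, 100):
    print(interval, classify_interval(interval, lower_min=20, upper_min=85))
```

Under these assumptions, any interval above the upper critical value yields no further crowding benefit, which is why the optimal interval coincides with the upper critical value.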

(2) Fixing the Staggered Time Interval. Figure 11 presents the mean crowding costs as a function of the demand division for six given staggered time intervals. The curve with represents a special case where SWH policy is not implemented, serving as the baseline for comparison.

The following conclusions can be drawn from Figure 11:

(1) There are two critical demand divisions that divide the crowding cost curve into three regions: a policy-failure region, a cost-reduction region, and a cost-increase region. For clarity, the above two critical values are denoted by and , and . In the policy-failure region, i.e., , the mean crowding cost does not depend on the staggered time interval, and SWH policy fails to mitigate in-vehicle congestion. Then, a further increase in the demand division will ease in-vehicle congestion. At value , the minimum cost is reached. If the demand proportion is larger than , in-vehicle congestion increases. Taking for instance, the two critical values of demand division are and , respectively. However, if the staggered time interval is larger, the crowding cost is quite sensitive to the demand division, and the policy-failure region no longer exists.

(2) When the staggered time interval is predetermined, the optimal division of the two categories of commuters is the one at which the minimum in-vehicle crowding cost is reached. Let indicate the optimal division. This means that .

(3) The optimal demand proportion is sensitive to the staggered time interval. Generally, as the staggered time interval increases, the optimal demand proportion, , tends to decrease. For example, when , while when . However, when the staggered time interval is large enough, the value of stabilizes and ceases to decrease further, settling at a value of 0.5. For instance, the critical demand proportion for and is identical, with the same value of .

(4) The minimum cost is also sensitive to the staggered time interval. The minimum cost decreases as the staggered time interval increases. However, once the staggered time interval surpasses a critical value, the minimum cost no longer decreases.

In summary, the staggered time interval and volume division are two controllable parameters for designing a SWH policy. From the point of view of local optimization, we find the scenario-dependent optimal value of one control variable that minimizes the crowding cost, assuming that the other control parameter is predetermined.

Figure 12 illustrates the diagrams of efficient control regions when designing an efficient SWH policy.

(i) Given the travel demand division, as Figure 12(a) suggests, the staggered time interval should be within its upper and lower boundaries. A staggered time interval that is smaller (or larger) than its lower boundary (or upper boundary) will not relieve in-vehicle congestion.

(ii) In the same way, when the staggered time interval is predetermined, the travel demand division needs to be selected within the crowding cost reduction region as shown in Figure 12(b).

From the point of view of system optimization, the optimal combination of the staggered time interval and volume division should be set to achieve minimum crowding costs. As shown in Figure 12, the volume division needs to be set at and the staggered time interval has a value of .

4.3.2. Pandemic Outbreak Period

This section examines the supplementary impact of physical distancing on public transport services with the implementation of the SWH policy. In this analysis, we presume that the physical distancing policy limits bus capacity to 50% total occupancy, i.e., (persons). Amid the COVID-19 pandemic, overall transport demand significantly decreased due to community activity restrictions. Nevertheless, for comparative reasons, we assume that the total demand remains consistent with prepandemic scenarios (as discussed in Section 4.3.1).

Figure 13 shows the joint impact of the travel demand division and staggered time interval on the mean commuting cost and its two components (the in-vehicle crowding and schedule delay costs). Compared to the scenario with , the implementation of a physical distancing strategy leads to an additional reduction in in-vehicle congestion costs. However, due to the limited boarding constraint, certain commuters have to adjust their departure times (either earlier or later) to take a bus, resulting in elevated schedule delay costs. On average, the decrease in crowding costs fails to offset the rise in schedule delay costs, ultimately leading to an overall increase in total commuting costs.

To delve deeper into the combined impact of SWH policy and physical distancing, we assess the changes in the three types of costs incurred by commuters who board from the same station with identical work start times. Two cases are discussed, i.e., and , and the results are illustrated in Figures 14(a) and 14(b), respectively. Here, we set the demand proportion to , which means that the commuters are equally divided into two groups in each station.

We draw the following conclusions from Figure 14:

(1) Commuters from the downstream stations, i.e., station and , are significantly affected by the physical distancing measures. These commuters need to depart earlier or later to avoid taking a fully loaded bus. The considerable rise in schedule delay costs outweighs the benefits gained from reduced in-vehicle crowding. This leads to a significant surge in the total commuting cost.

(2) When the staggered time interval is relatively small, i.e., , commuters from Group 2 (with a later work start time) in downstream stations suffer much higher commuting costs than Group 1. The increase in schedule delay costs is the main contributor to the rise in total commuting cost. However, the difference between the two groups disappears when the staggered time interval is large enough. As indicated in Figure 14(b), commuters from the two groups have almost the same cost.

Finally, we provide the optimal parameter settings for designing an efficient SWH policy under the requirement of physical distancing. Diagrams of efficient control boundaries are given in Figure 15. The combined effect of SWH policy and physical distancing changes the efficient control boundaries; however, the difference is insignificant compared with Figure 12. Specifically, the policy-failure region is slightly smaller. In terms of system optimization, the optimal demand proportion should be set to and the staggered time interval has a value of . This value is identical to the case where a SWH policy is implemented under normal conditions, i.e., .

4.4. Risk of Infection

Finally, we would like to explore the combined effects of a SWH policy and physical distancing concerning the risk of infection during bus transit. When assessing the risk of infection, two crucial factors need to be considered. One factor is the level of physical contact between passengers, which is clearly related to crowd density. Typically, a higher number of physical contacts (or greater crowd density) implies a higher risk of infection to some extent. The second factor is the duration of physical contact. Longer durations in a crowded environment increase the probability of passengers getting infected.

A feasible way to assess this risk is by using simulation technology that describes a social-activity contact network and simultaneous disease transmission (Mo et al., [26]). However, we do not consider such a method due to its complex and tedious analyses and the introduction of more parameters. Moreover, we lack actual data to calibrate the model parameters.

Here, we adopt the value of in-vehicle crowding as a surrogate index to depict the risk of infection when commuting on a bus line. This is reasonable since in-vehicle crowding is defined as the function of the degree of crowding effects and the in-bus time, which contains the two critical factors for evaluating the risk of infection. The risk of infection will increase if commuters travel on a more crowded bus; it will be much higher if they travel for longer distances.
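To make the two ingredients of the surrogate index concrete, the following minimal sketch accumulates a crowding term weighted by in-bus time over the segments a commuter rides. It is not the crowding cost function used in this paper: the linear load-factor form, the function name, and all numbers in the example are assumptions chosen purely for illustration.

```python
def crowding_exposure(segment_loads, segment_times_min, capacity):
    """Illustrative surrogate for infection risk along one commuter's ride.

    segment_loads:     onboard passengers on each segment the commuter rides
    segment_times_min: in-bus time (minutes) on each of those segments
    capacity:          maximum bus occupancy (persons)

    Assumption: the degree of crowding is taken as the load factor
    (load / capacity); the paper's own crowding function may differ.
    """
    if len(segment_loads) != len(segment_times_min):
        raise ValueError("loads and times must have the same length")
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return sum(
        (load / capacity) * minutes
        for load, minutes in zip(segment_loads, segment_times_min)
    )


# Hypothetical rides on a bus with capacity 50.
short_ride = crowding_exposure([30, 50], [5, 5], capacity=50)
long_ride = crowding_exposure([10, 30, 50], [5, 5, 5], capacity=50)
crowded_ride = crowding_exposure([40, 50], [5, 5], capacity=50)

print(short_ride, long_ride, crowded_ride)
```

In this sketch, both a longer ride and a more crowded ride raise the index, matching the two factors identified above.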

By this definition, we conclude that the SWH policy provides a significantly safer commuting environment for public transit in terms of the risk of infection. However, transit safety benefits are not uniformly distributed throughout the bus schedule. Buses near the work start times are typically fully loaded, failing to meet the requirements of physical distancing. In contrast, those buses departing earlier or later, deviating from the scheduled work start time, are safer due to the smaller number of onboard passengers.

To illustrate the above point, Figure 16 records the number of utilized buses and the number of infection-safe buses. Here, an infection-safe bus refers to one where the number of onboard passengers is no more than 50% of its maximum occupancy. We take the case without SWH as the baseline for comparison, i.e., when and . In the case without SWH, the mean number of utilized buses is 22.81, of which a mean number of 13.43 buses are safe. When implementing the SWH policy with parameter settings of and , the mean number of utilized buses is 31.51 and the mean number of safe buses is 23.44.
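The counting rule behind Figure 16 can be sketched as follows. The function name, the per-bus passenger counts, and the capacity of 100 are hypothetical; treating a bus as "utilized" when it carries at least one passenger is also an assumption. The final lines only divide the mean counts reported above, giving safe-bus shares of roughly 59% without SWH and 74% with SWH.

```python
def count_safe_buses(onboard_counts, max_occupancy, safe_share=0.5):
    """Return (utilized, safe) bus counts for one simulated day.

    onboard_counts: passengers on each bus run (hypothetical input)
    max_occupancy:  maximum bus occupancy (persons)
    safe_share:     a bus is infection-safe if its load does not exceed
                    this share of max_occupancy (50% in the text)

    Assumption: a bus counts as utilized if it carries any passenger.
    """
    used = [n for n in onboard_counts if n > 0]
    safe = [n for n in used if n <= safe_share * max_occupancy]
    return len(used), len(safe)


# Hypothetical loads for five bus runs with a capacity of 100.
print(count_safe_buses([0, 20, 50, 51, 100], max_occupancy=100))

# Share of safe buses implied by the mean counts reported in the text.
share_without_swh = 13.43 / 22.81
share_with_swh = 23.44 / 31.51
print(f"{share_without_swh:.1%} {share_with_swh:.1%}")
```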

Building on the SWH policy, enforcing physical distancing (or limiting the maximum bus load) further reduces the risk of infection. Figure 17 shows the percentage reduction in infection risk from combining SWH with physical distancing, compared with implementing SWH only. Notably, physical distancing has a significant impact only when the staggered time interval is small. If the staggered time interval is large, the bus load factor (i.e., the mean number of passengers per bus) is already greatly reduced, so physical distancing does not dramatically affect it and its effectiveness is limited. For example, in Figure 17, there is a 13.7% decrease in infection risk when the staggered time interval is , whereas the decline is only 2.54% when the staggered time interval is .
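The percentages in Figure 17 can be read as a relative reduction of the surrogate risk index with respect to the SWH-only case. A minimal sketch of that calculation is given below; the function name and the input values are hypothetical and are not taken from the simulation results.

```python
def percent_reduction(risk_swh_only: float, risk_combined: float) -> float:
    """Relative reduction (in %) of the surrogate risk index when physical
    distancing is added to the SWH policy, with SWH only as the baseline."""
    if risk_swh_only <= 0:
        raise ValueError("baseline risk must be positive")
    return 100.0 * (risk_swh_only - risk_combined) / risk_swh_only


# Hypothetical index values, for illustration only.
print(percent_reduction(risk_swh_only=200.0, risk_combined=150.0))
```

A value near zero indicates that the SWH policy alone has already lowered bus loads enough that the capacity limit rarely binds.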

5. Conclusions

This study examined how the SWH policy affects commuting patterns during peak hours. It focused on a straightforward bus route with multiple origins and a single destination. Commuters’ daily departure time choices were simulated using a multiagent-based Q-learning model. In this model, the regret value served as the signal for reinforcement learning, guiding individuals in making optimal choices for their departure times. The study explored SWH’s effects on commuting costs and the time-space distribution of departure flows. Results indicated that a well-designed SWH policy influences commuters’ departure time choices, leading to a deconcentration of the temporal distribution of travel demand. Notably, a new SWH-induced equilibrium is achieved, where commuters departing from the same station with the same work start time experience identical minimal costs, regardless of their choice of bus.

Concerning the design of an effective SWH policy, the following conclusions are drawn. First, given the division of travel demand, the minimum in-vehicle crowding is achieved when the staggered time interval surpasses a certain threshold. Second, given a staggered time interval, the in-vehicle crowding is reduced by properly adjusting the division proportion of the two groups. These conclusions can be extended to situations involving physical distancing during epidemic outbreaks. It is worth noting that SWH policy also contributes to lowering the risk of infection during such periods.

In this study, we focused solely on the benefits commuters gain from adopting the SWH policy. We neglected the decisions of firms to impose start times (arrival times) on their employees. For firms, the optimal design should not significantly deviate from the initial work schedules and should yield minimal changes (Yildirimoglu et al. [27]). This is because implementing substantial changes in work schedules may reduce positive production externalities. Thus, a SWH policy should mitigate congestion on public transit networks while reducing the impact on enterprise productivity as much as possible. This topic will be considered in future studies.

Another meaningful extension is to replace the current simplified bus line model with a more realistic one that takes into account the stochastic nature of public transport operations. This will allow us to explore how travel choice behavior influences the overall reliability of the bus line. In addition, it would be interesting to investigate the combined effect of the SWH policy and other bus operating methods, such as stop-skipping and limited boarding, on the overall efficiency of the bus system.

Data Availability

The source code of this study is available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Shengjie Qiang conceptualized the study, proposed the methodology, performed formal analysis, wrote, reviewed, and edited the article, contributed to visualization, and provided funding acquisition. Qingxia Huang proposed the methodology, contributed to visualization, wrote the original draft, and reviewed and edited the article.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 72001081), the Jiangxi Provincial Natural Science Foundation (No. 20212BAB214015), and Jiangxi Provincial Humanities and Social Sciences Research Project for Universities (No. GL21212).