Abstract

Previous studies on pedestrian crossing have mostly focused on pedestrian crossing decisions, whereas the pedestrian crossing process, i.e., the motion of pedestrians during the entire crossing, is an important behavioral aspect that has received little attention. Understanding how pedestrians move across the street during the entire crossing process helps identify risky movements and the reasons for them, which in turn supports the implementation of effective countermeasures. Therefore, this paper proposes a new and easily applied approach for investigating and understanding the patterns of the pedestrian crossing process at crosswalks, based on vision-based trajectory tracking technology and UAV (unmanned aerial vehicle) data. A UAV is used to collect the video data, which is time-saving and provides a sufficient coverage area compared to other methods. For trajectory extraction, the vision-based Deep-SORT-Yolov5 architecture is applied. An improved DBSCAN (density-based spatial clustering of applications with noise) algorithm is introduced to cluster pedestrian trajectories and identify patterns of the crossing process. The approach is tested via a case study involving six marked crosswalks in Shanghai, China. Using the proposed method, different crossing patterns are extracted and compared. The resulting trajectory patterns are reasonable and help explain the likely instincts of pedestrians and the factors affecting their behavior during the crossing process. Suggestions are made based on the results. This paper contributes to a more comprehensive safety analysis of pedestrian crossings by considering the pedestrian crossing process. The model, along with the UAV-based trajectory observation method, provides an easily applied and low-cost way of collecting traffic data for pedestrian safety evaluation.

1. Introduction and Literature Review

As an important part of road traffic safety, pedestrian safety is a particularly serious issue. Pedestrians are unprotected road users and are the most vulnerable to road traffic accidents. According to the Global Status Report on Road Safety 2018 [1], about 1.35 million people die on the roads each year, with pedestrian deaths accounting for around 23 percent. In China, 14,000 accidents happened on crosswalks between 2014 and 2017, resulting in 3,898 deaths. Most of these accidents occur when pedestrians are crossing the road and are exposed to motorized traffic [2]. Due to the heterogeneous nature of pedestrians, their movement state when crossing the road changes with the road environment and traffic conditions. Pedestrians continuously perceive and assess their environment while crossing the road and adapt their behavior when necessary, which adds complexity and poses challenges for traffic management.

Cottrell and Mu [3] showed that pedestrian crossing safety is particularly affected by behavioral factors; thus, many studies have focused on the analysis of pedestrian behavior [4–7]. In the investigation of behavior, two behavioral types should be considered: (1) the decision-making behavior and (2) the behavioral process [8]. Decision-making behavior refers to behavior involving an instantaneous decision, whereas the behavioral process is continuous behavior that reflects how a certain behavior unfolds dynamically [8]. In the scenario of pedestrian crossings, the former typically refers to the pedestrian crossing decision (the decision to cross), and the latter is normally the street-crossing process of the pedestrian (the way he/she crosses). The crossing decisions of pedestrians have been heavily investigated by previous studies [9–11]. Among the few studies exploring the street-crossing process, most have relied on indicators such as the crossing speed of pedestrians or distance measures from the pedestrian to the crosswalk [12–14]. However, such indicators, though providing a rough statistical description, fall short in capturing the changes and behavioral features during the pedestrian crossing process. Since the pedestrian crossing process and the exposure of pedestrians to motorized traffic highly (if not fully) overlap, the characteristics of the pedestrian crossing process should be further explored.

Judging from the literature, one reason the pedestrian crossing process has been little investigated is the limited range of data collection methods. Most studies on pedestrian behavior during crossings have relied on traditional methods, including questionnaire surveys [15, 16] or manual field observations [17, 18]. These methods can be biased by the subjective judgements of interviewees and observers, are time-consuming, and have reliability issues [19]. Meanwhile, traditional methods fall short in recording the detailed information needed to describe the pedestrian crossing process. Various traffic data collection technologies have emerged in recent decades [20–22], among which video-based tracking technologies have become highly popular. With advances in deep learning techniques, video-based tracking technologies automatically track road users in videos and record their trajectories with high accuracy [23]. Such data provide detailed trajectories, i.e., positional and speed information of the road users in the scene, and can be used as an important data source for the analysis of road user behavior, including the pedestrian crossing process.

Traffic cameras have been widely installed and used for video data collection, but they have limitations. Traffic cameras are usually not installed pointing vertically down towards the street; therefore, tracking accuracy is in many cases challenged by angle calibration and the fish-eye effect of the camera [24]. Meanwhile, positioning the cameras 4–8 meters above the road surface limits their coverage [24]. Tracking and synchronization across multiple cameras is possible but highly challenging [25]. Recently, with the popularization and wide application of UAVs (unmanned aerial vehicles), low-altitude video collected using UAVs has gained popularity for traffic data collection [26, 27]. Compared to fixed traffic cameras, UAVs are more flexible and less affected by installation conditions [28]. Besides, a UAV has a large coverage area, and it can hover high in the air and shoot vertically down, which helps obtain a good shooting distance and avoid obstacles in the urban road environment, thus providing a relatively comprehensive and clear view [29].

Therefore, in order to study the behavior of the pedestrian crossing process, this paper proposes a trajectory-based pattern recognition method based on two characteristic parameters of pedestrian crossing: average speed and average deviation value. The Deep-SORT-Yolov5 architecture is used as the image processing tool for trajectory data extraction. An improved DBSCAN algorithm is applied to cluster pedestrian trajectories into different pattern types. On that basis, a full methodological approach is described that investigates the pedestrian crossing process and its affecting factors using trajectory motion patterns. A case study involving six sites in Shanghai is conducted for test and illustration purposes. Trajectory patterns at these sites are identified, the results are analyzed, and the impact of attractions is discussed. The methodological approach, involving data collection using UAV and vision-based tracking, crossing pattern recognition and analysis, and contributing factor investigation, helps us understand the pedestrian crossing process, which has remained largely unexplored. It also provides a practical and easily applicable way to investigate countermeasures and geometric designs that improve pedestrian safety during the crossing process.

2. Methodology

The methodology of the study is composed of three steps: (1) video data acquisition and processing, (2) trajectory clustering using improved DBSCAN, and (3) analysis of crossing patterns. The framework of the methodology is presented in Figure 1.

2.1. Video Data Acquisition and Processing

The video data are collected using a UAV (DJI Mavic2 Pro in this study, as shown in Figure 2). The drone was flown at an altitude of 30 meters, which was chosen to ensure the clarity of traffic objects while precisely covering the pedestrian crossing scenes. After the data are collected, the videos are trimmed to remove segments that are not useful for the analysis, such as drone takeoff and landing, as well as segments without pedestrian or vehicle objects. Disturbances in the resulting video were relatively minor, and we applied video stabilization using OpenCV to address them. Furthermore, the DJI Mavic2 Pro automatically performed image correction within the camera while capturing the video, thus eliminating the need for further image distortion correction. For data processing, the Deep-SORT-Yolov5 architecture [30] is used for detection and trajectory tracking (Figure 3). It involves two key steps: multiobject detection and trajectory tracking.

2.1.1. Multiobject Detection Using Yolov5

Yolov5 locates the object in the image while predicting its category and eventually converts the object detection problem into a regression problem [31]. In such a way, processing speed is much improved, making it highly efficient for object detection.

As presented in Figure 3, Yolov5 needs to be retrained for the scenario of this study. Existing publicly available datasets for traffic objects typically feature roadside angles and simple backgrounds. However, this study adopts a 90-degree overhead perspective and captures data at crosswalks within intersections. As a result, a new dataset is created to cater to the specific training requirements of this research. As illustrated in Figure 4, LabelImg provides a user interface for manually selecting and classifying objects. The traffic objects in the images are categorized and labeled as “pedestrian,” “nonvehicle,” and “vehicle.” Upon verification, we found that the highest accuracy in target identification occurs when only the pedestrian’s head is enclosed within the bounding box. As a result, we used the coordinates of the pedestrian’s head to define the “pedestrian” object box. During labeling, efforts are made to align the bounding boxes closely with the objects to reduce background interference. Once the labeling process is completed, XML files are generated containing information for each labeled object, including its name and bounding box coordinates. Labeled objects are then saved in the VOC data format [32]. The samples are randomly split into training and validation sets in an 8:2 ratio.
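As a rough illustration of the labeling output and the 8:2 split described above, the following sketch reads object names and bounding boxes from a VOC-style XML annotation and splits a sample list at random. The annotation content, the helper names `parse_voc` and `split_samples`, and the fixed random seed are assumptions made for this example and are not taken from the study.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Pascal VOC XML layout; the file name,
# labels, and coordinates are invented for illustration.
SAMPLE_XML = """
<annotation>
  <filename>frame_000123.jpg</filename>
  <object>
    <name>pedestrian</name>
    <bndbox><xmin>1010</xmin><ymin>520</ymin><xmax>1030</xmax><ymax>540</ymax></bndbox>
  </object>
  <object>
    <name>vehicle</name>
    <bndbox><xmin>1500</xmin><ymin>900</ymin><xmax>1680</xmax><ymax>990</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc(xml_text):
    """Return a list of (label, xmin, ymin, xmax, ymax) tuples."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = [int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), *coords))
    return boxes


def split_samples(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into training and validation sets."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)  # fixed seed only for repeatability
    cut = round(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


boxes = parse_voc(SAMPLE_XML)
train, val = split_samples(range(10))
```

With ten samples, the split yields eight training items and two validation items; which items land in each set depends on the seed.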

For details on the training process, one can refer to reference [33]. The specific environmental configuration details for the algorithm framework in this paper are provided in Table 1. Taking into consideration the hardware environment and network characteristics of this experiment, several parameter adjustments are made. In this experiment, the configurations are as follows: classes = 3; name = vehicle, nonmotor vehicle, and pedestrian; filters = 3 × (classes + 5) = 24; learning_rate = 0.001; batch = 64; and batch/subdivision = 64/16. The parameters for the optimization algorithm during training, specifically momentum and decay for stochastic gradient descent, are set as 0.9 and 0.0005, respectively. In addition, max_batches is defined as 50000, and steps are set to 40000 and 45000, which means that when the training reaches 40000 and 45000 iterations, the learning rate is reduced to 0.0001 and 0.00001, respectively. The parameters for enhancing image data, including angle, saturation, exposure, and hue, were all set to their default values. Finally, the anchor box values obtained through k-means clustering are used to replace the original anchor values.
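The arithmetic behind two of these settings can be checked with a short sketch: the filter count follows 3 × (classes + 5), and the learning rate drops at the listed step iterations. The function names and the tenfold reduction factor (inferred from 0.001 → 0.0001 → 0.00001) are assumptions for illustration, not the actual training code.

```python
def num_filters(classes, anchors_per_scale=3):
    """Filters in the detection layer: anchors x (classes + 4 box terms + 1 objectness)."""
    return anchors_per_scale * (classes + 5)


def learning_rate(iteration, base_lr=0.001, steps=(40000, 45000), scale=0.1):
    """Step schedule: multiply the rate by `scale` at each step that has been reached."""
    lr = base_lr
    for step in steps:
        if iteration >= step:
            lr *= scale
    return lr


filters = num_filters(classes=3)  # 3 * (3 + 5)
schedule = [learning_rate(i) for i in (0, 40000, 45000)]
```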

After the Yolo model is retrained, Yolov5 detects objects, which are vehicles and pedestrians in this case. Figure 5 shows a sample of detection outputs.

2.1.2. Object Tracking Using Deep-SORT

Deep-SORT is a multiobject tracking algorithm based on tracking-by-detection [34]. In the Deep-SORT-Yolov5 architecture, the detection part of Deep-SORT is replaced by the Yolov5 algorithm. The bounding box and features are used for sequentially tracking objects through frames. For details about Deep-SORT, one can refer to reference [35].

We evaluate the model’s performance on the validation set. Table 2 shows that the model exhibits good detection and trajectory tracking performance for traffic objects.

To illustrate the model’s fit to the actual data and its generalization ability, we plotted the curve of the loss function. As shown in Figure 6, we stopped the iteration when the curve became flat, with the final iteration number being 13000 and the average loss function value being 0.437.

2.2. Trajectory Clustering Using Improved DBSCAN

Trajectory clustering is an effective method for analyzing trajectory data for the purpose of pedestrian crossing process analysis [36]. An improved DBSCAN algorithm is chosen because it filters out noisy data while clustering, can determine a proper number of clusters, and can be applied to unknown and skewed datasets [37].

In typical DBSCAN, distances between points are used as the basis for clustering; for trajectories, this has to be replaced by a proper measure, i.e., the similarity (distance) between trajectories. In this study, a new distance function is proposed to measure this similarity, as presented in Figure 7.

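As presented, $A = \{a_1, a_2, \ldots, a_m\}$ and $B = \{b_1, b_2, \ldots, b_n\}$ are the sets of points on two trajectories (traj1 and traj2, respectively), where $a_i \in A$, $b_j \in B$, $i = 1, \ldots, m$, and $j = 1, \ldots, n$. An average-minimum approach is used as follows: (1) the Euclidean distance from $a_i$ to the set $B$ is calculated and the shortest distance from $a_i$ to traj2, $d_i$, is determined, (2) by iterating from each point on traj1 to points on traj2, we get the group of shortest distances $\{d_1, d_2, \ldots, d_m\}$, and (3) the distance between the two trajectories is then calculated as the average of $\{d_1, d_2, \ldots, d_m\}$. The detailed calculation process is presented as

$$d_i = \min_{1 \le j \le n} \lVert a_i - b_j \rVert, \qquad D(\text{traj1}, \text{traj2}) = \frac{1}{m} \sum_{i=1}^{m} d_i.$$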
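A minimal sketch of the average-minimum distance may make the three steps concrete. For each point on the first trajectory, it takes the shortest Euclidean distance to any point on the second trajectory and then averages these minima. The function names and the sample coordinates are assumptions for illustration.

```python
import math


def point_to_trajectory(point, traj):
    """Shortest Euclidean distance from one point to any point on a trajectory."""
    return min(math.dist(point, other) for other in traj)


def trajectory_distance(traj1, traj2):
    """Average, over the points of traj1, of the shortest distance to traj2."""
    return sum(point_to_trajectory(p, traj2) for p in traj1) / len(traj1)


# Two parallel straight trajectories one metre apart (made-up coordinates).
traj_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
traj_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
d_ab = trajectory_distance(traj_a, traj_b)

# The measure as described is directional: swapping the arguments can change it.
d_short_to_long = trajectory_distance([(0.0, 0.0)], [(0.0, 0.0), (10.0, 0.0)])
d_long_to_short = trajectory_distance([(0.0, 0.0), (10.0, 0.0)], [(0.0, 0.0)])
```

Because the result depends on which trajectory is iterated over, a symmetric distance matrix would require an extra convention, such as averaging the two directions. The text does not state which convention is used, so none is assumed here.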

Based on the distance calculation method, a distance matrix of trajectories can be calculated, which is further used as the distance measure in the improved DBSCAN. The rest of the trajectory clustering adopts the typical DBSCAN algorithm (Figure 8), which relies on two global parameters: Eps (radius), which is “the radius of the adjacent neighborhood of a considered data point,” and MinPts (minimum adjacent number), which is the “adjacent minimum number of data points located in the given region” [38]. The optimal parameter values are selected based on the distance curves and the first derivative of the distance curves, as described in reference [39]. With the parameters determined, the improved DBSCAN groups the trajectories into clusters based on the distance matrix.
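To show how a precomputed distance matrix can drive the typical DBSCAN procedure, the following is a minimal pure-Python sketch. It is not the authors' implementation. The function name, the convention that a point counts itself among its neighbours, and the toy matrix built from one-dimensional positions are all assumptions for illustration.

```python
NOISE = -1


def dbscan_precomputed(dist, eps, min_pts):
    """Cluster items from a symmetric distance matrix; returns one label per item."""
    n = len(dist)
    labels = [None] * n  # None = not visited yet

    def neighbours(i):
        # The item itself is included because dist[i][i] == 0.
        return [j for j in range(n) if dist[i][j] <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border item
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border item: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core item, so keep expanding
                queue.extend(reachable)
    return labels


# Toy distance matrix from made-up 1-D positions: two tight groups and one outlier.
positions = [0, 1, 2, 50, 51, 52, 200]
matrix = [[abs(a - b) for b in positions] for a in positions]
labels = dbscan_precomputed(matrix, eps=1.5, min_pts=2)
```

In the setting of this paper, `matrix` would hold the pairwise trajectory distances, and items labelled `-1` would correspond to the trajectories treated as noise.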

2.3. Pedestrian Crossing Pattern Analysis

As presented in Figure 1, with trajectories successfully clustered, different trajectory patterns can be compared for analysis purposes. A case study involving six crosswalk locations was conducted, and trajectories were extracted from each of the six sites. For each site, the study compares trajectory patterns in terms of the average crossing speed and the average offset to the crosswalk center, defined as follows:
(i) The average crossing speed is the average speed of the individual pedestrian during the process of crossing the street
(ii) The average offset to the crosswalk center is the average distance of the pedestrian to the center line of the crosswalk marking area (measured in each video frame) during the entire crossing process
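The two measures can be sketched for a single trajectory as follows. The sketch assumes one (x, y) position in meters per video frame, a known frame rate, a crosswalk center line parallel to the x axis at `center_y`, and speed computed as path length over elapsed time. These are illustrative assumptions and may differ from the authors' exact computation.

```python
import math


def average_crossing_speed(points, fps):
    """Path length (m) divided by elapsed time (s) for one pedestrian trajectory."""
    path = sum(math.dist(points[i], points[i + 1]) for i in range(len(points) - 1))
    duration = (len(points) - 1) / fps
    return path / duration


def average_offset(points, center_y):
    """Mean absolute distance (m) to the crosswalk center line, over all frames."""
    return sum(abs(y - center_y) for _, y in points) / len(points)


# Made-up trajectory sampled at 2 frames per second.
trajectory = [(0.0, 2.0), (0.5, 2.0), (1.0, 2.0), (1.5, 2.0)]
speed = average_crossing_speed(trajectory, fps=2)  # 1.5 m travelled in 1.5 s
offset = average_offset(trajectory, center_y=1.5)  # 0.5 m from the center line
```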

The significance of the difference among patterns at the same site was tested using the Mann–Whitney U test/Kruskal–Wallis H test (the reason for using the method will be explained in the following section) to validate the clustering results (whether the clustering results can be clearly explained).

Furthermore, the clustering results at the six sites are analyzed and discussed. The impacts of attractions and of the facilities/locations that pedestrians are moving towards (e.g., subway stations) are discussed. Suggestions for countermeasures based on the clustering results are also provided.

3. Case Study

3.1. Study Site and Data Collection
3.1.1. Study Site

A case study was conducted involving six sites in Shanghai to validate the effectiveness and illustrate the application of the methodology. The research locations were selected according to the following principles: (1) sites located outside of no-fly zones and restricted fly zones, (2) minimal obstructions above pedestrian crossings, (3) proximity to facilities attracting a reasonable flow of pedestrians and vehicles, and (4) coverage of various intersection types, lane counts, and signalization scenarios. The six pedestrian crossing sites include four signalized intersections and two unsignalized intersections. The management of pedestrian crossings has been an issue according to the local police department. The details of the sites are provided (Figure 9) as follows:
(i) SITE 1: SITE 1 (Cao’an_Moyu) is located at the intersection of Cao’an Highway and Moyu Road. Moyu Road is a two-way, five-lane main road (three lanes north to south and two lanes south to north). A shopping mall lies on the northeast side of the site, and traffic flow is heavy from 10:30 to 12:00 in the morning.
(ii) SITE 2: SITE 2 (Guoding_Zhengmin) is located at the intersection of Guoding Road and Zhengmin Road. Guoding Road is a two-way four-lane road with a large traffic flow at 11:00–12:00 in the morning.
(iii) SITE 3: SITE 3 (Shuangdan_Yungu) is located at the intersection of Shuangdan Road and Yungu Road. Yungu Road is a two-way three-lane road. To the northwest of the site is a life square, and to the southeast are Wanda Mall and Jiading Metro Station. The crowd is more active during the evening peak period of 17:00–18:00.
(iv) SITE 4: SITE 4 (Daxue_Zhixing) is located at the intersection of Daxue Road and Zhixing Road, a three-legged intersection with a metro station to its south. Zhixing Road is a two-way two-lane road. The metro station, along with a shopping square, attracts a large number of people.
(v) SITE 5: SITE 5 (Changji_Yadan) is located at the intersection of Changji Road and Yadan Road. The southeast side is Changji East Road subway station, and the other side of the facility is connected to Changji East Road bus station. Therefore, a large number of people transfer from the bus to the subway.
(vi) SITE 6: SITE 6 (Anshan_Zhangwu) is located at the Y-type three-legged intersection of Anshan Road (finishing at the intersection) and Zhangwu Road (running east-west through the intersection). The studied crosswalk is located on Anshan Road, the south approach of the intersection. Anshan Road is a one-way, one-lane road.

3.1.2. Data Collection

As discussed, the DJI Mavic2 Pro UAV (3840 by 2160 pixels) was used for data collection. To cover the crosswalk area, the UAV was positioned approximately 30 m above the crosswalk, shooting vertically down toward the site. Since the UAV relies on a battery which lasts for less than 30 minutes, the battery of the UAV was changed every 20 minutes during data collection. Video data were collected for 1 hour at each site, and this could be effectively used for analysis in the study, as shown in Table 3.

3.2. Description of Trajectory Data Extracted
3.2.1. Trajectory Extraction and Correction

After the data were collected, the vision-based Deep-SORT-Yolov5 architecture was used for trajectory extraction. The raw trajectory data were extracted in image coordinates (with the upper-left corner of the video as the origin) and were measured in pixels. Meters-per-pixel (m/P) was calculated with reference to ground-truth measurements from the field. Then, for the convenience of calculation and analysis, the image coordinate system was converted into a distance coordinate system measured in meters, with the upper-left point at the start of the crosswalk (marking) as the origin. The conversion of the coordinate system is presented in Figure 10.
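The conversion can be sketched as a shift of origin followed by a uniform scale. The sketch assumes that the distance coordinate system keeps the axis directions of the image (no rotation) and that a single meters-per-pixel factor applies over the whole frame. The reference length, pixel values, and function names are invented for illustration.

```python
def scale_from_reference(ground_length_m, pixel_length):
    """Meters per pixel, from a field-measured length and its length in the image."""
    return ground_length_m / pixel_length


def pixel_to_meters(px, py, origin_px, meters_per_pixel):
    """Shift the origin to the crosswalk corner, then convert pixels to meters."""
    ox, oy = origin_px
    return ((px - ox) * meters_per_pixel, (py - oy) * meters_per_pixel)


# Hypothetical calibration: a 6 m marking spans 240 px in the video frame.
m_per_px = scale_from_reference(6.0, 240)
# Hypothetical crosswalk corner at pixel (1000, 500) in a 3840 x 2160 frame.
x_m, y_m = pixel_to_meters(1240, 700, origin_px=(1000, 500), meters_per_pixel=m_per_px)
```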

Despite the good performance of tracking using Deep-SORT-Yolov5 and vertical-angle UAV video, the pedestrian trajectories still had common remaining issues, including (1) multiple pedestrians tracked as one and (2) one individual pedestrian tracked as several disconnected trajectories. A simple self-developed tool was applied to correct the erroneous trajectories. The processing results are shown in Figure 11. The processing rules are as follows:
(1) Matching the object IDs in the data with the video for classification, inspection, and correction.
(2) If a trajectory has missing portions at the beginning or end and exhibits a significant gap, it is considered an invalid trajectory and is removed.
(3) If a trajectory has a missing segment in the middle and is too short, a splicing process is applied to connect the two segments belonging to the same trajectory. The splicing procedure includes the following steps:
(i) Splitting the discontinuous trajectory under the same ID into two segments and obtaining the starting and ending coordinates of each segment
(ii) Calculating the distance between the starting point of the i-th segment and the ending points of all other segments to find the minimum distance
(iii) Calculating the difference in frames between the starting point of the i-th segment and the ending points of all other segments to find the minimum difference
(4) Verifying whether the two candidate segments belong to the same trajectory. If they do, the two segments are connected and missing frames are filled using linear interpolation to generate a new longer trajectory
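The splicing rule can be sketched as two small helpers. One looks for the earlier segment whose end is closest to the start of a given segment, and the other joins two segments and fills the missing frames by linear interpolation. The distance and frame-gap thresholds, the (frame, x, y) tuple layout, and the function names are assumptions for illustration; the self-developed tool itself is not described in that detail.

```python
import math


def find_match(segment, others, max_dist_m, max_gap_frames):
    """Index of the segment in `others` whose end best precedes `segment`, or None."""
    f_start, x_start, y_start = segment[0]
    best = None
    for idx, other in enumerate(others):
        f_end, x_end, y_end = other[-1]
        gap = f_start - f_end
        d = math.dist((x_start, y_start), (x_end, y_end))
        if 0 < gap <= max_gap_frames and d <= max_dist_m:
            if best is None or d < best[1]:
                best = (idx, d)
    return None if best is None else best[0]


def splice(first, second):
    """Join two segments, filling the frames between them by linear interpolation."""
    f0, x0, y0 = first[-1]
    f1, x1, y1 = second[0]
    filled = []
    for f in range(f0 + 1, f1):
        t = (f - f0) / (f1 - f0)
        filled.append((f, x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return first + filled + second


# Made-up segments of one pedestrian, with frames 2 and 3 missing.
seg_a = [(0, 0.0, 1.0), (1, 0.5, 1.0)]
seg_b = [(4, 2.0, 1.0), (5, 2.5, 1.0)]
match = find_match(seg_b, [seg_a], max_dist_m=2.0, max_gap_frames=5)
joined = splice(seg_a, seg_b) if match is not None else None
```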

3.2.2. Statistical Description of Trajectory Data

A total of 2154 continuous pedestrian trajectories were obtained (315, 359, 445, 414, 266, and 355 at the six sites, respectively). For the statistical description of the trajectory data, the average crossing speed and the average offset to the crosswalk center, which have traditionally been used to measure the pedestrian crossing process, are used. The detailed statistics, as well as the distribution histograms of the measurements for different trajectory groups, are given in Table 4 and Figure 12.

A histogram is a useful tool for understanding the distribution of data. By analyzing the frequency distribution histograms of the average crossing speed and average offset for pedestrians at different sites, it is clear that the distribution shapes of these two parameters are different. Hence, this study compared the average crossing speed and average offset metrics among the six locations, resulting in rankings for each of the six locations. Among all the sites, pedestrians at SITE 5 have the smallest average offset value and the largest average crossing speed. Conversely, pedestrians at SITE 1 have the largest average offset value and the smallest crossing speed.

3.3. Clustering Results and Analysis
3.3.1. Trajectory Pattern Clustering

The improved DBSCAN algorithm was applied separately to pedestrians walking in each direction at each site. For clustering, the parameters (Eps and MinPts) were first determined based on the distance curves derived from the distance matrix and their first derivatives. Figures 13–18 show the distance curves and the distance difference curves for the eleven trajectory groups at the six sites (the east-west data at SITE 5 are insufficient, so no directional analysis is performed for SITE 5). Eps and MinPts were determined for each site as follows:
(1) SITE 1: according to the curve outputs, for trajectories of pedestrians walking in the west-to-east direction, the maximum change of the distance curve (determined by the distance difference curve, Figure 13(b)) occurs when k = 4. Therefore, the optimal MinPts = 4. From Figure 13(a), when k = 4, Eps is around 0.6; thus, we determined the parameters (MinPts, Eps) = (4, 0.6). For the east-to-west direction, (MinPts, Eps) = (3, 0.55). The same process was applied to the other sites.
(2) SITE 2: west-east (MinPts, Eps) = (4, 0.59) (Figure 14(a)). East-west (MinPts, Eps) = (8, 0.88) (Figure 14(b))
(3) SITE 3: west-east (MinPts, Eps) = (4, 0.90) (Figure 15(a)). East-west (MinPts, Eps) = (5, 0.80) (Figure 15(b))
(4) SITE 4: west-east (MinPts, Eps) = (7, 0.65) (Figure 16(a)). East-west (MinPts, Eps) = (7, 0.55) (Figure 16(b))
(5) SITE 5: due to the particularity of pedestrian distribution at the intersection of Changji Road, only the one-way pedestrian crossing mode is analyzed, and (MinPts, Eps) = (3, 0.53) is obtained from Figure 17
(6) SITE 6: west-east (MinPts, Eps) = (7, 0.55) (Figure 18(a)). East-west (MinPts, Eps) = (14, 1.05) (Figure 18(b))
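One possible reading of this selection procedure is sketched below. It is an interpretation, not the method of reference [39]. The distance curve is taken to be the mean distance to the k-th nearest neighbour as a function of k, the difference curve is its forward difference, MinPts is the k at which the difference is largest, and Eps is the curve value at that k. The curve values are invented so that the outcome resembles the SITE 1 west-to-east case.

```python
def k_distance_curve(dist, k_max):
    """curve[k-1] = mean distance from each item to its k-th nearest other item."""
    n = len(dist)
    ordered = [sorted(dist[i][j] for j in range(n) if j != i) for i in range(n)]
    return [sum(row[k - 1] for row in ordered) / n for k in range(1, k_max + 1)]


def select_parameters(curve):
    """Pick (MinPts, Eps) at the k with the largest forward change of the curve."""
    diffs = [curve[i + 1] - curve[i] for i in range(len(curve) - 1)]
    best = max(range(len(diffs)), key=lambda i: diffs[i])
    min_pts = best + 1  # diffs[i] is the change from k = i + 1 to k = i + 2
    return min_pts, curve[min_pts - 1]


# Made-up distance curve for k = 1..6 with a jump after k = 4.
example_curve = [0.30, 0.40, 0.50, 0.60, 1.10, 1.20]
min_pts, eps = select_parameters(example_curve)

# Tiny made-up distance matrix (1-D positions 0, 1, 3) to exercise the curve helper.
toy = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
toy_curve = k_distance_curve(toy, k_max=2)
```

Other conventions, such as a backward difference or reading Eps from a per-item sorted k-distance plot, would give different values, so the sketch only illustrates the general idea.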

With these parameters, the trajectory groups were then clustered. According to the K–S (Kolmogorov–Smirnov) test, the data groups to be compared do not all conform to the normal distribution, so nonparametric tests are adopted: the Mann–Whitney U test is used to compare two data groups, and the Kruskal–Wallis H test is used when more than two groups are compared.
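For readers unfamiliar with the rank-based comparison, the following sketch computes the Mann–Whitney U statistic for two groups and a two-sided p-value from the normal approximation. It omits tie and continuity corrections, and the speed values are invented. In practice, library routines such as `scipy.stats.mannwhitneyu` and `scipy.stats.kruskal` would normally be used.

```python
import math


def mann_whitney_u(x, y):
    """U statistics for both groups: count pairs where one value exceeds the other."""
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5  # ties contribute half a count
    return u_x, len(x) * len(y) - u_x


def normal_approx_p(u, n1, n2):
    """Two-sided p-value from the normal approximation (no tie correction)."""
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return z, math.erfc(abs(z) / math.sqrt(2))


# Made-up average crossing speeds (m/s) for two clustered patterns.
pattern_a = [1.1, 1.2, 1.3, 1.4]
pattern_b = [1.5, 1.6, 1.7, 1.8]
u_a, u_b = mann_whitney_u(pattern_a, pattern_b)
z, p = normal_approx_p(u_a, len(pattern_a), len(pattern_b))
```

For samples this small, an exact test is preferable to the normal approximation; the approximation is shown only because it fits in a few lines.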

Results of clustering and the comparison of the different clustered trajectory patterns, within each trajectory group, are provided in Table 5. At SITE 1, pedestrian trajectories are divided into four groups in the west-east direction, while only two groups are identified in the other direction. Pedestrian trajectories along the west-east direction at SITE 2 were clustered into two different patterns, while those along the east-west direction were clustered into three patterns. At SITE 3, the west-east pedestrian trajectories are clustered into four patterns and the east-west trajectories into two. At SITE 4, two patterns are identified in the west-east direction and two in the east-west direction. At SITE 5, due to the particularity of its pedestrian distribution, only the west-east pedestrian trajectories are clustered, and only one crossing pattern is found. At SITE 6, two crossing patterns were identified for pedestrians walking from west to east, and three for those walking from east to west.

Comparisons were made among different patterns within each individual trajectory group, in terms of the average crossing speed and the average offset to the crosswalk center, based on the significance of difference. From the comparison results, as shown in Table 5, the clustered results (patterns identified) were all significantly different from each other, in terms of the average crossing speed and the average offset to the crosswalk center. This indicates that the proposed improved DBSCAN method can effectively identify potential features and automatically cluster trajectories based on these features, even though the selected variables may only partially describe the pedestrian trajectories during the crossing process.

3.3.2. Analysis of Study Sites According to the Trajectory Pattern

Pedestrian crossing patterns were further investigated for each site, based on the clustered results. Results are provided in Figure 19. In the figures, blue arrows give the direction of the pedestrians and trajectories within different cluster types are represented by different colors (a few gray ones were those identified as noise). In different cluster types, a solid line of the same color as the trajectory is used to indicate the central position of the trajectory distribution of that type. 95% of the trajectories are distributed within the range enclosed by the dotted lines on both sides of the solid line. In the figures, both the clustered outputs in the X–Y distance coordinates and their projections in the aerial view of the crosswalk are provided for visualization and analysis purposes.

Pedestrians are categorized into three crossing styles: conservative, ordinary, and adventurous, for risk analysis of their behavior during the street crossing. Conservative pedestrians have an average crossing offset concentrated within 0–2 meters, and they consistently stay within the pedestrian crosswalk markings during the crossing, benefitting from the protection provided by the crosswalk. Ordinary pedestrians exhibit an average crossing offset within the range of 2–4 meters. Some of their trajectories deviate slightly from the pedestrian crosswalk, but they are generally safe during the crossing. Adventurous pedestrians have an average crossing offset exceeding 4 meters, entirely departing from the pedestrian crosswalk markings, exposing themselves to vehicular traffic, and thus engaging in a higher-risk crossing behavior.
(i) W-E direction at SITE 1: the trajectory of the red pattern is inclined towards the north side, and over 95% of the trajectories are distributed outside the pedestrian crossing. The average pedestrian crossing speed in this mode is 1.5 m/s, which belongs to the ordinary crossing style. Pedestrians in this pattern may be attracted by the shopping center on the southeast side. The trajectory of the orange pattern has the same lateral offset trend, but because the starting point is on the south side, the trajectory is distributed entirely within the pedestrian crossing range. The average pedestrian crossing speed in this mode is 1.4 m/s, which belongs to the ordinary crossing style. The deep green and light green trajectories are evenly distributed within the pedestrian crossing range. The average pedestrian crossing speed in both modes is 0.8–1.3 m/s, which belongs to the conservative crossing style. According to the analysis, the crossing behavior of pedestrians in the red pattern should be appropriately regulated.
(ii) E-W direction at SITE 1: the trajectories are divided into two density clusters in the north-south direction, with a higher proportion of brown patterns, which may be related to the habit of Chinese pedestrians walking on the right side. Over 95% of the red trajectories are distributed on the pedestrian crossing, with an average crossing speed of 1.2 m/s, which belongs to the conservative style. About 30% of the trajectories on the east side of the brown cluster are outside the pedestrian crossing, with an average crossing speed of 1.4 m/s, belonging to the ordinary style.
(iii) W-E direction at SITE 2: the blue trajectories are uniformly distributed on the pedestrian crossing, with an average crossing speed of 1.0 m/s, belonging to the conservative style. The red trajectories have an initial trend of moving northward (95% of the trajectories are distributed outside the pedestrian crossing), and their endpoint coincides with that of the blue trajectories. This pattern also belongs to the conservative style. The reason for this phenomenon may be that the utility poles and lampposts across the road lead pedestrians to take avoiding action. Therefore, the existence of supporting facilities has a certain degree of impact on pedestrian trajectories.
(iv) E-W direction at SITE 2: the three types of trajectories converge from both sides of the pedestrian crossing under the obstruction of road facilities. The green trajectory is constantly exposed to vehicle traffic and has an average crossing speed of 1.7 m/s, belonging to the adventurous style. The latter half of the blue trajectory returns to the pedestrian crossing, with an average crossing speed belonging to the ordinary style. At the beginning of the crossing, 35% of the pink trajectory is distributed outside the pedestrian crossing. In terms of average crossing speed, 80% of these pedestrians belong to the conservative style and 20% to the ordinary style.
(v) W-E direction at SITE 3: more pedestrians come from the shopping mall and subway station, so the orange and purple trajectories have the largest proportion. Pedestrians on the orange trajectory do not walk directly along the line connecting the start and end points to save time but choose to detour along the pedestrian crossing. The remaining three trajectories have over 95% of their length inside the pedestrian crossing and are thus protected by it. In terms of average crossing speed, 25% of the pedestrians at this location belong to the ordinary style and 75% to the conservative style.
(vi) E-W direction at SITE 3: pedestrian trajectories in this direction are evenly clustered into north and south clusters. The pink trajectory is similar to the orange trajectory from west to east, but the departure and destination of the two trajectories are opposite. Pedestrians on the brown trajectories cross the crosswalk along the road. The pedestrian trajectories in this direction all represent safe crossing strategies.
(vii) W-E direction at SITE 4: the yellow trajectory is mostly distributed outside of the pedestrian crosswalk (over 95%), with an average crossing speed of 1.6 m/s, belonging to the adventurous style. The red trajectory is evenly distributed inside the pedestrian crosswalk, with a tendency to shift towards the south in the later stage. 20% of these pedestrians belong to the conservative style, and 80% belong to the ordinary style. A subway station and a square in front of the station are located on the southeast side of the pedestrian crossing, while an office building is situated on the northeast side. The southward shift of pedestrians may be due to the attraction of the subway station and plaza.
(viii) E-W direction at SITE 4: similar to the result for the W-E direction, one crossing pattern (green) had most of its trajectories within the crosswalk marking, while trajectories clustered as the blue pattern were mostly on the south side, outside the crosswalk marking area, mainly due to the dispersion of pedestrians from the metro and square. The blue pattern should be avoided as pedestrians are less protected walking outside the marking.
(ix) W-E direction at SITE 5: the average crossing speed fluctuates around 1.3 m/s. The reason may be that the intersection has no signal to guide pedestrians across the street and no guardrails or other supporting facilities, so pedestrians are not subject to any restrictions. In addition, the overlap between the trajectory distribution and the crosswalk marking is low, which indicates that the geometric design of the crosswalk is unreasonable and that the facilities should be replanned according to the distribution of pedestrian crossing trajectories.
(x) W-E direction at SITE 6: blue trajectories are for those pedestrians who are walking from the south-west sidewalk on Anshan Road, and green ones are for those walking from the south-west sidewalk on Zhangwu Road. Results show that 40% of pedestrians walking from the south-west sidewalk on Zhangwu Road tended to cross outside the marking. The results indicate that the geometric design of the intersection and the design of the crosswalk marking provide better protection for pedestrians walking from the south-west sidewalk on Anshan Road (over 95%).
(xi) E-W direction at SITE 6: similar results can be found as in the W-E direction; the number of pedestrians walking towards the south-west sidewalk on Zhangwu Road is higher than the number walking towards the south-west sidewalk on Anshan Road.
Among the pedestrians walking towards Zhangwu Road, their walking patterns were successfully clustered into two. The green ones fall mostly within the crosswalk marking area, while the purple trajectories are outside the marking.
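
The three crossing styles defined at the start of this discussion reduce to a threshold rule on the average offset. The following is a minimal Python sketch of that rule, not the authors' implementation. It assumes that offsets are measured in meters from the crosswalk center line and that the boundary values 2 m and 4 m fall into the lower class, since the text does not state which side they belong to. The function name `crossing_style` is hypothetical.

```python
from statistics import mean


def crossing_style(offsets_m):
    """Classify one trajectory by its mean absolute lateral offset (meters).

    Thresholds follow the text: 0-2 m conservative, 2-4 m ordinary,
    above 4 m adventurous. Treating exactly 2 m and 4 m as the lower
    class is an assumption of this sketch.
    """
    if not offsets_m:
        raise ValueError("trajectory has no offset samples")
    avg = mean(abs(o) for o in offsets_m)
    if avg <= 2.0:
        return "conservative"
    if avg <= 4.0:
        return "ordinary"
    return "adventurous"


if __name__ == "__main__":
    # Offsets sampled along three illustrative (made-up) trajectories.
    print(crossing_style([0.5, 1.0, 1.5]))
    print(crossing_style([2.5, 3.5]))
    print(crossing_style([5.0, 6.0]))
```

Taking the absolute value treats deviations to either side of the center line alike; a signed offset would be needed to distinguish northward from southward drift.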

At present, road signs and markings are designed mainly to meet the needs of vehicles. In pursuit of efficient traffic flow, the needs of pedestrians crossing the street are neglected, so many crosswalk markings do not match how pedestrians actually cross. On the one hand, unreasonable crossing facilities reduce the efficiency of pedestrian crossing; for example, the large number of pedestrians during the peak period causes congestion within the crowd. On the other hand, they increase the probability of pedestrians overflowing the crosswalk markings, and overflowing pedestrians are exposed to the traffic flow, which poses a potential threat to their personal safety.

Pedestrians adopt different crossing modes under the combined influence of external factors. These factors include the design of crosswalks, the presence of ancillary facilities such as guardrails or isolation belts, and the nature of surrounding buildings. For example, buildings such as shopping malls and subway stations attract pedestrian traffic and thereby create demand for street crossings, while crosswalks and guardrails help direct that traffic. By analyzing the causes of abnormal trajectory patterns, suggestions can be made for improving intersection facilities and limiting the occurrence of abnormal trajectories.

To assess the degree of pedestrian overflow at the study sites, the pedestrian distribution during the peak period is selected to analyze the boundary threshold of the crosswalk width. By drawing the pedestrian trajectory heat map for peak hours and projecting it onto the UAV aerial image, the location with the highest probability of pedestrian overflow can be identified. Taking into account the distribution of each trajectory cluster, suggestions for optimizing road facilities are proposed so that 95% of each cluster is protected by the pedestrian crossing.
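
The 95% protection target can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions rather than the procedure used in the study: each trajectory point is reduced to a lateral coordinate in meters across the crosswalk, and the narrowest contiguous band holding at least 95% of the samples stands in for the boundary threshold. The function names `coverage` and `band_covering` are hypothetical.

```python
import math


def coverage(lateral_m, lower_m, upper_m):
    """Share of samples whose lateral position lies inside [lower_m, upper_m]."""
    if not lateral_m:
        raise ValueError("no samples")
    inside = sum(lower_m <= y <= upper_m for y in lateral_m)
    return inside / len(lateral_m)


def band_covering(lateral_m, share=0.95):
    """Narrowest contiguous band that contains at least `share` of the samples."""
    if not lateral_m:
        raise ValueError("no samples")
    ys = sorted(lateral_m)
    # Number of samples the band must contain; the small epsilon guards
    # against floating-point round-up when share * n is a whole number.
    k = max(1, math.ceil(share * len(ys) - 1e-9))
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    start = min(range(len(ys) - k + 1), key=lambda i: ys[i + k - 1] - ys[i])
    return ys[start], ys[start + k - 1]


if __name__ == "__main__":
    # Made-up lateral positions: 19 points near the marking, one far outlier.
    samples = [i / 10 for i in range(19)] + [9.0]
    print(coverage(samples, 0.0, 1.5))   # share protected by a 0-1.5 m marking
    print(band_covering(samples, 0.95))  # band needed to protect 95%
```

Comparing the returned band with the existing marking limits indicates in which direction, and by roughly how much, a marking would have to be extended.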

As shown in Figure 20, color represents trajectory density, which increases from yellow to red.
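
A heat map of this kind amounts to counting trajectory points per grid cell. The sketch below shows one way to locate the densest overflow area. It assumes points are given as (x, y) positions in meters, that the crosswalk occupies the band `y_min` to `y_max`, and that a 0.5 m cell size is adequate. None of these choices comes from the study, and the function names are hypothetical.

```python
from collections import Counter


def density_grid(points, cell_m=0.5):
    """Count points per square grid cell; keys are (column, row) indices."""
    return Counter((int(x // cell_m), int(y // cell_m)) for x, y in points)


def densest_overflow_cell(points, y_min, y_max, cell_m=0.5):
    """Return (cell, count) for the densest cell among points outside the crosswalk band."""
    outside = [(x, y) for x, y in points if not (y_min <= y <= y_max)]
    if not outside:
        return None
    return density_grid(outside, cell_m).most_common(1)[0]


if __name__ == "__main__":
    # Made-up points; the crosswalk is assumed to span y = 0 to 3 m.
    pts = [(1.1, 3.2), (1.2, 3.3), (1.3, 3.4), (4.0, 1.0), (6.0, -0.7)]
    print(densest_overflow_cell(pts, y_min=0.0, y_max=3.0))
```

Mapping cell counts to a yellow-to-red color scale and overlaying them on the aerial image would give a picture comparable in spirit to Figure 20.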

Based on the results, the following suggestions for improvements or countermeasures are provided:
(i) For SITE 1: the clustering results show that the overflow data come mainly from the west-east red trajectory (Figure 19). The main reason is that pedestrians are attracted by the comprehensive shopping mall on the northeast side. An effective measure to regulate how such pedestrians cross the street is to extend the isolation zone on the east side, thereby preventing pedestrians from veering off early (Figure 21). The extension length is determined from the 95% dotted-line position of the red trajectory.
(ii) For SITE 2: a considerable share of the data points at this site fall beside the crosswalk on the roadway side, which greatly increases the risk of pedestrian crossing. The overflow density is highest on the upper left side of the crosswalk. An effective way to regulate such crossing modes is to extend the guardrail (Figure 22). Regulating this behavior is expected to greatly reduce the spillover rate.
(iii) For SITE 3: the pedestrian trajectories at SITE 3 are mostly concentrated on the left side of the crosswalk. Pedestrians tend to drift unconsciously towards the source of attraction while keeping some distance from the traffic flow, which produces an arc-shaped trajectory. Korean designer Jae Min Lim presented a new crosswalk called “Ergo Crosswalk” (Figure 23) at the 2010 Seoul Design Fair. The outline of the whole marking is called the “meniscus,” with two wide ends and a narrow middle, which fits people’s arc-shaped crossing trajectories and can guide them to cross in a regulated way. This crescent-shaped crosswalk design can serve as a reference. At the same time, the stop line can be moved back or given a serrated design (Figure 24), which can effectively keep pedestrian trajectories within the crosswalk.
(iv) For SITE 4: unlike the sites above, SITE 4 has no facilities to restrict pedestrian crossing, resulting in a wider distribution of pedestrians. Therefore, in view of pedestrian psychology in this crossing mode, crosswalk signs can be placed at the guide sign outside the subway station or at the station entrances and exits to remind pedestrians to use the crosswalk (Figure 25). However, given the large number of overflowing pedestrians, this method can only serve as a warning for some pedestrians. A more effective method is to widen the crosswalk marking. The site is located at a three-way intersection, and vehicles cannot enter the pedestrian crossing while pedestrians are passing through, so the pedestrian crossing can be widened from north to south. According to China’s “urban road traffic signs and markings set specifications,” the width of a crosswalk on urban roads should be greater than or equal to 3 m, and it should be widened in increments of 1 m (The Ministry of Public Security of the People’s Republic of China and Ministry of Housing and Urban-Rural Development of the People’s [40]).
(v) For SITE 5: the trend of the pedestrian crossing trajectories is completely inconsistent with the marking design of the crosswalk, so the geometric design of the crossing facilities needs to be reshaped according to how pedestrians cross in their natural state. First, the area with the highest density is identified from the heat map, and the shape of the crosswalk is roughly determined. Next, a reasonable crosswalk width is set according to the boundary of the pedestrian area. The scope is then determined from the 95% dotted-line position of the red trajectory. Because the subway station attracts pedestrians, they deflect considerably in the later stage of crossing; they can be kept within the crosswalk by extending the greening facilities in the lower left corner or installing a short section of guardrail (Figure 26).
(vi) For SITE 6: pedestrians walk outside the crosswalk to save crossing time. The crosswalk marking successfully protects pedestrians walking both from and towards the south-west sidewalk on Anshan Road but fails to provide a good shield for those crossing from and towards the south-west sidewalk on Zhangwu Road. A suitable solution is to expand the crosswalk marking towards the north. Vehicles travel along fixed lanes, subject to specific traffic rules, turning angles, and inertial constraints, so their trajectories are far less random than those of crossing pedestrians. Overall, the optimized crosswalk can more effectively regulate the behavior of both pedestrians and vehicles and can improve traffic efficiency. In this way, the crosswalk marking can cover a higher proportion of pedestrian crossings, and pedestrians may be more willing to walk on the crosswalk (Figure 27).

4. Conclusions

This paper investigates the pedestrian crossing process, an important aspect of behavior that is closely associated with safety but has received little attention. To improve the efficiency of data collection from multiple study sites, an easily applied and low-cost method is used that combines UAV video collection with a vision-based tracking tool for trajectory extraction. The Deep-SORT-Yolov5 architecture is used to process the video data and extract pedestrian trajectories. By replacing the Euclidean point distance measure with a distance matrix describing the distance between trajectories, an improved DBSCAN method is proposed for clustering pedestrian patterns in terms of the shape and offsets of trajectories. The proposed methodology, including the data collection method based on UAV, trajectory extraction using Deep-SORT-Yolov5, and pattern recognition using the improved DBSCAN, is applied in a case study involving six crosswalk locations in Shanghai, China. Pedestrians are divided by walking direction, and the two groups walking in opposite directions on each crosswalk are analyzed separately. The crossing patterns obtained from clustering are compared, and the character of the patterns, the key factors contributing to different patterns, and potential solutions for avoiding improper crossing patterns are discussed. The following key conclusions can be made:
(i) Tested through the case study, the data collection method using UAV and the vision-based Deep-SORT-Yolov5 tracking architecture has proved convenient, time-saving, and flexible, with good data quality. Compared with traditional fixed traffic cameras, UAVs have stronger mobility, a larger field of view, lower cost, and fewer operational space restrictions [41]. Meanwhile, good coverage is achieved for collecting high-quality data, demonstrating the suitability of this method for data collection.
(ii) Onsite observation with manual recording is time-consuming and laborious and is often subject to significant subjective influence from the observer. It often judges the severity of conflicts based on individual events and fails to reflect the continuous evolution of behaviors. Trajectories can provide more detailed, accurate, objective, and comprehensive data. Most importantly, trajectory data containing information such as position and time can help analyze the patterns of pedestrian crossing behavior.
(iii) Results from clustering show that the improved DBSCAN is able to describe the features of the pedestrian crossing process: the trajectories of different pattern types differ significantly, as measured by two typical pedestrian crossing measures, the average walking speed and the average offset to the center of the crosswalk.
(iv) In the case study, observations of the pedestrian crossing process are clustered for pedestrians walking in different directions at the six study sites. Improper crossing patterns are identified, and the main reasons for such patterns are explained. Based on the clustering results, practical treatment suggestions are made for the issues identified. Overall, the methodology proposed in this paper has shown good performance in investigating the pedestrian crossing process.
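
The idea of running DBSCAN on a trajectory-to-trajectory distance matrix, rather than on Euclidean distances between individual points, can be sketched as follows. This is a plain DBSCAN written for illustration, not the improved algorithm of the paper. The trajectory distance used here (mean gap between corresponding points of equally sampled trajectories) is an assumption, as are the `eps` and `min_pts` values, and all function names are hypothetical.

```python
import math


def trajectory_distance(a, b):
    """Mean gap between corresponding points (an assumed measure, not the paper's)."""
    if len(a) != len(b) or not a:
        raise ValueError("trajectories must be non-empty and equally sampled")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def distance_matrix(trajs):
    """Symmetric matrix of pairwise trajectory distances."""
    n = len(trajs)
    return [[trajectory_distance(trajs[i], trajs[j]) for j in range(n)]
            for i in range(n)]


def dbscan(dist, eps, min_pts):
    """Plain DBSCAN over a precomputed distance matrix; label -1 marks noise."""
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbours) < min_pts:
            labels[i] = -1          # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = [k for k in range(n) if dist[j][k] <= eps]
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep expanding
    return labels


if __name__ == "__main__":
    # Made-up straight trajectories at different lateral offsets (meters).
    offsets = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4, 20.0]
    trajs = [[(x, y) for x in (0, 1, 2)] for y in offsets]
    print(dbscan(distance_matrix(trajs), eps=0.5, min_pts=2))
```

Because the clustering step only ever reads the matrix, a different trajectory distance can be substituted without touching the `dbscan` function.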

As a key contribution, the study provides a novel approach to investigating pedestrian crossing behavior from the perspective of the crossing process, which will further contribute to studying pedestrian safety and behavior in a more comprehensive way. In addition, the study provides a practical and convenient way of conducting traffic safety analysis, benefiting from the flexibility of UAV data collection, the detailed and formatted information in trajectory data processed by deep-learning tracking algorithms, and advanced measures for safety and behavior analysis.

While the study has several advantages associated with the use of UAVs, limitations do exist. The reliance on battery power limits the duration of data collection to a maximum of half an hour. Furthermore, obtaining permission from the city municipality to fly a UAV above urban roads adds to the difficulty of data collection. As a result, the amount of data collected is insufficient. In addition, the paper only proposes a “prototype” method for investigating the pedestrian crossing process using a distance measure to cluster patterns of trajectories. However, different trajectory features should be further considered. For future work, the proposed methodology will be updated with vision-based tracking technology, more advanced trajectory mining models capable of considering different trajectory features, and the use of long-lasting data collection equipment available in the UAV industry. Investigations into the effects of various environmental and traffic factors on the pedestrian crossing process will also be conducted using the data collected from different locations.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.

Authors’ Contributions

Ting Fu drafted and prepared the manuscript, analyzed and interpreted the results, and revised the manuscript. Shuke Xie conceptualised the study, drafted and prepared the manuscript, designed the study, and revised the manuscript. Rubing Li collected the data and revised the manuscript. Junhua Wang conceptualised and designed the study and revised the manuscript. Lanfang Zhang and Anae Sobhani analyzed the study, interpreted the results, and revised the manuscript. Shou’en Fang conceptualised and designed the study and revised the manuscript.

Acknowledgments

This research was mainly supported by the Chinese National Natural Science Foundation (72001161), the Fundamental Research Funds for the Central Universities of China (2022-5-ZD-05), and the Shanghai Sailing Program (20YF1451800).