Abstract

Parking lots have complex structures, diverse functions, and many interacting elements. Frequent vehicle movements in narrow, dimly lit spaces increase the probability of traffic accidents. Because parking lot accidents are usually of low severity and relevant data are scarce, their safety has received limited analysis. This study integrates multisource data to establish a Bayesian diagnostic model for parking lot accidents. Before modeling, the mutual information method is used to screen candidate influencing factors, which reduces the subjectivity of the Bayesian network. The cause-and-effect analysis supports both diagnosis and prediction of property damage and event causes, revealing correlations between factors and accident characteristics as well as the consequences of multiple factor chains. Because the developed model achieves good accuracy, this study uses its results to propose a parking lot safety evaluation system with a library of countermeasures. Combined with ITS technology, the system is highly scalable and adaptable to multiple scenarios.

1. Introduction

Automobiles have brought convenience and comfort to people’s lives but have also caused significant urban problems, including growing parking demand and limited parking space [1]. Compared with on-street parking, parking lots have become an important facility for relieving parking pressure owing to their zoning, advanced management methods, and user-friendly experience. With the rapid development of intelligent transportation systems (ITS), research on parking lot systems has intensified, covering parking guidance algorithms [1, 2], space allocation [3], and dynamic parking pricing [4]. These works concentrate on parking lot efficiency and mobility; fewer studies have addressed safety inside parking lots, which is another critical consideration in ITS besides efficiency [5].

Numerous studies have examined risk factors and prevention strategies for typical road components (e.g., intersections [6] and curved roadways [7]) and crash types (e.g., hazmat transportation [8] and rear-end crashes [9]). However, research on accident safety in parking lots is sparse. Parking lots have large spaces, complex internal structures, and changing illumination conditions. With the advancement of ITS, parking lots will accommodate diverse interactions among human-driven vehicles (HV), automated vehicles (AV), electric vehicles (EV), and pedestrians. Existing studies have found that AVs are more prone to malfunction in parking lots because of the complexity of the environment [10]. Improving the ability of AVs to sense their surroundings in complex scenarios such as parking lots is therefore also a research focus; examples include collision avoidance using additional signaling devices that detect a car in front of pedestrians [11] and global collision-free path planning for AVs in underground parking spaces [12]. Meanwhile, despite low vehicle speeds, frequent blind spots and narrow entrance and exit ramps raise safety concerns and can cause property damage and casualties. Pedestrian tracking under occlusion in parking lots [13], real-time vehicle positioning in underground parking lots based on high-precision maps, and multisensor fusion positioning have also been widely discussed [14]. The combined effects of ITS and complex internal environments may create new parking safety issues, so parking lot safety analyses are urgently needed.

The main contribution of this study is a quantitative model for analyzing the causes of, and key factors in, parking lot safety. The model improves understanding of the causes of traffic accidents in parking lots and contributes to the theoretical basis for applying intelligent transportation in parking lots. The intelligent deployment of parking lots relies on mature V2X, Internet of Things (IoT), and AV technologies. The key factors and findings of this study could inform the design focus and control strategies of these technologies: for example, the environmental elements (e.g., light, ground surface) that smart infrastructure needs to sense, and the factors that AVs should pay additional attention to in parking lots so as to better identify their surroundings and avoid collisions. This would improve the efficiency of decision-making and avoid wasted resources.

This study combines accident-based multisource traffic data with Bayesian network analysis. Focusing on parking lot accidents, we use a mutual information approach to reduce the subjective influence of expert knowledge on model performance and to filter out the critical indicators affecting parking lot incidents. Based on the Bayesian network, a multidimensional analysis is conducted for the dual objectives of cause (event cause) and effect (property damage), and the influence of each factor on both objectives is obtained. The influence of single factors on accident consequences is then examined from five aspects: human, vehicle, road, natural environment, and social environment. Using the mathematical expectation of property damage, we further identify the logical chains of factors that most affect parking lot accidents. Based on the model analysis, corresponding countermeasures are established and a parking lot safety evaluation system is proposed. The system can provide technical support as parking lots adapt to emerging technologies and offers practical guidance for intelligent parking lot design and management.

2. Literature Review

2.1. Feature Selection and Accident Analysis

Over the past few decades, feature selection and accident analysis have attracted considerable research interest [15]. Feature selection reduces the dimensionality of a dataset and can improve model performance while helping to explain the causes of accidents and improve road safety. Mutual information is commonly used to select features and to quantify the degree of dependence between variables. Gu et al. [16] proposed a feature selection algorithm based on conditional mutual information with conditional correlation and minimum redundancy (CMI-MRMR) that reduces data dimensionality while retaining as much information as possible. Chen et al. [17] proposed a Maximin Conditional Mutual Information (MMCMI) ranking method that obtains better classification results in less time.

Since 2009, various statistical models have been developed to analyze traffic accidents [18]. As data volumes have grown, more data-driven methods have been applied to accident analysis, such as support vector machines, artificial neural networks, XGBoost [19], and deep learning. However, algorithms such as deep learning depend heavily on large amounts of data, and their “black box” nature hides the internal logic linking factors to accident causes. Bayesian networks, in contrast, are widely used because they do not require large amounts of data and can represent the correlations and causal logic among multiple factors. Bayesian networks can also be flexibly combined with other methods to target specific problems [20–22], such as quality reliability analysis with Monte Carlo simulation [23] and risk ranking that combines random forest predictions with a Bayesian network [21]. The parking lot sample is not large enough to exploit the advantages of machine learning, and our aim is to understand the logical relationships between factors. We therefore combine mutual information with Bayesian networks for modeling, a combination that has shown advantages in accident analysis [22, 24].

2.2. Safety Analysis on Parking Lots

Given low vehicle speeds and complex surroundings, safety assessment and collision analysis in parking lots differ from those for ordinary road segments. Some studies have simulated and assessed the safety of evacuation in parking lots to develop dynamic optimization models [25, 26]. Others have studied how environmental factors unique to parking lots, such as illumination system design, affect safety and perception [27]. Table 1 summarizes the factors and studies related to parking collisions; these studies contribute substantially to parking lot safety assessment.

3. Methodology

In this study, we developed a Bayesian network model of parking lot safety that uses multisource data as input to analyze two targets of parking lot accidents: the first event cause (causation) and property damage (impact). The modeling process has three steps.
(1) The multisource data are preprocessed and discretized, and the indicators are preliminarily screened based on the mutual information between each variable and the target nodes.
(2) An initial Bayesian network structure is learned from the screened variables. Using the network simplification principles proposed in this paper, redundant nodes and arcs are deleted from the initial network to obtain the optimal structure, and parameter learning then yields the two-target Bayesian network model of the parking lot.
(3) The model is validated and analyzed. The analysis includes node relationship analysis based on mutual information and strength of influence, sensitivity analysis to determine the impact of each variable on the two targets, and identification of the most dangerous scenarios using multifactor conditional probability analysis.
The implementation process is shown in Figure 1, and the methods used in each step are described in the following sections.

3.1. Mutual Information (MI)

Mutual information (MI) is one of the most widely used measures of dependence between a pair of variables in machine learning and is often used to select feature variables [39]. In this study, MI is used in the first step to screen the critical indicators and in the second step as part of the network simplification principles. The MI between variables \(X\) and \(Y\) is defined as

\[ I(X;Y) = \sum_{x \in X}\sum_{y \in Y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)} \qquad (1) \]

This is equivalent to

\[ I(X;Y) = H(Y) - H(Y \mid X) \qquad (2) \]

where \(Y\) denotes the target node (property damage or event cause) and \(X\) is the set of factors that affect the target. \(H(Y)\) denotes the uncertainty (entropy) of the target, and \(H(Y \mid X)\) denotes the uncertainty that remains once factor \(X\) is known. \(p(x,y)\) is the joint probability function of the target and the associated factors, and \(p(x)\) and \(p(y)\) are the marginal (prior) probability functions of \(X\) and \(Y\), respectively.
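The two forms of the MI definition can be checked numerically. The Python sketch below computes plug-in estimates from paired discrete samples; the function names, the sample data, and the base-2 logarithm are illustrative assumptions (the paper does not state which log base underlies its MI thresholds).

```python
import math
from collections import Counter


def entropy(ys, base=2.0):
    """H(Y) of a list of discrete labels (base-2 logs are an assumption)."""
    n = len(ys)
    return -sum(c / n * math.log(c / n, base) for c in Counter(ys).values())


def conditional_entropy(ys, xs, base=2.0):
    """H(Y | X) = sum over x of p(x) * H(Y | X = x)."""
    n = len(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return sum(len(g) / n * entropy(g, base) for g in groups.values())


def mutual_information(xs, ys, base=2.0):
    """Plug-in I(X;Y): sum of p(x,y) * log(p(x,y) / (p(x) p(y))) over observed pairs."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(
        c / n * math.log((c / n) / ((px[x] / n) * (py[y] / n)), base)
        for (x, y), c in pxy.items()
    )


# Usage: both forms of the definition agree on a small hypothetical sample.
xs = [0, 0, 1, 1, 2, 2]
ys = ["a", "b", "a", "a", "b", "b"]
print(mutual_information(xs, ys), entropy(ys) - conditional_entropy(ys, xs))
```

Plug-in estimates are biased upward for variables with many states and few samples, which is one reason discretization precedes screening.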

3.2. Data Discretization Methods

In this paper, we discretize data with the class-attribute contingency coefficient (CACC) algorithm [40], which is suitable for small samples with uneven class distributions and considers the distribution of all samples to avoid overfitting. The dependency between a discretized variable and the target classes is measured using equations (3) and (4) as scoring functions:

\[ \mathrm{CACC} = \sqrt{\frac{y}{y + M}} \qquad (3) \]

\[ y = \frac{M}{\log n}\left[\sum_{i=1}^{S}\sum_{r=1}^{n}\frac{q_{ir}^{2}}{M_{i+}\,q_{+r}} - 1\right] \qquad (4) \]

where \(M\) is the total number of samples; \(n\) is the number of discretized intervals; \(S\) is the number of target classes; \(q_{ir}\) is the number of samples of class \(i\) in interval \(r\); \(M_{i+}\) is the total number of samples of class \(i\); and \(q_{+r}\) is the total number of samples in interval \(r\). The algorithm proceeds as follows.
Step 1. Take a dataset with \(M\) samples, the continuous variables to be discretized, and \(S\) target classes.
Step 2. For each variable to be discretized, find its minimum and maximum values and use them as the initial interval boundaries.
Step 3. Sort the distinct values in the initial interval in ascending order and compute the midpoints of adjacent values as candidate boundaries.
Step 4. Iteratively add the candidate boundary that yields the largest CACC value; when the CACC value no longer increases, the current intervals are taken as optimal.
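The CACC steps above can be sketched in Python as a greedy top-down search. This is a minimal illustration under stated assumptions: it uses the natural logarithm for \(\log n\), represents a discretization by its sorted interior cut points, and does not implement any additional stopping rules the published algorithm may include. The names `cacc` and `cacc_discretize` are hypothetical.

```python
import math
from bisect import bisect_left


def cacc(values, labels, cuts):
    """CACC score for the intervals defined by sorted interior cut points `cuts`."""
    n = len(cuts) + 1                        # number of intervals
    if n < 2:
        return 0.0                           # log(1) = 0: score undefined for one interval
    M = len(values)
    q = {}                                   # q[(i, r)]: samples of class i in interval r
    for v, c in zip(values, labels):
        r = bisect_left(cuts, v)             # interval r holds cuts[r-1] < v <= cuts[r]
        q[(c, r)] = q.get((c, r), 0) + 1
    M_i = {}                                 # M_{i+}: samples of class i
    q_r = {}                                 # q_{+r}: samples in interval r
    for (c, r), cnt in q.items():
        M_i[c] = M_i.get(c, 0) + cnt
        q_r[r] = q_r.get(r, 0) + cnt
    s = sum(cnt * cnt / (M_i[c] * q_r[r]) for (c, r), cnt in q.items())
    y = max(M * (s - 1) / math.log(n), 0.0)  # clamp tiny negative rounding error
    return math.sqrt(y / (y + M))


def cacc_discretize(values, labels):
    """Add the midpoint cut that maximizes CACC until CACC stops increasing."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    cuts, global_best = [], 0.0
    while True:
        best_score, best_cut = global_best, None
        for c in candidates:
            if c in cuts:
                continue
            score = cacc(values, labels, sorted(cuts + [c]))
            if score > best_score:
                best_score, best_cut = score, c
        if best_cut is None:                 # no candidate increases CACC: stop
            return cuts
        cuts = sorted(cuts + [best_cut])
        global_best = best_score


# Usage on a hypothetical variable whose two classes separate cleanly.
print(cacc_discretize([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"]))
```

Because \(\log n\) grows with the number of intervals, extra cuts that do not add class information lower the score, which is what halts the search.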

3.3. Bayesian Networks and Structure Learning

Bayesian classifiers can be grouped into naive Bayes, seminaive Bayes, and Bayesian networks. Although the Tree Augmented Naive Bayes (TAN) method [41], a seminaive Bayes approach, also uses mutual information to build the network structure, this study uses Bayesian networks. The seminaive Bayes method assumes that each feature depends on only one other feature besides the class. In parking lot safety, however, we cannot be sure that each feature depends on only one other feature, and such an oversimplified assumption may lose information. Seminaive Bayes is therefore less suitable here than a Bayesian network. A Bayesian network is a probabilistic graphical model that represents a set of variables and their conditional dependencies through a directed acyclic graph (DAG) [42]. It provides a convenient framework for representing causal relationships and makes reasoning under uncertainty more transparent. The nodes in the DAG represent random variables, and variables considered to be causally related (i.e., not conditionally independent) are connected by arrows. The node from which an arrow departs is the “parent node,” and the node it points to is the “child node.” The strength of association between variables is described by conditional probability tables (CPTs), which support tasks such as prediction, diagnosis, and classification. Learning and using a Bayesian network mainly involves structure learning, parameter learning, and network inference.
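To illustrate how CPTs support prediction and diagnosis, the following sketch performs exact inference by enumeration on a hypothetical three-node chain. The node names echo the paper's variables, but the probabilities are invented for illustration and are not estimates from the DHSMV data.

```python
from itertools import product

# Hypothetical CPTs for a three-node chain SPEED -> MOVE -> DAMAGE (illustrative numbers only).
P_SPEED = {"low": 0.6, "high": 0.4}
P_MOVE = {"low": {"turn": 0.3, "straight": 0.7},
          "high": {"turn": 0.5, "straight": 0.5}}
P_DAMAGE = {"turn": {"high": 0.4, "low": 0.6},
            "straight": {"high": 0.2, "low": 0.8}}


def joint(s, m, d):
    """DAG factorization: P(s, m, d) = P(s) * P(m | s) * P(d | m)."""
    return P_SPEED[s] * P_MOVE[s][m] * P_DAMAGE[m][d]


def query(target, evidence):
    """P(target | evidence) by enumerating every joint state (fine for tiny networks)."""
    num = den = 0.0
    for s, m, d in product(P_SPEED, ["turn", "straight"], ["high", "low"]):
        state = {"speed": s, "move": m, "damage": d}
        if all(state[k] == v for k, v in evidence.items()):
            p = joint(s, m, d)
            den += p
            if all(state[k] == v for k, v in target.items()):
                num += p
    return num / den


print(query({"damage": "high"}, {}))                  # prediction: cause -> effect
print(query({"speed": "low"}, {"damage": "high"}))    # diagnosis: effect -> cause
```

Enumeration is exponential in the number of nodes; tools such as GeNIe use more efficient exact or approximate algorithms, but the answers for a network this small are the same.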

Structure learning obtains the network structure \(G\) from a dataset \(D\). In this study, the initial network structure is established using the greedy thick thinning (GTT) algorithm [43], which has a relatively simple learning process, covers a large search space, and learns efficiently by searching for the structure that optimizes a scoring function. The implementation steps are as follows [44]. (1) Edge addition (thickening): starting from a network with no edges, repeatedly add the directed edge that most increases the structure score until the score no longer increases. (2) Edge deletion (thinning): repeatedly delete the edge whose removal most increases the score until the score no longer increases. The network structure is scored using the following function:

\[ P(G, D) = P(G)\prod_{i=1}^{n}\prod_{j=1}^{q_i}\frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!}\prod_{k=1}^{r_i}N_{ijk}! \qquad (5) \]

where \(P(G)\) denotes the prior probability of the network, \(n\) is the number of random variables \(X_i\), \(r_i\) is the number of states of node \(X_i\), and \(q_i\) is the total number of state combinations of the parents of \(X_i\). \(N_{ijk}\) is the number of cases in \(D\) in which \(X_i\) takes its \(k\)th state while its parents take their \(j\)th combination of states, and \(N_{ij} = \sum_{k=1}^{r_i} N_{ijk}\).
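The two GTT phases can be sketched with the K2 form of the score, evaluated in log space with `math.lgamma` (since \(m! = \Gamma(m+1)\)). The sketch assumes a uniform structure prior \(P(G)\), complete discrete data given as dictionaries, and hypothetical function names; it is not GeNIe's implementation, whose scoring options and tie-breaking may differ.

```python
import math
from itertools import product


def family_log_k2(data, child, parents, states):
    """Log of one node's factor in the K2 score: for each parent configuration j,
    (r_i - 1)! / (N_ij + r_i - 1)! * prod_k N_ijk!  (computed with lgamma)."""
    r = len(states[child])
    counts = {}                                  # parent configuration -> {child state: N_ijk}
    for row in data:
        cfg = tuple(row[p] for p in parents)
        cell = counts.setdefault(cfg, {})
        cell[row[child]] = cell.get(row[child], 0) + 1
    score = 0.0
    for cell in counts.values():                 # unseen configurations contribute log(1) = 0
        n_ij = sum(cell.values())
        score += math.lgamma(r) - math.lgamma(n_ij + r)
        score += sum(math.lgamma(n + 1) for n in cell.values())
    return score


def creates_cycle(parents, u, v):
    """Adding u -> v closes a cycle iff v is already an ancestor of u."""
    stack, seen = [u], set()
    while stack:
        node = stack.pop()
        if node == v:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False


def greedy_thick_thinning(data, states):
    """Add best-scoring edges one at a time, then remove edges while the score rises.
    Assumes a uniform structure prior, so only the family terms matter."""
    nodes = list(states)
    parents = {x: [] for x in nodes}
    fam = {x: family_log_k2(data, x, [], states) for x in nodes}
    # Thickening: add the edge with the largest positive score gain.
    while True:
        best_gain, best_edge = 0.0, None
        for u, v in product(nodes, nodes):
            if u == v or u in parents[v] or creates_cycle(parents, u, v):
                continue
            gain = family_log_k2(data, v, parents[v] + [u], states) - fam[v]
            if gain > best_gain:
                best_gain, best_edge = gain, (u, v)
        if best_edge is None:
            break
        u, v = best_edge
        parents[v].append(u)
        fam[v] = family_log_k2(data, v, parents[v], states)
    # Thinning: remove the edge whose deletion raises the score the most.
    while True:
        best_gain, best_edge = 0.0, None
        for v in nodes:
            for u in parents[v]:
                rest = [p for p in parents[v] if p != u]
                gain = family_log_k2(data, v, rest, states) - fam[v]
                if gain > best_gain:
                    best_gain, best_edge = gain, (u, v)
        if best_edge is None:
            break
        u, v = best_edge
        parents[v].remove(u)
        fam[v] = family_log_k2(data, v, parents[v], states)
    return parents


# Hypothetical data: B copies A, and C is independent of both.
data = [{"A": a, "B": a, "C": c} for a in (0, 1) for c in (0, 1) for _ in range(5)]
print(greedy_thick_thinning(data, {"A": [0, 1], "B": [0, 1], "C": [0, 1]}))
```

Because the score decomposes into per-node terms, each candidate move only requires rescoring the affected child, which is what makes greedy search over large structure spaces practical.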

3.4. Parameter Learning

Parameter learning estimates the conditional probability distribution of each node from the learned structure and the data. In this study, the expectation-maximization (EM) algorithm [45] is used for parameter estimation; it estimates the parameters iteratively when the dataset contains unobserved latent data (missing values). Each EM iteration has two steps: the E-step and the M-step. Let \(X = \{x_1, \ldots, x_m\}\) be the observed dataset, \(Z\) the unobserved latent data, \(\theta\) the model parameters, and \(J\) the maximum number of iterations. (1) Randomly initialize the model parameters \(\theta^{(0)}\). (2) At iteration \(j\), the E-step computes the conditional expectation of the complete-data log-likelihood:

\[ Q_i(z_i) = P(z_i \mid x_i, \theta^{(j)}), \qquad L(\theta, \theta^{(j)}) = \sum_{i=1}^{m}\sum_{z_i} Q_i(z_i)\log\frac{P(x_i, z_i \mid \theta)}{Q_i(z_i)} \qquad (6) \]

where \(P(z_i \mid x_i, \theta^{(j)})\) is the conditional distribution of the latent data, \(Q_i(z_i)\) is a newly introduced distribution with \(\sum_{z_i} Q_i(z_i) = 1\), \(P(x_i, z_i \mid \theta)\) is the joint distribution of \(x_i\) and \(z_i\), and \(L(\theta, \theta^{(j)})\) is the resulting lower bound on the log-likelihood function. (3) The M-step maximizes \(L(\theta, \theta^{(j)})\) to obtain \(\theta^{(j+1)}\):

\[ \theta^{(j+1)} = \arg\max_{\theta} L(\theta, \theta^{(j)}) \qquad (7) \]

(4) If \(\theta^{(j+1)}\) has converged or \(j\) reaches \(J\), the iterations stop and \(\theta^{(j+1)}\) is taken as the model parameters; otherwise, the algorithm returns to the E-step.
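A minimal EM sketch for a hypothetical binary network A -> B, in which A is sometimes missing, shows the E-step (posterior weights for the missing values) and the M-step (normalized expected counts). The starting values are fixed rather than random so the example is reproducible; the model, function name, and data are assumptions for illustration.

```python
import math


def em_two_node(rows, max_iter=100, tol=1e-8):
    """EM for a hypothetical binary network A -> B in which A may be missing (None).
    Parameters: pa = P(A=1), pb[a] = P(B=1 | A=a). Returns (pa, pb, log-likelihood history)."""
    pa, pb = 0.5, {0: 0.3, 1: 0.7}          # theta^(0): fixed here instead of random
    history = []
    for _ in range(max_iter):
        # E-step: Q_i(a) = P(A=a | b_i, theta^(j)); an observed A gives a point mass.
        exp_a = {0: 0.0, 1: 0.0}             # expected count of A = a
        exp_ab = {0: 0.0, 1: 0.0}            # expected count of (A = a, B = 1)
        loglik = 0.0
        for a_obs, b in rows:
            lik = {a: (pa if a else 1 - pa) * (pb[a] if b else 1 - pb[a]) for a in (0, 1)}
            if a_obs is None:
                total = lik[0] + lik[1]
                q = {a: lik[a] / total for a in (0, 1)}
            else:
                total = lik[a_obs]
                q = {a: float(a == a_obs) for a in (0, 1)}
            loglik += math.log(total)        # observed-data log-likelihood at theta^(j)
            for a in (0, 1):
                exp_a[a] += q[a]
                exp_ab[a] += q[a] * b
        history.append(loglik)
        # M-step: normalized expected counts maximize the expected complete-data log-likelihood.
        pa = exp_a[1] / len(rows)
        pb = {a: exp_ab[a] / exp_a[a] if exp_a[a] > 0 else pb[a] for a in (0, 1)}
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break                            # log-likelihood has stopped improving
    return pa, pb, history


rows = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1), (None, 1), (None, 0)]
pa, pb, history = em_two_node(rows)
print(pa, pb)
```

The log-likelihood history should never decrease from one iteration to the next, which is a quick sanity check on any EM implementation.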

4. Modeling and Analysis

The implementation and analysis of the Bayesian model for parking lot accident safety are presented in this section. The data source, data preprocessing, model analysis, and performance results are introduced.

4.1. Data Source

The crash database from the Florida Department of Highway Safety and Motor Vehicles (DHSMV) is used to model the contributing factors to parking lot accidents using the Bayesian network approach. The data in the DHSMV have two sources, the Florida Department of Transportation (FDOT) Safety Office and the Crash Analysis Reporting (CAR) System. These have recorded Florida traffic accident data since 2007. More than 300 attributes are used to describe and report each accident record, which covers four aspects of the accident: person (driver’s gender, age, etc.), vehicle (damage, impact area, etc.), road (accident location, road type, road markings, etc.), and environment (weather, light, date, etc.). We concentrate on property damage (primary target) and event causes (secondary target) in parking lot accidents. Such accidents are those where the location (SITE LOCATION) is in a parking lot, parking aisle, or stall. Parking types include surface parking lots, underground garages, above-ground parking lots, and multiuse parking lots. We screened the accident data that occurred in these places for subsequent analysis.

4.2. Data Preprocess and Variable Selection

Missing-value processing and discretization of continuous variables are performed before modeling and analysis. Data with incomplete information (more than 60% missing) and variables without variation (i.e., all values belong to the same class) are first removed from the training dataset to ensure the required accuracy. Because the parking lot accident data contain no recorded fatalities and most people involved were only slightly injured, casualties were unevenly distributed and were not used as a modeling target. In the end, 5.92% of the parking lot incident data were missing. In total, 5,599 events and 30 variables were retained for model generation. The names, abbreviation codes, and explanations of the 30 variables are given in the Appendix.
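The two removal rules can be expressed as a short filter. This sketch assumes that the 60% threshold applies per variable (the text does not say whether it applies to records or variables), that records are dictionaries, and that missing values are stored as `None`; the function name is hypothetical.

```python
def screen_variables(records, max_missing=0.6):
    """Return the columns with at most `max_missing` share of missing values (None)
    that also take more than one distinct observed value."""
    kept = []
    for col in list(records[0]):
        values = [r.get(col) for r in records]
        observed = [v for v in values if v is not None]
        if 1 - len(observed) / len(values) > max_missing:
            continue                          # too incomplete to be informative
        if len(set(observed)) <= 1:
            continue                          # constant variable: carries no information
        kept.append(col)
    return kept


# Hypothetical records: B is mostly missing and C never varies.
records = [{"A": 1, "B": None, "C": 5}, {"A": 2, "B": None, "C": 5}, {"A": 1, "B": 3, "C": 5}]
print(screen_variables(records))
```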

Before the preliminary screening of variables, continuous variables are discretized with the CACC-based algorithm to retain as much information as possible. The variables to be discretized include continuous variables (MEDWIDTH and PROPERTY) and variables with more than ten categories that need to be regrouped (POINTIMP and VEHMOVE). We tried between four and eight categories and selected the scheme whose categories were logically interpretable and evenly distributed. For example, POINTIMP is divided into four categories, corresponding to collision locations on the four sides of the vehicle.

Mutual information-based variable screening was implemented to reduce model complexity, following two principles. (i) To retain as much data as possible, an MI of 0.5 was used as the selection criterion. Taking PROPERTY as the preliminary target node (PTN), variables with an MI greater than 0.5 with PROPERTY were kept and denoted as first-level related nodes (FLRN). (ii) For each FLRN in turn, variables with an MI greater than 0.5 with that node were kept and denoted as second-level related nodes (SLRN). The results are shown in Table 2, where each variable is listed only at the level at which it first met the criterion. After filtering, 20 variables remained; their MI correlation matrix is shown in Figure 2.
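The two screening principles can be sketched as follows. The 0.5 threshold follows the text; the base-2 logarithm, the data layout (one list of discrete values per variable), the function names, and the toy data are assumptions for illustration.

```python
import math
from collections import Counter


def mi(xs, ys):
    """Plug-in mutual information in bits (the log base is an assumption)."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * math.log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def two_level_screen(data, target, threshold=0.5):
    """Return (FLRN, SLRN): variables above `threshold` MI with the target,
    then variables above `threshold` MI with any first-level node."""
    others = [v for v in data if v != target]
    flrn = [v for v in others if mi(data[v], data[target]) > threshold]
    slrn = []
    for node in flrn:
        for v in others:
            if v in flrn or v in slrn or v == node:
                continue                     # list each variable only at its first level
            if mi(data[v], data[node]) > threshold:
                slrn.append(v)
    return flrn, slrn


# Toy data: T is determined by X1, X2 relates only to X1, X3 is unrelated.
data = {"T": [0, 0, 1, 1, 0, 0, 1, 1], "X1": [0, 1, 2, 3, 0, 1, 2, 3],
        "X2": [0, 1, 0, 1, 0, 1, 0, 1], "X3": [0, 0, 0, 0, 1, 1, 1, 1]}
print(two_level_screen(data, "T"))
```

The second level is what lets a variable such as X2 survive even though it carries no direct information about the target: it may still act as an indirect (parent-of-parent) factor in the network.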

4.3. Bayesian Network Modeling

GeNIe 3.0 Academic was used to build the Bayesian model of property damage in parking lot accidents. Before structure learning, we specified temporal tiers to fix the temporal order among variables. DAYOWEEK, WEATCOND, LGHTCOND, TYPESHLD, URBSIZE, LANDUSE, RTESGNCD, and TIME were placed in the earlier tiers because they describe conditions that precede an accident, and FRST_EVNT_CAUS_CD and PROPERTY were placed in the later tiers because they describe its outcome. No arc may point from a variable in a later tier to a variable in an earlier tier, which prevents logically unrealistic relationships.
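Tier constraints act as a filter on candidate arcs during structure learning. In this sketch the numeric tier indices are hypothetical (the text only separates earlier condition variables from later outcome variables), arcs within the same tier are assumed to be permitted, and the function names are illustrative.

```python
def arc_allowed(tiers, parent, child):
    """An arc parent -> child is allowed unless the parent sits in a later tier."""
    return tiers[parent] <= tiers[child]


def filter_candidates(tiers, candidates):
    """Keep only candidate arcs that respect the temporal order."""
    return [(u, v) for u, v in candidates if arc_allowed(tiers, u, v)]


# Hypothetical tier assignment: conditions first, outcome last.
tiers = {"WEATCOND": 1, "LGHTCOND": 1, "VEHMOVE": 2, "PROPERTY": 3}
candidates = [("LGHTCOND", "PROPERTY"), ("PROPERTY", "LGHTCOND"),
              ("WEATCOND", "LGHTCOND"), ("PROPERTY", "VEHMOVE")]
print(filter_candidates(tiers, candidates))
```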

Structure learning and parameter learning were performed with the GTT and EM algorithms, respectively. We applied the following simplification principles to remove redundant arcs and optimize the model structure. (i) Remove nodes that are not connected by any arc, as they play no role in the model's causal structure. (ii) Remove arcs that are irrelevant or logically incorrect, such as arcs through which subjective factors influence objective ones. (iii) Delete arcs with MI < 0.7; this threshold was chosen by testing to reduce computational complexity while retaining as much information as possible. (iv) When more than one logical chain connects two nodes, retain the chain with the larger MI between connected nodes. (v) After each arc deletion, repeat parameter learning and compare the accuracy and area under the curve (AUC) with and without the arc, to ensure that accuracy improves at each step.
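Principles (iii) and (i) are mechanical and can be sketched as below; principles (ii), (iv), and (v) require domain judgment or re-evaluating the model and are not implemented. The arc list, MI values, and function name are hypothetical.

```python
def prune(arcs, mi, nodes, threshold=0.7):
    """Principle (iii): drop arcs whose MI is below `threshold`;
    principle (i): then drop nodes left without any arc."""
    kept_arcs = [(u, v) for u, v in arcs if mi[(u, v)] >= threshold]
    connected = {n for arc in kept_arcs for n in arc}
    kept_nodes = [n for n in nodes if n in connected]
    return kept_arcs, kept_nodes


# Hypothetical learned arcs with their MI values.
arcs = [("A", "B"), ("B", "C"), ("D", "C")]
mi = {("A", "B"): 0.9, ("B", "C"): 0.75, ("D", "C"): 0.3}
print(prune(arcs, mi, ["A", "B", "C", "D", "E"]))
```

Applying (iii) before (i) matters: a node can become isolated only after its weak arcs are removed.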

The resulting structure of the Bayesian network for parking lot accidents is shown in Figure 3; the network contains 17 variable nodes in total. Lighting conditions and vehicle movement directly affect property damage, with weather conditions, time of day, and the maximum parking lot speed limit forming the first tier of indirect factors. Meanwhile, the point of impact and vehicle movement directly determine the event cause. After parameter learning, the Bayesian network model for the causality analysis of parking lot accidents is obtained, as shown in Figure 4.

4.4. MI and Strength of Influence (SI) Analysis

Together with MI, the strength of influence (SI) along each arc of the network indicates how strongly the connected variables influence each other. SI is calculated from the conditional probability distributions of the child node. The MI and SI for all arcs and the corresponding nodes are shown in Table 3.
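The text does not give the SI formula. As one possible definition, assumed here for illustration only (the exact measure used by GeNIe may differ), SI can be taken as the average Euclidean distance between the child's conditional distributions under different parent states: identical rows mean the parent has no influence. The CPT values and function name below are hypothetical.

```python
import math
from itertools import combinations


def strength_of_influence(cpt):
    """Average pairwise Euclidean distance between the child's distributions
    P(child | parent = s) across parent states s (one possible SI definition)."""
    rows = list(cpt.values())
    pairs = list(combinations(rows, 2))
    if not pairs:
        return 0.0                           # a single parent state cannot shift the child

    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    return sum(dist(p, q) for p, q in pairs) / len(pairs)


# Hypothetical CPT of a binary child under two parent states.
print(strength_of_influence({"day": [0.7, 0.3], "night": [0.4, 0.6]}))
```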

Considering MI and SI together, lighting conditions (MI = 0.80, SI = 0.62) and vehicle movement (MI = 0.62, SI = 0.51) have the strongest association with the primary target (PROPERTY). The point of impact (MI = 0.22, SI = 0.34) and vehicle movement (MI = 0.85, SI = 0.32) have a relatively strong influence on the secondary target (FRST_EVNT_CAUS_CD). In addition, time of day is strongly related to both lighting conditions and vehicle movement, and its high SI values suggest potential relationships with land use (SI = 0.56) and the route signing qualifier (SI = 0.36). The maximum speed limit and vehicle type are the factors most likely to affect the point of impact, and they also have some influence on the accident event cause.

4.5. Sensitivity Analysis

Sensitivity analysis [46] investigates how small changes in the network's probability parameters affect the probabilities of the target nodes. A higher sensitivity value indicates that a node has a stronger influence on the target. The sensitivity of each variable with respect to the two target nodes is shown in Figure 5. The day of the week and the maximum speed limit had the strongest influence on both property damage and event cause. Urban size and vehicle movement also had a large impact on property damage, and vehicle movement was also important for the event cause. Lighting and weather conditions affected only property damage, implying that poor lighting and weather may lead to more severe accident losses. Vehicle movement, vehicle type, speed, point of impact, type of vehicle damage, and number of people involved all affected the event cause to different extents. Most of these factors relate to vehicle attributes and can inform accident prevention for drivers.
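One simple way to approximate the sensitivity of a target probability to a single CPT parameter is a finite difference: nudge the parameter, move the same mass away from the other state of that (binary) row so it still sums to one, and measure the change in the target. This sketch is not necessarily how GeNIe computes sensitivity; the chain structure, CPT values, and function names are hypothetical.

```python
import copy
from itertools import product

# Hypothetical binary CPTs for the chain SPEED -> MOVE -> DAMAGE (illustrative numbers).
CPTS = {
    "speed": {(): {"low": 0.6, "high": 0.4}},
    "move": {("low",): {"turn": 0.3, "straight": 0.7},
             ("high",): {"turn": 0.5, "straight": 0.5}},
    "damage": {("turn",): {"high": 0.4, "low": 0.6},
               ("straight",): {"high": 0.2, "low": 0.8}},
}


def p_damage_high(cpts):
    """Target probability P(DAMAGE = high), summing out SPEED and MOVE."""
    total = 0.0
    for s, m in product(["low", "high"], ["turn", "straight"]):
        total += (cpts["speed"][()][s] * cpts["move"][(s,)][m]
                  * cpts["damage"][(m,)]["high"])
    return total


def sensitivity(cpts, node, parent_cfg, state, eps=1e-6):
    """Finite-difference dP(DAMAGE=high)/dtheta for theta = P(node=state | parents),
    taking the added mass from the other state of the binary row."""
    bumped = copy.deepcopy(cpts)
    row = bumped[node][parent_cfg]
    other = next(k for k in row if k != state)
    row[state] += eps
    row[other] -= eps
    return (p_damage_high(bumped) - p_damage_high(cpts)) / eps


print(sensitivity(CPTS, "damage", ("turn",), "high"))
print(sensitivity(CPTS, "move", ("low",), "turn"))
```

Because a target probability is multilinear in any single CPT row, the finite difference here matches the analytic derivative up to rounding error.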

4.6. Conditional Probability Analysis

This subsection explores the impact of single-factor changes on the two targets and the impact of multifactor combinations on property damage, and identifies the factors and states with the most significant influence. The conditional probabilities of PROPERTY and FRST_EVNT_CAUS_CD given their influencing factors are shown in Figures 6(a) and 6(b), respectively. The following analysis discusses the impact of each factor on the targets in terms of the natural environment, social environment, people, vehicles, roads, and accident consequences.

4.6.1. Natural Environment

Mondays and Fridays are more sensitive to changes in property damage, and accidents on Mondays are more likely to involve medium damage or collisions with gates. Poor street lighting significantly increases the likelihood of collisions with parked vehicles and fixed objects, which result in high losses. Vehicle-to-vehicle collisions with lower property damage are more likely from early morning to midday, whereas more costly collisions with gates usually occur from the afternoon through midnight.

4.6.2. Social Environment

In high-density business centers and large cities, the probability of a backed-in collision increases slightly, but the risk of property damage is low. Parking lot accidents in residential or urban areas are usually vehicle collisions with gates. This implies that comprehensive and robust safety management measures and gate designs should be major considerations in parking lot planning and design.

4.6.3. People

The probability of higher property damage increases with the number of people involved in a parking lot accident, and such accidents are also more likely to be rear-end and backed-in collisions.

4.6.4. Vehicles

The probability of property damage exceeding $750 increases by 13–26% under moderate speed limits (between 30 and 40 mph), suggesting that high or low speed limits may raise drivers’ safety awareness, whereas medium speed limits do little to restrain careless driving, which leads to more property damage. Higher speed limits in parking lots decrease the probability of rear-end collisions, backed-in collisions, and collisions with gates. Vehicles are more likely to collide with gates when going straight and with fixed objects when turning left. The probability of backed-in collisions increases when backing up. Parked and slow-moving vehicles are the safest in parking lots because the probabilities of all accident types are lower for them. Angle collisions are more likely to occur in left-turn and driverless situations. As speeds increase, vehicles are more likely to have high-loss collisions with other vehicles and objects, indicating that the complex parking lot environment increasingly affects driving safety at higher speeds. Heavy trucks and commercial cargo vehicles are prone to collisions with gates, and commercial cargo vehicles may suffer moderate to high damage.

4.6.5. Roads

At bypasses, the probability that damage exceeds $750 increases with traffic flow, implying that high-risk accidents are more likely to occur. Rear-end and backed-in collisions are the most common accident types, as they have the highest probabilities in most cases.

4.6.6. Accident Consequences

Collisions with gates are unlikely to cause disabling damage to vehicles. Impacts on the right front and the rear of the vehicle are more likely to result from angle collisions. Damage at the left front door corner suggests a backed-in collision or a collision with a gate. Overall property damage is higher when the left side of the vehicle is damaged.

Because multiple factors contribute to accidents, the probability distribution of the target under various combinations of factor conditions also needs to be studied. Multifactor conditional probability analysis was performed only for property damage because event causes cannot be ranked quantitatively. We used the expectation to characterize the amount of property damage, calculated as follows:

\[ E_{ij} = \sum_{k=1}^{K} P_{ijk}\,\bar{v}_k \qquad (8) \]

where \(E_{ij}\) is the expected property damage when the \(i\)th variable is in its \(j\)th state, \(P_{ijk}\) is the conditional probability that property damage falls in the \(k\)th interval given the \(j\)th state of the \(i\)th variable, \(K\) is the number of states of the target node, and \(\bar{v}_k\) is the average of the two endpoints of the \(k\)th interval. The factor conditions with an expected loss of more than $900 are selected and ranked in Table 4.
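The expectation formula is a probability-weighted sum of interval midpoints. The sketch below applies it together with the $900 ranking rule; the interval bounds, condition names, and probabilities are hypothetical, and an open-ended top interval (such as "over $2,550") would need an assumed upper bound before its midpoint could be computed.

```python
def expected_damage(cond_probs, intervals):
    """E_ij = sum_k P_ijk * midpoint of interval k, for one factor state."""
    if abs(sum(cond_probs) - 1.0) > 1e-9:
        raise ValueError("conditional probabilities must sum to 1")
    return sum(p * (lo + hi) / 2 for p, (lo, hi) in zip(cond_probs, intervals))


def rank_conditions(table, intervals, threshold=900.0):
    """Keep conditions whose expected damage exceeds `threshold`, highest first."""
    scored = [(cond, expected_damage(p, intervals)) for cond, p in table.items()]
    return sorted([c for c in scored if c[1] > threshold], key=lambda c: c[1], reverse=True)


# Hypothetical damage intervals (in dollars) and conditional distributions.
intervals = [(0, 500), (500, 1000), (1000, 2000)]
table = {"cond A": [0.5, 0.3, 0.2], "cond B": [0.1, 0.3, 0.6], "cond C": [0.2, 0.2, 0.6]}
print(rank_conditions(table, intervals))
```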

Following the causal order of the nodes in the network, we started from the target node (PROPERTY) and traced back through its predecessor nodes to form complete logical chains (Table 5). We list all scenarios and rank the expected property damage for each. For instance, one predecessor of PROPERTY is VEHMOVE, and one predecessor of VEHMOVE is MAXSPEEDLMT, giving the logical chain MAXSPEEDLMT ⟶ VEHMOVE ⟶ PROPERTY. For VEHMOVE = Making a left turn and MAXSPEEDLMT ≤ 30 mph, the expected property damage is $1,151.25. A total of nine combinations are considered in this paper, and the top three are shown in Table 5.
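Tracing predecessor nodes back from PROPERTY amounts to enumerating root-to-target paths in the DAG. In the usage example, only the MAXSPEEDLMT ⟶ VEHMOVE ⟶ PROPERTY chain is stated in the text; the rest of the parent map is a hypothetical fragment, and the function name is illustrative.

```python
def logic_chains(parents, target):
    """All root-to-target paths found by walking parent links backwards from `target`."""
    chains = []

    def walk(node, path):
        if not parents.get(node):
            chains.append([node] + path)     # reached a root: record chain root -> target
            return
        for p in parents[node]:
            walk(p, [node] + path)

    walk(target, [])
    return chains


# Partly hypothetical parent map (node -> list of parents).
parents = {"PROPERTY": ["VEHMOVE", "LGHTCOND"], "VEHMOVE": ["MAXSPEEDLMT"],
           "LGHTCOND": ["TIME"], "MAXSPEEDLMT": [], "TIME": []}
for chain in logic_chains(parents, "PROPERTY"):
    print(" -> ".join(chain))
```

Each chain then defines which factor states are fixed jointly when computing the expected property damage for a scenario.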

The highest expected property damage in Table 5, $1,643.50, occurs for slow-moving vehicles under speed limits below 30 mph. Although the single-factor analysis shows that accidents involving slow-moving vehicles are unlikely, such vehicles still carry a high risk of property damage under these speed limits. Left-turning vehicles under speed limits below 40 mph also incur large losses in accidents ($1,527), so operators need to take precautions. We speculate that parking lot speed limits may interact with vehicle movements to create risks that would not otherwise exist, affecting accident damage. Moreover, high-loss parking accidents are most likely to occur at dimly lit bypasses at night near high-density residential areas in large cities, resulting in $1,479 in damage. Managers should therefore pay particular attention to nighttime conditions, lighting, and bypasses.

4.7. Model Validation

The accuracy and generalization ability of the model are verified using the leave-one-out (LOO) method [47] built into GeNIe. LOO is a special case of k-fold cross-validation in which k equals the number of samples M in the dataset: the network is trained on M − 1 samples and tested on the remaining sample, and this is repeated M times. We considered LOO more suitable for this study than k-fold cross-validation with a smaller k because it uses nearly all samples for training in each round. Overall, the prediction accuracy for property damage and accident causes reached 85.71% and 64.42%, respectively. The mean AUC reached 0.978 for PROPERTY and 0.830 for FRST_EVNT_CAUS_CD. Their confusion matrices are shown in Tables 6 and 7; the notation in Table 7 is consistent with Figure 6.
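The LOO loop itself is model-agnostic, so it can be sketched with any classifier. To stay short, the sketch below uses a hypothetical stand-in model (the most frequent target for each feature value) rather than a Bayesian network; the data and function names are illustrative.

```python
from collections import Counter, defaultdict


def fit(rows):
    """Hypothetical stand-in model: most frequent target for each feature value."""
    table = defaultdict(Counter)
    overall = Counter()
    for x, y in rows:
        table[x][y] += 1
        overall[y] += 1
    fallback = overall.most_common(1)[0][0]  # used for feature values unseen in training

    def predict(x):
        counts = table.get(x)
        return counts.most_common(1)[0][0] if counts else fallback

    return predict


def leave_one_out_accuracy(rows):
    """Train on M - 1 samples, test on the held-out one, repeat M times."""
    hits = 0
    for i, (x, y) in enumerate(rows):
        model = fit(rows[:i] + rows[i + 1:])
        hits += model(x) == y
    return hits / len(rows)


rows = [("day", "low")] * 3 + [("night", "high")] * 3 + [("night", "low")]
print(leave_one_out_accuracy(rows))
```

The cost is M model fits, which is acceptable for a dataset of a few thousand records and a fast learner, and every record is used for testing exactly once.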

Prediction accuracy is low when property damage is between $425 and $475 or above $2,550; the remaining states are predicted with accuracies above 80%. For accident causation, prediction accuracy is lower for front-end collisions because the model confuses them with angle collisions, possibly because the states of the other factors differ little between these two collision types. Angle collisions have the highest prediction accuracy, and the other accident causes have roughly equal accuracy. The uneven data distribution may cause some inaccuracy during training and recognition. Future work will enrich the data and expand the dataset to further explore model performance.

To compare the performance of different methods, we further analyze the primary target of the model (property damage). Four models built with the same modeling process are compared: the model proposed in this paper (with 20 variables), an expert knowledge model (with the 10 variables summarized in Table 1), and models using equal-width and equal-volume discretization. The latter two models use the same datasets as the proposed model.
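For reference, the two baseline discretization schemes can be sketched as follows, interpreting equal-volume as equal-frequency binning (an assumption). Unlike CACC, neither scheme looks at the target classes, which is one plausible reason the baselines trail the proposed model. Function names and data are illustrative.

```python
from bisect import bisect_right


def equal_width_cuts(values, k):
    """k bins of equal width between the minimum and maximum."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]


def equal_frequency_cuts(values, k):
    """k bins holding roughly equal numbers of samples (ties can unbalance them)."""
    s = sorted(values)
    return [s[len(s) * i // k] for i in range(1, k)]


def bin_counts(values, cuts):
    """Samples per bin; a value equal to a cut goes to the upper bin."""
    counts = [0] * (len(cuts) + 1)
    for v in values:
        counts[bisect_right(cuts, v)] += 1
    return counts


# A skewed hypothetical variable: one large outlier.
vals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(bin_counts(vals, equal_width_cuts(vals, 2)), bin_counts(vals, equal_frequency_cuts(vals, 2)))
```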

According to Figure 7, the proposed model achieved higher accuracy and AUC than the other models across the different states, with a markedly larger advantage in AUC when the property loss was below $1550. The ROC curves of the different models for predicting property damage (Figure 8) likewise show that the proposed model performs better than the other model settings. The models under equal-width and equal-volume discretization follow, and the model using only expert knowledge performs worst. These results confirm that the proposed model has good adaptability and generalization ability.
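The AUC values compared above summarize each ROC curve as a single number. A minimal sketch, assuming the reported mean AUC is an average of one-vs-rest AUCs over the states of the target variable (GeNIe's exact averaging is not reproduced here), uses the rank interpretation of AUC: the probability that a randomly chosen positive record receives a higher score than a randomly chosen negative one. The function names and probabilities are hypothetical.

```python
from typing import Dict, List, Sequence


def binary_auc(scores: Sequence[float], positives: Sequence[bool]) -> float:
    """Area under the ROC curve, computed as the probability that a random
    positive case scores higher than a random negative case (ties count 0.5)."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_one_vs_rest_auc(prob_rows: List[Dict[str, float]], true_states: Sequence[str]) -> float:
    """Average of one-vs-rest AUCs, one per observed state; each row holds
    the model's predicted probability for every state."""
    aucs = []
    for state in sorted(set(true_states)):
        scores = [row[state] for row in prob_rows]
        positives = [t == state for t in true_states]
        aucs.append(binary_auc(scores, positives))
    return sum(aucs) / len(aucs)


# Hypothetical predicted probabilities for a two-state damage variable
rows = [
    {"low": 0.8, "high": 0.2},
    {"low": 0.4, "high": 0.6},
    {"low": 0.3, "high": 0.7},
    {"low": 0.5, "high": 0.5},
]
truth = ["low", "low", "high", "high"]
print(binary_auc([0.9, 0.5, 0.5, 0.2], [True, True, False, False]))  # 0.875
print(mean_one_vs_rest_auc(rows, truth))  # 0.75
```

The pairwise loop is O(P·N) and is meant for clarity; a rank-based formulation scales better on large datasets.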

4.8. Evaluation System

The Bayesian modeling analysis of parking lot accidents reveals that different factors influence parking lot safety to different degrees. A parking lot safety evaluation system is therefore developed to provide guidance for planning and management. The system is divided into three main phases: multisource data fusion, parking lot Bayesian network modeling, and safety evaluation and management (see Figure 9).
(1) Multisource data fusion. Sensors, image recognition, and deep learning technologies are used to collect information on people, vehicles, roads, and the environment. Through database systems and cloud storage services, data from various departments (the meteorological department, traffic department, etc.) are fused with historical accident data to gather comprehensive information related to parking lot safety. The integrated dataset is then used to build the Bayesian models.
(2) Parking lot Bayesian network modeling. Machine learning methods are used to build a parking lot safety diagnosis model. The Bayesian approach with MI preprocessing reduces the influence of subjective experience on the model and yields more realistic conclusions. The resulting accident diagnosis model uncovers potential risks and the key factors that influence parking lot safety.
(3) Safety evaluation and management. Managers and decision-makers can adjust their parking management approach by responding to the highly sensitive indicators fed back by the model, drawing on the library of measures. The model conclusions guide safety and security management, and the proposed library of measures can enhance the safety control of parking lots. These measures will be combined with emerging ITS technology to improve the self-governance and intelligence of parking lots (see Table 8).
No measures are proposed for factors with intrinsic characteristics, as these require external planning and cannot be improved quickly.
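The third phase, matching sensitive indicators to the library of measures, can be sketched as a filtered lookup. This assumes the model supplies one sensitivity score per factor and that intrinsic factors are simply absent from the library; the factor names, scores, threshold, and measure labels are all hypothetical placeholders rather than the contents of Table 8.

```python
from typing import Dict, List, Tuple


def recommend_measures(
    sensitivity: Dict[str, float],
    library: Dict[str, List[str]],
    threshold: float,
) -> List[Tuple[str, float, List[str]]]:
    """Return (factor, sensitivity, measures) for factors at or above the
    threshold, most sensitive first. Factors missing from the library
    (e.g. intrinsic factors that need external planning) are skipped."""
    flagged = [
        (factor, score, library[factor])
        for factor, score in sensitivity.items()
        if score >= threshold and factor in library
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical sensitivity scores and placeholder measure labels
sensitivity = {"LIGHT_COND": 0.30, "VEH_MOVEMENT": 0.45, "AREA_TYPE": 0.50, "WEATHER": 0.05}
library = {
    "LIGHT_COND": ["measure L1"],
    "VEH_MOVEMENT": ["measure V1", "measure V2"],
    "WEATHER": ["measure W1"],
}
for factor, score, measures in recommend_measures(sensitivity, library, threshold=0.1):
    print(factor, score, measures)
```

Here AREA_TYPE is skipped despite its high score because, as an intrinsic factor, it has no entry in the library, and WEATHER falls below the threshold.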

With the emergence of diverse ITS technologies, data-driven modeling approaches are increasingly applied to safety management. The data-driven parking safety evaluation system based on the Bayesian approach is the main outcome of this study for parking safety management. This approach integrates well with advanced technologies and adapts and scales well across diverse situations. Artificial intelligence-based perception and recognition inside parking lots can support guidance systems and optimization algorithms that reduce the risks associated with sensitive indicators in advance (first phase). IoT technologies ensure rapid information interoperability between vehicles and facilities, which supports the safe operation of autonomous and unmanned vehicles in complex environments such as parking lots. Enhanced vehicle communications and V2X technologies [48] keep real-time information monitoring in parking lots up-to-date and precise, which can significantly enhance parking lot safety control in large cities and high-density areas. Focusing monitoring on the key indicators allows resources to be allocated rationally and efficiently. As new technologies arrive, the Bayesian network-based parking lot safety evaluation system is promising for more scenarios and can provide decision-makers with efficient assistance and deep insight.

5. Conclusion

In this study, a Bayesian network diagnosis model is developed using multisource data to analyze property damage in parking lot accidents. Selecting factors with the mutual information approach before modeling provides an objective basis for prior network construction and variable screening, which reduces the subjectivity typical of Bayesian network construction. The final prediction accuracy for property damage and accident causes reached 85.71% and 64.42%, respectively.
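The mutual information screening step can be illustrated with the empirical MI between each discrete candidate factor and the target, I(X;Y) = Σ p(x,y) log₂[p(x,y) / (p(x)p(y))]. This is a minimal sketch under the assumption that factors are already discretized; the factor names and records are hypothetical, and the paper's exact cutoff for retaining factors is not reproduced here (a threshold or top-k rule would be applied to the ranking).

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Empirical mutual information I(X;Y) in bits between two discrete variables."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log2(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[a] * py[b]))
    return mi


def rank_factors(factors: Dict[str, Sequence[Hashable]],
                 target: Sequence[Hashable]) -> List[Tuple[str, float]]:
    """Score each candidate factor by its MI with the target, highest first."""
    scores = [(name, mutual_information(values, target)) for name, values in factors.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical records: LIGHT tracks the target exactly, WEATHER is independent of it
target = ["high", "high", "low", "low"]
factors = {
    "LIGHT": ["dark", "dark", "day", "day"],
    "WEATHER": ["rain", "clear", "rain", "clear"],
}
print(rank_factors(factors, target))  # [('LIGHT', 1.0), ('WEATHER', 0.0)]
```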

The direct influences on property damage are light conditions and vehicle movement. Bad weather, dim lighting, and late afternoon also lead to higher damage amounts. Parking lot speed limits can introduce vehicle movement risks that would otherwise not exist, increasing accident losses. Drivers should be cautious when turning left, backing, and going straight, as well as in bypass areas with high traffic volumes. Commercial cargo and heavy truck drivers should pay extra attention to their surroundings. The number of people in the vehicle is positively correlated with property damage, and other vehicle-related factors should also be considered. The factor combination with the greatest property damage is maximum speed limit ≤ 30 mph ⟶ slowing/stopped/stalled, and the most complex factor chain is bypass ⟶ large urbanized ⟶ high-density residential ⟶ 18:00–24:00 ⟶ dark (streetlights).

The conclusions also indicate that the point of impact and vehicle movement directly determine the accident cause. In high-density residential and commercial areas, collisions between vehicles and gates require extra attention, especially for heavy trucks and commercial cargo vehicles. Gate geometry and layout should be major considerations in parking lot planning and design. Inadequate lighting and higher vehicle speeds can lead to collisions between vehicles and parking facilities.

A safety evaluation system applicable to parking lots is proposed based on the model. It consists of three components: multisource data collection, data-driven Bayesian modeling of key factors, and safety management and measure implementation. Incorporating emerging technologies gives the system good adaptability and application prospects in various scenarios. However, the study has some shortcomings, notably the uneven data distribution. As new technologies make data collection increasingly smart and accurate, more uniform data distributions and additional data dimensions will become available; these will be examined in future studies.

Appendix

The definitions of the variables in the database and their abbreviations are shown in Table 9. The continuous data are labeled with, and those with too many categories requiring reclassification are labeled with.

Data Availability

The accident report data and records are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors are grateful to the University of South Florida for its data support.