Abstract

Real-time status acquisition of parking spaces is highly valuable for an intelligent urban parking system. Crowdsourcing-based parking availability sensing via connected and automated vehicles (CAVs) provides a feasible method with the advantages of high coverage and low cost. However, data trust issues arise from incorrect detection and incomplete information. This paper proposes a trustworthiness assessment method for crowdsourced CAV data considering different impact factors, such as the distance between the CAV and the target parking space, line abrasion, scene complexity, and image sharpness. The crowdsourced CAV data are collected through extensive field experiments and PreScan simulations. The classical line detection algorithm of VPS-Net and the target detection algorithm of YOLO-v3 are applied to detect on-street parking availability. A failure probability model based on the XGBoost algorithm is then developed to establish the relationship between data trustworthiness and different impact factors. The results show that the proposed model has an average accuracy of 78.29% and can effectively assess the degrees of external influences on the trustworthiness of the crowdsourced data. This paper provides a new tool to identify the data quality and improve the sensing accuracy for a crowdsourcing-based parking availability information system.

1. Introduction

Due to the rapid increase in car ownership and traffic demand, most cities are confronted with intractable problems such as urban traffic jams, parking resource shortages [1], and environmental pollution [2]. Real-time status monitoring of on-street parking spaces is an essential foundation and premise for solving urban parking problems caused by unbalanced supply and demand. Over the past decade, several effective measures have been developed to alleviate searching-for-parking traffic by improving parking resource use efficiency, including advanced parking management strategies [3], parking reservation and dynamic allocation [4], park-and-ride [5], automated valet parking [6, 7], and cloud-based centralized parking dispatching [8, 9]. Hence, efficient and reliable methods for sensing on-street parking availability are critical for the quality enhancement of parking services and the digital transformation of urban traffic management.

Accurately sensing on-street parking status has remained difficult during the last few decades. Most research uses data from specially deployed in-ground sensors in parking lots or garages [10, 11]. Based on the recent advances in sensing and communication technologies, an increasing number of researchers use wireless sensors such as light sensors, distance sensors based on infrared or ultrasound, magnetometers, and even combinations of different sensor devices [12]. However, a huge amount of asphalt digging is required for in-ground sensors, especially loops, which require intrusive installation [13]. Because the asphalt must be repeatedly dug up and restored, installation and maintenance are relatively expensive. Furthermore, in-ground sensors have limitations that must be considered during the design phase, including the requirement for charging by cables or batteries [4]. Rather than specially deployed sensors, some studies use data from the already deployed infrastructure, such as on-street parking payment management systems and parking meters [14]. However, missing data have consistently been a noticeable problem in existing methods due to underpaid/overpaid and unpaid transactions. Additionally, this approach depends on the scope of the deployment, and a domain-wide deployment still requires significant overheads.

Fortunately, connected and automated vehicles (CAVs) have progressed considerably in recent years [15–18]. Vehicle sharing is regarded as a major direction of future development [19, 20]. Crowdsourced data collected by CAVs are widely used in the scientific literature, and practical applications related to intelligent transportation demonstrate the enormous potential for sensing parking availability status [21]. Unlike roadside devices [22, 23], onboard units (OBUs) such as onboard millimeter-wave radar and vehicle-around-view monitors can be more convenient and economical, with high update frequency and broad coverage [24]. Study [25] presents a typical solution for citywide parking availability sensing based on a fleet of taxis: parking detection sensors installed on the taxis collect information on the availability of parking spaces, showing that crowdsourced data are highly suitable as a source for sensing parking status. Meanwhile, some studies demonstrate that a crowdsensing solution requires significantly fewer sensing units than a fixed sensing system [26]. Therefore, due to considerations of cost-effectiveness and reliability, onboard video sensors become the preferred choice.

However, crowdsourced data introduce new issues and challenges. The most significant of these is inferior data quality caused by many external factors such as road scenarios, environmental conditions, facility status, and sensor capability [27–30]. Data quality can directly affect the accuracy of on-street parking status monitoring. Meanwhile, CAV trajectories are highly random in space and time, and the resulting uncertain coverage and update frequency exacerbate the impact of data quality on the reliability of estimates. Accordingly, evaluating the trustworthiness of crowdsourced data from CAVs is highly valuable to avoid incorrect judgments arising from excessive trust in low-quality data. Therefore, this paper focuses on the image data obtained from onboard video sensors. It aims to reveal how different external factors affect parking detection accuracy and to propose an assessment method for real-time status monitoring of wide-area parking availability based on crowdsourced data. The main contributions of this paper are summarized as follows:
(i) A trustworthiness assessment framework is proposed for crowdsourced CAV data through systematic simulation experiments, and four environment-related factors affecting the parking detection algorithm are introduced, laying the foundation for improving the accuracy of parking availability detection in an urban context.
(ii) A failure probability prediction model of parking availability sensing is developed based on the XGBoost algorithm, which can quantitatively reveal the influence of different factors on the data trustworthiness of CAVs.

The organization of this paper is as follows. Section 2 introduces the related work, including methods of parking availability detection and the data quality of current image-based object detection methods. Section 3 describes the proposed trustworthiness assessment framework. Section 4 analyzes the external factors for the detection model. Section 5 presents the study results through an XGBoost-based trustworthiness assessment model for single vehicle detection. Finally, Section 6 summarizes the findings and future work.

2. Related Work

2.1. Parking Availability Detection

Several technical routes exist for citywide parking space detection. A typical method is based on electromagnetic induction. For example, a loop detector determines the occupation of parking spaces using an electromagnetic field with a quantifiable inductance; the field is interrupted and the inductance is reduced when vehicles pass the loop [31]. Magnetic sensors detect parking spaces through the magnetic variations caused by the presence of vehicles [32]. Alternatively, by comparing the counts between two magnetic sensors installed along the pathway, the number of vehicles between them can be obtained [33]. Similarly, piezoelectric sensor detection depends on the electric energy induced by substance vibration or mechanical stress [34]. However, these methods are vulnerable to environmental interference. Magnetic sensors can be influenced by large metal objects nearby. Ultrasonic and infrared sensors are sensitive to temperature and air pressure. Pneumatic tubes suffer from mechanical stress. Inductive loop, sonar, and microwave detectors are sensitive to vehicle speed because they fail to detect slow or stationary vehicles [35].

Another popular technical route is leveraging the image-based solution [36, 37]. Video sensors can detect multiple spaces simultaneously and provide wider area monitoring than other sensors [27]. Additionally, video sensors offer relatively low cost due to their easy installation, operation, and maintenance [38]. Maeda and Ishii [39] compared collected images with reference images using a normalized principal component of feature characteristics; however, obtaining and updating the reference images are difficult. Some studies use the typical shape of car elements for detection, but this requires many pixels per vehicle. Yamada and Mizuno [40] proposed an approach to detecting vehicle presence with grayscale images: each image region corresponding to a cell is segmented by density, and the distribution of segment areas is analyzed. Baroffio et al. [41] provided a method based on the hue histogram and a linear support vector machine (SVM) with high accuracy. In recent years, with advances in deep learning, some researchers have introduced it to parking space detection. Fan et al. [42] applied deep learning to parking space detection tasks and proposed various neural network-based models, including the multistep long short-term memory recurrent neural network (LSTM-NN) model [43]. Feng et al. [44] introduced a hybrid deep learning framework called dConvLSTM-DCN, designed for accurate prediction of short-term and long-term vacant parking space availability within a region, and developed an intelligent parking guidance system using a deep gated graph recurrent neural network (G2RNN) [45]. Regarding image-based methods, Zhang et al. [46] proposed a method based on a DCNN with YOLO-v2 to detect marked points in images. As image data are more complex than other data, Zinelli et al. [47] used an RCNN-based framework to adapt to various conditions; however, RCNN strongly depends on object proposals. Additionally, Suhr et al. [48] used a CNN to detect parking spaces in combination with global information and the attributes of the parking spaces. Nurullayev and Lee [49] proposed a method based on a dilated convolutional neural network specifically designed for detecting parking spaces. These methods still suffer from low recognition rates, sensitivity to environmental changes, and weak generalization. To address these problems, Xu and Hu [50] proposed the YOLO-v3-based VPS-Net, the detection method adopted in this paper.

2.2. Data Quality of Image-Based Object Detection

Data quality is essential to real-time monitoring of parking availability status based on crowdsourced data. Bock et al. [25] reported that inaccurate detection results strongly influence the sensing of parking availability status and applied Kalman filters to overcome this issue. Extensive research has been conducted on the factors affecting the accuracy of identification results. Dorafshan et al. [51] suggested that edge clarity can impact crack identification and degrade accuracy in challenging settings, such as low lighting conditions, the presence of shadows, and low-quality cameras. Huang et al. [52] assumed that the interference of various types of objects in the picture and the intensity of light necessarily affect the performance of object detection. Zhu [53] indicated that identifying road traffic conditions is influenced by various factors, including weather and road conditions. Tabernik and Skočaj [54] proposed that occlusion, brightness, color alteration, distortion, and skew occurring in the background can pose a risk to object detection. Dewi et al. [55] demonstrated that the target size impacts the accuracy of image recognition.

For parking space detection, some studies also offer relevant influencing factors. For example, Amato et al. [56] showed that obstacles such as lampposts and other cars are closely related to detection accuracy. Ling et al. [57] demonstrated that image data from car parking spaces are sensitive to lighting and weather conditions. Yamada and Mizuno [40] demonstrated that the surface of the parking space influences the detection results, especially for white mark-off lines in poor condition. Tang et al. [58] showed that deep learning models for parking space recognition are subject to variable environments, such as illumination changes, occlusion, and weather. Ichihashi et al. [59] proposed that weather, such as raindrops, can distort the camera image and reduce its sharpness, thus degrading the performance of camera-based vehicle detectors for parking lots. Zaidi et al. [60] indicated that many conditions, such as occlusion, lighting, pose, and perspective, can pose a challenge to detection by neural networks. However, previous studies have each considered a single class of factor, limited to either image quality or image content. Conversely, this paper combines the two, considering the effects of both the image itself and the surrounding environment in which it was captured on monitoring car parking images under different lighting conditions.

3. Trustworthiness Assessment Framework

3.1. Image-Based Parking Availability Detection

The parking space detection algorithm is the key to parking space status sensing. Its function is to accurately identify on-street parking spaces and determine their occupancy status. The image data collected by the surround-view camera cannot completely encompass the four endpoints comprising a parking space. Only the two nearer endpoints, which form the entrance line of the parking space, can be obtained, and the type of parking space cannot be inferred from them alone. Therefore, as shown in Figure 1, the classical VPS-Net algorithm [61] is applied to identify the outer entrance line endpoints of parking spaces through image grayscale processing and to estimate the other two endpoint locations and the type of parking space. Meanwhile, a YOLO-v3 pretrained detector is used to detect and classify all marker points and parking space endpoints. Target detection is then performed based on matching pairs of marker points with geometric information. The reliability of the parking occupancy classification results is enhanced by a deep convolutional neural network (DCNN).

3.1.1. Identification of Parking Space Endpoints and Entry Lines

Assume that the two identified endpoints $p_1$ and $p_2$ meet certain confidence requirements; that is, they can form an effective entrance line of a parking space if the distance between them satisfies a prior range: for the entrance line of a parallel parking space, $t_1 \le \|p_1 - p_2\| \le t_2$; for a vertical or an inclined parking space, $t_3 \le \|p_1 - p_2\| \le t_4$. The parameters $t_1$, $t_2$, $t_3$, and $t_4$ are based on a priori knowledge of the entrance line lengths of the different parking spaces.
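As a minimal sketch of this validity check (in Python, with hypothetical threshold values, since the paper does not give the actual prior lengths), the endpoint-distance test might look like:

```python
import math

# Hypothetical thresholds in metres; the paper takes these from prior
# knowledge of entrance-line lengths but does not state the values.
T1_PARALLEL, T2_PARALLEL = 5.0, 7.0   # parallel-space entrance line
T3_PERP, T4_PERP = 2.0, 3.5           # vertical/inclined entrance line

def entrance_line_type(p1, p2):
    """Classify a candidate entrance line by the distance between its
    two detected endpoints, or return None if no valid line forms."""
    length = math.dist(p1, p2)
    if T1_PARALLEL <= length <= T2_PARALLEL:
        return "parallel"
    if T3_PERP <= length <= T4_PERP:
        return "perpendicular_or_inclined"
    return None  # endpoints cannot form a valid entrance line
```

Endpoint pairs that pass this test are then handed to the local-pattern classifier described next.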

After the endpoints satisfy the distance constraints, forming a valid parking space entry line may still be impossible. This can be solved by classifying local image patterns formed by the two endpoints into predefined classes. A local coordinate system is established with the origin at $O$, the midpoint of $p_1$ and $p_2$, and with $\overrightarrow{p_1 p_2}$ as the X-axis. The rectangular region $R$ is defined in this coordinate system. Its length along the X-axis is $\lambda_x \|p_1 - p_2\|$, and its length along the Y-axis is $\lambda_y \|p_1 - p_2\|$, where $\lambda_x$ and $\lambda_y$ are hyperparameters controlling the width and height of the rectangular region.

3.1.2. Complete Parking Space Deduction

The complete parking space is obtained by deduction based on geometry and prior knowledge, as the video (picture) collected by the surround-view camera tends not to show the parking space completely. Each parking space comprises four points $p_1$, $p_2$, $p_3$, and $p_4$. Here, $p_1$ and $p_2$ are the two endpoints comprising the entrance line of the parking space, and $p_3$ and $p_4$ are the other two endpoints not covered in the image, whose coordinates can be calculated as

$$p_3 = p_2 + d(\cos\theta, \sin\theta), \quad p_4 = p_1 + d(\cos\theta, \sin\theta),$$

where $\theta$ and $d$ denote the angle and depth of the parking space; $\theta = 90°$ for the vertical and parallel parking spaces; $d_v$ and $d_p$ are the depths of the vertical and parallel parking spaces, respectively; $d_s$ is the depth of the inclined parking space; and $\theta_a$ and $\theta_o$ are the angles of an inclined space at an acute or obtuse angle, respectively.

3.1.3. Parking Status Classification

Normalization is required to maximize the classification performance because parking spaces appear at varying sizes in the surround-view image. Therefore, parking spaces are cut and warped to a uniform size of 120 × 46 pixels, depending on their position in the image. Perspective transformation techniques are used to implement this warping process: the four boundary points of a parking space in the image are used as source points, and the target points are the four vertices of a fixed 120 × 46 pixel rectangle. A series of labeled images is thus obtained, and these images are divided into positive and negative samples. The positive samples are the vacant parking spaces, and the negative samples are the occupied parking spaces. The number of training samples is then further increased by a 180° rotation transformation.

The parking space detection algorithm distinguishes the parking space occupancy based on different color markings. A green rectangle is marked when the parking space is identified as free, and a red rectangle is marked when the parking space is identified as occupied.

3.2. Experimental Design

We use PreScan to build the simulation scenario and collect the on-street parking simulation data. We exploit a dataset based on the measured data and the PreScan simulation data to explore the probability model of parking status sensing failure under different scenarios and interferences. The simulation dataset is prepared as follows:
(1) An experimental simulation model is built, as shown in Figure 2, including elements such as parking spaces, moving vehicles, stopped vehicles, and obstacles. A total of 200 parking spaces (100 vertical and 100 parallel) were placed on both sides of the five simulated main roads. Since the buildings are not the concern of this study, their placement, orientation, and interval distance are set randomly; likewise, the traffic light waiting time, the yielding mechanism of sidewalks, and the traffic sequence of intersections are irrelevant to parking space detection and are thus not considered in the simulation. Meanwhile, some of the vertical parking spaces on the left side of the road are randomly occupied, and pedestrians and delivery electric vehicles are placed between some parking spaces to fit the real scene.
(2) In the field experiment, the image data of the surround-view camera are collected by installing cameras at the front and rear of the inspection vehicle and stitching the pictures of four viewpoints through a perspective transformation. In the simulation experiment, photos of the same perspective are collected by setting the sensor parameters to simulate a camera with a vertically downward shooting angle directly above the inspection vehicle (at 30 m).
(3) Additionally, an image whose size does not meet the input requirements of the parking space detection algorithm is distorted, affecting detection accuracy. Therefore, the size of the sensor output image is made consistent with the input size of the parking space detection algorithm, 600 × 600 pixels. The frame rate is set to 10 Hz; that is, 10 pictures are taken per second. The sensor installation position and output settings are shown in Figure 3.
(4) The detection algorithm has three possible outcomes, as shown in Figure 4: (i) correct detection, when a car is in the identified parking space and the space is reported as occupied, or when no car is in the identified parking space and the space is reported as empty; (ii) false detection, when an empty space is recognized as occupied, or an occupied space is detected as empty; (iii) missed detection, when a parking space appears in multiple frames but is not detected in any of them.
(5) Since the frame rate of the onboard sensor is set to 10 Hz, a parking space appears in multiple consecutive detection pictures, meaning a parking space has multiple detection results. When the detection results are consistent, they represent the current occupancy status of the parking space. When the results displayed by multiple frames are inconsistent, the status of the parking space is marked as occupied regardless of its true state, to avoid cruising caused by system indication errors.
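The multiframe fusion rule in step (5) can be sketched as a small function; the boolean encoding of occupancy (True = occupied) is an assumption for illustration:

```python
def aggregate_frames(frame_results):
    """Fuse the per-frame occupancy results (True = occupied) for one
    parking space. Consistent results are taken at face value; any
    inconsistency marks the space occupied, so the system never sends
    a driver to a space it is unsure about."""
    results = set(frame_results)
    if len(results) == 1:
        return results.pop()
    return True  # inconsistent frames -> report occupied
```

This conservative rule trades a few missed vacancies for the elimination of cruising caused by wrongly advertised free spaces.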

4. External Factors for the Detection Model

4.1. Definition and Classification of External Factors

The parking detection method requires identifying parking space endpoints and entry lines. The quality of the captured images, including the clarity of the parking space line endpoints and entrance lines, impacts the identification [62, 63]. Based on the principle of identification, four main factors can be identified that affect the imaging of parking space endpoints and entry lines: (1) roadside distance, (2) line abrasion, (3) scene complexity, and (4) image sharpness, as shown in Figure 5.

4.1.1. Roadside Distance (D)

Roadside distance (D) refers to the vertical distance between the vehicle with the surround-view camera and the entry line of the on-street parking space. This affects the size of the image captured by the surround-view camera and the degree of image edge distortion.

4.1.2. Line Abrasion (A)

Line abrasion (A) refers to the degree to which the white entry lines of a parking space are missing or faded due to vehicle movement, weather, and other causes. The parking space detection model outlines the complete parking space by identifying the two endpoints of the entrance line and estimating the locations of the other two endpoints. Therefore, abrasion of the parking space entry line significantly affects whether the on-street parking space can be accurately identified.

4.1.3. Scene Complexity (C)

Scene complexity (C) refers to the composition of the traffic elements constituting the traffic flow close to the parking space. Identification accuracy is affected when pedestrians, electric vehicles, and other traffic elements encroach on the two endpoints of the parking space entry line.

For the complexity of individual parking spaces, a multivariate linear model is defined as

$$C_i = \sum_{j} w_j n_{ij},$$

where $C_i$ denotes the traffic complexity of the $i$-th parking space; $j$ denotes the obstacle category; $w_j$ denotes the weight coefficient corresponding to that obstacle category, proportional to the single footprint; and $n_{ij}$ denotes the number of obstacles in that category.

The obstacles are divided into three categories in the PreScan simulation experiment to better simulate realistic scenarios: (a) pedestrians (0.4 × 0.7 m); (b) electric vehicles (2.3 × 0.8 m); and (c) boxes, barricades, and so on (1.0 × 1.0 m), and assigned weights according to the footprint, as shown in Table 1.

Therefore, the traffic complexity of a single on-street parking space is calculated as

$$C_i = w_a n_a + w_b n_b + w_c n_c,$$

where $n_a$ is the number of pedestrians close to the parking space; $n_b$ is the number of electric vehicles; and $n_c$ is the number of boxes, barricades, and so on.

The overall traffic complexity is defined as

$$C = \frac{1}{N}\sum_{i=1}^{N} C_i,$$

where $N$ indicates the number of parking spaces in the model and takes a value of 200 in this paper.
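A minimal sketch of the complexity computation, assuming for illustration that the weights are taken directly as the footprint areas listed above (the actual values in Table 1 may differ):

```python
# Hypothetical weights, set equal to the footprint areas (m^2) of the
# three obstacle categories; the paper's Table 1 values may differ.
WEIGHTS = {"pedestrian": 0.4 * 0.7,   # 0.28
           "e_vehicle": 2.3 * 0.8,    # 1.84
           "box": 1.0 * 1.0}          # 1.00

def space_complexity(n_ped, n_ev, n_box):
    """Traffic complexity of one parking space: weighted obstacle count
    (the multivariate linear model above)."""
    return (WEIGHTS["pedestrian"] * n_ped
            + WEIGHTS["e_vehicle"] * n_ev
            + WEIGHTS["box"] * n_box)

def overall_complexity(per_space):
    """Overall complexity: mean of the per-space complexities."""
    return sum(per_space) / len(per_space)
```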

The number of traffic elements in the parking space is extracted from the image by semantic object color mapping in the image segmentation sensor in PreScan, as shown in Figure 6. After semantic segmentation, different classes of objects are represented by different colors, and the number of different obstacles in each parking space can be directly output for the complexity calculation.

4.1.4. Image Sharpness (S)

Image sharpness (S) refers to the quality of the captured images. The image obtained by following the car is blurred in the scene due to stains on the camera, slow focus speed, and so on. The parking space identification algorithm cannot detect the two endpoints of the parking space entry line when the dataset is overly blurry. Therefore, the image sharpness of the collected image dataset is also an important factor affecting detection accuracy.

The factors were classified based on the definitions to further quantify and compare the impact of each factor on the accuracy of identifying on-street parking spaces, and the classification criteria are shown in Table 2.

4.2. Failure Probabilities Corresponding to Different Combinations

The application scenarios are first divided into four categories: (1) normal light on sunny days, (2) weak light at night, (3) harsh light on sunny days, and (4) rain and fog with low visibility. We then combine each application scenario (weather condition) with the four influencing factors. The overall identification failure probability of the 200 parking spaces included in the simulation model corresponding to each combination is presented as a dataset. The failure probability refers to the percentage of the 200 parking spaces subject to false and missed detection under different application scenarios and factors. Each application scenario corresponds to a total of 2⁴ = 16 different combinations of factors and their corresponding failure probabilities, resulting in 64 data items for the four application scenarios. The collected data are shown in Table 3, taking normal light on sunny days as an example. The values for each factor are determined according to the classification criteria in Table 2.
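A sketch of the failure-probability computation, assuming detection results are encoded as 0 (correct), 1 (missed), and 2 (false), the encoding later used in Section 5.1:

```python
def failure_probability(results, n_spaces=200):
    """Share of parking spaces with a false or missed detection.
    `results` maps each space id to 0 (correct), 1 (missed), 2 (false)."""
    failed = sum(1 for r in results.values() if r in (1, 2))
    return failed / n_spaces
```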

Figure 7 shows the statistics and visualization of the failure probabilities corresponding to different combinations of factors in four application scenarios. Figure 7(a) shows the probability of failure corresponding to each of the 16 combinations of influencing factors in the four application scenarios, while Figure 7(b) better compares the impact of the four application scenarios on detection.

The failure probability reflects the applicability of the parking space identification model in different scenarios. The failure probabilities obtained for normal light on sunny days and weak light at night are roughly similar across the different combinations. In the scenario of rain and fog with low visibility, the failure probability of parking space detection is high for all 16 cases, reaching 60% in individual cases. The white fog weakens the strong contrast between the white entry line and the road color in the overhead-view pictures taken on rainy and foggy days, so the difference between the entry line and the road surface after grayscale conversion is unclear; the algorithm then cannot determine the parking space entry line, resulting in detection failure. Thus, the model currently cannot be applied successfully to this scenario. In summary, the algorithm performs in the four scenarios, in descending order of effectiveness: normal light on sunny days, weak light at night, harsh light on sunny days, and rain and fog with low visibility.

Additionally, the comparison reveals that the influencing factor of image sharpness (S) has less impact on identification accuracy. The peak in the curve at scene complexity C = 2 indicates that the traffic complexity (C) near the parking space has a greater impact on identification accuracy.

5. XGBoost-Based Trustworthiness Assessment Model

5.1. Data Description

It is also necessary to obtain specific values for the different factors and the corresponding detection result for each parking space to construct a predictive model of parking detection accuracy under different factors. Unlike the graded data of Section 4, the specific data focus on the detection failure of a single parking space under exact factor values.

A specific dataset includes specific values for the four influencing factors and detection results. A total of 518 random frames were sampled from the PreScan simulation model under normal light on sunny days by stitching the images together. Part of the specific data is shown in Table 4.

The specific data no longer classify the roadside distance, entry line abrasion, or traffic complexity but remain precise to a specific value. However, the image sharpness is still divided into two levels, 1 and 2. The detection results are 0, 1, or 2, corresponding to correct, missed, and false detection, respectively. Each type of data is collected as follows:
(1) Roadside distance: in the simulation, the lane width is set to 3.5 m, so the vertical distance from any point in the vehicle's trajectory to the on-street parking space is available; the values range from 1.7 m (vehicles traveling near the center line of the lane adjacent to the parking space) to 5.3 m (vehicles traveling near the center line of the second lane from the parking space).
(2) Line abrasion: set directly in the simulation scene.
(3) Scene complexity: the value of the scene complexity is computed by the single-space complexity equation in Section 4.1.3.
(4) Image sharpness: as it is impossible to quantify the degree of camera contamination in the simulation, the clarity of the image data is still graded by setting the camera effects of the sensor. A value of 1, when set to "Default," means that the camera is clear, while a value of 2, when set to "DirtyWindow," means that the camera is contaminated and the collected images are blurred.

5.2. Model Description

Ensemble learning is adopted to train and fit the data so that the result corresponding to any combination of factors can be classified and predicted. Ensemble learning combines multiple weak learners into a superior, more comprehensive strong learner and is typically faster and more accurate than any single weak learner. The XGBoost algorithm is an implementation of ensemble learning that significantly improves speed and efficiency. Therefore, we construct an evaluation and prediction model based on the XGBoost algorithm to improve the interpretation and prediction of the impact of the four environment-related factors on parking space identification.

The objective function of the XGBoost algorithm is

$$Obj = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k) + C,$$

where $l$ is the loss function, $C$ is a constant term, $f_k$ is a regression tree, and $\Omega$ is the regular term (including L1 and L2 regularization), used to define the complexity. This limits the number of leaf nodes in the tree to avoid the tree being oversized. The smaller the value of this term, the lower the complexity and the greater the generalization ability. The expression is

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \|w\|^2,$$

where $T$ is the number of leaf nodes, $w$ is the vector of leaf node scores, $\gamma$ is the cost of adding a node split, and $\lambda$ is the L2 regularization factor. The ultimate goal of XGBoost is to make the predicted value as close as possible to the true value with as good a generalization as possible.

The core idea of the XGBoost algorithm is to continuously perform feature splitting to grow trees. With each added tree, a new function $f_t(x)$ is learned to fit the residuals of the previous prediction:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F},$$

where $\mathcal{F}$ is the set of all $K$ regression trees, $f_k$ is one of the regression trees, and $f_k(x_i)$ is the score of the leaf node that sample $x_i$ falls into in the $k$-th tree. When the training is completed and $K$ trees are obtained, the predicted value of a sample is the sum of the scores of the corresponding leaf nodes of each tree.
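The additive prediction can be illustrated with a toy sketch in which each "tree" is stubbed as a leaf-scoring function; the functions and numbers are purely illustrative, not a real XGBoost model:

```python
# Minimal sketch of XGBoost's additive prediction: each stubbed "tree"
# returns a leaf score for a sample, and the ensemble sums them.
def predict(trees, x):
    """y_hat(x) = sum of the K trees' leaf scores for x."""
    return sum(tree(x) for tree in trees)

# Boosting intuition: each new tree fits part of the residual left by
# the current ensemble, moving the prediction toward the target.
trees = [lambda x: 0.5 * x,   # first tree's rough fit
         lambda x: 0.3 * x,   # second tree fits part of the residual
         lambda x: 0.2 * x]   # third tree fits the remainder
```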

Compared to the classical GBDT algorithm, the XGBoost algorithm incorporates several improvements that significantly increase effectiveness and performance. The XGBoost algorithm expands the objective function in a second-order Taylor series, preserving more information about the objective function. It also adds a strategy for handling missing feature values automatically: samples with missing values are tentatively assigned to the left and to the right subtree, the objective function values of the two assignments are compared, and the better one is kept, so the algorithm does not require preprocessing to pad missing features [56].

5.3. Data Analysis

The 518 data items collected were divided into training and test sets in a 3 : 1 ratio. A stratified K-fold division was then conducted to ensure the stability and reliability of the final model. Stratified K-fold division splits the dataset into mutually exclusive subsets that preserve the overall class proportions, on which stratified K-fold cross-validation is performed. Cross-validation allows all of the data to serve as both training and test sets, which is equivalent to enlarging the dataset.
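The stratified division can be sketched in plain Python: indices are grouped by class and dealt round-robin into folds so that each fold preserves the class proportions. The labels and fold count below are illustrative, not the paper's data:

```python
# Sketch: stratified K-fold index generation. Group sample indices by
# class, then distribute each class round-robin across the K folds so
# every fold keeps (roughly) the overall class proportions.
from collections import defaultdict

def stratified_kfold(labels, k):
    folds = [[] for _ in range(k)]
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    for idxs in by_class.values():
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return folds

labels = ["correct"] * 8 + ["missed"] * 4
folds = stratified_kfold(labels, 4)
# Each of the 4 folds contains 2 "correct" and 1 "missed" sample.
```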

We initialize the model using the classifier and regressor wrappers provided by XGBoost. Some of the model parameters are set as follows: max_depth = 5, learning_rate = 0.1, and n_estimators = 160. A range of indicators for the model was obtained by cross-validation, as shown in Table 5.
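A minimal sketch of this setup is shown below, using the quoted hyperparameters. scikit-learn's `GradientBoostingClassifier` stands in for xgboost's `XGBClassifier` (the two expose the same parameter names used here), and the data are random placeholders, not the authors' dataset:

```python
# Sketch: fitting a gradient-boosted classifier with the hyperparameters
# quoted above, then scoring it by 4-fold cross-validation.
# GradientBoostingClassifier is a stand-in for xgboost.XGBClassifier.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))                   # four impact factors
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # synthetic labels

model = GradientBoostingClassifier(max_depth=5, learning_rate=0.1,
                                   n_estimators=160)
scores = cross_val_score(model, X, y, cv=4)
print(scores.mean())
```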

The XGBoost model achieved a prediction accuracy of 75.97% on this dataset. The F1-scores for correct and missed detections are 0.76 and 0.78, respectively, both at a high level, indicating that the model performs well in predicting these two types of cases. Conversely, the F1-score for false detections is only 0.33, indicating that the model's predictions are not highly trustworthy for such cases.
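For reference, the F1-score reported here is the harmonic mean of precision and recall. The counts below are illustrative, not the paper's confusion matrix:

```python
# Sketch: F1-score from a class's true positives, false positives, and
# false negatives: F1 = 2 * precision * recall / (precision + recall).
def f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(round(f1(tp=76, fp=24, fn=24), 2))  # balanced errors -> 0.76
```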

After stratifying the dataset by four folds and cross-validating, the scores for each fold and their average scores were obtained, as shown in Figure 8(a).

Figure 8(a) shows that the cross-validation scores for each fold of the dataset were high, reaching a mean of 0.73, indicating that the model has good generalization ability. Because the maximum depth of the trees is limited, the effect of overfitting on the accuracy of the prediction results is avoided or at least weakened.

Meanwhile, the relative importance of the four factors affecting the accuracy of on-street parking space identification was obtained experimentally, as shown in Figure 8(b). Figure 8(b) shows that roadside distance has the greatest impact on the accuracy of the parking space recognition algorithm, reaching 0.36. This is followed by the entry line abrasion and traffic complexity, which are of similar importance at 0.29 and 0.27, respectively. The influence of image sharpness was only 0.08, indicating that this factor barely affected the accuracy of the parking space identification algorithm. The results were analyzed only qualitatively, as the volume of data was insufficiently large. The importance of the three factors, roadside distance, entry line abrasion, and traffic complexity, may change as the volume of data rises. Overall, the three are close in importance and cannot be precisely ranked in terms of their impact.
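The ranking above can be read off directly from the normalized importances (the values are those quoted from the experiment; in the xgboost/scikit-learn API they would come from a fitted model's `feature_importances_` attribute):

```python
# Sketch: ranking the four impact factors by the relative importances
# reported in the experiment. Importances are normalized to sum to 1.
importances = {"roadside distance": 0.36, "entry line abrasion": 0.29,
               "traffic complexity": 0.27, "image sharpness": 0.08}
ranked = sorted(importances, key=importances.get, reverse=True)
print(ranked[0])  # roadside distance dominates
```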

5.4. Data Correction

The number of false detections is very small compared with the other two detection outcomes, and when the values of the four factors are close, missed detections are far more frequent than false detections. Considering that neither a missed nor a false detection can accurately report the occupancy status of a parking space, the two are merged into a single category, incorrect detection, to improve the prediction accuracy of the model trained on the small-sample data. The model then provides only two predictions: correct and incorrect detection.
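This relabeling step is a simple mapping over the outcome labels (the label strings here are illustrative):

```python
# Sketch: collapsing the three detection outcomes into two classes
# ("correct" vs "incorrect") before retraining the model.
def merge_labels(labels):
    return ["correct" if y == "correct" else "incorrect" for y in labels]

print(merge_labels(["correct", "missed", "false", "correct"]))
# ['correct', 'incorrect', 'incorrect', 'correct']
```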

The corrected data were imported, and the XGBoost model was retrained. A series of model evaluation indicators were obtained, as shown in Table 6. The model’s prediction accuracy improved from 75.97% to 78.29%, and the F1-scores of both predictions and their weighted averages were high, indicating that the model’s prediction results have a high degree of trustworthiness.

The stratified four-fold cross-validation scores before and after the data correction were compared, as shown in Figure 9. The scores generally improved after the correction: the average cross-validation score rose from 0.73 to 0.77, and the model's predictive accuracy increased, indicating that the model has excellent generalization ability.

6. Conclusion

This study proposes a trustworthiness assessment framework for crowdsourcing-based citywide parking availability detection. Four environment-related factors impacting the parking detection algorithm (the distance between the CAV and the target parking space, line abrasion, scene complexity, and image sharpness) are determined through a series of field and simulation experiments. A failure probability prediction model of parking availability sensing is developed based on the XGBoost algorithm, which can reveal the influence mechanism of different external factors on data accuracy. The experimental results show that the average prediction accuracy of the model is 78.29%, enabling the detection vehicle to gauge the extent of algorithmic sensing failure while identifying parking spaces. The impact of the distance to the target parking space is the most pronounced, whereas image sharpness has a very weak effect. This avoids unnecessary trips arising from excessive trust in the results of parking space detection. The model can effectively assess the trustworthiness of crowdsourced data and significantly reduce the impact of quality issues arising from incorrect detection and incomplete information.

Data Availability

The data generated during the current study are owned by the Key Laboratory of Road and Traffic Engineering of the Ministry of Education, Tongji University, and are not publicly available. Contact the corresponding author for further details.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

This work was sponsored by the Shanghai Sailing Program under grant no. 21YF1449400.