Abstract

Accurate vehicle acceleration prediction is useful for developing reliable Advanced Driving Assistance Systems (ADAS) and improving road safety. The existence of driver heterogeneity magnifies the variations in acceleration data, leading to consequential impacts on the precision of vehicle acceleration prediction. However, few studies have fully considered the driver heterogeneity when predicting vehicle acceleration. To model the characteristics of individual drivers, this study first identifies the driving behavior semantics which is defined as the underlying patterns of driving behaviors. The analysis results from the coupled hidden Markov model (CHMM) are used to evaluate the driving behavior differences between different drivers by Wasserstein distance. Then the convolutional neural network (CNN) and long short-term memory (LSTM) network are applied to predict vehicle acceleration. To validate the accuracy of the proposed prediction framework, vehicle acceleration data in car-following conditions is extracted from the safety pilot model deployment (SPMD) dataset. The segmentation results indicate that the CHMM possesses a robust capacity for modeling driving behavior. The prediction results demonstrate that the proposed framework, which incorporates driver clustering before prediction, significantly improves the accuracy of predictions. And the CNN-LSTM outperforms the LSTM in predicting vehicle acceleration during car-following scenarios. The findings from this study can enhance the development of personalized functionalities within ADAS to promote its deployment, thereby improving its acceptance and safety.

1. Introduction

Autonomous driving technology is gaining attention as a solution to promote the traffic efficiency and prevent accidents in diverse traffic conditions. In the car-following scenario, a significant challenge in autonomous driving is to accurately model the driving behavior and predict the future movements of the preceding vehicle. Driving behavior reflects driver’s operations on vehicles and has an important influence on road safety [1, 2]. Many studies have shown that 80% of traffic accidents are caused by inappropriate driving behavior such as aggressive behavior [3]. Vehicle acceleration is a crucial aspect for describing the driving behavior. Accurate prediction of vehicle acceleration allows advanced driver assistance systems (ADAS) to obtain dynamic changes of driver’s actions and enables ADAS to predict the driver’s future operations. This improvement in ADAS can avoid driver’s deficiencies in perception and decision-making, thereby improving traffic safety.

1.1. Driving Behavior Analysis

Many studies in the past have demonstrated significant heterogeneity in car-following behavior among drivers, indicating diverse responses and adjustments to traffic flow changes [4, 5]. The heterogeneity among drivers contributes to substantial fluctuations in the driver characteristic variables, thereby increasing the complexity of modeling driving behavior, which carries implications for both the design of ADAS and traffic safety [6, 7]. However, most existing driving behavior modeling methods are unable to effectively handle the heterogeneity, and therefore, cannot accurately capture the driving behavior characteristics of different drivers. To bridge this gap, driving behavior analysis and evaluation have emerged as critical research areas. These methods aim to analyze the driving behavior characteristics of different drivers, enhance the understanding of driver heterogeneity, and evaluate differences between individual drivers based on specific indicators to enable effective clustering and modeling of driving behavior. Approaches currently available for driving behavior analysis can be broadly classified into two categories: those based on statistical features and those based on driving behavior semantics.

Methods based on statistical features usually consider the characteristics of driving data. Traditional methods often rely on a single statistical feature, such as the mean and standard deviation of brake pressure [8], the steering wheel position [9], and the throttle position [10]. The complexity of driving behavior arises from the intricate interplay of multiple factors, including but not limited to acceleration, braking, steering, lane changing, and other associated elements. Therefore, relying on a single feature to assess driving behavior may not adequately capture the full range of its characteristics. Many studies have used multiple indicators to analyze driving behavior. Fugiglando et al. [11] employed different statistical features to analyze driving behavior and applied the K-means algorithm for drivers clustering [12]. Euclidean distance was applied to recognize the preference of driving behavior with different statistical features including vehicle speed and throttle opening [13]. However, one major limitation of these driving behavior analysis approaches is that the intrinsic and dynamic characteristics of driving behavior cannot be captured from statistical data. Additionally, driving behavior evaluation approaches above are also obtained based on static criteria and do not take into account the randomness of data changes during driving.

Driving behavior involves dynamic decision-making. Even when faced with identical traffic conditions, a driver’s decision can change over time [9]. Therefore, it is crucial to fully capture dynamic characteristics from drivers’ operations for a better analysis of driving behavior. Some studies suggest that driving behavior semantics should be segmented. Driving behavior semantics were known as the data blocks with the same behavior characteristics. The driving behavior semantics can reflect the distinct dynamic correspondence between the driving environment and driving operation caused by driver heterogeneity [14]. Methods based on driving behavior semantics can be classified into traditional methods and Markov chain-based methods. In traditional methods, supervised and unsupervised learning techniques have been commonly used [15]. Supervised learning methods, such as the fuzzy logic algorithm [16, 17], have been introduced for segmentation and identification. Xie and Zhu [18] employed timestamp-based segmentation and random forest classification to analyze driving behavior. However, supervised learning methods require labeled training data, which can be laborious and time-consuming. Unsupervised learning methods have emerged as an alternative. The two-step algorithm proposed by Higgs and Abbas [6] has been used for car-following segmentation and clustering of driving behavior using K-means. Taylor et al. [19] used regularization to model the heterogeneity of driving behavior semantics over time.

The methods based on Markov chains are often used in sequence segmentation fields, such as natural language processing. There are many similarities between driving behavior semantics analysis and natural language processing. Both driving behaviors and natural language data are time-series data, where the order of events is of vital importance. The meaning of a word in natural language is determined by the words surrounding it; similarly, a specific driving action may imply different things depending on the surrounding driving environment. Thus, some natural language processing approaches have also been applied in driving behavior segmentation. The hidden Markov model (HMM) has demonstrated significant advantages in capturing dynamic processes in natural language and has found extensive applications in modeling driving behavior [20]. The HMM with Gaussian mixture emissions (GMM-HMM) was proposed to analyze the heterogeneity of different driving behaviors [20, 21]. These driving behavior analysis models could capture the underlying stochastic and dynamic characteristics of driver behaviors but failed to capture the microscopic preferences, which include the interrelationships between different characteristic variables. Wang et al. [22] applied a Markov chain-based method to identify primitive driving patterns from temporal driving data and subsequently employed Kullback-Leibler (KL) divergence to classify 75 driving patterns. KL divergence compares driving patterns through their feature distributions, which better accounts for the randomness of driving behavior data. Other commonly used measures for this purpose include the Jensen-Shannon (JS) divergence [23] and the Cauchy-Schwarz divergence [24]. These driving behavior evaluation methods consider the influence of the randomness of driving data and are capable of measuring the difference in the information contained in two temporal driving data sequences. However, these measures suffer from low discriminative power when the distributions have little or no overlap, so they cannot effectively distinguish between such distributions, and they often require comparison intervals to be defined to ensure precise and stable computation results.

Therefore, differences in driving behavior should be distinguished more comprehensively and precisely by considering the connections between characteristic variables. In addition, driving behavior evaluation metrics should remain applicable across a wide range of scenarios, including those in which the distributions have little overlap.

1.2. Vehicle Acceleration Prediction

Vehicle acceleration prediction models fall into two categories: mathematical models and machine learning models. A mathematical model has a fixed structure defined by a set of mathematical parameters. Kim and Yi [25] utilized a probabilistic method for holistic prediction of vehicle states, including acceleration; their model could be solved as a multistage optimal estimation problem. Parameter analysis and the Fisher discriminant method were combined to predict vehicle acceleration in [26], where the desired spacing car-following model was adopted. However, these models have deficiencies in nonlinear fitting of vehicle acceleration data.

Machine learning has become an increasingly popular approach for vehicle acceleration prediction. Zhang et al. [27] introduced a nonlinear autoregressive model with exogenous inputs (NARX) for onboard implementation. The support vector machine (SVM) was used to train the acceleration sample and forecast [28]. Recently, deep learning models have gained significant attention. A long short-term memory (LSTM) neural network was used to generate accurate vehicle acceleration distributions and predict future acceleration values [29]. Lio et al. [30] demonstrated the performance of recursive, non-recursive, structured, and nonstructured networks for vehicle acceleration prediction. Moreover, the nonrecursive network proved to be preferable. Although these models exhibit robust capabilities for nonlinear fitting, it should be noted that these models do not have specialized modules for feature extraction. When dealing with complex data, such as high-dimensional time-series data, they may not fully capture the characteristics of car-following data, which are critical for accurately predicting vehicle acceleration.

Previous prediction methods overlook the differences in driving behavior among individuals and fail to fully extract the valuable information from traffic data. Furthermore, the current driving behavior evaluation indicators suffer from limited applicability, which can lead to unreasonable driver clustering results. These limitations may decrease the accuracy of vehicle acceleration prediction, which lowers the acceptance of ADAS [31]. To address the shortcomings above, this paper proposes a framework for predicting vehicle acceleration by analyzing driving behavior. Driving behavior clustering based on driving behavior semantics segmentation is conducted before prediction, and the prediction model, a fusion of LSTM and a convolutional neural network (CNN), is referred to as the CNN-LSTM. To capture the fundamental driving patterns, a coupled hidden Markov model (CHMM) is utilized. The CHMM provides several advantages compared to other hidden Markov models, particularly in its ability to model interacting processes across various domains [11, 32]. It enables a comprehensive consideration of the interrelationships among different variables. Based on the results of driving behavior semantics segmentation, drivers are clustered into different groups by Wasserstein distance. The Wasserstein distance is a distribution distance metric that is insensitive to outliers and makes no assumptions on the distribution range. It effectively distinguishes the difference between two probability distributions, even in cases of limited overlap, making it a valuable tool for measuring driver heterogeneity [33, 34]. The prediction model combines the LSTM with the CNN, since the CNN is good at extracting features from variables, which can improve the performance of the prediction model [35, 36].

The contributions of this study can be summarized as follows: (1) A method based on CHMM and Wasserstein distance for driving behavior analysis and evaluation is introduced to undertake a refined clustering of drivers. (2) A CNN-LSTM model is proposed for predicting vehicle acceleration. The results indicate that the CNN-LSTM with a strong feature extraction capability outperforms the LSTM in vehicle acceleration prediction.

The remainder of this paper is arranged as follows: Section 2 introduces the data sources and preprocessing. In Section 3, this paper introduces the CHMM and the CNN-LSTM. Section 4 shows the results of semantic segmentation and evaluation for driving behavior. This section also presents the vehicle acceleration prediction results using the CNN-LSTM. Finally, Section 5 provides conclusions and future work for this study.

2. Data Description and Preprocessing

2.1. Data Extraction

The car-following data used in this paper are obtained from the Safety Pilot Model Deployment (SPMD) dataset. This comprehensive dataset contains driving data for 2,842 vehicles over two years in Ann Arbor, Michigan, USA. In this dataset, 98 sedans are equipped with a data acquisition system and Mobileye [37]. The onboard data, including vehicle speed, acceleration, and GPS, are obtained from the data acquisition system, while the lateral position relative to the lane or road edge is recorded by the Mobileye system. Each driver operates a vehicle and engages in several car-following instances. During these instances, data such as the car-following event ID, relative distance between the subject vehicle and the preceding vehicle, relative speed, acceleration, and data collection timestamps are collected.

The extraction principles for stable car-following events are as follows [38]: (1) The ego vehicle is in the same lane as the vehicle in front. (2) The relative distance is greater than 5 m and less than 120 m. (3) The speed of the ego vehicle exceeds 5 m/s. (4) If the ID of the preceding vehicle changes, the event is terminated. (5) The duration of each car-following process cannot be less than 50 s. The data collection area for the car-following events used in this study is illustrated in Figure 1 [22]. Records from 30 drivers with the longest trip durations are selected, and the histogram illustrating the speed distribution of the ego vehicle is shown in Figure 2.
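The five extraction rules above can be sketched as a simple filter over time-ordered samples. This is a minimal illustration only: the `Sample` record and its field names (`same_lane`, `gap_m`, `speed_mps`, `lead_id`) are hypothetical and do not correspond to actual SPMD column names, and treating any invalid sample as the end of an event is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    t: float          # timestamp in seconds (hypothetical field)
    same_lane: bool   # rule (1): lead vehicle in the same lane
    gap_m: float      # relative distance in metres
    speed_mps: float  # ego vehicle speed in m/s
    lead_id: int      # ID of the preceding vehicle


def is_valid(s: Sample) -> bool:
    # Rules (1)-(3): same lane, 5 m < gap < 120 m, ego speed > 5 m/s.
    return s.same_lane and 5.0 < s.gap_m < 120.0 and s.speed_mps > 5.0


def extract_events(samples: List[Sample], min_duration_s: float = 50.0) -> List[List[Sample]]:
    events: List[List[Sample]] = []
    current: List[Sample] = []

    def flush() -> None:
        # Rule (5): keep the event only if it lasts at least min_duration_s.
        if current and current[-1].t - current[0].t >= min_duration_s:
            events.append(list(current))
        current.clear()

    for s in samples:
        if not is_valid(s):
            flush()  # assumption: an invalid sample ends the current event
            continue
        if current and s.lead_id != current[-1].lead_id:
            flush()  # rule (4): a new preceding vehicle terminates the event
        current.append(s)
    flush()
    return events
```

With samples ordered by timestamp, `extract_events(samples)` returns one list of samples per stable car-following event.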

The car-following scenario is shown in Figure 3. In this condition, the vehicle’s acceleration or deceleration is predominantly determined by the relative distance and relative speed between the preceding vehicle and the ego vehicle [39, 40]. To maintain the desired distance, the driver modulates the brake or accelerator pedal accordingly. On this basis, three characteristic variables, namely relative speed, relative distance, and ego vehicle acceleration, are selected to characterize different driving behaviors. The acceleration of the ego vehicle ($a$) reflects driving intentions and preferences in driving behavior. The relative distance ($\Delta d$) represents the positional difference in the forward direction between the ego vehicle and the preceding vehicle. The relative speed ($\Delta v$) is the speed difference between the ego vehicle and the preceding vehicle.

2.2. Variables Segmentation

To better understand individual driving characteristics and to ensure that similar driving behavior semantics extracted from different drivers correspond to consistent driving behavior patterns, the three variables above are categorized into distinct levels. This classification takes into account the data characteristics and the physical and mental perception thresholds of drivers [23, 41, 42]. The variable segmentation information is shown in Table 1.

To remove the effect of differing units and scales, the raw data are standardized as follows, so that after processing each characteristic variable has a mean of 0 and a standard deviation of 1:

$$\tilde{x}_{i}^{k} = \frac{x_{i}^{k} - \mu_{i}}{\sigma_{i}}, \quad k = 1, \dots, K, \tag{1}$$

where $x_{i}^{k}$ represents the input data of driver $i$ in stable car-following event $k$, with $i$ ranging from 1 to 30; $K$ is the number of stable car-following events of the driver; and $\mu_{i}$ and $\sigma_{i}$ denote the mean and standard deviation of the characteristic variables for driver $i$, respectively.
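A minimal sketch of this per-driver standardization is given below. It assumes each event is stored as a NumPy array with one row per time step and one column per characteristic variable; the function name `standardize_driver` is illustrative, and pooling all events of a driver to compute the mean and standard deviation is an assumption.

```python
import numpy as np


def standardize_driver(events):
    """Z-score all events of one driver using that driver's pooled statistics.

    events: list of arrays of shape (T_k, n_variables), one per car-following event.
    """
    stacked = np.concatenate(events, axis=0)
    mu = stacked.mean(axis=0)     # per-variable mean over all events of the driver
    sigma = stacked.std(axis=0)   # per-variable (population) standard deviation
    return [(e - mu) / sigma for e in events]
```

Because the statistics are pooled over all events of a driver, the standardized variables have zero mean and unit standard deviation per driver, not per event.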

3. Methodology

3.1. Framework

Figure 4 displays the overall structure of the vehicle acceleration prediction framework in this paper. First, the car-following data are processed to obtain stable car-following events. Second, the CHMM is used to divide the processed data into driving behavior semantic segments. The difference in driving behavior is then assessed using the Wasserstein distance, and drivers are grouped into subgroups according to the evaluation results. Third, the CNN-LSTM is constructed for acceleration prediction in the different subgroups.

3.2. CHMM

Driving behavior data are subject to random variation, and the next state is assumed to depend only on the current state of the driving system. Traditional models struggle to capture the randomness of driving behavior. The HMM, with its powerful ability to describe dynamic processes, can effectively handle this stochasticity and has therefore been used extensively in the analysis of driving behavior [43]. The CHMM is a variant of the HMM that extends its capabilities: it incorporates a coupled multichain structure that establishes interconnections between the hidden state variables of the HMM chains. This model effectively captures interactions among multiple sequences, thereby enhancing its ability to handle latent connections between variables [23]. Thus, the CHMM is able to model the interaction between different characteristic variables.

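The CHMM with three chains is shown in Figure 5. In this study, the hidden state sequence represents the driving patterns, while the characteristic variables serve as the observation sequence. The observation sequence of each HMM chain is related only to the hidden states of that chain. The hidden state in each chain depends not only on the hidden state of its own chain at the previous moment but also on the hidden states of the other chains [23]. In the CHMM, each HMM chain has $N$ hidden states, so the three coupled chains have $N^{3}$ joint hidden-state combinations. The observation sequence is expressed as $O = \{o_1, o_2, \dots, o_T\}$. At time $t$, the observed values of the three characteristic variables are $o_t^{(1)}$, $o_t^{(2)}$, and $o_t^{(3)}$, and $q_t^{(1)}$, $q_t^{(2)}$, and $q_t^{(3)}$ represent the corresponding hidden states at $t$. The CHMM is represented by the parameters $\lambda = (A, B, \pi)$. The state transition matrix $A$, the observation probabilities $B$, and the initial state probabilities $\pi$ can be calculated by equations (2)–(4):

$$a^{(c)}_{j \mid i_1 i_2 i_3} = P\big(q_{t+1}^{(c)} = j \mid q_t^{(1)} = i_1,\ q_t^{(2)} = i_2,\ q_t^{(3)} = i_3\big), \tag{2}$$

$$b^{(c)}_{j}\big(o_t^{(c)}\big) = P\big(o_t^{(c)} \mid q_t^{(c)} = j\big), \tag{3}$$

$$\pi^{(c)}_{j} = P\big(q_1^{(c)} = j\big), \tag{4}$$

where $c \in \{1, 2, 3\}$ indexes the HMM chains.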

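The emission function in the CHMM uses a Gaussian distribution to calculate the observation probability, since all variables are continuous:

$$b^{(c)}_{j}\big(o_t^{(c)}\big) = \frac{1}{\sqrt{2\pi}\,\sigma^{(c)}_{j}} \exp\!\left(-\frac{\big(o_t^{(c)} - \mu^{(c)}_{j}\big)^{2}}{2\big(\sigma^{(c)}_{j}\big)^{2}}\right), \tag{5}$$

where $o_t^{(c)}$ is the observation of HMM chain $c$ at time $t$, $j$ is the hidden state of HMM chain $c$, and $\mu^{(c)}_{j}$ and $\sigma^{(c)}_{j}$ respectively represent the mean and standard deviation of the probability distribution.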

Driving behavior segmentation is a classic decoding problem in the application of the HMM. The CHMM computes the most likely sequence of hidden states for the time-series driving characteristics in the car-following process. The inputs of the CHMM are the time series of characteristic variables in the car-following events, the number of hidden states, the initial state transition probabilities, and the emission probabilities; the output is the most likely hidden state at each point in time.
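To make the decoding step concrete, the sketch below runs the Viterbi algorithm on a single-chain HMM with Gaussian emissions. This is a deliberate simplification and not the paper's CHMM: a coupled model would apply the same recursion over the joint states of all chains. The function names and the toy parameters in the usage note are assumptions for illustration.

```python
import numpy as np


def gaussian_logpdf(x, mean, std):
    # Log-density of a univariate normal distribution.
    return -0.5 * np.log(2.0 * np.pi * std ** 2) - (x - mean) ** 2 / (2.0 * std ** 2)


def viterbi(obs, log_pi, log_A, means, stds):
    """Most likely hidden-state path of a single-chain Gaussian HMM.

    obs: (T,) observations; log_pi: (N,) initial log-probabilities;
    log_A: (N, N) log transition matrix with log_A[i, j] = log P(j | i).
    """
    T, N = len(obs), len(log_pi)
    log_b = gaussian_logpdf(obs[:, None], means[None, :], stds[None, :])  # (T, N)
    delta = np.empty((T, N))            # best log-score of a path ending in each state
    psi = np.zeros((T, N), dtype=int)   # back-pointers
    delta[0] = log_pi + log_b[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A  # scores[i, j]: come from i, go to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_b[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):      # backtrack from the last time step
        path[t] = psi[t + 1, path[t + 1]]
    return path
```

For example, with two states whose emission means are 0 and 5 and a transition matrix that favors staying in the same state, a sequence that jumps from values near 0 to values near 5 is split into two segments at the jump.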

3.3. CNN-LSTM

A CNN-LSTM model consists of two parts: the CNN part captures features of the car-following data and improves the algorithm’s efficiency, and the LSTM part then predicts the acceleration. The framework of the CNN-LSTM model is presented in Figure 6. First, the input consists of the $l$ time steps before the $t$-th time step, which can be written as follows:

$$X_t = \{x_{t-l}, x_{t-l+1}, \dots, x_{t-1}\}, \tag{6}$$

where $x_t$ is a multidimensional vector of the characteristic variables at time $t$.

Second, the CNN part is utilized to extract different features. CNN is a deep neural network that employs convolutional computation [44, 45]. It has been shown to be effective in extracting features from matrices and accelerating the training process [46]. The core component of CNN is the convolution layer, which extracts features from characteristic variables via the convolution operation [47, 48]. The structure of a single-layer convolutional network can be expressed as follows:

$$y_j = f\Big(\sum_{i \in M_j} x_i * k_{ij} + b_j\Big), \tag{7}$$

where $y_j$ is the output of the convolution layer, $M_j$ is the collection of input samples of $x$, $k_{ij}$ is the convolution kernel, $b_j$ is the bias of the layer, $j$ is the channel index over the multiple convolution filters in the convolutional layer, and $f(\cdot)$ is the activation function.

In general, the pooling layer follows the convolutional layer to reduce the dimensionality of the characteristic data obtained by convolution. In this paper, the maximum pooling layer is utilized.
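The convolution and max-pooling steps can be illustrated with a small NumPy sketch. The ReLU activation, the kernel size, the number of filters, and the pooling size used here are assumptions chosen for illustration; the text above does not fix them. As in most deep learning frameworks, the "convolution" is implemented as a sliding dot product without kernel flipping.

```python
import numpy as np


def conv1d_valid(x, kernels, bias):
    """1D convolution over time followed by ReLU (assumed activation).

    x: (T, C_in); kernels: (K, C_in, C_out); bias: (C_out,).
    Returns an array of shape (T - K + 1, C_out).
    """
    K = kernels.shape[0]
    t_out = x.shape[0] - K + 1
    out = np.empty((t_out, kernels.shape[2]))
    for t in range(t_out):
        # Contract the (K, C_in) window against every filter at once.
        out[t] = np.tensordot(x[t:t + K], kernels, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)


def max_pool1d(y, size=2):
    """Non-overlapping max pooling along the time axis."""
    t_used = (y.shape[0] // size) * size  # drop a trailing remainder, if any
    return y[:t_used].reshape(-1, size, y.shape[1]).max(axis=1)
```

For an input window of 80 time steps with three characteristic variables, a kernel of length 3 with 16 filters yields a (78, 16) feature map, which pooling with size 2 reduces to (39, 16) before it is passed to the LSTM part.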

The CNN part is followed by the LSTM part. LSTM, a variant of recurrent neural networks (RNN), incorporates input gates, output gates, forget gates, and memory cells to address the challenge of gradient vanishing or explosion [49–51]. Figure 7 shows the structure of the LSTM unit. The forget gate (FG) determines which information in the cell state is discarded. According to $h_{t-1}$ and $x_t$, the forget gate outputs a number between 0 and 1 for each element of the cell state: 0 represents complete discard, and 1 indicates full retention. The FG is expressed in (8):

$$f_t = \sigma\big(W_f \cdot [h_{t-1}, x_t] + b_f\big), \tag{8}$$

where $f_t$ represents the state of the forget gate at time $t$, $W_f$ and $b_f$ are the forget gate weight and bias, respectively, $x_t$ is the input at time $t$, which is also the output of the maximum pooling layer, $h_{t-1}$ is the output at the previous time step, and $\sigma$ is the activation function.

The input gate (IG) is used to update the information in the cell state. It passes the input at the present moment and the hidden layer state at the previous time through the sigmoid function. It is calculated as follows:

$$i_t = \sigma\big(W_i \cdot [h_{t-1}, x_t] + b_i\big), \tag{9}$$

where $i_t$ indicates the input gate state. The cell state $C_t$ is updated by the information stored in the previous memory cell and the new candidate information, as shown in the following equations:

$$\tilde{C}_t = \tanh\big(W_C \cdot [h_{t-1}, x_t] + b_C\big), \tag{10}$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \tag{11}$$

where $\tilde{C}_t$ is the candidate cell state at $t$, $C_t$ is the updated cell state, and $W_C$ and $b_C$ are the weight and bias of the cell state, respectively. The output $h_t$ is based on the memory cell state $C_t$ and the output gate state $o_t$:

$$o_t = \sigma\big(W_o \cdot [h_{t-1}, x_t] + b_o\big), \tag{12}$$

$$h_t = o_t \odot \tanh(C_t), \tag{13}$$

where $W_i$, $b_i$ and $W_o$, $b_o$ are the weights and biases of the input gate and the output gate, respectively.
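A single LSTM time step following the standard gate formulation can be written out in NumPy as a sanity check of the forget gate, input gate, cell update, and output gate described above. The dictionary-based parameter layout and the function name `lstm_step` are illustrative assumptions; in practice a framework layer such as the one provided by TensorFlow would be used instead.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    x_t: (D,) input; h_prev, c_prev: (H,) previous output and cell state.
    W: dict with keys 'f', 'i', 'c', 'o', each an (H, H + D) matrix.
    b: dict with the same keys, each an (H,) bias vector.
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde       # updated cell state
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    h_t = o_t * np.tanh(c_t)                 # new output
    return h_t, c_t
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate state to 0, so the cell state is simply halved at each step. This makes the role of the forget gate easy to verify by hand.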

3.4. Wasserstein Distance

In the driving behavior evaluation part, the Wasserstein distance is used to evaluate the difference between the distributions of various evaluation indicators for drivers. The Wasserstein distance measures the distance between two probability distributions $P$ and $Q$. It can be mathematically expressed as follows:

$$W_p(P, Q) = \left(\inf_{\gamma \in \Gamma(P, Q)} \int \lVert x - y \rVert^{p}\, \mathrm{d}\gamma(x, y)\right)^{1/p}, \tag{14}$$

where $\Gamma(P, Q)$ is the set of all joint distributions whose marginals are $P$ and $Q$, and $W_p(P, Q)$ is the Wasserstein distance of order $p$ between $P$ and $Q$. When $p = 1$, the Wasserstein distance is referred to as the first-order Wasserstein distance or Earth Mover’s Distance. In comparison to KL divergence and JS divergence, the Wasserstein distance possesses the advantage of being sensitive to the shape of distributions. Even when two distributions have minimal overlap, the Wasserstein distance can effectively capture and differentiate the dissimilarities between them. Furthermore, KL divergence and JS divergence are limited when one of the two distributions takes a value of zero where the other does not: KL divergence then becomes undefined, and JS divergence saturates at a constant and no longer reflects how far apart the distributions are. In contrast, the Wasserstein distance overcomes this drawback, which makes it appropriate for the evaluation indicators in this study.
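For one-dimensional empirical distributions with the same number of samples, the optimal coupling simply matches the sorted samples, which gives a short way to compute the distance. The sketch below relies on that equal-sample-size assumption; the function name `wasserstein_1d` is illustrative, and a library routine would normally be used for unequal sample sizes or weighted samples.

```python
import numpy as np


def wasserstein_1d(x, y, p=1):
    """Empirical p-Wasserstein distance between two equal-size 1D samples."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    # In 1D the optimal transport plan pairs the k-th smallest values.
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))
```

Two samples with disjoint supports, such as `[0, 1, 2]` and `[5, 6, 7]`, still yield a finite distance that grows as the samples move further apart, which is the property that motivates the choice of this metric over KL and JS divergence.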

4. Results

4.1. Driving Behavior Evaluation

Given the diverse characteristics of drivers, similar car-following scenarios can prompt varied responses among individuals, resulting in significant fluctuations in the car-following characteristic variables between drivers. This poses a challenge to driving behavior modeling and prediction. By assessing the driving behavior among drivers and grouping those with minor differences together, the variability in the characteristic variables within a group is reduced, facilitating more accurate modeling and prediction of driving behavior. Depending on the characteristics of driving behavior, most studies have divided different drivers into three types: aggressive, moderate, and conservative [52–54]. Therefore, in this study, the number of hidden states in a single CHMM chain is set to 3, and the model is referred to as CHMM_3.

Figure 8 shows the duration distribution results for driver #5. As can be discerned from the figure, the majority of driver #5’s durations fall within the 1–10 second interval, with the average duration being 5 seconds. Figure 9 illustrates the segmentation results of driver #30. The CHMM model divides the car-following process into several segments. One segment represents the same primitive driving pattern, reflecting a uniform driving behavior characteristic. The background color block represents extracted driving behavior patterns, with uniform colors indicating similar behavior types. The curves in the graph indicate significant fluctuations in the characteristic variables of driver #30. The segmentation results obtained from CHMM_3 align closely with fluctuations in the driver characteristic data. Thus, the driving behavior semantics extracted from the data effectively reflect the potential driving behavior characteristics represented by characteristic variables.

Table 2 displays the duration distribution of driving behavior semantics obtained from GMM-HMM and CHMM_3. The GMM-HMM has been widely applied in the field of segmentation to process sequence data and uncover hidden states [55, 56]. The results from CHMM_3 indicate that 81.3% of pattern durations fall within the range of 1 to 10 seconds, while the corresponding proportion for GMM-HMM is only about 60%. Wang et al. [22] reported comparable findings, where over 80% of behavior semantics durations ranged between 1 and 10 seconds, with a mean duration of approximately 5.9 seconds. Thus, the behavior semantics segments obtained from CHMM_3 can better describe the actual driving behavior characteristics.
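The duration statistics discussed above can be obtained from a decoded hidden-state sequence by run-length encoding. The following sketch assumes a non-empty integer state sequence sampled at a fixed period `dt`; both the function name and the `dt` parameter are illustrative, since the sampling period is not stated here.

```python
import numpy as np


def segment_durations(states, dt=1.0):
    """Split a state sequence into constant-state segments.

    Returns (labels, durations): the state of each segment and its length
    in seconds, assuming one sample every dt seconds.
    """
    states = np.asarray(states)
    change = np.flatnonzero(np.diff(states) != 0) + 1   # indices where a new segment starts
    bounds = np.concatenate(([0], change, [len(states)]))
    lengths = np.diff(bounds)
    labels = states[bounds[:-1]]
    return labels, lengths * dt
```

The share of segments lasting between 1 and 10 seconds can then be computed as `np.mean((durations >= 1) & (durations <= 10))`.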

In this study, the difference in driving behavior among drivers is evaluated by Wasserstein distance from three aspects: the duration, occurrence probability of a primitive driving pattern, and distribution of characteristic variables between every pair of drivers [23]. To eliminate the influence of results with different magnitudes on the overall result, normalization is applied to the results of the three dimensions separately.

The duration distribution of semantics can provide insights into the characteristics of driving behavior patterns. For driver $i$, the duration of driving behavior semantics is $d_i$, and its distribution is represented as $P(d_i)$. For driver $j$, the distribution of the duration is represented as $P(d_j)$. The difference in the duration of driving behavior semantics between the two drivers can be expressed as follows:

$$W_D(i, j) = W\big(P(d_i), P(d_j)\big), \tag{15}$$

where $W(\cdot, \cdot)$ denotes the Wasserstein distance.

The occurrence probability of a primitive driving pattern can serve as an indicator of drivers’ driving behavior preference. After the average values of the three variables are calculated for each driving behavior semantics, labels are assigned to the driving behavior semantics based on the intervals defined in Table 1. For example, “CD-KE-AA” represents the semantic label “close distance-keeping-aggressive acceleration.” Due to the discrete distribution of labels, the frequencies of different labels are used instead of the values of a continuous distribution function. By conditioning on the car-following distance, the frequencies of different labels are computed for each distance category. For drivers $i$ and $j$, the occurrence probabilities of labels in distance category $m$ are $P_i^{m}$ and $P_j^{m}$, respectively. The average Wasserstein distance is then calculated over the three distance labels:

$$W_P(i, j) = \frac{1}{3} \sum_{m=1}^{3} W\big(P_i^{m}, P_j^{m}\big), \tag{16}$$

where $W(\cdot, \cdot)$ denotes the Wasserstein distance.

The distribution of characteristic variables reflects the degree of aggressiveness. According to Table 2, the driving behavior semantics are divided into eight intervals based on duration. The mean and standard deviation of the characteristic variables are calculated separately for drivers $i$ and $j$ within each of these eight intervals, and the data in interval $n$ are fitted with normal distributions, represented as $\mathcal{N}_i^{n}$ and $\mathcal{N}_j^{n}$. The average Wasserstein distance for the difference in the degree of aggressiveness is calculated as follows:

$$W_A(i, j) = \frac{1}{8} \sum_{n=1}^{8} W\big(\mathcal{N}_i^{n}, \mathcal{N}_j^{n}\big), \tag{17}$$

where $W(\cdot, \cdot)$ denotes the Wasserstein distance.

Finally, the average of the three distances for duration, occurrence probability, and characteristic-variable distribution provides a comprehensive evaluation index of driving behavior.

Figure 10 shows the heatmaps of the three evaluation aspects after normalization, as well as the comprehensive evaluation results. Darker colors represent larger Wasserstein distances, indicating a greater difference between the two drivers in each indicator. Rows and columns corresponding to drivers whose behavior differs significantly from that of the others in Figures 10(a)–10(c) are highlighted with green boxes. According to Figure 10(a), drivers #15, #16, and #17 differ significantly from the others in the distribution of duration, as they show dark red vertical bars. This indicates that the transition characteristics of their driving patterns differ more from those of the others. For drivers #24 to #30, the Wasserstein distance values are small, reflecting that the differences in duration distributions among these drivers are relatively small. This suggests that the transition characteristics of driving patterns among drivers #24 to #30 are fairly similar. From Figure 10(b), drivers #11 and #20 have a more distinct distribution of the occurrence probability of driving patterns, which suggests that their preferences in selecting driving patterns differ more from those of the others. As for the distribution of characteristic variables in Figure 10(c), the most significant difference exists for drivers #25 and #26, which reflects that their level of aggressiveness differs from that of the other drivers.

Therefore, using the Wasserstein distance in conjunction with driving behavior semantics allows differences in driving behavior among drivers to be visualized. The results of the comprehensive evaluation are shown in Figure 10(d). To cluster the drivers, the boundary condition of the Wasserstein distance is set at the 25th percentile (0.29). According to the comprehensive evaluation results, the eight drivers whose pairwise distances are below 0.29 are grouped as group 1 (#5, #6, #9, #13, #19, #23, #27, #28), and the remaining 22 drivers are assigned to group 2. All 30 drivers together are defined as group 3.
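One possible way to go from three pairwise distance matrices to a group of similar drivers is sketched below. Several choices here are assumptions rather than the procedure used in this study: min-max scaling as the normalization, a percentile taken over the upper triangle of the combined matrix, and a greedy search that starts from the closest pair and adds drivers whose distance to every current member is below the threshold.

```python
import numpy as np


def min_max(m):
    """Scale off-diagonal entries of a square distance matrix to [0, 1]."""
    off = m[~np.eye(len(m), dtype=bool)]
    return (m - off.min()) / (off.max() - off.min())


def comprehensive_index(w_d, w_p, w_a):
    """Average of the three normalized pairwise distance matrices."""
    combined = (min_max(w_d) + min_max(w_p) + min_max(w_a)) / 3.0
    np.fill_diagonal(combined, 0.0)   # a driver has zero distance to itself
    return combined


def similar_group(dist, q=25):
    """Greedy group of drivers whose pairwise distances are all below the q-th percentile."""
    iu = np.triu_indices_from(dist, k=1)
    threshold = np.percentile(dist[iu], q)
    closest = np.argmin(dist[iu])                     # seed with the closest pair
    group = [int(iu[0][closest]), int(iu[1][closest])]
    for k in range(len(dist)):
        if k not in group and all(dist[k, g] < threshold for g in group):
            group.append(k)
    return sorted(group), float(threshold)
```

Drivers not returned by `similar_group` would form the heterogeneous group, and the full set of drivers serves as the baseline group.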

4.2. Prediction Results Analysis and Comparison

The performance of the proposed framework is compared with that of the LSTM, and the acceleration prediction experiments are carried out separately for each group. All experiments are conducted on a computer using the TensorFlow framework. As shown in equations (18) and (19), two indicators are employed to evaluate the performance of the algorithms on the different groups, the mean squared error (MSE) and the mean absolute error (MAE):

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \big(y_i - \hat{y}_i\big)^{2}, \tag{18}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \big\lvert y_i - \hat{y}_i \big\rvert, \tag{19}$$

where $n$ is the number of prediction samples, $y_i$ represents the true value, and $\hat{y}_i$ represents the predicted value of the acceleration.
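Both indicators are straightforward to compute; a minimal NumPy sketch with illustrative function names is:

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error between true and predicted accelerations."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(diff ** 2))


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted accelerations."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))
```

Because the MSE squares each error, it penalizes occasional large deviations more heavily than the MAE, which is why the two indicators can show different relative improvements for the same pair of models.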

For each driver in each group, the data is divided into training and testing sets at a 4 : 1 ratio, with the division made by car-following event. Within the training set, 15% of the data is allocated as the validation set. The input length for both models is 80, and the prediction length is 1. Because random seeds introduce inherent randomness into deep learning models, each model’s performance can vary from run to run. To reduce the influence of this randomness, five separate experiments are conducted for both the CNN-LSTM and the LSTM, and the results are shown in Table 3. Across the five repeated experiments, the data used and the hyperparameters (including learning rate, number of neurons, and type of optimizer) are kept consistent for both models. Group 1 consists of eight drivers with minor differences in driving behavior, whereas group 2 comprises drivers with varying driving behavior. Group 3 encompasses all the drivers examined in this study, and group 4 consists of eight drivers selected at random. In terms of heterogeneity, group 2 exhibits higher heterogeneity than group 3, group 3 exhibits higher heterogeneity than group 1, and group 4 also exhibits higher heterogeneity than group 1. The bold results in the table represent the minimum experimental errors among the different groups. In each group, the results of the five experiments differ for each model, but they fluctuate only within a narrow range. These results reflect the randomness inherent in training deep learning networks and underline the importance of conducting five experiments to accurately assess each model’s overall prediction performance.
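The data preparation described above can be sketched as follows. This is an illustration under stated assumptions rather than the study's pipeline: each car-following event is assumed to be an array of shape (time steps, features), the events are assumed to be shuffled before splitting, the validation share is assumed to be taken event-wise from the training events, and the acceleration is assumed to sit in column `target_col`. The helper names `split_events` and `make_windows` are hypothetical.

```python
import numpy as np


def split_events(events, test_ratio=0.2, val_ratio=0.15, seed=0):
    """Split events into train/validation/test lists without cutting any event."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(events))
    n_test = int(round(len(events) * test_ratio))   # 4 : 1 train-to-test ratio
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    n_val = int(round(len(train_idx) * val_ratio))  # 15% of the training events
    val_idx, train_idx = train_idx[:n_val], train_idx[n_val:]

    def pick(ids):
        return [events[i] for i in ids]

    return pick(train_idx), pick(val_idx), pick(test_idx)


def make_windows(events, input_len=80, pred_len=1, target_col=0):
    """Build sliding windows: input_len past steps predict the next pred_len targets."""
    xs, ys = [], []
    for event in events:
        event = np.asarray(event, dtype=float)
        last_start = len(event) - input_len - pred_len
        for start in range(last_start + 1):
            stop = start + input_len
            xs.append(event[start:stop])
            ys.append(event[stop:stop + pred_len, target_col])
    return np.array(xs), np.array(ys)
```

Splitting by event rather than by individual sample keeps overlapping windows from the same event out of both the training and testing sets, which would otherwise leak information and understate the test error. Events shorter than `input_len + pred_len` steps produce no windows in this sketch.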

Column “Average” presents the mean performance metrics obtained from the five experiments with the LSTM and the CNN-LSTM across the four driver groups. The bold results represent the outcomes of the model with the smallest prediction errors among the four driver groups. The CNN-LSTM consistently outperforms the LSTM for all four driver groups. For group 1, the CNN-LSTM demonstrates marginally superior predictive accuracy compared to the LSTM. For group 2, the CNN-LSTM outperforms the LSTM with a 21.5% improvement in MSE and a 13.5% improvement in MAE. Similarly, for group 3, the CNN-LSTM outperforms the LSTM with a 25.9% improvement in MSE and a 16.3% improvement in MAE. For group 4, the CNN-LSTM outperforms the LSTM with an 11.2% improvement in MSE and a 6.8% improvement in MAE.
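The improvement percentages are presumably relative error reductions computed from the averaged metrics, although the text does not state the formula. Under that assumption, the calculation can be sketched as follows. The function names are hypothetical, and the error values are made up for illustration and are not taken from Table 3.

```python
import numpy as np


def average_over_runs(run_errors):
    """Mean error over repeated experiments with different random seeds."""
    return float(np.mean(np.asarray(run_errors, dtype=float)))


def relative_improvement(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - model_error) / baseline_error


# Hypothetical MSE values from five runs; NOT the values reported in Table 3.
lstm_mse_runs = [0.050, 0.052, 0.048, 0.051, 0.049]
cnn_lstm_mse_runs = [0.040, 0.041, 0.039, 0.040, 0.040]

improvement = relative_improvement(
    average_over_runs(lstm_mse_runs),
    average_over_runs(cnn_lstm_mse_runs),
)
```

Averaging over the runs before taking the ratio, as done here, gives a single figure per group and metric. Computing the improvement per run and then averaging would generally give a slightly different number.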

For further comparison, group 1 shows lower prediction errors for both models than the other three groups. This observation indicates that the effectiveness of both models in vehicle acceleration prediction is diminished when they are applied to driver groups with greater heterogeneity. In particular, the CNN-LSTM achieves higher accuracy on group 1 than on group 4, with an improvement of 60.0% in MSE and 42.8% in MAE. Group 1 and group 4 comprise an equal number of drivers, and the difference lies in how these eight drivers are selected. Group 1 is the result of clustering with the Wasserstein distance based on the behavior semantics segmented by the CHMM, while group 4 is the result of random selection. The better performance on group 1 is probably because the CHMM partitions driving behavior semantics effectively and the Wasserstein-distance-based evaluation of driving behavior is comprehensive and rational. Together, these lead to a more accurate evaluation of driver heterogeneity and, consequently, to more consistent driving behavior among the drivers within group 1. Groups 2–4 show larger heterogeneity among drivers, which presents a greater challenge for vehicle acceleration prediction.

The comparison of the true values with the predicted values is shown in Figure 11. The areas where the performance of the two models differs significantly are highlighted with green boxes. In each of the three subfigures, the CNN-LSTM aligns more closely with the true value curve than the LSTM does. This is probably because the proposed framework can make effective use of the informative data available. The model exhibits a strong nonlinear fitting capability in vehicle acceleration prediction for driver groups with minor differences in driving behavior. A pairwise comparison of the three subfigures reveals that, among the three groups, the two models fit the ground truth values best in group 1, where the predicted and true curves align most closely. These findings are consistent with the prediction results presented in Table 3.

5. Conclusions

This paper has proposed a framework for predicting vehicle acceleration based on the CNN-LSTM that considers the heterogeneity in driving behavior among drivers. To achieve this, the CHMM is utilized to segment car-following data into driving behavior semantic segments. This model enables an accurate description of the interconnections between multiple variables. Based on the results of the driving behavior evaluation, drivers are categorized into different groups by the Wasserstein distance. The Wasserstein distance has a wider range of applicability and can provide more accurate clustering results. The CNN-LSTM model is then employed to predict vehicle acceleration for the driver group whose members differ only slightly in their characteristics. The experimental results demonstrate that the CNN-LSTM outperforms the LSTM in terms of prediction accuracy. Moreover, the CNN-LSTM based on driving behavior analysis exhibits superior prediction performance compared to the CNN-LSTM without clustering. Overall, the proposed model can provide high accuracy for vehicle acceleration prediction. In future work, the proposed CNN-LSTM model can be extended to car-following data from other locations or different road conditions to further evaluate its generalization capability.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Authors’ Contributions

Y. Zou and S.T. conceptualized this study. S.T. and H.Z. performed the methodology. S.T., H.Z., and Q.K. were responsible for the software. S.T. validated the study. S.T. and H.Z. performed formal analysis. S.T. and Q.K. curated the data. S.T. prepared the original draft. S.T., Y. Zou, H.Z., and Y. Zhang wrote, reviewed, and edited the study. S.T. visualized the study. Y. Zou and H.Z. supervised the study. Y. Zou administered the project. All authors have read and agreed to the published version of the manuscript.

Acknowledgments

The authors greatly appreciate Yinsong Wang from Nebula Link Technology Co., Ltd. for his professional support and assistance in proofreading responses. The authors also greatly appreciate Ting Zhu from Zhuzhou CRRC Times Electric Co., Ltd. for the support in data collection and methodology.