Abstract

Because multiple freeway design features are installed on roadways simultaneously, it is important to assess their combined safety effects correctly. This study investigated associations between multiple roadway cross-section design features on freeways and traffic safety. To account for the interaction effects of multiple design features and the nonlinearity of predictors concurrently, multivariate adaptive regression splines (MARS) models were developed for all crash types combined and for freight vehicle crashes. In MARS models, a series of basis functions represents the space of predictors, and the combined safety effectiveness of multiple design features can be interpreted through the interaction terms. Generalized linear regression models (GLMs) with a negative binomial (NB) distribution were also evaluated for comparison. The results indicate that the MARS models fit the data better than the NB models because they can reflect the nonlinearity of crash predictors and the interaction effects among variables over different ranges. Various interaction effects among parameters over different ranges, defined by knot values, were found in the MARS models, whereas only two interaction terms were found in the NB models. The results also showed that the combined safety effects of multiple treatments from the NB models over-estimated the real combined safety effects when using the simple multiplication approach suggested by the Highway Safety Manual (HSM). Therefore, MARS is recommended for evaluating the safety impacts of multiple treatments, since it considers both the interaction effects among treatments and nonlinearity simultaneously.

1. Introduction

Traffic safety has become a serious global concern, and many countries have adopted safety plans and initiatives aimed at safer roadways. While roadway crashes occur overwhelmingly due to driver failures (human errors), there is considerable potential to increase the safety of road users through the roadways themselves. Designing roadways with appropriate facilities therefore contributes to traffic safety by helping to prevent death and injury from crashes.

Among roadway classifications such as rural two-lane highways, rural multilane highways, urban arterials, and freeways, the freeway is one where additional efforts are needed to enhance traffic safety. On freeway sections, the risk of severe crashes may increase because vehicles mostly travel at high speeds. Moreover, there is a high potential for large-truck involved crashes due to high truck volumes and the frequent presence of interchanges.

It is generally known that large trucks (i.e., commercial or freight vehicles) are substantial contributors to roadway fatalities and injuries [1]. According to the National Highway Traffic Safety Administration (NHTSA) [2], the number of large trucks involved in fatal crashes in the United States increased by 10 percent from 2016 to 2017. Over the same period, large-truck fatalities per 100 million vehicle miles traveled increased by 6 percent. The number of large-truck involved injury crashes also increased, from 102,000 in 2016 to 107,000 in 2017, and the number of large trucks involved in property-damage-only crashes increased by 3 percent, from 351,000 in 2016 to 363,000 in 2017.

A number of studies have tried to assess the safety of freeways [3–7] using simple comparison methods and traffic simulations. Other studies have addressed freight safety by investigating the risk factors that contribute to the severity of truck-involved crashes [8–17]. These studies identified significant variables influencing the severity of freight vehicle crashes, such as the number of vehicles, speed, lighting condition, location type, age, gender, traffic volume, weather, and time of day. Although considerable effort has gone into examining relationships between injury levels of large-truck involved crashes and significant variables, the current literature on quantified safety effects of freeway design elements for freight vehicle crashes is limited.

For this reason, it is essential to evaluate freeway safety quantitatively and specifically for freight vehicles in order to determine the relationships between freight safety and freeway design features. The Highway Safety Manual (HSM) [18] presents scientific approaches to explore and estimate the expected changes in safety due to the implementation of a treatment. One of the well-known approaches to quantify the safety effectiveness of a specific roadway countermeasure is the cross-sectional method [19]. The cross-sectional method has been widely adopted to evaluate the safety effectiveness of specific treatments because data are easier to acquire than for before-after methods and because a single treatment effect can be separated from the effects of other treatments applied [20–23]. However, this approach needs to be applied carefully because of some potential issues (e.g., variable selection bias, omitted variables, correlation effects, confounding effects, and the choice of an appropriate functional form) [24–28]. To overcome these issues, several alternative methods have been explored, such as matched pair control [29], case-control [30], fully Bayesian [31, 32], cross-validation process [33], generalized nonlinear regression [34, 35], propensity score matching [36–38], and data mining methods [39]. However, most approaches examined only the main effect of each variable in the models, not the effects of interaction between variables. This point is essential for avoiding over- or under-estimation, especially when the safety effects of multiple treatments are evaluated. Most previous studies have calculated the safety effectiveness of single treatments without considering the combined effects of multiple treatments.

In this study, the multivariate adaptive regression splines (MARS) technique was applied to account for both nonlinearity and interaction effects among variables. MARS can handle nonlinear impacts and interaction effects of independent variables in complex data structures, and it has an advantage in safety analysis because it is a transparent model, unlike other data mining and machine learning techniques [40–42].

Therefore, the objective of this study was to assess the safety effects of multiple roadway design elements. The MARS model was evaluated to estimate the impacts of single and multiple treatments. The safety effectiveness was calculated for large truck-involved crashes using the cross-sectional method. Additionally, to provide general insights, the safety effects of different roadway characteristics were also estimated for total crashes.

2. Data

A total of 2,141 rural freeway segments, with around 1,566 miles in total length, were identified from the roadway characteristic inventory (RCI) system managed by the Florida Department of Transportation (FDOT). It is worth noting that the collected freeway sections are segments without any ramp facilities (e.g., interchanges, junctions, etc.). The RCI database makes it possible to identify historical roadway characteristics of specific roadway sections for given dates. To avoid misinterpretation in the nonlinear modeling process, freeway segments shorter than 0.1 mile were excluded [42]. The identified roadway characteristics data were matched with the crash data from the Crash Analysis Resource System (CARS), which is also provided by the FDOT. The two data sets were obtained for six years (2008–2013) and matched based on roadway ID and mile point for each freeway segment. The six-year period was chosen to obtain complete and stable datasets. Table 1 provides the descriptive statistics of the parameters. For the variable named Horizontal curve, freeway segments in which any portion includes a horizontal curve were coded as “1”. The outside shoulder is the roadway shoulder on the roadside, whereas the inside shoulder is located next to the median barrier.

3. Methodology

3.1. Multivariate Adaptive Regression Splines

MARS is a nonparametric regression technique that can be used to assess complex relationships using a series of basis functions [43]. It is a form of regression analysis and can be seen as an extension of linear regression that accommodates and models nonlinearities and interactions between variables. MARS captures the nonlinearity aspect of polynomial regression by generating cut-points (known as knots), which are similar to those of step functions. Step function regression is generally known as an alternative to the polynomial regression model: it divides the range of a variable into bins and fits a different constant for each bin. Polynomial regression, on the other hand, assesses a general nonlinear relationship by imposing a simple nonlinear functional form.

Abraham et al. [44] described MARS as a multivariate piecewise regression technique in which the splines can be viewed as representing the space of predictors broken into a number of regions. It has become widespread, particularly in data mining and data science, since it makes no assumption about the type of relationship between dependent and independent variables. In MARS, an explanatory variable is partitioned into intervals and a separate line segment is fit to each interval. MARS divides the space of independent variables at multiple knots and then fits a spline function between these knots.

The MARS model can be described as follows [45]:

$$y = \beta_0 + \sum_{m=1}^{M} \beta_m B_m(x)$$

where,

$y$ = response variable,

$\beta_0$ = coefficient of the constant basis function,

$\beta_m$ = coefficient of the $m$th basis function,

$M$ = number of nonconstant basis functions,

$B_m(x)$ = $m$th basis function.
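To make the additive structure concrete, the following is a minimal sketch of how a fitted MARS model of this form could be evaluated. It assumes hinge-type basis functions of the form max(x − knot, 0) and their mirror images; the predictor names, knots, and coefficients are hypothetical and are not taken from the estimated models in this study.

```python
def hinge(x, knot):
    """Truncated linear basis function: max(x - knot, 0)."""
    return max(x - knot, 0.0)


def mirrored_hinge(x, knot):
    """Mirror-image basis function: max(knot - x, 0)."""
    return max(knot - x, 0.0)


def mars_predict(x, intercept, terms):
    """Evaluate y = b0 + sum_m b_m * B_m(x).

    `terms` is a list of (coefficient, basis_function) pairs; each basis
    function maps the predictor dictionary `x` to a number.
    """
    return intercept + sum(coef * basis(x) for coef, basis in terms)


# Hypothetical knots and coefficients, for illustration only.
terms = [
    (0.9, lambda x: hinge(x["ln_aadt"], 11.0)),
    (-0.05, lambda x: mirrored_hinge(x["shoulder_width_ft"], 10.0)),
    # A 2-way interaction term is the product of two basis functions.
    (-0.02, lambda x: hinge(x["ln_aadt"], 11.0)
                      * hinge(x["shoulder_width_ft"], 6.0)),
]

segment = {"ln_aadt": 12.0, "shoulder_width_ft": 8.0}
y_hat = mars_predict(segment, intercept=0.5, terms=terms)
print(y_hat)
```

Each term is active only on one side of its knot, which is how the model represents different slopes over different ranges of a predictor.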

Three main steps are needed to fit a MARS model [41, 42, 45]. In the first step, the constructive phase, basis functions are examined in several regions of the predictors using a forward stepwise selection procedure. The predictors and knot locations that contribute significantly to the model are searched for and selected iteratively in this step. The introduction of interactions is also checked at each iteration to enhance model performance. The second step, known as the pruning phase, performs a backward deletion procedure to eliminate the basis functions that contribute least. A generalized cross-validation (GCV) criterion is generally used in this pruning step to find the best model. The GCV criterion can be estimated by Equation (1). In the last step (selection phase), the optimum MARS model is selected from a group of recommended models based on the fitting results of each [41, 42].

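$$\mathrm{GCV}(M) = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\left(1 - \frac{C(M)}{n}\right)^2}, \qquad C(M) = M + dM \tag{1}$$

where,

$y_i$ = response for observation $i$,

$\hat{y}_i$ = model prediction for observation $i$,

$n$ = number of observations,

$C(M)$ = complexity penalty function,

$d$ = defined cost for each basis function optimization.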
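The GCV criterion used in the pruning phase can be sketched as follows. This assumes the commonly used form in which the mean squared error is inflated by a penalty C(M) = M + dM; the default cost d = 3.0 is an illustrative assumption, not a value reported in this study.

```python
def gcv(y, y_hat, n_basis, d=3.0):
    """Generalized cross-validation score for a model with `n_basis`
    nonconstant basis functions (assumed penalty: C(M) = M + d * M)."""
    n = len(y)
    if n == 0 or len(y_hat) != n:
        raise ValueError("y and y_hat must be non-empty and equally long")
    penalty = n_basis + d * n_basis
    if penalty >= n:
        raise ValueError("complexity penalty must be smaller than n")
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) / n
    return mse / (1.0 - penalty / n) ** 2


# Hypothetical observed and fitted values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.0, 2.0, 3.0, 5.0]
print(gcv(observed, fitted, n_basis=1, d=1.0))
```

With the same residuals, a model that uses more basis functions receives a larger GCV score, so the backward pass favors dropping terms that do not reduce the error enough to justify their cost.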

To develop the MARS models, the ADAPTIVEREG procedure in SAS [46] was used. In the ADAPTIVEREG procedure, the maximum order of interactions can be adjusted with the MAXORDER option, but no difference was found between the default setting (2-way maximum interactions) and a higher maximum order of interactions. It should be mentioned that although increasing model complexity by adding more interactions might improve predictive power for highly structured data, it might also reduce the applicability of the model. In this study, a 2-way maximum order of interactions was used consistently for the two crash types (i.e., total crashes and large truck-involved crashes). Moreover, the basis functions were constructed separately for each crash type since the rate of change can differ within the range for different types of crashes. The basis functions can be constructed using truncated power functions based on knot values [47].

3.2. Cross-Sectional Method and Safety Performance Function

In the cross-sectional method, it is essential to develop a safety performance function (SPF). The generalized linear regression model with a negative binomial (NB) distribution (known as the NB model) is used to develop an SPF so that over-dispersed crash data are handled properly. An SPF relates the crash frequency to traffic and roadway characteristics. There are two types of SPFs: the full SPF and the simple SPF. The full SPF relates the frequency of crashes to both traffic and roadway characteristics, whereas the simple SPF uses only traffic volume as a predictor. It is worth noting that the HSM provides CMFs calculated based on the simple SPF only. However, the simple SPF oversimplifies the relationship between crash frequency and roadway characteristics, since crash frequency is not affected by traffic volume alone [48].

The cross-sectional method can be used to estimate and quantify the safety effectiveness by taking the ratio of the average crash frequency of sites with the element to the average crash frequency of sites without the element [18]. In particular, the safety impact (i.e., crash modification factor (CMF)) is calculated from the coefficient of the variable associated with the treatment as the exponent of the coefficient when the form of the model is log-linear in the cross-sectional studies. The functional form of calculating safety effects in the cross-sectional method is shown in Equation (2).

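$$\mathrm{CMF} = \exp\left\{\beta_k\left(x_{kt} - x_{kb}\right)\right\} \tag{2}$$

where,

$\beta_k$ = coefficient for variable $k$ in the SPF,

$x_{kt}$ = linear predictor $k$ of the treated condition (changed condition),

$x_{kb}$ = linear predictor $k$ of the untreated condition (baseline condition).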
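As a worked illustration of the log-linear relationship described above, the sketch below computes a CMF as the exponential of a coefficient times the change in the predictor. The coefficient value is hypothetical and is not one of the estimates reported in this study.

```python
from math import exp


def cmf(beta, x_treated, x_baseline):
    """Crash modification factor for a log-linear SPF:
    exp(beta * (x_treated - x_baseline))."""
    return exp(beta * (x_treated - x_baseline))


def percent_change(cmf_value):
    """Expected percent change in crashes (negative means a reduction)."""
    return (cmf_value - 1.0) * 100.0


# Hypothetical coefficient of -0.04 per foot of shoulder width,
# applied to a widening from 6 ft (baseline) to 12 ft (treated).
widening = cmf(beta=-0.04, x_treated=12.0, x_baseline=6.0)
print(round(widening, 3), round(percent_change(widening), 1))
```

A CMF below 1 indicates an expected crash reduction, and a CMF of exactly 1 results whenever the treated and baseline conditions are the same.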

In this study, the NB models were also evaluated for comparison purposes. The Akaike Information Criterion (AIC) was used to compare the performance of the NB and MARS models. The AIC has been widely applied to identify the preferred model because it penalizes an increasing number of estimated parameters when assessing the likelihood function [28]. The AIC value can be calculated by Equation (3) as follows:

$$\mathrm{AIC} = 2k - 2\ln(L) \tag{3}$$

where,

$k$ = the number of estimated parameters in the model;

$L$ = the maximum value of the likelihood function for the model.
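The AIC comparison can be sketched as below, using the standard definition AIC = 2k − 2 ln(L). The log-likelihoods and parameter counts are hypothetical placeholders, not the values obtained for the NB and MARS models in this study.

```python
def aic(log_likelihood, k):
    """Akaike Information Criterion from the maximized log-likelihood
    ln(L) and the number of estimated parameters k."""
    return 2 * k - 2 * log_likelihood


# Hypothetical fitted models: (maximized log-likelihood, parameter count).
candidates = {
    "model_a": (-1050.0, 8),
    "model_b": (-1020.0, 20),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
preferred = min(scores, key=scores.get)  # lower AIC is preferred
print(scores, preferred)
```

In this illustration the model with more parameters is still preferred because its gain in log-likelihood outweighs the penalty of 2 per additional parameter.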

As Park and Abdel-Aty [28] discussed, prediction models using data that are aggregated or averaged can lead to biased estimates. The use of disaggregated data can be one way to account for this bias, and the selection of an appropriate functional form is crucial to enhance model reliability [26]. For this reason, in this study, MARS models were developed to reflect nonlinear relationships between crash rates and predictors under different conditions with various interaction terms.

4. Results and Discussion

Tables 2 and 3 present the estimated MARS models for large truck-involved and total crashes. Overall, the estimated basis functions are statistically significant at the 95% confidence level except for three cases (i.e., the Basis 19 function in the MARS model for large truck-involved crashes, and the Basis 3 and Basis 16 functions in the MARS model for total crashes). In the MARS model for large truck-involved crashes, the first basis function, Basis 1, is MAX(Ln(AADT) − 11.608, 0), where the knot value is 11.608. Basis 1 therefore contributes to the model only when the natural logarithm of AADT is greater than 11.608 and equals 0 otherwise. Other basis functions are constructed in a similar manner using different knot values. The results also showed that more interaction terms and safety effects of cross-section elements (inside/outside shoulder rumble strips, widths of inside/outside shoulder, and widths of driving lane) were found for large truck-involved crashes than for total crashes.
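The Basis 1 function described above can be written directly in code, which also makes it easy to see what the knot means on the original AADT scale. The knot value 11.608 is taken from the text; the function name and the example AADT values are illustrative assumptions.

```python
from math import exp, log

KNOT_LN_AADT = 11.608  # knot for Basis 1, on the Ln(AADT) scale


def basis_1(aadt):
    """Basis 1 for large truck-involved crashes: MAX(Ln(AADT) - 11.608, 0)."""
    return max(log(aadt) - KNOT_LN_AADT, 0.0)


# AADT at which Basis 1 switches on (roughly 110,000 vehicles per day).
aadt_at_knot = exp(KNOT_LN_AADT)

for aadt in (50_000, 100_000, 150_000):
    print(aadt, basis_1(aadt))
```

Segments with AADT below the knot receive a value of 0 for this term, so its coefficient affects only the higher-volume segments.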

The NB regression models for large truck-involved and total crashes were developed as shown in Table 4 to compare model performance with the MARS models. In general, the estimated parameters are statistically significant at the 95% confidence level except for two cases (i.e., Inside shoulder width in the NB model for large truck-involved crashes and Outside rumble strips in the NB model for total crashes). The results show that the MARS models generally provide better model fits than the NB models. This may be because MARS can account for both nonlinear effects and interaction effects between variables. The results also indicate that various interaction effects among basis functions over different ranges, defined by knot values, were found in the MARS models, whereas only two interaction effects were found in the NB models (i.e., the interaction between AADT and number of lanes, and the interaction between Lane width and Curve in the NB model for large truck-involved crashes).

In this study, the safety effects of different freeway design elements for different crash types were quantified using the cross-sectional method. The safety effectiveness of a change in a specific design feature can be calculated using the coefficient of the parameter in the exponential functional form. Tables 5 and 6 summarize the exponential functional forms used to estimate safety effects for large truck-involved and total crashes. The results indicate that the MARS models capture more safety impact estimation functions for different freeway design elements over various nonlinear ranges, while accounting for interactions with other design features.

Another advantage of using MARS is its ability to consider interaction effects between variables. As explained in the HSM, when multiple roadway design features are changed (i.e., treated), the combined safety effects of multiple treatments can be estimated by multiplying the individual safety impacts. However, the HSM also cautions that simple multiplication of multiple safety impacts might over- or under-estimate the number of predicted crashes. To overcome this problem, the MARS models can be applied to evaluate the safety effects of multiple treatments, since they account for the interaction effects among parameters.

For example, adding outside rumble strips on a freeway section is expected to reduce large truck-involved crashes by 19% according to the safety effects estimation function using the NB model from Table 5. In contrast, the safety impact for the same treatment using the MARS model is a 10% reduction. Similarly, increasing the outside shoulder width from 6 ft to 12 ft (i.e., widening the outside shoulder by 6 ft) decreases large truck-involved crashes by 22% using the NB model, whereas a 33% reduction can be expected based on the MARS model. As suggested in the HSM, the safety impact of the multiple treatments (i.e., the combination of adding outside rumble strips and widening the outside shoulder by 6 ft) can be calculated by multiplying the single impacts, which gives a combined safety effectiveness of a 37% reduction in large truck-involved crashes. On the other hand, the safety effects of multiple treatments using the MARS model can be estimated by Equation (4) as follows:

It should be noted that the basis functions in Equation (4) are from Table 2. The results show that a 35% reduction in large truck-involved crashes can be expected. This indicates that the HSM approach (multiplying single CMFs to estimate the combined safety effectiveness) over-estimated the real safety effects of multiple treatments by around 4 percent compared with the multiple-treatment safety impact estimated from the MARS model.
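The HSM multiplication step for the NB results can be reproduced with a few lines of arithmetic. The sketch below uses only the single-treatment reductions quoted above (19% and 22%); the function names are illustrative.

```python
def combined_cmf(*cmfs):
    """HSM-style combined effect: the product of the single-treatment CMFs."""
    result = 1.0
    for value in cmfs:
        result *= value
    return result


def percent_reduction(cmf_value):
    """Expected crash reduction in percent implied by a CMF."""
    return (1.0 - cmf_value) * 100.0


# Single-treatment effects from the NB model, as quoted in the text.
cmf_rumble_strips = 1.0 - 0.19     # 19% reduction
cmf_shoulder_widening = 1.0 - 0.22  # 22% reduction

hsm_combined = combined_cmf(cmf_rumble_strips, cmf_shoulder_widening)
print(round(hsm_combined, 4), round(percent_reduction(hsm_combined)))
```

The product 0.81 × 0.78 ≈ 0.63 corresponds to the 37% reduction reported for the multiplication approach, which treats the two treatments as acting independently.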

5. Conclusions

The main objective of this study was to evaluate the safety effectiveness of multiple freeway design elements for enhancing freight safety. Thus, this study assesses the safety effects of multiple roadway features using the cross-sectional method through the development and comparison of NB and MARS models for different crash types. The MARS models were developed to account for both nonlinearity of independent variables and interaction effects for a complex data structure.

The results showed that the MARS models generally provided better model fits than the NB models. A variety of interaction effects among parameters over different ranges, defined by knot values, were found in the MARS models, whereas only two interaction terms were found in the NB models. The results also showed that the combined safety effects of multiple treatments from the NB models possibly over-estimate the real combined safety effects when using the simple multiplication approach suggested by the HSM. Therefore, MARS is recommended for evaluating the safety impacts of multiple treatments so that the interaction effects among treatments are considered.

In future work, the MARS models should be further improved by increasing the sample size and including additional, more detailed freeway design characteristics. Moreover, the suggested framework of using a MARS model in safety effects evaluation could be adopted for other roadway facilities such as arterials and intersections. It is also recommended to balance the trade-off between model complexity, which increases accuracy, and applicability, which eases general implementation of the model. Lastly, the validity and transferability of the suggested evaluation framework for other conditions and regions need to be examined.

Data Availability

Some or all of the data used to support the findings of this study were supplied by the State of Florida under license and so cannot be made freely available. Requests for access to these data should be made to the Florida Department of Transportation (FDOT).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by, and is a part of, the project titled “Risk Prediction of Port Resources and Development of Smart Safety Management Technologies/PJT201159”, funded by the Ministry of Oceans and Fisheries, South Korea. All opinions and results are solely those of the authors.