Road Crash Frequency Prediction for Indian National Highways using Soft-Computing Tools
Transcript of Road Crash Frequency Prediction for Indian National Highways using Soft-Computing Tools
1
Road Crash Frequency Prediction for Indian National Highways
using Soft-Computing Tools
Ashutosh Arun1, Erramapalli Madhu2, Senathipathi Velmurugan3
Abstract: Road crashes and road fatalities have risen quite drastically in India in the past
decade. There are several factors that affect the frequency of crashes on Indian roads ranging
from deficiencies in geometric design, poor maintenance history and other environmental and
human behavioural factors. This paper studies the effects of these factors in predicting road
crash frequency on the National Highways of India. For this purpose, crash history of 4710
crashes was collected on total segment length of 889 kilometers falling on various National
Highways in India. Additionally, data such as pavement roughness and road geometrics was
also collected. Conventional Poisson-based Generalized Linear Regression Models and
modern soft-computing methods such as Multilayer Perceptron networks and a hybrid
Adaptive Neuro-Fuzzy Inference System were configured to predict frequency of crashes.
Results indicate that the Multilayer Perceptron had the best prediction performance. A
sensitivity analysis was also subsequently performed.
Keywords: National Highways, Crash Frequency, Poisson Generalized Linear Model,
Multilayer Perceptron, Adaptive Neuro-Fuzzy Inference System
1. INTRODUCTION
Road safety has become a major issue in India. In 2013, India witnessed 486,476 road crashes
(on all types of roads) which accords it the dubious distinction of having the maximum
number of road crashes in the world (Figure 1). This is an increase of 22% compared to the
road crash numbers in the year 2001 when the widening of National Highways of India
started under the National Highways Development Project (NHDP). Also from Figure 1, we
can see that the road fatalities figure for the same year stood at 137,572. From Figure 2 it can
be seen that about 28% of these crashes and around 33% of the crash-related deaths took
place on the National Highways (NH) of India.
Figure 1: Summary statistics of road crashes in India
for the years 2001-2013. [Source: Road Accidents in
India (2013)]
Figure 2: Percentage Share of National Highways in
various statistics related to road crashes in India.
[Source: Road Accidents in India (2013)]
0
100,000
200,000
300,000
400,000
500,000
600,000
20
01
20
02
20
03
20
04
20
05
20
06
20
07
20
08
20
09
20
10
20
11
20
12
20
13
Nu
mb
er
of
Cra
she
s/P
ers
on
s
Year
Total Road Crashes
Persons Killed
Persons Injured
10
15
20
25
30
35
40
45
20
01
20
02
20
03
20
04
20
05
20
06
20
07
20
08
20
09
20
10
20
11
20
12
20
13
Pe
rce
nta
ge S
har
e o
f N
H
Year
Total Road Crashes
Person Killed
Persons Injured
2
Therefore, there is an urgent need to study the road crash process on Indian roads in
general and the National Highways in particular. The National Highways of India are a set of
high speed corridors linking major urban centers and other places of strategic importance in
India. For such high speed corridors, it becomes imperative to determine the contribution of
elements related to highway design and maintenance that can be controlled by engineers
towards the occurrence of road crashes so that appropriate guidelines may be evolved with a
goal to prevent future crashes and necessary corrective measures may be taken up at identified
crash prone locations.
Considering the severity of the problem, a few Indian researchers have attempted road
crash-related studies have been carried out by Indian researchers in the recent past (Dinu and
Veeraragavan, 2011). Shah (2011) developed road crash occurrence prediction models for
Indian National Highways utilizing time-series crash data collected and compiled by NHAI’s
Road Safety Cell for various national highway sections falling on Golden Quadrilateral and
North-South Corridor developed under NHDP Phase-I.
Improving upon the work done by Shah (2011), in this study the effort was made to
develop models to predict road crash frequency by considering road geometrics and pavement
roughness of the segments as well. Furthermore, it was decided to test the efficiency of soft-
computing techniques such as Artificial Neural Networks and Fuzzy Logic, which tend to
mimic human behavior and logical reasoning, towards the modeling of complex processes
such as road crashes. Thus, in this paper, Artificial Neural Networks (ANNs) and a hybrid
technique of ANN and Fuzzy Logic known as Adaptive Network-based Fuzzy Inference
System (ANFIS) were used for crash frequency prediction on Indian highways in addition to
the conventional Poisson-based Generalized Linear Regression models.
The paper has been organized in seven sections. In the next section, the literature on
modeling road crash frequency is reviewed. In Section 3, methodology adopted in the present
study is discussed. Section 4 provides the description of data used and variables considered in
the model. Section 5 discusses the details about the development of road crash frequency
prediction models and the comparison amongst the various modeling techniques. Detailed
sensitivity analysis is presented in Section 6. Finally, Section 7 gives the conclusions drawn
from the study.
2. LITERATURE REVIEW
Theoretically, crash frequency variable can be considered as being Poisson distributed (Lord
et al., 2005). Miaou (1994), Vogt and Bared (1998) and Kononov and Allery (2003) have
demonstrated the successful use of Poisson and Negative Binomial (NB) regression in
modeling crash frequency. Importantly, in Thailand, where the roads operate under mixed
traffic conditions like in India, Poisson and NB-based Crash Prediction Models have been
successfully deployed (Thailand Accident Research Centre, 2009).
However, the Poisson Generalized Linear Model suffers a few drawbacks. Due to the
Poisson property where the variance is constrained to be equal to the mean, the Poisson
distribution cannot handle over- or under-dispersion. In order to overcome this shortcoming, a
robust covariance estimator (also known as Huber/White/Sandwich estimator) can be used to
estimate the coefficients of the Poisson model (Freedman, 2006; Hilbe, 2007). It is called
“robust” because it provides consistent estimate of variance of the Maximum Likelihood
estimates even when the underlying model specification is wrong.
3
The Poisson models are still fraught with several other shortcomings as explained by
Lord and Mannering (2010). More importantly, with regard to its specific application in India,
Dinu and Veeraragavan (2011) have strongly criticized the fixed parameters assumption of
Poisson models as they feel that it cannot suitably capture the huge variability arising out of
heterogeneous traffic conditions and other social and environmental factors prevalent here.
In view of such drawbacks of conventional models, several authors have also sought
to apply soft-computing methods such as Artificial Neural Networks and Fuzzy Logic models
for the purpose of crash prediction. These soft-computing tools have been proved to be
universal approximators (Cybenko, 1989; Stinchcombe and White, 1989; Kosko, 1992). The
most popular Feed-Forward ANN in crash prediction is Multilayer Perceptron (MLP)
network. Pande and Abdel-Aty (2006) demonstrated the applicability MLP for crash
frequency prediction. Xie et al. (2007) and Cansiz and Easa (2011) showed that the ANN
models provided superior results to the conventional stochastic models.
Adaptive Neuro-Fuzzy Inference System (ANFIS), originally developed by Jang
(1993), is a hybrid of the ANN and Fuzzy Logic techniques. It utilizes feed-forward neural
networks with supervised learning to tune a Sugeno-type Fuzzy Inference System and hence
combines the advantages of both the techniques. For this reason it has been used for crash
frequency prediction by some researchers (Awad and Janson, 1998; Zheng and Meng, 2011).
Indeed it needs to be mentioned that the soft-computing techniques are also plagued by
disadvantages such as over-fitting of the training data, ‘black-box’ nature, the “curse of
dimensionality” (Bishop, 1995; Friedman, 1997) etc.
3. METHODOLOGY
For the present study, a road crash was defined as a collision occurring between two or more
objects on a road section, at least one of which is any type of a moving vehicle.
The crash data were collected from the Road Safety Cell of National Highway
Authority of India (NHAI) for select stretches and these were then collated and merged with
quantitative road geometry and pavement roughness data collected using the Network Survey
Vehicle (NSV) of CSIR-Central Road Research Institute (CSIR-CRRI). The preliminary
database so created was then subjected to data pruning. Thereafter, the final variables for
modeling were selected and checked for the presence of correlations. These variables were
subjected to exploratory data analysis to identify patterns present in the data and to formulate
hypotheses regarding the individual dependent-independent variable relationships. Finally the
following models were prepared to predict crash frequency:
Poisson Generalized Linear Model
Multilayer Perceptron (MLP) Model
Adaptive Neuro-Fuzzy Inference System (ANFIS) Model
Before the estimation of the Poisson model, as is the vogue, the data was tested for the
presence of over- or under- dispersion using the following tests:
Regression-based tests calculating a dispersion statistic using Deviance and
Pearson’s chi-square
Lagrange Multiplier (LM) test (for details refer Cameron and Trivedi, 1998)
If dispersion was found, it was proposed to use the Quasi-Maximum Likelihood
Estimation and/or the Robust Covariance Estimator approach to handle the dispersion. The
models so developed were then compared with each other based on the error measures such as
Mean Absolute Error (MAE) and Root-Mean Squared Error (RMSE).
4
Additionally, sensitivity analysis was carried out using finally selected model/s to find
out the relationship of individual variables with crash frequency.
4. DATA AND VARIABLES
4.1 Data Collection and Collation
The crash data for the year of 2012 were collected from the Road Safety Cell of National
Highway Authority of India (NHAI) for select stretches on 8 National Highway corridors
geographically distributed across the country. These sections were recently widened to four
lane divided carriageways under NHDP Phase-I and fall on the important Golden
Quadrilateral (GQ) Corridor that connects the four major metropolitan cities of India namely
Delhi, Mumbai, Chennai and Kolkata. Only segment crashes were considered under this
study. The study sections are indicated in Figure 3.
Figure 3. Map of Golden Quadrilateral Project under NHDP Phase-I. [Source:
http://nhai.org/gqmain_english.htm] [Note: Entire GQ stretch is shown with study sections marked in bold lines
and labeled]
The crash data that were collected contained information regarding the day and time of
crash, crash location in terms of chainages, weather condition at the time of crash, cause/s of
crash (such as over-speeding, drunken driving etc.), nature of crash (i.e. overturning, head-on
collision, rear-end collision etc.), severity outcome of the crash etc. Quantitative road
geometry data (vertical gradient and horizontal curvature) and information regarding
pavement surface condition in terms of the International Roughness Index (IRI) were
collected for the study sections using the Automatic Road Survey System of CSIR-Central
Road Research Institute. The road geometry data were then integrated with the crash data
created earlier to create a preliminary database.
While collating the data, it was observed that the entries regarding exact crash location
for crashes on NH 2 were incomplete and hence the data for NH 2 were not entered into the
5
database. After removing the missing values, repeated observations and cases for which the
information seemed dubious, the database had records for 4710 crashes occurring on 889
homogeneous sections (based on traffic volume and horizontal and vertical alignment) along
with the road geometry and IRI data for all the study sections.
4.2 Variable Selection
Consistent with the objective of the paper wherein it was sought to model the effect of the
environmental and behavioral factors in addition to the engineering ones on crash frequency,
the following variables were selected for the modelling purpose as given in Table 1.
Table 1. List of Model Variables
S. No. Name of the Variable Name of the
Variable in the
Model
Nature of the
Variable
Variable Function in
the Model
1 Average Daily Traffic - - Multiplicative
Constant
1 Time of the Day Time Categorical Explanatory
2 Weather at the time of Crash Weather Categorical
3 Causal Conditions CC Categorical
4 Pavement Roughness (in IRI) IRI Continuous
5 Horizontal Curvature (in deg/km) Curve Continuous
6 Vertical Gradient (in %) Grade Continuous
7 Annual Crash Frequency Frequency Continuous Response
As seen in Table 1, logarithm of the Average Daily Traffic was used as the measure of
traffic exposure only and was multiplied with the result of the model in order to obtain the
final frequency values. The selected model variables were thereafter checked for correlations.
No significant correlations among the variables were found, as shown in Table 2.
Table 2. Spearman’s Rank-Correlation Coefficients for Independent Variables
TC WC CC Avg. Grade Avg. Curve Avg. IRI
TC 1.000
WC -0.003 1.000
CC -0.049 0.035 1.000
Avg. Grade -0.005 -0.024 -0.041 1.000
Avg. Curve 0.050 0.066 -0.019 -0.061 1.000
Avg. IRI -0.037 0.041 -0.067 0.056 0.112 1.000
Tables 3 and 4 provide details about the categorical variables and summary statistics
for the continuous variables respectively for providing deeper insights into the data.
6
Table 3: Details of Categorical Variables
Categorical
Variable
Categories Coding
Information
Crash
Counts
Percentage share in
total crashes
Time of Day 22 – 2 hrs. Time=1 438 9.30%
2 – 6 hrs. Time=2 897 19.04%
6 – 10 hrs. Time=3 810 17.20%
10 – 14 hrs. Time=4 769 16.33%
14 – 18 hrs. Time=5 777 16.50%
18 – 22 hrs. Time=6 524 11.13%
Unknown Time=7 495 10.51%
Weather
Condition
Fine Weather=1 3608 76.6%
Extreme Temperatures Weather=2 483 10.25%
Bad Weather with Low Visibility Weather=3 456 9.68%
Very Bad Weather with Very Poor
Visibility
Weather=4 102 2.17%
Unknown Weather=5 61 1.3%
Causal
Condition
Drunken Driving CC=1 137 2.91%
Over-speeding CC=2 3116 66.16%
Vehicle out of control CC=3 660 14.01%
Fault of driver or other road user CC=4 640 13.59%
Mechanical defect in vehicle/ Poor
road condition
CC=5 147 3.12%
Unknown CC=6 10 0.21%
Table 4: Summary Statistics for Continuous Variables
Average
Daily Traffic
(in PCU)
Avg. IRI (in
m/km)
Avg. Grade (in
%)
Avg. Curve (in
deg/km)
Crash
Frequency
Mean 34477 2.85 0.019 38.13 0.026
Median 29411 2.68 -0.007 30.65 0
Standard
Deviation
17378 0.77 1.02 28.68 0.169
Sample
Variance
0.59 1.04 822.38 0.029
Minimum 16934 1.505 -7.87 6.11 0
Maximum 64409 7.417 5.16 237.92 8
It can be seen from Table 4 that the Crash Frequency variable seems almost equi-
dispersed. However, before final model development, a formal test was proposed to be carried
out to ascertain the presence (or absence) of any over-dispersion whatsoever. After this the
final analysis was performed using the Poisson Generalized Linear Regression as a
conventional modelling tool followed by the configuration of soft-computing models.
7
5. MODEL DEVELOPMENT
It was sought to carry out external validation in addition to the usual internal validation for
testing the reproducibility of model results as suggested by Justice et al. (1999). For this
purpose, data for one highway corridor i.e. NH 79A & 79 was set aside from model
development and used exclusively for the purpose of external validation. The rest of the data
were divided in a ratio of approximately 70:30 to obtain the calibration and internal validation
datasets. The following sub-sections give the details of the models developed on the
calibration set.
5.1 Poisson Generalized Linear Models
The Poisson Models were fitted using the GENLIN command in SPSS® software. Log was
used as the link function for all the Poisson models that were developed. A hybrid method for
ML estimation was adopted where 1 iteration of Fisher’s scoring method was first employed
followed by Newton-Raphson method for faster convergence. Several Poisson models were
estimated incorporating the main effects and/or the various interactions of the independent
variables.
A LM test was conducted to check for the presence of over-dispersion. Also, it was
found that though the complete dataset indicated possible presence of over-dispersion, if one
utilized only a part of the dataset for modeling purpose, the data started showing signs of
equi-dispersion and sometimes even under-dispersion. This was also a problem observed by
Daniels et al. (2010). Cameron and Trivedi (1998) have also cautioned that an apparent over-
dispersion in the data might get eliminated once we include regressors in the model.
Thus, for the LM test, a NB log-linear model with its dispersion parameter set to 0 was
calibrated on the calibration dataset and the null hypothesis that the dispersion parameter
value is 0 was tested. The results (Table 5) indicated that under-dispersion was significant at
5% level of significance (p-value = 0.017). However, even non-directional dispersion was
significant at this level of significance. Moreover, the null hypothesis could not be rejected if
the results were considered at 1% level of significance indicating that nothing could be said
conclusively.
Table 5: Lagrange Multiplier test results
LM test
statistic
p-values (by Alternative Hypothesis)
Dispersion Parameter <
0
Dispersion Parameter >
0
Dispersion Parameter ≠
0
Ancillary
Parameter
-2.124 0.017 0.983 0.034
Thus in order to obtain conclusive evidence regarding the absence of under-dispersion
the dispersion statistics using Deviance and Pearson’s chi-square were also calculated (Table
6). Table 6. Dispersion statistic calculated from Deviance and Pearson’s chi-square
Value df ϕ̂
Deviance 12082.2 108133 0.112
Pearson Chi-Square 107827.1 108133 0.997
8
As it can be seen from Table 6, while the dispersion statistic from deviance indicated
under-dispersion, the one derived from Pearson’s chi-square has a value very close to 1,
indicating that data may be safely assumed to be equi-dispersed. It has been suggested in
literature that Pearson’s chi-square-based scaling should be preferred over Deviance-based
scaling (Hilbe, 2007; Barros and Hirakata, 2003). So it can be concluded that un-scaled
Poisson models could have been developed. Nevertheless, robust estimators for covariance
matrix were used in estimation just to be on the safer-side.
It was found that the various models with interaction terms were either found to be
showing lesser fit than the main effects-only model or found to be affected by
multicollinearity. Hence, finally the main effects-only model was chosen. Table 7 gives the
coefficients obtained using the Poisson regression. It can be seen that except the variable
Time=6, which corresponds with a time period of 6 PM to 10 PM, rest all of the variables
have proved to be significant predictors of road crash frequency variable.
Table 7: Parameter Estimates for Poisson model
Parameter β t-statistic
(Intercept) -10.631 -24.205
Time=1 -.254 -3.144
Time=2 .353 5.389
Time=3 .356 5.362
Time=4 .353 5.222
Time=5 .330 4.889
Time=6 .024 0.327
Time=7 0
Weather=1 3.770 25.681
Weather=2 1.648 10.338
Weather=3 1.827 11.644
Weather=4 .416 2.218
Weather=5 0
CC=1 2.502 5.896
CC=2 5.720 14.002
CC=3 4.100 9.968
CC=4 3.955 9.601
CC=5 2.842 6.771
CC=6 0
IRI -.064 -2.723
Grade .068 4.503
Curve .004 8
Log-Likelihood Ratio
(LL0/ LLβ) -8869.958
McFadden’s R2
(1- LLβ/LL0) 0.9
From the Table 7 above, it can be seen that the McFadden’s pseudo-R squared
measure, which is a statistic indicating the goodness of fit of the model, is over 0.9 indicating
that the developed model fits the data perfectly. The developed model was tested for the
presence of heteroscedasticity by plotting the Standardized Deviance Residuals against the
model predicted results. Given that on visual inspection, the model showed signs of
heteroscedasticity (Figure 4) it seems that the robust variance estimators proved to be a good
choice for model development.
9
Figure 4: Visual Inspection for Heteroscedasticity
5.2 Multilayer Perceptron Models
The MLP model contained 21 input nodes and one output node. Based on literature review, it
was decided that one hidden layer would be sufficient in the model. The optimal number of
neurons in the hidden layer was estimated based on a cross-validation technique. The MLP
models were fitted using the MLP command in SPSS.
Scaled Conjugate Gradient (SCG) Algorithm with batch training was used for training
the network as it is faster and more efficient than standard Back-Propagation Algorithm
(Moller, 1993). The default values were used for the parameters in SCG algorithm. Sum of
squares was used as the error function. The maximum number of training epochs was limited
to 1000.
Several model runs with different rescaling options (standardization, normalization
and adjusted normalization) for both input and output continuous variables, different
activation functions (hyperbolic tangent and sigmoid for hidden layer and hyperbolic tangent,
sigmoid, identity and softmax for output layer) and with different random seeds (leading to
different initial settings like initial weight matrices, initial automatic architecture selections
etc.) were done. Any other settings were kept at their default values. The best performing
network produced had 9 hidden neurons and was obtained at the following settings:
Covariates rescaled using standardization; the dependent variable rescaled using
normalization.
Sigmoid activation function in both hidden and output layers
5.3 Adaptive Neuro-Fuzzy Inference System Models
The ANFIS models were developed using MATLAB® Fuzzy Logic Toolbox. The Fuzzy
Logic Toolbox gives three options for configuring an initial Sugeno-type FIS in the form of
grid partitioning, subtractive clustering and fuzzy c-means clustering. Since grid partitioning
suffers from the “curse of dimensionality”, the choice was limited to the clustering
techniques. Hammouda and Karray (2000) compared the two clustering techniques and found
that fuzzy c-means clustering performed better in terms of lower RMSE values, higher
classification accuracy and lower computation time required. Thus, it was decided to use
fuzzy c-means clustering to generate an initial FIS.
10
This was accomplished by using the genfis3 function of the Fuzzy Logic Toolbox
which uses the fcm function (fuzzy c-means clustering) to determine the number of rules and
membership functions for the premise and consequents. By default, the premise membership
functions are Gaussian “bell-shaped” functions and the consequent membership functions are
linear. A problem with fcm is that it is sensitive to the initial conditions and hence several runs
were done to decide upon the best performing FIS (The best performing FIS is one that
produces the minimum value of an objective function). However, the different runs did not
produce any significant improvement in performance in our case. For each run, the default
fcm options were maintained.
The objective of applying the adaptive network to this initial FIS is to tune the
parameters of the membership functions so that the overall RMSE is minimized. Thus, to tune
the network the anfis function of the Fuzzy Logic Toolbox was used. The default options for
this function were kept as it is.
While developing the initial FIS structure for ANFIS models, initially the ‘auto’
option in fcm function was used where it utilizes a subtractive clustering technique to find out
the number of clusters. However, this strategy was found to be computationally very
expensive. Hence the number of clusters was found alternatively by starting with an initial of
5 clusters and then increasing them in steps of 5. The tuning of initial FIS using anfis function
was also done for each number of clusters in order to gauge the effect of increasing the
number of clusters on training time requirement.
The FIS obtained by this method was thus limited to 10 rules. Thus, the final ANFIS
network had 10 Gaussian membership functions for each of the 21 inputs, and 10 linear
membership functions for the output. The maximum number of training epochs was limited to
10 as an increase in the number of training epochs also increased the training time
requirement manifold.
6. RESULTS
Table 8 gives the error measures for all the three models in all the three phases: calibration,
internal validation and external validation, and also overall.
Table 8. Comparison of models based on error statistics
Model Calibration Internal Validation External Validation
MAE RMSE MAE RMSE MAE RMSE
Poisson 0.04 0.15 0.04 0.15 0.04 0.17
MLP 0.03 0.12 0.01 0.08 0.01 0.06
ANFIS 0.05 0.15 0.04 0.17 0.05 0.16
It can be seen from Table 8 that the Poisson and ANFIS models show almost similar
results. The MLP results are however better than the other two models, more so when we take
the validation results in comparison. This proves that the repeatability and reproducibility of
MLP model results are the best amongst all the models tested. Also, the MLP model had a
lower training time requirement when compared to the ANFIS model which makes it a more
attractive method. Thus, on the basis of the following results, the MLP models can be
regarded as the most suitable tool for modeling crash frequency for Indian national highways.
11
7. SENSITIVITY ANALYSIS
Before proceeding with the sensitivity analysis for individual variables, an attempt was made
to determine the relative importance of each variable towards predicting a road crash. The
MLP command in SPSS provides an option of conducting the importance analysis. The
results of this analysis are provided in Table 9.
Table 9. Importance Analysis for Independent Variables
Variables Importance
Time .076
Weather .293
Causal Conditions .345
Grade .133
Curve .153
IRI .086
From Table 9, it can be seen that the weather conditions and causal conditions possess
a high influence in predicting road crash frequency.
For sensitivity analysis, both Poisson GLM and MLP models were employed.
However, it was observed that the variable relationships derived using the Poisson model did
not agree with the observed behavior of factors. Thus, only the MLP model results are
presented below.
In the process of sensitivity analysis, firstly only the levels of categorical variables
were varied while keeping the continuous variables fixed at their mean values. The predicted
mean values were then averaged for a particular category of a particular variable and
presented in Figure 7.
0.007
0.0290.024
0.0220.024
0.010
0.000
0.010
0.020
0.030
0.040
22-2 hrs. 2-6 hrs. 6-10 hrs. 10-14 hrs. 14-18 hrs. 18-22 hrs.
Mar
gin
al C
rash
Fr
eq
ue
ncy
Time of Day
0.107
0.002 0.008-0.007
-0.0200.0000.0200.0400.0600.0800.1000.120
Fine ExtremeTemperatures
Bad Weatherwith PoorVisibility
Very BadWeather with
Very PoorVisibility
Mar
gin
al C
rash
Fr
eq
ue
ncy
Weather Conditions
12
Figure 5: Sensitivity Analysis for categorical variables
From Figure 5, the following inferences can be drawn:
Amongst all the categorical variables, causal conditions especially over-speeding
have the maximum positive effect on crash frequency highlighting the role of
human factors in crash causation.
The marginal crash frequency (MCF) values for the daytime hours are 0.024,
0.022 and 0.024. Thus, compared to night-time categories, the day-time actually
contributes more towards crash counts. The MCF value is the highest (0.029) for
the 2 AM – 6 AM time period category.
Fine weather conditions are proved to be conducive to higher crash frequencies as
the MCF value (0.102) is higher by a large margin when compared to the other
weather conditions defined in the database. Possible reckless driving and higher
speeds may be the reason behind this. The minimum effect is of the most adverse
weather condition of “Very bad weather with very poor visibility” perhaps due to
rarity of observing this weather condition in field.
An interesting point to note is that though the number of crashes observed were
more under Extreme Temperature Conditions (Weather = 2), the MCF is higher
for lightly inclement weather conditions indicated by the third category. Thus,
other conditions remaining constant, weather conditions such as light rain/strong
winds/light dust storms etc. shall result in more crashes than very high/very low
temperature conditions.
In the second step of sensitivity analysis, the continuous variables were varied one-by-
one within a range for which a large number of cases were observed in the database while the
other continuous variables were kept at their average values. This was repeated for all the
values of the continuous variables to get expected values for marginal crash frequency. The
results are presented in Figure 6.
-0.005
0.113
0.0180.013 -0.007
-0.0200.0000.0200.0400.0600.0800.1000.120
Mar
gin
al C
rash
Fr
eq
ue
ncy
Causal Conditions
13
Figure 61: Sensitivity Analysis for continuous variables
Some inferences drawn from the Figure 6 are as below:
For average Gradient variable the marginal crash frequency is expected to show a
very small increase from 0.009 to around 0.012 as the gradient changes from -7%
to +7%.
The relationship between marginal crash frequency and average curvature is
monotonically non-decreasing. Thus, lower the curvature, lower the crash
frequency.
The relationship between marginal crash frequency and average curvature is
monotonically non-decreasing i.e. greater the IRI, lesser the marginal frequency
8. CONCLUSION
The road crash frequency prediction models for Indian National Highways were developed in
this paper. An important objective of this paper was to compare the suitability of application
of soft-computing tools such as Artificial Neural Networks and Fuzzy Logic for the purpose
of prediction of crash frequency on Indian national highways. In that sense, it can be safely
declared that the ANN, particularly the Multilayer Perceptron (MLP) model, is a successful
technique as it has a performance better than the conventional crash frequency modeling
methods such as Poisson GLM. In fact, if one takes into account the insight that MLP model
enables one to have in the individual independent-dependent variable relationships, it will be
justified to say that MLP even outperforms the conventional modeling methods.
The sensitivity analysis done with the help of MLP models revealed that the human
and vehicle-related causal conditions are the main determinants of crashes occurring on
Indian highways. The road geometry has far lesser impact on crash causation. So it can be
argued that engineering solutions alone cannot have a very significant mitigating effect on
0
0.005
0.01
0.015
0.02
0.025
-8 -6 -4 -2 0 2 4 6 8
Mar
gin
al C
rash
Fr
eq
ue
ncy
Avg. Gradient (in %)
0
0.02
0.04
0.06
0.08
0 200 400
Mar
gin
al C
rash
Fr
eq
ue
ncy
Avg. Curvature (in deg./km)
0
0.005
0.01
0.015
0.02
0.025
0.03
0 5 10
Mar
gin
al C
rash
Fr
eq
ue
ncy
Average IRI (in m/km)
14
road crashes. The other two E’s of road safety i.e. Education and Enforcement will have to
play a far greater role in order to achieve any drastic results.
There are certain limitations of the study as well:
Speed is an important determinant of road crashes but since the data regarding the
vehicular speeds at the time of crashes was not readily available it could not be
utilized for modeling purpose.
Since all the highways considered in the study were four-lane divided highways, the
scope of the models is limited to this category of highways only.
Temporal and Spatial correlations, if present, were ignored in the study due to
incompleteness of the database.
ACKNOWLEDGEMENT
We are very grateful to the Director, CSIR-Central Road Research Institute for allowing us to
publish this paper.
REFERENCES
Awad, W.H., Janson, B.N. (1998). “Prediction models for truck accidents at freeway
ramps in Washington state using regression and artificial intelligence techniques”.
Transportation Research Record 1635, 30-36. doi:10.3141/1635-04
Barros, A.J., Hirakata, V.N. (2003). “Alternatives for logistic regression in cross-
sectional studies: an empirical comparison of models that directly estimate the
prevalence ratio”. BMC Medical Research Methodology, 3(1), 21.
Bishop, C.M. (1995). Neural networks for pattern recognition. Oxford, UK: Clarendon
Press.
Cameron, A.C., Trivedi, P.K. (1998). Regression analysis of count data. Cambridge,
UK: Cambridge University Press.
Cansiz, O.F., Easa, S.M. (2011). “Using artificial neural network to predict collisions on
horizontal tangents of 3-D two-lane highways”. International Journal of Engineering
and Applied Sciences, 7(1), 47-56.
Cybenko, D.L. (1989). “Approximation by superpositions of a sigmoidal function”.
Mathematics of Control, Signals, and Systems, 2, 303-314.
Daniels, S., Brijs, T., Nuyts, E., Wets, G. (2010). “Explaining variation in safety
performance of roundabouts”. Accident Analysis and Prevention, 42(2), 393-402.
Dinu, R.R., Veeraragavan, A. (2011). “Random parameter models for accident
prediction on two-lane undivided highways in India”. Journal of Safety Research,
42(1), 39-42.
Freedman, D.A. (2006). “On the so-called “huber sandwich estimator” and “robust
standard errors”. American Statistician, 60(4), 299-302.
15
Friedman, J.H. (1997). “On bias, variance, 0 / 1 — loss, and the curse-of-
dimensionality”. Data Mining and Knowledge Discovery, 77(1), 55-77. Springer.
Hammouda, K., Karray, F. (2000). A comparative study of data clustering techniques.
Tools of intelligent systems design. In Course Project, SYDE 625, University of
Waterloo, Waterloo, Canada. Available from
www.pami.uwaterloo.ca/pub/hammouda/sde625-paper.pdf
Hilbe, J. (2007). Negative binomial regression. Cambridge, UK: Cambridge University
Press.
Jang, J.S.R. (1993). “ANFIS: Adaptive Network-Based Fuzzy Inference System”. IEEE
Transactions on Systems Man and Cybernetics, 23(3), 665-685.
Justice, A.C., Covinsky, K.E., Berlin, J.A. (1999). “Assessing the generalizability of
prognostic information”. Annals of Internal Medicine, 130(6), 515-524.
Kononov, J., Allery, B. (2003). “Level of service of safety: Conceptual blueprint and
analytical framework”. Transportation Research Record 1840, 57-66.
Kosko, B. (1992). “Fuzzy systems as universal approximators”. 1992 Proc. IEEE
International Conference on Fuzzy Systems, 1153-1162.
Lord, D., Mannering, F. (2010). “The statistical analysis of crash-frequency data: a
review and assessment of methodological alternatives”. Transportation Research Part
A: Policy and Practice, 44(5), 291-305.
Lord, D., Washington, S.P., Ivan, J.N. (2005). “Poisson, Poisson-gamma and zero-
inflated regression models of motor vehicle crashes: balancing statistical fit and
theory”. Accident Analysis & Prevention, 37(1), 35-46.
Miaou, S.-P. (1994). “The relationship between truck accidents and geometric design of
road sections: poisson versus negative binomial regressions”. Accident Analysis &
Prevention, 26(4), 471-482.
Moller, M. (1993). “A scaled conjugate gradient algorithm for fast supervised learning”.
Neural Networks, 6(4), 525-533. Elsevier.
Pande, A., Abdel-Aty, M. (2006). “Assessment of freeway traffic parameters leading to
lane-change related collisions”. Accident Analysis and Prevention, 38(5), 936-948.
Planning Commission of India (2002). Tenth Five Year Plan, 2002-2007. Vol. II:
Sectoral policies and programmes. Government of India, New Delhi.
Shah, P. (2011). Development of probabilistic models for prediction of road crash
occurrence and road crash severity on high speed corridors. A dissertation submitted
for the degree of M.Tech to National Institute of Technology, Surat, Gujarat, India.
16
Stinchcombe, M., White, H. (1989). “Universal approximation using feedforward
networks with non-sigmoid hidden layer activation functions”. IJCNN International
Joint Conference on Neural Networks 1989, I-607--I-611, IEEE TAB Neural
Network Committee.
Road Accidents in India: 2013 (2013). Ministry of Road Transport and Highways, New
Delhi, Government of India.
Thailand Accident Research Centre (2009). Development of Accident Prediction
Models. Klong Luang, Pathumthani: Thailand Accident Research Centre.
Vogt, A., Bared, J.G. (1998). Accident models for two-lane rural roads: segments and
intersections. FHWA-RD-98-133, Washington DC: Federal Highway
Administration.
Xie, Y., Lord, D., Zhang, Y. (2007). “Predicting motor vehicle collisions using
Bayesian neural network models: An empirical analysis”. Accident Analysis and
Prevention, 39(5), 922-933.
Zheng, L., Meng, X. (2011). “An approach to predict road accident frequencies:
application of fuzzy neural network”. Paper presented at the 3rd International
Conference on Road Safety and Simulation, September 14-16, 2011, Indianapolis,
USA.