
Accident Analysis and Prevention 57 (2013) 30–39

Contents lists available at SciVerse ScienceDirect

Accident Analysis and Prevention

journal homepage: www.elsevier.com/locate/aap

Predicting crash likelihood and severity on freeways with real-time loop detector data
Chengcheng Xu (a,∗), Andrew P. Tarko (b), Wei Wang (a), Pan Liu (a)

(a) School of Transportation, Southeast University, Si Pai Lou #2, Nanjing, 210096, China
(b) Center for Road Safety, School of Civil Engineering, Purdue University, 550 Stadium Mall Drive, West Lafayette, IN 47907, United States

Article info
Article history: Received 27 August 2012; received in revised form 4 March 2013; accepted 31 March 2013
Keywords: Crash severity; Real-time safety management; Crash risk prediction; Sequential logit model; Freeway

Abstract
Real-time crash risk prediction using traffic data collected from loop detector stations is useful in dynamic safety management systems aimed at improving traffic safety through the application of proactive safety countermeasures. The major drawback of most existing studies is that they focus on crash risk without considering crash severity. This paper presents an effort to develop a model that predicts the crash likelihood at different levels of severity, with a particular focus on severe crashes. The crash data and traffic data used in this study were collected on the I-880 freeway in California, United States. This study considers three levels of crash severity: fatal/incapacitating injury crashes (KA), non-incapacitating/possible injury crashes (BC), and property-damage-only crashes (PDO). A sequential logit model was used to link the likelihood of crash occurrence at each severity level to various traffic flow characteristics derived from detector data. An elasticity analysis was conducted to evaluate the effect of the traffic flow variables on the likelihood of a crash and its severity. The results show that the traffic flow characteristics contributing to crash likelihood were quite different at different levels of severity. PDO crashes were more likely to occur under congested traffic flow conditions with highly variable speed and frequent lane changes, while KA and BC crashes were more likely to occur under less congested traffic flow conditions. High speed, coupled with a large speed difference between adjacent lanes under uncongested traffic conditions, was found to increase the likelihood of severe (KA) crashes. This study applied 20-fold cross-validation to estimate the prediction performance of the developed models. The validation results show that the model's crash prediction performance at each severity level was satisfactory. The findings of this study can be used to predict the probabilities of crashes at different severity levels, which is valuable in the pursuit of reducing the risk of severe crashes through dynamic safety management systems on freeways.
© 2013 Elsevier Ltd. All rights reserved.

∗ Corresponding author. Tel.: +86 13801580045.
E-mail addresses: [email protected] (C. Xu), [email protected] (A.P. Tarko), [email protected] (W. Wang), [email protected] (P. Liu).

0001-4575/$ – see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.aap.2013.03.035

1. Introduction

Real-time crash risk prediction models estimate the likelihood of crash occurrence on a given freeway segment over a short time period, such as 5 min. One of their important practical applications is the identification of hazardous traffic conditions that may lead to a crash. Predicting crash risk in real time is an essential task in freeway dynamic safety management systems: it identifies hazardous traffic conditions in which proactive crash prevention strategies are needed to mitigate the high crash risk. In recent years, numerous studies have developed freeway crash risk prediction models that link crash risk with traffic flow characteristics measured by freeway traffic surveillance systems (Oh et al., 2001, 2005; Lee et al., 2003; Abdel-Aty et al., 2004, 2005; Zheng et al., 2010; Pande et al., 2011; Ahmed et al., 2012; Ahmed and Abdel-Aty, 2012; Xu et al., 2012a,b, in press; Li et al., 2012).

Oh et al. (2001, 2005) applied a Bayesian model to establish the statistical relationship between crash risk and real-time traffic flow states. The results showed that the standard deviation of speed estimated in five-minute intervals was a good indicator of hazardous traffic conditions, under which the crash potential was considerably higher than under other traffic conditions. Lee et al. (2003) used a log-linear model to estimate crash risks based on real-time traffic flow data collected from freeway loop detector stations. They concluded that the coefficient of variation in speed, traffic density, and the speed difference between upstream and downstream loop detector stations were significantly correlated with crash risk.

Abdel-Aty et al. (2004) applied matched case-control logistic regression to link crash likelihood with real-time traffic flow characteristics. The traffic intervals preceding a crash were the cases, which were matched with crash-free intervals used as controls. The results showed that the likelihood of crash occurrence was correlated with the average detector occupancy at the upstream loop detector station and with the coefficient of variation in speed observed at a downstream loop detector station. In a subsequent study, Abdel-Aty et al. (2005) developed real-time crash risk prediction models for high-speed and low-speed traffic conditions based on matched case-control logistic regression. They found that the mechanisms of multi-vehicle crashes differed between these two speed regimes. Abdel-Aty and Pande (2005) applied a probabilistic neural network (PNN) model to predict crash occurrences on freeways using multiple speed derivatives, including the logarithm of the coefficient of variation in speed. Pande and Abdel-Aty (2006) developed a crash risk prediction model based on a classification tree and a neural network to identify hazardous traffic conditions potentially leading to lane-change-related collisions. The results indicated that the average speed, the difference in occupancy between adjacent lanes, and the standard deviations of speed and volume contributed to lane-change-related crash risk.

Fig. 1. Valid rate of traffic data along the I-880 Freeway in 2008.
Recently, Zheng et al. (2010) used a matched case-control logistic regression model to evaluate the impact on crash likelihood of the speed variance resulting from oscillating traffic states. Hossain and Muromachi (2011) developed separate crash risk prediction models for basic freeway segments and ramp vicinities and found that the factors contributing to crash risk were quite different for the two areas. The mean and standard deviation of the differences in traffic flow parameters between adjacent lanes were the main contributors to high crash risk on basic freeway segments, whereas high ramp flow and the variation in speed between downstream and upstream detector stations affected the crash risk within ramp vicinities. Xu et al. (2012a) used K-means clustering analysis to classify traffic flow states and to test the connection between these traffic states and crash risk on freeways. Crash risk prediction models were then developed for the different traffic states, and the results demonstrated that the impacts of traffic flow characteristics on crash risk differed across traffic states.

Most real-time crash risk prediction models were developed using traffic data collected with loop detector stations. Later studies used traffic data collected with other surveillance technologies. Hourdos et al. (2006) applied a binary logit model to identify crash-prone conditions on freeways using traffic data captured with video cameras. Several traffic flow characteristics contributing to crash likelihood were identified, such as large speed differences between adjacent lanes and compression waves leading to abrupt changes in traffic flow. Ahmed and Abdel-Aty (2012) applied matched case-control logistic regression to develop a real-time crash risk prediction model using traffic data collected from tag readers on toll roads, known as Automatic Vehicle Identification (AVI) systems. In a follow-up study, Ahmed et al. (2012) applied a Bayesian semi-parametric Cox proportional hazards model to develop a real-time crash risk prediction model based on traffic data collected from an AVI system. The results demonstrated that the likelihood of crash occurrence on freeways was affected by the average speed and the standard deviation of speed measured by the AVI system.

Crash severity is usually defined as the most serious injury among the individuals involved in the crash. The severity of a crash can range from low-cost property damage to extremely costly severe injury and fatality. Being able to predict the likelihood of crashes at various levels of severity is important for focusing safety management systems on the proactive prevention of severe injuries. The current most common approach is to consider the real-time risk of a crash without distinguishing between levels of severity. The first attempt to estimate the risk of a severe crash using traffic flow characteristics measured with loop detectors was made by Golob et al. (2008), who developed a binary logit model of the probability of a non-PDO crash conditional on crash occurrence. However, that model cannot be used directly to predict the occurrence of severe crashes. Golob's study also used data collected at a single detector station, so spatial differences in traffic flow states could not be included in the model. We were unable to find other studies aimed at developing real-time prediction models for crashes at various levels of severity.

The primary objective of this study is to explore the possibility of developing crash risk models for distinct levels of severity that can be implemented in real-time freeway safety management systems. The practicality of such models is important; this aspect is addressed by rigorous testing of the predictive ability of the estimated models and by a critical discussion of the models' performance from the standpoint of false alarms and their eroding effect on drivers' response to warning messages. The research results will promote a better understanding of the impact of traffic flow characteristics on the likelihood of severe crashes and will help transportation professionals develop effective crash prevention strategies that focus on consequential crash events.

2. Data sources

To accomplish the research objective, data were obtained from a 29-mile segment of the I-880 freeway in the San Francisco Bay area of California, United States. There are 119 loop detector stations in the northbound and southbound directions along the selected freeway section, with an average spacing of 0.5 mile. The standard deviation of the spacing is around 0.3 mile, and the minimum and maximum spacings are 0.15 and 1.68 mile, respectively. A total of 5 weather stations are located along the selected freeway section, all within about 5 miles of the I-880N freeway. The collected crash, weather and traffic data cover all of 2008. A total of 794 crashes were identified and used in the study.

The traffic data were obtained from the Highway Performance Measurement System (PeMS) maintained by the California Department of Transportation (Caltrans). Fig. 1 illustrates the percentage of valid traffic data along the I-880 freeway in 2008. As shown in Fig. 1, the traffic data have a reasonable valid rate on the selected freeway segment, and the percentage of valid data is generally around 90%. The average speed, volume, and occupancy in 30-s aggregation intervals were collected in each lane. Traffic data were excluded as invalid or unusable under one or more of the following conditions: (1) the average occupancy was greater than 100%; (2) the average speed was greater than 0 mph while the flow rate was 0 vph; (3) the flow rate was greater than 0 vph while the occupancy was 0%; (4) the average speed was greater than 100 mph; or (5) the occupancy was greater than 0% while the flow rate was
equal to 0 vph. The 30-s raw detector readings from two consecutive upstream and downstream detector stations were aggregated into 5-min intervals and converted into the 22 traffic flow variables presented in Table 1.

Table 1
Variables considered for the models.

Symbol Variables
VehCntu Average 30-s vehicle count at the upstream station (veh/30 s)
DetOccu Average 30-s detector occupancy at the upstream station (%)
AvgSpdu Average 30-s speed at the upstream station (mile/h)
OccDevu Std. dev. of 30-s detector occupancies at the upstream station (%)
SpdDevu Std. dev. of 30-s mean speeds at the upstream station (mile/h)
CvSpdu Coefficient of variation of 30-s mean speeds at the upstream station (dimensionless)
OccDifu Average absolute difference in 30-s detector occupancies between adjacent lanes at the upstream station (%)
SpdDifu Average absolute difference in 30-s mean speeds between adjacent lanes at the upstream station (mile/h)
VehCntd Average 30-s vehicle count at the downstream station (veh/30 s)
DetOccd Average 30-s detector occupancy at the downstream station (%)
AvgSpdd Average 30-s speed at the downstream station (mile/h)
OccDevd Std. dev. of 30-s detector occupancies at the downstream station (%)
SpdDevd Std. dev. of 30-s mean speeds at the downstream station (mile/h)
CvSpdd Coefficient of variation of 30-s mean speeds at the downstream station (dimensionless)
OccDifd Average absolute difference in 30-s detector occupancies between adjacent lanes at the downstream station (%)
SpdDifd Average absolute difference in 30-s mean speeds between adjacent lanes at the downstream station (mile/h)
AvgCntu–d Average absolute difference in vehicle counts between upstream and downstream stations (veh/30 s)
AvgOccu–d Average absolute difference in detector occupancies between upstream and downstream stations (%)
AvgSpdu–d Average absolute difference in speeds between upstream and downstream stations (mile/h)
DevCntu–d Std. dev. of absolute difference in vehicle counts between upstream and downstream stations (veh/30 s)
DevOccu–d Std. dev. of absolute difference in detector occupancies between upstream and downstream stations (%)
DevSpdu–d Std. dev. of absolute difference in speeds between upstream and downstream stations (mile/h)
DetDistu–d Distance between upstream and downstream stations (mile)
Widths Road surface width (ft)
Widtho 1 = outer shoulder width > 10 ft; 0 = otherwise
Widthi 1 = inner shoulder width > 10 ft; 0 = otherwise
Widthm Inner median width (ft)
Lanes Number of lanes
On-ramp 1 = on-ramp between upstream and downstream stations; 0 = otherwise
Off-ramp 1 = off-ramp between upstream and downstream stations; 0 = otherwise
Curve 1 = curve section; 0 = otherwise
Peak 1 = peak period; 0 = otherwise
Weather 1 = adverse weather conditions (rain or fog); 0 = otherwise

The traffic flow variables in Table 1 consist of 5-min observations supplemented with a crash indicator (1 if a crash occurred between the upstream and downstream detectors, and 0 otherwise). The researchers extracted traffic data from the interval between 5 and 10 min prior to crash occurrence, in order to identify hazardous traffic conditions early enough before the crash to make preemptive measures possible. For example, if a crash occurred at 10:00 pm, the traffic data were extracted from 9:50 to 9:55 pm. This time lag was also adopted in previous studies that developed real-time crash risk prediction models (Pande et al., 2011; Lee et al., 2011; Xu et al., in press).

For each crash in the dataset, the researchers randomly selected 20 five-minute intervals without crashes from crash-free days. These intervals were supplemented with the 22 traffic flow variables to form crash-free observations. To generate the dataset of non-crash cases, the time of each non-crash case was randomly chosen from the 527,040 1-min intervals in 2008 (60 min × 24 h × 366 days in 2008). Similarly, the upstream and downstream stations for each non-crash case were randomly selected from the 119 loop detector stations. Each randomly selected combination of time and stations was then used to extract the 5-min detector data following the assigned time from the assigned upstream and downstream stations. In addition, it was ensured that no crashes were observed at the location of each non-crash case during the whole day. Each non-crash case was also assigned a random milepost location based on its upstream and downstream stations. Figs. 2 and 3 illustrate the distributions of non-crash cases over time and space. The dataset of non-crash cases covers normal traffic conditions at different times along the selected I-880 freeway section, so the selected non-crash cases can generally represent normal traffic flow conditions.

Fig. 2. The distribution of non-crash cases over time.

The geometric data for the I-880 freeway were also obtained from the PeMS database. As shown in Table 1, 9 geometric variables were used in this study. The geometric data for each crash case and non-crash case were extracted based on their milepost locations. The weather data were obtained from the National Climatic Data Center (NCDC) website, which provides hourly weather information from weather stations across the United States. Weather conditions for each crash case and non-crash case were extracted based on their time and milepost location. Considering the sample size in each category, rain and fog were combined into a single adverse weather category. As a result, the study considered two weather conditions: clear weather and adverse weather.
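The five exclusion rules, the 30-s to 5-min aggregation, the 5 to 10 min pre-crash window, and the random draw of non-crash times described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' processing code: the `Reading` record layout and every function name are hypothetical, SpdDev is assumed to be the sample standard deviation of the station-level mean speed of each 30-s interval, and only three of the 22 Table 1 variables (DetOcc, SpdDev, OccDif) are computed for a single station.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, stdev

# 2008 was a leap year: 60 min x 24 h x 366 days one-minute slots.
MINUTES_IN_2008 = 60 * 24 * 366


@dataclass
class Reading:
    """One 30-s reading for a single lane (hypothetical record layout)."""
    lane: int
    speed_mph: float
    flow_vph: float
    occupancy_pct: float


def is_valid(r: Reading) -> bool:
    """Return False if the reading meets any of the five exclusion rules."""
    if r.occupancy_pct > 100:                    # (1) occupancy above 100%
        return False
    if r.speed_mph > 0 and r.flow_vph == 0:      # (2) speed without flow
        return False
    if r.flow_vph > 0 and r.occupancy_pct == 0:  # (3) flow without occupancy
        return False
    if r.speed_mph > 100:                        # (4) speed above 100 mph
        return False
    if r.occupancy_pct > 0 and r.flow_vph == 0:  # (5) occupancy without flow
        return False
    return True


def station_features(snapshots: list[list[Reading]]) -> dict[str, float]:
    """Aggregate the ten 30-s snapshots of one 5-min window at one station.

    Computes DetOcc (average occupancy), SpdDev (std. dev. of the 30-s mean
    speeds) and OccDif (average absolute occupancy difference between
    adjacent lanes). Readings failing is_valid() are dropped first.
    """
    occupancies: list[float] = []
    interval_speeds: list[float] = []
    adjacent_diffs: list[float] = []
    for snapshot in snapshots:
        lanes = sorted((r for r in snapshot if is_valid(r)), key=lambda r: r.lane)
        if not lanes:
            continue
        occupancies.extend(r.occupancy_pct for r in lanes)
        # Assumed definition: station-level mean speed of this 30-s interval.
        interval_speeds.append(mean(r.speed_mph for r in lanes))
        # Only pairs of physically adjacent lanes (lane numbers differ by 1).
        adjacent_diffs.extend(
            abs(a.occupancy_pct - b.occupancy_pct)
            for a, b in zip(lanes, lanes[1:])
            if b.lane - a.lane == 1
        )
    return {
        "DetOcc": mean(occupancies) if occupancies else float("nan"),
        "SpdDev": stdev(interval_speeds) if len(interval_speeds) > 1 else 0.0,
        "OccDif": mean(adjacent_diffs) if adjacent_diffs else 0.0,
    }


def crash_window(crash_time: datetime) -> tuple[datetime, datetime]:
    """Window of traffic data used for a crash case: 10 to 5 min before it."""
    return crash_time - timedelta(minutes=10), crash_time - timedelta(minutes=5)


def random_noncrash_time(rng: random.Random) -> datetime:
    """Draw a uniformly random 1-min slot in 2008 for a non-crash case."""
    return datetime(2008, 1, 1) + timedelta(minutes=rng.randrange(MINUTES_IN_2008))
```

For example, `crash_window(datetime(2008, 5, 1, 22, 0))` returns the 21:50 to 21:55 window that matches the 10:00 pm example in the text. Dropping invalid readings before aggregation, rather than discarding the whole 5-min window, is a design choice of this sketch; the text does not say which of the two was used.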
Fig. 3. The distribution of non-crash cases along the I-880 Freeway.

The crash data were obtained from the Statewide Integrated Traffic Records System (SWITRS) maintained by Caltrans. The crash dataset provided by SWITRS included crash location, crash time, and crash severity. Crash severity is divided into five levels: fatal crash (K), incapacitating injury crash (A), non-incapacitating injury crash (B), possible injury crash (C), and property-damage-only crash (PDO). The severity levels are defined as follows:

A fatal injury is any injury that results in death within a 30-day period after the crash occurred.
An incapacitating injury is any injury, other than a fatal injury, that prevents the injured person from normally continuing the activities the person was capable of performing before the injury occurred.
A non-incapacitating injury is any injury, other than a fatal or incapacitating injury, that is evident to observers at the scene of the crash.
A possible injury is any injury reported as a complaint of pain without visible injury.

The original five levels of severity were combined into three levels: KA, BC, and PDO (Leckrone et al., 2011; Jung et al., 2010). This combination acknowledged the similarity of the combined levels and increased the number of crashes at each new level, thereby improving the chance of more significant variables being included in the final models. Rear-end crashes made up most of the dataset (56.0%), followed by sideswipe crashes (about 22.3%). Among injury crashes, about 54.9% were rear-end crashes, 13.7% were sideswipe crashes, and 20.2% were hit-object crashes. The crash frequency for each severity level is shown in Table 2. A total of 794 crash cases and 15,880 non-crash cases were included in our database.

Table 2
Frequency distribution of observations in the sequential logit model.

Crash severity level | Stage 1 | Stage 2 | Stage 3
Fatal and incapacitating injury (KA) | 59 (1) | 59 (1) | 59 (1)
Non-incapacitating and possible injury (BC) | 203 (1) | 203 (1) | 203 (0)
Property damage only (PDO) | 532 (1) | 532 (0) | –
Non-crash | 15,880 (0) | – | –
Total | 16,674 | 794 | 262
Note: The SAS coding of the crash severity level (binary response) is given in parentheses.

3. Research methodology

The ordered probit/logit model is the most common modeling approach for data with an ordinal response. However, the main drawback of ordered probit/logit models is that the parameter estimates and the set of significant explanatory variables are the same across all crash severity levels. Even though the parameter estimate of each explanatory variable can differ across crash severity levels in the generalized ordered logit model, the set of significant explanatory variables is still assumed to be the same across severity levels. Considering also that multinomial and nested logit models cannot explicitly represent the ordinality of the discrete injury severity categories, this study applied a sequential logit model to capture the impacts of different traffic flow parameters on crash likelihood at various severity levels.

3.1. Binary logit model

In this study, a binary logit model was fitted at each stage of the sequential logit model. At each stage, the binary logit model was fitted to a sub-sample that excluded the observations of the level separated out at the previous stage. Binary logit regression has been used in previous transportation engineering studies to predict a binary dependent variable as a function of predictor variables (Xu and Tian, 2008; Hubbard et al., 2009; Liu et al., 2007). Using the binary logit model, the probability of crash occurrence can be estimated as:

$$P(x_i) = \frac{1}{1 + e^{-g(x_i)}}, \qquad i = 1, 2, \ldots, n \qquad (1)$$

where $P(x_i)$ denotes the probability of crash occurrence and $g(x)$ is a linear combination of the explanatory variables:

$$g(x_i) = \ln\frac{P(x_i)}{1 - P(x_i)} = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki} \qquad (2)$$

where $x_{ki}$ denotes the value of variable $k$ for sample $i$ and $\beta_k$ is the coefficient of variable $k$. The parameters $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ are estimated by maximizing the log-likelihood function for Eq. (2):

$$\ln L(\beta) = \sum_{i=1}^{n} \left[ y_i \left(\beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki}\right) - \ln\left(1 + e^{\beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki}}\right) \right] \qquad (3)$$

where $y_i$ is the binary response (1 or 0) of sample $i$.

Previous studies (Hauer and Hakkert, 1988; Elvik and Mysen, 1999; Hauer, 2006; Savolainen et al., 2011) found that less severe crashes are more likely to be under-reported and that the under-reporting rate decreases as severity increases. Thus, crash samples are usually over-represented by high-severity crashes. Further, it is prohibitive to include all observations of non-crash cases in the dataset, so 15,880 non-crash observations were randomly selected to represent normal traffic conditions. Finally, crashes that could not be matched with real-time traffic data, because those data were missing or invalid, were excluded from further analysis. These factors make the dataset an outcome-based (choice-based) sample.

When the conventional maximum likelihood estimator (MLE) is applied to outcome-based samples, the multinomial logit model still produces unbiased estimates of all model parameters except the constant terms (Cosslett, 1981a,b; Yamamoto et al., 2008; Patil et al., 2011). As with the multinomial logit model, the parameter estimates in the binary sequential logit model are unbiased except for the constant terms (Yamamoto et al., 2008; Savolainen et al., 2011), because the binary sequential logit model is a combination of several binary logit models.
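The claim that only the constant term is distorted by outcome-based sampling can be checked with a small simulation. The Python sketch below uses made-up values, not the study's data: one standard-normal covariate, true coefficients β0 = −4 and β1 = 1, every positive ("crash") case kept, and only 10% of negative cases retained. It fits Eq. (2) by maximizing the log-likelihood of Eq. (3), written with the binary response y_i explicit, using a hand-written Newton-Raphson loop that stands in for the SAS LOGISTIC procedure used in this study. It then applies the intercept offset −ln(SR/PR) (Scott and Wild, 1986) that this section uses for correction. All function names are hypothetical.

```python
from __future__ import annotations

import math
import random


def sigmoid(z: float) -> float:
    """Eq. (1): probability of the positive outcome for linear predictor z."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0: float, b1: float, xs: list[float], ys: list[int]) -> float:
    """Eq. (3) for one covariate, with the binary response y_i explicit."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = b0 + b1 * x
        total += y * z - math.log1p(math.exp(z))
    return total


def fit_logit(xs: list[float], ys: list[int], iters: int = 25) -> tuple[float, float]:
    """Maximize Eq. (3) for g(x) = b0 + b1 * x by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # score (gradient) components
            g1 += (y - p) * x
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: inverse of the 2x2 information matrix times the score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def simulate(seed: int = 42, n: int = 200_000, true_b0: float = -4.0,
             true_b1: float = 1.0, keep_neg: float = 0.1) -> tuple[float, float, float]:
    """Return (raw intercept, slope, offset-corrected intercept)."""
    rng = random.Random(seed)
    xs_pop = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys_pop = [1 if rng.random() < sigmoid(true_b0 + true_b1 * x) else 0 for x in xs_pop]
    # Outcome-based sample: keep every positive, subsample the negatives.
    sample = [(x, y) for x, y in zip(xs_pop, ys_pop)
              if y == 1 or rng.random() < keep_neg]
    xs = [x for x, _ in sample]
    ys = [y for _, y in sample]
    b0_hat, b1_hat = fit_logit(xs, ys)
    pos_s, pos_pop = sum(ys), sum(ys_pop)
    sr = pos_s / (len(ys) - pos_s)    # SR: positives to others in the sample
    pr = pos_pop / (n - pos_pop)      # PR: positives to others in the population
    offset = -math.log(sr / pr)
    return b0_hat, b1_hat, b0_hat + offset
```

With the default arguments, the raw intercept should land well above −4 (near −4 + ln 10), while the slope should stay close to 1 and the offset-corrected intercept should return close to −4. This mirrors the cited result that the slopes survive outcome-based sampling and only the constant needs adjusting.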
Fig. 4. The structure of the sequential logit model considered in this study.

The computation of the predicted probability is based on the full set of parameter estimates, including the constant term; thus, the probability predicted by the binary sequential logit model would be biased. To account for the bias caused by the outcome-based sampling scheme, the intercept of the binary logit model at each stage was adjusted using an offset (Scott and Wild, 1986). The offset, calculated as shown below, was added to the original intercept:

$$\text{offset} = -\ln\left(\frac{SR_i}{PR_i}\right) \qquad (4)$$

where $SR_i$ is the ratio of observations having outcome $i$ to other observations in the sample, and $PR_i$ is the ratio of observations having outcome $i$ to other observations in the total population.

To evaluate the effect of the traffic flow variables on the likelihood of a crash and its severity, an elasticity analysis was conducted. The elasticity represents the percentage change in the dependent variable resulting from a 1% change in an independent variable (Washington et al., 2003). The elasticity of the dependent variable $Y$ with respect to a continuous independent variable $x_i$ is given as:

$$E_i = \frac{\partial Y_i}{\partial x_i} \times \frac{x_i}{Y_i} = \left[1 - P(i)\right] \beta_i x_i \qquad (5)$$

Although each observation in the dataset has an elasticity that depends on the value of $x_i$ and the estimated probability of crash severity $P(i)$, it is customary to report the average elasticity in the sample. In the following analysis, both the average and the standard deviation of the elasticity are given. Note that Eq. (5) cannot be used to calculate the elasticity of indicator variables. The sensitivity to an indicator variable $x_i$ is instead assessed by computing a pseudo-elasticity using the following equation (Washington et al., 2003):

$$E_i = \left[ \frac{\exp(x'\beta)\left[1 + \exp(x_i \beta_i)\right]}{\exp(x'\beta)\exp(x_i \beta_i) + 1} - 1 \right] \times 100 \qquad (6)$$

3.2. Model structure

Fig. 4 illustrates the structure of the binary sequential logit model used in this study. The crash severity levels in the sequential model progress from the lowest to the highest. As shown in Table 2, the sequential model was estimated in the following stages:

Stage 1. Crash types KA, BC, and PDO (binary response = 1) vs. non-crash cases (binary response = 0).
Stage 2. Crash types KA and BC (binary response = 1) vs. PDO (binary response = 0).
Stage 3. Crash type KA (binary response = 1) vs. crash type BC (binary response = 0).

Based on the binary logit model estimated at each stage of the sequential model, the likelihood of crash occurrence at each severity level can be calculated as follows:

$$P(\text{Crash}) = P_{f1} \qquad (7)$$

$$P(\text{PDO}) = P(\text{Crash})\,P(\text{PDO} \mid \text{Crash}) = P_{f1}\,(1 - P_{f2}) \qquad (8)$$

$$P(\text{BC}) = P(\text{Crash})\,P(\text{KA or BC} \mid \text{Crash})\,P(\text{BC} \mid \text{KA or BC}) = P_{f1}\,P_{f2}\,(1 - P_{f3}) \qquad (9)$$

$$P(\text{KA}) = P(\text{Crash})\,P(\text{KA or BC} \mid \text{Crash})\,P(\text{KA} \mid \text{KA or BC}) = P_{f1}\,P_{f2}\,P_{f3} \qquad (10)$$

where $P(Y)$ is the probability of $Y$; $P(Y \mid X)$ is the probability of $Y$ given that $X$ happens; and $P_{fi}$ is the probability estimated by the binary logit model at stage $i$ of the sequential structure shown in Fig. 4.

3.3. Model validation technique

This study used the k-fold cross-validation method to estimate the prediction accuracy of the developed models. K-fold cross-validation can minimize the bias associated with the random sampling of the training and validation datasets when estimating a model's prediction accuracy (Olson and Delen, 2008). In k-fold cross-validation, the complete dataset is randomly partitioned into k mutually exclusive subsets of approximately equal size. Each subset in turn is used as the validation dataset, while the other k − 1 subsets are combined to form the training dataset. Hence, the prediction model is trained and tested k times (the folds). In this study, 20-fold cross-validation was conducted to estimate the prediction performance of the developed model.

4. Estimated models

The LOGISTIC procedure in SAS 9.2 was used to estimate the binary logit model at each stage, with a significance level of 0.1 for retaining explanatory variables in the models (SAS, 2011). To account for possible correlations between the candidate independent variables, Pearson correlation coefficients were calculated between pairs of candidate independent variables, and several combinations were generated that included the maximum number of uncorrelated variables. Stepwise variable selection was then applied to select the independent variables to be included in the binary logit model at each stage. The log likelihoods at convergence of the candidate models were compared, and the model with the highest log likelihood was considered the best model.

4.1. Results discussion

4.1.1. First-stage model – all crashes

The estimation results for the sequential logit model are shown in Table 3. In the first stage, six traffic flow variables were found to be significantly correlated with the likelihood of a crash: the upstream occupancy, the upstream speed variance, the downstream speed variance, the difference in occupancy between adjacent lanes at the downstream station, the difference in occupancy between the upstream and downstream stations, and the difference in vehicle count between the upstream and downstream stations. The results shown in Table 3 indicate that the crash likelihood tends to be high when the traffic density at the upstream detector station (approximated with DetOccu), the speed variance at the upstream detector
Table 3
Estimation results for the sequential logit model.

Parameter | Estimate | Std. error | Wald χ² | Pr > χ² | Elasticity (SD)^b

Stage 1: Crash (KA, BC, and PDO) vs. non-crash
DetOccu | 0.074 | 0.007 | 99.586 | <0.0001 | 0.467 (0.337)
SpdDevu | 0.060 | 0.016 | 14.544 | <0.0001 | 0.224 (0.142)
SpdDevd | 0.050 | 0.016 | 9.079 | 0.003 | 0.189 (0.117)
OccDifd | 0.119 | 0.011 | 109.175 | <0.0001 | 0.299 (0.299)
AvgCntu–d | 0.092 | 0.039 | 5.593 | 0.018 | 0.09 (0.087)
AvgOccu–d | 0.026 | 0.013 | 3.663 | 0.056 | 0.038 (0.057)
Weather | 0.886 | 0.141 | 39.410 | <0.0001 | 0.408 (0.041)
DetDistu–d | 1.057 | 0.089 | 140.246 | <0.0001 | 0.495 (0.365)
Widths | −0.049 | 0.008 | 38.836 | <0.0001 | −2.535 (0.281)
Widtho | −0.856 | 0.150 | 32.365 | <0.0001 | −0.415 (0.028)
Curve | 0.508 | 0.121 | 17.646 | <0.0001 | 0.243 (0.019)
Intercept | −2.672 (−4.704)^a | 0.420 | 40.545 | <0.0001 | –
Summary statistics: −2L(c) = 6347.700; −2L(β) = 5408.806; −2[L(c) − L(β)] = 938.894 (11 df); P < 0.0001

Stage 2: KA and BC vs. PDO
DetOccu | −0.033 | 0.013 | 6.087 | 0.014 | −0.202 (0.184)
VehCntd | −0.056 | 0.025 | 5.180 | 0.023 | −0.303 (0.142)
Peak | −0.335 | 0.174 | 3.698 | 0.054 | −0.174 (0.011)
Weather | −0.689 | 0.300 | 5.290 | 0.021 | −0.338 (0.019)
Widths | −0.036 | 0.016 | 5.280 | 0.022 | −0.956 (0.273)
Intercept | 2.129 (0.644)^a | 0.822 | 6.707 | 0.010 | –
Summary statistics: −2L(c) = 1007.047; −2L(β) = 963.730; −2[L(c) − L(β)] = 43.317 (5 df); P < 0.0001

Stage 3: KA vs. BC
AvgSpdu | 0.033 | 0.015 | 4.900 | 0.027 | 2.022 (0.448)
SpdDifu | 0.067 | 0.020 | 10.723 | 0.001 | 0.859 (0.404)
VehCntd | −0.117 | 0.042 | 7.577 | 0.006 | −1.007 (0.464)
Intercept | −3.510 (−1.971)^a | 1.199 | 8.568 | 0.003 | –
Summary statistics: −2L(c) = 279.501; −2L(β) = 238.076; −2[L(c) − L(β)] = 41.426 (3 df); P < 0.0001

^a The intercept adjustment for each logit model.
^b The standard deviation of elasticity for each variable.

station (represented by SpdDevu ), the speed variance at the down- model (first stage). The positive coefficient of variable DetDistu–d
stream detector station (represented by SpdDevd ), the traffic indicates that the probability of crash grows with the length of
inter-lane imbalance (measured with OccDifd ), the volume differ- the road segment. The findings from the aggregate crash pre-
ence between upstream and downstream station (represented by diction models in previous studies also demonstrated that the
AvgCntu–d ), and the occupancy difference between upstream and increase in road segment resulted in an increase in the number of
downstream station (represented by AvgOccu–d ) are high as well. accidents (Anastasopoulos and Mannering, 2009). Both variables
These results are consistent with the high lane-change activities Widths and Widtho had negative coefficients, indicating that the
postulated by Gazis et al. (1962) for certain conditions and con- crash risk decreases with the increase in road surface width and
firm the findings of previous statistical analyses (Lee et al., 2003; out shoulder width. As indicated by the coefficient of the variable
Abdel-Aty et al., 2004; Ahmed and Abdel-Aty, 2012; Xu et al., Curve, the curve segment could increase the crash risk on freeways.
2012a,b). Generally speaking, the small distances between vehicles These results are consistent with the findings of previous studies
in high-density traffic flow leave less time for taking crash avoid- (Anastasopoulos and Mannering, 2009; Das and Abdel-Aty, 2010;
ance maneuver. Speed fluctuations and traffic imbalances between Shively et al., 2010). The positive coefficient of weather conditions
lanes may encourage drivers to change lanes more frequently – a indicates that adverse weather conditions could increase crash like-
maneuver that can be quite dangerous in dense traffic. lihood on freeways. Surprisingly, the model does not include the
Elasticity tells how many times the crash probability changes if presence of on-ramps and off-ramps between the detector sta-
the explanatory variable changes by 1% while the other variables tions. It seems that the six traffic flow characteristics included in
remain fixed. Unlike the marginal effects, the elasticity is dimen- the model have already captured the impacts of the ramps.
sionless, thus it is more convenient for comparing the effects of As mentioned above, the parameter estimates in the binary
different variables. For example, the average elasticity values for sequential logit model are unbiased except constant terms when
the six traffic flow characteristics (DetOccu , SpdDevu , SpdDevd , MLE is used to the outcome-based samples. To account for the
OccDifd , AvgCntu–d , and AvgOccu–d ) are: 0.467, 0.224, 0.189, 0.299, biases caused by outcome-based sampling scheme, Eq. (4) was used
0.090, and 0.038, respectively. It means that the one-percent to adjust the intercept of the model at the first stage as follows:
increase in these six traffic flow characteristics is associated with
the 0.467%, 0.224%, 0.189%, 0.299%, 0.090%, and 0.038% increases in

Non-crashp
SR Crashs
crash
the crash probability, respectively. offset = −Ln = −Ln ×
PRcrash Crashp Non-crashs
Among the geometric variables, the spacing between upstream

and downstream stations (DetDistu–d ), the road surface width Crashr Crashs Non-crashp
(Widths ), the outer shoulder width (Widtho ) and the curve sec- = −Ln × × (11)
Crashp Crashr Non-crashs
tion (Curve) were found to be significant in the crash probability
where SR_crash is the ratio of crash cases to non-crash cases in the data sample; PR_crash is the ratio of crash cases to non-crash cases in the total population; Non-crash_s and Non-crash_p are the numbers of non-crash cases in the data sample and the total population, respectively; Crash_s and Crash_p are the numbers of crashes in the data sample and the total population, respectively; and Crash_r is the reported number of crashes.

As shown in Eq. (11), three ratios are included in the offset to account for the bias caused by the outcome-based sampling scheme. The ratio between Non-crash_s and Non-crash_p accounts for the reduction in the number of non-crash cases in the data sample. Because of missing or invalid real-time traffic data, the number of crashes in the data sample is smaller than the number of reported crashes; the ratio between Crash_s and Crash_r accounts for this reduction. Finally, the ratio between Crash_r and Crash_p accounts for the underreporting of crashes. Although the actual reporting rate of all crashes is unknown, it can be estimated from the results of previous studies of crash underreporting. Elvik and Mysen (1999) reported that the reporting rates for K, A, B, C, and O crashes are 95%, 69%, 27%, 11%, and 25%, respectively. Combining these reporting rates with the reported crashes in the crash data gives an estimate of the reporting rate of all crashes, after which the offset for adjusting the intercept of the first-stage model can be calculated using Eq. (11). The offsets for adjusting the intercepts at the other stages shown in Table 3 can be estimated using a similar method.

4.1.2. Second-stage model – injury crashes
At the second stage, the upstream traffic density (represented by DetOcc_u), the downstream traffic volume (VehCnt_d), the weather conditions (Weather), the peak period (Peak), and the road surface width (Width_s) were found to be significantly correlated with the risk of injury and fatality once a crash happens. The negative coefficients of the two traffic flow variables indicate that an injury is less likely to result from a crash that occurs in more intense and congested traffic. The interpretation is that congested traffic tends to be slower than less congested traffic, so crashes tend to occur at relatively low speeds, which decreases the risk of injury and fatality. Aggregate crash prediction models in previous studies also found that crashes occurring in congested traffic conditions are likely to be less severe (Shefer, 1997; Chang and Xiang, 2003; Wang et al., 2009). The average elasticity for the upstream occupancy (DetOcc_u) was −0.202, which indicates that a 1% increase in the upstream occupancy is associated with a 0.202% reduction in the probability of an injury crash. The average elasticity of −0.303 for the downstream flow indicates that the probability of an injury crash decreases by 0.303% for each 1% increase in VehCnt_d. The variable Width_s has a negative coefficient, indicating that the risk of injury and fatality decreases as the road surface width increases: a wider road surface gives drivers more space for crash avoidance maneuvers and thereby reduces crash severity. Both Peak and Weather have negative coefficients, indicating that the peak period and adverse weather conditions could decrease the risk of injury and fatality. These results are consistent with the findings of previous studies (Das and Abdel-Aty, 2010; Khattak and Knapp, 2001; Kim et al., 2013).

4.1.3. Third-stage model – fatal and incapacitating injury crashes
The model estimated in the third stage implies that the average speed measured at the upstream detector station (AvgSpd_u), the difference in speeds between adjacent lanes at the upstream station (SpdDif_u), and the downstream traffic flow (represented by VehCnt_d) are significantly correlated with the risk of a fatal or incapacitating injury given that a crash occurs. Both AvgSpd_u and SpdDif_u have positive coefficients, indicating a high risk of a fatal or incapacitating injury outcome if the crash occurs at high speed and with a considerable speed difference between lanes. On the other hand, the negative coefficient of the downstream flow variable (VehCnt_d) indicates that fatalities and incapacitating injuries are less likely in crashes that occur in high-volume traffic flow. Therefore, crashes that occur in less congested traffic flow conditions with high speeds and large speed differences between adjacent lanes are prone to produce fatal and incapacitating injuries.

The average elasticity for the average speed variable (AvgSpd_u) was 2.022, and for the speed difference across lanes (SpdDif_u) it was 0.859, indicating that a 1% increase in AvgSpd_u and SpdDif_u increases the probability of a fatal or incapacitating injury crash by 2.022% and 0.859%, respectively. The average elasticity of −1.007 for the flow intensity (represented by VehCnt_d) implies that the probability of a fatal or incapacitating injury crash decreases by 1.007% for a 1% increase in VehCnt_d.

4.2. Prediction performance

4.2.1. Test design
This study applied 20-fold cross-validation to estimate the prediction performance of the developed models. The whole sample was randomly partitioned into 20 mutually exclusive sub-samples of approximately equal size. Each sub-sample was then used in turn as a validation sample, while the other 19 sub-samples were combined as a training sample, and the sequential logit models were developed for each training sample. Note that the explanatory variables included in the binary logit model at each stage were the same as those shown in Table 3. Eqs. (7)–(10) were used to calculate the crash likelihood at different severity levels for the observations in each validation sample.

4.2.2. Measure of performance and results
The prediction accuracy of a model of a binary outcome (event = 1 and non-event = 0) can be measured with two complementary indicators: (1) the proportion of events predicted as events (true positive rate), called sensitivity in SAS, and (2) the proportion of non-events predicted as non-events (true negative rate), called specificity in SAS. The model predicts an event if the predicted probability of the event exceeds a pre-specified threshold, so both sensitivity and specificity depend on this threshold, which varies between 0 and 1. A convenient and meaningful tool for evaluating the prediction performance of a logit model is the Receiver Operating Characteristic (ROC) curve, which compares the sensitivity and 1 − specificity as the threshold runs from 0 to 1. Thus, the ROC curve is a plot of the sensitivity (y-axis) vs. 1 − specificity (x-axis). To generate the ROC curve for each severity level, we calculated the sensitivity and 1 − specificity for multiple thresholds using all of the validation samples. Fig. 5 presents the ROC curves for four possible outcomes of the sequential logit model: (1) a crash at any severity level, (2) a property-damage-only crash (PDO), (3) a non-incapacitating or possible injury crash (BC), and (4) a fatal or incapacitating injury crash (KA).

The areas under the ROC curves for the four possible outcomes were 0.784, 0.803, 0.758, and 0.797, respectively, indicating that the sequential logit model provides good predictive performance. Table 4 summarizes the crash prediction performance of the developed sequential logit model at different severity levels, measured by the percentage of crashes at each severity level that are correctly predicted at several false alarm (false positive) rates. As shown in Table 4, the proportion of crashes predicted correctly increases as the false alarm rate increases. This trade-off between prediction accuracy and false alarm rate must be considered when setting the threshold, which needs to be chosen carefully to meet the requirements of the practical implementation or the preferences of a specific traffic agency. Once the threshold is determined, the ROC curve can be used to estimate the predictive performance. For example, if a threshold is selected to accept a 30% false alarm rate, the ROC curve gives prediction accuracies of 75.7%, 67.5%, and 75.9% for PDO, BC, and KA crashes, respectively. For comparison, Table 5 briefly summarizes previous studies of real-time crash risk prediction models. Compared with the predictive performance of the models in those studies, the predictive performance of the models in this study is good.

Fig. 5. The ROC curve of each severity level for the sequential logit model.

Table 4
Prediction performance at different false alarm rates.

1 − specificity        Sensitivity of the sequential model
(false alarm rate)     PDO      BC       KA
0.02                   22.8%    16.5%    22.4%
0.1                    48.5%    38.0%    37.9%
0.2                    66.2%    59.5%    60.3%
0.3                    75.7%    67.5%    75.9%
0.4                    82.3%    76.5%    84.5%
0.5                    87.2%    84.5%    91.4%
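The evaluation procedure of Sections 4.2.1 and 4.2.2 (a random 20-fold partition, pooling the out-of-fold predictions, sweeping the threshold to trace sensitivity against 1 − specificity, and reading off the sensitivity at a chosen false alarm rate as in Table 4) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' SAS workflow: the function names (`assign_folds`, `cross_validated_scores`, `roc_points`, `auc`, `sensitivity_at`) are hypothetical, the `fit`/`predict` hooks stand in for the stage-wise logit fits, and the toy labels and scores are made up.

```python
import random


def assign_folds(n, k=20, seed=0):
    """Randomly split indices 0..n-1 into k mutually exclusive folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_scores(n, fit, predict, k=20, seed=0):
    """Pool out-of-fold predictions: each fold is scored by a model fit on the other k-1 folds.

    fit(train_idx) -> model and predict(model, idx) -> probabilities are hypothetical,
    caller-supplied hooks standing in for the stage-wise binary logit fits.
    """
    scores = [None] * n
    folds = assign_folds(n, k, seed)
    for f, val_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        model = fit(train_idx)
        for i, p in zip(val_idx, predict(model, val_idx)):
            scores[i] = p
    return scores


def roc_points(labels, scores):
    """(1 - specificity, sensitivity) pairs as the threshold falls from the highest score to the lowest.

    labels: 1 = event (e.g. a KA crash), 0 = non-event. Both classes must be present.
    Observations with tied scores are added in a single threshold step.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = i = 0
    while i < len(ranked):
        s = ranked[i][0]
        # consume every observation sharing this score before emitting a point
        while i < len(ranked) and ranked[i][0] == s:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def sensitivity_at(points, max_fpr):
    """Best sensitivity reachable without exceeding a target false alarm rate (cf. Table 4)."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)


if __name__ == "__main__":
    labels = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
    pts = roc_points(labels, scores)
    print(auc(pts))                   # 0.6875
    print(sensitivity_at(pts, 0.25))  # 0.75
```

In the paper's setting, `labels` would mark whether each validation observation belongs to a given severity level (for example KA) and `scores` would be the corresponding probabilities from Eqs. (7)–(10); `sensitivity_at(points, 0.3)` then plays the role of reading the 0.3 row of Table 4. Pooling all validation folds before computing the curve mirrors the statement that the ROC curves were built "using all of the validation samples".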

Table 5
Summary of predictive performance of the real-time crash risk models in previous studies.

Authors                        Prediction accuracy   Prediction accuracy   False alarm   Sample size   Sample size
                               of crash              of non-crash          rate          of crash      of non-crash
Oh et al. (2001)               55.8%                 72.1%                 27.9%         52            4787
Oh et al. (2005)               35.2%                 73.5%                 26.5%         52            4787
Abdel-Aty et al. (2004)        69.4%                 52.8%                 47.2%         375           1875
Abdel-Aty and Pande (2005)     73.9%                 71.3%                 28.7%         377           2857
Abdel-Aty et al. (2005)        56.0%                 80.0%                 20.0%         1528          6112
Pande and Abdel-Aty (2006)     57.0%                 70.0%                 30.0%         162           3650
Hossain and Muromachi (2010)   63.3%                 80.0%                 20.0%         250           23,068
Hassan and Abdel-Aty (2011)    67.2%                 64.7%                 35.3%         67            201
Pande et al. (2011)            45.0%                 80.0%                 20.0%         533           2660
Ahmed and Abdel-Aty (2012)     69.9%                 54.9%                 45.2%         670           2680
Ahmed et al. (2012)            72.9%                 57.9%                 42.0%         447           1788
Abdel-Aty et al. (2012)        73.1%                 60.3%                 39.7%         106           670
Yu and Abdel-Aty (2013)        AUC = 0.75            –                     –             265           1017
Xu et al. (in press)           61.0%                 80.0%                 20.0%         807           8070
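The false alarm rates in Tables 4 and 5 can be related to driver exposure with simple arithmetic. The sketch below (a) linearly interpolates the KA column of Table 4 to approximate the sensitivity at a false alarm rate between the tabulated rows, and (b) computes the expected number of false alerts for trips that pass a given number of detector-station pairs. Both the linear interpolation and the exposure model (one independent alarm opportunity per station pair per trip, which reproduces the "five alarms" figure of the commuting example in Section 4.2.3) are simplifying assumptions made here for illustration; the names `KA_TABLE4`, `interpolate_sensitivity`, and `expected_false_alarms` are hypothetical.

```python
from bisect import bisect_left

# KA column of Table 4: (false alarm rate, sensitivity) as reported in the paper.
KA_TABLE4 = [(0.02, 0.224), (0.1, 0.379), (0.2, 0.603), (0.3, 0.759), (0.4, 0.845), (0.5, 0.914)]


def interpolate_sensitivity(table, fpr):
    """Approximate sensitivity at a false alarm rate by linear interpolation between table rows."""
    xs = [x for x, _ in table]
    if not xs[0] <= fpr <= xs[-1]:
        raise ValueError("false alarm rate outside the tabulated range")
    j = bisect_left(xs, fpr)
    if xs[j] == fpr:
        return table[j][1]  # exact row of the table
    (x0, y0), (x1, y1) = table[j - 1], table[j]
    return y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)


def expected_false_alarms(false_alarm_rate, station_pairs, trips=1):
    """Expected false alerts, assuming one alarm opportunity per detector-station pair per trip."""
    return false_alarm_rate * station_pairs * trips


if __name__ == "__main__":
    print(expected_false_alarms(0.5, 10))           # 5.0 alerts per one-way trip
    print(interpolate_sensitivity(KA_TABLE4, 0.3))  # 0.759, read directly from Table 4
```

Under the same assumptions, `expected_false_alarms(0.02, 10, trips=5)` gives roughly one false alert per five one-way commutes, while `interpolate_sensitivity(KA_TABLE4, 0.25)` gives about 0.68; interpolating linearly between ROC points is only an approximation of the curve in Fig. 5.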

4.2.3. Implementation discussion
One possible real-time implementation of a model such as the one presented in this paper is to calculate the probabilities of crashes at certain levels of severity between specific detector stations and to warn drivers entering the segment about the high risk. In such an implementation, however, it is imperative that the rate of false positives is not too high, because of the anticipated erosion of drivers' attention to the message. Although drivers cannot tell in advance which alerts will not be followed by actual crash events, they will soon notice the high frequency of such alerts. For example, assume that the false alarm rate is 50% and that a driver travels daily to work on a five-mile freeway section with 10 pairs of detector stations; this driver will be subject to five alarms on average during a one-way trip. Even if drivers are not familiar with the frequency of crashes, they may quickly become immune to the alerts if no crashes are apparent. If the false alarm rate is set much lower, say 1/50, then the traveler in this example would witness about one false alarm a week. However, setting the false alarm rate at 2% (0.02) makes the prediction performance rather low. Fortunately, the predictive accuracy for the most severe crashes is still reasonable at this level (Table 4). Being able to predict approximately 22% of incapacitating injury or fatal crashes and, hopefully, preventing most of them without eroding the alertness of motorists might be sufficient justification for implementing real-time warning systems that use models such as the one presented in this paper.

4.3. Temporal and spatial transferability
Previous studies have demonstrated that real-time crash prediction models cannot be directly transferred from one road to another because of differences in driver population and traffic patterns (Pande et al., 2011; Ahmed and Abdel-Aty, 2012). A Bayesian updating approach, which updates an existing model as new data become available, could be used to improve the temporal and spatial transferability of the sequential model developed above. Assuming that a real-time crash prediction model has been developed from historical data Y_1, and that new data, or data from another road, Y_2 are then obtained, the posterior distribution can be updated using Bayes' theorem as follows:

$$\pi(\beta \mid Y_1, Y_2) \propto f(Y_1, Y_2 \mid \beta)\,\pi(\beta) = f(Y_2 \mid Y_1, \beta)\,f(Y_1 \mid \beta)\,\pi(\beta) \propto f(Y_2 \mid \beta)\,\pi(\beta \mid Y_1) \qquad (12)$$

where β denotes the vector of model parameters, and the last step assumes that Y_2 is conditionally independent of Y_1 given β.

Thus, when updating a model, the estimation results of the sequential logit model above can be used to construct an informative prior distribution π(β | Y_1), which is then combined with the new data, or data from another road, Y_2 to obtain an updated posterior distribution. The sequential logit model developed in this study could readily be updated with the Bayesian updating approach, but this application remains a task for future study.

5. Conclusions
Most existing studies model the likelihood of a crash without considering crash severity, and thus without accounting for the differing contributions of traffic flow characteristics to the crash probability at different severity levels. However, because severe crashes have much higher social and economic impacts than less severe crashes, the ability to predict the likelihood of severe crashes is important. This study not only further developed real-time crash risk prediction models to identify hazardous traffic conditions that potentially lead to crashes, but also addressed the possibility of predicting crashes at various levels of severity in real time using traffic data collected with freeway loop detectors.

The sequential logit model was applied to link the likelihood of crash occurrence at different severity levels with various traffic flow characteristics. The real-time traffic and crash data used in this study were obtained on the I-880 freeway in California, United States. The model estimation results showed that the traffic flow characteristics contributing to crash probability vary substantially across crash severity levels. In general, low-severity crashes (PDO) tended to occur in congested traffic flow conditions with highly variable speeds and frequent lane changes. Injury crashes (KA and BC) were found to occur more often in less congested traffic flow conditions. KA crashes, in particular, occurred under uncongested traffic flow conditions with large differences in speed between adjacent lanes. An elasticity analysis was conducted at each stage of the sequential logit model to evaluate the effects of the traffic flow variables on the likelihood of each crash severity level.

The 20-fold cross-validation method was applied to evaluate the predictive performance of the developed sequential logit model. The validation results indicate that the predictive performance of the developed models is satisfactory: the predictive accuracy for PDO, BC, and KA crashes was 75.7%, 67.5%, and 75.9%, respectively, when the false alarm rate was 30%. As expected, there was a strong trade-off between the false alarm rate and the percentage of crashes predicted by the model, which is an important aspect of implementation. A model should have a reasonably low false alarm rate in order to reduce the danger of drivers losing responsiveness to alerts about high-risk traffic conditions. The developed models were able to correctly identify high-risk severe crash conditions in 22% of the cases with the false alarm rate set at a safe level of 2%. This result provides hope that real-time prediction of severe crashes is possible and that implementation of this study's results, based on warning motorists, is practical.

Other advanced dynamic safety management systems, such as variable speed limits and ramp metering, may also benefit from the real-time detection of high-risk conditions if the connection between these traffic control methods and safety is better understood. In these cases, the false alarm rate is not as critical as it is for warning-based solutions, because road users are not aware of the reason for a reduced speed limit or a change in the ramp metering rate.

Acknowledgement
This research was jointly sponsored by China's National Key Basic Research Program (No. 2012CB725400), China's National High-tech R&D Program (No. 2011AA110303-03), the Scholarship Award for Excellent Doctoral Student granted by the Ministry of Education of China, and the Scientific Research Foundation of Graduate School of Southeast University.

References
Abdel-Aty, M., Uddin, N., Abdalla, F., Pande, A., Hsia, L., 2004. Predicting freeway crashes based on loop detector data using matched case-control logistic regression. Transportation Research Record 1897, 88–95.
Abdel-Aty, M., Uddin, N., Pande, A., 2005. Split models for predicting multi-vehicle crashes during high-speed and low-speed operating conditions on freeways. Transportation Research Record 1908, 51–58.
Abdel-Aty, M., Pande, A., 2005. Identifying crash propensity using specific traffic speed conditions. Journal of Safety Research 36 (1), 97–108.
Abdel-Aty, M., Hassan, H., Ahmed, M., Al-Ghamdi, A., 2012. Real-time prediction of visibility related crashes. Transportation Research Part C 24, 288–298.
Ahmed, M., Abdel-Aty, M., 2012. The viability of using automatic vehicle identification data for real-time crash prediction. IEEE Transactions on Intelligent Transportation Systems 13 (2), 459–468.
Ahmed, M., Abdel-Aty, M., Yu, R., 2012. A Bayesian updating approach for real-time safety evaluation using AVI data. In: Presented at the 91st Annual Meeting of the Transportation Research Board, CD-ROM, Washington, DC.

Anastasopoulos, P., Mannering, F., 2009. A note on modeling vehicle accident frequencies with random-parameters count models. Accident Analysis and Prevention 41, 153–159.
Chang, G., Xiang, H., 2003. The Relationship Between Congestion Levels and Accidents. Maryland State Highway Administration, Baltimore, MD.
Cosslett, S.R., 1981a. Efficient estimation of discrete-choice methods. In: Manski, C., McFadden, D. (Eds.), Structural Analysis of Discrete Choice Data with Econometric Applications. MIT Press, Cambridge, MA, pp. 51–111.
Cosslett, S.R., 1981b. MLE for choice-based samples. Econometrica 49, 1289–1316.
Das, A., Abdel-Aty, M., 2010. A genetic programming approach to explore the crash severity on multi-lane roads. Accident Analysis and Prevention 42, 548–557.
Elvik, R., Mysen, A.B., 1999. Incomplete accident reporting: meta-analysis of studies made in 13 countries. Transportation Research Record 1665, 133–140.
Golob, T., Recker, W., Pavlis, Y., 2008. Probabilistic models of freeway safety performance using traffic flow data as predictors. Safety Science 46 (9), 1306–1333.
Gazis, D., Herman, R., Weiss, G.H., 1962. Density oscillations between lanes of a multilane highway. Operations Research 10, 658–667.
Hauer, E., Hakkert, A., 1988. Extent and some implications of incomplete accident reporting. Transportation Research Record 1185, 1–10.
Hauer, E., 2006. The frequency-severity indeterminacy. Accident Analysis and Prevention 38, 78–83.
Hassan, H., Abdel-Aty, M., 2011. Exploring visibility-related crashes on freeways based on real-time traffic flow data. In: Presented at the 90th Annual Meeting of the Transportation Research Board, CD-ROM, Washington, DC.
Hossain, M., Muromachi, Y., 2010. Evaluating location of placement and spacing of detectors for real-time crash prediction on urban expressways. In: Presented at the 89th Annual Meeting of the Transportation Research Board, CD-ROM, Washington, DC.
Hossain, M., Muromachi, Y., 2011. Understanding crash mechanism and selecting appropriate interventions for real-time hazard mitigation on urban expressways. Transportation Research Record 2213, 53–62.
Hourdos, N., Garg, V., Michalopoulos, G., Davis, G., 2006. Real-time detection of crash-prone conditions at freeway high-crash locations. Transportation Research Record 1968, 83–91.
Hubbard, S., Bullock, D., Mannering, F., 2009. Right turns on green and pedestrian level of service: statistical assessment. Journal of Transportation Engineering 135 (4), 153–159.
Jung, S., Qin, X., Noyce, D., 2010. Rainfall effect on single-vehicle crash severities using polychotomous response models. Accident Analysis and Prevention 42 (1), 213–224.
Khattak, A., Knapp, K., 2001. Interstate highway crash injuries during winter snow and nonsnow events. Transportation Research Record 1746, 30–36.
Kim, J., Ulfarsson, F., Kim, S., Shankar, V., 2013. Driver-injury severity in single-vehicle crashes in California: a mixed logit analysis of heterogeneity due to age and gender. Accident Analysis and Prevention 50, 1073–1801.
Lee, C., Saccomanno, F., Hellinga, B., 2003. Real-time crash prediction model for the application to crash prevention in freeway traffic. Transportation Research Record 1840, 67–77.
Lee, C., Park, P., Abdel-Aty, M., 2011. Lane-by-lane analysis of crash occurrence based on driver's lane-changing and car-following behavior. Journal of Transportation Safety and Security 3 (2), 108–122.
Leckrone, S.J., Tarko, A.P., Anastasopoulos, P.C., 2011. Improving safety at high-speed rural intersections. In: Presented at the 3rd International Conference on Road Safety and Simulation, CD-ROM, Indianapolis, IN.
Liu, P., Wang, X., Lu, J., Sokolow, G., 2007. Headway acceptance characteristics of U-turning vehicles at unsignalized intersections. Transportation Research Record 2027, 52–57.
Li, Z., Chung, K., Liu, P., Wang, W., Ragland, D., 2012. Surrogate safety measure for evaluating rear-end collision risk near recurrent bottlenecks. In: Presented at the 91st Annual Meeting of the Transportation Research Board, CD-ROM, Washington, DC.
Oh, C., Oh, J., Ritchie, S., 2001. Real-time estimation of freeway accident likelihood. In: Presented at the 80th Annual Meeting of the Transportation Research Board, CD-ROM, Washington, DC.
Oh, C., Oh, J., Ritchie, S., 2005. Real-time hazardous traffic condition warning system: framework and evaluation. IEEE Transactions on Intelligent Transportation Systems 6 (3), 265–272.
Olson, D., Delen, D., 2008. Advanced Data Mining Techniques. Springer, Berlin, Germany.
Pande, A., Abdel-Aty, M., 2006. Assessment of freeway traffic parameters leading to lane-change related collisions. Accident Analysis and Prevention 38 (5), 936–948.
Pande, A., Das, A., Abdel-Aty, M., Hassan, H., 2011. Real-time crash risk estimation: are all freeways created equal? Transportation Research Record 2237, 60–66.
Patil, S., Geedipally, S., Lord, D., 2011. Analysis of crash severities using nested logit model—accounting for the underreporting of crashes. Accident Analysis and Prevention 45, 646–653.
Savolainen, P., Mannering, F., Lord, D., Quddus, M., 2011. The statistical analysis of highway crash-injury severities: a review and assessment of methodological alternatives. Accident Analysis and Prevention 43, 1666–1676.
Scott, A.J., Wild, C.J., 1986. Fitting logistic models under case-control or choice based sampling. Journal of the Royal Statistical Society Series B 48 (2), 170–182.
SAS Institute Inc., 2011. SAS/STAT(R) 9.2 User's Guide, second edition.
Shively, T., Kockelman, K., Damien, P., 2010. A Bayesian semi-parametric model to estimate relationships between crash counts and roadway characteristics. Transportation Research Part B 44, 699–715.
Shefer, D., 1997. Congestion and safety on highways: towards an analytical model. Urban Studies 34 (4), 679–692.
Wang, C., Quddus, M., Ison, S., 2009. The effects of area-wide road speed and curvature on traffic casualties in England. Journal of Transport Geography 17 (5), 385–395.
Washington, S., Karlaftis, M., Mannering, F., 2003. Statistical and Econometric Methods for Transportation Data Analysis. Chapman & Hall/CRC.
Xu, C., Liu, P., Wang, W., Li, Z., 2012a. Evaluation of the impacts of traffic states on crash risks on freeways. Accident Analysis and Prevention 47, 162–171.
Xu, F., Tian, Z., 2008. Driver behavior and gap-acceptance characteristics at roundabouts in California. Transportation Research Record 2071, 117–124.
Xu, C., Liu, P., Wang, W., Li, Z., 2012b. Development of a crash risk index to identify real-time crash risks on freeways. In: Presented at the 91st Annual Meeting of the Transportation Research Board, CD-ROM, Washington, DC.
Xu, C., Wang, W., Liu, P. A genetic programming model for real-time crash prediction on freeways. IEEE Transactions on Intelligent Transportation Systems, in press.
Yamamoto, T., Hashiji, J., Shankar, V., 2008. Underreporting in traffic accident data, bias in parameters and the structure of injury severity models. Accident Analysis and Prevention 43, 1320–1329.
Yu, R., Abdel-Aty, M., 2013. Utilizing support vector machine in real-time crash risk evaluation. Accident Analysis and Prevention 51, 252–259.
Zheng, Z., Ahn, S., Monsere, C., 2010. Impact of traffic oscillations on freeway crash occurrences. Accident Analysis and Prevention 42, 626–636.

No ratings yet
Smart Accident Detection System
5 pages
Transportation Research Part C: Chengcheng Xu, Wei Wang, Pan Liu, Rui Guo, Zhibin Li
No ratings yet
Transportation Research Part C: Chengcheng Xu, Wei Wang, Pan Liu, Rui Guo, Zhibin Li
10 pages
Study of Relation Between Actual and Perceived Crash Risk: Sciencedirect
No ratings yet
Study of Relation Between Actual and Perceived Crash Risk: Sciencedirect
10 pages
Abdel-Aty and Pande2005
No ratings yet
Abdel-Aty and Pande2005
12 pages
Accident Analysis and Prevention: Application of Poisson Random Effect Models For Highway Network Screening
No ratings yet
Accident Analysis and Prevention: Application of Poisson Random Effect Models For Highway Network Screening
9 pages
Potential Real-Time Indicators of Sideswipe Crashes On Freeways
No ratings yet
Potential Real-Time Indicators of Sideswipe Crashes On Freeways
18 pages
Real-Time Traffic Accidents Post-Impact Prediction - Based On Crowdsourcing Data
No ratings yet
Real-Time Traffic Accidents Post-Impact Prediction - Based On Crowdsourcing Data
11 pages
Road Accident Analysis and Prediction of
No ratings yet
Road Accident Analysis and Prediction of
8 pages
Statistical Modeling of Total Crash Frequency at Highway Intersections
No ratings yet
Statistical Modeling of Total Crash Frequency at Highway Intersections
6 pages
Accident Detection Using Deep Learning A
No ratings yet
Accident Detection Using Deep Learning A
10 pages
Official Views of The World's Columbian Exposition
No ratings yet
Official Views of The World's Columbian Exposition
464 pages
Analyzing Road Design Risk Factors For R PDF
No ratings yet
Analyzing Road Design Risk Factors For R PDF
8 pages
Yiyoiuo
No ratings yet
Yiyoiuo
8 pages
The Witches' Devil by Roger J. Horne
No ratings yet
The Witches' Devil by Roger J. Horne
249 pages
2 - Unit 3 - Ref-2
No ratings yet
2 - Unit 3 - Ref-2
41 pages
Crash Risk Factor Identification Using Association Rules in Nagpur City, Maharashtra, India
No ratings yet
Crash Risk Factor Identification Using Association Rules in Nagpur City, Maharashtra, India
10 pages
A Comprehensive Study On IoT Based Accident Detection Systems For Smart Vehicles
No ratings yet
A Comprehensive Study On IoT Based Accident Detection Systems For Smart Vehicles
18 pages
JSRT Roadaccidentproject
No ratings yet
JSRT Roadaccidentproject
12 pages
Accident Prediction
No ratings yet
Accident Prediction
4 pages
Related Worked
No ratings yet
Related Worked
10 pages
Abrahams and McMinns Clinical Atlas of Human Anatomy 1st edition by Peter Abrahams, Jonathan Spratt, Marios Loukas, Albert VanSchoor 0702073350 9780702073359 - The ebook in PDF/DOCX format is available for instant download
100% (6)
Abrahams and McMinns Clinical Atlas of Human Anatomy 1st edition by Peter Abrahams, Jonathan Spratt, Marios Loukas, Albert VanSchoor 0702073350 9780702073359 - The ebook in PDF/DOCX format is available for instant download
28 pages
A Comprehensive Study On IoT Based Accident Detect
No ratings yet
A Comprehensive Study On IoT Based Accident Detect
19 pages
Prediction of Road Accidents in The Different States of India Using Machine Learning Algorithms
No ratings yet
Prediction of Road Accidents in The Different States of India Using Machine Learning Algorithms
6 pages
A Method For Relating Type of Crash To Tra C Ow Characteristics On Urban Freeways
No ratings yet
A Method For Relating Type of Crash To Tra C Ow Characteristics On Urban Freeways
28 pages
Scrubber
No ratings yet
Scrubber
15 pages
Measurements
No ratings yet
Measurements
34 pages
Successful Commercial Beekeeping
No ratings yet
Successful Commercial Beekeeping
5 pages
Development of Accident Prediction Models For Rural Highway Intersections - Oh
No ratings yet
Development of Accident Prediction Models For Rural Highway Intersections - Oh
10 pages
Analysis of Road Traffic Accident Using Data Mining: Keywords
No ratings yet
Analysis of Road Traffic Accident Using Data Mining: Keywords
9 pages
On The Art of Building in Ten Books
0% (1)
On The Art of Building in Ten Books
5 pages
4G-4G Traffic Sharing For LTC (Low Throughput Cells) Improvement in Huawei LTE
No ratings yet
4G-4G Traffic Sharing For LTC (Low Throughput Cells) Improvement in Huawei LTE
3 pages
Emphasis On Sustainable & Regenerative Farming Methods
No ratings yet
Emphasis On Sustainable & Regenerative Farming Methods
48 pages
Hang Thỏ Word Formation B2 C1 Chapter 1
No ratings yet
Hang Thỏ Word Formation B2 C1 Chapter 1
4 pages
AI Based EV Charging Load Estimation
No ratings yet
AI Based EV Charging Load Estimation
30 pages
Drug Study Potassium Citrate
No ratings yet
Drug Study Potassium Citrate
3 pages
SIROLL ALU en
No ratings yet
SIROLL ALU en
28 pages
GAs - BOUNDARY WALL-S77 BOUNDARY WALLS
100% (1)
GAs - BOUNDARY WALL-S77 BOUNDARY WALLS
1 page
Introduction To Chemotaxis Chemotaxis Describes How Bacteria and Cellular Organisms
No ratings yet
Introduction To Chemotaxis Chemotaxis Describes How Bacteria and Cellular Organisms
2 pages
Voolenvine FavoriteSocks 2020 Final PDF
No ratings yet
Voolenvine FavoriteSocks 2020 Final PDF
6 pages
BOM Prod Analysis
No ratings yet
BOM Prod Analysis
3 pages
WinGD - TIN036 1 - Update On Dual Fuel Methanol Engine Development
No ratings yet
WinGD - TIN036 1 - Update On Dual Fuel Methanol Engine Development
2 pages
Domestic Cat
No ratings yet
Domestic Cat
11 pages
2024 Acuvue Price List
No ratings yet
2024 Acuvue Price List
2 pages
Useful WordsFor Creative Writing
No ratings yet
Useful WordsFor Creative Writing
22 pages
Mnemonics Psychiatric Diagnosis.
No ratings yet
Mnemonics Psychiatric Diagnosis.
7 pages
Elementary GW 07b
No ratings yet
Elementary GW 07b
2 pages
Nphoton 2012 31 PDF
No ratings yet
Nphoton 2012 31 PDF
6 pages
BT Inter Phone User Manual
100% (1)
BT Inter Phone User Manual
15 pages
Memory of The World (SCIFICT - REPORT)
No ratings yet
Memory of The World (SCIFICT - REPORT)
21 pages
Minor Project File
No ratings yet
Minor Project File
29 pages
Brosur CCTV ZiFMachines
No ratings yet
Brosur CCTV ZiFMachines
3 pages
Emergency Light
No ratings yet
Emergency Light
2 pages
Molykote Products in India - Email Sales@
No ratings yet
Molykote Products in India - Email Sales@
2 pages
Highway-Rail Grade Crossing Identification and Prioritizing Model Development
From Everand
Highway-Rail Grade Crossing Identification and Prioritizing Model Development
Maxim A. Dulebenets
No ratings yet

