Applying Machine Learning Approaches to Analyze the Vulnerable Road-Users' Crashes at Statewide Traffic Analysis Zones

Journal of Safety Research xxx (2019) xxx
Md Sharikur Rahman ⁎, Mohamed Abdel-Aty, Samiul Hasan, Qing Cai
Department of Civil, Environmental and Construction Engineering, University of Central Florida, Orlando, FL 32816, USA
Article info

Article history: Received 8 June 2018; Received in revised form 30 March 2019; Accepted 16 April 2019; Available online xxxx

Keywords: Machine learning; Macro-level; Decision tree regression; Statewide traffic analysis zone; Ensemble technique

Abstract

Introduction: In this paper, we present machine learning techniques to analyze pedestrian and bicycle crashes by developing macro-level crash prediction models. Methods: We collected the 2010–2012 Statewide Traffic Analysis Zone (STAZ) level crash data and developed a rigorous machine learning approach (i.e., decision tree regression (DTR) models) for both pedestrian and bicycle crash counts. To our knowledge, this is the first application of DTR models in the burgeoning macro-level traffic safety literature. Results: The DTR models uncovered the most significant predictor variables for both response variables (pedestrian and bicycle crash counts) in terms of three broad categories: traffic, roadway, and socio-demographic characteristics. Additionally, spatial predictor variables of neighboring STAZs were considered along with those of the targeted STAZ in both DTR models. The DTR model with spatial predictor variables (spatial DTR model) was compared with the model without spatial predictor variables (aspatial DTR model), and the comparison showed that the spatial DTR model achieved better prediction accuracy than the aspatial DTR model. Finally, the current research effort contributed to the safety literature by applying some ensemble techniques (i.e., bagging, random forest, and gradient boosting) in order to improve the prediction accuracy of the DTR models (weak learner) for macro-level crash counts. The study revealed that all the ensemble techniques performed slightly better than the DTR model and that the gradient boosting technique outperformed the other competing ensemble techniques in macro-level crash prediction models.
© 2019 National Safety Council and Elsevier Ltd. All rights reserved.
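As a rough illustration of the approach summarized in the abstract, the sketch below fits a one-split regression tree by choosing the cut that gives the largest reduction in deviance (sum of squared errors), and then averages many such trees fitted to bootstrap resamples, which is the idea behind bagging. It is a minimal sketch under stated assumptions, not the authors' models: the toy data, the single predictor named exposure, and the helper names deviance, best_split, fit_stump, and fit_bagged are all hypothetical.

```python
import random
from statistics import mean


def deviance(ys):
    """Sum of squared deviations from the node mean (the SSE of a node)."""
    if not ys:
        return 0.0
    mu = mean(ys)
    return sum((y - mu) ** 2 for y in ys)


def best_split(xs, ys):
    """Threshold on one predictor that maximises D_a - D_b - D_c."""
    d_a = deviance(ys)
    best_t, best_gain = None, 0.0
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate cut halfway between neighbouring values
        d_b = deviance([y for x, y in zip(xs, ys) if x <= t])
        d_c = deviance([y for x, y in zip(xs, ys) if x > t])
        gain = d_a - d_b - d_c
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


def fit_stump(xs, ys):
    """One-split regression tree: predicts the mean of the matching side."""
    t, _ = best_split(xs, ys)
    if t is None:  # no split reduces the deviance: predict the overall mean
        overall = mean(ys)
        return lambda x: overall
    left = mean(y for x, y in zip(xs, ys) if x <= t)
    right = mean(y for x, y in zip(xs, ys) if x > t)
    return lambda x: left if x <= t else right


def fit_bagged(xs, ys, n_trees=25, seed=0):
    """Average of stumps, each fitted to a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(s(x) for s in stumps)


# Hypothetical toy data: one predictor per zone and a zero-heavy crash count.
exposure = [1, 2, 3, 4, 10, 11, 12, 13]
crashes = [0, 0, 1, 0, 5, 6, 4, 5]

threshold, gain = best_split(exposure, crashes)
single_tree = fit_stump(exposure, crashes)
bagged = fit_bagged(exposure, crashes)
```

On this toy data the single tree cuts between the low-count and high-count zones. The bagged predictor averages trees whose cut points vary with the resample, which is what smooths out the instability of an individual tree.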
⁎ Corresponding author.
E-mail address: [email protected] (M.S. Rahman).
https://doi.org/10.1016/j.jsr.2019.04.008

1. Introduction

The most active forms of transportation are walking and bicycling, which have the lowest impact on the environment and improve the physical health of pedestrians and bicyclists. Transportation agencies are increasingly promoting walking and bicycling for short-distance trips to mitigate climate change and obesity among adults. However, the most common obstacle to choosing walking and bicycling is concern about traffic safety. According to the latest traffic safety data from the National Highway Traffic Safety Administration (NHTSA), pedestrian and bicycle deaths increased by 9.0% and 1.3%, respectively, in 2016 compared to calendar year 2015 (NHTSA, 2017a). Thus, the safety challenges associated with pedestrians and bicyclists remain an important concern for transportation policy. The safety risk posed to active transportation users in Florida is exacerbated compared to active transportation users in the US. In 2015, while the national average for pedestrian and bicyclist fatalities per 100,000 population was 1.67 and 2.50, respectively, the corresponding number for the state of Florida was 3.10 (ranked second among all states) and 7.40 (ranked first among all states), which clearly presents the challenge faced in Florida (NHTSA, 2015; NHTSA, 2017b). Crash prediction models applied to pedestrian and bicycle crashes would give a transportation planner valuable insights into the contributing factors related to pedestrians' and bicyclists' crashes, which might be helpful for policy implications at a planning level.

In transportation safety research, crash prediction models are developed for two levels: (1) micro-level and (2) macro-level. The former focuses on crashes at a segment or intersection to identify the influence of contributing factors with the objective of offering engineering solutions. The latter, on the other hand, considers crashes aggregated over a spatial unit such as a traffic analysis zone, census block, census tract, or county to quantify the significant factors at a macro-level so that countermeasures can be provided from a planning perspective. Statistical models, such as Poisson and negative binomial regression, have been employed to analyze both micro- and macro-level crashes for many years. However, statistical models have their own model-specific assumptions, which can lead to inaccurate results of injury likelihood (Chang & Chen, 2005; Rahman, 2018; Rahman, Abdel-Aty, Hasan, & Cai, 2019; Saad, Abdel-aty, & Lee, 2019; Saad, Abdel-aty, Lee, & Cai, 2019; Saad, Abdel-aty, Lee, & Wang, 2018a; Yuan & Abdel-Aty, 2018). In this regard, this study contributes to the safety literature by undertaking pedestrian
and bicycle crash prediction modeling using the most widely applied machine learning technique: decision tree regression (DTR). To the best of our knowledge, none of the studies have explored machine learning techniques in analyzing pedestrian and bicycle crashes at the macro-level. In this regard, three broad categories of predictor variables, including traffic, roadway, and socio-demographic characteristics, are considered in the DTR model development and validation. In addition, the attributes of the neighboring zones are considered as predictor variables along with the targeted STAZ's attributes in the DTR models to improve the prediction accuracy of pedestrian and bicycle crashes. Furthermore, the current study has undertaken some ensemble techniques (i.e., bagging, random forest, and gradient boosting) to improve the prediction accuracy of the DTR models, considered as weak learners, which provides valuable insights on advancing crash prediction modeling techniques for macro-level crash analysis.

2. Literature review

Road traffic accidents are widely recognized as a national health problem that affects society both emotionally and economically (Blincoe, Seay, & Zaloshnja, 2000; NHTSA, 2005). A considerable number of research efforts have examined crash frequency estimation (vehicle, pedestrian, and bicycle) (see Lord & Mannering, 2010, for a detailed review). These studies have been conducted for different modes (vehicle, including automobiles and motorbikes, pedestrian, and bicycle) and for different scales: micro-level (such as intersection and segment) and macro-level (such as census tract, traffic analysis zone (TAZ), county). An exhaustive review of micro-level (see Eluru, Bhat, and Hensher (2008), Lord, Washington, and Ivan (2005), Nashad, Yasmin, Eluru, Lee, and Abdel-Aty (2016) for detailed micro-level literature review) and macro-level (see Cai, Abdel-Aty, Lee, and Eluru (2017), Cai, Lee, Eluru, and Abdel-Aty (2016), Lee, Yasmin, Eluru, Abdel-Aty, and Cai (2018), Wang, Yuan, Schultz, and Fang (2018) for detailed macro-level literature review) crash frequency studies is beyond the scope of this paper. These studies have focused heavily on econometric statistical modeling approaches for predicting traffic crashes and exploring significant contributing factors related to crash occurrence. However, statistical models can lead to inaccurate estimations of injury likelihood if the prespecified model assumptions and the underlying relationship between dependent and independent variables of these models are invalid (Chang & Chen, 2005).

Moreover, the presence of a large number of zeros in pedestrian and bicycle crashes is one of the major methodological challenges in statistical modeling of the contributing factors related to pedestrian and bicyclist crashes. In crash count models, the presence of excess zeros may result from two underlying processes or states of crash frequency likelihoods: a crash-free state (or zero crash state) and a crash state (see Mannering, Shankar, and Bhat (2016) for more explanation). In the presence of such a dual state, application of a single-state model may result in biased and inconsistent parameter estimates. In a statistical framework, a potential relaxation of the single-state count model for addressing the issue of excess zeros is the zero-inflated (ZI) model (Shankar, Milton, & Mannering, 1997). However, several research studies have criticized the application of dual-state ZI models for traffic safety analysis (Lord et al., 2005; Lord, Washington, & Ivan, 2007; Son, Kweon, & Park, 2011). A ZI model assumes that two types of zeros exist, i.e., sampling zeros and structural zeros. For traffic safety, the structural zeros correspond to inherently safe conditions implying zero crashes by nature, and the sampling zeros correspond to potential crash conditions implying zero crashes only by chance (Lord et al., 2005; Lord et al., 2007). Hence, the statistical assumption of structural zeros is unrealistic, as a traffic crash could occur under any conditions.

Recently, machine learning and/or data mining techniques have become popular in transportation safety research to determine the factors associated with traffic crashes. Unlike statistical models, machine learning techniques are non-parametric methods which do not require any pre-defined underlying relationships between the target variable and predictors (Gong, Abdel-Aty, Cai, & Rahman, 2019; Tavakoli Kashani, Rabieyan, & Besharati, 2014). Among the machine learning techniques, the decision tree model, which can identify and easily explain the complex patterns associated with crash risk, has gained much popularity in the transportation safety literature (Chang & Chen, 2005; Chang & Chien, 2013; Chang & Wang, 2006; Pande, Abdel-Aty, & Das, 2010). To overcome the shortcomings of statistical modeling, a decision tree can be a preferred alternative for forecasting traffic crashes with reasonable interpretations. Unlike statistical models, decision trees do not need any predefined model assumption or underlying relationship between dependent and independent variables. They deal well with multicollinear independent variables and treat discrete variables with more than two levels satisfactorily (Karlaftis & Golias, 2002; Washington & Wolf, 1997). Moreover, decision tree models can help in deciding how to subdivide heavily skewed target variables (i.e., zero crash counts) into ranges, while statistical modeling has some limitations in dealing with heavily skewed data (Song & Lu, 2015). Therefore, decision tree models might be a preferred option for analyzing a heavily skewed response variable, which is most common in pedestrian and bicycle crashes. A summary of earlier studies employing decision tree models in the traffic safety literature is presented in Table 1 (Abdel-Aty, Keller, & Brady, 2005; Chang & Chen, 2005; Chang & Chien, 2013; Chang & Wang, 2006; De Oña, López, & Abellán, 2013; Eustace, Alqahtani, & Hovey, 2018; Iragavarapu, Lord, & Fitzpatrick, 2015; Karlaftis & Golias, 2002; Kashani & Mohaymany, 2011; Montella, Aria, D'Ambrosio, & Mauriello, 2012; Pande et al., 2010; Tavakoli Kashani et al., 2014; Wah, Nasaruddin, Voon, & Lazim, 2012; Zheng, Lu, & Denver, 2016). The information provided in the table includes the study unit considered, the methodological approach employed, and the target variables analyzed in the decision tree framework. The following observations can be inferred from the table. It is evident that all the existing decision tree-based safety studies were conducted at a micro-level, such as roadway segments and intersections. To the best of our knowledge, none of the studies have explored decision tree methods to build a crash prediction model at the macro-level. It is also noticed that most of the model structures employed in developing decision trees are classification trees, except for two studies (Abdel-Aty et al., 2005; Karlaftis & Golias, 2002) which conducted hierarchical tree-based regression for developing micro-level crash prediction models. Within the decision tree structure, those studies did not explore the total number of pedestrian and bicycle crashes; they predominantly analyzed crash frequency by severity level or other attribute levels.

With the different modeling techniques, vulnerable road users' (pedestrian and bicycle) crashes have been investigated separately. Different types of contributory factors were identified in previous studies. Table 2 provides a summary of contributing factors related to vulnerable road users' (non-motorist) crashes, covering macro-level (Cai, Abdel-Aty, & Lee, 2017; Cai et al., 2016; Lee, Abdel-Aty, Choi, & Huang, 2015; Nashad et al., 2016; Ukkusuri, Hasan, & Aziz, 2011; Zhang, Bigham, Ragland, & Chen, 2015) and micro-level (Abdel-Aty et al., 2005; Abdel-Aty, Chundi, & Lee, 2007; Eluru et al., 2008; Lee & Abdel-Aty, 2005; Pitt, Guyer, Hsieh, & Malek, 1990; Prati, Pietrantoni, & Fraboni, 2017; Roudsari et al., 2004) studies separately. Some general observations can be made from Table 2. In macro-level analysis, most of the studies found that the length of sidewalks, total employment, number of public transit commuters, number of commuters by walk, number of commuters by bike, vehicle miles traveled, population density, number of rail and bus stations, number of hotels, motels, and guest houses, number of schools, proportion of uneducated people, and number of signalized intersections had a positive effect on (i.e., increased) the vulnerable road users' crash frequency. However, the median household income, proportion of heavy vehicles, proportion of high-speed (55 mph or higher) roads, average geodesic distance, betweenness centrality, and clustering coefficients reduced the likelihood of pedestrian and bicyclist
Table 1
Summary of previous traffic safety studies using decision tree and ensemble techniques.

Area of interest | Studies | Study unit (Scale) | Methodology | Target variables analyzed
Decision Tree | Kashani et al. (2014) | Roadway segment (Micro) | Classification Tree | Injury severity level – injury, fatality
Decision Tree | Zheng et al. (2016) | Highway-rail grade crossings (Micro) | Classification Tree | Highway-rail grade crossings crash
Decision Tree | Kashani et al. (2011) | Two-lane, two-way rural roads segments (Micro) | Classification Tree | Injury severity level – light injury, serious injury, fatality
Decision Tree | Iragavarapu et al. (2015) | Road segments-pedestrian crash (Micro) | Classification Tree | Injury severity level – fatal or non-fatal
Decision Tree | Chang and Chen (2005) | National Freeway (Micro) | Classification Tree | Injury Severity level (0–4, 4 representing 4 or more crashes)
Decision Tree | Wah et al. (2012) | Roadway segments (Micro) | Classification Tree | Category of frequencies of motorcycle accidents – Zero frequency (0), Low frequency (1–19), High frequency (20 and above)
Decision Tree | Chang and Wang (2006) | Roadway segments (Micro) | Classification Tree | Injury severity level – fatality, injury, no-injury
Decision Tree | Pande et al. (2010) | Roadway segments (Micro) | Classification Tree | Binary variable – Crash vs Non-crash
Decision Tree | Chang and Chien (2013) | National freeways (Micro) | Classification Tree | Injury severity level – fatality, injury, no-injury
Decision Tree | De Oña et al. (2013) | Road Segments–Rural highways (Micro) | Classification Tree | Accident severity – slightly injured, killed or seriously injured (KSI) (state B)
Decision Tree | Montella et al. (2012) | Roadway segments–Powered two-wheeler crashes (Micro) | Classification Tree | Several response variables – severity, crash type, involved vehicles, alignment
Decision Tree | Eustace et al. (2018) | Road segments (Micro) | Classification Tree | Injury severity level – fatal/injury, and property damage only
Decision Tree | Abdel-Aty et al. (2005) | Road segments (Micro) | Regression Tree | Total crash, angle crash, left turn crash, head on crash, pedestrian crash, rear-end crash, right turn crash, sideswipe crash
Decision Tree | Karlaftis and Golias (2002) | Road segments (Micro) | Regression Tree | Total number of crashes
Ensemble Techniques | Sohn et al. (2002) | Road segments (Micro) | Arcing and bagging | Injury severity level – bodily injury and property damage
crashes. Table 2 also provides summary findings from earlier studies regarding the contributing factors related to non-motorist crash risks in micro-level analysis. Overall, studies analyzing non-motorized crash injury severity indicate that non-motorists who are male, intoxicated, and very young or elderly are more prone to severe injuries, as are pedestrians struck by an alcohol-intoxicated driver, by non-sedan vehicles (SUVs, pick-up vans), and by high-speed vehicles. Moreover, vulnerable road users' injury occurred in crashes at school zone
Table 2
Summary of contributory factors for vulnerable road users' crashes.

Study unit: Macro
Studies | Positive effects | Negative effects
Cai, Abdel-Aty, and Lee (2017) | Proportion of length of local roads, signalized intersection density, and length of sidewalks, pedestrian and bicycle commuters, and population equal to or older than 65 years old | Median household income, and proportion of heavy vehicle mileage
Cai et al. (2016) | Length of sidewalks, number of total employments, public transportation commuters, walk commuters, bike commuters, population density, and signalized intersection density | Proportion of heavy vehicle mileage in VMT, distance to nearest urban area
Nashad et al. (2016) | Vehicle miles travel, total population, public transit commuters, bike commuters, walk commuters, school enrolment density, length of sidewalk, proportion of urban roads | Proportion of heavy vehicles, distance to nearest urban area, proportion of industry employments
Zhang et al. (2015) | Number of commercial properties, number of bus lines, number of 4-way intersections, number of housing units, vehicle miles travel | Average geodesic distance, betweenness centrality, and clustering coefficients
Lee et al. (2015) | Total population, number of rail and bus stations, number of hotels, motels, and guest houses per square mile, and number of schools per square mile | Median household income, proportion of high-speed roads (55 mph or higher), proportion of people working at home
Ukkusuri et al. (2011) | Proportion of African-American population, industrial land use proportion of total land use, total number of signalized intersections, number of bus stops, and proportion of uneducated people | Median age population, number of three approach intersections, proportion of local road

Study unit: Micro
Studies | Contributory factors
Prati et al. (2017) | Road type, age of cyclist, gender of cyclist, and the type of opponent vehicle
Eluru et al. (2008) | Age of the individual, the speed limit on the roadway, location of crashes, and time-of-day
Abdel-Aty et al. (2007) | Driver's age, gender, and alcohol use, pedestrian's/bicyclist's age, number of lanes, median type, and speed limits
Abdel-Aty et al. (2005) | Right turn channelized on major roads, exclusive left turn lanes on minor roads, daily traffic volume on major roads, speed limits on minor roads, total left turn lanes of minor road
Lee et al. (2015) | Higher traffic volume, drivers' age, drivers' sex, vehicle type, traffic control devices, locations, and lighting conditions
Roudsari et al. (2004) | Drivers' age, vehicle class, and speed limits on the roadway
Pitt et al. (1990) | Drivers' sex, gender, vehicle characteristics, speed of the roadway, and the time of the day
locations, on higher speed-limit roads, on two-way roads with median, and in residential and rural areas increase injury severity. Pedestrian–motor vehicle crashes occurring during the night time and in adverse weather conditions increase the likelihood of being fatally injured, as do frontal collisions. However, Abdel-Aty et al. (2005) analyzed contributing factors related to pedestrian and bicycle crash frequency at the micro-level using decision tree regression models at signalized intersections. The authors conclude that right turn channelization and daily traffic volume on major roads, and exclusive left turn lanes, speed limits, and total left turn lanes of minor roads are the most important contributing factors for predicting vulnerable road users' crash occurrence.

One of the basic assumptions of most modeling techniques is that observations are independent of each other. Nevertheless, this assumption is often violated in traffic data because of possible correlation among observations. For instance, some observations that are from the same spatial unit may share common unobserved factors. In macro-level analysis, crashes occurring in a spatial unit are aggregated to obtain the crash frequency. However, this aggregation process might introduce errors in identifying the predictor variables for the spatial unit. For example, a crash occurring close to the boundary of the unit might be more strongly related to the neighboring zone than to the actual zone where the crash occurred. A considerable amount of research has been undertaken to accommodate such spatial-unit-induced bias (Huang, Abdel-Aty, & Darwiche, 2010; Lee et al., 2015; Siddiqui, Abdel-Aty, & Choi, 2012). The most recent study proposed considering exogenous variables from neighboring zones to account for spatial dependency, which was called the spatial spillover model (Cai et al., 2016). That research effort revealed that models with spatial exogenous variables significantly outperformed the models that did not consider them. In our analysis, we introduce spatial predictor (exogenous) variables from neighboring zones to improve the prediction accuracy. Apart from statistical and data mining methods, simulation techniques can identify the significant contributing factors related to crash occurrence (Cai, Saad, Abdel-aty, & Yuan, 2018; Ekram & Rahman, 2018; Rahman & Abdel-aty, 2018; Rahman, Abdel-Aty, Lee, & Rahman, 2019; Rahman, Abdel-aty, Lee, & Rahman, 2019; Rahman, Abdel-aty, Lee, & Rahman, 2019b; Rahman, Abdel-Aty, Wang, & Lee, 2018; Saad, Abdel-Aty, Lee, & Wang, 2018b; Saad, Abdel-aty, Lee, & Wang, 2019; Wu, Abdel-Aty, Wang, & Rahman, 2019).

However, decision trees can be unstable because small variations in the data might result in a completely different tree being generated. This would result in a good prediction for the majority class, but a relatively poor prediction for the minority class. This problem can be mitigated by using decision trees within an ensemble (Mounce et al., 2017). In machine learning, ensemble methods are used to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. A data ensemble combines various results obtained from a single classifier fitted repeatedly based on bootstrap resamples. The advantage of an ensemble lies in the possibility that the difference in results caused by the variance of the input data may be reduced by combining each classifier's output. To the best of the authors' knowledge, none of the studies have implemented ensemble techniques in the transportation safety field to improve the prediction accuracy, except for Sohn et al. (2003), which employed arcing and bagging as ensemble techniques (Table 1). The result suggests that ensemble algorithms such as bagging and arcing improved the prediction accuracy of traffic crashes compared to an individual decision tree classifier.

In summary, the current study contributes to non-motorized macro-level crash analysis along three directions: (1) evaluating regression tree models for both pedestrian and bicycle crashes, (2) considering spatial predictor variables in crash prediction models, and (3) introducing ensemble techniques (i.e., bagging, random forests, and gradient boosting) to improve the prediction accuracy of macro-level crash analysis.

3. Methodologies

There are two types of decision tree-based methods: classification trees and regression trees. The former is designed to partition data based on the discrete nature of categorical target variables, while the latter partitions (regresses) data on the basis of continuous response data. The target variables in this study are pedestrian and bicycle crashes in each STAZ. Hence, this paper focuses on the latter method, the regression tree, and on some ensemble techniques applied to improve the forecasting accuracy.

3.1. Regression tree framework

A regression tree refers to a set of rules for dividing a large collection of observations into smaller homogeneous groups based on the predictor (independent) variables with respect to a continuous target (dependent) variable. The methods used to estimate regression trees have been around since the early 1960s and are sometimes referred to as classification and regression trees (CART) (Breiman, Friedman, Olshen, & Stone, 1998). Generally, there are two key questions in the development of a regression tree: (1) which of the predictor variables offered to the model should be selected to produce the maximum reduction in variability of the response (target) variable, and (2) which value of the selected predictor variable (discrete or continuous) results in the maximum reduction in variability of the response variable. A numerical search procedure iterates these two steps until all the observations are partitioned into smaller homogeneous groups (Washington, 2000). Tree partitioning is an essential step in developing a decision tree. Recursive partitioning methods have become popular and widely used tools for nonparametric regression trees in many scientific fields. Recursive partitioning creates a decision tree that strives to correctly classify members of the population by splitting it into sub-populations based on several predictor variables. The process is termed recursive because each sub-population may in turn be split an indefinite number of times until the splitting process terminates after a particular stopping criterion is reached (James, Witten, Hastie, & Tibshirani, 2013).

In this paper, the focus of the regression tree model is to predict the total number of crashes. Let us assume that the response variable, $Y_n$ (total number of crashes), is a column vector of n observations, and $X_{n,p}$ is a matrix of (p-1) random predictor variables measured for n cases. For modeling the regression tree, the deviance D, or sum of squared errors (SSE), is defined as follows:

$$ \mathrm{SSE} = D = \sum_{l=1}^{L} (Y_l - \mu)^2 \tag{1} $$

$$ \mu = \frac{1}{L} \sum_{l=1}^{L} Y_l = \text{arithmetic mean of } Y \tag{2} $$

where
D = total deviance of Y, or the sum of squared errors (SSE);
$Y_l$ = lth observation in column vector Y; and
L = sample size over which D is calculated (L = n for total sample).

The observations in Y are partitioned based on a predictor variable $X_1$ (the variable that results in the maximum reduction in variability of the response variable), which results in two subsamples, say samples b and c, containing M and N of the original L observations, respectively (M + N = L). If the overall sample deviance is $D_a$, then the deviance reduction function is

$$ \Delta = D_a - D_b - D_c \tag{3} $$

where $\Delta$ is the deviance reduction when sample a is partitioned on $X_1$
to obtain subsamples b and c,

$$ D_a = \sum_{l=1}^{L} \left( Y_{(a)l} - \mu_{(a)} \right)^2 = \text{total deviance in sample (node) } a \tag{4} $$

$$ D_b = \sum_{m=1}^{M} \left( Y_{(b)m} - \mu_{(b)} \right)^2 = \text{total deviance in sample (node) } b \tag{5} $$

$$ D_c = \sum_{n=1}^{N} \left( Y_{(c)n} - \mu_{(c)} \right)^2 = \text{total deviance in sample (node) } c \tag{6} $$

$$ \mu_{(b)} = \frac{1}{M} \sum_{m=1}^{M} Y_{(b)m} = \text{arithmetic mean of subsample (node) } b \tag{7} $$

$$ \mu_{(c)} = \frac{1}{N} \sum_{n=1}^{N} Y_{(c)n} = \text{arithmetic mean of subsample (node) } c \tag{8} $$

It is worth mentioning that M is the sample size of subsample (node) b, and N is the sample size of subsample (node) c. In a regression tree, a predictor variable $X_i$ taken from $X_{n,p}$ is sought to partition the column vector Y such that the deviance reduction function shown in Eq. (9) is maximized.

$$ \Delta = \sum_{l=1}^{L} \left( Y_{(a)l} - \mu_{(a)} \right)^2 - \sum_{m=1}^{M} \left( Y_{(b)m} - \mu_{(b)} \right)^2 - \sum_{n=1}^{N} \left( Y_{(c)n} - \mu_{(c)} \right)^2 \tag{9} $$

While searching the matrix $X_{n,p}$, two items must be sought to maximize Eq. (9): the variable $X_i$ and the numerical value on which the corresponding partition of Y will produce the maximum deviance reduction. When this maximal partition is found, the original data in node a are partitioned into two subsamples b and c having minimal combined deviance compared with all possible subsamples. Thus, the reduction in node a deviance is greatest when the deviances at nodes b and c are smallest. As mentioned earlier, numerical search procedures are used to maximize Eq. (9).

In addition to developing the decision tree regression, the list of variables that entered the model can be investigated based on variable importance. To determine each variable's importance, the improvement in the reduction of deviance that can be attributed to each variable for the splits in the tree is rated (Abdel-Aty et al., 2005; Karlaftis & Golias, 2002). To calculate a variable importance score, decision tree regression looks at the improvement measure attributable to each variable in its role as either a primary or a surrogate splitter. The values of all these improvements are summed over each node and totaled, and are then scaled relative to the best performing variable. The variable with the highest sum of improvements is scored 100, and all other variables will have lower scores ranging downwards towards zero. A variable can obtain an importance score of zero in decision tree regression only if it never appears as either a primary or a surrogate splitter. Because such a variable plays no role anywhere in the tree, eliminating it from the dataset should make no difference to the results.

In a regression tree, tree growth will continue until there are homogeneous observations in each terminal node. At first, the regression tree produces the maximal tree with a complex structure that overfits the training data. The maximal tree produces good prediction accuracy on the training data but worse prediction accuracy on the testing sample. In other words, a complex tree overfits the training observations, which results in overstated confidence in predictions and the inclusion of insignificant predictor variables. The most common method used to reduce the overfitting problem is called pruning. This method uses criteria about model complexity to trim the full tree model to a smaller and more manageable or practical tree size, which reduces overfitting significantly (Washington, 2000; Washington & Wolf, 1997). Pruning is performed according to the cost-complexity algorithm. The principle behind pruning is to remove the branches that add little to the predictive value of the tree. The pruning process starts with the maximal tree and selectively prunes upward to produce a sequence of sub-trees of the maximal tree, eventually collapsing to the tree consisting of the root node. The pruning process relies on a complexity parameter which is defined through a cost function of misclassification of the data and the tree size (Kashani & Mohaymany, 2011). For each tree created, the "misclassification error rate" or "misclassification cost", or in other words, the "goodness of fit" index, is calculated as Eq. (10):

$$ \text{Misclassification error rate} = \sum_{m=1}^{M} p(m) \left[ 1 - \sum_{j=1}^{J} p^{2}(j \mid m) \right] \tag{10} $$

where p(m) is the proportion of existing observations in the terminal node or leaf m (from all observations) and M is the number of terminal nodes.

The last step of building a regression tree is to select an optimal tree from the pruned trees. The principle behind selecting the optimal tree is
Fig. 1. Misclassification error rates for both training and testing data.
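The split search behind Eq. (9) can be illustrated with a short sketch. It is not the authors' implementation: the function names `sse` and `best_split` and the toy data are assumptions made for illustration, and only a single predictor is searched. Node a holds all observations, node b receives those below the candidate cut point, and node c receives the rest.

```python
def sse(values):
    """Node deviance: sum of squared deviations from the node mean."""
    if not values:
        return 0.0
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, delta) that maximizes the deviance reduction of Eq. (9).

    Hypothetical helper: searches one predictor x against the response y.
    """
    parent = sse(y)  # deviance of node a
    best_threshold, best_delta = None, 0.0
    for threshold in sorted(set(x))[1:]:  # candidate cut points
        node_b = [yi for xi, yi in zip(x, y) if xi < threshold]
        node_c = [yi for xi, yi in zip(x, y) if xi >= threshold]
        delta = parent - sse(node_b) - sse(node_c)
        if delta > best_delta:
            best_threshold, best_delta = threshold, delta
    return best_threshold, best_delta


# Toy data (illustrative only): the response jumps once x reaches 50.
x = [10, 20, 30, 50, 60, 70]
y = [1, 1, 1, 5, 5, 5]
threshold, delta = best_split(x, y)
```

In a full tree the same search would be repeated over every predictor and then recursively inside nodes b and c.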
to find a tree with respect to a measure of misclassification cost on the testing dataset so that the information in the learning dataset will not overfit. Towards this end, the data is usually divided into two subsets, one for learning and the other for testing. The learning sample is used to split nodes, while the testing sample is used to compare the misclassification cost of all the subtrees. As the tree grows larger and larger, the misclassification cost for the learning sample decreases monotonically, indicating that the maximal tree always gives the best fit to the learning data. On the other hand, the misclassification cost for the testing sample first decreases and then increases after reaching a minimum. This indicates that the saturated tree is greatly overfitted when applied to the testing sample. Therefore, the optimal tree is determined when the misclassification costs reach a minimum for both the learning and testing samples (see Breiman et al., 1998, for a detailed review). Fig. 1 shows how an optimal tree is selected from the decision trees created. From the figure, with an increase in complexity (more terminal nodes), the misclassification cost for the training data keeps decreasing. However, for the test data, there is first a decrease and then an increase. An optimal tree is the one that has the least misclassification cost for the test data.

3.2. Ensemble techniques

An ensemble technique is defined by a set of individually trained classifiers whose predictions are combined in order to improve on the prediction accuracy of a single classifier (i.e., a regression tree). The prediction of an ensemble technique typically requires more computation than that of a single learner, so ensemble techniques can be seen as compensating for poor learning algorithms by performing a lot of extra computation. In this paper, we have undertaken bagging, random forests, and boosting as methods for creating three ensemble techniques of regression trees to construct more powerful prediction models.
The basic idea underlying bagging is to reduce the variance of the decision tree by creating several subsets of data from the training sample with replacement and building the final output by averaging all the predictions. To be more specific, if several similar datasets are created by resampling with replacement, which is called bootstrapping, and a number of regression trees are grown without pruning and averaged, the variance component of the output error is reduced. Mathematically, it is possible to calculate \hat{f}^{1}(x), \hat{f}^{2}(x), …, \hat{f}^{B}(x) using B separate training sets, and to average them in order to obtain a single low-variance statistical learning model, given by Eq. (11):

\[
\hat{f}_{avg}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{b}(x) \qquad (11)
\]

However, this is not practical because multiple training sets are generally not available. Instead, one can bootstrap by taking repeated samples from the single training dataset (James et al., 2013). This generates B different bootstrapped training datasets. The model is trained on the bth bootstrapped training set in order to get \hat{f}^{1}(x), \hat{f}^{2}(x), …, \hat{f}^{B}(x), and finally all the predictions are averaged (see Eq. (12)):

\[
\hat{f}_{bag}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{b}(x) \qquad (12)
\]

This empirical formulation is called bagging.
Random forest is similar to bagging in that bootstrap samples are drawn to construct multiple trees. The main difference from bagging is that random forest performs one extra step: a random selection of predictor variables is used to grow the trees rather than all variables. The number of predictors used to find the best split at each
Table 3
Sample characteristics of the road accidents attributes.
Variables name Definition Targeted TAZs Neighboring TAZs
Mean S.D. Maxa Mean S.D. Maxa
Crash variables
Pedestrian crash Total number of pedestrian crashes per STAZ 1.907 3.315 39.000 – – –
Bicycle crash Total number of bicycle crashes per STAZ 1.797 3.309 88.000 – – –
Traffic & roadway variables

VMT Total vehicle miles travel in the STAZ 31,381.0 41,852.3 684,742.8 195,519.7 169,120.3 2,103,376.3
Proportion of heavy vehicle in VMT Total heavy vehicle VMT in STAZ /Total vehicles VMT in STAZ 0.067 0.052 0.519 0.070 0.045 0.350
Proportion of length of arterial roads Total length of arterial road/Total road length in the STAZ 0.221 0.275 1.000 0.144 0.125 1.000
Proportion of length of collectors Total length of collector road/ Total road length in the STAZ 0.191 0.246 1.000 0.156 0.136 1.000
Proportion of length local roads Total length of local road/Total road length in the STAZ 0.572 0.329 1.000 0.680 0.200 1.000
Signalized intersection density Number of intersection per mile in each STAZ 0.227 0.578 8.756 0.378 5.552 495.032
Length of bike lanes Total length of bike lanes in each STAZ 0.303 1.096 28.637 1.909 3.847 38.901
Length of sidewalks Total length of sidewalk in each STAZ 0.993 1.750 25.683 6.304 6.745 77.720
Socio-demographic variables
Population density Population density per square mile 2520.3 4043.3 63,069.0 2330.2 3489.7 57,181.9
Proportion of families without vehicle Total number of families with no vehicle in STAZ/Total number of families in STAZ 0.095 0.123 1.000 0.095 0.108 1.000
School enrolments density Total school enrolment per square miles in STAZ 775.02 5983.05 255,147.24 684.22 2900.54 102,285.73
Proportion of urban area Total urban area in STAZ/Total area in STAZ 0.722 0.430 1.000 0.650 0.434 1.000
Distance to the nearest urban area Distance of the STAZ to the nearest urban area 2.140 5.441 44.101 – – –
Hotels, motels, and timeshare rooms density Hotels, motels, and timeshare rooms density per square mile 172.49 941.71 32,609.84 121.678 528.078 11,397.148
No of total employment Total employment in STAZ 1140.10 1722.45 31,932.15 6917.245 6725.135 76,533.000
Proportion of industry employment Proportion of industry employment 0.176 0.232 1.000 0.183 0.177 1.000
Proportion of commercial employment Proportion of commercial employment 0.299 0.235 1.000 0.305 0.177 1.000
Proportion of service employment Proportion of service employment 0.525 0.257 1.000 0.495 0.186 1.000
No of commuters by public transportation No of commuters using public transportation 18.813 54.273 934.000 119.582 246.299 3559.985
No of commuters by cycling No of commuters using bicycle 5.894 19.804 775.000 90.869 128.399 1902.135
No of commuters by walking No of commuters by walking 14.354 34.680 1288.000 37.566 74.484 1634.530
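The "Neighboring TAZs" columns of Table 3 are described in the text as averages of each predictor over the zones adjacent to a target zone. A minimal sketch of that aggregation follows. The zone ids, the adjacency lists, and the function name `neighbor_average` are hypothetical, and returning `None` for a zone without neighbors is an assumption, since the source does not say how such zones are handled.

```python
def neighbor_average(values, adjacency):
    """Average a predictor over the zones adjacent to each target zone.

    values:    {zone_id: predictor value observed in that zone}
    adjacency: {zone_id: list of adjacent zone_ids}
    Zones with no neighbors get None (an assumption, not from the source).
    """
    spatial = {}
    for zone, neighbors in adjacency.items():
        if neighbors:
            spatial[zone] = sum(values[n] for n in neighbors) / len(neighbors)
        else:
            spatial[zone] = None
    return spatial


# Three hypothetical zones in a row: A - B - C
population_density = {"A": 1000.0, "B": 2000.0, "C": 6000.0}
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
spatial_density = neighbor_average(population_density, adjacency)
```

In practice the adjacency lists would come from a GIS operation on the zone polygons; that step is outside this sketch.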
node is a randomly chosen subset of the total number of predictors. In random forest, the trees are grown to maximum size without pruning, and aggregation is by averaging the trees. Suppose there are N observations and M predictor variables in the learning dataset. At first, subsets of data are drawn with replacement from the full training dataset, as in bagging. Then, a subset of the M predictor variables is selected randomly, and whichever variable gives the best split is used to split the node, iteratively. The main advantage of random forest over bagging is that random predictor selection diminishes correlations among the unpruned trees and constructs a learning model with low bias and low variance at the same time.
Boosting is another approach for improving the predictions resulting from a series of decision trees. Like bagging, boosting is an efficient approach that creates several subsets of data and constructs a final output by averaging the predictions of all the resulting trees. Unlike bagging, the training set used for each individual learner is chosen based on the performance of the earlier learner(s). In boosting, observations that are incorrectly predicted by previous learners are chosen more often than observations that were correctly predicted. Consequently, boosting attempts to produce new learners for its ensemble that are better able to correctly predict examples for which the current ensemble performance is poor. It is worth mentioning that in bagging, the resampling of the training set is not dependent on the performance of the earlier classifiers. In machine learning, the gradient boosting technique has gained much popularity for building powerful predictive models from weak learners. Specifically, gradient boosting uses a base weak learner and tries to boost the performance of weak learners by iteratively shifting the focus towards problematic observations that were difficult to predict. This ensemble technique identifies problematic observations by the large residuals computed in the previous iterations (Mayr, Binder, Gefeller, & Schmid, 2014).

4. Data preparation

This study is focused on pedestrian and bicycle crashes at the Statewide Traffic Analysis Zone (STAZ) level. STAZs are geographic entities delineated by state or local transportation officials to tabulate traffic-related data such as journey-to-work and place-of-work statistics (Cai, 2017). The data provides crash information for 8518 STAZs, with an average area of 6.472 square miles. Data for the empirical study is obtained from Florida for the years 2010 to 2012. About 16,240 pedestrian-involved and 15,307 bicycle-involved crashes that occurred in Florida during this 3-year period were compiled for the analysis. Among the STAZs, 46.18% have zero pedestrian crashes while 49.86% do not have any bicycle crashes. The crash records are collected from the Florida Department of Transportation Crash Analysis Reporting (CAR) and Signal Four Analytics (S4A) databases. Three broad categories of predictors are considered in our study: roadway characteristics, traffic characteristics, and socio-demographic characteristics. The response variables are the total numbers of pedestrian and bicycle crashes in each zone. The data employed are obtained from the FDOT Transportation Statistics Division and the US Census Bureau. The attributes are then aggregated at the STAZ level using a geographical information system (GIS). As discussed earlier, the current analysis considered spatial predictor variables, which correspond to characteristics of neighboring STAZs, along with those of the target STAZs. In macroscopic and microscopic analyses, crashes occurring in a spatial unit or site are aggregated to obtain the crash frequency. The aggregation process might introduce errors in identifying the exogenous variables for the spatial unit or site. To accommodate such spatial unit or site induced bias, spatial correlation should be considered in the crash model estimates. Towards this end, for every STAZ, the adjacent STAZs are identified. Based on the identified neighbors, a new variable can be obtained by averaging the values of each predictor variable from the surrounding STAZs. The average value based on those surrounding STAZs (identified neighbors) is the corresponding spatial predictor variable of the targeted STAZ. The descriptive statistics of the response and predictor variables are summarized in Table 3. Specifically, the table provides the predictor values at the STAZ level as well as for the neighboring STAZs. For the targeted and the neighboring STAZs, all the predictor variables are calculated for each of the 8518 zones. Table 3 includes the mean, standard deviation, minimum, and maximum values of the corresponding predictor variables for the 8518 STAZs, including both aspatial and spatial variables. It is worth mentioning that the predictor variables of the targeted STAZs are calculated from the STAZ in which the pedestrian and bicycle crash frequencies are observed, while the variables of the neighboring STAZs denote the average values of the variables calculated from all the surrounding STAZs adjacent to the targeted STAZ.
The roadway characteristics included are road lengths by functional class, signalized intersection density, length of bike lanes and sidewalks, etc. Intersection density denotes the number of intersections per street mile in a STAZ. Vehicle-miles-traveled and proportion of heavy vehicles in VMT are considered as traffic characteristics. For demographic characteristics, population density, proportion of families without vehicle, proportion of urban area, number of commuters by public transportation, etc. are considered.

5. Modeling results and discussions

5.1. Model assessment

In this study, from the 8518 STAZs, 70% of the STAZs were randomly selected as the training set for model development while 30% were employed as the testing set for model validation. In the first step, the model estimation process involved estimating four models as follows: (1) DTR aspatial model for pedestrian crashes, (2) DTR spatial model for pedestrian crashes, (3) DTR aspatial model for bicycle crashes, (4) DTR spatial model for bicycle crashes. Prior to discussing the model results, we compare the estimated models in Table 4. The table presents the Average Squared Error (ASE) and Standard Deviation of Error (SDE) for the four DTR models with training and testing samples. It is worth mentioning that a series of trees has been produced in order to achieve the best DTR model for each of the four models mentioned above. The model with the lower ASE and SDE is the preferred DTR

Table 4
Comparison of predictability between different models.
                                   Without spatial        With spatial predictor variables
                                   predictor variables    (% reduction from aspatial model)
Pedestrian crashes
Training (N = 5963)
  No of predictor variables used   10                     12
  ASE                              5.597                  5.142 (8.1)
  SDE                              2.366                  2.268 (4.1)
Testing (N = 2555)
  No of predictor variables used   10                     12
  ASE                              6.328                  6.178 (2.4)
  SDE                              2.516                  2.485 (1.2)
Bicycle crashes
Training (N = 5963)
  No of predictor variables used   9                      12
  ASE                              5.413                  5.092 (5.9)
  SDE                              2.327                  2.257 (3.0)
Testing (N = 2555)
  No of predictor variables used   9                      12
  ASE                              6.724                  5.926 (11.8)
  SDE                              2.594                  2.435 (6.1)
ASE = Average Squared Error, SDE = Standard Deviation of Error.
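The two fit measures in Table 4 and the percentage reductions in parentheses can be reproduced with a few lines of arithmetic. The sketch below assumes that ASE is the mean of the squared residuals and that SDE is the population standard deviation of the residuals; the source names the measures but does not spell out their formulas, so both definitions are assumptions. The function names are illustrative.

```python
import math


def ase(actual, predicted):
    """Average squared error: mean of the squared residuals (assumed definition)."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    return sum(r * r for r in residuals) / len(residuals)


def sde(actual, predicted):
    """Standard deviation of the residuals, population form (assumed definition)."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    mean_r = sum(residuals) / len(residuals)
    return math.sqrt(sum((r - mean_r) ** 2 for r in residuals) / len(residuals))


def pct_reduction(base, new):
    """Percentage reduction of `new` relative to `base`."""
    return 100.0 * (base - new) / base


# Pedestrian models in Table 4: aspatial vs. spatial ASE.
train_drop = pct_reduction(5.597, 5.142)  # about 8.1
test_drop = pct_reduction(6.328, 6.178)   # about 2.4

# Toy residual check (illustrative numbers only).
toy_ase = ase([2, 0, 4], [1, 1, 3])
toy_sde = sde([2, 0, 4], [1, 1, 3])
```

When the residuals have a mean close to zero, SDE is close to the square root of ASE, which is the pattern visible in Table 4 (for example, the square root of 5.597 is about 2.366).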
Table 5
Variable importance for pedestrian crash of STAZs.
Predictor variables Aspatial Ranking Spatial Ranking
STAZ predictor variables

Number of commuters by public transportation 1.0000 1 1.0000 1
Number of total employments 0.5236 2 0.5372 2
Signalized intersection density 0.3999 3 0.4191 3
Number of commuters by walking 0.3744 4 0.3673 4
Vehicle miles traveled (VMT) 0.2968 5 0.3405 6
Length of sidewalks 0.2883 6 0.3479 5
Length of bike lanes 0.1359 7 0.1394 9
Distance to nearest urban area 0.0511 8 – –
Hotels, motels, and timeshare rooms density 0.0459 9 – –
Proportion of urban area 0.0215 10 – –
Spatial predictor variable

Number of commuters by public transportation in neighboring STAZs – – 0.3200 7
Number of commuters by walking in neighboring STAZs – – 0.1703 8
Population density in neighboring STAZs – – 0.1372 10
Proportion of families without vehicle in neighboring STAZs – – 0.1304 11
School enrolment density in neighboring STAZs – – 0.0530 12
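The importance columns in Table 5 are scaled so that the top variable has the value 1. One way to obtain such a scale is sketched below, under the assumption that a variable's raw importance is the total deviance reduction of the splits that use it. The exact formula used by the authors' software is not given in the source, and the variable names and gains are made up for illustration.

```python
def relative_importance(split_gains):
    """Rank variables by total deviance reduction, scaled so the top one equals 1.

    split_gains: list of (variable_name, deviance_reduction), one entry per split.
    Returns a list of (variable_name, relative_importance, rank).
    """
    totals = {}
    for name, gain in split_gains:
        totals[name] = totals.get(name, 0.0) + gain
    top = max(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(name, total / top, rank)
            for rank, (name, total) in enumerate(ranked, start=1)]


# Hypothetical splits of a small tree (gains are illustrative only).
splits = [
    ("transit_commuters", 40.0),
    ("employment", 15.0),
    ("transit_commuters", 10.0),
    ("sidewalk_length", 5.0),
    ("employment", 10.0),
]
importance_table = relative_importance(splits)
```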
model. The percentage reductions of ASE and SDE in the spatial model compared to the aspatial model were quantified to observe the impact of spatial variables on both pedestrian and bicycle crash frequency. For instance, ASE and SDE in the training (testing) dataset for the pedestrian spatial model were 8.1% (2.4%) and 4.1% (1.2%) lower, respectively, than in the corresponding aspatial model (Table 4). For the bicycle spatial model, the percentage reductions of ASE and SDE in the training (testing) datasets were found to be 5.9% (11.8%) and 3.0% (6.1%), respectively, in comparison with the bicycle aspatial model. Furthermore, spatial models have a higher number of significant predictor variables than the aspatial models. For example, the pedestrian spatial model has 12 significant predictor variables, while 10 variables are significant in the pedestrian aspatial model. Generally, a higher number of predictor variables tends to overfit the DTR model to the training data (Chang & Chen, 2005; Chang & Wang, 2006). However, in our analysis, the spatial models had lower ASE and SDE in both the training and testing data, which indicates that they do not overfit. Therefore, across pedestrian and bicycle crash prediction models, the models with spatial predictor variables (spatial models) offer better prediction accuracy in terms of ASE and SDE for both training and testing datasets. However, the spatial variables have a higher impact on the bicycle crash frequency model than on the pedestrian crash frequency model in terms of percentage reduction of ASE and SDE (Table 4). The result seems reasonable because there is a higher probability of traveling to neighboring zones by bicycling than by walking. The State of Florida has 8518 STAZs, with an average area of 6.472 square miles. In a nutshell, this result highlights that the inclusion of predictor variables of adjacent STAZs improves crash prediction models using machine learning techniques (DTR models), which confirms similar results obtained using statistical modeling techniques in Cai et al. (2016).

5.2. DTR model estimation and interpretation

As previously mentioned, DTR partitions the data into relatively homogeneous terminal nodes, and it takes the mean value observed in each node as its predicted value. The empirical analysis involved a series of DTR model estimations in order to achieve the lowest possible ASE and SDE. Towards this end, lists of variables were entered into each model and their relative importance was also produced. Variable importance is calculated based on the deviance (D) or sum of squared errors (SSE) of each variable, which indicates a measure of the dispersion. The first partition of the observations in the DTR models is undertaken based on the most important predictor variable, resulting in the maximum reduction in variability of the response variable. Then, further partitions are made based on the hierarchy of most important variables.
The importance value of the most important variable is 1. All other variables are then assigned a relative importance. The variable importance results of the four models (two model types, with and without spatial predictor variables of neighboring STAZs, for each of pedestrian and bicycle crashes) are displayed in Table 5 and Table 6, respectively. Across the models for either pedestrian or bicycle crashes, the important variables are quite comparable. While the relative importance results are presented for all DTR models across pedestrian and bicycle crashes, the graphical discussion of decision tree regression focuses on the DTR model with spatial predictor variables, which offers the best fit.

5.2.1. DTR models for pedestrian crash

5.2.1.1. Decision tree. Forty-two predictor variables, including both aspatial and spatial variables, were used for developing spatial DTR models to predict the quantitative target variable: total number of pedestrian crashes. Fig. 2 shows the results of the regression tree, which has 32 terminal nodes. It shows that the number of commuters by public transportation, number of total employments, signalized intersection density, number of commuters by walking, number of commuters by public transportation in neighboring STAZs, vehicle miles traveled, length of sidewalk, and length of bike lanes are the primary splitters in the regression tree, implying that these variables are critical for predicting pedestrian crashes at the macro-level. The interpretation of the decision tree regression results is straightforward. Each rectangular box in this figure contains the node id, the average number of pedestrian crashes, and the number of observations in the dataset. The initial split at node 1 is based on the total number of commuters by public transportation. This indicates that the total number of commuters by public transportation was found to minimize the deviance most and is therefore the variable used for the first split. The tree diagram then breaks into two branches, each of which has the opportunity to branch again. In this case, the left branch is further divided by total employment while the right branch is divided by signalized intersection density. Each of the new branches splits again before reaching a terminal node, which states the expected number of pedestrian crashes. As an example, consider a zone where the total number of commuters by public transportation is less than 52, total employment is less than 709, length of sidewalk is equal to or greater than 0.45 mile, and vehicle miles traveled is equal to or greater than 162,346.6. From the training set of this tree, it can be found that an average of 5 pedestrian crashes are expected in 3 years for such a hypothetical zone, based on 10 observations of similar zones.
The interaction effects of the variables in a decision tree are completely different from those in regular regression. The hierarchical structure
Table 6
Variable importance for bicycle crash of STAZs.
Predictor variables Aspatial Ranking Spatial Ranking
STAZ predictor variables

Number of total employments 1.0000 1 1.0000 1
Number of commuters using bicycle 0.6684 2 0.5688 4
Number of commuters using public transport 0.6523 3 0.1543 9
Vehicle miles traveled (VMT) 0.4875 4 0.5909 3
Length of bike lanes 0.4403 5 0.2922 7
Proportion of urban area 0.3015 6 0.0369 12
Distance to nearest urban area 0.2040 7 – –
Signalized intersection density 0.1955 8 0.1387 10
School enrolment density 0.0749 9 – –
Number of commuters by walk – – 0.3254 6
Spatial predictor variable

Number of commuters using bicycle in neighboring STAZs – – 0.6464 2
Population density in neighboring STAZs – – 0.5270 5
Length of bike lanes in neighboring STAZs – – 0.2037 8
School enrolment density in neighboring STAZs – – 0.1327 11
of a tree means that the response to one input variable depends on values of other inputs in the tree, so interactions between predictors are automatically modeled (Elith, Leathwick, & Hastie, 2008). The interaction of the variables is local in the decision tree technique, while classical regression has global interaction between the variables. Local interaction means that the interaction is only used for certain values of the predictor variables instead of all the values of the corresponding predictor variables. Many interaction terms can be observed in the decision tree regression shown in Fig. 2. For the example given above, vehicle miles traveled only has an effect in the model for the subset of data for which length of sidewalk is equal to or greater than 0.45 mile, total employment is less than 709, and the total number of commuters by public transportation is less than 52. Therefore, these variables interact with each other locally in the corresponding tree.

5.2.1.2. Variable importance. For the DTR spatial model, seven predictor variables of targeted STAZs and five predictor variables of neighboring STAZs are found to be the most important variables for forecasting pedestrian crashes. The five significant predictor variables of neighboring STAZs confirm the importance of including spatial variables in order to predict pedestrian crashes at the macro-level. The results of the variable importance for both models (aspatial and spatial) for pedestrian crashes are presented in Table 5. To emphasize the predictor variables, we also ranked each variable based on its variable importance, with 1 as the most important variable and 12 as the least important variable in the spatial model.
The following observations can be made based on the results presented in Table 5. The most important variable for determining the number of pedestrian crashes at the macro-level is the number of commuters using public transport, with relative importance 1.0. The statistical modeling results intuitively support that commuters by public transportation reflect zones with higher pedestrian activity, resulting in increased crash risk (Abdel-Aty, Lee, Siddiqui, & Choi, 2013). The next most important variable to predict pedestrian crashes is total employment, which is a surrogate measure of pedestrian exposure (Siddiqui et al., 2012). Hence, it is expected that total employment has a higher impact on crash frequency. The variables including signalized intersection density, number of walk commuters, length of sidewalks, and length of bike lanes represent the likelihood of pedestrian access. Therefore, these variables are found to be significant variables in the DTR model. The VMT variable is a measure of vehicle exposure and, as expected, is a significant predictor of pedestrian crashes. It is interesting to note that the variables distance to nearest urban area, hotel, motel, and timeshare room density, and proportion of urban area are significant predictor variables (ranks 8, 9, and 10) in the DTR aspatial model, while those variables are not found to be significant in the DTR spatial model, which offers the better fit. Among the significant spatial predictor variables, the number of commuters by public transportation is the most important variable to predict pedestrian crashes. Cai et al. (2016) showed that the number of commuters by public transportation in neighboring STAZs has a positive impact on pedestrian crashes. Moreover, the number of commuters by walking, population density, proportion of families without vehicle, and school enrolment density in neighboring STAZs are significant spatial variables of pedestrian crashes at the macro-level.

5.2.2. DTR models for bicycle crash

5.2.2.1. Decision tree. The same number of predictor variables (42), including the spatial and aspatial variables, were tried to build the decision tree regression model for bicycle crash frequency. Fig. 3 shows the results of the regression tree for the bicycle crash frequency at the macro-level.
The tree has 25 terminal nodes with 12 significant predictor variables including total employment, number of commuters using bicycle, number of commuters using public transport, vehicle miles traveled, length of bike lanes, number of commuters using bicycle in neighboring STAZs, population density in neighboring STAZs, etc. From Fig. 3, the total number of employments was found to minimize the deviance most and is therefore the variable used for the first split. From the tree, it is also noticed that the left and the right branches are further divided by the number of commuters using bicycle in targeted STAZs and the number of commuters using bicycle in neighboring STAZs, respectively. Hence, the number of commuters using bicycle in targeted STAZs and the number of commuters using bicycle in neighboring STAZs are primary splitters in the regression tree, implying that these variables are critical for predicting bicycle crashes at the macro-level. To interpret the tree, consider a STAZ where the total number of employments is less than 889, the number of commuters using bicycle is equal to or greater than 9, and the population density in neighboring STAZs is equal to or greater than 2674.5. From Fig. 3, it can be found that an average of 3.82 bicycle crashes are expected in 3 years for such a hypothetical STAZ, based on 111 observations of similar STAZs. To explain the interaction term of the above example, population density in neighboring STAZs only has an effect in the model for the subset of data for which the number of commuters using bicycle is equal to or greater than 9 and the total number of employments is less than 889.

5.2.2.2. Variable importance. In the DTR model with spatial variables presented in Table 6, eight variables of the targeted STAZs and four
Fig. 2. Spatial decision tree regression model for pedestrian crash.
Fig. 3. Spatial decision tree regression model for bicycle crash.
Fig. 3. Ensemble technique framework: Bagging, Random Forests, and Boosting.
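As a rough illustration of the bagging branch of this framework (Eq. (12)), the sketch below averages B base learners, each trained on a bootstrap resample of the training data. Several simplifications are assumptions rather than the authors' setup: the base learner is a one-split regression tree (a stump) instead of an unpruned tree, only one predictor is used, and the names `fit_stump` and `bagging` are illustrative.

```python
import random


def fit_stump(x, y):
    """Fit a one-split regression tree on one predictor; return a predict function."""
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = None
    for threshold in sorted(set(x))[1:]:  # both sides are non-empty by construction
        left = [yi for xi, yi in zip(x, y) if xi < threshold]
        right = [yi for xi, yi in zip(x, y) if xi >= threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, sum(left) / len(left), sum(right) / len(right))
    if best is None:  # every x identical in this sample: predict the mean
        mean_y = sum(y) / len(y)
        return lambda query: mean_y
    _, threshold, left_mean, right_mean = best
    return lambda query: left_mean if query < threshold else right_mean


def bagging(x, y, n_trees=25, seed=0):
    """Average n_trees stumps, each fitted to a bootstrap resample (Eq. (12))."""
    rng = random.Random(seed)
    n = len(x)
    learners = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        learners.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return lambda query: sum(f(query) for f in learners) / len(learners)


# Toy data (illustrative only).
x = [10, 20, 30, 50, 60, 70]
y = [1, 1, 1, 5, 5, 5]
model = bagging(x, y)
low, high = model(10), model(70)
```

A random forest would add one step to `fit_stump`: at each split, search only a random subset of the available predictors.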
variables of the neighboring STAZs are responsible for predicting bicycle crash frequency.
The impacts of some predictor variables in the pedestrian and bicycle crash prediction models are quite similar. A possible reason is that STAZs with high pedestrian activity are also likely to experience high bicyclist activity. Among the parent (targeted) STAZ variables, the number of total employments is the most important predictor variable of bicycle crashes. The other important variables for the bicycle crash propensity are vehicle miles traveled (VMT), number of commuters using bicycle, number of commuters by walk, length of bike lanes, number of commuters using public transport, signalized intersection density, and proportion of urban area, respectively. There are four main differences in the STAZ variable impacts between pedestrian and bicycle crash frequency in terms of variable importance. First, the density of hotel, motel, and timeshare rooms is not a significant variable for predicting bicycle crashes. This result is intuitive because tourists are less likely to use bicycles. Second, the school enrolment density does have a significant impact on bicycle crashes, as it is possible that students are more likely to use bicycles for traveling to schools. Third, the length of sidewalks in the STAZ does not have significant importance for predicting bicycle crashes, whereas sidewalk length is found to be a significant variable for predicting pedestrian crashes. Finally, the rankings of the spatial variables in the pedestrian crash frequency model are all at the bottom. The bicycle crash frequency model shows a dissimilar trend, with higher rankings of spatial variables. In terms of spatial variable effects, the important variables have mixed effects between pedestrians and bicyclists. Population density and school enrolment density in neighboring STAZs are important spatial variables for both pedestrian and bicycle crash prediction models. The number of commuters using bicycle and the length of bike lanes in neighboring STAZs are found to be significantly associated with bicycle crashes.

5.3. Ensemble techniques results

To improve the prediction accuracy of the DTR models, we have used ensemble techniques with three structures: (1) Bagging, (2) Random Forest, (3) Gradient Boosting. Fig. 3 illustrates the basic framework of the three ensemble techniques proposed for the pedestrian and bicycle crash prediction models. Some observations can be made from this framework. All three ensemble techniques combine several decision trees to produce better predictive performance than a single decision tree.
Bagging creates several subsets of data by bootstrap resampling, while random forest uses the same process and additionally takes a random subset of predictors. Unlike bagging and random forest, boosting generates multiple training samples by re-weighting, which can
Table 7
Comparison of predictability across ensemble techniques.

Measure of effectiveness  Decision tree  Bagging (% Reduction)  Random Forests (% Reduction)  Gradient Boosting (% Reduction)

Pedestrian crashes with spatial predictor variables
Training (N = 5963)
  ASE                     5.142          5.016 (2.5)            4.975 (3.2)                   4.856 (5.6)
  SDE                     2.268          2.239 (1.3)            2.230 (1.7)                   2.203 (2.9)
Testing (N = 2555)
  ASE                     6.178          6.089 (1.5)            6.015 (2.6)                   5.915 (4.3)
  SDE                     2.485          2.468 (0.7)            2.453 (1.3)                   2.432 (2.1)

Bicycle crashes with spatial predictor variables
Training (N = 5963)
  ASE                     5.092          4.965 (2.5)            4.912 (3.5)                   4.821 (5.3)
  SDE                     2.257          2.228 (1.3)            2.216 (1.8)                   2.196 (2.7)
Testing (N = 2555)
  ASE                     5.926          5.868 (1.0)            5.821 (1.8)                   5.712 (3.6)
  SDE                     2.435          2.422 (0.5)            2.413 (0.9)                   2.390 (1.8)
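The percentage reductions in Table 7 follow from the tabulated values as 100 × (base − ensemble) / base, where the base is the decision tree. The short check below reproduces the gradient boosting figures for the pedestrian model; the function and variable names are illustrative only.

```python
def percent_reduction(base, ensemble):
    """Percentage reduction of an error measure relative to the base model."""
    return 100.0 * (base - ensemble) / base


# Pedestrian model values from Table 7: (decision tree, gradient boosting).
table7_pedestrian = {
    ("ASE", "training"): (5.142, 4.856),
    ("ASE", "testing"): (6.178, 5.915),
    ("SDE", "training"): (2.268, 2.203),
    ("SDE", "testing"): (2.485, 2.432),
}

reductions = {
    key: round(percent_reduction(base, boosted), 1)
    for key, (base, boosted) in table7_pedestrian.items()
}

for (measure, sample), value in reductions.items():
    print(f"{measure} ({sample}): {value}% reduction")
```

Because the table reports rounded values, recomputing a reduction from them can occasionally differ from the printed figure in the last digit.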
The re-weighting used in boosting can improve the accuracy of a single learner. Finally, bagging, random forest, and boosting all obtain the final prediction by aggregating the estimates of multiple individual trees.

The aforementioned three ensemble techniques were implemented based on the methodology shown in Fig. 1, and goodness-of-fit measures such as ASE and SDE were calculated for the spatial models of pedestrian and bicycle crashes. The comparison results of the ensemble techniques along with the DTR models (weak learners) for both pedestrian and bicycle crashes are presented in Table 7.

The table presents the ASE and SDE for the ensemble techniques and the DTR model for both training and testing samples. The percentage reductions in ASE and SDE of these ensemble techniques relative to the base model (DTR model) were also calculated in order to compare the improvements across the models. For example, in the pedestrian crash frequency model, gradient boosting decreased the ASE and SDE of the training (testing) dataset by 5.6% (4.3%) and 2.9% (2.1%), respectively, compared to the decision tree regression. Three significant conclusions can be drawn from the results in Table 7. First, all models with ensemble techniques perform slightly better than the original DTR model, considering the complexity of these models. Second, gradient boosting provides the best performance of all the ensemble techniques. Third, random forest performs better than bagging in terms of goodness-of-fit measures.

6. Conclusion

This study applied machine learning techniques to pedestrian and bicycle crash analyses that capture the effects of important predictor variables at the macro-level. The study conducted decision tree regression (DTR) modeling analysis to highlight the importance of various traffic, roadway, and socio-demographic characteristics of the STAZ for pedestrian and bicycle crash occurrence. To the best of the authors' knowledge, this is the first attempt to employ such DTR models at the macro-level. The study also considered spatial predictor variables from neighboring STAZs in order to improve the prediction accuracy of DTR models for both pedestrian and bicycle crashes. It was found that the DTR models with spatial predictor variables clearly outperformed the DTR models that did not consider the spatial variables in terms of goodness-of-fit measures. From the diagram of the decision tree regression, we can observe that the interactions between predictor variables are modeled automatically, because the response to one input variable depends on the values of other inputs in the tree. It is also clear that the interactions between the predictor variables are local in the decision tree regression, whereas classical regression has global interactions between the independent variables. To facilitate policy analysis at the macro-level, the variable importance of the DTR models for both pedestrian and bicycle crashes was computed. The variable importance results clearly highlighted the significant predictor variables of the targeted and neighboring STAZs, including traffic (such as VMT), roadway (such as signalized intersection density, length of sidewalks and bike lanes, etc.) and sociodemographic characteristics (such as population density, commuters by public transportation, walking and bicycling) for both pedestrian and bicycle crashes. From the planning perspective, it is important to identify zones with high numbers of public transit commuters, employment areas, and pedestrian and bicyclist commuters, and to undertake infrastructure upgrades to improve safety. Finally, the study applied ensemble techniques such as bagging, random forest, and gradient boosting to improve the prediction accuracy for pedestrian and bicycle crashes. The results revealed that all the ensemble techniques offer a slightly better fit than the original DTR models, considering the complexity of these techniques. Moreover, random forest performs better than bagging in terms of goodness-of-fit measures. Finally, gradient boosting outperformed the other two ensemble techniques and was found to be the best technique for predicting pedestrian and bicycle crashes at the macro-level.

The paper is not without limitations. While decision tree regression is considered, we do not consider other data mining techniques to check the prediction accuracy. It would be an interesting exercise to apply other data mining techniques such as neural networks, support vector machines, and their ensembles. Moreover, it might be beneficial to explore similar models for multiple spatial units and several years.

Acknowledgment

The authors would like to gratefully acknowledge the Florida Department of Transportation (FDOT) for providing access to the Florida crash data.
Lee, C., & Abdel-Aty, M. (2005). Comprehensive analysis of vehicle-pedestrian crashes at Saad, M., Abdel-aty, M., Lee, J., & Wang, L. (2019). Integrated safety and operational anal-
intersections in Florida. Accident; Analysis and Prevention, 37(4), 775–786. https://doi. ysis of the access design of managed toll lanes. Transportation Research Record.
org/10.1016/j.aap.2005.03.019. https://doi.org/10.1177/0361198118823502.
Lee, J., Abdel-Aty, M., Choi, K., & Huang, H. (2015). Multi-level hot zone identification for Shankar, V., Milton, J., & Mannering, F. (1997). Modeling accident frequencies as zero-al-
pedestrian safety. Accident; Analysis and Prevention, 76, 64–73. tered probability processes: An empirical inquiry. Accident; Analysis and Prevention,
Lee, J., Yasmin, S., Eluru, N., Abdel-Aty, M., & Cai, Q. (2018). Analysis of crash proportion by 29(6), 829–837.
vehicle type at traffic analysis zone level: a mixed fractional split multinomial logit Siddiqui, C., Abdel-Aty, M., & Choi, K. (2012). Macroscopic spatial analysis of pedestrian
modeling approach with spatial effects. Accident Analysis & Prevention, 111(Septem- and bicycle crashes. Accident; Analysis and Prevention, 45, 382–391.
ber 2017), 12–22. Sohn, S. Y., & Lee, S. H. (2003). Data fusion, ensemble and clustering to improve the clas-
Lord, D., & Mannering, F. (2010). The statistical analysis of crash-frequency data: A review sification accuracy for the severity of road traffic accidents in Korea. Safety Science, 41
and assessment of methodological alternatives. Transportation Research Part A: Policy (1), 1–14.
and Practice, 44(5), 291–305. Son, H. D., Kweon, Y. J., & Park, B. B. (2011). Development of crash prediction models with
Lord, D., Washington, S., & Ivan, J. N. (2007). Further notes on the application of zero-in- individual vehicular data. Transportation Research Part C: Emerging Technologies, 19
flated models in highway safety. Accident; Analysis and Prevention, 39(1), 53–57. (6), 1353–1363.
Lord, D., Washington, S. P., & Ivan, J. N. (2005). Poisson, poisson-gamma and zero-inflated Song, Y. -Y., & Lu, Y. (2015). Decision tree methods: Applications for classification and
regression models of motor vehicle crashes: Balancing statistical fit and theory. prediction. Shanghai Archives of Psychiatry, 27(2), 130–135.
Accident; Analysis and Prevention, 37(1), 35–46. Tavakoli Kashani, A., Rabieyan, R., & Besharati, M. M. (2014). A data mining approach to
Mannering, F. L., Shankar, V., & Bhat, C. R. (2016). Unobserved heterogeneity and the sta- investigate the factors influencing the crash severity of motorcycle pillion passengers.
tistical analysis of highway accident data. Analytic Methods in Accident Research, 11, Journal of Safety Research, 51, 93–98.
1–16. Ukkusuri, S., Hasan, S., & Aziz, H. (2011). Random parameter model used to explain effects
Mayr, A., Binder, H., Gefeller, O., & Schmid, M. (2014). The evolution of boosting algo- of built-environment characteristics on pedestrian crash frequency. Transportation
rithms: From machine learning to statistical modelling. Methods of Information in Research Record Journal of the Transportation Research Board, 2237, 98–106. https://
Medicine, 53(6), 419–427. doi.org/10.3141/2237-11.
Montella, A., Aria, M., D'Ambrosio, A., & Mauriello, F. (2012). Analysis of powered two- Wah, Y. B., Nasaruddin, N., Voon, W. S., & Lazim, M. A. (2012). Decision tree model for
wheeler crashes in Italy by classification trees and rules discovery. Accident; count data. World Congress on Engineering I, 4–9.
Analysis and Prevention, 49, 58–72. Wang, X., Yuan, J., Schultz, G. G., & Fang, S. (2018). Investigating the safety impact of road-
Mounce, S. R., Ellis, K., Edwards, J. M., Speight, V. L., Jakomis, N., & Boxall, J. B. (2017). En- way network features of suburban arterials in Shanghai. Accident Analysis &
semble decision tree models using RUSBoost for estimating risk of iron failure in Prevention, 113(January), 137–148. https://doi.org/10.1016/j.aap.2018.01.029.
drinking water distribution systems. Water Resources Management, 31(5), Washington, S. (2000). Iteratively specified tree-based regression: Theory and trip gener-
1575–1589. ation example. Journal of Transportation Engineering, 126(6), 482–491.
Nashad, T., Yasmin, S., Eluru, N., Lee, J., & Abdel-Aty, M. A. (2016). Joint modeling of pedes- Washington, S., & Wolf, J. (1997). Hierarchical tree-based versus ordinary least squares
trian and bicycle crashes: A copula based approach. Transportation Research Record, linear regression models: Theory and example applied to trip generation.
2601, 119–127. Transportation Research Record, 1581(1), 82–88. https://doi.org/10.3141/1581-11.
NHTSA (2005). Motor vehicle traffic crashes as a leading cause of death in the United States Wu, Y., Abdel-Aty, M., Wang, L., & Rahman, M. S. (2019). Improving flow and safety in low
2002 1 Young, 3. visibility conditions by applying connected vehicles and variable speed limits tech-
NHTSA (2015). Traffic safety facts: Bicyclists and other cyclists. nologies. Transportation Research Board 98th Annual Meeting.
NHTSA (2017a). 2016 motor vehicle crashes: Overview. Traffic Safety Facts Research Note, Yuan, J., & Abdel-Aty, M. (2018). Approach-level real-time crash risk analysis for signal-
1–9. ized intersections. Accident Analysis & Prevention, 119, 274–289.
NHTSA (2017b). Traffic Safety Facts: Pedestrian. Zhang, Y., Bigham, J., Ragland, D., & Chen, X. (2015). Investigating the associations be-
Pande, A., Abdel-Aty, M., & Das, A. (2010). A classification tree based modeling approach tween road network structure and non-motorist accidents. Journal of Transport
for segment related crashes on multilane highways. Journal of Safety Research, 41(5), Geography, 42, 34–47. https://doi.org/10.1016/j.jtrangeo.2014.10.010.
391–397. Zheng, Z., Lu, P., & Denver, T. (2016). Accident prediction for highway-rail grade crossings
Pitt, R., Guyer, B., Hsieh, C. C., & Malek, M. (1990). The severity of pedestrian injuries in using decision tree approach: An empirical analysis. Transportation Research Record
children: An analysis of the pedestrian injury causation study. Accident; Analysis Journal of the Transportation Research Board, 2545, 115–122.
and Prevention, 22(6), 549–559. https://doi.org/10.1016/0001-4575(90)90027-I.
Prati, G., Pietrantoni, L., & Fraboni, F. (2017). Using data mining techniques to predict the
Md Sharikur Rahman is a graduate research assistant and Ph.D. student at the University of Central Florida. His research area is traffic safety analysis. He received his B.S. in Civil Engineering from Bangladesh University of Engineering and Technology. His research interests lie in the field of microscopic traffic safety analysis. He is also the vice president of American Society of Highway Engineers at UCF.

Mohamed Abdel-Aty is a Pegasus Professor, Chair of the Civil, Environmental and Construction Engineering Department at the University of Central Florida and a registered professional engineer in Florida. His main expertise and interests are in the areas of traffic safety analysis, simulation, big data and data analytics and intelligent transportation systems (ITS). In 2015, he was awarded the Pegasus Professorship, the highest honor at the university. He is the Editor-in-Chief of Accident Analysis and Prevention and a member of multiple TRB Standing Committees including Highway Safety Performance (ANB25), User Information Systems (AND20) and Safety Data, Analysis and Evaluation (ANB20). Dr. Abdel-Aty is a leading traffic safety expert at both the national and international levels. In addition, he has been invited to deliver many Keynote speeches in conferences around the world, including in Belgium, Brazil, China, Korea, Turkey, KSA, Qatar, and UAE.

Samiul Hasan received the bachelor's and master's degrees in civil engineering from Bangladesh University of Engineering and Technology in 2004 and 2007, respectively, and the Ph.D. degree in transportation and infrastructure systems from Purdue University in 2013. He is currently an Assistant Professor with the Department of Civil, Environmental and Construction Engineering, University of Central Florida, Orlando, FL, USA. His research interests include human mobility, urban computing, network modeling, agent-based simulation, and disaster management. He received the Best Dissertation Award presented by the Transportation Science and Logistics Society of the Institute for Operations Research and the Management Sciences in 2014.

Qing Cai is a postdoctoral researcher in the Civil, Environmental and Construction Engineering Department at the University of Central Florida (UCF). He received his Ph.D. in transportation engineering from the same university. His main expertise and interest is in the areas of traffic safety, transportation planning, socio-demographic and land-use modeling, and data analytics.
