Identification of Vehicle-Pedestrian Collision - Yao 2018
Article
Identification of Vehicle-Pedestrian Collision
Hotspots at the Micro-Level Using Network Kernel
Density Estimation and Random Forests: A Case
Study in Shanghai, China
Shenjun Yao 1,2 , Jinzi Wang 1,2 , Lei Fang 3 and Jianping Wu 1,2, *
1 Key Laboratory of Geographic Information Science (Ministry of Education), East China Normal University,
Shanghai 200241, China; [email protected] (S.Y.); [email protected] (J.W.)
2 School of Geographic Sciences, East China Normal University, Shanghai 200241, China
3 Department of Environmental Science and Engineering, Fudan University, Shanghai 200438, China;
[email protected]
* Correspondence: [email protected]; Tel.: +86-21-5434-1204
Received: 12 November 2018; Accepted: 10 December 2018; Published: 13 December 2018
Abstract: The improvement of pedestrian safety plays a crucial role in developing a safe and
friendly walking environment, which can contribute to urban sustainability. A preliminary step
in improving pedestrian safety is to identify hazardous road locations for pedestrians. This study
proposes a framework for the identification of vehicle-pedestrian collision hot spots by integrating
the information about both the likelihood of the occurrence of vehicle-pedestrian collisions and the
potential for the reduction in vehicle-pedestrian crashes. First, a vehicle-pedestrian collision density
surface was produced via network kernel density estimation. By assigning a threshold value, possible
vehicle-pedestrian hot spots were identified. To obtain the potential for vehicle-pedestrian collision
reduction, random forests was employed to model the density with a set of variables describing
vehicle and pedestrian flows. The potential for crash reduction was then measured as the difference
between the observed vehicle-pedestrian crash density and the prediction produced by the random
forests models. The final hotspots were determined by excluding those with a crash reduction value
of no more than zero. The method was applied to the identification of hazardous road locations
for pedestrians in a district in Shanghai, China. The result indicates that the method is useful for
decision-making support.
Keywords: kernel density; random forests; pedestrians; crash; hotspots; safety; walking
1. Introduction
People start and end most of their trips on foot in their daily lives. However, mainly due to the
lack of awareness, pedestrians are often at high risk for death and injury. According to the World
Health Organization [1], approximately 1.24 million traffic deaths occur annually on the world’s roads,
of which about 22% involve pedestrians. As walking positively influences health and the environment,
encouraging walking can help develop a sustainable community. Despite a shift from motorized to
sustainable transport modes (such as walking and cycling) that has focused attention on pedestrian
safety, there is still much room for improvement to ensure a safe walking environment for pedestrians.
A preliminary step to improve pedestrian safety is to identify hazardous road locations for
pedestrians. This task plays a crucial role in safety countermeasure proposals and resource allocation.
From a geography perspective, hazardous road locations are usually represented by clusters of
traffic collisions. In the literature, extensive research has focused on the detection of traffic collision
concentration at the micro level [2–12]. The studies can be categorized into two types [13,14]. The first
is the link-attribute class, where the road network is segmented into basic spatial units (BSUs) and
traffic collisions are treated as attributes attached to the BSUs. The other is the event-based type,
where individual traffic collision events represented by x and y coordinates in space are analyzed.
In traffic collision analysis, kernel density estimation (KDE) is one of the most popular event-based
approaches [15]. KDE has been widely applied to the identification of hazardous road locations.
Although some researchers employed traditional planar KDE [16–18] that estimates density in
two-dimensional space where traffic collisions are weighted based on the Euclidean distance, there has
been a growing trend in applying network KDE (NKDE), which estimates density in a one-dimensional
space where distance is calculated along the road network mainly because traffic collisions are a
network-constrained phenomenon. For instance, Xie and Yan [5] developed a novel NKDE approach
to estimate the density of network-constrained point events and applied it to the analysis of 2005 traffic
crash data in the Bowling Green, Kentucky, USA area. The results indicate that the NKDE is more
appropriate than standard planar KDE for density estimation of traffic collisions, since the latter is
likely to overestimate the density values.
In the context of road safety, hazardous road locations are usually referred to as traffic collision
“hotspots”, “blackspots”, “sites with promise”, or “high risk locations”. A number of previous studies
employed different methods to detect traffic collision hot spots based on traffic collision frequency
and rate [19–22] aggregated by BSUs. Unlike spatial analysts who are interested in spatial analytical
techniques for the detection of traffic collision clusters, traffic safety researchers are more concerned
with the definition of hazardous road locations. Although using a simple ranking approach is the most
convenient way of defining a traffic collision hotspot, it is thought that the method is naive and is likely
to cause a large number of false positives. To address this, previous studies have proposed other
measures to define hazardous (or unsafe) road locations. For instance, McGuigan [23,24] measured
the “potential of accident reduction”, which was calculated as the difference between the observed and
the expected crash count at a site given exposure. Mahalel et al. [25] suggested that locations that are
selected for treatment should maximize the expected total reduction of traffic collisions. The premise
of these studies is that only excess traffic collisions can be prevented by appropriate treatments [26].
However, most of these studies focused on vehicle-vehicle collisions and dealt with collision frequency.
The method has not yet been applied to vehicle-pedestrian collision density.
As there has been no consensus on the best method of detecting hazardous road locations, this
study proposes an integrated micro-level method that incorporates both traffic crash intensity and the
potential for reduction to identify vehicle-pedestrian collision hot spots. The reasons for developing
the method are twofold. Firstly, there is a growing trend among nations worldwide to set a “zero”
tolerance vision in terms of fatalities to protect road users. To realize the ambitious target of zero road
fatalities and serious injuries on roads, researchers and engineers should be concerned with locations
where traffic collisions happen frequently. Secondly, in safety practice, resources are usually insufficient
for treating every hazardous road location. Policy-makers may not be interested in traffic crash clusters
that only result from high traffic volume. They may, instead, like to know hazardous road locations
that produce the maximum reduction in traffic deaths and injuries when appropriately treated. In this
light, we attempted to develop a framework to integrate both crash density and reduction potential
information sources for decision-making support for pedestrian safety.
The following section first introduces the steps for identifying vehicle-pedestrian hot spots, with
emphasis on models we used to analyse vehicle-pedestrian collisions. The study area and data
are introduced in Section 3, and the results are presented and discussed in Section 4, followed by
conclusions and further research directions in Section 5.
2. Method
The proposed framework for the identification of vehicle-pedestrian collision hot spots involves
three steps: producing a vehicle-pedestrian collision density surface, measuring the potential for
Sustainability 2018, 10, 4762 3 of 11
vehicle-pedestrian collision reduction, and identifying the vehicle-pedestrian collision hot spots.
This section introduces the models and approaches employed in each step.
where b is the bandwidth, dij is the network distance between reference point i and vehicle-pedestrian
traffic collision j, and Kern(.) is a kernel function that measures the distance decay effect, such as
Uniform, Triangle, Quartic, Triweight, and Gaussian [27]. In this study, the length of BSU was set as
200 m, and the Quartic function was chosen as the kernel function, which is determined by:
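\[
\mathrm{Kern}\left(\frac{d_{ij}}{b}\right) =
\begin{cases}
\dfrac{15}{16}\left(1 - \dfrac{d_{ij}^{2}}{b^{2}}\right)^{2} & \text{if } 0 < \dfrac{d_{ij}}{b} \le 1 \\
0 & \text{otherwise}
\end{cases}
\tag{2}
\]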
Although the BSU length and the choice of kernel function may have limited influence on the
results, the selection of bandwidth has significant impacts on the resultant density surface [4,5,12].
A small bandwidth may produce a sharp density pattern and may result in a large number of tiny
isolated clusters, whereas a broad bandwidth produces a smooth density surface where hazardous
road locations are likely to be mixed with safe neighboring locations. In this research, the bandwidth
was set to 250 m, an intermediate value, to ensure an appropriate density surface.
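The quartic kernel of Equation (2) and the 250 m bandwidth can be illustrated with a minimal Python sketch. It assumes that the network distances from a reference point to each collision have already been computed along the road network. The function names, the example distances, and the 1/b scaling of the summed kernel values are illustrative assumptions rather than the implementation used in this study.

```python
def quartic_kernel(u):
    """Quartic kernel of Equation (2) for a scaled distance u = d_ij / b."""
    # Equation (2) states the support as 0 < u <= 1. Accepting u == 0 as well
    # is an assumption made here so that a collision located exactly at the
    # reference point still contributes.
    if 0.0 <= u <= 1.0:
        return 15.0 / 16.0 * (1.0 - u * u) ** 2
    return 0.0


def network_density(distances_m, bandwidth_m=250.0):
    """Kernel density at one reference point.

    distances_m holds network distances (in metres) from the reference point
    to each collision. Dividing the summed kernel values by the bandwidth is
    a common convention and is an assumption in this sketch.
    """
    total = sum(quartic_kernel(d / bandwidth_m) for d in distances_m)
    return total / bandwidth_m


# Hypothetical distances from one reference point to four collisions.
example_distances = [0.0, 125.0, 250.0, 400.0]
density = network_density(example_distances)
```

Collisions at or beyond the bandwidth (250 m and 400 m in the example) contribute nothing, which is why a larger bandwidth smooths the surface: more distant collisions start to influence each reference point.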
a subset of predictors and choosing the best split among those variables [34]. The outcome of the
RFs is determined by averaging the predictions of all the trees [35]. The importance of each predictor
can be estimated by examining the increase in prediction error when permuting the OOB data for
that variable and leaving all others unchanged. Two commonly used measures in RFs for assessing
variable importance are the mean decrease in accuracy and the decrease in node impurity. As the
former indicator is considered a more reliable measure [43], it was used for measuring the variable
importance in this study.
This study employed the scikit-learn (sklearn; The French Institute for Research in Computer
Science and Automation, Rocquencourt, France) toolkit [44], which provides machine learning tools
in Python for data mining and data analysis. In sklearn, the RandomForestRegressor class was
used for implementing the RF algorithm. It exposes several parameters that allow users to tune
the model, including n_estimators (the number of decision trees), criterion
(the function used to measure the quality of a split), max_depth (the maximum depth of a decision tree),
and min_samples_split (the minimum number of samples required to split a node). sklearn also provides
functions that enable users to measure the prediction accuracy of the model, such as cross_val_score,
mean_squared_error, mean_absolute_error, and r2_score; the last three compute the mean squared error,
mean absolute error, and R², respectively. The feature_importances_ attribute of the fitted model is used
for measuring the importance of each variable.
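The tools named above can be combined as in the following sketch. The data are synthetic stand-ins for BSU-level predictors and densities, and the parameter values are illustrative assumptions, not the settings used in this study. Note that in scikit-learn the feature_importances_ attribute is impurity-based, whereas sklearn.inspection.permutation_importance measures the drop in score when a variable is permuted, which is closer in spirit to the mean decrease in accuracy; it is computed here on the fitted data rather than on OOB samples.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Synthetic stand-in data: 200 BSUs and 3 hypothetical predictors.
X = rng.random((200, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0.0, 0.05, 200)

# criterion is left at its default (squared error) in this sketch.
model = RandomForestRegressor(
    n_estimators=100,
    max_depth=None,
    min_samples_split=2,
    random_state=0,
)
model.fit(X, y)
pred = model.predict(X)

# In-sample accuracy measures.
mse = mean_squared_error(y, pred)
mae = mean_absolute_error(y, pred)
r2 = r2_score(y, pred)

# Five-fold cross-validated R2 (the model is refitted on each fold).
cv_r2 = cross_val_score(model, X, y, cv=5, scoring="r2")

# Impurity-based importance (normalised to sum to one).
impurity_importance = model.feature_importances_

# Permutation-based importance: mean drop in R2 when a column is shuffled.
perm = permutation_importance(model, X, y, n_repeats=5, random_state=0)
perm_importance = perm.importances_mean
```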
Although independent validation samples are not necessary for RF, they allow the assessment of
the generalization capability of the method [38,45]. In this light, the dataset was randomly divided into
two parts: 70% for calibration and 30% for validation. The procedure was repeated n times, resulting
in n sub-samples. The final predicted density value was determined by averaging predictions from
RF models based on n sub-samples. The potential for vehicle-pedestrian collision reduction was then
obtained by calculating the difference between the observed vehicle-pedestrian collision density and
the final prediction. In this study, n was set to five.
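The resampling procedure described above can be sketched as follows. The function name, the synthetic data, and the number of trees are hypothetical. The sketch also assumes that each of the n models predicts the density of every BSU before the predictions are averaged, which the text does not state explicitly.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def reduction_potential(X, observed, n_repeats=5, seed=0):
    """Average RF predictions over repeated 70/30 splits, then subtract."""
    predictions = []
    validation_r2 = []
    for k in range(n_repeats):
        # 70% for calibration and 30% for validation, redrawn at each repeat.
        X_cal, X_val, y_cal, y_val = train_test_split(
            X, observed, test_size=0.3, random_state=seed + k
        )
        model = RandomForestRegressor(n_estimators=50, random_state=seed + k)
        model.fit(X_cal, y_cal)
        # R2 on the held-out 30% indicates generalization capability.
        validation_r2.append(model.score(X_val, y_val))
        # Assumption: every model predicts the density of all BSUs.
        predictions.append(model.predict(X))
    predicted = np.mean(predictions, axis=0)
    # Potential for reduction = observed density minus final prediction.
    return observed - predicted, predicted, validation_r2


rng = np.random.default_rng(1)
X_demo = rng.random((120, 4))  # hypothetical exposure variables
observed_demo = X_demo[:, 0] + rng.normal(0.0, 0.1, 120)
potential, predicted, validation_r2 = reduction_potential(X_demo, observed_demo)
```

A positive value in potential marks a BSU whose observed density exceeds what the exposure variables alone would predict.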
diverse activities of human beings, which influence different features of pedestrian flow [51–53],
we employed land use data to reflect the spatial variation in pedestrian exposure. Point of Interest
(POI) data that could be used to further segment the activities were also introduced into the RF model
to incorporate more detailed features on pedestrian flow. In this research, land use data were derived
from Landsat (National Aeronautics and Space Administration, Washington, DC, US) images from
2014 with a spatial resolution of 30 m. POIs were collected from Baidu, Inc. (Beijing, China) in 2014.
The company provides application programming interfaces whereby users are allowed to develop
programs for collecting POI information from Baidu Map. As some land use and POI variables are
highly correlated, not all types of land use and POIs were integrated into the prediction models. Table 1
describes the variables that were finally introduced in the vehicle-pedestrian collision density models.
The result of the collinearity test for these variables was 3.4, reflecting little collinearity.
Due to data availability, we used the 2015 vehicle-pedestrian collision data, taxi GPS data from
2016, and land use and POI datasets from 2014. Since Changning District is located in the urban
area of Shanghai where the features of the built environment did not vary significantly from 2014 to
2016, it was reasonable to conduct analysis based on datasets collected from different years during
this period.
As mentioned before, the RF technique has strength in dealing with the complicated nonlinear
relationship between the vehicle (or pedestrian) flow and occurrence of vehicle-pedestrian collisions.
Although it may have some black-box problems, RF is capable of providing importance of variables
(also called “features” in RF). Figure 2 shows the value of the importance for each variable with different
samples. Although the importance of each variable varied in different samples, two variables, the
number of retail shops and the taxi flow, ranked as the top two regardless of which sample was used.
The mean feature importance of the two variables among the five samples was 0.3 and 0.15, respectively,
indicating their ability to predict the occurrence of vehicle-pedestrian collisions. As mentioned before,
previous studies have already investigated the relationship between land use characteristics and the
occurrence of traffic crashes involving pedestrians [30,51], and it was found that vehicle-pedestrian
collisions were more likely to happen in commercial areas. In this study, the commercial land was
further segmented into different types of places such as retail shops and restaurants. The average
importance value of the number of retail shops ranked in first place (see NoRetShp in Figure 2); the value
of the restaurant count ranged from 0.04 to 0.08. This may have occurred because different kinds of
activities may produce diverse types of pedestrian flow, thus significantly influencing the occurrence
of vehicle-pedestrian collisions. The findings suggest that introducing POIs into the vehicle-pedestrian
crash prediction models is desirable.
Figure 2. Feature importance of variables in each sample.
The final predicted vehicle-pedestrian collision density was produced by averaging the predictions
of five samples, and the potential of collision reduction was then calculated by subtracting the
prediction from the observation of vehicle-pedestrian collision density. Altogether, there were 634 BSUs
with collision reduction potential. By comparing the resultant locations with those detected by merely
setting the density threshold value, 4 of 35 potential hot spots were excluded. Figure 3 shows the
spatial distribution of hot spots that were finally determined as hazards for pedestrians (see solid black
lines in Figure 3), as well as locations with no crash reduction potential (see solid red lines in Figure 3).
It can be observed from the figure that hot spots were also clustered, resulting in several hot zones
for pedestrians. Some notable hot spots in this district (see the ellipse in Figure 3) were located in
Tian Shan Road, Gu Bei Road, Mao Tai Road, Lou Shan Guan Road, and South Yu Ping Road. If the
potential for vehicle-pedestrian collision reduction was not considered, the length of the roads that
required further examination, including those colored in both black and red in the figure, was 2.7 km
in total. When the proposed integrated method was applied, only 1.8 km of road segments were
identified as hazardous. This allows engineers and policy-makers to focus their efforts on locations
where there might be a higher likelihood of improving pedestrian safety.
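The final selection rule can be sketched in a few lines of Python. The function name, the example densities, and the threshold value are hypothetical. Treating a BSU as a candidate when its density is greater than or equal to the threshold (rather than strictly greater) is also an assumption.

```python
def select_hotspots(observed, predicted, density_threshold):
    """Return indices of BSUs retained as final hot spots."""
    final = []
    for i, (obs, pred) in enumerate(zip(observed, predicted)):
        # Step 1: candidate hot spot if the observed density reaches the threshold.
        is_candidate = obs >= density_threshold
        # Step 2: potential for reduction = observed minus predicted density.
        potential = obs - pred
        # Step 3: exclude candidates whose potential is no more than zero.
        if is_candidate and potential > 0:
            final.append(i)
    return final


# Hypothetical observed and predicted densities for four BSUs.
observed_density = [0.010, 0.012, 0.002, 0.015]
predicted_density = [0.008, 0.013, 0.001, 0.015]
hotspots = select_hotspots(observed_density, predicted_density, 0.009)
```

In this example only the first BSU is retained: the second and fourth reach the threshold but have no positive reduction potential, and the third never reaches the threshold.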
Notably, in the absence of detailed vehicle and pedestrian exposure information at the micro level,
we employed three variables (taxi flow, land use, and POI data) to reflect the variation in traffic
and pedestrian characteristics across the study area by following previous studies on the relationship
between the vehicle volume (or pedestrian flow) and taxi flow (or land use characteristics) [52–54].
Although the focus of this research was not the validation of the three variables as proxies of vehicle
and pedestrian flow, the way in which vehicle and pedestrian exposure can be measured has always
been an area of interest in road safety research [30]. With more experiments on the feasibility of proxy
variables being performed in future, better tools can be developed to increase the precision of the
estimation, and the proposed method in this research could be further improved.
5. Conclusions
The improvement in pedestrian safety plays a crucial role in developing a safe and friendly
walking environment to help ensure urban sustainability. Given the importance of hot spot detection
in safety management, we proposed a framework for the identification of hazardous road locations
for pedestrians by integrating the likelihood of the occurrence of vehicle-pedestrian collisions and
the potential for the reduction in traffic collisions involving vehicles and pedestrians. The research
is of significance by not only theoretically enriching the methodology of hotspot identification but
also practically providing useful information for policy-makers to propose countermeasures for
pedestrian safety.
The method through which traffic and pedestrian exposures are measured by taxi trajectories,
land use, and POI variables has not been fully explored. As a further step, research efforts may be
dedicated to additional validation experiments. We used the proposed framework to identify the
vehicle-pedestrian crash hot spots in only one period. If more vehicle-pedestrian collision data in other
periods are available, the usefulness of the framework can be further examined. As the identification
of hazardous road locations is the first step in safety improvement programs, future studies should
investigate risk factors and the treatment of hot spots.
Author Contributions: Conceptualization, S.Y.; methodology, S.Y. and J.W. (Jianping Wu); software, J.W.
(Jinzi Wang); validation, S.Y., L.F. and J.W. (Jianping Wu); formal analysis, S.Y.; investigation, S.Y.; resources,
S.Y. and J.W. (Jianping Wu); data curation, S.Y. and J.W. (Jinzi Wang); writing—original draft preparation, S.Y.;
writing—review and editing, S.Y., L.F. and J.W. (Jianping Wu); visualization, S.Y.; supervision, J.W. (Jianping Wu);
project administration, S.Y.; funding acquisition, S.Y. and J.W. (Jianping Wu).
Funding: This research was funded by National Key R&D Program of China, grant No. 2017YFE0100700; National
Natural Science Foundation of China, grant No. 41701462; and China Postdoctoral Science Foundation, grants
No. 2016M601539 and No. 2018T110371.
Acknowledgments: The authors would like to thank Jie Zhu for technical support, and greatly appreciate the
valuable comments from editors and three reviewers.
Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the design of the
study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to
publish the results.
References
1. WHO. Global Status Report on Road Safety 2015; World Health Organization: Geneva, Switzerland, 2015.
2. Loo, B.P.Y.; Yao, S. The Identification of Traffic Crash Hot Zones under the Link-Attribute and Event-Based
Approaches in a Network-Constrained Environment. Comput. Environ. Urban Syst. 2013, 41, 249–261.
[CrossRef]
3. Yamada, I.; Thill, J.C. Local Indicators of Network-Constrained Clusters in Spatial Patterns Represented by a
Link Attribute. Ann. Assoc. Am. Geogr. 2010, 100, 269–285. [CrossRef]
4. Harirforoush, H.; Bellalite, L. A New Integrated GIS-Based Analysis to Detect Hotspots: A Case Study of the
City of Sherbrooke. Accid. Anal. Prev. 2016, in press. [CrossRef] [PubMed]
5. Xie, Z.; Yan, J. Kernel Density Estimation of Traffic Accidents in a Network Space. Comput. Environ.
Urban Syst. 2008, 32, 396–406. [CrossRef]
6. Xie, Z.; Yan, J. Detecting Traffic Accident Clusters with Network Kernel Density Estimation and Local Spatial
Statistics: An Integrated Approach. J. Transp. Geogr. 2013, 31, 64–71. [CrossRef]
7. Cheng, W.; Washington, S.P. Experimental Evaluation of Hotspot Identification Methods. Accid. Anal. Prev.
2005, 37, 870–881. [CrossRef] [PubMed]
8. Long, T.T.; Somenahalli, S.V.C. Using GIS to Identify Pedestrian-Vehicle Crash Hot Spots and Unsafe Bus
Stops. J. Public Trans. 2011, 14, 99–114. [CrossRef]
9. Hao, Y.; Liu, P.; Chen, J.; Wang, H. Comparative Analysis of the Spatial Analysis Methods for Hotspot
Identification. Accid. Anal. Prev. 2014, 66, 80–88. [CrossRef]
10. Nie, K.; Wang, Z.; Du, Q.; Ren, F.; Tian, Q. A Network-Constrained Integrated Method for Detecting
Spatial Cluster and Risk Location of Traffic Crash: A Case Study from Wuhan, China. Sustainability 2015, 7,
2662–2677. [CrossRef]
11. Naji, H.A.H.; Xue, Q.; Lyu, N.; Wu, C.; Zheng, K. Evaluating the Driving Risk of near-Crash Events Using a
Mixed-Ordered Logit Model. Sustainability 2018, 10, 2868. [CrossRef]
12. Loo, B.P.; Yao, S.; Wu, J. Spatial Point Analysis of Road Crashes in Shanghai: A GIS-Based Network Kernel
Density Method. In Proceedings of the 19th International Conference on Geoinformatics, Shanghai, China,
24–26 June 2011.
13. Yamada, I.; Thill, J.C. Local Indicators of Network-Constrained Clusters in Spatial Point Patterns. Geogr. Anal.
2007, 39, 268–292. [CrossRef]
14. Yao, S.; Loo, B.P.; Yang, B.Z. Traffic Collisions in Space: Four Decades of Advancement in Applied GIS.
Ann. GIS 2016, 22, 1–14. [CrossRef]
15. Silverman, B.W. Density Estimation for Statistics and Data Analysis; Chapman & Hall/CRC Press: Boca Raton,
FL, USA, 1986.
16. Flahaut, B.; Mouchart, M.; Martin, E.S.; Thomas, I. The Local Spatial Autocorrelation and the Kernel Method
for Identifying Black Zones: A Comparative Approach. Accid. Anal. Prev. 2003, 35, 991–1004. [CrossRef]
17. Erdogan, S.; Yilmaz, I.; Baybura, T.; Gullu, M. Geographical Information Systems Aided Traffic Accident
Analysis System Case Study: City of Afyonkarahisar. Accid. Anal. Prev. 2008, 40, 174–181. [CrossRef]
[PubMed]
18. Krisp, J.M.; Durot, S. Segmentation of Lines Based on Point Densities—An Optimisation of Wildlife Warning
Sign Placement in Southern Finland. Accid. Anal. Prev. 2007, 39, 38–46. [CrossRef] [PubMed]
19. Deacon, J.A.; Charles, V.Z.; Deen, R.C. Identification of Hazardous Rural Highway Locations. Transp. Res. Rec.
1974, 410. [CrossRef]
20. Norden, M.; Orlansky, J.; Jacobs, H. Application of Statistical Quality-Control Techniques to Analysis of
Highway-Accident Data. Highw. Res. Board Bull. 1956, 117, 17–31.
21. Morin, D.A. Application of Statistical Concepts to Accident Data. Highw. Res. Rec. 1967, 188, 72–79.
22. Stokes, R.; Mutabazi, M. Rate-Quality Control Method of Identifying Hazardous Road Locations.
Transp. Res. Rec. 1996, 1542, 44–48. [CrossRef]
23. McGuigan, D.R.D. The Use of Relationships between Road Accidents and Traffic Flow in “Black-Spot”
Identification. Traffic Eng. Control 1981, 22, 448–453.
24. McGuigan, D.R.D. Non-Junction Accident Rates and Their Use In ‘black-Spot’ Identification. Traffic Eng. Control
1982, 23, 60–65.
25. Mahalel, D.; Hakkert, A.S.; Prashker, J.N. A System for the Allocation of Safety Resources on a Road Network.
Accid. Anal. Prev. 1982, 14, 45–56. [CrossRef]
26. Cheng, W.; Washington, S. New Criteria for Evaluating Methods of Identifying Hot Spots. Transp. Res. Rec.
2008, 2083, 76–85. [CrossRef]
27. Waller, L.A.; Gotway, C.A. Applied Spatial Statistics for Public Health Data; Wiley-Interscience: Hoboken, NJ,
USA, 2004.
28. Huang, H.; Hong, C.C. Modeling Road Traffic Crashes with Zero-Inflation and Site-Specific Random Effects.
Stat. Methods Appl. 2010, 19, 445–462. [CrossRef]
29. Anastasopoulos, P.C.; Mannering, F.L. A Note on Modeling Vehicle Accident Frequencies with
Random-Parameters Count Models. Accid. Anal. Prev. 2009, 41, 153–159. [CrossRef] [PubMed]
30. Yao, S.; Loo, B.P.Y.; Lam, W.W.Y. Measures of Activity-Based Pedestrian Exposure to the Risk of
Vehicle-Pedestrian Collisions: Space-Time Path Vs. Potential Path Tree Methods. Accid. Anal. Prev. 2015, 75,
320–332. [CrossRef] [PubMed]
31. Chang, L.Y. Analysis of Freeway Accident Frequencies: Negative Binomial Regression Versus Artificial
Neural Network. Saf. Sci. 2005, 43, 541–557. [CrossRef]
32. Xie, Y.; Lord, D.; Zhang, Y. Predicting Motor Vehicle Collisions Using Bayesian Neural Network Models:
An Empirical Analysis. Accid. Anal. Prev. 2007, 39, 922–933. [CrossRef]
33. Zeng, Q.; Huang, H.; Xin, P.; Wong, S.C.; Gao, M. Rule Extraction from an Optimized Neural Network for
Traffic Crash Frequency Modeling. Accid. Anal. Prev. 2016, 97, 87–95. [CrossRef]
34. Liaw, A.; Wiener, M. Classification and Regression by Randomforest. R News 2002, 2, 18–22.
35. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
36. Gromping, U. Variable Importance Assessment in Regression: Linear Regression Versus Random Forest.
Am. Stat. 2009, 63, 308–319. [CrossRef]
37. Haas, J.; Ban, Y. Urban Growth and Environmental Impacts in Jing-Jin-Ji, the Yangtze, River Delta and the
Pearl River Delta. Int. J. Appl. Earth Obs. Geoinf. 2014, 30, 42–55. [CrossRef]
38. Oliveira, S.; Oehler, F.; San-Miguel-Ayanz, J.; Camia, A.; Pereira, J.M.C. Modeling Spatial Patterns of Fire
Occurrence in Mediterranean Europe Using Multiple Regression and Random Forest. For. Ecol. Manag. 2012,
275, 117–129. [CrossRef]
39. Topouzelis, K.; Psyllos, A. Oil Spill Feature Selection and Classification Using Decision Tree Forest on Sar
Image Data. ISPRS J. Photogramm. Remote Sens. 2012, 68, 135–143. [CrossRef]
40. Rodriguez-Galiano, V.F.; Chica-Olmo, M.; Chica-Rivas, M. Predictive Modelling of Gold Potential with the
Integration of Multisource Information Based on Random Forest: A Case Study on the Rodalquilar Area,
Southern Spain. Int. J. Geogr. Inf. Sci. 2014, 28, 1336–1354. [CrossRef]
41. Wang, H.; Zhao, Y.; Pu, R.; Zhang, Z. Mapping Robinia Pseudoacacia Forest Health Conditions by Using
Combined Spectral, Spatial, and Textural Information Extracted from Ikonos Imagery and Random Forest
Classifier. Remote Sens. 2015, 7, 9020–9044. [CrossRef]
42. Breiman, L. Bagging Predictors. Mach. Learn. 1996, 24, 123–140. [CrossRef]
43. Genuer, R.; Poggi, J.M.; Tuleau-Malot, C. Variable Selection Using Random Forests. Pattern Recognit. Lett.
2010, 31, 2225–2236. [CrossRef]
44. Scikit-learn. Available online: https://scikit-learn.org/stable/ (accessed on 18 November 2018).
45. Cutler, D.R.; Edwards, T.C.; Beard, K.H.; Cutler, A.; Hess, K.T.; Gibson, J.; Lawler, J.J. Random Forests for
Classification in Ecology. Ecology 2007, 88, 2783–2792. [CrossRef]
46. Li, Q.; Zhang, T.; Yu, Y. Using Cloud Computing to Process Intensive Floating Car Data for Urban Traffic
Surveillance. Int. J. Geogr. Inf. Sci. 2011, 25, 1303–1322. [CrossRef]
47. Liu, X.; Gong, L.; Gong, Y.; Liu, Y. Revealing Travel Patterns and City Structure with Taxi Trip Data.
J. Transp. Geogr. 2015, 43, 78–90. [CrossRef]
48. Gao, S.; Wang, Y.; Gao, Y.; Liu, Y. Understanding Urban Traffic-Flow Characteristics: A Rethinking of
Betweenness Centrality. Environ. Plan. B Plan. Des. 2013, 40, 135–153. [CrossRef]
49. Wang, X.; Fan, T.; Chen, M.; Deng, B.; Wu, B.; Tremont, P. Safety Modeling of Urban Arterials in Shanghai,
China. Accid. Anal. Prev. 2015, 83, 57–66. [CrossRef] [PubMed]
50. Chen, B.Y.; Yuan, H.; Li, Q.; Lam, W.H.K.; Shaw, S.L.; Yan, K. Map-Matching Algorithm for Large-Scale
Low-Frequency Floating Car Data. Int. J. Geogr. Inf. Sci. 2014, 28, 22–38. [CrossRef]
51. Yang, B.Z.; Loo, B.P.Y. Land Use and Traffic Collisions: A Link-Attribute Analysis Using Empirical Bayes
Method. Accid. Anal. Prev. 2016, 95, 236–249. [CrossRef]
52. Ozbil, A.; Peponis, J.; Stone, B. Understanding the Link between Street Connectivity, Land Use and Pedestrian
Flows. Urban Des. Int. 2011, 16, 125–141. [CrossRef]
53. Lamíquiz, P.J.; López-Domínguez, J. Effects of Built Environment on Walking at the Neighbourhood Scale.
A New Role for Street Networks by Modelling Their Configurational Accessibility? Transp. Res. A Policy Pract.
2015, 74, 148–163. [CrossRef]
54. Castro, P.S.; Zhang, D.; Li, S. Urban Traffic Modelling and Prediction Using Large Scale Taxi Gps Traces.
In Proceedings of the 10th International Conference, Pervasive 2012, Newcastle, UK, 18–22 June 2012.
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).