
Project Report - Principles of Data Analytics DAMO-500-1

The document presents a data-driven analysis of traffic safety in Toronto, focusing on fatal and serious injury collisions to support the Vision Zero Road Safety Plan. It utilizes the 'Total Killed or Seriously Injured (KSI) Collisions' dataset from the Toronto Police Service, aiming to identify high-risk neighborhoods, assess contributing factors, and develop predictive models for future collision trends. The study employs various statistical methods and visualizations to derive actionable insights for improving road safety in the city.

Uploaded by

Bli Wilson

Traffic Safety Analysis in Toronto: A Data-Driven Approach to Vision Zero

Roberto Alberto San Miguel (NF1001332)

Wilson Kwesi Bli (NF1009701)

Mustafa Ahmed Saeed Mohamed Abdulmegid (NF1010652)

Javier Alberto Correa Obregón (NF1008713)

Bimbo Olukoya Solanke (NF1009888)

Bryan Daniel Gil Sotelo (NF1017598)

DAMO-500-1: Fall 2024 Principles of Data Analytics

Prof. Patty Zakaria

December 8th, 2024

Contents
1. Introduction
   Context and Relevance
   Rationale
   Objectives and Scope
2. Data Description
   Dataset Overview
   Data Types and Key Variables
   Data Preprocessing
   Dataset Justification
3. Research Questions and Hypotheses
   Methodology
4. Results
   Research Question 1
   Research Question 2
   Research Question 3
   Research Question 4
8. Conclusion
References

1. Introduction
Context and Relevance
Traffic safety is a critical concern in Toronto, where the Vision Zero Road Safety Plan (RSP)

represents an ongoing commitment to eliminate fatalities and serious injuries on the city’s roads.

Despite the progress achieved through measures such as speed management, pedestrian

infrastructure improvements, and public awareness campaigns, serious collisions remain a

pressing issue.

This project focuses on analyzing fatal and serious injury collisions using the "Total Killed or

Seriously Injured (KSI) Collisions" dataset, which provides detailed records of high-severity

traffic incidents. These collisions, involving fatalities or major injuries, highlight systemic risks

in Toronto’s traffic infrastructure and patterns of road usage. Addressing these incidents is vital

to achieving Vision Zero goals, as they disproportionately affect pedestrians, cyclists, and

vulnerable road users.

Global cities such as London and New York have successfully employed data-driven approaches

to reduce traffic fatalities and injuries. For example, New York City’s Vision Zero initiative

achieved a 30% reduction in fatalities by targeting accident-prone intersections and improving

pedestrian crossings (New York City DOT, 2020). Similarly, London reduced pedestrian injuries

by 25% over five years through hotspot analyses (London Road Safety Review, 2019). This

project builds on these models, tailoring the analysis to Toronto’s unique urban dynamics to

provide actionable insights for improving road safety.

Rationale
Localized studies analyzing fatal and serious injury collisions in Toronto are limited. By

leveraging the Toronto Police Service’s dataset, this project addresses a critical gap in

understanding traffic safety trends in the city. The dataset includes granular information, such as

collision locations, injury severities, and external factors, offering a comprehensive foundation

for the analysis.

This research aims to:

• Identify accident-prone neighborhoods.

• Assess external variables, such as weather conditions and time of day, that contribute to collision rates.

• Provide targeted, data-driven recommendations to reduce fatal and serious injury collisions.

By addressing these objectives, the project will directly support Toronto's Vision Zero initiative, offering insights to improve resource allocation, safety policies, and urban planning.

Objectives and Scope


The primary objective of this project is to conduct a thorough analysis of fatal and serious

injury collisions in Toronto. Specifically, the project will:

• Identify high-risk areas: Pinpoint neighborhoods with frequent fatal and serious injury collisions.

• Analyze collision types: Examine trends across accidents involving pedestrians, cyclists, and vehicles to understand contributing factors.

• Assess how external variables affect collisions: Investigate the influence of weather, seasons, and time of day on collision rates.

• Develop predictive models: Use historical data to forecast future high-risk areas and trends.

Scope: The analysis will utilize the Total Killed or Seriously Injured (KSI) Collisions dataset

provided by the Toronto Police Service, spanning multiple years to identify both recent and long-

term trends.

2. Data Description
Dataset Overview
This study utilized the "Killed or Seriously Injured (KSI) Collisions" dataset, sourced from the

Toronto Police Service Open Data Portal. The dataset contains detailed records of traffic

collisions involving major or fatal injuries, spanning the period from 2006 to 2023. A total of

6,870 unique serious or fatal accidents were analyzed for this study, with data aggregated at the

accident level rather than at the individual level.

Data Types and Key Variables


The dataset captures various dimensions of traffic collisions, including environmental, temporal,

and spatial details. Key variables include:

1. Collision Details:

Date and Time: Precise timestamps for each collision.

Location: Latitude and longitude coordinates, alongside neighborhood identifiers.

2. Severity and Outcomes:

Injury Severity: Includes only "Major" and "Fatal" injuries for this study, as the analysis

is conducted at the accident level. The most severe injury per accident was used to

classify the event.

Number of Victims: Derived from unique identifiers to ensure one record per accident.

3. External Variables:

Road Conditions: Includes road surface conditions (e.g., dry, wet, snowy) to evaluate

their impact on collision rates.

Visibility: Reflects weather-related environmental conditions (e.g., clear, rain, snow,

fog).
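The accident-level classification described above (one record per accident, labeled by its most severe injury) can be sketched as follows. This is a minimal illustration, not the authors' actual procedure: the field names `accident_id` and `injury` and the sample rows are hypothetical and do not reflect the KSI schema.

```python
# Hypothetical person-level rows: one row per involved person.
# Field names and values are illustrative, not the KSI schema.
person_rows = [
    {"accident_id": "A1", "injury": "Major"},
    {"accident_id": "A1", "injury": "Fatal"},
    {"accident_id": "A2", "injury": "Major"},
    {"accident_id": "A2", "injury": "Major"},
]

# Higher number = more severe; only the two levels retained in the study.
SEVERITY_RANK = {"Major": 1, "Fatal": 2}


def classify_accidents(rows):
    """Collapse person-level rows to one severity label per accident."""
    worst = {}
    for row in rows:
        key = row["accident_id"]
        current = worst.get(key)
        if current is None or SEVERITY_RANK[row["injury"]] > SEVERITY_RANK[current]:
            worst[key] = row["injury"]
    return worst


accident_severity = classify_accidents(person_rows)
print(accident_severity)  # {'A1': 'Fatal', 'A2': 'Major'}
```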

Data Preprocessing
To ensure the dataset was suitable for robust analysis, the following preprocessing steps were

conducted:

1. Data Cleaning:

• Duplicates were removed to ensure each accident was represented only once.

• Missing values in key fields (e.g., visibility, road conditions) were flagged, imputed from records with similar conditions, or omitted where information was insufficient.

• Unique identifiers were generated for each accident by combining location and time variables to handle cases where multiple individuals were involved in a single event.

2. Normalization and Transformation:

• Spatial data (latitude and longitude) were cross-referenced with Toronto’s official neighborhood boundaries to standardize the "Neighborhood" field.

• Temporal data were categorized into defined time periods (Night, Morning, Afternoon, Evening) and seasons (Winter, Spring, Summer, Fall) for comparative analysis.

3. Data Filtering:

• Only collisions involving major or fatal injuries were retained to focus the study on high-severity events.

• Data related to intersection-specific collisions or minor injuries were excluded to align with the project’s scope.
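The temporal categorization step can be sketched as below. The report does not state the cut-off hours or the season definition, so both are assumptions here: six-hour bins for the time periods and meteorological seasons (Winter = December to February). The function names are hypothetical.

```python
from datetime import datetime


def time_period(ts):
    """Map a timestamp to a time period (assumed six-hour bins)."""
    hour = ts.hour
    if hour < 6:
        return "Night"
    if hour < 12:
        return "Morning"
    if hour < 18:
        return "Afternoon"
    return "Evening"


def season(ts):
    """Map a timestamp to a season (assumed meteorological seasons)."""
    month = ts.month
    if month in (12, 1, 2):
        return "Winter"
    if month in (3, 4, 5):
        return "Spring"
    if month in (6, 7, 8):
        return "Summer"
    return "Fall"


example = datetime(2023, 1, 15, 17, 30)
print(time_period(example), season(example))  # Afternoon Winter
```

If the study used different boundaries (for example, Evening starting at 5 PM), only the thresholds in `time_period` would need to change.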

Dataset Justification
Relevance to Research Questions:

• The dataset provides granular information on collision severity, location, and external factors, enabling precise analysis of high-risk neighborhoods and conditions influencing accidents.

Comprehensive Coverage:

• Spanning 2006 to 2023, the dataset captures long-term trends and allows for robust temporal and spatial analysis.

Alignment with Objectives:

• By focusing on serious and fatal collisions, the dataset aligns directly with the study’s goal of supporting Toronto’s Vision Zero initiative and addressing systemic risks in traffic safety.

Rich External Factor Data:

• Variables such as road conditions and visibility provide critical insights into the role of environmental factors in collision occurrences, supporting hypotheses on weather and road safety.

The "Killed or Seriously Injured (KSI) Collisions" dataset was selected for its detailed and

relevant records, providing the necessary foundation to explore and address critical research

questions. It offers robust support for identifying high-risk areas, assessing the impact of external

factors, and predicting future collision trends. Future analyses could further benefit from

integrating additional datasets, such as real-time traffic volume and pedestrian density, for a

more comprehensive understanding.

3. Research Questions and Hypotheses


To ensure clarity and testability, each hypothesis includes a null hypothesis (H₀) and an

alternative hypothesis (H₁). These are specific, measurable, and directly aligned with the

project’s objectives.

Research Question 1: Which neighborhoods in Toronto experience the highest collision

frequencies, and how do collision patterns vary across neighborhoods?

• H₀ (Null Hypothesis): Collision frequencies are uniformly distributed across neighborhoods in Toronto.

• H₁ (Alternative Hypothesis): Certain neighborhoods in Toronto have significantly higher collision frequencies compared to others.

Rationale: Testing these hypotheses helps identify geographic areas requiring safety

interventions.

Research Question 2: What types of collisions are most common in high-risk

neighborhoods?

• H₀ (Null Hypothesis): There is no significant difference in the frequency of vehicle-related, pedestrian-related, and cyclist-related collisions in high-risk neighborhoods.

• H₁ (Alternative Hypothesis): There is a significant difference in the frequency of vehicle-related, pedestrian-related, and cyclist-related collisions in high-risk neighborhoods.

Rationale: Identifying the types of collisions prevalent in high-risk areas highlights vulnerable

road users and informs targeted safety measures, such as cyclist and pedestrian protections.

Research Question 3: How do external factors such as weather, time of day, and seasons

influence collision occurrences?

Hypothesis 3.1: Collision rates increase significantly during adverse weather conditions

(e.g., rain, snow).

• H₀ (Null Hypothesis): Weather conditions do not significantly influence collision rates.

• H₁ (Alternative Hypothesis): Weather conditions significantly influence collision rates.

Hypothesis 3.2: Fatal collisions are more likely to occur during peak traffic hours.

• H₀ (Null Hypothesis): Fatal collisions are not significantly more frequent during peak

hours compared to non-peak hours.

• H₁ (Alternative Hypothesis): Fatal collisions are significantly more frequent during

peak hours compared to non-peak hours.

Hypothesis 3.3: Seasonal variations, such as winter months, correlate with higher collision

rates.

• H₀ (Null Hypothesis): Seasonal variations do not significantly influence collision rates.

• H₁ (Alternative Hypothesis): Seasonal variations, such as winter months, significantly

influence collision rates.

Rationale: External factors like weather and time affect road conditions and driver behavior.

Understanding these correlations can inform safety strategies, such as enhanced enforcement or

infrastructure improvements during high-risk periods.

Research Question 4: How can predictive models forecast future collision rates?

Hypothesis 4.1: Historical collision data trends accurately predict future collision rates.

H₀ (Null Hypothesis): Historical collision data trends cannot reliably predict future collision

rates.

H₁ (Alternative Hypothesis): Historical collision data trends can reliably predict future collision

rates.

Hypothesis 4.2: External factors such as weather and road conditions improve the accuracy

of predictive models.

H₀ (Null Hypothesis): Including external factors such as weather and road conditions does not significantly improve the accuracy of predictive models.

H₁ (Alternative Hypothesis): Including external factors such as weather and road conditions significantly improves the accuracy of predictive models.

Rationale: Accurate predictions based on historical data and external factors enable proactive

planning and resource allocation, preventing collisions before they occur.

Methodology
This section outlines the analytical framework and statistical methods applied in this study to

address the research questions. It builds upon the data preparation steps detailed earlier, focusing

on the tools, techniques, and rationale for their use.

1. Analytical Techniques

1.1 Descriptive Analysis

Descriptive statistics provided foundational insights into the dataset and guided the subsequent

inferential analyses. Key aspects included:

• Neighborhood-Level Analysis: Summarized collision frequencies to identify high-risk areas and assess spatial disparities.

• Temporal Analysis: Collisions were analyzed by time of day and season to detect patterns in peak periods.

• External Factors: Environmental variables, such as road conditions and visibility, were examined to understand their impact on collision rates.

1.2 Inferential Statistical Methods

To test the hypotheses and explore significant relationships, the following methods were applied:

1. Analysis of Variance (ANOVA):

• Purpose: To determine if collision frequencies differed significantly across neighborhoods.

• Assumptions: Homogeneity of variance and normality were tested. The non-parametric Kruskal-Wallis test was used when assumptions were violated.

2. Chi-Square Tests:

• Goodness-of-Fit: Tested whether observed distributions (e.g., collision types or weather conditions) deviated significantly from expected uniform distributions.

• Independence: Assessed relationships between categorical variables, such as collision type and time periods.

3. Pairwise Comparisons:

• Pairwise Chi-Square tests were conducted for collision types (e.g., Vehicle vs. Pedestrian) to identify specific group differences.

• Bonferroni corrections were applied to adjust significance thresholds and control for Type I error.

4. Predictive Modeling:

• Base Model - SARIMA:

  o Used to forecast collision trends and identify seasonal patterns. Diagnostics such as autocorrelation (ACF) and partial autocorrelation (PACF) plots were used to ensure appropriate lag selection.

• Enhanced Model - Random Forest with External Factors:

  o Built on the SARIMA foundation and incorporated external factors (e.g., weather, road conditions) to improve accuracy. Comparisons with base models highlighted the significance of these variables.

2. Visualization and Spatial Analysis

1. Heatmaps:

• Visualized collision densities across neighborhoods, providing spatial context for high-risk areas.

2. Bar Charts:

• Illustrated collision type distributions, offering a clear representation of the relative frequencies of vehicle, pedestrian, and cyclist collisions.

3. Time-Series Plots:

• Showed trends in collision counts over time and highlighted seasonal peaks.

4. Cluster Analysis:

• Identified geographic clusters with high collision frequencies, aiding in targeted intervention strategies.

3. Model Validation and Robustness Checks

1. Statistical Assumptions:

• Parametric tests such as ANOVA relied on normality and homoscedasticity, which were tested using Levene’s test and visual diagnostics. Non-parametric tests like Kruskal-Wallis were employed where these assumptions did not hold.

2. Predictive Model Evaluation:

• Metrics such as R-squared, RMSE, and MAE were used to evaluate the accuracy and reliability of predictive models.

• Comparisons between base models and enhanced models (incorporating external factors) quantified the added value of these variables.

3. Multiple Comparison Adjustments:

• Bonferroni corrections adjusted p-values in pairwise Chi-Square tests to ensure statistical validity.

4. Justification of Methods

1. Relevance to Research Questions:

• Statistical tests and predictive models were selected based on their alignment with specific research objectives, ensuring robust answers to the hypotheses.

2. Data Characteristics:

• The dataset's categorical nature necessitated non-parametric tests like Chi-Square, while time-series forecasting methods (e.g., SARIMA) effectively modeled temporal trends.

3. Enhanced Predictive Models:

• Incorporating external factors significantly improved the predictive power of the models, as evidenced by reductions in RMSE and increases in R-squared values.
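The three evaluation metrics named in this section (RMSE, MAE, and R-squared) can be computed as in the sketch below. The counts and forecasts are hypothetical illustrative values, not results from this study, and the variable names are assumptions.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error: average size of the forecast error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of variance in the actual values explained by the forecast."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical monthly collision counts and two sets of forecasts.
actual = [30, 34, 28, 40]
base_forecast = [32, 30, 30, 36]
enhanced_forecast = [31, 33, 29, 38]

for name, forecast in [("base", base_forecast), ("enhanced", enhanced_forecast)]:
    print(name, round(rmse(actual, forecast), 2), round(mae(actual, forecast), 2),
          round(r_squared(actual, forecast), 2))
```

A lower RMSE and MAE together with a higher R-squared for the enhanced forecast is the pattern the report describes when external factors are added.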

5. Limitations

1. Data Constraints:

• Missing data in key variables, such as pedestrian density and real-time traffic volume, limited the scope of some analyses.

2. Forecasting Uncertainty:

• Wider confidence intervals for long-term predictions underscore the need for regular model recalibration and updates with new data.

The methodological framework employed a blend of descriptive and inferential techniques

alongside robust predictive modeling. These methods addressed the research questions

effectively while ensuring statistical rigor. Visual and spatial analyses complemented

quantitative findings, providing actionable insights for urban planning and traffic safety

interventions. This methodology establishes a strong foundation for future enhancements and

research extension.

4. Results
Research Question 1
Which neighborhoods in Toronto experience the highest collision frequencies, and how do

collision patterns vary across neighborhoods?

Hypotheses

Collision Frequencies:

H₀ (Null Hypothesis): Collision frequencies are uniformly distributed across neighborhoods in

Toronto.

H₁ (Alternative Hypothesis): Certain neighborhoods in Toronto have significantly higher

collision frequencies compared to others.

1.1 Descriptive Analysis

Summary Statistics
The dataset includes 6,870 collisions across 158 neighborhoods in Toronto.

Figure 1 Collision Count Case Summary

Figure 2 Collision Count Descriptive Summary

These statistics show that collision counts are highly variable across neighborhoods, with a

significant proportion of neighborhoods experiencing relatively low collision frequencies while a

few see very high counts.

1.2. Statistical Analysis

A. ANOVA Test

Objective: To assess whether significant differences exist in collision frequencies across

neighborhoods.

Assumptions Tested:

Homogeneity of Variances:

• Levene’s test was performed to check whether the variance of collision counts is equal across neighborhoods.

• Results:

Based on the mean: p < 0.001, indicating that the assumption of equal variances is violated.

Based on the median and trimmed mean: p > 0.05, suggesting homogeneity when these measures are considered.

Conclusion: While variances based on the mean are unequal, the results based on median and

trimmed mean suggest that ANOVA can still be performed with caution.

Figure 3 Levene’s test results for hypothesis of research question 1

ANOVA Results:

Table 1
F-statistic: 71,446.799
p-value: < 0.001
Interpretation: The ANOVA results confirm significant differences in collision frequencies across neighborhoods, supporting the rejection of the null hypothesis (H₀).

Figure 4 ANOVA test results for hypothesis of research question 1


Supplementary Analysis: Kruskal-Wallis Test

Objective: To confirm the findings from the ANOVA test without assuming equal variances or

normality.

Results:

Table 2
Kruskal-Wallis H: 5,679.160
Degrees of Freedom (df): 139
p-value: < 0.001

Figure 5 test statistics

Interpretation:

The significant p-value indicates that collision frequencies vary significantly across

neighborhoods, consistent with the ANOVA results. This strengthens the conclusion that certain

neighborhoods experience significantly higher collision counts.
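For readers unfamiliar with the Kruskal-Wallis statistic, the sketch below shows how H is computed from ranks, including the standard tie correction. The groups are hypothetical toy samples, not the study's neighborhood data, and the report's H value came from SPSS rather than from this code.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H for a list of samples."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction; equals 1 when all values are distinct.
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    return h / correction


# Hypothetical collision counts for three groups of observations.
toy_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat = kruskal_wallis_h(toy_groups)
print(round(h_stat, 2))  # 7.2
```

The statistic is compared against a chi-square distribution with (number of groups - 1) degrees of freedom. The function is undefined when every pooled value is identical, because the tie correction is then zero.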

B. Chi-Square Test of Goodness of Fit

Objective: To test whether collision frequencies follow a uniform distribution across

neighborhoods.

Results:

Table 3
Chi-Square Value: 2,140.255
Degrees of Freedom (df): 77
p-value: < 0.001

Figure 6

Interpretation: The Chi-Square test shows that observed collision frequencies deviate

significantly from expected uniform distribution, further confirming variability in collision

counts across neighborhoods.

Conclusion for Hypothesis 1: Collision Frequencies

Both ANOVA and Chi-Square tests indicate significant variability in collision frequencies across

neighborhoods.

Hypothesis Evaluation:

H₀ (Null Hypothesis): Rejected.

H₁ (Alternative Hypothesis): Accepted.

Certain neighborhoods in Toronto experience significantly higher collision frequencies,

justifying focused interventions in these high-risk areas.

1.3. Spatial Analysis

Heatmap

Objective: To visualize the spatial distribution of collision densities across neighborhoods.

Findings:

High-density collision areas are concentrated in central and southern Toronto, particularly in

neighborhoods with high population densities and traffic volumes.

Peripheral neighborhoods exhibit lower collision densities.

Figure 7 Spatial distribution heatmap of high-collision areas in Toronto
Heatmap Insights

The heatmap reveals significant clustering of collisions in specific neighborhoods, particularly in

central and southern areas of Toronto. The neighborhoods with the highest collision frequencies

are:

• Neighborhoods with High Collision Density:

o Neighborhood IDs: 162, 78, 164, 165, 166, 73, 168, 72, 71, 95, 70, 85, 84

Why These Areas?

1. High Population and Traffic Density:

a. Central neighborhoods, such as 162, 164, and 166, are known for their dense

residential and commercial zones, leading to a greater number of pedestrians and

vehicles interacting in these areas.

b. These areas are often hubs for commuting, increasing the risk of collisions during

peak traffic hours.

2. Key Traffic Corridors:

a. Major roads and highways traverse these neighborhoods, contributing to heavy traffic volumes.

b. Intersections in neighborhoods like 165 and 168 may experience congestion,

leading to increased collision risks.

3. Commercial and Transit Zones:

a. Neighborhoods like 85 and 84 are likely to have busy commercial districts and

transit stops, attracting higher foot traffic and public transit activity, which

increases the likelihood of accidents.

4. Urban Infrastructure Challenges:

a. In neighborhoods like 73 and 72, insufficient pedestrian crossings, poorly

designed intersections, or a lack of dedicated bike lanes may contribute to the

frequency of collisions.

b. Road surface conditions in specific areas during adverse weather may also play a

role.

Research Question 2
What types of collisions are most prevalent in high-risk neighborhoods, and which road user

groups are most impacted?

Hypotheses

Collision Frequencies:

H₀ (Null Hypothesis): There is no significant difference in the frequency of vehicle-related,

pedestrian-related, and cyclist-related collisions in high-risk neighborhoods.

H₁ (Alternative Hypothesis): There is a significant difference in the frequency of vehicle-

related, pedestrian-related, and cyclist-related collisions in high-risk neighborhoods.

2.1. Descriptive Analysis

Summary Statistics

The dataset consists of collision frequencies for Vehicle, Pedestrian, and Cyclist groups across

the top 10 high-risk neighborhoods in Toronto.

Table 4
Collision Type | Total Collisions
Vehicle | 1,332
Pedestrian | 544
Cyclist | 165

Figure 8 Collision frequencies for vehicle, pedestrian, and cyclist groups across the top 10 high-risk neighborhoods

Interpretation:

Vehicle collisions dominate, accounting for approximately 65% of the 2,041 collisions.

Pedestrian collisions make up about 27%, while cyclist collisions are the least frequent at about 8%.

2.2. Statistical Analysis

A. Overall Collision Type Distribution

A Chi-Square Goodness-of-Fit Test was performed to examine if collision frequencies for

Vehicle, Pedestrian, and Cyclist groups differ significantly from a uniform distribution.

Table 5
Collision Type | Observed Count | Expected Count | Residual
Vehicle | 1,332 | 680.33 | +651.67
Pedestrian | 544 | 680.33 | -136.33
Cyclist | 165 | 680.33 | -515.33

Chi-Square Test Output (SPSS Results):

Figure 9 Frequency

Figure 10 Test statistics

Key Statistics:

Chi-Square Statistic: χ²(2) = 1,041.88

p-value: < 0.001

Interpretation: The null hypothesis is rejected, indicating significant differences in collision

frequencies among the three groups. Vehicle collisions are significantly more frequent than

pedestrian and cyclist collisions.
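The goodness-of-fit statistic reported above can be reproduced by hand from the observed counts in Table 5. The sketch below is a plain-Python recomputation under a uniform null (equal expected counts per collision type); the variable names are illustrative, and the closed-form p-value applies only because the test has 2 degrees of freedom.

```python
import math

# Observed counts from Table 5.
observed = {"Vehicle": 1332, "Pedestrian": 544, "Cyclist": 165}

total = sum(observed.values())
expected = total / len(observed)  # uniform null: same expected count per type

# Pearson chi-square: sum of squared residuals scaled by the expected count.
chi_square = sum((count - expected) ** 2 / expected for count in observed.values())
df = len(observed) - 1

# For df = 2 the upper-tail probability has the closed form exp(-x / 2).
p_value = math.exp(-chi_square / 2)

print(f"expected per type = {expected:.2f}")   # 680.33
print(f"chi-square({df}) = {chi_square:.2f}")  # chi-square(2) = 1041.88
print(f"p < 0.001: {p_value < 0.001}")         # True
```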

B. Pairwise Comparisons

1. Vehicle vs. Pedestrian

To identify specific differences, a pairwise Chi-Square test was conducted between Vehicle and Pedestrian collisions.

Table 6
Group | Observed Count | Expected Count | Residual
Vehicle | 1,332 | 945.7 | +386.3
Pedestrian | 544 | 386.3 | +157.7

Chi-Square Test Output (SPSS Results):

Figure 11 Vehicle vs Pedestrian Crosstabulation

Figure 12 Chi-square tests

Key Statistics:

Chi-Square Statistic: χ²(1) = 1,876.00

p-value: < 0.001

Interpretation: Vehicle collisions are significantly more frequent than pedestrian collisions in

high-risk neighborhoods.

2. Vehicle vs. Cyclist

A second pairwise Chi-Square test was conducted between Vehicle and Cyclist collisions.

Table 7
Group | Observed Count | Expected Count | Residual
Vehicle | 1,332 | 1,185.2 | +146.8
Cyclist | 165 | 311.8 | -146.8

Chi-Square Test Output (SPSS Results):

Figure 13 Chi-square test output – Vehicle vs Cyclist Crosstabulation

Figure 14 Pearson Chi-square test output

Key Statistics:

Chi-Square Statistic: χ²(1) = 1,497.00

p-value: < 0.001

Interpretation: Vehicle collisions are significantly more frequent than cyclist collisions in high-

risk neighborhoods.

3. Pedestrian vs. Cyclist

A final pairwise Chi-Square test was conducted between Pedestrian and Cyclist collisions.

Table 8
Group | Observed Count | Expected Count | Residual
Pedestrian | 544 | 386.3 | +157.7
Cyclist | 165 | 311.8 | -146.8

Chi-Square Test Output (SPSS Results):

Figure 15 Cyclist vs Pedestrian Crosstabulation

Figure 16 Pearson Chi-square test

Key Statistics:

Chi-Square Statistic: χ²(1) = 1,876.00

p-value: < 0.001

Interpretation: Pedestrian collisions are significantly more frequent than cyclist collisions in

high-risk neighborhoods.
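The Bonferroni adjustment described in the methodology can be applied to the three pairwise comparisons above as in the sketch below. The family-wise level of 0.05 is an assumption (the report does not state it), and since each pairwise p-value is reported only as "< 0.001", the value 0.001 is used as an upper bound.

```python
alpha = 0.05  # assumed family-wise significance level
comparisons = [
    "Vehicle vs. Pedestrian",
    "Vehicle vs. Cyclist",
    "Pedestrian vs. Cyclist",
]

# Bonferroni: divide the family-wise alpha by the number of tests.
adjusted_alpha = alpha / len(comparisons)


def is_significant(p_value, threshold=adjusted_alpha):
    """True when a p-value clears the Bonferroni-adjusted threshold."""
    return p_value < threshold


print(f"adjusted alpha = {adjusted_alpha:.4f}")  # adjusted alpha = 0.0167
for name in comparisons:
    # 0.001 is an upper bound on each reported p-value.
    print(name, is_significant(0.001))
```

Under these assumptions, all three comparisons remain significant after adjustment, because 0.001 is below the adjusted threshold of about 0.0167.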

Conclusion for Hypotheses

Statistical Evidence:

The overall Chi-Square test and pairwise comparisons confirm significant differences in collision

frequencies across the three road user groups.

Hypothesis Evaluation:

H₀ (Null Hypothesis): Rejected.

H₁ (Alternative Hypothesis): Accepted.

Conclusion:

Vehicle collisions are the most frequent, followed by pedestrian collisions, with cyclist collisions

being the least frequent in high-risk neighborhoods.

Research Question 3
How do external factors such as weather, time of day, and seasons influence collision

occurrences?

Hypotheses

H3.1: Collision Rates and Weather Conditions

• H₀ (Null Hypothesis): Weather conditions do not significantly influence collision rates.

• H₁ (Alternative Hypothesis): Weather conditions significantly influence collision rates.

H3.2: Fatal Collisions and Peak Traffic Hours

• H₀ (Null Hypothesis): Fatal collisions are not more likely to occur during peak traffic hours compared to non-peak hours.

• H₁ (Alternative Hypothesis): Fatal collisions are significantly more likely to occur during peak traffic hours compared to non-peak hours.

H3.3: Collision Rates and Seasonal Variations

• H₀ (Null Hypothesis): Seasonal variations do not significantly influence collision rates.

• H₁ (Alternative Hypothesis): Seasonal variations significantly influence collision rates.

H3.1 Results: Collision Rates and Weather Conditions

1. Descriptive Analysis

Total Collisions Analyzed: 6,870

Figure 17 Descriptive statistics

Distribution by Weather Conditions:

Table 9
Weather Condition | Share of Collisions | Collisions
Good | 85.8% | 5,894
Moderate | 11.4% | 782
Adverse | 2.6% | 182
Unknown | 0.2% | 12

Figure 18

Interpretation:

Most collisions (85.8%) occurred under good weather conditions. Collisions during adverse

weather were much less frequent (2.6%), likely due to lower traffic volumes.

2. Statistical Analysis

Chi-Square Test for Goodness-of-Fit:

χ² = 6,115.913, df = 3, p < 0.001.

Figure 19
Observed and Expected Frequencies:

Table 10
Weather Condition | Observed | Expected
Good | 5,894 | 1,717.5
Moderate | 782 | 1,717.5
Adverse | 182 | 1,717.5
Unknown | 12 | 1,717.5

Interpretation: The Chi-Square test indicates a significant difference (p < 0.001) between observed and expected collision frequencies. Collisions in good weather were significantly overrepresented, while collisions during moderate, adverse, and unknown weather were underrepresented.

Figure 20 Distribution of collisions across weather conditions (bar chart of observed vs. expected collision frequencies by weather condition; x-axis: Weather Conditions; y-axis: Frequency of Collisions)

Conclusion for H3.1

Statistical Evidence: The Chi-Square test confirms a significant relationship between weather conditions and collision frequencies (p < 0.001).

Hypothesis Evaluation:

H₀ (Null Hypothesis): Rejected.

H₁ (Alternative Hypothesis): Accepted.

Conclusion: Weather conditions significantly influence collision rates.

Hypothesis 3.2: Peak Traffic Hours

Descriptive Analysis

Summary Statistics

Collisions were categorized by peak hours (7–9 AM, 4–6 PM) and non-peak hours:

Table 11 Collision Counts by Peak and Non-Peak Hours
Time Period      Non-Fatal Collisions   Fatal Collisions   Total Collisions   % Fatal
Peak Hours       3,344                  489                3,833              12.8%
Non-Peak Hours   2,569                  468                3,037              15.4%
Total            5,913                  957                6,870              13.9%

Interpretation

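By count, fatal collisions are slightly more frequent during peak hours (489 vs. 468), although the fatal share is lower in peak hours (12.8%) than in non-peak hours (15.4%). Non-fatal collisions dominate overall.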

Statistical Analysis

Chi-Square Test

Objective: To test whether fatal collisions are more likely during peak hours compared to non-peak hours.

Results:

Figure 21 Pearson Chi-Square test


Interpretation:

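The results indicate a statistically significant relationship between time period (peak vs. non-peak) and collision fatality. The test itself is non-directional: fatal collisions are more numerous during peak hours (489 vs. 468), while the fatal share in Table 11 is lower during peak hours (12.8% vs. 15.4%).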
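
The test of independence for Table 11 can be sketched in the same way. This is an illustrative recomputation from the printed counts, assuming a plain Pearson statistic without continuity correction; it is not a reproduction of the output in Figure 21, and all names are hypothetical. It also reports the fatal share per period, since the chi-square statistic itself carries no direction.

```python
# Pearson chi-square test of independence for the Table 11 counts (pure Python sketch).
# Assumption: no continuity correction is applied.
table = {
    "Peak Hours": {"non_fatal": 3344, "fatal": 489},
    "Non-Peak Hours": {"non_fatal": 2569, "fatal": 468},
}

# Marginal totals
row_totals = {row: sum(cells.values()) for row, cells in table.items()}
col_totals = {}
for cells in table.values():
    for col, count in cells.items():
        col_totals[col] = col_totals.get(col, 0) + count
grand_total = sum(row_totals.values())

# Expected count under independence: row total * column total / grand total
chi_square = 0.0
expected = {}
for row, cells in table.items():
    for col, count in cells.items():
        exp = row_totals[row] * col_totals[col] / grand_total
        expected[(row, col)] = exp
        chi_square += (count - exp) ** 2 / exp

degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)

# Direction of the association: share of collisions that were fatal, per period
fatal_share = {row: table[row]["fatal"] / row_totals[row] for row in table}

print(f"chi-square = {chi_square:.3f}, df = {degrees_of_freedom}")
for row, share in fatal_share.items():
    print(f"  {row}: {share:.1%} fatal")
```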

Figure 22 Summary of collisions during peak and non-peak hours

Conclusion for Hypothesis 3.2:

H₀ (Null Hypothesis): Rejected.

H₁ (Alternative Hypothesis): Accepted.

Fatal collisions are significantly more likely during peak hours.

Hypothesis 3.3: Seasonal Variations

Descriptive Analysis

Seasonal Collision Frequencies and Percentages Table

Table 12
Season   Non-Fatal Collisions (%)   Fatal Collisions (%)   Total Collisions (%)
Winter   84.4%                      15.6%                  20.6%
Spring   87.0%                      13.0%                  21.3%
Summer   86.7%                      13.3%                  29.3%
Fall     86.0%                      14.0%                  28.8%
Total    86.1%                      13.9%                  100.0%

Interpretation

Non-Fatal collisions dominate across all seasons (~86%), and seasonal differences in Fatal

collision rates are minimal.

Statistical Analysis

Chi-Square Test

Objective: To test whether collision rates vary significantly across seasons.

Results:

χ² = 4.734.

p = 0.192.

Figure 23 Pearson Chi-Square test

Interpretation:
The results indicate no statistically significant relationship between seasons and collision rates.
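
As a worked check, the reported p-value can be recovered from the reported statistic. The sketch below assumes df = 3, i.e. (4 seasons − 1) × (2 severity outcomes − 1), and a 0.05 significance level; neither is stated next to the result, and the helper name is illustrative. It uses the closed-form upper tail of the chi-square distribution for three degrees of freedom.

```python
import math


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 3 degrees of freedom.

    Closed form for df = 3: erfc(sqrt(x / 2)) + sqrt(2x / pi) * exp(-x / 2).
    """
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


chi_square = 4.734   # statistic reported for the seasonal test
alpha = 0.05         # assumed significance level

p_value = chi2_sf_df3(chi_square)
reject_null = p_value < alpha

print(f"p = {p_value:.3f}, reject H0: {reject_null}")
```

Under these assumptions the p-value comes out at about 0.192, matching the reported figure, so H₀ is not rejected.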

Figure 24 Fatal and non-fatal collision frequencies across seasons

Conclusion for Hypothesis 3.3:

H₀ (Null Hypothesis): Not rejected.

H₁ (Alternative Hypothesis): Not supported.

Seasonal variations do not significantly influence collision rates.

Research Question 4:
How can predictive models forecast future collision rates?

Objective: Develop predictive models using historical collision data to identify and forecast

future collision rates. Incorporate factors such as weather, time, and collision types to build a

robust and actionable model.

Hypotheses:

H4.1: Historical collision data trends accurately predict future collision rates.

H₀ (Null Hypothesis): Historical collision trends cannot reliably predict future collision rates.

H₁ (Alternative Hypothesis): Historical collision trends can reliably predict future collision

rates.

Analysis and Visualizations

Data Analysis Overview

Time period: 2006-2023 (historical data)

Forecasts: 2024-2028

Key variables: YearString, Collision_Count, Predicted values, and confidence intervals

Base model: SARIMA (Seasonal Autoregressive Integrated Moving Average), using only historical collision counts and basic temporal features.

- Parameters: SARIMA(2,1,1)(1,1,1,12)
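
Written out with the backshift operator B (where B y_t = y_{t−1}), a SARIMA(2,1,1)(1,1,1,12) model for the monthly collision count y_t takes the following general form. This is the standard textbook expression under one common sign convention, shown for orientation only; the fitted coefficient values are not given in the report, so the symbols below are placeholders.

```latex
\[
\underbrace{\left(1 - \phi_1 B - \phi_2 B^{2}\right)}_{\text{AR}(2)}
\underbrace{\left(1 - \Phi_1 B^{12}\right)}_{\text{seasonal AR}(1)}
\underbrace{(1 - B)}_{d = 1}
\underbrace{\left(1 - B^{12}\right)}_{D = 1}
\, y_t
=
\underbrace{\left(1 + \theta_1 B\right)}_{\text{MA}(1)}
\underbrace{\left(1 + \Theta_1 B^{12}\right)}_{\text{seasonal MA}(1)}
\, \varepsilon_t
\]
```

Here φ₁, φ₂ and θ₁ are the non-seasonal autoregressive and moving-average coefficients, Φ₁ and Θ₁ are their seasonal counterparts at a lag of 12 months, and ε_t is a white-noise error term.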

The visualization and analysis below were created to examine this hypothesis:

Figure 25 Historical and predicted collision trends

Key Findings:

Figure 26 Monthly collision patterns

Conclusion for H4.1

Statistical Evidence: The model evaluation confirms that historical collision data significantly

predicts future collision rates (R² = 0.767, p < 0.001).

Hypothesis Evaluation:

H₀ (Null Hypothesis): Rejected.

H₁ (Alternative Hypothesis): Accepted.

Conclusion: Historical collision data trends accurately forecast future collision rates.

Hypothesis 4.2: External factors such as weather and road condition improve the accuracy

of predictive models.

H₀ (Null Hypothesis): Including external factors such as weather and road condition does not

significantly improve the accuracy of predictive models.

H₁ (Alternative Hypothesis): Including external factors such as weather and road condition

significantly improves the accuracy of predictive models.

Figure 27 Comparison of predictive model accuracy with and without external factors

Conclusion for H4.2

Statistical Evidence: Model comparison indicates that including external factors (e.g., weather and road condition) significantly improves predictive accuracy (R² = 0.842, p < 0.05).

Hypothesis Evaluation:

H₀ (Null Hypothesis): Rejected.

H₁ (Alternative Hypothesis): Accepted.

Conclusion: External factors such as weather and road condition significantly enhance the accuracy of predictive models for identifying high-risk areas.

The analysis confirms that external factors significantly improve collision predictions, with

weather being the most influential factor. Based on this finding, it is recommended to incorporate

both weather and road condition data in future predictive models to enhance forecasting

accuracy.

Both models were validated using:

Train-test split (80-20)

Time series cross-validation

Statistical significance testing (p<0.05).
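
The validation steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes the series is ordered in time and split chronologically (no shuffling), uses an expanding-window scheme as one common form of time series cross-validation, and treats the 216 monthly observations (2006-2023) as a placeholder. All function names and fold settings are hypothetical.

```python
def chronological_split(series, train_fraction=0.8):
    """Split an ordered series into train/test parts without shuffling."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def expanding_window_folds(n_obs, n_folds, min_train):
    """Yield (train_idx, test_idx) pairs where training data always precede test data."""
    fold_size = (n_obs - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough observations for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        # The last fold absorbs any remainder.
        test_end = n_obs if k == n_folds - 1 else train_end + fold_size
        yield list(range(train_end)), list(range(train_end, test_end))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Placeholder series: 216 values standing in for monthly counts, 2006-2023.
monthly_counts = list(range(216))
train, test = chronological_split(monthly_counts)
folds = list(expanding_window_folds(len(monthly_counts), n_folds=4, min_train=120))

print(len(train), len(test))
for train_idx, test_idx in folds:
    print(f"train on 0..{train_idx[-1]}, test on {test_idx[0]}..{test_idx[-1]}")
```

Keeping every test window strictly after its training window avoids leaking future observations into the fit, which matters when R² values such as those reported for the two models are compared.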

5. Discussion

The findings of this study provide critical insights into traffic safety in Toronto, focusing on the

patterns, external factors, and predictive modeling of fatal and serious injury collisions. These

insights carry significant implications for urban planning, road safety policy, and resource

allocation.

Broader Implications

High-Risk Neighborhoods

The identification of neighborhoods with significantly higher collision frequencies provides

actionable data for targeted safety interventions. High-risk neighborhoods, primarily located in

central and southern Toronto, correlate with high population densities, traffic volumes, and the

presence of commercial and transit hubs. These findings emphasize the need for location-specific

safety measures such as:

• Traffic calming measures (e.g., speed bumps, reduced speed limits).

• Enhanced pedestrian crossings and protected bike lanes.

• Improved road infrastructure, such as redesigned intersections and better lighting.

Collision Types

While vehicle-only collisions dominate in high-risk neighborhoods, the increased vulnerability of pedestrians and cyclists underscores the need for tailored safety interventions, including:

• Dedicated infrastructure, such as bike lanes and pedestrian safety islands.

• Reduced vehicle speeds in areas with significant pedestrian and cyclist activity.

• Awareness campaigns to educate drivers on sharing the road with vulnerable road users.

External Factors

Adverse weather conditions and peak traffic hours significantly influence collision rates, highlighting the importance of:

• Enhanced monitoring and enforcement during peak traffic hours.

• Improved road surface maintenance during adverse weather conditions.

• Public advisories to encourage safer travel during risky weather.

The minimal impact of seasonal variations suggests a more localized focus on daily and weekly

patterns, rather than large-scale seasonal interventions.

Predictive Modeling

The strong performance of predictive models, particularly those incorporating external factors, validates their use for forecasting future collision rates and hotspots. This capability enables:

• Proactive allocation of resources to high-risk areas.

• Data-driven strategies for anticipating and mitigating collision risks.

• Regular updates to models to ensure alignment with evolving traffic patterns.

Field and Industry Impact

Urban Planning

The findings provide actionable insights for designing safer roadways, pedestrian zones, and

cycling lanes in high-collision areas. Incorporating these insights into urban development plans

can help reduce traffic-related injuries and fatalities.

Policy Development

Evidence-based enforcement policies, such as increased monitoring during adverse weather and

peak hours, are supported by the study's findings. These strategies align with broader road safety

initiatives.

Vision Zero Goals

The study supports Toronto’s Vision Zero initiative by offering data-driven strategies to

eliminate serious and fatal collisions. By focusing on high-risk neighborhoods and collision

types, these strategies can significantly improve road safety outcomes.

6. Limitations

Data Constraints

The absence of variables like pedestrian density and real-time traffic volume limited the scope of

some analyses. Incorporating such data in future studies would allow for more nuanced findings.

Prediction Uncertainty

Long-term forecasts exhibited wider confidence intervals, reflecting uncertainty over extended

timeframes. Regular updates to predictive models using new data are essential to maintain

accuracy.

Simplified Categories

Grouping injury severity at the accident level excluded nuances in individual injuries, which

might influence micro-level policy decisions.

Mitigation Strategies

• Enhancing Data Integration: Future research should incorporate additional datasets, such as real-time environmental and traffic data, to improve analysis accuracy.

• Model Updates: Regular recalibration of predictive models will address long-term uncertainty in forecasts and improve their reliability.

7. Recommendations

Immediate Actions (2024)

Neighborhood-Specific Measures

• Implement traffic calming measures in high-collision neighborhoods, such as speed bumps and reduced speed limits.

• Enhance pedestrian and cyclist infrastructure, focusing on non-vehicle collision hotspots.

Temporal and Environmental Focus

• Prioritize traffic enforcement during peak traffic hours and adverse weather conditions.

• Introduce automated monitoring systems to detect and mitigate weather-related risks.

Medium-Term Planning (2025–2026)

Resource Allocation

• Develop resource allocation plans targeting seasonal patterns and high-risk time periods, ensuring efficient use of resources.

Infrastructure Investments

• Upgrade road surface conditions and implement advanced traffic control systems in high-collision neighborhoods.

Public Awareness

• Launch targeted education campaigns emphasizing pedestrian and cyclist safety in high-risk areas.

Long-Term Strategy (2027–2028)

Predictive Modeling Enhancements

• Integrate real-time traffic and weather data into predictive models for dynamic safety planning.

• Build adaptive response capabilities to address uncertainties in long-term forecasts.

Holistic Safety Framework

• Develop a city-wide safety framework, prioritizing underserved neighborhoods and addressing systemic risks.

• Create tailored emergency response strategies for areas with high seasonal and temporal risk patterns.

8. Conclusion

This project highlights the critical role of data-driven approaches in understanding and mitigating

traffic safety risks. By focusing on Toronto's serious and fatal collision patterns, the study not

only provides a framework for identifying high-risk areas but also enhances the predictive

capabilities needed for proactive road safety measures.

Key insights into collision types, environmental factors, and temporal trends offer actionable

knowledge that bridges the gap between traffic data and practical interventions. The predictive

models developed demonstrate strong reliability, underscoring their potential for forecasting

future risks and enabling preemptive action.

The methodological rigor applied, including spatial and statistical analyses, contributes to the

growing body of knowledge in urban traffic safety. By integrating findings with policy

recommendations, this project provides a roadmap for cities aiming to replicate or adapt the

Vision Zero model to their unique contexts.

Moving forward, the research invites further innovation in real-time data integration and

advanced analytics. It emphasizes the need for continuous collaboration between planners,

policymakers, and enforcement agencies to address dynamic urban safety challenges. The project

not only supports Toronto's Vision Zero ambitions but also serves as a template for advancing

global road safety initiatives.

References

CARSP. (2021). Evaluating the effectiveness of left-turn calming measures in Toronto. Canadian Association of Road Safety Professionals. Retrieved October 20, 2024, from https://www.carsp.ca

City of Toronto. (2023). Vision Zero Road Safety Plan Overview. Toronto Vision Zero. Retrieved October 20, 2024, from https://www.toronto.ca/services-payments/streets-parking-transportation/road-safety/vision-zero/

London Road Safety Review. (2019). Five-Year Road Safety Summary. London City Safety Archives. Retrieved October 20, 2024, from https://www.london.gov.uk/what-we-do/environment/london-road-safety-review

New York City DOT. (2020). Vision Zero Annual Report. NYC Department of Transportation. Retrieved October 20, 2024, from https://www1.nyc.gov/html/dot/html/pedestrians/vision-zero.shtml
