Accident Data Analysis Using Machine Learning
https://doi.org/10.22214/ijraset.2022.44256
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue VI June 2022- Available at www.ijraset.com
Abstract: The main objective of this project is to analyze road accidents by examining accident-prone (hotspot) areas and
their root causes. Road accidents are a serious threat in developed and underdeveloped countries alike, and road safety has
been a major worldwide concern for years. Heavy traffic and reckless driving occur in every part of the world, and many
pedestrians become victims through no fault of their own. Road accidents arise from numerous factors such as atmospheric
changes, sharp curves, and human error. Injuries caused by road accidents can be serious yet sometimes go unnoticed at first
and affect health later on. This study analyzes road accidents in Bengaluru, a major metropolitan city, using five machine
learning regression algorithms: Linear Regression, Polynomial Regression, Decision Tree Regressor, Support Vector Regressor,
and Random Forest Regressor.
Keywords: Machine Learning Algorithms such as Random Forest, Support Vector Machine, Linear Regression, Polynomial
Regression, Decision Tree.
I. INTRODUCTION
The main objective of this project is to analyze road accidents by examining accident-prone (hotspot) areas and their root
causes. Road accidents are a serious threat in developed and underdeveloped countries alike, and road safety has been a major
worldwide concern for years. Heavy traffic and reckless driving occur in every part of the world, and many pedestrians become
victims through no fault of their own. Road accidents arise from numerous factors such as atmospheric changes, sharp curves,
and human error. Injuries caused by road accidents can be serious yet sometimes go unnoticed at first and affect health later
on [1]. This study analyzes road accidents in Bengaluru, a major metropolitan city, using five machine learning regression
algorithms: Linear Regression, Polynomial Regression, Decision Tree Regressor, Support Vector Regressor, and Random Forest
Regressor [2].
©IJRASET: All Rights are Reserved | SJ Impact Factor 7.538 | ISRA Journal Impact Factor 7.894 | 2203
C. Existing System
Data analysis is typically carried out by a data analyst or data scientist. Most data scientists use the K-means clustering
algorithm to predict the areas where accidents occur most often.
The K-means clustering model produced low accuracy. It made quite a few wrong predictions, flagging locations as accident
spots that were not, and so produced many false positives. K-means is therefore not the preferred model here. In this project
we instead use five machine learning algorithms: Linear Regression, Polynomial Regression, Decision Tree Regression, Support
Vector Regression, and Random Forest Regression. From these five we choose the algorithm that gives a good estimate of the
generalization error and is resistant to overfitting [3]. These algorithms have been found to produce good accuracy and
precision. By comparing their predictions, we look for the algorithm that predicts more correctly than K-means. The algorithm
with the best regression score is taken as the best algorithm for analyzing the data, and it is then used to analyze where
accidents occurred.
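The selection step described above can be sketched as follows. This is a minimal illustration, assuming scikit-learn is
available; the synthetic data, the 5-fold cross-validated R² score used as the "regression score", and all hyperparameters
are assumptions made for the example and are not taken from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (NOT the accident dataset): two features, one target.
rng = np.random.default_rng(42)
X = rng.uniform(0, 5, size=(150, 2))
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0, 0.2, size=150)

# The five regressors named in the text, with assumed default-like settings.
models = {
    "Linear Regression": LinearRegression(),
    "Polynomial Regression": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "Decision Tree Regressor": DecisionTreeRegressor(random_state=0),
    "Support Vector Regressor": make_pipeline(StandardScaler(), SVR()),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Mean R^2 over 5 folds estimates how well each model generalizes to unseen data.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in models.items()
}
best_name = max(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean R^2 = {score:.3f}")
print("Selected:", best_name)
```

Scoring on held-out folds rather than on the training data is what makes the comparison sensitive to overfitting: a model
that memorizes the training set scores well there but poorly on the folds it has not seen.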
III. METHODOLOGY
A. Linear Regression
Linear regression analysis is used to predict the value of one variable from the value of another [3]. The variable to be
predicted is called the dependent variable; the variable used to predict it is called the independent variable [3].
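As a small worked illustration of the dependent/independent variable relationship, the sketch below fits a straight line by
ordinary least squares using NumPy. The numbers are made up for the example (they follow y = 2x + 1 exactly) and do not come
from the accident data; the helper `predict` is a hypothetical name.

```python
import numpy as np

# Made-up toy data: x is the independent variable, y the dependent variable.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])  # constructed as y = 2x + 1

# Design matrix: one column for x and a column of ones for the intercept.
A = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares: find slope and intercept minimizing ||A @ w - y||^2.
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

def predict(x_new):
    """Predict the dependent variable for a new value of the independent variable."""
    return slope * x_new + intercept

print(slope, intercept, predict(6.0))
```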
E. Random Forest
Random Forest Regression is a supervised learning algorithm that uses an ensemble learning method for regression. Ensemble
learning is a technique that combines the predictions of multiple models (in a random forest, many decision trees) to make a
more accurate prediction than a single model [4].
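The ensemble idea can be made concrete with a short sketch, assuming scikit-learn is available. The data is synthetic and
the settings (50 trees, fixed random seeds) are assumptions for illustration only. The example checks that the forest's
prediction is the average of the predictions of its individual trees.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (not the accident dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, size=200)

# Fit a forest of 50 decision trees, each trained on a bootstrap sample.
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X, y)

# Collect each tree's prediction for the first five rows ...
per_tree = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])

# ... and compare their average with the forest's own prediction.
ensemble_pred = forest.predict(X[:5])
print(np.allclose(per_tree.mean(axis=0), ensemble_pred))
```

Averaging over many trees reduces the variance of any single tree, which is why the forest tends to be less prone to
overfitting than one deep decision tree.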
IV. IMPLEMENTATION
A. Data Collection and Analysis
The data for this project was collected from external websites such as kaggle.com. For the data analysis we installed the
following Python modules:
1) PANDAS - A Python library for data analysis that is widely used in machine learning tasks.
2) NUMPY - Short for 'Numerical Python', this module performs mathematical operations on arrays, such as trigonometric,
statistical, and algebraic routines.
3) MATPLOTLIB - A Python library used for data visualization [6].
4) SEABORN - A Python library used for making statistical graphs [6].
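To show how these modules fit together in a hotspot analysis, here is a minimal sketch using pandas and NumPy. The records,
the column names `area` and `casualties`, and the area labels are hypothetical placeholders, not the schema or contents of
the dataset used in the project.

```python
import numpy as np
import pandas as pd

# Hypothetical accident records; values are invented for illustration.
records = pd.DataFrame({
    "area": ["Area A", "Area A", "Area B", "Area A", "Area B", "Area C"],
    "casualties": [1, 0, 2, 3, 1, 0],
})

# Count accidents and total casualties per area, most accidents first.
per_area = records.groupby("area")["casualties"].agg(["count", "sum"])
per_area = per_area.sort_values("count", ascending=False)

# The area with the most recorded accidents is the candidate hotspot.
hotspot = per_area.index[0]

# NumPy handles the array-level statistics.
mean_casualties = float(np.mean(records["casualties"].to_numpy()))

print(per_area)
print("Hotspot:", hotspot, "| mean casualties per accident:", round(mean_casualties, 2))
```

The resulting `per_area` table is the kind of summary that Matplotlib or Seaborn would then plot, for example as a bar
chart of accidents per area.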
B. Data Cleaning
The datasets were compiled from different manually recorded materials, so some of the data values are incomplete, noisy, or
inconsistent. Raw real-world datasets are not directly suitable for analysis or for making efficient decisions, because most
of the time they unintentionally contain 'dirty data'. In machine learning, datasets must be in a proper format before a
model can be trained, and the quality of the model's output depends on the quality of its input. The dataset should therefore
be preprocessed intensively before analysis in order to extract valuable trends from the historical data.
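A minimal cleaning sketch with pandas is given below, covering the three problems named above (incomplete, noisy, and
inconsistent values). The sample rows, the column names, the helper `clean`, and the choice of median imputation are
assumptions for illustration; the actual preprocessing steps used in the project are not specified in the text.

```python
import numpy as np
import pandas as pd

# Hypothetical raw records showing typical problems: stray whitespace,
# inconsistent letter case, a missing count, a missing area, and a duplicate.
raw = pd.DataFrame({
    "area": [" area a", "Area A", "AREA B ", "Area B", None],
    "casualties": [1, 1, np.nan, 2, 4],
})

def clean(df):
    """Return a cleaned copy of df; the input frame is left unchanged."""
    out = df.copy()
    # Inconsistent labels: trim whitespace and normalize letter case.
    out["area"] = out["area"].str.strip().str.title()
    # Incomplete rows: a record without an area cannot be assigned to a hotspot.
    out = out.dropna(subset=["area"])
    # Missing counts: fill with the column median (an assumed imputation choice).
    out["casualties"] = out["casualties"].fillna(out["casualties"].median())
    # Exact duplicates left after normalization are dropped.
    return out.drop_duplicates().reset_index(drop=True)

cleaned = clean(raw)
print(cleaned)
```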
V. CONCLUSION
The principal aim of our project is to choose the best of five machine learning regression algorithms, helping data
scientists analyze the data efficiently and easily. At the end of the project we conclude that one of the five algorithms
can be selected as the best for analyzing the data.
VI. FUTURE SCOPE
The main goal of this project is to decrease the accident rate.
To reach this goal, the accident rate must first be analyzed.
This helps data scientists use their time efficiently, since they spend most of their time on data analysis.
REFERENCES
[1] Python Crash Course, Eric Matthes
[2] Head-First Python, Paul Barry
[3] Classification and Regression Trees, Leo Breiman, Jerome Friedman
[4] Decision Trees, Discriminant Analysis, Logistic Regression, SVM, Ensemble Methods and KNN with MATLAB
[5] The Hanford Plaintiffs, Trisha T. Pritikin, Richard C. Eymann
[6] Python Data Science Essentials - Third Edition, Alberto Boschetti, Luca Massaron