Analysis of Road Traffic Fatal Accident Using Data Mining Techniques
mining methods into analysis of traffic accidents on the Finnish roads. The data sets
collected from traffic fatal accidents are huge, multidimensional, and heterogeneous.
Moreover, they may contain incomplete and erroneous values, which makes their
exploration and understanding a very demanding task. The target data of this study
was collected by the Finnish Road Administration. The intention is to
investigate the usability of robust clustering, association rules and frequent itemsets, and
visualization methods for road traffic accident analysis. While the results show that
the selected data mining methods are able to produce understandable patterns from
the data, the discovery of more useful information could be enhanced with more detailed and
comprehensive data sets. The K-means algorithm takes the accident frequency count as a
parameter to cluster the locations. We then used association rule mining to
characterize these locations. The rules revealed different factors
associated with road accidents at different locations with varying accident
frequencies. The association rules for high-frequency accident locations disclosed
that intersections on highways are more dangerous for every type of fatal accident.
TABLE OF CONTENTS
2 LITERATURE SURVEY
2.1. RELATED WORK
4 METHODOLOGY
4.1. HARDWARE REQUIREMENTS
4.2. SOFTWARE REQUIREMENTS
4.3. SYSTEM ARCHITECTURE
4.4. APPLICATION OF JAVA & SQL
4.5. DEEP LEARNING
4.6. SYSTEM DESIGN AND TESTING PLAN
4.7. ALGORITHM
4.8. MODULES
6 CONCLUSION
6.1. CONCLUSION
REFERENCES
APPENDICES
A. SOURCE CODE
B. SCREENSHOTS
C. PLAGIARISM REPORT
LIST OF FIGURES
Fig. No. Figure Name
1. SYSTEM ARCHITECTURE
2. ED DIAGRAM
3. USE CASE DIAGRAM
4. SEQUENCE DIAGRAM
5. COLLABORATION DIAGRAM
6. PLATFORM INDEPENDENT
7. COLLECTION FRAMEWORK
8. MANAGEMENT STUDIO
9. QUERY EDITOR
10. NEW DATABASE
11. ADMIN LOGIN PAGE
12. REPORT ACCIDENT MODULE
13. GANTT CHART
LIST OF ABBREVIATIONS
ABBREVIATIONS EXPANSION
CHAPTER 1
INTRODUCTION
1.1 OVERVIEW
Many vehicles travel on the roads every day, and traffic accidents
can happen at any time and anywhere. Some accidents are fatal, meaning that people die
in them. As human beings, we all want to avoid accidents and stay safe. To find
out how to drive more safely, data mining techniques can be applied to traffic accident
datasets to extract valuable information and thus offer driving suggestions. Data mining
uses many different techniques and algorithms to discover relationships in large
amounts of data. It is considered one of the most important tools in information technology
in recent decades. Association rule mining is a popular methodology for
identifying significant relations between the data stored in large databases, and it also
plays a very important role in frequent itemset mining. A classical association rule
mining method is the Apriori algorithm, whose main task is to find frequent itemsets; it
is the method we use to analyze the roadway traffic data. Classification in data mining
aims at constructing a model (classifier) from a training data set that can
be used to classify records with unknown class labels. The Naive Bayes technique is one
of the most basic probability-based methods for classification; it is based on Bayes'
theorem with the assumption of independence between each pair of variables given the class. We
used the FARS (Fatality Analysis Reporting System) dataset for our study. The Fatal Accidents Dataset contains all fatal
accidents on public roads in 2007 reported to the National Highway Traffic
Safety Administration. The dataset was downloaded from California Polytechnic State
University, and all data originally came from FARS. The dataset contains 37,248 records
and 55 attributes. The data description can be found in the FARS document.
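To make the frequent itemset idea concrete, the following is a minimal sketch of Apriori in Python. It is an illustration only, not the implementation used in this project: the function name `apriori`, the toy records, the attribute values such as `wet_road`, and the 0.4 support threshold are all assumptions chosen for the example and are not taken from the FARS data.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = keep_frequent({frozenset([item]) for item in items})

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = keep_frequent(candidates)
    return frequent


# Hypothetical accident records; each record is a set of attribute values.
records = [
    {"wet_road", "night", "alcohol"},
    {"wet_road", "night"},
    {"dry_road", "day"},
    {"wet_road", "night", "alcohol"},
    {"dry_road", "night", "alcohol"},
]

frequent = apriori(records, min_support=0.4)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), -kv[1])):
    print(sorted(itemset), round(s, 2))
```

The prune step is what distinguishes Apriori from brute-force counting: a candidate is only counted against the data if all of its smaller subsets were already found to be frequent.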
A literature survey is the most important step in the software development process. Before
developing the tool and its associated design, it is necessary to determine and survey the time
factor, resource requirements, manpower, economy, and company strength. Once these
things are satisfied, the next step is to determine the software specifications of the system: which
operating system and language the project requires, and which other software is needed to
proceed with developing the tool and its associated operations. Once the
programmers start building the tool, they need a lot of external support, which
can be obtained from senior programmers, from books, or from websites. These
considerations were taken into account before building the proposed system.
CHAPTER 2
LITERATURE SURVEY
CHAPTER 3
Analyzing traffic accidents with data mining techniques could possibly reduce the
fatality rate. A road safety database makes it possible to reduce fatalities by implementing
road safety programs at local and national levels. Classification models have been used to predict the
severity of injuries sustained in traffic accidents. Association rule mining
has also been applied to a dataset of traffic accidents gathered from a Government
Traffic Office: the Apriori and Predictive Apriori association rule algorithms were applied to
the dataset to investigate the connection between recorded accidents and the factors contributing to
accident severity.
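As an illustration of a classification model for injury severity, the sketch below implements a small categorical Naive Bayes classifier, the technique described in the overview. It is a sketch under stated assumptions rather than the model built in this project: the function names, the two attributes (light condition and alcohol involvement), the class labels, and the six training rows are hypothetical, and add-one (Laplace) smoothing is an added choice to avoid zero probabilities.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Collect the counts needed to estimate P(class) and P(value | class)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    values_seen = defaultdict(set)       # attribute index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict(model, row):
    """Return the class with the highest log-posterior for one record."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            # Add-one smoothing (an assumption of this sketch).
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(values_seen[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: (light condition, alcohol involved) -> severity.
rows = [
    ("dark", "yes"), ("dark", "yes"), ("dark", "no"),
    ("daylight", "no"), ("daylight", "no"), ("daylight", "yes"),
]
labels = ["fatal", "fatal", "fatal", "non_fatal", "non_fatal", "non_fatal"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("dark", "yes")))
print(predict(model, ("daylight", "no")))
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once the number of attributes grows toward the 55 in the dataset.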
This paper presents our research to model the severity of injury resulting from
traffic accidents using artificial neural networks and decision trees. We have applied
them to an actual data set obtained from the National Automotive Sampling System
(NASS) General Estimates System (GES). Experimental results reveal that in all
cases the decision tree outperforms the neural network. Our analysis also
shows that the three most important factors in fatal injury are the driver's seat belt usage, the
light condition of the roadway, and the driver's alcohol usage. Our experiments also showed
that the model for fatal and non-fatal injury performed better than the models for other classes. The
ability to predict fatal and non-fatal injury is very important, since driver fatalities have
the highest cost to society, both economically and socially.
CHAPTER 4
METHODOLOGY
4.1 HARDWARE REQUIREMENTS
System    - Pentium IV
Speed     - 2.4 GHz
Hard disk - 40 GB
Monitor   - 15" VGA color
RAM       - 512 MB