Project Report Data Visualization
J COMPONENT
REVIEW REPORT
TEAM MEMBERS
Submitted to
LIST OF TOPICS
1 INTRODUCTION
1.1 Abstract
1.2 Introduction
1.3 Objectives
2 PROBLEM STATEMENT
3 DATA COLLECTION
3.2 Description
4 TASKS
5 ACTIONS
6 IMPLEMENTATION
7 VALIDATION
8 SUGGESTIONS/CONCLUSION
1. INTRODUCTION
1.1 ABSTRACT
Roadway traffic safety is a major concern for transportation governing agencies as well as
ordinary citizens. To give safe-driving suggestions, careful analysis of roadway traffic
data is critical for identifying the variables most closely related to fatal accidents. The relationship
between the fatality rate and other attributes, including collision manner, weather, surface condition,
casualty severity, and driver age, was investigated.
Association rules were discovered with the Apriori algorithm, a classification model was built with a Naive
Bayes classifier, and clusters were formed with a simple K-means clustering algorithm. Safe-driving
suggestions were then made based on the statistics, association rules, classification model,
and clusters obtained.
1.2 INTRODUCTION
Investigations of the high-risk areas for road traffic crashes (RTCs) are urgently needed to guide
improvements in road safety. In this paper we apply statistical analysis and data visualisation
techniques to the FARS Fatal Accident dataset in an attempt to address this problem. We investigate
the relationship between the fatality rate and other attributes, including collision manner,
weather, surface condition, casualty severity, and driver age.
1.3 OBJECTIVE
2. PROBLEM STATEMENT
Roadway traffic safety is a major concern for transportation governing agencies as well as
ordinary citizens. To give safe-driving suggestions, careful analysis of roadway traffic
data is critical for identifying the variables most closely related to fatal accidents. India is a country
with high vehicle usage. The number of vehicles has increased drastically in the last 40
years, from 6 million to 230 million. With the vehicle count growing by about 9% per year,
the occurrence of road accidents has also risen sharply, which in turn has undermined
road safety for people in India.
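As a rough consistency check on the figures above, the average yearly growth rate implied by a rise from 6 million to 230 million vehicles over 40 years can be worked out with the compound-growth formula. The sketch below assumes steady compound growth, which is our simplification and not something the source data states; the function name is only illustrative.

```python
def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Average yearly growth rate, assuming steady compound growth."""
    return (end / start) ** (1 / years) - 1


# Figures quoted in the problem statement (millions of vehicles).
rate = implied_annual_growth(6, 230, 40)
print(f"Implied growth: {rate:.1%} per year")
```

Under that assumption the result comes out at roughly 9.5% per year, which is in line with the growth rate of about 9% quoted above.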
3. DATA COLLECTION
3.2 DESCRIPTION
Quantitative attributes present: All
Other attributes present: Based on each database
For example
4. TASKS
● Download datasets
● Data preparation
● Modelling
● Visualization
SYSTEM DESIGN
NumPy - NumPy is a Python library used for working with arrays. It also has functions for
linear algebra, Fourier transforms, and matrices.
Pandas - Pandas is a software library written for the Python programming language for data
manipulation and analysis. In particular, it offers data structures and operations for manipulating
numerical tables and time series.
Matplotlib - Matplotlib is a plotting library for the Python programming language and its
numerical mathematics extension NumPy. It provides an object-oriented API for embedding
plots into applications.
Plotly - The plotly Python library is an interactive, open-source plotting library that supports over
40 unique chart types covering a wide range of statistical, financial, geographic, scientific, and
3-dimensional use cases.
OrderedDict - OrderedDict is a dict subclass that preserves the order in which key-value pairs,
commonly known as items, are inserted into the dictionary.
Bar_chart_race - bar_chart_race is a Python package for making animated bar chart races with Matplotlib.
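As a small illustration of the OrderedDict behaviour described above, the sketch below inserts yearly totals in chronological order and then reorders them. The counts are made up for the example and are not taken from our datasets.

```python
from collections import OrderedDict

# Made-up yearly accident counts, inserted in chronological order.
yearly = OrderedDict()
yearly[2016] = 10
yearly[2017] = 12
yearly[2018] = 9

# Iterating over the dictionary follows the insertion order.
years = list(yearly)

# move_to_end is an OrderedDict method; here it sends 2016 to the back.
yearly.move_to_end(2016)
reordered = list(yearly)

print(years)      # order of insertion
print(reordered)  # order after move_to_end
```

Keeping categories in a known order like this is useful when the same ordering has to be reused across several charts.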
5. ACTIONS
● Data pre-processing
● Feature selection
● Training the model
● Statistical analysis
● Testing
● Output/ graphs prediction
6. IMPLEMENTATION
Percentage of accidents per 3-hour period
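A chart of this kind needs the share of accidents falling in each 3-hour window. The following is a minimal sketch of that calculation in plain Python, assuming each record carries an hour of day from 0 to 23; the function name and the sample hours are hypothetical and do not come from our datasets.

```python
from collections import Counter


def percent_per_3h(hours):
    """Share of accidents (in %) in each 3-hour bin, given hours 0-23."""
    bins = Counter((h // 3) * 3 for h in hours)
    total = sum(bins.values())
    if total == 0:
        raise ValueError("no accident records supplied")
    # Build all eight bins so that empty periods still appear as 0%.
    return {
        f"{start:02d}-{start + 3:02d}": 100 * bins.get(start, 0) / total
        for start in range(0, 24, 3)
    }


# Made-up accident hours, for illustration only.
sample_hours = [1, 2, 8, 9, 10, 17, 18, 19, 19, 23]
shares = percent_per_3h(sample_hours)
for period, share in shares.items():
    print(period, f"{share:.1f}%")
```

The resulting dictionary can be passed to any plotting library as bar labels and bar heights.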
Number of Accidents happening in particular time interval
Number of Accidents at various severity level with respect to speed zone
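The counts behind such a chart amount to a cross-tabulation of speed zone against severity. The sketch below shows one possible way to build it in plain Python; the record layout, the severity labels, and the sample values are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def severity_by_speed_zone(records):
    """Count accidents for each (speed zone, severity) combination."""
    table = defaultdict(Counter)
    for speed_zone, severity in records:
        table[speed_zone][severity] += 1
    return table


# Made-up (speed zone, severity) pairs, for illustration only.
sample = [
    (60, "minor"), (60, "fatal"), (60, "minor"),
    (100, "fatal"), (100, "serious"), (100, "fatal"),
]
counts = severity_by_speed_zone(sample)
for zone in sorted(counts):
    print(zone, dict(counts[zone]))
```

Each inner Counter gives the bar heights for one speed zone, so the table maps directly onto a grouped or stacked bar chart.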
Number of victims died with respect to gender
Percentage change in the accidents with previous year
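The year-on-year change is the difference from the previous year's total divided by that total. A minimal sketch of the calculation follows, assuming yearly totals are available as a mapping from year to count; the function name and the sample totals are hypothetical.

```python
def yearly_percent_change(counts_by_year):
    """Percentage change in accidents relative to the previous year."""
    years = sorted(counts_by_year)
    return {
        year: 100 * (counts_by_year[year] - counts_by_year[prev]) / counts_by_year[prev]
        for prev, year in zip(years, years[1:])
    }


# Made-up yearly totals, for illustration only.
sample_totals = {2015: 200, 2016: 220, 2017: 198}
changes = yearly_percent_change(sample_totals)
for year, change in changes.items():
    print(year, f"{change:+.1f}%")
```

The first year has no entry because there is no earlier year to compare it with.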
7. VALIDATION
● Our primary aim is to analyse the data by applying statistical analysis and data
visualisation techniques to various accident datasets.
● Safe-driving suggestions were made based on the statistics, association rules,
classification model, and clusters obtained.
● These suggestions come from analysing the data and comparing trends to find out why and how such
factors affect road accidents.
● The main motivation for our project is that in India approximately
465033 people lose their lives every year because of road accidents.
● India’s young, productive population, aged 18-45 years, is involved in 70% of road
accidents.
● Through this project we therefore aim to analyse trends and data related to road
accidents in India and to give a detailed analysis of these accidents, which may be very
helpful in controlling them.
● This analysis can make people more aware of how accidents happen so that they can
take precautions while driving.
8. SUGGESTIONS/ CONCLUSIONS