See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/357241281

Predicting Factors of Vehicle Traffic Accidents Using Machine Learning Algorithms: In the Case of Wolaita Zone

Article in International Journal of Scientific Research in Computer Science Engineering and Information Technology · August 2020


All content following this page was uploaded by Aklilu Elias on 22 December 2021.



International Journal of Scientific Research in Computer Science and Engineering (Research Paper)
Vol.8, Issue.4, pp.105-115, August (2020), E-ISSN: 2320-7639

Predicting Factors of Vehicle Traffic Accidents Using Machine Learning Algorithms: In the Case of Wolaita Zone

Aklilu Elias Kurika1*, Tigist Simon Sundado2
1,2 Department of IT, School of Informatics, Wolaita Sodo University, Sodo, Ethiopia

*Corresponding Author: [email protected]/[email protected], Tel +251 91 648 5472, +251 94 903 2208

Available online at: www.isroset.org


Received: 28/Apr/2020, Accepted: 06/Jun/2020, Online: 31/Aug/2020
Abstract- Vehicle traffic accidents are a major agenda item for the government, which has given special attention to continuously reducing their occurrence and the related risks. Wolaita Zone is one of the major areas in which vehicle traffic accidents are increasing. Government and concerned bodies have given special attention to reducing the accident rate in the country. With this as the motivating factor, this work attempts to predict the factors of vehicle accidents using machine learning algorithms. We used an unbalanced dataset of 1611 instances covering seven years of data, 2005-2011 E.C. The KDD process model was applied to analyze the data and evaluate the patterns in the dataset. The learning algorithms applied in the experiments were decision tree classifiers (J48, Random Forest and Rep tree) and Bayesian classifiers (Naïve Bayes and Bayesian network). The experimental results, model evaluation and performance measurement show that the F-measures of the J48 and Rep tree classifiers are comparatively similar, i.e. 97.87% and 97.80% respectively, and that Random Forest performed worse, i.e. 90.9%. We identified the 1st J48 experiment as the best model by performance; 23 best rules were generated from this experiment, and the best features were also identified. The most common victims, the vehicles most commonly involved in accidents and the black spot areas with frequent accident occurrences were identified. The findings of this study are significant for the road and traffic authority and the police commission in revising and endorsing the rules, regulations and standards related to traffic accidents, so that vehicle traffic accidents and related risks can be reduced in Ethiopia in general and in Wolaita Zone in particular. We also prepared the accident data for further analysis, so that future researchers can extract the most important patterns from the dataset.

Keywords— Vehicle traffic accident, Decision Tree, Bayesian Classifiers, Machine Learning Algorithms, Performance
measurement

I. INTRODUCTION

Road or vehicle traffic accidents are a universal problem. Worldwide reports show that, on average, more than four million people die each year from various causes, according to Micheale [1]. Among these causes, HIV/AIDS and tuberculosis are the first and second leading causes of death, and vehicle traffic accidents are the third. More than half of the people killed in vehicle crashes were young adults aged between 15 and 44 years, often the breadwinners of a family. Furthermore, road vehicle accident injuries cost low-income and middle-income countries between 1% and 2% of their gross national product, which is more than the total development aid received by these countries, according to WHO and World Bank [2]. The same study shows that, worldwide, an estimated 1.2 million people are killed by road vehicle accidents each year and as many as 50 million are injured. It was also projected that by 2020 vehicle traffic accidents would be the leading cause of death in the world, as stated by Guardian [3]. A lot of research has been conducted on accidents in every part of the world to reduce the accident rate, and researchers have taken their own view of accident data according to their respective areas and country perspectives.

Even though plenty of research has been conducted, vehicle traffic accidents are increasing rapidly and result in massive loss of human life, material damage and other equivalent losses. Projections indicate that these figures will increase by 65% over the next 20 years unless there is new commitment to prevention.

Increased losses and related injuries have caused various problems for the economic development of the respective countries. From the perspectives of different countries, there are diverse kinds of attributes and contributing factors of vehicle traffic accidents. Accident risk factors have mostly been determined in the developed countries, and some preventive measures have been taken to reduce them. But traffic accident risks, related material damage and loss of life increase from time to time in developing countries. These points are the motivating factors for this study. In the case of Ethiopia, some research has been conducted, but the risk factors have not been reduced over time. In the case of Wolaita Zone, recorded data and the reality on the ground show that traffic accidents are a major issue that ought to be given special attention. The reason

© 2020, IJSRCSE All Rights Reserved 105


Int. J. Sci. Res. in Computer Science and Engineering Vol.8, Issue.4, Aug 2020

is that the risks of traffic accidents and the related material and life losses show an enormous increase from time to time. But the reasons for the increase in traffic accidents are not well known. Additional in-depth analysis of accident data is indeed needed, and this is also a motivating factor for conducting this study with machine learning algorithms.

Therefore the purpose of this study is to predict the factors of vehicle traffic accidents using machine learning algorithms in order to determine the attributes that most strongly determine the increased accident rate, the most common victims of accidents, the vehicles most commonly involved in accidents, the black spot areas with frequent accident occurrences and the best machine learning algorithms for the analysis; to generate important rules for the occurrence of accidents; to build the predictive model; and finally to evaluate the performance of the model. All these objectives were attained, as can be seen from the experimental results.

II. RELATED WORK

Road features are one of the contributing factors of traffic accidents, and they are related to the locations of accident-related factors; accident data is basic to identifying these features. The study in [7] used a small amount of secondary data, but the types of road features were not clearly specified [8], [9] and [20]. The involvement rates of two-wheeler vehicles were used to determine accident-prone locations; other types of vehicles were not considered in determining the most common accident areas, according to the researcher [9].

The amount and type of data (primary or secondary) used for a study also matter for building a model with better performance [9], [10] and [13]. The data used was small, and it was both primary and secondary (social media) data collected by questionnaire. Secondary data is not feasible for analysis, as all data scientists know. The problem of these studies was that the researchers used only secondary data; another limitation is that the method used was not scientific; and finally there is no evaluation parameter for the performance and accuracy of the work. Only decision tree algorithms were used by [8]. The studies performed by authors [11] and [16] were on selected features of data sets to determine symbolic descriptions. Here, the author used only one algorithm; other issues related to accidents were not considered.

A comparative analysis of the performance and accuracy of algorithms was studied in detail by authors [10] and [11]. The first author compared six algorithms (classification and regression tree, random forest, ID3, functional trees, naïve Bayes and J48) to determine accident severity level. It revealed that the naïve Bayes and J48 techniques were approximately the same in accuracy. The second is a comparative study of machine learning algorithms; the comparison was made between decision trees and neural networks to determine the factors of increased traffic injury. It found that decision trees are better than neural networks in performance.

The definite factors of traffic accidents have been studied and identified by different researchers, and their findings show that the causal factors were un-adopted speech, in-attention, behavior of passengers, roadway features, demographic features, environmental characters, technical characters, speed, age, gender, younger aged drivers, alcohol, less control, wrong over-taking and tire blow [10], [11], [12], [20], [21] and [24]. These factors were identified in various areas as the contributing factors for accidents. But it is impossible to blindly take control measures over all of these characteristics in a particular area. Akinbola et al. used machine learning algorithms to predict the factors of traffic accidents [14] and [15]. Classification and machine learning algorithms were used by Gupta and Baluni to determine traffic injury occurrences [11]. Both of these authors used only decision trees; the work of Tibebe et al. was about machine learning algorithms, but not for determining the causes of traffic accidents [16]. Experimental findings show that the majority of participants in vehicle traffic accidents were females aged between 30 and 59 years, with primary or secondary education levels. Using multivariate logistic regression models, the researchers revealed that white people accounted for 48.1% of participants and that 61.2% were living with partners [22].

Work on classification algorithms and artificial intelligence has comparatively similar findings, as presented in [25] and [26].
Generally, the amount of data used by some researchers was small and not suitable for analysis, such as social media data and secondary data collected by questionnaire. Using such data for predicting the factors of vehicle traffic accidents is not feasible. Most studies were conducted only with the J48 decision tree algorithm. Performance comparisons were made for only two algorithms, and only three types of vehicles participating in traffic accidents were identified. In the case of Wolaita, various vehicles, from the smallest to heavy ones (vans and trucks), flow on the road day to day. Most researchers used only decision tree algorithms; Bayesian networks and Naïve Bayes were not widely used alongside them. The accuracy of the predictive model for accident occurrence was also not good, i.e. 85%, and it was recommended that the work be repeated with a larger amount of data [26].

So predicting the factors of vehicle accidents is expected to identify the most common contributing factors, those that hold the lion's share. In Ethiopia, Wolaita Zone is one of the best-known areas in which traffic accidents and related injuries take place. By predicting factors with machine learning algorithms, the most contributing factors were determined from traffic accident data obtained from the Wolaita police commission.


III. METHODOLOGY

To address these problems, the researchers adopted the knowledge discovery in databases (KDD) process model as the study design. Its steps are shown in the following diagram.

Figure 1 Study design

A. Data Integration: To keep the data consistent, we integrated it into a common format according to our objectives and identified the attributes most important to our study. Some attributes in the original data were ignored because they are less meaningful to our study. Accordingly, 36 important attributes were identified and 1611 instances were prepared for analysis, covering 7 years of data from 2005-2011 E.C. The amount of data was limited to 1611 instances because five years of data (2000-2004) were burned before being transferred from the road and transport authority to the police commission.

B. Data Selection
In order to get data for prediction, applicable data was selected from the 12 districts and three city administrations of Wolaita Zone. The case study is limited to Wolaita Zone only, because we wanted to restrict the scope of our study to Wolaita.

C. Data Preprocessing: In this step, data cleaning, data reduction and data transformation were carried out to prepare the best quality dataset for further analysis. The original data was obtained from the Wolaita Zone police commission (PC), but it had many drawbacks, such as spelling errors, unreadable data, misspelt attribute names, unknown values for some attributes and irrelevant personal representations of some terms. Some terms were inconsistent and were considered to be outliers. We removed irrelevant attributes from the original data. In this step we cleaned the data before loading it into WEKA.

D. Data Transformation: The original data was in word processor documents, and some of it was in spreadsheet (Excel) documents. We transformed it to the .CSV format, which the WEKA workbench supports. Loading the data into WEKA is the next step after data transformation.

E. Algorithm Selection: Classification was identified as the best technique to attain our objectives given the dataset we had. From the various classification algorithms, decision tree classifiers (J48, Random Forest and Rep Tree) and Bayesian classifiers (Naïve Bayes and Bayesian Network) were selected to conduct our experiments. We ran 15 experiments, three for each classifier: one with 10-fold cross validation, one with a 66% split and one with a 90% split. We identified the 14 best features among the 36 attributes with the wrapper method.

F. Knowledge Generation: Finally, the researchers generated hidden knowledge from the prepared dataset with the proposed algorithms and reported the findings.

IV. EXPERIMENTS AND RESULT DISCUSSION
A. Most Commonly Participated Vehicles


Figure 2 Most commonly participated vehicles (share of accidents): Motor Cycle 31.60%, ISUZU 10.49%, Medium-Bus 9.68%, Minibus 7.20%, Trucker 5.65%, Sino-Truck 5.34%, Tagro-Bajaj 5.28%.

Of the 31 kinds of vehicles that participated in accidents, we identified 7 kinds as the most commonly participating. In contrast, [21] identified only vans and trucks as the most commonly participating vehicles. The seven kinds account for 75.34% of accidents, and the remaining 24 kinds account for only 24.66%. So we can conclude that if these vehicles were given separate roads in cities, especially Sodo City (>25%), traffic accidents could possibly be reduced.
B. Most Common Victims of Accidents
Figure 3 Most common victims
Class of victims: Pedestrians 40.16%, Material 27%, Passengers 19.93%, Drivers 11.73%, Animals 1.24%.
Sex of victims: Male 53.8%, Female 19.6%, Other 26.51%.
Age of victims: A (1-18) 18.75%, B (19-30) 30.54%, C (31-50) 18.56%, D (>50) 5.96%.


The above diagram shows that the most common victims of accidents are pedestrians (40.16%) and passengers (19.93%). Drivers are less often victims. So we can conclude that car traffic accidents most commonly affect pedestrians and passengers in our case study.

Males (53.8%) are more commonly affected by car traffic accidents than females (19.6%), which is the opposite of studies that found the majority of participants in accidents to be females [22] and [23]. 18.75% of victims were aged between 1-18, 30.54% were aged between 19-30 and 18.56% were aged between 31-50.

As it is known, the most productive human power is aged between 18 and 50. Therefore, as we can conclude from the above result, traffic accidents affect the most productive classes of the population.

C. Most Common Black Spot Areas
We selected 20 places with frequent accident occurrences from the five Woredas with the most accidents. We selected areas with >= 15 accidents within 7 years. Of all the accidents that occurred, these places account for 521 (32.34%). So concerned bodies must give attention to these areas.

Figure 4 Accident Occurrence Places
Bar chart: black spot areas with frequent accident occurrences among the 5 selected Woredas with more accidents.
Accident counts, in descending order: 62, 49, 46, 32, 29, 29, 26, 25, 24, 24, 22, 20, 19, 19, 17, 17, 16, 15, 15, 15.
Legend: Wadu, Tebela, Ajjif, Buge, Arada, Merkato, Kokate, In-Marry-Chuch, Golla, Taba, Otona, Dalbo, Larena, Kingnam, Infont of WSU, Shochora, Shasha Gale, Kawo Shafa, Fate, Gununo.
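The selection criterion described in this subsection (places with at least 15 accidents in 7 years) can be checked with a few lines of arithmetic. The sketch below is illustrative only: the counts are the bar values read off Figure 4, the names `place_counts`, `TOTAL_ACCIDENTS` and `black_spot_share` are hypothetical, and place names are left out because the bar-to-name mapping is not stated in the text.

```python
# Accident counts per black spot place, read off Figure 4 (20 bars).
# Place names are omitted: the bar-to-name mapping is not given in the text.
place_counts = [62, 49, 46, 32, 29, 29, 26, 25, 24, 24,
                22, 20, 19, 19, 17, 17, 16, 15, 15, 15]
TOTAL_ACCIDENTS = 1611   # instances in the dataset
THRESHOLD = 15           # minimum accidents in 7 years to count as a black spot


def black_spot_share(counts, total, threshold):
    """Return (places selected, accidents at those places, share of total in %)."""
    selected = [c for c in counts if c >= threshold]
    accidents = sum(selected)
    return len(selected), accidents, round(100 * accidents / total, 2)


places, accidents, share = black_spot_share(place_counts, TOTAL_ACCIDENTS, THRESHOLD)
print(f"{places} places, {accidents} accidents, {share}% of all accidents")
```

Under these assumptions the result agrees with the 521 accidents (32.34%) reported for the selected places.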

Of the 15 different areas (Woredas) considered, the first five (Sodo City, Damot Gale, Humbo, Sodo Zuria and Boditi City) account for most accidents, i.e. 73.37% of the total. The remaining 10 districts account for only 26.63%. Each of the first five accounts for more than 5% of all accidents, so we selected the black spot areas with frequent accident occurrences from these five Woredas.

Figure 5 Most Common Black spot areas
Bar chart: share of accidents by Woreda (in %), in descending order: 36.19, 12.48, 9.31, 8.5, 6.89, 4.9, 4.72, 3.72, 2.6, 2.15, 1.99, 1.92, 1.86, 1.36, 0.68.
Legend: Sodo City, Damot Gale, Humbo, Sodo Zuria, Boditi City, Boloso Sore, Damot Woyde, Areka City, Offa, Damot Fulasa, Diguna Fango, Kindo Koysha, Kindo Didaye, Damot Sore, Boloso Bombe.

D. Determinant Cases of Accidents- The most determinant cases of accidents are lack of attention (65.49%), over speed (10.62%), prohibiting priority (10.37%), lack of experience (6.33%) and technic failure (3.54%). The causality condition of accidents is mostly crossing the road (32.96%), straight crash (28.80%), roll down (16.70%), side to side crash (8.57%) and walking on the road (5.90%).


Figure 6 (a) Determinant cases of accidents
Case of accidents (in %): Lack of Attention 65.49, Over Speed 10.62, Prohibiting Priority 10.37, Technic Failure 3.54, Brake Failure 1.55, Driving at Night 1.49, Crossing Road Illegally 0.68.

Figure 6 (b) Determinant cases of Accidents
Bar chart: causality condition (in %), in descending order: 32.96, 28.8, 16.7, 8.57, 5.9, 2.42, 1.74, 0.93, 0.81, 0.75, 0.43.
Legend: Crossing Road, Straight Crash, Roll Down, Side to side, Walking on road, Turning to left, Driving Out, Behind Crash, Illegal Cross, Turning point, Driving back.

Table 1 Summary of Experimental Results

Exp     Model                       Test mode   NL   ST   TP Rate  FP Rate  Precision  Accuracy
Exp.1   Trees.J48                   10-fold     141  145  0.984    0.030    0.984      98.45%
Exp.2   Trees.J48                   Split=66%   4    5    0.989    0.015    0.989      98.90%
Exp.3   Trees.J48                   Split=90%   4    5    0.989    0.0014   0.989      98.91%
Exp.4   RandomForest                10-fold     -    -    0.921    0.237    0.926      92.12%
Exp.5   RandomForest                Split=66%   -    -    0.905    0.325    0.914      90.51%
Exp.6   RandomForest                Split=90%   -    -    0.901    0.301    0.912      90.06%
Exp.7   trees.REPTree               10-fold     4    5    0.984    0.026    0.984      98.386%
Exp.8   trees.REPTree               Split=66%   4    5    0.989    0.015    0.989      98.905%
Exp.9   trees.REPTree               Split=80%   4    5    0.991    0.003    0.991      99.07%
Exp.10  Bayes.NaïveBayes            Split=90%   -    -    0.946    0.068    0.948      94.60%
Exp.11  Bayes.NaïveBayes            Split=66%   -    -    0.954    0.066    0.956      95.438%
Exp.12  Bayes.NaïveBayes            Split=90%   -    -    0.969    0.027    0.971      96.894%
Exp.13  Weka.Classifiers.bayes.net  10-fold     -    -    0.942    0.061    0.946      94.165%
Exp.14  Weka.Classifiers.bayes.net  Split=66%   -    -    0.954    0.048    0.957      95.44%
Exp.15  Weka.Classifiers.bayes.net  Split=90%   -    -    0.969    0.010    0.972      96.894%

All experiments: Datasets=Unbalanced, Attributes=All.
Classifier options: Trees.J48 -C 0.5 -M 4; RandomForest -P 100 -I 100 -num-slots 1 -K 0 -M 1.0 -V 0.001 -S 1; trees.REPTree -M 2 -V 0.001 -N 3 -S 1 -L -1 -I 0.0; Bayes.NaïveBayes -output-debug-info.

Key: Exp: Experiment, NL: Number of Leaves, ST: Size of Tree, TP: True Positive, FP: False Positive
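The test modes listed in Table 1 (10-fold cross validation and percentage splits) partition the 1611 instances in different ways. The following is a minimal stdlib-only sketch of how such partitions can be generated; it is not the WEKA implementation (WEKA stratifies its folds by class, which this sketch does not), and the function names `percentage_split` and `k_fold` and the seed are hypothetical.

```python
import random


def percentage_split(n, train_pct, seed=1):
    """Shuffle indices 0..n-1 and split them into (train, test) by percentage."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_pct / 100)
    return idx[:cut], idx[cut:]


def k_fold(n, k=10, seed=1):
    """Yield (train, test) index lists for plain (unstratified) k-fold cross validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]      # k nearly equal-sized folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


N = 1611  # instances in the accident dataset
train66, test66 = percentage_split(N, 66)
train90, test90 = percentage_split(N, 90)
print("66% split:", len(train66), "train /", len(test66), "test")
print("90% split:", len(train90), "train /", len(test90), "test")
print("10-fold test sizes:", [len(test) for _, test in k_fold(N)])
```

With a percentage split each instance is tested at most once and only part of the data is ever used for testing, whereas in k-fold cross validation every instance appears in exactly one test fold.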

As we can see from the above experimental results and the diagram below, the J48 and Rep tree classifiers are comparatively similar in accuracy.
We computed the average precision and recall of J48 and Rep tree and selected the J48 decision tree algorithm as better than Rep tree.
1st Expt. J48 tree: Precision = 98% and Recall = 97.75% (FM = 97.87%)
1st Expt. Rep tree: Precision = 97.70% and Recall = 97.90% (FM = 97.80%)
The first J48 decision tree experiment includes more features than exp.2 and 3, even though the number of leaves and the size of the tree generated are larger.
So we selected it as the working model and generated 23 best rules from this particular experiment.
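The F-measure (FM) values quoted above follow from the harmonic mean of precision and recall, FM = 2PR / (P + R). A short sketch that reproduces them from the reported averages; the function name `f_measure` is illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both in the same unit, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


j48_fm = f_measure(98.0, 97.75)    # 1st experiment, J48 tree
rep_fm = f_measure(97.70, 97.90)   # 1st Rep tree experiment
print(f"J48 FM = {j48_fm:.2f}%, Rep tree FM = {rep_fm:.2f}%")
```

The difference between the two models is under 0.1 percentage points, which is why the text also weighs the number of features covered by each tree when choosing the working model.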


Figure 7 Diagrammatical representations of all experiments (True Positive, False Positive, Precision and Accuracy for Exp 1 to Exp 15)
Below are some of the best rules generated
1. If Severity of Accident = Material Damage and Class of Victims = Pedestrian and Time of Accident = Morning/Evening Then Fatal in Accident: Yes.
2. If Severity of Accident = Material Damage and Class of Victims = Pedestrian and Time of Accident = Night and Number of Victims > 2: Then Fatal in Accident: Yes.
3. If Severity of Accident = Material Damage and Class of Victims = Pedestrian and Time of Accident = Afternoon and Type of Crashes = Vehicle With Pedestrian: Then Fatal in Accident: No.
4. If Severity of Accident = Slight and Edu/n Level = Primary and Settlement of Road = Upward and Type of Causality Vehicle = Motor Cycle, ISUZU, ISUZU-Autobus, Minibus Then Fatal in Accident: Yes.
5. If Severity of Accident = Slight and Edu/n Level = Primary and Settlement of Road = Upward and Type of Crashed Vehicle != Motor Cycle Then Fatal in Accident: No.
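IF-THEN rules of this kind can be applied to a record with ordinary conditionals. The sketch below encodes only rules 1 to 3 above and is not the full J48 model; the function name `predict_fatal`, the dictionary representation of a record and the reading of "Morning/Evening" as either morning or evening are assumptions made for illustration.

```python
def predict_fatal(record):
    """Apply rules 1-3 listed above.

    Returns "Yes" or "No" for "Fatal in Accident", or None when none of
    the three rules matches (the full tree has more rules than shown here).
    """
    if (record.get("Severity of Accident") != "Material Damage"
            or record.get("Class of Victims") != "Pedestrian"):
        return None
    time = record.get("Time of Accident")
    # Assumption: "Morning/Evening" in rule 1 means either value.
    if time in ("Morning", "Evening", "Morning/Evening"):
        return "Yes"   # rule 1
    if time == "Night" and record.get("Number of Victims", 0) > 2:
        return "Yes"   # rule 2
    if (time == "Afternoon"
            and record.get("Type of Crashes") == "Vehicle With Pedestrian"):
        return "No"    # rule 3
    return None


example = {
    "Severity of Accident": "Material Damage",
    "Class of Victims": "Pedestrian",
    "Time of Accident": "Night",
    "Number of Victims": 3,
}
print(predict_fatal(example))
```

Returning None for unmatched records keeps the sketch honest about covering only a subset of the 23 rules.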

E. Performance Measurement of Learning Algorithms- In the experiment evaluation part, we identified that J48 and Rep tree are comparatively similar and better than the remaining three classifiers. So we selected the first and third experiments of each of these two classifiers and measured their accuracy as follows.
Table 2 Confusion Matrix of selected experimental results

Algorithm  Exp    Actual               Predicted Non-Fatal  Predicted Fatal  Recall   Accuracy
J48 Tree   Exp.1  Non-Fatal accidents  1213                 11               99.10%   98.45%
                  Fatal accidents      14                   373              96.40%
                  Precision            98.90%               97.10%
           Exp.3  Non-Fatal accidents  1217                 7                99.40%   98.76%
                  Fatal accidents      13                   374              96.60%
                  Precision            98.90%               98.20%
Rep Tree   Exp.7  Non-Fatal accidents  1211                 13               98.90%   98.39%
                  Fatal accidents      12                   375              96.90%
                  Precision            99%                  96.40%
           Exp.9  Non-Fatal accidents  1203                 21               98.30%   98.76%
                  Fatal accidents      0                    387              100%
                  Precision            100%                 95.20%
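The recall, precision and accuracy columns of Table 2 are derived from the confusion matrix counts. As a worked check, the sketch below recomputes them for Exp.1 (J48, 10-fold); the function name `class_metrics` and the nested-list layout are illustrative choices, not part of the original study.

```python
def class_metrics(matrix, labels):
    """Per-class recall and precision plus overall accuracy.

    matrix[i][j] is the number of instances of actual class i
    that were predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    report = {}
    for i, label in enumerate(labels):
        actual = sum(matrix[i])                    # row total
        predicted = sum(row[i] for row in matrix)  # column total
        report[label] = {
            "recall": matrix[i][i] / actual if actual else 0.0,
            "precision": matrix[i][i] / predicted if predicted else 0.0,
        }
    return report, correct / total


# Exp.1 counts from Table 2: rows are actual, columns are predicted.
exp1 = [[1213, 11],
        [14, 373]]
report, accuracy = class_metrics(exp1, ["Non-Fatal", "Fatal"])
for label, m in report.items():
    print(f"{label}: recall {m['recall']:.1%}, precision {m['precision']:.1%}")
print(f"accuracy {accuracy:.2%}")
```

Averaging the two per-class values gives roughly 98% precision and 97.7% recall, in line with the averages the paper reports for this experiment.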

F. Model Evaluations- Since our dataset was unbalanced, relying on accuracy alone to decide which model is best is misleading. In such cases, it is advisable to use precision and recall to decide whether one model is better than another. In our case, the four experiments listed above have comparatively similar precision and recall values. But the 1st and 7th experiments were computed with 10-fold cross validation, and the rest were computed with a 90% split for training and testing the model. According to expert judgment, a model with good predictive accuracy can be obtained from experiments performed with 10-fold cross validation. We therefore ignored the experiments with 90% split tests and accepted the experiments with cross validation tests. The 1st experiment (98% average precision and 97.75% average recall for the two class labels) and the 7th experiment (97.70% average precision and 97.90% average recall) were selected to determine the best model with good predictive accuracy for fatal and non-fatal accident occurrences.


Tester: weka.experiment.PairedCorrectedTTester -G 4 -D 1 -R 2 -S 0.05 -result-matrix "weka.experiment.ResultMatrixPlainText -mean-prec 2 -stddev-prec 2 -col-name-width 0 -row-name-width 25 -mean-width 0 -stddev-width 0 -sig-width 0 -count-width 5 -print-col-names -print-row-names -enum-col-names"
Analysing: F_measure
Datasets: 1
Resultsets: 6
Confidence: 0.05 (two tailed)
Sorted by: -
Date: 9/14/19 10:59 PM

Dataset (1) bayes.N | (2) baye (3) tree (4) tree (5) tree (6) tree
-------------------------------------------------------------------------
W-Z-A-Data(100) 0.96 | 0.96 0.99 v 0.95 0.99 v 0.88 *
-------------------------------------------------------------------------
(v/ /*) | (0/1/0) (1/0/0) (0/1/0) (1/0/0) (0/0/1)

Key:
(1) bayes.NaiveBayes
(2) bayes.BayesNet
(3) trees.J48
(4) trees.RandomForest
(5) trees.REPTree
(6) trees.RandomTree

(v/ /*) The symbol “v” marks a result that is significantly better than the base algorithm (1), the symbol “*” marks a result that is significantly worse than the base algorithm, and a blank means no significant difference.
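The tester named in the output above, PairedCorrectedTTester, is, to our understanding, based on the corrected resampled t-test, which inflates the variance of the paired differences to account for overlapping training sets across runs. The sketch below computes only the t statistic under that assumption; the function name `corrected_resampled_t` and the sample differences are hypothetical and are not taken from the experiment.

```python
import math
from statistics import mean, variance


def corrected_resampled_t(diffs, n_train, n_test):
    """t statistic for paired score differences between two classifiers.

    diffs holds one difference per run (k runs in total). The plain paired
    t-test divides the sample variance by k; the corrected version uses
    (1/k + n_test/n_train) instead.
    """
    k = len(diffs)
    var = variance(diffs)  # sample variance, k - 1 in the denominator
    if var == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / math.sqrt((1 / k + n_test / n_train) * var)


# Hypothetical F-measure differences over five runs of 10-fold cross
# validation, where each test fold is 1/9 the size of its training set.
diffs = [0.02, 0.03, 0.01, 0.04, 0.02]
t = corrected_resampled_t(diffs, n_train=9, n_test=1)
print(f"t = {t:.3f} with {len(diffs) - 1} degrees of freedom")
```

The statistic is then compared against the two-tailed critical value of Student's t distribution at the chosen significance level (0.05 in the output above) to decide between "v", "*" and blank.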

The above result shows that J48 tree and Rep tree perform significantly better than all the other classifiers on the given dataset. The Naïve Bayes and Bayesian network classifiers perform well, and the remaining two algorithms (Random forest and Random tree) perform poorly compared to the other classifiers on the given dataset.

V. DISCUSSIONS

In the feature selection experiment, the researchers identified the determinant factors for the increased vehicle accident occurrences: Accident Woreda, Specific Place, Month of Accident, Year of Experience, Crash Cost in Birr, Type of Crashes, Year of Accident and Day of Accident, together with the four determinant attributes identified from the decision tree rules.

From the generated decision tree rules, the researchers observed the ten most predictive attributes for vehicle traffic accident occurrences, apart from the splitting (root node) attribute, i.e. Severity of Accident, and the class (leaf node) attribute, i.e. Fatal in Accident. These attributes are Settlement of Road, Education level (for drivers), Type of Crashed Vehicle, Class of Victims, Time of Accident, Number of Victims, Type of Crashes, Age of Driver, Day of Accident and Type of Causality Vehicle.

The most common victims of accidents were also identified from the decision tree while traversing from the root node to the leaf nodes. They are pedestrians (40.16%), passengers (19.93%) and drivers (11.73%). The remaining 28.24% were material damage and animals. Again, 53.8% of victims were males and only 19.6% were females; the remaining 26.51% are others.

Regarding the age of victims, 18.75% were aged between 1-18, 30.54% were aged between 19-30, 18.56% were aged between 31-50 and only 5.96% were aged above 50. The most productive human power is aged between 19-30 and 31-50. Therefore we identified that traffic accidents most commonly affect the most productive classes of the population.

The most determinant cases of accident occurrences are lack of attention (65.49%), over speed (10.62%), prohibiting priority (10.37%), lack of experience (6.33%) and technic failure (3.54%). When we relate the case of accident to the causality condition, the conditions of accidents are mostly crossing the road (32.96%), straight crash (28.80%), roll down (16.70%), side to side crash (8.57%) and victims walking on the road (5.90%).

The vehicles most commonly involved in accidents are Motor Cycle, ISUZU, Medium-Bus, Minibus, Truck, Sino Truck and Tagro Bajaj, among 31 different kinds of vehicles. They account for 31.6%, 10.49%, 9.68%, 7.20%, 5.65%, 5.34% and 5.28% respectively.

The researchers identified the black spot areas with frequent accident occurrences at the Woreda level and at the specific place level. At the Woreda level, 5 Woredas were selected among 15 different places: Sodo City (36.19%), Damot Gale (12.48%), Humbo (9.31%), Sodo Zuria (8.5%) and Boditi City (6.89%). Specific


places with frequent accident occurrences in these Woredas are Wadu, Ajif, Arada, Merkato, In front of Marry Church, Kokate, Golla, Otona, Larena, Infront WSU, Buge, Taba, Shasha Gale, Tebela, Shochora, Dalbo, Kawo Shafa, Kingnam, Fate and Gununo.

From the experimental results and the performance measurement of the learning algorithms, we identified the best machine learning algorithms for vehicle traffic accident prediction. The 1st experiment (98% average precision and 97.75% average recall for the two class labels) and the 7th experiment (97.70% average precision and 97.90% average recall) were selected to determine the best model with good predictive accuracy for fatal and non-fatal accident occurrences. From these two experiments, we calculated the F-measure (harmonic mean of precision and recall) and identified that the 1st experiment is comparatively better than the 7th experiment. So we selected the first experiment (J48 tree) as the best algorithm and its classifier model as the best predictive model to predict the factors of vehicle traffic accidents and to generate important rules for vehicle traffic accident occurrences.

Finally, from the decision tree experiments, the J48 decision tree was identified as a better algorithm than Rep tree, and the 1st experimental model was selected to generate the best rules, because it holds most of the attributes identified in the best feature selection experiment, even though its number of leaves and size of tree are larger than in the other two experiments.
Accordingly, from the first experiment of the J48 decision tree, 23 best rules were generated in IF…Then form. These rules show the cases of various fatal and non-fatal accident occurrences. They also hold the most predictive attributes

(decision tree classifiers and Bayesian classifiers) were used to address the problems as the class labels are used for datasets. KDD process modeling was used as a study design.

We addressed various problem statements and objectives to determine the determinant factors of vehicle traffic accidents. From the experimental results, 11 attributes were selected as the most determinant factors for accident occurrences. Seven most commonly participating vehicles were identified, 20 areas with frequent accident occurrences were identified, pedestrians and passengers were identified as the most common victims, and the J48 and Rep tree classifiers were found to be better than the rest in performance and model accuracy. Comparatively, the J48 algorithm was selected as the best working model, and 23 best rules for accident occurrences were generated from it. The limitations of this study were the small amount of data used, the difficulty of obtaining suitable datasets and the existence of attributes with missing values. Another limitation is that only decision tree and Bayesian classifiers were used for prediction.

Therefore the researchers recommend that future researchers try accident prediction with techniques like support vector machines, multilayer perceptrons and artificial neural networks. The researchers also recommend that future researchers use convolutional neural networks with the Python programming language to get better results than those revealed in this study. It is also recommended that they add some unconsidered attributes to the datasets and relate cases to behavior of
for vehicle traffic accident occurrences. derivers like amount of alcohol taken and mental
normality of derivers to get better results. Try with deep
We also evaluated classifier models select best algorithms learning with large amount of instances to get better result
from 15 different experiments. We have used f-measure and integrate it with knowledge base to know cases for
for model evaluation. Finally we identified that J48 and accident occurrences to use is as an expert system.
Rep tree are comparatively best algorithms by f-measure
than Naïve Bayes and Bayesian Network classifiers and ACKNOWLEDGMENT
Random Forest tree is poorest by its f-measure than the The Authors acknowledge Mrs. Tigist Simon Sundado
rest four learning algorithms. (Wud-Mimi) for her unforgettable support during these
thesis accomplishments from the proposal session to the
VI. CONCLUSIONS AND FUTURE WORK final defense and the reviewers for their constructive
comments. Special thanks deserves for my lovely mother
In this study, machine Learning approaches have been Mrs. Zenebech Daka Daracho (Buluke) whom nursed me
applied for data analysis and prediction of vehicle traffic from baby to who I am right now.
accidents. The researcher used seven years accident
datasets which have been used to explore important REFERENCES
features and pattern relationships of datasets to predict
vehicle traffic accident occurrences. Dataset used for this [1] Micheale Kihishen Gebru, "Road traffic accident: Human
study was unbalanced and it was collected from Wolaita security perspective," International Journal of Peace and
Development Studies, vol. 8, no. ISSN 2141–6621, pp. 16,
Zone police commission; it was 1611 instances with 36
March 2017.
attributes. The researchers used F-measure for [2] WHO and World Bank, "World Report on Traffic Injury
performance measurement of the model. The reality Preventions," New York, 2013.
behind is; accuracy is used to measure performance of the [3] Guardian, “Traffic Accident Predictions,” The Guardian
model if and only if the dataset used for experiment is Publisher, United Kingdom, pp. 23, 2012.
balanced. Unless, F-measure is used for performance [4] L. Deng and D, Deep Learning: Methods and Applications.:
Deep Learning Now Publishing, 2014.
evaluation of the model. Classification algorithms
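As a worked illustration of the F-measure comparison between the 1st and 7th experiments reported in the discussion, the calculation can be sketched in a few lines of Python. This is a minimal sketch and not the authors' tooling: the function name `f_measure` and the dictionary labels are hypothetical, and the F-measure is computed here from the averaged precision and recall quoted in the text, which approximates but need not equal an F-measure averaged per class.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Average precision and recall in percent, as quoted in the text.
# The labels are illustrative; only experiment 1 is named (J48) in the text.
experiments = {
    "experiment 1 (J48)": (98.00, 97.75),
    "experiment 7": (97.70, 97.90),
}

scores = {name: f_measure(p, r) for name, (p, r) in experiments.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: F-measure = {score:.2f}%")
print("higher F-measure:", best)
```

Under these assumptions the first experiment scores about 97.87% and the seventh about 97.80%, which agrees with the selection of the first experiment as the better model.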


AUTHORS PROFILE

Mr. Aklilu Elias Kurika, MSc in IT, is working as a lecturer in the Department of Information Technology at Wolaita Sodo University, Sodo, Wolaita State, Ethiopia. He has 4 years of teaching experience at various universities and colleges. He has presented at one international conference and two national conferences at various engineering and informatics colleges. His areas of specialization and interest are ML, DWDM, AI, NLP and SWE.

Mrs. Tigist Simon Sundado is working as a lecturer in the Department of Information Technology at Wolaita Sodo University, Ethiopia. Her areas of interest are NLP, MLA, DWDM and IoT.
