Performance Evaluation of Various Data Mining Algorithms on Road Traffic Accident Dataset
Abstract Many researchers spend much of their time searching for the
best-performing data mining classification and clustering algorithms to apply to
road accident datasets in order to predict classes such as the cause of an accident,
accident-prone locations, the time of an accident, and even the type of vehicle
involved. This study was carried out using two data mining tools, Weka and
Orange. It evaluated the Multi-layer Perceptron, J48 and BayesNet classifiers on
150 instances of an accident dataset using Weka. The results showed that the
Multi-layer Perceptron classifier performed best with 85.33% accuracy, followed by
BayesNet with 80.66% and J48 with 78.66%. The study also found two best rules
for association rule mining using the Apriori algorithm, with a confidence of 1.0
and a lift of 1.27 for rule one and a confidence of 0.91 and a lift of 1.15 for rule
two. The clustering and dimensionality reduction techniques K-means and
Self-Organizing Maps were also applied to the dataset using the Orange data
mining tool, achieving a silhouette score of 0.7.
S. Hussain (✉)
Dibrugarh University, Dibrugarh, Assam, India
e-mail: [email protected]
L. J. Muhammad ⋅ F. S. Ishaq ⋅ A. Yakubu
Mathematics and Computer Science Department, Federal University, Kashere,
Gombe State, Nigeria
e-mail: [email protected]
F. S. Ishaq
e-mail: [email protected]
A. Yakubu
e-mail: [email protected]
I. A. Mohammed
Department of Computer Science, Yobe State University, Damaturu, Yobe State, Nigeria
e-mail: [email protected]
1 Introduction
A road accident is defined as any crash that happens on a road open to public
circulation, involves one or more vehicles, and kills or injures one or more people.
Intentional acts of murder or suicide and natural disasters are therefore excluded
[1, 2]. According to the report of [3], the total number of road traffic deaths stands
at 1.25 million per year, and most road traffic casualties occur in developing
countries. According to [4], road traffic accidents killed more than 1.2 million
people and injured between 20 and 50 million in 2014, making them the ninth most
common cause of death in that year. Road traffic accidents thus remain among the
most central public health problems facing the world and one of the most common
causes of death worldwide [1–3]. It has been reported [4] that half of those who die
on the world's roads are pedestrians, cyclists and motorcyclists, who have minimal
or no protection. According to [3], over 3400 people die on the world's roads every
day and more than 10 million people are injured or disabled every year.
Nowadays, road traffic accidents have become a pressing problem worldwide [1].
The World Health Organization works with governmental and non-governmental
partners around the world to raise the profile of the preventability of road traffic
accidents. Many countries also have organizations dedicated to maintaining road
safety and reducing the menace of fatal road traffic accidents [3].
Researchers, for their part, have employed many techniques, especially statistical
ones, to identify the causes of road traffic accidents from historical road traffic
accident data. Data miners have explored various parameters or variables behind
the causes of road accidents, as well as the behaviour of drivers, using different
data mining tools and techniques. Many researchers therefore spend much of their
time searching for the best-performing data mining algorithm for road traffic
accident datasets. This study evaluates the performance of various data mining
algorithms on a road traffic accident dataset.
2 Literature Review
The studies in [1, 5] explored various parameters of road traffic accidents along
the Kano–Wudil Highway in Kano, Nigeria; the cause, location and time of
accidents were predicted using the ID3 decision tree classification algorithm. The
accuracy of the classifier was 72.7273% for predicting the cause of an accident, and
80.6061% and 76.9697% for the accident-prone location and the time of the
accident, respectively. The study was conducted using the Weka data mining
software.
In the work of [6], the K-means algorithm was applied to group road accident
locations into high-frequency, moderate-frequency and low-frequency
categories. The algorithm took the accident frequency count as
parameters to cluster the locations. Association rule mining was then used to
characterize the locations. However, the accuracy of the classifiers used in the
study was not determined.
In the work of [4], data mining algorithms were applied to the Fatal Accident
dataset to determine the relationship between the fatality rate and other accident
attributes, including surface condition, collision manner, light condition, weather
and drunk driving. The classifier used was Naïve Bayes, clustering was done with
the K-means algorithm, and the Apriori algorithm was applied to generate
association rules. The accuracy of the classifier was found to be 67.95%. Apriori
was run with a minimum support of 0.4 and a minimum confidence of 0.6. The
Weka data mining software was also used for the analysis.
Elfadil [7] predicted the causes of road traffic accidents by applying multi-class
SVM (Support Vector Machines) to data collected from the Dubai Police Unit,
United Arab Emirates. The results showed that the model can predict the cause of
an accident with an accuracy of 76.7%; Weka was used for the analysis.
Dipo and Akinbola [8] collected accident data from the Lagos–Ibadan road in
Nigeria and analyzed it using the ID3 decision tree classifier of the Weka software.
The study could predict the cause and location of a road accident with an accuracy
of 77.77%.
In the work of [9], a Naïve Bayes classifier was used in Weka to predict the
severity of accidents from collected traffic accident data. Three experiments were
conducted. The first used seven attributes and achieved an accuracy of 87.252%.
The second increased the attributes to eight, including the earlier seven, and
achieved 88.0613%. The third increased the attributes to 13, including the earlier
eight, and achieved 89.4554%.
3 Methodology
3.1 J48
J48 is Weka's implementation of the C4.5 decision tree algorithm, which was
developed by Ross Quinlan [2].
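As background, C4.5-style trees typically choose the attribute to split on by gain ratio, that is, information gain normalized by the split information. The following is a minimal sketch of that criterion only, not Weka's J48 code; the function names and the toy records (including the value "Speeding") are hypothetical and are not taken from the study's dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of splitting `rows` (a list of dicts) on `attribute`."""
    total = len(rows)
    base = entropy([r[target] for r in rows])
    # Group the target labels by the value of the candidate attribute.
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = entropy([r[attribute] for r in rows])
    info_gain = base - remainder
    # An attribute with a single value carries no split information.
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records, for illustration only.
TOY_ROWS = [
    {"time": "Evening", "vehicle": "SmallCar", "cause": "WrongOvertaking"},
    {"time": "Evening", "vehicle": "SmallCar", "cause": "WrongOvertaking"},
    {"time": "Morning", "vehicle": "SmallCar", "cause": "Speeding"},
    {"time": "Morning", "vehicle": "SmallCar", "cause": "Speeding"},
]

print(gain_ratio(TOY_ROWS, "time", "cause"))     # attribute that separates the classes
print(gain_ratio(TOY_ROWS, "vehicle", "cause"))  # constant attribute
```

In this toy data the time attribute separates the two causes perfectly, while the constant vehicle attribute gives no information, so a tree built this way would split on time first.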
3.2 BayesNet
This classifier delivers high accuracy on large databases and reduces
computation time. A Bayesian network represents the conditional dependencies
between variables using a directed graph [1].
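As background, the standard way to write what such a graph encodes is the factorization of the joint distribution over the variables, where Pa(X_i) denotes the parents of X_i in the graph. This is the textbook definition, given here for orientation; it does not describe the specific network structure that Weka learned in this study.

```latex
\[
  P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
\]
```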
The occurrence of an item may be predicted from the occurrence of other items
in the same transaction. Rules that describe such relationships, written in the form
X → Y, are called association rules. The support of an itemset is the frequency
with which it occurs across the transactions, while the confidence of a rule X → Y
is the fraction of the transactions containing X that also contain Y [10].
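To make the two measures concrete, the sketch below computes them for one rule, taking support as the fraction of transactions that contain an itemset and the confidence of X → Y as support(X ∪ Y) / support(X). The function names and the toy transactions are hypothetical; the transactions are not records from the study's dataset.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent.

    Assumes the antecedent occurs in at least one transaction.
    """
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical toy transactions, for illustration only.
TRANSACTIONS = [
    {"Evening", "WrongOvertaking", "SmallCar"},
    {"Evening", "WrongOvertaking", "SmallCar"},
    {"Evening", "WrongOvertaking", "Truck"},
    {"Morning", "Speeding", "SmallCar"},
]

rule_support = support(TRANSACTIONS, {"WrongOvertaking", "SmallCar"})
rule_confidence = confidence(TRANSACTIONS, {"WrongOvertaking"}, {"SmallCar"})
print(rule_support, rule_confidence)
```

Here the rule WrongOvertaking → SmallCar holds in two of four transactions (support 0.5) and in two of the three transactions that contain WrongOvertaking (confidence 2/3).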
The following measures were used to evaluate the performance of the
classification algorithms and to identify the most suitable algorithm for prediction
on the road traffic accident dataset.
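Accuracy is the proportion of true positives and true negatives among the total
number of cases.
\[ \text{Accuracy} = \frac{tp + tn}{tp + tn + fp + fn} = \frac{tp + tn}{P + N} \tag{1} \]
Specificity is the number of correct negatives divided by all the negatives, as
given below.
\[ \text{Specificity} = \frac{tn}{tn + fp} = \frac{tn}{N} \tag{2} \]
Sensitivity or Recall is the number of correct positives divided by all the positives.
\[ \text{Recall} = \frac{tp}{tp + fn} = \frac{tp}{P} \tag{3} \]
Precision is the number of correct positive classifications divided by the total
number of positive classifications.
\[ \text{Precision} = \frac{tp}{tp + fp} \tag{4} \]
Here tp, tn, fp and fn denote the numbers of true positives, true negatives, false
positives and false negatives, with P = tp + fn and N = tn + fp.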
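The four measures above can be computed directly from the counts of a binary confusion matrix. The sketch below is illustrative only: the function name and the counts passed to it are hypothetical and are not results from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, specificity, recall and precision from confusion counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,   # correct predictions over all cases
        "specificity": tn / (tn + fp),   # correct negatives over all negatives
        "recall": tp / (tp + fn),        # correct positives over all positives
        "precision": tp / (tp + fp),     # correct positives over predicted positives
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a multi-class problem such as accident cause, these quantities are usually computed per class (one class against the rest) and then averaged.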
K-means clustering groups objects into clusters such that each object belongs to
the cluster with the nearest mean. The goal of this clustering method is to reduce
the intra-cluster variance, that is, to minimize the squared error [11].
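The procedure can be sketched as the usual alternation between an assignment step and a mean-update step (Lloyd's algorithm). This is a minimal illustration, not the implementation used by Orange: it assumes numeric points, initializes the centroids with the first k points for determinism (real tools typically use random or k-means++ initialization), and the function names and toy points are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; returns (centroids, assignment)."""
    centroids = list(points[:k])  # simplistic, deterministic initialization
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest mean.
        new_assignment = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignment


# Hypothetical 2-D points forming two well-separated groups.
POINTS = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.8, 1.1), (7.9, 8.3)]
centroids, assignment = kmeans(POINTS, k=2)
print(assignment)
```

Categorical attributes such as those in an accident dataset would first need a numeric encoding before a distance-based method like this can be applied.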
The study used the road traffic accident dataset of the Kano–Wudil highway in
Nigeria. The same dataset was used in [1, 5] to predict the cause of accidents,
accident-prone locations and accident times along the highway. The dataset
contains four variables (Vehicle Type, Accident Time, Accident Cause and
Accident Location) and covers 30 months of data, from January 2014 to June
2016 (Fig. 1).
The open-source Weka data mining software was used to mine the dataset. Weka
can perform various data mining tasks using machine learning algorithms. The
study applied the Multi-layer Perceptron, J48 and BayesNet classifiers directly to
the 150 instances of the road traffic accident dataset. The results of the
experiment are shown in Table 1.
We used Apriori rule mining in Weka to find the best association rules. The two
rules found are shown below.
1. AccidentCause = WrongOvertaking AccidentLocation = LocationC 15 ⇒ VehicleType = SmallCar 15 <conf:(1)> lift:(1.27) lev:(0.02) [3] conv:(3.2)
2. AccidentTime = Evening AccidentCause = WrongOvertaking 32 ⇒ VehicleType = SmallCar 29 <conf:(0.91)> lift:(1.15) lev:(0.03) [3] conv:(1.71)
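The reported figures can be checked from the counts printed in the two rules. The sketch below assumes the usual reading of Weka's Apriori output: the number after the antecedent is the count of instances matching it, the number after the consequent is the count matching the whole rule, and lift is confidence divided by the support of the consequent. The function names are hypothetical, and the implied consequent support is an inference from the rounded values, not a figure reported in the study.

```python
def rule_confidence(antecedent_count, rule_count):
    """Confidence = instances matching the whole rule / instances matching the antecedent."""
    return rule_count / antecedent_count


def implied_consequent_support(confidence, lift):
    """Consequent support implied by lift = confidence / support(consequent)."""
    return confidence / lift


# (antecedent count, rule count, reported lift), as printed in the two rules.
RULES = {
    "rule 1": (15, 15, 1.27),
    "rule 2": (32, 29, 1.15),
}

for name, (n_antecedent, n_rule, lift) in RULES.items():
    conf = rule_confidence(n_antecedent, n_rule)
    supp_y = implied_consequent_support(conf, lift)
    print(name, round(conf, 2), round(supp_y, 2))
# prints: rule 1 1.0 0.79
#         rule 2 0.91 0.79
```

Both rules imply roughly the same support for VehicleType = SmallCar, which is what one would expect given that they share the same consequent.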
We applied K-means clustering and a Self-Organizing Map (SOM) to the dataset
as unsupervised learning methods, using the Orange data mining software. Orange
is an open-source data mining tool for both novice and expert users, with rich
visualization and a large toolbox. A silhouette score of 0.7 was achieved,
indicating meaningful clustering. The figures visualize the clustering and
unsupervised learning results (Figs. 2, 3, 4 and 5).
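For reference, the silhouette coefficient of a point is (b − a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster; the silhouette score is the average over all points and lies between −1 and 1. The sketch below is a minimal illustration with Euclidean distance, not Orange's implementation; the function name and toy points are hypothetical, and it assumes at least two clusters.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the zero distance from p to itself adds nothing to the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical points forming two compact, well-separated clusters.
POINTS = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
LABELS = [0, 0, 1, 1]
score = silhouette_score(POINTS, LABELS)
print(round(score, 2))
```

Scores close to 1 indicate compact, well-separated clusters, which is the sense in which a score of 0.7 is read as meaningful clustering.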
Three data mining algorithms from the Weka software were applied to the 150
records of road traffic accidents on the Kano–Wudil road, Nigeria. The
experimental results indicate that, for prediction on this road traffic accident
dataset, the Multi-layer Perceptron is the most suitable of the three algorithms. In
the experiment, the Multi-layer Perceptron classifier performed best with 85.33%
accuracy, followed by BayesNet with 80.66% and J48 with 78.66%. The
Multi-layer Perceptron is therefore recommended to scholars and researchers as an
efficient classifier for predictive tasks. The study also found two best rules for
association rule mining using the Apriori algorithm, with a confidence of 1.0 and a
lift of 1.27 for rule one and a confidence of 0.91 and a lift of 1.15 for rule two.
K-means clustering and a Self-Organizing Map were also applied to the dataset,
with a silhouette score of 0.7. As future work, the algorithms may be tested with
more data and with different datasets.
References
1. Muhammad, L.J., Sani, S., Yakubu, A., Yusuf, M.M., Elrufai, T.A., Mohammed, I.A., Nuhu,
A.M.: Using decision tree data mining algorithm to predict causes of road traffic accidents,
its prone locations and time along Kano–Wudil highway. Int. J. Datab. Theory Appl. 10,
197–206 (2017)
2. Performance evaluation of the data mining models.
bitstream/10603/7989/14/14_chapter%205.pdf. Accessed 29 Dec 2017
3. Global status report on road safety: World Health Organization (2015).
entity/violence_injury_prevention/publications/road_traffic/en/index.html. Accessed 25 Dec
4. Global status report on road safety: time for action, WHO (2009)
5. Muhammad, L.J., Yakubu, A., Mohammed, I.A.: Data mining driven approach for predicting
causes of road accident. In: 13th International Conference 2017—Information Technology for
Sustainable Development, vol. 28, pp. 10–15. Nigeria Computer Society (2017)
6. Liling, L., Sharad, S., Gongzhu, H.: Analysis of road traffic fatal accidents using data mining
techniques. In: 2017 IEEE 15th International Conference Software Engineering Research,
Management and Applications (SERA) (2017)
7. Elfadil, A.M.: Predicting causes of traffic road accidents using multi-class support vector
machines. In: Proceeding of the 10th International Conference on Data Mining, July 21–24,
2014, pp. 37–42. Las Vegas, Nevada, USA (2014)
8. Dipo, T.A., Akinbola, O.: Using data mining technique to predict cause of accident and
accident prone locations on highways. Am. J. Datab. Theory Appl. 1(3), 26–38 (2012)
9. Jaideep, K., Chandra, P.S.: Mining road traffic accident data to improve safety on road-related
factors for classification and prediction of accident severity. Int. Res. J. Eng. Technol. (IRJET)
03(10) (2016)
10. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules in large databases.
In: Proceedings of the 20th International Conference on Very Large Data Bases, pp. 487–499,
Sept 12–15 (1994)
11. Vattani, A.: k-means requires exponentially many iterations even in the plane. Discrete
Comput. Geometry 45(4), 596–616 (2011)
12. Ultsch, A.: Emergence in self-organizing feature maps. In: Ritter, H., Haschke, R. (eds.),
Proceedings of the 6th International Workshop on Self-Organizing Maps (WSOM ‘07).
Bielefeld, Germany, Neuroinformatics Group (2007). ISBN 978-3-00-022473-7