
International Journal of Computer Applications (0975 – 8887)
Volume 54 – No. 13, September 2012

Comparative Analysis of Classification Algorithms on Different Datasets using WEKA

Rohit Arora
M.Tech., CSE Department
Hindu College of Engineering
Sonepat, Haryana, India

Suman
Asstt. Prof., CSE Department
Hindu College of Engineering
Sonepat, Haryana, India

ABSTRACT
Data mining is an emerging research area for solving various problems, and classification is one of the main problems in the field of data mining. In this paper we use two classification algorithms of the Weka interface: J48, a Java implementation of the C4.5 algorithm, and Multilayer Perceptron, alias MLP, a modification of the standard linear perceptron. The interface can be used for testing several datasets. The performance of J48 and Multilayer Perceptron has been analysed so as to choose the better algorithm based on the conditions of the datasets. The datasets have been chosen from the UCI Machine Learning Repository. Algorithm J48 is based on C4.5 decision-based learning, and algorithm Multilayer Perceptron uses the multilayer feed-forward neural network approach for classification of datasets. When comparing the performance of both algorithms, we found Multilayer Perceptron to be the better algorithm in most of the cases.

Keywords
Classification, Data Mining Techniques, Decision Tree, Multilayer Perceptron

1. INTRODUCTION


Data mining is the process of extracting patterns from large datasets by combining methods from statistics and artificial intelligence with database management. It is an emerging field in many disciplines today, and it has gained acceptance as technology has grown and the need for efficient data analysis has increased. The aim of data mining is not to derive strict rules by analysing the full data set, but to make predictions with some certainty while analysing only a small portion of the data.

In recent times, data mining has received great attention in the knowledge and information industry due to the wide availability of large amounts of data and the pressing need for converting such data into meaningful information and knowledge. Data mining technology is a comprehensive application resting on database technology, statistical analysis and artificial intelligence; it has shown great commercial value and has gradually penetrated other professions, such as retail, insurance, telecommunications and the power industry [1].

The major components of the architecture for a typical data mining system are shown in Fig 1 [2].

Fig 1: Architecture of a Typical Data Mining System (user interface; pattern evaluation; data mining engine with knowledge base; database or data warehouse server; data cleaning, integration and selection over databases, data warehouses, the World Wide Web and other information repositories)

A good system architecture will enable the data mining system to make the best use of the software environment, accomplish data mining tasks in an effective and proper way, exchange information with other systems, adapt to users with diverse requirements, and change over time.

2. RELATED WORK
Recently, studies have been done on the performance of decision trees and of backpropagation. Classification is a classical problem in machine learning and data mining [3].

Decision trees are popular because they are practical and easy to understand. Rules can also be extracted from decision trees easily. Many algorithms, such as ID3 [4] and C4.5 [5], have been devised for decision tree construction.

According to [6], neural networks are suitable in data-rich environments and are typically used for extracting embedded knowledge in the form of rules, quantitative evaluation of these rules, clustering, self-organization, classification and regression. They also have an advantage over other types of machine learning algorithms in scaling.

The use of neural networks in classification is not uncommon in the machine learning community [7]. In some cases, neural networks give a lower classification error rate than decision trees but require longer learning time [8], [9]. A decision tree can be converted to a set of (mutually exclusive) rules, each one corresponding to a tree branch. Algorithms have been proposed to learn sets of rules directly (rules that may not be representable by a tree) [10] or to simplify the set of rules corresponding to a decision tree [5].


The alternating decision tree method [11] is a classification algorithm that tries to combine the interpretability of decision trees with the accuracy improvement obtained by boosting.

3. METHODOLOGY
3.1 Datasets
We have used five datasets in this paper, all taken from the UCI Machine Learning Repository [12]. The details of each dataset are shown in Table 1.

Table 1: Details of 5 datasets

Datasets        No. of Instances   Attributes   Classes   Type
balance-scale   625                5            3         Numeric
diabetes        768                9            2         Numeric
glass           214                10           7         Numeric
lymphography    148                19           4         Nominal
vehicle         946                19           4         Numeric

The first dataset, balance-scale [12], was generated to model psychological experimental results. The attributes are the left weight, the left distance, the right weight, and the right distance. The correct way to find the class is to compare (left-distance * left-weight) with (right-distance * right-weight): the greater product determines the class, and if the two are equal the scale is balanced.

In the diabetes dataset [12], several constraints were placed on the selection of instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.

The glass dataset [12] is used to determine whether a glass sample was a type of "float" glass or not.

The lymphography dataset [12] is one of three domains provided by the Oncology Institute that have repeatedly appeared in the machine learning literature.

The vehicle dataset [12] is used to classify a given outline as one of four types of vehicle, using a set of features extracted from the profile. The vehicle may be viewed from one of many different angles.
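Returning to the first of these datasets: the balance-scale labelling rule described above can be written out directly. The following Java sketch is only an illustration of that rule, not code from the paper; a classifier trained on this dataset is, in effect, asked to recover this function from the 625 labelled instances.

    public class BalanceRule {
        // Toy restatement of the balance-scale labelling rule (illustration only).
        // Returns "L" (tips left), "R" (tips right) or "B" (balanced).
        static String balanceClass(int leftWeight, int leftDistance,
                                   int rightWeight, int rightDistance) {
            int left = leftWeight * leftDistance;     // left torque
            int right = rightWeight * rightDistance;  // right torque
            if (left > right) return "L";
            if (right > left) return "R";
            return "B";
        }

        public static void main(String[] args) {
            System.out.println(balanceClass(2, 3, 1, 4)); // 6 > 4, prints "L"
        }
    }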

3.2 Weka interface
Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, developed at the University of Waikato, New Zealand [13]. The Weka suite contains a collection of visualization tools and algorithms for data analysis and predictive modeling, together with graphical user interfaces for easy access to this functionality.

The original non-Java version of Weka was TCL/TK front-end software used to model algorithms implemented in other programming languages, plus data preprocessing utilities in C and a Makefile-based system for running machine learning experiments.

The Java-based version (Weka 3) is used in many different application areas, in particular for educational purposes and research. Weka has various advantages:
 It is freely available under the GNU General Public License
 It is portable, since it is fully implemented in the Java programming language and thus runs on almost any architecture
 It is a large collection of data preprocessing and modeling techniques
 It is easy to use due to its graphical user interface

Weka supports several standard data mining tasks, more specifically data preprocessing, clustering, classification, regression, visualization, and feature selection. All of Weka's techniques are predicated on the assumption that the data is available as a single flat file or relation, where each data point is described by a fixed number of attributes (normally numeric or nominal attributes, but some other attribute types are also supported).

3.3 Classification algorithm J48
The J48 algorithm of the Weka software is a popular machine learning algorithm based upon J.R. Quinlan's C4.5 algorithm. All data to be examined will be of the categorical type, and therefore continuous data will not be examined at this stage. The algorithm will, however, leave room for adaptation to include this capability. The algorithm will be tested against C4.5 for verification purposes [5].

In Weka, the implementation of a particular learning algorithm is encapsulated in a class, and it may depend on other classes for some of its functionality. The J48 class builds a C4.5 decision tree. Each time the Java virtual machine executes J48, it creates an instance of this class by allocating memory for building and storing a decision tree classifier. The algorithm, the classifier it builds, and a procedure for outputting the classifier are all part of that instantiation of the J48 class.

Larger programs are usually split into more than one class. The J48 class does not actually contain any code for building a decision tree; it includes references to instances of other classes that do most of the work. When there are a large number of classes, as in the Weka software, they become difficult to comprehend and navigate [14].

3.4 Classification function Multilayer Perceptron
The Multilayer Perceptron classifier is based upon the backpropagation algorithm to classify instances. The network is created by an MLP algorithm, and it can also be monitored and modified during training time. The nodes in this network are all sigmoid (except when the class is numeric, in which case the output nodes become unthresholded linear units).

The backpropagation neural network is essentially a network of simple processing elements working together to produce a complex output. The backpropagation algorithm performs learning on a multilayer feed-forward neural network. It iteratively learns a set of weights for prediction of the class label of tuples. A multilayer feed-forward neural network consists of an input layer, one or more hidden layers, and an output layer. An example of a multilayer feed-forward network is shown in Fig 2 [2].

Fig 2: A multilayer feed-forward neural network (inputs x1 ... xn feed a hidden layer through weights wij; hidden-layer outputs Oj feed the output layer through weights wjk, producing outputs Ok)

Each layer is made up of units. The inputs to the network correspond to the attributes measured for each training tuple. The inputs are fed simultaneously into the units making up the input layer. These inputs pass through the input layer and are then weighted and fed simultaneously to a second layer of "neuronlike" units, known as a hidden layer. The outputs of the hidden layer units can be input to another hidden layer, and so on. The number of hidden layers is arbitrary, although in practice usually only one is used [2]. At the core, backpropagation is simply an efficient and exact method for calculating all the derivatives of a single target quantity (such as pattern classification error) with respect to a large set of input quantities (such as the parameters or weights in a classification rule) [15]. To improve classification accuracy we should reduce the training time of the neural network and reduce the number of input units of the network [16].
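In Weka's Java API, the two classifiers of Sections 3.3 and 3.4 are ordinary classes. The following minimal sketch shows how they are instantiated and trained; the ARFF file name is an assumption, and the parameter values shown are Weka's documented defaults rather than settings reported in this paper.

    import weka.classifiers.functions.MultilayerPerceptron;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class BuildClassifiers {
        public static void main(String[] args) throws Exception {
            // Load one dataset in ARFF form; the file name is an assumption.
            Instances data = DataSource.read("balance-scale.arff");
            data.setClassIndex(data.numAttributes() - 1); // class = last attribute

            // Section 3.3: instantiating the J48 class builds a C4.5 decision tree.
            J48 tree = new J48();
            tree.setConfidenceFactor(0.25f); // Weka's default pruning confidence
            tree.setMinNumObj(2);            // Weka's default minimum leaf size
            tree.buildClassifier(data);
            System.out.println(tree);        // prints the induced tree as text

            // Section 3.4: MLP trained by backpropagation, sigmoid hidden nodes.
            MultilayerPerceptron mlp = new MultilayerPerceptron();
            mlp.setHiddenLayers("a"); // one hidden layer of (attributes + classes) / 2 units
            mlp.setLearningRate(0.3); // Weka defaults, shown only for illustration
            mlp.setMomentum(0.2);
            mlp.setTrainingTime(500); // number of backpropagation epochs
            mlp.buildClassifier(data);
        }
    }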

4. RESULTS
For evaluating classifier quality we can use the confusion matrix. Consider the algorithm J48 running on the balance-scale dataset in WEKA: this dataset has three classes, so we obtain a 3x3 confusion matrix. The number of correctly classified instances is the sum of the diagonal of the matrix; all other instances are incorrectly classified. Let TPA be the number of true positives of class A, TPB the number of true positives of class B, and TPC the number of true positives of class C. Then TPA refers to the positive tuples correctly labeled by the classifier, found in the first row, first column, i.e. 235. Similarly, TPB refers to the positive tuples correctly labeled in the second row, second column, i.e. 0, and TPC refers to the positive tuples correctly labeled in the third row, third column, i.e. 244, as shown in Table 2.

Table 2. Confusion matrix of three classes of balance-scale

                           Predicted class
                      A      B      C     Total
    Actual class  A   235    10     43     288
                  B    32     0     17      49
                  C    32    12    244     288
                  Total                    625

Accuracy = (TPA + TPB + TPC) / (total number of classifications)
         = (235 + 0 + 244) / 625 = 76.64%

The confusion matrix helps us to find the various evaluation measures like Accuracy, Recall, Precision etc.

Table 3. Accuracy on balance-scale

S.N.   Parameters   J48    MLP
1      TP Rate      0.77   0.91
2      FP Rate      0.17   0.04
3      Precision    0.73   0.92
4      Recall       0.77   0.91
5      F-Measure    0.75   0.91
6      ROC Area     0.81   0.98

Fig 3: Accuracy chart on balance-scale

For the balance-scale dataset the accuracy parameters are shown in Table 3 and Fig 3. MLP outperforms J48 on every measure (a higher TP rate, precision, recall, F-measure and ROC area, and a lower FP rate), so MLP is the better method for the balance-scale dataset.

Table 4. Accuracy on diabetes

S.N.   Parameters   J48    MLP
1      TP Rate      0.74   0.75
2      FP Rate      0.33   0.31
3      Precision    0.74   0.75
4      Recall       0.74   0.75
5      F-Measure    0.74   0.75
6      ROC Area     0.75   0.79

Fig 4: Accuracy chart on diabetes

For the diabetes dataset the accuracy parameters are shown in Table 4 and Fig 4. The two algorithms have almost equal accuracy measures except for the ROC Area, on which MLP scores higher, so MLP is the better method for the diabetes dataset.
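Measures such as these can be computed programmatically through Weka's Evaluation class. Below is a minimal sketch; the ARFF file name and the use of 10-fold cross-validation (the default test mode in Weka's Explorer) are assumptions, since the test protocol is not stated here.

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class EvaluateClassifier {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("balance-scale.arff"); // assumed file name
            data.setClassIndex(data.numAttributes() - 1);

            // Cross-validate J48 and collect the pooled confusion matrix plus the
            // weighted-average measures reported in Tables 3 to 7.
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new J48(), data, 10, new Random(1));

            System.out.println(eval.toMatrixString("Confusion matrix (cf. Table 2)"));
            System.out.printf("Accuracy  %.2f%%%n", eval.pctCorrect());
            System.out.printf("TP Rate   %.2f%n", eval.weightedTruePositiveRate());
            System.out.printf("FP Rate   %.2f%n", eval.weightedFalsePositiveRate());
            System.out.printf("Precision %.2f%n", eval.weightedPrecision());
            System.out.printf("Recall    %.2f%n", eval.weightedRecall());
            System.out.printf("F-Measure %.2f%n", eval.weightedFMeasure());
            System.out.printf("ROC Area  %.2f%n", eval.weightedAreaUnderROC());
        }
    }

Here toMatrixString() prints the pooled confusion matrix in the same layout as Table 2, and pctCorrect() is the accuracy figure computed above.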


Table 5. Accuracy on glass

S.N.   Parameters   J48    MLP
1      TP Rate      0.67   0.68
2      FP Rate      0.13   0.14
3      Precision    0.67   0.67
4      Recall       0.67   0.68
5      F-Measure    0.67   0.66
6      ROC Area     0.81   0.85

Fig 5: Accuracy chart on glass

For the glass dataset the accuracy parameters are shown in Table 5 and Fig 5. The two algorithms have almost equal accuracy measures except for the ROC Area, on which MLP scores higher, so MLP is the better method for the glass dataset.

Table 6. Accuracy on lymphography

S.N.   Parameters   J48    MLP
1      TP Rate      0.77   0.85
2      FP Rate      0.19   0.16
3      Precision    0.78   0.84
4      Recall       0.77   0.85
5      F-Measure    0.77   0.83
6      ROC Area     0.79   0.92

Fig 6: Accuracy chart on lymphography

For the lymphography dataset the accuracy parameters are shown in Table 6 and Fig 6. MLP has the better value on every measure (for FP rate, where lower is better, MLP's value is lower), so MLP is the better method for the lymphography dataset.

Table 7. Accuracy on vehicle

S.N.   Parameters   J48    MLP
1      TP Rate      0.73   0.82
2      FP Rate      0.09   0.06
3      Precision    0.72   0.81
4      Recall       0.73   0.82
5      F-Measure    0.72   0.82
6      ROC Area     0.86   0.95

Fig 7: Accuracy chart on vehicle

For the vehicle dataset the accuracy parameters are shown in Table 7 and Fig 7. MLP again has the better value on every measure, including a lower FP rate, so MLP is the better method for the vehicle dataset.

Table 8. Accuracy (% of correctly classified instances) of J48 and MLP

S.N.   Datasets        J48      MLP
1      balance-scale   76.64    90.72
2      diabetes        73.828   75.391
3      glass           66.822   67.757
4      lymphography    77.027   84.46
5      vehicle         72.459   81.679

Fig 8: Accuracy chart of J48 and MLP

Table 8 and the chart in Fig 8 give the overall accuracy of J48 and MLP on each dataset. The two classification algorithms were applied to all the datasets for this accuracy measure. From the chart in Fig 8 it is clear that MLP gives better results on four of the datasets and approximately equal accuracy on the glass dataset. Hence we can say that MLP is the better algorithm of the two for the given five datasets.
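The figures in Table 8 correspond to Weka's percentage of correctly classified instances. A compact sketch of the whole comparison follows, under the same assumptions as before (ARFF file names, 10-fold cross-validation, and default parameters for both classifiers are assumptions, not the authors' stated setup).

    import java.util.Random;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.MultilayerPerceptron;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class CompareAccuracy {
        // ARFF file names are assumptions; the UCI pages provide the raw data.
        static final String[] FILES = {"balance-scale.arff", "diabetes.arff",
                "glass.arff", "lymphography.arff", "vehicle.arff"};

        public static void main(String[] args) throws Exception {
            for (String file : FILES) {
                Instances data = DataSource.read(file);
                data.setClassIndex(data.numAttributes() - 1);
                System.out.printf("%-15s J48 %7.3f%%   MLP %7.3f%%%n", file,
                        accuracy(new J48(), data),
                        accuracy(new MultilayerPerceptron(), data));
            }
        }

        // Percentage of correctly classified instances under 10-fold
        // cross-validation (assumed protocol).
        static double accuracy(Classifier c, Instances data) throws Exception {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(c, data, 10, new Random(1));
            return eval.pctCorrect();
        }
    }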


5. CONCLUSION
In this paper we evaluated the performance, in terms of classification accuracy, of the J48 and Multilayer Perceptron algorithms using various accuracy measures: TP rate, FP rate, precision, recall, F-measure and ROC area. Accuracy was measured on each dataset. On the balance-scale, lymphography and vehicle datasets, Multilayer Perceptron is clearly the better algorithm. On the diabetes and glass datasets the accuracy is almost equal, with Multilayer Perceptron slightly better. Thus we found Multilayer Perceptron to be the better algorithm in most of the cases. Neural networks have generally not been considered well suited for data mining, but from the above results we conclude that an algorithm based on a neural network has better learning capability and hence is suited for classification problems, if trained properly.

6. FUTURE SCOPE
In future work, more classification algorithms can be incorporated and many more datasets should be used, ideally real datasets from industry, to assess the actual impact of the performance of the algorithms under consideration. Moreover, for the Multilayer Perceptron algorithm, the speed of learning with respect to the number of attributes and the number of instances can be taken into consideration as an additional performance criterion.

7. REFERENCES
[1] Z. Haiyang, "A Short Introduction to Data Mining and Its Applications", IEEE, 2011.
[2] J. Han and M. Kamber, Data Mining: Concepts and Techniques, 2nd ed., Morgan Kaufmann, 2006.
[3] R. Agrawal, T. Imielinski, and A.N. Swami, "Database Mining: A Performance Perspective," IEEE Trans. Knowledge and Data Engineering, vol. 5, no. 6, pp. 914-925, Dec. 1993.
[4] J.R. Quinlan, "Induction of Decision Trees," Machine Learning, vol. 1, no. 1, pp. 81-106, 1986.
[5] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[6] Y. Bengio, J.M. Buhmann, M. Embrechts, and J.M. Zurada, "Introduction to the special issue on neural networks for data mining and knowledge discovery," IEEE Trans. Neural Networks, vol. 11, pp. 545-549, 2000.
[7] D. Michie, D.J. Spiegelhalter, and C.C. Taylor, Machine Learning, Neural and Statistical Classification, Ellis Horwood Series in Artificial Intelligence, 1994.
[8] J.R. Quinlan, "Comparing Connectionist and Symbolic Learning Methods," in S.J. Hanson, G.A. Drastall, and R.L. Rivest, eds., Computational Learning Theory and Natural Learning Systems, vol. 1, pp. 445-456, A Bradford Book, MIT Press, 1994.
[9] J.W. Shavlik, R.J. Mooney, and G.G. Towell, "Symbolic and Neural Learning Algorithms: An Experimental Comparison," Machine Learning, vol. 6, no. 2, pp. 111-143, 1991.
[10] P. Clark and T. Niblett, "The CN2 Induction Algorithm," Machine Learning, vol. 3, no. 4, pp. 261-283, 1989.
[11] Y. Freund and L. Mason, "The Alternating Decision Tree Learning Algorithm," in Proceedings of the 16th International Conference on Machine Learning, pp. 124-133, 1999.
[12] UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/datasets.html
[13] Weka: http://www.cs.waikato.ac.nz/~ml/weka/
[14] I.H. Witten, E. Frank, and M.A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed., Morgan Kaufmann, 2011.
[15] P.J. Werbos, "Backpropagation Through Time: What It Does and How to Do It", IEEE, 1990.
[16] H. Lu, R. Setiono, and H. Liu, "Effective Data Mining Using Neural Networks", IEEE, 1996.
