Comparative Analysis of Classification Algorithms on Different Datasets Using Weka
ABSTRACT
Data mining is an upcoming research area for solving various problems, and classification is one of the main problems in the field of data mining. In this paper we use two classification algorithms of the Weka interface: J48 (the Java implementation of the C4.5 algorithm) and the Multilayer Perceptron, alias MLP (a modification of the standard linear perceptron). These can be used for testing several datasets. The performance of J48 and the Multilayer Perceptron has been analysed so as to choose the better algorithm based on the conditions of the datasets. The datasets have been chosen from the UCI Machine Learning Repository. Algorithm J48 is based on C4.5 decision-based learning, and algorithm Multilayer Perceptron uses the multilayer feed-forward neural network approach for classification of datasets. When ...

[Figure: architecture of a data mining system, showing the User Interface, Pattern Evaluation, Data Mining Engine, Knowledge Base, Database or Data Warehouse Server, and Data cleaning, integration and selection components]
The alternating decision tree method [11] is a classification algorithm that tries to combine the interpretability of decision trees with the accuracy improvement obtained by boosting.

3. METHODOLOGY
3.1 Datasets
We have used five datasets in this paper, all taken from the UCI Machine Learning Repository [12]. The details of each dataset are shown in Table 1.

Table 1: Details of the 5 datasets

Datasets        No. of Instances   Attributes   Classes   Type
balance-scale   625                5            3         Numeric
diabetes        768                9            2         Numeric
glass           214                10           7         Numeric
lymphography    148                19           4         Nominal
vehicle         946                19           4         Numeric

The first dataset, balance-scale [12], was generated to model psychological experimental results. The attributes are the left weight, the left distance, the right weight, and the right distance. The class is found by comparing (left-distance * left-weight) with (right-distance * right-weight): the side with the greater product gives the class, and if the two products are equal the scale is balanced.
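This class rule can be written out directly. The following is a minimal sketch of the comparison, with illustrative attribute names and an illustrative Label enum (the UCI file itself encodes the three classes as L, B and R):

```java
public class BalanceScaleRule {
    enum Label { LEFT, BALANCED, RIGHT }   // the UCI file encodes these as L, B and R

    static Label classify(double leftWeight, double leftDistance,
                          double rightWeight, double rightDistance) {
        double left = leftDistance * leftWeight;    // left-distance * left-weight
        double right = rightDistance * rightWeight; // right-distance * right-weight
        if (left > right) return Label.LEFT;        // the greater product decides the class
        if (right > left) return Label.RIGHT;
        return Label.BALANCED;                      // equal products: the scale is balanced
    }

    public static void main(String[] args) {
        // Example: 3 * 2 = 6 on the left versus 1 * 4 = 4 on the right, so the class is LEFT.
        System.out.println(classify(3, 2, 1, 4));
    }
}
```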
In the diabetes dataset [12] several constraints were placed on the selection of instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.

The glass dataset [12] is used to determine whether the glass was a type of "float" glass or not.

The lymphography dataset [12] is one of three domains provided by the Oncology Institute that have repeatedly appeared in the machine learning literature.

The vehicle dataset [12] is used to classify a given outline as one of four types of vehicle, using a set of features extracted from the profile. The vehicle may be viewed from one of many different angles.
3.2 Weka interface
Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, developed at the University of Waikato, New Zealand [13]. The Weka suite contains a collection of visualization tools and algorithms for data analysis and predictive modeling, together with graphical user interfaces for easy access to this functionality.

The original non-Java version of Weka was TCL/TK front-end software used to model algorithms implemented in other programming languages, plus data preprocessing utilities in C, and a Makefile-based system for running machine learning experiments.

The current Java-based version (Weka 3) is used in many different application areas, in particular for educational purposes and research. Weka has several advantages:

- It is freely available under the GNU General Public License.
- It is portable, since it is fully implemented in the Java programming language and thus runs on almost any architecture.
- It contains a large collection of data preprocessing and modeling techniques.
- It is easy to use due to its graphical user interface.

Weka supports several standard data mining tasks, more specifically data preprocessing, clustering, classification, regression, visualization, and feature selection. All of Weka's techniques are predicated on the assumption that the data is available as a single flat file or relation, where each data point is described by a fixed number of attributes (normally numeric or nominal attributes, although some other attribute types are also supported).
3.3 Classification algorithm J48
The J48 algorithm of the Weka software is a popular machine learning algorithm based upon J.R. Quinlan's C4.5 algorithm. All data to be examined will be of the categorical type and therefore continuous data will not be examined at this stage. The algorithm will however leave room for adaptation to include this capability. The algorithm will be tested against C4.5 for verification purposes [5].

In Weka, the implementation of a particular learning algorithm is encapsulated in a class, and it may depend on other classes for some of its functionality. The J48 class builds a C4.5 decision tree. Each time the Java virtual machine executes J48, it creates an instance of this class by allocating memory for building and storing a decision tree classifier. The algorithm, the classifier it builds, and a procedure for outputting the classifier are all part of that instantiation of the J48 class.

Larger programs are usually split into more than one class. The J48 class does not actually contain any code for building a decision tree; it includes references to instances of other classes that do most of the work. When there are many classes, as in the Weka software, they become difficult to comprehend and navigate [14].
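As a concrete illustration, a J48 model can also be built and evaluated outside the graphical interface through Weka's Java API. The sketch below is a minimal example, assuming a dataset file named diabetes.arff and default-style options; it is not the exact experimental setup of this paper:

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class J48Example {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("diabetes.arff");       // load a UCI dataset in ARFF form
        data.setClassIndex(data.numAttributes() - 1);             // last attribute is the class

        J48 tree = new J48();                                     // Weka's C4.5 implementation
        tree.setOptions(new String[] {"-C", "0.25", "-M", "2"});  // confidence factor, min leaf size

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));   // 10-fold cross-validation
        System.out.println(eval.toSummaryString());
        System.out.println(eval.toMatrixString());                // confusion matrix
    }
}
```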
3.4 Classification function Multilayer Perceptron
The Multilayer Perceptron classifier uses the backpropagation algorithm to classify instances. The network is created by the MLP algorithm and can also be monitored and modified during training time. The nodes in this network are all sigmoid, except when the class is numeric, in which case the output nodes become unthresholded linear units.

The backpropagation neural network is essentially a network of simple processing elements working together to produce a complex output. The backpropagation algorithm performs learning on a multilayer feed-forward neural network. It iteratively learns a set of weights for prediction of the class label of tuples. A multilayer feed-forward neural network consists of an input layer, one or more hidden layers, and an output layer. An example of a multilayer feed-forward network is shown in Fig 2 [2].
Fig 2: A multilayer feed-forward neural network (inputs xn connected through weights wnj)
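The corresponding Multilayer Perceptron run can be sketched in the same way. The option values below (learning rate, momentum, number of epochs, hidden-layer specification) are illustrative defaults, not the configuration used for the experiments reported here:

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class MLPExample {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("diabetes.arff");
        data.setClassIndex(data.numAttributes() - 1);

        MultilayerPerceptron mlp = new MultilayerPerceptron();
        mlp.setLearningRate(0.3);    // step size for the backpropagation weight updates
        mlp.setMomentum(0.2);        // momentum term applied to the updates
        mlp.setTrainingTime(500);    // number of training epochs
        mlp.setHiddenLayers("a");    // "a" = (attributes + classes) / 2 hidden sigmoid nodes

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(mlp, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```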
For example, the confusion matrix obtained for J48 on the balance-scale dataset (625 instances) is:

                     Predicted class
                     A     B     C     Total
Actual class    A    235   10    43    288
                B    32    0     17    49
                C    32    12    244   288
Total                                  625

Accuracy = (TPA + TPB + TPC) / (total number of classifications),
i.e. Accuracy = (235 + 0 + 244) / 625 = 76.64%.
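The same arithmetic can be checked in a few lines of code. This sketch recomputes the figure above from the confusion matrix, summing the diagonal (the correctly classified instances) and dividing by the total number of instances:

```java
public class AccuracyFromConfusionMatrix {
    public static void main(String[] args) {
        int[][] m = {
            {235, 10,  43},   // actual class A
            { 32,  0,  17},   // actual class B
            { 32, 12, 244},   // actual class C
        };
        int correct = 0, total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                total += m[i][j];
                if (i == j) correct += m[i][j];  // diagonal entries: TPA, TPB, TPC
            }
        }
        // (235 + 0 + 244) / 625 = 0.7664, i.e. 76.64 %
        System.out.printf("Accuracy = %d/%d = %.2f%%%n", correct, total, 100.0 * correct / total);
    }
}
```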
Fig 4: Accuracy chart on diabetes

In the diabetes dataset the accuracy parameters are shown in Table 4 and Fig 4. The chart shows that the two algorithms have almost equal accuracy measures except for the ROC Area measure, in which MLP has higher accuracy on the diabetes dataset. So MLP is the better method for the diabetes dataset.
In the glass dataset the accuracy parameters are shown in Table 5 and Fig 5. The chart shows that the two algorithms have almost equal accuracy measures except for the ROC Area measure, in which MLP has higher accuracy on the glass dataset. So MLP is the better method for the glass dataset.

Table 6. Accuracy on lymphography

S.N.   Parameters      J48     MLP

In the vehicle dataset the accuracy parameters are shown in Table 7 and Fig 7. Algorithm MLP has the better accuracy measures except for the FP rate. So MLP is the better method for the vehicle dataset.

Table 8. Accuracy measure of J48 and MLP

S.N.   Datasets        J48     MLP
1      balance-scale   76.64   90.72