Selection of The Best Classifier From Different Datasets Using WEKA

Ranjita Dash
National Institute of Technology Rourkela
Abstract

In today's world, large amounts of data are available in science, industry, business and many other areas. These data can provide valuable information that management can use for making important decisions, and data mining lets us find that information. Data mining is a popular topic among researchers, and much work remains to be explored. This paper focuses on a fundamental concept of data mining, namely classification techniques. In this paper, Naive Bayes, Functions, Lazy, Meta, Nested dichotomies, Rules and Trees classifiers are used for the classification of data.

1. Introduction

Data mining is the process of extracting patterns from data [10, 11]. It is seen as an increasingly important tool by modern business to transform data as technology advances and the need for efficient data analysis grows. Data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. It is currently used in a wide range of areas such as marketing, surveillance, fraud detection and scientific discovery. In this paper we process a cancer dataset and use
International Journal of Engineering Research & Technology (IJERT), ISSN: 2278-0181, Vol. 2 Issue 3, March 2013 (www.ijert.org)
neurons enable the network to learn complex tasks by extracting progressively more meaningful features from the input patterns. The network exhibits a high degree of connectivity, determined by its synapses. A change in the connectivity of the network requires a change in the population of synaptic connections or their weights [5].

Decision tree induction has been studied in detail in both pattern recognition and machine learning [13, 14]. That work synthesizes the experience gained by people working in the area of machine learning and describes a computer program called ID3.
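ID3, mentioned above, grows a decision tree by repeatedly choosing the attribute with the highest information gain. A minimal sketch of that attribute-selection step is shown below; the weather-style records are a hypothetical toy example, not the paper's dataset:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(records, attr, target):
    """Entropy reduction obtained by splitting records on attribute attr."""
    base = entropy([r[target] for r in records])
    remainder = 0.0
    for v in {r[attr] for r in records}:
        subset = [r[target] for r in records if r[attr] == v]
        remainder += len(subset) / len(records) * entropy(subset)
    return base - remainder

# Toy weather-style data (illustrative only)
data = [
    {"outlook": "sunny",    "windy": "no",  "play": "no"},
    {"outlook": "sunny",    "windy": "yes", "play": "no"},
    {"outlook": "rain",     "windy": "no",  "play": "yes"},
    {"outlook": "rain",     "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no",  "play": "yes"},
]

# ID3's selection step: take the attribute with the highest gain
best = max(["outlook", "windy"], key=lambda a: information_gain(data, a, "play"))
```

ID3 would then create a child node per value of the chosen attribute and recurse on each subset until the labels are pure.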
3.3. RULES CLASSIFIER

Association rules are used to find interesting correlation relationships among all the attributes, and they may predict more than one conclusion. The number of records an association rule can predict correctly is called coverage. Support is defined as coverage divided by the total number of records [5]. Accuracy is the number of records predicted correctly, expressed as a percentage of all instances the rule is applied to. The methods of this algorithm are Conjunctive Rule, Decision Table, DTNB, JRip, NNge, OneR, Ridor and ZeroR.
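The coverage, support and accuracy definitions above can be sketched for a single rule; the transaction records and the bread-implies-butter rule here are hypothetical illustrations:

```python
def rule_metrics(records, antecedent, consequent):
    """Coverage, support and accuracy of one rule, per the definitions above.
    antecedent/consequent are predicates over a record (illustrative helpers)."""
    matched = [r for r in records if antecedent(r)]   # instances the rule applies to
    correct = [r for r in matched if consequent(r)]   # records predicted correctly
    coverage = len(correct)                           # coverage = correct predictions
    support = coverage / len(records)                 # support = coverage / total records
    accuracy = 100.0 * len(correct) / len(matched) if matched else 0.0
    return coverage, support, accuracy

# Hypothetical transactions for the rule "buys bread -> buys butter"
records = [
    {"bread": True,  "butter": True},
    {"bread": True,  "butter": False},
    {"bread": True,  "butter": True},
    {"bread": False, "butter": True},
]
cov, sup, acc = rule_metrics(records,
                             lambda r: r["bread"],
                             lambda r: r["butter"])
```

Here the rule applies to three records and is right on two, so coverage is 2, support is 2/4 = 0.5, and accuracy is 2/3 expressed as a percentage.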
Rules are easier to understand than large trees. One rule is created for each path from the root to a leaf; each attribute-value pair along a path forms a conjunction, and the leaf holds the class prediction. The rules are mutually exclusive and are learned one at a time: each time a rule is learned, the tuples covered by that rule are removed.

3.4. LAZY CLASSIFIER

When making a classification or prediction, lazy learners can be computationally expensive. They require efficient storage techniques and are well suited to implementation on parallel hardware. They offer little explanation or insight into the structure of the data. Lazy learners, however, naturally support incremental learning, and they can model complex decision spaces with hyper-polygonal shapes that may not be as easily described by other learning algorithms. The methods of this algorithm are IB1, IBk, K-Star, LBR and LWL.

3.5. META CLASSIFIER

Meta classifiers include a wide range of classifiers. They are useful when the attributes have a large number of values, because the time and space complexities depend not only on the number of attributes but also on the number of values of each attribute.

3.6. DECISION TREES

4. DISCUSSION AND RESULT

To investigate the performance of the selected classification methods or algorithms, namely Bayes, Function, Lazy, Meta, Rules, Misc, Nested dichotomies and Trees, we use the same experimental procedure as suggested by WEKA: 75% of the data is used for training and the remainder for testing.

In WEKA, all data are considered as instances, and features in the data are known as attributes. The simulation results are partitioned into several sub-items for easier analysis and evaluation. The first part reports correctly and incorrectly classified instances as counts and percentages, followed by the time taken to build each model in seconds. The results of the simulation are shown in the tables, and the figures give a graphical representation of the simulation results. On the basis of the comparison of accuracy and error rates, the classification technique with the highest accuracy is obtained for this dataset across the different machine learning tools.

We can clearly see that the highest accuracy is 75.52% and the lowest is 51.74%; in fact, the highest accuracy belongs to the Meta classifier. The total time required to build the model is also a crucial parameter for comparing the classification algorithms. In this experiment, the single conjunctive rule learner requires the shortest time, around 0.15 seconds, compared to the others.

With the help of the figures we show the working of the various algorithms used in WEKA, along with the advantages and disadvantages of each algorithm. Every algorithm has its own importance, and we choose among them based on the behaviour of the data. Deep knowledge of the algorithms is not required for working in WEKA; this is the main reason WEKA is a suitable tool for data mining applications. This paper shows only the classification operations in WEKA; we will try to make a complete reference paper on WEKA.
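The evaluation procedure described above (a 75% training / 25% testing split, reporting correctly and incorrectly classified instances as counts and percentages) can be sketched with a simple lazy 1-nearest-neighbour learner. The synthetic two-class data and the classifier below are illustrative stand-ins, not WEKA's implementations or the paper's dataset:

```python
import random

def nn1_predict(train, x):
    """Lazy 1-NN: no model is built up front; all work happens at prediction time."""
    nearest = min(train, key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], x)))
    return nearest[1]

random.seed(0)
# Synthetic two-class data: class 0 clustered near (0, 0), class 1 near (3, 3)
data = [((random.gauss(0, 1), random.gauss(0, 1)), 0) for _ in range(60)] + \
       [((random.gauss(3, 1), random.gauss(3, 1)), 1) for _ in range(60)]
random.shuffle(data)

split = int(0.75 * len(data))          # 75% train / 25% test, as in the paper
train, test = data[:split], data[split:]

# Report results WEKA-style: correct/incorrect counts and a percentage
correct = sum(1 for x, y in test if nn1_predict(train, x) == y)
incorrect = len(test) - correct
pct_correct = 100.0 * correct / len(test)
```

Repeating this for each classifier family and comparing `pct_correct` and the build time is, in essence, the comparison the tables and figures below summarize.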
Table for best algorithms:

Name of algorithm | Correctly classified instances (%) | Incorrectly classified instances (%) | Time taken to build the model (s)
BayesNet          | 72.028                             | 27.972                               | 0.03
Simple Logistic   | 75.1748                            | 24.8252                              | 1.44

Figure no-1: Incorrectly classified instances for each algorithm (OrdinalClassClassifier, kStar, J48, Decision Table, BayesNet, Misc.HyperPipes, Simple Logistic, FilteredClassifier, Logistic).

Figure no-3: Correctly classified instances for each algorithm.

4.2 Comparison between LUNG dataset, HEART dataset and DIABETES dataset
[1] D. Lavanya and K. Usha Rani, "Analysis of feature selection with classification: Breast cancer datasets", Indian Journal of Computer Science and Engineering (IJCSE), October 2011.
[2] E. Osuna, R. Freund and F. Girosi, "Training support vector machines: Application to face detection", Proceedings of Computer Vision and Pattern Recognition, Puerto Rico, pp. 130–136, 1997.
[3] W. Buntine, "Theory refinement on Bayesian networks", in B. D. D'Ambrosio, P. Smets and P. P. Bonissone (Eds.), Proceedings of the Seventh Annual Conference on Uncertainty in Artificial Intelligence, pp. 52–60, San Francisco, CA.
[4] S. V. Chakravarthy and J. Ghosh, "Scale Based Clustering using Radial Basis Function Networks", Proceedings of the IEEE International Conference on Neural Networks, Orlando, Florida, pp. 897–902, 1994.
5. M. D. Buhmann, Radial Basis Functions: Theory and Implementations, 2003.
[5] A. J. Howell and H. Buxton, "RBF Network Methods for Face Detection and Attentional Frames", Neural Processing Letters, 2002.
[11] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2000.
[12] W. H. Wolberg, W. N. Street, D. M. Heisey and O. L. Mangasarian, "Computerized breast cancer diagnosis and prognosis from fine needle aspirates", Western Surgical Association meeting, Palm Desert, California, November 14, 1994.
[13] Y. Chen, A. Abraham and B. Yang, "Feature Selection and Classification using Flexible Neural Tree", Neurocomputing 70(1–3): 305–313, 2006.
[14] K. Golnabi et al., "Analysis of firewall policy rules using data mining techniques", 2006, pp. 305–315.
[15] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, Wiley-Interscience, New York, 1973.
[16] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, New York, 1999.
[17] V. N. Vapnik, The Nature of Statistical Learning Theory, 1st ed., Springer-Verlag, New York, 1995.
[18] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.