Decision Tree Analysis On J48 Algorithm
Manish Mathuria
Dept. of C.E. & I. T.,
Govt. Engineering College,
Ajmer, India
Abstract— Data mining is a technique for drilling into databases to give meaning to accessible data. It involves the
systematic analysis of large data sets. Classification is used to manage data, and tree modelling of data sometimes
helps to make predictions about new data. This research is focused on the J48 algorithm, which is used to create
Univariate Decision Trees. The study also discusses the idea of the multivariate decision tree, which classifies an
instance by using more than one attribute at each internal node. The core aim is to gain in-depth knowledge of new
areas of research by exploring data, information, knowledge, data mining techniques, and tools. Finally, the results
of the experiments in Weka are examined.
Keywords— Data Mining; Classification Techniques; J48; Decision Trees; Univariate algorithm; Multivariate algorithm; Pruning
I. INTRODUCTION
Weka is open-source software for data mining, released under the GNU General Public License. The system was developed at the
University of Waikato in New Zealand; Weka stands for the Waikato Environment for Knowledge Analysis and is
freely available at http://www.cs.waikato.ac.nz/ml/weka. The system is written in the object-oriented language Java.
Weka provides implementations of state-of-the-art data mining and machine learning algorithms. Using the Weka tool, a user
can perform association, filtering, classification, clustering, visualization, regression, and more. Every
organization is accumulating vast and growing amounts of data, in different formats and in different databases on different
platforms. This data can yield meaningful information about the objects it describes.
Information is simply data with some meaning, i.e., processed data; information is in turn converted into knowledge
through the KDD (Knowledge Discovery in Databases) process.
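As a minimal illustration of the tool's programmatic use, the following Java sketch loads a data set through Weka's API and builds a J48 tree; the file name weather.arff is only a placeholder for any data set in Weka's ARFF format.

    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class WekaIntro {
        public static void main(String[] args) throws Exception {
            // Load instances from an ARFF file (placeholder name).
            Instances data = DataSource.read("weather.arff");
            // Treat the last attribute as the class to predict.
            data.setClassIndex(data.numAttributes() - 1);
            // Build a J48 (C4.5) decision tree and print it.
            J48 tree = new J48();
            tree.buildClassifier(data);
            System.out.println(tree);
        }
    }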
Data Mining is the non-trivial extraction of implicit, previously unknown, and potentially useful information from data.
Data mining finds important information hidden in large volumes of data; it is the
use of software techniques for finding patterns and regularities in sets of data [12]. Data Mining is an interdisciplinary
field involving databases, statistics, and machine learning. The main techniques available for data mining are
given below:
A. Association Rule Learning: - Also called market basket analysis or dependency modelling, this is used to
discover relationships and association rules among variables.
B. Clustering: - This technique discovers and creates groups of similar data items. It is also called unsupervised
classification.
C. Classification: - This classifies data according to their classes, i.e., puts each data item into the single group that belongs to a
common class. It is also called supervised classification.
D. Regression: - This tries to find a function that models the data with the least error.
E. Summarization: - This provides easy-to-understand analysis through visualization, reports, etc. [11].
Computers make it possible to automate the mining process. Various data mining tools are available on the market; some are:
Environment for DeveLoping KDD-Applications Supported by Index-Structures (ELKI)
jHepWork
Konstanz Information Miner (KNIME)
Orange (software)
RapidMiner
Scriptella ETL, an ETL (Extract-Transform-Load) and script execution tool
Weka [11].
II. DECISION TREE
A decision tree is a decision support tool that uses a tree-like graph of decisions and their possible consequences, including
chance event outcomes, resource costs, and utility. A Decision Tree, or classification tree, is used to learn a classification
function which predicts the value of a dependent attribute (variable) given the values of the independent (input)
attributes (variables). This is an instance of the problem known as supervised classification, because the dependent attribute and the
number of classes (values) are given [4].
Decision trees are among the most powerful approaches in knowledge discovery and data mining. They encompass
techniques for searching large and complex bodies of data in order to discover useful patterns. This idea is very important
because it enables modelling and knowledge extraction from the bulk of data available. Theoreticians and specialists
are continually searching for techniques to make the process more efficient, cost-effective, and accurate. Decision trees
are highly effective tools in many areas, such as data and text mining, information extraction, machine learning, and
pattern recognition.
Decision trees offer many benefits to data mining; some are as follows:
They are easy for end users to understand.
They can handle a variety of input data: nominal, numeric, and textual.
They are able to process erroneous data sets and missing values.
They give high performance with a small amount of effort.
They can be implemented in data mining packages over a variety of platforms [10].
A tree includes a root node, leaf nodes that represent classes, and internal nodes that represent test conditions (applied
on attributes), as shown in Figure 1.
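Conceptually, such a tree can be represented by a simple node type. The following Java sketch uses illustrative names (it is not taken from any particular package): internal nodes hold a test attribute, leaves hold a class label.

    import java.util.Map;

    class TreeNode {
        String attribute;               // test attribute at an internal node (null at a leaf)
        Map<String, TreeNode> children; // one branch per attribute value
        String label;                   // class label at a leaf (null at an internal node)

        // Classify an instance given as a map from attribute name to value.
        String classify(Map<String, String> instance) {
            if (label != null) return label; // reached a leaf: return its class
            TreeNode child = children.get(instance.get(attribute));
            return child == null ? null : child.classify(instance);
        }
    }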
The basic construction steps are:
A. Finding the best splitting attribute (depending upon the current selection criterion) [4].
B. Counting information gain
Entropy is used in this process. Entropy is a measure of the disorder of data, and is measured in bits, nats, or bans. It
is also called the measurement of uncertainty in a random variable. Suppose there is a fair coin: a single toss of that
coin has an entropy of one bit, and a series of two fair coin tosses has an entropy of two bits.
If the coin is not fair, there is less uncertainty, which gives a lower entropy.
The entropy of a random variable $X$ with probability distribution $p(x)$ can be calculated as:
$$H(X) = -\sum_{x} p(x)\,\log p(x)$$
The conditional entropy is:
$$H(Y \mid X) = \sum_{x} p(x)\, H(Y \mid X = x)$$
If the base of the logarithm is 2, entropy is measured in bits; if the base is 10, the unit is dits. Information Gain is
used for measuring the association between inputs and outputs. It is the change in information entropy from one state to another, and
can be calculated as:
$$IG(Y; X) = H(Y) - H(Y \mid X)$$
To get a small and efficient tree, splitting should be based on the highest gain. Suppose there are 9 male (m) and 5
female (f) instances in a class. This set is divided into two groups by a candidate test, giving 3m and 4f in the left
partition and 6m and 1f in the right partition. Entropy and information gain can be measured by putting the values
into the formulas given above:
Entropy_before = -5/14 log2(5/14) - 9/14 log2(9/14) ≈ 0.940 bits
Entropy_left = -3/7 log2(3/7) - 4/7 log2(4/7) ≈ 0.985 bits
Entropy_right = -6/7 log2(6/7) - 1/7 log2(1/7) ≈ 0.592 bits
Entropy_after = 7/14 * Entropy_left + 7/14 * Entropy_right ≈ 0.789 bits
Information Gain = Entropy_before - Entropy_after ≈ 0.152 bits
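These values can be verified with a few lines of Java (the class and method names are illustrative):

    public class GainCheck {
        // Binary entropy, in bits, of a two-class split with counts a and b.
        static double h(double a, double b) {
            double p = a / (a + b), q = b / (a + b);
            return -(p * Math.log(p) + q * Math.log(q)) / Math.log(2);
        }

        public static void main(String[] args) {
            double before = h(9, 5);                      // ~0.940 bits
            double after = 0.5 * h(3, 4) + 0.5 * h(6, 1); // ~0.789 bits
            System.out.printf("gain = %.3f bits%n", before - after); // ~0.152
        }
    }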
A Separate and Conquer rule learner builds a rule set R by repeatedly learning a single rule r, removing the instances
that r covers, and adding r to the set:
R := empty rule set
while instances remain:
    learn a rule r that covers some of the remaining instances
    remove the instances covered by r
    R := add r to R
return R
Some Separate and Conquer rule learning schemes are:
Reduced-error pruning for rules
A multivariate test condition at an internal node tests a linear combination of attributes of the form
$\sum_{i=1}^{n} w_i y_i$, where the $w_i$ are real-valued coefficients, the $y_i$ are attributes, and $n$ is the total
number of attributes in an instance. Figures 3 and 4 show the difference between univariate and multivariate space
partitioning, and also illustrate how multivariate test conditions are placed on internal nodes [4].
Construction
First, we need a set of training instances. A top-down decision tree algorithm uses a merit selection
criterion to choose the best splitting attribute and create a branch, giving two partitions. The algorithm then applies the same
top-down analysis to create further partitions. One stopping criterion is reached when all the instances in a partition belong to
a single class. The only difference between multivariate and univariate tree
construction lies in the splitting criterion: a multivariate decision tree uses an LM (linear machine) [4].
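A compact, runnable sketch of this top-down procedure for the univariate case is given below. All names are illustrative, and the sketch simplifies J48 considerably (it ignores pruning and numeric attributes, and falls back to the first label when attributes run out):

    import java.util.*;

    public class TopDown {
        static final String CLASS = "class"; // key under which each instance stores its label

        // Entropy, in bits, of the class distribution of a set of instances.
        static double entropy(List<Map<String, String>> insts) {
            Map<String, Long> counts = new HashMap<>();
            for (Map<String, String> i : insts) counts.merge(i.get(CLASS), 1L, Long::sum);
            double h = 0;
            for (long c : counts.values()) {
                double p = (double) c / insts.size();
                h -= p * Math.log(p) / Math.log(2);
            }
            return h;
        }

        // Partition instances by the value they take on the given attribute.
        static Map<String, List<Map<String, String>>> split(
                List<Map<String, String>> insts, String attr) {
            Map<String, List<Map<String, String>>> parts = new HashMap<>();
            for (Map<String, String> i : insts)
                parts.computeIfAbsent(i.get(attr), k -> new ArrayList<>()).add(i);
            return parts;
        }

        // Information gain of splitting on attr.
        static double gain(List<Map<String, String>> insts, String attr) {
            double after = 0;
            for (List<Map<String, String>> p : split(insts, attr).values())
                after += (double) p.size() / insts.size() * entropy(p);
            return entropy(insts) - after;
        }

        // Recursive top-down construction: returns either a class label (String)
        // or a map from "attr=value" branches to subtrees.
        static Object build(List<Map<String, String>> insts, Set<String> attrs) {
            // Stopping criterion: all instances belong to a single class.
            if (entropy(insts) == 0 || attrs.isEmpty()) return insts.get(0).get(CLASS);
            // Merit criterion: choose the best splitting attribute by gain.
            String best = Collections.max(attrs,
                    Comparator.comparingDouble(a -> gain(insts, a)));
            Set<String> rest = new HashSet<>(attrs);
            rest.remove(best);
            Map<String, Object> node = new LinkedHashMap<>();
            for (Map.Entry<String, List<Map<String, String>>> e : split(insts, best).entrySet())
                node.put(best + "=" + e.getKey(), build(e.getValue(), rest));
            return node;
        }
    }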
LM
A linear machine is a set of R linear discriminant functions that are used collectively to assign an instance to one of the R
classes. Here p is an instance description consisting of a 1 (a constant threshold term) followed by the n features that
describe the instance. Each discriminant function $g_i(p)$ of the multivariate form is [4]:
$$g_i(p) = w_i^{T} p$$
where $w_i$ is a vector of n + 1 coefficients. The LM states that instance p belongs to class i iff
$g_i(p) > g_j(p)$ for all $j \neq i$, $j = 1, \dots, R$. Training uses the absolute error correction rule, which adjusts
$w_i$, where i is the class to which the instance actually belongs, and $w_j$, where j is the class to which the LM
incorrectly assigns the instance. The correction is accomplished by
$$w_i \leftarrow w_i + c\,p, \qquad w_j \leftarrow w_j - c\,p$$
where
$$c = \left\lfloor \frac{(w_j - w_i)^{T} p}{2\, p^{T} p} \right\rfloor + 1$$
is the smallest integer such that the updated weights classify the instance correctly [4], [9].
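Read as code, the rule might look like the following Java sketch (the array layout and names are illustrative; w holds one weight vector per class, and p includes the leading 1):

    static void absoluteErrorCorrection(double[][] w, double[] p, int i, int j) {
        // i: the class the instance belongs to; j: the class the LM wrongly chose.
        double diff = 0, norm = 0;
        for (int k = 0; k < p.length; k++) {
            diff += (w[j][k] - w[i][k]) * p[k];
            norm += p[k] * p[k];
        }
        // Smallest integer c such that the updated LM classifies p correctly.
        double c = Math.floor(diff / (2 * norm)) + 1;
        for (int k = 0; k < p.length; k++) {
            w[i][k] += c * p[k]; // pull the true class's discriminant up
            w[j][k] -= c * p[k]; // push the wrong class's discriminant down
        }
    }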
Thermal Perceptron
This method is used for instances that are not linearly separable. It also adjusts $w_i$ and $w_j$, but scales the
correction by a temperature parameter that is annealed (gradually reduced) during training, so that corrections become
smaller over time and the weights stabilize even when no error-free linear machine exists [4], [9].
IV. EXPERIMENT AND RESULTS
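The kind of experiment reported here can be reproduced with Weka's Java API. The sketch below builds a J48 tree on an ARFF data set (the file name is a placeholder) and estimates its accuracy with 10-fold cross-validation, as Weka's Explorer does by default:

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class J48Experiment {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("dataset.arff"); // placeholder file name
            data.setClassIndex(data.numAttributes() - 1);

            J48 tree = new J48();
            // -C sets the pruning confidence, -M the minimum instances per leaf.
            tree.setOptions(new String[] {"-C", "0.25", "-M", "2"});
            tree.buildClassifier(data);

            // Evaluate with 10-fold cross-validation and print the results.
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(tree, data, 10, new Random(1));
            System.out.println(tree);
            System.out.println(eval.toSummaryString());
        }
    }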
V. CONCLUSION
This paper has presented a discussion of Decision Trees covering both the Univariate and the Multivariate approaches. Weka is used
as the data mining tool; it provides various algorithms that can be applied to data sets. The J48 algorithm is used to implement
the Univariate Decision Tree approach, and its results are discussed. The Multivariate approach is introduced through the Linear
Machine, which makes use of the Absolute Error Correction and Thermal Perceptron rules. The Decision
Tree is a popular technique for supervised classification, especially when the results are to be interpreted by humans.
A Multivariate Decision Tree uses the correlation between attributes and provides a better way to perform conditional
tests than the Univariate approach. The study concludes that the Multivariate approach is far better than the
Univariate approach when dealing with large amounts of data.
REFERENCES
[1]. Dolado, J. J., D. Rodríguez, and J. Riquelme. "A Two Stage Zone Regression Method for Global
Characterization of a Project Database." (2007): 13. Web. 5 Apr. 2013.
[2]. Berzal, Fernando, Juan-Carlos Cubero, and Nicolás Marín. "Building multi-way decision trees with
numerical attributes." 31. Web. 5 Apr. 2013.
[3]. Frank, Eibe. "Pruning Decision Trees and Lists." (2000): 218. Web. 5 Apr. 2013.
[4]. Korting, Thales S. "C4.5 algorithm and Multivariate Decision Trees." 5. Web. 2 Feb. 2013.
[5]. Quinlan, J. R. "Improved Use of Continuous Attributes in C4.5." 14. Web. 11 Jan. 2013.
[6]. JUNEJA, DEEPTI, et al. "A novel approach to construct decision tree using quick C4.5 algorithm." Oriental
Journal of Computer Science & Technology Vol. 3(2), 305-310 (2010) (2010): 6. Web. 18 Feb. 2013.
[7]. Ittner, Andreas, et al. "Non-Linear Decision Trees - NDT." In: Proceedings of the 13th International Conference on
Machine Learning (ICML'96): 6. Web. 16 Mar. 2013.
[8]. Moertini, Veronica S. "TOWARDS THE USE OF C4.5 ALGORITHM FOR CLASSIFYING BANKING
DATASET." Vol. 8 No. 2, October 2003 (2003): 12. Web. 24 Jan. 2013.
[9]. Utgoff, Paul E. "Linear Machine Decision Tree." (1991): 15. Web. 6 Feb. 2013.
[10]. Rokach, Lior, and Oded Maimon. "DECISION TREES." 28. Web. 1 Feb. 2013.
[11]. Data Mining from Wikipedia the free Encyclopedia. Web. <http://en.wikipedia.org/wiki/Data_mining>.
[12]. "Data Mining: What is Data Mining?" Web.
<http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm>.
[13]. Rokach, Lior. "Data Mining with Decision Trees: Theory and Applications." 69 (2008): Web. 3 Feb. 2013.
[14]. Gasperin, Matej. "Case Study on the use of Data Mining Techniques in Food Science using Honey Samples."
(February 2007): 18. Web. 8 May 2013.
[15]. Ozer, Patrick. "Data Mining Algorithms for Classification." (January 2008): 27. Web. 5 May 2013.
[16]. Gholap, Jay. "PERFORMANCE TUNING OF J48 ALGORITHM FOR PREDICTION OF SOIL FERTILITY." 5.
Web. 2 May 2013.