MODULE 3
CHAPTER 6
DECISION TREE LEARNING
6.1 Introduction
The decision tree learning model, one of the most popular supervised predictive learning models, classifies data instances with high accuracy and consistency. The model performs inductive inference, that is, it reaches a general conclusion from observed examples. This model is widely used for solving complex classification applications.
A decision tree is a concept tree that summarizes the information contained in the training dataset in the form of a tree structure. Once the concept model is built, test data can be easily classified.
6.1.1 Structure of a Decision Tree
A decision tree is a structure that includes a root node, branches, and leaf nodes. Each internal node denotes a test on an attribute, each branch denotes the outcome of a test, and each leaf node holds a class label. The topmost node in the tree is the root node.
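A minimal sketch of this structure in Python follows; the names Node, attribute, branches and label are illustrative choices, not taken from the text.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    """One node of a decision tree (sketch)."""
    attribute: Optional[str] = None                # attribute tested at an internal node
    branches: dict = field(default_factory=dict)   # test outcome -> child Node
    label: Optional[str] = None                    # class label held by a leaf node

    def is_leaf(self):
        return self.label is not None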
Goal Construct a decision tree with the given training dataset. The tree is constructed in a top-down fashion, starting from the root node. At every level of tree construction, we need to find the best split attribute, or best decision node, among all attributes. This process is recursive and continues until we reach the last level of the tree or find a leaf node that cannot be split further. The tree construction is complete when all test conditions lead to a leaf node. The leaf node contains the target class or output of the classification.
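The top-down construction described above can be sketched as a recursive procedure. The sketch below reuses the Node type shown earlier; choose_attribute is an assumed placeholder for whichever splitting criterion an algorithm uses (Information Gain, Gain Ratio, Gini Index), not a prescribed function.

from collections import Counter

def build_tree(dataset, attributes, choose_attribute):
    """Top-down recursive construction of a decision tree (sketch).

    dataset          : list of (attribute_value_dict, class_label) pairs
    attributes       : attribute names still available for splitting
    choose_attribute : function implementing the splitting criterion
    """
    labels = [label for _, label in dataset]

    # Create a leaf node when all instances share one class
    # or when no attribute is left to test.
    if len(set(labels)) == 1 or not attributes:
        return Node(label=Counter(labels).most_common(1)[0][0])

    # Find the best split attribute among the remaining attributes.
    best = choose_attribute(dataset, attributes)
    node = Node(attribute=best)

    # Grow one branch for each outcome of the test on the chosen attribute.
    for value in set(row[best] for row, _ in dataset):
        subset = [(row, label) for row, label in dataset if row[best] == value]
        remaining = [a for a in attributes if a != best]
        node.branches[value] = build_tree(subset, remaining, choose_attribute)
    return node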
Goal Given a test instance, infer the target class to which it belongs.
Classification Inferring the target class of the test instance or object is based on inductive inference over the constructed decision tree. To classify an object, we start traversing the tree from the root. At each decision node, we evaluate the test condition on the test object's attribute value and walk to the branch corresponding to the test's outcome. This process is repeated until we end up in a leaf node, which contains the target class of the test object.
Output Target label of the test instance.
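A sketch of this traversal, again using the assumed Node type from Section 6.1.1, is given below; it does not handle attribute values that were never seen during training.

def classify(node, instance):
    """Infer the target class of a test instance by traversing the tree (sketch).

    instance : dict mapping attribute name -> attribute value
    """
    # Start at the root and walk down until a leaf node is reached.
    while not node.is_leaf():
        outcome = instance[node.attribute]   # evaluate the test condition
        node = node.branches[outcome]        # walk to the matching branch
    return node.label                        # the leaf holds the target class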
4. Can model a high degree of nonlinearity in the relationship between the target variables
and the predictor variables
5. Quick to train
Let P be the probability distribution of the data instances over classes 1 to n, as shown in Eq. (6.2):
P = (P1, P2, ..., Pn)    (6.2)
The entropy of P is the information measure of this probability distribution, given in Eq. (6.3):
Entropy_Info(P) = Entropy_Info(P1, P2, ..., Pn)
               = -(P1 log2(P1) + P2 log2(P2) + ... + Pn log2(Pn))    (6.3)
where P1 is the probability of data instances classified as class 1, P2 is the probability of data instances classified as class 2, and so on.
P1 = |Number of data instances belonging to class 1| / |Total number of data instances in the training dataset|
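A small Python sketch of Eq. (6.3), computing the entropy from a list of class labels, is given below; the function name entropy_info is only an illustrative choice.

import math
from collections import Counter

def entropy_info(labels):
    """Entropy of a set of class labels, following Eq. (6.3) (sketch)."""
    total = len(labels)
    # Each Pi is the fraction of instances belonging to class i.
    probabilities = [count / total for count in Counter(labels).values()]
    # Terms with Pi = 0 contribute nothing, so they are skipped.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Example: 9 instances of one class and 5 of another.
print(round(entropy_info(['yes'] * 9 + ['no'] * 5), 4))   # prints 0.9403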
There are many decision tree algorithms, such as ID3, C4.5, CART, CHAID, QUEST, GUIDE, CRUISE, and CTREE, that are used for classification in real-time environments. The most commonly used are ID3 (Iterative Dichotomizer 3), developed by J. R. Quinlan in 1986, and C4.5, an advancement of ID3 presented by the same author in 1993. CART, which stands for Classification and Regression Trees, is another algorithm, developed by Breiman et al. in 1984.
The accuracy of the tree constructed depends upon the selection of the best split attribute.
Different algorithms are used for building decision trees which use different measures to
decide on the splitting criterion. Algorithms such as ID3, C4.5 and CART are popular
algorithms used in the construction of decision trees. The algorithm ID3 uses 'Information
Gain' as the splitting criterion whereas the algorithm C4.5 uses 'Gain Ratio' as the splitting
criterion. The CART algorithm is widely used for classifying both categorical and continuous-valued target variables. CART uses the Gini Index to construct a decision tree.
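For comparison, a sketch of the Gini Index computation used as CART's splitting measure is shown below; the function name gini_index is illustrative.

from collections import Counter

def gini_index(labels):
    """Gini Index of a set of class labels (sketch): 1 - sum of Pi squared."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

# Example: 9 instances of one class and 5 of another.
print(round(gini_index(['yes'] * 9 + ['no'] * 5), 4))   # prints 0.4592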
Regression trees are a variant of decision trees where the target feature is a continuous-valued variable. These trees can be constructed using an approach called reduction in variance, which uses the standard deviation to choose the best splitting attribute.
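A sketch of the reduction-in-variance idea follows: the standard deviation of the target values before a split is compared with the weighted standard deviation of the subsets after the split, and the attribute giving the largest reduction is preferred. The function name std_reduction and the dataset layout assumed here are illustrative.

import statistics
from collections import defaultdict

def std_reduction(dataset, attribute):
    """Reduction in standard deviation from splitting on an attribute (sketch).

    dataset : list of (attribute_value_dict, numeric_target) pairs
    """
    targets = [t for _, t in dataset]
    before = statistics.pstdev(targets)      # standard deviation before the split

    # Group the target values by the outcome of the test on the attribute.
    groups = defaultdict(list)
    for row, t in dataset:
        groups[row[attribute]].append(t)

    # Weighted standard deviation of the subsets after the split.
    after = sum(len(g) / len(dataset) * statistics.pstdev(g) for g in groups.values())
    return before - after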