
MACHINE LEARNING (BCS602)

MODULE 3
CHAPTER 6
DECISION TREE LEARNING
6.1 Introduction
The decision tree learning model, one of the most popular supervised predictive learning models,
classifies data instances with high accuracy and consistency. The model performs inductive
inference, reaching a general conclusion from observed examples, and is widely used for solving
complex classification applications.
A decision tree is a concept tree that summarizes the information contained in the training
dataset in the form of a tree structure. Once the concept model is built, test data can be easily
classified.

• Why is it called a decision tree? Because it starts from a root node and branches out to a number of possible solutions, much like a tree.
• The benefits of having a decision tree are as follows:
• It does not require any domain knowledge.
• It is easy to comprehend.
• The learning and classification steps of a decision tree are simple and fast.
• Example: a toll-free number menu, where each choice leads to further choices until a final answer is reached.

6.1.1 Structure of a Decision Tree
A decision tree is a structure that includes a root node, branches, and leaf nodes. Each internal
node denotes a test on an attribute, each branch denotes the outcome of that test, and each leaf
node holds a class label. The topmost node in the tree is the root node.

Decision trees apply to both classification and regression models.


The decision tree consists of two major procedures:

1) Building the tree, and

2) Knowledge inference or classification.

Building the Tree

Goal: Construct a decision tree with the given training dataset. The tree is constructed in a
top-down fashion, starting from the root node. At every level of tree construction, we need
to find the best split attribute or best decision node among all attributes. This process is
recursive and continues until we reach the last level of the tree or find a leaf node that
cannot be split further. The tree construction is complete when all test conditions lead to a
leaf node. The leaf node contains the target class, or output, of classification.

Output: A decision tree representing the complete hypothesis space.

Knowledge Inference or Classification

Goal: Given a test instance, infer the target class it belongs to.
Classification: Inferring the target class of the test instance is based on inductive inference
over the constructed decision tree. To classify an object, we start traversing the tree from the
root. At every decision node we evaluate the test condition on the test object's attribute value
and walk to the branch corresponding to the test's outcome. This process is repeated until we
end up in a leaf node, which contains the target class of the test object.
Output: The target label of the test instance.
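To make the traversal concrete, here is a minimal Python sketch of the classification step. It assumes a hypothetical Node structure (the attribute, branches, and label fields are illustrative, not part of the original notes) and test instances represented as dictionaries of attribute values:

class Node:
    """Illustrative decision-tree node: either a leaf with a class label,
    or an internal node that tests one attribute."""
    def __init__(self, attribute=None, branches=None, label=None):
        self.attribute = attribute      # attribute tested at this node (None for a leaf)
        self.branches = branches or {}  # attribute value -> child Node
        self.label = label              # class label (set only for leaf nodes)

def classify(node, instance):
    """Walk from the root to a leaf, following the branch that matches
    the test instance's value for each decision node's attribute."""
    while node.label is None:              # stop when a leaf node is reached
        value = instance[node.attribute]   # evaluate the test condition
        node = node.branches[value]        # walk to the matching branch
    return node.label                      # the leaf holds the target class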

Advantages of Decision Trees

1. Easy to model and interpret


2. Simple to understand
3. The input and output attributes can be either discrete or continuous variables.


4. Can model a high degree of nonlinearity in the relationship between the target variables
and the predictor variables
5. Quick to train

Disadvantages of Decision Trees


Some of the issues that generally arise with decision tree learning are:
1. It is difficult to determine how deeply a decision tree should be grown or when to stop
growing it.
2. If the training data has errors or missing attribute values, then the decision tree constructed
may become unstable or biased.
3. If the training data has continuous-valued attributes, handling them is computationally
complex; such attributes have to be discretized.
4. A complex decision tree may overfit the training data.
5. Decision tree learning is not well suited for classifying multiple output classes.
6. Learning an optimal decision tree is also known to be NP-complete.

6.1.2 Fundamentals of Entropy

• How do we draw a decision tree? Two measures guide the choice of split attributes:
• Entropy
• Information gain

Entropy is the amount of uncertainty or randomness in the outcome of a random variable or an
event. Entropy also describes the homogeneity of the data instances. The best feature is
selected based on the entropy value. For example, when a coin is flipped, heads or tails are
the only two outcomes, so its entropy is lower than that of rolling a die, which has six
outcomes.

Let P be the probability distribution of data instances from 1 to n as shown in Eq. (6.2).
So, P = (P1, P2, ..., Pn)    (6.2)
The entropy of P is the information measure of this probability distribution, given in Eq. (6.3):
Entropy_Info(P) = Entropy_Info(P1, P2, ..., Pn)
               = -(P1 log2(P1) + P2 log2(P2) + ... + Pn log2(Pn))    (6.3)


where P1 is the probability of data instances classified as class 1, P2 is the probability of data
instances classified as class 2, and so on.
P1 = |number of data instances belonging to class 1| / |total number of data instances in the
training dataset|
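As a quick illustration of Eq. (6.3), the short Python sketch below computes the entropy of a list of class labels; the function name and the example labels are illustrative, not part of the original notes.

from collections import Counter
from math import log2

def entropy_info(labels):
    """Entropy_Info(P) = -(P1*log2(P1) + ... + Pn*log2(Pn)),
    where Pi is the fraction of instances belonging to class i."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# A fair coin (two equally likely outcomes) has entropy 1 bit,
# while a fair six-sided die has entropy log2(6), about 2.585 bits.
print(entropy_info(["head", "tail"]))                # 1.0
print(entropy_info(["1", "2", "3", "4", "5", "6"]))  # ~2.585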

Algorithm 6.1: General Algorithm for Decision Trees
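The algorithm itself appears as a figure in the original notes. The following is a minimal Python sketch of the general top-down, recursive construction procedure described in Section 6.1.1, reusing the Node class sketched earlier and assuming a generic best_split_attribute() helper (one possible definition, based on information gain, is sketched in Section 6.2.1; the splitting measure depends on the algorithm, as discussed in Section 6.2).

def build_tree(dataset, attributes, target):
    """General top-down decision tree construction (sketch).
    dataset: list of dicts, each holding attribute values and a target label."""
    labels = [row[target] for row in dataset]

    # Stopping criteria: all instances share one class, or no attributes remain.
    if len(set(labels)) == 1:
        return Node(label=labels[0])
    if not attributes:
        return Node(label=max(set(labels), key=labels.count))  # majority class

    # Choose the best split attribute (information gain, gain ratio, Gini, ...).
    best = best_split_attribute(dataset, attributes, target)
    node = Node(attribute=best)

    # Recursively build one subtree per value of the chosen attribute.
    for value in set(row[best] for row in dataset):
        subset = [row for row in dataset if row[best] == value]
        remaining = [a for a in attributes if a != best]
        node.branches[value] = build_tree(subset, remaining, target)
    return node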

6.2 DECISION TREE INDUCTION ALGORITHMS

There are many decision tree algorithms, such as ID3, C4.5, CART, CHAID, QUEST,
GUIDE, CRUISE, and CTREE, that are used for classification in real-time environments. The
most commonly used are ID3 (Iterative Dichotomiser 3), developed by J. R. Quinlan in 1986,
and C4.5, an advancement of ID3 presented by the same author in 1993. CART, which stands
for Classification and Regression Trees, is another algorithm, developed by Breiman et al.
in 1984.

The accuracy of the constructed tree depends on the selection of the best split attribute.
Different tree-building algorithms use different measures to decide on the splitting criterion.
ID3 uses 'Information Gain' as the splitting criterion, whereas C4.5 uses 'Gain Ratio'. The
CART algorithm is popularly used for handling both categorical and continuous-valued target
variables and uses the Gini index to construct a decision tree.

6.2.1 ID3 Tree Construction (ID3 stands for Iterative Dichotomiser 3)

A decision tree is one of the most powerful supervised learning tools, used for both
classification and regression tasks.
It builds a flowchart-like tree structure where each internal node denotes a test on an
attribute, each branch represents an outcome of the test, and each leaf node (terminal node)
holds a class label. It is constructed by recursively splitting the training data into subsets
based on the values of the attributes until a stopping criterion is met, such as the maximum
depth of the tree or the minimum number of samples required to split a node.
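ID3's splitting criterion, information gain, can be sketched on top of the entropy_info helper shown in Section 6.1.2. The helper names below (entropy_of_split, information_gain, best_split_attribute) are illustrative, not from the original notes; the last one supplies the selection function assumed in the general build_tree sketch.

def entropy_of_split(dataset, attribute, target):
    """Weighted entropy of the subsets produced by splitting on `attribute`."""
    total = len(dataset)
    weighted = 0.0
    for value in set(row[attribute] for row in dataset):
        subset = [row[target] for row in dataset if row[attribute] == value]
        weighted += (len(subset) / total) * entropy_info(subset)
    return weighted

def information_gain(dataset, attribute, target):
    """Gain = entropy of the whole dataset minus the weighted entropy after the split."""
    return entropy_info([row[target] for row in dataset]) - entropy_of_split(dataset, attribute, target)

def best_split_attribute(dataset, attributes, target):
    """ID3 picks, at each node, the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(dataset, a, target))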


6.2.2 C4.5 Construction


C4.5 is a widely used algorithm for constructing decision trees from a dataset.
The disadvantages of ID3 are that attributes must be nominal, the dataset must not include
missing data, and the algorithm tends to overfit.
To overcome these bottlenecks, Ross Quinlan, the inventor of ID3, made improvements and
created a new algorithm named C4.5. The new algorithm can create more generalized models,
can handle continuous data and missing data, still works with discrete data, and supports
post-pruning.
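C4.5 replaces information gain with the gain ratio, which normalizes the gain by the split information of the attribute. The sketch below reuses the illustrative information_gain helper defined above; split_info and gain_ratio are likewise illustrative names.

from math import log2

def split_info(dataset, attribute):
    """Entropy of the partition sizes themselves, i.e. how evenly the attribute splits the data."""
    total = len(dataset)
    fractions = [sum(1 for row in dataset if row[attribute] == v) / total
                 for v in set(row[attribute] for row in dataset)]
    return -sum(f * log2(f) for f in fractions if f > 0)

def gain_ratio(dataset, attribute, target):
    """Gain ratio = information gain / split information; penalizes many-valued attributes."""
    si = split_info(dataset, attribute)
    return information_gain(dataset, attribute, target) / si if si > 0 else 0.0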


Dealing with Continuous Attributes in C4.5
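The worked material for this subsection appears as a figure in the original notes. As a rough sketch, C4.5 handles a continuous attribute by sorting its values and evaluating candidate thresholds, turning the attribute into a binary test (value <= threshold versus value > threshold). The helper below is illustrative and reuses the entropy_info function from Section 6.1.2.

def best_threshold(dataset, attribute, target):
    """Try midpoints between consecutive sorted values of a continuous attribute and
    return the threshold giving the highest information gain on the binary split."""
    values = sorted(set(row[attribute] for row in dataset))
    base = entropy_info([row[target] for row in dataset])
    best_gain, best_t = -1.0, None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [row[target] for row in dataset if row[attribute] <= t]
        right = [row[target] for row in dataset if row[attribute] > t]
        gain = base - (len(left) / len(dataset)) * entropy_info(left) \
                    - (len(right) / len(dataset)) * entropy_info(right)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain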


6.2.3 Classification and Regression Trees Construction


Classification and Regression Trees (CART) is a widely used algorithm for constructing
decision trees that can be applied to both classification and regression tasks. CART is similar
to C4.5 but has some differences in its construction and splitting criteria.
For classification, CART constructs a decision tree based on the Gini impurity index. The
tree shows how the values of the other variables can be used to predict the value of a target
variable. It is a fundamental machine-learning method with a wide range of use cases.
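CART's splitting measure, the Gini index, can be sketched in the same style as the entropy helper in Section 6.1.2; the function name is illustrative.

from collections import Counter

def gini_index(labels):
    """Gini impurity = 1 - sum(Pi^2), where Pi is the fraction of instances in class i.
    It is 0 for a pure node and largest when the classes are evenly mixed."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

print(gini_index(["yes", "yes", "yes", "yes"]))  # 0.0 (pure node)
print(gini_index(["yes", "no", "yes", "no"]))    # 0.5 (maximally mixed, two classes)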


6.2.4 Regression Trees

Regression trees are a variant of decision trees in which the target feature is a continuous-valued
variable. These trees can be constructed using an approach called reduction in variance, which
uses the standard deviation of the target values to choose the best splitting attribute.
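A minimal sketch of the reduction-in-variance idea: the attribute that most reduces the weighted standard deviation of the target is chosen as the split. The function name is illustrative, not from the original notes.

from statistics import pstdev

def std_reduction(dataset, attribute, target):
    """Standard deviation of the target before the split, minus the weighted
    standard deviation of the target within each subset after the split."""
    total = len(dataset)
    before = pstdev([row[target] for row in dataset])
    after = 0.0
    for value in set(row[attribute] for row in dataset):
        subset = [row[target] for row in dataset if row[attribute] == value]
        after += (len(subset) / total) * pstdev(subset)
    return before - after

# The attribute with the largest reduction is chosen as the splitting attribute.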
