Decision Tree Learning

This learning model, one of the most popular supervised predictive learning models, classifies data instances with high accuracy and consistency.
The model is widely used to solve complex classification problems.
A decision tree is a concept tree that summarizes the information contained in the
training dataset in the form of a tree structure.
Once the concept model is built, test data can be classified easily.
Advantages:
1. Easy to model and interpret.
2. Simple to understand.
3. Quick to train.
4. Input and output attributes can be continuous predictor variables.
Disadvantages
1. It is difficult to determine how deeply to grow a decision tree, or when to stop growing it.
2. A complex decision tree may overfit the training data.
3. Decision trees are not well suited for classifying multiple output classes.
4. Learning an optimal decision tree is known to be NP-complete.
 Example
 Predict a student’s academic performance, i.e., whether he will pass or fail, based on the given
information, ‘Assessment’ and ‘Assignment’. The following table shows the independent
variables, Assessment and Assignment, and the target variable, Exam Result, with their
values. Draw a binary decision tree.

Attribute       Values
Assessment      >=50, <50
Assignment      Yes, No
Exam Result     Pass, Fail
Decision tree diagram:
If (Assessment >= 50) then ‘Pass’
Else if (Assessment < 50) then
    If (Assignment == Yes) then ‘Pass’
    Else if (Assignment == No) then ‘Fail’
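The decision rule above can be sketched as a small Python function (the function name, argument encoding, and return values are assumptions made for illustration):

```python
def predict_exam_result(assessment: int, assignment: str) -> str:
    """Classify a student as 'Pass' or 'Fail' using the binary decision tree.

    assessment: marks obtained (numeric); assignment: 'Yes' or 'No'.
    """
    # Root node: split on Assessment
    if assessment >= 50:
        return "Pass"
    # Assessment < 50: split on Assignment
    return "Pass" if assignment == "Yes" else "Fail"
```

For example, a student with Assessment 40 who submitted the Assignment is classified as ‘Pass’, while one with Assessment 40 and no Assignment is classified as ‘Fail’.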
 Entropy:
To define information gain precisely, we begin by defining a measure commonly used in
information theory called entropy. Entropy basically tells us how impure a collection of data is; the
term impure here means non-homogeneous. In other words, entropy is a measurement of
impurity: given an arbitrary dataset, it tells us how impure or non-homogeneous
that dataset is.
Given a collection of examples/dataset S, containing positive and negative examples of some target
concept, the entropy of S relative to this boolean classification is:

Entropy(S) = -p(+) log2 p(+) - p(-) log2 p(-)

where p(+) and p(-) are the proportions of positive and negative examples in S.
 The dataset has 9 positive instances and 5 negative instances, therefore:

Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
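This calculation can be sketched in Python (the function name is an assumption for illustration):

```python
import math

def entropy(pos: int, neg: int) -> float:
    """Entropy of a boolean-labelled dataset with pos positive and neg negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:  # by convention, 0 * log2(0) is taken as 0
            p = count / total
            result -= p * math.log2(p)
    return result
```

Calling entropy(9, 5) for the dataset above yields approximately 0.940.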
 From these results we can conclude that if the dataset is completely homogeneous,
the impurity is 0 and therefore the entropy is 0; but if the dataset can be divided equally
between the two classes, it is completely non-homogeneous, the impurity is maximal
(100%), and therefore the entropy is 1.
 Now, if we plot the entropy against the proportion of positive examples, it will look like Figure 2.
The curve clearly shows that the entropy is lowest when the dataset is homogeneous and highest
when the dataset is completely non-homogeneous.
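The shape of that curve can be checked without a plotting library by tabulating the entropy of a two-class split as the positive proportion p varies (a sketch; the sampling grid of 0.1 steps is an arbitrary choice):

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy of a boolean split where a fraction p of the examples is positive."""
    if p in (0.0, 1.0):
        return 0.0  # homogeneous dataset: entropy is 0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Entropy rises from 0 at p = 0 to its maximum of 1 at p = 0.5,
# then falls symmetrically back to 0 at p = 1.
curve = [(p / 10, round(binary_entropy(p / 10), 3)) for p in range(11)]
```

The tabulated values confirm the description: the curve is symmetric about p = 0.5, where it reaches its peak of 1.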

Thank you
