6. Decision Trees: ID3 and CART

A decision tree is a classification and prediction tool that uses a tree-like structure to represent decisions based on attributes, with nodes for tests, branches for outcomes, and leaf nodes for class labels. Key concepts include entropy, information gain, and Gini impurity, which are used to determine the best splits in the data. Algorithms like ID3 and CART utilize these measures to build decision trees, with advantages such as interpretability and minimal preprocessing, but they can also overfit noisy data.

What is a decision tree?

A decision tree is a classification and prediction tool having a tree-like structure, where each
internal node denotes a test on an attribute, each branch represents an outcome of the test,
and each leaf node (terminal node) holds a class label.

Consider a small example decision tree built on height and weight. An important advantage of a decision tree is that it is
highly interpretable: if height > 180 cm, the person is classified as male; if height < 180 cm and weight > 80 kg, the person
is also classified as male; otherwise, female. Have you ever thought about how we come up with such a decision tree? I
will try to explain it using the weather dataset, and the sketch below shows the rule in code.
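To make the interpretability concrete, here is a minimal sketch of that rule as plain nested ifs (the function name and the handling of the exact 180 cm boundary are illustrative assumptions, not part of the original example):

# A minimal sketch of the height/weight tree as plain nested ifs.
# The thresholds (180 cm, 80 kg) are the ones quoted in the text.
def classify(height_cm: float, weight_kg: float) -> str:
    if height_cm > 180:
        return "male"
    if weight_kg > 80:          # only reached when height <= 180 cm
        return "male"
    return "female"

print(classify(185, 70))   # male
print(classify(170, 90))   # male
print(classify(170, 60))   # female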
Before going further, I will explain some important terms related to decision trees.

Entropy
In machine learning, entropy is a measure of the randomness in the information being processed. The
higher the entropy, the harder it is to draw any conclusions from that information.
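As a small illustrative sketch (not from the original article), entropy of a label distribution can be computed from raw class counts in a few lines of Python, using base-2 logarithms as in the calculations below:

from math import log2

def entropy(counts):
    """Entropy (in bits) of a label distribution given as raw class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(round(entropy([9, 5]), 2))  # 0.94, the value computed for the weather data below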

Information Gain
Information gain can be defined as the amount of information gained about a random variable or signal
from observing another random variable. It can be considered as the difference between the entropy of the
parent node and the weighted average entropy of the child nodes.
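Continuing the same sketch, information gain is the parent entropy minus the weighted average entropy of the children; this helper reuses the entropy() function above:

def information_gain(parent_counts, children_counts):
    """Entropy of the parent node minus the weighted average entropy of the children."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child) for child in children_counts)
    return entropy(parent_counts) - weighted

# Splitting the 9 yes / 5 no weather data by Outlook gives children (2,3), (4,0), (3,2):
print(round(information_gain([9, 5], [(2, 3), (4, 0), (3, 2)]), 3))  # 0.247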
Gini Impurity
Gini impurity is a measure of how often a randomly chosen element from the set would be incorrectly
labeled if it was randomly labeled according to the distribution of labels in the subset.

Gini impurity is lower bounded by 0, with 0 occurring if the data set contains only one class.
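And a matching minimal sketch of Gini impurity, again just for illustration:

def gini(counts):
    """Gini impurity of a label distribution given as raw class counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

print(gini([10, 0]))           # 0.0 -> pure node containing only one class
print(round(gini([9, 5]), 4))  # 0.4592 -> the weather data set used below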
There are several algorithms for building a decision tree. Two common ones are:

1. CART (Classification and Regression Trees): this uses Gini impurity as the metric.

2. ID3 (Iterative Dichotomiser 3): this uses entropy and information gain as the metrics.

In this article, I will go through ID3. Once you understand it, it is easy to do the same using CART.

Classification using the ID3 algorithm


Consider the following weather dataset, based on which we will determine whether to play football or not.

There are four independent variables used to determine the dependent variable. The independent variables
are Outlook, Temperature, Humidity, and Wind, and the dependent variable is whether to play football or not.
As the first step, we have to find the root node for our decision tree. For that, follow these steps:

Find the entropy of the class variable.


E(S) = -[(9/14)log(9/14) + (5/14)log(5/14)] = 0.94
Note: Here we typically take logarithms to base 2. In total there are 14 examples, of which 9 are yes and 5 are no;
the probabilities above are based on these counts.
From the above data, for Outlook we can easily arrive at the following counts:

Outlook   Yes  No  Total
Sunny      2    3    5
Overcast   4    0    4
Rainy      3    2    5

Now we have to calculate the average weighted entropy, i.e., the sum over each attribute value of its weight
(fraction of examples) multiplied by its entropy.
E(S, outlook) = (5/14)*E(3,2) + (4/14)*E(4,0) + (5/14)*E(2,3) = (5/14)(-(3/5)log(3/5)-(2/5)log(2/5)) + (4/14)(0) +
(5/14)(-(2/5)log(2/5)-(3/5)log(3/5)) = 0.693

The next step is to find the information gain.


It is the difference between parent entropy and average weighted entropy we found above.
IG(S, outlook) = 0.94 - 0.693 = 0.247
Similarly, find the information gain for Temperature, Humidity, and Windy (a short verification sketch in code follows the values).
IG(S, Temperature) = 0.940 - 0.911 = 0.029
IG(S, Humidity) = 0.940 - 0.788 = 0.152
IG(S, Windy) = 0.940 - 0.8932 = 0.048
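As a hedged verification sketch, the four gains can be reproduced with the entropy() and information_gain() helpers defined earlier; the per-value (yes, no) counts are read off the 14-day weather table given later in this article:

# (yes, no) counts per attribute value, read off the 14-day weather table below.
splits = {
    "Outlook":     [(2, 3), (4, 0), (3, 2)],   # Sunny, Overcast, Rainy
    "Temperature": [(2, 2), (4, 2), (3, 1)],   # Hot, Mild, Cool
    "Humidity":    [(3, 4), (6, 1)],           # High, Normal
    "Windy":       [(6, 2), (3, 3)],           # Weak, Strong
}
for attribute, children in splits.items():
    print(attribute, round(information_gain([9, 5], children), 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Windy 0.048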

Now select the feature having the largest information gain.


Here it is Outlook. So it forms the first node (root node) of our decision tree.
The data is then partitioned by the outlook value. Since the overcast branch contains only examples of class ‘Yes’,
we can turn it into a leaf labeled yes: if the outlook is overcast, football will be played. Our decision tree now has
Outlook at the root with a Yes leaf under Overcast.

The next step is to find the next node in our decision tree. We will now find the one under sunny: we have to
determine which of Temperature, Humidity, or Wind has the highest information gain.

Calculate parent entropy E(sunny)


E(sunny) = (-(3/5)log(3/5)-(2/5)log(2/5)) = 0.971.
Now Calculate the information gain of Temperature. IG(sunny, Temperature)
E(sunny, Temperature) = (2/5)*E(0,2) + (2/5)*E(1,1) + (1/5)*E(1,0)=2/5=0.4
Now calculate information gain.
IG(sunny, Temperature) = 0.971 - 0.4 = 0.571
Similarly, we get:
IG(sunny, Humidity) = 0.971
IG(sunny, Windy) = 0.020
Here IG(sunny, Humidity) is the largest value. So Humidity is the node that comes under sunny.

For humidity in the sunny subset, we can say that play will occur if humidity is normal and will not occur if
it is high. Similarly, find the nodes under rainy.

Note: A branch with entropy more than 0 needs further splitting.


Finally, our decision tree looks like this:

Outlook?
  Overcast -> Yes
  Sunny    -> Humidity?
                High   -> No
                Normal -> Yes
  Rainy    -> Wind?
                Weak   -> Yes
                Strong -> No
Classification using CART algorithm

Classification using CART is similar, but instead of entropy we use Gini impurity.
So as the first step we will find the root node of our decision tree. For that, calculate the Gini impurity of the
class variable:
Gini(S) = 1 - [(9/14)² + (5/14)²] = 0.4591
As the next step, we will calculate the Gini gain. For that first, we will find the average weighted Gini
impurity of Outlook, Temperature, Humidity, and Windy.

First, consider the case of Outlook:

Gini(S, outlook) = (5/14)*gini(3,2) + (4/14)*gini(4,0) + (5/14)*gini(2,3)
                 = (5/14)*(1 - (3/5)² - (2/5)²) + (4/14)*0 + (5/14)*(1 - (2/5)² - (3/5)²)
                 = 0.171 + 0 + 0.171 = 0.342
Gini gain (S, outlook) = 0.459 - 0.342 = 0.117
Gini gain(S, Temperature) = 0.459 - 0.4405 = 0.0185
Gini gain(S, Humidity) = 0.459 - 0.3674 = 0.0916
Gini gain(S, windy) = 0.459 - 0.4286 = 0.0304
Choose the attribute that has the highest Gini gain. Gini gain is highest for Outlook, so we choose it as our root node (a short verification sketch follows).
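A similar hedged verification sketch, reusing the gini() helper defined earlier with the same per-value counts:

# Same (yes, no) counts per attribute value as in the information gain check above.
splits = {
    "Outlook":     [(2, 3), (4, 0), (3, 2)],
    "Temperature": [(2, 2), (4, 2), (3, 1)],
    "Humidity":    [(3, 4), (6, 1)],
    "Windy":       [(6, 2), (3, 3)],
}
total = 14
for attribute, children in splits.items():
    weighted = sum(sum(child) / total * gini(child) for child in children)
    print(attribute, round(gini([9, 5]) - weighted, 4))
# Outlook 0.1163, Temperature 0.0187, Humidity 0.0918, Windy 0.0306
# (small differences from the values quoted above come from rounding 0.4592 to 0.459)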

Now you have got an idea of how to proceed further. Repeat the same steps we used in the ID3 algorithm.

Advantages and disadvantages of decision trees


Advantages:
1. Decision trees are super interpretable
2. Require little data preprocessing
3. Suitable for low latency applications
Disadvantages:
1. More likely to overfit noisy data. The probability of overfitting on noise increases as a tree gets deeper.
A solution for it is pruning. Another way to avoid overfitting is to use bagging techniques like
Random Forest.

The Classification and Regression Tree (CART) algorithm is a classification algorithm that builds a decision tree on
the basis of Gini's impurity index.
It is a basic machine learning algorithm and has a wide variety of use cases. The statistician Leo Breiman coined the
phrase to describe decision tree algorithms that may be used for classification or regression predictive modeling
problems.

CART is an umbrella term that refers to the following types of decision trees:
 Classification Trees: When the target variable is categorical, the tree is used to identify the "class" into
which the target variable is most likely to fall.
 Regression Trees: These are used to predict the value of a continuous target variable.

Advantages of CART algorithm


The CART algorithm is nonparametric, so it does not depend on the data coming from a particular type of
distribution.
1. The CART algorithm combines testing with a test data set and cross-validation to assess the goodness of fit
more precisely.
2. CART allows the same variables to be used many times in different parts of the tree. This ability can reveal
intricate interdependencies between groups of variables.
3. Outliers in the input variables have no meaningful effect on CART.
4. One can relax the stopping rules to let a decision tree overgrow and then prune it back to its ideal size. This
approach reduces the likelihood of missing important structure in the data set by stopping too soon.
5. CART can be used in combination with other prediction algorithms to select the input set of variables.

The CART algorithm is a building block of Random Forest, which is one of the most powerful algorithms in
machine learning. The CART algorithm is organized as a series of questions, the responses to which determine the
next question, if any. The ultimate outcome of these questions is a tree-like structure with terminal nodes where
there are no more questions.

Gini's impurity index measures how strongly each input attribute is associated with the outcome of an instance; in
other words, it quantifies how much each attribute directly affects the resulting class. The Gini index is widely used
in real-world scenarios.
Step by Step ID3 Decision Tree Example
Decision tree algorithms transform raw data into rule-based decision trees. Herein, ID3 is one of the
most common decision tree algorithms. It was introduced in 1986, and it is an acronym of Iterative
Dichotomiser.
First of all, dichotomisation means dividing into two completely opposite things. That is why the algorithm
iteratively divides the attributes into two groups, the most dominant attribute and the others, to construct a tree.
It calculates the entropy and information gain of each attribute; in this way, the most dominant attribute can be
found. The most dominant one is then put on the tree as a decision node, after which entropy and gain scores are
calculated again among the remaining attributes. Thus, the next most dominant attribute is found. This procedure
continues until a decision is reached for that branch. That is why it is called Iterative Dichotomiser.

No matter which decision tree algorithm you are running, whether ID3, C4.5, CART, CHAID or regression trees,
they all look for the feature offering the best split (for ID3, the highest information gain). They then add a decision
rule for the chosen feature and recursively build another decision tree for the sub data set until they reach a decision.

Besides, regular decision tree algorithms are designed to create branches for categorical features. Still, we
are able to build trees with continuous, numerical features. The trick here is that we convert continuous
features into categorical ones: we split the numerical feature at the point that offers the highest
information gain, as the sketch below illustrates.
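As an illustration of that trick, the following sketch tries each candidate threshold (midpoints between sorted values) and keeps the one with the highest information gain; the data in the example call is made up purely for demonstration:

from math import log2

def entropy_of_labels(labels):
    total = len(labels)
    return -sum((labels.count(c) / total) * log2(labels.count(c) / total)
                for c in set(labels))

def best_threshold(values, labels):
    """Pick the numeric cut point (midpoint between sorted values) with the highest gain."""
    parent = entropy_of_labels(labels)
    best = (None, -1.0)
    points = sorted(set(values))
    for lo, hi in zip(points, points[1:]):
        t = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        gain = (parent
                - len(left) / len(labels) * entropy_of_labels(left)
                - len(right) / len(labels) * entropy_of_labels(right))
        if gain > best[1]:
            best = (t, gain)
    return best

# Made-up humidity readings with a yes/no decision, purely for illustration:
print(best_threshold([65, 70, 80, 90, 95, 96], ["Yes", "Yes", "Yes", "No", "No", "No"]))
# -> (85.0, 1.0): splitting at 85 separates the two classes perfectly in this toy example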
ID3 in Python
This blog post gives a detailed explanation of the ID3 algorithm, and we will solve a problem step by step.
On the other hand, you might just want to run the ID3 algorithm, and its mathematical background might not
attract your attention.
Herein, you can find a Python implementation of the ID3 algorithm (see the references). You can build ID3 decision
trees with a few lines of code. That package supports the most common decision tree algorithms such as
ID3, C4.5, CART, CHAID and regression trees, as well as some bagging methods such as random forest and some
boosting methods such as gradient boosting and AdaBoost.
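The package referred to above is not reproduced here. As an alternative, hedged sketch only, scikit-learn's DecisionTreeClassifier can fit a comparable tree on the same data; note that scikit-learn implements an optimised CART rather than ID3 and needs the categorical columns encoded first, so this approximates, rather than reproduces, the workflow described in the text:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# The 14-day tennis data set shown in the table later in this article.
data = pd.DataFrame({
    "Outlook":  ["Sunny","Sunny","Overcast","Rain","Rain","Rain","Overcast",
                 "Sunny","Sunny","Rain","Sunny","Overcast","Overcast","Rain"],
    "Temp":     ["Hot","Hot","Hot","Mild","Cool","Cool","Cool",
                 "Mild","Cool","Mild","Mild","Mild","Hot","Mild"],
    "Humidity": ["High","High","High","High","Normal","Normal","Normal",
                 "High","Normal","Normal","Normal","High","Normal","High"],
    "Wind":     ["Weak","Strong","Weak","Weak","Weak","Strong","Strong",
                 "Weak","Weak","Weak","Strong","Strong","Weak","Strong"],
    "Decision": ["No","No","Yes","Yes","Yes","No","Yes",
                 "No","Yes","Yes","Yes","Yes","Yes","No"],
})

X = pd.get_dummies(data.drop(columns="Decision"))   # one-hot encode the categoricals
y = data["Decision"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))  # text dump of the learned splits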

Objective
Decision rules will be found based on the entropy and information gain of each feature.

Data set
For instance, the following table records the factors behind the decision to play tennis outside over the
previous 14 days.
Day  Outlook   Temp.  Humidity  Wind    Decision
1    Sunny     Hot    High      Weak    No
2    Sunny     Hot    High      Strong  No
3    Overcast  Hot    High      Weak    Yes
4    Rain      Mild   High      Weak    Yes
5    Rain      Cool   Normal    Weak    Yes
6    Rain      Cool   Normal    Strong  No
7    Overcast  Cool   Normal    Strong  Yes
8    Sunny     Mild   High      Weak    No
9    Sunny     Cool   Normal    Weak    Yes
10   Rain      Mild   Normal    Weak    Yes
11   Sunny     Mild   Normal    Strong  Yes
12   Overcast  Mild   High      Strong  Yes
13   Overcast  Hot    Normal    Weak    Yes
14   Rain      Mild   High      Strong  No


We can summarize the ID3 algorithm as illustrated below

Entropy(S) = ∑ – p(I) . log2p(I)

Gain(S, A) = Entropy(S) – ∑ [ p(S|A) . Entropy(S|A) ]

These formulas might confuse your mind. Practicing will make it understandable.

Entropy
We need to calculate the entropy first. Decision column consists of 14 instances and includes two labels:
yes and no.

There are 9 decisions labeled yes, and 5 decisions labeled no.

Entropy(Decision) = – p(Yes) . log2p(Yes) – p(No) . log2p(No)

Entropy(Decision) = – (9/14) . log2(9/14) – (5/14) . log2(5/14) = 0.940

Now, we need to find the most dominant factor for decisioning.


Wind factor on decision

Gain(Decision, Wind) = Entropy(Decision) – ∑ [ p(Decision|Wind) . Entropy(Decision|Wind) ]


The Wind attribute has two labels, weak and strong, and we reflect this in the formula.

Gain(Decision, Wind) = Entropy(Decision) – [ p(Decision|Wind=Weak) . Entropy(Decision|Wind=Weak) ] –


[ p(Decision|Wind=Strong) . Entropy(Decision|Wind=Strong) ]
Now, we need to calculate Entropy(Decision|Wind=Weak) and Entropy(Decision|Wind=Strong) respectively.
Weak wind factor on decision
Day  Outlook   Temp.  Humidity  Wind  Decision
1    Sunny     Hot    High      Weak  No
3    Overcast  Hot    High      Weak  Yes
4    Rain      Mild   High      Weak  Yes
5    Rain      Cool   Normal    Weak  Yes
8    Sunny     Mild   High      Weak  No
9    Sunny     Cool   Normal    Weak  Yes
10   Rain      Mild   Normal    Weak  Yes
13   Overcast  Hot    Normal    Weak  Yes


There are 8 instances of weak wind; the decision is no for 2 of them and yes for 6, as computed below.
1- Entropy(Decision|Wind=Weak) = – p(No) . log2p(No) – p(Yes) . log2p(Yes)
2- Entropy(Decision|Wind=Weak) = – (2/8) . log2(2/8) – (6/8) . log2(6/8) = 0.811

Notice that if the number of instances of a class were 0 and the total number of instances were n, we would need
to calculate -(0/n) . log2(0/n). Here log(0) is -∞, and 0 times ∞ cannot be evaluated directly; this special case often
appears in decision tree applications. Even though a naive computation fails, calculus tells us the limit of
p . log2(p) as p approaches 0 is 0, so the term is simply treated as 0.
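In code, this is usually handled by defining 0 . log2(0) to be 0, which is its limiting value; a minimal sketch:

from math import log2

def plogp(p):
    """Return -p * log2(p), treating the p = 0 case as 0 (its limit as p -> 0)."""
    return 0.0 if p == 0 else -p * log2(p)

print(plogp(0))                  # 0.0 instead of a math domain error
print(plogp(2/8) + plogp(6/8))   # ~0.811, the weak-wind entropy above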

Strong wind factor on decision


Day  Outlook   Temp.  Humidity  Wind    Decision
2    Sunny     Hot    High      Strong  No
6    Rain      Cool   Normal    Strong  No
7    Overcast  Cool   Normal    Strong  Yes
11   Sunny     Mild   Normal    Strong  Yes
12   Overcast  Mild   High      Strong  Yes
14   Rain      Mild   High      Strong  No

Here, there are 6 instances for strong wind. Decision is divided into two equal parts.
1- Entropy(Decision|Wind=Strong) = – p(No) . log2p(No) – p(Yes) . log2p(Yes)
2- Entropy(Decision|Wind=Strong) = – (3/6) . log2(3/6) – (3/6) . log2(3/6) = 1

Now, we can turn back to Gain(Decision, Wind) equation.


Gain(Decision, Wind) = Entropy(Decision) – [ p(Decision|Wind=Weak) . Entropy(Decision|Wind=Weak) ] –
[ p(Decision|Wind=Strong) . Entropy(Decision|Wind=Strong) ] = 0.940 – [ (8/14) . 0.811 ] – [ (6/14). 1] =
0.048
The calculation for the wind column is done. Now we need to apply the same calculation to the other columns to find
the most dominant factor in the decision.

Other factors on decision


Applying a similar calculation to the other columns gives:
1- Gain(Decision, Outlook) = 0.246
2- Gain(Decision, Temperature) = 0.029
3- Gain(Decision, Humidity) = 0.151
As seen, the outlook factor produces the highest score. That is why outlook will appear at the root node of
the tree.

Now, we need to examine the dataset for each subset of the outlook attribute.


Overcast outlook on decision
Basically, decision will always be yes if outlook were overcast.

Day  Outlook   Temp.  Humidity  Wind    Decision
3    Overcast  Hot    High      Weak    Yes
7    Overcast  Cool   Normal    Strong  Yes
12   Overcast  Mild   High      Strong  Yes
13   Overcast  Hot    Normal    Weak    Yes

Sunny outlook on decision

Day  Outlook  Temp.  Humidity  Wind    Decision
1    Sunny    Hot    High      Weak    No
2    Sunny    Hot    High      Strong  No
8    Sunny    Mild   High      Weak    No
9    Sunny    Cool   Normal    Weak    Yes
11   Sunny    Mild   Normal    Strong  Yes


Here, there are 5 instances of sunny outlook. The decision is no with probability 3/5 and yes with probability 2/5.

1- Gain(Outlook=Sunny|Temperature) = 0.570

2- Gain(Outlook=Sunny|Humidity) = 0.970

3- Gain(Outlook=Sunny|Wind) = 0.019

Now, humidity becomes the decision node because it produces the highest score when the outlook is sunny.

At this point, decision will always be no if humidity were high.

Day  Outlook  Temp.  Humidity  Wind    Decision
1    Sunny    Hot    High      Weak    No
2    Sunny    Hot    High      Strong  No
8    Sunny    Mild   High      Weak    No

On the other hand, decision will always be yes if humidity were normal.

Day  Outlook  Temp.  Humidity  Wind    Decision
9    Sunny    Cool   Normal    Weak    Yes
11   Sunny    Mild   Normal    Strong  Yes


Finally, this means that if the outlook is sunny, we need to check the humidity to make a decision.
Rain outlook on decision
Day  Outlook  Temp.  Humidity  Wind    Decision
4    Rain     Mild   High      Weak    Yes
5    Rain     Cool   Normal    Weak    Yes
6    Rain     Cool   Normal    Strong  No
10   Rain     Mild   Normal    Weak    Yes
14   Rain     Mild   High      Strong  No

1- Gain(Outlook=Rain | Temperature) = 0.01997309402197489

2- Gain(Outlook=Rain | Humidity) = 0.01997309402197489

3- Gain(Outlook=Rain | Wind) = 0.9709505944546686

Here, wind produces the highest score if the outlook were rain. That is why we need to check the wind attribute at
the second level when the outlook is rain.
So, it is revealed that the decision will always be yes if the wind were weak and the outlook were rain.

Day  Outlook  Temp.  Humidity  Wind  Decision
4    Rain     Mild   High      Weak  Yes
5    Rain     Cool   Normal    Weak  Yes
10   Rain     Mild   Normal    Weak  Yes

What's more, the decision will always be no if the wind were strong and the outlook were rain.

Day  Outlook  Temp.  Humidity  Wind    Decision
6    Rain     Cool   Normal    Strong  No
14   Rain     Mild   High      Strong  No

So, decision tree construction is over.
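Written out as plain rules, the finished tree is simply (the function name is arbitrary):

def decide(outlook, humidity, wind):
    """The decision rules read directly off the finished tree."""
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    if outlook == "Rain":
        return "Yes" if wind == "Weak" else "No"

# Day 10: Rain, Normal humidity, Weak wind -> Yes, matching the table.
print(decide("Rain", "Normal", "Weak"))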

Feature Importance
Decision trees are naturally explainable and interpretable algorithms. Besides, we can compute feature
importance values to understand how the model works; a short sketch follows.
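As a hedged illustration only: if a tree has been fitted with scikit-learn as in the earlier sketch, impurity-based importances are available through its feature_importances_ attribute (this is scikit-learn's mechanism, not something described in the original posts):

# Assumes the `tree` and `X` variables from the scikit-learn sketch earlier in this article.
for name, score in sorted(zip(X.columns, tree.feature_importances_),
                          key=lambda pair: pair[1], reverse=True):
    if score > 0:
        print(f"{name}: {score:.3f}")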

References:
1. https://sefiks.com/2017/11/20/a-step-by-step-id3-decision-tree-example/
2. https://medium.datadriveninvestor.com/decision-tree-algorithm-with-hands-on-example-e6c2afb40d38
3. https://www.saedsayad.com/decision_tree.htm
