0% found this document useful (0 votes)

52 views6 pages

C2 W4 Lab 01 Decision Trees

Uploaded by

bhramanand awasthi

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

52 views6 pages

C2 W4 Lab 01 Decision Trees

Uploaded by

bhramanand awasthi

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 6

C2_W4_Lab_01_Decision_Trees

June 19, 2024

1 Ungraded Lab: Decision Trees

In this notebook you will visualize how a decision tree is splitted using information gain.
We will revisit the dataset used in the video lectures. The dataset is:
As you saw in the lectures, in a decision tree, we decide if a node will be split or not by looking at
the information gain that split would give us. (Image of video IG)
Where
( ( ) ( ))
right
Information Gain = H(pnode
1 ) − w left
H pleft
1 + w right
H p 1 ,

and H is the entropy, defined as

H(p1 ) = −p1 log2 (p1 ) − (1 − p1 )log2 (1 − p1 )

Remember that log here is defined to be in base 2. Run the code block below to see by yourself
how the entropy. H(p) behaves while p varies.
Note that the H attains its higher value when p = 0.5. This means that the probability of event
is 0.5. And its minimum value is attained in p = 0 and p = 1, i.e., the probability of the event
happening is totally predictable. Thus, the entropy shows the degree of predictability of an event.
[1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from utils import *

[2]: %matplotlib widget

_ = plot_entropy()

Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Ba

[ ]:

Ear Shape Face Shape Whiskers Cat

Pointy Round Present 1

1
Ear Shape Face Shape Whiskers Cat
Floppy Not Round Present 1
Floppy Round Absent 0
Pointy Not Round Present 0
Pointy Round Present 1
Pointy Round Absent 1
Floppy Not Round Absent 0
Pointy Round Absent 1
Floppy Round Absent 0
Floppy Round Absent 0

We will use one-hot encoding to encode the categorical features. They will be as follows:
• Ear Shape: Pointy = 1, Floppy = 0
• Face Shape: Round = 1, Not Round = 0
• Whiskers: Present = 1, Absent = 0
Therefore, we have two sets:
• X_train: for each example, contains 3 features: - Ear Shape (1 if pointy, 0 otherwise) - Face
Shape (1 if round, 0 otherwise) - Whiskers (1 if present, 0 otherwise)
• y_train: whether the animal is a cat - 1 if the animal is a cat - 0 otherwise
[ ]:

[3]: X_train = np.array([[1, 1, 1],

[0, 0, 1],
[0, 1, 0],
[1, 0, 1],
[1, 1, 1],
[1, 1, 0],
[0, 0, 0],
[1, 1, 0],
[0, 1, 0],
[0, 1, 0]])

y_train = np.array([1, 1, 0, 0, 1, 1, 0, 1, 0, 0])

[4]: #For instance, the first example

X_train[0]

[4]: array([1, 1, 1])

This means that the first example has a pointy ear shape, round face shape and it has whiskers.
On each node, we compute the information gain for each feature, then split the node on the feature
with the higher information gain, by comparing the entropy of the node with the weighted entropy
in the two splitted nodes.

2
So, the root node has every animal in our dataset. Remember that pnode
1 is the proportion of
positive class (cats) in the root node. So

5
pnode
1 = = 0.5
10

Now let’s write a function to compute the entropy.

[5]: def entropy(p):
if p == 0 or p == 1:
return 0
else:
return -p * np.log2(p) - (1- p)*np.log2(1 - p)

print(entropy(0.5))

1.0
To illustrate, let’s compute the information gain if we split the node for each of the features. To
do this, let’s write some functions.
[6]: def split_indices(X, index_feature):
"""Given a dataset and a index feature, return two lists for the two split␣
,→nodes, the left node has the animals that have

that feature = 1 and the right node those that have the feature = 0
index feature = 0 => ear shape
index feature = 1 => face shape
index feature = 2 => whiskers
"""
left_indices = []
right_indices = []
for i,x in enumerate(X):
if x[index_feature] == 1:
left_indices.append(i)
else:
right_indices.append(i)
return left_indices, right_indices

So, if we choose Ear Shape to split, then we must have in the left node (check the table above) the
indices:

0 3 4 5 7

and the right indices, the remaining ones.

[7]: split_indices(X_train, 0)

[7]: ([0, 3, 4, 5, 7], [1, 2, 6, 8, 9])

3
Now we need another function to compute the weighted entropy in the splitted nodes. As you’ve
seen in the video lecture, we must find:
• wleft and wright , the proportion of animals in each node.
• pleft and pright , the proportion of cats in each split.
Note the difference between these two definitions!! To illustrate, if we split the root node on the
feature of index 0 (Ear Shape), then in the left node, the one that has the animals 0, 3, 4, 5 and 7,
we have:

5 4
wleft = = 0.5 and pleft =
10 5
5 1
wright = = 0.5 and pright =
10 5

[8]: def weighted_entropy(X,y,left_indices,right_indices):

"""
This function takes the splitted dataset, the indices we chose to split and␣
,→returns the weighted entropy.

"""
w_left = len(left_indices)/len(X)
w_right = len(right_indices)/len(X)
p_left = sum(y[left_indices])/len(left_indices)
p_right = sum(y[right_indices])/len(right_indices)

weighted_entropy = w_left * entropy(p_left) + w_right * entropy(p_right)

return weighted_entropy

[9]: left_indices, right_indices = split_indices(X_train, 0)

weighted_entropy(X_train, y_train, left_indices, right_indices)

[9]: 0.7219280948873623

So, the weighted entropy in the 2 split nodes is 0.72. To compute the Information Gain we must
subtract it from the entropy in the node we chose to split (in this case, the root node).

[10]: def information_gain(X, y, left_indices, right_indices):

"""
Here, X has the elements in the node and y is theirs respectives classes
"""
p_node = sum(y)/len(y)
h_node = entropy(p_node)
w_entropy = weighted_entropy(X,y,left_indices,right_indices)
return h_node - w_entropy

[11]: information_gain(X_train, y_train, left_indices, right_indices)

[11]: 0.2780719051126377

4
Now, let’s compute the information gain if we split the root node for each feature:
[12]: for i, feature_name in enumerate(['Ear Shape', 'Face Shape', 'Whiskers']):
left_indices, right_indices = split_indices(X_train, i)
i_gain = information_gain(X_train, y_train, left_indices, right_indices)
print(f"Feature: {feature_name}, information gain if we split the root node␣
,→using this feature: {i_gain:.2f}")

Feature: Ear Shape, information gain if we split the root node using this
feature: 0.28
Feature: Face Shape, information gain if we split the root node using this
feature: 0.03
Feature: Whiskers, information gain if we split the root node using this
feature: 0.12
So, the best feature to split is indeed the Ear Shape. Run the code below to see the split in action.
You do not need to understand the following code block.
[13]: tree = []
build_tree_recursive(X_train, y_train, [0,1,2,3,4,5,6,7,8,9], "Root",␣
,→max_depth=1, current_depth=0, tree = tree)

generate_tree_viz([0,1,2,3,4,5,6,7,8,9], y_train, tree)

Depth 0, Root: Split on feature: 0

- Left leaf node with indices [0, 3, 4, 5, 7]
- Right leaf node with indices [1, 2, 6, 8, 9]
Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Ba

The process is recursive, which means we must perform these calculations for each node until we
meet a stopping criteria:
• If the tree depth after splitting exceeds a threshold
• If the resulting node has only 1 class
• If the information gain of splitting is below a threshold
The final tree looks like this:
[14]: tree = []
build_tree_recursive(X_train, y_train, [0,1,2,3,4,5,6,7,8,9], "Root",␣
,→max_depth=2, current_depth=0, tree = tree)

generate_tree_viz([0,1,2,3,4,5,6,7,8,9], y_train, tree)

Depth 0, Root: Split on feature: 0

- Depth 1, Left: Split on feature: 1
-- Left leaf node with indices [0, 4, 5, 7]
-- Right leaf node with indices [3]
- Depth 1, Right: Split on feature: 2

5
-- Left leaf node with indices [1]
-- Right leaf node with indices [2, 6, 8, 9]
Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Ba

Congratulations! You completed the notebook!

TESDA Circular No. 089-2019 - Mandatory SIL or OJT
88% (8)
TESDA Circular No. 089-2019 - Mandatory SIL or OJT
28 pages
Decision Tree & Random Forest
No ratings yet
Decision Tree & Random Forest
16 pages
Physical Science Q4 Week 2 v2
100% (1)
Physical Science Q4 Week 2 v2
20 pages
Mindful Sport Performance Enhancement: Mental Training Athletes Coaches
No ratings yet
Mindful Sport Performance Enhancement: Mental Training Athletes Coaches
321 pages
6.034 Notes: Section 5.1: Slide 5.1.1
No ratings yet
6.034 Notes: Section 5.1: Slide 5.1.1
22 pages
3-01 Kanban
No ratings yet
3-01 Kanban
58 pages
Script For Turn-Over and Installation Ceremonies
100% (15)
Script For Turn-Over and Installation Ceremonies
3 pages
Decision Tree Algorithm: Comp328 Tutorial 1 Kai Zhang
No ratings yet
Decision Tree Algorithm: Comp328 Tutorial 1 Kai Zhang
25 pages
Parker SSD Drives 590PR Manual en
No ratings yet
Parker SSD Drives 590PR Manual en
466 pages
Decision Tree
No ratings yet
Decision Tree
23 pages
Decision Tree
No ratings yet
Decision Tree
31 pages
LVC 1 Post-Session Summary
No ratings yet
LVC 1 Post-Session Summary
9 pages
Machine Learning Unit4
No ratings yet
Machine Learning Unit4
8 pages
C2 W4 Decision Tree With Markdown
No ratings yet
C2 W4 Decision Tree With Markdown
14 pages
تمييز اشكال ميد
No ratings yet
تمييز اشكال ميد
267 pages
Decision Trees Implementation
No ratings yet
Decision Trees Implementation
13 pages
Merging Result-Merged
No ratings yet
Merging Result-Merged
14 pages
ML Lab Experiments (1) - Pages-2
No ratings yet
ML Lab Experiments (1) - Pages-2
10 pages
Decision Trees Notes
No ratings yet
Decision Trees Notes
11 pages
Giuaki
No ratings yet
Giuaki
7 pages
C2 W4 Decision Tree With Markdown
No ratings yet
C2 W4 Decision Tree With Markdown
17 pages
AI-ML Module3
No ratings yet
AI-ML Module3
117 pages
21 Decision Trees
No ratings yet
21 Decision Trees
62 pages
Decision Trees
No ratings yet
Decision Trees
5 pages
23 Id3
No ratings yet
23 Id3
20 pages
Decision Trees
No ratings yet
Decision Trees
8 pages
Pra 5 ML
No ratings yet
Pra 5 ML
5 pages
Ds 6
No ratings yet
Ds 6
24 pages
ML Lab Record
No ratings yet
ML Lab Record
33 pages
Neural Nets (Wrap-Up) and Decision Trees: CS 188: Artificial Intelligence
No ratings yet
Neural Nets (Wrap-Up) and Decision Trees: CS 188: Artificial Intelligence
26 pages
Dtree&rf
No ratings yet
Dtree&rf
26 pages
Decision Tree and Random Forest
No ratings yet
Decision Tree and Random Forest
74 pages
Act 9
No ratings yet
Act 9
22 pages
MLRD 7
No ratings yet
MLRD 7
23 pages
Machine Learning 10601 Recitation 8 Oct 21, 2009: Oznur Tastan
No ratings yet
Machine Learning 10601 Recitation 8 Oct 21, 2009: Oznur Tastan
46 pages
Random Forest: The Algorithm in A Nutshell
No ratings yet
Random Forest: The Algorithm in A Nutshell
10 pages
Decision Trees
No ratings yet
Decision Trees
7 pages
Lab Manual2
No ratings yet
Lab Manual2
6 pages
22.InfoTheory DecisionTrees Short
No ratings yet
22.InfoTheory DecisionTrees Short
25 pages
Lesson 5
No ratings yet
Lesson 5
28 pages
Decision Trees
No ratings yet
Decision Trees
11 pages
A Comment About Greediness in Decision Trees
No ratings yet
A Comment About Greediness in Decision Trees
3 pages
Lecture 4
No ratings yet
Lecture 4
74 pages
Decision Tree
No ratings yet
Decision Tree
52 pages
Decision Trees: Classifier
No ratings yet
Decision Trees: Classifier
23 pages
Lecture 4
No ratings yet
Lecture 4
74 pages
7 DecisionTree
No ratings yet
7 DecisionTree
58 pages
Decision Tree
No ratings yet
Decision Tree
29 pages
AI Report 4
No ratings yet
AI Report 4
6 pages
ML4 - Decision Trees & Random Forest
No ratings yet
ML4 - Decision Trees & Random Forest
44 pages
Decision Tree Algorithm
No ratings yet
Decision Tree Algorithm
18 pages
Subtitle
No ratings yet
Subtitle
4 pages
Decision Tree Induction
No ratings yet
Decision Tree Induction
80 pages
Aiml Easy Solution
No ratings yet
Aiml Easy Solution
70 pages
Rabia Malik (s0001)
No ratings yet
Rabia Malik (s0001)
5 pages
DMDW Co3 Session 14
No ratings yet
DMDW Co3 Session 14
55 pages
Decision Tree Algorithm: Comp328 Tutorial 1 Kai Zhang
No ratings yet
Decision Tree Algorithm: Comp328 Tutorial 1 Kai Zhang
25 pages
2024 Decision Trees
No ratings yet
2024 Decision Trees
28 pages
Decision Tree
No ratings yet
Decision Tree
25 pages
MLSP Lab Exp4
No ratings yet
MLSP Lab Exp4
9 pages
Week03 Classification
No ratings yet
Week03 Classification
22 pages
Decision Trees
No ratings yet
Decision Trees
18 pages
Lista Uf PDF
No ratings yet
Lista Uf PDF
10 pages
Decision Trees
No ratings yet
Decision Trees
8 pages
Yatin Pandya: An Architect
No ratings yet
Yatin Pandya: An Architect
1 page
Over Load Protection For Transformer
No ratings yet
Over Load Protection For Transformer
45 pages
Check My Accounting Homework
100% (1)
Check My Accounting Homework
5 pages
Testbank: Chapter 13 Diversification Strategy: True/False Questions
No ratings yet
Testbank: Chapter 13 Diversification Strategy: True/False Questions
8 pages
Modernism & Postmodernism
No ratings yet
Modernism & Postmodernism
1 page
1229 Sketching Curves c1
No ratings yet
1229 Sketching Curves c1
13 pages
Ibm Power S1022 Server Product Report
No ratings yet
Ibm Power S1022 Server Product Report
5 pages
Novo Top Notch 3.A4
No ratings yet
Novo Top Notch 3.A4
88 pages
Today in Physics 217: Electric Dipoles and Their Interactions
No ratings yet
Today in Physics 217: Electric Dipoles and Their Interactions
15 pages
Trinomial Option Pricing Model
No ratings yet
Trinomial Option Pricing Model
5 pages
WAD Final Report
No ratings yet
WAD Final Report
31 pages
Final Book 10
No ratings yet
Final Book 10
79 pages
Paper 1
No ratings yet
Paper 1
27 pages
Ece 534 Summer 2018 Fundamentals of Power Electronics
No ratings yet
Ece 534 Summer 2018 Fundamentals of Power Electronics
2 pages
Adbury Survey of Training Experiences and Clinical Practice in Assessment For Autism Spectrum Disorder by Neuropsychologists
No ratings yet
Adbury Survey of Training Experiences and Clinical Practice in Assessment For Autism Spectrum Disorder by Neuropsychologists
19 pages
D) Capitalism vs. Communism (Animal Farm) (FORMATIVE 1)
No ratings yet
D) Capitalism vs. Communism (Animal Farm) (FORMATIVE 1)
10 pages
Nure 231-11
No ratings yet
Nure 231-11
1 page
24 Generic Toolbar Component 169163
No ratings yet
24 Generic Toolbar Component 169163
8 pages
3 Kaby Lake Overclocking Chart
No ratings yet
3 Kaby Lake Overclocking Chart
10 pages
Burdwan University Economics PH D List
No ratings yet
Burdwan University Economics PH D List
8 pages
International Society For Soil Mechanics and Geotechnical Engineering
No ratings yet
International Society For Soil Mechanics and Geotechnical Engineering
3 pages
Letter of Acknowledgment - International Host Incubator
No ratings yet
Letter of Acknowledgment - International Host Incubator
2 pages

C2 W4 Lab 01 Decision Trees

Uploaded by

C2 W4 Lab 01 Decision Trees

Uploaded by

C2_W4_Lab_01_Decision_Trees

June 19, 2024

1 Ungraded Lab: Decision Trees

and H is the entropy, defined as

H(p1 ) = −p1 log2 (p1 ) − (1 − p1 )log2 (1 − p1 )

[2]: %matplotlib widget

Canvas(toolbar=Toolbar(toolitems=[('Home', 'Reset original view', 'home', 'home'), ('Back', 'Ba

Ear Shape Face Shape Whiskers Cat

[3]: X_train = np.array([[1, 1, 1],

y_train = np.array([1, 1, 0, 0, 1, 1, 0, 1, 0, 0])

[4]: #For instance, the first example

[4]: array([1, 1, 1])

Now let’s write a function to compute the entropy.

and the right indices, the remaining ones.

[7]: ([0, 3, 4, 5, 7], [1, 2, 6, 8, 9])

[8]: def weighted_entropy(X,y,left_indices,right_indices):

weighted_entropy = w_left * entropy(p_left) + w_right * entropy(p_right)

[9]: left_indices, right_indices = split_indices(X_train, 0)

[10]: def information_gain(X, y, left_indices, right_indices):

[11]: information_gain(X_train, y_train, left_indices, right_indices)

generate_tree_viz([0,1,2,3,4,5,6,7,8,9], y_train, tree)

Depth 0, Root: Split on feature: 0

generate_tree_viz([0,1,2,3,4,5,6,7,8,9], y_train, tree)

Depth 0, Root: Split on feature: 0

Congratulations! You completed the notebook!

You might also like