
Week #8

Random Forest

Artificial Intelligence & Intelligent Systems
Slides by Dr. Rami Ibrahim
Objectives
• Introduction to Decision Trees.
• Decision Tree components.
• Building a Decision Tree (entropy calculation).
• Decision Tree advantages and disadvantages.
• Random Forest.
Decision Trees
• Suppose a kid wants to play soccer. However, we need to check the weather before going out!
Decision Trees
• The root of this tree splits on Outlook into three branches: Sunny, Overcast, and Rain.
• This structure is called a decision tree and is used here for classification (Yes, No).
Decision Trees
• We can answer sequential questions before making a decision (outcome) by following a route through the tree from top to bottom.
• Each node encodes an "If this, then that" condition (see the sketch below).
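As a rough illustration, the tree above can be written as nested if/else rules in Python. The Outlook root, the Wind split, and the Yes/No leaves come from the slides; the Sunny-branch outcome and the Wind values are assumptions for illustration only.

```python
def play_soccer(outlook, wind):
    """Sketch of the play-soccer tree as nested if/else rules.

    The Sunny-branch outcome and the Wind values are illustrative assumptions.
    """
    if outlook == "Overcast":
        return "Yes"                                  # leaf node
    if outlook == "Rain":
        return "Yes" if wind == "Weak" else "No"      # internal node: split on Wind
    return "No"                                       # Sunny branch (assumed outcome)

print(play_soccer("Rain", "Weak"))    # Yes
print(play_soccer("Rain", "Strong"))  # No
```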
Decision Tree Components
• Root Node: the starting point of the tree (Outlook).
• Branches: lines connecting the nodes.
• Leaf Nodes: terminal nodes that give the predicted outcome (Yes, No).
• Internal Nodes: nodes that split on a feature value (e.g., Wind).
Decision Tree Components
• The depth of a decision tree is the number of questions asked before reaching a leaf node (class).
• The tree depth is given by its longest root-to-leaf route.
• The depth of our tree is 2.
Decision Tree Example
• Assume we have the following table with three features A, B, and C, and two target classes, Green and Blue.

  A  B  C  Target
  1  1  1  Green
  1  1  0  Green
  0  0  1  Blue
  1  0  0  Blue

• To decide which feature to start the decision tree with, we first need a few mathematical concepts.
Decision Tree Example
• Entropy, H, is the level of impurity in a group of examples:

  H = -Σ p_i log2(p_i)

  where p_i is the probability (proportion) of class i in the group.

• Information Gain, IG, decides which feature to split on at each step when building the decision tree:

  IG = H(parent node) - Σ w_j H(child node j)

  where w_j is the fraction of the parent's examples that end up in child j.
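These two quantities can be written directly in Python; a minimal standalone sketch (the function and variable names are illustrative):

```python
from math import log2

def entropy(labels):
    """H = -sum of p_i * log2(p_i), where p_i is the proportion of class i."""
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n) for c in set(labels))

def information_gain(parent_labels, child_groups):
    """IG = H(parent) - weighted sum of H(child), weighted by each child's share."""
    n = len(parent_labels)
    weighted = sum(len(g) / n * entropy(g) for g in child_groups)
    return entropy(parent_labels) - weighted

targets = ["Green", "Green", "Blue", "Blue"]   # the 4-row example used below
print(entropy(targets))                        # 1.0: two classes, two examples each
print(information_gain(targets, [["Green", "Green", "Blue"], ["Blue"]]))  # split on A: ~0.31
```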


Decision Tree Example
• Case #1: Split at A

  A  B  C  Target
  1  1  1  Green
  1  1  0  Green
  0  0  1  Blue
  1  0  0  Blue

  H(Parent) = -(2/4) log2(2/4) - (2/4) log2(2/4) = -(0.5)(-1) - (0.5)(-1) = 1

  Child #1 (A = 1): 2 Green, 1 Blue
  H(Child#1) = -(1/3) log2(1/3) - (2/3) log2(2/3) = -(1/3)(-1.58) - (2/3)(-0.58) = 0.92

  Child #2 (A = 0): 1 Blue
  H(Child#2) = -(1) log2(1) = 0

  IG = H(Parent) - (3/4) H(Child#1) - (1/4) H(Child#2)
     = 1 - (3/4)(0.92) - (1/4)(0) = 1 - 0.69 - 0 = 0.31
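A quick standalone check of these numbers in Python, using only math.log2:

```python
from math import log2

h_parent = -(2/4) * log2(2/4) - (2/4) * log2(2/4)   # = 1.0
h_child1 = -(1/3) * log2(1/3) - (2/3) * log2(2/3)   # A = 1: 2 Green, 1 Blue -> ~0.918
h_child2 = 0.0                                      # A = 0: a single Blue example -> pure
ig_a = h_parent - (3/4) * h_child1 - (1/4) * h_child2
print(round(ig_a, 2))                               # 0.31
```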
Decision Tree Example
• Case #2: Split at B

  A  B  C  Target
  1  1  1  Green
  1  1  0  Green
  0  0  1  Blue
  1  0  0  Blue

  H(Parent) = -(2/4) log2(2/4) - (2/4) log2(2/4) = -(0.5)(-1) - (0.5)(-1) = 1

  Child #1 (B = 1): 2 Green
  H(Child#1) = -(2/2) log2(2/2) = 0

  Child #2 (B = 0): 2 Blue
  H(Child#2) = -(2/2) log2(2/2) = 0

  IG = H(Parent) - (2/4) H(Child#1) - (2/4) H(Child#2) = 1 - 0 - 0 = 1
Decision Tree Example
• Case #3: Split at C

  A  B  C  Target
  1  1  1  Green
  1  1  0  Green
  0  0  1  Blue
  1  0  0  Blue

  H(Parent) = -(2/4) log2(2/4) - (2/4) log2(2/4) = -(0.5)(-1) - (0.5)(-1) = 1

  Child #1 (C = 1): 1 Green, 1 Blue
  H(Child#1) = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1

  Child #2 (C = 0): 1 Green, 1 Blue
  H(Child#2) = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1

  IG = H(Parent) - (2/4) H(Child#1) - (2/4) H(Child#2) = 1 - (2/4)(1) - (2/4)(1) = 1 - 0.5 - 0.5 = 0
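Pulling the three cases together, the following standalone loop reproduces all three IG values (the row tuples mirror the table above):

```python
from math import log2

rows = [  # (A, B, C, Target)
    (1, 1, 1, "Green"),
    (1, 1, 0, "Green"),
    (0, 0, 1, "Blue"),
    (1, 0, 0, "Blue"),
]

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n) for c in set(labels))

targets = [r[3] for r in rows]
for i, feature in enumerate("ABC"):
    # Group the targets by the feature's value, then subtract the weighted child entropies.
    children = [[r[3] for r in rows if r[i] == v] for v in (0, 1)]
    ig = entropy(targets) - sum(len(c) / len(rows) * entropy(c) for c in children if c)
    print(feature, round(ig, 2))   # A 0.31, B 1.0, C 0.0
```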
Decision Tree Example
• Summarizing the IG values for the three features A, B, and C:

  Feature           A     B     C
  IG                0.31  1     0
  Quality of split  Poor  Best  Worst

• A higher IG value means a higher-quality split on that feature (here, feature B). We want to maximize the IG while building the decision tree.
• It is good to know how these calculations work, but in practice they are performed internally, e.g. by scikit-learn in Python (see the sketch below).
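A minimal scikit-learn version of the example; with criterion="entropy" the tree uses the same information-gain measure and ends up splitting on feature B at the root, matching the hand calculation:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1, 1, 1], [1, 1, 0], [0, 0, 1], [1, 0, 0]]   # columns: A, B, C
y = ["Green", "Green", "Blue", "Blue"]

tree = DecisionTreeClassifier(criterion="entropy")
tree.fit(X, y)

# The printed rules split on B at the root, the feature with the highest IG.
print(export_text(tree, feature_names=["A", "B", "C"]))
```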
Decision Tree Example
Decision Tree Advantages/Disadvantages
• Advantages:
  - Easy to follow and understand (follow the route from the root node to a leaf node).
  - Fast, and can handle both numerical and categorical datasets.
• Disadvantages:
  - Training decision trees can be computationally expensive.
  - They can overfit: an over-complex tree does not generalize well and performs poorly on unseen data, even though the training error keeps dropping as the depth grows.
• Decision trees can be improved by:
  - Pruning (see the sketch below).
  - Bagging (Random Forest).
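As an illustration of pruning, a minimal sketch with scikit-learn on a synthetic placeholder dataset: max_depth caps the tree depth (pre-pruning) and ccp_alpha applies cost-complexity pruning.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned = DecisionTreeClassifier(max_depth=3, ccp_alpha=0.01, random_state=0).fit(X_train, y_train)

# The unpruned tree typically fits the training set almost perfectly but
# generalizes worse; the pruned tree trades training accuracy for test accuracy.
for name, model in [("full", full), ("pruned", pruned)]:
    print(name, model.score(X_train, y_train), model.score(X_test, y_test))
```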
Overfitting & Underfitting
Random Forest
• Random Forest (RF) is one of the most powerful ML algorithms; it corrects the overfitting of decision trees during training.
• RF assembles multiple trees, each built on a randomly selected subset of the data, and outputs the mode (majority) class.
• A single decision tree is a greedy algorithm: it always puts the feature that minimizes the error (maximizes the information gain IG) at the top split.
• RF de-correlates the trees by excluding some candidate features at each split, so a few strong features do not dominate every tree's predictions.
Random Forest
• To avoid the overfitting issue in decision trees, we can apply a bagging approach, which does the following:
  - Create many bootstrap subsets of the dataset (e.g., 500), sampled with replacement.
  - Train a decision tree on each subset.
  - Combine the trees' predictions by majority vote. For example, if 5 bagged trees predict the classes G, G, B, G, B, then the most frequent class, G, is the final prediction (see the sketch below).
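A minimal bagging sketch in Python with scikit-learn; the tiny feature table reuses the slides' A/B/C example, and the query point [1, 0, 1] is an arbitrary illustration:

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = np.array([[1, 1, 1], [1, 1, 0], [0, 0, 1], [1, 0, 0]])   # features A, B, C
y = np.array(["G", "G", "B", "B"])                            # target classes

n_trees = 5
predictions = []
for _ in range(n_trees):
    idx = rng.integers(0, len(X), size=len(X))                # bootstrap sample (with replacement)
    tree = DecisionTreeClassifier().fit(X[idx], y[idx])
    predictions.append(tree.predict([[1, 0, 1]])[0])          # each tree predicts one new example

print(predictions)                          # e.g. ['G', 'G', 'B', 'G', 'B']
print(Counter(predictions).most_common(1))  # the mode class is the final prediction
```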
Random Forest
• Random Forest assembles multiple trees, each trained on a randomly selected subset of the data, and returns the class that is the mode of the trees' predicted classes.
• Random Forest makes the trees more independent by minimizing the correlation among them, so that a few strong features do not dominate the predictions.
• Given P features, the number of randomly selected candidate features at each split, m, is typically sqrt(P).
• If the dataset has 25 features, m is set to 5 (see the sketch below).
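In scikit-learn this rule corresponds to the max_features parameter of RandomForestClassifier; a short sketch assuming P = 25:

```python
from math import isqrt
from sklearn.ensemble import RandomForestClassifier

P = 25                      # total number of features in the dataset
m = isqrt(P)                # candidate features considered at each split
print(m)                    # 5

# max_features="sqrt" applies this rule automatically for every split.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt")
print(rf.max_features)      # 'sqrt'
```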
Random Forest

(Diagram: a dataset of N examples and P features is split into random subsets; one tree is trained per subset, each considering m randomly selected features; the final prediction is the mode class, i.e., the majority vote of the trees.)
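Mirroring the diagram end to end on a synthetic placeholder dataset: fit a forest, inspect the individual trees' votes for one example, and compare with the forest's prediction. Note that scikit-learn averages the trees' class probabilities rather than taking a strict hard vote, which usually coincides with the majority class:

```python
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=9, random_state=1)
rf = RandomForestClassifier(n_estimators=25, max_features="sqrt", random_state=1)
rf.fit(X, y)

example = X[:1]                          # one example to classify
votes = [int(tree.predict(example)[0]) for tree in rf.estimators_]
print(Counter(votes))                    # e.g. Counter({1: 23, 0: 2})
print(rf.predict(example)[0])            # the forest's final (majority) class
```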
References
Géron, A., Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, 2nd Edition.
