14 - Ensemble Methods

The document discusses machine learning, focusing on decision trees and ensemble methods like Random Forest. It highlights the strengths and weaknesses of decision trees, introduces the Random Forest algorithm which improves accuracy by combining multiple trees, and explains the Random Decision Tree approach. Key concepts include bias-variance tradeoff, bootstrapping, and the importance of feature selection in building robust models.


Machine Learning

Ensemble Methods
ADF

Outline
Decision Trees

Random Forest

Random Decision Tree



Review Decision Trees



Decision Trees
Decision trees have a long history in machine learning
– The first popular algorithm dates back to 1979

Very popular in many real-world problems

Intuitive to understand

Easy to build, easy to use, easy to interpret

But in practice, a single tree is often not accurate enough


Decision Trees
“Trees have one aspect that prevents them from being the
ideal tool for predictive learning, namely inaccuracy”
– The Elements of Statistical Learning

Work great with the data used to create them

Not flexible when it comes to classifying new samples


Smaller Decision Trees
Generalize better to new samples and reduce overfitting

Bias-Variance Tradeoff
– Bias: a smaller tree has less representational power, so its bias is higher
– Variance: a deeper tree requires a sample size exponential in its depth to estimate reliably, so on limited data its variance is higher
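To see this tradeoff concretely, here is a minimal sketch (assuming scikit-learn is available; the synthetic dataset and the depth values are illustrative choices, not from the lecture) comparing a shallow tree against a fully grown one:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Illustrative synthetic binary classification data
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for depth in (2, None):  # shallow tree vs. fully grown tree
        tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
        print(f"max_depth={depth}: train={tree.score(X_tr, y_tr):.2f}, "
              f"test={tree.score(X_te, y_te):.2f}")

The fully grown tree typically scores near 1.0 on the training data but noticeably worse on the test data (high variance); the shallow tree has higher bias but a smaller train/test gap.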


Random Forest



Random Forest
A class of ensemble methods specifically designed for decision tree classifiers

It combines the predictions made by multiple decision trees,
– where each tree is generated based on the values of an independent set of random vectors

Combining the simplicity of decision trees with flexibility, resulting in a vast improvement in accuracy


Random Forest Algorithm
Choose T, the number of trees to grow

Choose m < M (M is the total number of features), the number of features used to calculate the best split at each node (typically around 20% of M)

For each tree
– Choose a training set by sampling N times (N is the number of training examples) with replacement from the original training set
– At each node, randomly choose m features and calculate the best split among them
– Grow the tree fully; do not prune

Use majority voting among all the trees (a minimal sketch follows)
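Below is a minimal sketch of this algorithm (assuming NumPy and scikit-learn; T, m, and the dataset are illustrative values, not from the lecture):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, n_features=10, random_state=0)
    N, M = X.shape
    T = 25                       # number of trees to grow
    m = max(1, int(0.2 * M))     # ~20% of the features per split

    rng = np.random.default_rng(0)
    trees = []
    for _ in range(T):
        rows = rng.integers(0, N, size=N)              # bootstrap: N rows with replacement
        tree = DecisionTreeClassifier(max_features=m)  # m random features per node; no pruning
        trees.append(tree.fit(X[rows], y[rows]))

    # Majority vote across all trees
    votes = np.stack([t.predict(X) for t in trees])    # shape (T, N)
    majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
    print("training accuracy:", (majority == y).mean())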


Random Forest Example



Original Dataset

Create a bootstrap dataset with the same size as the original data


Bootstrap Dataset

Position in bootstrapped dataset    Original row drawn
1                                   2
2                                   1
3                                   4
4                                   4

Rows are drawn at random with replacement, so original row 4 appears twice while original row 3 does not appear at all.
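A short sketch of this bootstrap step (assuming NumPy; the 4-row dataset is an illustrative stand-in for the one in the slides):

    import numpy as np

    rng = np.random.default_rng(0)
    original = np.array([[5.1, 0], [4.9, 1], [6.2, 1], [5.8, 0]])  # 4 illustrative rows
    rows = rng.integers(0, len(original), size=len(original))      # e.g. draws rows 2, 1, 4, 4
    bootstrap = original[rows]                                     # same size, with replacement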


Random Forest Algorithm
Create a decision tree using the bootstrapped dataset

But ONLY use a random subset of the variables (features/columns) at each step

For this example, let's use m = 2 (2 randomly selected features), as in the sketch below
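A minimal sketch of the per-node feature choice (plain Python; the feature names besides "blood circulation" are hypothetical placeholders, not taken from the lecture's dataset):

    import random

    # Hypothetical feature names; only "blood circulation" appears in the lecture example
    features = ["chest pain", "blood circulation", "blocked arteries", "weight"]
    candidates = random.sample(features, k=2)  # m = 2 candidate features at this node
    print(candidates)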


Building the Decision Tree
(using the bootstrapped dataset)

– At the root, randomly select 2 of the features as split candidates
– Choose the best splitting point among them (for this example: blood circulation)
– Remove the already chosen feature from the candidate set for the next step
– Repeat at the next step: randomly select features from the remaining set
– Build the tree as usual, but only consider a random subset of features at each step


Random Forest
Repeat the process to make new trees
– Make a new bootstrapped dataset
– Build a tree considering only a random subset of features at each step


Random Forest Test Example



Test Example
Run the new data through each of the trees and take the majority vote of their predictions

Result for this example: YES
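Continuing the random-forest sketch from earlier (trees and X are the names defined there; the test sample is illustrative), the voting step looks like this:

    import numpy as np

    x_new = X[:1]                                    # one illustrative test sample
    votes = [tree.predict(x_new)[0] for tree in trees]
    prediction = np.bincount(votes).argmax()         # majority vote, e.g. 1 = "YES"
    print(prediction)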


Random Forest Performance Test



Out-of-Bag Data

Position in bootstrapped dataset    Original row drawn
1                                   2
2                                   1
3                                   4
4                                   4

Original row 3 was never drawn into the bootstrapped dataset, so it is an out-of-bag example for this tree.


Out-of-Bag Data

Test the out-of-bag data on the forest (the trees that were built without it)


Out-of-Bag Data

Track the performance over all out-of-bag data


Random Forest Accuracy
Measure how accurate the random forest is by the proportion of out-of-bag samples that were correctly classified by the Random Forest

Build forests with different random feature subsets and choose the one with the highest out-of-bag accuracy (a sketch follows)
– Usually start with a common default such as m = √M, then try values above and below it
– M = total number of features (dimension)
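A minimal sketch of this model-selection loop (assuming scikit-learn, whose RandomForestClassifier reports out-of-bag accuracy via oob_score; the dataset and the m values tried are illustrative):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=16, random_state=0)

    for m in (2, 4, 8):  # candidate values around sqrt(M) = 4
        rf = RandomForestClassifier(n_estimators=100, max_features=m,
                                    oob_score=True, random_state=0).fit(X, y)
        print(f"m={m}: out-of-bag accuracy = {rf.oob_score_:.3f}")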


Random Forest Recap



Random Forest Recap
Bagging + random features

Random forests try to improve on bagging by “de-correlating” the trees
– Each tree has the same expectation

Improve accuracy
– Incorporate more diversity and reduce variance

Improve efficiency
– Searching among a subset of features is much faster than searching among the complete set


Random Decision Tree



Random Decision Tree
Single-model learning algorithms
– Fix the structure of the model, then minimize some form of error or maximize data likelihood
 (e.g., Logistic Regression, Naive Bayes, etc.)
– Use some “free-form” functions to match the data given some “preference criteria” such as information gain, Gini index, and MDL
 (e.g., Decision Tree, Rule-based Classifiers, etc.)


Random Decision Tree
Such methods will make mistakes if
– Data is insufficient
– The structure of the model or the preference criteria is inappropriate for the problem

Learning as Encoding
– Make no assumption about the true model, neither parametric form nor free form
– Do not prefer one base model over another; just average them


Random Decision Tree Algorithm
At each node, an unused feature is chosen randomly
– A discrete feature is unused if it has never been chosen previously on the decision path from the root to the current node
– A continuous feature can be chosen multiple times on the same decision path, but each time a different threshold value is chosen

We stop when one of the following happens:
– A node becomes too small (<= 3 examples)
– The total height of the tree exceeds some limit, such as the total number of features

Prediction
– Simple averaging over multiple trees


Random Decision Tree Algorithm
Example with three features: B1: {0,1}, B2: {0,1}, B3: continuous
– At the root, B1 is chosen randomly; the node tests B1 == 0
– On the Y branch, B2 is chosen randomly; the node tests B2 == 0 (remaining candidates: B2, B3)
– On the N branch, B3 is chosen randomly with a random threshold of 0.3; the node tests B3 < 0.3
– Deeper on that path, the continuous feature B3 can be chosen again with a different random threshold of 0.6; the node tests B3 < 0.6
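A minimal sketch of this construction rule (plain Python; the node representation and data layout are illustrative assumptions, not the lecture's reference code). Prediction would route an example to a leaf in each tree and average the leaf class probabilities across trees:

    import random

    FEATURES = {"B1": "discrete", "B2": "discrete", "B3": "continuous"}
    MAX_HEIGHT = len(FEATURES)  # height limit: total number of features
    MIN_NODE = 3                # a node with <= 3 examples becomes a leaf

    def build(rows, used, depth=0):
        # rows: list of (feature-dict, label); used: discrete features already on this path
        labels = [label for _, label in rows]
        if len(rows) <= MIN_NODE or depth >= MAX_HEIGHT:
            return {c: labels.count(c) / len(labels) for c in set(labels)}  # leaf probabilities
        # Candidates: discrete features not yet used on this path, plus all continuous ones
        candidates = [f for f, kind in FEATURES.items()
                      if kind == "continuous" or f not in used]
        f = random.choice(candidates)      # chosen randomly, not by information gain
        if FEATURES[f] == "discrete":
            used = used | {f}
            test = lambda r: r[f] == 0
        else:
            thr = random.random()          # fresh random threshold (range illustrative)
            test = lambda r: r[f] < thr
        yes = [(r, c) for r, c in rows if test(r)]
        no = [(r, c) for r, c in rows if not test(r)]
        if not yes or not no:              # degenerate split: make a leaf instead
            return {c: labels.count(c) / len(labels) for c in set(labels)}
        return {"feature": f,
                "yes": build(yes, used, depth + 1),
                "no": build(no, used, depth + 1)}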
Random Decision Tree
Potential Advantages
– Training can be very efficient, particularly for very large datasets
– No cross-validation-based estimation of parameters, as some parametric methods require
– Natural multi-class probability estimates
– Imposes very little about the structure of the model


Question?

THANK YOU