
College of Science and Engineering

James Cook University

Week 12: Boosted, Bagging and Random Forest

Dr Carla Ewels
[email protected]

Oct 2021
Recap

- Regression and logistic regression
- Classification and regression trees

[Figure: four panels plotting X2 against X1, comparing fitted decision boundaries]

- Top row: true linear boundary
- Bottom row: non-linear boundary

Recap

1. Grow: use a greedy algorithm to find the variable and split point that minimise a loss function
   - Regression: RSS
   - Classification: misclassification rate, Gini index or cross-entropy
2. Prune: cost-complexity pruning. For each value of the tuning parameter α, find the subtree with the lowest cost-complexity criterion
3. Choose the tuning parameter α by K-fold CV (see the sketch after this list):
   3.1 Divide the data into K sets, for k = 1, . . . , K
   3.2 Repeatedly grow and prune on the training folds
   3.3 Use the held-out fold to estimate the prediction error
   For each α, average the errors over the K repetitions and choose the α with the smallest average error
4. The optimal tree is the subtree corresponding to the α value identified in Step 3

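This grow/prune/tune workflow can be run in R with the rpart package; the package is not named on the slides, so treat this as a hedged sketch in which the data frame train and the formula are placeholders.

library(rpart)

# Grow a large tree: cp = 0 disables early stopping, xval = 10 gives 10-fold CV
fit <- rpart(y ~ ., data = train, method = "class",
             control = rpart.control(cp = 0, xval = 10))

# Cross-validated error for each value of the complexity parameter (alpha ~ cp)
printcp(fit)

# Prune back to the cp value with the smallest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)
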
Example

[Figure: fitted classification tree with splits on Blair < 3.5, Hague >= 3.5 and Europe >= 7.5; terminal nodes predict Conservative or Labour; the rightmost terminal node predicts Labour, contains 43% of the observations and has class proportions .06, .70, .24]

What are the misclassification rate, Gini index and cross-entropy of the rightmost node?

Classification tree: loss functions

- Misclassification error: at a terminal node m, the impurity of the node is the proportion of cases that are misclassified, i.e.

  \frac{1}{N_m} \sum_{i \in R_m} I(y_i \neq k(m)) = 1 - \hat{p}_{m k(m)}

- Gini index: at a terminal node m, the impurity of the node is

  \sum_{k \neq k'} \hat{p}_{mk} \hat{p}_{mk'} = \sum_{k=1}^{K} \hat{p}_{mk} (1 - \hat{p}_{mk})

- Cross-entropy (or deviance): at a terminal node m, the impurity of node m is

  -\sum_{k=1}^{K} \hat{p}_{mk} \log \hat{p}_{mk}

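To answer the question on the example slide, a short R sketch evaluates the three impurity measures for the rightmost node, assuming its class proportions are the .06, .70, .24 read off the tree.

p <- c(0.06, 0.70, 0.24)        # estimated class proportions p_mk in the rightmost node
misclass <- 1 - max(p)          # misclassification error = 1 - p_mk(m)
gini     <- sum(p * (1 - p))    # Gini index
entropy  <- -sum(p * log(p))    # cross-entropy (deviance), natural log
c(misclass = misclass, gini = gini, entropy = entropy)
# approximately 0.30, 0.45 and 0.76
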
Advanced trees

- Simple and easy to interpret
- Low prediction accuracy
- Unstable
- Many ways to improve tree-based algorithms
  - Bagging
  - Random forest
  - Boosting
- Con: black box
- Key feature: variable selection

Bagging

- Builds on the CART algorithm
- Generates a large number of trees using bootstrap samples
- Combines the predictions of the different trees
- Wisdom of crowds

Bootstrapping

- Resampling method
- Draw a random sample with replacement from the training set
- Grow a tree
- Predict
- Repeat many times
- Prediction for a regression tree:

  \hat{f}_{\text{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{*b}(x)

  where B is the total number of bootstrap samples and \hat{f}^{*b}(x) is the prediction from the b-th sample
- This is called bagging

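A minimal sketch of this bagging loop for a regression tree, written with rpart; the data frames train (with a numeric response column y) and test are placeholders.

library(rpart)

B     <- 500
preds <- matrix(NA, nrow = nrow(test), ncol = B)

for (b in 1:B) {
  # Draw a bootstrap sample (same size as the training set, with replacement)
  idx  <- sample(nrow(train), replace = TRUE)
  boot <- train[idx, ]
  # Grow a deep, unpruned regression tree on the bootstrap sample
  tree <- rpart(y ~ ., data = boot, method = "anova",
                control = rpart.control(cp = 0))
  preds[, b] <- predict(tree, newdata = test)
}

# Bagged prediction: average the B tree predictions for each test observation
f_bag <- rowMeans(preds)
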
Step illustration

[Figure: step-by-step illustration of bagging]

Bagging classification trees

- Classification trees: for each test observation, record the class predicted by each of the B trees; the class with the most votes is the prediction (see the sketch after this list)
- Does this work?
  - For regression trees we can show mathematically (using bias and variance) that averaging the model predictions gives a lower MSE than an individual model prediction
  - For classification trees, not always (no bias-variance decomposition)
  - Bagging a good classifier can improve the predictions, but bagging a bad classifier can produce worse results
  - Wisdom of crowds assumes that the individuals in the crowd are independent
  - Bagged trees are not independent, so the wisdom-of-crowds advantage does not always hold

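As a small sketch of the majority vote, assume class_preds is an n x B matrix of class labels, one column per bagged classification tree; both the matrix and its name are illustrative assumptions.

# Majority vote across the B trees for each test observation
majority_vote <- function(votes) {
  names(which.max(table(votes)))
}
y_hat <- apply(class_preds, 1, majority_vote)
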
Other issues with bagging

- A bagged tree does not retain the tree structure, therefore it is hard to interpret (black box)
- A bagged tree is not a tree

Random Forest

- Overcomes the dependency (correlation) among trees
- RF was developed after boosting (next part), yet is more popular than boosting
- Breiman (2001) found that trees can be de-correlated when a different subset of predictors is used at each split

Random Forest

1. For b = 1, . . . , B:
   (a) Draw a bootstrap sample Z* of size N from the training data (with replacement)
   (b) Grow a random-forest tree T_b on the bootstrapped data by recursively repeating the following steps for each terminal node of the tree, until the minimum node size n_min is reached:
       i.   Select m variables at random from the p variables
       ii.  Pick the best variable/split-point among the m
       iii. Split the node
   (c) Output the ensemble of trees \{T_b\}_1^B

Random Forest

- Let p be the number of predictors; typical sizes of the subset m are
  - √p for classification
  - p/3 for regression
- When m = p, random forest is the same as bagging
- Prediction
  - Regression:

    \hat{f}_{\text{rf}}^{B}(x) = \frac{1}{B} \sum_{b=1}^{B} T_b(x)

  - Classification: let \hat{C}_b(x) be the prediction of the b-th random-forest tree, then

    \hat{C}_{\text{rf}}^{B}(x) = \text{majority vote}\,\{\hat{C}_b(x)\}_1^B

Bagging and Random Forest in R

- Share a common algorithm
- Fully grown trees: no pruning required
- Same R package: randomForest
  - mtry equal to the number of predictors: bagging
  - mtry less than the number of predictors: random forest

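A hedged sketch of both fits with the randomForest package; the Heart-style data frame heart, with a factor outcome AHD and 13 predictors, is an assumed name rather than one given on the slides.

library(randomForest)

p <- ncol(heart) - 1                       # number of predictors

# Bagging: mtry equal to the number of predictors
bag <- randomForest(AHD ~ ., data = heart, mtry = p, ntree = 500)

# Random forest: mtry below the number of predictors
# (the package default for classification is floor(sqrt(p)))
rf <- randomForest(AHD ~ ., data = heart, mtry = floor(sqrt(p)), ntree = 500)

bag   # printing the fit shows the OOB error estimate
rf
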
Example: Heart data

- Binary outcome (yes/no): heart disease for 303 patients who presented with chest pain
- 13 predictors
- CV yields a tree with six terminal nodes

Example: Heart

[Figure: unpruned classification tree for the Heart data; training, cross-validation and test error against tree size; and the pruned subtree with six terminal nodes]

Example: Heart

[Figure: test and OOB error against the number of trees for bagging and random forest on the Heart data]

- Dashed line: single tree
- Bagging resulted in slightly better predictions than the single-tree approach
- RF has the lowest error, better than bagging
- No overfitting issue with bagging: the error settles down after around 100 trees

Example: Random forest with different subset sizes

- p = 500 gene-expression measurements
- n = 349 patients
- There are around 20,000 genes in humans, and individual genes have different levels of activity, or expression, in particular cells, tissues and biological conditions
- Response: normal or one of 14 different types of cancer
- Training and test sets

Example: Random forest with different subset sizes

[Figure: test classification error against the number of trees for m = p, m = p/2 and m = √p]

- Slight improvement over bagging (orange: bagging, m = p)
- Better than a single tree, which has an error rate of 45.7%

Boosted tree

- Unlike bagging and random forest, boosted trees add "boosting" to the tree algorithm

What is boosting?
- One of the most powerful learning ideas introduced in the late 90s/00s
- Originally designed for classification problems, but extended to regression problems
- Fits "weak" classifiers to the original but modified data
- Combines the weak classifiers to produce a strong committee
- Learns slowly

Example: AdaBoost

- AdaBoost
- Classification problem with K = 2, Y ∈ {−1, 1}
- Prediction:

  G(x) = \text{sign}\left( \sum_{m=1}^{M} \alpha_m G_m(x) \right)

  where the coefficient \alpha_m is the weight that each subtree G_m(x) contributes to the overall prediction

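A minimal sketch of AdaBoost.M1 with rpart stumps as the weak classifiers G_m(x); the data frames train and test and the response column y (coded −1/+1) are placeholder assumptions, and the weight-update rule shown here is the standard AdaBoost.M1 one rather than a quote from the slides.

library(rpart)

M <- 100
N <- nrow(train)

dat   <- train
dat$y <- factor(dat$y)                 # rpart needs a factor response for classification

w      <- rep(1, N)                    # observation weights
alpha  <- numeric(M)                   # weight of each weak classifier
scores <- rep(0, nrow(test))           # running sum of alpha_m * G_m(x) on the test set

for (m in 1:M) {
  # Weak classifier: a stump fitted to the weighted training data
  G_m  <- rpart(y ~ ., data = dat, weights = w, method = "class",
                control = rpart.control(maxdepth = 1, cp = 0))
  pred <- as.numeric(as.character(predict(G_m, dat, type = "class")))
  miss <- as.numeric(pred != train$y)

  err      <- sum(w * miss) / sum(w)   # weighted misclassification rate
  alpha[m] <- log((1 - err) / err)

  w <- w * exp(alpha[m] * miss)        # up-weight the misclassified observations
  w <- w * N / sum(w)                  # rescale (the scale does not affect err)

  scores <- scores + alpha[m] *
    as.numeric(as.character(predict(G_m, test, type = "class")))
}

G_hat <- sign(scores)                  # final committee prediction, G(x)
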
Example: AdaBoost

[Figures: AdaBoost worked example]

Boosting algorithm for regression trees

1. Set \hat{f}(x) = 0 and r_i = y_i for all i in the training set
2. For b = 1, 2, . . . , B, repeat:
   2.1 Fit a tree \hat{f}^b with d splits (d + 1 terminal nodes) to the training data (X, r)
   2.2 Update \hat{f} by adding a shrunken version of the new tree:

       \hat{f}(x) \leftarrow \hat{f}(x) + \lambda \hat{f}^b(x)

   2.3 Update the residuals:

       r_i \leftarrow r_i - \lambda \hat{f}^b(x_i)

3. Output the boosted model (see the R sketch below):

   \hat{f}(x) = \sum_{b=1}^{B} \lambda \hat{f}^b(x)

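A minimal R sketch of this loop using rpart trees with a single split (d = 1); train, test, the numeric response column y, λ and B are placeholders.

library(rpart)

B      <- 1000
lambda <- 0.01
d      <- 1                                     # depth 1 = a stump (one split)

X      <- train[, setdiff(names(train), "y")]   # predictors only
r      <- train$y                               # step 1: residuals start at y_i
f_hat  <- rep(0, nrow(train))                   # step 1: f_hat(x) = 0
f_test <- rep(0, nrow(test))                    # boosted prediction for the test set

for (b in 1:B) {
  # 2.1: fit a small tree to the current residuals
  fit_b <- rpart(r ~ ., data = data.frame(X, r = r), method = "anova",
                 control = rpart.control(maxdepth = d, cp = 0))
  pred  <- predict(fit_b, newdata = X)

  # 2.2 and 2.3: add a shrunken version of the tree and update the residuals
  f_hat  <- f_hat + lambda * pred
  r      <- r - lambda * pred
  f_test <- f_test + lambda * predict(fit_b, newdata = test)
}
# 3: f_hat and f_test hold the boosted model's fitted and test-set predictions
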
Extension of boosted trees

- General framework: forward stagewise additive modelling
- Speeds up the search
- Different optimisation algorithms: gradient descent and steepest descent
- Loss function limited to differentiable ones
- Gradient tree boosting algorithm

Tuning parameters

- Number of trees B. A large B can result in overfitting; use CV to find B (see the sketch below)
- Shrinkage parameter λ controls the rate of learning, typically between 0.01 and 0.001. A small λ requires B to be large
- Number of splits in each subtree, d, controls the complexity of the ensemble
  - d = 1 gives a stump
  - Depends on the problem in hand
  - Seldom any improvement when d is over 6

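These parameters map directly onto the arguments of the gbm package; a hedged sketch for a regression problem, with the data frame train and the response y as placeholders.

library(gbm)

boost <- gbm(y ~ ., data = train,
             distribution = "gaussian",    # squared-error loss for regression
             n.trees = 5000,               # B: number of trees
             shrinkage = 0.01,             # lambda: learning rate
             interaction.depth = 2,        # d: splits per subtree
             cv.folds = 5)                 # cross-validation to choose B

# Number of trees that minimises the cross-validated error
best_B <- gbm.perf(boost, method = "cv")
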
Example

[Figures: boosting example]

Interpretation of trees

- Ensembles of trees have better predictive power
- But we are not able to interpret the findings directly
- Options
  - Relative variable importance
  - Partial dependence plots

Variable importance

- CART: not as reliable; based on the amount of error reduced by primary and surrogate splits
- Bagging and random forest
  - The variable importance ranking is derived from all trees
  - Within each tree, a variable's importance is quantified by the improvement the variable contributes at each internal node, and this is summed over all trees

Random Forest

- Random forest also uses the OOB samples to measure variable importance (prediction power)
- The OOB samples are first passed down the tree to estimate the baseline prediction accuracy
- The values of predictor x_ℓ in the OOB samples are then permuted and passed down the tree again to estimate the prediction accuracy
- The differences between the two prediction accuracies are averaged over all trees in the forest to quantify the importance of x_ℓ (see the sketch below)

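In the randomForest package this OOB permutation measure is computed when the forest is grown with importance = TRUE; a brief sketch, reusing the assumed heart data frame and AHD outcome from the earlier example.

library(randomForest)

rf <- randomForest(AHD ~ ., data = heart, importance = TRUE, ntree = 500)

# Mean decrease in OOB accuracy when each predictor is permuted,
# alongside the impurity-based (mean decrease in Gini) measure
importance(rf)
varImpPlot(rf)
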
Boosting

- In boosted trees, a new tree is constructed at each iteration and added to the final model
- Variable importance is evaluated at each "subtree"
- In a single decision tree, the importance of a predictor ℓ is the sum of the improvements it contributed at each internal node, hence

  I_\ell^2(T) = \sum_{t=1}^{J-1} \hat{\iota}_t^2 \, I(v(t) = \ell)

  where \hat{\iota}_t is the improvement contributed by variable ℓ at internal node t
- Therefore, the importance of ℓ is the average contribution of the variable over all M trees:

  I_\ell^2 = \frac{1}{M} \sum_{m=1}^{M} I_\ell^2(T_m)

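For a boosted model fitted with gbm (such as the earlier sketch), this averaged importance is reported as the relative influence of each predictor.

# Relative influence, averaged over the boosted trees chosen by CV
summary(boost, n.trees = best_B)
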
Partial dependence plot

[Figure: partial dependence plots]

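As a sketch, the randomForest package offers partialPlot() for single-variable partial dependence; rf and heart are the assumed objects from the earlier sketches, and MaxHR / "Yes" are example names for a predictor and the class of interest.

library(randomForest)

# Partial dependence of the predicted probability of class "Yes" on one predictor
partialPlot(rf, pred.data = heart, x.var = "MaxHR", which.class = "Yes")
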
