
Machine Learning Techniques

(機器學習技法)

Lecture 10: Random Forest


Hsuan-Tien Lin (林軒田)
[email protected]

Department of Computer Science


& Information Engineering
National Taiwan University
(國立台灣大學資訊工程系)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 0/22


Random Forest

Roadmap
1 Embedding Numerous Features: Kernel Models
2 Combining Predictive Features: Aggregation Models

Lecture 9: Decision Tree


recursive branching (purification) for conditional
aggregation of constant hypotheses

Lecture 10: Random Forest


Random Forest Algorithm
Out-Of-Bag Estimate
Feature Selection
Random Forest in Action
3 Distilling Implicit Features: Extraction Models

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 1/22


Random Forest Random Forest Algorithm

Recall: Bagging and Decision Tree

Bagging

function Bag(D, A)
  For t = 1, 2, . . . , T
    1. request size-N′ data D̃t by bootstrapping with D
    2. obtain base gt by A(D̃t)
  return G = Uniform({gt})

—reduces variance by voting/averaging

Decision Tree

function DTree(D)
  if termination: return base gt
  else
    1. learn b(x) and split D into Dc by b(x)
    2. build Gc ← DTree(Dc)
    3. return G(x) = Σ_{c=1}^{C} ⟦b(x) = c⟧ Gc(x)

—large variance, especially if fully-grown

putting them together?
(i.e. aggregation of aggregation :-) )
Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 2/22
Random Forest Random Forest Algorithm

Random Forest (RF)


random forest (RF) = bagging + fully-grown C&RT decision tree

function RandomForest(D)
  For t = 1, 2, . . . , T
    1. request size-N′ data D̃t by bootstrapping with D
    2. obtain tree gt by DTree(D̃t)
  return G = Uniform({gt})

function DTree(D)
  if termination: return base gt
  else
    1. learn b(x) and split D into Dc by b(x)
    2. build Gc ← DTree(Dc)
    3. return G(x) = Σ_{c=1}^{C} ⟦b(x) = c⟧ Gc(x)

• highly parallel/efficient to learn


• inherit pros of C&RT
• eliminate cons of fully-grown tree
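
A minimal sketch of the algorithm above (an illustration under assumptions, not the lecture's reference code), using scikit-learn's DecisionTreeClassifier as the C&RT-style base learner and ±1 labels:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def random_forest_fit(X, y, T=100, N_prime=None, rng=np.random.default_rng(0)):
        # bagging: T bootstrapped samples of size N', one fully-grown tree per sample
        N = len(X)
        N_prime = N_prime or N
        trees = []
        for _ in range(T):
            idx = rng.integers(0, N, size=N_prime)               # sample with replacement
            tree = DecisionTreeClassifier(max_features="sqrt")   # random features per split
            trees.append(tree.fit(X[idx], y[idx]))
        return trees

    def random_forest_predict(trees, X):
        # G = Uniform({g_t}): uniform vote over the trees (labels assumed to be +/-1)
        votes = np.stack([t.predict(X) for t in trees])
        return np.sign(votes.mean(axis=0))

Since each tree sees only its own bootstrap sample, the T calls are independent and can run in parallel, which is the "highly parallel/efficient" point above.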
Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 3/22
Random Forest Random Forest Algorithm

Diversifying by Feature Projection


recall: data randomness for diversity in bagging

randomly sample N′ examples from D

another possibility for diversity:

randomly sample d′ features from x

• when sampling indices i_1, i_2, . . . , i_{d′}: Φ(x) = (x_{i_1}, x_{i_2}, . . . , x_{i_{d′}})
• Z ∈ R^{d′}: a random subspace of X ∈ R^d
• often d′ ≪ d, efficient for large d
  —can be applied generally to other models
• original RF re-samples a new subspace for each b(x) in C&RT

RF = bagging + random-subspace C&RT
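
As a quick sketch (with assumed dimensions d and d′ for illustration), the subspace transform simply keeps d′ randomly chosen coordinates of x:

    import numpy as np

    rng = np.random.default_rng(0)
    d, d_prime = 100, 10                               # assumed dimensions for illustration
    idx = rng.choice(d, size=d_prime, replace=False)   # indices i_1, ..., i_{d'}
    Phi = lambda x: x[idx]                             # Phi(x) = (x_{i_1}, ..., x_{i_{d'}})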

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 4/22


Random Forest Random Forest Algorithm

Diversifying by Feature Expansion


randomly sample d′ features from x: Φ(x) = P · x,
with row i of P sampled randomly from the natural basis

more powerful features for diversity: row i other than the natural basis
• projection (combination) with random row p_i of P: φ_i(x) = p_i^T x
• often consider a low-dimensional projection:
  only d″ non-zero components in p_i
• includes random subspace as a special case:
  d″ = 1 and p_i ∈ natural basis
• original RF considers d′ random low-dimensional projections for each b(x) in C&RT

RF = bagging + random-combination C&RT


—randomness everywhere!
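
A sketch (an assumed construction, not the lecture's code) of the random-combination map Φ(x) = P · x: each of the d′ rows of P has only d″ non-zero, randomly weighted entries; with d″ = 1 and unit weights this reduces to the random-subspace case.

    import numpy as np

    def random_combination_matrix(d, d_prime, d_dprime, rng=np.random.default_rng(1)):
        # build a (d' x d) projection P; row i combines d'' randomly chosen features
        P = np.zeros((d_prime, d))
        for i in range(d_prime):
            cols = rng.choice(d, size=d_dprime, replace=False)
            P[i, cols] = rng.standard_normal(d_dprime)   # random combination weights
        return P

    # Phi(x) = P @ x; original RF draws such projections anew for each branching b(x)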

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 5/22


Random Forest Random Forest Algorithm


Fun Time
Within an RF that contains random-combination C&RT trees, which of the
following hypotheses is equivalent to each branching function b(x)
within the tree?
1 a constant
2 a decision stump
3 a perceptron
4 none of the other choices

Reference Answer: 3
In each b(x), the input vector x is first
projected by a random vector v and then
thresholded to make a binary decision, which
is exactly what a perceptron does.

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 6/22


Random Forest Out-Of-Bag Estimate

Bagging Revisited

Bagging

function Bag(D, A)
  For t = 1, 2, . . . , T
    1. request size-N′ data D̃t by bootstrapping with D
    2. obtain base gt by A(D̃t)
  return G = Uniform({gt})

              g1    g2    g3    · · ·   gT
(x1 , y1 )    D̃1    ?     D̃3             D̃T
(x2 , y2 )    ?     ?     D̃3             D̃T
(x3 , y3 )    ?     D̃2    ?              D̃T
   · · ·
(xN , yN )    D̃1    D̃2    ?              ?

? in t-th column: not used for obtaining gt


—called out-of-bag (OOB) examples of gt

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 7/22


Random Forest Out-Of-Bag Estimate

Number of OOB Examples


OOB (the ? entries) ⇐⇒ not sampled after N′ drawings

if N′ = N
• probability for (xn , yn ) to be OOB for gt : (1 − 1/N)^N
• if N large:

    (1 − 1/N)^N = 1 / (N/(N−1))^N = 1 / (1 + 1/(N−1))^N ≈ 1/e

OOB size per gt ≈ (1/e) · N
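
A quick numerical check of this limit (N chosen to match the Fun Time question below):

    import math
    N = 1126
    print((1 - 1/N) ** N)   # ~0.367716
    print(1 / math.e)       # ~0.367879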

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 8/22


Random Forest Out-Of-Bag Estimate

OOB versus Validation

OOB (for bagging/RF):
              g1    g2    g3    · · ·   gT
(x1 , y1 )    D̃1    ?     D̃3             D̃T
(x2 , y2 )    ?     ?     D̃3             D̃T
(x3 , y3 )    ?     D̃2    ?              D̃T
   · · ·
(xN , yN )    D̃1    ?     ?               ?

Validation:
              g1−      g2−      · · ·   gM−
(x1 , y1 )    Dtrain   Dtrain            Dtrain
(x2 , y2 )    Dval     Dval              Dval
(x3 , y3 )    Dval     Dval              Dval
   · · ·
(xN , yN )    Dtrain   Dtrain            Dtrain

• ? entries are like Dval : ‘enough’ random examples unused during training
• use ? to validate gt ? easy, but rarely needed
• use ? to validate G? Eoob (G) = (1/N) Σ_{n=1}^{N} err(yn , Gn−(xn )),
  with Gn− containing only the trees for which xn is OOB,
  such as GN−(x) = average(g2 , g3 , gT )

Eoob : self-validation of bagging/RF
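
A sketch (an assumed helper with ±1 labels, not library code) of computing Eoob : each example is scored only by the trees whose bootstrap sample missed it.

    import numpy as np

    def oob_error(trees, boot_indices, X, y):
        # trees[t] was trained on X[boot_indices[t]]; err is 0/1 classification error
        N, errs, counted = len(X), 0, 0
        for n in range(N):
            oob_trees = [t for t, idx in zip(trees, boot_indices) if n not in idx]
            if not oob_trees:            # example n appeared in every bootstrap sample (rare)
                continue
            vote = np.sign(np.mean([t.predict(X[n:n+1])[0] for t in oob_trees]))
            errs += int(vote != y[n])
            counted += 1
        return errs / counted            # E_oob(G)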


Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 9/22
Random Forest Out-Of-Bag Estimate

Model Selection by OOB Error

Previously: by Best Eval
  gm∗ = Am∗ (D), with m∗ = argmin_{1≤m≤M} Em and Em = Eval (Am (Dtrain ))
  (split D into Dtrain and Dval ; train g1 , . . . , gM from H1 , . . . , HM on Dtrain ;
   compute E1 , . . . , EM on Dval ; pick the best (Hm∗ , Em∗ ); re-train on all of D)

RF: by Best Eoob
  Gm∗ = RFm∗ (D), with m∗ = argmin_{1≤m≤M} Em and Em = Eoob (RFm (D))
  • use Eoob for self-validation of RF parameters such as d″
  • no re-training needed

Eoob often accurate in practice
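
A sketch of this selection loop using scikit-learn's built-in OOB score (oob_score_ is an accuracy, so Eoob = 1 − oob_score_); here max_features plays the role of a d″-style parameter, and the grid values are assumptions for illustration.

    from sklearn.ensemble import RandomForestClassifier

    def select_by_oob(X, y, grid=(1, 2, "sqrt", None), T=500):
        best = None
        for m in grid:
            rf = RandomForestClassifier(n_estimators=T, max_features=m,
                                        oob_score=True, random_state=0).fit(X, y)
            e_oob = 1.0 - rf.oob_score_          # self-validation error, no separate D_val
            if best is None or e_oob < best[1]:
                best = (rf, e_oob, m)
        return best                              # (forest, its E_oob, chosen max_features)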

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 10/22


Random Forest Out-Of-Bag Estimate


Fun Time
For a data set with N = 1126, what is the probability that (x1126 , y1126 )
is not sampled after bootstrapping N 0 = N samples from the data set?
1 0.113
2 0.368
3 0.632
4 0.887

Reference Answer: 2
The value of (1 − 1/N)^N with N = 1126 is about
0.367716, which is close to 1/e ≈ 0.367879.

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 11/22


Random Forest Feature Selection

Feature Selection
for x = (x1 , x2 , . . . , xd ), want to remove
• redundant features: like keeping one of ‘age’ and ‘full birthday’
• irrelevant features: like insurance type for cancer prediction
and only ‘learn’ the subset-transform Φ(x) = (x_{i_1}, x_{i_2}, . . . , x_{i_{d′}})
with d′ < d for g(Φ(x))

advantages:
• efficiency: simpler hypothesis and shorter prediction time
• generalization: ‘feature noise’ removed
• interpretability

disadvantages:
• computation: ‘combinatorial’ optimization in training
• overfit: ‘combinatorial’ selection
• mis-interpretability

decision tree: a rare model


with built-in feature selection
Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 12/22
Random Forest Feature Selection

Feature Selection by Importance


idea: if possible to calculate

importance(i) for i = 1, 2, . . . , d

then can select i1 , i2 , . . . , id 0 of top-d 0 importance

importance by linear model


score = wᵀx = Σ_{i=1}^{d} wi xi
• intuitive estimate: importance(i) = |wi | with some ‘good’ w
• getting ‘good’ w: learned from data
• non-linear models? often much harder
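
A sketch of this linear heuristic (assuming features are standardized first, so that the |w_i| values are comparable across features):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler

    def linear_importance(X, y):
        Xs = StandardScaler().fit_transform(X)              # put features on the same scale
        w = LogisticRegression().fit(Xs, y).coef_.ravel()   # a 'good' w learned from data
        return np.abs(w)                                    # importance(i) = |w_i|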

next: ‘easy’ feature selection in RF


Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 13/22
Random Forest Feature Selection

Feature Importance by Permutation Test


idea: random test
—if feature i is needed, ‘random’ values of x_{n,i} degrade performance

• which random values?
  • uniform, Gaussian, . . . : P(xi ) changed
  • bootstrap, permutation (of {x_{n,i}}_{n=1}^{N}): P(xi ) approximately unchanged
• permutation test:

    importance(i) = performance(D) − performance(D^(p))

  with D^(p) being D with the column {x_{n,i}}_{n=1}^{N} replaced by its permuted version

permutation test: a general statistical tool for


arbitrary non-linear models like RF
Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 14/22
Random Forest Feature Selection

Feature Importance in Original Random Forest


permutation test:

    importance(i) = performance(D) − performance(D^(p))

with D^(p) being D with the column {x_{n,i}}_{n=1}^{N} replaced by its permuted version

• performance(D^(p)): needs re-training and validation in general
• ‘escaping’ validation? OOB in RF
• original RF solution: importance(i) = Eoob (G) − Eoob^(p) (G),
  where Eoob^(p) comes from replacing each request of x_{n,i} by a permuted OOB value

RF feature selection via permutation + OOB:


often efficient and promising in practice
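
A simplified sketch (an assumption for illustration, not the original per-request OOB recipe): permute one feature column on held-out data and measure how much the error grows, so that useful features receive positive scores.

    import numpy as np

    def permutation_importance(model, X_val, y_val, rng=np.random.default_rng(2)):
        base_err = np.mean(model.predict(X_val) != y_val)
        imp = np.zeros(X_val.shape[1])
        for i in range(X_val.shape[1]):
            X_perm = X_val.copy()
            X_perm[:, i] = rng.permutation(X_perm[:, i])   # shuffle feature i only
            imp[i] = np.mean(model.predict(X_perm) != y_val) - base_err
        return imp                                         # a constant feature scores 0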

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 15/22


Random Forest Feature Selection


Fun Time
For RF, if the 1126-th feature within the data set is a constant 5566,
what would its importance be?
1 0
2 1
3 1126
4 5566

Reference Answer: 1
When a feature is a constant, permutation
does not change its values. Then Eoob (G) and
Eoob^(p) (G) are the same, and thus the
importance is 0.

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 16/22


Random Forest Random Forest in Action

A Simple Data Set

(figure: gC&RT ; gt (N′ = N/2) with random combination; G with the first t trees, shown as t grows)

‘smooth’ and large-margin-like boundary with many trees

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 17/22
Random Forest Random Forest in Action

A Complicated Data Set

(figure: gt (N′ = N/2) and G with the first t trees, shown as t grows)

‘easy yet robust’ nonlinear model

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 18/22


Random Forest Random Forest in Action

A Complicated and Noisy Data Set

(figure: gt (N′ = N/2) and G with the first t trees, shown as t grows)

noise corrected by voting

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 19/22


Random Forest Random Forest in Action

How Many Trees Needed?


almost every theory: the more, the ‘better’,
assuming a good ḡ = lim_{T→∞} G

Our NTU Experience


• KDDCup 2013 Track 1 (yes, NTU is world champion again! :-)):
predicting author-paper relation
• Eval of thousands of trees: [0.015, 0.019] depending on seed;
Eout of top 20 teams: [0.014, 0.019]
• decision: take 12000 trees with seed 1

cons of RF: may need lots of trees if the
whole random process is too unstable
—should double-check the stability of G
to ensure enough trees
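
One way to double-check stability (a sketch following scikit-learn's warm_start pattern, with an assumed schedule of tree counts) is to grow the forest incrementally and watch Eoob flatten as T increases:

    from sklearn.ensemble import RandomForestClassifier

    def oob_vs_trees(X, y, schedule=(50, 100, 200, 400, 800, 1600)):
        rf = RandomForestClassifier(warm_start=True, oob_score=True, random_state=1)
        curve = []
        for T in schedule:
            rf.set_params(n_estimators=T)
            rf.fit(X, y)                          # only the newly added trees are trained
            curve.append((T, 1.0 - rf.oob_score_))
        return curve                              # stop increasing T once the curve flattens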

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 20/22


Random Forest Random Forest in Action


Fun Time
Which of the following is not the best use of Random Forest?
1 train each tree with bootstrapped data
2 use Eoob to validate the performance
3 conduct feature selection with permutation test
4 fix the number of trees, T , to the lucky number 1126

Reference Answer: 4
A good value of T can depend on the nature of
the data and the stability of the whole random
process.

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 21/22


Random Forest Random Forest in Action

Summary
1 Embedding Numerous Features: Kernel Models
2 Combining Predictive Features: Aggregation Models

Lecture 10: Random Forest


Random Forest Algorithm
bag of trees on randomly projected subspaces
Out-Of-Bag Estimate
self-validation with OOB examples
Feature Selection
permutation test for feature importance
Random Forest in Action
‘smooth’ boundary with many trees

• next: boosted decision trees beyond classification

3 Distilling Implicit Features: Extraction Models

Hsuan-Tien Lin (NTU CSIE) Machine Learning Techniques 22/22
