Boosting
Advantages of Random Forest
Easy to implement; works well “out of the box”.
Only two hyperparameters: the number of trees and the number of features sampled at each split.
Random forests are not very sensitive to the hyperparameter values.
There are known default values that usually work well (see the short example below).
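For instance, a minimal sketch using scikit-learn's RandomForestClassifier (the dataset and settings are illustrative; the two knobs mentioned above correspond to n_estimators and max_features):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Toy data just to show the "out of the box" usage.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Defaults (or the common choices below) usually work well without tuning.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
rf.fit(X, y)
print(rf.score(X, y))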
Makes sense!
The gradient $\frac{\partial \mathcal{L}}{\partial H(\mathbf{x}_i)}$ = how to change the current prediction such that the loss is increased.
So find a new learner $h$ that points in the other direction (along the negative gradient).
The inner product of the gradient with $h$ is minimized for opposing vectors (sketched below).
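A minimal sketch of why this works, assuming an ensemble $H$, a candidate learner $h$, and a small step size $\alpha > 0$ (notation as in the rest of the lecture):
\[
\mathcal{L}(H + \alpha h) \;\approx\; \mathcal{L}(H) \;+\; \alpha \sum_{i=1}^{n} \frac{\partial \mathcal{L}}{\partial H(\mathbf{x}_i)}\, h(\mathbf{x}_i),
\]
so if the inner product $\sum_i \frac{\partial \mathcal{L}}{\partial H(\mathbf{x}_i)}\, h(\mathbf{x}_i)$ is negative, adding $\alpha h$ decreases the loss.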
Example
Gradient boost
Task: train a new tree $h$.
It must be better than random,
= the inner product of $h$ with the gradient $\nabla \mathcal{L}$ is negative (angle > 90°)
= taking a step along $h$ moves in the right direction, i.e., reduces the loss
Sanity check
Currently, the ensemble is $H$ and the loss is $\mathcal{L}(H)$.
However, viewed as a function of the step size $\alpha$, the loss $\mathcal{L}(H + \alpha h)$ has a negative derivative at $\alpha = 0$ (it equals the inner product above).
The minimum value for $\mathcal{L}(H + \alpha h)$ is therefore at some $\alpha > 0$.
Adding the new learner $h$ to $H$ (with a suitable step size) would reduce the error.
Gradient boost for trees
The hypothesis space contains all regression trees with limited depth (a small, fixed maximum depth).
Highly biased model = weak learner
1. Until convergence:
Fit a regression tree $h$ minimizing the squared error to the negative gradients $r_i = -\frac{\partial \mathcal{L}}{\partial H(\mathbf{x}_i)}$, instead of to the original labels $y_i$.
Update the ensemble: $H \leftarrow H + \alpha h$.
Hyperparameters = the step size $\alpha$ and the number of iterations (a code sketch follows).
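A minimal sketch of this loop, assuming the squared loss (so the negative gradients are simply the residuals $y_i - H(\mathbf{x}_i)$) and scikit-learn's DecisionTreeRegressor as the weak learner; n_rounds, alpha, and depth are illustrative values:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=100, alpha=0.1, depth=3):
    """Gradient-boosted regression trees with squared loss (a sketch)."""
    trees = []
    pred = np.zeros(len(y))                            # H_0 = 0
    for _ in range(n_rounds):
        residuals = y - pred                           # negative gradient of 1/2 (y - H)^2 w.r.t. H
        tree = DecisionTreeRegressor(max_depth=depth)  # weak, high-bias learner
        tree.fit(X, residuals)                         # fit to the residuals, not to y
        pred += alpha * tree.predict(X)                # H <- H + alpha * h
        trees.append(tree)
    return trees

def boosted_predict(trees, X, alpha=0.1):
    return alpha * sum(t.predict(X) for t in trees)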
Adaptive Boosting (AdaBoost)
AdaBoost loss: the exponential loss $\mathcal{L}(H) = \sum_{i=1}^{n} e^{-y_i H(\mathbf{x}_i)}$
Assume: binary classification, $y_i \in \{-1, +1\}$
Assume: the weak learners always return $h(\mathbf{x}) \in \{-1, +1\}$
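To connect this loss to the gradient-boosting recipe above, a short sketch in the same notation:
\[
\frac{\partial \mathcal{L}}{\partial H(\mathbf{x}_i)} = -\,y_i\, e^{-y_i H(\mathbf{x}_i)},
\qquad
\langle \nabla \mathcal{L}, h \rangle = -\sum_{i=1}^{n} y_i\, h(\mathbf{x}_i)\, e^{-y_i H(\mathbf{x}_i)}.
\]
Since $y_i h(\mathbf{x}_i) = +1$ on correctly classified samples and $-1$ on misclassified ones, making this inner product very negative means classifying correctly the samples with large $e^{-y_i H(\mathbf{x}_i)}$.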
AdaBoost
We assumed that $y_i, h(\mathbf{x}_i) \in \{-1, +1\}$, so $y_i\, h(\mathbf{x}_i) = +1$ on correctly classified samples and $-1$ on misclassified ones.
AdaBoost
That is, the added learner should minimize the exponential loss over the misclassified samples:
$h = \operatorname{argmin}_{h'} \sum_{i:\, h'(\mathbf{x}_i) \neq y_i} e^{-y_i H(\mathbf{x}_i)}$
AdaBoost
We define $w_i = \frac{e^{-y_i H(\mathbf{x}_i)}}{\sum_{j=1}^{n} e^{-y_j H(\mathbf{x}_j)}}$:
the normalized loss contribution of each training sample.
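In this notation (a sketch; $\epsilon$ denotes the weighted training error), choosing the new weak learner amounts to minimizing the weighted error:
\[
h = \operatorname{argmin}_{h'} \; \epsilon(h'), \qquad \epsilon(h') = \sum_{i:\, h'(\mathbf{x}_i) \neq y_i} w_i,
\]
i.e., $h$ should do well on exactly the samples the current ensemble struggles with, since those carry large weights.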
Yes!
As long as $0 < \epsilon < \frac{1}{2}$, we have a meaningful (positive, finite) value for the step size $\alpha$ (derivation sketched after this slide).
That is, we can still reduce the loss even when the training error is zero.
This is good news! Why?
Even when $H$ fits our training data perfectly, we can continue training it and widen the classification margin.
Now that we can compute $\alpha$, let's add $\alpha h$ to our ensemble.
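A short derivation of that step size, as a sketch in the notation above (recall $\sum_i w_i = 1$ and $y_i h(\mathbf{x}_i) \in \{-1, +1\}$): up to the constant normalizer, the loss of $H + \alpha h$ is
\[
\sum_{i} w_i\, e^{-\alpha\, y_i h(\mathbf{x}_i)} \;=\; (1-\epsilon)\, e^{-\alpha} + \epsilon\, e^{\alpha},
\]
and setting the derivative with respect to $\alpha$ to zero gives
\[
\alpha = \frac{1}{2} \ln \frac{1-\epsilon}{\epsilon},
\]
which is positive exactly when $\epsilon < \frac{1}{2}$ and grows as the weak learner gets more accurate.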
Adaptive Boosting
We assume that the weights are normalized, $\sum_{i} w_i = 1$ (which holds by the definition above).
AdaBoost
Works in iterations!
At each iteration we need to re-compute all the weights.
We can simply update them multiplicatively, $w_i \leftarrow \frac{w_i\, e^{-\alpha\, y_i h(\mathbf{x}_i)}}{2\sqrt{\epsilon(1-\epsilon)}}$, where the denominator is the normalizer (we won't prove this); a code sketch follows.
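A minimal end-to-end sketch of the resulting algorithm, assuming decision stumps (depth-1 trees from scikit-learn) as the weak learners and labels in $\{-1, +1\}$; all names and defaults are illustrative:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=50):
    """AdaBoost with decision stumps; y must take values in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                   # normalized sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)      # weak learner on the weighted data
        pred = stump.predict(X)
        eps = w[pred != y].sum()              # weighted training error
        if eps <= 0 or eps >= 0.5:            # not better than random (or perfect): stop
            break
        alpha = 0.5 * np.log((1 - eps) / eps) # adaptive step size
        w *= np.exp(-alpha * y * pred)        # multiplicative weight update
        w /= w.sum()                          # re-normalize (equals 2*sqrt(eps*(1-eps)))
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    scores = sum(a * s.predict(X) for a, s in zip(stumps, alphas))
    return np.sign(scores)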
What did we learn?
Boosting = iteratively build an ensemble where each new learner $h_{t+1}$ is trained to reduce the error of the current ensemble $H_t$.
Boosting is an extremely powerful algorithm that turns any weak learner (better than random) into a strong learner.
For AdaBoost (adaptive step size and exponential loss), the training error decreases exponentially with the number of iterations.
It requires only $O(\log n \,/\, \gamma^2)$ steps until it is consistent with the training set, where $\gamma$ is the weak learners' edge over random guessing (this wasn't proved).
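The bound behind that claim, stated as a sketch (a standard AdaBoost result; here $\epsilon_t$ is the weighted error at round $t$ and $\gamma = \frac{1}{2} - \max_t \epsilon_t$ is the edge over random guessing):
\[
\frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{H_T(\mathbf{x}_i) \neq y_i\}
\;\le\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t (1-\epsilon_t)}
\;\le\; e^{-2\gamma^2 T},
\]
so the training error drops below $1/n$, i.e., reaches zero, once $T > \frac{\ln n}{2\gamma^2}$.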
What next?
Class: Midterm!
Assignments:
Assignment (P3): SVM, linear regression and kernelization, due Tuesday Nov-16
Assignment (P4): Decision trees, due Thursday Nov-25
Quizzes:
Quiz 5: decision trees and bagging, due Thursday Nov-18