
CSCE-421 Machine Learning

Boosting

Instructor: Guni Sharon 1


Examples by: Kilian Weinberger
Announcements
 Midterm on Tuesday, November-23 (in class)
 Covering all topics up to the exam date
 Written exam
 One theoretical question and four multiple-choice questions
 We will have a preparation class (Nov-18)
 Go over the proofs (lectures + assignments). If unsure, post questions on Campuswire; I
will address specific pre-asked questions on Nov-18.
 Due:
 Assignment (P3): SVM, linear regression and kernelization, due Tuesday Nov-16
 Quiz 5: decision trees and bagging, due Thursday Nov-18
 Assignment (P4): Decision trees, due Thursday Nov-25
2
Disadvantages of Bagging
 Loss of interpretability: the underlying model might be
interpretable, e.g., decision trees. However, an ensemble
prediction is harder to make sense of.
 Computationally expensive: Bagging slows down and grows more
intensive as the number of iterations increases
 Unstable benefit (across models): Bagging provides little benefit for
models that already have low variance
 “bagging a linear regression model will effectively just return the original
predictions” [Hands-On Machine Learning, Boehmke & Greenwell]
3
Random Forest
 A Random Forest is essentially bagged decision trees with a
modified splitting criterion
1. Sample $m$ data sets $D_1, \dots, D_m$ from $D$ with replacement.
2. For each $D_i$, train a decision tree $h_i$ with the modified splitting criterion.
3. Predict: $\bar{h}(x) = \frac{1}{m}\sum_{i=1}^{m} h_i(x)$ (average / majority vote).
 Splitting criterion: split on the feature that maximizes IG, but don't
consider all possible features
 Consider only a random subsample of $k \le d$ features at each split (see the sketch below)

4
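A minimal sketch of this procedure, assuming scikit-learn's DecisionTreeClassifier is available and using its max_features option to implement the random feature subsample at each split (the function and variable names here are illustrative, not from the lecture):

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def train_random_forest(X, y, m=100, k="sqrt", seed=None):
    """Train m trees, each on a bootstrap sample, splitting on a random feature subset."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    trees = []
    for _ in range(m):
        idx = rng.integers(0, n, size=n)               # sample n points with replacement
        tree = DecisionTreeClassifier(max_features=k)  # random subset of features per split
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def predict_random_forest(trees, X):
    """Majority vote over the individual tree predictions."""
    votes = np.array([t.predict(X) for t in trees])    # shape (m, n_samples)
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])
```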
Advantages of Random Forest
 Easy to implement; works well "out of the box"
 Only two hyperparameters: the number of trees $m$ and the number of
features $k$ sampled at each split
 RF is not sensitive to the hyperparameter values
 Known values that usually work well: $k = \sqrt{d}$, and $m$ as large as you
can afford (see the usage example below)
 Insensitive to the feature domains (scale, magnitude, missing values)
 Doesn't require data preprocessing
5
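For reference, a hedged usage example with scikit-learn's RandomForestClassifier, which exposes exactly these two knobs as n_estimators and max_features (the dataset chosen here is just for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# m = n_estimators (as large as you can afford), k = max_features ("sqrt" of d)
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt")

# No scaling or other preprocessing of X is needed.
print(cross_val_score(rf, X, y, cv=5).mean())
```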
What about bias?
 Trees with high variance – use bagging!
 What if we have high bias = underfitting = the model is too weak
= can’t capture data structure
 Both training and testing losses are high
 More training data won’t help
 Can this problem be addressed with an ensemble approach?
 Can weak learners be combined to generate a strong learner with low
bias?
 A weak learner is only slightly better than random guessing
7
Ensemble loss
 The loss of an ensemble $H$: $\ell(H) = \frac{1}{n}\sum_{i=1}^{n} \ell\big(H(x_i), y_i\big)$
 where $H(x) = \sum_{t} \alpha_t h_t(x)$
 Instead of the unweighted average used in bagging, $\frac{1}{m}\sum_{i=1}^{m} h_i(x)$,
we now consider a scaled (weighted) sum of the ensemble members

 Question: can we define a new member $h$ of our ensemble that, if added,
will reduce the ensemble loss?

 Claim: when we group multiple weak classifiers, with each one
progressively learning from the others' wrongly classified objects, we can
build one such strong model
8
Boosting
 Schapire, Robert E. (1990). "The Strength of Weak Learnability”

 How can we approximate the loss around a known point, i.e., $\ell(H + \alpha h)$
for the current ensemble $H$ and a candidate new learner $h$?

 Use a 2nd-order Taylor series expansion around $H$
 The derivative of summed terms = the summation of the terms' derivatives,
so $\nabla\ell(H)$ decomposes over the training samples
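As a reference for the expansion the slide invokes, here is the Taylor series of the ensemble loss, viewed as a function of the prediction vector $(H(x_1), \dots, H(x_n))$ and perturbed by $\alpha h$ (this write-up, including the notation, is my own sketch of the step):

$$
\ell(H + \alpha h) = \ell(H)
+ \alpha \sum_{i=1}^{n} \frac{\partial \ell}{\partial H(x_i)}\, h(x_i)
+ \frac{\alpha^2}{2} \sum_{i=1}^{n}\sum_{j=1}^{n} \frac{\partial^2 \ell}{\partial H(x_i)\, \partial H(x_j)}\, h(x_i)\, h(x_j)
+ O(\alpha^3)
$$

For a small step size $\alpha$ the first-order term dominates, which is what the next slide uses to pick $h$.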
Boosting

$\ell(H + \alpha h) \;\approx\; \underbrace{\ell(H)}_{\text{independent of } h} \;+\; \underbrace{\alpha}_{\text{scalar}} \;\Big\langle \underbrace{\nabla \ell(H)}_{\text{independent of } h},\; h \Big\rangle$

 Makes sense!
 $\nabla\ell(H)$ = how to change the current predictions such that the loss is increased
 Find a new classifier $h$ that points in the other direction
 The inner product is minimized for opposing vectors

10
Example

 $\nabla\ell(H)$ gives the update (direction) for $H$ that will maximize the loss; the new learner should point the opposite way

11
Gradient boost
 Task: train a new tree $h$ such that $\sum_{i=1}^{n} \frac{\partial \ell}{\partial H(x_i)}\, h(x_i) < 0$
 Must be better than random:
 = the inner product of $h$ with $\nabla\ell(H)$ is negative (angle > 90°)
 = taking a step $\alpha h$ moves in the right direction, i.e., reduces the loss

12
Gradient boost

 Consider the magnitude of $h$, $\sum_i h(x_i)^2$, as some constant $c$:
$h_{t+1} = \operatorname*{argmin}_{h} \sum_{i=1}^{n} \frac{\partial \ell}{\partial H(x_i)}\, h(x_i)
 = \operatorname*{argmin}_{h} \sum_{i=1}^{n} \Big(\frac{\partial \ell}{\partial H(x_i)} + h(x_i)\Big)^2$
 Adding a constant ($\sum_i h(x_i)^2 = c$) does not change the argmin
 $\sum_i \big(\frac{\partial \ell}{\partial H(x_i)}\big)^2$ is independent of $h$, so it does not change the argmin

13
Gradient boost

 Consider the squared loss $\ell(H) = \frac{1}{2}\sum_{i=1}^{n} \big(H(x_i) - y_i\big)^2$

 $\frac{\partial \ell}{\partial H(x_i)} = H(x_i) - y_i$ = the current error of $H$ on $x_i$

$h_{t+1} = \operatorname*{argmin}_{h} \sum_{i=1}^{n} \big(h(x_i) + H(x_i) - y_i\big)^2$

 That is, train a new model to minimize the squared difference
between its output $h(x_i)$ and the current error in $H$ (i.e., fit the residuals $y_i - H(x_i)$)

14
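A tiny numeric illustration of this identity (the arrays and names are made up for illustration): with squared loss, the negative gradient at each training point is exactly the residual the new learner should fit.

```python
import numpy as np

y = np.array([3.0, -1.0, 2.0, 0.5])   # targets y_i
H = np.array([2.5, -0.2, 2.0, 1.0])   # current ensemble predictions H(x_i)

grad = H - y                          # d/dH(x_i) of (1/2) * sum (H(x_i) - y_i)^2
residuals = y - H                     # what the new learner h should predict

assert np.allclose(grad, -residuals)  # the negative gradient equals the residual
```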
Sanity check

 Currently:
 However:
 Minimum value for is at
 Adding the new learner to would reduce the error

 Assuming step size

15
Gradient boost for trees
 Hypothesis space: all regression trees with limited depth (usually a small fixed depth)
 Highly biased model = weak learner

1. Until convergence:
 train a regression tree $h$ minimizing $\sum_i \big(h(x_i) - (y_i - H(x_i))\big)^2$,
i.e., fit the residuals instead of the original labels $y_i$
 update $H \leftarrow H + \alpha h$ (a runnable sketch follows below)

 Hyperparameters: the step size $\alpha$ and the tree depth

16
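A minimal gradient-boosted-regression-trees sketch for squared loss, assuming scikit-learn's DecisionTreeRegressor as the weak learner; the constant initialization, the fixed number of rounds, and all names are illustrative choices, not the lecture's prescription:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=100, alpha=0.1, max_depth=3):
    """Squared-loss gradient boosting: each tree is fit to the current residuals."""
    base = y.mean()
    H = np.full(len(y), base)            # start from a constant prediction
    trees = []
    for _ in range(n_rounds):
        residuals = y - H                # negative gradient of (1/2) * sum (H - y)^2
        h = DecisionTreeRegressor(max_depth=max_depth)  # weak (high-bias) learner
        h.fit(X, residuals)
        H += alpha * h.predict(X)        # H <- H + alpha * h
        trees.append(h)
    return trees, base, alpha

def gb_predict(trees, base, alpha, X):
    return base + alpha * sum(t.predict(X) for t in trees)
```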
Adaptive Boost
 AdaBoost loss (exponential loss): $\ell(H) = \sum_{i=1}^{n} e^{-y_i H(x_i)}$
 Assume: binary classification, $y_i \in \{-1, +1\}$
 Assume: the weak learners always return $h(x) \in \{-1, +1\}$

 $\alpha$ is no longer a hyperparameter but an adaptive step size value, where a
better $h$ will be assigned a higher $\alpha$
 Merge with gradient boosting:
$h_{t+1} = \operatorname*{argmin}_{h} \sum_{i=1}^{n} \frac{\partial \ell}{\partial H(x_i)}\, h(x_i)
 = \operatorname*{argmin}_{h} \sum_{i=1}^{n} -y_i\, e^{-y_i H(x_i)}\, h(x_i)$
17
AdaBoost

18
AdaBoost

 We assumed that $y_i, h(x_i) \in \{-1, +1\}$, so $y_i h(x_i) = +1$ for correctly
classified samples and $-1$ for misclassified ones
 $e^{-y_i H(x_i)}$ is a constant per iteration (independent of the new learner $h$)

19
AdaBoost

 That is, the added learner should minimize exponential loss for misclassified
samples

20
AdaBoost

 We define $w_i = \frac{e^{-y_i H(x_i)}}{\sum_{j=1}^{n} e^{-y_j H(x_j)}}$
 The normalized loss contribution of each training sample

 Moving forward we will say that $\epsilon = \sum_{i:\, h(x_i) \neq y_i} w_i$,
the total weight of the samples that the new learner misclassifies

 What happens if $H$ already classifies every training sample correctly?
 Can we still define the weights $w_i$?

21
 Yes!
 As long as $\sum_i e^{-y_i H(x_i)} > 0$ (which is always true), we have a meaningful value for every $w_i$

 That is, we can still reduce the loss even when the training error is zero
 This is good news! Why?
 Even when $H$ fits our training data perfectly, we can continue training it and
widen the classification margin
 Now that we can define the new learner $h$, let's add it to our ensemble (with what step size $\alpha$?)

22
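A short snippet making these definitions concrete (the function and variable names are mine; h_pred stands for the new weak learner's ±1 predictions):

```python
import numpy as np

def adaboost_weights(y, H):
    """Normalized exponential-loss contribution w_i of each training sample."""
    losses = np.exp(-y * H)        # e^{-y_i H(x_i)}, positive even at zero training error
    return losses / losses.sum()

def weighted_error(w, y, h_pred):
    """epsilon: total weight of the samples the new learner misclassifies."""
    return w[h_pred != y].sum()
```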
Adaptive Boosting

 $h$ is a weak learner. As a result, its contribution to the accuracy of
$H$ is noisy
 Intuitively, a better $h$ should have a larger $\alpha$
 Can we formulate this intuition as an optimization problem?
 Yes! $\alpha = \operatorname*{argmin}_{\alpha} \ell(H + \alpha h)
 = \operatorname*{argmin}_{\alpha} \sum_{i=1}^{n} e^{-y_i \left(H(x_i) + \alpha h(x_i)\right)}$

23
Adaptive Boosting

 Convex function of $\alpha$ = we can find the argmin in closed form by setting
$\frac{\partial\, \ell(H + \alpha h)}{\partial \alpha} = 0$
 We assume that $y_i h(x_i) \in \{-1, +1\}$

24
Adaptive Boosting

 $\frac{\partial\, \ell(H + \alpha h)}{\partial \alpha} = 0$ (see 4 slides back)

 Define $\epsilon = \sum_{i:\, h(x_i) \neq y_i} w_i$, the weighted error of $h$
 (at minimum loss) $(1 - \epsilon)\, e^{-\alpha} = \epsilon\, e^{\alpha}$
 ($\Rightarrow e^{2\alpha} = \frac{1 - \epsilon}{\epsilon}$)

 $\alpha = \frac{1}{2}\ln\frac{1 - \epsilon}{\epsilon}$ (Wow! The optimal step size!)

25
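For completeness, one way to carry out the derivation this slide summarizes, using the definitions of $w_i$ and $\epsilon$ from the previous slides (the intermediate steps are my own reconstruction):

$$
\begin{aligned}
\ell(H + \alpha h) &= \sum_{i=1}^{n} e^{-y_i H(x_i)}\, e^{-\alpha y_i h(x_i)}
 \;\propto\; \sum_{i:\,h(x_i) = y_i} w_i\, e^{-\alpha} + \sum_{i:\,h(x_i) \neq y_i} w_i\, e^{\alpha}
 \;=\; (1-\epsilon)\, e^{-\alpha} + \epsilon\, e^{\alpha} \\
0 &= \frac{\partial}{\partial \alpha}\Big[(1-\epsilon)\, e^{-\alpha} + \epsilon\, e^{\alpha}\Big]
 = -(1-\epsilon)\, e^{-\alpha} + \epsilon\, e^{\alpha}
 \;\Longrightarrow\; e^{2\alpha} = \frac{1-\epsilon}{\epsilon}
 \;\Longrightarrow\; \alpha = \tfrac{1}{2}\ln\frac{1-\epsilon}{\epsilon}
\end{aligned}
$$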
AdaBoost

 Work in iterations!
 At each iteration we need to re-compute all the weights $w_i$
 We can instead simply update them, $w_i \leftarrow \frac{w_i\, e^{-\alpha y_i h(x_i)}}{Z}$, where $Z$
renormalizes the weights to sum to 1 (we won't prove this)

26
AdaBoost

 The weak learner must be better than a random classifier ($\epsilon < \frac{1}{2}$)

 The normalized exponential loss $\frac{1}{n}\sum_{i} e^{-y_i H(x_i)}$ is an upper bound on the {0/1} error
rate (a full code sketch follows below)

27
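A compact AdaBoost sketch tying the pieces together, assuming labels in {-1, +1} and scikit-learn decision stumps as the weak learners; the stopping rule, stump depth, and all names are illustrative choices rather than the lecture's exact algorithm:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=50):
    """AdaBoost with depth-1 trees (stumps). Labels y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                      # initial sample weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        h = DecisionTreeClassifier(max_depth=1)
        h.fit(X, y, sample_weight=w)             # weak learner on the weighted data
        pred = h.predict(X)
        eps = w[pred != y].sum()                 # weighted training error epsilon
        if eps <= 0.0 or eps >= 0.5:             # perfect or no-better-than-random stump: stop
            break
        alpha = 0.5 * np.log((1 - eps) / eps)    # optimal step size
        w = w * np.exp(-alpha * y * pred)        # up-weight misclassified samples
        w /= w.sum()                             # renormalize (Z)
        learners.append(h)
        alphas.append(alpha)
    return learners, alphas

def ada_predict(learners, alphas, X):
    H = sum(a * h.predict(X) for a, h in zip(alphas, learners))
    return np.sign(H)
```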
What did we learn?
 Boosting = iteratively build an ensemble where each new learner
$h_{t+1}$ is trained to reduce the error of the current ensemble $H_t$
 Boosting is an extremely powerful algorithm that turns any weak
learner (better than random) into a strong learner
 For AdaBoost (= adaptive step size and exponential loss), the
training error decreases exponentially with the number of iterations
 It requires only a small number of steps until it is consistent with the training set
(we didn't prove this)

31
What next?
 Class: Midterm!
 Assignments:
 Assignment (P3): SVM, linear regression and kernelization, due Tuesday
Nov-16
 Assignment (P4): Decision trees, due Thursday Nov-25
 Quizzes:
 Quiz 5: decision trees and bagging, due Thursday Nov-18

32
