
Supervised Learning: Overview 3

Rayid Ghani

Slides liberally borrowed and customized from lots of excellent online sources
Rayid Ghani @rayidghani
Methods
• Regression
• Nearest neighbor
• Decision Trees
• Support Vector Machines
• Bayes Classifier
• Ensembles (we'll cover these today)
– Bagging
– Boosting
– Random Forests
• Neural Networks
Why Ensembles?



How can we create ensembles?

• Different learning algorithms (see the sketch below)
• The same algorithm with different parameter choices
• Different feature subsets (e.g., random subspace)
• Different subsets of the data (e.g., bagging, boosting)
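A minimal sketch of the first approach, assuming scikit-learn and a generic dataset X, y (the particular estimator choices here are illustrative, not prescribed by the slides):

# Combine *different* learning algorithms into one ensemble.
# Assumes scikit-learn; X, y stand in for any feature matrix and labels.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=5)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    voting="soft",  # average predicted probabilities; "hard" would take a majority vote
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))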



Ensemble Methods
• Bagging (Bootstrap Aggregation)
• Boosting
• Random Forests
• Stacking



Bagging
• Create ensembles by repeatedly and randomly resampling the training data (Breiman, 1996).
• Given a training set of size n, create m samples of size n by drawing n examples from the original data, with replacement.
– Each bootstrap sample will on average contain 63.2% of the unique training examples (see the calculation below); the rest are replicates.
• Combine the m resulting models using a simple majority vote.
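The 63.2% figure is a simple limit calculation: for any one example, the probability of not being picked in a single draw is 1 − 1/n, so the probability of never appearing among the n draws is (1 − 1/n)^n ≈ e^(−1) ≈ 0.368 for large n, leaving about 1 − 0.368 ≈ 63.2% of the unique examples in each bootstrap sample.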



Bagging
• For i = 1 .. M:
– Draw a sample of the training data with replacement
– Learn classifier Ci on it
• Final classifier is a vote of C1 .. CM (see the sketch below)
• Why does it work?
– Increases classifier stability / reduces variance
– Works better with unstable classifiers (e.g., decision trees)

(figure from Friedman et al. 2000)
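A minimal sketch of this loop, assuming scikit-learn decision trees as the unstable base classifier and NumPy arrays X, y with non-negative integer class labels (function names like bagging_fit are illustrative):

# Bagging sketch: M bootstrap samples, one tree each, majority vote at the end.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, M=50, random_state=0):
    # Learn M trees, each on a bootstrap sample of the n training examples.
    rng = np.random.default_rng(random_state)
    n = len(X)
    models = []
    for _ in range(M):
        idx = rng.integers(0, n, size=n)  # draw n examples with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    # Final classifier: majority vote of C1 .. CM for each test example.
    votes = np.stack([m.predict(X) for m in models])  # shape (M, n_test)
    # bincount assumes non-negative integer class labels
    return np.apply_along_axis(lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)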



• What are some problems with it?



Boosting
• Examples are given weights.
• At each iteration, a new hypothesis/model is learned and the examples are reweighted to focus the next model on the examples that the most recently learned model got wrong.



Boosting
• General loop:
  Set all examples to have equal, uniform weights.
  For t from 1 to T:
    Learn a classifier Ct from the weighted examples.
    Increase the weights of the examples that Ct classifies incorrectly.

• The base (weak) learner must focus on correctly classifying the most highly weighted examples while strongly avoiding over-fitting.

• During testing, each of the T hypotheses gets a weighted vote proportional to its accuracy on the training data.



Typically, the weights of incorrectly classified examples are increased so that the base learner is forced to focus on the hard examples in the training set.

From [R. Schapire, NE&C03]
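A minimal sketch of this reweighting scheme, in the spirit of AdaBoost for labels y in {−1, +1}, assuming scikit-learn decision stumps as the weak learner (an illustration, not the exact algorithm from the cited source):

# Simplified AdaBoost-style sketch for binary labels y in {-1, +1}.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=50):
    n = len(X)
    w = np.full(n, 1.0 / n)                 # equal, uniform weights to start
    models, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)    # weak learner sees the weighted examples
        pred = stump.predict(X)
        eps = np.clip(np.sum(w[pred != y]), 1e-10, 1 - 1e-10)   # weighted error
        alpha = 0.5 * np.log((1 - eps) / eps)
        w *= np.exp(-alpha * y * pred)      # increase weights of misclassified examples
        w /= w.sum()                        # renormalize
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def adaboost_predict(models, alphas, X):
    # Each hypothesis gets a vote proportional to its weighted training accuracy.
    return np.sign(sum(a * m.predict(X) for m, a in zip(models, alphas)))

The update multiplies the weight of each misclassified example by exp(alpha) and each correctly classified one by exp(−alpha), then renormalizes, which is exactly the "focus on the hard examples" behavior described above.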


Boosting
• Improves classification accuracy
• Can be used with many different types of classifiers

(figure from Friedman et al. 2000)


Random Forest
• Each tree is grown on a bootstrap sample of the training set of N cases.
• A number m is specified, much smaller than the total number of variables M (e.g., m = sqrt(M)).
• At each node, m variables are selected at random out of the M.
• The split used is the best split on these m variables.
• Final classification is done by majority vote across trees (see the example below).
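A minimal sketch of this recipe with scikit-learn (the parameter values are illustrative; max_features="sqrt" implements the m = sqrt(M) rule):

# Random forest sketch: many trees, each node split on a random subset of features.
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(
    n_estimators=500,       # number of trees, each grown on a bootstrap sample
    max_features="sqrt",    # m = sqrt(M) features considered at each split
    bootstrap=True,         # resample the N training cases for each tree
    random_state=0,
)
forest.fit(X, y)            # X, y as in the earlier sketches
y_hat = forest.predict(X)   # final class = majority vote across trees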

Random Forests

• Motivation: reduce error correlation between classifiers
• Main idea: build a large number of un-pruned decision trees
• Key: use a random selection of features to split on at each node



Advantages of random forests
• Accurate
• More robust to noise
• More efficient on large data sets
• Provides an estimate of the importance of each feature in determining the classification (see the example below)
• More info at: http://stat-www.berkeley.edu/users/breiman/RandomForests/cc_home.htm
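For the feature-importance point, a small sketch continuing the forest fit from the earlier example (feature_names is an assumed list of column names, not something defined in the slides):

# Rank features by the forest's impurity-based importance scores.
import numpy as np

importances = forest.feature_importances_   # one score per feature, sums to 1
ranking = np.argsort(importances)[::-1]
for i in ranking[:10]:                      # top 10 features
    print(f"{feature_names[i]}: {importances[i]:.3f}")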
Questions to think about
• Could you inject more randomness into random forests?
• How?
• What would be the impact?



Factors to consider
• Complexity
• Overfitting
• Robustness
• Interpretability
• Training Time
• Test Time



What to remember about classifiers

• Better to have smart features and simple classifiers than simple features and smart classifiers

• Need more training data with increasingly powerful classifiers (bias-variance tradeoff)



Slide credit: D. Hoiem
