
Model Ensembles

Instructor: Saravanan Thirumuruganathan


ML Paradigms

1. Build one GREAT model


• Traditional approach
• Logistic regression, KNN, Naïve Bayes, SVM, Decision Trees, ….

2. Build MANY decent models and combine them smartly


• Have become popular recently due to their great empirical performance and
interesting theoretical results
No Free Lunch Theorem
• There is no single machine learning algorithm that performs best for
all possible problems.

• Universal performance: If we average an algorithm's performance across all possible problems, every algorithm will have the same average performance.

• The effectiveness of an algorithm depends on how well it matches the specific problem at hand. This is why domain knowledge is important!
Ensembles and Netflix Prize
• One of the winning teams, BellKor, used an ensemble of 107 models!

“Our experience is that most efforts should be concentrated in deriving substantially different approaches, rather than refining a simple technique.”

“We strongly believe that the success of an ensemble approach depends on the ability of its various predictors to expose different complementing aspects of the data. Experience shows that this is very different than optimizing the accuracy of each individual predictor.”

Quotes via Rich Zemel


Strong and Weak Learners
• Strong Learners
• Produce a classifier that is very accurate
• Most of ML is focused on this
• A challenging problem

• Weak Learners
• Produce a classifier that is more accurate than random guessing
• Not hard to build weak learners
Ensemble Learning

1. Build strong learners from weak learners

2. Given a set of base classifiers, build an ensemble such that its accuracy is higher than that of the base learners
Ensemble Learning Design Space

Image from Raymond Mooney


Ensemble Learning
When will ensemble learning work?

Any thoughts on how to combine the classifiers?


Ensemble Learning
• Necessary and sufficient conditions for ensemble learning to work
• Accuracy
• Diversity

• A classifier is accurate if it is better than random guessing

• A set of classifiers is diverse if they make uncorrelated errors


Condorcet's jury theorem
• A theorem from 1785!

• A group of jurors wants to reach a decision by majority vote. Each voter has an independent probability p of being correct

• If p > 1/2, then adding more voters increases the probability that the
majority voting is correct. At the limit, this probability approaches 1

• If p < 1/2, then adding more voters makes things worse. It is better to rely on a single juror
Majority Vote Classifier

Sebastian Raschka STAT 479: FS 2019


Majority Voting Classifier
Why does majority voting work?

Assumptions
• n classifiers
• Each classifier has an accuracy > 0.5
• Errors are uncorrelated

Sebastian Raschka STAT 479: FS 2019


Majority Voting Classifier
The ensemble makes a wrong prediction when k of the n classifiers predict the same wrong class label, where k > n/2

Sebastian Raschka STAT 479: FS 2019
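
As a concrete illustration (my own sketch, not from the slides), the ensemble error under these assumptions is a binomial tail probability. The short Python snippet below computes it for a given base error rate and an odd number of classifiers, and numerically illustrates Condorcet's jury theorem:

from math import comb

def ensemble_error(n_classifiers, base_error):
    """P(majority is wrong) = P(at least n//2 + 1 of n independent
    classifiers, each with error rate base_error, err together)."""
    k_start = n_classifiers // 2 + 1          # strict majority (odd n)
    return sum(
        comb(n_classifiers, k)
        * base_error**k
        * (1 - base_error) ** (n_classifiers - k)
        for k in range(k_start, n_classifiers + 1)
    )

# 11 uncorrelated classifiers, each wrong 25% of the time:
# the ensemble is wrong only ~3.4% of the time.
print(ensemble_error(11, 0.25))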


Base Error vs Ensemble Error

Sebastian Raschka STAT 479: FS 2019


Extensions to Majority Voting
• Majority voting works very well
• Even with weak learners, as the number of classifiers increases, provided their errors are uncorrelated

• What can you do to improve this simple approach?

Sebastian Raschka STAT 479: FS 2019


Extensions to Majority Voting
• Weighted majority voting
• Majority voting gives a weight of 1/n to each classifier
• Give a different weight based on held-out/validation dataset accuracy

• Soft voting
• Also take the output probabilities into account
• Classifiers have to be well calibrated

• Learn the weights using a ML model

Sebastian Raschka STAT 479: FS 2019


Soft Voting

Sebastian Raschka STAT 479: FS 2019
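
A minimal sketch of weighted and soft voting with scikit-learn's VotingClassifier; the dataset, base models, and weights below are illustrative choices, not the setup from the slides:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Three reasonably diverse base classifiers.
clf1 = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf2 = DecisionTreeClassifier(max_depth=3, random_state=0)
clf3 = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# voting="soft" averages predicted class probabilities;
# `weights` lets more accurate / better-calibrated models count for more.
ensemble = VotingClassifier(
    estimators=[("lr", clf1), ("dt", clf2), ("knn", clf3)],
    voting="soft",
    weights=[2, 1, 1],
)

print(cross_val_score(ensemble, X, y, cv=5).mean())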


Stacking Algorithm

Sebastian Raschka STAT 479: FS 2019


Wolpert, David H. "Stacked generalization." Neural networks 5.2 (1992): 241-259.
Tang, J., S. Alelyani, and H. Liu. "Data Classification: Algorithms and Applications." Data Mining and Knowledge Discovery Series, CRC Press (2015): pp. 498-500.
Stacking Algorithm

Sebastian Raschka STAT 479: FS 2019


Stacking Algorithm
• What is the problem with this simple algorithm?
Stacking Algorithm

Sebastian Raschka STAT 479: FS 2019


Stacking Algorithm

Sebastian Raschka STAT 479: FS 2019


Wolpert, David H. "Stacked generalization." Neural networks 5.2 (1992): 241-259.
Tang, J., S. Alelyani, and H. Liu. "Data Classification: Algorithms and Applications." Data Mining and Knowledge Discovery Series, CRC Press (2015): pp. 498-500.
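
A hedged sketch of the stacking idea using scikit-learn's StackingClassifier, which addresses the problem raised above by building the meta-learner's training features from cross-validated (out-of-fold) base-model predictions rather than predictions on the same data the base models were fit on; models and parameters here are illustrative, not the exact setup from the slides:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
]

# cv=5: the meta-features are out-of-fold predictions, so the
# logistic-regression meta-learner never sees base-model predictions
# made on data those base models were trained on.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

print(cross_val_score(stack, X, y, cv=5).mean())
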
Finding Base Classifiers for Ensembles
• For a good ensemble, the base classifiers should be
• Accurate : have accuracy > 50%
• Diverse: have uncorrelated errors

• Building accurate classifiers is not hard (at least for binary classification)

• How to get diverse base classifiers?


Bagging
• Bootstrap Aggregating : Breiman, L. (1996). Bagging predictors.
Machine learning, 24(2), 123-140.

Sebastian Raschka STAT 479: FS 2019


Bootstrap Sampling

Sebastian Raschka STAT 479: FS 2019
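
A small numpy illustration (my own sketch, not Raschka's code) of bootstrap sampling: draw n row indices with replacement; on average only about 63.2% of the original rows appear in a bootstrap sample, since (1 - 1/n)^n approaches 1/e ≈ 0.368 as n grows:

import numpy as np

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 5))          # toy dataset

# Bootstrap sample: n draws with replacement from the n original rows.
idx = rng.integers(0, n, size=n)
X_boot = X[idx]

# Fraction of distinct original rows in the bootstrap sample (~0.632);
# the remaining ~0.368 are the "out-of-bag" rows for this sample.
in_bag = np.unique(idx).size / n
print(in_bag, 1 - (1 - 1 / n) ** n)   # both close to 1 - 1/e ≈ 0.632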


Bagging Classifier

Sebastian Raschka STAT 479: FS 2019


Asymptotic Behavior of Bagging

Sebastian Raschka STAT 479: FS 2019


Bagging and Correlated Trees
• Suppose you have a feature f that is a great discriminator. Other features are good but not as good as f

• So all bagged trees will select f at the top of the tree. The only difference between trees will be in the rest of the sub-tree, which might not differ that much

• Solution?
Random Subspace Method

Training data

Md. Abu Sayed, University of Nevada Reno


Random Subspace Method

[Figure: a test sample classified by the subspace trees; the majority vote gives 66% confidence]

Md. Abu Sayed, University of Nevada Reno
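
One way to apply the random subspace method in practice is scikit-learn's BaggingClassifier with bootstrapping disabled and a feature fraction per model; this is an assumed illustration, not the code behind the figures above:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Random subspaces: every model sees all rows (bootstrap=False) but only
# a random half of the features; BaggingClassifier's default base
# estimator is a decision tree.
rsm = BaggingClassifier(
    n_estimators=100,
    bootstrap=False,       # use the full training set for each model
    max_features=0.5,      # random feature subset per model
    random_state=0,
)

print(cross_val_score(rsm, X, y, cv=5).mean())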


Random Forests

Random Forests = bagging with trees + random feature subsets, i.e., the random subspace method, where each tree gets a random subset of the features.

Sebastian Raschka STAT 479: FS 2019


Random Forests

[Figure: Tree 1, Tree 2, …, Tree N vote; the Random Forest prediction is the majority vote]

Md. Abu Sayed, University of Nevada Reno


Simple Random Forest Algorithm
Differences from the standard decision tree algorithm
• Train each tree on bootstrapped sample (not on entire data)

• For each split, consider only m random features

• Does not prune.


Out of Bag (OOB) Error

https://en.wikipedia.org/wiki/Out-of-bag_error
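
A hedged scikit-learn sketch tying the last two slides together: each tree gets a bootstrap sample, each split considers only a random subset of features, trees are left unpruned, and the out-of-bag samples give a built-in generalization estimate (hyperparameters are placeholders):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

rf = RandomForestClassifier(
    n_estimators=300,        # number of trees
    max_features="sqrt",     # m random features considered per split
    bootstrap=True,          # train each tree on a bootstrap sample
    oob_score=True,          # evaluate each tree on its out-of-bag rows
    n_jobs=-1,
    random_state=0,
)
rf.fit(X, y)

# OOB accuracy is a cheap stand-in for cross-validated accuracy.
print(rf.oob_score_)
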
Random Forest: Bias and Variance
• Increasing the number of models (trees) decreases variance (less
overfitting)
• Bias is mostly unaffected, but will increase if the forest becomes too
large (oversmoothing)

Joaquin Vanschoren; ML for Engineers


Random Forest Tips
• Rule of thumb: start with #features × 10 trees and adjust
• Sklearn’s default values for the rest of the parameters are fine

Illustration by Bradley Boehmke


Random Forest Pros
• Gives competitive performance
• Can give great performance with little tuning
• Individual trees can overfit, random forest does not (usually)
• Has a built-in validation dataset using OOB data
• OOB error is a good estimate for generalization error
• Usually, you do not need to do cross validation for random forests
Random Forest Cons
• Can be slow for large datasets
• Not very interpretable
• Can be beaten by advanced boosting based ensembles
ExtraTrees / Extremely Randomized Trees
• Takes the randomness one step further
• By default, decision trees are built on the entire dataset (no bootstrap)
• When growing a decision tree, it randomly selects m out of M features
• For each of those features, it selects a split point at random
• E.g., attr_k = v1 (for categorical) or attr_k <= v1 (for continuous)
• Then uses a metric like gini/entropy to pick the best of the m random splits (a short sketch follows)
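
For comparison, a brief scikit-learn sketch of Extremely Randomized Trees; by default each tree is grown on the full training set (no bootstrap) and candidate split thresholds are drawn at random before the best of the m candidates is kept (parameters are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

et = ExtraTreesClassifier(
    n_estimators=300,
    max_features="sqrt",   # m random features per split
    bootstrap=False,       # default: each tree sees the full dataset
    n_jobs=-1,
    random_state=0,
)

print(cross_val_score(et, X, y, cv=5).mean())
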
ExtraTrees
• Robust to noise and irrelevant features
• Very efficient: trees constructed in parallel, feature selection is fast
due to random subset and random splits
• Low variance (even when compared to RF and much lower than DT)
• Bias reduction: random subset/splitting makes the bias lower

• Performance comparable to RF and does better on noisy datasets


• Not widely used due to lack of awareness
Bagging Summary
• In Bagging, the models can be trained in parallel

• Take different K bootstrap samples and train K models

• Errors of one base model do not influence another: Why?

• Less susceptible to overfitting on noisy data as models do not focus on particular instances of data.
Boosting

Sebastian Raschka STAT 479: FS 2019


General Boosting Algorithm

• Initialize a weight vector with uniform weights


• Loop
• Apply weak learner to weighted training examples
• Increase weight for misclassified examples
• (Weighted) majority voting on trained classifiers

• Intuition: force classifier Ci+1 to focus on mistakes of Ci

Sebastian Raschka STAT 479: FS 2019
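
To make the loop concrete, here is my own minimal AdaBoost-style sketch of it with decision stumps and ±1 labels (a toy dataset and round count, not the slide's code):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=500, random_state=0)
y = np.where(y01 == 1, 1, -1)               # work with +/-1 labels

n, n_rounds = len(y), 50
w = np.full(n, 1.0 / n)                     # start with uniform weights
stumps, alphas = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=w)        # weak learner on weighted data
    miss = stump.predict(X) != y
    err = np.clip(w[miss].sum() / w.sum(), 1e-10, 1 - 1e-10)
    if err >= 0.5:                          # no better than chance: stop
        break
    alpha = 0.5 * np.log((1 - err) / err)   # weight of this classifier
    w = w * np.exp(alpha * miss)            # boost weights of mistakes
    w /= w.sum()
    stumps.append(stump)
    alphas.append(alpha)

# (Weighted) majority vote over the trained stumps.
F = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print("training accuracy:", np.mean(np.sign(F) == y))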


Decision Tree Stumps
• Decision tree with depth 1

• Categorical attribute: attr = v


• Numerical attribute: attr < v

• Simple classifier and weak learner

Sebastian Raschka STAT 479: FS 2019


Boosting with Decision Stumps

Sebastian Raschka STAT 479: FS 2019


AdaBoost: Bias and Variance
• AdaBoost reduces bias (and a little variance)
• Boosting too much will eventually increase variance

Joaquin Vanschoren; ML for Engineers


AdaBoost

Sebastian Raschka STAT 479: FS 2019


AdaBoost Pros
• High accuracy: Generally outperforms single models, especially on
complex datasets. Possible to get training error of 0

• Possible to get feature importance (e.g. using decision stumps)

• Can work with diverse base learners


AdaBoost Cons
• Sensitivity to noisy data and outliers
• Computationally expensive: Especially for large datasets or many
iterations
• Can overfit, as the weights on hard or noisy examples keep increasing
• Harder to interpret than simpler models
• Sequential nature: Difficult to parallelize, which can slow down
training
Gradient Boosting
• Ensemble of models, each fixing the remaining mistakes of the previous
ones
• Base models are regression trees that predict the probability of the positive class p
• Each iteration, the task is to predict the residual error of the ensemble

• Additive model: predictions at iteration i are the sum of the base-model predictions (a sketch follows this list)
• Base models should be low variance, but flexible enough to predict
residuals accurately (e.g. decision trees of depth 2-5)
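
A minimal sketch of this idea for squared-error regression (my own illustration; classification with log loss follows the same pattern on pseudo-residuals): start from a constant prediction and repeatedly fit a shallow regression tree to the current residuals, adding its scaled output to the running ensemble.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, noise=10.0, random_state=0)

n_rounds, lr = 100, 0.1
pred = np.full_like(y, y.mean())               # F_0: constant prediction
trees = []

for _ in range(n_rounds):
    residual = y - pred                        # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=3)  # low-variance base model
    tree.fit(X, residual)
    pred += lr * tree.predict(X)               # additive update
    trees.append(tree)

print("training MSE:", np.mean((y - pred) ** 2))
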
Gradient Boosting: Bias and Variance
Very effective at reducing bias but too much boosting increases variance

Joaquin Vanschoren; ML for Engineers


Gradient Boosting: Pros and Cons
• Among the most powerful and widely used models
• Works well on heterogeneous features and different scales
• Typically better than random forests, but requires more tuning, longer
training
• Does not work well on high-dimensional sparse data

Joaquin Vanschoren; ML for Engineers


XGBoost
• Faster version of Gradient Boosting models.

• Empirically, one of the best performing models

• RandomForest, XGBoost, LightGBM are the first approaches that you should try

• Not very easy to explain.

Sebastian Raschka STAT 479: FS 2019
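
A hedged usage sketch with the xgboost Python package (parameter values are placeholders; check the library documentation for current defaults and options):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier   # pip install xgboost

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = XGBClassifier(
    n_estimators=300,
    learning_rate=0.1,
    max_depth=4,
)
model.fit(X_tr, y_tr)
print(model.score(X_te, y_te))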


Boosting and Bagging

Rich Zemel CSC411 Fall 2014


Mixture of Experts (MoE)

Rich Zemel CSC411 Fall 2014


Cooperation vs Specialization
• Boosting and Bagging
• base classifiers cooperate to produce a prediction
• Each classifier has a fixed weight that is used for weighted majority voting

• MoE
• Weight of expert depends on input x
• Gating network forces experts to “specialize” instead of cooperate (a small sketch follows)
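
A tiny numpy sketch (my own illustration) of the MoE forward pass only: a softmax gating network produces input-dependent weights over the experts, so which expert dominates depends on x, unlike the fixed weights used in bagging and boosting.

import numpy as np

rng = np.random.default_rng(0)
d, n_experts = 4, 3

# Toy parameters: linear experts and a linear gating network.
W_experts = rng.normal(size=(n_experts, d))   # one weight vector per expert
W_gate = rng.normal(size=(n_experts, d))      # gating network weights

def moe_predict(x):
    expert_outputs = W_experts @ x            # each expert's prediction
    logits = W_gate @ x
    gate = np.exp(logits - logits.max())
    gate /= gate.sum()                        # softmax: weights depend on x
    return gate @ expert_outputs              # input-dependent mixture

x = rng.normal(size=d)
print(moe_predict(x))
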
Ensemble Learning Limitations
• If classifiers are accurate and diverse, we can push the accuracy of the
ensemble arbitrarily high by combining classifiers

• Typically, it is challenging for classifiers to make uncorrelated errors

• A more realistic claim: for data points where the classifiers predict with > 50% accuracy, we can push accuracy arbitrarily high (some data points are just too hard)

From: Neural network ensembles. Hansen and Salamon. TPAMI 1990


Why Decision Stumps as Base Learners
• Use the max depth of the tree as a hyperparameter

• Shallow trees
• High bias but very low variance (underfitting)
• Keep low variance, reduce bias with Boosting

• Deep trees
• High variance but low bias (overfitting)
• Keep low bias, reduce variance with Bagging

Observation by Joaquin Vanschoren


Which ML Models to Combine?
• If model underfits (high bias, low variance): combine with other low-variance models
• Need to be different: 'experts' on different parts of the data
• Bias reduction. Can be done with Boosting

• If model overfits (low bias, high variance): combine with other low-bias models
• Need to be different: individual mistakes must be different
• Variance reduction. Can be done with Bagging

Observation by Joaquin Vanschoren


Bagging Summary
• Bagging is a variance-reduction technique
• Build many high-variance (overfitting) models on random data
samples
• Aggregation (soft voting) over many models reduces variance
• Diminishing returns, over-smoothing may increase bias error
• Parallelizes easily, doesn't require much tuning

Observation by Joaquin Vanschoren


Boosting Summary
• Boosting is a bias-reduction technique
• Build low-variance models that correct each other's mistakes
• By reweighting misclassified samples: AdaBoost
• By predicting the residual error: Gradient Boosting
• Additive models: predictions are sum of base-model predictions
• Can drive the error to zero, but risk overfitting
• Doesn't parallelize easily. Slower to train, much faster to predict.
• XGBoost, LightGBM, ... are fast and offer some parallelization

Observation by Joaquin Vanschoren
