Chapter 7. Ensemble Learning and Random Forests

Nidhin Pattaniyil
Table of Contents
- Voting Classifier
- Bagging and Pasting
- Random Forests
- Boosting
- Stacking
Introduction
- Ensemble: group of predictors
- Ensemble Learning: aggregate predictions of a group of predictors
- Ensemble method: ensemble learning algorithm
- Ensemble Methods:
- Bagging
- Boosting
- Stacking
- Ensembles work best when the predictors are as independent from one another as possible
Voting Classifiers
Voting Classifiers
- Hard Voting Classifier: predict the class that gets the most votes
Voting Classifiers: Soft Voting
- clf1 -> [0.2, 0.8], clf2 -> [0.1, 0.9], clf3 -> [0.8, 0.2]
- With equal weights of 0.33, the class probabilities are averaged:
- Prob of Class 0 = 0.33*0.2 + 0.33*0.1 + 0.33*0.8 = 0.363
- Prob of Class 1 = 0.33*0.8 + 0.33*0.9 + 0.33*0.2 = 0.627
- The ensemble classifier therefore predicts [36.3%, 62.7%] and outputs Class 1
Logistic Regression: 0.864
RandomForestClassifier: 0.896
SVC: 0.888
VotingClassifier: 0.904
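
A rough scikit-learn sketch of the idea (the make_moons dataset, hyperparameters, and random seeds below are illustrative assumptions, not taken from the slides):

```python
# Minimal voting-classifier sketch; switch voting="hard" for majority voting.
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)  # illustrative data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(random_state=42)),
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),  # probability=True enables soft voting
    ],
    voting="soft",  # average predicted class probabilities
)
voting_clf.fit(X_train, y_train)
print(voting_clf.score(X_test, y_test))
```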
Bagging and Pasting
Bagging and Pasting
- Use the same training algorithm, but train each predictor on a different random subset of the training set
- Two types:
- Bagging: sampling with replacement
- Pasting: sampling without replacement
- Each individual predictor has a higher bias than if it were trained on the original training set
- The ensemble has a similar bias but a lower variance than a single predictor trained on the original training set
Bagging and Pasting
- The ensemble’s prediction will likely generalize better than that of a single Decision Tree
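
A minimal BaggingClassifier sketch, assuming the make_moons dataset and illustrative hyperparameters; setting bootstrap=False gives pasting:

```python
# Bagging vs. pasting with scikit-learn's BaggingClassifier.
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)  # illustrative data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    max_samples=100, bootstrap=True,   # bootstrap=True -> bagging; False -> pasting
    n_jobs=-1, random_state=42)
bag_clf.fit(X_train, y_train)
print(bag_clf.score(X_test, y_test))
```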
Out-of-bag Dataset

Wikipedia: Out-of-bag error
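
With bagging, each predictor sees only a bootstrap sample of the training set; the instances it never sees (roughly 37% on average) are its out-of-bag instances, and they can be used to evaluate the ensemble without a separate validation set. A minimal sketch, with an illustrative dataset and settings:

```python
# Out-of-bag evaluation: oob_score=True asks the ensemble to score itself
# on the instances each tree did not see during training.
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)  # illustrative data

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    bootstrap=True, oob_score=True, n_jobs=-1, random_state=42)
bag_clf.fit(X, y)
print(bag_clf.oob_score_)  # accuracy estimated on the out-of-bag instances
```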


Random Patches and Random Subspaces
- Sample features (bootstrap_features and max_features)
- Sample records (bootstrap and max_samples)
- Random Patches
- Sampling both training instances and features
- Random Subspaces
- Keeping all training instances but sampling features
- Sampling features results in even more predictor diversity, trading a bit more
bias for a lower variance.
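
A sketch of both schemes using BaggingClassifier hyperparameters; the iris dataset and the sampling fractions are illustrative choices:

```python
# Random Subspaces and Random Patches via BaggingClassifier hyperparameters.
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # illustrative data

# Random Subspaces: keep every training instance, sample only the features.
subspace_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=100,
    bootstrap=False, max_samples=1.0,
    bootstrap_features=True, max_features=0.5,
    random_state=42)
subspace_clf.fit(X, y)

# Random Patches: sample both training instances and features.
patches_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=100,
    bootstrap=True, max_samples=0.7,
    bootstrap_features=True, max_features=0.5,
    random_state=42)
patches_clf.fit(X, y)
```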
Random Forests
Random Forest
- Ensemble of Decision Trees trained via bagging
- If you would otherwise build a BaggingClassifier of DecisionTreeClassifiers, you can use RandomForestClassifier instead
- It exposes the hyperparameters of both DecisionTreeClassifier and BaggingClassifier
- At each split, it searches for the best feature only among a random subset of features
- This leads to greater tree diversity, trading a higher bias for a lower variance
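
A sketch of the rough equivalence, with illustrative data and settings; max_features="sqrt" makes each tree consider only a random subset of features per split:

```python
# A RandomForestClassifier is roughly a bagging ensemble of trees that
# split on a random feature subset.
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)  # illustrative data

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(max_features="sqrt"),  # random feature subset per split
    n_estimators=500, n_jobs=-1, random_state=42)
rnd_clf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=42)

bag_clf.fit(X, y)
rnd_clf.fit(X, y)
```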
Extra-Trees
- Faster to train than a Random Forest
- Extra-Trees use random thresholds for each candidate feature instead of searching for the best threshold
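
A minimal Extra-Trees sketch (same API as RandomForestClassifier; dataset and settings are illustrative):

```python
# ExtraTreesClassifier: extremely randomized trees with random split thresholds.
from sklearn.datasets import make_moons
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)  # illustrative data
extra_clf = ExtraTreesClassifier(n_estimators=500, n_jobs=-1, random_state=42)
extra_clf.fit(X, y)
```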
Feature Importance
- For each feature, measure how much, on average, it decreases impurity across the nodes that use it
- The average over all trees in the forest is the measure of the feature importance
- It is a weighted average, where each node’s weight is equal to the number of training samples associated with it
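
A sketch of reading the impurity-based importances, using the iris dataset as an illustrative choice:

```python
# Feature importances exposed by a trained forest.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rnd_clf = RandomForestClassifier(n_estimators=500, random_state=42)
rnd_clf.fit(iris.data, iris.target)

# One importance score per feature; scores sum to 1.
for name, score in zip(iris.feature_names, rnd_clf.feature_importances_):
    print(f"{name}: {score:.3f}")
```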
Boosting
Boosting
- In a Random Forest, all the trees can be trained independently
- Boosting trains predictors sequentially, each trying to correct its predecessor’s errors
- Two popular boosting methods:
- AdaBoost: increases the weights of misclassified instances at each iteration
- Gradient Boosting: each new predictor is trained on the residual errors of the previous predictor
AdaBoost
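
A minimal AdaBoost sketch with decision stumps as the weak learners; the dataset and hyperparameters are illustrative:

```python
# AdaBoost: each new stump focuses more on instances its predecessors misclassified.
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)  # illustrative data

ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # weak learner: a decision stump
    n_estimators=200, learning_rate=0.5, random_state=42)
ada_clf.fit(X, y)
```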
Gradient Boosting (step 0)
- Trying to predict income

Reference: https://www.analyticsvidhya.com/blog/2021/03/gradient-boosting-machine-for-data-scientists/
Gradient Boosting (step 1)
- Train model 1
- compute predictions

Gradient Boosting (step 2)
- Using the predictions, compute the residuals
- Save model 1 predictions

Gradient Boosting (step 3)
- Train a new model where the target is the error from model 1
- Save model 1 predictions
- Repeat for further models

Gradient Boosting
- Model 0: predicts the target
- For model 1 and above, the target is the previous model’s residual error

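
The residual-fitting loop from the steps above can be written out directly; this sketch uses a small synthetic regression target rather than the income example from the slides:

```python
# Manual gradient boosting: each tree is fit to the previous stage's residuals.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(42)
X = rng.rand(100, 1) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * rng.randn(100)  # synthetic target (illustrative)

# Step 1: train model 1 and compute its predictions.
tree1 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, y)

# Step 2: compute the residuals of model 1.
residuals1 = y - tree1.predict(X)

# Step 3: train a new model whose target is the previous error; repeat.
tree2 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, residuals1)
residuals2 = residuals1 - tree2.predict(X)
tree3 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, residuals2)

# Ensemble prediction = sum of the stage predictions.
X_new = np.array([[0.4]])
y_pred = sum(tree.predict(X_new) for tree in (tree1, tree2, tree3))
```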
Gradient Boosting
- XGBoost, LightGBM, and CatBoost are other popular gradient boosting libraries
- Gradient Boosting is also used for ranking
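
For comparison, scikit-learn's GradientBoostingRegressor wraps the same residual-fitting loop (settings here are illustrative); the libraries above expose similar fit/predict interfaces:

```python
# Library equivalent of the manual residual-fitting sketch above.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(42)
X = rng.rand(100, 1) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * rng.randn(100)  # synthetic target (illustrative)

gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=3,
                                 learning_rate=1.0, random_state=42)
gbrt.fit(X, y)
print(gbrt.predict(np.array([[0.4]])))
```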
Stacking
Stacking
- Instead of using hard voting, train a model to perform the aggregation
- Training:
- Create a hold-out split of the training data
- Train the first-layer classifiers on split 1
- Get the classifiers’ predictions on split 2 and use them as training data
- The blender is trained on the first layer’s predictions
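
A sketch with scikit-learn's StackingClassifier, which generates the first-layer predictions for the blender via cross-validation instead of a manual hold-out split; the estimators and data are illustrative:

```python
# Stacking: first-layer classifiers feed a trained "blender" (final_estimator).
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)  # illustrative data

stacking_clf = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),
    ],
    final_estimator=LogisticRegression(),  # the blender / meta-learner
    cv=5,  # out-of-fold predictions play the role of the hold-out split
)
stacking_clf.fit(X, y)
```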
Summary
- Ensemble methods: Bagging / Boosting / Stacking
- Voting: Hard or Soft Voting
- Sample Training Data / Sample Features
- Random Forests: bagged Decision Tree classifiers; feature importance, OOB score
- Boosting: AdaBoost / Gradient Boosting
- Stacking: model to perform aggregation
