Chapter 7. Ensemble Learning and Random Forests

Nidhin Pattaniyil
Table of Contents
- Voting Classifier
- Bagging and Pasting
- Random Forests
- Boosting
- Stacking
Introduction
- Ensemble: group of predictors
- Ensemble Learning: aggregate predictions of a group of predictors
- Ensemble method: ensemble learning algorithm
- Ensemble Methods:
- Bagging
- Boosting
- Stacking
- Ensembles work best when the predictors are as independent from one another as possible
Voting Classifiers
Voting Classifiers
- Hard Voting Classifier: predict the class that gets the most votes
Voting Classifiers: Soft Voting
- clf1 -> [0.2, 0.8], clf2 -> [0.1, 0.9], clf3 -> [0.8, 0.2]
- With equal weights of 0.33, the class probabilities are averaged:
- Prob of Class 0 = 0.33*0.2 + 0.33*0.1 + 0.33*0.8 = 0.363
- Prob of Class 1 = 0.33*0.8 + 0.33*0.9 + 0.33*0.2 = 0.627
- The ensemble classifier therefore predicts [36.3%, 62.7%] and outputs Class 1
Logistic Regression: 0.864
RandomForestClassifier: 0.896
SVC: 0.888
VotingClassifier: 0.904
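
A rough scikit-learn sketch of the idea (the make_moons dataset, hyperparameters, and random seeds below are illustrative assumptions, not taken from the slides):

```python
# Minimal voting-classifier sketch; switch voting="hard" for majority voting.
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)  # illustrative data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(random_state=42)),
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),  # probability=True enables soft voting
    ],
    voting="soft",  # average predicted class probabilities
)
voting_clf.fit(X_train, y_train)
print(voting_clf.score(X_test, y_test))
```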
Bagging and Pasting
Bagging and Pasting
- Use the same training algorithm, but train each predictor on a different random subset of the training set
- Two types:
- Bagging: sampling with replacement
- Pasting: sampling without replacement
- Each individual predictor has a higher bias than if it were trained on the original training set
- The ensemble has a similar bias but a lower variance than a single predictor trained on the original training set
Bagging and Pasting
- The ensemble’s prediction will likely generalize better than that of a single Decision Tree
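
A minimal BaggingClassifier sketch, assuming the make_moons dataset and illustrative hyperparameters; setting bootstrap=False gives pasting:

```python
# Bagging vs. pasting with scikit-learn's BaggingClassifier.
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)  # illustrative data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    max_samples=100, bootstrap=True,   # bootstrap=True -> bagging; False -> pasting
    n_jobs=-1, random_state=42)
bag_clf.fit(X_train, y_train)
print(bag_clf.score(X_test, y_test))
```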
Out-of-bag Dataset

Wikipedia: Out-of-bag error
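
With bagging, each predictor sees only a bootstrap sample of the training set; the instances it never sees (roughly 37% on average) are its out-of-bag instances, and they can be used to evaluate the ensemble without a separate validation set. A minimal sketch, with an illustrative dataset and settings:

```python
# Out-of-bag evaluation: oob_score=True asks the ensemble to score itself
# on the instances each tree did not see during training.
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)  # illustrative data

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    bootstrap=True, oob_score=True, n_jobs=-1, random_state=42)
bag_clf.fit(X, y)
print(bag_clf.oob_score_)  # accuracy estimated on the out-of-bag instances
```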


Random Patches and Random Subspaces
- Sample features (bootstrap_features and max_features)
- Sample records (bootstrap and max_samples)
- Random Patches
- Sampling both training instances and features
- Random Subspaces
- Keeping all training instances but sampling features
- Sampling features results in even more predictor diversity, trading a bit more
bias for a lower variance.
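
A sketch of both schemes using BaggingClassifier hyperparameters; the iris dataset and the sampling fractions are illustrative choices:

```python
# Random Subspaces and Random Patches via BaggingClassifier hyperparameters.
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # illustrative data

# Random Subspaces: keep every training instance, sample only the features.
subspace_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=100,
    bootstrap=False, max_samples=1.0,
    bootstrap_features=True, max_features=0.5,
    random_state=42)
subspace_clf.fit(X, y)

# Random Patches: sample both training instances and features.
patches_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=100,
    bootstrap=True, max_samples=0.7,
    bootstrap_features=True, max_features=0.5,
    random_state=42)
patches_clf.fit(X, y)
```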
Random Forests
Random Forest
- Ensemble of Decision Trees trained via bagging
- If you would otherwise build a BaggingClassifier of DecisionTreeClassifiers, you can use RandomForestClassifier instead
- It exposes the hyperparameters of both DecisionTreeClassifier and BaggingClassifier
- At each split, it searches for the best feature only among a random subset of features
- This leads to greater tree diversity, trading a higher bias for a lower variance
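
A sketch of the rough equivalence, with illustrative data and settings; max_features="sqrt" makes each tree consider only a random subset of features per split:

```python
# A RandomForestClassifier is roughly a bagging ensemble of trees that
# split on a random feature subset.
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)  # illustrative data

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(max_features="sqrt"),  # random feature subset per split
    n_estimators=500, n_jobs=-1, random_state=42)
rnd_clf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=42)

bag_clf.fit(X, y)
rnd_clf.fit(X, y)
```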
Extra-Trees
- Faster to train than a Random Forest
- Extra-Trees use random thresholds for each candidate feature instead of searching for the best threshold
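
A minimal Extra-Trees sketch (same API as RandomForestClassifier; dataset and settings are illustrative):

```python
# ExtraTreesClassifier: extremely randomized trees with random split thresholds.
from sklearn.datasets import make_moons
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)  # illustrative data
extra_clf = ExtraTreesClassifier(n_estimators=500, n_jobs=-1, random_state=42)
extra_clf.fit(X, y)
```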
Feature Importance
- For each feature, measure how much, on average, it decreases impurity across the nodes that use it
- The average over all trees in the forest is the measure of the feature importance
- It is a weighted average, where each node’s weight is equal to the number of training samples associated with it
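
A sketch of reading the impurity-based importances, using the iris dataset as an illustrative choice:

```python
# Feature importances exposed by a trained forest.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rnd_clf = RandomForestClassifier(n_estimators=500, random_state=42)
rnd_clf.fit(iris.data, iris.target)

# One importance score per feature; scores sum to 1.
for name, score in zip(iris.feature_names, rnd_clf.feature_importances_):
    print(f"{name}: {score:.3f}")
```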
Boosting
Boosting
- In a Random Forest, all the trees can be trained independently
- Boosting trains predictors sequentially, each trying to correct its predecessor’s errors
- Two popular boosting methods:
- AdaBoost: increases the weights of misclassified instances at each iteration
- Gradient Boosting: each new predictor is trained on the residual errors of the previous predictor
AdaBoost
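
A minimal AdaBoost sketch with decision stumps as the weak learners; the dataset and hyperparameters are illustrative:

```python
# AdaBoost: each new stump focuses more on instances its predecessors misclassified.
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)  # illustrative data

ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # weak learner: a decision stump
    n_estimators=200, learning_rate=0.5, random_state=42)
ada_clf.fit(X, y)
```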
Gradient Boosting (step 0)
- Trying to predict income

Reference: https://www.analyticsvidhya.com/blog/2021/03/gradient-boosting-machine-for-data-scientists/
Gradient Boosting (step 1)
- Train model 1
- compute predictions

Gradient Boosting (step 2)
- Using the predictions, compute the residuals
- Save model 1 predictions

Gradient Boosting (step 3)
- Train a new model where the target is the error from model 1
- Save model 1 predictions
- Repeat for further models

Gradient Boosting
- Model 0: predicts the target
- For model 1 and above, the target is the previous model’s residual error

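
The residual-fitting loop from the steps above can be written out directly; this sketch uses a small synthetic regression target rather than the income example from the slides:

```python
# Manual gradient boosting: each tree is fit to the previous stage's residuals.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(42)
X = rng.rand(100, 1) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * rng.randn(100)  # synthetic target (illustrative)

# Step 1: train model 1 and compute its predictions.
tree1 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, y)

# Step 2: compute the residuals of model 1.
residuals1 = y - tree1.predict(X)

# Step 3: train a new model whose target is the previous error; repeat.
tree2 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, residuals1)
residuals2 = residuals1 - tree2.predict(X)
tree3 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, residuals2)

# Ensemble prediction = sum of the stage predictions.
X_new = np.array([[0.4]])
y_pred = sum(tree.predict(X_new) for tree in (tree1, tree2, tree3))
```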
Gradient Boosting
- XGBoost, LightGBM, and CatBoost are other popular gradient boosting libraries
- Gradient Boosting is also used for ranking
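
For comparison, scikit-learn's GradientBoostingRegressor wraps the same residual-fitting loop (settings here are illustrative); the libraries above expose similar fit/predict interfaces:

```python
# Library equivalent of the manual residual-fitting sketch above.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(42)
X = rng.rand(100, 1) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * rng.randn(100)  # synthetic target (illustrative)

gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=3,
                                 learning_rate=1.0, random_state=42)
gbrt.fit(X, y)
print(gbrt.predict(np.array([[0.4]])))
```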
Stacking
Stacking
- Instead of using hard voting, train a model to perform the aggregation
- Training:
- Create a hold-out split of the training data
- Train the first-layer classifiers on split 1
- Get the classifiers’ predictions on split 2 and use them as training data
- The blender is trained on the first layer’s predictions
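
A sketch with scikit-learn's StackingClassifier, which generates the first-layer predictions for the blender via cross-validation instead of a manual hold-out split; the estimators and data are illustrative:

```python
# Stacking: first-layer classifiers feed a trained "blender" (final_estimator).
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)  # illustrative data

stacking_clf = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),
    ],
    final_estimator=LogisticRegression(),  # the blender / meta-learner
    cv=5,  # out-of-fold predictions play the role of the hold-out split
)
stacking_clf.fit(X, y)
```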
Summary
- Ensemble methods: Bagging / Boosting / Stacking
- Voting: Hard or Soft Voting
- Sample Training Data / Sample Features
- Random Forests: bagged Decision Tree classifiers; feature importance, OOB score
- Boosting: AdaBoost / Gradient Boosting
- Stacking: model to perform aggregation
