
Machine Learning

Lecture 15: Ensemble Learning Methods


COURSE CODE: CSE451
2023
Course Teacher
Dr. Mrinal Kanti Baowaly
Associate Professor
Department of Computer Science and Engineering,
Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Bangladesh.

Email: [email protected]
Ensemble Learning
 A powerful way to improve the performance of your model
 Construct a set of classifiers from training data
 Predict class label of test data by combining the predictions made
by multiple classifiers or models
 Examples: Random Forest, AdaBoost, Stochastic Gradient Boosting, Gradient Boosting Machine (GBM), XGBoost, LightGBM, CatBoost
General Approach
Step 1: Create multiple data sets D1, D2, ..., Dt-1, Dt from the original training data D.
Step 2: Build a classifier on each data set, giving C1, C2, ..., Ct-1, Ct.
Step 3: Combine the classifiers into a single ensemble classifier C*.
Simple Ensemble Techniques
 Max Voting
 Averaging
 Weighted Averaging
Max Voting
 Multiple models are used to make predictions for each data point
 The predictions by each model are considered as a ‘vote’
 The prediction made by the majority of the models is used as the final prediction
 Generally used for classification problems
 For example, suppose you ask 5 of your colleagues to rate your movie (out of 5): three of them rate it 4 while the other two give it a 5. Since the majority gave a rating of 4, you can take the final rating of the movie as 4. You can consider this as taking the mode of all the predictions.
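A minimal scikit-learn sketch of max voting (illustrative only; the dataset and the three base models are assumptions, not from the lecture). VotingClassifier with voting='hard' takes the mode of the base models' predicted labels:

```python
# Hard (max) voting: the class predicted by the majority of base models wins.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

vote = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('knn', KNeighborsClassifier()),
                ('dt', DecisionTreeClassifier(random_state=0))],
    voting='hard')  # 'hard' = take the mode of the predicted class labels
vote.fit(X_train, y_train)
print("max-voting accuracy:", vote.score(X_test, y_test))
```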
Averaging
 Similar to the max voting technique, multiple predictions are made for each data point
 Take an average of predictions from all the models and use it to
make the final prediction.
 Averaging can be used in regression or classification problems.
 For example, in the previous case study of max voting, the averaging method
would take the average of all the values, i.e. (5+4+5+4+4)/5 = 4.4.
Hence, final rating of the movie is 4.4.
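A quick sketch of the averaging calculation from the slide (the numbers are the colleagues' ratings); for classifiers, the same idea is typically applied to the predicted probabilities:

```python
# Average the five colleagues' ratings to get the final rating.
import numpy as np

ratings = np.array([5, 4, 5, 4, 4])
print(ratings.mean())  # 4.4, as on the slide
```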
Weighted Averaging
 This is an extension of the averaging method.
 All models are assigned different weights defining the importance
of each model for prediction.
 For example, if two of your colleagues are critics while the others have no prior experience in this field, then the answers from these two friends are given more importance than the answers from the other people.
The result can be calculated as [(5*0.23) + (4*0.23) + (5*0.18) + (4*0.18) +
(4*0.18)] = 4.41.
Hence, final rating of the movie is 4.41.
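The same calculation with weights, as a small sketch (the weights 0.23 and 0.18 are the ones assumed on the slide):

```python
# Weighted average: the critics' ratings get weight 0.23, the others 0.18.
import numpy as np

ratings = np.array([5, 4, 5, 4, 4])
weights = np.array([0.23, 0.23, 0.18, 0.18, 0.18])  # weights sum to 1
print(np.average(ratings, weights=weights))  # ≈ 4.41, as on the slide
```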

Implementation: AnalyticsVidhya, GeeksForGeeks


Advanced Ensemble Techniques
 Bagging: The idea behind bagging is combining the results of
multiple models run in parallel (for instance, all decision trees) to
get a generalized result.
 Boosting: Boosting is a sequential process, where each subsequent
model attempts to correct the errors of the previous model.
 Stacking: Stacking is an ensemble learning technique that uses
multiple models’ (called base models) predictions as features to
build a new model (called meta-model).
Bagging
 Multiple subsets are created from the
original dataset, selecting observations
with replacement (called bootstrapping).
 A base model (weak model) is created on
each of these subsets.
 The models run in parallel and are
independent of each other.
 The final predictions are determined by combining the predictions from all the models (e.g., by voting or averaging).
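A minimal bagging sketch with scikit-learn (the toy dataset is an assumption; BaggingClassifier's default base model is a decision tree):

```python
# Bagging: bootstrap samples -> one weak model per sample -> combine by voting.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bag = BaggingClassifier(n_estimators=50,   # number of bootstrapped base models
                        bootstrap=True,    # sample observations with replacement
                        random_state=0)    # default base model is a decision tree
bag.fit(X_train, y_train)
print("bagging accuracy:", bag.score(X_test, y_test))
```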
Boosting
1. A base (weak) learner takes all the distributions and assigns equal weight or attention to each observation.
2. If the base learning algorithm makes prediction errors, higher weight or attention is given to the observations that were predicted incorrectly.
3. Apply the next base learning algorithm.
4. Repeat steps 2 to 3 until the algorithm can correctly classify the output or the maximum number of iterations is reached.
5. The weak learners are combined to form a strong
learner that will predict a more accurate outcome.
An Example of Boosting (AdaBoost)
 B1 consists of 10 data points of two types, plus (+) and minus (-): 5 are plus (+) and the other 5 are minus (-), and each point is initially assigned an equal weight. The first model tries to classify the data points and generates a vertical separator line, but it wrongly classifies 3 pluses (+) as minuses (-).
 B2 consists of the 10 data points from the previous model, in which the 3 wrongly classified pluses (+) are weighted more, so the current model tries harder to classify these pluses (+) correctly. This model generates a vertical separator line that correctly classifies the previously misclassified pluses (+), but in this attempt it wrongly classifies three minuses (-).
 B3 consists of the 10 data points from the previous model, in which the 3 wrongly classified minuses (-) are weighted more, so the current model tries harder to classify these minuses (-) correctly. This model generates a horizontal separator line that correctly classifies the previously misclassified minuses (-).
 B4 combines B1, B2 and B3 in order to build a strong prediction model that is much better than any individual model used.
Another Example: Dataaspirant, Detailed Implementation: AnalyticsVidhya
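A minimal AdaBoost sketch mirroring the B1–B4 story above (the dataset is an assumption; scikit-learn's default base learner is a depth-1 decision tree, i.e. a stump):

```python
# AdaBoost: each round up-weights the misclassified points and fits another
# weak learner; the weighted weak learners form the final strong model.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ada = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)
ada.fit(X_train, y_train)
print("AdaBoost accuracy:", ada.score(X_test, y_test))
```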
HW: Difference between Bagging and
Boosting

Ref: QuantDare
Stacking Ensemble Learning

[Figure: base models trained on the original data form Level 0; their predictions are used as input features for the meta-model at Level 1.]

Source and Implementation: GeeksForGeeks, AnalyticsVidhya
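A minimal stacking sketch with scikit-learn (dataset and model choices are assumptions): the Level-0 base models' out-of-fold predictions become the features on which the Level-1 meta-model is trained:

```python
# Stacking: base models (Level 0) feed their predictions to a meta-model (Level 1).
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[('dt', DecisionTreeClassifier(random_state=0)),  # Level 0
                ('knn', KNeighborsClassifier())],                # Level 0
    final_estimator=LogisticRegression(max_iter=1000),           # Level 1
    cv=5)  # base-model predictions are generated out-of-fold to avoid leakage
stack.fit(X_train, y_train)
print("stacking accuracy:", stack.score(X_test, y_test))
```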
Random Forests Classifier
 The random forests algorithm
 How does the algorithm work?
 Its advantages and disadvantages
 Comparison between random forests and decision trees
 Finding important features
 Building a classifier with scikit-learn
Random Forests Algorithm
 It is a popular supervised learning algorithm.
 Random forest builds multiple decision trees (called a forest) on various random samples (or subsets) from a given dataset, takes the prediction from each tree, and predicts the final output based on the majority vote of the predictions.
 It is based on the ‘bagging’ ensemble method, which yields a more accurate and stable prediction.
 It can be used both for classification and regression.
How does the algorithm work?
 Select random samples from a given
dataset (using bootstrapping).
 Construct a decision tree for each
sample and get a prediction result
from each decision tree.
 Final prediction is made by selecting
the prediction with the most votes
(for classification) or averaging the
predictions (for regression).
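A minimal random-forest sketch with scikit-learn (the iris dataset is an assumed example):

```python
# Random forest: many bootstrapped decision trees; the final class is the
# majority vote across the trees (or the average for regression).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
print("random-forest accuracy:", rf.score(X_test, y_test))
```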
Advantages of Random Forests
 Random forests are considered a highly accurate and robust method because of the number of decision trees participating in the process.
 They are much less prone to overfitting because they build multiple trees on random subsets and then average (or take the majority vote of) the trees' predictions; this randomness plus voting or averaging reduces the variance of the individual trees.
 It can handle missing data.
 It can be used in both classification and regression problems.
Disadvantages of Random Forests
 Random forests are slow because they build multiple decision trees and make the final prediction by combining the predictions of each individual tree.
 The model is difficult to interpret compared to a decision tree, where you can easily make a decision by following a path in the tree.
Random Forest vs Decision Tree
 A random forest is a set of multiple decision trees, whereas a decision tree is a single tree.
 A deep decision tree may suffer from overfitting, but a random forest reduces overfitting by creating multiple trees on random subsets.
 A decision tree is computationally faster, while a random forest is slower.
 A random forest is difficult to interpret, while a decision tree is easily interpretable and can be converted to rules.
Finding Important Features
 Random forests offer a good feature selection indicator.
 Scikit-learn provides an extra attribute (feature_importances_) with the model, which shows the relative importance or contribution of each feature to the prediction.
 It automatically computes the relevance score of each feature in the
training phase. Then it scales the relevance down so that the sum of all
scores is 1. The higher the score, the more important the feature.
 This score will help you choose the most important features and drop the
least important ones for model building.
 Random forest uses Gini importance (i.e., impurity-based feature importance) to calculate the importance of each feature.
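A small sketch of reading feature_importances_ from a fitted model (the iris dataset is an assumed example):

```python
# After fitting, feature_importances_ holds one score per feature; the scores sum to 1.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(data.data, data.target)

importances = pd.Series(rf.feature_importances_, index=data.feature_names)
print(importances.sort_values(ascending=False))  # most important features first
```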
More on Random Forest (LAB)
 Build a Random Forest classifier with scikit-learn
 Find important features of a Random Forest classifier with scikit-
learn
 Build both Decision Tree and Random Forest classifiers and compare their performances (see the sketch below)
 Why does the Random Forest model outperform the Decision Tree?

Source: DataCamp, AnalyticsVidhya
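A starting-point sketch for the lab comparison (the breast-cancer dataset is an assumption; any tabular dataset works):

```python
# Train a single decision tree and a random forest on the same split and compare.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

dt = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("decision tree accuracy:", dt.score(X_test, y_test))
print("random forest accuracy:", rf.score(X_test, y_test))
```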


Advanced Boosting Methods
 What is GBM?
 What is XGBoost?
 What is LightGBM?
 Advantages of using Light GBM and XGBoost
 Build classifiers using GBM, LightGBM and XGBoost
 Compare GBM, LightGBM and XGBoost
 Which algorithm takes the crown: LightGBM or XGBoost?
Source: AnalyticsVidhya [1], [2]
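A minimal sketch of the scikit-learn style wrappers for XGBoost and LightGBM (assumes the xgboost and lightgbm packages are installed; the dataset and hyperparameters are illustrative, not tuned):

```python
# Gradient-boosting classifiers via the sklearn-compatible XGBoost/LightGBM APIs.
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [("XGBoost", XGBClassifier(n_estimators=200, learning_rate=0.1, random_state=0)),
                    ("LightGBM", LGBMClassifier(n_estimators=200, learning_rate=0.1, random_state=0))]:
    model.fit(X_train, y_train)
    print(name, "accuracy:", model.score(X_test, y_test))
```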
Advanced Boosting Methods (Cont..)
 What is CatBoost?
 Advantages of CatBoost library
 CatBoost in comparison to other boosting algorithms
 Installing CatBoost
 Solving ML challenge using CatBoost

Source: AnalyticsVidhya, Dataaspirant
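A minimal CatBoost sketch (assumes the catboost package is installed; the tiny toy DataFrame is purely illustrative). Passing cat_features lets CatBoost encode categorical columns internally, which is one of its main selling points:

```python
# CatBoost handles categorical features natively via the cat_features argument.
import pandas as pd
from catboost import CatBoostClassifier

X = pd.DataFrame({
    "city": ["dhaka", "sylhet", "dhaka", "khulna", "sylhet", "dhaka", "khulna", "sylhet"],
    "age":  [25, 32, 47, 51, 29, 36, 44, 58],
})
y = [0, 1, 0, 1, 1, 0, 1, 1]

model = CatBoostClassifier(iterations=100, verbose=0)
model.fit(X, y, cat_features=["city"])  # "city" is treated as categorical
print(model.predict(X))
```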


Comparison of CatBoost to other
boosting algorithms
A Comprehensive Course on Ensemble
Learning

Enroll now
Study Materials of Ensemble Methods
 AnalyticsVidhya: A Comprehensive Guide to Ensemble Learning
(with Python codes)
 GeeksForGeeks: Ensemble Method in Python
 AnalyticsVidhya: Basics of Ensemble Learning Explained in Simple
English
 Dataaspirant: How the Kaggle winners algorithm XGBoost algorithm
works
