Module 7 - Ensemble Learning

Ensemble Learning
• An ensemble is a composite model that combines a series of low-performing classifiers with the aim of creating an improved classifier.
• Each individual classifier votes, and the final prediction label is the one that wins the majority vote.
• Ensembles generally offer higher accuracy than the individual (base) classifiers.
• Ensemble methods can be parallelized by allocating each base learner to a different machine.
• In short, ensemble learning methods are meta-algorithms that combine several machine learning models into a single predictive model to improve performance.
• Ensemble methods can decrease variance using a bagging approach, decrease bias using a boosting approach, or improve predictions using a stacking approach.
Real life examples
Let's take a real example to build intuition.
• Suppose, you want to invest in a company XYZ.
• You are not sure about its performance, though.
• So, you look for advice on whether the stock price will increase by more than 6% per annum or not.
• You decide to approach various experts having diverse
domain experience.
The survey prediction
• Employee of Company XYZ:
– In the past, he has been right 70% of the time.
• Financial Advisor of Company XYZ:
– In the past, he has been right 75% of the time.
• Stock Market Trader:
– In the past, he has been right 70% of the time.
• Employee of a competitor:
– In the past, he has been right 60% of the time.
• Market Research team in the same segment:
– In the past, they have been right 75% of the time.
• Social Media Expert:
– In the past, he has been right 65% of the time.
Conclusion
• Given the broad spectrum of access you have, you can probably combine
all the information and make an informed decision.
• In a scenario where all 6 experts/teams say it is a good decision (assuming all the predictions are independent of each other), the probability that they are all wrong at once is 30% · 25% · 30% · 40% · 25% · 35% = 0.0007875, so you get a combined accuracy rate of 1 - 0.0007875 = 99.92125%.
• The assumption that all the predictions are completely independent is slightly extreme, as in practice they are expected to be correlated. Still, you can see how confident we can become by combining various forecasts together.
• Well, Ensemble learning is no different.
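
The same arithmetic as a minimal Python sketch (the error rates are simply the complements of the accuracies listed above):

# Error rates of the six (assumed independent) experts
error_rates = [0.30, 0.25, 0.30, 0.40, 0.25, 0.35]

# Probability that all six are wrong at the same time
all_wrong = 1.0
for e in error_rates:
    all_wrong *= e              # 0.30 * 0.25 * ... * 0.35 = 0.0007875

combined_accuracy = 1 - all_wrong
print(combined_accuracy)        # 0.9992125, i.e. 99.92125%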
Ensembling
• An ensemble is the art of combining a diverse set of learners (individual models) to improve the stability and predictive power of the model.
• In our example, the way we combine all the predictions collectively is termed ensemble learning.
• Moreover, ensemble-based models can be used in both scenarios, i.e., when the volume of data is large and when it is small.
Basic Ensemble Structure
Summary
• Use multiple learning algorithms (classifiers)
• Combine the decisions
• Can be more accurate than the individual classifiers
• Generate a group of base-learners
• Different learners use different
– Algorithms
– Hyperparameters
– Representations (Modalities)
– Training sets
How are models different?
• Difference in population
• Difference in hypothesis
• Difference in modelling technique
• Difference in initial seed
Why ensembles?
• There are two main, related reasons to use an ensemble over a single model:
– Performance: An ensemble can make better predictions and achieve better performance than any single contributing model.
– Robustness: An ensemble reduces the spread or dispersion of the predictions and of model performance.
Ensemble Creation Approaches
• Unweighted voting (e.g. bagging)
• Weighted voting – based on accuracy (e.g. boosting), expertise, etc.
• Stacking – learn the combination function
Ensemble Learning Methods
• Bagging
• Boosting
• Stacking
Bagging
• Bagging stands for bootstrap aggregation.
• It combines multiple learners in a way that reduces the variance of the estimates.
• It involves training multiple instances of a base model on different subsets of the training data and then combining their predictions.
• For example, a random forest trains M decision trees: each tree is trained on a different random subset of the data, and the final prediction is made by voting.
• Examples:
– Random Forest
– Extra Trees
Bagging
Bagging Example
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.ensemble import BaggingClassifier

data = datasets.load_wine(as_frame=True)

X = data.data
y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=22)

estimator_range = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

models = []
scores = []
for n_estimators in estimator_range:

    # Create a bagging classifier
    clf = BaggingClassifier(n_estimators=n_estimators, random_state=22)

    # Fit the model
    clf.fit(X_train, y_train)

    # Append the model and score to their respective lists
    models.append(clf)
    scores.append(accuracy_score(y_true=y_test, y_pred=clf.predict(X_test)))

# Generate the plot of the scores against the number of estimators
plt.figure(figsize=(9, 6))
plt.plot(estimator_range, scores)

# Adjust labels and font (to make them visible)
plt.xlabel("n_estimators", fontsize=18)
plt.ylabel("score", fontsize=18)
plt.tick_params(labelsize=16)

# Show the plot
plt.show()
Random Forest
• Random forest is a supervised machine learning algorithm based on ensemble learning.
• The random forest algorithm combines multiple models of the same type, i.e. multiple decision trees, resulting in a forest of trees, hence the name "Random Forest".
• The random forest algorithm can be used for both regression and classification tasks.
How does it work?
• Pick N random records (with replacement) from the dataset.
• Build a decision tree based on these N records.
• Choose the number of trees you want in your forest and repeat steps 1 and 2.
• In the case of a regression problem, for a new record, each tree in the forest predicts a value for Y (output). The final value is calculated by averaging the values predicted by all the trees in the forest.
• In the case of a classification problem, each tree in the forest predicts the category to which the new record belongs. Finally, the new record is assigned to the category that wins the majority vote.
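
As an illustration of these steps, a minimal scikit-learn sketch using the same wine data as the bagging example (the model choice and parameters are illustrative, not prescribed by the slides):

from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the wine data and hold out a test set
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=22)

# 100 trees, each grown on a bootstrap sample of the training records;
# class predictions are combined by majority vote across the trees
rf = RandomForestClassifier(n_estimators=100, random_state=22)
rf.fit(X_train, y_train)

print(accuracy_score(y_test, rf.predict(X_test)))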
Majority Voting
Regressor Output
Boosting
• Boosting algorithms combine a set of low-accuracy classifiers into a highly accurate classifier.
• A low-accuracy (or weak) classifier offers accuracy only slightly better than flipping a coin.
• This is done by building a model from the training data, then creating a second model that attempts to correct the errors of the first model. Models are added until the training set is predicted perfectly or a maximum number of models is reached.
• A highly accurate (or strong) classifier has an error rate close to 0.
• A boosting algorithm keeps track of the observations that earlier models failed to predict accurately.
• Boosting algorithms are less affected by the overfitting problem.
Boosting Models
Models that are typically used with the boosting technique are:
– XGBoost (Extreme Gradient Boosting)
– GBM (Gradient Boosting Machine)
– AdaBoost (Adaptive Boosting)
Adaboost Summary
• Initially, AdaBoost selects a training subset randomly.
• It then iteratively trains models, selecting each training set based on the accuracy of the previous round's predictions.
• It assigns higher weights to wrongly classified observations so that, in the next iteration, these observations have a higher probability of being selected and classified correctly.
• It also assigns a weight to the classifier trained in each iteration according to its accuracy; the more accurate the classifier, the higher its weight.
• This process iterates until the complete training data is fitted without error or the specified maximum number of estimators is reached.
• To classify a new example, perform a weighted "vote" across all of the learners you built.
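
A minimal AdaBoost sketch with scikit-learn, again on the wine data (the depth-1 base learner and n_estimators=50 are illustrative; note that the estimator argument is named base_estimator in older scikit-learn releases):

from sklearn.datasets import load_wine
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=22)

# Weak learners are depth-1 decision trees ("stumps"); each new stump focuses on
# the observations the previous ones misclassified, and the final prediction is a
# weighted vote across all stumps.
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
    random_state=22,
)
ada.fit(X_train, y_train)

print(accuracy_score(y_test, ada.predict(X_test)))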
How AdaBoost Works?
Comparison
Stacking
• Stacked Generalization or “Stacking” for short is an ensemble
machine learning algorithm.
• It involves combining the predictions from multiple
machine learning models on the same dataset, like bagging
and boosting.
• Stacking addresses the question:
– Given multiple machine learning models that are skillful
on a problem, but in different ways, how do you choose which
model to use (trust)?
Bagging vs. Boosting
• Bagging is the simplest way of combining predictions of models of the same type; boosting can combine predictions that belong to different types of models.
• Bagging mainly tries to decrease variance, not bias; boosting mainly tries to decrease bias, not variance.
• In bagging, every model receives an equal weight; in boosting, models are weighted according to their performance.
• In bagging, each model is built independently; in boosting, each new model depends on the previously built ones.
• In bagging, the training subsets are drawn from the whole training dataset by random row sampling with replacement; in boosting, each new subset emphasizes the observations that were misclassified by the preceding models.
• Bagging tries to solve the overfitting problem; boosting tries to reduce bias.
• If the classifier is unstable (high variance), apply bagging; if the classifier is stable and simple (high bias), apply boosting.
• In bagging, the base classifiers work in parallel; in boosting, they work sequentially.
• Example: the random forest model uses the bagging technique; AdaBoost uses the boosting technique.
Stacking
• The approach to this question is to use another machine
learning model that learns when to use or trust each model in
the ensemble.
– Unlike bagging, in stacking, the models are typically
different (e.g. not all decision trees) and fit on the same dataset
(e.g. instead of samples of the training dataset).
– Unlike boosting, in stacking, a single model is used to
learn how to best combine the predictions from the contributing
models (e.g. instead of a sequence of models that correct the
predictions of prior models).
Stacking
• The architecture of a stacking model involves two or more base models, often referred to as level-0 models, and a meta-model that combines the predictions of the base models, referred to as a level-1 model.
– Level-0 Models (Base-Models): Models fit on the training data and whose predictions are compiled.
– Level-1 Model (Meta-Model): Model that learns how to best combine the predictions of the base models.
Stacking
Stacking
• The meta-model is trained on the predictions made by base
models on out-of-sample data.
• That is, data not used to train the base models is fed to the
base models, predictions are made, and these predictions,
along with the expected outputs, provide the input and output
pairs of the training dataset used to fit the meta-model.
• The outputs from the base models used as input to the meta-model may be real values in the case of regression, and probability values, probability-like values, or class labels in the case of classification.
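
A minimal stacking sketch, assuming scikit-learn's StackingClassifier (the particular level-0 and level-1 models are illustrative choices, not from the slides):

from sklearn.datasets import load_wine
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=22)

# Level-0 (base) models: deliberately different algorithms fit on the same data
level0 = [
    ("knn", KNeighborsClassifier()),
    ("tree", DecisionTreeClassifier(random_state=22)),
]

# Level-1 (meta) model: learns how to best combine the base-model predictions.
# cv=5 means the meta-model is trained on out-of-fold predictions, i.e. on data
# the base models did not see during their own training.
stack = StackingClassifier(
    estimators=level0,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

print(accuracy_score(y_test, stack.predict(X_test)))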
Voting
• Voting is an ensemble machine learning algorithm.
• For regression, a voting ensemble involves making a
prediction that is the average of multiple other regression
models.
• In classification, a hard voting ensemble involves summing
the votes for crisp class labels from other models and
predicting the class with the most votes.
• A soft voting ensemble involves summing the predicted
probabilities for class labels and predicting the class label with
the largest sum probability.
Voting
• A voting ensemble (or "majority voting ensemble") is an ensemble machine learning model that combines the predictions from multiple other models.
• It is a technique that may be used to improve model
performance, ideally achieving better performance than any
single model used in the ensemble.
• A voting ensemble works by combining the predictions from
multiple models.
• It can be used for classification or regression.
Voting
Voting
• In the case of regression, this involves calculating the
average of the predictions from the models.
• In the case of classification, the predictions for each label
are summed and the label with the majority vote is predicted.

– Regression Voting Ensemble: Predictions are the average of contributing models.
– Classification Voting Ensemble: Predictions are the majority vote of contributing models.
Voting
• There are two approaches to the majority vote prediction for classification: hard voting and soft voting.
• Hard voting involves summing the predictions for each class label and predicting the class label with the most votes.
• Soft voting involves summing the predicted probabilities (or probability-like scores) for each class label and predicting the class label with the largest sum.
– Hard Voting: Predict the class with the largest sum of votes from models.
– Soft Voting: Predict the class with the largest summed probability from models.
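
A minimal sketch of both modes, assuming scikit-learn's VotingClassifier (the three member models are illustrative):

from sklearn.datasets import load_wine
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=22)

models = [
    ("lr", LogisticRegression(max_iter=5000)),
    ("nb", GaussianNB()),
    ("tree", DecisionTreeClassifier(random_state=22)),
]

# Hard voting: each model casts one vote for a crisp class label
hard = VotingClassifier(estimators=models, voting="hard").fit(X_train, y_train)

# Soft voting: the predicted class probabilities are summed instead
soft = VotingClassifier(estimators=models, voting="soft").fit(X_train, y_train)

print(accuracy_score(y_test, hard.predict(X_test)))
print(accuracy_score(y_test, soft.predict(X_test)))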
Voting
• Use voting ensembles when:
– All models in the ensemble have generally the same good performance.
– All models in the ensemble mostly already agree.
• Hard voting is appropriate when the models used in the voting ensemble predict crisp class labels.
• Soft voting is appropriate when the models used in the voting ensemble predict the probability of class membership.
Voting
• Soft voting can also be used for models that do not natively predict a class membership probability, although these may require calibration of their probability-like scores before being used in the ensemble (e.g. support vector machines, k-nearest neighbors, and decision trees); a sketch of such calibration follows this slide.
– Hard voting is for models that predict class labels.
– Soft voting is for models that predict class membership probabilities.
• The voting ensemble is not guaranteed to provide better performance than any single model used in the ensemble.
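
The calibration sketch referred to above, assuming scikit-learn's CalibratedClassifierCV (the SVC/logistic-regression pairing is illustrative):

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_wine
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=22)

# SVC does not produce calibrated probabilities by default; wrapping it in
# CalibratedClassifierCV yields probability estimates that can be summed
# in a soft-voting ensemble.
calibrated_svc = CalibratedClassifierCV(SVC(), cv=5)

soft = VotingClassifier(
    estimators=[("svc", calibrated_svc), ("lr", LogisticRegression(max_iter=5000))],
    voting="soft",
)
soft.fit(X_train, y_train)

print(soft.score(X_test, y_test))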
Voting
• Use a voting ensemble if:
– It results in better performance than any model used in the ensemble.
– It results in a lower variance than any model used in the ensemble.
• A voting ensemble is particularly useful for machine learning models that use a stochastic learning algorithm and produce a different final model each time they are trained on the same dataset.
• One example is neural networks fit using stochastic gradient descent.
