Module 7 - Ensemble Learning
Ensemble Learning
• An ensemble is a composite model that combines a series of individually weak (low-performing) classifiers with the aim of creating an improved classifier.
• Each individual classifier votes, and the final prediction is the label that wins the majority of the votes.
• Ensembles offer more accuracy than individual or base classifiers.
• Ensemble methods can be parallelized by allocating each base learner to a different machine.
• In short, ensemble learning methods are meta-algorithms that combine several machine learning models into a single predictive model in order to increase performance.
• Ensemble methods can decrease variance using a bagging approach, bias
using a boosting approach, or improve predictions using a stacking
approach.
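As a rough illustration of how these three strategies look in practice, here is a minimal scikit-learn sketch; the breast-cancer dataset and the particular base learners and hyperparameters are assumptions made only for this example, not part of the module.

# Minimal sketch: the three ensemble strategies as scikit-learn meta-estimators.
# The dataset and base learners below are illustrative choices only.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier, StackingClassifier

X, y = load_breast_cancer(return_X_y=True)

ensembles = {
    # Bagging: full trees on bootstrap samples -> mainly reduces variance
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0),
    # Boosting: weak learners fitted sequentially on re-weighted data -> mainly reduces bias
    "boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
    # Stacking: a meta-model learns how to combine heterogeneous base models
    "stacking": StackingClassifier(
        estimators=[("tree", DecisionTreeClassifier()), ("nb", GaussianNB())],
        final_estimator=LogisticRegression(),
    ),
}

for name, model in ensembles.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())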
Real-life example
Let's take a real example to build intuition.
• Suppose you want to invest in a company XYZ.
• You are not sure about its performance, though.
• So you look for advice on whether the stock price will
increase by more than 6% per annum or not.
• You decide to approach various experts with diverse
domain expertise.
The survey prediction
• Employee of Company XYZ:
– In the past, he has been right 70% of the time.
• Financial Advisor of Company XYZ:
– In the past, he has been right 75% of the time.
• Stock Market Trader:
– In the past, he has been right 70% of the time.
• Employee of a competitor:
– In the past, he has been right 60% of the time.
• Market Research team in the same segment:
– In the past, it has been right 75% of the time.
• Social Media Expert:
– In the past, he has been right 65% of the time.
Conclusion
• Given the broad spectrum of sources you have access to, you can combine
all the information and make an informed decision.
• In the scenario where all 6 experts/teams say it is a good decision
(assuming all the predictions are independent of each other), your combined
decision is wrong only if every one of them is wrong at the same time, so the
combined accuracy rate is 1 - (30% × 25% × 30% × 40% × 25% × 35%) =
1 - 0.0007875 = 99.92125% (this arithmetic is checked in the short code snippet below).
• The assumption that all the predictions are completely independent is
somewhat extreme, since in practice they are expected to be correlated.
Even so, it shows how confident we can become by combining several
forecasts.
• Well, Ensemble learning is no different.
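The arithmetic above is easy to verify; a quick check in plain Python, using the six accuracy figures from the survey, looks like this:

# The decision is wrong only if all six (assumed independent) experts are wrong at once.
accuracies = [0.70, 0.75, 0.70, 0.60, 0.75, 0.65]

p_all_wrong = 1.0
for acc in accuracies:
    p_all_wrong *= (1.0 - acc)      # probability that this expert is wrong

print(p_all_wrong)                  # ~0.0007875, i.e. 0.07875%
print(1.0 - p_all_wrong)            # ~0.9992125, i.e. 99.92125%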
Ensembling
• An ensemble is the art of combining a diverse set of learners
(individual models) to improve the stability and
predictive power of the overall model.
• In our example, the way we combined all the individual predictions is
exactly what is termed ensemble learning.
• Moreover, ensemble-based models can be used in both
scenarios: when the data volume is large and when the data is
very limited.
Basic Ensemble Structure
Summary
• Use multiple learning algorithms (classifiers)
• Combine the decisions
• Can be more accurate than the individual classifiers
• Generate a group of base-learners
• Different learners use different
– Algorithms
– Hyperparameters
– Representations (Modalities)
– Training sets
How do the models differ?
• Difference in population (each model sees different training data)
• Difference in hypothesis
• Difference in modelling technique
• Difference in initial seed (these sources of diversity are sketched in the code below)
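A minimal sketch of these diversity sources in scikit-learn is given below; the synthetic dataset, the particular algorithms, and the seeds are assumptions for illustration. The base learners differ in algorithm, hyperparameters, random seed, and the bootstrap sample of the data they see, and a majority vote combines them.

# Illustrative only: base learners that differ in algorithm, hyperparameters,
# random seed, and training subset, combined by majority (hard) voting.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import BaggingClassifier, VotingClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

base_learners = [
    ("knn", KNeighborsClassifier(n_neighbors=7)),   # different algorithm / hyperparameters
    ("nb", GaussianNB()),                           # different modelling technique
    ("trees_a", BaggingClassifier(DecisionTreeClassifier(), max_samples=0.7, random_state=1)),
    ("trees_b", BaggingClassifier(DecisionTreeClassifier(), max_samples=0.7, random_state=2)),
    # trees_a / trees_b: same technique, but different seeds and bootstrap samples
]

ensemble = VotingClassifier(estimators=base_learners, voting="hard")
print(cross_val_score(ensemble, X, y, cv=5).mean())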
Why ensembles?
• There are two main reasons to use an ensemble over a single model, and
they are related:
– Performance: an ensemble can make better predictions than any single contributing model.
– Robustness: an ensemble reduces the spread (variance) of the predictions and of the model's performance.
Bagging
• Bagging stands for bootstrap aggregation.
• It combines multiple learners in a way to reduce the variance of
estimates.
• It involves training multiple instances of a base model on different
subsets of the training data and then combining their predictions.
• For example, a random forest trains M decision trees: each of the M trees
is trained on a different random subset of the data, and voting over the
M trees gives the final prediction.
• Example:
– Random Forest
– Extra Trees
Bagging
Bagging Example
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.ensemble import BaggingClassifier

# The wine dataset and the ensemble sizes are illustrative choices (not given in the original slide)
X, y = datasets.load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=22)

estimator_range = range(2, 21, 2)   # number of base estimators to try
models, scores = [], []
for n_estimators in estimator_range:
    clf = BaggingClassifier(n_estimators=n_estimators, random_state=22).fit(X_train, y_train)
    models.append(clf)
    scores.append(accuracy_score(y_test, clf.predict(X_test)))

plt.plot(list(estimator_range), scores)   # test accuracy vs. ensemble size
plt.xlabel("n_estimators")
plt.ylabel("test accuracy")
plt.show()
Bagging vs. Boosting
• Main task: bagging tries to decrease variance, not bias; boosting tries to decrease bias, not variance.
• Model weights: in bagging every model gets an equal weight in the final vote; in boosting models are weighted according to their performance.
• Dependence: in bagging each model is built independently; in boosting each model is built depending on the models before it.
• Training subsets: in bagging the subsets are drawn from the whole training dataset by random row sampling with replacement; in boosting each new subset gives more weight to the examples that were misclassified by the preceding models.
• Problem addressed: bagging mainly tackles overfitting (high variance); boosting mainly tackles underfitting by reducing bias.
• When to use: if the base classifier is unstable (high variance), apply bagging; if it is stable and simple (high bias), apply boosting.
• Execution: in bagging the base classifiers are trained in parallel; in boosting they are trained sequentially.
• Typical example: Random Forest uses bagging; AdaBoost uses boosting.
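The bagging side of this comparison already has a code example above; for completeness, a matching boosting sketch with AdaBoost is shown below (the wine dataset and the hyperparameters are illustrative assumptions).

# Boosting counterpart to the bagging example: AdaBoost fits weak learners
# sequentially, re-weighting the training samples so that later learners
# concentrate on the examples earlier ones got wrong.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=22)

# Weak learner: a decision stump (max_depth=1); each fitted stump also gets
# a vote weighted by how accurate it was.
booster = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                             n_estimators=100, learning_rate=0.5, random_state=22)
booster.fit(X_train, y_train)
print(accuracy_score(y_test, booster.predict(X_test)))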
Stacking
• Stacking tackles the question of which model in the ensemble to
trust: it uses another machine learning model that learns when to use,
or how much to trust, each model in the ensemble.
– Unlike bagging, in stacking, the models are typically
different (e.g. not all decision trees) and fit on the same dataset
(e.g. instead of samples of the training dataset).
– Unlike boosting, in stacking, a single model is used to
learn how to best combine the predictions from the contributing
models (e.g. instead of a sequence of models that correct the
predictions of prior models).
Stacking
• The architecture of a stacking model involves two or more base
models, often referred to as level-0 models, and a meta-model
that combines the predictions of the base models, referred to as a
level-1 model.
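A minimal stacking sketch in scikit-learn might look like the following; the level-0 learners and the logistic-regression meta-model are illustrative choices.

# Stacking: level-0 base models make predictions; a level-1 meta-model
# (here logistic regression) learns how best to combine them.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import StackingClassifier

X, y = load_breast_cancer(return_X_y=True)

level0 = [
    ("tree", DecisionTreeClassifier(max_depth=5)),
    ("knn", KNeighborsClassifier(n_neighbors=7)),
    ("nb", GaussianNB()),
]
level1 = LogisticRegression()   # meta-model trained on the level-0 outputs

stack = StackingClassifier(estimators=level0, final_estimator=level1, cv=5)
print(cross_val_score(stack, X, y, cv=5).mean())

With cv=5, the meta-model is trained on out-of-fold predictions of the base models, so it does not simply learn to copy whichever base model overfits the training data.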
Voting
• Hard voting involves summing the votes for each class label
and predicting the class label with the most votes.
• Soft voting involves summing the predicted probabilities (or
probability-like scores) for each class label and predicting the class
label with the largest summed probability.
– Hard Voting. Predict the class with the largest sum of votes
from models
– Soft Voting. Predict the class with the largest summed
probability from models.
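In scikit-learn's VotingClassifier the two schemes are selected via the voting parameter; the sketch below contrasts them on the same data (the iris dataset and these particular base models are assumptions for illustration).

# Hard voting counts class votes; soft voting averages predicted probabilities,
# so for voting="soft" every base model must implement predict_proba.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import VotingClassifier

X, y = load_iris(return_X_y=True)
members = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("tree", DecisionTreeClassifier(max_depth=4)),
    ("nb", GaussianNB()),
]

for scheme in ("hard", "soft"):
    clf = VotingClassifier(estimators=members, voting=scheme)
    print(scheme, cross_val_score(clf, X, y, cv=5).mean())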
Voting
• Use voting ensembles when:
– All models in the ensemble have generally the same
good performance.
– All models in the ensemble mostly already agree.