Lecture 6
Ensemble Learning
Presented by:
Dr. Amran Hossain
Associate Professor
CSE, DUET
Ensemble Methods, what are they?
• An ensemble method is a machine learning
technique that combines several base models in
order to produce one optimal predictive model. To
better understand this definition, let's take a step
back to the ultimate goal of machine learning and
model building.
• Ensemble learning is a general meta-approach to
machine learning that seeks better predictive
performance by combining the predictions from
multiple models.
• Ensemble methods are techniques that aim to
improve the accuracy of results by combining
multiple models instead of using a single model.
The combined models increase the accuracy of the
results significantly.
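A minimal sketch of this idea, assuming scikit-learn and a synthetic dataset (the base models and parameter values below are illustrative choices, not part of the lecture):

```python
# Combine several different base models into one ensemble
# via majority ("hard") voting with scikit-learn's VotingClassifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=4)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    voting="hard",  # each base model gets one vote; the majority wins
)
ensemble.fit(X_train, y_train)
print("Ensemble accuracy:", ensemble.score(X_test, y_test))
```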
Types of Ensemble Methods
BAGGING
• Bagging, the short form for bootstrap aggregating, is mainly applied in classification
and regression.
• It increases the accuracy of models, most commonly decision trees, by reducing variance to
a large extent. This reduction in variance improves accuracy and curbs overfitting,
which is a challenge to many predictive models.
• Bagging consists of two steps, i.e., bootstrapping and aggregation.
• Bootstrapping is a sampling technique where samples are drawn from the whole population (set)
with replacement. Sampling with replacement makes the selection procedure randomized. The base
learning algorithm is then run on each sample to complete the procedure (see the sketch after
this list).
• Aggregation in bagging combines the outcomes of all the individual predictions into a single
result. Without aggregation, predictions would be unreliable because not all outcomes would be
taken into consideration. Therefore, the aggregation is based either on the probabilities produced
by the bootstrapped models or on all outcomes of the predictive models.
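A minimal sketch of the bootstrapping step, assuming NumPy (the array size and random seed are illustrative):

```python
# Draw bootstrapped subsamples: sample *with replacement* from the data,
# so some points repeat and some are left out of each subsample.
import numpy as np

rng = np.random.default_rng(seed=0)
data = np.arange(10)  # stand-in for the whole population (indices 0..9)

for i in range(3):  # three bootstrapped subsamples, one per base learner
    sample = rng.choice(data, size=data.shape[0], replace=True)
    print(f"bootstrap sample {i}:", sample)
```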
BAGGING
• BAGGing, or Bootstrap AGGregating,
gets its name because it
combines Bootstrapping and
Aggregation to form one ensemble
model.
• Given a sample of data, multiple
bootstrapped subsamples are pulled.
• A Decision Tree is formed on each of
the bootstrapped subsamples.
• After each subsample Decision Tree
has been formed, an algorithm is used
to aggregate over the Decision Trees to
form the most efficient predictor. The
sketch below illustrates the process:
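A minimal sketch of bagging with decision trees, assuming scikit-learn and synthetic data (parameter values are illustrative; scikit-learn versions before 1.2 name the first argument base_estimator):

```python
# Fit many decision trees on bootstrapped subsamples and aggregate them
# into a single predictor with scikit-learn's BaggingClassifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # base learner: a decision tree
    n_estimators=50,                     # number of bootstrapped subsamples/trees
    bootstrap=True,                      # sample each subsample with replacement
    random_state=0,
)
bagging.fit(X_train, y_train)
print("Bagging accuracy:", bagging.score(X_test, y_test))
```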
BOOSTING
• Boosting is an ensemble technique that learns from previous predictor
mistakes to make better predictions in the future.
• The technique combines several weak base learners to form one strong
learner, thus significantly improving the predictability of models.
• Boosting works by arranging weak learners in a sequence, such that each
weak learner learns from the mistakes of the previous learner in the
sequence, creating better predictive models.
• Boosting takes many forms, including gradient boosting, Adaptive Boosting
(AdaBoost), and XGBoost (Extreme Gradient Boosting).
• AdaBoost uses weak learners in the form of decision trees, which mostly
include a single split and are popularly known as decision stumps.
• AdaBoost’s first decision stump is built from observations that all carry
equal weights; misclassified observations receive higher weights in the
following rounds.
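A minimal sketch of AdaBoost with decision stumps, assuming scikit-learn and synthetic data (parameter values are illustrative; scikit-learn versions before 1.2 name the first argument base_estimator):

```python
# Boost a sequence of decision stumps (depth-1 trees) with AdaBoost.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

adaboost = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # a decision stump: one split
    n_estimators=100,                               # weak learners added in sequence
    learning_rate=0.5,
    random_state=0,
)
adaboost.fit(X_train, y_train)
print("AdaBoost accuracy:", adaboost.score(X_test, y_test))
```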
BOOSTING
• Gradient boosting adds predictors sequentially to the
ensemble, where each new predictor corrects its
predecessors, thereby increasing the model’s accuracy. New
predictors are fit to counter the effects of errors in the
previous predictors.
• The key property of boosting ensembles is the idea of
correcting prediction errors. The models are fit and added to
the ensemble sequentially such that the second model
attempts to correct the predictions of the first model, the
third corrects the second model, and so on.
• Gradient descent helps the gradient booster identify
errors in the learners’ predictions and counter them
accordingly.
• XGBoost makes use of gradient-boosted decision trees,
providing improved speed and performance.
• It focuses heavily on the computational speed and the
performance of the target model. Because model training
must follow a sequence, ordinary gradient boosted machines
are slow to train; XGBoost is engineered to reduce this cost.
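A minimal sketch of gradient boosting, assuming scikit-learn and synthetic data (parameter values are illustrative; the xgboost library’s XGBClassifier offers a similar interface with faster training):

```python
# Add shallow trees sequentially; each new tree is fit to the errors
# (gradients) of the current ensemble.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gbm = GradientBoostingClassifier(
    n_estimators=200,   # number of sequentially added trees
    learning_rate=0.1,  # shrinks each tree's correction
    max_depth=3,
    random_state=0,
)
gbm.fit(X_train, y_train)
print("Gradient boosting accuracy:", gbm.score(X_test, y_test))
```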
Stacking
• Stacking, another ensemble method, is often referred to as
stacked generalization. This technique works by training a
combining algorithm on the predictions of several other
learning algorithms. Stacking has been successfully
implemented in regression, density estimation, distance
learning, and classification. It can also be used to measure
the error rate involved during bagging.
• Stacked Generalization, or stacking for short, is an ensemble
method that seeks a diverse group of members by varying
the model types fit on the training data and using a model to
combine predictions.
• Stacking is a general procedure where a learner is trained to
combine the individual learners. Here, the individual learners
are called the first-level learners, while the combiner is called
the second-level learner, or meta-learner.
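A minimal sketch of stacking, assuming scikit-learn and synthetic data (the choice of first-level learners and meta-learner is illustrative):

```python
# Combine first-level learners with a second-level meta-learner
# (logistic regression) trained on their cross-validated predictions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[  # first-level learners
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("svc", SVC(random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # second-level learner (meta-learner)
    cv=5,  # out-of-fold predictions are used to train the meta-learner
)
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))
```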
Random Forest
• Random Forest is a popular machine learning algorithm that belongs to
the supervised learning technique. It can be used for both Classification
and Regression problems in ML.
• It is based on the concept of ensemble learning, which is a process of
combining multiple classifiers to solve a complex problem and to improve
the performance of the model.
• As the name suggests, "Random Forest is a classifier that contains a
number of decision trees on various subsets of the given dataset and
takes the average to improve the predictive accuracy of that dataset."
• Instead of relying on one decision tree, the random forest takes the
prediction from each tree and, based on the majority vote of those
predictions, it predicts the final output.
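A minimal sketch of training a Random Forest with scikit-learn on synthetic data (parameter values are illustrative):

```python
# Train a forest of decision trees; each tree sees a bootstrapped sample
# and a random subset of features at every split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # number of decision trees in the forest
    max_features="sqrt",  # random subset of features considered at each split
    random_state=0,
)
forest.fit(X_train, y_train)
print("Random Forest accuracy:", forest.score(X_test, y_test))
```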
Random Forest
Assumptions for Random Forest
• Since the random forest combines multiple trees to
predict the class of the dataset, it is possible that some
decision trees may predict the correct output, while
others may not.
• But together, all the trees predict the correct output.
Therefore, below are two assumptions for a better
Random forest classifier:
• There should be some actual values in the feature
variables of the dataset so that the classifier can predict
accurate results rather than guessed results.
• The predictions from each tree must have very low
correlations with one another.
Random Forest
• Why use Random Forest?
• Below are some points that explain why we should use the Random
Forest algorithm:
• It takes less training time as compared to other algorithms.
• It predicts output with high accuracy, and it runs efficiently even on
large datasets.
• It can also maintain accuracy when a large proportion of data is missing.
How does Random Forest algorithm work?
• Random Forest works in two phases: the first is to create the random forest by
combining N decision trees, and the second is to make predictions with each tree
created in the first phase.
• The working process can be explained in the steps below (a code sketch follows the list):
• Step-1: Select random K data points from the training set.
• Step-2: Build the decision trees associated with the selected data points (Subsets).
• Step-3: Choose the number N for decision trees that you want to build.
• Step-4: Repeat Step 1 & 2.
• Step-5: For new data points, find the prediction of each decision tree, and assign the
new data points to the category that wins the majority vote.
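A rough sketch of these five steps, assuming scikit-learn decision trees, NumPy, and synthetic data (all names and parameter values are illustrative):

```python
# Build N decision trees on bootstrapped subsets and combine their
# predictions for new data points by majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)

N = 25  # Step 3: number of decision trees to build
trees = []
for _ in range(N):  # Step 4: repeat Steps 1 & 2
    idx = rng.choice(len(X), size=len(X), replace=True)  # Step 1: random data points
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    tree.fit(X[idx], y[idx])  # Step 2: a decision tree on the selected subset
    trees.append(tree)

# Step 5: each tree votes on the new data points; the majority class wins.
X_new = X[:5]
votes = np.array([t.predict(X_new) for t in trees]).astype(int)  # shape (N, 5)
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("Majority-vote predictions:", majority)
```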
Example of Random Forest
• Suppose there is a dataset that contains multiple fruit images. This dataset is given to the Random Forest classifier. The dataset is
divided into subsets and given to each decision tree. During the training phase, each decision tree produces a prediction result, and
when a new data point occurs, the Random Forest classifier predicts the final decision based on the majority of the results. Consider
the image below:
[Figure: the prediction class of each decision tree and the combined result]
Random Forest (RF)
• Calculating and combining by majority voting: if the individual trees vote
(1, 0, 1, 1), class 1 wins with three of the four votes and becomes the final prediction.
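A minimal sketch of this majority-vote calculation, assuming NumPy:

```python
# Combine the individual tree votes (1, 0, 1, 1) by majority voting.
import numpy as np

votes = np.array([1, 0, 1, 1])       # predictions from four trees
final = np.bincount(votes).argmax()  # class 1 has 3 of 4 votes
print("Final prediction:", final)    # -> 1
```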