ML-Lecture-15-Ensemble
Email: [email protected]
Ensemble Learning
A powerful way to improve the performance of your model
Construct a set of classifiers from training data
Predict class label of test data by combining the predictions made
by multiple classifiers or models
Examples: Random Forest, AdaBoost, Stochastic Gradient Boosting,
Gradient Boosting Machine (GBM), XGBoost, LightGBM, CatBoost
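Several of the listed methods ship with scikit-learn (Random Forest and GBM, the latter as GradientBoostingClassifier); XGBoost, LightGBM, and CatBoost are separate packages. A minimal sketch, assuming scikit-learn is installed, fitting two of them on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Synthetic binary classification data for illustration
X, y = make_classification(n_samples=200, random_state=0)

for Model in (RandomForestClassifier, GradientBoostingClassifier):
    clf = Model(random_state=0).fit(X, y)
    print(Model.__name__, clf.score(X, y))  # training accuracy
```

Both models expose the same fit/score interface, which is what makes combining them in ensembles convenient.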
General Approach
Step 1: Create multiple data sets D1, D2, ..., Dt-1, Dt from the
original training data D
Step 2: Build multiple classifiers C1, C2, ..., Ct-1, Ct, one per
data set
Step 3: Combine the classifiers into a single ensemble classifier C*
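The three steps can be sketched in plain Python using bootstrap sampling and majority voting (a bagging-style ensemble). The function names and the trivial "classifier" (which just predicts the majority label of its training sample) are illustrative stand-ins, not a library API:

```python
import random
from collections import Counter

def train_trivial_classifier(sample):
    # Stand-in "classifier": always predicts the majority label
    # of the (label-bearing) training sample it was built from.
    labels = [y for _, y in sample]
    majority = Counter(labels).most_common(1)[0][0]
    return lambda x: majority

def bagging_ensemble(data, t=5, seed=0):
    rng = random.Random(seed)
    classifiers = []
    for _ in range(t):
        # Step 1: create data set D_i by sampling with replacement
        sample = [rng.choice(data) for _ in data]
        # Step 2: build classifier C_i on D_i
        classifiers.append(train_trivial_classifier(sample))

    def combined(x):
        # Step 3: combine into C* by majority vote
        votes = [c(x) for c in classifiers]
        return Counter(votes).most_common(1)[0][0]

    return combined
```

Real ensembles swap the trivial learner for decision trees or other models; the create/build/combine skeleton stays the same.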
Simple Ensemble Techniques
Max Voting
Averaging
Weighted Averaging
Max Voting
Multiple models make a prediction for each data point
Each model's prediction is treated as a 'vote'
The prediction made by the majority of the models is used as the
final prediction
Generally used for classification problems
For example, suppose you ask five of your colleagues to rate your movie (out
of 5): three of them rate it 4 while two give it a 5. Since the majority gave
a rating of 4, you can take the final rating of the movie as 4.
You can think of this as taking the mode of all the predictions.
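The movie-rating example above can be computed directly; a minimal sketch in plain Python (the function name is illustrative):

```python
from collections import Counter

def max_vote(predictions):
    """Return the majority (mode) of a list of model predictions."""
    return Counter(predictions).most_common(1)[0][0]

# Five colleagues' ratings: three 4s and two 5s
print(max_vote([4, 4, 4, 5, 5]))  # -> 4
```

scikit-learn packages the same idea as VotingClassifier (voting='hard') for combining fitted classifiers.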
Averaging
Similar to max voting, multiple models make a prediction for
each data point
Take the average of the predictions from all the models and use it
as the final prediction
Averaging can be used in both regression and classification problems
For example, in the max voting case study above, the averaging method
takes the mean of all the ratings, i.e. (5+4+5+4+4)/5 = 4.4.
Hence, the final rating of the movie is 4.4.
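The same ratings, averaged instead of voted on (a minimal sketch; the function name is illustrative):

```python
def average_prediction(predictions):
    """Simple (unweighted) mean of model predictions."""
    return sum(predictions) / len(predictions)

print(average_prediction([5, 4, 5, 4, 4]))  # -> 4.4
```

For classification, the same averaging is typically applied to predicted class probabilities (soft voting) rather than to class labels.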
Weighted Averaging
This is an extension of the averaging method.
Each model is assigned a weight defining its importance for the
final prediction
For example, if two of your colleagues are critics while the others have no
prior experience in this field, the answers from those two are given more
weight than the others (here, 0.23 each for the critics and 0.18 each for the
rest, so the weights sum to 1).
The result is [(5*0.23) + (4*0.23) + (5*0.18) + (4*0.18) +
(4*0.18)] = 4.41.
Hence, the final rating of the movie is 4.41.
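The weighted calculation above as a short sketch (the function name is illustrative; the weights are the slide's example values):

```python
def weighted_average(predictions, weights):
    """Weighted mean of model predictions; weights should sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(p * w for p, w in zip(predictions, weights))

ratings = [5, 4, 5, 4, 4]            # first two raters are the critics
weights = [0.23, 0.23, 0.18, 0.18, 0.18]
print(round(weighted_average(ratings, weights), 2))
```

In practice the weights are often chosen from each model's validation performance rather than set by hand.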
Ref: QuantDare
Stacking Ensemble Learning
Level 0: base models trained on the original training data
Level 1: a meta-model trained on the predictions of the level-0
models
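A minimal stacking sketch using scikit-learn's StackingClassifier (assuming scikit-learn is installed); the base-learner choices and data set are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

stack = StackingClassifier(
    estimators=[  # level 0: diverse base classifiers
        ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(),  # level 1: meta-learner
)
stack.fit(X_tr, y_tr)
print(stack.score(X_te, y_te))
```

Internally, StackingClassifier uses cross-validated predictions of the base models as the meta-learner's training features, which guards against the meta-learner overfitting to base-model training error.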
Study Materials of Ensemble Methods
AnalyticsVidhya: A Comprehensive Guide to Ensemble Learning
(with Python codes)
GeeksForGeeks: Ensemble Method in Python
AnalyticsVidhya: Basics of Ensemble Learning Explained in Simple
English
Dataaspirant: How the Kaggle winners algorithm XGBoost algorithm
works