107 Boosting Models
BOOSTING
TECHNIQUES
SUNIL GORANTLA
WHAT IS BOOSTING?
• Boosting is an ensemble technique in machine learning that
focuses on improving the accuracy of a predictive model by
combining multiple weak learners to form a strong learner.
Here's how boosting works: models are trained one after another, and each new model focuses on the mistakes of the models before it.
TYPES OF BOOSTING
ALGORITHMS:
• AdaBoost (Adaptive Boosting): One of the earliest and simplest
forms of boosting, where the emphasis is on correcting the
misclassifications made by the previous model.
• Gradient Boosting: A more advanced method where each new
model is trained to minimize the residual errors (the difference
between the actual and predicted values) from the previous models.
• XGBoost (Extreme Gradient Boosting): An optimized version of
gradient boosting that is faster and more efficient, often used in
competitions like Kaggle (see the usage sketch after this list).
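A minimal usage sketch of these three variants on a toy classification problem, assuming scikit-learn is available and the separate xgboost package is installed; all hyperparameter values shown are illustrative, not recommendations:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from xgboost import XGBClassifier  # assumes the xgboost package is installed

# Toy dataset, only to demonstrate the API.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

models = {
    "AdaBoost": AdaBoostClassifier(n_estimators=100),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=100, learning_rate=0.1),
    "XGBoost": XGBClassifier(n_estimators=100, learning_rate=0.1),
}

for name, model in models.items():
    model.fit(X, y)
    print(name, model.score(X, y))  # training accuracy, just to show the calls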
CHALLENGES:
• Computationally Intensive: Since boosting involves training
multiple models sequentially, it can be computationally
expensive.
• Sensitive to Noisy Data: If the dataset contains a lot of noise,
boosting can overemphasize these errors, leading to overfitting.
EXAMPLE
Y_target = Y_pred_base + Residual
Feature X | Target y | Y_pred | Residual | Y_pred_m
2 | 3 | 8.4 | -5.4 |
3 | 6 | 8.4 | -2.4 |
4 | 8 | 8.4 | -0.4 |
5 | 11 | 8.4 | 2.6 |
6 | 14 | 8.4 | 5.6 |
SR_parent = 0
SR_child1 = 22.41
SR_child2 = 33.62
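These SR values match the XGBoost-style similarity score (sum of residuals)^2 / (number of residuals + lambda) with lambda = 0, computed from the residuals against the base prediction of 8.4; a quick check (the lambda value is an assumption implied by the numbers):

residuals = [3 - 8.4, 6 - 8.4, 8 - 8.4, 11 - 8.4, 14 - 8.4]  # -5.4, -2.4, -0.4, 2.6, 5.6

def similarity_score(res, lam=0):
    return sum(res) ** 2 / (len(res) + lam)

print(similarity_score(residuals))       # parent: ~0, the residuals sum to zero
print(similarity_score(residuals[:3]))   # child 1 (X = 2, 3, 4): 22.41
print(similarity_score(residuals[3:]))   # child 2 (X = 5, 6): 33.62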
Approach 2
• k = 1
• Regularized loss = Sum((y_actual - y_pred)^2) + k*(w1 + w2 + w3 + ... + wi)
• w1 = ????
First split:
• Left node:  X = 2, 3, 4   Y = 3, 6, 8
• Right node: X = 5, 6      Y = 11, 14
Splitting the left node again:
• X = 2, 3   Y = 3, 6   Y_pred = 4.5
• X = 4      Y = 8      Y_pred = 8
STEP 1: BASE MODEL (INITIAL PREDICTION)
• Base Prediction: Start with an initial prediction, typically the
mean of the target values.
• Mean of y = 8.4
• The base model predicts 8.4 for every data point (see the residual-fitting sketch below).
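A small sketch of this first boosting step on the toy data above, assuming plain squared-error gradient boosting with a single shallow tree fit to the residuals (the tree depth and the single round are illustrative):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[2], [3], [4], [5], [6]])
y = np.array([3, 6, 8, 11, 14])

base_pred = y.mean()        # 8.4, the constant base prediction
residuals = y - base_pred   # [-5.4, -2.4, -0.4, 2.6, 5.6]

# The next weak learner is fit to the residuals, not to y itself.
tree = DecisionTreeRegressor(max_depth=2)
tree.fit(X, residuals)

# Updated prediction = base prediction + the residual tree's correction.
y_pred = base_pred + tree.predict(X)
print(residuals)
print(y_pred)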
Bagging (e.g. random forest) | Boosting
Models are trained in parallel | Models are trained sequentially
Every model gets equal weight | Models do not get equal weight
Can use fully grown models, e.g. fully grown trees in a random forest | Uses weak learners such as decision stumps

Example of unequal model weights in boosting: Y = y1 + 0.2*y2 + 0.1*y3
ADABOOST
Age | BMI | Gender | Weight | Pred (first model) | Error
25 | 24 | M | 1/8 | M | 0
41 | 19 | M | 1/8 | M | 0
43 | 25 | F | 1/8 | M | 1
22 | 31 | F | 1/8 | F | 0
28 | 22 | F | 1/8 | M | 1
31 | 30 | M | 1/8 | M | 0
52 | 26 | M | 1/8 | F | 1
41 | 26 | F | 1/8 | F | 0
• 10000 data points
• Equal weightage to all data points
• Sample 500 data points and build a model: say it makes 400 correct predictions and
100 wrong predictions. Update all the weights, giving more
weight to the wrongly predicted rows.
• After this round each row is in one of three groups: no prediction (not sampled), correctly predicted, or wrongly predicted.
WORKING
• Step 1: Initialize weights equally for all training samples.
• Step 2: Train a weak classifier on the weighted dataset.
• Step 3: Increase weights of incorrectly classified samples to
focus the next classifier on these errors.
• Step 4: Combine weak classifiers by their performance to form
the final strong classifier (see the sketch after these steps).
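A compact sketch of these four steps, assuming a binary target encoded as -1/+1 and scikit-learn decision stumps as the weak learners (the number of rounds and the clipping constant are illustrative):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=10):
    n = len(y)
    w = np.full(n, 1.0 / n)                    # Step 1: equal weights for every sample
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)       # Step 2: weak classifier on the weighted data
        pred = stump.predict(X)
        err = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)  # how much say this stump gets
        w = w * np.exp(-alpha * y * pred)      # Step 3: increase weights on the mistakes
        w = w / w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    # Step 4: weighted vote of all the weak classifiers
    total = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
    return np.sign(total)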
Feature | AdaBoost | Gradient Boosting
Base Learner | Typically uses simple weak learners like decision stumps. | Can use any differentiable model, commonly decision trees.
Weight Update | Updates the weights of samples based on whether they are correctly classified. | Does not explicitly update weights; focuses on reducing the overall loss.
Classifier Combination | Combines classifiers based on their weighted majority vote. | Combines classifiers by summing their outputs, adjusted by a learning rate.
Application | Effective for binary and multi-class classification problems. | Versatile, used for both regression and classification tasks.
Computational Cost | Generally faster with fewer iterations but can be slower with large datasets. | More computationally expensive due to iterative nature and complex models.
GRADIENT BOOSTING
CatBoost, LightGBM, XGBoost
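A short instantiation sketch for these three libraries, assuming the catboost, lightgbm, and xgboost packages are installed; all three expose a scikit-learn style fit/score interface, and the settings below are illustrative:

import numpy as np
from catboost import CatBoostRegressor   # assumes catboost is installed
from lightgbm import LGBMRegressor       # assumes lightgbm is installed
from xgboost import XGBRegressor         # assumes xgboost is installed

# Small synthetic regression problem, only to show the shared interface.
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + 0.1 * rng.standard_normal(200)

models = [
    CatBoostRegressor(iterations=200, verbose=0),
    LGBMRegressor(n_estimators=200),
    XGBRegressor(n_estimators=200),
]

for model in models:
    model.fit(X, y)
    print(type(model).__name__, model.score(X, y))  # R^2 on the training data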
ORDERED TARGET BASED ENCODING
Age | BMI | Gender | Target-based encoding
25 | 24 | M | 24.75
41 | 19 | M | 24.75
43 | 25 | F |
22 | 31 | F |
28 | 22 | F |
31 | 30 | M | (2 + 0.5)/(3 + 1)
52 | 26 | M |
41 | 26 | F |
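A sketch of the ordered target-based encoding idea used by CatBoost: each row's category is encoded using only the target values of the rows that come before it, plus a smoothing prior, so a row never sees its own target; this is the same (sum + prior)/(count + 1) shape as the (2 + 0.5)/(3 + 1) entry above. The prior of 0.5 and the example targets below are illustrative assumptions:

def ordered_target_encode(categories, targets, prior=0.5):
    sums, counts, encoded = {}, {}, []
    for cat, t in zip(categories, targets):
        s = sums.get(cat, 0.0)
        c = counts.get(cat, 0)
        encoded.append((s + prior) / (c + 1))  # uses only earlier rows of the same category
        sums[cat] = s + t                      # update running statistics after encoding
        counts[cat] = c + 1
    return encoded

# Hypothetical binary targets for the Gender column above, just to show the mechanics.
genders = ["M", "M", "F", "F", "F", "M", "M", "F"]
targets = [1, 1, 0, 1, 0, 1, 0, 1]
print(ordered_target_encode(genders, targets))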
Y = w*x + c
True relationship: Y = 1000*x + 500
Parameters = w, c
Start with w = 0, c = 0
• w = 0, c = 0, cost_function = 10000
• Step 1: w = 10, c = 0, cost_function = 9500
• Step 99: w = 100, c = 0, cost_function = 5500
Each step nudges the parameters in the direction that lowers the cost function (see the gradient descent sketch below).
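A minimal gradient descent sketch for this linear model, assuming a mean squared error cost and synthetic data generated from the target relationship Y = 1000*x + 500; the learning rate and step count are illustrative:

import numpy as np

# Synthetic data from the relationship the model is trying to learn.
x = np.linspace(0, 1, 50)
y = 1000 * x + 500

w, c = 0.0, 0.0        # start from w = 0, c = 0 as on the slide
learning_rate = 0.1

for step in range(5000):
    y_pred = w * x + c
    error = y_pred - y
    cost = np.mean(error ** 2)        # the cost function the steps keep lowering
    grad_w = 2 * np.mean(error * x)   # gradient of the cost with respect to w
    grad_c = 2 * np.mean(error)       # gradient of the cost with respect to c
    w -= learning_rate * grad_w       # move both parameters downhill
    c -= learning_rate * grad_c

print(w, c, cost)  # w and c move toward 1000 and 500 as the cost shrinks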