
107

BOOSTING
TECHNIQUES

SUNIL GORANTLA
WHAT IS BOOSTING?
• Boosting is an ensemble technique in machine learning that improves the accuracy of a predictive model by combining multiple weak learners into a single strong learner, training them sequentially so that each new learner focuses on the errors of the previous ones.
TYPES OF BOOSTING ALGORITHMS:
• AdaBoost (Adaptive Boosting): One of the earliest and simplest
forms of boosting, where the emphasis is on correcting the
misclassifications made by the previous model.
• Gradient Boosting: A more advanced method where each new
model is trained to minimize the residual errors (the difference
between the actual and predicted values) from the previous models.
• XGBoost (Extreme Gradient Boosting): An optimized version of
gradient boosting that is faster and more efficient, often used in
competitions like Kaggle.
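The snippet below is a minimal illustrative sketch (not from the original slides) showing how these three boosting variants are typically instantiated with scikit-learn and the xgboost package; the synthetic dataset and hyperparameter values are placeholders.

# Minimal sketch: fitting AdaBoost, Gradient Boosting and XGBoost classifiers.
# Assumes scikit-learn and xgboost are installed; data and hyperparameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# AdaBoost: re-weights misclassified samples after each weak learner (stumps by default).
ada = AdaBoostClassifier(n_estimators=100, random_state=42).fit(X, y)

# Gradient Boosting: each new tree is fit to the residual errors of the current ensemble.
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=42).fit(X, y)

# XGBoost: an optimized, regularized implementation of gradient boosting.
xgb_clf = XGBClassifier(n_estimators=100, learning_rate=0.1).fit(X, y)

for name, model in [("AdaBoost", ada), ("GradientBoosting", gbm), ("XGBoost", xgb_clf)]:
    print(name, accuracy_score(y, model.predict(X)))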
CHALLENGES:
• Computationally Intensive: Since boosting involves training
multiple models sequentially, it can be computationally
expensive.
• Sensitive to Noisy Data: If the dataset contains a lot of noise,
boosting can overemphasize these errors, leading to overfitting.
EXAMPLE
Y_target = Y_pred_base + Residual

Feature X | Target y | Y_pred (base) | Residual | Y_pred_m1
2         | 3        | 8.4           | -5.4     | -2.7
3         | 6        | 8.4           | -2.4     | -2.7
4         | 8        | 8.4           | -0.4     | -2.7
5         | 11       | 8.4           | 2.6      | 4.1
6         | 14       | 8.4           | 5.6      | 4.1


Similarity Score: SR = (sum(residuals))^2 / (number of residuals + lambda), with lambda = 0 in this example.

SR_parent = 0 (the residuals at the root sum to zero)

Candidate split: X = 2, 3, 4 (residuals -5.4, -2.4, -0.4) vs X = 5, 6 (residuals 2.6, 5.6)

SR_child1 = (-8.2)^2 / 3 = 22.41
SR_child2 = (8.2)^2 / 2 = 33.62

Gain = abs(SR_parent - (SR_child1 + SR_child2)) = 56.03

Y_target = y_pred_base + y_pred_m1
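As a check on the arithmetic above, here is a small illustrative Python sketch (not part of the original slides) that computes the similarity scores and gain for this split, assuming lambda = 0 as in the example.

# Illustrative check of the XGBoost-style similarity score and gain from the example.
# lam (lambda) is assumed to be 0, matching the numbers on the slide.
def similarity_score(residuals, lam=0.0):
    return sum(residuals) ** 2 / (len(residuals) + lam)

residuals = [-5.4, -2.4, -0.4, 2.6, 5.6]    # y - y_pred_base, with base prediction 8.4
left, right = residuals[:3], residuals[3:]   # split: X in {2,3,4} vs X in {5,6}

sr_parent = similarity_score(residuals)      # ~0 (residuals sum to zero)
sr_left = similarity_score(left)             # ~22.41
sr_right = similarity_score(right)           # ~33.62
gain = abs(sr_parent - (sr_left + sr_right))

print(round(sr_parent, 2), round(sr_left, 2), round(sr_right, 2), round(gain, 2))
# 0.0 22.41 33.62 56.03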
• Approach 1 (no penalty on the weights):
• Y = w1*x1 + w2*x2 + ... + wi*xi + w0
• MSE = Sum((y_actual - y_pred)^2) = 5000
• W1 = 10000

• Approach 2 (penalty on the weights):
• K = 1
• Loss = Sum((y_actual - y_pred)^2) + K*(w1 + w2 + ... + wi)
• W1 = ????
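A minimal illustrative sketch (not from the slides) of the idea behind Approach 2: adding a penalty on the weights to the loss shrinks large coefficients. Scikit-learn's Lasso (an L1 penalty, comparable in spirit to the K*(w1 + w2 + ...) term above) is used here as an assumed stand-in; the data are synthetic placeholders.

# Illustrative comparison: plain least squares vs. a weight-penalized fit.
# Lasso's L1 penalty plays the role of the K*(w1 + w2 + ...) term in Approach 2.
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=50)

ols = LinearRegression().fit(X, y)     # Approach 1: minimize squared error only
lasso = Lasso(alpha=1.0).fit(X, y)     # Approach 2: squared error + alpha * sum(|w|)

print("OLS weights:  ", ols.coef_)
print("Lasso weights:", lasso.coef_)   # penalized weights are pulled toward zero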
Split at the root:
X = 2, 3, 4 -> Y = 3, 6, 8 -> Y_pred ≈ 5.6 (mean)
X = 5, 6 -> Y = 11, 14 -> Y_pred = 12.5 (mean)

Further split of the left node:
X = 2, 3 -> Y = 3, 6 -> Y_pred = 4.5
X = 4 -> Y = 8 -> Y_pred = 8
STEP 1: BASE MODEL (INITIAL PREDICTION)
• Base Prediction: Start with an initial prediction, typically the mean of the target values.
• Mean of y = 8.4
• The base model predicts 8.4 for every data point.
• Compute residuals from the base model.


STEP 2: FIRST MODEL M1
• Calculate Residuals: Subtract the base prediction from the actual values.
• Residuals for each row:
• Row 1: 3 - 8.4 = -5.4
• Row 2: 6 - 8.4 = -2.4
• Row 3: 8 - 8.4 = -0.4
• Row 4: 11 - 8.4 = 2.6
• Row 5: 14 - 8.4 = 5.6
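The following is a small illustrative sketch (not from the slides) of these two steps on the example data. A depth-1 scikit-learn regression tree is assumed as the first model M1, and the learning rate is kept at 1.0 so its leaf outputs reproduce the -2.7 and 4.1 values in the table above.

# Illustrative: base prediction = mean(y), then fit the first tree M1 to the residuals.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[2], [3], [4], [5], [6]])
y = np.array([3, 6, 8, 11, 14], dtype=float)

y_pred_base = y.mean()                  # 8.4, predicted for every row
residuals = y - y_pred_base             # [-5.4, -2.4, -0.4, 2.6, 5.6]

m1 = DecisionTreeRegressor(max_depth=1).fit(X, residuals)
y_pred_m1 = m1.predict(X)               # approx [-2.73, -2.73, -2.73, 4.1, 4.1]

learning_rate = 1.0                     # kept at 1.0 to match the slide's numbers
y_pred = y_pred_base + learning_rate * y_pred_m1
print(y_pred_base, residuals, np.round(y_pred_m1, 2), np.round(y_pred, 2))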
COMPARISON

Bagging:
• Parallel
• Equal weight for all models
• Can use fully grown models, e.g. fully grown trees in a random forest

Boosting:
• Sequential
• Models do not all get equal weightage
• Uses weak learners such as stumps
• Example of unequal weighting: Y = y1 + 0.2*y2 + 0.1*y3
ADA-BOOST

Age | BMI | Gender (target) | Weight | Pred (first model) | Residual (1 = misclassified)
25  | 24  | M               | 1/8    | M                  | 0
41  | 19  | M               | 1/8    | M                  | 0
43  | 25  | F               | 1/8    | M                  | 1
22  | 31  | F               | 1/8    | F                  | 0
28  | 22  | F               | 1/8    | M                  | 1
31  | 30  | M               | 1/8    | M                  | 0
52  | 26  | M               | 1/8    | F                  | 1
41  | 26  | F               | 1/8    | F                  | 0
• 10,000 data points in total.
• Equal weightage to all data points initially.
• Sample 500 data points -> build a model -> 400 correct predictions and 100 wrong predictions -> update all the weights, giving more weightage to the wrongly predicted rows.
• After a round, every row falls into one of three groups: not sampled (no prediction), correctly predicted, or wrongly predicted.
WORKING
• Step 1: Initialize weights equally for all training samples.
• Step 2: Train a weak classifier on the weighted dataset.
• Step 3: Increase weights of incorrectly classified samples to
focus the next classifier on these errors.
• Step 4: Combine weak classifiers by their performance to form
the final strong classifier.
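Below is a compact illustrative sketch (not from the slides) of these four steps for binary labels encoded as -1/+1. Decision stumps from scikit-learn are assumed as the weak learners, and the update formulas follow the standard AdaBoost derivation; the tiny dataset reuses the Age/BMI/Gender rows from the table above.

# Illustrative AdaBoost loop: equal initial weights, one stump per round,
# up-weight misclassified samples, combine stumps by their alpha scores.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=10):
    n = len(y)
    w = np.full(n, 1.0 / n)                  # Step 1: equal weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)     # Step 2: weak learner on weighted data
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-alpha * y * pred)    # Step 3: raise weights of mistakes
        w = w / w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    # Step 4: weighted vote of all weak learners
    scores = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(scores)

# Tiny usage example with labels in {-1, +1}
X = np.array([[25, 24], [41, 19], [43, 25], [22, 31],
              [28, 22], [31, 30], [52, 26], [41, 26]])
y = np.array([1, 1, -1, -1, -1, 1, 1, -1])   # e.g. M = +1, F = -1 from the table above
stumps, alphas = adaboost_fit(X, y, n_rounds=5)
print(adaboost_predict(X, stumps, alphas))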
Feature: AdaBoost vs Gradient Boosting

• Algorithm Focus: AdaBoost adjusts the weights of misclassified samples to focus on harder cases; Gradient Boosting optimizes the loss function by fitting new models to the residual errors of the previous models.
• Base Learner: AdaBoost typically uses simple weak learners like decision stumps; Gradient Boosting can use any differentiable model, commonly decision trees.
• Error Handling: In AdaBoost, misclassified samples receive higher weights in subsequent models; in Gradient Boosting, subsequent models are trained to correct the residual errors of previous models.
• Weight Update: AdaBoost updates the weights of samples based on whether they are correctly classified; Gradient Boosting does not explicitly update weights and instead focuses on reducing the overall loss.
• Classifier Combination: AdaBoost combines classifiers by a weighted majority vote; Gradient Boosting combines classifiers by summing their outputs, adjusted by a learning rate.
• Learning Rate: In AdaBoost it is implicitly handled through the weight adjustment process; in Gradient Boosting it is explicitly defined as a parameter that scales the contribution of each learner.
• Sensitivity to Outliers: AdaBoost is more sensitive to outliers because they receive higher weights; Gradient Boosting is less sensitive as it gradually reduces the error over iterations.
• Application: AdaBoost is effective for binary and multi-class classification problems; Gradient Boosting is versatile and used for both regression and classification tasks.
• Performance: AdaBoost performs well with simple models but may struggle with complex problems; Gradient Boosting tends to perform better on complex problems due to iterative refinement.
• Computational Cost: AdaBoost is generally faster with fewer iterations but can be slower on large datasets; Gradient Boosting is more computationally expensive due to its iterative nature and more complex models.

GRADIENT BOOSTING
CatBoost:
1. Symmetric trees
2. Encoding categorical features: ordered target based encoding
3. MVP

LightGBM:
1. Leaf-based tree growth
2. Encoding categorical features: bins/bucketing
3. Sampling technique: GOSS

XGBoost:
1. Level-based tree growth
2. Encoding categorical features: manual
3. Sampling: bootstrap (random sample)

Training data set size: 50,000

Ordered target encoding statistic: (Count + prior) / (Max Count + 1)

ORDERED TARGET BASED ENCODING

Age | BMI | Gender | Target based encoding
25  | 24  | M      | 24.75
41  | 19  | M      | 24.75
43  | 25  | F      |
22  | 31  | F      |
28  | 22  | F      |
31  | 30  | M      | (2 + 0.5) / (3 + 1)
52  | 26  | M      |
41  | 26  | F      |
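The slide's table is only partially filled in, so the sketch below (not from the original) simply illustrates the ordered (expanding-window) form of target encoding implied by the (Count + prior)/(Max Count + 1) formula: each row's category is encoded using only the rows that came before it, with an assumed prior of 0.5. The binary target column is a placeholder, since the slide does not show one explicitly, and the slide's worked value (2 + 0.5)/(3 + 1) suggests the count in the denominator may include the current row, so the exact numbers can differ slightly from this variant.

# Illustrative ordered target encoding: each row sees statistics from earlier rows only.
# prior = 0.5 and the binary target values are assumptions made for this example.
def ordered_target_encode(categories, targets, prior=0.5):
    counts = {}      # category -> number of earlier rows with this category
    positives = {}   # category -> sum of earlier target values for this category
    encoded = []
    for cat, t in zip(categories, targets):
        c = counts.get(cat, 0)
        p = positives.get(cat, 0.0)
        encoded.append((p + prior) / (c + 1))   # (count + prior) / (count + 1)
        counts[cat] = c + 1
        positives[cat] = p + t
    return encoded

genders = ["M", "M", "F", "F", "F", "M", "M", "F"]
targets = [1, 1, 0, 1, 0, 1, 0, 1]              # assumed binary target, not on the slide
print([round(v, 3) for v in ordered_target_encode(genders, targets)])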
Y = w*x + c
Y = 1000*x + 500

Cost function = Summation((y_act - y_pred)^2)

Parameters = w, c

• Start: w = 0, c = 0, cost_function = 10000
• Step 1: w = 10, c = 0, cost_function = 9500
• Step 99: w = 100, c = 0, cost_function = 5500

• W_new = W_old - lr * slope (update each weight against the gradient of the cost)

• Regularized loss function = RMSE + lambda * (sum of squared weights), i.e. an L2 penalty on the weights
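A short illustrative sketch (not from the slides) of this update rule for the model Y = w*x + c with a squared-error cost. The data values, learning rate, and iteration count are placeholders chosen so the toy example converges toward w = 1000, c = 500.

# Illustrative gradient descent for y = w*x + c, minimizing the sum of squared errors.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 1000 * x + 500                      # toy data generated from the target line

w, c = 0.0, 0.0
lr = 0.01                               # learning rate (placeholder value)
for step in range(5000):
    y_pred = w * x + c
    error = y - y_pred
    cost = np.sum(error ** 2)
    # Gradients (slopes) of the cost with respect to w and c
    dw = -2 * np.sum(error * x)
    dc = -2 * np.sum(error)
    # Move against the gradient: W_new = W_old - lr * slope
    w -= lr * dw
    c -= lr * dc

print(round(w, 1), round(c, 1), round(cost, 4))   # approaches w = 1000, c = 500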
