
Time To Explore [5]

Submitted To: Dr. Nilamadhab Mishra, Faculty, SCSE
Submitted By: Aditya Vikram Singh (Reg: 22BCE10522)
Ques 1. What do you mean by Ensemble Learning? Investigate
common ensemble learning methods.

Ans. Ensemble learning is a technique in machine learning where multiple models, often
called "weak learners," are trained to solve the same problem and combined to get better
performance than any of the individual models could achieve on their own. The idea is
that by combining the predictions from multiple models, the ensemble can reduce the risk
of overfitting and improve generalization to new data.

Common Ensemble Learning Methods

1. Bagging (Bootstrap Aggregating)
o Description: Bagging involves training multiple versions of a model on
different random subsets of the training data (with replacement) and then
averaging the predictions (for regression) or taking a majority vote (for
classification).
o Example Algorithm: Random Forest
▪ Combines the predictions of many decision trees, each trained on a
different subset of the data.
2. Boosting
o Description: Boosting trains models sequentially, with each new model
focusing on the errors made by the previous ones. The final prediction is a
weighted sum of the predictions from all models.
o Example Algorithms:
▪ AdaBoost: Adjusts the weights of incorrectly classified instances so
that subsequent models focus more on difficult cases.
▪ Gradient Boosting Machines (GBM): Builds models in a stage-wise
fashion by optimizing a loss function. Popular implementations
include XGBoost and LightGBM.
3. Stacking (Stacked Generalization)
o Description: Stacking involves training multiple base models and then
using another model, called a meta-learner, to combine the base models'
predictions. The meta-learner is trained to make the final prediction based
on the predictions of the base models.
o Example: Using logistic regression as the meta-learner to combine the
predictions from a random forest, a support vector machine, and a neural
network.
4. Voting
o Description: Voting is a simple ensemble method where multiple models
are trained and their predictions are combined by voting (for classification)
or averaging (for regression). It can be either hard voting (majority voting) or
soft voting (weighted average of probabilities).
o Example: Using different classifiers like decision trees, logistic regression,
and k-nearest neighbors, and combining their predictions by majority vote.
5. Heterogeneous Ensembles
o Description: Instead of training the same type of model multiple times as in
bagging, this approach combines different types of models into a single ensemble.
o Example: Combining the predictions of a decision tree, a logistic regression
model, and a support vector machine (a minimal code sketch of this kind of
heterogeneous, voting-based ensemble follows this list).
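
As an illustration of methods 4 and 5 above, here is a minimal sketch of a
hard-voting ensemble. It assumes scikit-learn and uses its bundled Iris toy
dataset purely for demonstration; the base models mirror the examples listed
above.

# Minimal voting-ensemble sketch (assumes scikit-learn; Iris is a toy dataset
# used only for illustration).
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Hard voting: each base model casts one vote and the majority class wins.
ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=3)),
        ("logreg", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
print("Voting ensemble accuracy:", ensemble.score(X_test, y_test))

Switching voting="hard" to voting="soft" averages the predicted class
probabilities instead of counting votes.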

Benefits of Ensemble Learning

• Improved Accuracy: By combining multiple models, ensembles can often achieve
higher accuracy than individual models.
• Reduced Overfitting: Ensembles help mitigate the risk of overfitting by
averaging out the errors of individual models.
• Robustness: Ensembles are more robust to noise and variability in the data.

Ensemble learning is a powerful tool in machine learning, and understanding
these methods can help in building more accurate and reliable models.
Ques 2. Investigate the Model Combination Schemes in Ensemble
Learning (Bagging, Boosting, Stacking) with some scenarios and
architectures.
Ans. Bagging (Bootstrap Aggregating)

Scenario: You have a dataset and want to create a robust classifier to
reduce variance and prevent overfitting.

Architecture:

1. Data Sampling: Generate multiple subsets of the training data by
sampling with replacement (bootstrap samples).
2. Training: Train a separate model (e.g., decision tree) on each
bootstrap sample.
3. Combining Predictions: Aggregate the predictions from all models.
For classification, use majority voting; for regression, use averaging.

Example: Random Forest

• Step 1: Create multiple bootstrapped subsets of the original training data.
• Step 2: Train a decision tree on each subset.
• Step 3: Combine the predictions of all decision trees by majority vote
for classification or averaging for regression.
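
The three steps above are essentially what a Random Forest automates. Below is
a minimal sketch, assuming scikit-learn and a synthetic stand-in dataset rather
than any particular real dataset.

# Bagging sketch: a Random Forest bags decision trees on bootstrap samples
# (assumes scikit-learn; make_classification generates synthetic stand-in data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators bootstrap samples -> n_estimators trees -> majority vote.
forest = RandomForestClassifier(n_estimators=200, bootstrap=True, random_state=0)
forest.fit(X_train, y_train)
print("Random Forest accuracy:", forest.score(X_test, y_test))

Here bootstrap=True (the default) is what makes each tree see a different
bootstrap sample, i.e. Step 1 above.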

Advantages:

• Reduces variance
• Handles overfitting better than a single model

Disadvantages:

• Can be computationally intensive


• Does not reduce bias

Boosting

Scenario: You want to create a strong model by focusing on the errors of
previous models and sequentially building a better composite model.

Architecture:

1. Initial Model: Train an initial weak model on the training data.


2. Error Focus: Adjust the weights of misclassified instances, increasing
their importance.
3. Sequential Training: Train subsequent models focusing more on the
errors of the previous model.
4. Combining Predictions: Aggregate the predictions from all models,
typically using a weighted sum.

Example: AdaBoost

• Step 1: Train a weak classifier (e.g., decision stump) on the original dataset.
• Step 2: Increase the weights of misclassified instances.
• Step 3: Train a new classifier focusing on the harder instances.
• Step 4: Repeat the process for a specified number of iterations.
• Step 5: Combine the classifiers using a weighted vote based on their
accuracy.
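
The loop above is what AdaBoost implements. A minimal sketch, assuming
scikit-learn and the same kind of synthetic stand-in data as before:

# Boosting sketch with AdaBoost (assumes scikit-learn; synthetic stand-in data).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Each round reweights misclassified samples and fits a new weak learner
# (by default a depth-1 decision tree, i.e. a decision stump); the final
# prediction is a weighted vote over all rounds.
booster = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=1)
booster.fit(X_train, y_train)
print("AdaBoost accuracy:", booster.score(X_test, y_test))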

Advantages:

• Reduces both bias and variance


• Can improve the performance of weak learners

Disadvantages:

• Can be sensitive to noisy data and outliers


• Computationally intensive

Stacking (Stacked Generalization)

Scenario: You want to leverage the strengths of different models by
combining them in a sophisticated way.

Architecture:

1. Base Models: Train multiple different base models (e.g., decision
trees, logistic regression, SVM).
2. Meta-Learner: Train a meta-learner (e.g., logistic regression) on the
predictions of the base models.
3. Combining Predictions: Use the meta-learner to make the final
prediction based on the base models' predictions.
Example: Stacked Ensemble

• Step 1: Train base models (e.g., decision tree, SVM, neural network)
on the training data.
• Step 2: Use the predictions of the base models as input features for
the meta-learner.
• Step 3: Train the meta-learner on this new feature set.
• Step 4: For final prediction, use the meta-learner to combine the
predictions from the base models.
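
A minimal sketch of these four steps, assuming scikit-learn's
StackingClassifier and synthetic stand-in data:

# Stacking sketch: base models' predictions feed a logistic-regression
# meta-learner (assumes scikit-learn; synthetic stand-in data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=2)),
        ("svm", SVC(probability=True, random_state=2)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # the meta-learner
    cv=5,  # meta-learner is trained on out-of-fold base-model predictions
)
stack.fit(X_train, y_train)
print("Stacked ensemble accuracy:", stack.score(X_test, y_test))

Using cv=5 means the meta-learner sees out-of-fold predictions from the base
models, which reduces the risk of leaking their training error into the
meta-learner.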

Advantages:

• Can capture complex patterns by combining diverse models


• Often achieves better performance than individual models
Ques 3. Investigate the Error-Correcting Output Codes (ECOC)
ensemble method with an application scenario.
Ans. Error-Correcting Output Codes (ECOC)

Error-Correcting Output Codes (ECOC) is an ensemble method used primarily for
multiclass classification problems. ECOC works by decomposing a multiclass
classification problem into multiple binary classification problems. This
method draws inspiration from error-correcting codes used in digital
communication, aiming to improve classification robustness and accuracy.

How ECOC Works

1. Coding Matrix:
o Create a binary coding matrix where rows correspond to the
original classes and columns correspond to the binary classifiers.
o Each class is represented by a binary string (code word), and
each binary classifier distinguishes between a subset of classes.
2. Training Binary Classifiers:
o Train a separate binary classifier for each column of the coding
matrix.
o Each classifier learns to distinguish between two groups of
classes based on the coding matrix.
3. Decoding Predictions:
o For a new instance, get predictions from all binary classifiers.
o Compare the resulting binary string to each row in the coding
matrix to find the closest match (using a distance measure like
Hamming distance).
4. Final Prediction:
o The class whose code word is closest to the predicted binary
string is chosen as the final class prediction.
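
To make steps 3 and 4 concrete, here is a minimal decoding sketch using NumPy;
the 4-column coding matrix and the predicted bits are hypothetical and used
only for illustration.

# ECOC decoding sketch: pick the class whose code word is closest (in Hamming
# distance) to the bits predicted by the binary classifiers.
# The coding matrix below is hypothetical and only for illustration.
import numpy as np

coding_matrix = np.array([  # rows = classes, columns = binary classifiers
    [0, 0, 0, 0],   # class 0
    [1, 0, 1, 0],   # class 1
    [0, 1, 1, 0],   # class 2
    [1, 1, 0, 1],   # class 3
])

# Outputs of the 4 binary classifiers; note one bit is "corrupted" relative
# to class 1's code word, yet decoding still recovers class 1.
predicted_bits = np.array([1, 0, 1, 1])

# Hamming distance = number of positions where the bits disagree.
distances = (coding_matrix != predicted_bits).sum(axis=1)
print("Distances per class:", distances)
print("Predicted class:", int(np.argmin(distances)))  # -> class 1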

Example Scenario: Handwritten Digit Recognition

Application: Recognizing handwritten digits (0-9) using the MNIST dataset.

1. Coding Matrix Construction:

o Create a coding matrix with 10 rows (one for each digit) and several
columns (each representing a binary classifier).
o Example coding matrix (simplified for illustration): each digit is assigned
a short binary code word, with one bit per binary classifier.

2. Training Binary Classifiers:

• Train a binary classifier for each column in the coding matrix.
• For example, the first classifier may separate digit 0 from all other
digits, the second digit 1 from the rest, and so on, depending on the
coding matrix.

3. Classifying New Instances:

• For a new handwritten digit image, get predictions from each binary
classifier.
• Example predictions: [1, 0, 1, 0].

4. Decoding Predictions:

• Compare the predicted binary string [1, 0, 1, 0] with each row in the
coding matrix.
• Calculate the Hamming distance between the predicted code and each class code.

5. Final Prediction:

• The digit whose code has the smallest Hamming distance to the
predicted binary string is selected (in this case, digit 1).
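
A minimal end-to-end sketch of this scenario, assuming scikit-learn's
OutputCodeClassifier and its bundled 8x8 digits dataset as a small stand-in
for MNIST:

# ECOC sketch for digit recognition (assumes scikit-learn; load_digits is a
# small 8x8-pixel stand-in for MNIST, used purely for illustration).
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OutputCodeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

# code_size controls how many binary classifiers (columns) are generated
# relative to the number of classes; decoding picks the nearest code word.
ecoc = OutputCodeClassifier(
    estimator=LogisticRegression(max_iter=2000),
    code_size=1.5,
    random_state=3,
)
ecoc.fit(X_train, y_train)
print("ECOC accuracy:", ecoc.score(X_test, y_test))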
