Unit V - Multiple Learners

The document discusses various ensemble learning techniques including multiple learners, voting methods, Error-Correcting Output Codes (ECOC), bagging, boosting, stacking, and cascading. It highlights the importance of combining different models to improve accuracy and robustness in classification tasks. Additionally, it outlines the advantages, applications, and methodologies of each technique, emphasizing their roles in enhancing predictive performance.

Multiple Learners

Prepared by: Nivetha R


Department of Computer Science and
Engineering
Multiple Learners

• There is no algorithm that is always the most accurate


• Generate a group of base-learners which, when combined, has higher accuracy
• Different learners use different (a short sketch follows this list):
  • Algorithms
  • Hyperparameters
  • Representations / Modalities / Views
  • Training sets
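A minimal sketch of such a diverse pool of base-learners, assuming scikit-learn and a synthetic dataset (not from the slides); the learners differ in algorithm and hyperparameters:

# Illustrative pool of base-learners that differ in algorithm and hyperparameters.
# Assumes scikit-learn is available; the dataset here is synthetic.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_learners = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree_depth3": DecisionTreeClassifier(max_depth=3),
    "knn_5_neighbors": KNeighborsClassifier(n_neighbors=5),
}

for name, model in base_learners.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))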
Voting
Voting – Hard and Soft Voting
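A minimal sketch of hard versus soft voting, assuming scikit-learn's VotingClassifier and synthetic data:

# Hard vs. soft voting with scikit-learn (synthetic data, illustrative only).
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(max_depth=3)),
    ("nb", GaussianNB()),
]

# Hard voting: each base-learner casts one vote; the majority class wins.
hard = VotingClassifier(estimators=estimators, voting="hard").fit(X, y)
# Soft voting: predicted class probabilities are averaged, then argmax is taken.
soft = VotingClassifier(estimators=estimators, voting="soft").fit(X, y)
print(hard.predict(X[:5]), soft.predict(X[:5]))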
Fixed Combination Rules
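Fixed combination rules are untrained combiners, typically the sum (mean), median, product, maximum, or minimum of the base-learners' posterior probabilities. A small NumPy sketch with made-up probabilities:

# Common fixed combination rules applied to base-learner posterior
# probabilities (the probabilities below are made up for illustration).
import numpy as np

# Rows: 3 base-learners; columns: P(class 0), P(class 1) for one test instance.
P = np.array([[0.7, 0.3],
              [0.6, 0.4],
              [0.2, 0.8]])

rules = {
    "sum/mean": P.mean(axis=0),
    "product":  P.prod(axis=0),
    "median":   np.median(P, axis=0),
    "max":      P.max(axis=0),
    "min":      P.min(axis=0),
}
for name, scores in rules.items():
    print(f"{name}: scores={scores}, predicted class={int(np.argmax(scores))}")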
Error-Correcting Output Codes
1. Classification Task:
   o The task is divided into multiple subtasks. Instead of solving one large classification problem, it is broken down into simpler binary classification problems.

2. Simpler Classification Problems:
   o Each binary classifier (a model that outputs either -1 or +1) focuses on a specific aspect of the task, helping to simplify the overall problem.

3. Binary Classifiers:
   o Each classifier outputs either -1 or +1. These outputs correspond to the class predictions for specific parts of the task.

4. Code Matrix (W):
   o A matrix W is introduced with K rows and L columns.
   o The rows represent the different classes, and the columns correspond to the different classifiers (base-learners).
   o The values within the matrix are -1 or +1 and act as the "code" for each class: each class is represented by a unique combination of -1s and +1s across the classifiers.
Error-Correcting Output Codes

• One per class, L = K:

  W =
    [ +1  -1  -1  -1 ]
    [ -1  +1  -1  -1 ]
    [ -1  -1  +1  -1 ]
    [ -1  -1  -1  +1 ]

• Pairwise, L = K(K-1)/2:

  W =
    [ +1  +1  +1   0   0   0 ]
    [ -1   0   0  +1  +1   0 ]
    [  0  -1   0  -1   0  +1 ]
    [  0   0  -1   0  -1  -1 ]
ECOC - Working
ECOC - 10-class problem
Error-Correcting Output Codes
Types of ECOC

• There are several variations of ECOC, each with its own characteristics and
applications.

• One-vs-All (OvA): In the One-vs-All approach, each class is compared against all other classes. This results in a code matrix where each column has one class labeled as +1 and all others as -1. This method is simple but may not be optimal for all problems.

• One-vs-One (OvO): In the One-vs-One approach, each pair of classes is compared, resulting in a code matrix where each column represents a binary classifier for a pair of classes. This method can be more accurate but requires training more classifiers.

• Dense and Sparse Codes: Dense codes use a larger number of binary classifiers, resulting in more robust error correction but higher computational cost. Sparse codes use fewer classifiers, reducing computational cost but potentially sacrificing some robustness (a scikit-learn sketch follows below).
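A hedged sketch using scikit-learn's OutputCodeClassifier, whose code_size parameter controls how many binary learners are trained relative to the number of classes (the dense-versus-sparse trade-off above):

# ECOC with scikit-learn's OutputCodeClassifier on a synthetic 4-class problem.
# code_size sets the number of binary classifiers relative to the number of
# classes (e.g., 2.0 trains roughly twice as many classifiers as classes).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OutputCodeClassifier

X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=4, random_state=0)

ecoc = OutputCodeClassifier(LogisticRegression(max_iter=1000),
                            code_size=2.0, random_state=0)
ecoc.fit(X, y)
print(ecoc.predict(X[:5]))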
Error-Correcting Output Codes

Advantages of Error Correcting Output Codes (ECOC)

• Robustness to Errors

• Improved Generalization

• Flexibility

Applications of Error Correcting Output Codes (ECOC)

1. Image Classification

2. Text Classification

3. Bioinformatics

4. Speech Recognition
Bagging

• Use bootstrapping to generate L training sets and train one base-learner with each (Breiman, 1996)
• Use voting (average or median for regression)
• Unstable algorithms profit from bagging
Bagging
What Is Bagging?

Bagging, an abbreviation for Bootstrap Aggregating, is a machine learning ensemble strategy for enhancing the reliability and precision of predictive models. It entails generating numerous subsets of the training data by employing random sampling with replacement. These subsets train multiple base learners, such as decision trees, neural networks, or other models.

During prediction, the outputs of these base learners are aggregated, often by averaging (for regression tasks) or voting (for classification tasks), to produce the final prediction. Bagging helps to reduce overfitting by introducing diversity among the base learners and improves the overall performance by reducing variance and increasing robustness.
Bagging method
Bootstrap Aggregating is an ensemble learning technique that integrates many models to produce a more accurate and robust prediction model. The following stages are included in the bagging algorithm (a small sketch follows the list):

Bootstrap Sampling – the process of randomly sampling a dataset with replacement to generate various subsets known as bootstrap samples. The size of each subset is the same as the original dataset.

Base Model Training – a base model, such as a decision tree or a neural network, is trained individually on each bootstrap sample. Because the subsets are not identical, each base model generates a separate prediction model.

Aggregation – the result of each base model is then aggregated, commonly by taking the average for regression problems or the mode (majority vote) for classification problems. This aggregation contributes to the reduction of variance and the improvement of the generalization performance of the final prediction model.

Prediction – the completed model is used to forecast fresh data.
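A minimal sketch of these stages, assuming scikit-learn and NumPy and using decision trees as base models (synthetic, binary-label data; illustrative only):

# Illustrative bagging loop: bootstrap sampling, base-model training,
# and majority-vote aggregation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

n_models = 25
models = []
for _ in range(n_models):
    # Bootstrap Sampling: draw n indices with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    # Base Model Training: fit one tree per bootstrap sample.
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Aggregation: majority vote across the base models (labels are 0/1 here,
# so the vote is simply "more than half of the models predicted 1").
all_preds = np.array([m.predict(X) for m in models])   # shape (n_models, n_samples)
bagged_pred = (all_preds.mean(axis=0) > 0.5).astype(int)
print("training accuracy of the bagged vote:", (bagged_pred == y).mean())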


Steps of Bagging
Dataset Preparation: Prepare your dataset, ensuring it's properly cleaned and preprocessed. Split it into a training set and a test set.

Bootstrap Sampling: Randomly sample from the training dataset with replacement to create multiple bootstrap samples. Each bootstrap sample should typically have the same size as the original dataset, but some data points may be repeated while others may be omitted.

Model Training: Train a base model (e.g., decision tree, neural network, etc.) on each bootstrap sample. Each model should be trained independently of the others.

Prediction Generation: Use each trained model to predict the test dataset.

Combining Predictions: Combine the predictions from all the models. You can use majority voting to determine the final predicted class for classification tasks. For regression tasks, you can average the predictions.
Steps of Bagging
Evaluation: Evaluate the bagging ensemble's performance on the test dataset using appropriate metrics (e.g., accuracy, F1 score, mean squared error, etc.).

Hyperparameter Tuning: If necessary, tune the hyperparameters of the base model(s) or the bagging ensemble itself using techniques like cross-validation.

Deployment: Once you're satisfied with the performance of the bagging ensemble, deploy it to make predictions on new, unseen data. (An end-to-end sketch with scikit-learn follows below.)
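A hedged end-to-end sketch of these steps using scikit-learn's BaggingClassifier on synthetic data:

# Bagging with scikit-learn: dataset preparation, training, prediction,
# and evaluation (synthetic data, decision-tree base models).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Dataset Preparation: synthetic data split into train and test sets.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

# Bootstrap Sampling, Model Training, and Combining Predictions are handled
# internally: 50 trees, each fit on a bootstrap sample, combined by voting.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            bootstrap=True, random_state=0)
bagging.fit(X_train, y_train)

# Prediction Generation and Evaluation.
y_pred = bagging.predict(X_test)
print("test accuracy:", accuracy_score(y_test, y_pred))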
Boosting

The term ‘Boosting’ refers to a family of algorithms which convert weak learners into strong learners.
Boosting
Training of Boosting Model
How Do Boosting Algorithms Work?
• Step 1: The base learner takes all the distributions and assigns equal weight or attention to each observation.

• Step 2: If there is any prediction error caused by the first base learning algorithm, we pay higher attention to the observations with prediction errors, and then apply the next base learning algorithm.

• Step 3: Iterate Step 2 until the limit of the base learning algorithm is reached or higher accuracy is achieved.

Finally, boosting combines the outputs of the weak learners and creates a strong learner, which eventually improves the prediction power of the model. Boosting pays more attention to examples which are misclassified or have higher errors under the preceding weak rules. (A simplified sketch follows below.)
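A simplified, AdaBoost-style sketch of this reweighting loop, assuming scikit-learn decision stumps; it is illustrative and not the exact algorithm from the slides:

# Simplified boosting loop with decision stumps: equal starting weights,
# reweighting of misclassified observations, and a weighted final vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
y_pm = np.where(y == 1, 1, -1)          # labels in {-1, +1}

n_rounds = 5
w = np.full(len(X), 1.0 / len(X))       # Step 1: equal weight per observation
stumps, alphas = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y_pm, sample_weight=w)
    pred = stump.predict(X)
    err = np.sum(w * (pred != y_pm)) / np.sum(w)
    alpha = 0.5 * np.log((1 - err) / (err + 1e-12))   # learner's say in the vote
    # Step 2: increase the weight of misclassified observations.
    w = w * np.exp(-alpha * y_pm * pred)
    w = w / w.sum()
    stumps.append(stump)
    alphas.append(alpha)

# Final strong learner: weighted vote of the weak learners.
agg = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
strong_pred = np.sign(agg)
print("training accuracy:", (strong_pred == y_pm).mean())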
AdaBoost
• Box 1: You can see that we have assigned equal weights to each data point and applied a decision stump to classify them as + (plus) or – (minus). The decision stump (D1) has generated a vertical line on the left side to classify the data points. We see that this vertical line has incorrectly predicted three + (plus) as – (minus). In such a case, we assign higher weights to these three + (plus) and apply another decision stump.
AdaBoost
• Box 2: Here, you can see that the size of the three incorrectly predicted + (plus) is bigger compared to the rest of the data points. In this case, the second decision stump (D2) will try to predict them correctly. Now, a vertical line (D2) on the right side of this box has classified the three misclassified + (plus) correctly. But again, it has caused misclassification errors, this time with three – (minus). Again, we will assign higher weights to the three – (minus) and apply another decision stump.
AdaBoost
• Box 3: Here, the three – (minus) are given higher weights. A decision stump (D3) is applied to predict these misclassified observations correctly. This time a horizontal line is generated to classify the + (plus) and – (minus) based on the higher weights of the misclassified observations.
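The three boxes describe decision stumps combined by AdaBoost; a hedged scikit-learn equivalent on synthetic data (not the toy + / – example in the figures):

# AdaBoost over three decision stumps, mirroring the three boxes above.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Each weak learner is a depth-1 tree (a decision stump); three rounds of
# boosting correspond to the three stumps D1, D2, D3 in the figures.
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         n_estimators=3, random_state=0)
ada.fit(X, y)
print("training accuracy with 3 stumps:", ada.score(X, y))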
AdaBoost
Boosting
Stacking

• Stacked generalization is an ensemble method where a new model learns how to best combine the predictions from multiple existing models.
• In stacking, an algorithm takes the outputs of sub-models as input and attempts to learn how to best combine the input predictions to make a better output prediction.
• The combiner can be non-linear.
Stacking

• Combiner f() is another learner (Wolpert, 1992)
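A hedged sketch with scikit-learn's StackingClassifier, where the combiner f() is itself a trained learner (here a logistic regression):

# Stacking with scikit-learn: base-learner predictions become the inputs
# to a second-level learner, the combiner f().
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)

stack = StackingClassifier(
    estimators=[("dt", DecisionTreeClassifier(max_depth=3)),
                ("knn", KNeighborsClassifier(n_neighbors=5))],
    final_estimator=LogisticRegression(max_iter=1000),  # the combiner f()
    cv=5,  # base-learner predictions are produced out-of-fold to avoid leakage
)
stack.fit(X, y)
print(stack.score(X, y))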
Stacking
Architecture of Stacking
Cascading

• Use d_j (the j-th learner) only if the preceding learners are not confident

• Cascade learners in order of complexity
Key Components of Cascading

1. Stages:
   o Each stage typically consists of a classifier or model that operates on the data.
   o Early stages use simpler models that quickly identify and discard obvious negative samples.
   o Later stages employ more complex models that handle the remaining, more challenging cases.
2. Decision Thresholds:
   o Each classifier in the cascade has a threshold that determines whether a sample is classified positively or negatively.
   o The thresholds can be adjusted to balance sensitivity (true positive rate) and specificity (true negative rate).
3. Error Focus:
   o Each subsequent model in the cascade is trained to focus on the errors made by the previous classifiers, enhancing the overall model's ability to correct mistakes.
How Cascading Works

1. Initial Classifier: The first model is trained on the entire dataset. It makes predictions and classifies samples.
2. Filtering:
   o Samples that are classified as negatives are discarded.
   o Positive samples are passed to the next stage for further analysis.
3. Subsequent Classifiers:
   o Each subsequent classifier builds on the output of the previous one.
   o This continues until all stages have been applied or until a final decision is made.

(A two-stage sketch follows below.)
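A hedged two-stage sketch of this idea, assuming scikit-learn; the 0.9 confidence threshold and the choice of models are arbitrary illustrations:

# Two-stage cascade: a simple, cheap model handles the instances it is
# confident about; only the uncertain ones are passed to a complex model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

stage1 = LogisticRegression(max_iter=1000).fit(X, y)                # simple, fast
stage2 = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)  # complex
# (In a real cascade, later stages would be trained mainly on the hard
# cases that earlier stages pass on.)

proba1 = stage1.predict_proba(X)
confident = proba1.max(axis=1) >= 0.9      # stage 1 keeps its confident cases

final_pred = np.empty(len(X), dtype=int)
# Classes are 0/1 here, so the argmax column index equals the label.
final_pred[confident] = proba1[confident].argmax(axis=1)
if (~confident).any():
    final_pred[~confident] = stage2.predict(X[~confident])  # pass the rest on

print("fraction handled by stage 1:", confident.mean())
print("overall training accuracy:", (final_pred == y).mean())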
Cascading

• For example, suppose we want to build a machine learning model that detects whether a credit card transaction is fraudulent or not.
Applications of Cascading
