Unit V - Multiple Learners
3. Binary Classifiers:
o Each classifier outputs either -1 or +1. These outputs correspond to the class predictions for
specific parts of the task.
• There are several variations of ECOC, each with its own characteristics and
applications.
• One-vs-All (OvA): In the One-vs-All approach, each class is compared against all other
classes. This results in a code matrix where each column has one class labeled as 1
and all others as -1. This method is simple but may not be optimal for all problems.
• Dense and Sparse Codes: Dense codes use a larger number of binary classifiers,
resulting in more robust error correction but higher computational cost. Sparse codes
use fewer classifiers, reducing computational cost but potentially sacrificing some
robustness.
Error-Correcting Output Codes
Advantages:
• Robustness to Errors
• Improved Generalization
• Flexibility
Applications:
1. Image Classification
2. Text Classification
3. Bioinformatics
4. Speech Recognition
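To make the code-matrix idea concrete, here is a minimal sketch of ECOC using scikit-learn's OutputCodeClassifier; the dataset, base learner, and code_size value are illustrative choices of this note, not prescribed by the slides. A code_size above 1 gives a denser code (more binary classifiers), below 1 a sparser one.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OutputCodeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# code_size controls the density of the code matrix:
# larger values -> more binary classifiers (denser code, stronger error correction),
# smaller values -> fewer classifiers (sparser code, cheaper to train).
ecoc = OutputCodeClassifier(
    LogisticRegression(max_iter=1000),  # the binary base classifier
    code_size=2.0,                      # roughly 2 * n_classes binary problems
    random_state=0,
)
ecoc.fit(X_train, y_train)
print("ECOC test accuracy:", ecoc.score(X_test, y_test))
```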
Bagging
During prediction, the outputs of these base learners are aggregated, often by
averaging (for regression tasks) or voting (for classification tasks), to produce
the final prediction. Bagging helps to reduce overfitting by introducing
diversity among the base learners and improves the overall performance by
reducing variance and increasing robustness.
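As a minimal sketch (assuming scikit-learn and an illustrative dataset, both choices of this note), bagging decision trees with majority voting can be written as:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bag = BaggingClassifier(
    DecisionTreeClassifier(),  # base learner
    n_estimators=50,           # number of bootstrap samples / base models
    bootstrap=True,            # sample the training set with replacement
    random_state=0,
)
bag.fit(X_train, y_train)
print("Bagging test accuracy:", bag.score(X_test, y_test))  # vote of 50 trees
```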
Bagging method
Bootstrap Aggregating is an ensemble learning technique that combines many
models to produce a more accurate and robust prediction model. The bagging
algorithm includes the following stages:
Bootstrap Sampling – Multiple subsets of the training data are created by
sampling with replacement, so each subset differs slightly from the others.
Base Model Training – A base model, such as a decision tree or a neural network, is
trained independently on each bootstrap sample. Because the subsets are not identical,
each base model produces a different prediction model.
Aggregation – The outputs of the base models are then combined, typically by taking
the average for regression problems or the mode (majority vote) for classification
problems. This aggregation reduces variance and improves the generalization
performance of the final prediction model.
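The three stages above can also be sketched by hand; the model count, tree depth, and toy data below are illustrative assumptions of this note.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)   # binary labels 0/1
rng = np.random.default_rng(0)
n_models, n = 25, len(X)

models = []
for _ in range(n_models):
    # Stage 1: bootstrap sampling - draw n indices with replacement
    idx = rng.integers(0, n, size=n)
    # Stage 2: base-model training - fit one model per bootstrap sample
    models.append(DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx]))

# Stage 3: aggregation - majority vote across the 25 models
all_preds = np.stack([m.predict(X) for m in models])        # shape (n_models, n)
y_hat = (all_preds.mean(axis=0) >= 0.5).astype(int)         # vote for 0/1 labels
print("Training accuracy of the bagged vote:", (y_hat == y).mean())
```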
Boosting
• Step 1: The base learning algorithm assigns equal weight to each observation and
makes an initial prediction.
• Step 2: If the first base learning algorithm makes any prediction errors, we pay
higher attention to the observations with prediction errors. Then we apply the next
base learning algorithm.
• Step 3: Repeat Step 2 until the limit on the number of base learning algorithms is
reached or the desired accuracy is achieved.
Finally, boosting combines the outputs of the weak learners to create a strong
learner, which improves the predictive power of the model. Boosting pays more
attention to examples that are misclassified or have higher errors under the
preceding weak rules.
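AdaBoost, discussed next, is one concrete boosting algorithm that follows these steps. A minimal sketch with scikit-learn, assuming decision stumps as the weak learners and an illustrative dataset (both are choices of this note):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

boost = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # weak learner: a decision stump
    n_estimators=50,                      # number of boosting rounds (the Step 3 limit)
    random_state=0,
)
boost.fit(X_train, y_train)               # misclassified samples are reweighted each round
print("AdaBoost test accuracy:", boost.score(X_test, y_test))
```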
AdaBoost
• Box 1: You can see that we have assigned equal weights to each data point and
applied a decision stump to classify them as + (plus) or – (minus). The decision
stump (D1) has generated a vertical line on the left side to classify the data points.
We see that this vertical line has incorrectly predicted three + (plus) as – (minus).
In such a case, we assign higher weights to these three + (plus) and apply another
decision stump.
• Box 2: Here, you can see that the three incorrectly predicted + (plus) are drawn
larger than the rest of the data points. In this case, the second decision stump (D2)
will try to predict them correctly. Now, a vertical line (D2) on the right side of this
box has classified the three misclassified + (plus) correctly. But it has again caused
misclassification errors, this time for three – (minus). So again, we assign higher
weights to the three – (minus) and apply another decision stump.
• Box 3: Here, the three – (minus) are given higher weights. A decision stump (D3)
is applied to predict these misclassified observations correctly. This time a
horizontal line is generated to separate + (plus) and – (minus), based on the higher
weights of the misclassified observations.
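The Boxes 1-3 procedure (equal weights, reweighting the misclassified points, fitting another stump) can be sketched by hand as below; the number of rounds and the toy data are illustrative assumptions, and the final weighted vote of the stumps stands in for the combined strong learner.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=300, n_features=2, n_informative=2,
                             n_redundant=0, random_state=1)
y = np.where(y01 == 1, 1, -1)             # labels as +1 / -1 (the + and - classes)
w = np.full(len(X), 1.0 / len(X))         # Box 1: equal weights for every point

stumps, alphas = [], []
for _ in range(3):                        # three rounds, like D1, D2, D3
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = w[pred != y].sum()              # weighted error of this stump
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))
    w *= np.exp(-alpha * y * pred)        # enlarge weights of misclassified points
    w /= w.sum()
    stumps.append(stump)
    alphas.append(alpha)

# Combination: a weighted vote of D1, D2, D3 gives the strong learner
F = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print("Training accuracy of the combined stumps:", (np.sign(F) == y).mean())
```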
AdaBoost
• Box 3: Here, three – (minus) are given higher weights. A decision stump (D3) is
applied to predict these mis-classified observation correctly. This time a
horizontal line is generated to classify + (plus) and – (minus) based on higher
weight of mis-classified observation.
AdaBoost
Boosting
Stacking
• The combiner f(·) is itself another learner, trained on the outputs of the base
learners (Wolpert, 1992).
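A minimal sketch of stacking with scikit-learn (the dataset and base learners are illustrative assumptions of this note): the combiner f() is itself a learner, here a logistic regression trained on the base learners' out-of-fold predictions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[                       # level-0 base learners
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # the combiner f()
    cv=5,                              # out-of-fold predictions used to train f()
)
stack.fit(X_train, y_train)
print("Stacking test accuracy:", stack.score(X_test, y_test))
```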
Architecture of Stacking
Cascading
• Use learner dj only if the preceding learners are not confident.
• Cascade the learners in order of increasing complexity.
Key Components of Cascading
1. Stages:
o Each stage typically consists of a classifier or model that operates on the data.
o Early stages use simpler models that quickly identify and discard obvious negative samples.
o Later stages employ more complex models that handle the remaining, more challenging cases.
2. Decision Thresholds:
o Each classifier in the cascade has a threshold that determines whether a sample is classified positively or negatively.
o The thresholds can be adjusted to balance sensitivity (true positive rate) and specificity (true negative rate).
3. Error Focus:
o Each subsequent model in the cascade is trained to focus on the errors made by the previous classifiers, enhancing the overall model's ability to correct mistakes.
How Cascading Works
1. Initial Classifier: The first model is trained on the entire dataset. It makes an
initial prediction for every sample.
2. Filtering:
o Samples that the first classifier confidently labels as negative are discarded.
o Positive samples are passed to the next stage for further analysis.
3. Subsequent Classifiers:
o Each later stage applies a more complex model to the harder samples passed on from
the previous stage.
o This continues until all stages have been applied or until a final decision is made.
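A hand-rolled two-stage cascade sketch is shown below; the models, toy data, and the 0.05 confidence threshold are illustrative assumptions rather than a fixed recipe. A cheap first-stage model discards samples it confidently labels negative, and a heavier second-stage model handles only the remaining, harder samples.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stage 1: a simple, fast classifier trained on everything
stage1 = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Stage 2: a more complex classifier trained on the samples stage 1 is unsure about
p1_train = stage1.predict_proba(X_train)[:, 1]
hard = p1_train >= 0.05                      # not confidently negative
stage2 = RandomForestClassifier(n_estimators=200, random_state=0).fit(
    X_train[hard], y_train[hard])

# Prediction: confident negatives stop at stage 1; the rest go to stage 2
p1 = stage1.predict_proba(X_test)[:, 1]
y_pred = np.zeros(len(X_test), dtype=int)    # default: negative
to_stage2 = p1 >= 0.05
y_pred[to_stage2] = stage2.predict(X_test[to_stage2])

print("Cascade accuracy:", (y_pred == y_test).mean())
print("Fraction of samples reaching stage 2:", to_stage2.mean())
```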
• For example, suppose we want to build a machine learning model that detects
whether a credit card transaction is fraudulent or not.
Applications of Cascading