Ensemble Learning and Random Forest 4th
Ensemble Learning
Ensemble means ‘a collection of things’. In Machine Learning terminology,
ensemble learning refers to the approach of combining multiple ML models to
produce a prediction that is more accurate and robust than that of any individual model.
It trains an ensemble of fast base algorithms (classifiers), such as decision trees, and
combines their outputs, most commonly in one of the following ways:
a. Max Voting
Each model votes for a class, and the class with the most votes becomes the final prediction.
Example:
If three models predict labels as [A, A, B], the final prediction is A.
b. Averaging
The final prediction is the simple average of the predictions from all models.
Example:
If three models predict outputs as [2.5, 3.0, 3.5], the final prediction is (2.5 + 3.0 + 3.5) / 3 = 3.0.
c. Weighted Average
Similar to averaging but assigns different weights to models based on their performance.
Final Prediction = Σ (weight_i × prediction_i), where the weights typically sum to 1.
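As a quick illustration, the following is a minimal NumPy sketch of the three combination rules above; the prediction values and weights are made up for illustration, not outputs of real trained models.

```python
# A minimal sketch of max voting, averaging, and weighted averaging.
# The predictions and weights below are assumed illustrative values.
import numpy as np
from collections import Counter

# a. Max Voting (classification): the most common label wins
labels = ["A", "A", "B"]
print(Counter(labels).most_common(1)[0][0])   # -> "A"

# b. Averaging (regression): mean of the individual predictions
preds = np.array([2.5, 3.0, 3.5])
print(preds.mean())                            # -> 3.0

# c. Weighted Average: weights reflect each model's assumed performance
weights = np.array([0.5, 0.3, 0.2])            # assumed weights, sum to 1
print(np.sum(weights * preds))                 # -> 2.85
```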
Voting Classifier
A voting classifier is an ensemble model that is trained on a collection of several
individual models and forecasts an output (class) based on the class that the
ensemble, taken together, is most likely to choose.
Hard Voting: In hard voting, the predicted output class is the class that receives
the majority of votes, i.e., the class predicted most often across the individual
classifiers.
Example:
Suppose three classifiers predict the class of an image: Model 1 predicts Dog, Model 2 predicts Dog, and Model 3 predicts Cat.
Final Prediction = Dog (because "Dog" has the majority of votes, 2 out of 3).
Soft Voting: In soft voting, the class probabilities predicted by each classifier are
averaged, and the class with the highest average probability becomes the final prediction.
Why Use a Voting Classifier?
In short, a Voting Classifier makes decisions based on the collective wisdom of multiple
models, improving the chances of accurate predictions.
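The sketch below shows a voting classifier in scikit-learn with both hard and soft voting; the choice of base estimators and the synthetic dataset are assumptions for illustration.

```python
# A minimal scikit-learn sketch of hard vs. soft voting.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)  # assumed toy data

estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(max_depth=3)),
    ("nb", GaussianNB()),
]

# Hard voting: the majority class among the three classifiers wins
hard_clf = VotingClassifier(estimators=estimators, voting="hard").fit(X, y)

# Soft voting: the predicted class probabilities are averaged
soft_clf = VotingClassifier(estimators=estimators, voting="soft").fit(X, y)

print(hard_clf.predict(X[:5]))
print(soft_clf.predict_proba(X[:5]))
```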
1. Bagging
Bagging (Bootstrap Aggregating) involves training multiple models on random subsets of the
training data drawn with replacement. This means some data points may appear multiple times
in a subset, while others may not appear at all. Each model is trained independently, and their
results are combined (e.g., by averaging for regression or voting for classification).
Bagging helps improve accuracy and reduce overfitting, especially in models that
have high variance.
How It Works:
Key Features:
Advantages:
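The sketch below shows, under an assumed toy dataset and hyperparameters, how bagging can be set up with scikit-learn's BaggingClassifier (the base-learner argument is named estimator in recent scikit-learn versions, base_estimator in older ones).

```python
# A minimal bagging sketch: many trees, each trained on a bootstrap sample.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # assumed toy data

bag_clf = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,      # 100 trees, each on its own random subset
    max_samples=0.8,       # each subset uses 80% of the rows
    bootstrap=True,        # sampling WITH replacement -> bagging
    random_state=0,
).fit(X, y)

print(bag_clf.score(X, y))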
2. Pasting
Pasting is an ensemble technique similar to bagging, except that in pasting the
sampling is done without replacement, i.e., an observation can be present in
only one subset. Since pasting limits the diversity of the models, its performance is
often suboptimal compared to bagging, particularly in the case of small datasets.
However, pasting is preferred over bagging for very large datasets, owing to its
computational efficiency.
Pasting is similar to bagging, but the key difference is that it uses random subsets of the
training data without replacement. This means no data point will be repeated in a subset.
How It Works:
Key Features:
Advantages:
Data Diversity: In bagging, some data points are repeated within a subset; in pasting, all data points in a subset are unique.
In Summary:
Bagging: Samples data with replacement, reduces variance, works well for unstable
models.
Pasting: Samples data without replacement, so every point in a subset is unique; well
suited to very large datasets.
Both methods improve the robustness and accuracy of machine learning models!
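For comparison, the sketch below shows pasting with the same assumed setup as the bagging example above; in scikit-learn the only change is bootstrap=False.

```python
# A minimal pasting sketch: rows are drawn WITHOUT replacement.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # assumed toy data

paste_clf = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    max_samples=0.8,       # 80% of the rows per model
    bootstrap=False,       # sampling WITHOUT replacement -> pasting
    random_state=0,
).fit(X, y)

print(paste_clf.score(X, y))
```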
Out-of-Bag (OOB) Evaluation
Random forests do not require a separate validation dataset. Most random forests use a
technique called out-of-bag evaluation (OOB evaluation) to evaluate the quality of the model.
OOB evaluation treats the training set as if it were the test set of a cross-validation.
OOB Evaluation is a method used in Bagging (e.g., Random Forests) to evaluate model
performance without needing a separate validation set.
How It Works:
Key Advantages:
In Summary:
OOB Evaluation provides an unbiased estimate of a Bagging model's accuracy using data left
out during training.
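A minimal sketch of OOB evaluation in scikit-learn is shown below, assuming a synthetic dataset; setting oob_score=True makes each tree get evaluated on the training rows it never saw in its bootstrap sample.

```python
# A minimal OOB-evaluation sketch with a random forest.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, random_state=0)  # assumed toy data

forest = RandomForestClassifier(
    n_estimators=200,
    bootstrap=True,    # OOB evaluation requires bootstrap sampling
    oob_score=True,    # score each tree on its out-of-bag samples
    random_state=0,
).fit(X, y)

print("OOB accuracy estimate:", forest.oob_score_)
```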
1. Random Patches
Random Patches involves randomly sampling both instances (rows) and features (columns)
from the dataset to train each model. This creates unique training sets for each model in the
ensemble.
How It Works:
Advantages:
2. Random Subspaces
Random Subspaces focuses on sampling only features (columns) randomly while using all
the data points (rows) for training. This ensures that each model trains on a unique set of
features.
How It Works:
Advantages:
Key Differences:
In Summary:
Random Patches randomizes both rows and columns to train diverse models.
Random Subspaces randomizes only columns (features), keeping all rows for training.
Both methods enhance ensemble diversity and help prevent overfitting.
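The sketch below contrasts the two schemes using BaggingClassifier's sampling parameters; the dataset and the sampling fractions are illustrative assumptions.

```python
# Random Patches vs. Random Subspaces via BaggingClassifier's knobs.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Random Patches: sample BOTH rows and columns for every model
patches = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    max_samples=0.7, bootstrap=True,            # random rows
    max_features=0.5, bootstrap_features=True,  # random columns
    random_state=0,
).fit(X, y)

# Random Subspaces: keep ALL rows, sample only the columns
subspaces = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    max_samples=1.0, bootstrap=False,           # all rows
    max_features=0.5, bootstrap_features=True,  # random columns
    random_state=0,
).fit(X, y)

print(patches.score(X, y), subspaces.score(X, y))
```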
Random Forest
As the name suggests, "Random Forest is a classifier that contains a number of decision trees on
various subsets of the given dataset and takes the average to improve the predictive accuracy of
that dataset." Instead of relying on one decision tree, the random forest takes the prediction from
each tree and, based on the majority vote of those predictions, outputs the final result.
A greater number of trees in the forest leads to higher accuracy and helps prevent the problem of
overfitting.
The working of the Random Forest algorithm can be summarized in the following steps:
1. Select Random Data Points (K): Randomly select K data points from the training set (with
replacement).
2. Build Decision Trees: For each subset, build a decision tree based on the selected data
points.
3. Repeat: Repeat the above steps to create N decision trees in the forest.
4. Aggregate Predictions: For a new data point, each tree makes a prediction; the final output is
the majority vote (classification) or the average (regression) of all the trees, as shown in the
sketch below.
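A minimal sketch of these steps with scikit-learn's RandomForestClassifier, assuming a synthetic dataset:

```python
# A minimal random-forest sketch: bootstrap samples + random feature subsets.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)  # assumed toy data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

forest = RandomForestClassifier(
    n_estimators=100,     # N decision trees, each on a bootstrap sample
    max_features="sqrt",  # random subset of features at every split
    random_state=42,
).fit(X_train, y_train)

# Each tree votes; the majority class is the final prediction
print(forest.predict(X_test[:5]))
print("test accuracy:", forest.score(X_test, y_test))
```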
Applications of Random Forest
1. Banking: The banking sector mostly uses this algorithm for the identification of loan risk.
2. Medicine: With the help of this algorithm, disease trends and risks of the disease can be
identified.
3. Land Use: We can identify the areas of similar land use by this algorithm.
4. Marketing: Marketing trends can be identified using this algorithm.
These characteristics make Random Forest robust, efficient, and reliable for various tasks.
Boosting
Boosting is an ensemble modeling technique that attempts to build a strong
classifier from a number of weak classifiers. It does this by building models in
series. First, a model is built from the training data. Then a second model is built
that tries to correct the errors present in the first model. This procedure continues,
and models are added until either the complete training data set is predicted
correctly or the maximum number of models is added.
Advantages of Boosting
AdaBoost
Definition: AdaBoost (Adaptive Boosting) combines multiple weak classifiers (e.g., decision
stumps) to form a strong classifier by focusing on hard-to-classify examples.
1. How It Works:
Initialize Weights: Start with equal weights for all data points.
Train Weak Learner: Build a weak model on the data.
Update Weights: Increase weights of misclassified points for the next iteration.
Combine Learners: Final prediction is a weighted sum of all weak learners.
2. Features:
Focuses on difficult examples.
Handles classification and regression.
3. Pros: Improves accuracy, easy to implement, robust for small datasets.
4. Cons: Sensitive to outliers and noisy data.
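A minimal AdaBoost sketch with decision stumps as weak learners is shown below; the dataset and hyperparameters are assumptions (and the base-learner argument is named estimator in recent scikit-learn, base_estimator in older versions).

```python
# A minimal AdaBoost sketch: sequentially added decision stumps.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # assumed toy data

ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # weak learner: a stump
    n_estimators=100,     # weak learners are added one after another
    learning_rate=0.5,    # shrinks each learner's contribution
    random_state=0,
).fit(X, y)

print(ada.score(X, y))
```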
Stacking
Stacking is one of the most popular ensemble machine learning techniques: it uses the
predictions of multiple models to build a new model and improve overall performance.
Stacking enables us to train multiple models to solve a similar problem and, based on their
combined output, builds a new model with improved performance.
Stacking is an ensemble learning technique that combines the predictions of multiple base
models (also called level-1 models) through a meta-model (also called a level-2 model). This
approach leverages the strengths of different models to produce a more robust and accurate
final prediction.
In stacking, an algorithm takes the outputs of sub-models as input and attempts to learn how to
best combine the input predictions to make a better output prediction.
Stacking is also known as stacked generalization and is an extended form of the Model Averaging
Ensemble technique, in which the sub-models contribute according to their performance (their
weights) and a new model with better predictions is built. This new model is stacked on top of the
others; this is the reason why it is named stacking.
The basic architecture of stacking consists of the following components:
1. Original Data: Divided into n-folds, used for training and testing.
2. Base Models (Level-0 Models): Trained on subsets of the data to produce predictions.
3. Level-0 Predictions: Outputs from the base models.
4. Meta-Model (Level-1 Model): Combines level-0 predictions to produce final results.
5. Level-1 Prediction: The meta-model is trained on level-0 predictions and outputs the final
prediction.
Key Idea: The meta-model learns how to best integrate base model predictions for improved
accuracy.
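A minimal stacking sketch with scikit-learn's StackingClassifier follows; the particular base models and meta-model are assumptions for illustration.

```python
# A minimal stacking sketch: level-0 base models + a level-1 meta-model.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)  # assumed toy data

base_models = [                      # level-0 models
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("svc", SVC(probability=True, random_state=0)),
]

stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),  # level-1 meta-model
    cv=5,   # level-0 predictions come from cross-validated folds
).fit(X, y)

print(stack.score(X, y))
```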
Questions
1. Discuss how the Random Forest algorithm gives output for regression problems.
In regression tasks, Random Forest predicts a continuous output (numerical
value) rather than a class. Here's how it operates:
1. Training Phase:
Multiple decision trees are built using different bootstrap samples from
the training data.
Each tree is trained independently, focusing on minimizing the error (e.g.,
Mean Squared Error) during splits.
2. Prediction Phase:
When a new data point is provided, each decision tree in the forest
predicts a numerical value (regression output).
3. Final Output:
The Random Forest aggregates the predictions from all the decision trees.
Final Prediction = Average of the outputs from all individual trees.
Example
If a Random Forest consists of 5 decision trees, and the trees predict values for
a new data point as [10, 12, 11, 13, 12]:
Final Prediction = (10 + 12 + 11 + 13 + 12) / 5 = 11.6
The output is the average of all tree predictions, making it robust and
accurate for regression problems.
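A minimal sketch of Random Forest regression, assuming synthetic data, showing that the forest's prediction is the mean of the individual trees' predictions:

```python
# A minimal random-forest regression sketch: the forest averages its trees.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=5, random_state=0)  # assumed

reg = RandomForestRegressor(n_estimators=5, random_state=0).fit(X, y)

x_new = X[:1]
per_tree = [tree.predict(x_new)[0] for tree in reg.estimators_]
print("per-tree predictions:", np.round(per_tree, 2))
print("forest prediction (their mean):", reg.predict(x_new)[0])
print("mean of per-tree values:       ", np.mean(per_tree))
```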
3. What are Bagging and Boosting? Write a few differences between them in detail.
Summary
5. Describe the max voting technique in ensemble learning. How does it work in
the context of classification?
The Max Voting technique is a simple and effective method used in ensemble learning for
classification tasks. It combines the predictions of multiple models (classifiers) and selects the
output class that receives the highest number of votes.
How It Works
Example
Suppose we have three models predicting the class of a given data point, and the possible classes
are A, B, and C.
Model 1 predicts: A
Model 2 predicts: B
Model 3 predicts: A
A: 2 votes
B: 1 vote
C: 0 votes
Since class A receives the highest number of votes, the final prediction is A.
Key Characteristics
Advantages
Limitations
Advantages (of the Averaging technique)
Reduces Variance: Minimizes the variability in predictions, leading to more stable results.
Improves Accuracy: Aggregating multiple predictions often yields better results than
using a single model.
Limitations
Not Suitable for Classification: Averaging is primarily used for regression; it doesn't
naturally handle categorical outputs.
Dependent on Model Quality: If individual models are poorly trained, averaging won't
improve performance significantly.