
Ensemble Learning and Random Forest

Ensemble Learning
Ensemble means ‘a collection of things’, and in machine learning terminology, ensemble learning refers to the approach of combining multiple ML models to produce a more accurate and robust prediction than any individual model. A typical ensemble trains a collection of fast algorithms (classifiers), such as decision trees, and allows them to vote on the final output.

Ensemble Learning is a technique in machine learning where multiple models (called base models or learners) are combined to produce a stronger and more accurate model. The idea is that by aggregating the predictions of multiple models, the ensemble reduces variance and bias and improves overall predictive performance.

Why Use Ensembles?


To achieve better generalization and robustness.
To reduce overfitting (variance) or underfitting (bias).

Basic Ensemble Techniques


a. Max Voting

Used primarily for classification tasks.


Each model in the ensemble votes for a class, and the class with the most votes is the
final prediction.

Example:
If three models predict labels as [A, A, B], the final prediction is A.

b. Averaging

Used for regression tasks.


The predictions from all models are averaged to get the final prediction.

Example:
If three models predict outputs as [2.5, 3.0, 3.5], the final prediction is (2.5 + 3.0 + 3.5) / 3 = 3.0.

c. Weighted Average

Similar to averaging, but each model's prediction is assigned a weight based on that model's performance.
Final prediction = sum(weight_i × prediction_i), where the weights are typically chosen to sum to 1.
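These three combination rules can be sketched in a few lines of Python; the prediction values and weights below are made-up illustrative numbers, not outputs of real models.

```python
# Minimal sketch of max voting, averaging, and weighted averaging.
from collections import Counter
import numpy as np

# Max voting (classification): pick the most common predicted label.
class_preds = ["A", "A", "B"]
print(Counter(class_preds).most_common(1)[0][0])  # -> "A"

# Averaging (regression): take the mean of the predicted values.
reg_preds = np.array([2.5, 3.0, 3.5])
print(reg_preds.mean())  # -> 3.0

# Weighted average: weight each prediction by its model's assumed reliability.
weights = np.array([0.5, 0.3, 0.2])   # hypothetical weights that sum to 1
print(np.dot(weights, reg_preds))     # -> 2.5*0.5 + 3.0*0.3 + 3.5*0.2 = 2.85
```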

Voting Classifier
A voting classifier is a machine learning model that is trained on a collection of several models and forecasts an output (class) by choosing the class with the highest combined support across those models.

A Voting Classifier is a machine learning model that combines the predictions of


multiple models to make a more reliable decision. Instead of relying on a single
model, it uses the "wisdom of the crowd" approach to make better predictions.

The idea is simple:

Train multiple models (e.g., Decision Trees, Logistic Regression, etc.).


Combine their predictions to decide the final output

There are primarily two different types of voting classifiers:

Hard Voting: In hard voting, the predicted output class is the class that receives the majority of the votes, i.e., the class predicted most frequently across the individual classifiers.

Example:

Model 1 predicts: Cat

Model 2 predicts: Dog

Model 3 predicts: Dog

Final Prediction = Dog (because "Dog" has the majority of votes, 2 out of 3).

Soft Voting: In soft voting, the class probabilities predicted by each classifier are averaged, and the class with the highest average probability becomes the final prediction.
Why Use a Voting Classifier?

It combines the strengths of different models.


It is more robust and accurate than a single model.
It works well when the models are diverse (e.g., different algorithms).

In short, a Voting Classifier makes decisions based on the collective wisdom of multiple
models, improving the chances of accurate predictions.
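As a rough illustration, here is a minimal scikit-learn sketch of hard and soft voting on a synthetic dataset; the choice of base models and all parameter values are assumptions for demonstration only.

```python
# Minimal sketch: hard vs. soft voting with scikit-learn on toy data.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=42)

estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=42)),
    ("svc", SVC(probability=True, random_state=42)),  # probability=True is required for soft voting
]

hard_vote = VotingClassifier(estimators=estimators, voting="hard").fit(X, y)
soft_vote = VotingClassifier(estimators=estimators, voting="soft").fit(X, y)

print(hard_vote.predict(X[:5]))  # majority class vote
print(soft_vote.predict(X[:5]))  # argmax of the averaged class probabilities
```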

1. Bagging (Bootstrap Aggregating)

Bagging involves training multiple models on random subsets of the training data with
replacement. This means some data points may appear multiple times in a subset, while
others may not appear at all. Each model is trained independently, and their results are
combined (e.g., by averaging for regression or voting for classification).

Bagging helps improve accuracy and reduce overfitting, especially in models that
have high variance.

How It Works:

1. Randomly select subsets of the training data with replacement.


2. Train a model on each subset.
3. Combine the predictions of all models (e.g., majority vote or average).

Key Features:

With Replacement: Some data points can be repeated in a subset.


Goal: Reduces variance and helps prevent overfitting.
Example: Random Forest is a well-known algorithm that uses bagging with decision trees.

Advantages:

Reduces overfitting by averaging multiple models.


Handles high variance models (e.g., decision trees) well.
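A minimal scikit-learn sketch of bagging is shown below; the synthetic dataset and the parameter values (number of estimators, sample fraction) are illustrative assumptions.

```python
# Minimal sketch: bagging decision trees with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bagging = BaggingClassifier(  # the default base learner is a DecisionTreeClassifier
    n_estimators=100,         # number of models, each trained on its own bootstrap sample
    max_samples=0.8,          # each model sees 80% of the training set
    bootstrap=True,           # sample WITH replacement (this is what makes it bagging)
    random_state=0,
)
bagging.fit(X_train, y_train)
print(bagging.score(X_test, y_test))
```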

2. Pasting
Pasting is an ensemble technique similar to bagging, except that in pasting the sampling is done without replacement, i.e. an observation can appear at most once within a given subset. Because pasting limits the diversity of the models, its performance is often suboptimal compared to bagging, particularly on small datasets. However, pasting can be preferred over bagging for very large datasets, owing to its computational efficiency.

Pasting is similar to bagging, but the key difference is that it uses random subsets of the
training data without replacement. This means no data point will be repeated in a subset.

How It Works:

1. Randomly select subsets of the training data without replacement.


2. Train a model on each subset.
3. Combine the predictions of all models (e.g., majority vote or average).

Key Features:

Without Replacement: Each data point appears only once in a subset.


Goal: Reduces overfitting and increases diversity among models.

Advantages:

Ensures more diverse training data for each model.


Useful when the dataset is small and you want all data points to contribute.
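Pasting can be sketched with the same scikit-learn class used for bagging above, simply by turning replacement off; the configuration below is illustrative, not a prescribed setup.

```python
# Minimal sketch: pasting is bagging with sampling WITHOUT replacement.
from sklearn.ensemble import BaggingClassifier

pasting = BaggingClassifier(
    n_estimators=100,
    max_samples=0.8,      # each model sees a distinct 80% subset drawn without replacement
    bootstrap=False,      # no data point is repeated within a subset
    random_state=0,
)
# pasting.fit(X_train, y_train) would then be used exactly like the bagging model above.
```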

Bagging vs. Pasting: Key Differences

| Feature         | Bagging                        | Pasting                    |
|-----------------|--------------------------------|----------------------------|
| Sampling Method | With replacement               | Without replacement        |
| Data Diversity  | Some data points are repeated  | All data points are unique |
| Use Case        | Larger datasets                | Smaller datasets           |
| Example         | Random Forest                  | Rarely used directly       |

When to Use Them?

Bagging is preferred when:


The model has high variance (e.g., decision trees).
The dataset is large, and data repetition isn’t a concern.
Pasting is useful when:
The dataset is small, and you want all data points to be used.
You want to increase model diversity without repeating data points.

In Summary:

Bagging: Samples data with replacement, reduces variance, works well for unstable
models.
Pasting: Samples data without replacement, increases diversity, useful for smaller
datasets.

Both methods improve the robustness and accuracy of machine learning models!

Out-of-Bag (OOB) Evaluation

Random forests do not require a separate validation dataset. Most random forests use a technique called out-of-bag evaluation (OOB evaluation) to assess the quality of the model. OOB evaluation treats the training set as if it were the held-out test set of a cross-validation.

OOB Evaluation is a method used in Bagging (e.g., Random Forests) to evaluate model
performance without needing a separate validation set.

How It Works:

1. In Bagging, each model is trained on a random subset of data (in-bag samples).


2. The remaining data (out-of-bag samples) are left out and used as a test set for that
model.
3. The model’s predictions on OOB samples are compared with actual labels to calculate
accuracy or error.

Key Advantages:

No need for a separate validation set.


Efficient and reliable performance estimation.
Saves data, especially useful for small datasets.

In Summary:

OOB Evaluation provides an unbiased estimate of a Bagging model's accuracy using data left
out during training.
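As a rough sketch, scikit-learn exposes OOB evaluation through the oob_score option; the synthetic data and parameter values below are assumptions for illustration.

```python
# Minimal sketch: OOB evaluation with a random forest.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, random_state=0)

forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)

# Accuracy estimated from the samples each tree did NOT see during its bootstrap draw;
# no separate validation set is needed.
print(forest.oob_score_)
```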

Random Patches and Random Subspaces


1. Random Patches

Random Patches involves randomly sampling both instances (rows) and features (columns)
from the dataset to train each model. This creates unique training sets for each model in the
ensemble.

How It Works:

Randomly select a subset of data points (rows).


Randomly select a subset of features (columns).
Train a model on this patch of the dataset.

Advantages:

Useful for datasets with many features and instances.


Increases model diversity by varying both data and features.
Reduces the risk of overfitting.

2. Random Subspaces

Random Subspaces focuses on sampling only features (columns) randomly while using all
the data points (rows) for training. This ensures that each model trains on a unique set of
features.

How It Works:

Randomly select a subset of features.


Use all the data points (rows) but only the selected features.
Train a model on this reduced feature set.

Advantages:

Works well for high-dimensional datasets (e.g., text or image data).


Reduces overfitting by forcing models to rely on different subsets of features.
Promotes feature diversity among models.

Key Differences:

| Feature            | Random Patches                  | Random Subspaces          |
|--------------------|---------------------------------|---------------------------|
| Sampling Instances | Yes                             | No                        |
| Sampling Features  | Yes                             | Yes                       |
| Use Case           | Large datasets (many rows/cols) | High-dimensional datasets |

In Summary:
Random Patches randomizes both rows and columns to train diverse models.
Random Subspaces randomizes only columns (features), keeping all rows for training.
Both methods enhance ensemble diversity and help prevent overfitting.
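Both schemes can be sketched with scikit-learn's BaggingClassifier by toggling its instance and feature sampling options; the dataset and sampling fractions below are illustrative assumptions.

```python
# Minimal sketch: Random Patches vs. Random Subspaces via BaggingClassifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=1000, n_features=40, random_state=0)

# Random Patches: sample BOTH rows and columns for every base model.
patches = BaggingClassifier(
    n_estimators=50,
    max_samples=0.7,            # subset of instances (rows)
    max_features=0.5,           # subset of features (columns)
    bootstrap=True,
    bootstrap_features=True,
    random_state=0,
).fit(X, y)

# Random Subspaces: keep ALL rows, sample only the columns.
subspaces = BaggingClassifier(
    n_estimators=50,
    max_samples=1.0,            # every model sees all instances
    max_features=0.5,           # but only half of the features
    bootstrap=False,
    bootstrap_features=True,
    random_state=0,
).fit(X, y)
```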

Random Forest Algorithm


Random Forest is a popular machine learning algorithm that belongs to the supervised learning
technique. It can be used for both Classification and Regression problems in ML. It is based on the
concept of ensemble learning, which is a process of combining multiple classifiers to solve a
complex problem and to improve the performance of the model.

As the name suggests, a Random Forest is a classifier that builds a number of decision trees on various subsets of the given dataset and combines their outputs to improve predictive accuracy. Instead of relying on one decision tree, the random forest takes the prediction from each tree and predicts the final output based on the majority vote of those predictions.

A greater number of trees in the forest generally leads to higher accuracy and reduces the risk of overfitting.


Assumptions for Random Forest


The feature variables of the dataset should contain some actual (informative) values, so that the classifier can predict accurate results rather than guessed ones.
The predictions from the individual trees must have very low correlation with one another.

Why use Random Forest?


Below are some points that explain why we should use the Random Forest algorithm:

It takes less training time as compared to other algorithms.


It predicts output with high accuracy and runs efficiently even on large datasets.
It can also maintain accuracy when a large proportion of data is missing.
Random Forest Algorithm: How It Works

The Random Forest algorithm operates in two phases:

Phase 1: Creating the Random Forest

1. Select Random Data Points (K): Randomly select K data points from the training set (with
replacement).
2. Build Decision Trees: For each subset, build a decision tree based on the selected data
points.
3. Repeat: Repeat the above steps to create N decision trees in the forest.

Phase 2: Making Predictions

1. Make Predictions: For a new data point:


Each of the N decision trees makes a prediction.
2. Majority Voting: The final prediction is made by selecting the category (or value) that
wins the majority vote from all trees.
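A minimal scikit-learn sketch of these two phases is shown below; the synthetic data and the choice of 100 trees are illustrative assumptions.

```python
# Minimal sketch: building a random forest and predicting by majority vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Phase 1: build N trees, each on a bootstrap sample with random feature selection at each split.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

# Phase 2: each tree votes on a new data point and the majority class wins.
print(forest.predict(X_test[:5]))
print(forest.score(X_test, y_test))
```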

Applications of Random Forest


There are four main sectors where Random Forest is most commonly used:

1. Banking: The banking sector mainly uses this algorithm to identify loan risk.
2. Medicine: With the help of this algorithm, disease trends and disease risks can be identified.
3. Land Use: Areas of similar land use can be identified with this algorithm.
4. Marketing: Marketing trends can be identified using this algorithm.

Advantages of Random Forest


Random Forest is capable of performing both Classification and Regression tasks.
It is capable of handling large datasets with high dimensionality.
It enhances the accuracy of the model and prevents the overfitting issue.

Disadvantages of Random Forest


Although random forest can be used for both classification and regression tasks, it is generally less suitable for regression tasks than for classification.

Key Features of Random Forest

1. High Accuracy: Combines multiple trees for more accurate predictions.


2. Resistance to Overfitting: Averaging results reduces overfitting.
3. Handles Large Datasets: Efficiently manages big datasets using multiple trees.
4. Variable Importance: Assesses which features are most important.
5. Built-in Cross-Validation: Uses out-of-bag samples for internal validation.
6. Handles Missing Values: Makes predictions using available data.
7. Parallelization: Speeds up training by leveraging parallel processing.

These features make Random Forest robust, efficient, and reliable for various tasks.

Boosting
Boosting is an ensemble modeling technique that attempts to build a strong classifier from a number of weak classifiers. It does so by building weak models in series: first, a model is built from the training data; then a second model is built that tries to correct the errors of the first. This procedure continues, adding models until either the complete training data set is predicted correctly or the maximum number of models has been added.

Advantages of Boosting

Improved Accuracy – Boosting can improve the accuracy of the final model by combining several weak models, averaging their outputs for regression or voting over them for classification.
Robustness to Overfitting – Boosting can reduce the risk of overfitting by reweighting the inputs that are classified wrongly.
Better handling of imbalanced data – Boosting can handle imbalanced data by focusing more on the data points that are misclassified.
Better Interpretability – Boosting can increase the interpretability of the model by breaking the decision process into multiple stages.

Types Of Boosting Algorithms

1. Gradient Boosting – Gradient boosting builds the final model as the sum of several weak learners trained sequentially on the same dataset, following the idea of stagewise addition. The first weak learner is not trained on the dataset at all; it simply returns the mean of the target column. The residuals of that first output are then computed and used as the target column for the next weak learner. The second weak learner is trained with the same methodology, its residuals are computed and used as the target for the third weak learner, and so on, until the residuals are (close to) zero or a maximum number of learners has been added. The data for gradient boosting must be numerical or encoded categorical data, and the loss function used to generate the residuals must be differentiable everywhere. A minimal sketch of this stagewise process is shown below.
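The stagewise residual-fitting idea described above can be sketched by hand for squared-error regression; the synthetic data, learning rate, and tree depth below are illustrative assumptions (in practice a library implementation such as scikit-learn's GradientBoostingRegressor would be used).

```python
# Hand-rolled sketch of gradient boosting for regression:
# start from the mean, then repeatedly fit a small tree to the current residuals.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())   # stage 0: just the mean of the target column
trees = []

for _ in range(100):
    residuals = y - prediction                        # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)     # stagewise addition
    trees.append(tree)

print(np.mean((y - prediction) ** 2))  # training MSE shrinks as stages are added
```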

AdaBoost
Definition: AdaBoost (Adaptive Boosting) combines multiple weak classifiers (e.g., decision
stumps) to form a strong classifier by focusing on hard-to-classify examples.

1. How It Works:
Initialize Weights: Start with equal weights for all data points.
Train Weak Learner: Build a weak model on the data.
Update Weights: Increase weights of misclassified points for the next iteration.
Combine Learners: Final prediction is a weighted sum of all weak learners.
2. Features:
Focuses on difficult examples.
Handles classification and regression.
3. Pros: Improves accuracy, easy to implement, robust for small datasets.
4. Cons: Sensitive to outliers and noisy data.

AdaBoost iteratively refines weak models to build a strong, accurate ensemble.
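A minimal scikit-learn sketch of AdaBoost is shown below; the synthetic data and hyperparameter values are illustrative assumptions (scikit-learn's default weak learner is a depth-1 decision tree, i.e. a decision stump).

```python
# Minimal sketch: AdaBoost with decision stumps as weak learners.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=1000, random_state=0)

ada = AdaBoostClassifier(   # default weak learner is a 1-level decision tree (a "stump")
    n_estimators=100,       # number of weak learners added sequentially
    learning_rate=0.5,      # shrinks each learner's contribution
    random_state=0,
)
ada.fit(X, y)
print(ada.score(X, y))
```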

Disadvantages of Boosting Algorithms

Boosting algorithms also have some disadvantages; these are:

Boosting algorithms are vulnerable to outliers.
It is difficult to use boosting algorithms for real-time applications.
They are computationally expensive for large datasets.

AdaBoost vs. Gradient Boosting:

| Aspect | AdaBoost | Gradient Boosting |
|---|---|---|
| Main Idea | Focuses on misclassified samples by adjusting their weights after each iteration. | Minimizes a loss function by sequentially adding weak learners to correct residual errors. |
| Weight Adjustment | Adjusts weights of data points to emphasize difficult cases. | Fits the next model to correct the residuals of the previous model. |
| Loss Function | Uses exponential loss to penalize errors. | Can use customizable loss functions (e.g., mean squared error, log loss). |
| Weak Learner | Typically uses decision stumps (1-level decision trees). | Can use deeper trees or other models as weak learners. |
| Speed | Faster to train, simpler implementation. | Slower due to optimization of the loss function at each step. |
| Robustness to Outliers | Sensitive to outliers due to the focus on misclassified points. | Less sensitive to outliers with appropriate loss functions. |
| Flexibility | Less flexible due to a fixed loss function. | Highly flexible with customizable loss functions. |
| Applications | Simple classification problems. | Complex regression and classification tasks. |

Stacking
Stacking is one of the most popular ensemble machine learning techniques for improving model performance. It enables us to train multiple models to solve the same problem and, based on their combined output, build a new model with improved performance.

Stacking is an ensemble learning technique that combines the predictions of multiple base
models (also called level-1 models) through a meta-model (also called a level-2 model). This
approach leverages the strengths of different models to produce a more robust and accurate
final prediction.

In stacking, an algorithm takes the outputs of sub-models as input and attempts to learn how to
best combine the input predictions to make a better output prediction.

Stacking is also known as stacked generalization and is an extended form of the model averaging ensemble technique, in which the sub-models contribute to the final prediction according to their performance and a new model with better predictions is built on top of them. Because this new model is stacked on top of the others, the technique is called stacking.

The basic architecture of stacking consists of the following components:

1. Original Data: Divided into n-folds, used for training and testing.
2. Base Models (Level-0 Models): Trained on subsets of the data to produce predictions.
3. Level-0 Predictions: Outputs from the base models.
4. Meta-Model (Level-1 Model): Combines level-0 predictions to produce final results.
5. Level-1 Prediction: The meta-model is trained on level-0 predictions and outputs the final
prediction.

Key Idea: The meta-model learns how to best integrate base model predictions for improved
accuracy.
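A minimal scikit-learn sketch of this architecture is shown below; the particular base models, meta-model, and synthetic data are illustrative assumptions.

```python
# Minimal sketch: stacking three level-0 models under a logistic regression meta-model.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, random_state=0)

base_models = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("svc", SVC(probability=True, random_state=0)),
    ("knn", KNeighborsClassifier()),
]

stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),  # the level-1 meta-model
    cv=5,                                  # level-0 predictions come from 5-fold cross-validation
)
stack.fit(X, y)
print(stack.score(X, y))
```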

Questions
1. Discuss how the Random Forest algorithm gives output for regression problems.
In regression tasks, Random Forest predicts a continuous output (numerical
value) rather than a class. Here's how it operates:

Steps for Regression with Random Forest

1. Training Phase:
Multiple decision trees are built using different bootstrap samples from
the training data.
Each tree is trained independently, focusing on minimizing the error (e.g.,
Mean Squared Error) during splits.
2. Prediction Phase:
When a new data point is provided, each decision tree in the forest
predicts a numerical value (regression output).
3. Final Output:
The Random Forest aggregates the predictions from all the decision trees.
Final Prediction = Average of the outputs from all individual trees.

Key Advantages for Regression

Reduces Overfitting: By averaging outputs, it avoids overfitting that may


occur in individual trees.
Handles Non-Linear Relationships: Can capture complex interactions
between features.
Robustness to Noise: Less sensitive to outliers due to ensemble averaging.

Example

If a Random Forest consists of 5 decision trees, and the trees predict values for a new data point as [10, 12, 11, 13, 12]:

The output is the average of all tree predictions, (10 + 12 + 11 + 13 + 12) / 5 = 11.6, making it robust and accurate for regression problems.
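As a rough sketch, this averaging behaviour can be verified directly in scikit-learn: a small forest's prediction equals the mean of its individual trees' predictions (the synthetic data and the 5-tree forest are assumptions chosen to mirror the example above).

```python
# Minimal sketch: a RandomForestRegressor's output is the average of its trees' outputs.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=4, noise=5.0, random_state=0)

forest = RandomForestRegressor(n_estimators=5, random_state=0).fit(X, y)

x_new = X[:1]
per_tree = [tree.predict(x_new)[0] for tree in forest.estimators_]
print(per_tree)                  # five individual tree predictions
print(np.mean(per_tree))         # their average ...
print(forest.predict(x_new)[0])  # ... matches the forest's prediction (up to floating point)
```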

2. How is a Random Forest related to decision trees? Discuss.

1. Foundation on Decision Trees:


A Random Forest is an ensemble method that combines multiple decision
trees to improve predictive accuracy and reduce overfitting.
Each decision tree in the Random Forest is a standalone model that splits
data based on feature thresholds to predict outcomes.
2. Key Differences:
Overfitting:
Decision trees are prone to overfitting, especially with noisy or
complex data.
Random Forest mitigates overfitting by aggregating predictions from
multiple trees.
Model Diversity:
In Random Forest, each tree is trained on a random subset of the data
(bagging) and randomly selected features, leading to diverse models.
Decision trees use the entire dataset and all features for splits.
3. Prediction:
Decision Trees: Provide a single output based on the training data.
Random Forest: Combines outputs from all trees (majority vote for
classification, average for regression) to produce a more robust and
accurate prediction.
4. Advantages of Random Forest over Decision Trees:
Higher accuracy due to ensemble learning.
Greater resistance to noise and outliers.
More generalized models with reduced overfitting.
5. Example:
A decision tree might overfit by memorizing the data, while Random
Forest generalizes by averaging the predictions from multiple, varied
trees.

Summary: Random Forest builds upon decision trees by creating an ensemble


of them, improving accuracy, robustness, and generalization.

3. What are Bagging and Boosting? Write a few differences between them in detail.

Bagging and Boosting are defined above; the key differences are summarized below.

Differences Between Bagging and Boosting


| Feature | Bagging | Boosting |
|---|---|---|
| Model Training | Models are trained independently, in parallel. | Models are trained sequentially, one after another. |
| Purpose | Reduces variance to prevent overfitting. | Reduces bias and improves accuracy. |
| Data Sampling | Uses bootstrapped samples (random subsets with replacement). | Focuses on all data but assigns higher weights to misclassified points. |
| Error Handling | Treats all models equally; averages predictions. | Corrects errors of previous models by re-weighting data. |
| Examples | Random Forest | AdaBoost, Gradient Boosting |
| Overfitting | Less prone to overfitting due to independent models. | Can overfit if models become too complex. |
| Speed | Faster, as models are trained in parallel. | Slower, due to sequential training. |

Summary

Bagging focuses on reducing variance by training models independently on different


subsets of data.
Boosting aims to reduce bias by sequentially improving on the mistakes of previous
models.
4. Explain bagging and pasting as ensemble techniques. What are the key differences between them?

| Feature | Bagging | Pasting |
|---|---|---|
| Sampling Method | With replacement | Without replacement |
| Subset Composition | Data points may repeat in subsets | Subsets contain unique data points |
| Diversity of Subsets | Less diverse | More diverse |
| Dataset Size | Works well for small datasets | Better for large datasets |
| Risk of Overfitting | Lower | Slightly higher |
| Computational Cost | Higher, due to possible data repetition | Lower, as there are no repetitions in subsets |
| Ideal Use Case | High-variance models like decision trees | Large datasets with less variance |
| Popular Example | Random Forest | Custom ensembles without overlap |

5. Describe the max voting technique in ensemble learning. How does it work in the context of classification?

The Max Voting technique is a simple and effective method used in ensemble learning for
classification tasks. It combines the predictions of multiple models (classifiers) and selects the
output class that receives the highest number of votes.

How It Works

1. Train Multiple Models:


Several models (e.g., decision trees, SVMs, or logistic regression) are trained on the same
dataset.
2. Generate Predictions:
Each model predicts the output class for a given input.
3. Count Votes:
For each possible output class, the number of models predicting that class is counted.
4. Choose the Class with Maximum Votes:
The final prediction is the class that receives the majority of votes from the models.

Example

Suppose we have three models predicting the class of a given data point, and the possible classes
are A, B, and C.

Model 1 predicts: A
Model 2 predicts: B
Model 3 predicts: A

The votes for each class are:

A: 2 votes
B: 1 vote
C: 0 votes

Since class A has the maximum votes, the final prediction is A.

Key Characteristics

Simple and Intuitive: Easy to implement and interpret.


Effective for Classification: Works well when individual models perform reasonably well.
Robust to Outliers: Less sensitive to outliers since predictions are based on majority votes.

Advantages

Improves overall accuracy by leveraging the strengths of multiple models.


Reduces the impact of weak or poorly performing models.

Limitations

Requires a large number of models to ensure diversity and better accuracy.


Ineffective if all models make similar mistakes (low model diversity).

6. Explain the averaging technique in ensemble learning. How does it combine predictions from multiple models?

The Averaging Technique is used in ensemble learning to combine the


predictions of multiple models, particularly in regression tasks. It works by
taking the average of the predictions from all the models to produce the final
output.

How Averaging Works

1. Train Multiple Models:


Several models (e.g., linear regression, decision trees, or neural networks)
are trained on the same dataset.
2. Generate Predictions:
Each model makes a prediction for a given input.
3. Compute the Average:
The final prediction is obtained by calculating the arithmetic mean of the
predictions from all the models.

Advantages

Reduces Variance: Minimizes the variability in predictions, leading to more stable results.
Improves Accuracy: Aggregating multiple predictions often yields better results than
using a single model.

Limitations
Not Suitable for Classification: Averaging is primarily used for regression; it doesn't
naturally handle categorical outputs.
Dependent on Model Quality: If individual models are poorly trained, averaging won't
improve performance significantly.
