ML Unit-3
Unit-3
Ensemble Learning and Random Forests: Introduction, Voting Classifiers, Bagging and
Pasting, Random Forests, Boosting, Stacking. Support Vector Machine: Linear SVM
Classification, Nonlinear SVM Classification, SVM Regression, Naïve Bayes Classifiers.
Ensemble Learning
Ensemble learning is a machine learning technique where multiple models (often referred to as
"weak learners" or "base estimators") are combined to solve a problem and improve overall
performance. The idea is that by aggregating the predictions of several models, the ensemble
will be more robust and accurate than any single model.
Key Benefits of Ensemble Learning:
1. Improved Accuracy: Combines the strengths of multiple models, often outperforming
individual models.
2. Reduced Overfitting: By averaging or voting, ensemble models generalize better to
unseen data.
3. Error Reduction: It reduces three types of errors:
o Bias: By combining multiple models, bias in individual models is reduced.
o Variance: Aggregating predictions smooths out variance in individual models.
o Noise: Uncorrelated errors in models cancel out.
Types of Ensemble Learning:
1. Bagging (Bootstrap Aggregating):
• Builds multiple models using different subsets of the training data.
• Reduces variance and avoids overfitting.
• Example: Random Forests.
2. Boosting:
• Builds models sequentially, where each model corrects the errors of its predecessor.
• Focuses on reducing bias.
• Example: AdaBoost, Gradient Boosting, XGBoost.
3. Stacking: Combines predictions from multiple models using another model (meta-
model) that learns how to aggregate them.
4. Voting: Combines predictions by taking a majority vote (classification) or averaging
predictions (regression).
Random Forest
Random Forest is a powerful and widely used ensemble learning technique that extends the
concept of decision trees by combining multiple decision trees to improve accuracy and reduce
overfitting.
How Random Forest Works:
1. Bagging:
o Random Forests use bagging, meaning each tree is trained on a different
bootstrapped subset of the training data (sampling with replacement).
2. Random Feature Selection:
o At each split in the decision tree, a random subset of features is considered. This
ensures that the trees are decorrelated and diverse, which improves ensemble
performance.
3. Aggregation:
o For classification: Each tree votes for a class, and the majority vote is selected.
o For regression: Predictions are averaged across all trees.
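The three steps above can be sketched with scikit-learn's RandomForestClassifier; the synthetic dataset and parameter values below are purely illustrative:
# Minimal sketch of bagging + random feature selection + aggregation
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rf = RandomForestClassifier(
    n_estimators=100,      # number of trees, each trained on a bootstrap sample (bagging)
    max_features="sqrt",   # random subset of features considered at each split
    random_state=42
)
rf.fit(X_train, y_train)          # each tree is trained on a bootstrapped subset
print(rf.score(X_test, y_test))   # predictions aggregate the outputs of all trees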
Advantages of Random Forests:
1. High Accuracy: Performs well on both classification and regression tasks.
2. Handles High-Dimensional Data: Works well with datasets having many features.
3. Robust to Overfitting: The randomness introduced by bagging and feature selection
reduces overfitting.
4. Handles Missing Values: Can handle datasets with missing data to some extent.
5. Feature Importance: Provides insights into feature importance, aiding interpretability.
Disadvantages of Random Forests:
1. Computationally Expensive: Training and prediction can be slow for large datasets or
many trees.
2. Not Fully Interpretable: Unlike single decision trees, Random Forests are more of a
"black box."
3. Overfitting Risk (Rare): While generally robust, a Random Forest can still overfit slightly
on noisy data if its trees are grown too deep or its hyperparameters are not tuned properly.
Applications of Random Forests:
Classification Tasks: Medical diagnosis, Spam detection, Image and text classification.
Regression Tasks: Predicting housing prices, forecasting sales, environmental modelling.
Voting Classifiers
A voting classifier is an ensemble model that trains on a collection of several base models and
predicts an output (class) based on the class most likely to be the output. To forecast the output
class, it combines the results of each classifier passed into the voting classifier and selects the
class with the largest majority of votes. The concept is to build a single model that learns from
these various models and predicts the output based on their aggregate majority of votes for each
output class, rather than building separate specialized models and measuring the accuracy of
each one.
There are primarily two different types of voting classifiers:
• Hard Voting: In hard voting, the predicted output class is the class that receives the
highest majority of votes, i.e., the class predicted most often by the individual
classifiers. For example, suppose the classifiers predicted the output classes as (Cat, Dog,
Dog). As the classifiers predicted the class "Dog" the maximum number of times, we
proceed with Dog as our final prediction.
• Soft Voting: In this, the average probabilities of the classes determine which one will
be the final prediction. For example, let's say the probabilities of the class being a "dog"
is (0.30, 0.47, 0.53) and a "cat" is (0.20, 0.32, 0.40). So, the average for a class dog is
0.4333, and the cat is 0.3067, from this, we can confirm our final prediction to be a dog
as it has the highest average probability.
• Weighted Majority Voting: In addition to the simple majority vote (hard voting)
described above, we can compute a weighted majority vote by associating a weight w
with each classifier C, so that more reliable classifiers contribute more to the final decision.
Program:
# Import the required libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
# Evaluation Metrics
from sklearn.metrics import accuracy_score

# Load the dataset (assumed here to be a heart-disease style CSV with a "target" column)
df = pd.read_csv("heart.csv")
df["target"].value_counts()

# Separate features and labels
X = df.drop("target", axis = 1)
y = df["target"]

# train test split
np.random.seed(42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
X_train.shape, y_train.shape, X_test.shape, y_test.shape
# ((242, 13), (242,), (61, 13), (61,))

# Define the individual base estimators
lr = LogisticRegression()
kn = KNeighborsClassifier()
rf = RandomForestClassifier()
estimators = [('lr', lr), ('kn', kn), ('rf', rf)]

# Create a Voting Classifier with soft voting
voting_clf_soft = VotingClassifier(
    estimators=estimators,
    voting='soft'
)
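Continuing the sketch above, the ensemble can then be fitted and evaluated; a weighted variant is also sketched to show the weighted-majority idea via the weights parameter (the weights 2, 1, 2 are arbitrary illustrative values, not tuned):
# Fit the soft-voting ensemble and evaluate it on the held-out data
voting_clf_soft.fit(X_train, y_train)
print("Soft voting accuracy:", accuracy_score(y_test, voting_clf_soft.predict(X_test)))

# Weighted majority voting: give each classifier a weight (illustrative values)
voting_clf_weighted = VotingClassifier(
    estimators=estimators,
    voting='soft',
    weights=[2, 1, 2]   # lr, kn, rf
)
voting_clf_weighted.fit(X_train, y_train)
print("Weighted voting accuracy:", accuracy_score(y_test, voting_clf_weighted.predict(X_test)))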
Bagging, Pasting, Boosting and Stacking
Bagging, Pasting, Boosting, and Stacking are four popular ensemble learning methods used
in machine learning. These techniques combine multiple models to improve performance by
reducing overfitting, increasing accuracy, or enhancing robustness.
Bagging
One of the first uses of ensemble methods in machine learning was the bagging technique. This
technique was developed to overcome instability in decision trees. An example of the bagging
technique is the random forest algorithm. The random forest is an ensemble of multiple
decision trees. Decision trees tend to be prone to overfitting. Because of this, a single decision
tree doesn’t provide reliable predictions. To improve the prediction accuracy of decision trees,
bagging is employed to form a random forest. The resulting random forest has a lower variance
compared to the individual trees. The success of bagging led to developing other ensemble
techniques such as boosting, stacking, and many others.
We use bagging to combine weak learners of high variance. Bagging aims to produce a model
with lower variance than the individual weak models. These weak learners are homogenous,
meaning they are of the same type. Bagging is also known as Bootstrap aggregating. It consists
of two steps: bootstrapping and aggregation.
Bootstrapping:
Multiple subsets (samples) are created from the original training dataset by sampling with
replacement. Each subset can have duplicate samples and will typically be the same size as
the original dataset.
Aggregating:
Individual weak learners are trained independently of each other, and each makes its own
predictions. These predictions are then aggregated to obtain the overall prediction, using either
max voting or averaging.
Max Voting
Each model makes a prediction, and a prediction from each model counts as a single ‘vote.’
The most occurring ‘vote’ is chosen as the representative for the combined model.
Averaging
Averaging is generally used for regression problems. It involves taking the average of the predictions.
The resulting average is used as the overall prediction for the combined model.
Bagging example:
For example, let’s say we have a set of observations: [2, 4, 32, 8, 16]. If we want each bootstrap
sample to contain n observations, the following are valid samples:
• n=3: [32, 4, 4], [8, 16, 2], [2, 2, 2]…
• n=4: [2, 32, 4, 16], [2, 4, 2, 8], [8, 32, 4, 2]…
Since we draw data with replacement, an observation can appear more than once in a single
sample.
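A quick NumPy sketch of drawing such bootstrap samples (the seed is arbitrary, so the exact samples will differ):
# Drawing bootstrap samples (with replacement) from the example observations
import numpy as np

data = np.array([2, 4, 32, 8, 16])
rng = np.random.default_rng(0)

bootstrap_n3 = rng.choice(data, size=3, replace=True)   # may contain duplicates
bootstrap_n4 = rng.choice(data, size=4, replace=True)
print(bootstrap_n3, bootstrap_n4)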
Pasting example:
Let's consider the same numerical list [2, 4, 32, 8, 16] as our dataset. We will create pasting
samples by sampling without replacement.
• Paste Sample 1: [2, 4, 32, 8, 16]
• Paste Sample 2: [8, 4, 2, 32, 16]
Pasting ensures that no element repeats within a single sample. Because each sample here is the
same size as the original dataset, every element appears exactly once and only the order varies;
in practice, pasting usually draws subsets smaller than the original dataset, still without replacement.
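In scikit-learn, both schemes can be sketched with BaggingClassifier: bootstrap=True samples with replacement (bagging) and bootstrap=False samples without replacement (pasting). The synthetic data and parameter values below are illustrative only:
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Bagging: each tree sees a subset drawn WITH replacement
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            max_samples=0.8, bootstrap=True, random_state=0)

# Pasting: each tree sees a subset drawn WITHOUT replacement
pasting = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            max_samples=0.8, bootstrap=False, random_state=0)

bagging.fit(X, y)
pasting.fit(X, y)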
Disadvantages of Bagging
Boosting
We use boosting to combine weak learners with high bias. Boosting aims to produce a model
with a lower bias than the individual models. Like in bagging, the weak learners are
homogeneous.
Boosting involves sequentially training weak learners, where each subsequent learner improves
on the errors of the previous learners in the sequence. A sample of data is first taken from the
initial dataset and used to train the first model, which then makes its predictions. Some samples
are predicted correctly and some incorrectly; the wrongly predicted samples are reused, with
greater emphasis, for training the next model. In this way, subsequent models can improve on
the errors of previous models.
Unlike bagging, which aggregates prediction results at the end, boosting aggregates the results
at each step. Weighted averaging involves giving all models different weights depending on
their predictive power. In other words, it gives more weight to the model with the highest
predictive power. This is because the learner with the highest predictive power is considered
the most important.
Sequential Learning: Models are trained sequentially, where each model corrects the errors
made by the previous ones. This is in contrast to bagging where models are trained
independently.
Focus on Errors: Each successive model focuses more on the misclassified points from the
previous model. Misclassified points are given higher weight so that the next model will pay
more attention to them.
Weighted Voting (or Averaging): When combining the predictions from all models, boosting
uses a weighted average or voting mechanism where the predictions of the models with higher
accuracy carry more weight.
Boosting works with the following steps:
1. We sample m-number of subsets from an initial training dataset.
2. Using the first subset, we train the first weak learner.
3. We test the trained weak learner using the training data. As a result of the testing, some
data points will be incorrectly predicted.
4. Each data point with the wrong prediction is sent into the second subset of data, and
this subset is updated.
5. Using this updated subset, we train and test the second weak learner.
6. We continue with the next subset until reaching the total number of subsets.
7. We now have the total prediction. The overall prediction has already been aggregated
at each step, so there is no need to calculate it.
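As an illustrative sketch of this sequential procedure, AdaBoost (one of the classic boosting algorithms) can be run with scikit-learn on synthetic data; the parameter values are arbitrary:
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, random_state=1)

# Each new weak learner gives more weight to the points misclassified so far
ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=1)
ada.fit(X, y)
print(ada.score(X, y))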
Examples of Boosting:
2. Gradient Boosting
Gradient Boosting trains models sequentially, where each new model tries to correct the
residual errors (i.e., the difference between the predicted and actual values) from the
previous model. Each model is fitted to the negative gradient of the loss function, hence
the name Gradient Boosting. Example: House price prediction.
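A minimal sketch of gradient boosting for a regression task such as house price prediction, using scikit-learn's GradientBoostingRegressor on synthetic data standing in for real prices:
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=2)

# Each new tree is fitted to the residual errors of the current ensemble
# (the negative gradient of the squared loss)
gbr = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1, max_depth=3,
                                random_state=2)
gbr.fit(X, y)
print(gbr.predict(X[:3]))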
5. CatBoost (Categorical Boosting)
CatBoost is a gradient boosting algorithm optimized for categorical features. Unlike other
boosting algorithms, CatBoost automatically handles categorical data without the need for
extensive preprocessing like one-hot encoding. It uses a technique called ordered boosting
to reduce overfitting and ensure model stability. Example: Credit card fraud detection.
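A minimal, illustrative sketch, assuming the third-party catboost package is installed; the DataFrame, column names, and parameter values below are hypothetical stand-ins for real transaction data:
from catboost import CatBoostClassifier
import pandas as pd

# Hypothetical transaction data with raw (unencoded) categorical columns
df = pd.DataFrame({
    "amount":   [120.5, 9.99, 560.0, 43.2, 310.0, 7.5],
    "merchant": ["grocery", "online", "travel", "online", "travel", "grocery"],
    "country":  ["US", "US", "FR", "DE", "FR", "US"],
    "is_fraud": [0, 0, 1, 0, 1, 0],
})
X, y = df.drop("is_fraud", axis=1), df["is_fraud"]

# CatBoost consumes the categorical columns directly, no one-hot encoding needed
model = CatBoostClassifier(iterations=100, verbose=0)
model.fit(X, y, cat_features=["merchant", "country"])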
Differences between bagging and boosting
Stacking:
We use stacking to improve the prediction accuracy of strong learners. Stacking aims to create
a single robust model from multiple heterogeneous strong learners.
Stacking differs from bagging and boosting in machine learning in that:
• It combines strong learners
• It combines heterogeneous models
• It consists of creating a Metamodel.
Individual heterogeneous models are trained on an initial dataset. Their predictions are collected
to form a single new dataset, and this new dataset is used to train the metamodel, which makes
the final prediction, typically by combining the base predictions through weighted averaging.
Because stacking combines strong learners, it can combine bagged or boosted models.
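A minimal sketch of stacking with scikit-learn's StackingClassifier, where a logistic regression meta-model learns to combine the base models' predictions; the base models and synthetic data are arbitrary choices for illustration:
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=3)

# Heterogeneous base (strong) learners
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=3)),
    ("svc", SVC(probability=True, random_state=3)),
    ("knn", KNeighborsClassifier()),
]

# The meta-model is trained on cross-validated predictions of the base learners
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression())
stack.fit(X, y)
print(stack.score(X, y))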
Random Forests
Random forest is a supervised learning algorithm. The “forest” it builds is an ensemble
of decision trees, usually trained with the bagging method. The general idea of the bagging
method is that a combination of learning models improves the overall result.
A Random Forest is a collection of decision trees that work together to make predictions.
A greater number of trees in the forest generally leads to higher accuracy and helps prevent
overfitting. The Random Forest algorithm is a powerful tree-based learning technique in Machine
Learning: each tree makes a prediction, and the forest then votes (or averages) across all the
trees to produce the final prediction. Random Forests are widely used for classification and
regression tasks.
• It is a type of classifier that uses many decision trees to make predictions.
• It takes different random parts of the dataset to train each tree and then it combines the
results by averaging them. This approach helps improve the accuracy of
predictions. Random Forest is based on ensemble learning.
Imagine asking a group of friends for advice on where to go for vacation. Each friend gives
their recommendation based on their unique perspective and preferences (decision trees trained
on different subsets of data). You then make your final decision by considering the majority
opinion or averaging their suggestions (ensemble prediction).
Random forest algorithm in Machine Learning
• Multiple Decision Trees are created from the training data. Each tree is trained
on a random subset of the data (with replacement) and a random subset of features.
This process is known as bagging or bootstrap aggregating.
• Each Decision Tree in the ensemble learns to make predictions independently.
• When presented with a new, unseen instance, each Decision Tree in the ensemble
makes a prediction.
The final prediction is made by combining the predictions of all the Decision Trees. This is
typically done through a majority vote (for classification) or averaging (for regression).
Key Features of Random Forest
• Handles Missing Data: Automatically handles missing values during training,
eliminating the need for manual imputation.
• Feature Importance: The algorithm ranks features based on their importance in making
predictions, offering valuable insights for feature selection and interpretability (see the
sketch after this list).
• Scales Well with Large and Complex Data without significant performance
degradation.
• Algorithm is versatile and can be applied to both classification tasks (e.g., predicting
categories) and regression tasks (e.g., predicting continuous values).
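A brief illustrative sketch of reading feature importances from a fitted Random Forest (synthetic data and hypothetical feature names):
# Inspecting feature importances of a fitted Random Forest
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
import pandas as pd

X, y = make_classification(n_samples=500, n_features=6, n_informative=3, random_state=4)
rf = RandomForestClassifier(n_estimators=200, random_state=4).fit(X, y)

# feature_importances_ reports the mean decrease in impurity contributed by each feature
importances = pd.Series(rf.feature_importances_,
                        index=[f"feature_{i}" for i in range(X.shape[1])])
print(importances.sort_values(ascending=False))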
Applications of Random Forest
There are four main sectors where Random Forest is mostly used:
1. Banking: Banking sector mostly uses this algorithm for the identification of loan risk.
2. Medicine: With the help of this algorithm, disease trends and risks of the disease can
be identified.
3. Land Use: We can identify the areas of similar land use by this algorithm.
4. Marketing: Marketing trends can be identified using this algorithm.
Advantages of Random Forest
• Random Forest provides very accurate predictions even with large datasets.
• Random Forest can handle missing data well without compromising accuracy.
• It doesn’t require normalization or standardization of the dataset.
• When we combine multiple decision trees it reduces the risk of overfitting of the
model.
Limitations of Random Forest
• It can be computationally expensive especially with a large number of trees.
• It’s harder to interpret the model compared to simpler models like decision trees.