ML Unit-3

Artificial Intelligence and Data Science (AI & DS)


Machine Learning [B20AD3201]


Syllabus
Unit-1

Introduction: Artificial Intelligence, Machine Learning, Deep Learning, Types of Machine Learning Systems, Main Challenges of Machine Learning. Statistical Learning: Introduction, Supervised and Unsupervised Learning, Training and Test Loss, Tradeoffs in Statistical Learning, Estimating Risk Statistics, Sampling Distribution of an Estimator, Empirical Risk Minimization.

Unit-2

Supervised Learning (Regression/Classification): Basic Methods: Distance-based Methods, Nearest Neighbours, Decision Trees, Naive Bayes, Linear Models: Linear Regression, Logistic Regression, Generalized Linear Models, Support Vector Machines, Binary Classification: Multiclass/Structured outputs, MNIST, Ranking.

Unit-3

Ensemble Learning and Random Forests: Introduction, Voting Classifiers, Bagging and Pasting, Random Forests, Boosting, Stacking. Support Vector Machine: Linear SVM Classification, Nonlinear SVM Classification, SVM Regression, Naïve Bayes Classifiers.

Unit-4

Unsupervised Learning Techniques: Clustering, K-Means, Limits of K-Means, Using Clustering for Image Segmentation, Using Clustering for Preprocessing, Using Clustering for Semi-Supervised Learning, DBSCAN, Gaussian Mixtures. Dimensionality Reduction: The Curse of Dimensionality, Main Approaches for Dimensionality Reduction, PCA, Using Scikit-Learn, Randomized PCA, Kernel PCA.

Unit-5

Neural Networks and Deep Learning: Introduction to Artificial Neural Networks with Keras, Implementing MLPs with Keras, Installing TensorFlow 2, Loading and Preprocessing Data with TensorFlow.

Unit-3

Ensemble Learning and Random Forests: Introduction, Voting Classifiers, Bagging and Pasting, Random Forests, Boosting, Stacking. Support Vector Machine: Linear SVM Classification, Nonlinear SVM Classification, SVM Regression, Naïve Bayes Classifiers.

Ensemble Learning
Ensemble learning is a machine learning technique where multiple models (often referred to as
"weak learners" or "base estimators") are combined to solve a problem and improve overall
performance. The idea is that by aggregating the predictions of several models, the ensemble
will be more robust and accurate than any single model.
Key Benefits of Ensemble Learning:
1. Improved Accuracy: Combines the strengths of multiple models, often outperforming
individual models.
2. Reduced Overfitting: By averaging or voting, ensemble models generalize better to
unseen data.
3. Error Reduction: It reduces three types of errors:
o Bias: By combining multiple models, bias in individual models is reduced.
o Variance: Aggregating predictions smooths out variance in individual models.
o Noise: Uncorrelated errors in models cancel out.
Types of Ensemble Learning:
1. Bagging (Bootstrap Aggregating):
• Builds multiple models using different subsets of the training data.
• Reduces variance and avoids overfitting.
• Example: Random Forests.
2. Boosting:
• Builds models sequentially, where each model corrects the errors of its predecessor.
• Focuses on reducing bias.
• Example: AdaBoost, Gradient Boosting, XGBoost.
3. Stacking: Combines predictions from multiple models using another model (meta-
model) that learns how to aggregate them.
4. Voting: Combines predictions by taking a majority vote (classification) or averaging
predictions (regression).
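
As a quick reference, these four families map onto Scikit-Learn roughly as shown in the sketch below (constructor calls only, not a full training pipeline; each technique is covered in detail later in this unit):

from sklearn.ensemble import (
    AdaBoostClassifier,      # boosting
    BaggingClassifier,       # bagging / pasting
    StackingClassifier,      # stacking with a meta-model
    VotingClassifier,        # hard / soft voting
)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100)
boosting = AdaBoostClassifier(n_estimators=100)
voting = VotingClassifier(estimators=[("lr", LogisticRegression()),
                                      ("dt", DecisionTreeClassifier())])
stacking = StackingClassifier(estimators=[("lr", LogisticRegression()),
                                          ("dt", DecisionTreeClassifier())],
                              final_estimator=LogisticRegression())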

Random Forest
Random Forest is a powerful and widely used ensemble learning technique that extends the
concept of decision trees by combining multiple decision trees to improve accuracy and reduce
overfitting.
How Random Forest Works:
1. Bagging:
o Random Forests use bagging, meaning each tree is trained on a different
bootstrapped subset of the training data (sampling with replacement).
2. Random Feature Selection:
o At each split in the decision tree, a random subset of features is considered. This
ensures that the trees are decorrelated and diverse, which improves ensemble
performance.
3. Aggregation:
o For classification: Each tree votes for a class, and the majority vote is selected.
o For regression: Predictions are averaged across all trees.
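
A minimal Scikit-Learn sketch of these three steps, using a synthetic dataset (make_classification) in place of a real one:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data stands in for any classification dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rf = RandomForestClassifier(
    n_estimators=100,     # number of bootstrapped trees (bagging)
    max_features="sqrt",  # random subset of features considered at each split
    random_state=42)
rf.fit(X_train, y_train)

print(rf.score(X_test, y_test))     # accuracy of the majority vote on the test set
print(rf.feature_importances_[:5])  # impurity-based feature importances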
Advantages of Random Forests:
1. High Accuracy: Performs well on both classification and regression tasks.
2. Handles High-Dimensional Data: Works well with datasets having many features.
3. Robust to Overfitting: The randomness introduced by bagging and feature selection
reduces overfitting.
4. Handles Missing Values: Can handle datasets with missing data to some extent.
5. Feature Importance: Provides insights into feature importance, aiding interpretability.
Disadvantages of Random Forests:
1. Computationally Expensive: Training and prediction can be slow for large datasets or
many trees.
2. Not Fully Interpretable: Unlike single decision trees, Random Forests are more of a
"black box."
3. Overfitting Risk (Rare): While generally robust, a large number of trees can
sometimes lead to slight overfitting if not tuned properly.
Applications of Random Forests:
Classification Tasks: Medical diagnosis, spam detection, image and text classification.
Regression Tasks: Predicting housing prices, forecasting sales, environmental modelling.

Voting Classifiers
A voting classifier is a machine learning model that trains on a collection of several models and forecasts an output (class) based on the class with the highest likelihood of being the output. It aggregates the results of each classifier passed into it and predicts the output class on the basis of the largest majority of votes. The idea is to build a single model that learns from these various models and predicts an output based on their aggregate majority of votes for each output class, rather than building separate specialized models and determining the accuracy of each of them.
There are primarily two different types of voting classifiers:
• Hard Voting: In hard voting, the predicted output class is the class that receives the highest majority of votes, i.e., the class predicted most often by the individual classifiers. For example, suppose three classifiers predict the output classes (Cat, Dog, Dog). Since the classifiers predict the class "Dog" the maximum number of times, Dog is taken as the final prediction.

• Soft Voting: In soft voting, the final prediction is the class with the highest average predicted probability. For example, suppose the probabilities of the class being "dog" are (0.30, 0.47, 0.53) and of being "cat" are (0.20, 0.32, 0.40). The average for the class dog is 0.4333 and for cat is 0.3067, so the final prediction is dog, since it has the highest average probability.


• Weighted Majority Voting: In addition to the simple majority vote (hard voting) described above, we can compute a weighted majority vote by associating a weight w with each classifier C, so that better-performing classifiers contribute more to the final decision.
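
A small worked sketch of the arithmetic behind soft voting and weighted soft voting, using the probabilities from the example above and hypothetical weights of 2, 1 and 2 for the three classifiers (the full Scikit-Learn implementation follows):

import numpy as np

# Per-classifier probabilities from the soft-voting example above
p_dog = np.array([0.30, 0.47, 0.53])
p_cat = np.array([0.20, 0.32, 0.40])

# Soft voting: unweighted average of probabilities
print(p_dog.mean(), p_cat.mean())  # 0.4333 vs. 0.3067 -> predict "Dog"

# Weighted soft voting: weighted average, with hypothetical weights [2, 1, 2]
w = np.array([2, 1, 2])
print(np.average(p_dog, weights=w), np.average(p_cat, weights=w))  # 0.426 vs. 0.304 -> still "Dog"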

Hard Voting, Soft Voting and Weighted Majority Voting implementation:

Program:
# Import the required libraries

import pandas as pd
import numpy as np

#Preprocessing and Modelling


from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier

# Evaluation Metrics
from sklearn.metrics import accuracy_score

# import the dataset


df = pd.read_csv("heart-disease.csv")

# print top 5 lines


df.head()

df["target"].value_counts()

X = df.drop("target", axis = 1)
y = df["target"]

# train test split
np.random.seed(42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
X_train.shape, y_train.shape, X_test.shape, y_test.shape
((242, 13), (242,), (61, 13), (61,))

models = {"Logistic Regression": LogisticRegression(),


"KNeighbors": KNeighborsClassifier(),
"Random Forest Classifier": RandomForestClassifier()}

# Train, fit and score each of these models
def modelling(models, X_train, y_train, X_test, y_test):
    np.random.seed(42)  # For reproducibility
    model_scores = {}
    for name, model in models.items():
        # Train the model
        model.fit(X_train, y_train)
        # Predict using the model
        y_pred = model.predict(X_test)
        # Calculate accuracy
        model_scores[name] = accuracy_score(y_test, y_pred)
    return model_scores

model_scores = modelling(models, X_train, y_train, X_test, y_test)
model_scores

{'Logistic Regression': 0.8852459016393442,
 'KNeighbors': 0.6885245901639344,
 'Random Forest Classifier': 0.8360655737704918}

lr = LogisticRegression()
kn = KNeighborsClassifier()
rf = RandomForestClassifier()
estimators=[('lr', lr), ('kn', kn), ('rf', rf)]

# Create a Voting Classifier with hard voting


voting_clf_hard = VotingClassifier(
estimators=estimators,
voting='hard'
)
# Train the Voting Classifier
voting_clf_hard.fit(X_train, y_train)

# Make predictions and evaluate


y_pred_hard = voting_clf_hard.predict(X_test)
accuracy_hard = accuracy_score(y_test, y_pred_hard)
print(f"Hard Voting Accuracy: {accuracy_hard:.2f}")
Hard Voting Accuracy: 0.89

# Create a Voting Classifier with soft voting
voting_clf_soft = VotingClassifier(
estimators=estimators,
voting='soft'
)

# Train the Voting Classifier with soft voting


voting_clf_soft.fit(X_train, y_train)

# Make predictions and evaluate


y_pred_soft = voting_clf_soft.predict(X_test)
accuracy_soft = accuracy_score(y_test, y_pred_soft)
print(f"Soft Voting Accuracy: {accuracy_soft:.2f}")

# Create a weighted Voting Classifier with soft voting


voting_clf_weighted = VotingClassifier(
estimators=estimators,
voting='soft',
weights=[2, 1, 2]
)

# Train and evaluate the weighted Voting Classifier


voting_clf_weighted.fit(X_train, y_train)
y_pred_weighted = voting_clf_weighted.predict(X_test)
accuracy_weighted = accuracy_score(y_test, y_pred_weighted)
print(f"Weighted Soft Voting Accuracy: {accuracy_weighted:.2f}")

Bagging, Pasting, Boosting and Stacking

Bagging, Pasting, Boosting, and Stacking are four popular ensemble learning methods used
in machine learning. These techniques combine multiple models to improve performance by
reducing overfitting, increasing accuracy, or enhancing robustness.

Bagging
One of the first uses of ensemble methods in machine learning was the bagging technique. This
technique was developed to overcome instability in decision trees. An example of the bagging
technique is the random forest algorithm. The random forest is an ensemble of multiple
decision trees. Decision trees tend to be prone to overfitting. Because of this, a single decision
tree doesn’t provide reliable predictions. To improve the prediction accuracy of decision trees,
bagging is employed to form a random forest. The resulting random forest has a lower variance
compared to the individual trees. The success of bagging led to the development of other ensemble techniques such as boosting and stacking.

We use bagging to combine weak learners of high variance. Bagging aims to produce a model
with lower variance than the individual weak models. These weak learners are homogeneous,
meaning they are of the same type. Bagging is also known as Bootstrap aggregating. It consists
of two steps: bootstrapping and aggregation.

Bootstrapping:
Multiple subsets (samples) are created from the original training dataset by sampling with
replacement. Each subset can have duplicate samples and will typically be the same size as
the original dataset.

Aggregating:
Individual weak learners train independently from each other. Each learner makes independent
predictions. The system aggregates the results of those predictions to get the overall prediction.
The predictions are aggregated using either max voting or averaging.

Max Voting
Each model makes a prediction, and a prediction from each model counts as a single ‘vote.’
The most occurring ‘vote’ is chosen as the representative for the combined model.

Averaging
Averaging is generally used for regression problems. It involves taking the average of the predictions; the resulting average is used as the overall prediction for the combined model.

With or without sample replacement:


1. Sample with replacement is called bagging.
2. Sample without replacement is called pasting.
1. With Replacement
• In with replacement sampling, each row in the original dataset can appear multiple
times in a single bootstrap sample.
• The dataset is sampled randomly, and the same row can be selected again after it's
already been included.
2. Without Replacement
• In without replacement sampling, each row is selected at most once in a single sample.
• The dataset is sampled randomly, but no duplicates occur within a sample.

Bagging example:
For example, let's say we have a set of observations: [2, 4, 32, 8, 16]. If we want each bootstrap sample to contain n observations, the following are valid samples:
• n=3: [32, 4, 4], [8, 16, 2], [2, 2, 2]…
• n=4: [2, 32, 4, 16], [2, 4, 2, 8], [8, 32, 4, 2]…
Since we draw data with replacement, an observation can appear more than once in a single sample.
Pasting example:
Consider the same list [2, 4, 32, 8, 16] as our dataset. We create pasting samples by sampling without replacement.
• Paste Sample 1: [2, 4, 32, 8, 16]
• Paste Sample 2: [8, 4, 2, 32, 16]
Pasting ensures no element is repeated within a sample. Because each paste sample here is the same size as the original dataset, every element appears exactly once and only the order varies; in practice, pasting usually draws smaller subsets of the training data, still without replacement.


The steps of bagging are as follows (a code sketch of bagging versus pasting follows this list):

1. We start with an initial training dataset containing n instances.
2. We create m subsets of the training data. Each subset contains N sample points drawn from the initial dataset with replacement, so a specific data point can be sampled more than once.
3. For each subset of data, we train a corresponding weak learner independently. These models are homogeneous, meaning that they are of the same type.
4. Each model makes a prediction.
5. The predictions are aggregated into a single prediction using either max voting or averaging.
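
A minimal sketch of bagging versus pasting with Scikit-Learn's BaggingClassifier on synthetic data; the only difference is the bootstrap flag (note that the estimator argument is named base_estimator in Scikit-Learn versions before 1.2):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Bagging: each tree sees a subset drawn WITH replacement (bootstrap=True)
bagging = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=100,
                            max_samples=0.8, bootstrap=True, random_state=42)

# Pasting: each tree sees a subset drawn WITHOUT replacement (bootstrap=False)
pasting = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=100,
                            max_samples=0.8, bootstrap=False, random_state=42)

for name, model in [("Bagging", bagging), ("Pasting", pasting)]:
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))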
Advantages of Bagging
1. Reduces Overfitting (Variance Reduction)
2. Improves Accuracy
3. Parallelization
4. Works Well with High-Variance Models
5. Simple and Easy to Implement

Disadvantages of Bagging
1. Increased Computational Cost
2. Not Effective for High-Bias Models
3. Requires Large Training Data

Boosting
We use boosting to combine weak learners with high bias. Boosting aims to produce a model
with a lower bias than the individual models. Like in bagging, the weak learners are
homogeneous.
Boosting involves training weak learners sequentially, where each subsequent learner corrects the errors of the previous learners in the sequence. A sample of data is first taken from the initial dataset and used to train the first model, which then makes its predictions. Some samples will be predicted correctly and others incorrectly. The samples that are predicted wrongly are emphasized when training the next model, so that subsequent models improve on the errors of previous ones.
Unlike bagging, which aggregates the prediction results at the end, boosting aggregates the results at each step, using weighted averaging. Weighted averaging gives each model a different weight depending on its predictive power: the learner with the highest predictive power is considered the most important and therefore receives the largest weight.
Sequential Learning: Models are trained sequentially, where each model corrects the errors
made by the previous ones. This is in contrast to bagging where models are trained
independently.
Focus on Errors: Each successive model focuses more on the misclassified points from the
previous model. Misclassified points are given higher weight so that the next model will pay
more attention to them.
Weighted Voting (or Averaging): When combining the predictions from all models, boosting
uses a weighted average or voting mechanism where the predictions of the models with higher
accuracy carry more weight.

Boosting works with the following steps:
1. We sample m subsets from the initial training dataset.
2. Using the first subset, we train the first weak learner.
3. We test the trained weak learner on the training data. As a result of the testing, some data points will be predicted incorrectly.
4. Each data point with a wrong prediction is added to the second subset of data, which is updated accordingly.
5. Using this updated subset, we train and test the second weak learner.
6. We continue in this way until we reach the total number of subsets.
7. The overall prediction has already been aggregated at each step, so there is no need for a separate final aggregation.

Examples of Boosting:

1. AdaBoost (Adaptive Boosting)


AdaBoost builds a sequence of models where each new model focuses more on the data
points that were misclassified by previous models. Each model is assigned a weight based
on its performance, and the final prediction is a weighted sum of the predictions from all
models. Example: Spam email classifier.
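
A minimal AdaBoost sketch on synthetic data, assuming decision stumps (depth-1 trees) as the weak learners (the estimator argument is named base_estimator before Scikit-Learn 1.2):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Decision stumps are the classic AdaBoost weak learner;
# each new stump focuses on points the previous ones misclassified.
ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                         n_estimators=200, learning_rate=0.5, random_state=42)
ada.fit(X_train, y_train)
print("AdaBoost accuracy:", ada.score(X_test, y_test))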

2. Gradient Boosting
Gradient Boosting trains models sequentially, where each new model tries to correct the
residual errors (i.e., the difference between the predicted and actual values) from the
previous model. Each model is fitted to the negative gradient of the loss function, hence
the name Gradient Boosting. Example: House price prediction.
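
A minimal gradient boosting regression sketch, using synthetic data in place of real house prices:

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Each new tree is fitted to the residual errors of the current ensemble
gbr = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                max_depth=3, random_state=42)
gbr.fit(X_train, y_train)
print("Test MSE:", mean_squared_error(y_test, gbr.predict(X_test)))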

3. XGBoost (Extreme Gradient Boosting)


XGBoost is an optimized and regularized version of Gradient Boosting. It improves upon
traditional gradient boosting by introducing techniques like tree pruning, regularization,
and parallelization to make it faster and more efficient. Example: Predictive modeling for
customer churn.
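
A minimal XGBoost sketch on synthetic data; this assumes the separate xgboost package is installed (pip install xgboost):

from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# reg_lambda adds L2 regularization, one of XGBoost's additions over plain gradient boosting
xgb = XGBClassifier(n_estimators=200, learning_rate=0.1, max_depth=3,
                    reg_lambda=1.0, random_state=42)
xgb.fit(X_train, y_train)
print("XGBoost accuracy:", accuracy_score(y_test, xgb.predict(X_test)))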

4. LightGBM (Light Gradient Boosting Machine)


LightGBM is a gradient boosting framework that uses leaf-wise tree growth instead of
level-wise growth used by traditional gradient boosting. This results in faster training times
and better performance on large datasets. Example: Web search ranking problem.

5. CatBoost (Categorical Boosting)
CatBoost is a gradient boosting algorithm optimized for categorical features. Unlike other
boosting algorithms, CatBoost automatically handles categorical data without the need for
extensive preprocessing like one-hot encoding. It uses a technique called ordered boosting
to reduce overfitting and ensure model stability. Example: Credit card fraud detection.
Differences between bagging and boosting

Feature            | Bagging                               | Boosting
-------------------+---------------------------------------+------------------------------------------------
Training Method    | Parallel, independent                 | Sequential, dependent on previous model
Purpose            | Reduce variance                       | Reduce bias and variance
Model Weighting    | Equal weight for all models           | Models weighted based on performance
Data Sampling      | Bootstrap sampling (with replacement) | No resampling, focus on misclassified instances
Error Correction   | No error correction                   | Corrects errors of previous models
Overfitting        | Less prone to overfitting             | Prone to overfitting (if too many iterations)
Common Algorithms  | Random Forest, Bagged Decision Trees  | AdaBoost, Gradient Boosting, XGBoost, LightGBM
Speed              | Faster to train (parallel)            | Slower to train (sequential)

Stacking:
We use stacking to improve the prediction accuracy of strong learners. Stacking aims to create a single robust model from multiple heterogeneous strong learners.
Stacking differs from bagging and boosting in that:
• It combines strong learners.
• It combines heterogeneous models.
• It involves creating a meta-model.
Individual heterogeneous models are trained on an initial dataset. Their predictions are then used to form a new dataset, which is used to train the meta-model; the meta-model makes the final prediction, typically by combining the base predictions with weighted averaging.
Because stacking combines strong learners, it can combine bagged or boosted models.

The steps of stacking are as follows (a code sketch follows this list):

• We use the initial training data to train m different algorithms.
• Using the output of each algorithm, we create a new training set.
• Using the new training set, we train a meta-model.
• Using the results of the meta-model, we make the final prediction, combining the base results with weighted averaging.
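
A minimal stacking sketch with Scikit-Learn's StackingClassifier, assuming heterogeneous base learners and a logistic regression meta-model trained on their out-of-fold predictions:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Heterogeneous (strong) base learners
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
    ("kn", KNeighborsClassifier()),
    ("svc", SVC(probability=True, random_state=42)),
]

# The meta-model learns how to combine the base learners' predictions
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(), cv=5)
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))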

Random Forests
Random forest is a supervised learning algorithm. The “forest” it builds is an ensemble
of decision trees, usually trained with the bagging method. The general idea of the bagging
method is that a combination of learning models increases the overall result.
A Random Forest is a collection of decision trees that work together to make predictions. Increasing the number of trees in the forest generally leads to higher accuracy and helps prevent overfitting. The Random Forest algorithm is a powerful tree-based learning technique: each tree makes a prediction, and the predictions of all trees are combined by voting. Random Forests are widely used for both classification and regression tasks.
• It is a type of classifier that uses many decision trees to make predictions.
• It takes different random parts of the dataset to train each tree and then it combines the
results by averaging them. This approach helps improve the accuracy of
predictions. Random Forest is based on ensemble learning.
Imagine asking a group of friends for advice on where to go for vacation. Each friend gives
their recommendation based on their unique perspective and preferences (decision trees trained
on different subsets of data). You then make your final decision by considering the majority
opinion or averaging their suggestions (ensemble prediction).
Random forest algorithm in Machine Learning

The Random Forest algorithm works in several steps:
• The process starts with a training dataset consisting of rows (instances) and their corresponding class labels.

• Then, multiple decision trees are created from the training data. Each tree is trained on a random subset of the data (sampled with replacement) and a random subset of features. This process is known as bagging or bootstrap aggregating.
• Each Decision Tree in the ensemble learns to make predictions independently.
• When presented with a new, unseen instance, each Decision Tree in the ensemble
makes a prediction.
The final prediction is made by combining the predictions of all the Decision Trees. This is
typically done through a majority vote (for classification) or averaging (for regression).
Key Features of Random Forest
• Handles Missing Data: Can handle missing values during training, reducing the need for manual imputation.
• Feature Importance: The algorithm ranks features based on their importance in making predictions, offering valuable insights for feature selection and interpretability.
• Scalability: Scales well to large and complex datasets without significant performance degradation.
• Versatility: Can be applied to both classification tasks (e.g., predicting categories) and regression tasks (e.g., predicting continuous values).
Applications of Random Forest
Random Forest is mainly used in four sectors:
1. Banking: The banking sector uses this algorithm to identify loan risk.
2. Medicine: Disease trends and disease risks can be identified with this algorithm.
3. Land Use: Areas of similar land use can be identified with this algorithm.
4. Marketing: Marketing trends can be identified using this algorithm.
Advantages of Random Forest
• Random Forest provides very accurate predictions, even on large datasets.
• Random Forest can handle missing data well without compromising accuracy.
• It doesn't require normalization or standardization of the dataset.
• Combining multiple decision trees reduces the risk of overfitting.
Limitations of Random Forest
• It can be computationally expensive, especially with a large number of trees.
• The model is harder to interpret than simpler models such as a single decision tree.

Department of Information Technology, SRKREC(A)
