
Machine Learning (unit-3)

Q) Boosting?

Boosting is a machine learning technique that combines multiple weak models to create a strong model. It's like having a team of experts working together to make a decision, where each expert has a different perspective on the problem.

How Does Boosting Work?

Here's a simplified explanation:

1. Start with a Weak Model: We start with a weak model that makes
predictions on the data.

2. Identify Errors: We identify the errors made by the weak model and give
more weight to the data points that were misclassified.

3. Create a New Model: We create a new model that tries to correct the errors
made by the previous model.

4. Combine Models: We combine the predictions of all the models to create a strong model.

5. Repeat: We repeat the process until we achieve the desired level of accuracy.

Key Concepts

1. Weak Model: A weak model is a model that makes predictions on the data, but is not very accurate.

2. Strong Model: A strong model is a model that makes accurate predictions on the data.

3. Weighting: We give more weight to the data points that were misclassified by the previous model.

4. Iteration: We repeat the process of creating a new model and combining it with the previous models.

Types of Boosting

1. AdaBoost: AdaBoost is a popular boosting algorithm that gives more weight to the data points that were misclassified by the previous model.

2. Gradient Boosting: Gradient Boosting is a boosting algorithm that uses gradient descent to optimize the loss function.

3. XGBoost: XGBoost is a popular gradient boosting algorithm that is widely used in machine learning competitions.

Advantages:

1. Improved Accuracy: Boosting can improve the accuracy of the model by combining the predictions of multiple weak models.

2. Handling Complex Data: Boosting can handle complex data sets with multiple features and interactions between them.

3. Robustness to Outliers: Boosting can provide robustness to outliers and noise in the data by reducing the impact of individual models.

4. Flexibility: Boosting allows for the use of different models and algorithms, which can be useful for handling different types of data.

5. Handling Missing Values: Boosting can handle missing values in the data by using surrogate models to impute the missing values.

6. Interpretability: Boosting can provide interpretability of the results by analyzing the contribution of each model to the final prediction.

7. Efficient Use of Data: Boosting can make efficient use of the data by using the same data to train multiple models.

Disadvantages:

1. Computational Cost: Boosting can be computationally expensive, especially when using large datasets and complex models.

2. Overfitting: Boosting can lead to overfitting, especially when using a large number of models or iterations.

3. Sensitive to Hyperparameters: Boosting can be sensitive to hyperparameters, such as the learning rate and the number of iterations.

4. Difficult to Interpret: Boosting can make it difficult to interpret the results, especially when using complex models and algorithms.

5. Requires Large Amounts of Data: Boosting requires large amounts of data to train and validate the models, which can be a limitation in some cases.

6. Can be Sensitive to Noise: Boosting can be sensitive to noise in the data, which can affect the performance of the model.

7. Not Suitable for All Problems: Boosting may not be suitable for all problems, especially those that require a simple and interpretable model.

Boosting example:

Step 1: Define the Problem

Suppose we want to predict whether a person is likely to buy a car based on their age and income. We have a dataset of 5 people, with their corresponding ages and incomes.

| Age | Income | Buy Car |
| --- | --- | --- |
| 25 | 50000 | Yes |
| 30 | 60000 | Yes |
| 35 | 70000 | No |
| 20 | 40000 | No |
| 40 | 80000 | Yes |

Step 2: Create a Weak Model

We start with a weak model that uses only the age to make predictions. This
model is not very accurate, but it's a good starting point.

| Age | Predicted Buy Car |
| --- | --- |
| 25 | Yes |
| 30 | Yes |
| 35 | Yes |
| 20 | No |
| 40 | Yes |

The weak model predicts that people older than 20 will buy a car, and people aged 20 or younger will not.

Step 3: Identify Errors

We identify the errors made by the weak model. For example, the model
predicts that a person aged 35 will buy a car, but they actually do not.

| Age | Predicted Buy Car | Actual Buy Car |
| --- | --- | --- |
| 25 | Yes | Yes |
| 30 | Yes | Yes |
| 35 | Yes | No (Error) |
| 20 | No | No |
| 40 | Yes | Yes |

Step 4: Create a New Model

We create a new model that uses the age and income to make predictions.
This model is more accurate than the previous model.

| Age | Income | Predicted Buy Car |
| --- | --- | --- |
| 25 | 50000 | Yes |
| 30 | 60000 | Yes |
| 35 | 70000 | No |
| 20 | 40000 | No |
| 40 | 80000 | Yes |

Because the misclassified 35-year-old was given more weight, the new model uses both age and income and now classifies all five people correctly.

Step 5: Combine Models

We combine the predictions of the two models to create a strong model.


| Age | Income | Predicted Buy Car (Weak Model) | Predicted Buy Car (New Model) | Final Predicted Buy Car |
| --- | --- | --- | --- | --- |
| 25 | 50000 | Yes | Yes | Yes |
| 30 | 60000 | Yes | Yes | Yes |
| 35 | 70000 | Yes | No | No |
| 20 | 40000 | No | No | No |
| 40 | 80000 | Yes | Yes | Yes |

The strong model uses the predictions of both models to make a final
prediction.

Step 6: Repeat

We repeat the process of creating a new model and combining it with the
previous models until we achieve the desired level of accuracy.
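A minimal sketch of this boosting example using scikit-learn's AdaBoost (assuming scikit-learn is installed; the feature values are taken from the toy table above, and the default base learner is a depth-1 decision tree):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

X = np.array([[25, 50000], [30, 60000], [35, 70000], [20, 40000], [40, 80000]])  # age, income
y = np.array([1, 1, 0, 0, 1])  # 1 = buys a car, 0 = does not

# Each round fits a new weak model with more weight on previously misclassified points.
model = AdaBoostClassifier(n_estimators=5, random_state=0)
model.fit(X, y)
print(model.predict([[28, 55000]]))  # prediction for a new person (hypothetical input)
```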

............................................ END…………………

Q) Role of weak learner in boosting?

In boosting, a weak learner is a machine learning model that is trained on a subset of the data and is used to make predictions on the remaining data. The weak learner is called "weak" because it is not expected to perform well on its own, but rather to contribute to the overall performance of the boosting ensemble.

The role of a weak learner in boosting is to:

1. Make predictions: The weak learner makes predictions on the training data,
which are then used to compute the error.

2. Compute the error: The error is computed as the difference between the
predicted values and the actual values.

3. Update the weights: The weights of the training data are updated based on
the error, so that the data points that are misclassified are given more weight.

4. Train the next weak learner: The next weak learner is trained on the
updated weighted data, and the process is repeated.
The weak learner is typically a simple model, such as a decision tree or a
linear classifier, and is trained on a subset of the data. The weak learner is not
expected to perform well on its own, but rather to contribute to the overall
performance of the boosting ensemble.

Properties of a weak learner:

1. Simple: A weak learner is typically a simple model, such as a decision tree or a linear classifier.

2. Fast: A weak learner is typically fast to train and evaluate.

3. Low accuracy: A weak learner is not expected to have high accuracy on its
own.

4. Diverse: A weak learner should be diverse from other weak learners, so that
they can contribute to the overall performance of the boosting ensemble.

Examples of weak learners:

1. Decision trees: Decision trees are a popular choice for weak learners in
boosting, as they are simple, fast, and can be trained on a subset of the data.

2. Linear classifiers: Linear classifiers, such as logistic regression or linear SVM, can be used as weak learners in boosting.

3. Neural networks: Neural networks can be used as weak learners in boosting, but they are typically more complex and computationally expensive than decision trees or linear classifiers.
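A small sketch of a single weak learner, a decision stump (a depth-1 decision tree), trained on the toy car data from the previous question; on its own it is deliberately limited to one split, which is why boosting combines many of them:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[25, 50000], [30, 60000], [35, 70000], [20, 40000], [40, 80000]])
y = np.array([1, 1, 0, 0, 1])

stump = DecisionTreeClassifier(max_depth=1)      # weak learner: a single split only
stump.fit(X, y)
print("training accuracy:", stump.score(X, y))   # typically imperfect, which is expected
```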

Advantages of using weak learners:

1. Improved accuracy: Boosting with weak learners can improve the accuracy
of the overall model.

2. Robustness to overfitting: Boosting with weak learners can help to prevent overfitting, as the weak learners are trained on a subset of the data and are not expected to perform well on their own.

3. Flexibility: Boosting with weak learners can be used with a variety of machine learning models, including decision trees, linear classifiers, and neural networks.

………………….. END………………….
Q) AdaBoost?

AdaBoost is a type of machine learning algorithm that combines multiple weak models to create a strong model. It's like having a team of experts working together to make a decision.

How does AdaBoost work?

1. Start with a weak model: We start with a simple model that makes
predictions on the data.

2. Identify errors: We identify the errors made by the weak model.

3. Create a new model: We create a new model that tries to correct the errors
made by the previous model.

4. Combine models: We combine the predictions of all the models to create a strong model.

5. Repeat: We repeat the process until we achieve the desired level of accuracy.

Types of AdaBoost Algorithms

There are several types of AdaBoost algorithms, including:

1. AdaBoost.M1: The original AdaBoost algorithm for classification, typically used with decision trees (often decision stumps) as the base model.

2. AdaBoost.M2: An extension of the original algorithm for multi-class problems, which uses a different (pseudo-loss based) weight update rule.

3. AdaBoost.R2: An extension of AdaBoost for regression problems, which uses a different loss and model update rule.

4. Real AdaBoost: A variant in which the base models output real-valued class probability estimates instead of hard class labels.

AdaBoost Formula

The AdaBoost formula is as follows:

F(x) = ∑[i=1 to T] α_i * h_i(x)

where:
- F(x) is the final prediction (for classification, the predicted class is the sign of F(x))

- α_i is the weight of the i-th base model, computed as α_i = ½ * ln((1 − ε_i) / ε_i), where ε_i is the weighted error of that model

- h_i(x) is the prediction of the i-th base model

- T is the number of iterations
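A hedged numpy sketch of one AdaBoost round, showing how α_i and the sample weights behind the formula above are computed (binary labels are encoded as +1/−1; the predictions are made-up for illustration):

```python
import numpy as np

y_true = np.array([1, 1, -1, -1, 1])       # actual labels
y_pred = np.array([1, 1, 1, -1, 1])        # predictions of the current weak model h_i
w = np.full(len(y_true), 1 / len(y_true))  # start with equal sample weights

eps = np.sum(w[y_pred != y_true])            # weighted error of the weak model
alpha = 0.5 * np.log((1 - eps) / eps)        # model weight alpha_i
w = w * np.exp(-alpha * y_true * y_pred)     # misclassified points get larger weights
w = w / w.sum()                              # renormalise so the weights sum to 1
print("error:", eps, "alpha:", round(alpha, 3))
print("updated weights:", np.round(w, 3))
```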

Key concepts

1. Weak model: A simple model that makes predictions on the data.

2. Strong model: A combination of multiple weak models that makes accurate predictions.

3. Weighting: We give more weight to the models that are more accurate.

4. Iteration: We repeat the process of creating a new model and combining it with the previous models.

Example

Suppose we want to predict whether a person will buy a car based on their
age and income. We start with a weak model that uses only the age to make
predictions. We then create a new model that uses the age and income to
make predictions. We combine the predictions of both models to create a
strong model.

Advantages

1. High accuracy: AdaBoost can achieve high accuracy on complex datasets.

2. Robust to noise: AdaBoost is robust to noise and can handle datasets with
outliers.

3. Simple to implement: AdaBoost is simple to implement and requires minimal computational resources.

Disadvantages

1. Computational cost: AdaBoost can be computationally expensive, especially on large datasets.

2. Overfitting: AdaBoost can suffer from overfitting, especially if the number of iterations is too high.

Real-life example:
AdaBoost is used in many real-life applications, such as:

1. Image classification: AdaBoost is used to classify images into different categories, such as objects, scenes, and actions.

2. Text classification: AdaBoost is used to classify text into different categories, such as spam vs. non-spam emails.

3. Recommendation systems: AdaBoost is used to recommend products to users based on their past behavior and preferences.

………………………….. END…………………

Q) Gradient Boosting?

Gradient Boosting is a type of machine learning algorithm that combines multiple weak models to create a strong model. It's like having a team of experts working together to make a decision.

How does Gradient Boosting work?

1. Start with a weak model: We start with a simple model that makes
predictions on the data.

2. Calculate the error: We calculate the error between the predicted values
and the actual values.

3. Create a new model: We create a new model that tries to correct the errors
made by the previous model.

4. Combine models: We combine the predictions of all the models to create a strong model.

5. Repeat: We repeat the process until we achieve the desired level of accuracy.

Key concepts

1. Weak model: A simple model that makes predictions on the data.

2. Strong model: A combination of multiple weak models that makes accurate predictions.

3. Gradient: Each new model is fit to the negative gradient of the loss function (for squared error, this is simply the residuals of the current ensemble).

4. Boosting: We combine the models to create a strong model.


Example: Suppose we want to predict the price of a house based on its
features, such as the number of bedrooms, bathrooms, and square footage. We
start with a weak model that uses only the number of bedrooms to make
predictions. We then create a new model that uses the number of bedrooms
and bathrooms to make predictions. We combine the predictions of both
models to create a strong model.
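A minimal sketch of this house-price idea using scikit-learn's gradient boosting (the feature values and prices below are made up for illustration):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# columns: bedrooms, bathrooms, square footage (hypothetical data)
X = np.array([[3, 2, 1500], [2, 1, 900], [4, 3, 2200], [3, 2, 1600], [5, 4, 3000]])
y = np.array([300000, 180000, 450000, 320000, 600000])  # prices

# Each new tree is fit to the residuals (negative gradient of squared error).
model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=2, random_state=0)
model.fit(X, y)
print(model.predict([[3, 2, 1700]]))  # predicted price for a new house
```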

Advantages

1. High accuracy: Gradient Boosting can achieve high accuracy on complex datasets.

2. Robust to noise: Gradient Boosting is robust to noise and can handle datasets with outliers.

3. Simple to implement: Gradient Boosting is simple to implement and requires minimal computational resources.

Disadvantages

1. Computational cost: Gradient Boosting can be computationally expensive, especially on large datasets.

2. Overfitting: Gradient Boosting can suffer from overfitting, especially if the number of iterations is too high.

Real-life example

Gradient Boosting is used in many real-life applications, such as:

1. Predicting stock prices: Gradient Boosting is used to predict stock prices based on historical data.

2. Recommendation systems: Gradient Boosting is used to recommend products to users based on their past behavior and preferences.

3. Image classification: Gradient Boosting is used to classify images into different categories, such as objects, scenes, and actions.

Types of Gradient Boosting

1. Gradient Boosting Machine (GBM): This is the most common, standard form of Gradient Boosting.

2. Extreme Gradient Boosting (XGBoost): A variant of Gradient Boosting that adds regularization and parallelized tree construction, and is optimized for performance and speed.

3. Light Gradient Boosting Machine (LightGBM): A variant of Gradient Boosting that uses histogram-based splits and leaf-wise tree growth, making it very fast on large datasets.

.............................. END…………..

Q) XGBoost?

What is XGBoost?

XGBoost is a type of machine learning algorithm that combines multiple weak models to create a strong model. It's like having a team of experts working together to make a decision.

How does XGBoost work?

1. Start with a weak model: We start with a simple model that makes
predictions on the data.

2. Calculate the error: We calculate the error between the predicted values
and the actual values.

3. Create a new model: We create a new model that tries to correct the errors
made by the previous model.

4. Combine models: We combine the predictions of all the models to create a strong model.

5. Repeat: We repeat the process until we achieve the desired level of accuracy.

Key concepts

1. Weak model: A simple model that makes predictions on the data.

2. Strong model: A combination of multiple weak models that makes accurate predictions.

3. Gradient: We use the gradient of the loss function to update the models.

4. Boosting: We combine the models to create a strong model.

What makes XGBoost special?


1. Speed: XGBoost is optimized for performance and speed, making it much
faster than other Gradient Boosting algorithms.

2. Handling missing values: XGBoost can handle missing values in the data,
which is a common problem in machine learning.

3. Regularization: XGBoost has a built-in regularization mechanism to prevent overfitting.

4. Parallelization: XGBoost can take advantage of multiple CPU cores to speed up the training process.
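A hedged sketch using the xgboost package (assuming it is installed, e.g. via `pip install xgboost`); the house-price data is the same made-up example used in the Gradient Boosting section:

```python
import numpy as np
from xgboost import XGBRegressor

X = np.array([[3, 2, 1500], [2, 1, 900], [4, 3, 2200], [3, 2, 1600], [5, 4, 3000]])
y = np.array([300000, 180000, 450000, 320000, 600000])

model = XGBRegressor(n_estimators=200, learning_rate=0.1, max_depth=3,
                     reg_lambda=1.0,   # built-in L2 regularization
                     n_jobs=-1)        # use all CPU cores (parallel training)
model.fit(X, y)
print(model.predict(np.array([[3, 2, 1700]])))
```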

Advantages

1. High accuracy: XGBoost can achieve high accuracy on complex datasets.

2. Fast training time: XGBoost is much faster than other Gradient Boosting
algorithms.

3. Handling missing values: XGBoost can handle missing values in the data.

4. Easy to use: XGBoost has a simple and intuitive API.

Disadvantages

1. Overfitting: XGBoost can suffer from overfitting, especially if the number of iterations is too high.

2. Computational cost: XGBoost can be computationally expensive, especially on large datasets.

Real-life example

XGBoost is used in many real-life applications, such as:

1. Predicting stock prices: XGBoost is used to predict stock prices based on historical data.

2. Recommendation systems: XGBoost is used to recommend products to users based on their past behavior and preferences.

3. Image classification: XGBoost is used to classify images into different categories, such as objects, scenes, and actions.

……………………… END………………………
Q) SVM Regression?

SVM regression is a type of machine learning algorithm that uses a technique called Support Vector Machines (SVMs) to make predictions on continuous data. It's like a powerful tool that helps us find the best line or curve that fits the data.

How Does SVM Regression Work?

Here's a simplified explanation:

1. Data: We have a dataset with input features (e.g. age, income) and a
continuous output variable (e.g. house price).

2. Find the Best Line: SVM regression tries to find the best line or curve that
fits the data, by minimizing the error between the predicted and actual values.

3. Use Kernels: SVM regression uses a technique called kernels to transform the data into a higher-dimensional space, where it's easier to find the best line or curve.

4. Find the Support Vectors: The algorithm identifies the most important data
points, called support vectors, that help define the best line or curve.

5. Make Predictions: Once the best line or curve is found, SVM regression uses
it to make predictions on new, unseen data.

Key Concepts

1. Kernel Trick: The kernel trick is a mathematical technique that allows us to
transform the data into a higher-dimensional space, without actually having to
compute the coordinates of the data points in that space.

2. Regularization: Regularization is a technique that helps prevent overfitting by adding a penalty term to the error function, to discourage the algorithm from fitting the noise in the data.

3. Hyperparameters: Hyperparameters are parameters that need to be set before training the algorithm, such as the kernel type, regularization parameter, and epsilon value.

Types of SVM Regression

1. Linear SVM Regression: Linear SVM regression uses a linear kernel to find
the best line that fits the data.

2. Non-Linear SVM Regression: Non-linear SVM regression uses a non-linear kernel (e.g. polynomial, radial basis function) to find the best curve that fits the data.

3. Epsilon-SVM Regression: Epsilon-SVM regression is a type of SVM regression that uses a parameter called epsilon to control the amount of error allowed in the predictions.

Example

Suppose we want to predict house prices based on features like age, income,
and location. We collect a dataset of 100 houses, with their corresponding
prices. We use SVM regression to find the best line or curve that fits the data,
and then use it to make predictions on new, unseen data.
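A minimal sketch of SVM regression with scikit-learn's SVR (random synthetic data stands in for the 100-house dataset described above; C and epsilon are the hyperparameters mentioned earlier):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                          # e.g. age, income, location score
y = 50 * X[:, 0] + 30 * X[:, 1] + rng.normal(size=100) # house price (synthetic)

# epsilon controls how much error is tolerated; C is the regularization parameter.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.5))
model.fit(X, y)
print(model.predict(X[:3]))
```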

………………….. END……………….

Q) Gaussian RBF kernel in SVM?

In machine learning, a Gaussian RBF (Radial Basis Function) kernel is a type of kernel used in Support Vector Machines (SVMs) to classify data.

What is a kernel?

A kernel is a way to transform the original data into a higher-dimensional space, where it becomes easier to separate the data into different classes. Think of it like a magic lens that helps the SVM see the data in a new and more useful way.

What is a Gaussian RBF kernel?

A Gaussian RBF kernel is a specific type of kernel that uses a Gaussian distribution (also known as a bell curve) to transform the data. It's called "radial" because it's based on the distance between the data points, and "basis function" because it's a mathematical function that helps to represent the data.

How does it work?

The Gaussian RBF kernel takes the original data points and calculates the
similarity between them using a Gaussian function. The similarity is based on
the distance between the points, so points that are close together will have a
high similarity, while points that are far apart will have a low similarity.

The Gaussian RBF kernel is defined as:

K(x, y) = exp(-γ|x - y|^2)

where x and y are the data points, γ is a parameter that controls the width of
the Gaussian function, and exp is the exponential function.
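A small sketch computing the kernel value for two pairs of points, matching the formula K(x, y) = exp(-γ|x - y|^2); the points and γ value are made up for illustration:

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    # similarity based on squared Euclidean distance
    return np.exp(-gamma * np.sum((x - y) ** 2))

x = np.array([1.0, 2.0])
y_near = np.array([1.1, 2.1])
y_far = np.array([5.0, 7.0])
print(rbf_kernel(x, y_near))  # close points -> similarity near 1
print(rbf_kernel(x, y_far))   # distant points -> similarity near 0
```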

What are the benefits of using a Gaussian RBF kernel?

1. Non-linear separation: The Gaussian RBF kernel can separate data that is
not linearly separable, which means it can handle complex relationships
between the data points.

2. Robust to noise: The Gaussian RBF kernel is robust to noise and outliers in
the data, which means it can handle data that is not perfect.

3. Flexible: The Gaussian RBF kernel can be used with different types of data,
including numerical and categorical data.

Common applications of Gaussian RBF kernel

1. Image classification: Gaussian RBF kernel is commonly used in image classification tasks, such as object recognition and image segmentation.

2. Text classification: Gaussian RBF kernel is used in text classification tasks, such as spam detection and sentiment analysis.
3. Bioinformatics: Gaussian RBF kernel is used in bioinformatics to classify
biological data, such as gene expression data and protein sequences.

In summary, the Gaussian RBF kernel is a powerful tool in machine learning that can be used to classify complex data. It's a non-linear kernel that can handle non-linear relationships between the data points, and it's robust to noise and outliers.

…………………. END…………..

Q) ensemble learning methods?

Ensemble learning is a technique in machine learning where we combine multiple models to improve the performance of a single model. It's like having a team of experts working together to make a decision, rather than relying on a single expert.

Why Ensemble Learning?

Ensemble learning helps to:

1. Improve Accuracy: By combining multiple models, we can reduce the error rate and improve the overall accuracy of the predictions.

2. Reduce Overfitting: Ensemble learning can help to reduce overfitting by averaging out the predictions of multiple models.

3. Increase Robustness: Ensemble learning can make the model more robust
to changes in the data or to outliers.

Types of Ensemble Learning Methods

There are several types of ensemble learning methods, including:

1. Bagging: Bagging involves creating multiple models on different subsets of the data and then combining their predictions.

2. Boosting: Boosting involves creating multiple models on the same data, but
with different weights assigned to each model.

3. Stacking: Stacking involves creating multiple models and then using a meta-
model to combine their predictions.

4. Random Forest: Random Forest is a type of ensemble learning method that combines multiple decision trees to make predictions.
5. Gradient Boosting: Gradient Boosting is a type of ensemble learning method
that combines multiple models to make predictions, with each model
attempting to correct the errors of the previous model.

How Ensemble Learning Works

Here's a simple example of how ensemble learning works:

1. Train Multiple Models: We train multiple models on the same data, using
different algorithms or different subsets of the data.

2. Make Predictions: Each model makes predictions on the test data.

3. Combine Predictions: We combine the predictions of each model using a voting scheme or by taking the average of the predictions.

4. Make Final Prediction: The final prediction is made based on the combined
predictions of all the models.

Example

Suppose we want to predict whether a person is likely to buy a car based on their age, income, and location. We train three models:

1. Model 1: A decision tree model that predicts 70% of the people will buy a
car.

2. Model 2: A logistic regression model that predicts 60% of the people will
buy a car.

3. Model 3: A neural network model that predicts 80% of the people will buy a
car.

We combine the predictions of the three models by averaging them: (70% + 60% + 80%) / 3 = 70%. The final prediction is that 70% of the people will buy a car.

…………………. END………………….

Q) Bagging and Pasting ?

What is Bagging?

Bagging, also known as Bootstrap Aggregating, is a machine learning technique that combines multiple models to improve the accuracy and stability of predictions. It's like creating a team of experts, where each expert makes a prediction, and then the team votes on the final answer.

How does Bagging work?

Here's how bagging works:

1. Create multiple datasets: Take your original dataset and create multiple
subsets of it, called bootstrap samples. Each bootstrap sample is created by
randomly selecting a subset of the original data with replacement.

2. Train a model on each dataset: Train a machine learning model on each bootstrap sample.

3. Make predictions: Use each model to make predictions on new, unseen data.

4. Combine predictions: Combine the predictions from each model to create a final prediction.
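A minimal bagging sketch with scikit-learn (synthetic house data; the default base learner of BaggingRegressor is a decision tree, and each one is trained on a bootstrap sample drawn with replacement):

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                                      # house features (synthetic)
y = 100000 + 50000 * X[:, 0] + rng.normal(scale=10000, size=100)   # house prices (synthetic)

bagging = BaggingRegressor(n_estimators=5,
                           bootstrap=True,   # sample WITH replacement = bagging
                           random_state=0)
bagging.fit(X, y)
print(bagging.predict(X[:1]))  # average of the 5 trees' estimates
```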

What is Pasting?

Pasting is an ensemble technique that, like bagging, trains multiple models on random subsets of the training data and combines their predictions. The difference is that pasting draws each subset without replacement, whereas bagging samples with replacement.

How does Pasting work?

Here's how pasting works:

1. Create multiple datasets: Take your original dataset and create multiple subsets of it by randomly selecting instances without replacement, so the same instance cannot appear twice in one subset.

2. Train a model on each dataset: Train a machine learning model on each subset.

3. Make predictions: Use each model to make predictions on new, unseen data.

4. Combine predictions: Combine the predictions from each model to create a final prediction.
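A hedged sketch of pasting: the same bagging class as above, but with subsets drawn without replacement (bootstrap=False in scikit-learn's bagging estimators):

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 100000 + 50000 * X[:, 0] + rng.normal(scale=10000, size=100)

pasting = BaggingRegressor(n_estimators=5,
                           max_samples=0.8,   # each model sees 80% of the data
                           bootstrap=False,   # sample WITHOUT replacement = pasting
                           random_state=0)
pasting.fit(X, y)
print(pasting.predict(X[:1]))
```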

Key differences between Bagging and Pasting

Here are the key differences between bagging and pasting:

- Sampling: Bagging draws the training subsets with replacement (bootstrap samples), while pasting draws them without replacement.

- Repeated instances: In bagging, the same training instance can appear several times in one subset; in pasting it can appear at most once.

- Prediction combination: Both bagging and pasting combine the predictions of the individual models in the same way, typically by majority voting for classification or by averaging for regression.

Benefits of Bagging and Pasting

Both bagging and pasting have several benefits, including:

- Improved accuracy: By combining multiple models, you can improve the accuracy of predictions.

- Increased stability: By averaging out the predictions from multiple models, you can reduce the impact of overfitting and improve the stability of predictions.

- Reduced variance: By combining multiple models, you can reduce the variance of predictions and improve the overall performance of the model.

Common applications of Bagging and Pasting

Bagging and pasting are commonly used in a variety of applications, including:

- Classification: Bagging and pasting can be used to improve the accuracy of classification models, such as decision trees or random forests.

- Regression: Bagging and pasting can be used to improve the accuracy of regression models, such as linear regression or neural networks.

- Time series forecasting: Bagging and pasting can be used to improve the accuracy of time series forecasting models, such as ARIMA or LSTM.

Real-life example of Bagging

Let's say you're trying to predict the price of a house, and you have a dataset
of 100 houses with their features (number of bedrooms, square footage,
location, etc.). You create 5 bootstrap samples, each with 80 houses. You train
a decision tree model on each bootstrap sample, and then you use each model
to estimate the price of a new house. The estimates are:
- Model 1: $500,000

- Model 2: $550,000

- Model 3: $450,000

- Model 4: $520,000

- Model 5: $480,000

You take the average of all 5 estimates, which is $500,000. This is your final
prediction.

Real-life example of Pasting

Let's say you're trying to predict the price of a house, and you have a dataset of 100 houses with their features (number of bedrooms, square footage, location, etc.). You create 5 subsets of 80 houses each, this time sampling without replacement, so no house appears twice in the same subset. You train a decision tree model on each subset, and then you use each model to estimate the price of a new house. The estimates are:

- Model 1: $480,000

- Model 2: $520,000

- Model 3: $450,000

- Model 4: $500,000

- Model 5: $550,000

You take the average of all 5 estimates, which is $500,000. This is your final prediction.

...................... END ………………


Q) Voting Classifiers?

A Voting Classifier is a type of machine learning algorithm that combines the predictions of multiple models to make a final prediction. It's like having a team of experts voting on a decision.

How does a Voting Classifier work?

1. Train multiple models: We train multiple machine learning models on the same dataset.

2. Make predictions: Each model makes predictions on the test data.

3. Combine predictions: We combine the predictions of all the models using a voting mechanism.

4. Final prediction: The final prediction is the class with the most votes.

Types of Voting Classifiers

1. Hard Voting:

Hard Voting Classifiers are a type of ensemble learning method that combines
the predictions of multiple models to make a final prediction. In simple words,
it's like having a team of experts voting on a decision, where each model gets
one vote, and the class with the most votes wins.

For example, let's say we have three models predicting whether a picture is
of a dog, cat, or horse. Model 1 predicts dog, Model 2 predicts cat, and Model 3
predicts dog. In this case, the final prediction would be dog because it got two
votes.

Hard voting classifiers are simple to implement and can be very effective in
improving the accuracy of your models. They work by reducing the impact of
noise and outliers in the data, and by combining the strengths of multiple
models.

Some popular classification algorithms used in machine learning include Naive Bayes Classifier, Logistic Regression, Decision Tree, Random Forests, Support Vector Machines, and K-Nearest Neighbour. These algorithms can be used for various classification problems, such as image recognition, text classification, and customer behavior prediction.
To illustrate this, let's consider a text classification problem where we want to
classify phrases or words into a particular category, such as sports or not
sports. We can use a hard voting classifier to combine the predictions of
multiple models, such as Naive Bayes and Logistic Regression, to make a final
prediction.

In summary, hard voting classifiers are a powerful tool in machine learning that can help improve the accuracy and robustness of your models by combining the predictions of multiple models.

2. Soft Voting:

Soft Voting Classifiers are a type of ensemble learning method that combines
the predictions of multiple models to make a final prediction. In simple words,
it's like having a team of experts voting on a decision, where each model gets a
weighted vote based on its confidence in the prediction.

For example, let's say we have three models predicting whether a picture is
of a dog, cat, or horse. Model 1 predicts dog with 80% confidence, Model 2
predicts cat with 60% confidence, and Model 3 predicts dog with 70%
confidence. In this case, the final prediction would be dog because it got the
highest weighted vote.

Soft voting classifiers are useful when we have multiple models with different
strengths and weaknesses, and we want to combine their predictions to make
a more accurate final prediction. They can be used for both binary and
multiclass classification problems, and can handle nonlinear relationships
between features.

Some popular classification algorithms used in soft voting classifiers include Support Vector Machines (SVM), Logistic Regression, and Decision Trees. These algorithms can be used for various classification problems, such as image recognition, text classification, and customer behavior prediction.

To illustrate this, let's consider a text classification problem where we want to classify phrases or words into a particular category, such as sports or not sports. We can use a soft voting classifier to combine the predictions of multiple models, such as SVM and Logistic Regression, to make a final prediction. The soft voting classifier would calculate the weighted average of the predictions from each model, and assign the label with the highest weighted average to the new data point.
Overall, soft voting classifiers are a powerful tool in machine learning that can
help improve the accuracy and robustness of our models by combining the
strengths of multiple models.
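A minimal sketch of hard and soft voting with scikit-learn (synthetic data stands in for a real text or image dataset; for soft voting every base model must be able to output class probabilities):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

estimators = [("lr", LogisticRegression()),
              ("rf", RandomForestClassifier(random_state=0)),
              ("nb", GaussianNB())]

hard = VotingClassifier(estimators=estimators, voting="hard")  # majority vote
soft = VotingClassifier(estimators=estimators, voting="soft")  # average predicted probabilities
hard.fit(X, y)
soft.fit(X, y)
print(hard.predict(X[:3]), soft.predict(X[:3]))
```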

Advantages

1. Improved accuracy: Voting Classifiers can improve the accuracy of individual models.

2. Robustness to noise: Voting Classifiers can reduce the impact of noise in the
data.

3. Handling missing values: Voting Classifiers can handle missing values in the
data.

Disadvantages

1. Increased complexity: Voting Classifiers can be more complex to implement and train.

2. Overfitting: Voting Classifiers can suffer from overfitting if the individual models are overfitting.

Real-life example

Voting Classifiers are used in many real-life applications, such as:

1. Image classification: Voting Classifiers are used to classify images into different categories, such as objects, scenes, and actions.

2. Text classification: Voting Classifiers are used to classify text into different categories, such as spam vs. non-spam emails.

3. Recommendation systems: Voting Classifiers are used to recommend products to users based on their past behavior and preferences.

………………………….. END ………………..

Q) Stacking ?

Stacking is a technique in machine learning where we combine the predictions of multiple models to make a final prediction. In simple words, it's like having a team of experts working together to make a decision, where each expert (model) makes a prediction, and then a meta-model (another model) combines these predictions to make a final decision.
Here's how it works:

1. Train multiple models: We train multiple models on the same dataset, each
with a different algorithm or configuration.

2. Make predictions: Each model makes predictions on the training data.

3. Create a meta-model: We create a new model, called a meta-model, that takes the predictions from each of the individual models as input.

4. Train the meta-model: We train the meta-model to make a final prediction based on the predictions from the individual models.

5. Make a final prediction: The meta-model makes a final prediction on new, unseen data.

Stacking is useful when we have multiple models that are good at different
things, and we want to combine their strengths to make a more accurate final
prediction. It's like having a team of experts, where each expert is good at a
different aspect of the problem, and the meta-model combines their expertise
to make a final decision.

Some popular algorithms used in stacking include Logistic Regression, Decision Trees, and Random Forests. These algorithms can be used as individual models or as meta-models, depending on the problem and the data.

Overall, stacking is a powerful technique in machine learning that can help us improve the accuracy and robustness of our models by combining the strengths of multiple models.

Advantages:

1. Improved Accuracy: Stacking can improve the accuracy of the model by combining the predictions of multiple models.

2. Reducing Overfitting: Stacking can reduce overfitting by averaging the predictions of multiple models, which can help to reduce the impact of noise in the data.

3. Handling Different Types of Data: Stacking can handle different types of data, such as numerical and categorical data, by using different models for each type of data.

4. Flexibility: Stacking allows for the use of different models and algorithms, which can be useful for handling complex data sets.

5. Robustness: Stacking can provide robustness to the model by reducing the impact of outliers and noise in the data.

6. Improved Interpretability: Stacking can provide improved interpretability of the results by allowing for the analysis of the contributions of each model to the final prediction.

Disadvantages:

1. Increased Complexity: Stacking can increase the complexity of the model, which can make it more difficult to interpret and understand.

2. Computational Cost: Stacking can be computationally expensive, especially when using large datasets and complex models.

3. Overfitting to the Meta-Model: Stacking can lead to overfitting to the meta-model, which can result in poor performance on unseen data.

4. Difficulty in Choosing the Right Models: Stacking requires the choice of the right models and algorithms, which can be difficult and time-consuming.

5. Risk of Correlated Models: Stacking can lead to correlated models, which can result in poor performance if the models are not diverse enough.

6. Difficulty in Interpreting the Results: Stacking can make it difficult to interpret the results, especially when using complex models and algorithms.

7. Requires Large Amounts of Data: Stacking requires large amounts of data to train and validate the models, which can be a limitation in some cases.

8. Can be Sensitive to Hyperparameters: Stacking can be sensitive to hyperparameters, such as the choice of models and algorithms, which can affect the performance of the model.

When to Use Stacking:

1. When dealing with complex data sets: Stacking can be useful when dealing
with complex data sets that require the use of multiple models and
algorithms.

2. When improving accuracy is crucial: Stacking can be useful when improving accuracy is crucial, such as in applications where the cost of errors is high.
3. When handling different types of data: Stacking can be useful when
handling different types of data, such as numerical and categorical data.

When Not to Use Stacking:

1. When dealing with simple data sets: Stacking may not be necessary when
dealing with simple data sets that can be handled by a single model.

2. When computational resources are limited: Stacking can be computationally expensive, so it may not be suitable when computational resources are limited.

3. When interpretability is crucial: Stacking can make it difficult to interpret the results, so it may not be suitable when interpretability is crucial.

Here's an example:

Let's say we're trying to predict whether a customer will buy a product or not.
We have three models:

- Model 1: A Logistic Regression model that predicts based on demographic data (age, income, etc.).

- Model 2: A Decision Tree model that predicts based on purchase history (what products they've bought before).

- Model 3: A Random Forest model that predicts based on browsing behavior (what pages they've visited on our website).

We train each model on the same dataset and get the following predictions:

| Customer | Model 1 | Model 2 | Model 3 |
| --- | --- | --- | --- |
| John | 0.8 | 0.6 | 0.7 |
| Jane | 0.4 | 0.8 | 0.5 |
| Bob | 0.7 | 0.4 | 0.6 |

We then create a meta-model, let's say a Linear Regression model, that takes
the predictions from each of the individual models as input. The meta-model
predicts the final probability of a customer buying the product.

| Customer | Meta-Model |
| --- | --- |
| John | 0.85 |
| Jane | 0.55 |
| Bob | 0.65 |

In this example, the meta-model combines the predictions from the individual
models to make a final prediction. The final prediction is a weighted average
of the individual predictions, where the weights are learned by the meta-
model.
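A hedged sketch of this stacking setup with scikit-learn: three base models plus a logistic-regression meta-model (synthetic data stands in for the customer table above, which is not reproduced here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=6, random_state=0)

base_models = [("lr", LogisticRegression()),
               ("tree", DecisionTreeClassifier(max_depth=3)),
               ("rf", RandomForestClassifier(n_estimators=50, random_state=0))]

# The meta-model is trained on the (cross-validated) predictions of the base models.
stack = StackingClassifier(estimators=base_models, final_estimator=LogisticRegression())
stack.fit(X, y)
print(stack.predict_proba(X[:3]))  # final probability of buying, per customer
```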

……………………….. END……………

Q) list of kernels in SVM?

1. Linear Kernel: Maps the data to a higher-dimensional space using a linear transformation.

- Example: Classification of linearly separable data, such as separating two classes of points in a 2D space.

- Formula: K(x, y) = x · y

2. Polynomial Kernel: Maps the data to a higher-dimensional space using a polynomial transformation.

- Example: Classification of non-linearly separable data, such as separating two classes of points in a 2D space using a polynomial curve.

- Formula: K(x, y) = (x · y + 1)^d

3. Radial Basis Function (RBF) Kernel: Maps the data to a higher-dimensional space using a Gaussian distribution.

- Example: Classification of non-linearly separable data, such as separating two classes of points in a 2D space using a Gaussian curve.

- Formula: K(x, y) = exp(-γ|x - y|^2)

4. Sigmoid Kernel: Maps the data to a higher-dimensional space using a sigmoid function.

- Example: Binary classification problems, such as spam vs. non-spam emails.

- Formula: K(x, y) = tanh(αx · y + β)

5. Gaussian Kernel: Similar to the RBF kernel, but with a different formulation.

- Example: Classification of non-linearly separable data, such as separating two classes of points in a 2D space using a Gaussian curve.

- Formula: K(x, y) = exp(-|x - y|^2 / 2σ^2)

6. Laplacian Kernel: Maps the data to a higher-dimensional space using a Laplacian distribution.

- Example: Classification of non-linearly separable data, such as separating two classes of points in a 2D space using a Laplacian curve.

- Formula: K(x, y) = exp(-|x - y| / σ)

7. ANOVA Kernel: Maps the data to a higher-dimensional space using an Analysis of Variance (ANOVA) transformation.

- Example: Classification of non-linearly separable data, such as separating two classes of points in a 2D space using an ANOVA curve.

- Formula: K(x, y) = ∑[k=1 to n] exp(-σ(x_k - y_k)^2)^d

8. Tangent Kernel: Maps the data to a higher-dimensional space using a tangent function.

- Example: Classification of non-linearly separable data, such as separating two classes of points in a 2D space using a tangent curve.

- Formula: K(x, y) = tan(π/2 * (x · y + 1))

9. Chi-Square Kernel: Maps the data to a higher-dimensional space using a chi-square distribution.

- Example: Classification of non-linearly separable data, such as separating two classes of points in a 2D space using a chi-square curve.

- Formula: K(x, y) = 1 - ∑[i=1 to d] (x_i - y_i)^2 / (x_i + y_i)

10. Histogram Intersection Kernel: Maps the data to a higher-dimensional space using a histogram intersection transformation.

- Example: Classification of non-linearly separable data, such as separating two classes of points in a 2D space using a histogram intersection curve.

- Formula: K(x, y) = ∑[i=1 to d] min(x_i, y_i)

These kernels can be used for different types of data and problems, such as:

- Linear kernel for linearly separable data

- Polynomial kernel for non-linearly separable data

- RBF kernel for non-linearly separable data with a large number of features

- Sigmoid kernel for binary classification problems

Each kernel has its own strengths and weaknesses, and the choice of kernel
depends on the specific problem and data.
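A minimal sketch showing how the most common of these kernels are selected in scikit-learn's SVC (the synthetic "moons" dataset is just a convenient non-linearly separable example):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel, degree=3, gamma="scale")  # degree only matters for 'poly'
    clf.fit(X, y)
    print(kernel, "training accuracy:", round(clf.score(X, y), 2))
```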

…………………….. END …………………

Q) Random forests?

Random Forest is a popular machine learning algorithm that belongs to the supervised learning technique. It can be used for both Classification and Regression problems in ML. It is based on the concept of ensemble learning, which is a process of combining multiple classifiers to solve a complex problem and to improve the performance of the model.

As the name suggests, "Random Forest is a classifier that contains a number of


decision trees on various subsets of the given dataset and takes the average to
improve the predictive accuracy of that dataset." Instead of relying on one
decision tree, the random forest takes the prediction from each tree and based
on the majority votes of predictions, and it predicts the final output.

The greater number of trees in the forest leads to higher accuracy and
prevents the problem of overfitting.

Note: To better understand the Random Forest Algorithm, you should have
knowledge of the Decision Tree Algorithm.

Assumptions for Random Forest:

Since the random forest combines multiple trees to predict the class of the
dataset, it is possible that some decision trees may predict the correct output,
while others may not. But together, all the trees predict the correct output.
Therefore, below are two assumptions for a better Random forest classifier:

o There should be some actual values in the feature variable of the dataset
so that the classifier can predict accurate results rather than a guessed
result.
o The predictions from each tree must have very low correlations.

Why use Random Forest?

Below are some points that explain why we should use the Random Forest
algorithm:

o It takes less training time as compared to other algorithms.
o It predicts output with high accuracy, and it runs efficiently even on large datasets.
o It can also maintain accuracy when a large proportion of data is missing.

How does Random Forest algorithm work?

Random Forest works in two phases: the first is to create the random forest by combining N decision trees, and the second is to make predictions by letting every tree in the forest vote on each new data point.

The working process can be explained in the below steps:

Step-1: Select random K data points from the training set.

Step-2: Build the decision trees associated with the selected data points
(Subsets).

Step-3: Choose the number N for decision trees that you want to build.

Step-4: Repeat Step 1 & 2.

Step-5: For new data points, find the predictions of each decision tree, and
assign the new data points to the category that wins the majority votes.
The working of the algorithm can be better understood by the below example:

Example: Suppose there is a dataset that contains multiple fruit images. This dataset is given to the Random Forest classifier. The dataset is divided into subsets and given to each decision tree. During the training phase, each decision tree produces a prediction result, and when a new data point occurs, the Random Forest classifier predicts the final decision based on the majority of results.
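A minimal random-forest sketch with scikit-learn (the standard iris dataset stands in for the fruit-image example):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# N = 100 decision trees, each built on a random subset of rows and features;
# the final class is decided by majority vote across the trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```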

Applications of Random Forest:

There are mainly four sectors where Random forest mostly used:

1. Banking: Banking sector mostly uses this algorithm for the identification of loan risk.
2. Medicine: With the help of this algorithm, disease trends and risks of the disease can be identified.
3. Land Use: We can identify the areas of similar land use by this algorithm.
4. Marketing: Marketing trends can be identified using this algorithm.

Advantages of Random Forest:

o Random Forest is capable of performing both Classification and Regression tasks.
o It is capable of handling large datasets with high dimensionality.
o It enhances the accuracy of the model and prevents the overfitting issue.
Disadvantages of Random Forest:

o Although random forest can be used for both classification and regression tasks, it is not as well suited for regression tasks as it is for classification.

………………… end ………………

Q) Naive Bayes?

o Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes theorem and used for solving classification problems.
o It is mainly used in text classification that includes a high-dimensional
training dataset.
o Naïve Bayes Classifier is one of the simple and most effective
Classification algorithms which helps in building the fast machine
learning models that can make quick predictions.
o It is a probabilistic classifier, which means it predicts on the basis
of the probability of an object.
o Some popular examples of Naïve Bayes Algorithm are spam filtration,
Sentimental analysis, and classifying articles.

Why is it called Naïve Bayes?

The Naïve Bayes algorithm is comprised of two words Naïve and Bayes, Which
can be described as:

o Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features. For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Hence each feature individually contributes to identifying it as an apple without depending on the others.
o Bayes: It is called Bayes because it depends on the principle of Bayes'
Theorem.

Types of Naive Bayes

There are three types of Naive Bayes algorithms:

o 1. Multinomial Naive Bayes: This algorithm is typically used when the features are discrete counts, such as word counts in text classification.
o 2. Bernoulli Naive Bayes: This algorithm is used when the features are binary, i.e. each feature is either present or absent.
o 3. Gaussian Naive Bayes: This algorithm is used for continuous features and is suitable for features that follow a Gaussian distribution.

Bayes' Theorem:

o Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to determine the probability of a hypothesis with prior knowledge. It depends on the conditional probability.
o The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) * P(A) / P(B)

Where,

P(A|B) is Posterior probability: Probability of hypothesis A given the observed event B.

P(B|A) is Likelihood probability: Probability of the evidence given that the hypothesis is true.

P(A) is Prior Probability: Probability of the hypothesis before observing the evidence.

P(B) is Marginal Probability: Probability of the evidence.

Working of Naïve Bayes' Classifier:

Working of Naïve Bayes' Classifier can be understood with the help of the
below example:

Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset we need to decide whether we should play or not on a particular day according to the weather conditions. To solve this problem, we need to follow the below steps:

1. Convert the given dataset into frequency tables.


2. Generate Likelihood table by finding the probabilities of given features.
3. Now, use Bayes theorem to calculate the posterior probability.

Problem: If the weather is sunny, then the Player should play or not?

Solution: To solve this, first consider the below dataset:


| | Outlook | Play |
| --- | --- | --- |
| 0 | Rainy | Yes |
| 1 | Sunny | Yes |
| 2 | Overcast | Yes |
| 3 | Overcast | Yes |
| 4 | Sunny | No |
| 5 | Rainy | Yes |
| 6 | Sunny | Yes |
| 7 | Overcast | Yes |
| 8 | Rainy | No |
| 9 | Sunny | No |
| 10 | Sunny | Yes |
| 11 | Rainy | No |
| 12 | Overcast | Yes |
| 13 | Overcast | Yes |

Frequency table for the Weather Conditions:

| Weather | Yes | No |
| --- | --- | --- |
| Overcast | 5 | 0 |
| Rainy | 2 | 2 |
| Sunny | 3 | 2 |
| Total | 10 | 4 |

Likelihood table weather condition:

| Weather | No | Yes | P(Weather) |
| --- | --- | --- | --- |
| Overcast | 0 | 5 | 5/14 = 0.35 |
| Rainy | 2 | 2 | 4/14 = 0.29 |
| Sunny | 2 | 3 | 5/14 = 0.35 |
| All | 4/14 = 0.29 | 10/14 = 0.71 | |

Applying Bayes'theorem:

P(Yes|Sunny)= P(Sunny|Yes)*P(Yes)/P(Sunny)

P(Sunny|Yes)= 3/10= 0.3

P(Sunny)= 0.35

P(Yes)=0.71

So P(Yes|Sunny) = 0.3*0.71/0.35= 0.60

P(No|Sunny)= P(Sunny|No)*P(No)/P(Sunny)

P(Sunny|NO)= 2/4=0.5

P(No)= 0.29

P(Sunny)= 0.35

So P(No|Sunny)= 0.5*0.29/0.35 = 0.41

As we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny). Hence, on a Sunny day, the player can play the game.
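A small sketch reproducing the hand calculation above directly from the 14-row table:

```python
outlook = ["Rainy", "Sunny", "Overcast", "Overcast", "Sunny", "Rainy", "Sunny",
           "Overcast", "Rainy", "Sunny", "Sunny", "Rainy", "Overcast", "Overcast"]
play    = ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "No", "No", "Yes", "No", "Yes", "Yes"]

n = len(play)
p_yes = play.count("Yes") / n            # P(Yes) = 10/14
p_no = play.count("No") / n              # P(No)  = 4/14
p_sunny = outlook.count("Sunny") / n     # P(Sunny) = 5/14
p_sunny_yes = sum(o == "Sunny" and p == "Yes" for o, p in zip(outlook, play)) / play.count("Yes")
p_sunny_no  = sum(o == "Sunny" and p == "No"  for o, p in zip(outlook, play)) / play.count("No")

print("P(Yes|Sunny) =", round(p_sunny_yes * p_yes / p_sunny, 2))  # = 0.6
print("P(No|Sunny)  =", round(p_sunny_no  * p_no  / p_sunny, 2))  # = 0.4 (0.41 in the text due to rounding)
```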

Advantages of Naïve Bayes Classifier:


o Naïve Bayes is one of the fast and easy ML algorithms to predict a class
of datasets.
o It can be used for Binary as well as Multi-class Classifications.
o It performs well in Multi-class predictions as compared to the other
Algorithms.
o It is the most popular choice for text classification problems.

Disadvantages of Naïve Bayes Classifier:


o Naive Bayes assumes that all features are independent or unrelated, so
it cannot learn the relationship between features.

Applications of Naïve Bayes Classifier:


o It is used for Credit Scoring.
o It is used in medical data classification.
o It can be used in real-time predictions because Naïve Bayes Classifier
is an eager learner.
o It is used in Text classification such as Spam filtering and Sentiment
analysis.

…………… end………….
Q)Support Vector Machine ?

Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms, which is used for Classification as well as Regression problems. However, primarily, it is used for Classification problems in Machine Learning.

The goal of the SVM algorithm is to create the best line or decision boundary
that can segregate n-dimensional space into classes so that we can easily put
the new data point in the correct category in the future. This best decision
boundary is called a hyperplane.

SVM chooses the extreme points/vectors that help in creating the hyperplane.
These extreme cases are called as support vectors, and hence algorithm is
termed as Support Vector Machine. Consider the below diagram in which
there are two different categories that are classified using a decision boundary
or hyperplane:
Example: SVM can be understood with the example that we have used in the
KNN classifier. Suppose we see a strange cat that also has some features of
dogs, so if we want a model that can accurately identify whether it is a cat or
dog, so such a model can be created by using the SVM algorithm. We will first
train our model with lots of images of cats and dogs so that it can learn about
different features of cats and dogs, and then we test it with this strange
creature. So as support vector creates a decision boundary between these two
data (cat and dog) and choose extreme cases (support vectors), it will see the
extreme case of cat and dog. On the basis of the support vectors, it will classify it as a cat.
SVM algorithm can be used for Face detection, image classification, text
categorization, etc.

Types of SVM:

SVM can be of two types:

1.Linear SVM:

A Linear Support Vector Machine (SVM) is a type of supervised learning algorithm used for classification and regression tasks. It is a linear model that uses a hyperplane to separate the data into different classes.

How Linear SVM Works

The goal of a Linear SVM is to find the best hyperplane that separates
the data into different classes. The hyperplane is defined by a set of
weights and a bias term. The weights determine the direction of the
hyperplane, and the bias term determines the position of the
hyperplane.

The Linear SVM algorithm works as follows:

1. Data Preparation: The data is preprocessed to ensure that it is in a suitable format for the algorithm.

2. Choose a Kernel: A kernel is chosen to transform the data into a higher-dimensional space. For a Linear SVM, the kernel is a linear function.

3. Find the Optimal Hyperplane: The algorithm finds the optimal hyperplane that separates the data into different classes. The optimal hyperplane is the one that maximizes the margin between the classes.

4. Make Predictions: Once the optimal hyperplane is found, the algorithm can make predictions on new, unseen data.

Example of Linear SVM

Suppose we have a dataset of exam scores and hours studied, and we want to predict whether a student will pass or fail an exam based on their score and hours studied. The dataset is as follows:
| Exam Score | Hours Studied | Pass/Fail |
| --- | --- | --- |
| 80 | 10 | Pass |
| 70 | 8 | Fail |
| 90 | 12 | Pass |
| 60 | 6 | Fail |
| 85 | 11 | Pass |

We can use a Linear SVM to classify the students into two classes: Pass
and Fail. The Linear SVM algorithm will find the optimal hyperplane
that separates the data into these two classes.

The resulting decision boundary might look like this (one illustrative possibility):

Pass: Exam Score + 5 × Hours Studied > 120

Fail: Exam Score + 5 × Hours Studied ≤ 120

This is a straight line (a hyperplane in two dimensions) that separates the data into two classes: Pass and Fail. Students whose combination of exam score and hours studied falls on one side of the line are classified as Pass, while students on the other side are classified as Fail.
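A minimal sketch of a linear SVM on the exam table above (labels encoded as 1 = Pass, 0 = Fail; the new student at the end is a hypothetical input):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[80, 10], [70, 8], [90, 12], [60, 6], [85, 11]])  # exam score, hours studied
y = np.array([1, 0, 1, 0, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)
print("weights:", clf.coef_, "bias:", clf.intercept_)  # the learned hyperplane w·x + b = 0
print(clf.predict([[75, 9]]))                          # classify a new student
```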

2.Non-linear SVM:

Nonlinear SVM in Machine Learning

A Nonlinear Support Vector Machine (SVM) is a type of supervised learning algorithm used for classification and regression tasks. It is a nonlinear model that uses a kernel to transform the data into a higher-dimensional space, where it can be separated by a hyperplane.

How Nonlinear SVM Works


The goal of a Nonlinear SVM is to find the best hyperplane that separates the
data into different classes. The hyperplane is defined by a set of weights and a
bias term. The weights determine the direction of the hyperplane, and the bias
term determines the position of the hyperplane.

The Nonlinear SVM algorithm works as follows:

1. Data Preparation: The data is preprocessed to ensure that it is in a suitable format for the algorithm.

2. Choose a Kernel: A kernel is chosen to transform the data into a higher-dimensional space. Common kernels used in Nonlinear SVM include the polynomial kernel, radial basis function (RBF) kernel, and sigmoid kernel.

3. Find the Optimal Hyperplane: The algorithm finds the optimal hyperplane
that separates the data into different classes. The optimal hyperplane is the
one that maximizes the margin between the classes.

4. Make Predictions: Once the optimal hyperplane is found, the algorithm can
make predictions on new, unseen data.

Example of Nonlinear SVM

Suppose we have a dataset of exam scores and hours studied, and we want to
predict whether a student will pass or fail an exam based on their score and
hours studied. The dataset is as follows:

| Exam Score | Hours Studied | Pass/Fail |
| --- | --- | --- |
| 80 | 10 | Pass |
| 70 | 8 | Fail |
| 90 | 12 | Pass |
| 60 | 6 | Fail |
| 85 | 11 | Pass |
We can use a Nonlinear SVM to classify the students into two classes: Pass and
Fail. The Nonlinear SVM algorithm will find the optimal hyperplane that
separates the data into these two classes.

The resulting decision boundary might look like this (one illustrative possibility):

Pass: (Exam Score - 85)^2 + (Hours Studied - 11)^2 < 100

Fail: (Exam Score - 85)^2 + (Hours Studied - 11)^2 ≥ 100

This curved boundary (a circle around the passing students) separates the data into two classes: Pass and Fail. Students whose exam score and hours studied satisfy the inequality are classified as Pass, while students who do not satisfy it are classified as Fail.
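A minimal sketch of a nonlinear SVM on the same data: identical to the linear example, but with an RBF kernel so the decision boundary can be curved rather than a straight line:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[80, 10], [70, 8], [90, 12], [60, 6], [85, 11]])
y = np.array([1, 0, 1, 0, 1])

clf = SVC(kernel="rbf", gamma="scale", C=1.0)  # RBF kernel gives a nonlinear boundary
clf.fit(X, y)
print(clf.predict([[75, 9], [88, 12]]))        # classify two new students (hypothetical inputs)
```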

Hyperplane and Support Vectors in the SVM algorithm:

Hyperplane:

There can be multiple lines/decision boundaries to segregate the classes in n-dimensional space, but we need to find out the best decision boundary that helps to classify the data points. This best boundary is known as the hyperplane of SVM.

The dimensions of the hyperplane depend on the features present in the dataset, which means if there are 2 features, the hyperplane will be a straight line, and if there are 3 features, the hyperplane will be a 2-dimensional plane.

We always create a hyperplane that has a maximum margin, which means the
maximum distance between the data points.

Support Vectors:

The data points or vectors that are the closest to the hyperplane and which
affect the position of the hyperplane are termed as Support Vector. Since these
vectors support the hyperplane, hence called a Support vector.

…………………. END…………..
