
Module 1

Classification
Contents
Supervised Learning (Classification):
• Support Vector Machine (SVC and SVR),
• Loss function in SVM,
• Kernel Methods,
• Random Forest, and
• Ensemble classification methods (Bagging and Boosting Techniques).
Introduction
We are given a set of data points that have to be classified into classes.

[Figure: a 2-dimensional synthetic dataset with 2 classes, shown in a linearly separable case and a non-linearly separable case]


• When the data is linearly separable, there exist multiple separating hyperplanes.
• Logistic regression can produce any one of these hyperplanes.
• Among these hyperplanes, which is the most effective one?
The answer is given by the Support Vector Machine (SVM).
• Support Vector Machine (SVM) is one of the most popular supervised
learning algorithms, introduced in the 1990s, and it can be used for both
classification and regression problems. However, it is primarily used for
classification problems in machine learning.
• The goal of the SVM algorithm is to create the best line or decision
boundary that can segregate n-dimensional space into classes so that
we can easily put the new data point in the correct category in the
future. This best decision boundary is called a hyperplane.
How SVM works

The SVM algorithm finds the best line or decision boundary; this best boundary or
region is called a hyperplane. The algorithm identifies the points from both classes that lie
closest to this hyperplane; these points are called support vectors. The distance between the
support vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this
margin. The hyperplane with the maximum margin is called the optimal hyperplane.
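A minimal sketch of this idea, assuming scikit-learn is available (the dataset here is synthetic and the values are illustrative); after fitting, the support vectors and the separating hyperplane can be inspected:

# Minimal sketch: fit a linear SVM and inspect its support vectors.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated classes in 2-D (linearly separable)
X, y = make_blobs(n_samples=100, centers=2, random_state=42)

# Linear kernel: find the maximum-margin hyperplane
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The closest points from both classes that define the margin
print("Support vectors:\n", clf.support_vectors_)

# The separating hyperplane is w.x + b = 0
print("w =", clf.coef_[0], "b =", clf.intercept_[0])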
Hard margin SVM
• Each data point must lie on the correct side of the margin and there
should be no misclassification. Hard margin works well only if our data is
linearly separable.
• Hard margin SVM does not allow any misclassification to happen.
• If the data is not linearly separable, a hard-margin SVM will not return any
hyperplane, because it cannot separate the data.

• Therefore we need soft-margin SVM.


Soft margin SVM
• Soft margin SVM allows some misclassification to happen by relaxing the
hard constraints of Support Vector Machine.
• Soft margin SVM is implemented with the help of the Regularization
parameter (C).
• Regularization parameter (C): It tells us how much misclassification
should be avoided.
– If the C value is large, the SVM tries hard to reduce the number of misclassified
points, at the cost of a narrower margin.
– If the C value is small, the SVM tolerates more misclassified points in exchange
for a wider margin.
• Gamma parameter: gamma has no role when the data is handled with a linear
kernel; it is used only with non-linear kernels (e.g. RBF), where it controls
how far the influence of a single training example reaches.

Gamma value can be small or large


The gamma value is small (0.1) The gamma value is large (1)
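A small sketch (illustrative values only, assuming scikit-learn) of how C and gamma are passed to an RBF-kernel SVC; the gamma values 0.1 and 1 mirror the figure above:

# Sketch: effect of the regularization parameter C and the RBF gamma.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

for C in (0.1, 100):          # small C tolerates misclassification; large C avoids it
    for gamma in (0.1, 1):    # small gamma -> smoother boundary; large gamma -> tighter fit
        clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X, y)
        print(f"C={C}, gamma={gamma}, training accuracy={clf.score(X, y):.2f}")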
Kernels
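The original slide is figure-based; as a stand-in, here is a hedged sketch comparing the common kernel options of scikit-learn's SVC (linear, polynomial, RBF) on a non-linearly separable dataset:

# Sketch: comparing common SVM kernels on non-linearly separable data.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(kernel, "test accuracy:", round(clf.score(X_test, y_test), 2))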
Ensemble method
• The main idea behind ensemble methods is that a group of “weak
learners” can come together to form a “strong learner”.

Ensemble Learning: Bagging and Boosting
Bagging
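As a concrete illustration of bagging, here is a minimal sketch using scikit-learn's BaggingClassifier with decision-tree base learners: the weak learners are trained on bootstrap samples and combined by voting (values are illustrative).

# Sketch: bagging = train base learners on bootstrap samples, combine by voting.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # the "weak learner" (scikit-learn >= 1.2; older versions use base_estimator=)
    n_estimators=50,                     # number of bootstrap samples / trees
    bootstrap=True,                      # sample with replacement
    random_state=0,
)
bag.fit(X, y)
print("Training accuracy:", bag.score(X, y))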
Random Forest
• Random Forest is a supervised learning technique for Classification
and Regression problems in ML and is a bagging technique.
• It is based on the concept of ensemble learning, which is a process
of combining multiple classifiers to solve a complex problem and improve
the model’s performance.
• "Random Forest is a classifier that contains several decision
trees on various subsets of the given dataset, based on the
majority votes of predictions, and it predicts the final output.
• The greater number of trees in the forest leads to higher accuracy
and prevents the problem of overfitting.
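A minimal sketch of this idea with scikit-learn's RandomForestClassifier (the iris dataset is used only for illustration); the forest's prediction is the majority vote over its individual trees:

# Sketch: a random forest is a collection of decision trees; the final
# prediction is the majority vote of the individual trees.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

print("Number of trees:", len(forest.estimators_))
print("Forest prediction for the first sample:", forest.predict(X[:1]))

# Votes of the first few individual trees for the same sample
votes = [forest.classes_[int(tree.predict(X[:1])[0])] for tree in forest.estimators_[:5]]
print("Individual tree votes:", votes)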
Random forest with and without
replacement
• With replacement: When a sampling unit is drawn from a
population and is returned to that population after its
characteristics have been recorded before the next unit is drawn.
• So we might end up selecting and measuring the same unit more
than once.
• Without replacement: When a sampling unit is drawn from a
population and is not returned to that population before the next
unit is drawn.

Sampling without replacement guarantees that no unit is selected more than once;
note, however, that the bootstrap sampling used in bagging draws with replacement.
A small illustration of the difference is sketched below.
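A tiny, purely illustrative sketch of the difference using NumPy:

# Sketch: sampling with vs. without replacement.
import numpy as np

rng = np.random.default_rng(0)
population = np.arange(10)

with_repl = rng.choice(population, size=10, replace=True)      # a unit may repeat
without_repl = rng.choice(population, size=10, replace=False)   # every unit unique

print("With replacement:   ", with_repl)     # duplicates possible (bootstrap sample)
print("Without replacement:", without_repl)  # a permutation of the population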


Out of bag error
For every tree, roughly

    Training ≈ (2/3) × d
    Validation (out-of-bag) ≈ (1/3) × d

where d is the number of samples in the dataset. The error measured on the
out-of-bag samples is the OOB error; this value, expressed as a percentage
(or a fraction), is called the oob_score.
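The 2/3–1/3 split can be checked empirically: a bootstrap sample of size d drawn with replacement leaves, in expectation, a fraction 1 − (1 − 1/d)^d ≈ 36.8% of the points out of bag, which the slide rounds to 1/3. A small sketch using NumPy (illustrative):

# Sketch: fraction of samples left out-of-bag by one bootstrap sample.
import numpy as np

rng = np.random.default_rng(0)
d = 10_000
indices = rng.integers(0, d, size=d)           # bootstrap: draw d indices with replacement
oob_fraction = 1 - len(np.unique(indices)) / d

print(f"In-bag (training) fraction: {1 - oob_fraction:.3f}   # roughly 2/3")
print(f"Out-of-bag fraction:        {oob_fraction:.3f}   # roughly 1/3")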
Important Hyperparameters in
Random Forest
Hyperparameters to Increase the Predictive Power
• n_estimators: Number of trees the algorithm builds before averaging
the predictions.
• max_features: Maximum number of features the random forest considers
when splitting a node.
• min_samples_leaf: The minimum number of samples required to be at a
leaf node.
• criterion: How to split the node in each tree? (Entropy/Gini
impurity/Log Loss)
• max_leaf_nodes: Maximum leaf nodes in each tree
Hyperparameters to Increase the Speed
• n_jobs: it tells the engine how many processors it is allowed to use.
If the value is 1, it can use only one processor, but if the value is -
1, there is no limit.
• random_state: controls randomness of the sample. The model will
always produce the same results if it has a definite value of random
state and has been given the same hyperparameters and training
data.
• oob_score: OOB means out of bag. It is a random forest cross-
validation method in which roughly one-third of the samples are not used to
train each tree; instead, they are used to evaluate its performance. These
samples are called out-of-bag samples.
Hands-on Random forest
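The original hands-on code is not reproduced here; the following is a hedged sketch that ties together the hyperparameters listed above (dataset and parameter values are illustrative, assuming scikit-learn):

# Sketch: random forest with the hyperparameters discussed above.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

forest = RandomForestClassifier(
    n_estimators=200,        # number of trees
    max_features="sqrt",     # features considered at each split
    min_samples_leaf=2,      # minimum samples required at a leaf
    criterion="gini",        # split quality measure
    max_leaf_nodes=None,     # no limit on leaf nodes per tree
    n_jobs=-1,               # use all available processors
    oob_score=True,          # evaluate on out-of-bag samples
    random_state=42,         # reproducibility
)
forest.fit(X_train, y_train)

print("OOB score:", round(forest.oob_score_, 3))
print("Test accuracy:", round(forest.score(X_test, y_test), 3))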
Boosting
A model with high bias has high training and testing errors. Boosting is used
to reduce this bias.
1. Initialize the dataset and assign equal weight to each of the data points.
2. Provide this as input to the model and identify the wrongly classified data points.
3. Increase the weight of the wrongly classified data points.
4. If the required results are obtained, go to step 5; otherwise go to step 2.
5. End

The principle behind boosting algorithms is to first build a model on the training
dataset and then build a second model to rectify the errors present in the first model.
This procedure is continued until the errors are minimized and the dataset is
predicted correctly.
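A hedged sketch of this sequential idea using scikit-learn's GradientBoostingClassifier, in which each new tree is fitted to correct the errors of the ensemble built so far (values are illustrative):

# Sketch: boosting builds models sequentially, each correcting its predecessors.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

boost = GradientBoostingClassifier(
    n_estimators=100,    # number of sequential trees
    learning_rate=0.1,   # contribution of each tree
    max_depth=3,         # each tree is a weak learner
    random_state=0,
)
boost.fit(X_train, y_train)
print("Test accuracy:", round(boost.score(X_test, y_test), 3))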
Types of boosting
• Gradient boosting
• XGBoost
• Adaboost
• Catboost
Adaboost
• The AdaBoost (Adaptive Boosting) algorithm is a boosting
technique used as an ensemble method in machine
learning. It is called adaptive boosting because the weights are re-
assigned to each instance after every round, with higher weights assigned to
incorrectly classified instances.
• At the end, these models are aggregated by taking the
weighted average of the individual models.
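A minimal sketch using scikit-learn's AdaBoostClassifier with decision stumps as the weak learners (values are illustrative):

# Sketch: AdaBoost re-weights misclassified instances and combines weak learners.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # decision stump (scikit-learn >= 1.2; older versions use base_estimator=)
    n_estimators=100,
    learning_rate=0.5,
    random_state=1,
)
ada.fit(X_train, y_train)
print("Test accuracy:", round(ada.score(X_test, y_test), 3))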
Numerical example on Adaboost
• Refer to the class notes
• Gradient Boosting
• XGBoost (Extreme Gradient Boosting)
• Catboost (it works with categorical data (the Cat) and it
uses gradient boosting (the Boost)).
• Practical implementation of boosting (a brief sketch follows below).
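A hedged sketch of the practical side, assuming the third-party xgboost and catboost packages are installed (both expose a scikit-learn-style fit/score interface); parameter values are illustrative:

# Sketch: XGBoost and CatBoost via their scikit-learn-compatible classifiers.
# Requires: pip install xgboost catboost
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)

xgb = XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
xgb.fit(X_train, y_train)
print("XGBoost test accuracy:", round(xgb.score(X_test, y_test), 3))

cat = CatBoostClassifier(iterations=100, learning_rate=0.1, depth=3, verbose=0)
cat.fit(X_train, y_train)
print("CatBoost test accuracy:", round(cat.score(X_test, y_test), 3))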
Question Bank
• Explain the working principle of SVM and write a Python
program to implement SVM.
• Explain the different types of Kernel methods.
• Problems on Linear SVM and Non-linear SVM (Refer class
notes).
• Explain the working principle of Random Forest and write a
Python program to implement Random Forest.
• Explain the working mechanism of AdaBoost with an example.
• Write a Python program to implement Gradient Boosting and AdaBoost.
• Write a Python program to implement XGBoost and CatBoost.
Question Bank
• Write a Python program to implement LIME and SHAP for
interpreting the results.
• Write a Python program to tune hyperparameters using the manual
method.
• Write a Python program to implement Randomized search CV.
• Write a Python program to implement Grid search CV.
