ML Mod1
Classification
Contents
Supervised Learning (Classification):
• Support Vector Machine (SVC and SVR),
• Loss function in SVM,
• Kernel Methods,
• Random Forest, and
• Ensemble classification methods (Bagging and Boosting Techniques).
Introduction
Suppose there are data points that have to be classified into classes.
The SVM algorithm finds the best line or decision boundary that separates the classes; this
boundary is called a hyperplane. SVM identifies the points from each class that lie closest to the boundary.
These points are called support vectors. The distance between the support vectors and the hyperplane is called
the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is
called the optimal hyperplane.
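To make the terms margin and support vectors concrete, the following minimal sketch measures the distance from each point to a given hyperplane w·x + b = 0 and reports the closest points. The data points, the hand-picked hyperplane, and the function names are illustrative assumptions; the sketch does not train an SVM, it only evaluates one candidate boundary.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin_and_support_vectors(w, b, points, tol=1e-9):
    """Return the smallest distance to the hyperplane and the points that attain it."""
    distances = [abs(signed_distance(w, b, x)) for x in points]
    margin = min(distances)
    support = [x for x, d in zip(points, distances) if abs(d - margin) <= tol]
    return margin, support

# Hypothetical 2-D data and a hand-picked hyperplane x1 = 0, i.e. w = (1, 0), b = 0.
points = [(-3.0, 1.0), (-1.0, 0.0), (1.0, 2.0), (4.0, -1.0)]
margin, support = margin_and_support_vectors((1.0, 0.0), 0.0, points)
print("margin:", margin)
print("closest points (support vectors for this boundary):", support)
```

An SVM would search over all hyperplanes for the one that makes this smallest distance as large as possible.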
Hard margin SVM
• Each data point must lie on the correct side of the margin, with no
misclassification. Hard margin SVM works well only if the data is
linearly separable.
• Hard margin SVM does not allow any misclassification.
• If the data is not linearly separable, hard margin SVM cannot return a
hyperplane, because no hyperplane separates the data.
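The points above can be illustrated with the standard hard margin constraint y_i (w·x_i + b) ≥ 1 for labels in {+1, -1}; the source does not state this formula, so treat it as an assumption. The sketch below checks the constraint for candidate hyperplanes on hypothetical 1-D data and does a small grid search. It is not an SVM solver, only a feasibility check.

```python
def satisfies_hard_margin(w, b, X, y):
    """True if every labelled point (labels +1/-1) has y_i * (w.x_i + b) >= 1."""
    return all(
        yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 1
        for xi, yi in zip(X, y)
    )

def find_feasible(X, y, grid):
    """Return the first (w, b) on the grid that meets all constraints, else None."""
    for w1 in grid:
        for b in grid:
            if satisfies_hard_margin((w1,), b, X, y):
                return (w1, b)
    return None

# Hypothetical 1-D points and a coarse grid of candidate values from -5 to 5.
X = [(-2.0,), (-1.0,), (1.0,), (2.0,)]
grid = [i / 2 for i in range(-10, 11)]

separable = find_feasible(X, [-1, -1, 1, 1], grid)      # classes split at x = 0
not_separable = find_feasible(X, [1, -1, -1, 1], grid)  # outer points vs inner points
print(separable, not_separable)
```

For the second labelling the constraints contradict each other (they require both b ≥ 1 and b ≤ -1), so no hyperplane exists regardless of the grid, which is the situation the last bullet describes.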
Ensemble Learning: Bagging and Boosting
Bagging
Random Forest
• Random Forest is a supervised learning technique for classification
and regression problems in ML, and it is a bagging technique.
• It is based on the concept of ensemble learning, which is the process
of combining multiple classifiers to solve a complex problem and improve
the model's performance.
• Random Forest is a classifier that trains several decision
trees on various subsets of the given dataset and predicts the final
output from the majority vote of their predictions.
• A greater number of trees generally leads to higher accuracy and
reduces the risk of overfitting compared with a single decision tree.
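The idea of training trees on different subsets and taking a majority vote can be sketched as follows. This is a simplified illustration, not a full Random Forest: each "tree" is a one-split threshold rule on hypothetical 1-D data, and the random feature selection that a real Random Forest also performs at each split is omitted. All names and data are assumptions made for the example.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Fit a one-split rule on (x, label) pairs: left label if x <= t, else right label."""
    best = None
    for t in sorted({x for x, _ in sample}):
        left = [lab for x, lab in sample if x <= t]
        right = [lab for x, lab in sample if x > t]
        left_lab = Counter(left).most_common(1)[0][0]
        right_lab = Counter(right).most_common(1)[0][0] if right else left_lab
        errors = sum(lab != left_lab for lab in left)
        errors += sum(lab != right_lab for lab in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_lab, right_lab)
    _, t, left_lab, right_lab = best
    return lambda x: left_lab if x <= t else right_lab

def fit_forest(data, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement) of the data."""
    rng = random.Random(seed)
    return [fit_stump(rng.choices(data, k=len(data))) for _ in range(n_trees)]

def predict(forest, x):
    """Final output is the majority vote over the individual predictions."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: class "a" at small x, class "b" at large x.
data = [(0.5, "a"), (1.0, "a"), (1.5, "a"), (2.0, "a"),
        (6.0, "b"), (6.5, "b"), (7.0, "b"), (7.5, "b")]
forest = fit_forest(data)
print(predict(forest, 0.0), predict(forest, 9.0))
```

Because each stump sees a different bootstrap sample, individual stumps may disagree, and the vote smooths out their differences.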
Random forest: sampling with and without
replacement
• With replacement: a sampling unit is drawn from a
population and returned to that population, after its
characteristics have been recorded, before the next unit is drawn.
• So the same unit may be selected and measured more
than once.
• Without replacement: a sampling unit is drawn from a
population and is not returned to that population before the next
unit is drawn, so each unit can be selected at most once.
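The two sampling schemes can be compared with Python's standard `random` module: `choices` draws with replacement and `sample` draws without replacement. The population of ten integers and the seed are arbitrary assumptions for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable
population = list(range(10))

# With replacement: each draw is returned, so a unit can appear more than once.
with_replacement = rng.choices(population, k=10)

# Without replacement: each unit can be drawn at most once.
without_replacement = rng.sample(population, k=10)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
print("distinct units drawn with replacement:", len(set(with_replacement)))
```

Drawing all ten units without replacement always yields a permutation of the population, whereas drawing ten with replacement typically repeats some units and misses others.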
$\text{Validation} = \frac{1}{3} \times d$
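The source does not define d. One plausible reading, stated here as an assumption, is that d is the number of rows in the dataset and that about one third of them are left out of each bootstrap sample and can be used for validation. Under that reading, the chance that a given row is never drawn in d draws with replacement is (1 - 1/d)^d, which approaches 1/e ≈ 0.368 as d grows. The sketch below computes this expected fraction.

```python
import math

def expected_left_out_fraction(d):
    """Probability that one row is never picked in d draws with replacement from d rows."""
    return (1 - 1 / d) ** d

frac = expected_left_out_fraction(1000)  # d = 1000 is an arbitrary example size
print(f"expected left-out fraction for d=1000: {frac:.4f}")
print(f"limit 1/e: {math.exp(-1):.4f}")
```

The value is close to, though slightly above, the one third used in the formula.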