ML 4
# Classification
The Classification algorithm is a Supervised Learning technique that is used to identify the
category of new observations on the basis of training data. In classification, a program learns
from a given dataset of labeled observations and then assigns new observations to one of a
number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, or cat or dog.
The classes are also called targets, labels, or categories.
Unlike regression, the output variable of classification is a category rather than a numerical
value, such as "Green or Blue" or "fruit or animal". Since classification is a supervised
learning technique, it takes labeled input data, which means each input comes with its
corresponding output.
In a classification algorithm, a discrete output variable (y) is computed from the input variables (x) by a mapping y = f(x).
The classic example of an ML classification algorithm is an email spam detector.
The main goal of a classification algorithm is to identify the category of a given dataset;
these algorithms are mainly used to predict the output for categorical data.
Classification can be pictured with a simple two-class diagram: there are two classes,
Class A and Class B, and the observations within each class have features that are similar to
each other and dissimilar to those of the other class.
One common way to evaluate the performance of a classification model is the AUC-ROC curve:
o ROC curve stands for Receiver Operating Characteristics Curve and AUC stands
for Area Under the Curve.
o It is a graph that shows the performance of the classification model at different
thresholds.
o To visualize the performance of the multi-class classification model, we use the AUC-
ROC Curve.
o The ROC curve plots TPR (True Positive Rate) on the Y-axis against FPR (False Positive
Rate) on the X-axis.
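As a concrete illustration, here is a minimal sketch of computing an ROC curve and its AUC with scikit-learn; the synthetic dataset and the logistic regression model are illustrative assumptions, not prescribed by the text.

```python
# Sketch: ROC curve and AUC for a binary classifier (scikit-learn).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]        # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_test, scores)  # FPR (X-axis) vs TPR (Y-axis)
print("AUC:", roc_auc_score(y_test, scores))      # area under that curve
```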
Use cases of Classification Algorithms
Classification algorithms can be used in different places. Below are some popular use cases
of Classification Algorithms:
o Email Spam Detection
o Speech Recognition
o Identification of cancerous tumor cells.
o Drugs Classification
o Biometric Identification, etc.
Classification Types
There are two main classification types in machine learning:
Binary Classification
In binary classification, the goal is to classify the input into one of two classes or categories.
Example – On the basis of the given health conditions of a person, we have to determine
whether the person has a certain disease or not.
Multiclass Classification
In multi-class classification, the goal is to classify the input into one of several classes or
categories. For example, on the basis of data about different species of flowers, we have to
determine which species an observation belongs to.
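A minimal sketch of the two settings, using scikit-learn's built-in datasets as illustrative stand-ins for the disease and flower examples above:

```python
# Sketch: binary vs multiclass targets, same classifier API.
import numpy as np
from sklearn.datasets import load_breast_cancer, load_iris
from sklearn.tree import DecisionTreeClassifier

# Binary: tumor malignant (0) or benign (1) -> two classes.
X_bin, y_bin = load_breast_cancer(return_X_y=True)
print(np.unique(y_bin))          # [0 1]

# Multiclass: three flower species -> three classes.
X_multi, y_multi = load_iris(return_X_y=True)
print(np.unique(y_multi))        # [0 1 2]

clf = DecisionTreeClassifier(random_state=0).fit(X_multi, y_multi)
print(clf.predict(X_multi[:1]))  # predicted species for one observation
```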
Note: Logistic regression uses the concept of predictive modeling just as regression does,
which is why it is called logistic regression; however, it is used to classify samples, so it falls
under the classification algorithms.
Logistic Function (Sigmoid Function):
o The sigmoid function, sigmoid(z) = 1 / (1 + e^(-z)), is a mathematical function used to
map the predicted values to probabilities.
o It maps any real value into another value within a range of 0 and 1.
o The output of logistic regression must be between 0 and 1 and cannot go beyond this
limit, so it forms an "S"-shaped curve. The S-form curve is called the Sigmoid function
or the logistic function.
o In logistic regression, we use the concept of a threshold value, which decides between
the classes 0 and 1: values above the threshold tend to 1, and values below the
threshold tend to 0.
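A quick numerical sketch of these two points, the squashing into (0, 1) and the thresholding, using NumPy (the specific inputs are arbitrary):

```python
# Sketch: the sigmoid squashes any real value into (0, 1); a threshold
# (commonly 0.5) then turns the probability into a class label.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
p = sigmoid(z)
print(p)                        # values strictly between 0 and 1
print((p >= 0.5).astype(int))   # class 1 above the threshold, class 0 below
```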
Assumptions for Logistic Regression:
o The dependent variable must be categorical in nature.
o The independent variables should not exhibit multicollinearity.
Logistic Regression Equation:
The logistic regression equation can be obtained from the linear regression equation. The
mathematical steps to get the logistic regression equation are given below:
o We know the equation of a straight line can be written as:
y = b0 + b1x1 + b2x2 + ... + bnxn
o In logistic regression, y can only be between 0 and 1, so let's divide y by (1 - y); this
ratio is 0 for y = 0 and infinity for y = 1:
y / (1 - y)
o But we need a range from -infinity to +infinity, so we take the logarithm of the
expression, and the equation becomes:
log[y / (1 - y)] = b0 + b1x1 + b2x2 + ... + bnxn
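A small numerical check of this derivation: applying the sigmoid to the linear part and then taking the log-odds recovers it exactly (the coefficient values b0, b1 and the input x below are arbitrary illustrations):

```python
# Sketch: the logit (log-odds) is the inverse of the sigmoid, so the
# linear part b0 + b1*x can be recovered from the predicted probability.
import numpy as np

b0, b1, x = -1.5, 0.8, 2.0
linear = b0 + b1 * x                  # straight-line equation
p = 1.0 / (1.0 + np.exp(-linear))     # sigmoid maps it into (0, 1)
logit = np.log(p / (1.0 - p))         # log[y / (1 - y)]
print(linear, logit)                  # both print 0.1 (up to float error)
```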
Example: SVM can be understood with the example that we used in the KNN classifier.
Suppose we see a strange cat that also has some features of dogs; if we want a model that
can accurately identify whether it is a cat or a dog, such a model can be created using the
SVM algorithm. We first train our model with lots of images of cats and dogs so that it can
learn their different features, and then we test it with this strange creature. The SVM creates
a decision boundary between the two classes (cat and dog) and chooses the extreme cases
(support vectors) of cats and dogs; on the basis of these support vectors, it classifies the new
creature as a cat.
The SVM algorithm can be used for face detection, image classification, text categorization, etc.
Types of SVM
SVM can be of two types:
o Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be
classified into two classes using a single straight line, it is termed linearly separable
data, and the classifier used is called a Linear SVM classifier.
o Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset
cannot be classified using a straight line, it is termed non-linear data, and the classifier
used is called a Non-linear SVM classifier.
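A minimal sketch of both cases with scikit-learn's SVC; the blob and circle datasets are illustrative assumptions, chosen to be linearly separable and non-separable, respectively:

```python
# Sketch: a linear kernel for linearly separable data vs an RBF kernel
# for data that no single straight line can split.
from sklearn.datasets import make_blobs, make_circles
from sklearn.svm import SVC

# Linearly separable: two well-separated blobs.
X_lin, y_lin = make_blobs(n_samples=100, centers=2, random_state=0)
linear_svm = SVC(kernel="linear").fit(X_lin, y_lin)

# Not linearly separable: one class encircles the other.
X_non, y_non = make_circles(n_samples=100, factor=0.3, noise=0.05, random_state=0)
nonlinear_svm = SVC(kernel="rbf").fit(X_non, y_non)

# Both accuracies should be near 1.0 on their respective datasets.
print(linear_svm.score(X_lin, y_lin), nonlinear_svm.score(X_non, y_non))
```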
Hyperplane and Support Vectors in the SVM algorithm:
Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n-
dimensional space, but we need to find out the best decision boundary that helps to classify
the data points. This best boundary is known as the hyperplane of SVM.
The dimension of the hyperplane depends on the number of features present in the dataset:
if there are 2 features, the hyperplane is a straight line, and if there are 3 features, the
hyperplane is a 2-dimensional plane.
We always create the hyperplane that has the maximum margin, i.e. the maximum distance
between the hyperplane and the nearest data points of each class.
Support Vectors:
The data points or vectors that are closest to the hyperplane and that affect its position are
termed support vectors. Since these vectors support the hyperplane, they are called support
vectors.
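As a sketch, scikit-learn's fitted SVC exposes these points directly (the synthetic dataset below is an illustrative assumption):

```python
# Sketch: after fitting, the SVC object exposes the support vectors that
# pin down the hyperplane's position.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

print(clf.support_vectors_.shape)  # (n_support_vectors, n_features)
print(clf.n_support_)              # number of support vectors per class
```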
How does SVM work?
Linear SVM:
The working of the SVM algorithm can be understood by using an example. Suppose we
have a dataset with two tags (green and blue) and two features, x1 and x2. We want a
classifier that can classify a pair (x1, x2) of coordinates as either green or blue.
Since this is 2-D space, we can separate the two classes with just a straight line. But there
can be multiple lines that separate these classes.
Hence, the SVM algorithm helps to find the best line or decision boundary; this best
boundary or region is called the hyperplane. The SVM algorithm finds the points of both
classes that are closest to the line; these points are called support vectors. The distance
between the vectors and the hyperplane is called the margin, and the goal of SVM is to
maximize this margin. The hyperplane with the maximum margin is called the optimal
hyperplane.
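A hedged sketch of the margin in numbers: for a fitted linear SVM with weight vector w, the margin width is 2 / ||w|| (the near-hard-margin setting C=1e3 and the dataset are illustrative choices):

```python
# Sketch: for a fitted linear SVM the margin width is 2 / ||w||,
# where w is the weight vector of the hyperplane w.x + b = 0.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1e3).fit(X, y)  # large C approximates a hard margin

w = clf.coef_[0]
print("margin width:", 2.0 / np.linalg.norm(w))
```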
Non-Linear SVM:
If data is linearly arranged, we can separate it by using a straight line, but for non-linear
data, no single straight line can separate the classes.
So to separate these data points, we need to add one more dimension. For linear data, we
have used the two dimensions x and y, so for non-linear data, we add a third dimension z. It
can be calculated as:
z = x^2 + y^2
By adding the third dimension, the sample space becomes three-dimensional, and SVM can
divide the datasets with a flat boundary. Since we are in 3-D space, this boundary looks like
a plane parallel to the x-axis. If we convert it back to 2-D space with z = 1, it becomes a
circle of radius 1 around the origin.
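A minimal sketch of exactly this trick: lifting circle-shaped data with z = x^2 + y^2 and fitting a plain linear SVM in the lifted space (the dataset parameters are illustrative):

```python
# Sketch: adding the dimension z = x^2 + y^2 lifts circular data into
# 3-D, where a flat plane separates the two classes.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

z = (X[:, 0] ** 2 + X[:, 1] ** 2).reshape(-1, 1)  # the third dimension
X_3d = np.hstack([X, z])

clf = SVC(kernel="linear").fit(X_3d, y)           # now linearly separable
print(clf.score(X_3d, y))                         # close to 1.0
```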
Random Forest:
Random Forest is an extension of bagging. Each classifier in the ensemble is a decision tree
classifier, generated using a random selection of attributes at each node to determine the
split. During classification, each tree votes and the most popular class is returned.
Implementation steps of Random Forest –
1. Multiple subsets are created from the original dataset, selecting observations
with replacement (bootstrap sampling).
2. At each node, a subset of features is selected randomly, and whichever of
these features gives the best split is used to split the node; this is repeated
down the tree.
3. Each tree is grown to its largest extent.
4. The above steps are repeated for n trees, and the prediction is given by
aggregating the predictions of all n trees.
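A sketch of these steps using scikit-learn's RandomForestClassifier, whose parameters map onto them (the dataset and values shown are illustrative assumptions):

```python
# Sketch: bootstrap sampling, random feature subsets per split, and the
# majority vote all happen inside fit()/predict().
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,      # step 4: n trees whose votes are aggregated
    max_features="sqrt",   # step 2: random subset of features per split
    bootstrap=True,        # step 1: subsets drawn with replacement
    random_state=0,
).fit(X_train, y_train)

print(forest.score(X_test, y_test))  # the most popular class wins each vote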
Ensemble classifiers combine the predictions of multiple base models (weak or strong
learners) to create a more robust and accurate prediction model. The main idea is that by
leveraging the strengths of different models, ensembles can reduce bias, variance, or both,
often outperforming individual models.
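As one concrete sketch of this idea, a hard-voting ensemble over three different base models (the choice of models and dataset is illustrative):

```python
# Sketch: a simple voting ensemble combining three base models.
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC()),
        ("tree", DecisionTreeClassifier(random_state=0)),
    ],
    voting="hard",  # majority vote over the base models' predicted classes
).fit(X, y)

print(ensemble.predict(X[:3]))
```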
Challenges
1. Complexity:
o More computationally intensive than single models.
o Difficult to interpret.
2. Diminishing Returns:
o Over-complex ensembles might not significantly improve performance.
3. Risk of Overfitting:
o If not managed, boosting models can overfit noisy data.
The Holdout Method is a simple and widely used technique for evaluating the performance
of a machine learning model. It involves splitting the dataset into separate training and
testing subsets, training the model on the training set, and evaluating its performance on
the testing set.
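A minimal sketch of the holdout method (the 70/30 split ratio and the model are illustrative choices):

```python
# Sketch: one train/test split; train on one part, evaluate on the held-out part.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0  # 70% train / 30% held-out test
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))
```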
The table below contrasts the holdout method with cross-validation:

| Aspect | Holdout Method | Cross-Validation |
| --- | --- | --- |
| Computational Cost | Low (single split and training) | High (multiple splits and training) |
| Data Utilization | Less efficient (unused data in each split) | More efficient (uses the entire dataset for training) |
| Variance | High (depends on the split) | Low (averages performance across splits) |
| Use Case | Quick prototyping or large datasets | Robust model evaluation for small datasets |
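For comparison, a sketch of 5-fold cross-validation, which averages performance across several splits (the fold count and model are illustrative):

```python
# Sketch: cross-validation trades extra computation for a lower-variance
# performance estimate than a single holdout split.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean(), scores.std())  # average accuracy and its spread
```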
Best Practices
1. Shuffle Data:
Shuffle the data before splitting to ensure randomization and prevent
ordering patterns from biasing the split.
2. Stratification:
Use stratified splitting for classification problems to maintain the class
balance in both subsets.
3. Multiple Splits:
Repeat the holdout process with different splits and average the results to
reduce variability.
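A closing sketch that combines the three practices, shuffling, stratifying, and repeating the split (the seeds and split ratio are illustrative):

```python
# Sketch: shuffled, stratified holdout repeated over several random splits.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
accuracies = []
for seed in range(5):                      # 3. multiple splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y,
        test_size=0.3,
        shuffle=True,                      # 1. shuffle before splitting
        stratify=y,                        # 2. keep the class balance
        random_state=seed,
    )
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    accuracies.append(model.score(X_te, y_te))

print(np.mean(accuracies), np.std(accuracies))
```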