ML Unit 3

Classification & SVM
Syllabus

• Classification: Training a Binary Classifier, Performance Measures, Measuring Accuracy Using Cross-Validation, Confusion Matrix, Precision and Recall, Precision/Recall Tradeoff, The ROC Curve, Multiclass Classification, Error Analysis, Multilabel Classification, Multioutput Classification, k-NN Classifier.
• Support Vector Machines: Linear SVM Classification, Soft Margin Classification, Nonlinear SVM Classification.
What is Classification?

Supervised machine learning algorithms can be broadly divided into classification and regression algorithms.

• A classification algorithm is a supervised learning technique used to identify the category of new observations based on training data.
• In classification, a program learns from the given dataset or observations and then classifies new observations into one of several classes or groups.
• Examples: Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc.
• Classes are also called targets, labels, or categories.
Goal of Classification

• A classification algorithm learns a discrete output function y = f(x) that maps an input variable (x) to a categorical output (y).
• A classic example of an ML classification algorithm is an email spam detector.

Goal: The main goal of a classification algorithm is
• to identify the category of a given data point, and
• to predict the output for categorical data.

Classification algorithms can be understood with a diagram of two classes, Class A and Class B: points within a class have features similar to each other and dissimilar to the other class.
Types of Classification

The algorithm that implements classification on a dataset is known as a classifier. There are two types of classification:

• Binary Classifier: If the classification problem has only two possible outcomes, it is called a binary classifier. Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
• Multi-class Classifier: If a classification problem has more than two outcomes, it is called a multi-class classifier. Examples: classifying types of crops, classifying types of music.
| Application             | Observation     | 0         | 1          |
|-------------------------|-----------------|-----------|------------|
| Medical Diagnosis       | Patient         | Healthy   | Diseased   |
| Email Analysis          | Email           | Not Spam  | Spam       |
| Financial Data Analysis | Transaction     | Not Fraud | Fraud      |
| Marketing               | Website visitor | Won't Buy | Will Buy   |
| Image Classification    | Image           | Hotdog    | Not Hotdog |
Multi-class Classification

When we solve a classification problem with only two class labels, it is easy to filter the data, apply a classification algorithm, train the model on the filtered data, and predict the outcomes.

But when the training data contains more than two classes, it can get complex to analyze the data, train the model, and predict reasonably accurate results.

To handle these multiple class instances, we use multi-class classification: the technique that allows us to categorize test data into any of the multiple class labels present in the training data.
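As a minimal illustration, here is a multi-class classification sketch in Python with scikit-learn; the Iris dataset (three classes) and logistic regression are illustrative choices, not prescribed by these notes.

```python
# A minimal multi-class sketch: Iris has three class labels (0, 1, 2).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# LogisticRegression handles more than two classes out of the box.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(clf.predict(X_test[:5]))    # one of the three class labels per sample
print(clf.score(X_test, y_test))  # overall accuracy on the held-out split
```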
Eager Learners vs. Lazy Learners

Eager learners:
• Eager learners are machine learning algorithms that first build a model from the training dataset before making any prediction on future data.
• They spend more time during training, because of their eagerness to achieve better generalization by learning the weights, but they require less time to make predictions.
• Most machine learning algorithms are eager learners: Logistic Regression, Support Vector Machines, Decision Trees, Artificial Neural Networks.

Lazy learners:
• Instance-based learners, on the other hand, do not build a model from the training data immediately; this is where the "lazy" aspect comes from.
• They just memorize the training data, and each time a prediction is needed they search for the nearest neighbors in the whole training data, which makes them very slow during prediction.
• Examples: K-Nearest Neighbors, case-based reasoning.
Why Performance Metrics?

There are various metrics we can use to evaluate the performance of ML algorithms. We must choose these metrics carefully because:
• How the performance of ML algorithms is measured and compared depends entirely on the metric you choose.
• How you weight the importance of various characteristics in the result is influenced completely by the metric you choose.
Types of Performance Measures

• Accuracy
• Confusion Matrix
• AUC/ROC Curve
• Precision
• Recall
• F1-Score
Confusion Matrix

• A confusion matrix is a tabular representation of the prediction outcomes of a binary classifier. It is used to describe the performance of a classification model on a set of test data for which the true values are known.
• The confusion matrix is simple to compute.
• True positives (TP): the number of positive observations the model correctly predicted as positive.
• False positives (FP): the number of negative observations the model incorrectly predicted as positive.
• False negatives (FN): the number of positive observations the model incorrectly predicted as negative.
• True negatives (TN): the number of negative observations the model correctly predicted as negative.
Confusion Matrix (with row and column totals)

|               | Actual YES | Actual NO | Total |
|---------------|------------|-----------|-------|
| Predicted YES | TP = 100   | FP = 10   | 110   |
| Predicted NO  | FN = 5     | TN = 50   | 55    |
| Total         | 105        | 60        | 165   |
We can determine the following from the above matrix:

• The total number of predictions is 165, of which the model predicted "yes" 110 times and "no" 55 times.
• In reality, there are 60 cases in which patients don't have the disease, and 105 cases in which patients do have the disease.
Example:

• Let's assume our test set has 1100 images (1000 non-cat images and 100 cat images), with the confusion matrix entries below.
• Out of 100 cat images, the model predicted 90 of them correctly and misclassified 10 of them.
• The 90 samples correctly predicted as cat are true positives.
• The 10 cat samples predicted as non-cat are false negatives.
• Out of 1000 non-cat images, the model classified 940 correctly and misclassified 60.
• The 940 correctly classified samples are true negatives.
• The 60 misclassified samples are false positives.
Accuracy

• Accuracy simply measures how often the classifier makes a correct prediction. It is the ratio of the number of correct predictions to the total number of predictions (number of data points).

Classification accuracy = (TP + TN) / (TP + FP + FN + TN)
                        = (90 + 940) / (1000 + 100)
                        = 1030 / 1100 ≈ 93.6%
Precision

• The precision metric is used to overcome a limitation of accuracy.
• Precision is defined as the ratio of correctly classified positive samples (true positives) to the total number of samples classified as positive (whether correctly or incorrectly).

Precision = TP / (TP + FP)
          = 90 / (90 + 60)
          = 0.60 = 60%

• Precision tells us how reliable the model is when it classifies a sample as positive.
Recall / Sensitivity

• Recall is calculated as the ratio of the number of positive samples correctly classified as positive to the total number of positive samples.
• Recall measures the model's ability to detect positive samples: the higher the recall, the more positive samples are detected.

Recall = TP / (TP + FN)
       = 90 / (90 + 10)
       = 0.90 = 90%
Precision vs. Recall

• Precision measures the model's ability to classify samples as positive correctly; recall measures how many of the actual positive samples were correctly classified.
• Calculating precision involves both positive and negative samples (everything classified as positive, correctly or not); calculating recall needs only the positive samples, and all negative samples are ignored.
• Precision therefore depends on both negative and positive samples; recall depends only on positive samples and is independent of negative samples.
• Precision considers all samples classified as positive, whether correctly or incorrectly; recall cares only about correctly classifying all positive samples and does not consider whether any negative sample is classified as positive.
F1-Score

• The F1 score is the harmonic mean of precision and recall: F1 = 2 · (precision · recall) / (precision + recall). It gives a number between 0 and 1.
• An F1 score of 1.0 indicates perfect precision and recall; an F1 score of 0 means that either precision or recall is 0.
• The F1 score uses both precision and recall, so it should be used when both matter for evaluation, even if one is slightly more important than the other — for example, when false negatives are comparatively more important than false positives, or vice versa.
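As a check on the numbers above, here is a sketch that rebuilds the cat/non-cat example (TP = 90, FN = 10, FP = 60, TN = 940) as label arrays and computes the same metrics with scikit-learn; only the counts come from the example, the array construction is an illustrative device.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, f1_score)

# 1 = cat (positive class), 0 = non-cat (negative class)
y_true = np.array([1] * 90 + [1] * 10 + [0] * 60 + [0] * 940)  # 100 cats, 1000 non-cats
y_pred = np.array([1] * 90 + [0] * 10 + [1] * 60 + [0] * 940)  # TP, FN, FP, TN blocks

print(confusion_matrix(y_true, y_pred))   # rows = actual, cols = predicted
print(accuracy_score(y_true, y_pred))     # (90+940)/1100 ≈ 0.936
print(precision_score(y_true, y_pred))    # 90/(90+60) = 0.60
print(recall_score(y_true, y_pred))       # 90/(90+10) = 0.90
print(f1_score(y_true, y_pred))           # harmonic mean ≈ 0.72
```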
ROC Curve

• ROC stands for Receiver Operating Characteristic curve.
• A ROC curve is a graph showing the performance of a classification model at different threshold levels. The curve is plotted using two parameters:
  • True Positive Rate (TPR)
  • False Positive Rate (FPR)
• TPR (true positive rate) is a synonym for recall: TPR = TP / (TP + FN).
• FPR (false positive rate) is calculated as: FPR = FP / (FP + TN).
AUC

• AUC stands for Area Under the ROC Curve. As its name suggests, AUC measures the two-dimensional area under the entire ROC curve.
• AUC aggregates performance across all thresholds into a single measure.
• The value of AUC ranges from 0 to 1: a model whose predictions are 100% wrong has an AUC of 0.0, whereas a model whose predictions are 100% correct has an AUC of 1.0.
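A minimal sketch of computing the ROC curve and AUC with scikit-learn; the synthetic dataset and logistic-regression scorer are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]          # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_test, scores)  # one (FPR, TPR) pair per threshold
print(roc_auc_score(y_test, scores))              # area under that curve, in [0, 1]
```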
Confusion Matrix: Worked Example

|               | Actual YES | Actual NO | Total |
|---------------|------------|-----------|-------|
| Predicted YES | TP = 95    | FP = 5    | 100   |
| Predicted NO  | FN = 5     | TN = 45   | 50    |
| Total         | 100        | 50        | 150   |

From this matrix we compute accuracy, precision, recall, F1-score, and the related rates below.
Accuracy

Accuracy = (TP + TN) / (TP + FP + FN + TN)
         = (95 + 45) / 150
         = 140 / 150 ≈ 93.3%
Misclassification Rate

The misclassification rate (error rate) is the fraction of predictions that are wrong:

Misclassification rate = (FP + FN) / (TP + FP + FN + TN)
                       = (5 + 5) / 150
                       = 10 / 150 ≈ 6.7%
Precision

Precision = TP / (TP + FP)
          = 95 / (95 + 5)
          = 95 / 100 = 95%
Recall / Sensitivity

Recall = TP / (TP + FN)
       = 95 / (95 + 5)
       = 95 / 100 = 95%
FPR (False Positive Rate)

FPR = FP / (FP + TN)
    = 5 / (5 + 45)
    = 5 / 50 = 10%
TNR / Specificity

Specificity = TN / (TN + FP)
            = 45 / (45 + 5)
            = 45 / 50 = 90%
Prevalence

Prevalence measures how often the positive class actually occurs in the sample:

Prevalence = Actual YES / Total = (TP + FN) / (TP + FP + FN + TN)
           = 100 / 150 ≈ 66.7%
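Pulling the worked example together, a plain-Python sketch that computes every rate above from the four counts TP = 95, FP = 5, FN = 5, TN = 45:

```python
TP, FP, FN, TN = 95, 5, 5, 45
total = TP + FP + FN + TN                 # 150 predictions in all

accuracy      = (TP + TN) / total         # 140/150 ≈ 0.933
misclass_rate = (FP + FN) / total         # 10/150  ≈ 0.067
precision     = TP / (TP + FP)            # 95/100  = 0.95
recall        = TP / (TP + FN)            # 95/100  = 0.95 (TPR / sensitivity)
fpr           = FP / (FP + TN)            # 5/50    = 0.10
specificity   = TN / (TN + FP)            # 45/50   = 0.90 (TNR)
prevalence    = (TP + FN) / total         # 100/150 ≈ 0.667

print(accuracy, misclass_rate, precision, recall, fpr, specificity, prevalence)
```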
Putting it in words:

• Precision: out of all the samples the model predicted as positive, what percentage are actually positive? That is TP / (TP + FP).
• Recall: out of all the actual positive samples, what percentage did the model find? That is TP / (TP + FN), where TP + FN is the number of actual YES cases.
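The precision/recall tradeoff from the syllabus can be seen by sweeping the decision threshold; here is a hedged sketch on synthetic data, where the dataset and model are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve

X, y = make_classification(n_samples=1000, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)
scores = clf.predict_proba(X)[:, 1]

precision, recall, thresholds = precision_recall_curve(y, scores)
# Raising the threshold generally trades recall away for precision.
for p, r, t in zip(precision[::20], recall[::20], thresholds[::20]):
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```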
Error Analysis

• Error analysis is the process of isolating, observing, and diagnosing erroneous ML predictions, thereby helping to understand pockets of high and low performance of the model.
• It involves manually examining the examples (in the cross-validation set) that your algorithm made errors on.
• When it is said that "the model accuracy is 80%", that accuracy may not be uniform across subgroups of the data: there may be input conditions under which the model fails far more often.
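A minimal sketch of slice-based error analysis with pandas; the DataFrame, its column names, and the subgroup labels here are all hypothetical.

```python
import pandas as pd

# Hypothetical predictions table: one row per cross-validation example.
df = pd.DataFrame({
    "subgroup": ["A", "A", "B", "B", "B", "C"],
    "y_true":   [1, 0, 1, 1, 0, 1],
    "y_pred":   [1, 0, 0, 0, 0, 1],
})
df["correct"] = df["y_true"] == df["y_pred"]

# Overall accuracy can hide slices where the model fails much more often.
print(df["correct"].mean())                      # overall accuracy
print(df.groupby("subgroup")["correct"].mean())  # per-subgroup accuracy
```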
Multi-Label Classification

Classification is either single-label or multi-label:
• Single Label
  • Binary: e.g., YES or NO, T or F
  • Multi-class: e.g., One-vs-One, One-vs-Rest
• Multi Label
  • Problem Transformation Method
  • Algorithm Adaption Method
Multi-Label Methods

• Problem Transformation Method: transform the multi-label problem into one or more single-label problems. Example techniques: Copy, Copy-WT, Select-Min, Select-Max.
• Algorithm Adaption Method: take a single-label algorithm and adapt it to handle the multi-label problem directly. Example techniques: ML-DT, ML-KNN, ML-NB.
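As a sketch of the problem-transformation idea, scikit-learn's OneVsRestClassifier fits one binary classifier per label on a binary indicator matrix; the synthetic multi-label dataset is an illustrative assumption.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# y is a binary indicator matrix: each sample may carry several labels at once.
X, y = make_multilabel_classification(n_samples=200, n_classes=4,
                                      n_labels=2, random_state=0)

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(clf.predict(X[:3]))   # e.g. [[0 1 1 0], ...] -- multiple 1s per row allowed
```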
(The next slides are figures comparing these settings: "Multi-Label Classification", "Multi-Label vs. Multi-Class", and "Binary vs. Multi-Class vs. Multi-Label".)
K-Nearest Neighbors (KNN) Algorithm

• K-Nearest Neighbors is one of the simplest machine learning algorithms, based on the supervised learning technique.
• The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category most similar to the available categories.
• K-NN stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm.
• K-NN can be used for regression as well as classification, but it is mostly used for classification problems.
• At the training phase, KNN just stores the dataset; when it gets new data, it classifies that data into the category most similar to the new data.
• Suppose we have an image of a creature that looks similar to both a cat and a dog, and we want to know whether it is a cat or a dog.
• For this identification we can use the KNN algorithm, since it works on a similarity measure.
• Our KNN model will find the features of the new image most similar to the cat and dog images, and based on the most similar features it will assign the image to either the cat or the dog category.
Why do we need a K-NN Algorithm?

Suppose there are two categories, Category A and Category B, and we have a new data point x1. Which category does this data point belong to? To solve this type of problem we need a K-NN algorithm: with K-NN we can easily identify the category or class of a particular data point.
How does K-NN work?

The working of K-NN can be explained with the following steps (a code sketch follows the list):

• Step 1: Select the number K of neighbors.
• Step 2: Calculate the Euclidean distance from the new point to the training points.
• Step 3: Take the K nearest neighbors according to the calculated Euclidean distance.
• Step 4: Among these K neighbors, count the number of data points in each category.
• Step 5: Assign the new data point to the category for which the number of neighbors is maximum.
• Step 6: Our model is ready.
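The steps above map directly onto scikit-learn's KNeighborsClassifier. In this minimal sketch, k = 5 matches the choice used on the next slide; the synthetic dataset is an illustrative assumption.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=2, n_redundant=0,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Training" just stores the data; the distance search happens at prediction time.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))   # accuracy on held-out points
```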
Suppose we have a new data point and we need to put it in the required category.

• First, we choose the number of neighbors; here we choose k = 5.
• Next, we calculate the Euclidean distance between the new point and the existing data points.
• The Euclidean distance between two points (x1, y1) and (x2, y2) is d = sqrt((x2 - x1)² + (y2 - y1)²).
• By calculating the Euclidean distances, we find the nearest neighbors: three nearest neighbors in Category A and two nearest neighbors in Category B.
• Since 3 of the 5 nearest neighbors are from Category A, the new data point must belong to Category A.
How to select the value of K in the K-NN Algorithm?

• There is no particular way to determine the best value for K, so we need to try several values to find the best of them.
• A common heuristic is K = sqrt(N), where N is the total number of samples. The most commonly preferred value for K is 5.
• Euclidean distance, cosine similarity, Minkowski distance, correlation, and chi-square are among the measures used in K-NN for distance calculation.
• A very low value for K, such as K = 1 or K = 2, can be noisy and make the model sensitive to outliers.
• Large values of K smooth the decision but can cause difficulties of their own (a code sketch for choosing K follows).
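A hedged sketch of choosing K by cross-validation, trying odd values up to sqrt(N) per the rule of thumb above; the dataset is an illustrative assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, random_state=0)
k_max = int(np.sqrt(len(X)))                 # sqrt(N) rule of thumb

scores = {}
for k in range(1, k_max + 1, 2):             # odd k avoids ties in binary voting
    clf = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(clf, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```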
Pros & Cons of KNN

Advantages of KNN Algorithm Disadvantages of KNN Algorithm

• It is simple to implement. • Always needs to determine the


• It is robust to the noisy training value of K which may be
data complex some time.
• It can be more effective if the • The computation cost is high
training data is large. because of calculating the
distance between the data
points for all the training
samples.

113
Applications of K-NN

• K-NN-based isometric mapping (Isomap)
• K-NN-based recommendation systems
SVM – Support Vector Machine

• The Support Vector Machine, or SVM, is one of the most popular supervised learning algorithms.
• It is used for classification as well as regression problems.
• The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes, so that new data points can easily be placed in the correct category in the future.
• This best decision boundary is called a hyperplane.
• SVM chooses the extreme points/vectors that help in creating the hyperplane.
• These extreme cases are called support vectors, and hence the algorithm is termed the Support Vector Machine.
• Suppose we see a strange cat that also has some features of dogs.
• If we want a model that can accurately identify whether it is a cat or a dog, such a model can be created using the SVM algorithm.
• We first train our model with lots of images of cats and dogs, so that it can learn the different features of cats and dogs, and then we test it on this strange creature.
Types of SVM

• Linear SVM: used for linearly separable data. If a dataset can be classified into two classes using a single straight line, the data is termed linearly separable, and the classifier used is called a linear SVM classifier.
• Non-linear SVM: used for non-linearly separable data. If a dataset cannot be classified using a straight line, the data is termed non-linear, and the classifier used is called a non-linear SVM classifier.
How does SVM work?

• Suppose we have a dataset with two tags (green and blue), and the dataset has two features, x1 and x2.
• We want a classifier that can classify any pair (x1, x2) of coordinates as either green or blue.
• Since this is a 2-D space, we can separate these two classes with just a straight line.
• But there can be multiple lines that separate the classes.
• The SVM algorithm helps find the best line or decision boundary; this best boundary or region is called a hyperplane.
• The SVM algorithm finds the points of each class closest to the line; these points are called support vectors.
• The distance between the support vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin.
• The hyperplane with maximum margin is called the optimal hyperplane.
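A minimal linear SVM sketch with scikit-learn; the blob dataset is an illustrative assumption, and the C parameter controls the soft margin (smaller C gives a wider margin but tolerates more margin violations).

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.support_vectors_)        # the extreme points that define the hyperplane
print(clf.coef_, clf.intercept_)   # w and b of the separating line w·x + b = 0
```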
Non-Linear SVM:

• If data is linearly arranged, we can separate it with a straight line, but for non-linear data we cannot draw a single straight line.
• To separate such data points, we need to add one more dimension. For linear data we used the two dimensions x and y, so for non-linear data we add a third dimension z, calculated as:

z = x² + y²
• With the third dimension added, SVM can divide the datasets into classes with a linear boundary.
• Since we are now in 3-D space, that boundary looks like a plane parallel to the x-axis.
• If we convert it back to 2-D space at z = 1, the boundary becomes a circle: we get a circumference of radius 1 separating the non-linear data.
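A sketch of exactly this construction: concentric-circle data is not linearly separable in (x, y), but adding z = x² + y² lets a linear SVM separate it with a plane; in practice, SVC's nonlinear kernels do this implicitly via the kernel trick. The make_circles dataset and its parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Explicit third dimension z = x^2 + y^2, then a *linear* SVM in 3-D:
Z = np.c_[X, (X ** 2).sum(axis=1)]
print(SVC(kernel="linear").fit(Z, y).score(Z, y))   # ~1.0: now separable

# Equivalent shortcut: a nonlinear (RBF) kernel on the original 2-D data:
print(SVC(kernel="rbf").fit(X, y).score(X, y))      # also ~1.0
```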