ML Unit 3 & SVM
Syllabus
What is Classification?
Supervised Machine Learning algorithms can be broadly divided into Classification and Regression algorithms.
Goal of Classification
Types of Classification
The algorithm which implements the classification on a dataset is known as a classifier
Types of
Classifications.
Binary Multi-class
Classifier Classifier
If the classification problem has only two If a classification problem has more
possible outcomes, then it is called as than two outcomes, then it is called as
Binary Classifier. Multi-class Classifier.
But when we have more than two class instances in input train data, then it
might get complex to analyze the data, train the model, and predict relatively
accurate results.
15
Types of Performance Measures
• Accuracy
• Confusion Matrix
• AUC/ROC curve
True positives (TP): the number of positive observations the model correctly predicted as positive.
Confusion Matrix
                Actual YES   Actual NO   Total
Predicted YES   TP = 100     FP = 10     110
Predicted NO    FN = 5       TN = 50     55
Total           105          60          165
We can determine the following from the above matrix:
Example:
• Let's assume our test set has 1,100 images (1,000 non-cat images and 100 cat images), with the confusion matrix below.
• Out of 100 cat images, the model has predicted 90 of them correctly and has misclassified 10 of them.
• The 90 samples predicted as cat are considered true positives.
• The 10 samples predicted as non-cat are false negatives.
• Out of 1,000 non-cat images, the model has classified 940 of them correctly and misclassified 60 of them.
• The 940 correctly classified samples are referred to as true negatives.
• The 60 are referred to as false positives.
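The tallying in this example can be sketched in plain Python. This is a minimal illustration, not a library routine; the `confusion_counts` helper name and the synthetic label lists are assumptions chosen to reproduce the counts above.

```python
# Tally a 2x2 confusion matrix from (actual, predicted) label pairs.
# The synthetic labels below mirror the cat example: 90 TP, 10 FN, 940 TN, 60 FP.
def confusion_counts(actual, predicted, positive="cat"):
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1          # actual cat, predicted cat
        elif a == positive:
            fn += 1          # actual cat, predicted non-cat
        elif p == positive:
            fp += 1          # actual non-cat, predicted cat
        else:
            tn += 1          # actual non-cat, predicted non-cat
    return tp, fp, fn, tn

actual = ["cat"] * 100 + ["non-cat"] * 1000
predicted = (["cat"] * 90 + ["non-cat"] * 10          # 90 cats right, 10 missed
             + ["non-cat"] * 940 + ["cat"] * 60)      # 940 non-cats right, 60 wrong
print(confusion_counts(actual, predicted))  # (90, 60, 10, 940)
```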
Accuracy
• Accuracy measures how often the classifier makes correct predictions. It is the ratio of the number of correct predictions to the total number of predictions (i.e., the number of data points).
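As a quick sketch of the ratio, using the cat example's counts (TP=90, TN=940, FP=60, FN=10); the function name is just an illustration:

```python
# Accuracy = correct predictions / total predictions.
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

print(accuracy(90, 940, 60, 10))  # (90 + 940) / 1100, about 0.936
```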
Precision:
• While calculating the precision of a model, we consider both the positive and the negative samples that were classified (a false positive comes from a negative sample).
• The precision of a machine learning model is therefore dependent on both the negative and the positive samples.
• In precision, we consider all samples that are classified as positive, whether correctly or incorrectly.
Recall:
• While calculating the recall of a model, we only need the positive samples; all negative samples are neglected.
• The recall of a machine learning model is dependent on positive samples and independent of negative samples.
• Recall cares about correctly classifying all positive samples; it does not consider whether any negative sample is classified as positive.
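The two definitions can be sketched directly from the confusion-matrix counts; the cat example's numbers (TP=90, FP=60, FN=10) are used as sample inputs:

```python
# Precision: among samples predicted positive, how many are truly positive.
def precision(tp, fp):
    return tp / (tp + fp)

# Recall: among actual positives, how many the model found.
def recall(tp, fn):
    return tp / (tp + fn)

print(precision(90, 60))  # 90 / 150 = 0.6
print(recall(90, 10))     # 90 / 100 = 0.9
```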
F1-score
The F1 score is the harmonic mean of precision and recall. It gives a number between 0 and 1.
An F1 score of 1.0 indicates perfect precision and recall. An F1 score of 0 means that either the precision or the recall is 0.
The F1 score makes use of both precision and recall, so it should be used when both of them matter for evaluation, for example when false negatives and false positives both carry a cost (even if one matters slightly more than the other).
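A tiny helper for the harmonic-mean formula above (the inputs 0.6 and 0.9 are just the sample precision and recall from the cat example):

```python
# F1 = harmonic mean of precision and recall; 0 when either is 0.
def f1_score(precision, recall):
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.6, 0.9))  # about 0.72
print(f1_score(0.0, 0.9))  # 0.0 -- one component at zero drags F1 to zero
```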
ROC curve
• ROC stands for Receiver Operating Characteristic curve.
• An ROC curve is a graph showing the performance of a classification model at different threshold levels. The curve is plotted between two parameters:
• True Positive Rate (TPR)
• False Positive Rate (FPR)
• TPR, the true positive rate, is a synonym for recall and can be calculated as:
• TPR = TP / (TP + FN)
• FPR, the false positive rate, can be calculated as:
• FPR = FP / (FP + TN)
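A minimal sketch of how the curve's points arise: sweep a decision threshold over predicted scores and compute (FPR, TPR) at each threshold. The scores, labels, and the `roc_points` helper are made-up illustrations, not a library API.

```python
# One (FPR, TPR) point per threshold: predict positive when score >= t.
def roc_points(scores, labels, thresholds):
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        points.append((fp / (fp + tn), tp / (tp + fn)))  # (FPR, TPR)
    return points

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]   # toy model scores
labels = [1,   1,   0,   1,   0,   0]     # toy ground truth
print(roc_points(scores, labels, [0.5, 0.2]))
```

Lowering the threshold catches more positives (TPR rises) but lets more negatives through (FPR rises) — exactly the trade-off the ROC curve visualizes.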
AUC
• AUC stands for Area Under the ROC Curve. As its name suggests, AUC measures the two-dimensional area under the entire ROC curve, as shown in the image below.
• AUC aggregates the model's performance across all thresholds into a single measure.
• The value of AUC ranges from 0 to 1. A model whose predictions are 100% wrong has an AUC of 0.0, whereas a model whose predictions are 100% correct has an AUC of 1.0.
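The area can be approximated with the trapezoidal rule over the (FPR, TPR) points. The sample points below are made up; a real curve would come from sweeping thresholds as above.

```python
# Trapezoidal area under (fpr, tpr) points sorted by increasing fpr,
# with the endpoints (0, 0) and (1, 1) included.
def auc_trapezoid(points):
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2   # trapezoid between neighbors
    return area

points = [(0.0, 0.0), (1/3, 2/3), (2/3, 1.0), (1.0, 1.0)]
print(auc_trapezoid(points))  # well above 0.5, so better than random
```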
Confusion Matrix: worked example
Problem definition:
                Actual YES   Actual No   Total
Predicted YES   TP = 95      FP = 5      100
Predicted NO    FN = 5       TN = 45     50
Total           100          50          150
From this matrix we will compute the accuracy, precision, recall, and F1-score.
Accuracy
Accuracy = (TP + TN) / (TP + FP + FN + TN) = (95 + 45) / 150 ≈ 0.933
Precision
Precision = TP / (TP + FP) = 95 / (95 + 5) = 0.95
Recall / Sensitivity
Recall = TP / (TP + FN) = 95 / (95 + 5) = 0.95
FPR (False Positive Rate)
FPR = FP / (FP + TN) = 5 / (5 + 45) = 0.1
TNR / Specificity
Specificity = TN / (TN + FP) = 45 / (45 + 5) = 0.9
Prevalence
Prevalence = Actual YES / Total = (TP + FN) / (TP + FP + FN + TN) = 100 / 150 ≈ 0.667
Out of the total positives predicted by the model, what percentage are actually positive?
i.e., Precision = TP / (TP + FP).
Out of the total actual positive values, what percentage did the model predict as positive?
i.e., Recall = TP / (TP + FN)
(here TP + FN is the Actual YES total)
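All of the worked example's metrics can be computed from the single confusion matrix above (TP=95, FP=5, FN=5, TN=45):

```python
# Metrics for the worked example's confusion matrix.
tp, fp, fn, tn = 95, 5, 5, 45
total = tp + fp + fn + tn                  # 150

accuracy    = (tp + tn) / total            # 140/150, about 0.933
precision   = tp / (tp + fp)               # 95/100 = 0.95
recall      = tp / (tp + fn)               # 95/100 = 0.95
specificity = tn / (tn + fp)               # 45/50  = 0.90
fpr         = fp / (fp + tn)               # 5/50   = 0.10
f1          = 2 * precision * recall / (precision + recall)  # 0.95

print(round(accuracy, 3), precision, recall, specificity, fpr, round(f1, 3))
```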
Error Analysis
• Error analysis is the process of isolating, observing, and diagnosing erroneous ML predictions, thereby helping us understand pockets of high and low performance of the model.
• It involves manually examining the examples (in the cross-validation set) that the algorithm made errors on.
• When it is said that "the model accuracy is 80%", that accuracy might not be uniform across subgroups of the data, and there may be some input conditions under which the model fails more often.
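A tiny sketch of that last point: overall accuracy can hide a subgroup where the model fails far more often. The subgroup names and (subgroup, correct?) pairs below are entirely made up for illustration.

```python
from collections import defaultdict

# Made-up evaluation results: (subgroup, was the prediction correct?).
results = ([("indoor", True)] * 45 + [("indoor", False)] * 5
           + [("outdoor", True)] * 35 + [("outdoor", False)] * 15)

by_group = defaultdict(lambda: [0, 0])     # group -> [correct, total]
for group, correct in results:
    by_group[group][0] += int(correct)
    by_group[group][1] += 1

overall = sum(int(c) for _, c in results) / len(results)
print("overall:", overall)                 # 0.8 overall...
for g, (c, n) in sorted(by_group.items()):
    print(g, c / n)                        # ...but 0.9 indoor vs 0.7 outdoor
```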
Multi-Label Classification
(Figure example: Character Acting, Method Acting.)
Multi-Label vs. Multi-Class
Binary vs. Multi-Class vs. Multi-Label
K-Nearest Neighbor (KNN) Algorithm
• K-Nearest Neighbour is one of the simplest Machine Learning algorithms, based on the Supervised Learning technique.
• The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category that is most similar to the available categories.
• The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm.
To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point.
How does K-NN work?
The working of K-NN can be explained with the following steps:
Step 1: Select the number K of neighbors.
Step 2: Calculate the Euclidean distance from the new point to the available data points.
Step 3: Take the K nearest neighbors as per the calculated Euclidean distance.
Step 4: Among these K neighbors, count the number of data points in each category.
Step 5: Assign the new data point to the category for which the number of neighbors is maximum.
Step 6: Our model is ready.
Suppose we have a new data point, and we need to put it in the required category.
There is no fixed rule for choosing K, so we need to try some values to find the best among them.
The optimal K value is often found to be around the square root of N, where N is the total number of samples. The most commonly preferred value for K is 5.
A very low value of K, such as K=1 or K=2, can be noisy and expose the model to the effects of outliers.
Large values of K are generally good at smoothing out noise, but a K that is too large can blur the boundaries between categories.
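The six steps above can be sketched as a minimal K-NN classifier (Euclidean distance, majority vote). The points, labels, query, and the `knn_predict` name are made-up toy values for illustration only.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    # Steps 2-3: distances to all training points, take the k nearest.
    dists = sorted(
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    )
    nearest = [label for _, label in dists[:k]]
    # Steps 4-5: majority category among the k neighbors.
    return Counter(nearest).most_common(1)[0][0]

points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]  # toy data
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(points, labels, (2, 2), k=3))  # "A" -- near the A cluster
print(knn_predict(points, labels, (6, 5), k=3))  # "B" -- near the B cluster
```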
Pros & Cons of KNN
Applications of K - NN
Support Vector Machine (SVM)
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes, so that we can easily put new data points into the correct category in the future.
• SVM chooses the extreme
points/vectors that help in creating
the hyperplane.
▪ Suppose we see a strange cat
that also has some features of
dogs.
• The SVM algorithm helps to find the best line or decision boundary.
• This best boundary or region is called a hyperplane.
• The SVM algorithm finds the closest points of the lines from both classes.
• These points are called support vectors. The distance between the vectors and the hyperplane is called the margin.
• The goal of SVM is to maximize this margin.
• The hyperplane with the maximum margin is called the optimal hyperplane.
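The margin described above can be sketched numerically: given a hyperplane w·x + b = 0, the distance from a point x to it is |w·x + b| / ||w||, and the margin is the smallest such distance over the training points. The hyperplane and points below are made-up toy values, not a trained SVM.

```python
import math

# Distance from point x to the hyperplane w.x + b = 0.
def distance_to_hyperplane(w, b, x):
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return abs(dot + b) / math.hypot(*w)

w, b = (1.0, 1.0), -4.0                    # toy hyperplane: x + y - 4 = 0
points = [(1, 1), (2, 1), (5, 4), (4, 5)]  # toy training points
margin = min(distance_to_hyperplane(w, b, p) for p in points)
print(margin)  # set by the closest point, (2, 1): the "support vector"
```

An SVM solver searches over w and b to make this minimum distance as large as possible.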
• Non-Linear SVM:
• If data is linearly arranged, then we can separate it with a straight line.
• But for non-linear data, we cannot draw a single straight line (see the image beside).