ML Unit 3

Classification & SVM
Syllabus

• Classification: Training a Binary Classifier, Performance Measures, Measuring Accuracy Using Cross-Validation, Confusion Matrix, Precision and Recall, Precision/Recall Tradeoff, The ROC Curve, Multiclass Classification, Error Analysis, Multilabel Classification, Multioutput Classification, k-NN Classifier.
• Support Vector Machines: Linear SVM Classification, Soft Margin Classification, Nonlinear SVM Classification.
What is Classification?

Supervised machine learning algorithms can be broadly divided into classification and regression algorithms.

• A classification algorithm is a supervised learning technique used to identify the category of new observations based on training data.
• In classification, a program learns from the given dataset or observations and then classifies new observations into one of several classes or groups.
• Examples: Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc.
• Classes are also called targets, labels, or categories.
Goal of Classification

• A classification algorithm learns a discrete output function y = f(x) that maps an input variable (x) to a categorical output (y).
• A classic example of an ML classification algorithm is an email spam detector.

Goal: The main goal of a classification algorithm is
• to identify the category of a given data point, and
• to predict the output for categorical data.

Classification algorithms can be understood with a diagram of two classes, Class A and Class B: points within a class have features similar to each other and dissimilar to the other class.
Types of Classification

The algorithm that implements classification on a dataset is known as a classifier. There are two types of classification:

• Binary Classifier: If the classification problem has only two possible outcomes, it is called a binary classifier. Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
• Multi-class Classifier: If a classification problem has more than two outcomes, it is called a multi-class classifier. Examples: classifying types of crops, classifying types of music.
| Application             | Observation     | 0         | 1          |
|-------------------------|-----------------|-----------|------------|
| Medical Diagnosis       | Patient         | Healthy   | Diseased   |
| Email Analysis          | Email           | Not Spam  | Spam       |
| Financial Data Analysis | Transaction     | Not Fraud | Fraud      |
| Marketing               | Website visitor | Won't Buy | Will Buy   |
| Image Classification    | Image           | Hotdog    | Not Hotdog |
Multi-class Classification

When we solve a classification problem with only two class labels, it is easy to filter the data, apply a classification algorithm, train the model on the filtered data, and predict the outcomes.

But when the training data contains more than two classes, it can get complex to analyze the data, train the model, and predict reasonably accurate results.

To handle these multiple class instances, we use multi-class classification: the technique that allows us to categorize test data into any of the multiple class labels present in the training data.
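As a minimal illustration, here is a multi-class classification sketch in Python with scikit-learn; the Iris dataset (three classes) and logistic regression are illustrative choices, not prescribed by these notes.

```python
# A minimal multi-class sketch: Iris has three class labels (0, 1, 2).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# LogisticRegression handles more than two classes out of the box.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(clf.predict(X_test[:5]))    # one of the three class labels per sample
print(clf.score(X_test, y_test))  # overall accuracy on the held-out split
```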
Eager Learners vs. Lazy Learners

Eager learners:
• Eager learners are machine learning algorithms that first build a model from the training dataset before making any prediction on future data.
• They spend more time during training, because of their eagerness to achieve better generalization by learning the weights, but they require less time to make predictions.
• Most machine learning algorithms are eager learners: Logistic Regression, Support Vector Machines, Decision Trees, Artificial Neural Networks.

Lazy learners:
• Instance-based learners, on the other hand, do not build a model from the training data immediately; this is where the "lazy" aspect comes from.
• They just memorize the training data, and each time a prediction is needed they search for the nearest neighbors in the whole training data, which makes them very slow during prediction.
• Examples: K-Nearest Neighbors, case-based reasoning.
Why Performance Metrics?

There are various metrics we can use to evaluate the performance of ML algorithms. We must choose these metrics carefully because:
• How the performance of ML algorithms is measured and compared depends entirely on the metric you choose.
• How you weight the importance of various characteristics in the result is influenced completely by the metric you choose.
Types of Performance Measures

• Accuracy
• Confusion Matrix
• AUC/ROC Curve
• Precision
• Recall
• F1-Score
Confusion Matrix

• A confusion matrix is a tabular representation of the prediction outcomes of a binary classifier. It is used to describe the performance of a classification model on a set of test data for which the true values are known.
• The confusion matrix is simple to compute.
• True positives (TP): the number of positive observations the model correctly predicted as positive.
• False positives (FP): the number of negative observations the model incorrectly predicted as positive.
• False negatives (FN): the number of positive observations the model incorrectly predicted as negative.
• True negatives (TN): the number of negative observations the model correctly predicted as negative.
Confusion Matrix (with row and column totals)

|               | Actual YES | Actual NO | Total |
|---------------|------------|-----------|-------|
| Predicted YES | TP = 100   | FP = 10   | 110   |
| Predicted NO  | FN = 5     | TN = 50   | 55    |
| Total         | 105        | 60        | 165   |
We can determine the following from the above matrix:

• The total number of predictions is 165, of which the model predicted "yes" 110 times and "no" 55 times.
• In reality, there are 60 cases in which patients don't have the disease, and 105 cases in which patients do have the disease.
Example:

• Let's assume our test set has 1100 images (1000 non-cat images and 100 cat images), with the confusion matrix entries below.
• Out of 100 cat images, the model predicted 90 of them correctly and misclassified 10 of them.
• The 90 samples correctly predicted as cat are true positives.
• The 10 cat samples predicted as non-cat are false negatives.
• Out of 1000 non-cat images, the model classified 940 correctly and misclassified 60.
• The 940 correctly classified samples are true negatives.
• The 60 misclassified samples are false positives.
Accuracy

• Accuracy simply measures how often the classifier makes a correct prediction. It is the ratio of the number of correct predictions to the total number of predictions (number of data points).

Classification accuracy = (TP + TN) / (TP + FP + FN + TN)
                        = (90 + 940) / (1000 + 100)
                        = 1030 / 1100 ≈ 93.6%
Precision

• The precision metric is used to overcome a limitation of accuracy.
• Precision is defined as the ratio of correctly classified positive samples (true positives) to the total number of samples classified as positive (whether correctly or incorrectly).

Precision = TP / (TP + FP)
          = 90 / (90 + 60)
          = 0.60 = 60%

• Precision tells us how reliable the model is when it classifies a sample as positive.
Recall / Sensitivity

• Recall is calculated as the ratio of the number of positive samples correctly classified as positive to the total number of positive samples.
• Recall measures the model's ability to detect positive samples: the higher the recall, the more positive samples are detected.

Recall = TP / (TP + FN)
       = 90 / (90 + 10)
       = 0.90 = 90%
Precision vs. Recall

• Precision measures the model's ability to classify samples as positive correctly; recall measures how many of the actual positive samples were correctly classified.
• Calculating precision involves both positive and negative samples (everything classified as positive, correctly or not); calculating recall needs only the positive samples, and all negative samples are ignored.
• Precision therefore depends on both negative and positive samples; recall depends only on positive samples and is independent of negative samples.
• Precision considers all samples classified as positive, whether correctly or incorrectly; recall cares only about correctly classifying all positive samples and does not consider whether any negative sample is classified as positive.
F1-Score

• The F1 score is the harmonic mean of precision and recall: F1 = 2 · (precision · recall) / (precision + recall). It gives a number between 0 and 1.
• An F1 score of 1.0 indicates perfect precision and recall; an F1 score of 0 means that either precision or recall is 0.
• The F1 score uses both precision and recall, so it should be used when both matter for evaluation, even if one is slightly more important than the other — for example, when false negatives are comparatively more important than false positives, or vice versa.
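As a check on the numbers above, here is a sketch that rebuilds the cat/non-cat example (TP = 90, FN = 10, FP = 60, TN = 940) as label arrays and computes the same metrics with scikit-learn; only the counts come from the example, the array construction is an illustrative device.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, f1_score)

# 1 = cat (positive class), 0 = non-cat (negative class)
y_true = np.array([1] * 90 + [1] * 10 + [0] * 60 + [0] * 940)  # 100 cats, 1000 non-cats
y_pred = np.array([1] * 90 + [0] * 10 + [1] * 60 + [0] * 940)  # TP, FN, FP, TN blocks

print(confusion_matrix(y_true, y_pred))   # rows = actual, cols = predicted
print(accuracy_score(y_true, y_pred))     # (90+940)/1100 ≈ 0.936
print(precision_score(y_true, y_pred))    # 90/(90+60) = 0.60
print(recall_score(y_true, y_pred))       # 90/(90+10) = 0.90
print(f1_score(y_true, y_pred))           # harmonic mean ≈ 0.72
```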
ROC Curve

• ROC stands for Receiver Operating Characteristic curve.
• A ROC curve is a graph showing the performance of a classification model at different threshold levels. The curve is plotted using two parameters:
  • True Positive Rate (TPR)
  • False Positive Rate (FPR)
• TPR (true positive rate) is a synonym for recall: TPR = TP / (TP + FN).
• FPR (false positive rate) is calculated as: FPR = FP / (FP + TN).
AUC

• AUC stands for Area Under the ROC Curve. As its name suggests, AUC measures the two-dimensional area under the entire ROC curve.
• AUC aggregates performance across all thresholds into a single measure.
• The value of AUC ranges from 0 to 1: a model whose predictions are 100% wrong has an AUC of 0.0, whereas a model whose predictions are 100% correct has an AUC of 1.0.
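A minimal sketch of computing the ROC curve and AUC with scikit-learn; the synthetic dataset and logistic-regression scorer are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]          # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_test, scores)  # one (FPR, TPR) pair per threshold
print(roc_auc_score(y_test, scores))              # area under that curve, in [0, 1]
```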
Confusion Matrix: Worked Example

|               | Actual YES | Actual NO | Total |
|---------------|------------|-----------|-------|
| Predicted YES | TP = 95    | FP = 5    | 100   |
| Predicted NO  | FN = 5     | TN = 45   | 50    |
| Total         | 100        | 50        | 150   |

From this matrix we compute accuracy, precision, recall, F1-score, and the related rates below.
Accuracy

Accuracy = (TP + TN) / (TP + FP + FN + TN)
         = (95 + 45) / 150
         = 140 / 150 ≈ 93.3%
Misclassification Rate

The misclassification rate (error rate) is the fraction of predictions that are wrong:

Misclassification rate = (FP + FN) / (TP + FP + FN + TN)
                       = (5 + 5) / 150
                       = 10 / 150 ≈ 6.7%
Precision

Precision = TP / (TP + FP)
          = 95 / (95 + 5)
          = 95 / 100 = 95%
Recall / Sensitivity

Recall = TP / (TP + FN)
       = 95 / (95 + 5)
       = 95 / 100 = 95%
FPR (False Positive Rate)

FPR = FP / (FP + TN)
    = 5 / (5 + 45)
    = 5 / 50 = 10%
TNR / Specificity

Specificity = TN / (TN + FP)
            = 45 / (45 + 5)
            = 45 / 50 = 90%
Prevalence

Prevalence measures how often the positive class actually occurs in the sample:

Prevalence = Actual YES / Total = (TP + FN) / (TP + FP + FN + TN)
           = 100 / 150 ≈ 66.7%
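Pulling the worked example together, a plain-Python sketch that computes every rate above from the four counts TP = 95, FP = 5, FN = 5, TN = 45:

```python
TP, FP, FN, TN = 95, 5, 5, 45
total = TP + FP + FN + TN                 # 150 predictions in all

accuracy      = (TP + TN) / total         # 140/150 ≈ 0.933
misclass_rate = (FP + FN) / total         # 10/150  ≈ 0.067
precision     = TP / (TP + FP)            # 95/100  = 0.95
recall        = TP / (TP + FN)            # 95/100  = 0.95 (TPR / sensitivity)
fpr           = FP / (FP + TN)            # 5/50    = 0.10
specificity   = TN / (TN + FP)            # 45/50   = 0.90 (TNR)
prevalence    = (TP + FN) / total         # 100/150 ≈ 0.667

print(accuracy, misclass_rate, precision, recall, fpr, specificity, prevalence)
```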
Putting it in words:

• Precision: out of all the samples the model predicted as positive, what percentage are actually positive? That is TP / (TP + FP).
• Recall: out of all the actual positive samples, what percentage did the model find? That is TP / (TP + FN), where TP + FN is the number of actual YES cases.
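The precision/recall tradeoff from the syllabus can be seen by sweeping the decision threshold; here is a hedged sketch on synthetic data, where the dataset and model are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve

X, y = make_classification(n_samples=1000, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)
scores = clf.predict_proba(X)[:, 1]

precision, recall, thresholds = precision_recall_curve(y, scores)
# Raising the threshold generally trades recall away for precision.
for p, r, t in zip(precision[::20], recall[::20], thresholds[::20]):
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```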
Error Analysis

• Error analysis is the process of isolating, observing, and diagnosing erroneous ML predictions, thereby helping to understand pockets of high and low performance of the model.
• It involves manually examining the examples (in the cross-validation set) that your algorithm made errors on.
• When it is said that "the model accuracy is 80%", that accuracy may not be uniform across subgroups of the data: there may be input conditions under which the model fails far more often.
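A minimal sketch of slice-based error analysis with pandas; the DataFrame, its column names, and the subgroup labels here are all hypothetical.

```python
import pandas as pd

# Hypothetical predictions table: one row per cross-validation example.
df = pd.DataFrame({
    "subgroup": ["A", "A", "B", "B", "B", "C"],
    "y_true":   [1, 0, 1, 1, 0, 1],
    "y_pred":   [1, 0, 0, 0, 0, 1],
})
df["correct"] = df["y_true"] == df["y_pred"]

# Overall accuracy can hide slices where the model fails much more often.
print(df["correct"].mean())                      # overall accuracy
print(df.groupby("subgroup")["correct"].mean())  # per-subgroup accuracy
```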
Multi-Label Classification

Classification is either single-label or multi-label:
• Single Label
  • Binary: e.g., YES or NO, T or F
  • Multi-class: e.g., One-vs-One, One-vs-Rest
• Multi Label
  • Problem Transformation Method
  • Algorithm Adaption Method
Multi-Label Methods

• Problem Transformation Method: transform the multi-label problem into one or more single-label problems. Example techniques: Copy, Copy-WT, Select-Min, Select-Max.
• Algorithm Adaption Method: take a single-label algorithm and adapt it to handle the multi-label problem directly. Example techniques: ML-DT, ML-KNN, ML-NB.
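As a sketch of the problem-transformation idea, scikit-learn's OneVsRestClassifier fits one binary classifier per label on a binary indicator matrix; the synthetic multi-label dataset is an illustrative assumption.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# y is a binary indicator matrix: each sample may carry several labels at once.
X, y = make_multilabel_classification(n_samples=200, n_classes=4,
                                      n_labels=2, random_state=0)

clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(clf.predict(X[:3]))   # e.g. [[0 1 1 0], ...] -- multiple 1s per row allowed
```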
(The next slides are figures comparing these settings: "Multi-Label Classification", "Multi-Label vs. Multi-Class", and "Binary vs. Multi-Class vs. Multi-Label".)
K-Nearest Neighbors (KNN) Algorithm

• K-Nearest Neighbors is one of the simplest machine learning algorithms, based on the supervised learning technique.
• The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category most similar to the available categories.
• K-NN stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm.
• K-NN can be used for regression as well as classification, but it is mostly used for classification problems.
• At the training phase, KNN just stores the dataset; when it gets new data, it classifies that data into the category most similar to the new data.
• Suppose we have an image of a creature that looks similar to both a cat and a dog, and we want to know whether it is a cat or a dog.
• For this identification we can use the KNN algorithm, since it works on a similarity measure.
• Our KNN model will find the features of the new image most similar to the cat and dog images, and based on the most similar features it will assign the image to either the cat or the dog category.
Why do we need a K-NN Algorithm?

Suppose there are two categories, Category A and Category B, and we have a new data point x1. Which category does this data point belong to? To solve this type of problem we need a K-NN algorithm: with K-NN we can easily identify the category or class of a particular data point.
How does K-NN work?

The working of K-NN can be explained with the following steps (a code sketch follows the list):

• Step 1: Select the number K of neighbors.
• Step 2: Calculate the Euclidean distance from the new point to the training points.
• Step 3: Take the K nearest neighbors according to the calculated Euclidean distance.
• Step 4: Among these K neighbors, count the number of data points in each category.
• Step 5: Assign the new data point to the category for which the number of neighbors is maximum.
• Step 6: Our model is ready.
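The steps above map directly onto scikit-learn's KNeighborsClassifier. In this minimal sketch, k = 5 matches the choice used on the next slide; the synthetic dataset is an illustrative assumption.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=2, n_redundant=0,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Training" just stores the data; the distance search happens at prediction time.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))   # accuracy on held-out points
```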
Suppose we have a new data point and we need to put it in the required category.

• First, we choose the number of neighbors; here we choose k = 5.
• Next, we calculate the Euclidean distance between the new point and the existing data points.
• The Euclidean distance between two points (x1, y1) and (x2, y2) is d = sqrt((x2 - x1)² + (y2 - y1)²).
• By calculating the Euclidean distances, we find the nearest neighbors: three nearest neighbors in Category A and two nearest neighbors in Category B.
• Since 3 of the 5 nearest neighbors are from Category A, the new data point must belong to Category A.
How to select the value of K in the K-NN Algorithm?

• There is no particular way to determine the best value for K, so we need to try several values to find the best of them.
• A common heuristic is K = sqrt(N), where N is the total number of samples. The most commonly preferred value for K is 5.
• Euclidean distance, cosine similarity, Minkowski distance, correlation, and chi-square are among the measures used in K-NN for distance calculation.
• A very low value for K, such as K = 1 or K = 2, can be noisy and make the model sensitive to outliers.
• Large values of K smooth the decision but can cause difficulties of their own (a code sketch for choosing K follows).
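A hedged sketch of choosing K by cross-validation, trying odd values up to sqrt(N) per the rule of thumb above; the dataset is an illustrative assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, random_state=0)
k_max = int(np.sqrt(len(X)))                 # sqrt(N) rule of thumb

scores = {}
for k in range(1, k_max + 1, 2):             # odd k avoids ties in binary voting
    clf = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(clf, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```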
Pros & Cons of KNN

Advantages of KNN Algorithm Disadvantages of KNN Algorithm

• It is simple to implement. • Always needs to determine the


• It is robust to the noisy training value of K which may be
data complex some time.
• It can be more effective if the • The computation cost is high
training data is large. because of calculating the
distance between the data
points for all the training
samples.

113
Applications of K-NN

• K-NN-based isometric mapping (Isomap)
• K-NN-based recommendation systems
SVM – Support Vector Machine

• The Support Vector Machine, or SVM, is one of the most popular supervised learning algorithms.
• It is used for classification as well as regression problems.
• The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes, so that new data points can easily be placed in the correct category in the future.
• This best decision boundary is called a hyperplane.
• SVM chooses the extreme points/vectors that help in creating the hyperplane.
• These extreme cases are called support vectors, and hence the algorithm is termed the Support Vector Machine.
• Suppose we see a strange cat that also has some features of dogs.
• If we want a model that can accurately identify whether it is a cat or a dog, such a model can be created using the SVM algorithm.
• We first train our model with lots of images of cats and dogs, so that it can learn the different features of cats and dogs, and then we test it on this strange creature.
Types of SVM

• Linear SVM: used for linearly separable data. If a dataset can be classified into two classes using a single straight line, the data is termed linearly separable, and the classifier used is called a linear SVM classifier.
• Non-linear SVM: used for non-linearly separable data. If a dataset cannot be classified using a straight line, the data is termed non-linear, and the classifier used is called a non-linear SVM classifier.
How does SVM work?

• Suppose we have a dataset with two tags (green and blue), and the dataset has two features, x1 and x2.
• We want a classifier that can classify any pair (x1, x2) of coordinates as either green or blue.
• Since this is a 2-D space, we can separate these two classes with just a straight line.
• But there can be multiple lines that separate the classes.
• The SVM algorithm helps find the best line or decision boundary; this best boundary or region is called a hyperplane.
• The SVM algorithm finds the points of each class closest to the line; these points are called support vectors.
• The distance between the support vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin.
• The hyperplane with maximum margin is called the optimal hyperplane.
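A minimal linear SVM sketch with scikit-learn; the blob dataset is an illustrative assumption, and the C parameter controls the soft margin (smaller C gives a wider margin but tolerates more margin violations).

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.support_vectors_)        # the extreme points that define the hyperplane
print(clf.coef_, clf.intercept_)   # w and b of the separating line w·x + b = 0
```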
Non-Linear SVM:

• If data is linearly arranged, we can separate it with a straight line, but for non-linear data we cannot draw a single straight line.
• To separate such data points, we need to add one more dimension. For linear data we used the two dimensions x and y, so for non-linear data we add a third dimension z, calculated as:

z = x² + y²
• With the third dimension added, SVM can divide the datasets into classes with a linear boundary.
• Since we are now in 3-D space, that boundary looks like a plane parallel to the x-axis.
• If we convert it back to 2-D space at z = 1, the boundary becomes a circle: we get a circumference of radius 1 separating the non-linear data.
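A sketch of exactly this construction: concentric-circle data is not linearly separable in (x, y), but adding z = x² + y² lets a linear SVM separate it with a plane; in practice, SVC's nonlinear kernels do this implicitly via the kernel trick. The make_circles dataset and its parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Explicit third dimension z = x^2 + y^2, then a *linear* SVM in 3-D:
Z = np.c_[X, (X ** 2).sum(axis=1)]
print(SVC(kernel="linear").fit(Z, y).score(Z, y))   # ~1.0: now separable

# Equivalent shortcut: a nonlinear (RBF) kernel on the original 2-D data:
print(SVC(kernel="rbf").fit(X, y).score(X, y))      # also ~1.0
```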