MLS 2 - Classification

Uploaded by

golgothgolgoth039

Classification

This file is meant for personal use by [email protected] only.

2. Gaussian Models

3. Logistic Regression

4. Performance Assessments

5. K-Nearest Neighbors

Sharing or publishing the contents in part or full is liable for legal action.
Proprietary content. © Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited. 2
Discussion questions
1. Why do we use logistic regression?
2. What is a confusion matrix and how can you interpret it?
3. Why is accuracy not always a good performance measure?
4. How to choose the threshold using the Precision-Recall curve?
5. Is there a performance measure that can cover both Precision and Recall?
6. How does the K-NN algorithm work? How to identify K in this algorithm?

Why do we use logistic regression?
● Logistic Regression is a supervised learning algorithm used for classification problems, i.e., problems where the dependent variable is categorical.
● In logistic regression, we apply the sigmoid function to a linear combination of the features to estimate the probability that the dependent variable belongs to the positive class.
● Real-life applications of logistic regression include churn prediction, spam detection, etc.
● The slide's image shows how logistic regression differs from linear regression in fitting the model.
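
As a minimal sketch of the sigmoid step described above, the Python snippet below turns a linear score into a probability and then into a class label. The coefficient values, the 0.5 threshold, and the helper name `predict_proba` are illustrative assumptions, not values from the slides.

```python
import math

def sigmoid(z):
    """Map any real number z to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Estimated probability of the positive class for one observation."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical coefficients, chosen only for illustration.
weights = [0.8, -0.5]
bias = -0.2

p = predict_proba([2.0, 1.0], weights, bias)  # linear score z is about 0.9
label = 1 if p >= 0.5 else 0                  # 0.5 is a common default threshold
print(round(p, 3), label)
```

Because the sigmoid output always lies between 0 and 1, it can be read as a probability, which a straight-line fit cannot guarantee.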

Confusion matrix
It is used to measure the performance of a classification algorithm.

                          Actual Positive (1)    Actual Negative (0)
Predicted Positive (1)    TP                     FP
Predicted Negative (0)    FN                     TN

It can be used to calculate the following metrics:
1. Accuracy: Proportion of correctly predicted results among the total number of observations

Accuracy = (TP+TN)/(TP+FP+FN+TN)

2. Precision: Proportion of true positives to all the predicted positives, i.e., how valid the predictions are

Precision = (TP)/(TP+FP)

3. Recall: Proportion of true positives to all the actual positives, i.e., how complete the predictions are

Recall = (TP)/(TP+FN)
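
The three formulas above can be sketched in Python as follows. The label lists are made-up illustration data, and returning 0.0 when a denominator is zero is an assumed convention rather than something the slides specify.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels coded 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def precision(tp, fp):
    # Assumed convention: 0.0 when nothing was predicted positive.
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    # Assumed convention: 0.0 when there are no actual positives.
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical actual and predicted labels.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)
print(accuracy(tp, fp, fn, tn), precision(tp, fp), recall(tp, fn))
```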
Why accuracy is not always a good performance measure
Accuracy is simply the overall percentage of correct predictions, and it can be high even for a useless model.

Example:
Total patients: 100
Cancer rate: 2%
Patients having cancer: 2
Model: predicts that no one has cancer
Accuracy: 98%
Result: misses the 2 critical patients who have cancer

● Here, accuracy will be 98%, even if we simply predict that no patient has cancer.
● In this case, Recall should be used as the measure of model performance; high recall implies fewer false negatives.
● Fewer false negatives imply a lower chance of 'missing' a cancer patient, i.e., predicting that a cancer patient does not have cancer.
● This is where we need other metrics to evaluate model performance.
● The other important metrics are Recall and Precision:
○ Recall - What % of actual 1s did the model capture in its predictions?
○ Precision - What % of predicted 1s are actual 1s?
● There is a tradeoff - as you try to increase the Recall, the Precision will generally reduce, and vice versa.
● This tradeoff can be used to figure out the right threshold to use for the model.
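
The 100-patient example can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are illustrative.

```python
# 100 patients, 2 of whom have cancer (label 1); the rest do not (label 0).
y_true = [1] * 2 + [0] * 98

# A "model" that predicts that no one has cancer.
y_pred = [0] * 100

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
recall = tp / (tp + fn)

print(accuracy)  # 98 of 100 predictions are correct
print(recall)    # none of the 2 cancer patients is detected
```

Accuracy looks excellent while recall is zero, which is why recall is the more informative measure in this setting.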

How to choose thresholds using the Precision-Recall curve?
● The Precision-Recall curve is a useful measure of the success of prediction when the classes are imbalanced.
● The curve shows the tradeoff between the precision and the recall for different thresholds.
● It can be used to select an optimal threshold as required to improve the model performance.
● In the plotted example, the precision and the recall are almost equal when the threshold is around 0.4.
● If we want a higher precision, we can usually increase the threshold.
● If we want a higher recall, we can decrease the threshold.
Choosing different thresholds can completely change the model's performance. It is important to think about what constitutes the 'sweet spot'.
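
To make the threshold tradeoff concrete, here is a small Python sketch that sweeps a few thresholds over predicted probabilities. The labels and scores are invented so that precision and recall happen to meet at 0.4; they are not the data behind the slide's curve, and treating precision as 1.0 when nothing is predicted positive is an assumed convention.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are predicted positive."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Assumed conventions for empty denominators.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical actual labels and predicted probabilities.
y_true = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]
scores = [0.05, 0.20, 0.30, 0.35, 0.45, 0.55, 0.60, 0.70, 0.85, 0.95]

curve = {}
for threshold in (0.2, 0.4, 0.6, 0.8):
    curve[threshold] = precision_recall_at(y_true, scores, threshold)
    p, r = curve[threshold]
    print(f"threshold={threshold:.1f}  precision={p:.3f}  recall={r:.3f}")
```

On this toy data, raising the threshold trades recall for precision, so the 'sweet spot' depends on which kind of error is more costly.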


Is there a performance measure that can cover both Precision and Recall?
● F1 Score is a measure that takes into account both Precision and Recall.
● The F1 Score is the harmonic mean of Precision and Recall. Therefore, this score takes both false positives and false negatives into account.

F1 Score = 2*(Precision*Recall)/(Precision+Recall)

● The highest possible value of the F1 score is 1, indicating perfect precision and recall, and the lowest possible value is 0, which occurs when either precision or recall is 0.
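
A short Python sketch of the harmonic mean follows. The precision and recall values are made up for illustration, and returning 0.0 when both inputs are zero is an assumed convention.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention when both are zero
    return 2 * precision * recall / (precision + recall)

# Hypothetical model with perfect precision but poor recall.
precision, recall = 1.0, 0.1

arithmetic_mean = (precision + recall) / 2
f1 = f1_score(precision, recall)

print(round(arithmetic_mean, 3))  # the simple average still looks acceptable
print(round(f1, 3))               # the harmonic mean is pulled toward the weaker metric
```

The harmonic mean stays low unless both precision and recall are high, which is why F1 is used instead of a simple average.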

K-Nearest Neighbours (K-NN) algorithm
This algorithm uses features from the training data to predict the values of new data points: a new data point is assigned a value based on how similar it is to the data points in the training set. It works in the following steps:
● Step 1: Choose the value of K, i.e., the number of nearest data points to consider. K can be any positive integer.
● Step 2: For each point in the test data, do the following:
○ Calculate the distance between the test point and each training point using a distance method such as Euclidean or Manhattan. The most commonly used method is the Euclidean distance.
○ Sort the training points by distance in ascending order.
○ Choose the top K rows from the sorted array.
○ Assign to the test point the most frequent class among these K rows.
● Step 3: Repeat this process until all the test points are classified.
To identify K, we try different values of K and plot the test error against them. The lower the test error, the better the value of K.
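
The steps above can be sketched in plain Python as follows. The toy points, the class names "A" and "B", and the helper names are assumptions made for illustration; a real project would more likely use a library implementation.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, test_point, k):
    """Classify one test point by majority vote among its k nearest training points."""
    distances = [(euclidean(x, test_point), label)
                 for x, label in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])          # ascending by distance
    top_k = [label for _, label in distances[:k]]     # top K rows
    return Counter(top_k).most_common(1)[0][0]        # most frequent class

def error_rate(train_X, train_y, test_X, test_y, k):
    """Fraction of test points that are misclassified for a given k."""
    wrong = sum(1 for x, y in zip(test_X, test_y)
                if knn_predict(train_X, train_y, x, k) != y)
    return wrong / len(test_X)

# Hypothetical toy data: two well-separated classes.
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_y = ["A", "A", "A", "B", "B", "B"]
test_X = [(1.8, 1.6), (6.2, 6.4)]
test_y = ["A", "B"]

predictions = [knn_predict(train_X, train_y, p, k=3) for p in test_X]
errors = {k: error_rate(train_X, train_y, test_X, test_y, k) for k in (1, 3, 5)}
print(predictions, errors)
```

On real data the error would vary with K, and plotting `errors` against K is one way to pick a value. With two classes, an odd K avoids tied votes.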
Case Study


Linear Discriminant Analysis (LDA) | Quadratic Discriminant Analysis (QDA)

It is a linear classifier but much less flexible than QDA | It is a non-linear classifier but more flexible than LDA

It assumes a common covariance matrix for all the classes | It assumes that each class has its own covariance matrix

It is preferred when the training set only has a few observations | It is preferred when the training set is very large

It can be used as a dimensionality reduction technique | It cannot be used as a dimensionality reduction technique
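
One way to see why the shared-covariance assumption suits small training sets is to count how many covariance parameters each method must estimate. The Python sketch below does only that counting; the function names and the example of 10 features and 3 classes are illustrative assumptions.

```python
def covariance_parameters(n_features):
    """Free parameters in one symmetric n_features x n_features covariance matrix."""
    return n_features * (n_features + 1) // 2

def lda_covariance_parameters(n_features, n_classes):
    """LDA estimates a single covariance matrix shared by all classes."""
    return covariance_parameters(n_features)

def qda_covariance_parameters(n_features, n_classes):
    """QDA estimates a separate covariance matrix for each class."""
    return n_classes * covariance_parameters(n_features)

# Hypothetical problem size.
n_features, n_classes = 10, 3
print(lda_covariance_parameters(n_features, n_classes))
print(qda_covariance_parameters(n_features, n_classes))
```

The extra parameters make QDA more flexible, but they also require more observations to be estimated reliably.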
