ML Classification Trupesh Patel

Contents

❖ Logistic Regression
❖ Support Vector Machine
❖ K- Nearest neighbour (KNN)
Logistic regression
Introduction
❖ Logistic Regression is commonly used to estimate the probability that
an instance belongs to a particular class (e.g., what is the probability
that this email is spam?).
❖ If the estimated probability is greater than 50%, then the model predicts
that the instance belongs to that class (called the positive class, labeled
“1”), or else it predicts that it does not (i.e., it belongs to the negative
class, labeled “0”). This makes it a binary classifier.
Estimating Probabilities :
❖ Logistic Regression model computes a weighted sum of the input features
(plus a bias term), but instead of outputting the result directly like the Linear
Regression model does, it outputs the logistic of this result.

❖ The logistic—noted σ(·)—is a sigmoid function (i.e., S-shaped) that outputs a
number between 0 and 1. (Its inverse, the log-odds, is called the logit.)
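In equation form, the probability estimate and the logistic function are:

```latex
\hat{p} = h_{\theta}(\mathbf{x}) = \sigma\left(\theta^{T}\mathbf{x}\right),
\qquad
\sigma(t) = \frac{1}{1 + e^{-t}}
```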
Logistic Function
❖ Once the Logistic Regression model has estimated the probability p = hθ (x)
that an instance x belongs to the positive class, it can make its prediction ŷ
easily.

❖ Logistic Regression model prediction: ŷ = 0 if p̂ < 0.5, and ŷ = 1 if p̂ ≥ 0.5.


Training and Cost function
❖ The objective of training is to set the parameter vector θ so that the
model estimates high probabilities for positive instances (y = 1) and low
probabilities for negative instances (y = 0). This idea is captured by the
cost function for a single training instance x.

❖ Cost function of a single training instance
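Written out in its standard form, the cost of a single training instance is:

```latex
c(\theta) =
\begin{cases}
-\log(\hat{p}) & \text{if } y = 1,\\
-\log(1 - \hat{p}) & \text{if } y = 0.
\end{cases}
```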


Training and Cost function
❖ This cost function makes sense because – log(t) grows very large when t
approaches 0, so the cost will be large if the model estimates a
probability close to 0 for a positive instance, and it will also be very large
if the model estimates a probability close to 1 for a negative instance.
❖ On the other hand, – log(t) is close to 0 when t is close to 1, so the cost
will be close to 0 if the estimated probability is close to 0 for a negative
instance or close to 1 for a positive instance, which is precisely what we
want. The cost function over the whole training set is simply the average
cost over all training instances.
❖ It can be written in a single expression (as you can verify easily), called
the log loss.
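In a single expression, the log loss over m training instances is:

```latex
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}
\left[\, y^{(i)} \log\left(\hat{p}^{(i)}\right)
+ \left(1 - y^{(i)}\right)\log\left(1 - \hat{p}^{(i)}\right) \right]
```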
Training and Cost function
❖ The bad news is that there is no known closed-form equation to compute
the value of θ that minimizes this cost function (there is no equivalent of
the Normal Equation).
❖ But the good news is that this cost function is convex, so Gradient
Descent (or any other optimization algorithm) is guaranteed to find the
global minimum (if the learning rate is not too large and you wait long
enough).
❖ The partial derivative of the cost function with regard to the jth model
parameter θj is given by
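```latex
\frac{\partial}{\partial \theta_j} J(\theta)
= \frac{1}{m}\sum_{i=1}^{m}
\left( \sigma\left(\theta^{T}\mathbf{x}^{(i)}\right) - y^{(i)} \right) x_j^{(i)}
```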
Training and Cost function
❖ For each instance, this computes the prediction error and multiplies it by the
jth feature value, and then it computes the average over all training instances.
❖ Once you have the gradient vector containing all the partial derivatives, you
can use it in the Batch Gradient Descent algorithm. That’s it: you now know how
to train a Logistic Regression model.
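As a rough illustration (not from the slides), a minimal NumPy sketch of this training loop might look as follows; `X_b` is assumed to already contain a leading bias column of ones:

```python
import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

# X_b: (m, n) feature matrix with a bias column of ones (assumed prepared upstream)
# y:   (m,) vector of 0/1 labels
def train_logistic_regression(X_b, y, eta=0.1, n_iterations=1000):
    m, n = X_b.shape
    theta = np.zeros(n)                              # start from all-zero parameters
    for _ in range(n_iterations):
        p_hat = sigmoid(X_b @ theta)                 # estimated probability for every instance
        gradients = (1 / m) * X_b.T @ (p_hat - y)    # average of (prediction error * feature value)
        theta -= eta * gradients                     # one Batch Gradient Descent step
    return theta
```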
Decision Boundaries :
❖ Let’s use the iris dataset to illustrate Logistic Regression. This is a
famous dataset that contains the sepal and petal length and width of
150 iris flowers of three different species: Iris-Setosa, Iris-Versicolor, and
Iris-Virginica.
Decision Boundaries :
❖ Let’s try to build a classifier to detect the Iris-Virginica type based only
on the petal width feature. First let’s load the data:
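A minimal sketch of this step with scikit-learn’s iris loader:

```python
from sklearn import datasets

iris = datasets.load_iris()
X = iris["data"][:, 3:]                      # petal width (cm), a single feature
y = (iris["target"] == 2).astype(int)        # 1 if Iris-Virginica, else 0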
Decision Boundaries :
❖ Now let’s train a Logistic Regression model:
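Continuing the sketch above:

```python
from sklearn.linear_model import LogisticRegression

log_reg = LogisticRegression()
log_reg.fit(X, y)
```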
Decision Boundaries :
❖ Let’s look at the model’s estimated probabilities for flowers with petal
widths varying from 0 to 3 cm.
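A possible plotting sketch, reusing `log_reg` from the previous step:

```python
import numpy as np
import matplotlib.pyplot as plt

X_new = np.linspace(0, 3, 1000).reshape(-1, 1)       # petal widths from 0 to 3 cm
y_proba = log_reg.predict_proba(X_new)               # column 1: P(Iris-Virginica)

plt.plot(X_new, y_proba[:, 1], "g-", label="Iris-Virginica")
plt.plot(X_new, y_proba[:, 0], "b--", label="Not Iris-Virginica")
plt.xlabel("Petal width (cm)")
plt.ylabel("Estimated probability")
plt.legend()
plt.show()
```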
Decision Boundaries :
❖ The petal width of Iris-Virginica flowers (represented by triangles)
ranges from 1.4 cm to 2.5 cm, while the other iris flowers (represented by
squares) generally have a smaller petal width, ranging from 0.1 cm to 1.8
cm.

Estimated probabilities and decision boundary


Decision Boundaries :
❖ There is a decision boundary at around 1.6 cm where both probabilities are
equal to 50%: if the petal width is higher than 1.6 cm, the classifier will
predict that the flower is an Iris-Virginica, or else it will predict that it is
not (even if it is not very confident), for example:
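Continuing the earlier sketch, checking two petal widths on either side of the boundary:

```python
log_reg.predict([[1.7], [1.5]])   # with a boundary near 1.6 cm this should give array([1, 0])
```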
Decision Boundaries :
❖ The figure displays two features: petal width and length.
❖ Once trained, the Logistic Regression classifier can estimate the probability
that a new flower is an Iris-Virginica based on these two features.
❖ The dashed line represents the points where the model estimates a 50%
probability: this is the model’s decision boundary. Note that it is a linear
boundary. Each parallel line represents the points where the model outputs a
specific probability, from 15% (bottom left) to 90% (top right). All the flowers
beyond the top-right line have an over 90% chance of being Iris-Virginica
according to the model.
Overfitting :

❖ Overfitting is a modeling error that occurs when a function or model fits the
training set too closely, resulting in a drastic drop in performance on the test set.
Examples :

❖ Suppose we need to predict whether a student will land a job interview based on
their resume, and we train a model on a dataset of 20,000 resumes and their outcomes.

❖ When we try the model on the original dataset, it predicts outcomes with 98%
accuracy. That sounds amazing, but it does not hold up in reality.

❖ Now comes the bad news: when we run the model on a new dataset of resumes, we
only get 50% accuracy.

❖ Our model does not generalize well from the training data to unseen data. This
is known as overfitting, and it is a common problem in data science.

❖ In fact, overfitting occurs in the real world all the time. We need to handle it
to generalize the model.
Find overfitting :

❖ The primary challenge in machine learning and data science is that we can’t
evaluate model performance until we test it. So the first step in finding
overfitting is to split the data into a training set and a test set.

❖ The accuracy observed on both sets can then be compared to conclude whether
overfitting is present. If the model performs much better on the training set
than on the test set, it is likely overfitting. For example, it would be a big
alert if our model saw 99% accuracy on the training set but only 50% accuracy on
the test set.
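A minimal sketch of this check with scikit-learn; the iris data and the Logistic Regression estimator here are illustrative stand-ins, not part of the slides:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)            # stand-in dataset; any features/labels work here
model = LogisticRegression(max_iter=1000)    # stand-in model; any classifier works here

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model.fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
# A much higher training accuracy than test accuracy (e.g. 0.99 vs 0.50) signals overfitting.
```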
Prevent overfitting :

❖ Training with more data

❖ Data augmentation

❖ Cross validation
❖ Feature selection
❖ Regularization
Regularization :

❖ Keep all the features, but reduce the magnitude/value of the parameters θj to
make their values smaller.

❖ Works well when we have a lot of features, each of which contributes a bit to
predicting y.
Regularization :

❖ Modify the cost function by adding an extra regularization term at the end to shrink every single
parameter (e.g. close to 0), as shown below:
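Under the common convention for regularized logistic regression (the slide’s own formula is not reproduced here), the modified cost is:

```latex
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}
\left[\, y^{(i)} \log\left(\hat{p}^{(i)}\right)
+ \left(1 - y^{(i)}\right)\log\left(1 - \hat{p}^{(i)}\right) \right]
+ \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^{2}
```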

❖ lambda (the regularization parameter) controls the tradeoff between two goals:

❖ the original cost term (1st goal): fit the training data well

❖ the extra lambda term (2nd goal): keep the parameters small to avoid overfitting
❖ If all parameters (θ) are close to 0, the prediction will be close to 0 → the model becomes a nearly
flat straight line that fails to fit the features well → underfitting

❖ To sum up, if lambda is chosen to be too large, it may smooth out the function too much and
cause underfitting.
Support vector
machine
Support vector machine
❖ A Support Vector Machine (SVM) is a very powerful and versatile Machine
Learning model, capable of performing linear or nonlinear classification,
regression, and even outlier detection.

❖ SVMs are particularly well suited for classification of complex but small- or
medium-sized datasets.
❖ Applications: face detection, text and hypertext categorization,
classification of images, handwriting recognition.
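As an illustrative sketch (not from the slides), a linear SVM classifier for the iris data might look like this in scikit-learn; the feature-scaling step is an assumption of good practice rather than something stated here:

```python
import numpy as np
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X = iris["data"][:, (2, 3)]                    # petal length, petal width
y = (iris["target"] == 2).astype(np.float64)   # Iris-Virginica

svm_clf = Pipeline([
    ("scaler", StandardScaler()),              # SVMs are sensitive to feature scales
    ("linear_svc", LinearSVC(C=1, loss="hinge")),
])
svm_clf.fit(X, y)
svm_clf.predict([[5.5, 1.7]])                  # predicts whether this flower is Iris-Virginica
```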
Linear Classifiers
❖ A linear classifier has the form f(x, w, b) = sign(w . x - b), where +1 denotes
one class and -1 the other.
❖ Figure: the same labeled dataset drawn with several different candidate
separating lines. How would you classify this data? Any of these would be
fine.. but which is best?
Linear Classifiers
❖ Define the margin of a linear classifier as the width that the boundary could
be increased by before hitting a datapoint.
❖ The maximum margin linear classifier is the linear classifier with the, um,
maximum margin. This is the simplest kind of SVM (called an LSVM).
❖ Support Vectors are those datapoints that the margin pushes up against.
Why max margin?
1. Intuitively this feels safest.
2. If we’ve made a small error in the location of the boundary (it’s been jolted
in its perpendicular direction), this gives us the least chance of causing a
misclassification.
3. LOOCV is easy since the model is immune to removal of any non-support-vector
datapoints.
4. There’s some theory (using VC dimension) that is related to (but not the same
as) the proposition that this is a good thing.
5. Empirically it works very very well.
Specifying a line and margin
• Figure: the classifier boundary (w . x + b = 0) with the Plus-Plane
(w . x + b = +1) and the Minus-Plane (w . x + b = -1) on either side, separating
the “Predict Class = +1” zone from the “Predict Class = -1” zone.
• How do we represent this mathematically, in m input dimensions?
• Plus-plane = { x : w . x + b = +1 }
• Minus-plane = { x : w . x + b = -1 }
• Classify as.. +1 if w . x + b >= 1; -1 if w . x + b <= -1; Universe explodes
if -1 < w . x + b < 1
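A tiny sketch of this decision rule (purely illustrative; `w` and `b` are assumed to come from an already trained linear SVM):

```python
import numpy as np

def classify(x, w, b):
    score = np.dot(w, x) + b
    if score >= 1:
        return +1        # confidently on the plus side of the margin
    elif score <= -1:
        return -1        # confidently on the minus side of the margin
    else:
        return 0         # inside the margin (the slide's "Universe explodes" joke)
```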
K-nearest Neighbor(KNN)
The k-nearest neighbors classifier (kNN) is a non-parametric supervised
machine learning algorithm. It’s distance-based: it classifies objects based
on their proximate neighbors’ classes.

Non-parametric means that there is no fine-tuning of parameters in the
training step of the model. Although k can be considered an algorithm
parameter in some sense, it’s actually a hyperparameter. It’s selected
manually and remains fixed at both training and inference time.
K-nearest Neighbor(KNN)
The k-nearest neighbors algorithm is also non-linear. In contrast to simpler
models like linear regression, it will work well with data in which the
relationship between the independent variable (x) and the dependent
variable (y) is not a straight line.
What is k in k-nearest neighbors?
The parameter k in kNN refers to the number of labeled points (neighbors)
considered for classification. The value of k indicates the number of these
points used to determine the result. Our task is to calculate the distances and
identify which category the points closest to our unknown entity belong to.
K-nearest Neighbor(KNN)
How does it work?
Given a point whose class we do not know, we can try to understand which
points in our feature space are closest to it.
These points are the k-nearest neighbors. Since similar things occupy
similar places in feature space, it’s very likely that the point belongs to the
same class as its neighbors.
Based on that, it’s possible to classify a new point as belonging to one class
or another.
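A minimal sketch with scikit-learn; the choice of k = 3 and the iris data are illustrative assumptions, not from the slides:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

knn = KNeighborsClassifier(n_neighbors=3)   # k is a hyperparameter, fixed before training
knn.fit(X_train, y_train)                   # "training" just stores the labeled points
print(knn.score(X_test, y_test))            # accuracy of the neighbor-vote classification
```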
