ML Classification Trupesh Patel
❖ Logistic Regression
❖ Support Vector Machine
❖ K-Nearest Neighbour (KNN)
Logistic regression
Introduction
❖ Logistic Regression is commonly used to estimate the probability that
an instance belongs to a particular class (e.g., what is the probability
that this email is spam?).
❖ If the estimated probability is greater than 50%, then the model predicts
that the instance belongs to that class (called the positive class, labeled
“1”), or else it predicts that it does not (i.e., it belongs to the negative
class, labeled “0”). This makes it a binary classifier.
Estimating Probabilities :
❖ A Logistic Regression model computes a weighted sum of the input features
(plus a bias term), but instead of outputting the result directly like the Linear
Regression model does, it outputs the logistic of this result.
❖ The logistic, noted σ(·), is a sigmoid function (i.e., S-shaped) that outputs a
number between 0 and 1.
Logistic Function
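A sketch of the corresponding formulas in standard notation (θ is the model's parameter vector and x an instance's feature vector, matching the hθ(x) notation used on these slides):

```latex
\hat{p} = h_{\theta}(\mathbf{x}) = \sigma\!\left(\boldsymbol{\theta}^{\top}\mathbf{x}\right),
\qquad
\sigma(t) = \frac{1}{1 + e^{-t}}
```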
❖ Once the Logistic Regression model has estimated the probability p = hθ (x)
that an instance x belongs to the positive class, it can make its prediction ŷ
easily.
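A minimal sketch of this in scikit-learn (the tiny one-feature dataset below is a placeholder, not from the slides):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder binary dataset: label 1 = positive class, 0 = negative class.
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

new_instance = np.array([[2.0]])
p_hat = model.predict_proba(new_instance)[0, 1]  # estimated probability of the positive class
y_hat = model.predict(new_instance)[0]           # 1 if that probability exceeds 50%, else 0

print(f"p_hat = {p_hat:.2f}, predicted class = {y_hat}")
```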
❖ Overfitting is a modeling error that occurs when a function or model fits the
training set too closely, resulting in a drastic drop in performance on the test set.
Examples :
❖ Suppose we need to predict whether a student will land a job interview based on their
resume, and assume we train a model on a dataset of 20,000 resumes and their outcomes.
❖ We then try the model on the original dataset and it predicts outcomes with 98%
accuracy. Amazing! But not in reality.
❖ Now comes the bad news: when we run the model on a new dataset of resumes, we
only get 50% accuracy.
❖ Our model does not generalize well from the training data to unseen data. This is
known as Overfitting, and it is a common problem in Data Science.
❖ In fact, Overfitting occurs in the real world all the time, and we need to handle it so
that the model generalizes.
Find overfitting :
❖ The primary challenge in machine learning and data science is that we cannot
evaluate model performance until we test it on unseen data. So the first step to finding
Overfitting is to split the data into a Training and a Testing set and compare the model's
performance on each (see the sketch after this list). Other useful techniques include:
❖ Data augmentation
❖ Cross validation
❖ Feature selection
❖ Regularization
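A minimal sketch of that first step (comparing training and test accuracy after a split) in scikit-learn; the dataset and the deliberately high-variance decision tree are placeholder choices, and cross-validation from the list above is shown at the end:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# An unpruned decision tree can memorize the training set.
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

print("Train accuracy:", model.score(X_train, y_train))  # typically close to 1.0
print("Test accuracy :", model.score(X_test, y_test))    # typically lower; a large gap signals overfitting

# Cross-validation gives a more robust estimate of generalization performance.
print("5-fold CV accuracy:", cross_val_score(model, X_train, y_train, cv=5).mean())
```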
Regularization :
❖ Keep all the features, but reduce the magnitude/value of the parameters (θj) to
make them smaller.
❖ This works well when we have a lot of features, each of which contributes a bit to
predicting y.
❖ Modify the cost function by adding an extra regularization term at the end to shrink every
single parameter (e.g. close to 0), as sketched after this list.
❖ The extra λ (lambda) term has a 2nd goal: keep the parameters small to avoid overfitting.
❖ If all parameters (θ) are close to 0, the result will be close to 0: this generates a flat
straight line that fails to fit the features well → underfit.
❖ To sum up, if λ is chosen to be too large, it may smooth out the function too much and
cause underfitting.
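As a sketch, the regularized cost function in the squared-error form used by many introductory courses (m training examples, parameters θ1..θn, λ the regularization strength; the bias term θ0 is conventionally left unregularized):

```latex
J(\theta) = \frac{1}{2m}\left[\,\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)^{2}
            + \lambda \sum_{j=1}^{n} \theta_{j}^{2}\right]
```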
Support vector machine
❖ A Support Vector Machine (SVM) is a very powerful and versatile Machine
Learning model, capable of performing linear or nonlinear classification,
regression, and even outlier detection.
❖ SVMs are particularly well suited for classification of complex but small- or
medium-sized datasets.
❖ Applications : face detection, text and hypertext categorization, classification of
images, handwriting recognition.
Linear Classifiers
❖ A linear classifier has the form f(x, w, b) = sign(w . x - b).
❖ Many different boundaries would classify the training data correctly; any of them would
be fine, but which is best?
❖ Define the margin of a linear classifier as the width that the boundary could be
increased by before hitting a datapoint.
❖ The maximum margin linear classifier is the linear classifier with the maximum margin.
This is the simplest kind of SVM (called an LSVM).
❖ Support Vectors are those data points that the margin pushes up against.
Why max margin?
1. Intuitively this feels safest.
2. If we've made a small error in the location of the boundary (it's been jolted in its
perpendicular direction), this gives us the least chance of causing a misclassification.
3. LOOCV is easy, since the model is immune to removal of any non-support-vector
datapoints.
4. There's some theory (using VC dimension) that is related to (but not the same as) the
proposition that this is a good thing.
5. Empirically it works very, very well.
Specifying a line and margin
❖ Plus-plane = { x : w . x + b = +1 }
❖ Minus-plane = { x : w . x + b = -1 }
❖ The classifier boundary { x : w . x + b = 0 } lies between the two planes, with a
"Predict Class = +1" zone on one side and a "Predict Class = -1" zone on the other.
❖ Classify as +1 if w . x + b >= 1 and as -1 if w . x + b <= -1; no training point should
fall in the margin zone -1 < w . x + b < 1.
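A minimal sketch of this decision rule in scikit-learn (the toy 2-D dataset is a placeholder; LinearSVC's coef_ and intercept_ play the roles of w and b, and scikit-learn uses the w . x + b convention shown on this slide):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Placeholder linearly separable 2-D dataset.
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y = np.array([-1, -1, -1, 1, 1, 1])

svm = LinearSVC(C=1.0)
svm.fit(X, y)

w, b = svm.coef_[0], svm.intercept_[0]
scores = X @ w + b                                      # w . x + b for each point
print("Decision values:", np.round(scores, 2))
print("Predictions    :", np.sign(scores).astype(int))  # matches svm.predict(X) on this data
```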
K-Nearest Neighbor (KNN)
The k-nearest neighbors classifier (kNN) is a non-parametric supervised
machine learning algorithm. It’s distance-based: it classifies objects based
on their proximate neighbors’ classes.
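A minimal sketch with scikit-learn's KNeighborsClassifier (the iris dataset and k = 5 are arbitrary placeholder choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Each test point is classified by a majority vote among its 5 nearest
# training points (Euclidean distance by default).
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

print("Test accuracy:", knn.score(X_test, y_test))
```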