
Supervised Learning

Logistic Regression and SVMs

Rayid Ghani

Slides liberally borrowed and customized from lots of excellent online sources
Rayid Ghani @rayidghani
Types of Learning

• Unsupervised: Clustering, Anomaly Detection, PCA

• “Weakly” supervised

• Fully supervised: Classification (binary, multi-class, hierarchical, sequential), Regression


Supervised learning framework

y = f(x)

  y: output / dependent variable
  f: learned function
  x: features / variables / inputs / predictors / independent variables

• Training: given a training set of labeled examples {(x1, y1), …, (xN, yN)}, estimate the prediction function f that minimizes future generalization error

• Testing: apply f to a new test example x and output the predicted value y = f(x)
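For concreteness, here is a minimal sketch of this train/test workflow using scikit-learn; the synthetic dataset and the choice of logistic regression as f are illustrative assumptions, not part of the slides:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# A training set of labeled examples {(x1, y1), ..., (xN, yN)}
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Training: estimate the prediction function f from the labeled examples
f = LogisticRegression().fit(X_train, y_train)

# Testing: apply f to new examples and compare predictions to held-out labels
y_pred = f.predict(X_test)
print(accuracy_score(y_test, y_pred))
```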



Methods
• Nearest neighbor
• Decision Trees
• Logistic Regression
• Support Vector Machines
(We’ll cover these two in class today)
• Bayes Classifier
• Ensembles
– Bagging
– Boosting
– Random Forests
• Neural Networks
What we covered last class
• Nearest Neighbor
• Decision Trees



Logistic Regression
• Maps x to a 0–1 (probability) space via the logistic (sigmoid) function



Logistic Regression
• Goal: estimate parameters (βi) that model the log-odds log(P / (1 − P))

log(P / (1 − P)) = β0 + β1x1 + β2x2 + … + βnxn
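A minimal sketch (assuming NumPy and invented coefficient values) of how the fitted log-odds translate back into a probability:

```python
import numpy as np

# Invented coefficients: log(P / (1 - P)) = b0 + b1*x1 + b2*x2
b0, b = -1.0, np.array([0.8, -0.5])
x = np.array([2.0, 1.0])

log_odds = b0 + b @ x                 # linear in the features
p = 1.0 / (1.0 + np.exp(-log_odds))   # inverse of the logit: the sigmoid
print(log_odds, p)                    # p is the predicted P(y = 1 | x)
```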



1-Dimensional Decision Boundary



2-D Decision Boundary



What are we optimizing for?
• Goal is to predict 1 for class 1, 0 for class 0

• Find βi to maximize the likelihood of the observed labels given the features
How do you train a Logistic Regression model?

• (Stochastic) Gradient Descent

– For each training data point:
• Predict using the current coefficients
• Update the coefficients using the prediction error
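A minimal NumPy sketch of this stochastic update; the toy data, learning rate, and number of epochs are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                            # toy features
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)   # toy 0/1 labels

w, b, lr = np.zeros(3), 0.0, 0.1
for epoch in range(20):
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + np.exp(-(w @ xi + b)))   # predict with current coefficients
        error = yi - p                            # prediction error
        w += lr * error * xi                      # gradient step on the log-likelihood
        b += lr * error
```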



Interpreting Logistic Regression coefficients

• A coefficient βi is the effect of the independent variable on the “odds ratio”: a one-unit increase in xi multiplies the odds by e^βi

• Equivalently, βi is the expected change in the log-odds ln(p / (1 − p)) for a one-unit increase in xi

• What does that mean in terms of a change in probability?
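A small worked sketch, with an invented coefficient value, of turning a coefficient into an odds ratio and then into a probability change:

```python
import numpy as np

beta = 0.7          # invented coefficient for one feature
odds_ratio = np.exp(beta)
print(odds_ratio)   # ~2.01: a one-unit increase roughly doubles the odds

# The change in probability depends on where you start (the sigmoid is nonlinear)
p0 = 0.30                               # baseline probability
odds1 = (p0 / (1 - p0)) * odds_ratio    # multiply the odds, not the probability
p1 = odds1 / (1 + odds1)
print(p0, "->", round(p1, 3))           # 0.30 -> ~0.463
```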



How does Logistic Regression
control underfitting and
overfitting?



Regularization
• A way to reduce overfitting in machine learning models by adding penalties for increasing model complexity

• Logistic Regression:
– Early stopping
– Add a penalty term to the cost function based on non-zero coefficients
• L1 (Lasso): penalty proportional to the sum of the absolute values of the coefficients, Σ|βi|
• L2 (Ridge): penalty proportional to the sum of the squared coefficients, Σβi²
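A minimal scikit-learn sketch contrasting the two penalties; the solver and C value are illustrative assumptions (in scikit-learn, smaller C means a stronger penalty):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=20, n_informative=4, random_state=0)

# L1 (Lasso-style) penalty tends to drive many coefficients exactly to zero
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)

# L2 (Ridge-style) penalty shrinks coefficients toward zero but keeps them nonzero
l2 = LogisticRegression(penalty="l2", C=0.1).fit(X, y)

print((l1.coef_ == 0).sum(), "coefficients zeroed by L1")
print((l2.coef_ == 0).sum(), "coefficients zeroed by L2")
```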
Regularization
• What is the impact of L1 (Least Absolute Shrinkage and Selection Operator)?

• What does L2 do?



Data Preparation?
• Do we have to do any data prep for logistic regression?
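A hedged sketch of typical preparation steps (scaling numeric features and encoding categoricals before fitting); the column names here are hypothetical, not from the slides:

```python
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression

# Hypothetical columns: "age" and "income" are numeric, "city" is categorical
prep = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),               # scaling helps regularized / gradient-based training
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),  # logistic regression needs numeric inputs
])
model = Pipeline([("prep", prep), ("clf", LogisticRegression())])
# model.fit(df[["age", "income", "city"]], df["label"])  # df is a hypothetical DataFrame
```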



More about Logistic Regression
• http://nbviewer.jupyter.org/github/justmarkham/DAT8/blob/master/notebooks/12_logistic_regression.ipynb

• A little deeper into Logistic Regression:
https://www.youtube.com/watch?v=31Q5FGRnxt4&list=PL5-da3qGB5IC4vaDba5ClatUmFppXLAhE

• Visualization:
https://florianhartl.com/logistic-regression-geom
Linear Classifiers



Which is the “best” separator?



Support Vector Machines
The goal of a support vector machine is to find the optimal separating hyperplane, the one which maximizes the margin of the training data.



wTx + b = 0   (the separating hyperplane)
wTx + b > 0   (one side: predict +1)
wTx + b < 0   (the other side: predict −1)

y = f(x) = sign(wTx + b)



Linear SVMs Mathematically
• We can formulate the quadratic optimization problem:

Find w and b such that
the margin ρ = 2 / ||w|| is maximized while classifying all the data correctly,
s.t. for all (xi, yi), i = 1..n : yi(wTxi + b) ≥ 1

Which can be reformulated as:

Find w and b such that
||w||² is minimized,
s.t. for all (xi, yi), i = 1..n : yi(wTxi + b) ≥ 1
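A minimal scikit-learn sketch of fitting a (near) hard-margin linear SVM and reading off w and b; the synthetic data and the very large C value are illustrative assumptions used to approximate the hard-margin problem:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters, so a hard margin is feasible
X, y = make_blobs(n_samples=100, centers=2, cluster_std=0.8, random_state=0)

svm = SVC(kernel="linear", C=1e6).fit(X, y)   # very large C ~ hard margin
w, b = svm.coef_[0], svm.intercept_[0]        # the separating hyperplane w.x + b = 0
print("w =", w, "b =", b)
print("margin width = 2 / ||w|| =", 2 / (w @ w) ** 0.5)
print("support vectors:", len(svm.support_))
```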



Solving constrained optimization
• Use Lagrange Multipliers

• Detailed Math:
http://www.engr.mun.ca/~baxter/Publications/LagrangeForSVMs.pdf



Slack Variables

If C is large, errors cost a lot (hard margin). If C is small, errors are tolerated (softer margin).
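A quick sketch of the effect of C, comparing the margin width 2/||w|| and the number of support vectors for a small vs. a large penalty; the overlapping synthetic data and the C values are illustrative assumptions:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Slightly overlapping clusters, so some slack is needed
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=1)

for C in (0.01, 100.0):   # small C: errors tolerated; large C: errors cost a lot
    svm = SVC(kernel="linear", C=C).fit(X, y)
    w = svm.coef_[0]
    print(f"C={C}: margin width = {2 / (w @ w) ** 0.5:.3f}, "
          f"support vectors = {len(svm.support_)}")
```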



Nonlinear SVMs
• Datasets that are linearly separable work out great

• But what if the dataset is just too hard?

• We can map it to a higher-dimensional space, e.g. x → (x, x²)
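A minimal sketch of that idea: 1-D data that no single threshold can separate becomes linearly separable after adding the x² feature. The data here is an illustrative assumption:

```python
import numpy as np
from sklearn.svm import SVC

# 1-D data: class 1 sits on both sides of class 0, so no single threshold separates it
x = np.linspace(-3, 3, 100)
y = (np.abs(x) > 1.5).astype(int)

linear_1d = SVC(kernel="linear").fit(x.reshape(-1, 1), y)
print("1-D linear accuracy:", linear_1d.score(x.reshape(-1, 1), y))   # imperfect

# Map each point to a higher-dimensional space: x -> (x, x^2)
X_mapped = np.column_stack([x, x ** 2])
linear_2d = SVC(kernel="linear").fit(X_mapped, y)
print("mapped linear accuracy:", linear_2d.score(X_mapped, y))        # 1.0
```

Kernels (polynomial, RBF) achieve the same effect implicitly, without computing the mapped features explicitly.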
Logit vs Linear SVMs
• Logistic regression maximizes the conditional
likelihood of the training data

• Linear SVM maximizes the margin



Using SVMs
• Kernel: Linear, Polynomial, RBF
• Normalizing features
• Other Parameters
– C: penalty for errors

• No probability estimates (SVMs output margins/scores rather than calibrated probabilities)



Probabilistic Classifiers
• We want to estimate P(class | data) for every class (and often pick the class with the max P(class | data))



Bayes’ Rule

• For a data point d and a class c

P(c | d) = P(d | c) P(c) / P(d)
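A tiny worked sketch with invented numbers, just to make the pieces concrete:

```python
# Invented numbers: P(c) is the class prior, P(d | c) the likelihood of the data point
p_c = 0.3            # P(c)
p_d_given_c = 0.8    # P(d | c)
p_d = 0.5            # P(d): total probability of the data point over all classes

p_c_given_d = p_d_given_c * p_c / p_d
print(p_c_given_d)   # 0.48
```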
Naïve Bayes Classifier

cMAP = argmax P(c | d)                    MAP is “maximum a posteriori” = the most likely class
        c ∈ C

     = argmax P(d | c) P(c) / P(d)        Bayes Rule
        c ∈ C

     = argmax P(d | c) P(c)               Dropping the denominator (it is the same for every class)
        c ∈ C
Naïve Bayes Classifier

cMAP = argmax P(d | c) P(c)
        c ∈ C

     = argmax P(x1, x2, …, xn | c) P(c)   Data point d represented as features x1..xn
        c ∈ C



Naïve Bayes Classifier

cMAP = argmax P(x1, x2, …, xn | c) P(c)
        c ∈ C

P(x1, x2, …, xn | c): O(|X|^n · |C|) parameters; could only be estimated if a very, very large number of training examples was available.

P(c): how often does this class occur? We can just count the relative frequencies in a data set.


Naïve Bayes - Independence Assumptions

P(x1, x2, …, xn | c)

• Conditional Independence: assume the feature probabilities P(xi | c) are independent given the class c:

P(x1, …, xn | c) = P(x1 | c) · P(x2 | c) · P(x3 | c) · … · P(xn | c)



Naïve Bayes Classifier

cMAP = argmax P(x1, x2, …, xn | c) P(c)
        c ∈ C

cNB = argmax P(c) ∏ P(x | c)
       c ∈ C    x ∈ X



Naïve Bayes Example
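The worked example from the original slide is not preserved in this export; below is a minimal stand-in sketch of the counting approach, using invented training data (no smoothing, so unseen feature values give zero probability):

```python
from collections import Counter, defaultdict

# Invented training data: (features, class)
train = [
    (["sunny", "hot"], "no"),
    (["sunny", "mild"], "yes"),
    (["rainy", "mild"], "yes"),
    (["rainy", "hot"], "no"),
    (["sunny", "mild"], "yes"),
]

# Estimate P(c) and P(x | c) by counting relative frequencies
class_counts = Counter(c for _, c in train)
feature_counts = defaultdict(Counter)
for xs, c in train:
    feature_counts[c].update(xs)

def predict(xs):
    best_c, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / len(train)                  # P(c)
        for x in xs:                              # multiply P(x | c) for each feature
            score *= feature_counts[c][x] / sum(feature_counts[c].values())
        if score > best_score:
            best_c, best_score = c, score
    return best_c

print(predict(["sunny", "mild"]))   # -> "yes"
```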



Pros and Cons
• Pros
– Fast
– No need to fill in missing data
– Explicitly handles class priors
– Incrementally updated

• Cons
– Independence assumption may be violated in practice (features are correlated)
– Gives skewed probability estimates when # of features is large. Why?
Questions to think about
• When would Naïve Bayes fail?

• How would you relax some of the assumptions behind Naïve Bayes to make it more robust?

