Supervised Learning: Logistic Regression and SVMs
Rayid Ghani
Slides liberally borrowed and customized from lots of excellent online sources
Rayid Ghani @rayidghani
Types of Learning
• Classification (binary, multi-class, …)
• Clustering (hierarchical, sequential, …)
• Anomaly Detection
• PCA
• Regression
y = f(x)
• y: output / dependent variable
• f: learned function
• x: features/variables/inputs/predictors/independent variables
• Testing: apply f to a new test example x and output the predicted value y = f(x)
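The train-then-predict setup above can be sketched in a few lines. The choice of a 1-nearest-neighbour rule for f is purely illustrative (the slides do not specify a learner here); the point is only the shape of the workflow: learn f from (x, y) pairs, then apply f to a new x.

```python
# Minimal sketch of supervised learning: learn f from labelled (x, y)
# pairs, then apply f to a new test example x.
# f is a 1-nearest-neighbour rule, chosen only for illustration.

def train(examples):
    """'Training' for 1-NN just stores the labelled examples."""
    def f(x):
        # Predict the label of the closest training example.
        nearest = min(examples, key=lambda ex: abs(ex[0] - x))
        return nearest[1]
    return f

# Toy training data: (feature, label) pairs.
train_data = [(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 1)]
f = train(train_data)

# Testing: apply f to a new example and output the predicted y = f(x).
y = f(2.6)  # nearest training point is (3.0, 1), so y = 1
```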
Find β_i to maximize the conditional likelihood of the training data: β̂ = argmax_β Σ_i log P(y_i | x_i; β)
How do you train a Logistic Regression model?
• Logistic Regression: maximize the conditional likelihood while avoiding overfitting via
  – Early stopping
  – Regularization: add a penalty term to the cost function based on non-zero coefficients
    • L1 (Lasso): penalty λ Σ |β_i| (drives some coefficients exactly to zero)
    • L2 (Ridge): penalty λ Σ β_i² (shrinks coefficients toward zero)
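A hedged sketch of the training recipe above: gradient descent on the mean negative log-likelihood with an L2 (ridge) penalty λ Σ β_i² added. The data, learning rate, step count, and λ value are illustrative choices of ours, not from the slides.

```python
# Logistic regression trained by gradient descent, with an L2 penalty
# lam * b1**2 added to the mean negative log-likelihood (the intercept
# is left unpenalised, a common convention).
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logreg_l2(X, y, lam=0.1, lr=0.5, steps=2000):
    """Fit intercept b0 and one coefficient b1 for 1-D inputs."""
    b0, b1 = 0.0, 0.0
    n = len(X)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(b0 + b1 * xi)
            g0 += (p - yi)        # gradient of NLL w.r.t. intercept
            g1 += (p - yi) * xi   # gradient of NLL w.r.t. coefficient
        b0 -= lr * g0 / n
        b1 -= lr * (g1 / n + 2 * lam * b1)  # d/db1 of lam * b1**2
    return b0, b1

# Toy 1-D data: small x -> class 0, large x -> class 1.
X = [0.0, 1.0, 2.0, 3.0]
y = [0, 0, 1, 1]
b0, b1 = fit_logreg_l2(X, y)
```

Larger λ shrinks b1 further toward zero; λ = 0 recovers plain maximum-likelihood training.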
Regularization
• What is the impact of L1 (Least Absolute Shrinkage and Selection Operator)?
• Visualization:
https://florianhartl.com/logistic-regression-geom
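One way to see the impact of L1: for the scalar problem min_b (b - a)² + penalty(b), where a stands in for an unregularised coefficient estimate, the closed-form solutions are known. L2 shrinks b but never makes it zero; L1 soft-thresholds, setting b exactly to zero for small |a| — which is why Lasso performs variable selection. The constants below are illustrative.

```python
# Closed-form 1-D solutions contrasting L1 and L2 regularization.

def l2_solution(a, lam):
    # argmin_b (b - a)**2 + lam * b**2  =>  b = a / (1 + lam)
    # Shrunk toward zero, but never exactly zero for a != 0.
    return a / (1.0 + lam)

def l1_solution(a, lam):
    # argmin_b (b - a)**2 + lam * |b|  =>  soft-thresholding:
    # b = 0 exactly when |a| <= lam / 2.
    t = lam / 2.0
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0

a, lam = 0.3, 1.0
b_l1 = l1_solution(a, lam)  # 0.0 -> the feature is dropped entirely
b_l2 = l2_solution(a, lam)  # 0.15 -> shrunk but kept
```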
Linear Classifiers
y = f(x) = sign(wTx + b)
• Detailed Math: http://www.engr.mun.ca/~baxter/Publications/LagrangeForSVMs.pdf
If C is large, errors cost a lot (hard margin). If C is small, errors are tolerated (softer margin).
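The decision rule y = sign(wᵀx + b) is simple to write down. In the sketch below, w and b are fixed by hand just to show the rule; an SVM would instead choose them to maximise the margin, with C controlling how costly training errors are (large C approaches a hard margin, small C tolerates errors).

```python
# A linear classifier: y = f(x) = sign(w^T x + b).
# w and b are hand-picked for illustration, not learned.

def sign(z):
    return 1 if z >= 0 else -1

def linear_classify(w, b, x):
    # Compute the decision value w . x + b, then take its sign.
    return sign(sum(wi * xi for wi, xi in zip(w, x)) + b)

w, b = [1.0, -1.0], 0.5
y_pos = linear_classify(w, b, [2.0, 1.0])  # sign(2 - 1 + 0.5) = +1
y_neg = linear_classify(w, b, [0.0, 2.0])  # sign(0 - 2 + 0.5) = -1
```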
Logit vs Linear SVMs
• Logistic regression maximizes the conditional likelihood of the training data
• SVMs: no probability estimates

P(c | d) = P(d | c) P(c) / P(d)
Naïve Bayes Classifier
ĉ = argmax_{c ∈ C} P(c | d) = argmax_{c ∈ C} P(d | c) P(c) / P(d)
  = argmax_{c ∈ C} P(d | c) P(c)   (dropping the denominator)
Naïve Bayes Classifier
ĉ = argmax_{c ∈ C} P(x1, x2, …, xn | c) P(c)
• P(c): how often does this class occur? We can just count the relative frequencies in a data set.
• P(x1, x2, …, xn | c): O(|X|^n · |C|) parameters; could only be estimated if a very, very large number of training examples was available.
• Conditional Independence: assume the feature probabilities P(xi | cj) are independent given the class c.
• Cons
  – Independence assumption may be violated in practice (features are correlated)
  – Gives skewed probability estimates when # of features is large. Why?
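The counting recipe above can be sketched end to end: estimate P(c) and P(xi | c) as relative frequencies in a toy data set, then predict ĉ = argmax_c P(c) · Π_i P(xi | c), with the denominator P(d) dropped. The toy documents are invented for illustration, and the Laplace smoothing is an addition of ours (the slides only mention raw counting).

```python
# Naive Bayes from counts: priors and per-class word probabilities are
# relative frequencies; prediction is argmax_c P(c) * prod_i P(x_i | c).
from collections import Counter, defaultdict

# Toy labelled documents: (words, class).
data = [
    (["cheap", "pills"], "spam"),
    (["cheap", "meds"], "spam"),
    (["meeting", "notes"], "ham"),
    (["project", "notes"], "ham"),
]

class_counts = Counter(c for _, c in data)
word_counts = defaultdict(Counter)  # word_counts[c][w] = count of w in class c
for words, c in data:
    word_counts[c].update(words)

vocab = len({w for cnt in word_counts.values() for w in cnt})

def predict(words):
    best_c, best_score = None, -1.0
    for c in class_counts:
        # Prior P(c): relative frequency of the class in the data set.
        score = class_counts[c] / len(data)
        total = sum(word_counts[c].values())
        for w in words:
            # Laplace-smoothed estimate of P(w | c); smoothing avoids
            # zeroing out the product for unseen (word, class) pairs.
            score *= (word_counts[c][w] + 1) / (total + vocab)
        if score > best_score:
            best_c, best_score = c, score
    return best_c

label = predict(["cheap", "pills"])  # "spam"
```

Note how the independence assumption turns the O(|X|^n · |C|)-parameter table P(x1, …, xn | c) into n · |X| · |C| per-word counts.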
Questions to think about
• When would Naïve Bayes fail?