
Logistic Regression

Foundations of Data Analysis

April 5, 2022
Classification as Regression
Regression problem:
Given x (independent variable), predict y (dependent variable).

[Figure: scatter plot of data points (xi, yi) with a fitted regression line]

Classification problem:
Given x (features), predict y (labels).

[Figure: scatter plot of data points (xi, yi) with binary labels y ∈ {0, 1}]
Classification as Regression

What if given x, we wanted to predict p(y = 1 | x)?


[Figure: binary-labeled data (xi, yi) with a straight line fit to p(y = 1 | x)]

Linear fit to p(y = 1 | x) goes outside [0, 1]!


Classification as Regression

We want to use a nonlinear function with outputs in [0, 1]


[Figure: the same binary-labeled data with an S-shaped logistic curve fit to p(y = 1 | x)]

This is logistic regression.


Logistic Function

σ(t) = 1 / (1 + e^(−t))
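As a concrete sketch (the function name and the numerically stable rewriting are our own choices, not from the slides), the logistic function in NumPy:

```python
import numpy as np

def sigmoid(t):
    """Logistic function sigma(t) = 1 / (1 + e^(-t)), computed stably:
    working with e^(-|t|) keeps np.exp from overflowing for large |t|."""
    t = np.asarray(t, dtype=float)
    e = np.exp(-np.abs(t))
    # For t >= 0: 1 / (1 + e^(-t)); for t < 0 the algebraically equal
    # form e^t / (1 + e^t), written with e^(-|t|), avoids overflow.
    return np.where(t >= 0, 1.0 / (1.0 + e), e / (1.0 + e))
```

For example, sigmoid(0.0) returns 0.5, and large positive or negative inputs saturate at 1 and 0 without overflow. Since σ maps any real t into (0, 1), its output can be read as a probability.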
Linear Predictor Inside Logistic Function

p(y = 1 | x) = σ(β0 + β1 x) = 1 / (1 + e^(−β0 − β1 x))

β0 : “intercept” β1 : “slope”
Example (from Wikipedia)
Pass/fail of exam (y) vs. Hours spent studying (x)
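As a hedged sketch of this kind of one-feature fit in Python with scikit-learn: the numbers below are illustrative stand-ins, not the actual Wikipedia table, and scikit-learn applies L2 regularization by default (a large C, e.g. C=1e6, approximates the pure maximum-likelihood fit derived in the following slides):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative stand-in data: hours studied (x) vs. pass/fail (y).
hours = np.array([[0.5], [1.0], [1.5], [2.0], [2.5],
                  [3.0], [3.5], [4.0], [4.5], [5.0]])
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

model = LogisticRegression(C=1e6).fit(hours, passed)
print(model.intercept_, model.coef_)        # estimated beta0 and beta1
print(model.predict_proba([[3.0]])[:, 1])   # estimated p(y = 1 | x = 3)
```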
Multivariate Predictor

If x is multivariate: x = (x(1) , x(2) , . . . , x(d) ),

p(y | x) = σ(β0 + β1 x(1) + β2 x(2) + · · · + βd x(d) )


= 1 / (1 + e^(−β0 − β1 x(1) − β2 x(2) − ··· − βd x(d)))

(Note: just multivariate linear regression inside σ )


Multivariate Predictor
Data matrix X with n data points (rows):

    ⎡ 1  x11  x12  ···  x1d ⎤
X = ⎢ 1  x21  x22  ···  x2d ⎥
    ⎢ ⋮   ⋮    ⋮         ⋮  ⎥
    ⎣ 1  xn1  xn2  ···  xnd ⎦

Logistic regression evaluated for the ith data point (ith row vector):
p(y = 1 | Xi•) = σ(Xi• β)

(Note: Xi• β is the dot product between the ith row and β)
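A minimal NumPy sketch of this setup (the shapes are illustrative, and the `sigmoid` helper comes from the earlier sketch); the column of ones lets β0 ride along in the same dot product:

```python
import numpy as np

n, d = 5, 3                                  # illustrative sizes
rng = np.random.default_rng(0)
features = rng.normal(size=(n, d))           # raw data, one point per row

# Design matrix X: prepend a column of ones for the intercept beta0.
X = np.hstack([np.ones((n, 1)), features])   # shape (n, d + 1)
beta = np.zeros(d + 1)                       # (beta0, beta1, ..., betad)

# X @ beta computes the dot product Xi. beta for every row i at once.
p = sigmoid(X @ beta)                        # p(y = 1 | Xi.), each in (0, 1)
```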


How To Estimate Parameter β?

Maximize likelihood:
1. Compute derivative (gradient) of likelihood w.r.t. β
2. Solve for β that makes this derivative zero
Likelihood Function

Use Bernoulli likelihood:

L(β; X, y) = ∏_{i=1}^{n} σ(Xi• β)^{yi} (1 − σ(Xi• β))^{1−yi}
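A direct transcription of this product, continuing with the `sigmoid` helper from before (a sketch; note that a product of many probabilities underflows in floating point, one practical reason for the log-likelihood on the next slide):

```python
import numpy as np

def likelihood(beta, X, y):
    """Bernoulli likelihood: product over the n data points of
    sigma(Xi. beta)^yi * (1 - sigma(Xi. beta))^(1 - yi)."""
    p = sigmoid(X @ beta)
    return np.prod(p**y * (1.0 - p)**(1.0 - y))
```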
Log-Likelihood Function

ℓ(β; X, y) = ln L(β; X, y)
           = ∑_{i=1}^{n} [ (yi − 1) Xi• β − ln(1 + e^(−Xi• β)) ]
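The same quantity in log form, following the formula above; a sketch using `np.logaddexp(0, -t)`, which computes ln(1 + e^(−t)) without overflow:

```python
import numpy as np

def log_likelihood(beta, X, y):
    """ell(beta; X, y) = sum_i (yi - 1) * Xi.beta - ln(1 + e^(-Xi.beta))."""
    t = X @ beta                              # all Xi. beta at once
    return np.sum((y - 1.0) * t - np.logaddexp(0.0, -t))
```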
Gradient of Log-Likelihood Function

 ∂` 
∂β
 0
 ∂` 
 ∂β1 
∇`(β; X, y) =  
 
 ... 
 
 
∂`
∂βd
n 
e−Xi• β

∂` X
= (yi − 1) + Xik
∂βk 1 + e−Xi• β
i=1
Problem! There is no closed-form solution for the β that makes this zero!
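Using the simplification above, the whole gradient collapses to a single matrix expression, Xᵀ(y − σ(Xβ)). A sketch, reusing the `sigmoid` helper:

```python
import numpy as np

def gradient(beta, X, y):
    """Gradient of the log-likelihood. Component k is
    sum_i (yi - sigma(Xi. beta)) * Xik, i.e. X^T (y - sigma(X beta))."""
    return X.T @ (y - sigmoid(X @ beta))
```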
Gradient Ascent
• Take a small step in the gradient direction
• Repeat until the gradient is (approximately) zero
Algorithm for Logistic Regression

Set ε = small threshold
Set δ = step size along gradient
Initialize β

While ‖∇ℓ(β)‖ > ε:
    Update β ← β + δ ∇ℓ(β)
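Putting the pieces together, a minimal sketch of this loop (the threshold, step size, and iteration cap are illustrative choices; `gradient` is the helper sketched above):

```python
import numpy as np

def fit_logistic(X, y, delta=0.1, eps=1e-6, max_iter=100_000):
    """Gradient ascent on the log-likelihood:
    beta <- beta + delta * grad ell(beta) until ||grad ell|| <= eps."""
    beta = np.zeros(X.shape[1])               # initialize beta at zero
    for _ in range(max_iter):
        g = gradient(beta, X, y)
        if np.linalg.norm(g) <= eps:          # gradient (nearly) zero: stop
            break
        beta = beta + delta * g               # small step UP the gradient
    return beta
```

In practice δ must be small enough that ℓ actually increases at each step; second-order methods such as Newton's method converge faster, but the plain ascent above matches the pseudocode on this slide.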
