Logistic Regression
Aarti Singh
Discriminative Classifiers
Bayes classifier: model P(X|Y) and P(Y), then obtain P(Y|X) via Bayes' rule.
Why not learn P(Y|X) directly? Or better yet, why not learn the
decision boundary directly?
• Assume some functional form for P(Y|X) or for the
decision boundary
• Estimate parameters of functional form directly from
training data
Today we will see one such classifier – Logistic Regression
Logistic Regression (not really regression)
Assume a functional form for P(Y|X):

P(Y = 1 \mid X) = \frac{1}{1 + \exp(-w_0 - \sum_i w_i X_i)}

This passes the linear score w_0 + \sum_i w_i X_i through the logistic function (or sigmoid), \sigma(z) = \frac{1}{1 + e^{-z}}, which increases monotonically from 0 to 1.

Decision boundary: predict Y = 1 when P(Y = 1 \mid X) \ge 1/2, i.e. when w_0 + \sum_i w_i X_i \ge 0, and Y = 0 otherwise, so the decision boundary is linear in X.
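A minimal Python sketch of this functional form (the names `sigmoid` and `prob_y1` and the NumPy dependency are illustrative additions, not from the slides):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def prob_y1(x, w0, w):
    """P(Y = 1 | X = x) under the logistic model with bias w0 and weights w."""
    return sigmoid(w0 + np.dot(w, x))

# Linear decision boundary: predict Y = 1 exactly when w0 + w.x >= 0,
# which is the same condition as prob_y1(x, w0, w) >= 0.5.
```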
Training Logistic Regression
How to learn the parameters w0, w1, … wd? (d features)
Training data: \{(x^j, y^j)\}_{j=1}^n

Maximum (Conditional) Likelihood Estimates:

\hat{w}_{MCLE} = \arg\max_w \prod_{j=1}^n P(y^j \mid x^j, w)
Expressing Conditional log Likelihood
P(Y = 1 \mid X, w) = \frac{1}{1 + \exp(-w_0 - \sum_i w_i X_i)}

P(Y = 0 \mid X, w) = \frac{1}{1 + \exp(w_0 + \sum_i w_i X_i)}

For 0/1 labels y^j, the conditional log likelihood therefore simplifies to

l(w) = \sum_j \ln P(y^j \mid x^j, w) = \sum_j \Big[ y^j \big(w_0 + \textstyle\sum_i w_i x_i^j\big) - \ln\big(1 + \exp(w_0 + \textstyle\sum_i w_i x_i^j)\big) \Big]
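As a sketch, the conditional log likelihood above can be computed directly in Python (function and array names are illustrative; the labels are assumed 0/1):

```python
import numpy as np

def cond_log_likelihood(w0, w, X, y):
    """l(w) = sum_j [ y^j (w0 + w.x^j) - ln(1 + exp(w0 + w.x^j)) ].

    X: (n, d) array whose rows are the x^j; y: (n,) array of 0/1 labels.
    """
    z = w0 + X @ w  # linear scores, one per training example
    return np.sum(y * z - np.logaddexp(0.0, z))  # logaddexp(0, z) = ln(1 + e^z), computed stably
```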
Gradient:

\nabla_w l(w) = \Big[ \frac{\partial l(w)}{\partial w_0}, \ldots, \frac{\partial l(w)}{\partial w_d} \Big]

Update rule (learning rate η > 0):

w^{(t+1)} = w^{(t)} + \eta \, \nabla_w l(w) \big|_{w^{(t)}}

• Poll: Effect of step-size η?
Effect of step-size η
If η is too small, gradient ascent converges slowly; if η is too large, the updates overshoot the maximum and can oscillate or diverge.
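A tiny runnable illustration of this, using gradient ascent on the concave toy objective f(w) = -(w - 3)^2 (the toy function and the η values are my own choices for demonstration, not from the slides):

```python
def ascend(eta, steps=20, w=0.0):
    """Gradient ascent on f(w) = -(w - 3)^2, whose gradient is -2 (w - 3)."""
    for _ in range(steps):
        w = w + eta * (-2.0 * (w - 3.0))
    return w

print(ascend(eta=0.01))  # too small: after 20 steps, still far from the maximizer w* = 3
print(ascend(eta=0.4))   # moderate: essentially converged to 3
print(ascend(eta=1.1))   # too large: the error grows each step -- oscillating divergence
```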
Gradient Ascent for M(C)LE
" d
#
X 1 X
= y j
Pd · exp(w0 + wi xji )
j 1 + exp(w0 + i wi xji ) i
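In vectorized form, both partial derivatives follow from the residuals y^j − P̂(Y = 1 | x^j, w); a sketch (names are illustrative):

```python
import numpy as np

def gradient(w0, w, X, y):
    """Gradient of l(w): returns (dl/dw0, array of dl/dwi for i = 1..d)."""
    p = 1.0 / (1.0 + np.exp(-(w0 + X @ w)))  # P_hat(Y = 1 | x^j, w) for every j
    r = y - p                                # residuals y^j - P_hat(...)
    return r.sum(), X.T @ r                  # dl/dw0 and the vector of dl/dwi
```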
Gradient Ascent for M(C)LE – Logistic Regression

Gradient ascent algorithm: iterate until change < ε

w_0^{(t+1)} \leftarrow w_0^{(t)} + \eta \sum_j \Big[ y^j - \hat{P}(Y = 1 \mid x^j, w^{(t)}) \Big]

For i = 1, …, d:

w_i^{(t+1)} \leftarrow w_i^{(t)} + \eta \sum_j x_i^j \Big[ y^j - \hat{P}(Y = 1 \mid x^j, w^{(t)}) \Big]
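Putting the update rules together, a minimal sketch of the full training loop (the tolerance `eps`, the learning rate `eta`, and the function name are illustrative choices):

```python
import numpy as np

def train_logistic_mcle(X, y, eta=0.1, eps=1e-6, max_iter=10000):
    """Batch gradient ascent on l(w) until the parameter change is below eps."""
    n, d = X.shape
    w0, w = 0.0, np.zeros(d)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(w0 + X @ w)))  # P_hat(Y = 1 | x^j, w)
        r = y - p
        new_w0 = w0 + eta * r.sum()              # update for the bias w0
        new_w = w + eta * (X.T @ r)              # updates for w1..wd
        change = max(abs(new_w0 - w0), float(np.max(np.abs(new_w - w))))
        w0, w = new_w0, new_w
        if change < eps:                         # iterate until change < eps
            break
    return w0, w
```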
That’s M(C)LE. How about M(C)AP?

• Define priors on w
  – Common assumption: Normal distribution, zero mean, identity covariance (a zero-mean Gaussian prior)
  – “Pushes” parameters towards zero
• M(C)AP estimate, with a zero-mean Gaussian prior p(w):

w^*_{MCAP} = \arg\max_w \ln \Big[ p(w) \prod_j P(y^j \mid x^j, w) \Big]

• Gradient: same as before, plus one extra term from the prior, \frac{\partial}{\partial w_i} \ln p(w) = -\lambda w_i

Penalization = Regularization (the Gaussian prior acts as an ℓ2 penalty on w)
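The only change to the code is the extra −λ w_i term in the gradient; a sketch (here the bias w_0 is left unpenalized, a common but not universal choice, and `lam` is illustrative):

```python
import numpy as np

def map_gradient(w0, w, X, y, lam):
    """Gradient of ln p(w) + sum_j ln P(y^j | x^j, w) for a zero-mean Gaussian prior."""
    p = 1.0 / (1.0 + np.exp(-(w0 + X @ w)))
    r = y - p
    return r.sum(), X.T @ r - lam * w  # the prior contributes -lam * w_i per weight
```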
M(C)LE vs. M(C)AP

• Maximum conditional likelihood estimate:

w^*_{MCLE} = \arg\max_w \sum_j \ln P(y^j \mid x^j, w)

• Maximum conditional a posteriori estimate:

w^*_{MCAP} = \arg\max_w \Big[ \ln p(w) + \sum_j \ln P(y^j \mid x^j, w) \Big]
Logistic Regression for more than 2 classes

• Logistic regression in the more general case, where Y ∈ {y_1, …, y_K}:

for k < K:

P(Y = y_k \mid X) = \frac{\exp(w_{k0} + \sum_i w_{ki} X_i)}{1 + \sum_{m=1}^{K-1} \exp(w_{m0} + \sum_i w_{mi} X_i)}

for k = K (so the probabilities sum to 1):

P(Y = y_K \mid X) = \frac{1}{1 + \sum_{m=1}^{K-1} \exp(w_{m0} + \sum_i w_{mi} X_i)}

Predict the most probable class: \hat{y} = \arg\max_k P(Y = y_k \mid X)
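A sketch of the K-class probabilities in this parameterization, where class y_K serves as the reference with no weights of its own (array names are illustrative):

```python
import numpy as np

def multiclass_probs(W0, W, x):
    """P(Y = y_k | x) for k = 1..K.

    W0: (K-1,) biases w_k0; W: (K-1, d) weights w_ki for classes 1..K-1.
    Class K has no parameters and absorbs the remaining probability mass.
    """
    scores = np.exp(W0 + W @ x)            # exp(w_k0 + sum_i w_ki x_i) for k < K
    denom = 1.0 + scores.sum()
    return np.append(scores, 1.0) / denom  # last entry is P(Y = y_K | x)

# Prediction: index of the most probable class (0-based index of y_k).
# k_hat = np.argmax(multiclass_probs(W0, W, x))
```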
Gaussian Naïve Bayes vs. Logistic Regression

• The set of Gaussian Naïve Bayes parameters (with feature variance independent of the class label) maps to a set of Logistic Regression parameters: such a GNB model induces a logistic form for P(Y|X).
• More in the paper… however,