4.2 Generative
Linear Classification:
Probabilistic Generative Models
Sargur N. Srihari
University at Buffalo, State University of New York
USA
Machine Learning Srihari
Generative Model
• Model class-conditionals p(x|Ck) and priors p(Ck)
• Compute posteriors p(Ck|x) from Bayes' theorem
• Two-class case
  – Since p(x) = Σi p(x, Ci) = Σi p(x|Ci) p(Ci), the posterior for class C1 is

      p(C1|x) = p(x|C1) p(C1) / [ p(x|C1) p(C1) + p(x|C2) p(C2) ]
              = 1 / (1 + exp(−a)) = σ(a)

    where a = ln [ p(x|C1) p(C1) / ( p(x|C2) p(C2) ) ] is the log-likelihood ratio with Bayes odds
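A minimal numerical sketch of this two-class posterior (the function names `sigmoid` and `posterior_c1` are illustrative, not from the slides): the sigmoid of the log ratio a reproduces the direct Bayes computation.

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid: sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def posterior_c1(px_c1, px_c2, prior_c1, prior_c2):
    """p(C1|x) via Bayes' theorem, written as a sigmoid of the
    log ratio a = ln[ p(x|C1)p(C1) / (p(x|C2)p(C2)) ]."""
    a = np.log((px_c1 * prior_c1) / (px_c2 * prior_c2))
    return sigmoid(a)

# Equal likelihoods and equal priors give posterior 0.5
print(posterior_c1(0.3, 0.3, 0.5, 0.5))  # 0.5
```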
• p(C1|x) = σ(a), where

    a = ln [ p(x|C1) p(C1) / ( p(x|C2) p(C2) ) ]

  is the log ratio of probabilities
• σ(a) = 1 / (1 + exp(−a))
Continuous Inputs: Gaussians
• Assume Gaussian class-conditional densities
with same covariance matrix Σ
      p(x|Ck) = 1/(2π)^{D/2} · 1/|Σ|^{1/2} · exp{ −(1/2) (x − µk)ᵀ Σ⁻¹ (x − µk) }
[Figure: the posterior is a logistic sigmoid of a linear function of x, giving a linear decision boundary. Left: class-conditional densities (values are positive, need not sum to 1), with red ink proportional to p(C1|x) and blue ink to p(C2|x) = 1 − p(C1|x). Right: posterior p(C1|x), with values ranging from 1 to 0.]
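To make the "sigmoid of a linear function" claim concrete, the sketch below (variable names and example numbers are my own) checks that σ(wᵀx + w0), with w = Σ⁻¹(µ1 − µ2) and w0 = −½ µ1ᵀΣ⁻¹µ1 + ½ µ2ᵀΣ⁻¹µ2 + ln[p(C1)/p(C2)], agrees with the posterior computed directly from the Gaussian densities:

```python
import numpy as np

def gauss_pdf(x, mu, Sigma):
    """Multivariate Gaussian density N(x | mu, Sigma)."""
    D = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm

# Shared covariance: p(C1|x) = sigmoid(w.T x + w0)
mu1, mu2 = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
p1, p2 = 0.6, 0.4

Si = np.linalg.inv(Sigma)
w = Si @ (mu1 - mu2)
w0 = -0.5 * mu1 @ Si @ mu1 + 0.5 * mu2 @ Si @ mu2 + np.log(p1 / p2)

x = np.array([0.2, -0.5])
post_linear = 1.0 / (1.0 + np.exp(-(w @ x + w0)))
post_bayes = gauss_pdf(x, mu1, Sigma) * p1 / (
    gauss_pdf(x, mu1, Sigma) * p1 + gauss_pdf(x, mu2, Sigma) * p2)
print(post_linear, post_bayes)  # the two agree
```

The quadratic terms xᵀΣ⁻¹x cancel only because the two classes share the same Σ; with class-specific covariances the boundary would be quadratic.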
• Bias term: wk0 = −(1/2) µkᵀ Σ⁻¹ µk + ln p(Ck)
• Maximum likelihood: given targets t = (t1, ..., tN)ᵀ, it is convenient to maximize the log of the likelihood
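Maximizing the log likelihood for this model gives closed-form estimates. A sketch under that assumption (the function name is illustrative): prior π = N1/N, per-class means, and the pooled covariance S = (N1/N)S1 + (N2/N)S2.

```python
import numpy as np

def fit_gaussian_generative(X, t):
    """ML estimates for the two-class shared-covariance Gaussian model.
    t is a binary vector with t_n = 1 for class C1 and t_n = 0 for C2."""
    N = len(t)
    N1 = t.sum()
    pi = N1 / N                         # prior p(C1)
    mu1 = X[t == 1].mean(axis=0)        # mean of class-C1 points
    mu2 = X[t == 0].mean(axis=0)        # mean of class-C2 points
    d1 = X[t == 1] - mu1
    d2 = X[t == 0] - mu2
    # Pooled covariance: weighted average of per-class covariances
    Sigma = (d1.T @ d1 + d2.T @ d2) / N
    return pi, mu1, mu2, Sigma
```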
Discrete Features
• Assuming binary features xi ∈ {0, 1}
  – With M inputs, a general distribution is a table of 2^M values
  – With the naive Bayes assumption (features independent given the class):

      p(x|Ck) = Πi=1..M µki^{xi} (1 − µki)^{1−xi}

    so that

      ak(x) = Σi=1..M { xi ln µki + (1 − xi) ln(1 − µki) } + ln p(Ck)

    which is linear in x
• Similar results hold for discrete variables taking more than two values
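A sketch of ak(x) for binary features under the naive Bayes assumption (the function name `a_k` is illustrative); it equals ln[ p(x|Ck) p(Ck) ] computed from the Bernoulli product:

```python
import numpy as np

def a_k(x, mu_k, prior_k):
    """a_k(x) = sum_i [ x_i ln mu_ki + (1 - x_i) ln(1 - mu_ki) ] + ln p(C_k),
    linear in the binary features x_i."""
    return np.sum(x * np.log(mu_k) + (1 - x) * np.log(1 - mu_k)) + np.log(prior_k)

x = np.array([1, 0, 1])          # binary feature vector
mu = np.array([0.8, 0.4, 0.5])   # mu_ki = p(x_i = 1 | C_k)
print(a_k(x, mu, 0.3))           # same as ln(0.8 * 0.6 * 0.5 * 0.3)
```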
Exponential Family
• We have seen that for both Gaussian-distributed and discrete inputs, the posterior class probabilities are given by generalized linear models with logistic sigmoid (K=2) or softmax (K≥2) activation functions
• These are particular cases of a more general result obtained by assuming that the class-conditional densities p(x|Ck) are members of the exponential family of distributions
– soft-max functions: p(Ck|x) = exp(ak) / Σj exp(aj)
• Continuous case with shared covariance
– we get linear functions of input x
• Discrete case with independent features
also results in linear functions
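A sketch of the softmax activation above (the max-shift is a standard numerical-stability trick, not part of the slides; it does not change the result):

```python
import numpy as np

def softmax(a):
    """p(Ck|x) = exp(a_k) / sum_j exp(a_j); subtracting max(a) before
    exponentiating avoids overflow without changing the ratios."""
    e = np.exp(a - np.max(a))
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))  # class probabilities summing to 1
```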