

Linear Classification:
Probabilistic Generative Models

Sargur N. Srihari
University at Buffalo, State University of New York
USA


Linear Classification using Probabilistic Generative Models
• Topics
1. Overview (Generative vs Discriminative)
2. Bayes Classifier
• using Logistic Sigmoid and Softmax
3. Continuous inputs
• Gaussian Distributed Class-conditionals
– Parameter Estimation
4. Discrete Features
5. Exponential Family

Overview of Methods for Classification


1. Generative Models (Two-step)
1. Infer class-conditional densities p(x|Ck) and priors p(Ck)
2. Use Bayes theorem to determine posterior probabilities
      p(Ck|x) = p(x|Ck) p(Ck) / p(x)
2. Discriminative Models (One-step)
– Directly infer posterior probabilities p(Ck|x)
• Decision Theory
– In both cases use decision theory to assign each new x
to a class


Generative Model
• Model class conditionals p(x|Ck), priors p(Ck)
• Compute posteriors p(Ck|x) from Bayes theorem
• Two-class case
  – Posterior for class C1, using p(x) = Σi p(x,Ci) = Σi p(x|Ci) p(Ci):

      p(C1|x) = p(x|C1) p(C1) / [ p(x|C1) p(C1) + p(x|C2) p(C2) ]
              = 1 / (1 + exp(−a)) = σ(a)

    where a = ln[ p(x|C1) p(C1) / ( p(x|C2) p(C2) ) ]
    is the log-likelihood ratio (LLR) with Bayes odds

Logistic Sigmoid Function

      σ(a) = 1 / (1 + exp(−a))

  Property: σ(−a) = 1 − σ(a)
  Inverse:  a = ln[ σ / (1 − σ) ]

• If σ(a) = p(C1|x), then the inverse represents a = ln[ p(C1|x) / p(C2|x) ],
  the log ratio of probabilities, called the logit or log odds
• Sigmoid: an S-shaped or squashing function; it maps a real a ∈ (−∞, +∞) to the finite interval (0, 1)

[Plot: σ(a) for a in (−5, 5); the dotted line is the scaled probit function, the cdf of a zero-mean, unit-variance Gaussian.]
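To make these properties concrete, here is a minimal Python/NumPy sketch (not from the slides) that evaluates the sigmoid, checks the symmetry property σ(−a) = 1 − σ(a), and confirms that the logit (log odds) inverts it.

    import numpy as np

    def sigmoid(a):
        """Logistic sigmoid: maps a real number to the interval (0, 1)."""
        return 1.0 / (1.0 + np.exp(-a))

    def logit(s):
        """Inverse of the sigmoid: the log odds ln(s / (1 - s))."""
        return np.log(s / (1.0 - s))

    a = np.linspace(-5, 5, 11)
    s = sigmoid(a)

    # Symmetry property: sigma(-a) = 1 - sigma(a)
    assert np.allclose(sigmoid(-a), 1.0 - s)

    # The logit (log odds) recovers the original argument a
    assert np.allclose(logit(s), a)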


Generalizations and Special Cases


• More than 2 classes
• Gaussian Distribution of x
• Discrete Features
• Exponential Family


Softmax: Generalization of the Logistic Sigmoid

• For K = 2 we used the logistic sigmoid
      p(C1|x) = σ(a),  where a = ln[ p(x|C1) p(C1) / ( p(x|C2) p(C2) ) ]   (log ratio of probabilities)
      and σ(a) = 1 / (1 + exp(−a))
• For K > 2 we can use its generalization
      p(Ck|x) = p(x|Ck) p(Ck) / Σj p(x|Cj) p(Cj)
              = exp(ak) / Σj exp(aj)
  – The quantities ak are defined by ak = ln[ p(x|Ck) p(Ck) ]
• If K = 2 this reduces to a sigmoid:
      p(C1|x) = exp(a1) / [ exp(a1) + exp(a2) ]
              = 1 / [ 1 + exp(a2 − a1) ]
              = 1 / [ 1 + exp( ln p(x|C2)p(C2) − ln p(x|C1)p(C1) ) ]
              = 1 / [ 1 + p(x|C2)p(C2) / ( p(x|C1)p(C1) ) ]
              = 1 / [ 1 + exp(−a) ],  where a = ln[ p(x|C1) p(C1) / ( p(x|C2) p(C2) ) ]
• Known as the softmax function
  – since it is a smoothed version of the max function:
    if ak >> aj for all j ≠ k, then p(Ck|x) ≈ 1 and the rest ≈ 0
  – a general technique for smoothly approximating the max of several ak
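A short sketch of the normalized exponential (Python/NumPy, illustrative values only): it shows that for K = 2 the softmax agrees with the sigmoid of a = a1 − a2, and that when one ak dominates the corresponding posterior approaches 1.

    import numpy as np

    def softmax(a):
        """Normalized exponential over the activations a_k (numerically stabilized)."""
        e = np.exp(a - np.max(a))          # subtracting max(a) does not change the result
        return e / e.sum()

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    a = np.array([1.3, -0.4])              # a_k = ln p(x|C_k) p(C_k) for two classes
    p = softmax(a)

    # For K = 2, the softmax reduces to a sigmoid of a = a1 - a2
    assert np.isclose(p[0], sigmoid(a[0] - a[1]))

    # Smoothed max: if one a_k dominates, its posterior approaches 1
    print(softmax(np.array([10.0, 0.0, -2.0])))   # ~[1.0, 4.5e-5, 6.1e-6]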

Specific forms of class-conditionals


• We will next see that linear classifiers occur
both in continuous and discrete cases as
consequences of choosing specific forms of the
class-conditional densities p(x|Ck)
• Looking first at continuous input variables x
• Then discussing discrete inputs

Continuous Inputs: Gaussians
• Assume Gaussian class-conditional densities
with same covariance matrix Σ
      p(x|Ck) = 1 / ( (2π)^(D/2) |Σ|^(1/2) ) · exp{ −(1/2) (x − μk)ᵀ Σ⁻¹ (x − μk) }

• Consider first the two-class case
  – Substituting into p(C1|x) = σ( ln[ p(x|C1) p(C1) / ( p(x|C2) p(C2) ) ] )
  – and rearranging, we get p(C1|x) = σ(wᵀx + w0)
    where
      w  = Σ⁻¹ (μ1 − μ2)
      w0 = −(1/2) μ1ᵀ Σ⁻¹ μ1 + (1/2) μ2ᵀ Σ⁻¹ μ2 + ln[ p(C1) / p(C2) ]
  – The quadratic terms in x from the exponents of the Gaussians have cancelled due to the common covariance matrix
• The argument of the logistic sigmoid is a linear function of x
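As a concrete check of this result, the sketch below (Python with NumPy/SciPy; the means, covariance, and priors are made-up illustrative values) computes w and w0 from the formulas above and verifies that σ(wᵀx + w0) matches the posterior obtained directly from Bayes' theorem.

    import numpy as np
    from scipy.stats import multivariate_normal as mvn

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    # Illustrative parameters (assumed, not from the slides)
    mu1, mu2 = np.array([1.0, 1.0]), np.array([-1.0, 0.0])
    Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])          # shared covariance
    p1, p2 = 0.4, 0.6                                   # priors p(C1), p(C2)

    Sigma_inv = np.linalg.inv(Sigma)
    w = Sigma_inv @ (mu1 - mu2)
    w0 = (-0.5 * mu1 @ Sigma_inv @ mu1
          + 0.5 * mu2 @ Sigma_inv @ mu2
          + np.log(p1 / p2))

    x = np.array([0.5, -0.2])
    posterior_linear = sigmoid(w @ x + w0)

    # Direct Bayes computation for comparison
    num = mvn.pdf(x, mu1, Sigma) * p1
    den = num + mvn.pdf(x, mu2, Sigma) * p2
    assert np.isclose(posterior_linear, num / den)
    print(posterior_linear)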

Two Gaussian Classes

Two-dimensional input space x = (x1, x2).
[Figure: one panel shows the class-conditional densities p(x|Ck); the values are positive but need not sum to 1. The other panel shows the posterior p(C1|x), a logistic sigmoid of a linear function of x, giving a linear decision boundary. Red ink is proportional to p(C1|x) and blue ink to p(C2|x) = 1 − p(C1|x); a pure color indicates a value of 1 or 0.]

Continuous case with K > 2

      p(Ck|x) = p(x|Ck) p(Ck) / Σj p(x|Cj) p(Cj)
              = exp(ak) / Σj exp(aj)

• With Gaussian class-conditionals
      ak(x) = wkᵀ x + wk0
  – where
      wk  = Σ⁻¹ μk
      wk0 = −(1/2) μkᵀ Σ⁻¹ μk + ln p(Ck)
  – The quadratic terms cancel, thereby leading to linearity
  – If we do not assume a shared covariance matrix, we get a quadratic discriminant
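A sketch of the K > 2 case (Python with NumPy/SciPy, using assumed means, shared covariance, and priors): it builds ak(x) = wkᵀx + wk0 for each class, applies the softmax, and checks the result against a direct Bayes computation.

    import numpy as np
    from scipy.stats import multivariate_normal as mvn

    def gaussian_softmax_posteriors(x, mus, Sigma, priors):
        """Posteriors p(C_k|x) for Gaussian class-conditionals with a shared covariance."""
        Sigma_inv = np.linalg.inv(Sigma)
        a = []
        for mu, prior in zip(mus, priors):
            w_k = Sigma_inv @ mu
            w_k0 = -0.5 * mu @ Sigma_inv @ mu + np.log(prior)
            a.append(w_k @ x + w_k0)                 # a_k(x) = w_k^T x + w_k0
        a = np.array(a)
        e = np.exp(a - a.max())                      # softmax, numerically stabilized
        return e / e.sum()

    # Illustrative three-class problem (assumed values)
    mus = [np.array([0.0, 2.0]), np.array([2.0, 0.0]), np.array([-2.0, -1.0])]
    Sigma = np.eye(2)
    priors = [1/3, 1/3, 1/3]
    x = np.array([0.5, 0.5])

    posteriors = gaussian_softmax_posteriors(x, mus, Sigma, priors)

    # Direct Bayes computation for comparison (the x-dependent quadratic terms cancel)
    direct = np.array([mvn.pdf(x, mu, Sigma) * pr for mu, pr in zip(mus, priors)])
    assert np.allclose(posteriors, direct / direct.sum())
    print(posteriors)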

Three-class case with Gaussian models

Both linear and quadratic decision boundaries occur.

[Figure: one panel shows the class-conditional densities, the other the posterior probabilities, with RGB values corresponding to the posteriors. C1 and C2 have the same covariance matrix, so the boundary between them is linear; the other boundaries are quadratic.]

Maximum Likelihood Solutions


• Once we have specified parametric functional forms
  – for the class-conditional densities p(x|Ck)
  – we can determine the parameters, together with the prior probabilities p(Ck), using maximum likelihood
• This requires a data set of observations x along with their class labels

M.L.E. for Gaussian Parameters

• Assuming parametric forms for p(x|Ck), we can determine the values of the parameters and the priors p(Ck) using maximum likelihood
• For the two-class case, writing the prior as p(C1) = π and using targets tn = 1 for class C1 and tn = 0 for class C2, the likelihood is

      p(t | π, μ1, μ2, Σ) = Πn [ π N(xn | μ1, Σ) ]^tn [ (1 − π) N(xn | μ2, Σ) ]^(1 − tn)

  where t = (t1, .., tN)ᵀ
• It is convenient to maximize the log of the likelihood

Max Likelihood for Prior and Means

• Estimate for the prior probability
  – The MLE for π is π = N1 / N, the fraction of points assigned to class C1
• Estimates for the class means
  – The MLE for μ1 is the mean of all input vectors xn assigned to class C1:
      μ1 = (1/N1) Σn tn xn
    and similarly μ2 = (1/N2) Σn (1 − tn) xn

Max Likelihood for Covariance Matrix

• Solution for the shared covariance matrix
  – Pick out the terms in the log-likelihood function that depend on Σ
  – Maximizing gives a weighted average of the two separate class covariance matrices:
      Σ = (N1/N) S1 + (N2/N) S2
    where Sk is the covariance of the data points assigned to class Ck
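The sketch below (Python/NumPy, on a small synthetic data set) implements these maximum-likelihood estimates for the two-class case: the prior as the fraction of points in C1, the class means, and the shared covariance as the weighted average of the per-class covariance matrices.

    import numpy as np

    def fit_generative_gaussian(X, t):
        """ML estimates for the two-class shared-covariance Gaussian model.

        X: (N, D) array of inputs; t: (N,) array with t_n = 1 for class C1, 0 for C2.
        """
        N = len(t)
        N1, N2 = t.sum(), N - t.sum()
        pi = N1 / N                                   # prior p(C1): fraction of points in C1
        mu1 = X[t == 1].mean(axis=0)                  # mean of inputs assigned to C1
        mu2 = X[t == 0].mean(axis=0)
        S1 = np.cov(X[t == 1].T, bias=True)           # per-class covariances (ML, divide by N_k)
        S2 = np.cov(X[t == 0].T, bias=True)
        Sigma = (N1 / N) * S1 + (N2 / N) * S2         # shared covariance: weighted average
        return pi, mu1, mu2, Sigma

    # Tiny synthetic example (assumed data)
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal([1, 1], 1.0, size=(50, 2)),
                   rng.normal([-1, 0], 1.0, size=(70, 2))])
    t = np.concatenate([np.ones(50), np.zeros(70)]).astype(int)
    print(fit_generative_gaussian(X, t))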

Discrete Features

• Assume binary features xi ∈ {0, 1}
  – With M inputs, a general distribution would be a table of 2^M values
• Naive Bayes assumption: independent features
  – The class-conditional distributions then have the form
      p(x|Ck) = Π_{i=1..M} μki^xi (1 − μki)^(1−xi)
• Substituting into the form needed for the normalized exponential,
      ak(x) = ln[ p(x|Ck) p(Ck) ]
            = Σ_{i=1..M} { xi ln μki + (1 − xi) ln(1 − μki) } + ln p(Ck)
  which is linear in x
• Similar results hold for discrete variables that take more than two values
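A minimal sketch of this construction (Python/NumPy, with assumed Bernoulli parameters μki and priors): it evaluates the linear ak(x), checks it against a direct evaluation of ln p(x|Ck) p(Ck) under the naive Bayes model, and forms the posterior with the softmax.

    import numpy as np

    def a_k(x, mu_k, prior_k):
        """a_k(x) = sum_i [x_i ln mu_ki + (1 - x_i) ln(1 - mu_ki)] + ln p(C_k): linear in x."""
        return x @ np.log(mu_k) + (1 - x) @ np.log(1 - mu_k) + np.log(prior_k)

    # Illustrative two-class problem with M = 4 binary features (assumed values)
    mu = np.array([[0.8, 0.6, 0.1, 0.3],    # mu_1i for class C1
                   [0.2, 0.5, 0.7, 0.9]])   # mu_2i for class C2
    priors = np.array([0.5, 0.5])
    x = np.array([1, 1, 0, 1])

    a = np.array([a_k(x, mu[k], priors[k]) for k in range(2)])

    # Direct check: a_k should equal ln[ p(x|C_k) p(C_k) ] under the naive Bayes model
    p_x_given_k = np.prod(mu ** x * (1 - mu) ** (1 - x), axis=1)
    assert np.allclose(a, np.log(p_x_given_k * priors))

    # Posterior via the softmax of the a_k
    e = np.exp(a - a.max())
    print(e / e.sum())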

Exponential Family
• We have seen that for both Gaussian
distributed and discrete inputs, the posterior
class probabilities are given by generalized
linear models with logistic sigmoid (K=2) or
softmax (K≥2) activation functions
• These are particular cases of a more general
result obtained by assuming that the class-
conditional densities p(x|Ck) are members of the
exponential family of distributions


Exponential Family Definition

• Class-conditionals that belong to the exponential family have the general form
      p(x | λk) = h(x) g(λk) exp{ λkᵀ u(x) }
  – where λk are the natural parameters of the distribution, u(x) is a function of x, and g(λk) is a coefficient that ensures the distribution is normalized
• Restricting attention to the subclass of such distributions for which u(x) = x, and introducing a scaling parameter s, we obtain the form
      p(x | λk, s) = (1/s) h(x/s) g(λk) exp{ (1/s) λkᵀ x }
• Note that each class has its own parameter vector λk, but the classes share the scale parameter s

Exponential Family: Sigmoidal Form

• For the two-class problem
  – Substitute the expressions for the class-conditional densities into
        a = ln[ p(x|C1) p(C1) / ( p(x|C2) p(C2) ) ]
    and we see that the posterior probability is given by a logistic sigmoid acting on a linear function a(x):
        a(x) = (λ1 − λ2)ᵀ x + ln g(λ1) − ln g(λ2) + ln p(C1) − ln p(C2)
• For the K-class problem
  – Substituting the class-conditional density expression into ak = ln[ p(x|Ck) p(Ck) ] gives
        ak(x) = λkᵀ x + ln g(λk) + ln p(Ck)
  – which is again a linear function of x
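As one concrete illustration (not from the slides), the Poisson distribution with rate r is an exponential-family member with u(x) = x: writing the natural parameter as λ = ln r gives h(x) = 1/x! and g(λ) = exp(−e^λ). The sketch below (Python, with assumed rates and priors) builds a(x) from the two-class formula above and checks that it equals the log posterior odds computed directly from the Poisson pmfs.

    import numpy as np
    from math import factorial, log

    # Assumed Poisson rates and priors for the two classes
    r1, r2 = 4.0, 1.5
    p1, p2 = 0.5, 0.5
    lam1, lam2 = np.log(r1), np.log(r2)          # natural parameters lambda = ln(rate)
    ln_g = lambda lam: -np.exp(lam)              # ln g(lambda) for the Poisson

    def a_linear(x):
        """a(x) = (lambda1 - lambda2) x + ln g(lambda1) - ln g(lambda2) + ln p(C1) - ln p(C2)."""
        return (lam1 - lam2) * x + ln_g(lam1) - ln_g(lam2) + np.log(p1 / p2)

    def a_direct(x):
        """Log posterior odds computed directly from the Poisson pmfs."""
        def log_pmf(x, r):
            return x * log(r) - r - log(factorial(x))
        return log_pmf(x, r1) - log_pmf(x, r2) + np.log(p1 / p2)

    for x in range(0, 8):
        assert np.isclose(a_linear(x), a_direct(x))   # a(x) is linear in x, as the general result states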

Summary of probabilistic linear classifiers

• Defined using
  – the logistic sigmoid:
      p(C1|x) = σ(a), where a is the log-likelihood ratio with Bayes odds
  – the softmax function:
      p(Ck|x) = exp(ak) / Σj exp(aj)
• In the continuous case with a shared covariance matrix, we get linear functions of the input x
• In the discrete case with independent features, we also get linear functions of x
