

Linear Classification:
Probabilistic Generative Models

Sargur N. Srihari
University at Buffalo, State University of New York
USA


Linear Classification using Probabilistic Generative Models
• Topics
1. Overview (Generative vs Discriminative)
2. Bayes Classifier
• using Logistic Sigmoid and Softmax
3. Continuous inputs
• Gaussian Distributed Class-conditionals
– Parameter Estimation
4. Discrete Features
5. Exponential Family

Overview of Methods for Classification


1. Generative Models (Two-step)
1. Infer class-conditional densities p(x|Ck) and priors p(Ck)
2. Use Bayes theorem to determine posterior probabilities
      p(Ck|x) = p(x|Ck) p(Ck) / p(x)
2. Discriminative Models (One-step)
– Directly infer posterior probabilities p(Ck|x)
• Decision Theory
– In both cases use decision theory to assign each new x
to a class


Generative Model
• Model class conditionals p(x|Ck), priors p(Ck)
• Compute posteriors p(Ck|x) from Bayes theorem
• Two-class case
  – Posterior for class C1, using p(x) = Σi p(x,Ci) = Σi p(x|Ci) p(Ci):

      p(C1|x) = p(x|C1) p(C1) / [ p(x|C1) p(C1) + p(x|C2) p(C2) ]
              = 1 / (1 + exp(−a)) = σ(a)

    where a = ln[ p(x|C1) p(C1) / ( p(x|C2) p(C2) ) ]
    is the log-likelihood ratio (LLR) with Bayes odds

Logistic Sigmoid Function

      σ(a) = 1 / (1 + exp(−a))

  Property: σ(−a) = 1 − σ(a)
  Inverse:  a = ln[ σ / (1 − σ) ]

• If σ(a) = p(C1|x), then the inverse represents a = ln[ p(C1|x) / p(C2|x) ],
  the log ratio of probabilities, called the logit or log odds
• Sigmoid: an S-shaped or squashing function; it maps a real a ∈ (−∞, +∞) to the finite interval (0, 1)

[Plot: σ(a) for a in (−5, 5); the dotted line is the scaled probit function, the cdf of a zero-mean, unit-variance Gaussian.]
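To make these properties concrete, here is a minimal Python/NumPy sketch (not from the slides) that evaluates the sigmoid, checks the symmetry property σ(−a) = 1 − σ(a), and confirms that the logit (log odds) inverts it.

    import numpy as np

    def sigmoid(a):
        """Logistic sigmoid: maps a real number to the interval (0, 1)."""
        return 1.0 / (1.0 + np.exp(-a))

    def logit(s):
        """Inverse of the sigmoid: the log odds ln(s / (1 - s))."""
        return np.log(s / (1.0 - s))

    a = np.linspace(-5, 5, 11)
    s = sigmoid(a)

    # Symmetry property: sigma(-a) = 1 - sigma(a)
    assert np.allclose(sigmoid(-a), 1.0 - s)

    # The logit (log odds) recovers the original argument a
    assert np.allclose(logit(s), a)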


Generalizations and Special Cases


• More than 2 classes
• Gaussian Distribution of x
• Discrete Features
• Exponential Family


Softmax: Generalization of the Logistic Sigmoid

• For K = 2 we used the logistic sigmoid
      p(C1|x) = σ(a),  where a = ln[ p(x|C1) p(C1) / ( p(x|C2) p(C2) ) ]   (log ratio of probabilities)
      and σ(a) = 1 / (1 + exp(−a))
• For K > 2 we can use its generalization
      p(Ck|x) = p(x|Ck) p(Ck) / Σj p(x|Cj) p(Cj)
              = exp(ak) / Σj exp(aj)
  – The quantities ak are defined by ak = ln[ p(x|Ck) p(Ck) ]
• If K = 2 this reduces to a sigmoid:
      p(C1|x) = exp(a1) / [ exp(a1) + exp(a2) ]
              = 1 / [ 1 + exp(a2 − a1) ]
              = 1 / [ 1 + exp( ln p(x|C2)p(C2) − ln p(x|C1)p(C1) ) ]
              = 1 / [ 1 + p(x|C2)p(C2) / ( p(x|C1)p(C1) ) ]
              = 1 / [ 1 + exp(−a) ],  where a = ln[ p(x|C1) p(C1) / ( p(x|C2) p(C2) ) ]
• Known as the softmax function
  – since it is a smoothed version of the max function:
    if ak >> aj for all j ≠ k, then p(Ck|x) ≈ 1 and the rest ≈ 0
  – a general technique for smoothly approximating the max of several ak
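A short sketch of the normalized exponential (Python/NumPy, illustrative values only): it shows that for K = 2 the softmax agrees with the sigmoid of a = a1 − a2, and that when one ak dominates the corresponding posterior approaches 1.

    import numpy as np

    def softmax(a):
        """Normalized exponential over the activations a_k (numerically stabilized)."""
        e = np.exp(a - np.max(a))          # subtracting max(a) does not change the result
        return e / e.sum()

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    a = np.array([1.3, -0.4])              # a_k = ln p(x|C_k) p(C_k) for two classes
    p = softmax(a)

    # For K = 2, the softmax reduces to a sigmoid of a = a1 - a2
    assert np.isclose(p[0], sigmoid(a[0] - a[1]))

    # Smoothed max: if one a_k dominates, its posterior approaches 1
    print(softmax(np.array([10.0, 0.0, -2.0])))   # ~[1.0, 4.5e-5, 6.1e-6]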

Specific forms of class-conditionals


• We will next see that linear classifiers occur
both in continuous and discrete cases as
consequences of choosing specific forms of the
class-conditional densities p(x|Ck)
• Looking first at continuous input variables x
• Then discussing discrete inputs

Continuous Inputs: Gaussians
• Assume Gaussian class-conditional densities
with same covariance matrix Σ
      p(x|Ck) = 1 / ( (2π)^(D/2) |Σ|^(1/2) ) · exp{ −(1/2) (x − μk)ᵀ Σ⁻¹ (x − μk) }

• Consider first the two-class case
  – Substituting into p(C1|x) = σ( ln[ p(x|C1) p(C1) / ( p(x|C2) p(C2) ) ] )
  – and rearranging, we get p(C1|x) = σ(wᵀx + w0)
    where
      w  = Σ⁻¹ (μ1 − μ2)
      w0 = −(1/2) μ1ᵀ Σ⁻¹ μ1 + (1/2) μ2ᵀ Σ⁻¹ μ2 + ln[ p(C1) / p(C2) ]
  – The quadratic terms in x from the exponents of the Gaussians have cancelled due to the common covariance matrix
• The argument of the logistic sigmoid is a linear function of x
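As a concrete check of this result, the sketch below (Python with NumPy/SciPy; the means, covariance, and priors are made-up illustrative values) computes w and w0 from the formulas above and verifies that σ(wᵀx + w0) matches the posterior obtained directly from Bayes' theorem.

    import numpy as np
    from scipy.stats import multivariate_normal as mvn

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    # Illustrative parameters (assumed, not from the slides)
    mu1, mu2 = np.array([1.0, 1.0]), np.array([-1.0, 0.0])
    Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])          # shared covariance
    p1, p2 = 0.4, 0.6                                   # priors p(C1), p(C2)

    Sigma_inv = np.linalg.inv(Sigma)
    w = Sigma_inv @ (mu1 - mu2)
    w0 = (-0.5 * mu1 @ Sigma_inv @ mu1
          + 0.5 * mu2 @ Sigma_inv @ mu2
          + np.log(p1 / p2))

    x = np.array([0.5, -0.2])
    posterior_linear = sigmoid(w @ x + w0)

    # Direct Bayes computation for comparison
    num = mvn.pdf(x, mu1, Sigma) * p1
    den = num + mvn.pdf(x, mu2, Sigma) * p2
    assert np.isclose(posterior_linear, num / den)
    print(posterior_linear)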

Two Gaussian Classes

Two-dimensional input space x = (x1, x2).
[Figure: one panel shows the class-conditional densities p(x|Ck); the values are positive but need not sum to 1. The other panel shows the posterior p(C1|x), a logistic sigmoid of a linear function of x, giving a linear decision boundary. Red ink is proportional to p(C1|x) and blue ink to p(C2|x) = 1 − p(C1|x); a pure color indicates a value of 1 or 0.]

Continuous case with K > 2

      p(Ck|x) = p(x|Ck) p(Ck) / Σj p(x|Cj) p(Cj)
              = exp(ak) / Σj exp(aj)

• With Gaussian class-conditionals
      ak(x) = wkᵀ x + wk0
  – where
      wk  = Σ⁻¹ μk
      wk0 = −(1/2) μkᵀ Σ⁻¹ μk + ln p(Ck)
  – The quadratic terms cancel, thereby leading to linearity
  – If we do not assume a shared covariance matrix, we get a quadratic discriminant
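A sketch of the K > 2 case (Python with NumPy/SciPy, using assumed means, shared covariance, and priors): it builds ak(x) = wkᵀx + wk0 for each class, applies the softmax, and checks the result against a direct Bayes computation.

    import numpy as np
    from scipy.stats import multivariate_normal as mvn

    def gaussian_softmax_posteriors(x, mus, Sigma, priors):
        """Posteriors p(C_k|x) for Gaussian class-conditionals with a shared covariance."""
        Sigma_inv = np.linalg.inv(Sigma)
        a = []
        for mu, prior in zip(mus, priors):
            w_k = Sigma_inv @ mu
            w_k0 = -0.5 * mu @ Sigma_inv @ mu + np.log(prior)
            a.append(w_k @ x + w_k0)                 # a_k(x) = w_k^T x + w_k0
        a = np.array(a)
        e = np.exp(a - a.max())                      # softmax, numerically stabilized
        return e / e.sum()

    # Illustrative three-class problem (assumed values)
    mus = [np.array([0.0, 2.0]), np.array([2.0, 0.0]), np.array([-2.0, -1.0])]
    Sigma = np.eye(2)
    priors = [1/3, 1/3, 1/3]
    x = np.array([0.5, 0.5])

    posteriors = gaussian_softmax_posteriors(x, mus, Sigma, priors)

    # Direct Bayes computation for comparison (the x-dependent quadratic terms cancel)
    direct = np.array([mvn.pdf(x, mu, Sigma) * pr for mu, pr in zip(mus, priors)])
    assert np.allclose(posteriors, direct / direct.sum())
    print(posteriors)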

Three-class case with Gaussian models

Both linear and quadratic decision boundaries occur.

[Figure: one panel shows the class-conditional densities, the other the posterior probabilities, with RGB values corresponding to the posteriors. C1 and C2 have the same covariance matrix, so the boundary between them is linear; the other boundaries are quadratic.]

Maximum Likelihood Solutions


• Once we have specified parametric functional forms
  – for the class-conditional densities p(x|Ck)
  – we can determine the parameters, together with the prior probabilities p(Ck), using maximum likelihood
• This requires a data set of observations x along with their class labels

M.L.E. for Gaussian Parameters

• Assuming parametric forms for p(x|Ck), we can determine the values of the parameters and the priors p(Ck) using maximum likelihood
• For the two-class case, writing the prior as p(C1) = π and using targets tn = 1 for class C1 and tn = 0 for class C2, the likelihood is

      p(t | π, μ1, μ2, Σ) = Πn [ π N(xn | μ1, Σ) ]^tn [ (1 − π) N(xn | μ2, Σ) ]^(1 − tn)

  where t = (t1, .., tN)ᵀ
• It is convenient to maximize the log of the likelihood

Max Likelihood for Prior and Means

• Estimate for the prior probability
  – The MLE for π is π = N1 / N, the fraction of points assigned to class C1
• Estimates for the class means
  – The MLE for μ1 is the mean of all input vectors xn assigned to class C1:
      μ1 = (1/N1) Σn tn xn
    and similarly μ2 = (1/N2) Σn (1 − tn) xn

Max Likelihood for Covariance Matrix

• Solution for the shared covariance matrix
  – Pick out the terms in the log-likelihood function that depend on Σ
  – Maximizing gives a weighted average of the two separate class covariance matrices:
      Σ = (N1/N) S1 + (N2/N) S2
    where Sk is the covariance of the data points assigned to class Ck
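The sketch below (Python/NumPy, on a small synthetic data set) implements these maximum-likelihood estimates for the two-class case: the prior as the fraction of points in C1, the class means, and the shared covariance as the weighted average of the per-class covariance matrices.

    import numpy as np

    def fit_generative_gaussian(X, t):
        """ML estimates for the two-class shared-covariance Gaussian model.

        X: (N, D) array of inputs; t: (N,) array with t_n = 1 for class C1, 0 for C2.
        """
        N = len(t)
        N1, N2 = t.sum(), N - t.sum()
        pi = N1 / N                                   # prior p(C1): fraction of points in C1
        mu1 = X[t == 1].mean(axis=0)                  # mean of inputs assigned to C1
        mu2 = X[t == 0].mean(axis=0)
        S1 = np.cov(X[t == 1].T, bias=True)           # per-class covariances (ML, divide by N_k)
        S2 = np.cov(X[t == 0].T, bias=True)
        Sigma = (N1 / N) * S1 + (N2 / N) * S2         # shared covariance: weighted average
        return pi, mu1, mu2, Sigma

    # Tiny synthetic example (assumed data)
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal([1, 1], 1.0, size=(50, 2)),
                   rng.normal([-1, 0], 1.0, size=(70, 2))])
    t = np.concatenate([np.ones(50), np.zeros(70)]).astype(int)
    print(fit_generative_gaussian(X, t))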

Discrete Features

• Assume binary features xi ∈ {0, 1}
  – With M inputs, a general distribution would be a table of 2^M values
• Naive Bayes assumption: independent features
  – The class-conditional distributions then have the form
      p(x|Ck) = Π_{i=1..M} μki^xi (1 − μki)^(1−xi)
• Substituting into the form needed for the normalized exponential,
      ak(x) = ln[ p(x|Ck) p(Ck) ]
            = Σ_{i=1..M} { xi ln μki + (1 − xi) ln(1 − μki) } + ln p(Ck)
  which is linear in x
• Similar results hold for discrete variables that take more than two values
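A minimal sketch of this construction (Python/NumPy, with assumed Bernoulli parameters μki and priors): it evaluates the linear ak(x), checks it against a direct evaluation of ln p(x|Ck) p(Ck) under the naive Bayes model, and forms the posterior with the softmax.

    import numpy as np

    def a_k(x, mu_k, prior_k):
        """a_k(x) = sum_i [x_i ln mu_ki + (1 - x_i) ln(1 - mu_ki)] + ln p(C_k): linear in x."""
        return x @ np.log(mu_k) + (1 - x) @ np.log(1 - mu_k) + np.log(prior_k)

    # Illustrative two-class problem with M = 4 binary features (assumed values)
    mu = np.array([[0.8, 0.6, 0.1, 0.3],    # mu_1i for class C1
                   [0.2, 0.5, 0.7, 0.9]])   # mu_2i for class C2
    priors = np.array([0.5, 0.5])
    x = np.array([1, 1, 0, 1])

    a = np.array([a_k(x, mu[k], priors[k]) for k in range(2)])

    # Direct check: a_k should equal ln[ p(x|C_k) p(C_k) ] under the naive Bayes model
    p_x_given_k = np.prod(mu ** x * (1 - mu) ** (1 - x), axis=1)
    assert np.allclose(a, np.log(p_x_given_k * priors))

    # Posterior via the softmax of the a_k
    e = np.exp(a - a.max())
    print(e / e.sum())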

Exponential Family
• We have seen that for both Gaussian
distributed and discrete inputs, the posterior
class probabilities are given by generalized
linear models with logistic sigmoid (K=2) or
softmax (K≥2) activation functions
• These are particular cases of a more general
result obtained by assuming that the class-
conditional densities p(x|Ck) are members of the
exponential family of distributions


Exponential Family Definition

• Class-conditionals that belong to the exponential family have the general form
      p(x | λk) = h(x) g(λk) exp{ λkᵀ u(x) }
  – where λk are the natural parameters of the distribution, u(x) is a function of x, and g(λk) is a coefficient that ensures the distribution is normalized
• Restricting attention to the subclass of such distributions for which u(x) = x, and introducing a scaling parameter s, we obtain the form
      p(x | λk, s) = (1/s) h(x/s) g(λk) exp{ (1/s) λkᵀ x }
• Note that each class has its own parameter vector λk, but the classes share the scale parameter s

Exponential Family: Sigmoidal Form

• For the two-class problem
  – Substitute the expressions for the class-conditional densities into
        a = ln[ p(x|C1) p(C1) / ( p(x|C2) p(C2) ) ]
    and we see that the posterior probability is given by a logistic sigmoid acting on a linear function a(x):
        a(x) = (λ1 − λ2)ᵀ x + ln g(λ1) − ln g(λ2) + ln p(C1) − ln p(C2)
• For the K-class problem
  – Substituting the class-conditional density expression into ak = ln[ p(x|Ck) p(Ck) ] gives
        ak(x) = λkᵀ x + ln g(λk) + ln p(Ck)
  – which is again a linear function of x
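As one concrete illustration (not from the slides), the Poisson distribution with rate r is an exponential-family member with u(x) = x: writing the natural parameter as λ = ln r gives h(x) = 1/x! and g(λ) = exp(−e^λ). The sketch below (Python, with assumed rates and priors) builds a(x) from the two-class formula above and checks that it equals the log posterior odds computed directly from the Poisson pmfs.

    import numpy as np
    from math import factorial, log

    # Assumed Poisson rates and priors for the two classes
    r1, r2 = 4.0, 1.5
    p1, p2 = 0.5, 0.5
    lam1, lam2 = np.log(r1), np.log(r2)          # natural parameters lambda = ln(rate)
    ln_g = lambda lam: -np.exp(lam)              # ln g(lambda) for the Poisson

    def a_linear(x):
        """a(x) = (lambda1 - lambda2) x + ln g(lambda1) - ln g(lambda2) + ln p(C1) - ln p(C2)."""
        return (lam1 - lam2) * x + ln_g(lam1) - ln_g(lam2) + np.log(p1 / p2)

    def a_direct(x):
        """Log posterior odds computed directly from the Poisson pmfs."""
        def log_pmf(x, r):
            return x * log(r) - r - log(factorial(x))
        return log_pmf(x, r1) - log_pmf(x, r2) + np.log(p1 / p2)

    for x in range(0, 8):
        assert np.isclose(a_linear(x), a_direct(x))   # a(x) is linear in x, as the general result states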

Summary of probabilistic linear classifiers

• Defined using
  – the logistic sigmoid:
      p(C1|x) = σ(a), where a is the log-likelihood ratio with Bayes odds
  – the softmax function:
      p(Ck|x) = exp(ak) / Σj exp(aj)
• In the continuous case with a shared covariance matrix, we get linear functions of the input x
• In the discrete case with independent features, we also get linear functions of x
