
Chapter 3 (PR)

The document discusses parameter estimation using maximum-likelihood and Bayesian approaches. For maximum likelihood, parameters are viewed as fixed quantities estimated by maximizing the likelihood of the observed training examples. For Bayesian estimation, parameters are random variables whose prior distribution is transformed into a posterior distribution via Bayes' theorem. When the class-conditional probability density function (pdf) has a parametric form, maximum likelihood estimates the parameters by differentiating the log-likelihood function; for the Gaussian case with unknown mean and variance, the estimates are simply the sample mean and sample variance. Bayesian estimation proceeds in three phases: applying Bayes' formula to derive the posterior pdf of the parameters from their prior pdf and the training examples; integrating over this posterior to obtain the class-conditional pdf; and using the result for prediction.


Chapter 3

Maximum-Likelihood and
Bayesian Parameter Estimation

Pattern Recognition Soochow, Fall Semester 1


Bayes Theorem for Classification

To compute the posterior probability $P(\omega_i \mid \mathbf{x})$, we need to know:

 Prior probability: $P(\omega_i)$
 Likelihood: $p(\mathbf{x} \mid \omega_i)$

The collection of training examples is composed of $c$ data sets $\mathcal{D}_1, \dots, \mathcal{D}_c$
 Each example in $\mathcal{D}_i$ is drawn according to the class-conditional pdf, i.e. $\mathbf{x} \sim p(\mathbf{x} \mid \omega_i)$
 Examples in $\mathcal{D}_i$ are i.i.d. random variables, i.e. independent and identically distributed
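The quantities above can be put together numerically. The sketch below computes the posteriors $P(\omega_i \mid x)$ for a hypothetical two-class univariate problem; the priors and Gaussian class-conditional parameters are made-up illustration values, not anything from the lecture.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posteriors(x, priors, params):
    """P(w_i | x) = p(x | w_i) P(w_i) / p(x), with the evidence p(x) as normalizer."""
    joint = [gaussian_pdf(x, mu, s) * p for p, (mu, s) in zip(priors, params)]
    evidence = sum(joint)
    return [j / evidence for j in joint]

# Hypothetical two-class problem: priors and (mean, std) per class.
post = posteriors(1.0, priors=[0.6, 0.4], params=[(0.0, 1.0), (2.0, 1.0)])
```

At $x = 1$ both class-conditional densities are equal here, so the posterior ratio reduces to the prior ratio.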



Bayes Theorem for Classification (Cont.)
For prior probability: no difficulty — estimate $P(\omega_i) = |\mathcal{D}_i| / n$ (here $|\cdot|$ returns the cardinality, i.e. number of elements, of a set)

For class-conditional pdf:
Ch. 3  Case I: $p(\mathbf{x} \mid \omega_i)$ has a certain parametric form
e.g. Gaussian: $p(\mathbf{x} \mid \omega_i) \sim N(\boldsymbol{\mu}_i, \Sigma_i)$ (parameters: $\boldsymbol{\theta}_i = (\boldsymbol{\mu}_i, \Sigma_i)$)

To show the dependence of $p(\mathbf{x} \mid \omega_i)$ on $\boldsymbol{\theta}_i$ explicitly, write $p(\mathbf{x} \mid \omega_i, \boldsymbol{\theta}_i)$

Ch. 4  Case II: $p(\mathbf{x} \mid \omega_i)$ doesn't have a parametric form
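The counting estimate of the priors is a one-liner. The class labels below are purely illustrative:

```python
# ML estimate of the priors: P(w_i) = |D_i| / n, i.e. the fraction of
# training examples falling in each class (|.| is set cardinality).
labels = ["w1", "w2", "w1", "w1", "w2"]  # hypothetical class labels
n = len(labels)
priors = {c: labels.count(c) / n for c in sorted(set(labels))}
```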
Estimation Under Parametric Form
Parametric class-conditional pdf: $p(\mathbf{x} \mid \omega_i, \boldsymbol{\theta}_i)$

 Assumption I: Maximum-Likelihood (ML) estimation
View parameters as quantities whose values are fixed but unknown; estimate the parameter values by maximizing the likelihood (probability) of observing the actual training examples

 Assumption II: Bayesian estimation
View parameters as random variables having some known prior distribution; observation of the actual training examples transforms the parameters' prior distribution into a posterior distribution (via Bayes theorem)



Maximum-Likelihood Estimation
Settings
 Likelihood function for each category is governed by a set of fixed but unknown parameters, i.e. $p(\mathbf{x} \mid \omega_i, \boldsymbol{\theta}_i)$
 Task: estimate $\boldsymbol{\theta}_1, \dots, \boldsymbol{\theta}_c$ from $\mathcal{D}_1, \dots, \mathcal{D}_c$
A simplified treatment
 Examples in $\mathcal{D}_i$ give no information about $\boldsymbol{\theta}_j$ ($j \neq i$)
 Work with each category separately, and simplify the notation by dropping the subscripts w.r.t. categories without loss of generality: estimate $\boldsymbol{\theta}$ from $\mathcal{D} = \{\mathbf{x}_1, \dots, \mathbf{x}_n\}$


Maximum-Likelihood Estimation (Cont.)
 Parameters to be estimated: $\boldsymbol{\theta}$
 A set of i.i.d. examples: $\mathcal{D} = \{\mathbf{x}_1, \dots, \mathbf{x}_n\}$
 The objective function: $p(\mathcal{D} \mid \boldsymbol{\theta}) = \prod_{k=1}^{n} p(\mathbf{x}_k \mid \boldsymbol{\theta})$, the likelihood of $\boldsymbol{\theta}$ w.r.t. the set of observed examples
 The maximum-likelihood estimate: $\hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}}\; p(\mathcal{D} \mid \boldsymbol{\theta})$

Intuitively, $\hat{\boldsymbol{\theta}}$ best agrees with the actually observed examples
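To see the "best agreement" idea concretely, the sketch below evaluates the (log-)likelihood of a univariate Gaussian over a grid of candidate means and picks the maximizer; the toy data and the assumption of a known unit variance are illustrative only.

```python
import math

samples = [2.1, 1.9, 2.4, 1.6, 2.0]  # toy data; sigma assumed known = 1

def log_likelihood(mu, data, sigma=1.0):
    """l(theta) = sum_k ln p(x_k | theta) for a univariate Gaussian."""
    return sum(-0.5 * ((x - mu) / sigma) ** 2
               - math.log(sigma * math.sqrt(2 * math.pi)) for x in data)

# Coarse search over candidate means: the maximizer sits at the sample mean.
grid = [i / 1000 for i in range(0, 4001)]  # candidates 0.000 .. 4.000
mu_hat = max(grid, key=lambda m: log_likelihood(m, samples))
```

The grid search is purely didactic; the closed-form answer (the sample mean) is derived on the following slides.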



Maximum-Likelihood Estimation (Cont.)
Gradient Operator (梯度算子)
 Let $\boldsymbol{\theta} = (\theta_1, \dots, \theta_p)^t$ be a $p$-dimensional vector
 Let $l(\boldsymbol{\theta}) = \ln p(\mathcal{D} \mid \boldsymbol{\theta})$ be a $p$-variate real-valued function over $\boldsymbol{\theta}$; $l(\boldsymbol{\theta})$ is named the log-likelihood function
 The gradient operator: $\nabla_{\boldsymbol{\theta}} = \left(\frac{\partial}{\partial \theta_1}, \dots, \frac{\partial}{\partial \theta_p}\right)^t$



Maximum-Likelihood Estimation (Cont.)

$\nabla_{\boldsymbol{\theta}}\, l(\boldsymbol{\theta}) = \sum_{k=1}^{n} \nabla_{\boldsymbol{\theta}} \ln p(\mathbf{x}_k \mid \boldsymbol{\theta})$: a $p$-dimensional vector with each component being a function over $\boldsymbol{\theta}$ (not over $\mathbf{x}_k$)

Necessary condition for the ML estimate:
$\nabla_{\boldsymbol{\theta}}\, l(\boldsymbol{\theta}) = \mathbf{0}$ (a set of $p$ equations)
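The gradient-equals-zero condition can be checked numerically. For a univariate Gaussian with known variance, each component of the gradient w.r.t. $\mu$ is $(x_k - \mu)/\sigma^2$; at the closed-form ML solution the sum vanishes. The data below is a toy example.

```python
# For a Gaussian with known sigma, d l / d mu = sum_k (x_k - mu) / sigma^2.
# Setting it to zero yields the sample mean; verify on toy data.
samples = [0.5, 1.5, 2.5, 3.5]
sigma = 1.0
mu_ml = sum(samples) / len(samples)              # closed-form ML solution
grad = sum((x - mu_ml) / sigma**2 for x in samples)  # should be ~0 at mu_ml
```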



The Gaussian Case: Unknown $\boldsymbol{\mu}$

Suppose $p(\mathbf{x} \mid \boldsymbol{\mu}) \sim N(\boldsymbol{\mu}, \Sigma)$ where $\Sigma$ is known; then
$\ln p(\mathbf{x}_k \mid \boldsymbol{\mu}) = -\tfrac{1}{2} \ln\!\left[(2\pi)^p |\Sigma|\right] - \tfrac{1}{2} (\mathbf{x}_k - \boldsymbol{\mu})^t \Sigma^{-1} (\mathbf{x}_k - \boldsymbol{\mu})$
$\nabla_{\boldsymbol{\mu}} \ln p(\mathbf{x}_k \mid \boldsymbol{\mu}) = \Sigma^{-1} (\mathbf{x}_k - \boldsymbol{\mu})$



The Gaussian Case: Unknown $\boldsymbol{\mu}$ (Cont.)

Necessary condition for the ML estimate $\hat{\boldsymbol{\mu}}$:
$\sum_{k=1}^{n} \Sigma^{-1} (\mathbf{x}_k - \hat{\boldsymbol{\mu}}) = \mathbf{0}$
Multiplying by $\Sigma$ on both sides gives
$\hat{\boldsymbol{\mu}} = \frac{1}{n} \sum_{k=1}^{n} \mathbf{x}_k$

Intuitive result: the ML estimate for the unknown $\boldsymbol{\mu}$ is just the arithmetic average of the training samples — the sample mean.



The Gaussian Case: Unknown $\mu$ and $\sigma^2$

Consider the univariate case: $p(x \mid \boldsymbol{\theta}) \sim N(\mu, \sigma^2)$ with $\boldsymbol{\theta} = (\theta_1, \theta_2)^t = (\mu, \sigma^2)^t$ both unknown



The Gaussian Case: Unknown $\mu$ and $\sigma^2$ (Cont.)

$\ln p(x_k \mid \boldsymbol{\theta}) = -\tfrac{1}{2} \ln(2\pi\theta_2) - \tfrac{1}{2\theta_2} (x_k - \theta_1)^2$

Necessary conditions for the ML estimates $\hat{\theta}_1$ and $\hat{\theta}_2$:
$\sum_{k=1}^{n} \frac{1}{\hat{\theta}_2} (x_k - \hat{\theta}_1) = 0, \qquad -\sum_{k=1}^{n} \frac{1}{2\hat{\theta}_2} + \sum_{k=1}^{n} \frac{(x_k - \hat{\theta}_1)^2}{2\hat{\theta}_2^2} = 0$



The Gaussian Case: Unknown $\mu$ and $\sigma^2$ (Cont.)

ML estimates in the univariate case:
$\hat{\mu} = \frac{1}{n} \sum_{k=1}^{n} x_k, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{k=1}^{n} (x_k - \hat{\mu})^2$


The Gaussian Case: Unknown $\boldsymbol{\mu}$ and $\Sigma$ (Cont.)

ML estimates in the multivariate case — intuitive results as well:
$\hat{\boldsymbol{\mu}} = \frac{1}{n} \sum_{k=1}^{n} \mathbf{x}_k$ (arithmetic average of $n$ vectors)
$\hat{\Sigma} = \frac{1}{n} \sum_{k=1}^{n} (\mathbf{x}_k - \hat{\boldsymbol{\mu}})(\mathbf{x}_k - \hat{\boldsymbol{\mu}})^t$ (arithmetic average of $n$ matrices)
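The multivariate averages can be sketched with NumPy; the 2-D data below is illustrative, and dividing by $n$ keeps the (biased) ML form of the covariance.

```python
import numpy as np

# ML estimates in the multivariate case: the sample mean of the n vectors
# and the average of the n outer-product matrices (division by n, so the
# covariance estimate is biased). Toy 2-D data.
X = np.array([[1.0, 2.0], [3.0, 0.0], [2.0, 4.0], [2.0, 2.0]])
n = len(X)
mu_hat = X.mean(axis=0)
Sigma_hat = sum(np.outer(x - mu_hat, x - mu_hat) for x in X) / n
```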



Bayesian Estimation
Settings
 The parametric form of the likelihood function $p(\mathbf{x} \mid \omega_i, \boldsymbol{\theta}_i)$ for each category is known
 However, $\boldsymbol{\theta}_i$ is considered to be a random variable instead of a fixed (but unknown) value

In this case, we can no longer make a single ML estimate $\hat{\boldsymbol{\theta}}_i$ and then infer $P(\omega_i \mid \mathbf{x})$ based on $p(\mathbf{x} \mid \omega_i, \hat{\boldsymbol{\theta}}_i)$ and $P(\omega_i)$

How can we proceed in this situation? Fully exploit the training examples!



Bayesian Estimation (Cont.)

Eq.22 [pp.91]
Two assumptions

Eq.23 [pp.91]



Bayesian Estimation (Cont.)
Key problem: determine $p(\mathbf{x} \mid \mathcal{D})$

Treat each class independently, and simplify the class-conditional pdf notation $p(\mathbf{x} \mid \omega_i, \mathcal{D}_i)$ as $p(\mathbf{x} \mid \mathcal{D})$:

$p(\mathbf{x} \mid \mathcal{D}) = \int p(\mathbf{x}, \boldsymbol{\theta} \mid \mathcal{D})\, d\boldsymbol{\theta} = \int p(\mathbf{x} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta} \mid \mathcal{D})\, d\boldsymbol{\theta}$

($\boldsymbol{\theta}$ is the random variable w.r.t. the parametric form; $\mathbf{x}$ is independent of $\mathcal{D}$ given $\boldsymbol{\theta}$)


Bayesian Estimation: The General Procedure

Phase I: prior pdf → posterior pdf (for $\boldsymbol{\theta}$)

Combining the parametric form $p(\mathbf{x} \mid \boldsymbol{\theta})$, the training set $\mathcal{D}$, and the prior pdf $p(\boldsymbol{\theta})$ via the Bayes formula yields the posterior pdf:
$p(\boldsymbol{\theta} \mid \mathcal{D}) = \dfrac{p(\mathcal{D} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})}{\int p(\mathcal{D} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})\, d\boldsymbol{\theta}}$



Bayesian Estimation: The General Procedure (Cont.)

Phase II: posterior pdf (for $\boldsymbol{\theta}$) → class-conditional pdf (for $\mathbf{x}$)

Combining the parametric form $p(\mathbf{x} \mid \boldsymbol{\theta})$ with the posterior pdf via the law of total probability yields the class-conditional pdf:
$p(\mathbf{x} \mid \mathcal{D}) = \int p(\mathbf{x} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta} \mid \mathcal{D})\, d\boldsymbol{\theta}$

Phase III: prediction with $P(\omega_i \mid \mathbf{x}, \mathcal{D})$ via the Bayes theorem


The Gaussian Case: Unknown $\mu$

Consider the univariate case: $p(x \mid \mu) \sim N(\mu, \sigma^2)$ ($\sigma^2$ is known)

Phase I: prior pdf → posterior pdf (for $\mu$)

 Gaussian parametric form: $p(x \mid \mu) \sim N(\mu, \sigma^2)$
 The prior pdf is assumed to take Gaussian form as well: $p(\mu) \sim N(\mu_0, \sigma_0^2)$ (other forms of prior pdf could be assumed too)

How would $p(\mu \mid \mathcal{D})$ look in this case?



The Gaussian Case: Unknown $\mu$ (Cont.)

$p(\mu \mid \mathcal{D}) = \dfrac{p(\mathcal{D} \mid \mu)\, p(\mu)}{\int p(\mathcal{D} \mid \mu)\, p(\mu)\, d\mu} = \alpha\, p(\mu) \prod_{k=1}^{n} p(x_k \mid \mu)$

($\alpha = 1 \big/ \int p(\mathcal{D} \mid \mu)\, p(\mu)\, d\mu$ is a constant not related to $\mu$; examples in $\mathcal{D}$ are i.i.d.)


The Gaussian Case: Unknown $\mu$ (Cont.)

$p(\mu \mid \mathcal{D})$ is an exponential function of a quadratic function of $\mu$, so $p(\mu \mid \mathcal{D})$ is a normal pdf as well: $p(\mu \mid \mathcal{D}) \sim N(\mu_n, \sigma_n^2)$



The Gaussian Case: Unknown $\mu$ (Cont.)

Equating the coefficients in both forms gives:
$\mu_n = \left(\dfrac{n\sigma_0^2}{n\sigma_0^2 + \sigma^2}\right)\hat{m}_n + \dfrac{\sigma^2}{n\sigma_0^2 + \sigma^2}\,\mu_0, \qquad \sigma_n^2 = \dfrac{\sigma_0^2\, \sigma^2}{n\sigma_0^2 + \sigma^2}$
where $\hat{m}_n = \frac{1}{n} \sum_{k=1}^{n} x_k$ is the sample mean.
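The posterior parameters $\mu_n$ and $\sigma_n^2$ obtained by equating coefficients can be computed directly. The data, prior mean/variance, and known variance below are illustrative values only.

```python
# Conjugate Gaussian update: with a N(mu0, sigma0^2) prior on mu and a
# known data variance sigma^2, the posterior p(mu | D) is N(mu_n, sigma_n^2).
samples = [1.2, 0.8, 1.0, 1.4, 0.6]        # toy observations
n = len(samples)
m_n = sum(samples) / n                     # sample mean
mu0, sigma0_sq = 0.0, 1.0                  # hypothetical prior mean, variance
sigma_sq = 1.0                             # known likelihood variance

denom = n * sigma0_sq + sigma_sq
mu_n = (n * sigma0_sq / denom) * m_n + (sigma_sq / denom) * mu0
sigma_n_sq = sigma0_sq * sigma_sq / denom
```

Note how $\mu_n$ is a convex combination of the sample mean and the prior mean, with the data term dominating as $n$ grows.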



The Gaussian Case: Unknown $\mu$ (Cont.)

Phase II: posterior pdf (for $\mu$) → class-conditional pdf (for $x$)

$p(x \mid \mathcal{D}) = \int p(x \mid \mu)\, p(\mu \mid \mathcal{D})\, d\mu$

How would $p(x \mid \mathcal{D})$ look in this case?



The Gaussian Case: Unknown $\mu$ (Cont.)

$p(x \mid \mathcal{D})$ is an exponential function of a quadratic function of $x$, so it is a normal pdf as well (Eq.36 [pp.95]):
$p(x \mid \mathcal{D}) \sim N(\mu_n, \sigma^2 + \sigma_n^2)$

Then Phase III follows naturally for prediction (Eq.25 [pp.92])
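A sketch of the resulting predictive density: the posterior uncertainty $\sigma_n^2$ simply widens the Gaussian. The posterior parameters below are illustrative numbers, not derived from any particular data set.

```python
import math

# Predictive density after integrating out mu: p(x | D) = N(mu_n, sigma^2
# + sigma_n^2). The extra sigma_n^2 reflects remaining uncertainty about mu.
mu_n, sigma_n_sq = 0.9, 0.2     # hypothetical posterior parameters
sigma_sq = 1.0                  # known likelihood variance

def predictive_pdf(x):
    var = sigma_sq + sigma_n_sq
    return math.exp(-0.5 * (x - mu_n) ** 2 / var) / math.sqrt(2 * math.pi * var)
```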



The Gaussian Case: Unknown $\boldsymbol{\mu}$ (Multivariate)

$p(\mathbf{x} \mid \boldsymbol{\mu}) \sim N(\boldsymbol{\mu}, \Sigma)$ and $p(\boldsymbol{\mu}) \sim N(\boldsymbol{\mu}_0, \Sigma_0)$ ($\Sigma$ is known)



Summary
 Key issue for PR
 Estimate prior and class‐conditional pdf from 
training set
 Basic assumption on training examples: i.i.d.
 Two strategies for the key issue
 Parametric form for class‐conditional pdf
 Maximum likelihood (ML) estimation

 Bayesian estimation

 No parametric form for class‐conditional pdf
Summary (Cont.)
 Maximum likelihood estimation
 Settings: parameters as fixed but unknown values

 The objective function: Log‐likelihood function

 Necessary conditions for ML estimation: gradient 
for the objective function should be zero vector

 The Gaussian case
 Unknown $\mu$
 Unknown $\mu$ and $\Sigma$



Summary (Cont.)
 Bayesian estimation
 Settings: parameters as random variables

 The general procedure
 Phase I: prior pdf → posterior pdf (for $\boldsymbol{\theta}$)

 Phase II: posterior pdf (for $\boldsymbol{\theta}$) → class-conditional pdf (for $\mathbf{x}$)

 Phase III: prediction (Eq.22 [pp.91])

 The Gaussian case
 Unknown $\mu$: univariate and multivariate

