
MIMA Group


Chapter 3
Parameter Estimation

Xin-Shun Xu @ SDU School of Computer Science and Technology, Shandong University


Contents

- Introduction
- Maximum-Likelihood Estimation
- Bayesian Estimation


Bayesian Theorem

$$P(\omega_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_i)\, P(\omega_i)}{p(\mathbf{x})}, \qquad p(\mathbf{x}) = \sum_{j=1}^{c} p(\mathbf{x} \mid \omega_j)\, P(\omega_j)$$

To compute the posterior probability $P(\omega_i \mid \mathbf{x})$, we need to know $p(\mathbf{x} \mid \omega_i)$ and $P(\omega_i)$.

How can we get these values?


Samples

$D = \{D_1, D_2, \ldots, D_c\}$

The samples in $D_j$ are drawn independently according to the probability law $p(\mathbf{x} \mid \omega_j)$. That is, the examples in $D_j$ are i.i.d. random variables, i.e., independent and identically distributed.

[Figure: the training data partitioned into per-class subsets $D_1$, $D_2$, $D_3$.]

It is easy to compute the prior probability:

$$P(\omega_i) = \frac{|D_i|}{\sum_{j=1}^{c} |D_j|}$$
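As a concrete illustration, here is a minimal sketch (not from the slides; the helper names and the toy numbers are assumptions for illustration) that estimates the priors from per-class sample counts and then combines them with class-conditional densities via Bayes' theorem:

```python
import numpy as np

def estimate_priors(class_counts):
    """Estimate P(w_i) as |D_i| / sum_j |D_j| from per-class sample counts."""
    counts = np.asarray(class_counts, dtype=float)
    return counts / counts.sum()

def posteriors(likelihoods, priors):
    """Bayes' theorem: P(w_i | x) = p(x | w_i) P(w_i) / p(x)."""
    joint = np.asarray(likelihoods) * np.asarray(priors)
    return joint / joint.sum()  # dividing by p(x) = sum_j p(x | w_j) P(w_j)

# Toy example: 3 classes with 50, 30, and 20 training samples.
priors = estimate_priors([50, 30, 20])            # -> [0.5, 0.3, 0.2]
post = posteriors([0.10, 0.40, 0.05], priors)     # class-conditional densities at x
print(priors, post)
```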


Samples (Cont.)

For the class-conditional pdf, there are two cases:

- Case I: $p(\mathbf{x} \mid \omega_j)$ has a known parametric form, e.g.
  $$p(\mathbf{x} \mid \omega_j) \sim N(\boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)$$
  parameterized by $\boldsymbol{\theta}_j = (\theta_1, \theta_2, \ldots, \theta_m)^T$. If $\mathbf{x} \in \mathbb{R}^d$, then $\boldsymbol{\theta}_j$ contains $d + d(d+1)/2$ free parameters (see the sketch after this list).
- Case II: $p(\mathbf{x} \mid \omega_j)$ doesn't have a parametric form: next chapter.
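To make the parameter count concrete, here is a minimal sketch (an illustration, not from the slides) of where $d + d(d+1)/2$ comes from: $d$ entries for the mean vector plus the upper triangle, including the diagonal, of the symmetric covariance matrix.

```python
def gaussian_free_parameters(d: int) -> int:
    """Free parameters of a d-dimensional Gaussian N(mu, Sigma):
    d for the mean vector, d*(d+1)//2 for the symmetric covariance."""
    return d + d * (d + 1) // 2

# e.g. d = 3 -> 3 + 6 = 9 free parameters
assert gaussian_free_parameters(3) == 9
```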


Goal

$D = \{D_1, D_2, \ldots, D_c\}$

Use $D_j$ to estimate the unknown parameter vector $\boldsymbol{\theta}_j$ of

$$p(\mathbf{x} \mid \omega_j) = p(\mathbf{x} \mid \boldsymbol{\theta}_j), \qquad \boldsymbol{\theta}_j = (\theta_1, \theta_2, \ldots, \theta_m)^T$$

[Figure: each per-class subset $D_1$, $D_2$, $D_3$ yields its own estimate $\hat{\boldsymbol{\theta}}_1$, $\hat{\boldsymbol{\theta}}_2$, $\hat{\boldsymbol{\theta}}_3$.]


Estimation Under Parametric Form

- Maximum-Likelihood Estimation: view the parameters as quantities whose values are fixed but unknown; estimate the parameter values by maximizing the likelihood (probability) of observing the actual examples.
- Bayesian Estimation: view the parameters as random variables having some known prior distribution; observation of the actual training examples transforms the parameters' prior into a posterior distribution (via Bayes rule).


Maximum-Likelihood Estimation

Because each class is considered individually, the subscript used before will be dropped. Now the problem becomes:

Given a sample set $D$, whose elements are drawn independently from a population possessing a known parametric form, say $p(\mathbf{x} \mid \boldsymbol{\theta})$, we want to choose a $\hat{\boldsymbol{\theta}}$ that makes $D$ most likely to occur.
Maximum-Likelihood Estimation (Cont.)

Criterion of ML. Let $D = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$. By the independence assumption, we have

$$p(D \mid \boldsymbol{\theta}) = p(\mathbf{x}_1 \mid \boldsymbol{\theta})\, p(\mathbf{x}_2 \mid \boldsymbol{\theta}) \cdots p(\mathbf{x}_n \mid \boldsymbol{\theta}) = \prod_{k=1}^{n} p(\mathbf{x}_k \mid \boldsymbol{\theta})$$

The likelihood function:

$$L(\boldsymbol{\theta} \mid D) = p(D \mid \boldsymbol{\theta}) = \prod_{k=1}^{n} p(\mathbf{x}_k \mid \boldsymbol{\theta})$$

The maximum-likelihood estimate:

$$\hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}} L(\boldsymbol{\theta} \mid D)$$


Maximum-Likelihood Estimation (Cont.)

Often, we resort to maximizing the log-likelihood function

$$l(\boldsymbol{\theta} \mid D) = \ln L(\boldsymbol{\theta} \mid D) = \sum_{k=1}^{n} \ln p(\mathbf{x}_k \mid \boldsymbol{\theta})$$

$$\hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}}\, l(\boldsymbol{\theta} \mid D) = \arg\max_{\boldsymbol{\theta}}\, L(\boldsymbol{\theta} \mid D)$$

Why? Because $\ln$ is strictly increasing, both objectives have the same maximizer; the sum is also easier to differentiate than the product, and it avoids numerical underflow when $n$ is large.
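A minimal numerical sketch of the criterion (an illustration, not from the slides): evaluate the log-likelihood of i.i.d. samples on a grid of candidate parameter values and pick the maximizer. For many samples, the raw product $\prod_k p(x_k \mid \theta)$ underflows to 0.0 in floating point, which is one practical reason to prefer the log.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=1.0, size=1000)  # true mean = 2, known sigma = 1

mu_grid = np.linspace(0.0, 4.0, 401)
# Log-likelihood l(mu | D) = sum_k ln p(x_k | mu) for each candidate mu.
loglik = np.array([norm.logpdf(samples, loc=mu, scale=1.0).sum() for mu in mu_grid])
mu_hat = mu_grid[np.argmax(loglik)]
print(mu_hat)  # close to the sample mean, samples.mean()

# The raw likelihood underflows: a product of 1000 densities below 1 is ~ 0.0.
print(norm.pdf(samples, loc=2.0, scale=1.0).prod())  # 0.0 in float64
```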


Maximum-Likelihood Estimation (Cont.)

Find the extreme values using the methods of differential calculus.

Gradient operator: let $f(\boldsymbol{\theta})$ be a continuous function, where $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_m)^T$. Then

$$\nabla_{\boldsymbol{\theta}} = \left( \frac{\partial}{\partial \theta_1}, \frac{\partial}{\partial \theta_2}, \ldots, \frac{\partial}{\partial \theta_m} \right)^T$$

Find the extreme values by solving

$$\nabla_{\boldsymbol{\theta}} f = \mathbf{0}$$
The Gaussian Case I

Case I: $\boldsymbol{\mu}$ is unknown, and $\boldsymbol{\Sigma}$ is known.

$$p(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)$$

$$L(\boldsymbol{\mu} \mid D) = p(D \mid \boldsymbol{\mu}) = \prod_{k=1}^{n} p(\mathbf{x}_k \mid \boldsymbol{\mu}) = \frac{1}{(2\pi)^{nd/2} |\boldsymbol{\Sigma}|^{n/2}} \exp\left( -\frac{1}{2} \sum_{k=1}^{n} (\mathbf{x}_k - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}_k - \boldsymbol{\mu}) \right)$$

$$l(\boldsymbol{\mu} \mid D) = \ln L(\boldsymbol{\mu} \mid D) = -\ln\left( (2\pi)^{nd/2} |\boldsymbol{\Sigma}|^{n/2} \right) - \frac{1}{2} \sum_{k=1}^{n} (\mathbf{x}_k - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}_k - \boldsymbol{\mu})$$
The Gaussian Case I (Cont.)

$$\nabla_{\boldsymbol{\mu}}\, l(\boldsymbol{\mu} \mid D) = \sum_{k=1}^{n} \boldsymbol{\Sigma}^{-1} (\mathbf{x}_k - \boldsymbol{\mu}) = \mathbf{0} \quad \Longrightarrow \quad \hat{\boldsymbol{\mu}} = \frac{1}{n} \sum_{k=1}^{n} \mathbf{x}_k \qquad \text{(sample mean!)}$$

Intuitive result: the maximum-likelihood estimate for the unknown $\boldsymbol{\mu}$ is just the arithmetic average of the training samples, i.e., the sample mean.
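A minimal numerical check (an illustration, not from the slides): for multivariate Gaussian data with known covariance, the closed-form MLE of the mean is just the per-dimension average of the samples.

```python
import numpy as np

rng = np.random.default_rng(1)
mu_true = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])  # known covariance
X = rng.multivariate_normal(mu_true, Sigma, size=5000)  # rows are samples x_k

mu_hat = X.mean(axis=0)  # MLE: (1/n) * sum_k x_k
print(mu_hat)            # close to [1.0, -2.0]
```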


The Gaussian Case II

Case II: both $\mu$ and $\sigma^2$ are unknown. Consider the univariate case:

$$p(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right), \qquad \boldsymbol{\theta} = (\theta_1, \theta_2)^T = (\mu, \sigma^2)^T$$

$$L(\boldsymbol{\theta} \mid D) = p(D \mid \boldsymbol{\theta}) = \prod_{k=1}^{n} p(x_k \mid \boldsymbol{\theta}) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left( -\sum_{k=1}^{n} \frac{(x_k - \mu)^2}{2\sigma^2} \right)$$

$$l(\boldsymbol{\theta} \mid D) = \ln L(\boldsymbol{\theta} \mid D) = -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{k=1}^{n} (x_k - \mu)^2$$


The Gaussian Case II (Cont.)

Setting the gradient of $l(\boldsymbol{\theta} \mid D)$ to zero gives

$$\nabla_{\boldsymbol{\theta}}\, l(\boldsymbol{\theta} \mid D) = \begin{pmatrix} \dfrac{1}{\theta_2} \displaystyle\sum_{k=1}^{n} (x_k - \theta_1) \\[2ex] -\dfrac{n}{2\theta_2} + \displaystyle\sum_{k=1}^{n} \dfrac{(x_k - \theta_1)^2}{2\theta_2^2} \end{pmatrix} = \mathbf{0}$$

Solving yields

$$\hat{\mu} = \hat{\theta}_1 = \frac{1}{n} \sum_{k=1}^{n} x_k \qquad \text{(arithmetic average of $n$ samples; unbiased)}$$

$$\hat{\sigma}^2 = \hat{\theta}_2 = \frac{1}{n} \sum_{k=1}^{n} (x_k - \hat{\mu})^2 \qquad \text{(biased)}$$

In the multivariate case the second estimate becomes $\hat{\boldsymbol{\Sigma}} = \frac{1}{n} \sum_{k=1}^{n} (\mathbf{x}_k - \hat{\boldsymbol{\mu}})(\mathbf{x}_k - \hat{\boldsymbol{\mu}})^T$, the arithmetic average of $n$ matrices.

Recall: an unbiased estimator satisfies $E[\hat{\boldsymbol{\theta}}] = \boldsymbol{\theta}$; an asymptotically unbiased one satisfies $\lim_{n \to \infty} E[\hat{\boldsymbol{\theta}}] = \boldsymbol{\theta}$.
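A minimal sketch of the univariate formulas in code (an illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=3.0, size=20)  # small sample; true sigma^2 = 9

mu_hat = x.mean()                        # theta_1_hat: (1/n) * sum_k x_k
sigma2_hat = ((x - mu_hat) ** 2).mean()  # theta_2_hat: divides by n (the biased MLE)
print(mu_hat, sigma2_hat)
```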
MLE for Normal Population

$$\hat{\boldsymbol{\mu}} = \frac{1}{n} \sum_{k=1}^{n} \mathbf{x}_k, \qquad E[\hat{\boldsymbol{\mu}}] = \boldsymbol{\mu} \qquad \text{(sample mean; unbiased)}$$

$$\hat{\boldsymbol{\Sigma}} = \frac{1}{n} \sum_{k=1}^{n} (\mathbf{x}_k - \hat{\boldsymbol{\mu}})(\mathbf{x}_k - \hat{\boldsymbol{\mu}})^T, \qquad E[\hat{\boldsymbol{\Sigma}}] = \frac{n-1}{n} \boldsymbol{\Sigma} \neq \boldsymbol{\Sigma} \qquad \text{(biased)}$$

$$\mathbf{C} = \frac{1}{n-1} \sum_{k=1}^{n} (\mathbf{x}_k - \hat{\boldsymbol{\mu}})(\mathbf{x}_k - \hat{\boldsymbol{\mu}})^T, \qquad E[\mathbf{C}] = \boldsymbol{\Sigma} \qquad \text{(sample covariance matrix; unbiased)}$$
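A minimal Monte Carlo check (an illustration, not from the slides) of the bias relation $E[\hat{\Sigma}] = \frac{n-1}{n}\Sigma$ in the univariate case: average the two estimators over many repeated samples and compare.

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials, sigma2 = 5, 200_000, 4.0   # tiny n makes the bias visible

x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
mle = x.var(axis=1, ddof=0)    # Sigma_hat: divides by n   (biased)
samp = x.var(axis=1, ddof=1)   # C:         divides by n-1 (unbiased)

print(mle.mean())    # ~ (n-1)/n * sigma2 = 0.8 * 4 = 3.2
print(samp.mean())   # ~ sigma2 = 4.0
```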




Bayesian Estimation

Settings:
- The parametric form of the likelihood function for each category is known.
- However, $\boldsymbol{\theta}_j$ is considered to be a random variable instead of a fixed (but unknown) value.

In this case, we can no longer make a single ML estimate $\hat{\boldsymbol{\theta}}$ and then infer $P(\omega_i \mid \mathbf{x})$ based on $P(\omega_i)$ and $p(\mathbf{x} \mid \omega_i)$.

How can we proceed? Fully exploit the training examples!


Posterior Probabilities from Samples

$$P(\omega_i \mid \mathbf{x}, D) = \frac{P(\omega_i, \mathbf{x}, D)}{P(\mathbf{x}, D)} = \frac{P(\omega_i, \mathbf{x}, D)}{\sum_{j=1}^{c} P(\omega_j, \mathbf{x}, D)}$$

$$P(\omega_i, \mathbf{x}, D) = P(D)\, P(\omega_i, \mathbf{x} \mid D) = P(D)\, P(\omega_i \mid D)\, P(\mathbf{x} \mid \omega_i, D)$$

Assumptions:
- $P(\omega_i \mid D) = P(\omega_i)$
- $P(\mathbf{x} \mid \omega_i, D) = P(\mathbf{x} \mid \omega_i, D_i)$, i.e., each class can be considered independently.

Therefore

$$P(\omega_i \mid \mathbf{x}, D) = \frac{P(\mathbf{x} \mid \omega_i, D_i)\, P(\omega_i)}{\sum_{j=1}^{c} P(\mathbf{x} \mid \omega_j, D_j)\, P(\omega_j)}$$
Problem Formulation

$$P(\omega_i \mid \mathbf{x}, D) = \frac{P(\mathbf{x} \mid \omega_i, D_i)\, P(\omega_i)}{\sum_{j=1}^{c} P(\mathbf{x} \mid \omega_j, D_j)\, P(\omega_j)}$$

The key problem is to determine $P(\mathbf{x} \mid \omega_i, D_i)$. Treating each class independently, the problem becomes estimating $p(\mathbf{x} \mid D)$. This has always been the central problem of Bayesian learning.


Class-Conditional Density Estimation

Assume $p(\mathbf{x})$ is unknown, but we know it has a fixed form with parameter vector $\boldsymbol{\theta}$ (a random variable with respect to the parametric form):

$$p(\mathbf{x} \mid D) = \int p(\mathbf{x}, \boldsymbol{\theta} \mid D)\, d\boldsymbol{\theta} = \int p(\mathbf{x} \mid \boldsymbol{\theta}, D)\, p(\boldsymbol{\theta} \mid D)\, d\boldsymbol{\theta} = \int p(\mathbf{x} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta} \mid D)\, d\boldsymbol{\theta}$$

The last step uses the fact that $\mathbf{x}$ is independent of $D$ given $\boldsymbol{\theta}$. Here $p(\mathbf{x} \mid \boldsymbol{\theta})$ has an assumed known form of distribution, while $p(\boldsymbol{\theta} \mid D)$ is the posterior density we want to estimate.
Bayesian Estimation: General Procedure

Phase I: determine the posterior density of the parameters,
$$p(\boldsymbol{\theta} \mid D) = \, ?$$


Bayesian Estimation: General Procedure (Cont.)

Phase II: integrate out the parameters to obtain the class-conditional density,
$$p(\mathbf{x} \mid D) = \int p(\mathbf{x} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta} \mid D)\, d\boldsymbol{\theta}$$

Phase III: plug the result into Bayes' theorem,
$$P(\omega_i \mid \mathbf{x}, D) = \frac{P(\mathbf{x} \mid \omega_i, D_i)\, P(\omega_i)}{\sum_{j=1}^{c} P(\mathbf{x} \mid \omega_j, D_j)\, P(\omega_j)}$$


The Gaussian Case

The univariate Gaussian with unknown $\mu$.

Phase I:
$$p(\boldsymbol{\theta} \mid D) \propto \prod_{k=1}^{n} p(\mathbf{x}_k \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})$$

That is, the prior $p(\mu)$ and the likelihood $p(x \mid \mu)$, together with $D$, determine $p(\mu \mid D)$, where

$$p(x \mid \mu) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right)$$

$$p(\mu) = \frac{1}{\sqrt{2\pi}\,\sigma_0} \exp\left( -\frac{1}{2} \left( \frac{\mu - \mu_0}{\sigma_0} \right)^2 \right)$$

Other forms of prior pdf could be assumed as well.
The Gaussian Case (Cont.)

$$p(\mu \mid D) \propto \prod_{k=1}^{n} p(x_k \mid \mu)\, p(\mu) = \prod_{k=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2} \left( \frac{x_k - \mu}{\sigma} \right)^2 \right) \cdot \frac{1}{\sqrt{2\pi}\,\sigma_0} \exp\left( -\frac{1}{2} \left( \frac{\mu - \mu_0}{\sigma_0} \right)^2 \right)$$

$$\propto \exp\left( -\frac{1}{2} \left[ \sum_{k=1}^{n} \left( \frac{x_k - \mu}{\sigma} \right)^2 + \left( \frac{\mu - \mu_0}{\sigma_0} \right)^2 \right] \right)$$

$$\propto \exp\left( -\frac{1}{2} \left[ \left( \frac{n}{\sigma^2} + \frac{1}{\sigma_0^2} \right) \mu^2 - 2 \left( \frac{1}{\sigma^2} \sum_{k=1}^{n} x_k + \frac{\mu_0}{\sigma_0^2} \right) \mu \right] \right)$$


The Gaussian Case (Cont.)

$p(\mu \mid D)$ is an exponential of a quadratic function of $\mu$; thus $p(\mu \mid D)$ is also normal:

$$p(\mu \mid D) \sim N(\mu_n, \sigma_n^2), \qquad p(\mu \mid D) = \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\left( -\frac{1}{2} \left( \frac{\mu - \mu_n}{\sigma_n} \right)^2 \right) \propto \exp\left( -\frac{1}{2} \left[ \frac{1}{\sigma_n^2}\, \mu^2 - \frac{2\mu_n}{\sigma_n^2}\, \mu \right] \right)$$

Compare this with the form derived above:

$$p(\mu \mid D) \propto \exp\left( -\frac{1}{2} \left[ \left( \frac{n}{\sigma^2} + \frac{1}{\sigma_0^2} \right) \mu^2 - 2 \left( \frac{1}{\sigma^2} \sum_{k=1}^{n} x_k + \frac{\mu_0}{\sigma_0^2} \right) \mu \right] \right)$$


The Gaussian Case (Cont.)

Equating the coefficients of both forms, we have

$$\mu_n = \left( \frac{n \sigma_0^2}{n \sigma_0^2 + \sigma^2} \right) \hat{\mu}_n + \frac{\sigma^2}{n \sigma_0^2 + \sigma^2}\, \mu_0, \qquad \hat{\mu}_n = \frac{1}{n} \sum_{k=1}^{n} x_k$$

$$\sigma_n^2 = \frac{\sigma_0^2\, \sigma^2}{n \sigma_0^2 + \sigma^2}$$
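A minimal sketch of Phase I in code (an illustration, not from the slides; the helper name is hypothetical): given a Gaussian prior $N(\mu_0, \sigma_0^2)$ and known noise variance $\sigma^2$, compute the posterior parameters $\mu_n$ and $\sigma_n^2$ in closed form.

```python
import numpy as np

def gaussian_posterior(x, mu0, sigma0_sq, sigma_sq):
    """Posterior N(mu_n, sigma_n^2) of the unknown mean mu,
    given samples x, prior N(mu0, sigma0_sq), and known variance sigma_sq."""
    n = len(x)
    mu_hat = np.mean(x)
    denom = n * sigma0_sq + sigma_sq
    mu_n = (n * sigma0_sq / denom) * mu_hat + (sigma_sq / denom) * mu0
    sigma_n_sq = sigma0_sq * sigma_sq / denom
    return mu_n, sigma_n_sq

rng = np.random.default_rng(4)
x = rng.normal(3.0, 1.0, size=50)  # true mu = 3, sigma^2 = 1
mu_n, sigma_n_sq = gaussian_posterior(x, mu0=0.0, sigma0_sq=10.0, sigma_sq=1.0)
print(mu_n, sigma_n_sq)  # mu_n pulled toward the sample mean; sigma_n^2 shrinks with n
```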


The Gaussian Case (Cont.)

Phase II:
$$p(\mathbf{x} \mid D) = \int p(\mathbf{x} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta} \mid D)\, d\boldsymbol{\theta}$$

With $p(x \mid \mu) \sim N(\mu, \sigma^2)$ and $p(\mu \mid D) \sim N(\mu_n, \sigma_n^2)$, how would $p(x \mid D)$ look in this case?
The Gaussian Case (Cont.)

$$p(x \mid D) = \int p(x \mid \mu)\, p(\mu \mid D)\, d\mu = \frac{1}{2\pi\, \sigma \sigma_n} \int \exp\left( -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right) \exp\left( -\frac{1}{2} \left( \frac{\mu - \mu_n}{\sigma_n} \right)^2 \right) d\mu$$

$$= \frac{1}{2\pi\, \sigma \sigma_n} \exp\left( -\frac{1}{2} \frac{(x - \mu_n)^2}{\sigma^2 + \sigma_n^2} \right) \int \exp\left( -\frac{1}{2} \frac{\sigma^2 + \sigma_n^2}{\sigma^2 \sigma_n^2} \left( \mu - \frac{\sigma_n^2 x + \sigma^2 \mu_n}{\sigma^2 + \sigma_n^2} \right)^2 \right) d\mu$$

$p(x \mid D)$ is an exponential function of a quadratic function of $x$; thus it is also a normal pdf:

$$p(x \mid D) \sim N(\mu_n,\, \sigma^2 + \sigma_n^2)$$
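Continuing the sketch above (illustration only; gaussian_posterior, mu_n, and sigma_n_sq are the hypothetical names defined earlier), Phase II is then a one-liner: the predictive density is a Gaussian whose variance adds the known noise variance $\sigma^2$ to the posterior uncertainty $\sigma_n^2$.

```python
import numpy as np
from scipy.stats import norm

# Predictive density p(x | D) ~ N(mu_n, sigma^2 + sigma_n^2)
sigma_sq = 1.0
predictive = norm(loc=mu_n, scale=np.sqrt(sigma_sq + sigma_n_sq))
print(predictive.pdf(3.0))  # density of a new observation x = 3.0
```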
The Gaussian Case (Cont.)

Phase III:
$$P(\omega_i \mid \mathbf{x}, D) = \frac{P(\mathbf{x} \mid \omega_i, D_i)\, P(\omega_i)}{\sum_{j=1}^{c} P(\mathbf{x} \mid \omega_j, D_j)\, P(\omega_j)}$$


Summary

- Key issue: estimate the prior and class-conditional pdf from the training set.
  - Basic assumption on training examples: i.i.d.
- Two strategies for the key issue:
  - Parametric form for the class-conditional pdf
    - Maximum likelihood estimation
    - Bayesian estimation
  - No parametric form for the class-conditional pdf


Summary (Cont.)

- Maximum likelihood estimation
  - Settings: parameters as fixed but unknown values
  - Objective function: the log-likelihood function
  - The gradient of the objective function should be zero
  - Gaussian cases
- Bayesian estimation
  - Settings: parameters as random variables
  - General procedure: Phases I, II, III
  - Gaussian case

Project 3.2


MIMA Group

Any Questions?
