
MIMA Group


Chapter 3
Parameter Estimation

Xin-Shun Xu @ SDU School of Computer Science and Technology, Shandong University


Contents

- Introduction
- Maximum-Likelihood Estimation
- Bayesian Estimation


Bayesian Theorem

$$P(\omega_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_i)\, P(\omega_i)}{p(\mathbf{x})}, \qquad p(\mathbf{x}) = \sum_{j=1}^{c} p(\mathbf{x} \mid \omega_j)\, P(\omega_j)$$

To compute the posterior probability $P(\omega_i \mid \mathbf{x})$, we need to know $p(\mathbf{x} \mid \omega_i)$ and $P(\omega_i)$.

How can we get these values?


Samples

$D = \{D_1, D_2, \ldots, D_c\}$

The samples in $D_j$ are drawn independently according to the probability law $p(\mathbf{x} \mid \omega_j)$. That is, the examples in $D_j$ are i.i.d. random variables, i.e., independent and identically distributed.

[Figure: the training data partitioned into per-class subsets $D_1$, $D_2$, $D_3$.]

It is easy to compute the prior probability:

$$P(\omega_i) = \frac{|D_i|}{\sum_{j=1}^{c} |D_j|}$$
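As a concrete illustration, here is a minimal sketch (not from the slides; the helper names and the toy numbers are assumptions for illustration) that estimates the priors from per-class sample counts and then combines them with class-conditional densities via Bayes' theorem:

```python
import numpy as np

def estimate_priors(class_counts):
    """Estimate P(w_i) as |D_i| / sum_j |D_j| from per-class sample counts."""
    counts = np.asarray(class_counts, dtype=float)
    return counts / counts.sum()

def posteriors(likelihoods, priors):
    """Bayes' theorem: P(w_i | x) = p(x | w_i) P(w_i) / p(x)."""
    joint = np.asarray(likelihoods) * np.asarray(priors)
    return joint / joint.sum()  # dividing by p(x) = sum_j p(x | w_j) P(w_j)

# Toy example: 3 classes with 50, 30, and 20 training samples.
priors = estimate_priors([50, 30, 20])            # -> [0.5, 0.3, 0.2]
post = posteriors([0.10, 0.40, 0.05], priors)     # class-conditional densities at x
print(priors, post)
```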


Samples (Cont.)

For the class-conditional pdf, there are two cases:

- Case I: $p(\mathbf{x} \mid \omega_j)$ has a known parametric form, e.g.
  $$p(\mathbf{x} \mid \omega_j) \sim N(\boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)$$
  parameterized by $\boldsymbol{\theta}_j = (\theta_1, \theta_2, \ldots, \theta_m)^T$. If $\mathbf{x} \in \mathbb{R}^d$, then $\boldsymbol{\theta}_j$ contains $d + d(d+1)/2$ free parameters (see the sketch after this list).
- Case II: $p(\mathbf{x} \mid \omega_j)$ doesn't have a parametric form: next chapter.
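To make the parameter count concrete, here is a minimal sketch (an illustration, not from the slides) of where $d + d(d+1)/2$ comes from: $d$ entries for the mean vector plus the upper triangle, including the diagonal, of the symmetric covariance matrix.

```python
def gaussian_free_parameters(d: int) -> int:
    """Free parameters of a d-dimensional Gaussian N(mu, Sigma):
    d for the mean vector, d*(d+1)//2 for the symmetric covariance."""
    return d + d * (d + 1) // 2

# e.g. d = 3 -> 3 + 6 = 9 free parameters
assert gaussian_free_parameters(3) == 9
```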


Goal

$D = \{D_1, D_2, \ldots, D_c\}$

Use $D_j$ to estimate the unknown parameter vector $\boldsymbol{\theta}_j$ of

$$p(\mathbf{x} \mid \omega_j) = p(\mathbf{x} \mid \boldsymbol{\theta}_j), \qquad \boldsymbol{\theta}_j = (\theta_1, \theta_2, \ldots, \theta_m)^T$$

[Figure: each per-class subset $D_1$, $D_2$, $D_3$ yields its own estimate $\hat{\boldsymbol{\theta}}_1$, $\hat{\boldsymbol{\theta}}_2$, $\hat{\boldsymbol{\theta}}_3$.]


Estimation Under Parametric Form

- Maximum-Likelihood Estimation: view the parameters as quantities whose values are fixed but unknown; estimate the parameter values by maximizing the likelihood (probability) of observing the actual examples.
- Bayesian Estimation: view the parameters as random variables having some known prior distribution; observation of the actual training examples transforms the parameters' prior into a posterior distribution (via Bayes rule).


Maximum-Likelihood Estimation

Because each class is considered individually, the subscript used before will be dropped. Now the problem becomes:

Given a sample set $D$, whose elements are drawn independently from a population possessing a known parametric form, say $p(\mathbf{x} \mid \boldsymbol{\theta})$, we want to choose a $\hat{\boldsymbol{\theta}}$ that makes $D$ most likely to occur.
Maximum-Likelihood Estimation (Cont.)

Criterion of ML. Let $D = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$. By the independence assumption, we have

$$p(D \mid \boldsymbol{\theta}) = p(\mathbf{x}_1 \mid \boldsymbol{\theta})\, p(\mathbf{x}_2 \mid \boldsymbol{\theta}) \cdots p(\mathbf{x}_n \mid \boldsymbol{\theta}) = \prod_{k=1}^{n} p(\mathbf{x}_k \mid \boldsymbol{\theta})$$

The likelihood function:

$$L(\boldsymbol{\theta} \mid D) = p(D \mid \boldsymbol{\theta}) = \prod_{k=1}^{n} p(\mathbf{x}_k \mid \boldsymbol{\theta})$$

The maximum-likelihood estimate:

$$\hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}} L(\boldsymbol{\theta} \mid D)$$


Maximum-Likelihood Estimation (Cont.)

Often, we resort to maximizing the log-likelihood function

$$l(\boldsymbol{\theta} \mid D) = \ln L(\boldsymbol{\theta} \mid D) = \sum_{k=1}^{n} \ln p(\mathbf{x}_k \mid \boldsymbol{\theta})$$

$$\hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}}\, l(\boldsymbol{\theta} \mid D) = \arg\max_{\boldsymbol{\theta}}\, L(\boldsymbol{\theta} \mid D)$$

Why? Because $\ln$ is strictly increasing, both objectives have the same maximizer; the sum is also easier to differentiate than the product, and it avoids numerical underflow when $n$ is large.
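A minimal numerical sketch of the criterion (an illustration, not from the slides): evaluate the log-likelihood of i.i.d. samples on a grid of candidate parameter values and pick the maximizer. For many samples, the raw product $\prod_k p(x_k \mid \theta)$ underflows to 0.0 in floating point, which is one practical reason to prefer the log.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=1.0, size=1000)  # true mean = 2, known sigma = 1

mu_grid = np.linspace(0.0, 4.0, 401)
# Log-likelihood l(mu | D) = sum_k ln p(x_k | mu) for each candidate mu.
loglik = np.array([norm.logpdf(samples, loc=mu, scale=1.0).sum() for mu in mu_grid])
mu_hat = mu_grid[np.argmax(loglik)]
print(mu_hat)  # close to the sample mean, samples.mean()

# The raw likelihood underflows: a product of 1000 densities below 1 is ~ 0.0.
print(norm.pdf(samples, loc=2.0, scale=1.0).prod())  # 0.0 in float64
```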


Maximum-Likelihood Estimation (Cont.)

Find the extreme values using the methods of differential calculus.

Gradient operator: let $f(\boldsymbol{\theta})$ be a continuous function, where $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_m)^T$. Then

$$\nabla_{\boldsymbol{\theta}} = \left( \frac{\partial}{\partial \theta_1}, \frac{\partial}{\partial \theta_2}, \ldots, \frac{\partial}{\partial \theta_m} \right)^T$$

Find the extreme values by solving

$$\nabla_{\boldsymbol{\theta}} f = \mathbf{0}$$
The Gaussian Case I

Case I: $\boldsymbol{\mu}$ is unknown, and $\boldsymbol{\Sigma}$ is known.

$$p(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)$$

$$L(\boldsymbol{\mu} \mid D) = p(D \mid \boldsymbol{\mu}) = \prod_{k=1}^{n} p(\mathbf{x}_k \mid \boldsymbol{\mu}) = \frac{1}{(2\pi)^{nd/2} |\boldsymbol{\Sigma}|^{n/2}} \exp\left( -\frac{1}{2} \sum_{k=1}^{n} (\mathbf{x}_k - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}_k - \boldsymbol{\mu}) \right)$$

$$l(\boldsymbol{\mu} \mid D) = \ln L(\boldsymbol{\mu} \mid D) = -\ln\left( (2\pi)^{nd/2} |\boldsymbol{\Sigma}|^{n/2} \right) - \frac{1}{2} \sum_{k=1}^{n} (\mathbf{x}_k - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x}_k - \boldsymbol{\mu})$$
The Gaussian Case I (Cont.)

$$\nabla_{\boldsymbol{\mu}}\, l(\boldsymbol{\mu} \mid D) = \sum_{k=1}^{n} \boldsymbol{\Sigma}^{-1} (\mathbf{x}_k - \boldsymbol{\mu}) = \mathbf{0} \quad \Longrightarrow \quad \hat{\boldsymbol{\mu}} = \frac{1}{n} \sum_{k=1}^{n} \mathbf{x}_k \qquad \text{(sample mean!)}$$

Intuitive result: the maximum-likelihood estimate for the unknown $\boldsymbol{\mu}$ is just the arithmetic average of the training samples, i.e., the sample mean.
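A minimal numerical check (an illustration, not from the slides): for multivariate Gaussian data with known covariance, the closed-form MLE of the mean is just the per-dimension average of the samples.

```python
import numpy as np

rng = np.random.default_rng(1)
mu_true = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])  # known covariance
X = rng.multivariate_normal(mu_true, Sigma, size=5000)  # rows are samples x_k

mu_hat = X.mean(axis=0)  # MLE: (1/n) * sum_k x_k
print(mu_hat)            # close to [1.0, -2.0]
```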


The Gaussian Case II

Case II: both $\mu$ and $\sigma^2$ are unknown. Consider the univariate case:

$$p(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right), \qquad \boldsymbol{\theta} = (\theta_1, \theta_2)^T = (\mu, \sigma^2)^T$$

$$L(\boldsymbol{\theta} \mid D) = p(D \mid \boldsymbol{\theta}) = \prod_{k=1}^{n} p(x_k \mid \boldsymbol{\theta}) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left( -\sum_{k=1}^{n} \frac{(x_k - \mu)^2}{2\sigma^2} \right)$$

$$l(\boldsymbol{\theta} \mid D) = \ln L(\boldsymbol{\theta} \mid D) = -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{k=1}^{n} (x_k - \mu)^2$$


The Gaussian Case II (Cont.)

Setting the gradient of $l(\boldsymbol{\theta} \mid D)$ to zero gives

$$\nabla_{\boldsymbol{\theta}}\, l(\boldsymbol{\theta} \mid D) = \begin{pmatrix} \dfrac{1}{\theta_2} \displaystyle\sum_{k=1}^{n} (x_k - \theta_1) \\[2ex] -\dfrac{n}{2\theta_2} + \displaystyle\sum_{k=1}^{n} \dfrac{(x_k - \theta_1)^2}{2\theta_2^2} \end{pmatrix} = \mathbf{0}$$

Solving yields

$$\hat{\mu} = \hat{\theta}_1 = \frac{1}{n} \sum_{k=1}^{n} x_k \qquad \text{(arithmetic average of $n$ samples; unbiased)}$$

$$\hat{\sigma}^2 = \hat{\theta}_2 = \frac{1}{n} \sum_{k=1}^{n} (x_k - \hat{\mu})^2 \qquad \text{(biased)}$$

In the multivariate case the second estimate becomes $\hat{\boldsymbol{\Sigma}} = \frac{1}{n} \sum_{k=1}^{n} (\mathbf{x}_k - \hat{\boldsymbol{\mu}})(\mathbf{x}_k - \hat{\boldsymbol{\mu}})^T$, the arithmetic average of $n$ matrices.

Recall: an unbiased estimator satisfies $E[\hat{\boldsymbol{\theta}}] = \boldsymbol{\theta}$; an asymptotically unbiased one satisfies $\lim_{n \to \infty} E[\hat{\boldsymbol{\theta}}] = \boldsymbol{\theta}$.
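A minimal sketch of the univariate formulas in code (an illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=3.0, size=20)  # small sample; true sigma^2 = 9

mu_hat = x.mean()                        # theta_1_hat: (1/n) * sum_k x_k
sigma2_hat = ((x - mu_hat) ** 2).mean()  # theta_2_hat: divides by n (the biased MLE)
print(mu_hat, sigma2_hat)
```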
MLE for Normal Population

$$\hat{\boldsymbol{\mu}} = \frac{1}{n} \sum_{k=1}^{n} \mathbf{x}_k, \qquad E[\hat{\boldsymbol{\mu}}] = \boldsymbol{\mu} \qquad \text{(sample mean; unbiased)}$$

$$\hat{\boldsymbol{\Sigma}} = \frac{1}{n} \sum_{k=1}^{n} (\mathbf{x}_k - \hat{\boldsymbol{\mu}})(\mathbf{x}_k - \hat{\boldsymbol{\mu}})^T, \qquad E[\hat{\boldsymbol{\Sigma}}] = \frac{n-1}{n} \boldsymbol{\Sigma} \neq \boldsymbol{\Sigma} \qquad \text{(biased)}$$

$$\mathbf{C} = \frac{1}{n-1} \sum_{k=1}^{n} (\mathbf{x}_k - \hat{\boldsymbol{\mu}})(\mathbf{x}_k - \hat{\boldsymbol{\mu}})^T, \qquad E[\mathbf{C}] = \boldsymbol{\Sigma} \qquad \text{(sample covariance matrix; unbiased)}$$
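A minimal Monte Carlo check (an illustration, not from the slides) of the bias relation $E[\hat{\Sigma}] = \frac{n-1}{n}\Sigma$ in the univariate case: average the two estimators over many repeated samples and compare.

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials, sigma2 = 5, 200_000, 4.0   # tiny n makes the bias visible

x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
mle = x.var(axis=1, ddof=0)    # Sigma_hat: divides by n   (biased)
samp = x.var(axis=1, ddof=1)   # C:         divides by n-1 (unbiased)

print(mle.mean())    # ~ (n-1)/n * sigma2 = 0.8 * 4 = 3.2
print(samp.mean())   # ~ sigma2 = 4.0
```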




Bayesian Estimation

Settings:
- The parametric form of the likelihood function for each category is known.
- However, $\boldsymbol{\theta}_j$ is considered to be a random variable instead of a fixed (but unknown) value.

In this case, we can no longer make a single ML estimate $\hat{\boldsymbol{\theta}}$ and then infer $P(\omega_i \mid \mathbf{x})$ based on $P(\omega_i)$ and $p(\mathbf{x} \mid \omega_i)$.

How can we proceed? Fully exploit the training examples!


Posterior Probabilities from Samples

$$P(\omega_i \mid \mathbf{x}, D) = \frac{P(\omega_i, \mathbf{x}, D)}{P(\mathbf{x}, D)} = \frac{P(\omega_i, \mathbf{x}, D)}{\sum_{j=1}^{c} P(\omega_j, \mathbf{x}, D)}$$

$$P(\omega_i, \mathbf{x}, D) = P(D)\, P(\omega_i, \mathbf{x} \mid D) = P(D)\, P(\omega_i \mid D)\, P(\mathbf{x} \mid \omega_i, D)$$

Assumptions:
- $P(\omega_i \mid D) = P(\omega_i)$
- $P(\mathbf{x} \mid \omega_i, D) = P(\mathbf{x} \mid \omega_i, D_i)$, i.e., each class can be considered independently.

Therefore

$$P(\omega_i \mid \mathbf{x}, D) = \frac{P(\mathbf{x} \mid \omega_i, D_i)\, P(\omega_i)}{\sum_{j=1}^{c} P(\mathbf{x} \mid \omega_j, D_j)\, P(\omega_j)}$$
Problem Formulation

$$P(\omega_i \mid \mathbf{x}, D) = \frac{P(\mathbf{x} \mid \omega_i, D_i)\, P(\omega_i)}{\sum_{j=1}^{c} P(\mathbf{x} \mid \omega_j, D_j)\, P(\omega_j)}$$

The key problem is to determine $P(\mathbf{x} \mid \omega_i, D_i)$. Treating each class independently, the problem becomes estimating $p(\mathbf{x} \mid D)$. This has always been the central problem of Bayesian learning.


Class-Conditional Density Estimation

Assume $p(\mathbf{x})$ is unknown, but we know it has a fixed form with parameter vector $\boldsymbol{\theta}$ (a random variable with respect to the parametric form):

$$p(\mathbf{x} \mid D) = \int p(\mathbf{x}, \boldsymbol{\theta} \mid D)\, d\boldsymbol{\theta} = \int p(\mathbf{x} \mid \boldsymbol{\theta}, D)\, p(\boldsymbol{\theta} \mid D)\, d\boldsymbol{\theta} = \int p(\mathbf{x} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta} \mid D)\, d\boldsymbol{\theta}$$

The last step uses the fact that $\mathbf{x}$ is independent of $D$ given $\boldsymbol{\theta}$. Here $p(\mathbf{x} \mid \boldsymbol{\theta})$ has an assumed known form of distribution, while $p(\boldsymbol{\theta} \mid D)$ is the posterior density we want to estimate.
Bayesian Estimation: General Procedure

Phase I: determine the posterior density of the parameters,
$$p(\boldsymbol{\theta} \mid D) = \, ?$$


Bayesian Estimation: General Procedure (Cont.)

Phase II: integrate out the parameters to obtain the class-conditional density,
$$p(\mathbf{x} \mid D) = \int p(\mathbf{x} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta} \mid D)\, d\boldsymbol{\theta}$$

Phase III: plug the result into Bayes' theorem,
$$P(\omega_i \mid \mathbf{x}, D) = \frac{P(\mathbf{x} \mid \omega_i, D_i)\, P(\omega_i)}{\sum_{j=1}^{c} P(\mathbf{x} \mid \omega_j, D_j)\, P(\omega_j)}$$


The Gaussian Case

The univariate Gaussian with unknown $\mu$.

Phase I:
$$p(\boldsymbol{\theta} \mid D) \propto \prod_{k=1}^{n} p(\mathbf{x}_k \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})$$

That is, the prior $p(\mu)$ and the likelihood $p(x \mid \mu)$, together with $D$, determine $p(\mu \mid D)$, where

$$p(x \mid \mu) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right)$$

$$p(\mu) = \frac{1}{\sqrt{2\pi}\,\sigma_0} \exp\left( -\frac{1}{2} \left( \frac{\mu - \mu_0}{\sigma_0} \right)^2 \right)$$

Other forms of prior pdf could be assumed as well.
The Gaussian Case (Cont.)

$$p(\mu \mid D) \propto \prod_{k=1}^{n} p(x_k \mid \mu)\, p(\mu) = \prod_{k=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2} \left( \frac{x_k - \mu}{\sigma} \right)^2 \right) \cdot \frac{1}{\sqrt{2\pi}\,\sigma_0} \exp\left( -\frac{1}{2} \left( \frac{\mu - \mu_0}{\sigma_0} \right)^2 \right)$$

$$\propto \exp\left( -\frac{1}{2} \left[ \sum_{k=1}^{n} \left( \frac{x_k - \mu}{\sigma} \right)^2 + \left( \frac{\mu - \mu_0}{\sigma_0} \right)^2 \right] \right)$$

$$\propto \exp\left( -\frac{1}{2} \left[ \left( \frac{n}{\sigma^2} + \frac{1}{\sigma_0^2} \right) \mu^2 - 2 \left( \frac{1}{\sigma^2} \sum_{k=1}^{n} x_k + \frac{\mu_0}{\sigma_0^2} \right) \mu \right] \right)$$


The Gaussian Case (Cont.)

$p(\mu \mid D)$ is an exponential of a quadratic function of $\mu$; thus $p(\mu \mid D)$ is also normal:

$$p(\mu \mid D) \sim N(\mu_n, \sigma_n^2), \qquad p(\mu \mid D) = \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\left( -\frac{1}{2} \left( \frac{\mu - \mu_n}{\sigma_n} \right)^2 \right) \propto \exp\left( -\frac{1}{2} \left[ \frac{1}{\sigma_n^2}\, \mu^2 - \frac{2\mu_n}{\sigma_n^2}\, \mu \right] \right)$$

Compare this with the form derived above:

$$p(\mu \mid D) \propto \exp\left( -\frac{1}{2} \left[ \left( \frac{n}{\sigma^2} + \frac{1}{\sigma_0^2} \right) \mu^2 - 2 \left( \frac{1}{\sigma^2} \sum_{k=1}^{n} x_k + \frac{\mu_0}{\sigma_0^2} \right) \mu \right] \right)$$


The Gaussian Case (Cont.)

Equating the coefficients of both forms, we have

$$\mu_n = \left( \frac{n \sigma_0^2}{n \sigma_0^2 + \sigma^2} \right) \hat{\mu}_n + \frac{\sigma^2}{n \sigma_0^2 + \sigma^2}\, \mu_0, \qquad \hat{\mu}_n = \frac{1}{n} \sum_{k=1}^{n} x_k$$

$$\sigma_n^2 = \frac{\sigma_0^2\, \sigma^2}{n \sigma_0^2 + \sigma^2}$$
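A minimal sketch of Phase I in code (an illustration, not from the slides; the helper name is hypothetical): given a Gaussian prior $N(\mu_0, \sigma_0^2)$ and known noise variance $\sigma^2$, compute the posterior parameters $\mu_n$ and $\sigma_n^2$ in closed form.

```python
import numpy as np

def gaussian_posterior(x, mu0, sigma0_sq, sigma_sq):
    """Posterior N(mu_n, sigma_n^2) of the unknown mean mu,
    given samples x, prior N(mu0, sigma0_sq), and known variance sigma_sq."""
    n = len(x)
    mu_hat = np.mean(x)
    denom = n * sigma0_sq + sigma_sq
    mu_n = (n * sigma0_sq / denom) * mu_hat + (sigma_sq / denom) * mu0
    sigma_n_sq = sigma0_sq * sigma_sq / denom
    return mu_n, sigma_n_sq

rng = np.random.default_rng(4)
x = rng.normal(3.0, 1.0, size=50)  # true mu = 3, sigma^2 = 1
mu_n, sigma_n_sq = gaussian_posterior(x, mu0=0.0, sigma0_sq=10.0, sigma_sq=1.0)
print(mu_n, sigma_n_sq)  # mu_n pulled toward the sample mean; sigma_n^2 shrinks with n
```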


The Gaussian Case (Cont.)

Phase II:
$$p(\mathbf{x} \mid D) = \int p(\mathbf{x} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta} \mid D)\, d\boldsymbol{\theta}$$

With $p(x \mid \mu) \sim N(\mu, \sigma^2)$ and $p(\mu \mid D) \sim N(\mu_n, \sigma_n^2)$, how would $p(x \mid D)$ look in this case?
The Gaussian Case (Cont.)

$$p(x \mid D) = \int p(x \mid \mu)\, p(\mu \mid D)\, d\mu = \frac{1}{2\pi\, \sigma \sigma_n} \int \exp\left( -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right) \exp\left( -\frac{1}{2} \left( \frac{\mu - \mu_n}{\sigma_n} \right)^2 \right) d\mu$$

$$= \frac{1}{2\pi\, \sigma \sigma_n} \exp\left( -\frac{1}{2} \frac{(x - \mu_n)^2}{\sigma^2 + \sigma_n^2} \right) \int \exp\left( -\frac{1}{2} \frac{\sigma^2 + \sigma_n^2}{\sigma^2 \sigma_n^2} \left( \mu - \frac{\sigma_n^2 x + \sigma^2 \mu_n}{\sigma^2 + \sigma_n^2} \right)^2 \right) d\mu$$

$p(x \mid D)$ is an exponential function of a quadratic function of $x$; thus it is also a normal pdf:

$$p(x \mid D) \sim N(\mu_n,\, \sigma^2 + \sigma_n^2)$$
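Continuing the sketch above (illustration only; gaussian_posterior, mu_n, and sigma_n_sq are the hypothetical names defined earlier), Phase II is then a one-liner: the predictive density is a Gaussian whose variance adds the known noise variance $\sigma^2$ to the posterior uncertainty $\sigma_n^2$.

```python
import numpy as np
from scipy.stats import norm

# Predictive density p(x | D) ~ N(mu_n, sigma^2 + sigma_n^2)
sigma_sq = 1.0
predictive = norm(loc=mu_n, scale=np.sqrt(sigma_sq + sigma_n_sq))
print(predictive.pdf(3.0))  # density of a new observation x = 3.0
```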
The Gaussian Case (Cont.)

Phase III:
$$P(\omega_i \mid \mathbf{x}, D) = \frac{P(\mathbf{x} \mid \omega_i, D_i)\, P(\omega_i)}{\sum_{j=1}^{c} P(\mathbf{x} \mid \omega_j, D_j)\, P(\omega_j)}$$


Summary

- Key issue: estimate the prior and class-conditional pdf from the training set.
  - Basic assumption on training examples: i.i.d.
- Two strategies for the key issue:
  - Parametric form for the class-conditional pdf
    - Maximum likelihood estimation
    - Bayesian estimation
  - No parametric form for the class-conditional pdf


Summary (Cont.)

- Maximum likelihood estimation
  - Settings: parameters as fixed but unknown values
  - Objective function: the log-likelihood function
  - The gradient of the objective function should be zero
  - Gaussian cases
- Bayesian estimation
  - Settings: parameters as random variables
  - General procedure: Phases I, II, III
  - Gaussian case

Project 3.2


MIMA Group

Any Questions?
