
Model Inference and Averaging
Dept. Computer Science & Engineering,
Shanghai Jiao Tong University
Contents
• The Bootstrap and Maximum Likelihood Methods
• Bayesian Methods
• Relationship Between the Bootstrap and Bayesian
Inference
• The EM Algorithm
• MCMC for Sampling from the Posterior
• Bagging
• Model Averaging and Stacking



Bootstrap by Basis Expansions
• Consider a linear expansion
  \mu(x) = \sum_{j=1}^{N} \beta_j h_j(x)
• The least-squares solution is
  \hat{\beta} = (H^T H)^{-1} H^T y
• The covariance of \hat{\beta} is
  \widehat{\mathrm{cov}}(\hat{\beta}) = (H^T H)^{-1} \hat{\sigma}^2, \qquad \hat{\sigma}^2 = \sum_{i=1}^{N} (y_i - \hat{\mu}(x_i))^2 / N
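To make the bootstrap part concrete, here is a minimal sketch (not from the slides): it resamples the (x_i, y_i) pairs, refits \hat{\beta} on each resample, and uses the spread of the refitted curves as a pointwise error estimate. The data and the polynomial basis h_j(x) = x^j are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data, assumed for illustration: y = sin(x) + noise
N = 50
x = np.sort(rng.uniform(0, 3, N))
y = np.sin(x) + rng.normal(scale=0.3, size=N)

def basis(x):
    # Simple polynomial basis h_j(x) = x^j; the deck leaves h_j unspecified
    return np.vstack([x**j for j in range(4)]).T

H = basis(x)
beta_hat, *_ = np.linalg.lstsq(H, y, rcond=None)      # beta_hat = (H^T H)^{-1} H^T y
sigma2_hat = np.mean((y - H @ beta_hat) ** 2)

# Nonparametric bootstrap: resample (x_i, y_i) pairs and refit the coefficients
B = 200
boot_curves = np.empty((B, N))
for b in range(B):
    idx = rng.integers(0, N, N)
    beta_b, *_ = np.linalg.lstsq(H[idx], y[idx], rcond=None)
    boot_curves[b] = H @ beta_b

se_boot = boot_curves.std(axis=0)                      # pointwise bootstrap standard errors
print(sigma2_hat, se_boot[:5])
```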
Parametric Model
• Assume a parameterized probability density (parametric model) for the observations:
  z_i \sim g_\theta(z)
• E.g. the normal distribution, with \theta = (\mu, \sigma^2) and
  g_\theta(z) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(z-\mu)^2 / (2\sigma^2)}
Maximum Likelihood Inference
• Suppose we are trying to measure the true value of some quantity x_T.
  – We make repeated measurements of this quantity: {x_1, x_2, ..., x_n}.
  – The standard way to estimate x_T from our measurements is to calculate the mean value
    \mu_x = \frac{1}{n} \sum_{i=1}^{n} x_i
    and set x_T = \mu_x.


Maximum Likelihood Inference
• Suppose we are trying to measure the true value of some quantity x_T.
  – We make repeated measurements of this quantity: {x_1, x_2, ..., x_n}.
  – The standard way to estimate x_T from our measurements is to calculate the mean value \mu_x = \frac{1}{n} \sum_{i=1}^{n} x_i and set x_T = \mu_x.
• Does this procedure make sense?
• The maximum likelihood method (MLM) answers this question and provides a general method for estimating parameters of interest from data.


The Maximum Likelihood Method
• Statement of the Maximum Likelihood Method
  – Assume we have made n measurements of x: {x_1, x_2, ..., x_n}.
  – Assume we know the probability distribution function that describes x: f(x, \alpha).
  – Assume we want to determine the parameter \alpha.
• MLM: pick \alpha to maximize the probability of getting the measurements (the x_i's) we did!
The MLM Implementation
• The probability of measuring x_1 is f(x_1, \alpha)\,dx
• The probability of measuring x_2 is f(x_2, \alpha)\,dx
• The probability of measuring x_n is f(x_n, \alpha)\,dx
• If the measurements are independent, the probability of getting the measurements we did is
  L = f(x_1, \alpha)dx \cdot f(x_2, \alpha)dx \cdots f(x_n, \alpha)dx = f(x_1, \alpha) \cdot f(x_2, \alpha) \cdots f(x_n, \alpha)\,[dx^n]
• We can drop the dx^n term as it is only a proportionality constant.
• L is called the likelihood function:
  L = \prod_{i=1}^{n} f(x_i, \alpha)
Log Maximum Likelihood Method
• We want to pick the \alpha that maximizes L:
  \left. \frac{\partial L}{\partial \alpha} \right|_{\alpha = \alpha^*} = 0
  – It is often easier to maximize ln L.
  – L and ln L are both maximal at the same location.
• We maximize ln L rather than L itself because ln L converts the product into a summation:
  \ln L = \sum_{i=1}^{n} \ln f(x_i, \alpha)
Log Maximum Likelihood Method
• The new maximization condition is
  \left. \frac{\partial \ln L}{\partial \alpha} \right|_{\alpha = \alpha^*} = \sum_{i=1}^{n} \left. \frac{\partial \ln f(x_i, \alpha)}{\partial \alpha} \right|_{\alpha = \alpha^*} = 0
• \alpha could be an array of parameters (e.g. slope and intercept) or just a single variable.
• The equations that determine \alpha range from simple linear equations to coupled non-linear equations.
An Example: Gaussian
• Let f(x, \alpha) be given by a Gaussian distribution function.
• Let \alpha = \mu be the mean of the Gaussian. We want to use our data + MLM to find the mean.
• We want the best estimate of \alpha = \mu from our set of n measurements {x_1, x_2, ..., x_n}.
• Let's assume that \sigma is the same for each measurement.


An Example: Gaussian
• Gaussian PDF:
  f(x_i, \mu) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x_i - \mu)^2 / (2\sigma^2)}
• The likelihood function for this problem is:
  L = \prod_{i=1}^{n} f(x_i, \mu)
    = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x_i - \mu)^2 / (2\sigma^2)}
    = \left( \frac{1}{\sigma\sqrt{2\pi}} \right)^{\!n} e^{-\sum_{i=1}^{n} (x_i - \mu)^2 / (2\sigma^2)}
An Example: Gaussian
• The log-likelihood is
  \ln L = \ln \prod_{i=1}^{n} f(x_i, \mu)
        = \ln\!\left[ \left( \frac{1}{\sigma\sqrt{2\pi}} \right)^{\!n} e^{-\sum_{i=1}^{n} (x_i - \mu)^2 / (2\sigma^2)} \right]
        = n \ln\!\left( \frac{1}{\sigma\sqrt{2\pi}} \right) - \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{2\sigma^2}
• We want to find the \mu that maximizes the log-likelihood function:
  \frac{\partial \ln L}{\partial \mu} = \frac{\partial}{\partial \mu}\!\left[ n \ln\!\left( \frac{1}{\sigma\sqrt{2\pi}} \right) - \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{2\sigma^2} \right] = 0
  \Rightarrow \sum_{i=1}^{n} 2 (x_i - \mu)(-1) = 0 \;\Rightarrow\; \mu = \frac{1}{n} \sum_{i=1}^{n} x_i
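As a quick numerical check of this result (illustrative only; the data below are simulated, not from the slides), maximizing the Gaussian log-likelihood over \mu numerically lands on the sample mean:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
sigma = 2.0                                     # assumed known, same for every measurement
x = rng.normal(loc=5.0, scale=sigma, size=200)  # simulated measurements

def neg_log_likelihood(mu):
    # -ln L = n*ln(sigma*sqrt(2*pi)) + sum((x_i - mu)^2) / (2*sigma^2)
    return len(x) * np.log(sigma * np.sqrt(2 * np.pi)) + np.sum((x - mu) ** 2) / (2 * sigma**2)

res = minimize_scalar(neg_log_likelihood, bounds=(0.0, 10.0), method="bounded")
print(res.x, x.mean())   # the numerical maximizer agrees with the sample mean
```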
An Example: Gaussian
• If \sigma_i is different for each data point, then \mu is the weighted average:
  \mu = \frac{ \sum_{i=1}^{n} x_i / \sigma_i^2 }{ \sum_{i=1}^{n} 1 / \sigma_i^2 }


An Example: Poisson
• Let f(x, \alpha) be given by a Poisson distribution.
• Let \alpha = \mu be the mean of the Poisson.
• We want the best estimate of \alpha = \mu from our set of n measurements {x_1, x_2, ..., x_n}.
• Poisson PDF:
  f(x, \mu) = \frac{e^{-\mu} \mu^{x}}{x!}
An Example: Poisson
• The likelihood function for this problem is:
  L = \prod_{i=1}^{n} f(x_i, \mu)
    = \prod_{i=1}^{n} \frac{e^{-\mu} \mu^{x_i}}{x_i!}
    = \frac{e^{-\mu} \mu^{x_1}}{x_1!} \cdot \frac{e^{-\mu} \mu^{x_2}}{x_2!} \cdots \frac{e^{-\mu} \mu^{x_n}}{x_n!}
    = \frac{e^{-n\mu}\, \mu^{\sum_{i=1}^{n} x_i}}{x_1!\, x_2! \cdots x_n!}


An Example: Poisson
• Find the \mu that maximizes the log-likelihood function:
  \frac{d \ln L}{d\mu} = \frac{d}{d\mu}\!\left[ -n\mu + \ln\mu \sum_{i=1}^{n} x_i - \ln(x_1!\, x_2! \cdots x_n!) \right]
                       = -n + \frac{1}{\mu} \sum_{i=1}^{n} x_i = 0
  \Rightarrow \mu = \frac{1}{n} \sum_{i=1}^{n} x_i \quad \text{(the sample average)}
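The same numerical check for the Poisson case (again with simulated counts assumed purely for illustration): the \mu that maximizes the Poisson log-likelihood coincides with the sample average.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(2)
x = rng.poisson(lam=3.5, size=200)   # simulated counts

def neg_log_likelihood(mu):
    # -ln L = n*mu - ln(mu)*sum(x_i) + sum(ln(x_i!)), with ln(x!) = gammaln(x + 1)
    return len(x) * mu - np.log(mu) * x.sum() + gammaln(x + 1).sum()

res = minimize_scalar(neg_log_likelihood, bounds=(0.1, 20.0), method="bounded")
print(res.x, x.mean())   # the numerical maximizer matches the sample mean
```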


General properties of MLM
• For large data samples (large n) the likelihood function L approaches a Gaussian distribution.
• Maximum likelihood estimates are usually consistent.
  – For large n the estimates converge to the true value of the parameters we wish to determine.
• Maximum likelihood estimates are usually unbiased.
  – On average (over repeated samples) the estimate equals the true value of the parameter, at least asymptotically.
General properties of MLM
• The maximum likelihood estimate is efficient: asymptotically it attains the smallest possible variance (the Cramér–Rao bound).
• The maximum likelihood estimate is sufficient: it uses all the information in the observations (the x_i's).
• The solution from MLM is unique.
• Bad news: we must know the correct probability distribution for the problem at hand!
Maximum Likelihood
• We maximize the likelihood function
  L(\theta; \mathbf{Z}) = \prod_{i=1}^{N} g_\theta(z_i)
• Log-likelihood function:
  \ell(\theta; \mathbf{Z}) = \log L(\theta; \mathbf{Z}) = \sum_{i=1}^{N} \log g_\theta(z_i) = \sum_{i=1}^{N} \ell(\theta; z_i)


Score Function
• Assess the precision of \hat{\theta} using the likelihood function. The score function is
  \dot{\ell}(\theta; \mathbf{Z}) = \sum_{i=1}^{N} \dot{\ell}(\theta; z_i),
  where \dot{\ell}(\theta; z_i) = \partial \ell(\theta; z_i) / \partial \theta
• Assume that L takes its maximum in the interior of the parameter space. Then
  \dot{\ell}(\hat{\theta}; \mathbf{Z}) = 0
Likelihood Function
• We maximize the likelihood function
  L(\theta; \mathbf{Z}) = \prod_{i=1}^{N} g_\theta(z_i)
• We omit the normalization since it only adds a constant factor.
• Think of L as a function of \theta with \mathbf{Z} fixed.
• Log-likelihood function:
  \ell(\theta; \mathbf{Z}) = \log L(\theta; \mathbf{Z}) = \sum_{i=1}^{N} \log g_\theta(z_i) = \sum_{i=1}^{N} \ell(\theta; z_i)
Fisher Information
• The negative sum of second derivatives is the information matrix
  \mathbf{I}(\theta) = -\sum_{i=1}^{N} \frac{\partial^2 \ell(\theta; z_i)}{\partial \theta\, \partial \theta^T}
• Evaluated at \theta = \hat{\theta}, \mathbf{I}(\hat{\theta}) is called the observed information; it should be positive (> 0).
• The Fisher information (expected information) is
  \mathbf{i}(\theta) = \mathrm{E}_\theta[\mathbf{I}(\theta)]
• Assume that \theta_0 is the true value of \theta.
Sampling Theory
• Basic result of sampling theory: when we sample independently from g_{\theta_0}(z), the sampling distribution of the maximum-likelihood estimator approaches a normal distribution as N \to \infty,
  \hat{\theta} \to N\big(\theta_0,\; \mathbf{i}(\theta_0)^{-1}\big)
• This suggests approximating the sampling distribution by
  N\big(\hat{\theta},\; \mathbf{i}(\hat{\theta})^{-1}\big)
Error Bound
• The corresponding error estimates (standard errors) are obtained from
  \sqrt{\mathbf{i}(\hat{\theta})^{-1}_{jj}} \quad \text{and} \quad \sqrt{\mathbf{I}(\hat{\theta})^{-1}_{jj}}
• The confidence points have the form
  \hat{\theta}_j - z^{(1-\alpha)} \cdot \sqrt{\mathbf{i}(\hat{\theta})^{-1}_{jj}} \quad \text{and} \quad \hat{\theta}_j - z^{(1-\alpha)} \cdot \sqrt{\mathbf{I}(\hat{\theta})^{-1}_{jj}},
  where z^{(1-\alpha)} is the 1-\alpha percentile of the standard normal distribution.
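To make this concrete for the Gaussian-mean case (an illustrative sketch with simulated data, not an example from the deck): with \sigma known, the information for \mu is n/\sigma^2, so the approximate 95% interval is \hat{\mu} \pm 1.96\,\sigma/\sqrt{n}.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 2.0
x = rng.normal(loc=5.0, scale=sigma, size=200)

mu_hat = x.mean()              # maximum likelihood estimate of mu
info = len(x) / sigma**2       # observed (= expected) information for mu when sigma is known
se = np.sqrt(1.0 / info)       # standard error = sqrt(i(theta)^-1)
z = 1.96                       # z^(1-alpha) for a 95% interval

print(f"95% interval: {mu_hat - z * se:.3f} .. {mu_hat + z * se:.3f}")
```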
Simplified form of the Fisher information
Suppose, in addition, that the operations of integration and differentiation can be swapped for the second derivative of f(x; \theta) as well, i.e.,
  \frac{\partial^2}{\partial\theta^2} \int T(x)\, f(x; \theta)\, dx = \int T(x)\, \frac{\partial^2}{\partial\theta^2} f(x; \theta)\, dx
In this case, it can be shown that the Fisher information equals
  I(\theta) = -\mathrm{E}\!\left[ \frac{\partial^2}{\partial\theta^2} \log f(X; \theta) \right]
The Cramér–Rao bound can then be written as
  \mathrm{var}(\hat{\theta}) \ge \frac{1}{I(\theta)} = \frac{1}{-\mathrm{E}\!\left[ \frac{\partial^2}{\partial\theta^2} \log f(X; \theta) \right]}
Single-parameter proof
Let X be a random variable with probability density function f(x; \theta), and let T = t(X) be a statistic used as an estimator for \psi(\theta). If the expectation of T is denoted by \psi(\theta), then, for all \theta,
  \mathrm{var}(t(X)) \ge \frac{[\psi'(\theta)]^2}{I(\theta)}
If V is the score, i.e.
  V = \frac{\partial}{\partial\theta} \ln f(X; \theta),
then the expectation of V, written E(V), is zero. If we consider the covariance cov(V, T) of V and T, we have cov(V, T) = E(VT), because E(V) = 0. Expanding this expression we have
  \mathrm{cov}(V, T) = \mathrm{E}\!\left[ T \cdot \frac{\partial}{\partial\theta} \ln f(X; \theta) \right]
Using the chain rule
  \frac{\partial}{\partial\theta} \ln Q = \frac{1}{Q} \frac{\partial Q}{\partial\theta}
and the definition of expectation gives, after cancelling f(x; \theta), because the integration and differentiation operations commute (second condition),
  \mathrm{E}\!\left[ T \cdot \frac{\partial}{\partial\theta} \ln f(X; \theta) \right]
    = \int t(x) \left[ \frac{\partial}{\partial\theta} f(x; \theta) \right] dx
    = \frac{\partial}{\partial\theta} \int t(x)\, f(x; \theta)\, dx
    = \psi'(\theta)
The Cauchy–Schwarz inequality shows that
  \mathrm{var}(T)\, \mathrm{var}(V) \ge \left[ \mathrm{cov}(V, T) \right]^2 = \left[ \psi'(\theta) \right]^2
Therefore, since the variance of the score equals the Fisher information, \mathrm{var}(V) = I(\theta),
  \mathrm{var}(T) \ge \frac{[\psi'(\theta)]^2}{\mathrm{var}(V)} = \frac{[\psi'(\theta)]^2}{I(\theta)} = \left[ \frac{\partial}{\partial\theta} \mathrm{E}(T) \right]^2 \frac{1}{I(\theta)}
An Example
• Consider a linear expansion
  \mu(x) = \sum_{j=1}^{N} \beta_j h_j(x)
• The least-squares solution is
  \hat{\beta} = (H^T H)^{-1} H^T y
• The covariance of \hat{\beta} is
  \widehat{\mathrm{cov}}(\hat{\beta}) = (H^T H)^{-1} \hat{\sigma}^2, \qquad \hat{\sigma}^2 = \sum_{i=1}^{N} (y_i - \hat{\mu}(x_i))^2 / N
An Example
• Consider the prediction model
  \hat{\mu}(x) = \sum_{j=1}^{N} \hat{\beta}_j h_j(x)
• The standard error is
  \widehat{\mathrm{se}}[\hat{\mu}(x)] = \left[ h(x)^T (H^T H)^{-1} h(x) \right]^{1/2} \hat{\sigma}
• The confidence region is
  \hat{\mu}(x) \pm 1.96 \cdot \widehat{\mathrm{se}}[\hat{\mu}(x)]
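Here is a short sketch of how these formulas translate to code, reusing the toy data and polynomial basis assumed in the earlier bootstrap sketch (neither is specified in the deck):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 50
x = np.sort(rng.uniform(0, 3, N))
y = np.sin(x) + rng.normal(scale=0.3, size=N)

H = np.vstack([x**j for j in range(4)]).T            # design matrix of basis functions h_j(x)
HtH_inv = np.linalg.inv(H.T @ H)
beta_hat = HtH_inv @ H.T @ y                         # least-squares coefficients
mu_hat = H @ beta_hat
sigma_hat = np.sqrt(np.mean((y - mu_hat) ** 2))

# Pointwise standard error and 95% confidence band for mu_hat(x)
se = np.sqrt(np.einsum("ij,jk,ik->i", H, HtH_inv, H)) * sigma_hat
lower, upper = mu_hat - 1.96 * se, mu_hat + 1.96 * se
print(lower[:3], upper[:3])
```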


Bayesian Methods
• Given a sampling model \Pr(\mathbf{Z} \mid \theta) and a prior \Pr(\theta) for the parameters, estimate the posterior probability
  \Pr(\theta \mid \mathbf{Z}) = \frac{\Pr(\mathbf{Z} \mid \theta) \cdot \Pr(\theta)}{\int \Pr(\mathbf{Z} \mid \theta) \cdot \Pr(\theta)\, d\theta}
  by drawing samples from it or by estimating its mean or mode.
• Differences from the frequentist approach:
  – Prior: allows for uncertainties present before seeing the data
  – Posterior: allows for uncertainties present after seeing the data
Bayesian Methods
• The posterior distribution also affords a predictive distribution for future values z^{new}:
  \Pr(z^{new} \mid \mathbf{Z}) = \int \Pr(z^{new} \mid \theta) \cdot \Pr(\theta \mid \mathbf{Z})\, d\theta
• In contrast, the maximum-likelihood approach would predict future data on the basis of \Pr(z^{new} \mid \hat{\theta}), not accounting for the uncertainty in the parameters.
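A minimal sketch of this difference, using a conjugate normal model with known variance (all numbers here are illustrative assumptions, not from the deck): the Bayesian predictive averages over posterior draws of \theta, while the plug-in approach fixes \theta at \hat{\theta}.

```python
import numpy as np

rng = np.random.default_rng(5)
z = rng.normal(loc=1.0, scale=1.0, size=20)      # observed data, noise variance 1 assumed known

# Conjugate setup: prior theta ~ N(0, tau) gives a normal posterior
tau = 10.0
post_var = 1.0 / (len(z) + 1.0 / tau)
post_mean = post_var * z.sum()

# Posterior predictive: average Pr(z_new | theta) over posterior draws of theta
theta_draws = rng.normal(post_mean, np.sqrt(post_var), size=5000)
z_new_predictive = rng.normal(theta_draws, 1.0)  # one predictive draw per posterior draw

# Plug-in (maximum-likelihood) prediction uses only theta_hat = mean(z)
z_new_plugin = rng.normal(z.mean(), 1.0, size=5000)

print(z_new_predictive.std(), z_new_plugin.std())  # the predictive spread is wider than plug-in
```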
An Example
• Consider a linear expansion
  \mu(x) = \sum_{j=1}^{N} \beta_j h_j(x)
• Assume a Gaussian prior for the coefficients,
  \beta \sim N(0, \tau \Sigma),
  where the density of a p-dimensional Gaussian N(\mu, \Sigma) is
  p(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}}\, e^{-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)}


• The posterior distribution for \beta is also Gaussian, with mean and covariance
  \mathrm{E}(\beta \mid \mathbf{Z}) = \left( H^T H + \frac{\sigma^2}{\tau} \Sigma^{-1} \right)^{\!-1} H^T y,
  \mathrm{cov}(\beta \mid \mathbf{Z}) = \left( H^T H + \frac{\sigma^2}{\tau} \Sigma^{-1} \right)^{\!-1} \sigma^2.
• The corresponding posterior values for \mu(x) are
  \mathrm{E}(\mu(x) \mid \mathbf{Z}) = h(x)^T \left( H^T H + \frac{\sigma^2}{\tau} \Sigma^{-1} \right)^{\!-1} H^T y,
  \mathrm{cov}[\mu(x), \mu(x) \mid \mathbf{Z}] = h(x)^T \left( H^T H + \frac{\sigma^2}{\tau} \Sigma^{-1} \right)^{\!-1} h(x)\, \sigma^2.
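These posterior formulas are straightforward to sketch in code (the toy data, \Sigma = I, and the values of \tau and \sigma^2 below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
N = 50
x = np.sort(rng.uniform(0, 3, N))
y = np.sin(x) + rng.normal(scale=0.3, size=N)
H = np.vstack([x**j for j in range(4)]).T     # basis matrix

sigma2 = 0.3**2                               # assumed known noise variance
tau = 5.0                                     # prior scale in beta ~ N(0, tau * Sigma)
Sigma = np.eye(H.shape[1])

A = np.linalg.inv(H.T @ H + (sigma2 / tau) * np.linalg.inv(Sigma))
beta_post_mean = A @ H.T @ y                  # E(beta | Z)
beta_post_cov = A * sigma2                    # cov(beta | Z)

# Posterior mean and pointwise posterior variance of mu(x)
mu_post_mean = H @ beta_post_mean
mu_post_var = np.einsum("ij,jk,ik->i", H, A, H) * sigma2
print(beta_post_mean, mu_post_var[:3])
```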
Bootstrap vs Bayesian
• The bootstrap mean is an approximate posterior average.
• Simple example:
  – A single observation z is drawn from a normal distribution, z \sim N(\theta, 1)
  – Assume a normal prior for \theta: \theta \sim N(0, \tau)
  – The resulting posterior distribution is
    \theta \mid z \;\sim\; N\!\left( \frac{z}{1 + 1/\tau},\; \frac{1}{1 + 1/\tau} \right)
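A tiny simulation of this example (an illustrative sketch; the deck only states the formula): as \tau \to \infty the posterior tends to N(z, 1), which matches the parametric bootstrap distribution z^* \sim N(\hat{\theta}, 1) with \hat{\theta} = z.

```python
import numpy as np

rng = np.random.default_rng(7)
z = 1.3          # the single observation
tau = 1e6        # a very large tau approximates a noninformative prior

# Posterior draws: theta | z ~ N(z / (1 + 1/tau), 1 / (1 + 1/tau))
posterior = rng.normal(z / (1 + 1 / tau), np.sqrt(1 / (1 + 1 / tau)), size=10000)

# Parametric bootstrap draws: z* ~ N(theta_hat, 1) with theta_hat = z
bootstrap = rng.normal(z, 1.0, size=10000)

print(posterior.mean(), posterior.std())   # both sets of draws are close to N(z, 1)
print(bootstrap.mean(), bootstrap.std())
```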
Bootstrap vs Bayesian
• Three ingredients make this work:
  – The choice of a noninformative prior for \theta
  – The dependence of the log-likelihood \ell(\theta; \mathbf{Z}) on \mathbf{Z} only through the maximum-likelihood estimate \hat{\theta}, so that \ell(\theta; \mathbf{Z}) = \ell(\theta; \hat{\theta})
  – The symmetry of the log-likelihood in \theta and \hat{\theta}, i.e. \ell(\theta; \hat{\theta}) = \ell(\hat{\theta}; \theta) + \text{constant}


Bootstrap vs Bayesian
• The bootstrap distribution represents an (approximate) nonparametric, noninformative posterior distribution for our parameter.
• But this bootstrap distribution is obtained painlessly, without having to formally specify a prior and without having to sample from the posterior distribution.
• Hence we might think of the bootstrap distribution as a "poor man's" Bayes posterior. By perturbing the data, the bootstrap approximates the Bayesian effect of perturbing the parameters, and it is typically much simpler to carry out.


Contents
• The Bootstrap and Maximum Likelihood Methods
• Bayesian Methods
• Relationship Between the Bootstrap and Bayesian
Inference
• The EM Algorithm
• MCMC for Sampling from the Posterior
• Bagging
• Model Averaging and Stacking



The EM Algorithm
• The EM algorithm for two-component Gaussian mixtures:
  – Take initial guesses \hat{\pi}, \hat{\mu}_1, \hat{\sigma}_1^2, \hat{\mu}_2, \hat{\sigma}_2^2 for the parameters.
  – Expectation step: compute the responsibilities
    \hat{\gamma}_i = \frac{ \hat{\pi}\, \phi_{\hat{\theta}_2}(y_i) }{ (1 - \hat{\pi})\, \phi_{\hat{\theta}_1}(y_i) + \hat{\pi}\, \phi_{\hat{\theta}_2}(y_i) }, \qquad i = 1, \dots, N


The EM Algorithm
  – Maximization step: compute the weighted means and variances
    \hat{\mu}_1 = \frac{ \sum_{i=1}^{N} (1 - \hat{\gamma}_i)\, y_i }{ \sum_{i=1}^{N} (1 - \hat{\gamma}_i) }, \qquad
    \hat{\sigma}_1^2 = \frac{ \sum_{i=1}^{N} (1 - \hat{\gamma}_i)(y_i - \hat{\mu}_1)^2 }{ \sum_{i=1}^{N} (1 - \hat{\gamma}_i) },
    \hat{\mu}_2 = \frac{ \sum_{i=1}^{N} \hat{\gamma}_i\, y_i }{ \sum_{i=1}^{N} \hat{\gamma}_i }, \qquad
    \hat{\sigma}_2^2 = \frac{ \sum_{i=1}^{N} \hat{\gamma}_i (y_i - \hat{\mu}_2)^2 }{ \sum_{i=1}^{N} \hat{\gamma}_i },
    and the mixing proportion \hat{\pi} = \sum_{i=1}^{N} \hat{\gamma}_i / N
  – Iterate the expectation and maximization steps until convergence.
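A compact sketch of these two steps for a one-dimensional two-component mixture follows (the data are simulated and the initialization is deliberately naive; both are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)
y = np.concatenate([rng.normal(0, 1, 150), rng.normal(4, 1.5, 100)])  # simulated mixture data

# Naive initial guesses for pi, mu_1, sigma_1^2, mu_2, sigma_2^2
pi, mu1, s1, mu2, s2 = 0.5, y.min(), y.var(), y.max(), y.var()

for _ in range(100):
    # E-step: responsibilities gamma_i of component 2
    p1 = (1 - pi) * norm.pdf(y, mu1, np.sqrt(s1))
    p2 = pi * norm.pdf(y, mu2, np.sqrt(s2))
    gamma = p2 / (p1 + p2)

    # M-step: weighted means, variances, and mixing proportion
    w1, w2 = (1 - gamma).sum(), gamma.sum()
    mu1 = ((1 - gamma) * y).sum() / w1
    s1 = ((1 - gamma) * (y - mu1) ** 2).sum() / w1
    mu2 = (gamma * y).sum() / w2
    s2 = (gamma * (y - mu2) ** 2).sum() / w2
    pi = w2 / len(y)

print(pi, mu1, np.sqrt(s1), mu2, np.sqrt(s2))
```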
The EM Algorithm in General
• Also known as the Baum–Welch algorithm (in the hidden Markov model literature).
• Applicable to problems for which maximizing the log-likelihood is difficult, but which are simplified by enlarging the sample with unobserved (latent) data (data augmentation).


The EM Algorithm in General
• \mathbf{Z}: observed data, with log-likelihood \ell(\theta; \mathbf{Z})
• \mathbf{Z}^m: latent data (in our example the \Delta_i)
• \mathbf{T} = (\mathbf{Z}, \mathbf{Z}^m): complete data, with log-likelihood \ell_0(\theta; \mathbf{T})


The EM Algorithm in General
• Since
  \Pr(\mathbf{Z}^m \mid \mathbf{Z}, \theta') = \frac{ \Pr(\mathbf{Z}^m, \mathbf{Z} \mid \theta') }{ \Pr(\mathbf{Z} \mid \theta') },
  \qquad
  \Pr(\mathbf{Z} \mid \theta') = \frac{ \Pr(\mathbf{T} \mid \theta') }{ \Pr(\mathbf{Z}^m \mid \mathbf{Z}, \theta') },
  we have, in terms of log-likelihoods,
  \ell(\theta'; \mathbf{Z}) = \ell_0(\theta'; \mathbf{T}) - \ell_1(\theta'; \mathbf{Z}^m \mid \mathbf{Z})


The EM Algorithm in General
• Start with initial parameter guesses \hat{\theta}^{(0)}.
• Expectation step: at the j-th step, compute
  Q(\theta', \hat{\theta}^{(j)}) = \mathrm{E}\!\left[ \ell_0(\theta'; \mathbf{T}) \mid \mathbf{Z}, \hat{\theta}^{(j)} \right]
  as a function of the dummy argument \theta'.
• Maximization step: determine the new estimate \hat{\theta}^{(j+1)} by maximizing Q(\theta', \hat{\theta}^{(j)}) over \theta'.
• Iterate the expectation and maximization steps until convergence.
Contents
• The Bootstrap and Maximum Likelihood Methods
• Bayesian Methods
• Relationship Between the Bootstrap and Bayesian
Inference
• The EM Algorithm
• MCMC for Sampling from the Posterior
• Bagging
• Model Averaging and Stacking



Model Averaging and Stacking
• Given predictions \hat{f}_1(x), \hat{f}_2(x), \dots, \hat{f}_M(x)
• Under squared-error loss, seek weights \hat{w} = (\hat{w}_1, \hat{w}_2, \dots, \hat{w}_M)
• Such that
  \hat{w} = \arg\min_{w} \mathrm{E}_{\mathcal{P}}\!\left[ Y - \sum_{m=1}^{M} w_m \hat{f}_m(x) \right]^2
• Here the input x is fixed and the N observations in \mathbf{Z} (and the target Y) are distributed according to \mathcal{P}.
Model Averaging and Stacking
• The solution is the population linear regression of Y on \hat{F}(x)^T \equiv [\hat{f}_1(x), \hat{f}_2(x), \dots, \hat{f}_M(x)], namely
  \hat{w} = \mathrm{E}_{\mathcal{P}}\!\left[ \hat{F}(x) \hat{F}(x)^T \right]^{-1} \mathrm{E}_{\mathcal{P}}\!\left[ \hat{F}(x)\, Y \right]
• The full regression has smaller error than any single model, namely
  \mathrm{E}_{\mathcal{P}}\!\left[ Y - \sum_{m=1}^{M} \hat{w}_m \hat{f}_m(x) \right]^2 \le \mathrm{E}_{\mathcal{P}}\!\left[ Y - \hat{f}_m(x) \right]^2 \quad \text{for each } m
• The population linear regression is not available, so we replace it by the linear regression over the training set.
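A bare-bones sketch of the training-set version follows, with two assumed base models (a linear and a cubic polynomial fit); in practice stacking uses cross-validated predictions, which this sketch omits for brevity.

```python
import numpy as np

rng = np.random.default_rng(9)
N = 200
x = rng.uniform(0, 3, N)
y = np.sin(x) + rng.normal(scale=0.3, size=N)

# Two illustrative base models: a linear fit and a cubic fit
f1 = np.polyval(np.polyfit(x, y, 1), x)
f2 = np.polyval(np.polyfit(x, y, 3), x)

# Stacking: regress Y on the base-model predictions over the training set
F = np.column_stack([f1, f2])                    # N x M matrix of predictions F_hat(x)
w_hat, *_ = np.linalg.lstsq(F, y, rcond=None)    # w_hat = (F^T F)^{-1} F^T y
y_stacked = F @ w_hat

# Training-set error of the stacked combination vs. each single model
print(w_hat)
print([np.mean((y - f) ** 2) for f in (f1, f2, y_stacked)])
```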
Contents
• The Bootstrap and Maximum Likelihood Methods
• Bayesian Methods
• Relationship Between the Bootstrap and Bayesian
Inference
• The EM Algorithm
• MCMC for Sampling from the Posterior
• Bagging
• Model Averaging and Stacking


