Model Inference and Averaging: Dept. Computer Science & Engineering, Shanghai Jiao Tong University
Contents
• The Bootstrap and Maximum Likelihood Methods
• Bayesian Methods
• Relationship Between the Bootstrap and Bayesian Inference
• The EM Algorithm
• MCMC for Sampling from the Posterior
• Bagging
• Model Averaging and Stacking
2018/10/25 Model Inference and Averaging 3
Parametric Model
• Assume a parameterized probability density (parametric model) for observations
$z_i \sim g_\theta(z)$
E.g. normal distribution
$\theta = (\mu, \sigma^2)$
$$g_\theta(z) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}(z-\mu)^2/\sigma^2}$$
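As a quick numerical check of the normal density above, the following Python sketch evaluates $g_\theta(z)$ for $\theta = (\mu, \sigma^2)$. The function name `gaussian_density` and the test values are our own illustrative choices, not from the slides.

```python
import math


def gaussian_density(z: float, mu: float, sigma2: float) -> float:
    """Normal density g_theta(z) with theta = (mu, sigma2); sigma2 is the variance."""
    if sigma2 <= 0:
        raise ValueError("variance must be positive")
    return math.exp(-((z - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


# The density peaks at z = mu with height 1 / sqrt(2*pi*sigma2).
peak = gaussian_density(1.0, mu=1.0, sigma2=4.0)

# A Riemann sum over [mu - 10*sigma, mu + 10*sigma] should be very close to 1.
step = 0.001
area = sum(gaussian_density(-19.0 + k * step, 1.0, 4.0) * step for k in range(40_000))
print(peak, area)
```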
Maximum Likelihood Inference
• Suppose we are trying to measure the true value of some quantity ($x_T$).
– We make repeated measurements of this quantity $\{x_1, x_2, \dots, x_N\}$.
– The standard way to estimate $x_T$ from our measurements is to calculate the mean value
$$\mu_x = \frac{1}{N}\sum_{i=1}^{N} x_i$$
and set $x_T = \mu_x$.
• Does this procedure make sense?
– The maximum likelihood method (MLM) answers this question and provides a general method for estimating parameters of interest from data.
The Maximum Likelihood Method
• Statement of the Maximum Likelihood Method
– Assume we have made $N$ measurements of $x$: $\{x_1, x_2, \dots, x_N\}$.
– Assume we know the probability distribution function that describes $x$: $f(x, a)$.
– Assume we want to determine the parameter $a$.
• MLM: pick $a$ to maximize the probability of getting the measurements (the $x_i$'s) we did!
The MLM Implementation
• The probability of measuring $x_1$ is $f(x_1, a)\,dx$
• The probability of measuring $x_2$ is $f(x_2, a)\,dx$
• The probability of measuring $x_N$ is $f(x_N, a)\,dx$
• If the measurements are independent, the probability of getting the measurements we did is:
$$L = f(x_1, a)\,dx \cdot f(x_2, a)\,dx \cdots f(x_N, a)\,dx = f(x_1, a)\, f(x_2, a) \cdots f(x_N, a)\,[dx^N]$$
• We can drop the $dx^N$ term as it is only a proportionality constant. $L$ is called the likelihood function:
$$L = \prod_{i=1}^{N} f(x_i, a)$$
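To make the product form concrete, here is a hedged Python sketch that evaluates $L(a) = \prod_i f(x_i, a)$ on a grid of candidate values and picks the maximizing $a$. The exponential density $f(x, a) = a e^{-a x}$ and the data values are illustrative assumptions, not taken from the slides; for this density the maximizer is known in closed form, $a^* = 1/\bar{x}$, which gives a check.

```python
import math


def exp_density(x: float, a: float) -> float:
    """Illustrative density f(x, a) = a * exp(-a * x) for x >= 0 (an assumption)."""
    return a * math.exp(-a * x)


def likelihood(xs: list[float], a: float) -> float:
    """L(a) = product of f(x_i, a) over independent measurements."""
    value = 1.0
    for x in xs:
        value *= exp_density(x, a)
    return value


xs = [0.8, 1.9, 0.4, 1.1, 2.3, 0.5]          # hypothetical measurements
grid = [0.01 * k for k in range(1, 501)]      # candidate values of a in (0, 5]
a_star = max(grid, key=lambda a: likelihood(xs, a))

print(a_star, 1.0 / (sum(xs) / len(xs)))      # grid maximizer vs. closed form 1/mean
```

A grid search is used only because it makes the "pick $a$ to maximize $L$" rule literal; in practice one would solve the maximization condition directly or use a numerical optimizer.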
Log Maximum Likelihood Method
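• We want to pick the $a$ that maximizes $L$:
$$\left.\frac{\partial L}{\partial a}\right|_{a = a^*} = 0$$
– It is often easier to maximize $\ln L$.
– $L$ and $\ln L$ are maximized at the same location, because $\ln$ is strictly increasing.
• We maximize $\ln L$ rather than $L$ itself because $\ln L$ converts the product into a summation:
$$\ln L = \sum_{i=1}^{N} \ln f(x_i, a)$$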
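A practical reason for working with $\ln L$ shows up quickly in floating point: a product of many densities underflows to zero, while the sum of logs stays finite. A minimal Python sketch, assuming standard-normal densities and 2,000 made-up data points:

```python
import math


def normal_pdf(x: float) -> float:
    """Standard normal density; every value is at most about 0.399."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


# Hypothetical data: 2000 points cycling through -1.5, -1.0, ..., 1.5.
xs = [((i % 7) - 3) * 0.5 for i in range(2000)]

product = 1.0
for x in xs:
    product *= normal_pdf(x)                          # L = prod f(x_i): underflows

log_sum = sum(math.log(normal_pdf(x)) for x in xs)   # ln L = sum ln f(x_i): finite

print(product, log_sum)
```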
Log Maximum Likelihood Method
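• The new maximization condition is:
$$\left.\frac{\partial \ln L}{\partial a}\right|_{a = a^*} = \sum_{i=1}^{N} \left.\frac{\partial}{\partial a} \ln f(x_i, a)\right|_{a = a^*} = 0$$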
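An Example: Gaussian
• Let $f(x, \mu, \sigma)$ be a Gaussian with unknown mean $\mu$ and known $\sigma$; we use the data and the MLM to estimate $\mu$.
• The likelihood function for this problem is:
$$L = \prod_{i=1}^{N} f(x_i, \mu, \sigma) = \prod_{i=1}^{N} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} = \left[\frac{1}{\sigma\sqrt{2\pi}}\right]^N e^{-\frac{(x_1-\mu)^2}{2\sigma^2}}\, e^{-\frac{(x_2-\mu)^2}{2\sigma^2}} \cdots e^{-\frac{(x_N-\mu)^2}{2\sigma^2}} = \left[\frac{1}{\sigma\sqrt{2\pi}}\right]^N e^{-\sum_{i=1}^{N}\frac{(x_i-\mu)^2}{2\sigma^2}}$$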
An Example: Gaussian
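$$\ln L = \ln \prod_{i=1}^{N} f(x_i, \mu, \sigma) = \ln\left(\left[\frac{1}{\sigma\sqrt{2\pi}}\right]^N e^{-\sum_{i=1}^{N}\frac{(x_i-\mu)^2}{2\sigma^2}}\right) = N \ln\left(\frac{1}{\sigma\sqrt{2\pi}}\right) - \frac{1}{2}\sum_{i=1}^{N}\frac{(x_i-\mu)^2}{\sigma^2}$$
• We want to find the $\mu$ that maximizes the log-likelihood function:
$$\frac{\partial \ln L}{\partial \mu} = \frac{\partial}{\partial \mu}\left[N \ln\left(\frac{1}{\sigma\sqrt{2\pi}}\right) - \frac{1}{2}\sum_{i=1}^{N}\left(\frac{x_i-\mu}{\sigma}\right)^2\right] = 0$$
$$\sum_{i=1}^{N} 2(x_i - \mu)(-1) = 0 \;\Rightarrow\; \sum_{i=1}^{N} x_i = N\mu \;\Rightarrow\; \mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$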
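The closed-form result $\hat\mu = \frac{1}{N}\sum_i x_i$ can be checked by maximizing $\ln L$ numerically. In this Python sketch the data, the known $\sigma$, and the grid resolution are all assumptions chosen for illustration.

```python
import math


def gaussian_log_likelihood(xs: list[float], mu: float, sigma: float) -> float:
    """ln L = N ln(1/(sigma*sqrt(2*pi))) - (1/2) * sum(((x_i - mu)/sigma)**2)."""
    n = len(xs)
    return n * math.log(1.0 / (sigma * math.sqrt(2.0 * math.pi))) - 0.5 * sum(
        ((x - mu) / sigma) ** 2 for x in xs
    )


xs = [4.9, 5.3, 5.1, 4.7, 5.6, 5.0]   # hypothetical repeated measurements
sigma = 0.3                            # assumed known measurement error

grid = [4.0 + 0.001 * k for k in range(2001)]   # candidate mu values in [4, 6]
mu_grid = max(grid, key=lambda mu: gaussian_log_likelihood(xs, mu, sigma))
mu_closed = sum(xs) / len(xs)

print(mu_grid, mu_closed)
```

Changing `sigma` rescales $\ln L$ but does not move its maximizer, which matches the derivation: $\sigma$ cancels out of the condition $\sum_i (x_i - \mu) = 0$.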
An Example: Gaussian
• If $\sigma$ is different for each data point ($\sigma_i$), then $\mu$ is just the weighted average:
$$\mu = \frac{\sum_{i=1}^{N} x_i/\sigma_i^2}{\sum_{i=1}^{N} 1/\sigma_i^2} \qquad \text{(weighted average)}$$
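The weighted-average formula translates directly into code. This Python sketch is an illustration only; the helper name `weighted_mean` is hypothetical, and each $\sigma_i$ is assumed known and positive.

```python
def weighted_mean(xs: list[float], sigmas: list[float]) -> float:
    """Inverse-variance weighted average: sum(x_i / s_i^2) / sum(1 / s_i^2)."""
    if len(xs) != len(sigmas):
        raise ValueError("xs and sigmas must have the same length")
    weights = [1.0 / (s * s) for s in sigmas]
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)


# Equal errors reduce to the ordinary mean; a noisier point gets less weight.
print(weighted_mean([1.0, 3.0], [1.0, 1.0]))   # 2.0
print(weighted_mean([0.0, 10.0], [1.0, 3.0]))
```

In the second call the point with $\sigma = 3$ receives weight $1/9$, so the result is pulled strongly toward the more precise measurement at 0.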
• Log-likelihood function
$$\ell(\theta; \mathbf{Z}) = \log L(\theta; \mathbf{Z}) = \sum_{i=1}^{N} \log g_\theta(z_i) = \sum_{i=1}^{N} \ell(\theta; z_i)$$
where $\ell(\theta; z_i) = \log g_\theta(z_i)$
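• Assume that $L$ takes its maximum in the interior of the parameter space. Then the score function $\dot\ell = \partial \ell / \partial \theta$ vanishes at the maximizer $\hat\theta$:
$$\dot\ell(\hat\theta; \mathbf{Z}) = 0$$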
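For the normal model with $\theta = (\mu, \sigma^2)$, the maximizer is $\hat\mu = \bar z$ and $\hat\sigma^2 = \frac{1}{N}\sum_i (z_i - \bar z)^2$, and the condition $\dot\ell(\hat\theta; \mathbf{Z}) = 0$ can be checked with central finite differences. A Python sketch with made-up data; the step size `h` is an arbitrary assumption.

```python
import math


def log_lik(z: list[float], mu: float, sigma2: float) -> float:
    """l(theta; Z) = sum_i log g_theta(z_i) for the normal model, theta = (mu, sigma2)."""
    n = len(z)
    sq = sum((zi - mu) ** 2 for zi in z)
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - sq / (2.0 * sigma2)


def score(z: list[float], mu: float, sigma2: float, h: float = 1e-6) -> tuple[float, float]:
    """Central finite-difference approximation of the score (dl/dmu, dl/dsigma2)."""
    d_mu = (log_lik(z, mu + h, sigma2) - log_lik(z, mu - h, sigma2)) / (2.0 * h)
    d_s2 = (log_lik(z, mu, sigma2 + h) - log_lik(z, mu, sigma2 - h)) / (2.0 * h)
    return d_mu, d_s2


z = [2.1, 1.7, 2.9, 2.4, 1.5, 2.2]                        # hypothetical observations
mu_hat = sum(z) / len(z)
sigma2_hat = sum((zi - mu_hat) ** 2 for zi in z) / len(z)  # MLE divides by N, not N - 1

print(score(z, mu_hat, sigma2_hat))   # both components close to 0
```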
Likelihood Function
• We maximize the likelihood function
$$L(\theta; \mathbf{Z}) = \prod_{i=1}^{N} g_\theta(z_i)$$
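Therefore, for an estimator $T$ with $\psi(\theta) = E(T)$ and score $V$,
$$\operatorname{var}(T) \ge \frac{[\psi'(\theta)]^2}{\operatorname{var}(V)} = \frac{[\psi'(\theta)]^2}{I(\theta)},$$
which for an unbiased estimator ($\psi'(\theta) = 1$) reduces to $\operatorname{var}(T) \ge 1/I(\theta)$.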
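For the normal mean with known $\sigma$, the Fisher information from $N$ observations is $I(\mu) = N/\sigma^2$, and the sample mean is unbiased with variance exactly $\sigma^2/N = 1/I(\mu)$, so it attains the bound. A Monte Carlo sketch in Python with NumPy; the sample size, $\sigma$, seed, and number of replications are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 3.0, 2.0, 25, 20_000   # illustrative settings

# Each row is one data set of n normal observations; the MLE of mu is the row mean.
samples = rng.normal(loc=mu, scale=sigma, size=(reps, n))
mu_hats = samples.mean(axis=1)

fisher_info = n / sigma**2                  # I(mu) = N / sigma^2 for the normal mean
empirical_var = mu_hats.var()
print(empirical_var, 1.0 / fisher_info)     # the two should agree closely
```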
An Example
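• Consider a linear expansion
$$\mu(x) = \sum_{j=1}^{N} \beta_j h_j(x)$$
The noise variance is estimated from the residuals of the fit:
$$\hat\sigma^2 = \sum_{i=1}^{N} \left(y_i - \hat\mu(x_i)\right)^2 / N$$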
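A minimal sketch of fitting the expansion by least squares and computing $\hat\sigma^2$, in Python with NumPy. The polynomial basis $h_j(x) = x^{j-1}$ and the synthetic data are assumptions for illustration (the slides do not specify the basis); the coefficients are the least-squares solution $\hat\beta = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{y}$ with $\mathbf{H}_{ij} = h_j(x_i)$, computed via `np.linalg.lstsq` rather than an explicit inverse.

```python
import numpy as np


def basis_matrix(x: np.ndarray, n_basis: int) -> np.ndarray:
    """Hypothetical basis h_j(x) = x**(j-1), j = 1..n_basis; H[i, j] = h_j(x_i)."""
    return np.vander(x, N=n_basis, increasing=True)


def fit_expansion(x: np.ndarray, y: np.ndarray, n_basis: int):
    """Least-squares beta_hat and sigma2_hat = sum((y_i - mu_hat(x_i))**2) / N."""
    H = basis_matrix(x, n_basis)
    beta_hat, *_ = np.linalg.lstsq(H, y, rcond=None)
    residuals = y - H @ beta_hat
    sigma2_hat = np.sum(residuals**2) / len(y)   # divides by N, as on the slide
    return beta_hat, sigma2_hat


rng = np.random.default_rng(1)
x = np.linspace(0.0, 3.0, 50)
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.1, size=x.size)  # synthetic data
beta_hat, sigma2_hat = fit_expansion(x, y, n_basis=3)
print(beta_hat, sigma2_hat)
```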
An Example
Consider the prediction model $\hat\mu(x) = \sum_{j=1}^{N} \hat\beta_j h_j(x)$, with the approximate 95% pointwise band
$$\hat\mu(x) \pm 1.96\,\mathrm{se}[\hat\mu(x)]$$
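The band can be computed directly from a least-squares fit. A hedged Python/NumPy sketch, assuming a hypothetical polynomial basis and synthetic data for illustration, and using the standard least-squares formula $\mathrm{se}[\hat\mu(x)] = [h(x)^T(\mathbf{H}^T\mathbf{H})^{-1}h(x)]^{1/2}\,\hat\sigma$, with $\hat\sigma^2$ computed from the residuals.

```python
import numpy as np


def pointwise_band(x: np.ndarray, y: np.ndarray, x_new: np.ndarray, n_basis: int = 3):
    """mu_hat(x) +/- 1.96 * se[mu_hat(x)] for a least-squares basis expansion.

    The basis h_j(x) = x**(j-1) is a hypothetical choice for illustration.
    """
    H = np.vander(x, N=n_basis, increasing=True)
    beta_hat, *_ = np.linalg.lstsq(H, y, rcond=None)
    sigma2_hat = np.sum((y - H @ beta_hat) ** 2) / len(y)

    H_new = np.vander(x_new, N=n_basis, increasing=True)
    mu_hat = H_new @ beta_hat
    HtH_inv = np.linalg.inv(H.T @ H)
    # se^2 = h(x)^T (H^T H)^{-1} h(x) * sigma2_hat, evaluated for each row h(x) of H_new
    se = np.sqrt(np.einsum("ij,jk,ik->i", H_new, HtH_inv, H_new) * sigma2_hat)
    return mu_hat - 1.96 * se, mu_hat, mu_hat + 1.96 * se


rng = np.random.default_rng(2)
x = np.linspace(0.0, 3.0, 40)
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)   # synthetic data
lower, mu_hat, upper = pointwise_band(x, y, np.array([0.5, 1.5, 2.5]))
print(np.column_stack([lower, mu_hat, upper]))
```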
For $x \sim N(0, \Sigma)$ in $p$ dimensions:
$$p(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}}\, e^{-\frac{1}{2} x^T \Sigma^{-1} x}$$
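The zero-mean multivariate normal density can be evaluated as follows. This Python/NumPy sketch is illustrative; it uses `np.linalg.solve` for $\Sigma^{-1}x$ instead of forming the inverse, and the function name is our own.

```python
import numpy as np


def mvn_zero_mean_density(x: np.ndarray, cov: np.ndarray) -> float:
    """Density of N(0, cov) at x: exp(-x^T cov^{-1} x / 2) / ((2*pi)^(p/2) * |cov|^(1/2))."""
    p = x.shape[0]
    quad = x @ np.linalg.solve(cov, x)          # x^T cov^{-1} x without an explicit inverse
    norm_const = (2.0 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(cov))
    return float(np.exp(-0.5 * quad) / norm_const)


# With cov = I in p = 2 dimensions, the density at the origin is 1 / (2*pi).
print(mvn_zero_mean_density(np.zeros(2), np.eye(2)))
```

For a diagonal $\Sigma$ the density factorizes into a product of univariate normal densities, which gives an easy cross-check.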
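$$\hat\mu_2 = \frac{\sum_{i=1}^{N} \hat\gamma_i y_i}{\sum_{i=1}^{N} \hat\gamma_i}, \qquad \hat\sigma_2^2 = \frac{\sum_{i=1}^{N} \hat\gamma_i (y_i - \hat\mu_2)^2}{\sum_{i=1}^{N} \hat\gamma_i}, \qquad \hat\pi = \frac{\sum_{i=1}^{N} \hat\gamma_i}{N}$$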
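Weighted updates of this form, with responsibilities $\hat\gamma_i$ (the estimated probability that $y_i$ comes from component 2), are the maximization step of EM for a two-component Gaussian mixture. A compact Python/NumPy sketch of the full iteration; the starting values, fixed iteration count, and synthetic data are assumptions, and component 1 is updated with weights $1 - \hat\gamma_i$.

```python
import numpy as np


def normal_pdf(y: np.ndarray, mu: float, var: float) -> np.ndarray:
    return np.exp(-((y - mu) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def em_two_gaussians(y: np.ndarray, n_iter: int = 200):
    """EM for a two-component Gaussian mixture (illustrative starting values)."""
    mu1, mu2 = y.min(), y.max()             # crude initial means
    var1 = var2 = y.var()                   # initial variances
    pi_hat = 0.5                            # initial mixing proportion
    for _ in range(n_iter):
        # E-step: responsibility gamma_i = Pr(y_i came from component 2)
        p1 = (1.0 - pi_hat) * normal_pdf(y, mu1, var1)
        p2 = pi_hat * normal_pdf(y, mu2, var2)
        gamma = p2 / (p1 + p2)

        # M-step: weighted means, weighted variances, and mixing proportion
        w1, w2 = 1.0 - gamma, gamma
        mu1 = np.sum(w1 * y) / np.sum(w1)
        var1 = np.sum(w1 * (y - mu1) ** 2) / np.sum(w1)
        mu2 = np.sum(w2 * y) / np.sum(w2)
        var2 = np.sum(w2 * (y - mu2) ** 2) / np.sum(w2)
        pi_hat = np.sum(gamma) / len(y)
    return mu1, var1, mu2, var2, pi_hat


rng = np.random.default_rng(3)
y = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 200)])
mu1, var1, mu2, var2, pi_hat = em_two_gaussians(y)
print(mu1, mu2, pi_hat)
```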
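$$\Pr(\mathbf{Z}^m \mid \mathbf{Z}, \theta') = \frac{\Pr(\mathbf{T} \mid \theta')}{\Pr(\mathbf{Z} \mid \theta')}, \qquad \text{so} \qquad \Pr(\mathbf{Z} \mid \theta') = \frac{\Pr(\mathbf{T} \mid \theta')}{\Pr(\mathbf{Z}^m \mid \mathbf{Z}, \theta')}$$
where $\mathbf{T} = (\mathbf{Z}, \mathbf{Z}^m)$ is the complete data. In terms of log-likelihoods, we have
$$\ell(\theta'; \mathbf{Z}) = \ell_0(\theta'; \mathbf{T}) - \ell_1(\theta'; \mathbf{Z}^m \mid \mathbf{Z})$$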
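The identity $\ell(\theta'; \mathbf{Z}) = \ell_0(\theta'; \mathbf{T}) - \ell_1(\theta'; \mathbf{Z}^m \mid \mathbf{Z})$ holds for any value of the missing data. A small numerical check in Python, assuming (for illustration only) a two-component Gaussian mixture with one observation $z$ and a latent component label $\Delta \in \{0, 1\}$ playing the role of $\mathbf{Z}^m$; the parameter values are made up.

```python
import math


def normal_pdf(z: float, mu: float, var: float) -> float:
    return math.exp(-((z - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


# Hypothetical parameters theta' = (mu1, var1, mu2, var2, pi) and one observation z.
mu1, var1, mu2, var2, pi_hat = -1.0, 1.0, 2.0, 0.5, 0.3
z = 0.4


def joint(delta: int) -> float:
    """Pr(T | theta') for complete data T = (z, delta)."""
    if delta == 1:
        return pi_hat * normal_pdf(z, mu2, var2)
    return (1.0 - pi_hat) * normal_pdf(z, mu1, var1)


marginal = joint(0) + joint(1)   # Pr(Z | theta'), the observed-data likelihood

for delta in (0, 1):
    ell = math.log(marginal)                  # l(theta'; Z)
    ell0 = math.log(joint(delta))             # l_0(theta'; T)
    ell1 = math.log(joint(delta) / marginal)  # l_1(theta'; Z^m | Z)
    print(delta, ell, ell0 - ell1)            # the two columns agree
```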