
Introduction and motivation

for the INLA course in Iceland 2016

Haakon Chris. Bakka

Norwegian University of Science and Technology

What are the main ideas behind R-INLA? What would you need to know to implement it yourself?

Rough time estimate: 4h presentation, 2h exercises.


Source

This presentation is partially based on the wonderful and very readable paper "Bayesian Computing with INLA: A Review" by Rue et al. (2016).
Goal

The goal of this presentation is to improve your understanding of the INLA method.

The level of detail is set to the maximum we can get through in 2-3 hours. This is not about coding.
What kind of models can we work with?
Latent Gaussian Models

Likelihood

    y | x, θ1 ∼ ∏_{i∈I} π(yi | xi, θ1)

Latent variable (linear predictor)

    x | θ2 ∼ N(µ(θ2), Q⁻¹(θ2))

Posterior

    π(x, θ | y) ∝ π(θ) π(x | θ) ∏_{i∈I} π(yi | xi, θ).     (1)
We make the following critical assumptions

1. The number of hyperparameters |θ| is small, typically 2 to 5, but not exceeding 20.
2. The distribution of the latent field x | θ is Gaussian, and is required to be a Gaussian Markov random field (GMRF), possibly of high dimension (e.g. 10⁵). Many xi will not be observed.
3. Conditional on θ and x, the observations y are mutually independent.
Are there many interesting models of this type?
Example 1: Groups and individuals

ηij = µ + cij β + ui + vi + wij , i = 1, . . . , N, j = 1, . . . , M

for covariates cij and “random effects” u, v and w.

If we assign Gaussian priors on µ, β, u and v, then

x = (µ, β, u, v, η)

is jointly Gaussian.
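As an aside, here is a minimal R-INLA sketch of how a model of this kind could be specified. The simulated data, object names and the choice of an iid effect are assumptions for illustration, not part of the slides; each of u, v and w would get its own f() term with its own index column.

library(INLA)

N <- 100; M <- 5
group <- rep(1:N, each = M)          # index i
cij   <- rnorm(N * M)                # covariate
u     <- rnorm(N, sd = 0.5)          # group-level effect
y     <- 1 + 0.3 * cij + u[group] + rnorm(N * M, sd = 1)
dat   <- data.frame(y = y, cij = cij, group = group)

# mu and beta are fixed effects with Gaussian priors;
# f(group, model = "iid") plays the role of the group-level "random effect"
formula <- y ~ 1 + cij + f(group, model = "iid")
result  <- inla(formula, family = "gaussian", data = dat)
summary(result)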
Example 1 - Precision matrix
[Figure: sparsity pattern of the precision matrix of (η, u, v, µ, β), with N = 100 and |θ| = 5.]


Example 2: Smoothing of binary time-series

▶ Data is a sequence of 0s and 1s
▶ The probability of a 1 at time t, pt, depends on time:

    pt = exp(ηt) / (1 + exp(ηt))

▶ Linear predictor

    ηt = µ + β ct + ut + vt,    t = 1, . . . , n
Example 2 - Continued
Prior models
▶ µ and β are Normal
▶ u is an AR(1) model,

    ut = φ ut−1 + εt

  with parameters (φ, σε²).
▶ v could be an unstructured term / "random effect" / slowly varying trend

Then

    x = (µ, β, u, v, η)

is jointly Gaussian.

Hyperparameters

    θ = (φ, σε², σv²)
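A hedged R-INLA sketch of how such a binary time series could be fitted. The simulated data, the covariate and all object names are assumptions for illustration; note that the same time index has to be copied into a second column to use it in two f() terms.

library(INLA)

n    <- 200
tm   <- 1:n
ct   <- rnorm(n)                                   # covariate
eta  <- -0.5 + 0.3 * ct + 2 * sin(2 * pi * tm / n) # "true" linear predictor
y    <- rbinom(n, size = 1, prob = plogis(eta))
dat  <- data.frame(y = y, ct = ct, t = tm, t2 = tm)

# u_t as an AR(1) term, v_t as an unstructured iid term
formula <- y ~ 1 + ct + f(t, model = "ar1") + f(t2, model = "iid")
result  <- inla(formula, family = "binomial", Ntrials = 1, data = dat)
summary(result)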
Example 2 - Rewritten

We can reinterpret the model as

    θ ∼ π(θ)
    x | θ ∼ π(x | θ) = N(0, Σ(θ))
    y | x, θ ∼ ∏_i π(yi | ηi, θ)

▶ dim(x) could be large, 10²-10⁵
▶ dim(θ) is small, 1-5
[Figure: sparsity pattern of the precision matrix of (η, u, v, µ, β), with N = 100 and M = 5.]


Example 3: Presence-Absence-Abundance

▶ Count data, with lots of zeros
▶ Bernoulli model for presence-absence, with probability pi = invlogit(ηi)
▶ Poisson model for abundance, with intensity λi = exp(γi)
▶ ηi = Xi α + f1,i + ...
▶ γi = Xi β + f4,i + ...

    x = (η, γ, α, β, f1, f2, ...)
    θ = (θf1, θf2, θf3, θf4, ...)
What is it that we want to do?

We want to compute probabilities in, and to sample from, big multivariate Gaussians.
Waypoint

So far: There are a million interesting models. These models have huge Gaussian components.
Next goal: How does the precision matrix Q(θ) look?
First: What is a GMRF? (And why are we talking about the precision matrix?)
What is a Gaussian Markov random field (GMRF)?

A GMRF is a simple construct

▶ A normally distributed random vector

    x = (x1, . . . , xn)^T

▶ Additional Markov properties:

    xi ⊥ xj | x−ij

  xi and xj are conditionally independent (CI).


The precision matrix

If xi ⊥ xj | x−ij for a set of {i, j}, then we need to constrain the parametrisation of the GMRF.

▶ Covariance matrix: difficult
▶ Precision matrix: easy
Conditional independence and the precision matrix

The density of a zero mean Gaussian is

    π(x) ∝ |Q|^{1/2} exp( −½ x^T Q x )

Constraining the parametrisation to obey CI properties:

Theorem

    xi ⊥ xj | x−ij  ⟺  Qij = 0
Example: Auto-regressive process
Simple example of a GMRF
Auto-regressive process of order 1

    xt | xt−1, . . . , x1 ∼ N(φ xt−1, 1),    t = 2, . . . , n

and x1 ∼ N(0, (1 − φ²)⁻¹).

[Figure: graph of the chain x1 — x2 — x3 — x4 — x5 — · · · — xn]

Tridiagonal precision matrix

        | 1     −φ                            |
        | −φ    1+φ²   −φ                     |
    Q = |        ·       ·       ·            |
        |               −φ     1+φ²    −φ     |
        |                       −φ     1      |
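A small sketch (using the Matrix package; the values of n and φ are arbitrary) that builds this tridiagonal precision matrix and shows that its zeros encode conditional independence even though the implied covariance is dense.

library(Matrix)

n   <- 8
phi <- 0.7
Q <- bandSparse(n, k = 0:1,
                diagonals = list(c(1, rep(1 + phi^2, n - 2), 1),
                                 rep(-phi, n - 1)),
                symmetric = TRUE)
print(round(as.matrix(Q), 2))        # tridiagonal: only neighbours interact

Sigma <- solve(Q)                    # dense: marginal correlations decay as phi^|i-j|
round(cov2cor(as.matrix(Sigma)), 2)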
Main features of GMRFs

▶ Analytically tractable
▶ Modelling using conditional independence
▶ Merging GMRFs using conditioning (hierarchical models)
▶ Unified framework for
  ▶ understanding
  ▶ representation
▶ Computation using numerical methods for sparse matrices
▶ Why and how this works can be understood using conditional independence (maybe today...)
Waypoint

So far: We know what a GMRF (Gaussian Markov random field) is, why GMRFs make sense computationally, and why they make sense for modelling relationships.
Next: How does the precision matrix Q(θ) look in a typical model?
General example Q(θ)

    η = µ1 + βz + A1 f1 + A2 f2 + ε.

Let τµ and τβ be the (fixed) prior precisions for µ and β, and let τ be the precision of the small noise term ε. The two model components f1 and f2 have sparse precision matrices Q1(θ) and Q2(θ). Also, A1 (and similarly A2) is an n × m1 sparse matrix, which is zero except for exactly one 1 in each row. The joint precision matrix Qjoint(θ) of (η, f1, f2, β, µ) is (upper triangle shown; the matrix is symmetric)

    | τI   −τA1               −τA2               −τz           −τ1          |
    |      Q1(θ) + τA1^T A1   τA1^T A2           τA1^T z       τA1^T 1      |
    |                         Q2(θ) + τA2^T A2   τA2^T z       τA2^T 1      |
    |                                            τβ + τz^T z   τz^T 1       |
    |                                                          τµ + τ1^T 1  |
Looking at Q

Q = INLA:::inla.rw(100, 2)
image(Q)

demo("Epil")
result = inla(formula, family = "poisson", data = Epil,
              control.compute = list(config = TRUE))
image(forceSymmetric(result$misc$configs$config[[1]]$Q))
image(chol(forceSymmetric(result$misc$configs$config[[1]]$Q)))
What do we care about?
The most important quantity in Bayesian statistics is the posterior distribution:

    π(x, θ | y) ∝ π(θ) π(x | θ) ∏_{i∈I} π(yi | xi, θ)
     (posterior)    (---- prior ----)   (likelihood)

from which we can derive the quantities of interest, such as

    π(xi | y) ∝ ∫∫ π(x, θ | y) dx−i dθ
              = ∫ π(xi | θ, y) π(θ | y) dθ

or π(θj | y).

These are very high-dimensional integrals and are typically not analytically tractable.
Gaussian likelihood gives Gaussian fixed-hyper posterior

Fix the hyperparameters to θ0; then

    π(x | y = data) ∝ π(x, y = data)
                    ∝ π(y = data | x) π(x)

where everything is Gaussian. To renormalise, we need to compute a determinant.
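A sketch of this computation for a toy model (an AR(1) prior on x and simulated data; these choices and all names are assumptions for illustration): the posterior precision is the prior precision plus τ0·I, the mean comes from one sparse solve, and the log-determinant needed for renormalisation comes from the same sparse matrix.

library(Matrix)

n    <- 100
phi  <- 0.95
tau0 <- 1                                        # known observation precision
Qprior <- bandSparse(n, k = 0:1,
                     diagonals = list(c(1, rep(1 + phi^2, n - 2), 1),
                                      rep(-phi, n - 1)),
                     symmetric = TRUE)
x_true <- as.numeric(arima.sim(model = list(ar = phi), n = n))
y      <- x_true + rnorm(n, sd = 1 / sqrt(tau0))

# pi(x | y, theta0) = N(mu, Qpost^{-1}) with Qpost = Qprior + tau0 * I
# and Qpost %*% mu = tau0 * y
Qpost  <- forceSymmetric(Qprior + tau0 * Diagonal(n))
L      <- Cholesky(Qpost)
mu     <- solve(L, tau0 * y)                            # posterior mean
logdet <- determinant(Qpost, logarithm = TRUE)$modulus  # for the normalising constant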
Recall: What is our model framework?

Latent Gaussian models

    y | x, θ ∼ ∏_i π(yi | ηi, θ)
    x | θ ∼ π(x | θ) = N(0, Q(θ)⁻¹)     Gaussian!
    θ ∼ π(θ)                            Not Gaussian

where the precision matrix Q(θ) is sparse. Generally these "sparse" Gaussian distributions are called Gaussian Markov random fields (GMRFs).

The sparseness can be exploited for very quick computations for the Gaussian part of the model through numerical algorithms for sparse matrices.
The INLA idea

Use the posterior distribution

π(x, θ | y) ∝ π(θ)π(x | θ)π(y | x, θ)

to approximate the posterior marginals

π(xi | y ) and π(θj | y )

directly.

Let us consider a toy example to illustrate the ideas.


How does INLA work?

Observations

    yi = m(i) + εi,    i = 1, . . . , n

Here, we assume that m(i) is a smooth function of i, and that the εi are iid N(0, τ0) with known precision τ0.

n = 50
idx = 1:n
# generate something smooth representing m
fun = 100 * ((idx - n/2) / n)^3
# add some noise
y = fun + rnorm(n, mean = 0, sd = 1)
plot(idx, y)

[Figure: scatter plot of the simulated data y against idx]
Assumed hierarchical model

1. Data: Gaussian observations with known precision

    yi | xi, θ ∼ N(xi, τ0)

2. Latent model: a random walk of second order¹

    π(x | θ) ∝ θ^((n−2)/2) exp( −(θ/2) Σ_{i=3}^{n} (xi − 2xi−1 + xi−2)² )

3. Hyperparameter: the smoothing parameter θ, which we assign a Γ(a, b) prior

    π(θ) ∝ θ^(a−1) exp(−bθ),    θ > 0

¹ model="rw2"
Derivation of posterior marginals (I)

Since

    x, y | θ ∼ N(·, ·)

(derived using π(x, y | θ) ∝ π(y | x, θ) π(x | θ)), we can compute (numerically) all marginals, using that

    π(θ | y) = π(x, θ | y) / π(x | y, θ)
             ∝ π(x, y | θ) π(θ) / π(x | y, θ)

where both the numerator π(x, y | θ) and the denominator π(x | y, θ) are Gaussian.
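A sketch of this identity for the toy RW2 model, up to an additive constant. It assumes y, tau0 and the structure matrix Q_rw2 from the earlier sketches; the small jitter is an assumption needed because the RW2 prior is intrinsic (rank deficient), and the Gamma-prior parameters are placeholders.

log_post_theta <- function(log_theta, y, Qstruct, tau0, a = 1, b = 5e-5) {
  theta <- exp(log_theta)
  n     <- length(y)
  Qx    <- theta * Qstruct + 1e-5 * diag(n)   # jittered prior precision of x
  Qpost <- Qx + tau0 * diag(n)                # precision of x | y, theta
  mu    <- solve(Qpost, tau0 * y)             # mean of x | y, theta
  # log pi(x | theta) + log pi(y | x), evaluated at x = mu
  lp_joint <- 0.5 * determinant(Qx)$modulus -
              0.5 * sum(mu * (Qx %*% mu)) -
              0.5 * tau0 * sum((y - mu)^2)
  # log pi(x | y, theta) at its mean: only the determinant term remains
  lp_cond  <- 0.5 * determinant(Qpost)$modulus
  # Gamma(a, b) prior on theta, including the Jacobian for log(theta)
  lp_prior <- a * log_theta - b * theta
  as.numeric(lp_joint - lp_cond + lp_prior)
}

Evaluating this on a grid of log θ values and exponentiating (after subtracting the maximum) gives an unnormalised version of the curve on the following slides.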
Posterior marginal for hyperparameter

Select a grid of points t1, . . . , tk to represent the density of θ | y. (Here, the points are chosen to be equi-distant.)

[Figure, two panels: "Posterior marginal for θ" and "Posterior marginal for θ (interpolated)"; the exponential of the log density plotted against the log precision log(θ).]
Derivation of posterior marginals (II)

From

    x | y, θ ∼ N(·, ·)

we can compute

    π(xi | y) = ∫ π(xi | y, θ) π(θ | y) dθ        (with π(xi | y, θ) Gaussian)
              ≈ Σ_k π(xi | y, tk) π(tk | y) ∆k

where tk, k = 1, . . . , K, correspond to representative points of θ | y and ∆k are the corresponding area weights (all equal if the points are equi-distant).
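Continuing the toy sketch under the same assumptions (reusing log_post_theta, y, tau0 and Q_rw2 from above): the grid weights are the normalised exponentiated log-densities, and π(x1 | y) is a finite mixture of the conditional Gaussians.

log_theta_grid <- seq(0, 6, by = 0.25)
lp <- sapply(log_theta_grid, log_post_theta, y = y, Qstruct = Q_rw2, tau0 = tau0)
w  <- exp(lp - max(lp)); w <- w / sum(w)       # equi-distant grid: all Delta_k equal

n <- length(y)
cond <- sapply(log_theta_grid, function(lt) {  # mean and sd of x_1 | y, t_k
  Qpost <- exp(lt) * Q_rw2 + 1e-5 * diag(n) + tau0 * diag(n)
  S     <- solve(Qpost)
  c(mean = (S %*% (tau0 * y))[1], sd = sqrt(S[1, 1]))
})

x1_grid <- seq(min(cond["mean", ]) - 4, max(cond["mean", ]) + 4, length.out = 200)
dens_x1 <- colSums(w * t(sapply(seq_along(w), function(k)
  dnorm(x1_grid, cond["mean", k], cond["sd", k]))))
plot(x1_grid, dens_x1, type = "l")             # approximate pi(x_1 | y)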
Posterior marginal for latent parameters

Compute the conditional marginal posterior for each xi given tk. Here, shown for x1.

[Figure: the conditional densities of x1 for each θ grid point (unweighted), density against x1.]
Posterior marginal for latent parameters

Weigh the resulting (conditional) marginal posterior by the density associated with each θk.

[Figure: the same conditional densities of x1, now weighted; weighted density against x1.]
Posterior marginal for latent parameters

Numerically sum over all conditional densities to obtain the posterior marginal for each xi.

[Figure: the resulting posterior marginal of x1, obtained by summing the weighted conditional densities.]
Fitted spline

The posterior marginals are used to calculate summary statistics, like means, variances and credible intervals.

[Figure: the data y against idx with the fitted posterior mean curve.]
R-code

formula = y ~ -1 + f(idx, model = "rw2", constr = FALSE,
                     hyper = list(prec = list(prior = "loggamma", param = c(a, b))))

result = inla(formula,
              data = data.frame(y = y, idx = idx),
              control.family = list(initial = log(tau_0), fixed = TRUE))

plot(idx, y, pch = 19)
lines(result$summary.random[[1]]$mean, col = 2, lwd = 2)
Extensions

This is the basic idea behind INLA. It is quite simple.

However, we need to extend this basic idea so we can deal with

1. More than one hyperparameter
2. Non-Gaussian observations
1. More than one hyperparameter

▶ Locate the mode
▶ Compute the Hessian to construct principal components (a reparametrisation z1, z2, . . . of θ1, θ2, . . .)
▶ Grid-search to locate the bulk of the probability mass

[Figure: grid exploration in the (θ1, θ2) plane along the principal-component axes z1, z2.]

All points found have equal area weight ∆k.
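A generic sketch of this exploration scheme, using a made-up 2-dimensional log-posterior as a placeholder; the threshold and grid spacing are arbitrary illustrations, not INLA's actual defaults.

nlpost <- function(theta) 0.5 * sum((theta - c(1, 2))^2 / c(0.5, 2))  # -log pi(theta | y), placeholder

# 1. Locate the mode (by minimising the negative log-posterior)
opt <- optim(c(0, 0), nlpost, hessian = TRUE)

# 2. Eigen-decompose the Hessian at the mode to get principal components z
eg <- eigen(opt$hessian)
theta_of_z <- function(z) as.numeric(opt$par + eg$vectors %*% (z / sqrt(eg$values)))

# 3. Grid-search in the z-parametrisation, keeping points whose log-density
#    is not too far below the mode; all retained points get equal area weight
z_grid <- as.matrix(expand.grid(z1 = seq(-3, 3, by = 1), z2 = seq(-3, 3, by = 1)))
keep   <- apply(z_grid, 1, function(z) nlpost(theta_of_z(z)) < nlpost(opt$par) + 2.5)
t_k    <- t(apply(z_grid[keep, ], 1, theta_of_z))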


Alternatives for moderate number of hyperparameters

Integrating out the hyperparameters for moderate m (6 to 12) is expensive, as the number of evaluation points is exponential in m.

Alternatives:
▶ Extreme: use just the modal configuration (empirical Bayes)
▶ Use a central composite design (CCD)

[Figure: CCD layout for m = 2, showing the design points and the circle points.]
2. Non-Gaussian observations

In applications we may choose likelihoods other than a Gaussian. How does this change things?

    π(θ | y) ∝ π(x, y | θ) π(θ) / π(x | y, θ)

where the numerator π(x, y | θ) is non-Gaussian BUT KNOWN, and the denominator π(x | y, θ) is non-Gaussian and UNKNOWN.

▶ In many cases π(x | y, θ) is very close to a Gaussian distribution, and can be replaced with a Laplace approximation.
The GMRF (Laplace) approximation
Let x denote a GMRF with precision matrix Q and mean µ. Approximate

    π(x | θ, y) ∝ exp( −½ x^T Q x + Σ_{i=1}^{n} log π(yi | xi) )

by using a second-order Taylor expansion of log π(yi | xi) around µ0, say.

Recall

    f(x) ≈ f(x0) + f′(x0)(x − x0) + ½ f″(x0)(x − x0)² = a + bx − ½ c x²

with b = f′(x0) − f″(x0) x0 and c = −f″(x0). (Note: a is not relevant.)
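For concreteness, a tiny sketch of these coefficients for a single Poisson observation with log-link, where (up to a constant) log π(y | x) = yx − exp(x); the function name and example values are made up.

taylor_bc <- function(y, x0) {
  d1 <- y - exp(x0)              # f'(x0)
  d2 <- -exp(x0)                 # f''(x0)
  c(b = d1 - d2 * x0, c = -d2)   # as in the formulas above
}
taylor_bc(y = 3, x0 = 1)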
The GMRF approximation (II)
Thus,

    π̃(x | θ, y) ∝ exp( −½ x^T Q x + Σ_{i=1}^{n} (ai + bi xi − ½ ci xi²) )
                ∝ exp( −½ x^T (Q + diag(c)) x + b^T x )

to get a Gaussian approximation with precision matrix Q + diag(c) and mean given by the solution of (Q + diag(c)) µ = b. The canonical parameterisation is

    N_C(b, Q + diag(c))

which corresponds to

    N((Q + diag(c))⁻¹ b, (Q + diag(c))⁻¹).
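In practice the expansion point is then updated and the expansion repeated. A minimal sketch of such an iteration for a Poisson likelihood with log-link, where Q is the prior precision of x as a dense matrix and y the counts; the fixed number of iterations and the zero starting point are simplifying assumptions, a real implementation would iterate to convergence.

gmrf_approx <- function(Q, y, niter = 20) {
  n  <- nrow(Q)
  mu <- rep(0, n)                           # expansion point mu_0
  for (k in 1:niter) {
    bvec <- (y - exp(mu)) + exp(mu) * mu    # b_i = f'(mu_i) - f''(mu_i) * mu_i
    cvec <- exp(mu)                         # c_i = -f''(mu_i)
    Qc   <- Q + diag(cvec, n)
    mu   <- as.numeric(solve(Qc, bvec))     # new mean: (Q + diag(c)) mu = b
  }
  list(mean = mu, precision = Qc)
}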


Illustration

[Figure, four panels: second-order expansions of a full conditional around 0, 0.5, 1 and 1.5; each panel shows the full conditional, its mode, the normal approximation and the expansion point.]

If y | x, θ is Gaussian, "the approximation" is exact!


What do we get ...

    π̃(θ | y) ∝ [ π(x, y | θ) π(θ) / π̃(x | y, θ) ]  evaluated at x = x*(θ)

▶ find the mode of π̃(θ | y) (optimisation)
▶ explore π̃(θ | y) to find grid points tk for numerical integration.

However, why is it called the integrated nested Laplace approximation? There is another step that changes:

    π(xi | y) ≈ Σ_k π(xi | y, tk) π̃(tk | y) ∆k

where π(xi | y, tk) is not Gaussian!
Approximating π(xi |y , θ)

Three possible approximations:

1. Gaussian distribution derived from π̃_G(x | θ, y), i.e.

    π̃(xi | θ, y) = N(xi; µi(θ), σi²(θ))

   with mean µi(θ) and marginal variance σi²(θ). However, errors in location and/or lack of skewness are possible.

2. Laplace approximation

3. Simplified Laplace approximation

Laplace approximation of π(xi | θ, y)

    π̃_LA(xi | θ, y) ∝ [ π(x, θ, y) / π̃_GG(x−i | xi, θ, y) ]  evaluated at x−i = x*−i(xi, θ)

The approximation is very good but expensive, as n factorizations of (n − 1) × (n − 1) matrices are required to get the n marginals.

Computational modifications exist:

1. Approximate the modal configuration of the GMRF approximation.
2. Reduce the size n by only involving the "neighbors".
Simplified Laplace approximation

Faster alternative to the Laplace approximation

▶ based on a series expansion up to third order of the numerator and denominator of π̃_LA(xi | θ, y)
▶ corrects the Gaussian approximation for error in location and lack of skewness.

This is the default option when using INLA, but this choice can be modified.
The integrated nested Laplace approximation (INLA)

Step I: Approximate π(θ | y) using the Laplace approximation and select good evaluation points tk.

Step II: For each tk and each i, approximate π(xi | y, tk) using the Laplace or simplified Laplace approximation for selected values of xi.

Step III: For each i, sum out tk:

    π̃(xi | y) = Σ_k π̃(xi | tk, y) × π̃(tk | y) × ∆k.
How can we assess the error in the approximations?

Tool 1: Compare a sequence of improved approximations


1. Gaussian approximation
2. Simplified Laplace
3. Laplace
No big differences → good approximation.
How can we assess the error in the approximations?

Tool 2: Estimate the "effective" number of parameters and compare this with the number of observations.

Experience has shown that n = 2 is usually very good.


Limitations

▶ The dimension of the latent field x can be large (10²–10⁶)
▶ But the dimension of the hyperparameters θ must be small (≤ 15)

In other words, each random effect can be big, but there cannot be too many random effects unless they share parameters.
How to represent the posterior?
The posterior

    π(θ, x | y = data)

is a massive object!

Discuss:
▶ Samples from π(θ, x | y = data)
▶ Marginal, marginal-joint, and conditional distributions
▶ Samples from θ and the Gaussian π(x | θ = sample, y = data)

demo("Epil")
result = inla(formula, family = "poisson", data = Epil,
              control.compute = list(config = TRUE))
str(result$misc$configs)
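One concrete option from the list above is drawing joint samples from the fitted model: R-INLA provides inla.posterior.sample() for results fitted with config = TRUE (continuing the Epil example; the number of samples is arbitrary).

samples <- inla.posterior.sample(n = 100, result)
str(samples[[1]], max.level = 1)   # one draw: hyperparameters, latent field, log-density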
Optimisation and exploring hyper-space

Make sure the parametrisation is good: the posterior of θ is asymptotically Gaussian, but can we make it more Gaussian? More unimodal?

Optimise through gradient descent.
Explore through the Hessian and a grid or CCD.

All of these are choices you can make. Inside INLA you can trade precision against speed: all the way from high-dimensional space-time models down to 100 data points and a simple model, you can choose an appropriate tradeoff.
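These tradeoffs are exposed through control.inla. A sketch, reusing the toy-example objects (formula, y, idx, tau_0) from the R-code slide earlier; the option values shown are some of the available choices.

result <- inla(formula,
               data = data.frame(y = y, idx = idx),
               control.family = list(initial = log(tau_0), fixed = TRUE),
               control.inla = list(strategy     = "simplified.laplace",  # or "gaussian", "laplace"
                                   int.strategy = "ccd"))                # or "grid", "eb"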
Waypoint

So far: Many details about INLA
Later: Go over this, and read the paper
Now: A few additional notes
Internal representation of hyper-parameters
Look through the logfile.

Practical: Use the logfile I have.


Summary
Thank you for your attention.

Any questions?
