Problem set for INLA course in Iceland 2016

Haakon Bakka

Department of Mathematical Sciences,
Norwegian University of Science and Technology

Problem 1: Basic INLA setup and a linear regression


It should be possible to do this problem without knowing anything about INLA. I assume everyone knows
basic R programming.

1. Start a recent version of R (> 3.1.0). (Suggested IDE for R: RStudio, https://www.rstudio.com)

2. Install package R-INLA, or update the one you have


> install.packages("INLA", repos="https://www.math.ntnu.no/inla/R/testing")
> inla.update(testing=TRUE)
3. Load package R-INLA:
> library(INLA)
4. Download the data sets from the internet and save them in the current working directory. All the files
related to this problem set are available at http://tinyurl.com/h36omdm.

5. Read the Seeds data into R using the supplied script


> data(Seeds)
> df = Seeds
> df$y = df$r

6. Make yourself familiar with the data set (a short summary and/or some plots). Also see:
> ?Seeds
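A minimal sketch of a first look at the data, assuming the columns r, n, x1, x2 and plate described in ?Seeds. The derived column prop and the choice of plot are illustrative only and not part of the original exercise.

```r
library(INLA)            # provides the Seeds data set

data(Seeds)
df <- Seeds
df$y <- df$r

# Structure and numeric summary of every column
str(df)
summary(df)

# Naive germination proportion per plate (hypothetical helper column)
df$prop <- df$r / df$n

# Proportions grouped by seed type (x1) and root extract (x2)
boxplot(prop ~ x1 + x2, data = df,
        xlab = "x1.x2 (seed type . root extract)",
        ylab = "germinated / total")
```

The boxplot gives a quick impression of how much the raw proportions differ between the four cells of the 2 by 2 layout before any model is fitted.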
7a. Fit a simple linear regression with R-INLA
> formula1 = y ~ x1 + x2 + n + plate
> res = inla(formula=formula1, data = df, family="gaussian")
> summary(res)
> plot(res)
Read the summary and look at the plots. [1]

7b. Make sure to let me know that you have come this far! (I probably gave you a sheet/poll to keep track
of this.)

[1] I do not consider this a good model, at all. In these first exercises, we are learning how to fit a given
model and look at the results. At the moment we do not care if the model is a good representation of
reality. That will come later!

Problem 2: The inference results
1a. Plot the marginal posterior for the fixed effects yourself.

1b. Plot the marginal posterior of the hyper-parameter τ (precision) by


> plot(res$marginals.hyperpar$P)
> plot(res$marginals.hyperpar[[1]])
In your code, remove the P and use the autocomplete feature to find the actual name of the hyper-parameter.
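One possible way to approach 1a and 1b is sketched below. It assumes the model from Problem 1 has been refitted as res; the object names m_x1 and m_tau are illustrative. The element names are looked up with names() rather than hard-coded, since they can differ between models.

```r
library(INLA)

data(Seeds)
df <- Seeds
df$y <- df$r

res <- inla(y ~ x1 + x2 + n + plate, data = df, family = "gaussian")

# Names of the available marginals (fixed effects and hyper-parameters)
names(res$marginals.fixed)
names(res$marginals.hyperpar)

# Each marginal is a two-column matrix (x, y): a density evaluated on a grid
m_x1  <- res$marginals.fixed[["x1"]]
m_tau <- res$marginals.hyperpar[[1]]

# inla.smarginal() interpolates the density onto a finer grid for plotting
plot(inla.smarginal(m_x1), type = "l",
     xlab = "coefficient of x1", ylab = "posterior density")
abline(v = 0, lty = 2)   # reference line at zero

plot(inla.smarginal(m_tau), type = "l",
     xlab = "precision tau", ylab = "posterior density")
```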

1c. Plot the marginal posterior for the hyper-parameter σ^2 = 1/τ using
> inla.tmarginal(function(x){(1/x)}, res$marginals.hyperpar[[1]])
and get estimates and credible intervals for the standard deviation σ.

2. Look at the posterior mean of the hyper-parameter in summary(res), and compute this value by using
> inla.emarginal()
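The transformation in 1c and the expectation in 2 can be sketched as follows. This is one way to do it, not the only one: the standard deviation is obtained directly as σ = 1/sqrt(τ), and the names m_sd, q_sd and mean_tau are illustrative.

```r
library(INLA)

data(Seeds)
df <- Seeds
df$y <- df$r
res <- inla(y ~ x1 + x2 + n + plate, data = df, family = "gaussian")

# Marginal posterior of the precision tau (the only hyper-parameter here)
m_tau <- res$marginals.hyperpar[[1]]

# Transform the marginal to the standard deviation sigma = 1 / sqrt(tau)
m_sd <- inla.tmarginal(function(x) 1 / sqrt(x), m_tau)
plot(m_sd, type = "l", xlab = "sigma", ylab = "posterior density")

# Median and equal-tailed 95% credible interval for sigma
q_sd <- inla.qmarginal(c(0.025, 0.5, 0.975), m_sd)
q_sd

# Posterior mean of tau, to compare with the value reported by summary(res)
mean_tau <- inla.emarginal(function(x) x, m_tau)
c(emarginal = mean_tau, summary = res$summary.hyperpar[1, "mean"])
```

Replacing the identity function in inla.emarginal() with, for example, function(x) 1 / sqrt(x) gives the posterior mean of σ without transforming the marginal first.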
3a. Compute the residuals by
> res = inla(formula=formula1, data = df, family="gaussian",
+            control.predictor=list(compute=TRUE))
> err = res$summary.fitted.values[ , "mean"]-df$y
and summarise them.

3b. Use advanced google skills to find control.predictor on the INLA homepage. This means, google

"control.predictor" site:r-inla.org
3c. Plot or histogram these residuals in any way you like
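A short sketch for 3a and 3c, keeping the sign convention used in 3a (fitted minus observed). The plotting choices are illustrative.

```r
library(INLA)

data(Seeds)
df <- Seeds
df$y <- df$r

res <- inla(y ~ x1 + x2 + n + plate, data = df, family = "gaussian",
            control.predictor = list(compute = TRUE))

# Posterior mean of the fitted values minus the observations, as in 3a
fitted_mean <- res$summary.fitted.values[, "mean"]
err <- fitted_mean - df$y

summary(err)

# Distribution of the residuals, and residuals against fitted values
hist(err, breaks = 10, main = "", xlab = "fitted - observed")
plot(fitted_mean, err, xlab = "fitted value", ylab = "fitted - observed")
abline(h = 0, lty = 2)
```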

4. Write down the entire prior model π(y) on a sheet of paper. There are some things you need to know;
look them up or ask me specific questions. Hints: [2]

4a. Write down π(y|η)π(η|θ)π(θ)

4b. Look at only one of the documents you get when running
> inla.doc("gaussian")
alternatively, use http://www.r-inla.org/models/likelihoods.

4c. Run this code to find π(βj )

> ?control.fixed
> tmp = inla(y~x, data = data.frame(y=0:2, x= rep(NA, 3)),
+            control.fixed = list(expand.factor.strategy="inla"))
> summary(tmp)
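As a starting template for 4a, the three levels for the Gaussian regression of Problem 1 can be laid out as below. This is a generic sketch: the prior parameters μ_j and τ_j and the form of π(τ) are deliberately left as symbols (they are assumptions of notation, not values from the course), because finding the defaults is the point of 4b and 4c.

```latex
\begin{align*}
\text{Likelihood:} \quad
  & y_i \mid \eta_i, \tau \sim \mathcal{N}\!\left(\eta_i, \tau^{-1}\right),
    \qquad i = 1, \dots, 21, \\
\text{Latent field:} \quad
  & \eta_i = \beta_0 + \sum_{j \ge 1} \beta_j x_{ij},
    \qquad \beta_j \sim \mathcal{N}\!\left(\mu_j, \tau_j^{-1}\right), \\
\text{Hyper-parameter:} \quad
  & \theta = \tau \sim \pi(\tau), \\
\text{Joint model:} \quad
  & \pi(y, \eta, \theta) = \pi(y \mid \eta, \theta)\, \pi(\eta \mid \theta)\, \pi(\theta).
\end{align*}
```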
5. Have a look around the inla result object

> str(res, 1)
> str(res$[whatever], 1)
[2] I don't want to give you the answers directly; I would rather show you how to find answers.

Problem 3: A model with random effects
This is a Generalised Linear Mixed Model, which is just a fancy name for "fixed effects and random effects".
You can also call this a GAM (Generalised Additive Model), or an LGM (Latent Gaussian Model). We will
continue with the dataset Seeds:
This data concerns the proportion of seeds that germinated on each of 21 plates arranged
according to a 2 by 2 factorial layout by seed and type of root extract. The data are shown
below, where r_i and n_i are the number of germinated and the total number of seeds on the i-th
plate, i = 1, ..., N.
1a. Fit the model

y_i ∼ Pois(n_i e^{η_i})
η = Xβ

with the covariates (X) of your choice. Write down the entire model. Note the two different symbols η, n.

1b. Add an iid effect (unstructured effect) to the formula you used in "1a". Write down the entire model. [3]
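A possible starting point for 1a and 1b is sketched below. The covariates x1 and x2 are just one choice for X, and the object names res_pois and res_pois_iid are illustrative. The known factor n_i enters through the argument E, which for the Poisson likelihood multiplies exp(η_i) in the mean.

```r
library(INLA)

data(Seeds)
df <- Seeds

# 1a: Poisson likelihood with mean n_i * exp(eta_i); n_i is passed through E
res_pois <- inla(r ~ x1 + x2, family = "poisson", E = df$n, data = df)

# 1b: same linear predictor plus an unstructured (iid) effect, one per plate
res_pois_iid <- inla(r ~ x1 + x2 + f(plate, model = "iid"),
                     family = "poisson", E = df$n, data = df)

# Fixed effects of both fits, and the precision of the iid effect
res_pois$summary.fixed
res_pois_iid$summary.fixed
res_pois_iid$summary.hyperpar
```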
2. Fit this model:
The model is essentially a random effects logistic, allowing for over-dispersion. If p_i is the
probability of germination on the i-th plate, we assume

r_i ∼ Binomial(p_i, n_i)
logit(p_i) = a_0 + a_1 x_{1i} + a_2 x_{2i} + ε_i

where x_{1i}, x_{2i} are the seed type and root extract of the i-th plate. Write down the entire model.
3. Your favourite biologist wants to know how the effect changes from root extract 1 to root extract 2. Set
up different ways to interpret this question, and find the answers (write code), using the binomial model.
Feel free to add the naive estimates to your plots / tables. Feel obligated to show any relevant uncertainty
in a clear way.
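The sketch below shows two of several possible readings of the question in 3, assuming that "root extract 1" and "root extract 2" correspond to x2 = 0 and x2 = 1 (an assumption about the coding, to be checked with ?Seeds). The names res_bin, m_or, or_q and naive are illustrative.

```r
library(INLA)

data(Seeds)
df <- Seeds

# Binomial model with seed type, root extract and an iid effect per plate
res_bin <- inla(r ~ x1 + x2 + f(plate, model = "iid"),
                family = "binomial", Ntrials = df$n, data = df)

# Reading 1: change in log-odds, i.e. the coefficient of x2 itself
res_bin$summary.fixed["x2", c("mean", "0.025quant", "0.975quant")]

# Reading 2: odds ratio, obtained by transforming the marginal with exp()
m_or <- inla.tmarginal(exp, res_bin$marginals.fixed[["x2"]])
or_q <- inla.qmarginal(c(0.025, 0.5, 0.975), m_or)
or_q

# Naive estimate: pooled germination proportion for each root extract
naive <- tapply(df$r, df$x2, sum) / tapply(df$n, df$x2, sum)
naive
```

A difference in germination probability, rather than in odds, is a third reading; it depends on the seed type and on the plate effect, so it needs more care than a single coefficient.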

4. You want to have interactions between the different factors x1 and x2. Use
> df$x1x2 = as.factor(df$x1*10 + df$x2)
and create a model similar to the previous model. Make sure it is self-consistent (that nothing is modelled
twice).

5. Change the prior for the hyper-parameter of the iid effect. Information:
> inla.doc("pc.prec")
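A minimal sketch of how a prior can be passed to the iid effect through the hyper argument of f(). The PC prior is specified by P(σ > u) = α with param = c(u, α); the values u = 1 and α = 0.01 below are purely illustrative and not a recommendation from the course.

```r
library(INLA)

data(Seeds)
df <- Seeds

# PC prior on the precision of the iid effect: P(sigma > 1) = 0.01
# (illustrative values; see inla.doc("pc.prec") for the definition)
pc_hyper <- list(prec = list(prior = "pc.prec", param = c(1, 0.01)))

res_pc <- inla(r ~ x1 + x2 + f(plate, model = "iid", hyper = pc_hyper),
               family = "binomial", Ntrials = df$n, data = df)

# Posterior summary of the iid precision under the new prior
res_pc$summary.hyperpar
```

Refitting with a few different values of u and comparing res_pc$summary.hyperpar shows how sensitive the result is to the prior.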

Problem 4: Set up this for another dataset


1. Create a new script. Read the "Surg" data into R.
> df = read.table("Surg.txt", header = TRUE, stringsAsFactors = FALSE)
This data considers mortality rates in 12 hospitals performing cardiac surgery in babies. The first three rows
of the data are:
     n   r  hospital
1   47   0         1
2  148  18         2
3  119   8         3
[3] I mean: write down the prior model π(y) on a sheet of paper, using at least the three levels of the
INLA hierarchical model. Clearly separate out the latent Gaussian.

2. Create an appropriate model for this data, using binomial likelihood and logit link function.

3. Change the link function to "probit".

4. Compare
> res$summary.linear.predictor
> res$summary.fitted.values

for the two models (the two link functions). Also compare these to
> df$r/df$n
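To see why the linear predictors of the two models differ while the fitted values stay comparable, the following base R sketch applies both links to the three rows listed above. The +0.5 / +1 adjustment is an illustrative choice to keep hospital 1 (r = 0) finite, and the names surg3, p_adj, eta_logit and eta_probit are hypothetical.

```r
# First three rows of the Surg data, copied from the listing above
surg3 <- data.frame(n = c(47, 148, 119), r = c(0, 18, 8), hospital = 1:3)

# Naive mortality proportions; hospital 1 has r = 0, so its proportion is 0
p_naive <- surg3$r / surg3$n

# Shift away from 0 so that both links give finite values
p_adj <- (surg3$r + 0.5) / (surg3$n + 1)

eta_logit  <- qlogis(p_adj)   # logit link:  log(p / (1 - p))
eta_probit <- qnorm(p_adj)    # probit link: standard normal quantile of p

# Different linear predictors, same probabilities after back-transforming
cbind(hospital = surg3$hospital, p_naive, p_adj, eta_logit, eta_probit,
      p_from_logit = plogis(eta_logit), p_from_probit = pnorm(eta_probit))
```

In R-INLA the link is usually selected through control.family, for example control.family = list(control.link = list(model = "probit")); check ?control.family for the version you have installed.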

Problem 5: Time series and nonlinear effects


In this section we look at time series. But time t is just another covariate. What we do with the time
covariate, you may do with any covariate.

1. Read the "Simdata" data into R. This simulated data contains a series of count data y ∼ Poi(exp(λ)),
where λ_t = ρ λ_{t−1} + ε_t with ρ = 0.8. Fit this model in R-INLA, and plot the marginal posterior for
the autocorrelation ρ of the AR(1) process.

1b. Fit the model again, but this time, fix ρ = 0.8. Do you see any differences in the results?
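The sketch below simulates a stand-in for Simdata (the real file is not reproduced here) and then fits the model with ρ free and with ρ fixed. The series length, the innovation standard deviation of 0.5 and all object names are assumptions made for illustration. The INLA part only runs when the package is installed. For the ar1 model, R-INLA parameterises ρ internally as θ = log((1 + ρ)/(1 − ρ)), so fixing ρ = 0.8 means fixing θ = log(9).

```r
set.seed(1)

# Simulated stand-in for Simdata: latent AR(1) with Poisson observations
n_t  <- 200
rho  <- 0.8
sd_e <- 0.5                                   # assumed innovation sd
lambda <- numeric(n_t)
lambda[1] <- rnorm(1, sd = sd_e / sqrt(1 - rho^2))   # stationary start
for (t in 2:n_t) lambda[t] <- rho * lambda[t - 1] + rnorm(1, sd = sd_e)
sim <- data.frame(t = 1:n_t, y = rpois(n_t, exp(lambda)))

# Internal scale used for rho in the ar1 model
theta_fixed <- log((1 + rho) / (1 - rho))

if (requireNamespace("INLA", quietly = TRUE)) {
  library(INLA)

  # 1: rho is estimated
  res_free <- inla(y ~ -1 + f(t, model = "ar1"),
                   family = "poisson", data = sim)

  # 1b: rho is fixed at 0.8 through its internal parameterisation
  fix_rho <- list(rho = list(initial = theta_fixed, fixed = TRUE))
  res_fix <- inla(y ~ -1 + f(t, model = "ar1", hyper = fix_rho),
                  family = "poisson", data = sim)

  # Marginal posterior of rho; the name is looked up rather than hard-coded
  idx <- grep("Rho", names(res_free$marginals.hyperpar))[1]
  plot(res_free$marginals.hyperpar[[idx]], type = "l",
       xlab = "rho", ylab = "posterior density")
  abline(v = rho, lty = 2)                    # true value used in simulation
}
```

Comparing res_free$summary.random$t with res_fix$summary.random$t is one way to see how much fixing ρ changes the estimated latent series.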

2. Load data "Tokyo" using data(Tokyo). The number of occurrences of rainfall over 1 mm in the Tokyo
area for each calendar day during two years (1983-84) are registered. It is of interest to estimate the
underlying probability p_t of rainfall for calendar day t which is, a priori, assumed to change gradually over
time. The likelihood model is binomial, y_t | η_t ∼ Bin(n_t, p_t), with logit link function

p_t = exp(η_t) / (1 + exp(η_t))

The model for the latent variables can be written as η_t = f(t), where t is the observed time whose effect is
modelled as a smooth function f().

(a) Fit the above model in R-INLA, using a cyclic RW2 (random walk of order 2).
(b) Plot the smooth function and the data together.
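One possible fit for (a) and (b) is sketched below. It keeps an intercept next to the cyclic RW2 term, which is a modelling choice made here so that the sketch does not depend on how the sum-to-zero constraint is handled; the object names res_tokyo and fit are illustrative.

```r
library(INLA)

data(Tokyo)   # columns: y (days with rain), n (days observed), time (1..366)

# (a) binomial likelihood with a cyclic second-order random walk in time
res_tokyo <- inla(y ~ 1 + f(time, model = "rw2", cyclic = TRUE),
                  family = "binomial", Ntrials = Tokyo$n, data = Tokyo,
                  control.predictor = list(compute = TRUE))

# Posterior summaries of the fitted probabilities p_t
fit <- res_tokyo$summary.fitted.values

# (b) raw proportions together with the posterior mean and a 95% band
plot(Tokyo$time, Tokyo$y / Tokyo$n, pch = 16, col = "grey60",
     xlab = "calendar day", ylab = "probability of rain")
lines(Tokyo$time, fit[, "mean"])
lines(Tokyo$time, fit[, "0.025quant"], lty = 2)
lines(Tokyo$time, fit[, "0.975quant"], lty = 2)
```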
3. Read the "Birth" data into R. This data contains the number of births per month in New York City, from
January 1946 to December 1959 (originally collected by Newton). This seasonal time series consists of an
overall mean, a trend component, a seasonal component and an irregular component. Decomposing the time
series means separating it into these components: that is, estimating the trend component and the
seasonal component.

(a) Fit the above model in R-INLA. Use "RW2" and "AR1".
(b) Plot the trend component.
(c) Plot the seasonal component.

Acknowledgement
Parts of this problem set were originally written by Geir-Arne Fuglstad and Jingyi Guo. Thank you!
