Inlaexercise1 V2
1. Start a recent version of R (> 3.1.0). (Suggested IDE for R: RStudio, https://www.rstudio.com)
6. Make yourself familiar with the data set (a short summary and/or some plots); also:
> ?Seeds
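If the INLA package is installed, one possible first look at the data is the sketch below (the later exercises refer to the data frame as df; that name is my assumption):

```r
library(INLA)
data(Seeds)            # columns: r, n, x1, x2, plate
df <- Seeds
summary(df)            # numeric summary of each column
plot(df$n, df$r,       # germinated vs total seeds per plate
     xlab = "total seeds (n)", ylab = "germinated seeds (r)")
```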
7a. Fit a simple linear regression with R-INLA
> formula1 = y ~ x1 + x2 + n + plate
> res = inla(formula=formula1, data = df, family="gaussian")
> summary(res)
> plot(res)
Read the summary and look at the plots.¹
7b. Make sure to let me know that you have come this far! (I probably gave you a sheet/poll to keep track
of this.)
¹ I do not consider this a good model, at all. In these first exercises, we are learning how to fit a given model and look at the results. At the moment we do not care if the model is a good representation of reality. That will come later!
Problem 2: The inference results
1a. Plot the marginal posterior for the fixed effects yourself.
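A possible sketch for 1a: the marginal posteriors of the fixed effects are stored in res$marginals.fixed, one two-column (x, y) matrix per effect. The 2-by-3 plot layout assumes the five fixed effects of the formula above (intercept, x1, x2, n, plate).

```r
# Plot each fixed-effect marginal posterior in its own panel
par(mfrow = c(2, 3))
for (nm in names(res$marginals.fixed)) {
  plot(res$marginals.fixed[[nm]], type = "l",
       main = nm, xlab = nm, ylab = "density")
}
```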
1c. Plot the marginal posterior for the hyperparameter σ² = 1/τ using
> inla.tmarginal(function(x){(1/x)}, res$marginals.hyperpar[[1]])
and get estimates and credible intervals for the standard deviation σ.
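One way to do this (a sketch): transform the precision marginal to the standard-deviation scale, then summarise it.

```r
# sd = 1/sqrt(precision); transform the first hyperparameter marginal
marg.sd <- inla.tmarginal(function(x) 1/sqrt(x),
                          res$marginals.hyperpar[[1]])
plot(marg.sd, type = "l", xlab = "sd", ylab = "density")
inla.zmarginal(marg.sd)                    # mean, sd and quantiles
inla.qmarginal(c(0.025, 0.975), marg.sd)   # 95% credible interval
```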
2. Look at the posterior mean of the hyperparameter in summary(res), and compute this value yourself by using
> inla.emarginal()
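For example (a sketch), the posterior mean of the precision is the expectation of the identity function under its marginal:

```r
# Should agree with the hyperparameter mean reported by summary(res)
inla.emarginal(function(x) x, res$marginals.hyperpar[[1]])
```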
3a. Compute the residuals by
> res = inla(formula=formula1, data = df, family="gaussian",
control.predictor=list(compute=TRUE))
> err = res$summary.fitted.values[ , "mean"]-df$y
and summarise them.
3b. Use advanced Google skills to find control.predictor on the INLA homepage; that is, google
"control.predictor" site:r-inla.org
3c. Plot or histogram these residuals in any way you like.
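Two common options (a sketch; assumes err was computed as in 3a):

```r
summary(err)                                # quick numeric summary
hist(err, breaks = 20,
     main = "Residuals", xlab = "fitted - observed")
qqnorm(err); qqline(err)                    # check approximate normality
```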
4. Write down the entire prior model π(y) on a sheet of paper. There are some things you need to know;
look them up or ask me specific questions. Hints:²
4b. Look at only one of the documents you get when running
> inla.doc("gaussian")
alternatively, use http://www.r-inla.org/models/likelihoods.
> ?control.fixed
> tmp = inla(y~x, data = data.frame(y=0:2, x= rep(NA, 3)),
control.fixed = list(expand.factor.strategy="inla"))
> summary(tmp)
5. Have a look around the inla result object
> str(res, 1)
> str(res$whatever, 1)   # replace 'whatever' with a component name, e.g. summary.fixed
² I do not want to give you the answers directly; I would rather show you how to find them.
Problem 3: A model with random effects
This is a Generalised Linear Mixed Model, which is just a fancy name for "fixed effects and random effects".
You can also call this a GAM (Generalised Additive Model), or an LGM (Latent Gaussian Model). We will
continue with the dataset Seeds:
This data concerns the proportion of seeds that germinated on each of 21 plates arranged
according to a 2 by 2 factorial layout by seed and type of root extract. The data are shown
below, where r_i and n_i are the number of germinated seeds and the total number of seeds on the
i-th plate, i = 1, ..., N.
1a. Fit the model
y_i ~ Poisson(n_i exp(η_i))
η = Xβ
with the covariates (X) of your choice. Write down the entire model. Note the two different symbols: η (the linear predictor) and n (the number of seeds).
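A possible fit (a sketch): the n_i multiplier goes into the E argument of inla(); using x1 + x2 as covariates is just one choice.

```r
# Poisson model with expected-count multiplier n_i
formula.pois <- r ~ x1 + x2
res.pois <- inla(formula.pois, data = df,
                 family = "poisson", E = df$n)
summary(res.pois)
```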
1b. Add an iid effect (unstructured effect) to the formula you used in "1a". Write down the entire model.³
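A possible formula for 1b (a sketch; using plate as the index of the unstructured effect is my assumption):

```r
# Same Poisson model, plus an iid random effect per plate
formula.iid <- r ~ x1 + x2 + f(plate, model = "iid")
res.iid <- inla(formula.iid, data = df,
                family = "poisson", E = df$n)
summary(res.iid)
```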
2. Fit this model:
The model is essentially a random-effects logistic regression, allowing for over-dispersion. If p_i is the
probability of germination on the i-th plate, we assume
r_i ~ Binomial(p_i, n_i)
logit(p_i) = a_0 + a_1 x_{1i} + a_2 x_{2i} + ε_i
where x_{1i}, x_{2i} are the seed type and root extract of the i-th plate, and ε_i is an unstructured (iid) random effect. Write down the entire model.
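This model can be fitted along the following lines (a sketch):

```r
# Over-dispersed logistic model: fixed effects plus iid plate effect
formula.bin <- r ~ x1 + x2 + f(plate, model = "iid")
res.bin <- inla(formula.bin, data = df,
                family = "binomial", Ntrials = df$n)
summary(res.bin)
```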
3. Your favourite biologist wants to know how the effect changes from root extract 1 to root extract 2. Set
up different ways to interpret this question, and find the answers (write code), using the binomial model.
Feel free to add the naive estimates to your plots / tables. Feel obligated to show any relevant uncertainty
in a clear way.
4. You want to have interactions between the different factors x1 and x2. Use
> df$x1x2 = as.factor(df$x1*10 + df$x2)
and create a model similar to the previous model. Make sure it is self-consistent (that nothing is modelled
twice).
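One self-consistent variant (a sketch): use only the combined factor, so that the main effects are not modelled twice.

```r
# Combined factor encodes all four x1/x2 combinations
df$x1x2 <- as.factor(df$x1 * 10 + df$x2)
formula.int <- r ~ x1x2 + f(plate, model = "iid")
res.int <- inla(formula.int, data = df,
                family = "binomial", Ntrials = df$n)
summary(res.int)
```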
5. Change the prior for the hyper-parameter of the iid effect. Information:
> inla.doc("pc.prec")
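For example (a sketch), a PC prior on the precision of the iid effect; the parameters here encode P(sd > 1) = 0.01 and are only an example choice, not a recommendation.

```r
# PC prior on the iid precision: param = c(u, alpha), P(sd > u) = alpha
formula.pc <- r ~ x1 + x2 +
  f(plate, model = "iid",
    hyper = list(prec = list(prior = "pc.prec",
                             param = c(1, 0.01))))
res.pc <- inla(formula.pc, data = df,
               family = "binomial", Ntrials = df$n)
summary(res.pc)
```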
2. Create an appropriate model for this data, using binomial likelihood and logit link function.
4. Compare
> res$summary.linear.predictor
> res$summary.fitted.values
for the two models (the two link functions). Also compare these to
> df$r/df$n
1. Read the "Simdata" data into R. This simulated data contains a series of count data y_t ~ Poisson(exp(λ_t)),
where λ_t = ρ λ_{t−1} + ε_t with ρ = 0.8. Fit this model in R-INLA, and plot the marginal posterior for the
autocorrelation ρ in the AR(1) process.
1b. Fit the model again, but this time, fix ρ = 0.8. Do you see any differences in the results?
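A possible way to fix ρ (a sketch; the data frame name simdata and its columns y and t are my assumptions). To my understanding, INLA parameterises the AR(1) correlation internally as θ = log((1 + ρ)/(1 − ρ)), so ρ = 0.8 corresponds to θ = log(9).

```r
# AR(1) latent model with the autocorrelation fixed at rho = 0.8
formula.fix <- y ~ f(t, model = "ar1",
                     hyper = list(rho = list(initial = log(9),
                                             fixed = TRUE)))
res.fix <- inla(formula.fix, data = simdata, family = "poisson")
summary(res.fix)
```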
2. Load data "Tokyo" using data(Tokyo). The number of occurrences of rainfall over 1 mm in the Tokyo
area for each calendar day during two years (1983-84) is registered. It is of interest to estimate the
underlying probability p_t of rainfall for calendar day t, which is, a priori, assumed to change gradually over
time. The likelihood model is binomial, y_t | η_t ~ Bin(n_t, p_t), with logit link function

p_t = exp(η_t) / (1 + exp(η_t))

The model for the latent variables can be written as η_t = f(t), where t is the observed time whose effect is
modelled as a smooth function f(·).
(a) Fit the above model in R-INLA, using a cyclic RW2 (random walk of order 2).
(b) Plot the smooth function and the data together.
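A possible fit (a sketch; in the INLA package the Tokyo data frame has columns y, n and time):

```r
data(Tokyo)
# Cyclic RW2 over the calendar day; no separate intercept
formula.tokyo <- y ~ -1 + f(time, model = "rw2", cyclic = TRUE)
res.tokyo <- inla(formula.tokyo, data = Tokyo,
                  family = "binomial", Ntrials = Tokyo$n,
                  control.predictor = list(compute = TRUE))
# Smooth estimate on the probability scale, with the raw data
plot(Tokyo$time, Tokyo$y / Tokyo$n, pch = 19, cex = 0.3,
     xlab = "calendar day", ylab = "probability of rainfall")
lines(res.tokyo$summary.fitted.values$mean, col = 2)
```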
3. Read the ”Birth” data into R. This data contains the number of births per month in New York city, from
January 1946 to December 1959 (originally collected by Newton). This seasonal time series consists of an
average mean, a trend component, a seasonal component and an irregular component. Decomposing the time
series means separating it into these components: that is, estimating the trend component and the
seasonal component.
(a) Fit the above model in R-INLA. Use "RW2" and "AR1".
(b) Plot the trend component.
(c) Plot the seasonal component.
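One possible decomposition (a sketch with several assumptions: the data frame name birth, a count column y, a Gaussian likelihood for the monthly totals, an RW2 trend, a "seasonal" component with period 12, and an AR1 term for the irregular part; f() needs a distinct index column per component):

```r
n <- nrow(birth)
birth$t <- 1:n          # index for the trend
birth$s <- 1:n          # index for the seasonal component
birth$u <- 1:n          # index for the irregular (AR1) component
formula.birth <- y ~ f(t, model = "rw2") +
  f(s, model = "seasonal", season.length = 12) +
  f(u, model = "ar1")
res.birth <- inla(formula.birth, data = birth, family = "gaussian")
plot(res.birth$summary.random$t$mean, type = "l",
     xlab = "month", main = "trend component")
plot(res.birth$summary.random$s$mean, type = "l",
     xlab = "month", main = "seasonal component")
```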
Acknowledgement
Parts of this problem set were originally written by Geir-Arne Fuglstad and Jingyi Guo. Thank you!