Mixed Models Day 2:
Longitudinal Data I
Modelling Time as Continuous
Rebecca Stellato
Examples of Longitudinal Data
• Example (Reisby et al.)
o 66 patients
o with or without endogenous depression (level 2)
o depression scores measured weekly at weeks 0 – 5, using Hamilton
Depression Rating Scale (HDRS) (level 1)
o (week is a level-1 variable)
o from week 1 onwards, patients are treated with imipramine
o Research question: do patients with endogenous depression respond
better to treatment with an antidepressant than patients with
nonendogenous depression ↔ is the pattern of HDRS over time
different for patients with endogenous and nonendogenous
depression?
Examples of Longitudinal Data
• Example (Stoop et al. 2012)
o 14 patients with Hurler syndrome
o after haematopoietic stem cell transplantation
o various radiologic measurements, including the odontoid/body ratio
o Research question: what is the pattern of orthopedic manifestations
after stem cell transplant?
Examples of Longitudinal Data
(Figure: example longitudinal plots of the Reisby et al. and Stoop et al. data.)
First questions I ask when looking at a longitudinal
study:
• Is everyone (theoretically) measured at the same fixed occasions in
time, or are the measurements at varying occasions per person?
o If fixed occasions: how many measurements (theoretically) per person?
• Is it reasonable to think the pattern(s) over time is (are) linear?
o If so: time can be continuous in fixed part of the model, and we can
think about random coefficients for linear time effect (random slopes
for time)
• And of course... what is the research question?
Example: Reisby Data
Which fixed effects?
• Research question: differing patterns over time for the two groups?
o fixed effects:
➢ (intercept)
➢ time
➢ group
➢ group*time – why?
o time continuous or categorical?
• HDRS over time is approximately linear – okay to use time as continuous
Aside: interactions
(Figure: three illustrative model fits:
• group and time main effects + group*time interaction
• group and time main effects only
• main effect of time only)
Aside: interactions (ignoring random effects & subscripts)
• Main effects + interaction: different slopes & intercepts for groups
y = β0 + β1·time + β2·endo + β3·endo·time
endo = 0: y = β0 + β1·time + β2·0 + β3·0·time
          y = β0 + β1·time
endo = 1: y = β0 + β1·time + β2·1 + β3·1·time
          y = β0 + β1·time + β2 + β3·time ↔
          y = (β0 + β2) + (β1 + β3)·time
• Main effects for time & endo: same slope, different intercepts
y = β0 + β1·time + β2·endo
endo = 0: y = β0 + β1·time
endo = 1: y = (β0 + β2) + β1·time
• Time only: everyone has the same line: y = β0 + β1·time
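A minimal numeric check of the algebra above (Python rather than the course's R; the coefficient values are illustrative placeholders, not Reisby estimates):

```python
# Group-specific lines implied by a main-effects + interaction model:
#   y = b0 + b1*time + b2*endo + b3*endo*time
# Placeholder coefficients for illustration only:
b0, b1, b2, b3 = 20.0, -2.0, 1.5, -0.5

def predict(time, endo):
    return b0 + b1 * time + b2 * endo + b3 * endo * time

# endo = 0 group: intercept b0, slope b1
# endo = 1 group: intercept b0 + b2, slope b1 + b3
slope0 = predict(1, 0) - predict(0, 0)
slope1 = predict(1, 1) - predict(0, 1)
print(slope0, slope1)                  # b1 and b1 + b3
print(predict(0, 1) - predict(0, 0))   # intercept difference: b2
```

The one-unit differences in `predict()` recover exactly the slopes and intercept shift derived on the slide.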
Example: Reisby Data
Which random effects?
• How to deal with multiple measurements?
o Random effects
• intercept per patient? YES!
– each patient seems to have a different starting point
• coefficient of time per patient? Probably.
– looks like patients have differing slopes over time
– (also: RI + RS for time almost always fits longitudinal data better)
Example: Reisby Data
(Screenshots of the Reisby data in “wide” format – one row per patient – and in “long” format – one row per measurement.)
Example: Reisby Data
Descriptive Statistics (R)
> describeBy(x=reisby.wide[,3:8], group=reisby.wide$endo, skew=FALSE)
group: 0
vars n mean sd min max range se
hdrs.0 1 28 22.79 4.12 15 33 18 0.78
hdrs.1 2 29 20.48 3.83 11 27 16 0.71
hdrs.2 3 28 17.00 4.35 10 28 18 0.82
hdrs.3 4 29 15.34 6.17 0 26 26 1.15
hdrs.4 5 29 12.62 6.72 0 28 28 1.25
hdrs.5 6 27 11.22 6.34 1 29 28 1.22
--------------------------------------------------------
group: 1
vars n mean sd min max range se
hdrs.0 1 33 24.00 4.85 17 34 17 0.84
hdrs.1 2 34 23.00 5.10 15 39 24 0.87
hdrs.2 3 37 19.30 6.08 10 33 23 1.00
hdrs.3 4 36 17.28 6.56 7 32 25 1.09
hdrs.4 5 34 14.47 7.17 2 34 32 1.23
hdrs.5 6 31 12.58 7.96 0 33 33 1.43
describeBy() function in the ‘psych’ package (gives more stats than
presented here)
Example: Reisby Data
correlations between HDRS measurements (R)
> round(cor(reisby.wide[,3:8], use="pairwise.complete.obs"), digits=3)
hdrs.0 hdrs.1 hdrs.2 hdrs.3 hdrs.4 hdrs.5
hdrs.0 1.000 0.493 0.410 0.333 0.227 0.184
hdrs.1 0.493 1.000 0.494 0.412 0.308 0.218
hdrs.2 0.410 0.494 1.000 0.738 0.669 0.461
hdrs.3 0.333 0.412 0.738 1.000 0.817 0.568
hdrs.4 0.227 0.308 0.669 0.817 1.000 0.654
hdrs.5 0.184 0.218 0.461 0.568 0.654 1.000
Example: Reisby Data
“Spaghetti Plot” (R, using ggplot2 on long version of dataset)
(Spaghetti plot: individual hdrs trajectories against week, one panel per endo group (0 and 1).)
Linear time effect
• Time technically measured in categories (weeks 0, 1, 2, …)
• Reasonable to model time as linear?
o Pro: 1 parameter for slope of HDRS in time
o Possible con: time effect might not be linear
• need to check whether this assumption is reasonable:
– initial data analysis (spaghetti plot, individual plots)
– checking model assumptions
– model comparisons (more on day 3)
Random intercept with linear time effect
Predicted values from an LME with random intercept and linear time:
Random intercept + random linear time effect
• We can make the assumption that the (linear) time effect is different
for each individual by incorporating a random (linear) time effect:
o y_ij = (β0 + u_0i) + (β1 + u_1i)·time_ij + β2·endo_i + β3·endo_i·time_ij + ε_ij
o ε_ij ~ N(0, σ_e²)
o u_0i ~ N(0, σ0²) ; u_1i ~ N(0, σ1²) ; cov(u_0i, u_1i) = σ01
• This last line can also be written: (u_0i, u_1i)′ ~ N(0, Σ), with Σ = [ σ0²  σ01 ; σ01  σ1² ]
• Σ is the variance-covariance matrix of the random effects
R output random intercept + random slope model
Parameter estimates of the fixed part of the previous model:
Value Std.Error DF t-value p-value
(Intercept) 22.476263 0.7986132 307 28.144117 0.0000
time -2.365687 0.3134845 307 -7.546425 0.0000
endo 1.988021 1.0747911 64 1.849681 0.0690
time:endo -0.027056 0.4217258 307 -0.064155 0.9489
R output random intercept + random slope model
Parameter estimates of the random part (intercept, slope) of the model:
StdDev Corr
(Intercept) 3.411893 (Intr)
time 1.441193 -0.285
Residual 3.495500
• So:
o σ̂_e² = 3.50² = 12.22 ; σ̂_0² = 3.41² = 11.64 ; σ̂_1² = 1.44² = 2.08
o corr(u_0i, u_1i) = σ̂_01 / (σ̂_0 · σ̂_1) = −0.285
  → σ̂_01 = −0.285 · 3.41 · 1.44 = −1.40
Note: random intercept and slope are negatively correlated (the higher
the intercept the more negative the slope); often true in longitudinal
data
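The back-calculation of the covariance can be reproduced directly from the printed nlme output (a quick check in Python rather than R):

```python
# Random-effects summary from the model output above:
#   StdDev(intercept) = 3.411893, StdDev(time) = 1.441193, Corr = -0.285
sd_int, sd_slope, corr = 3.411893, 1.441193, -0.285

var_int = sd_int ** 2                      # sigma_0^2, approx. 11.64
var_slope = sd_slope ** 2                  # sigma_1^2, approx. 2.08
cov_int_slope = corr * sd_int * sd_slope   # sigma_01, approx. -1.40

print(round(var_int, 2), round(var_slope, 2), round(cov_int_slope, 2))
```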
Interpretation of model
• Intercept (22.48) is average HDRS score when all variables = 0
o so for patients with nonendogenous depression (reference) at time = 0
• Estimate for endo (1.99) is average difference in HDRS between
endogenous and nonendogenous patients at time = 0
o patients with nonendogenous depression start with average of 22.48
o patients with endogenous depression start with average of 22.476 +
1.988 = 24.46
• Estimate of random intercept s.d. 3.41 indicates considerable
fluctuation around fixed intercepts:
o patients can start quite a bit higher/lower than average
Interpretation of model, cont.
• “Average” slope is -2.37 for patients with nonendogenous depression
• Interaction endo*time (-0.027) is difference in slope endogenous vs.
nonendogenous
o nonendogenous: per time unit (1 week) the HDRS score decreases on
average by 2.37
o endogenous: per time unit (1 week) the HDRS score decreases on
average by (2.37 + 0.027 =) 2.39
• Estimate of random slope s.d. 1.44, so for individuals the slope can
be quite a bit steeper or flatter, may even be positive for some
patients (as seen in the plot).
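The claim that some individual slopes may even be positive can be made concrete with a rough normal-range calculation (mean slope ± 2 SD of the random slope; Python rather than R, and the ±2 SD range is a heuristic, not model output):

```python
# Fixed slope and random-slope SD from the model output above
mean_slope = -2.37
sd_slope = 1.44

lower = mean_slope - 2 * sd_slope   # steepest typical individual decline
upper = mean_slope + 2 * sd_slope   # flattest; positive => some patients worsen
print(round(lower, 2), round(upper, 2))
```

The upper end of the range is above zero, consistent with the few rising trajectories visible in the spaghetti plot.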
Random intercept + random linear time effect
Predicted values from an LME with random intercept + random linear
time effect:
Break & practice
• Exercise 1
• lecture starts again at …
Testing in Mixed Models
• To decide which LME model fits the data best we can use likelihood-based methods:
o Likelihood Ratio Test (LRT)
• LRT can be used to test nested models (one is a special case of the other)
• based on the χ²-distribution
o Akaike’s Information Criterion (AIC)
• combination of likelihood and # parameters used in the model (d.f.)
• model with the lowest AIC (high likelihood with few parameters) is deemed
best
• can be used for nested and non-nested models
(Restricted) Maximum Likelihood Estimation
• Mixed models: maximum likelihood used to estimate fixed
regression coefficients, SE’s, and variances of random effects
o likelihood quite complex, solved by iteration until convergence
• (Empirical Bayes methods used to estimate individual random
effects)
• Problem with ML estimation:
o variance parameters (residual variance, variance(s) of random effect(s))
biased downwards
• Solution: REstricted (or: REsidual) Maximum Likelihood (REML)
o gives unbiased estimates of variance parameters
o BUT: adjusts likelihood for number of covariates in model, so cannot be
used to compare models that differ w.r.t. fixed parts of model
When to use ML, REML?
• Testing models that differ in variance components:
o REML will give interpretable LRT, AIC
o so will ML
• Testing models that differ in fixed effects:
o only ML will give interpretable LRT, AIC
• Reporting results (esp if you include the random components):
o use REML!
When to use ML, REML?
• Leading me to suggest the following model-building strategy:
1. Start with the full fixed model and (using ML estimation, though REML is
also okay) select the appropriate random part of the model
2. With the random part chosen, (using ML estimation) try to reduce fixed
part of model
3. Once you have your final model: run that model once more using
REML; this is the model you present to your audience
• Testing random effect(s):
o variance parameters are never <0
o LRT (REML/ML) for random effects: chi-square test, but divide p-value
by 2
o AIC also okay (but please: no AIC + 2)
• Testing fixed effect(s):
o LRT (ML only!) for fixed effects: chi-square test, usual p-value
o AIC okay (only under ML)
Reisby example, random part of the model
• Compare LMEs we’ve fit so far:
o (note: I’m using ML estimation here)
o fixed: time, endo, time*endo
o random: intercept vs. int+slope
> anova(lme.ril, lme.ris)
Model df AIC BIC logLik Test L.Ratio p-value
lme.ril 1 6 2294.137 2317.699 -1141.069
lme.ris 2 8 2230.929 2262.345 -1107.465 1 vs 2 67.20798 <.0001
Using LRT: RI+RS significantly better than only RI
Using AIC: RI+RS better than only RI (AIC lower)
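The numbers in the anova() table fit together: the LRT statistic is twice the difference in log-likelihoods, and AIC = −2·logLik + 2·df. A quick check in Python (rather than R):

```python
# log-likelihoods and parameter counts from the anova() output above
ll_ri, df_ri = -1141.069, 6    # random intercept only
ll_ris, df_ris = -1107.465, 8  # random intercept + random slope

lrt = 2 * (ll_ris - ll_ri)       # likelihood ratio statistic (67.208)
aic_ri = -2 * ll_ri + 2 * df_ri
aic_ris = -2 * ll_ris + 2 * df_ris

# Tiny discrepancies vs. the printed table come from the rounded logLik values
print(round(lrt, 3), round(aic_ri, 3), round(aic_ris, 3))
```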
Reisby example, fixed part of the model
• Now we have the random structure (for now), we’ll look at the fixed
part of the model
• Three possibilities:
o only time: one intercept & slope for both groups
o endo + time: two intercepts & one slope for both groups
o endo*time (both main effects + interaction): two intercepts & slopes:
separate lines for the two groups
• We use ML estimation for testing the fixed part of the model
Reisby example, fixed part of the model
> lme2.ris<-update(lme.ris, fixed=hdrs ~ time+endo)
> lme3.ris<-update(lme.ris, fixed=hdrs ~ time)
> anova(lme.ris, lme2.ris, lme3.ris)
Model df AIC BIC logLik Test L.Ratio p-value
lme.ris 1 8 2230.929 2262.345 -1107.465
lme2.ris 2 7 2228.933 2256.422 -1107.467 1 vs 2 0.004160 0.9486
lme3.ris 3 6 2231.037 2254.599 -1109.519 2 vs 3 4.104108 0.0428
• interaction not significant: no evidence that time effect differs for the
groups
• effect of endo (just barely) significant: significant difference between
depression scores of people with and without endogenous
depression
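For the "2 vs 3" comparison (dropping the fixed effect of endo, 1 df) the χ²₁ upper-tail probability has a closed form via the complementary error function, so the printed p-value can be checked without extra packages (Python rather than R):

```python
import math

# LRT for dropping endo: 2*(logLik lme2.ris - logLik lme3.ris), from the
# anova() output above
lrt = 2 * (-1107.467 - (-1109.519))   # approx. 4.104

# chi-square(1 df) upper tail: P(X > x) = erfc(sqrt(x/2))
p = math.erfc(math.sqrt(lrt / 2))
print(round(lrt, 3), round(p, 4))
```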
Reisby example, final model with REML
> lme2.ris.reml <- update(lme2.ris, method="REML")
> summary(lme2.ris.reml)
Linear mixed-effects model fit by REML
Data: reisby.long
AIC BIC logLik
2228.116 2255.548 -1107.058
Random effects:
Formula: ~time | id
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 3.490342 (Intr)
time 1.457808 -0.287
Residual 3.494719
Reisby example, final model
Fixed effects: hdrs ~ time + endo
Value Std.Error DF t-value p-value
(Intercept) 22.492881 0.7598098 308 29.603306 0.0000
time -2.380472 0.2103154 308 -11.318581 0.0000
endo 1.956867 0.9658720 64 2.026011 0.0469
Correlation:
(Intr) time
time -0.318
endo -0.704 -0.008
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.73520482 -0.49503123 0.03559898 0.49317021 3.62063687
Number of Observations: 375
Number of Groups: 66
Statistical conclusions imipramine example (so far)
• A model with a fixed linear time effect, a fixed effect for endo, and a
random intercept and random slope for time seems to provide the
“best” fit for these data
• We still need to check some model assumptions (tomorrow)
“Clinical” conclusions imipramine example
• There was no significant interaction between time and group
➢ Time trends same for patients with endogenous and nonendogenous
depression: the lines run parallel
• There was a (barely) significant main effect for group
➢ Patients with endogenous depression score 2.0 points (95%CI: 0.03 -
3.9) higher on HDRS than those with nonendogenous depression
• The effect of time is statistically significant
➢ For patients with both endogenous and nonendogenous depression,
HDRS scores decrease, on average, by 2.4 (95%CI: 2.0 - 2.8) points per
week
➢ On average 5*2.38 = 11.9 points in the course of the study
➢ Am & Eur guidelines suggest that a 3-point change is clinically relevant
• Before presenting these results, we need to check our model
assumptions!
Break & practice
• Exercise 2
• Lecture starts at …
Centering Explanatory Variables
• In the London schools dataset, both the outcome and the intake test
had been “centered” (actually, both were standardized)
• What is the effect of centering an explanatory variable?
o changes the interpretation of the fixed intercept
o can change the variance of the random intercepts, and the correlation
of random intercepts with random slopes
Centering explanatory variables
Source: Hedeker and Gibbons, Longitudinal Data Analysis. Wiley & Sons 2006, p. 60
https://www.wiley.com/en-nl/Longitudinal+Data+Analysis-p-9780471420279
Inclusion is permitted according to the agreement with the publisher.
Centering explanatory variables
• Take Reisby data, center time (week 2.5 becomes 0 point)
o for sake of simplicity, use model with just fixed effect of time, random
effects for intercept and time
Centering explanatory variables
Parameter estimate             Model 1 (time not centered)   Model 2 (time centered)
Fixed: intercept                       23.58                      17.63
Fixed: time                            -2.38                      -2.38
Random: intercept (s.d.)                3.60                       4.34
Random: slope of time (s.d.)            1.46                       1.46
Random: corr (int-slope)               -0.28                       0.61
Residual (s.d.)                         3.49                       3.49
Estimates for the fixed intercept, the variation of the random intercepts,
and the correlation between random intercepts and slopes all changed!
Linear mixed effects models with polynomial terms
• Instead of linear trends over time, it is quite possible to observe non-linear trends (think of children’s growth, for instance)
• There are many non-linear models that can be used within mixed
models (beyond the scope of this course)
• It is possible to fit polynomials as part of a “linear” mixed model
Example: Herpes Antibody Levels
• 45 children suffering from
o solid lump tumour (N=18)
o leukemia (N=27)
• Measurements of antibody levels to a herpes virus taken during
hospital visits for courses of chemotherapy
• Duration: 1 mo - 3 yrs (median 12 mo)
• Intervals between measurements differed per child
• Questions:
o are antibody levels affected by chemo?
o if so, is change related to cancer type?
Linear mixed effects models with polynomial terms
(Figure: observed antibody levels (“virus”) against month for the leukemia (AL) group.)
Linear mixed effects models with polynomial terms
(Figure: observed antibody levels (“virus”) against month for the solid lump tumor (ST) group.)
Linear mixed effects models with polynomial terms
Source: Brown & Prescott, Applied Mixed Models in Medicine, 3rd Edition. Wiley, 2015, p. 272
Back to Stoop, et al.
• How would we model data
from Stoop, et al.?
– time: discrete or continuous?
– LME or CPM?
– time: linear? quadratic?
– Theory vs. practice....
What to do with baseline measurement?
• In clinical trials, a baseline measurement of outcome often taken
before randomization
o Is baseline an “outcome” of an intervention?
• yes: use as first outcome measurement in mixed model?
• no: ignore?
• no: use as covariate in model?
• In an observational study, there is no experimental intervention
o “Baseline” should just be treated as the first of the repeated outcomes
• In Reisby example, baseline HDRS is before patients are treated, but
there is no randomization
o Is baseline HDRS an outcome?
Reisby Example, use baseline HDRS as covariate
Select only time >0, use hdrs.base as covariate in mixed model
(Screenshots: the original data, and the data with hdrs.base added as a covariate.)
Reisby Example, use baseline HDRS as covariate
Select only time >0, use hdrs.base as covariate in mixed model with
random intercept, random slope, fixed time & endo
• Compare models with and without baseline adjustment. What do
you expect will happen to:
o …the estimate of the fixed intercept?
o …the estimate of the fixed effect of endo?
o …the estimate of the fixed effect of time?
Reisby Example, use baseline HDRS as covariate
Select only time >0, use hdrs.base as covariate in mixed model with
random intercept, random slope, fixed time & endo
• What do you expect will happen to:
o … the standard errors of the fixed effects?
o … the variance of the random intercepts?
o … the variance of the random slopes?
o ... the residual variance?
Reisby Example, use baseline HDRS as covariate
Select only time >0, use centered hdrs.base as covariate in mixed model
with random intercept, random slope, fixed time & endo
• What will change from previous analysis?
Final note on baseline measurement
Hypothetically: biggest change between baseline & next measure?
• Baseline as first outcome: time*tx interaction
• Baseline as covariate: no time*tx interaction!
Summary longitudinal data
• Longitudinal data is a specific form of multilevel data
o measurements within patients
o challenge is in modelling time properly
• Time can be continuous or discrete
o continuous: measurements at different times for different patients
• linear time trend?
• polynomial? (spline?)
o discrete: everyone measured at a few specific time points
• model (fixed) time effect as categorical; what to do with random part?
• with 3+ measurements per person and approximately linear time trends,
you could still consider modelling data as continuous (Reisby, today)
• Centering explanatory variables can change meaning of fixed &
(some) random effects
• First outcome measurement (“baseline”) may have different meaning
depending on study design