Slides1 mrbm2324
Lecture 11
Linear Mixed Models
Adrian Bowman
School of Mathematics & Statistics
The University of Glasgow
[email protected]
Introduction
In our earlier discussions of regression models, where a response
variable Y is related to a vector of explanatory variables x, we
assumed that the ‘structural’ part of the model was determined by
a ‘linear predictor’ of the form Xβ.
Here X denotes a design matrix and β denotes a vector of
unknown parameters.
A description of the error distribution for Y is required to complete
the model and allow us to fit it to the data. (As we saw in
generalized linear models, a ‘link’ function may also be required.)
This kind of model is extremely useful, as we have seen. However,
there are many kinds of experiment which give rise to data with a
more complex structure.
In particular, the source of random variation can be more complex
than a single ‘add-on’ error term.
Components of Variability in Paste Strength
[Dot plot: Strength (54 to 66) for each of the batches B01 to B10, with several observations per batch.]
yij = µ + βi + εij,
where yij denotes the jth observation from the ith batch, which has
mean µ + βi. As usual, εij represents the additional error variation.
The end result is a simple ‘one-way analysis of variance’ model.
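As a small illustration, this one-way model is easy to simulate directly. All the numbers below (ten batches, six observations each, the parameter values) are invented for the sketch and are not estimates from the paste data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values only -- not estimates from the paste data
mu = 60.0                        # overall mean strength
beta = rng.normal(0.0, 2.0, 10)  # fixed batch effects beta_i
beta -= beta.mean()              # centre them so mu remains the overall mean
sigma = 1.0                      # error standard deviation

# y_ij = mu + beta_i + eps_ij with j = 1,...,6 observations per batch
batch = np.repeat(np.arange(10), 6)
y = mu + beta[batch] + rng.normal(0.0, sigma, 60)

# each batch mean estimates mu + beta_i
batch_means = np.array([y[batch == i].mean() for i in range(10)])
print(batch_means.round(2))
```

Each batch mean estimates µ + βi, which is exactly the structure the dot plot above displays.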
However, the difficulty with this approach is that it gives a
description only of the particular batches we have observed.
As we collect more data, involving more batches, the number of βi
parameters increases without limit. This isn’t actually what we
would like to do. We need a description of the variation associated
with any batches we might observe in the future.
To do this, we regard the βi parameters as drawn from a
distribution which describes the variation in the batch means.
To reflect the random nature of the batch means, it is helpful to
change notation a little, representing them by bi .
We can make the further, hopefully reasonable, assumption that
the batch means come from a normal distribution. This can have
mean 0, because the bi ’s represent deviations from the overall
mean µ.
However, we do need an additional variance parameter, σb^2, which
describes the variance of the bi's.
The model is then

yij = µ + bi + εij, where bi ∼ N(0, σb^2) and εij ∼ N(0, σ^2), independently.

This is a very concise and useful description of the way the data
are generated. There are now two variance parameters, σb^2 and σ^2,
which describe the variance of the batch means about the overall
mean and the variance of observations about the batch means,
respectively.
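A quick simulation shows how the two variance parameters surface in the data; the values σb = 2 and σ = 1 and the sample sizes are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma_b, sigma = 60.0, 2.0, 1.0   # invented values for illustration

n_batches, n_per = 200, 6             # many batches so the averages settle down
b = rng.normal(0.0, sigma_b, n_batches)            # b_i ~ N(0, sigma_b^2)
eps = rng.normal(0.0, sigma, (n_batches, n_per))   # eps_ij ~ N(0, sigma^2)
y = mu + b[:, None] + eps                          # y_ij = mu + b_i + eps_ij

# variance of the batch means: roughly sigma_b^2 + sigma^2 / n_per
print(y.mean(axis=1).var(ddof=1))
# average within-batch variance: roughly sigma^2
print(y.var(axis=1, ddof=1).mean())
```

The batch means vary by more than σb^2 alone because each mean also carries averaged error variation.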
Traditionally, this topic has often been referred to as components
of variance. However, it is now usually handled under a more
general modelling framework referred to as linear mixed models.
This involves a linear model structure which has a mixture of both
fixed (in this case only µ) and random (here the bi) effects.
This is a very powerful framework which has the ability to describe
very complex data structures.
Model fitting and assessment - likelihood
The likelihood methods which we discussed in lectures 4 and 5
provide a natural starting point for fitting models such as this, as
we have a full description of the systematic and random parts of
the model and an interest in identifying the unknown parameters.
We do also need to assume a particular form for the error
distribution. We have done that already in assuming a normal
distribution for the random effects. We will also use a normal
distribution for the error terms εij .
How should we define the likelihood function in this setting, when
the model involves the bi ’s which we do not observe directly?
Suppose we represent the data from batch i as yi . Then we can
write the likelihood component for this batch as
p(yi | µ, σb^2, σ^2) = ∫ p(yi | bi, σb^2, σ^2) p(bi | σb^2, σ^2) dbi.
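Because everything here is normal, the integral has a closed form: integrating bi out leaves yi multivariate normal with mean µ1 and covariance σ^2 I + σb^2 J, where J is the matrix of ones. The sketch below, with invented data and parameter values, checks that closed form against brute-force numerical integration over bi.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm
from scipy.integrate import quad

def batch_loglik(y_i, mu, sigma_b2, sigma2):
    """Closed-form marginal log-likelihood for one batch.

    Integrating b_i out of the normal one-way model gives
    y_i ~ N(mu * 1, sigma2 * I + sigma_b2 * J).
    """
    n = len(y_i)
    cov = sigma2 * np.eye(n) + sigma_b2 * np.ones((n, n))
    return multivariate_normal(mean=np.full(n, mu), cov=cov).logpdf(y_i)

# Invented data and parameter values, purely for checking the algebra
y_i = np.array([59.2, 61.0, 60.4])
mu, sb2, s2 = 60.0, 4.0, 1.0

closed = batch_loglik(y_i, mu, sb2, s2)

# Brute force: integrate p(y_i | b_i) p(b_i) over b_i numerically
def integrand(b):
    return np.prod(norm.pdf(y_i, mu + b, np.sqrt(s2))) * norm.pdf(b, 0.0, np.sqrt(sb2))

numeric = np.log(quad(integrand, -25.0, 25.0)[0])
print(closed, numeric)   # the two values agree
```

The full log-likelihood is then the sum of these batch contributions, which is what a fitting routine maximises over µ, σb^2 and σ^2.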
Integrating out the random effects shows that each observation can be
written in the marginal form

yij = µ + ε*ij,

where ε*ij = bi + εij combines the two sources of random variation.
These new error terms are no longer independent within a batch:

cov(yij, yi′j′) = σb^2 + σ^2 if i = i′ and j = j′; σb^2 if i = i′ and j ≠ j′; 0 otherwise.
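The covariance structure can also be checked by simulation; the values σb = 2 and σ = 1 are again invented for the check.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma_b, sigma = 2.0, 1.0
n = 200_000                          # batches; two observations from each

b = rng.normal(0.0, sigma_b, n)
y1 = b + rng.normal(0.0, sigma, n)   # y_i1 - mu
y2 = b + rng.normal(0.0, sigma, n)   # y_i2 - mu, same batch

print(np.var(y1))            # roughly sigma_b^2 + sigma^2 = 5
print(np.cov(y1, y2)[0, 1])  # same batch, different j: roughly sigma_b^2 = 4
```

Observations from the same batch are positively correlated because they share the same bi; observations from different batches share nothing random and are uncorrelated.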
Collecting all the observations into a single vector y, the model can
then be expressed as

y ∼ N(Xβ, σ^2 Σ),

where Σ incorporates the correlation structure described above.
Notice that all the usual functions (summary, plot, etc.) can be
applied to the fitted model.
Model fitting and assessment - likelihood
Fixed e f f e c t s : Strength ˜ 1
V a l u e Std . E r r o r DF t−v a l u e p−v a l u e
( I n t e r c e p t ) 6 0 . 0 5 3 3 3 0 . 6 4 7 5 5 4 5 50 9 2 . 7 3 8 6 5 0
Fixed e f f e c t s : Strength ˜ 1
V a l u e Std . E r r o r DF t−v a l u e p−v a l u e
( I n t e r c e p t ) 6 0 . 0 5 3 3 3 0 . 6 7 6 8 7 0 3 50 8 8 . 7 2 2 0 6 0
[Figure: log(LH) (−2 to 1) plotted against Time (1 to 10).]
Multilevel models
Returning to the paste data, it may also be true that each cask has
its own mean. This is supported by the lattice plot shown below.
[Lattice plot: Strength (54 to 66) for casks 1 to 3 within each batch B01 to B10, one panel per batch.]
To describe this, we now require a model with two nested levels of
random effects. This can be written as

yijk = µ + bi + bij + εijk,

where yijk denotes the kth observation in cask j within batch i, the
bi's represent the random effects for batch and the bij's represent
the random effects of cask within batch.
In fact, a very general prescription for linear mixed models can be given as

yi = Xi β + Zi bi + εi,

where Xi and Zi are design matrices for the fixed and random effects respectively.
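In this general form, integrating out bi with cov(bi) = Ψ gives yi ∼ N(Xi β, Zi Ψ Zi′ + σ^2 I). The sketch below builds this marginal covariance for the simplest random-intercept case; all values are illustrative.

```python
import numpy as np

# Random-intercept case: Z_i is a column of ones, Psi is a 1x1 matrix.
# Illustrative values: sigma_b^2 = 4, sigma^2 = 1, six observations in group i.
n = 6
Z = np.ones((n, 1))
Psi = np.array([[4.0]])
sigma2 = 1.0

# Marginal covariance of y_i after integrating out b_i
V = Z @ Psi @ Z.T + sigma2 * np.eye(n)
print(V)   # 5 on the diagonal, 4 everywhere else
```

This reproduces exactly the covariance pattern derived earlier for the paste model; richer choices of Zi and Ψ give random slopes, nested effects and so on.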
We do not have the time to develop the details of the model fitting
at an algebraic level. With R the models can be specified and
fitted as
model <- lme(Strength ~ 1,
             random = ~ 1 | Batch/Cask)
Fixed effects: Strength ~ 1
               Value Std.Error DF  t-value p-value
(Intercept) 60.05333 0.6768648 30 88.72279       0