
SMSTC

Modern Regression and Bayesian Methods

Lecture 11
Linear Mixed Models
Adrian Bowman
School of Mathematics & Statistics
The University of Glasgow
[email protected]
Introduction
In our earlier discussions of regression models, where a response
variable Y is related to a vector of explanatory variables x, we
assumed that the ‘structural’ part of the model was determined by
a ‘linear predictor’ of the form X β.
Here X denotes a design matrix and β denotes a vector of
unknown parameters.
A description of the error distribution for Y is required to complete
the model and allow us to fit it to the data. (As we saw in
generalized linear models, a ‘link’ function may also be required.)
This kind of model is extremely useful, as we have seen. However,
there are many kinds of experiment which give rise to data with a
more complex structure.
In particular, the source of random variation can be more complex
than a single ‘add-on’ error term.
Components of Variability in Paste Strength
A company that uses a chemical paste in one of its production processes receives deliveries of the paste in batches. The quality control department of the company is concerned about the variability in the strength of the paste and decided to investigate.
Ten batches of paste were randomly selected from a number of deliveries (one batch is received per delivery). From each of the batches a random sample of three casks was selected and two random determinations were made from random samples from each cask. How should we best examine the components of variability in paste strength?
Data Source: Intermediate Statistical Methods by G. B. Wetherill
Components of Variability in Paste Strength
The data, which are available in the dataframe paste, are plotted
below (using a dotplot from the lattice package). The main
axes relate strength to batch. However, if the figure is viewed in
colour, the casks within each batch are also identified.

[Figure: dotplot of Strength (approximately 54 to 66) by Batch (B01 to B10), with casks within each batch distinguished by colour.]
Components of Variability in Paste Strength
A little reflection suggests that each batch may have a different mean value. (For simplicity, we will ignore any variation among the casks at the moment. We will return to them later.)
This would be a natural consequence of the production process - something we are all familiar with when buying different products.
The plot certainly suggests that may be true, with some batches producing consistently low values and some consistently high ones.
A natural model would therefore be

yij = µ + βi + εij,

where yij denotes the jth observation from the ith batch, which has mean µ + βi. As usual, εij represents the additional error variation. The end result is a simple ‘one-way analysis of variance’ model.
Components of Variability in Paste Strength
However, the difficulty with this approach is that it gives a description only of the particular batches we have observed.
As we collect more data, involving more batches, the number of βi parameters increases without limit. This isn’t actually what we would like to do. We need a description of the variation associated with any batches we might observe in the future.
To do this, we regard the βi parameters as drawn from a distribution which describes the variation in the batch means.
To reflect the random nature of the batch means, it is helpful to change notation a little, representing them by bi.
We can make the further, hopefully reasonable, assumption that the batch means come from a normal distribution. This can have mean 0, because the bi’s represent deviations from the overall mean µ.
However, we do need an additional variance parameter, σb², which describes the variance of the bi’s.
Components of Variability in Paste Strength
The model is then

yij = µ + bi + εij,   bi ∼ N(0, σb²).   (1)

This is a very concise and useful description of the way the data are generated. There are now two variance parameters, σb² and σ², which describe the variance of the batch means about the overall mean and the variance of observations about the batch means, respectively.
Traditionally, this topic has often been referred to as components of variance. However, it is now usually handled under a more general modelling framework referred to as linear mixed models.
This involves a linear model structure which has a mixture of both fixed (in this case only µ) and random (here the bi) effects.
This is a very powerful framework which has the ability to describe very complex data structures.
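To make the data-generating mechanism of model (1) concrete, here is a minimal R sketch simulating from it; the parameter values and group sizes are illustrative assumptions, chosen only to resemble the scale of the paste data.

set.seed(1)
mu      <- 60        # overall mean (illustrative value)
sigma.b <- 1.7       # between-batch standard deviation (illustrative)
sigma   <- 2.7       # error standard deviation (illustrative)
n.batch <- 10        # number of batches
n.obs   <- 6         # observations per batch
b   <- rnorm(n.batch, mean = 0, sd = sigma.b)        # batch effects bi ~ N(0, sigma.b^2)
eps <- rnorm(n.batch * n.obs, mean = 0, sd = sigma)  # errors eps_ij ~ N(0, sigma^2)
sim <- data.frame(Batch    = factor(rep(1:n.batch, each = n.obs)),
                  Strength = mu + rep(b, each = n.obs) + eps)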
Model fitting and assessment - likelihood
The likelihood methods which we discussed in lectures 4 and 5
provide a natural starting point for fitting models such as this, as
we have a full description of the systematic and random parts of
the model and an interest in identifying the unknown parameters.
We do also need to assume a particular form for the error
distribution. We have done that already in assuming a normal
distribution for the random effects. We will also use a normal
distribution for the error terms εij .
How should we define the likelihood function in this setting, when
the model involves the bi ’s which we do not observe directly?
Suppose we represent the data from batch i as yi. Then we can write the likelihood component for this batch as

p(yi | µ, σb², σ²) = ∫ p(yi | bi, σb², σ²) p(bi | σb², σ²) dbi.
Model fitting and assessment - likelihood

p(yi | µ, σb², σ²) = ∫ p(yi | bi, σb², σ²) p(bi | σb², σ²) dbi.

This decomposition allows us to write down the density functions of yi | bi and bi easily, in terms of the way the model is specified, and then integrate across bi. With normal density functions this is quite tractable.
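As a check that the integral really is tractable, the following sketch compares numerical integration over bi with the closed-form marginal normal density for a single batch; it assumes the mvtnorm package, and the data and parameter values are illustrative, not taken from the paste data.

library(mvtnorm)
yi <- c(58.2, 61.0, 59.5)                # illustrative observations from one batch
mu <- 60; sigma.b <- 1.7; sigma <- 2.7
# integrand p(yi | bi) p(bi), vectorised over bi for integrate()
integrand <- function(bi)
  sapply(bi, function(b) prod(dnorm(yi, mu + b, sigma)) * dnorm(b, 0, sigma.b))
lik.integral <- integrate(integrand, -Inf, Inf)$value
# closed form: yi ~ N(mu * 1, sigma.b^2 * J + sigma^2 * I)
V <- sigma.b^2 * matrix(1, 3, 3) + sigma^2 * diag(3)
lik.marginal <- dmvnorm(yi, mean = rep(mu, 3), sigma = V)
c(lik.integral, lik.marginal)            # agree to numerical-integration accuracy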
Model fitting and assessment - likelihood
A natural alternative approach is to rewrite model (1) in a more familiar linear model manner.
This can be done by combining the two error components bi + εij into a single term, say ε*ij, which has mean 0 and variance σb² + σ².
This gives a linear model

yij = µ + ε*ij,

which can be expressed in the usual vector-matrix notation as y = Xβ + ε, but where the error vector ε is not of the simple form σ²I.
To see this, evaluate the covariance cov(yij, yi′j′).
Model fitting and assessment - likelihood

cov(yij, yi′j′) = cov(bi + εij, bi′ + εi′j′)
  = σb² + σ²  if i = i′ and j = j′ (the variance of a single observation),
  = σb²       if i = i′ and j ≠ j′ (two observations from the same batch),
  = 0         if i ≠ i′ (observations from different batches).

So observations within a batch are correlated, with correlation σb²/(σb² + σ²), while observations from different batches are independent.
Model fitting and assessment - likelihood
In general, we can represent the marginal covariance matrix of y as σ²Σ, which will itself be a function of unknown parameters, in this case σb² and σ².
This allows us to write the model as

y ∼ N(Xβ, σ²Σ).

We can therefore write down the log-likelihood function fairly easily, as it is based on a relatively straightforward normal density function.
Model fitting and assessment - likelihood
We can also adopt a two-stage approach to finding the maximum likelihood estimates.
If we knew the true value of σb² then we could apply the method of generalized least squares, which was mentioned briefly in the earlier lectures on regression, to find the maximum likelihood estimates of β(σb²) and σ²(σb²) as

β̂(σb²) = (Xᵀ Σ⁻¹ X)⁻¹ Xᵀ Σ⁻¹ y,
σ̂²(σb²) = (y − X β̂(σb²))ᵀ Σ⁻¹ (y − X β̂(σb²)) / n,

where n denotes the sample size.
The profile likelihood can then be constructed by evaluating the log-likelihood at the maximum likelihood estimates β̂(σb²) and σ̂²(σb²) for each value of σb².
That gives a simple one-dimensional function to maximise for the global MLEs.
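A minimal sketch of this two-stage scheme is given below. It profiles over the variance ratio λ = σb²/σ², so that the marginal covariance can be written as σ²Σ(λ) with Σ(λ) = I + λZZᵀ; this ratio parametrisation is an assumption made here to keep Σ free of σ², and the sketch reuses the simulated data frame sim from the earlier sketch.

y <- sim$Strength
X <- matrix(1, nrow = length(y), ncol = 1)         # intercept-only design
Z <- model.matrix(~ Batch - 1, data = sim)         # batch indicator matrix
prof.loglik <- function(lambda) {
  Sigma     <- diag(length(y)) + lambda * tcrossprod(Z)   # I + lambda Z Z'
  Sigma.inv <- solve(Sigma)
  beta.hat  <- solve(t(X) %*% Sigma.inv %*% X, t(X) %*% Sigma.inv %*% y)
  r         <- y - X %*% beta.hat
  s2.hat    <- as.numeric(t(r) %*% Sigma.inv %*% r) / length(y)
  # log-likelihood evaluated at the profiled estimates
  -0.5 * (length(y) * log(2 * pi * s2.hat) +
          as.numeric(determinant(Sigma)$modulus) + length(y))
}
opt <- optimize(prof.loglik, interval = c(1e-4, 10), maximum = TRUE)
opt$maximum   # maximising value of the ratio sigma.b^2 / sigma^2

The global MLEs then follow by evaluating β̂ and σ̂² at the maximising ratio.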
Model fitting and assessment - likelihood
This alternative formulation of the likelihood approach does not give an efficient prescription for the actual computations. Pinheiro & Bates (2000, section 2.2) show how that can be done much more efficiently.
Model fitting and assessment - likelihood
In R, the likelihood approach is easily implemented as

library(nlme)
model <- lme(Strength ~ 1, random = ~ 1 | Batch,
             data = paste, method = "ML")
summary(model)
plot(model)
qqnorm(model)

Notice that all the usual functions (summary, plot, etc.) can be applied to the fitted model.
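Approximate confidence intervals for µ, σb and σ can also be obtained from the fitted model with intervals(model).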
Model fitting and assessment - likelihood
The essential part of the output is shown below.

Random effects:
 Formula: ~1 | Batch
        (Intercept) Residual
StdDev:    1.698791 2.724873

Fixed effects: Strength ~ 1
               Value Std.Error DF  t-value p-value
(Intercept) 60.05333 0.6475545 50 92.73865       0

This shows the estimates of µ, σb and σ to be 60.053, 1.699 and 2.725 respectively.
Model fitting and assessment - REML
A disadvantage of the likelihood approach in this setting is that it can be subject to bias.
The simplest example is in the estimation of a variance parameter using the divisor n, whereas the unbiased estimator has divisor n − p, where p is the number of fixed effect regression parameters in the model. However, this issue applies more widely.
The method of Restricted Maximum Likelihood, or REML for short, offers an alternative method of model fitting.
A general starting point is to define the likelihood for the variance parameters by integrating out the fixed effect regression parameters β to give

L(σb², σ²) = ∫ L(β, σb², σ²) dβ.

This has a Bayesian interpretation (see later lectures).
Model fitting and assessment - REML
However, at a more operational level, the effect of β can be removed by first constructing the ordinary least squares residuals for the regression of y on X, then estimating the variance parameters by maximizing the likelihood based on these residuals, and then using this fitted variance structure in a further maximum likelihood analysis to provide estimates of the fixed effects.
Algebraically, the difference between ML and REML estimation arises from the fact that the log-likelihood maximised by the former is

−(1/2) log|H| − (1/2) (y − X β̂)ᵀ H⁻¹ (y − X β̂),

while the log-likelihood maximised by the latter is

−(1/2) log|H| − (1/2) log|Xᵀ H⁻¹ X| − (1/2) (y − X β̂)ᵀ H⁻¹ (y − X β̂),

where β̂ = (Xᵀ H⁻¹ X)⁻¹ Xᵀ H⁻¹ y and H is used as a convenient notation for σ²Σ. This is discussed by Diggle et al. (1994, section 4.5).
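The extra log-determinant term is easy to see numerically. The sketch below evaluates both criteria at illustrative variance values, reusing y, X and Z from the earlier sketches; it is a direct transcription of the two expressions above, not the computational scheme used by lme.

sigma.b <- 1.7; sigma <- 2.7                       # illustrative values
H     <- sigma.b^2 * tcrossprod(Z) + sigma^2 * diag(length(y))
H.inv <- solve(H)
beta.hat <- solve(t(X) %*% H.inv %*% X, t(X) %*% H.inv %*% y)
r    <- y - X %*% beta.hat
quad <- as.numeric(t(r) %*% H.inv %*% r)
logdet <- function(M) as.numeric(determinant(M)$modulus)
ml.crit   <- -0.5 * logdet(H) - 0.5 * quad
reml.crit <- ml.crit - 0.5 * logdet(t(X) %*% H.inv %*% X)   # extra REML term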
Model fitting and assessment - REML
In R, the REML approach is easily implemented as

model <- lme(Strength ~ 1, random = ~ 1 | Batch,
             data = paste, method = "REML")

Actually, REML is the default method of estimation in the lme function.
Model fitting and assessment - REML
With the paste data, the output is

Random effects:
 Formula: ~1 | Batch
        (Intercept) Residual
StdDev:    1.828673 2.724873

Fixed effects: Strength ~ 1
               Value Std.Error DF  t-value p-value
(Intercept) 60.05333 0.6768703 50 88.72206       0

In this case the estimates of σ and µ are the same as those produced by ML estimation, while the estimates of σb are different. This means, in turn, that the standard errors of β̂ are also different.
Pause for thought (and coffee!)
An important use of linear mixed models is in providing flexible descriptions of longitudinal data, where observations are made on individuals over time.

Luteinizing hormone
These data, reported by Raz (1989, Biometrics 54, 851-71), refer to an experiment which compared the concentrations of luteinizing hormone (LH) in 16 suckled and 16 nonsuckled cows. Measurements were made daily from day 1 through to day 4 postpartum, and twice daily from day 5 through to day 10 postpartum. The cows were ovariectomised on day 5 postpartum.
Pause for thought (and coffee!)
The plot below shows the measurements on a log scale, with points from the same animals joined by lines.

[Figure: log(LH) against Time (days 1 to 10), with one line per animal.]
Multilevel models
Returning to the paste data, it may also be true that each cask has its own mean. This is supported by the lattice plot shown below.

[Figure: lattice plot of Strength (54 to 66) against Cask (1 to 3), one panel per Batch (B01 to B10).]
Multilevel models
To describe this, we now require a model with two nested levels of random effects. This can be written as

yijk = µ + bi + bij + εijk,

where yijk denotes the kth observation in cask j within batch i, the bi’s represent the random effects for batch and the bij’s represent the random effects of cask within batch.
In fact, a very general prescription for linear mixed models can be given as

yi = Xi β + Zi bi + εi,

where yi denotes the ith group of data defined by random effects, so that yi and yj are independent when i ≠ j, β is a vector of fixed effect regression parameters with design matrix Xi, bi denotes a vector of random effects with design matrix Zi and εi represents, as usual, a vector of error terms. The methods we have sketched above can be applied to this general formulation to provide a very flexible and powerful set of modelling tools.
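Under the usual normal assumptions, say bi ∼ N(0, Ψ) and εi ∼ N(0, σ²I), the marginal distribution of each group is yi ∼ N(Xi β, Zi Ψ Ziᵀ + σ²I), so the general model again takes exactly the y ∼ N(Xβ, σ²Σ) form used earlier.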
Multilevel models
We do not have the time to develop the details of the model fitting at an algebraic level. With R the models can be specified and fitted as

model <- lme(Strength ~ 1,
             random = ~ 1 | Batch/Cask,
             data = paste)

The ‘forward slash’ symbol represents ‘nesting’.
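The estimated variance components can also be extracted directly from the fitted object with VarCorr(model).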


Multilevel models
This produced the following output.

Random effects:
 Formula: ~1 | Batch
        (Intercept)
StdDev:    1.287349

 Formula: ~1 | Cask %in% Batch
        (Intercept)  Residual
StdDev:    2.904061 0.8234124

Fixed effects: Strength ~ 1
               Value Std.Error DF  t-value p-value
(Intercept) 60.05333 0.6768648 30 88.72279       0
Multilevel models
This shows that the estimates of standard deviation associated with batch, cask within batch and residual error variation are 1.287, 2.904 and 0.823 respectively.
This gives a very useful description of the different levels of variation within each stratum.
Interestingly, the dominant component is in fact cask within batch.
Comparing models
The analogues of F-tests to compare models are available in the mixed effects setting.
However, we have to be more careful about how we do this, depending on the estimation method which has been used.
Pinheiro & Bates (2000) give a lot of useful discussion on this topic.
Some aspects of this will be discussed in the context of the case study below.
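For example, a likelihood ratio comparison of the single-level and nested models might look as below; this is a sketch assuming the paste data frame, and models whose fixed effects differ should be compared using method = "ML" rather than the default REML.

m1 <- lme(Strength ~ 1, random = ~ 1 | Batch,      data = paste, method = "ML")
m2 <- lme(Strength ~ 1, random = ~ 1 | Batch/Cask, data = paste, method = "ML")
anova(m1, m2)   # likelihood ratio test for the cask-within-batch component

Since the cask variance lies on the boundary of the parameter space under the null hypothesis, the usual chi-squared reference distribution is conservative here; Pinheiro & Bates (2000) discuss this point.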
A case study - Reading Attainment in Primary School-children
These data arose from a longitudinal study of a cohort of 407 pupils who entered 33 multi-ethnic inner London infant schools in 1982, and who were followed up until the end of their junior schooling in 1989.
The reading ability of pupils was tested on up to six occasions: annually from 1982 to 1986 and in 1989.
Data are also available on the age of the pupils at the occasions when testing was performed and also their sex and ethnic group.
The pupils took a variable number of the assessments and so the data are unbalanced.
(Data Source: Statistics in Education by Ian Plewis.)
A case study - Reading Attainment in Primary School-children
The dataset has eight columns:
1: School Number (1 to 33)
2: Pupil Number (1 to 751)
3: Assessment Occasion (1 to 6)
4: Reading attainment score
5: A standardised reading score (to be ignored here)
6: Ethnic Group (white or black African Caribbean)
7: Sex (boy or girl)
8: Age (in years, but mean-centred)
A case study
The questions of interest are:
(a) How does reading ability develop as children grow older?
(b) Does this development vary from pupil to pupil or from school to school?
(c) If so, does it vary systematically from one type of pupil to another (e.g. boys vs girls, white vs black or both), and according to the characteristics of the school?
Further reading
Diggle, P.J., Liang, K.-Y. & Zeger, S.L. (1994). Analysis of Longitudinal Data. OUP, Oxford.
Laird, N.M. & Ware, J.H. (1982). Random-effects models for longitudinal data. Biometrics 38, 963-974.
Pinheiro, J.C. & Bates, D.M. (2000). Mixed-Effects Models in S and S-PLUS. Springer, New York.
