Slides1 mrbm2324
Lecture 11
Linear Mixed Models
Adrian Bowman
School of Mathematics & Statistics
The University of Glasgow
[email protected]
Introduction
In our earlier discussions of regression models, where a response
variable Y is related to a vector of explanatory variables x, we
assumed that the ‘structural’ part of the model was determined by
a ‘linear predictor’ of the form Xβ.
Here X denotes a design matrix and β denotes a vector of
unknown parameters.
A description of the error distribution for Y is required to complete
the model and allow us to fit it to the data. (As we saw in
generalized linear models, a ‘link’ function may also be required.)
This kind of model is extremely useful, as we have seen. However,
there are many kinds of experiment which give rise to data with a
more complex structure.
In particular, the source of random variation can be more complex
than a single ‘add-on’ error term.
Components of Variability in Paste Strength
[Dot plot: Strength (54 to 66) for each of the batches B01 to B10, with several observations per batch.]
yij = µ + βi + εij,
where yij denotes the jth observation from the ith batch, which has
mean µ + βi. As usual, εij represents the additional error variation.
The end result is a simple ‘one-way analysis of variance’ model.
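As a small illustration, this one-way model is easy to simulate directly. All the numbers below (ten batches, six observations each, the parameter values) are invented for the sketch and are not estimates from the paste data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values only -- not estimates from the paste data
mu = 60.0                        # overall mean strength
beta = rng.normal(0.0, 2.0, 10)  # fixed batch effects beta_i
beta -= beta.mean()              # centre them so mu remains the overall mean
sigma = 1.0                      # error standard deviation

# y_ij = mu + beta_i + eps_ij with j = 1,...,6 observations per batch
batch = np.repeat(np.arange(10), 6)
y = mu + beta[batch] + rng.normal(0.0, sigma, 60)

# each batch mean estimates mu + beta_i
batch_means = np.array([y[batch == i].mean() for i in range(10)])
print(batch_means.round(2))
```

Each batch mean estimates µ + βi, which is exactly the structure the dot plot above displays.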
However, the difficulty with this approach is that it gives a
description only of the particular batches we have observed.
As we collect more data, involving more batches, the number of βi
parameters increases without limit. This isn’t actually what we
would like to do. We need a description of the variation associated
with any batches we might observe in the future.
To do this, we regard the βi parameters as drawn from a
distribution which describes the variation in the batch means.
To reflect the random nature of the batch means, it is helpful to
change notation a little, representing them by bi .
We can make the further, hopefully reasonable, assumption that
the batch means come from a normal distribution. This can have
mean 0, because the bi ’s represent deviations from the overall
mean µ.
However, we do need an additional variance parameter, σb^2, which
describes the variance of the bi's.
The model is then

yij = µ + bi + εij, where bi ∼ N(0, σb^2) and εij ∼ N(0, σ^2), independently.

This is a very concise and useful description of the way the data
are generated. There are now two variance parameters, σb^2 and σ^2,
which describe the variance of the batch means about the overall
mean and the variance of observations about the batch means,
respectively.
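A quick simulation shows how the two variance parameters surface in the data; the values σb = 2 and σ = 1 and the sample sizes are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma_b, sigma = 60.0, 2.0, 1.0   # invented values for illustration

n_batches, n_per = 200, 6             # many batches so the averages settle down
b = rng.normal(0.0, sigma_b, n_batches)            # b_i ~ N(0, sigma_b^2)
eps = rng.normal(0.0, sigma, (n_batches, n_per))   # eps_ij ~ N(0, sigma^2)
y = mu + b[:, None] + eps                          # y_ij = mu + b_i + eps_ij

# variance of the batch means: roughly sigma_b^2 + sigma^2 / n_per
print(y.mean(axis=1).var(ddof=1))
# average within-batch variance: roughly sigma^2
print(y.var(axis=1, ddof=1).mean())
```

The batch means vary by more than σb^2 alone because each mean also carries averaged error variation.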
Traditionally, this topic has often been referred to as components
of variance. However, it is now usually handled under a more
general modelling framework referred to as linear mixed models.
This involves a linear model structure which has a mixture of both
fixed (in this case only µ) and random (here the bi) effects.
This is a very powerful framework which has the ability to describe
very complex data structures.
Model fitting and assessment - likelihood
The likelihood methods which we discussed in lectures 4 and 5
provide a natural starting point for fitting models such as this, as
we have a full description of the systematic and random parts of
the model and an interest in identifying the unknown parameters.
We do also need to assume a particular form for the error
distribution. We have done that already in assuming a normal
distribution for the random effects. We will also use a normal
distribution for the error terms εij .
How should we define the likelihood function in this setting, when
the model involves the bi ’s which we do not observe directly?
Suppose we represent the data from batch i as yi . Then we can
write the likelihood component for this batch as
p(yi | µ, σb^2, σ^2) = ∫ p(yi | bi, σb^2, σ^2) p(bi | σb^2, σ^2) dbi.
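Because everything here is normal, the integral has a closed form: integrating bi out leaves yi multivariate normal with mean µ1 and covariance σ^2 I + σb^2 J, where J is the matrix of ones. The sketch below, with invented data and parameter values, checks that closed form against brute-force numerical integration over bi.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm
from scipy.integrate import quad

def batch_loglik(y_i, mu, sigma_b2, sigma2):
    """Closed-form marginal log-likelihood for one batch.

    Integrating b_i out of the normal one-way model gives
    y_i ~ N(mu * 1, sigma2 * I + sigma_b2 * J).
    """
    n = len(y_i)
    cov = sigma2 * np.eye(n) + sigma_b2 * np.ones((n, n))
    return multivariate_normal(mean=np.full(n, mu), cov=cov).logpdf(y_i)

# Invented data and parameter values, purely for checking the algebra
y_i = np.array([59.2, 61.0, 60.4])
mu, sb2, s2 = 60.0, 4.0, 1.0

closed = batch_loglik(y_i, mu, sb2, s2)

# Brute force: integrate p(y_i | b_i) p(b_i) over b_i numerically
def integrand(b):
    return np.prod(norm.pdf(y_i, mu + b, np.sqrt(s2))) * norm.pdf(b, 0.0, np.sqrt(sb2))

numeric = np.log(quad(integrand, -25.0, 25.0)[0])
print(closed, numeric)   # the two values agree
```

The full log-likelihood is then the sum of these batch contributions, which is what a fitting routine maximises over µ, σb^2 and σ^2.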
Integrating out the random effects shows that each observation can be
written in the marginal form

yij = µ + ε*ij,

where ε*ij = bi + εij combines the two sources of random variation.
These new error terms are no longer independent within a batch:

cov(yij, yi′j′) = σb^2 + σ^2 if i = i′ and j = j′; σb^2 if i = i′ and j ≠ j′; 0 otherwise.
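The covariance structure can also be checked by simulation; the values σb = 2 and σ = 1 are again invented for the check.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma_b, sigma = 2.0, 1.0
n = 200_000                          # batches; two observations from each

b = rng.normal(0.0, sigma_b, n)
y1 = b + rng.normal(0.0, sigma, n)   # y_i1 - mu
y2 = b + rng.normal(0.0, sigma, n)   # y_i2 - mu, same batch

print(np.var(y1))            # roughly sigma_b^2 + sigma^2 = 5
print(np.cov(y1, y2)[0, 1])  # same batch, different j: roughly sigma_b^2 = 4
```

Observations from the same batch are positively correlated because they share the same bi; observations from different batches share nothing random and are uncorrelated.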
Collecting all the observations into a single vector y, the model can
then be expressed as

y ∼ N(Xβ, σ^2 Σ),

where Σ incorporates the correlation structure described above.
Notice that all the usual functions (summary, plot, etc.) can be
applied to the fitted model.
Model fitting and assessment - likelihood
Fixed e f f e c t s : Strength ˜ 1
V a l u e Std . E r r o r DF t−v a l u e p−v a l u e
( I n t e r c e p t ) 6 0 . 0 5 3 3 3 0 . 6 4 7 5 5 4 5 50 9 2 . 7 3 8 6 5 0
Fixed e f f e c t s : Strength ˜ 1
V a l u e Std . E r r o r DF t−v a l u e p−v a l u e
( I n t e r c e p t ) 6 0 . 0 5 3 3 3 0 . 6 7 6 8 7 0 3 50 8 8 . 7 2 2 0 6 0
[Figure: log(LH) (−2 to 1) plotted against Time (1 to 10).]
Multilevel models
Returning to the paste data, it may also be true that each cask has
its own mean. This is supported by the lattice plot shown below.
[Lattice plot: Strength (54 to 66) for casks 1 to 3 within each batch B01 to B10, one panel per batch.]
To describe this, we now require a model with two nested levels of
random effects. This can be written as

yijk = µ + bi + bij + εijk,

where yijk denotes the kth observation in cask j within batch i, the
bi's represent the random effects for batch and the bij's represent
the random effects of cask within batch.
In fact, a very general prescription for linear mixed models can be given as

yi = Xi β + Zi bi + εi,

where Xi and Zi are design matrices for the fixed and random effects respectively.
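In this general form, integrating out bi with cov(bi) = Ψ gives yi ∼ N(Xi β, Zi Ψ Zi′ + σ^2 I). The sketch below builds this marginal covariance for the simplest random-intercept case; all values are illustrative.

```python
import numpy as np

# Random-intercept case: Z_i is a column of ones, Psi is a 1x1 matrix.
# Illustrative values: sigma_b^2 = 4, sigma^2 = 1, six observations in group i.
n = 6
Z = np.ones((n, 1))
Psi = np.array([[4.0]])
sigma2 = 1.0

# Marginal covariance of y_i after integrating out b_i
V = Z @ Psi @ Z.T + sigma2 * np.eye(n)
print(V)   # 5 on the diagonal, 4 everywhere else
```

This reproduces exactly the covariance pattern derived earlier for the paste model; richer choices of Zi and Ψ give random slopes, nested effects and so on.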
We do not have the time to develop the details of the model fitting
at an algebraic level. With R the models can be specified and
fitted as
model <- lme(Strength ~ 1,
             random = ~ 1 | Batch/Cask)
Fixed effects: Strength ~ 1
               Value Std.Error DF  t-value p-value
(Intercept) 60.05333 0.6768648 30 88.72279       0