Introduction Multilevel Analysis: Rens Van de Schoot
Utrecht University
[email protected]
http://www.joophox.net
[email protected] / rensvandeschoot.wordpress.com
Multilevel Regression Model
Known in the literature under a variety of names:
Hierarchical linear model (HLM)
Random coefficient model
Variance component model
Multilevel model
Contextual analysis
Mixed Linear Model
Hierarchical Data Structure
Three level data structure
Groups at different levels may have different sizes
Response (outcome) variable at lowest level
Explanatory variables at all levels
Examples?
Traditional Approaches
Disaggregate all variables to the lowest level
Do standard analyses (ANOVA, multiple regression)
Aggregate all variables to the highest level
Do standard analyses (ANOVA, multiple regression)
ANCOVA with groups as factor
Some improvements:
express explanatory variables as deviations from their group mean
use both the deviation score and the disaggregated group mean as predictors
(this separates individual and group effects)
Why not? What is wrong with this?
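The group-mean improvement listed above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `group_mean_center` and the IQ scores are hypothetical. Each score is split into the disaggregated group mean and the deviation from that mean, so that both can be entered as predictors.

```python
from collections import defaultdict


def group_mean_center(values, groups):
    """Split each value into its group mean and its deviation from that mean."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, group in zip(values, groups):
        totals[group] += value
        counts[group] += 1
    means = {g: totals[g] / counts[g] for g in totals}
    # Disaggregated group mean: every individual carries the mean of its own group.
    group_means = [means[g] for g in groups]
    # Deviation score: the individual's position relative to its own group.
    deviations = [v - m for v, m in zip(values, group_means)]
    return deviations, group_means


# Hypothetical IQ scores for five pupils in two classes ("a" and "b").
iq = [100, 110, 120, 90, 100]
classes = ["a", "a", "a", "b", "b"]
iq_dev, iq_class_mean = group_mean_center(iq, classes)
print(iq_dev)         # [-10.0, 0.0, 10.0, -5.0, 5.0]
print(iq_class_mean)  # [110.0, 110.0, 110.0, 95.0, 95.0]
```

Even with both predictors in the model, an ordinary regression still treats the pupils as independent observations.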
Problems With Standard Analysis
of Hierarchical Data
Multiple Regression assumes
independent observations
independent error terms
equal variances of errors for all observations
(assumption of homoscedastic errors)
normal distribution for errors
With hierarchical data
observations are not independent
errors are not independent
different observations may have errors with different variances
(heteroscedastic errors)
Problems With Standard Analysis
of Hierarchical Data
Observations in the same group are generally not independent
they tend to be more similar than observations from different groups
because of selection, shared history, contextual group effects
The degree of similarity is indicated by the intraclass correlation rho ($\rho$)
Standard statistical tests are not at all robust against violation of the independence assumption
That is why we need special multilevel techniques!
Sample size?
Hox, J., van de Schoot, R., & Matthijsse, S. (2012). How few countries will do? Comparative survey analysis from a Bayesian perspective. Survey Research Methods, 6(2), 87-93.
Research questions I/III
Questions with respect to variables at the lowest level
Intelligence (IQ) as predictor of school achievement (SA)
Research questions II/III
Questions with respect to the influence of variables at
a higher level on the dependent variable on the
lowest level
Mean intelligence of a class (MIQ) as predictor of
school achievement (SA); (control for individual IQ)
Research questions III/III
Questions with respect to the interaction of variables
on different levels (moderation effect)
The relation between intelligence and school
achievement is not the same in all classes
Graphical Picture of Simple
Two-level Regression Model
Outcome variable on pupil level
Explanatory variables at both levels: individual & group
Residual error at individual level
Plus residual error at school level
[Diagram: school level (school size, error) and pupil level (pupil sex, grade, error)]
Regression analysis
In ordinary regression, with one explanatory variable X:
$$Y_i = b_0 + b_1 X_i + e_i$$
$b_0$: intercept
$b_1$: regression slope
$e_i$: residual error term
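As a small worked example of the ordinary regression equation, the closed-form least-squares estimates of the intercept and slope can be computed directly. The sketch below assumes nothing beyond the equation itself; the function name `ols_fit` and the four data points are made up for illustration.

```python
def ols_fit(x, y):
    """Least-squares estimates for Y_i = b0 + b1 * X_i + e_i."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: sum of cross-products divided by sum of squares of X.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    # Residuals e_i: observed minus predicted.
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return b0, b1, residuals


# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 10.0]
b0, b1, residuals = ols_fit(x, y)
print(round(b0, 3), round(b1, 3))  # 0.5 2.3
```

There is a single intercept, a single slope and a single residual term, which is the point where the multilevel model departs from ordinary regression.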
Building the Multilevel Regression
Model: Random intercept model
In multilevel regression, at the lowest level:
$$Y_{ij} = b_{0j} + b_{1j} X_{ij} + e_{ij}$$
$b_{0j}$: intercept
$b_{1j}$: regression slope
$e_{ij}$: residual error term
subscript $i$ for individuals, $j$ for groups
each group has its own intercept coefficient $b_{0j}$ and its own slope coefficient $b_{1j}$
Building the Multilevel Regression
Model: Intercept only model
In multilevel regression, at the lowest level:
$$Y_{ij} = b_{0j} + e_{ij}$$
Random intercept model:
$$b_{0j} = g_{00} + u_{0j}$$
$g_{00}$ is the intercept of $b_{0j}$
$u_{0j}$ is the residual error term in the equation for $b_{0j}$
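To see what the intercept-only model implies for data, one can simulate from it and recover the two variance components. The sketch below is an illustration under stated assumptions: the parameter values (5.0, 1.0, 0.5), the group sizes and the function names are hypothetical, the groups are balanced, and the estimator is the simple one-way ANOVA (method of moments) estimator rather than the maximum likelihood estimation that multilevel software normally uses.

```python
import random


def simulate_intercept_only(n_groups, group_size, g00, var_u0, var_e, seed=1):
    """Draw Y_ij = g00 + u_0j + e_ij with normal u_0j and e_ij."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_groups):
        u0j = rng.gauss(0.0, var_u0 ** 0.5)  # group-level residual
        b0j = g00 + u0j                      # group-specific intercept
        data.append([b0j + rng.gauss(0.0, var_e ** 0.5) for _ in range(group_size)])
    return data


def anova_variance_components(data):
    """Method-of-moments estimates for balanced groups (not maximum likelihood)."""
    n_groups = len(data)
    n = len(data[0])
    group_means = [sum(g) / n for g in data]
    grand_mean = sum(group_means) / n_groups
    ss_within = sum((y - m) ** 2 for g, m in zip(data, group_means) for y in g)
    ms_within = ss_within / (n_groups * (n - 1))
    ms_between = n * sum((m - grand_mean) ** 2 for m in group_means) / (n_groups - 1)
    # E[ms_within] = var_e and E[ms_between] = var_e + n * var_u0.
    return grand_mean, (ms_between - ms_within) / n, ms_within


data = simulate_intercept_only(500, 20, g00=5.0, var_u0=1.0, var_e=0.5)
est_g00, est_var_u0, est_var_e = anova_variance_components(data)
print(est_g00, est_var_u0, est_var_e)
```

With 500 groups of 20 the three estimates should land close to the generating values, although the exact numbers depend on the seed.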
Building the Multilevel Regression
Model: Random intercept model
In multilevel regression, at the lowest level:
$$Y_{ij} = b_{0j} + b_{1j} X_{ij} + e_{ij}$$
Random intercept model:
$$b_{0j} = g_{00} + u_{0j}$$
$g_{00}$ is the intercept of $b_{0j}$
$u_{0j}$ is the residual error term in the equation for $b_{0j}$
Building the Multilevel Regression
Model: Random slope model
$$Y_{ij} = b_{0j} + b_{1j} X_{ij} + e_{ij}$$
Random intercept model:
$$b_{0j} = g_{00} + u_{0j}$$
$g_{00}$ is the intercept of $b_{0j}$
$u_{0j}$ is the residual error term in the equation for $b_{0j}$
Random slope model:
$$b_{1j} = g_{10} + u_{1j}$$
$g_{10}$ is the intercept of $b_{1j}$
$u_{1j}$ is the residual error term in the equation for $b_{1j}$
Difference with the usual
regression model:
Each class has a different intercept coefficient $b_{0j}$
and a different slope coefficient $b_{1j}$
Because the intercept and slope coefficients vary across the classes, they are called random coefficients
=> Random intercept model & random slope model
Building the Multilevel Regression
Model: Random slope model
Building the Multilevel Regression
Model: the Second (Group) Level
Next step:
explain the variation of the regression coefficients $b_{0j}$ and $b_{1j}$ by introducing explanatory variables at the class level
Building the Multilevel Regression
Model: the Second (Group) Level
At the lowest (individual) level we have
$$Y_{ij} = b_{0j} + b_{1j} X_{ij} + e_{ij}$$
$$b_{0j} = g_{00} + g_{01} Z_j + u_{0j}$$
$g_{00}$ and $g_{01}$ are the intercept and slope to predict $b_{0j}$ from $Z_j$
$u_{0j}$ is the residual error term in the equation for $b_{0j}$
Building the Multilevel Regression
Model: Cross level interaction
At the lowest (individual) level we have
$$Y_{ij} = b_{0j} + b_{1j} X_{ij} + e_{ij}$$
$$b_{0j} = g_{00} + g_{01} Z_j + u_{0j}$$
$g_{00}$ and $g_{01}$ are the intercept and slope to predict $b_{0j}$ from $Z_j$
$u_{0j}$ is the residual error term in the equation for $b_{0j}$
$$b_{1j} = g_{10} + g_{11} Z_j + u_{1j}$$
$g_{10}$ and $g_{11}$ are the intercept and slope to predict $b_{1j}$ from $Z_j$
$u_{1j}$ is the residual error term in the equation for $b_{1j}$
Building the Multilevel Regression
Model: Single Equation Version
At the lowest (individual) level we have
$$Y_{ij} = b_{0j} + b_{1j} X_{ij} + e_{ij}$$
and at the second (group) level
$$b_{0j} = g_{00} + g_{01} Z_j + u_{0j}$$
$$b_{1j} = g_{10} + g_{11} Z_j + u_{1j}$$
Combining (substitution and rearranging terms) gives
$$Y_{ij} = g_{00} + g_{10} X_{ij} + g_{01} Z_j + g_{11} Z_j X_{ij} + u_{1j} X_{ij} + u_{0j} + e_{ij}$$
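The substitution can be written out step by step. The lines below only restate the three equations of this slide and add no new assumptions: the two group-level equations are inserted into the individual-level equation, the product is expanded, and the terms are reordered.

```latex
\begin{align*}
Y_{ij} &= b_{0j} + b_{1j} X_{ij} + e_{ij} \\
       &= (g_{00} + g_{01} Z_j + u_{0j}) + (g_{10} + g_{11} Z_j + u_{1j}) X_{ij} + e_{ij} \\
       &= g_{00} + g_{01} Z_j + u_{0j} + g_{10} X_{ij} + g_{11} Z_j X_{ij} + u_{1j} X_{ij} + e_{ij} \\
       &= g_{00} + g_{10} X_{ij} + g_{01} Z_j + g_{11} Z_j X_{ij} + u_{1j} X_{ij} + u_{0j} + e_{ij}
\end{align*}
```

The product term $g_{11} Z_j X_{ij}$ appears only because $Z_j$ predicts the slope $b_{1j}$; it is the cross-level interaction.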
Building the Multilevel Regression
Model: Single Equation Version
$$Y_{ij} = [g_{00} + g_{10} X_{ij} + g_{01} Z_j + g_{11} Z_j X_{ij}] + [u_{1j} X_{ij} + u_{0j} + e_{ij}]$$
This equation has two distinct parts
$[g_{00} + g_{10} X_{ij} + g_{01} Z_j + g_{11} Z_j X_{ij}]$ contains all the fixed coefficients, it is called the fixed part of the model
$[u_{1j} X_{ij} + u_{0j} + e_{ij}]$ contains all the random error terms, it is called the random part of the model
Building the Multilevel Regression
Model: Interpretation
$$Y_{ij} = [g_{00} + g_{10} X_{ij} + g_{01} Z_j + g_{11} Z_j X_{ij}] + [u_{1j} X_{ij} + u_{0j} + e_{ij}]$$
Several error variances
$\sigma^2_e$: variance of the lowest level errors $e_{ij}$
$\sigma^2_{u0}$: variance of the highest level errors $u_{0j}$
$\sigma^2_{u1}$: variance of the highest level errors $u_{1j}$
$\sigma_{u01}$: covariance of $u_{0j}$ and $u_{1j}$
Full Multilevel Regression Model
Explanatory variables at all levels
Higher level variables predict variation of lowest level
intercept and slopes
Predicting the intercept implies a direct effect
Predicting slopes implies cross-level interactions
Model Exploration
1 Intercept-only model
  calculate intraclass correlation
2 Fixed model, 1st level predictor variables
  test individual slopes for significance
3 Model intercept by 2nd level predictor variables
  test for significance, how much intercept variance explained?
4 Random coefficient model
  test if any 1st level slope has a significant variance component (this is best done one-by-one)
5 Model random slopes by higher level variables: cross level interactions
  test for significance, how much slope variance is explained?
Example: Popularity in Schools
Outcome: popularity rating
100 classes, 2000 pupils
Explanatory variables
Pupil level: sex (0=boy, 1=girl)
Class level: teacher experience (in years)
Popularity Example:
Intercept-only Model
$$\text{Popularity}_{ij} = g_{00} + u_{0j} + e_{ij}$$
Estimates (st. err.)
$g_{00}$ = 5.31 (.10) (This is just the overall average popularity)
$\sigma^2_e$ = 0.64 (.02)
$\sigma^2_{u0}$ = 0.88 (.13)
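The intraclass correlation follows from these two variance estimates as the share of the total variance that lies at the class level, $\rho = \sigma^2_{u0} / (\sigma^2_{u0} + \sigma^2_e)$. A minimal sketch of the arithmetic (the function name is illustrative):

```python
def intraclass_correlation(var_u0, var_e):
    """Proportion of total variance located at the group level."""
    return var_u0 / (var_u0 + var_e)


# Variance estimates from the intercept-only model above.
rho = intraclass_correlation(var_u0=0.88, var_e=0.64)
print(round(rho, 2))  # 0.58
```

So roughly 58% of the variance in popularity ratings is between classes, which is far too much to treat pupils as independent observations.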
Popularity Example:
Fixed Model
$$\text{Popularity}_{ij} = g_{00} + g_{10}\,\text{sex}_{ij} + u_{0j} + e_{ij}$$
Estimates (st. err.)
$g_{00}$ = 4.89 (.10), $g_{10}$ = 0.84 (.03)
$\sigma^2_e$ = 0.46 (.02)
$\sigma^2_{u0}$ = 0.85 (.12)
Popularity Example:
Fixed Model + Higher Level Variable
$$\text{Popularity}_{ij} = g_{00} + g_{10}\,\text{sex}_{ij} + g_{01}\,\text{t.exp.}_j + u_{0j} + e_{ij}$$
Estimates (st. err.)
$g_{00}$ = 3.56 (.17), $g_{10}$ = 0.84 (.03), $g_{01}$ = 0.09 (.01)
$\sigma^2_e$ = 0.46 (.02)
$\sigma^2_{u0}$ = 0.48 (.07)
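One common way to answer "how much variance is explained" is the proportional reduction of a variance component between two nested models. The sketch below assumes that convention; the function name is illustrative and the numbers are the estimates reported on the preceding slides.

```python
def proportion_explained(var_before, var_after):
    """Proportional reduction of a variance component between two models."""
    return (var_before - var_after) / var_before


# Class-level intercept variance: fixed model (0.85) vs. model with teacher experience (0.48).
explained_class = proportion_explained(0.85, 0.48)
# Pupil-level variance: intercept-only model (0.64) vs. fixed model with sex (0.46).
explained_pupil = proportion_explained(0.64, 0.46)
print(round(explained_class, 2))  # 0.44
print(round(explained_pupil, 2))  # 0.28
```

Under this convention teacher experience accounts for about 44% of the between-class intercept variance, and pupil sex for about 28% of the pupil-level variance.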
Popularity Example:
Random Coefficient Model
$$\text{Popularity}_{ij} = g_{00} + g_{10}\,\text{sex}_{ij} + g_{01}\,\text{t.exp.}_j + u_{0j} + u_{1j}\,\text{sex}_{ij} + e_{ij}$$
Estimates (st. err.)
$g_{00}$ = 3.34 (.16), $g_{10}$ = 0.84 (.06), $g_{01}$ = 0.11 (.01)
$\sigma^2_e$ = 0.39 (.01)
$\sigma^2_{u0}$ = 0.41 (.06)
$\sigma_{u01}$ = 0.02 (.04) (covariance between intercept and slope)
$\sigma^2_{u1}$ = 0.27 (.05) (slope variation for sex)
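A random slope makes the variance of the random part depend on the predictor, which is the heteroscedasticity mentioned earlier. Assuming, as is standard, that $e_{ij}$ is uncorrelated with $u_{0j}$ and $u_{1j}$, the variance of $u_{0j} + u_{1j}\,\text{sex}_{ij} + e_{ij}$ is $\sigma^2_{u0} + 2\sigma_{u01}\,\text{sex} + \sigma^2_{u1}\,\text{sex}^2 + \sigma^2_e$. A short sketch with the estimates above (the function name is illustrative):

```python
def random_part_variance(x, var_u0, var_u1, cov_u01, var_e):
    """Variance of u0 + u1 * x + e, assuming e is uncorrelated with u0 and u1."""
    return var_u0 + 2.0 * cov_u01 * x + var_u1 * x ** 2 + var_e


# Estimates from the random coefficient model; sex is coded 0 = boy, 1 = girl.
estimates = dict(var_u0=0.41, var_u1=0.27, cov_u01=0.02, var_e=0.39)
var_boys = random_part_variance(0, **estimates)
var_girls = random_part_variance(1, **estimates)
print(round(var_boys, 2))   # 0.8
print(round(var_girls, 2))  # 1.11
```

The unexplained variance is therefore larger for girls than for boys in this model, something an ordinary regression with a single error variance cannot represent.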
Popularity Example:
Random Coefficient Model + Interaction
$$\text{Popularity}_{ij} = g_{00} + g_{10}\,\text{sex}_{ij} + g_{01}\,\text{t.exp.}_j + g_{11}\,\text{sex}_{ij}\,\text{t.exp.}_j + u_{0j} + u_{1j}\,\text{sex}_{ij} + e_{ij}$$
Estimates (st. err.)
$g_{00}$ = 3.31 (.16), $g_{10}$ = 1.33 (.13), $g_{01}$ = 0.11 (.01), $g_{11}$ = -0.03 (.01)
$\sigma^2_e$ = 0.39 (.01)
$\sigma^2_{u0}$ = 0.40 (.06)
$\sigma_{u01}$ = 0.02 (.04)
$\sigma^2_{u1}$ = 0.22 (.04)
Smaller, but still significant slope variation for sex
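To read the cross-level interaction, it helps to plug values into the fixed part of the model. The sketch below uses the fixed estimates reported above; the function names and the chosen values of teacher experience (0, 10 and 20 years) are illustrative only.

```python
# Fixed-part estimates from the model with the cross-level interaction.
G00, G10, G01, G11 = 3.31, 1.33, 0.11, -0.03


def predicted_popularity(sex, teacher_exp):
    """Fixed-part prediction; sex is 0 = boy, 1 = girl, teacher_exp in years."""
    return G00 + G10 * sex + G01 * teacher_exp + G11 * sex * teacher_exp


def sex_effect(teacher_exp):
    """Expected girl-boy difference in a class whose teacher has teacher_exp years."""
    return G10 + G11 * teacher_exp


print(round(predicted_popularity(0, 10), 2))  # 4.41
print(round(predicted_popularity(1, 10), 2))  # 5.44
for years in (0, 10, 20):
    print(years, round(sex_effect(years), 2))
```

The negative $g_{11}$ means the difference between girls and boys shrinks as teacher experience increases: from 1.33 with a teacher without experience to 0.73 with a teacher with 20 years of experience.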
5-day course Multilevel Analyses in Mplus, 21-25 Jan. 2013
http://www.uu.nl/faculty/socialsciences/NL/organisatie/graduateschool/promoveren/onderwijs%20voor%20promovendi/courseoffering/Pages/Multilevel-Analyses-using-Mplus.aspx
The 9th International Multilevel Conference is on March 27-28, 2013: http://multilevel.fss.uu.nl/
Prior to the conference (March 26), prof. Stef van Buuren teaches a one-day course on Multiple Imputation of Multilevel missing data in MICE.
The 5th Mplus users meeting will be held on March 25: http://mplus.fss.uu.nl