22
ELEMENTS OF HIERARCHICAL
LINEAR REGRESSION MODELS
1 Kreft, I. and De Leeuw, J., Introducing Multilevel Modeling, Sage Publications, California, 2007, p. 1.
2 See, for example, Luke, D. S., Multilevel Modeling, Sage Publications, California, 2004; Twisk, J. W. R., Applied Multilevel Analysis, Cambridge University Press, Cambridge, 2006; Hox, J. J., Multilevel Analysis: Techniques and Applications, 2nd edn, Routledge, 2010; Bickel, R., Multilevel Analysis for Applied Research: It's Just Regression!, Guilford Press, 2007; Gelman, A. and Hill, J., Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press, Cambridge, 2007; Bryk, A. S. and Raudenbush, S. W., Hierarchical Linear Models, Sage Publications, California, 1992. For more advanced discussion with applications, see Rabe-Hesketh, S. and Skrondal, A., Multilevel and Longitudinal Modeling Using Stata: Vol. 1: Continuous Responses, 3rd edn, Stata Press, 2012. Volume 2 deals with categorical responses, counts, and survival. Finally, there is Hox, J., Multilevel modeling: when and why, in I. Balderjahn, R. Mathar, and M. Schader (eds.), Classification, Data Analysis, and Data Highways, Springer-Verlag, Berlin, 1998, pp. 147–54.
3 For an interesting application of HLM involving President Obama's 2008 election, see http://www.elecdem.eu/media/universityofexter/elecdem/pdfs/intanbulwkspjan2012/Hierarc.
4 For a list of multilevel modeling software packages, see Luke, op cit., p. 74.
22.1

22.2 NELS
NELS is a longitudinal survey that follows eighth-grade students as they move into and out of high school. The objective of the survey is to observe changes in students' lives during adolescence and the role that school plays in promoting growth and positive life choices. The study began in 1988, when the students were in the eighth grade, and followed them through grade 12; it also followed students who dropped out of high school. In the first three waves, data were collected on achievement tests in mathematics, reading, social studies, and science.
The baseline 1988 survey included 24,599 students, one parent of each student respondent, school principals (about 1,302), and teachers (about 5,000). The vast amount of data collected makes it possible to analyze it at various levels, such as parent, teacher, and school, as well as to conduct analysis by region, race/ethnicity, public vs. private school, and the like. It is only when you examine the actual data that you will appreciate their richness. You can analyze the data at several levels, depending on your interest and computing facilities.
In the data, the student-level, or Level 1, variables are the socio-economic status of the students (SES), the number of hours of homework done per week (Homework), the students' race (white coded as 1 and non-white coded as 0), the parents' education level (Parented), and class size, measured by the student/teacher ratio (ratio). The macro-level, or Level 2, variables are the school ID (schid), the education sector (public schools coded as 1 and private schools coded as 0) (Public), the percentage of ethnic minority students in the school (% minorities), the geographic region of the school (Northeast, North Central, South, and West), represented by dummy variables, and the composition of the school (urban, suburban, and rural) (Urban), also represented by dummy variables.

The dependent variable in our analysis is the score on a math test, which is a Level 1 variable.

5 For an overview of these surveys, see Vartanian, T. P., Secondary Data Analysis, Pocket Guides to Social Work Research Methods, Oxford University Press, Oxford, 2011.
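Before any estimation, it helps to load the extract and inspect these variables; a minimal sketch (the file name nels260.dta is hypothetical, so substitute the name of your copy of the companion-site data):

. use nels260.dta, clear    // hypothetical file name for the 260-student extract
. describe
. summarize math homework ratio
. tabulate schid            // the 10 schools, with 20 to 67 students each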
22.3

As a benchmark, suppose we ignore the multilevel structure of the data altogether and regress the math score on the intercept alone:

Mathi = B1 + ui    (22.1)

where Math = score on the math test, u is the error term, and i is the ith student, and where we assume the error term follows the usual (classical) OLS assumptions, in particular the assumption that ui ~ N(0, σ²), that is, the errors are identically and independently distributed as a normal variate with zero mean and constant variance. In short, they are NIID. The intercept, B1, in this model is assumed to be fixed, for its value is assumed to be the same across all schools and individuals. Hence, we can call (22.1) a fixed coefficient model. If we estimate this regression, what we obtain is the average math score of all 260 students, regardless of their school affiliation.
Since our data are clustered into 10 schools, it is quite likely that one or more assumptions of the (normal) classical linear regression model do not hold. To allow for this possibility, we estimate regression (22.1) with Stata's robust standard errors option.8 The results are shown in Table 22.2.
Table 22.2 Naïve regression with the robust standard error option.

. regress math, robust

Linear regression                    Number of obs  =  260
                                     F(0, 259)      =  0.00
                                     Prob > F       =  .
                                     R-squared      =  0.0000
                                     Root MSE       =  11.136

           |           Robust
math       |   Coef.   Std. Err.      t     P>|t|    [95% Conf. Interval]
_cons      |   51.3    .6906026     74.28   0.000    49.94009    52.65991
Since there is no regressor in this model, the _cons (for constant) simply represents the average math score of the 260 students in the sample, regardless of their school affiliation. This average score is statistically highly significant.

Although the results are based on robust standard errors, they may not be reliable, for the robust option in the present case neglects the fact that our data are clustered into school districts (schid). It is very likely that observations within a cluster (i.e. a school here) are correlated: it is difficult to maintain that the math scores of students in the same school are uncorrelated. In fact, students in the same school tend to have test scores that are correlated, since they all study in the same environment and probably have similar backgrounds. This correlation is called the clustering problem or the Moulton Problem, named after Brent Moulton, who published an influential paper on this subject.9 Briefly, the Moulton problem arises because standard regression techniques applied to hierarchical data often exaggerate the statistical significance of the estimated coefficients. As we will show shortly, the standard errors of the estimated coefficients are underestimated, thereby exaggerating the estimated t values.10
The standard error reported in Table 22.2 does not correct for the Moulton problem, even though it may correct for heteroscedasticity in the error term. One way to take the clustering problem into account is to use clustered standard errors. These standard errors allow regression errors to be correlated within a cluster, but assume that the regression errors are uncorrelated across clusters. If we use the cluster option, we need not use the robust option, for the latter is implied by the former.11 Using Stata's cluster option, we obtain the results in Table 22.3.

8 The robust option in Stata uses the Huber–White sandwich estimator. Such standard errors try to take into account some of the violations of the classical linear regression assumptions, such as non-normality of the regression errors, heteroscedasticity, and outliers in the data.
9 See Moulton, B. (1986) Random group effects and the precision of regression estimates, Journal of Econometrics, 32, 385–97.
10 The reason for this is that with correlation among observations in a given school (or cluster), we actually have fewer independent observations than the actual number of observations in that school.
Table 22.3 Estimation of the naïve model with clustered standard errors.

. regress math, cluster(schid)

Linear regression                    Number of obs  =  260
                                     F(0, 9)        =  0.00
                                     Prob > F       =  .
                                     R-squared      =  0.0000
                                     Root MSE       =  11.136

math       |   Coef.   Std. Err.      t     P>|t|    [95% Conf. Interval]
_cons      |   51.3    3.402609     15.08   0.000    43.60276    58.99724
The command cluster(schid) is Stata's command to use the clustered standard error option with the cluster variable, in this case a single Level 2 variable, schid. If you compare the results in Table 22.3 with those in Table 22.2, you will notice that the coefficient value remains the same in both cases, but the standard errors are vastly different. This suggests that the error term in the naïve model suffers from heteroscedasticity, autocorrelation, or other related problems. As a result, the OLS standard errors, even with the robust option, are severely underestimated.12

In the present case, even with the cluster option, the estimated coefficient is highly significant. However, this cannot be taken for granted in all situations.
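To see the two sets of standard errors side by side, one can store and tabulate the estimates; a minimal sketch using standard Stata commands (the model names rob and clu are our own):

. quietly regress math, robust
. estimates store rob
. quietly regress math, cluster(schid)
. estimates store clu
. estimates table rob clu, se

The coefficient on _cons is identical in the two columns; only the standard errors differ.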
11 It may be noted that clustered standard errors are sometimes known as Rogers standard errors, since Rogers implemented them in Stata in 1993. See Rogers, W. (1993) sg17: Regression standard errors in clustered samples, Stata Technical Bulletin, 13, 19–23.
12 If the number of clusters or groups (schools in the present case) is small relative to the overall sample size, the clustered standard errors could be somewhat larger than the OLS results.

22.4
One way to take the hierarchical nature of the data into account would be to introduce a dummy variable for each school and thus estimate a separate intercept for each of the 10 schools. But besides reducing the power of the tests13 resulting from the analysis, this does not make much practical sense. In HLM, we do not estimate separate intercepts for the various schools. Instead, we assume that these intercepts are randomly distributed around a (grand) mean value with a certain variance. More specifically, we assume

Mathij = B1j + uij    (22.2)

where Mathij = math score for student i in school j and B1j = intercept value of school j, i going from 1 to 260 and j going from 1 to 10. We further assume that

B1j = γ1 + vj    (22.3)

where vj is the error term. The coefficient γ1 represents the mean value of the math score across all students and across all schools. We can call it the grand mean. The individual school mean math score varies around this grand mean. We assume the error term vj has zero mean and constant variance.
Combining Eqs. (22.2) and (22.3), we obtain:

Mathij = γ1 + vj + uij = γ1 + wij    (22.4)

where

wij = vj + uij    (22.5)

That is, the composite error term wij is the sum of the school-specific error term vj (the Level 2 error term) and the student-specific error term uij (the Level 1 error term), or the regression error term. Assuming these errors are independently distributed, we obtain:

σ²w = σ²v + σ²u    (22.6)
That is, the total variance is the sum of the error variance due to the school effect (σ²v) and that due to the individual student (σ²u), the usual regression error variance.

If we take the ratio of the school-specific variance to the total variance, we obtain what is known as the intra-class correlation coefficient (ICC),14 which is denoted by rho (ρ):

ICC = ρ = σ²v / (σ²v + σ²u) = σ²v / σ²w    (22.7)
It gives the proportion of the total variation in math scores that is attributable to differences among schools (i.e. clusters). In general, the ICC is the proportion of the total variance that is between groups or clusters. A higher ICC means school differences account for a larger proportion of the total variance. To put it differently, a higher ICC means each additional member of a cluster provides less unique information. As a result, in cases of high ICC, a researcher would prefer to have more clusters with fewer members per cluster than many members in a small number of clusters.
13 The power of a test is the probability of rejecting the null hypothesis when it is false; it depends on the values of the parameters under the alternative hypotheses.
14 The ICC is very different from the usual Pearson's coefficient of correlation. The former is the correlation of observations within a cluster, whereas the latter is the correlation between two variables. For example, a Pearson correlation coefficient of 0.3 may be considered small, but an ICC of 0.3 is considered quite large.
Once we estimate (22.4), we can easily obtain the value of the ICC given in Eq. (22.7). For this purpose, we need to use statistical software specifically designed to estimate such models. Toward that end, we can use the xtmixed command in Stata 12.15 The xtmixed procedure in Stata fits linear mixed models to hierarchical data. Mixed models contain both fixed effects and random effects. The fixed effect is given by the coefficient γ1 and the random effect is given by wij.

The fixed effects are similar to the usual regression coefficients and are estimated directly. The random effects, by contrast, are not estimated directly but are obtained from their estimated variances and covariances. Random effects means random intercepts, or random slopes, or both, that take into account the clustered nature of the data. In estimating such models, the error term is usually assumed to be normally distributed.16 The estimated coefficients involve one or more iterations, which are usually obtained by the Newton–Raphson iterative procedure.

We first present the results of (22.4) (Table 22.4) and then comment on them.
Table 22.4 HLM regression results of model (22.4): random intercept but no regressor.

. xtmixed math || schid:, variance

Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = -937.38956
Iteration 1: log likelihood = -937.38956
Computing standard errors:

Mixed-effects ML regression          Number of obs      =  260
Group variable: schid                Number of groups   =  10
                                     Obs per group: min =  20
                                                    avg =  26.0
                                                    max =  67
Log likelihood = -937.38956          Wald chi2(0)       =  .

math       |   Coef.     Std. Err.     z     P>|z|    [95% Conf. Interval]
_cons      |  48.87206   1.835121    26.63   0.000    45.27529    52.46883

Random-effects Parameters   |  Estimate   Std. Err.   [95% Conf. Interval]
schid: Identity             |
  var(_cons)                |  30.54173   14.49877    12.04512    77.44192
  var(Residual)             |  72.23582   6.451525    60.63594    86.05481

LR test vs. linear regression: chibar2(01) = 115.35  Prob >= chibar2 = 0.0000

Note: The term Identity means that these are the random effects at the Level 2 variable, schid, in the present case.
Note: The Wald statistic is not reported because there are no regressors in the model.
15 Stata has an alternative procedure, called gllamm, that can also estimate mixed effects models.
16 If the assumptions of normality and large samples are not met, the ML estimates are unbiased, but their standard errors are biased downward. On this, see Van der Leeden, R., Busing, F., and Meijer, E. (1997) Applications of Bootstrap Methods to Two-level Models, paper presented at the Multilevel Conference, Amsterdam, April 1–2. On bootstrap methods, see Chapter 23.
Before we discuss these results, let us look at Stata's xtmixed command. The xtmixed command is followed by the regressand (math in our case), followed by two vertical bars (||), followed by the school ID (schid), the Level 2 variable, which is followed by options; here we use the option variance, which tells Stata that we need the variances of the two error terms. Without this option, Stata produces the standard deviations of the two error terms (i.e. the square roots of the variances).
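Schematically, the estimation commands used in the rest of this chapter all have the following structure (a sketch of the parts of the xtmixed syntax we use, not the full syntax; see the Stata manual):

. xtmixed depvar [indepvars] || level2var: [random_slope_vars], variance

The variables before || enter the fixed part; level2var identifies the clusters; any variables listed after the colon receive random slopes, with a random intercept included by default; and variance requests variances rather than standard deviations for the random part.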
The output is divided into three parts. The first part gives general statistics, such as the number of observations, the number of groups (schools in the present case), the value of the likelihood function, and the Wald statistic, which, like the R² in OLS, gives the overall fit of the model. In the present example, there are no regressors, so the value of the Wald statistic is not reported. But in general, the Wald statistic has degrees of freedom and follows the chi-square distribution.
The second part of the output gives information about the estimated coefficients, their standard errors, their z statistics (remember we are using the ML method with a normal distribution), and their p values. Here the estimated value of the common intercept γ1 is about 48.87, which is highly significant. The coefficient(s) reported in this part are fixed effects. But notice that this value is smaller than that obtained from the OLS regression with the robust or the clustered standard errors. The standard error of the fixed coefficient is also different from those obtained in Tables 22.2 and 22.3.
The third part of the table gives the estimates of the error variances, σ²v and σ²u, their standard errors, and the 95% confidence intervals. Both estimates are statistically significant.17 Notice that the estimated σ²u of about 72.23 is much smaller than the estimate of the error variance given in Table 22.2 or Table 22.3, which is 124.01 (= 11.136²). What this suggests is that much of the latter error variance is accounted for by the introduction of the random intercept into the model, that is, by explicitly considering the impact of the Level 2 variable.
Let us examine the output in Table 22.4 further. The error variances σ²v and σ²u are, respectively, 30.54 and 72.23, giving a total variance of 102.77. Using Eq. (22.7), we obtain an ICC value of about 0.30. This value suggests that about 30% of the total variation in math scores is attributable to the Level 2 variable (school ID). This means that the math scores of students in a school are not independent, which violates a critical assumption of the classical linear regression model. To put it differently, in analyzing math scores we should not neglect the nested nature of our data.
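The ICC arithmetic can be checked directly in Stata; a minimal sketch, plugging in the variance estimates reported in Table 22.4 (later versions of Stata also offer an estat icc postestimation command that does this automatically):

. display 30.54173/(30.54173 + 72.23582)   // about 0.297, the ICC of Eq. (22.7)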
This is further confirmed by the result of the likelihood ratio (LR) test,18 whose value is given at the end of the output in Table 22.4. The LR test compares the random coefficient model with the fixed coefficient OLS regression. Since this test is significant, we can conclude that the random coefficient model is preferable to the fixed-coefficient OLS model.

Some software packages produce a statistic called the deviance, which is a measure for judging the extent to which a model explains a set of data when parameter estimation is carried out by the method of maximum likelihood (ML). It is computed as follows:

Deviance = −2 lf    (22.8)

where lf is the log-likelihood function value. The smaller the value of the deviance, the better the model. For the naïve OLS model, the deviance value is −2(−995.06) = 1990.12. For the naïve HLM model, the deviance is −2(−937.38) = 1874.76. Therefore, on the basis of deviance, the naïve HLM model is preferable to the naïve OLS model. More formally, if we use HLM instead of OLS, the deviance is reduced by about 115.36 (1990.12 − 1874.76), which is simply the value of the LR statistic given at the end of Table 22.4. And this LR value is highly statistically significant, for its p value is practically zero.19
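After any ML estimation in Stata, the log likelihood is stored in e(ll), so the deviance can be computed without retyping the number; a minimal sketch:

. quietly xtmixed math || schid:, variance
. display -2*e(ll)            // deviance of the naive HLM model, about 1874.78
. display 1990.12 - 1874.76   // reduction in deviance, the LR statistic (about 115.36)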
Neglecting the ICC can have serious consequences for Type I errors, for the nominal and actual levels of significance can differ substantially. Assuming a nominal level of significance of 5%, a sample size of 50, and an ICC of 0.20, the actual level of significance is about 59%. And assuming a nominal level of significance of 5%, a sample size of 50, and an ICC of 0.01, the actual level of significance is 11%.20

What all this suggests is that one should not neglect the ICC in analyzing multilevel data.
22.5

Let us now introduce a regressor, the hours of homework, and estimate the pooled OLS regression

Mathi = B1 + B2 Homeworki + ui    (22.9)

Again, note that we are pooling the 260 observations to estimate this regression, without worrying about the Level 2 variable. The results are shown in Table 22.5. We also present the results of Eq. (22.9) with clustered standard errors (Table 22.6).
As you would expect, there is a positive and statistically significant relationship between the grade on the math test and the number of hours of homework. But notice that the clustered standard errors are substantially higher than the OLS standard errors. Thus, it is important to take the structured nature of the data explicitly into account in the analysis.

Equation (22.9) is a fixed coefficient model, for it assumes that the regression coefficients are the same across all schools. This assumption may be as unrealistic as the assumption that the intercept remains the same across schools. Shortly, we will relax these assumptions with HLM modeling.
19 Kreft and De Leeuw suggest that one model is a significant improvement over another if the difference between their deviances is at least twice as large as the difference in the number of estimated parameters. See Kreft and De Leeuw, op cit., p. 65.
20 For further details, see Barcikowski, R. S. (1981) Statistical power with group mean as a unit of analysis, Journal of Educational Statistics, 6(3), 267–85.
Table 22.5 OLS regression of math scores on hours of homework with robust standard errors.

. regress math homework, robust

Linear regression                    Number of obs  =  260
                                     F(1, 258)      =  88.65
                                     Prob > F       =  0.0000
                                     R-squared      =  0.2470
                                     Root MSE       =  9.6815

           |            Robust
math       |   Coef.     Std. Err.     t     P>|t|    [95% Conf. Interval]
homework   |  3.571856   .379369      9.42   0.000    2.824802    4.31891
_cons      |  44.07386   .9370938    47.03   0.000    42.22854    45.91918

Note: Root MSE is the standard error of the regression, that is, the square root of the error variance; the latter is therefore about 93.73. For this model, the log-likelihood value is -958.1770.
Table 22.6 OLS regression of math scores on hours of homework with clustered standard errors.

. regress math homework, cluster(schid)

Linear regression                    Number of obs  =  260
                                     F(1, 9)        =  21.54
                                     Prob > F       =  0.0012
                                     R-squared      =  0.2470
                                     Root MSE       =  9.6815

math       |   Coef.     Std. Err.     t     P>|t|    [95% Conf. Interval]
homework   |  3.571856   .7695854     4.64   0.001    1.830933    5.312779
_cons      |  44.07386   2.23222     19.74   0.000    39.02423    49.12349
22.6

Let us now take the clustered nature of the data into account and allow the intercept to differ across schools while keeping the slope coefficient fixed:

Mathij = B1j + B2 Homeworkij + uij    (22.10)

In this model, the intercept is random, but the slope coefficient is fixed (there is no subscript j on B2). Now, instead of estimating a separate intercept for each school, we postulate that the random intercept in Eq. (22.10) varies with the school ID, the Level 2 variable, as follows:

B1j = γ1 + γ2 schidj + vj    (22.11)

where schid = school ID. Notice how the original intercept parameter, B1j, now becomes the dependent variable in regression (22.11). This is because we are now treating B1j as a random variable.

What Eq. (22.11) states is that the random intercept is equal to the average intercept for all schools (= γ1) and that it moves systematically with the school ID; each school may have special characteristics.
Combining Eqs. (22.10) and (22.11), we obtain:

Mathij = γ1 + γ2 schidj + vj + B2 Homeworkij + uij
       = γ1 + γ2 schidj + B2 Homeworkij + (vj + uij)    (22.12)
       = γ1 + γ2 schidj + B2 Homeworkij + wij

where wij = vj + uij; that is, the composite error term wij is the sum of the school-specific error term and the regression error term, which are assumed to be independent of each other. In this model the original intercept B1j is not explicitly present, but it can be retrieved after estimation, as shown below.

For this model the total, school-specific, and student-specific variances are the same as in Eq. (22.6). The estimated values of these variances will enable us to estimate the ICC.
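To retrieve the school-specific intercepts, one can combine the estimated fixed intercept with the predicted random effects; a minimal sketch using standard xtmixed postestimation commands (the variable names vhat and b1j are our own):

. quietly xtmixed math homework || schid:, variance
. predict vhat, reffects            // BLUPs of the school-level random intercepts
. generate b1j = _b[_cons] + vhat   // implied intercept for each school

This sketch uses the simpler specification of Table 22.7, which omits schid from the fixed part; if schid is included, its contribution must be added in as well.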
Using the xtmixed command of Stata 12, the regression results of model (22.12) are as shown in Table 22.7 (compare this output with that given in Table 22.6).
Table 22.7 Results of regression (22.12): random intercept, constant slope.

. xtmixed math homework || schid:, variance

Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = -921.32881
Iteration 1: log likelihood = -921.32881
Computing standard errors:

Mixed-effects ML regression          Number of obs      =  260
Group variable: schid                Number of groups   =  10
                                     Obs per group: min =  20
                                                    avg =  26.0
                                                    max =  67
Log likelihood = -921.32881          Wald chi2(1)       =  34.37
                                     Prob > chi2        =  0.0000

math       |   Coef.     Std. Err.     z     P>|z|    [95% Conf. Interval]
homework   |  2.214345   .3777094     5.86   0.000    1.474048    2.954642
_cons      |  44.97838   1.724798    26.08   0.000    41.59784    48.35892

Random-effects Parameters   |  Estimate   Std. Err.   [95% Conf. Interval]
schid: Identity             |
  var(_cons)                |  22.50327   10.99337    8.638015    58.62426
  var(Residual)             |  64.2578    5.741049    53.93568    76.55536

LR test vs. linear regression: chibar2(01) = 73.70  Prob >= chibar2 = 0.0000
Interpretation of results

As in Table 22.4, the first part of the table gives summary measures, such as the number of observations in the sample, the number of macro units or groups (10 in the present case), the log-likelihood function, and the Wald (chi-square) statistic as a measure of the overall fit of the model. In the present case, the Wald statistic is highly significant, suggesting that the model we have used gives a good fit. The log-likelihood value is not particularly useful in the present case; it is useful if we are comparing two models.

The middle part of the table gives output that is quite similar to the usual OLS output, namely, the regression coefficients, their standard errors, their z (standard normal) values, and the 95% confidence intervals for the estimated coefficients. As you can see, the estimated coefficients are highly statistically significant.

The next part of the table is the special feature of HLM modeling. It gives the variance of the random intercept term (= 22.50) and the error variance (= 64.25), from which we can compute the total variance (σ²w = 86.75 = 22.50 + 64.25). From these numbers, we obtain:

ICC = 22.50/86.75 ≈ 0.26    (22.13)

That is, about 26% of the total variance in math scores is accounted for by differences among schools. This result, therefore, casts doubt on the OLS results given in Table 22.6. The LR test given at the end of Table 22.7 shows that the random intercept/constant slope model, which explicitly takes the Level 2 variable, school ID, into account, is preferable to the OLS model.21
22.7

We now let both the intercept and the slope coefficient vary across schools:

Mathij = B1j + B2j Homeworkij + uij    (22.14)

In this model, both the intercept and slope coefficients are random, as they carry the j (Level 2) subscript. We assume that the random intercept evolves as per (22.11) and the random slope evolves as follows:

B2j = λ1 + λ2 schidj + ωj    (22.15)

Substituting (22.11) and (22.15) into (22.14), we obtain:

Mathij = γ1 + γ2 schidj + λ1 Homeworkij + λ2 (schidj × Homeworkij) + (vj + ωj Homeworkij + uij)    (22.16)
21 The deviance for the OLS model is 1916.35 and that for the random intercept model is 1842.65, a reduction of about 73.7, which is precisely the LR value in Table 22.7; this LR value is highly significant.
In Eq. (22.16), the first four terms on the right-hand side are fixed and the terms in the parentheses are random. Model (22.16) is known as a mixed-effects model. The random effects include the usual regression error term, uij; the error term vj associated with the random intercept, which represents variability among schools; and ωj, representing variability in the slope coefficients across schools.

A noteworthy feature of (22.16) is that it includes an interaction term between schid and Homework, which brings together variables measured at different levels in hierarchically structured data; in the output this cross-product variable is called cp (see the sketch that follows). The results are given in Table 22.8.
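The output in Table 22.8 uses the regressor cp for the cross-product term; a minimal sketch of how such a variable can be constructed, assuming, as Eq. (22.16) implies, that it is simply the product of schid and homework:

. generate cp = schid*homework
. xtmixed math homework cp || schid: homework, variance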
Table 22.8 Results of model (22.14): random intercept, random slope, with interaction.

. xtmixed math homework cp || schid: homework, variance

Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = -888.11122
Iteration 1: log likelihood = -888.11122
Computing standard errors:

Mixed-effects ML regression          Number of obs      =  260
Group variable: schid                Number of groups   =  10
                                     Obs per group: min =  20
                                                    avg =  26.0
                                                    max =  67
Log likelihood = -888.11122          Wald chi2(2)       =  5.88
                                     Prob > chi2        =  0.0530

math       |   Coef.      Std. Err.     z     P>|z|    [95% Conf. Interval]
homework   |  -.9656993   2.109296    -0.46   0.647    -5.099843   3.168445
cp         |   .0000805   .000046      1.75   0.080    -9.68e-06   .0001707
_cons      |   44.82843   2.487166    18.02   0.000     39.95368   49.70319

Random-effects Parameters   |  Estimate   Std. Err.   [95% Conf. Interval]
schid: Independent          |
  var(homework)             |  13.16657   6.666627    4.880718    35.51905
  var(_cons)                |  55.87579   27.26257    21.47389    145.3907
  var(Residual)             |  43.29007   3.971024    36.16655    51.81667

LR test vs. linear regression: chi2(2) = 103.91
First, notice that in the xtmixed command, after the || sign, we use schid, the Level 2 variable, followed by the regressor homework. If we omit this regressor, we will be back to the random intercept but fixed slope model. Since in this model homework is the only regressor, we are assuming that the slope coefficient of this variable varies from school to school.

The results given in this table seem perplexing. The coefficient of homework is negative, although it is statistically insignificant. The coefficient of the cross-product term is positive and is significant at about the 8% level. This suggests that homework combined with school has a positive impact on the test score: better schools and more homework have a positive effect on test scores.

Looking at the random-effects parameters in the third part of the table, the variance of the random slope coefficient is significant at the 5% level, suggesting that the slope coefficient is indeed random.
22.8

1. HLM with random intercept but fixed slope coefficients of the two regressors

In this model (Table 22.9), both slope coefficients are statistically significant and have the correct signs: the math score is positively related to the hours of homework and negatively related to the ratio variable (the higher the student/teacher ratio, the lower the math performance, ceteris paribus). Also note that the variances of both the (random) intercept and the regression error term are statistically significant. From the LR test given in this table, we can say that this model is superior to the OLS model.

3. HLM with random intercept and variable ratio coefficient but fixed homework coefficient

In this model (Table 22.11), both slope coefficients are individually statistically highly significant and have the correct signs, but the cross-product between schid and ratio (called cpr; see the sketch below) is not. It may be that the student/teacher ratio does not vary much from school to school.
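As with cp, the cross-product regressor used in Table 22.11 can be constructed directly; a minimal sketch, assuming cpr is simply the product of schid and ratio:

. generate cpr = schid*ratio
. xtmixed math homework ratio cpr || schid: ratio, variance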
Table 22.9 HLM with random intercept but fixed slope coefficients.

. xtmixed math homework ratio || schid:, variance

Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = -918.19803
Iteration 1: log likelihood = -918.19803
Computing standard errors:

Mixed-effects ML regression          Number of obs      =  260
Group variable: schid                Number of groups   =  10
                                     Obs per group: min =  20
                                                    avg =  26.0
                                                    max =  67
Log likelihood = -918.19803          Wald chi2(2)       =  46.83
                                     Prob > chi2        =  0.0000

math       |   Coef.      Std. Err.     z     P>|z|    [95% Conf. Interval]
homework   |   2.218977   .3740691     5.93   0.000     1.485815    2.952139
ratio      |  -.9342158   .3106066    -3.01   0.003    -1.542993   -.3254381
_cons      |   59.46378   5.009873    11.87   0.000     49.6446     69.28295

Random-effects Parameters   |  Estimate   Std. Err.   [95% Conf. Interval]
schid: Identity             |
  var(_cons)                |  10.58601   5.875833    3.566705    31.41935
  var(Residual)             |  64.28677   5.746139    53.95588    76.59572

LR test vs. linear regression: chibar2(01) = 23.35
22 One can also use information criteria, such as the Akaike or Schwarz criteria, which we discussed earlier in the text, to choose among the four models.
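In Stata, the information criteria are available as a postestimation step; a minimal sketch (estat ic is a standard postestimation command after ML estimation; repeat it after each candidate model):

. quietly xtmixed math homework ratio || schid:, variance
. estat ic    // reports the log likelihood, AIC, and BIC

Comparing the AIC/BIC values across the four models provides an alternative to the deviance comparison described above.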
Table 22.10 HLM with random intercept, one random coefficient (homework), and one fixed coefficient (ratio).

. xtmixed math homework ratio cp || schid: homework, variance

Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = -886.4015
Iteration 1: log likelihood = -886.4015
Computing standard errors:

Mixed-effects ML regression          Number of obs      =  260
Group variable: schid                Number of groups   =  10
                                     Obs per group: min =  20
                                                    avg =  26.0
                                                    max =  67
Log likelihood = -886.4015           Wald chi2(3)       =  10.01
                                     Prob > chi2        =  0.0184

math       |   Coef.      Std. Err.     z     P>|z|    [95% Conf. Interval]
homework   |  -.9382923   2.082309    -0.45   0.652    -5.019544   3.142959
ratio      |  -1.137399   .5558153    -2.05   0.041    -2.226777   -.0480207
cp         |   .0000798   .0000454     1.76   0.079    -9.18e-06   .0001688
_cons      |   62.45449   8.840323     7.06   0.000     45.12778   79.78121

Random-effects Parameters   |  Estimate   Std. Err.   [95% Conf. Interval]
schid: Independent          |
  var(homework)             |  12.82021   6.535745    4.720113    34.82073
  var(_cons)                |  37.15618   19.35477    13.38559    103.1394
  var(Residual)             |  43.37125   3.987358    36.21981    51.93469

LR test vs. linear regression: chi2(2) = 67.41
22.9
Table 22.11 HLM with random intercept, one random coefficient (ratio), and one fixed coefficient (homework).

. xtmixed math homework ratio cpr || schid: ratio, variance

Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = -917.32341
Iteration 1: log likelihood = -917.17633
Iteration 2: log likelihood = -917.13629
Iteration 3: log likelihood = -917.13527
Iteration 4: log likelihood = -917.13527
Computing standard errors:

Mixed-effects ML regression          Number of obs      =  260
Group variable: schid                Number of groups   =  10
                                     Obs per group: min =  20
                                                    avg =  26.0
                                                    max =  67
Log likelihood = -917.13527          Wald chi2(3)       =  53.58
                                     Prob > chi2        =  0.0000

math       |   Coef.      Std. Err.     z     P>|z|    [95% Conf. Interval]
homework   |   2.222804   .3721716     5.97   0.000     1.493362    2.952246
ratio      |  -1.100971   .296897     -3.71   0.000    -1.682878   -.5190637
cpr        |   3.67e-06   2.38e-06     1.54   0.123    -9.97e-07    8.34e-06
_cons      |   59.94752   4.516209    13.27   0.000     51.09591    68.79912

Random-effects Parameters   |  Estimate   Std. Err.   [95% Conf. Interval]
schid: Independent          |
  var(ratio)                |  4.56e-15   7.54e-14    3.73e-29    .5567737
  var(_cons)                |  8.037677   4.752642    2.522429    25.61192
  var(Residual)             |  64.28484   5.745975    53.95424    76.59343

LR test vs. linear regression: chi2(2) = 15.77
24 For a technical discussion of various types of standard errors, see Angrist, J. D. and Pischke, J.-S., Mostly Harmless Econometrics: An Empiricist's Companion, Chapter 8, Princeton University Press, Princeton, New Jersey, 2009.
Table 22.12 HLM with random intercept, random slopes, and interaction terms.

. xtmixed math homework ratio cp cpr || schid: homework ratio, variance

Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = -886.00286
Iteration 1: log likelihood = -885.92193
Iteration 2: log likelihood = -885.88619
Iteration 3: log likelihood = -885.88589
Iteration 4: log likelihood = -885.88589
Computing standard errors:

Mixed-effects ML regression          Number of obs      =  260
Group variable: schid                Number of groups   =  10
                                     Obs per group: min =  20
                                                    avg =  26.0
                                                    max =  67
Log likelihood = -885.88589          Wald chi2(4)       =  11.48
                                     Prob > chi2        =  0.0216

math       |   Coef.      Std. Err.     z     P>|z|    [95% Conf. Interval]
homework   |  -1.108778   2.08595     -0.53   0.595    -5.197165    2.979608
ratio      |  -.9298151   .5686153    -1.64   0.102    -2.044281    .1846504
cp         |   .0000842   .0000456     1.85   0.064    -5.05e-06    .0001735
cpr        |  -4.71e-06   4.54e-06    -1.04   0.300    -.0000136    4.20e-06
_cons      |   61.96072   8.458131     7.33   0.000     45.38309    78.53835

Random-effects Parameters   |  Estimate   Std. Err.   [95% Conf. Interval]
schid: Independent          |
  var(homework)             |  12.77393   6.508705    4.705563    34.67668
  var(ratio)                |  1.87e-12   .           .           .
  var(_cons)                |  33.45384   17.46507    12.0244     93.074
  var(Residual)             |  43.35486   3.983863    36.20939    51.91041

LR test vs. linear regression: chi2(3) = 68.21  Prob > chi2 = 0.0000
Note: LR test is conservative and is provided only for reference.
25 But remember that misspecification of the error term is bad for all estimation methods.
26 Steenbergen, M. R. and Jones, B. S. (2002) Modeling multilevel data structures, American Journal of Political Science, 46(1), 218–37.
27 Hox, op cit., pp. 147–54.
28 See Mok, M., Sample Size Requirements for 2-level Designs in Educational Research, Multilevel Models Project, University of London, 2005.
29 See Kreft, I. G. G., Are Multilevel Techniques Necessary? An Overview, Including Simulation Studies, California State University, Los Angeles, 1996.
The interested reader may consult the Stata manuals (or the SAS or SPSS manuals) for further details.
Exercises

22.1 In this chapter we discussed HLM modeling of math test data for 260 students in 10 randomly selected schools. Table 22.1 (on the companion website)30 gives data on 519 students in 23 schools: 8 schools are in the private sector and 15 schools are in the public sector. The student-level (Level 1) and school-level (Level 2) variables are the same as in the sample discussed in the text.
Explore these data by developing HLM model(s), considering the relevant explanatory variables and taking into account various cross-level interaction effects, and compare your analysis with a standard OLS regression using clustered standard errors.

22.2 There are many interesting data sets given in Sophia Rabe-Hesketh and Anders Skrondal's Multilevel and Longitudinal Modeling Using Stata, Vol. 1 (continuous response models) and Vol. 2 (categorical responses, counts, and survival), 3rd edn, published by Stata Press. All the data in these volumes can be downloaded from the following website:
http://www.stata-press.com/data/mlmus3.html
Choose the data of your interest and try to model it using HLM, considering various aspects of HLM modeling.
30 The data were adapted from Kreft, I. and De Leeuw, J., Introducing Multilevel Modeling, Sage Publications, California, 2007.