Applied Multilevel Analysis-Section B
Course: Stat-5123
Section-B (Binary)
Use a multilevel model whenever your data are grouped (or nested) in more than one
category (for example, states, countries, etc.).
Multilevel models (mixed-effects, random-effects, or hierarchical linear models) are
now a standard generalization of conventional regression models for analyzing
clustered and longitudinal data in the social, psychological, behavioral, and medical
sciences. Examples include students within schools, respondents within
neighborhoods, patients within hospitals, repeated measures within subjects, and panel
survey waves on households. Multilevel models have been further generalized to
handle a wide range of response types, including continuous, categorical (binary or
dichotomous, ordinal, and nominal or discrete choice), count, and survival responses.
Some advantages:
• Ordinary (single-level) regression pools all observations and ignores the variation between entities (groups).
• Fitting a separate regression for each entity may suffer from small samples and lacks generalizability.
A traditional way to handle non-normal outcomes such as proportions is to transform the data
before the analysis. To distinguish this approach from generalized linear modeling, where the
transformation is part of the statistical model, it is often referred to as an empirical
transformation. General guidelines have been suggested for choosing a transformation that
often works well in a given situation. For instance, for a proportion p some recommended
transformations are: the arcsine transformation f(p) = 2 arcsin(√p); the logit transformation
f(p) = logit(p) = ln(p/(1 − p)), where 'ln' is the natural logarithm; and the probit or inverse
normal transformation f(p) = Φ⁻¹(p), where Φ⁻¹ is the inverse of the standard normal
distribution function. Thus, for proportions, we can use the logit transformation and apply
standard regression procedures to the transformed variable:

logit(p) = ln(p / (1 − p)) = β0 + β1x + e
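As a small illustration (the proportions below are made-up values strictly between 0 and 1, where all three transformations are defined):

```python
import numpy as np
from scipy.stats import norm

# Made-up observed proportions, strictly between 0 and 1
p = np.array([0.10, 0.25, 0.50, 0.80])

arcsine = 2 * np.arcsin(np.sqrt(p))  # f(p) = 2 arcsin(sqrt(p))
logit = np.log(p / (1 - p))          # f(p) = ln(p / (1 - p))
probit = norm.ppf(p)                 # f(p) = inverse standard normal at p

# The transformed values could then serve as the outcome in an ordinary
# regression analysis (the "empirical transformation" approach).
print(arcsine, logit, probit)
```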
Empirical transformations have the disadvantage that they are ad hoc, and they may
encounter problems in specific situations. For instance, if we model dichotomous data,
which are simply the observed proportions in a sample of size 1, both the logit and the
probit transformation break down, because they are undefined for proportions of exactly 0 or 1.
The modern approach to the problem of non-normally distributed variables is to include the
necessary transformation and the choice of the appropriate error distribution (not necessarily
a normal distribution) explicitly in the statistical model. Models of this kind are called
generalized linear models. A generalized linear model consists of three components:
I. an outcome variable y with a specific error distribution that has mean µ and variance σ²,
II. a linear additive regression equation that produces an unobserved (latent) predictor η of the
outcome variable y,
III. a link function that links the expected values of the outcome variable y to the predicted
values for η: η = f(µ).
Question: What is a commonly used generalized linear model for dichotomous data?
Answer: A commonly used generalized linear model for dichotomous data is the logistic
regression model, specified by a binomial (Bernoulli) error distribution for the outcome y,
a linear additive regression equation for the linear predictor η, and a logit link function:

logit(π) = ln(π / (1 − π)) = η
Table: Some commonly used canonical link functions and the corresponding error
distributions

Link function   Equation                Error distribution
Identity        η = µ                   Normal
Logit           η = ln(µ / (1 − µ))     Binomial
Log             η = ln(µ)               Poisson
Inverse         η = 1/µ                 Gamma
In multilevel generalized linear models, the multilevel structure appears in the linear
regression equation of the generalized linear model. Thus, a two-level model for proportions
is written as follows:
y_ij | π_ij ~ Binomial(n_ij, π_ij)
logit(π_ij) = γ00 + γ10 x_ij + γ01 z_j + γ11 z_j x_ij + u1j x_ij + u0j
These equations state that our outcome variable is a proportion π_ij, that we use a logit link
function, and that, conditional on the predictor variables, we assume that y_ij has a binomial
error distribution, with expected value π_ij and number of trials n_ij. If there is only one trial
(all n_ij are equal to one), the only possible outcomes are 0 and 1, and we are modeling
dichotomous data. This special case of the binomial distribution is called the Bernoulli
distribution. Note that the usual lowest-level residual e_ij is not in the model
equation, because it is part of the specification of the error distribution. If the error
distribution is binomial, the variance is a function of the number of trials n_ij and the
population proportion π_ij, namely σ² = n_ij × π_ij × (1 − π_ij), and it does not have to be
estimated separately.
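To make the data-generating process concrete, here is a minimal simulation sketch of a two-level Bernoulli model with a random intercept and one level-1 predictor (all parameter values and sample sizes are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

J, n_j = 100, 30                   # communities and persons per community (made up)
gamma00, gamma10 = -1.0, 0.5       # fixed effects (made-up values)
sigma_u0 = np.sqrt(0.5)            # level-2 standard deviation (made up)

u0 = rng.normal(0.0, sigma_u0, size=J)     # community-level random intercepts
x = rng.normal(size=(J, n_j))              # level-1 predictor
eta = gamma00 + gamma10 * x + u0[:, None]  # linear predictor on the logit scale
pi = 1.0 / (1.0 + np.exp(-eta))            # inverse logit gives probabilities
y = rng.binomial(1, pi)                    # Bernoulli outcomes (n_ij = 1 trial each)

# No separate level-1 residual is drawn: the Bernoulli variance
# pi * (1 - pi) is implied by the error distribution itself.
print(y.mean())
```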
The numerical integration approach does not use an approximate likelihood, but uses
numerical integration of the exact likelihood function. Numerical integration maximizes the
correct likelihood. The estimation methods involve the numerical integration of a complex
likelihood function, which becomes more complicated as the number of random effects
increases. The actual calculations involve quadrature points, and the numerical approximation
becomes better when the number of quadrature points in the numerical integration is
increased. Unfortunately, increasing the number of quadrature points also increases the
computing time, sometimes dramatically. When full maximum likelihood estimation with
numerical integration is used, the test procedures and goodness of fit indices based on the
deviance are appropriate. Simulation research suggests that when both approaches are
feasible, the numerical integration method achieves more precise estimates.
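As a minimal sketch of what the quadrature points do (the cluster data, fixed-part predictor values, and sigma_u below are made up), the marginal likelihood contribution of one cluster in a random-intercept logistic model can be approximated by Gauss-Hermite quadrature; increasing the number of points improves the approximation:

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def cluster_marginal_likelihood(y, eta_fixed, sigma_u, n_points=15):
    """Integrate prod_i Bernoulli(y_i | inv_logit(eta_fixed_i + u)) over
    the random intercept u ~ N(0, sigma_u^2) by Gauss-Hermite quadrature."""
    nodes, weights = hermgauss(n_points)   # quadrature points and weights
    u = np.sqrt(2.0) * sigma_u * nodes     # rescale nodes to N(0, sigma_u^2)
    lik = np.empty(n_points)
    for k in range(n_points):
        p = 1.0 / (1.0 + np.exp(-(eta_fixed + u[k])))
        lik[k] = np.prod(p**y * (1 - p)**(1 - y))  # conditional likelihood at node k
    return np.sum(weights * lik) / np.sqrt(np.pi)

# Made-up data for one cluster; more quadrature points -> better approximation
y = np.array([1, 0, 1, 1, 0])
eta_fixed = np.array([-0.5, -0.5, 0.2, 0.2, 0.2])
for n in (3, 7, 15, 31):
    print(n, cluster_marginal_likelihood(y, eta_fixed, sigma_u=0.7, n_points=n))
```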
Question: If numerical integration is more precise, why are the approximate quasi-likelihood
methods still used?
Answer: The reason is that complex models or small data sets may pose convergence
problems, and we may be forced to use first-order MQL. Goldstein and Rasbash (1996)
suggest using bootstrap methods to improve the quasi-likelihood estimates, and Browne
(1998) explores bootstrap and Bayesian methods.
In the latent-variable interpretation, the dichotomous outcome is assumed to reflect an
underlying continuous variable whose lowest-level residuals follow a standard logistic
distribution (with the scale factor fixed to one), which has a mean of zero and a variance of
π²/3 ≈ 3.29. The assumption of an underlying latent variable is convenient for interpretation,
but not crucial. An important issue in these models is that the underlying scale is standardized
to the same standard distribution in each of the analyzed models.
If we start with an intercept-only model, and then estimate a second model where we add a
number of explanatory variables that explain part of the variance, we normally expect that the
estimated variance components become smaller. However, in logistic and probit regression
(and in many other generalized linear models), the underlying latent variable is rescaled, so
the lowest-level residual variance is again 3.29 (or unity in probit regression). Consequently,
the values of the regression coefficients and higher-level variances are also rescaled. The
phenomenon of the change in scale is not specific to multilevel generalized linear modeling,
it also occurs in single-level logistic and probit models. For the single-level logistic model,
several pseudo-R² formulas have been proposed to provide an indication of the explained
variance. These are all based on the log-likelihood. They can be applied in multilevel logistic
and probit regression, provided that a good estimate of the log-likelihood is available.
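One widely used example is McFadden's pseudo-R², computed from the log-likelihood of the fitted model and of the intercept-only (null) model:

R²_McFadden = 1 − ln L(fitted) / ln L(null)

It is zero when the predictors explain nothing and increases toward one as the model fit improves.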
The proportion of variation in the observed responses that is due to each level in the model
hierarchy is given by the variance partition coefficient (VPC). Thus, the relative importance
of the community, compared with the individual women, as a source of variation in MHCS
non-utilization is

VPC = σ²_u / (σ²_u + π²/3)

where σ²_u is the community-level variance and π²/3 ≈ 3.29 is the level-1 variance of the
standard logistic distribution.
In a conditional model, VPCs are based on the residuals rather than on the originally
observed responses: they give the proportion of the variation in the outcome that remains
unexplained by the independent variables at each level of the model hierarchy.
The correlation (i.e., homogeneity or similarity) of the observed responses within a given
class or community is termed the intra-class correlation (ICC).
In conditional models, ICCs are likewise based on the residuals rather than on the observed
responses. They therefore measure the similarity of responses after adjusting for the
independent variables; that is, the homogeneity of the unexplained variation. ICCs computed
from conditional models are for this reason sometimes referred to as adjusted ICCs. For a
two-level model, the VPC and the ICC are the same.
The purpose of MLM is to partition variance in the outcome between the different groupings
in the data.
Definition

Intraclass correlation = σ²_u / (σ²_u + σ²_e)

where σ²_u is the level-2 (between-group) variance and σ²_e is the level-1 (within-group)
variance; in a logistic model, σ²_e is fixed at π²/3 ≈ 3.29.
Logistic regression
In this table, the estimated variance (SE) of the random part is 1.02 (0.12), and the ICC for
the null model (Model 0) is 1.02/(1.02 + 3.29) = 0.237. This suggests that about 24% of the
heterogeneity in ANC utilization in Bangladesh lies between communities (clearly greater
than 0), while the remaining 76% is due to differences within communities; that is, there is
real variation at the second (community) level. For this reason we can apply a multilevel
regression model instead of a simple regression model.
From the null model (Model 0), the likelihood ratio (LR) test against traditional single-level
logistic regression gives a statistic of 313.0 with a p-value that is effectively zero (p < 0.001).
These data are therefore fitted significantly better by the two-level model than by the
traditional single-level model.
The next model (Model 1) contains only the lower-level (individual-level) variables. The
estimated variance (SE) for Model 1 is 0.42 (0.08).
So Model 1 explains [(1.02 − 0.42)/1.02] × 100 = 58.82% of the community-level variance
relative to the null model (Model 0). The ICC/VPC = 0.42/(0.42 + 3.29) = 0.113, which is
0.237 − 0.113 = 0.124, i.e., 12.4 percentage points lower than in the null model (Model 0).
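The arithmetic behind these figures can be reproduced directly; a short sketch using the reported variance estimates (1.02 for Model 0 and 0.42 for Model 1) and the standard logistic level-1 variance π²/3:

```python
import math

level1 = math.pi**2 / 3          # approx. 3.29: level-1 variance of the standard logistic
var_null, var_m1 = 1.02, 0.42    # reported level-2 (community) variance estimates

icc_null = var_null / (var_null + level1)  # 0.237 for the null model
icc_m1 = var_m1 / (var_m1 + level1)        # 0.113 for Model 1
pcv = (var_null - var_m1) / var_null       # 0.588: proportional change in variance

print(round(icc_null, 3), round(icc_m1, 3), round(100 * pcv, 2))
```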
………..
The full model contains the significant variables of both levels (Level-1 and Level-2).
Question: Explain the basics of the multilevel generalized linear model for count data.
Frequently the outcome variable of interest is a count of events. In most cases count data do
not have a nice normal distribution. A count cannot be lower than zero, so count data always
have a lower bound at zero. When the outcome is a count of events that occur frequently,
these problems can be addressed by taking the square root or, in more extreme cases, the
logarithm. However, such nonlinear transformations change the interpretation of the
underlying scale, so it is often preferable to analyze the counts directly using a generalized
linear model.
When the counted events are relatively rare they are often analyzed using a Poisson model.
Examples of such events are frequency of depressive symptoms in a normal population,
traffic accidents on specific road stretches, or conflicts in stable relationships. Counts that are
more frequent, or more variable than the Poisson distribution allows (over-dispersed), are
often analyzed using a negative binomial model.
Example: counting how many days a pupil has missed school is probably not a Poisson
variate, because one may miss school because of an illness, and if this lasts several days these
counts are not independent. The number of typing errors on randomly chosen book pages is
probably a Poisson variate.
The Poisson model for count data is a generalized linear regression model that consists of
three components:
I. an outcome variable y with a specific error distribution that has mean µ and variance σ²,
II. a linear additive regression equation that produces a predictor η of the outcome
variable y,
III. a link function that links the expected values of the outcome variable y to the
predicted values for η: η = f(µ).
For counts, the outcome variable is often assumed to follow a Poisson distribution with event
rate λ. The model can be further extended by including a varying exposure m. For
instance, if book pages have different numbers of words, the distribution of typing errors
would be Poisson with the number of words on a page as the exposure.
The multilevel Poisson regression model for a count y_ij for person i in group j can be
written as:

y_ij | λ_ij ~ Poisson(m_ij λ_ij)

where m_ij is the exposure (equal to one when all units have the same exposure). The
standard link function for the Poisson distribution is the logarithm, and

η_ij = ln(λ_ij)

The first-level and second-level models are constructed as usual, so

η_ij = β0j + β1j x_ij

and

β0j = γ00 + γ01 z_j + u0j
β1j = γ10 + γ11 z_j + u1j

giving

η_ij = γ00 + γ10 x_ij + γ01 z_j + γ11 z_j x_ij + u1j x_ij + u0j
Since the Poisson distribution has only one parameter, its variance is equal to the mean.
Estimating an expected count implies a specific variance. Therefore, just as in logistic
regression, the first-level equations do not have a lowest-level error term.
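To make this concrete, a minimal simulation sketch of a two-level Poisson model with a random intercept and a varying exposure (all parameter values and sample sizes below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)

J, n_j = 50, 20                            # groups and units per group (made up)
gamma00, gamma10 = -2.0, 0.3               # fixed effects (made-up values)
u0 = rng.normal(0.0, 0.4, size=J)          # group-level random intercepts

x = rng.normal(size=(J, n_j))              # level-1 predictor
m = rng.integers(200, 500, size=(J, n_j))  # varying exposure, e.g. words per page

eta = gamma00 + gamma10 * x + u0[:, None]  # linear predictor on the log scale
lam = np.exp(eta)                          # event rate per unit of exposure
y = rng.poisson(m * lam)                   # counts; no level-1 error term is drawn,
                                           # because the Poisson variance equals the mean
print(y.mean(), y.var())
```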
In the Poisson model, the variance of the outcome is equal to the mean. When the observed
variance is much larger than expected under the Poisson model, we have over-dispersion.
One way to model over-dispersion is to add an explicit error term to the model. For the
Poisson model we have the link function η = ln(λ), and the inverse is λ = exp(η),
where η is the outcome predicted by the linear regression model. The negative binomial
model adds an explicit error term ε to the model, as follows:

λ = exp(η + ε) = exp(η) exp(ε)

The error term ε in the model increases the variance compared to the variance implied by the
Poisson model. This is similar to adding a dispersion parameter in a Poisson model.
Given that the negative binomial model is a Poisson model with an added variance term, the
test on the deviance can be used to assess whether the negative binomial model fits better.
The negative binomial model cannot directly be compared to the Poisson model with over-
dispersion parameter, because these models are not nested. However, the AIC and BIC can be
used to compare these models. In the example considered here, both the AIC and the BIC are
smaller for the Poisson model with over-dispersion than for the negative binomial model.
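As a sketch of such a comparison, the following single-level illustration uses the statsmodels package on simulated over-dispersed counts (the simulation settings are made up; a multilevel comparison would proceed analogously):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 1000
x = rng.normal(size=n)
X = sm.add_constant(x)

# Over-dispersed counts: a Poisson rate with multiplicative gamma noise
# (mean 1) yields a negative-binomial-like outcome.
lam = np.exp(0.5 + 0.4 * x) * rng.gamma(shape=2.0, scale=0.5, size=n)
y = rng.poisson(lam)

poisson_fit = sm.Poisson(y, X).fit(disp=False)
negbin_fit = sm.NegativeBinomial(y, X).fit(disp=False)

# Lower AIC/BIC indicates the better-fitting model.
print("Poisson AIC/BIC:", poisson_fit.aic, poisson_fit.bic)
print("NegBin  AIC/BIC:", negbin_fit.aic, negbin_fit.bic)
```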
iii. The Zero-Inflated Model:
When the data show an excess of zeros compared to the expected number under the Poisson
distribution, it is sometimes assumed that there are two processes that produce the data. Some
of the zeros are part of the event count, and are assumed to follow a Poisson model (or a
negative binomial). Other zeros are part of the event taking place or not, a binary process
modeled by a binomial model. These zeros are not part of the count; they are structural zeros,
indicating that the event never takes place. Thus, the assumption is that our data actually
include two populations, one that always produces zeros and a second that produces counts
following a Poisson model. For example, assume that we study risky behavior, such as using
drugs or having unsafe sex. One population never shows this behavior; it is simply not part of
its behavior repertoire. These individuals will always report a zero. The other population
consists of individuals who do have this behavior in their repertoire. These individuals can
report on their behavior, and these reports can also contain zeros. An individual may
sometimes use drugs, but just did not do this in the time period surveyed. Models for such
mixtures are referred to as zero-inflated Poisson or ZIP models.
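A brief simulation sketch of this two-process story (the structural-zero probability 0.4 and the Poisson rate 2.0 are made-up values):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 10_000

p_never = 0.4   # probability of belonging to the "never" population (made up)
lam = 2.0       # Poisson event rate in the at-risk population (made up)

at_risk = rng.random(n) > p_never   # binary process: behavior in repertoire or not
counts = rng.poisson(lam, size=n)   # count process for the at-risk population
y = np.where(at_risk, counts, 0)    # structural zeros override the count

# Observed zeros mix structural zeros with sampling zeros from the Poisson
# process, so their share far exceeds exp(-lam), the pure Poisson prediction.
print(np.mean(y == 0), np.exp(-lam))
```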