STA303 Generalized Linear Mixed Models
Patrick Brown, University of Toronto
Winter 2025
The Video
What did you get from the video?
1.
2.
3.
What I hoped you’d say
GLMM’s
$$Y_i \mid U \overset{\text{ind}}{\sim} G(\mu_i, \theta)$$
$$h(\mu_i) = X_i \beta + A_i U$$
$$U \sim \text{MVN}(0, \Sigma)$$
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    0    0    0    0
[2,]    0    0    1    1    1    0
[3,]    0    0    0    0    0    1
• $Y_i$: value of the $i$th observation
• $G$: a distribution function
• location parameter $\mu_i$
• dispersion parameter $\theta$
• $h(\cdot)$: link function
• $X_i$: fixed effects, covariates
• $U$: vector of random effects
• $A_i$: random effects design matrix
  • many rows, few columns
  • zeros, one 1 per row
• $\Sigma$: random effects variance
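A minimal sketch of how a random-intercept design matrix can be built in base R. The grouping vector and the value 0.5 for the random effect standard deviation are hypothetical, chosen so that the transpose of `A` reproduces the 3 × 6 array of zeros and ones printed on this slide (reading that array as groups by observations is an assumption).

```r
# Hypothetical grouping: six observations in three groups of sizes 2, 3 and 1
group <- factor(c(1, 1, 2, 2, 2, 3))

# One indicator column per group, no intercept column
A <- model.matrix(~ 0 + group)
dim(A)       # rows are observations, columns are groups
rowSums(A)   # each row contains a single 1
t(A)         # groups by observations

# A hypothetical draw of the random effects, one value per group
set.seed(1)
U <- rnorm(nlevels(group), mean = 0, sd = 0.5)

# A %*% U copies each group's effect onto that group's observations
drop(A %*% U)
```

With thousands of observations and a few hundred groups, `A` has many rows and few columns, which is why software usually stores it as a sparse matrix.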
National Youth Tobacco Survey
http://pbrown.ca/teaching/appliedstats/data/smoke2014.RData
> dim(smoke)
[1] 22007 160
> smoke[c(1, 1000, 10000), c("Age", "Sex", "state", "RuralUrban",
+ "Race", "Smoked_cigars_cigarillos")]
      Age  Sex state RuralUrban     Race Smoked_cigars_cigarillos
1      18    F    AL      Rural    white                    FALSE
1000   12 <NA>    AR      Urban hispanic                    FALSE
10000  12    F    MD      Rural    black                    FALSE
Which factors influence whether an American high school student has smoked a cigar?
Prepare the data
> table(smoke$Age)
   9   10   11   12   13   14   15   16   17   18   19
  40   12 1327 3082 3617 3218 2939 2900 2853 1682  180
> table(smoke$Race)
   white    black hispanic    asian   native  pacific
    9893     3430     6081      973      338       85
> smoke = smoke[smoke$Age > 11 &smoke$Age < 19 &
+ smoke$Race %in% c('white','black','hispanic') &
+ !is.na(smoke$Sex)& !is.na(smoke$Age), ]
> smoke$ageFac = relevel(factor(smoke$Age), '15')
> smoke$Race = relevel(factor(as.character(smoke$Race)), 'white')
A Model
$$Y_{ijk} \mid U, V \overset{\text{ind}}{\sim} \text{Bern}(\mu_{ijk})$$
$$\text{logit}(\mu_{ijk}) = X_{ijk}\beta + U_i + V_{ij}$$
$$U_i \overset{\text{ind}}{\sim} \text{N}(0, \sigma_1)$$
$$V_{ij} \overset{\text{ind}}{\sim} \text{N}(0, \sigma_2)$$
• State $i$, school $j$, individual $k$
• $X_{ijk}$: covariates
  • urban/rural
  • age-sex-race interaction
  • single-year age groups
• $U_i$: state-level effect
• $V_{ij}$: school-level effect
> smokeRes = glmmTMB::glmmTMB(
+   Smoked_cigars_cigarillos ~
+   Sex*ageFac*Race + RuralUrban +
+   (1|school) + (1|state),
+   data=smoke, family=binomial)
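To make the two-level structure concrete, here is a small simulation from a model of this form, in base R only. All sizes and parameter values are hypothetical, the linear predictor has an intercept only, and the two random-effect parameters are treated as standard deviations (an assumption).

```r
set.seed(303)

# Hypothetical sizes and parameter values, for illustration only
nState <- 6; nSchoolPerState <- 5; nStudentPerSchool <- 40
beta0 <- -2.9     # intercept on the logit scale
sdState <- 0.2    # assumed standard deviation of the state effects U_i
sdSchool <- 0.6   # assumed standard deviation of the school effects V_ij

# One row per school, recording the state it belongs to
schools <- expand.grid(schoolInState = seq_len(nSchoolPerState),
                       state = seq_len(nState))
schools$school <- seq_len(nrow(schools))

U <- rnorm(nState, mean = 0, sd = sdState)
V <- rnorm(nrow(schools), mean = 0, sd = sdSchool)

# One row per student; students in a school share U_i and V_ij
sim <- schools[rep(seq_len(nrow(schools)), each = nStudentPerSchool), ]
sim$eta <- beta0 + U[sim$state] + V[sim$school]
sim$mu <- plogis(sim$eta)
sim$y <- rbinom(nrow(sim), size = 1, prob = sim$mu)

# Observed proportion of positives in each school
round(tapply(sim$y, sim$school, mean), 2)
```

The school proportions vary more than binomial sampling alone would produce, which is the pattern the random effects are meant to capture.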
Results
> theCi = confint(smokeRes, full = TRUE)
> rownames(theCi) = gsub("..Intercept.|Race", "", rownames(theCi))
> knitr::kable(theCi[grep("age", rownames(theCi), invert = TRUE),
+ c(3, 1, 2)], digits = 2)
                Estimate  2.5 %  97.5 %
(Intercept)        -2.89  -3.27   -2.51
SexF               -0.73  -1.25   -0.20
black               0.13  -0.48    0.73
hispanic           -0.26  -0.82    0.30
RuralUrbanRural     0.27   0.00    0.55
SexF:black          0.56  -0.33    1.46
SexF:hispanic       0.72  -0.13    1.58
Std.Dev|school      0.59   0.45    0.78
Std.Dev|state       0.18   0.03    1.08
• In baseline category (white, age 15, urban, male), prevalence is
> exp(theCi[1, ])/(1 + exp(theCi[1, ]))
     2.5 %     97.5 %   Estimate
0.03669720 0.07519812 0.05272197
• Considerable school-level variation
> length(unique(smoke$school))
[1] 207
• Not sure about state random effect
  • could be small, could be large
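The back-transformations used to read this table can be sketched in base R. The numbers below are the rounded intercept and rural rows copied from the table, so results agree with the slide only up to rounding; the object names are illustrative.

```r
# Intercept row (rounded): estimate and 95% limits on the logit scale
intercept <- c(Estimate = -2.89, lower = -3.27, upper = -2.51)

# plogis(x) is exp(x) / (1 + exp(x)), giving the baseline prevalence
plogis(intercept)
all.equal(plogis(intercept), exp(intercept) / (1 + exp(intercept)))

# Rural row (rounded): exponentiating gives an odds ratio relative to urban
rural <- c(Estimate = 0.27, lower = 0.00, upper = 0.55)
exp(rural)
```

A lower limit of 0.00 on the logit scale corresponds to an odds ratio of 1, so the interval for the rural effect just touches "no difference".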
Random effects
> Pmisc::ranefPlot(smokeRes, grpvar='school')
> Pmisc::ranefPlot(smokeRes, grpvar='state')
[Figure: two panels of estimated random effects, one for schools (x axis from −1.0 to 1.5) and one for states (x axis from −0.5 to 0.5).]
Proportion of cigar smokers
[Figure: proportion (axis ticks at 0.005, 0.020, 0.050, 0.200) against age (12 to 18), by race (white, black, hispanic) and sex (F, M).]
Predict function
> smokePred = do.call(expand.grid, lapply(smoke[c("ageFac",
+ "Race", "Sex")], levels))
> smokePred$RuralUrban = "Rural"
> smokePred$school = smokePred$state = NA
> smokePred[1:2, ]
ageFac Race Sex RuralUrban state school
1 15 white M Rural NA NA
2 12 white M Rural NA NA
> predLogit = predict(smokeRes, smokePred, se.fit = TRUE)
> predCI = do.call(cbind, predLogit) %*% Pmisc::ciMat()
> smokePred$y = exp(predCI)/(1 + exp(predCI))
> smokePred$y[1:2, ]
est 2.5 97.5
eta_predict 0.068106328 0.0485519531 0.09475195
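The interval step above can be reproduced in base R. This sketch assumes that `Pmisc::ciMat()` returns a 2 × 3 matrix mapping an (estimate, standard error) pair to (estimate, lower, upper); `ciMatSketch` is a stand-in written for illustration, and the fitted values and standard errors are hypothetical.

```r
# Hypothetical logit-scale predictions and standard errors,
# shaped like the list returned by predict(..., se.fit = TRUE)
predLogit <- list(fit = c(-2.62, -3.90), se.fit = c(0.18, 0.30))

# Stand-in for Pmisc::ciMat(): columns are est, est - z * se, est + z * se
z <- qnorm(0.975)
ciMatSketch <- rbind(fit = c(est = 1, lower = 1, upper = 1),
                     se.fit = c(0, -z, z))

# Each row of cbind(fit, se.fit) becomes (est, lower, upper) on the logit scale
predCI <- do.call(cbind, predLogit) %*% ciMatSketch

# Inverse logit, applied elementwise, moves everything to the probability scale
predProb <- exp(predCI) / (1 + exp(predCI))
predProb
```

Building the interval on the logit scale and transforming afterwards keeps both limits inside (0, 1), which an interval built directly on the probability scale would not guarantee.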
Proportion of cigar smokers
[Figure: proportion (axis ticks at 0.005, 0.020, 0.050, 0.200) against age (12 to 18), by race (white, black, hispanic) and sex (F, M).]
• Differences between boys and girls increase with age
• Sex difference highest for whites
• Age-sex differences dominate
Likelihood
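$$\log \pi(y \mid u) \approx \log \pi(y \mid u_0) + (u - u_0)^T \left.\frac{\partial}{\partial u} \log \pi(y \mid u)\right|_{u = u_0} + \frac{1}{2} (u - u_0)^T \left[\left.\frac{\partial^2}{(\partial u)^2} \log \pi(y \mid u)\right|_{u = u_0}\right] (u - u_0)$$
Laplace approximation: $u_0 = \text{argmax}_u\, \pi(y \mid u; \beta, \theta)$, first derivative is zero.
$$L(y; \beta, \theta, \Sigma) \propto \int \pi(y \mid u^*, \beta, \theta) \exp\left[-\tfrac{1}{2}(u - u^*)^T H (u - u^*)\right] |\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2} u^T \Sigma^{-1} u\right) du$$
$$-2 \log L(Y; \beta, \theta) + C \approx -2 \log \pi(y \mid u^*, \beta, \theta) + \log|\Sigma(\theta)| - \log\left|H(u^*) + \Sigma(\theta)^{-1}\right|$$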
Conditional distributions
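$$[U \mid Y] = [Y \mid U][U]/[Y] \propto [Y \mid U][U]$$
$$\log \pi(U = u \mid y) = \log \pi(y \mid u^*, \beta, \theta) - \tfrac{1}{2}(u - u^*)^T H (u - u^*) - \tfrac{1}{2} u^T \Sigma^{-1} u + C$$
$$u^*(\beta, \theta) = \text{argmax}_u\, \pi(u \mid y; \beta, \theta)$$
Normal approximation
$$U \mid Y \approx \text{N}\left[u^*, (\Sigma^{-1} + H)^{-1}\right]$$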
• We fit GLMM’s by pretending stuff is Normal
• the hard part is extra maximization
• outer optimization: 𝛽 , 𝜃
• inner optimization: 𝑢
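A minimal sketch of the inner optimization and the resulting Normal approximation for one cluster with a scalar random intercept, in base R. The data (`n`, `y`) and the parameter values (`beta`, `sigma`) are hypothetical and held fixed, so this illustrates only the inner step, not a full model fit. Numerical integration provides reference values for comparison.

```r
# Hypothetical single cluster: y successes in n Bernoulli trials,
# logit(p) = beta + u with u ~ N(0, sigma^2); beta and sigma treated as known
n <- 30; y <- 6; beta <- -2; sigma <- 0.8

# log pi(y | u) + log pi(u), equal to log pi(u | y) up to an additive constant
logJoint <- function(u) {
  eta <- beta + u
  y * eta - n * log1p(exp(eta)) + dnorm(u, mean = 0, sd = sigma, log = TRUE)
}

# Inner optimization: the conditional mode u*
uStar <- optimize(logJoint, interval = c(-10, 10), maximum = TRUE)$maximum

# H is minus the second derivative of log pi(y | u) at u*, here n * p * (1 - p)
pStar <- plogis(beta + uStar)
H <- n * pStar * (1 - pStar)
approxSd <- sqrt(1 / (1 / sigma^2 + H))

# Reference values from numerical integration of the unnormalized density
lower <- uStar - 10 * approxSd
upper <- uStar + 10 * approxSd
dens <- function(u) exp(logJoint(u) - logJoint(uStar))
normConst <- integrate(dens, lower, upper)$value
exactMean <- integrate(function(u) u * dens(u), lower, upper)$value / normConst
exactVar <- integrate(function(u) (u - exactMean)^2 * dens(u),
                      lower, upper)$value / normConst

round(c(mode = uStar, approxSd = approxSd,
        exactMean = exactMean, exactSd = sqrt(exactVar)), 3)
```

In a full fit this inner step is repeated for every trial value of the outer parameters, which is where the extra cost relative to a linear mixed model comes from.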
The Algorithm
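1. minimize
$$-2 \log L(Y; \beta, \theta) = -2 \log \pi(y \mid u^*, \beta, \theta) + \log|\Sigma(\theta)| - \log\left|H(u^*) + \Sigma(\theta)^{-1}\right|$$
  • Try many different $\beta$, $\theta$
  • for each one find $u^*(\beta, \theta)$
  • MLE’s $\hat\beta$, $\hat\theta$
2. conditional distribution
  • $\text{E}(U \mid Y) = u^*(\hat\beta, \hat\theta)$
  • $\text{var}(U \mid Y)^{-1} = \Sigma^{-1} + H$
3. Standard errors
  • Information matrix
  • 2nd deriv of $\log L(Y; \beta, \theta)$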
In summary
• No closed-form likelihood for GLMM’s
• Taylor series at the mode (Laplace Approximation)
• Likelihoods and conditional distributions look Gaussian
• More work than LMM’s because of extra optimizations
• Autodiff makes it easy
A note on likelihood
Maximum Likelihood Estimation
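$$\text{argmax}_{\beta,\tau}\; \pi(Y; \beta, \theta) = \int \pi(Y \mid U; \beta, \theta)\, \pi(U; \theta)\, dU$$
Maximum Absolute Probability estimator
$$\text{argmax}_{U,\beta,\tau}\; \pi(Y, U; \beta, \theta)$$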
• not the same thing!
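One way to see the difference is a toy Gaussian model where both criteria have closed forms. This is not the GLMM from these slides: it assumes $y_i = u_i + e_i$ with $e_i \sim \text{N}(0, 1)$ and $u_i \sim \text{N}(0, \sigma^2)$, and every name and value below is hypothetical. Integrating out $u$ gives $y_i \sim \text{N}(0, 1 + \sigma^2)$, while maximizing the joint density over $u$ for fixed $\sigma$ gives $\hat u_i = y_i \sigma^2 / (1 + \sigma^2)$.

```r
set.seed(1)
n <- 200; sigmaTrue <- 1.5
u <- rnorm(n, mean = 0, sd = sigmaTrue)
y <- rnorm(n, mean = u, sd = 1)

sigmaGrid <- seq(0.05, 3, by = 0.05)

# Marginal log-likelihood: u integrated out
margLogLik <- sapply(sigmaGrid, function(s) {
  sum(dnorm(y, mean = 0, sd = sqrt(1 + s^2), log = TRUE))
})

# Joint log-density, maximized over u for each sigma
jointProfile <- sapply(sigmaGrid, function(s) {
  uHat <- y * s^2 / (1 + s^2)
  sum(dnorm(y, mean = uHat, sd = 1, log = TRUE)) +
    sum(dnorm(uHat, mean = 0, sd = s, log = TRUE))
})

sigmaMle <- sigmaGrid[which.max(margLogLik)]
sigmaJoint <- sigmaGrid[which.max(jointProfile)]
c(marginal = sigmaMle, joint = sigmaJoint)
```

On this grid the joint criterion picks the smallest value of sigma available, because shrinking every $\hat u_i$ toward zero and sigma along with it keeps increasing the joint density, whereas the marginal likelihood peaks at an interior value.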
Multi-level schools
> school[1:2, ]
  school class gender socialClass ravensTest student english math year
1      1     1      f      absent         23       1      72   23    0
2      1     1      f      absent         23       1      80   24    1
> quantile(school$math)
  0%  25%  50%  75% 100%
   1   22   28   33   40
> school$mathProp = (school$math - 0.5)/40
[Figure: Histogram of school$mathProp; x axis school$mathProp from 0.0 to 1.0, y axis Frequency.]
Check betas
$$\mu = \text{E}(U) = \theta_1/(\theta_1 + \theta_2)$$
$$\tau = \frac{\text{sd}(U)}{\text{geomean}\{\mu, 1 - \mu\}}$$
$$\theta_1 = \mu(1 - \tau^2)/\tau^2$$
$$\theta_2 = (1 - \mu)(1 - \tau^2)/\tau^2$$
> betaMean = 0.3; betaSd = 0.1
> betaSd2 = (1-betaSd^2)/betaSd^2
> theta1 = betaMean * betaSd2
> theta2=(1-betaMean)*betaSd2
> dat = data.frame(
+   y = rbeta(1000, theta1, theta2))
> c(log(betaMean/(1-betaMean)),
+   1/betaSd^2)
[1]  -0.8472979 100.0000000
> confint(glmmTMB(y ~ 1, data=dat, family=beta_family), full=TRUE)
                      2.5 %      97.5 %   Estimate
cond.(Intercept) -0.8538388  -0.8262142 -0.8400265
sigma            86.6948647 103.2238958 94.5990575
[Figure: density curves; y axis dens, x axis from 0.0 to 1.0.]
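The mean and scale parameterization of the Beta distribution used on this slide can be checked without simulation. The helper `betaShapes` below is a hypothetical name introduced for illustration; it maps the mean and the scaled standard deviation to the two shape parameters, and the standard Beta mean and variance formulas are then used to recover both inputs.

```r
# Hypothetical helper: (mu, tau) to the usual Beta shape parameters
betaShapes <- function(mu, tau) {
  prec <- (1 - tau^2) / tau^2
  c(theta1 = mu * prec, theta2 = (1 - mu) * prec)
}

th <- betaShapes(mu = 0.3, tau = 0.1)
thetaSum <- sum(th)

# Mean and variance of a Beta(theta1, theta2) variable
betaMeanCheck <- unname(th["theta1"] / thetaSum)
betaVar <- unname(th["theta1"] * th["theta2"] /
                    (thetaSum^2 * (thetaSum + 1)))

# sd divided by the geometric mean of mu and 1 - mu recovers tau
tauCheck <- sqrt(betaVar) / sqrt(betaMeanCheck * (1 - betaMeanCheck))

c(mean = betaMeanCheck, tau = tauCheck, thetaSum = thetaSum)
```

Here the two shape parameters sum to $(1 - \tau^2)/\tau^2 = 99$, which is close to, but not the same as, $1/\tau^2 = 100$.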
Schools beta
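$$Y_{ijk\ell} \mid U, V, W \overset{\text{ind}}{\sim} \text{Beta}(\mu_{ijk\ell}, \tau)$$
$$\text{logit}(\mu_{ijk\ell}) = X_{ijk\ell}\beta + U_i + V_{ij} + W_{ijk}$$
$$U_i \sim \text{N}(0, \sigma_1^2)$$
$$V_{ij} \sim \text{N}(0, \sigma_2^2)$$
$$W_{ijk} \sim \text{N}(0, \sigma_3^2)$$
• school $i$
• class $j$
• student $k$
• test $\ell$
> schoolRes = glmmTMB(
+   mathProp ~ gender + socialClass +
+   (1|school / class / student),
+   family=beta_family,
+   data=school)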
Results
> schoolTable = confint(
+   schoolRes,full=TRUE)
> rownames(schoolTable) = gsub(
+   "cond[.]|social",
+   "", rownames(schoolTable))
> schoolTable['sigma',] = 1/
+   sqrt(schoolTable['sigma',c(2,1,3)])
> rownames(schoolTable) = gsub(
+   "sigma","tau",rownames(schoolTable))
> rownames(schoolTable) = gsub(
+   "Std.Dev..Intercept.","SD",
+   rownames(schoolTable))
> knitr::kable(schoolTable, digits=2)
                         2.5 %  97.5 %  Esti
(Intercept)               0.79    1.33
genderm                  -0.10    0.08
ClassII                  -0.30    0.27
ClassIIIn                -0.58    0.02
ClassIIIm                -0.71   -0.17
ClassIV                  -0.69   -0.09
ClassV                   -0.95   -0.33
ClasslongUnemp           -0.85   -0.21
ClasscurrUnemp           -1.05   -0.20
Classabsent              -0.77   -0.21
tau                       0.26    0.28
SD|student:class:school   0.60    0.68
SD|class:school           0.20    0.34
SD|school                 0.00     Inf