MAS-II Formula Sheet
Updated 12/07/23
INTRODUCTION TO CREDIBILITY

Classical Credibility
a.k.a. Limited Fluctuation Credibility

Standard for Full Credibility
# of exposures needed, $n_0$, for aggregate loss/pure premium:
$n_0 = \left(\dfrac{z_{1-(\alpha/2)}}{k}\right)^2 C_S^2$

# of claims needed, $n_C$, for aggregate loss/pure premium:
$n_C = \left(\dfrac{z_{1-(\alpha/2)}}{k}\right)^2 \left(\dfrac{\sigma_f^2}{\mu_f} + C_X^2\right)$

where $k$ is the acceptable margin around the mean, $\mu_f$ and $\sigma_f^2$ are the mean and variance of claim frequency, $C_X$ is the coefficient of variation of claim severity, and $C_S$ is the coefficient of variation of aggregate losses.
• For claim frequency only: set $C_X^2 = 0$.
• For claim severity only: set $\sigma_f^2/\mu_f = 0$.

$n_C = n_0 \cdot \mu_f \iff n_0 = \dfrac{n_C}{\mu_f}$

Partial Credibility
$U = ZD + (1 - Z)M = M + Z(D - M)$
where
• $U$: Updated prediction
• $D$: Observed value
• $M$: Manual rate
• $Z$: Credibility factor

Square Root Rule
$Z = \sqrt{\dfrac{n}{n_0}} = \sqrt{\dfrac{n \cdot \mu_f}{n_C}}$
where $n$ is the actual # of exposures.

Bayesian Credibility

Model Distribution
Distribution of the model conditioned on a parameter.
Model density function: $f(x \mid \theta)$

Prior Distribution
Initial distribution of the parameter.
Prior density function: $\pi(\theta)$

Posterior Distribution
Revised distribution of the parameter.
Posterior density function: $\pi(\theta \mid \text{data})$
$\pi(\theta \mid \text{data}) = \dfrac{f(\text{data} \mid \theta)\,\pi(\theta)}{\int_{-\infty}^{\infty} f(\text{data} \mid \theta)\,\pi(\theta)\,d\theta}$
Note: Use the numerator and the domain to check whether $(\theta \mid \text{data})$ follows a distribution in the exam tables, to skip integration.

Predictive Distribution
Revised unconditional distribution of the model.
Predictive density function: $f(x \mid \text{data})$
Predictive Mean = Bayesian Premium

Loss Functions
Loss Function to Minimize | Bayesian Estimation
Squared-error loss | Posterior mean
Absolute loss | Posterior median
Zero-one loss | Posterior mode

Conjugate Priors

Poisson/Gamma
• Model: Poisson with mean $\lambda$
• Prior: $\lambda \sim \text{Gamma}(\alpha, \theta)$
• Posterior: $(\lambda \mid \text{data}) \sim \text{Gamma}(\alpha^*, \theta^*)$
o $\alpha^* = \alpha + \sum_{i=1}^{n} x_i$
o $\theta^* = \left(\dfrac{1}{\theta} + n\right)^{-1}$

Exponential/Gamma
• Model: Exponential with rate $\lambda$, or mean $\lambda^{-1}$
• Prior: $\lambda \sim \text{Gamma}(\alpha, \theta)$
• Posterior: $(\lambda \mid \text{data}) \sim \text{Gamma}(\alpha^*, \theta^*)$
o $\alpha^* = \alpha + n$
o $\theta^* = \left(\dfrac{1}{\theta} + \sum_{i=1}^{n} x_i\right)^{-1}$

Binomial/Beta
• Model: Binomial with fixed $m$ and probability of success $q$
• Prior: $q \sim \text{Beta}(a, b, 1)$
• Posterior: $(q \mid \text{data}) \sim \text{Beta}(a^*, b^*, 1)$
o $a^* = a + \sum_{i=1}^{n} x_i$
o $b^* = b + \left[nm - \sum_{i=1}^{n} x_i\right]$

Geometric/Beta
• Model: Geometric with probability of success $q$, or mean $\dfrac{1-q}{q}$
• Prior: $q \sim \text{Beta}(a, b, 1)$
• Posterior: $(q \mid \text{data}) \sim \text{Beta}(a^*, b^*, 1)$
o $a^* = a + n$
o $b^* = b + \sum_{i=1}^{n} x_i$

Bühlmann Credibility
Expected Hypothetical Mean (EHM): $\mu = \mathrm{E}\big[\mathrm{E}[X \mid \theta]\big]$
Expected Process Variance (EPV): $\mu_{PV} = \mathrm{E}\big[\mathrm{Var}[X \mid \theta]\big]$
Variance of Hypothetical Mean (VHM): $\sigma_{HM}^2 = \mathrm{Var}\big[\mathrm{E}[X \mid \theta]\big]$
Bühlmann $k$: $k = \dfrac{\mu_{PV}}{\sigma_{HM}^2}$
Bühlmann Credibility Factor: $Z = \dfrac{n}{n + k}$
Bühlmann Credibility Premium:
$U = Z\bar{x} + (1 - Z)\mu = \mu + Z(\bar{x} - \mu)$
Note: $Z$ and $\bar{x}$ have the same $n$.

Empirical Bayes Method

Uniform Exposures
$\hat{\mu} = \dfrac{\sum_{i=1}^{r}\sum_{j=1}^{n} x_{ij}}{r \cdot n} = \bar{x}$
$\hat{\mu}_{PV} = \dfrac{\sum_{i=1}^{r}\sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2}{r(n - 1)}$
$\hat{\sigma}_{HM}^2 = \dfrac{\sum_{i=1}^{r} (\bar{x}_i - \bar{x})^2}{r - 1} - \dfrac{\hat{\mu}_{PV}}{n}$

Non-uniform Exposures
$\hat{\mu} = \dfrac{\sum_{i=1}^{r}\sum_{j=1}^{n_i} m_{ij}\, x_{ij}}{m} = \bar{x}$
$\hat{\mu}_{PV} = \dfrac{\sum_{i=1}^{r}\sum_{j=1}^{n_i} m_{ij}\,(x_{ij} - \bar{x}_i)^2}{\sum_{i=1}^{r} (n_i - 1)}$
$\hat{\sigma}_{HM}^2 = \dfrac{\sum_{i=1}^{r} m_i\,(\bar{x}_i - \bar{x})^2 - \hat{\mu}_{PV}(r - 1)}{m - m^{-1}\sum_{i=1}^{r} m_i^2}$

Here $r$ is the number of risks/classes, $n$ (or $n_i$) is the number of observations for a risk, $m_{ij}$ is the exposure for observation $j$ of risk $i$, $m_i = \sum_j m_{ij}$, and $m = \sum_i m_i$.

Credibility premium for a risk in Class $i$:
$U = \hat{Z}_i \bar{x}_i + \big(1 - \hat{Z}_i\big)\hat{\mu}$
Note: $\hat{Z}_i$ and $\bar{x}_i$ have the same $n$.

Balancing the Estimators
Estimate EHM as: $\hat{\mu}_{\text{CRED}} = \dfrac{\sum_{i=1}^{r} Z_i \bar{x}_i}{\sum_{i=1}^{r} Z_i}$
LINEAR MIXED MODELS (LMM)

Basics of Linear Mixed Modeling

Types and Structures of Data Sets
• Clustered data – the dependent variable is measured once for each subject (or unit of analysis), and subjects are grouped into, or nested within, clusters of subjects that share some commonality.
• Repeated measures – the dependent variable is measured more than once on the same subject across levels of one or more categorical explanatory variables called repeated-measures factors.
• Longitudinal data – the dependent variable is measured at multiple points in time for each subject.
• Clustered longitudinal data – the dependent variable is measured at multiple points in time for each subject, and subjects are grouped within clusters.

Hierarchical/Multilevel Data
• Level 1 – Observations at the most detailed level of data.
o For clustered data, Level 1 is the subject.
o For repeated measures/longitudinal data, Level 1 is the repeated measures made on a subject.
• Level 2 – The next most detailed level of data.
o For clustered data, Level 2 is a cluster of subjects.
o For repeated measures, it is the subject.
• Level 3 – The next level of data.
o Clusters of Level 2 units (clusters of clusters).

Notation
$Y$ – The dependent variable
$i$ – Identifies a subject
$t$ – Indexes time
$X^{(1)}, \dots, X^{(p)}$ – The $p$ covariates associated with fixed effects
$x_{t,i}^{(1)}$ – The $t$th observed value of $X^{(1)}$ for subject $i$
$Z^{(1)}, \dots, Z^{(q)}$ – The $q$ covariates associated with random effects
$z_{t,i}^{(1)}$ – The $t$th observed value of $Z^{(1)}$ for subject $i$
$\beta_1, \dots, \beta_p$ – The $p$ fixed effects
$u_{1,i}, \dots, u_{q,i}$ – The $q$ random effects associated with subject $i$
$\epsilon_{t,i}$ – The random residual

Factors and Effects
• Covariate – predictor variable.
• Fixed factor – categorical variable that includes all possible levels.
• Random factor – categorical variable whose levels are randomly sampled from a larger population of levels.
• Fixed effect – describes the relationship between the dependent variable and a fixed factor or continuous covariate.
• Random effect – random values associated with specific levels of a random factor.
• Nested factors – each level of a factor can only be measured within a single level of another factor.
• Crossed factor – a factor can be measured across multiple levels of another factor.

Matrix Specification for a Single Observation
$Y_{t,i} = \underbrace{\beta_1 x_{t,i}^{(1)} + \beta_2 x_{t,i}^{(2)} + \cdots + \beta_p x_{t,i}^{(p)}}_{\text{fixed}} + \underbrace{u_{1,i} z_{t,i}^{(1)} + u_{2,i} z_{t,i}^{(2)} + \cdots + u_{q,i} z_{t,i}^{(q)} + \epsilon_{t,i}}_{\text{random}}$

Matrix Specification
$\mathbf{Y}_i = \underbrace{\mathbf{X}_i\boldsymbol{\beta}}_{\text{fixed}} + \underbrace{\mathbf{Z}_i\mathbf{u}_i + \boldsymbol{\epsilon}_i}_{\text{random}}$
$\mathbf{u}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{D}), \qquad \boldsymbol{\epsilon}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{R}_i)$

$\mathbf{Y}_i = \begin{bmatrix} Y_{1,i} \\ Y_{2,i} \\ \vdots \\ Y_{n_i,i} \end{bmatrix}, \quad
\mathbf{X}_i = \begin{bmatrix} x_{1,i}^{(1)} & x_{1,i}^{(2)} & \cdots & x_{1,i}^{(p)} \\ x_{2,i}^{(1)} & x_{2,i}^{(2)} & \cdots & x_{2,i}^{(p)} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n_i,i}^{(1)} & x_{n_i,i}^{(2)} & \cdots & x_{n_i,i}^{(p)} \end{bmatrix}, \quad
\boldsymbol{\beta} = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{bmatrix}$

$\mathbf{Z}_i = \begin{bmatrix} z_{1,i}^{(1)} & z_{1,i}^{(2)} & \cdots & z_{1,i}^{(q)} \\ z_{2,i}^{(1)} & z_{2,i}^{(2)} & \cdots & z_{2,i}^{(q)} \\ \vdots & \vdots & \ddots & \vdots \\ z_{n_i,i}^{(1)} & z_{n_i,i}^{(2)} & \cdots & z_{n_i,i}^{(q)} \end{bmatrix}, \quad
\mathbf{u}_i = \begin{bmatrix} u_{1,i} \\ u_{2,i} \\ \vdots \\ u_{q,i} \end{bmatrix}$

$\mathbf{D} = \mathrm{Var}[\mathbf{u}_i] = \begin{bmatrix} \mathrm{Var}[u_{1,i}] & \mathrm{Cov}[u_{1,i}, u_{2,i}] & \cdots & \mathrm{Cov}[u_{1,i}, u_{q,i}] \\ \mathrm{Cov}[u_{1,i}, u_{2,i}] & \mathrm{Var}[u_{2,i}] & \cdots & \mathrm{Cov}[u_{2,i}, u_{q,i}] \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}[u_{1,i}, u_{q,i}] & \mathrm{Cov}[u_{2,i}, u_{q,i}] & \cdots & \mathrm{Var}[u_{q,i}] \end{bmatrix}$

$\mathbf{R}_i = \mathrm{Var}[\boldsymbol{\epsilon}_i] = \begin{bmatrix} \mathrm{Var}[\epsilon_{1,i}] & \mathrm{Cov}[\epsilon_{1,i}, \epsilon_{2,i}] & \cdots & \mathrm{Cov}[\epsilon_{1,i}, \epsilon_{n_i,i}] \\ \mathrm{Cov}[\epsilon_{1,i}, \epsilon_{2,i}] & \mathrm{Var}[\epsilon_{2,i}] & \cdots & \mathrm{Cov}[\epsilon_{2,i}, \epsilon_{n_i,i}] \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}[\epsilon_{1,i}, \epsilon_{n_i,i}] & \mathrm{Cov}[\epsilon_{2,i}, \epsilon_{n_i,i}] & \cdots & \mathrm{Var}[\epsilon_{n_i,i}] \end{bmatrix}$

Covariance Structures
• Unstructured
$\mathbf{D} = \begin{bmatrix} \sigma_{u_0}^2 & \sigma_{u_0,u_1} \\ \sigma_{u_0,u_1} & \sigma_{u_1}^2 \end{bmatrix}, \quad \boldsymbol{\theta}_\mathbf{D} = \begin{bmatrix} \sigma_{u_0}^2 \\ \sigma_{u_0,u_1} \\ \sigma_{u_1}^2 \end{bmatrix}$
• Diagonal/Variance components
$\mathbf{D} = \begin{bmatrix} \sigma_{u_0}^2 & 0 \\ 0 & \sigma_{u_1}^2 \end{bmatrix}, \quad \boldsymbol{\theta}_\mathbf{D} = \begin{bmatrix} \sigma_{u_0}^2 \\ \sigma_{u_1}^2 \end{bmatrix}$
• Compound symmetric
$\mathbf{R}_i = \begin{bmatrix} \sigma^2 + \sigma_1 & \sigma_1 & \cdots & \sigma_1 \\ \sigma_1 & \sigma^2 + \sigma_1 & \cdots & \sigma_1 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_1 & \sigma_1 & \cdots & \sigma^2 + \sigma_1 \end{bmatrix}, \quad \boldsymbol{\theta}_\mathbf{R} = \begin{bmatrix} \sigma^2 \\ \sigma_1 \end{bmatrix}$
• First-order autoregressive
$\mathbf{R}_i = \begin{bmatrix} \sigma^2 & \sigma^2\rho & \cdots & \sigma^2\rho^{n_i-1} \\ \sigma^2\rho & \sigma^2 & \cdots & \sigma^2\rho^{n_i-2} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma^2\rho^{n_i-1} & \sigma^2\rho^{n_i-2} & \cdots & \sigma^2 \end{bmatrix}, \quad \boldsymbol{\theta}_\mathbf{R} = \begin{bmatrix} \sigma^2 \\ \rho \end{bmatrix}$
• Both $\mathbf{D}$ and $\mathbf{R}_i$ must be positive definite.
• Heterogeneous variances are also possible: the same structure for each group, but different parameters in $\boldsymbol{\theta}_\mathbf{D}$ or $\boldsymbol{\theta}_\mathbf{R}$.
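Below is a hedged sketch of how a model of this form might be fit in Python with statsmodels; the data frame, the column names (y, x, cluster), and the simulated values are made up. `re_formula="~x"` requests a random intercept and a random slope for x, giving an unstructured 2x2 D by default.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical clustered data: 20 clusters, 5 observations each.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "cluster": np.repeat(np.arange(20), 5),
    "x": rng.normal(size=100),
})
u0 = rng.normal(scale=0.8, size=20)        # random intercepts by cluster
df["y"] = 1.0 + 0.5 * df["x"] + u0[df["cluster"]] + rng.normal(scale=0.3, size=100)

# groups= defines the clustering; re_formula adds a random slope for x.
model = smf.mixedlm("y ~ x", df, groups=df["cluster"], re_formula="~x")
fit = model.fit(reml=True)                 # REML is the default; shown explicitly
print(fit.summary())
```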
Hierarchical Models
Break the LMM into levels and write an equation for each level.

Consider the model below:
$Y_{i,j,k} = \beta_0 + \beta_1 x_{i,j,k}^{(1)} + \beta_2 x_{j,k}^{(2)} + \beta_3 x_k^{(3)} + u_{0,j|k} + u_{1,j|k}\, x_{i,j,k}^{(1)} + u_{0,k} + \epsilon_{i,j,k}$
$\mathbf{u}_{j|k} = \begin{bmatrix} u_{0,j|k} \\ u_{1,j|k} \end{bmatrix} \sim \mathcal{N}\big(\mathbf{0}, \mathbf{D}^{(1)}\big)$
$u_{0,k} \sim \text{Normal}\big(0, \sigma_{\text{int:group}}^2\big)$
$\epsilon_{i,j,k} \sim \text{Normal}(0, \sigma^2)$

In hierarchical form:
• Level 1 –
$Y_{i,j,k} = b_{0,j|k} + b_{1,j|k}\, x_{i,j,k}^{(1)} + \epsilon_{i,j,k}$
$\epsilon_{i,j,k} \sim \text{Normal}(0, \sigma^2)$
• Level 2 –
$b_{0,j|k} = b_{0,k} + \beta_2 x_{j,k}^{(2)} + u_{0,j|k}$
$b_{1,j|k} = \beta_1 + u_{1,j|k}$
$\mathbf{u}_{j|k} = \begin{bmatrix} u_{0,j|k} \\ u_{1,j|k} \end{bmatrix} \sim \mathcal{N}\big(\mathbf{0}, \mathbf{D}^{(1)}\big)$
• Level 3 –
$b_{0,k} = \beta_0 + \beta_3 x_k^{(3)} + u_{0,k}$
$u_{0,k} \sim \text{Normal}\big(0, \sigma_{\text{int:group}}^2\big)$

Marginal Linear Models
A population-averaged model: no random effects.
$\mathbf{Y}_i = \mathbf{X}_i\boldsymbol{\beta} + \boldsymbol{\epsilon}_i^*$
$\boldsymbol{\epsilon}_i^* \sim \mathcal{N}(\mathbf{0}, \mathbf{V}_i^*)$

Implied Marginal Model
$\mathbf{Y}_i = \mathbf{X}_i\boldsymbol{\beta} + \boldsymbol{\epsilon}_i^*$
$\boldsymbol{\epsilon}_i^* \sim \mathcal{N}(\mathbf{0}, \mathbf{V}_i)$
$\mathbf{V}_i = \mathbf{Z}_i\mathbf{D}\mathbf{Z}_i' + \mathbf{R}_i$
• Easier to fit than an LMM. The only restriction is that $\mathbf{V}_i$ must be positive definite.
• $\mathrm{E}[\mathbf{Y}_i] = \mathbf{X}_i\boldsymbol{\beta}$
• $\mathrm{Var}[\mathbf{Y}_i] = \mathbf{Z}_i\mathbf{D}\mathbf{Z}_i' + \mathbf{R}_i$
• $\mathbf{Y}_i \sim \mathcal{N}\big(\mathbf{X}_i\boldsymbol{\beta},\ \mathbf{Z}_i\mathbf{D}\mathbf{Z}_i' + \mathbf{R}_i\big)$

Model Estimation and Inference

Maximum Likelihood and Restricted Maximum Likelihood
• Maximum Likelihood (ML)
o If $\boldsymbol{\theta}$ is known, $\hat{\boldsymbol{\beta}}$ is the best linear unbiased estimator (BLUE) for $\boldsymbol{\beta}$.
o If $\boldsymbol{\theta}$ is not known, $\hat{\boldsymbol{\beta}}$ is the empirical best linear unbiased estimator (EBLUE) for $\boldsymbol{\beta}$.
• Restricted Maximum Likelihood (REML)
o Adjusts for the loss of degrees of freedom from estimating the fixed effects to produce an unbiased estimate for $\boldsymbol{\theta}$.

Estimate | REML | ML
$\hat{\boldsymbol{\beta}}$ | Unbiased | Unbiased
$\hat{\boldsymbol{\theta}}$ | Unbiased | Biased downward
$\mathrm{Var}[\hat{\boldsymbol{\beta}}]$ | Biased downward | Biased downward

Computational Algorithms
Used for estimating parameters in an LMM.
• Expectation Maximization (EM)
o Pros: good at finding starting values for other algorithms.
o Cons: converges slowly and produces "optimistic" estimators.
• Newton-Raphson (N-R)
o Pros: converges in a small number of iterations and can be used to obtain an asymptotic covariance matrix for the covariance parameters in $\boldsymbol{\theta}$.
o Cons: each iteration takes a while.
• Fisher Scoring
o Pros: less computationally intensive and more likely to converge.
o Cons: difficult to obtain the expected Hessian matrix, which is needed in order for the estimates to be accurate.
In general, start with a few iterations of EM to generate starting values, finish with N-R, and avoid Fisher scoring.

Intraclass Correlation Coefficient (ICC)
In general, the ICC for a given level of clustering can be thought of as the proportion of the total observed variation due to the random effects at that level and higher levels. It must be positive.

The ICC for level $j$ of a two-level variance components model is:
$\text{ICC}_j = \dfrac{\text{Variance in common}}{\text{Total variance}} = \dfrac{\sigma_{\text{int}}^2}{\sigma_{\text{int}}^2 + \sigma^2}$

For a subject $i$ in cluster $j$ nested within group of clusters $k$, the Level 2 ICC is:
$\text{ICC}_{j|k} = \dfrac{\sigma_{\text{int}:j(k)}^2 + \sigma_{\text{int}:k}^2}{\sigma_{\text{int}:j(k)}^2 + \sigma_{\text{int}:k}^2 + \sigma^2}$
and the Level 3 ICC is:
$\text{ICC}_k = \dfrac{\sigma_{\text{int}:k}^2}{\sigma_{\text{int}:j(k)}^2 + \sigma_{\text{int}:k}^2 + \sigma^2}$

Marginal ICC
For the implied marginal model arising from a variance components model,
$\text{Corr}\big[Y_{i,j}, Y_{i',j}\big] = \dfrac{\text{Cov}\big[Y_{i,j}, Y_{i',j}\big]}{\sqrt{\text{Var}\big[Y_{i,j}\big]\cdot\text{Var}\big[Y_{i',j}\big]}} = \dfrac{\sigma_{\text{int}}^2}{\sigma_{\text{int}}^2 + \sigma^2} = \text{ICC}_j$
is referred to as the marginal ICC and can be viewed as the marginal correlation between two different observations within the same group.
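As a small numerical sketch of the implied marginal covariance and the marginal ICC above (all variance values are made up), the off-diagonal-to-diagonal ratio of $\mathbf{V}_i$ reproduces the ICC for a random-intercept model:

```python
import numpy as np

n_i = 4                        # observations for subject/cluster i
sigma2_int = 2.0               # random-intercept variance
sigma2 = 3.0                   # residual variance

Z_i = np.ones((n_i, 1))        # random-intercept design
D = np.array([[sigma2_int]])   # 1x1 D matrix
R_i = sigma2 * np.eye(n_i)     # diagonal residual covariance

V_i = Z_i @ D @ Z_i.T + R_i    # implied marginal covariance V_i = Z D Z' + R
icc = sigma2_int / (sigma2_int + sigma2)

print(V_i)
print(V_i[0, 1] / V_i[0, 0], icc)   # both equal 0.4
```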
EBLUPs
$\hat{\mathbf{u}}_j = \mathrm{E}\big[\mathbf{u}_j \mid \mathbf{Y}_j = \mathbf{y}_j\big]$
• Empirical – based on the estimates of $\boldsymbol{\beta}$ and $\boldsymbol{\theta}$
• Best – minimum variance among all unbiased predictors
• Linear – linear functions of the observed data
• Unbiased
• Predictors

For a variance components model,
$\hat{\mu} + \hat{u}_{0,j} = Z_j \times \bar{y}_j + (1 - Z_j) \times \hat{\mu}$
$k = \dfrac{v}{a} = \dfrac{\sigma^2}{\sigma_{\text{int}}^2}$
$Z_j = \dfrac{n_j}{n_j + k} = \dfrac{n_j}{n_j + \sigma^2/\sigma_{\text{int}}^2}$
• If $n_j = 1$, then $Z_j = \text{ICC}_j$.
• $\hat{u}_{0,j}$ is the EBLUP for level $j$ of the random factor.
• $Z_j$ is the Bühlmann credibility factor for level $j$.
• $\hat{\mu}$ is the unconditional predicted value using the marginal model.
• $\bar{y}_j$ is the observed mean response for level $j$.
• $\hat{\mu} + \hat{u}_{0,j}$ is the shrinkage mean for level $j$.

Likelihood Ratio Tests
$t.s. = 2(l_1 - l_0)$
• Used for comparing two nested models.
• $H_0$: The null model is adequate.
• $H_1$: The reference model is better.
• Reject $H_0$ in favor of $H_1$ at the $\alpha$ significance level if $t.s.$ exceeds the value from the test distribution.

Testing | REML/ML | Test Distribution
$d$ fixed effects | ML | $\chi_d^2$
$d$ residual covariance parameters | REML | $\chi_d^2$
A lone random effect | REML | $0.5\chi_1^2$
One of multiple random effects | REML | 50-50 mixture of $\chi_{d-1}^2$ and $\chi_d^2$

Alternative Tests
• $t$-test
$t.s. = \dfrac{\hat{\beta}_j}{se\big(\hat{\beta}_j\big)}$
o Used to test the hypothesis $H_0{:}\ \beta_j = 0$ versus $H_1{:}\ \beta_j \neq 0$.
o Does not follow an exact $t$-distribution. Degrees of freedom are calculated using a computer.
• $F$-test
o Used to test the hypothesis $H_0{:}\ \mathbf{L}\boldsymbol{\beta} = \mathbf{0}$ versus $H_1{:}\ \mathbf{L}\boldsymbol{\beta} \neq \mathbf{0}$, where $\mathbf{L}$ is some matrix that encodes the fixed effects being tested.
o Follows an approximate $F$-distribution with $d_1$ numerator degrees of freedom and $d_2$ denominator degrees of freedom.
§ $d_1$ = number of parameters being tested.
§ $d_2$ is usually approximated with a computer.
o Type I $F$-test: conditional on only the effects listed before the one being tested (sequential).
o Type III $F$-test: conditional on all other fixed effects.
o Kenward-Roger Method: corrects for bias in the estimation of $d_2$ by inflating the marginal covariance matrix. This correction matters more the smaller the sample size.
• Omnibus Wald Test
o Used to test the hypothesis $H_0{:}\ \mathbf{L}\boldsymbol{\beta} = \mathbf{0}$ versus $H_1{:}\ \mathbf{L}\boldsymbol{\beta} \neq \mathbf{0}$.
o Difficult to calculate the test statistic, but it follows a chi-square distribution with degrees of freedom equal to the number of parameters being tested.
• Wald $z$-test
o Can be used to test the significance of a covariance parameter.
o Not well suited for LMMs.

Information Criterion
$\text{AIC} = -2\,l\big(\hat{\boldsymbol{\theta}};\,\mathbf{y}\big) + 2p$
$\text{BIC} = -2\,l\big(\hat{\boldsymbol{\theta}};\,\mathbf{y}\big) + (\ln n)\,p$
• $p$ is the number of parameters in the model (fixed effects and covariance parameters).
• $n$ is the number of observations.
• $l\big(\hat{\boldsymbol{\theta}};\,\mathbf{y}\big)$ is the log-likelihood of the observed data in $\mathbf{y}$ under the REML estimates of the parameters, $\hat{\boldsymbol{\theta}}$.

The Top-Down Strategy
Start with a complex model and reduce it to a simpler model.
1. Build a model with a "loaded" mean structure.
2. Select a structure for the random effects.
3. Select a residual error covariance structure.
4. Reduce the model by removing non-significant fixed effects.

The Step-Up Strategy
More commonly utilized for constructing models in hierarchical form. Start with a simple model and gradually add terms.
1. Build a "means-only" model.
2. Check the random intercepts.
3. Add the Level 1 covariates and related Level 2 random coefficients.
4. Add the Level 2 covariates and related Level 3 random coefficients.
5. Repeat Step 4 if necessary for any Level 3 covariates.
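A minimal sketch, with made-up variance estimates, of the EBLUP shrinkage mean and the AIC/BIC definitions given above:

```python
import numpy as np

# Hypothetical variance components and group data for level j of a random factor.
sigma2 = 4.0           # estimated residual variance
sigma2_int = 1.0       # estimated random-intercept variance
mu_hat = 10.0          # estimated marginal (unconditional) mean

y_j = np.array([12.0, 14.0, 13.0])    # observations for level j
n_j = len(y_j)

k = sigma2 / sigma2_int                # Buhlmann-style k
Z_j = n_j / (n_j + k)                  # credibility factor for level j
shrinkage_mean = Z_j * y_j.mean() + (1 - Z_j) * mu_hat
print(Z_j, shrinkage_mean)             # 3/7 and roughly 11.29

def aic_bic(loglik, p, n):
    """AIC and BIC from a log-likelihood, p parameters, n observations."""
    return -2 * loglik + 2 * p, -2 * loglik + np.log(n) * p
```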
LMM Diagnostics and Other Issues

Residual Diagnostics
• Conditional Residual – The difference between the observed response and the conditional predicted value.
$\hat{\boldsymbol{\epsilon}}_i = \mathbf{y}_i - \mathbf{X}_i\hat{\boldsymbol{\beta}} - \mathbf{Z}_i\hat{\mathbf{u}}_i$
• Marginal Residual – The difference between the observed response and the unconditional predicted value.
$\hat{\boldsymbol{\epsilon}}_i^* = \mathbf{y}_i - \mathbf{X}_i\hat{\boldsymbol{\beta}}$
• Standardized Residual – Scaling a residual by dividing by its true standard deviation. Denoted as $\hat{\epsilon}_{i,j}^{\,\text{std}}$.
• Studentized Residual – Scaling a residual by dividing by its estimated standard deviation. Denoted as $\hat{\epsilon}_{i,j}^{\,\text{stu}}$.
o Internal Studentization – The estimate of the standard deviation includes the observation the residual corresponds to.
o External Studentization – The estimate of the standard deviation does not include the observation the residual corresponds to.
• Pearson Residual – Scaling a residual by dividing by the estimated standard deviation of the response. Denoted as $\hat{\epsilon}_{i,j}^{\,p}$.

Potential Issues with Residuals
• Residuals with non-zero averages
• Heteroscedasticity
• Non-normal errors
• Outliers

Other Diagnostics
• *Influence Diagnostics – Techniques used to identify the influence an observation or set of observations have on the response, or on the parameter estimates in $\boldsymbol{\beta}$ and $\boldsymbol{\theta}$.
• Random Effect Diagnostics – Diagnose random effects by looking at the EBLUPs.
o EBLUPs do not have to follow the true distribution of the random effects, so checking them for normality is not needed.
o Focus on identifying potential outliers, as an unusually small or large EBLUP could point toward an abnormality within the corresponding group.
• Observed vs. Predicted Values – Plot the observed response values against the conditional predicted values to verify a model's accuracy.
o We hope to see a roughly linear relationship between observed and predicted values. If these values are not similar, our model may not be adequate.

Aliasing
When there is ambiguity in the specification of a parametric model that would lead to multiple possible sets of parameters that each imply identical or indistinguishable models.
• Nonestimability, a result of aliasing, implies that infinitely many sets of parameters would lead to the same predicted values.
• Intrinsic Aliasing – Aliasing due to a model's formula specification. Sometimes referred to as "nonidentifiability" or "overparameterization".
• Extrinsic Aliasing – Aliasing due to characteristics of the dataset.

Missing Data
• LMMs are better at handling datasets that have different-sized groups or missing observations than alternatives such as repeated-measures ANOVA.
• LMMs assume that any unobserved data is missing at random, meaning that the probability of having missing data on a given variable may depend on other observed data, but cannot depend on the data that would have been observed.

Centering Covariates
• Grand Mean Centering – The overall mean of a covariate is subtracted from each observation.
o Changes the interpretation of the intercept, but not the corresponding coefficient.
• Group Mean Centering – The mean covariate value for a higher-level cluster or group is subtracted from each observation.
o Changes the interpretation of the intercept and the corresponding coefficient.

Crossed Random Factors
A model with crossed random factors has multiple random factors whose levels do not have a specific nesting structure. This slightly changes the way the model is specified, the way parameters are estimated, and the form of the implied marginal covariance matrix.
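A short sketch of the conditional and marginal residual definitions above; the design matrices and estimates for a single subject are illustrative only:

```python
import numpy as np

# Hypothetical subject i: 3 observations, intercept + one covariate, random intercept.
X_i = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]])   # fixed-effects design
Z_i = np.ones((3, 1))                                   # random-intercept design
beta_hat = np.array([2.0, 0.8])                         # estimated fixed effects
u_hat_i = np.array([0.6])                               # EBLUP for this subject
y_i = np.array([3.1, 3.9, 4.8])                         # observed responses

marginal_resid = y_i - X_i @ beta_hat                   # y_i - X_i beta_hat
conditional_resid = marginal_resid - Z_i @ u_hat_i      # also subtract Z_i u_hat_i
print(marginal_resid, conditional_resid)
```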
Name | Parameter(s) | Description | Diagnostic Type
Likelihood Distance | $\boldsymbol{\psi}$ | Change in ML log-likelihood for all data with $\boldsymbol{\psi}$ estimated using all data vs. reduced data | Overall influence
Restricted Likelihood Distance | $\boldsymbol{\psi}$ | Change in REML log-likelihood for all data with $\boldsymbol{\psi}$ estimated using all data vs. reduced data | Overall influence
Cook's Distance | $\boldsymbol{\beta}$ | Scaled change in estimated $\boldsymbol{\beta}$ vector | Change in parameter estimates
Cook's Distance | $\boldsymbol{\theta}$ | Scaled change in estimated $\boldsymbol{\theta}$ vector | Change in parameter estimates
Multivariate DFITS Statistic | $\boldsymbol{\beta}$ | Scaled change in estimated $\boldsymbol{\beta}$ vector using the "externalized" $\mathrm{Var}[\hat{\boldsymbol{\beta}}]$ | Change in parameter estimates
Multivariate DFITS Statistic | $\boldsymbol{\theta}$ | Scaled change in estimated $\boldsymbol{\theta}$ vector using the "externalized" $\mathrm{Var}[\hat{\boldsymbol{\theta}}]$ | Change in parameter estimates
Covariance Ratio | $\boldsymbol{\beta}$ | Change in precision of estimated $\boldsymbol{\beta}$ vector based on the determinant of $\mathrm{Var}[\hat{\boldsymbol{\beta}}]$ | Change in precision of parameter estimates
Covariance Ratio | $\boldsymbol{\theta}$ | Change in precision of estimated $\boldsymbol{\theta}$ vector based on the determinant of $\mathrm{Var}[\hat{\boldsymbol{\theta}}]$ | Change in precision of parameter estimates
Predicted Residual Error Sum of Squares (PRESS) Statistic | N/A | PRESS residuals calculated by deleting observations in $u$ | Sum of squared PRESS residuals
*More information on Influence Diagnostics
STATISTICAL LEARNING

Overview and Prerequisites

Types of Variables
Response – A variable of primary interest
Explanatory – A variable used to study the response variable
Count – A quantitative variable usually valid on non-negative integers
Continuous – A real-valued quantitative variable
Nominal – A categorical/qualitative variable having categories without a meaningful or logical order
Ordinal – A categorical/qualitative variable having categories with a meaningful or logical order

Notation
$y$, $Y$ – Response variable
$x$, $X$ – Explanatory variable
Subscript $i$ – Index for observations
$n$ – No. of observations
Subscript $j$ – Index for variables except the response
$p$ – No. of variables except the response
$\mathbf{A}'$ – Transpose of matrix $\mathbf{A}$
$\mathbf{A}^{-1}$ – Inverse of matrix $\mathbf{A}$
$\varepsilon$ – Error term
$\hat{y}$, $\hat{Y}$, $\hat{f}(x)$ – Estimate/Estimator of $f(x)$

Regression Problems
$Y = f(x_1, \dots, x_p) + \varepsilon$ where $\mathrm{E}[\varepsilon] = 0$, so $\mathrm{E}[Y] = f(x_1, \dots, x_p)$
Test MSE $= \mathrm{E}\big[(Y - \hat{Y})^2\big]$, which can be estimated using $\dfrac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n}$
For fixed inputs $x_1, \dots, x_p$, the test MSE is
$\underbrace{\mathrm{Var}\big[\hat{f}(x_1, \dots, x_p)\big] + \big(\mathrm{Bias}\big[\hat{f}(x_1, \dots, x_p)\big]\big)^2}_{\text{reducible error}} + \underbrace{\mathrm{Var}[\varepsilon]}_{\text{irreducible error}}$

Classification Problems
Test Error Rate $= \mathrm{E}\big[I\big(Y \neq \hat{Y}\big)\big]$, which can be estimated using $\dfrac{\sum_{i=1}^{n} I(y_i \neq \hat{y}_i)}{n}$
Bayes Classifier:
$f(x_1, \dots, x_p) = \arg\max_{c} \Pr\big(Y = c \mid X_1 = x_1, \dots, X_p = x_p\big)$

Key Ideas on Model Accuracy
• As flexibility increases, the training MSE (or error rate) decreases, but the test MSE (or error rate) follows a U-shaped pattern.
• Low flexibility leads to a method with low variance and high bias; high flexibility leads to a method with high variance and low bias.

Validation Set
• Randomly splits all available observations into two groups: the training set and the validation set.
• Only the observations in the training set are used to attain the fitted model, and those in the validation set are used to estimate the test MSE.

$k$-fold Cross-Validation
1. Randomly divide all available observations into $k$ folds.
2. For $v = 1, \dots, k$, obtain the $v$th fit by training with all observations except those in the $v$th fold.
3. For $v = 1, \dots, k$, use $\hat{y}$ from the $v$th fit to calculate a test MSE estimate with the observations in the $v$th fold.
4. To calculate the CV error, average the $k$ test MSE estimates from the previous step.

Leave-one-out Cross-Validation (LOOCV)
• Calculate the LOOCV error as a special case of $k$-fold cross-validation where $k = n$.

Key Ideas on Cross-Validation
• With respect to bias, LOOCV < $k$-fold CV < Validation Set.
• With respect to variance, LOOCV > $k$-fold CV > Validation Set.

Standardizing Variables
• A centered variable is the result of subtracting the sample mean from a variable.
• A scaled variable is the result of dividing a variable by its sample standard deviation.
• A standardized variable is the result of first centering a variable, then scaling it.
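A minimal sketch of the $k$-fold cross-validation steps above, using a simple least-squares fit as the model and simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=60)
y = 2.0 + 0.7 * x + rng.normal(scale=1.0, size=60)

k = 5
folds = np.array_split(rng.permutation(60), k)   # step 1: random fold assignment

mse_estimates = []
for v in range(k):
    test_idx = folds[v]
    train_idx = np.concatenate([folds[u] for u in range(k) if u != v])

    # step 2: train on everything except fold v (here, simple linear regression)
    b1, b0 = np.polyfit(x[train_idx], y[train_idx], deg=1)

    # step 3: test MSE estimate from fold v
    y_hat = b0 + b1 * x[test_idx]
    mse_estimates.append(np.mean((y[test_idx] - y_hat) ** 2))

cv_error = np.mean(mse_estimates)                # step 4: average the k estimates
print(cv_error)
```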
Contrasting Statistical Learning Elements
$k$-Nearest Neighbors (KNN)

Algorithm
1. Let the observation having inputs $x_1, \dots, x_p$ be the center of the neighborhood.
2. Identify the $k$ training observations closest to the center.
3. Predict the response as the average response (regression) or the most frequent category (classification) among those $k$ neighbors.

Decision Trees

Notation
$R$ – Region of predictor space
$n_m$ – No. of observations in node $m$
$n_{m,c}$ – No. of category $c$ observations in node $m$

Cost Complexity Pruning
Regression: Minimize
$\sum_{m=1}^{|T|}\sum_{i:\,\mathbf{x}_i \in R_m}\big(y_i - \bar{y}_{R_m}\big)^2 + \lambda|T|$
where $|T|$ is the number of terminal nodes of tree $T$ and $\bar{y}_{R_m}$ is the mean response in region $R_m$.
Ensemble Methods

Bagging
1. Create $b$ bootstrap samples from the original training dataset.
2. Construct a decision tree for each bootstrap sample using recursive binary splitting.
3. Predict the response of a new observation by averaging the predictions (regression trees) or by using the most frequent category (classification trees) across all $b$ trees.

Properties
• Increasing $b$ does not cause overfitting.
• Bagging reduces variance.
• Out-of-bag error is a valid estimate of test error.

Random Forests
1. Create $b$ bootstrap samples from the original training dataset.
2. Construct a decision tree for each bootstrap sample using recursive binary splitting. At each split, a random subset of $k$ variables is considered.
3. Predict the response of a new observation by averaging the predictions (regression trees) or by using the most frequent category (classification trees) across all $b$ trees.

Properties
• Bagging is a special case of random forests.
• Increasing $b$ does not cause overfitting.
• Decreasing $k$ reduces the correlation between predictions.

Boosting
Let $z_1$ be the actual response variable, $y$.
1. For $k = 1, 2, \dots, b$:
• Use recursive binary splitting to fit a tree with $d$ splits to the data with $z_k$ as the response.
• Update $z_k$ by subtracting $\lambda \cdot \hat{f}_k(\mathbf{x})$, i.e., let $z_{k+1} = z_k - \lambda \cdot \hat{f}_k(\mathbf{x})$.
2. Calculate the boosted model prediction as $\hat{f}(\mathbf{x}) = \sum_{k=1}^{b} \lambda \cdot \hat{f}_k(\mathbf{x})$.

Properties
• Increasing $b$ can cause overfitting.
• Boosting reduces bias.
• $d$ controls the complexity of the boosted model.
• $\lambda$ controls the rate at which boosting learns.

Bayesian Additive Regression Trees (BART)
Let $\hat{f}_k^{(u)}(\mathbf{x})$ be the prediction at $\mathbf{x}$ for the $k$th tree in the $u$th iteration, for $k = 1, 2, \dots, b$ and $u = 1, 2, \dots, v$.
1. Initiate by letting $\hat{f}_k^{(1)}(\mathbf{x}) = \dfrac{\sum_{i=1}^{n} y_i}{nb}$ for $k = 1, 2, \dots, b$.
2. Calculate $\hat{f}^{(1)}(\mathbf{x}) = \sum_{k=1}^{b} \hat{f}_k^{(1)}(\mathbf{x}) = \bar{y}$.
3. For $u = 2, 3, \dots, v$:
a) For $k = 1, 2, \dots, b$:
i. For $i = 1, 2, \dots, n$, calculate the current partial residual, $r_i = y_i - \sum_{k' < k} \hat{f}_{k'}^{(u)}(\mathbf{x}_i) - \sum_{k' > k} \hat{f}_{k'}^{(u-1)}(\mathbf{x}_i)$.
ii. Fit a new tree, $\hat{f}_k^{(u)}(\mathbf{x})$, to the partial residuals by randomly perturbing the $k$th tree from the previous iteration, $\hat{f}_k^{(u-1)}(\mathbf{x})$.
b) Calculate $\hat{f}^{(u)}(\mathbf{x}) = \sum_{k=1}^{b} \hat{f}_k^{(u)}(\mathbf{x})$.
4. Calculate the mean after $t$ burn-in samples, $\hat{f}(\mathbf{x}) = \dfrac{\sum_{u=t+1}^{v} \hat{f}^{(u)}(\mathbf{x})}{v - t}$.

Properties
• Like bagging and random forests, BART incorporates randomness.
• Like boosting, BART sequentially builds trees to capture information not captured by previous trees.

Principal Components Analysis (PCA)

Notation
$z$, $Z$ – Principal component (score)
$m$ – Index for principal components
$\phi$ – Principal component loading
$x$, $X$ – Centered explanatory variable

Principal Components
$z_m = \sum_{j=1}^{p} \phi_{j,m}\, x_j, \qquad z_{i,m} = \sum_{j=1}^{p} \phi_{j,m}\, x_{i,j}$
• $\sum_{j=1}^{p} \phi_{j,m}^2 = 1$
• $\sum_{j=1}^{p} \phi_{j,m}\,\phi_{j,u} = 0,\ m \neq u$

Proportion of Variance Explained (PVE)
$\sum_{j=1}^{p} s_{x_j}^2 = \dfrac{1}{n}\sum_{j=1}^{p}\sum_{i=1}^{n} x_{i,j}^2$
$s_{z_m}^2 = \dfrac{1}{n}\sum_{i=1}^{n} z_{i,m}^2$
$\text{PVE}_m = \dfrac{s_{z_m}^2}{\sum_{j=1}^{p} s_{x_j}^2}$
• The variance explained by each subsequent principal component is always less than the variance explained by the previous principal component.
• The total variance is the sum of the variance explained by the first $k$ principal components and the MSE of the $k$-dimensional approximation.

Key Ideas
• All principal components are uncorrelated with one another.
• A dataset has $\min(n - 1, p)$ distinct principal components.
• The first $k$ principal component scores and loadings approximate the original dataset, $x_{i,j} \approx \sum_{m=1}^{k} z_{i,m}\,\phi_{j,m}$.
• Principal components are low-dimensional surfaces in $p$-dimensional space that are closest to the observations.
• Scaling has a significant effect on the result of PCA.
• A scree plot can be used to determine the number of principal components.
• Each principal component loading vector is unique up to a sign flip.
• PCA is most useful when multicollinearity is present in the features.
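A minimal sketch of principal component scores, loadings, and PVE computed from scratch on a made-up centered data matrix (via the singular value decomposition):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
X[:, 2] = 0.8 * X[:, 0] + 0.2 * rng.normal(size=50)   # induce correlation

Xc = X - X.mean(axis=0)                  # centered explanatory variables
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

loadings = Vt.T                          # columns are phi_1, ..., phi_p
scores = Xc @ loadings                   # z_{i,m} = sum_j phi_{j,m} x_{i,j}

var_per_pc = (scores ** 2).sum(axis=0) / len(Xc)   # s^2_{z_m}
total_var = (Xc ** 2).sum() / len(Xc)              # sum of s^2_{x_j}
pve = var_per_pc / total_var
print(pve, pve.sum())                    # PVEs are decreasing and sum to 1
```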
Cluster Analysis

Notation
$C$ – Cluster containing indices
$W(C)$ – Within-cluster variation of cluster
$|C|$ – No. of observations in cluster

$k$-Means Clustering
1. Randomly assign a cluster to each observation. This serves as the initial cluster assignments.
2. Calculate the centroid of each cluster.
3. For each observation, identify the closest centroid and reassign to that cluster.
4. Repeat steps 2 and 3 until the cluster assignments stop changing.

$W(C_u) = \dfrac{1}{|C_u|}\sum_{i,i' \in C_u}\sum_{j=1}^{p}\big(x_{i,j} - x_{i',j}\big)^2 = 2\sum_{i \in C_u}\sum_{j=1}^{p}\big(x_{i,j} - \bar{x}_{u,j}\big)^2$

Hierarchical Clustering
1. Select the dissimilarity measure and linkage to be used. Treat each observation as its own cluster.
2. For $k = n, n - 1, \dots, 2$:
• Compute the inter-cluster dissimilarity between all $k$ clusters.
• Examine all $\binom{k}{2}$ pairwise dissimilarities. The two clusters with the lowest inter-cluster dissimilarity are fused. The dissimilarity indicates the height in the dendrogram at which these two clusters join.

Linkage | Inter-cluster Dissimilarity
Complete | The largest dissimilarity
Single | The smallest dissimilarity
Average | The arithmetic mean
Centroid | The dissimilarity between the cluster centroids

Key Ideas
• For $k$-means clustering, the algorithm needs to be repeated for each $k$.
• For hierarchical clustering, the algorithm only needs to be performed once for any number of clusters.
• The result of clustering depends on many parameters, such as:
o Choice of $k$ in $k$-means clustering.
o Choice of number of clusters, linkage, and dissimilarity measure in hierarchical clustering.
o Choice to standardize variables.
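A minimal sketch of the $k$-means algorithm listed above, run on simulated two-dimensional data:

```python
import numpy as np

def k_means(X, k, seed=0):
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(X))        # step 1: random clusters
    while True:
        # step 2: centroid of each cluster (reseed an empty cluster randomly)
        centroids = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c)
            else X[rng.integers(len(X))]
            for c in range(k)
        ])
        # step 3: reassign each observation to the closest centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):       # step 4: stop when stable
            return labels, centroids
        labels = new_labels

X = np.vstack([np.random.default_rng(1).normal(0, 1, (20, 2)),
               np.random.default_rng(2).normal(5, 1, (20, 2))])
labels, centroids = k_means(X, k=2)
print(centroids)
```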
Neural Network
Activation Functions
Sigmoid:
$g(z) = \dfrac{e^z}{1 + e^z}$
Rectified linear unit (ReLU):
$g(z) = z_+ = \begin{cases} 0, & z < 0 \\ z, & \text{otherwise} \end{cases}$
Softmax:
$g(z_c) = \dfrac{e^{z_c}}{\sum_{m=1}^{K} e^{z_m}}$

Estimating Parameters
• Coefficients are also called weights. Intercepts are also called biases.
• Estimated to minimize squared-error loss for a regression problem and to minimize cross entropy for a classification problem.
o Squared-error loss: $\sum_{i=1}^{n}\big[y_i - f(\mathbf{x}_i)\big]^2$
o Cross entropy: $-\sum_{i=1}^{n}\sum_{c=1}^{K} y_{i,c}\,\ln\big[f_c(\mathbf{x}_i)\big]$
• Slow learning, using gradient descent:
1. Start with an initial estimate $\hat{\boldsymbol{\theta}}^{(0)}$ for $\boldsymbol{\theta}$, and set $t = 0$.
2. Iterate until the objective $R(\boldsymbol{\theta})$ fails to decrease:
a. Set $\hat{\boldsymbol{\theta}}^{(t+1)} \leftarrow \hat{\boldsymbol{\theta}}^{(t)} - \rho \cdot R'\big(\hat{\boldsymbol{\theta}}^{(t)}\big)$.
b. Set $t \leftarrow t + 1$.
• Stochastic gradient descent is gradient descent, but instead of all $n$ observations contributing to the calculation of the gradient, only a sampled minibatch does.
• Regularization such as lasso, ridge, early stopping, and dropout learning mitigates the risk of overfitting.

Performance Measures
Lift measures a model's ability to avoid adverse selection by accurately determining an actuarially fair premium rate for each insured.

Actual vs. Predicted Plots
• Plots the actual response variable against the predicted response variable for each model.
• The better model is closer to the diagonal line.

Simple Quantile Plots
• Plots the average actual response and the average predicted response for each quantile for each model.
• The better model is better at predicting the actual response in each quantile, has fewer reversals, and has a larger vertical distance between the first and last quantiles.

Double Lift Charts
• Plots the average actual response and the average predicted response for each quantile for each model in one chart.
• The better model is better at predicting the actual response in each quantile.

Loss Ratio Charts
• Plots the actual loss ratio for each quantile.
• Used to examine the efficacy of the current rating plan.
• If the new rating plan can distinguish between policies with low loss ratios and those with high loss ratios, the current rating plan is poor.

Lorenz Curves
• Plots the cumulative percentage of actual response against the cumulative percentage of exposures.
• The Gini index is twice the area between the Lorenz curve and the line of equality.
• The better model has a larger Gini index.

Receiver Operating Characteristic (ROC) Curves
• Plots the true positive rates (sensitivity) against the false positive rates (1 minus specificity) for different values of the discrimination threshold.
• Sensitivity, true positive rate, or hit rate is the percentage of positive observations with correct predictions.
• Specificity is the percentage of negative observations with correct predictions.
• Observations belong to exactly one of the following four groups: true positive, true negative, false positive, false negative.
• AUROC is the area under the ROC curve.
• The better model has a larger AUROC.
TIME SERIES WITH CONSTANT VARIANCE

Notation
$M_t$ – Trend
$S_t$ – Seasonal effect
$Z_t$ – Error term
$\gamma_k$ – Lag $k$ autocovariance function
$\rho_k$ – Lag $k$ autocorrelation function
$c_k$ – Lag $k$ sample autocovariance function

Time Series with Additive Seasonality
1. Estimate the seasonality component for each observation as $\hat{s}_t = x_t - \hat{m}_t$.
2. Calculate the average seasonality for each season, $\bar{s}_i$.
3. Adjust the averages so that they sum to 0, i.e., calculate each $\bar{s}_i^* = \bar{s}_i - \sum_{i=1}^{g} \bar{s}_i / g$, where $g$ is the number of seasons in a cycle.
4. Calculate the seasonally adjusted data as $x_t - \bar{s}_i^*$.

Time Series Models

White Noise
$\mathrm{E}[W_t] = 0$
$\mathrm{Var}[W_t] = \sigma_W^2$
$\gamma_k = 0,\ k \neq 0$

Random Walk
$X_t = X_{t-1} + W_t = \sum_{i=1}^{t} W_i$
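A minimal sketch simulating white noise and a random walk, and computing the lag-$k$ sample autocovariance $c_k$ for the white noise series:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
w = rng.normal(scale=1.0, size=n)      # white noise, Var[W_t] = 1
x = np.cumsum(w)                       # random walk: X_t = X_{t-1} + W_t

def sample_autocov(series, k):
    """Lag-k sample autocovariance c_k."""
    xbar = series.mean()
    return np.sum((series[:n - k] - xbar) * (series[k:] - xbar)) / n

print(sample_autocov(w, 0))   # close to sigma_W^2 = 1
print(sample_autocov(w, 5))   # close to 0 for white noise
```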
Autoregressive Models, AR($p$)
$X_t = \alpha_1 X_{t-1} + \alpha_2 X_{t-2} + \cdots + \alpha_p X_{t-p} + W_t$
$\theta_p(\mathbf{B}) \cdot X_t = W_t$

Stationary AR(1)
$X_t = \alpha X_{t-1} + W_t$
$\mathrm{E}[X_t] = 0$
$\mathrm{Var}[X_t] = \dfrac{\sigma_W^2}{1 - \alpha^2}$
$\gamma_k = \dfrac{\alpha^k \sigma_W^2}{1 - \alpha^2}$
$\rho_k = \alpha^k$
$|\alpha| < 1$

Stationary AR(2)
$X_t = \alpha_1 X_{t-1} + \alpha_2 X_{t-2} + W_t$
$\alpha_2 - \alpha_1 < 1$
$\alpha_2 + \alpha_1 < 1$
$|\alpha_2| < 1$

Moving Average Models, MA($q$)
$X_t = W_t + \beta_1 W_{t-1} + \cdots + \beta_q W_{t-q}$
$X_t = \phi_q(\mathbf{B}) \cdot W_t$
$\mathrm{E}[X_t] = 0$
$\mathrm{Var}[X_t] = \sigma_W^2 \sum_{i=0}^{q} \beta_i^2$ (with $\beta_0 = 1$)
$\gamma_k = \sigma_W^2 \sum_{i=0}^{q-k} \beta_i \beta_{i+k},\quad 0 \le k \le q$
$\rho_k = \begin{cases} 1, & k = 0 \\ \dfrac{\sum_{i=0}^{q-k} \beta_i \beta_{i+k}}{\sum_{i=0}^{q} \beta_i^2}, & 1 \le k \le q \\ 0, & k > q \end{cases}$
• An invertible MA($q$) model can be expressed as a stationary AR($\infty$) model.
• A stationary AR($p$) model can be expressed as an invertible MA($\infty$) model.

ARMA Models, ARMA($p$, $q$)
$X_t = \alpha_1 X_{t-1} + \cdots + \alpha_p X_{t-p} + W_t + \beta_1 W_{t-1} + \cdots + \beta_q W_{t-q}$
$\theta_p(\mathbf{B}) \cdot X_t = \phi_q(\mathbf{B}) \cdot W_t$
Here $\theta_p(\mathbf{B}) = 1 - \alpha_1\mathbf{B} - \cdots - \alpha_p\mathbf{B}^p$ and $\phi_q(\mathbf{B}) = 1 + \beta_1\mathbf{B} + \cdots + \beta_q\mathbf{B}^q$ are polynomials in the backward shift operator $\mathbf{B}$.

Stationary ARMA(1, 1)
$X_t = \alpha X_{t-1} + W_t + \beta W_{t-1}$
$\mathrm{E}[X_t] = 0$
$\mathrm{Var}[X_t] = \sigma_W^2\,\dfrac{1 + 2\alpha\beta + \beta^2}{1 - \alpha^2}$
$\gamma_k = \sigma_W^2\,\dfrac{(\alpha + \beta)(1 + \alpha\beta)\,\alpha^{k-1}}{1 - \alpha^2},\quad k > 0$
$\rho_k = \dfrac{\alpha^{k-1}(\alpha + \beta)(1 + \alpha\beta)}{1 + 2\alpha\beta + \beta^2},\quad k > 0$
$\rho_k = \alpha\,\rho_{k-1},\quad k \ge 2$

ARIMA Models, ARIMA($p$, $d$, $q$)
$\theta_p(\mathbf{B})\,(1 - \mathbf{B})^d \cdot X_t = \phi_q(\mathbf{B}) \cdot W_t$
• If $\nabla^d X_t = W_t$, then $X_t$ is I($d$).
• If $\nabla^d X_t$ is ARMA($p$, $q$), then $X_t$ is ARIMA($p$, $d$, $q$).
• ARIMA(0, $d$, $q$) = IMA($d$, $q$)
• ARIMA($p$, $d$, 0) = ARI($p$, $d$)

Seasonal ARIMA($p$, $d$, $q$)($P$, $D$, $Q$)$_s$
$\Theta_P(\mathbf{B}^s) \cdot \theta_p(\mathbf{B}) \cdot (1 - \mathbf{B}^s)^D \cdot (1 - \mathbf{B})^d \cdot X_t = \Phi_Q(\mathbf{B}^s) \cdot \phi_q(\mathbf{B}) \cdot W_t$

Time Series with Regression

Variance of Sample Mean
$\mathrm{Var}[\bar{X}] = \dfrac{\sigma^2}{n}\left[1 + 2\sum_{k=1}^{n-1}\left(1 - \dfrac{k}{n}\right)\rho_k\right]$

Harmonic Seasonal Model
$X_t = M_t + \sum_{j=1}^{\lfloor g/2 \rfloor}\left[\beta_{1,j}\sin\left(\dfrac{2\pi j t}{g}\right) + \beta_{2,j}\cos\left(\dfrac{2\pi j t}{g}\right)\right] + Z_t$

Correction Factors for Logged Models
• If $Z_t$ follows a Gaussian white noise process, use the lognormal correction factor:
$\mathrm{E}\big[e^{z_t}\big] = e^{\sigma^2/2}$
• For any $Z_t$, use the empirical correction factor:
$\mathrm{E}\big[e^{z_t}\big] = \dfrac{1}{n}\sum_{t=1}^{n} e^{z_t}$
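A minimal sketch simulating a stationary AR(1) series and comparing its sample autocorrelations with the theoretical $\rho_k = \alpha^k$ above:

```python
import numpy as np

rng = np.random.default_rng(5)
alpha, n = 0.7, 5000
w = rng.normal(size=n)

x = np.zeros(n)
for t in range(1, n):
    x[t] = alpha * x[t - 1] + w[t]      # X_t = alpha * X_{t-1} + W_t

def sample_acf(series, k):
    """Lag-k sample autocorrelation r_k = c_k / c_0."""
    xbar = series.mean()
    c0 = np.sum((series - xbar) ** 2) / len(series)
    ck = np.sum((series[:-k] - xbar) * (series[k:] - xbar)) / len(series)
    return ck / c0

for k in (1, 2, 3):
    print(k, sample_acf(x, k), alpha ** k)   # sample vs. theoretical rho_k
```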
© 2023 Coaching Actuaries. All Rights Reserved. www.coachingactuaries.com