w6 - Statistical Modelling

Generalized Linear Models

120
Agenda

• Motivation

• Exponential family

• Generalized linear models

• MLE of a generalized linear model

• Goodness of fit measures

• Comparison of models

• Extensions

121
. . . Agenda

• Two practitioners’ guides

▶ A Practitioner’s Guide to Generalized Linear Models,


https://www.casact.org/pubs/dpp/dpp04/04dpp1.pdf

▶ Generalized Linear Models for Insurance Rating,


https://www.casact.org/sites/default/files/2021-01/05-Goldburd-Khare-Tevet.pdf

122
The Problem

• Task: build up a statistical procedure relating

▶ a response variable Y

▶ to explanatory variables or covariates X = (X1 , . . . , Xp ),

▶ n (independent) observations for each variable, yi , xi = (xi1 , . . . , xip ), i = 1, . . . , n (Yi the corresponding rvs)

• Use the statistical model for

▶ predictions

▶ inference

• Linear model: its assumptions are too rigid


Linear Models

• Multivariate linear regression model:

▶ given Xi = xi ,

Yi = β0 + β1 xi1 + . . . + βp xip + εi , i = 1, . . . , n

where the error terms (εi ) are iid with εi ∼ N(0, σε2 )

▶ equivalently, given Xi = xi ,

Yi ∼ N(β0 + β1 xi1 + . . . + βp xip , σε2 )

where
E [Yi |Xi = xi ] = β0 + β1 xi1 + . . . + βp xip
and the Yi independent

124
. . . Linear Models

• Multivariate linear regression model: defined by

▶ [LM1] random component: conditionally on Xi = xi ,

Yi ∼ N(µi , σε2 )

and independent

▶ [LM2] systematic component: the p covariates are combined to form a


linear predictor
ηi = β0 + β1 xi1 + . . . + βp xip

▶ [LM3] link function connecting the random and the systematic


components: linear model ≡ identity function:

E [Yi ] = µi = g −1 (ηi ) = ηi

where g =Id

125
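As a numerical illustration of [LM1]–[LM3] (our sketch, not part of the slides): simulate data from the linear model with one covariate and recover the β's by ordinary least squares. The data and parameter values below are invented.

```python
import random
import math

# Simulate Y_i = beta0 + beta1 * x_i + eps_i with eps_i ~ N(0, sigma^2)  [LM1],
# linear predictor eta_i = beta0 + beta1 * x_i  [LM2], identity link  [LM3];
# then recover (beta0, beta1) by closed-form OLS for a single covariate.

random.seed(1)
beta0, beta1, sigma = 2.0, 0.5, 0.1
x = [i / 10 for i in range(100)]
y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]

xbar = sum(x) / len(x)
ybar = sum(y) / len(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1_hat = sxy / sxx              # slope estimate
b0_hat = ybar - b1_hat * xbar   # intercept estimate

print(round(b0_hat, 1), round(b1_hat, 1))
```

With a small error variance, the estimates land close to the true (2.0, 0.5).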
Shortcomings of the Linear Model

• Normality assumption (1) ⇝ often not justified in practice

▶ errors may present skewness and/or kurtosis

• Normality assumption (2)

▶ response variables can take any real value, inconsistent with

▶ count data (number of claims, number of deaths, . . . )

▶ positive variables (claim sizes, lifetimes, . . . )

▶ bounded variables (relative frequencies, death or claim probabilities, . . . )

126
. . . Shortcomings of the Linear Model

• Some models can be linearized

▶ Example.

Yi = β0 e^(β1 Xi) εi ⇝ log Yi = log β0 + β1 Xi + log εi

▶ linearization may not be possible or meaningful

• The variance of the response variable is constant (homoscedasticity)

▶ counter-intuitive if the variable is positive and the mean goes to 0

▶ the variance should be related to the mean

127
Generalized Linear Models

• GLM: a convenient framework where [LM1] and [LM3] are relaxed

▶ [GLM1] random component: conditionally on Xi = xi , Yi has a distribution in the exponential family, and the Yi are independent

▶ [GLM2] systematic component: the p covariates are combined to form


the linear predictor

ηi = β0 + β1 xi1 + . . . + βp xip

▶ [GLM3] link function connecting the random and systematic


components: any differentiable and monotonic function such that

E [Yi ] = µi = g −1 (ηi ) or g (µi ) = ηi

128
Exponential Family

• A parametric family of distributions is a set

{fθ (y )| θ ∈ Θ}

where for each θ ∈ Θ, fθ is a pdf or pmf

• The rv Y belongs to the exponential family if its pdf (continuous case) or pmf (discrete case) can be written as

fY (y ; θ, ϕ) = exp( (yθ − b(θ)) / a(ϕ) + c(y , ϕ) )

where a, b and c are known functions and θ, ϕ are parameters

129
Exponential Family

• Two parameters

▶ θ: natural (or canonical) parameter, related to the location of the


distribution ⇝ the covariates will enter into θ

▶ ϕ: related to the scale parameter ⇝ ϕ enters in the variance only

• Further

▶ the distribution may depend only on the canonical parameter (scale


known)

▶ the variance depends on both θ and ϕ ⇝ it depends on the mean

▶ the distribution is determined by mean and (possibly) variance


. . . Exponential Family
• Exercise. Check that the N(µ, σ²), µ ∈ R, σ² > 0 is an exponential family; identify θ, ϕ, a, b, c.

(2πσ²)^(−1/2) e^(−(y−µ)²/(2σ²)) = exp( (yµ − µ²/2)/σ² − (1/2)(y²/σ² + log(2πσ²)) )

so that
θ = µ, ϕ = σ², a(ϕ) = ϕ = σ²
b(θ) = θ²/2, c(y , ϕ) = −(1/2)(y²/ϕ + log(2πϕ))

▶ another possible choice is θ = 2µ, b(θ) = θ²/4, ϕ = σ², a(ϕ) = 2ϕ ⇝ parametrization is not unique
▶ note that E [Y ] = µ = b′(θ), VAR[Y ] = σ² = b′′(θ)a(ϕ)
131
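The algebra above can be checked numerically; a small sketch (not from the slides, function names are ours) comparing the N(µ, σ²) density with its exponential-family form, using θ = µ, ϕ = σ², a(ϕ) = ϕ, b(θ) = θ²/2, c(y, ϕ) = −(y²/ϕ + log(2πϕ))/2:

```python
import math

def normal_pdf(y, mu, sigma2):
    # standard N(mu, sigma^2) density
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def exp_family_pdf(y, theta, phi):
    # exponential-family form exp((y*theta - b(theta))/a(phi) + c(y, phi))
    a = phi
    b = theta ** 2 / 2
    c = -(y ** 2 / phi + math.log(2 * math.pi * phi)) / 2
    return math.exp((y * theta - b) / a + c)

mu, sigma2 = 1.3, 0.7
for y in (-2.0, 0.0, 0.5, 3.1):
    assert abs(normal_pdf(y, mu, sigma2) - exp_family_pdf(y, mu, sigma2)) < 1e-12
print("densities match")
```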
. . . Exponential Family

• In general for any distribution in the exponential family

E [Y ] = µ = b ′ (θ), VAR[Y ] = b ′′ (θ)a(ϕ)

▶ the mean only depends on θ:

θ = (b ′ )−1 (E [Y ]) = (b ′ )−1 (µ)

⇝ prediction essentially depends on θ

▶ the variance depends on the mean, through the variance function


V (µ) = b ′′ (θ), and on the scale parameter ϕ:

VAR[Y ] = V (µ)a(ϕ)

132
. . . Exponential Family

• Exercise. Check that Poisson(µ), µ > 0 is an exponential family; identify θ, ϕ, a, b, c; for non-negative integer y

µ^y e^(−µ) / y! = exp (y log µ − µ − log(y !))

so that
θ = log µ, b(θ) = µ = e^θ , ϕ = 1, a(ϕ) = 1, c(y , ϕ) = − log(y !)

▶ again b′(θ) = e^θ = µ = E [Y ] and the variance function is V (µ) = b′′(θ) = e^θ = µ
▶ the variance only depends on the variance function; the scale parameter is ϕ = 1 ⇝ no need to estimate ϕ

133
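The identities E[Y] = b′(θ) and V(µ) = b′′(θ) for the Poisson can be verified numerically; a sketch (our illustration, with invented values), differentiating b(θ) = e^θ and comparing with the mean computed directly from the pmf:

```python
import math

def b(theta):
    # cumulant function of the Poisson in exponential-family form
    return math.exp(theta)

def pmf(y, mu):
    # Poisson pmf: mu^y * e^(-mu) / y!
    return mu ** y * math.exp(-mu) / math.factorial(y)

mu = 2.5
theta = math.log(mu)   # canonical parameter

# numerical derivative of b at theta: should equal mu = E[Y]
h = 1e-6
b_prime = (b(theta + h) - b(theta - h)) / (2 * h)

# mean computed directly from the pmf (truncated sum; the tail is negligible)
mean = sum(y * pmf(y, mu) for y in range(60))

assert abs(b_prime - mu) < 1e-6
assert abs(mean - mu) < 1e-9
print("E[Y] = b'(theta) =", round(b_prime, 4))
```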
. . . Exponential Family

• Some examples
model          E [Y ]      θ                b(θ)             ϕ      a(ϕ)
N(µ, σ²)       µ           µ                θ²/2             σ²     ϕ
Gamma(α, λ)    µ = α/λ     −1/µ             − log(−θ)        1/α    ϕ
Poisson(µ)     µ           log µ            e^θ              1      1
Bernoulli(p)   µ = p       log(µ/(1−µ))     log(1 + e^θ)     1      1

• Many other distributions belong to the exponential family:


▶ binomial with fixed (known) number of trials
▶ binomial proportion
▶ inverse normal
▶ ...

134
. . . Exponential Family

• Not every family of distributions is an exponential family

• Distributions for which the support changes with the parameters

▶ Exercise. uniform over an interval (a, b)

▶ binomial where the number of trials is a parameter

• Weibull: fY (y ; c, γ) = cγ y^(γ−1) e^(−c y^γ) , y > 0, with c > 0, γ > 0

135
The Link Function

• Example. Exam pass rate; response variable

▶ Y = student pass/fail in an actuarial exam (categorical)

▶ map it to 0/1 ⇝ Bernoulli

▶ E [Y ] = µ = pass prob.

• Example. Exam pass rate; 3 covariates (p = 3)

▶ N: number of assignments submitted by the student (0, 1, 2, 3, 4)

▶ S: student’s mark on the mock exam (0 ≤ S ≤ 100)

▶ T : whether the student has attended tutorials or not (Yes or No, categorical)
. . . The Link Function

• Example. Exam pass rate; linear predictor is

η = αT + β1 N + β2 S

where αT = αYes or αT = αNo ⇝ not using baseline here!

• Need a link function transforming the value η into the mean µ = E [Y ]

g (µ) = η ⇔ µ = g −1 (η)

▶ in this example µ is a probability ⇝ 0 < g −1 (η) < 1 for all η


▶ for instance, the logit or log-odds function

η = g (µ) = log( µ/(1 − µ) ),   µ = g⁻¹(η) = e^η /(1 + e^η )

137
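The logit link and its inverse are easy to code; a minimal sketch (function names are ours) showing the round trip g⁻¹(g(µ)) = µ and that g⁻¹ always lands in (0, 1):

```python
import math

def logit(mu):
    # link: maps a probability mu in (0, 1) to the whole real line
    return math.log(mu / (1 - mu))

def inv_logit(eta):
    # inverse link: maps any real eta back into (0, 1)
    return math.exp(eta) / (1 + math.exp(eta))

# round trip and range checks
for mu in (0.05, 0.5, 0.95):
    assert abs(inv_logit(logit(mu)) - mu) < 1e-12
for eta in (-10.0, 0.0, 10.0):
    assert 0 < inv_logit(eta) < 1

print(inv_logit(0.0))  # eta = 0 corresponds to mu = 0.5
```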
. . . The Link Function

• Example. Pass rate example; to estimate the model, individual data (whether each student passed or not, and the corresponding covariates) are needed

▶ suppose ML (see later) gives the following estimates

αYes = −1.501, αNo = −3.196, β1 = 0.5459, β2 = 0.0251

▶ note αYes > αNo , β1 , β2 > 0, but β2 close to 0 ⇝ comments?

• equivalently, grouped data – pass frequency for groups of students


sharing the same value of covariates – could be used ⇝ Binomial
proportion distribution

138
. . . The Link Function

• Exercise. Given the above estimates,

1 what is the pass probability for a student who attends tutorials,


submits 3 assignments, and scores 60% on the mock exam?

2 how much would the probability change if the fourth assignment were submitted?

3 what is the highest pass probability for someone who does not attend
tutorials?

4 can anyone get a pass probability of 0 or 1? If not, what are the maximum and minimum probabilities?

• Prediction of Y : for instance, if µ̂ > 0.5 then ŷ = 1

139
. . . The Link Function
• Exercise.
1 use
η̂ = −1.501 + 0.5459 · 3 + 0.0251 · 60 = 1.6427
and so µ̂ = e^1.6427 /(1 + e^1.6427 ) = 84%
2 using N = 4, the same calculation gives η̂ = 2.1886 and µ̂ = 90%, so the pass probability increases by 6 percentage points
3 use
η̂ = −3.196 + 0.5459 · 4 + 0.0251 · 100 = 1.4976
and so µ̂ = 82%
4 min prob.:
η̂ = −3.196 + 0.5459 · 0 + 0.0251 · 0 = −3.196
and µ̂ = 4%; max prob.:
η̂ = −1.501 + 0.5459 · 4 + 0.0251 · 100 = 3.1926
and µ̂ = 96%
140
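The four calculations above can be reproduced mechanically; a small sketch (the function and variable names are ours, not the slides'):

```python
import math

# Fitted coefficients from the slide's pass-rate example
alpha = {"Yes": -1.501, "No": -3.196}   # tutorial-attendance intercepts
beta1, beta2 = 0.5459, 0.0251           # assignments (N) and mock-exam mark (S)

def pass_prob(tutorials, n_assignments, mock_score):
    # linear predictor, then inverse logit
    eta = alpha[tutorials] + beta1 * n_assignments + beta2 * mock_score
    return math.exp(eta) / (1 + math.exp(eta))

print(round(100 * pass_prob("Yes", 3, 60)))    # 1: 84
print(round(100 * pass_prob("Yes", 4, 60)))    # 2: 90
print(round(100 * pass_prob("No", 4, 100)))    # 3: 82
print(round(100 * pass_prob("No", 0, 0)))      # 4 min: 4
print(round(100 * pass_prob("Yes", 4, 100)))   # 4 max: 96
```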
. . . The Link Function

• In general, the link function relates the response mean µ = E [Y ] to


the linear predictor

η = β0 + β1 X1 + . . . + βp Xp = g (µ)

where g is the link function, assumed to be

▶ differentiable ⇝ smooth, for ML

▶ monotonic (increasing or decreasing) ⇝ parameter interpretation

• Inverting, g −1 inverse link function

µ = g −1 (η) = g −1 (β0 + β1 X1 + . . . + βp Xp )

141
. . . Link Function

• When linearizing a model (as in p. 127) the response is transformed; here the mean of the response is transformed

• The choice of the link function g is not unique, but g must be consistent with the range of µ = E [Y ]

• Canonical link function: guarantees that

θ = η = g (µ)

recalling that
µ = b′(θ) ⇝ θ = (b′)⁻¹(µ)
the canonical link function is

g (µ) = θ = (b ′ )−1 (µ)


. . . Link Function

• Some canonical link functions

model                 link function g(µ)             name
normal                g(µ) = θ = µ                   identity
Poisson               g(µ) = θ = log µ               log(arithm)
Binomial (fixed n)    g(µ) = θ = log( µ/(n − µ) )    logit (log-odds)
Gamma                 g(µ) = θ = 1/µ                 reciprocal∗

∗ actually in the Gamma case g(µ) = −1/µ, but the minus sign is dropped as it can be absorbed into the parameters

143
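Under the canonical link, θi = ηi, so the ML score equations take the clean form Xᵀ(y − µ) = 0. A minimal Newton-Raphson sketch (our illustration, with invented data) for a Poisson regression with the canonical log link and one covariate:

```python
import math

# With the canonical log link, theta_i = eta_i = b0 + b1 * x_i and the score
# equations reduce to sum(y_i - mu_i) = 0 and sum(x_i (y_i - mu_i)) = 0,
# where mu_i = exp(eta_i). Solve by Newton-Raphson (equivalent to IRLS).

x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [1, 1, 2, 4, 6, 11, 19]                 # hypothetical count responses

b0 = math.log(sum(y) / len(y))              # standard GLM starting value
b1 = 0.0
for _ in range(25):
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    # score vector X'(y - mu)
    s0 = sum(yi - mi for yi, mi in zip(y, mu))
    s1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
    # information matrix X' diag(mu) X (2x2)
    i00 = sum(mu)
    i01 = sum(xi * mi for xi, mi in zip(x, mu))
    i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
    det = i00 * i11 - i01 * i01
    # Newton step: beta <- beta + I^{-1} * score
    b0 += (i11 * s0 - i01 * s1) / det
    b1 += (i00 * s1 - i01 * s0) / det

mu = [math.exp(b0 + b1 * xi) for xi in x]
score = (sum(yi - mi for yi, mi in zip(y, mu)),
         sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu)))
print(round(b0, 3), round(b1, 3))           # score is ~0 at the MLE
```

The same iteration, with a weight matrix W depending on V(µ) and g′(µ), fits any GLM; the canonical link just makes W = diag(µ) here.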
