
18.650 – Fundamentals of Statistics

7. Generalized linear models

1/32
Linear model

A linear model assumes

$$Y \mid X = x \sim \mathcal{N}(\mu(x), \sigma^2 I),$$

and¹

$$\mathbb{E}(Y \mid X = x) = \mu(x) = x^\top \beta.$$

¹ Throughout we drop the boldface notation for vectors.

2/32
Components of a linear model

The two model components (that we are going to relax) are:

1. Random component: the response variable $Y$ is continuous and $Y \mid X = x$ is Gaussian with mean $\mu(x)$.

2. Regression function: $\mu(x) = x^\top \beta$.
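
As a point of reference before relaxing these assumptions, here is a minimal sketch of fitting this model by least squares on synthetic data (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 covariates
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(size=n)       # Y | X = x ~ N(x^T beta, 1)

beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)  # least squares = Gaussian MLE
print(beta_hat)                              # close to beta_true
```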

3/32
Kyphosis

The Kyphosis data consist of measurements on 81 children following corrective spinal surgery. The binary response variable, $Y$, indicates the presence or absence of a postoperative deformity. The three covariates are:

- $X^{(1)}$: age of the child in months,
- $X^{(2)}$: number of vertebrae involved in the operation, and
- $X^{(3)}$: start of the range of the vertebrae involved.

Write $X = (1, X^{(1)}, X^{(2)}, X^{(3)})^\top \in \mathbb{R}^4$.

4/32
Kyphosis

- The response variable is binary, so there is no choice: $Y \mid X = x$ is Bernoulli with expected value $\mu(x) = \mathbb{E}[Y \mid X = x] \in (0, 1)$.
- We cannot write $\mu(x) = x^\top \beta$, because the right-hand side ranges through all of $\mathbb{R}$.
- We need an invertible function $f$ such that $f(x^\top \beta) \in (0, 1)$, such as the sigmoid sketched below.
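
A minimal sketch of one such $f$, the sigmoid (the inverse of the logit link introduced later in these slides); the values are illustrative:

```python
import numpy as np

def sigmoid(t):
    """Map any real t into (0, 1); the inverse of the logit link."""
    return 1 / (1 + np.exp(-t))

t = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])   # possible values of x^T beta, unbounded
print(sigmoid(t))   # all in (0, 1), increasing and invertible
```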

5/32
Generalization

A generalized linear model (GLM) generalizes normal linear regression models in the following directions.

1. Random component:

$$Y \mid X = x \sim \text{some distribution}$$

(e.g. Bernoulli, exponential, Poisson)

2. Regression function:

$$g(\mu(x)) = x^\top \beta,$$

where $g$ is called the link function and $\mu(x) = \mathbb{E}(Y \mid X = x)$ is the conditional mean of $Y$.

6/32
Predator/Prey
Consider the following model for the number of prey $Y$ that a predator (a hawk) catches per day, given a number $X$ of prey (mice) in its hunting territory.

Random component: $Y > 0$, and the variance of the capture rate is known to be approximately equal to its expectation, so we propose the following model:

$$Y \mid X = x \sim \text{Poisson}(\mu(x)),$$

where $\mu(x) = \mathbb{E}[Y \mid X = x]$.

Regression function: we assume

$$\mu(x) = \frac{mx}{h + x}, \quad \text{for some unknown } m, h > 0,$$

where:

- $m$ is the maximum expected number of prey per day the predator can cope with, and
- $h$ is the number of prey such that $\mu(h) = m/2$.
7/32
The regression function $\mu(x)$ for $m = h = 10$

8/32
Example 2: Prey Capture Rate

Obviously $\mu(x)$ is not linear, but using the reciprocal link $g(x) = 1/x$, the right-hand side can be made linear in the parameters:

$$g(\mu(x)) = \frac{1}{\mu(x)} = \frac{h + x}{mx} = \beta_0 + \beta_1 \frac{1}{x},$$

with $\beta_0 = 1/m$ and $\beta_1 = h/m$.
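
A quick numerical check of this linearization, using the hypothetical values $m = h = 10$ from the preceding figure: $1/\mu(x)$ is exactly affine in $1/x$.

```python
import numpy as np

m, h = 10.0, 10.0                      # hypothetical values, as in the figure
x = np.linspace(1.0, 50.0, 200)
mu = m * x / (h + x)                   # the regression function

# fit 1/mu against 1/x: recovers beta_1 = h/m and beta_0 = 1/m exactly
slope, intercept = np.polyfit(1.0 / x, 1.0 / mu, deg=1)
print(slope, intercept)                # 1.0 (= h/m), 0.1 (= 1/m)
print(m * h / (h + h))                 # mu(h) = m/2 = 5.0
```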

9/32
Exponential Family

A family of distributions $\{\mathbb{P}_\theta : \theta \in \Theta\}$, $\Theta \subset \mathbb{R}^k$, is said to be a $k$-parameter exponential family on $\mathbb{R}^q$ if there exist real-valued functions:

- $\eta_1, \eta_2, \cdots, \eta_k$ and $B$ of $\theta$,
- $T_1, T_2, \cdots, T_k$, and $h$ of $y \in \mathbb{R}^q$, such that the density function (pmf or pdf) of $\mathbb{P}_\theta$ can be written as

$$f_\theta(y) = \exp\Big[\sum_{i=1}^k \eta_i(\theta) T_i(y) - B(\theta)\Big]\, h(y).$$

10/32
Normal distribution example
- Consider $Y \sim \mathcal{N}(\mu, \sigma^2)$, $\theta = (\mu, \sigma^2)$. The density is

$$f_\theta(y) = \exp\Big(\frac{\mu}{\sigma^2}\, y - \frac{1}{2\sigma^2}\, y^2 - \frac{\mu^2}{2\sigma^2}\Big) \frac{1}{\sqrt{2\pi}\,\sigma},$$

which forms a two-parameter exponential family with

$$\eta_1 = \frac{\mu}{\sigma^2}, \quad \eta_2 = -\frac{1}{2\sigma^2}, \quad T_1(y) = y, \quad T_2(y) = y^2,$$

$$B(\theta) = \frac{\mu^2}{2\sigma^2} + \log(\sigma\sqrt{2\pi}), \quad h(y) = 1.$$

- When $\sigma^2$ is known, it becomes a one-parameter exponential family on $\mathbb{R}$:

$$\eta = \frac{\mu}{\sigma^2}, \quad T(y) = y, \quad B(\theta) = \frac{\mu^2}{2\sigma^2}, \quad h(y) = \frac{e^{-\frac{y^2}{2\sigma^2}}}{\sqrt{2\pi}\,\sigma}.$$
11/32
Examples of discrete distributions

The following distributions form discrete exponential families of distributions with pmf:

- Bernoulli($p$): $p^y (1-p)^{1-y}$, $y \in \{0, 1\}$;
- Poisson($\lambda$): $e^{-\lambda}\, \dfrac{\lambda^y}{y!}$, $y = 0, 1, \ldots$.
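
As a worked step, the Bernoulli pmf fits the definition on the previous slide via

$$p^y (1-p)^{1-y} = \exp\Big(y \log\frac{p}{1-p} + \log(1-p)\Big),$$

so $\eta(p) = \log\frac{p}{1-p}$, $T(y) = y$, $B(p) = -\log(1-p)$, $h(y) = 1$. Likewise for the Poisson, $e^{-\lambda}\lambda^y/y! = \exp(y\log\lambda - \lambda)\cdot\frac{1}{y!}$ gives $\eta(\lambda) = \log\lambda$, $T(y) = y$, $B(\lambda) = \lambda$, $h(y) = 1/y!$.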

12/32
Examples of Continuous distributions
The following distributions form continuous exponential families
of distributions with pdf:
- Gamma($a, b$): $\dfrac{1}{\Gamma(a)\, b^a}\, y^{a-1} e^{-y/b}$;
  - above: $a$: shape parameter, $b$: scale parameter;
  - reparametrize with the mean parameter $\mu = ab$:

$$\frac{1}{\Gamma(a)} \Big(\frac{a}{\mu}\Big)^a y^{a-1} e^{-\frac{ay}{\mu}}.$$

- Inverse Gamma($\alpha, \beta$): $\dfrac{\beta^\alpha}{\Gamma(\alpha)}\, y^{-\alpha-1} e^{-\beta/y}$.

- Inverse Gaussian($\mu, \sigma^2$): $\sqrt{\dfrac{\sigma^2}{2\pi y^3}}\, e^{-\frac{\sigma^2 (y-\mu)^2}{2\mu^2 y}}$.

Others: Chi-square, Beta, Binomial, Negative binomial distributions.
13/32
One-parameter canonical exponential family

- Canonical exponential family for $k = 1$, $y \in \mathbb{R}$:

$$f_\theta(y) = \exp\Big(\frac{y\theta - b(\theta)}{\phi} + c(y, \phi)\Big)$$

for some known functions $b(\cdot)$ and $c(\cdot, \cdot)$.

- If $\phi$ is known, this is a one-parameter exponential family with $\theta$ being the canonical parameter.
- If $\phi$ is unknown, this may or may not be a two-parameter exponential family.
- $\phi$ is called the dispersion parameter.
- In this class, we always assume that $\phi$ is known.
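
As a sanity check, here is a sketch evaluating this canonical form with the Gaussian choices $b(\theta) = \theta^2/2$, $\phi = \sigma^2$ derived on the next slide, compared against scipy's normal density (names are illustrative):

```python
import numpy as np
from scipy.stats import norm

def canonical_density(y, theta, phi, b, c):
    """exp((y*theta - b(theta))/phi + c(y, phi)): the canonical form above."""
    return np.exp((y * theta - b(theta)) / phi + c(y, phi))

# Gaussian case (derived on the next slide): theta = mu, phi = sigma^2
def b(theta):
    return theta**2 / 2

def c(y, phi):
    return -0.5 * (y**2 / phi + np.log(2 * np.pi * phi))

y, mu, sigma2 = 1.3, 0.5, 2.0
print(canonical_density(y, mu, sigma2, b, c))      # matches the line below
print(norm.pdf(y, loc=mu, scale=np.sqrt(sigma2)))
```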

14/32
Normal distribution example

- Consider the following Normal density function with known variance $\sigma^2$:

$$f_\theta(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}} = \exp\Big\{\frac{y\mu - \frac{\mu^2}{2}}{\sigma^2} - \frac{1}{2}\Big(\frac{y^2}{\sigma^2} + \log(2\pi\sigma^2)\Big)\Big\}.$$

- Therefore $\theta = \mu$, $\phi = \sigma^2$, $b(\theta) = \frac{\theta^2}{2}$, and

$$c(y, \phi) = -\frac{1}{2}\Big(\frac{y^2}{\phi} + \log(2\pi\phi)\Big).$$

15/32
Other distributions

Table 1: Exponential Family

              Normal                            Poisson     Bernoulli
Notation      N(µ, σ²)                          P(µ)        B(p)
Range of y    (−∞, ∞)                           [0, ∞)      {0, 1}
φ             σ²                                1           1
b(θ)          θ²/2                              e^θ         log(1 + e^θ)
c(y, φ)       −(1/2)(y²/φ + log(2πφ))           −log y!     0

16/32
Likelihood

Let $\ell(\theta) = \log f_\theta(Y)$ denote the log-likelihood function.

The mean $\mathbb{E}(Y)$ and the variance $\operatorname{var}(Y)$ can be derived from the following identities:

- First identity:

$$\mathbb{E}\Big(\frac{\partial \ell}{\partial \theta}\Big) = 0.$$

- Second identity:

$$\mathbb{E}\Big(\frac{\partial^2 \ell}{\partial \theta^2}\Big) + \mathbb{E}\Big[\Big(\frac{\partial \ell}{\partial \theta}\Big)^2\Big] = 0.$$

17/32
Expected value

Note that

$$\ell(\theta) = \frac{Y\theta - b(\theta)}{\phi} + c(Y; \phi).$$

Therefore

$$\frac{\partial \ell}{\partial \theta} = \frac{Y - b'(\theta)}{\phi}.$$

It yields

$$0 = \mathbb{E}\Big(\frac{\partial \ell}{\partial \theta}\Big) = \frac{\mathbb{E}(Y) - b'(\theta)}{\phi},$$

which leads to

$$\mathbb{E}(Y) = b'(\theta).$$

18/32
Variance
On the other hand, we have

$$\frac{\partial^2 \ell}{\partial \theta^2} = -\frac{b''(\theta)}{\phi},$$

and from the previous result,

$$\frac{\partial \ell}{\partial \theta} = \frac{Y - b'(\theta)}{\phi} = \frac{Y - \mathbb{E}(Y)}{\phi}.$$

Together with the second identity, this yields

$$0 = -\frac{b''(\theta)}{\phi} + \frac{\operatorname{var}(Y)}{\phi^2},$$

which leads to

$$\operatorname{var}(Y) = \phi\, b''(\theta).$$

19/32
Example: Poisson distribution

Example: Consider a Poisson likelihood,

$$f(y) = \frac{\mu^y}{y!}\, e^{-\mu} = \exp\big(y \log\mu - \mu - \log(y!)\big).$$

Thus,

$$\theta = \log\mu, \quad b(\theta) = e^\theta, \quad \phi = 1, \quad c(y, \phi) = -\log(y!),$$

so

$$\mu = e^\theta, \quad b'(\theta) = b''(\theta) = e^\theta = \mu, \quad \text{hence } \mathbb{E}(Y) = \operatorname{var}(Y) = \mu.$$
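
The formulas $\mathbb{E}(Y) = b'(\theta)$ and $\operatorname{var}(Y) = \phi\, b''(\theta)$ can be checked by simulation; a quick sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 4.0                           # theta = log 4, b'(theta) = b''(theta) = mu
Y = rng.poisson(mu, size=500_000)
print(Y.mean(), Y.var())           # both ~ 4.0: IE(Y) = var(Y) = mu
```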

20/32
Link function

- $\beta$ is the parameter of interest, and needs to appear somehow in the likelihood function for us to use maximum likelihood.
- A link function $g$ relates the linear predictor $X^\top \beta$ to the mean parameter $\mu$:

$$X^\top \beta = g(\mu).$$

- $g$ is required to be monotone increasing and differentiable, so that

$$\mu = g^{-1}(X^\top \beta).$$

21/32
Examples of link functions

- For the LM, $g(\cdot)$ = identity.
- Poisson data. Suppose $Y \mid X \sim \text{Poisson}(\mu(X))$.
  - $\mu(X) > 0$;
  - $\log(\mu(X)) = X^\top \beta$;
  - In general, a link function for count data should map $(0, +\infty)$ to $\mathbb{R}$.
  - The log link is a natural one.
- Bernoulli/Binomial data.
  - $0 < \mu < 1$;
  - $g$ should map $(0, 1)$ to $\mathbb{R}$;
  - 3 choices (see the sketch after this list):
    1. logit: $\log\Big(\dfrac{\mu(X)}{1 - \mu(X)}\Big) = X^\top \beta$;
    2. probit: $\Phi^{-1}(\mu(X)) = X^\top \beta$, where $\Phi(\cdot)$ is the standard normal cdf;
    3. complementary log-log: $\log\big(-\log(1 - \mu(X))\big) = X^\top \beta$.
  - The logit link is the natural choice.
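
A sketch of these three Bernoulli links using scipy ($\Phi^{-1}$ is `norm.ppf`); the values of $\mu$ are illustrative:

```python
import numpy as np
from scipy.special import logit      # logit(mu) = log(mu / (1 - mu))
from scipy.stats import norm

mu = np.array([0.1, 0.5, 0.9])
print(logit(mu))                     # logit link, maps (0, 1) to IR
print(norm.ppf(mu))                  # probit link: Phi^{-1}(mu)
print(np.log(-np.log(1 - mu)))       # complementary log-log link
```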

22/32
Examples of link functions for Bernoulli response
[Figure: the logit link $g_1(x) = f_1^{-1}(x) = \log\frac{x}{1-x}$ (in blue) and the probit link $g_2(x) = f_2^{-1}(x) = \Phi^{-1}(x)$ (in red), plotted for $x \in (0, 1)$; the vertical axis ranges from $-5$ to $5$.]

23/32
Examples of link functions for Bernoulli response
[Figure: the inverse link functions $f_1(x) = \dfrac{e^x}{1 + e^x}$ (in blue, inverse logit) and $f_2(x) = \Phi(x)$ (in red, Gaussian CDF), plotted for $x \in [-5, 5]$; both map $\mathbb{R}$ to $(0, 1)$.]
24/32
Canonical Link

- The function $g$ that links the mean $\mu$ to the canonical parameter $\theta$ is called the canonical link:

$$g(\mu) = \theta.$$

- Since $\mu = b'(\theta)$, the canonical link is given by

$$g(\mu) = (b')^{-1}(\mu).$$

- If $\phi > 0$, the canonical link function is strictly increasing. Why? Because $\operatorname{var}(Y) = \phi\, b''(\theta) > 0$, so $b'$ is strictly increasing, and hence so is its inverse $(b')^{-1}$.
25/32
Example: the Bernoulli distribution

- We can check that

$$b(\theta) = \log(1 + e^\theta).$$

- Hence we solve

$$b'(\theta) = \frac{e^\theta}{1 + e^\theta} = \mu \iff \theta = \log\frac{\mu}{1 - \mu}.$$

- The canonical link for the Bernoulli distribution is therefore the logit link.
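
The same computation can be reproduced symbolically; a sketch with sympy:

```python
import sympy as sp

theta, mu = sp.symbols('theta mu', real=True)
b = sp.log(1 + sp.exp(theta))                    # Bernoulli b(theta)

# canonical link g(mu) = (b')^{-1}(mu): solve b'(theta) = mu for theta
sol = sp.solve(sp.Eq(sp.diff(b, theta), mu), theta)
print(sol)   # [log(-mu/(mu - 1))], i.e. theta = log(mu/(1 - mu)): the logit
```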

26/32
Other examples

            b(θ)              g(µ)
Normal      θ²/2              µ
Poisson     exp(θ)            log µ
Bernoulli   log(1 + e^θ)      log(µ/(1 − µ))
Gamma       −log(−θ)          −1/µ

27/32
Model and notation

- Let $(X_i, Y_i) \in \mathbb{R}^p \times \mathbb{R}$, $i = 1, \ldots, n$, be independent random pairs such that the conditional distribution of $Y_i$ given $X_i = x_i$ has density in the canonical exponential family:

$$f_{\theta_i}(y_i) = \exp\Big\{\frac{y_i \theta_i - b(\theta_i)}{\phi} + c(y_i, \phi)\Big\}.$$

- $Y = (Y_1, \ldots, Y_n)^\top$, $X = (X_1, \ldots, X_n)^\top$.
- Here the mean $\mu_i = \mathbb{E}[Y_i \mid X_i]$ is related to the canonical parameter $\theta_i$ via

$$\mu_i = b'(\theta_i),$$

- and $\mu_i$ depends linearly on the covariates through a link function $g$:

$$g(\mu_i) = X_i^\top \beta.$$

28/32
Back to β

- Given a link function $g$, note the following relationship between $\beta$ and $\theta$:

$$\theta_i = (b')^{-1}(\mu_i) = (b')^{-1}\big(g^{-1}(X_i^\top \beta)\big) \equiv h(X_i^\top \beta),$$

where $h$ is defined as

$$h = (b')^{-1} \circ g^{-1} = (g \circ b')^{-1}.$$

- Remark: if $g$ is the canonical link function, $h$ is the identity.

29/32
Log-likelihood

- The log-likelihood is given by

$$\ell_n(Y, X, \beta) = \sum_i \frac{Y_i \theta_i - b(\theta_i)}{\phi} = \sum_i \frac{Y_i\, h(X_i^\top \beta) - b(h(X_i^\top \beta))}{\phi},$$

up to a constant term.

- Note that when we use the canonical link function, we obtain the simpler expression

$$\ell_n(Y, X, \beta) = \sum_i \frac{Y_i\, X_i^\top \beta - b(X_i^\top \beta)}{\phi}.$$
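
With Bernoulli responses and the canonical (logit) link, $b(t) = \log(1 + e^t)$ and $\phi = 1$, so this expression can be maximized directly; a minimal sketch on synthetic data (all names and values illustrative):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.5])
Y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

def neg_loglik(beta):
    eta = X @ beta                 # X_i^T beta for all i
    # -l_n = -sum_i (Y_i X_i^T beta - b(X_i^T beta)), b(t) = log(1 + e^t)
    return -(Y @ eta - np.logaddexp(0.0, eta).sum())

beta_hat = minimize(neg_loglik, x0=np.zeros(2)).x
print(beta_hat)                    # close to beta_true
```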

30/32
Strict concavity

- The log-likelihood $\ell_n(Y, X, \beta)$ is strictly concave in $\beta$ when using the canonical link function and $\phi > 0$. Why? Its Hessian is $-\frac{1}{\phi}\sum_i b''(X_i^\top \beta)\, X_i X_i^\top$, which is negative definite since $b'' > 0$ (provided the $X_i$ span $\mathbb{R}^p$).
- As a consequence, the maximum likelihood estimator, when it exists, is unique.
- On the other hand, if another parameterization is used, the likelihood function may not be strictly concave, leading to several local maxima.

31/32
Concluding remarks

- Maximum likelihood for Bernoulli $Y$ and the logit link is called logistic regression.
- In general, there is no closed form for the MLE, and we have to use iterative optimization methods (e.g., Newton-Raphson or Fisher scoring, also known as iteratively reweighted least squares).
- The asymptotic normality of the MLE also applies to GLMs.
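
In practice one relies on library implementations of these iterative methods; a sketch with statsmodels on synthetic data (the library fits GLMs by iteratively reweighted least squares):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 2)))
Y = rng.binomial(1, 1 / (1 + np.exp(-X @ np.array([0.3, 1.0, -0.7]))))

# Binomial family with its default (canonical) logit link = logistic regression
res = sm.GLM(Y, X, family=sm.families.Binomial()).fit()
print(res.params)    # MLE, asymptotically normal
```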

32/32
