
Chapter 16

Generalized Linear Models

The usual linear regression model assumes a normal distribution for the study variable, whereas nonlinear logistic and Poisson regressions are based on the Bernoulli and Poisson distributions of the study variable, respectively. As in logistic and Poisson regression, the study variable can follow other probability distributions such as the exponential, gamma, inverse normal, etc. Such distributions are described by the exponential family of distributions. The generalized linear model is based on this family and unifies linear and nonlinear regression models. It assumes that the distribution of the study variable is a member of the exponential family of distributions.

Exponential family of distribution


A random variable X belongs to the exponential family with a single parameter θ if it has a probability density function
f(X, θ) = exp[ a(X)b(θ) + c(θ) + d(X) ]
where a(X), b(θ), c(θ) and d(X) are all known functions.

If a(X) = X, the distribution is said to be in canonical form. The function b(θ) is called the natural parameter of the distribution. The parameter θ is of interest, and all other parameters, which are not of interest, are called nuisance parameters.

Example:
Normal distribution
f(x, µ, σ) = (1/(σ√(2π))) exp[ −(x − µ)²/(2σ²) ],   −∞ < x < ∞, −∞ < µ < ∞, σ² > 0
           = exp[ x(µ/σ²) + ( −µ²/(2σ²) − ½ ln(2πσ²) ) − x²/(2σ²) ].
Here a(x) = x, b(θ) = µ/σ².

Regression Analysis | Chapter 16 | Generalized Linear Models | Shalabh, IIT Kanpur


Binomial distribution
f(x, p) = C(n, x) p^x (1 − p)^(n−x),   0 < p < 1,  x = 0, 1, ..., n
        = exp[ x ln(p/(1−p)) + n ln(1−p) + ln C(n, x) ].
Here a(x) = x, b(θ) = ln(p/(1−p)).
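The canonical decomposition of the binomial p.m.f. can be checked numerically. A minimal sketch in Python; the function names are our own, for illustration only:

```python
import math

def binom_pmf_direct(n, x, p):
    # C(n, x) p^x (1 - p)^(n - x)
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def binom_pmf_exponential_form(n, x, p):
    # exp[ x ln(p/(1-p)) + n ln(1-p) + ln C(n, x) ]
    return math.exp(x * math.log(p / (1 - p))
                    + n * math.log(1 - p)
                    + math.log(math.comb(n, x)))

# The two forms agree for every x = 0, 1, ..., n
n, p = 10, 0.3
assert all(math.isclose(binom_pmf_direct(n, x, p),
                        binom_pmf_exponential_form(n, x, p))
           for x in range(n + 1))
```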

Expected value and variance of a(X):

The exponential family distribution for a random variable X with parameter of interest θ is
f(X, θ) = exp[ a(X)b(θ) + c(θ) + d(X) ]
and the log-likelihood function is
L = ln f(X, θ) = a(X)b(θ) + c(θ) + d(X).

Let U = dL/dθ. Then for any distribution
E(U) = 0
Var(U) = E(U²) = E(−U′)
where U′ = dU/dθ. The function U is called the score and Var(U) is called the information.

Differentiating the log-likelihood,
U = dL/dθ = a(X)b′(θ) + c′(θ)
U′ = d²L/dθ² = a(X)b″(θ) + c″(θ)
where b′(θ) = db(θ)/dθ, b″(θ) = d²b(θ)/dθ², c′(θ) = dc(θ)/dθ and c″(θ) = d²c(θ)/dθ².
Since E(U) = 0,
E(U) = b′(θ)E[a(X)] + c′(θ)
0 = b′(θ)E[a(X)] + c′(θ)
⇒ E[a(X)] = −c′(θ)/b′(θ).

Since
Var(U) = [b′(θ)]² Var[a(X)]
and
E(−U′) = −b″(θ)E[a(X)] − c″(θ)
Var(U) = E(−U′)
⇒ Var[a(X)] = ( −b″(θ)E[a(X)] − c″(θ) ) / [b′(θ)]²
            = ( b″(θ)c′(θ) − c″(θ)b′(θ) ) / [b′(θ)]³.
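These two identities can be checked quickly for the normal example above, taking θ = µ with σ² treated as known. A small Python sketch; the numerical values of µ and σ² are chosen arbitrarily:

```python
import math

# Check E[a(X)] = -c'(θ)/b'(θ) and
# Var[a(X)] = [b''(θ)c'(θ) - c''(θ)b'(θ)] / [b'(θ)]³
# for the normal distribution with θ = µ and σ² known.
mu, sigma2 = 1.5, 4.0

b1 = 1 / sigma2        # b'(µ),  from b(µ) = µ/σ²
b2 = 0.0               # b''(µ)
c1 = -mu / sigma2      # c'(µ),  from c(µ) = -µ²/(2σ²) - ½ ln(2πσ²)
c2 = -1 / sigma2       # c''(µ)

mean = -c1 / b1
var = (b2 * c1 - c2 * b1) / b1**3

assert math.isclose(mean, mu)      # E(X) = µ
assert math.isclose(var, sigma2)   # Var(X) = σ²
```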

Now we consider two examples which illustrate how other distributions and their properties can be obtained as particular cases:

Example: Binomial distribution

Consider X following a binomial distribution with parameters n and π, i.e., X ~ Bin(n, π). Then in the framework of the exponential family,
f(x, π) = C(n, x) π^x (1 − π)^(n−x)
        = exp[ x ln π − x ln(1−π) + n ln(1−π) + ln C(n, x) ].

Here a(x) = x, θ = π, b(θ) = ln(π/(1−π)), c(θ) = n ln(1−π), d(x) = ln C(n, x),
L = ln f(x, π) = x ln π − x ln(1−π) + n ln(1−π) + ln C(n, x).
This is the canonical form of f(x, π) with natural parameter ln(π/(1−π)).

U = dL/dπ = x/π + x/(1−π) − n/(1−π)
          = x/(π(1−π)) − n/(1−π)
          = (x − nπ)/(π(1−π))

E(U) = (E(x) − nπ)/(π(1−π))
     = (nπ − nπ)/(π(1−π))
     = 0

Var(U) = Var(x)/[π(1−π)]²
       = nπ(1−π)/[π²(1−π)²]
       = n/(π(1−π))

E(−U′) = E[ x/π² + (n − x)/(1−π)² ]
       = n/π + n/(1−π)
       = n/(π(1−π))

With θ = π,
b′(θ) = b′(π) = 1/(π(1−π))
b″(θ) = b″(π) = (2π − 1)/[π(1−π)]²
c′(θ) = c′(π) = −n/(1−π)
c″(θ) = c″(π) = −n/(1−π)².
Thus
E[a(X)] = E(X) = −c′(π)/b′(π) = nπ
Var[a(X)] = Var(X) = ( b″(π)c′(π) − c″(π)b′(π) ) / [b′(π)]³ = nπ(1−π).
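The closed-form results above can be confirmed by summing over the binomial p.m.f. A small Python check; n and π are chosen arbitrarily:

```python
import math

n, pi = 12, 0.3
pmf = [math.comb(n, x) * pi**x * (1 - pi)**(n - x) for x in range(n + 1)]

# Score U = (x - nπ)/(π(1-π)) has mean 0 and variance n/(π(1-π))
EU = sum(p * (x - n * pi) / (pi * (1 - pi)) for x, p in enumerate(pmf))
VarU = sum(p * ((x - n * pi) / (pi * (1 - pi)))**2 for x, p in enumerate(pmf))
assert math.isclose(EU, 0.0, abs_tol=1e-12)
assert math.isclose(VarU, n / (pi * (1 - pi)))

# E(X) = nπ and Var(X) = nπ(1-π), as derived from b(θ) and c(θ)
EX = sum(p * x for x, p in enumerate(pmf))
VarX = sum(p * (x - EX)**2 for x, p in enumerate(pmf))
assert math.isclose(EX, n * pi)
assert math.isclose(VarX, n * pi * (1 - pi))
```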

Example: Poisson distribution
Consider the random variable X following a Poisson distribution with parameter λ, i.e., X ~ P(λ). Then
f(x, λ) = exp(−λ) λ^x / x!
        = exp[ x ln λ − λ − ln x! ]
L = ln f(x, λ) = x ln λ − λ − ln x!
This is the canonical form of f(x, λ), and ln λ is the natural parameter. Here
a(X) = X, b(θ) = ln λ, c(θ) = −λ, d(X) = −ln X!

U = dL/dλ = x/λ − 1
E(U) = E(x)/λ − 1
     = λ/λ − 1
     = 0

Var(U) = Var(x)/λ²
       = λ/λ²
       = 1/λ

E(−U′) = E[ −(d/dλ)(x/λ − 1) ]
       = E[ x/λ² ]
       = λ/λ²
       = 1/λ

b′(θ) = 1/λ = b′(λ)
b″(θ) = −1/λ² = b″(λ)
c′(θ) = −1 = c′(λ)
c″(θ) = 0 = c″(λ)

E[a(X)] = E(X) = −c′(λ)/b′(λ) = λ
Var[a(X)] = ( b″(λ)c′(λ) − c″(λ)b′(λ) ) / [b′(λ)]³
          = ( 1/λ² − 0 ) / ( 1/λ³ )
          = λ.
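The Poisson results can likewise be verified by summing the p.m.f. over a truncated range long enough that the omitted tail mass is negligible. A Python sketch; λ is chosen arbitrarily:

```python
import math

lam = 2.5
K = 60  # truncation point; the Poisson(2.5) tail beyond K is negligible
pmf = [math.exp(-lam) * lam**x / math.factorial(x) for x in range(K)]

# Score U = x/λ - 1: E(U) = 0 and Var(U) = 1/λ
EU = sum(p * (x / lam - 1) for x, p in enumerate(pmf))
VarU = sum(p * (x / lam - 1)**2 for x, p in enumerate(pmf))
assert math.isclose(EU, 0.0, abs_tol=1e-12)
assert math.isclose(VarU, 1 / lam)

# E(X) = λ = Var(X), matching -c'(λ)/b'(λ) and the variance formula
EX = sum(p * x for x, p in enumerate(pmf))
VarX = sum(p * (x - EX)**2 for x, p in enumerate(pmf))
assert math.isclose(EX, lam) and math.isclose(VarX, lam)
```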

Linear predictors and link functions

The role of the generalized linear model is basically to unify various distributions of the study variable. This is accomplished by developing a linear model in an appropriate function of the expected value of the study variable.

Denoting by ηi the linear predictor which relates to the expected value of the study variable, it is expressed as
ηi = g[E(yi)] = g(µi) = xi'β
where xi'β = β0 + ∑j βj xij (sum over j = 1, ..., k).

Thus
E(yi) = g⁻¹(ηi) = g⁻¹(xi'β)
where the function g is called the link function.

Several choices of link functions are available. If
ηi = θi
then ηi is the canonical link. The choice of θi and the canonical link is related to the distribution of the study variable, which in turn governs the appropriate, usually nonlinear, regression model. The canonical link provides mathematical convenience in deriving the statistical properties of the model and compatibility with sensible conclusions on scientific grounds.

For example, in case y has a

• normal distribution, the canonical link function is the identity link, defined as ηi = µi.

• binomial distribution, then logistic regression is used and the logit link is used as the canonical link, defined as ηi = ln( πi/(1−πi) ).

• Poisson distribution, then the log link is used as the canonical link, given as ηi = ln λi.

• exponential or gamma distribution, then the canonical link function used is the reciprocal link, given by ηi = 1/λi.

Other types of link functions are
- the probit link, given as ηi = Φ⁻¹[E(yi)], where Φ is the cumulative distribution function of the N(0,1) distribution;
- the complementary log-log link, given by
  ηi = ln[ −ln{1 − E(yi)} ];
- the power family link
  ηi = [E(yi)]^λ for λ ≠ 0,  ηi = ln[E(yi)] for λ = 0,
which is based on a power transformation similar to the Box-Cox transformation.

A link is preferable if it maps the range of µi onto the whole real line and provides a good empirical approximation. It should also carry a meaningful interpretation in real applications.
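The links above can be sketched in code; a minimal illustration using only the Python standard library (the function names are our own), showing that each link maps (0, 1) onto the real line and that the inverse link recovers µ:

```python
import math
from statistics import NormalDist

# η = g(µ) and µ = g⁻¹(η) for three common links on (0, 1)
def logit(mu):        return math.log(mu / (1 - mu))
def inv_logit(eta):   return math.exp(eta) / (1 + math.exp(eta))

def probit(mu):       return NormalDist().inv_cdf(mu)
def inv_probit(eta):  return NormalDist().cdf(eta)

def cloglog(mu):      return math.log(-math.log(1 - mu))
def inv_cloglog(eta): return 1 - math.exp(-math.exp(eta))

# Round trips recover µ for each link
for g, g_inv in [(logit, inv_logit), (probit, inv_probit),
                 (cloglog, inv_cloglog)]:
    for mu in (0.05, 0.5, 0.95):
        assert math.isclose(g_inv(g(mu)), mu, rel_tol=1e-9)
```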

There are two components in any generalized linear model:

i) the distribution of the study variable and
ii) the link function.

The choice of link function is like choosing an appropriate transformation of the study variable. The link function takes advantage of the natural distribution of the study variable. An incorrect choice of link function can give rise to incorrect statistical inferences.
Maximum likelihood estimation in GLM:
The least squares method cannot be applied directly when the study variable is not continuous. So the maximum likelihood estimation method is used in GLM, which has a close connection with the iteratively reweighted least squares method.

Given the data (xi, yi), i = 1, 2, ..., n, with y following an exponential family distribution, the joint p.d.f. is
f(y; θ, φ) = exp[ ∑i yi b(θi) + ∑i c(θi) + ∑i d(yi) ]
(sums over i = 1, ..., n), where θ is the parameter of interest and φ is a nuisance parameter. θ and/or φ can also be vectors, like (θ1, θ2, ..., θn) and/or (φ1, φ2, ..., φn) respectively.

Consider a smaller set of parameters β = (β1, β2, ..., βk)' which enters through some function g(µi) of µi. When µi = E(yi), g(µi) relates µi to a linear combination of the β's via g(µi) = xi'β.

For example, suppose the data on yi and ni are such that yi ~ Bin(ni, πi), i.e., yi is the number of successes in ni trials, where πi is the probability of success (the observed proportion of successes is yi/ni). Then the joint p.d.f. of all n observations is

exp[ ∑i yi ln( πi/(1−πi) ) + ∑i ni ln(1−πi) + ∑i ln C(ni, yi) ].

Assuming that the variation in πi is explained by the xi values, choose a suitable link function g(πi) = xi'β. A sensible link function is the log-odds:
g(πi) = ln( πi/(1−πi) ).
Now the objective is to fit the model
ln( πi/(1−πi) ) = xi'β = β0 + β1 xi1 + ... + βk xik
or equivalently
πi = exp(xi'β) / (1 + exp(xi'β)).
The general log-likelihood function is
L(β) = ln f(y; θ, φ) = ∑i Li = ∑i yi b(θi) + ∑i c(θi) + ∑i d(yi)
with sums over i = 1, ..., n.
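For the logistic model, this log-likelihood is maximized iteratively. A minimal sketch of the Newton-Raphson iteration (equivalent here to iteratively reweighted least squares) for a single regressor; the toy data and function name are made up for illustration:

```python
import math

def irls_logistic(x, y, iters=25):
    """Fit ln(pi/(1-pi)) = b0 + b1*x by Newton-Raphson, assuming each
    yi is a single Bernoulli trial (ni = 1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            pi = 1 / (1 + math.exp(-(b0 + b1 * xi)))
            w = pi * (1 - pi)           # Var(yi) = pi(1 - pi)
            g0 += yi - pi               # score component for b0
            g1 += (yi - pi) * xi        # score component for b1
            h00 += w                    # 2x2 information matrix X'VX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (X'VX)^{-1} X'(y - pi)
        b0 += ( h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1

# Toy data: success becomes more likely as x grows
b0, b1 = irls_logistic([0, 1, 2, 3, 4, 5], [0, 0, 1, 0, 1, 1])
assert b1 > 0  # fitted log-odds increase with x
```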

The log-likelihood function is numerically maximized for a given data set. Generally, the iteratively reweighted least squares method is used.

Suppose β̂ is the final value obtained after optimization, i.e., the maximum likelihood estimator of β. Then asymptotically
E(β̂) = β
V(β̂) = a(φ)(X'V⁻¹X)⁻¹
where V is a diagonal matrix formed by the variances of the estimates in the linear predictor, apart from a(φ). The covariance matrix can be estimated by replacing V with its estimate V̂.

In GLM, the variance of yi is not constant, so generalized least squares estimation is used to obtain more efficient estimators.

To conduct tests of hypothesis in GLM, the model deviance is used for testing the goodness of the model fit. The difference in deviance of the full and reduced models is used to decide on a subset model.

Wald inference can be applied for testing hypotheses and for confidence interval estimation about individual model parameters. The Wald statistic for testing the null hypothesis
H0: Rβ = r, where R is q × (k + 1) with rank(R) = q, is
W = (Rβ̂ − r)' [ R(X'V̂⁻¹X)⁻¹R' ]⁻¹ (Rβ̂ − r).

The distribution of W under H0 is the χ² distribution with q degrees of freedom.

In particular, for H0: βj = β0, the test statistic is
Z = √W = (β̂j − β0) / se(β̂j)
which has the N(0,1) distribution under H0, where se(β̂j) is the standard error of β̂j. Confidence intervals can be constructed using the Wald test. For example, the 100(1 − α)% confidence interval for βj is
β̂j ± Z_{α/2} se(β̂j).
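For a single parameter, the Z statistic and the Wald interval are straightforward to compute. A sketch; the estimate β̂j = 0.8 and standard error 0.25 are hypothetical values, not from the chapter:

```python
from statistics import NormalDist

def wald_z(beta_hat, beta0, se):
    # Z = (β̂j - β0)/se(β̂j); Z² equals the scalar Wald statistic W
    return (beta_hat - beta0) / se

def wald_ci(beta_hat, se, alpha=0.05):
    # 100(1-α)% interval: β̂j ± Z_{α/2} se(β̂j)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return beta_hat - z * se, beta_hat + z * se

z = wald_z(0.8, 0.0, 0.25)    # 3.2, so H0: βj = 0 is rejected at 5% level
lo, hi = wald_ci(0.8, 0.25)   # roughly (0.31, 1.29)
assert abs(z) > 1.96 and lo < 0.8 < hi
```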

The likelihood ratio test compares the maximized log-likelihood functions of the full and reduced models. The reduced model is the full model under the null hypothesis.

The likelihood ratio test statistic is

−2(L̂_reduced − L̂_full)

where L̂_full and L̂_reduced are the maximized log-likelihood functions under the full and reduced models. The likelihood ratio test statistic has a χ²-distribution with degrees of freedom equal to the difference in the degrees of freedom of the full and reduced models.

Prediction and confidence intervals with GLM

Suppose we want to estimate the mean response at x = x0. The estimate is given by

ŷ0 = µ̂0 = g⁻¹(x0'β̂)

where g is the associated link function.

It is understood that x0 is expanded to model form if more terms, e.g., interaction terms, are to be accommodated in the linear predictor.

To find the confidence interval, the asymptotic covariance matrix of β̂, given by Ω = a(φ)(X'V⁻¹X)⁻¹, is estimated by Ω̂. The asymptotic variance of the linear predictor estimated at x = x0 is

Var(η̂0) = Var(x0'β̂) = x0' V(β̂) x0 = x0'Ωx0

and its estimate is x0'Ω̂x0. Then the 100(1 − α)% confidence interval on the true mean response at x = x0 is

g⁻¹( x0'β̂ − Z_{α/2} √(x0'Ω̂x0) ) ≤ µ(x0) ≤ g⁻¹( x0'β̂ + Z_{α/2} √(x0'Ω̂x0) ).
This approach usually works well in practice because β̂ is the maximum likelihood estimate of β, and so any function of β̂ is also a maximum likelihood estimate. The method constructs the confidence interval in the space of the linear predictor and then transforms the interval back to the original metric. The Wald method can also be used to derive an approximate confidence interval for the mean response.
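The linear-predictor interval can be sketched as follows, here with a logit link; x0, β̂ and Ω̂ are made-up values for illustration:

```python
import math
from statistics import NormalDist

def mean_response_ci(x0, beta_hat, omega_hat, inv_link, alpha=0.05):
    # η̂0 = x0'β̂ and Var(η̂0) = x0' Ω̂ x0; transform both endpoints by g⁻¹
    eta0 = sum(a * b for a, b in zip(x0, beta_hat))
    var0 = sum(x0[i] * omega_hat[i][j] * x0[j]
               for i in range(len(x0)) for j in range(len(x0)))
    half = NormalDist().inv_cdf(1 - alpha / 2) * math.sqrt(var0)
    return inv_link(eta0 - half), inv_link(eta0 + half)

inv_logit = lambda eta: math.exp(eta) / (1 + math.exp(eta))

x0 = [1.0, 2.0]              # intercept and one regressor value
beta_hat = [-1.0, 0.9]       # hypothetical ML estimates
omega_hat = [[0.04, -0.01],  # hypothetical estimated covariance matrix
             [-0.01, 0.02]]
lo, hi = mean_response_ci(x0, beta_hat, omega_hat, inv_logit)
assert 0 < lo < inv_logit(0.8) < hi < 1  # interval brackets the estimate
```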

Residual analysis in GLM
The usual approach of finding the residuals is adopted in the case of GLM.
The i-th ordinary residual from GLM is
ei = yi − ŷi = yi − µ̂i.
Residual analysis in GLM is generally performed using deviance residuals, defined as
ri = sign(yi − ŷi) √di
where di is the contribution of the i-th observation to the deviance.


We explain this in the context of logistic and Poisson regression. In the case of logistic regression,

di = yi ln( yi / (ni π̂i) ) + (ni − yi) ln( (1 − yi/ni) / (1 − π̂i) ),   i = 1, 2, ..., n

where π̂i = exp(xi'β̂) / (1 + exp(xi'β̂)).

As the fit of the model to the data becomes better, π̂i ≈ yi/ni and the deviance residuals become smaller, close to zero.
In the case of Poisson regression,

di = yi ln( yi / exp(xi'β̂) ) − [ yi − exp(xi'β̂) ],   i = 1, 2, ..., n.

Here yi and ŷi = exp(xi'β̂) become close to each other as the deviance residuals approach zero.
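The Poisson deviance residual can be computed as below, following the chapter's definition of di (some texts include an extra factor of 2 in di); the function name is our own:

```python
import math

def poisson_deviance_residual(y, mu_hat):
    # d_i = y ln(y/µ̂) - (y - µ̂); the y ln(y/µ̂) term tends to 0 as y -> 0
    d = (y * math.log(y / mu_hat) if y > 0 else 0.0) - (y - mu_hat)
    # r_i = sign(y - µ̂) * sqrt(d_i); max() guards against tiny negatives
    return math.copysign(math.sqrt(max(d, 0.0)), y - mu_hat)

# A perfect fit gives a zero residual; otherwise the sign follows y - µ̂
assert poisson_deviance_residual(4, 4.0) == 0.0
assert poisson_deviance_residual(7, 4.0) > 0
assert poisson_deviance_residual(2, 4.0) < 0
```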

The behaviour of deviance residuals is like the behaviour of ordinary residuals in the standard normal linear regression model. The normal probability plot is obtained by plotting the deviance residuals on a normal probability scale versus the fitted values. Usually, the fitted values are transformed to a constant information scale before plotting, so
• ŷi is used in the case of usual regression with a normal distribution,
• 2 sin⁻¹(√π̂i) is used in the case of logistic regression,
• 2√ŷi is used in Poisson regression,
• 2 ln ŷi is used when the study variable has a gamma distribution.
