Logistic Regression Notes


GENERALIZED LINEAR MODEL

LOGISTIC REGRESSION

Generalized Linear Models


Traditional applications of linear models, such as simple linear regression (SLR) and multiple linear regression, assume that the response variable is:
Normally distributed
Constant in variance
Independent

There are many situations where these assumptions are inappropriate:
The response is either binary (0, 1) or a count
The response is continuous, but non-normal

Some Approaches to These Problems


Data transformation
  Induce approximate normality
  Stabilize variance
  Simplify model form
Weighted least squares
  Often used to stabilize variance
Generalized linear models (GLM)
  Approach is about 25-30 years old; unifies linear and nonlinear regression models
  Response distribution is a member of the exponential family (normal, exponential, gamma, binomial, Poisson)

Generalized Linear Models


Original applications were in the biopharmaceutical sciences
There has been a lot of recent interest in GLMs in industrial statistics
GLMs include linear regression and OLS as a special case
Parameter estimation is by maximum likelihood (assuming that the response distribution is known)
Inference on the parameters is based on large-sample or asymptotic theory
We will consider logistic regression, Poisson regression, then the GLM

Logistic regression: an overview


1. Models with binary outcomes
2. Problems with linear models using binary outcomes
3. What logit models look like
4. Predicting y and/or the odds of y for a given x

1. Binary outcomes
Binary outcomes are outcomes with two possible values: success or failure.
The outcome of interest (success) is commonly scored 1 if it occurs, otherwise 0 (failure). The units of analysis for binary (0, 1) outcomes are individuals.
Binary responses occur often in the biopharmaceutical field: dose-response studies, bioassays, clinical trials.
Industrial applications include failure analysis, fatigue testing, and reliability testing. Example: functional electrical testing on a semiconductor can yield a success, in which case the device works, or a failure due to a short, an open, or some other failure mode.
Other examples: college graduation, employment, improvement under a treatment.

Binary Response Variables


Possible model:

$$y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \varepsilon_i = \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i, \quad i = 1, 2, \ldots, n, \qquad y_i = 0 \text{ or } 1$$

The response $y_i$ is a Bernoulli random variable:

$$P(y_i = 1) = \pi_i \text{ with } 0 \le \pi_i \le 1, \qquad P(y_i = 0) = 1 - \pi_i$$
$$E(y_i) = \pi_i = \mathbf{x}_i'\boldsymbol{\beta}, \qquad \mathrm{Var}(y_i) = \sigma^2_{y_i} = \pi_i(1 - \pi_i)$$

2. Problems With This Model


The error terms take on only two values, so they cannot possibly be normally distributed.
The error distribution is neither identical nor normal.
The variance of the observations is a function of the mean (see the previous slide).
This heteroskedasticity is a more serious problem, but it is still often not fatal because it acts in a conservative direction.

2. Problems With This Model


A linear response function could result in predicted values that fall outside the interval [0, 1], which is impossible because

$$0 \le E(y_i) = \pi_i = \mathbf{x}_i'\boldsymbol{\beta} \le 1$$

This gives nonsensical predictions.
It also gives bad predictions due to the nonlinear functional form, even within reasonable values of y.

Binary Response Variables: The Challenger Data

[Table: temperature at launch (°F) and an indicator of at least one O-ring failure, for the space shuttle launches and static tests prior to the launch of Challenger. Launch temperatures: 53, 56, 57, 63, 66, 67, 67, 67, 68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 80, 81.]

[Figure: scatter plot of the O-ring failure indicator (0/1) against launch temperature, roughly 50-80 °F.]

A solution for nonlinear relationships:


Generalized Linear Models
Linear model: $y_i = \alpha + \beta x_i + \varepsilon_i$
(identity transform: no change in $y_i$)

Generalized linear model: $F(y_i) = \alpha + \beta x_i + \varepsilon_i$
($F$ is some function such that $F(y)$ is linear in $x$)

Logit model: $\log\big(p_i/(1-p_i)\big) = \alpha + \beta x_i$
($\log\big(p_i/(1-p_i)\big)$ is the log odds, or logit, of $p_i$)

Binary Response Variables


There is a lot of empirical evidence that the response function should be nonlinear; an S shape is quite logical.
See the scatter plot of the Challenger data.
The logistic response function is a common choice:

$$E(y) = \frac{\exp(\mathbf{x}'\boldsymbol{\beta})}{1 + \exp(\mathbf{x}'\boldsymbol{\beta})} = \frac{1}{1 + \exp(-\mathbf{x}'\boldsymbol{\beta})}$$

Logistic Regression Curve

[Figure: two panels; the S-shaped logistic curve of the probability $\pi$ (0 to 1) against the predictor, and the logit transform of $\pi$ against the predictor, which is a straight line.]

The Logistic Response Function


The logistic response function can be easily linearized. Let:

$$\eta = \mathbf{x}'\boldsymbol{\beta} \quad \text{(the linear predictor)} \qquad \text{and} \qquad \pi = E(y)$$

Define

$$\eta = \ln\!\left(\frac{\pi}{1-\pi}\right)$$

This is called the logit transformation.

Logistic Regression Model


Model:

$$y_i = E(y_i) + \varepsilon_i, \quad \text{where} \quad E(y_i) = \pi_i = \frac{\exp(\mathbf{x}_i'\boldsymbol{\beta})}{1 + \exp(\mathbf{x}_i'\boldsymbol{\beta})}$$

The model parameters are estimated by the method of maximum likelihood (MLE).
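As a concrete illustration (not part of the original notes), here is a minimal sketch of fitting such a model in R; the 0/1 response y and predictor x are hypothetical placeholder variables:

# Minimal sketch: fitting a logistic regression by maximum likelihood in R
dat <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  y = c(0, 0, 0, 1, 0, 1, 1, 1)
)
fit <- glm(y ~ x, family = binomial, data = dat)  # logit link is the default
summary(fit)   # the coefficients are the MLEs of beta0 and beta1
fitted(fit)    # estimated success probabilities pi_i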

Maximum Likelihood Estimation in Logistic Regression

The distribution of each observation $y_i$ is

$$f_i(y_i) = \pi_i^{y_i}(1-\pi_i)^{1-y_i}, \quad i = 1, 2, \ldots, n$$

The likelihood function is

$$L(\mathbf{y}, \boldsymbol{\beta}) = \prod_{i=1}^{n} f_i(y_i) = \prod_{i=1}^{n} \pi_i^{y_i}(1-\pi_i)^{1-y_i}$$

We usually work with the log-likelihood:

$$\ln L(\mathbf{y}, \boldsymbol{\beta}) = \sum_{i=1}^{n} \ln f_i(y_i) = \sum_{i=1}^{n} y_i \ln\!\left(\frac{\pi_i}{1-\pi_i}\right) + \sum_{i=1}^{n} \ln(1-\pi_i)$$

Maximum Likelihood Estimation in Logistic Regression

The maximum likelihood estimators (MLEs) of the model parameters are the values that maximize the likelihood (or log-likelihood) function.
ML has been around since the first part of the previous century.
It often gives estimators that are intuitively pleasing.
MLEs have nice properties: they are unbiased (for large samples), have minimum variance (or nearly so), and have an approximate normal distribution when n is large.

Maximum Likelihood Estimation in Logistic Regression

If we have $n_i$ trials at each observation, we can write the log-likelihood as

$$\ln L(\mathbf{y}, \boldsymbol{\beta}) = \mathbf{y}'\mathbf{X}\boldsymbol{\beta} - \sum_{i=1}^{n} n_i \ln\!\big[1 + \exp(\mathbf{x}_i'\boldsymbol{\beta})\big]$$

The derivative of the log-likelihood is

$$\frac{\partial \ln L(\mathbf{y}, \boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = \mathbf{X}'\mathbf{y} - \sum_{i=1}^{n} n_i \frac{\exp(\mathbf{x}_i'\boldsymbol{\beta})}{1 + \exp(\mathbf{x}_i'\boldsymbol{\beta})}\,\mathbf{x}_i = \mathbf{X}'\mathbf{y} - \sum_{i=1}^{n} n_i \pi_i \mathbf{x}_i = \mathbf{X}'\mathbf{y} - \mathbf{X}'\boldsymbol{\mu}$$

because $\mu_i = n_i \pi_i$.

Maximum Likelihood Estimation in Logistic Regression

Setting this last result to zero gives the maximum likelihood score equations:

$$\mathbf{X}'(\mathbf{y} - \boldsymbol{\mu}) = \mathbf{0}$$

These equations look easy to solve; we've actually seen them before in linear regression:
For the model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, the equations $\mathbf{X}'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{0}$ result from OLS or ML with normal errors.
Since $\mathbf{X}'\mathbf{y} - \mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{0}$, we have $\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$ and $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$ (OLS or the normal-theory MLE).

Maximum Likelihood Estimation in Logistic Regression

Solving the ML score equations in logistic regression isn't quite as easy, because

$$\mu_i = n_i \pi_i = \frac{n_i \exp(\mathbf{x}_i'\boldsymbol{\beta})}{1 + \exp(\mathbf{x}_i'\boldsymbol{\beta})}, \quad i = 1, 2, \ldots, n$$

Logistic regression is a nonlinear model.
It turns out that the solution is actually fairly easy, and is based on iteratively reweighted least squares (IRLS):
An iterative procedure is necessary because the parameter estimates must be updated from an initial guess through several steps.
Weights are necessary because the variance of the observations is not constant.
The weights are functions of the unknown parameters.
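To make the IRLS idea concrete, here is a rough sketch (not from the original notes) of one way the iteration can be written in R for the Bernoulli case (all n_i = 1); variable names are illustrative only:

# Rough IRLS sketch for logistic regression with 0/1 responses.
# X is the model matrix (including a column of 1s), y is a 0/1 vector.
irls_logistic <- function(X, y, tol = 1e-8, maxit = 25) {
  beta <- rep(0, ncol(X))                      # initial guess
  for (it in 1:maxit) {
    eta <- as.vector(X %*% beta)               # linear predictor
    p   <- 1 / (1 + exp(-eta))                 # fitted probabilities
    W   <- diag(p * (1 - p))                   # weights = Var(y_i)
    z   <- eta + (y - p) / (p * (1 - p))       # working response
    beta_new <- as.vector(solve(t(X) %*% W %*% X, t(X) %*% W %*% z))
    if (max(abs(beta_new - beta)) < tol) { beta <- beta_new; break }
    beta <- beta_new
  }
  beta
}

The result should agree, up to convergence tolerance, with the coefficients reported by glm() with family = binomial.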

3. What does the logit mean?


The logit model is related to the odds for a binary outcome.
odds = Pr(y = 1) / Pr(y = 0) = p/(1 - p)
log odds = ln(odds) = ln(p/(1 - p))
(In statistics, all logs refer to the natural log: if x = e^n, where e = 2.718..., then ln(x) = n.)
Thus, the logit is the predicted log odds of Y for a given value of x.
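For instance (a worked example added here for illustration): if p = 0.8, then

$$\text{odds} = \frac{0.8}{1 - 0.8} = 4, \qquad \text{logit} = \ln(4) \approx 1.386$$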

Interpretation of the Parameters in Logistic Regression

The log-odds at x is

$$\eta(x) = \ln\!\left(\frac{\pi(x)}{1-\pi(x)}\right) = \beta_0 + \beta_1 x$$

The log-odds at x + 1 is

$$\eta(x+1) = \ln\!\left(\frac{\pi(x+1)}{1-\pi(x+1)}\right) = \beta_0 + \beta_1 (x+1)$$

The difference in the log-odds is

$$\eta(x+1) - \eta(x) = \beta_1$$

Interpretation of the Parameters in Logistic Regression

The odds ratio is found by taking antilogs:

$$OR = \frac{\text{odds}_{x+1}}{\text{odds}_x} = e^{\beta_1}$$

The odds ratio is interpreted as the estimated multiplicative change in the odds of success associated with a one-unit increase in the value of the predictor variable.

4. Computing p from a log odds

The formal statement of the logit model is (again)

$$\ln\!\left(\frac{p_i}{1-p_i}\right) = \alpha + \beta x_i$$

Note: $\pi(x) = p$.

Then

$$\frac{p}{1-p} = e^{\alpha + \beta x}$$

so the predicted p is

$$\hat{p} = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$$

Thus, getting meaningful predictions out of your model for a reader to understand takes a bit of work.
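As an illustration (with hypothetical coefficient values added here for concreteness), this back-transformation is easy to do in R:

# Compute a predicted probability from hypothetical logit coefficients
alpha <- -3.0     # intercept (illustrative value)
beta  <-  0.5     # slope (illustrative value)
x     <-  4
log_odds <- alpha + beta * x
p <- exp(log_odds) / (1 + exp(log_odds))   # equivalently: plogis(log_odds)
p                                          # predicted probability at x = 4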

Inference on the Model Parameters


Likelihood ratio tests (LRT)
An LRT can be used to compare a full model with a reduced model of interest.
It is analogous to the extra-sum-of-squares technique for comparing full and reduced models.
The LRT compares twice the logarithm of the likelihood for the full model (FM) with twice the logarithm of the likelihood for the reduced model (RM) to obtain the test statistic:

$$LR = 2\ln\!\left(\frac{L(FM)}{L(RM)}\right) = 2\big[\ln L(FM) - \ln L(RM)\big]$$

For large samples, when the reduced model is correct, the test statistic LR follows a chi-square distribution with degrees of freedom equal to the difference in the number of parameters between the full and reduced models.
If $LR > \chi^2_{\alpha,\,df}$, we reject the claim that the reduced model is appropriate.

Inference on the Model Parameters

The LR approach can be used to provide a test for significance of the logistic regression:
It uses the current (fitted) model as the full model and compares it to a reduced model that has only a constant probability of success. The constant-probability-of-success model is

$$E(y) = p = \frac{e^{\beta_0}}{1 + e^{\beta_0}}$$

i.e., a logistic regression model with no regressor variables.
The MLE of p in the reduced model is just $\hat{p} = y/n$.
Substituting this into the log-likelihood function gives the maximum value of the likelihood function for the reduced model as

$$\ln L(RM) = y\ln(y) + (n-y)\ln(n-y) - n\ln(n)$$

Therefore, the LRT statistic for testing significance of regression is:

$$LR = 2\left\{\sum_{i=1}^{n} \Big[\, y_i \ln \hat{p}_i + (n_i - y_i)\ln(1-\hat{p}_i) \,\Big] - \big[\, y\ln(y) + (n-y)\ln(n-y) - n\ln(n) \,\big]\right\}$$

Testing Goodness of Fit (GOF)

The GOF of the logistic regression model can also be assessed using an LRT procedure.
This test compares the current model to a saturated model, where each observation (or group of observations when $n_i > 1$) is allowed to have its own parameter (a success probability).
The current (fitted) model has estimated success probabilities

$$\hat{p}_i = \frac{e^{\mathbf{x}_i'\hat{\boldsymbol{\beta}}}}{1 + e^{\mathbf{x}_i'\hat{\boldsymbol{\beta}}}}$$

The deviance is defined as twice the difference in log-likelihoods between the saturated model and the full (current) model that has been fit to the data:

$$D = 2\ln\!\left(\frac{L(\text{saturated model})}{L(FM)}\right) = 2\sum_{i=1}^{n}\left[ y_i \ln\!\left(\frac{y_i}{n_i\hat{p}_i}\right) + (n_i - y_i)\ln\!\left(\frac{n_i - y_i}{n_i(1-\hat{p}_i)}\right) \right]$$
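A quick way to carry out this check in R on a fitted binomial glm (a sketch, assuming a fitted object called fit with grouped/binomial data; the chi-square approximation is poor for ungrouped 0/1 data):

# Deviance goodness-of-fit check for a fitted binomial glm 'fit'
D  <- deviance(fit)                    # residual deviance
df <- df.residual(fit)                 # n - p
D / df                                 # rule of thumb: should be near 1
pchisq(D, df, lower.tail = FALSE)      # approximate p-value; large p => adequate fit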

Testing Goodness of Fit (GOF)

In calculating the deviance, we take $y_i\ln\!\big(y_i/(n_i\hat{p}_i)\big) = 0$ if $y_i = 0$, and $(n_i - y_i)\ln\!\big((n_i - y_i)/(n_i(1-\hat{p}_i))\big) = 0$ if $y_i = n_i$.
When the logistic regression model is an adequate fit to the data and the sample size is large, the deviance has a chi-square distribution with $n - p$ degrees of freedom, where p is the number of parameters in the model.
Small values of the deviance (or a large p-value) imply that the model provides a satisfactory fit to the data, while large values of the deviance imply that the current model is not adequate.
A good rule of thumb is to divide the deviance by its number of degrees of freedom:
If the ratio $D/(n-p)$ is much greater than unity, the current model is not an adequate fit to the data.

Pearson chi-square goodness-of-fit statistic:

The Pearson chi-square GOF statistic can be compared to a chi-square distribution with $n - p$ degrees of freedom.
Small values of the statistic (or a large p-value) imply that the model provides a satisfactory fit to the data.
The Pearson chi-square statistic can also be divided by its number of degrees of freedom, $n - p$, and the ratio compared to unity.
If the ratio greatly exceeds unity, the GOF of the model is questionable.
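In R this statistic can be computed from the Pearson residuals of the fitted model (a sketch, again assuming a fitted binomial glm called fit):

# Pearson chi-square GOF statistic from a fitted binomial glm 'fit'
X2 <- sum(residuals(fit, type = "pearson")^2)
df <- df.residual(fit)
X2 / df                                 # compare to 1
pchisq(X2, df, lower.tail = FALSE)      # approximate p-value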


The Hosmer-Lemeshow (HL) goodness-of-fit statistic:

When there are no replicates on the regressor variables, the observations can be grouped to perform a GOF test called the Hosmer-Lemeshow test.
In this procedure, the observations are classified into g groups based on the estimated probabilities of success.
Generally, about 10 groups are used (when g = 10 the groups are called the deciles of risk), and the observed numbers of successes $O_j$ and failures $N_j - O_j$ are compared with the expected frequencies in each group, $N_j\bar{\pi}_j$ and $N_j(1-\bar{\pi}_j)$, where $N_j$ is the number of observations in the jth group and $\bar{\pi}_j$ is the average estimated success probability in the jth group.
The Hosmer-Lemeshow statistic is really just a Pearson chi-square GOF statistic comparing these observed and expected frequencies:
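In its standard form (reconstructed here; the original slide's formula is not reproduced), the statistic is

$$HL = \sum_{j=1}^{g} \frac{\big(O_j - N_j\bar{\pi}_j\big)^2}{N_j\bar{\pi}_j\big(1-\bar{\pi}_j\big)}$$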


If the fitted logistic regression model is correct, the HL statistic follows a chi-square distribution with g - 2 degrees of freedom when the sample size is large.
Large values of HL imply that the model is not an adequate fit to the data.
It is also useful to compute the ratio of HL to its number of degrees of freedom, with values close to unity implying an adequate fit.
The hypotheses are:
H0: the model is an adequate fit
H1: the model is not an adequate fit

Likelihood Inference on the Model Parameters

The deviance can also be used to test hypotheses about subsets of the model parameters (analogous to the extra-SS method).

Procedure:
Write the full model as $\mathbf{X}\boldsymbol{\beta} = \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2$, where the full model has p parameters and $\boldsymbol{\beta}_2$ has r parameters.
This full model has deviance $D(\boldsymbol{\beta})$.
The hypotheses are $H_0\!: \boldsymbol{\beta}_2 = \mathbf{0}$ versus $H_1\!: \boldsymbol{\beta}_2 \neq \mathbf{0}$.
The reduced model is $\mathbf{X}_1\boldsymbol{\beta}_1$, with deviance $D(\boldsymbol{\beta}_1)$.
The difference in deviance between the full and reduced models is

$$D(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1) = D(\boldsymbol{\beta}_1) - D(\boldsymbol{\beta}), \quad \text{with } r \text{ degrees of freedom}$$

$D(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1)$ has a chi-square distribution under $H_0\!: \boldsymbol{\beta}_2 = \mathbf{0}$.
Large values of $D(\boldsymbol{\beta}_2 \mid \boldsymbol{\beta}_1)$ imply that $H_0$ should be rejected.

Inference on the Model Parameters

Tests on individual model coefficients can also be done using Wald inference.
This uses the result that the MLEs have an approximate normal distribution, so the distribution of

$$Z_0 = \frac{\hat{\beta}}{se(\hat{\beta})}$$

is standard normal if the true value of the parameter is zero. Some computer programs report the square of $Z_0$ (which is chi-square), and others calculate the P-value using the t distribution.

Logistic Regression with 1 Predictor

Response: presence/absence of a characteristic.
Predictor: a numeric variable observed for each case.
Model: $\pi(x)$ is the probability of presence at predictor level x, where

$$\pi(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$

$\beta_1 = 0$: P(presence) is the same at each level of x.
$\beta_1 > 0$: P(presence) increases as x increases.
$\beta_1 < 0$: P(presence) decreases as x increases.

Logistic Regression with 1 Predictor

$\beta_0$ and $\beta_1$ are unknown parameters and must be estimated using statistical software such as SPSS, SAS, R or STATA (or in a matrix language).
Primary interest is in estimating and testing hypotheses regarding $\beta_1$.
Large-sample test (Wald test):

$$H_0\!: \beta_1 = 0 \qquad H_A\!: \beta_1 \neq 0$$

$$\text{T.S.:}\quad X^2_{obs} = \left(\frac{\hat{\beta}_1}{\hat{se}(\hat{\beta}_1)}\right)^{\!2} \qquad \text{R.R.:}\quad X^2_{obs} \ge \chi^2_{\alpha,1} \qquad \text{P-value:}\quad P\big(\chi^2_1 \ge X^2_{obs}\big)$$

Note: some software packages perform this as an equivalent Z-test or t-test.

Odds Ratio

Interpretation of the regression coefficient ($\beta_1$):
In linear regression, the slope coefficient is the change in the mean response as x increases by 1 unit.
In logistic regression, with $\text{odds}(x) = \dfrac{\pi(x)}{1-\pi(x)}$, we can show that:

$$\frac{\text{odds}(x+1)}{\text{odds}(x)} = e^{\beta_1}$$

Thus $e^{\beta_1}$ represents the (multiplicative) change in the odds of the outcome when x increases by 1 unit.
If $\beta_1 = 0$, the odds and probability are the same at all x levels ($e^{\beta_1} = 1$).
If $\beta_1 > 0$, the odds and probability increase as x increases ($e^{\beta_1} > 1$).
If $\beta_1 < 0$, the odds and probability decrease as x increases ($e^{\beta_1} < 1$).

95% Confidence Interval for the Odds Ratio

Step 1: Construct a 95% CI for $\beta_1$:

$$\hat{\beta}_1 \pm 1.96\,\hat{se}(\hat{\beta}_1) \quad\Longrightarrow\quad \Big(\hat{\beta}_1 - 1.96\,\hat{se}(\hat{\beta}_1),\; \hat{\beta}_1 + 1.96\,\hat{se}(\hat{\beta}_1)\Big)$$

Step 2: Raise e = 2.718... to the lower and upper bounds of the CI:

$$\Big(e^{\hat{\beta}_1 - 1.96\,\hat{se}(\hat{\beta}_1)},\; e^{\hat{\beta}_1 + 1.96\,\hat{se}(\hat{\beta}_1)}\Big)$$

If the entire interval is above 1, conclude a positive association.
If the entire interval is below 1, conclude a negative association.
If the interval contains 1, we cannot conclude that there is an association.
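A sketch of these two steps in R, assuming a fitted logistic regression object called fit with a single predictor named x (both names are illustrative):

# Odds ratio and approximate 95% CI for the slope of a fitted logistic regression 'fit'
b1 <- coef(fit)["x"]                       # estimated slope
se <- sqrt(vcov(fit)["x", "x"])            # its standard error
ci_beta <- b1 + c(-1.96, 1.96) * se        # Step 1: CI for beta1
exp(b1)                                    # estimated odds ratio
exp(ci_beta)                               # Step 2: CI for the odds ratio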

EXAMPLE 1: Sex ratios in insects

Ex. Sex ratios in insects (the proportion of all individuals that are males).
In the species in question, it has been observed that the sex ratio is highly variable, and an experiment was set up to see whether population density was involved in determining the fraction of males.

Density  females  males
1        1        0
4        3        1
10       7        3
22       18       4
55       22       33
121      41       80
210      52       158
444      79       365

Ex. Sex ratios in insects (the proportion of all individuals that are males).
It certainly looks as if there are proportionally more males at high density, but we should plot the data as proportions to see this more clearly.

Enter the data into R
Make it into a data frame, as sketched below:
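A minimal sketch (the original slide's exact code is not shown; object names here are illustrative):

# Enter the insect sex-ratio data and build a data frame
density <- c(1, 4, 10, 22, 55, 121, 210, 444)
females <- c(1, 3, 7, 18, 22, 41, 52, 79)
males   <- c(0, 1, 3, 4, 33, 80, 158, 365)
sexratio <- data.frame(density, females, males)
sexratio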


Evidently, a logarithmic transformation of the explanatory variable is likely to improve the model fit (population density is involved in determining the fraction of males).

Question: does increasing population density lead to a significant increase in the proportion of males in the population (i.e., is the sex ratio density-dependent)?
The response variable is a matched pair of counts that we wish to analyse as proportion data.
The explanatory variable is population density.
First, bind together the vectors of male and female counts into a single object that will be the response in the analysis:

y <- cbind(males, females)

y will be interpreted in the model as the proportion of all individuals that were male.
Then, fit the generalized linear model with family = binomial (logit link); a sketch is given below.
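A minimal sketch of this fit (the original slide's exact code and output are not reproduced here):

# Fit a binomial GLM for the proportion of males as a function of density
model <- glm(y ~ density, family = binomial)
summary(model)   # intercept and slope, plus residual deviance and residual d.f.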

In the summary output, look at the estimated intercept and slope.
If the residual deviance is greater than the residual degrees of freedom, this is called overdispersion.

The slope is highly significantly steeper than zero (proportionately more males at higher population density).
See whether a log transformation of the explanatory variable reduces the residual deviance below 22.091.

In a GLM, it is assumed that the residual deviance is roughly the same as the residual degrees of freedom.
If the residual deviance > residual degrees of freedom, this is called OVERDISPERSION.
Overdispersion means there is extra, unexplained variation, over and above the binomial variance assumed by the model specification.
How to overcome it?
By transformation,
or by using quasi-likelihood (in the family argument), e.g. glm(y~log(x), family=quasibinomial) when the distribution is binomial.
A sketch of refitting with the log-transformed density is given below.
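A sketch of the refit with the log-transformed explanatory variable (this mirrors the approach described above; the exact output is not reproduced):

# Refit using log(density) as the explanatory variable
model2 <- glm(y ~ log(density), family = binomial)
summary(model2)   # check whether the residual deviance is now close to the residual d.f.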

In the model with log(density), there is no evidence of overdispersion.

Deviance tables correspond to ANOVA tables.
The analysis of deviance table:
The Deviance column gives the differences between models as variables are added to the model in turn.
The deviances are approximately chi-square distributed with the stated degrees of freedom.
It is necessary to add the test = "Chisq" argument to get the approximate chi-square tests.
If there is more than one predictor, to test which predictors should stay in or be removed from the model, we can use the function:
> drop1(model,test="Chisq")

Measure of Fit
The deviance shows how well the model fits the data.
To compare two models' deviances:
Use a likelihood ratio test.
Compare using the chi-square distribution (see the sketch below).
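A sketch of such an analysis of deviance in R, using the insect-data model fitted in the sketches above:

# Likelihood ratio (analysis of deviance) test: the null (intercept-only) model
# is compared with the model containing log(density)
anova(model2, test = "Chisq")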

To do the reverse transformation, to get back from the model coefficients (on the logit scale) to proportions, see the sketch below.
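A sketch, assuming the fitted object model2 from above; the inverse logit (anti-logit) of the linear predictor gives a predicted proportion:

# Back-transform from the logit scale to the proportion scale
b <- coef(model2)                       # coefficients on the log-odds scale
p_hat <- plogis(b[1] + b[2] * log(44))  # predicted proportion male at density 44 (illustrative value)
p_hat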

Model checking
Plot the model:
1. Residuals vs fitted values
2. Normal plot
3. Diagnostic checking: Cook's distance, etc.

> par(mfrow=c(2,2))
> plot(model)

There is no pattern in the residuals against fitted values.
The normal plot is reasonably linear.
Point no. 4 is highly influential (it has a large Cook's distance), but the model is still significant with the point omitted.

Conclusion of the example:

We conclude that the proportion of animals that are males increases significantly with increasing density.
The logistic model is linearized by logarithmic transformation of the explanatory variable (population density).
Draw the fitted line through the scatter plot:
xv <- seq(0,6,0.1)
plot(log(density),p,ylab="Proportion male")
lines(xv,predict(model,list(density=exp(xv)),type="response"))
The use of type="response" back-transforms from the logit scale to the S-shaped proportion scale.

EXAMPLE 2: The Challenger Data

[Table: temperature at launch (°F) and an indicator of at least one O-ring failure, for the 24 launches and static tests prior to Challenger; these are the same data shown earlier (launch temperatures 53-81 °F).]

A Logistic Regression Model for the Challenger Data

Test that all slopes are zero: G = 5.944, DF = 1, P-Value = 0.015

Goodness-of-Fit Tests
Method             Chi-Square   DF   P
Pearson            14.049       15   0.522
Deviance           15.759       15   0.398
Hosmer-Lemeshow    11.834            0.159

The fitted model is

$$\hat{y} = \frac{\exp(10.875 - 0.17132\,x)}{1 + \exp(10.875 - 0.17132\,x)}$$

Note that the fitted function has been extended down to 31 °F, the temperature at which Challenger was launched.
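For completeness, a sketch of how such a fit and the 31 °F extrapolation could be reproduced in R (assuming a data frame challenger with a 0/1 column failure and a numeric column temp, which is not defined in these notes):

# Hypothetical reproduction of the Challenger fit
fit_ch <- glm(failure ~ temp, family = binomial, data = challenger)
summary(fit_ch)                                  # slope should be about -0.17
predict(fit_ch, newdata = data.frame(temp = 31),
        type = "response")                       # extrapolated failure probability at 31 deg F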

Odds Ratio for the Challenger Data

$$\widehat{OR} = e^{\hat{\beta}_1} = e^{-0.17132} \approx 0.84$$

This implies that every decrease of one degree in temperature increases the odds of O-ring failure by about 1/0.84 = 1.19, or 19 percent.
The temperature at the Challenger launch was 22 degrees below the lowest observed launch temperature, so now

$$\widehat{OR} = e^{22(-0.17132)} \approx 0.0231$$

This results in an increase in the odds of failure of 1/0.0231 = 43.34, or about 4200 percent!
There's a big extrapolation here, but if you had known this prior to launch, what decision would you have made?

EXAMPLE 3: The Pneumoconiosis Data

Another logistic regression example: the pneumoconiosis data. A 1959 article in Biometrics reported the data.

[Slides 72-82 present the data, the fitted model, and diagnostic checking; source: Linear Regression Analysis, 5th ed., Montgomery, Peck & Vining.]

Useful qualities of the logit for social analysis

The use of odds in the outcome variable makes the model more sensitive to changes near p = 0 or p = 1 than to changes near p = 0.5.
This is appropriate in that small absolute changes in proportions near 0 or 1 tend to reflect bigger effects than small absolute changes in proportions near 0.5.
The use of the log function in the outcome variable makes the model sensitive to relative changes in proportions rather than absolute changes.
This is appropriate in that explanatory variables often have multiplicative rather than additive effects on response variables.

Advantages of the logit model over the linear regression model for binary outcomes

1.) The logit of the outcome tends to have a linear relationship with the explanatory variables. (This is the most important advantage!)
2.) The logit of the outcome can go to +∞ or -∞, so it is impossible to have meaningless predictions for the outcome variable.
3.) The logit model produces results equivalent to those of a homoskedastic model.

One important disadvantage of the logit model: estimation

A given individual either will or will not have the outcome, so the observed p = 0 or p = 1 for all cases.
What is the logit when p = 0? When p = 1? (It is undefined, since ln(0/(1-0)) and ln(1/(1-1)) do not exist.)
This problem makes it impossible to do least squares estimation of a logit model:
Least squares estimates minimize (observed - expected)², and the logit of the observed is always undefined!
It is also impossible to directly standardize logits, so there is no true r or r² for a logit model.

Solving the estimation problem for logit models:

Logit models are not solved by least squares estimation, but by a completely different procedure called maximum likelihood estimation.
Least squares procedures are based on the notion of a sampling distribution: a universe of possible samples coming from a single true population parameter.
Maximum likelihood procedures are based on the notion of a universe of possible population parameters that could produce the one observed sample.
Standard errors are comparable in the two procedures, but computation for MLE is prohibitively time-consuming for humans, so it is done iteratively by software.

Summary of this lecture:

You should be able to do the following:
explain the problems (in order of importance) with using a linear regression model when there is a binary outcome;
define a logit model in equations and in words;
explain why a logit model often overcomes the problems of a linear regression model;
look at the output of a logit model and be able to predict y, predict the odds of y, and predict the log odds of y for a given x, and to express the slope as an odds ratio.
