
Logistic Regression

Uploaded by

ismael kenedy
Copyright © All Rights Reserved

Logistic regression

• Logistic regression is a tool for modeling the effect of one or more risk factors on a binary (dichotomous) response. Usually this binary response indicates the presence or absence of “a disease”.
• Relative risk (RR) estimates the magnitude of an association between exposure and “disease”. It indicates the likelihood of developing the disease in the exposed group relative to those who are not exposed:

$$\mathrm{RR} = \frac{\text{Cumulative incidence (exposed)}}{\text{Cumulative incidence (unexposed)}}$$

Interpretation:
* RR = 1 means there is no difference between the exposed and the unexposed groups with respect to the incidence of the disease.
* RR > 1 means the exposed group is more at risk of the disease than the unexposed group.
* RR < 1 indicates an inverse association between the exposure and the disease.

Current OC use              Bacteriuria
(exposure/risk factor)    Yes     No     Total
Yes                        27    455       482
No                         77   1831      1908
Total                     104   2286      2390

$$\mathrm{RR} = \frac{\text{Cumulative incidence (exposed)}}{\text{Cumulative incidence (unexposed)}} = \frac{27/482}{77/1908} \approx 1.388$$
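The RR calculation above can be checked with a short Python sketch. The function name `relative_risk` and the cell labels a, b, c, d are illustrative choices, not notation from these notes.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Relative risk from a 2x2 table of counts.

    a: exposed, diseased      b: exposed, not diseased
    c: unexposed, diseased    d: unexposed, not diseased
    """
    incidence_exposed = a / (a + b)      # cumulative incidence among the exposed
    incidence_unexposed = c / (c + d)    # cumulative incidence among the unexposed
    return incidence_exposed / incidence_unexposed


# OC use vs. bacteriuria counts from the table above.
rr = relative_risk(27, 455, 77, 1831)
print(f"RR = {rr:.3f}")
```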
• The odds ratio is another measure of association between exposure and disease.
Odds:
* Given that a patient is exposed to the risk factor, the odds of him/her getting the disease are given by

$$\frac{P(\text{disease} \mid \text{exposure})}{P(\overline{\text{disease}} \mid \text{exposure})}$$

* Similarly, given that a patient is not exposed to the risk factor, the odds of him/her getting the disease are given by

$$\frac{P(\text{disease} \mid \overline{\text{exposure}})}{P(\overline{\text{disease}} \mid \overline{\text{exposure}})}$$

Then the odds ratio (OR) is given by:

$$\mathrm{OR} = \frac{P(\text{disease} \mid \text{exposure}) \,/\, P(\overline{\text{disease}} \mid \text{exposure})}{P(\text{disease} \mid \overline{\text{exposure}}) \,/\, P(\overline{\text{disease}} \mid \overline{\text{exposure}})}$$
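Here is a small Python sketch of the definition above. It forms each odds as a ratio of conditional probabilities estimated from 2×2 cell counts. It then checks that the result equals the simpler cross-product form a·d/(b·c). The function name and the labels a, b, c, d are illustrative choices. The counts are the OC use/bacteriuria data used in this section.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio from 2x2 counts.

    a: exposed, diseased      b: exposed, not diseased
    c: unexposed, diseased    d: unexposed, not diseased
    """
    odds_exposed = (a / (a + b)) / (b / (a + b))     # P(D | E) / P(not D | E)
    odds_unexposed = (c / (c + d)) / (d / (c + d))   # P(D | not E) / P(not D | not E)
    return odds_exposed / odds_unexposed


or_from_probs = odds_ratio(27, 455, 77, 1831)
or_cross_product = (27 * 1831) / (77 * 455)  # the same quantity as a*d / (b*c)
print(f"OR = {or_from_probs:.2f} (cross-product form: {or_cross_product:.2f})")
```

The row totals cancel inside each odds, which is why the odds ratio reduces to the cross-product of the four cells.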

Current OC use        Bacteriuria
                    Yes     No     Total
Yes                  27    455       482
No                   77   1831      1908
Total               104   2286      2390

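$$\mathrm{OR} = \frac{P(\text{disease} \mid \text{exposure}) \,/\, P(\overline{\text{disease}} \mid \text{exposure})}{P(\text{disease} \mid \overline{\text{exposure}}) \,/\, P(\overline{\text{disease}} \mid \overline{\text{exposure}})} = \frac{(27/482)\,/\,(455/482)}{(77/1908)\,/\,(1831/1908)} = \frac{27 \times 1831}{77 \times 455} \approx 1.41$$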
Modeling Binary data

We are interested in $P(\text{disease}) = P(y = 1) = p$, and particularly in how this probability relates to exposure status (or another risk factor).

A linear model of the form

$$P(y = 1) = p = \beta_0 + \beta_1 x$$

will be unsuitable because:

• The $\beta_j$ are free to take any value in $(-\infty, \infty)$. This implies that $P(y = 1)$ can take any value in $(-\infty, \infty)$, but $P(y = 1)$ only takes values in $[0, 1]$.
• The data are not normally distributed, hence the theory underlying linear models is not valid.
• The variance of the observed response is not constant across individuals, i.e. $\mathrm{Var}(y_i) = p_i(1 - p_i)$, which changes with $p_i$.
To ensure that $P(y = 1)$ lies in the interval $[0, 1]$, we transform the probability scale from the range $(0, 1)$ to $(-\infty, \infty)$. We then formulate a linear model for the transformed variable, which ensures that fitted probabilities lie between 0 and 1.

The particular transformation that we will concentrate on is the LOGIT transformation:

$$\operatorname{logit}(p) = \log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x = \log(\text{odds of success})$$

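Once we have estimated the parameters of the model, we can back-transform to obtain an estimate of $p$ using

$$\hat{p} = \frac{e^{b_0 + b_1 x}}{1 + e^{b_0 + b_1 x}}$$

where $b_0$ and $b_1$ are the estimates of $\beta_0$ and $\beta_1$.

$\beta_j$ is the log of the odds ratio of disease for exposure relative to non-exposure.

Thus, $e^{\beta_j}$ is an estimate of the odds ratio of disease among the exposed relative to the non-exposed. When the disease is rare, this odds ratio approximates the relative risk.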
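The logit transformation and its back-transformation can be sketched in Python as follows. The coefficients `b0` and `b1` are hypothetical values chosen only to show that any real-valued linear predictor maps to a probability strictly between 0 and 1, and that the logit recovers the linear predictor.

```python
import math


def logit(p: float) -> float:
    """Log-odds: maps a probability in (0, 1) onto the whole real line."""
    return math.log(p / (1 - p))


def inv_logit(eta: float) -> float:
    """Back-transformation p = e^eta / (1 + e^eta); always strictly between 0 and 1."""
    return math.exp(eta) / (1 + math.exp(eta))


# Hypothetical coefficients, used only to illustrate the round trip.
b0, b1 = -2.0, 0.8
for x in (-10, 0, 2.5, 10):
    eta = b0 + b1 * x          # linear predictor on the logit scale
    p_hat = inv_logit(eta)     # fitted probability
    print(x, round(p_hat, 4), round(logit(p_hat), 4))
```

For very large |eta| the direct formula `math.exp(eta)` can overflow. Library implementations therefore usually rearrange the expression, but the simple form is enough for the values met in these notes.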

Confidence intervals for parameters

A $100(1 - \alpha)\%$ confidence interval for the coefficient $\beta_j$ is

$$\hat{\beta}_j \pm z_{\alpha/2}\, \mathrm{s.e.}(\hat{\beta}_j).$$

The confidence limits for the corresponding odds ratio are obtained by exponentiating the confidence limits for $\beta_j$. That is,

$$e^{\hat{\beta}_j \pm z_{\alpha/2}\, \mathrm{s.e.}(\hat{\beta}_j)}.$$
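A minimal Python sketch of these two formulas is given below. It uses the standard library's `statistics.NormalDist` to obtain $z_{\alpha/2}$ for any confidence level. The function name `coef_and_or_ci` is our own. As a check, it uses the globulin estimates from the SPSS output in the example below (B = .155, S.E. = .120).

```python
import math
from statistics import NormalDist


def coef_and_or_ci(beta: float, se: float, level: float = 0.95):
    """Wald confidence interval for a logistic coefficient and for its odds ratio.

    Returns ((beta_lo, beta_hi), (or_lo, or_hi)); the odds-ratio limits are the
    exponentiated coefficient limits.
    """
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96 for 95%
    lo, hi = beta - z * se, beta + z * se
    return (lo, hi), (math.exp(lo), math.exp(hi))


(b_lo, b_hi), (or_lo, or_hi) = coef_and_or_ci(0.155, 0.120)
print(f"beta CI: ({b_lo:.3f}, {b_hi:.3f});  OR CI: ({or_lo:.3f}, {or_hi:.3f})")
```

The odds-ratio limits come out at about 0.923 and 1.477, close to the .924 and 1.477 printed by SPSS. The small difference comes from using the rounded B and S.E.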

Interpretation of the parameters in the logistic regression model

1. For a dichotomous exposure,

$\beta_j$ = log of the odds ratio of disease for exposure relative to non-exposure.

2. For a continuous exposure variable $X$, consider the ratio of the odds of disease for an individual for whom the value of the exposure is $X = x + 1$, relative to an individual with $X = x$:

$$\text{odds ratio} = \frac{\exp\{\beta_0 + \beta_1 (x + 1)\}}{\exp(\beta_0 + \beta_1 x)} = \exp(\beta_1)$$

where $\beta_1$ is the change in the log odds of disease (the log odds ratio) when $X$ is increased by one unit.

The estimated change in the log odds when $X$ is increased by $r$ units is $r\hat{\beta}_1$.

The corresponding estimate of the odds ratio is $\exp(r\hat{\beta}_1)$. This estimates the multiplicative change in the odds of disease for every increase of $r$ units in the value of $X$.
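The algebra above can be checked numerically. With hypothetical coefficients `beta0` and `beta1` (not estimates from any data in these notes), the ratio of odds at $x + 1$ and $x$ is the same for every $x$ and equals $\exp(\beta_1)$. An $r$-unit increase multiplies the odds by $\exp(r\beta_1)$.

```python
import math

# Hypothetical coefficients, used only to illustrate the algebra above.
beta0, beta1 = -3.0, 0.4


def odds(x: float) -> float:
    """Odds of disease at exposure level x: exp(beta0 + beta1 * x)."""
    return math.exp(beta0 + beta1 * x)


for x in (0.0, 1.7, 5.0):
    # beta0 cancels in the ratio, leaving exp(beta1) whatever the value of x.
    print(x, odds(x + 1) / odds(x), math.exp(beta1))

r = 2.5
# Odds ratio for an r-unit increase in X: exp(r * beta1).
print(odds(1.0 + r) / odds(1.0), math.exp(r * beta1))
```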

Example:

The data below come from a study to determine whether the levels of two proteins, fibrinogen and γ-globulin, increase the erythrocyte sedimentation rate (ESR), the rate at which red blood cells settle out of suspension in blood plasma. The two protein levels are measured in g/l. The response variable indicates whether an individual is healthy (0), when ESR < 20 mm/h, or unhealthy (1), when ESR > 20 mm/h.

Gender   Fibrinogen   Globulin   Response
1        2.52         38         0
1        2.56         31         0
0        2.19         33         0
0        2.18         31         0
1        3.41         37         0
0        2.46         36         0
0        3.22         38         0
1        2.21         37         0
0        3.15         39         0
1        2.6          41         0
1        2.29         36         0
0        2.35         29         0
0        5.06         37         1
1        3.34         32         1
1        2.38         37         1
1        3.15         36         0
0        3.53         46         1
0        2.68         34         0
1        2.6          38         0
0        2.23         37         0
1        2.88         30         0
0        2.65         46         0
1        2.09         44         1
0        2.28         36         0
0        2.67         39         0
0        2.29         31         0
0        2.15         31         0
1        2.54         28         0
0        3.93         32         1
0        3.34         30         0
1        2.99         36         0
1        3.22         35         0
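The SPSS output below comes from fitting $\operatorname{logit}(p) = \beta_0 + \beta_1\,\text{fibrinogen} + \beta_2\,\text{globulin}$ by maximum likelihood. The Python code here is an independent, dependency-free sketch of such a fit, not the SPSS algorithm. It applies Newton-Raphson to the 32 rows above and reports B, S.E., Exp(B) and the -2 log likelihood. Gender is not used. Because this is a separate implementation, its estimates may not match the SPSS table digit for digit. The helper names `solve` and `fit_logistic` are our own.

```python
import math

# (fibrinogen, globulin, response) copied from the table above; Gender is not used.
ROWS = [
    (2.52, 38, 0), (2.56, 31, 0), (2.19, 33, 0), (2.18, 31, 0),
    (3.41, 37, 0), (2.46, 36, 0), (3.22, 38, 0), (2.21, 37, 0),
    (3.15, 39, 0), (2.60, 41, 0), (2.29, 36, 0), (2.35, 29, 0),
    (5.06, 37, 1), (3.34, 32, 1), (2.38, 37, 1), (3.15, 36, 0),
    (3.53, 46, 1), (2.68, 34, 0), (2.60, 38, 0), (2.23, 37, 0),
    (2.88, 30, 0), (2.65, 46, 0), (2.09, 44, 1), (2.28, 36, 0),
    (2.67, 39, 0), (2.29, 31, 0), (2.15, 31, 0), (2.54, 28, 0),
    (3.93, 32, 1), (3.34, 30, 0), (2.99, 36, 0), (3.22, 35, 0),
]


def solve(a, b):
    """Solve a x = b for a small square matrix (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(rows, tol=1e-10, max_iter=50):
    """Newton-Raphson maximum likelihood for logit(p) = b0 + b1*fibrinogen + b2*globulin."""
    X = [(1.0, f, g) for f, g, _ in rows]
    y = [float(resp) for _, _, resp in rows]
    k = len(X[0])

    def score_and_info(beta):
        p = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi)))) for xi in X]
        # Score X'(y - p) and information X'WX, with W = diag(p(1 - p)).
        score = [sum(xi[j] * (yi - pi) for xi, yi, pi in zip(X, y, p)) for j in range(k)]
        info = [[sum(xi[j] * xi[c] * pi * (1 - pi) for xi, pi in zip(X, p))
                 for c in range(k)] for j in range(k)]
        return p, score, info

    beta = [0.0] * k
    for _ in range(max_iter):
        _, score, info = score_and_info(beta)
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    p, _, info = score_and_info(beta)
    # Standard errors: square roots of the diagonal of the inverse information matrix.
    se = [math.sqrt(solve(info, [float(i == j) for i in range(k)])[j]) for j in range(k)]
    minus2ll = -2 * sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))
    return beta, se, minus2ll


beta, se, minus2ll = fit_logistic(ROWS)
for name, b, s in zip(("Constant", "Fibrinogen", "Globulin"), beta, se):
    print(f"{name:<11} B = {b:8.3f}   S.E. = {s:6.3f}   Exp(B) = {math.exp(b):8.3f}")
print(f"-2 Log likelihood = {minus2ll:.3f}")
```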

Exercise:

Calculate the relative risk of disease related to a 0.5 unit increase in fibrinogen. Compute a 95% confidence interval for this relative risk.

Logistic regression output:

Model Summary

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      22.874a             .221                   .358

a. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.

Variables in the Equation

                     B         S.E.    Wald    df   Sig.   Exp(B)
Step 1a  Fibrinogen  1.942     .985    3.889   1    .049   6.972
         Globulin    .155      .120    1.687   1    .194   1.168
         Constant    -12.864   5.822   4.883   1    .027   .000

a. Variable(s) entered on step 1: Frabrino, Globulin.

Solution:

$\hat{\beta}_{\text{fibrinogen}} = 1.94$ and $\mathrm{s.e.}(\hat{\beta}_{\text{fibrinogen}}) = 0.985$.

Thus, the odds ratio for a 1 unit increase in fibrinogen is

$$e^{1.94} = 6.959.$$

A 95% confidence interval for $\beta_{\text{fibrinogen}}$:

$$(1.94 - 1.96 \times 0.985,\; 1.94 + 1.96 \times 0.985) = (0.0094,\; 3.8706)$$

A 95% confidence interval for the OR:

$$(e^{0.0094},\; e^{3.8706}) = (1.009,\; 47.97)$$

The effect of a 0.5 unit increase in fibrinogen is

$$\exp(0.5 \times 1.94) = 2.64.$$

To compute a 95% CI for this odds ratio, we first compute a 95% CI for $0.5\,\beta_{\text{fibrinogen}}$:

$$0.5\,\hat{\beta}_{\text{fibrinogen}} \pm z_{\alpha/2} \times 0.5 \times \mathrm{s.e.}(\hat{\beta}_{\text{fibrinogen}})$$

That is,

$$(0.5 \times 1.94 - 1.96 \times 0.5 \times 0.985,\; 0.5 \times 1.94 + 1.96 \times 0.5 \times 0.985) = (0.0047,\; 1.9353)$$

Thus, the corresponding 95% CI for the odds ratio is

$$(e^{0.0047},\; e^{1.9353}) = (1.0047,\; 6.926)$$
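The arithmetic of this solution can be reproduced in a few lines of Python. The inputs are the rounded values used above ($\hat\beta = 1.94$, s.e. = 0.985, $z_{0.025} = 1.96$). `r` is the size of the fibrinogen increase.

```python
import math

# Values used in the solution above.
beta_hat, se_hat = 1.94, 0.985
z = 1.96          # z_{alpha/2} for a 95% interval
r = 0.5           # size of the increase in fibrinogen (g/l)

# Work on the log-odds scale for r * beta, then exponentiate.
or_r = math.exp(r * beta_hat)
lo = r * beta_hat - z * r * se_hat
hi = r * beta_hat + z * r * se_hat
print(f"OR for a {r}-unit increase: {or_r:.2f}")
print(f"95% CI: ({math.exp(lo):.4f}, {math.exp(hi):.3f})")
```

Because $e^{r(\hat\beta \pm z\,\mathrm{s.e.})} = \left(e^{\hat\beta \pm z\,\mathrm{s.e.}}\right)^r$, the limits for an $r$-unit increase are the one-unit limits raised to the power $r$. For example, $\sqrt{47.97} \approx 6.926$.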

Variables in the Equation

                   B         S.E.    Wald    df   Sig.   Exp(B)   95% C.I. for EXP(B)
                                                                  Lower     Upper
Step 1a  Frabrino  1.942     .985    3.889   1    .049   6.972    1.012     48.034
         Globulin  .155      .120    1.687   1    .194   1.168    .924      1.477
         Constant  -12.864   5.822   4.883   1    .027   .000

a. Variable(s) entered on step 1: Frabrino, Globulin.
HYPOTHESIS TESTING
▪ The Wald statistic for the $\beta$ coefficient is:

$$\text{Wald} = \left[\frac{\hat{\beta}}{\mathrm{s.e.}(\hat{\beta})}\right]^2$$

which is distributed chi-square with 1 degree of freedom.
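The Wald test can be sketched in Python with the standard library. With 1 degree of freedom, the chi-square upper-tail probability equals the two-sided normal tail probability of $\hat\beta/\mathrm{s.e.}$, so `statistics.NormalDist` is enough. The function name `wald_test` is our own. The check uses the fibrinogen row of the SPSS output above.

```python
from statistics import NormalDist


def wald_test(b: float, se: float):
    """Wald chi-square statistic (1 df) and its p-value for a single coefficient.

    With 1 df, P(chi2 > W) equals the two-sided normal tail 2 * (1 - Phi(|b / se|)).
    """
    z = b / se
    wald = z ** 2
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return wald, p_value


# Fibrinogen row of the SPSS output: B = 1.942, S.E. = .985.
wald, p = wald_test(1.942, 0.985)
print(f"Wald = {wald:.3f}, Sig. = {p:.3f}")
```

The statistic comes out at about 3.887 rather than the printed 3.889, because SPSS works with unrounded B and S.E.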


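▪ The "Partial R" (in SPSS output) is

$$R = \left[\frac{\text{Wald} - 2}{-2LL(\alpha)}\right]^{1/2}$$

where $LL(\alpha)$ is the log likelihood of the model containing only the constant. R takes the sign of $\hat{\beta}$ and is reported as 0 when Wald < 2, as the EDUC row below shows.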

Example

Variable    B        S.E.     Wald     R        Sig      t-value
PETS        -0.659   0.2012   10.732   -0.113   0.0011   -3.28
MOBLHOME    1.5583   0.2874   29.39    0.1996   0        5.42
TENURE      -0.02    0.008    6.1238   -0.078   0.0133   -2.48
EDUC        0.0501   0.0468   1.1483   0.0000   0.2839   1.07
Constant    -0.916   0.69     1.7624   1        0.1843   -1.33
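The partial R formula can be sketched in Python. The sign convention and the zero floor for Wald < 2 follow the PETS and EDUC rows of the table. The value of $-2LL(\alpha)$ = 687.35714 is taken from the "Beginning Block" printout of the same model in the Model Chi-Square example later in these notes. The function name `partial_r` is our own.

```python
import math


def partial_r(b: float, wald: float, minus2ll_null: float) -> float:
    """Partial R = sqrt((Wald - 2) / (-2LL of the constant-only model)).

    The result takes the sign of B and is set to 0 when Wald < 2.
    """
    num = wald - 2
    if num <= 0:
        return 0.0
    return math.copysign(math.sqrt(num / minus2ll_null), b)


MINUS2LL_NULL = 687.35714  # -2LL of the constant-only model for this example
for name, b, wald in (("PETS", -0.659, 10.732),
                      ("MOBLHOME", 1.5583, 29.39),
                      ("EDUC", 0.0501, 1.1483)):
    print(name, round(partial_r(b, wald, MINUS2LL_NULL), 4))
```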

EVALUATING THE PERFORMANCE OF THE MODEL

There are several statistics which can be used for comparing alternative models or evaluating the
performance of a single model:

• Model Chi-Square
• Percent Correct Predictions
• Pseudo-R²

1. MODEL CHI-SQUARE

▪ The model likelihood ratio (LR) statistic is

$$LR[i] = -2\left[LL(\alpha) - LL(\alpha, \beta)\right]$$

{Or, as you are reading SPSS printout:

LR[i] = [-2LL (of beginning model)] - [-2LL (of ending model)]}

▪ The LR statistic is distributed chi-square with i degrees of freedom, where i is the number of independent variables.
▪ Use the "Model Chi-Square" statistic to determine whether the overall model is statistically significant.

Example

Beginning Block Number 1. Method: Enter

-2 Log Likelihood 687.35714

Variable(s) Entered on Step Number
1.. PETS PETS
MOBLHOME MOBLHOME
TENURE TENURE
EDUC EDUC

Estimation terminated at iteration number 3 because
Log Likelihood decreased by less than .01 percent.

-2 Log Likelihood 641.842

           Chi-Square   df   Sign.
Model      45.515       4    0.0000
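The model chi-square above can be turned into a p-value with the chi-square upper-tail probability. To keep the sketch free of third-party libraries, the helper `chi2_sf` (our own name) uses the closed-form expressions for integer degrees of freedom. With SciPy available, `scipy.stats.chi2.sf` would be the usual choice.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi2_df > x) for a positive integer df.

    Uses closed forms of the regularized upper incomplete gamma function
    Q(df/2, x/2) for integer and half-integer shape parameters.
    """
    y = x / 2
    if df % 2 == 0:
        # Q(m, y) = e^{-y} * sum_{j=0}^{m-1} y^j / j!
        return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(df // 2))
    # Q(m + 1/2, y) = erfc(sqrt(y)) + e^{-y} * sum_{j=1}^{m} y^{j-1/2} / Gamma(j + 1/2)
    m = df // 2
    tail = sum(y ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, m + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


# -2LL values from the SPSS printout above.
minus2ll_begin, minus2ll_end = 687.35714, 641.842
lr = minus2ll_begin - minus2ll_end   # model chi-square
p_value = chi2_sf(lr, df=4)          # 4 independent variables were entered
print(f"Model chi-square = {lr:.3f}, df = 4, p = {p_value:.2e}")
```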

2. PERCENT CORRECT PREDICTIONS

▪ The "Percent Correct Predictions" statistic assumes that if the estimated p is greater than or equal to .5, then the event is expected to occur; otherwise it is expected not to occur.
▪ By converting these probabilities into 0s and 1s and comparing them with the actual 0s and 1s, the % correct Yes, % correct No, and overall % correct scores are calculated.

Example

Observed     Predicted 0   Predicted 1   % Correct
0            328           24            93.18%
1            139           44            24.04%
Overall                                  69.53%
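The classification rule above can be sketched in Python. The function `classification_table` and its `cutoff` argument are our own names. The probabilities in the usage lines are invented placeholders, chosen only so that the counts reproduce the example table; they are not fitted values from any study.

```python
def classification_table(y_true, p_hat, cutoff=0.5):
    """Classification table for a fitted binary model.

    An observation is predicted to be an event (1) when its estimated p >= cutoff,
    and a non-event (0) otherwise. Returns the counts keyed by (observed, predicted)
    plus % correct for observed 0s, observed 1s and overall.
    """
    counts = {(obs, pred): 0 for obs in (0, 1) for pred in (0, 1)}
    for obs, p in zip(y_true, p_hat):
        pred = 1 if p >= cutoff else 0
        counts[(obs, pred)] += 1

    def pct(part, whole):
        return 100 * part / whole if whole else float("nan")

    n0 = counts[(0, 0)] + counts[(0, 1)]
    n1 = counts[(1, 0)] + counts[(1, 1)]
    return (counts,
            pct(counts[(0, 0)], n0),
            pct(counts[(1, 1)], n1),
            pct(counts[(0, 0)] + counts[(1, 1)], n0 + n1))


# Placeholder data reproducing the counts in the example table.
y_true = [0] * 352 + [1] * 183
p_hat = [0.2] * 328 + [0.7] * 24 + [0.3] * 139 + [0.6] * 44
counts, pct_no, pct_yes, overall = classification_table(y_true, p_hat)
print(f"% correct No = {pct_no:.2f}, % correct Yes = {pct_yes:.2f}, overall = {overall:.2f}")
```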

3. PSEUDO-R²

▪ One pseudo-R² statistic is McFadden's R²:

$$\text{McFadden's } R^2 = 1 - \frac{LL(\alpha, \beta)}{LL(\alpha)}$$

{= 1 - [-2LL(α, β) / -2LL(α)] (from SPSS printout)}

▪ where R² is a scalar measure which varies between 0 and (somewhat close to) 1, much like the R² in an LP model.

An Example:

Beginning -2 LL 687.36
Ending -2 LL 641.84
Ending/Beginning 0.9338
McF. R2 = 1 - E./B. 0.0662
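The example calculation can be reproduced with a one-line Python function. The name `mcfadden_r2` is our own. The factor -2 cancels in the ratio, so the -2LL values from the SPSS printout can be used directly.

```python
def mcfadden_r2(minus2ll_begin: float, minus2ll_end: float) -> float:
    """McFadden's pseudo-R^2 = 1 - LL(alpha, beta) / LL(alpha).

    The -2 factors cancel, so the -2LL values printed by SPSS can be used directly.
    """
    return 1 - minus2ll_end / minus2ll_begin


print(round(mcfadden_r2(687.36, 641.84), 4))
```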
