Logistic Regression

This document discusses logistic regression, which is a statistical analysis used to explain the relationship between a categorical dependent variable and one or more independent variables. Logistic regression allows prediction of the probability of occurrence of an event by fitting data to a logistic curve. The document covers the logistic regression model, interpretation of coefficients, and estimation of parameters in multiple logistic regression analysis.

Uploaded by

prabin regmi
Copyright
© All Rights Reserved

Logistic regression

Logistic regression analysis is a popular and widely used analysis that is similar to linear regression analysis except that the outcome is dichotomous (e.g., success/failure, yes/no, or died/lived).

Logistic regression is used to describe data and to explain the relationship between one binary dependent variable and one or more nominal, ordinal, interval, or ratio-level independent variables.

Simple logistic regression analysis refers to the regression application with one dichotomous outcome and one independent variable.

Multiple logistic regression analysis applies when there is a single dichotomous outcome and more than one independent variable.

Example question: Do body weight, calorie intake, fat intake, and age have an influence on the probability of having a heart attack (yes vs. no)?

The Logistic Regression Model

The "logit" model solves these problems:

ln[p/(1-p)] = α + βX + e

• p is the probability that the event Y occurs, p(Y=1)
• p/(1-p) is the odds
• ln[p/(1-p)] is the log odds, or "logit"

• The logistic distribution constrains the estimated probabilities to lie between 0 and 1.

• The estimated probability is:

p = 1/[1 + exp(-α - βX)]

• If α + βX = 0, then p = 0.50
• As α + βX becomes large and positive, p approaches 1
• As α + βX becomes large and negative, p approaches 0
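The three limiting cases above can be checked numerically. The following is a minimal Python sketch; the function name `logistic_probability` and the coefficient values are hypothetical, chosen only so that α + βX equals zero at X = 2.

```python
import math

def logistic_probability(alpha, beta, x):
    """Estimated probability p = 1 / (1 + exp(-(alpha + beta * x)))."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

# Hypothetical coefficients for illustration only (not estimated from data).
alpha, beta = -2.0, 1.0

p_mid = logistic_probability(alpha, beta, 2.0)     # alpha + beta*x = 0
p_high = logistic_probability(alpha, beta, 30.0)   # alpha + beta*x large and positive
p_low = logistic_probability(alpha, beta, -30.0)   # alpha + beta*x large and negative

print(p_mid, p_high, p_low)
```

Whatever values are chosen for the coefficients, the result always stays between 0 and 1, which is the constraint the logistic form is meant to provide.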

Since 0 ≤ P ≤ 1:

Odds = P/(1-P)

The odds have no “ceiling” but have a “floor” of zero.

So we use the logit transformation:

ln(P/(1-P)) = ln(odds) = logit(P)

The logit has neither a floor nor a ceiling.

Model:

ln(P/(1-P)) = β0 + β1X1 + β2X2 + … + βkXk

or

Odds = e^(β0 + β1X1 + β2X2 + … + βkXk) = e^(logit)

Since P = odds/(1 + odds) and odds = e^(logit):

P = e^(logit)/(1 + e^(logit)) = 1/(1 + e^(-logit))
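The conversions between probability, odds, and logit can be sketched in a few lines of Python. The helper names below (`odds_from_p`, `logit`, `p_from_logit`) are hypothetical and serve only to illustrate the formulas on this slide.

```python
import math

def odds_from_p(p):
    """Odds = P / (1 - P); floor of zero, no ceiling."""
    return p / (1.0 - p)

def logit(p):
    """logit(P) = ln(odds); unbounded in both directions."""
    return math.log(odds_from_p(p))

def p_from_logit(z):
    """Inverse transformation: P = 1 / (1 + e^(-logit))."""
    return 1.0 / (1.0 + math.exp(-z))

for p in (0.1, 0.5, 0.9):
    z = logit(p)
    # The last column recovers the starting probability (up to rounding).
    print(p, odds_from_p(p), z, p_from_logit(z))
```

A probability of 0.5 corresponds to odds of 1 and a logit of 0, and probabilities p and 1 - p give logits of equal size and opposite sign.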

If ln(odds) = β0 + β1X1 + β2X2 + … + βkXk
then
odds = (e^(β0)) (e^(β1X1)) (e^(β2X2)) … (e^(βkXk))
or
odds = (base odds) × OR1^X1 × OR2^X2 × … × ORk^Xk

The model is multiplicative on the odds scale.

(Base odds = e^(β0) are the odds when all Xs = 0)

ORi = e^(βi) = odds ratio for a one-unit increase in the ith X
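A short Python sketch can confirm that adding terms on the log-odds scale is the same as multiplying factors on the odds scale. The coefficients and X values below are hypothetical and chosen for illustration only; each term e^(βi Xi) contributes one multiplicative factor.

```python
import math

# Hypothetical coefficients and covariate values (not estimated from data).
beta0 = -3.0
betas = [0.8, -0.5, 0.25]
xs = [1.0, 2.0, 4.0]

# Additive on the log-odds scale.
log_odds = beta0 + sum(b * x for b, x in zip(betas, xs))
odds_from_sum = math.exp(log_odds)

# Multiplicative on the odds scale: base odds times one factor per X.
base_odds = math.exp(beta0)
odds_from_product = base_odds
for b, x in zip(betas, xs):
    odds_from_product *= math.exp(b) ** x  # (e^beta_i)^X_i = e^(beta_i * X_i)

print(odds_from_sum, odds_from_product)  # the two values agree up to rounding
```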

Interpreting β coefficients

Example: Dichotomous X

X = 0 for males, X=1 for females


logit(P) = β0 + β1 X

M: X=0, logit(Pm)= β0

F: X=1, logit(Pf) = β0 + β1

logit(Pf) – logit(Pm) = β1

log(OR) = β1, so e^(β1) = OR
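As a numerical check of this derivation, the Python sketch below starts from two hypothetical event probabilities (0.20 for males and 0.40 for females, invented for illustration), recovers β0 and β1 from the logits, and compares e^(β1) with the odds ratio computed directly.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

# Hypothetical probabilities of the event (illustration only).
p_m = 0.20  # males,   X = 0
p_f = 0.40  # females, X = 1

beta0 = logit(p_m)               # logit(Pm) = beta0
beta1 = logit(p_f) - logit(p_m)  # logit(Pf) - logit(Pm) = beta1

# Odds ratio computed directly from the two sets of odds.
odds_ratio = (p_f / (1.0 - p_f)) / (p_m / (1.0 - p_m))

print(beta1, math.exp(beta1), odds_ratio)  # exp(beta1) matches the odds ratio
```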


Table: Age and signs of coronary heart disease (CD)

Age  CD    Age  CD    Age  CD
22   0     40   0     54   0
23   0     41   1     55   1
24   0     46   0     58   1
27   0     47   0     60   1
28   0     48   0     60   0
30   0     49   1     62   1
30   0     49   0     65   1
32   0     50   1     67   1
33   0     51   0     71   1
35   1     51   1     77   1
38   0     52   0     81   1

Dot-plot: signs of coronary disease (Yes/No) plotted against age (years, axis from 0 to 100). [Figure not reproduced]

Picture of Logistic Regression

Points on the regression line represent the predicted probability of Y for each value of X.

Multiple logistic regression
• More than one independent variable
• Dichotomous, ordinal, nominal, continuous …

ln(P/(1-P)) = α + β1x1 + β2x2 + … + βixi

• Interpretation of βi:
• Increase in log-odds for a one-unit increase in xi with all the other xi's held constant
• Measures the association between xi and the log-odds, adjusted for all other xi

Example: P is the proportion with disease
logit(P) = β0 + β1 age + β2 sex
“sex” is coded 0 for M, 1 for F
The OR for F vs. M for disease is e^(β2) if both are the same age.

e^(β1) is the factor by which the odds of disease are multiplied for a one-year increase in age (the OR per year of age), holding sex constant.

(e^(β1))^k = e^(kβ1) is the OR for a k-year difference in age between two groups of the same sex.
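The three odds ratios in this example can be illustrated with a short Python sketch. The coefficient values and the helper `odds` are hypothetical, invented only to show the arithmetic; they are not estimates from any data set.

```python
import math

# Hypothetical coefficients for logit(P) = beta0 + beta_age*age + beta_sex*sex.
beta0, beta_age, beta_sex = -5.0, 0.08, -0.6

def odds(age, sex):
    """Odds of disease; sex is coded 0 for M, 1 for F."""
    return math.exp(beta0 + beta_age * age + beta_sex * sex)

# OR for F vs. M at the same age equals exp(beta_sex).
or_female_vs_male = odds(50, 1) / odds(50, 0)

# OR for a one-year increase in age, same sex, equals exp(beta_age).
or_one_year = odds(51, 0) / odds(50, 0)

# OR for a k-year difference in age, same sex, equals exp(k * beta_age).
k = 10
or_k_years = odds(50 + k, 1) / odds(50, 1)

print(or_female_vs_male, or_one_year, or_k_years)
```

The reference age of 50 is arbitrary: because the model is multiplicative on the odds scale, the same ratios result at any age.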

Estimation of parameters

• In linear regression, coefficients are estimated by minimizing the sum of squared errors

• Since p is non-linear in the parameters, logistic regression needs a non-linear estimation technique:
• Maximum-Likelihood Approach
• Non-Linear Least Squares
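As an illustration of the maximum-likelihood approach, the sketch below fits a simple logistic regression of CD on age to the 33 observations from the age and CD table, using Newton-Raphson iterations in plain Python. The function name `fit_simple_logistic`, the centering of age for numerical stability, and the stopping rule are assumptions of this sketch; the slides do not say which algorithm or software was used.

```python
import math

# Data from the "Age and signs of coronary heart disease (CD)" table.
ages = [22, 23, 24, 27, 28, 30, 30, 32, 33, 35, 38,
        40, 41, 46, 47, 48, 49, 49, 50, 51, 51, 52,
        54, 55, 58, 60, 60, 62, 65, 67, 71, 77, 81]
cd = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
      0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0,
      0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_simple_logistic(x, y, n_iter=25, tol=1e-10):
    """Maximize the log-likelihood of logit(p) = alpha + beta*x by Newton-Raphson."""
    mean_x = sum(x) / len(x)
    xc = [xi - mean_x for xi in x]  # centered predictor (assumption: for stability)
    a, b = 0.0, 0.0                 # intercept and slope on the centered scale
    for _ in range(n_iter):
        p = [sigmoid(a + b * xi) for xi in xc]
        # Gradient of the log-likelihood (score).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, xc))
        # Information matrix (negative Hessian), a symmetric 2x2 matrix.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, xc))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, xc))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = score.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    # Convert the intercept back to the uncentered age scale.
    return a - b * mean_x, b

alpha_hat, beta_hat = fit_simple_logistic(ages, cd)
print(alpha_hat, beta_hat, math.exp(beta_hat))  # exp(beta_hat): OR per year of age
```

At the maximum-likelihood solution the fitted probabilities sum to the observed number of cases, which gives a quick sanity check on convergence.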
