PH 750 Fundamentals of Biostatistics
Logistic Regression
Open NCBIRTH800mod dataset
Recall that the North Carolina State Center for Health Statistics makes publicly available birth and infant
death data for all children born in the state of North Carolina. This comprehensive dataset for the births
in 2001 contains 120,300 records. The data in ‘NCBIRTH800.sav’ represents a random sample of 800 of
those births and selected variables.
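Linear regression models the relationship between a continuous outcome and one or more explanatory variables. The outcome has to be continuous.
Logistic regression is used when the outcome is binary (takes two values), e.g. alive/dead, sick/healthy, treatment response/no response.
A linear model is not appropriate for a binary outcome because the outcome is not continuous: no straight line can fit it.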
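To make the contrast concrete: logistic regression does not fit a line to the 0/1 outcome itself; it models the log odds of the outcome as a linear function of the explanatory variables. A minimal Python sketch of the conversions between probability, odds and log odds is given here. The function names are illustrative only and are not part of the course materials (the analyses in this handout are run in SPSS).

```python
import math

def odds_from_probability(p):
    """Odds = p / (1 - p), for a probability strictly between 0 and 1."""
    return p / (1.0 - p)

def logit(p):
    """Log odds of a probability; this is the scale on which the model is linear."""
    return math.log(odds_from_probability(p))

def inverse_logit(x):
    """Map any log-odds value back to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

# Probabilities are bounded by 0 and 1, but their log odds are not.
for p in (0.1, 0.5, 0.9):
    x = logit(p)
    print(f"p={p:.1f}  odds={odds_from_probability(p):.3f}  "
          f"log odds={x:+.3f}  back to p={inverse_logit(x):.1f}")
```

Because the log odds can take any value, a linear predictor can be used on that scale and then converted back to a probability.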
Logistic Regression
1. Is low birth weight associated with mother’s smoking status during pregnancy after
accounting for gestational age?
EXPLORATORY ANALYSIS
First construct contingency tables for each categorical explanatory variable and the outcome variable
• ANALYZE →DESCRIPTIVES →CROSSTABS
Smoked during pregnancy * Low birthweight infant Crosstabulation
                                                                           Low birthweight infant
Smoked during pregnancy                                                    Not low birthweight   Low birthweight   Total
Mother did not smoke during pregnancy   Count                              629                   55                684
                                        % within Smoked during pregnancy   92.0%                 8.0%              100.0%
Mother did smoke during pregnancy       Count                              99                    15                114
                                        % within Smoked during pregnancy   86.8%                 13.2%             100.0%
Total                                   Count                              728                   70                798
                                        % within Smoked during pregnancy   91.2%                 8.8%              100.0%
Chi-Square Tests
                               Value    df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                             (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             3.197a   1    .074
Continuity Correction b        2.590    1    .108
Likelihood Ratio               2.877    1    .090
Fisher's Exact Test                                        .105         .059
Linear-by-Linear Association   3.193    1    .074
N of Valid Cases 798
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 10.00.
b. Computed only for a 2x2 table
Summary statistics:
% low birth weight infants for mothers who smoked during pregnancy: 13.2%
% low birth weight infants for mothers who did not smoke during pregnancy: 8.0%
χ²=3.20, df=1, p=0.074, unadjusted OR = (629 × 15) / (99 × 55) = 1.73
The odds of a low birth weight infant are 1.73 times as high among mothers who smoked during pregnancy
as among those who did not smoke. This difference is marginally significant (p=0.074).
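The summary statistics above can be reproduced by hand from the four cell counts of the crosstabulation. The following is a minimal Python sketch using only the standard library; the function names are illustrative, and the p-value uses the identity that a chi-square statistic with 1 degree of freedom is the square of a standard normal variable.

```python
import math

# Rows: mother did not smoke, mother smoked.
# Columns: infant not low birthweight, infant low birthweight.
table = [[629, 55],
         [99, 15]]

def odds_ratio(t):
    """Cross-product odds ratio of a 2x2 table: (a*d) / (b*c)."""
    (a, b), (c, d) = t
    return (a * d) / (b * c)

def pearson_chi_square(t):
    """Pearson chi-square: sum over cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in t]
    col_totals = [sum(col) for col in zip(*t)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(t):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value_df1(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

unadjusted_or = odds_ratio(table)
chi2 = pearson_chi_square(table)
p = p_value_df1(chi2)
print(f"OR={unadjusted_or:.2f}  chi-square={chi2:.3f}  p={p:.3f}")
```

The smallest expected count computed inside the loop is 10.0, which agrees with footnote a of the Chi-Square Tests table.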
Note: as part of exploratory analysis you may also assess the relationship between any continuous
explanatory variables and the outcome using the independent samples t-test (not shown here).
Logistic regression:
ANALYZE →REGRESSION →BINARY LOGISTIC
Under Options, check CI for exp(B).
Variables in the Equation
                     B        S.E.    Wald     df   Sig.   Exp(B)          95% C.I. for EXP(B)
                                                                           Lower    Upper
Step 1a   smoke      .672     .378    3.166    1    .075   1.958           .934     4.105
          weeks      -.610    .070    76.418   1    .000   .543            .474     .623
          Constant   20.362   2.582   62.183   1    .000   697088468.827
a. Variable(s) entered on step 1: smoke, weeks.
Interpretation: The odds of a low birth weight baby are 1.96 times as high among mothers who smoked
during pregnancy as among those who did not smoke, after adjusting for gestational age
(OR=1.96, 95% CI (0.93, 4.11)). This association is only marginally significant (p=0.075).
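The Wald, Sig., Exp(B) and confidence interval columns in the table above are all derived from B and S.E. A minimal Python sketch of those relationships follows. The function name is illustrative, and because the inputs are the rounded values printed by SPSS, the results agree with the table only to about two decimal places.

```python
import math

def wald_summary(b, se, z_crit=1.96):
    """Derive the Wald test and the odds ratio with its 95% CI from B and S.E."""
    z = b / se
    return {
        "wald": z ** 2,                                   # Wald chi-square, 1 df
        "sig": math.erfc(abs(z) / math.sqrt(2.0)),        # two-sided p-value
        "exp_b": math.exp(b),                             # odds ratio
        "lower": math.exp(b - z_crit * se),               # CI limits are computed
        "upper": math.exp(b + z_crit * se),               # on the log-odds scale
    }

smoke = wald_summary(0.672, 0.378)
weeks = wald_summary(-0.610, 0.070)

for name, row in (("smoke", smoke), ("weeks", weeks)):
    print(f"{name}: Wald={row['wald']:.2f}  Sig.={row['sig']:.3f}  "
          f"Exp(B)={row['exp_b']:.3f}  95% CI ({row['lower']:.3f}, {row['upper']:.3f})")
```

The interval for smoke includes 1, which is consistent with p being above 0.05, while the interval for weeks lies entirely below 1.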
2. Is low birth weight associated with mother’s ethnicity, after accounting for gestational
age, mother’s age and smoking status during pregnancy?
Note: ‘Other’ ethnicity category was excluded because of the small sample size in that group
Select Non-Hispanic White, Non-Hispanic Black, Hispanic mothers for the analysis:
DATA → SELECT CASES → choose “If condition satisfied”
Move “ethnmom” into the condition box and type <4, so that the condition reads ethnmom < 4
Click on Continue → OK
Note that there is a new variable in the dataset called filter_$, which indicates which cases satisfy the
condition (here indicating all Non Hispanic White, Non Hispanic Black and Hispanic mothers in the
dataset)
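The logic of the Select Cases step can be sketched outside SPSS as follows. This is an illustration only: the records are made up (they are not rows from NCBIRTH800), the field name filter_flag stands in for filter_$, and the code 4 for the ‘Other’ category is an assumption implied by the condition ethnmom < 4.

```python
# Ethnicity codes 1-3 are taken from the handout; code 4 for 'Other' is assumed.
ETHNICITY_LABELS = {
    1: "Non-Hispanic White",
    2: "Non-Hispanic Black",
    3: "Hispanic",
    4: "Other",
}

# Made-up records for illustration only.
records = [
    {"id": 1, "ethnmom": 1},
    {"id": 2, "ethnmom": 4},
    {"id": 3, "ethnmom": 2},
    {"id": 4, "ethnmom": 3},
    {"id": 5, "ethnmom": 4},
]

def add_filter_flag(rows, upper_code=4):
    """Return copies of the rows with a 0/1 flag marking ethnmom < upper_code."""
    return [dict(row, filter_flag=1 if row["ethnmom"] < upper_code else 0)
            for row in rows]

flagged = add_filter_flag(records)

# Only flagged cases enter the analysis; the other cases stay in the dataset.
selected = [row for row in flagged if row["filter_flag"] == 1]
for row in selected:
    print(row["id"], ETHNICITY_LABELS[row["ethnmom"]])
```

As in SPSS, the unselected cases are not deleted; they are only excluded from subsequent analyses while the filter is on.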
UNADJUSTED (CRUDE) ANALYSIS
ANALYZE →REGRESSION →BINARY LOGISTIC
Under Categorical move ethnmom into the Categorical Covariates box and select First to indicate that
Non Hispanic White mothers are the reference category
Reference category check:
• SPSS output: choice of reference category
• Variable coding in SPSS:
• 1- Non-Hispanic White
• 2-Non-Hispanic Black
• 3-Hispanic
• In logistic regression, the category coded with all zeros is the reference category, here
Non-Hispanic White
Categorical Variables Codings
                                             Parameter coding
                                 Frequency   (1)      (2)
Ethnicity   Non-Hispanic White   523         .000     .000
            Non-Hispanic Black   168         1.000    .000
            Hispanic             82          .000     1.000
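The parameter coding in the table above is ordinary indicator (dummy) coding with the first category as reference. A minimal Python sketch of how such a coding can be built is shown here; the function name is illustrative and this is not SPSS's own implementation.

```python
def indicator_coding(codes, reference):
    """Build indicator coding: one 0/1 column per non-reference category.

    The reference category is the one whose columns are all zeros.
    """
    non_reference = [c for c in codes if c != reference]
    return {c: [1.0 if c == other else 0.0 for other in non_reference]
            for c in codes}

# ethnmom codes from the handout: 1 = Non-Hispanic White (reference),
# 2 = Non-Hispanic Black, 3 = Hispanic.
coding = indicator_coding([1, 2, 3], reference=1)
for code in sorted(coding):
    print(code, coding[code])
```

Column (1) therefore compares Non-Hispanic Black mothers with the reference group and column (2) compares Hispanic mothers with the reference group, which is how ethnmom(1) and ethnmom(2) should be read in the regression output.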
Sample size for this analysis:
Case Processing Summary
Unweighted Cases a                          N     Percent
Selected Cases     Included in Analysis     773   100.0
                   Missing Cases            0     .0
                   Total                    773   100.0
Unselected Cases                            0     .0
Total                                       773   100.0
a. If weight is in effect, see classification table for the total number
of cases.
Dependent variable coding:
Dependent Variable Encoding
Original Value                   Internal Value
Infant was not low birthweight   0
Infant was low birthweight       1
Logistic regression model:
Variables in the Equation
                       B        S.E.   Wald      df   Sig.   Exp(B)   95% C.I. for EXP(B)
                                                                      Lower    Upper
Step 1a   ethnmom                      6.913     2    .032
          ethnmom(1)   .699     .275   6.463     1    .011   2.012    1.174    3.451
          ethnmom(2)   -.048    .455   .011      1    .916   .953     .391     2.325
          Constant     -2.491   .165   229.245   1    .000   .083
a. Variable(s) entered on step 1: ethnmom.
In the unadjusted (crude) analysis:
The odds of having low birth weight babies were significantly higher for Non-Hispanic Black mothers
than for Non-Hispanic White mothers (reference category): OR=2.01, 95% CI (1.17, 3.45), p=0.011.
There was no statistically significant difference in the odds of having low birth weight
babies between Hispanic and Non-Hispanic White mothers (reference category): OR=0.95,
95% CI (0.39, 2.33), p=0.916.
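To see what the crude coefficients mean on the probability scale, the constant and the two ethnmom terms can be converted into odds and fitted probabilities for each group. The following Python sketch uses the rounded B values from the unadjusted output; the variable names are illustrative.

```python
import math

# Rounded coefficients from the unadjusted model.
constant = -2.491        # log odds of low birth weight in the reference group
b_black = 0.699          # ethnmom(1): Non-Hispanic Black vs. Non-Hispanic White
b_hispanic = -0.048      # ethnmom(2): Hispanic vs. Non-Hispanic White

log_odds = {
    "Non-Hispanic White": constant,
    "Non-Hispanic Black": constant + b_black,
    "Hispanic": constant + b_hispanic,
}

# Odds are exp(log odds); probability is odds / (1 + odds).
odds = {group: math.exp(value) for group, value in log_odds.items()}
probability = {group: o / (1.0 + o) for group, o in odds.items()}

for group in log_odds:
    ratio = odds[group] / odds["Non-Hispanic White"]
    print(f"{group}: odds={odds[group]:.3f}  "
          f"probability={probability[group]:.3f}  OR vs. reference={ratio:.2f}")
```

Exp(B) for the constant (.083) is thus the odds of low birth weight among Non-Hispanic White mothers, and each odds ratio is the ratio of a group's odds to those reference odds.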
ADJUSTED ANALYSIS
ANALYZE →REGRESSION →BINARY LOGISTIC
Under Categorical move ethnmom into the Categorical Covariates box and select First to indicate that
Non Hispanic White mothers are the reference category
Variables in the Equation
                       B        S.E.    Wald     df   Sig.   Exp(B)          95% C.I. for EXP(B)
                                                                             Lower    Upper
Step 1a   smoke        .814     .389    4.378    1    .036   2.257           1.053    4.840
          weeks        -.612    .071    74.059   1    .000   .542            .472     .623
          mage         .000     .027    .000     1    .988   1.000           .949     1.054
          ethnmom                       1.578    2    .454
          ethnmom(1)   .461     .368    1.570    1    .210   1.585           .771     3.257
          ethnmom(2)   .223     .514    .189     1    .664   1.250           .456     3.425
          Constant     20.311   2.725   55.559   1    .000   662045947.745
a. Variable(s) entered on step 1: smoke, weeks, mage, ethnmom.
There are no statistically significant differences in the odds of having low birth weight babies
between Non-Hispanic Black and Non-Hispanic White mothers (p=.210) or between Hispanic and
Non-Hispanic White mothers (p=.664), after adjusting for gestational age, mother’s age and smoking during
pregnancy.
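As an illustration of how the adjusted model combines its terms, the coefficients can be used to compute a predicted probability of low birth weight for a given mother. The Python sketch below uses the rounded B values from the adjusted output. The covariate profiles (40 or 36 weeks, age 25) are hypothetical examples chosen for illustration, not cases from the dataset, and smoke is assumed to be coded 1 for mothers who smoked and 0 otherwise.

```python
import math

# Rounded coefficients from the adjusted model.
COEF = {
    "constant": 20.311,
    "smoke": 0.814,       # assumed coding: 1 = smoked during pregnancy, 0 = did not
    "weeks": -0.612,      # completed weeks of gestation
    "mage": 0.000,        # mother's age in years
    "black": 0.461,       # ethnmom(1): Non-Hispanic Black indicator
    "hispanic": 0.223,    # ethnmom(2): Hispanic indicator
}

def predicted_probability(smoke, weeks, mage, black=0, hispanic=0):
    """Inverse logit of the linear predictor for one covariate profile."""
    linear_predictor = (COEF["constant"] + COEF["smoke"] * smoke
                        + COEF["weeks"] * weeks + COEF["mage"] * mage
                        + COEF["black"] * black + COEF["hispanic"] * hispanic)
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical Non-Hispanic White mothers aged 25.
p_term_nonsmoker = predicted_probability(smoke=0, weeks=40, mage=25)
p_term_smoker = predicted_probability(smoke=1, weeks=40, mage=25)
p_early_nonsmoker = predicted_probability(smoke=0, weeks=36, mage=25)

print(f"40 weeks, non-smoker: {p_term_nonsmoker:.3f}")
print(f"40 weeks, smoker:     {p_term_smoker:.3f}")
print(f"36 weeks, non-smoker: {p_early_nonsmoker:.3f}")
```

The ratio of the odds for the two 40-week profiles equals Exp(B) for smoke (about 2.26), whatever values the other covariates are held at.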