77 MultipleRegression
stcp-marshall-MultipleRegressionS
The following resources are associated: Simple linear regression in SPSS, Scatterplots and correlation in SPSS,
Checking normality in SPSS and the SPSS dataset ‘Birthweight_reduced.sav’
[Variable labels: Weight of mother before pregnancy; Mother smokes = 1]
The Simple linear regression in SPSS resource should be read before using this sheet.
The output shows that gestational age has a strong relationship with birthweight (r =
0.706), while maternal height (r = 0.368) and pre-pregnancy weight (r = 0.39) are
moderately related to birthweight. The relationship between maternal height and weight is
strong (r = 0.691) but not above 0.8, a commonly used threshold for concern about
multicollinearity between independent variables.
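The 0.8 check on correlations between independent variables can be sketched in Python. This is only an illustration: the value 0.691 is the one quoted above, while the helper names `pearson_r` and `flag_collinear`, the dictionary labels and the use of 0.8 as a default cut-off are assumptions made for this example, not SPSS features.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def flag_collinear(correlations, threshold=0.8):
    """Return the variable pairs whose absolute correlation exceeds the threshold."""
    return [pair for pair, r in correlations.items() if abs(r) > threshold]

# Correlation between two independent variables, as quoted in the output above.
# The labels are illustrative, not SPSS variable names.
predictor_correlations = {
    ("maternal height", "pre-pregnancy weight"): 0.691,
}

flagged = flag_collinear(predictor_correlations)
print(flagged)  # [] (no pair exceeds 0.8)
```

With raw data, `pearson_r` could be applied to each pair of independent variables to fill the dictionary before calling `flag_collinear`.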
Scatterplots should be produced for each
independent variable against the dependent
variable to see if the relationship is linear
(scatter forms a rough line). Binary variables
can be distinguished by different markers on
scatterplots, which helps to investigate
patterns within groups.
The relationship between gestational age and
birthweight is clearly linear. The babies of
smokers tend to be lighter at each gestational
age.
Steps in SPSS
To run a regression, go to Analyze → Regression → Linear.
Move ‘Birth weight’ to the Dependent box and ‘Gestational age at birth’, ‘Smoker’ and
‘mppwt’ (mothers’ pre-pregnancy weight) to the Independent(s) box. Multicollinearity can
be checked using the Collinearity diagnostics in the Statistics menu. In the Plots menu,
move ZRESID to the Y box and ZPRED to the X box to check the assumption of
homoscedasticity. Request the Histogram to check the normality of residuals.
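For readers who want to see what a linear regression fit computes, the following is a minimal Python sketch of ordinary least squares using the normal equations. It is not how SPSS is implemented. The six data rows are hypothetical, and the birthweights are generated without noise from the regression equation reported in the Output section of this sheet, so the fit should recover those coefficients approximately. The function names `solve_linear_system` and `fit_ols` are assumptions made for this example.

```python
def solve_linear_system(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, built from copies so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical rows: (gestation in weeks, smoker 0/1, pre-pregnancy weight in lb).
rows = [(38, 0, 110), (40, 1, 130), (36, 1, 100),
        (41, 0, 150), (39, 0, 120), (37, 1, 140)]
# Noise-free birthweights generated from the model reported in this sheet.
y = [-7.165 + 0.313 * g - 0.665 * s + 0.02 * w for g, s, w in rows]

coefficients = fit_ols(rows, y)
print([round(b, 3) for b in coefficients])
```

Real data contain noise, so a fit to the actual dataset would also need the standard errors and p-values that SPSS reports in the Coefficients table; this sketch covers only the coefficient estimates.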
Output
The Coefficients table contains the coefficients for the regression equation (model), tests
of significance for each variable and collinearity statistics.
The Sig column contains the p-values for each of the independent variables. The
hypothesis being tested for each is that the coefficient (B) is 0 after controlling for the other
variables. For example, the effects of gestational age and smoking are removed before
assessing the relationship between the weight of the mother and the weight of the baby. A
p-value < 0.05 provides evidence that the coefficient is different from 0. Gestational age
(p < 0.001), smoker (p = 0.017) and mothers’ pre-pregnancy weight (p = 0.03) are all
significant predictors of birthweight. If an independent variable is significant, explain the
relationship between it and the dependent variable using the Unstandardized Coefficient B.
The ‘B’ column in the Coefficients table gives the coefficients for each independent
variable in the regression model. The model is:
Birthweight (y) = -7.165 + 0.313 *(Gestation) – 0.665*(Smoker) + 0.02*(mppwt)
For gestation, there is a 0.313 lb increase in birthweight for each extra week of gestation.
For each extra pound (lb) a mother weighs, the baby’s weight increases by 0.02 lbs. For a
binary variable such as Smoker, coded as 0 and 1, the coefficient only applies to the group
coded as 1. Here smokers have babies who weigh 0.665 lbs less than those of non-smokers.
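As a worked illustration, the regression equation can be written as a small Python function. The coefficients are those reported in this sheet. The function name `predict_birthweight` and the example mother (40 weeks of gestation, 120 lb before pregnancy) are hypothetical, chosen only to show how the terms combine.

```python
def predict_birthweight(gestation_weeks, smoker, mppwt_lb):
    """Predicted birthweight in lb; smoker is coded 1 for smokers, 0 otherwise."""
    return -7.165 + 0.313 * gestation_weeks - 0.665 * smoker + 0.02 * mppwt_lb

# Hypothetical example: 40 weeks of gestation, pre-pregnancy weight of 120 lb.
non_smoker = predict_birthweight(40, 0, 120)
smoker = predict_birthweight(40, 1, 120)

print(round(non_smoker, 3))           # 7.755
print(round(smoker, 3))               # 7.09
print(round(non_smoker - smoker, 3))  # 0.665
```

The difference between the two predictions equals the Smoker coefficient, because the other two variables are held at the same values.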
The R² value of 0.61 indicates that 61% of the variation in birth weight can be explained by
the model containing gestation, smoker and pre-pregnancy weight. This is quite high so
predictions from the regression equation are fairly reliable. It also means that 39% of the
variation is still unexplained so adding other independent variables could improve the fit of
the model.
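The idea of "variation explained" can be sketched in Python using the standard definition R² = 1 − (residual sum of squares) / (total sum of squares). The observed and predicted values below are made-up numbers for illustration only and are not taken from the birthweight dataset; `r_squared` is a hypothetical helper name.

```python
def r_squared(observed, predicted):
    """Proportion of the variation in the observed values explained by the predictions."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_resid / ss_total

# Made-up values (lb), for illustration only.
observed = [6.0, 7.0, 8.0, 9.0]
predicted = [6.5, 6.5, 8.5, 8.5]

r2 = r_squared(observed, predicted)
print(round(r2, 3))  # 0.8, i.e. 80% explained and 20% unexplained
```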
Reporting regression
Multiple linear regression was carried out to investigate the relationship between birth
weight (lbs) and gestational age at birth (weeks), mothers’ pre-pregnancy weight (lbs) and
whether the mother smokes. There was a significant relationship between gestation and
birth weight (p < 0.001), smoking and birth weight (p = 0.017) and pre-pregnancy weight
and birth weight (p = 0.03). For gestation, there was a 0.313 lb increase in birthweight for
each extra week of gestation. For each extra pound (lb) a mother weighed, the baby’s
weight increased by 0.02 lbs, and smokers had babies who weighed 0.665 lbs less than
those of non-smokers.
The R² value was 0.61, so 61% of the variation in birth weight can be explained by the
model containing gestation, pre-pregnancy weight and whether the mother smokes or not.
The scatterplot of standardised predicted values versus standardised residuals showed
that the data met the assumptions of homogeneity of variance and linearity, and the
residuals were approximately normally distributed.