Community Project: Simple Linear Regression in SPSS
stcp-marshall-regressionS
Before carrying out any analysis, investigate the relationship between the independent and
dependent variables by producing a scatterplot and calculating the correlation coefficient.
For a scatterplot: Graphs > Legacy Dialogs > Scatter/Dot, then choose ‘Simple Scatter’. Move the dependent variable ‘Birth weight’ to the Y Axis box and the independent variable ‘Gestation’ to the X Axis box.
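As a rough illustration of what the correlation coefficient measures, the Python sketch below computes Pearson's r from first principles. The function name `pearson_r` and the small data set are hypothetical, made up for illustration; they are not the handout's birth weight data, and the real analysis should use the SPSS output.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data (NOT the handout's data set)
gestation = [36, 37, 38, 39, 40, 41, 42]       # weeks
weight = [5.9, 6.4, 6.6, 7.3, 7.2, 7.9, 8.1]   # lbs

print(round(pearson_r(gestation, weight), 3))
```

A value close to +1 indicates a strong positive linear relationship, which is what the scatterplot of such data would show as an upward-sloping band of points.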
Simple linear regression quantifies the relationship between two variables by producing an equation for a straight line of the form

$y = a + \beta x$

which uses the independent variable ($x$) to predict the dependent variable ($y$). Regression involves estimating the values of the gradient ($\beta$) and intercept ($a$) of the line that best fits the data. This is defined as the line which minimises the sum of the squared residuals, $\sum_i (y_i - \hat{y}_i)^2$. A residual is the difference between an observed dependent value and one predicted from the regression equation:

$\text{residual}_i = y_i - \hat{y}_i = y_i - (a + \beta x_i)$ (actual $y$ minus predicted $y$)
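A minimal Python sketch of the least-squares estimates, assuming the standard closed-form solution $\beta = S_{xy}/S_{xx}$ and $a = \bar{y} - \beta\bar{x}$, which are the usual formulas for minimising the sum of squared residuals (SPSS's internal computation may differ in numerical detail). The names `fit_line`, `residuals`, `sum_sq` and the toy data are hypothetical illustrations.

```python
def fit_line(x, y):
    """Least-squares intercept a and gradient beta for y = a + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx            # gradient
    a = mean_y - beta * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return a, beta


def residuals(x, y, a, beta):
    """Residual = actual y - predicted y, for each observation."""
    return [yi - (a + beta * xi) for xi, yi in zip(x, y)]


def sum_sq(values):
    """Sum of squared values, e.g. the residual sum of squares."""
    return sum(v * v for v in values)


# Hypothetical toy data (NOT the handout's data set)
gestation = [36, 37, 38, 39, 40, 41, 42]       # weeks
weight = [5.9, 6.4, 6.6, 7.3, 7.2, 7.9, 8.1]   # lbs

a, beta = fit_line(gestation, weight)
sse = sum_sq(residuals(gestation, weight, a, beta))
print(f"a = {a:.3f}, beta = {beta:.4f}, SSE = {sse:.4f}")
```

Nudging either estimate away from these values (for example, using `beta + 0.01`) gives a larger sum of squared residuals, which is exactly what "the line that best fits the data" means here.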
Note: The Further regression resource contains more information on assumptions 4 and 5.
Steps in SPSS
Analyze > Regression > Linear
Move ‘Weight of the baby at birth’ to the Dependent box and ‘Gestational age at birth’ to
the Independent(s) box. The plots for checking assumptions are found in the Plots menu.
The histogram checks the normality of the residuals. There are a few options for the
scatterplot of predicted values against residuals. Here the standardised residuals
(ZRESID) and standardised predicted values (ZPRED) are used.
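To show what ZPRED and ZRESID contain, here is a Python sketch. It assumes the usual definitions: ZPRED is each predicted value minus the mean of the predicted values, divided by their standard deviation; ZRESID is each residual divided by the estimated standard deviation of the residuals, $\sqrt{SSE/(n-2)}$. These reflect our reading of SPSS's definitions, not its code; the function name `zpred_zresid` and the toy data are hypothetical.

```python
import math


def zpred_zresid(x, y):
    """Standardised predicted values and standardised residuals for y = a + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    a = mean_y - beta * mean_x
    predicted = [a + beta * xi for xi in x]
    resid = [yi - pi for yi, pi in zip(y, predicted)]

    # ZPRED: centre the predicted values and scale by their sample SD (n - 1 divisor)
    mean_p = sum(predicted) / n
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / (n - 1))
    zpred = [(p - mean_p) / sd_p for p in predicted]

    # ZRESID: divide each residual by the root mean squared error, sqrt(SSE / (n - 2))
    rmse = math.sqrt(sum(e * e for e in resid) / (n - 2))
    zresid = [e / rmse for e in resid]
    return zpred, zresid


# Hypothetical toy data (NOT the handout's data set)
gestation = [36, 37, 38, 39, 40, 41, 42]       # weeks
weight = [5.9, 6.4, 6.6, 7.3, 7.2, 7.9, 8.1]   # lbs

zpred, zresid = zpred_zresid(gestation, weight)
for zp, zr in zip(zpred, zresid):
    print(f"ZPRED = {zp:6.3f}   ZRESID = {zr:6.3f}")
```

When ZRESID is plotted against ZPRED, a random, even band of points around 0, with no curve or funnel shape, is what supports the linearity and equal-variance assumptions.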
Output
The Coefficients table is the most important table. It contains the coefficients for the
regression equation and tests of significance.
The ‘B’ column in the Coefficients table gives the values of the intercept (the ‘(Constant)’ row) and the gradient for the regression line.
The model is: Birth weight (y) = −6.66 + 0.355 × Gestational age (x)
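Using the fitted equation is just arithmetic. The sketch below plugs a gestational age into the handout's coefficients (intercept −6.66, gradient 0.355); the function name `predict_birth_weight` is hypothetical.

```python
def predict_birth_weight(gestation_weeks, intercept=-6.66, gradient=0.355):
    """Predicted birth weight (lbs) from the fitted model: -6.66 + 0.355 * gestation."""
    return intercept + gradient * gestation_weeks


# A baby born at 40 weeks: -6.66 + 0.355 * 40 = -6.66 + 14.2
print(round(predict_birth_weight(40), 2))  # 7.54
```

Each extra week adds 0.355 lbs to the prediction. The intercept (−6.66) is the predicted weight at 0 weeks, far outside the range of the data, so on its own it has no practical interpretation.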
The gradient (β) is tested for significance. If there were no relationship, the gradient of the line (β) would be 0, and therefore every baby would be predicted to be the same weight.
The Sig. value for Gestational age is less than 0.05 (p < 0.001), so there is significant evidence that the gradient is not 0.
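The test of the gradient compares β with its standard error. A Python sketch, assuming the textbook statistic $t = \beta / SE(\beta)$ with $SE(\beta) = \sqrt{SSE/(n-2)/S_{xx}}$ and $n - 2$ degrees of freedom, which is assumed here to be what lies behind the Std. Error, t and Sig. columns. The function name `slope_t_test` and the toy data are hypothetical.

```python
import math


def slope_t_test(x, y):
    """Gradient, its standard error, t statistic and degrees of freedom for H0: beta = 0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    a = mean_y - beta * mean_x
    sse = sum((yi - (a + beta * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2                          # two parameters (a and beta) were estimated
    se_beta = math.sqrt(sse / df / sxx)
    return beta, se_beta, beta / se_beta, df


# Hypothetical toy data (NOT the handout's data set)
gestation = [36, 37, 38, 39, 40, 41, 42]       # weeks
weight = [5.9, 6.4, 6.6, 7.3, 7.2, 7.9, 8.1]   # lbs

beta, se_beta, t, df = slope_t_test(gestation, weight)
print(f"beta = {beta:.4f}, SE = {se_beta:.4f}, t = {t:.2f} on {df} df")
```

The two-sided p-value is the probability of a t value at least this far from 0 under a t distribution with `df` degrees of freedom; if SciPy is available, `2 * scipy.stats.t.sf(abs(t), df)` computes it.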
The key information from the Model Summary table is the R² value of 0.499. This indicates that 49.9% of the variation in birth weight can be explained by the model containing only gestation. This is reasonably high for a single predictor, so predictions from the regression equation should be fairly reliable. It also means that 50.1% of the variation is still unexplained, so adding other independent variables could improve the fit of the model.
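R² can be computed as $1 - SSE/SST$, where SST is the total sum of squares of y about its mean, and the unexplained share is simply $1 - R^2$. The Python sketch below (hypothetical function name and toy data) shows this, and that with a single predictor R² equals the square of Pearson's r.

```python
import math


def r_squared_and_r(x, y):
    """Return (R^2, Pearson r) for the least-squares line y = a + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)       # total variation in y
    beta = sxy / sxx
    a = mean_y - beta * mean_x
    sse = sum((yi - (a + beta * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    r = sxy / math.sqrt(sxx * sst)
    return 1 - sse / sst, r


# Hypothetical toy data (NOT the handout's data set)
gestation = [36, 37, 38, 39, 40, 41, 42]       # weeks
weight = [5.9, 6.4, 6.6, 7.3, 7.2, 7.9, 8.1]   # lbs

r2, r = r_squared_and_r(gestation, weight)
print(f"R^2 = {r2:.3f}, r^2 = {r * r:.3f}, unexplained = {1 - r2:.1%}")
```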
Reporting regression
Simple linear regression was carried out to investigate the relationship between
gestational age at birth (weeks) and birth weight (lbs). The scatterplot showed that there
was a strong positive linear relationship between the two, which was confirmed with a
Pearson’s correlation coefficient of 0.706. Simple linear regression showed a significant
relationship between gestation and birth weight (p < 0.001). The slope coefficient for gestation was 0.355, so predicted birth weight increases by 0.355 lbs for each extra week of gestation. The R² value was 0.499, so 49.9% of the variation in birth weight can be explained by the model containing only gestation.
The scatterplot of standardised predicted values versus standardised residuals showed that the data met the assumptions of homogeneity of variance and linearity, and the histogram of the residuals showed that they were approximately normally distributed.