Community Project: Simple Linear Regression in SPSS


community project

encouraging academics to share statistics support resources


All stcp resources are released under a Creative Commons licence

stcp-marshall-regressionS

The following resources are associated:


Scatterplots and correlation, Checking normality in SPSS and the SPSS dataset ‘Birthweight_reduced.sav’

Simple linear regression in SPSS


Dependent (outcome) variable: Continuous (scale)
Independent (explanatory) variables: Continuous (scale)
Common Applications: Regression is used to (a) look for significant relationships
between two variables or (b) predict a value of one variable for a given value of the other.
Data: The data set ‘Birthweight_reduced.sav’ contains details of 42 babies and their
parents at birth. The dependent variable is birth weight (lbs) and the independent variable
is the gestational age of the baby at birth (in weeks).

Before carrying out any analysis, investigate the relationship between the independent and
dependent variables by producing a scatterplot and calculating the correlation coefficient.

For a scatterplot: Graphs → Legacy Dialogs → Scatter/Dot, then choose ‘Simple Scatter’.
Move the dependent ‘Birth weight’ to the Y Axis box and the independent ‘Gestation’
to the X Axis box.

To calculate Pearson’s correlation coefficient, use Analyze → Correlate → Bivariate
and move both ‘Birthweight’ and ‘Gestation’ to the Variables box.

Both the scatterplot and the Pearson’s correlation coefficient (r) of 0.706 suggest a
strong positive linear relationship between gestational age and birth weight.
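To make the correlation coefficient concrete, here is a minimal Python sketch of how Pearson's r is calculated. The gestation and weight values are made-up numbers for illustration only (they are not taken from Birthweight_reduced.sav), and the function name pearson_r is hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for illustration only (not the real dataset)
gestation_weeks = [36, 37, 38, 39, 40, 41]
birth_weight_lbs = [5.9, 6.8, 6.5, 7.4, 7.9, 7.6]

r = pearson_r(gestation_weeks, birth_weight_lbs)
print(f"r = {r:.3f}")
```

Values of r close to +1 or -1 indicate a strong linear relationship, and values close to 0 indicate a weak one.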

© Ellen Marshall, Sheffield Hallam University
Reviewer: Jean Russell, University of Sheffield

Simple linear regression quantifies the relationship between two variables by producing an
equation for a straight line of the form y = a + βx, which uses the independent variable (x)
to predict the dependent variable (y). Regression involves estimating the values of the
gradient (β) and intercept (a) of the line that best fits the data. This is defined as the
line which minimises the sum of the squared residuals. A residual is the difference between
an observed dependent value and one predicted from the regression equation:

Residual = actual y − predicted y
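The idea of "the line that minimises the sum of the squared residuals" can be sketched in a few lines of Python. This is an illustration using the standard closed-form least-squares formulas, not the SPSS implementation; the data values and the function names fit_line and sum_squared_residuals are hypothetical.

```python
def fit_line(x, y):
    """Return (a, beta): intercept and gradient of the least-squares line y = a + beta*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx               # gradient
    a = mean_y - beta * mean_x     # intercept: the line passes through the point of means
    return a, beta

def sum_squared_residuals(x, y, a, beta):
    """Sum of (actual y - predicted y)^2 for the line y = a + beta*x."""
    return sum((yi - (a + beta * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, beta = fit_line(x, y)
best = sum_squared_residuals(x, y, a, beta)
# Nudging the gradient away from the fitted value makes the sum of squares larger
worse = sum_squared_residuals(x, y, a, beta + 0.1)
print(best < worse)
```

Any other choice of intercept or gradient gives a larger sum of squared residuals than the fitted line, which is what "best fit" means here.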

Assumptions for regression


1) The relationship between the independent and dependent variable is linear.
   How to check: Scatterplot: scatter should form a line in the plot rather than a curve
   or other shape.
   What to do if the assumption is not met: Transform either the independent or
   dependent variable.

2) Residuals should be approximately normally distributed.
   How to check: Request the histogram of residuals within the Plots menu.
   What to do if the assumption is not met: Transform the dependent variable.

3) Homoscedasticity: Scatterplot of standardised residuals and standardised predicted
   values shows no pattern (scatter is roughly the same width as y increases).
   How to check: Inspect the scatterplot. A shape in which the variation in the residuals
   (up and down) is not constant, for example where the variance is increasing, is bad.
   What to do if the assumption is not met: Transform the dependent variable.

4) Independent observations (adjacent values are not related). This is only a possible
   problem if measurements are collected over time.
   How to check: Request the Durbin-Watson statistic within the Statistics menu of
   regression. It should be between 1.5 and 2.5.
   What to do if the assumption is not met: If the Durbin-Watson statistic is outside the
   range, use time series (high level statistics).

5) No observations have a large overall influence (leverage). Look at individual Cook’s
   and Leverage values. Interpretation of this is not included on this sheet.
   How to check: If you wish to check leverage values, request columns with Cook’s and
   Leverage values to be added to the dataset via the Save menu.
   What to do if the assumption is not met: Run the regression with and without the
   observations and comment on the differences.

Note: The Further regression resource contains more information on assumptions 4 and 5.
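For readers curious about what the Durbin-Watson statistic in assumption 4 measures, the sketch below computes it from a list of residuals using its standard definition: the sum of squared differences between adjacent residuals divided by the sum of squared residuals. The residual values and the function name durbin_watson are hypothetical; in practice SPSS reports the statistic directly.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in collection (time) order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

def within_guideline(dw, low=1.5, high=2.5):
    """Check the statistic against the 1.5 to 2.5 range quoted on this sheet."""
    return low <= dw <= high

# Hypothetical residuals for illustration only
trending = [-3, -2, -1, 1, 2, 3]      # adjacent residuals are similar
alternating = [1, -1, 1, -1, 1, -1]   # adjacent residuals flip sign

for name, res in [("trending", trending), ("alternating", alternating)]:
    dw = durbin_watson(res)
    print(name, round(dw, 2), within_guideline(dw))
```

Values well below 2 arise when adjacent residuals are similar, and values well above 2 arise when they tend to alternate in sign. Both made-up examples fall outside the 1.5 to 2.5 range.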

statstutor community project www.statstutor.ac.uk



Steps in SPSS
Analyze → Regression → Linear

Move ‘Weight of the baby at birth’ to the Dependent box and ‘Gestational age at birth’ to
the Independent(s) box. The plots for checking assumptions are found in the Plots menu.
The histogram checks the normality of the residuals. There are a few options for the
scatterplot of predicted values against residuals. Here the standardised residuals
(ZRESID) and standardised predicted values (ZPRED) are used.

Output
The Coefficients table is the most important table. It contains the coefficients for the
regression equation and tests of significance.

The ‘B’ column in the Coefficients table gives the values of the gradient and intercept
terms for the regression line.
The model is: Birth weight (y) = -6.66 + 0.355 × (Gestational age)
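As a small worked example of use (b) from the Common Applications, prediction, the fitted equation can be evaluated for a chosen gestational age. The coefficients are the ones reported above; the function name predict_birth_weight and the choice of 40 weeks are purely illustrative.

```python
INTERCEPT = -6.66   # 'B' value for the constant, from the Coefficients table
GRADIENT = 0.355    # 'B' value for gestational age (lbs per week)

def predict_birth_weight(gestation_weeks):
    """Predicted birth weight in lbs from the fitted regression equation."""
    return INTERCEPT + GRADIENT * gestation_weeks

# Illustrative input: a baby born at 40 weeks
print(f"{predict_birth_weight(40):.2f} lbs")   # prints "7.54 lbs"
```

Predictions are only sensible for gestational ages similar to those in the data: the negative intercept shows that the line cannot be extended back towards 0 weeks.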

The gradient (β) is tested for significance. If there were no relationship, the gradient of
the line (β) would be 0 and therefore every baby would be predicted to be the same weight.
The Sig. value against Gestational age is less than 0.05, so there is significant evidence
to suggest that the gradient is not 0 (p < 0.001).
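With a single independent variable, the t statistic used to test the gradient can also be written in terms of Pearson's r as t = r√(n − 2) / √(1 − r²), with n − 2 degrees of freedom. The sketch below applies this identity to the rounded values quoted on this sheet (r = 0.706, n = 42), so the result is only approximate and need not match the SPSS output exactly. The function name slope_t_statistic is hypothetical.

```python
import math

def slope_t_statistic(r, n):
    """t statistic for the gradient in simple linear regression, from r and sample size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Rounded values quoted on this sheet
r = 0.706
n = 42

t = slope_t_statistic(r, n)
print(f"t = {t:.2f} on {n - 2} degrees of freedom")
```

A t value this far from 0 on 40 degrees of freedom is consistent with the reported p < 0.001.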


The key information from the Model Summary table is the R² value of 0.499. This indicates
that 49.9% of the variation in birth weight can be explained by the model containing only
gestation. This is quite high, so predictions from the regression equation are fairly
reliable. It also means that 50.1% of the variation is still unexplained, so adding other
independent variables could improve the fit of the model.
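The meaning of R² as "proportion of variation explained" can be illustrated with a short Python sketch. The observed and predicted values are made up (the predictions are what a least-squares line would give for x values of 0, 0, 1, 1), and the function name r_squared is hypothetical.

```python
def r_squared(observed, predicted):
    """R squared: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((yi - mean_y) ** 2 for yi in observed)
    ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(observed, predicted))
    return 1 - ss_residual / ss_total

# Hypothetical values for illustration only
observed = [6.0, 7.0, 8.0, 9.0]
predicted = [6.5, 6.5, 8.5, 8.5]

print(r_squared(observed, predicted))

# In simple linear regression R squared equals the square of Pearson's r.
# Using the rounded r quoted on this sheet:
print(round(0.706 ** 2, 3))
```

Squaring the rounded correlation of 0.706 gives approximately 0.498, which agrees with the reported R² of 0.499 up to rounding.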

Checking the assumptions for this data


Normality of residuals: The residuals are approximately normally distributed.

Homoscedasticity: There is no pattern in the scatter. The width of the scatter as predicted
values increase is roughly the same, so the assumption has been met.

Reporting regression
Simple linear regression was carried out to investigate the relationship between
gestational age at birth (weeks) and birth weight (lbs). The scatterplot showed that there
was a strong positive linear relationship between the two, which was confirmed with a
Pearson’s correlation coefficient of 0.706. Simple linear regression showed a significant
relationship between gestation and birth weight (p < 0.001). The slope coefficient for
gestation was 0.355, so the weight of the baby increases by 0.355 lbs for each extra week
of gestation. The R² value was 0.499, so 49.9% of the variation in birth weight can be
explained by the model containing only gestation.
The scatterplot of standardised predicted values versus standardised residuals showed
that the data met the assumptions of homogeneity of variance and linearity, and the
residuals were approximately normally distributed.
