SPSS and Building Models

This document provides information on using SPSS to build regression models and summarize medical statistics. It discusses creating datasets, importing data, and using syntax files in SPSS. It outlines the steps to build regression models, including performing simple regressions, selecting significant variables, checking assumptions, and adding interaction terms. An example is provided using data on children's lung function to predict FEV, showing the process of building univariate and multiple regression models and checking for interactions between predictors. Closing remarks discuss correlation versus causation and when correction for confounding is needed.

Uploaded by

Charmaine Mei

Medical Statistics

SPSS and building models

Hans Burgerhof
Epidemiology
Programme
Some information on working with SPSS
- creating your own dataset
- importing data (from Excel)
- working with SPSS syntax files
Building a regression model
- prediction models
- estimating a specific relationship; to correct for other variables or not?
SPSS tutorials on the Internet
SPSS, the empty data matrix
The variable view (empty)
Variable view
Typing in data (Data view)
Missing data
Statistics: age
N Valid    5
N Missing  0
Mean       216.2000

Statistics: age
N Valid    4
N Missing  1
Mean       20.5000
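The difference between the two outputs can be reproduced with a small sketch. The five age values below are hypothetical (the slide's data are not shown); they were chosen so that the means match the outputs, on the assumption that one value is a missing-value code such as 999 that was first treated as a real age.

```python
from statistics import mean

# Hypothetical ages; 999 is assumed to be a code for "missing".
ages = [19, 20, 21, 22, 999]
MISSING_CODE = 999

# Treating the code as a real value inflates the mean.
mean_all = mean(ages)

# Excluding the code gives the mean of the valid cases only.
valid = [a for a in ages if a != MISSING_CODE]
mean_valid = mean(valid)

print(len(ages), mean_all)     # 5 cases, mean 216.2
print(len(valid), mean_valid)  # 4 valid cases, mean 20.5
```

In SPSS the same effect is obtained by declaring the code as a missing value in the Variable view, so that it is excluded from the calculations.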
Some parts of the menu (1)

Means: you need extra software to use this option


Some parts of the menu (2)
Some parts of the menu (3)
Why work with syntax files?
1) To keep track of all the commands you gave, so that three months from now you will still know what you did.
2) If there was an error in the dataset and you have to redo all analyses: simply run the syntax file! It will take you less than a minute!
3) Reproducibility: other researchers can check your analyses.
SPSS syntax
https://www.spss-tutorials.com/spss-output/
Creating a syntax file
The syntax file

The command has not been performed yet!
Running (part of) the syntax file
Using copy-paste

Save the syntax file (give it a relevant name) and you can open it another day to check what you did and/or to rerun your analyses.
Building a regression model: prediction models
If we have a continuous outcome variable Y and a set of p explanatory variables X1, X2, ..., Xp, and we would like to predict (or explain) Y, we can fit a linear regression model such as

Y = β0 + β1·X1 + β2·X2 + ... + βp·Xp + ε

Do we need all available explanatory variables to predict Y?
Occam’s razor
William of Occam (or: Ockham) was a medieval philosopher known for "the principle of parsimony".
We will only use (statistically) significant variables and theoretically justifiable variables in the final model.
Steps to build the model
1. Perform simple regression analyses for all explanatory variables Xi, i = 1 … p. In linear regression, check the linearity assumption for continuous explanatory variables. Do not forget to use dummy variables for categorical explanatory variables with more than two categories.
2. Select candidate explanatory variables for the multiple model using a large alpha (α = 0.15, 0.2 or 0.25, depending on the number of candidate explanatory variables) applied to the P-values from step 1, and using theory / literature.
3. Fit a multiple regression model with all explanatory variables selected in step 2.
4. Check the P-values of the regression coefficients in the multiple regression model. If all P-values are smaller than 0.05, continue with step 6.
5. If not all P-values are smaller than 0.05: remove the non-significant explanatory variables, one by one. Start by removing the explanatory variable with the highest P-value and rerun the analysis with the other explanatory variables. Continue this process until all remaining variables have P-values smaller than (or equal to) 0.05.
6. Optional: add interaction terms to the model, based on theory or clear patterns in your data, and test whether this improves the model.
7. Check the assumptions of the final model.
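Steps 2 to 5 can be sketched as a small loop. This is only an illustration of the selection logic, not SPSS functionality: the function names `screen` and `backward_eliminate` are hypothetical, and the P-values come from a stand-in function (`fake_fit`) with made-up numbers, where a real analysis would refit the regression model each time. Keeping variables on theoretical grounds (step 2) is not handled here.

```python
from typing import Callable, Dict, List


def screen(univariate_p: Dict[str, float], alpha: float = 0.25) -> List[str]:
    """Step 2: keep variables whose simple-regression P-value is below alpha."""
    return [name for name, p in univariate_p.items() if p < alpha]


def backward_eliminate(
    variables: List[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.05,
) -> List[str]:
    """Steps 3-5: refit, drop the variable with the highest P-value, repeat."""
    remaining = list(variables)
    while remaining:
        pvalues = fit_pvalues(remaining)       # step 3: fit the multiple model
        worst = max(remaining, key=lambda v: pvalues[v])
        if pvalues[worst] <= alpha:            # step 4: everything significant
            break
        remaining.remove(worst)                # step 5: remove one variable
    return remaining


def fake_fit(variables: List[str]) -> Dict[str, float]:
    """Stand-in for a real model fit; the P-values are made up."""
    table = {
        ("x1", "x2", "x3"): {"x1": 0.01, "x2": 0.08, "x3": 0.40},
        ("x1", "x2"): {"x1": 0.01, "x2": 0.03},
    }
    return table[tuple(variables)]


candidates = screen({"x1": 0.001, "x2": 0.10, "x3": 0.20, "x4": 0.60})
final = backward_eliminate(candidates, fake_fit)
print(candidates)  # x4 is dropped at the screening stage
print(final)       # x3 is removed in the multiple model
```

Note that the P-value of x2 changes after x3 is removed, which is why the model is refitted after every single removal instead of dropping all non-significant variables at once.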
Building a multiple regression model

Outcome variable Y, (possible) explanatory variables X1, X2, X3, X4, X5, X6

Steps 1 & 2: Build simple regression models (univariate); remove variables using α = 0.25.
Step 3: Build one multiple regression model using all remaining explanatory variables.
Step 4: Are any of the p-values for the regression coefficients non-significant (using α = 0.05)?
   Yes: go to step 5. No: go to step 6.
Step 5: Remove the explanatory variable with the largest non-significant p-value (using α = 0.05), then repeat steps 3 & 4.
Step 6: Optional: investigate addition of interaction terms.
Final model…
Step 7: Check model assumptions.

Variable sets shown in the diagram: X1, X2, X3, X4, X5, X6 and X1, X2, X4, X5.
Example (FEV data)
N = 624 children (Boston)
Ages between 3 and 15 years
Sex: 0 = girl, 1 = boy
Smoke: 0 = no, 1 = yes
Height in cm

What is the best model to predict FEV?


(part of the) Syntax file
Graphical impressions: continuous predictors
Graphical impressions: categorical predictors
Results of univariate analyses (1)

Variable   R²      coefficient   95% CI          P-value
Age        0.567   0.242         0.092 ; 0.423   < 0.0005
Height     0.748   0.050         0.048 ; 0.052   < 0.0005
Sex        0.033   0.302         0.174 ; 0.429   < 0.0005
Smoke      0.050   0.669         0.440 ; 0.898   < 0.0005

All four explanatory variables are significantly related to FEV (in simple linear regression models) (2)
The higher the R², the better the model.
A coefficient depends on the unit of the variable.
Interpretation?
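That a coefficient depends on the unit of the variable can be checked with a short sketch. The heights and FEV values below are hypothetical, for illustration only, and `ols_slope` is a helper written for this example.

```python
def ols_slope(x, y):
    """Slope b of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data, not taken from the FEV dataset.
height_cm = [120, 130, 140, 150, 160]
fev = [1.5, 2.1, 2.4, 3.1, 3.4]

# The same heights expressed in metres.
height_m = [h / 100 for h in height_cm]

slope_per_cm = ols_slope(height_cm, fev)
slope_per_m = ols_slope(height_m, fev)
print(slope_per_cm, slope_per_m)
```

The slope per metre is 100 times the slope per centimetre (about 4.8 versus 0.048 here), while the fit itself is unchanged. This is why raw coefficients of variables measured in different units cannot be compared directly.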
Results of multiple linear regression with all four explanatory variables (3)

Smoking no longer significant (P > 0.05). (4)
Do we have an explanation for that?
Multiple linear regression without Smoke (5)
The absolute values of the standardized coefficients can be used to check the relative importance of the variables.

FEV = -4.417 + 0.055·Age + 0.041·Height + 0.136·Sex

girl = 0
boy = 1
Checking the assumptions concerning the residuals (7)
Lines by subgroups (6)

Is the height effect on FEV equal for boys and girls?
Interaction between height and sex
Significant interaction?

FEV = -3.224 + 0.063·Age + 0.033·Height - 1.593·Sex + 0.011·Intheightsex

where Intheightsex = Height·Sex.

For girls (coded as 0):
FEV = -3.224 + 0.063·Age + 0.033·Height - 1.593·0 + 0.011·Height·0
FEV = -3.224 + 0.063·Age + 0.033·Height

For boys (coded as 1):
FEV = -3.224 + 0.063·Age + 0.033·Height - 1.593·1 + 0.011·Height·1
FEV = -4.817 + 0.063·Age + 0.044·Height
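The arithmetic behind the two sex-specific equations can be checked with a short sketch. The coefficients are those of the interaction model above; the function names are hypothetical, and the interaction term is assumed to be Height multiplied by Sex.

```python
def predict_fev(age, height, sex):
    """Predicted FEV from the interaction model; sex: 0 = girl, 1 = boy."""
    intheightsex = height * sex  # interaction term, assumed Height x Sex
    return (-3.224 + 0.063 * age + 0.033 * height
            - 1.593 * sex + 0.011 * intheightsex)


def sex_specific_line(sex):
    """Intercept and height slope for one sex (the age term is unchanged)."""
    intercept = -3.224 - 1.593 * sex
    height_slope = 0.033 + 0.011 * sex
    return intercept, height_slope


girls = sex_specific_line(0)
boys = sex_specific_line(1)

# Height at which the two lines give the same prediction (any age).
crossing_height = 1.593 / 0.011

print(girls, boys, crossing_height)
```

Under this model the predicted FEV of boys exceeds that of girls of the same age only above a height of roughly 145 cm (1.593 / 0.011); below that height the negative Sex coefficient dominates.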
A specific association
What if we do not want to predict FEV, but we are interested in the effect of a specific variable on FEV?
Do we have to correct for other variables or not?

Theory on causality can help.


Directed Acyclic Graph (DAG)

[Diagram: Smoke → FEV (marked "?"), with Age connected to both Smoke and FEV]

DAGs can help you to analyze the data in a correct way.
Age is a confounder in the relation between Smoke and FEV in the Boston children data.
Uncorrected versus corrected analysis

T-test for independent groups: P < 0.0005
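How a confounder can create an uncorrected difference is illustrated by the following sketch. The records are synthetic and deliberately extreme (they are NOT the Boston data): within each age group smokers and non-smokers have exactly the same FEV, but older children both smoke more often and have a higher FEV. Stratifying by age group is used here as a simple stand-in for correction in a regression model.

```python
from statistics import mean

# Synthetic records (age_group, smoke, fev); not real data.
records = (
    [("young", 0, 2.0)] * 8 + [("young", 1, 2.0)] * 2
    + [("old", 0, 3.5)] * 2 + [("old", 1, 3.5)] * 8
)


def mean_diff(rows):
    """Mean FEV of smokers minus mean FEV of non-smokers."""
    smokers = [fev for _, smoke, fev in rows if smoke == 1]
    non_smokers = [fev for _, smoke, fev in rows if smoke == 0]
    return mean(smokers) - mean(non_smokers)


# Uncorrected: compare all smokers with all non-smokers.
crude = mean_diff(records)

# Corrected (by stratification): compare within each age group.
by_age = {
    group: mean_diff([r for r in records if r[0] == group])
    for group in ("young", "old")
}

print(crude)   # smokers appear to have a higher FEV
print(by_age)  # no difference within either age group
```

In this constructed example the whole uncorrected difference (about 0.9) is due to age. In real data the difference within age groups will usually not be exactly zero.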
Some closing remarks
- Correlation does not automatically mean direct causation
- Two variables can share a common cause
- Should we always correct for possible confounding?
  - In an RCT with large enough groups: probably no need for correcting
  - More likely in observational studies
- Beware of overcorrecting
If you torture your data long enough, they will confess!
