SPSS and Building Models

This document provides information on using SPSS to build regression models and summarize medical statistics. It discusses creating datasets, importing data, and using syntax files in SPSS. It outlines the steps to build regression models, including performing simple regressions, selecting significant variables, checking assumptions, and adding interaction terms. An example is provided using data on children's lung function to predict FEV, showing the process of building univariate and multiple regression models and checking for interactions between predictors. Closing remarks discuss correlation versus causation and when correction for confounding is needed.

Uploaded by

Charmaine Mei


Medical Statistics

SPSS and building models

Hans Burgerhof
Epidemiology
Programme
Some information on working with SPSS
- creating your own dataset
- importing data (from Excel)
- working with SPSS syntax files
Building a regression model
- prediction models
- estimating a specific relationship; to correct for other variables or not?
SPSS tutorials on the Internet
SPSS, the empty data matrix
The variable view (empty)
Variable view
Typing in data (Data view)
Missing data
Statistics: age (missing value not declared)
N Valid     5
N Missing   0
Mean        216.2000

Statistics: age (missing value declared)
N Valid     4
N Missing   1
Mean        20.5000
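The two outputs differ because, in the first one, a missing-value code is counted as a real age. A minimal Python sketch of the same effect follows. The four ages and the code 999 are assumptions chosen only so that the means match the output above; the slides do not show the underlying values.

```python
# Hypothetical ages; 999 is an ASSUMED missing-value code (not shown in the slides).
MISSING_CODE = 999
ages = [19, 20, 21, 22, MISSING_CODE]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Missing code not declared: 999 is treated as a valid age.
mean_all = mean(ages)

# Missing code declared: the coded value is excluded before averaging.
valid_ages = [a for a in ages if a != MISSING_CODE]
mean_valid = mean(valid_ages)

print("N valid:", len(ages), "mean:", mean_all)
print("N valid:", len(valid_ages), "mean:", mean_valid)
```

With these assumed values the first mean is 216.2 and the second is 20.5, which shows why missing-value codes should be declared in the variable view before running any analysis.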
Some parts of the menu (1)

Means: you need extra software to use this option

Some parts of the menu (2)
Some parts of the menu (3)
Why work with syntax files?
1) To keep track of all the commands you gave, so that three months from now you will still know what you did.
2) If there was an error in the dataset and you have to redo all analyses: simply run the syntax file! It will take you less than a minute.
3) Reproducibility: other researchers can check your analyses.
SPSS syntax
https://www.spss-tutorials.com/spss-output/
Creating a syntax file
The syntax file

The command has not been performed yet!
Running (part of) the syntax file
Using copy-paste

Save the syntax file – give it a relevant name – and you can open it another day to check what you did and/or to rerun your analyses.
Building a regression model – prediction models
If we have a continuous outcome variable Y and a set of p explanatory variables X1, X2, ..., Xp and we would like to predict (or explain) Y, we can fit a linear regression model like

Y = β0 + β1·X1 + β2·X2 + ... + βp·Xp + ε

Do we need all available explanatory variables to predict Y?
Occam’s razor
William of Occam (or: Ockham) was a medieval philosopher known for "the principle of parsimony".
We will only use (statistically) significant variables and theoretically arguable variables in the final model.
Steps to build the model
1. Perform simple regression analyses for all explanatory variables Xi, i = 1 … p. (In linear regression: check the linearity assumption for continuous explanatory variables.) Do not forget to use dummy variables for categorical explanatory variables with more than two categories.
2. Select the explanatory variables that are candidates for the multiple model, using the P-values from step 1 with a large alpha (α = 0.15, 0.2 or 0.25, depending on the number of candidate explanatory variables) and theory / literature.
3. Fit a multiple regression model with all explanatory variables selected in step 2.
4. Check the P-values of the regression coefficients in the multiple regression model. If all P-values are smaller than 0.05, continue with step 6.
5. If not all P-values are smaller than 0.05: remove the non-significant explanatory variables, one by one. Start by removing the explanatory variable with the highest P-value and rerun the analysis with the other explanatory variables. Continue this process until all remaining variables have P-values smaller than (or equal to) 0.05.
6. Optional: add, based on theory or clear patterns in your data, interaction terms to the model and test whether this improves the model.
7. Check the assumptions of the final model.
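The loop in steps 3 to 5 can be sketched in Python as follows. This is only an outline of the control flow under stated assumptions: `fit_pvalues` is a hypothetical callable that refits the multiple model and returns one P-value per variable (in practice SPSS or a statistics library would supply these), and `toy_pvalues` contains made-up numbers for illustration.

```python
from typing import Callable, Dict, List


def backward_eliminate(
    candidates: List[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.05,
) -> List[str]:
    """Steps 3-5: refit, drop the variable with the highest P-value, repeat."""
    remaining = list(candidates)
    while remaining:
        pvalues = fit_pvalues(remaining)          # step 3: fit the multiple model
        worst = max(remaining, key=lambda name: pvalues[name])
        if pvalues[worst] <= alpha:               # step 4: everything significant
            break
        remaining.remove(worst)                   # step 5: drop one variable only
    return remaining


def toy_pvalues(variables: List[str]) -> Dict[str, float]:
    """Hypothetical P-values; a real analysis would refit the regression."""
    table = {"X1": 0.001, "X2": 0.03, "X4": 0.40, "X5": 0.08}
    pvalues = {name: table[name] for name in variables}
    # Assumption for illustration: X5 becomes significant once X4 is removed.
    if "X4" not in variables and "X5" in variables:
        pvalues["X5"] = 0.04
    return pvalues


final_model = backward_eliminate(["X1", "X2", "X4", "X5"], toy_pvalues)
print(final_model)
```

In this made-up example X5 survives because P-values change after each refit, which is the reason for removing variables one at a time rather than dropping every non-significant variable at once.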
Building a multiple regression model

Outcome variable Y, (possible) explanatory variables X1, X2, X3, X4, X5, X6

Steps 1 & 2: Build simple regression models (univariate; remove variables using α=0.25).
Step 3: Build 1 multiple regression model using all remaining explanatory variables.
Step 4: Are any of the p-values for the regression coefficients non-significant (using α=0.05)?
- Yes! Go to step 5.
- No! Go to step 6.
Step 5: Remove the explanatory variable with the largest non-significant p-value (using α=0.05), then repeat steps 3 & 4.
Step 6: Optional: investigate the addition of interaction terms. Final model…
Step 7: Check model assumptions.

In the illustration, the candidate set X1, X2, X3, X4, X5, X6 is reduced to X1, X2, X4, X5.
Example (FEV data)
N = 624 children (Boston)

Ages between 3 – 15 years

Sex: 0 = girl, 1 = boy
Smoke: 0 = no, 1 = yes
Height in cm

What is the best model to predict FEV?

(part of the) Syntax file
Graphical impressions
continuous predictors
Graphical impressions
categorical predictors
Results of univariate analyses (1)

Variable   R²      Coefficient   95% CI           P-value
Age        0.567   0.242         0.092 ; 0.423    < 0.0005
Height     0.748   0.050         0.048 ; 0.052    < 0.0005
Sex        0.033   0.302         0.174 ; 0.429    < 0.0005
Smoke      0.050   0.669         0.440 ; 0.898    < 0.0005

All four explanatory variables are significantly related to FEV (in simple linear regression models) (2)
A coefficient depends on the unit of the variable.
The higher the R², the better the model.
Interpretation?
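The remark that a coefficient depends on the unit of the variable can be checked with a small sketch. The heights and FEV values below are hypothetical, not the Boston data; the point is only that expressing height in metres instead of centimetres multiplies the slope by 100 while R² stays the same.

```python
def simple_regression(x, y):
    """Least-squares slope, intercept and R² for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical values for illustration only.
height_cm = [120, 130, 140, 150, 160]
fev = [1.6, 2.0, 2.7, 3.1, 3.8]
height_m = [h / 100 for h in height_cm]

slope_cm, _, r2_cm = simple_regression(height_cm, fev)
slope_m, _, r2_m = simple_regression(height_m, fev)

print("slope per cm:", slope_cm, "R²:", r2_cm)
print("slope per m: ", slope_m, "R²:", r2_m)
```

By the same reasoning, the height coefficient of 0.050 per cm in the table corresponds to 5.0 per metre, so raw coefficients of variables measured in different units cannot be compared directly.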
Results of multiple linear regression with all four explanatory variables (3)

Smoking no longer significant (P > 0.05). (4)

Do we have an explanation for that?
Multiple linear regression without Smoke (5)
The absolute values of the standardized
coefficients can be used for checking relative
importance of the variables

FEV = -4.417 + 0.055·Age + 0.041·Height + 0.136·Sex

girl = 0
boy = 1
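As a usage sketch, the fitted equation above can be wrapped in a small Python function. The coefficients are those on the slide; the function name and the example child (10 years old, 140 cm) are hypothetical.

```python
def predict_fev(age_years: float, height_cm: float, sex: int) -> float:
    """Predicted FEV from the model without Smoke; sex: 0 = girl, 1 = boy."""
    return -4.417 + 0.055 * age_years + 0.041 * height_cm + 0.136 * sex


# Hypothetical child: 10 years old, 140 cm tall.
fev_girl = predict_fev(10, 140, 0)
fev_boy = predict_fev(10, 140, 1)

print(round(fev_girl, 3))
print(round(fev_boy, 3))
```

For this example the model predicts about 1.873 for a girl and 2.009 for a boy; the difference of 0.136 is the Sex coefficient, which in this model is the same at every age and height.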
Checking the assumptions concerning the
residuals (7)
Lines by subgroups (6)

Is the height effect on FEV equal for boys and girls?
Interaction between height and sex
Significant interaction?

FEV = -3.224 + 0.063·Age + 0.033·Height – 1.593·Sex + 0.011·Intheightsex

where Intheightsex = Height·Sex.

For girls (coded as 0):

FEV = -3.224 + 0.063·Age + 0.033·Height – 1.593·0 + 0.011·Height·0
FEV = -3.224 + 0.063·Age + 0.033·Height

For boys (coded as 1):

FEV = -3.224 + 0.063·Age + 0.033·Height – 1.593·1 + 0.011·Height·1
FEV = -4.817 + 0.063·Age + 0.044·Height
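The arithmetic behind the two sex-specific equations can be reproduced with a short Python sketch. The coefficients are those of the interaction model on the slide; the function and variable names are hypothetical.

```python
# Coefficients of the interaction model shown on the slide.
INTERCEPT = -3.224
B_AGE = 0.063
B_HEIGHT = 0.033
B_SEX = -1.593
B_HEIGHT_SEX = 0.011  # coefficient of Intheightsex = Height * Sex


def sex_specific_line(sex: int):
    """Return (intercept, age slope, height slope) for sex 0 = girl, 1 = boy."""
    return (INTERCEPT + B_SEX * sex, B_AGE, B_HEIGHT + B_HEIGHT_SEX * sex)


girls = sex_specific_line(0)
boys = sex_specific_line(1)

# Height at which the girls' and boys' lines cross (at equal age).
crossover_cm = -B_SEX / B_HEIGHT_SEX

print("girls:", girls)
print("boys: ", boys)
print("lines cross at about", round(crossover_cm, 1), "cm")
```

Under this model the two lines cross at roughly 145 cm: below that height the predicted FEV is lower for boys than for girls of the same age, and above it the prediction is higher for boys. This is what a height by sex interaction means in practice.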
A specific association
What if we do not want to predict FEV, but we
are interested in the effect of a specific variable
on FEV?
Do we have to correct for other variables or not?

Theory on causality can help.

Directed Acyclic Graph (DAG)

[Figure: DAG in which Age points to both Smoke and FEV; the arrow from Smoke to FEV is marked "?"]

DAGs can help you to analyze the data in a correct way.

Age is a confounder in the relation between Smoke and FEV in the Boston children data.
Uncorrected versus corrected analysis

T-test for independent groups: P < 0.0005
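Why an uncorrected comparison can mislead when age is a confounder is illustrated by the sketch below. The records are a hypothetical toy data set, constructed so that smoking makes no difference within an age group while smokers are mostly older; they are not the Boston data.

```python
# Hypothetical toy records: (age group, smoker flag, FEV). Not the Boston data.
records = [
    ("young", 0, 2.0), ("young", 0, 2.0), ("young", 0, 2.0), ("young", 1, 2.0),
    ("old", 0, 3.5), ("old", 1, 3.5), ("old", 1, 3.5), ("old", 1, 3.5),
]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def smoker_difference(rows):
    """Mean FEV of smokers minus mean FEV of non-smokers."""
    smokers = [fev for _, smoke, fev in rows if smoke == 1]
    non_smokers = [fev for _, smoke, fev in rows if smoke == 0]
    return mean(smokers) - mean(non_smokers)


# Uncorrected: compare all smokers with all non-smokers.
crude = smoker_difference(records)

# Corrected (here by stratification): compare within each age group.
by_age = {
    group: smoker_difference([r for r in records if r[0] == group])
    for group in ("young", "old")
}

print("crude difference:", crude)
print("within age groups:", by_age)
```

In this toy data the crude difference is 0.75 in favour of smokers, while within each age group the difference is 0. Stratifying is used here only as the simplest form of correction; in the lecture the correction is done by adding Age to the regression model.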
Some closing remarks
- Correlation does not automatically mean direct causation
- Two variables can share a common cause
- Should we always correct for possible confounding?
- In an RCT with large enough groups: probably no need for correcting
- Correction is more likely to be needed in observational studies
- Beware of overcorrecting
If you torture your data long enough, they will confess!
