
Introduction to SPSS

Pimrapat Gebert, MPH


Institute of Biometry and Clinical Epidemiology

UNIVERSITÄTSMEDIZIN BERLIN
Contents

Time             Day 1
9.15 – 9.30      Introduction
                 - Starting SPSS
                 - SPSS windows: Data editor, SPSS viewer, Syntax
9.30 – 10.15     Getting the data into SPSS
                 - Manually entering data
                 - Opening from SPSS file
                 - Opening from Excel file
                 Data management
                 - Variable name
                 - Value label
                 - Compute
                 - Recode into same
                 - Recode into different
                 - Visual Binning (Cont. -> Cat.)
10.15 – 11.00    Describing data
                 - Frequency
                 - EXAMINE
                 - Bar chart/Graphic
                 - Histogram
                 - Skewness/Kurtosis
11.00 – 11.15    Break
11.15 – 12.30    Data analysis
                 - Crosstabs and Chi-square test, McNemar's test
                 - Comparing means: Independent t-test, Paired t-test
                 - Non-parametric: Mann-Whitney U test, Wilcoxon signed-rank test
Contents

Time             Day 2
9.15 – 10.15     Comparing >2 groups
                 - One-Way Analysis of Variance (ANOVA) + Post-hoc tests
                 - Kruskal-Wallis test (non-parametric)
10.15 – 11.00    Correlation
                 - Pearson correlation
                 - Spearman's rank correlation
                 - Linear regression
                 - Scatter plots with a regression line
11.00 – 11.15    Break
11.15 – 11.45    - ROC
                 - Logistic regression
11.45 – 12.30    Survival analysis
                 - Kaplan-Meier curve
                 - Log-rank test
                 - Cox regression
Introduction to SPSS

SPSS

IBM® SPSS® Statistics


Statistical Package for Social Sciences
Superior Performing Software System
Statistical Product and Service Solutions
PASW (Predictive Analytics Software)
IBM SPSS Statistics Data File Structure
• Rows (records) are cases. Each row represents a case or an observation.
• Columns (fields) are variables. Each column represents a variable or characteristic that is being measured, e.g., Age or Gender.

            1. Variable  2. Variable  3. Variable  4. Variable  5. Variable  6. Variable  7. Variable
1. Record   R1/V1        R1/V2        R1/V3        R1/V4        R1/V5        R1/V6        R1/V7
2. Record   R2/V1        R2/V2        R2/V3        R2/V4        R2/V5        R2/V6        R2/V7
3. Record   R3/V1        R3/V2        R3/V3        R3/V4        R3/V5        R3/V6        R3/V7
4. Record   R4/V1        R4/V2        R4/V3        R4/V4        R4/V5        R4/V6        R4/V7
5. Record   R5/V1        R5/V2        R5/V3        R5/V4        R5/V5        R5/V6        R5/V7
6. Record   R6/V1        R6/V2        R6/V3        R6/V4        R6/V5        R6/V6        R6/V7

(Cell Rn/Vm holds the value of the n-th record for the m-th variable.)

Dependent Samples (Paired Samples)
Dependent tests are always calculated between two variables (columns).
The paired values (before/after) must always be in the same row!
Example: NPAR TESTS WILCOXON = WT0 WITH WT1

Weight (kg): WT0 = before, WT1 = after

ID   GROUP   WT0   WT1
1    1       75    73
2    1       88    79
3    2       69    71
4    1       93    88
5    1       71    71
6    2       65    63
7    2       80    75
8    1       83    86

Independent Samples
For independent tests, the existing cases must be divided into two or more groups.
The group membership ("Verum" or "Placebo") must be defined and entered as a variable.

In SPSS: NPAR TESTS M-W = WT1 BY GROUP (1,2)

Example: Weight after the experiment
1. Placebo: 73, 79, 88, 71, 86
2. Verum: 71, 63, 75

ID   GROUP   WT0   WT1   Group = 1 (Placebo)   Group = 2 (Verum)
1    1       75    73    73
2    1       88    79    79
3    2       69    71                          71
4    1       93    88    88
5    1       71    71    71
6    2       65    63                          63
7    2       80    75                          75
8    1       83    86    86
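
Both layouts can be tried out directly in a syntax window. The following is a minimal sketch, not the course's own file: it enters the eight example cases inline and runs both tests. The value labels and the fully punctuated subcommand form (leading slash, `(PAIRED)` keyword) are assumptions about how the slide's shorthand would be written out.

```spss
* Enter the eight example cases inline: ID, group, weight before, weight after.
DATA LIST FREE / ID GROUP WT0 WT1.
BEGIN DATA
1 1 75 73
2 1 88 79
3 2 69 71
4 1 93 88
5 1 71 71
6 2 65 63
7 2 80 75
8 1 83 86
END DATA.
VALUE LABELS GROUP 1 'Placebo' 2 'Verum'.

* Dependent samples: WT0 and WT1 are two columns of the same row.
NPAR TESTS
  /WILCOXON=WT0 WITH WT1 (PAIRED).

* Independent samples: one outcome column, split by the grouping variable.
NPAR TESTS
  /M-W=WT1 BY GROUP(1 2).
```

The same data file serves both tests: the paired test reads across a row, while the independent test reads down one column and uses GROUP to separate the cases.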

Log-in to the Computer

• Using Charité account
• Using local log-in
  User = passwort

Download data file: https://biometrie.charite.de/
➢ Service Unit Biometrie
➢ Interne Fortbildungskurse
➢ Einführung in SPSS
➢ “Kurse in englische Sprache”
➢ Accompanying material (Download example data file)
(Please right-click and choose "Save as…")
***Change the file type from .htm → .sav***

Download
SPSS data file

Download Handout

Changing the language in SPSS

Menu bar
➢ Edit
➢ Options…

SPSS Windows: Data View

SPSS Windows: Variable View

SPSS Windows: Output

SPSS Windows: Syntax

Save SPSS files

Each window (Data file, Output, Syntax) is saved separately.

In each window, from the menus choose:
➢ File
➢ Save as …

▪ Data file → datafile.sav
▪ Output file → outputfile.spv
▪ Syntax file → syntaxfile.sps

Getting the data into SPSS

▪ Manually entering data
▪ Opening from SPSS file
▪ Opening from Excel file

Manually entering data

Case Record Form (CRF)

Field                                            Format        Variable name   Data type
Demographic data:
Patient number                                   □□□           No
Clinic [1=University clinic, 2=Local hospital]   □             Clinic          Categorical
Gender [0=Male, 1=Female]                        □             Gender          Categorical
Age (years)                                      □□            Age             Continuous
Height (cm)                                      □□□           Height          Continuous
Weight (kg)                                      □□□.□         Weight          Continuous

Baseline:
Enrollment date [DD, MM, YYYY]                   □□.□□.□□□□    Enroll_date
Low blood count:
- after 1 hr                                     □□            LBC_1h_v0
- after 2 hrs                                    □□□           LBC_2h_v0
GOT [U/l]                                        □□□           GOT_v0
GPT [U/l]                                        □□□           GPT_v0

Manually entering data
Menu bar
➢ File
➢ New
➢ Data

*Variable name: short, meaningful, no spaces

Ref: https://www.ibm.com/support/knowledgecenter/en/SSLVMB_25.0.0/statistics_mainhelp_ddita/spss/base/idh_defvar_type.html
Opening from Excel file

Change type of file
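
The same import can also be written as syntax. This is only a sketch: the file path and the sheet name `Sheet1` are hypothetical placeholders, and it assumes the first row of the sheet contains the variable names.

```spss
* Hypothetical file name and sheet name: adjust both to your own workbook.
GET DATA
  /TYPE=XLSX
  /FILE='C:\data\SPSS_course.xlsx'
  /SHEET=NAME 'Sheet1'
  /CELLRANGE=FULL
  /READNAMES=ON.
EXECUTE.
```

Keeping the import as syntax makes it easy to re-run when the Excel file is updated.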

▪ Getting the data into SPSS
  ➢ Manually entering data
  ➢ Opening from SPSS file
  ➢ Opening from Excel file

▪ Data management
  ➢ Variable name
  ➢ Value label
  ➢ Compute
  ➢ Recode into same
  ➢ Recode into different
  ➢ Visual Binning (Cont. -> Cat.)

Compute

• You can compute values for numeric or string (alphanumeric) variables.
• You can create new variables or replace the values of existing variables. For new variables, you can also specify the variable type and label.
• You can compute values selectively for subsets of data based on logical conditions.
• You can use a large variety of built-in functions, including arithmetic functions, statistical functions, distribution functions, and string functions.

From the menus choose:
➢ Transform
➢ Compute Variable

Compute BMI

$$\text{BMI (kg/m}^2\text{)} = \frac{\text{Weight (kg)}}{\left(\text{Height (cm)} / 100\right)^2}$$

From the menus choose:
➢ Transform
➢ Compute Variable

Variable name:
Height = Height (cm)
Weight = Weight (kg)

Syntax:
COMPUTE bmi=Weight / ((Height / 100) ** 2).
EXECUTE.

Recode
▪ Recode into same: reassigns the values of an existing variable.
  !! The old values are lost.

  gender            gender (old variable name)
  0 = Male      →   1 = Male
  1 = Female    →   2 = Female

▪ Recode into different: assigns the recoded values to a new variable and keeps the existing variable unchanged. (Recommended!)
  !! The old variable is kept.

  gender            gender_n (new variable name)
  0 = Male      →   1 = Male
  1 = Female    →   2 = Female

Recode into different
From the menus choose:
➢ Transform
➢ Recode into Different Variables…
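
As a sketch of what this dialog does for the gender example, the syntax below writes the recoded values into the new variable `gender_n` and leaves `gender` untouched. The value labels are an assumption added for readability.

```spss
* Recode gender, coded 0 and 1, into the new variable gender_n, coded 1 and 2.
RECODE gender (0=1) (1=2) INTO gender_n.
VALUE LABELS gender_n 1 'Male' 2 'Female'.
EXECUTE.
```

Because the original variable is kept, a mistake in the recode can be checked and corrected with a crosstab of `gender` by `gender_n`.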

Recode into same
From the menus choose:
➢ Transform
➢ Recode into Same Variables…

Create categorical variables from continuous variables

Age   Age_g   Value label
23    1       1 = <40 yrs
35    1
40    2       2 = 40 – 49 yrs
45    2
50    3       3 = ≥50 yrs
60    3

Commands:
▪ Recode into different
▪ Visual Binning
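
The age grouping in the table can be sketched in syntax as follows. This mirrors the pattern SPSS pastes from Visual Binning and is one possible way to write it, not the only one; the value labels are assumptions based on the table.

```spss
* Bin Age into three groups.
* RECODE applies the first matching range, so the highest band is listed first.
RECODE Age (MISSING=COPY) (50 THRU HI=3) (40 THRU HI=2) (LO THRU HI=1) (ELSE=SYSMIS) INTO Age_g.
VALUE LABELS Age_g 1 '<40 yrs' 2 '40 - 49 yrs' 3 '>=50 yrs'.
VARIABLE LEVEL Age_g (ORDINAL).
EXECUTE.
```

Listing the ranges from highest to lowest avoids gaps for non-integer ages such as 39.5.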

Create categorical variables from continuous variables
From the menus choose:
➢ Transform
➢ Recode into Different Variables…

Create categorical variables from continuous variables
From the menus choose:
➢ Transform
➢ Visual Binning…

Exercise 1:
▪ Using SPSS_course.sav
▪ Create a new variable:

$$\text{BMI (kg/m}^2\text{)} = \frac{\text{Weight (kg)}}{\left(\text{Height (cm)} / 100\right)^2}$$

▪ Create BMI as a categorical variable with value labels

Code   Category     BMI (kg/m2)
1      Normal       <25.0
2      Overweight   25.0 – 29.9
3      Obese I      ≥30.0

▪ Recode the BMI group into a new variable: Normal vs. Overweight+Obese

Compute BMI

$$\text{BMI (kg/m}^2\text{)} = \frac{\text{Weight (kg)}}{\left(\text{Height (cm)} / 100\right)^2}$$

From the menus choose:
➢ Transform
➢ Compute Variable

Variable name:
Height = Height (cm)
Weight = Weight (kg)

Syntax:
COMPUTE bmi=Weight / ((Height / 100) ** 2).
EXECUTE.

Create BMI as a categorical variable with value labels
From the menus choose:
➢ Transform
➢ Visual Binning…

Recode the BMI group into a new variable:
Normal vs. Overweight+Obese

Syntax
***Exercise 1.

*Calculate BMI.
COMPUTE bmi=weight / ((height / 100) ** 2).
EXECUTE.

*Create BMI group.
* Visual Binning.
*bmi.
RECODE bmi (MISSING=COPY) (30 THRU HI=3) (25 THRU HI=2) (LO THRU HI=1) (ELSE=SYSMIS)
INTO bmi_g.
VARIABLE LABELS bmi_g 'bmi (Binned)'.
FORMATS bmi_g (F5.0).
VALUE LABELS bmi_g 1 '< 25,00' 2 '25,00 - 29,99' 3 '30,00+'.
VARIABLE LEVEL bmi_g (ORDINAL).
EXECUTE.

*Create BMI into 2 groups.
RECODE bmi_g (1=1) (MISSING=COPY) (2 THRU 3=2) INTO bmi_2g.
EXECUTE.

Break!

Describing data

▪ Frequency
▪ Explore
▪ Histogram
▪ Skewness/Kurtosis
▪ Bar chart/Graphic

Frequency: Categorical variables
From the menus choose:
➢ Analyze
➢ Descriptive Statistics
➢ Frequencies…

Frequency: Continuous variables
From the menus choose:
➢ Analyze
➢ Descriptive Statistics
➢ Frequencies…
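
For reference, the Frequencies dialog can be sketched in syntax for both kinds of variable. The choice of `gender` and `Age` as example variables, and the particular statistics requested, are assumptions for illustration.

```spss
* Categorical variable: frequency table plus bar chart.
FREQUENCIES VARIABLES=gender
  /BARCHART FREQ
  /ORDER=ANALYSIS.

* Continuous variable: suppress the table, request summary statistics and a histogram.
FREQUENCIES VARIABLES=Age
  /FORMAT=NOTABLE
  /STATISTICS=MEAN MEDIAN STDDEV MINIMUM MAXIMUM SKEWNESS SESKEW KURTOSIS SEKURT
  /HISTOGRAM NORMAL
  /ORDER=ANALYSIS.
```

For a continuous variable the frequency table itself is rarely useful, which is why `/FORMAT=NOTABLE` switches it off.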

Descriptive statistics separately by a categorical variable
From the menus choose:
➢ Analyze
➢ Descriptive Statistics
➢ Explore…
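
A minimal sketch of the Explore dialog as syntax, assuming `bmi` as the dependent variable and `gender` as the factor (both are illustrative choices):

```spss
* Descriptive statistics and plots for bmi, reported separately for each gender.
EXAMINE VARIABLES=bmi BY gender
  /PLOT BOXPLOT HISTOGRAM NPPLOT
  /COMPARE GROUPS
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.
```

`NPPLOT` also requests the tests of normality, which help when choosing between a parametric and a non-parametric test.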

Statistics overview

▪ Categorical vs. Categorical
  ➢ Independent: Chi-square test or Fisher’s exact test
  ➢ Non-independent: McNemar’s test or Binomial exact test

▪ Continuous vs. Categorical
  ➢ 2 groups (independent): Independent t-test or Mann-Whitney U test
  ➢ 2 groups (paired): Paired t-test or Wilcoxon signed-rank test
  ➢ >2 groups: ANOVA or Kruskal-Wallis test

▪ Continuous vs. Continuous
  ➢ Pearson’s correlation or Spearman’s rank correlation

Data analysis: Crosstab + Chi-square test
To compare BMI groups between males and females
From the menus choose:
➢ Analyze
➢ Descriptive Statistics
➢ Crosstabs…
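
The crosstab can be sketched in syntax as below, assuming the binned variable `bmi_g` from Exercise 1 in the rows and `gender` in the columns. Requesting column percentages is an illustrative choice.

```spss
* BMI group by gender with a chi-square test and column percentages.
CROSSTABS
  /TABLES=bmi_g BY gender
  /FORMAT=AVALUE TABLES
  /STATISTICS=CHISQ
  /CELLS=COUNT COLUMN
  /COUNT ROUND CELL.
```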

Data analysis: Independent t-test and Paired t-test

• Independent t-test: to compare BMI between males and females
• Paired t-test: to compare cholesterol at baseline and at Visit 2

Data analysis: Independent t-test
From the menus choose:
➢ Analyze
➢ Compare Means
➢ Independent-Samples T Test…
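
A sketch of the equivalent syntax, assuming `gender` is coded 0 = Male and 1 = Female as in the case record form:

```spss
* Compare mean bmi between the two gender groups.
T-TEST GROUPS=gender(0 1)
  /MISSING=ANALYSIS
  /VARIABLES=bmi
  /CRITERIA=CI(.95).
```

The two values in parentheses after the grouping variable tell SPSS which codes define the two groups.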

Report
There was no statistically significant difference in BMI between males and females:
mean difference = 0.41 (95% CI: -0.57 to 1.39); p-value = 0.405

Data analysis: Paired t-test
From the menus choose:
➢ Analyze
➢ Compare Means
➢ Paired-Samples T Test…
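
A sketch of the paired test as syntax. `Chol_v0` appears in the course exercises as the baseline cholesterol variable; `Chol_v2` is a hypothetical name for the Visit 2 measurement and should be replaced by the actual variable name in the data file.

```spss
* Compare cholesterol at baseline with cholesterol at Visit 2 within the same patients.
T-TEST PAIRS=Chol_v0 WITH Chol_v2 (PAIRED)
  /CRITERIA=CI(.9500)
  /MISSING=ANALYSIS.
```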

Data analysis: Mann-Whitney U Test
From the menus choose:
➢ Analyze
➢ Nonparametric Tests
➢ Legacy Dialogs
➢ 2 Independent Samples…

$$\text{Cohen's } r = \frac{Z}{\sqrt{N}}$$

Z       N     Effect size
5.079   132   0.442

Intermediate effect size

Online ES calculation: https://www.psychometrica.de/effect_size.html
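
As a worked check of the effect size reported for the Mann-Whitney U test, using the Z statistic and sample size from the output:

```latex
r = \frac{Z}{\sqrt{N}} = \frac{5.079}{\sqrt{132}} \approx \frac{5.079}{11.489} \approx 0.442
```

By Cohen's commonly cited benchmarks (about 0.1 small, 0.3 medium, 0.5 large), 0.442 falls in the medium range.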


Gender   Alcohol consumption (g), median (IQR)   P-value   Effect size (Cohen's r)
Male     80 (22, 110)                            <0.001    0.442
Female   0 (0, 60)

From the menus choose:
➢ Data
➢ Split File…

Then, from the menus choose:
➢ Analyze
➢ Descriptive Statistics
➢ Frequencies…

Don't forget to set Split File back to "Analyze all cases" afterwards!
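
The split-file workflow can be sketched in syntax as follows. `alcohol` is a hypothetical name for the alcohol consumption variable, and the median with quartiles is an illustrative choice of statistics.

```spss
* Split the output by gender; the file must be sorted by the split variable first.
SORT CASES BY gender.
SPLIT FILE LAYERED BY gender.

FREQUENCIES VARIABLES=alcohol
  /FORMAT=NOTABLE
  /NTILES=4
  /STATISTICS=MEDIAN.

* Switch the split off again so that later analyses use all cases.
SPLIT FILE OFF.
```

Ending the block with `SPLIT FILE OFF` is the syntax counterpart of selecting "Analyze all cases" in the dialog.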

Without Split File vs. with Split File by gender

From the menus choose:
➢ Data
➢ Split File…

Don't forget to set Split File back to "Analyze all cases"!

Data analysis: Wilcoxon signed-rank test
From the menus choose:
➢ Analyze
➢ Nonparametric Tests
➢ Legacy Dialogs
➢ 2 Related Samples…

$$\text{Cohen's } r = \frac{Z}{\sqrt{N}}$$

Z       N     Effect size
6.207   132   0.54

Large effect size

Online ES calculation: https://www.psychometrica.de/effect_size.html


Statistics overview: Regression models

▪ Continuous outcome
  ➢ Linear regression model
▪ Binary outcome
  ➢ Logistic regression model
▪ Ordinal outcome
  ➢ Ordinal logistic regression model
▪ Multiple categorical outcome
  ➢ Multinomial logistic regression model
▪ Time-to-event outcome
  ➢ Cox proportional hazards model
▪ Longitudinal study
  ➢ Generalized Estimating Equations (GEE)
  ➢ Mixed/Multilevel model

From the menus choose:
➢ Transform
➢ Visual Binning…

Data analysis: One-Way ANOVA
From the menus choose:
➢ Analyze
➢ Compare Means
➢ One-Way ANOVA…
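
A sketch of the dialog as syntax. The slides do not name the variables used here, so `Chol_v0` as the outcome and the three-level `Age_g` as the factor are assumptions, as is the choice of Bonferroni for the post-hoc test.

```spss
* One-way ANOVA with descriptives, Levene's test and Bonferroni post-hoc comparisons.
ONEWAY Chol_v0 BY Age_g
  /STATISTICS DESCRIPTIVES HOMOGENEITY
  /MISSING ANALYSIS
  /POSTHOC=BONFERRONI ALPHA(0.05).
```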

Data analysis: Kruskal-Wallis Test
From the menus choose:
➢ Analyze
➢ Nonparametric Tests
➢ Legacy Dialogs
➢ K Independent Samples…
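
The legacy dialog corresponds to syntax along these lines. The variables are again assumptions (`Chol_v0` compared across `Age_g`); the two numbers in parentheses give the lowest and highest group codes to include.

```spss
* Kruskal-Wallis test across groups 1 to 3 of Age_g.
NPAR TESTS
  /K-W=Chol_v0 BY Age_g(1 3)
  /MISSING ANALYSIS.
```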

From the menus choose:
➢ Analyze
➢ Nonparametric Tests
➢ Independent Samples…

Double-click the result table to open the editor window!
Change View from "Independent Samples Test View" to "Pairwise Comparisons".
Data analysis: Correlation
From the menus choose:
➢ Analyze
➢ Correlate
➢ Bivariate…
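
A sketch of both correlation types as syntax, using `Age` and `Chol_v0` (the pair named in the course exercise) as an assumed example:

```spss
* Pearson correlation.
CORRELATIONS
  /VARIABLES=Age Chol_v0
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.

* Spearman's rank correlation.
NONPAR CORR
  /VARIABLES=Age Chol_v0
  /PRINT=SPEARMAN TWOTAIL NOSIG
  /MISSING=PAIRWISE.
```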

Data analysis: Linear regression
From the menus choose:
➢ Analyze
➢ Regression
➢ Linear…
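
A minimal sketch of a simple linear regression as syntax. The choice of `Chol_v0` as outcome and `Age` as predictor is an assumption for illustration; `CI(95)` adds confidence intervals for the coefficients.

```spss
* Simple linear regression of cholesterol on age.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI(95) R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT Chol_v0
  /METHOD=ENTER Age.
```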

From the menus choose:
➢ Graphs
➢ Chart Builder…

Scatter plot by gender
From the menus choose:
➢ Graphs
➢ Chart Builder…

Break!

Data analysis: ROC
From the menus choose:
➢ Analyze
➢ ROC Curve…

Data analysis: Logistic Regression
From the menus choose:
➢ Analyze
➢ Regression
➢ Binary Logistic…
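
A sketch of the dialog as syntax. `alcohol_yn` is a hypothetical binary outcome (1 = drinks alcohol), and `Age_g` is assumed to be a categorical age group. With the default indicator contrast, the last category serves as the reference.

```spss
* Binary logistic regression with odds ratios and 95% confidence intervals.
LOGISTIC REGRESSION VARIABLES alcohol_yn
  /METHOD=ENTER gender Age_g
  /CONTRAST (Age_g)=Indicator
  /PRINT=CI(95)
  /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).
```

The odds ratios appear in the Exp(B) column of the "Variables in the Equation" table.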

The odds of alcohol drinking in females were 87% lower than in males.
Equivalently, the odds of alcohol drinking in males were 7.58 (1/0.132) times the odds in females.

The odds of alcohol drinking in patients younger than 50 years were 1.2 times the odds in patients older than 60 years.
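
The two statements about gender describe the same odds ratio of 0.132 (female vs. male), read in opposite directions:

```latex
1 - 0.132 = 0.868 \approx 87\% \text{ lower odds (female vs. male)}
\qquad
\frac{1}{0.132} \approx 7.58 \text{ (male vs. female)}
```

Inverting an odds ratio swaps the reference group, so either form can be reported as long as the direction is stated.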
Calculating time from date variables
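
One way to sketch this calculation in syntax uses the `DATEDIFF` function. `Enroll_date` comes from the case record form; `End_date` is a hypothetical name for the date of the event or of the last follow-up. Both variables must be stored in an SPSS date format.

```spss
* End_date is a hypothetical variable name: use your own event or last follow-up date.
COMPUTE time_days=DATEDIFF(End_date, Enroll_date, "days").
* Approximate months, using the average month length of 365.25 / 12 days.
COMPUTE time_months=time_days / 30.4375.
EXECUTE.
```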

Data analysis: Kaplan-Meier Curve
From the menus choose:
➢ Analyze
➢ Survival
➢ Kaplan-Meier…
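
A sketch of a Kaplan-Meier analysis with a log-rank test as syntax. `time_days` and `death` (1 = event occurred) are hypothetical variable names, and comparing by `gender` is an illustrative choice.

```spss
* Kaplan-Meier curves by gender with a log-rank test.
KM time_days BY gender
  /STATUS=death(1)
  /PRINT TABLE MEAN
  /PLOT SURVIVAL
  /TEST LOGRANK
  /COMPARE OVERALL POOLED.
```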

Data analysis: Cox regression
From the menus choose:
➢ Analyze
➢ Survival
➢ Cox Regression…
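
A sketch of the Cox model as syntax, under the same assumptions about `time_days` and `death` (hypothetical names). `Indicator(1)` makes the first category of `bmi_g` (normal BMI) the reference group.

```spss
* Cox regression with hazard ratios and 95% confidence intervals.
COXREG time_days
  /STATUS=death(1)
  /CONTRAST (bmi_g)=Indicator(1)
  /METHOD=ENTER gender bmi_g
  /PRINT=CI(95)
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20).
```

The hazard ratios appear in the Exp(B) column of the output.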

At any time point during the study period, males were 1.94 times as likely to die as females, and we are 95% confident that the true value lies between 0.86 and 4.37.

At any time point during the study period, patients with an overweight BMI had 3.33 times the risk of death, and patients with an obese BMI had 4.29 times the risk of death, compared with patients with a normal BMI.

Exercise

▪ Create a "Group" variable from "No"
  ➢ 1 = <45
  ➢ 2 = 45 – 100
  ➢ 3 = >100
▪ Compare "Chol_v0", "TG_v0" and "Hb_v0" between the groups by choosing the appropriate statistical test
  ➢ ANOVA or Kruskal-Wallis test
▪ Compute the correlation between "Age" and "Chol_v0" separately by group
  ➢ Pearson correlation and scatter plot separately by group
▪ Create a "Cholesterol group" variable from "Chol_v0"
  ➢ 0 = Normal (Chol_v0 < 200)
  ➢ 1 = High cholesterol (Chol_v0 >= 200)
▪ Which factors (Age, gender, bmi_2g) are associated with high cholesterol?
  ➢ Which statistical method will you use?

From the menus choose:
➢ Transform
➢ Visual Binning…

From the menus choose:
➢ Analyze
➢ Descriptive Statistics
➢ Explore…

From the menus choose:
➢ Analyze
➢ Compare Means
➢ One-Way ANOVA…

From the menus choose:
➢ Analyze
➢ Nonparametric Tests
➢ Legacy Dialogs
➢ K Independent Samples…

From the menus choose:
➢ Data
➢ Split File…

Then, from the menus choose:
➢ Analyze
➢ Correlate
➢ Bivariate…

From the menus choose:
➢ Transform
➢ Visual Binning…

From the menus choose:
➢ Analyze
➢ Regression
➢ Binary Logistic…
