Basic SPSS Guidance 1
Descriptive analysis & Parametric analysis
Content:
1. Descriptive analysis
2. Parametric analysis
2.1 One sample T-test
2.2 Dependent T-test
2.3 Independent T-test
2.4 ANOVA
3. Extra
- Extra 1: Degree of freedom
Disclaimer: This is a student’s work. If you find any mistake, please contact the author.
1. Descriptive analysis
A lecturer recorded the final marks obtained by his students in a biostatistics exam.
The individual marks for 60 students are given below.
75 62 71 63 79 60 57 70 67 81 53 70
70 67 66 65 56 68 65 72 65 69 68 83
63 63 67 58 77 45 57 64 75 62 60 75
65 59 54 51 55 85 67 55 66 48 72 50
76 87 62 50 48 56 60 65 67 67 68 64
a. How to key in data?
Analyze > Descriptive Statistics > Explore > tick the statistics and plots that you need
for your study
The skewness value is 0.148, which is within ± 1; thus the data is assumed to be
symmetrical.
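To see where numbers like these come from outside SPSS, here is a small Python sketch using only the standard library. It recomputes the mean, SD and skewness of the 60 marks. The skewness formula used here (the adjusted Fisher-Pearson coefficient) is an assumption about what SPSS reports; the guide itself does not say which formula is behind the 0.148.

```python
import statistics

# The 60 exam marks from the example above
marks = [
    75, 62, 71, 63, 79, 60, 57, 70, 67, 81, 53, 70,
    70, 67, 66, 65, 56, 68, 65, 72, 65, 69, 68, 83,
    63, 63, 67, 58, 77, 45, 57, 64, 75, 62, 60, 75,
    65, 59, 54, 51, 55, 85, 67, 55, 66, 48, 72, 50,
    76, 87, 62, 50, 48, 56, 60, 65, 67, 67, 68, 64,
]

n = len(marks)
mean = statistics.mean(marks)
sd = statistics.stdev(marks)  # sample SD (divides by n - 1)

# Adjusted Fisher-Pearson skewness (assumed formula, see the note above)
skewness = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in marks)

print(f"n = {n}, mean = {mean:.2f}, SD = {sd:.2f}, skewness = {skewness:.3f}")
print("assumed symmetrical" if abs(skewness) <= 1 else "skewed")
```

The last line applies the same rule of thumb as the guide: a skewness within ± 1 is treated as symmetrical.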
Since the data have more than 50 samples, the Kolmogorov-Smirnov test is used.
If the sample size is < 50, use the Shapiro-Wilk test.
There are no outliers or extreme cases present in the box plot.
The whiskers are the same length.
The median is located at the center of the box.
A ‘*’ marks an extreme case; an ‘o’ marks an outlier.
If there are any, report how many cases there are and which cases they are.
Note:
Extreme = a case that does not belong to the study population
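The box plot check can also be sketched in Python. The cut-offs below are an assumption, not something stated in this guide: they follow the usual box plot convention that an outlier (o) lies more than 1.5 box lengths (IQR) outside the box and an extreme case (*) more than 3 box lengths. The quartiles come from `statistics.quantiles`, which may differ slightly from the hinges SPSS uses.

```python
import statistics

marks = [
    75, 62, 71, 63, 79, 60, 57, 70, 67, 81, 53, 70,
    70, 67, 66, 65, 56, 68, 65, 72, 65, 69, 68, 83,
    63, 63, 67, 58, 77, 45, 57, 64, 75, 62, 60, 75,
    65, 59, 54, 51, 55, 85, 67, 55, 66, 48, 72, 50,
    76, 87, 62, 50, 48, 56, 60, 65, 67, 67, 68, 64,
]

# Quartiles of the data; the box runs from q1 to q3
q1, median, q3 = statistics.quantiles(marks, n=4)
iqr = q3 - q1


def classify(x):
    """Return 'extreme', 'outlier' or None for one observation."""
    distance = max(q1 - x, x - q3)  # how far x lies outside the box
    if distance > 3 * iqr:
        return "extreme"
    if distance > 1.5 * iqr:
        return "outlier"
    return None


# Case number (1-based) -> (value, label) for every flagged case
flagged = {i + 1: (x, classify(x)) for i, x in enumerate(marks) if classify(x)}

print(f"Q1 = {q1}, median = {median}, Q3 = {q3}, IQR = {iqr}")
print("flagged cases:", flagged or "none")
```

The dictionary keys are the case numbers, which is what needs to be reported when a case is flagged.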
2. Parametric analysis
2.1 One sample T-test
The pH range of water that is safe to consume is from 6.5 to 7.5. To determine if the
water is safe to drink, 20 samples have been collected from different places as below:
7.45 7.15 7.47 8.37 5.74 6.40 8.05 7.51 8.45 6.39
5.55 7.91 7.35 6.54 7.63 5.85 7.62 6.98 6.14 6.92
a. How to key in the data?
1. Key in the data vertically in one column
2. Go to Variable View, change the label, and make sure the MEASURE is in SCALE
3. Analyze > Compare Means > One-Sample T Test
4. Put the test value as the mean that you want to compare with
b. What to write? / Interpretation
Ho = The pH of the water is 7
H1 = The pH of the water is not 7
The data is normally distributed.
Note: if it is not normal, we need to carry out data transformation.
Understanding
1. A small t-value shows that the groups are similar.
2. The t-value of 0.381 means the sample mean differs from the test value by only 0.381
times the standard error, so the two are close to each other.
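The mean for the 20 data is 7.07 with an SD of 0.86.
The standard error is 0.19.
The p-value is 0.707: there is a 70.7% probability of seeing a difference like this by chance.
Remember to add (+) the test value to the confidence interval of the difference to get the CI of the mean!!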
Conclusion
The p-value of the test is greater than 0.05, thus we do not reject (accept) the null hypothesis.
The mean pH of the water is around 7 (the sample mean is the same as the pre-determined
mean).
We are 95% confident that the mean pH of the water is between 6.67 and 7.48.
If the p-value is < 0.05, then reject the null hypothesis: the sample mean is different from
the pre-determined mean.
Note: if the normality test shows the data is not normal, check for outliers; if there are any,
remove them and check again whether the data has become normal.
- If it is still not normal, then a transformation is needed
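The one sample T-test numbers above can be reproduced by hand. The sketch below is only an illustration in Python (standard library), not what SPSS runs internally. The critical value 2.093 (two-sided 5%, df = 19) is taken from a t-table and is an assumption added here; the variable names are made up for the example.

```python
import math
import statistics

ph = [7.45, 7.15, 7.47, 8.37, 5.74, 6.40, 8.05, 7.51, 8.45, 6.39,
      5.55, 7.91, 7.35, 6.54, 7.63, 5.85, 7.62, 6.98, 6.14, 6.92]
test_value = 7.0
T_CRIT_DF19 = 2.093  # assumed: two-sided 5% critical t for df = 19, from a t-table

n = len(ph)
mean = statistics.mean(ph)
sd = statistics.stdev(ph)
se = sd / math.sqrt(n)             # standard error of the mean
t = (mean - test_value) / se       # t statistic
df = n - 1

# CI of the difference (mean - test value), as shown in the SPSS output
diff_low = (mean - test_value) - T_CRIT_DF19 * se
diff_high = (mean - test_value) + T_CRIT_DF19 * se
# Add the test value back to get the CI of the mean itself
ci = (test_value + diff_low, test_value + diff_high)

print(f"mean = {mean:.2f}, SD = {sd:.2f}, SE = {se:.2f}, t = {t:.3f}, df = {df}")
print(f"95% CI of the mean: {ci[0]:.2f} to {ci[1]:.2f}")
print("reject Ho" if abs(t) > T_CRIT_DF19 else "do not reject Ho")
```

Comparing |t| with the critical value gives the same decision as comparing the p-value with 0.05.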
2.2 Paired sample T tests / Dependent T test
Uses: To compare the mean of two related variables (same things / persons)
A study on the effectiveness of a treatment for hypertension. The BP (in mmHg) before the
treatment and after the treatment is measured. The reading is as below:
Patient   BP before   BP after
1         180         170
2         160         155
3         160         161
4         170         172
5         150         145
6         165         160
7         155         150
8         135         133
9         140         140
10        167         167
11        127         120
12        145         147
13        166         150
14        145         125
15        149         140
16        150         150
a. How to key in the data?
Analyze > Compare Means > Paired-Samples T Test
Ho = The BP before and after is the same
H1 = The BP before and after is not the same
b. What to write? / Interpretation
You can run a descriptive analysis on the differences (before – after).
First, always check for normality!!
The mean difference is 4.938 mmHg with an SD of 6.37 mmHg
The t value is 3.1 with 15 degrees of freedom
The p-value is 0.007 which is smaller than 0.05.
The 95% confidence interval of the difference is [1.54, 8.33]
The p-value is less than 0.05, so reject the null hypothesis: there is a significant change in
the mean BP, in which the BP after is smaller than the BP before.
Thus, the treatment is effective
We are 95% confident that the reduction in BP is between 1.54 mmHg and
8.33 mmHg.
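The paired test is just a one sample T-test on the differences (before – after). The Python sketch below shows this under one added assumption: the critical value 2.131 (two-sided 5%, df = 15) is read from a t-table.

```python
import math
import statistics

before = [180, 160, 160, 170, 150, 165, 155, 135,
          140, 167, 127, 145, 166, 145, 149, 150]
after = [170, 155, 161, 172, 145, 160, 150, 133,
         140, 167, 120, 147, 150, 125, 140, 150]
T_CRIT_DF15 = 2.131  # assumed: two-sided 5% critical t for df = 15, from a t-table

# Work on the per-patient differences (before - after)
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)
se_diff = sd_diff / math.sqrt(n)
t = mean_diff / se_diff
df = n - 1

ci = (mean_diff - T_CRIT_DF15 * se_diff, mean_diff + T_CRIT_DF15 * se_diff)

print(f"mean difference = {mean_diff:.3f} mmHg, SD = {sd_diff:.2f} mmHg")
print(f"t = {t:.1f}, df = {df}")
print(f"95% CI of the difference: [{ci[0]:.2f}, {ci[1]:.2f}]")
print("reject Ho" if abs(t) > T_CRIT_DF15 else "do not reject Ho")
```

Because the whole interval is above 0, the reduction in BP is significant, which matches the conclusion above.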
2.3 Independent T-test
BMI of males in School A and School B:
School A (n = 20):
32 31 34 28 21 28 22 34 32 30
26 22 25 26 26 32 24 24 29 32
School B (n = 24):
20 24 29 23 24 26 23 25 24 22 26 24
34 21 18 21 28 24 24 19 26 28 23 20
b. What to write? / Interpretation
Ho = There is no difference in BMI between A and B
H1 = There is a difference in BMI between A and B
1st step: always check for normality (use the whole data; there is no need to separate into A
and B while performing the descriptive analysis).
The mean BMI of the 20 males in A is 27.90 with an SD of 4.128.
The mean BMI of the 24 males in B is 24.00 with an SD of 3.539.
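As a rough check of the group statistics, the Python sketch below recomputes the two means and SDs and then a t statistic. Two things here are assumptions and not taken from this guide: the pooled (equal variances assumed) form of the test is used, and the critical value 2.018 (two-sided 5%, df = 42) comes from a t-table.

```python
import math
import statistics

school_a = [32, 31, 34, 28, 21, 28, 22, 34, 32, 30,
            26, 22, 25, 26, 26, 32, 24, 24, 29, 32]
school_b = [20, 24, 29, 23, 24, 26, 23, 25, 24, 22, 26, 24,
            34, 21, 18, 21, 28, 24, 24, 19, 26, 28, 23, 20]
T_CRIT_DF42 = 2.018  # assumed: two-sided 5% critical t for df = 42, from a t-table

n_a, n_b = len(school_a), len(school_b)
mean_a, mean_b = statistics.mean(school_a), statistics.mean(school_b)
sd_a, sd_b = statistics.stdev(school_a), statistics.stdev(school_b)

# Pooled variance: weighted average of the two sample variances
df = n_a + n_b - 2
pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t = (mean_a - mean_b) / se_diff

print(f"A: n = {n_a}, mean = {mean_a:.2f}, SD = {sd_a:.3f}")
print(f"B: n = {n_b}, mean = {mean_b:.2f}, SD = {sd_b:.3f}")
print(f"t = {t:.3f}, df = {df}")
print("reject Ho" if abs(t) > T_CRIT_DF42 else "do not reject Ho")
```

If Levene's test in the SPSS output shows that the variances are not equal, the "equal variances not assumed" row should be read instead, which this sketch does not cover.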
2.4 ANOVA
Uses: To test if there is a difference in means between more than two groups (3 or
more different / unrelated groups only!!)
Drugs
A B C D
8 5 6 3
8 6 5 8
7 5 5 7
8 6 6 2
9 3 8 5
Objective: To test if there is a difference in weight loss between different drugs
At option…
b. What to write / interpretation
Ho = There is no difference in weight loss between usage of different drugs
H1 = There is a difference in weight loss between usage of different drugs
(non-parametric test)
The p-value from Levene’s test is 0.062, which is greater than 0.05.
Thus, accept the null hypothesis (the variances are equal) and reject the alternative hypothesis.
The equality of variances is assumed (this meets the requirement for the parametric test, so we
can continue with ANOVA).
Note:
If p < 0.05, equality of variances is not assumed; move on to the non-parametric
Kruskal-Wallis test.
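For the curious, Levene's test can be sketched as a one-way ANOVA on the absolute deviations of each value from its own group mean. The Python below assumes this mean-based version is the one behind the SPSS output, and it uses 3.24 as the 5% critical value of F for df = (3, 16) from an F-table. Both are assumptions added here, and the helper `one_way_f` is made up for the example.

```python
import statistics

weight_loss = {
    "A": [8, 8, 7, 8, 9],
    "B": [5, 6, 5, 6, 3],
    "C": [6, 5, 5, 6, 8],
    "D": [3, 8, 7, 2, 5],
}
F_CRIT_3_16 = 3.24  # assumed: 5% critical F for df = (3, 16), from an F-table


def one_way_f(samples):
    """Return (F, df_between, df_within) for a list of groups."""
    all_values = [x for group in samples for x in group]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in samples)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in samples for x in g)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Absolute deviation of every value from its own group mean
abs_devs = [[abs(x - statistics.mean(g)) for x in g] for g in weight_loss.values()]
levene_w, df1, df2 = one_way_f(abs_devs)

print(f"Levene statistic = {levene_w:.2f}, df = ({df1}, {df2})")
print("variances differ" if levene_w > F_CRIT_3_16 else "equal variances assumed")
```

A statistic below the critical value corresponds to a p-value above 0.05, so the equality of variances is assumed.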
Interpret
The p-value is less than 0.05. Thus, reject the null hypothesis and accept the alternative
hypothesis.
At least one pair of means differs significantly.
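The F value behind this decision can be worked out by hand from the drug data. The Python sketch below splits the variation into between-group and within-group parts. The critical value 3.24 (5%, df = (3, 16)) is taken from an F-table and is an assumption added here.

```python
import statistics

weight_loss = {
    "A": [8, 8, 7, 8, 9],
    "B": [5, 6, 5, 6, 3],
    "C": [6, 5, 5, 6, 8],
    "D": [3, 8, 7, 2, 5],
}
F_CRIT_3_16 = 3.24  # assumed: 5% critical F for df = (3, 16), from an F-table

groups = list(weight_loss.values())
all_values = [x for g in groups for x in g]
grand_mean = statistics.mean(all_values)

# Between groups: how far each group mean is from the grand mean
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
# Within groups: how far each value is from its own group mean
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_value = (ss_between / df_between) / (ss_within / df_within)

for name, values in weight_loss.items():
    print(f"mean of {name} = {statistics.mean(values)}")
print(f"F = {f_value:.2f}, df = ({df_between}, {df_within})")
print("reject Ho" if f_value > F_CRIT_3_16 else "do not reject Ho")
```

ANOVA only says that some pair of means differs; the post hoc table is still needed to find out which pair.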
Thus we can say: C does not significantly differ from any of the other three drugs.
But there is a significant difference between A and both B and D (different
subsets).
Weight loss in A is significantly larger compared to both B and D (what is the
difference?)
The table further validates the statement we can see from the column.
The p-value between A and both B and D is 0.038, which is less than 0.05. Thus, there are
significant differences between them.
Meanwhile, the p-value between A and C is 0.229, which is greater than 0.05, so there is
no significant difference between them.
3. Extra
Extra 1: Degree of freedom (not so important, can skip)
Eg: I have a set of numbers 1, 2, 3, 4, 5, 6. The mean of this set of numbers is 3.5.
FYI (fact): the sum of all differences between the numbers and the mean is equal to zero.
(Don’t believe it? Let me show you)
1 – 3.5 = - 2.5
2 – 3.5 = - 1.5
3 – 3.5 = - 0.5
4 – 3.5 = 0.5
5 – 3.5 = 1.5
6 – 3.5 = 2.5
Sum = 0 (Tada, 0 right?!)
So, what does the degree of freedom mean?
It means that when you work out the calculation, we know that we must get 0 as the final
answer. So out of the 6 numbers here, only 5 numbers can vary freely, whatever those
numbers are; the last number must be a fixed number in order for us to get 0.
Like the equation below, X must be 2.5 for us to get 0!
- 2.5 - 1.5 - 0.5 + 0.5 + 1.5 + X = 0
Or
3.4 + 4.6 – 4.5 - 2 + 4 + y = 0 (here y must be -5.5)
Or as a formula: df = n - 1
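The same idea in a few lines of Python, as a small illustration only (the variable names are made up for this sketch). Once the first 5 deviations are known, the last one has no freedom left.

```python
numbers = [1, 2, 3, 4, 5, 6]
mean = sum(numbers) / len(numbers)

# Differences between each number and the mean
deviations = [x - mean for x in numbers]
print(deviations, "sum =", sum(deviations))

# Knowing the first n - 1 deviations fixes the last one
free_part = deviations[:-1]
forced_last = -sum(free_part)
print("the last deviation must be", forced_last)

# Same logic for the second equation: solve for y
y = -(3.4 + 4.6 - 4.5 - 2 + 4)
print("y must be about", round(y, 2))

degrees_of_freedom = len(numbers) - 1
print("df =", degrees_of_freedom)
```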
Eg:
Extra 2: Standard error of the mean (SEM) (quite important but hard; you can watch the video at the
link on the last page for better understanding)
The ‘standard error of the mean’ is the variation of multiple means from the same population.
For example, I want to measure the time taken by Ali in a 100 m sprint, so in one set of the
experiment he runs 5 times and the mean is obtained.
Then I tell him to repeat a few more sets on subsequent days (set 2, 3, 4, …), and the mean of
each set is obtained.
Set     1      2      3   …
        50     51     …   …
        47     54     …   …
        52     46     …   …
        50     50     …   …
        48     50     …   …
mean    49.4   50.2   …   …
So, SEM measures the variation / dispersion of the means around the average of the whole
experiment (set 1, 2, 3, …).
Why is it important?
- Because the standard error allows us to calculate the confidence interval
- Thus, it allows us to use this data not only for a certain group of people, but also to estimate
the true population mean.
While the standard deviation just tells us that:
Eg: mean = 49, SD = 1
If I tell Ali to run again in a single set of the experiment, the time taken by him will probably be 49 ± 1
But repeating the experiment for many sets is time consuming and uses up a lot of energy, so
statisticians came up with the equation:
SE = Standard deviation / √n
1. Calculate the confidence interval:
eg: mean = 64.75, SE = 1.198
Lower bound = 64.75 – 2 (1.198) = 62.35
Upper bound = 64.75 + 2 (1.198) = 67.15
So this means that we are 95% confident that the mean of…….. of the population will be
between 62.35 and 67.15
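The mean of 64.75 and SE of 1.198 used here match the 60 exam marks from the descriptive analysis section, so the calculation can be checked with a short Python sketch. It uses the same rough multiplier of 2 as above; the more exact value for a large sample would be about 1.96.

```python
import math
import statistics

marks = [
    75, 62, 71, 63, 79, 60, 57, 70, 67, 81, 53, 70,
    70, 67, 66, 65, 56, 68, 65, 72, 65, 69, 68, 83,
    63, 63, 67, 58, 77, 45, 57, 64, 75, 62, 60, 75,
    65, 59, 54, 51, 55, 85, 67, 55, 66, 48, 72, 50,
    76, 87, 62, 50, 48, 56, 60, 65, 67, 67, 68, 64,
]

n = len(marks)
mean = statistics.mean(marks)
sd = statistics.stdev(marks)
se = sd / math.sqrt(n)  # SE = standard deviation / sqrt(n)

# Approximate 95% confidence interval: mean plus or minus 2 standard errors
lower = mean - 2 * se
upper = mean + 2 * se

print(f"mean = {mean:.2f}, SD = {sd:.2f}, SE = {se:.3f}")
print(f"approx. 95% CI: {lower:.2f} to {upper:.2f}")
```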
Example: A researcher wanted to find out the effect of revision on the outcome of the final
exam. The data is as below:
            Do revision   Didn’t do revision
            69            47
            54            68
            80            52
mean        68            56
2 x SEM     15            13
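The last two rows of the table can be recomputed as below. This Python sketch also checks whether the two intervals (mean ± 2 x SEM) overlap; the helper `summary` is made up for the example.

```python
import math
import statistics

revised = [69, 54, 80]
not_revised = [47, 68, 52]


def summary(scores):
    """Return (mean, 2 x SEM) for one group of scores."""
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    return statistics.mean(scores), 2 * sem


mean_r, two_sem_r = summary(revised)
mean_n, two_sem_n = summary(not_revised)

# Interval for each group: mean - 2 SEM to mean + 2 SEM
interval_r = (mean_r - two_sem_r, mean_r + two_sem_r)
interval_n = (mean_n - two_sem_n, mean_n + two_sem_n)

# Two intervals overlap when each one starts before the other ends
overlap = interval_r[0] <= interval_n[1] and interval_n[0] <= interval_r[1]

print(f"Do revision:        mean = {mean_r:.0f}, 2 x SEM = {two_sem_r:.0f}")
print(f"Didn't do revision: mean = {mean_n:.0f}, 2 x SEM = {two_sem_n:.0f}")
print("intervals overlap" if overlap else "intervals do not overlap")
```

Overlapping intervals suggest that, with only 3 scores per group, the difference between the two means could easily be due to chance.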
[Figure: bar chart of Mean (0 to 80) for Do revision and Didn’t do revision]
The bars represent the confidence interval that we have calculated using the SEM.
Callout on the chart: the means of the samples are overlapped.
[Figure: bar chart of Mean (0 to 80) for Do revision and Didn’t do revision]
[Figure: bar chart of Mean (0 to 90) for Do revision and Didn’t do revision]