Lecture 9: Non-Parametric Tests
PBS2 Statistics
Lent Term 2024
Dr Kshipra Gurunandan, MRC Cognition and Brain Sciences Unit
Learning objectives
1. What are non-parametric tests?
2. Identify and calculate a Wilcoxon’s Signed-Ranks Test
3. Identify and calculate a Mann-Whitney U Test
4. Pros and cons of parametric and non-parametric tests
Parameters and Parametric Tests
What are Parameters?
In statistics, a parameter is a
statistical characteristic of the
population.
E.g. mean, median, variance of the
population
If the population follows a
known/defined distribution, then a
small set of “parameters” can fully
describe the population.
Common Probability Distributions
Parametric Statistics
For a sample, the corresponding characteristics
(e.g. sample mean, median, variance) are called
statistics.
We use “sample statistics” to
estimate “population parameters”.
Parametric Tests
[Figure: repeated samples of size n and the resulting distribution of sample means, with mean = μ and SD = σ/√n]
Parametric tests involve assumptions about
one or more characteristics (‘parameters’) of the distribution of
scores
in the population from which the data were sampled.
Due to the central limit theorem, we also extend
these assumptions to the sampling distribution.
z-tests and t-tests are parametric tests
Experimental Designs
One-Sample Design: Task A, giving mean(A)
WITHIN-subject Design: Task A and Task B, giving mean(A) and mean(B)
BETWEEN-subject Design: Task A and Task B, giving mean(A) and mean(B)
Are the sample means different?
Parametric Tests
$z = \frac{\bar{X} - \mu}{\sigma}$
One-Sample Design: one-sample z or t-test, $t = \frac{\bar{X} - \mu}{SE}$
WITHIN-subject Design: paired-sample t-test, $t = \frac{\bar{X}_d}{SE}$
BETWEEN-subject Design: two-sample t-test, $t = \frac{\bar{X}_1 - \bar{X}_2}{SE}$
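A minimal sketch of the one-sample and paired t statistics, using only the Python standard library. It assumes SE is the sample standard deviation divided by √n, which is the usual definition but is not spelled out on the slide; the function names and the toy data are hypothetical.

```python
import math
import statistics

def one_sample_t(sample, mu):
    """t = (sample mean - mu) / SE, assuming SE = s / sqrt(n)."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu) / se

def paired_t(first, second):
    """Paired t: a one-sample t on the difference scores against 0."""
    diffs = [b - a for a, b in zip(first, second)]
    return one_sample_t(diffs, 0)

# Toy data (hypothetical): mean = 4, s = 2, n = 3
print(one_sample_t([2, 4, 6], 2))  # approximately 1.732
```

The paired version reuses the one-sample function because the paired test works on the difference scores alone.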
Assumptions of the Student’s t-test
One sample t-tests
• Sampling distribution is normal
Paired t-tests
• Sampling distribution of differences is normal
Two-sample t-tests
• Sampling distributions are normal
• Both groups have equal variance
  (if not, use Welch’s t-test; not tested on exam)
• Scores are independent
Now let’s look at our sample
data…
Distribution of Sample Data
Skewness of sample data
[Figure: negatively skewed, normally distributed and positively skewed distributions]
Mode of sample data
[Figure: normally distributed data]
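Skewness can also be summarised as a number. The sketch below assumes the moment-based coefficient g1 = m3 / m2^1.5, which is one common definition and is not given on the slides. Values near 0 suggest symmetry, positive values a long right tail, and negative values a long left tail. The function name and data are hypothetical.

```python
def sample_skewness(data):
    """Moment-based skewness g1 = m3 / m2**1.5 (an assumed, common definition)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

print(sample_skewness([1, 2, 3, 4, 5]))      # symmetric data: 0.0
print(sample_skewness([1, 1, 1, 10]) > 0)    # long right tail: True
print(sample_skewness([1, 10, 10, 10]) < 0)  # long left tail: True
```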
Tests for Normality
These are essentially “goodness of fit” tests,
i.e. is your data “normal enough”?
The most popular tests are:
Kolmogorov-Smirnov test,
Anderson-Darling test,
Shapiro-Wilk test.
Not tested on exam.
Violation of Assumptions
If the assumptions of normality do not hold:
1. Sample size too small (n < 30), and/or
2. The sample statistics are unreliable
We risk inaccurate test statistics and p-values.
Some possible solutions:
1. Collect more data
2. Transform data
3. Use non-parametric tests
Non-parametric Tests
What are Non-Parametric Tests?
Statistical tests that do not rely on parameter estimation or
assumptions concerning the population/sampling distribution.
Can be used to test hypotheses about central tendencies
when data are measured using
interval, ratio,
or ordinal scales.
When to Use Non-Parametric Tests?
Continuous data (interval or ratio):
1. When the outcome has clear limits of detection
2. When there are definite outliers
Ordinal data:
3. When the outcome is an ordinal variable or a rank
General Procedure
The outcome variable (ordinal, interval or continuous)
is ordered (e.g. lowest to highest).
The analysis focuses on the ranks
instead of the measured or raw values.
Non-parametric tests of central tendency
are sometimes called “rank tests”.
Principles of Ranking
Order the scores in the data set:
Scores 6 5 8 6 3 7
If data are tied (i.e., occupy the same position),
then give these data points the same rank (usually the mean
of the ranks they would otherwise occupy).
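The ranking rule can be sketched in a few lines of Python. This is an illustrative helper (the name `rank_with_ties` is hypothetical), ranking from lowest to highest and giving tied scores the mean of the ranks they occupy. It is applied here to the six scores shown above.

```python
def rank_with_ties(scores):
    """Rank scores from lowest (rank 1) to highest; tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Find the run of equal scores starting at this sorted position.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        # Sorted positions pos..end are 0-based, so they occupy ranks pos+1..end+1.
        mean_rank = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

scores = [6, 5, 8, 6, 3, 7]
print(rank_with_ties(scores))  # [3.5, 2.0, 6.0, 3.5, 1.0, 5.0]
```

The two scores of 6 would occupy ranks 3 and 4, so each receives 3.5.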
Hypothesis Testing
One-Sample Design (Task A; mean(A)):
  Parametric: one-sample z or t-test
  Non-parametric: Wilcoxon Signed Rank Test
WITHIN-subject Design (Task A, Task B; mean(A), mean(B)):
  Parametric: paired-sample t-test
  Non-parametric: Wilcoxon Signed Rank Test
BETWEEN-subject Design (Task A, Task B; mean(A), mean(B)):
  Parametric: two-sample t-test
  Non-parametric: Mann-Whitney U-Test (Wilcoxon Rank-Sum Test)
Wilcoxon Signed Rank Test
Wilcoxon Signed Rank Test
• The dependent variable is the difference in scores:
- Between sample data and population median
- Between sample data measurement 1 and
measurement 2
• If there is no difference, then the sum of positive differences
will be approximately equal to the sum of negative differences.
• If we expect a significant difference, then most of the
differences should be in the same direction.
Step-by-Step Guide
1. Calculate the difference scores
2. Rank the scores:
i. Ignore any differences that are 0
ii. Take the absolute scores and rank them
iii. Assign the sign (+ or –) to the rank
3. Sum the positive ranks (T+)
4. Sum the negative ranks (T-)
5. The smaller of T+ and T- (ignoring sign) is called Wilcoxon’s T statistic
6. Find the critical value of T
7. If T < Tcrit, we reject the null hypothesis.
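Steps 1 to 5 can be sketched as follows. This is a minimal illustration rather than a reference implementation: the helper names are hypothetical, and the data are the attention-test scores from this lecture's worked example, compared against the population median of 15.

```python
def mean_ranks(values):
    """Rank values from lowest to highest; ties get the mean of the ranks they occupy."""
    ordered = sorted(values)
    # First 1-based position of v, plus half the number of extra tied copies.
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def wilcoxon_t(differences):
    """Return (T+, T-, T, n) for a list of difference scores."""
    nonzero = [d for d in differences if d != 0]    # step 2i: ignore zero differences
    ranks = mean_ranks([abs(d) for d in nonzero])   # step 2ii: rank the absolute scores
    t_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)   # step 3
    t_minus = sum(r for d, r in zip(nonzero, ranks) if d < 0)  # step 4 (as a magnitude)
    return t_plus, t_minus, min(t_plus, t_minus), len(nonzero)  # step 5

errors = [12, 17, 9, 3, 16, 10, 28, 14, 5, 19, 20, 8]
differences = [x - 15 for x in errors]  # step 1: score minus population median
print(wilcoxon_t(differences))  # (28.0, 50.0, 28.0, 12)
```

For a paired design, pass the differences between measurement 1 and measurement 2 instead. Steps 6 and 7 still require the critical value from the examination handout.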
Wilcoxon Signed Rank Test [example]
Twelve adults completed a test of attention.
The number of errors they made are
recorded below. The median number of
errors made by adults is known to be 15.
Does this sample of adults significantly differ
from the population?
Subject          1     2     3     4     5     6     7     8     9    10    11    12
Errors          12    17     9     3    16    10    28    14     5    19    20     8
Wilcoxon Signed Rank Test [example]
Subject          1     2     3     4     5     6     7     8     9    10    11    12
Errors          12    17     9     3    16    10    28    14     5    19    20     8
Difference      -3     2    -6   -12     1    -5    13    -1   -10     4     5    -7
Abs. Diff.       3     2     6    12     1     5    13     1    10     4     5     7
Rank             4     3     8    11     1     6    12     2    10     5     7     9
Tied Rank        4     3     8    11   1.5   6.5    12   1.5    10     5   6.5     9
Signed Rank     -4     3    -8   -11   1.5  -6.5    12  -1.5   -10     5   6.5    -9
Pos. ranks             3               1.5          12                 5   6.5
Neg. ranks       4           8    11         6.5         1.5    10                 9
Sum of positive ranks (T+) = 28
Sum of negative ranks (T-) = 50
Wilcoxon’s T = 28
n = 12 (i.e. non-zero difference scores)
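A quick arithmetic check on the ranking: the ranks 1 to n always sum to n(n+1)/2, with or without ties, so the positive and negative rank sums must add up to that total. For this example:

```latex
T_{+} + T_{-} = \frac{n(n+1)}{2} = \frac{12 \times 13}{2} = 78,
\qquad 28 + 50 = 78
```

If the two sums do not add up to n(n+1)/2, a rank has been miscounted.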
In Statistics Examination handout:
15. Wilcoxon's Signed-Rank T statistic
Two-tailed critical value for Wilcoxon’s T = 14
Since T = 28 is greater than Tcrit = 14, we cannot reject the null
hypothesis.
There is no significant difference between the number of
errors made by our sample of adults and the general
population.
Wilcoxon Signed Rank Test [example]
A researcher investigated the effect of
exposure to human male/female
experimenters on stress levels (as indexed
by raised plasma corticosterone levels in
ng/ml) in 10 lab mice. Plasma corticosterone
levels were measured after 15 minutes in
the presence of a female experimenter and
then again after 15 minutes in the presence
of a male experimenter.
Mouse            1     2     3     4     5     6     7     8     9    10
Female         120   384   139   142   378   164   169   351   178   190
Male           410   134   379   372   148   369   368   175   351   190
Wilcoxon Signed Rank Test [example]
Mouse            1     2     3     4     5     6     7     8     9    10
Female         120   384   139   142   378   164   169   351   178   190
Male           410   134   379   372   148   369   368   175   351   190
Difference     290  -250   240   230  -230   205   199  -176   173     0
Abs. Diff.     290   250   240   230   230   205   199   176   173     0
Rank             9     8     7     5     6     4     3     2     1     -
Tied Rank        9     8     7   5.5   5.5     4     3     2     1     -
Signed Rank      9    -8     7   5.5  -5.5     4     3    -2     1     -
Pos. ranks       9           7   5.5           4     3           1     -
Neg. ranks             8               5.5                 2           -
Sum of positive ranks (T+) = 29.5
Sum of negative ranks (T-) = 15.5
Wilcoxon’s T = 15.5
n=9
In Statistics Examination handout:
15. Wilcoxon's Signed-Rank T statistic
Two-tailed critical value for Wilcoxon’s T = 6
Since T = 15.5 is greater than Tcrit = 6, we cannot reject the null
hypothesis.
There is no significant difference between corticosterone
levels in mice after being exposed to female vs male
experimenters.
Mann-Whitney U Test
Mann-Whitney U Test
• Null hypothesis: both samples are drawn from the same
population.
• If we rank all the data across the two groups, we expect
similar distribution of ranks in the two groups.
• If the sum of ranks differs significantly between the two
groups, then we reject the null hypothesis.
Step-by-Step Guide
1. If n1≠ n2 , call the smaller group ‘group 1’ and the larger
group ‘group 2’.
2. Rank all the scores (regardless of group).
3. Calculate the sum of ranks for each group (R1 and R2).
4. The Mann-Whitney U Statistic is the smaller of the two
values U1 and U2, which are computed from the rank sums
(R1 and R2) and the group sizes (n1 and n2).
5. Find the critical value for U.
6. If U < Ucrit, we reject the null hypothesis.
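The steps above can be sketched in Python. The formula used here, U_i = n1·n2 + n_i(n_i + 1)/2 − R_i, is a standard textbook form and is an assumption; the handout's notation may differ. The alternative form R_i − n_i(n_i + 1)/2 swaps U1 and U2 and therefore gives the same smaller value. The helper names are hypothetical, and the data are the pain ratings from this lecture's paracetamol/ibuprofen example.

```python
def mean_ranks(values):
    """Rank values from lowest to highest; ties get the mean of the ranks they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def mann_whitney_u(group1, group2):
    """Return (R1, R2, U), where U is the smaller of U1 and U2."""
    n1, n2 = len(group1), len(group2)
    ranks = mean_ranks(list(group1) + list(group2))  # step 2: rank all scores together
    r1 = sum(ranks[:n1])                             # step 3: rank sum per group
    r2 = sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1            # assumed standard formula
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return r1, r2, min(u1, u2)                       # step 4: U is the smaller value

paracetamol = [8, 6, 10, 5, 7, 10]
ibuprofen = [4, 5, 5, 6, 3, 2]
print(mann_whitney_u(paracetamol, ibuprofen))  # (54.5, 23.5, 2.5)
```

A useful check is that U1 + U2 = n1·n2, just as R1 + R2 must equal N(N + 1)/2 for N = n1 + n2 scores.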
Mann-Whitney U Test [example]
In a study to test different methods of pain
relief, two groups of 6 adults with rheumatoid
arthritis were asked to rate their pain (on an
ordinal scale) after 1 week of treatment with
paracetamol or ibuprofen. The adults’ pain
ratings are recorded below:
Subject 1 2 3 4 5 6
Paracetamol 8 6 10 5 7 10
Ibuprofen 4 5 5 6 3 2
Mann-Whitney U Test [example]
Paracetamol
Score            8     6    10     5     7    10
Rank            10     7    11     4     9    12
Tied Rank       10   7.5  11.5     5     9  11.5
Rank Sum      54.5
Ibuprofen
Score            4     5     5     6     3     2
Rank             3     5     6     8     2     1
Tied Rank        3     5     5   7.5     2     1
Rank Sum      23.5
In Statistics Examination handout:
14. Mann-Whitney U statistic
Since U < Ucrit, we reject the null hypothesis.
Adults treated with Ibuprofen reported significantly less
pain than those treated with Paracetamol.
Mann-Whitney U Test [example]
A lecturer wanted to evaluate student
satisfaction with the lecture schedule. A
group of 20 students were randomly
assigned to an early (9am) or late lecture
(11am) series covering the same content.
Their satisfaction (rated on an ordinal scale)
is reported below:
Satisfaction Rating
Early 1 3 2 4 2 1 3 5 1 2
Late 3 4 4 5 2 6 3 6 3 3
Mann-Whitney U Test [example]
Early
Score            1     3     2     4     2     1     3     5     1     2
Rank             1     8     4    14     5     2     9    17     3     6
Tied Rank        2  10.5   5.5    15   5.5     2  10.5  17.5     2   5.5
Rank Sum        76
Late
Score            3     4     4     5     2     6     3     6     3     3
Rank            10    15    16    18     7    19    11    20    12    13
Tied Rank     10.5    15    15  17.5   5.5  19.5  10.5  19.5  10.5  10.5
Rank Sum       134
In Statistics Examination handout:
14. Mann-Whitney U statistic
Since U < Ucrit, we reject the null hypothesis.
Students assigned to the late lecture gave significantly higher
satisfaction ratings.
Parametric vs Non-Parametric Tests
Considerations
Data must meet a number of assumptions for parametric tests
to be accurate.
Non-parametric tests can be used when:
- Data are measured on ordinal scales
- Data are measured on interval or ratio scales but seriously
violate parametric assumptions
Fun Fact
Wilcoxon Signed-Rank Test:
When n > 25, the distribution of T is approximately normal,
so T can be converted to a z score (i.e. z tables)
Mann-Whitney U Test:
When n2 > 20, the distribution of U is approximately normal,
so U can be converted to a z score (i.e. z tables)
Not tested on exam.
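For reference, the usual large-sample approximations (standard textbook forms without a correction for ties, not taken from the slides) convert each statistic to a z score using its mean and standard deviation under the null hypothesis:

```latex
z_T = \frac{T - \dfrac{n(n+1)}{4}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24}}}
\qquad
z_U = \frac{U - \dfrac{n_1 n_2}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}
```

Here n is the number of non-zero difference scores for the signed-rank test, and n1 and n2 are the two group sizes for the Mann-Whitney test.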
Non-Parametric Tests
Advantages
1. No restrictive assumptions
2. Not overly influenced by extreme scores
3. Can be used to analyse ordinal data
Disadvantages
1. Limited range and flexibility of tests
2. Lose information about the magnitude of differences
3. Less power to detect genuine effects
Summary
1. What are non-parametric tests?
Tests that are not reliant on assumptions about
underlying distributions.
2. How to run non-parametric tests:
Wilcoxon’s Signed-Ranks Test (one-sample or paired-sample data)
Mann-Whitney U Test (two-sample data)
3. Pros and cons of using non-parametric tests
Fewer assumptions, but less power (if assumptions of
normality are met)
For more information…
Howell, D.C. (2013). Statistical Methods for Psychology (8th edition). Chapter 18.
Acknowledgements
Thanks to Dr Andrea Greve for an earlier version of this lecture
Next Lecture
Analysing categorical data!