9 Non-Parametric Tests

The document outlines Lecture 9 on Non-Parametric Tests in a statistics course, focusing on key concepts such as Wilcoxon’s Signed-Ranks Test and Mann-Whitney U Test. It explains the principles of non-parametric tests, their applications, and provides step-by-step guides for conducting these tests. Additionally, it discusses the importance of understanding when to use non-parametric tests, particularly in cases where assumptions of normality are violated.

Lecture 9: Non-Parametric Tests

PBS2 Statistics
Lent Term 2024

Dr Kshipra Gurunandan MRC Cognition and Brain Sciences Unit


Learning objectives

1. What are non-parametric tests?

2. Identify and calculate a Wilcoxon’s Signed-Ranks Test

3. Identify and calculate a Mann-Whitney U Test

4. Pros and cons of parametric and non-parametric tests


Parameters and Parametric Tests
What are Parameters?

In statistics, a parameter is a statistical characteristic of the population, e.g. the mean, median or variance of the population.
If the population follows a known/defined distribution, then a small set of “parameters” can fully describe the population.
[Figure: common probability distributions]
Parametric Statistics

For a sample, these characteristics are called statistics.

We use “sample statistics” to estimate “population parameters”.
Parametric Tests

[Figure: samples of size n are drawn from the population; the distribution of sample means has mean = μ and SD = σ/√n]

Parametric tests involve assumptions about one or more characteristics (‘parameters’) of the distribution of scores in the population from which the data were sampled.
Due to the central limit theorem, we also extend these assumptions to the sampling distribution.
z-tests and t-tests are parametric tests.
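The central limit theorem can be illustrated with a small simulation. The sketch below is not from the lecture: the function name `simulate_sample_means`, the exponential population and the seed are all assumptions chosen for illustration. It draws many samples from a clearly skewed population and looks at the distribution of their means.

```python
import random
import statistics


def simulate_sample_means(n, n_samples, seed=0):
    """Draw n_samples samples of size n and return the list of sample means.

    The population is an illustrative assumption: an exponential
    distribution with rate 1, so population mean = 1 and population SD = 1.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(n_samples)
    ]


means = simulate_sample_means(n=30, n_samples=5000)
# Mean of the sample means: should be close to the population mean (1).
print(round(statistics.fmean(means), 2))
# SD of the sample means: should be close to sigma / sqrt(n) = 1 / sqrt(30).
print(round(statistics.stdev(means), 2))
```

Even though the population is strongly skewed, the sample means cluster around the population mean with a spread near σ/√n, which is what parametric tests rely on.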
Experimental Designs

Design                   Conditions        Sample means
One Sample Design        Task A            mean(A)
WITHIN-subject Design    Task A, Task B    mean(A), mean(B)
BETWEEN-subject Design   Task A, Task B    mean(A), mean(B)

Are the sample means different?


Parametric Tests

$z = \frac{\bar{X} - \mu}{\sigma}$

Design                   Conditions        Sample means        Test                      Test statistic
One Sample Design        Task A            mean(A)             One-sample z or t-test    $t = \frac{\bar{X} - \mu}{SE}$
WITHIN-subject Design    Task A, Task B    mean(A), mean(B)    Paired-sample t-test      $t = \frac{\bar{X}_d}{SE}$
BETWEEN-subject Design   Task A, Task B    mean(A), mean(B)    Two-sample t-test         $t = \frac{\bar{X}_1 - \bar{X}_2}{SE}$
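As a rough sketch of how the three t statistics could be computed, the functions below use the standard textbook standard errors, which the slides only abbreviate as SE. The function names, the direction of the paired difference (second minus first) and the pooled-variance SE for the two-sample case are assumptions, not taken from the lecture.

```python
import math
import statistics


def one_sample_t(scores, mu):
    """t = (sample mean - mu) / SE, assuming SE = s / sqrt(n)."""
    se = statistics.stdev(scores) / math.sqrt(len(scores))
    return (statistics.fmean(scores) - mu) / se


def paired_t(first, second):
    """Paired t: a one-sample t on the difference scores, tested against 0."""
    diffs = [b - a for a, b in zip(first, second)]
    return one_sample_t(diffs, 0.0)


def two_sample_t(group1, group2):
    """Two-sample t with a pooled-variance SE (assumes equal variances)."""
    n1, n2 = len(group1), len(group2)
    pooled_var = (
        (n1 - 1) * statistics.variance(group1)
        + (n2 - 1) * statistics.variance(group2)
    ) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.fmean(group1) - statistics.fmean(group2)) / se
```

The pooled SE matches the equal-variance assumption of the two-sample t-test; when that assumption fails, a different SE (as in Welch's t-test) would be needed.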
Assumptions of the Student’s t-test

One-sample t-tests
• Sampling distribution is normal

Paired t-tests
• Sampling distribution of differences is normal

Two-sample t-tests
• Sampling distributions are normal
• Both groups have equal variance (if not, use Welch’s t-test; not tested on exam)
• Scores are independent
Now let’s look at our sample
data…
Distribution of Sample Data

Skewness of sample data
[Figure: negatively skewed, normally distributed and positively skewed distributions]

Mode of sample data
[Figure: normally distributed data]
Tests for Normality

These are essentially “goodness of fit” tests, i.e. is your data “normal enough”?

The most popular tests are:
• Kolmogorov-Smirnov test
• Anderson-Darling test
• Shapiro-Wilk test

Not tested on exam.
Violation of Assumptions

If the assumptions of normality do not hold, e.g. because:
1. The sample size is too small (n < 30), and/or
2. The sample statistics are unreliable,
we risk inaccurate test statistics and p-values.

Some possible solutions:
1. Collect more data
2. Transform the data
3. Use non-parametric tests



Non-parametric Tests
What are Non-Parametric Tests?

Statistical tests that do not rely on parameter estimation or assumptions concerning the population/sampling distribution.

Can be used to test hypotheses about central tendencies when data are measured using interval, ratio, or ordinal scales.
When to Use Non-Parametric Tests?

1. When the outcome has clear limits of detection (continuous data: interval or ratio)
2. When there are definite outliers (continuous data: interval or ratio)
3. When the outcome is an ordinal variable or a rank


General Procedure

The outcome variable (ordinal, interval or continuous) is ordered (e.g. lowest to highest).

The analysis focuses on the ranks instead of the measured or raw values.

Non-parametric tests of central tendency are sometimes called “rank tests”.
Principles of Ranking

Order the scores in the data set:

Scores 6 5 8 6 3 7

If data are tied (i.e., occupy the same position), then give these data points the same rank (usually the mean of the ranks they occupy).
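The ranking rule can be sketched in a few lines. The helper name `rank_with_ties` is hypothetical; it ranks from lowest (rank 1) to highest and gives tied scores the mean of the ranks they occupy, applied here to the six scores above.

```python
def rank_with_ties(scores):
    """Rank scores from lowest (rank 1) to highest; ties share their mean rank."""
    ordered = sorted(scores)
    mean_rank = {}
    for value in set(ordered):
        first = ordered.index(value) + 1          # rank of the first occurrence
        last = first + ordered.count(value) - 1   # rank of the last occurrence
        mean_rank[value] = (first + last) / 2
    return [mean_rank[s] for s in scores]


print(rank_with_ties([6, 5, 8, 6, 3, 7]))  # [3.5, 2.0, 6.0, 3.5, 1.0, 5.0]
```

The two scores of 6 occupy positions 3 and 4 in the ordered list, so both receive rank 3.5.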
Hypothesis Testing

Design                   Conditions        Parametric test           Non-parametric test
One Sample Design        Task A            One-sample z or t-test    Wilcoxon Signed Rank Test
WITHIN-subject Design    Task A, Task B    Paired-sample t-test      Wilcoxon Signed Rank Test
BETWEEN-subject Design   Task A, Task B    Two-sample t-test         Mann-Whitney U-Test (Wilcoxon Rank-Sum Test)
Wilcoxon Signed Rank Test

• The dependent variable is the difference in scores:
  - Between sample data and population median
  - Between sample data measurement 1 and measurement 2

• If there is no difference, then the sum of the positive differences will roughly equal the sum of the negative differences (in absolute size).
• If we expect a significant difference, then most of the differences should be in the same direction.
Step-by-Step Guide

1. Calculate the difference scores.
2. Rank the scores:
   i. Ignore any differences that are 0
   ii. Take the absolute scores and rank them
   iii. Assign the sign (+ or –) of each difference to its rank
3. Sum the positive ranks (T+).
4. Sum the negative ranks (T-).
5. The smaller of the two sums (in absolute value) is called Wilcoxon’s T statistic.
6. Find the critical value of T.
7. If T < Tcrit, we reject the null hypothesis.
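Steps 1 to 5 can be sketched as follows. This is a minimal illustration rather than a reference implementation: the names `rank_with_ties` and `wilcoxon_t` are hypothetical, and the critical value (steps 6 and 7) still has to be looked up in a table. The data are the twelve error scores from the attention-test example in this lecture, compared against a population median of 15.

```python
def rank_with_ties(values):
    """Rank values from lowest (rank 1) to highest; ties share their mean rank."""
    ordered = sorted(values)
    mean_rank = {}
    for value in set(ordered):
        first = ordered.index(value) + 1
        last = first + ordered.count(value) - 1
        mean_rank[value] = (first + last) / 2
    return [mean_rank[v] for v in values]


def wilcoxon_t(differences):
    """Return (T+, T-, T, n) for a list of difference scores."""
    nonzero = [d for d in differences if d != 0]          # step 2i: drop zeros
    ranks = rank_with_ties([abs(d) for d in nonzero])     # step 2ii: rank |d|
    t_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)   # step 3
    t_minus = sum(r for d, r in zip(nonzero, ranks) if d < 0)  # step 4
    return t_plus, t_minus, min(t_plus, t_minus), len(nonzero)  # step 5


errors = [12, 17, 9, 3, 16, 10, 28, 14, 5, 19, 20, 8]
print(wilcoxon_t([e - 15 for e in errors]))  # (28.0, 50.0, 28.0, 12)
```

For paired data, the same function can be called on the list of differences between measurement 1 and measurement 2.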


Wilcoxon Signed Rank Test [example]

Twelve adults completed a test of attention. The number of errors they made is recorded below. The median number of errors made by adults is known to be 15. Does this sample of adults significantly differ from the population?

Subject         1    2    3    4    5    6    7    8    9   10   11   12
Errors         12   17    9    3   16   10   28   14    5   19   20    8
Wilcoxon Signed Rank Test [example]

Subject         1    2    3    4    5    6    7    8    9   10   11   12
Errors         12   17    9    3   16   10   28   14    5   19   20    8
Difference     -3    2   -6  -12    1   -5   13   -1  -10    4    5   -7
Abs. diff.      3    2    6   12    1    5   13    1   10    4    5    7
Rank            4    3    8   11    1    6   12    2   10    5    7    9
Tied rank       4    3    8   11  1.5  6.5   12  1.5   10    5  6.5    9
Signed rank    -4    3   -8  -11  1.5 -6.5   12 -1.5  -10    5  6.5   -9

Difference = Errors − 15 (the population median).
Pos. ranks (subjects 2, 5, 7, 10, 11): 3, 1.5, 12, 5, 6.5
Neg. ranks (subjects 1, 3, 4, 6, 8, 9, 12): 4, 8, 11, 6.5, 1.5, 10, 9
Sum of positive ranks (T+) = 28
Sum of negative ranks (T-) = 50
Wilcoxon’s T = 28
n = 12 (i.e. the number of non-zero difference scores)
Wilcoxon Signed Rank Test [example]

In Statistics Examination handout:

15. Wilcoxon's Signed-Rank T statistic


Two-tailed critical value for Wilcoxon’s T = 14
Since T = 28 is greater than Tcrit = 14, we fail to reject the null hypothesis.

There is no significant difference between the number of errors made by our sample of adults and the general population.
Wilcoxon Signed Rank Test [example]

A researcher investigated the effect of exposure to human male/female experimenters on stress levels (as indexed by raised plasma corticosterone levels in ng/ml) in 10 lab mice. Plasma corticosterone levels were measured after 15 minutes in the presence of a female experimenter and then again after 15 minutes in the presence of a male experimenter.

Mouse            1     2     3     4     5     6     7     8     9    10
Female         120   384   139   142   378   164   169   351   178   190
Male           410   134   379   372   148   369   368   175   351   190
Wilcoxon Signed Rank Test [example]

Mouse            1     2     3     4     5     6     7     8     9    10
Female         120   384   139   142   378   164   169   351   178   190
Male           410   134   379   372   148   369   368   175   351   190
Difference     290  -250   240   230  -230   205   199  -176   173     0
Abs. diff.     290   250   240   230   230   205   199   176   173     0
Rank             9     8     7     5     6     4     3     2     1     -
Tied rank        9     8     7   5.5   5.5     4     3     2     1     -
Signed rank      9    -8     7   5.5  -5.5     4     3    -2     1     -

Difference = Male − Female.
Pos. ranks (mice 1, 3, 4, 6, 7, 9): 9, 7, 5.5, 4, 3, 1
Neg. ranks (mice 2, 5, 8): 8, 5.5, 2
Sum of positive ranks (T+) = 29.5
Sum of negative ranks (T-) = 15.5
Wilcoxon’s T = 15.5
n = 9 (the zero difference for mouse 10 is ignored)
Wilcoxon Signed Rank Test [example]

In Statistics Examination handout:

15. Wilcoxon's Signed-Rank T statistic


Two-tailed critical value for Wilcoxon’s T = 6
Since T = 15.5 is greater than Tcrit = 6, we fail to reject the null hypothesis.

There is no significant difference between corticosterone levels in mice after being exposed to female vs male experimenters.
Mann-Whitney U Test

• Null hypothesis: both samples are drawn from the same population.

• If we rank all the data across the two groups, we expect a similar distribution of ranks in the two groups.

• If the sum of ranks differs significantly between the two groups, then we reject the null hypothesis.
Step-by-Step Guide

1. If n1 ≠ n2, call the smaller group ‘group 1’ and the larger group ‘group 2’.
2. Rank all the scores (regardless of group).
3. Calculate the sum of ranks for each group (R1 and R2).
4. The Mann-Whitney U statistic is the smaller of the two values U1 and U2, one calculated for each group from its rank sum.
5. Find the critical value for U.
6. If U < Ucrit, we reject the null hypothesis.
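Steps 2 to 4 can be sketched as below. The formula for U is not legible in the slide text, so this sketch assumes the standard textbook definition, U1 = n1·n2 + n1(n1 + 1)/2 − R1 and U2 = n1·n2 + n2(n2 + 1)/2 − R2, with U the smaller of the two. The function names are hypothetical, and the data are the pain ratings from the paracetamol/ibuprofen example in this lecture.

```python
def rank_with_ties(values):
    """Rank values from lowest (rank 1) to highest; ties share their mean rank."""
    ordered = sorted(values)
    mean_rank = {}
    for value in set(ordered):
        first = ordered.index(value) + 1
        last = first + ordered.count(value) - 1
        mean_rank[value] = (first + last) / 2
    return [mean_rank[v] for v in values]


def mann_whitney_u(group1, group2):
    """Return (R1, R2, U) using the assumed standard definition of U."""
    n1, n2 = len(group1), len(group2)
    ranks = rank_with_ties(list(group1) + list(group2))   # step 2: rank all scores
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])             # step 3: rank sums
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return r1, r2, min(u1, u2)                            # step 4: smaller value


paracetamol = [8, 6, 10, 5, 7, 10]
ibuprofen = [4, 5, 5, 6, 3, 2]
print(mann_whitney_u(paracetamol, ibuprofen))  # (54.5, 23.5, 2.5)
```

The rank sums reproduce the values in the worked example; the resulting U would then be compared with the tabulated critical value.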


Mann-Whitney U Test [example]

In a study to test different methods of pain relief, two groups of 6 adults with rheumatoid arthritis were asked to rate their pain (on an ordinal scale) after 1 week of treatment with paracetamol or ibuprofen. The adults’ pain ratings are recorded below:

Subject          1    2    3    4    5    6
Paracetamol      8    6   10    5    7   10
Ibuprofen        4    5    5    6    3    2
Mann-Whitney U Test [example]

Paracetamol
Score            8    6   10    5    7   10
Rank            10    7   11    4    9   12
Tied rank       10  7.5 11.5    5    9 11.5
Rank sum      54.5

Ibuprofen
Score            4    5    5    6    3    2
Rank             3    5    6    8    2    1
Tied rank        3    5    5  7.5    2    1
Rank sum      23.5
Mann-Whitney U Test [example]

In Statistics Examination handout:

14. Mann-Whitney U statistic


Since U < Ucrit, we reject the null hypothesis.

Adults treated with Ibuprofen reported significantly less pain than those treated with Paracetamol.
Mann-Whitney U Test [example]

A lecturer wanted to evaluate student satisfaction with the lecture schedule. A group of 20 students were randomly assigned to an early (9am) or late (11am) lecture series covering the same content. Their satisfaction (rated on an ordinal scale) is reported below:

Satisfaction Rating
Early            1    3    2    4    2    1    3    5    1    2
Late             3    4    4    5    2    6    3    6    3    3
Mann-Whitney U Test [example]

Early
Score            1    3    2    4    2    1    3    5    1    2
Rank             1    8    4   14    5    2    9   17    3    6
Tied rank        2 10.5  5.5   15  5.5    2 10.5 17.5    2  5.5
Rank sum        76

Late
Score            3    4    4    5    2    6    3    6    3    3
Rank            10   15   16   18    7   19   11   20   12   13
Tied rank     10.5   15   15 17.5  5.5 19.5 10.5 19.5 10.5 10.5
Rank sum       134
Mann-Whitney U Test [example]

In Statistics Examination handout:

14. Mann-Whitney U statistic


Since U < Ucrit, we reject the null hypothesis.

Students assigned to the late lecture gave significantly higher satisfaction ratings.
Parametric vs Non-Parametric Tests
Considerations

Data must meet a number of assumptions for parametric tests to be accurate.

Non-parametric tests can be used when:
- Data are measured on ordinal scales
- Data are measured on interval or ratio scales but seriously violate parametric assumptions
Fun Fact

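Wilcoxon Signed-Rank Test:
When n > 25, the sampling distribution of T is approximately normal, so the standardised statistic can be evaluated against the Standard Normal Distribution (i.e. z tables).

Mann-Whitney U Test:
When n2 > 20, the sampling distribution of U is approximately normal, so the standardised statistic can be evaluated against the Standard Normal Distribution (i.e. z tables).

Not tested on exam.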
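A sketch of the large-sample approximations is given below. The means and standard deviations are the standard textbook ones (for T: mean n(n + 1)/4 and SD √(n(n + 1)(2n + 1)/24); for U: mean n1·n2/2 and SD √(n1·n2(n1 + n2 + 1)/12)); they are not stated in the lecture, and no correction for ties or continuity is applied. The function names and the input values are hypothetical.

```python
import math


def wilcoxon_z(t, n):
    """z score for Wilcoxon's T with n non-zero differences (no tie correction)."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t - mean) / sd


def mann_whitney_z(u, n1, n2):
    """z score for the Mann-Whitney U statistic (no tie correction)."""
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean) / sd


# Hypothetical values, chosen only so that n > 25 and n2 > 20.
print(round(wilcoxon_z(t=120, n=30), 2))
print(round(mann_whitney_z(u=200, n1=25, n2=25), 2))
```

Because T and U are defined as the smaller of two values, the resulting z is negative or zero, and its absolute value is what gets compared with the z table.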
Non-Parametric Tests

Advantages
1. No restrictive assumptions
2. Not overly influenced by extreme scores
3. Can be used to analyse ordinal data

Disadvantages
1. Limited range and flexibility of tests
2. Lose information about the magnitude of differences
3. Less power to detect genuine effects
Summary

1. What are non-parametric tests?
Tests that are not reliant on assumptions about underlying distributions.

2. How to run non-parametric tests:
Wilcoxon’s Signed-Ranks Test (one-sample or paired-sample data)
Mann-Whitney U Test (two-sample data)

3. Pros and cons of using non-parametric tests
Fewer assumptions, but less power (if assumptions of normality are met)
For more information…

Howell, D.C. (2013). Statistical Methods for Psychology (8th edition). Chapter 18.
Acknowledgements

Thanks to Dr Andrea Greve for an earlier version of this lecture


Next Lecture

Analysing categorical data!
