Example Mann Whitney U Test
Consider a Phase II clinical trial designed to investigate the effectiveness of a new drug to reduce
symptoms of asthma in children. A total of n=10 participants are randomized to receive either the new
drug or a placebo. Participants are asked to record the number of episodes of shortness of breath over a
1 week period following receipt of the assigned treatment. The data are shown below.
Placebo 7 5 6 4 12
New Drug 3 6 4 2 1
Is there a difference in the number of episodes of shortness of breath over a 1 week period in participants
receiving the new drug as compared to those receiving the placebo? By inspection, it appears that
participants receiving the placebo have more episodes of shortness of breath, but is this statistically
significant?
The sample size is small (n1=n2=5), so a nonparametric test is appropriate. The hypotheses are given
below, and we run the test at the 5% level of significance (i.e., α=0.05).
H0: The two populations are equal versus
H1: The two populations are not equal.
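The test statistic for the Mann Whitney U test is U = min(U1, U2), with
U1 = n1n2 + n1(n1+1)/2 - R1 and U2 = n1n2 + n2(n2+1)/2 - R2,
where R1 = sum of the ranks for group 1 and R2 = sum of the ranks for group 2. To assign the ranks, we
pool the two samples, order the observations from smallest to largest, and give tied values the mean of
the ranks they occupy.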
Observed data          Pooled sample, ordered      Ranks
Placebo   New Drug     Placebo   New Drug          Placebo   New Drug
7         3                      1                           1
5         6                      2                           2
6         4                      3                           3
4         2            4         4                 4.5       4.5
12        1            5                           6
                       6         6                 7.5       7.5
                       7                           9
                       12                          10
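The ranking step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name midranks is hypothetical, and tied values are given the mean of the ranks they span, as in the table above.

```python
# Data from the example.
placebo = [7, 5, 6, 4, 12]
new_drug = [3, 6, 4, 2, 1]

def midranks(values):
    """Map each distinct value to its rank, averaging ranks over ties.

    Hypothetical helper for illustration; not part of any library.
    """
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Advance j to the last position holding the same value as position i.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        # 0-based positions i..j correspond to ranks i+1..j+1; take their mean.
        ranks[ordered[i]] = (i + 1 + j + 1) / 2
        i = j + 1
    return ranks

# Rank the pooled sample, then look up each observation's rank by group.
rank_of = midranks(placebo + new_drug)
placebo_ranks = [rank_of[v] for v in placebo]
new_drug_ranks = [rank_of[v] for v in new_drug]

print(placebo_ranks)   # [9.0, 6.0, 7.5, 4.5, 10.0]
print(new_drug_ranks)  # [3.0, 7.5, 4.5, 2.0, 1.0]
```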
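First, we sum the ranks in each group. In the placebo group, the sum of the ranks is 37; in the new drug
group, the sum of the ranks is 18. Recall that the sum of the ranks will always equal n(n+1)/2. As a check
on our assignment of ranks, we have n(n+1)/2 = 10(11)/2 = 55, which is equal to 37+18 = 55.
Taking the placebo group as group 1 (R1 = 37) and the new drug group as group 2 (R2 = 18), we compute
U1 = n1n2 + n1(n1+1)/2 - R1 = 5(5) + 5(6)/2 - 37 = 3 and
U2 = n1n2 + n2(n2+1)/2 - R2 = 5(5) + 5(6)/2 - 18 = 22.
The test statistic is the smaller of the two, U = 3.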
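The arithmetic can be reproduced with a short Python sketch. The ranks are typed in directly from the example, the placebo group is taken as group 1 (an assumption about labeling), and U is computed as the smaller of U1 = n1n2 + n1(n1+1)/2 - R1 and U2 = n1n2 + n2(n2+1)/2 - R2.

```python
# Ranks from the example (ties share the mean rank).
placebo_ranks = [9, 6, 7.5, 4.5, 10]   # group 1
new_drug_ranks = [3, 7.5, 4.5, 2, 1]   # group 2

n1, n2 = len(placebo_ranks), len(new_drug_ranks)
n = n1 + n2

# Rank sums for each group.
r1 = sum(placebo_ranks)
r2 = sum(new_drug_ranks)

# Check: the rank sums must add up to n(n+1)/2.
assert r1 + r2 == n * (n + 1) / 2

# U statistics for each group; the test statistic is the smaller one.
u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
u = min(u1, u2)

print(r1, r2)      # 37.0 18.0
print(u1, u2, u)   # 3.0 22.0 3.0
```

Note that U1 + U2 always equals n1n2 (here 25), which gives a second check on the calculation.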
To determine the appropriate critical value we need the sample sizes (here, n1=n2=5) and our two-sided
level of significance (α=0.05). For this example the critical value is 2, and the decision rule is to reject
H0 if U ≤ 2. We do not reject H0 because U = 3 > 2. We do not have statistically significant evidence at
α=0.05 to show that the two populations of numbers of episodes of shortness of breath are not equal.
However, in this example, the failure to reach statistical significance may be due to low power. The
sample data suggest a difference, but the sample sizes are too small to conclude that there is a
statistically significant difference.
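The critical value of 2 can be checked by brute force. The sketch below enumerates every way of assigning 5 of the ranks 1 to 10 to group 1, which are equally likely under H0, and finds the largest c with P(U ≤ c) ≤ 0.05. It assumes untied ranks, as published tables typically do; the example data contain ties, so the exact null distribution for these data would differ slightly.

```python
from itertools import combinations

n1 = n2 = 5
alpha = 0.05
ranks = range(1, n1 + n2 + 1)  # assumption: untied ranks 1..10

# Under H0, each choice of n1 ranks for group 1 is equally likely.
u_values = []
for group1 in combinations(ranks, n1):
    r1 = sum(group1)
    u1 = n1 * n2 + n1 * (n1 + 1) // 2 - r1
    u2 = n1 * n2 - u1              # because U1 + U2 = n1 * n2
    u_values.append(min(u1, u2))

total = len(u_values)  # C(10, 5) = 252 assignments

def two_sided_p(c):
    """P(U <= c) under H0, where U = min(U1, U2)."""
    return sum(u <= c for u in u_values) / total

# Largest c whose two-sided tail probability does not exceed alpha.
critical = max(c for c in range(n1 * n2 + 1) if two_sided_p(c) <= alpha)

print(critical)                  # 2
print(round(two_sided_p(2), 4))  # 0.0317
print(round(two_sided_p(3), 4))  # 0.0556
```

Under this assumption, the observed U = 3 has a two-sided tail probability of about 0.056, just above 0.05, which is consistent with the decision not to reject H0.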