Nonparametric Tests in R

The document discusses nonparametric tests that can be performed in R including the sign test, Wilcoxon signed-rank test, Mann-Whitney-Wilcoxon test, and Kruskal-Wallis test. Examples are provided to demonstrate how to apply each test in R and interpret the results. The sign test is used to compare two products' popularity, the Wilcoxon signed-rank test compares barley yields between years, the Mann-Whitney-Wilcoxon test analyzes rainfall data between stations, and the Kruskal-Wallis test examines monthly ozone levels.


NONPARAMETRIC TESTS IN R

B N Mandal
I.A.S.R.I., Library Avenue, New Delhi – 110 012
bnmandal@iasri.res.in

Introduction
Nonparametric or distribution free tests are so-called because the assumptions underlying their
use are “fewer and weaker than those associated with parametric tests” (Siegel & Castellan,
1988, p. 34). To put it another way, nonparametric tests require fewer assumptions about the
shapes of the underlying population distributions. For this reason, they are often used in place of
parametric tests when one feels that the assumptions of the parametric test have been too grossly
violated (e.g., if the distributions are too severely skewed). The purpose of this note is to demonstrate
how R can be used to perform nonparametric tests.

Sign Test
The sign test is one of the simplest nonparametric tests. It is for use with 2 repeated (or
correlated) measures (see the example below), and measurement is assumed to be at least
ordinal. For each subject, subtract the 2nd score from the 1st, and write down the sign of the
difference. (That is, write “-” if the difference score is negative, and “+” if it is positive.) The
usual null hypothesis for this test is that there is no difference between the two treatments. If this
is so, then the number of + signs (or - signs, for that matter) should have a binomial distribution
with p = .5 and N = the number of subjects. In other words, the sign test is just a binomial test
with + and - in place of Head and Tail (or Success and Failure); i.e., a sign test is used to decide
whether a binomial distribution has an equal chance of success and failure.
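
The procedure just described can be sketched in R as follows. The scores below are made up purely for illustration, and the variable names (first, second, n_plus) are not from the original note. Dropping zero differences before counting is a common convention that is assumed here, not something the note prescribes.

```r
# Hypothetical paired scores for 8 subjects (illustration only)
first  <- c(7, 5, 8, 6, 9, 4, 7, 6)
second <- c(5, 6, 6, 4, 9, 3, 5, 7)

# Subtract the 2nd score from the 1st
d <- first - second

# Assumed convention: a zero difference has no sign, so it is dropped
d <- d[d != 0]

n_plus <- sum(d > 0)   # number of "+" signs
n      <- length(d)    # number of subjects that contribute a sign

# The sign test is a binomial test of p = 0.5 on the "+" count
res <- binom.test(n_plus, n, p = 0.5)
print(res$p.value)
```

A p-value above the chosen significance level would mean that the null hypothesis of no difference between the two treatments is not rejected.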

Example
A food product company has invented a new product, and would like to find out if it will be as
popular as the existing favorite product. For this purpose, its research department arranges 18
participants for taste testing. Each participant tries both products in random order before giving
his or her opinion. It turns out that 5 of the participants like the new product better, and the rest
prefer the old one. At .05 significance level, can we reject the notion that the two products are
equally popular?

The null hypothesis is that the products are equally popular. Here we apply the binom.test
function, which tests a success probability of 0.5 by default. As the p-value turns out to be
0.09625, which is greater than the .05 significance level, we do not reject the null hypothesis.

> binom.test(5, 18)

Exact binomial test

data: 5 and 18
number of successes = 5, number of trials = 18,
p-value = 0.09625
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.09695 0.53480
sample estimates:
probability of success
0.27778

At .05 significance level, we do not reject the notion that the two products are equally popular.
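
The p-value in this example can also be reproduced directly from the binomial distribution. The sketch below assumes only the counts given in the example (5 successes out of 18 trials); the names p_manual and p_test are illustrative. Because the null distribution with p = 0.5 is symmetric, the two-sided p-value is twice the lower-tail probability.

```r
n <- 18   # number of participants
x <- 5    # participants who prefer the new product

# Two-sided p-value: P(X <= 5) + P(X >= 13) = 2 * P(X <= 5) when p = 0.5
p_manual <- 2 * pbinom(x, size = n, prob = 0.5)

# Same quantity as computed by binom.test
p_test <- binom.test(x, n)$p.value

print(c(manual = p_manual, binom.test = p_test))
```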

Wilcoxon Signed-Rank Test


One drawback of the sign test is that it discards a lot of information about the data. It takes into
account the direction of the difference, but not the magnitude of the difference between each pair
of scores. The Wilcoxon signed-ranks test is another nonparametric test that can be used for 2
repeated (or correlated) measures when measurement is at least ordinal. But unlike the sign test,
it does take into account (to some degree, at least) the magnitude of the difference. Two data
samples are matched if they come from repeated observations of the same subject. Using the
Wilcoxon Signed-Rank Test, we can decide whether the corresponding data population
distributions are identical without assuming them to follow the normal distribution.

Example
Barley yields in 1931 (Y1) and 1932 (Y2) from the same fields are recorded for different varieties (Var) at several locations (Loc).
Loc Var Y1 Y2
UF M 81 80.7
UF S 105.4 82.3
UF V 119.7 80.4
UF T 109.7 87.2
UF P 98.3 84.2
W M 146.6 100.4
W S 142 115.5
W V 150.7 112.2
W T 191.5 147.7
W P 145.7 108.1
M M 82.3 103.1
M S 77.3 105.1
M V 78.4 116.5
M T 131.3 139.9
M P 89.6 129.6
C M 119.8 98.9
C S 121.4 61.9
C V 124 96.2
C T 140.8 125.5
C P 124.8 75.7
GR M 98.9 66.4
GR S 89 49.9
GR V 69.1 96.7
GR T 89.3 61.9
GR P 104.1 80.3
D M 86.9 67.7
D S 77.1 66.7
D V 78.9 67.4
D T 101.8 91.8
D P 96 94.1

Without assuming the data to have normal distribution, test at .05 significance level if the barley
yields of 1931 and 1932 have identical distributions.

The null hypothesis is that the barley yields of the two years come from identical populations. To
test the hypothesis, we apply the wilcox.test function to compare the matched samples. For the
paired test, we set the "paired" argument to TRUE. As the p-value turns out to be 0.005318, which
is less than the .05 significance level, we reject the null hypothesis.

> barley=read.csv(file.choose())

> attach(barley)

> wilcox.test(Y1,Y2,paired=TRUE)

Wilcoxon signed rank test with continuity correction

data: Y1 and Y2
V = 368.5, p-value = 0.005318
alternative hypothesis: true location shift is not equal to 0

Warning message:
In wilcox.test.default(Y1, Y2, paired = TRUE) :
cannot compute exact p-value with ties

Mann-Whitney-Wilcoxon Test
Two data samples are independent if they come from distinct populations and the samples do not
affect each other. Using the Mann-Whitney-Wilcoxon Test, we can decide whether the
population distributions are identical without assuming them to follow the normal distribution.

Example
The seasonal rainfall at two stations is given below. Without assuming the data to have a normal
distribution, test at .05 significance level whether the rainfall distributions at the two stations are the same.

Station A   Station B
1011.07     496.44
1066.82     541.76
610.8       1562.01
1111.44     2515.12
955.68      1133.99
1203.84     300.33
1600.32     482.55
555.9       503.22
1302.95     2744.23
182.34      1232.22
1233.2
1402.09
> rainfall=read.csv(file.choose())

> attach(rainfall)

The null hypothesis is that the rainfall distributions at the two stations are identical. To test
the hypothesis, we apply the wilcox.test function to compare the independent samples. As the
p-value turns out to be 0.7713, which is greater than the .05 significance level, we do not reject
the null hypothesis.

> wilcox.test(Station_A,Station_B)

Wilcoxon rank sum test

data: Station_A and Station_B

W = 65, p-value = 0.7713

At .05 significance level, we do not reject the hypothesis that the rainfall distributions at the two stations are the same.
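
The rainfall example can be rerun without a CSV file by entering the data directly. The sketch below assumes that the two unpaired values at the bottom of the table (1233.2 and 1402.09) belong to Station A; under that assumption the statistic matches the W = 65 shown in the output. The names station_a, station_b, and w_manual are illustrative. The statistic W is recomputed by hand as the number of (A, B) pairs in which the Station A value is the larger one.

```r
# Assumption: the two trailing single values in the table are Station A readings
station_a <- c(1011.07, 1066.82, 610.8, 1111.44, 955.68, 1203.84,
               1600.32, 555.9, 1302.95, 182.34, 1233.2, 1402.09)
station_b <- c(496.44, 541.76, 1562.01, 2515.12, 1133.99,
               300.33, 482.55, 503.22, 2744.23, 1232.22)

# Rank sum test for two independent samples
res <- wilcox.test(station_a, station_b)

# W by hand: count the pairs where the Station A value exceeds the Station B value
w_manual <- sum(outer(station_a, station_b, ">"))

print(c(W = unname(res$statistic), W_manual = w_manual, p = res$p.value))
```

With 12 and 10 observations there are 120 pairs in total, so a count of 65 sits close to the midpoint of 60 expected under the null hypothesis.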

Kruskal-Wallis Test
A collection of data samples are independent if they come from unrelated populations and the
samples do not affect each other. Using the Kruskal-Wallis Test, we can decide whether the
population distributions are identical without assuming them to follow the normal distribution.

In the built-in data set named airquality, the daily air quality measurements in New York, May to
September 1973, are recorded. The ozone density is presented in the data frame column Ozone.

> head(airquality)

  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
6    28      NA 14.9   66     5   6

Without assuming the data to have normal distribution, test at .05 significance level if the
monthly ozone density in New York has identical data distributions from May to September
1973.

The null hypothesis is that the monthly ozone density has the same distribution from May to
September. To test the hypothesis, we apply the kruskal.test function to compare the independent
monthly data. The p-value turns out to be nearly zero (6.901e-06). Hence we reject the null hypothesis.

> kruskal.test(Ozone ~ Month, data = airquality)

Kruskal-Wallis rank sum test

data: Ozone by Month


Kruskal-Wallis chi-squared = 29.267, df = 4, p-value = 6.901e-06

At .05 significance level, we conclude that the monthly ozone densities in New York from May to
September 1973 come from nonidentical populations.
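
To see which months drive the result, it can help to look at the group sizes and medians alongside the test. The sketch below is one way to do that with base R; the names ozone_by_month, n_obs, and med are illustrative. Note that the Ozone column contains missing values (NA), which the formula interface of kruskal.test omits under R's default NA handling, so the counts below refer to non-missing observations only.

```r
# Split the ozone readings by month (5 = May, ..., 9 = September)
ozone_by_month <- split(airquality$Ozone, airquality$Month)

# Non-missing observations and median ozone per month
n_obs <- sapply(ozone_by_month, function(x) sum(!is.na(x)))
med   <- sapply(ozone_by_month, median, na.rm = TRUE)
print(rbind(n = n_obs, median = med))

# Same test as in the text; rows with missing Ozone are dropped
res <- kruskal.test(Ozone ~ Month, data = airquality)
print(res)
```

The Kruskal-Wallis test only indicates that at least one month differs; the medians give a first impression of where the difference lies.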

References

R Development Core Team (2011). R: A language and environment for statistical computing. R
Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL
http://www.R-project.org/.

Siegel, S., & Castellan, N.J. (1988). Nonparametric statistics for the behavioral sciences (2nd
Ed.). New York, NY: McGraw-Hill.
