BES - R Lab 7
Non-parametric Tests 2
1. Objectives
- Explain the R procedures for checking the assumptions of the Sign test and the Wilcoxon
signed-rank sum test, and for conducting both tests.
- Understand and interpret the R output.
2. Procedures
The Wilcoxon signed-rank sum test is a non-parametric approach used to compare paired data when
the paired differences are not normally distributed. For the Wilcoxon signed-rank sum test, use the same
code as for the Mann-Whitney-Wilcoxon (MWW) test, but add the argument paired = TRUE:
> wilcox.test(x, y, alternative = "two.sided", paired = TRUE, exact = NULL,
  correct = TRUE) # data are saved in two different numeric vectors
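> wilcox.test(outcome ~ grouping variable, data = name of data frame,
  alternative = "two.sided", paired = TRUE) # data are saved in a data frame
  # note: recent R versions no longer accept paired = TRUE in the formula method;
  # if R reports an error here, use the two-vector form above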
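With paired = TRUE, wilcox.test() works on the differences x - y, so the paired test should give the same result as a one-sample signed-rank test on those differences. The sketch below checks this using R's built-in sleep data set as a stand-in for the lab data (10 subjects, each measured under two drugs); the object names drug1, drug2, paired_test and diff_test are illustrative only.

```r
# Built-in 'sleep' data: extra hours of sleep for 10 subjects under two drugs.
# Rows for group 1 and group 2 are both ordered by subject ID, so they pair up.
drug1 <- sleep$extra[sleep$group == "1"]
drug2 <- sleep$extra[sleep$group == "2"]

# The differences contain ties and one zero, so R cannot compute an exact
# p-value and issues warnings; suppressWarnings() keeps the output clean.
paired_test <- suppressWarnings(
  wilcox.test(drug1, drug2, alternative = "two.sided", paired = TRUE,
              exact = NULL, correct = TRUE)
)

# The same test written as a one-sample signed-rank test on the differences
diff_test <- suppressWarnings(
  wilcox.test(drug1 - drug2, alternative = "two.sided", mu = 0,
              exact = NULL, correct = TRUE)
)

paired_test$statistic  # V = sum of the ranks of the positive differences
paired_test$p.value
diff_test$p.value      # same p-value as the paired version
```

Here every non-zero difference is negative, so V equals 0.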
For the Sign test, install the PASWR package, which provides the function SIGN.test():
> install.packages("PASWR") # only needed once
> library(PASWR)
> SIGN.test(x, y, alternative = "two.sided")
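The Sign test uses only the signs of the paired differences: zero differences are dropped, and the number of positive differences is compared with a Binomial(n, 0.5) distribution, where n is the number of non-zero differences. As a base-R cross-check (a swapped-in alternative to SIGN.test(), not the PASWR implementation itself), the idea can be sketched with binom.test(), again using the built-in sleep data for illustration; the names d, n_pos and sign_res are hypothetical.

```r
# Built-in 'sleep' data, split into the two paired measurements per subject
drug1 <- sleep$extra[sleep$group == "1"]
drug2 <- sleep$extra[sleep$group == "2"]

d <- drug1 - drug2   # paired differences
d <- d[d != 0]       # the Sign test discards zero differences
n_pos <- sum(d > 0)  # number of "+" signs
n <- length(d)       # number of non-zero differences

# Under H0 (median difference = 0), n_pos follows a Binomial(n, 0.5) distribution
sign_res <- binom.test(n_pos, n, p = 0.5, alternative = "two.sided")
c(n = n, positives = n_pos, p.value = sign_res$p.value)
```

For these data, 9 of the 10 differences are non-zero and none is positive, so the two-sided p-value is 2 × 0.5^9.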
Don’t forget to check the assumptions before deciding which test to use: we prefer the Sign test for ranked
(ordinal) data, and the Wilcoxon signed-rank sum test for quantitative data whose paired differences are not normal.
3. Exercises
Exercise 1. A test was conducted for two overnight mail delivery services. Two samples of identical
deliveries were set up so that both delivery services were notified of the need for a delivery at the same
time. The hours required to make each delivery are stored in the Overnight data file. Do the data suggest a
difference in the delivery times for the two services? Use a 0.05 level of significance for the test.
We will work with the file Overnight.csv, so first import it into R:
> overnight <- read.table("Overnight.csv", header = TRUE, sep = ",",
  stringsAsFactors = FALSE)
> head(overnight) # view the first few rows of the data frame
> str(overnight)  # view the structure of the data frame
The next step is to check the assumptions to see which test applies in this case. From the output above,
we know that the data are quantitative, and from the design we know that the two samples are matched
(identical deliveries were set up in pairs, one for each service). Let’s check the normality of the
differences with a stem-and-leaf display and a Q-Q plot.
> diff <- overnight$Service1 - overnight$Service2
> stem(diff)
> qqnorm(diff, main = "QQ plot of differences")
> qqline(diff)
The outputs are given below.
  -0 | 444421
   0 | 1111
   0 | 8
Based on the R outputs, it’s not reasonable to assume that the differences in delivery times are normally
distributed. As a result, we must use a non-parametric test instead of the parametric one (the t-test for
matched pairs). In more detail: we are comparing two populations, the samples are matched, and the data
are quantitative, but the differences between paired observations cannot be assumed to be normal, so we
apply the Wilcoxon signed-rank sum test.
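The stem-and-leaf display and the Q-Q plot are visual checks. If a numerical complement is wanted (an optional addition, not part of this lab’s procedure), base R’s shapiro.test() tests H0: the differences come from a normal distribution. With small samples this test has little power, so the plots remain the main evidence. A minimal sketch, using differences from the built-in sleep data as a stand-in for the vector diff; the names d and sw are illustrative.

```r
# Stand-in data: paired differences (drug 1 minus drug 2) from R's built-in
# 'sleep' data set; in the lab, use the vector diff computed from Overnight.csv.
d <- sleep$extra[sleep$group == "1"] - sleep$extra[sleep$group == "2"]

# Shapiro-Wilk test of H0: the differences come from a normal distribution.
# shapiro.test() requires between 3 and 5000 non-missing values.
sw <- shapiro.test(d)
sw$statistic  # W lies between 0 and 1; values near 1 are consistent with normality
sw$p.value    # a p-value below 0.05 casts doubt on normality
```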
Question 1. Set up the hypotheses for the test. What type of test is this?
Now we apply wilcox.test() to produce the R output for this problem. Because we are asked to test for a
significant difference between the two groups, we choose alternative = "two.sided"; and because the
samples are paired, we set paired = TRUE.
> ex1 <- wilcox.test(overnight$Service1, overnight$Service2,
  alternative = "two.sided", paired = TRUE, correct = TRUE)
> ex1
Wilcoxon signed rank test with continuity correction
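The object returned by wilcox.test() is a list of class htest, so its p-value can be extracted and compared with the significance level directly (reject H0 when the p-value is below α). The helper decide() below is hypothetical, not part of base R or PASWR, and the sketch uses the built-in sleep data so that it runs on its own.

```r
# Hypothetical helper: state the decision for any test object that stores
# a p-value, such as the result of wilcox.test() or binom.test()
decide <- function(test, alpha = 0.05) {
  if (test$p.value < alpha) "Reject H0" else "Do not reject H0"
}

# Stand-in paired data from R's built-in 'sleep' data set
drug1 <- sleep$extra[sleep$group == "1"]
drug2 <- sleep$extra[sleep$group == "2"]

# Ties and a zero difference trigger warnings about exact p-values
res <- suppressWarnings(
  wilcox.test(drug1, drug2, alternative = "two.sided", paired = TRUE)
)
res$statistic  # V statistic
res$p.value
decide(res, alpha = 0.05)
```

In the lab, the same call would be decide(ex1, alpha = 0.05).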
fairs in the Midwest: How often do you think people become sick because the food they consume is
prepared at outdoor fairs and festivals? The variable “sfair” contains the responses described in the
example concerning safety of food served at outdoor fairs and festivals. The variable “srest” contains
responses to the same question asked about food served in restaurants. The possible responses were:
1 = very rarely; 2 = once in a while; 3 = often; 4 = more often than not; and 5 = always. In all, 303
people answered the question. We suspect that restaurant food will appear safer than food served
outdoors at a fair. Do the data give good evidence for this suspicion? Conduct the appropriate test with
significance level 𝛼 = 0.05.
The data are stored in the file foodsafety.csv, so import it into R and check the first few subjects as follows.
  subject hfair sfair sfast srest gender
1       1     4     1     1     1      1
2       2     4     2     4     2      1
3       3     2     2     2     2      1
4       4     4     2     2     1      1
5       5     2     3     1     3      1
6       6     1     2     2     2      1
Check the assumptions and choose the appropriate technique.
Question 3. Are the two samples independent or matched? What type of data are represented? Should
we conduct the parametric test in this case?
Answering the questions above, you should find that the best choice is the Sign test from the PASWR
package. Using SIGN.test() gives the following results:
> ex2 <- SIGN.test(foodsafety$sfair, foodsafety$srest, alternative = "greater")
> ex2
Dependent-samples Sign-Test
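Because the responses are on a 1 to 5 scale, many pairs are tied, and tied pairs drop out of the Sign test. The mechanics of the one-sided test (H1: fair food is rated as causing sickness more often, so sfair - srest tends to be positive) can be sketched in base R with binom.test(). The sketch uses only the six rows displayed above, so it illustrates the calculation and is not the result for all 303 respondents; binom.test() is a swapped-in base-R equivalent, not the PASWR function.

```r
# First six rows of foodsafety.csv as displayed by head(); NOT the full data
sfair <- c(1, 2, 2, 2, 3, 2)
srest <- c(1, 2, 2, 1, 3, 2)

d <- sfair - srest
d <- d[d != 0]       # tied responses (zero differences) are discarded
n_pos <- sum(d > 0)  # pairs where fair food got the higher (less safe) rating
n <- length(d)       # number of untied pairs

# One-sided alternative: P("+") > 0.5, corresponding to alternative = "greater"
one_sided <- binom.test(n_pos, n, p = 0.5, alternative = "greater")
c(n = n, positives = n_pos, p.value = one_sided$p.value)
```

Only one of these six pairs is untied, which shows how strongly ties can reduce the effective sample size for ordinal responses.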
  -0 | 66666
  -0 | 33333322211
   0 | 000
   0 | 6
Exercise 4. A student organization surveyed both current students and recent graduates to obtain
information on the quality of teaching at a particular university. An analysis of the responses provided
the following teaching-ability rankings stored in Professors.csv. Do the rankings given by the current
students agree with the rankings given by the recent graduates? Use 𝛼 = 0.1 to draw a conclusion.
Dependent-samples Sign-Test