
4. INFERENTIAL STATISTICS: PARAMETRIC AND NON-PARAMETRIC TESTS

4.1 Introduction

The term ‘parametric tests’ will be used here to refer to statistical techniques,
such as the t-tests and the One-Way ANOVA test, that rely on a number of
assumptions about the distribution of the covariate of interest. One of these
assumptions is that the covariate observations obtained on the different
subgroups being considered come from a normally distributed population, or
that the distribution of the population can be approximated by a normal
distribution. Since there are many situations where all the necessary assumptions
cannot be met, statisticians have developed alternative tests, known as
non-parametric tests, that rely on less stringent assumptions. In particular,
non-parametric tests are used when the data entries are ranks or when the data
do not satisfy the normality condition.

Parametric                      Non-Parametric
One-Sample t-test               One-Sample Wilcoxon Signed Rank Test / Sign Test
Independent Samples t-test      Mann-Whitney U Test
One-Way ANOVA                   Kruskal-Wallis Test
Paired Samples t-test           Wilcoxon Signed Ranks Test
Repeated Measures ANOVA         Friedman Test

Table 4.1.1

When conducting such inferential statistics, the first step is to conduct
normality tests on the variable of interest.

If the variable follows an approximate normal distribution and satisfies any
other assumptions upon which the test relies, we use parametric tests; if not,
we consider the non-parametric alternatives.

Two popular tests for checking normality (via hypothesis testing) are the
Shapiro-Wilk test and the well-known Kolmogorov-Smirnov test. Both test the
following hypotheses:

H0: Variable follows a normal distribution
H1: Variable does not follow a normal distribution

or, otherwise stated:

H0: The distribution of the variable can be approximated by a normal distribution
H1: The distribution of the variable cannot be approximated by a normal distribution

The Kolmogorov-Smirnov test was designed to compare two distributions
(not necessarily involving the normal distribution). It quantifies the
discrepancy between the observed distribution (the actual distribution of the
data) and the expected distribution (the distribution against which the variable
of interest is being compared). The Shapiro-Wilk test, on the other hand, is
specifically a test for normality.

We shall see how each of these tests is conducted in SPSS and R/RStudio,
since both tests are widely used. However, if the two tests give conflicting
results, the result obtained from the Shapiro-Wilk test is considered, since it
has been found to perform much better than the Kolmogorov-Smirnov test.

Note that, to avoid unnecessary repetition, a detailed description of a test
is provided when demonstrating how the test is conducted using SPSS, but
such detail is skipped when demonstrating how this is done using R.
Readers focusing on R might therefore need to look for more detail in the
sections on SPSS. Remember that the tests are the same no matter which
software you use: interpretation does not change from one software to
another; it is the execution and presentation which change.

In SPSS

Consider once again the Employee dataset.

To conduct these normality tests, go to AnalyzeDescriptive


StatisticsExplore, to obtain:

Figure 4.1.1

On choosing the variable of interest, for example Current Salary, click on
Plots, tick Normality plots with tests, then click Continue and OK. A number
of outputs are issued, the most important for us being the table which follows:

Tests of Normality

                    Kolmogorov-Smirnov(a)         Shapiro-Wilk
                    Statistic   df    Sig.    Statistic   df    Sig.
Current Salary        .208      474   .000      .771      474   .000
a. Lilliefors Significance Correction

Table 4.1.2

Since both the Kolmogorov-Smirnov test and the Shapiro-Wilk test give a
p-value (Sig.) of .000 < 0.05, the hypothesis that Current Salary follows a
normal distribution has to be rejected. Current Salary is thus not normally
distributed.

Note that, since the Kolmogorov-Smirnov test is a non-parametric test, it may
also be obtained through Analyze → Nonparametric Tests → Legacy Dialogs →
1-Sample K-S.

Another interesting output that is obtained through the Explore procedure is
the Normal Q-Q Plot:

Figure 4.1.2

A Q-Q plot charts observed values against a known distribution, in this case
a normal distribution. The expected normal distribution is the straight line,
while the line made up of little circles is obtained from the observed values in
our data. If our data followed a normal distribution, the observations would
lie close to the straight line. Our plot therefore shows that the distribution of
our data deviates from normality almost everywhere.

Another output is that of the Detrended Normal Q-Q Plot:

Figure 4.1.3

The Detrended Normal Q-Q plot shows the differences between the observed
and the expected values of a normal distribution. If the distribution is normal,
the points should cluster in a horizontal band around zero with no pattern. Our
plot again indicates deviation from normality.

In R/RStudio

Packages used in this section:

lawstat
MBESS
PMCMR

Suppose that we have the following sample of ages:

65, 61, 63, 86, 70, 55, 74, 35, 72, 68, 45, 58.

We shall now check whether the age sample comes from a normal
distribution. We enter the above sample into a variable mydata as follows:

mydata<-c(65, 61, 63, 86, 70, 55, 74, 35, 72, 68, 45, 58)

Next, we perform the Shapiro-Wilk test by typing the command


shapiro.test(mydata). This gives the output:

Shapiro-Wilk normality test

data: mydata
W = 0.97107, p-value = 0.9216

Since the p-value is greater than all the standard levels of significance, 0.1,
0.05 and 0.01 (0.05 being the most widely used), we deduce that the
Shapiro-Wilk test does not reject normality for the age variable.

The QQ-plot provides a graphical way to check for normality. This is done as
follows:

qqnorm(mydata,plot.it=TRUE)
qqline(mydata)

The QQ-plot given is the following:


Figure 4.1.4

More detail on the Q-Q plot has been provided earlier, when discussing how
these plots are issued using SPSS. It is important that one never uses the Q-Q
plot alone to draw conclusions about normality or the lack of it; it should be
used as an indicative tool. The proper way to determine whether the variable
of interest is normally distributed is through a goodness-of-fit test such as the
Shapiro-Wilk test or the Kolmogorov-Smirnov test, which are discussed next.
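SPSS's Explore procedure also issued a detrended version of the Q-Q plot. Base R
does not produce one directly, but a rough equivalent can be sketched as follows
(this is an illustrative sketch, not part of any standard R output; it assumes the
variable mydata defined above):

qq <- qqnorm(mydata, plot.it=FALSE)        # theoretical quantiles (x) and observed values (y)
expected <- mean(mydata) + sd(mydata)*qq$x # expected values under normality
plot(qq$x, qq$y - expected, xlab="Theoretical quantiles",
     ylab="Observed - expected", main="Detrended Normal Q-Q Plot")
abline(h=0, lty=2)                         # points should cluster around this horizontal line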

As mentioned earlier, the Kolmogorov-Smirnov test is considered to be a


weaker test for normality than the Shapiro-Wilk test. If the two tests give
contradicting results, the result from the Shapiro-Wilk test should be
considered. The Kolmogorov-Smirnov test may also be used to check
whether the covariate being considered comes from a population with a
distribution other than the normal. For this purpose, we also show how to
apply the Kolmogorov-Smirnov test function. If, for example, we want to test
whether the data in the variable mydata is obtained from a t-distributed
population with 5 degrees of freedom, we type the following command:

ks.test(mydata,"pt",df=5)
where "pt" refers to the cumulative distribution of the t-distribution and
df=5 refers to the degrees of freedom parameter being tested.

To see how the command works, generate 1000 readings from a t-distribution
with 5 degrees of freedom as follows:

mydata<-rt(1000,df=5)

and then apply the ks.test command.
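Putting the two commands together, one possible run is sketched below. The call to
set.seed is added here only so that the result is reproducible, and the second
ks.test call, against a normal distribution with parameters estimated from the data,
is an illustrative extra that is not part of the text:

set.seed(123)                  # for reproducibility
mydata <- rt(1000, df=5)       # 1000 readings from a t-distribution with 5 df
ks.test(mydata, "pt", df=5)    # should not reject: the data really come from t(5)

# Comparing the same data against a normal distribution instead; note that
# estimating the parameters from the same data makes the p-value approximate
# (this is the reason SPSS applies the Lilliefors correction)
ks.test(mydata, "pnorm", mean=mean(mydata), sd=sd(mydata))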

4.2 Parametric Tests

In this section we shall start by discussing a number of t-tests. There are three
types of t-tests: One-Sample t-test, Independent-Samples t-test, Paired-
Samples t-test.

Reminder: To conduct a one-sample t-test, the data should be a sample
drawn from a population following a normal distribution, and if using the
t-test to compare two groups (independent samples t-test), the groups should
be randomly drawn from normally distributed and independent populations.

While each of the t-tests compares mean values they are designed for
distinctly different situations as we shall shortly see.

4.2.1 One-Sample T-test

The one-sample t-test is used to compare the mean of a single sample with a
specified population mean.

Example: A company producing wafers claims that the average amount of
sugar in each wafer is 14.1 g. A random sample of 36 wafers was selected
to check whether the sugar content was significantly different from the
specified value.

14.5 16.2 14.4 15.8 13.1 12.9 17.3 15.5 16.2 14.9 13.9 15.0
14.4 15.6 13.9 15.6 14.4 16.4 17.9 15.0 14.8 13.6 16.1 15.2
14.3 15.8 16.4 16.6 17.1 13.5 15.8 14.7 16.0 13.4 15.8 16.7

The null and alternative hypotheses for this two-tailed test are:

H0: Actual average amount of sugar is 14.1 g
H1: Actual average amount of sugar is not 14.1 g

which may alternatively be presented as:

H0: μ = 14.1 g
H1: μ ≠ 14.1 g

Note that this is a two-tailed test since in the alternative hypothesis we are
considering values > 14.1g and values < 14.1g.

First using the normality tests discussed earlier we test if the data set under
study can be assumed to follow a normal distribution. To be able to check for
normality we need to enter the data in the format required by the software.

In SPSS

The list of readings making up the covariate ‘sugar’ is inserted in the cells of
the first column.

From Table 4.2.1, below, we note that the p-values are greater than 0.05 for
both the Kolmogorov-Smirnov test and the Shapiro-Wilk test, hence we cannot
reject the assumption of normality.

Tests of Normality

                          Kolmogorov-Smirnov(a)         Shapiro-Wilk
                          Statistic   df   Sig.     Statistic   df   Sig.
Sugar content in grams      .091      36   .200*      .984      36   .876
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

Table 4.2.1

Since the covariate sugar is normally distributed, we proceed to check whether
the population mean amount of sugar is significantly different from 14.1g by
using the one-sample t-test as follows:

Choose Analyze from the bar menu, select Compare Means and click on
One-Sample T Test. Move ‘sugar’ to the Test Variable(s) box and input 14.1
as the Test Value. Click OK to run the procedure:

Figure 4.2.1

The resulting output is:

One-Sample Statistics

                          N      Mean     Std. Deviation   Std. Error Mean
Sugar content in grams    36    15.2417       1.23297           .20549

Table 4.2.2

One-Sample Test

                                        Test Value = 14.1
                                                                       95% Confidence Interval
                                                             Mean      of the Difference
                          t      df   Sig. (2-tailed)    Difference     Lower       Upper
Sugar content in grams  5.556    35        .000            1.14167      .7245       1.5588

Table 4.2.3

Table 4.2.2 shows that the mean sample sugar content is 15.24g with a
standard deviation of 1.23g. Now, accepting the null hypothesis is the same
as saying that the mean difference between the population mean, 14.1g, and
the sample mean, 15.24g, is not statistically significant. This mean difference
in fact plays a major role in calculating the t-statistic: the t-statistic is simply
the ratio of the mean difference to the standard error of the mean.

From Table 4.2.3, the mean difference is 1.14167g and, from Table 4.2.2, the
standard error of the mean is 0.20549g.

The t statistic is thus 1.14167 / 0.20549 ≈ 5.556.

Working out the test manually, we would have to get the critical values from
the t-distribution and check whether 5.556 lies in the acceptance or the
rejection region. The SPSS output reports a t-statistic and degrees of freedom
for all t-test procedures. Every unique value of the t-statistic and its associated
degrees of freedom has a significance value. So, when using SPSS, we just
have to consider the p-value (Sig.) to determine whether to accept or reject
H0.

In this case, our p-value is reported as .000. By default SPSS works at 95%
confidence (0.05 level of significance). Hence, since 0.000 < 0.05, we reject H0,
whereas if the p-value were > 0.05 we would not reject H0. We are therefore 95%
confident that the actual sugar content is not equal to 14.1g.

If, on the other hand, it were required to test whether the mean sugar
content is larger than 14.1g, we would be dealing with a one-tailed test and
the hypotheses are:

H0: μ = 14.1 g
H1: μ > 14.1 g

In this case we have a one-tailed test since in the alternative hypothesis we are
only considering values >14.1g. Now the p-value given by SPSS is always
computed for a two-tailed test. For a one-tailed test this p-value is simply
divided by 2. So for this example, the p-value provided by SPSS is first
divided by two. The resulting value is approximately 0, and smaller than the
level of significance (0.05), so we reject H 0 .

In R/RStudio

As mentioned earlier in the notes, to perform statistical analysis in R/RStudio
you can either import data from, say, Excel/SPSS or otherwise do the data entry
directly into R. Here we are going to proceed as if the data is being imported
into R/RStudio.

Save the data in a .csv file (a comma-separated values format which can be
created from Excel). Place all the data in a column, with the first element of
the column containing the variable name: sugar in this case.

Recall the following commands that are used to upload and view the data file:

dataonesample<-read.csv( file.choose(), header = TRUE)


# this command will open a window which will allow us to
# look for and open the required data file

attach(dataonesample)
names(dataonesample) # list the variables in my data
str(dataonesample) # list the structure of my data
View(dataonesample) # opens the data viewer

We need to check whether the sample comes from a normal population. We


do this by using the Shapiro-Wilk test, using the command
shapiro.test(dataonesample$sugar). If the resulting p-value is < 0.05
we reject H 0 whilst if the p-value is > 0.05 we cannot reject H 0 . This test
yields a p-value of 0.8756. With such a large p-value, we can conclude that
the Shapiro-Wilk test fails to reject the normality hypothesis for the
population.

We can now perform the one-sample t-test. We type:

t.test(dataonesample$sugar,mu=14.1)

where mu=14.1 above refers to the hypothetical value of 14.1g. This yields
the following output:

One Sample t-test

data: dataonesample$sugar
t = 5.5557, df = 35, p-value = 2.976e-06
alternative hypothesis: true mean is not equal to 14.1
95 percent confidence interval:
14.82449 15.65884
sample estimates:
mean of x
15.24167

We can see that the sample mean is 15.24167, the t-statistic is 5.5557, and that
a p-value of 2.976e-06 rejects the null hypothesis that the true population
mean is equal to 14.1g. Also note the 95% confidence interval for the
population mean, with the lower confidence bound given by 14.82449 and
the upper confidence bound given by 15.65884.

If we want to change the confidence level, to say 99%, we set conf.level=0.99
in the t.test command.
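For example, a possible call (an illustrative sketch) would be:

t.test(dataonesample$sugar, mu=14.1, conf.level=0.99)   # 99% confidence interval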

If on the other hand it was required for us to test whether the mean sugar
content is larger than 14.1g, we would be dealing with a one-tailed test and
the hypotheses are:
H 0 :   14.1g
H1 :   14.1g

In this case we have a one-tailed test since in the alternative hypothesis we are
only considering values >14.1g. Thus we type:

t.test(dataonesample$sugar, alternative="greater",mu=14.1)

We include the parameter alternative="greater" in the command since we
have a one-tailed test and are checking whether the mean is greater than 14.1g.
If the one-tailed test were in the other direction we would have included
alternative="less". The output which results when using
alternative="greater" is the following:

One Sample t-test

data: dataonesample$sugar
t = 5.5557, df = 35, p-value = 1.488e-06
alternative hypothesis: true mean is greater than 14.1
95 percent confidence interval:
14.89447 Inf

sample estimates:
mean of x
15.24167

In this case, the t-statistic is the same as before but the p-value has been halved,
due to the fact that we are now using a one-tailed test. The p-value is smaller
than 0.05, meaning that the null hypothesis is rejected at the 0.05 level of
significance. Hence the company’s claim is incorrect and the actual average
amount of sugar is greater than 14.1g.

4.2.2 Independent Samples T-test

The independent-samples t-test is used to assess whether the means of two


independent groups statistically differ from each other. The groups are
considered independent if a member of one group cannot possibly be in the
other group. For example, this t-test is appropriate when testing behavioural
differences between males and females where ‘male’ and ‘female’ are taken
to be the two main distinct categories making up the variable Gender. So a
person cannot be in both groups.

Example: A fitness instructor is conducting a study to investigate the number
of weekly hours spent at the fitness centre by his male and female clients. A
sample of 25 male clients and 30 female clients was selected at random. The
number of hours that the selected individuals spent at the fitness centre in a
particular week was recorded and is displayed in the table which follows:

Male Clients 20, 18, 15, 10, 13.5, 5, 9.5, 15, 22.25, 25, 30.5, 15.25, 14,
10, 10.5, 8, 5.5, 6.6, 9.9, 8.2, 4.4, 2.2, 3.3, 4.5, 2.3
Female Clients 3,2, 3.5, 4.5, 6.6, 8.8, 9, 10, 2.5, 4.5, 5, 6.25, 4, 3, 2, 5.5,
6, 6, 4.25, 4, 4.5, 8, 9, 8.8, 8, 4, 2, 3.5, 5.25, 6

To be able to use the independent samples t-test we need to first check whether
the male client data and female client data are both normally distributed. Also,
the independent samples t-test relies on the assumption of equality of
population variances, meaning that to be able to use this test, the male
population and female population need to have equal variances. The former
is checked by means of normality tests. The latter is checked by means of
Levene’s test for equality of variances. If both assumptions are satisfied, the
independent samples t-test may then be used to compare the average number
of weekly hours spent at the fitness centre by male and female clients.

The null and alternative hypotheses for a two-tailed test would be:

H 0 : On average, male and female clients spend the same number of weekly
hours at the fitness centre.
H1 : On average, male and female clients do not spend the same number of
weekly hours at the fitness centre.

The null and alternative hypotheses for a one-tailed test could be:

H 0 : On average, male and female clients spend the same number of weekly
hours at the fitness centre.
H1 : On average, male clients tend to spend more time at the fitness centre
than female clients.

In SPSS

In SPSS the data should be entered as shown below:

Figure 4.2.2
where for the variable Gender, 1=Male and 2=Female.

We start by testing if the variable Time follows a normal distribution for both
independent groups (that is for males and females). The output that follows
confirms that this is in fact true since all p-values are greater than 0.05.

Tests of Normality

                                          Kolmogorov-Smirnov(a)       Shapiro-Wilk
                               Gender     Statistic  df   Sig.    Statistic  df   Sig.
Time in hours, spent at the    Male         .156     25   .119      .930     25   .089
fitness centre                 Female       .136     30   .162      .939     30   .087
a. Lilliefors Significance Correction

Table 4.2.4

We then select: Analyze Compare Means Independent Samples t-test.

where the variable Time is moved underneath Test Variable(s) and the
variable Gender is moved into the Grouping Variable box.

The button Define Groups is then clicked to define the groups according to
the index given to represent Males and Females (for example 1 and 2).

Figure 4.2.3

The resulting outputs are as follows:

Table 4.2.5

Table 4.2.6

From Table 4.2.5, it should be noted that SPSS automatically computes the
descriptive statistics (sample mean, sample standard deviation, standard error
for the sample mean), Levene’s test for equality of variances, 95% confidence
interval (or otherwise if specified), t-statistic (with corresponding degrees of
freedom) and the two-tailed p-value, which will once again be the main focus
in our hypothesis testing.

Now, Table 4.2.6 contains two sets of values: the first line assumes equal
variances and the second does not. To assess which line of values to use, the
result for Levene’s test for Equality of Variances has to be taken into
consideration.

The Levene’s test checks whether our datasets come from populations with
equal variances and again, this hypothesis is accepted or rejected on the basis
of the p-value being greater or smaller than α. The corresponding hypothesis
test is:

H 0 : The variances of the two populations from which the samples are
extracted are equal.
H 1 : The variances of the two populations from which the samples
are extracted are not equal.

Taking α = 0.05, the resulting p-value in this case is 0.000 < 0.05 and thus we
reject H0. So, the two datasets come from populations with unequal
variances. Thus, we should use the statistics in the row labelled ‘Equal
variances not assumed’.

Consider the two-tailed test mentioned earlier, our null and alternative
hypotheses can be written as:

H 0 : 1  2  0
H1 : 1  2  0

From Table 4.2.6 we note that the t-statistic (obtained using the Welch t-test)
under the assumption of unequal variances has a value of 4.063 with degrees
of freedom of 28.05 and an associated p-value of 0.000. Since the p-value is
less than 0.05, then we reject H 0 . Thus, the probability that there is no
difference between attendance of male and female clients is very small. The
sample means in Table 4.2.5 in fact suggest that males, on average, spend
more time at the fitness centre than females. The resulting p-value confirms
that the observed difference in sample means is not due to chance.

If on the other hand as alternative hypothesis we consider the following:

H1 : On average, male clients tend to spend more time at the fitness centre
than female clients

then, once again, we can consider the t-statistic value obtained under the
assumption of unequal variances, 4.063 with 28.05 degrees of freedom, and
check whether this value falls inside the rejection region or not.

Otherwise, as is typically done when using software, we consider the
associated p-value of 0.000/2 ≈ 0.000. Since the p-value is less than 0.05,
we reject H0. Thus we can conclude that male clients tend to spend more
time at the fitness centre than female clients.

In R/RStudio

Similar tests to those conducted in SPSS will now be conducted in R (or
RStudio).

Suppose that the data has been entered in Excel in the same layout as was used
in SPSS. The data should be saved with extension .csv.

The data is first uploaded into the software:

dataindepsamples<-read.csv( file.choose(), header = TRUE)


# this command will open a window which will allow us to look
# for and open the required data file

The independent samples t-test can be performed by using the following


commands:

attach(dataindepsamples)
names(dataindepsamples)
# list the variables in my data

str(dataindepsamples)
# list the structure of mydata

View(dataindepsamples)
#opens data viewer

dataindepsamples$Gender<-
factor(dataindepsamples$Gender,levels=c(1,2),labels=
c("Male", "Female"))
#Adding labels to the levels of the fixed factor

We can check the assumption of normality for both subgroups using the
command:
by(dataindepsamples$Time, dataindepsamples$Gender,
shapiro.test) #Checking if both groups satisfy normality

Based on the p-values from the output below, we cannot reject normality at
either 0.01 or 0.05 levels of significance.

dataindepsamples$Gender: Male

Shapiro-Wilk normality test

data: dd[x, ]
W = 0.93048, p-value = 0.08916

-----------------------------------------------------------
----------------
dataindepsamples$Gender: Female

Shapiro-Wilk normality test

data: dd[x, ]
W = 0.9392, p-value = 0.08655

We now check whether Levene's test accepts the variance homogeneity
hypothesis. To perform Levene's test in R we need to install the package
‘lawstat’ and use the command levene.test. The script used is displayed below:

levene.test(dataindepsamples$Time, dataindepsamples$Gender, location='mean')

        Classical Levene's test based on the absolute deviations from the
        mean ( none not applied because the location is not set to median )

data:  dataindepsamples$Time
Test Statistic = 22.808, p-value = 1.455e-05

With a test statistic of 22.808 and a p-value of 1.455e-05, the null hypothesis
of variance homogeneity is rejected, whether at the 0.1, 0.05 or 0.01 level of
significance. Typing:

by(dataindepsamples$Time,dataindepsamples$Gender,var)

we can in fact see that the observed times for males have a variance of 54.05
and those for females have a variance of 5.346.

Recall that we want to test:

H 0 : 1  2  0
H1 : 1  2  0

Keeping in mind that the variance homogeneity hypothesis is rejected, the
command to conduct the t-test is the following:

t.test(dataindepsamples$Time ~ dataindepsamples$Gender, mu=0, var.equal=FALSE)

In the above command, mu=0 is the value set by the null hypothesis, and
var.equal=FALSE is required because we cannot assume equal variances.
When it is possible to assume equal variances this is replaced with
var.equal=TRUE.
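For completeness, a sketch of the pooled-variance (Student) version is shown below;
it would only be appropriate if Levene's test had not rejected equal variances, which
is not the case in this example:

t.test(dataindepsamples$Time ~ dataindepsamples$Gender,
       mu=0, var.equal=TRUE)   # classical two-sample t-test assuming equal variances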

The output for this test is the following:


Welch Two Sample t-test

data: dataindepsamples$Time by dataindepsamples$Gender


t = 4.0631, df = 28.047, p-value = 0.0003539
alternative hypothesis: true difference in means is not equal
to 0
95 percent confidence interval:
3.084907 9.357093
sample estimates:
mean in group Male mean in group Female
11.536 5.315

The t-statistic of 4.0631 rejects the null hypothesis at the 0.01, 0.05 and 0.1
levels of significance, with a p-value of 0.0003539. This means that the mean
time spent at the fitness centre by male clients is significantly different from
the mean time spent at the fitness centre by female clients.

The output also shows that the sample mean times for males and females are
11.54 and 5.32 hours respectively, which indicates that, on average, male
clients spend more than double the time at the fitness centre compared with
females.

Now suppose that we want to test the following one-tailed test:

H 0 : 1  2  0
H1 : 1  2  0

The command to conduct the t-test is the following:

t.test( dataindepsamples$Time ~ dataindepsamples$Gender,


alternative="greater", mu=0, var.equal=FALSE )

As can be seen in the above command, since we have a one-tailed test and
are checking whether the average time spent at the fitness centre by males is
greater than the average time spent by females, we need to set
alternative="greater".

The output for this test is the following:

Welch Two Sample t-test

data: dataindepsamples$Time by dataindepsamples$Gender


t = 4.0631, df = 28.047, p-value = 0.0001769
alternative hypothesis: true difference in means is greater
than 0
95 percent confidence interval:
3.616536 Inf

sample estimates:
mean in group Male mean in group Female
11.536 5.315

Note that the t-statistic and the sample means are the same as those obtained
for the two-tailed test. The p-value is smaller at 0.0001769 (due to dividing
by two). This value is once again smaller than 0.05, thus the null hypothesis
is rejected.

If on the other hand we wanted to test:
H 0 : 1  2  5
H1 : 1  2  5

Or
H 0 : 1  2  5
H1 : 1  2  5

One would need to change mu=0 to mu=5 in the above commands.
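As an illustrative sketch, the corresponding commands would be:

# Two-tailed test of whether the mean difference equals 5 hours
t.test(dataindepsamples$Time ~ dataindepsamples$Gender,
       mu=5, var.equal=FALSE)

# One-tailed test of whether the mean difference exceeds 5 hours
t.test(dataindepsamples$Time ~ dataindepsamples$Gender,
       alternative="greater", mu=5, var.equal=FALSE)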

4.2.3 Paired Samples t-test

The paired-samples t-test is used to compare the means of two (related)


variables within a single group. Very often the design for this test involves
measuring each subject twice: before and after some kind of treatment or
intervention. The aim is to determine whether the mean of the differences
between a series of paired observations on a particular outcome is
significantly different from zero.

Some examples of when such a test is used are:

 To check whether a particular training program has any impact on the


efficiency of the employee of a company.

 To check whether a diet program helps to reduce people’s weight.

 To check whether there is a significant difference in a diagnostic test


result before and after patients are given some treatment.

Example: The following are the average weekly losses, in hours, due to
accidents in 10 industrial plants before and after a certain safety program was
put into operation. We want to check whether the program is significantly
reducing the mean weekly losses due to accidents. A parametric test is used
because both sets of readings may be shown to come from a population
following a normal distribution.

Before 45 73 46 124 33 57 83 34 26 17

After 36 60 44 119 35 51 77 29 24 11

The paired-samples t-test is the ideal test to use to check for significant
difference between the before and after readings. Our null and alternative
hypotheses are:

H 0 :  Before   After  0
H1 :  Before   After  0

which may be translated into words as follows:

Null: The program is not effective (Mean weekly losses are unaltered)
Alternative: The program has significantly reduced the mean weekly losses

In SPSS

Enter all the observations of one sample in the cells of the first column and
the observations of the second sample in the cells of the second column.
These columns define the ‘before’ and the ‘after’ average weekly losses on
man hours due to accidents.

Before we can proceed to use the paired samples t-test, we need to create a
new variable with the differences between Before and After. This new
variable is created so that we can use it to check for normality of the data.

Figure 4.2.4

In this case, focus lies with the differences since the paired samples t-test is
actually a one sample t-test applied to the set of differences. To create the new
variable go to Transform, Compute Variable, give a name to the new variable
underneath Target Variable, move the variables Before and After underneath
Numeric Expression and include the subtraction sign in between as shown in
Figure 4.2.4.

Having created the new variable Differences, we use the Shapiro-Wilk test to
check whether this variable is normally distributed or not. This test yields a
p-value of 0.682. Hence the null hypothesis that this variable follows a normal
distribution cannot be rejected. Thus the paired t-test can be applied.

Choose Analyze from the bar menu, select Compare Means and click on
Paired samples t-test. Move the variables ‘before’ and ‘after’ simultaneously
to the paired variables box and click on OK to run the procedure.

Figure 4.2.5

The following outputs are obtained:

Paired Samples Statistics

                   Mean     N    Std. Deviation   Std. Error Mean
Pair 1   Before    53.80    10       32.058            10.138
         After     48.60    10       31.031             9.813

Table 4.2.7

Paired Samples Test

                              Paired Differences
                                                          95% Confidence Interval
                                          Std. Error      of the Difference
                   Mean   Std. Deviation     Mean          Lower      Upper        t     df   Sig. (2-tailed)
Pair 1  Before -   5.200      4.077          1.289         2.283      8.117      4.033    9        .003
        After

Table 4.2.8

From Table 4.2.7, the average of the paired differences is 5.2. Our null
hypothesis says that there is no difference in the average weekly losses on
man-hours due to accidents before and after the program; in other words we
are testing whether this mean paired difference, 5.2, is significantly different
from zero. SPSS calculates the value of the t-statistic, which is simply the ratio
of the mean paired difference to its standard error, so t = 5.2/1.289 = 4.033.
Since the p-value of 0.0015 (0.003/2, because the alternative hypothesis is
one-tailed) is smaller than the level of significance (0.05), we have enough
evidence to reject the null hypothesis. This implies that the program was
effective in reducing the weekly losses in man-hours due to accidents. The
probability that this assertion is wrong is 0.0015, which is very small.
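The same hand calculation can be verified in R; the following sketch uses the ten
Before and After readings given above (the vector names are chosen here for
illustration only):

before <- c(45, 73, 46, 124, 33, 57, 83, 34, 26, 17)
after  <- c(36, 60, 44, 119, 35, 51, 77, 29, 24, 11)
d      <- before - after                            # paired differences
t_stat <- mean(d) / (sd(d) / sqrt(length(d)))       # 5.2/1.289 = 4.033
p_one_tailed <- pt(t_stat, df=length(d)-1, lower.tail=FALSE)   # approx. 0.0015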

In R/RStudio

In this section we will show how the paired samples t-test may be conducted
in R/RStudio.

We start by uploading the data saved in .csv format into R or RStudio:

rm(list=ls(all=TRUE)) #This clears all existing variables


library(lawstat)
library(MBESS)

#Paired Samples t-test

datapairedsamples<-read.csv( file.choose(), header = TRUE)


# this command will open a window which will allow us to
#look for and open the required data file

attach(datapairedsamples)
names(datapairedsamples) # list the variables in my data
str(datapairedsamples) # list the structure of my data
View(datapairedsamples) # opens data viewer

As a first step, we need to check that the difference between Before and After
readings follows a normal distribution. In R this is done through the following
command:

shapiro.test(datapairedsamples$Before - datapairedsamples$After)
# Checking if the differences satisfy normality

This test yields a p-value of 0.6824 so we cannot reject the null hypothesis
that the differences between before and after readings follow a normal
distribution.

We proceed to test the hypothesis stated for the example under study, that is,
we want to check whether the mean of the after data is significantly smaller
than that of the before data. We perform the paired samples t-test as follows:

t.test(datapairedsamples$Before, datapairedsamples$After,
       alternative="greater", paired=TRUE)  # paired samples t-test

This yields the following output:

Paired t-test

data: datapairedsamples$Before and datapairedsamples$After


t = 4.0333, df = 9, p-value = 0.001479
alternative hypothesis: true difference in means is greater
than 0
95 percent confidence interval:
2.836619 Inf
sample estimates:
mean of the differences
5.2

The outputs are very similar to those obtained in SPSS. The sample mean of
the differences is 5.2, and with a t-statistic of 4.033 and a p-value of 0.0015,
the null hypothesis is rejected at the 0.1, 0.05 and 0.01 levels of significance.
For more detail on how to interpret this result from an application point of
view, refer to the interpretation given for the SPSS output.

4.2.4 One-Way ANOVA

Apart from comparing two groups of variables, we may want to compare
several independent groups. One-Way ANOVA is the most suitable test for
this purpose. One-Way ANOVA tests whether there is a significant difference
among the mean scores of three or more independent groups (or
subpopulations) on one covariate.

In order to apply this test the following assumptions need to be satisfied:

• For each group the response variable must be normally distributed;


• Samples have to be independent;
• Variances are homogeneous (equal) between groups (this will be
checked by the Levene’s test)

The null hypothesis for a one-way ANOVA is that the mean values of the
independent groups/subpopulations are equal. The alternative hypothesis is
that some of the mean values of the independent groups/subpopulations are
different.

Example: A marketing research firm is testing the popularity of three new


flavourings for a soft drink, Orange, Lemon and Cherry, using a sample of 35
people assigned randomly to 3 groups. The first group tasted the orange
flavour, the second group tasted the lemon flavour and the third group tasted
the cherry flavour. Each person was then given a questionnaire which
evaluates how enjoyable the beverage was and scores were then calculated.
The results are found in the following table:

Orange   Lemon   Cherry
  13       12       7
  17        8      19
  19        6      15
  11       16      14
  20       12      10
  15       14      16
  18       10      18
   9       18      11
  12        4      14
  16       11      11
   9        5
   9        6
            4

The null and alternative hypotheses for this two-tailed test are:

H0: The mean scores between flavours are equal (there is no significant
difference in mean scores due to flavour)
H1: The mean scores between flavours are not equal (there is a significant
difference in mean scores due to flavour)

In SPSS

Enter all the observations of the three samples in the cells of the first column,
which defines the covariate ‘score’. In the second column enter the values 1,
2 or 3 besides the readings of Orange, Lemon and Cherry respectively. This
column defines the factor ‘flavour’ which has three levels.

Figure 4.2.6

First we need to check that the assumption of normality is satisfied for the
three levels of the factor flavour. Normality tests can be done by going to
Analyze, Descriptive Statistics, Explore and follow the instructions given
earlier in the notes. The procedure to check for normality here is the same as
what has been described before but in this case the factor flavour is entered
underneath Factor list as shown in Figure 4.2.7.

Figure 4.2.7

The following output confirms that at each level of the factor flavour, the
normality assumption is satisfied by the dependent variable score.

Tests of Normality

                     Kolmogorov-Smirnov(a)        Shapiro-Wilk
        flavour      Statistic  df   Sig.     Statistic  df   Sig.
score   Orange         .142     12   .200*      .916     12   .256
        Lemon          .172     13   .200*      .937     13   .416
        Cherry         .153     10   .200*      .970     10   .894
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

Table 4.2.9

From the bar menu choose Analyze, select Compare Means and click on
One-Way ANOVA. Move ‘score’ into the Dependent List and move ‘flavour’
into the Factor box. Click on Options and select Descriptives, Homogeneity
of Variance Test and Means Plot. Finally click on Continue and OK to run
the procedure.

Figure 4.2.8

The following outputs are obtained:

Descriptives
score
                                                     95% Confidence Interval for Mean
                          Std.        Std.           Lower        Upper
          N     Mean      Deviation   Error          Bound        Bound       Minimum   Maximum
Orange   12   14.0000     4.04520     1.16775        11.4298      16.5702       9.00      20.00
Lemon    13    9.6923     4.62574     1.28295         6.8970      12.4876       4.00      18.00
Cherry   10   13.5000     3.74907     1.18556        10.8181      16.1819       7.00      19.00
Total    35   12.2571     4.53965      .76734        10.6977      13.8166       4.00      20.00

Table 4.2.10

Table 4.2.10 gives the sample average, standard deviation, standard error and
95% confidence interval of the mean score obtained for each of the three
flavours.

For each flavour, there is a 95% chance that the actual mean score lies within
the given confidence interval. So there is a 95% chance that the actual mean
score for Orange lies between 11.43 and 16.57.

The sample means suggest that, on average, the scores for Orange and Cherry
can be considered equal while the score for Lemon is lower. The statistical
significance of these observed differences still needs to be tested.

The following table presents the results of Levene’s test. Levene’s test is
used to check whether the assumption of equal variances is satisfied. In this
case we have the following hypotheses:

H0: The variances of the subpopulations from which the samples are extracted
are equal.

H1: The variances of the subpopulations from which the samples are extracted
are not equal.

Test of Homogeneity of Variances

score
Levene Statistic    df1    df2    Sig.
      .520           2      32    .599

Table 4.2.11

Since the p-value is 0.599, which is greater than 0.05, then we cannot reject
the null hypothesis. So there is no significant difference in the three
subpopulation variances.

Since all assumptions made by the one-way ANOVA test have been shown to
be satisfied we can proceed to interpret the output for this test.

ANOVA
score
                 Sum of Squares    df    Mean Square     F      Sig.
Between Groups       137.416        2       68.708      3.903   .030
Within Groups        563.269       32       17.602
Total                700.686       34

Table 4.2.12

SPSS calculates the value of the F-statistic, which is simply the ratio of the
mean square (between groups) to the mean square (within groups), so
F = 68.708/17.602 = 3.903. Since the p-value (0.03) is smaller than the level
of significance (0.05), we reject H0. This implies that there is a significant
difference in mean scores due to flavour. The differences between the sample
means are displayed visually in the means plot shown below:

Figure 4.2.9
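As a quick numerical check, the F ratio and its p-value from Table 4.2.12 can be
reproduced in R (an illustrative sketch using the tabulated values):

F_stat  <- 68.708 / 17.602                           # mean square between / mean square within
p_value <- pf(F_stat, df1=2, df2=32, lower.tail=FALSE)   # approx. 0.030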

For a more detailed study on which means are significantly different from the
rest, one can consider Post Hoc tests. More detail about such tests is provided
later in section 4.4.

In R/ RStudio

Data is saved in .csv format following the same structure used when entering
data in SPSS. So again we have two columns, the first defines the covariate
‘score’ and the second the factor with three levels, ‘flavour’.

We start by loading the data and then assigning the labels to the factor.

rm(list=ls(all=TRUE)) # This clears all existing variables


library(lawstat)
library(MBESS)

dataflavours<-read.csv( file.choose(), header = TRUE)


# this command will open a window which will allow us to
# look for and open the required data file

attach(dataflavours)
names(dataflavours) # list the variables in my data

dataflavours$flavour<-factor(dataflavours$flavour,
levels=c(1,2,3),labels=c("Orange","Lemon","Cherry"))
# adding labels

str(dataflavours) # list the structure of mydata


View(dataflavours) # open the data viewer

We next need to check for the normality assumption and the variance
homogeneity assumption. For the first, we write the command:

by(dataflavours$score,dataflavours$flavour,shapiro.test)
# testing for normality

This yields the output:


dataflavours$flavour: Orange

Shapiro-Wilk normality test


data: dd[x, ]
W = 0.9162, p-value = 0.256
-----------------------------------------------------------
dataflavours$flavour: Lemon
Shapiro-Wilk normality test

data: dd[x, ]
W = 0.93673, p-value = 0.416

-----------------------------------------------------------
dataflavours$flavour: Cherry

Shapiro-Wilk normality test

data: dd[x, ]
W = 0.97035, p-value = 0.8941

We can see that for all three flavours, the Shapiro-Wilk test fails to reject the
normality assumption at all levels of significance (0.1, 0.05 and 0.01). This
means that we can assume that the normality assumption is satisfied in all
three cases. For Levene's test, on the other hand, we write:

levene.test(dataflavours$score, dataflavours$flavour, location="mean")
# test for homogeneity of variance

The output is as follows:

        Classical Levene's test based on the absolute deviations from the
        mean ( none not applied because the location is not set to median )

data:  dataflavours$score
Test Statistic = 0.52004, p-value = 0.5994

With a test statistic of 0.52, and a p-value of 0.5994 we fail to reject the
homogeneity of variance hypothesis at 0.1, 0.05 and 0.01 levels of
significance. Having failed to reject both assumptions stipulated by the
ANOVA test, we can now proceed to implement one-way ANOVA. There
are various ways in which one-way ANOVA may be performed in R. The
following is just one of them:

output=aov(dataflavours$score~dataflavours$flavour)
summary(output)

The output for the one-way ANOVA test is the following:

Df Sum Sq Mean Sq F value Pr(>F)
dataflavours$flavour 2 137.4 68.71 3.903 0.0304 *
Residuals 32 563.3 17.60
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘
’ 1

From the resulting output, at the 0.01 level of significance we fail to reject the
null hypothesis that the mean scores for the three flavours are equal. The null
hypothesis is, however, rejected at the 0.05 and 0.1 levels of significance,
indicating a significant difference in mean scores due to flavour.

The output obtained in R is the same as that obtained in SPSS and hence can
be interpreted in the same way. So for more details refer to the SPSS section.
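As noted above, aov() is only one of several ways of performing one-way ANOVA in
R. A brief alternative, shown here as a sketch rather than as part of the worked
example, is oneway.test(), which by default applies Welch's correction for unequal
variances:

# Classical one-way ANOVA (equal variances assumed, as justified by Levene's test)
oneway.test(dataflavours$score ~ dataflavours$flavour, var.equal=TRUE)

# Welch's version, which does not assume equal variances
oneway.test(dataflavours$score ~ dataflavours$flavour)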

4.2.5 Repeated Measures ANOVA

Repeated measures ANOVA is a generalization of the paired samples t-test.


The paired samples t-test is a parametric test used to check whether the mean
difference between two sets of observations obtained on the same
subjects/objects is zero. Repeated measures ANOVA is used to check
whether the mean difference between more than two sets of observations
obtained on the same subjects/objects is zero. Repeated measures ANOVA is
applied in studies that investigate either (1) changes in mean scores over three
or more time points (such as when measuring changes in cholesterol level
due to an exercise-training programme), or (2) differences in mean scores
under three or more different conditions (for example CBC blood tests for the
same set of blood samples run on three or more different machines). Due to
the nature of the data, this type of ANOVA is also known as within subjects
ANOVA.

The following is an example of where one-way repeated measures ANOVA


might come in handy.

Example: Suppose that a number of patients with chronic pain have been
prescribed some new form of medication. Readings on the Visual Analog
Pain Scale (VAS) where 0 means no pain and 10 means worst pain ever, were
recorded for each patient after two months, four months and six months of
taking the medication. Having collected the data, it would then be of interest
to see whether there is any significant difference in the pain measurements
recorded. An improvement in the measurements recorded over time would
mean that the prescribed medication is effectively reducing the pain
experienced by the patients.

In a similar manner to the paired samples t-test and ANOVA, repeated


measures ANOVA also relies on a number of assumptions. To be able to
perform a repeated measures ANOVA, we need the following assumptions to
hold:

 the data is multivariate normally distributed (the multivariate normal


distribution is a generalization of the normal distribution which caters for
relationships amongst multiple variables). We will see how testing for
multivariate normality may be carried out using the Energy test for
Multivariate Normality available in the energy package in R. Note that
testing for multivariate normality cannot be carried out in SPSS.

 sphericity - the population variances of all possible difference scores (for


the pain measurement example, variance of scores obtained on Time 1 –
Time 2, variance of scores obtained on Time 1 – Time 3, variance of
scores obtained on Time 2 – Time 3) are all equal. We will see how
testing for sphericity is carried out using Mauchly’s test.

Example: Consider once again the VAS pain measurement example.


Suppose that pain scores have been obtained from 40 randomly selected
patients and that these scores have been obtained at month 2, month 4 and
month 6.

Since the multivariate normality assumption cannot be checked in SPSS, we
will start this section by showing how to perform repeated measures ANOVA
in R.

In R/RStudio

There are various ways in which repeated measures ANOVA can be carried
out in R. The command used depends on the output required. The test for
sphericity is, for example, not available with the aov() function that was used
for one-way ANOVA. The command that will be used here is Anova() from
the car package.

For the example being considered, the data entry consists of creating one
column for each time point together with a variable for Subject, as shown in
Figure 4.2.10. The data being shown here has been created in SPSS. As for
any other statistical technique shown earlier in the notes, the data could also
have been created in Excel. The import menu in RStudio or the command
read.csv is then used to import the data into R.

Figure 4.2.10

# Conducting repeated measures ANOVA in R

rm(list=ls(all=TRUE)) # this clears all existing variables

datapain<-read.csv( file.choose(), header = TRUE)


# this command will open a window which will allow us to
# look for and open the required data file

attach(datapain)
names(datapain) # list the variables in my data

Having entered the data into the required format, we can now proceed to start
analysing the data. Descriptive statistics of the data and plots will give insight
into whether there are any differences in the mean pain scores due to time.
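For instance, simple descriptive statistics for the three time points can be obtained
as follows (a sketch, assuming the column layout shown in Figure 4.2.10 with the
pain scores in columns 2 to 4):

sapply(datapain[,2:4], mean)   # mean pain score at Month_2, Month_4 and Month_6
sapply(datapain[,2:4], sd)     # corresponding standard deviations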

The boxplot which follows shows that there seems to be a reduction in pain
scores with time.
[Boxplot of score against time: Month_2, Month_4, Month_6]

Figure 4.1.13

Note that to be able to obtain the boxplot in Figure 4.1.13 we need the data to
be changed into what is known as long format. To change from the current
short format into long format, the command reshape needs to be used as
follows:

Paindatalong <- reshape(datapain, idvar = "Subject",
    varying = c("Month_2","Month_4","Month_6"), timevar = "time",
    times = factor(c(1, 2, 3)), v.names = "score", direction = "long")
attach(Paindatalong)
str(Paindatalong) # list the structure of mydata
View(Paindatalong) # open the data viewer

time<-factor(time,levels=c(1,2,3),labels=
c("Month_2","Month_4","Month_6"))
# adding labels
plot(time, score)

To be able to draw conclusions about any possible difference in scores we will


need to proceed to the hypothesis testing step given by the repeated measures
ANOVA. Note that for the procedure which follows the short format data is
used.

First we need to check the assumptions of multivariate normality and


sphericity. Multivariate normality is tested by means of the command
mvnorm.etest from the package energy (note that there are many other
multivariate tests of normality available in R) as follows:

library(energy)
mvnorm.etest(datapain[,2:4],R=200)

        Energy test of multivariate normality: estimated parameters

data:  x, sample size 40, dimension 3, replicates 200
E-statistic = 0.83721, p-value = 0.385

The p-value for this test is 0.385, which is greater than 0.05, hence we cannot
reject the hypothesis that the data follows a multivariate normal distribution.

The Mauchly test of sphericity can be carried out using the command
mauchly.test. This command is quite complicated for the non-advanced R
user. We can alternatively use the function Anova() from the car package. The
Mauchly test result is then obtained as part of the output for repeated measures
ANOVA. The commands to perform such a test and the output which results
follow:

options(contrasts=c("contr.sum", "contr.poly"))
# defines the way the sums of squares are worked out
# with this setting calculations match those of SPSS and of
# common ANOVA textbooks

design<-factor(c("Month_2","Month_4","Month_6"))
# defining the time variable
library(car)
# loading the package car to use the command ANOVA

rmmodel<-lm(as.matrix(datapain[,2:4]) ~ 1)
# note that if for example besides the pain levels over time
# we were also considering other variables in the study,
# such as age, gender etc, any possible influence that these
# between subjects variables might have posed on the pain
# experienced by the subjects would need to be catered for
# by including these variables in the right hand side of the
# model, instead of the value 1, in the form :
# rmmodel<-lm(as.matrix(datapain[,2:4]) ~ age*gender)
results<-Anova(rmmodel, idata=data.frame(design),
idesign=~design, type="III")
# note that the commands idata and idesign can be adapted
# for use with any within subjects design. In our example
# we are only considering one within subjects variable,
# Time. Suppose that besides measuring pain levels in month
# 2, 4 and 6, pain levels were also measured three times a
# day, morning, afternoon, evening. A two-way repeated
# measures ANOVA would have to be used in such a case.
# Assuming that the data would have been entered in a
# similar format to the one being used here with columns
# for Morning_Month2, Afternoon_Month2, Evening_Month2,…,
# Evening_Month6, the left hand side of the model inside
# the lm function above would need to be changed to include
# all the repeated measures columns in the data and the
# commands idata and idesign would need to be changed to:
# idata <- expand.grid(timeofday= c("Morning","Afternoon",
# "Evening"),time= c("Month_2","Month_4","Month_6"))
# idesign=~time*timeofday

summary(results, multivariate=F)

Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

                  SS  num Df  Error SS  den Df         F     Pr(>F)
(Intercept) 2970.07        1    68.258      39  1696.978  < 2.2e-16 ***
design        90.65        2   198.017      78    17.854   4.13e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Mauchly Tests for Sphericity

Test statistic p-value


design 0.99356 0.8845

Greenhouse-Geisser and Huynh-Feldt Corrections


for Departure from Sphericity

GG eps Pr(>F[GG])
design 0.9936 4.446e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

HF eps Pr(>F[HF])
design 1.046776 4.129517e-07

Since the p-value obtained from the Mauchly test of sphericity is 0.8845 >
0.05, we do not have enough evidence to reject the sphericity assumption.
We can thus proceed with interpreting the output that results for the repeated
measures ANOVA. The p-value of interest in this case is the one reported for
design, 4.13e-07 (effectively 0), meaning that there is a significant
difference in the pain scores due to time.

If it were the case that the Mauchly test of sphericity gave enough evidence to
reject the null hypothesis of sphericity, we would not be able to trust the
resulting F-statistic and p-value. We would then need to move on to the
output given by the Greenhouse-Geisser and Huynh-Feldt corrections.
Greenhouse-Geisser and Huynh-Feldt are two methods which estimate the
degree of departure from sphericity in the data (called eps in the R output) and
are used to correct the p-value according to how bad the violation of sphericity
is. If the estimated eps is > 0.75, the Huynh-Feldt correction is used; if it is
< 0.75, the Greenhouse-Geisser correction is used. An eps of 1 or close to 1,
as is the case in this example, shows that the condition of sphericity is not
violated. Note that, as with many other tests, Mauchly’s test of sphericity
might be affected by the use of a small sample size. In small samples, large
deviations from sphericity might be interpreted as being non-significant.

Having found that there is a significant difference in mean pain scores due to
Time, it might be of interest to proceed with doing post hoc tests. Such tests
will be discussed in Section 4.4.

In SPSS

For the example being considered, the data entry consists of creating one
column for each time point together with a variable for Subject, as shown in
Figure 4.2.10.

The pain data being analysed in this section has been shown to be multivariate
normally distributed by means of the command mvnorm.etest from the
package energy in R. SPSS provides us with two tests for normality, the
Shapiro-Wilk test and Kolmogorov-Smirnov test. Both of these tests,
however, are tests for univariate normality, not multivariate normality.
Multivariate tests for normality are not available in SPSS.

To be able to proceed with using repeated measures ANOVA we also need to


check whether the assumption of sphericity is satisfied. The output for this
test forms part of the output for repeated measures ANOVA. So go to
Analyze, General Linear Model, Repeated Measures, choose a name for the
within-subjects factor (Time in this case, as we have repeated observations
obtained over different time periods), define 3 to be the number of levels,
since measurements are taken at 3 different time points, and choose a
measure name. In this case we can call this Score, the pain score given by
the patients, as shown in Figure 4.2.12.

Figure 4.2.12

Press the Add button underneath Number of Levels and Score, click Define,
select the variables Month_2, Month_4 and Month_6 and move them to
Within Subjects Variables as shown in Figure 4.2.13. Note that if Between
Subjects Variables such as gender and age were also available in the data,
these would need to be moved underneath Between Subjects Factor(s).
Descriptive statistics may also be obtained from the Options button. Then
press Continue and OK.

Figure 4.2.13

Table 4.2.13 shows descriptive statistics for the sets of pain scores obtained
in the three different months being considered in this analysis.

Descriptive Statistics
Mean Std. Deviation N
Month_2 6.00 1.340 40
Month_4 5.05 1.648 40
Month_6 3.88 1.522 40
Table 4.2.13

Table 4.2.14 shows the results from Mauchly's test of sphericity. Since the
p-value obtained from the Mauchly test of sphericity is 0.885 > 0.05, we do
not have enough evidence to reject the sphericity assumption.

Mauchly's Test of Sphericity
Measure: Score
                                                              Epsilon
Within Subjects  Mauchly's  Approx.               Greenhouse-  Huynh-  Lower-
Effect           W          Chi-Square  df  Sig.  Geisser      Feldt   bound
Time             .994       .245        2   .885  .994         1.000   .500
Table 4.2.14

So we can now proceed to focus on the p-value obtained from the repeated
measures ANOVA table. The p-value of interest in Table 4.2.16 is reported as
.000, meaning that there is a significant difference in the pain scores due
to time.

Tests of Within-Subjects Effects
Measure: Score
Source                          Type III Sum   df      Mean Square  F       Sig.
                                of Squares
Time         Sphericity Assumed   90.650        2       45.325      17.854  .000
             Greenhouse-Geisser   90.650        1.987   45.617      17.854  .000
             Huynh-Feldt          90.650        2.000   45.325      17.854  .000
             Lower-bound          90.650        1.000   90.650      17.854  .000
Error(Time)  Sphericity Assumed  198.017       78        2.539
             Greenhouse-Geisser  198.017       77.501    2.555
             Huynh-Feldt         198.017       78.000    2.539
             Lower-bound         198.017       39.000    5.077
Table 4.2.16

Having found that there is a significant difference in mean pain scores due
to Time, it might be of interest to proceed with doing post hoc tests. These
will be discussed later in section 4.4.

For more details on the output, refer to the repeated measures ANOVA section
for R/RStudio.

4.3 Non-Parametric Tests

The tests presented in Section 4.2 can only be applied when the assumption
of normality is satisfied. If we need to test such hypotheses but the assumption
of normality is not satisfied we turn to the non-parametric alternatives which
are presented here.

Important: Note that all non-parametric tests can still be used when data
follows a normal distribution but in such cases their power is less than that
of the respective parametric tests.

4.3.1 The One-Sample Wilcoxon Signed Ranks Test

The one-sample Wilcoxon signed ranks test is the non-parametric equivalent
of the one-sample t-test. It is used to compare the median (rather than the
mean, as in the one-sample t-test) of a single sample with a specified
population median. This test should only be used when the variable under
study is not normally distributed.

Example: Henna Ltd. claims that the 20g tubs of yogurt it produces contain
an average of 2g of fat. A quality control officer takes a sample of 39 (20g)
tubs of yogurt produced by this company and notes the amount of fat they
contain (in grams). The observed values are listed in the table which
follows:

1.77 1.86 4.04 1.38 3.65
3.8 1.79 2.04 4.56 1.66
2.69 4 2.61 3.07 2.59
3.08 1.85 0.53 1.33
1.7 2.88 1.58 3.85
3.59 1.14 1.49 1.82
1.24 2.49 1.94 1
1.06 6.33 1.61 1.01
2.9 2.34 1 0.48

Since the data given in this example may be shown to not be normally
distributed, a one-sample Wilcoxon signed ranks test can be used to test the
following hypothesis:

H0: median = 2g
H1: median ≠ 2g
Alternatively:

H0: On average, 20g tubs of yogurt produced by Henna Ltd. contain 2g of fat.
H1: On average, 20g tubs of yogurt produced by Henna Ltd. do not contain 2g
of fat.

Only if the assumption of normality is not satisfied should one proceed to use
the one-sample Wilcoxon test to test the above hypothesis.

In SPSS

Go to Analyze ->Nonparametric Tests ->One Sample.

Figure 4.3.1

On the Objective tab specify Customize analysis.

Figure 4.3.2
On the Fields tab specify the variable for which the one-sample Wilcoxon test
is desired.

Figure 4.3.3
On the Settings tab specify Customize tests, check the box for Compare median
to hypothesized (Wilcoxon signed-rank test), specify the Hypothesized median
value (in this case enter 2) and click Run.

Figure 4.3.4
The following output is obtained:

Table 4.3.1

From Table 4.3.1 we see that the null hypothesis cannot be rejected at the
0.05 level of significance (p-value (sig) = 0.332 > 0.05). This means that we
have no evidence against the company's claim; that is, on average, 20g tubs
of yogurt produced by Henna Ltd. contain 2g of fat.

In R/RStudio

In R or RStudio the command used to do the one-sample Wilcoxon signed


rank test is wilcox.test. The following are the commands used to carry out
this test in R:

#uploading the data


dataonesamplewil<-read.csv( file.choose(), header = TRUE)
# this command will open a window which will allow us to
# look for and open the required data file
attach(dataonesamplewil)
names(dataonesamplewil) # list the variables in my data
str(dataonesamplewil) # list the structure of mydata
View(dataonesamplewil) # open data viewer

# Testing for normality


shapiro.test(dataonesamplewil$Fat)

# one-sample Wilcoxon signed rank test


wilcox.test(dataonesamplewil$Fat, mu = 2, alternative =
"two.sided")

The following result is obtained:

Wilcoxon signed rank test with continuity correction

data: dataonesamplewil$Fat
V = 459, p-value = 0.3391
alternative hypothesis: true location is not equal to 2

From the output above we see that the null hypothesis cannot be rejected at
the 0.05 level of significance (p-value (sig) = 0.339 > 0.05). This means
that we have no evidence against the company's claim; that is, on average,
20g tubs of yogurt produced by Henna Ltd. contain 2g of fat.

Note that:
 if you want to test whether the median amount of fat is less than 2g (one-
tailed test), type:

wilcox.test(dataonesamplewil$Fat, mu = 2,alternative =
"less")
 Or, if you want to test whether the median amount of fat is greater than
2g (one-tailed test), type:

wilcox.test(dataonesamplewil$Fat, mu = 2,alternative =
"greater")

4.3.2 The Mann-Whitney Test

The Mann-Whitney U test is the non-parametric equivalent of the independent
samples t-test. It is used to assess whether the medians (rather than the
means, as in the independent samples t-test) of two independent groups
differ significantly from each other. This test should only be used when the
dependent variable under study is not normally distributed for at least one
of the groups.

Example: A researcher wants to learn if a new drug slows the growth of


tumours. 48 mice with tumours are obtained and randomly divided into two
groups.

The first group is injected with the new drug while the second group is the
control, injected with an old established drug. After 2 weeks, the bodies of the
mice are scanned, and any decrease in the size of the tumour (in cm) recorded.

New_Drug 0.71 0.83 0.89 0.57 0.68 0.74 0.75 0.67 0.9 0.25 0.88 0.13
Control
(Old Drug) 0.72 0.68 0.69 0.66 0.57 0.66 0.7 0.63 0.86 0.5 0.34 0.76
New_Drug 0.83 0.72 0.11 0.45 0.06 0.57 0.22 0.8 0.43 0.97 0.19 0.24
Control
(Old Drug) 0.31 0.42 0.12 0.77 0.81 0.87 0.14 0.37 0.29 0.42 0.44 0.15

Use a 0.05 level of significance to test whether there is a significant difference


in the amount decreased between the new and old drug.

When preparing the data to be uploaded for the software, the same format as
that used for the independent samples t-test is to be used. We should have
two columns of data: the first column is a fixed factor with two levels which
specifies the drug used, so this variable will be called Drug. The second
column, which will be called Size_decrease, contains the decrease in the size
of the tumour (in cm) after 2 weeks.
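
As an illustration only (using the variable names Drug and Size_decrease from this section, and just the first four observations of each group), such a two-column layout could be constructed directly in R as follows:

# a small sketch of the long data format described above:
# one grouping column (Drug) and one measurement column (Size_decrease)
Drug <- factor(rep(c(1, 2), each = 4), levels = c(1, 2),
               labels = c("Control", "New Drug"))
Size_decrease <- c(0.72, 0.68, 0.69, 0.66,    # first four Control values
                   0.71, 0.83, 0.89, 0.57)    # first four New Drug values
datamanwhit_sketch <- data.frame(Drug, Size_decrease)
str(datamanwhit_sketch)   # one factor column and one numeric column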

Upon testing for normality using the Shapiro-Wilk test, p-values of 0.14 and
0.03 were obtained for the size decrease of the Control group and of the New
Drug group respectively. This means that normality for the variable
Size_decrease is only satisfied for the Control group. Thus, the Mann-Whitney
test should be used. The following hypothesis will be tested:

H 0 : The median decrease in size is the same for both groups.


H 1 : The median decrease in size is not the same for both groups.

In SPSS

From the bar menu choose Analyze, select NonParametric tests, Legacy
Dialogs and then click on Two-Independent Samples. Move Size_decrease
under Test Variable List and Drug under Grouping Variable.

Figure 4.3.5

Press the tab Define Groups to define the groups considered. For ‘Group 1’
we enter 1, which refers to the Control group, and for ‘Group 2’ we enter 2,
which refers to the new drug group. Press Continue then Ok and we get the
following output:

Test Statisticsa
in cm
Mann-Whitney U 255.000
Wilcoxon W 555.000
Z -.681
Asymp. Sig. (2-tailed) .496
a. Grouping Variable: Drug

Table 4.3.2
Since the p-value (0.496) is bigger than the level of significance (0.05), we
cannot reject H0. So we have no evidence that, on average, the two drugs
differ in their effect on the reduction of tumour size.

In R/RStudio

In R or RStudio the Mann-Whitney test is conducted using the command


wilcox.test as shown below:

#Uploading the data


datamanwhit<-read.csv( file.choose(), header = TRUE)
# this command will open a window which will allow us to look
# for and open the required data file
attach(datamanwhit)
names(datamanwhit) # list the variables in my data
str(datamanwhit) # list the structure of mydata
View(datamanwhit) # open data viewer

#Adding labels to the levels of the fixed factor


datamanwhit$Drug<-
factor(datamanwhit$Drug,levels=c(1,2),labels= c("Control",
"New Drug"))

# Checking if both groups satisfy normality


by(datamanwhit$Size_decrease,datamanwhit$Drug,
shapiro.test)
#Conducting the Mann Whitney test
wilcox.test(datamanwhit$Size_decrease ~ datamanwhit$Drug,
data = datamanwhit, exact = FALSE)

The following output is obtained:

Wilcoxon rank sum test with continuity correction


data: datamanwhit$Size_decrease by datamanwhit$Drug
W = 255, p-value = 0.5027
alternative hypothesis: true location shift is not equal to
0

Since the p-value (0.503) is bigger than the level of significance (0.05), we
cannot reject H0. So we have no evidence that, on average, the two drugs
differ in their effect on the reduction of tumour size.

4.3.3 The Paired Sample Wilcoxon Signed Ranks Test

The paired sample Wilcoxon Signed Ranks test is a non-parametric analogue


to the Paired-Sample t-test. It is used to test the null hypothesis that there is
no significant difference between two related samples.

Example: (Taken from https://www.r-bloggers.com/wilcoxon-signed-rank-test/,
accessed on 21/07/2017) The mayor of a city wants to see if pollution
levels are reduced by closing the streets to car traffic. This is measured by the
rate of pollution taken every 60 minutes (from 8am to 10pm: a total of 15
measurements) in a day when traffic may pass through, and in a day of closure
to traffic. The following are the values of air pollution obtained:

With traffic 214, 159, 169, 202, 103, 119, 200, 109, 132, 142, 194, 104, 219,
119, 234
Without traffic 159, 135, 141, 101, 102, 168, 62, 167, 174, 159, 66, 118, 181, 171,
112

Use a 0.05 level of significance to test whether there is a significant difference


in the amount of pollution present in a day with traffic versus a day without
traffic.

Considering the fact that observations were obtained from the same city (with
its peculiarities, weather, ventilation, etc.) albeit in two different days, the data
may be considered to be paired data. The Shapiro Wilk test applied to both
variables leads us to conclude that we are unable to assume a normal
distribution for the pollution values recorded. Thus, we must proceed with
testing using a non-parametric test, the Wilcoxon signed rank test.

The structure that should be used when entering the data should involve two
columns, the first one called WithTraffic and the second one called
WithoutTraf.

The hypothesis to be tested is:

H 0 : Pollution levels are not affected by closing streets to car traffic (Median
hourly pollution is unaltered)

H 1 : Closing streets to car traffic has significantly reduced the hourly median
pollution level

In SPSS

From the bar menu choose Analyze, select NonParametric tests, Legacy
Dialogs and the click on 2 Related-Samples. The following dialogue box will
appear:

Figure 4.3.6

Follow the instructions shown in Figure 4.3.6 and press Ok. The following
output is obtained:

Test Statisticsa
WithoutTraf -
WithTraffic
Z -3.681b
Asymp. Sig. (2-tailed) .000
a. Wilcoxon Signed Ranks Test
b. Based on positive ranks.

Table 4.3.3

Since we have a one-tailed test, the p-value is half the two-tailed
significance reported by SPSS (.000/2 ≈ 0). As this is smaller than the level
of significance (0.05), we reject H0.
So we can conclude that closing streets to car traffic has significantly
reduced the hourly median pollution level.

In R/RStudio

In R or RStudio the Wilcoxon Signed Ranks test is also conducted using the
command wilcox.test .

First we upload the data and check for normality:

datapairedwil<-read.csv( file.choose(), header = TRUE)


# this command will open a window which will allow us to look
# for and open the required data file
attach(datapairedwil)
names(datapairedwil) # list the variables in my data
str(datapairedwil) # list the structure of mydata
View(datapairedwil) # open data viewer

shapiro.test(datapairedwil$WithTraffic)#Checking normality
shapiro.test(datapairedwil$WithoutTraf)#Checking normality

We then use the Wilcoxon signed ranks test to conduct the one-tailed test
described earlier:

wilcox.test(datapairedwil$WithTraffic,
datapairedwil$WithoutTraf,alternative="greater",
paired=TRUE,exact = FALSE)

The following output is obtained:


Wilcoxon signed rank test with continuity correction

data: datapairedwil$WithTraffic and datapairedwil$WithoutT


raf
V = 170, p-value = 0.0001266
alternative hypothesis: true location shift is greater than
0

Since the p-value (0.0001266) is smaller than the level of significance (0.05),
we reject H 0 . So we can conclude that, closing streets to car traffic has
significantly reduced the hourly median pollution level.

4.3.4 The Friedman Test

The Friedman test is the non-parametric equivalent of the repeated


measures ANOVA.

Example: Suppose that 10 subjects are asked to rate 4 different wines from 0
(I hate it!) to 5 (I love it!). The results are found in the following table:

Wine 1 Wine 2 Wine 3 Wine 4


Subject 1 0 5 1 4
Subject 2 3 4 2 5
Subject 3 1 4 3 4
Subject 4 4 2 2 3
Subject 5 2 2 4 3
Subject 6 0 3 5 5
Subject 7 3 1 3 4
Subject 8 5 3 1 5
Subject 9 1 5 2 4
Subject 10 2 4 0 3

We would like to test whether the median score for each of the above wines
is the same throughout, or whether some wines are preferred more than others.
Hence we test the following hypothesis:

H 0 : The median score is the same for all wines.


H 1 : The median score differs between at least some of the wines.

In other words in the alternative hypothesis we assume that some wines are
rated better than others.

The Friedman test is the ideal test to use in this case, since the different wines
have been rated by the same 10 subjects. It thus stands to reason that the rating
that one subject gives to one particular wine is related to the rating the same
subject gives to another wine. So here we are dealing with four related
samples.

Note that if we test for multivariate normality, as we have done for
repeated measures ANOVA, the output below shows that this assumption is
satisfied and hence this data could also be analysed using repeated measures
ANOVA. Here we shall consider the Friedman test, keeping in mind that for
such data its power is less than that of the repeated measures ANOVA.

> library(energy)
> mvnorm.etest(Friedman,R=200)

Energy test of multivariate normality: estimated


parameters

data: x, sample size 10, dimension 4, replicates 200


E-statistic = 0.92531, p-value = 0.56

In SPSS

For the purpose of examining this data using SPSS, it needs to be entered into
four different columns. Each column will contain the scores given by each
subject to the different wine. So we shall call each variable Wine1, Wine2,
Wine3, Wine4, respectively.

From the bar menu choose Analyze, select NonParametric tests, Legacy
Dialogs and then click on K Related Samples. The following dialogue box
will appear:

Figure 4.3.7

Move the variables ‘Wine1’, ‘Wine2’, ‘Wine3’ and ‘Wine4’ simultaneously


into the test variable list, click on Statistics and select Descriptive. Click on
Continue and OK to run the procedure. The following tables are obtained:

Descriptive Statistics
N Mean Std. Deviation Minimum Maximum
Score for Wine1 10 2.10 1.663 0 5
Score for Wine2 10 3.30 1.337 1 5
Score for Wine3 10 2.30 1.494 0 5
Score for Wine4 10 4.00 .816 3 5

Table 4.3.4

Test Statisticsa
N 10
Chi-Square 7.979
df 3
Asymp. Sig. .046
a. Friedman Test

Table 4.3.5

The mean rating scores for wines 1, 2, 3 and 4 are respectively 2.10, 3.30, 2.30
and 4. Since the p-value for the Friedman test (0.046) is less than the level of
significance (0.05), we reject H 0 . So there is a significant difference in the
quality of the different wines. However note that the p-value is almost equal
to 0.05 and hence our conclusion is not very ‘strong’. When this happens it is
recommended to increase the sample size. In this example this would imply
asking more subjects to score the wines.

The mean rating scores suggest that wine 4 is rated higher than wines 1 and
3, but to confirm whether these differences are significant we would need to
consider post hoc tests. These tests are discussed later in section 4.4.

In R/RStudio

For the purpose of examining this data using R software it needs to be entered
into three columns:

 The first column will represent a factor with 4 levels each level
representing a different wine so we call this variable Wine.
 In the second column we have a factor with 10 levels, each level refers
to a different subject, call this variable Subject.
 The third column will contain the scores given by the subjects to each
wine and hence will be called Scores.
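
Purely for illustration, this long format could also be typed directly into R. The scores below are those from the wine table given earlier, and the object name datafriedman_sketch is used here only so as not to clash with the data file uploaded below:

# constructing the long format described above: 4 wines x 10 subjects = 40 rows
Wine    <- factor(rep(paste("Wine", 1:4), each = 10))
Subject <- factor(rep(paste("Subject", 1:10), times = 4))
Scores  <- c(0, 3, 1, 4, 2, 0, 3, 5, 1, 2,    # Wine 1, Subjects 1 to 10
             5, 4, 4, 2, 2, 3, 1, 3, 5, 4,    # Wine 2
             1, 2, 3, 2, 4, 5, 3, 1, 2, 0,    # Wine 3
             4, 5, 4, 3, 3, 5, 4, 5, 4, 3)    # Wine 4
datafriedman_sketch <- data.frame(Wine, Subject, Scores)
friedman.test(datafriedman_sketch$Scores, datafriedman_sketch$Wine,
              datafriedman_sketch$Subject)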

Uploading the data saved in .csv format:

datafriedman<-read.csv( file.choose(), header = TRUE)
# this command will open a window which will allow us
# to look for and open the required data file
attach(datafriedman)
names(datafriedman) # list the variables in my data
str(datafriedman) # list the structure of mydata
View(datafriedman) # opens data viewer

We then type:

friedman.test(datafriedman$Scores,datafriedman$Wine,
datafriedman$Subject)

The order in which we input the variables inside the friedman.test command
is important. We first need to put the response variable (in this case the score).
The second input must be the grouping variable (in this case the wine).
Finally, the third input must be the variable pertaining to the subject
(sometimes also referred to as blocks). The output for the Friedman test, in
this case, is:

Friedman rank sum test


data: datafriedman$Scores, datafriedman$Wine and
datafriedman$Subject
Friedman chi-squared = 7.9787, df = 3, p-value = 0.04645

The test statistic related to this test is the Friedman chi-squared statistic. For
the data being considered, it has a value of 7.9787. The p-value of 0.04645 is
small enough to reject the null hypothesis of no difference between groups
(that is no difference in the quality of wines) at 0.1 and 0.05 level of
significance. At a 0.01 level of significance, however, the null hypothesis is
not rejected.

If we stick to the conventional 0.05 level of significance we can conclude that


there is a significant difference between the median scores given to the
different wines. Post hoc tests can then be applied to test where the differences
actually lie. These tests are discussed later in section 4.4.

4.3.5 The Kruskal Wallis Test

The Kruskal-Wallis test is a non-parametric analogue to the One-way Anova.


It is used to test the null hypothesis that there is no significant difference in
the medians of several independent groups.

Example: (https://statistics.laerd.com/spss-tutorials/kruskal-wallis-h-test-using-spss-statistics.php)
A medical researcher has heard anecdotal evidence that certain
anti-depressive drugs can have the positive side-effect of lowering
neurological pain in those individuals with chronic, neurological back pain,
when administered in doses lower than those prescribed for depression. The
medical researcher would like to investigate this anecdotal evidence with a
study. The researcher identifies 3 well-known, anti-depressive drugs which
might have this positive side effect, and labels them Drug A, Drug B and Drug
C. The researcher then recruits a group of 60 individuals with a similar level
of back pain and randomly assigns them to one of three groups – Drug A,
Drug B or Drug C treatment groups – and prescribes the relevant drug for a 4
week period. At the end of the 4 week period, the researcher asks the
participants to rate their back pain on a scale of 1 to 10, with 10 indicating the
greatest level of pain. The recorded data is found in the table which follows:

Drug A Drug B Drug C Drug A Drug B Drug C


9 8 3 6 6 2
8 6 4 7 5 3
7 5 5 8 6 4
8 6 4 9 4 2
7 5 5 9 7 3
8 6 3 9 8 4
9 5 4 8 6 4
9 6 3 8 7 4
8 7 3 9 8 3
7 5 3 9 6 2

The researcher wants to compare the levels of pain experienced by the


different groups at the end of the drug treatment period.

The Kruskal-Wallis test is the ideal test to use in this case, since we have three
different sets of independent samples and the Shapiro-Wilk normality test
shows that the scores obtained on Drug A and Drug C are not normally
distributed.

To run a Kruskal-Wallis H test to compare the pain score between the three
drug treatments the data should be entered into two columns. The first column
which we label Pain_Score represents the dependent variable consisting of
the scores given by the patients and the second column, which we label
Drug_Treatment_Group, represents the factor with three levels, each level
represents a different drug.

In SPSS
From the bar menu choose Analyze, select NonParametric tests, Legacy
Dialogs and then click on K Independent Samples. The following dialogue box
will appear:

Figure 4.3.8

Move Pain_Score into the test variable list and then move
Drug_Treatment_Group, into the grouping variable list. Click on Define
range and write 1 for the minimum value and 3 for the maximum value.

Click on Continue and OK to run the procedure.

Test Statisticsa,b
Pain_Score
Chi-Square 47.280
df 2
Asymp. Sig. .000
a. Kruskal Wallis Test
b. Grouping Variable: Drug given
Table 4.3.4
Since the p-value (reported as .000) is smaller than the level of significance
(0.05), H0 is rejected. So there is a significant difference in the median
pain scores given by patients administered different drugs. Post hoc tests
can then be applied to test which medians are actually significantly
different.

In R/RStudio

Uploading the data saved in .csv format:

datakw<-read.csv( file.choose(), header = TRUE)


# this command will open a window which will allow us to look
# for and open the required data file
attach(datakw)
names(datakw) # list the variables in my data
str(datakw) # list the structure of mydata
View(datakw) # open data viewer

The command for running the Kruskal-Wallis test is kruskal.test. For the data
under study this is used as follows:

kruskal.test(datakw$Pain_Score,datakw$Drug_Treatment_Group)

The output for the Kruskal-Wallis test, in this case, is:

Kruskal-Wallis rank sum test

data: datakw$Pain_Score and datakw$Drug_Treatment_Group


Kruskal-Wallis chi-squared = 47.28, df = 2, p-value = 5.41e
-11

Since the p-value (5.41e-11) is very small, it is clearly smaller than the
level of significance (0.05). Thus H0 is rejected. So there is a significant
difference in the median pain scores given by patients administered different
drugs. Post hoc tests can then be applied to test which medians are actually
significantly different. Such tests are discussed in the next section.

4.4 Post-Hoc Analysis

When comparing more than two means/medians, whether we are dealing with
independent or related samples, parametric or non-parametric tests, post-hoc
analysis is the procedure of looking into the data to identify where the
differences truly lie. As a result, we can group the different samples in the
analysis into homogenous subgroups. Thus far, we have considered four tests
for comparing more than two means/medians. These are the following:

 one-way ANOVA: this is a parametric test for comparing means of


more than two independent samples;
 repeated measures ANOVA: this is a parametric test for comparing
means of more than two related samples
 Friedman test: this is the non-parametric equivalent of repeated
measures ANOVA, and is used for comparing medians of more than
two related samples
 Kruskal-Wallis test: this is the non-parametric equivalent of one-way
ANOVA, and is used for comparing medians of more than two
independent samples.

The most straightforward way to compare where the differences lie, is to
perform pairwise comparisons on all possible pairs using the two-sample
equivalent of the test in question. The two-sample equivalent for one-way
ANOVA is the independent samples t-test, for Kruskal-Wallis test is the
Mann-Whitney test, for repeated measures ANOVA is the paired samples t-
test and for Friedman test is the Wilcoxon test. This will give us an indication
of where the differences lie. However, it would be naïve to use these pairwise
comparisons individually, when the aforementioned four tests are comparing
all means simultaneously. The reason for this is the following.

To pick one particular example, suppose we are performing a one-way


ANOVA test comparing three samples, and with level of significance 0.05 -
note that this argument will hold for any of the aforementioned tests, with any
number of samples and for any level of significance, but we shall stick to this
example for illustration purposes. This means that if the hypothesis of no
difference between the means of all three samples is true, the probability that
the one-way ANOVA will not reject this hypothesis is 0.95. Now suppose
that we are performing pairwise comparisons using independent samples t-
test, each at 0.05 level of significance, assuming that for all three pairwise
comparisons the null hypothesis of no difference is correct. A mathematical
argument can be used to show that the probability that all three pairwise
comparisons simultaneously accept the hypothesis of no difference is greater
than or equal to 1-3(0.05)=0.85. This means that, under the assumption that
all three null hypotheses are correct, the probability of wrongly rejecting at
least one of them is less than or equal to 0.15, a boundary much larger than
the level of significance of 0.05 for ANOVA.
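
The figures used in this argument can be verified with a couple of lines of R (shown purely for illustration):

alpha <- 0.05    # level of significance of each individual pairwise comparison
m <- 3           # number of pairwise comparisons
1 - m * alpha    # 0.85, the lower bound used in the argument above
m * alpha        # 0.15, the upper bound on the probability of wrongly
                 # rejecting at least one true null hypothesis, well above 0.05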

In this section, we shall look at the Bonferroni approach to post-hoc analysis.
This is done as follows: if the main test (e.g. one-way ANOVA) is conducted
at the α level of significance, and the test yields m pairwise comparisons,
then we test each pairwise comparison at an α/m level of significance. Hence,
in the main test we have that if the hypothesis of no difference between
means is correct, the probability of not rejecting it is 1 − α.

On the other hand, under the assumption that the hypothesis of no difference
holds true for all m pairwise comparisons, the probability of accepting the
null hypothesis of no difference for all m pairwise comparisons is greater
than or equal to 1 − α if we test each at the α/m level of significance.
Consequently, the probability of wrongly rejecting at least one of them is
less than or equal to α. In most statistical software, what happens when the
Bonferroni method is applied is that the p-value for the individual pairwise
comparison is adjusted by being multiplied by m; if this exceeds 1, then the
p-value for the simultaneous comparison is set to 1.
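
This is, for instance, what the built-in R function p.adjust does. As a small illustration with three hypothetical unadjusted p-values and m = 3 comparisons:

p_raw <- c(0.010, 0.040, 0.600)          # hypothetical unadjusted p-values
p.adjust(p_raw, method = "bonferroni")   # gives 0.03 0.12 1.00 (capped at 1)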

The Bonferroni approach is not the only post-hoc method, but it will be the
method which we will focus on in this unit. However, it can be noticed that
the Bonferroni approach is dependent on the number of pairwise comparisons
m, and it will therefore have low power for large m. There are other
approaches, some of which differ in the p-value adjustment method, and some
of which are not dependent on m. Furthermore, some post-hoc approaches are
unique to specific tests and cannot be applied to all of them, unlike the
Bonferroni approach.

We shall now have a look at how we can perform Bonferroni post-hoc tests in
both SPSS and R.

In SPSS

For one-way ANOVA:

We go back to the flavour comparison example in the one-way ANOVA
section (section 4.2.4). We go to Analyze, Compare Means, One-Way ANOVA
and put flavour in the factor list, and score in the dependent list, then
select the Post Hoc option on the side. Then, in Figure 4.4.1, we select the
Bonferroni option and press Continue then OK.

Figure 4.4.1

The output for the Bonferroni multiple comparisons can be seen in Table
4.4.1. Since there are 3 possible pairwise comparisons (Orange-Lemon,
Orange-Cherry and Lemon-Cherry) then, in comparison to the standard t-
test, the p-value has been adjusted by being multiplied by 3. It can be seen
that, for these simultaneous multiple comparisons, only Orange and Lemon
are significantly different at 0.05 level of significance (but not at 0.01 level
of significance).

Multiple Comparisons
Dependent Variable: score
Bonferroni
Mean Difference 95% Confidence Interval
(I) flavour (J) flavour (I-J) Std. Error Sig. Lower Bound Upper Bound
Orange Lemon 4.30769* 1.67954 .046 .0645 8.5509
Cherry .50000 1.79640 1.000 -4.0385 5.0385
Lemon Orange -4.30769* 1.67954 .046 -8.5509 -.0645
Cherry -3.80769 1.76472 .116 -8.2661 .6507
Cherry Orange -.50000 1.79640 1.000 -5.0385 4.0385
Lemon 3.80769 1.76472 .116 -.6507 8.2661
*. The mean difference is significant at the 0.05 level.
Table 4.4.1
The following are the homogenous subgroups arising from the post-hoc test
at 0.05 level of significance:

Group 1: Orange, Cherry


Group 2: Lemon, Cherry

Repeated measures ANOVA:

To be able to do pairwise comparisons between months in the pain score vs


time example implemented earlier, select Analyze, General Linear Model,
Repeated Measures, Define, press on Options, move Time underneath Display
Means for, choose Compare main effects, and select Bonferroni from the list
underneath Confidence interval adjustment as shown in Figure 4.4.2. Then
press Continue and OK. The resulting output is shown in Table 4.4.2.

Figure 4.4.2

Pairwise Comparisons
Measure: Score
95% Confidence Interval for
Mean Difference Differenceb
(I) Time (J) Time (I-J) Std. Error Sig.b Lower Bound Upper Bound
Month 2 Month 4 .950* .351 .030 .073 1.827
Month 6 2.125* .347 .000 1.256 2.994
Month 4 Month 2 -.950* .351 .030 -1.827 -.073
Month 6 1.175* .370 .009 .249 2.101
Month 6 Month 2 -2.125* .347 .000 -2.994 -1.256
Month 4 -1.175* .370 .009 -2.101 -.249
Based on estimated marginal means
*. The mean difference is significant at the .05 level.
b. Adjustment for multiple comparisons: Bonferroni.

Table 4.4.2

Since all the resulting p-values are less than 0.05 after the Bonferroni
p-value adjustment, all the pairwise comparisons contributed towards a
significant difference in mean pain scores at the 0.05 level of significance,
showing that there was a change in the pain levels experienced by the
patients throughout the whole period of study. Since all levels of the time
factor are distinct, each level forms a distinct homogenous group.

Friedman test:

To perform Bonferroni post-hoc tests for the Friedman test in the previously
conducted wine example, we first go to Analyze, NonParametric tests, Related
Samples. The window displayed in Figure 4.4.3 is opened. In the tab Fields,
select Use custom field assignments and move the variables ‘Wine1’,
‘Wine2’, ‘Wine3’ and ‘Wine4’ simultaneously into the test fields list. In the
tab Settings (see Figure 4.4.4) select Customize tests, Friedman 2-Way Anova
by ranks (k samples) and make sure that the Multiple Comparisons option below
the latter test is set at All pairwise.

Figure 4.4.3

Figure 4.4.4

In the output, you get Table 4.4.3.

Table 4.4.3

To obtain a more detailed output, double click on this table in the SPSS
output file. The window in Figure 4.4.5 is opened. In the lower right corner
(see the red circle in Figure 4.4.5) there is a View option; select Pairwise
comparisons to see the results of the post hoc tests displayed in Figure
4.4.6.

Figure 4.4.6

Figure 4.4.7

From the output displayed in Figure 4.4.7, we can see that the Bonferroni
corrected p-values suggest that all medians are equal at the 0.05 level of
significance, though Wine 1 and Wine 4 are significantly different at the 0.1
level of significance. This means that the difference between Wine 1 and
Wine 4 is most likely the highest contributor to the difference between the
groups. This apparent contradiction is probably due to the fact that we are
working with small sample sizes and hence the power of the tests is weak.
Also, the p-value for the Friedman test was almost equal to 0.05, which is
the ‘grey area’ value. The only way one can improve these results is by
increasing the sample sizes. However, at the 0.05 level of significance, all
levels of the wine factor fall in one homogenous group. At a 0.1 level of
significance we have:

Group 1: Wine 1, Wine 2, Wine 3


Group 2: Wine 2, Wine 3, Wine 4

Kruskal-Wallis test:

To apply the post-hoc Bonferroni procedure to the Kruskal-Wallis test in the


previous pain score vs drug example implemented earlier, we go to Analyze,
Non-Parametric Tests, Independent Samples, put the ‘drug’ variable in
‘Group’ and the ‘pain score’ variable in ‘Test Fields’ in the Fields window
(see Figure 4.4.8), and select ‘Customise tests’ then select all pairwise
multiple comparisons for the Kruskal-Wallis 1-way ANOVA in the Settings
window (see Figure 4.4.9), then press Run.

Figure 4.4.8

Figure 4.4.9

SPSS then gives us the following output which we double-click (see Table
4.4.3). If we select ‘Multiple Comparisons’ in the ‘View’ option, we obtain
the table we require (Figure 4.4.10).

Table 4.4.3

Figure 4.4.10

It can be seen in Figure 4.4.10 that, after the Bonferroni p-value
adjustment, all 3 drugs are still significantly different from each other.
This means that at the 0.05 level of significance, and even at the 0.01 level
of significance, every drug pertains to a distinct homogenous group.

In R/RStudio

For one-way ANOVA:

Using the dataflavours variable from the one-way ANOVA example, we use
the following command for the Bonferroni approach using the pairwise t-test:
> pairwise.t.test(dataflavours$score, dataflavours$flavour,
    p.adjust.method = "bonferroni", pool.sd = TRUE, paired = FALSE,
    alternative = "two.sided")

to obtain the following output:

Pairwise comparisons using t tests with pooled SD

data: dataflavours$score and dataflavours$flavour

Orange Lemon
Lemon 0.046 -
Cherry 1.000 0.116

P value adjustment method: bonferroni

For these simultaneous multiple comparisons, only Orange and Lemon are
significantly different at 0.05 level of significance (but not at 0.01 level of
significance). The following are the homogenous subgroups arising from the
post-hoc test at 0.05 level of significance:

Group 1: Orange, Cherry


Group 2: Lemon, Cherry

For repeated measures ANOVA:

In this case, we can also apply the pairwise paired sample t-test on the
Paindatalong variable created earlier, by applying the Bonferroni approach,
as follows:

> pairwise.t.test(Paindatalong$Score, Paindatalong$Month,
    p.adjust.method = "bonferroni", paired = TRUE, alternative = "two.sided")

to obtain the following output:

Pairwise comparisons using paired t tests

data: Paindatalong$Score and Paindatalong$Month


Month 2 Month 4
Month 4 0.0299 -
Month 6 1.1e-06 0.0088

P value adjustment method: bonferroni

Since all the resulting p-values are less than 0.05, all the pairwise
comparisons contributed towards a significant difference in mean pain scores
at the 0.05 level of significance, showing that there was a change in the
pain levels experienced by the patients throughout the whole period of study.
Since all levels of the time factor are distinct, each level forms a distinct
homogenous group.

For Friedman test:

In this case, we can also apply the pairwise Wilcoxon test to the datafriedman
variable created earlier, applying the Bonferroni approach, as follows:

pairwise.wilcox.test(datafriedman$Scores, datafriedman$Wine,
    p.adjust.method = "bonferroni", paired = TRUE, alternative = "two.sided")

to obtain the following output:

Pairwise comparisons using Wilcoxon signed rank test

data: datafriedman$Scores and datafriedman$Wine

Wine 1 Wine 2 Wine 3


Wine 2 1.00 - -
Wine 3 1.00 1.00 -
Wine 4 0.12 0.96 0.12

P value adjustment method: bonferroni

From the output, we can see that the Bonferroni corrected p-values suggest
that all medians are equal at 0.05 level of significance, and even at 0.1 level
of significance.

It can be seen, though, that the differences between Wine 1 and Wine 4, and
between Wine 3 and Wine 4, are the major contributors to the significant
difference detected by the Friedman test. This is probably due to the fact
that we are working with small sample sizes and hence the power of the tests
is weak. Also, the p-value for the Friedman test was almost equal to 0.05,
which is the ‘grey area’ value. The only way one can improve these results is
by increasing the sample sizes. In this case, all wines belong to just one
homogenous group, both at the 0.05 and 0.1 levels of significance.

For Kruskal-Wallis test:

In this case, we can also apply the pairwise Mann-Whitney test to the datakw
variable, applying the Bonferroni approach, as follows:

> pairwise.wilcox.test(datakw$Pain_Score, datakw$Drug_Treatment_Group,
    p.adjust.method = "bonferroni", paired = FALSE, alternative = "two.sided")

to obtain the following output:

Pairwise comparisons using Wilcoxon rank sum test

data: datakw$Pain_Score and datakw$Drug_Treatment_Group


Drug A Drug B
Drug B 2.6e-05 -
Drug C 1.3e-07 6.3e-07

P value adjustment method: bonferroni

It can be seen that, after the Bonferroni p-value adjustment, all 3 drugs are
still significantly different from each other, even at the 0.01 level of
significance. Since all drugs perform differently, each drug forms a distinct
homogenous group.

Other post-hoc tests

We shall now discuss other post-hoc tests which can be applied to the
abovementioned tests.

First of all, the Bonferroni is not the only possible adjustment for simultaneous
multiple comparisons, and the R commands for pairwise.t.test and
pairwise.wilcox.test also allow for the following adjustments:

 Holm (1979) ("holm")


 Hochberg (1988) ("hochberg")
 Hommel (1988) ("hommel")
 Benjamini & Hochberg (1995) ("BH" or its alias "fdr")
 Benjamini & Yekutieli (2001) ("BY")

The option ("none") is also included, which does not involve any p-value
adjustments whatsoever. SPSS also implements the Sidak correction (Sidak
1967).
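
In R, the list of adjustment methods accepted by pairwise.t.test, pairwise.wilcox.test and p.adjust can be displayed as follows:

p.adjust.methods
# should list: "holm" "hochberg" "hommel" "bonferroni" "BH" "BY" "fdr" "none"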

There are also other tests specific to the individual tests. For the one-way
ANOVA test, one can also implement the Scheffe and the Tukey tests. For the
Friedman test, one can also implement the Nemenyi and the Conover tests.
For the Kruskal-Wallis test, one can also implement the Kruskal-Nemenyi,
Kruskal-Conover and the Kruskal-Dunn tests. For post-hoc analysis on
repeated measures ANOVA, we show how one can do post-hoc analysis using
the Pillai test statistic, which is used in Multivariate ANOVA (MANOVA).
To be able to use this command, the package phia is used as follows:
library(phia)

testInteractions(model, pairwise="design", idata=idata,
                 adjustment="bonferroni")

Multivariate Test: Pillai test statistic
P-value adjustment method: bonferroni
                Value Df test stat approx F num Df den Df    Pr(>F)
Month_2-Month_4 0.950  1   0.15833    7.337      1     39  0.029946 *
Month_2-Month_6 2.125  1   0.48950   37.395      1     39 1.075e-06 ***
Month_4-Month_6 1.175  1   0.20530   10.075      1     39  0.008794 **

Since all the resulting p-values are less than 0.05, all the pairwise
comparisons contributed towards a significant difference in mean pain scores,
showing that there was a change in the pain levels experienced by the
patients throughout the whole period of study. This post-hoc test can also be
applied when between-subject effects are present.

4.5 Correlation Analysis

Correlation is one of the most common forms of data analysis. Correlation is


a measure of the dependence between scalar or ordinal or rank variables. Here
we shall consider three bivariate correlation coefficients, namely Pearson’s
correlation coefficient, Spearman’s rho and Kendall’s tau.

Consider for example, height and weight of individuals. These are related -
taller people tend to be heavier than shorter people. However, this relationship
is not perfect. People of the same height vary in weight and we may easily
think of two people we know, where the shorter one is heavier than the taller
one. Nonetheless, the average weight of people of height 1.6m is less than the
average weight of people of height 1.7m and the average weight of the two is
less than that of people who are 1.8m tall, etc. Correlation can tell us just how
much of the variation in peoples' weights is related to their heights.

Choice of a particular correlation coefficient is based on the type of variables


being considered (continuous/categorical). In general, the values of the
correlation coefficients lie in a range between -1 and 1. Values closer to 1 or
-1 indicate that there is a strong relationship between the variables whilst
values closer to 0 indicate that there is little or no relationship between the
two variables. The sign of a correlation coefficient describes the type of
relationship between the variables being considered.

N.B: Note that each one of the coefficients, considered here, is sensitive to
outliers. That is, in the presence of outliers they can give misleading results.

Hence, before computing such coefficients, the variables should be screened
and any outlying values should be corrected or removed.

Pearson product-moment correlation coefficient (Pearson’s correlation, for


short) is used when exploring the linear relationship between two normally
distributed covariates. Kendall tau and Spearman’s rho correlation
coefficients are suitable alternatives to the Pearson correlation coefficient if
the two variables under study are not bivariate normally distributed.
Spearman’s rho is also used instead of Pearson when normality is satisfied but
the relationship is not linear. More detail about when each coefficient should
be used is provided below.
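
Purely as an illustration, and assuming two generic numeric vectors x and y, all three coefficients can be requested in R through the method argument of the cor.test command (each coefficient is discussed in detail in the subsections which follow):

x <- c(1.2, 2.4, 3.1, 4.8, 5.0)   # hypothetical values, for illustration only
y <- c(2.0, 2.9, 3.5, 5.1, 6.2)
cor.test(x, y, method = "pearson")    # Pearson's correlation coefficient
cor.test(x, y, method = "kendall")    # Kendall's tau
cor.test(x, y, method = "spearman")   # Spearman's rho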

4.5.1 The Pearson Correlation Coefficient

Pearson product-moment correlation coefficient is a measure of the presence,


strength and direction of a linear relationship between two covariates having
a joint bivariate normal distribution.

For example, you could use a Pearson’s correlation to understand whether


there is a linear relationship between calories and sugar content of cereals.
You could also use a Pearson's correlation to understand whether there is an
association between salary and length of unemployment.

Pearson’s correlation coefficient is an appropriate measure of the relationship


between two variables only if the variables satisfy the following assumptions:

1. The two variables have a joint bivariate normal distribution.


This assumption is needed in order to assess the statistical significance
of the Pearson correlation. Bivariate normality is a special case of
multivariate normality and hence we can make use of the multivariate
normality test which was introduced earlier when discussing repeated
measures ANOVA (section 4.2.5). Recall that this test is only available
in R through the command mvnorm.etest from the package energy.

2. A relationship exists between the variables which is linear.
Whilst there are a number of ways to check whether a linear
relationship exists between your two variables, the simplest way is by
plotting a scatterplot (as shown in Chapter 3) of one variable against
the other variable, and then visually inspecting the scatterplot to check
for linearity. Your scatterplot may look something like the ones presented
in Figure 4.5.1, where figures a, b, c, and d display linear relationships
while figures e and f display non-linear relationships.

As such, linearity is not actually an assumption of Pearson's


correlation. However, you would not normally want to pursue a
Pearson's correlation to determine the strength and direction of a
linear relationship when you already know the relationship between
your two variables is not linear. Instead, the relationship between your
two variables might be better described by another statistical measure.
For this reason, it is not uncommon to view the relationship between
your two variables in a scatterplot to see if running a Pearson's
correlation is the best choice as a measure of association or whether
another measure would be better.

(https://statistics.laerd.com/spss-tutorials/pearsons-product-moment-correlation-using-spss-statistics.php, September 2017)

Figure 4.5.1 Examples of scatter plots. The Pearson correlation
coefficient is denoted by r.

3. No outliers are present.

One of the simplest ways of checking for the presence of outliers in
the two covariates is by looking at their box plots (as explained
in Chapter 3). Further outlier diagnostics will be presented later in
Chapter 5 when discussing linear regression models.

For the Pearson correlation coefficient,


- A positive correlation coefficient indicates that there is a positive
linear relationship between the variables: as one variable increases
in value, so does the other. The closer the value is to 1 the ‘more
linear’ is the relationship. See figures a and c in Figure 4.5.1.

- A negative value indicates a negative linear relationship between


variables: as one variable increases in value, the other variable
decreases in value. The closer the value is to -1 the ‘more linear’ is
the relationship. See figures b and d in Figure 4.5.1

Example: An ice cream manufacturer wants to test whether temperatures have
an effect on ice cream sales. The following is a table of 10 daily pairs of
readings concerning temperatures and ice cream sales.

Temperature 14.2 16.2 11.9 15.2 18.5 22.1 19.4 25.1 23.4 18.1
(Celsius)
Ice-cream 2154 3256 1859 3321 4062 5221 4120 6144 5449 4211
Sales
(in Euros)

Use a 0.05 level of significance to test whether the correlation between the
two variables is significant.

We test the following hypothesis:

H0: The two variables are not linearly dependent: ρ = 0
H1: The two variables are linearly dependent: ρ ≠ 0

ρ denotes the population correlation coefficient between the two variables.

Box plots produced for both variables identified no outliers, and the
multivariate normality test conducted in R gave a p-value of 0.905; hence we
do not reject the hypothesis that our data follows a bivariate normal
distribution. Furthermore, a scatter plot indicated the presence of a linear
relationship between the two variables. Thus we can proceed to calculate the
Pearson correlation coefficient with its significance test.
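
For illustration, these assumption checks could be carried out in R along the following lines (this sketch assumes that the data has already been uploaded into the data frame datapearson used in the R section below, and that the package energy is installed):

library(energy)   # provides the mvnorm.etest command used earlier in this chapter

# screening for outliers
boxplot(datapearson$Temperature)
boxplot(datapearson$Ice_Cream_Sales)

# checking for a linear relationship
plot(datapearson$Temperature, datapearson$Ice_Cream_Sales)

# testing for bivariate (multivariate) normality
mvnorm.etest(cbind(datapearson$Temperature,
                   datapearson$Ice_Cream_Sales), R = 999)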

In SPSS

To perform correlation analysis using SPSS, it is required to input the two


variables into separate adjacent columns in the Data View.

Then go to Analyze, Correlate, Bivariate, move the two variables underneath
Variables, select Pearson from underneath Correlation Coefficients (as
shown in Figure 4.5.2) and press OK.

Figure 4.5.2

The resulting output is as follows:

Correlations
                                       Temperature  Ice_Cream_Sales
Temperature      Pearson Correlation   1            .985**
                 Sig. (2-tailed)                    .000
                 N                     10           10
Ice_Cream_Sales  Pearson Correlation   .985**       1
                 Sig. (2-tailed)       .000
                 N                     10           10
**. Correlation is significant at the 0.01 level (2-tailed).

Table: 4.5.1

From Table 4.5.1, the Pearson correlation coefficient is 0.985 and the resulting
p-value is 0.000 which is less than 0.05 meaning that the resulting correlation
is significant. So there is a strong, positive linear relationship between the
two variables.

In R/RStudio

The Pearson correlation coefficient, together with the hypothesis test of
significance, is obtained by means of the function ‘rcorr’ found in the
package ‘Hmisc’. The following commands were used to find the Pearson
correlation coefficient for the two variables being considered in this study:

library(Hmisc) # load the package which provides the rcorr function

datapearson<-read.csv( file.choose(), header = TRUE)
# this command will open a window which will allow us to
# look for and open the required data file
attach(datapearson)
names(datapearson) # list the variables in my data
str(datapearson)   # list the structure of my data

rcorr(datapearson$Temperature, datapearson$Ice_Cream_Sales,
      type="pearson") # type can be pearson or spearman

The following output is obtained:

x y
x 1.00 0.98
y 0.98 1.00

n= 10

P
x y
x 0
y 0

From the above output we note that, for the variables under study, the
Pearson correlation coefficient is 0.98, which is close to 1. The p-value of
0 is less than 0.05, hence the coefficient is significantly different from
zero. We conclude that the variables are strongly and positively linearly
related.
4.5.2 The Kendall Rank Correlation Coefficient

The Kendall rank correlation coefficient, commonly referred to as Kendall's


tau (τ) coefficient, is a statistic that is used to measure and test for any
association between two ordinal variables. This test is non-parametric as it
does not rely on any assumptions on the distributions of the two variables
under study. In particular, the Kendall tau correlation coefficient is a suitable
alternative to the Pearson correlation coefficient if the two variables under
study are not bivariate normally distributed.

This correlation coefficient was developed to cater for two different scenarios:
 as a measure of agreement for the same subject/object
 as a measure of agreement between different subjects/objects

In the first instance, suppose that a trainee was asked to arrange (rank) a
number of objects during quality control. Suppose that this same trainee was
asked to arrange (rank) the same objects, the following day. The Kendall
correlation coefficient will provide a measure of agreement on the two sets of
ranks given by the same person. This coefficient may thus be used as a
measure of repeatability (reliability) on the quality of judgement of an
individual.

In the second instance, suppose that the ranks obtained by the trainee on the
first day, have to be compared to the ranks given by an experienced employee.
The Kendall correlation coefficient will in this case provide a measure of
agreement between two different persons. It will give a measure on the
similarity of judgment of different individuals.

Kendall’s correlation coefficient can take values from -1 to 1, both inclusive.


The larger the correlation the stronger the agreement between the scores of
the two variables. A value of 1 indicates perfect agreement, a value of 0
indicates no agreement and a value of -1 indicates perfect disagreement.

Example: Two interviewers ranked 10 candidates according to their
suitability to fill in a vacancy within the company, with a rank of 1 meaning
the most suitable candidate and a rank of 10 meaning the least suitable
candidate. The ranks given are shown in the table which follows.

Interviewer 1 1 3 5 9 7 10 8 4 2 6
Interviewer 2 3 2 6 10 7 8 9 5 1 4

Use a 0.05 level of significance to test whether the two variables are
independent.

Since the data being considered in this example is rank data, testing whether
the two variables are related or not should be carried out using the Kendall tau
correlation coefficient. No bivariate normality testing is needed here. Pearson
correlation coefficient should not be used with rank data. We test the
following hypothesis:

H0: X and Y are independent: τ = 0
H1: X and Y are not independent: τ ≠ 0

In SPSS

To be able to use the Kendall tau-b correlation coefficient in SPSS, enter one
column for each of the two variables in the data. Then go to Analyze,
Correlate, Bivariate, move the two variables underneath Variables, select
Kendall’s tau-b from underneath Correlation Coefficients (as shown in Figure
4.5.3) and press OK.

Figure: 4.5.3

The resulting output is as follows:

Correlations
Interviewer_1 Interviewer_2
Kendall's tau_b Interviewer_1 Correlation Coefficient 1.000 .733**
Sig. (2-tailed) . .003
N 10 10
Interviewer_2 Correlation Coefficient .733** 1.000
Sig. (2-tailed) .003 .
N 10 10
**. Correlation is significant at the 0.01 level (2-tailed).

Table:4.5.2

From Table 4.5.2 Kendall tau-b correlation coefficient is 0.733 and the
resulting p-value is 0.003 which is less than 0.05 meaning that the resulting
correlation is significant. So there is agreement in the rankings (similar ranks)
given by the two interviewers.

In R/RStudio

The following commands were used to find the Kendall correlation coefficient
for the two variables being considered in this study:

dataKendall<-read.csv( file.choose(), header = TRUE)


# this command will open a window which will allow us to look
# for and open the required data file

attach(dataKendall)
names(dataKendall) # list the variables in my data
str(dataKendall) # list the structure of mydata
View(dataKendall) # open data viewer

cor.test(Interviewer.1, Interviewer.2, method="kendall")

Kendall's rank correlation tau


data: Interviewer.1 and Interviewer.2
T = 39, p-value = 0.002213
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau
0.7333333

From the resulting output, the Kendall tau correlation coefficient is 0.733
and the resulting p-value is 0.002, which is less than 0.05, meaning that the
resulting correlation is significant. So there is agreement in the rankings
(similar ranks) given by the two interviewers.

4.5.3 The Spearman Correlation Coefficient

The Spearman rank correlation coefficient, commonly referred to as


Spearman's rho (ρ) coefficient, is a statistic that is used to measure the rank
correlation (statistical dependence between the rankings of two variables).
More specifically Spearman's correlation determines the degree to which the
relationship between the two variables can be described using a monotonic
function.

A monotonic function is a function that is either entirely non-decreasing or
entirely non-increasing: it does not have to increase (or decrease) at every
point, but it must never change direction. See Figures 4.5.4 to 4.5.6, which
are taken from https://en.wikipedia.org/wiki/Monotonic_function.

Figure 4.5.4 : A monotonically increasing function. It is strictly increasing


on the left and right while just monotonic (unchanging) in the middle.

Figure 4.5.5 : A monotonically decreasing function.

Figure 4.5.6 : A function which is not monotonic.

Note that: A linear function is a monotonic function.

Like the Kendall tau, this test is non-parametric as it does not rely on any
assumptions on the distributions of the two variables under study. Hence it is
a suitable alternative to the Pearson correlation coefficient if the two variables
under study are ordinal variables or covariates for which the assumption of
bivariate normality is not satisfied.

One would not normally want to pursue a Spearman's correlation to determine
the strength and direction of a monotonic relationship if one already knows
that the relationship between the two variables is not monotonic. Hence,
prior to computing this correlation coefficient, it is recommended to plot a
scatter plot of one variable against the other (see Chapter 3), and then
visually inspect the scatterplot to check for monotonicity.
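
The following small sketch, using artificial data, illustrates the point: for a strictly increasing but clearly non-linear relationship, Spearman's rho equals exactly 1 while Pearson's r is noticeably smaller.

x <- 1:20
y <- exp(x)                      # strictly increasing but far from linear
cor(x, y, method = "pearson")    # noticeably smaller than 1
cor(x, y, method = "spearman")   # exactly 1, since the relationship is monotonic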

Example: Consider once again the example presented in the Kendall tau
section. This time we shall repeat the correlation analysis using Spearman
correlation. Given that we already know that the assumption of bivariate
normality is not satisfied, we move on to checking monotonicity by plotting a
scatter plot of the two variables.

Figure 4.5.7

The scatter plot in Figure 4.5.7 indicates that the relationship between the two
variables is linear hence monotonic. Next we test the following hypothesis:

H0: X and Y are independent (ρ = 0)

H1: A relationship exists between X and Y which can be modeled by some
monotonic function (ρ ≠ 0)

In SPSS

To be able to use the Spearman correlation coefficient in SPSS, enter one
column for each of the two variables in the data. Then go to Analyze,
Correlate, Bivariate, move the two variables underneath Variables, select
Spearman from underneath Correlation Coefficients (as shown in Figure
4.5.8) and press OK.

Figure 4.5.8

The resulting output is as follows:

Correlations

                                                        Interviewer_1   Interviewer_2
Spearman's rho   Interviewer_1  Correlation Coefficient     1.000           .891**
                                Sig. (2-tailed)                 .            .001
                                N                              10              10
                 Interviewer_2  Correlation Coefficient     .891**          1.000
                                Sig. (2-tailed)              .001               .
                                N                              10              10
**. Correlation is significant at the 0.01 level (2-tailed).

Table: 4.5.3

From Table 4.5.3, the Spearman correlation coefficient is 0.891 and the
resulting p-value is 0.001, which is less than 0.05, meaning that the
correlation is significant. So there is a monotonic relationship between the
rankings given by the two interviewers.

As you can see, the conclusion here is different from that made when looking
at the Kendall tau correlation coefficient, but this is not surprising since one
is measuring agreement and the other is measuring the presence of a monotonic
relationship. A monotonic relationship can exist even when there is no
agreement.

In R/RStudio

The following commands were used to find the Spearman correlation coefficient
for the two variables being considered in this study:

dataSpearman <- read.csv(file.choose(), header = TRUE)
# this command will open a window which will allow us to look
# for and open the required data file

attach(dataSpearman)
names(dataSpearman)   # list the variables in the data
str(dataSpearman)     # list the structure of the data
View(dataSpearman)    # open the data viewer

cor.test(Interviewer.1, Interviewer.2, method = "spearman")

The following result is obtained:

Spearman's rank correlation rho


data: Interviewer.1 and Interviewer.2
S = 18, p-value = 0.00138
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.8909091

From the resulting output, the Spearman correlation coefficient is 0.891 and
the resulting p-value is 0.001, which is less than 0.05, meaning that the
correlation is significant. So there is a monotonic relationship between the
rankings given by the two interviewers. As you can see, the conclusion here is
different from that made when looking at the Kendall tau correlation
coefficient, but this is not surprising since one is measuring agreement and
the other is measuring the presence of a monotonic relationship. A monotonic
relationship can exist even when there is no agreement.
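
As a side check, which is not part of the original analysis, the S statistic
reported by R is the sum of the squared rank differences, so the rho value
above can be recovered from the usual textbook formula for untied ranks:

# rho = 1 - 6*S / (n*(n^2 - 1)) for n untied ranks, where S = sum(d^2)
S <- 18
n <- 10
1 - (6 * S) / (n * (n^2 - 1))    # = 0.8909091, the rho reported above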
4.5.4 The Pearson Partial Correlation Coefficient

A popular variation of the Pearson correlation is the partial correlation. The
latter is useful when we wish to look at the relationship between two variables
while removing the effect of one or more other variables.

For example, suppose we have data on the height, weight and age of a number
of individuals and we find that there is correlation between the variables
weight, height and age. Now suppose we are interested in the relationship
between height and weight when the effect of age is eliminated. We might
suspect that the correlation between weight and height might be due to the fact
that both weight and height are correlated with age and not because there truly
exists a relationship between height and weight. Thus we can use a partial
correlation coefficient to eliminate the effect of age from the two variables
and see if they are still correlated.
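
One way to make this idea concrete is to note that the partial correlation of
height and weight given age is simply the ordinary Pearson correlation between
the parts of height and weight that remain after the linear effect of age has
been removed. The R sketch below illustrates this with simulated data; the
variable names and numbers are hypothetical and are not taken from the text.

set.seed(1)
age    <- runif(50, 5, 18)                    # hypothetical ages
height <- 80 + 5 * age + rnorm(50, sd = 6)    # height driven largely by age
weight <- 10 + 3 * age + rnorm(50, sd = 5)    # weight driven largely by age

cor(height, weight)                 # sizeable correlation, mostly due to age

res_h <- resid(lm(height ~ age))    # height with the linear effect of age removed
res_w <- resid(lm(weight ~ age))    # weight with the linear effect of age removed
cor(res_h, res_w)                   # partial correlation of height and weight given age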

The Pearson partial correlation should be used when:

 The variables under study have a joint multivariate normal distribution
and are linearly related.
 The interest is in the association between two variables while factoring
out the effect of the other variable(s).
 There are no significant outliers in any of the variables.

Example: A nutritionist is conducting a study to see how people choose
breakfast cereal. A sample of 20 cereals was selected at random. The table
which follows displays the grams of Carbohydrates (Carbs) and Sugars per
100g serving of each cereal and the values of the variable "Rating"1 that were
calculated using Consumer Reports.

1 The higher the rating, the more popular the cereal.

Carbs 65 70 55 50 100 110 110 120
Sugars 7.20 9.16 8.56 6.45 10.78 14.50 12.73 14.23
Rating 58.14 68.40 66.73 93.70 39.70 22.74 22.40 21.87
Carbs 120 50 90 131 120 116 80 90
Sugars 12.26 5.52 9.712 13.31 15.10 12.37 11.58 13.64
Rating 28.04 53.31 45.81 19.82 39.26 23.80 68.24 74.47
Carbs 90 88 141 142
Sugars 11.89 10.11 17.72 14.38
Rating 72.80 89.24 29.24 39.10
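
For readers who wish to reproduce this example in R without a separate data
file, the table above can be entered directly as a data frame. This is a
sketch; the variable names are chosen to match those used in the R commands
later in this section:

datapartial <- data.frame(
  Carbohydrates = c(65, 70, 55, 50, 100, 110, 110, 120, 120, 50,
                    90, 131, 120, 116, 80, 90, 90, 88, 141, 142),
  Sugars = c(7.20, 9.16, 8.56, 6.45, 10.78, 14.50, 12.73, 14.23, 12.26, 5.52,
             9.712, 13.31, 15.10, 12.37, 11.58, 13.64, 11.89, 10.11, 17.72, 14.38),
  Rating = c(58.14, 68.40, 66.73, 93.70, 39.70, 22.74, 22.40, 21.87, 28.04, 53.31,
             45.81, 19.82, 39.26, 23.80, 68.24, 74.47, 72.80, 89.24, 29.24, 39.10)
)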

Test the hypothesis,

H0: There is no correlation between rating and carbohydrates after
controlling for sugar.

H1: There is a correlation between rating and carbohydrates after
controlling for sugar.

The multivariate test for normality introduced earlier when discussing Pearson
correlation coefficients was conducted and it confirmed that the variables
under study can be assumed to follow a multivariate normal distribution.
Pearson correlation coefficients were computed for each pair of variables and
were found to be significant. Their values are reported in the table below:

Pearson Correlation Coefficients

                 Carbohydrates    Rating    Sugars
Carbohydrates         1           -.744      .898
Rating              -.744           1       -.585
Sugars               .898         -.585       1

Table: 4.5.4

In SPSS

To be able to use the Pearson partial correlation coefficient in SPSS, enter one
column for each of the three variables in the data. Then go to Analyze,
Correlate, Partial, move the two variables of interest underneath Variables,
and the variable you want to control for underneath Controlling for (as shown
in Figure 4.5.9) and press OK.

Figure 4.5.9

The resulting output is as follows:

Correlations

Control Variables                                       Rating    Carbohydrates
Sugars   Rating         Correlation                      1.000        -.612
                        Significance (2-tailed)              .         .005
                        df                                   0           17
         Carbohydrates  Correlation                      -.612         1.000
                        Significance (2-tailed)           .005             .
                        df                                  17            0

Table: 4.5.5

Note that from Table 4.5.4, the Pearson correlation that was obtained for
rating and carbohydrates, when not controlling for sugar, was -0.744. Recall
that the partial coefficient gives us a measure of the association between
rating and carbohydrates while removing the association between sugar and
carbohydrates and between sugar and rating. From Table 4.5.5 we note that the
partial Pearson correlation coefficient for rating and carbohydrates is -0.612
with a p-value of 0.005, which is less than 0.05, hence the coefficient is
still significant, showing that the linear relationship between these two
variables is not merely a result of the influence of sugar.
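
It is perhaps worth noting (this check is not in the original text) that this
first-order partial correlation can also be computed by hand from the three
pairwise Pearson correlations reported in Table 4.5.4:

# Standard formula: r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
r_cr <- -0.744    # Carbohydrates and Rating
r_cs <-  0.898    # Carbohydrates and Sugars
r_rs <- -0.585    # Rating and Sugars

(r_cr - r_cs * r_rs) / sqrt((1 - r_cs^2) * (1 - r_rs^2))
# approximately -0.61, in line with the partial correlation of -0.612 in
# Table 4.5.5 (small differences are due to rounding of the inputs)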

In R/RStudio

In R/RStudio, partial correlation coefficients are computed using the pcor.test
command, which is found in the package ppcor.
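
If the ppcor package is not already installed on your machine (an assumption
about the reader's setup), it can be installed once from CRAN and then loaded
in each session:

install.packages("ppcor")   # only needed the first time
library(ppcor)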

The following commands were used to find the partial Pearson correlation
coefficient for the three variables being considered in this study:

datapartial <- read.csv(file.choose(), header = TRUE)
# this command will open a window which will allow us to look
# for and open the required data file

library(ppcor)
attach(datapartial)
names(datapartial)   # list the variables in the data
str(datapartial)     # list the structure of the data
View(datapartial)    # open the data viewer

pcor.test(Carbohydrates, Rating, Sugars, method = "pearson")

The following output is obtained:

   estimate     p.value statistic  n gp  Method
1 -0.612051 0.005349426 -3.191063 20  1 pearson

Note that from Table 4.5.4, the Pearson correlation that was obtained for
rating and carbohydrates, when not controlling for sugar, was -0.744.

Recall that the partial coefficient gives us a measure of the association
between rating and carbohydrates while removing the association between
sugar and carbohydrates and between sugar and rating. From Table 4.5.5 we note
that the partial Pearson correlation coefficient for rating and carbohydrates
is -0.612 with a p-value of 0.005, which is less than 0.05, hence the
coefficient is still significant, showing that the linear relationship between
these two variables is not merely a result of the influence of sugar.

