
Non-parametric Tests

1. Chi-Square (χ2)
2. Mann-Whitney U-test (for independent samples)
3. Wilcoxon Ranked Sign Test (for related samples)
4. Kruskal – Wallis Test
5. Friedman test (for related samples)
Non-parametric Tests
• Researchers most often prefer to use a parametric test rather than a
nonparametric test.
• This is partly because a parametric test is more powerful (it is better
for finding an effect when it really is there).
• Second, we are able to calculate means and standard deviations,
which provide a nice clear summary of the data.
• There are a number of reasons for deciding that it is not appropriate
to use a parametric test, particularly when one or more of the
assumptions of a parametric test is violated.
• If you know that the population from which you drew the data is not
normally distributed or you know that the data is not from a
continuous variable then you would undertake a nonparametric test.
• For example, if a teacher is rating students on effort on a 100-point
scale and only rates the children between either 0 and 40 or 60 and
100 we can see that the variable is not continuous as the middle
range is not being used and certainly does not look to be normally
distributed.
1. Chi-Square (χ2)
What is Chi-Square?
• The Chi-square test is used to test whether there is a
significant difference between the observed number of
responses in each category and the expected number of
responses to such categories under the assumptions of null
hypothesis.
• So, its objective is to find out how well the distribution of
observed frequencies (fo) fits the distribution of expected
frequencies (fe). This is also called a goodness-of-fit test.
• The Chi-square test applies only to discrete data (discrete
variables are those expressed in frequency counts).
• The test is based upon the concept of independence or the
idea is that one variable is not affected by, or related to
another.
What is Chi-Square?
• For example, in a test in which 90 individuals are given a
blindfold test in order to determine their selection of the best
of three brands of soft drink, the results could be interpreted
by the Chi-square test.
• If there were no significant differences in preference we
would expect the individual to choose Brand A, Brand B or
Brand C in about equal proportion.
• On the basis of a null hypothesis any variation could
possibly be attributed to sampling error.
• We would hypothesize that the choices were independent,
or not related to any factor other than probability.
• The test would provide a method of testing the difference
between actual preferences and choices based upon a
probability assumption.
• The Chi-square formula: χ² = Σ (fo − fe)² / fe, where fo is the observed
frequency and fe the expected frequency in each category.
Chi-Square Tests
• Chi-square one-sample test

      SA    A    N    D    SD
      18    7    9    5    20

• Chi-square two-sample case

               Yes   No
      Male      34   38
      Female    41   26

• Chi-square k-sample case (χ2 test of homogeneity)

                Favour   Against   Neutral
      Year I        43        62        21
      Year II       56        35        15
      Year III      37        51        18
      Year IV       49        23        20
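• As an illustration (a sketch in Python with scipy, which is not used in the original SPSS-based slides), the three tables above can be analysed as follows; chi2_contingency with correction=False should correspond to the Pearson Chi-Square value that SPSS reports.

```python
import numpy as np
from scipy.stats import chisquare, chi2_contingency

# One-sample case: observed counts for SA, A, N, D, SD.
# With no expected frequencies supplied, chisquare() assumes all
# categories are expected to be equal (here 59 / 5 = 11.8 per category).
observed = np.array([18, 7, 9, 5, 20])
chi2, p = chisquare(observed)
print(f"one-sample: chi2 = {chi2:.3f}, p = {p:.4f}")

# Two-sample case: the 2 x 2 table of Yes/No responses by Male/Female.
table = np.array([[34, 38],
                  [41, 26]])
chi2, p, df, expected = chi2_contingency(table, correction=False)
print(f"two-sample: chi2 = {chi2:.3f}, df = {df}, p = {p:.4f}")

# The k-sample (homogeneity) case uses the same call on the 4 x 3 table.
years = np.array([[43, 62, 21],
                  [56, 35, 15],
                  [37, 51, 18],
                  [49, 23, 20]])
chi2, p, df, expected = chi2_contingency(years)
print(f"k-sample: chi2 = {chi2:.3f}, df = {df}, p = {p:.4f}")
```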
Chi-Square Tests
• The first step is to derive the hypothesis (the alternative hypothesis—H1) we
wish to test and the second, related step, is to then state our corresponding null-
hypothesis (H0).
• In this case the hypotheses are:
– H0: There is no difference between male and female students in terms of academic dismissal.
– H1: There is a difference between male and female students in terms of academic dismissal.
• As can be seen, we are dealing with a non-directional, two-tailed hypothesis in
this case, as we are not providing any indication of what the differences are
likely to be (i.e. is it male students who are more likely to be dismissed than
female students or vice versa?).
• The third step is to identify what two variables we are dealing with. Clearly one
variable relates to the students’ sex and the other to whether they have been
dismissed or not.
• The first variable, “GENDER”, is clearly a nominal variable.
• As for the second variable, as it only has two categories (i.e. students are
recorded as either “academically dismissed” or “promoted” at the end of the
semester), we also treat this as a nominal variable.
• Having identified our two variables the fourth step is to choose an appropriate
statistical test to use.
• As we can see that as we are dealing with two nominal variables, we should use
the Chi-Square test. The fifth step is to actually run the Chi-Square test.
Chi-Square Tests
• When we look at the output, we can see that underneath the cross
tabulation table we have a new box with a number of test statistics in it
(‘Pearson Chi-Square’, ‘Likelihood Ratio’, and ‘Linear-by-Linear
Association’).
• This, as will often be the case with SPSS output, is more information
than we really need. The one we want to look at is the first one,
‘Pearson Chi-Square’. As mentioned above, we are not that interested in
the actual statistics (the first column) and the same goes for the second
column (df).
• The column we want to look at is the third one, labelled ‘Asymp. Sig.’
(asymptotic significance). This is the p-value.
• That means that, using the 0.05 cut-off point, we can judge whether our
difference is statistically significant: if the p-value is below 0.05, we
treat the difference as significant.
• We will need to include the chi square statistic and df when we write up
our results, so other researchers can replicate our findings. We would
report our findings a bit like this:
• ‘A significant difference was found in academic dismissals of boys and
girls (chi square = 14.81, df = 3, p = 0.002)’.
Conditions for Chi- square test
• We can only use the chi square test if the following
conditions are met:
• 1. The two variables we are looking at have to be nominal or
ordinal, not continuous.
• 2. No cell should have an expected value of less than one.
We can find this out by looking at the expected values in the
cross tabulation table. None of these should be less than
one.
• 3. No more than 20 per cent of the cells should have
expected values less than five.
• We can find this out by counting the number of cells with an
expected value of less than five in the cross tabulation table
(e.g. 10) and seeing what percentage that is of all cells (e.g.
20 cells in total would make 50 per cent so we should not
use the chi square test in this example).
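• These two checks can be made directly on the expected counts. A small Python/scipy sketch (illustrative, not from the original slides) is shown below, where the observed table can be any contingency table of counts.

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[34, 38],
                     [41, 26]])             # any observed contingency table
chi2, p, df, expected = chi2_contingency(observed)

# Condition 2: no cell should have an expected value of less than one.
all_at_least_one = bool((expected >= 1).all())

# Condition 3: no more than 20 per cent of cells should have expected
# values of less than five.
pct_below_five = 100 * (expected < 5).sum() / expected.size

print(expected.round(1))
print("all expected counts >= 1:", all_at_least_one)
print(f"cells with expected < 5: {pct_below_five:.0f}% (should be <= 20%)")
```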
Cross tabulation
• How do we know which variable to put in the rows and
which in the columns?
• As a rule, we will put the independent variable in the
columns and the dependent variable in the rows.
• The dependent variable is our outcome variable, the one we
are predicting, or the effect.
• The independent variable is the predictor or the cause.
• So, for example, if we were looking at the relationship
between gender and body image, gender would be seen as
the cause and put in the columns, and body image as the
effect (dependent) and put in the rows. (It would not make
sense to hypothesise that body image ‘caused or predicted’
gender!)
Cross tabulation
• In some cases we do not have a hypothesis that can help us
determine which variable is dependent and which is
independent (e.g. when looking at the relationship between
gender and ethnicity among my students).
• In that case it does not matter which variable we put in the
columns and which in the rows.
• Even when we can clearly distinguish dependent and
independent variables, one thing to point out is that it makes
no difference to the calculation of the cross tabulation which
is which.
• Therefore this is not something you should get too worried
about. This is merely a convention, so feel free to break it if
you want.
Data Entry for Chi-Square Test
• There are two methods for entering data for a crosstabulation and chi-square.
• Data Entry Method 1
• Each person can be entered separately, using the procedure to code the
categories appropriately as we will be using nominal data.
• Data Entry Method 2
• Total frequency counts can be entered by Weight Cases.
• This procedure is particularly useful if you have a large dataset.
• Each combination of scores is entered along with the total frequency count for
that group.
• In order for SPSS to associate the frequency counts with the variables, the
following procedure should be used.
• Go to Data and Weight Cases
• The variable we want to weight the categories by is frequency, therefore select
the Weight Cases by option and send the variable ‘freq’ over to the Frequency
Variable box.
• Click on OK.
• You can be sure that the procedure has been carried out successfully as in the
bottom right-hand corner of the screen a message saying ‘Weight On’ will be
displayed.
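• A rough analogue of the Weight Cases approach outside SPSS (a Python/pandas sketch with invented column names, not the SPSS procedure itself) is to keep one row per combination of categories together with its frequency count, and to aggregate with that count.

```python
import pandas as pd

# Data Entry Method 2: one row per combination of categories plus the
# total frequency count for that group.
data = pd.DataFrame({
    "gender":    ["male", "male", "female", "female"],
    "dismissed": ["yes",  "no",   "yes",    "no"],
    "freq":      [34,      38,     41,       26],
})

# Summing the 'freq' column plays the role of SPSS's "Weight On" setting.
table = pd.crosstab(data["gender"], data["dismissed"],
                    values=data["freq"], aggfunc="sum")
print(table)
```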
Steps in calculating Chi-Square on SPSS
• Open the dataset and then select Analyze → Descriptive
Statistics → Crosstabs. . . .
• This brings up the main Crosstabs window.
• 1. Place “dependent variable” in rows and “independent
variable” in columns using the arrow buttons.
• 2. Click on the “Cells . . .” button and select “Row Percentages” to
allow comparison of the proportions of male and female
students.
• 3. After this, click on the “Statistics . . .” button to open up
the second “Crosstabs: Statistics” window.
• 4. Select “Chi-square” and also “Phi and Cramer’s V” and
then click “Continue” to return to the main window. Click
“OK” to complete the procedure.
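• The same five steps can be mirrored outside SPSS. The sketch below (Python/pandas with made-up data) builds the cross-tabulation with the dependent variable in the rows, adds row percentages, and runs the chi-square test.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Illustrative raw data: one row per student, two nominal variables.
df = pd.DataFrame({
    "dismissed": ["yes", "no", "no", "yes", "no", "yes", "no", "no"],
    "gender":    ["m",   "m",  "f",  "f",   "m",  "f",   "f",  "m"],
})

# Step 1: dependent variable in the rows, independent variable in the columns.
table = pd.crosstab(df["dismissed"], df["gender"])

# Step 2: row percentages (each row divided by its row total).
row_pct = table.div(table.sum(axis=1), axis=0) * 100

# Step 5: the chi-square test itself.
chi2, p, dof, expected = chi2_contingency(table)
print(table, row_pct.round(1), sep="\n\n")
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.3f}")
```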
Chi-square as a ‘goodness of fit’ test
• As mentioned in the introduction, chi-square can also be
used as a ‘goodness of fit’ test.
• This assesses whether the pattern of results matches (or
fits) a predicted pattern.
• From the drop-down menu, choose Analyze,
Nonparametric Tests, then Chi-Square.
• The test variable should be sent to the Test Variable List.
• The variable ‘freq’ can be left out because, if the data have been
entered via the weight cases method, it is no longer needed.
• As we are testing our null hypothesis that the numbers in
each category will be equal, we can leave the Expected
Values box set to the default, All categories equal.
Chi-square as a ‘goodness of fit’ test
• SPSS Output
• The first table produced by SPSS shows the observed and expected
counts for the categories given.
• The Observed counts show us how many people belong to each
category.
• The Expected counts confirm how many people we would expect to
belong to each category if the null hypothesis was true. We can see
from the table whether or not the observed and expected counts are
quite different from each other. This is confirmed by inspection of the
Residual column.
• In order to assess whether the distribution found through data collection
differs from the expected pattern, the Test Statistics table needs to be
examined.
• Suppose we see from the table χ2 = 34.320, df = 3, p < 0.001. We can
conclude that there is a significant difference between the observed and
expected counts, and we can therefore reject the null hypothesis; the
counts are not equally distributed.
• The findings can be expressed as follows:
• χ2 = 34.320, df = 3, p < 0.001
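• A minimal scipy sketch of the same goodness-of-fit test (with invented counts for four categories) that also reproduces the observed, expected and residual columns from the first SPSS table:

```python
import numpy as np
from scipy.stats import chisquare

observed = np.array([45, 25, 18, 12])        # invented counts, four categories
expected = np.full(4, observed.sum() / 4)    # "All categories equal"
residual = observed - expected               # the Residual column in SPSS

chi2, p = chisquare(observed, f_exp=expected)
print("expected:", expected, " residual:", residual)
print(f"chi2 = {chi2:.3f}, df = {len(observed) - 1}, p = {p:.4f}")
```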
Interpreting the output
• There are three pieces of information we need to draw from
the output.
• The first relates to the need to check that the assumptions
upon which the Chi-Square test are based have been met.
Basically, we should only use the Chi-Square test if two
conditions are met:
– no more than 20 percent of cells in the contingency table should have expected values less than five;
– no cell has an expected value of less than one.
• The actual test statistic calculated for this particular test is
the Pearson Chi-Square statistic. For the Chi-Square test
we also need to quote the degrees of freedom associated
with this statistic and this is also given in the output.
• The final and most important piece of information is the
probability that is also provided in the output as indicated
under the column headed “Asymp. Sig.”.
The measure of effect size: phi
• The other question that the chi square tests did not answer was how
strong the relationship is.
• You might think we could answer this question by looking at the p-value.
The more significant, the lower the p-value and therefore the stronger
the relationship.
• This is not correct, however. The significance level is only partly
determined by the strength of the relationship. It is equally determined
by sample size.
• Therefore, we need a different measure to look at the strength of the
relationship, or the effect size.
• For a contingency table that only has four cells (i.e. only two categories
for each of the two variables) we use Phi as the effect size measure.
When dealing with Phi, we can ignore the minus sign.
• For any tables larger than this we would use Cramer’s V.
• If you are calculating the chi-square value manually, these measures of
the strength of the relationship are also easy to compute.
• The effect size for the chi-square test, phi, is calculated by taking the
square root of the chi-square value divided by the overall sample size:
φ = √(χ²/N).
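• A short Python sketch of these effect-size calculations (phi for a 2 × 2 table and Cramér's V for larger tables); the table values are illustrative, not taken from the slides.

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[34, 38],
                  [41, 26]])            # illustrative 2 x 2 contingency table
chi2, p, df, expected = chi2_contingency(table, correction=False)

n = table.sum()                         # overall sample size
phi = np.sqrt(chi2 / n)                 # effect size for a 2 x 2 table

# Cramer's V generalises phi to larger tables:
k = min(table.shape) - 1                # smaller of (rows - 1) and (columns - 1)
cramers_v = np.sqrt(chi2 / (n * k))

print(f"phi = {phi:.3f}, Cramer's V = {cramers_v:.3f}")
```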
2. Mann-Whitney U-test
(for independent samples)
Mann–Whitney U test
• When we wish to compare two samples, the test of
choice is usually the t test.
• However, the t test is a parametric test which makes
certain assumptions about our data.
• When these assumptions are not met the t test may not
be appropriate so we perform nonparametric tests which
do not require these assumptions.
• A common occasion when we might use a nonparametric
test is when our data is ordinal rather than interval.
• When we produce ratings as our data, the data may not
be interval.
• For example, if we are asked to judge the politeness of
the children in a class or to rate the management potential
of the staff, we use ratings.
Mann–Whitney U test
• Unlike clocks and speedometers, we do not necessarily rate on an
interval scale.
• We may rate all the children very highly on a 0 to 10
point scale and never score a child below 7.
• Similarly, we may rate one staff member as one point
higher than another but believe that there is a big
difference between them and rate two others quite
differently despite seeing them as similar.
• With ratings we cannot trust that the data is from an
interval scale.
• However, we can trust the order of the ratings.
• That is why nonparametric tests do not analyze the raw
scores but rank the data and analyze the ranks.
Mann–Whitney U test
• The Mann–Whitney test is a nonparametric equivalent of the
independent samples t test.
• We use a nonparametric test when the assumptions of the t test are
not met.
• The t test requires interval data and the samples to come from
normally distributed populations, with equal variances in both groups.
• The most common use of the Mann–Whitney test is when our data are
ordinal.
• A rating scale is usually treated as ordinal, particularly if we are
concerned that the full range of the scale is not being used by the
participants.
• If we were interested in seeing whether there was a difference
between boys and girls in their enjoyment of the speaking tasks they
often do in the language classes, we might give them a 0 to 10 scale
and ask them to rate their satisfaction with the tasks on this scale.
• We can guess that our participants do not rate the tasks very low even
when they are unhappy doing them, so we say the scale is not an
interval scale.
Mann–Whitney U test
• The Mann–Whitney test provides us with a statistic that allows us to
decide when we can claim a difference between the samples (at our
chosen level of significance).
• It calculates two U values, a U for the boys and a U for the girls.
• If both values are about the same, it means that the samples are
very mixed amongst the ranks and we have no difference between
them.
• If one U is large and the other small, then this indicates a separation
of the groups amongst the ranks.
• Indeed, if the smaller of the two U values is zero it indicates that all
of one sample are in the top ranks and all the other sample in the
bottom ranks with no overlap at all.
• To test the significance of our difference we take the smaller of the
two U values and examine the probability of getting this value when
there is no difference between the groups.
• If this probability is lower than our significance level (p < 0.05
normally) we can reject the null hypothesis and claim a significant
difference between our samples.
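• The logic above (two U values, with the smaller one tested against the null hypothesis) can be sketched in Python with scipy; the enjoyment ratings below are invented for illustration.

```python
from scipy.stats import mannwhitneyu

boys  = [4, 6, 7, 5, 8, 6, 7]        # invented enjoyment ratings on a 0-10 scale
girls = [7, 8, 9, 6, 9, 8, 10]

# mannwhitneyu() returns U for the first sample; the other U follows from
# U1 + U2 = n1 * n2, and the smaller of the two is the test statistic.
u1, p = mannwhitneyu(boys, girls, alternative="two-sided")
u2 = len(boys) * len(girls) - u1
print(f"U(boys) = {u1}, U(girls) = {u2}, smaller U = {min(u1, u2)}, p = {p:.4f}")
```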
Mann–Whitney U test
• With large samples the distribution of U
approximates the normal distribution so
SPSS also gives a z score as well as the
smaller of the two U values.
• We need to be careful if we get a lot of tied
ranks, particularly with small sample sizes
as these make the outcome of the Mann–
Whitney test less valid.
Mann–Whitney U test: Scenario
• Our EFL students come from two differing backgrounds: some
completed their secondary education in private schools in urban
centers and the others completed their secondary education in
government schools in rural-like towns.
• Now both groups are attending their freshman program in our
university, where the same material is used for the course
Communicative English Skills.
• One of our colleagues decides to find out how much the students
enjoyed the course so asks everyone to rate their enjoyment of
the course on a scale of 0 to 100.
• The members of the first group like to see themselves as very
proficient users of English so our colleague predicts that they will
rate their enjoyment of the course lower than the members of the
second group.
• Enter the dataset in two different columns. Because of the
independent groups design one column will be the grouping
variable and the other column the rating score for each person.
Mann–Whitney U test: on SPSS
• Go to Analyze and select Nonparametric Tests from the
dropdown menu.
• SPSS refers to the Mann–Whitney as 2 Independent Samples.
• Choose this to bring up the Two Independent Samples Tests
box.
• The dependent variable ‘rating’ should be sent to the Test
Variable List box.
• The independent variable ‘background’ should be sent to the
Grouping Variable box.
• SPSS then requires the user to define the range of values used
when setting up the dataset to define the groups.
• To do this click on Define Groups.
• The values given to the groups in this example are 1 and 2.
• Enter these in the respective boxes then click on Continue.
• Ensure that Mann–Whitney U is selected under Test Type, then
click on OK.
Mann–Whitney U test: SPSS output
• The first table generated by SPSS is a description of the data giving the Mean Rank for
each group and the Sum of Ranks for each group.
• N indicates the number of participants in each group, and the total number of participants.
• The Mean Rank indicates the mean rank of scores within each group.
• The Sum of Ranks indicates the total sum of all ranks within each group.
• If there were no differences between the groups’ ratings, i.e. if the null hypothesis was
true, we would expect the mean rank and the sum of ranks to be roughly equal across the
two groups.
• We can see from our example that the two groups do not appear to be equal in their
ratings of the material.
• In order to determine whether the difference in these rankings is significant, the Test
Statistics table must be observed.
• The test statistic generally reports the Mann–Whitney U.
• The probability value is ascertained by examining the Asymp. Sig. (2-tailed).
• A figure of less than 0.05 is considered to be indicative of significant differences.
• The Asymp. Sig. (2-tailed) level is an approximation that is useful when the dataset is
large, and is used when SPSS cannot give an exact figure or would take too long to work
out the significance value.
• The Exact Sig. is based on the exact distribution of the test statistic (in this case U). This
should be reported when the dataset is small, poorly distributed or contains many ties.
Reporting this significance level reflects a more accurate judgement of significance when
working with datasets of this nature.
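• The Ranks table that SPSS prints can be reproduced by ranking all scores together and then summarising within each group. The Python sketch below (invented ratings for the two backgrounds) mirrors that output before running the test itself.

```python
import numpy as np
from scipy.stats import rankdata, mannwhitneyu

private    = np.array([55, 60, 48, 62, 58, 45])   # invented ratings, group 1
government = np.array([70, 72, 66, 75, 68, 74])   # invented ratings, group 2

# Rank all scores together (ties receive the average rank), then split by group.
ranks = rankdata(np.concatenate([private, government]))
r1, r2 = ranks[:len(private)], ranks[len(private):]
print("mean ranks:", r1.mean(), r2.mean())
print("sums of ranks:", r1.sum(), r2.sum())

u, p = mannwhitneyu(private, government, alternative="two-sided")
print(f"U = {u}, p (two-tailed) = {p:.4f}")
```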
3. Wilcoxon Ranked Sign Test
(for related samples)
Wilcoxon signed-ranks test
• The Wilcoxon signed-ranks test is the nonparametric equivalent of
the related t test and is used when we do not believe that the
assumptions of the t test are met.
• As the samples are related we have matched scores in the two
samples.
• For example, a group of people rate their wakefulness on a ten-point
scale in the morning and again in the afternoon.
• Our two samples are the ratings in the morning and the ratings in
the afternoon, with each participant providing a score in each
sample.
• As the samples are matched, the Wilcoxon test produces a
‘difference score’ for each participant, with each person’s score in
one sample taken from the same person’s score in the second
sample.
• If all these differences go in the same direction (they are either all
positive or all negative) and if there is a large enough sample size
then this is convincing evidence that there is a difference between
the groups.
Wilcoxon signed-ranks test
• However, if some differences are positive and some negative then it is
harder to judge if there is a consistent difference between the groups.
• To work this out the Wilcoxon test ranks the size of the differences (ignoring
the sign of the difference) from lowest to highest.
• Then the ranks of the positive differences are added up and the ranks of the
negative differences are added up.
• The smallest of these two totals is taken as the calculated value of the
Wilcoxon statistic T.
• If most of the differences go one way with only a few differences in the
opposite direction, with the discrepant differences being small, this will
result in a very small T.
• When we calculate T by hand we can then compare the calculated value
with the critical values of T, at an appropriate significance level, in a
statistics table.
• However, SPSS calculates a z score as there is a relationship between T
and z (with the distribution of T approximating the normal distribution,
particularly for sample sizes of 25 and over).
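• A hedged Python/scipy sketch of this procedure (the wakefulness ratings are invented): wilcoxon() ranks the absolute differences, reports the smaller rank sum as the statistic, and by default discards zero differences (zero_method='wilcox').

```python
from scipy.stats import wilcoxon

morning   = [6, 7, 5, 8, 6, 7, 9, 5, 6, 7]   # invented wakefulness ratings (0-10)
afternoon = [4, 6, 5, 7, 5, 6, 7, 4, 5, 6]

# The pair with a zero difference is dropped (zero_method="wilcox"), which
# also reduces the effective sample size; with zeros or tied ranks scipy
# uses a normal approximation for the p-value.
t_stat, p = wilcoxon(morning, afternoon, zero_method="wilcox")
print(f"T = {t_stat}, p = {p:.4f}")
```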
Wilcoxon signed-ranks test
• When the difference between scores is zero, this
indicates that a person has given the same score in both
samples.
• As this provides us with no information about which
sample has the larger scores we have to reject the
individual’s scores from the data and reduce the sample
size by one.
• If our samples get too small we may not be able to claim
a difference between them.
• Also, nonparametric tests work more accurately without
tied ranks, so if there is a large number of tied ranks and
if the sample size is small we must be careful that the
analysis is appropriate.
Wilcoxon signed-ranks test: Scenario
• Two parallel materials were in use for teaching speaking
for almost a year.
• A researcher purposively selected a sample of fourteen
outstanding students towards the end of the academic
year.
• After being informed about their role, they were given
ample time to go through the revised materials.
• They were asked to rate the two final versions of
materials used for language teaching on a scale of 1 to
20 in terms of their suitability for students.
• Is one material rated significantly higher than the other
by the raters?
Wilcoxon signed-ranks test: on SPSS
• Enter the dataset in the same way we do for paired
sample t-test. Because of the repeated measures design
we can see that each rater has two ratings.
• Go to Analyze and select Nonparametric Tests from
the drop-down menu.
• SPSS refers to the Wilcoxon test as 2 Related Samples.
Choose this to bring up the Two Related Samples
Tests box.
• Select both variables and send them over to the Test
Pair(s) List box.
• Both variables will now appear on the same line.
• Check that Wilcoxon is selected under Test Type, and
then click on OK.
Wilcoxon signed-ranks test: SPSS output
• The first table that is produced by SPSS is the table of
descriptive statistics, which gives us a summary of the ranks
of the candidates. In the table the number of negative,
positive and tied ranks is indicated, along with the Mean
Rank and the Sum of Ranks.
• If Material 1 has been entered first, the calculation of ranks is
based on the ratings of Material 1 minus the ratings of Material 2.
• If Material 2 has been entered first, the calculation of ranks is
based on the ratings of Material 2 minus the ratings of Material 1.
• The Negative Ranks therefore indicate how many ranks of
one material were larger than the other.
• The Positive Ranks indicate the number of ranks of one
material that were smaller than the other.
• Finally, the Tied Ranks indicate how many of the rankings of
Material 1 and Material 2 were equal.
Wilcoxon signed-ranks test: SPSS output
• The Total is the total number of ranks, which will be equal to the total
number of judges.
• Other information that can be gained from the table includes the Mean
Rank and a Sum of Ranks for both the positive and negative ranks.
• The table that shows our inferential statistics is the Test Statistics
table.
• The first thing that can be seen is that SPSS generates a Z score
rather than the Wilcoxon T, which is what we would obtain when working
out the calculations by hand.
• From the Test Statistics table the Z score can be read (e.g. Z =
−2.194).
• A two-tailed analysis is carried out by default, which yields a p value
(e.g. p = 0.028, which is significant at p < 0.05).
• From this we can conclude that the judges rated the two materials
significantly differently, with one material significantly favoured over
the other, as indicated by the positive and negative ranks.
• The findings of the Wilcoxon test should be reported as follows:
z = −2.194, N = 9, p < 0.05
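• To mirror the Ranks table (negative, positive and tied ranks) outside SPSS, a small Python sketch with invented ratings for the two materials:

```python
import numpy as np
from scipy.stats import rankdata

material1 = np.array([15, 12, 17, 9, 14, 16, 11, 13, 18, 10, 15, 12, 16, 14])
material2 = np.array([13, 12, 15, 11, 12, 14, 11, 12, 16, 10, 13, 13, 14, 12])

diff = material1 - material2
nonzero = diff[diff != 0]            # tied (zero) differences are dropped
ranks = rankdata(np.abs(nonzero))    # rank the absolute differences

neg, pos = ranks[nonzero < 0], ranks[nonzero > 0]
print("negative ranks:", len(neg), " positive ranks:", len(pos),
      " ties:", int((diff == 0).sum()))
print("sum of negative ranks:", neg.sum(), " sum of positive ranks:", pos.sum())
print("T =", min(neg.sum(), pos.sum()))
```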
4. Kruskal – Wallis Test
Kruskal – Wallis Test
• The Kruskal–Wallis test is a nonparametric test we can use instead
of the one-way ANOVA (one-factor independent-measures ANOVA).
• When we wish to undertake a nonparametric analysis and have
a single independent measures factor (independent variable)
with more than two samples we choose the Kruskal– Wallis test.
• For example, a researcher chooses four towns of different sizes
and inhabitants are asked to rate their overall happiness on a
100-point scale. The researcher is interested in seeing whether
there is an effect of town size on the ratings of happiness.
• The key feature of many nonparametric tests is that the data are
treated as ordinal and the first part of the analysis involves
ranking the data.
• The Kruskal–Wallis test is no different. All the scores (from all the
conditions) are ranked from lowest to highest.
• After that an analysis similar to the ANOVA is undertaken on the
ranks.
Kruskal – Wallis Test
• The statistic H (rather than F in the ANOVA) gives a measure of
the relative strength of the variability in the ranks between the
conditions compared to a standard value for this number of
participants.
• Within the Kruskal–Wallis formula are calculations based on the
ranks and the squares of the ranks.
• When we have tied values and hence tied ranks it can cause us
a problem.
• For example, with only three scores we have the ranks 1, 2 and
3. If we square these ranks we get 1, 4 and 9 which gives a total
of 14.
• If the last two values were tied we would have to give the ranks
1.5, 1.5 and 3. When these are squared we get 2.25, 2.25 and 9,
giving a total of 13.5.
• If there are only a few ties we normally do not worry. However,
with a lot of ties the Kruskal–Wallis test can be inappropriate.
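• A minimal Python/scipy sketch of the Kruskal–Wallis test on three invented groups of happiness ratings (scipy applies the correction for tied ranks automatically):

```python
from scipy.stats import kruskal

small_town  = [62, 70, 55, 68, 73, 60]    # invented happiness ratings (0-100)
medium_town = [58, 64, 50, 61, 57]
large_town  = [45, 52, 48, 40, 55, 49, 51]

h, p = kruskal(small_town, medium_town, large_town)
print(f"H = {h:.3f}, df = {3 - 1}, p = {p:.4f}")
```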
Kruskal – Wallis test: Scenario
• A researcher wanted to do an experimental test to check which of the
three instructional techniques, a group discussion, lecture and a mix
of group discussion and lecture, would be favoured by clever students
who agreed to take part.
• Three sections were selected and one technique was applied in each
section.
• After a month of employing the techniques, the participants were
asked to rate their perception of the usefulness of the
instructional techniques on a 50-point scale (ranging from 0, not at
all useful, to 50, very useful).
• Six students from the first section tried the group discussion
method,
• five students from the second section used the lecture method, and
• seven students from the third section tried the mix of group discussion
and lecture.
• Is there an effect of instructional technique on their ratings?
Kruskal – Wallis test: on SPSS
• The data should be entered into SPSS in a similar manner to
the independent ANOVA.
• Due to the design being independent groups, one column will
be the grouping variable and the other the rating score for
each person.
• The command for the Kruskal–Wallis test is found under the
drop-down menu Analyze, followed by Nonparametric
Tests.
• SPSS refers to Kruskal–Wallis as K Independent Samples.
• Send the dependent variable to the Test Variable List box –
in our case this is the rating variable.
• Send the independent variable to the Grouping Variable box.
• SPSS will then require you to define the range of your groups
by clicking on the Define Range option and entering the
minimum and maximum values assigned to your groups.
Kruskal – Wallis test: SPSS output
• The first table generated by SPSS is the Ranks table, which is a
description of the data giving the number of participants and the Mean
Rank for each group.
• N is the number of participants in each group and the total number of
participants and the Mean Rank indicates the mean rank of scores
within each group.
• We can see from the table whether or not the groups appear to be equal
in their ratings of the issue given.
• In order to determine whether the difference in the rankings is significant
we must look at the Test Statistics table.
• The test statistic to report is the Kruskal–Wallis Chi-Square, which has
a certain value (e.g. χ2 = 13.262).
• The Asymp. Sig. gives the probability value (e.g. p = 0.035).
• This can then be reported as: χ2 = 13.262, df = 2, p < 0.05.
• We can conclude that the difference between the ratings of the three
groups is significant.
• SPSS always presents the results as a chi-square rather than the
Kruskal–Wallis H. This is because the distribution of H closely
approximates that of the chi-square.
5. Friedman test for related
samples
Friedman test for related samples
• The Friedman test is a nonparametric equivalent of the
Repeated Measures ANOVA.
• We use the Friedman test as a nonparametric equivalent
of a one-factor repeated measures ANOVA in cases
where the assumptions for the ANOVA are not met.
• For example, if we have ratings and therefore believe
that the data are not interval or ratio we would use the
Friedman test.
• As an example, a researcher is interested in television
viewers’ preferences for different types of programs and
asks 20 people to rate the following four categories of
program in terms of their interest on a 1 to 10 scale:
soap opera, documentary, action/adventure, current
affairs.
Friedman test for related samples
• One person might rate the programs as follows: soap opera
8, documentary 3, action/adventure 9, current affairs 1.
• A second person might produce the following ratings: soap
opera 6, documentary 5, action/ adventure 7, current affairs
4.
• Even though the ratings are very different both participants
have given the same order for the four types of program.
• It is the order of participants’ results that the Friedman test
analyses.
• The first part of a Friedman test is to rank order the results
from each participant separately. So both participants
described above would have the following ranks: 3, 2, 4, 1.
• These ranks are then analysed in a similar way to an
ANOVA. It is not exactly the same because, as we are
dealing with ranks, certain values are fixed and it makes our
analysis somewhat easier.
Friedman test for related samples
• The Friedman test produces a chi-square statistic, with a
large value indicating that there is a difference between the
rankings of one or more of the conditions.
• While there are not the assumptions that we require for a
one factor repeated measures ANOVA, we should examine
how many tied ranks we have in the Friedman test.
• The fewer tied ranks the more appropriate the analysis.
• As you can imagine, in the above example a person’s
scores of 5, 7, 7, 7 would give ranks of 1, 3, 3, 3, and this is
not very informative data for this type of analysis.
• Fortunately, as we are ranking within each participant rather
than across all participants, we do not normally have a lot of
tied ranks in a Friedman test.
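• A minimal Python/scipy sketch of the Friedman test for the television example, where each argument holds one programme type's ratings in the same participant order (the ratings are invented):

```python
from scipy.stats import friedmanchisquare

# Each list holds one programme type's ratings, in the same participant
# order, so each position across the lists is one person's four ratings.
soap_opera      = [8, 6, 7, 5, 9, 4, 6, 7]
documentary     = [3, 5, 4, 6, 2, 5, 4, 3]
action          = [9, 7, 8, 7, 8, 6, 9, 8]
current_affairs = [1, 4, 2, 3, 1, 2, 3, 2]

chi2, p = friedmanchisquare(soap_opera, documentary, action, current_affairs)
print(f"chi2 = {chi2:.3f}, df = {4 - 1}, p = {p:.4f}")
```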
Friedman test: Scenario
• A new teaching material was on a pilot test for a
semester.
• A group of students were using this new course material
for FLEn 102.
• Among these students, ten outstanding ones, who were
assumed to be responsible, were selected
• They were asked to rate the quality of each component
of the material (reading, speaking, writing, listening and
grammar) on a scale of 0 to 100 (from bad to good).
• Is there a significant difference between the five
components of the material in their rated quality?
Friedman test: On SPSS
• The dataset is set up like the repeated measures
ANOVA. Each person will have five ratings and hence
five variable columns are required.
• The command for the Friedman test is found under the
drop-down menu Analyze, followed by Nonparametric
Tests.
• SPSS refers to Friedman as K Related Samples.
• Send the variables to the Test Variables box.
• Ensure that Friedman is selected in the Test Type box.
• Click on OK.
Friedman test: SPSS Output
• The first table generated by SPSS is the Ranks table, which is a
description of the data giving the number of participants and the Mean
Rank for each condition.
• The Mean Rank indicates the mean rank of scores within each condition.
• If there were no difference between the ratings of the five components of
the new course material, i.e. the null hypothesis was true, we would
expect the mean ranks to be roughly equal across the five components.
• We can see from our example above that the five components do not
appear to be equally rated, with one receiving a higher rating than the
other four components.
• In order to determine whether the difference in these rankings is
significant we must look at the Test Statistics table.
• The test statistic to report is the Friedman Chi-Square, which in the
above example has a value of 14.727.
• The significance of the result is checked by examining the p value
(Asymp. Sig.).
• In our example, χ2 = 14.727, df = 4, p < 0.05.
• We can therefore conclude that the components were not rated equally;
the grammar component, which received the highest mean rank, was
favoured by the students.
