Nonparametric Tests
Occasionally, the assumptions of the t-tests are seriously violated; in
particular, this occurs when the data you have are ordinal in nature and not
at least interval. On such occasions an alternative approach is to use
nonparametric tests. We are not going to place much emphasis on them in this
unit as they are only occasionally used, but you should be aware of them and
have some familiarity with them.
Nonparametric tests are also referred to as distribution-free tests. These
tests have the obvious advantage of not requiring the assumption of
normality or the assumption of homogeneity of variance. They compare
medians rather than means and, as a result, if the data have one or two
outliers, their influence is negated.
Parametric tests are preferred because, in general, for the same number of
observations, they are more likely to lead to the rejection of a false null
hypothesis. That is, they have more power. This greater power stems from the
fact that parametric tests use all the information in the data: if data
collected at an interval or ratio level are converted to ranks (i.e., merely
ordered from the lowest to the highest value), information is lost.
The following table gives the nonparametric analogue for the paired samples
t-test and the independent samples t-test. There is no obvious analogue for
the one sample t-test.

Parametric test                Nonparametric analogue
Paired samples t-test          Wilcoxon Signed Ranks test
Independent samples t-test     Mann-Whitney U test

Chi-square is a one-sample test, and although there are alternatives to
chi-square, we will not consider them further; chi-square is already a
nonparametric test. Pearson's correlation also has a nonparametric
alternative (Spearman's correlation), but we will not deal with it further
either.
There is a wide range of alternatives to the two-group t-tests; the ones
listed are the most commonly used and are the defaults in SPSS.
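Outside SPSS, the same two analyses can be sketched in Python with
scipy.stats; the scores below are invented purely for demonstration.

    # Nonparametric analogues of the two-group t-tests (invented data).
    from scipy import stats

    # Matched pairs, e.g. the same participants measured before and after.
    before = [12, 15, 11, 18, 14, 16, 13, 17]
    after = [14, 17, 12, 20, 13, 19, 15, 18]

    # Wilcoxon Signed Ranks test: analogue of the paired samples t-test.
    w_stat, w_p = stats.wilcoxon(before, after)
    print(f"Wilcoxon signed ranks: W = {w_stat:.1f}, p = {w_p:.4f}")

    # Two independent groups.
    group_a = [22, 25, 19, 30, 27, 24]
    group_b = [18, 21, 16, 20, 23, 17]

    # Mann-Whitney U test: analogue of the independent samples t-test.
    u_stat, u_p = stats.mannwhitneyu(group_a, group_b)
    print(f"Mann-Whitney: U = {u_stat:.1f}, p = {u_p:.4f}")

In both cases the reported p-value is read exactly as for the parametric
tests: reject the null hypothesis when it falls below 0.05.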
Generally, running nonparametric procedures is very similar to running
parametric procedures, because the same design principle is being assessed
in each case. So the process of identifying variables, selecting options,
and running the procedure is very similar. The final p-value determines
significance in the same way as in the parametric tests. SPSS gives the
option of two or three analogues for each type of parametric test, but you
need to know only the ones cited in the table. Some practice with these
tests is given in Assignment II.
Readings
Howell describes several measures which assess the degree of relationship
between the two variables in a chi-square analysis. This material is worth
reading, but for this unit we will not be discussing these measures at all.
Howell also describes a correction for continuity that is sometimes used,
and the use of likelihood ratios. Again, we will not be dealing with these
issues. However, you should read Howell's discussion of assumptions.
Ray does not appear to discuss chi-square or contingency tables.
Introduction to Kruskal-Wallis
Hypotheses
For Kruskal-Wallis, H0 is that the samples come from populations with the
same distribution (and hence the same median); HA is that at least one
population differs.
Data arrangement
Once you have established that your data suit Kruskal-Wallis, your data must
be arranged for use in one of the statistical packages (SPSS, UNISTAT) in
two columns: one coding the group each observation belongs to, and one
containing the observed values.
Although it looks a bit daunting, do not be worried. Only one value is
relevant to the choice between the hypotheses. The Right-Tail Probability
(0.0052) is the probability of differences this large between the data sets
occurring by chance. Since it is lower than 0.05, H0 must be rejected and
HA accepted.
Two-way Kruskal-Wallis results would appear in a similar format to the two-
way ANOVA.
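As a sketch of the same analysis outside SPSS or UNISTAT, the test is also
available in Python as scipy.stats.kruskal; the three groups below are
invented purely for illustration.

    # Minimal Kruskal-Wallis sketch with made-up data.
    from scipy import stats

    group1 = [7, 9, 6, 8, 10]
    group2 = [12, 14, 11, 13, 15]
    group3 = [9, 11, 8, 10, 12]

    # H is the Kruskal-Wallis statistic; reject H0 when p < 0.05,
    # just as with the Right-Tail Probability described above.
    h_stat, p_value = stats.kruskal(group1, group2, group3)
    print(f"H = {h_stat:.2f}, p = {p_value:.4f}")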
Non-parametric statistics
In statistics, the term non-parametric statistics covers a range
of topics:
• distribution-free methods, which do not rely on assumptions that the
data are drawn from a given probability distribution. As such, this is the
opposite of parametric statistics. It includes non-parametric statistical
models, inference and statistical tests.
• a non-parametric statistic can refer to a statistic (a function on a
sample) whose interpretation does not depend on the population fitting any
parametrized distribution. Statistics based on the ranks of observations
are one example of such statistics, and these play a central role in many
non-parametric approaches.
• non-parametric regression refers to modelling where the structure of
the relationship between variables is treated non-parametrically, but
where nevertheless there may be parametric assumptions about the
distribution of model residuals.
Methods
Nonparametric Tests
Wilcoxon Mann-Whitney Test
Wilcoxon Signed Ranks Test
Sign Test
Runs Test
Kolmogorov-Smirnov Test
Kruskal-Wallis Test
Wilcoxon Signed Ranks Test
The Wilcoxon Signed Ranks test is designed to test a hypothesis about the
location (median) of a population distribution. It often involves the use of
matched pairs, for example, before and after data, in which case it tests for
a median difference of zero.
The Wilcoxon Signed Ranks test does not require the assumption that the
population is normally distributed.
In many applications, this test is used in place of the one sample t-test
when the normality assumption is questionable. It is a more powerful
alternative to the sign test, but does assume that the population probability
distribution is symmetric.
This test can also be applied when the observations in a sample of data are
ranks, that is, ordinal data rather than direct measurements.
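As an illustration of this one-sample use, scipy.stats.wilcoxon tests for a
zero median, so a hypothesised median (the value 50 here is an assumption
made up for the example) is first subtracted from the scores.

    # One-sample use of the Wilcoxon Signed Ranks test (invented data).
    import numpy as np
    from scipy import stats

    scores = np.array([52, 48, 57, 51, 60, 46, 55, 53, 58, 49])
    hypothesised_median = 50

    # Test whether the population median differs from 50 by testing
    # whether the differences are symmetric about zero.
    stat, p = stats.wilcoxon(scores - hypothesised_median)
    print(f"W = {stat:.1f}, p = {p:.4f}")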
Sign Test
Runs Test
Kolmogorov-Smirnov Test
For a single sample of data, the Kolmogorov-Smirnov test is used to test
whether or not the sample of data is consistent with a specified distribution
function. When there are two samples of data, it is used to test whether or
not these two samples may reasonably be assumed to come from the
same distribution.
The Kolmogorov-Smirnov test does not require the assumption that the
population is normally distributed.
Compare Chi-Squared Goodness of Fit Test.
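Both forms of the test can be sketched in Python; the two samples below are
simulated, so the specific numbers are assumptions for the example only.

    # One- and two-sample Kolmogorov-Smirnov sketches (simulated data).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    sample = rng.normal(loc=0.0, scale=1.0, size=100)

    # One-sample: is the sample consistent with a standard normal?
    d1, p1 = stats.kstest(sample, "norm")
    print(f"one-sample: D = {d1:.3f}, p = {p1:.3f}")

    # Two-sample: could these samples come from the same distribution?
    other = rng.normal(loc=0.5, scale=1.0, size=100)
    d2, p2 = stats.ks_2samp(sample, other)
    print(f"two-sample: D = {d2:.3f}, p = {p2:.3f}")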
Kruskal-Wallis Test
In this section...
The Mann-Whitney U
The Kruskal-Wallis H Test
Chi-Square
Like all of the statistical tests discussed up to this point, non-parametric tests
are used to investigate the relationship between two or more variables. Recall
from our discussion at the start of this module that one of the key factors in
determining which statistical test to run is the nature of the data to be
analyzed. All of the statistical techniques you have learned up to now have
made assumptions regarding the data (in particular regarding the population
parameters estimated by the data). Correlation, ANOVA, independent and
paired-samples t-tests, and regression all assume that the data (and the
population values they estimate) are (1) normally distributed (values on all
variables correspond roughly to the bell-shaped normal curve); (2)
quantitative in nature (the values can be manipulated arithmetically in a
meaningful manner); and (3), at the very least, interval (differences
between values are captured by equal intervals). Indeed, these are
conditions that must be met in order to run parametric tests.
But if you reflect for a moment on the nature of data in general, you will
realize that not all data sets meet these assumptions. Consider, for example,
the following: what if in our fictitious compensation study salary levels for our
sample "bunch" around the extremes (high salary and low salary), with very
few people earning amounts in the "average" range. Data such as these are
not normally distributed--they "violate the normality assumption." Or say one
of our questions is "are you a college graduate?" and we offer only two
response options, "yes" or "no." This dichotomous variable is not quantitative
in nature (how do you determine the mean of "yes"?). Lastly, there are many
variables that are not captured on an interval or ratio scale. Some data simply
divide the values into two mutually exclusive groups--USF graduates and non-
USF graduates, for example. Such data are called "nominal" or "categorical".
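To make the categorical case concrete, here is a minimal sketch of a
chi-square test of independence on a 2x2 table; the counts (graduate status
by salary band) are invented for illustration.

    # Chi-square test of independence on invented 2x2 counts.
    import numpy as np
    from scipy import stats

    # Rows: college graduate yes/no; columns: high/low salary.
    observed = np.array([[30, 10],
                         [15, 25]])

    chi2, p, dof, expected = stats.chi2_contingency(observed)
    print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.4f}")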