UNIT 8
PARAMETRIC TESTS
Parametric statistics assume that the data come from a normal distribution, which is uniquely defined by a mean and standard deviation (SD). Nonparametric statistics were developed primarily to deal with categorical (non-continuous) data, for example disease vs. no disease, or dead vs. alive.
Parametric vs. Nonparametric
The difference between parametric and nonparametric
statistics has to do with the kind of data available for
analysis.
Parametric statistics are used when the estimate of the parameter is at the interval or ratio level. The most common parametric tests are the t-test and ANOVA, which measure differences between group means.
Nonparametric statistics are used when the level of data is nominal or ordinal and the normality of the distribution cannot be assumed. The most common are chi-square, which measures the difference between two nominal variables, and Spearman's r, which can measure a relationship.
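As a rough illustration (using hypothetical data and the SciPy library, neither of which is prescribed by the text), the tests named above map onto standard library calls as follows:

```python
# Illustrative sketch only: hypothetical scores and counts, SciPy functions for
# the tests named above (t-test, ANOVA, chi-square, Spearman's rho).
from scipy import stats

group_a = [12.1, 14.3, 13.8, 15.0, 12.9]   # hypothetical interval-level scores
group_b = [11.0, 12.5, 13.1, 11.8, 12.2]
group_c = [16.2, 15.8, 17.1, 16.5, 15.9]

# Parametric: compare two group means, then three group means.
t_stat, t_p = stats.ttest_ind(group_a, group_b)
f_stat, f_p = stats.f_oneway(group_a, group_b, group_c)

# Nonparametric: chi-square for two nominal variables (a 2x2 contingency table)
# and Spearman's rho for a rank-based relationship.
table = [[20, 10],   # hypothetical counts, e.g., disease vs. no disease by exposure
         [15, 25]]
chi2, chi_p, dof, expected = stats.chi2_contingency(table)
rho, rho_p = stats.spearmanr(group_a, group_b)

print(t_stat, f_stat, chi2, rho)
```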
Parametric Tests
Statistical methods which depend on the parameters of
populations or probability distributions are referred to as
parametric tests. Parametric tests include:
t-test
F-ratio
ANCOVA
Correlation
Regression
Factor Analysis
These tests are only meaningful for continuous data which is
sampled from a population with an underlying normal
distribution or whose distribution can be rendered normal by
mathematical transformation.
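As an illustrative sketch (assuming SciPy and NumPy, with invented positively skewed data), one might check approximate normality and attempt a normalizing transformation like this:

```python
# Minimal sketch: test a hypothetical skewed sample for normality and, if it
# fails, try a log transformation to render the distribution roughly normal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=0.8, size=60)   # hypothetical skewed sample

stat, p = stats.shapiro(data)             # Shapiro-Wilk test of normality
if p < 0.05:
    transformed = np.log(data)            # attempt a normalizing transformation
    stat_t, p_t = stats.shapiro(transformed)
    print(f"raw p={p:.3f}, log-transformed p={p_t:.3f}")
else:
    print(f"data look approximately normal (p={p:.3f})")
```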
8.2. CHOOSING BETWEEN PARAMETRIC AND
NONPARAMETRIC TESTS: THE EASY CASES
Choosing between parametric and nonparametric tests is sometimes easy. The researcher should definitely choose a parametric test if the data are clearly sampled from a population that follows a Gaussian distribution (at least approximately). The researcher should definitely select a nonparametric test in three situations:
The outcome is a rank or a score and the population is clearly not Gaussian. Examples include class ranking of students, the Apgar score for the health of newborn babies (measured on a scale of 0 to 10, where all scores are integers), the visual analogue score for pain (measured on a continuous scale where 0 is no pain and 10 is unbearable pain), and the star scale commonly used by movie and restaurant critics (* is OK, ***** is fantastic).
Some values are "off the scale," that is, too high or too low to measure. Even if the population is Gaussian, it is impossible to analyze such data with a parametric test, since the researcher doesn't know all of the values. Using a nonparametric test with these data is simple: assign values too low to measure an arbitrary very low value, assign values too high to measure an arbitrary very high value, and then perform a nonparametric test. Since the nonparametric test only knows about the relative ranks of the values, it won't matter that the researcher didn't know all the values exactly (see the sketch after this list).
The data are measurements and the researcher is sure that the population is not distributed in a Gaussian manner. If the
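The second situation can be sketched as follows, using hypothetical measurements and SciPy's Mann-Whitney test; the off-scale values are given arbitrary extreme placeholders, and only their ranks enter the calculation:

```python
# Sketch only: hypothetical measurements where some values were "off the scale".
from scipy import stats

LOW, HIGH = -999.0, 999.0   # arbitrary placeholders; only their rank position matters

control = [4.2, 5.1, LOW, 3.8, 4.9]      # LOW: below the measurable range
treated = [6.3, HIGH, 7.1, 6.8, HIGH]    # HIGH: above the measurable range

# The Mann-Whitney (Wilcoxon rank-sum) test uses only relative ranks,
# so the exact placeholder values do not affect the result.
u_stat, p_value = stats.mannwhitneyu(control, treated, alternative="two-sided")
print(u_stat, p_value)
```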
8.4. t-test (Test of significance)
The ‘t’ test, or test of significance of the difference between means for large independent samples (Garrett, 1969), is used to compare the means of any two groups on any of the variables.
If the ‘t’ value is below a cut-off point (depending on the degrees of freedom), the difference in means is considered not significant and the null hypothesis is accepted. When the ‘t’ value exceeds the cut-off point, the difference is said to be significant and the null hypothesis is rejected.
The t-test is used to find the level of significance of the difference between two groups. The t-value is calculated from the means and standard deviations of the two groups. For large samples, if the obtained value is 2.58 or above, the difference is significant at the 0.01 level; if the value lies between 1.96 and 2.58, it is significant at the 0.05 level; and if the value is below 1.96, the difference is not significant at either level. The t-value is calculated using the formula
$$t = \frac{M_1 - M_2}{\sqrt{\dfrac{\sigma_1^2}{N_1} + \dfrac{\sigma_2^2}{N_2}}}$$
Where,
M1 = Mean of the first sample
M2 = Mean of the second sample
σ1 = S.D. of the first sample
σ2 = S.D. of the second sample
N1 = Total number of cases in the first sample
N2 = Total number of cases in the second sample
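A minimal Python sketch of this formula, using hypothetical summary figures and the large-sample cut-offs quoted above:

```python
# Minimal sketch of the t formula above, on hypothetical summary statistics.
import math

def t_from_summary(m1, m2, sd1, sd2, n1, n2):
    """Large-sample t from the means, SDs and sizes of two independent samples."""
    return (m1 - m2) / math.sqrt(sd1**2 / n1 + sd2**2 / n2)

t = t_from_summary(m1=52.4, m2=48.1, sd1=9.5, sd2=10.2, n1=120, n2=115)

# Interpretation against the large-sample cut-offs given above.
if abs(t) >= 2.58:
    print(f"t = {t:.2f}: significant at the 0.01 level")
elif abs(t) >= 1.96:
    print(f"t = {t:.2f}: significant at the 0.05 level")
else:
    print(f"t = {t:.2f}: not significant")
```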
$$SS_t = \sum x^2 - \frac{\left(\sum x\right)^2}{N}$$
3) The variance computed between the groups, from each group's scores taken separately, is known as the between-group variance. The between-group sum of squares is given by the formula,
$$SS_b = \frac{\left(\sum x_1\right)^2}{N_1} + \frac{\left(\sum x_2\right)^2}{N_2} + \frac{\left(\sum x_3\right)^2}{N_3} + \dots + \frac{\left(\sum x_n\right)^2}{N_n} - \frac{\left(\sum x\right)^2}{N}$$
4) Within-group sum of squares:
$$SS_w = SS_t - SS_b$$
5) Computation of the F-ratio.
The F-ratio is calculated by dividing the between-group mean square by the within-group mean square:
$$F = \frac{MS_b}{MS_w} = \frac{SS_b / df_1}{SS_w / df_2}$$
Where,
MSb = Mean square variance between the groups
MSw = Mean square variance within the groups
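Steps 3 to 5 can be illustrated with a short Python sketch on hypothetical group scores (an illustration of the computation described above, not a prescribed procedure):

```python
# Minimal sketch: one-way ANOVA sums of squares and F-ratio from raw scores.
def one_way_f(groups):
    all_scores = [x for g in groups for x in g]
    n_total = len(all_scores)
    grand_sum = sum(all_scores)

    # Total sum of squares: SSt = sum(x^2) - (sum(x))^2 / N
    ss_total = sum(x**2 for x in all_scores) - grand_sum**2 / n_total

    # Between-group sum of squares: sum over groups of (group total)^2 / n_g,
    # minus the same correction term (sum(x))^2 / N
    ss_between = sum(sum(g)**2 / len(g) for g in groups) - grand_sum**2 / n_total

    # Within-group sum of squares: SSw = SSt - SSb
    ss_within = ss_total - ss_between

    df1 = len(groups) - 1            # between-group degrees of freedom
    df2 = n_total - len(groups)      # within-group degrees of freedom
    ms_between = ss_between / df1
    ms_within = ss_within / df2
    return ms_between / ms_within    # F = MSb / MSw

groups = [[12, 14, 11, 13], [15, 17, 16, 18], [10, 9, 12, 11]]  # hypothetical scores
print(one_way_f(groups))
```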
initial and final scores are correlated, i.e., they have covariance. Thus the analysis of covariance technique has to adjust the final scores. The coefficient of variation is computed as
$$C.V. = \frac{\sigma}{\bar{X}} \times 100$$
Where,
σ = Standard Deviation
X̄ = Mean Value
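A short sketch of the C.V. formula, on hypothetical scores:

```python
# Sketch only: coefficient of variation of a hypothetical set of final scores.
import statistics

scores = [42, 47, 51, 38, 45, 49]                        # hypothetical scores
cv = statistics.pstdev(scores) / statistics.mean(scores) * 100
print(f"C.V. = {cv:.1f}%")
```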
8.7. Correlation Technique
Correlation is used for measuring the degree of
relationship between two variables. It shows us the extent
to which values in one variable are linked or related to
values in another variable.
Correlation coefficient is calculated using the formula,
N XY X Y
r
N X 2 X 2 N Y 2 Y 2
where,
r = Correlation coefficient
ΣX = Sum of the X scores
ΣY = Sum of the Y scores
ΣX² = Sum of the squares of the X scores
ΣY² = Sum of the squares of the Y scores
ΣXY = Sum of the products of X and Y
N = Number of students
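A minimal sketch of this raw-score formula, with hypothetical paired scores:

```python
# Sketch only: Pearson r from the raw-score sums, on hypothetical paired data.
import math

X = [65, 72, 58, 80, 74, 69]   # hypothetical scores on one variable
Y = [60, 70, 55, 82, 71, 66]   # paired scores on the other variable
N = len(X)

sum_x, sum_y = sum(X), sum(Y)
sum_x2, sum_y2 = sum(x**2 for x in X), sum(y**2 for y in Y)
sum_xy = sum(x * y for x, y in zip(X, Y))

r = (N * sum_xy - sum_x * sum_y) / math.sqrt(
    (N * sum_x2 - sum_x**2) * (N * sum_y2 - sum_y**2)
)
print(f"r = {r:.3f}")
```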
8.8. Multiple Correlation
Multiple correlation is used for estimating the inter-
correlations among independent variables as well as to their
variables. Thus, the chief aim is to attain scientific parsimony or
economy of description”.
Guilford (1956) outlines the different steps in a factor analytic study in the following terms:
a) Select an appropriate domain for investigation.
b) Develop a hypothesis concerning the factor.
c) Select or construct suitable tests.
d) Select a suitable population.
e) Obtain a sample of adequate size.
f) Extract factors with communalities in the diagonal cells of the correlation matrix.
g) Rotate the reference axes and
h) Interpret the rotated factors.
The present investigation made use of the principal-axes method, as it is one of the more satisfactory procedures of factor analysis. Fruchter (1954) explains the superiority of this method in the following terms.
The principal-axes method of factoring the correlation matrix is of interest for several reasons. Each factor extracts the maximum amount of variance (i.e., the sum of squares of factor loadings is maximized on each factor) and gives the smallest possible residuals. The correlation matrix is condensed into the smallest number of orthogonal factors by this method. The method also has the advantage of giving a mathematically unique (least-squares) solution for a given table of correlations. Harman (1960) points out that this method requires a larger number of computations, but this difficulty is overcome with the help of high-speed computers.
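As a simplified illustration (a hypothetical correlation matrix, NumPy only), the core idea of principal-axes extraction, namely successive orthogonal factors each absorbing the maximum remaining variance, can be sketched as an eigendecomposition of the correlation matrix. A full principal-axes solution would also iterate the communality estimates in the diagonal; that refinement is omitted here.

```python
# Simplified sketch: extract orthogonal factors from a hypothetical correlation
# matrix so that each successive factor accounts for the maximum remaining variance.
import numpy as np

R = np.array([[1.00, 0.62, 0.54, 0.10],
              [0.62, 1.00, 0.58, 0.08],
              [0.54, 0.58, 1.00, 0.12],
              [0.10, 0.08, 0.12, 1.00]])   # hypothetical correlation matrix

eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]                  # largest variance first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

n_factors = 2
loadings = eigenvectors[:, :n_factors] * np.sqrt(eigenvalues[:n_factors])
print(np.round(loadings, 3))                           # unrotated factor loadings
```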
8.10. REGRESSION OR CORRELATION
Linear regression and correlation are similar and easily confused. In some situations it makes sense to perform both calculations. Calculate linear correlation if the researcher measured both X and Y in each subject and wishes to quantify how well they are associated. Select the Pearson (parametric) correlation coefficient if the researcher can assume that both X and Y are sampled from Gaussian populations; otherwise choose the Spearman nonparametric correlation coefficient. Don't calculate the correlation coefficient (or its confidence interval) if the researcher manipulated the X variable.
Calculate linear regression only if one of the variables (X) is likely to precede or cause the other variable (Y). Definitely choose linear regression if the researcher manipulated the X variable. It makes a big difference which variable is called X and which is called Y, as linear regression calculations are not symmetrical with respect to X and Y. If the researcher swaps the two variables, a different regression line will be obtained. In contrast, linear correlation calculations are symmetrical with respect to X and Y: if the labels X and Y are swapped, the same correlation coefficient is obtained.
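A minimal sketch, with hypothetical data and SciPy, of the asymmetry just described: swapping X and Y changes the regression line but leaves the correlation coefficient unchanged.

```python
# Sketch only: regression is asymmetric in X and Y, correlation is symmetric.
from scipy import stats

X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]        # hypothetical manipulated variable
Y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.3]      # hypothetical response

reg_xy = stats.linregress(X, Y)            # regression of Y on X
reg_yx = stats.linregress(Y, X)            # regression of X on Y: a different line
r_xy, _ = stats.pearsonr(X, Y)
r_yx, _ = stats.pearsonr(Y, X)             # same correlation coefficient

print(reg_xy.slope, reg_yx.slope)          # slopes differ (and are not reciprocals in general)
print(r_xy, r_yx)                          # identical values
```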
Restrictions of parametric tests
Conventional statistical procedures are also called parametric
tests. In a parametric test a sample statistic is obtained to
estimate the population parameter. Because this estimation
process involves a sample, a sampling distribution, and a
population, certain parametric assumptions are required to ensure
all components are compatible with each other. For example, in
Analysis of Variance (ANOVA) there are three assumptions:
be normal and equal in variance; therefore the researcher demands the same properties in the sample. Actually, the
population is infinite and unknown. It may or may not possess
those attributes. The required assumptions are imposed on the
data because those attributes are found in sampling distributions.
However, very often the acquired data do not meet these
assumptions. There are several alternatives to rectify this
situation:
However, non-parametric procedures are criticized for the
following reasons:
Unable to estimate the population: Because non-parametric tests do not make strong assumptions about the population, a researcher cannot make an inference that the sample statistic is an estimate of the population parameter.
Losing precision: Edgington (1995) asserted that when more precise measurements are available, it is unwise to degrade the precision by transforming the measurements into ranked data.
Low power: Generally speaking, the statistical power of
non-parametric tests is lower than that of their parametric
counterpart except on a few occasions (Hodges & Lehmann,
1956; Tanizaki, 1997; Freidlin & Gastwirth, 2000).
False sense of security: It is generally believed that non-parametric tests are immune to parametric assumption violations and the presence of outliers. However, Zimmerman (2000) found that the significance levels of the Wilcoxon-Mann-Whitney (WMW) test and the Kruskal-Wallis (KW) test are substantially biased by unequal variances even when sample sizes in both groups are equal. In some cases the Type I error rate can increase up to
sample sizes (equal, unequal). It was found that the WMW test is considered either the best or an acceptable method when the variances are equal, regardless of the distribution shape and the homogeneity of the sample sizes.