Advanced Techniques of Research Statistics

UNIT 8
PARAMETRIC TESTS

The main purpose of statistical analysis is to draw inferences or make generalizations. Statistical techniques enable us to estimate parameters on the basis of sample statistics. The measures of a population are known as parameters, and the corresponding measures computed from a sample are called statistics. In most situations it is not possible to measure all the subjects of a given population; instead, a sample is selected as a representative part of it, and the measures of the sample are used to estimate the population measures (parameters).
8.1. DEFINITION - PARAMETRIC TESTS
In a parametric test the variable of interest is a measured quantity, and the data are assumed to follow some distribution that can be described by specific parameters, typically a normal distribution. For example, there are an infinite number of normal distributions, all of which can be uniquely defined by a mean and standard deviation (SD). Nonparametric tests, in contrast, were developed primarily to deal with categorical (non-continuous) data, for example disease vs. no disease, or dead vs. alive.
Parametric vs. Nonparametric
 The difference between parametric and nonparametric statistics has to do with the kind of data available for analysis.
 Parametric tests are used when the data are measured at the interval or ratio level.
 The most common are the t-test and ANOVA, which test differences between group means.
 Nonparametric tests are used when the level of data is nominal or ordinal and the normality of the distribution cannot be assumed.
 The most common are the chi-square test, which tests the difference between two nominal variables, and Spearman's r (rank correlation), which measures relationships.
Parametric Tests
Statistical methods which depend on the parameters of
populations or probability distributions are referred to as
parametric tests. Parametric tests include:
 t-test
 F-ratio
 ANCOVA
 Correlation
 Regression
 Factor Analysis
These tests are only meaningful for continuous data which is
sampled from a population with an underlying normal
distribution or whose distribution can be rendered normal by
mathematical transformation.


Nonparametric tests, by contrast, do not depend on assumptions about the form of the population distribution. Unfortunately, they are less flexible in practice and less powerful than parametric tests. In cases where both parametric and nonparametric methods are applicable, statisticians usually recommend using parametric methods because they tend to provide better precision.
Choosing an Appropriate Statistical Test
The appropriate test depends on the goal of the analysis and on the type of data: a measurement from a Gaussian population; a rank, score, or measurement from a non-Gaussian population; a binomial outcome (two possible outcomes); or a survival time.
 Describe one group: mean and SD (Gaussian); median and interquartile range (non-Gaussian); proportion (binomial); Kaplan-Meier survival curve (survival time).
 Compare one group to a hypothetical value: one-sample t test (Gaussian); Wilcoxon test (non-Gaussian); chi-square or binomial test (binomial).
 Compare two unpaired groups: unpaired t test (Gaussian); Mann-Whitney test (non-Gaussian); Fisher's test, or chi-square for large samples (binomial); log-rank test or Mantel-Haenszel (survival time).
 Compare two paired groups: paired t test (Gaussian); Wilcoxon test (non-Gaussian); McNemar's test (binomial); conditional proportional hazards regression (survival time).
 Compare three or more unmatched groups: one-way ANOVA (Gaussian); Kruskal-Wallis test (non-Gaussian); chi-square test (binomial); Cox proportional hazards regression (survival time).
 Compare three or more matched groups: repeated-measures ANOVA (Gaussian); Friedman test (non-Gaussian); Cochrane Q (binomial); conditional proportional hazards regression (survival time).
 Quantify association between two variables: Pearson correlation (Gaussian); Spearman correlation (non-Gaussian); contingency coefficients (binomial).
 Predict a value from another measured variable: simple linear or nonlinear regression (Gaussian); nonparametric regression (non-Gaussian); simple logistic regression (binomial); Cox proportional hazards regression (survival time).
 Predict a value from several measured or binomial variables: multiple linear or multiple nonlinear regression (Gaussian); multiple logistic regression (binomial); Cox proportional hazards regression (survival time).

8.2. CHOOSING BETWEEN PARAMETRIC AND
NONPARAMETRIC TESTS: THE EASY CASES
Choosing between parametric and nonparametric tests is
sometimes easy. Researcher should definitely choose a
parametric test if they are sure that their data are sampled from a
population that follows a Gaussian distribution (at least
approximately). Researcher should definitely select a
nonparametric test in three situations:
 The outcome is a rank or a score and the population is
clearly not Gaussian. Examples include class ranking of
students, the Apgar score for the health of newborn babies
(measured on a scale of 0 to 10 and where all scores are
integers), the visual analogue score for pain (measured on a
continuous scale where 0 is no pain and 10 is unbearable
pain), and the star scale commonly used by movie and
restaurant critics (* is OK, ***** is fantastic).
 Some values are "off the scale," that is, too high or too low to measure. Even if the population is Gaussian, it is impossible to analyze such data with a parametric test since the researcher does not know all of the values. Using a nonparametric test with these data is simple: assign values too low to measure an arbitrary very low value, assign values too high to measure an arbitrary very high value, and then perform a nonparametric test. Since the nonparametric test only knows about the relative ranks of the values, it will not matter that the researcher did not know all the values exactly.
 The data are measurements and the researcher is sure that the population is not distributed in a Gaussian manner. If the data are not sampled from a Gaussian distribution, consider whether the values can be transformed to make the distribution Gaussian. For example, the researcher might take the logarithm or reciprocal of all values. There are often biological or chemical reasons (as well as statistical ones) for performing a particular transform.
8.3. CHOOSING BETWEEN PARAMETRIC AND
NONPARAMETRIC TESTS: THE HARD CASES
It is not always easy to decide whether a sample comes from
a Gaussian population. Consider these points:
 If the researcher collects many data points (over a hundred or so), the distribution of the data can be inspected, and it will be fairly obvious whether the distribution is approximately bell shaped. A formal statistical test (the Kolmogorov-Smirnov test, not explained in this book) can be used to test whether the distribution of the data differs significantly from a Gaussian distribution; a minimal sketch of such a check is given after this list. With few data points, it is difficult to tell whether the data are Gaussian by inspection, and the formal test has little power to discriminate between Gaussian and non-Gaussian distributions.
 The researcher should look at previous data as well. Remember, what matters is the distribution of the overall population, not the distribution of the researcher's sample. In deciding whether a population is Gaussian, look at all available data, not just the data in the current experiment.
 Consider the source of scatter. When the scatter comes from
the sum of numerous sources (with no one source
contributing most of the scatter), researcher expect to find a
roughly Gaussian distribution.
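As an aside not found in the original text, the following minimal Python sketch shows how such a normality check can be carried out with SciPy; the sample data, the use of the sample's own mean and SD in the Kolmogorov-Smirnov test, and the 0.05 cut-off are all illustrative assumptions.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=50.0, scale=10.0, size=120)   # made-up measurements

# Kolmogorov-Smirnov test against a normal distribution parameterised by
# the sample's own mean and SD (a common, if approximate, usage).
ks_stat, ks_p = stats.kstest(sample, "norm", args=(sample.mean(), sample.std(ddof=1)))

# Shapiro-Wilk test, often preferred for small to moderate samples.
sw_stat, sw_p = stats.shapiro(sample)

print(f"K-S: D = {ks_stat:.3f}, p = {ks_p:.3f}")
print(f"Shapiro-Wilk: W = {sw_stat:.3f}, p = {sw_p:.3f}")
# A p-value below 0.05 casts doubt on the Gaussian assumption, suggesting a
# nonparametric test or a transformation instead.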

8.4. t-test (Test of significance)
‘t’ test or test of significance of the difference between
means for large independent samples (Garrett, 1969) is used to
compare the means between any two groups on any of the
variables.
If the ‘t’ value is below a cut-off point (depending on the
degrees of freedom), the difference in means is considered not
significant, and the null hypothesis is accepted. When the ‘t’
value exceeds a cut-off point, the difference is said to be
significant and the null hypothesis is rejected.
The t-test is used to find the level of significance of the difference between two group means. The t-value is calculated from the means and standard deviations of the two groups. For large samples, if the obtained value is 2.58 or above, the difference is significant at the 0.01 level; if the value lies between 1.96 and 2.58, it is significant at the 0.05 level; and if the value is below 1.96, the difference is not significant at either level. The t-value is calculated using the formula

t = \frac{M_1 - M_2}{\sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}}}

Where,
M1 = Mean of the first sample
M2 = Mean of the second sample
σ1 = S.D. of the first sample
σ2 = S.D. of the second sample
N1 = Total number of frequencies (size) of the first sample
N2 = Total number of frequencies (size) of the second sample
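To make the computation concrete, here is a minimal Python sketch (an addition, not part of the original text) that evaluates the large-sample t formula above and compares the result against the critical values mentioned earlier; the group summaries are invented for illustration.

import math

# Invented group summaries
M1, M2 = 52.4, 48.9        # means of the first and second samples
s1, s2 = 10.2, 9.7         # standard deviations
N1, N2 = 120, 115          # sample sizes

# Standard error of the difference between means, then the critical ratio t
se_diff = math.sqrt(s1**2 / N1 + s2**2 / N2)
t = (M1 - M2) / se_diff
print(f"t = {t:.2f}")      # compare with 1.96 (0.05 level) and 2.58 (0.01 level)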


8.5. F-ratio (ANOVA-Single factor)


This method is widely used in experiments in the behavioural and social sciences to test the significance of differences among means in different groups of a population. With this technique it is possible to determine the significance of the differences among several means in a single test rather than in many separate tests. In this way it controls the Type I error rate, unlike a series of 't' tests.
Analysis of Variance (ANOVA) is an extremely useful technique for testing the difference between the means of multiple independent samples. The basic principle of ANOVA is to test the differences among the means of the samples by examining the amount of variation between the samples relative to the amount of variation within the samples. This value is compared with the 'F' value for the given degrees of freedom. If the 'F' value worked out is equal to or exceeds the 'F' limit value (from tables), it indicates that there are significant differences among the sample means.
The F-test is an effective way to determine whether the means of more than two samples are too different to attribute to sampling error. It consists of the following operations.
1) The sum of scores and the sum of squares of the scores are
obtained.
2) The variance of the scores of all subjects treated as a single composite group is known as the total group variance (total sum of squares).
a. The formula is,

SS_t = \sum x^2 - \frac{(\sum x)^2}{N}
3) The variation attributable to differences among the group means is known as the between-groups variance (between-groups sum of squares).
a. The formula is,

SS_b = \frac{(\sum x_1)^2}{N_1} + \frac{(\sum x_2)^2}{N_2} + \frac{(\sum x_3)^2}{N_3} + \cdots + \frac{(\sum x_n)^2}{N_n} - \frac{(\sum x)^2}{N}
4) Within the group sum of squares.
SSw = SSt - SSb
5) Computation of the F-ratio. The F-ratio is obtained by dividing the between-groups mean square by the within-groups mean square:

F = \frac{MS_b}{MS_w} = \frac{SS_b / df_1}{SS_w / df_2}

Where,
MSb = Mean square (variance) between groups
MSw = Mean square (variance) within groups
df1 = Number of groups - 1
df2 = Total number of students - Number of groups
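As an illustration that is not part of the original text, the Python sketch below computes the one-way ANOVA F-ratio from the sums-of-squares formulas above and cross-checks it with SciPy's f_oneway; the three groups of scores are made up.

import numpy as np
from scipy import stats

# Hypothetical scores for three independent groups
g1 = np.array([12, 15, 14, 10, 13], dtype=float)
g2 = np.array([18, 20, 17, 19, 21], dtype=float)
g3 = np.array([11, 9, 12, 10, 8], dtype=float)
groups = [g1, g2, g3]

all_scores = np.concatenate(groups)
N, k = all_scores.size, len(groups)
correction = all_scores.sum() ** 2 / N

ss_t = np.sum(all_scores ** 2) - correction                      # total SS
ss_b = sum(g.sum() ** 2 / g.size for g in groups) - correction   # between-groups SS
ss_w = ss_t - ss_b                                               # within-groups SS

df1, df2 = k - 1, N - k
F = (ss_b / df1) / (ss_w / df2)
print(f"F = {F:.3f} with df = ({df1}, {df2})")

# Cross-check with SciPy's built-in one-way ANOVA
F_check, p = stats.f_oneway(g1, g2, g3)
print(f"scipy f_oneway: F = {F_check:.3f}, p = {p:.4f}")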

8.6. ANCOVA (Analysis of Co-variance)
Analysis of co-variance is an extension of analysis of variance that allows for the correlation between initial and final scores. It is also an improvement over the analysis of variance technique. Analysis of co-variance is useful for experimental psychologists when, for various reasons, it is impossible or difficult to equate experimental and control groups at the start, a situation which often obtains in actual experiments. Through analysis of co-variance one is able to adjust the final or terminal scores so as to allow for differences in some initial variable.
Definition of Co-variance
The term co-variance has been defined, “Statistically the co-
variance may be defined as the function of two correlated factors
and their analysis into corresponding parts”.
Practically, analysis of co-variance is a technique to adjust the final scores for the initial scores, so that the net effect can be analyzed. The analysis of variance technique tests the significance of differences among final scores or among initial scores. The difference in final scores may be attributable to treatment effects, or it may be caused by differences at the outset. The analysis of co-variance technique adjusts the final scores for initial performance in order to obtain the net effects of the treatments.
Meaning and Functions of Analysis of Variance
In the experiments of psychology and education, the final
scores are used to test the effectiveness of treatments. The
analysis of variance technique is used for statistical analysis. The
initial performance is not considered and groups are not equated.
The initial performance is included in the final scores. The
difference may be due to initial performance rather than to the treatment effect. Thus, the findings through analysis of variance
may not be valid. The initial performance should be adjusted to
final scores for obtaining the net effect of the treatments. The

initial and final scores are correlated, giving rise to co-variance. Thus the analysis of co-variance technique adjusts the final scores.

C.V. = \frac{\sigma}{\bar{X}} \times 100

Where,
σ = Standard Deviation
X̄ = Mean Value
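As a small illustrative addition (not in the original text), the coefficient of variation formula above can be evaluated as follows; the scores are made up.

import numpy as np

scores = np.array([42.0, 55.0, 48.0, 61.0, 50.0])   # hypothetical scores
sd = scores.std(ddof=1)                              # standard deviation
mean = scores.mean()                                 # mean value
cv = sd / mean * 100                                 # coefficient of variation (%)
print(f"C.V. = {cv:.1f}%")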
8.7. Correlation Technique
Correlation is used for measuring the degree of
relationship between two variables. It shows us the extent
to which values in one variable are linked or related to
values in another variable.
Correlation coefficient is calculated using the formula,

r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}}

where,
r = Correlation coefficient
ΣX = Sum of X scores
ΣY = Sum of Y scores
ΣX² = Sum of squares of X scores
ΣY² = Sum of squares of Y scores
ΣXY = Sum of the products of X and Y
N = Number of students
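The following minimal Python sketch (an addition, not drawn from the original text) evaluates this raw-score formula directly and cross-checks it against SciPy's pearsonr; the paired X and Y scores are invented.

import numpy as np
from scipy import stats

# Hypothetical paired scores for N students
X = np.array([10, 12, 15, 11, 14, 13], dtype=float)
Y = np.array([20, 25, 30, 22, 28, 27], dtype=float)
N = X.size

# Raw-score ("machine") formula for Pearson's r
numerator = N * np.sum(X * Y) - X.sum() * Y.sum()
denominator = np.sqrt((N * np.sum(X ** 2) - X.sum() ** 2) *
                      (N * np.sum(Y ** 2) - Y.sum() ** 2))
r = numerator / denominator
print(f"r (raw-score formula) = {r:.4f}")

# Cross-check with SciPy
r_check, p = stats.pearsonr(X, Y)
print(f"r (scipy.pearsonr)    = {r_check:.4f}, p = {p:.4f}")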
8.8. Multiple Correlation
Multiple correlation is used for estimating the inter-correlations among the independent variables as well as their correlations with the dependent variable. The coefficient of multiple correlation indicates the strength of the relationship between one variable (the dependent variable) and two or more others (the independent variables) taken together.
Multiple Regression Analysis
1. Regression Equation in obtained scores form (XC)
Xc = a0 +a1 X1 + a2 X2 + a3 X3 + a4 X4 + a5 X5 + a6 X6 + a7 X7 +
a8 X8 + a9 X9 + a10 X10
Where, Xi= Independent variables,
ai = Co-efficient of independent variables for i = 1 to 10; and
a0 = Regression Constant
2. Regression Equation in standard scores form (ZC)
Zc = a1 Z1 + a2 Z2 + a3 Z3 + a4 Z4 + a5 Z5 + a6 Z6 + a7 Z7 + a8 Z8 +
a9 Z9 + a10 Z10
Where, Zi= Independent variables, and
ai = Co-efficient of independent variables for i = 1 to 10
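As an illustration that is not part of the original text, the Python sketch below fits a multiple linear regression by ordinary least squares with NumPy, producing both the obtained-scores coefficients (with the constant a0) and the standard-scores beta weights; only three predictors, rather than ten, are used, and all values are invented.

import numpy as np

# Hypothetical data: 8 cases, 3 independent variables (columns), 1 dependent variable
X = np.array([
    [12, 30, 5],
    [15, 28, 7],
    [11, 35, 6],
    [14, 32, 8],
    [10, 40, 4],
    [16, 25, 9],
    [13, 33, 6],
    [12, 31, 7],
], dtype=float)
y = np.array([50, 58, 49, 55, 45, 62, 52, 51], dtype=float)

# Obtained-scores form: a column of 1s lets the regression constant a0 be estimated
X_design = np.column_stack([np.ones(len(y)), X])
coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
a0, a = coeffs[0], coeffs[1:]
print("a0 =", round(a0, 3), " a1..a3 =", np.round(a, 3))

# Standard-scores form: standardize X and y; the fitted weights are the beta weights
Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
yz = (y - y.mean()) / y.std(ddof=1)
betas, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
print("beta weights =", np.round(betas, 3))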
8.9. Factor Analysis
Factor analysis is a general label applied to a set of statistical procedures designed to identify the basic dimensions or factors that underlie the relationships among a large number of variables.
Harman (1960) defines the procedure of factor analysis as
follows: "The principal concern of factor analysis is the resolution of a set of variables linearly in terms of a smaller number of categories or 'factors'. This resolution can be
accomplished by the analysis of the correlation among the
variables. A satisfactory solution will yield factors, which
convey all the essential information of the original set of

variables. Thus, the chief aim is to attain scientific parsimony or
economy of description”.
Guilford (1956) outlines the different steps in a factor analytic study in the following terms:
a) Select an appropriate domain for investigation.
b) Develop a hypothesis concerning the factor.
c) Select or construct suitable tests.
d) Select a suitable population.
e) Obtain a sample of adequate size.
f) Extract factors with communalities in the diagonal cells
of the correlation matrix.
g) Rotate the reference axes and
h) Interpret the rotated factors.
The present investigation made use of the principal-axes method, as it is one of the most satisfactory procedures of factor analysis. Fruchter (1954) explains the superiority of this method in the following terms.
The principal-axes method of factoring the correlation
matrix is of interest for several reasons. Each factor extracts the
maximum amount of variance (i.e., the sum of squares of factor loadings is maximized on each factor) and gives the smallest
possible residuals. The correlation matrix is condensed into the
smallest number of orthogonal factors by this method. The
method also has an advantage of giving mathematically unique
(least square) solution for a given table of correlations. Harman
(1960) points out that this method requires a larger number of
computations. But this difficulty is overcome with the help of
high speed computers.


Test of Significance of Extracted Factors


The test of significance is applied to the obtained factors and
only those, which are significant, are retained for final
interpretation.
Interpretations of Factors: Principles and Criteria
a) Locate the group of variables in which the factor has the
highest loadings.
b) Locate the group of variables in which the factor has the
lowest loadings.
c) Examine the possibility of different factors becoming
independent and
d) Treat factor loadings whose absolute values are greater
than 0.30 as significant and neglect others as not
significant.
The degree of presence of each variable in a factor is determined as follows:
a. Factor loading above 0.900 – extremely high presence of the variable,
b. Factor loading 0.700 to 0.900 – high presence of the variable,
c. Factor loading 0.550 to 0.700 – considerable presence,
d. Factor loading 0.450 to 0.550 – variable somewhat present,
e. Factor loading 0.300 to 0.450 – variable present but low, and
f. Factor loading below 0.300 – variable not present.
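As a rough computational sketch that is not part of the original text, the Python fragment below extracts unrotated loadings from a small correlation matrix by an eigen-decomposition (the idea underlying the principal-components/principal-axes approach, without iterating communalities) and then applies the 0.30 rule of thumb listed above; the correlation matrix is invented.

import numpy as np

# Invented correlation matrix for four variables (symmetric, 1s on the diagonal)
R = np.array([
    [1.00, 0.62, 0.55, 0.10],
    [0.62, 1.00, 0.48, 0.12],
    [0.55, 0.48, 1.00, 0.08],
    [0.10, 0.12, 0.08, 1.00],
])

# Eigen-decomposition, sorted from largest to smallest eigenvalue
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Unrotated loadings on the first two factors: eigenvector times sqrt(eigenvalue)
loadings = eigvecs[:, :2] * np.sqrt(eigvals[:2])
print("Loadings (factor 1, factor 2):")
print(np.round(loadings, 3))

# Apply the 0.30 criterion from the interpretation rules above
print("|loading| > 0.30:")
print(np.abs(loadings) > 0.30)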

8.10. REGRESSION OR CORRELATION
Linear regression and correlation are similar and easily
confused. In some situations it makes sense to perform both
calculations. Calculate linear correlation if the researcher measured both X and Y in each subject and wishes to quantify how well they are associated. Select the Pearson (parametric) correlation coefficient if it can be assumed that both X and Y are sampled from Gaussian populations. Otherwise choose the Spearman nonparametric correlation coefficient. Don't calculate the correlation coefficient (or its confidence interval) if the researcher manipulated the X variable.
Calculate linear regression only if one of the variables (X) is likely to precede or cause the other variable (Y). Definitely choose linear regression if the researcher manipulated the X variable. It makes a big difference which variable is called X and which is called Y, as linear regression calculations are not symmetrical with respect to X and Y. If the researcher swaps the two variables, a different regression line will be obtained. In contrast, linear correlation calculations are symmetrical with respect to X and Y: if the labels X and Y are swapped, the same correlation coefficient is still obtained, as the sketch below illustrates.
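Here is a minimal Python sketch (an addition, not part of the original text) that swaps X and Y and compares the regression slopes with the correlation coefficients; the data are invented.

import numpy as np
from scipy import stats

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])   # hypothetical responses

# Regression is not symmetrical: regressing Y on X and X on Y give different lines
slope_yx = stats.linregress(X, Y).slope
slope_xy = stats.linregress(Y, X).slope
print("slope of Y on X:", round(slope_yx, 3))
print("slope of X on Y:", round(slope_xy, 3))   # not simply the reciprocal, in general

# Correlation is symmetrical: swapping X and Y leaves r unchanged
r_xy, _ = stats.pearsonr(X, Y)
r_yx, _ = stats.pearsonr(Y, X)
print("r(X, Y) =", round(r_xy, 4), " r(Y, X) =", round(r_yx, 4))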
Restrictions of parametric tests
Conventional statistical procedures are also called parametric
tests. In a parametric test a sample statistic is obtained to
estimate the population parameter. Because this estimation
process involves a sample, a sampling distribution, and a
population, certain parametric assumptions are required to ensure
all components are compatible with each other. For example, in
Analysis of Variance (ANOVA) there are three assumptions:


 Observations are independent.
 The sample data have a normal distribution.
 Scores in different groups have homogeneous variances.
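As an aside not in the original text, these assumptions can be screened in practice; the minimal Python sketch below uses SciPy's Shapiro-Wilk test for normality within each group and Levene's test for homogeneity of variances, with made-up groups.

import numpy as np
from scipy import stats

# Hypothetical scores in three treatment groups
g1 = np.array([23, 25, 21, 27, 24], dtype=float)
g2 = np.array([30, 28, 33, 29, 31], dtype=float)
g3 = np.array([22, 20, 25, 23, 21], dtype=float)

# Normality within each group (Shapiro-Wilk)
for i, g in enumerate([g1, g2, g3], start=1):
    w, p = stats.shapiro(g)
    print(f"group {i}: Shapiro-Wilk p = {p:.3f}")

# Homogeneity of variances across groups (Levene's test)
stat, p = stats.levene(g1, g2, g3)
print(f"Levene's test p = {p:.3f}")
# Small p-values cast doubt on the corresponding ANOVA assumption.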
In a repeated-measures design, it is assumed that the data structure conforms to compound symmetry. A regression model assumes the absence of collinearity, the absence of autocorrelation, random residuals, linearity, and so on. In structural equation modeling, the data should be multivariate normal.
Why are they important? Take ANOVA as an example.
ANOVA is a procedure of comparing means in terms of variance
with reference to a normal distribution. The inventor of
ANOVA, Sir R. A. Fisher (1935) clearly explained the
relationship among the mean, the variance, and the normal
distribution: "The normal distribution has only two
characteristics, its mean and its variance. The mean determines
the bias of our estimate, and the variance determines its
precision." (p.42) It is generally known that the estimation is
more precise as the variance becomes smaller and smaller.
Put another way: the purpose of ANOVA is to extract precise information out of bias, or to filter signal out of noise.
When the data are skewed (non-normal), the means can no
longer reflect the central location and thus the signal is biased.
When the variances are unequal, not every group has the same
level of noise and thus the comparison is invalid. More
importantly, the purpose of parametric test is to make inferences
from the sample statistic to the population parameter through
sampling distributions. When the assumptions are not met in the
sample data, the statistic may not be a good estimate of the
parameter. It is incorrect to say that the population is assumed to

be normal and equal in variance, therefore the researcher
demands the same properties in the sample. Actually, the
population is infinite and unknown. It may or may not possess
those attributes. The required assumptions are imposed on the
data because those attributes are found in sampling distributions.
However, very often the acquired data do not meet these assumptions. There are several alternatives to rectify this situation, such as using non-parametric tests, applying robust procedures, or transforming the data. Non-parametric procedures, however, are criticized for the following reasons:
 Unable to estimate the population: Because non-
parametric tests do not make strong assumptions about the
population, a researcher could not make an inference that the
sample statistic is an estimate of the population parameter.
 Losing precision: Edgington (1995) asserted that when
more precise measurements are available, it is unwise to
degrade the precision by transforming the measurements into
ranked data.
 Low power: Generally speaking, the statistical power of
non-parametric tests is lower than that of their parametric
counterpart except on a few occasions (Hodges & Lehmann,
1956; Tanizaki, 1997; Freidlin & Gastwirth, 2000).
 False sense of security: It is generally believed that non-
parametric tests are immune to parametric assumption
violations and the presence of outliers. However,
Zimmerman (2000) found that the significance levels of the WMW (Wilcoxon-Mann-Whitney) test and the KW (Kruskal-Wallis) test are substantially biased by unequal variances even when the sample sizes in both groups are equal. In some cases the Type I error rate can increase up to

40-50%, and sometimes up to 300%. The presence of outliers is also detrimental to non-parametric tests. Zimmerman (1994) found that outliers modify the Type II error rate and power of both parametric and non-parametric tests in a similar way. In short, non-parametric tests are not as robust as many researchers think.
 Lack of software: Currently very few statistical software
applications can produce confidence intervals for
nonparametric tests. MINITAB and Stata are a few
exceptions.
 Testing distributions only: Further, non-parametric tests are
criticized for being incapable of answering the focused
question. For example, the WMW procedure tests whether
the two distributions are different in some way but does not
show how they differ in mean, variance, or shape. Based on
this limitation, Johnson (1995) preferred robust procedures
and data transformation to non-parametric tests (Robust
procedures and data transformation will be introduced in the
next section).
At first glance, taking all of the above shortcomings into
account, non-parametric tests seem not to be advisable.
However, everything that exists has a reason to exist. Despite the
preceding limitations, nonparametric methods are indeed
recommended in some situations. By employing simulation
techniques, Skovlund and Fenstad (2001) compared the Type I error rates of the standard t-test, the WMW test, and Welch's test (a form of robust procedure, which will be discussed later) while varying three factors: variances (equal, unequal), distributions (normal, heavy-tailed, skewed), and sample sizes (equal, unequal). It was found that the WMW test is considered either the best or an acceptable method when the variances are equal, regardless of the distribution shape and of whether the sample sizes are equal.


