Practicum
(Two problems to be written in the practical files: One with Yates Correction and another
without Yates Correction)
In a statistical test two kinds of assertions are involved, viz., an assertion directly related to the
purpose of the investigation and another assertion needed to make a probability statement. The former
is the assertion to be tested and is technically called a hypothesis, whereas the other assertion is
called a model. When a test does not depend on a model that specifies the form of the population
distribution, it is known as a distribution-free or nonparametric test. Such a test is used when it is
difficult to assume that a particular distribution is applicable or that a certain value is attached
to a parameter of the population. Nonparametric statistics require few assumptions, no estimate of a
parameter in their computation, and no normal distribution of the variables in the population.
(a) Speed of Application: When the sample size is small or moderate, distribution-free methods are
generally faster than parametric techniques.
(b) Scope of Application: Since nonparametric tests are based on fewer and less elaborate
assumptions than parametric tests, nonparametric techniques can be correctly applied to a much
larger class of populations.
(d) Influence of Sample Size: When the sample size is less than or equal to 10, distribution-free
statistical tests are easier and quicker, though less efficient. At such sample sizes the parametric
assumptions may not be satisfied; for this reason, nonparametric tests are most appropriate. As the
sample size increases, nonparametric tests become more laborious and time-consuming and frequently
become much less efficient.
(e) Susceptibility to violations of assumptions: Since the assumptions of nonparametric statistical
tests are fewer and less elaborate, they are less susceptible to violation.
(f) In terms of the mathematical criterion of statistical efficiency, distribution-free tests are often
superior or equal to their parametric counterparts, particularly when the assumptions of
nonparametric tests are fulfilled but the assumptions of parametric tests are not. If both tests are
applied when all assumptions of the parametric test are met, distribution-free statistics are only
slightly less efficient when the sample size is small.
Nonparametric tests are: (a) easier to apply; (b) applicable to ranked data; (c) usable with two sets
of observations coming from different populations; (d) the only alternative when the sample size is
small; (e) useful at a specified significance level; (f) less efficient statistics; (g) not dependent
on a normal distribution of the variables in the population; (h) free of any precomputed statistic
used as an estimate of a parameter in the computation.
The Chi-square (χ2) test is an important test among the several tests of significance developed by
statisticians. As a nonparametric test it is used to determine whether categorical data show
dependence or whether two classifications are independent. It can also be used to compare a
theoretical population with actual data when categories are used. Thus, the chi-square test is
applicable to a large number of problems. The test is, in fact, a technique that makes it possible
for researchers to:
(a) test the goodness of fit; (b) test the significance of association between two attributes; and
(c) test the homogeneity or significance of population variance.
Chi-square (χ2) is an important nonparametric test and as such no rigid assumptions are necessary
in respect of the type of population. As a nonparametric test, Chi-square (χ2) can be used as a
test of goodness of fit and as a test of independence.
The differences between observed and expected frequencies are squared and divided by the
expected number in each case, and the sum of these quotients is χ2. The more closely the observed
results approximate to the expected, the smaller the chi-square and the closer the agreement between
the observed data and the hypothesis being tested. Conversely, the larger the chi-square, the greater
the probability of a real divergence of the experimentally observed results from the expected results.
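The verbal definition above can be sketched in a few lines of Python. This is only an illustration: the function name chi_square and the two sets of counts are hypothetical and do not come from the practicum.

```python
def chi_square(observed, expected):
    """Return the sum of (fo - fe)^2 / fe over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Hypothetical counts, invented only to show how the statistic behaves.
close_fit = chi_square([48, 52], [50, 50])  # observed close to expected
poor_fit = chi_square([30, 70], [50, 50])   # observed far from expected
print(close_fit, poor_fit)
```

The first call gives a small χ2 because the observed counts lie close to the expected ones; the second gives a much larger value, in line with the statement that a larger χ2 points to a real divergence.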
This test is used to explore how far a distribution of observed frequencies (fo) fits a theoretical
distribution such as the normal distribution, a binomial distribution, a Mendelian phenotype
distribution, or a distribution proposed by the hypothesis of equal probability. The fe values are
computed here on the basis of the proposed theoretical distribution. The χ2 computed from the
(fo - fe) values proves significant if it equals or exceeds the critical χ2 for the chosen level of
significance; in such a case, the fo distribution differs significantly from the proposed
distribution. A computed χ2 lower than the critical χ2 indicates that the fo distribution fits the
proposed distribution and does not differ significantly from the latter.
The classical formula, based on (fo - fe) values, is used in computing the χ2.
The alternative formula, avoiding the use of fe values, cannot be applied as no contingency table
is involved. Yates’ correction has to be applied if the fe of any class falls below 5 and the df
amounts to 1 only.
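As a rough sketch of a goodness-of-fit computation, the Python below tests the hypothesis of equal probability. The function name and the counts are hypothetical. Two points are assumptions of this sketch rather than statements from the text: df is taken as the number of classes minus 1, and a corrected difference is never allowed to fall below zero.

```python
def goodness_of_fit_equal(observed):
    """Chi-square for the hypothesis that every class is equally likely.

    Returns (chi_square, df, corrected), where corrected tells whether
    Yates' correction was applied.
    """
    n = sum(observed)
    k = len(observed)
    fe = n / k                        # same expected frequency for every class
    df = k - 1                        # assumption: no parameters are estimated
    corrected = df == 1 and fe < 5    # rule stated in the text
    total = 0.0
    for fo in observed:
        diff = abs(fo - fe)
        if corrected:
            # Bring the difference 0.5 closer to zero (assumed floor at zero).
            diff = max(diff - 0.5, 0.0)
        total += diff ** 2 / fe
    return total, df, corrected


# Hypothetical counts for three classes and for two small classes.
print(goodness_of_fit_equal([30, 40, 50]))
print(goodness_of_fit_equal([2, 6]))
```

For the three-class counts the computed χ2 would be compared with the tabled critical value for df = 2 (5.991 at the 0.05 level); for the two-class counts the correction applies because fe is 4 and df is 1.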
An association may exist between two variables if a change, in value or quality, of one variable
is accompanied by a similar or opposite change of the other in the same individual. Absence of
association is called the independence of two variables. A χ2 test of independence explores whether
or not two variables are independent of each other; in other words, it explores the existence of
any significant association between the variables. But it can neither (i) measure the magnitude
and the direction of the association, nor (ii) predict a cause-and-effect relationship between the
variables.
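A minimal sketch of a test of independence follows. It assumes the usual way of obtaining each expected frequency under independence (row total times column total, divided by the grand total) and takes df as (rows - 1)(columns - 1). The function names and the 2 x 3 table are hypothetical.

```python
def expected_frequencies(table):
    """Expected cell frequencies under independence, from the marginal totals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (chi_square, df) for an r x c contingency table."""
    fe = expected_frequencies(table)
    chi2 = sum(
        (fo - e) ** 2 / e
        for row_o, row_e in zip(table, fe)
        for fo, e in zip(row_o, row_e)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical 2 x 3 table, invented for illustration only.
table = [[20, 30, 50],
         [30, 30, 40]]
print(expected_frequencies(table))
print(chi_square_independence(table))
```

The result says only whether an association is significant; as noted above, it does not measure the size or direction of the association.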
When several χ2 values have been computed from independent experiments, these may be summed to
give a new chi-square with df equal to the sum of the separate dfs. The fact that chi-squares may be
added to provide an overall test of a hypothesis is important in many experimental studies.
Combining the data from several experiments will often yield a conclusive result when the separate
experiments, taken alone, provide only indications.
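The additive property can be illustrated with a short sketch. The three (χ2, df) pairs are hypothetical values chosen for illustration, and the function name is not from the text.

```python
def combine_chi_squares(results):
    """Sum chi-squares and dfs from independent experiments.

    results is a list of (chi_square, df) pairs.
    """
    total_chi2 = sum(chi2 for chi2, _ in results)
    total_df = sum(df for _, df in results)
    return total_chi2, total_df


# Hypothetical results of three independent experiments, each with df = 1.
experiments = [(3.2, 1), (2.9, 1), (3.5, 1)]
combined = combine_chi_squares(experiments)
print(combined)
```

Each hypothetical χ2 is below 3.841, the tabled critical value for df = 1 at the 0.05 level, while the combined χ2 of about 9.6 exceeds 7.815, the critical value for df = 3. This is the situation described above, where separate experiments give only indications but their sum is conclusive.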
For example
Problem No: 1
Specific Problem: The following table shows the varying degrees to which a group possesses two
characteristics. In total, 413 persons have been classified as to “eyedness” and “handedness”.
Do these data indicate that handedness and eyedness are essentially independent? Test the
hypothesis at the 0.05 level.
Null Hypothesis: There is no significant association between the two characteristics namely
handedness and eyedness. Any such association is due to chance alone.
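Statistic used for Testing of Hypothesis: A two-tailed Chi-square (χ2) test will be used here for
testing the hypothesis. The relevant formula is as follows:
\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \]
where fo and fe are the observed and expected frequencies of each cell, and
df = (r-1)(c-1), that is, (rows-1)(columns-1)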
Statistical Treatment:
Interpretation and Conclusion: The obtained chi-square (χ2) value has been found as -----. The
critical value of chi-square with df=------------ at the 0.05 level is ----------. As the obtained
chi-square value is lower/higher than the critical value, it can be said that the obtained chi-square
value is not significant/significant at the 0.05 level. So, the probability of obtaining such an
association by chance is greater/less than 0.05. It may also be said that the null hypothesis is
accepted/rejected and the alternative hypothesis is rejected/accepted. In other words, there is no
significant association/a significant association between handedness and eyedness. It means that the
two characteristics eyedness and handedness are independent/not independent of each other.
For example
Problem No: 2
General Problem: On Nonparametric Statistics
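Statistic used for Testing of Hypothesis: A two-tailed Chi-square (χ2) test will be used here for
testing the hypothesis. The relevant formula is as follows:
\[ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \]
where fo and fe are the observed and expected frequencies of each cell, and
df = (r-1)(c-1), that is, (rows-1)(columns-1)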
In this context, however, it may be noted that if any computed fe is less than 5 and the χ2 has a df
of 1 only, Yates’ correction has to be applied. Yates’ correction brings each (fo - fe) closer to zero
by 0.5; this means the subtraction of 0.5 from each positive (fo - fe) and the addition of 0.5 to each
negative (fo - fe). The corrected (fo - fe) values are used for computing χ2. In other words,
\[ \chi^2 = \sum \frac{(|f_o - f_e| - 0.5)^2}{f_e} \]
where the bars on the two sides of fo - fe indicate that all values of (fo - fe) are taken as positive,
ignoring their algebraic signs.
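Alternative formula:
\[ \chi^2 = \frac{n(AD - BC)^2}{(A+B)(C+D)(A+C)(B+D)} \]
where A, B, C and D are the cell frequencies of the 2 x 2 table (A and B in the first row, C and D in
the second row) and n is their total.
When any fe is less than 5, Yates’ correction is applied by changing the computational formula:
\[ \chi^2 = \frac{n\left(|AD - BC| - \frac{n}{2}\right)^2}{(A+B)(C+D)(A+C)(B+D)} \]
where the bars on the two sides of AD-BC indicate that (AD-BC) is taken as positive irrespective
of its algebraic sign.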
The correction is needed because a computed chi-square, being based on frequencies (which are whole
numbers), varies in discrete jumps, whereas the chi-square table, representing the distribution of
chi-square, gives values from a continuous scale. When frequencies are large, this correction is
relatively unimportant, but when they are small, a change of 0.5 is of some consequence. The
correction is particularly important when chi-square turns out to be near a point of division between
critical regions.
Statistical Treatment:
Diabetic 7 8 15
Nondiabetic 8 2 10
Total 15 10 25(n)
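The following Python sketch applies Yates' correction to the 2 x 2 table above in two ways: by reducing each absolute (fo - fe) by 0.5, and by the shortcut based on |AD - BC| reduced by n/2. The function names are hypothetical, and the cells a, b, c, d are read from the table by position (first row 7 and 8, second row 8 and 2), since the column labels are not shown.

```python
def yates_chi_square_classical(a, b, c, d):
    """Yates-corrected chi-square from observed and expected frequencies."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            fe = rows[i] * cols[j] / n   # expected frequency of the cell
            chi2 += (abs(observed[i][j] - fe) - 0.5) ** 2 / fe
    return chi2


def yates_chi_square_shortcut(a, b, c, d):
    """Yates-corrected chi-square from cell products and marginal totals."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Cell frequencies taken from the table above.
print(yates_chi_square_classical(7, 8, 8, 2))
print(yates_chi_square_shortcut(7, 8, 8, 2))
```

The smallest expected frequency in this table is 10 x 10 / 25 = 4, which is below 5, and df is 1, so the correction is called for. Both functions should return the same value, which is then compared with the tabled critical value for df = 1.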
Interpretation and Conclusion: The obtained chi-square (χ2) value has been found as ------. The
critical value of chi-square with df=1 at the 0.05 level is ----------. As the obtained chi-square
value is lower/higher than the critical value, it can be said that the obtained chi-square value is
not significant/significant at the 0.05 level. So, the probability of obtaining such an association
by chance is greater/less than 0.05. It may also be said that the null hypothesis is
accepted/rejected and the alternative hypothesis is rejected/accepted. In other words, there is no
significant association/a significant association between hypercholesterolemia and diabetes.
* The statistics mentioned here are only an example to show the format of writing; for the
practical file, any statistical calculation (one with Yates Correction and another without
Yates Correction) can be done according to the decision taken in the workshop.
For example
Problem No: 3
Specific Problem: Among 90 children, 60 had been vaccinated against influenza, and 12 of them had a
severe influenza attack, whereas among the 30 non-vaccinated children 10 had a minor attack and 20
had a severe attack of influenza. State and test the null hypothesis.
Statistic used for Testing of Hypothesis: A two-tailed Chi-square (χ2) test will be used here for
testing the hypothesis. The relevant formula is as follows:
\[ \chi^2 = \frac{n(AD - BC)^2}{(A+B)(C+D)(A+C)(B+D)} \]
χ2 here is computed using the products of pairs of cell frequencies (AD and BC) and the marginal
totals.
Statistical Treatment:
                  Minor attack   Severe attack   Total
Vaccinated             48              12           60
Non-vaccinated         10              20           30
Total                  58              32           90 (n)
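A minimal sketch of the computation for this table, using the products AD and BC and the marginal totals without Yates' correction. The function name is hypothetical, the cells are read by position (A = 48, B = 12, C = 10, D = 20), and the two critical values are the standard tabled values for df = 1.

```python
def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square for a 2 x 2 table from cell products and marginals."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Tabled critical values of chi-square for df = 1.
CRITICAL_05 = 3.841
CRITICAL_01 = 6.635

chi2 = chi_square_2x2(48, 12, 10, 20)
print(round(chi2, 4))
print("significant at 0.05:", chi2 >= CRITICAL_05)
print("significant at 0.01:", chi2 >= CRITICAL_01)
```

The smallest expected frequency here is 30 x 32 / 90, about 10.67, so under the rule given earlier no Yates' correction is needed.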
Interpretation and Conclusion: The obtained chi-square (χ2) value has been found as ------. The
critical values of chi-square with df=1 at the 0.05 and 0.01 levels are ---------- & --------. As the
obtained chi-square value is lower/higher than the critical value, it can be said that the obtained
chi-square value is not significant/significant at the 0.05/0.01 level. So, the probability of
obtaining such an association by chance is greater/less than 0.05/0.01. It may also be said that the
null hypothesis is accepted/rejected and the alternative hypothesis is rejected/accepted. In other
words, vaccination against influenza does not show/shows a genuine capacity to check the severity of
attack in children.