
CHI – SQUARE TEST

AND
ANOVA (ANALYSIS OF
VARIANCE)

GROUP – 4:
Members:
1. Mamac, Mariel T.
2. Mendoza, April M.
3. Mejares, Weliza
4. Nuique, Benifacia O.
5. Turtal, Jamesly T.
6. Villalon, Merrycris M.

IMPORTANT TERMS:

1) PARAMETRIC TEST: A test that uses population constants such as the mean, standard deviation, standard error, correlation coefficient, or proportion, and in which the data are assumed to follow an established distribution such as the normal, binomial, or Poisson distribution.

2) NON-PARAMETRIC TEST: A test in which no population constant is used. The data are not assumed to follow any specific distribution, and no distributional assumptions are made. E.g., to classify items as good, better and best, we simply allocate arbitrary numbers or marks to each category.
3) HYPOTHESIS: It is a definite statement about the
population parameters.

4) NULL HYPOTHESIS: (H0) states that no association exists between the two cross-tabulated variables in the population, and therefore the variables are statistically independent. E.g., if we want to compare two methods, method A and method B, for superiority, and we assume that both methods are equally good, then this assumption is called the null hypothesis.

5) ALTERNATIVE HYPOTHESIS: (H1) proposes that the two variables are related in the population. If we assume that of the two methods, method A is superior to method B, then this assumption is called the alternative hypothesis.

6) DEGREE OF FREEDOM: It denotes the extent of independence (freedom) enjoyed by a given set of observed frequencies. Suppose we are given a set of n observed frequencies which are subjected to k independent constraints (restrictions); then

d.f. = (number of frequencies) − (number of independent constraints on them)

For a contingency table, d.f. = (r − 1)(c − 1), where:
r = the number of rows
c = the number of columns
7) CONTINGENCY TABLE: When the table is prepared
by enumeration of qualitative data by entering the actual
frequencies, and if that table represents occurrence of two
sets of events, that table is called the contingency table.
(Latin con-, together; tangere, to touch). It is also called an association table.

INTRODUCTION
 The chi-square test is an important test amongst the several tests of significance developed by statisticians.
 It was developed by Karl Pearson in 1900.
 The chi-square test is a non-parametric test, not based on any assumption about the distribution of any variable.
 This statistical test follows a specific distribution known as the chi-square distribution.
 In general, the test we use to measure the differences between what is observed and what is expected according to an assumed hypothesis is called the chi-square test.

IMPORTANT CHARACTERISTICS OF A CHI


SQUARE TEST
➤ This test (as a non-parametric test) is based on
frequencies and not on the parameters like mean and
standard deviation.
➤ The test is used for testing the hypothesis and is not
useful for estimation.
➤ This test can also be applied to a complex contingency
table with several classes and as such is a very useful test
in research work.
➤ This test is an important non-parametric test, as no rigid assumptions are necessary regarding the type of population, no parameter values are needed, and relatively few mathematical details are involved.

CHI SQUARE DISTRIBUTION:


If X1, X2, …, Xn are independent normal variates, each distributed normally with mean zero and standard deviation unity, then X1² + X2² + … + Xn² is distributed as chi-square (χ²) with n degrees of freedom (d.f.). The chi-square curves for d.f. n = 1, 5 and 9 are as follows.
Figure 14.1 Chi-Square Distributions for 1, 5, and 9 Degrees of
Freedom

If degrees of freedom > 2: the distribution is bell-shaped.

If degrees of freedom = 2: the distribution is L-shaped with maximum ordinate at zero.

If degrees of freedom < 2 (> 0): the distribution is L-shaped with infinite ordinate at the origin.
APPLICATIONS OF A CHI SQUARE TEST

This test can be used for:

1) goodness of fit of distributions
2) test of independence of attributes
3) test of homogeneity.

1) TEST OF GOODNESS OF FIT OF


DISTRIBUTIONS:
➢ This test enables us to see how well the assumed theoretical distribution (such as the binomial, Poisson or normal distribution) fits the observed data.
➢ The χ² test formula for goodness of fit is:

χ² = Σ (O − E)² / E

Where,
O = observed frequency
E = expected frequency
➢ If χ² (calculated) > χ² (tabulated) with (n − 1) d.f., then the null hypothesis is rejected; otherwise it is accepted.
➢ And if the null hypothesis is accepted, then it can be concluded that the given distribution follows the theoretical distribution.
2) TEST OF INDEPENDENCE OF
ATTRIBUTES
➢ Test enables us to explain whether or not
two attributes are associated.
➢ For instance, if we are interested in knowing whether a new medicine is effective in controlling fever or not, the χ² test is useful.
➢ In such a situation, we proceed with the
null hypothesis that the two attributes
(viz., new medicine and control of fever) are
independent which means that new
medicine is not effective in controlling
fever.
➢ If χ² (calculated) > χ² (tabulated) at a certain level of significance for the given degrees of freedom, the null hypothesis is rejected, i.e. the two variables are dependent (i.e., the new medicine is effective in controlling fever); and if χ² (calculated) < χ² (tabulated), the null hypothesis is accepted, i.e. the two variables are independent (i.e., the new medicine is not effective in controlling fever).
➢ When null hypothesis is rejected, it can be
concluded that there is a significant
association between two attributes.
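As a concrete sketch of this decision rule, the medicine-vs-fever comparison can be run with `scipy.stats.chi2_contingency`; all table counts below are invented for illustration:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table (counts invented for illustration):
# rows = new medicine / no medicine, columns = fever controlled / not controlled
table = np.array([[60, 40],
                  [35, 65]])

# correction=False gives the plain chi-square, without Yates' continuity correction
chi2_stat, p, dof, expected = chi2_contingency(table, correction=False)

print(chi2_stat)  # calculated chi-square
print(dof)        # (r - 1)(c - 1) = 1
print(p)          # if p < significance level, reject H0 (attributes are associated)
```

A small p-value here corresponds to χ² (calculated) exceeding χ² (tabulated), i.e. the null hypothesis of independence is rejected.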
3) TEST OF HOMOGENEITY
➢ This test can also be used to test whether the occurrence of events follows uniformity or not, e.g. whether the admission of patients to a government hospital is uniform across all days of the week can be tested with the help of the chi-square test.
➢ If χ² (calculated) < χ² (tabulated), then the null hypothesis is accepted, and it can be concluded that there is uniformity in the occurrence of the events (uniformity in the admission of patients throughout the week).
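A minimal sketch of such a uniformity test with `scipy.stats.chisquare`; the weekly admission counts are hypothetical:

```python
from scipy.stats import chisquare

# Hypothetical daily admission counts for one week (Mon..Sun)
admissions = [22, 18, 25, 20, 19, 24, 26]

# With no expected frequencies given, chisquare assumes uniformity:
# expected = sum(admissions) / 7 = 22 per day
stat, p = chisquare(admissions)

print(stat, p)  # a large p means the data are consistent with uniform admissions
```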

CALCULATION OF CHI SQUARE

χ² = Σ (O − E)² / E

Where,
O = observed frequency
E = expected frequency
If two distributions (observed and theoretical) are exactly alike, χ² = 0; but generally, due to sampling errors, χ² is not equal to zero.
STEPS INVOLVED IN CALCULATING χ²

1) Calculate the expected frequencies and the


observed frequencies:

Expected frequencies (fe): the cell frequencies that would be expected in a contingency table if the two variables were statistically independent.

Observed frequencies (fo): the cell frequencies actually observed in a contingency table.

fe = (column total)(row total) / N

To obtain the expected frequency for any cell in any cross-tabulation in which the two variables are assumed independent, multiply the row and column totals for that cell and divide the product by the total number of cases in the table.

2) Then χ² is calculated as follows:

χ² = Σ (fo − fe)² / fe
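The two steps above can be sketched directly in NumPy; the observed table below is hypothetical:

```python
import numpy as np

# Hypothetical observed 2x2 contingency table
fo = np.array([[30, 20],
               [20, 30]])

N = fo.sum()
row_totals = fo.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = fo.sum(axis=0, keepdims=True)   # shape (1, 2)

# Step 1: fe = (row total)(column total) / N, computed for every cell at once
fe = row_totals * col_totals / N

# Step 2: chi-square = sum of (fo - fe)^2 / fe over all cells
chi2_stat = ((fo - fe) ** 2 / fe).sum()

print(fe)         # every cell expects 25 here
print(chi2_stat)  # 4.0
```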
CONDITIONS FOR THE APPLICATION
OF TEST
The following conditions should be satisfied before the χ² test can be applied:

1) The data must be in the form of frequencies


2) The frequency data must have a precise
numerical value and must be organised into
categories or groups.
3) Observations recorded and used are collected on
a random basis.
4) All the items in the sample must be independent.
5) No group should contain very few items, say less
than 10. In case where the frequencies are less
than 10, regrouping is done by combining the
frequencies of adjoining groups so that the new
frequencies become greater than 10. (Some
statisticians take this number as 5, but 10 is
regarded as better by most of the statisticians.)
6) The overall number of items must also be
reasonably large. It should normally be at least 50.

EXAMPLE:
A die is thrown 132 times with the following results (observed frequencies for the six faces):
Solution: Let us take the hypothesis that the die is unbiased. If that is so, the probability of obtaining any one of the six numbers is 1/6, and as such the expected frequency of any one number coming upward is 132 × 1/6 = 22. Now we can write the observed frequencies along with the expected frequencies and work out the value of χ² as follows:

∑[(O-E)²/E]=9

➢ Hence, the calculated value of χ² = 9.

➢ The degrees of freedom in the given problem are (n − 1) = (6 − 1) = 5.

The table value of χ² for 5 degrees of freedom at the 5 percent level of significance is 11.071. Comparing the calculated and table values of χ², we find that the calculated value is less than the table value, and as such the difference could have arisen due to fluctuations of sampling. The result thus supports the hypothesis, and it can be concluded that the die is unbiased.
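The same computation can be checked in code. The slide's observed-frequency table is not reproduced, so the frequencies below are hypothetical values chosen only to be consistent with the stated result Σ(O − E)²/E = 9:

```python
from scipy.stats import chisquare, chi2

# Hypothetical observed frequencies for the six faces (sum to 132);
# chosen only to reproduce the stated chi-square value of 9
observed = [34, 16, 18, 21, 21, 22]
expected = [22] * 6                      # 132 x 1/6 = 22 per face

stat, p = chisquare(observed, f_exp=expected)
critical = chi2.ppf(0.95, df=5)          # table value at the 5% level, 5 d.f.

print(stat)      # 9.0
print(critical)  # ~11.071
# stat < critical, so the hypothesis that the die is unbiased is retained
```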
YATE'S CORRECTION

If, in a 2×2 contingency table, the expected frequencies are small, say less than 5, then the χ² test can't be used directly. In that case, the direct formula of the chi-square test is modified by Yates' correction for continuity:

χ² (corrected) = N (|ad − bc| − 0.5N)² / (R1 R2 C1 C2)

where a, b, c and d are the four cell frequencies, R1, R2 are the row totals and C1, C2 are the column totals.
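A quick sketch comparing this direct formula with `scipy.stats.chi2_contingency`, which applies the same continuity correction to 2×2 tables by default; the counts are invented:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table with small counts (invented for illustration)
table = np.array([[8, 4],
                  [3, 9]])
a, b, c, d = table.ravel()
N = table.sum()
R1, R2 = table.sum(axis=1)
C1, C2 = table.sum(axis=0)

# Yates-corrected chi-square via the direct formula above
chi2_yates = N * (abs(a * d - b * c) - 0.5 * N) ** 2 / (R1 * R2 * C1 * C2)

# scipy applies the same continuity correction for 2x2 tables by default
stat, p, dof, expected = chi2_contingency(table, correction=True)

print(chi2_yates, stat)  # the two values agree
```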

LIMITATIONS OF A CHI SQUARE TEST

1) The data is from a random sample.


2) This test, applied to a fourfold table, will not give a reliable result with one degree of freedom if the expected value in any cell is less than 5. In such a case, Yates' correction is necessary, i.e. reduction of the modulus of (O − E) by half.
3) Even with Yates' correction, the test may be misleading if any expected frequency is much below 5. In that case another appropriate test should be applied.
4) In contingency tables larger than 2×2, Yates' correction cannot be applied.
5) Interpret this test with caution if sample total or total
of values in all the cells is less than 50.
6) This test tells the presence or absence of an association
between the events but doesn't measure the strength of
association.
7) This test doesn't indicate cause and effect; it only tells the probability that the association occurred by chance.
8) The test is to be applied only when the individual
observations of sample are independent which means
that the occurrence of one individual observation (event)
has no effect upon the occurrence of any other
observation (event) in the sample under consideration.
ANALYSIS OF
VARIANCE
(ANOVA)
CONTENTS

 1.Introduction
 2.F-Statistics

 3.Technique of analysing variance

 4.Classification of analysis of
variance
a. One-way classification
b. Two-way classification
 5.Applications of analysis of
variance
 6. References
INTRODUCTION

 The analysis of variance (ANOVA) was developed by R.A. Fisher in 1920.
 If the number of samples is more than two, the Z-test and t-test cannot be used.
 The technique of variance analysis developed by Fisher is very useful in such cases, and with its help it is possible to study the significance of the differences among the mean values of a large number of samples at the same time.
 The technique of variance analysis originated in agricultural research, where the effect of various types of soils on output, or the effect of different types of fertilizers on production, had to be studied.
 The technique of the analysis of variance has proved extremely useful in all types of research.
 The variance analysis studies the significance
of the difference in means by analysing
variance.
 The variances would differ only when the
means are significantly different.
 The technique of the analysis of variance as
developed by Fisher is capable of fruitful
application in a variety of problems.
 H0: variability within groups = variability between groups
 Ha: variability within groups ≠ variability between groups

F-STATISTICS

 ANOVA measures two sources of variation in


the data and compares their relative sizes.

* variation BETWEEN groups:


-for each data value look at the difference
between its group mean and the overall mean.

*variation WITHIN groups:


- for each data value we look at the
difference between that value and the mean of
its group.
 The ANOVA F-statistic is the ratio of the between-group variation to the within-group variation:

F = Variance between the samples / Variance within the samples = MSC / MSE

 A large F is evidence against Ho, since it indicates


that there is more difference between groups than
within groups.
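A minimal sketch of this ratio computed by hand for three hypothetical groups:

```python
import numpy as np

# Three hypothetical groups of measurements
groups = [np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 8.0, 9.0]),
          np.array([1.0, 2.0, 3.0])]

values = np.concatenate(groups)
grand_mean = values.mean()

# BETWEEN-group variation: each group mean vs the overall mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# WITHIN-group variation: each value vs its own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

k, N = len(groups), len(values)
F = (ss_between / (k - 1)) / (ss_within / (N - k))
print(F)  # 27.0 -- far more spread between groups than within them
```

The large F here reflects group means (5, 8, 2) that are far apart relative to the small spread inside each group, which is evidence against H0.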

TECNIQUE OF ANALYSING VARIANCE

 The technique of analysing the variance in the case of a single variable and in the case of two variables is similar.
 In both cases a comparison is made between the variance of the sample means and the residual variance.
 However, in the case of a single variable, the total variance is divided into two parts only, viz., variance between the samples and variance within the samples. The latter is the residual variance.
 In the case of two variables, the total variance is divided into three parts, viz.,
(i) Variance due to variable no.1
(ii) Variance due to variable no.2
(iii) Residual variance.
CLASSIFICATION OF ANOVA

 The analysis of variance is classified in two ways:
a. One-way classification
b. Two-way classification

ONE-WAY CLASSIFICATION

 In one-way classification we take into account only one variable, say, the effect of different types of fertilizers on yield.
 Other factors like difference in soil fertility or the
availability of irrigation facilities etc. are not
considered.
 For one-way classification we may conduct the
experiment through a number of sample studies.
 Thus, if four different fertilizers are being studied
we may have four samples of, say, 10 fields each
and conduct the experiment.
 We will note down the yield on each one of the
field of various samples and then with help of F-
test try to find out if there is a significant
difference in the mean yields given by different
fertilizers.
a. We will start with the null hypothesis that the mean yield of the four fertilizers is not different in the universe, or
H0: μ1 = μ2 = μ3 = μ4
The alternative hypothesis will be
H1: not all the means are equal.

b. Compute the grand total, G = ΣXc1 + ΣXc2 + ΣXc3
Correction factor (C.F.) = G²/N = D

c. Total sum of squares: SST = A − D, where
A = ΣXc1² + ΣXc2² + ΣXc3², i.e. SST = ΣXc1² + ΣXc2² + ΣXc3² − G²/N

d. Sum of squares between samples (columns): SSC = B − D, where
B = (ΣXc1)²/nc1 + (ΣXc2)²/nc2 + (ΣXc3)²/nc3
and nc1 = the number of elements in the first column, etc.

e. Sum of squares within samples:
SSE = SST − SSC = (A − D) − (B − D) = A − B

f. The number of d.f. between samples: v1 = c − 1

g. The number of d.f. within samples: v2 = N − c

h. Mean square between columns: MSC = SSC/v1 = SSC/(c − 1)

i. Mean square within samples: MSE = SSE/v2 = SSE/(N − c)

F = MSC/MSE if MSC > MSE, or MSE/MSC if MSE > MSC

j. Conclusion: if Fcal < Ftab, accept H0.
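The steps above can be sketched with `scipy.stats.f_oneway`, which performs the same one-way computation; the four fertilizer samples below are hypothetical yields:

```python
from scipy.stats import f_oneway, f

# Hypothetical yields (5 fields each) for four fertilizers
f1 = [20, 21, 23, 16, 20]
f2 = [25, 28, 24, 29, 24]
f3 = [24, 24, 23, 20, 24]
f4 = [21, 20, 19, 22, 18]

stat, p = f_oneway(f1, f2, f3, f4)

# Table value at the 5% level: v1 = c - 1 = 3, v2 = N - c = 16
critical = f.ppf(0.95, dfn=3, dfd=16)

print(stat)      # ~9.43
print(critical)  # ~3.24
# Fcal > Ftab here, so H0 (equal mean yields) is rejected for this data
```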


TWO WAY CLASSIFICATION

1.In a one-way classification we take into account


the effect of only one variable.
2.If there is a two way classification the effect of
two variables can be studied.
3. The procedure of analysis in a two-way classification is to total both the columns and the rows.
4. The effect of one factor is studied through the column-wise figures and totals, and that of the other through the row-wise figures and totals.
5. The variances are calculated for both the
columns and rows and they are compared with
the residual variance or error.
a. We will start with the null hypothesis that the mean yield of the four fields is not different in the universe, or
H0: μ1 = μ2 = μ3 = μ4
The alternative hypothesis will be
H1: not all the means are equal.

b. Compute the grand total, G = ΣXc1 + ΣXc2 + ΣXc3
Correction factor (C.F.) = G²/N = D

c. Total sum of squares: SST = A − D, where
A = ΣXc1² + ΣXc2² + ΣXc3²

d. Sum of squares between samples (columns): SSC = B − D, where
B = (ΣXc1)²/nc1 + (ΣXc2)²/nc2 + (ΣXc3)²/nc3
and nc1 = the number of elements in the first column, etc.

e. Sum of squares between rows: SSR = C − D, where
C = (ΣXr1)²/nr1 + (ΣXr2)²/nr2 + (ΣXr3)²/nr3
and nr1 = the number of elements in the first row, etc.

f. Sum of squares within samples:
SSE = SST − (SSC + SSR) = (A − D) − (B − D) − (C − D)

g. The number of d.f. between columns: v1 = c − 1

h. The number of d.f. between rows: v2 = r − 1

i. The number of d.f. within samples: v3 = (c − 1)(r − 1)

j. Mean square between columns: MSC = SSC/v1 = SSC/(c − 1)

k. Mean square between rows: MSR = SSR/v2 = SSR/(r − 1)

l. Mean square within samples: MSE = SSE/v3 = SSE/((c − 1)(r − 1))

m. Between columns: F = MSC/MSE; if Fcal < Ftab, accept H0

n. Between rows: F = MSR/MSE; if Fcal < Ftab, accept H0
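Steps b through n can be sketched directly in NumPy for a hypothetical 4×4 table of yields (rows = fields, columns = fertilizers):

```python
import numpy as np
from scipy.stats import f

# Hypothetical yields: rows = fields (r = 4), columns = fertilizers (c = 4)
X = np.array([[20, 25, 24, 21],
              [21, 28, 24, 20],
              [23, 24, 23, 19],
              [16, 29, 20, 22]], dtype=float)

r, c = X.shape
N = X.size
G = X.sum()
D = G ** 2 / N                                 # correction factor G^2/N

SST = (X ** 2).sum() - D                       # A - D
SSC = (X.sum(axis=0) ** 2 / r).sum() - D       # B - D (between columns)
SSR = (X.sum(axis=1) ** 2 / c).sum() - D       # C - D (between rows)
SSE = SST - (SSC + SSR)                        # residual

MSC = SSC / (c - 1)
MSR = SSR / (r - 1)
MSE = SSE / ((c - 1) * (r - 1))

F_col = MSC / MSE                              # effect of fertilizers
F_row = MSR / MSE                              # effect of fields
critical = f.ppf(0.95, dfn=3, dfd=9)           # table value at the 5% level

print(F_col, F_row, critical)
```

For this invented data, F between columns exceeds the table value while F between rows does not, so only the fertilizer effect would be judged significant.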
APPLICATIONS OF ANOVA

• Similar to t-test
• More versatile than t-test
• ANOVA is the synthesis of several ideas & it is
used for multiple purposes.
• The statistical analysis depends on the design, and the discussion of ANOVA therefore includes common statistical designs used in pharmaceutical research.
• This is particularly applicable to experiments otherwise difficult to implement, such as is the case in clinical trials.
• In bioequivalence studies the similarities between the samples will be analyzed with ANOVA only.
• Pharmacokinetic data will also be evaluated using ANOVA. Pharmacodynamic (what a drug does to the body) data will also be analyzed with ANOVA only.
• That means we can analyze whether our drug is showing significant pharmacological action or not.
• Compare heights of plants with and without galls.
• Compare birth weights of deer in different geographical regions.
• Compare responses of patients to real medication vs. placebo.
• Compare attention spans of undergraduate students in different programs at PC.

General Applications:

• Pharmacy
• Business research
• Biology
• Microbiology
• Agriculture
• Statistics
• Marketing
• Finance
• Mechanical calculations
REFERENCES

 Elhance DN, Aggarwal BM. Fundamentals of Statistics. Page no: (25.1-25.19).
 Gupta SC, Kapoor VK. Fundamentals of Applied Statistics. 4th ed. New Delhi: Sultan Chand and Sons; 2007. Page no: (23.12-23.28).
 Lewis AE. Biostatistics. 2nd ed. New York: Reinhold Publishers Corporation; 1984.
 Arora PN, Malhan PK. Biostatistics. Mumbai: Himalaya Publishing House; 2008.
 Bolton S, Bon C. Pharmaceutical Statistics. 4th ed. New York: Marcel Dekker Inc; 2004.
