Test of Goodness of Fit and Independence: Chi-Square Test as a Test of Independence
Prof. S P Bansal
Principal Investigator Vice Chancellor
Maharaja Agrasen University, Baddi
Prof. Yoginder Verma
Co-Principal Investigator Pro-Vice Chancellor
Central University of Himachal Pradesh, Kangra, H.P.
Dr Deependra Sharma
Content Writer Associate Professor
Amity University Gurgaon, Haryana
Paper: 15, Quantitative Techniques for Management Decisions
Module: 20, Hypothesis Testing: Developing null and alternative hypotheses
Items Description of Module
Module Title Test of Goodness of Fit and Independence: Chi-Square Test as a Test of Independence
Module Id 24
Introduction
Self-Check Questions
Summary
Quadrant-I
Learning objective
Introduction
A given set of data can be analyzed with various statistical tools, chosen on the basis of the following
parameters:
Size of the sample
Size of the population
Scale used for measurement of the data
Dependency of measurement
Tests may be classified into two main categories: parametric and non-parametric. Three tests, namely the
t, z and F tests, are used to estimate and test population parameters. The prerequisites for applying these
tests are:
The data are measured on an interval or ratio scale
The hypothesis being tested concerns specific parameters
The assumption of normality holds, and it is clear whether the standard deviation is known
When these conditions are not met, non-parametric (distribution-free) tests are applied. These tests:
Do not require a specific population distribution, and the data can be nominal or ordinal
Do not take population parameters into consideration
Do not require a normally distributed population
These tests are easy to apply and can use nominal or ordinal data. They provide broad-based conclusions
with approximate solutions. The chi-square (χ²) test is one of the non-parametric tests used to test
hypotheses.
Procedure
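The procedure for testing the association between two categorical variables, where the sample data are
presented in the form of a contingency table (rows for the levels of one variable, columns for the levels
of the other), is summarized as follows:
1. State the null and alternative hypotheses.
H0: No relationship or association exists between the variables, i.e., they are independent.
Ha: A relationship or association exists between the variables, i.e., they are related.
2. Select a random sample and record the observed frequencies (O) in each cell of the contingency table,
then calculate the row, column and grand totals.
3. Calculate the expected frequency (E) for each cell:
E = (row total × column total) / grand total
4. Compute the value of the test statistic:
χ² = Σ (O - E)² / E
where O is the observed frequency count and E is the expected frequency count.
5. Determine the degrees of freedom:
df = (c - 1) * (r - 1)
where c is the number of levels of one categorical variable and r is the number of levels of the other
categorical variable.
6. Find the table (critical) value of χ² for the chosen level of significance and the degrees of freedom.
7. Compare the calculated and table values. If the calculated value of chi-square is less than the table
value, the null hypothesis is not rejected; otherwise it is rejected.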
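The procedure above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the function name chi_square_independence and the 2x2 table are hypothetical, invented for the illustration, and each expected count is taken as (row total × column total) / grand total.

```python
def chi_square_independence(observed):
    """Return (chi_square, df, expected) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected count of each cell: (row total * column total) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Test statistic: sum over all cells of (O - E)^2 / E
    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # Degrees of freedom: (rows - 1) * (columns - 1)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_square, df, expected


# Hypothetical 2x2 table, invented purely for illustration
table = [[30, 20],
         [20, 30]]
chi_square, df, expected = chi_square_independence(table)
print(chi_square, df)  # prints: 4.0 1
```

For this made-up table every expected count is 25, so the statistic is 4.0 with 1 degree of freedom. Compared with the table value of 3.84 at the 0.05 level, the null hypothesis of independence would be rejected.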
Example
A simple random sample of 1000 prospective voters was taken. They were categorized by gender (M/F)
and by party liking (Republican, Democrat or Independent).
The contingency table given below shows the result.
Party liking
Gender Republican Democrat Independent Row total
Do the men's party preferences differ significantly from the women's? Use a 0.05 level of significance.
Solution
The first step is to state the null hypothesis and the alternative hypothesis.
H0: Gender and party liking are independent.
Ha: Gender and party liking are not independent.
For this analysis, the significance level is 0.05, and the chi-square test for independence is used.
The degrees of freedom, the expected frequency counts and the chi-square test statistic are then calculated.
df = (c - 1) * (r - 1) = (2 - 1) * (3 - 1) = 2
The calculated value of the chi-square statistic with 2 degrees of freedom is more than the table
value of 5.99 (refer to the chi-square table); hence the null hypothesis is rejected. Thus, we conclude
that there is a relationship between gender and voting preference.
Self-Check Questions:
Question 1: Two hundred randomly selected adults were asked whether TV shows as a whole are
primarily entertaining, educational or a waste of time. The respondents were categorized by gender. Their
responses are given in the following table:
Opinion
Gender Entertaining Educational Waste of time Total
Female 52 28 30 110
Male 28 12 50 90
Total 80 40 80 200
Is this convincing evidence that there is a relationship between gender and opinion in the population of
interest?
Solution: Let us take the null hypothesis that the opinion of adults is independent of gender.
Since the contingency table is of size 2×3, the degrees of freedom are (2-1)(3-1) = 2. This implies that
only two expected frequencies need to be calculated from the row and column totals; the other four are
then determined automatically, as shown below:
E11 = 110×80/200 = 44
E12 = 110×40/200 = 22
E13 = 110 - 44 - 22 = 44
E21 = 80 - E11 = 80 - 44 = 36
E22 = 40 - E12 = 40 - 22 = 18
E23 = 80 - E13 = 80 - 44 = 36
The value of the χ² test statistic is then calculated from the observed and expected frequencies as
χ² = Σ (O - E)² / E.
Since the calculated value χ² = 16.766 is more than its critical value χ² = 5.99 at α = 0.05 and df = 2, the
null hypothesis is rejected. Hence, we conclude that the opinion of adults is not independent of gender.
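The arithmetic of Question 1 can be checked with a short Python sketch. It assumes the observed counts from the table above and uses only the standard library. The critical value relies on a property that holds for df = 2 only: the upper-tail probability of the chi-square distribution is exp(-x/2), so the critical value at level α is -2 ln(α).

```python
import math

# Observed counts from the Question 1 table
observed = [[52, 28, 30],   # Female
            [28, 12, 50]]   # Male
row_totals = [sum(row) for row in observed]        # 110, 90
col_totals = [sum(col) for col in zip(*observed)]  # 80, 40, 80
n = sum(row_totals)                                # 200

# Only two cells need the (row total * column total) / n formula ...
e11 = row_totals[0] * col_totals[0] / n
e12 = row_totals[0] * col_totals[1] / n
# ... the remaining four follow by subtraction from the totals
e13 = row_totals[0] - e11 - e12
e21 = col_totals[0] - e11
e22 = col_totals[1] - e12
e23 = col_totals[2] - e13
expected = [[e11, e12, e13], [e21, e22, e23]]

# Chi-square statistic: sum of (O - E)^2 / E over all six cells
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Critical value for df = 2 (closed form valid for 2 degrees of freedom only)
alpha = 0.05
critical = -2 * math.log(alpha)

print(expected)
print(round(chi_square, 2), round(critical, 2))
```

The statistic comes out near 16.77, which agrees with the reported 16.766 up to rounding of the individual terms, and the critical value is about 5.99.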
Question 2: A sample analysis of the examination results of 500 students was made. It was found that 220
students had failed, 170 had secured a third division, 90 were placed in the second division and 20 got a
first division. Are these figures commensurate with the general examination result, which is in the ratio
4:3:2:1 for the various categories respectively?
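Solution: Let us take the null hypothesis that the observed results are commensurate with the general
examination result, which is in the ratio 4:3:2:1.
The expected numbers of students who failed, obtained a third division, a second division and a first
division are, respectively:
500×4/10 = 200, 500×3/10 = 150, 500×2/10 = 100, 500×1/10 = 50
χ² = (220-200)²/200 + (170-150)²/150 + (90-100)²/100 + (20-50)²/50 = 2 + 2.667 + 1 + 18 = 23.667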
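Since the calculated value χ² = 23.667 is more than its table value χ² = 7.81 at the α = 0.05 level of
significance with df = n - 1 = 4 - 1 = 3, the null hypothesis is rejected. Hence, the observed results are
not commensurate with the general examination result.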
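Question 2 is a goodness-of-fit test rather than a test of independence: the expected counts come from a hypothesized ratio instead of row and column totals. A minimal Python sketch of that calculation, assuming the figures stated in the question, is given below.

```python
# Observed counts: failed, third division, second division, first division
observed = [220, 170, 90, 20]
ratio = [4, 3, 2, 1]  # hypothesized general examination result

total = sum(observed)  # 500 students
# Expected count of each category: total * (its share of the ratio)
expected = [total * part / sum(ratio) for part in ratio]

# Chi-square statistic: sum of (O - E)^2 / E over the four categories
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom for goodness of fit: number of categories - 1
df = len(observed) - 1

print(expected, round(chi_square, 3), df)
# prints: [200.0, 150.0, 100.0, 50.0] 23.667 3
```

Almost all of the statistic (18 out of 23.667) comes from the first-division category, where only 20 students were observed against 50 expected.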
Question 3: Based on information on 1000 randomly selected fields about the tenancy status of the
cultivation of these fields and the use of fertilizers, collected in an agro-economic survey, the following
classification was noted:
Would you conclude that owner cultivators are more inclined towards the use of fertilizers at the 5% level
of significance? Carry out a chi-square test as per the testing procedure.
Solution: Let us take the null hypothesis that ownership of fields and the use of fertilizers are independent
attributes. Since the contingency table is of size 2×2, the degrees of freedom are (2-1)(2-1) = 1. This
implies that only one expected frequency needs to be calculated; the others are then determined
automatically, as follows:
E11 = 600×480/1000 = 288
E12 = 600 - 288 = 312
E21 = 480 - 288 = 192
E22 = 400 - 192 = 208
The calculated value χ² = 273.534 at the α = 0.05 level of significance with df = (r-1)(c-1) = (2-1)(2-1) = 1
is much more than its table value, χ² = 3.84. The null hypothesis H0 is therefore rejected. Hence, it can be
concluded that owner cultivators are more inclined towards the use of fertilizers.
Summary
Tests may be classified into two main categories: parametric and non-parametric. Three tests, namely the
t test, z test and F test, are used to estimate and test population parameters. The prerequisites for applying
these tests are that an interval or ratio scale is used, that the hypothesis concerns specific parameters, that
the assumption of normality holds, and that it is clear whether the standard deviation is known.
When these conditions are not met, non-parametric (distribution-free) tests are applied. These tests are
easy to apply and can use nominal or ordinal data. They provide broad-based conclusions with approximate
solutions and do not necessarily require a normally distributed population. The chi-square test is one of
the non-parametric tests used to test hypotheses. The chi-square test for independence is applied when
there are two categorical variables from a single population. It is used to determine whether there is a
significant association between the two variables.