Chapter 14 (The Chi-Square Test)
This chapter describes two types of tests:
1. Tests of hypothesis about contingency tables, called independence tests
2. Tests of hypothesis for experiments with more than two categories, called goodness-of-fit tests
Both of these tests are performed by using the chi-square distribution. It is written as the $\chi^2$ distribution, where $\chi$ is pronounced "ki". Like the t distribution, the chi-square distribution has only one parameter, called the degrees of freedom (df). The shape of a specific chi-square distribution depends on the number of degrees of freedom. The random variable $\chi^2$ assumes nonnegative values only. Hence, a chi-square distribution curve starts at the origin and lies entirely to the right of the vertical axis. If we know the degrees of freedom and the area in the right tail of a chi-square distribution curve, we can find the value of $\chi^2$ from the table.
14.1 R × C CONTINGENCY TABLES

Information can be summarized and presented using a two-way classification table, which is also called a contingency table or cross tabulation. In a test of independence for a contingency table, we test the null hypothesis that the two characteristics of the elements of a given population are not related (that they are independent) against the alternative hypothesis that the two characteristics are related (that they are dependent). For example, we may want to test if there is an association between being a male or female and having a preference for watching sports or soap operas on television. We perform such a test by using the chi-square distribution.

The degrees of freedom for a test of independence are $df = (R - 1)(C - 1)$, where R and C are the number of rows and number of columns, respectively, in the given contingency table.

14.1.1 A TEST OF INDEPENDENCE OR HOMOGENEITY

The value of the test statistic $\chi^2$ for a test of independence is calculated as
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O and E are the observed and expected frequencies, respectively, for a cell. The null hypothesis in a test of independence is always that the two attributes are not related. The alternative hypothesis is that the two attributes are related. The frequencies obtained from the performance of an experiment for a contingency table are called the observed frequencies. The expected frequency E for a cell is calculated as
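$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$

where the grand total is the sum of all observed frequencies in the table, that is, the sample size.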
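The calculation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `chi_square_independence` is our own and not part of the chapter, and the data are the observed frequencies from Problem 1 of Section 1 in the problem set (rows are departments, columns are opinions).

```python
def chi_square_independence(observed):
    """Return (expected, df, chi2) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # E = (row total)(column total) / grand total, one value per cell
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # df = (R - 1)(C - 1)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)

    # chi-square = sum over all cells of (O - E)^2 / E
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return expected, df, chi2


# Rows: Production, Sales, Administration
# Columns: In favour, Uncertain, Against
observed = [[89, 42, 9], [53, 36, 11], [38, 12, 10]]
expected, df, chi2 = chi_square_independence(observed)
print(df, round(chi2, 2))  # prints: 4 8.98
```

The statistic agrees with the answer quoted for that problem. To finish the test, the computed value is compared with the table value of $\chi^2$ for the same degrees of freedom and the chosen right-tail area.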
14.2 CHI-SQUARE GOODNESS-OF-FIT TEST

This section explains how to make tests of hypothesis about experiments with more than two possible outcomes (categories). Such experiments, called multinomial experiments, possess four characteristics.

A Multinomial Experiment

An experiment with the following characteristics is called a multinomial experiment:
1. It consists of n identical trials.
2. Each trial results in one of k possible outcomes (categories), where k > 2.
3. The trials are independent.
4. The probabilities of the various outcomes remain constant for each trial.
14.2.1 Observed and expected frequencies

The frequencies obtained from the actual performance of an experiment are called observed frequencies. They are denoted by O. In a goodness-of-fit test, we test the null hypothesis that the observed frequencies for an experiment follow a certain pattern or theoretical distribution. It is called a goodness-of-fit test because the hypothesis tested is how well the observed frequencies fit a given pattern. The expected frequencies, denoted by E, are the frequencies that we expect to obtain if the null hypothesis is true. The expected frequency for a category is obtained as $E = np$, where n is the sample size and p is the probability that an element belongs to that category if the null hypothesis is true.
14.2.2 Degrees of freedom for a goodness of fit test

In a goodness-of-fit test, the degrees of freedom are $df = k - 1$, where k denotes the number of possible outcomes for the experiment.

14.2.3 Test statistic for a goodness of fit test

The test statistic for a goodness of fit test is $\chi^2$ and its value is calculated as
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where

O = observed frequency for a category
E = expected frequency for a category

Remember that a chi-square goodness of fit test is always a right-tailed test.
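As a rough illustration of the goodness-of-fit calculation, the Python sketch below applies E = np, df = k - 1 and the test statistic to the accident counts from Problem 4 of Section 2 in the problem set, under the null hypothesis that accidents are equally likely on each of the five days. The function name `chi_square_goodness_of_fit` is a label of our own, not something defined in the chapter.

```python
def chi_square_goodness_of_fit(observed, probabilities):
    """Return (expected, df, chi2) for observed counts and the probabilities under H0."""
    n = sum(observed)

    # E = np for each category
    expected = [n * p for p in probabilities]

    # df = k - 1, where k is the number of categories
    df = len(observed) - 1

    # chi-square = sum over all categories of (O - E)^2 / E
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, df, chi2


# Accidents on Monday to Friday; H0: every day has probability 1/5
observed = [8, 12, 9, 14, 17]
probabilities = [1 / 5] * 5
expected, df, chi2 = chi_square_goodness_of_fit(observed, probabilities)
print(df, round(chi2, 2))  # prints: 4 4.5
```

Because the test is right-tailed, the null hypothesis is rejected only when the computed value exceeds the table value of $\chi^2$ for df = 4 at the chosen significance level.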
PROBLEM SET

Section 1

1. 300 employees of a company were selected at random and asked whether they were in favor of a scheme to introduce flexible working hours. The following table shows the opinions and the departments of the employees.

   Department        Opinion
                     In favour   Uncertain   Against
   Production           89          42          9
   Sales                53          36         11
   Administration       38          12         10

   Test whether there is evidence of a significant association between opinion and department. (8.98)
2. A group of executives was classified according to total income and age. Test the hypothesis that age is not related to the level of income. (6.85)

   Age            Less than $100,000   $100,000 to $399,999   $400,000 or more
   Under 40               6                     9                     5
   40 to 54              18                    19                     8
   55 or older           11                    12                    17

3. Suppose a personnel department investigated absentees by categorizing them according to the shift on which they worked, as shown in the following table.

   Day of the week   Shift
                     Day   Evening
   Monday             49     50
   Tuesday            36     38
   Wednesday          43     40
   Thursday           40     40
   Friday             45     41
   Is there sufficient evidence at the 5% significance level of an association between the days on which the employees are absent and the shift on which the employees work? (0.3217)

4. A company owns hypermarts in various parts of the country. The hypermarts are situated near large cities. Each hypermart has a large car park that is free to use. The directors think that there are regional differences in the distances that customers travel to reach these stores. A hypermart was selected in each of the three regions and a random sample of customers at each store was asked how far they had traveled to reach the store. The results were as follows:

   Distance traveled          Region
                              South   Middle   North
   Less than 5 miles            50      80       70
   Between 5 and 10 miles       80      60       20
   More than 10 miles           70      60       10

   Examine at the 5% significance level whether there is any relationship between distance traveled and region. (57.85)

5. The marketing director for a metropolitan daily newspaper is studying the relationship between the type of community the reader lives in and the portion of the paper he or she reads first. For a sample of readers the following information is obtained:

              National news   Sports   Comics
   Urban           170          140       90
   Rural           100          110      100
   Farm            130          100       60

   At the 0.05 significance level, can we conclude that there is a relationship between the type of community where the person resides and the portion of the paper he or she reads first? (80.678)

6. The following data concerning industrial accidents and absentees are classified according to the type of employee.

   Type of employee   Absence following the accident
                      Up to one month   One month or longer
   Men                      26                  14
   Women                    16                   9
   Juvenile                  8                   7
   Is there any evidence to suggest that the severity of accident is associated with the type of employee? Use a 5% significance level. (0.6618)
7. A tile company was interested in comparing the fraction of new house builders favoring three types of tiles as floor coverings for their houses in three different areas of Klang Valley, i.e. Subang Jaya, Puchong and Petaling Jaya. A survey was conducted and the data were as follows:

   Floor covering   Area
                    Subang Jaya   Puchong   Petaling Jaya
   Type I               224         165          36
   Type II              196         152          44
   Type III              80          83          20

   Test at the 5% significance level whether there is any association between types of tiles used and the areas concerned. (5.4)

8. A large consultancy firm regularly recruits MBA graduates. The personnel director has categorized each business school producing MBA graduates as top rate, adequate or bad to assist their recruitment strategy. A survey of the performance of 100 recent recruits has rated them as excellent, average or poor. A cross-classification of the results of the survey is shown in the table below.

   Business school   Rating of graduates
                     Excellent   Average   Poor
   Top rate             10          10       5
   Adequate              7          30       8
   Bad                   3          20       7
   Is there a relationship between the rating of these recruits and the business school at which they were trained? (Test at the 5% significance level.) (9.44)

Section 2

1. A group of 385 mental patients has been classified according to parental social class, with the following results:

   Social class   Upper   Upper middle   Middle   Lower middle   Lower
   Frequency       18         31           46         126         164

   Test at the 5% significance level whether the data are consistent with the assumption that all social classes are equally likely to be represented. (9.48)

2. Motor vehicle production is the same each day. The following information is given:

   Day         No. of vehicles
   Monday          160
   Tuesday         140
   Wednesday       139
   Thursday        141
   Friday          170
   Test at the 10% significance level to determine whether the number of vehicles is the same throughout the week. (5.36)

3. It has been estimated that employee absenteeism costs Malaysian companies more than RM 500 million per year. The personnel department of a large corporation recorded the weekdays during which individuals in a sample of 422 absentees were away over the past several months. Do these data suggest that absenteeism is higher on some days of the week? (Use $\alpha = 0.05$.) (4.091)

   Day         Number absent
   Monday           99
   Tuesday          74
   Wednesday        83
   Thursday         80
   Friday           86
4. A company keeps detailed records of staff accidents. During a recent safety review, a random sample of 60 accidents was selected and classified by the day of the week on which they occurred.

   Day                   Monday   Tuesday   Wednesday   Thursday   Friday
   Number of accidents      8       12          9          14        17

   Test at the 5% level of significance whether there is any evidence that accidents are more likely to happen on some days than others. (4.5)

5. A study reports an analysis of 35 key product categories. At the time of the study, 72.9% of the products sold were of a national brand, 23% were private-label and 4.1% were generic. Suppose that you want to test whether these percentages are still valid for the market today. You collect a random sample of 1000 products in the 35 product categories studied, and you find the following: 610 products are of a national brand, 290 are private label, and 100 are generic. Conduct the test at the 0.025 level of significance. (119.98)

6. A farmer's apples are graded on a scale from A to D before sale. Past experience shows that the percentages of apples in the four grades are as follows.

   Grade    A    B    C    D
   %       29   38   27    6

   The farmer introduces a new treatment and applies it to a small number of trees to see if it affects the distribution of grades. The apples produced by these trees are graded as follows:

   Grade               A    B    C    D
   Number of apples   79   94   58   19

   Test at the 5% level of significance to see if the new treatment has affected the distribution of grades. (3.08)

7. In a certain town in the Selangor state, the retailing market for petrol is shared among several companies. Their market share can be established in the ratio of 45 : 25 : 20 : 10 respectively. A survey was conducted recently among 1000 car owners in that town and their preferences were tabulated as follows:
   Oil company            Shell   Esso   Petronas   Others
   Number of car owners    420     300     210        70

   Use a $\chi^2$ test at the 1% significance level to test the hypothesis that there has been no change in the market share for petrol. (21.5)

8. An organization recently published the number of acts of violence seen in types of television programs.

   Type of program    Drama   Old movies   Cartoon   Police   Comedy   News
   Acts of violence     42        57          83        92       38      81

   The organization claimed that such acts occur with equal frequency across all types of program. Test this claim at the 10% level. (40.14)

9. Seattle Aircraft Company Inc. manufactures and sells Twin Otters in the U.S. Records of the company showed that sales, by region, in the previous years were distributed according to the following proportions:

   Region           West Coast   North Central   North East   South   South East
   Percentage (%)       30            25             20         10        15

   This year, the numbers of planes sold in these regions are:

   Region          No. of planes
   West Coast          330
   North Central       220
   North East          170
   South               120
   South East          160
   Can you conclude at the 1% significance level that the sales distribution for this year differs significantly from that of the previous years? (15.1)

10. The LDP expressway, which has five lanes after the tollgate, was studied to see whether drivers preferred to drive on the inside lanes. A total of 1000 automobiles was observed during the early morning traffic, and the number of cars on the respective lanes was recorded. The results were as follows:

   Lane               1     2     3     4     5
   Observed count    96   154   275   225   171

   Do the data provide sufficient evidence at the 5% level of significance to indicate that some lanes are preferred over others? (101.81)
11. A survey of the employees of a large company was conducted to see whether competence in computing skills was related to age. The results of the survey are given below.

   Age group (years)   Computing skill
                       Good   Average   Poor
   18 and under 30      70      20       10
   30 and under 45      40      30       30
   45 and over          30      30       60

   (i) Assuming that the survey was conducted by means of a random sample, test the hypothesis that computing skill is associated with age. (55.713)

   (ii) In a previous assessment of computing skills taken for all employees 5 years ago, it was found that 30% were good, 20% average and 50% poor. Combine the three age groups of the data in part (i) and test whether there is any evidence of a change in computing skill. (46.67)