9: Hypothesis Testing
1 Some definitions

1.1 Simple, compound, null and alternative hypotheses
In test theory one distinguishes between simple hypotheses and compound hypotheses. A simple hypothesis is a hypothesis that completely specifies the probability distribution. Examples:

- The parameter of this binomial distribution is $p = 0.6$.
- This distribution is a normal one of mean $\mu = 4.5$ and standard deviation $\sigma = 1.23$.
- The new treatment gives identical results to the previous one.

A compound hypothesis does not completely specify the distribution. Examples:

- The parameter $p$ of this binomial distribution is greater than $0.6$.
- These two distributions of common variance 1 have the same mean.
- The new treatment gives better results than the previous one.

Often one has to consider the alternative to the proposed hypothesis. For example, if the parameter of this binomial law is not 0.6, does this just mean that the data are not distributed according to $p = 0.6$, or more specifically that $p = 0.7$, that $p < 0.6$, or that the distribution is not binomial at all? There is thus an asymmetry between the hypotheses one aims at checking:

- The privileged hypothesis, traditionally denoted $H_0$, is called the null hypothesis.
- The alternative hypothesis is denoted $H_1$.
1.2 Type I and type II errors, significance and power
Two types of mistakes can be made: a type I error happens when the null hypothesis $H_0$ is rejected though it should have been accepted, and a type II error occurs when the alternative hypothesis $H_1$ is rejected though it should have been accepted or, equivalently, when the null hypothesis $H_0$ is accepted though it should have been rejected. An example can be seen in the case of a criminal trial. The null hypothesis is "he is guilty", the alternative one is "he is innocent"; a type I error consists of letting a guilty person go free, and a type II error consists of condemning an innocent one.

A test is a procedure which divides the space of observations into two regions, the rejection region $R$ and the acceptance region $A$. The two important characteristics of a test are called significance and power, and refer to errors of type I and II respectively:

$$\text{Significance} = 1 - \alpha, \qquad \alpha = \mathrm{Prob}(x \in R \mid H_0) = \int_R \mathrm{Prob}(x \mid H_0)\, dx$$

$$\text{Power} = 1 - \beta, \qquad \beta = \mathrm{Prob}(x \in A \mid H_1) = \int_A \mathrm{Prob}(x \mid H_1)\, dx$$
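As a minimal numerical illustration of these definitions (a sketch added here, not part of the notes; all numbers are arbitrary choices), consider a single observation $x \sim N(\mu, 1)$, $H_0$: $\mu = 0$ against $H_1$: $\mu = 1$, and the rejection region $R = \{x > 1.645\}$:

```python
from scipy.stats import norm

# One observation x ~ N(mu, 1); H0: mu = 0, H1: mu = 1.
# Rejection region R = {x > 1.645}, acceptance region A = {x <= 1.645}.
x_cut = 1.645

alpha = norm.sf(x_cut, loc=0.0, scale=1.0)  # Prob(x in R | H0), type I risk
beta = norm.cdf(x_cut, loc=1.0, scale=1.0)  # Prob(x in A | H1), type II risk

print(f"significance = {1 - alpha:.3f}, power = {1 - beta:.3f}")
# significance ~ 0.95, power ~ 0.26: this cut controls alpha well,
# but is not very powerful against mu = 1.
```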
The determination of a test is usually a trade-off between $\alpha$ and $\beta$. One commonly encountered procedure is to set the significance a priori to a fixed value ($\alpha = 0.01, 0.05, \ldots$) and to find the most powerful test. To make $\beta$ as small as possible for a given $\alpha$, the integral over the chosen rejection region $R$, $\int_R \mathrm{Prob}(x \mid H_1)\, dx = 1 - \beta$, must be as large as possible, for a given $\int_R \mathrm{Prob}(x \mid H_0)\, dx = \alpha$. In the case where the data consist of one measurement, say $x$, the choice of $\alpha$ determines the test, e.g. through a cut $x < x_c$. In other cases, different tests correspond to the same given $\alpha$. It should be noted that in the following nothing is known about the a priori probability, if such a thing exists, of the hypothesis $H_0$ with respect to that of $H_1$. For example, if we are dealing with the one-by-one identification of two types of cells in a test-tube, the formalism makes no use of their relative concentrations.
2 The Neyman-Pearson test

The Neyman-Pearson test applies to the case of a simple null hypothesis against a simple alternative hypothesis. The rejection region is determined by the following theorem: for a given $\alpha$, the most powerful test rejects $H_0$ in the region $R$ defined by

$$\frac{\mathrm{Prob}(x \mid H_1)}{\mathrm{Prob}(x \mid H_0)} > k$$

Let's first give a rigorous proof, then a more intuitive one.

Rigorous proof. Let $R$ be the rejection region defined by $\mathrm{Prob}(x \mid H_1)/\mathrm{Prob}(x \mid H_0) > k$. By definition of $\alpha$, we have $\mathrm{Prob}(x \in R \mid H_0) = \alpha$. Let another test, of significance $\alpha' \le \alpha$, have rejection region $S$. We want to show that this new test is less powerful, i.e. that

$$\mathrm{Prob}(x \in S \mid H_1) < \mathrm{Prob}(x \in R \mid H_1)$$

One has

$$\alpha = \mathrm{Prob}(x \in R \cap S \mid H_0) + \mathrm{Prob}(x \in R \cap \bar S \mid H_0)$$

$$\alpha' = \mathrm{Prob}(x \in R \cap S \mid H_0) + \mathrm{Prob}(x \in S \cap \bar R \mid H_0)$$

so that $\alpha' \le \alpha$ implies

$$\mathrm{Prob}(x \in S \cap \bar R \mid H_0) \le \mathrm{Prob}(x \in R \cap \bar S \mid H_0)$$

Moreover, for any region $I$ inside $R$, $\mathrm{Prob}(x \in I \mid H_1) > k\, \mathrm{Prob}(x \in I \mid H_0)$, and for any region $O$ outside $R$, $\mathrm{Prob}(x \in O \mid H_1) \le k\, \mathrm{Prob}(x \in O \mid H_0)$. Thus

$$\mathrm{Prob}(x \in S \cap \bar R \mid H_1) \le k\, \mathrm{Prob}(x \in S \cap \bar R \mid H_0) \le k\, \mathrm{Prob}(x \in R \cap \bar S \mid H_0) < \mathrm{Prob}(x \in R \cap \bar S \mid H_1)$$

Adding to the two extreme terms the quantity $\mathrm{Prob}(x \in R \cap S \mid H_1)$, one gets the final result:

$$\mathrm{Prob}(x \in S \mid H_1) < \mathrm{Prob}(x \in R \mid H_1)$$

The requirement that $H_0$ and $H_1$ be simple is essential to be able to write expressions involving probabilities like $\mathrm{Prob}(x \in S \mid H_1)$.

A more intuitive proof can be given as follows: assume we have defined the acceptance region $A$ of the Neyman-Pearson test, and that we want to modify it slightly, keeping the same significance $\alpha$; this is achieved by moving a small region $\delta_2$ from $A$ into $R$ and a region $\delta_1$ from $R$ into $A$, of equal weight in terms of $H_0$: $\mathrm{Prob}(x \in \delta_1 \mid H_0) = \mathrm{Prob}(x \in \delta_2 \mid H_0)$. If we want to increase the power of the test, we must fulfil the condition $\mathrm{Prob}(x \in \delta_2 \mid H_1) > \mathrm{Prob}(x \in \delta_1 \mid H_1)$, which is, by construction of $A$, impossible.
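The theorem can be checked by brute force on a small discrete problem. The sketch below (an illustration added here, not part of the notes; the distributions and the budget $\alpha = 0.2$ are arbitrary) builds the likelihood-ratio region for a Binomial(5, 0.5) null against a Binomial(5, 0.7) alternative, then verifies that no other region of the same size under $H_0$ has larger power:

```python
from itertools import combinations
from scipy.stats import binom

n, p0, p1 = 5, 0.5, 0.7
outcomes = range(n + 1)
f0 = {r: binom.pmf(r, n, p0) for r in outcomes}  # Prob(r | H0)
f1 = {r: binom.pmf(r, n, p1) for r in outcomes}  # Prob(r | H1)
alpha = 0.2  # size budget, chosen arbitrarily for this toy example

# Neyman-Pearson region: take outcomes in decreasing order of the
# likelihood ratio f1/f0, while the size under H0 stays within alpha.
by_ratio = sorted(outcomes, key=lambda r: f1[r] / f0[r], reverse=True)
np_region, size = [], 0.0
for r in by_ratio:
    if size + f0[r] <= alpha:
        np_region.append(r)
        size += f0[r]
np_power = sum(f1[r] for r in np_region)

# Exhaustive check: no region of the same size under H0 beats it.
best = max(
    sum(f1[r] for r in region)
    for k in range(n + 2)
    for region in combinations(outcomes, k)
    if sum(f0[r] for r in region) <= size + 1e-12
)

print(f"NP region {sorted(np_region)}: size {size:.4f}, power {np_power:.4f}")
print(f"best power among regions of equal size: {best:.4f}")
```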
3 Examples

3.1 Test of the mean of a Gaussian distribution
Let $x_1, \ldots, x_n$ be $n$ observations from a Gaussian distribution of unknown mean $\mu$, but of known variance $\sigma^2$. $\mu$ can be either $\mu_0$ (null hypothesis) or $\mu_1$ (alternative hypothesis). Note that both hypotheses are simple ones. Let's assume that $\mu_1 > \mu_0$. The likelihood of the observations is

$$\mathrm{Prob}(x \mid \mu) = \prod_i \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right) = \frac{1}{\sigma^n (2\pi)^{n/2}} \exp\left(-\frac{\sum_i (x_i - \mu)^2}{2\sigma^2}\right) = \frac{1}{\sigma^n (2\pi)^{n/2}} \exp\left(-\frac{\sum_i x_i^2}{2\sigma^2}\right) \exp\left(\frac{n\mu\bar x}{\sigma^2}\right) \exp\left(-\frac{n\mu^2}{2\sigma^2}\right)$$

In the framework of the Neyman-Pearson test, the ratio of the likelihoods reads

$$\frac{\mathrm{Prob}(x \mid \mu_1)}{\mathrm{Prob}(x \mid \mu_0)} = K \exp\left(\frac{n(\mu_1 - \mu_0)\bar x}{\sigma^2}\right)$$

where $K$ is a constant which does not depend on the observations. The Neyman-Pearson theorem tells us to reject $H_0$ if

$$\exp\left(\frac{n(\mu_1 - \mu_0)\bar x}{\sigma^2}\right) > k$$

where $k$ is here the generic notation for a constant. Since $\mu_1 - \mu_0 > 0$, the test will reject the hypothesis $\mu = \mu_0$ if $\bar x > \bar x_c$. The value of $\bar x_c$ is determined by the equation $\mathrm{Prob}(\bar x > \bar x_c \mid \mu_0) = \alpha$. As an example, if we set $\alpha = 0.05$, this corresponds to

$$\bar x_c = \mu_0 + 1.645\, \sigma/\sqrt{n}$$

If we now test $H_1$ against $H_0$, we get the result that one has to reject $H_1$ if $\bar x < \bar x_d$, with $\mathrm{Prob}(\bar x < \bar x_d \mid \mu_1) = \alpha$, i.e. $\bar x_d = \mu_1 - 1.645\, \sigma/\sqrt{n}$ for $\alpha = 0.05$.
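These critical values are straightforward to evaluate numerically; a short sketch (added here, with arbitrary values $\mu_0 = 0$, $\mu_1 = 0.5$, $\sigma = 1$, $n = 25$ not taken from the notes):

```python
import numpy as np
from scipy.stats import norm

mu0, mu1, sigma, n, alpha = 0.0, 0.5, 1.0, 25, 0.05
se = sigma / np.sqrt(n)  # standard deviation of the sample mean

# Reject H0: mu = mu0 when xbar > x_c, with Prob(xbar > x_c | mu0) = alpha.
x_c = mu0 + norm.ppf(1 - alpha) * se  # norm.ppf(0.95) = 1.645

# Power of this test under H1: mu = mu1.
power = norm.sf(x_c, loc=mu1, scale=se)

# Testing H1 against H0 instead rejects H1 when xbar < x_d.
x_d = mu1 - norm.ppf(1 - alpha) * se

print(f"x_c = {x_c:.4f}, power = {power:.4f}, x_d = {x_d:.4f}")
```

With these numbers $\bar x_d < \bar x_c$, so any $\bar x$ falling between the two cuts survives both tests, which illustrates the asymmetry noted next.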
These striking results show how asymmetric the roles played by the null and the alternative hypotheses are.
3.2 Test of a binomial parameter
We have performed $n$ observations from a binomial law, and got $r$ successes. The null hypothesis is "the parameter of this distribution is $p = p_0$", and the alternative one is "$p = p_1$", with $p_1 > p_0$. In the Neyman-Pearson formalism, the ratio of likelihoods is

$$\frac{\mathrm{Prob}(r \mid H_1)}{\mathrm{Prob}(r \mid H_0)} = \left(\frac{p_1}{p_0}\right)^r \left(\frac{1-p_1}{1-p_0}\right)^{n-r} = \left(\frac{1-p_1}{1-p_0}\right)^n \left(\frac{p_1/(1-p_1)}{p_0/(1-p_0)}\right)^r$$

The Neyman-Pearson condition then reads

$$\frac{\mathrm{Prob}(r \mid H_1)}{\mathrm{Prob}(r \mid H_0)} > k \;\Longleftrightarrow\; \left(\frac{p_1/(1-p_1)}{p_0/(1-p_0)}\right)^r > k' \;\Longleftrightarrow\; r \log\frac{p_1/(1-p_1)}{p_0/(1-p_0)} > k'' \;\Longleftrightarrow\; r > r_c$$

since $p_1 > p_0$ makes the logarithm of the odds ratio positive.
A numerical example will introduce the concept of randomized tests: assume $n = 10$, $p_0 = 0.5$, $p_1 = 0.6$. Let's set the significance to $\alpha = 0.05$. We are looking for $r_c$ such that $\mathrm{Prob}(r > r_c \mid p = 0.5) = 0.05$. Looking at the tables shows that

$$\mathrm{Prob}(r > 7 \mid p = 0.5) = 0.0547, \qquad \mathrm{Prob}(r > 8 \mid p = 0.5) = 0.0108$$

Two options are open:

- change the significance of the test to $\alpha = 0.0547$ by rejecting if $r > 7$;
- decide that in the case $r = 8$, we reject $H_0$ with a probability $\gamma$ such that

$$\gamma\, \mathrm{Prob}(r = 8 \mid p = 0.5) + \mathrm{Prob}(r > 8 \mid p = 0.5) = 0.05$$

One finds $\gamma = 0.89$. In other words, chance will help in deciding. (The computation of $\gamma$ is sketched just below.)
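A quick way to reproduce these numbers (an added sketch, not part of the notes):

```python
from scipy.stats import binom

n, p0, alpha = 10, 0.5, 0.05

tail7 = binom.sf(7, n, p0)  # Prob(r > 7 | p0) = 0.0547
tail8 = binom.sf(8, n, p0)  # Prob(r > 8 | p0) = 0.0107
p8 = binom.pmf(8, n, p0)    # Prob(r = 8 | p0) = 0.0439

# Randomized test: always reject for r > 8, reject with probability
# gamma when r = 8, so that the overall size is exactly alpha.
gamma = (alpha - tail8) / p8

print(f"P(r>7) = {tail7:.4f}, P(r>8) = {tail8:.4f}, gamma = {gamma:.2f}")
```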
Let's now consider a slightly modified test: we set $r$ a priori to a given value, and perform experiments until we get $r$ successes, the number $n$ of trials now being the random quantity. The formula reads

$$\mathrm{Prob}(n) = \binom{n-1}{r-1} p^r (1-p)^{n-r}$$

so that

$$\frac{\mathrm{Prob}(n \mid p = p_1)}{\mathrm{Prob}(n \mid p = p_0)} = \left(\frac{p_1}{p_0}\right)^r \left(\frac{1-p_0}{1-p_1}\right)^r \left(\frac{1-p_1}{1-p_0}\right)^n$$

$$\frac{\mathrm{Prob}(n \mid p = p_1)}{\mathrm{Prob}(n \mid p = p_0)} > k \;\Longleftrightarrow\; \left(\frac{1-p_1}{1-p_0}\right)^n > k'$$

where the last equivalence comes from the fact that this is the only term which depends on $n$. This leads to reject the hypothesis $p = p_0$ if $n < n_c$, since

$$\frac{1-p_1}{1-p_0} < 1$$

One can see, comparing with the first method described at the beginning of this section, that the same pair of results $(n, r)$ can lead to a different conclusion, depending on the way it was obtained. This is the classical criticism addressed by Bayesian statisticians to the Neyman-Pearson theory.
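The distribution of $n$ under this stopping rule is the negative binomial; a small sketch (added here) checking the formula above against scipy's parameterization, which counts the failures $k = n - r$ rather than the trials:

```python
from math import comb
from scipy.stats import nbinom

r, p = 8, 0.5  # stop at the 8th success; p chosen arbitrarily

# Prob(n) = C(n-1, r-1) p^r (1-p)^(n-r): the r-th success occurs at trial n.
for n in range(r, r + 5):
    direct = comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)
    via_scipy = nbinom.pmf(n - r, r, p)  # scipy counts the n - r failures
    print(f"n = {n}: {direct:.6f} (formula)  {via_scipy:.6f} (scipy)")
```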
4 Compound hypotheses

4.1 Introduction
In the previous two sections we have discussed test theory in the case of a simple null hypothesis against a simple alternative hypothesis. We have seen that the Neyman-Pearson theorem gives the framework for finding the most powerful test, the significance being given a priori. In this section we will extend these results to compound hypotheses, restricting ourselves to a specific class of distributions: the exponential family.
4.2 The exponential family
The exponential family comprises all distributions which can be generically written

$$\mathrm{Prob}(x, \theta) = C(\theta)\, h(x)\, \exp[\theta\, T(x)]$$

Let's start with the particular case of the exponential distribution

$$\mathrm{Prob}(x, \theta) = \frac{1}{\theta} \exp\left(-\frac{x}{\theta}\right)$$
and let's again test the null hypothesis $\theta = \theta_0$ against the alternative one $\theta = \theta_1$, with $\theta_1 > \theta_0$. The ratio of likelihoods can be written

$$\frac{\mathrm{Prob}(x \mid \theta = \theta_1)}{\mathrm{Prob}(x \mid \theta = \theta_0)} = \left(\frac{\theta_0}{\theta_1}\right)^n \exp\left[\frac{\sum x_i}{\theta_0} - \frac{\sum x_i}{\theta_1}\right]$$

$$\frac{\mathrm{Prob}(x \mid \theta = \theta_1)}{\mathrm{Prob}(x \mid \theta = \theta_0)} > k \;\Longleftrightarrow\; \left(\frac{1}{\theta_0} - \frac{1}{\theta_1}\right) \sum x_i > k' \;\Longleftrightarrow\; \frac{1}{n}\sum x_i > \bar x_c$$

where $\bar x_c$ is defined by

$$\mathrm{Prob}\left(\frac{1}{n}\sum x_i > \bar x_c \,\Big|\, \theta_0\right) = \alpha$$

In the general case, the result is basically the same:

$$\frac{\mathrm{Prob}(x \mid \theta = \theta_1)}{\mathrm{Prob}(x \mid \theta = \theta_0)} = \left(\frac{C(\theta_1)}{C(\theta_0)}\right)^n \frac{\exp[\theta_1 \sum T(x_i)]}{\exp[\theta_0 \sum T(x_i)]}$$

$$\frac{\mathrm{Prob}(x \mid \theta = \theta_1)}{\mathrm{Prob}(x \mid \theta = \theta_0)} > k \;\Longleftrightarrow\; (\theta_1 - \theta_0) \sum T(x_i) > k'$$

As a consequence, the rejection region is

$$\frac{1}{n}\sum T(x_i) > T_c$$
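For the exponential case the critical value is easy to compute, since the sum of $n$ independent exponential variables of mean $\theta$ follows a Gamma distribution of shape $n$ and scale $\theta$. A sketch (added here; the values $\theta_0 = 1$, $\theta_1 = 2$, $n = 20$ are arbitrary):

```python
from scipy.stats import gamma

theta0, theta1, n, alpha = 1.0, 2.0, 20, 0.05

# Under H0 the sample mean of n Exp(theta0) variables follows
# Gamma(shape = n, scale = theta0 / n).
xbar_c = gamma.ppf(1 - alpha, a=n, scale=theta0 / n)

# Power of the test under H1: theta = theta1 > theta0.
power = gamma.sf(xbar_c, a=n, scale=theta1 / n)

print(f"reject theta = theta0 when xbar > {xbar_c:.3f}; power = {power:.3f}")
```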
4.3 Uniformly most powerful tests
Let's now consider the test of the simple hypothesis $H_0$: $\theta = \theta_0$ against the compound unilateral alternative $H_1$: $\theta > \theta_0$. We just saw that the test of $H_0$ against any simple alternative $\theta = \theta_1 > \theta_0$ does not depend at all on the value of $\theta_1$. This test is therefore the most powerful one for each of the simple hypotheses composing $H_1$. It is said to be uniformly most powerful.
4.4 Simple hypothesis against a bilateral alternative
We now want to test the simple hypothesis $H_0$: $\theta = \theta_0$ against the alternative $H_1$: $\theta \ne \theta_0$. The situation becomes more complex. One possibility is to adopt a reasonable solution: reject $H_0$ if $\sum T(x_i) < k_1$ or if $\sum T(x_i) > k_2$, sharing the risk of type I errors:

$$\mathrm{Prob}\left(\sum T(x_i) < k_1 \,\Big|\, H_0\right) = \mathrm{Prob}\left(\sum T(x_i) > k_2 \,\Big|\, H_0\right) = \frac{\alpha}{2}$$
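For instance, for a Gaussian mean with unit variance (an added illustration with arbitrary numbers, taking $T(x) = x$), the two cuts are symmetric quantiles of the distribution of $\bar x$ under $H_0$:

```python
import numpy as np
from scipy.stats import norm

theta0, n, alpha = 0.0, 25, 0.05
se = 1.0 / np.sqrt(n)  # unit variance assumed

# Reject H0: theta = theta0 if xbar < k1 or xbar > k2, with the
# type I risk split evenly between the two tails.
k1 = norm.ppf(alpha / 2, loc=theta0, scale=se)
k2 = norm.ppf(1 - alpha / 2, loc=theta0, scale=se)

print(f"accept H0 when {k1:.3f} <= xbar <= {k2:.3f}")
```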
4.5 Compound null hypothesis against a simple alternative

In this section we will show a generalization of the Neyman-Pearson theorem to the case of a compound null hypothesis against a simple alternative one. $H_0$ is the union of a (possibly infinite) number of simple hypotheses $H_0^{(1)}, H_0^{(2)}, H_0^{(3)}, \ldots, H_0^{(n)}$. Any test of $H_0$ against $H_1$ will always split the set of observations into two regions, the acceptance region $A$ and the rejection region $R$. In such a situation one can generally not set the risk of a type I error a priori to a given value $\alpha$. At best, one can set the conditions

$$\mathrm{Prob}(x \in R \mid H_0^{(i)}) \le \alpha \ \text{for every component}\ H_0^{(i)}\ \text{of}\ H_0, \qquad \mathrm{Prob}(x \in R \mid H_1)\ \text{maximum}$$

The generalization of the Neyman-Pearson theorem reads: given $H_0$ composed of $H_0^{(1)}, H_0^{(2)}, H_0^{(3)}, \ldots, H_0^{(n)}$, the most powerful test of $H_0$ against the simple hypothesis $H_1$ rejects if

$$\mathrm{Prob}(x \mid H_1) > k_0^{(1)} \mathrm{Prob}(x \mid H_0^{(1)}) + k_0^{(2)} \mathrm{Prob}(x \mid H_0^{(2)}) + \ldots + k_0^{(n)} \mathrm{Prob}(x \mid H_0^{(n)})$$

The constants $k_0^{(i)}$ are adjusted so that the conditions above are satisfied.
Example: let $x_1, \ldots, x_n$ be a sample from a Gaussian distribution $N(\theta, 1)$; one wishes to test $H_0$: $\theta = \theta_0$ or $\theta = \theta_0'$ against $H_1$: $\theta = \theta_1$. Let's assume $\theta_0 < \theta_0' < \theta_1$. The extended Neyman-Pearson theorem yields:

$$\exp[n\theta_1 \bar x] > k_0 \exp[n\theta_0 \bar x] + k_0' \exp[n\theta_0' \bar x] \;\Longleftrightarrow\; \exp[n(\theta_1 - \theta_0)\bar x] > k_0 + k_0' \exp[n(\theta_0' - \theta_0)\bar x]$$

The problem is now to determine $k_0$ and $k_0'$. They cannot both be negative, otherwise we would always reject $H_0$. Let's assume that $k_0$ is positive and $k_0'$ either positive or negative. In such a case, the test rejects if $\bar x > \bar x_c$, which was a predictable result. Since $\theta_0' > \theta_0$,

$$\mathrm{Prob}(\bar x > \bar x_c \mid \theta_0') > \mathrm{Prob}(\bar x > \bar x_c \mid \theta_0)$$

In order to limit the risk of a type I error to $\alpha$, one therefore has to choose $\bar x_c$ so that $\mathrm{Prob}(\bar x > \bar x_c \mid \theta_0') = \alpha$. Remains the case of $k_0$ negative and $k_0'$ positive: depending on the values of the constants, this would lead either to always rejecting $H_0$, or to accepting it only inside a given interval, which is impossible. Conclusion: $k_0$ must be positive, and the test is: reject $H_0$ if $\bar x > \bar x_c$, with $\bar x_c$ such that $\mathrm{Prob}(\bar x > \bar x_c \mid \theta_0') = \alpha$.
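Numerically, the binding size constraint comes from the component of $H_0$ closest to $H_1$; a short sketch (added here, with arbitrary values and unit variance as above):

```python
import numpy as np
from scipy.stats import norm

theta0, theta0p, n, alpha = 0.0, 0.5, 25, 0.05  # theta0 < theta0p < theta1
se = 1.0 / np.sqrt(n)

# Choose x_c from the component of H0 closest to H1.
x_c = theta0p + norm.ppf(1 - alpha) * se

# The size under the other component is then automatically smaller.
size_at_theta0 = norm.sf(x_c, loc=theta0, scale=se)

print(f"x_c = {x_c:.3f}; size at theta0 = {size_at_theta0:.5f} <= {alpha}")
```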