Testing in Statistics
Chapter 8: Testing
Hypothesis Testing
A statistical hypothesis test is a method of making decisions using experimental data. A result is called statistically significant if it is unlikely to have occurred by chance.
These decisions are made using (null) hypothesis tests. A hypothesis can specify a particular value for a population parameter, say θ = θ₀. Then, the test can be used to answer a question like:
Assuming H₀ is true, what is the probability of observing a value for the test statistic that is at least as extreme as the value that was actually observed?
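A minimal sketch of that probability (a one-sided p-value), assuming a test statistic that is N(0, 1) under H₀; the observed value is purely illustrative:

```python
import math

# Hedged sketch: one-sided p-value for an assumed test statistic that is
# N(0, 1) under H0 (the observed value 1.8 is purely illustrative).
def p_value_one_sided(t_obs):
    # P(Z >= t_obs) for Z ~ N(0, 1), via the complementary error function
    return 0.5 * math.erfc(t_obs / math.sqrt(2))

print(p_value_one_sided(1.8))  # ≈ 0.036
```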
Uses of hypothesis testing:
- Check the validity of theories or models.
- Check if new data can cast doubt on established facts.
We will emphasize statistical hypothesis testing under the classical approach (the frequentist school).
There is also a Bayesian approach to hypothesis testing. There, decisions regarding the parameter θ are based on the posterior probability, i.e., the conditional probability that is computed after the relevant evidence (the data, X) is taken into account. Based on the posterior probabilities associated with different hypothetical values of θ, we assess which hypothesis about θ is more likely.
Posterior: p(θ | X) ∝ p(θ) p(X | θ). (∝: proportional to)
p(θ): Prior.
p(X | θ): Likelihood.
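A minimal sketch of this updating rule under an assumed setup (X_i ~ N(θ, 1), two simple hypotheses, equal priors; all numbers illustrative, not from the slides):

```python
import math

# Hypothetical setup: X_i ~ N(theta, 1), two simple hypotheses theta = 0 or
# theta = 1, with equal prior probabilities. Numbers are illustrative.
data = [0.8, 1.2, 0.3, 1.5, 0.9]
thetas = [0.0, 1.0]
priors = [0.5, 0.5]

def log_likelihood(theta, xs):
    # log of prod_i N(x_i | theta, 1)
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - theta) ** 2
               for x in xs)

# Posterior: p(theta | X) ∝ p(theta) * p(X | theta)
unnorm = [p * math.exp(log_likelihood(t, data)) for t, p in zip(thetas, priors)]
posteriors = [u / sum(unnorm) for u in unnorm]
for t, p in zip(thetas, posteriors):
    print(f"P(theta = {t} | X) = {p:.3f}")
```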
In general, there are two kinds of hypotheses:
(1) About the form of the probability distribution.
Example: Is the random variable normally distributed?
(2) About the parameters of a distribution function.
Example: Is the mean of a distribution equal to a given value μ₀?
The second class is the traditional material of econometrics. We may test whether the effect of income on consumption is greater than one, or whether the size coefficient in a CAPM regression is equal to zero.
Hypothesis testing involves a comparison between two competing hypotheses (sometimes, they represent partitions of the world).
The null hypothesis, denoted H₀, is sometimes referred to as the maintained hypothesis.
The alternative hypothesis, denoted H₁, is the hypothesis that will be considered if the null hypothesis is rejected.
Idea: We collect a sample of data X₁, ..., Xₙ. This sample is a multivariate random variable, an element of a Euclidean space Eⁿ. Then, based on this sample, we follow a decision rule:
If the multivariate random variable is contained in the rejection region R, we reject the null hypothesis.
Alternatively, if the random variable is in the complement of R (Rᶜ), we fail to reject the null hypothesis.
Decision rule:
- If X ∈ R, we reject H₀.
- If X ∈ Rᶜ (X ∉ R), we fail to reject H₀.
The set R is called the region of rejection, or the critical region, of the test.
The rejection region is defined in terms of a statistic T(X), called the test statistic. Note that, like any other statistic, T(X) is a random variable. Given this test statistic, the decision rule can then be written as:
- T(X) ∈ R ⇒ reject H₀.
- T(X) ∈ Rᶜ ⇒ fail to reject H₀.
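The sketch below illustrates this decision rule for an assumed one-sided z test (the statistic, cutoff, and data are illustrative, not from the slides):

```python
from math import sqrt
from statistics import mean

# Illustrative decision rule for an assumed one-sided z test of H0: mu = 0
# against H1: mu > 0, with X_i ~ N(mu, 1); cutoff and data are made up.
def decide(xs, cutoff=1.645):        # 1.645 = upper 5% point of N(0, 1)
    T = sqrt(len(xs)) * mean(xs)     # test statistic T(X) = sqrt(n) * xbar
    in_R = T > cutoff                # rejection region R = {T > cutoff}
    return "reject H0" if in_R else "fail to reject H0"

print(decide([0.4, 1.1, 0.7, 0.9, 0.2]))
```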
There are two types of hypotheses regarding parameters:
(1) A simple hypothesis. Under this scenario, we test the value of a parameter against a single alternative.
Example: H₀: θ = θ₀ against H₁: θ = θ₁.
(2) A composite hypothesis. Under this scenario, we may test, say, whether the effect of income on consumption is greater than one. Implicit in this test are several alternative values.
Example: H₀: β > 1 against H₁: β ≤ 1.
Definition: Simple and composite hypotheses
A hypothesis is called simple if it specifies the values of all the parameters of a probability distribution, say θ = θ₀. Otherwise, it is called composite.
⇒ Power of the test: π(θ) = 1 − β(θ), where β(θ) is the probability of a Type II error.
Decision                     H₀ is true          H₀ is false
Cannot reject (accept) H₀    Correct decision    Type II error
Reject H₀                    Type I error        Correct decision
[Figure: densities of the test statistic under H₀ and H₁; the shaded areas mark α = Type I error, β = Type II error, and 1 − β = power of the test.]
Example: Consider the triangular probability density function
f(x) = 1 + (x − θ)   for θ − 1 ≤ x < θ
f(x) = 1 − (x − θ)   for θ ≤ x ≤ θ + 1.
We test H₀: θ = 0 against H₁: θ = 1, using a single observation of X.
[Figure: the two triangular densities, centered at θ = 0 (H₀) and θ = 1 (H₁), overlapping on the interval [0, 1].]
For the rejection region R = {X > c}, with 0 ≤ c ≤ 1:
α = P(X > c | θ = 0) = ∫_c^1 (1 − x) dx = (1 − c)²/2
β = P(X ≤ c | θ = 1) = ∫_0^c x dx = c²/2
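A quick numerical check of these two areas (the rejection cutoff c = 0.5 is an illustrative choice):

```python
# Numerical check of alpha = (1 - c)^2/2 and beta = c^2/2 for the triangular
# example with rejection region {X > c}; c = 0.5 is an illustrative choice.
def f(x, theta):
    # triangular density centered at theta
    if theta - 1 <= x < theta:
        return 1 + (x - theta)
    if theta <= x <= theta + 1:
        return 1 - (x - theta)
    return 0.0

def integrate(g, a, b, n=100_000):
    # midpoint rule
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

c = 0.5
alpha = integrate(lambda x: f(x, 0.0), c, 1.0)  # P(X > c | theta = 0)
beta = integrate(lambda x: f(x, 1.0), 0.0, c)   # P(X <= c | theta = 1)
print(alpha, (1 - c) ** 2 / 2)  # both ≈ 0.125
print(beta, c ** 2 / 2)         # both ≈ 0.125
```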
UMP Test
Definition: Uniformly most powerful (UMP) test
A test with rejection region R is the uniformly most powerful test of level α (that is, with α(R) ≤ α) if, for every other test R₁ of level α (that is, with α(R₁) ≤ α), we have β(R) ≤ β(R₁).
"For every test" means: for all alternative values θ₁ in H₁: θ = θ₁.
Choosing between admissible test statistics in the (α, β) plane is similar to a consumer choosing a consumption point in utility theory. Similarly, the tradeoff between α and β can be characterized as a ratio.
Neyman-Pearson Lemma
The Neyman-Pearson Lemma provides a procedure for selecting the best test of a simple hypothesis about θ: H₀: θ = θ₀ against H₁: θ = θ₁.
Let L(x|θ) be the joint density function of X. We determine R based on the ratio L(x|θ₁)/L(x|θ₀). (This ratio is called the likelihood ratio.) The bigger this ratio, the more likely the rejection of H₀.
Neyman-Pearson Lemma
Consider testing a simple hypothesis H₀: θ = θ₀ vs. H₁: θ = θ₁, where the pdf corresponding to θᵢ is L(x|θᵢ), i = 0, 1, using a test with rejection region R that satisfies
(1) x ∈ R if L(x|θ₁) > k L(x|θ₀)
    x ∈ Rᶜ if L(x|θ₁) < k L(x|θ₀),
for some k ≥ 0, and
(2) α = P[X ∈ R | H₀].
Then,
(a) Any test that satisfies (1) and (2) is a UMP level-α test.
(b) If there exists a test satisfying (1) and (2) with k > 0, then every UMP level-α test is a size-α test (satisfies (2)), and every UMP level-α test satisfies (1), except perhaps on a set A satisfying P[X ∈ A | H₀] = P[X ∈ A | H₁] = 0.
Note that, if α = P[X ∈ R | H₀], we have a size-α test and hence a level-α test, because sup_{θ∈Θ₀} P[X ∈ R] = P[X ∈ R | H₀] = α, since Θ₀ has only one point.
Define the test function φ (which maps the data into the chosen hypothesis, 1 or 0) as:
φ(x) = 1 if x ∈ R,
φ(x) = 0 if x ∈ Rᶜ.
Let φ(x) be the test function of a test satisfying (1) and (2), and let φ′(x) be the test function of any other level-α test; denote the corresponding power functions by β(θ) and β′(θ).
Since 0 ≤ φ′(x) ≤ 1, (φ(x) − φ′(x))(L(x|θ₁) − k L(x|θ₀)) ≥ 0 for every x. Thus,
(3) 0 ≤ ∫ (φ(x) − φ′(x))(L(x|θ₁) − k L(x|θ₀)) dx = β(θ₁) − β′(θ₁) − k[β(θ₀) − β′(θ₀)].
Proof of (a)
(a) is proved by noting that β(θ₀) − β′(θ₀) = α − β′(θ₀) ≥ 0. Thus, with k ≥ 0 and (3),
0 ≤ β(θ₁) − β′(θ₁) − k[β(θ₀) − β′(θ₀)] ≤ β(θ₁) − β′(θ₁),
showing β(θ₁) ≥ β′(θ₁). Since φ′ is arbitrary and θ₁ is the only point in Θ₀ᶜ, φ is a UMP test.
Proof of (b)
Now, let φ′ be the test function of any UMP level-α test. By (a), φ, the test satisfying (1) and (2), is also a UMP level-α test. Thus, β(θ₁) = β′(θ₁). Using this result, (3), and k > 0,
α − β′(θ₀) = β(θ₀) − β′(θ₀) ≤ 0.
Since φ′ is a level-α test, β′(θ₀) ≤ α; that is, φ′ is a size-α test, implying that (3) is an equality. But the nonnegative integrand in (3) will be 0 only if φ′ satisfies (1), except perhaps on a set A with P[X ∈ A | H₀] = P[X ∈ A | H₁] = 0.
Example: Let X₁, ..., Xₙ be i.i.d. N(θ, 1). We test H₀: θ = θ₀ against H₁: θ = θ₁. The likelihood ratio is
λ(x) = L(θ₁|x) / L(θ₀|x)
     = [(2π)^{-n/2} e^{−Σᵢ(xᵢ−θ₁)²/2}] / [(2π)^{-n/2} e^{−Σᵢ(xᵢ−θ₀)²/2}]
     = e^{[−Σᵢ(xᵢ−θ₁)² + Σᵢ(xᵢ−θ₀)²]/2}.
That is, we reject H₀ when λ(x) > k, or
ln λ(x) = [−Σᵢ(xᵢ−θ₁)² + Σᵢ(xᵢ−θ₀)²]/2 > ln k
ln λ(x) = [−(Σᵢxᵢ² − 2θ₁Σᵢxᵢ + nθ₁²) + (Σᵢxᵢ² − 2θ₀Σᵢxᵢ + nθ₀²)]/2 > ln k.
If θ₁ > θ₀, this reduces to rejecting H₀ when Σᵢxᵢ (equivalently, x̄) is large enough.
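The sketch below checks numerically that the likelihood-ratio condition and the x̄ cutoff agree (all numbers illustrative):

```python
import math

# Sketch: for N(theta, 1) data, the NP condition L(x|theta1) > k * L(x|theta0)
# is equivalent to a cutoff on xbar when theta1 > theta0 (numbers illustrative).
def log_lr(xs, theta0, theta1):
    return sum((x - theta0) ** 2 - (x - theta1) ** 2 for x in xs) / 2

xs = [0.7, 1.4, 0.2, 1.1]
theta0, theta1, k = 0.0, 1.0, 1.0
n, xbar = len(xs), sum(xs) / len(xs)

reject_lr = log_lr(xs, theta0, theta1) > math.log(k)
# solving the expanded inequality above for xbar:
cutoff = (math.log(k) / n + (theta1 ** 2 - theta0 ** 2) / 2) / (theta1 - theta0)
reject_xbar = xbar > cutoff
print(reject_lr, reject_xbar)  # the two criteria always agree
```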
Monotone likelihood ratio property (MLRP)
Let u(X) = Σᵢ T(xᵢ), for an exponential-family density with natural parameter function A(θ). Then,
∂ ln λ / ∂u = A(θ₁) − A(θ₀) > 0, if A(·) is monotonic (increasing) in θ.
In addition, u(X) is a sufficient statistic.
Some distributions with the MLRP in T(X) = Σᵢ xᵢ: normal (with σ known), exponential, binomial, Poisson.
Karlin-Rubin Theorem
Theorem: Karlin-Rubin (KR) Theorem
Suppose we are testing H₀: θ ≤ θ₀ vs. H₁: θ > θ₀. Let T(X) be a sufficient statistic, and suppose the family of distributions g(t|θ) of T has the MLRP in T(X). Then, for any t₀, the test with rejection region T > t₀ is UMP level α, where α = P(T > t₀ | θ₀).
Proof:
Let β(θ) be the power function of the test mentioned in KR.
β(θ) is nondecreasing, meaning that for any θ₁ > θ₂,
β(θ₁) ≥ β(θ₂), i.e., P(T(X) > t₀ | θ₁) ≥ P(T(X) > t₀ | θ₂).
This implies sup_{θ∈Θ₀} β(θ) = β(θ₀) = α, so the test is level α.
Proof (continuation):
Now, consider testing the simple hypotheses H₀: θ = θ₀ vs. H₁: θ = θ′, with θ′ > θ₀.
Define
k′ = inf_{t∈τ} g(t|θ′)/g(t|θ₀),
where τ is the region where t > t₀ and at least one of the densities is nonzero. Then, from the MLRP in T(X) of g,
T(X) > t₀ ⇔ g(t|θ′)/g(t|θ₀) > k′.
Thus, the test satisfies the conditions of the NP Lemma for testing H₀: θ = θ₀ vs. H₁: θ = θ′, so it is the UMP test for those hypotheses. Since θ′ was arbitrary, the test is simultaneously most powerful for every θ′ > θ₀; thus, it is UMP level α for the composite alternative hypothesis.
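A small Monte Carlo check of the nondecreasing power function used in this proof, for an assumed N(θ, 1) setup (cutoff, sample size, and repetitions are illustrative):

```python
import random

# Monte Carlo sketch of the nondecreasing power function behind the KR proof:
# test {T = sum(x) > t0} with X_i ~ N(theta, 1); t0, n, reps are illustrative.
random.seed(1)

def power(theta, t0=3.0, n=5, reps=20_000):
    hits = 0
    for _ in range(reps):
        T = sum(random.gauss(theta, 1) for _ in range(n))
        hits += T > t0
    return hits / reps

for theta in [-0.5, 0.0, 0.5, 1.0]:
    print(theta, round(power(theta), 3))  # power increases with theta
```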
Unbiased test ⇒ β(θ₀) ≤ β(θ₁) for all θ₀ in H₀ and all θ₁ in H₁.
Most two-sided tests we use are UMP level-α unbiased (UMPU) tests.
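An illustrative check of unbiasedness, assuming the usual two-sided z test for a N(μ, 1) mean (this example is not from the slides):

```python
import math
import random

# Illustrative check of unbiasedness for the usual two-sided z test of
# H0: mu = 0 with X_i ~ N(mu, 1): power is smallest (≈ alpha) at mu = 0.
random.seed(2)

def power(mu, n=25, reps=10_000):
    hits = 0
    for _ in range(reps):
        xbar = sum(random.gauss(mu, 1) for _ in range(n)) / n
        hits += abs(math.sqrt(n) * xbar) > 1.96  # 1.96 = two-sided 5% cutoff
    return hits / reps

for mu in [-0.4, -0.2, 0.0, 0.2, 0.4]:
    print(mu, round(power(mu), 3))  # minimum ≈ 0.05 at mu = 0
```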
No UMP test
Power function (again)
We define the power function as β(θ) = P_θ[X ∈ R]. Ideally, we want β(θ) to be near 0 for θ in H₀, and β(θ) to be near 1 for θ in H₁.
The classical (frequentist) approach is to look in the class of all level-α tests (all tests with sup_{θ∈Θ₀} β(θ) ≤ α) and find the most powerful one available.
In some cases there is a UMP level-α test, as given by the Neyman-Pearson Lemma (simple hypotheses) and the Karlin-Rubin Theorem (one-sided alternatives with a univariate sufficient statistic with the MLRP).
But, in many cases, there is no UMP test. When no UMP test exists, we turn to general methods that produce good tests.
General Methods
- Likelihood Ratio (LR) Tests.
- Bayesian Tests - these can be examined for their frequentist properties even if you are not a Bayesian.
- Pivot Tests - tests based on a function of the parameter and the data whose distribution does not depend on unknown parameters. The Wald, Score, and LR tests are examples of asymptotically pivotal tests.
- Wald Tests - based on the asymptotic normality of the MLE.
- Score Tests - based on the asymptotic normality of the score (the derivative of the log-likelihood).
Pivot Tests
Pivot test: a test whose distribution does not depend on unknown parameters.
Example: Suppose you draw X₁, ..., X_N from a N(μ, σ²). Asymptotic theory implies that x̄ is asymptotically N(μ, σ²/N). This statistic is not an asymptotically pivotal statistic, because its distribution depends on an unknown parameter, σ² (even if you specify μ₀ under H₀).
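One standard remedy, sketched below but not spelled out in the slides, is to studentize: t = √N (x̄ − μ₀)/s is asymptotically pivotal. The simulation settings are illustrative:

```python
import math
import random
import statistics

# Sketch of studentization: t = sqrt(N) * (xbar - mu0) / s no longer depends
# on the unknown sigma^2 (its spread is ≈ 1 whatever sigma is).
random.seed(3)

def t_stat(mu, sigma, n=200):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    return math.sqrt(n) * (statistics.mean(xs) - mu) / statistics.stdev(xs)

for sigma in [0.5, 1.0, 5.0]:
    draws = [t_stat(0.0, sigma) for _ in range(5_000)]
    print(sigma, round(statistics.pstdev(draws), 2))  # ≈ 1 for every sigma
```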
Example: LR test for the mean of a N(μ, σ²), with σ² known. The MLE of μ is x̄, so
λ(x) = L(μ₀|x) / L(x̄|x)
     = [(2πσ²)^{-n/2} e^{−Σᵢ(xᵢ−μ₀)²/(2σ²)}] / [(2πσ²)^{-n/2} e^{−Σᵢ(xᵢ−x̄)²/(2σ²)}]
     = e^{−[Σᵢ(xᵢ−μ₀)² − Σᵢ(xᵢ−x̄)²]/(2σ²)}
     = e^{−n(x̄−μ₀)²/(2σ²)}.
We reject H₀ if λ(x) < k, that is, if
ln λ(x) = −n(x̄−μ₀)²/(2σ²) < ln k, or (x̄−μ₀)² > −(2σ²/n) ln k.
Example: LR test for an exponential(θ) sample, where L(θ|x) = θⁿ e^{−θ n x̄} and the MLE is θ̂ = 1/x̄. Then,
λ(x) = [θ₀ⁿ e^{−θ₀ n x̄}] / [(1/x̄)ⁿ e^{−n}] = (x̄ θ₀)ⁿ e^{n(1 − θ₀ x̄)}.
Reject H₀ if λ(x) < k, i.e., if
ln λ(x) = n ln(x̄ θ₀) + n(1 − θ₀ x̄) < ln k.
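A direct computation of this log-LR for a hypothetical sample and θ₀ (all numbers illustrative):

```python
import math

# Direct computation of the exponential log-LR above for a hypothetical
# sample and theta0 (all numbers illustrative).
xs = [0.5, 1.8, 0.9, 2.2, 1.1]
theta0 = 1.0
n, xbar = len(xs), sum(xs) / len(xs)

log_lam = n * math.log(xbar * theta0) + n * (1 - theta0 * xbar)
print(log_lam)  # reject H0 if log_lam < ln k
```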
Asymptotic distribution of the LR statistic:
−2 log λ(X) = −2[log L(X, θ₀) − log L(X, θ̂ₙ)] →d χ²₁.
A second-order Taylor expansion of the log-likelihood around θ̂ₙ gives
log L(X, θ) = log L(X, θ̂ₙ) + n Sₙ(X, θ̂ₙ)(θ − θ̂ₙ) + (1/2)(θ − θ̂ₙ)′ n S′ₙ(X, θ̂ₙ)(θ − θ̂ₙ).
At θ̂ₙ, Sₙ(X, θ̂ₙ) = 0. Then, at θ = θ₀,
−2 log λ(X) = −n(θ₀ − θ̂ₙ)′ S′ₙ(X, θ̂ₙ*)(θ₀ − θ̂ₙ).
Since −S′ₙ(X, θ̂ₙ*) →p I(θ₀) and n^{1/2}(θ̂ₙ − θ₀) →d N(0, I(θ₀)⁻¹), it follows that −2 log λ(X) →d χ²₁.
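A Monte Carlo sketch of this result, using the normal example above where −2 log λ = n(x̄ − μ₀)²/σ² (seed, sample size, and repetitions are illustrative):

```python
import random

# Monte Carlo sketch of -2 log lambda ~ chi-squared(1): uses the normal
# example above, where -2 log lambda = n * xbar^2 when sigma = 1 and mu0 = 0.
random.seed(4)
n, reps = 50, 10_000
stats = []
for _ in range(reps):
    xbar = sum(random.gauss(0.0, 1.0) for _ in range(n)) / n
    stats.append(n * xbar ** 2)
# rejection rate at the chi-squared(1) 95% point, 3.841:
print(sum(s > 3.841 for s in stats) / reps)  # ≈ 0.05
```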