Testing in Statistics


Chapter 8 - Testing


Hypothesis Testing
A statistical hypothesis test is a method of making decisions using
experimental data. A result is called statistically significant if it is unlikely
to have occurred by chance.
These decisions are made using (null) hypothesis tests. A hypothesis
can specify a particular value for a population parameter, say θ = θ₀.
Then, the test can be used to answer a question like:
Assuming θ = θ₀ is true, what is the probability of observing a value for
the test statistic that is at least as big as the value that was actually
observed?
Uses of hypothesis testing:
- Check the validity of theories or models.
- Check if new data can cast doubt on established facts.


Hypothesis Testing
We will emphasize statistical hypothesis testing under the classical
approach (frequentist school).
There is also a Bayesian approach to hypothesis testing. Decisions
regarding the parameter θ are based on the posterior probability, i.e.,
the conditional probability that is computed after the relevant
evidence (the data, X) is taken into account. Based on the posterior
probabilities associated with different hypothetical values for θ, we
assess which hypothesis about θ is more likely.
Posterior: p(θ | X) ∝ p(θ) p(X | θ). (∝: proportional to)
p(θ): Prior.
p(X | θ): Likelihood.
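A minimal numerical sketch of this calculation (the priors, data, and hypothesized values below are hypothetical choices, not from the notes), in Python:

# Sketch: posterior probabilities for two simple hypotheses about the mean
# of a N(theta, 1) sample, using p(theta | X) ∝ p(theta) p(X | theta).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(loc=0.3, scale=1.0, size=50)    # hypothetical data X

thetas = np.array([0.0, 1.0])                  # hypothetical values for theta
prior = np.array([0.5, 0.5])                   # p(theta): prior

# p(X | theta): likelihood of the full sample under each hypothesis
loglik = np.array([stats.norm.logpdf(x, loc=t).sum() for t in thetas])
lik = np.exp(loglik - loglik.max())            # rescaled for numerical stability

posterior = prior * lik / np.sum(prior * lik)  # p(theta | X)
print(dict(zip(thetas, posterior)))            # which hypothesis is more likely?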

Hypothesis Testing
In general, there are two kinds of hypotheses:
(1) About the form of the probability distribution
Example: Is the random variable normally distributed?
(2) About the parameters of a distribution function
Example: Is the mean of a distribution equal to 0?
The second class is the traditional material of econometrics. We may
test whether the effect of income on consumption is greater than
one, or whether the size coefficient in a CAPM regression is equal to
zero.


Hypothesis Testing
Hypothesis testing involves the comparison between two competing
hypotheses (sometimes, they represent partitions of the world).
The null hypothesis, denoted H0, is sometimes referred to as the
maintained hypothesis.
The alternative hypothesis, denoted H1, is the hypothesis that will
be considered if the null hypothesis is rejected.
Idea: We collect a sample of data X1, ..., Xn. This sample is a
multivariate random variable, an element of a Euclidean space Eⁿ.
Then, based on this sample, we follow a decision rule:
If the multivariate random variable is contained in the region R, we
reject the null hypothesis.
Alternatively, if the random variable is in the complement of R
(R^C), we fail to reject the null hypothesis.

Hypothesis Testing
Decision rule:
if X ∈ R, we reject H0
if X ∉ R (i.e., X ∈ R^C), we fail to reject H0
The set R is called the region of rejection or the critical region of the test.
The rejection region is defined in terms of a statistic T(X), called the
test statistic. Note that, like any other statistic, T(X) is a random variable.
Given this test statistic, the decision rule can then be written as:
T(X) ∈ R ⟹ reject H0
T(X) ∈ R^C ⟹ fail to reject H0


Hypothesis Testing: A brief comment


What we present as the classical approach is a synthesized approach.
Ronald Fisher defined only H0. Under his approach we:
1. Identify H0.
2. Determine the appropriate T(X) and its distribution under the
assumption that H0 is true.
3. Calculate T(X) from the data.
4. Determine the achieved significance level that corresponds to T(X),
using its distribution under the assumption that H0 is true.
5. Reject H0 if the achieved significance level is sufficiently small.
Otherwise, reach no conclusion.
This construct leads to the question of what p-value is sufficiently
small as to warrant rejection of H0. Fisher favored 5% or 1%.
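As a small illustration of Fisher's five steps (a sketch; the data and the choice of a one-sided z statistic are my assumptions, not part of the notes):

# Fisher's procedure for H0: mu = 0 with X ~ N(mu, 1):
# T(X) = sqrt(n) * x-bar is N(0, 1) under H0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.4, scale=1.0, size=30)   # hypothetical data

t_obs = np.sqrt(len(x)) * x.mean()            # step 3: calculate T(X)
p_value = 1 - stats.norm.cdf(t_obs)           # step 4: achieved significance level

print(f"T(X) = {t_obs:.3f}, p-value = {p_value:.4f}")
# Step 5: reject H0 if the p-value is sufficiently small (Fisher favored 5% or 1%).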

Hypothesis Testing: A brief comment


Neyman and Pearson added H1 to this approach. Steps:
1. Identify H0 and a complementary hypothesis, H1.
2. Determine the appropriate T(X) and its distribution under the
assumption that H0 is true.
3. Specify a significance level (α), and determine the corresponding
critical value of T(X) under the assumption that H0 is true.
4. Calculate T(X) from the data.
5. Reject H0 and accept H1 if T(X) is further than the critical value
from E[T(X)|H0 true]; otherwise, accept H0.
The Neyman-Pearson approach is important in decision theory. The
final step is assessed with a risk function, computed as the expected loss
from making an error.


Hypothesis Testing
There are two types of hypotheses regarding parameters:
(1) A simple hypothesis. Under this scenario, we test the value of a
parameter against a single alternative.
Example: H0: θ = θ₀ against H1: θ = θ₁.
(2) A composite hypothesis. Under this scenario, we test, for example,
whether the effect of income on consumption is greater than one.
Implicit in this test are several alternative values.
Example: H0: θ > 1 against H1: θ ≤ 1.
Definition: Simple and composite hypotheses
A hypothesis is called simple if it specifies the values of all the
parameters of a probability distribution, say θ = θ₀. Otherwise, it is called
composite.

Type I and Type II Errors


Definition: Type I and Type II errors
A Type I error is the error of rejecting H0 when it is true. A Type II error
is the error of accepting H0 when it is false (that is, when H1 is true).
Notation:
Probability of Type I error: α = P[X ∈ R | H0]
Probability of Type II error: β = P[X ∈ R^C | H1]

Definition: Power of the test
The probability of rejecting H0 based on a test procedure is called the
power of the test. It is a function of the value of the parameter tested, θ:
π = π(θ) = P[X ∈ R].
Note: when θ ∈ H1 ⟹ π(θ) = 1 − β(θ).


Type I and Type II Errors


We want π(θ) to be near 0 for θ ∈ H0, and π(θ) to be near 1 for θ ∈ H1.
Definition: Level of significance
When θ ∈ H0, π(θ) gives you the probability of a Type I error. This
probability depends on θ. The maximum value of π(θ) when θ ∈ H0 is
called the level of significance (significance level) of a test, denoted by α. Thus,

α = sup_{θ ∈ H0} P[X ∈ R | θ] = sup_{θ ∈ H0} π(θ)

Define a level α test to be a test with sup_{θ ∈ H0} π(θ) ≤ α.
Sometimes, α is called the size of a test.

Type I and Type II Errors


Decision vs. state of the world:

Decision                  | H0 true          | H1 true (H0 false)
Cannot reject (accept) H0 | Correct decision | Type II error
Reject H0                 | Type I error     | Correct decision

Need to control both types of error:
α = P(rejecting H0 | H0 true)
β = P(not rejecting H0 | H1 true)


Type I and Type II Errors


[Figure: overlapping sampling distributions under H0 and H1 with a cutoff; the shaded areas mark α = Type I error, β = Type II error, and the power of the test.]

Type I and Type II Errors: Example


Example. Let X have the density
f(x) = 1 + x − θ for θ − 1 ≤ x ≤ θ
     = 1 − x + θ for θ < x ≤ θ + 1
This is a triangular probability density function.
We test H0: θ = 0 against H1: θ = 1, using a single observation of X.
[Figure: the two triangular densities, centered at θ = 0 (H0) and θ = 1 (H1), overlapping on the interval [0, 1].]


Type I and Type II Errors: Example


Type I and Type II errors, i.e., the areas of the isosceles triangles, are then defined by the choice of t, the cutoff point of the rejection region {X > t}:

α = (1/2)(1 − t)²
β = (1/2) t²

Deriving β in terms of α yields:

β = (1/2)[1 − (2α)^(1/2)]²
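A quick numerical check of this trade-off (a sketch; the grid of cutoffs is arbitrary):

# alpha(t) and beta(t) for the triangular-density example with rejection
# region {X > t}, plus a check of the beta-in-terms-of-alpha formula.
import numpy as np

t = np.linspace(0.0, 1.0, 6)
alpha = 0.5 * (1 - t) ** 2      # area of the H0 triangle to the right of t
beta = 0.5 * t ** 2             # area of the H1 triangle to the left of t
for ti, a, b in zip(t, alpha, beta):
    print(f"t = {ti:.1f}: alpha = {a:.3f}, beta = {b:.3f}")

print(np.allclose(beta, 0.5 * (1 - np.sqrt(2 * alpha)) ** 2))  # True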

Type I and Type II Errors: Example


The choice of any t yields an admissible test. However, any
randomized test (defined below) is inadmissible.
Theorem.
The set of admissible characteristics plotted on the (α, β) plane is a
continuous, monotonically decreasing, convex function which starts at
a point within [0, 1] on the β axis and ends at a point within [0, 1] on
the α axis.


Type I and Type II Errors


There is a natural trade-off between Type I and Type II errors. It is
impossible to minimize both.
Q: How do we select a test?
Assume that we want to compare two critical regions, R1 and R2.
Assume that we choose either critical region R1 or R2 randomly, with
probabilities λ and 1 − λ, respectively. This is called a randomized test.
If the probabilities of the two types of error for R1 and R2 are (α₁, β₁)
and (α₂, β₂), respectively, the probability of each type of error becomes:

α = λ α₁ + (1 − λ) α₂
β = λ β₁ + (1 − λ) β₂

The values (α, β) are the characteristics of the test.

More Powerful Test


Definition: More Powerful Test
Let (α₁, β₁) and (α₂, β₂) be the characteristics of two tests. The first
test is more powerful (better) than the second test if α₁ ≤ α₂ and β₁ ≤ β₂,
with a strict inequality holding for at least one of them.
If we cannot determine that one test is better by this definition, we
could consider the relative cost of each type of error. Classical
statisticians typically do not consider the relative cost of the two errors
because of the subjective nature of this comparison.
Note: Bayesian statisticians compare the relative cost of the two errors
using a loss function.


Most Powerful Test


Definition: Admissible test
A test is inadmissible if there exists another test which is better.
Otherwise, it is called admissible.
Definition: Most powerful test of size α
R is the most powerful test of size α if α(R) = α and, for any test R1 of size
α, β(R) ≤ β(R1).
Definition: Most powerful test of level α
R is the most powerful test of level α (that is, such that α(R) ≤ α) if, for
any test R1 of level α (that is, α(R1) ≤ α), β(R) ≤ β(R1).

UMP Test
Definition: Uniformly most powerful (UMP) test
R is the uniformly most powerful test of level α (that is, such that α(R) ≤ α)
if, for every test R1 of level α (that is, α(R1) ≤ α), β(R) ≤ β(R1).
"For every test" means for all alternative values θ₁ in H1: θ = θ₁.
Choosing between admissible test statistics in the (α, β) plane is
similar to a consumer choosing a consumption point in
utility theory. Similarly, the trade-off problem between α and β can be
characterized as a ratio.
This idea is the basis of the Neyman-Pearson Lemma to construct a test
of a hypothesis about θ: H0: θ = θ₀ against H1: θ = θ₁.


Neyman-Pearson Lemma
The Neyman-Pearson Lemma provides a procedure for selecting the
best test of a simple hypothesis about θ: H0: θ = θ₀ against H1: θ = θ₁.
Let L(x|θ) be the joint density function of X. We determine R
based on the ratio L(x|θ₁)/L(x|θ₀). (This ratio is called the likelihood
ratio.) The bigger this ratio, the more likely the rejection of H0.

Jerzy Neyman (1894-1981)

Egon Pearson (1895-1980)

Neyman-Pearson Lemma
Consider testing a simple hypothesis H0: θ = θ₀ vs. H1: θ = θ₁, where
the pdf corresponding to θᵢ is L(x|θᵢ), i = 0, 1, using a test with
rejection region R that satisfies
(1)
x ∈ R if L(x|θ₁) > k L(x|θ₀)
x ∈ R^C if L(x|θ₁) < k L(x|θ₀),
for some k ≥ 0, and
(2)
α = P[X ∈ R | H0]
Then,
(a) Any test that satisfies (1) and (2) is a UMP level α test.
(b) If there exists a test satisfying (1) and (2) with k > 0, then every
UMP level α test satisfies (2) and every UMP level α test satisfies (1),
except perhaps on a set A satisfying P[X ∈ A | H0] = P[X ∈ A | H1] = 0.


Neyman-Pearson Lemma
Note that if α = P[X ∈ R | H0], we have a size α test and hence a level α test,
because sup_{θ ∈ Θ₀} P[X ∈ R | θ] = P[X ∈ R | θ₀] = α, since Θ₀ has only one point.
Define the test function (which maps the data into the chosen hypothesis, 1 or 0) as:

φ(x) = 1 if x ∈ R
φ(x) = 0 if x ∈ R^C.

Let φ(x) be the test function of a test satisfying (1) and (2), and φ′(x) be the test
function of any other level α test, where the corresponding power functions
are π(θ) and π′(θ).
Since 0 ≤ φ′(x) ≤ 1, we have (φ(x) − φ′(x))(L(x|θ₁) − k L(x|θ₀)) ≥ 0 for every x. Thus,

(3)  0 ≤ ∫ [φ(x) − φ′(x)][L(x|θ₁) − k L(x|θ₀)] dx = π(θ₁) − π′(θ₁) − k (π(θ₀) − π′(θ₀)).

Neyman-Pearson Lemma
Proof of (a)
(a) is proved by noting that π(θ₀) − π′(θ₀) = α − π′(θ₀) ≥ 0.
Thus, with k ≥ 0 and (3),

0 ≤ π(θ₁) − π′(θ₁) − k (π(θ₀) − π′(θ₀)) ≤ π(θ₁) − π′(θ₁),

showing π(θ₁) ≥ π′(θ₁). Since φ′ is arbitrary and θ₁ is the only point in Θ₀^C,
φ is a UMP test.


Neyman-Pearson Lemma
Proof of (b)
Now, let φ′ be the test function for any UMP level α test.
By (a), φ, the test satisfying (1) and (2), is also a UMP level α test.
Thus, π(θ₁) = π′(θ₁). Using this result, (3), and k > 0,

α − π′(θ₀) = π(θ₀) − π′(θ₀) ≤ 0.

Since φ′ is a level α test, π′(θ₀) ≤ α; that is, φ′ is a size α test, implying that
(3) is an equality. But the nonnegative integrand in (3) will be 0 only if φ′
satisfies (1), except perhaps on a set A with ∫_A L(x|θᵢ) dx = 0.

Neyman-Pearson Lemma: Example


Let X1, ..., Xn be a random sample from a N(μ, 1) population. Test H0: μ = μ₀ vs. H1: μ = μ₁, with μ₁ > μ₀.
The Neyman-Pearson lemma is based on the ratio φ(x):

φ(x) = L(μ₁ | x) / L(μ₀ | x)
     = [(2π)^(-n/2) exp{−Σᵢ (xᵢ − μ₁)²/2}] / [(2π)^(-n/2) exp{−Σᵢ (xᵢ − μ₀)²/2}]

That is,

φ(x) = exp{[−Σᵢ (xᵢ − μ₁)² + Σᵢ (xᵢ − μ₀)²]/2}

We reject when φ(x) > k. Taking logs:

ln φ(x) = [−Σᵢ (xᵢ − μ₁)² + Σᵢ (xᵢ − μ₀)²]/2 > ln k
ln φ(x) = [−(Σᵢ xᵢ² − 2μ₁ Σᵢ xᵢ + nμ₁²) + Σᵢ xᵢ² − 2μ₀ Σᵢ xᵢ + nμ₀²]/2 > ln k
ln φ(x) = Σᵢ xᵢ (μ₁ − μ₀) + n(μ₀² − μ₁²)/2 = n x̄ (μ₁ − μ₀) + n(μ₀² − μ₁²)/2 > ln k


Neyman-Pearson Lemma: Example


We will reject H0 if ln φ(x) > ln k. But this reduces to x̄ > d, where d
is selected to give a size α test.
Thus, the critical region is R = {x: x̄ > d}, and P[x̄ > d | μ = μ₀] = α.
Under H0, we have

z = √n (x̄ − μ₀) ~ N(0, 1)

⟹ P[x̄ > d | μ = μ₀] = P[z > √n (d − μ₀) | μ = μ₀] = α
⟹ d = μ₀ + z_α/√n, where z_α is the upper-α quantile of N(0, 1)
⟹ R = {x: x̄ > μ₀ + z_α/√n}.
Note: We reject H0 if the sample mean is greater than μ₀ + z_α/√n. But R
is independent of μ₁, and it is the same for any μ₁ > μ₀. Thus, R gives a
UMP test for H0: μ = μ₀ vs. H1: μ > μ₀.
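A sketch of this UMP one-sided test in Python (α, n, and the simulated data are hypothetical choices):

# UMP z-test for H0: mu = mu0 vs. H1: mu > mu0 with X ~ N(mu, 1).
import numpy as np
from scipy import stats

alpha, mu0, n = 0.05, 0.0, 25
z_alpha = stats.norm.ppf(1 - alpha)         # upper-alpha quantile of N(0, 1)
d = mu0 + z_alpha / np.sqrt(n)              # critical value for x-bar

rng = np.random.default_rng(1)
x = rng.normal(loc=0.5, scale=1.0, size=n)  # hypothetical sample
print("reject H0" if x.mean() > d else "fail to reject H0")

# Power at any mu1 > mu0 (note that R itself does not depend on mu1):
mu1 = 0.5
power = 1 - stats.norm.cdf(np.sqrt(n) * (d - mu1))
print(f"power at mu1 = {mu1}: {power:.3f}")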

Monotone Likelihood Ratio


In general, we have no basis to pick θ₁. We need a procedure to test a
composite hypothesis, preferably with a UMP test.
Definition: Monotone Likelihood Ratio
The model f(X, θ) has the monotone likelihood ratio property (MLRP) in u(X)
if there exists a real-valued function u(X) such that the likelihood ratio
λ = L(x|θ₁)/L(x|θ₀) is a non-decreasing function of u(X) for each
choice of θ₁ and θ₀ with θ₁ > θ₀.
If L(x|θ₁) satisfies the MLRP with respect to L(x|θ₀), the higher the
observed value u(X), the more likely it was drawn from the distribution
L(x|θ₁) rather than L(x|θ₀).


Monotone Likelihood Ratio


Consider the exponential family:
L(X; θ) = exp{Σᵢ U(Xᵢ) + A(θ) Σᵢ T(Xᵢ) + n B(θ)}.
Then,

ln λ = Σᵢ T(Xᵢ) [A(θ₁) − A(θ₀)] + n B(θ₁) − n B(θ₀).

Let u(X) = Σᵢ T(Xᵢ). Then,
∂ ln λ / ∂u = A(θ₁) − A(θ₀) > 0, if A(·) is monotonically increasing in θ.
In addition, u(X) is a sufficient statistic.
Some distributions with MLRP in T(X) = Σᵢ xᵢ: normal (with σ²
known), exponential, binomial, Poisson.

Karlin-Rubin Theorem
Theorem: Karlin-Rubin (KR) Theorem
Suppose we are testing H0: θ ≤ θ₀ vs. H1: θ > θ₀. Let T(X) be a sufficient
statistic, and suppose the family of densities g(t|θ) of T(X) has the MLRP in T(X).
Then, for any t₀, the test with rejection region T > t₀ is UMP level α,
where α = Pr(T > t₀ | θ₀).
Proof:
Let π(θ) be the power function of the test mentioned in KR.
π(θ) is nondecreasing, meaning for any θ₁ > θ₂,
π(θ₁) ≥ π(θ₂), i.e., Pr(T(X) > t₀ | θ₁) ≥ Pr(T(X) > t₀ | θ₂).
This implies sup_{θ ∈ H0} π(θ) = π(θ₀) = α, so the test is level α.


Karlin-Rubin Theorem
Proof (continuation):
Now, consider testing the simple hypotheses H0′: θ = θ₀ vs. H1′: θ = θ′, with
θ′ > θ₀.
Define
k = inf_{t ∈ T*} g(t|θ′)/g(t|θ₀),
where T* is the region where t > t₀ and at least one of the densities is
nonzero. Then, from the MLRP in T(X) of g,
T(X) > t₀ ⟺ g(t|θ′)/g(t|θ₀) > k.
Thus, the test satisfies the definition of the test given in the NP Lemma
for testing H0′: θ = θ₀ vs. H1′: θ = θ′, so it is the UMP test for those
hypotheses. Since θ′ was arbitrary, the test is simultaneously most
powerful for every θ′ > θ₀; thus, it is UMP level α for the composite
alternative hypothesis.

KR Theorem: Practical Use


Goal: Find the UMP level α test of H0: θ ≤ θ₀ vs. H1: θ > θ₀ (similar for
H0: θ ≥ θ₀ vs. H1: θ < θ₀).
1. If possible, find a univariate sufficient statistic T(X). Verify its
density has an MLR (it might be non-decreasing or non-increasing;
just show it is monotonic).
2. KR states the UMP level α test is either (1) reject if T > t₀ or (2) reject
if T < t₀. Which way depends on the direction of the MLR and the
direction of H1.
3. Derive E[T] as a function of θ. Choose the direction to reject (T > t₀
or T < t₀) based on whether E[T] is higher or lower for θ in H1. If
E[T] is higher for values in H1, reject when T > t₀; otherwise, reject
for T < t₀.


KR Theorem: Practical Use


4. t₀ is the appropriate percentile of the distribution of T when θ = θ₀.
This percentile is either the α percentile (if you reject for T < t₀) or
the 1 − α percentile (if you reject for T > t₀).
A sketch of the full recipe follows below.

Herman Rubin (1926)

Samuel Karlin (1924-2007)
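As an illustration of steps 1-4 (a sketch under assumed numbers; the exponential example and all parameter values are mine, not from the notes): for Xᵢ ~ exponential with rate λ, T = Σᵢ xᵢ is sufficient and E[T] = n/λ is lower for λ in H1: λ > λ₀, so we reject for T < t₀, with t₀ the α percentile of T under λ₀.

# KR recipe for H0: lambda <= lam0 vs. H1: lambda > lam0, X ~ Exponential(rate).
import numpy as np
from scipy import stats

alpha, lam0, n = 0.05, 1.0, 40
# Step 4: under lambda = lam0, T = sum(x) ~ Gamma(shape=n, scale=1/lam0),
# and t0 is its alpha percentile (we reject for T < t0).
t0 = stats.gamma.ppf(alpha, a=n, scale=1 / lam0)

rng = np.random.default_rng(7)
x = rng.exponential(scale=1 / 2.0, size=n)   # hypothetical data, true rate = 2
T = x.sum()
print("reject H0" if T < t0 else "fail to reject H0", f"(T = {T:.2f}, t0 = {t0:.2f})")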

Nonexistence of UMP tests


For most two-sided hypotheses i.e., H0:=0 vs. H1:0-, no UMP
level test exists.
Simply intuition: the test which is UMP for <0 is not the same as the
test which is UMP for >0. A UMP test must be most powerful across
every value in H1.
Definition: Unbiased Test
A test is said to be unbiased when
()
for all H1
and
P[Type I error]: P[X R|H0] = ()

for all H0.

Unbiased test => (0) < (1) for all 0 in H0 and 1 in H1.
Most two-sided tests we use are UMP level unbiased (UMPU) tests.


Some problems left for students


So far, we have produced UMP level α tests for simple versus simple
hypotheses (H0: θ = θ₀ vs. H1: θ = θ₁) and one-sided tests with MLRP
(H0: θ ≤ θ₀ vs. H1: θ > θ₀).
There are a lot of unsolved problems. In particular:
(1) We did not cover unbiased tests in detail, but they are often simply
combinations of the UMP tests in each direction.
(2) Karlin-Rubin requires univariate sufficient statistics, which leaves
out every problem with more than one parameter (for example, testing
the equality of means from two populations).
(3) Every problem without an MLR is left out.

No UMP test
Power function (again)
We define the power function as π(θ) = P[X ∈ R]. Ideally, we want π(θ)
to be near 0 for θ ∈ H0, and π(θ) to be near 1 for θ ∈ H1.
The classical (frequentist) approach is to look in the class of all level α
tests (all tests with sup_{θ ∈ H0} π(θ) ≤ α) and find the most powerful one available.
In some cases there is a UMP level α test, as given by the Neyman-
Pearson Lemma (simple hypotheses) and the Karlin-Rubin Theorem
(one-sided alternatives with univariate sufficient statistics with MLRP).
But, in many cases, there is no UMP test.
When no UMP test exists, we turn to general methods that produce
good tests.


General Methods
Likelihood Ratio (LR) Tests.
Bayesian Tests - these can be examined for their frequentist properties even
if you are not a Bayesian.
Pivot Tests - tests based on a function of the parameter and data
whose distribution does not depend on unknown parameters. Wald,
Score, and LR tests are examples of asymptotically pivotal tests.
Wald Tests - based on the asymptotic normality of the MLE.
Score Tests - based on the asymptotic normality of the score (the
derivative of the log-likelihood).

Pivot Tests
Pivot Test: A test based on a statistic whose distribution does not depend
on unknown parameters.
Example: Suppose you draw X from a N(μ, σ²) population.
Asymptotic theory implies that x̄ is asymptotically N(μ, σ²/N).
This statistic is not an asymptotically pivotal statistic because it depends
on an unknown parameter, σ² (even if you specify μ₀ under H0).
On the other hand, the t-statistic, t = (x̄ − μ₀)/(s/√N), is asymptotically N(0, 1).
This is asymptotically pivotal since 0 and 1 are known!
Most statistics are not asymptotically pivotal. Many popular test statistics
(for example, Wald, LR) are asymptotically pivotal because they are
distributed as χ² with known df or follow an N(0, 1) distribution.


Likelihood Ratio Tests


Define the likelihood ratio (LR) statistic

λ(X) = sup_{θ ∈ H0} L(X|θ) / sup_{θ ∈ Θ} L(X|θ)

Note:
Numerator: maximum of the likelihood function within H0.
Denominator: maximum of the likelihood function within the entire parameter space,
which occurs at the MLE.
Reject H0 if λ(X) < k, where k is determined by
Prob[0 < λ(X) < k | H0] = α.

Properties of the LR statistic (X)


Properties of λ(X) = sup_{θ ∈ H0} L(X|θ) / sup_{θ ∈ Θ} L(X|θ):
(1) 0 ≤ λ(X) ≤ 1, with λ(X) = 1 if the supremum of the likelihood occurs
within H0.
Intuition of test: If the likelihood is much larger outside H0 (i.e., in the
unrestricted space), then λ(X) will be small and H0 should be rejected.
(2) Under general assumptions, −2 ln λ(X) ~ χ²_p, where p is the
difference in degrees of freedom between H0 and the general parameter space.
(3) For simple hypotheses, the numerator and denominator of the LR
test are simply the likelihoods under H0 and H1. The LR test reduces to a
test specified by the NP Lemma.


Likelihood Ratio Tests: Example I


Example: λ(X) for X ~ N(μ, σ²), for H0: μ = μ₀ vs. H1: μ ≠ μ₀. Assume σ² is
known.

λ(x) = L(μ₀ | x) / L(x̄ | x)
     = [(2πσ²)^(-n/2) exp{−Σᵢ (xᵢ − μ₀)²/(2σ²)}] / [(2πσ²)^(-n/2) exp{−Σᵢ (xᵢ − x̄)²/(2σ²)}]
     = exp{[−Σᵢ (xᵢ − μ₀)² + Σᵢ (xᵢ − x̄)²]/(2σ²)}
     = exp{−n(x̄ − μ₀)²/(2σ²)}

Reject H0 if λ(x) < k:

ln λ(x) = −n(x̄ − μ₀)²/(2σ²) < ln k  ⟹  n(x̄ − μ₀)²/σ² > −2 ln k

Note: Finding k is not needed.
Why? We know the left-hand side is distributed as a χ²₁, so (−2 ln k)
needs to be the 1 − α percentile of a χ²₁. We need not solve explicitly for
k; we just need the rejection rule.
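A sketch of the resulting rule in Python (α, σ, and the data are hypothetical):

# Two-sided LR test for H0: mu = mu0 with sigma known: reject when
# n * (xbar - mu0)^2 / sigma^2 exceeds the 1 - alpha percentile of chi2(1).
import numpy as np
from scipy import stats

alpha, mu0, sigma = 0.05, 0.0, 1.0
rng = np.random.default_rng(5)
x = rng.normal(0.3, sigma, size=50)      # hypothetical sample

stat = len(x) * (x.mean() - mu0) ** 2 / sigma ** 2   # = -2 ln lambda(x)
crit = stats.chi2.ppf(1 - alpha, df=1)               # = -2 ln k
print("reject H0" if stat > crit else "fail to reject H0")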

Likelihood Ratio Tests: Example II

Example: λ(X) for X ~ exponential(λ), for H0: λ = λ₀ vs. H1: λ ≠ λ₀.

L(X|λ) = λⁿ exp(−λ Σᵢ xᵢ) = λⁿ exp(−λ n x̄)  ⟹  MLE: λ̂ = 1/x̄

λ(x) = [λ₀ⁿ e^(−λ₀ n x̄)] / [(1/x̄)ⁿ e^(−n)] = (x̄ λ₀)ⁿ e^{n(1 − λ₀ x̄)}

Reject H0 if λ(x) < k:

ln λ(x) = n ln(x̄ λ₀) + n(1 − λ₀ x̄) < ln k

We need to find k such that P[λ(X) < k] = α. Unfortunately, this is not
analytically feasible. We know the distribution of x̄ is Gamma(n, 1/(nλ)),
but we cannot get further.
It is, however, possible to determine the cutoff point, k, by simulation
(set n, λ₀), as sketched below.
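A sketch of that simulation (n, λ₀, α, and the replication count are arbitrary choices):

# Determine the LR cutoff by simulation: draw x-bar under H0 and take the
# alpha percentile of ln lambda(X) = n ln(lam0 * xbar) + n (1 - lam0 * xbar).
import numpy as np

n, lam0, alpha, reps = 30, 1.0, 0.05, 100_000
rng = np.random.default_rng(11)
xbar = rng.exponential(scale=1 / lam0, size=(reps, n)).mean(axis=1)

log_lam = n * np.log(lam0 * xbar) + n * (1 - lam0 * xbar)
ln_k = np.quantile(log_lam, alpha)   # P[ln lambda(X) < ln k | H0] = alpha
print(f"ln k = {ln_k:.3f}")
# Rule: reject H0 if ln lambda(x_obs) < ln_k.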


Asymptotic Distribution of the LRT - Simple H0

Theorem: Test H0: θ = θ₀ vs. H1: θ ≠ θ₀. Suppose X1, ..., Xn are iid with pdf f(x|θ).
Let θ̂ be the MLE of θ, and let f(x|θ) satisfy the following regularity conditions:
(1) The parameter is identifiable - i.e., if θ ≠ θ′, then f(x|θ) ≠ f(x|θ′).
(2) The densities f(x|θ) have common support, and f(x|θ) is
differentiable in θ.
(3) The parameter space Θ contains an open set ω of which the true parameter
value θ₀ is an interior point.
(4) For every x ∈ X, the density f(x|θ) is three times differentiable with respect to θ, the
third derivative is continuous in θ, and ∫ f(x|θ) dx can be differentiated three
times under the integral sign.
(5) There exist c > 0 and a function M(x) (both depending on θ₀) such that:

|∂³/∂θ³ log f(x|θ)| ≤ M(x) for all x ∈ X, θ₀ − c < θ < θ₀ + c,

with E_{θ₀}[M(X)] < ∞.

Asymptotic Distribution of the LRT - Simple H0

Then, under H0, as n → ∞,

−2 log λ(X) = 2[log L(X, θ̂ₙ) − log L(X, θ₀)] →ᵈ χ²₁.

If θ is a vector, in general −2 log λ(X) →ᵈ χ²_p, with
p: [# of free parameters under Θ] − [# of free parameters under Θ₀].

Proof: Expand log L(X, θ) around θ̂ₙ, the MLE:

log L(X, θ) = log L(X, θ̂ₙ) + n Sₙ(X, θ̂ₙ)(θ − θ̂ₙ) + (1/2) n (θ − θ̂ₙ)′ S′ₙ(X, θₙ*)(θ − θ̂ₙ),

where Sₙ is the average score and θₙ* lies between θ and θ̂ₙ. At θ̂ₙ, Sₙ(X, θ̂ₙ) = 0. Then, at θ = θ₀,

log λ(X) = log L(X, θ₀) − log L(X, θ̂ₙ) = (1/2) n (θ₀ − θ̂ₙ)′ S′ₙ(X, θₙ*)(θ₀ − θ̂ₙ).

Since −S′ₙ(X, θₙ*) →ᵖ I(θ₀) and n^(1/2) (θ̂ₙ − θ₀) →ᵈ N(0, I(θ₀)⁻¹), it follows that
−2 log λ(X) →ᵈ χ²_p.
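A quick simulation check of the theorem (a sketch, reusing the known-σ normal example, where −2 ln λ(X) = n(x̄ − μ₀)² is exactly χ²₁; all numbers are arbitrary):

# Under H0, -2 ln lambda(X) should match the chi2(1) distribution.
import numpy as np
from scipy import stats

n, mu0, reps = 50, 0.0, 100_000
rng = np.random.default_rng(13)
xbar = rng.normal(mu0, 1 / np.sqrt(n), size=reps)   # x-bar under H0, sigma = 1
lrt = n * (xbar - mu0) ** 2                         # -2 ln lambda(X)

for q in (0.90, 0.95, 0.99):                        # compare quantiles
    print(q, round(np.quantile(lrt, q), 3), round(stats.chi2.ppf(q, df=1), 3))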
