IE241 Hypothesis Testing

This document provides an introduction to hypothesis testing. It defines statistical hypotheses as assumptions about probability distributions. Hypothesis testing involves setting up a null hypothesis (Ho) and an alternative hypothesis (Ha) and using a procedure to decide whether to reject the null hypothesis based on sample data. There are two types of possible errors: Type 1 errors, where Ho is incorrectly rejected, and Type 2 errors, where Ho is incorrectly not rejected. The document outlines how to set significance levels (α) to control the probability of Type 1 errors and choose tests that minimize the probability of Type 2 errors (β). It provides an example of testing whether two brands of light bulbs have different lifetimes on average. The document also discusses simple versus composite hypotheses and the power of a test.


IE241: Introduction to

Hypothesis Testing

We said before that estimation of parameters was one of the two major areas of statistics. Now let's turn to the second major area of statistics, hypothesis testing.
What is a statistical hypothesis? A statistical hypothesis is an assumption about f(X) if X is continuous or p(X) if X is discrete.
A test of a statistical hypothesis is a procedure for deciding whether or not to reject the hypothesis.

Let's look at an example.


A buyer of light bulbs bought 50 bulbs of
each of two brands. When he tested
them, Brand A had an average life of 1208
hours with a standard deviation of 94
hours. Brand B had a mean life of 1282
hours with a standard deviation of 80
hours. Are brands A and B really different
in quality?

We set up two hypotheses.
The first, called the null hypothesis Ho, is the hypothesis of no difference.
Ho: μA = μB
The second, called the alternative hypothesis Ha, is the hypothesis that there is a difference.
Ha: μA ≠ μB

On the basis of the sample of 50 from each of the two populations of light bulbs, we shall either reject or not reject the hypothesis of no difference.
In statistics, we always test the null hypothesis. The alternative hypothesis is the default winner if the null hypothesis is rejected.

We never really accept the null hypothesis; we simply fail to reject it on the basis of the evidence in hand.
Now we need a procedure to test the null hypothesis. A test of a statistical hypothesis is a procedure for deciding whether or not to reject the null hypothesis.
There are two possible decisions, reject or not reject. This means there are also two kinds of error we could make.

The two types of error are shown in the table below.

                                True state
Decision            Ho true             Ho false
Reject Ho           Type 1 error        Correct decision
Do not reject Ho    Correct decision    Type 2 error
If we reject Ho when Ho is in fact true, then we make a Type 1 error. The probability of a Type 1 error is α.
If we do not reject Ho when Ho is really false, then we make a Type 2 error. The probability of a Type 2 error is β.

Now we need a decision rule that will make the probability of the two types of error very small. The problem is that the rule cannot make both of them small simultaneously.
Because in science we have to take the conservative route and never claim that we have found a new result unless we are really convinced that it is true, we choose a very small α, the probability of Type 1 error.

Then among all possible decision rules given α, we choose the one that makes β as small as possible.
The decision rule consists of a test statistic and a critical region where the test statistic may fall. For means from a normal population, the test statistic is

t = (X̄A − X̄B) / s_diff

where the denominator

s_diff = √(s²A/nA + s²B/nB)

is the standard deviation of the difference between two independent means.
The critical region is a tail of the distribution of the test statistic. If the test statistic falls in the critical region, Ho is rejected.
Now, how much of the tail should be in the critical region? That depends on just how small you want α to be. The usual choice is α = .05, but in some very critical cases, α is set at .01. Here we have just a non-critical choice of light bulbs, so we'll choose α = .05. This means that the critical region has probability .025 in each tail of the t distribution.
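The link between α = .05 and the two-tailed cutoff of 1.96 can be checked with the standard normal CDF (a sketch using only the standard library; for samples larger than 30 the t distribution is close to normal):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, built from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Two-tailed test: .025 in each tail, cutoff at z = 1.96
alpha = 2.0 * (1.0 - phi(1.96))
print(round(alpha, 3))  # → 0.05
```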

For a t distribution with .025 in each tail, the critical value is t = 1.96, the same as z because the sample size is greater than 30. The critical region then is |t| > 1.96.
In our light bulb example, the test statistic is

t = (1282 − 1208) / √(94²/50 + 80²/50) = 74 / 17.5 = 4.23

Now 4.23 is much greater than 1.96, so we reject the null hypothesis of no difference and declare that the average life of the B bulbs is longer than that of the A bulbs. Because α = .05, we have 95% confidence in the decision we made.
We cannot say that there is a 95% probability that we are right, because we are either right or wrong and we don't know which.
But there is such a small probability that t will land in the critical region if Ho is true that if it does get there, we choose to believe that Ho is not true.
If we had chosen α = .01, the critical value of t would be 2.58, and because 4.23 is greater than 2.58, we would still reject Ho. This time it would be with 99% confidence.
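The light bulb computation can be reproduced directly from the sample statistics (a sketch, not part of the original slides; carrying full precision gives 4.24 rather than the 4.23 obtained by rounding s_diff to 17.5 first):

```python
from math import sqrt

# Sample statistics from the light bulb example
mean_a, sd_a, n_a = 1208, 94, 50
mean_b, sd_b, n_b = 1282, 80, 50

# Standard deviation of the difference between two independent means
s_diff = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)

# Test statistic
t = (mean_b - mean_a) / s_diff

print(round(s_diff, 1), round(t, 2))  # → 17.5 4.24
# |t| > 1.96, so Ho is rejected at alpha = .05
```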

How do we know that the test we used is the best test possible?
We have controlled the probability of Type 1 error. But what is the probability of Type 2 error in this test? Does this test minimize it subject to the chosen value of α?

To answer this question, we need to consider the concept of test power. The power of a statistical test is the probability of rejecting Ho when Ho is really false. Thus power = 1 − β.
Clearly, if the test maximizes power, it minimizes the probability of Type 2 error, β. If a test maximizes power for a given α, it is called an admissible testing strategy.

Before going further, we need to distinguish between two types of hypotheses.
A simple hypothesis is one where the value of the parameter under Ho is a specified constant and the value of the parameter under Ha is a different specified constant.
For example, if you test
Ho: θ = 0 vs Ha: θ = 10
then you have a simple hypothesis test. Here you have a particular value for Ho and a different particular value for Ha.

For testing one simple hypothesis Ha against the simple hypothesis Ho, a ground-breaking result called the Neyman-Pearson lemma provides the most powerful test. The statistic

λ = L(θa) / L(θ0)

is a likelihood ratio with the Ha parameter MLE in the numerator and the Ho parameter MLE in the denominator. Clearly, any value of λ > 1 would favor the alternative hypothesis, while values less than 1 would favor the null hypothesis.

Consider the following example of a test of two simple hypotheses.
A coin is either fair or has P(H) = 2/3. Under Ho, P(H) = 1/2 and under Ha, P(H) = 2/3.
The coin will be tossed 3 times and a decision will be made between the two hypotheses. Thus X = number of heads = 0, 1, 2, or 3. Now let's look at how the decision will be made.

First, let's look at the probability of Type 1 error, α. In the table below, under Ho P(H) = 1/2 and under Ha P(H) = 2/3.

X    P(X|Ho)    P(X|Ha)
0    1/8        1/27
1    3/8        6/27
2    3/8        12/27
3    1/8        8/27
Now what should the critical region be?
Under Ho, if X = 0 is the critical region, α = 1/8. Under Ho, if X = 3 is the critical region, α = 1/8. So if either of these two values is chosen as the critical region, the probability of Type 1 error is the same.
Now what if Ha is true? If X = 0 is chosen as the critical region, the value of β = 26/27, because that is the probability that X ≠ 0. On the other hand, if X = 3 is chosen as the critical region, the value of β = 19/27, because that is the probability that X ≠ 3.
Clearly, the better choice for the critical region is X = 3, because that is the region that minimizes β for fixed α. So this critical region provides the more powerful test.
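The α and β values for the two candidate regions can be verified exactly with rational arithmetic (a sketch of the bookkeeping, not part of the original slides):

```python
from fractions import Fraction
from math import comb

def pmf(n, p, x):
    """Binomial probability of x heads in n tosses with P(H) = p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n = 3
p0 = Fraction(1, 2)   # P(H) under Ho
pa = Fraction(2, 3)   # P(H) under Ha

# Candidate critical regions: reject Ho when X = 0, or when X = 3
alpha_x0 = pmf(n, p0, 0)       # both give the same Type 1 error, 1/8
alpha_x3 = pmf(n, p0, 3)

beta_x0 = 1 - pmf(n, pa, 0)    # probability X lands outside the region under Ha
beta_x3 = 1 - pmf(n, pa, 3)    # smaller, so X = 3 is the better region

print(alpha_x0, alpha_x3, beta_x0, beta_x3)  # → 1/8 1/8 26/27 19/27
```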

In discrete variable problems like this, it may not be possible to choose a critical region of the desired α. In this illustration, you simply cannot find a critical region where α = .05 or .01.
This is seldom a problem in real-life experimentation, because n is usually sufficiently large that there is a wide variety of choices for critical regions.

This problem, used to illustrate the general method for selecting the best test, was easy to discuss because there was only a single alternative to Ho.
Most problems involve more than a single alternative. Such hypotheses are called composite hypotheses.

Examples of composite hypotheses:
Ho: θ = θ0 vs Ha: θ ≠ θ0
which is a two-sided Ha.
A one-sided Ha can be written as
Ho: θ = θ0 vs Ha: θ > θ0
or
Ho: θ = θ0 vs Ha: θ < θ0
All of these hypotheses are composite because they include more than one value for Ha. And unfortunately, the size of β here depends on the particular alternative value of θ being considered.

In the composite case, it is necessary to compare Type 2 errors for all possible alternative values under Ha. So now the size of the Type 2 error is a function of the alternative parameter value θ.
So β(θ) is the probability that the sample point will fall in the noncritical region when θ is the true value of the parameter.

Because it is more convenient to work with the critical region, the power function 1 − β(θ) is usually used.
The power function is the probability that the sample point will fall in the critical region when θ is the true value of the parameter.
As an illustration of these points, consider the following continuous example.

Let X = the time that elapses between two successive trippings of a Geiger counter in studying cosmic radiation. It is assumed that the density function is

f(x; θ) = θe^(−θx)

where θ is a parameter which depends on experimental conditions.
Under Ho, θ = 2. Now a physicist believes that θ < 2. So under Ha, θ < 2.

Now one choice for the critical region is X ≥ 1, for which

α = ∫₁^∞ 2e^(−2x) dx = e^(−2) ≈ .135

Another choice is the left tail, X ≤ .07, for which α ≈ .135 as well. That is,

α = ∫₀^·⁰⁷ 2e^(−2x) dx ≈ .135
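Both tail probabilities follow from the closed-form integral of the exponential density, so they can be checked without numerical integration (a sketch; the .07 cutoff is itself rounded, which is why the left-tail value comes out slightly under .135):

```python
from math import exp

# f(x; theta) = theta * e^(-theta x); under Ho, theta = 2.

# Right-tail region X >= 1:
#   alpha = integral from 1 to infinity of 2 e^(-2x) dx = e^(-2)
alpha_right = exp(-2)

# Left-tail region X <= .07:
#   alpha = integral from 0 to .07 of 2 e^(-2x) dx = 1 - e^(-.14)
alpha_left = 1 - exp(-0.14)

print(round(alpha_right, 3), round(alpha_left, 3))  # → 0.135 0.131
```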

Now let's examine the power functions for the two competing critical regions.
For the critical region X > 1,

1 − β(θ) = ∫₁^∞ θe^(−θx) dx = e^(−θ)

and for the critical region X < .07,

1 − β(θ) = ∫₀^·⁰⁷ θe^(−θx) dx = 1 − e^(−.07θ)

The graphs of these two functions are called the power curves for the two critical regions.
Note that the power curve for the X > 1 region is always higher than the power curve for the X < .07 region before they cross at θ = 2. Since the alternative values in this problem are all θ < 2, clearly the region X > 1 is superior.
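The comparison of the two power curves can be checked numerically over the alternative range θ < 2 (a sketch evaluating the two closed-form power functions derived above):

```python
from math import exp

def power_right(theta):
    """Power of the region X > 1: integral from 1 to infinity of theta e^(-theta x) dx."""
    return exp(-theta)

def power_left(theta):
    """Power of the region X < .07: integral from 0 to .07 of theta e^(-theta x) dx."""
    return 1 - exp(-0.07 * theta)

# All alternatives in this problem have theta < 2;
# the right-tail region dominates everywhere in that range.
for theta in (0.5, 1.0, 1.5):
    assert power_right(theta) > power_left(theta)

print(round(power_right(1.0), 3), round(power_left(1.0), 3))  # → 0.368 0.068
```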
