Lecture 8: Introduction to Hypothesis Testing
It is often the case that we wish to use data to make a binary decision about some unknown
aspect of nature. For example, we may wish to decide whether or not it is plausible that a
parameter takes some particular value. A frequentist approach to using data to make such
decisions is hypothesis testing, also called significance testing.
Note: There exist Bayesian counterparts of frequentist hypothesis tests, but the two
philosophies differ more substantially for these types of binary decisions than for estimation problems.
8.1 Basic Structure of Hypothesis Tests
A hypothesis test consists of two hypotheses and a rejection region. The rejection region
may be specified via a test statistic and a critical value. We define each of these terms below.
Hypotheses
A hypothesis is any statement about an unknown aspect of a distribution. In a hypothesis
test, we have two hypotheses:
H0, the null hypothesis, and
H1, the alternative hypothesis.
Often a hypothesis is stated in terms of the value of one or more unknown parameters, in which case it is called a parametric hypothesis. Specifically, suppose we have an unknown parameter θ. Then parametric hypotheses about θ can be written in general as H0: θ ∈ Θ0 and H1: θ ∈ Θ1, where Θ0 and Θ1 are disjoint, i.e., Θ0 ∩ Θ1 = ∅. We will typically assume hypotheses to be parametric unless clearly stated otherwise.
Example 8.1.1: Let X1, . . . , Xn be iid observations from a distribution with an unknown mean μ ∈ ℝ. Parametric hypotheses about μ could be H0: μ ≤ 2 and H1: μ > 2. A different set of parametric hypotheses could be H0: μ = 2 and H1: μ ≠ 2.
Rejection Region

A hypothesis test rejects H0 if and only if the observed data X fall in the rejection region R ⊆ S, where S denotes the sample space of X.

Example 8.1.3: Let X ~ Bin(n, θ), where 0 < θ < 1, and consider testing H0: θ = 1/2 versus H1: θ ≠ 1/2. Perhaps the simplest nontrivial test of these hypotheses is to reject H0 if and only if the trials are all successes or all failures, i.e., if and only if X = 0 or X = n. Then the rejection region is R = {0, n}.
Essentially, a hypothesis test is its rejection region, in the sense that two tests of the same
hypotheses based on the same data are identical tests if and only if they have the same
rejection region.
Test Statistic
It is common to write the rejection region R in the form

Rc = {x ∈ S : T(x) ≥ c}  or  Rc = {x ∈ S : T(x) > c},

where T(X) is a test statistic and c is a critical value. Notice that if c1 > c2, then Rc1 ⊆ Rc2. Thus, writing rejection regions in this form allows us to construct a series of nested rejection regions corresponding to the same test statistic.
Example 8.1.4: Let X ~ Bin(n, θ), where 0 < θ < 1, and consider testing H0: θ = 1/2 versus H1: θ ≠ 1/2. A simple test of these hypotheses is to reject H0 if and only if X/n is far enough from 1/2. Then we could state the test statistic and rejection region as

T(X) = |X/n − 1/2|,    Rc = {x ∈ S : T(x) ≥ c}.

For instance, if n = 6, the rejection regions for the various possible critical values c are as follows:
Critical Value     Rejection Region Rc
c ≤ 0              {0, 1, 2, 3, 4, 5, 6}
0 < c ≤ 1/6        {0, 1, 2, 4, 5, 6}
1/6 < c ≤ 1/3      {0, 1, 5, 6}
1/3 < c ≤ 1/2      {0, 6}
1/2 < c            ∅
Thus, we always reject H0 (Rc = S) if c ≤ 0, while we never reject H0 (Rc = ∅) if c > 1/2.
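To make the table concrete, here is a minimal Python sketch (the function names are ours, not from the text) that enumerates Rc for n = 6, one value of c from each interval:

```python
from fractions import Fraction

n = 6

def T(x):
    # Test statistic T(x) = |x/n - 1/2| from Example 8.1.4.
    return abs(Fraction(x, n) - Fraction(1, 2))

def rejection_region(c):
    # R_c = {x in S : T(x) >= c}, where S = {0, 1, ..., n}.
    return {x for x in range(n + 1) if T(x) >= c}

# One representative c from each interval in the table above.
for c in [Fraction(-1), Fraction(1, 6), Fraction(1, 3), Fraction(1, 2), Fraction(2, 3)]:
    print(f"c = {c}: R_c = {sorted(rejection_region(c))}")
```

Running this reproduces the five nested regions in the table, illustrating how a single test statistic generates a whole family of tests.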
Although any rejection region R can be written in the form R = {x ∈ S : T(x) ≥ c} or R = {x ∈ S : T(x) > c} for some test statistic T(X) and some critical value c, it may occasionally be more convenient to express a rejection region in some other form.
Example 8.1.5: Example 9.1.7 of DeGroot & Schervish describes a test statistic Yn and a hypothesis test in which we reject H0 if and only if Yn ≤ 2.9 or Yn ≥ 4.0. On page 533, DeGroot & Schervish claim that the rejection region of this test cannot be written in the form {x ∈ S : T(x) ≥ c}. However, this claim is clearly incorrect, since we can simply define another test statistic Zn = max{2.9 − Yn, Yn − 4.0} and write the rejection region as {x ∈ S : Zn(x) ≥ 0}. However, it is probably more convenient to work with the rejection region in terms of the original Yn, even though it does not fit the standard form.
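A tiny sketch verifying that the Zn reformulation selects exactly the same outcomes (the grid of Yn values is illustrative only):

```python
def reject_original(y):
    # Reject H0 iff Yn <= 2.9 or Yn >= 4.0.
    return y <= 2.9 or y >= 4.0

def reject_via_z(y):
    # Equivalent rule: Zn = max(2.9 - Yn, Yn - 4.0) >= 0.
    return max(2.9 - y, y - 4.0) >= 0

# The two rules agree on a grid of illustrative Yn values.
for y in [2.0, 2.9, 3.0, 3.5, 3.99, 4.0, 5.0]:
    assert reject_original(y) == reject_via_z(y)
print("Both rules define the same rejection region.")
```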
Example 8.1.6: Let X ~ Bin(n, θ), where 0 < θ < 1, and consider testing H0: θ = 1/2 versus H1: θ ≠ 1/2. Clearly the hypothesis tests proposed in Example 8.1.3 and Example 8.1.4 are good since X is more likely to fall in the rejection region if θ ≠ 1/2 than if θ = 1/2.
Example 8.1.7: Let X ~ Bin(n, θ), where 0 < θ < 1, and consider testing H0: θ = 1/2 versus H1: θ ≠ 1/2. A legal hypothesis test is simply to always reject H0. The rejection region of this test is {0, 1, . . . , n}, the entire sample space. Another legal hypothesis test is simply to never reject H0. The rejection region of this test is ∅. However, these two hypothesis tests are obviously a waste of time.
Example 8.1.8: Let X ~ Bin(n, θ), where 0 < θ < 1, and consider testing H0: θ = 1/2 versus H1: θ ≠ 1/2. Suppose we take the test statistic to be T(X) = X and reject H0 if and only if X ≥ c. This is a legal hypothesis test. However, it is not a good test of these hypotheses since Pθ(X ≥ c) is smaller for θ < 1/2 than for θ = 1/2. (Note, however, that it would be a good test of H0: θ = 1/2 versus H1: θ > 1/2.)
Example 8.1.9: Let X ~ Bin(n, θ), where 0 < θ < 1, and consider testing H0: θ = 1/2 versus H1: θ ≠ 1/2. The seemingly perfect test that rejects H0 if and only if θ ≠ 1/2 is not a hypothesis test at all, since it does not specify a rejection region as a subset of the sample space. (It specifies a rule in terms of the parameter value itself, which of course is impossible to apply since the parameter value is unknown.)
8.2 Errors and Power
We now discuss basic properties of hypothesis tests in a probabilistic context. Remember that
hypothesis tests as discussed here are a fundamentally frequentist concept, so probabilities
discussed here are calculated as if the true parameter value θ is fixed but unknown.
Type I and Type II Errors
Recall that an ideal hypothesis test would always fail to reject H0 when it is true and would
always reject H0 when it is false. Then an ideal hypothesis test would be constructed so that
X ∈ R if and only if θ ∈ Θ1. However, since X is random, this goal is typically impossible.
Hence, there is usually some chance that our test will make the wrong decision.
A type I error occurs if we reject H0 when it is true, i.e., if θ ∈ Θ0 and X ∈ R.
A type II error occurs if we fail to reject H0 when it is false, i.e., if θ ∈ Θ1 and X ∉ R.
The following table of possibilities may be helpful:
Truth             Data     Decision            Outcome
H0 (θ ∈ Θ0)       X ∉ R    Fail to Reject H0   Correct Decision
H0 (θ ∈ Θ0)       X ∈ R    Reject H0           Type I Error
H1 (θ ∈ Θ1)       X ∉ R    Fail to Reject H0   Type II Error
H1 (θ ∈ Θ1)       X ∈ R    Reject H0           Correct Decision
Of course, in reality we would not know whether a decision is correct or is an error, because
we would not know the true parameter value θ. However, we can still consider the probability
of each type of error.
If θ ∈ Θ0, then the probability of a type I error is Pθ(X ∈ R).
If θ ∈ Θ1, then the probability of a type II error is Pθ(X ∉ R) = 1 − Pθ(X ∈ R).
The true value of θ is unknown, but these probabilities can be calculated for each possible θ.
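For instance, here is a brief sketch (using the test of Example 8.1.3 with n = 6, an illustrative choice) that evaluates both error probabilities at particular values of θ:

```python
from scipy.stats import binom

n = 6
R = {0, n}  # rejection region of the all-successes-or-all-failures test (Example 8.1.3)

def prob_reject(theta):
    # P_theta(X in R) for X ~ Bin(n, theta).
    return sum(binom.pmf(x, n, theta) for x in R)

# Type I error probability at the null value theta = 1/2:
print("P(type I error):", prob_reject(0.5))        # 2/64 = 0.03125

# Type II error probability at an alternative value, e.g. theta = 0.9:
print("P(type II error):", 1 - prob_reject(0.9))   # ~0.469
```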
Power Function
The power function of a hypothesis test with rejection region R is Power(θ) = Pθ(X ∈ R).
Note: We will write Power(θ) to avoid any notational confusion, but be aware that this notation is nonstandard. Our textbook uses π(θ) for the power function, while another textbook uses β(θ). The latter choice is particularly confusing since many people instead use β to denote the probability of a type II error.
Notice that the power function provides the probabilities of both error types:

Power(θ) = Pθ(X ∈ R) = { P(type I error)        if θ ∈ Θ0,
                         1 − P(type II error)   if θ ∈ Θ1.
Note: When people use the word power in the context of hypothesis tests, they usually mean 1 − P(type II error), i.e., they mean the values of Power(θ) for θ ∈ Θ1. The definition of the power function above is simply the logical extension to θ ∈ Θ0 as well. Note, however, that it is actually bad if Power(θ) is large for θ ∈ Θ0.
Ideally, we would have

Power(θ) = 1Θ1(θ) = { 0   if θ ∈ Θ0,
                      1   if θ ∈ Θ1,
but we know this is typically impossible since it corresponds to a perfect hypothesis test.
More practically, we want Power(θ) to be small for θ ∈ Θ0 and large for θ ∈ Θ1.
Example 8.2.1: Let X ~ Bin(6, θ), where 0 < θ < 1, and consider testing H0: θ ≤ 1/2 versus H1: θ > 1/2 using one of the following three hypothesis tests:

Test 1: Reject H0 if and only if X = 6. The power function of this hypothesis test is
Power(1)(θ) = Pθ(X = 6) = θ^6.

Test 2: Reject H0 if and only if X ≥ 5. The power function of this hypothesis test is
Power(2)(θ) = Pθ(X ≥ 5) = θ^6 + 6θ^5(1 − θ) = θ^5(6 − 5θ).

Test 3: Reject H0 if and only if X ≥ 4. The power function of this hypothesis test is
Power(3)(θ) = Pθ(X ≥ 4) = θ^6 + 6θ^5(1 − θ) + 15θ^4(1 − θ)^2 = θ^4(15 − 24θ + 10θ^2).
These functions are plotted below.
[Figure: Power(1)(θ), Power(2)(θ), and Power(3)(θ) plotted against θ for 0 ≤ θ ≤ 1, with power on the vertical axis from 0.0 to 1.0. Test 3 has the highest curve and Test 1 the lowest, since the rejection regions are nested.]
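The following sketch recomputes the three power functions with scipy, checks them against the closed forms above, and reproduces the figure:

```python
import numpy as np
from scipy.stats import binom
import matplotlib.pyplot as plt

thetas = np.linspace(0.001, 0.999, 200)

# P_theta(X >= k) for X ~ Bin(6, theta); binom.sf(k - 1, ...) gives P(X >= k).
power1 = binom.sf(5, 6, thetas)  # Test 1: reject iff X = 6
power2 = binom.sf(4, 6, thetas)  # Test 2: reject iff X >= 5
power3 = binom.sf(3, 6, thetas)  # Test 3: reject iff X >= 4

# Sanity checks against the closed-form expressions derived above.
assert np.allclose(power1, thetas**6)
assert np.allclose(power2, thetas**5 * (6 - 5 * thetas))
assert np.allclose(power3, thetas**4 * (15 - 24 * thetas + 10 * thetas**2))

for power, label in [(power1, "Test 1"), (power2, "Test 2"), (power3, "Test 3")]:
    plt.plot(thetas, power, label=label)
plt.xlabel("theta"); plt.ylabel("Power"); plt.legend(); plt.show()
```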
8.3 Level and Size
Suppose for now that our null hypothesis is H0: θ = θ0 (i.e., suppose that Θ0 = {θ0}). Then it suffices to consider Pθ0[T(X) ≥ c].
Achieving a Specified Level
Our test has level α if and only if Pθ0[T(X) ≥ c] ≤ α. For any α > 0, we can find a value of c large enough to satisfy this inequality.
Achieving a Specified Size
Now suppose that we wish to construct a test with size α. Our test has size α if and only if Pθ0[T(X) ≥ c] = α. It may or may not be possible to find such a test.
If the distribution of T(X) is continuous, then we want to find a value c ∈ ℝ such that

Pθ0[T(X) ≥ c] = 1 − Fθ0(c) = α,

where Fθ0 denotes the cdf of T(X) for parameter value θ0. If 0 < α < 1, then there exists a point c ∈ ℝ that satisfies this equation since the cdf of T(X) is continuous. Thus, if the test statistic T(X) is a continuous random variable and 0 < α < 1, then there exists a choice of the critical value c that achieves size α.
If instead the distribution of T(X) is discrete, then there may or may not exist a value of c for which Pθ0[T(X) ≥ c] = α. If no such c exists, then there does not exist a test with size α based on the test statistic T(X). In this case, we would typically try to find a test with size less than α (so that it still has level α) but as close to α as possible.
Example 8.3.2: In Example 8.3.1, we can obtain a test with size α by taking the critical value c to be the number such that P(|Z| ≥ c) = α for a standard normal random variable Z. (For α = 0.05, this is c ≈ 1.96. For α = 0.10, this is c ≈ 1.64.) Any larger value of c would also yield a test with level α, but the size of such a test would be smaller than α.
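These critical values are the (1 − α/2) quantiles of the standard normal distribution; a one-line check with scipy:

```python
from scipy.stats import norm

# Two-sided critical value: c such that P(|Z| >= c) = alpha for Z ~ N(0, 1),
# i.e., c is the (1 - alpha/2) quantile of the standard normal distribution.
for alpha in [0.05, 0.10]:
    c = norm.ppf(1 - alpha / 2)
    print(f"alpha = {alpha}: c = {c:.3f}")  # 1.960 and 1.645
```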
If the null hypothesis is instead composite, we consider supθ∈Θ0 Pθ[T(X) ≥ c]. In many problems this supremum equals Pθ*[T(X) ≥ c] for some θ* ∈ Θ0. (Often θ* is on the boundary of Θ0.) Then we can proceed as if the set Θ0 were instead simply {θ*}, i.e., as if the null hypothesis were simply H0: θ = θ*.
Example 8.3.3: In Example 8.2.1 and Example 8.2.2,

sup0<θ≤1/2 Pθ(X ≥ c) = Pθ=1/2(X ≥ c)

for all c ∈ ℝ (which was why the sizes of the tests in Example 8.2.2 could be computed by evaluating the power function at θ = 1/2). Then since the distribution of X is discrete, a test with size exactly α only exists for certain values of α. For example, there does not exist a test of this form with size 0.05. If we were asked to find a test with level 0.05, we could choose Test 1, which rejects H0 if and only if X = 6. This test has size 1/64 ≈ 0.016, so 0.05 is indeed a level of this test.
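A quick sketch confirming these sizes (the attainable sizes are just upper tail probabilities of Bin(6, 1/2)):

```python
from scipy.stats import binom

# Size of the test "reject iff X >= k" under theta = 1/2 (the boundary of H0).
for k in [6, 5, 4]:
    size = binom.sf(k - 1, 6, 0.5)  # P_{theta = 1/2}(X >= k)
    print(f"reject iff X >= {k}: size = {size:.4f}")
# Sizes: 1/64 ~ 0.0156, 7/64 ~ 0.1094, 22/64 ~ 0.3438 -- none equals 0.05.
```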
8.4 P-Values
The choice of the size or level of a test is typically subjective. This subjectivity can be
somewhat unsatisfying, since two different people can reach opposite conclusions from the
same data and the same test statistic simply because they chose to use different sizes or
levels (and hence different critical values).
Example 8.4.1: In Example 8.3.1 and Example 8.3.2, we considered a test that rejects H0 if and only if the test statistic exceeds the number c such that P(|Z| ≥ c) = α for a standard normal random variable Z. Suppose one person uses α = 0.05 and c ≈ 1.96, while another person uses α = 0.10 and c ≈ 1.64. Now suppose the observed test statistic value is 1.76. Then the first person will fail to reject H0, while the second person will reject H0.
The p-value of the observed data xobs is p(xobs) = supθ∈Θ0 Pθ[T(X) ≥ T(xobs)]. Let c be the smallest number such that the test associated with Rc has level α. If xobs ∈ Rc, then T(xobs) ≥ c, so

p(xobs) = supθ∈Θ0 Pθ[T(X) ≥ T(xobs)] ≤ supθ∈Θ0 Pθ[T(X) ≥ c] ≤ α

since the test has level α. Now suppose instead that xobs ∉ Rc. Then T(xobs) < c, so

p(xobs) = supθ∈Θ0 Pθ[T(X) ≥ T(xobs)] > α

since otherwise c would not be the smallest number such that the test associated with Rc has level α.
Thus, Theorem 8.4.2 tells us that an equivalent way to make the final decision in a hypothesis test is to calculate the p-value p(xobs) for the observed data xobs and reject H0 at level α if and only if p(xobs) ≤ α. For this reason, the p-value is sometimes called the observed significance level.
Example 8.4.3: In Example 8.4.1, the observed test statistic value 1.76 has p-value

p(1.76) = P(|Z| ≥ 1.76) ≈ 0.078,

where Z is a standard normal random variable. Since 0.078 ≤ 0.10 but 0.078 > 0.05, this single number reproduces both decisions in Example 8.4.1.
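The corresponding computation in scipy (two-sided tail of the standard normal):

```python
from scipy.stats import norm

# Two-sided p-value for an observed statistic of 1.76:
# P(|Z| >= 1.76) = 2 * P(Z >= 1.76) for Z ~ N(0, 1).
p_value = 2 * norm.sf(1.76)
print(f"p-value = {p_value:.3f}")  # ~0.078: reject at alpha = 0.10, not at 0.05
```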
8.5 Criticisms of Hypothesis Testing
Frequentist hypothesis testing has been an immensely popular tool of statistical inference
for decades. However, there do exist scenarios in which hypothesis tests show properties
that some people consider illogical and unacceptable. On the other hand, some people see
absolutely no problem with this type of behavior. We now provide a few examples merely
to illustrate some issues that can arise.
Example 8.5.1: Suppose we wish to test whether a particular coin is fair or weighted in favor of heads. Then our hypotheses are H0: θ = 1/2 and H1: θ > 1/2, where θ denotes the probability that the coin yields heads on any given flip. Now suppose we are told that the following sequence of flips was observed (in order):
heads, heads, heads, heads, heads, tails.
There is some ambiguity here about how we should represent the data as a random variable.
Perhaps the person flipping the coin decided to flip the coin repeatedly until obtaining tails. Let X be the number of times heads is observed for such an experiment before the first tails. Then X ~ Geometric(θ), and a sensible hypothesis test is to reject H0 if and only if X ≥ c for some c. The observed value of X was X = 5, so the p-value is

p(5) = Pθ=1/2(X ≥ 5) = 1/32 ≈ 0.031.
Perhaps the person flipping the coin instead decided to flip the coin six times and record the results. Let X be the number of times heads is observed for such an experiment. Then X ~ Bin(6, θ), and a sensible hypothesis test is to reject H0 if and only if X ≥ c for some c. The observed value of X was X = 5, so the p-value is

p(5) = Pθ=1/2(X ≥ 5) = 7/64 ≈ 0.109.
Thus, the two different representations yield very different p-values and would therefore lead to opposite conclusions at both α = 0.05 and α = 0.10. This is troubling since there is no clear
reason to prefer either representation over the other. Essentially, the result of our hypothesis
test depends on knowing what the experimenter would have done under circumstances that
are already known not to have occurred (e.g., whether the experimenter would have stopped
flipping had tails occurred earlier than the sixth flip).
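A short sketch computing both p-values for the same observed sequence, to make the contrast explicit (scipy's geom counts trials up to and including the first tails, which is why the argument below is 5):

```python
from scipy.stats import binom, geom

# Stopping rule 1: flip until the first tails. Five heads before the first
# tails means the first tails occurred on trial 6 or later.
p_geometric = geom.sf(5, 0.5)  # P(first tails on trial > 5) = (1/2)^5
print(f"geometric p-value: {p_geometric:.3f}")  # ~0.031

# Stopping rule 2: flip exactly six times; X = number of heads in 6 flips.
p_binomial = binom.sf(4, 6, 0.5)  # P(X >= 5) = 7/64
print(f"binomial p-value: {p_binomial:.3f}")  # ~0.109
```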
Example 8.5.2: A researcher visits a lab and is allowed to use Machine A to conduct some
measurements. These measurements are then used to perform a hypothesis test and reach
a conclusion. However, the researcher later learns that the lab actually had two similar
machines of this type (Machine A and Machine B), that another researcher also visited the
lab the same day, and that the two machines were assigned to the two researchers randomly.
Also, the machines are not identical: Machine A is a better piece of equipment and hence
provides more precise measurements than Machine B. Although these new facts do not change
the researcher's data or test statistic, they do change the distribution of that test statistic,
which must instead be calculated as if there were probability 1/2 of using Machine A and
probability 1/2 of using Machine B. Thus, the outcome of the hypothesis test can be altered
even after the data has been collected by the mere existence of Machine B and the fact that
it could have been used instead, even though it is already known that it was not used.
Example 8.5.3: Suppose a certain voltage is to be measured using a voltmeter for which the readings are iid N(μ, σ²) random variables, where σ² > 0 is known. The sample mean is
computed, and a hypothesis test is performed. However, it is later learned that the voltmeter
had a maximum reading of 10 V, and any reading that otherwise would have been greater
than 10 V would have instead been given as 10 V. This fact changes the distribution of the
test statistic and could thus alter the outcome of the hypothesis test. Note that this change
occurs even if all of the readings are less than 10 V, i.e., even if it is already known that the
maximum did not actually matter.
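A simulation sketch of this phenomenon, with illustrative values (μ = 9 V, σ = 1, n = 5 readings are our assumptions, not from the text): capping readings at 10 V shifts the null distribution of the sample mean, so the critical value of a test based on it changes.

```python
import numpy as np

rng = np.random.default_rng(0)
mu0, sigma, n = 9.0, 1.0, 5          # hypothetical null mean, sd, sample size
draws = rng.normal(mu0, sigma, size=(100_000, n))

xbar_ideal = draws.mean(axis=1)                       # voltmeter without a maximum
xbar_censored = np.minimum(draws, 10.0).mean(axis=1)  # readings capped at 10 V

# The critical value of a size-0.05 test of H0: mu = mu0 vs H1: mu > mu0
# differs between the two distributions of the test statistic.
print("95th percentile, ideal:    ", np.quantile(xbar_ideal, 0.95))
print("95th percentile, censored: ", np.quantile(xbar_censored, 0.95))
```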
These examples also highlight the differences between frequentist and Bayesian inference.
Frequentist inference conditions on parameter values and integrates/sums over all possible data values that could be observed.
Bayesian inference conditions on the observed data values and integrates/sums over all
possible values of the parameter.
Thus, the issues that arise in the examples in this section do not arise in Bayesian inference.
Since Bayesian methods are conditional on the data that is actually observed, they are
unaffected by what could have happened for data values that did not actually occur.
Note: Of course, there also exist scenarios where Bayesian methods exhibit behavior
that can be criticized on philosophical grounds. We will return to such scenarios later
in the course if time permits.