
STATS 200 (Stanford University, Summer 2015)

Lecture 8: Introduction to Hypothesis Testing

It is often the case that we wish to use data to make a binary decision about some unknown
aspect of nature. For example, we may wish to decide whether or not it is plausible that a
parameter takes some particular value. A frequentist approach to using data to make such
decisions is hypothesis testing, also called significance testing.
Note: There exist Bayesian counterparts of frequentist hypothesis tests, but the two
philosophies differ more substantially for these types of binary decisions than for estimation problems.

8.1 Basic Structure of Hypothesis Tests

A hypothesis test consists of two hypotheses and a rejection region. The rejection region
may be specified via a test statistic and a critical value. We define each of these terms below.
Hypotheses
A hypothesis is any statement about an unknown aspect of a distribution. In a hypothesis
test, we have two hypotheses:
- H0, the null hypothesis, and
- H1, the alternative hypothesis.
Often a hypothesis is stated in terms of the value of one or more unknown parameters, in
which case it is called a parametric hypothesis. Specifically, suppose we have an unknown
parameter θ. Then parametric hypotheses about θ can be written in general as H0: θ ∈ Ω0
and H1: θ ∈ Ω1, where Ω0 and Ω1 are disjoint, i.e., Ω0 ∩ Ω1 = ∅. We will typically assume
hypotheses to be parametric unless clearly stated otherwise.
Example 8.1.1: Let X1, . . . , Xn be iid observations from a distribution with an unknown
mean θ ∈ ℝ. Parametric hypotheses about θ could be H0: θ ≤ 2 and H1: θ > 2. A different
set of parametric hypotheses could be H0: θ = 2 and H1: θ ≠ 2.

Hypotheses can be further classified as simple or composite.


- A hypothesis is simple if it fully specifies the distribution of the data (including all
  unknown parameter values). A parametric hypothesis is simple if it states specific
  values for all unknown parameters.
- A hypothesis is composite if it is not simple.
Note that taking both hypotheses to be simple is equivalent to allowing only two possible
values for the unknown parameter θ, which is often unrealistic in practice. Thus, at least one
hypothesis is typically composite, and sometimes both hypotheses are composite. (If only
one hypothesis is composite, it is usually the alternative hypothesis H1 , for reasons that will
become clear later.)


Example 8.1.2: Let X1, . . . , Xn be iid N(μ, σ²), and consider various sets of hypotheses:

- H0: μ = 40 versus H1: μ = 45, with σ² known. H0 and H1 are both simple.
- H0: μ = 40 versus H1: μ ≠ 40, with σ² known. H0 is simple, and H1 is composite.
- H0: μ = 40 versus H1: μ ≠ 40, with σ² unknown. H0 and H1 are both composite.
- H0: μ ≤ 40 versus H1: μ > 40. H0 and H1 are both composite.
- H0: (μ, σ²) = (40, 9) versus H1: (μ, σ²) ≠ (40, 9). H0 is simple, and H1 is composite.

Note that if σ² is unknown, any hypothesis that does not specify its value is composite.
Rejection Region
Suppose we have some data X. Let S denote the set of all possible values that X can take.
A test of the hypotheses H0 and H1 based on X is a rule of the form

    Reject H0 (in favor of H1) if and only if X ∈ R,

where R is a subset of S. This set R is called the rejection region.
Note: When we do not reject H0 , we typically simply say that we fail to reject H0 .
Some people prefer to say instead that we accept H0 . For now, it is unimportant which
interpretation we prefer, since we are simply treating a hypothesis test as a rule for
using data to make a binary decision. However, we will revisit this issue later.

Example 8.1.3: Let X ∼ Bin(n, θ), where 0 < θ < 1, and consider testing H0: θ = 1/2 versus
H1: θ ≠ 1/2. Perhaps the simplest nontrivial test of these hypotheses is to reject H0 if and
only if the trials are all successes or all failures, i.e., if and only if X = 0 or X = n. Then the
rejection region is R = {0, n}.

Essentially, a hypothesis test is its rejection region, in the sense that two tests of the same
hypotheses based on the same data are identical tests if and only if they have the same
rejection region.
Test Statistic
It is common to write the rejection region R in the form
    R = {x ∈ S : T(x) ≥ c}   or   R = {x ∈ S : T(x) > c},

where T: S → ℝ is a function of the data and c ∈ ℝ.


- When the function T is applied to the data X, the resulting random variable T(X) is
  called the test statistic. We can also talk about the specific value T(x) that the test
  statistic takes for a particular data set X = x.
- The number c is called the critical value. Different values of c yield different rejection
  regions, which we write as Rc.


Notice that if c1 > c2, then Rc1 ⊆ Rc2. Thus, writing rejection regions in this form allows us
to construct a series of nested rejection regions corresponding to the same test statistic.
Example 8.1.4: Let X ∼ Bin(n, θ), where 0 < θ < 1, and consider testing H0: θ = 1/2 versus
H1: θ ≠ 1/2. A simple test of these hypotheses is to reject H0 if and only if X/n is far enough
from 1/2. Then we could state the test statistic and rejection region as

    T(X) = |X/n − 1/2|,    Rc = {x ∈ S : T(x) ≥ c},

where S = {0, 1, . . . , n} and c ∈ ℝ. For example, if n = 6, then we have the following:


    Critical Value     Rejection Region
    c ≤ 0              {0, 1, 2, 3, 4, 5, 6}
    0 < c ≤ 1/6        {0, 1, 2, 4, 5, 6}
    1/6 < c ≤ 1/3      {0, 1, 5, 6}
    1/3 < c ≤ 1/2      {0, 6}
    1/2 < c            ∅

Thus, we always reject H0 (Rc = S) if c ≤ 0, while we never reject H0 (Rc = ∅) if c > 1/2.
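
Note: As a quick computational check (not part of the original notes), the following Python
sketch enumerates Rc = {x ∈ S : T(x) ≥ c} for the n = 6 test above. The function name
rejection_region is just an illustrative choice; exact fractions avoid floating-point issues
at the interval endpoints.

```python
from fractions import Fraction

def rejection_region(n, c):
    """Rejection region R_c = {x : |x/n - 1/2| >= c} for the Bin(n, theta) test."""
    return [x for x in range(n + 1) if abs(Fraction(x, n) - Fraction(1, 2)) >= c]

# Reproduce the n = 6 table above for representative critical values.
for c in [Fraction(0), Fraction(1, 6), Fraction(1, 3), Fraction(1, 2), Fraction(2, 3)]:
    print(f"c = {c}: R_c = {rejection_region(6, c)}")
```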
Although any rejection region R can be written in the form R = {x ∈ S : T(x) ≥ c} or
R = {x ∈ S : T(x) > c} for some test statistic T(X) and some critical value c, it may
occasionally be more convenient to express a rejection region in some other form.
Example 8.1.5: Example 9.1.7 of DeGroot & Schervish describes a test statistic Yn and
a hypothesis test in which we reject H0 if and only if Yn ≤ 2.9 or Yn ≥ 4.0. On page 533,
DeGroot & Schervish claim that the rejection region of this test cannot be written in the
form {x ∈ S : T(x) ≥ c}. However, this claim is clearly incorrect, since we can simply
define another test statistic Zn = max{2.9 − Yn, Yn − 4.0} and write the rejection region as
{x ∈ S : Zn(x) ≥ 0}. Still, it is probably more convenient to work with the rejection
region in terms of the original Yn, even though it does not fit the standard form.

Good and Bad Hypothesis Tests (and Non-Tests)


Every subset of the sample space can be a rejection region, and every rejection region
corresponds to a hypothesis test. However, not all such hypothesis tests are actually sensible.

- A good hypothesis test should be more likely to reject H0 if it is actually false than if
  it is actually true.
- Mathematically, the rejection region R corresponds to a sensible test of H0: θ ∈ Ω0
  versus H1: θ ∈ Ω1 if Pθ(X ∈ R) tends to be higher for θ ∈ Ω1 than for θ ∈ Ω0.
- A perfect hypothesis test would have Pθ(X ∈ R) equal to 0 or 1 according to whether
  θ ∈ Ω0 or θ ∈ Ω1, respectively. However, this is typically impossible to achieve.

The probability Pθ(X ∈ R), which is a function of θ, will be given a name in Section 8.2.


Example 8.1.6: Let X ∼ Bin(n, θ), where 0 < θ < 1, and consider testing H0: θ = 1/2 versus
H1: θ ≠ 1/2. Clearly the hypothesis tests proposed in Example 8.1.3 and Example 8.1.4 are
good since X is more likely to fall in the rejection region if θ ≠ 1/2 than if θ = 1/2.

Example 8.1.7: Let X ∼ Bin(n, θ), where 0 < θ < 1, and consider testing H0: θ = 1/2 versus
H1: θ ≠ 1/2. A legal hypothesis test is simply to always reject H0. The rejection region of
this test is {0, 1, . . . , n}, the entire sample space. Another legal hypothesis test is simply to
never reject H0. The rejection region of this test is ∅. However, these two hypothesis tests
are obviously a waste of time.

Example 8.1.8: Let X ∼ Bin(n, θ), where 0 < θ < 1, and consider testing H0: θ = 1/2 versus
H1: θ ≠ 1/2. Suppose we take the test statistic to be T(X) = X and reject H0 if and only
if X ≥ c. This is a legal hypothesis test. However, it is not a good test of these hypotheses
since Pθ(X ≥ c) is smaller for θ < 1/2 than for θ = 1/2. (Note, however, that it would be a
good test of H0: θ = 1/2 versus H1: θ > 1/2.)

Example 8.1.9: Let X ∼ Bin(n, θ), where 0 < θ < 1, and consider testing H0: θ = 1/2 versus
H1: θ ≠ 1/2. The seemingly perfect test that rejects H0 if and only if θ ≠ 1/2 is not a
hypothesis test at all, since it does not specify a rejection region as a subset of the sample
space. (It specifies a rule in terms of the parameter value itself, which of course is impossible
to apply since the parameter value is unknown.)

8.2 Properties of Hypothesis Tests

We now discuss basic properties of hypothesis tests in a probabilistic context. Remember that
hypothesis tests as discussed here are a fundamentally frequentist concept, so probabilities
are calculated as if the true parameter value θ is fixed but unknown.
Type I and Type II Errors
Recall that an ideal hypothesis test would always fail to reject H0 when it is true and would
always reject H0 when it is false. Then an ideal hypothesis test would be constructed so that
X ∈ R if and only if θ ∈ Ω1. However, since X is random, this goal is typically impossible.
Hence, there is usually some chance that our test will make the wrong decision.

- A type I error occurs if we reject H0 when it is true, i.e., if θ ∈ Ω0 and X ∈ R.
- A type II error occurs if we fail to reject H0 when it is false, i.e., if θ ∈ Ω1 and X ∉ R.
The following table of possibilities may be helpful:

    Truth          Data     Decision            Outcome
    H0 (θ ∈ Ω0)    X ∉ R    Fail to Reject H0   Correct Decision
    H0 (θ ∈ Ω0)    X ∈ R    Reject H0           Type I Error
    H1 (θ ∈ Ω1)    X ∉ R    Fail to Reject H0   Type II Error
    H1 (θ ∈ Ω1)    X ∈ R    Reject H0           Correct Decision


Of course, in reality we would not know whether a decision is correct or is an error, because
we would not know the true parameter value θ. However, we can still consider the probability
of each type of error.

- If θ ∈ Ω0, then the probability of a type I error is Pθ(X ∈ R).
- If θ ∈ Ω1, then the probability of a type II error is Pθ(X ∉ R) = 1 − Pθ(X ∈ R).

The true value of θ is unknown, but these probabilities can be calculated for each possible θ.
Power Function
The power function of a hypothesis test with rejection region R is Power(θ) = Pθ(X ∈ R).
Note: We will write Power(θ) to avoid any notational confusion, but be aware that
this notation is nonstandard. Our textbook uses π(θ) for the power function, while
another textbook uses β(θ). The latter choice is particularly confusing since many
people instead use β to denote the probability of a type II error.

Notice that the power function provides the probabilities of both error types:

    Power(θ) = Pθ(X ∈ R) = { P(type I error)         if θ ∈ Ω0,
                           { 1 − P(type II error)    if θ ∈ Ω1.

Note: When people use the word power in the context of hypothesis tests, they
usually mean 1 − P(type II error), i.e., they mean the values of Power(θ) for θ ∈ Ω1.
The definition of the power function above is simply the logical extension to θ ∈ Ω0 as
well. Note, however, that it is actually bad if Power(θ) is large for θ ∈ Ω0.

The perfect power function would be

    Power(θ) = 1_{Ω1}(θ) = { 0   if θ ∈ Ω0,
                           { 1   if θ ∈ Ω1,

but we know this is typically impossible since it corresponds to a perfect hypothesis test.
More practically, we want Power(θ) to be small for θ ∈ Ω0 and large for θ ∈ Ω1.
Example 8.2.1: Let X ∼ Bin(6, θ), where 0 < θ < 1, and consider testing H0: θ ≤ 1/2 versus
H1: θ > 1/2 using one of the following three hypothesis tests:

- Test 1: Reject H0 if and only if X = 6. The power function of this hypothesis test is
  Power(1)(θ) = Pθ(X = 6) = θ⁶.
- Test 2: Reject H0 if and only if X ≥ 5. The power function of this hypothesis test is
  Power(2)(θ) = Pθ(X ≥ 5) = θ⁶ + 6θ⁵(1 − θ) = θ⁵(6 − 5θ).
- Test 3: Reject H0 if and only if X ≥ 4. The power function of this hypothesis test is
  Power(3)(θ) = Pθ(X ≥ 4) = θ⁶ + 6θ⁵(1 − θ) + 15θ⁴(1 − θ)² = θ⁴(15 − 24θ + 10θ²).

These functions are plotted below.

[Figure: power functions of Test 1, Test 2, and Test 3 plotted against θ ∈ (0, 1), with
Power(θ) on the vertical axis ranging from 0.0 to 1.0.]
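
Note: A plot along these lines can be reproduced with the short Python sketch below
(an illustration, not code from the notes; it assumes numpy, scipy, and matplotlib are
available). It uses binom.sf(k, n, p) = P(X > k) to evaluate each power function.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

theta = np.linspace(0.0, 1.0, 401)

# Power(theta) = P_theta(X in R) for X ~ Bin(6, theta).
power1 = binom.sf(5, 6, theta)  # Test 1: P(X = 6)
power2 = binom.sf(4, 6, theta)  # Test 2: P(X >= 5)
power3 = binom.sf(3, 6, theta)  # Test 3: P(X >= 4)

for label, power in [("Test 1", power1), ("Test 2", power2), ("Test 3", power3)]:
    plt.plot(theta, power, label=label)
plt.xlabel("theta")
plt.ylabel("Power")
plt.legend()
plt.show()
```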

From the plot, it is easy to see the following:

- Power(1)(θ) is very small for all θ ≤ 1/2, which is good. However, Power(1)(θ) is still
  fairly small for most of the θ > 1/2 region as well, which is not good.
- Power(3)(θ) is fairly large for most of the θ > 1/2 region, which is good. However,
  Power(3)(θ) can be reasonably large even when θ ≤ 1/2, which is not so good.
- Power(2)(θ) is in between Power(1)(θ) and Power(3)(θ).

Another way to think about this plot is as follows:

- Test 1 makes the fewest type I errors, while Test 3 makes the most (for all θ).
- Test 3 makes the fewest type II errors, while Test 1 makes the most (for all θ).

Which of these three tests is the best is a purely subjective question, and the answer
depends on the relative importance of type I and type II errors in the problem at hand.
Error Trade-Off
In Example 8.2.1, all three hypothesis tests used the same test statistic and differed only in
the choice of the critical value. When comparing a collection of tests of this form (i.e., when
selecting a critical value c), there is always a trade-off of type I and type II errors.
- If we adjust c to attain a smaller rejection region (i.e., if we increase c in the standard
  form), then we tend to decrease Pθ(X ∈ Rc) = Pθ[T(X) ≥ c] for all θ. This decreases
  the probability of a type I error but increases the probability of a type II error.
- If we adjust c to attain a larger rejection region (i.e., if we decrease c in the standard
  form), then we tend to increase Pθ(X ∈ Rc) = Pθ[T(X) ≥ c] for all θ. This decreases
  the probability of a type II error but increases the probability of a type I error.
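
Note: The following sketch (an illustration, not part of the notes) makes this trade-off
concrete for the three tests of Example 8.2.1; the alternative value θ = 0.8 is an arbitrary
choice used only to evaluate the type II error probability.

```python
from scipy.stats import binom

# Trade-off as c varies for the Bin(6, theta) tests: type I error at theta = 1/2
# versus type II error at an assumed alternative theta = 0.8.
for k in [6, 5, 4]:  # reject H0 iff X >= k
    type1 = binom.sf(k - 1, 6, 0.5)      # P_{1/2}(X >= k)
    type2 = 1 - binom.sf(k - 1, 6, 0.8)  # P_{0.8}(X < k)
    print(f"reject iff X >= {k}: P(type I) = {type1:.3f}, P(type II at 0.8) = {type2:.3f}")
# As k decreases, the type I error probability rises while the type II error falls.
```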
However, when comparing hypothesis tests with different test statistics, it may be the case
that one test outperforms the other in terms of both type I error and type II error.


Significance Levels and Sizes


The most common strategy (by far) is to fix some maximum probability of a type I error
and to then try to find a test that has the smallest possible type II error probability subject
to this constraint. This leads to the following terminology.
- A level (or significance level) of a test is any α ∈ ℝ such that Power(θ) ≤ α for all θ ∈ Ω0.
  Thus, a level of a test is simply any upper bound for its type I error probability.
- The size of a test is sup_{θ∈Ω0} Power(θ). Thus, the size of a test is the smallest number
  that is a level of the test.
When possible, we usually try to report sizes and levels in such a way that the terms are
interchangeable. In other words, when stating a level of the test, we usually state the size if
it is known, even though larger values would also be levels. Similarly, when asked to find a
test with a specified level α, we usually try to find a test with size α, even though tests with
smaller sizes would also have level α.
Example 8.2.2: Consider again the three hypothesis tests of Example 8.2.1. Since the
power function of each test is an increasing function of θ, we have

    sup_{0<θ≤1/2} Power(j)(θ) = Power(j)(1/2)

for each j ∈ {1, 2, 3}. Thus, we have the following:


- The size of Test 1 is sup_{θ∈Ω0} Power(1)(θ) = Power(1)(1/2) = 1/64 ≈ 0.016.
- The size of Test 2 is sup_{θ∈Ω0} Power(2)(θ) = Power(2)(1/2) = 7/64 ≈ 0.109.
- The size of Test 3 is sup_{θ∈Ω0} Power(3)(θ) = Power(3)(1/2) = 11/32 ≈ 0.344.

Note that in each case, the size of the test is also a level of the test. However, any number
greater than the size is also a level of the test.
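
Note: These sizes can be checked numerically with a few lines of Python (an illustrative
sketch assuming scipy is installed):

```python
from scipy.stats import binom

# Size = sup over theta in (0, 1/2] of Power(theta), attained at theta = 1/2.
for label, k in [("Test 1", 5), ("Test 2", 4), ("Test 3", 3)]:
    size = binom.sf(k, 6, 0.5)  # P_{1/2}(X > k)
    print(f"{label}: size = {size:.4f}")
# Prints 0.0156 (= 1/64), 0.1094 (= 7/64), 0.3438 (= 11/32).
```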

Thus, fixing a maximum probability of a type I error is equivalent to specifying a level.


In the next section, we will discuss how to actually construct hypothesis tests that have a
specified level α (either exactly or approximately).

8.3 Critical Values and Significance Levels

Suppose we have a test of the hypotheses H0: θ ∈ Ω0 and H1: θ ∈ Ω1 that rejects H0 if and
only if T(X) ≥ c for some test statistic T(X) and some critical value c. We often wish to
choose c so that the test will have a specified significance level α (such as α = 0.05). The
test has level α if

    Pθ[T(X) ≥ c] ≤ α   for all θ ∈ Ω0,

so our goal is to find c such that this is the case.


Distribution of the Test Statistic


To work with Pθ[T(X) ≥ c], we need to know the distribution of the test statistic for every
value of θ (or at least for every θ ∈ Ω0). For this reason, we often choose a test statistic T(X)
that has some standard distribution (e.g., standard normal, Student's t, or chi-squared)
when θ ∈ Ω0.
Example 8.3.1: Let X1, . . . , Xn be iid N(μ, σ²), where σ² > 0 is known, and suppose we want
to test H0: μ = 5 versus H1: μ ≠ 5. We might take our test statistic to be

    T(X) = |X̄n − 5| / √(σ²/n)

because this test statistic has the same distribution as the absolute value of a N(0, 1) random
variable if H0 is true, i.e., if μ = 5. An equivalent test could be obtained by taking the test
statistic to be any monotonically increasing function of the test statistic above, but such a
test statistic might have a more complicated distribution.
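
Note: For concreteness, here is a short Python sketch (not from the notes) computing
T(X) on simulated data; the values σ² = 4 and n = 25 are arbitrary illustrative choices.

```python
import numpy as np

# Compute T(X) = |mean(X) - 5| / sqrt(sigma2 / n) for simulated data.
rng = np.random.default_rng(0)
sigma2, n = 4.0, 25
X = rng.normal(loc=5.0, scale=np.sqrt(sigma2), size=n)  # data generated under H0
T = abs(X.mean() - 5.0) / np.sqrt(sigma2 / n)
print(f"T(X) = {T:.3f}")  # behaves like |N(0, 1)| when mu = 5
```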

Suppose for now that our null hypothesis is H0: θ = θ0 (i.e., suppose that Ω0 = {θ0}). Then
it suffices to consider Pθ0[T(X) ≥ c].
Achieving a Specified Level
Our test has level α if and only if Pθ0[T(X) ≥ c] ≤ α. For any α > 0, we can find a value of c
large enough to satisfy this inequality.
Achieving a Specified Size
Now suppose that we wish to construct a test with size α. Our test has size α if and only if
Pθ0[T(X) ≥ c] = α. It may or may not be possible to find such a test.
- If the distribution of T(X) is continuous, then we want to find a value c ∈ ℝ such that

      α = Pθ0[T(X) ≥ c] = Pθ0[T(X) > c] = 1 − F_{θ0}^{[T(X)]}(c),

  where F_{θ0}^{[T(X)]} denotes the cdf of T(X) for parameter value θ0. If 0 < α < 1, then there
  exists a point c ∈ ℝ that satisfies this equation since the cdf of T(X) is continuous.
  Thus, if the test statistic T(X) is a continuous random variable and 0 < α < 1, then
  there exists a choice of the critical value c that achieves size α.
- If instead the distribution of T(X) is discrete, then there may or may not exist a value
  of c for which Pθ0[T(X) ≥ c] = α. If no such c exists, then there does not exist a test
  with size α based on the test statistic T(X). In this case, we would typically try to
  find a test with size less than α (so that it still has level α) but as close to α as possible.
Example 8.3.2: In Example 8.3.1, we can obtain a test with size α by taking the critical
value c to be the number such that P(|Z| ≥ c) = α for a standard normal random variable Z.
(For α = 0.05, this is c ≈ 1.96. For α = 0.10, this is c ≈ 1.64.) Any larger value of c would also
yield a test with level α, but the size of such a test would be smaller than α.
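
Note: The critical values quoted above can be computed from the standard normal quantile
function, as in this illustrative sketch (assuming scipy):

```python
from scipy.stats import norm

# c solves P(|Z| >= c) = alpha, i.e., c is the (1 - alpha/2) quantile of N(0, 1).
for alpha in [0.05, 0.10]:
    c = norm.ppf(1 - alpha / 2)
    print(f"alpha = {alpha:.2f}: c = {c:.3f}")
# alpha = 0.05 -> c = 1.960; alpha = 0.10 -> c = 1.645
```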


Composite Null Hypothesis


If the null hypothesis is composite, then it may be less straightforward to achieve a test with
a specified level or size based on a particular test statistic T(X). However, sometimes we
find that

    sup_{θ∈Ω0} Pθ[T(X) ≥ c] = Pθ⋆[T(X) ≥ c]   for all c ∈ ℝ

for some θ⋆ ∈ Ω0. (Often θ⋆ is on the boundary of Ω0.) Then we can proceed as if the
set Ω0 were instead simply {θ⋆}, i.e., as if the null hypothesis were simply H0: θ = θ⋆.
Example 8.3.3: In Example 8.2.1 and Example 8.2.2,

    sup_{0<θ≤1/2} Pθ(X ≥ c) = P_{θ=1/2}(X ≥ c)

for all c ∈ ℝ (which was why the sizes of the tests in Example 8.2.2 could be computed by
evaluating the power function at θ = 1/2). Then since the distribution of X is discrete, a
test with size exactly α only exists for certain values of α. For example, there does not exist
a test of this form with size 0.05. If we were asked to find a test with level 0.05, we could
choose Test 1, which rejects H0 if and only if X = 6. This test has size 1/64 ≈ 0.016, so 0.05
is indeed a level of this test.
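
Note: The achievable sizes in this discrete setting can be listed directly (an illustrative
sketch assuming scipy), confirming that no test of this form has size exactly 0.05:

```python
from scipy.stats import binom

# Sizes achievable by tests of the form "reject iff X >= k" when X ~ Bin(6, 1/2).
for k in range(7, -1, -1):
    size = binom.sf(k - 1, 6, 0.5)  # P_{1/2}(X >= k)
    print(f"reject iff X >= {k}: size = {size:.4f}")
# No k gives size exactly 0.05; the largest size not exceeding 0.05 is 1/64.
```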

8.4 P-Values

The choice of the size or level of a test is typically subjective. This subjectivity can be
somewhat unsatisfying, since two different people can reach opposite conclusions from the
same data and the same test statistic simply because they chose to use different sizes or
levels (and hence different critical values).
Example 8.4.1: In Example 8.3.1 and Example 8.3.2, we considered a test that rejects H0
if and only if the test statistic exceeds the number c such that P(|Z| ≥ c) = α for a standard
normal random variable Z. Suppose one person uses α = 0.05 and c ≈ 1.96, while another
person uses α = 0.10 and c ≈ 1.64. Now suppose the observed test statistic value is 1.76.
Then the first person will fail to reject H0, while the second person will reject H0.

Thus, if we simply report whether or not we rejected H0 at a certain level α, then we
have somewhat oversimplified the conclusions that can be drawn from the data. A more
informative way to report the conclusions of a hypothesis test is by stating a quantity called
the p-value.
P-Value and Relationship to Test Statistic
Although there exist other ways to define p-values, we will use a definition that frames them
in the context of a test statistic. Consider a hypothesis test that rejects H0 if and only if
T(X) ≥ c for some test statistic T(X) and critical value c ∈ ℝ. Suppose we observe X = xobs.
Then the p-value of the test with test statistic T(X) for the observed data xobs is

    p(xobs) = sup_{θ∈Ω0} Pθ[T(X) ≥ T(xobs)].


For a simple null hypothesis H0: θ = θ0, the p-value reduces to

    p(xobs) = Pθ0[T(X) ≥ T(xobs)].
Thus, the p-value is the probability (under H0 ) of observing a test statistic value at least as
large (i.e., at least as far against H0 ) as the one that was actually observed. The following
theorem shows why the p-value is useful.
Theorem 8.4.2. Let Rc be a rejection region of the form Rc = {x : T(x) ≥ c}, where c is
the smallest number such that the test associated with Rc has level α. Then xobs ∈ Rc if and
only if p(xobs) ≤ α.
Proof. Suppose that xobs ∈ Rc. Then T(xobs) ≥ c, so

    p(xobs) = sup_{θ∈Ω0} Pθ[T(X) ≥ T(xobs)] ≤ sup_{θ∈Ω0} Pθ[T(X) ≥ c] ≤ α

since the test has level α. Now suppose instead that xobs ∉ Rc. Then T(xobs) < c, so

    p(xobs) = sup_{θ∈Ω0} Pθ[T(X) ≥ T(xobs)] > α

since otherwise c would not be the smallest number such that the test associated with Rc
has level α.
Thus, Theorem 8.4.2 tells us that an equivalent way to make the final decision in a hypothesis
test is to calculate the p-value p(xobs) for the observed data xobs and reject H0 at level α
if and only if p(xobs) ≤ α. For this reason, the p-value is sometimes called the observed
significance level.
Example 8.4.3: In Example 8.4.1, the observed test statistic value 1.76 has p-value

    p(1.76) = P(|Z| ≥ 1.76) ≈ 0.078,

where Z is a standard normal random variable.
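
Note: This p-value can be computed directly (an illustrative sketch assuming scipy):

```python
from scipy.stats import norm

# Two-sided p-value for the observed test statistic value 1.76: P(|Z| >= 1.76).
p = 2 * norm.sf(1.76)
print(f"p-value = {p:.3f}")  # ~0.078: reject at level 0.10 but not at level 0.05
```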

8.5 Philosophical Issues with Hypothesis Testing

Frequentist hypothesis testing has been an immensely popular tool of statistical inference
for decades. However, there do exist scenarios in which hypothesis tests show properties
that some people consider illogical and unacceptable. On the other hand, some people see
absolutely no problem with this type of behavior. We now provide a few examples merely
to illustrate some issues that can arise.
Example 8.5.1: Suppose we wish to test whether a particular coin is fair or weighted in
favor of heads. Then our hypotheses are H0: θ = 1/2 and H1: θ > 1/2, where θ denotes the
probability that the coin yields heads on any given flip. Now suppose we are told that the
following sequence of flips was observed (in order):
heads, heads, heads, heads, heads, tails.
There is some ambiguity here about how we should represent the data as a random variable.


- Perhaps the person flipping the coin decided to flip the coin repeatedly until obtaining
  tails. Let X be the number of times heads is observed for such an experiment before
  the first tails. Then X ∼ Geometric(1 − θ), and a sensible hypothesis test is to reject H0
  if and only if X ≥ c for some c. The observed value of X was X = 5, so the p-value is

      p(5) = P_{θ=1/2}(X ≥ 5) = 1/32 ≈ 0.031.

- Perhaps the person flipping the coin instead decided to flip the coin six times and record
  the results. Let X be the number of times heads is observed for such an experiment.
  Then X ∼ Bin(6, θ), and a sensible hypothesis test is to reject H0 if and only if X ≥ c
  for some c. The observed value of X was X = 5, so the p-value is

      p(5) = P_{θ=1/2}(X ≥ 5) = 7/64 ≈ 0.109.

Thus, the two different representations yield very different p-values and would therefore lead
to opposite conclusions at both α = 0.05 and α = 0.10. This is troubling since there is no clear
reason to prefer either representation over the other. Essentially, the result of our hypothesis
test depends on knowing what the experimenter would have done under circumstances that
are already known not to have occurred (e.g., whether the experimenter would have stopped
flipping had tails occurred earlier than the sixth flip).
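
Note: Both p-values can be computed in a few lines (an illustrative sketch assuming scipy),
making the dependence on the stopping rule explicit:

```python
from scipy.stats import binom

# Same observed flips (5 heads, then tails), two models for the stopping rule.
# Geometric model (flip until first tails): p-value = P_{1/2}(X >= 5) = (1/2)^5.
p_geometric = 0.5 ** 5
# Binomial model (six fixed flips): p-value = P_{1/2}(X >= 5) = P(X > 4).
p_binomial = binom.sf(4, 6, 0.5)

print(f"geometric p-value = {p_geometric:.3f}")  # 0.031
print(f"binomial  p-value = {p_binomial:.3f}")   # 0.109
```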

Example 8.5.2: A researcher visits a lab and is allowed to use Machine A to conduct some
measurements. These measurements are then used to perform a hypothesis test and reach
a conclusion. However, the researcher later learns that the lab actually had two similar
machines of this type (Machine A and Machine B), that another researcher also visited the
lab the same day, and that the two machines were assigned to the two researchers randomly.
Also, the machines are not identical: Machine A is a better piece of equipment and hence
provides more precise measurements than Machine B. Although these new facts do not change
the researcher's data or test statistic, they do change the distribution of that test statistic,
which must instead be calculated as if there were probability 1/2 of using Machine A and
probability 1/2 of using Machine B. Thus, the outcome of the hypothesis test can be altered
even after the data has been collected by the mere existence of Machine B and the fact that
it could have been used instead, even though it is already known that it was not used.

Example 8.5.3: Suppose a certain voltage μ is to be measured using a voltmeter for which
the readings are iid N(μ, σ²) random variables, where σ² > 0 is known. The sample mean is
computed, and a hypothesis test is performed. However, it is later learned that the voltmeter
had a maximum reading of 10 V, and any reading that otherwise would have been greater
than 10 V would have instead been given as 10 V. This fact changes the distribution of the
test statistic and could thus alter the outcome of the hypothesis test. Note that this change
occurs even if all of the readings are less than 10 V, i.e., even if it is already known that the
maximum did not actually matter.


Source of the Issues


While some people see no problems with the behavior described above, other people feel that
these examples contradict common sense. The issues arise because the various probabilistic
notions involved in hypothesis testing all involve summing or integrating over the entire
sample space, i.e., over all possible data values that could have been observed. Thus, the
results of the test can be affected by what would have happened for data values that did
not actually occur. Note that this issue applies to frequentist inference in general, not just
hypothesis testing. The same issues can also arise when calculating properties of estimators
such as bias.
Example 8.5.4: In Example 8.5.3, the existence of a maximum reading for the voltmeter
would also affect the bias of the sample mean. Note that the sample mean is still an
unbiased estimator of the true mean of each reading on the voltmeter. However, the true
mean of each reading on the voltmeter is now slightly less than the true voltage μ.

These examples also highlight the differences between frequentist and Bayesian inference.

- Frequentist inference conditions on parameter values and integrates/sums over all
  possible data values that could be observed.
- Bayesian inference conditions on the observed data values and integrates/sums over all
  possible values of the parameter.
Thus, the issues that arise in the examples in this section do not arise in Bayesian inference.
Since Bayesian methods are conditional on the data that is actually observed, they are
unaffected by what could have happened for data values that did not actually occur.
Note: Of course, there also exist scenarios where Bayesian methods exhibit behavior
that can be criticized on philosophical grounds. We will return to such scenarios later
in the course if time permits.
