Probability Theory
1 Random Variables
Econometrics essentially applies statistical methods to examine questions of interest
to economists, such as quantifying relationships between different economic variables,
testing competing hypotheses and computing forecasts. Models of relationships be-
tween economic variables (e.g. consumption and income) and models of how economic
variables change over time (e.g. the time path of GDP growth) all involve a degree
of uncertainty when applied to real world data, since we cannot capture all possible
variations in a simple model. As a result, econometric analysis treats economic data
as observations on random variables.
A random (or stochastic) variable is any variable whose value is a real number that
cannot be predicted exactly, and can be viewed as the outcome of a chance experiment.
A random variable is said to be discrete if it can only take a limited number of distinct
values (e.g. the total score when two dice are thrown). A random variable is said to
be continuous if it can assume any value over a continuum (e.g. the temperature in a
room). Most economic variables (e.g. GDP, exchange rates, etc.) are considered to be
continuous random variables, so our focus in this module is on this type of variable.
We now consider some basic probability theory associated with continuous random
variables.
2 Probability Density Functions

The distribution of a continuous random variable X is described by its probability density function (PDF) f(x), which assigns probabilities to intervals of values via:

P(a ≤ X ≤ b) = ∫_a^b f(x) dx

Thus the integral of the PDF over a certain range gives the probability that the random variable will fall in that interval. To be a valid PDF, f(x) must satisfy the following conditions:

(i) f(x) ≥ 0, −∞ ≤ x ≤ ∞

(ii) ∫_{−∞}^{∞} f(x) dx = 1
so that all probabilities are non-negative, and the continuous sum of probabilities for
all possible outcomes is one. Notice that:
P(X = a) = P(a ≤ X ≤ a) = ∫_a^a f(x) dx = 0
so that probabilities for a continuous random variable are only non-zero when measured
over an interval.
Example
Consider a spinner on a 0–100 dial with four equal sections (i.e. dividing lines at 0/100,
25, 50 and 75), and let X denote the value the spinner lands on. Since the spinner can
land on an infinite number of positions, the probability of it landing on any particular
position is zero, e.g. P (X = 32) = 0. However, the probability that the spinner
lands in a specified interval can be easily established, e.g. P (0 ≤ X ≤ 50) = 0.5,
P (75 ≤ X ≤ 100) = 0.25. The PDF of X here is given by:
f(x) = 1/100 for 0 ≤ x ≤ 100, and f(x) = 0 otherwise
and probabilities can be calculated using this formula, e.g.:
P(75 ≤ X ≤ 100) = ∫_{75}^{100} (1/100) dx
                = [x/100]_{75}^{100}
                = 1 − 0.75
                = 0.25
This example is a case of a uniform distribution, more generally written as X ∼ U(a, b), with PDF:

f(x) = 1/(b − a) for a ≤ x ≤ b, and f(x) = 0 otherwise
Here we have, for a ≤ v ≤ w ≤ b:

P(v ≤ X ≤ w) = ∫_v^w 1/(b − a) dx
             = [x/(b − a)]_v^w
             = (w − v)/(b − a)
Notice that P (v ≤ X ≤ w) depends only on the width of the interval w − v relative to
the total range b − a, but not on its position relative to a and b; hence the terminology
uniform distribution.
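This interval-probability formula is easy to check numerically. Below is a minimal Python sketch (the simulation itself is only an illustration, not part of the derivation) that draws a large number of spinner outcomes and compares the empirical frequency of landing in [75, 100] with the theoretical value 0.25:

```python
# Monte Carlo check that P(v <= X <= w) = (w - v)/(b - a) for X ~ U(a, b),
# using the spinner case a = 0, b = 100 from the example.
import numpy as np

rng = np.random.default_rng(0)
a, b = 0.0, 100.0
x = rng.uniform(a, b, size=1_000_000)      # simulated spinner outcomes

v, w = 75.0, 100.0
empirical = np.mean((x >= v) & (x <= w))   # proportion of draws in [v, w]
theoretical = (w - v) / (b - a)

print(f"empirical   P({v} <= X <= {w}) ~ {empirical:.4f}")
print(f"theoretical P({v} <= X <= {w}) = {theoretical:.4f}")
```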
3 Cumulative Distribution Function
The cumulative distribution function (CDF) for a random variable X gives the prob-
ability that X will take a value less than or equal to a specified number x. It is a
monotonically non-decreasing function of x, defined in terms of the PDF as:

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt
Example
In the above example of the spinner on a 0–100 dial, the PDF was given by:
f(x) = 1/100 for 0 ≤ x ≤ 100, and f(x) = 0 otherwise

and so the CDF is F(x) = 0 for x < 0, while for 0 ≤ x ≤ 100:

F(x) = ∫_0^x (1/100) dt = x/100

with F(x) = 1 for x > 100.
4 Multivariate Distributions
In the previous section we considered the distribution of a single continuous random
variable. In econometrics we are often concerned with the joint distribution of more
than one continuous random variable. We now extend our analysis to consider the
joint distribution of two random variables.
4.2 Marginal Probability Density Functions
Given a joint distribution for a pair of random variables X and Y , it is possible to
work out the univariate PDFs of the individual variables X and Y , regardless of the
values that the other variable might take. When the PDF of X or Y is obtained from
the joint distribution, we refer to it as the marginal PDF. The marginal PDFs for X
and Y are defined as:
f(x) = ∫_{−∞}^{∞} f(x, y) dy

f(y) = ∫_{−∞}^{∞} f(x, y) dx
Note that f (x) (or f (y)) is a function of x (or y) alone and both are legitimate univariate
PDFs in their own right.
The marginal PDF of X is used to assign probabilities to a range of values of X
irrespective of the range of values in which Y is located, i.e.
P(a ≤ X ≤ b) = P(a ≤ X ≤ b, −∞ ≤ Y ≤ ∞)
             = ∫_{−∞}^{∞} ∫_a^b f(x, y) dx dy
             = ∫_a^b [ ∫_{−∞}^{∞} f(x, y) dy ] dx
             = ∫_a^b f(x) dx

P(c ≤ Y ≤ d) = P(−∞ ≤ X ≤ ∞, c ≤ Y ≤ d)
             = ∫_c^d ∫_{−∞}^{∞} f(x, y) dx dy
             = ∫_c^d [ ∫_{−∞}^{∞} f(x, y) dx ] dy
             = ∫_c^d f(y) dy
Example
Consider again the bivariate uniform random variables X and Y :
f(x, y) = 1/((b − a)(d − c)) for a ≤ x ≤ b and c ≤ y ≤ d, and f(x, y) = 0 otherwise
The marginal PDF of X is obtained as
f(x) = ∫_{−∞}^{∞} f(x, y) dy
     = ∫_c^d 1/((b − a)(d − c)) dy
     = [y/((b − a)(d − c))]_c^d
     = (d − c)/((b − a)(d − c))
     = 1/(b − a)
so that the marginal distribution of X is X ∼ U (a, b). A similar analysis shows that
f(y) = ∫_{−∞}^{∞} f(x, y) dx
     = ∫_a^b 1/((b − a)(d − c)) dx
     = [x/((b − a)(d − c))]_a^b
     = (b − a)/((b − a)(d − c))
     = 1/(d − c)
so that the marginal distribution of Y is Y ∼ U (c, d).
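The marginal-PDF calculation can also be verified by numerical integration. The following Python sketch (with arbitrarily chosen values a = 0, b = 2, c = 0, d = 5) integrates the bivariate uniform joint PDF over y at a fixed x and recovers 1/(b − a):

```python
# Integrate the bivariate uniform joint PDF over y at a fixed x and
# check that the result equals the marginal value 1/(b - a).
from scipy.integrate import quad

a, b, c, d = 0.0, 2.0, 0.0, 5.0

def joint_pdf(x, y):
    # f(x, y) = 1/((b - a)(d - c)) on the rectangle, 0 otherwise
    inside = (a <= x <= b) and (c <= y <= d)
    return 1.0 / ((b - a) * (d - c)) if inside else 0.0

x0 = 1.3                                          # any point with a <= x0 <= b
marginal_x, _ = quad(lambda y: joint_pdf(x0, y), c, d)
print(marginal_x, 1.0 / (b - a))                  # both should be 0.5
```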
4.3 Conditional Probability Density Functions

The conditional PDF of X, given that Y takes the value y, is defined as:

f(x|y) = f(x, y)/f(y)

and similarly f(y|x) = f(x, y)/f(x). This is analogous to the conditional probability of event A occurring given that event B occurs, specified as P(A|B) = P(A and B)/P(B).
The conditional PDF of X is used to assign probabilities to a range of values of X
given that Y takes the value Y = y, i.e.:
P(a ≤ X ≤ b | Y = y) = ∫_a^b f(x|y) dx
                     = [ ∫_a^b f(x, y) dx ] / f(y)
Example
Consider again the bivariate uniform random variables X and Y :
f(x, y) = 1/((b − a)(d − c)) for a ≤ x ≤ b and c ≤ y ≤ d, and f(x, y) = 0 otherwise

The conditional PDF of X given Y = y is then:

f(x|y) = f(x, y)/f(y) = [1/((b − a)(d − c))] / [1/(d − c)] = 1/(b − a) = f(x)

and similarly f(y|x) = f(x, y)/f(x) = 1/(d − c) = f(y).
4.4 Statistical Independence
The notion of the independence of two events is that knowledge of one event occurring
has no effect on the probability of the second event occurring. In the continuous random
variable context, given two random variables X and Y with joint PDF f (x, y), X and
Y are said to be statistically independent if and only if:
f (x, y) = f (x)f (y)
i.e. the joint PDF can be expressed as the product of the marginal PDFs. Under
independence, it also follows that:
f(x|y) = f(x, y)/f(y)
       = f(x)f(y)/f(y)
       = f(x)
and similarly that:
f (y|x) = f (y)
so that marginal and conditional PDFs are identical.
Example
In the above bivariate uniform example, we found that f (x|y) = f (x) and f (y|x) =
f (y), showing that X and Y are independent. This can also be confirmed by demon-
strating that the joint PDF is the product of the marginal PDFs:
f(x)f(y) = [1/(b − a)] × [1/(d − c)]
         = 1/((b − a)(d − c))
         = f(x, y)
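As a further illustration, the sketch below (again with arbitrarily chosen ranges) draws X and Y independently from their uniform marginals and checks that the proportion of draws falling in a rectangle factorizes into the product of the two marginal probabilities, as independence requires:

```python
# Simulation check of independence: joint probability of a rectangle
# equals the product of the marginal probabilities.
import numpy as np

rng = np.random.default_rng(1)
a, b, c, d = 0.0, 2.0, 0.0, 5.0
n = 1_000_000
x = rng.uniform(a, b, n)
y = rng.uniform(c, d, n)

in_x = (x >= 0.5) & (x <= 1.5)      # event 0.5 <= X <= 1.5, probability 0.5
in_y = (y >= 1.0) & (y <= 4.0)      # event 1.0 <= Y <= 4.0, probability 0.6

joint = np.mean(in_x & in_y)
product = np.mean(in_x) * np.mean(in_y)
print(joint, product)               # both close to 0.5 * 0.6 = 0.3
```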
5.1 Expected Value
Consider first an example of a discrete random variable: the score from throwing a die.
The probability of scoring 1, 2, 3, 4, 5 or 6 is in each case 1/6, and so the expected score
on average would clearly be:

(1 × 1/6) + (2 × 1/6) + (3 × 1/6) + (4 × 1/6) + (5 × 1/6) + (6 × 1/6) = 3.5
The notion of expected value follows from this idea of the mean outcome of a random
variable, and can be defined formally for a discrete random variable X as:
E(X) = Σ_i x_i P(X = x_i)

where the sum is taken over all possible values x_i of X.
Extending this concept to the case of a continuous random variable X with PDF f (x),
the expected value is defined as:
E(X) = ∫_{−∞}^{∞} x f(x) dx
When considering more than one random variable, e.g. two random variables X
and Y with joint PDF f (x, y), we define:
E(X) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f(x, y) dx dy

E(Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} y f(x, y) dx dy
These expected values could alternatively be calculated by first working out the marginal
PDFs of X and Y , and then using the expected value formula for a univariate random
variable. To show that these approaches give the same answer, consider for example
E(X):
E(X) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f(x, y) dx dy
     = ∫_{−∞}^{∞} x [ ∫_{−∞}^{∞} f(x, y) dy ] dx
     = ∫_{−∞}^{∞} x f(x) dx
Example
In the case where X ∼ U (a, b), we have:
f(x) = 1/(b − a) for a ≤ x ≤ b, and f(x) = 0 otherwise
and so:
E(X) = ∫_{−∞}^{∞} x f(x) dx
     = ∫_a^b x/(b − a) dx
     = [x²/(2(b − a))]_a^b
     = (b² − a²)/(2(b − a))
     = (b − a)(b + a)/(2(b − a))
     = (a + b)/2
Hence, for a uniformly distributed random variable, E(X) is simply the midpoint
between a and b.
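This midpoint result is easy to confirm numerically. The sketch below (with the arbitrary choice a = 2, b = 10) computes E(X) both by numerical integration of x f(x) and by averaging simulated draws:

```python
# Check E(X) = (a + b)/2 for X ~ U(a, b) by integration and by simulation.
import numpy as np
from scipy.integrate import quad

a, b = 2.0, 10.0
mean_by_integration, _ = quad(lambda x: x / (b - a), a, b)

rng = np.random.default_rng(2)
mean_by_simulation = rng.uniform(a, b, 1_000_000).mean()

print(mean_by_integration, mean_by_simulation, (a + b) / 2)   # all close to 6.0
```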
2. If a is a constant then:
E(a) = a
5. The expected value of a sum of random variables is the sum of their expected values,
i.e.:
E(X + Y ) = E(X) + E(Y )
This follows since:
E(X + Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x + y) f(x, y) dx dy
         = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f(x, y) dx dy + ∫_{−∞}^{∞} ∫_{−∞}^{∞} y f(x, y) dx dy
         = E(X) + E(Y)
6. If X and Y are independent random variables, the expected value of their product
is the product of their expected values, i.e.:
E(XY ) = E(X)E(Y )
5.2 Variance
The variance is a measure of the dispersion of a random variable about its expected
value, and for a continuous random variable X with PDF f (x), it is defined as:
V(X) = ∫_{−∞}^{∞} [x − E(X)]² f(x) dx
i.e. the average squared deviation of X from its mean. An alternative formula can be
derived as follows:
V(X) = ∫_{−∞}^{∞} [x − E(X)]² f(x) dx
     = ∫_{−∞}^{∞} x² f(x) dx + E(X)² ∫_{−∞}^{∞} f(x) dx − 2E(X) ∫_{−∞}^{∞} x f(x) dx
     = E(X²) + E(X)² − 2E(X)·E(X)
     = E(X²) − E(X)²
Since the variance measures the average squared deviation of X from its mean, the units
of measurement are the squares of those of X. An alternative measure which converts
the variance into the same units of measurement as X is the standard deviation, simply
defined as the square root of the variance, i.e.:
s.d.(X) = √V(X)
When considering more than one random variable, e.g. two random variables X
and Y with joint PDF f (x, y), we define:
V(X) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} [x − E(X)]² f(x, y) dx dy

V(Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} [y − E(Y)]² f(x, y) dx dy
These variances could alternatively be calculated by first working out the marginal
PDFs of X and Y , and then using the variance formula for a univariate random variable.
To show that these approaches give the same answer, consider for example V (X):
V(X) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} [x − E(X)]² f(x, y) dx dy
     = ∫_{−∞}^{∞} [x − E(X)]² [ ∫_{−∞}^{∞} f(x, y) dy ] dx
     = ∫_{−∞}^{∞} [x − E(X)]² f(x) dx
Example
In the case where X ∼ U (a, b), we have:
f(x) = 1/(b − a) for a ≤ x ≤ b, and f(x) = 0 otherwise
and so:
E(X²) = ∫_{−∞}^{∞} x² f(x) dx
      = ∫_a^b x²/(b − a) dx
      = [x³/(3(b − a))]_a^b
      = (b³ − a³)/(3(b − a))

and hence:

V(X) = E(X²) − E(X)² = (b³ − a³)/(3(b − a)) − [(a + b)/2]² = (b − a)²/12
1. If a is a constant then:

V(a) = 0

2. If a and b are constants then:

V(aX + b) = a² V(X)
5.3 Covariance
When we have a joint distribution involving more than one variable, it is also useful
to have a summary measure of the association between the variables. A well used
measure of linear association between two random variables is the covariance. Given
two random variables X and Y with joint PDF f (x, y), the covariance between X and
Y is defined as:

C(X, Y) = E{[X − E(X)][Y − E(Y)]} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} [x − E(X)][y − E(Y)] f(x, y) dx dy

which can equivalently be written as C(X, Y) = E(XY) − E(X)E(Y).
The sign of the covariance indicates whether the random variables are positively related
(i.e. when X increases, Y typically increases as well) or negatively related (i.e. when
X increases, Y typically decreases), or if the covariance is zero there is no linear
relationship between X and Y .
A standardized measure of covariance, which provides a unit free measure of the
strength of linear association between X and Y (as well as the sign of the relationship),
is the correlation. The correlation between X and Y is defined by:
ρ_XY = C(X, Y) / [√V(X) √V(Y)]
If ρXY > 0 then X and Y are said to be positively correlated, and if ρXY < 0 then X
and Y are said to be negatively correlated.
It is possible to show that ρXY always lies in the following range:
−1 ≤ ρXY ≤ 1
If ρXY = 0 then X and Y are said to be uncorrelated, while if ρXY = ±1 then there
exists between X and Y an exact linear relationship of the form Y = aX + b. In between
these extremes, correlation measures the degree of linear relationship between X and
Y , and does not depend on the units of measurement involved with X and Y .
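The unit-free nature of correlation is easy to see numerically. In the sketch below (an illustrative data-generating process, not taken from the notes), Y is a linear function of X plus noise; rescaling X changes its variance and its covariance with Y, but leaves the correlation unchanged:

```python
# Correlation is unit free and lies in [-1, 1]; here Y = 2X + noise.
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, 100_000)
y = 2.0 * x + rng.normal(0.0, 1.0, 100_000)

rho = np.corrcoef(x, y)[0, 1]
rho_rescaled = np.corrcoef(100 * x, y)[0, 1]   # rescaling X leaves the correlation unchanged
print(rho, rho_rescaled)                       # both close to 2/sqrt(5) ~ 0.894
```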
Properties of Covariance

For any two random variables X and Y:

V(X + Y) = V(X) + V(Y) + 2C(X, Y)

V(X − Y) = V(X) + V(Y) − 2C(X, Y)

These results follow since:

V(X + Y) = E[(X + Y)²] − E(X + Y)²
         = E(X²) + E(Y²) + 2E(XY) − E(X)² − E(Y)² − 2E(X)E(Y)
         = V(X) + V(Y) + 2C(X, Y)

V(X − Y) = E[(X − Y)²] − E(X − Y)²
         = E(X²) + E(Y²) − 2E(XY) − E(X)² − E(Y)² + 2E(X)E(Y)
         = V(X) + V(Y) − 2C(X, Y)

If X and Y are independent then E(XY) = E(X)E(Y), and hence C(X, Y) = E(XY) − E(X)E(Y) = 0, so that:

ρ_XY = 0

V(X + Y) = V(X) + V(Y)

V(X − Y) = V(X) + V(Y)
6.1 The Normal Distribution
A random variable X is said to have a normal distribution if its PDF has the form:
f(x) = [1/√(2πσ²)] exp[−(x − µ)²/(2σ²)],  −∞ ≤ x ≤ ∞
The normal distribution is fully characterized by the two parameters µ and σ 2 . It can
be shown that:
E(X) = ∫_{−∞}^{∞} x f(x) dx = µ

V(X) = ∫_{−∞}^{∞} (x − µ)² f(x) dx = σ²
We write X ∼ N (µ, σ 2 ).
1. The normal distribution is bell-shaped and symmetric about its mean µ. Below is
the PDF corresponding to X ∼ N (0, 1), called the standard normal distribution:
3. A linear function of a normal random variable is also normally distributed: if X ∼ N(µ, σ²) and Y = aX + b, then Y ∼ N(aµ + b, a²σ²).

5. If X ∼ N(µ, σ²) then:

X − µ ∼ N(0, σ²)

and:

(X − µ)/σ ∼ N(0, 1)
Thus any probability statement concerning X ∼ N (µ, σ 2 ) can be transformed into
an equivalent one concerning (X − µ)/σ ∼ N (0, 1). For this reason, textbooks only
report probability values for the N (0, 1) distribution.
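The standardization argument can be illustrated as follows (using the arbitrary values µ = 3, σ² = 4): the probability P(X ≤ 5) computed directly from the N(µ, σ²) CDF equals the N(0, 1) CDF evaluated at the standardized value (5 − µ)/σ:

```python
# Any probability for X ~ N(mu, sigma^2) can be obtained from the N(0, 1) CDF
# of the standardized value (x - mu)/sigma.
from scipy.stats import norm

mu, sigma = 3.0, 2.0
x = 5.0

direct = norm.cdf(x, loc=mu, scale=sigma)      # P(X <= 5) computed directly
standardized = norm.cdf((x - mu) / sigma)      # same probability via Z = (X - mu)/sigma
print(direct, standardized)                    # both ~ 0.8413
```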
1. The chi-square distribution is asymmetric, and is right skewed, with the skewness
decreasing with the degrees of freedom n. As n → ∞, the chi-square distribution
approaches the (symmetric) normal distribution, as seen below:
2. Let Z ∼ χ²_n. We can show that the mean and variance of Z are given by:

E(Z) = n

V(Z) = 2n
1. The t distribution is symmetric, but is less peaked and has thicker tails than the
normal distribution. As n → ∞, the t distribution approaches the standard normal
distribution, as seen below:
2. Let Y ∼ t_n. We can show that the mean and variance of Y are given by:

E(Y) = 0

V(Y) = n/(n − 2),  n > 2
3. Textbooks give probability values for different values of n, but if n is large standard
normal probabilities provide a good approximation.
1. Like the chi-square distribution, the F distribution is asymmetric and right skewed.
As n1 , n2 → ∞, the F distribution approaches the (symmetric) normal.
2. Let W ∼ F_{n1,n2}. We can show that the mean and variance of W are given by:

E(W) = n2/(n2 − 2),  n2 > 2

V(W) = 2n2²(n1 + n2 − 2) / [n1(n2 − 2)²(n2 − 4)],  n2 > 4

3. If Y ∼ t_n, then Y² ∼ F_{1,n}.
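Property 3 can be checked numerically by comparing quantiles: since P(Y² ≤ c) = 2P(Y ≤ √c) − 1, the 95% quantile of F_{1,n} should equal the square of the 97.5% quantile of t_n. A short sketch (with the arbitrary choice n = 20):

```python
# Quantile check of the relation: Y ~ t_n  implies  Y^2 ~ F_{1,n}.
from scipy.stats import t, f

n = 20
f_quantile = f.ppf(0.95, dfn=1, dfd=n)
t_quantile_sq = t.ppf(0.975, df=n) ** 2
print(f_quantile, t_quantile_sq)    # both ~ 4.35
```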
7 Samples and Estimators
When the PDF of a random variable is known, we can work out its probabilistic prop-
erties, and establish the mean, variance and other characteristics we may be interested
in. However, in practice, we typically do not know the precise probability distributions
governing the continuous random variables that we deal with in empirical economic
analysis. In this situation we refer to the true unknown distribution as the population
distribution, and we try to investigate aspects of this population distribution using
a sample of observed data. The sample data can then be used to estimate quantities
associated with the population distribution, for example we may wish to estimate the
population mean.
Cross-sectional data – a sample of n observations drawn at the same point in time (e.g.
a sample of annual income for n different households in a given year).
7.2 Estimators
A function of the sample observations is known as a sample statistic, and since each
observation in a random sample is itself a random variable, then any sample statistic
will also be a random variable. Sample statistics are often constructed in a way to
provide information about the unknown parameters of the population distribution,
and when this is done the sample statistic is called an estimator. For example, the
population mean is usually unknown, and we may wish to estimate it using sample
data; in such a case, we would construct an estimator of the mean using the data.
In general, let the random variable X have the PDF f (x; θ), where the notation
f (x; θ) is used to denote the fact that the PDF depends on some parameters collectively
referred to as θ. For example, if X ∼ N (µ, σ 2 ) then θ = (µ, σ 2 ). Now suppose we have
a random sample x1 , x2 , ..., xn from f (x; θ) but θ is unknown. Then we let our estimator
of θ be denoted by:
θ̂ = g(x1 , x2 , ..., xn )
where g(.) is some suitably chosen function of the sample observations x1 , x2 , ..., xn . For
example, if X ∼ N (µ, σ 2 ) and we wanted to estimate the mean µ, a sensible estimator
would be:

x̄ = (1/n) Σ_{i=1}^{n} x_i
i.e. the average of the data, known as the sample mean. Another common estimator
is the sample variance, defined as:
s² = [1/(n − 1)] Σ_{i=1}^{n} (x_i − x̄)²
which can be used to estimate the population variance σ 2 . Note that while population
parameters like E(X) and V (X) are fixed values related to a given PDF, estimators
of these values are random variables – if different samples are drawn from the same
population, their values will change. Since an estimator is a random variable, it will
have its own probability distribution, called a sampling distribution.
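As a concrete illustration (with an arbitrarily chosen population N(3, 4) and sample size 50), the sample mean and sample variance are computed below; note the n − 1 divisor, which corresponds to ddof=1 in NumPy:

```python
# Computing the sample mean and sample variance from a random sample.
import numpy as np

rng = np.random.default_rng(5)
sample = rng.normal(loc=3.0, scale=2.0, size=50)   # a random sample from N(3, 4)

x_bar = sample.mean()                 # estimator of the population mean mu
s2 = sample.var(ddof=1)               # estimator of the population variance sigma^2
print(x_bar, s2)                      # estimates near 3 and 4, varying from sample to sample
```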
Example
Suppose X ∼ N (µ, σ 2 ) and we have a random (iid) sample x1 , x2 , ..., xn from this
distribution. The sampling distribution of the sample mean x̄ can be found as follows.
Since x̄ is a linear combination of independent, normally distributed random variables,
it is itself normally distributed. The mean can be derived as follows:
E(x̄) = E[(1/n) Σ_{i=1}^{n} x_i]
      = (1/n) E[Σ_{i=1}^{n} x_i]
      = (1/n) Σ_{i=1}^{n} E(x_i)
      = (1/n) Σ_{i=1}^{n} µ
      = µ

A similar calculation, using the independence of the observations, shows that V(x̄) = σ²/n.
Hence, the sampling distribution of x̄ is given by:
x̄ ∼ N(µ, σ²/n)
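This sampling distribution can be visualized by simulation. The sketch below (with the arbitrary choices µ = 3, σ = 2, n = 25) draws many samples, computes x̄ for each, and checks that the mean and variance of the resulting sample means are close to µ and σ²/n:

```python
# Simulate the sampling distribution of the sample mean.
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, n = 3.0, 2.0, 25
x_bars = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)

print(x_bars.mean(), mu)                 # mean of x_bar is close to mu
print(x_bars.var(), sigma**2 / n)        # variance of x_bar is close to sigma^2 / n = 0.16
```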
We now consider two desirable properties that we would wish our estimators to
possess.
Unbiasedness

An estimator θ̂ of θ is said to be unbiased if:

E(θ̂) = θ
i.e. the mean of its sampling distribution equals the parameter of interest θ. Conversely,
if E(θ̂) ≠ θ, we say that θ̂ is a biased estimator. Other things equal, an unbiased
estimator is preferred to a biased one. Examples of unbiased estimators are x̄ and s2
since, as shown above, E(x̄) = µ, and we can also show that E(s2 ) = σ 2 .
Efficiency
An efficient estimator is one that achieves the smallest possible variance among a given
class of estimators, i.e. it delivers as precise an estimator as possible. More formally, let
θ̂1 , θ̂2 , ..., θ̂k be k different unbiased estimators of θ. Then θ̂j is said to be the minimum
variance unbiased estimator, i.e. efficient among unbiased estimators, if:

V(θ̂_j) ≤ V(θ̂_i) for all i = 1, 2, ..., k
8 Hypothesis Testing
Suppose we observe a random sample x1 , x2 , ..., xn on a random variable X with PDF
f (x; θ) and we obtain an estimate of θ, denoted by θ̂. The problem of hypothesis testing
involves deciding whether the value of θ̂ is compatible with some hypothesized value
of θ, say θ∗ . In other words does the value of θ̂ lend support to the assertion that the
observed sample could have originated from the PDF f (x; θ∗ )? To be able to perform a
test, we need to assume a functional form for f (.), e.g. the normal distribution, but of
course we will not typically know the parameters of f (.) as these are the objects of the
hypothesis testing exercise, e.g. testing whether the mean of the normal distribution
is equal to some particular value.
To set up a hypothesis test, we formally state the question as:
H0 : θ = θ∗
H1 : θ ≠ θ∗
where H0 is called the null hypothesis and H1 is called the alternative hypothesis.
The idea behind hypothesis testing is as follows. If θ̂ is a decent estimator of θ then
it will typically take a value that is close to θ, i.e. the distance θ̂ − θ will be “small”.
So, if H0 is true, θ̂ −θ∗ = (θ̂ −θ)+(θ −θ∗ ) should be “small” since (θ̂ −θ) is “small” and
(θ − θ∗ ) = 0 if H0 is true. On the other hand, if H1 is true, θ̂ − θ∗ = (θ̂ − θ) + (θ − θ∗ )
should be “large” since (θ̂ − θ) is “small” but (θ − θ∗) ≠ 0 is “large”. We need to know
the distribution of θ̂, and particularly its mean and variance, to be able to calibrate
what we mean by “small” and “large”.
A sample statistic based on θ̂ − θ∗ which is used to discriminate between H0 and
H1 is known as a test statistic, which we shall denote by t∗ . The idea is that t∗ has a
different sampling distribution under H0 and H1 , this being “larger” under H1 than H0 ,
thereby making discrimination between the two hypotheses possible. We then compare
the value of t∗ with a specified cut-off value, known as the critical value, and if t∗ is
greater than the critical value we reject H0 in favour of H1 .
Although we associate large values of t∗ as being consistent with H1 , there is always
the possibility that t∗ happens to be greater than the critical value even when H0 is
true. In such cases we would reject H0 even though it is true – this is known as making
a Type I error. However, once we have worked out the distribution of t∗ under H0 , we
can quantify what risk there is of a Type I error for a given critical value. Put another
way, we can decide what risk of Type I error we are happy with, and set the critical
value accordingly. Usually, we set this risk level, known as the significance level or size,
to be 0.05, and we then determine the critical value (c.v.) so as to make the following
probability statement true under H0 :
Example
Suppose we have a random (iid) sample x1 , x2 , ..., xn from the distribution X ∼ N (µ, σ 2 ),
and suppose that we assume X ∼ N but we do not know the values of µ or σ 2 . We
have already considered unbiased estimators of these parameters – the sample mean
and variance:
x̄ = (1/n) Σ_{i=1}^{n} x_i

s² = [1/(n − 1)] Σ_{i=1}^{n} (x_i − x̄)²
Suppose now that we wish to test a hypothesis about the population mean, specifically:
H0 : µ = µ∗
H1 : µ ≠ µ∗
Given that we are conducting a test about the population mean, our test statistic will
naturally be based on the sample mean x̄. We have already established the sampling
distribution of x̄:
x̄ ∼ N(µ, σ²/n)

which we can standardize to give:

x̄ − µ ∼ N(0, σ²/n)

(x̄ − µ)/√(σ²/n) ∼ N(0, 1)

√n (x̄ − µ)/σ ∼ N(0, 1)

It can also be shown that:

(n − 1)s²/σ² ∼ χ²_{n−1}

and we can write:

√n (x̄ − µ)/s = √n (x̄ − µ)/σ × (σ/s)
             = [√n (x̄ − µ)/σ] / √{[(n − 1)s²/σ²]/(n − 1)}
             ∼ N(0, 1) / √[χ²_{n−1}/(n − 1)]

Then, since √n (x̄ − µ)/σ and (n − 1)s²/σ² are independent:

√n (x̄ − µ)/s ∼ t_{n−1}
Now if H0 is true it follows that:
√n (x̄ − µ∗)/s ∼ t_{n−1}
We can then use this result to define a test statistic for distinguishing between H0 and
H1 :
t∗ = √n (x̄ − µ∗)/s
The appropriate critical values can be obtained from the tn−1 distribution using a
chosen significance level. The test can distinguish between H0 and H1 because if H1 is
true, we can write:
t∗ = √n (x̄ − µ∗)/s
   = √n [(x̄ − µ) + (µ − µ∗)]/s
   = √n (x̄ − µ)/s + √n (µ − µ∗)/s
   ∼ t_{n−1} + √n (µ − µ∗)/s
and so, given that µ ≠ µ∗ under H1, t∗ should be “larger” (in absolute value terms)
than it would be if H0 was true. As n → ∞ we find that |t∗ | → ∞ under H1 and so
the probability of rejecting H0 approaches 1, i.e. the test is consistent.
Using a numerical example, suppose the unknown population parameters are µ = 3
and σ 2 = 2, and suppose we obtain estimates from a sample of n = 64 observations,
giving x̄ = 2.9 and s2 = 4. If we consider testing H0 : µ = 3 at the 0.05 significance
level, the hypothesis is true, so the distribution of the test statistic t∗ is t_{n−1}. Given
that we use critical values from a t_{n−1} distribution, there is a 0.05 chance that we
incorrectly reject the null. Suppose instead we consider testing H0 : µ = 1 at the 0.05
significance level. In this case the null hypothesis is false (as the true population value
is µ = 3), and the distribution of the test statistic t∗ is:

t_{n−1} + √n (µ − µ∗)/s = t_63 + √64 × (3 − 1)/√4 = t_63 + 8
The test statistic is then very likely to reject the null hypothesis as it comes from a
distribution that is heavily right-shifted compared to the distribution from which the
critical values are obtained.
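The whole numerical example can be reproduced in a few lines. The sketch below plugs the reported sample quantities into the test statistic and compares it with the two-sided 5% critical value from the t_63 distribution for each null hypothesis:

```python
# Numerical example: n = 64, x_bar = 2.9, s^2 = 4, testing H0: mu = 3 and H0: mu = 1
# at the 0.05 level with critical values from the t(63) distribution.
import numpy as np
from scipy.stats import t

n, x_bar, s2 = 64, 2.9, 4.0
s = np.sqrt(s2)
cv = t.ppf(0.975, df=n - 1)                      # two-sided 5% critical value, ~ 2.00

for mu_star in (3.0, 1.0):
    t_stat = np.sqrt(n) * (x_bar - mu_star) / s
    reject = abs(t_stat) > cv
    print(f"H0: mu = {mu_star}: t* = {t_stat:.2f}, reject H0: {reject}")
```

With these numbers the test does not reject H0 : µ = 3 (t∗ = −0.4) but strongly rejects H0 : µ = 1 (t∗ = 7.6), in line with the discussion above.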