MTH451 Study Notes
1 Postulates of Probability
• Postulate 1: The probability of an event is a non-negative real number;
that is, P (A) ≥ 0 for any subset A of S.
• Postulate 2: P (S) = 1.
• Postulate 3: If A1, A2, A3, ... is a finite or infinite sequence of mutually
exclusive events of S, then
\[ P(A_1 \cup A_2 \cup A_3 \cup \cdots) = P(A_1) + P(A_2) + P(A_3) + \cdots \]
2 Definitions
• Definition 2.1: If A and B are any two events in a sample space S and
P(A) ≠ 0, the conditional probability of B given A is
\[ P(B|A) = \frac{P(A \cap B)}{P(A)} \]
• Definition 2.2: Two events A and B are independent if and only if
\[ P(A \cap B) = P(A) \cdot P(B) \]
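– Example: For one roll of a fair die, let A = {2, 4, 6} and B = {6}. Then
\[ P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{1/6}{1/2} = \frac{1}{3} \]
while P(B) = 1/6, so A and B are not independent.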
• Definition 3.4: A function with values f(x), defined over the set of
all real numbers, is called a probability density function of the
continuous random variable X if and only if
\[ P(a \le X \le b) = \int_a^b f(x)\,dx \]
for any real constants a and b with a ≤ b.
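– Example: If f(x) = 3x² for 0 < x < 1 and f(x) = 0 elsewhere, then
\[ P\left(0 \le X \le \tfrac{1}{2}\right) = \int_0^{1/2} 3x^2\,dx = \left(\tfrac{1}{2}\right)^3 = \tfrac{1}{8} \]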
• Definition 3.5: If X is a continuous random variable and the value of its
probability density at t is f(t), then the function given by
\[ F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt, \qquad -\infty < x < \infty \]
is called the distribution function, or the cumulative distribution, of X.
• Definition 3.7: If X and Y are discrete random variables, the function given by
\[ F(x, y) = P(X \le x, Y \le y) = \sum_{s \le x} \sum_{t \le y} f(s, t) \]
for −∞ < x < ∞ and −∞ < y < ∞, where f(s, t) is the value of
the joint probability distribution of X and Y at (s, t), is called the joint
cumulative distribution of X and Y.
• Definition 3.8: A bivariate function with values f(x, y), defined over
the xy-plane, is called a joint probability density function of the
continuous random variables X and Y if and only if
\[ P[(X, Y) \in A] = \iint_A f(x, y)\,dx\,dy \]
for any region A in the xy-plane.
• Definition 3.10: If X and Y are discrete random variables and f(x, y)
is the value of their joint probability distribution at (x, y), the function
given by
\[ g(x) = \sum_{y} f(x, y) \]
for each x within the range of X is called the marginal distribution of X.
• Definition 3.13: If f(x, y) is the value of the joint density of the continuous
random variables X and Y at (x, y) and h(y) is the value of the marginal
density of Y at y, the function given by
\[ f(x|y) = \frac{f(x, y)}{h(y)}, \qquad h(y) \neq 0 \]
for x ∈ (−∞, +∞), is called the conditional density of X given Y = y.
Correspondingly, if g(x) is the value of the marginal density of X at x, the
function given by
\[ w(y|x) = \frac{f(x, y)}{g(x)}, \qquad g(x) \neq 0 \]
for y ∈ (−∞, +∞), is called the conditional density of Y given X = x.
• Definition 3.14: If f(x1, x2, ..., xn) is the value of the joint probability
distribution of the n discrete random variables X1, X2, ..., Xn at
(x1, x2, ..., xn), and fi(xi) is the value of the marginal distribution of Xi
at xi for i = 1, 2, ..., n, then the n random variables are independent if
and only if
\[ f(x_1, x_2, \ldots, x_n) = f_1(x_1) \cdot f_2(x_2) \cdot \ldots \cdot f_n(x_n) \]
for all (x1, x2, ..., xn) within their range.
• Definition 4.2: The rth moment about the origin of a random vari-
able X, denoted by μ′r, is the expected value of X^r; symbolically,
\[ \mu'_r = E(X^r) = \sum_{x} x^r \cdot f(x) \]
for r = 0, 1, 2, ... when X is discrete, and
\[ \mu'_r = E(X^r) = \int_{-\infty}^{+\infty} x^r \cdot f(x)\,dx \]
when X is continuous.
• Definition 4.3: μ′1 is called the mean of the distribution of X, or simply
the mean of X, and it is denoted by μ.
• Definition 4.4: The rth moment about the mean of a random variable
X, denoted by μr, is the expected value of (X − μ)^r; symbolically,
\[ \mu_r = E[(X - \mu)^r] = \sum_{x} (x - \mu)^r \cdot f(x) \]
for r = 0, 1, 2, ... when X is discrete, and
\[ \mu_r = E[(X - \mu)^r] = \int_{-\infty}^{+\infty} (x - \mu)^r \cdot f(x)\,dx \]
when X is continuous.
• Definition 4.5: μ2 is called the variance of the distribution of X, or simply
the variance of X, and it is denoted by σ², var(X), or V(X); σ, the
positive square root of the variance, is called the standard deviation.
• Definition 4.6: The moment-generating function of a random vari-
able X, where it exists, is given by
\[ M_X(t) = E(e^{tX}) = \sum_{x} e^{tx} \cdot f(x) \]
when X is discrete, and
\[ M_X(t) = E(e^{tX}) = \int_{-\infty}^{+\infty} e^{tx} \cdot f(x)\,dx \]
when X is continuous.
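– Example: For a Bernoulli random variable (Definition 5.2), f(1) = θ and f(0) = 1 − θ, so
\[ M_X(t) = (1 - \theta) + \theta e^t = 1 + \theta(e^t - 1) \]
which is the binomial moment-generating function with n = 1.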
• Definition 4.7: The rth and sth product moment about the origin
of the random variables X and Y, denoted by μ′r,s, is the expected value of
X^r Y^s; symbolically,
\[ \mu'_{r,s} = E(X^r Y^s) = \sum_{x} \sum_{y} x^r y^s \cdot f(x, y) \]
when X and Y are discrete.
• Definition 4.9: μ1,1 is called the covariance of X and Y, and it is
denoted by σXY, cov(X, Y), or C(X, Y).
• Definition 4.10: If X is a discrete random variable and f (x|y) is the
value of the conditional probability distribution of X given Y = y at x,
the conditional expectation of u(X) given Y = y is
\[ E[u(X)|y] = \sum_{x} u(x) \cdot f(x|y) \]
• Definition 5.2: A random variable X has a Bernoulli distribution
and it is referred to as a Bernoulli random variable if and only if its
probability distribution is given by
\[ f(x; \theta) = \theta^x (1 - \theta)^{1-x}, \qquad x = 0, 1 \]
• Definition 5.4: A random variable X has a negative binomial distribution
and it is referred to as a negative binomial random variable if and only if
its probability distribution is given by
\[ b^*(x; k, \theta) = \binom{x-1}{k-1} \theta^k (1 - \theta)^{x-k} \]
for x = k, k + 1, k + 2, ....
• Definition 5.5: A random variable X has a geometric distribution
and it is referred to as a geometric random variable if and only if its
probability distribution is given by
\[ g(x; \theta) = \theta(1 - \theta)^{x-1} \]
for x = 1, 2, 3, ....
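– Example: With θ = 1/6 (rolling a die until the first 6), the probability that the first 6 appears on the third roll is
\[ g\left(3; \tfrac{1}{6}\right) = \tfrac{1}{6}\left(\tfrac{5}{6}\right)^2 = \tfrac{25}{216} \]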
• Definition 5.6: A random variable X has a hypergeometric distribu-
tion and it is referred to as a hypergeometric random variable if and only
if its probability distribution is given by
\[ h(x; n, N, M) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}} \]
for x = 0, 1, 2, ..., n, x ≤ M, and n − x ≤ N − M.
• Definition 6.2: A random variable has a gamma distribution and it
is referred to as a gamma random variable if and only if its probability
density is given by
\[ g(x; \alpha, \beta) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta} \]
for x > 0 and g(x; α, β) = 0 elsewhere, where α > 0 and β > 0.
• Definition 6.3: A random variable has an exponential distribution
and it is referred to as an exponential random variable if and only if its
probability density is given by
\[ g(x; \theta) = \frac{1}{\theta}\, e^{-x/\theta} \]
for x > 0 and g(x; θ) = 0 elsewhere, where θ > 0.
• Definition 6.4: A random variable X has a chi-square distribution
and it is referred to as a chi-square random variable if and only if its
probability density is given by
\[ f(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\frac{\nu-2}{2}} e^{-\frac{x}{2}} \]
for x > 0 and f (x) = 0 elsewhere. The parameter ν is referred to as the
number of degrees of freedom, or simply the degrees of freedom.
• Definition 6.5: A random variable has a beta distribution and it is
referred to as a beta random variable if and only if its probability density
is given by
\[ f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \cdot \Gamma(\beta)}\, x^{\alpha-1} (1 - x)^{\beta-1} \]
for x ∈ (0, 1) and f (x) = 0 elsewhere, where α > 0 and β > 0.
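– Example: With α = β = 1, since Γ(2) = Γ(1) = 1, the density reduces to f(x) = 1 on (0, 1), the uniform density.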
• Definition 6.5X: A random variable has a Cauchy distribution and
it is referred to as a Cauchy random variable if and only if its probability
density is given by
\[ f(x) = \frac{\beta/\pi}{(x - \alpha)^2 + \beta^2} \]
for x ∈ (−∞, ∞).
• Definition 6.6: A random variable X has a normal distribution and
it is referred to as a normal random variable if and only if its probability
density is given by
\[ n(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \]
for x ∈ (−∞, ∞) and where σ > 0.
• Definition 6.7: The normal distribution with µ = 0 and σ = 1 is referred
to as the standard normal distribution.
• Definition 6.8: A pair of random variables X and Y have a bivariate
normal distribution and they are referred to as jointly normally dis-
tributed random variables if and only if their joint probability density is
given by
\[ f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\, e^{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x-\mu_1}{\sigma_1}\right)\left(\frac{y-\mu_2}{\sigma_2}\right) + \left(\frac{y-\mu_2}{\sigma_2}\right)^2\right]} \]
for x ∈ (−∞, ∞) and y ∈ (−∞, ∞), where σ1 > 0, σ2 > 0, and −1 < ρ <
1.
• Definition 8.1: If X1, X2, ..., Xn are independent and identically
distributed random variables, we say that they constitute a random sample
from the infinite population given by their common distribution.
• Definition 8.2: If X1, X2, ..., Xn constitute a random sample, then
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \]
is called the sample mean and
\[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \]
is called the sample variance.
• Definition 8.3: If X1 is the first value drawn from a finite population of
size N, X2 is the second value drawn, ..., Xn is the nth value drawn, and the
joint probability distribution of the n random variables is given by
\[ f(x_1, x_2, \ldots, x_n) = \frac{1}{N(N-1)\cdots(N-n+1)} \]
for each ordered n-tuple of values of these random variables, then
X1, X2, ..., Xn are said to constitute a random sample from the given
finite population.
3 Theorems
• Theorem 1.1 - Multiplication of Choices: If an operation consists of
two steps, of which the first can be done in n1 ways and for each of these
the second can be done in n2 ways, then the whole operation can be done
in n1 · n2 ways.
• Theorem 1.2 - Multiplication of Choices (Generalized): If an op-
eration consists of k steps, of which the first can be done in n1 ways, for
each of these the second step can be done in n2 ways, for each of the first
two the third step can be done in n3 ways, and so forth, then the whole
operation can be done in n1 · n2 · ... · nk ways.
• Theorem 1.3 - Permutations of n distinct objects: The number of
permutations of n distinct objects is n!.
• Theorem 1.4 - Permutations of n distinct objects taken r at a
time: The number of permutations of n distinct objects taken r at a time
is
\[ {}_n P_r = \frac{n!}{(n - r)!} \]
for r = 0, 1, 2, ..., n.
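– Example: The number of ways to award first and second place among 5 runners is
\[ {}_5 P_2 = \frac{5!}{3!} = 20 \]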
• Theorem 1.5 - Permutations of n distinct objects arranged on a
circle: The number of permutations of n distinct objects arranged in a
circle is (n − 1)!.
• Theorem 1.6 - Permutations of n objects of which some are alike:
The number of permutations of n objects of which n1 are of one kind, n2
are of a second kind, ..., nk are of a kth kind, and n1 + n2 + ... + nk = n is
\[ \frac{n!}{n_1! \cdot n_2! \cdot \ldots \cdot n_k!} \]
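– Example: The number of distinct arrangements of the letters of MISSISSIPPI (one M, four I's, four S's, two P's) is
\[ \frac{11!}{1! \cdot 4! \cdot 4! \cdot 2!} = 34650 \]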
• Theorem 1.9 - Binomial Expansion: For any positive integer n,
\[ (x + y)^n = \sum_{r=0}^{n} \binom{n}{r} x^{n-r} y^r \]
• Symmetry of binomial coefficients: For any positive integer n and r = 0, 1, ..., n,
\[ \binom{n}{r} = \binom{n}{n-r} \]
• Complement rule: If A and Ac are complementary events in a sample space S, then
\[ P(A^c) = 1 - P(A) \]
• Theorem 2.6 - Maximum and minimum values of probabilities:
0 ≤ P (A) ≤ 1 for any event A.
• Theorem 2.7 - Addition rule for two events: If A and B are any
two events in a sample space S, then
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]
• Addition rule for three events: If A, B, and C are any three events in a
sample space S, then
\[ P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C) \]
• Multiplication rule: If A and B are any two events in a sample space S
and P(A) ≠ 0, then
\[ P(A \cap B) = P(A) \cdot P(B|A) \]
• Theorem 2.13 - Bayes’ Theorem: If B1, B2, ..., Bk constitute a
partition of the sample space S and P(Bi) ≠ 0 for i = 1, 2, ..., k, then for
any event A in S such that P(A) ≠ 0,
\[ P(B_r|A) = \frac{P(B_r) \cdot P(A|B_r)}{\sum_{i=1}^{k} P(B_i) \cdot P(A|B_i)} \]
for r = 1, 2, ..., k.
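– Example (with illustrative numbers): Suppose 1% of a population has a disease (B1), a test detects it with probability 0.95, and it falsely flags a healthy person with probability 0.05. For a positive result A,
\[ P(B_1|A) = \frac{(0.01)(0.95)}{(0.01)(0.95) + (0.99)(0.05)} \approx 0.161 \]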
• Theorem 3.1 - Conditions for function to serve as probability
distribution: A function can serve as the probability distribution of a
discrete random variable X if and only if its values, f(x), satisfy the
conditions
– 1.: f(x) ≥ 0 for each value within its domain;
– 2.:
\[ \sum_{x} f(x) = 1 \]
where the summation extends over all values within its domain.
• Theorem 3.6 - Relationship between values of probability density
and distribution function: If f (x) and F (x) are the values of the
probability density and the cumulative distribution of X at x, then
\[ P(a \le X \le b) = F(b) - F(a) \]
• Theorem 4.1 - Expected value of a function g(X) defined over X:
If X is a discrete random variable and f(x) is the value of its probability
distribution at x, the expected value of g(X) is given by
\[ E[g(X)] = \sum_{x} g(x) \cdot f(x) \]
Correspondingly, if X is a continuous random variable and f(x) is the value
of its probability density at x,
\[ E[g(X)] = \int_{-\infty}^{+\infty} g(x) \cdot f(x)\,dx \]
• Theorem 4.6 - Relationship between the variance of a random
variable X and the first and second moments about the origin of
X:
\[ \sigma^2 = \mu'_2 - \mu^2 \]
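– Example: For a Bernoulli random variable (Definition 5.2), μ′2 = E(X²) = θ and μ = θ, so
\[ \sigma^2 = \theta - \theta^2 = \theta(1 - \theta) \]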
• Theorem 4.13 - Expected value of product of independent ran-
dom variables: If X1, X2, ..., Xn are independent, then
\[ E(X_1 X_2 \cdots X_n) = E(X_1) \cdot E(X_2) \cdot \ldots \cdot E(X_n) \]
• Theorem 4.14 - Mean and variance of a linear combination of
random variables: If X1, X2, ..., Xn are random variables and
\[ Y = \sum_{i=1}^{n} a_i X_i \]
where a1, a2, ..., an are constants, then
\[ E(Y) = \sum_{i=1}^{n} a_i \cdot E(X_i) \]
and
\[ \mathrm{var}(Y) = \sum_{i=1}^{n} a_i^2 \cdot \mathrm{var}(X_i) + 2 \mathop{\sum\sum}_{i<j} a_i a_j \cdot \mathrm{cov}(X_i, X_j) \]
where the double summation extends over all values of i and j, from 1 to
n, for which i < j.
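– Example: With n = 2 and a1 = a2 = 1,
\[ \mathrm{var}(X_1 + X_2) = \mathrm{var}(X_1) + \mathrm{var}(X_2) + 2\,\mathrm{cov}(X_1, X_2) \]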
• Theorem 4.15 - Covariance of two linear combinations of random
variables: If X1, X2, ..., Xn are random variables and
\[ Y_1 = \sum_{i=1}^{n} a_i X_i, \qquad Y_2 = \sum_{i=1}^{n} b_i X_i \]
where a1, ..., an and b1, ..., bn are constants, then
\[ \mathrm{cov}(Y_1, Y_2) = \sum_{i=1}^{n} a_i b_i \cdot \mathrm{var}(X_i) + \mathop{\sum\sum}_{i<j} (a_i b_j + a_j b_i) \cdot \mathrm{cov}(X_i, X_j) \]
• Theorem 5.2 - Mean and variance of binomial distribution: The
mean and the variance of the binomial distribution are given by
\[ \mu = n\theta, \qquad \sigma^2 = n\theta(1 - \theta) \]
• Theorem 5.3 - Mean and variance of X/n, where X has bino-
mial distribution with parameters n and θ: If X has a binomial
distribution with the parameters n and θ and Y = X/n, then
\[ E(Y) = \theta, \qquad \sigma_Y^2 = \frac{\theta(1 - \theta)}{n} \]
• Mean and variance of hypergeometric distribution:
\[ \mu = \frac{nM}{N}, \qquad \sigma^2 = \frac{nM(N - M)(N - n)}{N^2(N - 1)} \]
• Mean and variance of Poisson distribution:
\[ \mu = \lambda, \qquad \sigma^2 = \lambda \]
• Theorem 6.1 - Mean and variance of uniform density: The mean
and the variance of the uniform distribution are given by
\[ \mu = \frac{\alpha + \beta}{2}, \qquad \sigma^2 = \frac{1}{12}(\beta - \alpha)^2 \]
• rth moment about the origin of the gamma distribution:
\[ \mu'_r = \frac{\beta^r\,\Gamma(\alpha + r)}{\Gamma(\alpha)} \]
• Mean and variance of gamma distribution:
\[ \mu = \alpha\beta, \qquad \sigma^2 = \alpha\beta^2 \]
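– Example: For the uniform density on (0, 1) (α = 0, β = 1),
\[ \mu = \frac{1}{2}, \qquad \sigma^2 = \frac{1}{12} \]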
• Theorem 6.8 - Normal approximation to binomial distribution: If
X is a random variable having a binomial distribution with the parameters
n and θ, then the moment-generating function of
\[ Z = \frac{X - n\theta}{\sqrt{n\theta(1 - \theta)}} \]
approaches that of the standard normal distribution when n → ∞.
• Transformation of two random variables: If f(x1, x2) is the value of
the joint probability density of the continuous random variables X1 and
X2 at (x1, x2), the functions given by y1 = u1(x1, x2) and y2 = u2(x1, x2)
are partially differentiable with respect to both x1 and x2 and represent a
one-to-one transformation for all values within the range of X1 and X2 for
which f(x1, x2) ≠ 0, then, for these values of x1 and x2, the equations
y1 = u1(x1, x2) and y2 = u2(x1, x2) can be uniquely solved for x1 and x2
to give x1 = w1(y1, y2) and x2 = w2(y1, y2), and for the corresponding
values of y1 and y2, the joint probability density of Y1 = u1(X1, X2) and
Y2 = u2(X1, X2) is given by
\[ g(y_1, y_2) = f[w_1(y_1, y_2), w_2(y_1, y_2)] \cdot |J| \]
where J, the Jacobian of the transformation, is the determinant
\[ J = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} \\[4pt] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} \end{vmatrix} \]
Elsewhere, g(y1, y2) = 0.
• Theorem 8.4 - Distribution of X̄ for random sample from normal
population: If X̄ is the mean of a random sample of size n from a normal
population with the mean µ and the variance σ 2 , its sampling distribution
is a normal distribution with the mean µ and the variance σ 2 /n.
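– Example: For a random sample of size n = 25 from a normal population with σ = 10, X̄ is normal with standard deviation
\[ \sqrt{\sigma^2/n} = \frac{10}{\sqrt{25}} = 2 \]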
• Theorem 8.5 - Covariance of rth and sth values of random sample
from finite population: If Xr and Xs are the rth and sth random
variables of a random sample of size n drawn from the finite population
{c1, c2, ..., cN}, then
\[ C(X_r, X_s) = -\frac{\sigma^2}{N - 1} \]
• Mean and variance of X̄ for random sample from finite population:
\[ E(\bar{X}) = \mu, \qquad V(\bar{X}) = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1} \]
• Difference of chi-square random variables: If X1 and X2 are independent
random variables, X1 has a chi-square distribution with ν1 degrees of
freedom, and X1 + X2 has a chi-square distribution with ν > ν1 degrees
of freedom, then X2 has a chi-square distribution with ν − ν1 degrees of
freedom.
• Theorem 8.11 - Joint distribution of mean and variance for ran-
dom sample from normal population: If X̄ and S 2 are the mean and
the variance of a random sample of size n from a normal population with
the mean µ and the standard deviation σ, then
– 1.: X̄ and S² are independent;
– 2.: the random variable (n − 1)S²/σ² has a chi-square distribution with
n − 1 degrees of freedom.
• Theorem 8.12 - Derivation of t distribution: If Y and Z are independent
random variables, Y has a chi-square distribution with ν degrees of freedom,
and Z has the standard normal distribution, then the distribution of
\[ T = \frac{Z}{\sqrt{Y/\nu}} \]
is given by
\[ f(t) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\;\Gamma\!\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \qquad t \in (-\infty, \infty) \]
and it is called the t distribution with ν degrees of freedom.
• Theorem 8.15 - Distribution of ratio of variances of independent
random samples from normal populations, divided by respective
population variances: If S1² and S2² are the variances of independent
random samples of sizes n1 and n2 from normal populations with the vari-
ances σ1² and σ2², then
\[ F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2} \]
has an F distribution with n1 − 1 and n2 − 1 degrees of freedom.
4 Power Series
• 1/(1 − x) for |x| < 1:
\[ \frac{1}{1 - x} = 1 + x + x^2 + x^3 + \cdots = \sum_{j=0}^{\infty} x^j \]
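– Example: This series shows the geometric probabilities of Definition 5.5 sum to 1:
\[ \sum_{x=1}^{\infty} \theta(1 - \theta)^{x-1} = \theta \cdot \frac{1}{1 - (1 - \theta)} = 1 \]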
• e^x for all x:
\[ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = \sum_{j=0}^{\infty} \frac{x^j}{j!} \]
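– Example: This series shows the Poisson probabilities of Section 5.7 sum to 1:
\[ \sum_{x=0}^{\infty} \frac{\lambda^x e^{-\lambda}}{x!} = e^{-\lambda} \cdot e^{\lambda} = 1 \]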
5 Special Probability Distributions
5.1 The Discrete Uniform Distribution
• Distribution:
\[ f(x) = \frac{1}{k}, \qquad x = x_1, x_2, \ldots, x_k \]
• Mean:
\[ \mu = \sum_{i=1}^{k} x_i \cdot \frac{1}{k} \]
• Variance:
\[ \sigma^2 = \sum_{i=1}^{k} (x_i - \mu)^2 \cdot \frac{1}{k} \]
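– Example: For one roll of a fair die (k = 6, xi = i),
\[ \mu = \frac{1 + 2 + \cdots + 6}{6} = \frac{7}{2}, \qquad \sigma^2 = \frac{35}{12} \]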
5.3 The Binomial Distribution
• Distribution:
\[ b(x; n, \theta) = \binom{n}{x} \theta^x (1 - \theta)^{n-x}, \qquad x = 0, 1, 2, \ldots, n \]
• Mean:
\[ \mu = n\theta \]
• Variance:
\[ \sigma^2 = n\theta(1 - \theta) \]
• Moment-Generating Function:
\[ M_X(t) = \left[1 + \theta(e^t - 1)\right]^n \]
5.4 The Negative Binomial Distribution
• Distribution:
\[ b^*(x; k, \theta) = \binom{x-1}{k-1} \theta^k (1 - \theta)^{x-k}, \qquad x = k, k + 1, k + 2, \ldots \]
• Mean:
\[ \mu = \frac{k}{\theta} \]
• Variance:
\[ \sigma^2 = \frac{k}{\theta}\left(\frac{1}{\theta} - 1\right) \]
5.5 The Geometric Distribution
• Distribution:
\[ g(x; \theta) = \theta(1 - \theta)^{x-1}, \qquad x = 1, 2, 3, \ldots \]
• Mean:
\[ \mu = \frac{1}{\theta} \]
• Variance:
\[ \sigma^2 = \frac{1 - \theta}{\theta^2} \]
5.6 The Hypergeometric Distribution
• Distribution:
\[ h(x; n, N, M) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}} \]
• Mean:
\[ \mu = \frac{nM}{N} \]
• Variance:
\[ \sigma^2 = \frac{nM(N - M)(N - n)}{N^2(N - 1)} \]
5.7 The Poisson Distribution
• Distribution:
\[ p(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \ldots \]
• Mean:
\[ \mu = \lambda \]
• Variance:
\[ \sigma^2 = \lambda \]
• Moment-Generating Function:
\[ M_X(t) = e^{\lambda(e^t - 1)} \]
5.8 The Multinomial Distribution
• Distribution:
\[ f(x_1, x_2, \ldots, x_k; n, \theta_1, \theta_2, \ldots, \theta_k) = \binom{n}{x_1, x_2, \ldots, x_k} \theta_1^{x_1} \theta_2^{x_2} \cdots \theta_k^{x_k} \]
where xi = 0, 1, ..., n for each i,
\[ \sum_{i=1}^{k} x_i = n, \qquad \sum_{i=1}^{k} \theta_i = 1 \]
5.9 The Multivariate Hypergeometric Distribution
• Distribution:
\[ f(x_1, x_2, \ldots, x_k; n, M_1, M_2, \ldots, M_k) = \frac{\binom{M_1}{x_1}\binom{M_2}{x_2}\cdots\binom{M_k}{x_k}}{\binom{N}{n}} \]
where \(\sum_{i=1}^{k} x_i = n\) and \(\sum_{i=1}^{k} M_i = N\).