Probability Statistics Notes
Caution: there might be errors/mistakes/typos. If you see some, then please email
me.
Contents
1 Syllabus
  1.1 Probability
  1.2 Random Variables
  1.3 Special Distributions
  1.4 Function of a Random Variable
  1.5 Joint Distributions
  1.6 Transformations
  1.7 Sampling Distributions
  1.8 Estimation
  1.9 Testing of Hypotheses
2 Motivation
5 Joint distribution
  5.1 Conditional distributions in two variables
  5.2 Transformation of random variables
  5.3 Bivariate normal distribution
7 Estimation of parameters
8 Testing of hypotheses
1 Syllabus
1.1 Probability
Classical, relative frequency and axiomatic definitions of probability, addi-
tion rule and conditional probability, multiplication rule, total probability,
Bayes’ Theorem and independence, problems.
1.6 Transformations
Functions of random vectors, distributions of sums of random variables,
problems.
problems.
1.8 Estimation
Unbiasedness, consistency, the method of moments and the method of max-
imum likelihood estimation, confidence intervals for parameters in one sam-
ple and two sample problems of normal populations, confidence intervals for
proportions, problems.
2 Motivation
Probability and statistics has a plethora of applications in real world. Chance
of winning a game. Applications in
• Data science
• Artificial Intelligence
• Machine Learning
• Weather forecast
3. The experiment can be repeated under identical conditions.
Example 2. For the previous example, the following are the corresponding sample
spaces Ω.
1. Ω = {H, T }.
2. Ω = {1, 2, 3, 4, 5, 6}.
4. Ω = [0, ∞).
P(A) = lim_{n→∞} |A_n|/|Ω_n|, where lim_{n→∞} A_n = A and lim_{n→∞} Ω_n = Ω.
1. Ω ∈ A
2. A ∈ A (A ⊂ Ω) =⇒ A^c ∈ A
3. A, B ∈ A =⇒ A ∪ B ∈ A.
Non example:
1. A = {Ω}.
1. P (Ω) = 1
Remark: Classical definition is a particular case of the axiomatic definition
of probability.
Example 11. What is the probability that a randomly chosen number from
Ω = [0, 1] will be
1. a rational number
1. P (ϕ) = 0
2. P (Ac ) = 1 − P (A)
4. 1 − P(∪_{i=1}^{∞} A_i) = P(∩_{i=1}^{∞} A_i^c)
5. P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
Problem 2. Consider all families with two children and assume that boys and
girls are equally likely.
(i) If a family is chosen at random and is found to have a boy, what is the
probability that the other child is also a boy?
(ii) If a child is chosen at random from these families and is found to be a boy,
what is the probability that the other child in that family is also a boy?
Solution: (i) Ω = {(b, b), (b, g), (g, b), (g, g)}
A := a boy is found in the family
B := the other one is also a boy
P(A) = 3/4, P(A ∩ B) = 1/4
P(B|A) = P(A ∩ B)/P(A) = 1/3.
[picture]
Example 12. Consider a tetrahedron with one side marked A, a 2nd side marked B,
a 3rd side marked C, and a 4th side marked ABC. Roll the tetrahedron, so
Ω = {A, B, C, ABC}.
P(A) = 1/2, P(A ∩ B) = 1/4 = P(A)P(B)
P(B) = 1/2, P(A ∩ C) = 1/4 = P(A)P(C)
P(C) = 1/2, P(B ∩ C) = 1/4 = P(B)P(C)
A1 = seeing A, A2 = seeing B, A3 = seeing C
{A1, A2, A3} are pairwise independent.
P(A1 ∩ A2 ∩ A3) = 1/4 and P(A1)P(A2)P(A3) = 1/8, so {A1, A2, A3} are not
mutually independent.
Remark. Whether events are mutually exclusive or exhaustive does not depend on
the probability function.
Proof.
P(A_i|B) = P(A_i ∩ B)/P(B) = P(B ∩ A_i)/P(B)    (1)
P(B ∩ A_i) = P(B|A_i)P(A_i)    [since P(B|A_i) = P(B ∩ A_i)/P(A_i)]    (2)
B = (B ∩ A_1) ∪ (B ∩ A_2) ∪ ··· ∪ (B ∩ A_k)
  = B ∩ (A_1 ∪ A_2 ∪ ··· ∪ A_k)
  = B ∩ Ω
  = B
Now (B ∩ A_i) ∩ (B ∩ A_j) = ∅, because A_i ∩ A_j = ∅ as they are mutually
exclusive.
P(B) = Σ_{i=1}^{k} P(B ∩ A_i) = Σ_{i=1}^{k} P(B|A_i)P(A_i)    [from (2)]    (3)
Problem 4. There are three drawers in a table. The first drawer contains
two gold coins. The second drawer contains a gold and a silver coin. The
third one contains two silver coins. Now a drawer is chosen at random and
then a coin is selected randomly. It is found that the gold coin has been
selected. What is the probability that the second drawer was chosen?
P(A_2|B) = P(B|A_2)P(A_2) / [P(B|A_1)P(A_1) + P(B|A_2)P(A_2) + P(B|A_3)P(A_3)]
         = (1/2 × 1/3) / (1 × 1/3 + 1/2 × 1/3 + 0 × 1/3)
         = (1/6) × (6/3)
         = 1/3.
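This posterior is easy to double-check with a few lines of arithmetic. The sketch below (plain Python, no libraries) simply applies Bayes' theorem with the priors 1/3 and the likelihoods 1, 1/2, 0 used above.

```python
# Bayes' theorem check for the three-drawer problem:
# drawers are equally likely; P(gold | drawer) is 1, 1/2, 0 respectively.
priors = [1/3, 1/3, 1/3]
likelihoods = [1.0, 0.5, 0.0]

evidence = sum(p * l for p, l in zip(priors, likelihoods))      # P(gold) by total probability
posterior_second_drawer = priors[1] * likelihoods[1] / evidence

print(posterior_second_drawer)   # 0.333... = 1/3, as derived above
```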
Problem 5. Prove that P(A^c) = 1 − P(A).
Solution: Ω = A ∪ A^c ∪ ∅ ∪ ···, with A_i = ∅, ∀i ≥ 3.
P(Ω) = P(A) + P(A^c) + P(∅) + ···
=⇒ 1 = P(A) + P(A^c) + 0
=⇒ P(A^c) = 1 − P(A).
Solution: A ∪ B = A ∪ (B ∩ A^c) ∪ ∅ ∪ ··· ∪ ∅
P(A ∪ B) = P(A) + P(B ∩ A^c)
B = (B ∩ A^c) ∪ (A ∩ B) ∪ ∅ ∪ ··· ∪ ∅
P(B) = P(B ∩ A^c) + P(A ∩ B)
Hence P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Example 13. Ω = {HH, HT, T H, T T }
A = all subsets of Ω
X:Ω→R
X(ω) = the number of H in ω ∈ Ω
X(HH) = 2, X(T H) = 1, X(HT ) = 1, X(T T ) = 0
X^{-1}((−∞, x]) =
  ∅,               x < 0
  {TT},            0 ≤ x < 1
  {TH, HT, TT},    1 ≤ x < 2
  Ω,               2 ≤ x < ∞
2. F is right continuous
3. F(−∞) = lim_{x→−∞} F(x) = 0
4. F(+∞) = lim_{x→+∞} F(x) = 1
[draw picture]
Example 17. A box contains good and defective items. If an item drawn is good,
we assign 1 to the drawing, otherwise 0. Let p be the probability of drawing at
random a good item. Then
X = 1 with P(X = 1) = p,  X = 0 with P(X = 0) = 1 − p, and
F(x) = P{X ≤ x} =
  0,        x < 0
  1 − p,    0 ≤ x < 1
  1,        x ≥ 1
Exercise 1:
Let X be an RV with pmf P(X = k) = 6/(π²k²), k = 1, 2, ···. Then calculate F(x).
Remark. The function f(t) is called the probability density function (pdf) of
the RV X, and ∫_{−∞}^{∞} f(t)dt = F(∞) = 1.
P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a)
             = F(b) − F(a)
             = ∫_{−∞}^{b} f(t)dt − ∫_{−∞}^{a} f(t)dt
             = ∫_{a}^{b} f(t)dt
F(x) = ∫_{−∞}^{x} f(t)dt
F'(x) = f(x)
f(t) =
  0,    t ≤ 0 or t ≥ 1
  1,    0 < t < 1
[picture]
Exercise 2:
If X is an RV, then prove that
1. aX + b, a, b ∈ R, is an RV
2. X² is an RV
3. |X| is an RV.
Definition 27. A number Q_p satisfying P(X ≤ Q_p) ≥ p and P(X ≥ Q_p) ≥
1 − p, 0 < p < 1, is called a quantile of order p. If p = 1/2, then Q_{1/2} is called
the median.
Q_{1/4}, Q_{2/4}, Q_{3/4}: quartiles
Q_{1/10}, Q_{2/10}, ···, Q_{9/10}: deciles
Q_{1/100}, Q_{2/100}, ···, Q_{99/100}: percentiles.
Definition 28. (Moment generating function) Let X be an RV on (Ω, A, P).
Then the function M(s) = E(e^{sX}) is known as the moment generating function
of the RV X if E(e^{sX}) exists in some neighbourhood of 0.
Example 22. X with pdf
f(x) =
  (1/2)e^{−x/2},    x > 0
  0,                x ≤ 0
M(s) = ∫_{−∞}^{∞} e^{sx} f(x)dx
     = (1/2)∫_{0}^{∞} e^{sx} e^{−x/2} dx
     = (1/2)∫_{0}^{∞} e^{−(x/2)(1−2s)} dx
     = 1/(1 − 2s),    s < 1/2
Theorem 2. If the MGF M(s) of an RV X exists for all s ∈ (−s_0, s_0) for some
s_0 > 0, then derivatives of all orders exist at s = 0 and can be evaluated under
the integral sign, i.e., M^{(k)}(s)|_{s=0} = E(X^k) for every positive integer k.
Remark. Alternatively, if the MGF M(s) exists for s ∈ (−s_0, s_0), s_0 > 0,
one can express M(s) (uniquely) in a Maclaurin series expansion: M(s) =
M(0) + sM'(0) + (s²/2)M''(0) + ···, so E(X^k) is the coefficient of s^k/k!.
Example 23. Let X be an RV with pdf
f(x) =
  (1/2)e^{−x/2},    x > 0
  0,                x ≤ 0
M(s) = 1/(1 − 2s), s < 1/2
M'(s) = 2/(1 − 2s)², M''(s) = 8/(1 − 2s)³, E(X) = M'(0) = 2, E(X²) = M''(0) = 8.
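As a quick numerical sanity check, one can integrate the pdf directly and compare against M(s) = 1/(1 − 2s) and the moments E(X) = 2, E(X²) = 8. This is only a sketch and assumes NumPy and SciPy are available.

```python
# Numerical check of the MGF and moments in Example 23.
import numpy as np
from scipy import integrate

f = lambda x: 0.5 * np.exp(-x / 2)                 # pdf, x > 0

def mgf(s):
    # M(s) = integral of e^{sx} f(x) over (0, inf); requires s < 1/2
    return integrate.quad(lambda x: np.exp(s * x) * f(x), 0, np.inf)[0]

print(mgf(0.1), 1 / (1 - 2 * 0.1))                 # both ≈ 1.25

m1 = integrate.quad(lambda x: x * f(x), 0, np.inf)[0]       # E(X), expect 2
m2 = integrate.quad(lambda x: x**2 * f(x), 0, np.inf)[0]    # E(X^2), expect 8
print(m1, m2)
```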
M(s) = ∫_{0}^{1} e^{sx} dx = (e^s − 1)/s
E(X) = M'(0)
     = lim_{s→0} M'(s)
     = lim_{s→0} (s e^s − e^s + 1)/s²
     = lim_{s→0} (s e^s + e^s − e^s)/(2s)    (by L'Hospital's rule)
     = 1/2
Problem 8.
f(x) =
  (1/β){1 − |x − α|/β},    α − β < x < α + β
  0,                        otherwise
α ∈ R, β > 0. Median = α.
Mean = E(X) = ∫_{α−β}^{α+β} x f(x)dx
            = ∫_{−1}^{1} (yβ + α)(1 − |y|)dy      [substituting y = (x − α)/β]
            = ∫_{−1}^{1} yβ(1 − |y|)dy + α∫_{−1}^{1} (1 − |y|)dy
            = 0 + α
            = α
Variance(X) = ∫_{α−β}^{α+β} (x − α)² f(x)dx = β² ∫_{−1}^{1} y²(1 − |y|)dy
            = 2β² ∫_{0}^{1} y²(1 − y)dy
            = β²/6.
F(x) = ε(x − k) =
  0,    x < k
  1,    x ≥ k
E(X) = k, E(X^l) = k^l, Var(X) = E(X²) − {E(X)}² = k² − k² = 0,
MGF = M(t) = E(e^{tX}) = e^{tk}.
2. Two-point distribution
We say that an RV X has a two point distribution if it takes two
values x1 and x2 , with probabilities P {X = x1 } = p and P {X =
x2 } = 1 − p, 0 < p < 1
DF = F(x) = p·ε(x − x_1) + (1 − p)·ε(x − x_2)
E(X) = p x_1 + (1 − p) x_2
E(X^n) = x_1^n p + x_2^n (1 − p)
MGF = M(t) = E(e^{tX}) = p e^{t x_1} + (1 − p) e^{t x_2}
Var(X) = E(X²) − {E(X)}² = p x_1² + (1 − p) x_2² − {p x_1 + (1 − p) x_2}²
       = p(1 − p)(x_1 − x_2)².
3. Bernoulli Distribution
If x1 = 1 and x2 = 0 then it is called Bernoulli Distribution.
E(X) = p
M (t) = 1 + p(et − 1)
Var(X) = p(1 − p) = pq, p + q = 1
4. Discrete uniform distribution (X takes the values x_1, ···, x_n, each with
probability 1/n).
In particular, if x_i = i, i = 1, 2, ···, n, then E(X) = (n+1)/2,
E(X²) = (n+1)(2n+1)/6, and Var(X) = (n² − 1)/12.
Problem 9. A box contains tickets numbered 1 to N. Let X be the largest
number drawn in n random drawings with replacement. P(X = k) = ?
P(X ≤ k) = (k/N)^n
P(X ≤ k − 1) = ((k − 1)/N)^n
P(X = k) = P(X ≤ k) − P(X ≤ k − 1) = [k^n − (k − 1)^n]/N^n.
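A short simulation agrees with this formula. The sketch below assumes NumPy is available; N = 10, n = 3, k = 7 are arbitrary illustrative values.

```python
# Monte Carlo check of P(X = k) = (k^n - (k-1)^n) / N^n for the maximum of
# n draws with replacement from tickets 1..N.
import numpy as np

rng = np.random.default_rng(0)
N, n, trials = 10, 3, 200_000

draws = rng.integers(1, N + 1, size=(trials, n))
largest = draws.max(axis=1)

k = 7
empirical = np.mean(largest == k)
exact = (k**n - (k - 1)**n) / N**n
print(empirical, exact)        # the two numbers should be close (≈ 0.127)
```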
5. Binomial distribution
We say that X has a binomial distribution with parameters n and p if its pmf
is given by
p_k = P{X = k} = C(n, k) p^k (1 − p)^{n−k},   k = 0, ···, n; 0 ≤ p ≤ 1,
where C(n, k) = n!/(k!(n − k)!) is the binomial coefficient. Then
Σ_{k=0}^{n} p_k = (p + (1 − p))^n = 1.
X ∼ b(n, p)
Solution: 1 − (5/6)^n ≥ 1/2
=⇒ n ≥ log 2 / log(6/5) ≈ 3.8, so n = 4.
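The problem statement itself is not shown above; from the solution it appears to ask for the smallest number of rolls of a fair die needed for at least one six to appear with probability at least 1/2. Under that reading, a two-line check:

```python
# Smallest n with 1 - (5/6)^n >= 1/2 (assuming this is the question being answered).
import math

n = 1
while 1 - (5 / 6) ** n < 0.5:
    n += 1
print(n, math.log(2) / math.log(6 / 5))   # 4, and the bound ≈ 3.80
```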
Definition 29. For a fixed positive integer r ≥ 1 and 0 < p < 1, an RV
with pmf given by (4) is said to have a negative binomial distribution.
We use the notation X ∼ NB(r; p).
We may write
X = Σ_{x=0}^{∞} x·I_{X=x}  and  F(x) = Σ_{k=0}^{∞} C(k+r−1, k) p^r (1−p)^k ε(x − k).
MGF = M(t) = Σ_{x=0}^{∞} C(x+r−1, x) p^r (1−p)^x e^{tx}
           = p^r Σ_{x=0}^{∞} C(x+r−1, x) (qe^t)^x      [q = 1 − p]
           = p^r (1 − qe^t)^{−r},    qe^t < 1
Solution:
X := number of passengers who turn up for the flight.
X ∼ b(n = 52, p = 0.95)
Hypergeometric Distribution
A box contains N marbles. Of these, M are drawn at random, marked,
and returned to the box. The contents of the box are then thoroughly
mixed. Next, n marbles are drawn at random from the box, and the marked
marbles are counted. If X denotes the number of marked marbles, then
P{X = x} = C(M, x) C(N−M, n−x) / C(N, n),   x = 0, 1, 2, ···    (7)
Mean = µ = E(X) = Σ_{x=0}^{n} x · C(M, x) C(N−M, n−x) / C(N, n)
     = Σ_{x=1}^{n} M · C(M−1, x−1) C(N−M, n−x) / C(N, n)
     = (Mn/N) Σ_{y=0}^{n−1} C(M−1, y) C(N−1−(M−1), n−1−y) / C(N−1, n−1)
     = Mn/N
E(X(X − 1)) = M(M−1)n(n−1)/(N(N−1)), and E(X(X − 1)) = E(X² − X) = E(X²) − E(X), so
Var(X) = E(X²) − {E(X)}² = M(M−1)n(n−1)/(N(N−1)) + Mn/N − (Mn/N)²
       = nM(N−M)(N−n)/(N²(N−1))
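These two formulas can be checked against a library implementation. The sketch below assumes SciPy is available; note that SciPy's hypergeom takes its arguments in the order (total, number marked, number drawn), which differs from the (N, M, n) notation used here.

```python
# Check E(X) = Mn/N and Var(X) = nM(N-M)(N-n) / (N^2 (N-1)).
from scipy.stats import hypergeom

N, M, n = 50, 20, 10                 # box size, marked marbles, marbles drawn
rv = hypergeom(N, M, n)

print(rv.mean(), M * n / N)                                        # 4.0  4.0
print(rv.var(), n * M * (N - M) * (N - n) / (N**2 * (N - 1)))      # both ≈ 1.9592
```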
Poisson Distribution
An RV is said to be a Poisson RV with parameter λ > 0 if its pmf is given by
P{X = k} = e^{−λ} λ^k / k!,   k = 0, 1, 2, ···    (8)
Some experiments consist of counting the number of times a particular event
occurs in a given time interval or in a given physical object.
E(X) = Σ_{k=0}^{∞} k e^{−λ} λ^k / k!
     = e^{−λ} λ Σ_{k=1}^{∞} λ^{k−1}/(k−1)!
     = λ e^{−λ} e^{λ}
     = λ
E(X²) = Σ_{k=0}^{∞} k² e^{−λ} λ^k / k!
      = e^{−λ} Σ_{k=1}^{∞} k λ^k/(k−1)!
      = e^{−λ} Σ_{k=1}^{∞} [(k−1) + 1] λ^k/(k−1)!
      = e^{−λ} λ² Σ_{k=2}^{∞} λ^{k−2}/(k−2)! + e^{−λ} λ Σ_{k=1}^{∞} λ^{k−1}/(k−1)!
      = e^{−λ} λ² e^{λ} + e^{−λ} λ e^{λ}
      = λ² + λ
Var(X) = λ.
The MGF of X is given by E(etX ) = exp [λ(et − 1)]
P(X ≥ 5) = 1 − P(X ≤ 4)
         = 1 − Σ_{x=0}^{4} e^{−6} 6^x / x!
         = 1 − 0.285
         = 0.715
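A one-line check of this tail probability (sketch assuming SciPy; λ = 6 as used in the computation above):

```python
# P(X >= 5) for a Poisson random variable with mean 6.
from scipy.stats import poisson

print(1 - poisson.cdf(4, 6))   # ≈ 0.7149, i.e. the 0.715 quoted above
```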
We will write X ∼ U[a, b] if X has a uniform distribution on [a, b].
The endpoints a or b may both be excluded. Clearly ∫_{−∞}^{∞} f(x)dx = 1,
since ∫_{a}^{b} 1/(b−a) dx = (b−a)/(b−a) = 1.
The cdf of X is given by
F(x) =
  0,                 x < a
  (x − a)/(b − a),   a ≤ x < b
  1,                 b ≤ x
E(X) = ∫_{−∞}^{∞} x f(x)dx = ∫_{a}^{b} x/(b−a) dx = [b² − a²]/(2(b−a)) = (a+b)/2.
E(X^k) = ∫_{−∞}^{∞} x^k f(x)dx = (1/(b−a)) ∫_{a}^{b} x^k dx = (b^{k+1} − a^{k+1})/((b−a)(k+1)).
f(x) = (1/θ) e^{−x/θ},   0 ≤ x < ∞
M(t) = ∫_{0}^{∞} e^{tx} (1/θ) e^{−x/θ} dx
     = lim_{b→∞} ∫_{0}^{b} (1/θ) e^{−(1−θt)x/θ} dx
     = lim_{b→∞} [ −e^{−(1−θt)x/θ}/(1 − θt) ]_{0}^{b}
     = 1/(1 − θt),   t < 1/θ
Thus M'(t) = θ/(1 − θt)² and M''(t) = 2θ²/(1 − θt)³.
Hence, for an exponential distribution, we have
µ = M'(0) = θ, and σ² = M''(0) − [M'(0)]² = θ².
So if λ is the mean number of changes in a unit interval, then θ = 1/λ is the
mean waiting time for the first change. In particular, suppose that λ = 7
is the mean number of changes per minute; then the mean waiting time for
the first change is 1/7 of a minute, a result that agrees with our intuition.
The integral Γ(α) = ∫_{0+}^{∞} x^{α−1} e^{−x} dx is called the Gamma function.
Γ(α) < ∞ (the integral converges) for α > 0, and it diverges for α ≤ 0. If α > 1,
Γ(α) = (α − 1)Γ(α − 1). If α = n is a positive integer, Γ(n) = (n − 1)!. Γ(1/2) = √π.
P(X > 5) = ∫_{5}^{∞} x^{2−1} e^{−x/2} / (Γ(2) 2²) dx
         = ∫_{5}^{∞} (x e^{−x/2})/4 dx
         = (1/4)[ −2x e^{−x/2} − 4 e^{−x/2} ]_{5}^{∞}
         = (1/4)[10 e^{−5/2} + 4 e^{−5/2}]
         = (7/2) e^{−5/2}
         = 0.287
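A quick check of this value (sketch assuming SciPy; the density above is Gamma with shape α = 2 and scale θ = 2):

```python
# P(X > 5) for X ~ Gamma(shape 2, scale 2).
import math
from scipy.stats import gamma

print(gamma.sf(5, a=2, scale=2), 3.5 * math.exp(-2.5))   # both ≈ 0.2873
```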
Chi-square Distribution.
Let X ∼ G(α, θ). If θ = 2 and α = r/2, where r is a positive integer, the pdf of X is
f(x) = [1/(Γ(r/2) 2^{r/2})] x^{r/2 − 1} e^{−x/2},   x ≥ 0.
In this case, we say that X has a χ²-distribution with r degrees of freedom and
write X ∼ χ²(r).
µ = αθ = (r/2) × 2 = r
σ² = αθ² = (r/2) × 4 = 2r
M(t) = (1 − 2t)^{−r/2},   t < 1/2
F(x) = ∫_{0}^{x} [1/(Γ(r/2) 2^{r/2})] ω^{r/2 − 1} e^{−ω/2} dω
Normal Distribution.
The RV X has a normal distribution if its pdf is defined by
f(x) = [1/(σ√(2π))] exp[−(x − µ)²/(2σ²)],   −∞ < x < ∞,
where µ and σ are parameters satisfying −∞ < µ < ∞ and 0 < σ < ∞. We
write X ∼ N(µ, σ²).
Clearly f(x) > 0 because of the exponential function. We now evaluate the
integral
I = ∫_{−∞}^{∞} [1/(σ√(2π))] exp[−(x − µ)²/(2σ²)] dx and want to show I = 1.
In I, change the variable of integration by letting z = (x − µ)/σ; then
I = ∫_{−∞}^{∞} (1/√(2π)) e^{−z²/2} dz.
The MGF of X is
M(t) = ∫_{−∞}^{∞} [e^{tx}/(σ√(2π))] exp[−(x − µ)²/(2σ²)] dx
     = ∫_{−∞}^{∞} [1/(σ√(2π))] exp{−[x² − 2(µ + σ²t)x + µ²]/(2σ²)} dx
To evaluate this integral, we complete the square in the exponent:
x² − 2(µ + σ²t)x + µ² = [x − (µ + σ²t)]² − 2µσ²t − σ⁴t²
Thus M(t) = exp((2µσ²t + σ⁴t²)/(2σ²)) = exp(µt + σ²t²/2)
M'(t) = (µ + σ²t) exp(µt + σ²t²/2)
M''(t) = [(µ + σ²t)² + σ²] exp(µt + σ²t²/2)
E(X) = M'(0) = µ
Var(X) = M''(0) − [M'(0)]² = µ² + σ² − µ² = σ².
2
Example 29. Given pdf = f (x) = √1
32π
exp [− (x+7)
32 ] what are mean and
standard deviation?
µ = −7, σ 2 = 16, X ∼ N (−7, 16)
26
M (t) = exp −7t + 8t2
Z ∼ N(0, 1) is called the standard normal distribution.
pdf: f(z) = (1/√(2π)) e^{−z²/2}
cdf: Φ(z) = ∫_{−∞}^{z} (1/√(2π)) e^{−w²/2} dw
(⋆) the pdf is symmetric
(⋆) Φ(−z) = 1 − Φ(z)
Example 31. If Z ∼ N(0, 1), P(Z ≤ 1.24) = Φ(1.24) = ∫_{−∞}^{1.24} (1/√(2π)) e^{−z²/2} dz =
0.8925.
Problem 15. If Z ∼ N(0, 1), find the constant a such that P(Z ≤ a) = 0.9147.
Indeed, Φ(a) = 0.9147 and Φ(1.37) = 0.9147, so a = 1.37,
because Φ is 1-1.
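Both table lookups used above (Φ(1.24) in Example 31 and the inverse lookup here) can be reproduced with SciPy (a sketch, assuming it is installed):

```python
# Standard normal cdf and inverse cdf values used in Example 31 and Problem 15.
from scipy.stats import norm

print(norm.cdf(1.24))     # ≈ 0.8925
print(norm.ppf(0.9147))   # ≈ 1.37, the constant a with Phi(a) = 0.9147
```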
Kurtosis: β2 = µ4/σ⁴. It measures peakedness.
and thus the pdf g(y) = G'(y) of Y is
g(y) = [1/(Γ(α)θ^α)] (ln y)^{α−1} e^{−(ln y)/θ} (1/y).
Equivalently, we have
g(y) = [1/(Γ(α)θ^α)] (ln y)^{α−1} / y^{1 + 1/θ},   1 < y < ∞.
Let X have the pdf f(x) = 3(1 − x)², 0 < x < 1, and let Y = (1 − X)³ = u(X).
Calculate g(y).
Example 34. A spinner is mounted at the point (0, 1), let w be the smallest
angle between the y-axis and the spinner. Assume that w is the value of a
random variable W that has a uniform distribution on the interval (− π2 , π2 ).
That is, W is U ((− π2 , π2 )), and the distribution function of W is
P(W ≤ w) = F(w) =
  0,                  −∞ < w < −π/2
  (w + π/2)(1/π),     −π/2 ≤ w < π/2
  1,                  w ≥ π/2
P(Z ≤ z) = P((X − µ)/σ ≤ z)
         = P(X ≤ zσ + µ)
         = ∫_{−∞}^{zσ+µ} [1/(σ√(2π))] exp[−(x − µ)²/(2σ²)] dx
In the integral, use the change of variable of integration given by w = (x − µ)/σ
to obtain
P(Z ≤ z) = ∫_{−∞}^{z} (1/√(2π)) e^{−w²/2} dw.
But this is the expression for Φ(z), the distribution function of a standard
normal random variable Z ∼ N(0, 1).
Remark. X ∼ N(µ, σ²)
P(a ≤ X ≤ b) = P(a − µ ≤ X − µ ≤ b − µ)
             = P((a − µ)/σ ≤ (X − µ)/σ ≤ (b − µ)/σ)
             = P((a − µ)/σ ≤ Z ≤ (b − µ)/σ)
             = Φ((b − µ)/σ) − Φ((a − µ)/σ),
where Φ is the cdf of N(0, 1).
Example 35. If X is N(3, 16), then µ = 3, σ = 4.
P(4 ≤ X ≤ 8) = P((4 − 3)/4 ≤ (X − 3)/4 ≤ (8 − 3)/4)
             = Φ(1.25) − Φ(0.25)
             = 0.8944 − 0.5987
             = 0.2957
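The same number drops out of a direct cdf evaluation (sketch assuming SciPy):

```python
# P(4 <= X <= 8) for X ~ N(mean 3, variance 16), i.e. standard deviation 4.
from scipy.stats import norm

print(norm.cdf(8, loc=3, scale=4) - norm.cdf(4, loc=3, scale=4))   # ≈ 0.2957
```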
Example 36. If X is N(25, 36), we find a constant c such that P(|X − 25| ≤ c) = 0.9544.
P(−c/6 ≤ (X − 25)/6 ≤ c/6) = 0.9544
Thus Φ(c/6) − [1 − Φ(c/6)] = 0.9544, so Φ(c/6) = 0.9772, c/6 = 2, and c = 12.
µ2 = σ², µ4 = 3σ⁴
Kurtosis = β2 = µ4/µ2² = 3σ⁴/σ⁴ = 3.
Odd central moments for N(µ, σ²), i.e. µ_{2n+1}:
By definition, µ_{2n+1} = ∫_{−∞}^{∞} (x − µ)^{2n+1} e^{−(1/2)((x−µ)/σ)²} / (σ√(2π)) dx.
Letting z = (x − µ)/σ in the above integral, we have
µ_{2n+1} = σ^{2n+1} ∫_{−∞}^{∞} z^{2n+1} e^{−z²/2} / √(2π) dz = 0,
since the integrand is an odd function.
Skewness = β1 = µ3/σ³ = 0.
Beta Distribution.
The integral
B(α, β) = ∫_{0+}^{1−} x^{α−1} (1 − x)^{β−1} dx    (9)
converges for α > 0, β > 0 and is called a beta function. For α ≤ 0 or β ≤ 0,
the above integral diverges.
Properties:
1. B(α, β) = B(β, α)
2. B(α, β) = ∫_{0+}^{∞} x^{α−1} (1 + x)^{−α−β} dx
3. B(α, β) = Γ(α)Γ(β)/Γ(α + β)
f(x) =
  [1/B(α, β)] x^{α−1} (1 − x)^{β−1},   0 < x < 1
  0,                                    otherwise    (10)
defines a pdf.
Raw moments:
m_n = E(X^n) = [1/B(α, β)] ∫_{0}^{1} x^{n+α−1} (1 − x)^{β−1} dx
             = B(n + α, β)/B(α, β)
             = Γ(n + α)Γ(α + β) / (Γ(n + α + β)Γ(α))
E(X) = B(α + 1, β)/B(α, β) = Γ(α + 1)Γ(α + β) / (Γ(α)Γ(α + β + 1)) = α/(α + β)
Var(X) = E(X²) − E(X)² = αβ / ((α + β)²(α + β + 1))
MGF = M(t) = [1/B(α, β)] ∫_{0}^{1} e^{tx} x^{α−1} (1 − x)^{β−1} dx
Since moments of all orders exist and E|X|^j ≤ 1, ∀j, we have
M(t) = Σ_{j=0}^{∞} (t^j/j!) E(X^j) = Σ_{j=0}^{∞} (t^j/j!) Γ(α + j)Γ(α + β) / (Γ(α + β + j)Γ(α)).
Cauchy Distribution.
Check for pdf:
∫_{−∞}^{∞} f(x)dx = ∫_{−∞}^{∞} (µ/π) · 1/(µ² + (x − θ)²) dx
                  = (1/π) ∫_{−∞}^{∞} 1/(1 + y²) dy      [y = (x − θ)/µ]
                  = (2/π) ∫_{0}^{∞} dy/(1 + y²)
                  = (2/π) [tan^{−1} y]_{0}^{∞}
                  = (2/π) × (π/2)
                  = 1
Pareto Distribution
We say that the RV X has a Pareto distribution with parameters θ > 0 and
α > 0 if its pdf is given by
f(x) =
  αθ^α/(x + θ)^{α+1},   x > 0
  0,                     otherwise
θ: scale parameter
α: shape parameter
F(x) = P(X ≤ x) = 1 − θ^α/(θ + x)^α,   x > 0
E(X) = θ/(α − 1),   α > 1
Var(X) = αθ²/((α − 2)(α − 1)²),   α > 2
Weibull Distribution
Gamma distribution X ∼ G(α, β)
f(x) =
  [1/(Γ(α)β^α)] x^{α−1} e^{−x/β},   x > 0
  0,                                 x ≤ 0
For G(1, β),
f(x) =
  (1/β) e^{−x/β},   x > 0
  0,                 x ≤ 0
Let Y = X^{1/α}, α > 0.
F(y) = P(Y ≤ y)
     = P(X^{1/α} ≤ y)
     = P(X ≤ y^α)
     = ∫_{−∞}^{y^α} f(x)dx
     = ∫_{0}^{y^α} (1/β) e^{−x/β} dx
     = ∫_{0}^{y^α/β} e^{−z} dz
     = [−e^{−z}]_{0}^{y^α/β}
     = 1 − e^{−y^α/β}
f(y) = F'(y) =
  (α/β) y^{α−1} e^{−y^α/β},   y > 0
  0,                           y ≤ 0    (12)
Y is said to have a Weibull distribution with pdf given by (12).
Reliability
P(T ∈ (t, t + dt) | T > t) = P(T ∈ (t, t + dt))/P(T > t)
                           = f(t)dt / (1 − F(t))
λ(t) = f(t)/(1 − F(t)) = d/dt [−log(1 − F(t))]    (13)
represents the conditional probability intensity that an item of age t will fail
instantly.
f(t) =
  (a + bt) e^{−(at + bt²/2)},   t ≥ 0
  0,                             t < 0    (14)
Problem 16. Under a certain complicated birth situation the mortality
rate of a newborn child is given by z(t) = 0.5 + 2t, t > 0.
If the baby survives to age one, what is the probability that he/she will
survive to age 2?
Solution: X := age at death.
P(X > 2 | X > 1) = P(A ∩ B)/P(B) = P(X > 2)/P(X > 1)
P(X > t) = R_X(t) = exp{−∫_{0}^{t} z(s)ds} = exp{−(0.5t + t²)}
P(X > 2) = e^{−5}, P(X > 1) = e^{−1.5}, so P(X > 2 | X > 1) = e^{−3.5}.
Solution:
P(X ≥ 10) = ∫_{10}^{∞} 0.1 e^{−0.1t} dt
          = lim_{t→∞} [−e^{−0.1t}] + e^{−1}
          = e^{−1}
          ≈ 0.368
1. E(aX) = aE(X)
E(nX) = E(X + · · · + X) = nE(X)
E(cX + d) = cE(X) + d
2. Var(cX) = c2 Var(X)
M(t) = E(e^{tX}) = ∫_{−∞}^{∞} e^{tx} f(x)dx
M'(0) = E(X), M''(0) = E(X²)
Problem 19. The lifetime X in hours of a component is modeled by a
Weibull distribution with α = 2. Starting with a large number of components,
it is observed that 15% of the components that have lasted 90 hours
fail before 100 hours. Determine the parameter λ.
Solution: F_X(x) = 1 − e^{−λx²}
P(X < 100 | X > 90) = 0.15
=⇒ P(90 < X < 100)/P(X > 90) = 0.15
=⇒ [F_X(100) − F_X(90)]/[1 − F_X(90)] = 0.15
=⇒ [e^{−λ·90²} − e^{−λ·100²}]/e^{−λ·90²} = 0.15
=⇒ λ = −ln(0.85)/1900 ≈ 0.000086
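A quick arithmetic check of this value (plain Python sketch):

```python
# The condition reduces to 1 - exp(-lambda * (100**2 - 90**2)) = 0.15.
import math

lam = -math.log(0.85) / (100**2 - 90**2)
print(lam)                                        # ≈ 8.55e-05, i.e. 0.000086

F = lambda x: 1 - math.exp(-lam * x**2)           # Weibull cdf with alpha = 2
print((F(100) - F(90)) / (1 - F(90)))             # 0.15, as required
```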
5 Joint distribution
Many a time, data appear as pairs, tuples, etc. For instance, one may be
interested in collecting the data (height, weight) of college students for a study.
We begin with the definition of a bivariate distribution. Similarly, one can
define multivariate distributions.
1. 0 ≤ f(x, y) ≤ 1
2. ΣΣ_{(x,y)∈S} f(x, y) = 1
3. P[(X, Y) ∈ A] = ΣΣ_{(x,y)∈A} f(x, y), A ⊆ S.
Definition 36. Let X and Y have the joint pmf f(x, y) with the space S.
The pmf of X alone, which is called the marginal pmf of X, is defined by
f1(x) = Σ_y f(x, y) = P(X = x),   x ∈ S1.
Answer: No, f(x, y) ≠ f1(x)f2(y).
For instance, f1(1)f2(1) = (11/36) × (1/36) ≠ f(1, 1) = 1/36.
Example 41. The joint pmf of X and Y is given by f(x, y) = xy²/30, x = 1, 2, 3;
y = 1, 2. S = {(1, 1), (2, 1), (3, 1), ···} ⊆ R². Find the marginal pmfs. Are X and
Y independent?
Let us calculate. f1(x) = Σ_{y=1}^{2} xy²/30 = x/30 + 4x/30 = 5x/30 = x/6
f2(y) = Σ_{x=1}^{3} xy²/30 = y²/30 + 2y²/30 + 3y²/30 = 6y²/30 = y²/5
f(x, y) = xy²/30 = (x/6)(y²/5) = f1(x)f2(y) ∀x, y
Hence, X and Y are independent.
Example 42. There are eight similar chips in a bowl: three marked (0, 0),
two marked (0, 1), two marked (1, 0), and one marked (1, 1). A player selects a
chip at random and is given the sum of the coordinates in dollars. X1 and X2
represent those coordinates. Then f(x1, x2) = (3 − x1 − x2)/8, x1 = 0, 1; x2 = 0, 1.
u(X1, X2) = X1 + X2, E(u(X1, X2)) = ?
E(u(X1, X2)) = Σ_{x2=0}^{1} Σ_{x1=0}^{1} (x1 + x2)(3 − x1 − x2)/8
             = Σ_{x2=0}^{1} [x2(3 − x2)/8 + (1 + x2)(2 − x2)/8]
             = 2/8 + 2/8 + 2/8
             = 6/8
             = 3/4
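The value 3/4 is easy to confirm by brute-force enumeration over the four chip labels (plain Python sketch):

```python
# E(X1 + X2) for f(x1, x2) = (3 - x1 - x2) / 8, x1, x2 in {0, 1}.
total = 0.0
for x1 in (0, 1):
    for x2 in (0, 1):
        total += (x1 + x2) * (3 - x1 - x2) / 8
print(total)   # 0.75 = 3/4
```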
Remark. (a) If u(X1, X2) = Xi, then E(u(X1, X2)) = E(Xi) = µi is called the
mean of Xi, i = 1, 2.
E(X1) = Σ_{(x1,x2)} x1 f(x1, x2) = Σ_{x1} x1 f1(x1)
(b) If u2(X1, X2) = (Xi − µi)², then E(u2(X1, X2)) = E[(Xi − µi)²] = σi² =
Var(Xi).
Definition 39. The joint pdf of two continuous type random variables is
an integrable f (x, y) with the following properties:
1. f (x, y) ≥ 0
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1
3. P[(X, Y) ∈ A] = ∫∫_A f(x, y) dx dy, where {(x, y) ∈ A} is an event in
the plane.
Remark. Property (3) implies that P [(X, Y ) ∈ A] is the volume of the solid
over the region A in the xy-plane and bounded by the surface z = f (x, y).
Example 43. Let X and Y have the joint pdf f(x, y) = (3/2)x²(1 − |y|),
−1 < x < 1, −1 < y < 1.
Let A = {(x, y) : 0 < y < x < 1}. The probability that (X, Y) falls
in A is given by P[(X, Y) ∈ A] = ∫_{0}^{1} ∫_{0}^{x} (3/2)x²(1 − y) dy dx = 9/40.
The respective marginal pdfs for the continuous case:
f1(x) = ∫_{−∞}^{∞} f(x, y) dy,   x ∈ S1
f2(y) = ∫_{−∞}^{∞} f(x, y) dx,   y ∈ S2
where S1 and S2 are the spaces of X and Y.
From the previous example, we calculate
f1(x) = ∫_{−1}^{1} (3/2)x²(1 − |y|) dy = (3/2)x² · 1 = (3/2)x²,   −1 < x < 1
f2(y) = ∫_{−1}^{1} (3/2)x²(1 − |y|) dx = 1 − |y|,   −1 < y < 1
Solution:
P(0 ≤ X ≤ 1/2, 0 ≤ Y ≤ 1/2) = ∫_{0}^{1/2} ∫_{0}^{y} 2 dx dy
                             = ∫_{0}^{1/2} 2y dy
                             = 1/4
[Draw picture]
The marginal pdfs are given by
f1(x) = ∫_{x}^{1} 2 dy = 2(1 − x),   0 ≤ x ≤ 1
f2(y) = ∫_{0}^{y} 2 dx = 2y,   0 ≤ y ≤ 1
Three illustrations of expected values are
E(X) = ∫_{0}^{1} ∫_{x}^{1} 2x dy dx = ∫_{0}^{1} 2x(1 − x) dx = 1/3
E(Y) = ∫_{0}^{1} ∫_{0}^{y} 2y dx dy = ∫_{0}^{1} 2y² dy = 2/3
E(Y²) = ∫_{0}^{1} ∫_{0}^{y} 2y² dx dy = ∫_{0}^{1} 2y³ dy = 1/2
Definition 40. X and Y are independent ⇐⇒ the joint pdf factors into the
product of their marginal pdfs, namely,
f(x, y) = f1(x)f2(y),   x ∈ S1, y ∈ S2.
1. F is non-decreasing and right continuous with respect to each coordi-
nate
3. For every (x1 , y1 ), (x2 , y2 ) with x1 < x2 and y1 < y2 the inequality
F (x2 , y2 ) − F (x2 , y1 ) + F (x1 , y1 ) − F (x1 , y2 ) ≥ 0.
Remark. The conditions (1) and (2) are not sufficient to make a function
F(x, y) a DF. For example, consider F(x, y) = 0 when x < 0 or y < 0 or x + y < 1,
and F(x, y) = 1 otherwise. Then F satisfies both (1) and (2). However, check
that P(1/3 < X ≤ 1, 1/3 < Y ≤ 1) = −1 ≱ 0.
µi := E(Xi), i = 1, 2
σi² := E((Xi − µi)²), i = 1, 2
Covariance and correlation:
1. V ar(X) = Cov(X, X)
3. Cov(X, Y ) = Cov(Y, X)
4. Cov(cX, Y ) = cCov(X, Y )
Since f(x1, x2) ≠ f1(x1)f2(x2), X1 and X2 are dependent. The mean and
the variance of X1 are respectively
µ1 = Σ_{x1=1}^{2} x1 (2x1 + 6)/18 = 8/18 + 2 × 10/18 = 14/9
σ1² = Σ_{x1=1}^{2} x1² (2x1 + 6)/18 − (14/9)² = 24/9 − 196/81 = 20/81
correlation coefficient ρ = (−1/162) / √((20/81)(77/324)) = −1/√1540 = −0.025
h(y|x) = f(x, y)/f1(x), provided f1(x) > 0.
Example 45. Let X and Y have the joint pmf f(x, y) = (x + y)/21, x =
1, 2, 3; y = 1, 2. We showed that f1(x) = (2x + 3)/21, x = 1, 2, 3 and f2(y) =
(3y + 6)/21, y = 1, 2.
So, the conditional pmf of X, given Y = y, is equal to
g(x|y) = [(x + y)/21] / [(3y + 6)/21] = (x + y)/(3y + 6),   x = 1, 2, 3; y = 1, 2.
P(X = 2|Y = 2) = g(2|2) = 4/12 = 1/3
Similarly, the conditional pmf of Y, given X = x, is equal to h(y|x) = (x + y)/(2x + 3),
x = 1, 2, 3; y = 1, 2.
Binomial distribution to Trinomial distribution
Here we have 3 mutually exclusive and exhaustive ways for the experiment to
terminate: say, perfect, “seconds”, and defective. We repeat the experiment
n independent times, and the probabilities p1, p2, p3 = 1 − p1 − p2 of
perfect, “seconds” and defective respectively remain the same in each trial.
In the n trials, let X1 = number of perfect items, X2 = number of “seconds”
items, X3 = n − X1 − X2 = number of defectives.
If x1 and x2 are non-negative integers such that x1 + x2 ≤ n, then the
probability of having x1 perfect, x2 “seconds” and n − x1 − x2 defectives is
[n!/(x1! x2! (n − x1 − x2)!)] p1^{x1} p2^{x2} (1 − p1 − p2)^{n−x1−x2}
P(A) = 1 − P(A')
     = 1 − P(X = 0 or 1 and Y = 0 or 1)
     = 1 − P(X = 0, Y = 0) − P(X = 0, Y = 1) − P(X = 1, Y = 0) − P(X = 1, Y = 1)
     = 1 − [20!/(0! 0! 20!)] (0.04)⁰ (0.01)⁰ (0.95)²⁰ − ···
     = 0.204
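This can be reproduced by summing the four trinomial probabilities that are subtracted above. The full problem statement is not shown here; the sketch below (plain Python) takes n = 20, p1 = 0.04, p2 = 0.01 from the numbers appearing in the computation.

```python
# P(A) = 1 - P(X <= 1 and Y <= 1) for a trinomial with n = 20, p1 = 0.04, p2 = 0.01.
from math import comb

n, p1, p2 = 20, 0.04, 0.01
p3 = 1 - p1 - p2

def trinomial(x1, x2):
    # n! / (x1! x2! (n-x1-x2)!) * p1^x1 * p2^x2 * p3^(n-x1-x2)
    return comb(n, x1) * comb(n - x1, x2) * p1**x1 * p2**x2 * p3**(n - x1 - x2)

p_complement = sum(trinomial(x, y) for x in (0, 1) for y in (0, 1))
print(1 - p_complement)   # ≈ 0.204
```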
X = v(Y). The interval c1 < x < c2 maps onto d1 = u(c1) < y < d2 = u(c2).
The distribution function of Y is G(y) = P(Y ≤ y) = P(u(X) ≤ y) = P(X ≤
v(y)), d1 < y < d2, with
G(y) = 0, y ≤ d1
G(y) = 1, y ≥ d2.
Then
G(y) = ∫_{c1}^{v(y)} f(x)dx,   d1 < y < d2    (17)
From calculus, from (17) we get
G'(y) = g(y) = f(v(y)) v'(y),   d1 < y < d2
G'(y) = g(y) = 0 if y < d1 or y > d2.
For an illustration of this change-of-variable technique, take Y = e^X, where X has
pdf f(x) = [1/(Γ(α)θ^α)] x^{α−1} e^{−x/θ},   0 < x < ∞.
g(y) = |v'(y)| f(v(y))
Compute
J = | ∂x1/∂y1   ∂x1/∂y2 |
    | ∂x2/∂y1   ∂x2/∂y2 |
Example 47. Let X1, X2 have the joint pdf f(x1, x2) = 2, 0 < x1 < x2 < 1.
Consider the transformation Y1 = X1/X2, Y2 = X2.
Note: The points for which y2 = 0, 0 ≤ y1 < 1, all map into the single
point x1 = 0, x2 = 0, i.e., a many-to-one mapping, and yet we are restricting
ourselves to one-one mappings. However, the boundaries are not part of our
support. Thus S1 is as depicted in (?) and, according to the rule, the joint
pdf of Y1 and Y2 is g(y1, y2) = |y2| · 2 = 2y2, 0 < y1 < 1, 0 < y2 < 1. It is
interesting to note that the marginal probability density functions are
g1(y1) = ∫_{0}^{1} 2y2 dy2 = 1,   0 < y1 < 1
g2(y2) = ∫_{0}^{1} 2y2 dy1 = 2y2,   0 < y2 < 1
Let us consider Y1 = X1 − X2, Y2 = X1 + X2
=⇒ x1 = (y1 + y2)/2, x2 = (y2 − y1)/2
J = |  1/2   1/2 |  = 1/2
    | −1/2   1/2 |
g2(y2) = ∫_{−y2}^{y2} (1/2) e^{−y2} dy1 = y2 e^{−y2},   0 < y2 < ∞
That of Y1 is
g1(y1) =
  ∫_{−y1}^{∞} (1/2) e^{−y2} dy2 = (1/2) e^{y1},    −∞ < y1 < 0
  ∫_{y1}^{∞} (1/2) e^{−y2} dy2 = (1/2) e^{−y1},    0 < y1 < ∞
that is, the expression for g1(y1) depends on the location of y1, although
this could be written as
g1(y1) = (1/2) e^{−|y1|},   −∞ < y1 < ∞,
which is called a double exponential pdf.
Theorem 8. Let (X1 , X2 , · · · , Xn ) be an n dimensional random variable of
the continuous type with pdf f (x1 , x2 , · · · , xn )
1. let y1 = g1 (x1 , x2 , · · · , xn )
y2 = g2 (x1 , x2 , · · · , xn )
..
.
yn = gn (x1 , x2 , · · · , xn )
be a one-one mapping Rⁿ → Rⁿ, i.e., there exists the inverse transfor-
mation
x1 = h1 (y1 , y2 , · · · , yn ), x2 = h2 (y1 , y2 , · · · , yn ), · · · , xn = hn (y1 , y2 , · · · , yn )
defined over the range of the transformation.
2. Assume that both the mapping and its inverse are continuous
∂xi
3. Assume that the partial derivatives ∂yj , 1 ≤ i ≤ n, 1 ≤ j ≤ n exist and
are continuous
Example 49. Let X1, X2, X3 be iid RVs with common exponential pdf
f(x) =
  e^{−x},   x > 0
  0,        otherwise
J = det | y2y3         y1y3         y1y2  |
        | y2(1 − y3)   y1(1 − y3)   −y1y2 |  = −y1²y2
        | 1 − y2       −y1          0     |
Note 0 < y1 < ∞, 0 < y2 < 1, 0 < y3 < 1; in particular 0 < (X1 + X2)/(X1 + X2 + X3) < 1.
The joint pdf of Y1, Y2, Y3 is given by w(y1, y2, y3) = y1²y2 e^{−y1}.
2. Q(x, y) = [1/(1 − ρ²)] [((x − µ1)/σ1)² − 2ρ((x − µ1)/σ1)((y − µ2)/σ2) + ((y − µ2)/σ2)²]
Theorem 9. The function defined by (1) and (2) with σ1 > 0, σ2 > 0, |ρ| <
1 is a joint pdf. The marginal pdfs of X and Y are respectively N (µ1 , σ12 )
and N (µ2 , σ22 ) and ρ is the correlation coefficient between X and Y .
Proof. Let f1(x) = ∫_{−∞}^{∞} f(x, y)dy. Note that
Q(x, y)(1 − ρ²) = ((y − µ2)/σ2 − ρ(x − µ1)/σ1)² + (1 − ρ²)((x − µ1)/σ1)²
                = {(y − [µ2 + ρ(σ2/σ1)(x − µ1)])/σ2}² + (1 − ρ²)((x − µ1)/σ1)²
It follows that
f1(x) = [1/(σ1√(2π))] exp[−(x − µ1)²/(2σ1²)] ∫_{−∞}^{∞} exp{−(y − βx)²/(2σ2²(1 − ρ²))} / (σ2√(1 − ρ²)√(2π)) dy,
where βx = µ2 + ρ(σ2/σ1)(x − µ1).
f1(x) = [1/(σ1√(2π))] exp[−(1/2)((x − µ1)/σ1)²],   −∞ < x < ∞
Thus ∫_{−∞}^{∞} [∫_{−∞}^{∞} f(x, y)dy] dx = ∫_{−∞}^{∞} f1(x)dx = 1,
and f(x, y) is a joint pdf of two RVs of the continuous type. It also follows
that f1 is the marginal pdf of X, so that X is N(µ1, σ1²).
In a similar manner we can show that Y is N(µ2, σ2²).
We see that as n → ∞,
Fn(x) → F(x) =
  0,   x < θ
  1,   x ≥ θ
which is a DF. Hence Fn →w F.
Clearly, Fn →w F, where F is the distribution function given by
F(x) =
  0,   x < 0
  1,   x ≥ 0
2. cXn →L cX, c ≠ 0
Remark. We emphasize that the definition says nothing about the convergence
of the RVs Xn to the RV X in the sense in which it is understood in real
analysis/calculus. Thus Xn →P X does not imply that, given ε > 0, we can
find an N such that |Xn − X| < ε, ∀n ≥ N. The last definition speaks only
of the convergence of the sequence of probabilities P{|Xn − X| > ε} to 0.
Example 54. Let {Xn} be a sequence of RVs with pmf P{Xn = 1} = 1/n
and P{Xn = 0} = 1 − 1/n. Then
P{|Xn| > ε} =
  P{Xn = 1} = 1/n,   0 < ε < 1
  0,                  ε ≥ 1
It follows that P{|Xn| > ε} → 0 as n → ∞ and we conclude that Xn →P 0.
Some Properties
1. Xn →P X ⇐⇒ Xn − X →P 0, since
P{|Xn − X − 0| > ε} → 0 as n → ∞ ⇐⇒ P{|Xn − X| > ε} → 0 as n → ∞.
2. Xn →P X, Xn →P Y =⇒ P{X = Y} = 1, since P{|X − Y| > c} ≤
P{|Xn − X| > c/2} + P{|Xn − Y| > c/2}.
3. Xn →P X =⇒ Xn − Xm →P 0 as n, m → ∞, for P{|Xn − Xm| > ε} ≤
P{|Xn − X| > ε/2} + P{|Xm − X| > ε/2}.
4. Xn →P X, Yn →P Y =⇒ Xn ± Yn →P X ± Y.
5. Xn →P X =⇒ kXn →P kX, k a constant.
6. Xn →P k =⇒ Xn² →P k²
7. Xn →P a, Yn →P b, a, b constants =⇒ XnYn →P ab, since
XnYn = [(Xn + Yn)² − (Xn − Yn)²]/4 →P [(a + b)² − (a − b)²]/4 = ab
8. Xn →P 1 =⇒ Xn^{−1} →P 1, for
P{|1/Xn − 1| ≥ ε} = P{1/Xn ≥ 1 + ε} + P{1/Xn ≤ 1 − ε}
                  = P{1/Xn ≥ 1 + ε} + P{1/Xn ≤ 0} + P{0 < 1/Xn ≤ 1 − ε}
9. Xn →P a, Yn →P b, a, b constants, b ≠ 0 =⇒ XnYn^{−1} →P ab^{−1}
(Yn →P b; by (5), Yn/b →P 1; by (8), b/Yn →P 1; by (7), Xn·(b/Yn) →P a; so by (5),
XnYn^{−1} →P ab^{−1}.)
10. Xn →P X, and Y an RV =⇒ XnY →P XY.
Note that Y is an RV, so that given δ > 0, there exists a k > 0 such
that P{|Y| > k} < δ/2. Thus
P{|XnY − XY| > ε} = P{|Xn − X||Y| > ε, |Y| > k} + P{|Xn − X||Y| > ε, |Y| ≤ k}
                  < δ/2 + P(|Xn − X| > ε/k)
=⇒ XnY →P XY.
11. Xn →P X, Yn →P Y =⇒ XnYn →P XY.
(Note (Xn − X)(Yn − Y) →P 0 by (7). The result now follows on multiplication:
XnYn − XnY − XYn + XY →P 0
as n → ∞, and XnY →P XY, XYn →P XY by (10).
Hence XnYn − XY →P 0, and by (1), XnYn →P XY.)
Theorem 11. Let Xn →P X, and g be a continuous function defined on R.
Then g(Xn) →P g(X) as n → ∞.
Now we explain the relationship between weak convergence and convergence
in probability.
Theorem 12. Xn →P X =⇒ Xn →L X.
Theorem 13. Let k be a constant; then Xn →L k =⇒ Xn →P k.
Corollary 13.1. Xn →L k ⇐⇒ Xn →P k.
Example 56. Let {Xn} be a sequence of RVs with pmf P{Xn = −n} = 1,
n = 1, 2, ···. The MGF is Mn(t) = E(e^{tXn}) = e^{−tn}.
Mn(t) → 0 as n → ∞, ∀t > 0.
Mn(t) = e^{−tn} → 1, n → ∞, for t = 0.
So,
Mn(t) → M(t) =
  0,   t > 0
  1,   t = 0    as n → ∞
  ∞,   t < 0
But M(t) is not an MGF. Note that if Fn is the DF of Xn then
Fn(x) =
  0,   x < −n
  1,   x ≥ −n
Fn(x) → F(x) = 1 ∀x as n → ∞.
Note: F(x) = 1 ∀x is not a DF, since F(−∞) = lim_{x→−∞} F(x) ≠ 0.
Question: Suppose that Xn has MGF Mn and Xn →L X, where X is
an RV with MGF M. Does Mn(t) → M(t) as n → ∞?
The answer to this question is in the negative.
Example 57. Consider the DF
Fn(x) =
  0,                         x < −n
  1/2 + cn tan^{−1}(nx),     −n ≤ x < n
  1,                         x ≥ n
where cn = 1/(2 tan^{−1}(n²)).
If x < 0, Fn(x) → 0 as n → ∞.
If x > 0, Fn(x) → 1 as n → ∞.
Fn(x) → F(x) =
  0,   x < 0
  1,   x ≥ 0
F(x) is a DF and Fn(x) → F(x) at all points of continuity of the DF F. The
MGF associated with Fn is
Mn(t) = ∫_{−n}^{n} cn e^{tx} n/(1 + n²x²) dx,
which exists for all t.
The MGF corresponding to F is M(t) = 1 ∀t. But Mn(t) ↛ M(t), since
Mn(t) → ∞ if t ≠ 0.
Indeed, Mn(t) > ∫_{0}^{n} cn (|t|³x³/6) · n/(1 + n²x²) dx.
Remark. MGFs are often useful for establishing the convergence of distribution
functions. The fact is: a distribution function Fn is uniquely determined
by its MGF Mn. The following theorem (we call it the continuity theorem) states
that this unique determination holds for limits as well.
Theorem 15. (Continuity Theorem)
Let Fn be a sequence of cumulative distribution functions with corresponding
MGFs Mn, and let F be a cumulative distribution function with MGF M. If
Mn(t) → M(t) for all t in an open interval containing zero, then Fn(x) → F(x)
at all continuity points of F.
Example 58. Let Xn be an RV with pmf P(Xn = 1) = 1/n, P(Xn = 0) =
1 − 1/n for each n. Then Mn(t) = (1/n)e^t + (1 − 1/n) exists ∀t ∈ R, and
Mn(t) → 1 as n → ∞, ∀t ∈ R. Here M(t) = 1
is the MGF of an RV X degenerate at 0. Xn →L X. Also Fn →w F.
Theorem 16. (Central Limit Theorem)
Let X1, X2, ··· be iid with mean µ and variance σ² ∈ (0, ∞), and let Sn = Σ_{i=1}^{n} Xi.
Then lim_{n→∞} P((Sn − nµ)/(σ√n) ≤ x) = Φ(x) for every x ∈ R.
Example 59. An insurance company has 25,000 automobile policy holders.
If the yearly claim of a policy holder is a random variable with mean
320 and standard deviation 540, approximate the probability that the total
yearly claim exceeds 8.3 million.
Solution: Let X denote the total yearly claim. Number the policy holders,
and let Xi denote the yearly claim of policy holder i. With n = 25,000, we
have from the central limit theorem that X = Σ_{i=1}^{n} Xi will have approximately
a normal distribution with mean 320 × 25000 = 8 × 10⁶ and standard
deviation 540√25000 = 8.5381 × 10⁴. Therefore,
P(X > 8.3 × 10⁶) = P((X − 8×10⁶)/(8.5381×10⁴) > (8.3×10⁶ − 8×10⁶)/(8.5381×10⁴))
                 = P((X − 8×10⁶)/(8.5381×10⁴) > (0.3×10⁶)/(8.5381×10⁴))
                 ≈ P(Z > 3.51), Z a standard normal
                 ≈ 0.00023
Thus, there are only 2.3 chances out of 10,000 that the total yearly claim
will exceed 8.3 million.
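The normal-approximation figure can be reproduced directly (sketch assuming SciPy is available):

```python
# Normal approximation for the total yearly claim in Example 59.
import math
from scipy.stats import norm

n, mean_claim, sd_claim = 25_000, 320, 540
mu = n * mean_claim                      # 8.0e6
sigma = sd_claim * math.sqrt(n)          # ≈ 8.5381e4

z = (8.3e6 - mu) / sigma
print(z, norm.sf(z))                     # ≈ 3.51 and P(Z > 3.51) ≈ 0.00022
```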
Example 60. Civil engineers believe that W, the amount of weight (in
units of 1000 pounds) that a certain span of a bridge can withstand without
structural damage resulting, is normally distributed with mean 400 and standard
deviation 40. Suppose that the weight (again, in units of 1000 pounds)
of a car is a random variable with mean 3 and standard deviation 0.3. How
many cars would have to be on the bridge span for the probability of structural
damage to exceed 0.1?
Solution: Let Pn denote the probability of structural damage when there
are n cars on the bridge. That is, Pn = P(X1 + ··· + Xn ≥ W) = P(X1 +
··· + Xn − W ≥ 0), where Xi is the weight of the ith car, i = 1, 2, ···, n.
Now it follows from the central limit theorem that Σ_{i=1}^{n} Xi is approximately
normal with mean 3n and variance 0.09n. Since W is independent of the
Xi, i = 1, 2, ···, n, and is also normal, it follows that Σ_{i=1}^{n} Xi − W is
approximately normal with mean and variance given by
E(Σ_{i=1}^{n} Xi − W) = 3n − 400
Var(Σ_{i=1}^{n} Xi − W) = Var(Σ_{i=1}^{n} Xi) + Var(W) = 0.09n + 1600
Thus,
Pn = P((X1 + ··· + Xn − W − (3n − 400))/√(0.09n + 1600) ≥ −(3n − 400)/√(0.09n + 1600))
   ≈ P(Z ≥ (400 − 3n)/√(0.09n + 1600)),
where Z is a standard normal random variable. Now P(Z ≥ 1.28) ≈ 0.1,
and so if the number of cars n is such that (400 − 3n)/√(0.09n + 1600) ≤ 1.28, or n ≥ 117,
then there is at least 1 chance in 10 that structural damage will occur.
Problem 24. The ideal size of a first year class at a particular college is
150 students. The college, knowing from past experience that, on average,
only 30 percent of those accepted for admission will actually attend,
uses a policy of approving the applications of 450 students. Compute the
probability that more than 150 first year students attend this college.
Solution: Let X denote the number of students that attend; then, assuming
that each accepted applicant will independently attend, it follows
that X is a binomial random variable with parameters n = 450 and p = 0.3.
Since the binomial is a discrete and the normal a continuous distribution, it is
best to compute P(X = i) as P(i − 0.5 < X < i + 0.5) when applying the
normal approximation; this is called the continuity correction. This yields
the approximation
P(X > 150.5) = P((X − 450×0.3)/√(450×0.3×0.7) ≥ (150.5 − 450×0.3)/√(450×0.3×0.7))
             ≈ P(Z > 1.59) = 0.06
Hence, only 6 percent of the time do more than 150 of the first 450 accepted
actually attend.
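For comparison, the approximation can be set against the exact binomial tail (sketch assuming SciPy):

```python
# Normal approximation with continuity correction vs. the exact binomial tail.
import math
from scipy.stats import binom, norm

n, p = 450, 0.3
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

approx = norm.sf((150.5 - mu) / sigma)   # P(Z > 1.59), with continuity correction
exact = binom.sf(150, n, p)              # exact P(X > 150)
print(approx, exact)                     # both close to 0.06
```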
Remark. One of the most important applications of the central limit theo-
rem is in regard to binomial random variables. Since such a random variable
X having a parameter (n, p) represents the number of successes in n inde-
pendent trials when each trial is a success with probability p, we can express
it as
X = X1 + · · · + Xn
where
Xi =
  1,   if the ith trial is a success
  0,   otherwise
Because E(Xi) = p and Var(Xi) = p(1 − p), it follows from the central limit
theorem that for n large, (X − np)/√(np(1 − p)) will approximately be a standard
normal variable.
the values of the RV X. F is the CDF of X. F will not be known completely, i.e.,
one or more parameters associated with F will be unknown.
The job is to estimate these unknown parameters or to test the validity of
certain statements about them.
or
We seek information about some numerical characteristics of a collection of
elements, called a population.
2. Note that the sample statistics X̄, S², etc. are random variables, while
the population parameters µ, σ², and so on are fixed constants that may
be unknown.
Chi-square distribution
Recall that the Chi-square distribution is a special case of the Gamma distribution.
Let n > 0 be an integer. Then G(n/2, 2) is a χ²(n) RV.
If X has a Chi-square distribution with n degrees of freedom, we write
X ∼ χ²(n). Its pdf is given by
f(x) =
  x^{n/2 − 1} e^{−x/2} / (2^{n/2} Γ(n/2)),   x ≥ 0
  0,                                          x < 0
and the MGF by M(t) = (1 − 2t)^{−n/2} for t < 1/2. Mean E(X) = n and variance
Var(X) = 2n.
Student's t-statistic
Theorem 18. Let X ∼ t(n), n > 1. Then E(X^r) exists for r < n. In
particular, if r < n is odd, E(X^r) = 0, and if r < n is even,
E(X^r) = n^{r/2} Γ[(r + 1)/2] Γ[(n − r)/2] / (Γ(1/2) Γ(n/2)).
Corollary 18.1. If n > 2, E(X) = 0 and E(X²) = Var(X) = n/(n − 2).
(Gosset proposed the t-distribution; his pen name was Student.)
2. We write Fm,n,α for the upper α percent point of the F (m, n) distri-
bution, that is, P (F (m, n) > Fm,n,α ) = α.
From (1) we have the following relation: F_{m,n,1−α} = 1/F_{n,m,α}.
7 Estimation of parameters
To be typed
8 Testing of hypotheses
To be typed