Lecture Notes I
0.1. Introduction
Example 0.1. The production manager of a bulb manufacturing company wishes to study the effect of a new manufacturing
process on the lifetimes of bulbs produced through it.
Here the population under study is the following:
P: Collection of lifetimes of all electric bulbs produced using new manufacturing process.
In most practical situations P is large (e.g. the collection of lifetimes of all electric bulbs that would be produced
using the new manufacturing process) and it is not feasible (due to time/cost constraints) to get complete information about P.
Thus a representative sample (a sample that in certain sense is a true representative of the population) is taken from P
and using this representative sample inferences regarding various population characteristics of P (such as population
mean, population variance etc.) are made. Note that the sample contains only partial information about P and the
goal is to make inferences about various population characteristics based on partial information in the sample drawn
from P.
X: Lifetime of a typical electric bulb manufactured using the new manufacturing process (a typical element of P).
X is random (called a random variable) and its value varies across P according to some law.
Probability Theory: A mathematical tool for modelling uncertainty (e.g. to describe the law according to which
values of X vary across P).
Statistics: Concerns with procedures for analyzing data (sample) and drawing inferences about various characteristics
of the population P.
For understanding of statistics, one must have a sound background in probability theory.
The only way to collect information about any random phenomenon is to perform experiments (e.g. selecting a set of
bulbs manufactured by the new manufacturing process and putting them on test for measuring their lifetimes). Each
experiment terminates in an outcome which cannot be predicted prior to the performance of the experiment
(e.g. lifetimes of the bulbs put on test cannot be predicted before they are put on test).
Definition 0.2 (Random Experiment). A random experiment is an experiment in which
(a) all possible outcomes of the experiment are known in advance,
(b) outcome of a particular performance (trial) of the experiment cannot be predicted in advance,
(c) the experiment can be repeated under identical conditions.
Definition 0.3 (Sample Space). The collection of all possible outcomes of a random experiment is called its sample
space. A sample space will usually be denoted by Ω.
Example 0.4. (i) E: Tossing a coin once. Sample space Ω = {H, T }, where H: Heads and T : Tails.
(ii) E: Throwing a die. Sample space Ω = {1, 2, 3, 4, 5, 6}.
(iii) E: Birth of a child. Sample space Ω = {M, F }. If we consider his/her weight then Ω = (0, 7).
(iv) E: Age at the death of a person. Sample space Ω = (0, 120).
(v) E: Putting an electric bulb produced by the new manufacturing process on test and measuring its lifetime. Sample
space Ω = [0, ∞).
(vi) E: Throwing two dice. Sample space
Ω = {(1, 1), (1, 2), . . . , (1, 6), (2, 1), (2, 2), . . . , (2, 6), . . . , (6, 1), (6, 2), . . . , (6, 6)}
= {(i, j) : i, j ∈ {1, 2, . . . , 6}}.
(vii) E: Putting two electric bulbs produced by the new manufacturing process on test and measuring their lifetimes.
Sample space Ω = {(x1 , x2 ) : x1 ≥ 0, x2 ≥ 0} = [0, ∞) × [0, ∞).
(viii) E: Casting one red die and one white die. Sample space
Ω = {(r, w) : r is number of spots on the red die and w is number of spots on the white die }
= {(1, 1), (1, 2), . . . , (1, 6), (2, 1), (2, 2), . . . , (2, 6), . . . , (6, 1), (6, 2), . . . , (6, 6)}
= {(i, j) : i, j ∈ {1, 2, . . . , 6}}
= {1, 2, . . . , 6} × {1, 2, . . . , 6}, which has 36 elements.
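As a quick computational sketch, the 36-element sample space of example (viii) can be enumerated directly (a minimal illustration; the variable names are chosen here, not taken from the notes):

```python
from itertools import product

# Sample space for casting one red and one white die (Example 0.4 (viii)):
# ordered pairs (r, w), so (1, 2) and (2, 1) are distinct outcomes.
omega = [(r, w) for r, w in product(range(1, 7), repeat=2)]

print(len(omega))        # 36
print((3, 5) in omega)   # True
```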
Definition 0.5 (Event). An event is any subset of the sample space. If the outcome of a random experiment is a member
of the set E ⊆ Ω, we say that event E has occurred.
Example 0.6. In Example 0.4 (vi), A = {(1, 5), (6, 2), (2, 2)} is an event. Also, in Example 0.4 (vii), A = {(x1 , x2 ) :
x1 ≤ 6, x2 ≥ 8} = [0, 6] × [8, ∞) may be an event.
Impossible Event: φ.
Sure Event: Ω.
Exhaustive Events: If A1 ∪ A2 ∪ · · · ∪ An = Ω then we call A1, A2, . . . , An exhaustive events.
Mutually Exclusive Events: If A ∩ B = φ then A and B are called mutually exclusive events, i.e., the occurrence
of one of them excludes the possibility of occurrence of the other.
Pairwise Disjoint Events: Let A1, A2, . . . be events such that Ai ∩ Aj = φ, i ≠ j. Then we say that A1, A2, . . .
are pairwise disjoint or mutually exclusive.
Let A and B be two events. Then,
(i) A ∪ B → occurrence of at least one of the events A and B.
(ii) A1 ∪ A2 ∪ · · · → occurrence of at least one Ai, i = 1, 2, . . . .
(iii) A ∩ B → simultaneous occurrence of A and B.
(iv) A1 ∩ A2 ∩ · · · → simultaneous occurrence of all Ai, i = 1, 2, . . . .
(v) Ac → non-occurrence of A.
The algebra of set theory is applicable in probability theory. Probability is a measure of uncertainty. We are interested
in quantifying uncertainty associated with various outcomes of a random experiment by assigning probability to these
outcomes.
Here, we will not discuss how probabilities are assigned (which is a part of probability modelling) rather we will
discuss properties of a probability as a measure.
Recall that E denotes a random experiment, Ω denotes the sample space of E and F denotes the event space. For all
practical purposes one may take F = P(Ω), the power set of Ω.
A set function is a function whose domain is a collection of sets (called a class of sets).
Definition 0.8 (Probability Function or Probability Measure). A probability function (or probability measure) is a
real-valued set function P , defined on the event space F, satisfying the following axioms:
(a) P (Ω) = 1 (certainty),
(b) P (A) ≥ 0 ∀ A ∈ F (positivity),
(c) if A1, A2 ∈ F are mutually exclusive/disjoint sets (i.e. A1 ∩ A2 = φ, the empty set) then P (A1 ∪ A2) = P (A1) + P (A2).
More generally, if {An}n≥1 is a sequence of mutually exclusive (disjoint) sets in F, i.e., Ai ∩ Aj = φ, i ≠ j, then
P (∪_{i=1}^{∞} Ai) = Σ_{i=1}^{∞} P (Ai) (countable additivity).
We call P (A) the probability of event A. The triplet (Ω, F, P ) is called a probability space.
Remark 0.9. Axioms (b) and (c) are desirable for any measure (such as area, volume, probability, etc.). Since the
sample space Ω consists of all possible outcomes, its occurrence is certain (100% chance of occurrence) and therefore
Axiom (a) (P (Ω) = 1) is also reasonable.
(P1) P (φ) = 0.
Proof. Let A1 = Ω and Ai = φ, i = 2, 3, . . . . Then Ai ∩ Aj = φ ∀ i ≠ j and A1 = ∪_{i=1}^{∞} Ai. Therefore,
P (Ω) = P (∪_{i=1}^{∞} Ai)
=⇒ 1 = Σ_{i=1}^{∞} P (Ai), (Axioms (a) and (c))
=⇒ 1 = lim_{n→∞} Σ_{i=1}^{n} P (Ai)
=⇒ 1 = lim_{n→∞} [P (Ω) + (n − 1)P (φ)]
=⇒ 1 = 1 + lim_{n→∞} [(n − 1)P (φ)]
=⇒ P (φ) = 0.
(P2) Let A1, A2, . . . , An ∈ F be mutually exclusive events. Then P (∪_{i=1}^{n} Ai) = Σ_{i=1}^{n} P (Ai) (finite additivity).
Proof. Let Ai = φ, i = n + 1, n + 2, . . . . Then Ai ∩ Aj = φ ∀ i ≠ j and ∪_{i=1}^{n} Ai = ∪_{i=1}^{∞} Ai. This implies
P (∪_{i=1}^{n} Ai) = P (∪_{i=1}^{∞} Ai)
= Σ_{i=1}^{∞} P (Ai), (Axiom (c))
= Σ_{i=1}^{n} P (Ai), (P (Ai) = P (φ) = 0 ∀ i = n + 1, n + 2, . . .).
(P3) P (A) ≤ 1 ∀ A ∈ F.
Proof. 1 = P (Ω) = P (A ∪ Ac) = P (A) + P (Ac) ≥ P (A), (using Axioms (a), (b) and (P2)).
(P4) Let A1, A2 ∈ F be such that A1 ⊆ A2. Then P (A2 − A1) = P (A2) − P (A1) and P (A1) ≤ P (A2).
Proof. We have
(A1 ∩ A2) ∩ (A2 − A1) = φ and A2 = (A1 ∩ A2) ∪ (A2 − A1),
which implies, using (P2) and A1 ∩ A2 = A1 (since A1 ⊆ A2),
P (A2) = P (A1) + P (A2 − A1), i.e., P (A2 − A1) = P (A2) − P (A1).
By Axiom (b), we have P (A2 − A1) ≥ 0 =⇒ P (A2) ≥ P (A1), that is, P (·) is monotone.
(d) Let A1, A2 ∈ F. Then, using (P3), (P5) and Axiom (b), we get
P (A1 ∪ A2) ≤ P (A1) + P (A2) and P (A1 ∩ A2) ≥ P (A1) + P (A2) − 1.
Theorem (Inclusion–Exclusion). Let A1, A2, . . . , Ak ∈ F. Define
p1,k = Σ_{i=1}^{k} P (Ai),
p2,k = Σ_{1≤j1<j2≤k} P (Aj1 ∩ Aj2)
(sum of probabilities of all possible intersections involving 2 events out of the k events A1, . . . , Ak),
. . .
pi,k = Σ_{1≤j1<j2<···<ji≤k} P (Aj1 ∩ Aj2 ∩ · · · ∩ Aji)
(sum of probabilities of all possible intersections involving i events out of the k events A1, . . . , Ak, i = 1, . . . , k).
Then,
P (∪_{i=1}^{k} Ai) = p1,k − p2,k + p3,k − p4,k + · · · + (−1)^{k−1} pk,k.
Proof. Note that, for k = 2, p1,2 = P (A1) + P (A2), p2,2 = P (A1 ∩ A2) and
P (A1 ∪ A2) = P (A1) + P (A2) − P (A1 ∩ A2) = p1,2 − p2,2.
Thus the result is true for k = 2. Now suppose that the result is true for k = 2, 3, . . . , m, that is,
P (∪_{i=1}^{k} Ai) = p1,k − p2,k + p3,k − p4,k + · · · + (−1)^{k−1} pk,k ∀ k = 2, 3, . . . , m.
Then,
P (∪_{i=1}^{m+1} Ai) = P ((∪_{i=1}^{m} Ai) ∪ Am+1)
= P (∪_{i=1}^{m} Ai) + P (Am+1) − P ((∪_{i=1}^{m} Ai) ∩ Am+1), (using the result for k = 2)
= Σ_{j=1}^{m} (−1)^{j−1} pj,m + P (Am+1) − P (∪_{i=1}^{m} (Ai ∩ Am+1)), (using the result for k = m on ∪_{i=1}^{m} Ai)
= Σ_{j=1}^{m} (−1)^{j−1} pj,m + P (Am+1) − Σ_{j=1}^{m} (−1)^{j−1} tj,m, (using the result for k = m on ∪_{i=1}^{m} (Ai ∩ Am+1)),
where
t1,m = Σ_{i=1}^{m} P (Ai ∩ Am+1),
t2,m = Σ_{1≤i<j≤m} P (Ai ∩ Aj ∩ Am+1),
and, in general,
tj,m = Σ_{1≤i1<i2<···<ij≤m} P (Ai1 ∩ Ai2 ∩ · · · ∩ Aij ∩ Am+1), j = 1, 2, . . . , m.
Therefore,
P (∪_{i=1}^{m+1} Ai) = (p1,m + P (Am+1)) − (p2,m + t1,m) + (p3,m + t2,m) − · · · + (−1)^{m−1} (pm,m + tm−1,m) + (−1)^m tm,m
= p1,m+1 − p2,m+1 + p3,m+1 − · · · + (−1)^{m−1} pm,m+1 + (−1)^m pm+1,m+1,
as
p1,m + P (Am+1) = Σ_{j=1}^{m} P (Aj) + P (Am+1) = p1,m+1,
p2,m + t1,m = Σ_{1≤i<j≤m} P (Ai ∩ Aj) + Σ_{i=1}^{m} P (Ai ∩ Am+1) = Σ_{1≤i<j≤m+1} P (Ai ∩ Aj) = p2,m+1,
. . .
pm,m + tm−1,m = P (A1 ∩ A2 ∩ · · · ∩ Am) + Σ_{1≤i1<···<im−1≤m} P (Ai1 ∩ Ai2 ∩ · · · ∩ Aim−1 ∩ Am+1) = pm,m+1
and tm,m = P (A1 ∩ A2 ∩ · · · ∩ Am ∩ Am+1) = pm+1,m+1. The result now follows by induction.
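The inclusion–exclusion identity can be checked numerically. The sketch below (a hypothetical illustration; the helper name and the sample events are chosen here, not taken from the notes) builds each pi,k by brute force over all i-wise intersections, with events represented as subsets of a finite set of equally likely outcomes:

```python
from itertools import combinations
from fractions import Fraction

def union_prob(events, n_outcomes):
    """P(A_1 ∪ ... ∪ A_k) computed as p_{1,k} - p_{2,k} + p_{3,k} - ... ,
    where each event is a set of equally likely outcomes."""
    k = len(events)
    total = Fraction(0)
    for i in range(1, k + 1):
        # p_{i,k}: sum of probabilities of all i-wise intersections
        p_ik = sum(Fraction(len(set.intersection(*combo)), n_outcomes)
                   for combo in combinations(events, i))
        total += (-1) ** (i - 1) * p_ik
    return total

# Check on one throw of a die: A1 = even, A2 = {1, 2}, A3 = {2, 3}
A1, A2, A3 = {2, 4, 6}, {1, 2}, {2, 3}
lhs = union_prob([A1, A2, A3], 6)
print(lhs)                                    # 5/6
print(lhs == Fraction(len(A1 | A2 | A3), 6))  # True: matches direct counting
```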
Remark 0.12. Let A1, A2, A3 ∈ F. Then
P (A1 ∪ A2 ∪ A3) = P (A1) + P (A2) + P (A3) − P (A1 ∩ A2) − P (A1 ∩ A3) − P (A2 ∩ A3) + P (A1 ∩ A2 ∩ A3).
Theorem (Boole’s Inequalities). Let A1, A2, . . . , Ak ∈ F. Then
p1,k − p2,k ≤ P (∪_{i=1}^{k} Ai) ≤ p1,k.
Proof. Note that for k = 2, p1,2 = P (A1) + P (A2), p2,2 = P (A1 ∩ A2) and
P (A1 ∪ A2) = P (A1) + P (A2) − P (A1 ∩ A2).
This implies p1,2 − p2,2 = P (A1 ∪ A2) ≤ P (A1) + P (A2) = p1,2. Thus the result is true for k = 2. Now suppose that for
some positive integer m (≥ 2)
p1,k − p2,k ≤ P (∪_{i=1}^{k} Ai) ≤ p1,k ∀ k = 2, 3, . . . , m.
Then,
P (∪_{i=1}^{m+1} Ai) = P ((∪_{i=1}^{m} Ai) ∪ Am+1)
≤ P (∪_{i=1}^{m} Ai) + P (Am+1), (using the result for k = 2 with A = ∪_{i=1}^{m} Ai and B = Am+1, which gives P (A ∪ B) ≤ P (A) + P (B))
≤ p1,m + P (Am+1)
= p1,m+1. (0.3)
Also,
P (∪_{i=1}^{m} (Ai ∩ Am+1)) ≤ Σ_{i=1}^{m} P (Ai ∩ Am+1).
Thus,
P (∪_{i=1}^{m+1} Ai) = P ((∪_{i=1}^{m} Ai) ∪ Am+1)
= P (∪_{i=1}^{m} Ai) + P (Am+1) − P (∪_{i=1}^{m} (Ai ∩ Am+1))
≥ p1,m − p2,m + P (Am+1) − Σ_{i=1}^{m} P (Ai ∩ Am+1)
= (p1,m + P (Am+1)) − (p2,m + Σ_{i=1}^{m} P (Ai ∩ Am+1)) (0.4)
= p1,m+1 − p2,m+1.
The result now follows by induction.
Remark. More generally, the even partial sums of the inclusion–exclusion series underestimate and the odd partial sums overestimate:
p1,k − p2,k + · · · − p2m,k ≤ P (∪_{i=1}^{k} Ai) ≤ p1,k − p2,k + · · · + p2m−1,k
for m = 1, 2, . . . , [k/2].
Theorem 0.15 (Bonferroni’s Inequality). Let A1, A2, . . . , Ak ∈ F. Then
P (∩_{i=1}^{k} Ai) ≥ max{Σ_{i=1}^{k} P (Ai) − (k − 1), 0}.
Proof. We have
P (∩_{i=1}^{k} Ai) = P ((∪_{i=1}^{k} Ai^c)^c), (De Morgan’s law)
= 1 − P (∪_{i=1}^{k} Ai^c)
≥ 1 − Σ_{i=1}^{k} P (Ai^c), (Boole’s inequality)
= 1 − Σ_{i=1}^{k} (1 − P (Ai))
= Σ_{i=1}^{k} P (Ai) − (k − 1). (0.5)
Also,
P (∩_{i=1}^{k} Ai) ≥ 0. (0.6)
The result follows on combining (0.5) and (0.6).
Theorem (Continuity of Probability). Let {En}n≥1 be a sequence of events with E1 ⊆ E2 ⊆ · · · (i.e., En ↑). Then
P (lim_{n→∞} En) = lim_{n→∞} P (En), where lim_{n→∞} En = ∪_{n=1}^{∞} En.
Proof. Define
F1 = E1, F2 = E2 − E1, . . . , Fn = En − En−1, n = 2, 3, . . . .
Then {Fn} is a disjoint sequence of events and En = ∪_{i=1}^{n} Fi =⇒ P (En) = Σ_{i=1}^{n} P (Fi). Now
lim_{n→∞} En = lim_{n→∞} ∪_{i=1}^{n} Fi = ∪_{n=1}^{∞} Fn.
So,
P (lim_{n→∞} En) = P (∪_{n=1}^{∞} Fn) = Σ_{n=1}^{∞} P (Fn) = lim_{n→∞} Σ_{i=1}^{n} P (Fi) = lim_{n→∞} P (∪_{i=1}^{n} Fi) = lim_{n→∞} P (En).
Suppose that the sample space is finite, say Ω = {ω1, ω2, . . . , ωk}, with
P ({ωi}) = 1/k, i = 1, 2, . . . , k (each elementary event is equally likely).
For any event E ⊆ Ω, we have E = {ωi1, ωi2, . . . , ωir} for some i1, i2, . . . , ir ∈ {1, 2, . . . , k}, 1 ≤ r ≤ k. Then
E = ∪_{j=1}^{r} {ωij} and
P (E) = P (∪_{j=1}^{r} {ωij}) = Σ_{j=1}^{r} P ({ωij}) = Σ_{j=1}^{r} 1/k = r/k
= (number of ways favourable to the event E)/(total number of ways in which the random experiment can terminate).
Here the assumption of equally likely elementary events, P ({ωi}) = 1/k, i = 1, 2, . . . , k, is a part of probability modelling.
“At random”: In a random experiment with finite sample space Ω, whenever we say that the experiment has been
performed at random it means that all the outcomes in the sample space are equally likely.
Example 0.18 (Birthday Problem). Suppose that a college has n students, including you, and that each of them was born
in a non-leap year (365 days).
(a) Find the probability that at least two of them have the same birthday. For what values of n is this probability more
than 0.5, 0.8, 0.95?
(b) For what value of n is the probability that you will find someone who shares your birthday at least 0.5?
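Both parts can be sketched numerically (assuming 365 equally likely birthdays and independence across students; the function names are chosen here):

```python
def p_shared(n):
    """P(at least two of n people share a birthday), 365 equally likely days."""
    if n > 365:
        return 1.0
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (365 - i) / 365
    return 1 - p_distinct

# (a) smallest n for which the probability exceeds each threshold
for thr in (0.5, 0.8, 0.95):
    n = next(n for n in range(1, 366) if p_shared(n) > thr)
    print(thr, n)   # 0.5 -> 23, 0.8 -> 35, 0.95 -> 47

# (b) someone sharing *your* birthday among the other n - 1 students:
# P = 1 - (364/365)^(n-1), which first reaches 0.5 at n = 254
n_b = next(n for n in range(1, 2000) if 1 - (364 / 365) ** (n - 1) >= 0.5)
print(n_b)   # 254
```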
Solution: (i) P (E1) = C(13, 5)/C(52, 5),
(ii) P (E2) = 1 − P (E2^c) = 1 − P (no card is a spade) = 1 − C(39, 5)/C(52, 5),
(iii) P (E3) = C(4, 3) C(4, 2)/C(52, 5),
(iv) P (E4) = C(4, 2) C(4, 2) C(44, 1)/C(52, 5),
where C(n, r) = n!/(r!(n − r)!) denotes the number of ways of choosing r objects out of n.
Example 0.20 (Capture/Recapture Method for Estimating Population Size). In a wildlife population suppose that the
population size n is unknown. To estimate the population size n, 20 animals are captured, tagged and then released
back. Thereafter 40 animals are captured at random and it is found that 8 of them are tagged. Find an estimate of the
population size n based on the given data.
Solution: Among the n animals, 20 are tagged and n − 20 are untagged. If the 40 recaptured animals form a random
sample, the probability of the observed data is
l(n) = C(20, 8) C(n − 20, 32)/C(n, 40), n ≥ 52.
We have
l(n + 1) > l(n) ⇐⇒ C(n − 19, 32)/C(n + 1, 40) > C(n − 20, 32)/C(n, 40)
⇐⇒ (n − 19)(n − 39)/((n − 51)(n + 1)) > 1
⇐⇒ n < 99.
Similarly l(n + 1) < l(n) ⇐⇒ n > 99. Thus l is maximized at n = 99, that is, for n = 99 the observed data (among
the captured animals 8 are tagged and 32 are untagged) is most probable.
Thus an estimate of n is n̂ = 99 (maximum likelihood estimate).
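A minimal numerical check of this maximization, using exact rational arithmetic to avoid floating-point ties (the function name is chosen here; the likelihood is the hypergeometric one described above):

```python
from math import comb
from fractions import Fraction

def l(n):
    """Likelihood of the observed recapture data (8 tagged, 32 untagged out
    of 40) when the population size is n and 20 animals carry tags."""
    if n < 52:   # need at least 32 untagged animals, i.e. n - 20 >= 32
        return Fraction(0)
    return Fraction(comb(20, 8) * comb(n - 20, 32), comb(n, 40))

n_hat = max(range(52, 500), key=l)
print(n_hat)            # 99 (max() returns the first maximizer)
print(l(99) == l(100))  # True: the likelihood is flat between 99 and 100
```

The exact tie at n = 100 is consistent with the strict inequalities above, which are silent at n = 99.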
Conditional Probability: Suppose a random experiment with n equally likely outcomes is performed and we learn that
the event A has occurred. Given this information, the effective sample space reduces to A, and the proportion of
outcomes in B among those in A is
P (B|A) = |A ∩ B|/|A| = (|A ∩ B|/n)/(|A|/n) = P (A ∩ B)/P (A), B ∈ F.
Definition 0.21. Let (Ω, F, P ) be a probability space and let A ∈ F be such that P (A) > 0. Then
P (B|A) = P (A ∩ B)/P (A), B ∈ F,
is called the conditional probability of event B given the event A.
Remark 0.22. (a) In the above definition the event A (with P (A) > 0) is fixed and, for this fixed A ∈ F, P (·|A) is a
set function defined on F. Is it a probability function/measure?
(b) P (A ∩ B) = P (A)P (B|A) = P (B)P (A|B) for A, B ∈ F.
Theorem 0.23. Let (Ω, F, P ) be a probability space and let A ∈ F with P (A) > 0 be fixed. Then
P (·|A) : F → R is a probability function (called the conditional probability function) on F (so that (Ω, F, P (·|A)) is
a probability space).
Proof. Note that P (B|A) = P (A ∩ B)/P (A) ≥ 0 for all B ∈ F and P (Ω|A) = P (A ∩ Ω)/P (A) = P (A)/P (A) = 1.
Let {Bn}n≥1 be a sequence of disjoint events in F. Since the {Bn}n≥1 are disjoint, the sets {Bn ∩ A}n≥1 are also
disjoint. Since P (·) is a probability measure, we get
P (∪_{n=1}^{∞} Bn | A) = P (∪_{n=1}^{∞} (Bn ∩ A))/P (A) = Σ_{n=1}^{∞} P (Bn ∩ A)/P (A) = Σ_{n=1}^{∞} P (Bn|A).
It follows that P (·|A) is a probability function on F for any fixed A ∈ F with P (A) > 0.
Example 0.24. Five cards are drawn at random (without replacement) from a deck of 52 cards. Define events
Find P (B|A).
Solution: We have
P (B|A) = P (A ∩ B)/P (A) = P (B)/P (A) (since B ⊆ A)
= C(13, 5) / (C(13, 4) C(39, 1) + C(13, 5)) ≈ 0.0441.
Theorem (Multiplication Rule). Let C1, C2, . . . , Cn ∈ F. Then
P (∩_{i=1}^{n} Ci) = P (C1)P (C2|C1)P (C3|C1 ∩ C2) · · · P (Cn|C1 ∩ C2 ∩ · · · ∩ Cn−1),
provided P (C1 ∩ C2 ∩ · · · ∩ Cn−1) > 0 (which also ensures that P (C1 ∩ C2 ∩ · · · ∩ Ci) > 0, i = 1, 2, . . . , n − 2).
Due to symmetry, if (α1, α2, . . . , αn) is a permutation of (1, 2, . . . , n), then
P (∩_{i=1}^{n} Ci) = P (Cα1 ∩ Cα2 ∩ · · · ∩ Cαn)
= P (Cα1)P (Cα2|Cα1)P (Cα3|Cα1 ∩ Cα2) · · · P (Cαn|Cα1 ∩ Cα2 ∩ · · · ∩ Cαn−1),
provided P (Cα1 ∩ Cα2 ∩ · · · ∩ Cαn−1) > 0 (which also ensures that P (Cα1 ∩ Cα2 ∩ · · · ∩ Cαi) > 0, i = 1, 2, . . . , n − 2).
Example 0.26. A bowl contains 3 red and 5 blue chips. All chips that are of the same colour are identical. Two chips
are drawn successively at random and without replacement. Define events A: the first chip drawn is red, and B: the
second chip drawn is blue.
Solution: P (A) = 3/8, P (B|A) = 5/7 and
P (B) = P (A ∩ B) + P (Ac ∩ B) = P (B|A)P (A) + P (B|Ac)P (Ac) = (5/7) × (3/8) + (4/7) × (5/8) = 35/56 = 5/8.
Note that here the outcome of the second draw depends on the outcome of the first draw (P (B|A) ≠ P (B)). Also,
P (A ∩ B) = P (A)P (B|A) = (3/8) × (5/7) = 15/56 ≈ 0.2679.
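The total-probability computation above can be replayed in exact arithmetic (the event definitions A = "first chip red" and B = "second chip blue" are those stated in the example; the variable names are chosen here):

```python
from fractions import Fraction

# Two successive draws without replacement from 3 red and 5 blue chips.
p_A = Fraction(3, 8)            # first chip red
p_B_given_A = Fraction(5, 7)    # all 5 blue chips remain among 7
p_B_given_Ac = Fraction(4, 7)   # 4 blue chips left after a blue first draw

p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)   # total probability
print(p_B)                 # 5/8 (= 35/56)
print(p_A * p_B_given_A)   # 15/56, i.e. P(A ∩ B) ≈ 0.2679
```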
Theorem 0.27 (Theorem of Total Probability). For a countable set ∆ (that is, the elements of ∆ can either be put in 1-1
correspondence with N = {1, 2, . . . } or with {1, 2, . . . , n} for some n ∈ N), let {Eα : α ∈ ∆} be a countable
collection of mutually exclusive (i.e., Eα ∩ Eβ = φ ∀ α ≠ β) and exhaustive (i.e., P (∪_{α∈∆} Eα) = 1) events. Then,
for any E ∈ F,
P (E) = Σ_{α∈∆} P (E ∩ Eα) = Σ_{α∈∆: P (Eα)>0} P (E|Eα)P (Eα).
Proof. Since P (∪_{α∈∆} Eα) = 1, we have
P (E) = P (E ∩ (∪_{α∈∆} Eα)) = P (∪_{α∈∆} (E ∩ Eα))
= Σ_{α∈∆} P (E ∩ Eα), (the Eα’s are disjoint =⇒ their subsets (E ∩ Eα)’s are disjoint)
= Σ_{α∈∆: P (Eα)>0} P (E ∩ Eα), (P (Eα) = 0 =⇒ P (E ∩ Eα) = 0)
= Σ_{α∈∆: P (Eα)>0} P (E|Eα)P (Eα).
(b)
P (M |S) = P (M ∩ S)/P (S) = P (S|M )P (M )/P (S) = (0.30 × 0.60)/0.24 = 3/4.
Theorem 0.29 (Bayes’ Theorem). Let {Eα : α ∈ ∆} be a countable collection of mutually exclusive and exhaustive
events and let E be any event with P (E) > 0. Then, for j ∈ ∆ with P (Ej) > 0,
P (Ej|E) = P (E|Ej)P (Ej) / Σ_{α∈∆: P (Eα)>0} P (E|Eα)P (Eα).
Proof. For j ∈ ∆ with P (Ej) > 0,
P (Ej|E) = P (Ej ∩ E)/P (E) = P (E|Ej)P (Ej)/P (E),
and the result follows on expanding P (E) by the Theorem of Total Probability.
Remark 0.30. (a) Suppose that the occurrence of any one of the mutually exclusive and exhaustive events {Eα : α ∈ ∆}
(where ∆ is a countable set) may cause the occurrence of an event E. Given that the event E has occurred (i.e., given
the effect), Bayes’ Theorem provides the conditional probability that the effect E was caused by the occurrence of the
event Ej, j ∈ ∆.
(b) In Bayes’ Theorem {P (Ej ) : j ∈ ∆} are called prior probabilities and {P (Ej |E) : j ∈ ∆} are called posterior
probabilities.
Example 0.31. Bowl C1 contains 3 red and 7 blue chips. Bowl C2 contains 8 red and 2 blue chips. Bowl C3 contains
5 red and 5 blue chips. All chips of the same colour are identical.
A die is cast and a bowl is selected as per the following schemes:
The selected bowl is handed over to another person who draws two chips at random from this bowl. Find the
probability that:
(a) Two red chips are drawn.
(b) Given that drawn chips are both red, find the probability that it came from bowl C3 .
Let Ai denote the event that bowl Ci is selected, i = 1, 2, 3, and R the event that both drawn chips are red; the
selection scheme gives P (A1) = 1/3, P (A2) = 1/2, P (A3) = 1/6.
(a) By the Theorem of Total Probability,
P (R) = P (R|A1)P (A1) + P (R|A2)P (A2) + P (R|A3)P (A3)
= (C(3, 2)/C(10, 2)) × (1/3) + (C(8, 2)/C(10, 2)) × (1/2) + (C(5, 2)/C(10, 2)) × (1/6) = 10/27.
(b)
P (A3|R) = P (R|A3)P (A3)/P (R) = [(C(5, 2)/C(10, 2)) × (1/6)] / (10/27) = 1/10.
Remark 0.32. In the above example,
P (A1|R) = P (R|A1)P (A1)/P (R) = [(C(3, 2)/C(10, 2)) × (1/3)] / (10/27) = 3/50,
P (A2|R) = P (R|A2)P (A2)/P (R) = [(C(8, 2)/C(10, 2)) × (1/2)] / (10/27) = 21/25,
P (A1|R) = 3/50 < 1/3 = P (A1) ⇐⇒ P (A1 ∩ R) < P (A1)P (R) ←→ R has negative information about A1,
P (A2|R) = 21/25 > 1/2 = P (A2) ⇐⇒ P (A2 ∩ R) > P (A2)P (R) ←→ R has positive information about A2,
P (A3|R) = 1/10 < 1/6 = P (A3) ⇐⇒ P (A3 ∩ R) < P (A3)P (R) ←→ R has negative information about A3.
Note that the proportion of red chips in C2 is greater than the proportion of red chips in Ci, i = 1, 3.
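The posterior computations can be replayed exactly. The priors below are those implied by the computations in the example (P (A1) = 1/3, P (A2) = 1/2, P (A3) = 1/6); the function and variable names are chosen here:

```python
from fractions import Fraction
from math import comb

# Bowls: C1 has 3 red of 10 chips, C2 has 8 red of 10, C3 has 5 red of 10.
priors = {1: Fraction(1, 3), 2: Fraction(1, 2), 3: Fraction(1, 6)}
reds = {1: 3, 2: 8, 3: 5}

def p_two_red(bowl):
    """P(both drawn chips red | bowl), drawing 2 of 10 without replacement."""
    return Fraction(comb(reds[bowl], 2), comb(10, 2))

p_R = sum(p_two_red(i) * priors[i] for i in priors)            # total probability
posteriors = {i: p_two_red(i) * priors[i] / p_R for i in priors}  # Bayes' theorem

print(p_R)            # 10/27
print(posteriors[3])  # 1/10
```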
Independent Events:
Definition 0.33. Let {Ej : j ∈ ∆} be a collection of events.
(i) Events {Ej : j ∈ ∆} are said to be pairwise independent if for any pair of events Eα and Eβ (α, β ∈ ∆, α ≠ β)
in the collection {Ej : j ∈ ∆}, we have
P (Eα ∩ Eβ) = P (Eα)P (Eβ).
(ii) Events E1, E2, . . . , En are said to be independent if for any subcollection {Eα1, Eα2, . . . , Eαk} of {E1, E2, . . . , En}
(k = 2, 3, . . . , n), we have
P (∩_{j=1}^{k} Eαj) = Π_{j=1}^{k} P (Eαj).
(iii) Let ∆ ⊆ R be an arbitrary index set so that {Eα : α ∈ ∆} is an arbitrary collection of events. Events
{Eα : α ∈ ∆} are said to be independent if any finite subcollection of events in {Eα : α ∈ ∆} forms a collection of
independent events.
Theorem. Let {En}n≥1 be a sequence of independent events. Then P (∩_{k=1}^{∞} Ek) = Π_{k=1}^{∞} P (Ek).
Proof. Let Bn = ∩_{k=1}^{n} Ek, n = 1, 2, . . . . Then Bn ↓ and P (∩_{n=1}^{∞} Bn) = lim_{n→∞} P (Bn). But
∩_{n=1}^{∞} Bn = ∩_{k=1}^{∞} Ek and P (Bn) = P (∩_{k=1}^{n} Ek) = Π_{k=1}^{n} P (Ek). Thus,
P (∩_{k=1}^{∞} Ek) = lim_{n→∞} Π_{k=1}^{n} P (Ek) = Π_{k=1}^{∞} P (Ek).
For example, to conclude that three events E1, E2, E3 are independent, the following four (as 2^3 − 3 − 1 = 4)
conditions must be verified:
P (E1 ∩ E2) = P (E1)P (E2), P (E1 ∩ E3) = P (E1)P (E3), P (E2 ∩ E3) = P (E2)P (E3)
and
P (E1 ∩ E2 ∩ E3) = P (E1)P (E2)P (E3).
(ii) Any subcollection of independent events is independent. In particular, the independence of a collection of events
implies their pairwise independence.
(iii) If E1 and E2 are independent events (P (E1) > 0, P (E2) > 0), then P (E1|E2) = P (E1) and P (E2|E1) = P (E2),
which implies that A, B and C are not independent although they are pairwise independent.
Let (Ω, F, P ) be a given probability space. In some situations we may not be directly interested in the sample space
Ω; rather we may be interested in some numerical aspect of Ω.
Example 0.39. A fair coin (head and tail are equally likely) is tossed three times independently. Then
Ω = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
and P ({ω}) = 1/8 for all ω ∈ Ω. Suppose that we are interested in the number of heads in three tosses, i.e., we are
interested in the function X : Ω → R defined as
X(ω) =
 0, if ω = TTT,
 1, if ω ∈ {HTT, THT, TTH},
 2, if ω ∈ {HHT, HTH, THH},
 3, if ω = HHH.
Note: From a rigorous mathematical point of view a random variable is a real-valued function satisfying some technical
conditions. In this course we ignore these technical details. For all practical purposes a r.v. is a real-valued function
defined on Ω.
For a function Y : Ω → R and A ⊆ R, define Y^{−1}(A) = {ω ∈ Ω : Y (ω) ∈ A}. In particular,
Y^{−1}(∩_{α∈Λ} Aα) = ∩_{α∈Λ} Y^{−1}(Aα).
Definition 0.45. Let X be a r.v. defined on probability space (Ω, F, P ) and let (R, B, PX ) denote the probability
space induced by X. Define the function FX : R → R by
FX (x) = P (X ≤ x) = P (X^{−1}((−∞, x])) = PX ((−∞, x]), x ∈ R.
The function FX is called the cumulative distribution function (c.d.f.) or simply the distribution function (d.f.) of r.v.
X.
Note: Whenever there is no ambiguity we will drop subscript X in FX to represent d.f. of a r.v. by F . It can be shown
(in advanced courses) that the c.d.f. FX (·) of a r.v. X determines the induced probability measure PX (·) uniquely.
Thus to study the random behaviour of r.v. X it suffices to study its d.f. F .
Example 0.46. In the previous example,
P (X = 0) = PX ({0}) = 1/8, P (X = 1) = PX ({1}) = 3/8 = P (X = 2) = PX ({2})
and P (X = 3) = PX ({3}) = 1/8. Then the d.f. of X is obtained as
FX (x) = P (X ≤ x) = P ({ω : X(ω) ≤ x}) = Σ_{i∈{0,1,2,3}: i≤x} PX ({i}) =
 0, x < 0,
 1/8, 0 ≤ x < 1,
 1/8 + 3/8 = 1/2, 1 ≤ x < 2,
 7/8, 2 ≤ x < 3,
 1, x ≥ 3.
Theorem 0.47. Let F (·) be the c.d.f. of a r.v. X defined on a probability space (Ω, F, P ) and let (R, B, PX ) be the
probability space induced by X. Then
(i) F is non-decreasing,
(ii) F (x) is right continuous,
(iii) F (−∞) := lim_{n→∞} F (−n) = 0 and F (∞) := lim_{n→∞} F (n) = 1.
Conversely, any function G(·) satisfying properties (i)-(iii) is a d.f. of some r.v. Y defined on a probability space
(Ω∗ , F∗ , P ∗ ).
Proof. (i) Let −∞ < x < y < ∞. Then (−∞, x] ⊆ (−∞, y] =⇒ PX ((−∞, x]) ≤ PX ((−∞, y]). This implies
that F (x) ≤ F (y).
(ii) Since F is monotone and bounded below (by 0), lim_{h↓0} F (x + h) = F (x+) exists ∀ x ∈ R. Therefore,
F (x+) = lim_{h↓0} F (x + h) = lim_{n→∞} F (x + 1/n) = lim_{n→∞} PX ((−∞, x + 1/n]).
Let An = (−∞, x + 1/n], n = 1, 2, . . . . Then An ↓ and ∩_{n=1}^{∞} (−∞, x + 1/n] = (−∞, x]. Thus,
F (x+) = PX (∩_{n=1}^{∞} (−∞, x + 1/n]) = PX ((−∞, x]) = F (x).
(iii) Also,
F (∞) = lim_{n→∞} F (n) = lim_{n→∞} PX ((−∞, n]) = PX (∪_{n=1}^{∞} (−∞, n]), ((−∞, n] ↑)
= PX (R), (∪_{n=1}^{∞} (−∞, n] = R)
= 1.
Similarly, F (−∞) = lim_{n→∞} F (−n) = PX (∩_{n=1}^{∞} (−∞, −n]) = PX (φ) = 0.
(ii) From calculus we know that any monotone function is either continuous on R or has at most countably many
discontinuities. Thus any c.d.f. F is either continuous on R or has at most countably many discontinuities.
Since, for any x ∈ R, F (x+) and F (x−) exist, F has only jump discontinuities (at which F (x) = F (x+) > F (x−)).
(iii) A distribution function F is continuous at a ∈ R iff F (a) = F (a−).
Example 0.49. Let G : R → R be defined by
G(x) =
 0, if x < 0,
 x/3, if 0 ≤ x < 1,
 1/2, if 1 ≤ x < 2,
 2/3, if 2 ≤ x < 3,
 1, if x ≥ 3.
(a) Show that G is d.f. of some r.v. X,
(b) Find P (X = a) for various values of a ∈ R,
(c) Find P (X < 3), P X ≥ 12 , P (2 < X ≤ 4), P (1 ≤ X < 2), P (2 ≤ X ≤ 3) and P 1
2 <X<3 .
Solution: (a) Clearly G is non-decreasing on (−∞, 0), (0, 1), (1, 2), (2, 3) and (3, ∞). Moreover,
G(0) − G(0−) = 0 ≥ 0, G(1) − G(1−) = 1/2 − 1/3 > 0,
G(2) − G(2−) = 2/3 − 1/2 > 0, G(3) − G(3−) = 1 − 2/3 > 0.
It follows that G is non-decreasing.
Clearly G is continuous (and hence right continuous) on (−∞, 0), (0, 1), (1, 2), (2, 3) and (3, ∞). Moreover,
G(0+) − G(0) = 0 − 0 = 0,
G(1+) − G(1) = 1/2 − 1/2 = 0,
G(2+) − G(2) = 2/3 − 2/3 = 0,
G(3+) − G(3) = 1 − 1 = 0,
so G is right continuous on R.
Also, G(∞) = lim_{x→∞} G(x) = 1 and G(−∞) = lim_{x→∞} G(−x) = 0. Thus G is a d.f. of some random variable X.
(b)
P (X = a) = G(a) − G(a−) = 0 ∀ a ∉ {1, 2, 3},
P (X = 1) = G(1) − G(1−) = 1/2 − 1/3 = 1/6,
P (X = 2) = G(2) − G(2−) = 2/3 − 1/2 = 1/6,
P (X = 3) = G(3) − G(3−) = 1 − 2/3 = 1/3.
(c)
P (X < 3) = G(3−) = 2/3,
P (X ≥ 1/2) = 1 − P (X < 1/2) = 1 − G((1/2)−) = 1 − 1/6 = 5/6,
P (2 < X ≤ 4) = G(4) − G(2) = 1 − 2/3 = 1/3,
P (1 ≤ X < 2) = G(2−) − G(1−) = 1/2 − 1/3 = 1/6,
P (2 ≤ X ≤ 3) = G(3) − G(2−) = 1 − 1/2 = 1/2,
P (1/2 < X < 3) = G(3−) − G(1/2) = 2/3 − 1/6 = 1/2.
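These jump and interval probabilities can be checked mechanically from G alone; a minimal sketch in exact arithmetic (the helper names are chosen here, and the left-limit table just hard-codes the three jump points of this particular G):

```python
from fractions import Fraction

def G(x):
    """The d.f. G of Example 0.49, in exact arithmetic."""
    x = Fraction(x)
    if x < 0:
        return Fraction(0)
    if x < 1:
        return x / 3
    if x < 2:
        return Fraction(1, 2)
    if x < 3:
        return Fraction(2, 3)
    return Fraction(1)

def G_left(x):
    """Left limit G(x-); only the jump points 1, 2, 3 need special care."""
    jumps = {1: Fraction(1, 3), 2: Fraction(1, 2), 3: Fraction(2, 3)}
    return jumps.get(x, G(x))

print(G(1) - G_left(1))               # 1/6 = P(X = 1)
print(G(3) - G_left(3))               # 1/3 = P(X = 3)
print(G_left(3) - G(Fraction(1, 2)))  # 1/2 = P(1/2 < X < 3)
```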
Let (Ω, F, P ) be a probability space and let X : Ω → R be a r.v. with induced probability space (R, B, PX ) and d.f.
F.
Definition 0.50. The r.v. X is said to be a discrete r.v. if there exists a countable set S (finite or infinite) such that
P (X = x) = F (x) − F (x−) > 0 ∀ x ∈ S, and P (X ∈ S) = 1.
The set S is called the support of r.v. X.
Remark 0.51. (i) If S is the support of a discrete r.v. X, then clearly
S = {x ∈ R : F (x) − F (x−) > 0} = set of discontinuity points of F.
Example 0.52. In Example 0.49 the set of discontinuity points of G is D = {1, 2, 3} and
Σ_{x∈D} [G(x) − G(x−)] = 1/6 + 1/6 + 1/3 = 2/3 < 1 =⇒ X is not a discrete r.v.
Definition 0.54. Let X be a discrete r.v. with c.d.f. FX and support SX. The function fX : R → R defined by
fX (x) =
 P (X = x) = FX (x) − FX (x−) > 0, if x ∈ SX,
 0, otherwise,
is called the probability mass function (p.m.f.) of X.
Whenever there is no ambiguity we will drop the subscript X in FX, SX and fX to represent the d.f. of X by F , the
support of X by S and the p.m.f. of X by f .
Remark 0.55. (i) Let X be a discrete r.v. with p.m.f. f and d.f. F . Then, for any A ⊆ R,
P (X ∈ A) = P (X ∈ A ∩ S) = Σ_{x∈A∩S} f (x), (A ∩ S ⊆ S and thus A ∩ S is a countable set).
(ii) Clearly a d.f. determines the p.m.f. uniquely and vice-versa. Thus it suffices to study the p.m.f. of a discrete r.v.
(iii) Let X be a discrete r.v. with p.m.f. f and support S. Then f : R → R satisfies
(a) f (x) > 0 ∀ x ∈ S, and (b) Σ_{x∈S} f (x) = 1.
Conversely, suppose that g : R → R is a function such that, for some countable set T ,
(a) g(x) > 0 ∀ x ∈ T and (b) Σ_{x∈T} g(x) = 1.
Then g is the p.m.f. of some discrete r.v. with support T .
We have seen in Example 0.53 that X is a discrete r.v with support S = {0, 1, 2, 3}. Then, the p.m.f. of X is
f : R → R, where
f (0) = F (0) − F (0−) = 1/8, f (1) = F (1) − F (1−) = 1/2 − 1/8 = 3/8,
f (2) = F (2) − F (2−) = 7/8 − 1/2 = 3/8 and f (3) = F (3) − F (3−) = 1 − 7/8 = 1/8.
Example 0.57. A fair die (all outcomes are equally likely) is tossed repeatedly and independently until a 6 is observed.
Let X denote the number of tosses required. Then X is a discrete r.v. with support S = {1, 2, 3, . . . }, p.m.f.
f (x) = P (X = x) =
 (5/6)^{x−1} (1/6), if x = 1, 2, 3, . . . ,
 0, otherwise,
and d.f.
F (x) =
 0, if x < 1,
 1/6, if 1 ≤ x < 2,
 11/36, if 2 ≤ x < 3,
 . . .
 Σ_{j=1}^{i} (5/6)^{j−1} (1/6) = 1 − (5/6)^i, if i ≤ x < i + 1, i = 1, 2, . . . .
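The closed form of the d.f. can be verified against partial sums of the p.m.f. in exact arithmetic (a minimal sketch; the function names are chosen here):

```python
from fractions import Fraction

def f(x):
    """p.m.f. of X = number of tosses of a fair die until the first 6."""
    if x < 1:
        return Fraction(0)
    return Fraction(5, 6) ** (x - 1) * Fraction(1, 6)

def F(x):
    """d.f. at an integer x >= 0: F(x) = 1 - (5/6)^x."""
    return 1 - Fraction(5, 6) ** x

print(F(2))                                      # 11/36, as in the notes
print(sum(f(x) for x in range(1, 11)) == F(10))  # True: partial sums match
```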
The function f (·) is called the probability density function (p.d.f.) of X. The support of the continuous r.v. X is the
set S = {x ∈ R : F (x + h) − F (x − h) > 0 ∀ h > 0}, that is, S = {x ∈ R : ∫_{x−h}^{x+h} f (t)dt > 0 ∀ h > 0}.
Remark 0.59. (i) From the fundamental theorem of calculus, we know that the definite integral
F (x) = ∫_{−∞}^{x} f (t)dt
is a continuous function on R. Thus the d.f. F of any continuous r.v. X is continuous everywhere on R. In particular,
P (X = x) = F (x) − F (x−) = 0 ∀ x ∈ R.
(ii) For −∞ < a < b < ∞,
P (a < X ≤ b) = F (b) − F (a) = ∫_{−∞}^{b} f (t)dt − ∫_{−∞}^{a} f (t)dt = ∫_{a}^{b} f (t)dt.
(iii) Let f (·) be the p.d.f. of a continuous r.v. X and let E ⊆ R be any countable subset of R. Define g : R → [0, ∞) by
g(x) =
 f (x), if x ∈ R ∩ E^c,
 Cx, if x ∈ E,
where the Cx ≥ 0 are arbitrary. Then
F (x) = ∫_{−∞}^{x} f (t)dt = ∫_{−∞}^{x} g(t)dt ∀ x ∈ R
and thus g is also a p.d.f. of X. Hence the p.d.f. of a continuous r.v. is not unique.
(iv) There are random variables that are neither discrete nor continuous (see Example 0.49). Such random variables
will not be studied here.
Remark 0.61. (i) The p.d.f. determines the d.f. uniquely. Converse is not true. However, the d.f. determines the p.d.f.
almost uniquely (they may vary on sets that have no length (or have zero content)). Thus it is enough to study the p.d.f.
of a continuous r.v.
(ii) Let X be a continuous r.v. with p.d.f. f . Then
(a) f (x) ≥ 0 ∀ x ∈ R and (b) ∫_{−∞}^{∞} f (t)dt = 1.
Solution: Let D be the set of discontinuity points of F . Then D = {1, 2, 5/2} ≠ φ =⇒ X is not a continuous r.v. Also,
Σ_{x∈D} [F (x) − F (x−)] = (1/3 − 1/4) + (3/4 − 2/3) + (1 − 15/16) = 11/48 < 1 =⇒ X is not a discrete r.v.
Solution: Clearly F is continuous everywhere. Moreover, F is differentiable everywhere except at the two points 1
and 2, and
F ′(x) =
 0, if x < 0,
 x, if 0 < x < 1,
 1/2, if 1 < x < 2,
 0, if x > 2.
Also, ∫_{−∞}^{∞} F ′(x)dx = ∫_{0}^{1} x dx + ∫_{1}^{2} (1/2) dx = 1 =⇒ X is a continuous r.v. with p.d.f.
f (x) =
 x, if 0 < x < 1,
 1/2, if 1 < x < 2,
 0, otherwise.
The support of X is
S = {x ∈ R : F (x + h) − F (x − h) > 0 ∀ h > 0} = {x ∈ R : ∫_{x−h}^{x+h} f (t)dt > 0 ∀ h > 0} = [0, 2].
Let (Ω, F, P ) be a probability space and let X : Ω → R be a r.v. with d.f. F , p.m.f. f and support S. Let h : R → R
be a given function. Define Z : Ω → R as
Z(ω) = h(X(ω)), ω ∈ Ω.
Then Z is a r.v. and it is a function of r.v. X. Since we are only interested in values of random variables X and Z and
not in the original probability space (Ω, F, P ), we simply write X(ω), ω ∈ Ω as X and Z(ω), ω ∈ Ω as Z.
We have F (x) = P (X ≤ x), f (x) = P (X = x), x ∈ R, P (X ∈ S) = 1 and P (X = x) > 0 for all x ∈ S.
Define T = h(S) = {h(x) : x ∈ S}. For any set A ⊆ R, define
h−1 (A) = {x ∈ S : h(x) ∈ A}.
Then T is a countable set. Also, P (Z = z) > 0 ∀ z ∈ T (since P (X = x) > 0 ∀ x ∈ S) and P (Z ∈ T ) = 1 (since
P (X ∈ S) = 1). It follows that Z is a discrete r.v. Moreover, for z ∈ T ,
P (Z = z) = P (h(X) = z) = Σ_{x∈S: h(x)=z} P (X = x) = Σ_{x∈h^{−1}({z})} f (x).
Theorem 0.66. Let X be a discrete r.v. with support S, d.f. F and p.m.f. f . Let h : R → R be a given function. Then
Z = h(X) is a discrete r.v. with support T = {h(x) : x ∈ S}, p.m.f.
g(z) =
 Σ_{x∈h^{−1}({z})} f (x), if z ∈ T,
 0, otherwise,
and d.f.
G(z) = P (Z ≤ z) = Σ_{t∈T: t≤z} g(t) = Σ_{x∈S: h(x)≤z} f (x) = Σ_{x∈h^{−1}((−∞,z])∩S} f (x).
Solution: Here the support of X is S = {−2, −1, 0, 1, 2, 3}. By Theorem 0.66, Y = X^2 is a discrete r.v. with support
T = {0, 1, 4, 9} and p.m.f.
g(z) = P (X^2 = z) =
 P (X = 0) = 1/7, if z = 0,
 P (X = −1) + P (X = 1) = 2/7, if z = 1,
 P (X = −2) + P (X = 2) = 5/14, if z = 4,
 P (X = 3) = 3/14, if z = 9,
 0, otherwise.
The d.f. of Y is
G(z) = P (Y ≤ z) =
 0, if z < 0,
 1/7, if 0 ≤ z < 1,
 3/7, if 1 ≤ z < 4,
 11/14, if 4 ≤ z < 9,
 1, if z ≥ 9.
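Computationally, Theorem 0.66 amounts to summing f (x) over each fiber h^{−1}({z}). A minimal sketch (the helper name is chosen here, and the individual P (X = x) values below are an assumed assignment consistent with the combined totals of this example):

```python
from fractions import Fraction
from collections import defaultdict

def pmf_of_function(pmf, h):
    """p.m.f. of Z = h(X) per Theorem 0.66: g(z) = sum of f(x) over h(x) = z."""
    g = defaultdict(Fraction)
    for x, p in pmf.items():
        g[h(x)] += p
    return dict(g)

# An assumed p.m.f. of X whose totals match Example 0.67:
f = {-2: Fraction(1, 7), -1: Fraction(1, 7), 0: Fraction(1, 7),
     1: Fraction(1, 7), 2: Fraction(3, 14), 3: Fraction(3, 14)}

g = pmf_of_function(f, lambda x: x * x)
print(g[4])   # 5/14 = P(X = -2) + P(X = 2)
print(g[1])   # 2/7 = P(X = -1) + P(X = 1)
```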
Example 0.68. In Example 0.67, directly find the d.f. of Y = X 2 (i.e. find d.f. of Y before finding the p.m.f. of Y ).
Hence find the p.m.f. of Y .
Solution: By Theorem 0.66, Y is a discrete r.v. with support T = {0, 1, 4, 9}. Thus the d.f. of Y is
G(z) = P (Y ≤ z) = P (X^2 ≤ z) =
 0, if z < 0,
 P (X^2 = 0), if 0 ≤ z < 1,
 P (X^2 = 0) + P (X^2 = 1), if 1 ≤ z < 4,
 P (X^2 = 0) + P (X^2 = 1) + P (X^2 = 4), if 4 ≤ z < 9,
 1, if z ≥ 9
=
 0, if z < 0,
 1/7, if 0 ≤ z < 1,
 3/7, if 1 ≤ z < 4,
 11/14, if 4 ≤ z < 9,
 1, if z ≥ 9.
The p.m.f. of Y is
g(z) =
 G(z) − G(z−), if z ∈ T,
 0, otherwise
=
 1/7, if z = 0,
 2/7, if z = 1,
 5/14, if z = 4,
 3/14, if z = 9,
 0, otherwise.
Thus,
G(z) =
 0, if z < h(a),
 ∫_{h(a)}^{z} f (h^{−1}(y)) |d h^{−1}(y)/dy| dy, if h(a) ≤ z < h(b),
 1, if z ≥ h(b).
Since f is continuous on (a, b) it follows that G is differentiable everywhere except possibly at z = h(a) and
z = h(b). Moreover,
G′(z) =
 f (h^{−1}(z)) |d h^{−1}(z)/dz|, if h(a) < z < h(b),
 0, otherwise,
and
∫_{−∞}^{∞} G′(z)dz = ∫_{h(a)}^{h(b)} f (h^{−1}(z)) |d h^{−1}(z)/dz| dz = ∫_{a}^{b} f (t)dt = 1.
Thus,
G(z) =
 0, if z < h(b),
 ∫_{h(b)}^{z} f (h^{−1}(y)) |d h^{−1}(y)/dy| dy, if h(b) ≤ z < h(a),
 1, if z ≥ h(a).
Since f is continuous on (a, b), it follows that G(·) is differentiable everywhere except possibly at h(a) and h(b).
Moreover,
G′(z) =
 f (h^{−1}(z)) |d h^{−1}(z)/dz|, if h(b) < z < h(a),
 0, otherwise,
and
∫_{−∞}^{∞} G′(z)dz = ∫_{h(b)}^{h(a)} f (h^{−1}(z)) |d h^{−1}(z)/dz| dz = ∫_{a}^{b} f (t)dt = 1.
The following theorem is a generalization of the above result and can be proved on similar lines.
Theorem 0.70. Let X be a continuous r.v. with p.d.f. f (·) and support S = ∪_{i∈Λ} [ai, bi], where Λ is a countable set
and the [ai, bi] are disjoint intervals. Suppose that {x ∈ R : f (x) > 0} = ∪_{i∈Λ} (ai, bi) and that f is continuous on
each (ai, bi), i ∈ Λ. Let h : R → R be a function that is differentiable and strictly monotone on each (ai, bi), i ∈ Λ (h
may be monotonically increasing on some (ai, bi) and monotonically decreasing on others), and let hj denote the
restriction of h to (aj, bj) with inverse function hj^{−1}(·), j ∈ Λ. Then Z = h(X) is a continuous r.v. with p.d.f.
g(z) = Σ_{j∈Λ} f (hj^{−1}(z)) |d hj^{−1}(z)/dz| I_{hj((aj,bj))}(z),
where I_{hj((aj,bj))}(z) = 1 if z ∈ hj((aj, bj)) and 0 otherwise.
Remark 0.71. Theorems 0.69 and 0.70 hold even in situations where the function h is differentiable everywhere except possibly at a finite number of points in S.
Example 0.72. Let X be a r.v. with p.d.f.
$$f(x) = \begin{cases} 3x^2, & 0 < x < 1,\\ 0, & \text{otherwise.} \end{cases}$$
Find the p.d.f. and d.f. of Y = 1/X². What is the support of the d.f. of Y?
Solution: The support of F is [0, 1] and {x ∈ R : f(x) > 0} = (0, 1). Moreover, f is continuous on (0, 1), h(x) = 1/x² is differentiable and strictly monotone (decreasing) on (0, 1), and h((0, 1)) = (1, ∞). Now
$$y = \frac{1}{x^2} \implies x = \frac{1}{\sqrt{y}}, \ \text{i.e., } h^{-1}(y) = \frac{1}{\sqrt{y}} \implies \frac{d}{dy} h^{-1}(y) = -\frac{1}{2y\sqrt{y}}, \quad y \in (1, \infty).$$
$$g(y) = f(h^{-1}(y)) \left| \frac{d}{dy} h^{-1}(y) \right| I_{h((0,1))}(y) = f\!\left( \frac{1}{\sqrt{y}} \right) \frac{1}{2y\sqrt{y}}\, I_{(1,\infty)}(y) = \begin{cases} \frac{3}{y} \cdot \frac{1}{2y\sqrt{y}} = \frac{3}{2y^2\sqrt{y}}, & \text{if } y > 1,\\ 0, & \text{otherwise.} \end{cases}$$
The d.f. of Y is
$$G(y) = \int_{-\infty}^{y} g(t)\, dt = \begin{cases} 0, & \text{if } y < 1,\\ \int_{1}^{y} \frac{3}{2t^2\sqrt{t}}\, dt = 1 - \frac{1}{y^{3/2}}, & \text{if } y \ge 1. \end{cases}$$
The support of the d.f. of Y is [1, ∞).
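As a quick numerical sanity check of the derived d.f., the sketch below samples X by inverse-CDF sampling (F(x) = x³ on (0, 1), so X = U^{1/3}) and compares the empirical d.f. of Y = 1/X² with 1 − y^{−3/2}; the sample size and seed are arbitrary choices:

```python
import random

random.seed(0)
n = 200_000
# X has d.f. F(x) = x^3 on (0,1); inverse-CDF sampling: X = U^(1/3), U in (0,1]
xs = [(1 - random.random()) ** (1 / 3) for _ in range(n)]
ys = [1 / x**2 for x in xs]

G = lambda y: 1 - y**-1.5 if y >= 1 else 0.0   # derived d.f. of Y
for y in (1.5, 2.0, 5.0):
    emp = sum(v <= y for v in ys) / n           # empirical P(Y <= y)
    print(y, round(emp, 3), round(G(y), 3))
```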
Example 0.73. Let X be a continuous r.v. with p.d.f.
$$f(x) = \begin{cases} \frac{|x|}{2}, & \text{if } -1 < x < 1,\\ \frac{x}{3}, & \text{if } 1 < x < 2,\\ 0, & \text{otherwise,} \end{cases}$$
and let Y = X².
(a) Find the p.d.f. of Y directly and hence find the d.f. of Y .
(b) Find the d.f. of Y and hence find the p.d.f. of Y .
(c) Find the support of d.f. of Y .
Solution: (a) The support of F is S = [−1, 2] and we may take S = [−1, 0] ∪ [0, 2], with {x ∈ R : f(x) > 0} = (−1, 0) ∪ (0, 2). The p.d.f. f is continuous on (−1, 0) ∪ (0, 1) ∪ (1, 2), h(x) = x² is differentiable on (−1, 0) ∪ (0, 2), and h(·) is strictly decreasing on (−1, 0) and strictly increasing on (0, 2).
h(x) = x² is strictly decreasing on S₁ = (−1, 0) with inverse function $h_1^{-1}(y) = -\sqrt{y}$, y ∈ (0, 1), and h(S₁) = (0, 1).
h(x) = x² is strictly increasing on S₂ = (0, 2) with inverse function $h_2^{-1}(y) = \sqrt{y}$, y ∈ (0, 4), and h(S₂) = (0, 4).
Thus, Y = X² is a continuous r.v. with p.d.f.
$$g(y) = f(h_1^{-1}(y)) \left| \frac{d}{dy} h_1^{-1}(y) \right| I_{(0,1)}(y) + f(h_2^{-1}(y)) \left| \frac{d}{dy} h_2^{-1}(y) \right| I_{(0,4)}(y)$$
$$= f(-\sqrt{y})\, \frac{1}{2\sqrt{y}}\, I_{(0,1)}(y) + f(\sqrt{y})\, \frac{1}{2\sqrt{y}}\, I_{(0,4)}(y) = \frac{1}{2\sqrt{y}} \left[ f(-\sqrt{y})\, I_{(0,1)}(y) + f(\sqrt{y})\, I_{(0,4)}(y) \right]$$
$$= \begin{cases} \frac{1}{2}, & \text{if } 0 < y < 1,\\ \frac{1}{6}, & \text{if } 1 < y < 4,\\ 0, & \text{otherwise.} \end{cases}$$
The d.f. of Y is
$$G(y) = P(X^2 \le y) = \int_{-\infty}^{y} g(t)\, dt = \begin{cases} 0, & \text{if } y < 0,\\ \int_{0}^{y} \frac{dt}{2} = \frac{y}{2}, & \text{if } 0 \le y < 1,\\ \int_{0}^{1} \frac{dt}{2} + \int_{1}^{y} \frac{dt}{6} = \frac{y+2}{6}, & \text{if } 1 \le y < 4,\\ 1, & \text{if } y \ge 4. \end{cases}$$
(b) We have
$$G(y) = P(X^2 \le y) = \begin{cases} 0, & \text{if } y < 0,\\ P(-\sqrt{y} \le X \le \sqrt{y}), & \text{if } y \ge 0. \end{cases}$$
For 0 ≤ y < 1,
$$G(y) = P(-\sqrt{y} \le X \le \sqrt{y}) = \int_{-\sqrt{y}}^{\sqrt{y}} \frac{|x|}{2}\, dx = \frac{y}{2}.$$
For 1 ≤ y < 4 (so that $-2 < -\sqrt{y} \le -1$ and $1 \le \sqrt{y} < 2$),
$$G(y) = P(-\sqrt{y} \le X \le \sqrt{y}) = \int_{-1}^{1} \frac{|x|}{2}\, dx + \int_{1}^{\sqrt{y}} \frac{x}{3}\, dx = \frac{y+2}{6}.$$
Clearly G is differentiable everywhere except at a finite number of points (0, 1 and 4) and we may take
$$G'(y) = \begin{cases} 1/2, & \text{if } 0 < y < 1,\\ 1/6, & \text{if } 1 < y < 4,\\ 0, & \text{otherwise.} \end{cases}$$
Moreover, $\int_{-\infty}^{\infty} G'(y)\, dy = \int_{0}^{1} \frac{1}{2}\, dy + \int_{1}^{4} \frac{1}{6}\, dy = 1$. Thus, Y is a continuous r.v. with p.d.f.
$$g(y) = \begin{cases} 1/2, & \text{if } 0 < y < 1,\\ 1/6, & \text{if } 1 < y < 4,\\ 0, & \text{otherwise.} \end{cases}$$
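The two constant densities found above can be checked by simulation. The sketch below samples X from the mixture form of its p.d.f. (component masses 1/4, 1/4, 1/2, with each component inverted in closed form) and compares the empirical d.f. of Y = X² with the derived G; the seed and sample size are arbitrary choices:

```python
import random
random.seed(1)

def sample_x():
    # mixture sampling from f(x) = |x|/2 on (-1,1), x/3 on (1,2)
    u, v = random.random(), random.random()
    if u < 0.25:
        return -v ** 0.5            # x in (-1,0), mass 1/4, |X| = sqrt(V)
    if u < 0.5:
        return v ** 0.5             # x in (0,1),  mass 1/4
    return (1 + 3 * v) ** 0.5       # x in (1,2),  mass 1/2, cdf (x^2-1)/3

n = 200_000
ys = [sample_x() ** 2 for _ in range(n)]
G = lambda y: y / 2 if y < 1 else (y + 2) / 6    # derived d.f. of Y on [0,4)
for y in (0.5, 2.5):
    emp = sum(v <= y for v in ys) / n
    print(y, round(emp, 3), round(G(y), 3))
```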
Let X be a discrete r.v. with p.m.f. f(·) and support S. For any x ∈ S, f(x) gives an idea of the proportion of times we will observe the event {X = x} if the experiment is repeated a large number of times. Thus $\sum_{x \in S} x f(x)$ represents the mean (or expected) value of the r.v. X when the experiment is repeated a large number of times.
Similarly, if X is a continuous r.v. with p.d.f. f(·), then $\int_{-\infty}^{\infty} x f(x)\, dx$ (provided the integral is finite) represents the mean (or expected) value of the r.v. X.
Definition 0.74. (a) Let X be a discrete r.v. with p.m.f. f(·) and support S. We say that the expected value of X (or the mean of X, denoted by E(X)) is finite and equals
$$E(X) = \sum_{x \in S} x f(x), \quad \text{provided } \sum_{x \in S} |x| f(x) < \infty.$$
(b) Let X be a continuous r.v. with p.d.f. f(·) and support S. We say that the expected value of X (or the mean of X, denoted by E(X)) is finite and equals
$$E(X) = \int_{-\infty}^{\infty} x f(x)\, dx, \quad \text{provided } \int_{-\infty}^{\infty} |x| f(x)\, dx < \infty.$$
Example 0.75. In each of the following cases, determine whether E(X) is finite and, if it is, find it:
(a) X is a discrete r.v. with p.m.f.
$$f(x) = \begin{cases} \frac{1}{2^x}, & \text{if } x \in \{1, 2, 3, \ldots\},\\ 0, & \text{otherwise;} \end{cases}$$
(b) X is a discrete r.v. with p.m.f.
$$f(x) = \begin{cases} \frac{3}{\pi^2 x^2}, & \text{if } x \in \{\pm 1, \pm 2, \ldots\},\\ 0, & \text{otherwise;} \end{cases}$$
(c) X is a continuous r.v. with p.d.f. $f(x) = \frac{e^{-|x|}}{2}$, −∞ < x < ∞;
(d) X is a continuous r.v. with p.d.f. $f(x) = \frac{1}{\pi(1+x^2)}$, −∞ < x < ∞.
Solution: (a) Here the support is S = {1, 2, . . . } and
$$\sum_{x \in S} |x| f(x) = \sum_{n=1}^{\infty} \frac{n}{2^n} = \sum_{n=1}^{\infty} a_n, \quad \text{where } a_n = \frac{n}{2^n} > 0, \ \forall\, n = 1, 2, \ldots, \ \text{and} \ \frac{a_{n+1}}{a_n} = \frac{n+1}{2n} \to \frac{1}{2} < 1, \ \text{as } n \to \infty.$$
Thus, by the ratio test, $\sum_{x \in S} |x| f(x) = \sum_{n=1}^{\infty} \frac{n}{2^n} < \infty$. It can be seen that E(X) = 2 (Exercise).
(b) Here the support of the distribution is S = {±1, ±2, . . . } and
$$\sum_{x \in S} |x| f(x) = \frac{6}{\pi^2} \sum_{n=1}^{\infty} \frac{1}{n} = \infty \implies E(X) \text{ is not finite.}$$
(c) We have
$$\int_{-\infty}^{\infty} |x| f(x)\, dx = \int_{-\infty}^{\infty} |x|\, \frac{e^{-|x|}}{2}\, dx = \int_{0}^{\infty} x e^{-x}\, dx = 1 < \infty \implies E(X) \text{ is finite,}$$
and
$$E(X) = \int_{-\infty}^{\infty} x f(x)\, dx = \int_{-\infty}^{\infty} x\, \frac{e^{-|x|}}{2}\, dx = 0.$$
(d) We have
$$\int_{-\infty}^{\infty} |x| f(x)\, dx = \int_{-\infty}^{\infty} \frac{|x|}{\pi(1+x^2)}\, dx = \frac{2}{\pi} \int_{0}^{\infty} \frac{x}{1+x^2}\, dx = \infty \implies E(X) \text{ is not finite.}$$
Example 0.76 (St. Petersburg Paradox). To make some money, a gambler plays a sequence of fair games with the following strategy:
In the first game he bets Rs. 1 million. If the first bet is lost, he doubles his bet in the second game. He keeps on doubling his bet until he wins a game. Thus, if the gambler has not won by the mth game, he bets Rs. $2^m$ million in the (m + 1)th game. If he wins the kth game, his net gain is $2^{k-1} - (1 + 2 + \cdots + 2^{k-2}) =$ Rs. 1 million.
The above scheme seems to be foolproof for earning Rs. 1 million. By this logic all gamblers should be billionaires!
Let X be the amount of money bet on the last game (the game he wins). Then
$$P(X = 2^k) = \frac{1}{2^{k+1}}, \quad k = 0, 1, 2, \ldots, \qquad E(X) = \sum_{k=0}^{\infty} \frac{2^k}{2^{k+1}} = \infty \quad (E(X) \text{ is not finite}).$$
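The divergence of E(X) here is visible term by term: every summand 2^k · P(X = 2^k) equals 1/2, so the truncated mean grows linearly in the truncation point. A minimal exact check with rational arithmetic:

```python
from fractions import Fraction as F

# each term 2^k * P(X = 2^k) = 2^k / 2^(k+1) = 1/2, so truncating at k = m
# gives (m + 1)/2, which diverges as m -> infinity
def truncated_mean(m):
    return sum(F(2**k, 2**(k + 1)) for k in range(m + 1))

for m in (1, 9, 99):
    print(m, truncated_mean(m))   # 1 1, 9 5, 99 50
```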
Proof. We will provide the proof for the case when X is a continuous r.v. with p.d.f., say f. We have
$$E(X) = \int_{-\infty}^{\infty} x f(x)\, dx = \int_{-\infty}^{0} x f(x)\, dx + \int_{0}^{\infty} x f(x)\, dx$$
$$= -\int_{-\infty}^{0} \int_{x}^{0} f(x)\, dy\, dx + \int_{0}^{\infty} \int_{0}^{x} f(x)\, dy\, dx$$
$$= -\int_{-\infty}^{0} \int_{-\infty}^{y} f(x)\, dx\, dy + \int_{0}^{\infty} \int_{y}^{\infty} f(x)\, dx\, dy = -\int_{-\infty}^{0} P(X < y)\, dy + \int_{0}^{\infty} P(X > y)\, dy.$$
(c) Suppose that P(X ∈ {0, 1, 2, . . . }) = 1. Then $E(X) = \sum_{n=1}^{\infty} P(X \ge n)$.
Proof. Exercise.
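The tail-sum identity in (c) is easy to verify numerically for a concrete distribution. For X with p.m.f. f(x) = 1/2^x on {1, 2, . . . } (as in Example 0.75(a)), P(X ≥ n) = 2^{1−n}, and both sides approach E(X) = 2:

```python
# identity check: E(X) = sum_{n>=1} P(X >= n) for X with f(x) = 1/2^x, x = 1, 2, ...
N = 60                                                     # truncation point
mean_direct = sum(x / 2**x for x in range(1, N + 1))       # sum x f(x)
tail_sum    = sum(2**(1 - n) for n in range(1, N + 1))     # sum P(X >= n) = 2^(1-n)
print(mean_direct, tail_sum)    # both approach E(X) = 2
```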
The following theorem shows that for any r.v. X and any function h : R → R, E(h(X)) can be found directly using the p.m.f./p.d.f. of X.
Theorem 0.79. (a) Let X be a discrete r.v. with p.m.f. f(·) and support S. Let h : R → R be a given function and let Z = h(X). Then
$$E(Z) = \sum_{x \in S} h(x) f(x), \quad \text{provided } \sum_{x \in S} |h(x)| f(x) < \infty.$$
(b) Let X be a continuous r.v. with p.d.f. f(·) and let h : R → R be a given function. If Z = h(X), then
$$E(Z) = \int_{-\infty}^{\infty} h(x) f(x)\, dx, \quad \text{provided } \int_{-\infty}^{\infty} |h(x)| f(x)\, dx < \infty.$$
Proof. We will provide the proof of (a) only; the proof of (b) follows on similar lines. The support of Z = h(X) is T = h(S). We have
$$E(Z) = \sum_{t \in T} t\, P(Z = t) = \sum_{t \in T} t\, P(h(X) = t) = \sum_{t \in T} t \sum_{\{x \in S : h(x) = t\}} P(X = x)$$
$$= \sum_{t \in T} \sum_{\{x \in S : h(x) = t\}} h(x)\, P(X = x) = \sum_{x \in S} h(x)\, P(X = x),$$
since the sets {x ∈ S : h(x) = t}, t ∈ T, partition S.
Example 0.80. (a) Let X be a discrete r.v. with p.m.f.
$$f(x) = \begin{cases} \frac{1}{6}, & \text{if } x \in \{-2, -1, 0, 1, 2, 3\},\\ 0, & \text{otherwise.} \end{cases}$$
Find E(X²).
(b) Let the r.v. X have the p.d.f.
$$f(x) = \begin{cases} 2x, & \text{if } 0 < x < 1,\\ 0, & \text{otherwise.} \end{cases}$$
Find E(X³).
Solution: (a) $E(X^2) = \sum_{x \in S} x^2 f(x) = 4 \cdot \frac{1}{6} + 1 \cdot \frac{1}{6} + 0 \cdot \frac{1}{6} + 1 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 9 \cdot \frac{1}{6} = \frac{19}{6}.$
(b) $E(X^3) = \int_{-\infty}^{\infty} x^3 f(x)\, dx = 2 \int_{0}^{1} x^4\, dx = \frac{2}{5}.$
Theorem 0.81. Let X be a discrete or continuous r.v. with p.m.f./p.d.f. f and support S. Let $h_i : \mathbb{R} \to \mathbb{R}$, i = 1, 2, . . . , m, be given functions.
(a) Then, for real constants c₁, c₂, . . . , c_m,
$$E\left( \sum_{i=1}^{m} c_i h_i(X) \right) = \sum_{i=1}^{m} c_i E(h_i(X)).$$
In particular, if E(X) is finite and P(a ≤ X ≤ b) = 1 for some real constants a and b (a < b), then a ≤ E(X) ≤ b.
(c) If P(X ≥ 0) = 1 and E(X) = 0, then P(X = 0) = 1.
(d) If E(X) is finite, then |E(X)| ≤ E(|X|).
(e) Let a and b be two real constants. Then,
Proof. The proofs of (a), (b) and (e) follow from the definition of the expectation of a r.v.
(c) We will provide the proof for the case when X is a continuous r.v. Then
$$P(X > 0) = P\left( \bigcup_{n=1}^{\infty} \left\{ X \ge \frac{1}{n} \right\} \right) = \lim_{n \to \infty} P\left( X \ge \frac{1}{n} \right) \quad \left( \text{since } \left\{ X \ge \tfrac{1}{n} \right\} \uparrow \right)$$
$$= \lim_{n \to \infty} \int_{1/n}^{\infty} f(x)\, dx \le \lim_{n \to \infty} \int_{1/n}^{\infty} n x f(x)\, dx \quad (x \in [1/n, \infty) \implies nx \ge 1)$$
$$\le \lim_{n \to \infty} n \int_{0}^{\infty} x f(x)\, dx = \lim_{n \to \infty} [\, n E(X) \,] = 0 \implies P(X = 0) = 1.$$
(d) We have
$$-|X| \le X \le |X| \implies E(-|X|) \le E(X) \le E(|X|) \implies |E(X)| \le E(|X|).$$
This completes the proof.
called the standard deviation of X (the positive square root of the variance of the r.v. X).
Remark 0.82. (i) $\mathrm{Var}(X) = \sigma^2 = E(X - \mu_1')^2 = E(X^2 - 2\mu_1' X + (\mu_1')^2) = E(X^2) - 2(\mu_1')^2 + (\mu_1')^2 = E(X^2) - (E(X))^2$.
Theorem 0.83. Let X be a r.v. such that E(|X|s ) < ∞, for some s > 0. Then, E(|X|r ) < ∞, ∀ 0 < r < s.
Proof. Note that |X|r ≤ max{|X|s , 1} ≤ |X|s + 1. This implies that E(|X|r ) ≤ E(|X|s + 1) = E(|X|s ) + 1 < ∞.
Thus, the result follows.
(iv) The name m.g.f. for the transform M_X is motivated by the fact that M_X can be used to generate the moments of the r.v., as illustrated in the following theorem.
Theorem 0.86. Let X be a r.v. with m.g.f. M_X that is finite on (−h, h), for some h > 0. Then,
(a) for each r ∈ {1, 2, . . . }, $\mu_r' = E(X^r)$ is finite;
(b) for each r ∈ {1, 2, . . . }, $\mu_r' = E(X^r) = M_X^{(r)}(0)$, where
$$M_X^{(r)}(0) = \left[ \frac{d^r}{dt^r} M_X(t) \right]_{t=0}, \ \text{the } r\text{th derivative of } M_X \text{ at the point } 0;$$
(c) $M_X(t) = \sum_{r=0}^{\infty} \mu_r' \frac{t^r}{r!}$, t ∈ (−h, h), so that $\mu_r'$ equals the coefficient of $\frac{t^r}{r!}$ (r = 1, 2, . . . ) in the Maclaurin series expansion of M_X(t) around t = 0.
$$\implies \int_{-\infty}^{0} e^{-t|x|} f(x)\, dx < \infty \ \forall\, t \in (-h, h) \ \text{ and } \ \int_{0}^{\infty} e^{t|x|} f(x)\, dx < \infty \ \forall\, t \in (-h, h)$$
$$\implies \int_{-\infty}^{0} e^{|t||x|} f(x)\, dx < \infty \ \forall\, t \in (-h, h) \ \text{ and } \ \int_{0}^{\infty} e^{|t||x|} f(x)\, dx < \infty \ \forall\, t \in (-h, h)$$
$$\implies \int_{-\infty}^{\infty} e^{|tx|} f(x)\, dx < \infty \ \forall\, t \in (-h, h);$$
(b) We have $M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\, dx$ and $M_X^{(r)}(t) = \frac{d^r}{dt^r} \int_{-\infty}^{\infty} e^{tx} f(x)\, dx$, r = 1, 2, . . . . Using arguments of advanced calculus, it can be shown that if $M_X(t) = E(e^{tX}) < \infty\ \forall\, t \in (-h, h)$, then the derivative can be passed through the integral sign. Therefore,
$$M_X^{(r)}(t) = \int_{-\infty}^{\infty} \frac{d^r}{dt^r}\left( e^{tx} \right) f(x)\, dx = \int_{-\infty}^{\infty} x^r e^{tx} f(x)\, dx, \quad r = 1, 2, \ldots,$$
and
$$M_X^{(r)}(0) = \int_{-\infty}^{\infty} x^r f(x)\, dx = E(X^r).$$
(c) We have
$$M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\, dx = \int_{-\infty}^{\infty} \left( \sum_{r=0}^{\infty} \frac{t^r x^r}{r!} \right) f(x)\, dx.$$
Under the assumption that $M_X(t) = E(e^{tX}) < \infty\ \forall\, t \in (-h, h)$, using arguments of advanced calculus, it can be shown that the summation sign can be passed through the integral sign. Thus,
$$M_X(t) = \sum_{r=0}^{\infty} \frac{t^r}{r!} \int_{-\infty}^{\infty} x^r f(x)\, dx = \sum_{r=0}^{\infty} \frac{t^r}{r!} E(X^r), \quad t \in (-h, h).$$
Example 0.87. (a) Let X be a discrete r.v. with p.m.f.
$$f_X(x) = \begin{cases} \frac{e^{-\lambda} \lambda^x}{x!}, & \text{if } x \in \{0, 1, 2, \ldots\},\\ 0, & \text{otherwise,} \end{cases}$$
where λ > 0 (the Poisson distribution). Show that the m.g.f. of X exists and is finite on the whole of R. Find M_X(t), the mean and variance of X, and E(X³).
(b) Let X be a continuous r.v. with p.d.f.
(
λe−λx , x > 0,
fX (x) =
0, otherwise,
where λ > 0. Find m.g.f., mean, variance of X and E(X r ), r = 1, 2, . . . (provided they exist).
1
(c) Let X be a continuous r.v. having the p.d.f. f (x) = , −∞ < x < ∞ (called Cauchy p.d.f. and
π(1 + x2 )
corresponding probability distribution is called Cauchy distribution). Show that the m.g.f. of X does not exist.
Alternatively, for t ∈ R,
$$M_X(t) = e^{\lambda(e^t - 1)} = 1 + \lambda(e^t - 1) + \frac{\lambda^2 (e^t - 1)^2}{2!} + \frac{\lambda^3 (e^t - 1)^3}{3!} + \cdots$$
$$= 1 + \lambda \sum_{j=1}^{\infty} \frac{t^j}{j!} + \frac{\lambda^2}{2!} \left( \sum_{j=1}^{\infty} \frac{t^j}{j!} \right)^{\!2} + \frac{\lambda^3}{3!} \left( \sum_{j=1}^{\infty} \frac{t^j}{j!} \right)^{\!3} + \cdots$$
$$= 1 + \lambda t + t^2 \left( \frac{\lambda}{2!} + \frac{\lambda^2}{2!} \right) + t^3 \left( \frac{\lambda}{3!} + \frac{2\lambda^2}{(2!)^2} + \frac{\lambda^3}{3!} \right) + \cdots$$
Thus, $\mu_1' = E(X) = \lambda$, $\mu_2' = E(X^2) = 2!\left( \frac{\lambda}{2!} + \frac{\lambda^2}{2!} \right) = \lambda + \lambda^2$, so that $\mathrm{Var}(X) = \lambda + \lambda^2 - \lambda^2 = \lambda$, and $E(X^3) = 3!\left( \frac{\lambda}{3!} + \frac{2\lambda^2}{(2!)^2} + \frac{\lambda^3}{3!} \right) = \lambda + 3\lambda^2 + \lambda^3$.
(c) Since E(X) is not finite, the m.g.f. of X does not exist.
Definition 0.89 (Equality in Distribution). Let X and Y be two r.v.'s with d.f.'s F_X and F_Y, respectively. We say that X and Y have the same distribution (written as $X \stackrel{d}{=} Y$) if F_X(x) = F_Y(x), ∀ x ∈ R.
Remark 0.90. (i) Let X and Y be two discrete r.v.'s with p.m.f.'s f_X and f_Y, respectively. Then
$$X \stackrel{d}{=} Y \iff f_X(x) = f_Y(x)\ \forall\, x \in \mathbb{R}.$$
(ii) Let X and Y be two continuous r.v.'s. Then $X \stackrel{d}{=} Y$ iff there exist versions of the p.d.f.'s f_X and f_Y of X and Y, respectively, such that f_X(x) = f_Y(x), ∀ x ∈ R.
(iii) Suppose $X \stackrel{d}{=} Y$; then for any Borel measurable function h : R → R, $h(X) \stackrel{d}{=} h(Y)$ and hence E(h(X)) = E(h(Y)).
Theorem 0.91. Let X and Y be r.v.'s such that, for some c > 0, M_X(t) = M_Y(t), ∀ t ∈ (−c, c). Then $X \stackrel{d}{=} Y$.
Proof. Special Case: Suppose that X and Y are discrete r.v.'s with support S_X = S_Y = {1, 2, . . . }, p_k = P(X = k) and q_k = P(Y = k), k = 1, 2, . . . . Then
$$M_X(t) = M_Y(t)\ \forall\, t \in (-c, c) \implies \sum_{k=1}^{\infty} e^{tk} p_k = \sum_{k=1}^{\infty} e^{tk} q_k\ \forall\, t \in (-c, c)$$
$$\implies \sum_{k=1}^{\infty} \Lambda^k p_k = \sum_{k=1}^{\infty} \Lambda^k q_k\ \forall\, \Lambda \in (e^{-c}, e^c) \quad (\text{putting } \Lambda = e^t)$$
$$\implies p_k = q_k\ \forall\, k = 1, 2, \ldots,$$
since if two power series are equal over an interval then their coefficients are the same. Thus, $X \stackrel{d}{=} Y$.
Example 0.92. For any p ∈ (0, 1) and positive integer n, let X_{p,n} be a discrete r.v. with p.m.f.
$$f_{p,n}(x) = \begin{cases} \binom{n}{x} p^x (1-p)^{n-x}, & \text{if } x \in \{0, 1, \ldots, n\},\\ 0, & \text{otherwise.} \end{cases}$$
(Such a r.v. or probability distribution is called a binomial r.v. or distribution with n trials and probability of success p.) Define $Y_{p,n} = n - X_{p,n}$. Using the m.g.f. of $X_{p,n}$, show that $Y_{p,n} \stackrel{d}{=} X_{1-p,n}$. Find $E(X_{1/2,n})$.
Solution: We have
$$M_{X_{p,n}}(t) = E\!\left( e^{tX_{p,n}} \right) = \sum_{x=0}^{n} e^{tx} \binom{n}{x} p^x (1-p)^{n-x} = \sum_{x=0}^{n} \binom{n}{x} (pe^t)^x (1-p)^{n-x} = (1 - p + pe^t)^n, \quad t \in \mathbb{R}.$$
Now
$$M_{Y_{p,n}}(t) = E\!\left( e^{tY_{p,n}} \right) = E\!\left( e^{t(n - X_{p,n})} \right) = e^{tn} M_{X_{p,n}}(-t) = e^{tn} (1 - p + pe^{-t})^n = (p + (1-p)e^t)^n = M_{X_{1-p,n}}(t), \quad t \in \mathbb{R}.$$
Thus, $Y_{p,n} \stackrel{d}{=} X_{1-p,n}$.
Alternatively, for y ∈ {0, 1, . . . , n},
$$P(Y_{p,n} = y) = P(X_{p,n} = n - y) = \binom{n}{n-y} p^{n-y} (1-p)^{y} = \binom{n}{y} (1-p)^{y} p^{n-y} = f_{1-p,n}(y).$$
Thus, $Y_{p,n} \stackrel{d}{=} X_{1-p,n}$.
Now, for p = 1/2, $X_{1/2,n} \stackrel{d}{=} n - X_{1/2,n}$. Thus, $E(X_{1/2,n}) = E(n - X_{1/2,n}) = n - E(X_{1/2,n}) \implies E(X_{1/2,n}) = n/2$.
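The reflection identity can be checked exactly with rational arithmetic: reversing the p.m.f. of $X_{p,n}$ (which is the p.m.f. of $n - X_{p,n}$) reproduces the p.m.f. of $X_{1-p,n}$. The particular p and n below are arbitrary choices:

```python
from fractions import Fraction as F
from math import comb

def pmf(p, n):
    # binomial p.m.f. f_{p,n}(x), x = 0, 1, ..., n
    return [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

p, n = F(1, 3), 5
fX = pmf(p, n)                 # p.m.f. of X_{p,n}
fY = list(reversed(fX))        # p.m.f. of Y = n - X_{p,n}: P(Y = y) = P(X = n - y)
print(fY == pmf(1 - p, n))     # True: Y has the same distribution as X_{1-p,n}
```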
Example 0.93. Let X be a r.v. with p.d.f. $f_X(x) = \frac{e^{-|x|}}{2}$, −∞ < x < ∞, and let Y = −X. Show that $Y \stackrel{d}{=} X$ and hence show that E(X) = 0.
Solution: We have
$$M_Y(t) = E(e^{tY}) = E(e^{-tX}) = \int_{-\infty}^{\infty} e^{-tx}\, \frac{e^{-|x|}}{2}\, dx = \int_{-\infty}^{\infty} e^{tx}\, \frac{e^{-|x|}}{2}\, dx = M_X(t) \quad \forall\, t \in (-1, 1).$$
Here,
$$M_X(t) = \int_{-\infty}^{\infty} e^{tx}\, \frac{e^{-|x|}}{2}\, dx = \int_{-\infty}^{0} e^{tx}\, \frac{e^{x}}{2}\, dx + \int_{0}^{\infty} e^{tx}\, \frac{e^{-x}}{2}\, dx = \frac{1}{2} \left[ \int_{0}^{\infty} e^{-(1+t)x}\, dx + \int_{0}^{\infty} e^{-(1-t)x}\, dx \right]$$
$$= \frac{1}{2} \left[ \frac{1}{1+t} + \frac{1}{1-t} \right] = \frac{1}{1 - t^2} \quad \forall\, t \in (-1, 1) \implies X \stackrel{d}{=} Y.$$
Alternatively,
$$f_Y(y) = f_X(-y) = \frac{e^{-|y|}}{2} = f_X(y) \quad \forall\, -\infty < y < \infty \implies X \stackrel{d}{=} Y.$$
Thus, $E(Y) = E(X) \implies E(-X) = E(X) \implies E(X) = 0$ (since $\int_{-\infty}^{\infty} |x| f_X(x)\, dx < \infty$).
0.11. Inequalities
Inequalities provide estimates of probabilities when they cannot be evaluated precisely.
Theorem 0.94. Let X be a r.v. and let g : R → R be a non-negative function such that E(g(X)) is finite. Then, for any c > 0,
$$P(g(X) \ge c) \le \frac{E(g(X))}{c}.$$
Corollary 0.95. (a) Let g : [0, ∞) → R be a non-negative and strictly increasing function such that E(g(|X|)) is finite. Then, for any c > 0 such that g(c) > 0,
$$P(|X| \ge c) \le \frac{E(g(|X|))}{g(c)}.$$
(b) For any t > 0 and r > 0,
$$P(|X| \ge t) \le \frac{E(|X|^r)}{t^r} \quad (\text{Markov's inequality}),$$
provided $E(|X|^r) < \infty$. In particular, $P(|X| \ge t) \le \frac{E(|X|)}{t}$, provided E(|X|) < ∞.
(b) We take g(x) = x^r, x ≥ 0, r > 0. Then g is non-negative and strictly increasing on [0, ∞). Using (a), we get
$$P(|X| \ge t) \le \frac{E(g(|X|))}{g(t)} = \frac{E(|X|^r)}{t^r}.$$
Example 0.97 (The above bounds are sharp). Let X be a r.v. with p.m.f.
$$f(x) = \begin{cases} \frac{1}{8}, & \text{if } x \in \{-1, 1\},\\ \frac{3}{4}, & \text{if } x = 0,\\ 0, & \text{otherwise.} \end{cases}$$
Then $E(X^2) = \frac{1}{4}$ and $P(|X| \ge 1) = \frac{1}{4}$. Using Markov's inequality with r = 2 and t = 1, $P(|X| \ge 1) \le \frac{E(X^2)}{1^2} = \frac{1}{4}$, so the bound is attained.
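The sharpness claim is a two-line exact computation; a minimal check with rational arithmetic:

```python
from fractions import Fraction as F

f = {-1: F(1, 8), 0: F(3, 4), 1: F(1, 8)}        # p.m.f. of Example 0.97
EX2  = sum(x * x * p for x, p in f.items())       # E(X^2)
tail = sum(p for x, p in f.items() if abs(x) >= 1)  # P(|X| >= 1)
print(EX2, tail)   # 1/4 1/4 -- Markov's bound holds with equality here
```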
Example 0.98. Let X be a r.v. with p.d.f.
$$f(x) = \begin{cases} \frac{1}{2\sqrt{3}}, & \text{if } -\sqrt{3} < x < \sqrt{3},\\ 0, & \text{otherwise.} \end{cases}$$
Then
$$\mu = E(X) = \int_{-\sqrt{3}}^{\sqrt{3}} \frac{x}{2\sqrt{3}}\, dx = 0, \qquad \sigma^2 = E(X^2) = \int_{-\sqrt{3}}^{\sqrt{3}} \frac{x^2}{2\sqrt{3}}\, dx = 1,$$
and
$$P\left( |X| \ge \frac{3}{2} \right) = 1 - \int_{-3/2}^{3/2} \frac{1}{2\sqrt{3}}\, dx = 1 - \frac{\sqrt{3}}{2} \approx 0.134.$$
The function ψ(·) is said to be strictly convex if the above inequality is strict.
Proof. We give the proof for the special case where ψ is twice differentiable on (a, b), so that ψ′′(x) ≥ 0, ∀ x ∈ (a, b). Let μ = E(X). Expanding ψ(x) in a Taylor series about μ, we get
$$\psi(x) = \psi(\mu) + (x - \mu)\psi'(\mu) + \frac{(x - \mu)^2}{2!}\, \psi''(\xi), \quad \forall\, x \in (a, b),$$
for some ξ between μ and x. Thus,
$$\psi(x) \ge \psi(\mu) + (x - \mu)\psi'(\mu) \implies E(\psi(X)) \ge E\left( \psi(\mu) + (X - \mu)\psi'(\mu) \right) = \psi(\mu) = \psi(E(X)).$$
Example 0.103. Let a₁, a₂, . . . , a_n, w₁, w₂, . . . , w_n be positive constants such that $\sum_{i=1}^{n} w_i = 1$. Prove the AM-GM-HM inequality
$$\sum_{i=1}^{n} a_i w_i \ \ge\ \prod_{i=1}^{n} a_i^{w_i} \ \ge\ \frac{1}{\sum_{i=1}^{n} \frac{w_i}{a_i}} \qquad (AM \ge GM \ge HM).$$
Solution: Consider a discrete r.v. X with P(X = a_i) = w_i, i = 1, . . . , n, and the convex function ψ(x) = −ln x on (0, ∞). By Jensen's inequality,
$$E(\psi(X)) \ge \psi(E(X)) \implies E(-\ln X) \ge -\ln E(X) \implies -\sum_{i=1}^{n} (\ln a_i) w_i \ge -\ln\left( \sum_{i=1}^{n} a_i w_i \right)$$
$$\implies \ln\left( \sum_{i=1}^{n} a_i w_i \right) \ge \ln\left( \prod_{i=1}^{n} a_i^{w_i} \right) \implies \sum_{i=1}^{n} a_i w_i \ge \prod_{i=1}^{n} a_i^{w_i}.$$
Replacing the a_i's by 1/a_i's, we get
$$\sum_{i=1}^{n} \frac{w_i}{a_i} \ge \prod_{i=1}^{n} a_i^{-w_i} = 1 \Big/ \prod_{i=1}^{n} a_i^{w_i}.$$
Therefore,
$$\sum_{i=1}^{n} a_i w_i \ \ge\ \prod_{i=1}^{n} a_i^{w_i} \ \ge\ \frac{1}{\sum_{i=1}^{n} \frac{w_i}{a_i}}.$$
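The three means can be compared directly on arbitrary positive data. A small numeric sketch (the particular values, seed, and sample size are arbitrary choices):

```python
import random, math
random.seed(2)

a = [random.uniform(0.5, 10) for _ in range(6)]   # positive constants a_i
w = [random.random() for _ in range(6)]
s = sum(w)
w = [x / s for x in w]                            # weights w_i summing to 1

am = sum(ai * wi for ai, wi in zip(a, w))         # weighted arithmetic mean
gm = math.prod(ai**wi for ai, wi in zip(a, w))    # weighted geometric mean
hm = 1 / sum(wi / ai for ai, wi in zip(a, w))     # weighted harmonic mean
print(round(am, 4), round(gm, 4), round(hm, 4))   # AM >= GM >= HM
```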
Let X be a r.v. defined on a probability space (Ω, F, P) associated with a random experiment E. Let F_X(·) be its distribution function and f_X(·) be its p.m.f./p.d.f.
The probability distribution of X (i.e., its p.m.f./p.d.f.) describes the manner in which the r.v. X takes values in various sets. It may be desirable to have a set of numerical measures that summarize the prominent features of the probability distribution of X. We call these measures descriptive measures. Four prominently used descriptive measures are:
(1) Measures of Central Tendency or Location (also called Averages):
These give an idea of the central value of the probability distribution around which the values of the r.v. X are clustered. Commonly used measures of central tendency are:
(a) Mean:
$$\mu = \mu_1' = E(X) = \int_{-\infty}^{\infty} x f_X(x)\, dx \ \text{ or } \ \sum_{x \in S_X} x f_X(x) \quad (\text{may or may not exist}).$$
Whenever it exists, it gives an idea of the average observed value of X when E is repeated a large number of times.
Note that if the distribution of X is symmetric about μ (i.e., $X - \mu \stackrel{d}{=} \mu - X$), then E(X) = μ, provided it exists.
The mean seems to be the measure of central tendency best suited to symmetric distributions. Because of its simplicity, the mean is the most commonly used average. However, the mean may be affected by a few extreme values, and it may not be defined.
(b) Median:
Before defining the median we first introduce the concept of the quantile function, or quantile.
The quantile function of the r.v. X is a function Q_X : (0, 1) → R defined by
$$Q_X(p) = \inf\{ x \in \mathbb{R} : F_X(x) \ge p \}, \quad p \in (0, 1).$$
For a fixed p ∈ (0, 1), the quantity ξ_p = Q_X(p) is called the quantile of order p. Note that
since the positive mass at large values of X pulls up the value of the mean μ.
Negatively Skewed Distributions:
· Have more probability mass on the left side of the p.d.f./p.m.f.
· Have longer tails on the left side of the p.d.f.
For unimodal negatively skewed distributions, normally mean ≤ median ≤ mode.
$$\text{Yule coefficient of skewness} = \beta_2 = \frac{(q_3 - m) - (m - q_1)}{q_3 - q_1} = \frac{q_3 - 2m + q_1}{q_3 - q_1} \quad (\text{independent of units}).$$
Clearly β₂ > 0 (β₂ < 0) for positively (negatively) skewed distributions, and β₂ = 0 for symmetric distributions.
(4) Measures of Kurtosis:
For μ ∈ R and σ > 0, let Y_{μ,σ} be a r.v. having p.d.f.
$$f_{Y_{\mu,\sigma}}(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty \qquad (\text{normal distribution, } Y_{\mu,\sigma} \sim N(\mu, \sigma^2)).$$
It can be shown that:
· $E(Y_{\mu,\sigma}) = \mu$, $\mathrm{Var}(Y_{\mu,\sigma}) = \sigma^2$;
· $Y_{\mu,\sigma} - \mu \stackrel{d}{=} \mu - Y_{\mu,\sigma}$, and hence β₁ = 0, and $E((Y_{\mu,\sigma} - \mu)^4) = 3\sigma^4$;
· $f_{Y_{\mu,\sigma}}(\cdot)$ is unimodal and symmetric.
Kurtosis of the probability distribution of X is a measure of the peakedness and the thickness of the tails of the p.m.f./p.d.f. of X relative to those of the normal distribution.
A distribution is said to have higher (lower) kurtosis than the normal distribution if its p.m.f./p.d.f., in comparison with the p.d.f. of a normal distribution, has a sharper (more rounded) peak and longer, fatter (shorter, thinner) tails.
Define $Z = \frac{X - \mu}{\sigma}$ (independent of units) and
$$\nu_1 = E(Z^4) = \frac{E((X - \mu)^4)}{\sigma^4} = \frac{\mu_4}{\mu_2^2} \ \to\ \text{kurtosis of the probability distribution of } X.$$
ν1 is used as a measure of kurtosis for unimodal distributions. For N (µ, σ 2 ) distribution, ν1 = 3. The quantity
ν2 = ν1 − 3 is called the excess kurtosis of the distribution of X. Obviously for normal distributions, ν2 = 0.
Mesokurtic distributions: Distributions with ν2 = 0,
Leptokurtic distributions: Distributions with ν2 > 0 (has sharper peak and longer, fatter tails).
Platykurtic distributions: Distributions with ν2 < 0 (has rounded peak and shorter, thinner tails).
Example 0.104. For α ∈ [0, 1], let X_α have the p.d.f.
$$f_\alpha(x) = \begin{cases} \alpha e^{x}, & x < 0,\\ (1 - \alpha) e^{-x}, & x \ge 0. \end{cases}$$
Let ξ_p be the quantile of order p ∈ (0, 1), so that F_α(ξ_p) = p, where F_α is the d.f. of X_α. Clearly $F_\alpha(0) = \alpha \int_{-\infty}^{0} e^x\, dx = \alpha$. For 0 ≤ α < p, we have
$$p = F_\alpha(\xi_p) = \int_{-\infty}^{0} \alpha e^x\, dx + \int_{0}^{\xi_p} (1 - \alpha) e^{-x}\, dx = 1 - (1 - \alpha) e^{-\xi_p},$$
and for α ≥ p,
$$p = \int_{-\infty}^{\xi_p} \alpha e^x\, dx = \alpha e^{\xi_p}.$$
Thus,
$$\xi_p = \begin{cases} \ln \frac{1-\alpha}{1-p}, & \text{if } 0 \le \alpha < p,\\[3pt] -\ln \frac{\alpha}{p}, & \text{if } p \le \alpha \le 1, \end{cases}$$
so that
$$q_1(\alpha) = \xi_{1/4} = \begin{cases} \ln \frac{4(1-\alpha)}{3}, & \text{if } 0 \le \alpha < \frac{1}{4},\\[3pt] -\ln(4\alpha), & \text{if } \frac{1}{4} \le \alpha \le 1, \end{cases}$$
$$m_e(\alpha) = \xi_{1/2} = \begin{cases} \ln(2(1-\alpha)), & \text{if } 0 \le \alpha < \frac{1}{2},\\[3pt] -\ln(2\alpha), & \text{if } \frac{1}{2} \le \alpha \le 1, \end{cases}$$
$$q_3(\alpha) = \xi_{3/4} = \begin{cases} \ln(4(1-\alpha)), & \text{if } 0 \le \alpha < \frac{3}{4},\\[3pt] -\ln \frac{4\alpha}{3}, & \text{if } \frac{3}{4} \le \alpha \le 1. \end{cases}$$
Note that, for 0 ≤ α < 1/2, m_e(α) = ln(2(1 − α)) ≥ 0, and for α > 1/2, m_e(α) = −ln(2α) < 0. Thus, for 0 ≤ α < 1/2 (so that m_e(α) ≥ 0),
$$MD(m_e(\alpha)) = \begin{cases} \ln(2(1-\alpha)) + 2\alpha, & \text{if } 0 \le \alpha < \frac{1}{2},\\[3pt] \ln(2\alpha) + 2(1-\alpha), & \text{if } \frac{1}{2} \le \alpha \le 1, \end{cases}$$
$$IQR \equiv IQR(\alpha) = q_3(\alpha) - q_1(\alpha) = \begin{cases} \ln 3, & \text{if } 0 \le \alpha < \frac{1}{4} \text{ or } \frac{3}{4} \le \alpha \le 1,\\[3pt] \ln(16\alpha(1-\alpha)), & \text{if } \frac{1}{4} \le \alpha < \frac{3}{4}, \end{cases}$$
$$QD \equiv QD(\alpha) = \frac{q_3(\alpha) - q_1(\alpha)}{2} = \begin{cases} \ln\sqrt{3}, & \text{if } 0 \le \alpha < \frac{1}{4},\\[3pt] \ln\!\left( 4\sqrt{\alpha(1-\alpha)} \right), & \text{if } \frac{1}{4} \le \alpha < \frac{3}{4},\\[3pt] \ln\sqrt{3}, & \text{if } \frac{3}{4} \le \alpha \le 1, \end{cases}$$
$$CQD \equiv CQD(\alpha) = \frac{q_3(\alpha) - q_1(\alpha)}{q_3(\alpha) + q_1(\alpha)} = \begin{cases} \dfrac{\ln 3}{\ln \frac{16(1-\alpha)^2}{3}}, & \text{if } 0 \le \alpha < \frac{1}{4},\\[8pt] \dfrac{\ln(16\alpha(1-\alpha))}{\ln \frac{1-\alpha}{\alpha}}, & \text{if } \frac{1}{4} \le \alpha \le \frac{3}{4},\\[8pt] -\dfrac{\ln 3}{\ln \frac{16\alpha^2}{3}}, & \text{if } \frac{3}{4} \le \alpha \le 1. \end{cases}$$
For α ≠ 1/2,
$$CV \equiv CV(\alpha) = \frac{\sigma(\alpha)}{\mu_1'(\alpha)} = \frac{\sqrt{1 + 4\alpha - 4\alpha^2}}{1 - 2\alpha},$$
$$\mu_3(\alpha) = E\left( (X_\alpha - \mu_1'(\alpha))^3 \right) = \mu_3'(\alpha) - 3\mu_1'(\alpha)\mu_2'(\alpha) + 2(\mu_1'(\alpha))^3 = 2(1 - 2\alpha)^3,$$
$$\beta_1 \equiv \beta_1(\alpha) = \frac{\mu_3(\alpha)}{\sigma^3(\alpha)} = \frac{2(1 - 2\alpha)^3}{(1 + 4\alpha - 4\alpha^2)^{3/2}},$$
$$\beta_2 \equiv \beta_2(\alpha) = \frac{q_3(\alpha) - 2m_e(\alpha) + q_1(\alpha)}{q_3(\alpha) - q_1(\alpha)} = \begin{cases} \dfrac{\ln \frac{4}{3}}{\ln 3}, & \text{if } 0 \le \alpha < \frac{1}{4},\\[8pt] -\dfrac{\ln(4\alpha(1-\alpha))}{\ln(16\alpha(1-\alpha))}, & \text{if } \frac{1}{4} \le \alpha < \frac{1}{2},\\[8pt] \dfrac{\ln(4\alpha(1-\alpha))}{\ln(16\alpha(1-\alpha))}, & \text{if } \frac{1}{2} \le \alpha \le \frac{3}{4},\\[8pt] \dfrac{\ln \frac{3}{4}}{\ln 3}, & \text{if } \frac{3}{4} \le \alpha \le 1. \end{cases}$$
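The closed-form quantiles derived above can be verified by plugging them back into the d.f. F_α. A minimal sketch (the grid of α and p values is an arbitrary choice):

```python
import math

def F(x, a):
    # d.f. of X_alpha: a*e^x for x < 0, and a + (1-a)(1 - e^{-x}) for x >= 0
    return a * math.exp(x) if x < 0 else 1 - (1 - a) * math.exp(-x)

def xi(p, a):
    # closed-form quantile derived above: two branches depending on a vs p
    return math.log((1 - a) / (1 - p)) if a < p else -math.log(a / p)

checks = [(a, p, F(xi(p, a), a)) for a in (0.1, 0.3, 0.6, 0.9)
                                 for p in (0.25, 0.5, 0.75)]
for a, p, val in checks:
    print(a, p, round(val, 6))   # each val equals p
```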