
CS Core 2: Probability and Statistics July-Dec 2025

Lecture 0: Lecture Notes


Instructor: Dr. Kuldeep Kumar Kataria Scribe:

0.1. Introduction

Example 0.1. The production manager of a bulb manufacturing company wishes to study the effect of new manufac-
turing process on the lifetimes of bulbs produced through it.
Here the population under study is the following:
P: Collection of lifetimes of all electric bulbs produced using new manufacturing process.
In most practical situations P is large (e.g. the collection of lifetimes of all electric bulbs that would be produced using the new manufacturing process) and it is not feasible (due to time/cost constraints) to get complete information about P.
Thus a representative sample (a sample that in certain sense is a true representative of the population) is taken from P
and using this representative sample inferences regarding various population characteristics of P (such as population
mean, population variance etc.) are made. Note that the sample contains only partial information about P and the
goal is to make inferences about various population characteristics based on partial information in the sample drawn
from P.
X: Lifetime of a typical electric bulb manufactured using the new manufacturing process (a typical element of P).
X is random (called a random variable) and its value varies across P according to some law.

Probability Theory: A mathematical tool for modelling uncertainty (e.g. to describe the law according to which
values of X vary across P).
Statistics: Concerned with procedures for analyzing data (the sample) and drawing inferences about various characteristics
of the population P.
For understanding of statistics, one must have a sound background in probability theory.
The only way to collect information about any random phenomenon is to perform experiments (e.g. selecting a set of
bulbs manufactured by the new manufacturing process and putting them on test for measuring their lifetimes). Each
experiment terminates in an outcome which cannot be predicted prior to the performance of the experiment
(e.g. lifetimes of the bulbs put on test cannot be predicted before they are put on test).
Definition 0.2 (Random Experiment). A random experiment is an experiment in which
(a) all possible outcomes of the experiment are known in advance,
(b) outcome of a particular performance (trial) of the experiment cannot be predicted in advance,
(c) the experiment can be repeated under identical conditions.

We will generally denote a random experiment by E.


Definition 0.3 (Sample Space). The collection of all possible outcomes of a random experiment is called its sample
space. A sample space will usually be denoted by Ω.
Example 0.4. (i) E: Tossing a coin once. Sample space Ω = {H, T }, where H: Heads and T : Tails.
(ii) E: Throwing a die. Sample space Ω = {1, 2, 3, 4, 5, 6}.
(iii) E: Birth of a child. Sample space Ω = {M, F }. If instead we consider the child's weight (in kg), then Ω = (0, 7).
(iv) E: Age at the death of a person. Sample space Ω = (0, 120).
(v) E: Putting an electric bulb produced by the new manufacturing process on test and measuring its lifetime. Sample
space Ω = [0, ∞).
(vi) E: Throwing two dice. Sample space

Ω = {(1, 1), (1, 2), . . . , (1, 6), (2, 1), (2, 2), . . . , (2, 6), . . . , (6, 1), (6, 2), . . . , (6, 6)}
= {(i, j) : i, j ∈ {1, 2, . . . , 6}}.

(vii) E: Putting two electric bulbs produced by new manufacturing process into test and measuring their lifetimes.
Sample space Ω = {(x1 , x2 ) : x1 ≥ 0, x2 ≥ 0} = [0, ∞) × [0, ∞).
(viii) E: Casting one red die and one white die. Sample space

Ω = {(r, w) : r is number of spots on the red die and w is number of spots on the white die }
= {(1, 1), (1, 2), . . . , (1, 6), (2, 1), (2, 2), . . . , (2, 6), . . . , (6, 1), (6, 2), . . . , (6, 6)}
= {(i, j) : i, j ∈ {1, 2, . . . , 6}}
= {1, 2, . . . , 6} × {1, 2, . . . , 6} → has 36 elements.

Definition 0.5 (Event). An event is any subset of the sample space. If the outcome of a random experiment is a member
of the set E ⊆ Ω, we say that the event E has occurred.
Example 0.6. In Example 0.4 (vi), A = {(1, 5), (6, 2), (2, 2)} is an event. Also, in Example 0.4 (vii), A = {(x1, x2) : x1 ≤ 6, x2 ≥ 8} = [0, 6] × [8, ∞) may be an event.

Impossible Event: φ.
Sure Event: Ω.
Exhaustive Events: If A1 ∪ A2 ∪ · · · ∪ An = Ω, then A1, A2, . . . , An are called exhaustive events.
Mutually Exclusive Events: If A ∩ B = φ then A and B are called mutually exclusive events, i.e., the happening or
occurrence of one of them excludes the possibility of occurrence of the other.
Pairwise Disjoint Events: Let A1, A2, . . . be events such that Ai ∩ Aj = φ, i ≠ j. Then we say that A1, A2, . . .
are pairwise disjoint or mutually exclusive.
Let A and B be two events. Then,
(i) A ∪ B → occurrence of at least one of the events A and B.
(ii) ⋃_{i=1}^∞ Ai → occurrence of at least one of A1, A2, . . ..
(iii) A ∩ B → simultaneous occurrence of A and B.
(iv) ⋂_{i=1}^∞ Ai → simultaneous occurrence of all of A1, A2, . . ..
(v) Ac → non-happening of A.

(vi) A − B → happening of A but not of B. Thus, A − B = A ∩ Bc.


Generally, we are interested in specific subsets of Ω, called events. So the event space (the events under consideration) F
is a subset of the power set of Ω, i.e., F ⊆ P(Ω), where P(Ω) denotes the power set of Ω. In many situations, and for almost all practical
purposes, F = P(Ω).
The choice of F is an important one:
(i) If Ω contains at most a countable number of points we can always take F to be P(Ω). (This is certainly a σ-field.)
In this case each one-point set is a member of F and is the fundamental object of interest. Every subset of Ω is an event.
(ii) If Ω = R or any interval then Ω is uncountable. In this case we would like to consider all one point subsets of Ω,
all intervals (closed, open or semi-closed) to be events. We consider the Borel σ-field B generated by the class of all
semi closed intervals (a, b], which is a σ-field in R.
We say that the event space F ⊆ P(Ω) contains all subsets of Ω actually encountered in ordinary analysis and proba-
bility. It is large enough for all practical purposes.
Definition 0.7 (σ-Field/ σ-Algebra). A class F of subsets of the sample space Ω is called a σ-field if it satisfies the
following conditions:
(a) Ω ∈ F,
(b) If A ∈ F then Ac ∈ F,
(c) If A1, A2, · · · ∈ F then ⋃_{i=1}^∞ Ai ∈ F.

The algebra of set theory is applicable in probability theory. Probability is a measure of uncertainty. We are interested
in quantifying uncertainty associated with various outcomes of a random experiment by assigning probability to these
outcomes.
Here, we will not discuss how probabilities are assigned (which is a part of probability modelling) rather we will
discuss properties of a probability as a measure.

0.2. Probability Measure

Recall that E denotes a random experiment, Ω denotes the sample space of E and F denotes the event space. For all
practical purposes one may take F = P(Ω).
A set function is a function whose domain is a collection of sets (called a class of sets).
Definition 0.8 (Probability Function or Probability Measure). A probability function (or probability measure) is a real
valued set function, defined on the event space F satisfying the following axioms:
(a) P (Ω) = 1 (certainty),
(b) P (A) ≥ 0 ∀ A ∈ F (positivity),
(c) If A1, A2 ∈ F are mutually exclusive/disjoint sets (i.e. A1 ∩ A2 = φ, the empty set) then

P (A1 ∪ A2 ) = P (A1 ) + P (A2 ).



More generally, if {An}n≥1 is a sequence of mutually exclusive (disjoint) sets in F, i.e., Ai ∩ Aj = φ for i ≠ j, then

P(⋃_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai) (countable additivity).

We call P (A) the probability of event A. The triplet (Ω, F, P ) is called probability space.
Remark 0.9. Axioms (b) and (c) are desirable for any measure (such as area, volume, probability etc.). Since the
sample space Ω consists of all possible outcomes its occurrence is certain (100% chance of occurrence) and therefore
Axiom (a) (P (Ω) = 1) is also reasonable.

Elementary Properties of Probability Function/ Measure:


Let (Ω, F, P ) be a probability space.
(P1) P (φ) = 0.


Proof. Let A1 = Ω and Ai = φ, i = 2, 3, . . .. Then A1 = ⋃_{i=1}^∞ Ai and Ai ∩ Aj = φ for all i ≠ j. Therefore,

P(Ω) = P(⋃_{i=1}^∞ Ai)
⟹ 1 = Σ_{i=1}^∞ P(Ai)   (Axioms (a) and (c))
⟹ 1 = lim_{n→∞} Σ_{i=1}^n P(Ai)
⟹ 1 = lim_{n→∞} [P(Ω) + (n − 1)P(φ)]
⟹ 1 = 1 + lim_{n→∞} (n − 1)P(φ)
⟹ P(φ) = 0.

This completes the proof.


(P2) For some natural number n, let A1, A2, . . . , An ∈ F be mutually exclusive. Then P(⋃_{i=1}^n Ai) = Σ_{i=1}^n P(Ai).

Proof. Let Ai = φ, i = n + 1, n + 2, . . .. Then Ai ∩ Aj = φ for all i ≠ j and ⋃_{i=1}^n Ai = ⋃_{i=1}^∞ Ai. This implies

P(⋃_{i=1}^n Ai) = P(⋃_{i=1}^∞ Ai)
= Σ_{i=1}^∞ P(Ai)   (Axiom (c))
= Σ_{i=1}^n P(Ai)   (since P(Ai) = P(φ) = 0 for i = n + 1, n + 2, . . .).

This completes the proof.



(P3) For all A ∈ F, 0 ≤ P (A) ≤ 1 and P (Ac ) = 1 − P (A).

Proof. Note that Ω = A ∪ Ac and A ∩ Ac = φ. Therefore,

1 = P (Ω) = P (A ∪ Ac ) = P (A) + P (Ac ) ≥ P (A), (using Axioms (a), (b) and (P2)).

Thus, 0 ≤ P (A) ≤ 1 and P (Ac ) = 1 − P (A).

(P4) Let A1 , A2 ∈ F be such that A1 ⊆ A2 . Then, P (A2 − A1 ) = P (A2 ) − P (A1 ) and P (A1 ) ≤ P (A2 ).

Proof. A2 = A1 ∪ (A2 − A1 ) and A1 ∩ (A2 − A1 ) = φ. Thus,

P (A2 ) = P (A1 ) + P (A2 − A1 ) =⇒ P (A2 − A1 ) = P (A2 ) − P (A1 ).

By Axiom (b), we have P (A2 − A1 ) ≥ 0 =⇒ P (A2 ) ≥ P (A1 ), that is, P (·) is monotone.

(P5) Let A1 , A2 ∈ F. Then,

P (A1 ∪ A2 ) = P (A1 ) + P (A2 ) − P (A1 ∩ A2 ) (Inclusion-Exclusion principle for two events).

Proof. Note that A1 ∪ A2 = A1 ∪ (A2 − A1 ) and A1 ∩ (A2 − A1 ) = φ. This implies

P (A1 ∪ A2 ) = P (A1 ∪ (A2 − A1 )) = P (A1 ) + P (A2 − A1 ) (using (P2)). (0.1)

Also, we have
(A1 ∩ A2 ) ∩ (A2 − A1 ) = φ and A2 = (A1 ∩ A2 ) ∪ (A2 − A1 ),
which implies

P (A2 ) = P (A1 ∩ A2 ) + P (A2 − A1 )


=⇒ P (A2 − A1 ) = P (A2 ) − P (A1 ∩ A2 ). (0.2)

Using (0.2) in (0.1), we get


P (A1 ∪ A2 ) = P (A1 ) + P (A2 ) − P (A1 ∩ A2 ).
This completes the proof.
Remark 0.10. (a) If P (A) = 0 and B ⊆ A, then P (B) = 0 (using (P4) and Axiom (b)).
Similarly, if P (C) = 1 and C ⊆ D, then P (D) = 1 (using (P3) and (P4)).
(b) Exercise: If P (D) = 1, then P (A) = P (A ∩ D), ∀A ∈ F.
Similarly, if P (D) = 0, then P (A) = P (A ∩ Dc ), ∀A ∈ F.
(c) Let A1 , A2 ∈ F. Then, using (P5) and Axiom (b), we get

P (A1 ∪ A2 ) ≤ P (A1 ) + P (A2 ) (Boole’s inequality for two events).

(d) Let A1 , A2 ∈ F. Then, using (P3), (P5) and Axiom (b), we get

P (A1 ∩ A2 ) ≥ max {P (A1 ) + P (A2 ) − 1, 0} (Bonferroni’s inequality for two events).



Theorem 0.11 (Inclusion-Exclusion Principle). For A1, A2, . . . , Ak ∈ F (k ≥ 2 an integer), let

p1,k = P(A1) + P(A2) + · · · + P(Ak) = Σ_{i=1}^k P(Ai),
p2,k = P(A1 ∩ A2) + P(A1 ∩ A3) + · · · + P(A1 ∩ Ak) + P(A2 ∩ A3) + · · · + P(A2 ∩ Ak) + · · · + P(Ak−1 ∩ Ak)
     = Σ_{1≤i<j≤k} P(Ai ∩ Aj)

(the sum of probabilities of all possible intersections involving 2 of the k events A1, . . . , Ak),
. . .
pi,k = Σ_{1≤j1<j2<···<ji≤k} P(Aj1 ∩ Aj2 ∩ · · · ∩ Aji)

(the sum of probabilities of all possible intersections involving i of the k events A1, . . . , Ak, i = 1, . . . , k).

Then,

P(⋃_{i=1}^k Ai) = p1,k − p2,k + p3,k − p4,k + · · · + (−1)^{k−1} pk,k.

Proof. Note that, for k = 2, p1,2 = P (A1 ) + P (A2 ), p2,2 = P (A1 ∩ A2 ) and

P (A1 ∪ A2 ) = P (A1 ) + P (A2 ) − P (A1 ∩ A2 ) = p1,2 − p2,2 .

Thus the result is true for k = 2. Now suppose that the result is true for k = 2, 3, . . . , m, that is,

P(⋃_{i=1}^k Ai) = p1,k − p2,k + p3,k − p4,k + · · · + (−1)^{k−1} pk,k for all k = 2, 3, . . . , m.

Then,

P(⋃_{i=1}^{m+1} Ai) = P((⋃_{i=1}^m Ai) ∪ Am+1)
= P(⋃_{i=1}^m Ai) + P(Am+1) − P((⋃_{i=1}^m Ai) ∩ Am+1)   (using the result for k = 2)
= Σ_{j=1}^m (−1)^{j−1} pj,m + P(Am+1) − P(⋃_{i=1}^m (Ai ∩ Am+1))   (using the result for k = m on ⋃_{i=1}^m Ai)
= Σ_{j=1}^m (−1)^{j−1} pj,m + P(Am+1) − Σ_{j=1}^m (−1)^{j−1} tj,m   (using the result for k = m on ⋃_{i=1}^m (Ai ∩ Am+1)),

where

t1,m = Σ_{i=1}^m P(Ai ∩ Am+1),
t2,m = Σ_{1≤i<j≤m} P(Ai ∩ Aj ∩ Am+1),
and, in general,
tj,m = Σ_{1≤i1<i2<···<ij≤m} P(Ai1 ∩ Ai2 ∩ · · · ∩ Aij ∩ Am+1), j = 1, 2, . . . , m,
so that tm,m = P(A1 ∩ A2 ∩ · · · ∩ Am ∩ Am+1).

Therefore,

P(⋃_{i=1}^{m+1} Ai) = (p1,m + P(Am+1)) − (p2,m + t1,m) + (p3,m + t2,m) − · · · + (−1)^{m−1}(pm,m + tm−1,m) + (−1)^m tm,m
= p1,m+1 − p2,m+1 + p3,m+1 − · · · + (−1)^{m−1} pm,m+1 + (−1)^m pm+1,m+1,

as

p1,m + P(Am+1) = Σ_{j=1}^m P(Aj) + P(Am+1) = p1,m+1,
p2,m + t1,m = Σ_{1≤i<j≤m} P(Ai ∩ Aj) + Σ_{i=1}^m P(Ai ∩ Am+1) = Σ_{1≤i<j≤m+1} P(Ai ∩ Aj) = p2,m+1,
. . .
pm,m + tm−1,m = P(A1 ∩ A2 ∩ · · · ∩ Am) + Σ_{1≤i1<i2<···<im−1≤m} P(Ai1 ∩ Ai2 ∩ · · · ∩ Aim−1 ∩ Am+1) = pm,m+1,

and tm,m = P(A1 ∩ A2 ∩ · · · ∩ Am ∩ Am+1) = pm+1,m+1. The result now follows by induction.
Remark 0.12. Let A1, A2, A3 ∈ F. Then

P(A1 ∪ A2 ∪ A3) = p1,3 − p2,3 + p3,3
= P(A1) + P(A2) + P(A3) − P(A1 ∩ A2) − P(A1 ∩ A3) − P(A2 ∩ A3) + P(A1 ∩ A2 ∩ A3).
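As a sanity check, the three-event identity can be verified mechanically on a finite equally likely sample space. The sketch below (plain Python, with sets standing in for events; the space Ω = {0, . . . , 19} and the randomly chosen events are illustrative, not part of the notes) checks it on 100 random triples using exact rational arithmetic:

```python
from fractions import Fraction
import random

# Numerical check of the three-event inclusion-exclusion identity on the
# equally likely space Omega = {0, ..., 19}, where P(A) = |A|/20.
random.seed(1)
omega = list(range(20))

def P(A):
    return Fraction(len(A), 20)

max_err = Fraction(0)
for _ in range(100):
    # three arbitrary 8-element events
    A1, A2, A3 = (set(random.sample(omega, 8)) for _ in range(3))
    lhs = P(A1 | A2 | A3)
    rhs = (P(A1) + P(A2) + P(A3)
           - P(A1 & A2) - P(A1 & A3) - P(A2 & A3)
           + P(A1 & A2 & A3))
    max_err = max(max_err, abs(lhs - rhs))
```

With exact fractions the two sides agree identically, so `max_err` stays 0.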

Theorem 0.13. For an integer k ≥ 2, let A1, A2, . . . , Ak ∈ F. Then

p1,k − p2,k ≤ P(⋃_{i=1}^k Ai) ≤ p1,k.

Proof. Note that for k = 2, p1,2 = P(A1) + P(A2), p2,2 = P(A1 ∩ A2) and

P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2) ≤ P(A1) + P(A2).

This implies p1,2 − p2,2 = P(A1 ∪ A2) ≤ P(A1) + P(A2) = p1,2. Thus the result is true for k = 2. Now suppose that for
some positive integer m (≥ 2),

p1,k − p2,k ≤ P(⋃_{i=1}^k Ai) ≤ p1,k for all k = 2, 3, . . . , m.

Then,

P(⋃_{i=1}^{m+1} Ai) = P((⋃_{i=1}^m Ai) ∪ Am+1)
≤ P(⋃_{i=1}^m Ai) + P(Am+1)   (using the result for k = 2 with A = ⋃_{i=1}^m Ai and B = Am+1, so that P(A ∪ B) ≤ P(A) + P(B))
≤ p1,m + P(Am+1)
= p1,m+1.   (0.3)

Also, using the result for k = m, we get

P(⋃_{i=1}^m Ai) ≥ p1,m − p2,m

and

P(⋃_{i=1}^m (Ai ∩ Am+1)) ≤ Σ_{i=1}^m P(Ai ∩ Am+1).

Thus,

P(⋃_{i=1}^{m+1} Ai) = P((⋃_{i=1}^m Ai) ∪ Am+1)
= P(⋃_{i=1}^m Ai) + P(Am+1) − P(⋃_{i=1}^m (Ai ∩ Am+1))
≥ p1,m − p2,m + P(Am+1) − Σ_{i=1}^m P(Ai ∩ Am+1)
= (p1,m + P(Am+1)) − (p2,m + Σ_{i=1}^m P(Ai ∩ Am+1))
= p1,m+1 − p2,m+1.   (0.4)

Using (0.3) and (0.4), we get

p1,m+1 − p2,m+1 ≤ P(⋃_{i=1}^{m+1} Ai) ≤ p1,m+1

and the result follows using the principle of mathematical induction.


Remark 0.14. It can also be shown that

p1,k − p2,k + p3,k − p4,k ≤ P(⋃_{i=1}^k Ai) ≤ p1,k − p2,k + p3,k,
. . .
p1,k − p2,k + · · · + p2m−1,k − p2m,k ≤ P(⋃_{i=1}^k Ai) ≤ p1,k − p2,k + · · · + p2m−1,k,

for m = 1, 2, . . . , ⌊k/2⌋.
Theorem 0.15 (Bonferroni’s Inequality). Let A1, A2, . . . , Ak ∈ F. Then

P(⋂_{i=1}^k Ai) ≥ max{Σ_{i=1}^k P(Ai) − (k − 1), 0}.

Proof. We have

P(⋂_{i=1}^k Ai) = P((⋃_{i=1}^k Ai^c)^c)   (De Morgan’s law)
= 1 − P(⋃_{i=1}^k Ai^c)
≥ 1 − Σ_{i=1}^k P(Ai^c)   (Boole’s inequality)
= 1 − Σ_{i=1}^k (1 − P(Ai))
= Σ_{i=1}^k P(Ai) − (k − 1).   (0.5)

Also,

P(⋂_{i=1}^k Ai) ≥ 0.   (0.6)

Combining (0.5) and (0.6), we get

P(⋂_{i=1}^k Ai) ≥ max{Σ_{i=1}^k P(Ai) − (k − 1), 0}.

This completes the proof.

Probability as a Continuous Set Function:


A sequence of events {En, n ≥ 1} is said to be an increasing sequence if E1 ⊂ E2 ⊂ · · · ⊂ En ⊂ En+1 ⊂ · · ·, whereas
it is said to be a decreasing sequence if E1 ⊃ E2 ⊃ · · · ⊃ En ⊃ En+1 ⊃ · · ·.
If {En} is an increasing sequence of events, then we define lim_{n→∞} En = ⋃_{i=1}^∞ Ei. Similarly, if {En} is a decreasing
sequence of events, then we define lim_{n→∞} En = ⋂_{i=1}^∞ Ei.
Theorem 0.16. If {En} is either an increasing or a decreasing sequence of events, then

lim_{n→∞} P(En) = P(lim_{n→∞} En).

Proof. Let {En } be an increasing sequence and define Fn , n ≥ 1 as

F1 = E1 ,
F2 = E2 − E1 ,
..
.
Fn = En − En−1 .
Then {Fn} is a sequence of disjoint events and En = ⋃_{i=1}^n Fi, so that P(En) = Σ_{i=1}^n P(Fi). Now

lim_{n→∞} En = lim_{n→∞} ⋃_{i=1}^n Fi = ⋃_{n=1}^∞ Fn.

So,

P(lim_{n→∞} En) = P(⋃_{n=1}^∞ Fn) = Σ_{n=1}^∞ P(Fn) = lim_{n→∞} Σ_{i=1}^n P(Fi) = lim_{n→∞} P(⋃_{i=1}^n Fi) = lim_{n→∞} P(En).

The decreasing case is proved similarly.


Example 0.17. Random experiment E: casting a red die and a white die.
Sample space: Ω = {(i, j) : i ∈ {1, 2, . . . , 6}, j ∈ {1, 2, . . . , 6}}.
For (i, j) ∈ Ω, i: number of spots up on the red die; j: number of spots up on the white die.
Event space F = power set of Ω.
For A ∈ F, define Q : F → R as

Q(A) = |A|/36, where |A| = number of elements in A.

Then
(a) Q(Ω) = |Ω|/36 = 36/36 = 1.
(b) Q(A) = |A|/36 ≥ 0 for all A ∈ F.
(c) For mutually exclusive events A1, A2, . . .,

Q(⋃_{i=1}^∞ Ai) = |⋃_{i=1}^∞ Ai|/36 = (Σ_{i=1}^∞ |Ai|)/36 = Σ_{i=1}^∞ Q(Ai).

Thus, (Ω, F, Q) is a probability space.
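Example 0.17 translates directly into code. The sketch below builds the 36-point sample space and checks the axioms on a pair of disjoint events; the particular events ("doubles" and "sum is seven") are illustrative choices, not part of the notes:

```python
from fractions import Fraction

# The probability space of Example 0.17: 36 equally likely outcomes (i, j),
# with Q(A) = |A|/36 for every event A.
omega = {(i, j) for i in range(1, 7) for j in range(1, 7)}

def Q(A):
    return Fraction(len(A), 36)

doubles = {(i, j) for (i, j) in omega if i == j}        # red spots = white spots
sum_seven = {(i, j) for (i, j) in omega if i + j == 7}  # total of seven spots

axiom_a = (Q(omega) == 1)                    # Axiom (a): Q(Omega) = 1
axiom_b = all(Q({w}) >= 0 for w in omega)    # Axiom (b): nonnegativity
# Finite additivity on the disjoint events "doubles" and "sum is seven"
additive = (doubles & sum_seven == set()
            and Q(doubles | sum_seven) == Q(doubles) + Q(sum_seven))
```

Each of the two events has 6 outcomes, so their union gets probability 12/36 = 1/3.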

Equally Likely Probability Models for Finite Sample Space:


Suppose that the sample space Ω = {ω1, ω2, . . . , ωk} is finite (has k elements). Here the singletons {ωi} are called
elementary events and Ω = ⋃_{i=1}^k {ωi}. Suppose that

P({ωi}) = 1/k, i = 1, 2, . . . , k (each elementary event is equally likely).

For any event E ⊆ Ω, we have E = {ωi1, ωi2, . . . , ωir}, for some i1, i2, . . . , ir ∈ {1, 2, . . . , k}, 1 ≤ r ≤ k. Then
E = ⋃_{j=1}^r {ωij} and

P(E) = P(⋃_{j=1}^r {ωij}) = Σ_{j=1}^r P({ωij}) = Σ_{j=1}^r 1/k = r/k
= (number of ways favourable to the event E)/(total number of ways in which the random experiment can terminate).

Here the equally likely assumption (P({ωi}) = 1/k, i = 1, 2, . . . , k) is a part of probability modelling.
“At random”: In a random experiment with finite sample space Ω, whenever we say that the experiment has been
performed at random it means that all the outcomes in the sample space are equally likely.

Example 0.18 (Birthday Problem). Suppose that a college has n students, including you, each born in a non-leap year.
(a) Find the probability that at least two of them have the same birthday. For what values of n is this probability more
than 0.5, 0.8, 0.95?
(b) For what value of n is the probability that you will find someone who shares your birthday 0.5?

Solution: (a) Required probability = 1 − P(all of them have different birthdays) = 1 − (365 × 364 × · · · × (365 − n + 1))/365^n.
This probability exceeds 0.5 for n ≥ 23, exceeds 0.8 for n ≥ 35, and exceeds 0.95 for n ≥ 47.
(b) Required probability = 1 − P(no one shares my birthday) = 1 − (364/365)^{n−1}, since each of the other n − 1
students misses my birthday with probability 364/365.
For 1 − (364/365)^{n−1} ≈ 0.5, we need n − 1 ≈ 253, i.e., n ≈ 254.
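The numerical answers above can be reproduced with a short script (a sketch; `p_shared` and `p_shares_mine` are hypothetical helper names, not from the notes):

```python
import math

# Part (a): P(at least two of the n students share a birthday).
def p_shared(n):
    # math.perm(365, n) = 365 * 364 * ... * (365 - n + 1)
    return 1 - math.perm(365, n) / 365 ** n

# Part (b): P(at least one of the other n - 1 students shares *your* birthday).
def p_shares_mine(n):
    return 1 - (364 / 365) ** (n - 1)

n_half = next(n for n in range(2, 400) if p_shared(n) > 0.5)   # smallest such n
n_80 = next(n for n in range(2, 400) if p_shared(n) > 0.8)
n_95 = next(n for n in range(2, 400) if p_shared(n) > 0.95)
n_mine = next(n for n in range(2, 2000) if p_shares_mine(n) > 0.5)
```

The search confirms the thresholds 23, 35 and 47 for part (a), and n = 254 for part (b).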
365n−1
Example 0.19. Five cards are drawn at random and without replacement from a deck of 52 cards. Find the probability
that
(i) each card is a spade (event E1),
(ii) at least one card is a spade (event E2),
(iii) exactly three cards are kings and two cards are queens (event E3),
(iv) exactly two kings, two queens and one jack are drawn (event E4).

Solution: Writing C(n, r) for the number of ways of choosing r objects out of n,
(i) P(E1) = C(13, 5)/C(52, 5),
(ii) P(E2) = 1 − P(E2^c) = 1 − P(no card is a spade) = 1 − C(39, 5)/C(52, 5),
(iii) P(E3) = C(4, 3) C(4, 2)/C(52, 5),
(iv) P(E4) = C(4, 2) C(4, 2) C(4, 1)/C(52, 5).
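These four probabilities are easy to evaluate exactly with Python's `math.comb` (a sketch using exact rational arithmetic):

```python
from fractions import Fraction
from math import comb

total = comb(52, 5)  # number of possible 5-card hands

p_E1 = Fraction(comb(13, 5), total)                            # all spades
p_E2 = 1 - Fraction(comb(39, 5), total)                        # at least one spade
p_E3 = Fraction(comb(4, 3) * comb(4, 2), total)                # 3 kings, 2 queens
p_E4 = Fraction(comb(4, 2) * comb(4, 2) * comb(4, 1), total)   # 2 kings, 2 queens, 1 jack
```

For instance, P(E2) evaluates to roughly 0.778, so a five-card hand contains at least one spade more often than not.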

Example 0.20 (Capture/Recapture Method for Estimating Population Size). In a wildlife population suppose that the
population size n is unknown. To estimate the population size n, 20 animals are captured, tagged and then released
back. Thereafter 40 animals are captured at random and it is found that 8 of them are tagged. Find an estimate of the
population size n based on the given data.

Solution: We have

n = total number of animals,
20 = number of tagged animals in the population,
n − 20 = number of untagged animals in the population.

Data: the sample of 40 animals yields

number of tagged animals = 8,
number of untagged animals = 32.

The probability of obtaining this data is

l(n) = C(20, 8) C(n − 20, 32)/C(n, 40), n ≥ 52.

Now

l(n + 1) > l(n) ⟺ C(n − 19, 32)/C(n + 1, 40) > C(n − 20, 32)/C(n, 40)
⟺ (n − 19)(n − 39)/[(n − 51)(n + 1)] > 1
⟺ n < 99.

Similarly l(n + 1) < l(n) ⟺ n > 99. Thus l is maximized at n = 99, that is, for n = 99 the observed data (among
the captured animals 8 are tagged and 32 are untagged) is most probable.
Thus an estimate of n is n̂ = 99 (the maximum likelihood estimate).
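The likelihood comparison can be confirmed by brute force (a sketch; exact fractions are used so that the tie l(99) = l(100), which follows from the ratio being exactly 1 at n = 99, is not obscured by rounding):

```python
from fractions import Fraction
from math import comb

# Likelihood of the observed data (8 tagged, 32 untagged in a sample of 40)
# when the population size is n, as in Example 0.20.
def l(n):
    return Fraction(comb(20, 8) * comb(n - 20, 32), comb(n, 40))

# Exhaustive search over a plausible range of population sizes.
n_hat = max(range(52, 500), key=l)   # first maximizer
tie = (l(99) == l(100))              # the likelihood ties at n = 99 and n = 100
```

`max` returns the first maximizer, n̂ = 99, matching the analysis above.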

0.3. Conditional Probability

Consider a probability space (Ω, F, P), where Ω = {ω1, ω2, . . . , ωn} is finite and

P({ωi}) = 1/n, i = 1, 2, . . . , n (equally likely probability model).

Then, for any A ∈ F,

P(A) = (number of cases favourable to A)/(total number of cases) = |A|/|Ω| = |A|/n.

Now suppose it is known a priori that the event A has occurred (i.e., the outcome of the experiment is an element of A), where
|A| ≥ 1 (so that P(A) = |A|/n > 0). Given this prior information (that the event A has occurred) we want to define a
probability function, say P(·|A), on the event space F. A natural way to define P(B|A) is

P(B|A) = |A ∩ B|/|A| = (|A ∩ B|/n)/(|A|/n) = P(A ∩ B)/P(A), B ∈ F.

Definition 0.21. Let (Ω, F, P) be a probability space and let A ∈ F be such that P(A) > 0. Then

P(B|A) = P(A ∩ B)/P(A), B ∈ F,

is called the conditional probability of the event B given the event A.
Remark 0.22. (a) In the above definition the event A (with P (A) > 0) is fixed and for this fixed A ∈ F, P (·|A) is a
set function defined on F. Is it a probability function/ measure?
(b) P (A ∩ B) = P (A)P (B|A) = P (B)P (A|B) for A, B ∈ F.
Theorem 0.23. Let (Ω, F, P) be a probability space and let A ∈ F with P(A) > 0 be fixed. Then
P(·|A) : F → R is a probability function (called the conditional probability function) on F (so that (Ω, F, P(·|A)) is
a probability space).

Proof. Note that P(B|A) = P(A ∩ B)/P(A) ≥ 0 for all B ∈ F and P(Ω|A) = P(A ∩ Ω)/P(A) = P(A)/P(A) = 1.

Let {Bn}n≥1 be a sequence of disjoint events in F. Then,

P(⋃_{n=1}^∞ Bn | A) = P((⋃_{n=1}^∞ Bn) ∩ A)/P(A) = P(⋃_{n=1}^∞ (Bn ∩ A))/P(A).

Since {Bn}n≥1 are disjoint, the subsets {Bn ∩ A}n≥1 are also disjoint. Since P(·) is a probability measure, we get

P(⋃_{n=1}^∞ Bn | A) = (Σ_{n=1}^∞ P(Bn ∩ A))/P(A) = Σ_{n=1}^∞ P(Bn ∩ A)/P(A) = Σ_{n=1}^∞ P(Bn|A).

It follows that P(·|A) is a probability function on F for any fixed A ∈ F with P(A) > 0.
Example 0.24. Five cards are drawn at random (without replacement) from a deck of 52 cards. Define the events

B: all spades in hand and A: at least 4 spades in hand.

Find P(B|A).

Solution: We have

P(B|A) = P(A ∩ B)/P(A) = P(B)/P(A)   (since B ⊆ A)
= [C(13, 5)/C(52, 5)] / {[C(13, 4) C(39, 1) + C(13, 5)]/C(52, 5)}
= C(13, 5)/[C(13, 4) C(39, 1) + C(13, 5)] = 3/68 ≈ 0.0441.

Remark 0.25 (Multiplication Law). (i) P (A ∩ B) = P (A)P (B|A), if P (A) > 0.


(ii) P (A ∩ B ∩ C) = P (A ∩ B)P (C|A ∩ B) = P (A)P (B|A)P (C|A ∩ B), provided P (A ∩ B) > 0 (which ensures
that P (A) > 0 as A ∩ B ⊆ A).
(iii) Using the principle of mathematical induction, we have

P(⋂_{i=1}^n Ci) = P(C1 ∩ C2 ∩ · · · ∩ Cn)
= P(C1 ∩ C2 ∩ · · · ∩ Cn−1)P(Cn|C1 ∩ C2 ∩ · · · ∩ Cn−1)
= P(C1 ∩ · · · ∩ Cn−2)P(Cn−1|C1 ∩ · · · ∩ Cn−2)P(Cn|C1 ∩ · · · ∩ Cn−1)
. . .
= P(C1)P(C2|C1)P(C3|C1 ∩ C2) · · · P(Cn|C1 ∩ C2 ∩ · · · ∩ Cn−1),

provided P(C1 ∩ C2 ∩ · · · ∩ Cn−1) > 0 (which also ensures that P(C1 ∩ C2 ∩ · · · ∩ Ci) > 0, i = 1, 2, . . . , n − 2).
Due to symmetry, if (α1, α2, . . . , αn) is a permutation of (1, 2, . . . , n), then

P(⋂_{i=1}^n Ci) = P(Cα1 ∩ Cα2 ∩ · · · ∩ Cαn)
= P(Cα1)P(Cα2|Cα1)P(Cα3|Cα1 ∩ Cα2) · · · P(Cαn|Cα1 ∩ Cα2 ∩ · · · ∩ Cαn−1),

provided P(Cα1 ∩ Cα2 ∩ · · · ∩ Cαn−1) > 0 (which also ensures that P(Cα1 ∩ Cα2 ∩ · · · ∩ Cαi) > 0, i = 1, 2, . . . , n − 2).

Example 0.26. A bowl contains 3 red and 5 blue chips. All chips that are of the same colour are identical. Two chips
are drawn successively at random and without replacement. Define events

A : first draw resulted in a red chip,


B : second draw resulted in a blue chip.

Find P (A ∩ B), P (A) and P (B).

Solution: P(A) = 3/8, P(B|A) = 5/7 and

P(B) = P(A ∩ B) + P(Ac ∩ B) = P(B|A)P(A) + P(B|Ac)P(Ac) = (5/7) × (3/8) + (4/7) × (5/8) = 35/56 = 5/8.

Note that here the outcome of the second draw depends on the outcome of the first draw (P(B|A) ≠ P(B)). Also,

P(A ∩ B) = P(A)P(B|A) = (3/8) × (5/7) = 15/56 ≈ 0.2679.
Theorem 0.27 (Theorem of Total Probability). For a countable set ∆ (that is, the elements of ∆ can be put in 1-1
correspondence either with N = {1, 2, . . . } or with {1, 2, . . . , n} for some n ∈ N), let {Eα : α ∈ ∆} be a countable
collection of mutually exclusive (i.e., Eα ∩ Eβ = φ for all α ≠ β) and exhaustive (i.e., P(⋃_{α∈∆} Eα) = 1) events. Then,
for any E ∈ F,

P(E) = Σ_{α∈∆} P(E ∩ Eα) = Σ_{α∈∆, P(Eα)>0} P(E|Eα)P(Eα).

Proof. Since P(⋃_{α∈∆} Eα) = 1, we have

P(E) = P(E ∩ (⋃_{α∈∆} Eα)) = P(⋃_{α∈∆} (E ∩ Eα))
= Σ_{α∈∆} P(E ∩ Eα)   (the Eα's are disjoint ⟹ their subsets (E ∩ Eα) are disjoint)
= Σ_{α∈∆, P(Eα)>0} P(E ∩ Eα)   (P(Eα) = 0 ⟹ P(E ∩ Eα) = 0, α ∈ ∆)
= Σ_{α∈∆, P(Eα)>0} P(E|Eα)P(Eα).
This completes the proof.


Example 0.28. A population comprises 40% females and 60% males. Suppose that 15% of the females and 30% of the males
in the population smoke. A person is selected at random from the population.
(a) Find the probability that he/she is a smoker.
(b) Given that the selected person is a smoker, find the probability that the person is male.

Solution: Define the events

M: selected person is a male,
F = Mc: selected person is a female,
S: selected person is a smoker,
T = Sc: selected person is a non-smoker.

We have P(F) = 0.4, P(M) = 0.6, P(F ∪ M) = P(F) + P(M) = 1, P(S|F) = 0.15, P(T|F) = 0.85, P(S|M) = 0.30, P(T|M) = 0.70.
(a) By using Theorem of total probability, we get

P (S) = P (S ∩ F ) + P (S ∩ M ) = P (S|F )P (F ) + P (S|M )P (M ) = 0.15 × 0.4 + 0.30 × 0.6 = 0.24.

(b)

P(M|S) = P(M ∩ S)/P(S) = P(S|M)P(M)/P(S) = (0.30 × 0.60)/0.24 = 3/4.
Theorem 0.29 (Bayes’ Theorem). Let {Eα : α ∈ ∆} be a countable collection of mutually exclusive and exhaustive
events and let E be any event with P(E) > 0. Then, for j ∈ ∆ with P(Ej) > 0,

P(Ej|E) = P(E|Ej)P(Ej) / Σ_{α∈∆, P(Eα)>0} P(E|Eα)P(Eα).

Proof. For j ∈ ∆,

P(Ej|E) = P(Ej ∩ E)/P(E) = P(E|Ej)P(Ej) / Σ_{α∈∆, P(Eα)>0} P(E|Eα)P(Eα)   (using the Theorem of Total Probability).

This completes the proof.

Remark 0.30. (a) Suppose that the occurrence of any one of the mutually exclusive and exhaustive events {Eα : α ∈ ∆}
(where ∆ is a countable set) may cause the occurrence of an event E. Given that the event E has occurred (i.e., given
the effect), Bayes’ Theorem provides the conditional probability that the effect E was caused by the occurrence of the
event Ej, j ∈ ∆.
(b) In Bayes’ Theorem {P (Ej ) : j ∈ ∆} are called prior probabilities and {P (Ej |E) : j ∈ ∆} are called posterior
probabilities.
Example 0.31. Bowl C1 contains 3 red and 7 blue chips. Bowl C2 contains 8 red and 2 blue chips. Bowl C3 contains
5 red and 5 blue chips. All chips of the same colour are identical.
A die is cast and a bowl is selected as per the following scheme:

Bowl C1 is selected if 5 or 6 spots show on the upper side,
Bowl C2 is selected if 2, 3 or 4 spots show on the upper side,
Bowl C3 is selected if 1 spot shows on the upper side.

The selected bowl is handed over to another person who draws two chips at random from this bowl. Find the
probability that:
(a) two red chips are drawn,
(b) given that the drawn chips are both red, they came from bowl C3.

Solution: Define the events

Ai: selected bowl is Ci, i = 1, 2, 3, and R: the chips drawn from the selected bowl are both red.

Then P(A1) = 2/6 = 1/3, P(A2) = 3/6 = 1/2, P(A3) = 1/6. Note that {A1, A2, A3} are mutually exclusive and
exhaustive.
(a)

P(R) = P(R|A1)P(A1) + P(R|A2)P(A2) + P(R|A3)P(A3)
= [C(3, 2)/C(10, 2)] × (1/3) + [C(8, 2)/C(10, 2)] × (1/2) + [C(5, 2)/C(10, 2)] × (1/6) = 10/27.

(b)

P(A3|R) = P(R|A3)P(A3)/P(R) = {[C(5, 2)/C(10, 2)] × (1/6)}/(10/27) = 1/10.
Remark 0.32. In the above example,

P(A1|R) = P(R|A1)P(A1)/P(R) = {[C(3, 2)/C(10, 2)] × (1/3)}/(10/27) = 3/50,
P(A2|R) = P(R|A2)P(A2)/P(R) = {[C(8, 2)/C(10, 2)] × (1/2)}/(10/27) = 21/25,

P(A1|R) = 3/50 < 1/3 = P(A1) ⟺ P(A1 ∩ R) < P(A1)P(R) ⟷ R has negative information about A1,
P(A2|R) = 21/25 > 1/2 = P(A2) ⟺ P(A2 ∩ R) > P(A2)P(R) ⟷ R has positive information about A2,
P(A3|R) = 1/10 < 1/6 = P(A3) ⟺ P(A3 ∩ R) < P(A3)P(R) ⟷ R has negative information about A3.

Note that the proportion of red chips in C2 is greater than the proportion of red chips in Ci, i = 1, 3.
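Both the total probability P(R) = 10/27 and all three posterior probabilities can be reproduced exactly (a sketch; the dictionaries `priors`, `reds` and `lik` are ad hoc names):

```python
from fractions import Fraction
from math import comb

priors = {1: Fraction(1, 3), 2: Fraction(1, 2), 3: Fraction(1, 6)}  # P(Ai)
reds = {1: 3, 2: 8, 3: 5}   # red chips (out of 10) in bowl Ci
# P(R|Ai): both chips drawn from bowl Ci are red
lik = {i: Fraction(comb(reds[i], 2), comb(10, 2)) for i in priors}

P_R = sum(lik[i] * priors[i] for i in priors)               # total probability
posterior = {i: lik[i] * priors[i] / P_R for i in priors}   # P(Ai|R) by Bayes
```

The posteriors come out to 3/50, 21/25 and 1/10, matching Remark 0.32.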

Independent Events:
Definition 0.33. Let {Ej : j ∈ ∆} be a collection of events.
(i) Events {Ej : j ∈ ∆} are said to be pairwise independent if for any pair of events Eα and Eβ (α, β ∈ ∆, α 6= β)
in the collection {Ej : j ∈ ∆}, we have

P (Eα ∩ Eβ ) = P (Eα )P (Eβ ).

(ii) Events {E1, E2, . . . , En} are said to be independent if for any subcollection {Eα1, Eα2, . . . , Eαk} of {E1, E2, . . . , En}
(k = 1, 2, . . . , n), we have

P(⋂_{j=1}^k Eαj) = ∏_{j=1}^k P(Eαj).

(iii) Let ∆ ⊆ R be an arbitrary index set so that {Eα : α ∈ ∆} is an arbitrary collection of events. Events
{Eα : α ∈ ∆} are said to be independent if any finite subcollection of events in {Eα : α ∈ ∆} forms a collection of
independent events.

Theorem 0.34. Let E1, E2, . . . be a collection of independent events. Then

P(⋂_{k=1}^∞ Ek) = ∏_{k=1}^∞ P(Ek).

Proof. Let Bn = ⋂_{k=1}^n Ek, n = 1, 2, . . .. Then Bn ↓ and P(⋂_{n=1}^∞ Bn) = lim_{n→∞} P(Bn). But ⋂_{n=1}^∞ Bn = ⋂_{k=1}^∞ Ek
and P(Bn) = P(⋂_{k=1}^n Ek) = ∏_{k=1}^n P(Ek). Thus,

P(⋂_{k=1}^∞ Ek) = lim_{n→∞} ∏_{k=1}^n P(Ek) = ∏_{k=1}^∞ P(Ek).

This completes the proof.


Remark 0.35. (i) To verify that n events E1, E2, . . . , En are independent one must verify

C(n, 2) + C(n, 3) + · · · + C(n, n) = 2^n − n − 1 conditions.

For example, to conclude that three events E1, E2, E3 are independent, the following four (as 2^3 − 3 − 1 = 4)
conditions must be verified:

P(E1 ∩ E2) = P(E1)P(E2), P(E1 ∩ E3) = P(E1)P(E3), P(E2 ∩ E3) = P(E2)P(E3),

and

P(E1 ∩ E2 ∩ E3) = P(E1)P(E2)P(E3).

(ii) Any subcollection of independent events is independent. In particular, the independence of a collection of events
implies their pairwise independence.
(iii) If E1 and E2 are independent events (P(E1) > 0, P(E2) > 0), then

P(E1|E2) = P(E1 ∩ E2)/P(E2) = P(E1)P(E2)/P(E2) = P(E1),

that is, the conditional probability of E1 given E2 is the same as the unconditional probability of E1.
Similarly, if E1, E2 and E3 are independent events then P(E1|E2 ∩ E3) = P(E1).
Example 0.36. Consider the probability space (Ω, F, P ) with Ω = {1, 2, 3, 4} and P ({i}) = 1/4, i = 1, 2, 3, 4. Let
A = {1, 4}, B = {2, 4}, C = {3, 4}. Then, show that A, B and C are pairwise independent but not independent.

Solution: We have P (A) = P (B) = P (C) = 1/2. Also, P (A ∩ B) = P (A ∩ C) = P (B ∩ C) = P ({4}) = 1/4.


Thus,
P (A ∩ B) = P (A)P (B), P (A ∩ C) = P (A)P (C), P (B ∩ C) = P (B)P (C),
which implies that A, B and C are pairwise independent. However,

P (A ∩ B ∩ C) = P ({4}) = 1/4 ≠ 1/8 = P (A)P (B)P (C),

which implies that A, B and C are not independent although they are pairwise independent.
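Example 0.36 can be checked by enumeration (a sketch):

```python
from fractions import Fraction
from itertools import combinations

# Example 0.36: four equally likely points; A, B, C each contain the point 4.
omega = {1, 2, 3, 4}

def P(E):
    return Fraction(len(E), 4)

A, B, C = {1, 4}, {2, 4}, {3, 4}

# every pair factorizes, but the triple does not
pairwise = all(P(X & Y) == P(X) * P(Y) for X, Y in combinations([A, B, C], 2))
fully = (P(A & B & C) == P(A) * P(B) * P(C))
```

The enumeration confirms pairwise independence while the triple condition fails (1/4 ≠ 1/8).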

Example 0.37. Let E1 , E2 , . . . , En be a collection of independent events. Show that,


(a) for any permutation (α1, . . . , αn) of (1, . . . , n), Eα1, Eα2, . . . , Eαn are independent;
(b) E1, E2, . . . , Ek, E^c_{k+1}, . . . , E^c_n are independent for any k ∈ {0, 1, . . . , n − 1};
(c) E1^c and E2 ∪ E3^c ∪ E5 are independent;
(d) E1 ∪ E2^c, E3^c and E4 ∩ E5^c are independent.
Remark 0.38. When we say that two random experiments are performed independently, we mean that the events
associated with the two random experiments are independent.

0.4. Random Variables and their Distribution Functions

Let (Ω, F, P ) be a given probability space. In some situations we may not be directly interested in the sample space
Ω; rather we may be interested in some numerical aspect of Ω.
Example 0.39. A fair coin (head and tail are equally likely) is tossed three times independently. Then,

Ω = {HHH, HHT, HT H, HT T, T T T, T T H, T HT, T HH}

and P({ω}) = 1/8 for all ω ∈ Ω. Suppose that we are interested in the number of heads in three tosses, i.e., we are
interested in the function X : Ω → R defined as

X(ω) = 0, if ω = TTT,
       1, if ω ∈ {HTT, THT, TTH},
       2, if ω ∈ {HHT, HTH, THH},
       3, if ω = HHH.

Clearly the values assumed by X are random with

P (X = 0) = P (X = 3) = 1/8 and P (X = 1) = P (X = 2) = 3/8.

Hence P (X ∈ {0, 1, 2, 3}) = 1.
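The induced probabilities above can be verified by brute-force enumeration of Ω. A minimal Python sketch (the helper name `pmf_of_heads` is ours, not from the notes); `Fraction` keeps the arithmetic exact:

```python
from itertools import product
from fractions import Fraction

def pmf_of_heads(n_tosses=3):
    """Enumerate Omega for n fair tosses and return the p.m.f. of
    X = number of heads, with exact fractions."""
    omega = list(product("HT", repeat=n_tosses))   # 2**n equally likely outcomes
    p = Fraction(1, len(omega))                    # P({omega}) = 1/8 when n = 3
    pmf = {}
    for outcome in omega:
        x = outcome.count("H")                     # X(omega) = number of heads
        pmf[x] = pmf.get(x, Fraction(0)) + p
    return pmf

pmf = pmf_of_heads()
# Matches P(X=0) = P(X=3) = 1/8 and P(X=1) = P(X=2) = 3/8 from the example.
```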


Definition 0.40. Let (Ω, F, P ) be a given probability space. A real valued measurable function X : Ω → R (defined
on sample space Ω) is called a random variable (r.v.).

Note: From a rigorous mathematical point of view a random variable is a real valued function satisfying some technical
conditions. In this course we ignore these technical details. For all practical purposes a r.v. is a real valued function
defined on Ω.
For a function Y : Ω → R and A ⊆ R, define

Y −1 (A) = {ω ∈ Ω : Y (ω) ∈ A}.

Then it is straightforward to prove the following result:


Proposition 0.41. Let A ⊆ R, B ⊆ R and Aα ⊆ R, α ∈ Λ, where Λ is an arbitrary index set. Let Y : Ω → R be a
given function. Then
(a) If A ∩ B = φ, then Y −1 (A) ∩ Y −1 (B) = φ;
(b) Y −1 (Ac ) = (Y −1 (A))c (that is, Y −1 (R − A) = Y −1 (R) − Y −1 (A) = Ω − Y −1 (A));

(c) Y⁻¹(⋃_{α∈Λ} Aα) = ⋃_{α∈Λ} Y⁻¹(Aα);

(d) Y⁻¹(⋂_{α∈Λ} Aα) = ⋂_{α∈Λ} Y⁻¹(Aα).

For a probability space (Ω, F, P) and a r.v. X : Ω → R, note that ∀ B ∈ B,

X⁻¹(B) = {ω ∈ Ω : X(ω) ∈ B} ∈ F.
Thus, one can define a set function PX : B → [0, 1] by

PX(B) = P(X⁻¹(B)) = P({ω ∈ Ω : X(ω) ∈ B}), B ∈ B,

where B is some class of subsets of R. For all practical purposes we take B to be the sigma algebra generated by the open subsets of R. We simply write

PX(B) = P({ω ∈ Ω : X(ω) ∈ B}) = P(X ∈ B), B ∈ B.

We have the following scenario: (Ω, F, P) —X→ (R, B, PX).
Theorem 0.42 (Induced probability space / measures). (R, B, PX ) (as defined above) is a probability space, i.e.,
PX (·) is a probability function defined on B.

Proof. (i) PX(R) = P(X ∈ R) = P(X⁻¹(R)) = P(Ω) = 1.

(ii) For any B ∈ B, PX(B) = P(X⁻¹(B)) ≥ 0.

(iii) Let {Bn} be a collection of mutually exclusive events in B. Then,

PX(⋃_{n=1}^∞ Bn) = P(X⁻¹(⋃_{n=1}^∞ Bn))
                 = P(⋃_{n=1}^∞ X⁻¹(Bn))      (Proposition 0.41(c))
                 = Σ_{n=1}^∞ P(X⁻¹(Bn))      (P is a probability measure and Proposition 0.41(a))
                 = Σ_{n=1}^∞ PX(Bn).

This completes the proof.


Definition 0.43. The probability function PX defined above is called the probability function/ measure induced by
r.v. X and (R, B, PX ) is called the probability space induced by r.v. X.

The induced probability measure PX describes the random behaviour of X.


Example 0.44. Toss a coin three times independently. Then,
Ω = {HHH, HHT, HT H, T HH, HT T, T HT, T T H, T T T } and P ({ω}) = 1/8, ∀ ω ∈ Ω
and X : Ω → R (number of heads in three tosses) is defined by
X(ω) = 0, if ω ∈ {TTT},
       1, if ω ∈ {HTT, THT, TTH},
       2, if ω ∈ {HHT, HTH, THH},
       3, if ω ∈ {HHH}.


Obviously, X : Ω → R is r.v. with induced probability space given by (R, B, PX ), where


PX ({0}) = P ({T T T }) = 1/8,
PX ({1}) = P ({HT T, T HT, T T H}) = 3/8,
PX ({2}) = P ({HHT, HT H, T HH}) = 3/8,
PX ({3}) = P ({HHH}) = 1/8.
Now for any B ∈ B,

PX(B) = P(X⁻¹(B)) = P({ω ∈ Ω : X(ω) ∈ B}) = Σ_{i∈B∩{0,1,2,3}} PX({i}).

Definition 0.45. Let X be a r.v. defined on probability space (Ω, F, P ) and let (R, B, PX ) denote the probability
space induced by X. Define the function FX : R → R by
FX(x) = P(X ≤ x) = P(X⁻¹((−∞, x])) = PX((−∞, x]), x ∈ R.
The function FX is called the cumulative distribution function (c.d.f.) or simply the distribution function (d.f.) of r.v.
X.

Note: Whenever there is no ambiguity we will drop subscript X in FX to represent d.f. of a r.v. by F . It can be shown
(in advanced courses) that the c.d.f. FX (·) of a r.v. X determines the induced probability measure PX (·) uniquely.
Thus to study the random behaviour of r.v. X it suffices to study its d.f. F .
Example 0.46. In the previous example
P (X = 0) = PX ({0}) = 1/8, P (X = 1) = PX ({1}) = 3/8 = P (X = 2) = PX ({2})
and P (X = 3) = PX ({3}) = 1/8. Then, the d.f. of X is obtained as


FX(x) = P(X ≤ x) = P({ω : X(ω) ≤ x}) = Σ_{i∈{0,1,2,3}: i≤x} PX({i})
      = 0,                x < 0,
        1/8,              0 ≤ x < 1,
        1/8 + 3/8 = 1/2,  1 ≤ x < 2,
        7/8,              2 ≤ x < 3,
        1,                x ≥ 3.
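The passage from a p.m.f. to the d.f. above is just a cumulative sum over the support; a small Python sketch (the function name `cdf` is our choice):

```python
from fractions import Fraction

def cdf(pmf, x):
    """F(x) = P(X <= x) for a discrete r.v. given as {value: probability}."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))

# p.m.f. of the number of heads in three fair tosses (Example 0.46)
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
```

Evaluating `cdf` between support points reproduces the step function: e.g. cdf(pmf, 1.5) gives 1/2 and cdf(pmf, 2) gives 7/8.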

Theorem 0.47. Let F (·) be the c.d.f. of a r.v. X defined on a probability space (Ω, F, P ) and let (R, B, PX ) be the
probability space induced by X. Then
(i) F is non-decreasing,
(ii) F (x) is right continuous,
(iii) F(−∞) = lim_{n↑∞} F(−n) = 0 and F(∞) = lim_{n↑∞} F(n) = 1.

Conversely, any function G(·) satisfying properties (i)-(iii) is a d.f. of some r.v. Y defined on a probability space
(Ω∗ , F∗ , P ∗ ).

Proof. (i) Let −∞ < x < y < ∞. Then (−∞, x] ⊆ (−∞, y] =⇒ PX ((−∞, x]) ≤ PX ((−∞, y]). This implies
that F (x) ≤ F (y).
(ii) Since F is monotone and bounded below (by 0), lim_{h↓0} F(x + h) = F(x+) exists ∀ x ∈ R. Therefore,

F(x+) = lim_{h↓0} F(x + h) = lim_{n→∞} F(x + 1/n) = lim_{n→∞} PX((−∞, x + 1/n]).

Let An = (−∞, x + 1/n], n = 1, 2, . . . . Then An ↓ and ⋂_{n=1}^∞ (−∞, x + 1/n] = (−∞, x]. Thus,

F(x+) = PX(⋂_{n=1}^∞ (−∞, x + 1/n]) = PX((−∞, x]) = F(x).

(iii) Note that

F(−∞) = lim_{n→∞} F(−n) = lim_{n→∞} PX((−∞, −n]) = PX(⋂_{n=1}^∞ (−∞, −n])    ((−∞, −n] ↓)
      = PX(φ)                                                                 (⋂_{n=1}^∞ (−∞, −n] = φ)
      = 0.

Also,

F(+∞) = lim_{n→∞} F(n) = lim_{n→∞} PX((−∞, n]) = PX(⋃_{n=1}^∞ (−∞, n])       ((−∞, n] ↑)
      = PX(R)                                                                 (⋃_{n=1}^∞ (−∞, n] = R)
      = 1.

This completes the proof.


Remark 0.48. (i) Since any distribution function is monotone and bounded above (by 1), lim_{h↓0} F(x − h) = F(x−) exists ∀ x ∈ R. Moreover,

F(x−) = lim_{h↓0} F(x − h) = lim_{n→∞} F(x − 1/n) = lim_{n→∞} PX((−∞, x − 1/n])
      = PX(⋃_{n=1}^∞ (−∞, x − 1/n])     ((−∞, x − 1/n] ↑)
      = PX((−∞, x)) = P(X < x).

(ii) From calculus we know that any monotone function is either continuous on R or has at most a countable number
of discontinuities. Thus any c.d.f. F(x) is either continuous on R or has at most a countable number of discontinuities.
Since, for any x ∈ R, F(x+) and F(x−) exist, F has only jump discontinuities (at a discontinuity point, F(x) = F(x+) > F(x−)).
(iii) A distribution function F is continuous at a ∈ R iff F (a) = F (a−).

(iv) For any a ∈ R, P (X = a) = P (X ≤ a) − P (X < a) = F (a) − F (a−). Thus, a d.f. F is continuous at a ∈ R


iff P (X = a) = F (a) − F (a−) = 0.

(v) For −∞ < a < b < ∞, P(X ≤ b) = P(X ≤ a) + P(a < X ≤ b), so that

P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a).

Similarly, for −∞ < a < b < ∞,

P(a < X < b) = P(X < b) − P(X ≤ a) = F(b−) − F(a),
P(a ≤ X ≤ b) = P(X ≤ b) − P(X < a) = F(b) − F(a−),
P(a ≤ X < b) = P(X < b) − P(X < a) = F(b−) − F(a−),
P(X > a) = 1 − P(X ≤ a) = 1 − F(a),
P(X ≥ a) = 1 − P(X < a) = 1 − F(a−).

Example 0.49. Consider the function G : R → R defined by

G(x) = 0,    if x < 0,
       x/3,  if 0 ≤ x < 1,
       1/2,  if 1 ≤ x < 2,
       2/3,  if 2 ≤ x < 3,
       1,    if x ≥ 3.

(a) Show that G is the d.f. of some r.v. X,
(b) Find P(X = a) for various values of a ∈ R,
(c) Find P(X < 3), P(X ≥ 1/2), P(2 < X ≤ 4), P(1 ≤ X < 2), P(2 ≤ X ≤ 3) and P(1/2 < X < 3).

Solution: (a) Clearly G is non-decreasing in (−∞, 0), (0, 1), (1, 2), (2, 3) and (3, ∞). Moreover,

G(0) − G(0−) = 0 ≥ 0,        G(1) − G(1−) = 1/2 − 1/3 > 0,
G(2) − G(2−) = 2/3 − 1/2 > 0,    G(3) − G(3−) = 1 − 2/3 > 0.
It follows that G is non-decreasing.
Clearly G is continuous (and hence right continuous) on (−∞, 0), (0, 1), (1, 2), (2, 3) and (3, ∞). Moreover,

G(0+) − G(0) = 0 − 0 = 0,        G(1+) − G(1) = 1/2 − 1/2 = 0,
G(2+) − G(2) = 2/3 − 2/3 = 0,    G(3+) − G(3) = 1 − 1 = 0,

so G is right continuous on R. Also, G(+∞) = lim_{x→∞} G(x) = 1 and G(−∞) = lim_{x→∞} G(−x) = 0. Thus, G is the d.f. of some random variable X.

(b) The set of discontinuity points of G is D = {1, 2, 3}. Thus,

P(X = a) = G(a) − G(a−) = 0, ∀ a ∉ {1, 2, 3},
P(X = 1) = G(1) − G(1−) = 1/2 − 1/3 = 1/6,
P(X = 2) = G(2) − G(2−) = 2/3 − 1/2 = 1/6,
P(X = 3) = G(3) − G(3−) = 1 − 2/3 = 1/3.

(c) Note that

P(X < 3) = G(3−) = 2/3,
P(X ≥ 1/2) = 1 − G((1/2)−) = 1 − 1/6 = 5/6,
P(2 < X ≤ 4) = G(4) − G(2) = 1 − 2/3 = 1/3,
P(1 ≤ X < 2) = G(2−) − G(1−) = 1/2 − 1/3 = 1/6,
P(2 ≤ X ≤ 3) = G(3) − G(2−) = 1 − 1/2 = 1/2,
P(1/2 < X < 3) = G(3−) − G(1/2) = 2/3 − 1/6 = 1/2.
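Part (b) can be cross-checked numerically: the jump G(a) − G(a−) is approximated by stepping a tiny amount to the left of a. A sketch (the names `G`, `G_left` and `jumps` are ours):

```python
from fractions import Fraction

def G(x):
    """The d.f. of Example 0.49, evaluated exactly."""
    x = Fraction(x)
    if x < 0:
        return Fraction(0)
    if x < 1:
        return x / 3
    if x < 2:
        return Fraction(1, 2)
    if x < 3:
        return Fraction(2, 3)
    return Fraction(1)

def G_left(a):
    """Approximate the left limit G(a-) by stepping slightly left of a."""
    return G(Fraction(a) - Fraction(1, 10**9))

# P(X = a) = G(a) - G(a-); the only jumps are at 1, 2 and 3.
jumps = {a: G(a) - G_left(a) for a in (1, 2, 3)}
```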

0.5. Discrete Random Variables

Let (Ω, F, P ) be a probability space and let X : Ω → R be a r.v. with induced probability space (R, B, PX ) and d.f.
F.
Definition 0.50. The r.v. X is said to be a discrete r.v. if there exists a countable set S (finite or infinite) such that
P (X = x) = F (x) − F (x−) > 0 ∀ x ∈ S, and P (X ∈ S) = 1.
The set S is called the support of r.v. X.
Remark 0.51. (i) If S is the support of a discrete r.v. X, then clearly
S = {x ∈ R : F (x) − F (x−) > 0} = set of discontinuity points of F.

(ii) If x is a discontinuity point of the d.f. F then

F(x) − F(x−) = size of the jump of F at x.

Thus, a r.v. X is of discrete type ⇐⇒ the sum of the jumps of F equals 1, i.e.,

P(X ∈ S) = Σ_{x∈S} P(X = x) = Σ_{x∈S} [F(x) − F(x−)] = 1.

Example 0.52. In Example 0.49 the set of discontinuity points of G is D = {1, 2, 3} and

Σ_{x∈D} [G(x) − G(x−)] = 1/6 + 1/6 + 1/3 = 2/3 < 1 =⇒ X is not a discrete r.v.

Example 0.53. Consider the d.f. (see Example 0.46)

FX(x) = 0,    if x < 0,
        1/8,  if 0 ≤ x < 1,
        1/2,  if 1 ≤ x < 2,
        7/8,  if 2 ≤ x < 3,
        1,    if x ≥ 3.

The set of discontinuity points of F is D = {0, 1, 2, 3} with


     
Σ_{x∈D} [F(x) − F(x−)] = 1/8 + (1/2 − 1/8) + (7/8 − 1/2) + (1 − 7/8) = 1,

which implies that X is a discrete r.v. with support S = D = {0, 1, 2, 3}.



Definition 0.54. Let X be a r.v. with c.d.f. FX and support SX . Define the function fX : R → R by
fX(x) = P(X = x) = FX(x) − FX(x−) > 0, if x ∈ SX,
        0, otherwise.

The function fX is called the probability mass function (p.m.f.) of r.v. X.

Whenever there is no ambiguity we will drop subscript X in FX , SX and fX to represent the d.f. of X by F , the
support of X by S and the p.m.f. of X by f .
Remark 0.55. (i) Let X be a discrete r.v. with p.m.f. f and d.f. F. Then, for any A ⊆ R,

P(X ∈ A) = P(X ∈ A ∩ S) = Σ_{x∈A∩S} f(x),    (A ∩ S ⊆ S and thus A ∩ S is a countable set),

where S is the support of X. Moreover, F(x) = Σ_{y∈S∩(−∞,x]} f(y). Also, for any x ∈ S, f(x) = F(x) − F(x−).

(ii) Clearly a d.f. determines the p.m.f. uniquely and vice-versa. Thus it suffices to study the p.m.f. of discrete r.v.
(iii) Let X be a discrete r.v. with p.m.f. f and support S. Then, f : R → R satisfies

(i) f(x) > 0, ∀ x ∈ S, and (ii) Σ_{x∈S} f(x) = 1.

Conversely, suppose that g : R → R is a function such that, for some countable set T,

(i) g(x) > 0, ∀ x ∈ T, and (ii) Σ_{x∈T} g(x) = 1.

Then, g(·) is the p.m.f. of some discrete r.v. having support T.


Example 0.56. Let X be a r.v. having d.f.

F(x) = 0,    if x < 0,
       1/8,  if 0 ≤ x < 1,
       1/2,  if 1 ≤ x < 2,
       7/8,  if 2 ≤ x < 3,
       1,    if x ≥ 3.

We have seen in Example 0.53 that X is a discrete r.v with support S = {0, 1, 2, 3}. Then, the p.m.f. of X is
f : R → R, where

f (0) = F (0) − F (0−) = 1/8, f (1) = F (1) − F (1−) = 1/2 − 1/8 = 3/8,
f (2) = F (2) − F (2−) = 7/8 − 1/2 = 3/8 and f (3) = F (3) − F (3−) = 1 − 7/8 = 1/8.

Thus, the p.m.f. of X is

f(x) = 1/8, x = 0, 3,
       3/8, x = 1, 2,
       0,   otherwise.


Example 0.57. A fair die (all outcomes are equally likely) is tossed repeatedly and independently until a 6 is observed.
Let X denote the number of tosses required. Then X is a discrete r.v. with support S = {1, 2, 3, . . . }, p.m.f.

f(x) = P(X = x) = (5/6)^{x−1} (1/6), if x = 1, 2, 3, . . . ,
                  0,                 otherwise,

and d.f.

F(x) = 0,                                            if x < 1,
       Σ_{j=1}^{i} (5/6)^{j−1} (1/6) = 1 − (5/6)^i,  if i ≤ x < i + 1, i = 1, 2, . . . ;

in particular, F(x) = 1/6 for 1 ≤ x < 2 and F(x) = 11/36 for 2 ≤ x < 3.
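The p.m.f. and d.f. of Example 0.57 are easy to code and check against each other; a sketch in Python (function names ours, exact arithmetic via `Fraction`):

```python
from fractions import Fraction

def f(x):
    """p.m.f. of Example 0.57: number of tosses until the first 6."""
    if isinstance(x, int) and x >= 1:
        return Fraction(5, 6) ** (x - 1) * Fraction(1, 6)
    return Fraction(0)

def F(x):
    """d.f.: F(x) = 1 - (5/6)**i for i <= x < i + 1, i = 1, 2, ..."""
    i = int(x)                 # truncation works as floor for the x >= 0 we need
    if i < 1:
        return Fraction(0)
    return 1 - Fraction(5, 6) ** i
```

Summing f over 1, . . . , i reproduces F(i), the defining relation between a p.m.f. and its d.f.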

0.6. Continuous Random Variable

Let X be a random variable with d.f. F .


Definition 0.58. The r.v. X is said to be a continuous r.v. if there exists a non-negative integrable function f : R → [0, ∞) such that, for any x ∈ R,

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt.

The function f(·) is called the probability density function (p.d.f.) of X. The support of the continuous r.v. X is the set S = {x ∈ R : F(x + h) − F(x − h) > 0 ∀ h > 0}, that is, S = {x ∈ R : ∫_{x−h}^{x+h} f(t) dt > 0 ∀ h > 0}.

Remark 0.59. (i) From the fundamental theorem of calculus, we know that the definite integral

F(x) = ∫_{−∞}^{x} f(t) dt

is a continuous function on R. Thus, the d.f. F of any continuous r.v. X is continuous everywhere on R. In particular,

P(X = x) = F(x) − F(x−) = 0, ∀ x ∈ R.

More generally, if A is any countable subset of R then for any continuous r.v. X,

P(X ∈ A) = Σ_{x∈A} P(X = x) = 0.

(ii) If X is a continuous r.v. then


(a) P (X < x) = P (X ≤ x) = F (x) ∀ x ∈ R,
(b) P (X ≥ x) = 1 − P (X < x) = 1 − F (x) ∀ x ∈ R,
(c) For any a, b ∈ R, −∞ < a < b < ∞,

P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a ≤ X ≤ b) = F(b) − F(a)
             = ∫_{−∞}^{b} f(t) dt − ∫_{−∞}^{a} f(t) dt = ∫_{a}^{b} f(t) dt.

(iii) Let f(·) be the p.d.f. of a continuous r.v. X and let E ⊆ R be any countable subset of R. Define g : R → [0, ∞) by

g(x) = f(x), if x ∈ R ∩ E^c,
       C_x,  if x ∈ E,

where C_x ≥ 0 are arbitrary. Then

F(x) = ∫_{−∞}^{x} f(t) dt = ∫_{−∞}^{x} g(t) dt ∀ x ∈ R

and, thus, g is also a p.d.f. of X. Thus, the p.d.f. of a continuous r.v. is not unique.
(iv) There are random variables that are neither discrete nor continuous (see Example 0.49). Such random variables
will not be studied here.

We state the following theorem without proof.


Theorem 0.60. Let X be a r.v. with d.f. F. Suppose that F is differentiable everywhere except (possibly) on a countable set E. Further suppose that ∫_{−∞}^{∞} F′(t) dt = 1. Then, X is a continuous r.v. with p.d.f.

f(x) = F′(x), x ∈ E^c,
       0,     x ∈ E.

Remark 0.61. (i) The p.d.f. determines the d.f. uniquely. Converse is not true. However, the d.f. determines the p.d.f.
almost uniquely (they may vary on sets that have no length (or have zero content)). Thus it is enough to study the p.d.f.
of a continuous r.v.
(ii) Let X be a continuous r.v. with p.d.f. f(x). Then,

(a) f(x) ≥ 0 ∀ x ∈ R and (b) ∫_{−∞}^{∞} f(t) dt = 1.

Conversely, suppose that g : R → R is a function such that

(a) g(x) ≥ 0 ∀ x ∈ R, and (b) ∫_{−∞}^{∞} g(t) dt = 1.

Then, g(·) is the p.d.f. of some continuous r.v. having support T = {x ∈ R : ∫_{x−h}^{x+h} g(t) dt > 0 ∀ h > 0}.

Example 0.62. Let X be a r.v. with d.f.

F(x) = 0,     if x < 0,
       x/4,   if 0 ≤ x < 1,
       x/3,   if 1 ≤ x < 2,
       3x/8,  if 2 ≤ x < 5/2,
       1,     if x ≥ 5/2.

Examine whether X is a continuous r.v., a discrete r.v., or neither.



Solution: Let D be the set of discontinuity points of F. Then D = {1, 2, 5/2}. Since D ≠ φ, X is not a continuous r.v. Also,

Σ_{x∈D} [F(x) − F(x−)] = (1/3 − 1/4) + (3/4 − 2/3) + (1 − 15/16) = 11/48 < 1 =⇒ X is not a discrete r.v.

Thus, X is neither a discrete nor a continuous r.v.


Example 0.63. Let X be a r.v. with d.f.

F(x) = 0,     if x < 0,
       x²/2,  if 0 ≤ x < 1,
       x/2,   if 1 ≤ x < 2,
       1,     if x ≥ 2.

Show that X is a continuous r.v. Find the p.d.f. of X and the support of X.

Solution: Clearly F is continuous everywhere. Moreover, F is differentiable everywhere except at the two (countable) points 1 and 2, and

F′(x) = 0,   if x < 0,
        x,   if 0 < x < 1,
        1/2, if 1 < x < 2,
        0,   if x ≥ 2.

Also, ∫_{−∞}^{∞} F′(x) dx = ∫_{0}^{1} x dx + ∫_{1}^{2} (1/2) dx = 1 =⇒ X is a continuous r.v. with p.d.f.

f(x) = x,   if 0 < x < 1,
       1/2, if 1 < x < 2,
       0,   otherwise.

The support of X is

S = {x ∈ R : F(x + h) − F(x − h) > 0 ∀ h > 0} = {x ∈ R : ∫_{x−h}^{x+h} f(t) dt > 0 ∀ h > 0} = [0, 2].
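The claim ∫ F′ = 1 in Example 0.63 can be confirmed with a simple midpoint-rule integration of the recovered p.d.f. (the helper `F_numeric` is ours):

```python
def f(x):
    """p.d.f. obtained in Example 0.63."""
    if 0 < x < 1:
        return x
    if 1 < x < 2:
        return 0.5
    return 0.0

def F_numeric(x, n=100000):
    """Midpoint-rule approximation of F(x) = integral of f over (0, x]."""
    if x <= 0:
        return 0.0
    h = x / n
    return sum(f((k + 0.5) * h) for k in range(n)) * h
```

F_numeric(1) recovers F(1) = 1/2 and F_numeric(2) recovers F(2) = 1, matching the given d.f.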

Example 0.64. Let X be a continuous r.v. with p.d.f.

f(x) = x²,       if 0 < x < 1,
       c e^{−x}, if x ≥ 1,    where c ≥ 0 is a constant,
       0,        otherwise.

(a) Find the value of c,


(b) Find P (1/2 ≤ X ≤ 2),
(c) Find the support of X,
(d) Find the d.f. of X.

Solution: (a) We have

∫_{−∞}^{∞} f(x) dx = 1 =⇒ ∫_{0}^{1} x² dx + ∫_{1}^{∞} c e^{−x} dx = 1 =⇒ 1/3 + c e^{−1} = 1 =⇒ c = 2e/3.

(b) Observe that

P(1/2 ≤ X ≤ 2) = ∫_{1/2}^{2} f(x) dx = ∫_{1/2}^{1} x² dx + c ∫_{1}^{2} e^{−x} dx
               = (1/3)(1 − 1/8) + c(e^{−1} − e^{−2}) = 7/24 + (2/3)(1 − e^{−1}).

(c) The support of X is S = {x ∈ R : ∫_{x−h}^{x+h} f(t) dt > 0 ∀ h > 0} = [0, ∞).

(d) The d.f. of X is F(x) = ∫_{−∞}^{x} f(t) dt. For x < 0, clearly F(x) = 0. For 0 ≤ x < 1,

F(x) = ∫_{0}^{x} t² dt = x³/3.

For x ≥ 1,

F(x) = ∫_{0}^{1} t² dt + c ∫_{1}^{x} e^{−t} dt = 1/3 + (2/3)(1 − e^{−(x−1)}).

Thus,

F(x) = 0,                            if x < 0,
       x³/3,                         if 0 ≤ x < 1,
       1/3 + (2/3)(1 − e^{−(x−1)}),  if x ≥ 1.
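The constant c = 2e/3 and the probability in part (b) can be double-checked by numerical integration; a sketch using the midpoint rule (helper names ours):

```python
import math

c = 2 * math.e / 3     # constant found in part (a)

def f(x):
    """p.d.f. of Example 0.64 with c = 2e/3."""
    if 0 < x < 1:
        return x ** 2
    if x >= 1:
        return c * math.exp(-x)
    return 0.0

def integral(a, b, n=200000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

total = integral(0.0, 50.0)             # tail beyond 50 is negligible
p = integral(0.5, 2.0)                  # P(1/2 <= X <= 2)
expected_p = 7 / 24 + (2 / 3) * (1 - math.exp(-1))
```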
Remark 0.65. Let X be a continuous r.v. with p.d.f. f(·). If f is continuous at x ∈ R, then

f(x) = lim_{δ↓0} (1/δ) ∫_{x−δ/2}^{x+δ/2} f(t) dt =⇒ P(x − δ/2 ≤ X ≤ x + δ/2) ≈ δ f(x), for small δ > 0,

that is, P(x − dx ≤ X ≤ x + dx) ≈ f(x) dx.

0.7. Probability Distribution of a Function of Discrete Random Variable

Let (Ω, F, P ) be a probability space and let X : Ω → R be a r.v. with d.f. F , p.m.f. f and support S. Let h : R → R
be a given function. Define Z : Ω → R as
Z(ω) = h(X(ω)), ω ∈ Ω.
Then Z is a r.v. and it is a function of r.v. X. Since we are only interested in values of random variables X and Z and
not in the original probability space (Ω, F, P ), we simply write X(ω), ω ∈ Ω as X and Z(ω), ω ∈ Ω as Z.
We have F (x) = P (X ≤ x), f (x) = P (X = x), x ∈ R, P (X ∈ S) = 1 and P (X = x) > 0 for all x ∈ S.
Define T = h(S) = {h(x) : x ∈ S}. For any set A ⊆ R, define
h−1 (A) = {x ∈ S : h(x) ∈ A}.
Then T is a countable set. Also, P (Z = z) > 0, ∀ z ∈ T (since P (X = x) > 0, ∀x ∈ S) and P (Z ∈ T ) = 1 (since
P (X ∈ S) = 1). It follows that Z is a discrete r.v. Moreover, for z ∈ T ,
P(Z = z) = P(h(X) = z) = Σ_{x∈S: h(x)=z} P(X = x) = Σ_{x∈h⁻¹({z})} P(X = x) = Σ_{x∈h⁻¹({z})} f(x),

and for any z ∉ T, P(Z = z) = 0. Thus, we have the following theorem:



Theorem 0.66. Let X be a discrete r.v. with support S, d.f. F and p.m.f. f. Let h : R → R be a given function. Then, Z = h(X) is a discrete r.v. with support T = {h(x) : x ∈ S} and p.m.f.

g(z) = Σ_{x∈h⁻¹({z})} f(x), if z ∈ T,
       0,                   otherwise,

and d.f.

G(z) = P(Z ≤ z) = Σ_{t∈T: t≤z} g(t) = Σ_{x∈S: h(x)≤z} f(x) = Σ_{x∈h⁻¹((−∞,z])∩S} f(x).

In particular, if h : S → R is one-one then

g(z) = f(h⁻¹(z)), if z ∈ T,
       0,         otherwise.

Example 0.67. Let X be a discrete r.v. with p.m.f.

f(x) = 1/7,  if x ∈ {−2, −1, 0, 1},
       3/14, if x ∈ {2, 3},
       0,    otherwise.

Find the p.m.f. and d.f. of Y = X².

Solution: Here, the support of X is S = {−2, −1, 0, 1, 2, 3}. By Theorem 0.66, Y = X² is a discrete r.v. with support T = {0, 1, 4, 9} and p.m.f.

g(z) = P(X² = z) = P(X = 0) = 1/7,              if z = 0,
                   P(X = −1) + P(X = 1) = 2/7,  if z = 1,
                   P(X = −2) + P(X = 2) = 5/14, if z = 4,
                   P(X = 3) = 3/14,             if z = 9,
                   0,                           otherwise.

The d.f. of Y is

G(z) = P(Y ≤ z) = 0,      if z < 0,
                  1/7,    if 0 ≤ z < 1,
                  3/7,    if 1 ≤ z < 4,
                  11/14,  if 4 ≤ z < 9,
                  1,      if z ≥ 9.
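The computation in Theorem 0.66 — accumulate f(x) over all x with h(x) = z — is a few lines of Python (the helper name `pmf_of_function` is ours):

```python
from fractions import Fraction

def pmf_of_function(pmf_x, h):
    """p.m.f. of Z = h(X) per Theorem 0.66: g(z) = sum of f(x) over h(x) = z."""
    g = {}
    for x, p in pmf_x.items():
        z = h(x)
        g[z] = g.get(z, Fraction(0)) + p
    return g

# p.m.f. of X from Example 0.67
f = {-2: Fraction(1, 7), -1: Fraction(1, 7), 0: Fraction(1, 7),
     1: Fraction(1, 7), 2: Fraction(3, 14), 3: Fraction(3, 14)}
g = pmf_of_function(f, lambda x: x ** 2)
```

This reproduces g(0) = 1/7, g(1) = 2/7, g(4) = 5/14 and g(9) = 3/14.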

Example 0.68. In Example 0.67, directly find the d.f. of Y = X² (i.e., find the d.f. of Y before finding the p.m.f. of Y).
Hence find the p.m.f. of Y.

Solution: By Theorem 0.66, Y is a discrete r.v. with support T = {0, 1, 4, 9}. Thus the d.f. of Y is

G(z) = P(Y ≤ z) = P(X² ≤ z)
     = 0,                                      z < 0,
       P(X² = 0),                              0 ≤ z < 1,
       P(X² = 0) + P(X² = 1),                  1 ≤ z < 4,
       P(X² = 0) + P(X² = 1) + P(X² = 4),      4 ≤ z < 9,
       1,                                      z ≥ 9,
     = 0,      z < 0,
       1/7,    0 ≤ z < 1,
       3/7,    1 ≤ z < 4,
       11/14,  4 ≤ z < 9,
       1,      z ≥ 9.

The p.m.f. of Y is

g(z) = G(z) − G(z−), if z ∈ T,
       0,            otherwise,
     = 1/7,  if z = 0,
       2/7,  if z = 1,
       5/14, if z = 4,
       3/14, if z = 9,
       0,    otherwise.

0.8. Probability Distribution of a Function of Continuous Random Variable


Let X be a continuous r.v. with d.f. F, p.d.f. f(·) and support S = {x ∈ R : F(x + h) − F(x − h) = ∫_{x−h}^{x+h} f(t) dt > 0, ∀ h > 0}. For convenience assume that S = [a, b] and {x ∈ R : f(x) > 0} = (a, b), for some −∞ ≤ a < b ≤ ∞ (with the convention that [−∞, b] ≡ (−∞, b], ∀ b ∈ R, [a, ∞] ≡ [a, ∞), ∀ a ∈ R and [−∞, ∞] ≡ (−∞, ∞)).
Let h : R → R be a function that is strictly monotone and differentiable on S. Then Z = h(X) is a r.v. with d.f. G(z) = P(Z ≤ z) = P(h(X) ≤ z), z ∈ R.
For any sets A ⊆ R and B ⊆ R, define h(A) = {h(x) : x ∈ A} and h⁻¹(B) = {x ∈ R : h(x) ∈ B}. Clearly P(X ∈ (a, b)) = 1 and therefore P(h(X) ∈ h((a, b))) = 1. Consider the following cases:
Case I: h(·) is strictly increasing on S
We have P(h(a) < Z < h(b)) = 1. Therefore, for z < h(a), P(Z ≤ z) = 0 and for z ≥ h(b), P(Z ≤ z) = 1. For h(a) < z < h(b),

G(z) = P(h(X) ≤ z) = P(X ≤ h⁻¹(z)) = ∫_{−∞}^{h⁻¹(z)} f(t) dt = ∫_{a}^{h⁻¹(z)} f(t) dt = ∫_{h(a)}^{z} f(h⁻¹(y)) |(d/dy) h⁻¹(y)| dy.

Thus,

G(z) = 0,                                           if z < h(a),
       ∫_{h(a)}^{z} f(h⁻¹(y)) |(d/dy) h⁻¹(y)| dy,   if h(a) ≤ z < h(b),
       1,                                           if z ≥ h(b).

Since f is continuous on (a, b) it follows that G(z) is differentiable everywhere except possibly at z = h(a) and z = h(b). Moreover,

G′(z) = f(h⁻¹(z)) |(d/dz) h⁻¹(z)|, if h(a) < z < h(b),
        0,                         otherwise,

and

∫_{−∞}^{∞} G′(z) dz = ∫_{h(a)}^{h(b)} f(h⁻¹(z)) |(d/dz) h⁻¹(z)| dz = ∫_{a}^{b} f(t) dt = 1.

It follows that Z is a continuous r.v. with p.d.f.

g(z) = f(h⁻¹(z)) |(d/dz) h⁻¹(z)|, if h(a) < z < h(b),
       0,                         otherwise,

and support T = [h(a), h(b)].


Case II: h(·) is strictly decreasing on S
Here P(h(b) < h(X) < h(a)) = 1 and G(z) = P(h(X) ≤ z), z ∈ R. Clearly, for z < h(b), G(z) = 0 and for z ≥ h(a), G(z) = 1. For h(b) < z < h(a),

G(z) = P(X ≥ h⁻¹(z)) = ∫_{h⁻¹(z)}^{∞} f(t) dt = ∫_{h⁻¹(z)}^{b} f(t) dt = ∫_{h(b)}^{z} f(h⁻¹(y)) |(d/dy) h⁻¹(y)| dy.

Thus,

G(z) = 0,                                           if z < h(b),
       ∫_{h(b)}^{z} f(h⁻¹(y)) |(d/dy) h⁻¹(y)| dy,   if h(b) ≤ z < h(a),
       1,                                           if z ≥ h(a).

Since f is continuous on (a, b), it follows that G(·) is differentiable everywhere except possibly at h(a) and h(b). Moreover,

G′(z) = f(h⁻¹(z)) |(d/dz) h⁻¹(z)|, if h(b) < z < h(a),
        0,                         otherwise,

and

∫_{−∞}^{∞} G′(z) dz = ∫_{h(b)}^{h(a)} f(h⁻¹(z)) |(d/dz) h⁻¹(z)| dz = ∫_{a}^{b} f(t) dt = 1.

Consequently, Z is a continuous r.v. with p.d.f.

g(z) = f(h⁻¹(z)) |(d/dz) h⁻¹(z)|, if h(b) < z < h(a),
       0,                         otherwise,

and support T = [h(b), h(a)].


Combining Case I and Case II, we get the following result:
Theorem 0.69. Let X be a continuous r.v. with p.d.f. f(·) and support S = [a, b] for some −∞ ≤ a < b ≤ ∞. Suppose that {x ∈ R : f(x) > 0} = (a, b) and that f is continuous on (a, b). Let h : R → R be a function that is differentiable and strictly monotone on (a, b). Then, Z = h(X) is a continuous r.v. with p.d.f.

g(z) = f(h⁻¹(z)) |(d/dz) h⁻¹(z)|, if z ∈ h((a, b)),
       0,                         otherwise,

and support T = [min{h(a), h(b)}, max{h(a), h(b)}].

The following theorem is a generalization of the above result and can be proved on similar lines.

Theorem 0.70. Let X be a continuous r.v. with p.d.f. f(·) and support S = ⋃_{i∈Λ} [ai, bi], where Λ is a countable set and the [ai, bi]'s are disjoint intervals. Suppose that {x ∈ R : f(x) > 0} = ⋃_{i∈Λ} (ai, bi) and that f is continuous in each (ai, bi), i ∈ Λ. Let h : R → R be a function that is differentiable and strictly monotone in each (ai, bi), i ∈ Λ (h may be monotonically increasing in some (ai, bi) and monotonically decreasing in others). Let h⁻¹_i(·) be the inverse function of h restricted to (ai, bi), i ∈ Λ. Then, Z = h(X) is a continuous r.v. with p.d.f.

g(z) = Σ_{j∈Λ} f(h⁻¹_j(z)) |(d/dz) h⁻¹_j(z)| I_{h((aj,bj))}(z),    where I_{h((aj,bj))}(z) = 1 if z ∈ h((aj, bj)), and 0 otherwise.

Remark 0.71. Theorem 0.69 and Theorem 0.70 hold even in situations where the function h is differentiable every-
where except possibly at a finite number of points in S.
Example 0.72. Let X be a r.v. with p.d.f.

f(x) = 3x², 0 < x < 1,
       0,   otherwise.

Find the p.d.f. and d.f. of Y = 1/X². What is the support of the d.f. of Y?

Solution: The support of F is [0, 1] and {x ∈ R : f(x) > 0} = (0, 1). Moreover, f is continuous on (0, 1), and h(x) = 1/x² is differentiable and strictly monotone on (0, 1), with h((0, 1)) = (1, ∞). Now

y = 1/x² =⇒ x = 1/√y, i.e., h⁻¹(y) = 1/√y =⇒ (d/dy) h⁻¹(y) = −1/(2y√y), y ∈ (1, ∞).

Thus, Y = 1/X² is a continuous r.v. with p.d.f. g(y) given by

g(y) = f(h⁻¹(y)) |(d/dy) h⁻¹(y)| I_{h((0,1))}(y) = f(1/√y) · (1/(2y√y)) I_{(1,∞)}(y)
     = (3/y) · 1/(2y√y) = 3/(2y²√y), if y > 1,
       0,                            otherwise.

The d.f. of Y is

G(y) = ∫_{−∞}^{y} g(t) dt = 0,                                          if y < 1,
                            ∫_{1}^{y} 3/(2t²√t) dt = 1 − 1/y^{3/2},     if y > 1.

Clearly the support of G is [1, ∞).
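Since X has d.f. F(x) = x³ on (0, 1), X can be simulated by the inverse transform X = U^{1/3} with U uniform on (0, 1], so Y = 1/X² = U^{−2/3}; the derived d.f. G(y) = 1 − y^{−3/2} can then be checked by Monte Carlo. A sketch (variable names ours):

```python
import random

random.seed(0)

def G(y):
    """d.f. of Y = 1/X**2 derived above: G(y) = 1 - y**(-3/2) for y > 1."""
    return 0.0 if y <= 1 else 1.0 - y ** -1.5

# X has d.f. F(x) = x**3 on (0, 1); inverse transform gives X = U**(1/3),
# hence Y = 1/X**2 = U**(-2/3), with U uniform on (0, 1].
n = 200000
ys = [(1.0 - random.random()) ** (-2.0 / 3.0) for _ in range(n)]

emp = sum(y <= 4 for y in ys) / n       # empirical P(Y <= 4), should be near 7/8
```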


Example 0.73. Let X be a r.v. with p.d.f.

f(x) = |x|/2, if −1 < x < 1,
       x/3,   if 1 ≤ x ≤ 2,
       0,     otherwise,

and let Y = X².

(a) Find the p.d.f. of Y directly and hence find the d.f. of Y .
(b) Find the d.f. of Y and hence find the p.d.f. of Y .
(c) Find the support of d.f. of Y .

Solution: (a) The support of F is S = [−1, 2] and we may take S = [−1, 0] ∪ [0, 2], {x ∈ R : f(x) > 0} = (−1, 0) ∪ (0, 2). The p.d.f. f is continuous on (−1, 0) ∪ (0, 1) ∪ (1, 2), h(x) = x² is differentiable on (−1, 0) ∪ (0, 2), h(·) is strictly decreasing on (−1, 0) and strictly increasing on (0, 2).
h(x) = x² is strictly decreasing on S₁ = (−1, 0) with inverse function h⁻¹_1(y) = −√y, y ∈ (0, 1), h(S₁) = (0, 1).
h(x) = x² is strictly increasing on S₂ = (0, 2) with inverse function h⁻¹_2(y) = √y, y ∈ (0, 4), h(S₂) = (0, 4).
Thus, Y = X² is a continuous r.v. with p.d.f.

g(y) = f(h⁻¹_1(y)) |(d/dy) h⁻¹_1(y)| I_{(0,1)}(y) + f(h⁻¹_2(y)) |(d/dy) h⁻¹_2(y)| I_{(0,4)}(y)
     = f(−√y) (1/(2√y)) I_{(0,1)}(y) + f(√y) (1/(2√y)) I_{(0,4)}(y)
     = (1/(2√y)) [f(−√y) I_{(0,1)}(y) + f(√y) I_{(0,4)}(y)]
     = 1/2, if 0 < y < 1,
       1/6, if 1 < y < 4,
       0,   otherwise.

The d.f. of Y is

G(y) = P(X² ≤ y) = ∫_{−∞}^{y} g(t) dt
     = 0,                                                     if y < 0,
       ∫_{0}^{y} (1/2) dt = y/2,                              if 0 ≤ y < 1,
       ∫_{0}^{1} (1/2) dt + ∫_{1}^{y} (1/6) dt = (y + 2)/6,   if 1 ≤ y < 4,
       1,                                                     if y ≥ 4.

(b) The d.f. of Y is

G(y) = P(X² ≤ y) = 0,                 if y < 0,
                   P(−√y ≤ X ≤ √y),   if y ≥ 0.

For 0 ≤ y < 1,

G(y) = P(−√y ≤ X ≤ √y) = ∫_{−√y}^{√y} (|x|/2) dx = y/2.

For 1 ≤ y < 4 (so that −2 < −√y ≤ −1 and 1 ≤ √y ≤ 2),

G(y) = P(−√y ≤ X ≤ √y) = ∫_{−1}^{1} (|x|/2) dx + ∫_{1}^{√y} (x/3) dx = (y + 2)/6.

For y ≥ 4, G(y) = 1. Therefore

G(y) = 0,          if y < 0,
       y/2,        if 0 ≤ y < 1,
       (y + 2)/6,  if 1 ≤ y < 4,
       1,          if y ≥ 4.

Clearly G is differentiable everywhere except at a finite number of points (0, 1 and 4) and we may take

G′(y) = 1/2, if 0 < y < 1,
        1/6, if 1 < y < 4,
        0,   otherwise.

Moreover, ∫_{−∞}^{∞} G′(y) dy = ∫_{0}^{1} (1/2) dy + ∫_{1}^{4} (1/6) dy = 1. Thus, Y is a continuous r.v. with p.d.f.

g(y) = 1/2, if 0 < y < 1,
       1/6, if 1 < y < 4,
       0,   otherwise.

(c) The support of G is [0, 4].

0.9. Expectation (or Mean) of Random Variables

Let X be a discrete r.v. with p.m.f. f(·) and support S. For any x ∈ S, f(x) gives an idea about the proportion of times we will observe the event {X = x} if the experiment is repeated a large number of times. Thus Σ_{x∈S} x f(x) represents the mean (or expected) value of the r.v. X if the experiment is repeated a large number of times.
Similarly, if X is a continuous r.v. with p.d.f. f(·), then ∫_{−∞}^{∞} x f(x) dx (provided the integral is finite) represents the mean (or expected) value of the r.v. X.
Definition 0.74. (a) Let X be a discrete r.v. with p.m.f. f(·) and support S. We say that the expected value of X (or the mean of X, which we denote by E(X)) is finite and equals

E(X) = Σ_{x∈S} x f(x),    provided Σ_{x∈S} |x| f(x) < ∞.

(b) Let X be a continuous r.v. with p.d.f. f(·) and support S. We say that the expected value of X (or the mean of X, which we denote by E(X)) is finite and equals

E(X) = ∫_{−∞}^{∞} x f(x) dx,    provided ∫_{−∞}^{∞} |x| f(x) dx < ∞.

Example 0.75. (a) Let X be a discrete r.v. with p.m.f.

f(x) = 1/2^x, if x ∈ {1, 2, 3, . . . },
       0,     otherwise.

Show that E(X) is finite. Find E(X).



(b) Let X be a r.v. with p.m.f.

f(x) = 3/(π²x²), if x ∈ {±1, ±2, . . . },
       0,        otherwise.

Show that E(X) is not finite.

(c) Let X be a continuous r.v. with p.d.f. f(x) = e^{−|x|}/2, −∞ < x < ∞. Show that E(X) is finite. Find E(X).

(d) Let X be a continuous r.v. with p.d.f. f(x) = 1/(π(1 + x²)), −∞ < x < ∞. Show that E(X) is not finite.

Solution: (a) The support of the distribution is S = {1, 2, . . . }. Also,

Σ_{x∈S} |x| f(x) = Σ_{n=1}^{∞} n/2ⁿ = Σ_{n=1}^{∞} aₙ (say),

where aₙ = n/2ⁿ > 0, ∀ n = 1, 2, . . . and

aₙ₊₁/aₙ = (n + 1)/(2n) → 1/2 < 1, as n → ∞.

Thus by the ratio test Σ_{x∈S} |x| f(x) = Σ_{n=1}^{∞} n/2ⁿ < ∞. It can be seen that E(X) = 2 (Exercise).

(b) Here the support of the distribution is S = {±1, ±2, . . . } and

Σ_{x∈S} |x| f(x) = (6/π²) Σ_{n=1}^{∞} 1/n = ∞ =⇒ E(X) is not finite.

(c) We have

∫_{−∞}^{∞} |x| f(x) dx = ∫_{−∞}^{∞} |x| (e^{−|x|}/2) dx = ∫_{0}^{∞} x e^{−x} dx = 1 < ∞ =⇒ E(X) is finite

and

E(X) = ∫_{−∞}^{∞} x f(x) dx = ∫_{−∞}^{∞} x (e^{−|x|}/2) dx = 0.

(d) We have

∫_{−∞}^{∞} |x| f(x) dx = ∫_{−∞}^{∞} |x|/(π(1 + x²)) dx = (2/π) ∫_{0}^{∞} x/(1 + x²) dx = ∞ =⇒ E(X) is not finite.

Example 0.76 (St. Petersburg Paradox). To make some money a gambler plays a sequence of fair games with the following strategy:
In the first game he bets Rs. 1 million. If the first bet is lost he doubles his bet in the second game. He keeps on doubling his bet until he wins a game. Thus, if the gambler has not won by the mth trial he bets Rs. 2^m million in the (m + 1)th game. If he wins in the kth game then

investment = 1 + 2 + 4 + · · · + 2^{k−1} = 2^k − 1 million rupees,    win = 2^k million rupees.

Total earning if he wins on the kth game = 1 million rupees.



The above scheme seems to be foolproof for earning Rs. 1 million. By this logic all gamblers should be billionaires!
Let X be the amount of money bet on the last game (the game he wins). Then

P(X = 2^k) = 1/2^{k+1}, k = 0, 1, 2, . . . ,    E(X) = Σ_{k=0}^{∞} 2^k/2^{k+1} = ∞ (E(X) is not finite).

This implies that an enormous amount of money would be required to carry the strategy through.
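A short simulation of the doubling strategy makes the paradox concrete: every run nets 1 million, but the stake needed on the winning game has no finite mean, so huge stakes keep appearing. A sketch (the function name is ours):

```python
import random

random.seed(1)

def stake_on_winning_game():
    """One run of the doubling strategy: return the stake (in millions of
    rupees) placed on the game that is finally won."""
    stake = 1
    while random.random() < 0.5:   # lose a fair game with probability 1/2
        stake *= 2                 # double the bet after every loss
    return stake

runs = [stake_on_winning_game() for _ in range(100000)]

# Each run nets exactly 1 million, but the sample mean of the stakes never
# stabilises: E(X) is infinite, so rare enormous stakes dominate it.
avg_stake = sum(runs) / len(runs)
```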


Theorem 0.77. Let X be a continuous (discrete) r.v. Then

E(X) = ∫_{0}^{∞} P(X > y) dy − ∫_{−∞}^{0} P(X < y) dy,

provided E(X) is finite.

Proof. We will provide the proof for the case when X is a continuous r.v. with p.d.f., say f. We have

E(X) = ∫_{−∞}^{∞} x f(x) dx
     = ∫_{−∞}^{0} x f(x) dx + ∫_{0}^{∞} x f(x) dx
     = −∫_{−∞}^{0} ∫_{x}^{0} f(x) dy dx + ∫_{0}^{∞} ∫_{0}^{x} f(x) dy dx
     = −∫_{−∞}^{0} ∫_{−∞}^{y} f(x) dx dy + ∫_{0}^{∞} ∫_{y}^{∞} f(x) dx dy
     = −∫_{−∞}^{0} P(X < y) dy + ∫_{0}^{∞} P(X > y) dy.

This completes the proof.


Corollary 0.78. (a) Suppose that X is a continuous (discrete) r.v. with P(X ≥ 0) = 1. Then

E(X) = ∫_{0}^{∞} P(X > y) dy.

(b) Suppose that P(X ∈ {0, ±1, ±2, . . . }) = 1. Then

E(X) = Σ_{n=1}^{∞} P(X ≥ n) − Σ_{n=1}^{∞} P(X ≤ −n).

(c) Suppose that P(X ∈ {0, 1, 2, . . . }) = 1. Then E(X) = Σ_{n=1}^{∞} P(X ≥ n).

Proof. Exercise.
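Part (c) can be sanity-checked on the die-tossing r.v. of Example 0.57, where P(X ≥ n) = (5/6)^{n−1} and E(X) = 6; the tail-sum and the direct sum agree (helper name ours):

```python
from fractions import Fraction

# For X of Example 0.57, P(X >= n) = (5/6)**(n-1) and E(X) = 6.
def p_at_least(n):
    return Fraction(5, 6) ** (n - 1)

# Tail-sum formula of Corollary 0.78(c), truncated where terms are negligible.
tail_sum = sum(p_at_least(n) for n in range(1, 400))

# Direct computation E(X) = sum x f(x) over the same range, for comparison.
direct = sum(x * Fraction(5, 6) ** (x - 1) * Fraction(1, 6) for x in range(1, 400))
```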

The following theorem suggests that for any r.v. X and any function h : R → R, E(h(X)) can be directly found using
p.m.f. / p.d.f. of X.
Theorem 0.79. (a) Let X be a discrete r.v. with p.m.f. f (·) and support S. Let h : R → R be a given function and let Z = h(X). Then
\[
E(Z) = \sum_{x \in S} h(x) f(x), \quad \text{provided } \sum_{x \in S} |h(x)| f(x) < \infty.
\]
(b) Let X be a continuous r.v. with p.d.f. f (·) and let h : R → R be a given function. If Z = h(X), then
\[
E(Z) = \int_{-\infty}^{\infty} h(x) f(x)\,dx, \quad \text{provided } \int_{-\infty}^{\infty} |h(x)| f(x)\,dx < \infty.
\]

Proof. We will provide the proof of (a) only. The proof of (b) follows on similar lines. The support of Z = h(X) is T = h(S). We have
\begin{align*}
E(Z) &= \sum_{t \in T} t P(Z = t) = \sum_{t \in T} t P(h(X) = t) \\
&= \sum_{t \in T} t \sum_{\{x \in S : h(x) = t\}} P(X = x) \\
&= \sum_{t \in T} \sum_{\{x \in S : h(x) = t\}} t P(X = x) \\
&= \sum_{t \in T} \sum_{\{x \in S : h(x) = t\}} h(x) P(X = x) \\
&= \sum_{x \in S} h(x) P(X = x),
\end{align*}
since S = \bigcup_{t \in T} \{x \in S : h(x) = t\} and the sets in this union are disjoint.

This completes the proof.

Example 0.80. (a) Let the r.v. X have the p.m.f.
\[
f(x) = \begin{cases} 1/6, & \text{if } x \in \{-2, -1, 0, 1, 2, 3\}, \\ 0, & \text{otherwise.} \end{cases}
\]
Find E(X²).
(b) Let the r.v. X have the p.d.f.
\[
f(x) = \begin{cases} 2x, & \text{if } 0 < x < 1, \\ 0, & \text{otherwise.} \end{cases}
\]
Find E(X³).

Solution: (a) E(X^2) = \sum_{x \in S} x^2 f(x) = 4 \times \frac{1}{6} + 1 \times \frac{1}{6} + 0 \times \frac{1}{6} + 1 \times \frac{1}{6} + 4 \times \frac{1}{6} + 9 \times \frac{1}{6} = \frac{19}{6}.
(b) E(X^3) = \int_{-\infty}^{\infty} x^3 f(x)\,dx = 2 \int_{0}^{1} x^4\,dx = \frac{2}{5}.
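Both answers can be cross-checked mechanically: the discrete sum exactly, and the integral by a midpoint Riemann sum (the grid size below is an arbitrary choice for this sketch).

```python
from fractions import Fraction as F

# (a) Discrete: f(x) = 1/6 on {-2, -1, 0, 1, 2, 3}; exact arithmetic.
support = [-2, -1, 0, 1, 2, 3]
e_x2 = sum(F(1, 6) * x**2 for x in support)
print(e_x2)  # 19/6

# (b) Continuous: f(x) = 2x on (0, 1); E(X^3) = int_0^1 x^3 * (2x) dx,
# approximated by a midpoint Riemann sum.
n = 100_000
h = 1.0 / n
e_x3 = sum(2 * m * m**3 * h for m in ((i + 0.5) * h for i in range(n)))
print(e_x3)  # close to 2/5
```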
Theorem 0.81. Let X be a discrete or continuous r.v. with p.m.f. / p.d.f. f and support S. Let h_i : R → R, i = 1, 2, . . . , m, be given functions.
(a) Then, for real constants c₁, c₂, . . . , c_m,
\[
E\left( \sum_{i=1}^{m} c_i h_i(X) \right) = \sum_{i=1}^{m} c_i E(h_i(X)),
\]

provided involved expectations are finite.


(b) Let h1 (x) ≤ h2 (x), ∀ x ∈ S. Then,

E(h1 (X)) ≤ E(h2 (X)), provided involved expectations are finite.

In particular, if E(X) is finite and P (a ≤ X ≤ b) = 1, for some real constants a and b (a < b) then a ≤ E(X) ≤ b.
(c) If P (X ≥ 0) = 1 and E(X) = 0, then P (X = 0) = 1.
(d) If E(X) is finite then |E(X)| ≤ E(|X|).
(e) Let a and b be two real constants. Then,

E(aX + b) = aE(X) + b, provided involved expectations are finite.

Proof. The proofs of (a), (b) and (e) follow from the definition of expectation of a r.v.
(c) We will provide the proof for the case when X is a continuous r.v. Then
\begin{align*}
P(X > 0) &= P\left( \bigcup_{n=1}^{\infty} \left\{ X \geq \tfrac{1}{n} \right\} \right) \\
&= \lim_{n \to \infty} P\left( X \geq \tfrac{1}{n} \right), \quad \left( \left\{ X \geq \tfrac{1}{n} \right\} \uparrow \right) \\
&= \lim_{n \to \infty} \int_{1/n}^{\infty} f(x)\,dx \\
&\leq \lim_{n \to \infty} \int_{1/n}^{\infty} n x f(x)\,dx, \quad (x \in [1/n, \infty) \implies nx \geq 1) \\
&\leq \lim_{n \to \infty} \left( n \int_{0}^{\infty} x f(x)\,dx \right) \\
&= \lim_{n \to \infty} [n E(X)] = 0 \implies P(X = 0) = 1.
\end{align*}

(d) We have
−|X| ≤ X ≤ |X| =⇒ E(−|X|) ≤ E(X) ≤ E(|X|) =⇒ |E(X)| ≤ E(|X|).
This completes the proof.

Some Special Expectations:

(i) h(x) = x: E(X) = µ′₁ = mean of X.
(ii) h(x) = x^r, r ∈ {1, 2, . . . }: E(X^r) = µ′_r = rth moment of X about origin.
(iii) h(x) = |x|^r, r ∈ {1, 2, . . . }: E(|X|^r) = rth absolute moment of X about origin.
(iv) h(x) = (x − µ′₁)^r, r ∈ {1, 2, . . . }: E(X − µ′₁)^r = µ_r = rth moment of X about its mean, or rth central moment.
(v) µ₂ = E(X − µ′₁)² = σ² = variance of X, also denoted by Var(X). Further, σ = \sqrt{\mu_2} = \sqrt{E(X - \mu_1')^2} is called the standard deviation of X (the positive square root of the variance of r.v. X).
Remark 0.82. (i) Var(X) = σ² = E(X − µ′₁)² = E(X² − 2µ′₁X + (µ′₁)²) = E(X²) − 2(µ′₁)² + (µ′₁)² = E(X²) − (E(X))².
(ii) Since (X − µ′₁)² ≥ 0, we have
\[
\mathrm{Var}(X) = E(X - \mu_1')^2 \geq 0 \implies E(X^2) \geq (E(X))^2.
\]
(iii) Var(X) = 0 ⇐⇒ E(X − µ′₁)² = 0 ⇐⇒ P (X = E(X)) = 1.

Theorem 0.83. Let X be a r.v. such that E(|X|s ) < ∞, for some s > 0. Then, E(|X|r ) < ∞, ∀ 0 < r < s.

Proof. Note that |X|r ≤ max{|X|s , 1} ≤ |X|s + 1. This implies that E(|X|r ) ≤ E(|X|s + 1) = E(|X|s ) + 1 < ∞.
Thus, the result follows.

0.10. Moment Generating Function

Let X be a r.v. with d.f. F and p.d.f. / p.m.f. f (·).


Definition 0.84. We say that the moment generating function (m.g.f.) of X (denoted by MX (·)) exists and equals

MX (t) = E(etX ), provided E(etX ) is finite in (−h, h) for some h > 0.

Remark 0.85. (i) MX (0) = 1, thus A = {t ∈ R : E(e^{tX}) is finite} ≠ ∅.
(ii) MX (t) > 0, ∀ t ∈ A = {s ∈ R : E(e^{sX}) is finite}.
(iii) Suppose that MX (t) exists and is finite on (−h, h) for some h > 0. For real constants c and d, let Y = cX + d. Then the m.g.f. of Y also exists and is finite on (−h/|c|, h/|c|) (with the convention that ±a/0 = ±∞, if a > 0). Moreover,
\[
M_Y(t) = E(e^{t(cX+d)}) = e^{td} M_X(ct), \quad t \in \left( -\frac{h}{|c|}, \frac{h}{|c|} \right).
\]

(iv) The name m.g.f. to the transform MX is motivated by the fact that MX can be used to generate moments of any
r.v., as illustrated in the following theorem.
Theorem 0.86. Let X be a r.v. with m.g.f. MX that is finite on (−h, h), h > 0. Then,
(a) For each r ∈ {1, 2, . . . }, µ′_r = E(X^r) is finite;
(b) For each r ∈ {1, 2, . . . }, µ′_r = E(X^r) = M_X^{(r)}(0), where
\[
M_X^{(r)}(0) = \left[ \frac{d^r}{dt^r} M_X(t) \right]_{t=0}, \quad \text{the } r\text{th derivative of } M_X \text{ at the point } 0;
\]
(c) M_X(t) = \sum_{r=0}^{\infty} \mu_r' \frac{t^r}{r!}, \ t \in (-h, h), so that µ′_r equals the coefficient of t^r/r! (r = 1, 2, . . . ) in the Maclaurin series expansion of MX (t) around t = 0.

Proof. (a) We have
\begin{align*}
& E(e^{tX}) < \infty, \ \forall\, t \in (-h, h) \\
&\implies \int_{-\infty}^{0} e^{tx} f(x)\,dx < \infty \text{ and } \int_{0}^{\infty} e^{tx} f(x)\,dx < \infty, \ \forall\, t \in (-h, h) \\
&\implies \int_{-\infty}^{0} e^{-t|x|} f(x)\,dx < \infty \text{ and } \int_{0}^{\infty} e^{t|x|} f(x)\,dx < \infty, \ \forall\, t \in (-h, h) \\
&\implies \int_{-\infty}^{0} e^{|t||x|} f(x)\,dx < \infty \text{ and } \int_{0}^{\infty} e^{|t||x|} f(x)\,dx < \infty, \ \forall\, t \in (-h, h) \\
&\implies \int_{-\infty}^{\infty} e^{|tx|} f(x)\,dx < \infty, \ \forall\, t \in (-h, h);
\end{align*}
here f (·) denotes the p.d.f. of r.v. X.

Fix r ∈ {1, 2, . . . } and t ∈ (−h, h) − {0}. Then \lim_{|x| \to \infty} \frac{|x|^r}{e^{|tx|}} = 0, and therefore ∃ a positive real number A_{r,t} such that |x|^r < e^{|tx|}, ∀ |x| > A_{r,t}. Therefore
\begin{align*}
E(|X|^r) &= \int_{-\infty}^{\infty} |x|^r f(x)\,dx \\
&= \int_{|x| \leq A_{r,t}} |x|^r f(x)\,dx + \int_{|x| > A_{r,t}} |x|^r f(x)\,dx \\
&\leq A_{r,t}^r \int_{|x| \leq A_{r,t}} f(x)\,dx + \int_{|x| > A_{r,t}} e^{|tx|} f(x)\,dx \\
&\leq A_{r,t}^r + \int_{-\infty}^{\infty} e^{|tx|} f(x)\,dx < \infty, \quad r = 1, 2, \ldots.
\end{align*}

(b) We have M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx, so M_X^{(r)}(t) = \frac{d^r}{dt^r} \int_{-\infty}^{\infty} e^{tx} f(x)\,dx, \ r = 1, 2, \ldots.

Using the arguments of advanced calculus it can be shown that if MX (t) = E(e^{tX}) < ∞, ∀ t ∈ (−h, h), then the derivative can be passed through the integral sign. Therefore,
\[
M_X^{(r)}(t) = \int_{-\infty}^{\infty} \frac{d^r}{dt^r} \left( e^{tx} f(x) \right) dx = \int_{-\infty}^{\infty} x^r e^{tx} f(x)\,dx, \quad r = 1, 2, \ldots
\]
and
\[
M_X^{(r)}(0) = \int_{-\infty}^{\infty} x^r f(x)\,dx = E(X^r).
\]

(c) We have
\[
M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx = \int_{-\infty}^{\infty} \left( \sum_{r=0}^{\infty} \frac{t^r x^r}{r!} \right) f(x)\,dx.
\]
Under the assumption that MX (t) = E(e^{tX}) < ∞, ∀ t ∈ (−h, h), using arguments of advanced calculus, it can be shown that the summation sign can be passed through the integral sign. Thus,
\[
M_X(t) = \sum_{r=0}^{\infty} \frac{t^r}{r!} \int_{-\infty}^{\infty} x^r f(x)\,dx = \sum_{r=0}^{\infty} \frac{t^r}{r!} E(X^r), \quad t \in (-h, h).
\]

This completes the proof.


Corollary 0.87. Under the notation and assumption of the above theorem define ψX (t) = ln(MX (t)), t ∈ (−h, h). Then,
\[
\mu_1' = \mu = E(X) = \psi_X^{(1)}(0) \quad \text{and} \quad \mu_2 = \sigma^2 = \mathrm{Var}(X) = \psi_X^{(2)}(0).
\]
Proof. For t ∈ (−h, h)
\[
\psi_X^{(1)}(t) = \frac{M_X^{(1)}(t)}{M_X(t)} \implies \psi_X^{(1)}(0) = M_X^{(1)}(0) = E(X) \quad (\text{since } M_X(0) = 1).
\]
Also,
\[
\psi_X^{(2)}(t) = \frac{M_X^{(2)}(t) M_X(t) - (M_X^{(1)}(t))^2}{(M_X(t))^2} \implies \psi_X^{(2)}(0) = M_X^{(2)}(0) - (M_X^{(1)}(0))^2 = E(X^2) - (E(X))^2 = \mathrm{Var}(X).
\]
This completes the proof.
Example 0.88. (a) Let X be a discrete r.v. with p.m.f.
\[
f_X(x) = P(X = x) = \begin{cases} \dfrac{e^{-\lambda} \lambda^x}{x!}, & x = 0, 1, 2, \ldots, \\ 0, & \text{otherwise}, \end{cases}
\]
where λ > 0. Show that the m.g.f. of X exists and is finite on the whole of R. Find MX (t), mean, variance of X and E(X³).
(b) Let X be a continuous r.v. with p.d.f.
\[
f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & x > 0, \\ 0, & \text{otherwise}, \end{cases}
\]
where λ > 0. Find the m.g.f., mean, variance of X and E(X^r), r = 1, 2, . . . (provided they exist).
(c) Let X be a continuous r.v. having the p.d.f. f(x) = \frac{1}{\pi(1+x^2)}, −∞ < x < ∞ (called the Cauchy p.d.f.; the corresponding probability distribution is called the Cauchy distribution). Show that the m.g.f. of X does not exist.

Solution: (a) We have
\[
M_X(t) = \sum_{x=0}^{\infty} e^{tx} \frac{e^{-\lambda} \lambda^x}{x!} = e^{-\lambda} \sum_{x=0}^{\infty} \frac{(\lambda e^t)^x}{x!} = e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}, \quad \forall\, t \in \mathbb{R}.
\]
Thus, the m.g.f. of X exists and is finite on the whole of R, with MX (t) = e^{λ(e^t − 1)}, t ∈ R.
Now ψX (t) = ln(MX (t)) = λ(e^t − 1) =⇒ ψ_X^{(1)}(t) = λe^t = ψ_X^{(2)}(t), ∀ t ∈ R.
Thus, E(X) = ψ_X^{(1)}(0) = λ and Var(X) = ψ_X^{(2)}(0) = λ. Again,
\begin{align*}
M_X^{(1)}(t) &= \lambda e^t e^{\lambda(e^t - 1)} = \lambda e^t M_X(t) \implies M_X^{(1)}(0) = E(X) = \lambda, \\
M_X^{(2)}(t) &= \lambda e^t M_X(t) + \lambda e^t M_X^{(1)}(t) \implies M_X^{(2)}(0) = E(X^2) = \lambda^2 + \lambda, \\
M_X^{(3)}(t) &= \lambda e^t M_X^{(2)}(t) + 2\lambda e^t M_X^{(1)}(t) + \lambda e^t M_X(t) \implies M_X^{(3)}(0) = E(X^3) = \lambda^3 + 3\lambda^2 + \lambda.
\end{align*}

Alternatively, for t ∈ R,
\begin{align*}
M_X(t) &= e^{\lambda(e^t - 1)} \\
&= 1 + \lambda(e^t - 1) + \frac{\lambda^2 (e^t - 1)^2}{2!} + \frac{\lambda^3 (e^t - 1)^3}{3!} + \cdots \\
&= 1 + \lambda \left( \sum_{j=1}^{\infty} \frac{t^j}{j!} \right) + \frac{\lambda^2}{2!} \left( \sum_{j=1}^{\infty} \frac{t^j}{j!} \right)^2 + \frac{\lambda^3}{3!} \left( \sum_{j=1}^{\infty} \frac{t^j}{j!} \right)^3 + \cdots \\
&= 1 + \lambda t + t^2 \left( \frac{\lambda}{2!} + \frac{\lambda^2}{2!} \right) + t^3 \left( \frac{\lambda}{3!} + \frac{2\lambda^2}{(2!)^2} + \frac{\lambda^3}{3!} \right) + \cdots
\end{align*}
Thus,
\begin{align*}
E(X) &= \text{coefficient of } t \text{ in the expansion of } M_X(t) = \lambda, \\
E(X^2) &= \text{coefficient of } \frac{t^2}{2!} \text{ in the expansion of } M_X(t) = \lambda^2 + \lambda, \\
E(X^3) &= \text{coefficient of } \frac{t^3}{3!} \text{ in the expansion of } M_X(t) = \lambda^3 + 3\lambda^2 + \lambda.
\end{align*}
(b) We have \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx = \lambda \int_{0}^{\infty} e^{-\lambda(1 - t/\lambda)x}\,dx < \infty, if t < λ. Thus the m.g.f. of X exists and, for t < λ,
\[
M_X(t) = \left( 1 - \frac{t}{\lambda} \right)^{-1} = 1 + \frac{t}{\lambda} + \frac{t^2}{\lambda^2} + \cdots + \frac{t^r}{\lambda^r} + \cdots
\]
For r = 1, 2, . . .
\[
\mu_r' = E(X^r) = \text{coefficient of } \frac{t^r}{r!} \text{ in the expansion of } M_X(t) = \frac{r!}{\lambda^r}, \quad r \in \{1, 2, \ldots\}.
\]
Alternatively,
\[
M_X^{(1)}(t) = \frac{1}{\lambda} \left( 1 - \frac{t}{\lambda} \right)^{-2}, \quad M_X^{(2)}(t) = \frac{2}{\lambda^2} \left( 1 - \frac{t}{\lambda} \right)^{-3} \quad \text{and} \quad M_X^{(r)}(t) = \frac{r!}{\lambda^r} \left( 1 - \frac{t}{\lambda} \right)^{-(r+1)}, \quad t < \lambda.
\]
This implies
\[
E(X^r) = M_X^{(r)}(0) = \frac{r!}{\lambda^r}, \ r = 1, 2, \ldots \quad \text{and} \quad \mathrm{Var}(X) = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.
\]

(c) Since E(X) is not finite, the m.g.f. of X does not exist.
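The third-moment formula from part (a), E(X³) = λ³ + 3λ² + λ, can be cross-checked by summing the Poisson p.m.f. directly; the rate λ and the truncation point below are arbitrary choices for this sketch.

```python
import math

lam = 2.5  # an arbitrary rate, chosen only for this check
term = math.exp(-lam)      # P(X = 0)
e_x3 = 0.0
for k in range(1, 100):    # the truncated tail beyond k = 100 is negligible here
    term *= lam / k        # P(X = k) from P(X = k - 1)
    e_x3 += k**3 * term

formula = lam**3 + 3 * lam**2 + lam  # value obtained from the m.g.f.
print(e_x3, formula)  # the two values agree
```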
Definition 0.89 (Equality in Distribution). Let X and Y be two r.v.’s with d.f.’s FX and FY , respectively. We say that X and Y have the same distribution (written as X \overset{d}{=} Y ) if FX (x) = FY (x), ∀ x ∈ R.
Remark 0.90. (i) Let X and Y be two discrete r.v.’s with p.m.f.’s fX and fY , respectively. Then,
\[
X \overset{d}{=} Y \iff f_X(x) = f_Y(x), \ \forall\, x \in \mathbb{R}.
\]
(ii) Let X and Y be two continuous r.v.’s. Then X \overset{d}{=} Y iff there exist versions of p.d.f.’s fX and fY of X and Y , respectively, such that fX (x) = fY (x), ∀ x ∈ R.
(iii) Suppose X \overset{d}{=} Y . Then for any Borel measurable function h : R → R, h(X) \overset{d}{=} h(Y ) and hence E(h(X)) = E(h(Y )).
d
Theorem 0.91. Let X and Y be r.v.’s such that for some c > 0, MX (t) = MY (t), ∀ t ∈ (−c, c). Then X \overset{d}{=} Y .

Proof. Special Case: Suppose that X and Y are discrete r.v.’s with support SX = SY = {1, 2, . . . }, pk = P (X = k) and qk = P (Y = k), k = 1, 2, . . . . Then
\begin{align*}
& M_X(t) = M_Y(t), \ \forall\, t \in (-c, c), \text{ for some } c > 0 \\
&\implies \sum_{k=1}^{\infty} e^{kt} p_k = \sum_{k=1}^{\infty} e^{kt} q_k \ \forall\, t \in (-c, c) \\
&\implies \sum_{k=1}^{\infty} \Lambda^k p_k = \sum_{k=1}^{\infty} \Lambda^k q_k \ \forall\, \Lambda \in (e^{-c}, e^c), \quad (\Lambda = e^t) \\
&\implies p_k = q_k \ \forall\, k = 1, 2, \ldots,
\end{align*}
since if two power series are equal over an interval then their coefficients are the same. Thus, X \overset{d}{=} Y .
Example 0.92. For any p ∈ (0, 1) and positive integer n, let Xp,n be a discrete r.v. with p.m.f.
\[
f_{p,n}(x) = \begin{cases} \dbinom{n}{x} p^x (1-p)^{n-x}, & \text{if } x \in \{0, 1, \ldots, n\}, \\ 0, & \text{otherwise.} \end{cases}
\]
(Such a r.v. or probability distribution is called a binomial r.v. or distribution with n trials and probability of success p.) Define Yp,n = n − Xp,n . Using the m.g.f. of Xp,n , show that Y_{p,n} \overset{d}{=} X_{1-p,n} . Find E(X_{1/2,n} ).

Solution: We have
\[
M_{X_{p,n}}(t) = E\left( e^{tX_{p,n}} \right) = \sum_{x=0}^{n} e^{tx} \binom{n}{x} p^x (1-p)^{n-x} = \sum_{x=0}^{n} \binom{n}{x} (e^t p)^x (1-p)^{n-x} = (1 - p + pe^t)^n, \quad t \in \mathbb{R}.
\]
Now
\begin{align*}
M_{Y_{p,n}}(t) &= E\left( e^{tY_{p,n}} \right) = E\left( e^{t(n - X_{p,n})} \right) \\
&= e^{nt} M_{X_{p,n}}(-t) = e^{nt} (1 - p + pe^{-t})^n \\
&= (p + (1-p)e^t)^n = (1 - (1-p) + (1-p)e^t)^n = M_{X_{1-p,n}}(t) \ \forall\, t \in \mathbb{R}.
\end{align*}
Thus, Y_{p,n} \overset{d}{=} X_{1-p,n}.
Alternatively,
\begin{align*}
f_{Y_{p,n}}(y) &= P(Y_{p,n} = y) = P(X_{p,n} = n - y) \\
&= \begin{cases} \dbinom{n}{n-y} p^{n-y} (1-p)^{n-(n-y)}, & \text{if } n - y \in \{0, 1, \ldots, n\}, \\ 0, & \text{otherwise} \end{cases} \\
&= \begin{cases} \dbinom{n}{y} (1-p)^y (1 - (1-p))^{n-y}, & \text{if } y \in \{0, 1, \ldots, n\}, \\ 0, & \text{otherwise} \end{cases} \\
&= f_{X_{1-p,n}}(y) \ \forall\, y \in \mathbb{R}.
\end{align*}
Thus, Y_{p,n} \overset{d}{=} X_{1-p,n}.
Now for p = 1/2, X_{1/2,n} \overset{d}{=} n − X_{1/2,n}. Thus, E(X_{1/2,n}) = E(n − X_{1/2,n}) =⇒ E(X_{1/2,n}) = n/2.
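The equality of the two p.m.f.s can also be confirmed numerically; the values of n and p below are arbitrary illustrative choices.

```python
from math import comb

def binom_pmf(n, p, x):
    """p.m.f. of the binomial distribution with n trials and success prob. p."""
    return comb(n, x) * p**x * (1 - p)**(n - x) if 0 <= x <= n else 0.0

n, p = 7, 0.3
# P(n - X_{p,n} = y) agrees with P(X_{1-p,n} = y) for every y:
for y in range(n + 1):
    assert abs(binom_pmf(n, p, n - y) - binom_pmf(n, 1 - p, y)) < 1e-12
print("p.m.f.s of n - X_{p,n} and X_{1-p,n} coincide")
```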

Example 0.93. Let X be a r.v. with p.d.f. f_X(x) = \frac{e^{-|x|}}{2}, −∞ < x < ∞, and let Y = −X. Show that Y \overset{d}{=} X and hence show that E(X) = 0.

Solution: We have
\[
M_Y(t) = E(e^{tY}) = E(e^{-tX}) = \int_{-\infty}^{\infty} e^{-tx} \frac{e^{-|x|}}{2}\,dx = \int_{-\infty}^{\infty} e^{tx} \frac{e^{-|x|}}{2}\,dx = M_X(t) \ \forall\, t \in (-1, 1),
\]
where
\begin{align*}
M_X(t) &= \int_{-\infty}^{\infty} e^{tx} \frac{e^{-|x|}}{2}\,dx = \int_{-\infty}^{0} e^{tx} \frac{e^{x}}{2}\,dx + \int_{0}^{\infty} e^{tx} \frac{e^{-x}}{2}\,dx \\
&= \frac{1}{2} \left( \int_{0}^{\infty} e^{-(1+t)x}\,dx + \int_{0}^{\infty} e^{-(1-t)x}\,dx \right) \\
&= \frac{1}{2} \left( \frac{1}{1+t} + \frac{1}{1-t} \right) = \frac{1}{1-t^2} \ \forall\, t \in (-1, 1) \implies X \overset{d}{=} Y.
\end{align*}
Alternatively, the p.d.f. of Y is
\[
f_Y(y) = \frac{e^{-|y|}}{2} = f_X(y) \ \forall\, -\infty < y < \infty \implies X \overset{d}{=} Y.
\]
Thus, E(Y) = E(X) =⇒ E(−X) = E(X) =⇒ E(X) = 0 \left( \text{since } \int_{-\infty}^{\infty} |x| f_X(x)\,dx < \infty \right).
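A numerical sketch of the same computation (the integration grid and cutoffs below are ad hoc choices) reproduces M_X(t) = 1/(1 − t²) on (−1, 1) and its symmetry in t.

```python
import math

def laplace_pdf(x):
    """Double exponential (Laplace) density e^{-|x|}/2."""
    return math.exp(-abs(x)) / 2

def mgf(t, a=-30.0, b=30.0, n=200_000):
    """Midpoint-rule approximation of E(e^{tX}) for the Laplace density."""
    h = (b - a) / n
    return sum(math.exp(t * (a + (i + 0.5) * h)) * laplace_pdf(a + (i + 0.5) * h) * h
               for i in range(n))

for t in (0.3, -0.3):
    print(t, mgf(t), 1 / (1 - t**2))  # the last two values agree closely
```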

0.11. Inequalities

Inequalities provide estimates of probabilities when they cannot be evaluated precisely.
Theorem 0.94. Let X be a r.v. and let g : R → R be a non-negative function such that E(g(X)) is finite. Then, for any c > 0,
\[
P(g(X) \geq c) \leq \frac{E(g(X))}{c}.
\]

Proof. We will prove it for the case of a continuous r.v.

Let A = {x ∈ R : g(x) ≥ c} and let fX (x) denote the p.d.f. of X. Then,
\begin{align*}
E(g(X)) &= \int_{-\infty}^{\infty} g(x) f_X(x)\,dx \\
&= \int_{-\infty}^{\infty} g(x) [I_A(x) + I_{A^c}(x)] f_X(x)\,dx \\
&= \int_{-\infty}^{\infty} g(x) I_A(x) f_X(x)\,dx + \int_{-\infty}^{\infty} g(x) I_{A^c}(x) f_X(x)\,dx \\
&\geq \int_{-\infty}^{\infty} g(x) I_A(x) f_X(x)\,dx \\
&\geq c \int_{-\infty}^{\infty} I_A(x) f_X(x)\,dx \\
&= c \int_{A} f_X(x)\,dx = c P(g(X) \geq c) \implies P(g(X) \geq c) \leq \frac{E(g(X))}{c}.
\end{align*}

This completes the proof.



Corollary 0.95. (a) Let g : [0, ∞) → R be a non-negative and strictly increasing function such that E(g(|X|)) is finite. Then, for any c > 0 such that g(c) > 0,
\[
P(|X| \geq c) \leq \frac{E(g(|X|))}{g(c)}.
\]
(b) Let r > 0 and t > 0. Then,
\[
P(|X| \geq t) \leq \frac{E(|X|^r)}{t^r} \quad \text{(Markov’s inequality)},
\]
provided E(|X|^r) < ∞. In particular, P(|X| ≥ t) ≤ \frac{E(|X|)}{t}, provided E(|X|) < ∞.

Proof. (a) Note that
\[
P(|X| \geq c) = P(g(|X|) \geq g(c)) \ (\text{since } g \text{ is strictly increasing}) \leq \frac{E(g(|X|))}{g(c)} \ (\text{by Theorem 0.94}).
\]
(b) Take g(x) = x^r, x ≥ 0, r > 0. Then g is strictly increasing on [0, ∞) and is non-negative. Using (a) we get
\[
P(|X| \geq t) \leq \frac{E(g(|X|))}{g(t)} = \frac{E(|X|^r)}{t^r}.
\]

This proves the result.


Theorem 0.96 (Chebyshev Inequality). Let X be a r.v. with finite variance σ² and E(X) = µ. Then, for any ε > 0,
\[
P(|X - \mu| \geq \varepsilon \sigma) \leq \frac{1}{\varepsilon^2}.
\]
Proof. Using the above Corollary,
\[
P(|X - \mu| \geq \varepsilon \sigma) \leq \frac{E(|X - \mu|^2)}{\varepsilon^2 \sigma^2} = \frac{E((X - \mu)^2)}{\varepsilon^2 \sigma^2} = \frac{1}{\varepsilon^2}.
\]
This completes the proof.

Example 0.97 (The above bounds are sharp). Let X be a r.v. with p.m.f.
\[
f(x) = \begin{cases} \frac{1}{8}, & \text{if } x = -1, 1, \\ \frac{3}{4}, & \text{if } x = 0, \\ 0, & \text{otherwise.} \end{cases}
\]
Then E(X²) = 1/4 and P (|X| ≥ 1) = 1/4.
Using the Markov inequality, P (|X| ≥ 1) ≤ E(X²) = 1/4, so the bound is attained.
Example 0.98. Let X be a r.v. with p.d.f.
\[
f(x) = \begin{cases} \frac{1}{2\sqrt{3}}, & \text{if } -\sqrt{3} < x < \sqrt{3}, \\ 0, & \text{otherwise.} \end{cases}
\]
Then
\[
\mu = E(X) = \int_{-\sqrt{3}}^{\sqrt{3}} \frac{x}{2\sqrt{3}}\,dx = 0, \qquad \sigma^2 = E(X^2) = \int_{-\sqrt{3}}^{\sqrt{3}} \frac{x^2}{2\sqrt{3}}\,dx = 1
\]
and
\[
P\left( |X| \geq \frac{3}{2} \right) = 1 - \int_{-3/2}^{3/2} \frac{1}{2\sqrt{3}}\,dx = 1 - \frac{\sqrt{3}}{2} \approx 0.134.
\]
Using the Markov inequality, P(|X| ≥ 3/2) ≤ \frac{4}{9} E(X^2) = \frac{4}{9} ≈ 0.444 (considerably conservative).
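The comparison between the exact probability and the Markov bound in this example is easy to reproduce:

```python
import math

# X ~ Uniform(-sqrt(3), sqrt(3)), so mu = 0 and E(X^2) = 1.
c = 1.5
exact = 1 - c / math.sqrt(3)   # P(|X| >= c) = 1 - 2c / (2 sqrt(3))
markov = 1.0 / c**2            # Markov bound E(X^2) / c^2
print(round(exact, 3), round(markov, 3))  # 0.134 vs 0.444
```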

Definition 0.99. Let −∞ ≤ a < b ≤ ∞. A function ψ : (a, b) → R is said to be a convex function if

ψ(αx + (1 − α)y) ≤ αψ(x) + (1 − α)ψ(y) ∀ x, y ∈ (a, b) and ∀ α ∈ (0, 1).

The function ψ(·) is said to be strictly convex if the above inequality is strict whenever x ≠ y.

We state the following theorem without proof.


Theorem 0.100. (i) Let ψ : (a, b) → R be a convex function. Then, ψ is continuous on (a, b) and is almost everywhere
differentiable (i.e. if D is the set of points where ψ is not differentiable then D does not contain any interval).
(ii) Let ψ : (a, b) → R be a differentiable function. Then, ψ is convex (strictly convex) on (a, b) iff ψ 0 is non-decreasing
(strictly increasing) on (a, b).
(iii) Let ψ : (a, b) → R be a twice differentiable function. Then, ψ is convex (strictly convex) on (a, b) iff

ψ 00 (x) ≥ (>)0, ∀ x ∈ (a, b).


Theorem 0.101 (Jensen’s Inequality). Let ψ : (a, b) → R be a convex function and let X be a r.v. with d.f. F having support S ⊆ (a, b). Then,
\[
E(\psi(X)) \geq \psi(E(X)), \quad \text{provided the expectations exist.}
\]
Proof. We give the proof for the special case where ψ is twice differentiable on (a, b), so that ψ″(x) ≥ 0, ∀ x ∈ (a, b). Let µ = E(X). Expanding ψ(x) into a Taylor series about µ we get
\[
\psi(x) = \psi(\mu) + (x - \mu)\psi'(\mu) + \frac{(x - \mu)^2}{2!} \psi''(\xi), \quad \forall\, x \in (a, b),
\]
for some ξ between µ and x. Thus,
\[
\psi(x) \geq \psi(\mu) + (x - \mu)\psi'(\mu) \implies E(\psi(X)) \geq E(\psi(\mu) + (X - \mu)\psi'(\mu)) = \psi(\mu) = \psi(E(X)).
\]

This completes the proof.


Example 0.102. (a) For any r.v. X, E(X²) ≥ (E(X))² [take ψ(x) = x², x ∈ R, which is convex, and apply Jensen’s Inequality] and E(|X|) ≥ |E(X)| [take ψ(x) = |x|, x ∈ R, which is convex, and apply Jensen’s Inequality].
(b) For any r.v. X with P (X > 0) = 1, E(ln X) ≤ ln E(X) [take ψ(x) = − ln x, which is convex on (0, ∞), and apply Jensen’s Inequality].
(c) For any r.v. X, E(e^X) ≥ e^{E(X)} [take ψ(x) = e^x, x ∈ R, which is convex, and apply Jensen’s Inequality].
(d) For any r.v. X with P (X > 0) = 1, E(X)E(1/X) ≥ 1 [take ψ(x) = 1/x, x > 0, which is convex, and apply Jensen’s Inequality].
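These inequalities also hold for every empirical (sample) distribution, since a sample average is a finite convex combination; this gives a quick sanity check. The sample below is simulated standard normal data, an arbitrary choice.

```python
import math, random

random.seed(3)
xs = [random.gauss(0.0, 1.0) for _ in range(50_000)]
n = len(xs)

ex = sum(xs) / n
# psi(x) = e^x is convex: E(e^X) >= e^{E(X)}
e_exp = sum(math.exp(x) for x in xs) / n
assert e_exp >= math.exp(ex)
# psi(x) = x^2 is convex: E(X^2) >= (E(X))^2
assert sum(x * x for x in xs) / n >= ex**2
print("Jensen's inequality holds for this empirical distribution")
```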

Example 0.103. Let a₁, a₂, . . . , aₙ, w₁, w₂, . . . , wₙ be positive constants such that \sum_{i=1}^{n} w_i = 1. Prove the AM-GM-HM inequality
\[
\sum_{i=1}^{n} a_i w_i \geq \prod_{i=1}^{n} a_i^{w_i} \geq \frac{1}{\sum_{i=1}^{n} \frac{w_i}{a_i}}, \quad (AM \geq GM \geq HM).
\]
Solution: Let X be a r.v. with p.m.f.
\[
f(x) = \begin{cases} w_i, & \text{if } x = a_i, \ i = 1, 2, \ldots, n, \\ 0, & \text{otherwise.} \end{cases}
\]
Then ψ(x) = − ln x, x > 0, is a convex function. Therefore
\begin{align*}
& E(\psi(X)) \geq \psi(E(X)) \\
&\implies E(-\ln X) \geq -\ln E(X) \\
&\implies -\sum_{i=1}^{n} (\ln a_i) w_i \geq -\ln \left( \sum_{i=1}^{n} a_i w_i \right) \\
&\implies \ln \left( \sum_{i=1}^{n} a_i w_i \right) \geq \ln \left( \prod_{i=1}^{n} a_i^{w_i} \right) \implies \sum_{i=1}^{n} a_i w_i \geq \prod_{i=1}^{n} a_i^{w_i}.
\end{align*}
Replacing a_i ’s by \frac{1}{a_i} ’s, we get \sum_{i=1}^{n} \frac{w_i}{a_i} \geq 1 \big/ \prod_{i=1}^{n} a_i^{w_i}. Therefore,
\[
\sum_{i=1}^{n} a_i w_i \geq \prod_{i=1}^{n} a_i^{w_i} \geq \frac{1}{\sum_{i=1}^{n} \frac{w_i}{a_i}}.
\]
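A direct numerical check of AM ≥ GM ≥ HM for randomly generated positive aᵢ and normalized weights wᵢ (all concrete choices below are arbitrary):

```python
import math, random

random.seed(7)
a = [random.uniform(0.5, 10.0) for _ in range(5)]   # positive constants a_i
w = [random.random() for _ in range(5)]
total = sum(w)
w = [wi / total for wi in w]                        # weights summing to 1

am = sum(ai * wi for ai, wi in zip(a, w))                         # arithmetic mean
gm = math.exp(sum(wi * math.log(ai) for ai, wi in zip(a, w)))     # geometric mean
hm = 1.0 / sum(wi / ai for ai, wi in zip(a, w))                   # harmonic mean
assert am >= gm >= hm
print(am, gm, hm)
```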

0.12. Summary of Probability Distributions

Let X be a r.v. defined on a probability space (Ω, F, P ) associated with a random experiment E . Let FX (·) be its distribution function and fX (·) be its p.m.f. / p.d.f.
The probability distribution of X (i.e., p.m.f. / p.d.f.) describes the manner in which the r.v. X takes values in various sets. It may be desirable to have a set of numerical measures that provide a summary of the prominent features of the probability distribution of X. We call these measures descriptive measures. Four prominently used descriptive measures are:
(1) Measures of Central Tendency or Location (also called Averages):
These give us an idea about the central value of the probability distribution around which the values of r.v. X are clustered. Commonly used measures of central tendency are:
(a) Mean:
\[
\mu = \mu_1' = E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx \ \text{ or } \ \sum_{x \in S_X} x f_X(x) \ \to \ \text{may or may not exist.}
\]
Whenever it exists it gives us an idea about the average observed value of X when E is repeated a large number of times. Note that if the distribution of X is symmetric about µ (i.e., X − µ \overset{d}{=} µ − X), then E(X) = µ, provided it exists.

The mean seems to be the best suited measure of central tendency for symmetric distributions. Because of its simplicity the mean is the most commonly used average. However the mean may be affected by a few extreme values and it may also fail to be defined.
(b) Median:
Before defining the median we first introduce the concept of the quantile function or quantile.
The quantile function of r.v. X is a function QX : (0, 1) → R defined by
\[
Q_X(p) = \inf\{x \in \mathbb{R} : F_X(x) \geq p\}, \quad p \in (0, 1).
\]
For a fixed p ∈ (0, 1) the quantity ξp = QX (p) is called the quantile of order p. Note that
\[
F_X(\xi_p -) \leq p \leq F_X(\xi_p), \quad \text{(Exercise)}
\]

and FX (ξp ) = p provided FX is continuous at ξp . Also note that:


· QX (FX (x)) ≤ x, provided 0 < FX (x) < 1;
· FX (QX (p)) ≥ p, ∀ 0 < p < 1;
· FX is continuous =⇒ FX (QX (p)) = p;
· QX (p) ≤ x ⇐⇒ FX (x) ≥ p;
−1 −1
· QX (p) = FX (p), provided FX (p) exists;
· QX (p1 ) ≤ QX (p2 ), ∀ 0 < p1 < p2 < 1.
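For a discrete r.v. the infimum in the definition of QX is attained at a support point, so the quantile function can be computed by scanning the d.f. The helper below is our own sketch, applied to a hypothetical d.f.

```python
def quantile(cdf_points, p):
    """Q_X(p) = inf{x : F_X(x) >= p} for a discrete r.v., given as a list of
    (x, F_X(x)) pairs sorted by x."""
    for x, F in cdf_points:
        if F >= p:
            return x
    raise ValueError("p exceeds the largest supplied F value")

# Hypothetical distribution: X takes -1, 0, 2 with probabilities 0.2, 0.5, 0.3,
# so F_X jumps to 0.2, 0.7, 1.0 at those points.
cdf = [(-1, 0.2), (0, 0.7), (2, 1.0)]
print(quantile(cdf, 0.5), quantile(cdf, 0.7), quantile(cdf, 0.71))  # 0 0 2
```

Note that QX (0.5) = 0 here, i.e. the median, and that p = 0.7 still returns 0 because FX (0) = 0.7 ≥ 0.7, matching the infimum definition.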
The quantile of order 0.5 is called the median of (the distribution of) X. If me is the median of X, then
\[
F_X(m_e -) \leq \frac{1}{2} \leq F_X(m_e).
\]
If the random experiment E is repeated a large number of times, about half of the times the observed value of X is expected to be less than me and about half of the times it is expected to be greater than me .
Suppose that the distribution of X is symmetric about µ. Then
\begin{align*}
& X - \mu \overset{d}{=} \mu - X \\
&\implies P(X - \mu \leq 0) = P(\mu - X \leq 0) \\
&\implies F_X(\mu) = 1 - F_X(\mu -) \\
&\implies F_X(\mu -) \leq \frac{1}{2} \leq F_X(\mu) \implies \mu = E(X) = m_e, \ \text{provided } F_X \text{ is continuous at } \mu.
\end{align*}
2
Merits of Median as a Measure of Central Tendency:
· Unlike mean it is always defined;
· Median is not affected by a few extreme values of X as it takes into account only the probabilities with which
different values occur and not their numerical values.
As a measure of central tendency the median is preferred over the mean if the distribution is asymmetric and a few
extreme observations occur with positive probabilities.
Demerits of Median as a Measure of Central Tendency:
· Does not at all take into account the numerical values assumed by X;

· For many probability distributions it is not easy to evaluate.


(c) Mode:
Roughly speaking, the mode m₀ of a probability distribution is the value that occurs with the highest probability; it is defined by
\[
f_X(m_0) = \sup\{f_X(x) : x \in S_X\}.
\]
If the random experiment E is repeated a large number of times then either the mode m₀ or a value in the neighborhood of m₀ is observed with maximum frequency.
Note that the mode of a distribution may not be unique. A distribution having a single / double / triple / multiple mode(s) is called a unimodal / bimodal / trimodal / multimodal distribution.
Merits of Mode as a Measure of Central Tendency:
It is easy to understand and easy to calculate. Normally, it can be found by inspection.
Demerits of Mode as a Measure of Central Tendency:
· A probability distribution may have more than one mode, and the modes may be far apart.
As a measure of central tendency, the mode is less preferred than the mean and median. Clearly for symmetric unimodal distributions mean = median = mode.
(2) Measures of Dispersion:
Apart from measures of central tendency other measures are often required to describe a probability distribution.
Measures of dispersion give the idea about the scatter (cluster / dispersion) of probability mass of the distribution
about a measure of a central tendency. Some of the measures of dispersion are listed below.
(a) Range:
Let SX = [a, b]. Then range of distribution of X is defined by R = b − a. It does not take into account how the
probability mass is distributed over [a, b]. For this reason it is not a preferred measure of dispersion.
(b) Mean Deviation:
Let A be a suitable measure of central tendency. Define
· M D(A) = E(|X − A|) → called the mean deviation of X about A (provided it exists);
· M D(µ) = E(|X − µ|) → mean deviation about mean µ = E(X);
· M D(me ) = E(|X − me |) → mean deviation about median.
It can be shown that M D(me ) ≤ M D(A), ∀ A ∈ R. For this reason M D(me ) seems preferable to M D(A) for any other A ∈ R.
· M D(A) is generally difficult to compute for many distributions;
· M D(A) is sensitive to extreme observations;
· M D(A) may not exist for many distributions.
(c) Standard Deviation (SD):
The standard deviation of the distribution of X is defined by σ = \sqrt{\mathrm{Var}(X)} = \sqrt{E(X - \mu)^2}, where µ = E(X). Clearly σ ≤ \sqrt{E(X - A)^2}, ∀ A ∈ R. It has the same unit as that of X.
The standard deviation σ gives us an idea of the average spread of values of X around the mean µ.

· σ is simple to compute for most distributions (unlike M D(A), A ∈ R);


· SD is most widely used measure of dispersion (especially for nearly symmetric distributions);
· For some distributions SD does not exist;
· SD is sensitive to extreme observations.
(d) Quartile Deviation:
Let q₁ = ξ₀.₂₅ = quantile of order 0.25 (lower quartile of X),
q₂ = me = ξ₀.₅ = quantile of order 0.5 = median,
q₃ = ξ₀.₇₅ = quantile of order 0.75 (upper quartile of X).
So q₁, q₂, q₃ divide the probability distribution of X into 4 parts so that
\[
F_X(q_1 -) \leq \frac{1}{4} \leq F_X(q_1), \quad F_X(q_2 -) \leq \frac{1}{2} \leq F_X(q_2) \quad \text{and} \quad F_X(q_3 -) \leq \frac{3}{4} \leq F_X(q_3).
\]
Note that q₁, q₂ and q₃ divide the p.d.f. / p.m.f. of X into 4 parts, each carrying 25% of the probability mass.
Define IQR = q₃ − q₁ → inter-quartile range, and QD = \frac{q_3 - q_1}{2} → quartile deviation or the semi-interquartile range.
· Unlike SD, QD is not sensitive to extreme values assumed by X.
· It does not at all take into account the numerical values of X.
· It ignores the tails of the probability distribution (constituting 50% of the probability distribution: on the left side of q₁ and the right side of q₃).
· QD depends on the unit of measurement of X and thus it may not be appropriate for comparing dispersions of two probability distributions having different units of measurement. For this purpose one may use CQD = \frac{q_3 - q_1}{q_3 + q_1} → coefficient of quartile deviation, which does not depend on units of measurement.
(e) Coefficient of Variation:
Like QD, the SD σ also depends on the units of measurement of r.v. X and thus it is not an appropriate measure of dispersion for comparing distributions having different units of measurement. For this purpose we consider
\[
CV \text{ (coefficient of variation)} = \frac{\sigma}{\mu},
\]
where µ = E(X), σ = \sqrt{\mathrm{Var}(X)}, and we assume µ ≠ 0.
· CV measures variation per unit of mean.
· CV does not depend on the unit of measurement of r.v. X.
· CV is very sensitive to small changes in µ when µ is near 0.
(3) Measure of Skewness:
Skewness of a probability distribution is a measure of its asymmetry (lack of symmetry).
Recall that the distribution of X is symmetric about µ ⇐⇒ X − µ \overset{d}{=} µ − X ⇐⇒ fX (µ + x) = fX (µ − x), ∀ x ∈ R, and in that case
· µ = E(X) = me (median);
· the shape of the p.d.f. / p.m.f. on the left of µ is the mirror image of that on the right side of µ.

Positively Skewed Distributions:


· Have more probability mass to the right side of p.d.f. / p.m.f.
· Have longer tails on the right side of p.d.f.
For unimodal positively skewed distribution, normally

Mode < Median < Mean

since the probability mass at large values of X pulls up the mean µ.
Negatively Skewed Distributions:
· Have more probability mass to the left side of the p.d.f. / p.m.f.
· Have longer tails on the left side of p.d.f.
For unimodal negatively skewed distributions, normally

Mean < Median < Mode.


Let E(X) = µ, \sqrt{\mathrm{Var}(X)} = σ and Z = \frac{X - \mu}{\sigma} : the standardized variable (independent of units). Define
\[
\text{Coefficient of skewness} = \beta_1 = E(Z^3) = \frac{E((X - \mu)^3)}{\sigma^3} = \frac{\mu_3}{\mu_2^{3/2}}, \quad \text{where } \mu_r = E((X - \mu)^r), \ r = 1, 2, \ldots
\]
· For symmetric distributions β₁ = 0. The converse may not be true.
· For positively skewed distributions, normally β₁ is a large positive quantity.
· For negatively skewed distributions, normally β₁ is a small negative quantity.
A measure of skewness can also be based on quantiles. Let q₁ : first quartile, me : median (or second quartile q₂), q₃ : third quartile, µ : mean.
· For symmetric distributions: q₃ − me = me − q₁ \left( m_e = \frac{q_1 + q_3}{2} \right).
· For positively skewed distributions: q₃ − me > me − q₁.
· For negatively skewed distributions: q₃ − me < me − q₁.
Thus a measure of skewness can be based on (q₃ − me) − (me − q₁) = q₃ − 2me + q₁. Define
\[
\text{Yule coefficient of skewness} = \beta_2 = \frac{(q_3 - m_e) - (m_e - q_1)}{q_3 - q_1} = \frac{q_3 - 2m_e + q_1}{q_3 - q_1} \quad \text{(independent of units).}
\]
Clearly for positively / negatively skewed distributions β₂ > 0 / β₂ < 0, and for symmetric distributions β₂ = 0.
(4) Measures of Kurtosis:
For µ ∈ R and σ > 0, let Y_{µ,σ} be a r.v. having p.d.f.
\[
f_{Y_{\mu,\sigma}}(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty \quad (\text{Normal distribution, } Y_{\mu,\sigma} \sim N(\mu, \sigma^2)).
\]
It can be shown that
· E(Y_{µ,σ}) = µ, Var(Y_{µ,σ}) = σ²;
· Y_{µ,σ} − µ \overset{d}{=} µ − Y_{µ,σ} and hence β₁ = 0; also E((Y_{µ,σ} − µ)⁴) = 3σ⁴;
· f_{Y_{µ,σ}}(·) is unimodal and symmetric.
Kurtosis of the probability distribution of X is a measure of peakedness and thickness of tails of the p.m.f. / p.d.f. of X relative to that of the normal distribution.
A distribution is said to have higher (lower) kurtosis than the normal distribution if its p.m.f. / p.d.f., in comparison with the p.d.f. of a normal distribution, has a sharper (more rounded) peak and longer, fatter (shorter, thinner) tails.
Define Z = \frac{X - \mu}{\sigma} (independent of units) and
\[
\nu_1 = E(Z^4) = \frac{E((X - \mu)^4)}{\sigma^4} = \frac{\mu_4}{\mu_2^2} \ \to \ \text{kurtosis of the probability distribution of } X.
\]
ν₁ is used as a measure of kurtosis for unimodal distributions. For the N (µ, σ²) distribution, ν₁ = 3. The quantity ν₂ = ν₁ − 3 is called the excess kurtosis of the distribution of X. Obviously for normal distributions, ν₂ = 0.
Mesokurtic distributions: distributions with ν₂ = 0.
Leptokurtic distributions: distributions with ν₂ > 0 (sharper peak and longer, fatter tails).
Platykurtic distributions: distributions with ν₂ < 0 (more rounded peak and shorter, thinner tails).
Example 0.104. For α ∈ [0, 1], let Xα have the p.d.f.
\[
f_\alpha(x) = \begin{cases} \alpha e^{x}, & x < 0, \\ (1 - \alpha) e^{-x}, & x \geq 0. \end{cases}
\]
Recall that for r ∈ {1, 2, . . . }
\[
I_r = \int_{0}^{\infty} x^{r-1} e^{-x}\,dx = (r - 1)! \quad \text{(using integration by parts).}
\]
Thus, for r ∈ {1, 2, . . . }
\begin{align*}
\mu_r'(\alpha) = E(X_\alpha^r) &= \alpha \int_{-\infty}^{0} x^r e^{x}\,dx + (1 - \alpha) \int_{0}^{\infty} x^r e^{-x}\,dx \\
&= ((-1)^r \alpha + 1 - \alpha) \int_{0}^{\infty} x^r e^{-x}\,dx \\
&= \begin{cases} (1 - 2\alpha)\, r!, & r \in \{1, 3, 5, \ldots\}, \\ r!, & r \in \{2, 4, 6, \ldots\}. \end{cases}
\end{align*}

Let ξp be the quantile of order p ∈ (0, 1). Then Fα (ξp ) = p, where Fα is the d.f. of Xα . Clearly F_\alpha(0) = \alpha \int_{-\infty}^{0} e^{x}\,dx = \alpha. For 0 ≤ α < p, we have
\[
p = F_\alpha(\xi_p) = \alpha \int_{-\infty}^{0} e^{x}\,dx + (1 - \alpha) \int_{0}^{\xi_p} e^{-x}\,dx = 1 - (1 - \alpha) e^{-\xi_p}
\]
and for α ≥ p
\[
p = \alpha \int_{-\infty}^{\xi_p} e^{x}\,dx = \alpha e^{\xi_p}.
\]
Thus,
\[
\xi_p = \begin{cases} \ln \left( \frac{1-\alpha}{1-p} \right), & \text{if } 0 \leq \alpha < p, \\ -\ln \left( \frac{\alpha}{p} \right), & \text{if } p \leq \alpha \leq 1, \end{cases}
\]
\[
q_1(\alpha) = \xi_{1/4} = \begin{cases} \ln \left( \frac{4(1-\alpha)}{3} \right), & \text{if } 0 \leq \alpha < \frac{1}{4}, \\ -\ln(4\alpha), & \text{if } \frac{1}{4} \leq \alpha \leq 1, \end{cases}
\]
\[
m_e(\alpha) = \xi_{1/2} = \begin{cases} \ln(2(1-\alpha)), & \text{if } 0 \leq \alpha < \frac{1}{2}, \\ -\ln(2\alpha), & \text{if } \frac{1}{2} \leq \alpha \leq 1, \end{cases}
\]
\[
q_3(\alpha) = \xi_{3/4} = \begin{cases} \ln(4(1-\alpha)), & \text{if } 0 \leq \alpha < \frac{3}{4}, \\ -\ln \left( \frac{4\alpha}{3} \right), & \text{if } \frac{3}{4} \leq \alpha \leq 1, \end{cases}
\]
\[
\mu_1'(\alpha) = E(X_\alpha) = 1 - 2\alpha,
\]
\[
\sup\{f_\alpha(x) : -\infty < x < \infty\} = \max\{\alpha, 1 - \alpha\}, \ \text{attained at } x = 0, \ \text{so the mode is } m_0(\alpha) = 0,
\]
\[
\mu_2'(\alpha) = E(X_\alpha^2) = 2, \qquad \sigma(\alpha) = \sqrt{\mathrm{Var}(X_\alpha)} = \sqrt{2 - (1 - 2\alpha)^2} = \sqrt{1 + 4\alpha - 4\alpha^2}.
\]

Note that, for 0 ≤ α < \frac{1}{2}, me (α) = ln(2(1 − α)) ≥ 0 and for α > \frac{1}{2}, me (α) = − ln(2α) < 0. Thus, for 0 ≤ α < \frac{1}{2} (so that me (α) ≥ 0)
\begin{align*}
M D(m_e(\alpha)) &= E(|X_\alpha - m_e(\alpha)|) \\
&= \alpha \int_{-\infty}^{0} (m_e(\alpha) - x) e^{x}\,dx + (1 - \alpha) \int_{0}^{m_e(\alpha)} (m_e(\alpha) - x) e^{-x}\,dx + (1 - \alpha) \int_{m_e(\alpha)}^{\infty} (x - m_e(\alpha)) e^{-x}\,dx \\
&= m_e(\alpha) + 2\alpha = \ln(2(1 - \alpha)) + 2\alpha.
\end{align*}
Similarly, for \frac{1}{2} ≤ α ≤ 1 (so that me (α) ≤ 0)
\begin{align*}
M D(m_e(\alpha)) &= E(|X_\alpha - m_e(\alpha)|) \\
&= \alpha \int_{-\infty}^{m_e(\alpha)} (m_e(\alpha) - x) e^{x}\,dx + \alpha \int_{m_e(\alpha)}^{0} (x - m_e(\alpha)) e^{x}\,dx + (1 - \alpha) \int_{0}^{\infty} (x - m_e(\alpha)) e^{-x}\,dx \\
&= 2(1 - \alpha) - m_e(\alpha) = \ln(2\alpha) + 2(1 - \alpha).
\end{align*}
Thus,
\[
M D(m_e(\alpha)) = \begin{cases} \ln(2(1 - \alpha)) + 2\alpha, & \text{if } 0 \leq \alpha < \frac{1}{2}, \\ \ln(2\alpha) + 2(1 - \alpha), & \text{if } \frac{1}{2} \leq \alpha \leq 1, \end{cases}
\]
\[
IQR \equiv IQR(\alpha) = q_3(\alpha) - q_1(\alpha) = \begin{cases} \ln 3, & \text{if } 0 \leq \alpha < \frac{1}{4} \text{ or } \frac{3}{4} \leq \alpha \leq 1, \\ \ln(16\alpha(1 - \alpha)), & \text{if } \frac{1}{4} \leq \alpha < \frac{3}{4}, \end{cases}
\]
\[
QD \equiv QD(\alpha) = \frac{q_3(\alpha) - q_1(\alpha)}{2} = \begin{cases} \ln \sqrt{3}, & \text{if } 0 \leq \alpha < \frac{1}{4}, \\ \ln \left( 4\sqrt{\alpha(1 - \alpha)} \right), & \text{if } \frac{1}{4} \leq \alpha < \frac{3}{4}, \\ \ln \sqrt{3}, & \text{if } \frac{3}{4} \leq \alpha \leq 1, \end{cases}
\]
\[
CQD \equiv CQD(\alpha) = \frac{q_3(\alpha) - q_1(\alpha)}{q_3(\alpha) + q_1(\alpha)} = \begin{cases} \dfrac{\ln 3}{\ln \left( \frac{16(1-\alpha)^2}{3} \right)}, & \text{if } 0 \leq \alpha < \frac{1}{4}, \\[2mm] \dfrac{\ln(16\alpha(1 - \alpha))}{\ln \left( \frac{1-\alpha}{\alpha} \right)}, & \text{if } \frac{1}{4} \leq \alpha \leq \frac{3}{4}, \\[2mm] -\dfrac{\ln 3}{\ln \left( \frac{16\alpha^2}{3} \right)}, & \text{if } \frac{3}{4} \leq \alpha \leq 1. \end{cases}
\]
For α ≠ \frac{1}{2},
\[
CV \equiv CV(\alpha) = \frac{\sigma(\alpha)}{\mu_1'(\alpha)} = \frac{\sqrt{1 + 4\alpha - 4\alpha^2}}{1 - 2\alpha},
\]
\[
\mu_3(\alpha) = E((X_\alpha - \mu_1'(\alpha))^3) = \mu_3'(\alpha) - 3\mu_1'(\alpha)\mu_2'(\alpha) + 2(\mu_1'(\alpha))^3 = 2(1 - 2\alpha)^3,
\]
\[
\beta_1 \equiv \beta_1(\alpha) = \frac{\mu_3(\alpha)}{\sigma^3(\alpha)} = \frac{2(1 - 2\alpha)^3}{(1 + 4\alpha - 4\alpha^2)^{3/2}},
\]
\[
\beta_2 \equiv \beta_2(\alpha) = \frac{q_3(\alpha) - 2m_e(\alpha) + q_1(\alpha)}{q_3(\alpha) - q_1(\alpha)} = \begin{cases} \dfrac{\ln(4/3)}{\ln 3}, & \text{if } 0 \leq \alpha < \frac{1}{4}, \\[2mm] -\dfrac{\ln(4\alpha(1 - \alpha))}{\ln(16\alpha(1 - \alpha))}, & \text{if } \frac{1}{4} \leq \alpha < \frac{1}{2}, \\[2mm] \dfrac{\ln(4\alpha(1 - \alpha))}{\ln(16\alpha(1 - \alpha))}, & \text{if } \frac{1}{2} \leq \alpha \leq \frac{3}{4}, \\[2mm] \dfrac{\ln(3/4)}{\ln 3}, & \text{if } \frac{3}{4} \leq \alpha \leq 1. \end{cases}
\]

Clearly, for 0 ≤ α < \frac{1}{2}, βi (α) > 0, i = 1, 2, and for \frac{1}{2} < α ≤ 1, βi (α) < 0, i = 1, 2. For α = \frac{1}{2}, βi (α) = 0, i = 1, 2. Thus,
· for 0 ≤ α < \frac{1}{2}, the distribution of Xα is positively skewed;
· for \frac{1}{2} < α ≤ 1, the distribution of Xα is negatively skewed;
· for α = \frac{1}{2}, the distribution of Xα is symmetric (in fact in this case fα (x) = fα (−x), ∀ x ∈ R).

\begin{align*}
\mu_4 \equiv \mu_4(\alpha) &= E((X_\alpha - \mu_1'(\alpha))^4) \\
&= \mu_4'(\alpha) - 4\mu_1'(\alpha)\mu_3'(\alpha) + 6(\mu_1'(\alpha))^2 \mu_2'(\alpha) - 3(\mu_1'(\alpha))^4 = 24 - 12(1 - 2\alpha)^2 - 3(1 - 2\alpha)^4,
\end{align*}
\[
\nu_1 \equiv \nu_1(\alpha) = \frac{\mu_4(\alpha)}{(\mu_2(\alpha))^2} = \frac{24 - 12(1 - 2\alpha)^2 - 3(1 - 2\alpha)^4}{(2 - (1 - 2\alpha)^2)^2}
\]
and
\[
\nu_2 \equiv \nu_2(\alpha) = \nu_1(\alpha) - 3 = \frac{12 - 6(1 - 2\alpha)^4}{(2 - (1 - 2\alpha)^2)^2}.
\]
Clearly, for any α ∈ [0, 1], ν₂(α) > 0. It follows that for any value of α ∈ [0, 1] the distribution of Xα is leptokurtic.
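The raw-moment formula µ′_r(α) = ((−1)^r α + 1 − α) r! can be cross-checked by numerical integration; the value of α, the grid and the cutoffs below are arbitrary choices for this sketch.

```python
import math

def raw_moments(alpha, rmax=4, a=-30.0, b=30.0, n=200_000):
    """Midpoint-rule raw moments E(X_alpha^r), r = 0..rmax, of f_alpha."""
    h = (b - a) / n
    mu = [0.0] * (rmax + 1)
    for i in range(n):
        x = a + (i + 0.5) * h
        fx = alpha * math.exp(x) if x < 0 else (1 - alpha) * math.exp(-x)
        w = fx * h
        for r in range(rmax + 1):
            mu[r] += x**r * w
    return mu

alpha = 0.3
mu = raw_moments(alpha)
print(mu[1], 1 - 2 * alpha)        # odd moment:  (1 - 2*alpha) * 1!
print(mu[3], 6 * (1 - 2 * alpha))  # odd moment:  (1 - 2*alpha) * 3!
print(mu[2], 2.0)                  # even moment: 2!
```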
