Chapter 4
1 Introduction.
When a large collection of observations is assembled, we are usually interested not in any individual observation, but rather in some quantity that represents the complete set of observations, such as a central value (arithmetic mean, median, etc.). Similarly, for a random variable we are interested in the value around which its probability distribution is centered. The mean of the probability distribution of a random variable is called its expectation.
Definition 1.1. A function $X$ is said to be integrable iff $\int |X| \, dP < \infty$, i.e., iff $\int X^+ \, dP$ and $\int X^- \, dP$ are both finite. The expectation of $X$ is defined provided
$$\int_{x \in \Omega} |x| \, dP < \infty \quad \text{(continuous case)} \quad \text{or} \quad \sum_{x \in \Omega} |x| \, P[X = x] < \infty \quad \text{(discrete case)}.$$
Example. In a coin-tossing experiment, define a random variable $X$ that takes the value 1 if heads turns up and the value 0 if tails turns up. Let the probability of heads be $p$. Then
$$E[X] = 1 \cdot P[X = 1] + 0 \cdot P[X = 0] = p.$$
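As a quick numerical illustration, the following minimal Python sketch estimates this expectation by a long-run average (the value $p = 0.3$ and the number of tosses are arbitrary choices):

```python
import random

# Monte Carlo check of E[X] = p for the coin-toss indicator above.
# p = 0.3 and the sample size are arbitrary choices for illustration.
p = 0.3
n = 100_000
tosses = [1 if random.random() < p else 0 for _ in range(n)]
sample_mean = sum(tosses) / n
print(f"sample mean = {sample_mean:.4f}, p = {p}")  # the sample mean should be close to p
```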
For a continuous random variable $X$ with density $f_X$ and constants $a$ and $b$, let $Y = aX + b$. Then
$$\begin{aligned}
E[Y] = E[aX + b] &= \int_{x \in \Omega} (ax + b) f_X(x)\, dx \\
&= \int_{x \in \Omega} ax f_X(x)\, dx + \int_{x \in \Omega} b f_X(x)\, dx \\
&= a \int_{x \in \Omega} x f_X(x)\, dx + b \int_{x \in \Omega} f_X(x)\, dx \\
&= aE[X] + b, \quad \text{since } \int_{x \in \Omega} f_X(x)\, dx = 1.
\end{aligned}$$
Theorem 1.3. Let $X$ and $Y$ be two random variables. Then $E[X + Y] = E[X] + E[Y]$.
Proof. Let $X$ and $Y$ be continuous random variables with joint density $f_{XY}$. Then
$$\begin{aligned}
E[X + Y] &= \int_{x \in \Omega_X} \int_{y \in \Omega_Y} (x + y)\, f_{XY}(x, y)\, dy\, dx \\
&= \int_{x \in \Omega_X} \int_{y \in \Omega_Y} x\, f_{XY}(x, y)\, dy\, dx + \int_{x \in \Omega_X} \int_{y \in \Omega_Y} y\, f_{XY}(x, y)\, dy\, dx \\
&= \int_{x \in \Omega_X} x \int_{y \in \Omega_Y} f_{XY}(x, y)\, dy\, dx + \int_{y \in \Omega_Y} y \int_{x \in \Omega_X} f_{XY}(x, y)\, dx\, dy \\
&= \int_{x \in \Omega_X} x f_X(x)\, dx + \int_{y \in \Omega_Y} y f_Y(y)\, dy \\
&= E[X] + E[Y].
\end{aligned}$$
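Note that Theorem 1.3 requires no independence. The following Python sketch (the chosen distributions and the sample size are arbitrary) estimates $E[X + Y]$ for strongly dependent $X$ and $Y$ and compares it with $E[X] + E[Y]$ computed from the known means:

```python
import numpy as np

rng = np.random.default_rng(0)

# X ~ Exponential(scale=2), Y = X**2 + U with U ~ Uniform(0, 1),
# so X and Y are strongly dependent; linearity of expectation still holds.
x = rng.exponential(scale=2.0, size=500_000)
y = x**2 + rng.uniform(0.0, 1.0, size=x.size)

# Theory: E[X] = 2, E[Y] = E[X^2] + 1/2 = 2*2**2 + 0.5 = 8.5, so E[X + Y] = 10.5.
print("simulated E[X + Y]:", np.mean(x + y))
print("E[X] + E[Y] (theory):", 2.0 + 8.5)
```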
Theorem 1.4. Let $X$ and $Y$ be two random variables with $X \geq Y$. Then $E[X] \geq E[Y]$.

Proof. Since $X \geq Y$,
$$X - Y \geq 0 \;\Rightarrow\; E[X - Y] \geq 0 \;\Rightarrow\; E[X] - E[Y] \geq 0 \;\Rightarrow\; E[X] \geq E[Y].$$
Theorem 1.5. For any integrable random variable $X$,
$$|E[X]| \leq E|X|.$$
Theorem 1.6. Let $X$ and $Y$ be two independent random variables. Then $E[XY] = E[X] \cdot E[Y]$.
Proof. Since $X$ and $Y$ are independent, $f_{XY}(x, y) = f_X(x) f_Y(y)$, and hence
$$\begin{aligned}
E[XY] &= \int_{x \in \Omega_X} \int_{y \in \Omega_Y} x y\, f_{XY}(x, y)\, dy\, dx \\
&= \int_{x \in \Omega_X} \int_{y \in \Omega_Y} x y\, f_X(x) f_Y(y)\, dy\, dx \\
&= \int_{x \in \Omega_X} x f_X(x)\, dx \cdot \int_{y \in \Omega_Y} y f_Y(y)\, dy \\
&= E[X] \cdot E[Y].
\end{aligned}$$
Theorem 1.7. Let $E[XY] = E[X] \cdot E[Y]$. Then it does not necessarily follow that $X$ and $Y$ are independent.
Proof. One can construct random variables $X$ and $Y$ for which
$$E[X \cdot Y] = E[X] \cdot E[Y],$$
and yet
$$P[X = 0 \mid Y = 0] = \frac{P[X = 0, Y = 0]}{P[Y = 0]} = 0 \neq \frac{1}{2} = P[X = 0].$$
This shows that $E[X \cdot Y] = E[X] \cdot E[Y]$ does not imply that $X$ and $Y$ are independent.
NOTE: From Theorem 1.6 and Theorem 1.7 we can conclude that, for random variables $X$ and $Y$, the condition
$$E[X \cdot Y] = E[X] \cdot E[Y]$$
is necessary for independence but not sufficient.
Example. Let $X$ and $Y$ have the joint probability mass function given in the following table:

X \ Y       1      -1     P[X = x]
1          1/4    1/4     1/2
-1         1/4    1/4     1/2
P[Y = y]   1/2    1/2     1
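Tables of this kind are easy to check mechanically. The Python sketch below uses a small hypothetical joint table (my own choice for illustration, in the spirit of Theorem 1.7: it takes $Y = X^2$, so $E[XY] = E[X]E[Y]$ holds while $X$ and $Y$ are dependent) and computes the relevant quantities:

```python
from itertools import product
from fractions import Fraction as F

# Illustrative joint pmf (a hypothetical table, not the one in the text):
# X in {-1, 0, 1}, Y = X**2, so E[XY] = E[X]E[Y] holds although X and Y are dependent.
joint = {(-1, 1): F(1, 4), (0, 0): F(1, 2), (1, 1): F(1, 4)}

xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})
px = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}   # marginal of X
py = {y: sum(joint.get((x, y), 0) for x in xs) for y in ys}   # marginal of Y

EX = sum(x * p for x, p in px.items())
EY = sum(y * p for y, p in py.items())
EXY = sum(x * y * p for (x, y), p in joint.items())
print("E[XY] =", EXY, " E[X]E[Y] =", EX * EY)                 # equal

independent = all(joint.get((x, y), 0) == px[x] * py[y] for x, y in product(xs, ys))
print("independent?", independent)                            # False
```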
For the indicator random variable $X = I_A$ of an event $A$,
$$E[X] = E[I_A] = \int_{x \in A} f_X(x)\, dx = P[x \in A] = P[A].$$
Example. Let the outcome of an experiment be an integer from $\{1, 2, 3, 4\}$, each equally likely, and let the experiment be performed $n$ times. Let $X_i$ denote the sum of the outcomes after the $i$th trial. Find $E[X_n]$.

Solution. Let $Y_i$ denote the outcome of the $i$th trial; then $X_n = \sum_{i=1}^{n} Y_i$. This implies
$$E[X_n] = E\Big[\sum_{i=1}^{n} Y_i\Big] = \sum_{i=1}^{n} E[Y_i]. \qquad (3)$$
Now,
$$E[Y_i] = \sum_{y=1}^{4} \frac{1}{4}\, y = \frac{5}{2}. \qquad (4)$$
Hence $E[X_n] = \frac{5n}{2}$.
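A short simulation agrees with this value (here $n = 10$ and the number of repetitions are arbitrary choices):

```python
import random

# Simulation of the example: n trials, each outcome uniform on {1, 2, 3, 4}.
# Theory gives E[X_n] = 5n/2.
n = 10
reps = 100_000
total = 0
for _ in range(reps):
    total += sum(random.randint(1, 4) for _ in range(n))
print("simulated E[X_n]:", total / reps, " theory:", 5 * n / 2)
```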
2 Conditional Expectation.
In the previous chapter we defined the conditional probability measure. Can we define expectation with respect to this new probability measure? For an event $B$, write
$$B \cap \mathcal{F} = \{B \cap A : A \in \mathcal{F}\} = \mathcal{F}_B.$$
In the continuous case,
$$E[X \mid B] = \frac{1}{P[B]} \int_{x \in B} x f_X(x)\, dx.$$
Similarly, in the continuous case,
$$E[X \mid Y \in B] = \frac{1}{P[Y \in B]} \int_{x \in \Omega,\, y \in B} x f_{X,Y}(x, y)\, dy\, dx.$$
i. $E[X \mid \mathcal{F}_B]$ is $\mathcal{F}_B$-measurable.
$$E[XY \mid \mathcal{F}_B] = E\big[E[X \mid \mathcal{F}_B]\, Y\big].$$
Example. A die is rolled twice. Let $Y$ be the outcome of the first roll and $X$ the sum of the two outcomes. Show that $E[X \mid Y]$ is a random variable with respect to $\mathcal{F}_Y$, where $\mathcal{F}_Y$ is the smallest algebra with respect to which $Y$ is a random variable. Also show that $E[E[X \mid Y]] = E[X]$.
Solution. The outcome of the first roll is $Y(\omega) = \omega$, where $\omega$ takes values in the set $\{1, 2, 3, 4, 5, 6\}$. Hence $\{1\}, \{2\}, \ldots, \{6\}$ are the atoms of the algebra with respect to which $Y$ is a random variable. Also $P[Y = i] = \frac{1}{6}$ for all $i = 1, 2, \ldots, 6$.
Then, letting $Z$ denote the outcome of the second roll (which is independent of $Y$),
$$\begin{aligned}
E[X \mid Y = i] &= \sum_{j=1}^{6} (i + j)\, P[X = i + j \mid Y = i] \\
&= \sum_{j=1}^{6} (i + j)\, \frac{P[X = i + j, Y = i]}{P[Y = i]} \\
&= \frac{1}{P[Y = i]} \sum_{j=1}^{6} (i + j)\, P[X = i + j, Y = i] \\
&= \frac{1}{P[Y = i]} \sum_{j=1}^{6} (i + j)\, P[Z = j]\, P[Y = i] \\
&= \sum_{j=1}^{6} (i + j)\, P[Z = j] = i + \frac{7}{2}.
\end{aligned}$$
This shows that the value of $E[X \mid Y]$ depends on the value of $Y$ and takes 6 different values corresponding to the 6 different values of $Y$. This implies that
$$\{\omega : E[X \mid Y] = 3.5 + i\} = \{\omega : Y = i\} \in \mathcal{F}_Y,$$
so
$$\mathcal{F}_{E[X \mid Y]} \subseteq \mathcal{F}_Y,$$
i.e., $E[X \mid Y]$ is a random variable with respect to $\mathcal{F}_Y$.
Now consider
$$\begin{aligned}
E[E[X \mid Y]] &= \sum_{j=1}^{6} E[X \mid Y = j] \cdot P[Y = j] \\
&= \sum_{j=1}^{6} \Big[ \sum_{x} x\, P[X = x \mid Y = j] \Big] \cdot P[Y = j] \\
&= \sum_{j=1}^{6} \Big[ \sum_{x} x\, \frac{P[X = x, Y = j]}{P[Y = j]} \Big] \cdot P[Y = j] \\
&= \sum_{j=1}^{6} \sum_{x} x\, P[X = x, Y = j] \\
&= \sum_{x} x \sum_{j=1}^{6} P[X = x, Y = j] \\
&= \sum_{x} x\, P[X = x] = E[X],
\end{aligned}$$
where the inner sum runs over all possible values $x$ of $X$.
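Both claims of this example can also be checked by simulation; the sketch below (the sample size is arbitrary) estimates $E[X \mid Y = i]$ for each $i$ and then the overall mean:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two independent rolls: Y is the first roll, X is the sum of both.
y = rng.integers(1, 7, size=1_000_000)
z = rng.integers(1, 7, size=1_000_000)
x = y + z

# E[X | Y = i] should be i + 3.5, and averaging over Y recovers E[X] = 7.
for i in range(1, 7):
    print(i, x[y == i].mean())          # close to i + 3.5
print("E[E[X | Y]] ~", x.mean())        # close to 7 = E[X]
```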
Example. Let $X$ and $Y$ be two random variables whose joint probability mass function is given by the following table:

Y \ X       -1      0      1     P[Y = y]
0          1/15   2/15   1/15    4/15
1          3/15   2/15   1/15    6/15
2          2/15   1/15   2/15    5/15
P[X = x]   6/15   5/15   4/15    1
Find E[X | Y = 2].
Solution. The conditional probability mass function of $X$ given $Y = 2$ is
$$P[X = x \mid Y = 2] = \frac{P[X = x, Y = 2]}{P[Y = 2]} = 3\, P[X = x, Y = 2], \quad \text{since } P[Y = 2] = \tfrac{5}{15} = \tfrac{1}{3},$$
i.e.,
$$P[X = x \mid Y = 2] = \begin{cases} 2/5, & x = -1; \\ 1/5, & x = 0; \\ 2/5, & x = 1. \end{cases}$$
Hence
$$E[X \mid Y = 2] = (-1)\cdot\tfrac{2}{5} + 0\cdot\tfrac{1}{5} + 1\cdot\tfrac{2}{5} = 0.$$
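For joint pmf tables of this kind, the conditional pmf and conditional expectation can be computed mechanically; a minimal Python sketch using the table above:

```python
from fractions import Fraction as F

# Joint pmf from the table above: keys are (y, x) pairs.
joint = {
    (0, -1): F(1, 15), (0, 0): F(2, 15), (0, 1): F(1, 15),
    (1, -1): F(3, 15), (1, 0): F(2, 15), (1, 1): F(1, 15),
    (2, -1): F(2, 15), (2, 0): F(1, 15), (2, 1): F(2, 15),
}

y0 = 2
p_y = sum(p for (y, _), p in joint.items() if y == y0)           # P[Y = 2] = 5/15
cond = {x: p / p_y for (y, x), p in joint.items() if y == y0}    # P[X = x | Y = 2]
print(cond)                                                      # {-1: 2/5, 0: 1/5, 1: 2/5}
print("E[X | Y = 2] =", sum(x * p for x, p in cond.items()))     # 0
```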
Example. Let $X$ and $Y$ be two random variables with joint probability mass function
$$P[X = x, Y = y] = \begin{cases} \dfrac{1}{2^{x+1}}, & x \geq y,\ x, y = 1, 2, 3, \ldots; \\ 0, & \text{otherwise.} \end{cases}$$
Since $P[X = x] = \sum_{y=1}^{x} \frac{1}{2^{x+1}} = \frac{x}{2^{x+1}}$, the conditional probability mass function of $Y$ given $X = x$ is
$$P[Y = y \mid X = x] = \frac{P[X = x, Y = y]}{P[X = x]} = \frac{1}{x}, \quad y = 1, 2, \ldots, x.$$
3 Moments & Moment Generating Functions.
The term "moment" in statistics has a close resemblance to the term "moment" in physics. In physics, a moment expresses the amount of some physical quantity applied to an object, and it is defined as the product of the distance of the object from a fixed reference point and some physical quantity such as force, charge, etc. The $r$th moment is denoted by $\mu_r$ and defined as
$$\mu_r = \int x^r \rho(x)\, dx,$$
where $x$ is the distance from some fixed reference point (or origin) and $\rho(x)$ is some physical quantity. In particular, the first moment in physics is used to find the center of mass of a system of point masses. In statistics we use the same formula for calculating moments; the only difference is that instead of $\rho(x)$ we use the probability density function $f_X(x)$ of the variable $X$. Hence the first moment in statistics gives the center of probability mass rather than the center of mass. Similarly, in physics the second moment (the moment of inertia) describes how the point masses are distributed with respect to a fixed axis (or origin), and is defined as the sum of all elemental point masses, each multiplied by the square of its perpendicular distance from the fixed reference point (or origin). In statistics the second moment is defined as the sum of the squared distances from the fixed reference point (or origin), each multiplied by the corresponding probability mass; it is used to measure the dispersion about that reference point. A similar correspondence exists for higher moments. The above discussion shows that statistical moments are as important as moments in physics, and we conclude that statistical moments can be used to characterize any statistical population. We are now interested in a precise mathematical definition of moments.
Note. When $A = 0$, $m_r'$ is called the $r$th moment about the origin, and when $A = \bar{X}$, $m_r$ is called the $r$th moment about the mean. $E|X - A|^r$ is known as the $r$th absolute moment about $A$.
Now the first moment about the mean is
$$m_1 = E[X - \bar{X}] = E[X] - \bar{X} = 0.$$

Example. Let $X$ be a discrete uniform random variable taking each of the values $1, 2, \ldots, N$ with probability $\frac{1}{N}$. Find the first two moments about the origin and the first two moments about the mean of $X$.
Solution. The first moment about the origin is
$$E[X] = \sum_{x=1}^{N} \frac{x}{N} = \frac{1}{N} \sum_{x=1}^{N} x = \frac{N + 1}{2}.$$
Example. Let $X$ be a binomial random variable with probability mass function
$$P[X = x] = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, \ldots, n,\ 0 \leq p \leq 1.$$
Find the first two moments about the origin and the first two moments about the mean of $X$.
Solution. The first moment about the origin is
$$\begin{aligned}
E[X] &= \sum_{x=0}^{n} x \binom{n}{x} p^x (1 - p)^{n - x} \\
&= \sum_{x=0}^{n} x\, \frac{n!}{x!(n - x)!}\, p^x (1 - p)^{n - x} \\
&= np \sum_{x=1}^{n} \frac{(n - 1)!}{(x - 1)!(n - x)!}\, p^{x - 1} (1 - p)^{n - x} \\
&= np \sum_{x - 1 = 0}^{n - 1} \frac{(n - 1)!}{(x - 1)!(n - x)!}\, p^{x - 1} (1 - p)^{n - x} \\
&= np.
\end{aligned}$$
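As a sanity check, the mean $np$ can be verified by summing the pmf directly; in the sketch below $n = 12$ and $p = 0.3$ are arbitrary choices:

```python
from math import comb

# Direct check of E[X] = n*p for a binomial(n, p) distribution.
n, p = 12, 0.3
mean = sum(x * comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1))
print(mean, n * p)   # both 3.6 (up to floating point)
```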
Example. Let $X$ be a geometric random variable with probability mass function
$$P[X = x] = \begin{cases} p(1 - p)^{x - 1}, & x = 1, 2, \ldots,\ 0 \leq p \leq 1; \\ 0, & \text{otherwise.} \end{cases}$$
Find the first two moments about the origin and the first two moments about the mean of $X$.
Solution. The first moment about the origin is
$$\begin{aligned}
E[X] &= \sum_{x=1}^{\infty} x\, p (1 - p)^{x - 1} \\
&= p \sum_{x=1}^{\infty} x (1 - p)^{x - 1} \\
&= p\, \frac{1}{(1 - (1 - p))^2} = \frac{1}{p}.
\end{aligned}$$
Example. Let $X$ be a gamma random variable with probability density function
$$f_X(x) = \frac{\lambda^n}{\Gamma(n)}\, e^{-\lambda x} x^{n - 1}, \quad x \geq 0,\ \lambda > 0,\ n > 0.$$
Find the first two moments about the origin and the first two moments about the mean of $X$.
Solution. The first moment about the origin is
$$\begin{aligned}
E[X] &= \frac{\lambda^n}{\Gamma(n)} \int_0^{\infty} x\, e^{-\lambda x} x^{n - 1}\, dx \\
&= \frac{\lambda^n}{\Gamma(n)} \int_0^{\infty} e^{-\lambda x} x^{n}\, dx \\
&= \frac{\lambda^n}{\Gamma(n)} \cdot \frac{\Gamma(n + 1)}{\lambda^{n + 1}} \\
&= \frac{n}{\lambda},
\end{aligned}$$
and the second moment about the origin is
$$\begin{aligned}
E[X^2] &= \frac{\lambda^n}{\Gamma(n)} \int_0^{\infty} x^2 e^{-\lambda x} x^{n - 1}\, dx \\
&= \frac{\lambda^n}{\Gamma(n)} \int_0^{\infty} e^{-\lambda x} x^{n + 1}\, dx \\
&= \frac{\lambda^n}{\Gamma(n)} \cdot \frac{\Gamma(n + 2)}{\lambda^{n + 2}} \\
&= \frac{n(n + 1)}{\lambda^2}.
\end{aligned}$$
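Both moments can be checked by numerical integration of the density; a minimal sketch ($n = 3$ and $\lambda = 2$ are arbitrary choices, and SciPy's quad routine is used for the integrals):

```python
from math import exp, gamma
from scipy.integrate import quad

# Numerical check of E[X] = n/lam and E[X^2] = n(n+1)/lam**2 for the gamma density
# f(x) = lam**n * x**(n-1) * exp(-lam*x) / Gamma(n).
n, lam = 3, 2.0

def pdf(x):
    return lam**n * x**(n - 1) * exp(-lam * x) / gamma(n)

m1, _ = quad(lambda x: x * pdf(x), 0, float("inf"))
m2, _ = quad(lambda x: x**2 * pdf(x), 0, float("inf"))
print(m1, n / lam)                 # ~1.5
print(m2, n * (n + 1) / lam**2)    # ~3.0
```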
Example. Let $X$ be a normal random variable with probability density function
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2\sigma^2}(x - \mu)^2}, \quad -\infty < x < \infty,\ \sigma^2 > 0,\ -\infty < \mu < \infty.$$
Find the first two moments about the origin and the first two moments about the mean of $X$.
Solution. The first moment about the origin is
$$E[X] = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} x\, e^{-\frac{1}{2\sigma^2}(x - \mu)^2}\, dx.$$
Let $y = \frac{x - \mu}{\sigma}$; then
$$\begin{aligned}
E[X] &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} (y\sigma + \mu)\, e^{-\frac{1}{2}y^2}\, dy \\
&= \frac{\sigma}{\sqrt{2\pi}} \int_{-\infty}^{\infty} y\, e^{-\frac{1}{2}y^2}\, dy + \frac{\mu}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}y^2}\, dy \\
&= 0 + \mu = \mu.
\end{aligned}$$
For the second moment about the origin, again let $y = \frac{x - \mu}{\sigma}$; then
$$\begin{aligned}
E[X^2] &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} (y\sigma + \mu)^2 e^{-\frac{1}{2}y^2}\, dy \\
&= \frac{\sigma^2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} y^2 e^{-\frac{1}{2}y^2}\, dy + \frac{\mu^2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}y^2}\, dy + \frac{2\sigma\mu}{\sqrt{2\pi}} \int_{-\infty}^{\infty} y\, e^{-\frac{1}{2}y^2}\, dy \\
&= \frac{\sigma^2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} y^2 e^{-\frac{1}{2}y^2}\, dy + \mu^2 + 0.
\end{aligned}$$
In the second integral we are integrating the standard normal density over its entire range, and the third integral is 0 since the integrand is an odd function. Now let $z = y^2/2$; then
$$\begin{aligned}
E[X^2] &= \frac{2\sigma^2}{\sqrt{\pi}} \int_0^{\infty} z^{3/2 - 1} e^{-z}\, dz + \mu^2 \\
&= \frac{2\sigma^2}{\sqrt{\pi}}\, \Gamma(3/2) + \mu^2 \\
&= \frac{\sigma^2}{\sqrt{\pi}}\, \Gamma(1/2) + \mu^2 = \sigma^2 + \mu^2.
\end{aligned}$$
Note. $\Gamma(1/2) = \sqrt{\pi}$.
$$\begin{aligned}
M_X(t) = E[e^{tX}] &= E\Big[1 + tX + \frac{(tX)^2}{2!} + \frac{(tX)^3}{3!} + \cdots\Big] \\
&= 1 + tE[X] + \frac{t^2 E[X^2]}{2!} + \frac{t^3 E[X^3]}{3!} + \cdots \\
&= 1 + t m_1' + \frac{t^2 m_2'}{2!} + \frac{t^3 m_3'}{3!} + \cdots \qquad (5)
\end{aligned}$$
where $m_r'$ is the $r$th moment about the origin. If we differentiate (5) $r$ times with respect to $t$ and substitute $t = 0$, we get the $r$th moment about the origin, i.e.,
$$m_r' = \frac{\partial^r}{\partial t^r} \big(M_X(t)\big)\Big|_{t = 0}.$$
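In practice this differentiation can be done symbolically; the sketch below applies it to the exponential-distribution MGF $M(t) = \lambda/(\lambda - t)$ derived later in this section, recovering $m_1' = 1/\lambda$ and $m_2' = 2/\lambda^2$:

```python
import sympy as sp

# Recover moments about the origin by differentiating an MGF at t = 0.
# M(t) = lam/(lam - t) is the exponential MGF derived below.
t, lam = sp.symbols("t lam", positive=True)
M = lam / (lam - t)

m1 = sp.diff(M, t, 1).subs(t, 0)   # first moment:  1/lam
m2 = sp.diff(M, t, 2).subs(t, 0)   # second moment: 2/lam**2
print(sp.simplify(m1), sp.simplify(m2))
```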
Example. Let $X$ be a Poisson random variable with probability mass function
$$P[X = x] = \begin{cases} \dfrac{e^{-\lambda} \lambda^x}{x!}, & x = 0, 1, 2, \ldots,\ \lambda > 0; \\ 0, & \text{otherwise.} \end{cases}$$
Find the moment generating function of $X$.

Solution.
$$\begin{aligned}
M_X(t) = E[e^{tX}] &= \sum_{x=0}^{\infty} e^{tx}\, \frac{e^{-\lambda} \lambda^x}{x!} \\
&= e^{-\lambda} \sum_{x=0}^{\infty} \frac{(e^t \lambda)^x}{x!} \\
&= e^{-\lambda} e^{e^t \lambda} = e^{-\lambda(1 - e^t)}.
\end{aligned}$$
Example. Let $X$ be an exponential random variable with probability density function
$$f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & x \geq 0,\ \lambda > 0; \\ 0, & \text{otherwise.} \end{cases}$$
Find the moment generating function of $X$.
Solution.
$$\begin{aligned}
M_X(t) = E[e^{tX}] &= \lambda \int_0^{\infty} e^{tx} e^{-\lambda x}\, dx \\
&= \lambda \int_0^{\infty} e^{-(\lambda - t)x}\, dx \\
&= \frac{\lambda}{\lambda - t} = \Big(1 - \frac{t}{\lambda}\Big)^{-1}, \quad t < \lambda.
\end{aligned}$$
Theorem 3.1. Let $X$ be a random variable with moment generating function $M_X(t)$. Then
$$M_{aX + b}(t) = \exp(bt)\, M_X(at),$$
where $a, b$ are constants.

Proof. Let $Y = aX + b$; then the moment generating function of $Y$ is
$$\begin{aligned}
M_Y(t) = E[\exp(Yt)] &= E[\exp((aX + b)t)] \\
&= \exp(bt)\, E[\exp(aXt)] \\
&= \exp(bt)\, M_X(at).
\end{aligned}$$
Theorem 3.2. Let $X_1$ and $X_2$ be two independent random variables with moment generating functions $M_{X_1}(t)$ and $M_{X_2}(t)$. Then
$$M_{X_1 + X_2}(t) = M_{X_1}(t)\, M_{X_2}(t).$$

Proof. Let $Y = X_1 + X_2$; then the moment generating function of $Y$ is
$$\begin{aligned}
M_Y(t) = E[\exp(Yt)] &= E[\exp((X_1 + X_2)t)] \\
&= E[\exp(X_1 t)]\, E[\exp(X_2 t)], \quad \text{by independence,} \\
&= M_{X_1}(t)\, M_{X_2}(t).
\end{aligned}$$
The previous theorem can be extended to any number of independent variates.
Theorem 3.3. Let $X_i$, $i = 1, 2, \ldots, n$, be $n$ independent random variables with moment generating functions $M_{X_i}(t)$, $i = 1, 2, \ldots, n$, and let $Y = \sum_{i=1}^{n} X_i$. Then
$$M_Y(t) = \prod_{i=1}^{n} M_{X_i}(t).$$

Proof. The statement can be proved in the same way as the previous theorem.
4 Characteristic Function.
Although the moment generating function is a very important tool for generating the moments of a given distribution, it does not exist for distributions for which the sum $\sum_x e^{tx} p(x)$ or the integral $\int_x e^{tx} f(x)\, dx$ does not converge for real-valued $t$. For example, the moment generating functions of the following mass and density functions do not exist:
$$P[X = x] = \frac{a}{x^2}, \quad x = 1, 2, 3, \ldots,$$
and
$$f_X(x) = \frac{1}{\pi}\, \frac{1}{1 + x^2}, \quad -\infty < x < \infty.$$
This problem can be solved by designing a more general generating function.
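The contrast is easy to see numerically for the second (Cauchy) density: sample averages of $e^{tX}$ blow up, while sample averages of $e^{itX}$ settle down to a finite value (for the standard Cauchy the characteristic function is $e^{-|t|}$). A minimal sketch, with $t = 0.5$ chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_cauchy(size=1_000_000)
t = 0.5

# The MGF estimate E[exp(t X)] diverges (it overflows to inf for large samples) ...
with np.errstate(over="ignore"):
    print("sample mean of exp(t X):", np.mean(np.exp(t * x)))

# ... while the characteristic-function estimate E[exp(i t X)] is well behaved;
# for the standard Cauchy it should be close to exp(-|t|) ~ 0.6065 at t = 0.5.
print("sample mean of exp(i t X):", np.mean(np.exp(1j * t * x)))
```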
Example. Let $X$ be a gamma random variable with density $f_X(x) = \frac{\lambda^n}{\Gamma(n)} e^{-\lambda x} x^{n - 1}$, $x \geq 0$. Find the characteristic function of $X$.
Solution.
$$\begin{aligned}
\phi_X(t) = E[e^{itX}] &= \frac{\lambda^n}{\Gamma(n)} \int_0^{\infty} e^{itx} e^{-\lambda x} x^{n - 1}\, dx \\
&= \frac{\lambda^n}{\Gamma(n)} \int_0^{\infty} e^{-(\lambda - it)x} x^{n - 1}\, dx \\
&= \frac{\lambda^n}{\Gamma(n)} \cdot \frac{\Gamma(n)}{(\lambda - it)^n} \\
&= \Big(1 - \frac{it}{\lambda}\Big)^{-n}.
\end{aligned}$$
1. $\phi(0) = 1$.
2. $|\phi(t)| \leq \phi(0) = 1$.
3. $\phi(t)$ is a continuous function of $t$.
4. $\phi(-t) = \overline{\phi(t)}$.
Proof.
1. $$\phi(0) = \int_{-\infty}^{\infty} f_X(x)\, dx = 1.$$
2. $$|\phi(t)| = \Big| \int_{-\infty}^{\infty} e^{itx} f(x)\, dx \Big| \leq \int_{-\infty}^{\infty} |e^{itx}|\, f(x)\, dx = \int_{-\infty}^{\infty} f_X(x)\, dx = 1,$$
since $e^{itx} = \cos(tx) + i\sin(tx)$ and hence $|e^{itx}| = \sqrt{\cos^2(tx) + \sin^2(tx)} = 1$.
3. For $h \neq 0$,
$$\begin{aligned}
|\phi(t + h) - \phi(t)| &= \Big| \int_{-\infty}^{\infty} \big(e^{i(t + h)x} - e^{itx}\big) f(x)\, dx \Big| \\
&\leq \int_{-\infty}^{\infty} |e^{i(t + h)x} - e^{itx}|\, f(x)\, dx \\
&= \int_{-\infty}^{\infty} |e^{itx}(e^{ihx} - 1)|\, f(x)\, dx \\
&= \int_{-\infty}^{\infty} |e^{ihx} - 1|\, f(x)\, dx,
\end{aligned}$$
since $|e^{itx}| = \sqrt{\cos^2(tx) + \sin^2(tx)} = 1$. Now taking $h \to 0$, we get
$$\lim_{h \to 0} |\phi(t + h) - \phi(t)| \leq \lim_{h \to 0} \int_{-\infty}^{\infty} |e^{ihx} - 1|\, f(x)\, dx = 0.$$
4. $$\phi(t) = E[e^{itX}] = E[\cos(tX) + i\sin(tX)],$$
and
$$\phi(-t) = E[\cos(tX) - i\sin(tX)] = \overline{\phi(t)}.$$
$$\frac{F_X(x + h) - F_X(x - h)}{2h} = \lim_{T \to \infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{\sin(ht)}{ht}\, e^{-itx} \phi_X(t)\, dt;$$
taking $h \to 0$, we get
$$\lim_{h \to 0} \frac{F_X(x + h) - F_X(x - h)}{2h} = \frac{1}{2\pi} \int_{-\infty}^{\infty} \lim_{h \to 0} \frac{\sin(ht)}{ht}\, e^{-itx} \phi_X(t)\, dt,$$
i.e.,
$$f_X(x) = F_X'(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} \phi_X(t)\, dt,$$
which is the inverse Fourier transform of $\phi_X(t)$.
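The inversion formula can also be applied numerically: starting from a characteristic function, a discretized version of the integral recovers the density. The sketch below uses $\phi(t) = e^{-t^2/2}$, the standard normal characteristic function, and compares the recovered values with the exact $N(0, 1)$ density (the grid limits and step are arbitrary choices):

```python
import numpy as np

# Numerical version of the inversion formula above:
# f_X(x) = (1 / 2*pi) * integral of exp(-i t x) * phi_X(t) dt.
phi = lambda t: np.exp(-t**2 / 2)   # standard normal characteristic function

t = np.linspace(-40.0, 40.0, 400_001)
dt = t[1] - t[0]
for x in (0.0, 1.0, 2.0):
    f = ((np.exp(-1j * t * x) * phi(t)).sum() * dt).real / (2 * np.pi)
    exact = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    print(x, round(f, 6), round(exact, 6))   # recovered density vs. exact N(0, 1) density
```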
Theorem 4.3. Let $X$ be a random variable with characteristic function $\phi_X(t)$. Then
$$\phi_{aX + b}(t) = \exp(ibt)\, \phi_X(at),$$
where $a, b$ are constants.
Theorem 4.4. Let $X_1$ and $X_2$ be two independent random variables with characteristic functions $\phi_{X_1}(t)$ and $\phi_{X_2}(t)$. Then
$$\phi_{X_1 + X_2}(t) = \phi_{X_1}(t)\, \phi_{X_2}(t).$$

Proof. The statement can be proved in the same way as the previous theorem.
Proof. Substituting $y = -x$,
$$\phi_X(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\, dx = \int_{-\infty}^{\infty} e^{-ity} f(-y)\, dy = \phi_X(-t),$$
where the last equality uses the symmetry $f(-y) = f(y)$. Since $\phi_X(-t) = \overline{\phi_X(t)}$, this gives
$$\phi_X(t) = \overline{\phi_X(t)},$$
i.e., $\phi_X(t)$ is real.
Exercises.
1. Let 3 dice be tossed together. Find the expected value of the sum of the face numbers on the dice.
7. Find the first two moments about the origin of the following distributions with the help of their respective moment generating functions:

(c) Beta distribution with parameters $m, n$.
10. Using the characteristic function, show that the probability density function of a standard normal variate is symmetric about the origin.
$$\phi_X(t) = e^{-|t|}.$$
12. Let $X$ and $Y$ be two independent random variables, each distributed as $\exp(\lambda)$. Using characteristic functions, show that $X + Y$ is a gamma variate with parameters $(\lambda, 2)$.
13. Let $X$ and $Y$ be two independent random variables, each distributed as $N(\mu, \sigma^2)$. Using characteristic functions, show that $X + Y$ again follows a normal distribution, with parameters $2\mu$ and $2\sigma^2$.