Week 9 Notes
9. Week 9
Remark 9.1 (Conditional Distribution for discrete random vectors). Let X = (X1 , X2 , · · · , Xp+q )
be a discrete random vector with support SX and joint p.m.f. fX . Let Y = (X1 , X2 , · · · , Xp ) and
Z = (Xp+1 , Xp+2 , · · · , Xp+q ). Then Y and Z both are discrete random vectors. Let fY and SY
denote the joint p.m.f. and support of Y , respectively. Let fZ and SZ denote the joint p.m.f. and
support of Z, respectively. For z ∈ SZ , consider the set
T_z := \{ y \in R^p : (y, z) \in S_X \}.
The conditional joint p.m.f. of Y given Z = z is then defined by
f_{Y \mid Z}(y \mid z) := \frac{f_X(y, z)}{f_Z(z)}, \quad \forall y \in T_z,
and f_{Y \mid Z}(y \mid z) := 0 otherwise. Since z ∈ SZ , we have fZ (z) > 0, so the ratio is well defined.
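For concreteness, here is a minimal computational sketch (in Python, with a made-up joint p.m.f. for a 2-dimensional discrete vector (Y, Z), not taken from the notes); it tabulates f_{Y|Z}(· | z) = f_X(·, z)/f_Z(z) over the set T_z.

```python
# Sketch: conditional p.m.f. f_{Y|Z}(y|z) = f_X(y, z) / f_Z(z) for a discrete vector (Y, Z).
# The joint p.m.f. below is a hypothetical example.
joint_pmf = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.15,
    (2, 0): 0.05, (2, 1): 0.20,
}

def marginal_Z(z):
    # f_Z(z) = sum over y of f_X(y, z)
    return sum(p for (y, zz), p in joint_pmf.items() if zz == z)

def conditional_Y_given_Z(z):
    # Returns {y: f_{Y|Z}(y | z)} for all y in T_z = {y : (y, z) in S_X}.
    fz = marginal_Z(z)
    return {y: p / fz for (y, zz), p in joint_pmf.items() if zz == z}

print(conditional_Y_given_Z(0))  # probabilities over T_0 = {0, 1, 2}, summing to 1
```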
Note 9.2. For notational convenience, we have discussed the conditional distribution of the first p component RVs with respect to the final q component RVs. However, as long as the (p + q)-dimensional joint distribution is known, we can discuss the conditional distribution of any k of the component RVs with respect to the remaining (p + q − k) component RVs.
Note 9.3. When values for some of the component RVs are given, the conditional distribution provides an updated probability distribution for the remaining component RVs.
Note 9.4. Let (X, Y ) be a 2-dimensional discrete random vector such that X and Y are independent. Then fX,Y (x, y) = fX (x)fY (y), ∀x, y ∈ R, and hence
fY |X (y | x) = fY (y), ∀x ∈ SX , y ∈ SY .
This statement can be generalized to higher dimensions with appropriate changes in the notation.
Definition 9.6 (Continuous Random Vector and its Joint Probability Density Function (Joint
p.d.f.)). A random vector X = (X1 , X2 , · · · , Xp ) is said to be a continuous random vector if there
exists an integrable function f : Rp → [0, ∞) such that
F_X(x) = P(X_1 \le x_1, X_2 \le x_2, \cdots, X_p \le x_p)
= \int_{t_1=-\infty}^{x_1} \int_{t_2=-\infty}^{x_2} \cdots \int_{t_p=-\infty}^{x_p} f(t_1, t_2, \cdots, t_p) \, dt_p \, dt_{p-1} \cdots dt_2 \, dt_1, \quad \forall x = (x_1, x_2, \cdots, x_p) \in R^p.
The function f is called the joint probability density function (joint p.d.f.) of X.
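For instance (a simple illustration, not an example from the notes), take f(x) = 1 for x in the open unit cube (0, 1)^p and f(x) = 0 otherwise. Then f is integrable and non-negative, and the corresponding joint DF is
F_X(x_1, x_2, \cdots, x_p) = \prod_{j=1}^{p} \min\{\max\{x_j, 0\}, 1\}, \quad \forall x \in R^p,
so f is a valid joint p.d.f.; the resulting random vector X is uniformly distributed over (0, 1)^p.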
Remark 9.7. Let X be a continuous random vector with joint DF FX and joint p.d.f. fX . Then
we have the following observations.
(a) FX is jointly continuous in all co-ordinates.
(b) P(X = x) = 0, ∀x ∈ Rp . More generally, if A ⊂ Rp is finite or countably infinite, then by
the finite/countable additivity of PX , we have
P(X \in A) = P_X(A) = \sum_{x \in A} P_X(\{x\}) = \sum_{x \in A} P(X = x) = 0.
(c) The joint p.d.f. integrates to 1 over R^p :
1 = \lim_{x_j \to \infty \ \forall j} F_X(x_1, x_2, \cdots, x_p)
= \lim_{x_j \to \infty \ \forall j} \int_{t_1=-\infty}^{x_1} \int_{t_2=-\infty}^{x_2} \cdots \int_{t_p=-\infty}^{x_p} f(t_1, t_2, \cdots, t_p) \, dt_p \, dt_{p-1} \cdots dt_2 \, dt_1
= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_X(t_1, t_2, \cdots, t_p) \, dt_p \, dt_{p-1} \cdots dt_2 \, dt_1.
(d) Suppose that the joint p.d.f. fX of a p-dimensional random vector X is piecewise continu-
ous. Then by the Fundamental Theorem of Calculus (specifically multivariable Calculus),
we have
f_X(x_1, x_2, \cdots, x_p) = \frac{\partial^p}{\partial x_1 \, \partial x_2 \cdots \partial x_p} F_X(x_1, x_2, \cdots, x_p),
wherever the partial derivative on the right hand side exists.
(e) Suppose X is a p-dimensional random vector such that its joint DF FX is continuous on R^p and such that the partial derivative \frac{\partial^p}{\partial x_1 \partial x_2 \cdots \partial x_p} F_X exists everywhere except possibly on a countable number of curves in R^p. Let A ⊂ R^p denote the set of all points on such curves. Then X is a continuous random vector, and the function which equals \frac{\partial^p}{\partial x_1 \partial x_2 \cdots \partial x_p} F_X outside A (and is defined to be 0 on A) is a version of the joint p.d.f. of X.
(f) The joint p.d.f. of a continuous random vector is not unique. As in the case of continuous RVs, the joint p.d.f. is determined uniquely only up to sets of ‘volume 0’. Here too, we speak of versions of the joint p.d.f.
(g) For A ⊂ Rp , we have
P(X \in A) = \int \cdots \int_{A} f_X(t_1, t_2, \cdots, t_p) \, dt_p \, dt_{p-1} \cdots dt_2 \, dt_1
= \int \cdots \int_{R^p} f_X(t_1, t_2, \cdots, t_p) \, 1_A(t_1, t_2, \cdots, t_p) \, dt_p \, dt_{p-1} \cdots dt_2 \, dt_1,
provided the integral can be defined. We do not prove this statement in this course.
(h) For any j ∈ {1, 2, · · · , p} and any xj ∈ R,
F_{X_j}(x_j) = P(X_j \le x_j) = P(X \in R \times \cdots \times R \times (-\infty, x_j] \times R \times \cdots \times R)
= \int_{t_1=-\infty}^{\infty} \cdots \int_{t_{j-1}=-\infty}^{\infty} \int_{t_j=-\infty}^{x_j} \int_{t_{j+1}=-\infty}^{\infty} \cdots \int_{t_p=-\infty}^{\infty} f_X(t_1, t_2, \cdots, t_p) \, dt_p \, dt_{p-1} \cdots dt_2 \, dt_1.
Consider gj : R → R defined by
g_j(t_j) := \int_{t_1=-\infty}^{\infty} \cdots \int_{t_{j-1}=-\infty}^{\infty} \int_{t_{j+1}=-\infty}^{\infty} \cdots \int_{t_p=-\infty}^{\infty} f_X(t_1, t_2, \cdots, t_p) \, dt_p \cdots dt_{j+1} \, dt_{j-1} \cdots dt_1.
It is immediate that gj satisfies the properties of a p.d.f. and F_{X_j}(x_j) = \int_{t_j=-\infty}^{x_j} g_j(t_j) \, dt_j.
Therefore, Xj is a continuous RV with p.d.f. gj . More generally, all marginal distributions of
X are also continuous and can be obtained by integrating out the unnecessary co-ordinates.
The function gj is usually referred to as the marginal p.d.f. of Xj .
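The integrating-out step can also be checked numerically. The following sketch (in Python, with the made-up joint p.d.f. f(x1 , x2 ) = x1 + x2 on (0, 1)^2, which integrates to 1) approximates the marginal of X1 by a Riemann sum and compares it with the exact answer x1 + 1/2.

```python
import numpy as np

# Hypothetical joint p.d.f. f(x1, x2) = x1 + x2 on (0,1)^2; it integrates to 1.
def f(x1, x2):
    return np.where((0 < x1) & (x1 < 1) & (0 < x2) & (x2 < 1), x1 + x2, 0.0)

# Marginal p.d.f. of X1 at x1, obtained by integrating out x2 (midpoint Riemann sum).
def marginal_x1(x1, n=10_000):
    t = (np.arange(n) + 0.5) / n        # midpoints of a grid on (0, 1)
    return f(x1, t).mean()              # approximates the integral of f(x1, t) over t in (0, 1)

for x1 in (0.25, 0.5, 0.75):
    print(x1, marginal_x1(x1), x1 + 0.5)  # numerical vs exact marginal x1 + 1/2
```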
Remark 9.8. Let f : R^p → [0, ∞) be an integrable function such that
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(t_1, t_2, \cdots, t_p) \, dt_p \, dt_{p-1} \cdots dt_2 \, dt_1 = 1.
Then f is the joint p.d.f. of some p-dimensional continuous random vector X. We are not going to discuss the proof of this statement in this course.
We can identify the independence of the component RVs for a continuous random vector via the
joint p.d.f.. The proof is similar to Theorem 8.48 and is skipped for brevity.
Theorem 9.9. Let X = (X1 , X2 , · · · , Xp ) be a continuous random vector with joint DF FX , joint
p.d.f. fX . Let fXj denote the marginal p.d.f. of Xj . Then X1 , X2 , · · · , Xp are independent if and
only if
f_{X_1, X_2, \cdots, X_p}(x_1, x_2, \cdots, x_p) = \prod_{j=1}^{p} f_{X_j}(x_j), \quad \forall x_1, x_2, \cdots, x_p \in R.
Example 9.10. Given p.d.f.s f1 , f2 , · · · , fp : R → [0, ∞), consider the function f : Rp → [0, ∞)
defined by
f(x) := \prod_{j=1}^{p} f_j(x_j), \quad \forall x = (x_1, x_2, \cdots, x_p) \in R^p.
Then
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(t_1, t_2, \cdots, t_p) \, dt_p \, dt_{p-1} \cdots dt_2 \, dt_1 = 1.
By Remark 9.8, f is the joint p.d.f. of some p-dimensional continuous random vector, and by Theorem 9.9 the component RVs of that random vector are independent. Using this method, we can construct many examples of continuous random vectors.
Remark 9.11. Let X = (X1 , X2 , · · · , Xp ) be a continuous random vector with joint p.d.f. fX . Then
X1 , X2 , · · · , Xp are independent if and only if
f_{X_1, X_2, \cdots, X_p}(x_1, x_2, \cdots, x_p) = \prod_{j=1}^{p} g_j(x_j), \quad \forall x_1, x_2, \cdots, x_p \in R
for some integrable functions g1 , g2 , · · · , gp : R → [0, ∞). In this case, the marginal p.d.f.s fXj have the form cj gj , where the constant cj can be determined from the relation c_j = \left( \int_{-\infty}^{\infty} g_j(x) \, dx \right)^{-1}.
Example 9.12. Let Z = (X, Y ) be a 2-dimensional continuous random vector with the joint p.d.f.
of the form
f_Z(x, y) = \begin{cases} \alpha x y, & \text{if } 0 < x < y < 1, \\ 0, & \text{otherwise,} \end{cases}
for some constant α ∈ R. For fZ to take non-negative values, we must have α > 0. Now,
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_Z(x, y) \, dx \, dy = \int_{y=0}^{1} \int_{x=0}^{y} \alpha x y \, dx \, dy = \alpha \int_{y=0}^{1} \frac{y^3}{2} \, dy = \frac{\alpha}{8}.
For fZ to be a joint p.d.f., we need \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_Z(x, y) \, dx \, dy = 1 and hence α = 8 > 0. Also note that for this value of α, fZ takes non-negative values. The marginal p.d.f. fX of X can now be computed as follows.
f_X(x) = \int_{-\infty}^{\infty} f_Z(x, y) \, dy = \begin{cases} \int_{y=x}^{1} 8xy \, dy, & \text{if } x \in (0, 1), \\ 0, & \text{otherwise} \end{cases} = \begin{cases} 4x(1 - x^2), & \text{if } x \in (0, 1), \\ 0, & \text{otherwise.} \end{cases}
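A quick numerical sanity check of these computations (a Python sketch using Riemann-sum approximations of the integrals):

```python
import numpy as np

n = 2000
x = (np.arange(n) + 0.5) / n                  # midpoint grid on (0, 1)
X, Y = np.meshgrid(x, x, indexing="ij")       # X[i, j] = x[i], Y[i, j] = x[j]

f = np.where(X < Y, 8.0 * X * Y, 0.0)         # f_Z(x, y) = 8xy on 0 < x < y < 1

print(f.sum() / n**2)                         # close to 1, so alpha = 8 normalizes f_Z
fX_numeric = f.sum(axis=1) / n                # integrate out y for each fixed x
fX_exact = 4 * x * (1 - x**2)
print(np.max(np.abs(fX_numeric - fX_exact)))  # small (discretization error only)
```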
Example 9.13. Let U = (X, Y, Z) be a 3-dimensional continuous random vector with the joint
p.d.f. of the form
f_U(x, y, z) = \begin{cases} \alpha x y z, & \text{if } x, y, z \in (0, 1), \\ 0, & \text{otherwise,} \end{cases}
for some constant α ∈ R. For fU to take non-negative values, we must have α > 0. Now,
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_U(x, y, z) \, dx \, dy \, dz = \int_{z=0}^{1} \int_{y=0}^{1} \int_{x=0}^{1} \alpha x y z \, dx \, dy \, dz = \frac{\alpha}{8}.
For fU to be a joint p.d.f., we need \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_U(x, y, z) \, dx \, dy \, dz = 1 and hence α = 8 > 0. Also note that for this value of α, fU takes non-negative values. The marginal p.d.f. fX of X can now be computed as follows.
f_X(x) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_U(x, y, z) \, dy \, dz = \begin{cases} \int_{z=0}^{1} \int_{y=0}^{1} 8xyz \, dy \, dz, & \text{if } x \in (0, 1), \\ 0, & \text{otherwise} \end{cases} = \begin{cases} 2x, & \text{if } x \in (0, 1), \\ 0, & \text{otherwise.} \end{cases}
By the symmetry of fU (x, y, z) in the variables x, y and z, we conclude that X \overset{d}{=} Y \overset{d}{=} Z. Observe that fX,Y,Z (x, y, z) = fX (x)fY (y)fZ (z), ∀x, y, z ∈ R, and hence the RVs X, Y, Z are independent.
Note 9.14. There are random vectors which are neither discrete nor continuous. We do not
discuss such examples in this course.
Remark 9.15 (Conditional Distribution for continuous random vectors). We now discuss an ana-
logue of conditional distributions as discussed in Remark 9.1 for discrete random vectors. To avoid
notational complexity, we work in dimension 2. Let (X, Y ) be a 2-dimensional continuous random
vector with joint DF FX,Y and joint p.d.f. fX,Y . Let fX and fY denote the marginal p.d.fs of X
and Y respectively. Since P(X = x) = 0, ∀x ∈ R, expressions of the form P(Y ∈ A | X = x)
are not defined for A ⊂ R. We consider x ∈ R such that fX (x) > 0 and look at the following
computation. For y ∈ R,
\lim_{h \downarrow 0} P(Y \le y \mid x - h < X \le x) = \lim_{h \downarrow 0} \frac{P(Y \le y, \, x - h < X \le x)}{P(x - h < X \le x)}
= \lim_{h \downarrow 0} \frac{\int_{x-h}^{x} \int_{-\infty}^{y} f_{X,Y}(t, s) \, ds \, dt}{\int_{x-h}^{x} f_X(t) \, dt}
= \lim_{h \downarrow 0} \frac{\frac{1}{h} \int_{x-h}^{x} \int_{-\infty}^{y} f_{X,Y}(t, s) \, ds \, dt}{\frac{1}{h} \int_{x-h}^{x} f_X(t) \, dt}
= \frac{\int_{-\infty}^{y} f_{X,Y}(x, s) \, ds}{f_X(x)}
= \int_{-\infty}^{y} \frac{f_{X,Y}(x, s)}{f_X(x)} \, ds.
Here, we have assumed continuity of the p.d.f.s. Motivated by the above computation, we define the conditional DF of Y given X = x (provided fX (x) > 0) by
F_{Y \mid X}(y \mid x) := \int_{-\infty}^{y} \frac{f_{X,Y}(x, s)}{f_X(x)} \, ds, \quad \forall y \in R,
and the corresponding conditional p.d.f. of Y given X = x by
f_{Y \mid X}(y \mid x) := \frac{f_{X,Y}(x, y)}{f_X(x)}, \quad \forall y \in R.
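For instance, with the joint p.d.f. fZ (x, y) = 8xy for 0 < x < y < 1 from Example 9.12 (whose marginal is fX (x) = 4x(1 − x^2) for x ∈ (0, 1)), we get, for fixed x ∈ (0, 1),
f_{Y \mid X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{8xy}{4x(1 - x^2)} = \frac{2y}{1 - x^2}, \quad \text{for } x < y < 1,
and 0 otherwise. Indeed \int_{x}^{1} \frac{2y}{1 - x^2} \, dy = 1, so for each such x this is a genuine p.d.f. in y.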
Note 9.16. For notational convenience, we have discussed the conditional distribution of the first p component RVs with respect to the final q component RVs. However, as long as the (p + q)-dimensional joint distribution is known, we can discuss the conditional distribution of any k of the component RVs with respect to the remaining (p + q − k) component RVs.
Note 9.17. Let (X, Y ) be a 2-dimensional continuous random vector such that X and Y are independent. Then fX,Y (x, y) = fX (x)fY (y), ∀x, y ∈ R, and hence
fY |X (y | x) = fY (y), ∀y ∈ R,
provided fX (x) > 0. This statement can be generalized to higher dimensions with appropriate changes in the notation.
Earlier, we discussed the distribution of functions of RVs. We now generalize the same concept to random vectors.
Once the joint DF FY is known, the joint p.m.f./p.d.f. of Y can then be deduced by standard
techniques.
Example 9.20. Let X1 ∼ Uniform(0, 1) and X2 ∼ Uniform(0, 1) be independent RVs. Suppose we are interested in the distribution of Y = X1 + X2 . By independence of X1 and X2 , the joint p.d.f. of (X1 , X2 ) is given by
f_{X_1, X_2}(x_1, x_2) = \begin{cases} 1, & \text{if } x_1, x_2 \in (0, 1), \\ 0, & \text{otherwise.} \end{cases}
Writing Y = h(X1 , X2 ) with h(x1 , x2 ) := x1 + x2 , we have, for y ∈ R,
F_Y(y) = P(Y \le y) = P(h(X_1, X_2) \le y)
= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} 1_{(-\infty, y]}(h(x_1, x_2)) \, f_{X_1, X_2}(x_1, x_2) \, dx_1 \, dx_2
= \int_{0}^{1} \int_{0}^{1} 1_{(-\infty, y]}(x_1 + x_2) \, dx_1 \, dx_2
= \begin{cases} 0, & \text{if } y < 0, \\ \int_{x_1=0}^{y} \int_{x_2=0}^{y - x_1} dx_2 \, dx_1, & \text{if } 0 \le y < 1, \\ 1 - \frac{1}{2}(2 - y)(2 - y), & \text{if } 1 \le y < 2, \\ 1, & \text{if } y \ge 2 \end{cases}
= \begin{cases} 0, & \text{if } y < 0, \\ \frac{y^2}{2}, & \text{if } 0 \le y < 1, \\ \frac{4y - y^2 - 2}{2}, & \text{if } 1 \le y < 2, \\ 1, & \text{if } y \ge 2. \end{cases}
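The closed form for FY can be checked against a simple Monte Carlo simulation; a Python sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(0.0, 1.0, size=1_000_000)     # X1 ~ Uniform(0, 1)
x2 = rng.uniform(0.0, 1.0, size=1_000_000)     # X2 ~ Uniform(0, 1), independent of X1
y = x1 + x2                                    # Y = X1 + X2

def F_Y(t):
    # Closed form derived above
    if t < 0:
        return 0.0
    if t < 1:
        return t**2 / 2
    if t < 2:
        return (4*t - t**2 - 2) / 2
    return 1.0

for t in (0.5, 1.0, 1.5):
    print(t, (y <= t).mean(), F_Y(t))          # empirical CDF vs exact CDF
```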
As done in the case of RVs, in the setting of Remark 9.19, we consider the computation of the
joint p.m.f./p.d.f. of Y directly, instead of computing the joint DF FY first. The next result is a
direct generalization of Theorem 5.21 and we skip the proof for brevity.
Theorem 9.21 (Change of Variables for Discrete random vectors). Let X = (X1 , . . . , Xp ) be a
p-dimensional discrete random vector with joint p.m.f. fX and support SX . Let h = (h1 , · · · , hq ) :
Rp → Rq be a function and let Y = (Y1 , · · · , Yq ) = h(X) = (h1 (X), · · · , hq (X)). Then Y is a
discrete random vector with support
SY = h(SX ) = {h(x) : x ∈ SX },
joint p.m.f.
f_Y(y) = \begin{cases} \sum_{x \in S_X : \, h(x) = y} f_X(x), & \text{if } y \in S_Y, \\ 0, & \text{otherwise,} \end{cases}
and joint DF
F_Y(y) = \sum_{x \in S_X : \, h(x) \le y} f_X(x), \quad \forall y \in R^q.
Example 9.22. Fix p ∈ (0, 1) and let n1 , · · · , nq be positive integers. Let X1 , · · · , Xq be independent RVs with Xi ∼ Binomial(ni , p), i = 1, · · · , q. Here, the p.m.f.s are given by
f_{X_i}(x_i) = \begin{cases} \binom{n_i}{x_i} p^{x_i} (1 - p)^{n_i - x_i}, & \text{if } x_i \in \{0, 1, \cdots, n_i\}, \\ 0, & \text{otherwise.} \end{cases}
Consider Y := X1 + X2 + · · · + Xq and write n := n1 + n2 + · · · + nq . By Theorem 9.21, for y ∈ {0, 1, · · · , n},
f_Y(y) = P(X_1 + \cdots + X_q = y) = \sum_{\substack{(x_1, \cdots, x_q) \in \prod_{i=1}^{q} \{0, 1, \cdots, n_i\} \\ x_1 + \cdots + x_q = y}} f_X(x_1, \cdots, x_q)
= \sum_{\substack{(x_1, \cdots, x_q) \in \prod_{i=1}^{q} \{0, 1, \cdots, n_i\} \\ x_1 + \cdots + x_q = y}} \left( \prod_{i=1}^{q} \binom{n_i}{x_i} \right) p^{y} (1 - p)^{n - y}
= \binom{n}{y} p^{y} (1 - p)^{n - y},
where the last equality follows from the Vandermonde identity \sum \prod_{i=1}^{q} \binom{n_i}{x_i} = \binom{n}{y}, the sum being over the same index set.
Therefore, Y = X1 + · · · + Xq ∼ Binomial(n, p) with n = n1 + · · · + nq .
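The identity behind the last step can also be checked numerically by convolving two binomial p.m.f.s (a Python sketch; the parameter values are arbitrary, and the q-fold case follows by repeating the convolution):

```python
from math import comb

def binom_pmf(n, p, k):
    # P(Binomial(n, p) = k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

n1, n2, p = 3, 4, 0.3
n = n1 + n2

for y in range(n + 1):
    # P(X1 + X2 = y): sum over all ways to split y between the two independent RVs
    conv = sum(binom_pmf(n1, p, x) * binom_pmf(n2, p, y - x)
               for x in range(max(0, y - n2), min(n1, y) + 1))
    print(y, round(conv, 10), round(binom_pmf(n, p, y), 10))  # the two columns agree
```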
Remark 9.23. We had earlier mentioned that Bernoulli(p) distribution is the same as Binomial(1, p)
distribution. Using the above computation, we can identify a Binomial(n, p) RV as a sum of n
independent RVs each having distribution Bernoulli(p). We shall come back to this observation
in later lectures.
For continuous random vectors, we have the following generalization of Theorem 6.1. Proof of
this result is being skipped.
Theorem 9.24. Let X = (X1 , . . . , Xp ) be a p-dimensional continuous random vector with joint
p.d.f. fX . Suppose that {x ∈ Rp : fX (x) > 0} can be written as a disjoint union ∪ki=1 Si of open
sets in Rp .
Let hj : R^p → R, j = 1, · · · , p, be functions such that h = (h1 , · · · , hp ) : S_i → R^p is one-to-one with inverse h_i^{-1} = ((h_{1i})^{-1}, \cdots, (h_{pi})^{-1}) for each i = 1, · · · , k. Moreover, assume that (h_{ji})^{-1}, i = 1, 2, · · · , k; j = 1, · · · , p, have continuous partial derivatives and that the Jacobian determinant of the transformation satisfies
J_i := \begin{vmatrix} \frac{\partial (h_{1i})^{-1}}{\partial y_1}(y) & \cdots & \frac{\partial (h_{1i})^{-1}}{\partial y_p}(y) \\ \vdots & \ddots & \vdots \\ \frac{\partial (h_{pi})^{-1}}{\partial y_1}(y) & \cdots & \frac{\partial (h_{pi})^{-1}}{\partial y_p}(y) \end{vmatrix} \neq 0, \quad \forall i = 1, \cdots, k.
Then the p-dimensional random vector Y = (Y1 , · · · , Yp ) = h(X) = (h1 (X), · · · , hp (X)) is a continuous random vector with joint p.d.f.
f_Y(y) = \sum_{i=1}^{k} f_X\left( (h_{1i})^{-1}(y), \cdots, (h_{pi})^{-1}(y) \right) |J_i| \, 1_{h(S_i)}(y).
Example 9.25. Let λ > 0 and let X1 and X2 be independent RVs, each with the Exponential(λ) distribution, i.e. with p.d.f. x \mapsto \frac{1}{\lambda} e^{-x/\lambda} \, 1_{(0, \infty)}(x). Consider Y = (Y1 , Y2 ) := h(X1 , X2 ), where h(x1 , x2 ) := \left( x_1 + x_2, \ \frac{x_1}{x_1 + x_2} \right). By independence, (X1 , X2 ) has joint p.d.f.
f_{X_1, X_2}(x_1, x_2) = f_{X_1}(x_1) f_{X_2}(x_2) = \begin{cases} \frac{1}{\lambda^2} \exp\left( -\frac{x_1 + x_2}{\lambda} \right), & \text{if } x_1 > 0, x_2 > 0, \\ 0, & \text{otherwise.} \end{cases}
Here, {(x1 , x2 ) ∈ R2 : fX1 ,X2 (x1 , x2 ) > 0} = {(x1 , x2 ) ∈ R2 : x1 > 0, x2 > 0} and h : {(x1 , x2 ) ∈
R2 : x1 > 0, x2 > 0} → R2 is one-to-one with range (0, ∞) × (0, 1). The inverse function is
h−1 (y1 , y2 ) = (y1 y2 , y1 (1 − y2 )) for (y1 , y2 ) ∈ (0, ∞) × (0, 1) with Jacobian determinant given by
J(y_1, y_2) = \begin{vmatrix} y_2 & y_1 \\ 1 - y_2 & -y_1 \end{vmatrix} = -y_1.
By Theorem 9.24, the joint p.d.f. of Y = (Y1 , Y2 ) is
f_{Y_1, Y_2}(y_1, y_2) = f_{X_1, X_2}(y_1 y_2, \, y_1(1 - y_2)) \, |J(y_1, y_2)| = \begin{cases} \frac{y_1}{\lambda^2} \exp\left( -\frac{y_1}{\lambda} \right), & \text{if } y_1 > 0, \ 0 < y_2 < 1, \\ 0, & \text{otherwise.} \end{cases}
Now, we compute the marginal distributions. The marginal p.d.f. fY1 is given by
f_{Y_1}(y_1) = \int_{-\infty}^{\infty} f_{Y_1, Y_2}(y_1, y_2) \, dy_2 = \begin{cases} \frac{y_1}{\lambda^2} \exp\left( -\frac{y_1}{\lambda} \right), & \text{if } y_1 > 0, \\ 0, & \text{otherwise,} \end{cases}
and similarly the marginal p.d.f. fY2 is given by
f_{Y_2}(y_2) = \int_{-\infty}^{\infty} f_{Y_1, Y_2}(y_1, y_2) \, dy_1 = \begin{cases} 1, & \text{if } 0 < y_2 < 1, \\ 0, & \text{otherwise.} \end{cases}
Therefore Y_1 = X_1 + X_2 \sim Gamma(2, \lambda) and Y_2 = \frac{X_1}{X_1 + X_2} \sim Uniform(0, 1). Moreover, f_{Y_1, Y_2}(y_1, y_2) = f_{Y_1}(y_1) f_{Y_2}(y_2), ∀(y_1, y_2) ∈ R^2, and hence Y1 and Y2 are independent.
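These conclusions are easy to check by simulation. A Python sketch (λ = 2 is an arbitrary choice; Exponential(λ) is taken with p.d.f. (1/λ)e^{−x/λ}, as above):

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 2.0                                  # arbitrary choice of the scale parameter
n = 1_000_000
x1 = rng.exponential(scale=lam, size=n)    # X1 ~ Exponential(lam), mean lam
x2 = rng.exponential(scale=lam, size=n)    # X2 ~ Exponential(lam), independent of X1

y1 = x1 + x2                               # should be Gamma(2, lam): mean 2*lam, variance 2*lam^2
y2 = x1 / (x1 + x2)                        # should be Uniform(0, 1): mean 1/2, variance 1/12

print(y1.mean(), y1.var())                 # approximately 4.0 and 8.0
print(y2.mean(), y2.var())                 # approximately 0.5 and 0.0833
print(np.corrcoef(y1, y2)[0, 1])           # approximately 0, consistent with independence
```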
Remark 9.26. We had earlier mentioned that Exponential(λ) distribution is the same as Gamma(1, λ)
distribution. Using the above computation, we can identify a Gamma(2, λ) RV as a sum of two
independent RVs each having distribution Gamma(1, λ). A more general property in this regard
is mentioned in practice problem set 8.
We now consider expectations for random vectors and for functions of random vectors. The concepts are the same as those discussed in the case of RVs.
Definition 9.27 (Expectation of a function of a random vector). Let X = (X1 , X2 , · · · , Xp ) be a p-dimensional discrete/continuous random vector and let h : R^p → R be a function. If X is discrete with joint p.m.f. fX and support SX , we set
E h(X) := \sum_{x \in S_X} h(x) f_X(x),
and if X is continuous with joint p.d.f. fX , we set
E h(X) := \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} h(x) f_X(x) \, dx.
Remark 9.28. If the sum or the integral above converges absolutely, we say that the expectation
Eh(X) exists or equivalently, Eh(X) is finite. Otherwise, we shall say that the expectation Eh(X)
does not exist.
The following result is a generalization of Proposition 6.19. We skip the proof for brevity.
Proposition 9.29. (a) Let X = (X1 , X2 , · · · , Xp ) be a discrete random vector with joint p.m.f.
fX and support SX and let h : Rp → R be a function. Consider the discrete RV Y := h(X)
with p.m.f. fY and support SY . Then EY exists if and only if \sum_{y \in S_Y} |y| \, f_Y(y) < \infty, and in this case,
EY = E h(X) = \sum_{x \in S_X} h(x) f_X(x) = \sum_{y \in S_Y} y \, f_Y(y).
(b) Let X = (X1 , X2 , · · · , Xp ) be a continuous random vector with joint p.d.f. fX . Let h :
Rp → R be a function such that the RV Y := h(X) is continuous with p.d.f. fY . Then EY
exists if and only if \int_{-\infty}^{\infty} |y| \, f_Y(y) \, dy < \infty, and in this case,
EY = E h(X) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} h(x) f_X(x) \, dx = \int_{-\infty}^{\infty} y \, f_Y(y) \, dy.
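The two expressions in part (a) can be compared directly on a small example; a Python sketch with a made-up joint p.m.f.:

```python
from collections import defaultdict

# Hypothetical joint p.m.f. of X = (X1, X2)
joint_pmf = {(0, 1): 0.2, (1, 1): 0.3, (1, 2): 0.1, (2, 2): 0.4}
h = lambda x1, x2: x1 * x2                   # Y = h(X) = X1 * X2

# E h(X) computed from the joint p.m.f.: sum of h(x) f_X(x) over the support
lhs = sum(h(x1, x2) * p for (x1, x2), p in joint_pmf.items())

# p.m.f. of Y via the change-of-variables result, then E Y = sum of y f_Y(y)
pmf_Y = defaultdict(float)
for (x1, x2), p in joint_pmf.items():
    pmf_Y[h(x1, x2)] += p
rhs = sum(y * p for y, p in pmf_Y.items())

print(lhs, rhs)                              # both equal 2.1
```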
Note 9.30. As considered for the case of RVs, by choosing different functions h : Rp → R, we
obtain several quantities of interest of the form Eh(X) for a p-dimensional random vector X.
Definition 9.31 (Some special expectations for Random Vectors). Let X = (X1 , X2 , · · · , Xp ) be
a p-dimensional discrete/continuous random vector.
(a) (Joint Moments) For non-negative integers k1 , . . . , kp , let h(x) := x_1^{k_1} \cdots x_p^{k_p}, ∀x ∈ R^p . Then
\mu'_{k_1, \ldots, k_p} := E\left( X_1^{k_1} \cdots X_p^{k_p} \right)
is called a joint moment of X.
(b) (Joint Central Moments) For non-negative integers k1 , . . . , kp , let h(x) := (x_1 - E(X_1))^{k_1} \cdots (x_p - E(X_p))^{k_p}, ∀x ∈ R^p . Then
\mu_{k_1, \ldots, k_p} := E\left( (X_1 - E(X_1))^{k_1} \cdots (X_p - E(X_p))^{k_p} \right)
is called a joint central moment of X.
Remark 9.32. We now list some properties of the above quantities. The properties are being
stated under the assumption that the expectations involved exist. Let X = (X1 , X2 , · · · , Xp ) be a
p-dimensional discrete/continuous random vector.
(a) Let a1 , . . . , ap be real constants. Then E\left( \sum_{i=1}^{p} a_i X_i \right) = \sum_{i=1}^{p} a_i \, E X_i. To see this for discrete X, observe that
E\left( \sum_{i=1}^{p} a_i X_i \right) = \sum_{x \in S_X} \left( \sum_{i=1}^{p} a_i x_i \right) f_X(x) = \sum_{i=1}^{p} a_i \sum_{x \in S_X} x_i \, f_X(x) = \sum_{i=1}^{p} a_i \, E X_i.
The interchange of the order of summation is allowed due to absolute convergence of the
series involved. The proof for continuous X is similar.
(b) Cov(Xi , Xj ) = Cov(Xj , Xi ), for all i, j = 1, . . . , p.
(c) Cov(Xi , Xi ) = V ar(Xi ), for all i = 1, . . . , p.
(d) For all i, j = 1, . . . , p, we have
In particular,
Var\left( \sum_{i=1}^{p} a_i X_i \right) = \sum_{i=1}^{p} a_i^2 \, Var(X_i) + \sum_{i=1}^{p} \sum_{\substack{j=1 \\ j \neq i}}^{p} a_i a_j \, Cov(X_i, X_j) = \sum_{i=1}^{p} a_i^2 \, Var(X_i) + 2 \sum_{1 \le i < j \le p} a_i a_j \, Cov(X_i, X_j).
A small numerical sketch checking this identity is given at the end of this remark.
(f) Suppose that X1 , X2 , · · · , Xp are independent and h1 , h2 , · · · , hp : R → R are functions such that Ehi (Xi ) exists for each i. Then E\left( \prod_{i=1}^{p} h_i(X_i) \right) = \prod_{i=1}^{p} E h_i(X_i). For simplicity, we discuss the proof when p = 2 and X = (X1 , X2 ) is continuous with joint p.d.f. fX . Recall from Theorem 9.9 that fX (x1 , x2 ) = fX1 (x1 )fX2 (x2 ), ∀x1 , x2 ∈ R. Then,
E\left( \prod_{i=1}^{2} h_i(X_i) \right) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h_1(x_1) h_2(x_2) f_X(x_1, x_2) \, dx_1 \, dx_2
= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h_1(x_1) h_2(x_2) f_{X_1}(x_1) f_{X_2}(x_2) \, dx_1 \, dx_2
= \left( \int_{-\infty}^{\infty} h_1(x_1) f_{X_1}(x_1) \, dx_1 \right) \left( \int_{-\infty}^{\infty} h_2(x_2) f_{X_2}(x_2) \, dx_2 \right)
= \prod_{i=1}^{2} E h_i(X_i).
(g) This is a special case of statement (f). Let A1 , A2 , · · · , Ap ⊆ R. Consider the functions
h_i(x_i) := \begin{cases} 1, & \text{if } x_i \in A_i, \\ 0, & \text{otherwise} \end{cases} = 1_{A_i}(x_i), \quad \forall x_i \in R, \ i = 1, 2, \cdots, p.
Note that E h_i(X_i) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} 1_{A_i}(x_i) f_{X_1}(x_1) f_{X_2}(x_2) \cdots f_{X_p}(x_p) \, dx_1 \, dx_2 \cdots dx_p = \int_{-\infty}^{\infty} 1_{A_i}(x_i) f_{X_i}(x_i) \, dx_i = P(X_i \in A_i), when X is continuous. The same equality also holds when X is discrete. By statement (f), we then have
P(X_1 \in A_1, X_2 \in A_2, \cdots, X_p \in A_p) = E\left( \prod_{i=1}^{p} 1_{A_i}(X_i) \right) = \prod_{i=1}^{p} P(X_i \in A_i).
(h) Continue with the assumptions of statement (f). For fixed y1 , y2 , · · · , yp ∈ R, consider the
functions g1 , g2 , · · · , gp : R → R defined by
g_i(x_i) := \begin{cases} 1, & \text{if } h_i(x_i) \le y_i, \\ 0, & \text{otherwise,} \end{cases} \quad \forall x_i \in R, \ i = 1, 2, \cdots, p.
Note that E g_i(X_i) = P(h_i(X_i) \le y_i) = F_{h_i(X_i)}(y_i), ∀i, and
F_{h_1(X_1), h_2(X_2), \cdots, h_p(X_p)}(y_1, y_2, \cdots, y_p) = P(h_1(X_1) \le y_1, h_2(X_2) \le y_2, \cdots, h_p(X_p) \le y_p)
= \prod_{i=1}^{p} P(h_i(X_i) \le y_i)
= \prod_{i=1}^{p} F_{h_i(X_i)}(y_i).
In other words, the RVs h1 (X1 ), h2 (X2 ), · · · , hp (Xp ) are also independent.
Cov(X1 , X2 ) = 0.
with
A := \left\{ t = (t_1, t_2, \ldots, t_p) \in R^p : E\left( e^{\sum_{i=1}^{p} t_i X_i} \right) < \infty \right\}.
Cov (Xi , Xj )
(o) As discussed for the case of random variables, the joint Characteristic function for random vectors has nice properties similar to those of the joint MGF. If some joint moment of the random vector exists, then it can be recovered from the partial derivatives of the joint Characteristic function. We do not discuss these properties in detail in this course.
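As mentioned in statement (d), here is a small numerical sketch checking the variance identity Var(\sum_i a_i X_i) = \sum_i a_i^2 Var(X_i) + 2 \sum_{i<j} a_i a_j Cov(X_i, X_j); the data-generating mechanism below is an arbitrary choice, used only to produce dependent coordinates.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
a = np.array([1.0, -2.0, 0.5])                # coefficients a_1, a_2, a_3

# Construct dependent coordinates: X2 and X3 share randomness with X1.
z = rng.normal(size=(n, 3))
x = np.column_stack([z[:, 0],
                     0.6 * z[:, 0] + 0.8 * z[:, 1],
                     0.3 * z[:, 0] + z[:, 2]])

lhs = np.var(x @ a, ddof=1)                   # sample variance of a_1 X1 + a_2 X2 + a_3 X3
C = np.cov(x, rowvar=False)                   # sample covariance matrix (3 x 3)
rhs = a @ C @ a                               # = sum_i a_i^2 Var(X_i) + sum_{i != j} a_i a_j Cov(X_i, X_j)
print(lhs, rhs)                               # identical up to floating-point error
```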