
9. Week 9

Remark 9.1 (Conditional Distribution for discrete random vectors). Let X = (X1 , X2 , · · · , Xp+q )
be a discrete random vector with support SX and joint p.m.f. fX . Let Y = (X1 , X2 , · · · , Xp ) and
Z = (Xp+1 , Xp+2 , · · · , Xp+q ). Then Y and Z both are discrete random vectors. Let fY and SY
denote the joint p.m.f. and support of Y , respectively. Let fZ and SZ denote the joint p.m.f. and
support of Z, respectively. For z ∈ SZ , consider the set

Tz := {y ∈ Rp : (y, z) ∈ SX }.

The conditional p.m.f. of Y given Z = z ∈ SZ is defined by



$$
f_{Y\mid Z}(y \mid z) := P(Y = y \mid Z = z) = \frac{P(Y = y, Z = z)}{P(Z = z)} =
\begin{cases}
\dfrac{f_X(y, z)}{f_Z(z)}, & \text{if } y \in T_z,\\
0, & \text{otherwise.}
\end{cases}
$$

By definition, fY|Z(y | z) ≥ 0, ∀y ∈ Rp and
$$
\sum_{y \in \mathbb{R}^p} f_{Y\mid Z}(y \mid z) = \sum_{y \in T_z} f_{Y\mid Z}(y \mid z) = 1.
$$
Therefore, for every z ∈ SZ, the function y ∈ Rp ↦ fY|Z(y | z) is a joint p.m.f. with support Tz. We refer to the probability law/distribution described by this p.m.f. as the conditional distribution of Y given Z = z ∈ SZ. The conditional DF of Y given Z = z ∈ SZ is given by
$$
F_{Y\mid Z}(y \mid z) := P(Y \le y \mid Z = z) = \frac{P(Y \le y, Z = z)}{P(Z = z)} = \sum_{\substack{t \le y\\ t \in T_z}} \frac{f_X(t, z)}{f_Z(z)} = \sum_{\substack{t \le y\\ t \in T_z}} f_{Y\mid Z}(t \mid z),
$$

where t ≤ y refers to component-wise inequalities tj ≤ yj for all j = 1, 2, · · · , p.

Note 9.2. For notational convenience, we have discussed the conditional distribution of the first p
component RVs with respect to the final q component RVs. However, as long as the (p + q)-
dimensional joint distribution is known, we can discuss the conditional distribution of any k of the
component RVs with respect to the other (p + q − k) component RVs.

Note 9.3. When values for some of the component RVs are given, the conditional distribution
provides an updated probability distribution for the rest of the component RVs.

Note 9.4. Let (X, Y) be a 2-dimensional discrete random vector such that X and Y are independent, so that fX,Y(x, y) = fX(x)fY(y), ∀x, y ∈ R. Then
$$
f_{Y\mid X}(y \mid x) = f_Y(y), \quad \forall x \in S_X,\ y \in S_Y.
$$

This statement can be generalized to higher dimensions with appropriate changes in the notation.

Example 9.5. In Example 8.51, we have, for fixed x ∈ {1, 2, 3, 4},


 
$$
f_{Y\mid X}(y \mid x) = \begin{cases} \dfrac{f_{X,Y}(x, y)}{f_X(x)}, & \text{if } y \in \{1, 2, 3, 4\},\\ 0, & \text{otherwise}\end{cases}
\;=\;
\begin{cases} \dfrac{x + y}{2(2x + 5)}, & \text{if } y \in \{1, 2, 3, 4\},\\ 0, & \text{otherwise.}\end{cases}
$$
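The following is a small computational sketch of this conditional p.m.f. It assumes (Example 8.51 is not reproduced here) that the joint p.m.f. is fX,Y(x, y) = (x + y)/80 on {1, 2, 3, 4}², which is consistent with the marginal fX(x) = (2x + 5)/40 implicit in the formula above.

```python
from fractions import Fraction

# Assumed joint p.m.f. from Example 8.51: f_{X,Y}(x, y) = (x + y)/80 on {1,2,3,4}^2.
support = [(x, y) for x in range(1, 5) for y in range(1, 5)]
f_XY = {(x, y): Fraction(x + y, 80) for (x, y) in support}

# Marginal p.m.f. of X: f_X(x) = sum over y of f_{X,Y}(x, y).
f_X = {x: sum(f_XY[(x, y)] for y in range(1, 5)) for x in range(1, 5)}

# Conditional p.m.f. of Y given X = x: f_{Y|X}(y | x) = f_{X,Y}(x, y) / f_X(x).
def f_Y_given_X(y, x):
    return f_XY.get((x, y), Fraction(0)) / f_X[x]

for x in range(1, 5):
    row = [f_Y_given_X(y, x) for y in range(1, 5)]
    assert sum(row) == 1                      # each conditional p.m.f. sums to 1
    assert all(r == Fraction(x + y, 2 * (2 * x + 5)) for y, r in zip(range(1, 5), row))
    print(x, row)
```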

Definition 9.6 (Continuous Random Vector and its Joint Probability Density Function (Joint
p.d.f.)). A random vector X = (X1 , X2 , · · · , Xp ) is said to be a continuous random vector if there
exists an integrable function f : Rp → [0, ∞) such that

$$
F_X(x) = P(X_1 \le x_1, X_2 \le x_2, \cdots, X_p \le x_p) = \int_{t_1=-\infty}^{x_1}\int_{t_2=-\infty}^{x_2}\cdots\int_{t_p=-\infty}^{x_p} f(t_1, t_2, \cdots, t_p)\, dt_p\, dt_{p-1} \cdots dt_2\, dt_1, \quad \forall x = (x_1, x_2, \cdots, x_p) \in \mathbb{R}^p.
$$

The function f is called the joint probability density function (joint p.d.f.) of X.

Remark 9.7. Let X be a continuous random vector with joint DF FX and joint p.d.f. fX . Then
we have the following observations.
(a) FX is jointly continuous in all co-ordinates.
(b) P(X = x) = 0, ∀x ∈ Rp . More generally, if A ⊂ Rp is finite or countably infinite, then by
the finite/countable additivity of PX , we have
$$
P(X \in A) = P_X(A) = \sum_{x \in A} P_X(\{x\}) = \sum_{x \in A} P(X = x) = 0.
$$

(c) By definition, we have fX (x) ≥ 0, ∀x ∈ Rp and

$$
1 = \lim_{x_j \to \infty\ \forall j} F_X(x_1, x_2, \cdots, x_p) = \lim_{x_j \to \infty\ \forall j} \int_{t_1=-\infty}^{x_1}\int_{t_2=-\infty}^{x_2}\cdots\int_{t_p=-\infty}^{x_p} f_X(t_1, t_2, \cdots, t_p)\, dt_p\, dt_{p-1} \cdots dt_2\, dt_1 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f_X(t_1, t_2, \cdots, t_p)\, dt_p\, dt_{p-1} \cdots dt_2\, dt_1.
$$

(d) Suppose that the joint p.d.f. fX of a p-dimensional random vector X is piecewise continu-
ous. Then by the Fundamental Theorem of Calculus (specifically multivariable Calculus),
we have
$$
f_X(x_1, x_2, \cdots, x_p) = \frac{\partial^p}{\partial x_1 \partial x_2 \cdots \partial x_p} F_X(x_1, x_2, \cdots, x_p),
$$
wherever the partial derivative on the right hand side exists.
(e) Suppose X is a p-dimensional random vector such that its joint DF FX is continuous on Rp and the partial derivative ∂p/(∂x1∂x2···∂xp) FX exists everywhere except possibly on a countable number of curves in Rp. Let A ⊂ Rp denote the set of all points on such curves. Then X is a continuous random vector with the joint p.d.f.
$$
f_X(x) = \begin{cases} \dfrac{\partial^p}{\partial x_1 \partial x_2 \cdots \partial x_p} F_X(x), & \text{if } x = (x_1, x_2, \cdots, x_p) \in A^c,\\ 0, & \text{if } x = (x_1, x_2, \cdots, x_p) \in A.\end{cases}
$$

(f) The joint p.d.f. of a continuous random vector is not unique. As in the case of continuous
RVs, the joint p.d.f. is determined uniquely only up to sets of ‘volume 0’. Here too, we speak of
versions of the joint p.d.f.
(g) For A ⊂ Rp , we have
$$
P(X \in A) = \iiint_{A} f_X(t_1, t_2, \cdots, t_p)\, dt_p\, dt_{p-1} \cdots dt_2\, dt_1 = \iiint_{\mathbb{R}^p} f_X(t_1, t_2, \cdots, t_p)\, 1_A(t_1, t_2, \cdots, t_p)\, dt_p\, dt_{p-1} \cdots dt_2\, dt_1,
$$

provided the integral can be defined. We do not prove this statement in this course.
(h) For any j ∈ {1, 2, · · · , p}, for xj ∈ R

$$
\begin{aligned}
F_{X_j}(x_j) = P(X_j \in (-\infty, x_j]) &= P(X_1 \in \mathbb{R}, \cdots, X_{j-1} \in \mathbb{R},\ X_j \in (-\infty, x_j],\ X_{j+1} \in \mathbb{R}, \cdots, X_p \in \mathbb{R})\\
&= P(X \in \mathbb{R} \times \cdots \times \mathbb{R} \times (-\infty, x_j] \times \mathbb{R} \times \cdots \times \mathbb{R})\\
&= \int_{t_1=-\infty}^{\infty}\cdots\int_{t_{j-1}=-\infty}^{\infty}\int_{t_j=-\infty}^{x_j}\int_{t_{j+1}=-\infty}^{\infty}\cdots\int_{t_p=-\infty}^{\infty} f_X(t_1, t_2, \cdots, t_p)\, dt_p\, dt_{p-1} \cdots dt_2\, dt_1.
\end{aligned}
$$
Consider gj : R → R defined by
$$
g_j(t_j) := \int_{t_1=-\infty}^{\infty}\cdots\int_{t_{j-1}=-\infty}^{\infty}\int_{t_{j+1}=-\infty}^{\infty}\cdots\int_{t_p=-\infty}^{\infty} f_X(t_1, t_2, \cdots, t_p)\, dt_p \cdots dt_{j+1}\, dt_{j-1} \cdots dt_2\, dt_1.
$$
It is immediate that gj satisfies the properties of a p.d.f. and $F_{X_j}(x_j) = \int_{t_j=-\infty}^{x_j} g_j(t_j)\, dt_j$.
Therefore, Xj is a continuous RV with p.d.f. gj . More generally, all marginal distributions of
X are also continuous and can be obtained by integrating out the unnecessary co-ordinates.
The function gj is usually referred to as the marginal p.d.f. of Xj .
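As a numerical illustration of integrating out the unnecessary co-ordinates, the following sketch uses a hypothetical joint p.d.f. f(x, y) = x + y on (0, 1)² (not an example from these notes) and recovers the marginal p.d.f. of the first co-ordinate, which in closed form is x + 1/2 on (0, 1).

```python
from scipy.integrate import quad

# Hypothetical joint p.d.f. (not from the notes): f(x, y) = x + y on (0,1)^2, 0 elsewhere.
def f_joint(x, y):
    return x + y if 0 < x < 1 and 0 < y < 1 else 0.0

# Marginal p.d.f. of X at the point x: integrate the other coordinate out.
def g1(x):
    value, _ = quad(lambda y: f_joint(x, y), 0, 1)
    return value

for x in (0.2, 0.5, 0.8):
    print(x, g1(x), x + 0.5)   # numerical marginal vs the closed form x + 1/2
```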

Remark 9.8. Let f : Rp → [0, ∞) be an integrable function with
$$
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f(t_1, t_2, \cdots, t_p)\, dt_p\, dt_{p-1} \cdots dt_2\, dt_1 = 1.
$$

Then f is the joint p.d.f. of some p-dimensional continuous random vector X. We are not going
to discuss the proof of this statement in this course.

We can identify the independence of the component RVs for a continuous random vector via the
joint p.d.f.. The proof is similar to Theorem 8.48 and is skipped for brevity.

Theorem 9.9. Let X = (X1 , X2 , · · · , Xp ) be a continuous random vector with joint DF FX , joint
p.d.f. fX . Let fXj denote the marginal p.d.f. of Xj . Then X1 , X2 , · · · , Xp are independent if and
only if
$$
f_{X_1, X_2, \cdots, X_p}(x_1, x_2, \cdots, x_p) = \prod_{j=1}^{p} f_{X_j}(x_j), \quad \forall x_1, x_2, \cdots, x_p \in \mathbb{R}.
$$

Example 9.10. Given p.d.f.s f1 , f2 , · · · , fp : R → [0, ∞), consider the function f : Rp → [0, ∞)
defined by
$$
f(x) := \prod_{j=1}^{p} f_j(x_j), \quad \forall x = (x_1, x_2, \cdots, x_p) \in \mathbb{R}^p.
$$
Then
$$
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f(t_1, t_2, \cdots, t_p)\, dt_p\, dt_{p-1} \cdots dt_2\, dt_1 = 1.
$$

By Remark 9.8, f is the joint p.d.f. of some p-dimensional continuous random vector; by Theorem 9.9, its component RVs are independent. Using this method, we can construct many examples of continuous random vectors.

Remark 9.11. Let X = (X1 , X2 , · · · , Xp ) be a continuous random vector with joint p.d.f. fX . Then
X1 , X2 , · · · , Xp are independent if and only if
$$
f_{X_1, X_2, \cdots, X_p}(x_1, x_2, \cdots, x_p) = \prod_{j=1}^{p} g_j(x_j), \quad \forall x_1, x_2, \cdots, x_p \in \mathbb{R}
$$
for some integrable functions g1, g2, · · · , gp : R → [0, ∞). In this case, the marginal p.d.f.s fXj have the form cj gj, where the constant cj can be determined from the relation $c_j = \left(\int_{-\infty}^{\infty} g_j(x)\, dx\right)^{-1}$.

Example 9.12. Let Z = (X, Y ) be a 2-dimensional continuous random vector with the joint p.d.f.
of the form
$$
f_Z(x, y) = \begin{cases} \alpha x y, & \text{if } 0 < x < y < 1,\\ 0, & \text{otherwise}\end{cases}
$$

for some constant α ∈ R. For fZ to take non-negative values, we must have α > 0. Now,
$$
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_Z(x, y)\, dx\, dy = \int_{y=0}^{1}\int_{x=0}^{y} \alpha x y\, dx\, dy = \alpha \int_{y=0}^{1} \frac{y^3}{2}\, dy = \frac{\alpha}{8}.
$$
For fZ to be a joint p.d.f., we need $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_Z(x, y)\, dx\, dy = 1$ and hence α = 8 > 0. Also note
that for this value of α, fZ takes non-negative values. The marginal p.d.f. fX of X can now be
computed as follows.
 
 1
y=x 8xy dy, if x ∈ (0, 1)
4x[1 − x2 ], if x ∈ (0, 1)
Z ∞ 
R 

fX (x) = fZ (x, y) dy =  = .
−∞ 0, otherwise 0,
  otherwise

The marginal p.d.f. fY of Y follows by a similar computation.


 
$$
f_Y(y) = \int_{-\infty}^{\infty} f_Z(x, y)\, dx = \begin{cases} \int_{x=0}^{y} 8xy\, dx, & \text{if } y \in (0, 1),\\ 0, & \text{otherwise}\end{cases}
\;=\;
\begin{cases} 4y^3, & \text{if } y \in (0, 1),\\ 0, & \text{otherwise.}\end{cases}
$$
Observe that fZ(1/2, 1/2) = 0, since (1/2, 1/2) lies outside the support {0 < x < y < 1}, while fX(1/2) fY(1/2) = (3/2) × (1/2) = 3/4. Hence X and Y are not independent.
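The computations in Example 9.12 can also be checked symbolically; the following sketch (using sympy) recovers α = 8 and the two marginal p.d.f.s.

```python
import sympy as sp

x, y, alpha = sp.symbols('x y alpha', positive=True)

# Joint p.d.f. of Example 9.12: f_Z(x, y) = alpha*x*y on the triangle 0 < x < y < 1.
total = sp.integrate(alpha * x * y, (x, 0, y), (y, 0, 1))   # total mass = alpha/8
alpha_val = sp.solve(sp.Eq(total, 1), alpha)[0]
print(alpha_val)                                            # 8

f_X = sp.integrate(alpha_val * x * y, (y, x, 1))            # marginal of X on (0, 1)
f_Y = sp.integrate(alpha_val * x * y, (x, 0, y))            # marginal of Y on (0, 1)
print(sp.expand(f_X), sp.expand(f_Y))                       # 4*x - 4*x**3 and 4*y**3

# Independence fails: (1/2, 1/2) is outside the support {x < y}, so f_Z(1/2, 1/2) = 0,
# while f_X(1/2) * f_Y(1/2) is nonzero:
print((f_X * f_Y).subs({x: sp.Rational(1, 2), y: sp.Rational(1, 2)}))   # 3/4
```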

Example 9.13. Let U = (X, Y, Z) be a 3-dimensional continuous random vector with the joint
p.d.f. of the form
$$
f_U(x, y, z) = \begin{cases} \alpha x y z, & \text{if } x, y, z \in (0, 1),\\ 0, & \text{otherwise}\end{cases}
$$
for some constant α ∈ R. For fU to take non-negative values, we must have α > 0. Now,
$$
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_U(x, y, z)\, dx\, dy\, dz = \int_{x=0}^{1}\int_{y=0}^{1}\int_{z=0}^{1} \alpha x y z\, dx\, dy\, dz = \frac{\alpha}{8}.
$$
For fU to be a joint p.d.f., we need $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_U(x, y, z)\, dx\, dy\, dz = 1$ and hence α = 8 > 0. Also note that for this value of α, fU takes non-negative values. The marginal p.d.f. fX of X can now be computed as follows.
$$
f_X(x) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_U(x, y, z)\, dy\, dz = \begin{cases} \int_{z=0}^{1}\int_{y=0}^{1} 8xyz\, dy\, dz, & \text{if } x \in (0, 1),\\ 0, & \text{otherwise}\end{cases}
\;=\;
\begin{cases} 2x, & \text{if } x \in (0, 1),\\ 0, & \text{otherwise.}\end{cases}
$$
By the symmetry of fU(x, y, z) in the variables x, y and z, we conclude that X, Y and Z are identically distributed. Observe that fX,Y,Z(x, y, z) = fX(x) fY(y) fZ(z), ∀x, y, z and hence the RVs X, Y, Z are independent.

Note 9.14. There are random vectors which are neither discrete nor continuous. We do not
discuss such examples in this course.

Remark 9.15 (Conditional Distribution for continuous random vectors). We now discuss an ana-
logue of conditional distributions as discussed in Remark 9.1 for discrete random vectors. To avoid
notational complexity, we work in dimension 2. Let (X, Y ) be a 2-dimensional continuous random
vector with joint DF FX,Y and joint p.d.f. fX,Y . Let fX and fY denote the marginal p.d.fs of X
and Y respectively. Since P(X = x) = 0, ∀x ∈ R, expressions of the form P(Y ∈ A | X = x)
are not defined for A ⊂ R. We consider x ∈ R such that fX (x) > 0 and look at the following
computation. For y ∈ R,
$$
\begin{aligned}
\lim_{h \downarrow 0} P(Y \le y \mid x - h < X \le x)
&= \lim_{h \downarrow 0} \frac{P(Y \le y,\ x - h < X \le x)}{P(x - h < X \le x)}\\
&= \lim_{h \downarrow 0} \frac{\int_{x-h}^{x} \int_{-\infty}^{y} f_{X,Y}(t, s)\, ds\, dt}{\int_{x-h}^{x} f_X(t)\, dt}\\
&= \lim_{h \downarrow 0} \frac{\frac{1}{h}\int_{x-h}^{x} \int_{-\infty}^{y} f_{X,Y}(t, s)\, ds\, dt}{\frac{1}{h}\int_{x-h}^{x} f_X(t)\, dt}\\
&= \frac{\int_{-\infty}^{y} f_{X,Y}(x, s)\, ds}{f_X(x)}
= \int_{-\infty}^{y} \frac{f_{X,Y}(x, s)}{f_X(x)}\, ds.
\end{aligned}
$$
Here, we have assumed continuity of the p.d.fs. Motivated by the above computation, we define
the conditional DF of Y given X = x (provided fX (x) > 0) by

FY |X (y | x) := lim P(Y ≤ y | x − h < X ≤ x), y ∈ R


h↓0

and the conditional p.d.f. of Y given X = x (provided fX (x) > 0) by


fX,Y (x, y)
fY |X (y | x) := , y ∈ R.
fX (x)
These calculations generalize to higher dimensions as follows. Let X = (X1, X2, · · · , Xp+q) be a continuous random vector with joint p.d.f. fX. Let Y = (X1, X2, · · · , Xp) and Z = (Xp+1, Xp+2, · · · , Xp+q). If z ∈ Rq is such that fZ(z) > 0, then we define the conditional DF of Y given Z = z by
$$
F_{Y\mid Z}(y \mid z) := \lim_{\substack{h_j \downarrow 0\\ j = p+1, \cdots, p+q}} P\left(X_1 \le y_1, \cdots, X_p \le y_p \;\middle|\; z_{j-p} - h_j < X_j \le z_{j-p},\ \forall j = p+1, \cdots, p+q\right), \quad y \in \mathbb{R}^p,
$$
and the conditional p.d.f. of Y given Z = z by
$$
f_{Y\mid Z}(y \mid z) := \frac{f_{Y,Z}(y, z)}{f_Z(z)}, \quad y \in \mathbb{R}^p.
$$

Note 9.16. For notational convenience, we have discussed the conditional distribution of the first
p component RVs with respect to the final q component RVs. However, as long as the (p + q)-
dimensional joint distribution is known, we can discuss the conditional distribution of any k of the
component RVs with respect to the other (p + q − k) component RVs.

Note 9.17. Let (X, Y) be a 2-dimensional continuous random vector such that X and Y are independent, so that fX,Y(x, y) = fX(x)fY(y), ∀x, y ∈ R. Then

fY |X (y | x) = fY (y), ∀y ∈ R,

provided fX (x) > 0. This statement can be generalized to higher dimensions with appropriate
changes in the notation.

Example 9.18. In Example 9.12, we have, for fixed x ∈ (0, 1),



$$
f_{Y\mid X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \begin{cases} \dfrac{2xy}{x(1 - x^2)} = \dfrac{2y}{1 - x^2}, & \text{if } y \in (x, 1),\\ 0, & \text{otherwise.}\end{cases}
$$

Earlier, we discussed the distribution of functions of RVs. We now generalize the same concept to random vectors.

Remark 9.19. Let X = (X1 , . . . , Xp ) be a p-dimensional discrete/continuous random vector with


joint p.m.f./p.d.f. fX . We are interested in the distribution of Y = h(X) for functions h : Rp →
Rq . Here, Y = (Y1 , . . . , Yq ) is a q-dimensional random vector with Yj = hj (X1 , . . . , Xp ), where
hj : Rp → R, j = 1, 2, · · · , q denote the component functions of h. The distribution of Y is
uniquely determined as soon as we are able to compute the joint DF FY of Y . Note that

FY (y1 , · · · , yq ) = P(Y1 ≤ y1 , · · · , Yq ≤ yq ) = P(h1 (X) ≤ y1 , · · · , hq (X) ≤ yq ), ∀(y1 , · · · , yq ) ∈ Rq .

Once the joint DF FY is known, the joint p.m.f./p.d.f. of Y can then be deduced by standard
techniques.

Example 9.20. Let X1 ∼ Uniform(0, 1) and X2 ∼ Uniform(0, 1) be independent RVs. Suppose we are interested in the distribution of Y = X1 + X2. By independence of X1 and X2, the joint p.d.f. of (X1, X2) is given by
$$
f_{X_1, X_2}(x_1, x_2) = f_{X_1}(x_1)\, f_{X_2}(x_2) = \begin{cases} 1, & \text{if } x_1, x_2 \in (0, 1),\\ 0, & \text{otherwise.}\end{cases}
$$

Consider the function h : R2 → R defined by h(x1 , x2 ) := x1 + x2 , ∀(x1 , x2 ) ∈ R2 . Then Y =


h(X1 , X2 ). Now, for y ∈ R

$$
\begin{aligned}
F_Y(y) = P(Y \le y) &= P(h(X_1, X_2) \le y)\\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} 1_{(-\infty, y]}(h(x_1, x_2))\, f_{X_1, X_2}(x_1, x_2)\, dx_1\, dx_2\\
&= \int_{0}^{1}\int_{0}^{1} 1_{(-\infty, y]}(x_1 + x_2)\, dx_1\, dx_2\\
&= \begin{cases}
0, & \text{if } y < 0,\\
\int_{x_1=0}^{y} \int_{x_2=0}^{y - x_1} dx_2\, dx_1, & \text{if } 0 \le y < 1,\\
1 - \frac{1}{2}(2 - y)(2 - y), & \text{if } 1 \le y < 2,\\
1, & \text{if } y \ge 2
\end{cases}\\
&= \begin{cases}
0, & \text{if } y < 0,\\
\frac{y^2}{2}, & \text{if } 0 \le y < 1,\\
\frac{4y - y^2 - 2}{2}, & \text{if } 1 \le y < 2,\\
1, & \text{if } y \ge 2.
\end{cases}
\end{aligned}
$$

Here, FY is differentiable everywhere except possibly at the points 0, 1, 2 and
$$
F_Y'(y) = \begin{cases} y, & \text{if } y \in (0, 1),\\ 2 - y, & \text{if } y \in (1, 2),\\ 0, & \text{otherwise.}\end{cases}
$$
Observe that $\int_{-\infty}^{\infty} F_Y'(y)\, dy = 1$ and the derivative is non-negative. Hence, Y is a continuous RV with the p.d.f. given by F_Y'.
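A small simulation sketch of Example 9.20: drawing many independent pairs (X1, X2) and comparing a histogram estimate of the density of Y = X1 + X2 with the derived triangular p.d.f. F_Y'.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# X1, X2 independent Uniform(0, 1); Y = X1 + X2 as in Example 9.20.
y = rng.uniform(size=n) + rng.uniform(size=n)

# Histogram estimate of the density of Y on (0, 2).
hist, edges = np.histogram(y, bins=40, range=(0.0, 2.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Derived p.d.f.: F_Y'(t) = t on (0, 1) and 2 - t on (1, 2) (triangular density).
theoretical = np.where(centers < 1, centers, 2 - centers)

for c, h, t in zip(centers[::8], hist[::8], theoretical[::8]):
    print(f"y = {c:.3f}   simulated ≈ {h:.3f}   derived = {t:.3f}")
```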

As done in the case of RVs, in the setting of Remark 9.19, we consider the computation of the
joint p.m.f./p.d.f. of Y directly, instead of computing the joint DF FY first. The next result is a
direct generalization of Theorem 5.21 and we skip the proof for brevity.

Theorem 9.21 (Change of Variables for Discrete random vectors). Let X = (X1 , . . . , Xp ) be a
p-dimensional discrete random vector with joint p.m.f. fX and support SX . Let h = (h1 , · · · , hq ) :
Rp → Rq be a function and let Y = (Y1 , · · · , Yq ) = h(X) = (h1 (X), · · · , hq (X)). Then Y is a
discrete random vector with support

SY = h(SX ) = {h(x) : x ∈ SX },

joint p.m.f.
$$
f_Y(y) = \begin{cases} \displaystyle\sum_{\substack{x \in S_X\\ h(x) = y}} f_X(x), & \text{if } y \in S_Y,\\ 0, & \text{otherwise}\end{cases}
$$
and joint DF
$$
F_Y(y) = \sum_{\substack{x \in S_X\\ h(x) \le y}} f_X(x), \quad \forall y \in \mathbb{R}^q.
$$
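The p.m.f. formula of Theorem 9.21 can be implemented directly. The following sketch uses a hypothetical example not taken from these notes: X1, X2 the outcomes of two independent fair dice and h(x1, x2) = x1 + x2.

```python
from fractions import Fraction
from collections import defaultdict

# Hypothetical example: X = (X1, X2) with X1, X2 independent fair dice,
# so f_X(x1, x2) = 1/36 on {1,...,6}^2.  Take h(x1, x2) = x1 + x2.
S_X = [(x1, x2) for x1 in range(1, 7) for x2 in range(1, 7)]
f_X = {x: Fraction(1, 36) for x in S_X}
h = lambda x: x[0] + x[1]

# Theorem 9.21: f_Y(y) = sum of f_X(x) over x in S_X with h(x) = y.
f_Y = defaultdict(Fraction)
for x in S_X:
    f_Y[h(x)] += f_X[x]

print(dict(sorted(f_Y.items())))    # the familiar triangular p.m.f. on {2, ..., 12}
assert sum(f_Y.values()) == 1
```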

Example 9.22. Fix p ∈ (0, 1) and let n1, · · · , nq be positive integers. Let X1, · · · , Xq be independent RVs with Xi ∼ Binomial(ni, p), i = 1, · · · , q. Here, the p.m.f.s are given by
$$
f_{X_i}(x_i) = \begin{cases} \dbinom{n_i}{x_i}\, p^{x_i} (1 - p)^{n_i - x_i}, & \text{if } x_i \in \{0, 1, \cdots, n_i\},\\ 0, & \text{otherwise}\end{cases}
$$
for i = 1, · · · , q. Using independence, the joint p.m.f. is given by


$$
f_X(x_1, \cdots, x_q) = \begin{cases} \left(\displaystyle\prod_{i=1}^{q} \binom{n_i}{x_i}\right) p^{\sum_{i=1}^{q} x_i} (1 - p)^{\,n - \sum_{i=1}^{q} x_i}, & \text{if } (x_1, \cdots, x_q) \in \displaystyle\prod_{i=1}^{q}\{0, 1, \cdots, n_i\},\\ 0, & \text{otherwise,}\end{cases}
$$
where n = n1 + · · · + nq. Consider Y = X1 + · · · + Xq. Now, if y ∉ {0, 1, · · · , n}, then fY(y) = P(X1 + · · · + Xq = y) = 0, and if y ∈ {0, 1, · · · , n}, then

$$
\begin{aligned}
f_Y(y) = P(X_1 + \cdots + X_q = y) &= \sum_{\substack{(x_1, \cdots, x_q) \in \prod_{i=1}^{q}\{0, 1, \cdots, n_i\}\\ x_1 + \cdots + x_q = y}} f_X(x_1, \cdots, x_q)\\
&= \sum_{\substack{(x_1, \cdots, x_q) \in \prod_{i=1}^{q}\{0, 1, \cdots, n_i\}\\ x_1 + \cdots + x_q = y}} \left(\prod_{i=1}^{q} \binom{n_i}{x_i}\right) p^{y} (1 - p)^{n - y}\\
&= \binom{n}{y} p^{y} (1 - p)^{n - y},
\end{aligned}
$$
where the last equality uses the Vandermonde identity $\sum \prod_{i=1}^{q} \binom{n_i}{x_i} = \binom{n}{y}$, the sum being over all $(x_1, \cdots, x_q)$ with $x_1 + \cdots + x_q = y$.
Therefore, Y = X1 + · · · + Xq ∼ Binomial(n, p) with n = n1 + · · · + nq .
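The conclusion of Example 9.22 can be checked numerically: convolving the individual Binomial(ni, p) p.m.f.s (which carries out the summation above iteratively) reproduces the Binomial(n, p) p.m.f. A sketch using numpy and scipy:

```python
import numpy as np
from scipy.stats import binom

# X_i ~ Binomial(n_i, p) independent; Y = X_1 + ... + X_q should be Binomial(n, p).
p = 0.3
ns = [2, 3, 4]                     # n_1, n_2, n_3; here n = 9

# Exact p.m.f. of the sum via repeated convolution of the individual p.m.f.s.
pmf = np.array([1.0])
for ni in ns:
    pmf = np.convolve(pmf, binom.pmf(np.arange(ni + 1), ni, p))

target = binom.pmf(np.arange(sum(ns) + 1), sum(ns), p)
print(np.max(np.abs(pmf - target)))     # ~1e-16: the two p.m.f.s agree
```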

Remark 9.23. We had earlier mentioned that Bernoulli(p) distribution is the same as Binomial(1, p)
distribution. Using the above computation, we can identify a Binomial(n, p) RV as a sum of n
independent RVs each having distribution Bernoulli(p). We shall come back to this observation
in later lectures.

For continuous random vectors, we have the following generalization of Theorem 6.1. The proof of this result is skipped.

Theorem 9.24. Let X = (X1 , . . . , Xp ) be a p-dimensional continuous random vector with joint
p.d.f. fX. Suppose that {x ∈ Rp : fX(x) > 0} can be written as a disjoint union $\cup_{i=1}^{k} S_i$ of open sets in Rp.
Let hj : Rp → R, j = 1, · · · , p be functions such that h = (h1, · · · , hp) : Si → Rp is one-to-one with inverse $h_i^{-1} = ((h_{1i})^{-1}, \cdots, (h_{pi})^{-1})$ for each i = 1, · · · , k. Moreover, assume that $(h_{ji})^{-1}$, i = 1, 2, · · · , k; j = 1, · · · , p have continuous partial derivatives and that the Jacobian determinant of the transformation
$$
J_i := \begin{vmatrix}
\dfrac{\partial (h_{1i})^{-1}}{\partial y_1}(y) & \cdots & \dfrac{\partial (h_{1i})^{-1}}{\partial y_p}(y)\\
\vdots & \ddots & \vdots\\
\dfrac{\partial (h_{pi})^{-1}}{\partial y_1}(y) & \cdots & \dfrac{\partial (h_{pi})^{-1}}{\partial y_p}(y)
\end{vmatrix} \neq 0, \quad \forall i = 1, \cdots, k.
$$
Then the p-dimensional random vector Y = (Y1, · · · , Yp) = h(X) = (h1(X), · · · , hp(X)) is a continuous random vector with joint p.d.f.
$$
f_Y(y) = \sum_{i=1}^{k} f_X\!\left((h_{1i})^{-1}(y), \cdots, (h_{pi})^{-1}(y)\right) |J_i|\, 1_{h(S_i)}(y).
$$

Example 9.25. Fix λ > 0. Let X1 ∼ Exponential(λ) and X2 ∼ Exponential(λ) be independent


RVs defined on the same probability space. The joint distribution of (X1 , X2 ) is given by the joint
p.d.f.
$$
f_{X_1, X_2}(x_1, x_2) = f_{X_1}(x_1)\, f_{X_2}(x_2) = \begin{cases} \dfrac{1}{\lambda^2} \exp\!\left(-\dfrac{x_1 + x_2}{\lambda}\right), & \text{if } x_1 > 0,\ x_2 > 0,\\ 0, & \text{otherwise.}\end{cases}
$$

Consider the function
$$
h(x_1, x_2) = \begin{cases} \left(x_1 + x_2,\ \dfrac{x_1}{x_1 + x_2}\right), & \text{if } x_1 > 0,\ x_2 > 0,\\ (0, 0), & \text{otherwise.}\end{cases}
$$

Here, {(x1 , x2 ) ∈ R2 : fX1 ,X2 (x1 , x2 ) > 0} = {(x1 , x2 ) ∈ R2 : x1 > 0, x2 > 0} and h : {(x1 , x2 ) ∈
R2 : x1 > 0, x2 > 0} → R2 is one-to-one with range (0, ∞) × (0, 1). The inverse function is
h−1 (y1 , y2 ) = (y1 y2 , y1 (1 − y2 )) for (y1 , y2 ) ∈ (0, ∞) × (0, 1) with Jacobian determinant given by

$$
J(y_1, y_2) = \begin{vmatrix} y_2 & y_1\\ 1 - y_2 & -y_1 \end{vmatrix} = -y_1.
$$

Now, Y = (Y1, Y2) = h(X1, X2) = (X1 + X2, X1/(X1 + X2)) has the joint p.d.f. given by
$$
f_{Y_1, Y_2}(y_1, y_2) = \begin{cases} f_{X_1, X_2}(y_1 y_2,\ y_1(1 - y_2))\, |J(y_1, y_2)|, & \text{if } y_1 > 0,\ y_2 \in (0, 1),\\ 0, & \text{otherwise}\end{cases}
\;=\;
\begin{cases} \dfrac{1}{\lambda^2}\, y_1 \exp\!\left(-\dfrac{y_1}{\lambda}\right), & \text{if } y_1 > 0,\ y_2 \in (0, 1),\\ 0, & \text{otherwise.}\end{cases}
$$

Now, we compute the marginal distributions. The marginal p.d.f. fY1 is given by
$$
f_{Y_1}(y_1) = \int_{-\infty}^{\infty} f_{Y_1, Y_2}(y_1, y_2)\, dy_2 = \begin{cases} \dfrac{1}{\lambda^2}\, y_1 \exp\!\left(-\dfrac{y_1}{\lambda}\right), & \text{if } y_1 > 0,\\ 0, & \text{otherwise,}\end{cases}
$$
and the marginal p.d.f. fY2 is given by
$$
f_{Y_2}(y_2) = \int_{-\infty}^{\infty} f_{Y_1, Y_2}(y_1, y_2)\, dy_1 = \begin{cases} 1, & \text{if } y_2 \in (0, 1),\\ 0, & \text{otherwise.}\end{cases}
$$
Therefore Y1 = X1 + X2 ∼ Gamma(2, λ) and Y2 = X1/(X1 + X2) ∼ Uniform(0, 1). Moreover,
$$
f_{Y_1, Y_2}(y_1, y_2) = f_{Y_1}(y_1)\, f_{Y_2}(y_2), \quad \forall (y_1, y_2) \in \mathbb{R}^2,
$$
and hence Y1 and Y2 are independent.
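A simulation sketch of Example 9.25: with X1, X2 independent Exponential(λ) (mean λ, the scale parametrisation used above), the samples of Y1 = X1 + X2 and Y2 = X1/(X1 + X2) should be consistent with Gamma(2, λ) and Uniform(0, 1) respectively; the Kolmogorov–Smirnov statistics and the sample correlation below serve only as informal sanity checks.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
lam, n = 2.0, 200_000

# X1, X2 i.i.d. Exponential with mean lambda (density (1/lambda) * exp(-x/lambda)).
x1 = rng.exponential(scale=lam, size=n)
x2 = rng.exponential(scale=lam, size=n)
y1, y2 = x1 + x2, x1 / (x1 + x2)

# Y1 should be Gamma(2, lambda) (shape 2, scale lambda) and Y2 Uniform(0, 1).
print(stats.kstest(y1, stats.gamma(a=2, scale=lam).cdf).statistic)   # small
print(stats.kstest(y2, stats.uniform(0, 1).cdf).statistic)           # small

# Independence is consistent with a near-zero sample correlation.
print(np.corrcoef(y1, y2)[0, 1])                                     # ≈ 0
```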

Remark 9.26. We had earlier mentioned that Exponential(λ) distribution is the same as Gamma(1, λ)
distribution. Using the above computation, we can identify a Gamma(2, λ) RV as a sum of two
independent RVs each having distribution Gamma(1, λ). A more general property in this regard
is mentioned in practice problem set 8.

We now consider expectations for random vectors and for functions of random vectors. The concepts are the same as those discussed in the case of RVs.

Definition 9.27 (Expectation/Mean/Expected Value for functions of Random Vectors). Let X =


(X1 , X2 , · · · , Xp ) be a p-dimensional discrete/continuous random vector with joint p.m.f./p.d.f. fX .
Let h : Rp → R be a function. Then h(X) is a one-dimensional random vector, i.e., an RV. The expectation of h(X), denoted by Eh(X), is defined as the quantity
$$
Eh(X) := \begin{cases} \displaystyle\sum_{x \in S_X} h(x) f_X(x), & \text{if } \displaystyle\sum_{x \in S_X} |h(x)| f_X(x) < \infty \text{ (discrete } X\text{),}\\[6pt] \displaystyle\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} h(x) f_X(x)\, dx, & \text{if } \displaystyle\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} |h(x)| f_X(x)\, dx < \infty \text{ (continuous } X\text{).}\end{cases}
$$

In the discrete case, SX denotes the support of X.

Remark 9.28. If the sum or the integral above converges absolutely, we say that the expectation
Eh(X) exists or equivalently, Eh(X) is finite. Otherwise, we shall say that the expectation Eh(X)
does not exist.
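As a numerical illustration of Definition 9.27, the following sketch computes Eh(X) for h(x, y) = xy under the joint p.d.f. of Example 9.12 (fZ(x, y) = 8xy on 0 < x < y < 1); the exact value is E(XY) = 4/9.

```python
from scipy.integrate import dblquad

# E[h(X, Y)] with h(x, y) = x*y and f_Z(x, y) = 8*x*y on 0 < x < y < 1.
# dblquad integrates the first lambda argument (inner, our x over (0, y))
# and the second (outer, our y over (0, 1)).
value, err = dblquad(lambda x, y: (x * y) * (8 * x * y),   # h(x, y) * f_Z(x, y)
                     0, 1,                                  # outer variable y in (0, 1)
                     lambda y: 0, lambda y: y)              # inner variable x in (0, y)
print(value)    # 4/9 ≈ 0.4444
```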

The following result is a generalization of Proposition 6.19. We skip the proof for brevity.

Proposition 9.29. (a) Let X = (X1 , X2 , · · · , Xp ) be a discrete random vector with joint p.m.f.
fX and support SX and let h : Rp → R be a function. Consider the discrete RV Y := h(X)
with p.m.f. fY and support SY. Then EY exists if and only if $\sum_{y \in S_Y} |y|\, f_Y(y) < \infty$, and in this case,
$$
EY = Eh(X) = \sum_{x \in S_X} h(x) f_X(x) = \sum_{y \in S_Y} y\, f_Y(y).
$$

(b) Let X = (X1 , X2 , · · · , Xp ) be a continuous random vector with joint p.d.f. fX . Let h :
Rp → R be a function such that the RV Y := h(X) is continuous with p.d.f. fY . Then EY
exists if and only if $\int_{-\infty}^{\infty} |y|\, f_Y(y)\, dy < \infty$, and in this case,
$$
EY = Eh(X) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} h(x) f_X(x)\, dx = \int_{-\infty}^{\infty} y\, f_Y(y)\, dy.
$$

Note 9.30. As considered for the case of RVs, by choosing different functions h : Rp → R, we
obtain several quantities of interest of the form Eh(X) for a p-dimensional random vector X.

Definition 9.31 (Some special expectations for Random Vectors). Let X = (X1 , X2 , · · · , Xp ) be
a p-dimensional discrete/continuous random vector.

(a) (Joint Moments) For non-negative integers k1, . . . , kp, let $h(x) := x_1^{k_1} \cdots x_p^{k_p}$, ∀x ∈ Rp. Then
$$
\mu'_{k_1, \ldots, k_p} := E\!\left(X_1^{k_1} \cdots X_p^{k_p}\right)
$$
is called a joint moment of order k1 + · · · + kp of X, provided it exists.


(b) (Joint Central Moments) For non-negative integers k1, . . . , kp, let
$$
h(x) := (x_1 - E(X_1))^{k_1} \cdots (x_p - E(X_p))^{k_p}, \quad \forall x \in \mathbb{R}^p.
$$
Then
$$
\mu_{k_1, \ldots, k_p} := E\!\left[(X_1 - E(X_1))^{k_1} \cdots (X_p - E(X_p))^{k_p}\right]
$$
is called a joint central moment of order k1 + · · · + kp of X, provided it exists.


(c) (Covariance) Fix i, j = 1, . . . , p. Let h(x) := (xi − E (Xi )) (xj − E (Xj )) , ∀x = (x1 , x2 , · · · , xp ) ∈
Rp . Then, E [(Xi − E (Xi )) (Xj − E (Xj ))] is called the covariance between Xi and Xj , pro-
vided it exists. We shall denote this quantity by Cov(Xi , Xj ).
(d) (Joint Moment Generating Function, or simply, Joint MGF) We define
$$
A := \left\{ t = (t_1, t_2, \ldots, t_p) \in \mathbb{R}^p : E\!\left(e^{\sum_{i=1}^{p} t_i X_i}\right) < \infty \right\},
$$
and consider the function MX : A → R defined by
$$
M_X(t) = E\!\left(e^{\sum_{i=1}^{p} t_i X_i}\right), \quad \forall t = (t_1, t_2, \ldots, t_p) \in A.
$$
If (−a1, a1) × (−a2, a2) × · · · × (−ap, ap) ⊆ A for some a1, a2, · · · , ap > 0, then the function MX is called the joint moment generating function (joint MGF) of the random vector X. Note that t = (0, 0, · · · , 0) ∈ Rp yields MX(t) = 1 and hence (0, 0, · · · , 0) ∈ A.
(e) (Joint Characteristic Function) Define ΦX : Rp → C by
$$
\Phi_X(t) = E\!\left(e^{i \sum_{j=1}^{p} t_j X_j}\right) = E \exp(i\, t \cdot X) = E \cos(t \cdot X) + i\, E \sin(t \cdot X), \quad \forall t = (t_1, t_2, \ldots, t_p) \in \mathbb{R}^p,
$$
where i denotes the complex number √−1 and $t \cdot X = \sum_{j=1}^{p} t_j X_j$ is the standard dot product in Rp. This function exists for all t ∈ Rp.

Remark 9.32. We now list some properties of the above quantities. The properties are being
stated under the assumption that the expectations involved exist. Let X = (X1 , X2 , · · · , Xp ) be a
p-dimensional discrete/continuous random vector.

(a) Let a1, . . . , ap be real constants. Then $E\!\left(\sum_{i=1}^{p} a_i X_i\right) = \sum_{i=1}^{p} a_i\, EX_i$. To see this for discrete X, observe that
$$
E\!\left(\sum_{i=1}^{p} a_i X_i\right) = \sum_{x \in S_X} \left(\sum_{i=1}^{p} a_i x_i\right) f_X(x) = \sum_{i=1}^{p} a_i \sum_{x \in S_X} x_i\, f_X(x) = \sum_{i=1}^{p} a_i\, EX_i.
$$

The interchange of the order of summation is allowed due to absolute convergence of the
series involved. The proof for continuous X is similar.
(b) Cov(Xi , Xj ) = Cov(Xj , Xi ), for all i, j = 1, . . . , p.
(c) Cov(Xi , Xi ) = Var(Xi ), for all i = 1, . . . , p.
(d) For all i, j = 1, . . . , p, we have

Cov(Xi , Xj ) = E [Xi Xj − Xi (EXj ) − Xj (EXi ) + (EXi )(EXj )]

= E (Xi Xj ) − E (Xi ) E (Xj )



(e) Let X1 , X2 , . . . , Xp , Y1 , Y2 , . . . , Yq be RVs, and let a1 , . . . , ap , b1 , . . . , bq be real constants.


Then,
$$
\mathrm{Cov}\!\left(\sum_{i=1}^{p} a_i X_i,\ \sum_{j=1}^{q} b_j Y_j\right) = \sum_{i=1}^{p}\sum_{j=1}^{q} a_i b_j\, \mathrm{Cov}(X_i, Y_j).
$$
In particular,
$$
\mathrm{Var}\!\left(\sum_{i=1}^{p} a_i X_i\right) = \sum_{i=1}^{p} a_i^2\, \mathrm{Var}(X_i) + \sum_{i=1}^{p}\sum_{\substack{j=1\\ j \neq i}}^{p} a_i a_j\, \mathrm{Cov}(X_i, X_j) = \sum_{i=1}^{p} a_i^2\, \mathrm{Var}(X_i) + 2 \sum_{1 \le i < j \le p} a_i a_j\, \mathrm{Cov}(X_i, X_j).
$$

(f) Let X1 , X2 , · · · , Xp be independent and let h1 , h2 , · · · , hp : R → R be functions. Then


$$
E\!\left(\prod_{i=1}^{p} h_i(X_i)\right) = \prod_{i=1}^{p} E\, h_i(X_i).
$$
For simplicity, we discuss the proof when p = 2 and X = (X1, X2) is continuous with joint p.d.f. fX. Recall from Theorem 9.9 that fX(x1, x2) = fX1(x1) fX2(x2), ∀x1, x2 ∈ R. Then,
$$
\begin{aligned}
E\!\left(\prod_{i=1}^{2} h_i(X_i)\right) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h_1(x_1) h_2(x_2)\, f_X(x_1, x_2)\, dx_1\, dx_2\\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h_1(x_1) h_2(x_2)\, f_{X_1}(x_1) f_{X_2}(x_2)\, dx_1\, dx_2\\
&= \left(\int_{-\infty}^{\infty} h_1(x_1) f_{X_1}(x_1)\, dx_1\right)\left(\int_{-\infty}^{\infty} h_2(x_2) f_{X_2}(x_2)\, dx_2\right)\\
&= \prod_{i=1}^{2} E\, h_i(X_i).
\end{aligned}
$$

(g) This is a special case of statement (f). Let A1, A2, · · · , Ap ⊆ R. Consider the functions
$$
h_i(x_i) := \begin{cases} 1, & \text{if } x_i \in A_i,\\ 0, & \text{otherwise}\end{cases} \;=\; 1_{A_i}(x_i), \quad \forall x_i \in \mathbb{R},\ i = 1, 2, \cdots, p.
$$
Note that $E h_i(X_i) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} 1_{A_i}(x_i)\, f_{X_1}(x_1) f_{X_2}(x_2) \cdots f_{X_p}(x_p)\, dx_1\, dx_2 \cdots dx_p = \int_{-\infty}^{\infty} 1_{A_i}(x_i)\, f_{X_i}(x_i)\, dx_i = P(X_i \in A_i)$, when X is continuous. The same equality is also true when X is discrete. Now, consider the function h : Rp → R defined by $h(x) = \prod_{i=1}^{p} h_i(x_i)$, ∀x ∈ Rp. Using (f), we have
$$
P(X_1 \in A_1, X_2 \in A_2, \cdots, X_p \in A_p) = \prod_{i=1}^{p} P(X_i \in A_i).
$$

(h) Continue with the assumptions of statement (f). For fixed y1, y2, · · · , yp ∈ R, consider the functions g1, g2, · · · , gp : R → R defined by
$$
g_i(x_i) := \begin{cases} 1, & \text{if } h_i(x_i) \le y_i,\\ 0, & \text{otherwise,}\end{cases} \quad \forall x_i \in \mathbb{R},\ i = 1, 2, \cdots, p.
$$
Note that $E g_i(X_i) = P(h_i(X_i) \le y_i) = F_{h_i(X_i)}(y_i)$, ∀i, and
$$
F_{h_1(X_1), h_2(X_2), \cdots, h_p(X_p)}(y_1, y_2, \cdots, y_p) = P(h_1(X_1) \le y_1, h_2(X_2) \le y_2, \cdots, h_p(X_p) \le y_p) = \prod_{i=1}^{p} P(h_i(X_i) \le y_i) = \prod_{i=1}^{p} F_{h_i(X_i)}(y_i).
$$
Hence, the RVs h1(X1), h2(X2), · · · , hp(Xp) are independent.


(i) Let X1 , X2 be independent RVs. Then E(X1 X2 ) = (EX1 )(EX2 ) and hence, using (d),

Cov(X1 , X2 ) = 0.

Further, if X1 , X2 , · · · , Xp are independent, then using (e),


$$
\mathrm{Var}\!\left(\sum_{i=1}^{p} a_i X_i\right) = \sum_{i=1}^{p} a_i^2\, \mathrm{Var}(X_i)
$$

for all real constants a1 , a2 , · · · , ap .


(j) Recall that MX : A → R is given by
$$
M_X(t) = E\!\left(e^{\sum_{i=1}^{p} t_i X_i}\right), \quad \forall t = (t_1, t_2, \ldots, t_p) \in A,
$$
with
$$
A := \left\{ t = (t_1, t_2, \ldots, t_p) \in \mathbb{R}^p : E\!\left(e^{\sum_{i=1}^{p} t_i X_i}\right) < \infty \right\}.
$$
Taking t = (0, 0, · · · , 0) ∈ Rp yields MX(0, 0, · · · , 0) = 1 and hence (0, 0, · · · , 0) ∈ A. In particular, A ≠ ∅. Also, MX(t) > 0, ∀t ∈ A.
(k) If t = (0, · · · , 0, ti, 0, · · · , 0) ∈ A, then $M_X(t) = E\!\left(e^{\sum_{k=1}^{p} t_k X_k}\right) = E\!\left(e^{t_i X_i}\right) = M_{X_i}(t_i)$. Similarly, if t = (0, · · · , 0, ti, 0, · · · , 0, tj, 0, · · · , 0) ∈ A, then $M_X(t) = E\!\left(e^{\sum_{k=1}^{p} t_k X_k}\right) = E\!\left(e^{t_i X_i + t_j X_j}\right) = M_{X_i, X_j}(t_i, t_j)$.
(l) This result is being stated without proof. If (−a1, a1) × (−a2, a2) × · · · × (−ap, ap) ⊆ A for some a1, a2, · · · , ap > 0, then MX possesses partial derivatives of all orders in (−a1, a1) × (−a2, a2) × · · · × (−ap, ap). Furthermore, for non-negative integers k1, . . . , kp,
$$
E\!\left(X_1^{k_1} X_2^{k_2} \cdots X_p^{k_p}\right) = \left[\frac{\partial^{k_1 + k_2 + \cdots + k_p}}{\partial t_1^{k_1} \cdots \partial t_p^{k_p}} M_X(t)\right]_{(t_1, t_2, \ldots, t_p) = (0, \ldots, 0)}.
$$
For i ≠ j with i, j ∈ {1, . . . , p}, we have
$$
\begin{aligned}
\mathrm{Cov}(X_i, X_j) &= E(X_i X_j) - E(X_i)\, E(X_j)\\
&= \left[\frac{\partial^2}{\partial t_i \partial t_j} M_X(t)\right]_{t = (0, \ldots, 0)} - \left[\frac{\partial}{\partial t_i} M_X(t)\right]_{t = (0, \ldots, 0)} \left[\frac{\partial}{\partial t_j} M_X(t)\right]_{t = (0, \ldots, 0)}\\
&= \left[\frac{\partial^2}{\partial t_i \partial t_j} \Psi_X(t)\right]_{t = (0, \ldots, 0)},
\end{aligned}
$$
where ΨX(t) := ln MX(t), t ∈ A. Compare this with the one-dimensional case in Proposition 6.45. (A small symbolic sketch of this identity appears after this remark.)
(m) If X1, X2, · · · , Xp are independent, then for all t ∈ A,
$$
M_X(t) = E\!\left(e^{\sum_{i=1}^{p} t_i X_i}\right) = E\!\left(\prod_{i=1}^{p} e^{t_i X_i}\right) = \prod_{i=1}^{p} E\!\left(e^{t_i X_i}\right) = \prod_{i=1}^{p} M_{X_i}(t_i).
$$

(n) If (−a1, a1) × (−a2, a2) × · · · × (−ap, ap) ⊆ A for some a1, a2, · · · , ap > 0 and $M_X(t) = \prod_{i=1}^{p} M_{X_i}(t_i)$, ∀t ∈ A, then it can be shown that X1, X2, · · · , Xp are independent. We do not discuss the proof of this result in this course.

(o) As discussed for the case of random variables, the joint characteristic function for random vectors has nice properties similar to those of the joint MGF. If some joint moment of the random vector exists, then it can be recovered from the partial derivatives of the joint characteristic function. We do not discuss these properties in detail in this course.
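As referenced in (l) above, the following symbolic sketch (using sympy) recovers a covariance from the mixed partial derivative of ΨX = ln MX at the origin. It uses the hypothetical joint MGF MX(t1, t2) = exp(((t1 + t2)² + t2²)/2), which corresponds to the pair (X1, X2) = (Z1, Z1 + Z2) with Z1, Z2 independent standard normal RVs, so that Cov(X1, X2) = 1.

```python
import sympy as sp

t1, t2 = sp.symbols('t1 t2')

# Hypothetical joint MGF of (X1, X2) = (Z1, Z1 + Z2), Z1, Z2 i.i.d. N(0, 1):
M = sp.exp(((t1 + t2)**2 + t2**2) / 2)
Psi = sp.log(M)

# Property (l): moments and covariance from partial derivatives at t = (0, 0).
EX1 = sp.diff(M, t1).subs({t1: 0, t2: 0})            # E[X1] = 0
EX1X2 = sp.diff(M, t1, t2).subs({t1: 0, t2: 0})      # E[X1 X2] = 1
cov = sp.diff(Psi, t1, t2).subs({t1: 0, t2: 0})      # Cov(X1, X2) = 1
print(EX1, EX1X2, cov)
```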
