Notes 5

Marginal Distribution and Marginal Density: (X, Y) has the joint pdf f(x, y). The marginal density functions of X and Y are given by

f_X(x) = ∫_{-∞}^{∞} f(x, y) dy,

f_Y(y) = ∫_{-∞}^{∞} f(x, y) dx.

Explanation: We can actually derive the above equations. Take an arbitrary a and consider the region A = {(x, y) : x ≤ a}. Then P(A) = P(X ≤ a) = F_X(a). But we also know

P(A) = ∬_A f(x, y) dx dy = ∫_{-∞}^{a} [ ∫_{-∞}^{∞} f(x, y) dy ] dx.

Let g(x) = ∫_{-∞}^{∞} f(x, y) dy. Then

F_X(a) = ∫_{-∞}^{a} g(x) dx for all possible a,

which implies f_X(x) = g(x) = ∫_{-∞}^{∞} f(x, y) dy. Similarly, we prove f_Y(y) = ∫_{-∞}^{∞} f(x, y) dx.
Ex.1

Suppose the joint pdf is given by

f(x, y) = 1/π if x² + y² ≤ 1, and 0 otherwise.

We compute the marginal density f_Y(y). If |y| > 1, then f(x, y) = 0 for all x, so

f_Y(y) = ∫_{-∞}^{∞} f(x, y) dx = ∫_{-∞}^{∞} 0 dx = 0.

If |y| ≤ 1, then f(x, y) = 0 for |x| > √(1 - y²), so

f_Y(y) = ∫_{-∞}^{∞} f(x, y) dx = ∫_{-√(1-y²)}^{√(1-y²)} (1/π) dx = 2√(1 - y²)/π.

So

f_Y(y) = 2√(1 - y²)/π if -1 ≤ y ≤ 1, and 0 otherwise.

Similarly, f_X(x) = 2√(1 - x²)/π if -1 ≤ x ≤ 1, and 0 otherwise.

Note the marginal is not uniform!
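As a quick numerical sanity check (a sketch added here, not part of the original notes, and assuming NumPy and SciPy are available), we can integrate the joint density over x at a fixed y, say y = 0.3, and compare against 2√(1 - y²)/π:

    import numpy as np
    from scipy import integrate

    def joint_pdf(x, y):
        # uniform density on the unit disk: 1/pi inside, 0 outside
        return (x**2 + y**2 <= 1) / np.pi

    y0 = 0.3                          # arbitrary test point with |y0| <= 1
    edge = np.sqrt(1 - y0**2)         # the integrand jumps at x = +/- edge
    numeric, _ = integrate.quad(lambda x: joint_pdf(x, y0), -1, 1, points=[-edge, edge])
    formula = 2 * np.sqrt(1 - y0**2) / np.pi

    print(numeric, formula)           # both approximately 0.6073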
Indicator Functions

I(argument) = 1 if the argument is true, and 0 if the argument is false.

For example, in Ex.1,

f(x, y) = (1/π) I(x² + y² ≤ 1),
f_X(x) = (2√(1 - x²)/π) I(-1 ≤ x ≤ 1),
f_Y(y) = (2√(1 - y²)/π) I(-1 ≤ y ≤ 1).

Note that

I(0 < x < 1, 0 < y < 1) = I(0 < x < 1) I(0 < y < 1),

and in general,

I(argument 1) I(argument 2) = I(arguments 1 and 2).
Ex.2

Suppose the joint pdf is given by

f(x, y) = (1/(2π)) e^{-(x² + y²)/2}.

We compute the marginal density f_X(x):

f_X(x) = ∫_{-∞}^{∞} f(x, y) dy
       = ∫_{-∞}^{∞} (1/(2π)) e^{-(x² + y²)/2} dy
       = ∫_{-∞}^{∞} (1/(2π)) e^{-x²/2} e^{-y²/2} dy
       = (1/√(2π)) e^{-x²/2} ∫_{-∞}^{∞} (1/√(2π)) e^{-y²/2} dy
       = (1/√(2π)) e^{-x²/2} × 1
       = (1/√(2π)) e^{-x²/2}.

Likewise, f_Y(y) = (1/√(2π)) e^{-y²/2}. So X and Y are each N(0, 1).
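The same kind of check works here (again a sketch assuming SciPy, not part of the original notes): integrating the joint density over y at a few values of x should reproduce the N(0, 1) density.

    import numpy as np
    from scipy import integrate, stats

    def joint_pdf(x, y):
        # bivariate standard normal density: (1/(2*pi)) * exp(-(x^2 + y^2)/2)
        return np.exp(-(x**2 + y**2) / 2) / (2 * np.pi)

    for x0 in [-1.5, 0.0, 0.7]:
        numeric, _ = integrate.quad(lambda y: joint_pdf(x0, y), -np.inf, np.inf)
        print(x0, numeric, stats.norm.pdf(x0))   # numeric marginal matches the N(0, 1) pdf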
Independent Random Variables

Definition: We say X and Y are independent if for every two sets A and B of real numbers,

P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B).

Remark 1: Suppose (X, Y) has a continuous joint pdf f(x, y) with marginal densities f_X(x) and f_Y(y). Then X and Y are independent if and only if

f(x, y) = f_X(x) f_Y(y) for all pairs (x, y).

Remark 2: Suppose (X, Y) has a discrete joint p.f. f(x, y). Then X and Y are independent if and only if

f(x, y) = f_X(x) f_Y(y) for all pairs (x, y),

where f_X(x) and f_Y(y) are the probability functions (p.f.) of X and Y.
In Ex.1, we claim X and Y are not independent, because

f_X(√(2/3)) f_Y(√(2/3)) = (4/π²) √(1/3) √(1/3) = 4/(3π²),

but

f(√(2/3), √(2/3)) = 0.

In Ex.2, we have

f_X(x) = (1/√(2π)) e^{-x²/2},
f_Y(y) = (1/√(2π)) e^{-y²/2},
f_X(x) f_Y(y) = (1/√(2π)) e^{-x²/2} · (1/√(2π)) e^{-y²/2} = (1/(2π)) e^{-(x² + y²)/2}.

Therefore we have checked that for every pair (x, y),

f_X(x) f_Y(y) = f(x, y),

which means X and Y are independent.
Factorization Theorem

If we can find two univariate functions g1(x) and g2(y) such that

(∗) f(x, y) = g1(x) g2(y),

then X and Y are independent.

Note that if X and Y are independent, then f(x, y) = f_X(x) f_Y(y), so condition (∗) does hold: we can let g1(x) = f_X(x) and g2(y) = f_Y(y).

The nice thing about the Factorization Theorem is that we can often find such a factorization of f(x, y) as in (∗) without computing the marginal densities, because g1(x) is not necessarily f_X(x) and g2(y) is not necessarily f_Y(y).
Ex.3

Suppose the joint pdf of (X, Y) is given by

f(x, y) = 2e^{-x-2y} if 0 < x, 0 < y, and 0 otherwise.

Using indicator functions, we can write

f(x, y) = 2e^{-x-2y} I(x > 0, y > 0).

Choose g1(x) = 2e^{-x} I(x > 0) and g2(y) = e^{-2y} I(y > 0). Check:

g1(x) g2(y) = 2e^{-x} I(x > 0) e^{-2y} I(y > 0)
            = 2e^{-x-2y} I(x > 0) I(y > 0)
            = 2e^{-x-2y} I(x > 0, y > 0)
            = f(x, y).

By the Factorization Theorem, we know X and Y are independent.
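To see the factorization concretely, here is a small symbolic sketch (assuming SymPy is available; not part of the original notes) that integrates out each variable and confirms that the product of the marginals recovers the joint pdf:

    import sympy as sp

    x, y = sp.symbols('x y', positive=True)    # the pdf lives on x > 0, y > 0
    joint = 2 * sp.exp(-x - 2 * y)

    f_X = sp.integrate(joint, (y, 0, sp.oo))   # exp(-x), an Exponential(1) density
    f_Y = sp.integrate(joint, (x, 0, sp.oo))   # 2*exp(-2*y), an Exponential(2) density

    print(f_X, f_Y)
    print(sp.simplify(f_X * f_Y - joint))      # 0, so f(x, y) = f_X(x) f_Y(y)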
Proof of Factorization Theorem

Suppose condition (∗) holds for some g1(x) and g2(y). Then

f_X(x) = ∫_{-∞}^{∞} f(x, y) dy = ∫_{-∞}^{∞} g1(x) g2(y) dy = g1(x) (∫_{-∞}^{∞} g2(y) dy),

f_Y(y) = ∫_{-∞}^{∞} f(x, y) dx = ∫_{-∞}^{∞} g1(x) g2(y) dx = g2(y) (∫_{-∞}^{∞} g1(x) dx).

From 1 = ∫_{-∞}^{∞} ∫_{-∞}^{∞} f(x, y) dx dy we also find

1 = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g1(x) g2(y) dx dy = (∫_{-∞}^{∞} g1(x) dx)(∫_{-∞}^{∞} g2(y) dy).

Thus we have

f_X(x) f_Y(y) = g1(x) g2(y) (∫_{-∞}^{∞} g2(y) dy)(∫_{-∞}^{∞} g1(x) dx) = g1(x) g2(y) = f(x, y).

So, by Remark 1, X and Y are independent.
Conditional pdf

Suppose (X, Y) has a joint pdf f(x, y). The marginal pdf's of X and Y are denoted by f_X(x) and f_Y(y).

Take any x. If f_X(x) > 0, then we say the conditional pdf of Y given X = x is

f_Y(y | X = x) = f(x, y) / f_X(x).

f_Y(y | X = x) describes the distribution of Y when we observe that the value of X is x.

Take any y. If f_Y(y) > 0, then we say the conditional pdf of X given Y = y is

f_X(x | Y = y) = f(x, y) / f_Y(y).

f_X(x | Y = y) describes the distribution of X when we observe that the value of Y is y.
For any two sets A and B, P(X ∈ A | Y = y) means the probability that the value of X is in A when we observe that the value of Y is y, and P(Y ∈ B | X = x) means the probability that the value of Y is in B when we observe that the value of X is x. We can compute them by

P(X ∈ A | Y = y) = ∫_A f_X(x | Y = y) dx

and

P(Y ∈ B | X = x) = ∫_B f_Y(y | X = x) dy.

Important identities:

f_X(x) f_Y(y | X = x) = f(x, y),
f_Y(y) f_X(x | Y = y) = f(x, y).
Ex.1 Independent X and Y

If X and Y are independent, then f(x, y) = f_X(x) f_Y(y), so

f_Y(y | X = x) = f(x, y) / f_X(x) = f_Y(y),
f_X(x | Y = y) = f(x, y) / f_Y(y) = f_X(x).

Therefore, the conditional pdf is exactly the marginal pdf.

The following statements are equivalent:

1. X and Y are independent.

2. f(x, y) = f_X(x) f_Y(y) for all (x, y).

3. f_Y(y | X = x) = f(x, y)/f_X(x) = f_Y(y) for all (x, y).

4. f_X(x | Y = y) = f(x, y)/f_Y(y) = f_X(x) for all (x, y).
Ex.2

Suppose the joint pdf

f(x, y) = 1/π if x² + y² ≤ 1, and 0 otherwise.

Find f_X(x | Y = 0.5).

Solution: We have calculated

f_Y(y) = (2/π) √(1 - y²) I(-1 ≤ y ≤ 1).

So

f_X(x | Y = 0.5) = f(x, 0.5) / f_Y(0.5) = [(1/π) I(x² + 0.5² ≤ 1)] / [(2/π) √(1 - 0.5²)]
                 = (1/√3) I(|x| ≤ √0.75).
Ex.3

Suppose the joint pdf is given by

f(x, y) = 6(x - y) if 0 < y < x < 1, and 0 otherwise.

Compute f_Y(y | X = 0.6).

Solution: First compute f_X(0.6):

f_X(0.6) = ∫_{-∞}^{∞} f(0.6, y) dy = ∫_0^{0.6} 6(0.6 - y) dy = 1.08.

Then, by the definition of the conditional pdf,

f_Y(y | X = 0.6) = f(0.6, y) / f_X(0.6)
                 = 6(0.6 - y) I(0 < y < 0.6) / 1.08
                 = ((0.6 - y)/0.18) I(0 < y < 0.6).
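A short numerical check (a sketch assuming SciPy, using the same numbers as the example) confirms f_X(0.6) = 1.08 and that the conditional pdf integrates to one:

    from scipy import integrate

    def joint_pdf(x, y):
        # f(x, y) = 6(x - y) on 0 < y < x < 1, and 0 otherwise
        return 6 * (x - y) if 0 < y < x < 1 else 0.0

    fx_06, _ = integrate.quad(lambda y: joint_pdf(0.6, y), 0, 1)
    print(fx_06)                       # approximately 1.08

    cond_mass, _ = integrate.quad(lambda y: joint_pdf(0.6, y) / fx_06, 0, 0.6)
    print(cond_mass)                   # approximately 1.0, as a conditional pdf must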
Continuous Bayes Theorem

Question: Suppose we know the marginal density f_X(x) of X and the conditional pdf of Y given X, f_Y(y | X = x). How do we get the conditional pdf of X given Y? For example, what is f_X(x | Y = 1)?

Solution: Let f(x, y) be the joint pdf of X and Y. Then by definition

f_X(x | Y = 1) = f(x, 1) / f_Y(1),

so we need to figure out f(x, 1) and f_Y(1). We know

f(x, 1) = f_X(x) f_Y(y = 1 | X = x).
Continuous Bayes Theorem, cont'd

Since f_Y(y) = ∫_{-∞}^{∞} f(r, y) dr, we have

f_Y(1) = ∫_{-∞}^{∞} f(r, 1) dr = ∫_{-∞}^{∞} f_X(r) f_Y(y = 1 | X = r) dr.

Therefore,

f_X(x | Y = 1) = f_X(x) f_Y(y = 1 | X = x) / ∫_{-∞}^{∞} f_X(r) f_Y(y = 1 | X = r) dr.

In general:

f_X(x | Y = y) = f(x, y) / f_Y(y) = f(x, y) / ∫_{-∞}^{∞} f(r, y) dr
               = f_X(x) f_Y(y | X = x) / ∫_{-∞}^{∞} f_X(r) f_Y(y | X = r) dr.

The last equality is called the continuous Bayes theorem.
Ex.4

Suppose that f_X(x) = e^{-x} I(x > 0) and the conditional pdf of Y given X = x is f_Y(y | X = x) = x e^{-xy} I(y > 0). Find (a) f_X(x | Y = y) and (b) P(X > 1 | Y = 1).

Solution: (a) First we write down

f_X(x | Y = y) = f_X(x) f_Y(y | X = x) / ∫_{-∞}^{∞} f_X(r) f_Y(y | X = r) dr.

Here

f_X(x) = e^{-x} I(x > 0),
f_Y(y | X = x) = x e^{-xy} I(y > 0),
f_X(x) f_Y(y | X = x) = x e^{-(1+y)x} I(x > 0).

Then we know

f_X(x | Y = y) = x e^{-(1+y)x} I(x > 0) / ∫_{-∞}^{∞} r e^{-(1+y)r} I(r > 0) dr.
Now

∫_{-∞}^{∞} r e^{-(1+y)r} I(r > 0) dr = ∫_0^{∞} r e^{-(1+y)r} dr = 1/(1+y)².

Therefore,

f_X(x | Y = y) = x e^{-(1+y)x} I(x > 0) / (1/(1+y)²) = (1+y)² x e^{-(1+y)x} I(x > 0).

(b) By (a) we know

f_X(x | Y = 1) = 4x e^{-2x} I(x > 0),

so

P(X > 1 | Y = 1) = ∫_1^{∞} f_X(x | Y = 1) dx = ∫_1^{∞} 4x e^{-2x} dx = 3e^{-2}.
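As a sanity check on (a) and (b) (a sketch assuming SciPy and NumPy, not part of the original notes), we can verify numerically that the posterior density integrates to one and that the tail probability equals 3e^{-2} ≈ 0.406:

    import numpy as np
    from scipy import integrate

    y = 1.0
    posterior = lambda x: (1 + y)**2 * x * np.exp(-(1 + y) * x)   # f_X(x | Y = 1) for x > 0

    total, _ = integrate.quad(posterior, 0, np.inf)
    tail, _ = integrate.quad(posterior, 1, np.inf)

    print(total)                          # approximately 1.0
    print(tail, 3 * np.exp(-2))           # both approximately 0.4060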
Bivariate Transformation

Suppose we have a pair of random variables (X, Y). We create a new pair of random variables (U, V) by a bivariate transformation of (X, Y), that is,

U = g1(X, Y),  V = g2(X, Y).

Let's assume the transformation is one-to-one and smooth, which means there are h1, h2 such that

X = h1(U, V),  Y = h2(U, V),

and g1, g2, h1, h2 are differentiable.
Examples of Bivariate Transformation

Example 1:

U = X + Y,  V = X - Y,

then

X = (U + V)/2,  Y = (U - V)/2.

Example 2:

R = √(X² + Y²),  Θ = arctan(Y/X),

then

X = R cos(Θ),  Y = R sin(Θ).
Jacobian of Bivariate Transformation

Suppose we have the following one-to-one bivariate transformation from (X, Y) to (U, V):

U = g1(X, Y),  V = g2(X, Y).

Assume we can write

X = h1(U, V),  Y = h2(U, V).

Let J be the matrix of partial derivatives

J = [ ∂h1(u,v)/∂u   ∂h1(u,v)/∂v ]
    [ ∂h2(u,v)/∂u   ∂h2(u,v)/∂v ]

The Jacobian of the transformation is denoted by |det J|, which is computed by

|det J| = | ∂h1(u,v)/∂u × ∂h2(u,v)/∂v − ∂h1(u,v)/∂v × ∂h2(u,v)/∂u |.
Examples of Jacobian Calculation

Example 1:

U = X + Y,  V = X - Y,

then

X = (U + V)/2 = h1(U, V),  Y = (U - V)/2 = h2(U, V),

∂h1(u,v)/∂u = 1/2,  ∂h1(u,v)/∂v = 1/2,
∂h2(u,v)/∂u = 1/2,  ∂h2(u,v)/∂v = -1/2,

|det J| = | (1/2)(-1/2) - (1/2)(1/2) | = 1/2.
Examples of Jacobian Calculation, cont'd

Example 2:

R = √(X² + Y²),  Θ = arctan(Y/X),

then

X = R cos(Θ) = h1(R, Θ),  Y = R sin(Θ) = h2(R, Θ).

Think of R as U and Θ as V. Then

∂h1(r,θ)/∂r = cos(θ),  ∂h1(r,θ)/∂θ = -r sin(θ),
∂h2(r,θ)/∂r = sin(θ),  ∂h2(r,θ)/∂θ = r cos(θ),

|det J| = | cos(θ) · r cos(θ) − (−r sin(θ)) · sin(θ) | = r.
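Both Jacobians can also be checked symbolically. The following is a sketch assuming SymPy is available, using its Matrix.jacobian method:

    import sympy as sp

    u, v = sp.symbols('u v')
    r, theta = sp.symbols('r theta', positive=True)

    # Example 1: X = (U + V)/2, Y = (U - V)/2
    J1 = sp.Matrix([(u + v) / 2, (u - v) / 2]).jacobian([u, v])
    print(J1.det())                        # -1/2, so |det J| = 1/2

    # Example 2: X = R cos(Theta), Y = R sin(Theta)
    J2 = sp.Matrix([r * sp.cos(theta), r * sp.sin(theta)]).jacobian([r, theta])
    print(sp.simplify(J2.det()))           # r, so |det J| = r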
Joint pdf after bivariate transformation

Suppose we have the following one-to-one bivariate transformation from (X, Y) to (U, V):

U = g1(X, Y),  V = g2(X, Y).

Assume we can write

X = h1(U, V),  Y = h2(U, V).

Let f_{X,Y} denote the joint pdf of (X, Y) and let f_{U,V} denote the joint pdf of (U, V). Then the joint pdf of (U, V) can be computed by

f_{U,V}(u, v) = f_{X,Y}(h1(u, v), h2(u, v)) |det J|.
Example 1 and Example 2

Suppose the joint pdf of (X, Y) is f(x, y) = (1/(2π)) e^{-(x² + y²)/2}. Let (U, V) = (X + Y, X - Y) and (R, Θ) = (√(X² + Y²), arctan(Y/X)). Find the joint pdfs of (U, V) and (R, Θ).

Solution:

f_{U,V}(u, v) = (1/(2π)) e^{-[((u+v)/2)² + ((u-v)/2)²]/2} · (1/2)
              = (1/(4π)) e^{-(u² + v²)/4}.

f_{R,Θ}(r, θ) = (1/(2π)) e^{-[(r cos θ)² + (r sin θ)²]/2} · r
              = (1/(2π)) r e^{-r²/2},   r ≥ 0, -π ≤ θ ≤ π.

So R and Θ are independent. Why? Because of the Factorization Theorem.
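A Monte Carlo sanity check of the (U, V) result (a sketch assuming NumPy; the sample size and seed are arbitrary): since f_{U,V}(u, v) = (1/(4π)) e^{-(u² + v²)/4}, U and V should behave like independent N(0, 2) variables.

    import numpy as np

    rng = np.random.default_rng(0)
    x, y = rng.standard_normal((2, 200_000))

    u, v = x + y, x - y                  # Example 1 transformation
    print(np.var(u), np.var(v))          # both approximately 2, matching N(0, 2) marginals
    print(np.corrcoef(u, v)[0, 1])       # approximately 0, consistent with independence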
What are the marginal pdfs of Θ and R?

f_Θ(θ) = ∫_0^{∞} (1/(2π)) r e^{-r²/2} dr = (1/(2π)) [-e^{-r²/2}]_0^{∞} = 1/(2π),   -π ≤ θ ≤ π.

f_R(r) = ∫_{-π}^{π} (1/(2π)) r e^{-r²/2} dθ = r e^{-r²/2},   r ≥ 0.

What is the marginal pdf of T = R²/4? Since R = √(4T),

f_T(t) = f_R(√(4t)) |d√(4t)/dt| = 2√t e^{-2t} · (1/√t) = 2e^{-2t},   t > 0.
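A similar simulation (again a sketch assuming NumPy, with arbitrary seed and sample size) checks the polar-coordinate results: Θ should be uniform on (-π, π] and T = R²/4 should follow the Exponential density 2e^{-2t}, which has mean 1/2 and variance 1/4.

    import numpy as np

    rng = np.random.default_rng(1)
    x, y = rng.standard_normal((2, 200_000))

    r = np.hypot(x, y)                    # R = sqrt(X^2 + Y^2)
    theta = np.arctan2(y, x)              # polar angle in (-pi, pi]
    t = r**2 / 4

    print(np.mean(theta), np.var(theta))  # approx. 0 and pi^2/3 = 3.29, as for Unif(-pi, pi)
    print(np.mean(t), np.var(t))          # approx. 0.5 and 0.25, as for the pdf 2*exp(-2t)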
Joint distribution of n random variables

Notation: n random variables Xi, i = 1, 2, ..., n; (x1, x2, ..., xn) is the value of the n-dimensional random vector (X1, X2, ..., Xn).

Definition: A joint pdf f(x1, ..., xn) must satisfy two conditions:

1. f(x1, ..., xn) ≥ 0;

2. ∫_{-∞}^{∞} ··· ∫_{-∞}^{∞} f(x1, ..., xn) dx1 ··· dxn = 1.

Let A be some region, and denote P(A) = P((X1, ..., Xn) ∈ A). Then

P(A) = ∫···∫_{(x1,...,xn) ∈ A} f(x1, ..., xn) dx1 ··· dxn.
The joint distribution function is

F(x1, ..., xn) = P(X1 ≤ x1, ..., Xn ≤ xn) = ∫_{-∞}^{x1} ··· ∫_{-∞}^{xn} f(r1, ..., rn) dr1 ··· drn,

and

f(x1, ..., xn) = ∂ⁿF(x1, ..., xn) / (∂x1 ··· ∂xn).

Marginal pdf

The marginal pdf of X1 can be computed by the (n-1)-dimensional integral

f_{X1}(x1) = ∫_{-∞}^{∞} ··· ∫_{-∞}^{∞} f(x1, ..., xn) dx2 ··· dxn.

Similarly,

f_{Xn}(xn) = ∫_{-∞}^{∞} ··· ∫_{-∞}^{∞} f(x1, ..., xn) dx1 dx2 ··· dx_{n-1}.

Basically, to find the marginal pdf of Xi, we just take the integral of f(x1, ..., xn) with respect to all the x's except xi.
The conditional pdf of X1 given that X2 = x2, ..., Xn = xn is

f_{X1}(x1 | X2 = x2, ..., Xn = xn) = f(x1, x2, ..., xn) / ∫_{-∞}^{∞} f(r1, x2, ..., xn) dr1,

and

∫_{-∞}^{∞} f(r1, x2, ..., xn) dr1

gives us the joint pdf of (X2, X3, ..., Xn) evaluated at (x2, ..., xn).

The conditional joint pdf of (X1, X2) given that X3 = x3, ..., Xn = xn is

f_{(X1,X2)}(x1, x2 | X3 = x3, ..., Xn = xn) = f(x1, x2, ..., xn) / ∫_{-∞}^{∞} ∫_{-∞}^{∞} f(r1, r2, x3, ..., xn) dr1 dr2,

and

∫_{-∞}^{∞} ∫_{-∞}^{∞} f(r1, r2, x3, ..., xn) dr1 dr2

gives us the joint pdf of (X3, ..., Xn) evaluated at (x3, ..., xn).
Independence

Definition: We say X1, X2, ..., Xn are independent if and only if

f(x1, ..., xn) = f_{X1}(x1) ··· f_{Xn}(xn).

That is, the joint pdf is equal to the product of the marginal densities.

If X1, X2, ..., Xn are independent, then

P(X1 ∈ A1, X2 ∈ A2, ..., Xn ∈ An) = P(X1 ∈ A1) P(X2 ∈ A2) ··· P(Xn ∈ An).

IID RVs: If X1, X2, ..., Xn are independent and have the same distribution, that is, they have the same pdf, then we say X1, X2, ..., Xn are independent and identically distributed (i.i.d.) random variables.
Factorization Theorem: X1, X2, ..., Xn are independent if and only if

f(x1, ..., xn) = h1(x1) ··· hn(xn).

Ex 1: Suppose the joint pdf of (X, Y, Z) is

f(x, y, z) = e^{-x-y-z} if 0 < x, y, z, and 0 otherwise.

Use the factorization theorem to show they are independent.

Solution:

f(x, y, z) = e^{-x-y-z} I(x > 0, y > 0, z > 0) = e^{-x} e^{-y} e^{-z} I(x > 0) I(y > 0) I(z > 0).

Let h1(x) = e^{-x} I(x > 0), h2(y) = e^{-y} I(y > 0), and h3(z) = e^{-z} I(z > 0). Then

f(x, y, z) = h1(x) h2(y) h3(z).

By the factorization theorem, we have shown X, Y, Z are independent.
Ex 2: Suppose X, Y, Z are i.i.d. random variables following the standard normal distribution. Write down the joint pdf of (X, Y, Z).

Solution: The pdf of the standard normal distribution is

f(s) = (1/√(2π)) e^{-s²/2}.

Therefore,

f(x, y, z) = f_X(x) f_Y(y) f_Z(z)
           = (1/√(2π)) e^{-x²/2} · (1/√(2π)) e^{-y²/2} · (1/√(2π)) e^{-z²/2}
           = (1/√(2π))³ e^{-(x² + y² + z²)/2}.
Ex 3: Suppose X1, ..., Xn are i.i.d. random variables following the Unif(0, a) distribution. Let X(n) be the maximum of X1, X2, ..., Xn. Find the pdf of X(n).

Solution: Note that since 0 < Xi < a, we know 0 < X(n) < a. We compute P(X(n) ≤ t) for any given t, 0 < t < a:

P(X(n) ≤ t) = P(X1 ≤ t, X2 ≤ t, ..., Xn ≤ t) = P(X1 ≤ t) P(X2 ≤ t) ··· P(Xn ≤ t).

We used the independence assumption. Also, by the identical-distribution assumption, we know

P(Xi ≤ t) = ∫_0^t f(x) dx = ∫_0^t (1/a) dx = t/a

for each i = 1, 2, ..., n. Hence

P(X(n) ≤ t) = (t/a)ⁿ.

Thus

f_{X(n)}(t) = d P(X(n) ≤ t) / dt = (n/a)(t/a)^{n-1},   0 < t < a.
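To close, here is a simulation sketch for Ex 3 (assuming NumPy; the values n = 5 and a = 2 are arbitrary illustrative choices, not from the notes) comparing the empirical behaviour of the maximum with the derived formulas P(X(n) ≤ t) = (t/a)ⁿ and E[X(n)] = na/(n+1), the latter obtained by integrating t · (n/a)(t/a)^{n-1} over (0, a).

    import numpy as np

    rng = np.random.default_rng(2)
    n, a = 5, 2.0                                    # illustrative choices only
    maxima = rng.uniform(0, a, size=(100_000, n)).max(axis=1)

    print(maxima.mean(), n * a / (n + 1))            # both approximately 1.667
    print((maxima <= 1.5).mean(), (1.5 / a) ** n)    # both approximately 0.237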
