6 Jointly continuous random variables

Again, we deviate from the order in the book for this chapter, so the subsections in this chapter do not correspond to those in the text.

6.1 Joint density functions


Recall that X is continuous if there is a function f (x) (the density) such that
\[ P(X \le t) = \int_{-\infty}^{t} f_X(x)\, dx \]

We generalize this to two random variables.


Definition 1. Two random variables X and Y are jointly continuous if there
is a function fX,Y (x, y) on R2 , called the joint probability density function,
such that
\[ P(X \le s,\ Y \le t) = \iint_{x \le s,\, y \le t} f_{X,Y}(x,y)\, dx\, dy \]

The integral is over {(x, y) : x ≤ s, y ≤ t}. We can also write the integral as
\[ P(X \le s,\ Y \le t) = \int_{-\infty}^{s} \left( \int_{-\infty}^{t} f_{X,Y}(x,y)\, dy \right) dx = \int_{-\infty}^{t} \left( \int_{-\infty}^{s} f_{X,Y}(x,y)\, dx \right) dy \]

In order for a function f (x, y) to be a joint density it must satisfy


\[ f(x,y) \ge 0, \qquad \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,y)\, dx\, dy = 1 \]

Just as with one random variable, the joint density function contains all
the information about the underlying probability measure if we only look at
the random variables X and Y . In particular, we can compute the probability
of any event defined in terms of X and Y just using f (x, y).
Here are some events defined in terms of X and Y :
{X ≤ Y }, {X² + Y² ≤ 1}, and {1 ≤ X ≤ 4, Y ≥ 0}. They can all be written
in the form {(X, Y ) ∈ A} for some subset A of R2.

Proposition 1. For A ⊂ R2 ,
\[ P((X,Y) \in A) = \iint_A f(x,y)\, dx\, dy \]

The two-dimensional integral is over the subset A of R2 . Typically, when


we want to actually compute this integral we have to write it as an iterated
integral. It is a good idea to draw a picture of A to help do this.
A rigorous proof of this theorem is beyond the scope of this course. In
particular we should note that there are issues involving σ-fields and constraints on A. Nonetheless, it is worth looking at how the proof might start
to get some practice manipulating integrals of joint densities.
If A = (−∞, s] × (−∞, t], then the equation is the definition of jointly
continuous. Now suppose A = (−∞, s] × (a, b]. Then we can write it as
A = [(−∞, s] × (−∞, b]] \ [(−∞, s] × (−∞, a]]. So we can write the event

{(X, Y ) ∈ A} = {(X, Y ) ∈ (−∞, s] × (−∞, b]} \ {(X, Y ) ∈ (−∞, s] × (−∞, a]}

Since the second event is contained in the first, taking probabilities gives the difference of the two double integrals, which is exactly the integral of f over A. More general sets A can then be handled by building them up from rectangles of this type.
Definition: Let A ⊂ R2. We say X and Y are uniformly distributed on A if
\[ f(x,y) = \begin{cases} \dfrac{1}{c}, & \text{if } (x,y) \in A \\ 0, & \text{otherwise} \end{cases} \]
where c is the area of A.

Example: Let X, Y be uniform on [0, 1] × [0, 2]. Find P(X + Y ≤ 1).
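A quick Monte Carlo check of this example (not part of the original notes): the region {x + y ≤ 1} inside the rectangle is a triangle of area 1/2, so the probability should be (1/2)/2 = 1/4. A minimal Python sketch, assuming numpy is available:

import numpy as np

rng = np.random.default_rng(0)
n = 10**6
x = rng.uniform(0.0, 1.0, n)    # X uniform on [0, 1]
y = rng.uniform(0.0, 2.0, n)    # Y uniform on [0, 2]

print(np.mean(x + y <= 1.0))    # should be close to 1/4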

Example: Let X, Y have density
\[ f(x,y) = \frac{1}{2\pi} \exp\!\left( -\tfrac{1}{2}(x^2 + y^2) \right) \]
Compute P(X ≤ Y ) and P(X² + Y² ≤ 1).
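A sketch of how these can be computed, using only symmetry and polar coordinates: since f(x, y) = f(y, x), the events {X ≤ Y } and {Y ≤ X} have equal probability, so P(X ≤ Y ) = 1/2. For the second probability,
\[ P(X^2 + Y^2 \le 1) = \int_0^{2\pi} \int_0^1 \frac{1}{2\pi} e^{-r^2/2}\, r\, dr\, d\theta = 1 - e^{-1/2}. \]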

Example: Now suppose X, Y have density
\[ f(x,y) = \begin{cases} e^{-x-y}, & \text{if } x, y \ge 0 \\ 0, & \text{otherwise} \end{cases} \]
Compute P(X + Y ≤ t).
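A sketch of the computation for t ≥ 0, integrating over the triangle {x ≥ 0, y ≥ 0, x + y ≤ t}:
\[ P(X + Y \le t) = \int_0^t \int_0^{t-x} e^{-x-y}\, dy\, dx = \int_0^t \left( e^{-x} - e^{-t} \right) dx = 1 - e^{-t} - t e^{-t}. \]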
What does the pdf mean? In the case of a single discrete RV, the pmf
has a very concrete meaning. f (x) is the probability that X = x. If X is a
single continuous random variable, then
\[ P(x \le X \le x+\delta) = \int_x^{x+\delta} f(u)\, du \approx \delta f(x) \]

If X, Y are jointly continuous, then

\[ P(x \le X \le x+\delta,\ y \le Y \le y+\delta) \approx \delta^2 f(x,y) \]

6.2 Independence and marginal distributions


Suppose we know the joint density fX,Y (x, y) of X and Y . How do we find
their individual densities fX (x), fY (y)? These are called marginal densities.
The cdf of X is

\[ F_X(x) = P(X \le x) = P(-\infty < X \le x,\ -\infty < Y < \infty) = \int_{-\infty}^{x} \left( \int_{-\infty}^{\infty} f_{X,Y}(u,y)\, dy \right) du \]

Differentiating this with respect to x, we get

\[ f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\, dy \]

In words, we get the marginal density of X by integrating y from −∞ to ∞ in the joint density.

Proposition 2. If X and Y are jointly continuous with joint density fX,Y (x, y),
then the marginal densities are given by
\[ f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\, dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x,y)\, dx \]
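For the joint density e^{−x−y} used above, the marginal can be checked symbolically. A minimal sketch (not part of the notes), assuming sympy is available:

import sympy as sp

x, y = sp.symbols('x y', positive=True)
joint = sp.exp(-x - y)                    # joint density e^{-x-y} on x, y >= 0

f_X = sp.integrate(joint, (y, 0, sp.oo))  # integrate out y to get the marginal of X
print(f_X)                                # exp(-x)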

We will define independence of two continuous random variables differently than the book. The two definitions are equivalent.

Definition 2. Let X, Y be jointly continuous random variables with joint


density fX,Y (x, y) and marginal densities fX (x), fY (y). We say they are
independent if

fX,Y (x, y) = fX (x)fY (y)

If we know the joint density of X and Y , then we can use the definition
to see if they are independent. But the definition is often used in a different
way. If we know the marginal densities of X and Y and we know that they
are independent, then we can use the definition to find their joint density.

Example: If X and Y are independent random variables and each has the
standard normal distribution, what is their joint density?
\[ f(x,y) = \frac{1}{2\pi} \exp\!\left( -\tfrac{1}{2}(x^2+y^2) \right) \]
Example: Suppose that X and Y have a joint density that is uniform on
the disc centered at the origin with radius 1. Are they independent?

Example: If X and Y have a joint density that is uniform on the square


[a, b] × [c, d], then they are independent.

Example: Suppose that X and Y have joint density
\[ f(x,y) = \begin{cases} e^{-x-y}, & \text{if } x, y \ge 0 \\ 0, & \text{otherwise} \end{cases} \]

Are X and Y independent?

Example: Suppose that X and Y are independent. X is uniform on [0, 1]


and Y has the Cauchy density.
(a) Find their joint density.
(b) Compute P(0 ≤ X ≤ 1/2, 0 ≤ Y ≤ 1)
(c) Compute P(Y ≥ X).

6.3 Expected value
If X and Y are jointly continuous random variables, then the mean of X
is still given by
\[ E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\, dx \]
If we write the marginal fX (x) in terms of the joint density, then this becomes
\[ E[X] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x\, f_{X,Y}(x,y)\, dx\, dy \]

Now suppose we have a function g(x, y) from R2 to R. Then we can define


a new random variable by Z = g(X, Y ). In a later section we will see how to
compute the density of Z from the joint density of X and Y . We could then
compute the mean of Z using the density of Z. Just as in the discrete case
there is a shortcut.
Theorem 1. Let X, Y be jointly continuous random variables with joint
density f (x, y). Let g(x, y) : R2 → R. Define a new random variable by
Z = g(X, Y ). Then
\[ E[Z] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x,y)\, f(x,y)\, dx\, dy \]

provided
\[ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} |g(x,y)|\, f(x,y)\, dx\, dy < \infty \]

An important special case is the following


Corollary 1. If X and Y are jointly continuous random variables and a, b
are real numbers, then
E[aX + bY ] = aE[X] + bE[Y ]

Example: X and Y have joint density
\[ f(x,y) = \begin{cases} x+y, & \text{if } 0 \le x \le 1,\ 0 \le y \le 1 \\ 0, & \text{otherwise} \end{cases} \]
Let Z = X + Y . Find the mean and variance of Z.
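A numerical check of this example (not in the notes): the exact answers work out to E[Z] = 7/6 and var(Z) = 5/36. A sketch using scipy's double integration:

from scipy.integrate import dblquad

# joint density f(x, y) = x + y on the unit square (zero elsewhere)
f = lambda y, x: x + y            # dblquad passes arguments as (y, x)

EZ, _  = dblquad(lambda y, x: (x + y) * f(y, x), 0, 1, lambda x: 0.0, lambda x: 1.0)
EZ2, _ = dblquad(lambda y, x: (x + y) ** 2 * f(y, x), 0, 1, lambda x: 0.0, lambda x: 1.0)

print(EZ, EZ2 - EZ ** 2)          # approximately 7/6 and 5/36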
We now consider independence and expectation.

Theorem 2. If X and Y are independent and jointly continuous, then

E[XY ] = E[X] E[Y ]

Proof. Since they are independent, fX,Y (x, y) = fX (x)fY (y). So


\[ E[XY] = \iint xy\, f_X(x)\, f_Y(y)\, dx\, dy = \left( \int x\, f_X(x)\, dx \right) \left( \int y\, f_Y(y)\, dy \right) = E[X]\, E[Y] \]

6.4 Function of two random variables


Suppose X and Y are jointly continuous random variables. Let g(x, y) be a
function from R2 to R. We define a new random variable by Z = g(X, Y ).
Recall that we have already seen how to compute the expected value of Z. In
this section we will see how to compute the density of Z. The general strategy
is the same as when we considered functions of one random variable: we first
compute the cumulative distribution function.
Example: Let X and Y be independent random variables, each of which is
uniformly distributed on [0, 1]. Let Z = XY . First note that the range of Z
is [0, 1].
\[ F_Z(z) = P(Z \le z) = \iint_A 1\, dx\, dy \]
where A is the region
\[ A = \{(x,y) : 0 \le x \le 1,\ 0 \le y \le 1,\ xy \le z\} \]

PICTURE

\[ F_Z(z) = z + \int_z^1 \left( \int_0^{z/x} 1\, dy \right) dx = z + \int_z^1 \frac{z}{x}\, dx = z + z \ln x \Big|_z^1 = z - z\ln z \]

This is the cdf of Z. So we differentiate to get the density.


\[ \frac{d}{dz} F_Z(z) = \frac{d}{dz}\left( z - z\ln z \right) = 1 - \ln z - z \cdot \frac{1}{z} = -\ln z \]
\[ f_Z(z) = \begin{cases} -\ln z, & \text{if } 0 \le z \le 1 \\ 0, & \text{otherwise} \end{cases} \]
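A quick Monte Carlo sanity check of this density (not in the notes), comparing a histogram of Z = XY with −ln z:

import numpy as np

rng = np.random.default_rng(0)
z = rng.uniform(size=10**6) * rng.uniform(size=10**6)   # Z = X Y with X, Y uniform on [0, 1]

hist, edges = np.histogram(z, bins=50, range=(0.0, 1.0), density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
print(np.abs(hist - (-np.log(mid))).max())   # deviation comes from binning near z = 0 and Monte Carlo error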

Example: Let X and Y be independent random variables, each of which is


exponential with parameter λ. Let Z = X + Y . Find the density of Z.
You should get a gamma distribution with the same λ and w = 2.
This is a special case of a much more general result: the sum of gamma(λ, w1 )
and gamma(λ, w2 ) is gamma(λ, w1 + w2 ). We could try to show this as we
did in the previous example, but it is much easier to use moment generating
functions, which we will introduce in the next section.

Example: Let (X, Y ) be uniformly distributed on the triangle with vertices


at (0, 0), (1, 0), (0, 1). Let Z = X + Y . Find the pdf of Z.

One of the most important examples of a function of two random variables


is Z = X + Y . In this case

\[ F_Z(z) = P(Z \le z) = P(X + Y \le z) = \int_{-\infty}^{\infty} \left( \int_{-\infty}^{z-x} f(x,y)\, dy \right) dx \]

To get the density of Z we need to differentiate this with respect to z. The
only z dependence is in the upper limit of the inside integral.
\[ f_Z(z) = \frac{d}{dz} F_Z(z) = \int_{-\infty}^{\infty} \frac{d}{dz} \left( \int_{-\infty}^{z-x} f(x,y)\, dy \right) dx = \int_{-\infty}^{\infty} f(x, z-x)\, dx \]

If X and Y are independent, then this becomes
\[ f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(z-x)\, dx \]

This is known as a convolution. We can use this formula to find the density of
the sum of two independent random variables. But in some cases it is easier
to do this using generating functions which we study in the next section.
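When no closed form is handy, the convolution can also be evaluated numerically. A minimal sketch (not from the notes) for two Exp(1) densities, where the exact answer is the gamma density z e^{−z}:

import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(-x) * (x >= 0)              # Exp(1) density

def f_Z(z):
    # convolution integral: integrate f_X(x) f_Y(z - x) over x in [0, z]
    val, _ = quad(lambda x: f(x) * f(z - x), 0.0, max(z, 0.0))
    return val

print(f_Z(2.0), 2.0 * np.exp(-2.0))              # both approximately 0.2707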
Example: Let X and Y be independent random variables each of which has
the standard normal distribution. Find the density of Z = X + Y .
We need to compute the convolution
\[ f_Z(z) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp\!\left( -\tfrac{1}{2}x^2 - \tfrac{1}{2}(z-x)^2 \right) dx = \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp\!\left( -x^2 - \tfrac{1}{2}z^2 + xz \right) dx \]
\[ = \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp\!\left( -(x - z/2)^2 - \tfrac{1}{4}z^2 \right) dx = \frac{1}{2\pi}\, e^{-z^2/4} \int_{-\infty}^{\infty} \exp\!\left( -(x - z/2)^2 \right) dx \]

Now the substitution u = x − z/2 shows


\[ \int_{-\infty}^{\infty} \exp\!\left( -(x - z/2)^2 \right) dx = \int_{-\infty}^{\infty} \exp(-u^2)\, du \]
This is a constant; it does not depend on z. So fZ (z) = c e^{−z²/4}. Another
simple substitution allows one to evaluate the constant, but there is no need.
We can already see that Z has a normal distribution with mean zero and
variance 2. The constant is whatever is needed to normalize the distribution.

6.5 Moment generating functions


This will be very similar to what we did in the discrete case.
Definition 3. For a continuous random variable X, the moment generating
function (mgf ) of X is
\[ M_X(t) = E[e^{tX}] = \int_{-\infty}^{\infty} e^{tx} f_X(x)\, dx \]

Example: Compute it for the gamma distribution and find
\[ M(t) = \left( \frac{\lambda}{\lambda - t} \right)^w \]
A special case of the gamma distribution is the exponential distribution; you just take w = 1. So we see that for the exponential M (t) = λ/(λ − t).
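A quick numerical check of the exponential mgf (an illustration, not part of the notes): for λ = 2 and t = 0.5 the defining integral should equal λ/(λ − t) = 4/3.

import numpy as np
from scipy.integrate import quad

lam, t = 2.0, 0.5
mgf, _ = quad(lambda x: np.exp(t * x) * lam * np.exp(-lam * x), 0.0, np.inf)
print(mgf, lam / (lam - t))   # both approximately 1.3333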

Proposition 3. (1) Let X be a continuous random variable with mgf MX (t). Then
\[ E[X^k] = \frac{d^k}{dt^k} M_X(t) \Big|_{t=0} \]
(2) If X and Y are independent continuous random variables then
\[ M_{X+Y}(t) = M_X(t)\, M_Y(t) \]
(3) If the mgf of X is MX (t) and we let Y = aX + b, then
\[ M_Y(t) = e^{tb} M_X(at) \]
Proof. For (1)
\[ \frac{d^k}{dt^k} M_X(t)\Big|_{t=0} = \int_{-\infty}^{\infty} \frac{d^k}{dt^k}\left[ f_X(x)\, e^{tx} \right]\Big|_{t=0}\, dx = \int_{-\infty}^{\infty} f_X(x)\, \frac{d^k}{dt^k} e^{tx}\Big|_{t=0}\, dx \]
\[ = \int_{-\infty}^{\infty} f_X(x)\, x^k e^{tx}\Big|_{t=0}\, dx = \int_{-\infty}^{\infty} f_X(x)\, x^k\, dx = E[X^k] \]

If X and Y are independent, then


\[ M_{X+Y}(t) = E[\exp(t(X+Y))] = E[\exp(tX)\exp(tY)] = E[\exp(tX)]\, E[\exp(tY)] = M_X(t)\, M_Y(t) \]
This calculation uses the fact that since X and Y are independent, exp(tX)
and exp(tY ) are independent random variables; we have not shown this yet, but it follows from Theorem 4 below.
Part (3) is just
\[ M_Y(t) = E[e^{tY}] = E[e^{t(aX+b)}] = e^{tb} E[e^{taX}] = e^{tb} M_X(at) \]

As an application of part (3) we have
Example: Let X have the exponential distribution with parameter λ. Let
Y = λX. Use mgf’s to show Y has the exponential distribution with parameter 1.
Example: In the homework you show that the mgf for the normal density is
\[ M_X(t) = \exp(\mu t)\, M_Z(\sigma t) = \exp\!\left( \mu t + \tfrac{1}{2}\sigma^2 t^2 \right) \]
Proposition 4. (a) If X1 , X2 , · · · , Xn are independent and each is normal
with mean µi and variance σi2 , then Y = X1 + X2 + · · · + Xn has a normal
distribution with mean µ and variance σ 2 given by
\[ \mu = \sum_{i=1}^{n} \mu_i, \qquad \sigma^2 = \sum_{i=1}^{n} \sigma_i^2 \]

(b) If X1 , X2 , · · · , Xn are independent and each is exponential with parameter


λ, then Y = X1 + X2 + · · · + Xn has a gamma distribution with parameters
λ and w = n.
(c) If X1 , X2 , · · · , Xn are independent and each is gamma with parameters
λ, wi , then Y = X1 + X2 + · · · + Xn has a gamma distribution with parameters
λ and w = w1 + · · · + wn .
We will prove the theorem by proving statements about generating func-
tions. For example, for part (a) what we will really prove is that the moment
generating function of Y is that of a normal with the stated parameters.
To complete the proof we need to know that if two random variables have
the same moment generating functions then they have the same densities.
This is a theorem but it is a hard theorem and it requires some technical
assumptions on the random variables. We will ignore these subtleties and
just assume that if two RV’s have the same mgf, then they have the same
density.
Proof. We prove all three parts by simply computing the mgf’s involved.
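As a numerical illustration of part (b) (a sanity check, not part of the proof): the sum of n independent Exp(λ) random variables should have mean n/λ and variance n/λ², matching the gamma(λ, n) distribution.

import numpy as np

rng = np.random.default_rng(0)
lam, n = 2.0, 5
samples = rng.exponential(scale=1.0 / lam, size=(10**6, n)).sum(axis=1)

print(samples.mean(), n / lam)       # both approximately 2.5
print(samples.var(), n / lam**2)     # both approximately 1.25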

6.6 Cumulative distribution functions and more independence
Recall that for a discrete random variable X we have a probability mass
function fX (x) which is just fX (x) = P(X = x). And for a continuous
random variable X we have a probability density function fX (x). It is a
density in the sense that if ε > 0 is small, then P(x ≤ X ≤ x + ε) ≈ f (x)ε.
For both types of random variables we have a cumulative distribution
function and its definition is the same for all types of RV’s.

Definition 4. Let X, Y be random variables (discrete or continuous). Their


joint (cumulative) distribution function is

FX,Y (x, y) = P(X ≤ x, Y ≤ y)

If X and Y are jointly continuous then we can compute the joint cdf from
their joint pdf:
\[ F_{X,Y}(x,y) = \int_{-\infty}^{x} \left( \int_{-\infty}^{y} f(u,v)\, dv \right) du \]

If we know the joint cdf, then we can compute the joint pdf by taking partial
derivatives of the above:
\[ \frac{\partial^2}{\partial x\, \partial y} F_{X,Y}(x,y) = f(x,y) \]
The joint cdf has properties similar to the cdf for a single RV.

Proposition 5. Let F (x, y) be the joint cdf of two continuous random vari-
ables. Then F (x, y) is a continuous function on R2 and

\[ \lim_{x,y \to -\infty} F(x,y) = 0, \qquad \lim_{x,y \to \infty} F(x,y) = 1, \]
\[ F(x_1, y) \le F(x_2, y) \ \text{if } x_1 \le x_2, \qquad F(x, y_1) \le F(x, y_2) \ \text{if } y_1 \le y_2, \]
\[ \lim_{x \to \infty} F(x,y) = F_Y(y), \qquad \lim_{y \to \infty} F(x,y) = F_X(x) \]

We will use the joint cdf to prove more results about independence of RV’s.

Theorem 3. If X and Y are jointly continuous random variables then they
are independent if and only if FX,Y (x, y) = FX (x)FY (y).
The theorem is true for discrete random variables as well.
Proof.

Example: Suppose that the joint cdf of X and Y is
\[ F(x,y) = \begin{cases} \frac{1}{2}(1 - e^{-2x})(y+1), & \text{if } x \ge 0,\ -1 \le y \le 1 \\ 1 - e^{-2x}, & \text{if } x \ge 0,\ y \ge 1 \\ 0, & \text{if } x \ge 0,\ y < -1 \\ 0, & \text{if } x < 0 \end{cases} \]
Show that X and Y are independent and find their joint density.
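A sketch of the computation: on the region x ≥ 0, −1 ≤ y ≤ 1 the mixed partial gives
\[ f(x,y) = \frac{\partial^2}{\partial x\, \partial y} \left[ \tfrac{1}{2}(1 - e^{-2x})(y+1) \right] = e^{-2x}, \]
and f = 0 elsewhere. Since FX (x) = lim_{y→∞} F(x, y) = 1 − e^{−2x} and FY (y) = lim_{x→∞} F(x, y) = (y + 1)/2 for −1 ≤ y ≤ 1, we have F(x, y) = FX (x)FY (y), so X and Y are independent; equivalently f(x, y) = 2e^{−2x} · (1/2) = fX (x)fY (y).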
Theorem 4. If X and Y are independent jointly continuous random vari-
ables and g and h are functions from R to R then g(X) and h(Y ) are inde-
pendent random variables.
We will only prove a special case of the theorem. In the homework you
prove two other special cases.
Proof. We prove the theorem for g and h that are increasing. We also assume
they are differentiable. Let W = g(X), Z = h(Y ). By the previous theorem
we can show that W and Z are independent by showing that FW,Z (w, z) =
FW (w)FZ (z). We have

FW,Z (w, z) = P(g(X) ≤ w, h(Y ) ≤ z)

Because g and h are increasing, the event {g(X) ≤ w, h(Y ) ≤ z} is the same
as the event {X ≤ g −1 (w), Y ≤ h−1 (z)}. So

FW,Z (w, z) = P(X ≤ g −1 (w), Y ≤ h−1 (z))


= FX,Y (g −1 (w), h−1 (z)) = FX (g −1 (w))FY (h−1 (z))

where the last equality comes from the previous theorem and the indepen-
dence of X and Y . The individual cdfs of W and Z are

FW (w) = P(X ≤ g −1 (w)) = FX (g −1 (w))


FZ (z) = P(Y ≤ h−1 (z)) = FY (h−1 (z))

So we have shown FW,Z (w, z) = FW (w)FZ (z).

Corollary 2. If X and Y are independent jointly continuous random vari-
ables and g and h are functions from R to R then
E[g(X)h(Y )] = E[g(X)]E[h(Y )]
Recall that for any two random variables X and Y , we have E[X + Y ] =
E[X] + E[Y ]. If they are independent we also have
Theorem 5. If X and Y are independent and jointly continuous, then
var(X + Y ) = var(X) + var(Y )
Proof.

6.7 Change of variables


Suppose we have two random variables X and Y and we know their joint
density. We have two functions g : R2 → R and h : R2 → R, and we define
two new random variables by U = g(X, Y ), W = h(X, Y ). Can we find
the joint density of U and W ? In principle we can do this by computing
their joint cdf and then taking partial derivatives. In practice this can be a
mess. There is another way, involving Jacobians, which we will study in this
section. But we start by illustrating the cdf approach with an example.
Example Let X and Y be independent standard normal RV’s. Let U =
X + Y and W = X − Y . Find the joint density of U and W . After a lot of
computation you should find that U and W are independent and each is a
normal RV with mean zero and variance 2.
There is another way to compute the joint density of U, W that we
will now study. First we return to the case of a function of a single random
variable. Suppose that X is a continuous random variable and we know its
density. g is a function from R to R and we define a new random variable
Y = g(X). We want to find the density of Y . Our previous approach was to
compute the cdf first. Now suppose that g is strictly increasing on the range
of X. Then we have the following formula.
Proposition 6. If X is a continuous random variable whose range is D and
g : D → R is strictly increasing and differentiable, then
\[ f_Y(y) = f_X(g^{-1}(y))\, \frac{d}{dy} g^{-1}(y) \]

Proof.
\[ P(Y \le y) = P(g(X) \le y) = P(X \le g^{-1}(y)) = \int_{-\infty}^{g^{-1}(y)} f_X(x)\, dx \]

Now differentiate both sides with respect to y to finish the proof.


We review some multivariate calculus. Let D and S be open subsets of R2. Let T (x, y) be a map from D to S that is 1-1 and onto. (So it has an inverse.) We also assume it is differentiable. For each point in D, T (x, y) is in R2. So we can write T as T (x, y) = (u(x, y), w(x, y)). We have an integral
\[ \iint_D f(x,y)\, dx\, dy \]
that we want to rewrite as an integral over S with respect to u and w. This is like doing a substitution in a one-dimensional integral. In that case you have dx = (dx/du) du. The analog of dx/du here is the Jacobian
\[ J(u,w) = \det \begin{pmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial w} \\[6pt] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial w} \end{pmatrix} = \frac{\partial x}{\partial u}\frac{\partial y}{\partial w} - \frac{\partial x}{\partial w}\frac{\partial y}{\partial u} \]
We then have
\[ \iint_D f(x,y)\, dx\, dy = \iint_S f(T^{-1}(u,w))\, |J(u,w)|\, du\, dw \]

Often f (T −1 (u, w)) is simply written as f (u, w). In practice you write f, which is originally a function of x and y, as a function of u and w.
If A is a subset of D, then we have
\[ \iint_A f(x,y)\, dx\, dy = \iint_{T(A)} f(T^{-1}(u,w))\, |J(u,w)|\, du\, dw \]

We now state what this result says about joint pdf’s.


Proposition 7. Let T (x, y) be a 1-1, onto map from D to S. Let X, Y be
random variables such that the range of (X, Y ) is D, and let fX,Y (x, y) be their
joint density. Define two new random variables by (U, W ) = T (X, Y ). Then
the range of (U, W ) is S and their joint pdf on this range is
\[ f_{U,W}(u,w) = f(T^{-1}(u,w))\, |J(u,w)| \]
where the Jacobian J(u, w) is defined above.

Example - Polar coordinates Let X and Y be independent standard

normal random variables. Define new random variables R, Θ by

x = r cos(θ), y = r sin(θ)

Find the joint density of R, Θ.


Some calculation shows the Jacobian is r. (This is the same r you saw in
vector calc: dx dy = r dr dθ.) And the joint density is
\[ f_{R,\Theta}(r,\theta) = \begin{cases} \dfrac{1}{2\pi}\, r\, e^{-r^2/2}, & \text{if } r \ge 0,\ 0 \le \theta \le 2\pi \\ 0, & \text{otherwise} \end{cases} \]
Note that this implies that R and Θ are independent.
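For the record, the calculation of the Jacobian referred to above: with x = r cos(θ), y = r sin(θ),
\[ J(r,\theta) = \det \begin{pmatrix} \frac{\partial x}{\partial r} & \frac{\partial x}{\partial \theta} \\[4pt] \frac{\partial y}{\partial r} & \frac{\partial y}{\partial \theta} \end{pmatrix} = \det \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix} = r\cos^2\theta + r\sin^2\theta = r, \]
so fR,Θ (r, θ) = fX,Y (r cos θ, r sin θ) · r = (1/2π) r e^{−r²/2}.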
Example We redo the example that we started this section with and did
using the joint cdf. X and Y are independent standard normal RV’s. U =
X + Y and W = X − Y .

Example Let X and Y be independent random variables. They both have


an exponential distribution with λ = 1. Let

\[ U = X + Y, \qquad W = \frac{X}{X+Y} \]
Find the joint density of U and W .
Let T (x, y) = (x + y, x/(x + y)). Then T is a bijection from [0, ∞) × [0, ∞)
onto [0, ∞) × [0, 1]. We need to find its inverse, i.e., find x, y in terms of u, w.
Multiply the two equations to get x = uw. Then y = u − x = u − uw. So
T −1 (u, w) = (uw, u − uw). And so
\[ J(u,w) = \det \begin{pmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial w} \\[4pt] \frac{\partial y}{\partial u} & \frac{\partial y}{\partial w} \end{pmatrix} = \det \begin{pmatrix} w & u \\ 1-w & -u \end{pmatrix} = -u \]
So
\[ f_{U,W}(u,w) = \begin{cases} u\, e^{-u}, & \text{if } u \ge 0,\ 0 \le w \le 1 \\ 0, & \text{otherwise} \end{cases} \]
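A Monte Carlo sanity check of this factorization (not in the notes): U should be gamma(λ = 1, w = 2), W should be uniform on [0, 1], and they should be uncorrelated since they are independent.

import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(size=10**6)        # Exp(1)
y = rng.exponential(size=10**6)        # Exp(1)
u, w = x + y, x / (x + y)

print(u.mean(), u.var())               # approximately 2 and 2 (gamma with lambda = 1, w = 2)
print(w.mean(), w.var())               # approximately 0.5 and 1/12 (uniform on [0, 1])
print(np.corrcoef(u, w)[0, 1])         # approximately 0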

Example Let X and Y be independent random variables. X has a gamma
distribution with parameters λ = 1 and w = 2. Y has an exponential
distribution with parameter λ = 1. Let U = X + Y and W = Y /X. Find
the joint pdf of U, W .
The inverse map is X = U/(1 + W ), Y = U W/(1 + W ), and the Jacobian is u/(1 + w)².

Bivariate normal If X and Y are independent standard normal RV’s, then
their joint density is proportional to exp(−(x² + y²)/2). This is a special case
of a bivariate normal distribution. In the more general case they need not
be independent. We first consider a special case of the bivariate normal. Let
−1 < ρ < 1. Define
\[ f(x,y) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\!\left( -\frac{1}{2(1-\rho^2)} \left( x^2 - 2\rho xy + y^2 \right) \right) \]

You can compute the marginals of this joint distribution by the usual trick of
completing the square. You find that X and Y both have a standard normal
distribution. Note that the stuff in the exponential is a quadratic form in x
and y. A more general quadratic form would have three parameters:

\[ \exp\!\left( -(Ax^2 + 2Bxy + Cy^2) \right) \]
In order for the integral to converge, the quadratic form Ax² + 2Bxy + Cy² must be positive definite.
Now suppose we start with two independent standard normal random variables X and Y and define
\[ U = aX + bY, \qquad W = cX + dY \]
where a, b, c, d are real numbers. In matrix notation
\[ \begin{pmatrix} U \\ W \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix} \]
What is the joint density of U and W ? The transformation T is linear and
so its inverse is linear (assuming it is invertible). So the Jacobian will just
be a constant, and the joint density of U, W will be of the form exp(−mess/2),
where mess is what we get when we rewrite x² + y² in terms of u and w.
One can argue this will be of the form Au² + 2Buw + Cw². So we get some sort of
bivariate normal.

This can be generalized to n RV’s. A joint normal (or Gaussian) distri-
bution is of the form
\[ c \exp\!\left( -\tfrac{1}{2} (x, Mx) \right) \]
where M is a positive definite n by n matrix and c is the normalizing constant.

Correlation coefficient
If X and Y are independent, then E[XY ] − E[X]E[Y ] = 0. If they are
not independent, it need not be zero, and it is in some sense a measure of
how dependent they are.

Definition 5. The covariance of X and Y is

cov(X, Y ) = E[XY ] − E[X]E[Y ]

The correlation coefficient is
\[ \rho(X,Y) = \frac{\operatorname{cov}(X,Y)}{\sqrt{\operatorname{var}(X)}\,\sqrt{\operatorname{var}(Y)}} \]

The correlation coefficient has the advantage that it is scale invariant:
ρ(aX, bY ) = ρ(X, Y ) for constants a, b > 0. It can be shown that for any random variables −1 ≤
ρ(X, Y ) ≤ 1.

Bivariate normal - cont We return to the joint density


\[ f(x,y) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\!\left( -\frac{1}{2(1-\rho^2)} \left( x^2 - 2\rho xy + y^2 \right) \right) \]

Note that f (−x, −y) = f (x, y). This implies E[X] = E[Y ] = 0. So
cov(X, Y ) = E[XY ].

\[ E[XY] = \frac{1}{2\pi\sqrt{1-\rho^2}} \iint xy \exp\!\left( -\frac{x^2 - 2\rho xy + y^2}{2(1-\rho^2)} \right) dx\, dy \]
\[ = \frac{1}{2\pi\sqrt{1-\rho^2}} \int x \exp\!\left( -\frac{x^2}{2(1-\rho^2)} \right) \left( \int y \exp\!\left( -\frac{y^2 - 2\rho xy}{2(1-\rho^2)} \right) dy \right) dx \]
\[ = \frac{1}{2\pi\sqrt{1-\rho^2}} \int x \exp\!\left( -\frac{x^2}{2(1-\rho^2)} \right) \left( \int y \exp\!\left( -\frac{(y-\rho x)^2 - \rho^2 x^2}{2(1-\rho^2)} \right) dy \right) dx \]
\[ = \frac{1}{2\pi\sqrt{1-\rho^2}} \int x \exp\!\left( -\tfrac{1}{2}x^2 \right) \left( \int (y + \rho x) \exp\!\left( -\frac{y^2}{2(1-\rho^2)} \right) dy \right) dx \]
\[ = \rho\, \frac{1}{2\pi\sqrt{1-\rho^2}} \left( \int x^2 \exp\!\left( -\tfrac{1}{2}x^2 \right) dx \right) \left( \int \exp\!\left( -\frac{y^2}{2(1-\rho^2)} \right) dy \right) = \rho \]

So the correlation coefficient is ρ. Of course this is why we wrote the density in the form that we did.
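A numerical illustration (not in the notes): sampling from this bivariate normal with, say, ρ = 0.7 and computing the sample correlation should give back roughly 0.7.

import numpy as np

rng = np.random.default_rng(0)
rho = 0.7
cov = [[1.0, rho], [rho, 1.0]]                   # covariance matrix of (X, Y)
xy = rng.multivariate_normal([0.0, 0.0], cov, size=10**6)

print(np.corrcoef(xy[:, 0], xy[:, 1])[0, 1])     # approximately 0.7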

6.8 Conditional density and expectation


We first review what we have done. For events A, B,
\[ P(A|B) = \frac{P(A \cap B)}{P(B)} \]

provided P(B) > 0. If we define Q(A) = P(A|B), then Q is a new probability


measure.
Let X be a discrete RV with pmf fX (x). If we know B occurs the pmf
for X will be different. The conditional pmf of X given B is
\[ f(x|B) = P(X = x|B) = \frac{P(X = x,\ B)}{P(B)} \]
The conditional expectation of X given B is
\[ E[X|B] = \sum_x x\, f(x|B) \]

A partition is a collection of disjoint events Bn whose union is all of the


sample space Ω. The partition theorem says that for a random variable X,
\[ E[X] = \sum_n E[X|B_n]\, P(B_n) \]

Most of our applications were of the following form. Let Y be another discrete
RV. Define Bn = {Y = n} where n ranges over the range of Y . Then
\[ E[X] = \sum_n E[X|Y = n]\, P(Y = n) \]

Now suppose X and Y are continuous random variables. We want to
condition on Y = y. We cannot do this since P(Y = y) = 0. How can we
make sense of something like P(a ≤ X ≤ b|Y = y) ? We can define it by a
limiting process:

\[ \lim_{\epsilon \to 0} P(a \le X \le b \mid y - \epsilon \le Y \le y + \epsilon) \]

Now let f (x, y) be the joint pdf of X and Y .


\[ P(a \le X \le b \mid y-\epsilon \le Y \le y+\epsilon) = \frac{\int_a^b \left( \int_{y-\epsilon}^{y+\epsilon} f(u,w)\, dw \right) du}{\int_{-\infty}^{\infty} \left( \int_{y-\epsilon}^{y+\epsilon} f(u,w)\, dw \right) du} \]

Assuming f is continuous and ε is small,
\[ \int_{y-\epsilon}^{y+\epsilon} f(u,w)\, dw \approx 2\epsilon f(u,y) \]

So the above just becomes


\[ \frac{\int_a^b 2\epsilon f(u,y)\, du}{\int_{-\infty}^{\infty} 2\epsilon f(u,y)\, du} = \int_a^b \frac{f(u,y)}{f_Y(y)}\, du \]

This motivates the following definition:

Definition 6. Let X, Y be jointly continuous RV’s with pdf fX,Y (x, y). The
conditional density of X given Y = y is

\[ f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}, \quad \text{if } f_Y(y) > 0 \]

When fY (y) = 0 we can just define it to be 0. We also define


\[ P(a \le X \le b \mid Y = y) = \int_a^b f_{X|Y}(x|y)\, dx \]

We have taken the above as definitions. We could instead have defined fX|Y and
P(a ≤ X ≤ b|Y = y) as limits and then proved the above formulas as theorems.
What happens if X and Y are independent? Then f (x, y) = fX (x)fY (y).
So fX|Y (x|y) = fX (x) as we would expect.

Example (X, Y ) is uniformly distributed on the triangle with vertices (0, 0), (0, 1)
and (1, 0). Find the conditional density of X given Y .
The joint density is 2 on the triangle.
\[ f_Y(y) = \int_0^{1-y} 2\, dx = 2(1-y), \qquad 0 \le y \le 1 \]

And we have
\[ f_{X|Y}(x|y) = \frac{2}{2(1-y)} = \frac{1}{1-y}, \qquad 0 \le x \le 1-y \]
So given Y = y, X is uniformly distributed on [0, 1 − y].
The conditional expectation is defined in the obvious way
Definition 7.
\[ E[X|Y=y] = \int x\, f_{X|Y}(x|y)\, dx \]

Note that E[X|Y = y] is a function of y. In our example, E[X|Y = y] = (1 − y)/2.
Example Let X, Y be independent, each having an exponential distribution
with the same λ. Let Z = X + Y . Find fZ|X , fX|Z , E[Z|X = x] and
E[X|Z = z].
First we need to find the joint density of X and Z. We use change of
variables. Let U = X, W = X + Y . The inverse is x = u, y = w − u. The
Jacobian is
\[ J(u,w) = \det \begin{pmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial w} \\[4pt] \frac{\partial y}{\partial u} & \frac{\partial y}{\partial w} \end{pmatrix} = \det \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix} = 1 \]

We have fX,Y (x, y) = λ² exp(−λ(x + y)) for x, y ≥ 0. So
\[ f_{X,Z}(x,z) = \begin{cases} \lambda^2 e^{-\lambda z}, & \text{if } 0 \le x \le z \\ 0, & \text{otherwise} \end{cases} \]
It is convenient to write the condition on x, z as 1(0 ≤ x ≤ z). This notation
means the function is 1 if 0 ≤ x ≤ z is satisfied and 0 if it is not. So
fX,Z (x, z) = λ²e^{−λz} 1(0 ≤ x ≤ z). So we have for x ≥ 0,
\[ f_{Z|X}(z|x) = \frac{f_{X,Z}(x,z)}{f_X(x)} = \frac{\lambda^2 e^{-\lambda z}\, 1(0 \le x \le z)}{\lambda e^{-\lambda x}} = \lambda e^{-\lambda(z-x)}\, 1(0 \le x \le z) \]

Using this and the substitution u = z − x, we find
\[ E[Z|X=x] = \int_x^{\infty} z\, \lambda e^{-\lambda(z-x)}\, dz = \int_0^{\infty} (u+x)\, \lambda e^{-\lambda u}\, du = x + \frac{1}{\lambda} \]
For the other one, we first find the marginal for Z:
\[ f_Z(z) = \int_{-\infty}^{\infty} f_{X,Z}(x,z)\, dx = \int_{-\infty}^{\infty} \lambda^2 e^{-\lambda z}\, 1(0 \le x \le z)\, dx = \int_0^z \lambda^2 e^{-\lambda z}\, dx = \lambda^2 z e^{-\lambda z} \]
So we have
\[ f_{X|Z}(x|z) = \frac{f_{X,Z}(x,z)}{f_Z(z)} = \frac{\lambda^2 e^{-\lambda z}\, 1(0 \le x \le z)}{\lambda^2 z e^{-\lambda z}} = \frac{1}{z}\, 1(0 \le x \le z) \]
So given that Z = z, X is uniformly distributed on [0, z]. So E[X|Z = z] =
z/2.
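A Monte Carlo check of these two conditional expectations (not in the notes), conditioning on a thin slab around the target value; here λ = 1, so E[Z|X = 1] should be about 2 and E[X|Z = 2] should be about 1.

import numpy as np

rng = np.random.default_rng(0)
eps = 0.01
x = rng.exponential(size=10**7)    # Exp(1)
y = rng.exponential(size=10**7)    # Exp(1)
z = x + y

print(z[np.abs(x - 1.0) < eps].mean())   # approximately 1 + 1/lambda = 2
print(x[np.abs(z - 2.0) < eps].mean())   # approximately z/2 = 1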
Recall the partition theorem for discrete RV’s X and Y ,
\[ E[Y] = \sum_n E[Y|X = n]\, P(X = n) \]

For continuous random variables we have


Theorem 6. Let X, Y be jointly continuous random variables. Then
\[ E[Y] = \int E[Y|X=x]\, f_X(x)\, dx \]

where the integral is over the range of x where fX (x) > 0, i.e., the range of
X.
Proof. Recall the definition:
\[ E[Y|X=x] = \int y\, f_{Y|X}(y|x)\, dy = \int y\, \frac{f_{Y,X}(y,x)}{f_X(x)}\, dy \]
So
\[ \int E[Y|X=x]\, f_X(x)\, dx = \int \left( \int y\, f_{X,Y}(x,y)\, dx \right) dy = E[Y] \]

Recall that the partition theorem was useful when it was hard to compute
the expected value of Y , but easy to compute the expected value of Y given
that some other random variable is known.
There was another partition theorem for discrete random variables that
gave a formula for P(B), where B is an event, in terms of conditional probabilities
of B. Here is a special case where B and the partition both come
from random variables. Let X and Y be discrete RV’s. Then
\[ P(a \le Y \le b) = \sum_n P(a \le Y \le b \mid X = n)\, P(X = n) \]

For continuous random variables we have

Theorem 7. Let X, Y be jointly continuous random variables. Then


\[ P(a \le X \le b) = \int P(a \le X \le b \mid Y = y)\, f_Y(y)\, dy \]
where
\[ P(a \le X \le b \mid Y = y) = \int_a^b f_{X|Y}(x|y)\, dx \]

Proof. Straightforward using the above definitions.

Example: Quality of lightbulbs varies because ... For fixed factory conditions, the lifetime of the lightbulb has an exponential distribution. We
model this by assuming the parameter λ is uniformly distributed between
5 × 10−4 and 8 × 10−4 . Find the mean lifetime of a lightbulb and the pdf for
its lifetime. Is it exponential?
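A sketch of how the partition theorem applies here (writing T for the lifetime, a symbol not used in the notes, and using 1/(3 × 10⁻⁴) for the uniform density of λ):
\[ E[T] = \int_{5\times 10^{-4}}^{8\times 10^{-4}} E[T \mid \lambda]\, \frac{d\lambda}{3\times 10^{-4}} = \int_{5\times 10^{-4}}^{8\times 10^{-4}} \frac{1}{\lambda}\, \frac{d\lambda}{3\times 10^{-4}} = \frac{\ln(8/5)}{3\times 10^{-4}} \approx 1567, \]
\[ f_T(t) = \int_{5\times 10^{-4}}^{8\times 10^{-4}} \lambda e^{-\lambda t}\, \frac{d\lambda}{3\times 10^{-4}}, \]
which is a mixture of exponentials and is not itself exponential.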
Example: Let X, Y be independent standard normal RV’s. Let Z = X + Y .
Find fZ|X , fX|Z , E[Z|X = x] and E[X|Z = z].

