Random Variables
Again, we deviate from the order in the book for this chapter, so the subsections in this chapter do not correspond to those in the text.
The integral is over {(x, y) : x ≤ s, y ≤ t}. We can also write the integral as
P(X ≤ s, Y ≤ t) = ∫_{−∞}^{s} ∫_{−∞}^{t} f_{X,Y}(x, y) dy dx
               = ∫_{−∞}^{t} ∫_{−∞}^{s} f_{X,Y}(x, y) dx dy
Just as with one random variable, the joint density function contains all
the information about the underlying probability measure if we only look at
the random variables X and Y . In particular, we can compute the probability
of any event defined in terms of X and Y just using f (x, y).
Here are some events defined in terms of X and Y :
{X ≤ Y}, {X² + Y² ≤ 1}, and {1 ≤ X ≤ 4, Y ≥ 0}. They can all be written in the form {(X, Y) ∈ A} for some subset A of R².
Proposition 1. For A ⊂ R²,
P((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy
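Here is a short Python sketch (not part of the notes) that checks Proposition 1 numerically for an illustrative choice of density and event: X, Y independent exponential(1), so f(x, y) = e^{−x−y} for x, y ≥ 0, and A = {x ≤ y}. Both estimates of P((X, Y) ∈ A) should come out near 1/2.

    import numpy as np
    from scipy import integrate

    # Numerical integration of f over A = {(x, y) : 0 <= x <= y}.  In dblquad the first
    # argument of the integrand is the inner variable (x here), the second the outer (y).
    prob_int, _ = integrate.dblquad(lambda x, y: np.exp(-x - y), 0, np.inf, 0, lambda y: y)

    # Monte Carlo: sample (X, Y) and count how often the event occurs.
    rng = np.random.default_rng(0)
    x, y = rng.exponential(size=(2, 10**6))
    prob_mc = np.mean(x <= y)

    print(prob_int, prob_mc)   # both close to 1/2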
Definition: Let A ⊂ R². We say X and Y are uniformly distributed on A if
f(x, y) = 1/c,  if (x, y) ∈ A
          0,    otherwise
where c is the area of A, so that f integrates to 1.
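As a concrete illustration (not from the notes), here is a minimal Python sampler for the uniform distribution on a set A by rejection sampling. The choice A = unit disc is hypothetical; in that case c = area(A) = π and the density should be 1/π on A.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_uniform_disc(n):
        # propose uniformly on [-1, 1]^2 and keep the points that land in the disc
        x, y = rng.uniform(-1, 1, size=(2, int(1.6 * n)))
        keep = x**2 + y**2 <= 1
        return x[keep][:n], y[keep][:n]

    x, y = sample_uniform_disc(10**6)
    # Fraction of points in a small box around (0.2, 0.3), divided by the box area,
    # estimates the density there; it should be close to 1/pi = 0.318...
    delta = 0.1
    in_box = np.mean((np.abs(x - 0.2) < delta / 2) & (np.abs(y - 0.3) < delta / 2))
    print(in_box / delta**2, 1 / np.pi)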
Compute P(X + Y ≤ t).
What does the pdf mean? In the case of a single discrete RV, the pmf has a very concrete meaning: f(x) is the probability that X = x. If X is a single continuous random variable, then
P(x ≤ X ≤ x + δ) = ∫_x^{x+δ} f(u) du ≈ δ f(x)
Similarly, if X and Y are jointly continuous, then
P(x ≤ X ≤ x + δ, y ≤ Y ≤ y + δ) ≈ δ² f(x, y)
Proposition 2. If X and Y are jointly continuous with joint density fX,Y (x, y),
then the marginal densities are given by
f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy
f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx
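A quick numerical illustration of Proposition 2 (not part of the notes): take the illustrative joint density f(x, y) = x + y on the unit square, whose marginal is f_X(x) = x + 1/2, and compute the marginal by integrating out y.

    import numpy as np
    from scipy import integrate

    f = lambda x, y: x + y   # hypothetical joint density on [0,1] x [0,1]

    def f_X(x):
        # marginal of X: integrate the joint density over y, as in Proposition 2
        val, _ = integrate.quad(lambda y: f(x, y), 0, 1)
        return val

    for x in [0.1, 0.5, 0.9]:
        print(f_X(x), x + 0.5)   # numerical marginal vs the closed form x + 1/2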
We will define independence of two continuous random variables differently than the book. The two definitions are equivalent.
If we know the joint density of X and Y , then we can use the definition
to see if they are independent. But the definition is often used in a different
way. If we know the marginal densities of X and Y and we know that they
are independent, then we can use the definition to find their joint density.
Example: If X and Y are independent random variables and each has the
standard normal distribution, what is their joint density?
f(x, y) = (1/2π) exp(−(x² + y²)/2)
Example: Suppose that X and Y have a joint density that is uniform on
the disc centered at the origin with radius 1. Are they independent?
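The question is left for you to think about; the following small numerical check (not in the notes) compares f(x, y) with f_X(x) f_Y(y) at the origin for the uniform density on the unit disc, which suggests whether the product form can hold.

    import numpy as np

    f = lambda x, y: (x**2 + y**2 <= 1) / np.pi             # joint density: 1/pi on the disc
    f_X = lambda x: 2 * np.sqrt(max(1 - x**2, 0)) / np.pi   # marginal, integrating over y
    f_Y = f_X                                               # same by symmetry

    print(f(0.0, 0.0), f_X(0.0) * f_Y(0.0))   # 1/pi = 0.318... vs 4/pi^2 = 0.405...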
6.3 Expected value
If X and Y are jointly continuous random variables, then the mean of X is still given by
E[X] = ∫_{−∞}^{∞} x f_X(x) dx
If we write the marginal f_X(x) in terms of the joint density, then this becomes
E[X] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x f_{X,Y}(x, y) dx dy
More generally, for a function g : R² → R,
E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy
provided
∫_{−∞}^{∞} ∫_{−∞}^{∞} |g(x, y)| f(x, y) dx dy < ∞
Theorem 2. If X and Y are independent and jointly continuous, then
Example: Let X and Y be independent, each uniformly distributed on [0, 1], and let Z = XY. For 0 < z < 1, F_Z(z) = P(XY ≤ z) is the area of the region
A = {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, xy ≤ z}
PICTURE
F_Z(z) = z + ∫_z^1 [ ∫_0^{z/x} 1 dy ] dx
       = z + ∫_z^1 (z/x) dx
       = z + z ln x |_z^1 = z − z ln z
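This cdf is easy to check by simulation. The Python sketch below (not part of the notes) estimates P(XY ≤ z) by Monte Carlo for a few values of z and compares with z − z ln z.

    import numpy as np

    # Monte Carlo check of F_Z(z) = z - z*ln(z) for Z = XY, X and Y independent Uniform(0,1).
    rng = np.random.default_rng(0)
    x, y = rng.uniform(size=(2, 10**6))
    z_samples = x * y

    for z in [0.1, 0.3, 0.7]:
        print(np.mean(z_samples <= z), z - z * np.log(z))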
If X and Y are independent, then this becomes
f_Z(z) = ∫_{−∞}^{∞} f_X(x) f_Y(z − x) dx
This is known as a convolution. We can use this formula to find the density of
the sum of two independent random variables. But in some cases it is easier
to do this using generating functions which we study in the next section.
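The convolution can also be carried out numerically on a grid. Here is an illustrative Python sketch (not from the notes) with f_X = f_Y = the exponential(1) density; the exact answer for the sum is the Gamma(2, 1) density z e^{−z}.

    import numpy as np

    dx = 0.001
    grid = np.arange(0, 20, dx)
    f_X = np.exp(-grid)
    f_Y = np.exp(-grid)

    # Discretized convolution integral: f_Z(i*dx) ~ sum_k f_X(k*dx) * f_Y((i-k)*dx) * dx
    f_Z = np.convolve(f_X, f_Y)[:len(grid)] * dx

    z = 2.0
    print(f_Z[int(z / dx)], z * np.exp(-z))   # both approximately 0.271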
Example: Let X and Y be independent random variables each of which has
the standard normal distribution. Find the density of Z = X + Y .
We need to compute the convolution
f_Z(z) = (1/2π) ∫_{−∞}^{∞} exp(−x²/2 − (z − x)²/2) dx
       = (1/2π) ∫_{−∞}^{∞} exp(−x² − z²/2 + xz) dx
       = (1/2π) ∫_{−∞}^{∞} exp(−(x − z/2)² − z²/4) dx
       = (1/2π) e^{−z²/4} ∫_{−∞}^{∞} exp(−(x − z/2)²) dx
The remaining integral equals √π, so f_Z(z) = (1/√(4π)) e^{−z²/4}, i.e., Z is normal with mean 0 and variance 2.
Example: Compute the mgf for the gamma distribution with parameters w and λ; one finds
M(t) = (λ/(λ − t))^w,  for t < λ
A special case of the gamma distribution is the exponential distribution: you just take w = 1. So we see that for the exponential M(t) = λ/(λ − t).
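A numerical check of this mgf (not from the notes): estimate E[e^{tX}] by simulation and compare with (λ/(λ − t))^w. The values of λ, w, t below are illustrative; note that numpy's gamma sampler takes (shape, scale), so shape = w and scale = 1/λ matches the parametrization used here.

    import numpy as np

    lam, w, t = 2.0, 3.0, 0.5
    rng = np.random.default_rng(0)
    samples = rng.gamma(w, 1/lam, size=10**6)   # gamma with shape w and rate lam

    print(np.mean(np.exp(t * samples)), (lam / (lam - t))**w)   # both ~ (4/3)^3 = 2.37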
As an application of part (3) we have
Example: Let X have the exponential distribution with parameter λ. Let Y = λX. Use mgf's to show Y has the exponential distribution with parameter 1. Indeed, M_Y(t) = E[e^{tλX}] = M_X(λt) = λ/(λ − λt) = 1/(1 − t), which is the mgf of the exponential with parameter 1.
Example: In the homework you show that the mgf for the normal density
is
M_X(t) = exp(µt) M_Z(σt) = exp(µt + σ²t²/2)
Proposition 4. (a) If X_1, X_2, · · · , X_n are independent and each is normal with mean µ_i and variance σ_i², then Y = X_1 + X_2 + · · · + X_n has a normal distribution with mean µ and variance σ² given by
µ = Σ_{i=1}^{n} µ_i,   σ² = Σ_{i=1}^{n} σ_i²
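A simulation check of Proposition 4(a) for illustrative choices of µ_i and σ_i (not from the notes): the sample mean and variance of the sum should match Σµ_i and Σσ_i².

    import numpy as np

    mus = np.array([1.0, -2.0, 0.5])
    sigmas = np.array([1.0, 2.0, 0.5])

    rng = np.random.default_rng(0)
    sums = rng.normal(mus, sigmas, size=(10**6, 3)).sum(axis=1)

    print(sums.mean(), mus.sum())           # both ~ -0.5
    print(sums.var(), (sigmas**2).sum())    # both ~ 5.25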
6.6 Cumulative distribution functions and more independence
Recall that for a discrete random variable X we have a probability mass function f_X(x) which is just f_X(x) = P(X = x). And for a continuous random variable X we have a probability density function f_X(x). It is a density in the sense that if ε > 0 is small, then P(x ≤ X ≤ x + ε) ≈ f(x)ε.
For both types of random variables we have a cumulative distribution function, and its definition is the same for all types of RV's.
If X and Y are jointly continuous, then we can compute the joint cdf F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y) from their joint pdf:
F_{X,Y}(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(u, v) dv du
If we know the joint cdf, then we can compute the joint pdf by taking partial derivatives of the above:
∂²/∂x∂y F_{X,Y}(x, y) = f(x, y)
The joint cdf has properties similar to the cdf for a single RV.
Proposition 5. Let F(x, y) be the joint cdf of two continuous random variables. Then F(x, y) is a continuous function on R² and
We will use the joint cdf to prove more results about independence of RV's.
Theorem 3. If X and Y are jointly continuous random variables, then they are independent if and only if F_{X,Y}(x, y) = F_X(x) F_Y(y) for all x and y.
The theorem is true for discrete random variables as well.
Proof.
Because g and h are increasing, the event {g(X) ≤ w, h(Y) ≤ z} is the same as the event {X ≤ g^{−1}(w), Y ≤ h^{−1}(z)}. So
where the last equality comes from the previous theorem and the independence of X and Y. The individual cdfs of W and Z are
Corollary 2. If X and Y are independent jointly continuous random variables and g and h are functions from R to R, then
E[g(X)h(Y)] = E[g(X)] E[h(Y)]
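A quick Monte Carlo check of Corollary 2 (not from the notes), with the illustrative choices g(x) = x², h(y) = cos(y) and X, Y independent standard normals.

    import numpy as np

    rng = np.random.default_rng(0)
    x, y = rng.standard_normal(size=(2, 10**6))
    g, h = x**2, np.cos(y)

    print(np.mean(g * h), np.mean(g) * np.mean(h))   # the two agree up to Monte Carlo error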
Recall that for any two random variables X and Y , we have E[X + Y ] =
E[X] + E[Y ]. If they are independent we also have
Theorem 5. If X and Y are independent and jointly continuous, then
var(X + Y ) = var(X) + var(Y )
Proof.
Proof.
P(Y ≤ y) = P(g(X) ≤ y) = P(X ≤ g^{−1}(y)) = ∫_{−∞}^{g^{−1}(y)} f_X(x) dx
Often f(T^{−1}(u, w)) is simply written as f(u, w). In practice you write f, which is originally a function of x and y, as a function of u and w.
If A is a subset of D, then we have
∫∫_A f(x, y) dx dy = ∫∫_{T(A)} f(T^{−1}(u, w)) |J(u, w)| du dw
Example - Polar coordinates: Let X and Y be independent standard normal random variables and let
x = r cos(θ),  y = r sin(θ)
Example: Let X and Y be independent random variables, each having the exponential distribution with parameter 1, and let
U = X + Y,  W = X/(X + Y)
Find the joint density of U and W.
Let T(x, y) = (x + y, x/(x + y)). Then T is a bijection from [0, ∞) × [0, ∞)
onto [0, ∞) × [0, 1]. We need to find its inverse, i.e., find x, y in terms of u, w.
Multiply the two equations to get x = uw. Then y = u − x = u − uw. So
T −1 (u, w) = (uw, u − uw). And so
J(u, w) = det [ ∂x/∂u  ∂x/∂w ]  = det [   w      u ]  = −u
              [ ∂y/∂u  ∂y/∂w ]        [ 1 − w   −u ]
So f_{U,W}(u, w) = f_{X,Y}(uw, u − uw) |J(u, w)| = e^{−u} · u, which gives
f_{U,W}(u, w) = u e^{−u},  if u ≥ 0, 0 ≤ w ≤ 1
                0,          otherwise
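Since this joint density factors, U should have the Gamma(2, 1) density u e^{−u}, W should be Uniform(0, 1), and the two should be independent. Here is a simulation sketch (not part of the notes) consistent with that.

    import numpy as np

    rng = np.random.default_rng(0)
    x, y = rng.exponential(size=(2, 10**6))   # X, Y independent exponential(1)
    u, w = x + y, x / (x + y)

    print(u.mean(), 2.0)              # Gamma(2,1) has mean 2
    print(w.mean(), 0.5)              # Uniform(0,1) has mean 1/2
    print(np.corrcoef(u, w)[0, 1])    # close to 0, consistent with independence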
Example: Let X and Y be independent random variables, each having the exponential distribution with parameter λ = 1. Let U = X + Y and W = Y/X. Find the joint pdf of U, W.
The inverse transformation is X = U/(1 + W), Y = UW/(1 + W), and the Jacobian is J(u, w) = u/(1 + w)².
You can compute the marginals of this joint distribution by the usual trick of completing the square. You find that X and Y both have a standard normal distribution. Note that the stuff in the exponential is a quadratic form in x and y. A more general quadratic form would have three parameters:
exp(−(Ax² + 2Bxy + Cy²))
In order for the integral to converge, the quadratic form Ax² + 2Bxy + Cy² must be positive definite, i.e., strictly positive for all (x, y) ≠ (0, 0).
Now suppose we start with two independent random variables X and Y and define
U = aX + bY,  W = cX + dY
This can be generalized to n RV's. A joint normal (or Gaussian) distribution is of the form
c exp(−(1/2)(x, M x))
where M is a positive definite n by n matrix and c is the normalizing constant.
Correlation coefficient
If X and Y are independent, then E[XY] − E[X]E[Y] = 0. If they are not independent, it need not be zero, and it is in some sense a measure of how dependent they are. Writing cov(X, Y) = E[XY] − E[X]E[Y], the correlation coefficient is defined by
ρ(X, Y) = cov(X, Y) / (√var(X) √var(Y))
Here f(x, y) is the bivariate normal density (1/(2π√(1 − ρ²))) exp(−(x² − 2ρxy + y²)/(2(1 − ρ²))) with −1 < ρ < 1. Note that f(−x, −y) = f(x, y). This implies E[X] = E[Y] = 0. So cov(X, Y) = E[XY].
E[XY] = (1/(2π√(1 − ρ²))) ∫∫ xy exp(−(x² − 2ρxy + y²)/(2(1 − ρ²))) dx dy
      = (1/(2π√(1 − ρ²))) ∫ x exp(−x²/(2(1 − ρ²))) [ ∫ y exp(−(y² − 2ρxy)/(2(1 − ρ²))) dy ] dx
      = (1/(2π√(1 − ρ²))) ∫ x exp(−x²/(2(1 − ρ²))) [ ∫ y exp(−((y − ρx)² − ρ²x²)/(2(1 − ρ²))) dy ] dx
      = (1/(2π√(1 − ρ²))) ∫ x exp(−x²/2) [ ∫ (y + ρx) exp(−y²/(2(1 − ρ²))) dy ] dx
      = ρ (1/(2π√(1 − ρ²))) ∫ x² exp(−x²/2) dx ∫ exp(−y²/(2(1 − ρ²))) dy
      = ρ
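This can also be checked by simulation (not part of the notes): sample from the bivariate normal with unit variances and covariance ρ (ρ = 0.6 is an illustrative choice) and estimate E[XY].

    import numpy as np

    rho = 0.6
    cov = [[1.0, rho], [rho, 1.0]]

    rng = np.random.default_rng(0)
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=10**6)

    print(np.mean(xy[:, 0] * xy[:, 1]), rho)   # E[XY] = cov(X,Y) = rho since the means are 0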
Most of our applications were of the following form. Let Y be another discrete
RV. Define Bn = {Y = n} where n ranges over the range of Y . Then
E[X] = Σ_n E[X | Y = n] P(Y = n)
Now suppose X and Y are continuous random variables. We want to condition on Y = y. We cannot do this directly since P(Y = y) = 0. How can we make sense of something like P(a ≤ X ≤ b | Y = y)? One way would be a limiting process; instead we make the following definition:
Definition 6. Let X, Y be jointly continuous RV's with pdf f_{X,Y}(x, y). The conditional density of X given Y = y is
f_{X|Y}(x|y) = f_{X,Y}(x, y) / f_Y(y),  if f_Y(y) > 0
and we define P(a ≤ X ≤ b | Y = y) = ∫_a^b f_{X|Y}(x|y) dx.
We have made the above definitions. We could have defined fX|Y and
P(a ≤ X ≤ b|Y = y) as limits and then proved the above as theorems.
What happens if X and Y are independent? Then f (x, y) = fX (x)fY (y).
So fX|Y (x|y) = fX (x) as we would expect.
Example: (X, Y) is uniformly distributed on the triangle with vertices (0, 0), (0, 1) and (1, 0). Find the conditional density of X given Y.
The joint density is 2 on the triangle.
f_Y(y) = ∫_0^{1−y} 2 dx = 2(1 − y),  0 ≤ y ≤ 1
And we have
f_{X|Y}(x|y) = 2/(2(1 − y)) = 1/(1 − y),  0 ≤ x ≤ 1 − y
So given Y = y, X is uniformly distributed on [0, 1 − y].
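A simulation sketch of this conclusion (not part of the notes): sample uniformly from the triangle by rejection, keep the points with Y close to y0 = 0.4, and compare the conditional behavior of X with the Uniform[0, 1 − y0] prediction.

    import numpy as np

    rng = np.random.default_rng(0)
    x, y = rng.uniform(size=(2, 4 * 10**6))
    inside = x + y <= 1                  # keep only points in the triangle
    x, y = x[inside], y[inside]

    y0, eps = 0.4, 0.005
    cond_x = x[np.abs(y - y0) < eps]     # approximate conditioning on Y = y0
    print(cond_x.mean(), (1 - y0) / 2)   # both close to 0.3
    print(cond_x.max(), 1 - y0)          # conditional support is approximately [0, 1 - y0]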
The conditional expectation is defined in the obvious way:
Definition 7.
E[X | Y = y] = ∫ x f_{X|Y}(x|y) dx
Example: Let X and Y be independent, each exponential with parameter λ, and let Z = X + Y. We find E[Z | X = x] and E[X | Z = z]. Given X = x we have Z = x + Y, so f_{Z|X}(z|x) = λe^{−λ(z−x)} for z ≥ x. Using this and the substitution u = z − x, we find
E[Z | X = x] = ∫_x^∞ z λe^{−λ(z−x)} dz = ∫_0^∞ (u + x) λe^{−λu} du = x + 1/λ
For the other one, we first find the marginal for Z:
f_Z(z) = ∫_{−∞}^{∞} f_{X,Z}(x, z) dx = ∫_{−∞}^{∞} λ²e^{−λz} 1(0 ≤ x ≤ z) dx = ∫_0^z λ²e^{−λz} dx = λ²ze^{−λz}
So we have
f_{X|Z}(x|z) = f_{X,Z}(x, z) / f_Z(z) = λ²e^{−λz} 1(0 ≤ x ≤ z) / (λ²ze^{−λz}) = (1/z) 1(0 ≤ x ≤ z)
So given that Z = z, X is uniformly distributed on [0, z]. So E[X|Z = z] =
z/2.
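A Monte Carlo check of E[X | Z = z] = z/2 (not part of the notes), assuming as above that X, Y are independent exponential(λ) with λ = 1 and Z = X + Y; the value z0 = 3 is illustrative, and the conditioning is approximated by a thin band around z0.

    import numpy as np

    rng = np.random.default_rng(0)
    x, y = rng.exponential(size=(2, 4 * 10**6))
    z = x + y

    z0, eps = 3.0, 0.01
    cond_x = x[np.abs(z - z0) < eps]     # approximate conditioning on Z = z0
    print(cond_x.mean(), z0 / 2)         # both close to 1.5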
Recall the partition theorem for discrete RV’s X and Y ,
E[Y] = Σ_n E[Y | X = n] P(X = n)
The analogous result for jointly continuous random variables is
E[Y] = ∫ E[Y | X = x] f_X(x) dx
where the integral is over the range of x where f_X(x) > 0, i.e., the range of X.
Proof. Recall the definition:
E[Y | X = x] = ∫ y f_{Y|X}(y|x) dy = ∫ y f_{Y,X}(y, x)/f_X(x) dy
So
∫ E[Y | X = x] f_X(x) dx = ∫∫ y f_{X,Y}(x, y) dx dy = E[Y]
Recall that the partition theorem was useful when it was hard to compute
the expected value of Y , but easy to compute the expected value of Y given
that some other random variable is known.
There was another partition theorem for discrete random variables that gave a formula for P(B), where B is an event, in terms of conditional probabilities of B. Here is a special case where B and the partition both come from random variables. Let X and Y be discrete RV's. Then
P(a ≤ Y ≤ b) = Σ_n P(a ≤ Y ≤ b | X = n) P(X = n)
The continuous analog is
P(a ≤ X ≤ b) = ∫ P(a ≤ X ≤ b | Y = y) f_Y(y) dy
where
P(a ≤ X ≤ b | Y = y) = ∫_a^b f_{X|Y}(x|y) dx
Example: Quality of lightbulbs varies because ... For fixed factory conditions, the lifetime of the lightbulb has an exponential distribution. We model this by assuming the parameter λ is uniformly distributed between 5 × 10^{−4} and 8 × 10^{−4}. Find the mean lifetime of a lightbulb and the pdf for its lifetime. Is it exponential?
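Here is a numerical sketch of this example (not part of the notes) using the continuous partition theorem: given λ the lifetime T is exponential(λ) with mean 1/λ, so E[T] = ∫ (1/λ) f(λ) dλ and f_T(t) = ∫ λ e^{−λt} f(λ) dλ, with λ uniform on [5 × 10^{−4}, 8 × 10^{−4}]. The resulting pdf is a mixture of exponentials with different rates, which is one way to think about whether it can be a single exponential.

    import numpy as np
    from scipy import integrate

    a, b = 5e-4, 8e-4
    f_lam = 1 / (b - a)                             # density of lambda on [a, b]

    # Mean lifetime: E[T] = integral of E[T | lambda] f(lambda) d(lambda) = E[1/lambda].
    mean_T, _ = integrate.quad(lambda lam: (1 / lam) * f_lam, a, b)
    print(mean_T, np.log(b / a) / (b - a))          # closed form ln(b/a)/(b-a), about 1567

    # Pdf of the lifetime: f_T(t) = integral of lambda * exp(-lambda*t) * f(lambda) d(lambda).
    def f_T(t):
        val, _ = integrate.quad(lambda lam: lam * np.exp(-lam * t) * f_lam, a, b)
        return val

    print(f_T(0.0), f_T(1000.0), f_T(2000.0))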
Example: Let X, Y be independent standard normal RV's. Let Z = X + Y. Find f_{Z|X}, f_{X|Z}, E[Z|X = x] and E[X|Z = z].