Joint Distributions
So far we have focused on probability distributions for single random variables. However, we are often interested in probability statements concerning
two or more random variables. The following examples are illustrative:
- In ecological studies, counts, modeled as random variables, of several species are often made. One species is often the prey of another; clearly, the number of predators will be related to the number of prey.
- The joint probability distribution of the x, y, and z components of wind velocity can be measured experimentally in studies of atmospheric turbulence.
- The joint distribution of the values of various physiological variables in a population of patients is often of interest in medical studies.
- A model for the joint distribution of age and length in a population of fish can be used to estimate the age distribution from the length distribution. The age distribution is relevant to the setting of reasonable harvesting policies.
Joint Distribution
Joint probability mass functions: Let X and Y be discrete random variables defined on the same sample space, taking on values {x_1, x_2, ...} and {y_1, y_2, ...}, respectively. The joint probability mass function of (X, Y) is

p(x_i, y_j) = P(X = x_i, Y = y_j).    (1.2)
Example 1: A fair coin is tossed three times. Let X be the number of heads on the first toss and Y the total number of heads.

Solution: The joint distribution of (X, Y) can be summarized in the following table:

x \ y     0      1      2      3
  0      1/8    2/8    1/8     0
  1       0     1/8    2/8    1/8
Marginal probability mass functions: Suppose that we wish to find the pmf
of Y from the joint pmf of X and Y in the previous example:
pY (0) = P (Y = 0)
= P (Y = 0, X = 0) + P (Y = 0, X = 1)
= 1/8 + 0 = 1/8
pY (1) = P (Y = 1)
= P (Y = 1, X = 0) + P (Y = 1, X = 1)
= 2/8 + 1/8 = 3/8
In general, to find the frequency function of Y , we simply sum down the
appropriate column of the table giving the joint pmf of X and Y . For this
reason, pY is called the marginal probability mass function of Y . Similarly,
summing across the rows gives
p_X(x) = \sum_i p(x, y_i).
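As a concrete check of this row/column summing, here is a minimal Python sketch (NumPy assumed; the array names are my own choices) applied to the table of Example 1:

```python
import numpy as np

# Joint pmf from the coin-tossing example: rows index x in {0, 1},
# columns index y in {0, 1, 2, 3}.
p = np.array([[1/8, 2/8, 1/8, 0.0],
              [0.0, 1/8, 2/8, 1/8]])

p_Y = p.sum(axis=0)  # sum down each column: marginal pmf of Y
p_X = p.sum(axis=1)  # sum across each row: marginal pmf of X

print("p_Y:", p_Y)   # [0.125 0.375 0.375 0.125]
print("p_X:", p_X)   # [0.5 0.5]
```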
Joint PDF and Joint CDF: Suppose that X and Y are continuous random
variables. The joint probability density function (pdf) of X and Y is the
function f (x, y) such that for every set C of pairs of real numbers
P((X, Y) \in C) = \iint_{(x,y) \in C} f(x, y)\,dx\,dy.    (1.3)

In particular,

P(a \le X \le a + da, b \le Y \le b + db) \approx f(a, b)\,da\,db,

when da and db are small and f is continuous at (a, b).
measure of how likely it is that the random vector (X, Y ) will be near (a, b).
This is similar to the interpretation of the pdf f (x) for a single random
variable X being a measure of how likely it is to be near x.
The joint CDF of (X, Y ) can be obtained as follows:
F(a, b) = P(X \in (-\infty, a], Y \in (-\infty, b])
        = \int_{-\infty}^{b} \int_{-\infty}^{a} f(x, y)\,dx\,dy.
Then we have
f_X(x) = \frac{d}{dx} P(X \le x) = \frac{d}{dx} \int_{-\infty}^{x} \int_{-\infty}^{\infty} f(u, y)\,dy\,du = \int_{-\infty}^{\infty} f(x, y)\,dy,

and similarly

f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx.
Example 2: Suppose the joint pdf of X and Y is

f(x, y) = \frac{12}{7}(x^2 + xy),  0 \le x \le 1,  0 \le y \le 1.

(a) Find P(X > Y).

P(X > Y) = \iint_{x > y} f(x, y)\,dx\,dy
         = \int_0^1 \int_y^1 \frac{12}{7}(x^2 + xy)\,dx\,dy
         = \int_0^1 \frac{12}{7}\Big(\frac{x^3}{3} + \frac{x^2 y}{2}\Big)\Big|_{x=y}^{x=1}\,dy
         = \int_0^1 \Big(\frac{4}{7} + \frac{6}{7}y - \frac{4}{7}y^3 - \frac{6}{7}y^3\Big)\,dy
         = \frac{9}{14}.

(b) For 0 \le x \le 1,

f_X(x) = \int_0^1 \frac{12}{7}(x^2 + xy)\,dy = \frac{12}{7}x^2 + \frac{6}{7}x.
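The following sketch (SciPy assumed; note that scipy.integrate.dblquad integrates over the first argument of the integrand on the inner range) reproduces both answers numerically:

```python
from scipy import integrate

f = lambda x, y: (12/7) * (x**2 + x*y)    # joint pdf on the unit square

# (a) P(X > Y) = integral over y in [0,1] of integral over x in [y,1];
# the first argument of f (here x) is the inner variable of dblquad.
p, _ = integrate.dblquad(f, 0, 1, lambda y: y, lambda y: 1)
print(p, 9/14)                             # ~0.642857 for both

# (b) marginal of X at a few points: integrate out y and compare
# with the closed form (12/7) x^2 + (6/7) x.
for x in (0.25, 0.5, 0.75):
    fx, _ = integrate.quad(lambda y: f(x, y), 0, 1)
    print(fx, (12/7)*x**2 + (6/7)*x)
```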
Example 3: Suppose the set of possible values for (X, Y) is the rectangle D = {(x, y) : 0 \le x \le 1, 0 \le y \le 1}. Let the joint pdf of (X, Y) be

f(x, y) = \frac{6}{5}(x + y^2),  (x, y) \in D.

(a) Verify that f(x, y) is a valid pdf.
(b) Find P(0 \le X \le 1/4, 0 \le Y \le 1/4).
(c) Find the marginal pdfs of X and Y.
(d) Find P(1/4 \le Y \le 3/4).
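A numerical sketch of parts (a), (b), and (d) (SciPy assumed; part (c) is the same y-integration used in Example 2):

```python
from scipy import integrate

f = lambda x, y: (6/5) * (x + y**2)   # joint pdf on the unit square

# (a) total probability should be 1
total, _ = integrate.dblquad(f, 0, 1, lambda y: 0, lambda y: 1)

# (b) P(0 <= X <= 1/4, 0 <= Y <= 1/4)
p_b, _ = integrate.dblquad(f, 0, 1/4, lambda y: 0, lambda y: 1/4)

# (d) P(1/4 <= Y <= 3/4): outer y over [1/4, 3/4], inner x over [0, 1]
p_d, _ = integrate.dblquad(f, 1/4, 3/4, lambda y: 0, lambda y: 1)

print(total, p_b, p_d)   # ~1.0, ~0.0109, ~0.4625
```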
The random variables X and Y are said to be independent if, for any two sets of real numbers A and B,

P(X \in A, Y \in B) = P(X \in A) P(Y \in B).    (2.4)
Example 5: Suppose that a man and a woman decide to meet at a certain location, and that each person independently arrives at a time uniformly distributed on [0, T].

(a) Find the joint pdf of the arrival times X and Y.
(b) Find the probability that the first to arrive has to wait longer than a period of length s, where s < T.
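The waiting time of the first arrival is |X - Y|, and a geometric argument (the event |X - Y| > s occupies two triangles of total area (T - s)^2 inside the T-by-T square) gives P(|X - Y| > s) = ((T - s)/T)^2. A minimal Monte Carlo sketch in Python (NumPy assumed; T = 1 and s = 0.25 are illustrative values, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
T, s, n = 1.0, 0.25, 10**6      # illustrative meeting window T and wait s < T

X = rng.uniform(0, T, n)        # arrival time of one person
Y = rng.uniform(0, T, n)        # arrival time of the other

wait = np.abs(X - Y)            # the first to arrive waits |X - Y|
print(np.mean(wait > s))        # ~ ((T - s)/T)**2 = 0.5625
```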
2.1 Joint Distributions of Several Random Variables
If X_1, ..., X_n are all discrete random variables, their joint pmf is the function

p(x_1, ..., x_n) = P(X_1 = x_1, ..., X_n = x_n).    (2.5)

If the variables are continuous, their joint pdf is the function f(x_1, ..., x_n) such that

P(a_1 \le X_1 \le b_1, ..., a_n \le X_n \le b_n) = \int_{a_1}^{b_1} \cdots \int_{a_n}^{b_n} f(x_1, ..., x_n)\,dx_1 \cdots dx_n.
One of the most important joint distributions is the multinomial distribution, which arises when a sequence of n independent and identical experiments is performed, where each experiment can result in any one of r possible outcomes, with respective probabilities p_1, ..., p_r. If we let X_i denote the number of the n experiments that result in outcome i, then

P(X_1 = n_1, ..., X_r = n_r) = \frac{n!}{n_1! \cdots n_r!} p_1^{n_1} \cdots p_r^{n_r},    (2.6)

whenever \sum_{i=1}^{r} n_i = n.
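As a sketch, (2.6) can be evaluated directly and checked against scipy.stats.multinomial (the counts and probabilities below are illustrative values, not from the notes):

```python
from math import factorial
from scipy.stats import multinomial

# n = 6 trials, r = 3 outcomes with assumed probabilities p1, p2, p3
n, p = 6, [0.5, 0.3, 0.2]
counts = [3, 2, 1]                      # n1 + n2 + n3 = n

# direct evaluation of (2.6)
pmf = (factorial(6) // (factorial(3) * factorial(2) * factorial(1))) \
      * 0.5**3 * 0.3**2 * 0.2**1
print(pmf, multinomial.pmf(counts, n=n, p=p))   # both 0.135
```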
2.2 Sums of Independent Random Variables

If X and Y are independent discrete random variables, the pmf of their sum Z = X + Y is given by the convolution

p_Z(z) = \sum_x P(X = x) P(Y = z - x).

For instance, if X and Y are the independent lifetimes of two components used one after the other, the distribution of the total lifetime X + Y answers questions such as: what is the probability that the total lifetime is between 1 and 2 years?
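The convolution formula is exactly what numpy.convolve computes for pmfs stored as arrays; the two fair dice below are an assumed example, not from the notes:

```python
import numpy as np

# pmf of one fair die with faces 1..6, stored at indices 0..5
die = np.full(6, 1/6)

# pmf of the sum via p_Z(z) = sum_x p_X(x) p_Y(z - x);
# index k of the result corresponds to a total of k + 2
p_sum = np.convolve(die, die)
print(p_sum[5])              # P(sum = 7) = 6/36 ~ 0.1667
```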
2.3 Conditional Distribution
The use of conditional distribution allows us to define conditional probabilities of events associated with one random variable when we are given the
value of a second random variable.
(1) The Discrete Case: Suppose X and Y are discrete random variables.
For any y_j such that p_Y(y_j) > 0, the conditional probability mass function of X given Y = y_j is

p_{X|Y}(x_i | y_j) = P(X = x_i | Y = y_j)
                   = \frac{P(X = x_i, Y = y_j)}{P(Y = y_j)}
                   = \frac{p_{X,Y}(x_i, y_j)}{p_Y(y_j)}.

This is just the conditional probability of the event {X = x_i} given that {Y = y_j} has occurred.
Example 10: Consider again the situation in which a fair coin is tossed three times independently. Let X denote the number of heads on the first toss and Y denote the total number of heads.

(a) What is the conditional probability mass function of X given Y?
(b) Are X and Y independent?
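A minimal sketch (standard-library Python; the enumeration approach is my own) that tabulates the joint pmf of Example 10 and reads off p_{X|Y}(x|2):

```python
from itertools import product
from collections import Counter

# enumerate the 8 equally likely outcomes of three fair coin tosses
joint = Counter()
for tosses in product([0, 1], repeat=3):   # 1 = heads
    x, y = tosses[0], sum(tosses)          # X: heads on toss 1, Y: total heads
    joint[(x, y)] += 1/8

# conditional pmf of X given Y = 2
pY2 = sum(p for (x, y), p in joint.items() if y == 2)
for x in (0, 1):
    print(x, joint.get((x, 2), 0) / pY2)   # 1/3 and 2/3
```

Since p_{X|Y}(1|2) = 2/3 differs from P(X = 1) = 1/2, X and Y are not independent.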
If X and Y are independent random variables, then the conditional probability mass function is the same as the unconditional one. This follows
because if X is independent of Y , then
p_{X|Y}(x|y) = P(X = x | Y = y)
             = \frac{P(X = x, Y = y)}{P(Y = y)}
             = \frac{P(X = x) P(Y = y)}{P(Y = y)}
             = P(X = x).
(2) The Continuous Case: If X and Y have a joint probability density function f (x, y), then the conditional pdf of X, given that Y = y, is defined for
all values of y such that fY (y) > 0, by
f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}.    (2.9)

To motivate this definition, multiply the left-hand side by dx and the right-hand side by (dx\,dy)/dy to obtain

f_{X|Y}(x|y)\,dx = \frac{f_{X,Y}(x, y)\,dx\,dy}{f_Y(y)\,dy}
                \approx \frac{P(x \le X \le x + dx, y \le Y \le y + dy)}{P(y \le Y \le y + dy)}
                = P(x \le X \le x + dx \mid y \le Y \le y + dy).

In other words, for small values of dx and dy, f_{X|Y}(x|y)\,dx represents the conditional probability that X is between x and x + dx given that Y is between y and y + dy.
That is, if X and Y are jointly continuous, then for any set A,

P(X \in A \mid Y = y) = \int_A f_{X|Y}(x|y)\,dx.

In particular, we can define the conditional cumulative distribution function of X given that Y = y by

F_{X|Y}(a|y) = P(X \le a \mid Y = y) = \int_{-\infty}^{a} f_{X|Y}(x|y)\,dx.
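To see (2.9) in action numerically, here is a minimal Python sketch (SciPy assumed; the reuse of Example 2's pdf and the helper name f_cond are my own choices) that builds the conditional pdf and checks that it integrates to 1 for a fixed y:

```python
from scipy import integrate

f = lambda x, y: (12/7) * (x**2 + x*y)   # joint pdf from Example 2

def f_cond(x, y):
    # conditional pdf f_{X|Y}(x|y) = f(x, y) / f_Y(y), per (2.9)
    fY, _ = integrate.quad(lambda u: f(u, y), 0, 1)   # marginal f_Y(y)
    return f(x, y) / fY

# for each fixed y, the conditional pdf must integrate to 1 over x
total, _ = integrate.quad(lambda x: f_cond(x, 0.5), 0, 1)
print(total)   # ~1.0
```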
Expected Values
Let X and Y be jointly distributed rvs with pmf p(x, y) or pdf f (x, y)
according to whether the variables are discrete or continuous. Then the
expected value of a function h(X, Y ), denoted by E[h(X, Y )], is given by
E[h(X, Y)] =
  \begin{cases}
    \sum_x \sum_y h(x, y)\, p(x, y), & \text{discrete} \\
    \iint h(x, y)\, f(x, y)\,dx\,dy, & \text{continuous.}
  \end{cases}    (3.10)
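To make (3.10) concrete, here is a minimal Python sketch (NumPy assumed; the choice h(x, y) = xy and the reuse of the coin-tossing table of Example 1 are my own) that evaluates the double sum in the discrete case:

```python
import numpy as np

# joint pmf of the coin-tossing example (rows: x = 0, 1; cols: y = 0..3)
p = np.array([[1/8, 2/8, 1/8, 0.0],
              [0.0, 1/8, 2/8, 1/8]])
x = np.arange(2).reshape(-1, 1)   # column of x-values
y = np.arange(4).reshape(1, -1)   # row of y-values

# E[h(X, Y)] with h(x, y) = x * y, via the double sum in (3.10)
E_xy = np.sum(x * y * p)
print(E_xy)   # 1/8*1 + 2/8*2 + 1/8*3 = 1.0
```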
Covariance and correlation are related parameters that indicate the extent
to which two random variables co-vary. Suppose there are two technology
stocks. If they are affected by the same industry trends, their prices will
tend to rise or fall together. They co-vary. Covariance and correlation
measure such a tendency. We will begin with the problem of calculating the
expected values of a function of two random variables.
4.1 Covariance
When two random variables are not independent, we can measure how
strongly they are related to each other. The covariance between two rvs X
and Y is
Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]
          = \begin{cases}
              \sum_x \sum_y (x - \mu_X)(y - \mu_Y)\, p(x, y), & X, Y \text{ discrete} \\
              \iint (x - \mu_X)(y - \mu_Y)\, f(x, y)\,dx\,dy, & X, Y \text{ continuous.}
            \end{cases}
The variance of a random variable can be viewed as a special case of the above definition: Var(X) = Cov(X, X).
Properties of covariance:
1. A shortcut formula: Cov(X, Y) = E(XY) - \mu_X \mu_Y.
3. Cov(aX + b, cY + d) = ac\,Cov(X, Y).
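The shortcut formula can be checked numerically; the sketch below (NumPy assumed, reusing the coin-tossing pmf of Example 1 as an illustrative case) computes Cov(X, Y) both from the definition and from the shortcut:

```python
import numpy as np

p = np.array([[1/8, 2/8, 1/8, 0.0],    # coin-tossing joint pmf again
              [0.0, 1/8, 2/8, 1/8]])
x = np.arange(2).reshape(-1, 1)
y = np.arange(4).reshape(1, -1)

mu_x, mu_y = np.sum(x * p), np.sum(y * p)        # 0.5 and 1.5
cov_def = np.sum((x - mu_x) * (y - mu_y) * p)    # definition
cov_short = np.sum(x * y * p) - mu_x * mu_y      # shortcut formula
print(cov_def, cov_short)                        # both 0.25
```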
As an illustration, suppose X and Y have the joint pmf given in the following table, where x_1 and x_2 denote the two possible values of X:

           x = x_1   x = x_2
 y = 0      0.20      0.05
 y = 100    0.10      0.15
 y = 200     ?        0.30

Since the six probabilities must sum to 1, the entry marked ? equals 0.20.
4.2 Correlation Coefficient
The defect of covariance is that its computed value depends critically on the
units of measurement (e.g., kilograms versus pounds, meters versus feet).
Ideally, the choice of units should have no effect on a measure of strength of
relationship. This can be achieved by scaling the covariance by the standard
deviations of X and Y .
The correlation coefficient of X and Y, denoted by \rho_{X,Y}, is defined by

\rho_{X,Y} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}.    (4.1)
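Continuing the same illustrative coin-tossing example (NumPy assumed), the correlation coefficient follows by scaling the covariance with the two standard deviations:

```python
import numpy as np

p = np.array([[1/8, 2/8, 1/8, 0.0],
              [0.0, 1/8, 2/8, 1/8]])
x = np.arange(2).reshape(-1, 1)
y = np.arange(4).reshape(1, -1)

mu_x, mu_y = np.sum(x * p), np.sum(y * p)
cov = np.sum(x * y * p) - mu_x * mu_y         # shortcut formula
sd_x = np.sqrt(np.sum(x**2 * p) - mu_x**2)    # 0.5
sd_y = np.sqrt(np.sum(y**2 * p) - mu_y**2)    # sqrt(3)/2
print(cov / (sd_x * sd_y))                    # 1/sqrt(3) ~ 0.577
```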