Chapter 4 - Multivariate Probability Distribution
Our study of random variables and their probability distributions in the preceding chapters is
restricted to one-dimensional sample spaces, where we recorded outcomes of an experiment as
values assumed by a single random variable. There will be situations, however, where we may
find it desirable to record the simultaneous outcomes of several variables. For example:
1. We might measure the amount of precipitate P and volume V of gas released from a
controlled chemical experiment, giving rise to a two-dimensional sample space
consisting of the outcomes (p, v).
2. In a study to determine the likelihood of success in college based on high school data,
we might use a three-dimensional sample space and record for each individual his or
her aptitude test score, high school class rank, and grade-point average at the end of
freshman year in college.
If X and Y are two discrete random variables, the probability distribution for their simultaneous
occurrence can be represented by a function with values f ( x, y ) for any pair of values ( x, y )
within the range of the random variables X and Y. This function is referred to as the joint
probability distribution of X and Y.
Definition 1.1
The function f ( x, y ) is a joint probability distribution or probability mass function of the
discrete random variables X and Y if
1. f(x, y) ≥ 0 for all (x, y)
2. Σ_x Σ_y f(x, y) = 1
3. P(X = x, Y = y) = f(x, y)
For any region A in the xy-plane, P[(X, Y) ∈ A] = Σ Σ_{(x, y) ∈ A} f(x, y).
SCHOOL OF MATHEMATICAL SCIENCES
The information for a discrete joint distribution can be neatly summarized in tabular form as
follows:
              Y
          y1         …    yn         p(x)
    x1    p(x1, y1)  …    p(x1, yn)  p(x1)
X   ⋮     ⋮               ⋮          ⋮
    xm    p(xm, y1)  …    p(xm, yn)  p(xm)
    p(y)  p(y1)      …    p(yn)      1
Example 1
Given the following joint probability distribution f ( x, y) :
              Y
          0      1      4
      1   0.10   0.05   0.15
X     3   0.05   0.20   0.25
      5   0.15   0.00   0.05
Solution
(a)
              Y
          0      1      4      g(x)
      1   0.10   0.05   0.15   0.30
X     3   0.05   0.20   0.25   0.50
      5   0.15   0.00   0.05   0.20
   h(y)   0.30   0.25   0.45   1.00
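As a quick check, the joint pmf of Example 1 can be keyed into a short script and both conditions of Definition 1.1 verified directly, with the marginal totals falling out as row and column sums (a Python sketch; the nested-dict layout is just one convenient representation):

```python
# Joint pmf of Example 1, stored as f[x][y]
f = {
    1: {0: 0.10, 1: 0.05, 4: 0.15},
    3: {0: 0.05, 1: 0.20, 4: 0.25},
    5: {0: 0.15, 1: 0.00, 4: 0.05},
}

# Condition 2 of Definition 1.1: all entries sum to 1
total = sum(p for row in f.values() for p in row.values())

# Row totals give g(x); column totals give h(y)
g = {x: round(sum(row.values()), 2) for x, row in f.items()}
h = {y: round(sum(f[x][y] for x in f), 2) for y in (0, 1, 4)}

print(round(total, 10))  # 1.0
print(g)                 # {1: 0.3, 3: 0.5, 5: 0.2}
print(h)                 # {0: 0.3, 1: 0.25, 4: 0.45}
```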
Example 2
Two refills for a ballpoint pen are selected at random from a box that contains 3 blue refills, 2
red refills, and 3 green refills. If X is the number of blue refills and Y is the number of red refills
selected, find
(a) the joint probability function f(x, y), and
(b) P[(X, Y) ∈ A], where A is the region {(x, y) | x + y ≤ 1}.
Note: 0 ≤ x + y ≤ 2
Example 3
Roll a red die and a green die. Let X1 = number of dots on the red die, X2 = number of dots on
the green die. There are 36 points in the sample space.
Table: Possible Outcomes of Rolling a Red Die and a Green Die. (First number in the pair is
the number on the red die.)
Green    1      2      3      4      5      6
Red
  1    (1,1)  (1,2)  (1,3)  (1,4)  (1,5)  (1,6)
  2    (2,1)  (2,2)  (2,3)  (2,4)  (2,5)  (2,6)
  3    (3,1)  (3,2)  (3,3)  (3,4)  (3,5)  (3,6)
  4    (4,1)  (4,2)  (4,3)  (4,4)  (4,5)  (4,6)
  5    (5,1)  (5,2)  (5,3)  (5,4)  (5,5)  (5,6)
  6    (6,1)  (6,2)  (6,3)  (6,4)  (6,5)  (6,6)

The probability of (1, 1) is 1/36. The probability of (6, 3) is also 1/36.
Now consider P(2 ≤ X₁ ≤ 3, 2 ≤ X₂ ≤ 3) = f(2, 2) + f(2, 3) + f(3, 2) + f(3, 3) = 4(1/36) = 1/9.
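The 1/9 above can be confirmed by brute-force enumeration of the 36 equally likely pairs (a Python sketch using exact rational arithmetic):

```python
from fractions import Fraction

# All 36 equally likely (red, green) outcomes
outcomes = [(x1, x2) for x1 in range(1, 7) for x2 in range(1, 7)]

# P(2 <= X1 <= 3, 2 <= X2 <= 3): count qualifying pairs, each worth 1/36
event = [(x1, x2) for (x1, x2) in outcomes if 2 <= x1 <= 3 and 2 <= x2 <= 3]
prob = Fraction(len(event), 36)

print(prob)  # 1/9
```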
When X and Y are continuous random variables, the joint density function f ( x, y ) is a surface
lying above the xy-plane, and P[(X, Y) ∈ A], where A is any region in the xy-plane, is equal
to the volume of the right cylinder bounded by the base A and the surface f(x, y).
Definition 1.2
The function f ( x, y) is a joint density function of the continuous random variables X and Y if
1. f(x, y) ≥ 0 for all (x, y)
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1
3. P[(X, Y) ∈ A] = ∫∫_A f(x, y) dx dy, for any region A in the xy-plane
Example 4
A candy company distributes boxes of chocolates with a mixture of creams, toffees, and nuts
coated in both light and dark chocolate. For a randomly selected box, let X and Y, respectively,
be the proportion of the light and dark chocolates that are creams and suppose that the joint
density function is
f(x, y) = (2/5)(2x + 3y),  0 ≤ x ≤ 1, 0 ≤ y ≤ 1
          0,               elsewhere
(a) Verify condition 2 of Definition 1.2.
(b) Find P[(X, Y) ∈ A], where A = {(x, y) : 0 < x < 1/2, 1/4 < y < 1/2}.
Solution
(a) ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = ∫_0^1 ∫_0^1 (2/5)(2x + 3y) dx dy
    = ∫_0^1 [2x²/5 + 6xy/5]_{x=0}^{x=1} dy
    = ∫_0^1 (2/5 + 6y/5) dy
    = [2y/5 + 3y²/5]_{y=0}^{y=1}
    = 2/5 + 3/5 = 1
(b) P[(X, Y) ∈ A] = P(0 < X < 1/2, 1/4 < Y < 1/2)
    = ∫_{1/4}^{1/2} ∫_0^{1/2} (2/5)(2x + 3y) dx dy
    = ∫_{1/4}^{1/2} [2x²/5 + 6xy/5]_{x=0}^{x=1/2} dy
    = ∫_{1/4}^{1/2} (1/10 + 3y/5) dy
    = [y/10 + 3y²/10]_{y=1/4}^{y=1/2}
    = (1/10)[(1/2 + 3/4) − (1/4 + 3/16)]
    = 13/160
Example 5
Let f(x, y) = kx be a joint density function on the region R in the plane described by
0 ≤ x ≤ y ≤ 1. Find the value of k.
Example 6
An insurance company insures a large number of drivers. Let X be the random variable
representing the company’s losses under collision insurance, and let Y represent the company’s
losses under liability insurance. X and Y have joint density function
f(x, y) = (2x + 2 − y)/4,  for 0 < x < 1 and 0 < y < 2
          0,               otherwise.
Example 7
Let X, Y, and Z have the joint probability density function
f(x, y, z) = kxy²z,  0 < x, y < 1, 0 < z < 2
             0,      elsewhere
(a) Find k.
(b) Find P(X < 1/4, Y > 1/2, 1 < Z < 2).
Solution
(a) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y, z) dx dy dz = 1
∫_0^2 ∫_0^1 ∫_0^1 kxy²z dx dy dz = 1
∫_0^2 ∫_0^1 k [x²y²z/2]_{x=0}^{x=1} dy dz = 1
∫_0^2 ∫_0^1 k (y²z/2) dy dz = 1
∫_0^2 k [y³z/6]_{y=0}^{y=1} dz = 1
∫_0^2 k (z/6) dz = 1
k [z²/12]_{z=0}^{z=2} = 1
k (4/12) = 1
k = 12/4 = 3
(b) P(X < 1/4, Y > 1/2, 1 < Z < 2) = ∫_1^2 ∫_{1/2}^1 ∫_0^{1/4} 3xy²z dx dy dz
= ∫_1^2 ∫_{1/2}^1 [3x²y²z/2]_{x=0}^{x=1/4} dy dz
= ∫_1^2 ∫_{1/2}^1 (3y²z/32) dy dz
= ∫_1^2 [y³z/32]_{y=1/2}^{y=1} dz
= ∫_1^2 (z/32 − z/256) dz
= ∫_1^2 (7z/256) dz
= [7z²/512]_{z=1}^{z=2}
= 28/512 − 7/512 = 21/512
2. Marginal Distributions
Given the joint probability distribution f ( x, y ) of the discrete random variables X and Y, the
probability distribution g(x) of X alone is obtained by summing f ( x, y ) over the values of Y.
Similarly, the probability distribution h(y) of Y alone is obtained by summing f (x, y) over the
values of X. We define g(x) and h(y) to be the marginal distributions of X and Y, respectively.
When X and Y are continuous random variables, summations are replaced by integrals. We can
now make the following general definitions.
Definition 2.1
The marginal distributions of X alone and of Y alone are
g(x) = Σ_y f(x, y)  and  h(y) = Σ_x f(x, y)   for the discrete case, and
g(x) = ∫_{−∞}^{∞} f(x, y) dy  and  h(y) = ∫_{−∞}^{∞} f(x, y) dx   for the continuous case.
The term marginal is used here because, in the discrete case, the values of g(x) and h(y) are just
the marginal totals of the respective columns and rows when the values of f(x, y) are displayed
in a rectangular table.
Example 8
By referring to Example 1, find the marginal distributions for X and Y.
Solution
Marginal distribution of X:
   x      1      3      5
  g(x)   0.30   0.50   0.20

Marginal distribution of Y:
   y      0      1      4
  h(y)   0.30   0.25   0.45
Example 9
By referring to Example 2, find the marginal distributions of X and Y.
Example 10
Find the marginal distributions of X and Y, where
f(x, y) = (2/5)(2x + 3y),  0 ≤ x ≤ 1, 0 ≤ y ≤ 1
          0,               elsewhere.
Solution
g(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_0^1 (2/5)(2x + 3y) dy
     = [4xy/5 + 6y²/10]_{y=0}^{y=1}
     = (4x + 3)/5

g(x) = (4x + 3)/5,  0 ≤ x ≤ 1
       0,           elsewhere

h(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_0^1 (2/5)(2x + 3y) dx
     = [2x²/5 + 6xy/5]_{x=0}^{x=1}
     = (2 + 6y)/5

h(y) = (2 + 6y)/5,  0 ≤ y ≤ 1
       0,           elsewhere
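The closed forms just derived can be spot-checked by integrating f numerically over the other variable (a Python sketch; since f is linear in y, the midpoint rule is exact here apart from rounding):

```python
# f(x, y) = (2/5)(2x + 3y) from Example 10
def f(x, y):
    return 0.4 * (2 * x + 3 * y)

def marginal_g(x, n=200):
    """Numerical g(x) = integral of f(x, y) over y in [0, 1]."""
    step = 1.0 / n
    return sum(f(x, (j + 0.5) * step) for j in range(n)) * step

# Compare against the closed form g(x) = (4x + 3)/5 at a few points
for x in (0.0, 0.5, 1.0):
    print(round(marginal_g(x), 6), round((4 * x + 3) / 5, 6))
```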
Example 11
Let f(x, y) = 6x be a joint density function on the region R in the plane described by
0 ≤ x ≤ y ≤ 1.
3. Conditional Distributions
The function f(x, y)/g(x) is strictly a function of y with x fixed and satisfies all the conditions of
a probability distribution. This is also true when f(x, y) and g(x) are the joint density and
marginal distribution, respectively, of continuous random variables. As a result, it is extremely
important that we make use of the special type of distribution of the form f(x, y)/g(x) in order to
effectively compute conditional probabilities. This type of distribution is called a conditional
probability distribution.
Definition 3.1
Let X and Y be two random variables, discrete or continuous. The conditional distribution of
the random variable Y given that X = x is
f(y | x) = f(x, y)/g(x), provided g(x) > 0
Similarly, the conditional distribution of X given that Y = y is
f(x | y) = f(x, y)/h(y), provided h(y) > 0
If we wish to find the probability that the discrete random variable X falls between a and b
when it is known that the discrete variable Y = y, we evaluate
P(a < X < b | Y = y) = Σ_{a < x < b} f(x | y),
where the summation extends over all values of X between a and b. When X and Y are
continuous, we evaluate
P(a < X < b | Y = y) = ∫_a^b f(x | y) dx.
Example 12
By referring to Example 2, find the conditional distribution of X, given that Y = 1, and use it
to determine P( X = 0 | Y = 1).
Therefore, if it is known that 1 of the 2 pen refills selected is red, we have a probability equal
to 1/2 that the other refill is not blue.
2
Example 13
The joint density for the random variables (X,Y), where X is the unit temperature change and Y
is the proportion of spectrum shift that a certain atomic particle produces is
f(x, y) = 10xy²,  0 < x < y < 1
          0,      elsewhere
a) Find the marginal densities g(x), h(y), and the conditional density f ( y | x) .
b) Find the probability that the spectrum shift is more than half, given that the temperature
is increased to 0.25 units; that is, find P(Y > 1/2 | X = 0.25).
(a) g(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_x^1 10xy² dy
        = [10xy³/3]_{y=x}^{y=1} = (10/3)x(1 − x³),  0 < x < 1

    h(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_0^y 10xy² dx
        = [5x²y²]_{x=0}^{x=y} = 5y⁴,  0 < y < 1

    f(y | x) = f(x, y)/g(x) = 10xy²/[(10/3)x(1 − x³)] = 3y²/(1 − x³),  x < y < 1

(b) P(Y > 1/2 | X = 0.25) = ∫_{1/2}^1 f(y | x = 0.25) dy
        = ∫_{1/2}^1 3y²/(1 − 0.25³) dy = 8/9
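Since the antiderivative of 3y² is y³, part (b) reduces to exact rational arithmetic, which makes a compact check (a Python sketch; cond_prob is a hypothetical helper for the conditional density of this example):

```python
from fractions import Fraction as F

def cond_prob(a, b, x):
    """P(a < Y < b | X = x) for f(y|x) = 3y^2 / (1 - x^3) of Example 13."""
    return (F(b) ** 3 - F(a) ** 3) / (1 - F(x) ** 3)

p = cond_prob(F(1, 2), 1, F(1, 4))
print(p)  # 8/9
```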
Example 14
Given the joint density function
f(x, y) = x(1 + 3y²)/4,  0 < x < 2, 0 < y < 1
          0,             elsewhere
Find g(x), h(y), f(x | y), and evaluate P(1/4 < X < 1/2 | Y = 1/3).
Solution
g(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_0^1 x(1 + 3y²)/4 dy
     = [xy/4 + xy³/4]_{y=0}^{y=1} = x/2,  0 < x < 2

h(y) = ∫_{−∞}^{∞} f(x, y) dx = ∫_0^2 x(1 + 3y²)/4 dx
     = [x²/8 + 3x²y²/8]_{x=0}^{x=2} = (1 + 3y²)/2,  0 < y < 1

f(x | y) = f(x, y)/h(y) = [x(1 + 3y²)/4]/[(1 + 3y²)/2] = x/2,  0 < x < 2,

and

P(1/4 < X < 1/2 | Y = 1/3) = ∫_{1/4}^{1/2} (x/2) dx = 3/64
4
4. Statistical Independence
If f(x | y) does not depend on y, as is the case in Example 14, then f(x | y) = g(x) and
f(x, y) = g(x)h(y). The proof follows by substituting
f(x, y) = f(x | y) h(y)
into the marginal distribution of X. That is,
g(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_{−∞}^{∞} f(x | y) h(y) dy.
If f(x | y) does not depend on y, it may be taken outside the integral, and since
∫_{−∞}^{∞} h(y) dy = 1 we obtain
g(x) = f(x | y), and then f(x, y) = g(x)h(y).
If f(x | y) does not depend on y, then the outcome of the random variable Y has no impact on
the outcome of the random variable X. In other words, we say that X and Y are independent
random variables.
Definition 4.1
Let X and Y be two random variables, discrete or continuous, with joint probability
distribution f(x, y) and marginal distributions g(x) and h(y), respectively. The random
variables X and Y are said to be statistically independent if and only if
f(x, y) = g(x)h(y)
for all (x, y) within their range.
The continuous random variables of Example 14 are statistically independent, since the product
of the two marginal distributions gives the joint density function. This is obviously not the case,
however, for the continuous random variables of Example 13. Checking for statistical
independence of discrete random variables requires a more thorough investigation, since it is
possible to have the product of the marginal distributions equal to the joint probability
distribution for some but not all combinations of ( x, y ) . If you can find any point ( x, y ) for
which f(x, y) is defined such that f(x, y) ≠ g(x)h(y), the discrete variables are not
statistically independent.
Example 15
By referring to Example 2, is the number of blue refills in the sample independent of the number
of red refills? (Is X independent of Y?)
Example 16
Let
f(x, y) = 6xy²,  0 < x < 1, 0 < y < 1
          0,     elsewhere
Determine whether X and Y are statistically independent.
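One way to settle Example 16 is mechanical: compute both marginals of f(x, y) = 6xy² numerically and test whether f factors as g(x)h(y) on a grid of points (a Python sketch; the tolerance and grid points are arbitrary choices):

```python
def f(x, y):
    return 6 * x * y ** 2

def integrate01(fun, n=2000):
    """Midpoint-rule integral of fun over [0, 1]."""
    step = 1.0 / n
    return sum(fun((i + 0.5) * step) for i in range(n)) * step

def g(x):   # marginal of X: integrate f over y
    return integrate01(lambda y: f(x, y))

def h(y):   # marginal of Y: integrate f over x
    return integrate01(lambda x: f(x, y))

independent = all(
    abs(f(x, y) - g(x) * h(y)) < 1e-4
    for x in (0.1, 0.4, 0.7) for y in (0.2, 0.5, 0.9)
)
print(independent)  # True
```

Numerically g(x) ≈ 2x and h(y) ≈ 3y², so f(x, y) = g(x)h(y) and X and Y are independent.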
5. Covariance and Correlation
Definition 5.1
Let X and Y be random variables with joint probability distribution f(x, y). The covariance of
X and Y is
σ_XY = Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)]
Note:
1. Cov(X, Y) is a measure of the nature of the association between the random variables
X and Y. If large values of X often result in large values of Y, or small values of X result in
small values of Y, then positive X − μ_X will tend to pair with positive Y − μ_Y, and negative
X − μ_X with negative Y − μ_Y. Thus, the product (X − μ_X)(Y − μ_Y) will tend to be positive. On
the other hand, if large X values often result in small Y values, the product
(X − μ_X)(Y − μ_Y) will tend to be negative.
2. The sign of the covariance indicates whether the relationship between two dependent
random variables is positive or negative.
3. When X and Y are statistically independent, Cov(X, Y) = 0. The converse is not
generally true: two variables may have zero covariance and still not be statistically
independent.
4. The covariance describes only the linear relationship between two random variables.
Therefore, if the covariance between X and Y is zero, X and Y may have a nonlinear
relationship, which means that they are not necessarily independent.
Theorem 5.1
If X and Y are random variables with means μ_X and μ_Y, respectively, the covariance of X
and Y is
Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] = E[XY] − E[X]E[Y] = E[XY] − μ_X μ_Y
Proof
For the discrete case, we can write
σ_XY = Σ_x Σ_y (x − μ_X)(y − μ_Y) f(x, y)
= Σ_x Σ_y xy f(x, y) − μ_X Σ_x Σ_y y f(x, y) − μ_Y Σ_x Σ_y x f(x, y) + μ_X μ_Y Σ_x Σ_y f(x, y).
Since
μ_X = Σ_x Σ_y x f(x, y), μ_Y = Σ_x Σ_y y f(x, y), and Σ_x Σ_y f(x, y) = 1,
it follows that
σ_XY = E[XY] − μ_X μ_Y − μ_Y μ_X + μ_X μ_Y = E[XY] − μ_X μ_Y.
Example 17
Find the covariance between X and Y with joint probability function:
              Y
          0      1      4
      1   0.10   0.05   0.15
X     3   0.05   0.20   0.25
      5   0.15   0.00   0.05
Solution
E[XY] = Σ_x Σ_y xy f(x, y)
= (1)(0)f(1, 0) + (1)(1)f(1, 1) + (1)(4)f(1, 4) + (3)(0)f(3, 0) + (3)(1)f(3, 1) + (3)(4)f(3, 4)
  + (5)(0)f(5, 0) + (5)(1)f(5, 1) + (5)(4)f(5, 4)
= f(1, 1) + 4f(1, 4) + 3f(3, 1) + 12f(3, 4) + 5f(5, 1) + 20f(5, 4)
= 0.05 + 4(0.15) + 3(0.20) + 12(0.25) + 5(0) + 20(0.05)
= 5.25

Marginal distribution of X:
   x      1      3      5
  g(x)   0.30   0.50   0.20

μ_X = Σ_x x g(x) = (1)(0.30) + (3)(0.50) + (5)(0.20) = 2.8

Marginal distribution of Y:
   y      0      1      4
  h(y)   0.30   0.25   0.45

μ_Y = Σ_y y h(y) = (0)(0.30) + (1)(0.25) + (4)(0.45) = 2.05
σ_XY = E[XY] − μ_X μ_Y
     = 5.25 − (2.8)(2.05)
     = −0.49
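The table computation above is easy to mechanize: store the pmf with (x, y) tuples as keys and apply Theorem 5.1 directly (a Python sketch):

```python
# Joint pmf of Example 17, keyed by (x, y)
f = {
    (1, 0): 0.10, (1, 1): 0.05, (1, 4): 0.15,
    (3, 0): 0.05, (3, 1): 0.20, (3, 4): 0.25,
    (5, 0): 0.15, (5, 1): 0.00, (5, 4): 0.05,
}

E_XY = sum(x * y * p for (x, y), p in f.items())
E_X = sum(x * p for (x, _), p in f.items())   # same as summing x*g(x)
E_Y = sum(y * p for (_, y), p in f.items())   # same as summing y*h(y)
cov = E_XY - E_X * E_Y                        # Theorem 5.1

print(round(E_XY, 2), round(E_X, 2), round(E_Y, 2))  # 5.25 2.8 2.05
print(round(cov, 2))                                 # -0.49
```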
Example 18
The fraction X of male runners and the fraction Y of female runners who compete in marathon
races are described by the joint density function
f(x, y) = 8xy,  0 < y < x < 1
          0,    elsewhere
Find the covariance of X and Y.
Solution
g(x) = 4x³,  0 < x < 1
       0,    elsewhere

h(y) = 4y(1 − y²),  0 < y < 1
       0,           elsewhere

μ_X = E[X] = ∫_0^1 4x⁴ dx = 4/5

μ_Y = E[Y] = ∫_0^1 4y²(1 − y²) dy = 8/15

E[XY] = ∫_0^1 ∫_y^1 8x²y² dx dy = 4/9

σ_XY = E[XY] − μ_X μ_Y = 4/9 − (4/5)(8/15) = 4/225
Example 19
Let X and Y be discrete random variables with the joint probability distribution shown below.
Show that X and Y are dependent but have zero covariance.

              Y
         −1     0      1
    −1   1/16   3/16   1/16
X    0   3/16   0      3/16
     1   1/16   3/16   1/16
Solution
E[XY] = Σ_x Σ_y xy f(x, y)
= (−1)(−1)f(−1, −1) + (−1)(1)f(−1, 1) + (1)(−1)f(1, −1) + (1)(1)f(1, 1)
= 1/16 − 1/16 − 1/16 + 1/16 = 0

Marginal distribution of X:
   x     −1     0      1
  g(x)   5/16   6/16   5/16

μ_X = Σ_x x g(x) = (−1)(5/16) + (1)(5/16) = 0

Marginal distribution of Y:
   y     −1     0      1
  h(y)   5/16   6/16   5/16

μ_Y = Σ_y y h(y) = (−1)(5/16) + (1)(5/16) = 0

σ_XY = E[XY] − μ_X μ_Y = 0 − 0 = 0

However, g(−1)h(−1) = (5/16)(5/16) = 25/256 but f(−1, −1) = 1/16. Hence, X and Y are not
independent, since f(−1, −1) ≠ g(−1)h(−1).
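The same bookkeeping in code makes the moral of Example 19 concrete: the covariance vanishes even though the pmf does not factor (a Python sketch; the cell values are the ones implied by the marginal totals in the solution above):

```python
from fractions import Fraction as F

# pmf of Example 19: corner cells 1/16, edge cells 3/16, centre cell 0
f = {}
for x in (-1, 0, 1):
    for y in (-1, 0, 1):
        if x != 0 and y != 0:
            f[(x, y)] = F(1, 16)      # four corner cells
        elif (x, y) == (0, 0):
            f[(x, y)] = F(0)
        else:
            f[(x, y)] = F(3, 16)      # four edge cells

E_XY = sum(x * y * p for (x, y), p in f.items())
g = {x: sum(f[(x, y)] for y in (-1, 0, 1)) for x in (-1, 0, 1)}
h = {y: sum(f[(x, y)] for x in (-1, 0, 1)) for y in (-1, 0, 1)}

print(E_XY)                        # 0  (and E[X] = E[Y] = 0, so cov = 0)
print(f[(-1, -1)], g[-1] * h[-1])  # 1/16 25/256 -> not independent
```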
Although the covariance between two random variables does provide information regarding
the nature of the relationship, the magnitude of σ_XY does not indicate anything regarding the
strength of the relationship, since σ_XY is not scale-free. Its magnitude will depend on the units
used to measure both X and Y. There is a scale-free version of the covariance called the
correlation coefficient that is used widely in statistics.
Definition 5.2
Let X and Y be random variables with covariance σ_XY and standard deviations σ_X and σ_Y,
respectively. The correlation coefficient of X and Y is
ρ_XY = σ_XY/(σ_X σ_Y), where −1 ≤ ρ_XY ≤ 1.
ρ_XY is free of the units of X and Y. It assumes a value of zero when σ_XY = 0. When there is an
exact linear dependency, say Y = a + bX, ρ_XY = 1 if b > 0 and ρ_XY = −1 if b < 0.
Example 20
Find the correlation coefficient between X and Y in Example 17.
Solution
E[X²] = (1²)(0.30) + (3²)(0.50) + (5²)(0.20) = 9.8
E[Y²] = (0²)(0.30) + (1²)(0.25) + (4²)(0.45) = 7.45
σ_X² = E[X²] − μ_X² = 9.8 − 2.8² = 1.96
σ_Y² = E[Y²] − μ_Y² = 7.45 − 2.05² = 3.2475
ρ_XY = σ_XY/(σ_X σ_Y) = −0.49/(√1.96 · √3.2475) = −0.49/[(1.4)(1.8021)] = −0.1942
Example 21
Find the correlation coefficient of X and Y in Example 18.
Solution
E[X²] = ∫_0^1 4x⁵ dx = 2/3
E[Y²] = ∫_0^1 4y³(1 − y²) dy = 1 − 2/3 = 1/3
σ_X² = E[X²] − μ_X² = 2/3 − (4/5)² = 2/75
σ_Y² = E[Y²] − μ_Y² = 1/3 − (8/15)² = 11/225
ρ_XY = σ_XY/(σ_X σ_Y) = (4/225)/√[(2/75)(11/225)] = 4/√66 = 0.4924
Note that although the covariance in Example 20 is larger in magnitude (disregarding the sign)
than that in Example 21, the relationship of the magnitudes of the correlation coefficients in
these two examples is just the reverse. This is evidence that we cannot look at the magnitude
of the covariance to decide on how strong the relationship is.
Theorem 5.2
E[X + Y] = E[X] + E[Y]
If X represents the daily production of some item from machine A and Y the daily production of
the same kind of item from machine B, then X + Y represents the total number of items produced
daily by both machines. Theorem 5.2 states that the average daily production for both machines
is equal to the sum of the average daily production of each machine.
Theorem 5.3
Let X and Y be two independent variables. Then
E XY = E X E Y
Proof
E[XY] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy f(x, y) dx dy
      = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy g(x)h(y) dx dy   (by independence)
      = [∫_{−∞}^{∞} x g(x) dx][∫_{−∞}^{∞} y h(y) dy]
      = E[X]E[Y]
Theorem 5.3 can be illustrated for discrete variables by considering the experiment of tossing
a green die and a red die. Let the random variable X represent the outcome on the green die and
the random variable Y represent the outcome on the red die. Then XY represents the product of
the numbers that occur on the pair of dice. In the long run, the average of the products of the
numbers is equal to the product of the average number that occurs on the green die and the
average of the number that occurs on the red die.
Theorem 5.4
If X and Y are independent random variables, then
σ_XY = 0
Proof:
σ_XY = E[XY] − μ_X μ_Y = E[X]E[Y] − μ_X μ_Y   (by Theorem 5.3)
     = μ_X μ_Y − μ_X μ_Y = 0
Theorem 5.5
If X and Y are random variables with joint probability distribution f ( x, y) , then
Var(aX + bY + c) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y)
Proof:
Var(aX + bY + c) = E[(aX + bY + c − aμ_X − bμ_Y − c)²]
= E[(a(X − μ_X) + b(Y − μ_Y))²]
= a² E[(X − μ_X)²] + 2ab E[(X − μ_X)(Y − μ_Y)] + b² E[(Y − μ_Y)²]
= a² σ_X² + 2ab σ_XY + b² σ_Y²
Corollary:
Corollaries 1 to 3 state that the variance is unchanged if a constant is added to or subtracted from
a random variable. The addition or subtraction of a constant simply shifts the values of X to the
right or to the left but does not change their variability. However, if a random variable is
multiplied or divided by a constant, then Corollary 1 and 3 state that the variance is multiplied
or divided by the square of the constant.
The result stated in Corollary 4 is obtained from Theorem 5.5 by invoking Theorem 5.4.
Corollary 5 follows when b in Corollary 4 is replaced by -b.
Generalizing to a linear combination of n independent random variables, we have Corollary 6.
Example 22
Suppose E[X] = -3, E[X2] = 13, Var [Y] = 20, E[Y] = 4, and E [XY] = 7. Find Var [5X – 9Y].
Solution
Var[X] = E[X²] − (E[X])² = 13 − (−3)² = 4
Cov[X, Y] = E[XY] − E[X]E[Y] = 7 − (−3)(4) = 19
Var[5X − 9Y] = 25 Var[X] + 81 Var[Y] − 2(5)(9) Cov[X, Y]
= 25(4) + 81(20) − 90(19) = 100 + 1620 − 1710 = 10
Example 23
If X and Y are random variables with variances σ_X² = 2 and σ_Y² = 4 and covariance
σ_XY = −2, find the variance of the random variable Z = 3X − 4Y + 8.
Solution
Var(3X − 4Y + 8) = 9 Var(X) + 16 Var(Y) − 2(3)(4) Cov(X, Y)
= (9)(2) + (16)(4) − (24)(−2) = 130
Example 24
Let X and Y denote the amounts of two different types of impurities in a batch of a certain
chemical product. Suppose that X and Y are independent random variables with variances
σ_X² = 2 and σ_Y² = 3. Find the variance of the random variable Z = 3X − 2Y + 5.
Solution
Var(3 X − 2Y + 5) = 9Var(X ) + 4Var(Y )
= ( 9 )( 2 ) + ( 4 )( 3) = 30
6. The Multinomial Distribution
In general, if a given trial can result in any one of k possible outcomes E1, E2, …, Ek with
probabilities p1 , p2 ,..., pk , then the multinomial distribution will give the probability that E1
occurs x1 times, E2 occurs x2 times, …, and Ek occurs xk times in n independent trials, where
x1 + x2 + ... + xk = n
We shall denote this joint probability distribution by
f ( x1 , x2 ,..., xk ; p1 , p2 ,..., pk , n ) ,
where p1 + p2 + ... + pk = 1 , since the result of each trial must be one of the k possible outcomes.
The following shows the multinomial distribution.
Multinomial Distribution
If a given trial can result in the k outcomes E1, E2, …, Ek with probabilities p1 , p2 ,..., pk , then
the probability distribution of the random variables X 1 , X 2 ,..., X k representing the number of
occurrences for E1, E2, …, Ek in n independent trials, is
f(x1, x2, …, xk; p1, p2, …, pk, n) = [n!/(x1! x2! ⋯ xk!)] p1^x1 p2^x2 ⋯ pk^xk,
with
Σ_{i=1}^{k} xi = n  and  Σ_{i=1}^{k} pi = 1
Example 25
A certain city has 3 newspapers, A, B, and C. Newspaper A has 50 percent of the readers in
the city. Newspaper B, has 30 percent of the readers, and newspaper C has the remaining 20
percent. Find the probability that, among 8 randomly chosen readers in that city, 5 will read
newspaper A, 2 will read newspaper B, and 1 will read newspaper C. (assume no one reads
more than one newspaper)
Solution
f(5, 2, 1; 0.5, 0.3, 0.2, 8) = [8!/(5! 2! 1!)] (0.5)⁵ (0.3)² (0.2)¹
= 0.0945
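The multinomial pmf is short to code from the boxed formula, and reproduces the 0.0945 above (a Python sketch; multinomial_pmf is a hypothetical helper name):

```python
from math import factorial

def multinomial_pmf(xs, ps):
    """f(x1,...,xk; p1,...,pk, n) = n!/(x1!...xk!) * p1^x1 * ... * pk^xk."""
    n = sum(xs)
    coef = factorial(n)
    for x in xs:
        coef //= factorial(x)        # build the multinomial coefficient
    prob = float(coef)
    for x, p in zip(xs, ps):
        prob *= p ** x
    return prob

# Example 25: 8 readers split 5/2/1 among papers with shares 0.5/0.3/0.2
p = multinomial_pmf([5, 2, 1], [0.5, 0.3, 0.2])
print(round(p, 4))  # 0.0945
```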
Example 26
The complexity of arrivals and departures of planes at an airport is such that computer
simulation is often used to model the “ideal” conditions. For a certain airport with three
runways, it is known that in the ideal setting the following are the probabilities that the
individual runways are accessed by a randomly arriving commercial jet:
Runway 1: p1 = 2/9
Runway 2: p2 = 1/6
Runway 3: p3 = 11/18
What is the probability that 6 randomly arriving airplanes are distributed in the following
fashion?
Runway 1: 2 airplanes
Runway 2: 1 airplane
Runway 3: 3 airplanes
Solution
f(2, 1, 3; 2/9, 1/6, 11/18, 6) = [6!/(2! 1! 3!)] (2/9)² (1/6)¹ (11/18)³
= 0.1127
7. Conditional Expectations
Definition 7.1
If X and Y are any two random variables, the conditional expectation of g(X), given that
Y = y, is
E[g(X) | Y = y] = ∫_{−∞}^{∞} g(x) f(x | y) dx  if X and Y are jointly continuous, and
E[g(X) | Y = y] = Σ_x g(x) f(x | y)  if X and Y are discrete.
Taking g(X) = X and averaging over the distribution of Y gives the useful identity
E[X] = E[E[X | Y]]   (7.1)
which, when Y is discrete, can be written as
E[X] = Σ_y E[X | Y = y] P(Y = y)   (7.2)
Proof
Σ_y E[X | Y = y] P(Y = y) = Σ_y Σ_x x P(X = x | Y = y) P(Y = y)
= Σ_y Σ_x x [P(X = x, Y = y)/P(Y = y)] P(Y = y)
= Σ_y Σ_x x P(X = x, Y = y)
= Σ_x x Σ_y P(X = x, Y = y)
= Σ_x x P(X = x)
= E[X]
Example 27
A quality control plan for an assembly line involves sampling n =10 finished items per day and
counting Y, the number of defectives. If p denotes the probability of observing a defective, then
Y has a binomial distribution, assuming a large number of items are produced by the line. But
p varies from day to day and is assumed to have a uniform distribution on the interval from 0
to ¼. Find the expected value of Y.
Solution
Y ~ Bin(10, p), where p ~ Uniform(0, 1/4)

E[Y | p] = np = 10p

f(p) = 4,  0 < p < 1/4
       0,  elsewhere

E[Y] = ∫_{−∞}^{∞} E[Y | p] f(p) dp
     = ∫_0^{1/4} (10p)(4) dp
     = [20p²]_0^{1/4}
     = 20/16 = 5/4
Example 28
A professor works in Moon Township and lives in Pittsburgh. It is about a 25 mile commute.
The professor randomly chooses from 3 different routes home in a futile attempt to evade rush
hour traffic. The routes are identified by the name of a major bridge along the way. The
professor has accumulated data over a lengthy period of time on the mean drive times of the
three routes. Using the data summary given below, find the professor's expected drive time
home.

Route            Probability of Route   Expected Time of Route (in minutes)
Wickle bridge    0.2                    55
Fort bridge      0.4                    50
Liberty bridge   0.4                    45
Solution
Let X be the drive time, and let
Y = 1 if route Wickle bridge is chosen,
Y = 2 if route Fort bridge is chosen,
Y = 3 if route Liberty bridge is chosen.

E[X] = Σ_y E[X | Y = y] P(Y = y)
     = (55)(0.2) + (50)(0.4) + (45)(0.4)
     = 49 minutes
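With the route table encoded as a dictionary, the final line is a one-line weighted sum (a Python sketch):

```python
# Example 28: (P(Y = y), E[X | Y = y]) for each route
routes = {
    "Wickle bridge":  (0.2, 55),
    "Fort bridge":    (0.4, 50),
    "Liberty bridge": (0.4, 45),
}

# E[X] = sum over y of E[X | Y = y] * P(Y = y)
E_X = sum(p * t for p, t in routes.values())
print(round(E_X, 2))  # 49.0 minutes
```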