Chap2-Discrete Random Variables

1. A discrete random variable is a function that maps from a sample space of possible outcomes to the real numbers.
2. Examples of discrete random variables include the number of heads when tossing a coin 3 times, and the index of the first heads.
3. The probability mass function gives the probability that a discrete random variable equals each possible value, and the cumulative distribution function gives the probability that the variable is less than or equal to each value.


DISCRETE RANDOM VARIABLES

DEFINITION : A discrete random variable is a function X(s) from a
finite or countably infinite sample space S to the real numbers :

X(·) : S → R .

EXAMPLE : Toss a coin 3 times in sequence. The sample space is

S = {HHH , HHT , HTH , HTT , THH , THT , TTH , TTT} ,

and examples of random variables are

• X(s) = the number of Heads in the sequence ; e.g., X(HTH) = 2 ,

• Y (s) = the index of the first H ; e.g., Y (TTH) = 3 ,
  with Y (s) = 0 if the sequence has no H , i.e., Y (TTT) = 0 .

NOTE : In this example X(s) and Y (s) are actually integers .

71
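
PYTHON SKETCH : a minimal encoding of this example, assuming each outcome
is written as a string of H's and T's.

    from itertools import product

    # Sample space : all 2^3 sequences of three tosses.
    S = ["".join(seq) for seq in product("HT", repeat=3)]

    def X(s):                     # number of Heads in the sequence
        return s.count("H")

    def Y(s):                     # index of the first H (0 if there is no H)
        return s.index("H") + 1 if "H" in s else 0

    print(X("HTH"), Y("TTH"), Y("TTT"))   # 2 3 0
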
Value-ranges of a random variable correspond to events in S .

EXAMPLE : For the sample space

S = {HHH , HHT , HTH , HTT , THH , THT , TTH , TTT} ,

with
X(s) = the number of Heads ,
the value
X(s) = 2 corresponds to the event {HHT , HTH , THH} ,
and the values
1 < X(s) ≤ 3 correspond to {HHH , HHT , HTH , THH} .

NOTATION : If it is clear what S is then we often just write
X instead of X(s) .

72
Value-ranges of a random variable correspond to events in S ,
and
events in S have a probability .
Thus
Value-ranges of a random variable have a probability .

EXAMPLE : For the sample space

S = {HHH , HHT , HTH , HTT , THH , THT , TTH , TTT} ,

with X(s) = the number of Heads ,
we have
P (0 < X ≤ 2) = 6/8 .

QUESTION : What are the values of


P (X ≤ −1) , P (X ≤ 0) , P (X ≤ 1) , P (X ≤ 2) , P (X ≤ 3) , P (X ≤ 4) ?

73
NOTATION : We will also write pX (x) to denote P (X = x) .

EXAMPLE : For the sample space

S = {HHH , HHT , HTH , HTT , THH , THT , TTH , TTT} ,
with
X(s) = the number of Heads ,
we have

pX (0) ≡ P ( {TTT} ) = 1/8

pX (1) ≡ P ( {HTT , THT , TTH} ) = 3/8

pX (2) ≡ P ( {HHT , HTH , THH} ) = 3/8

pX (3) ≡ P ( {HHH} ) = 1/8

where
pX (0) + pX (1) + pX (2) + pX (3) = 1 .   ( Why ? )

74
Graphical representation of X : the events E0 = {TTT} , E1 = {HTT , THT , TTH} ,
E2 = {HHT , HTH , THH} , E3 = {HHH} are mapped to the values 0 , 1 , 2 , 3 .

The events E0 , E1 , E2 , E3 are disjoint since X(s) is a function !


(X : S → R must be defined for all s ∈ S and must be single-valued.)

75
The graph of pX .

76
DEFINITION :
pX (x) ≡ P (X = x) ,
is called the probability mass function .

DEFINITION :
FX (x) ≡ P (X ≤ x) ,
is called the (cumulative) probability distribution function .

PROPERTIES :
• FX (x) is a non-decreasing function of x . ( Why ? )
• FX (−∞) = 0 and FX (∞) = 1 . ( Why ? )
• P (a < X ≤ b) = FX (b) − FX (a) . ( Why ? )

NOTATION : When it is clear what X is then we also write

p(x) for pX (x) and F (x) for FX (x) .

77
EXAMPLE : With X(s) = the number of Heads , and
S = {HHH , HHT , HTH , HTT , THH , THT , TTH , TTT} ,
p(0) = 1/8 , p(1) = 3/8 , p(2) = 3/8 , p(3) = 1/8 ,
we have the probability distribution function

F (−1) ≡ P (X ≤ −1) = 0
F ( 0) ≡ P (X ≤ 0) = 1/8
F ( 1) ≡ P (X ≤ 1) = 4/8
F ( 2) ≡ P (X ≤ 2) = 7/8
F ( 3) ≡ P (X ≤ 3) = 1
F ( 4) ≡ P (X ≤ 4) = 1

We see, for example, that
P (0 < X ≤ 2) = P (X = 1) + P (X = 2) = F (2) − F (0) = 7/8 − 1/8 = 6/8 .

78
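
PYTHON SKETCH : computing pX and FX by counting outcomes, assuming the 8
outcomes are equally likely; this reproduces the values above and answers
the earlier QUESTION about P (X ≤ x) .

    from fractions import Fraction
    from itertools import product

    S = ["".join(seq) for seq in product("HT", repeat=3)]
    X = lambda s: s.count("H")

    def pX(x):                    # P(X = x)
        return Fraction(sum(1 for s in S if X(s) == x), len(S))

    def FX(x):                    # P(X <= x)
        return Fraction(sum(1 for s in S if X(s) <= x), len(S))

    print([pX(x) for x in range(4)])             # [1/8, 3/8, 3/8, 1/8]
    print([FX(x) for x in (-1, 0, 1, 2, 3, 4)])  # [0, 1/8, 1/2, 7/8, 1, 1]
    print(FX(2) - FX(0))                         # 3/4  ( = 6/8 = P(0 < X <= 2) )
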
The graph of the probability distribution function FX .

79
EXAMPLE : Toss a coin until ”Heads” occurs.
Then the sample space is countably infinite , namely,
S = {H , TH , TTH , TTTH , · · · } .

The random variable X is the number of tosses until ”Heads” occurs :

X(H) = 1 , X(TH) = 2 , X(TTH) = 3 , · · ·
Then
p(1) = 1/2 , p(2) = 1/4 , p(3) = 1/8 , · · ·   ( Why ? )
and
F (n) = P (X ≤ n) = ∑_{k=1}^{n} p(k) = ∑_{k=1}^{n} 1/2^k = 1 − 1/2^n ,
and, as should be the case,
∑_{k=1}^{∞} p(k) = lim_{n→∞} ∑_{k=1}^{n} p(k) = lim_{n→∞} ( 1 − 1/2^n ) = 1 .

NOTE : The outcomes in S do not have equal probability !


EXERCISE : Draw the probability mass and distribution functions.

80
X(s) is the number of tosses until ”Heads” occurs · · ·

REMARK : We can also take S ≡ Sn as all ordered outcomes of
length n . For example, for n = 4,

S4 = { H̃HHH , H̃HHT , H̃HT H , H̃HT T ,

H̃T HH , H̃T HT , H̃T T H , H̃T T T ,

T H̃HH , T H̃HT , T H̃T H , T H̃T T ,

T T H̃H , T T H̃T , T T T H̃ , TTTT }.

where for each outcome the first ”Heads” is marked as H̃ .


Each outcome in S4 has equal probability 2^−n ( here 2^−4 = 1/16 ) , and

pX (1) = 1/2 , pX (2) = 1/4 , pX (3) = 1/8 , pX (4) = 1/16 , · · · ,

independent of n .

81
Joint distributions
The probability mass function and the probability distribution function
can also be functions of more than one variable.

EXAMPLE : Toss a coin 3 times in sequence. For the sample space

S = {HHH , HHT , HTH , HTT , THH , THT , TTH , TTT} ,
we let
X(s) = # Heads , Y (s) = index of the first H (0 for T T T ) .
Then we have the joint probability mass function
pX,Y (x, y) = P (X = x , Y = y) .
For example,
pX,Y (2, 1) = P (X = 2 , Y = 1)
            = P ( 2 Heads , 1st toss is Heads )
            = 2/8 = 1/4 .

82
EXAMPLE : ( continued · · · ) For
S = {HHH , HHT , HTH , HTT , THH , THT , TTH , TTT} ,
X(s) = number of Heads, and Y (s) = index of the first H ,

we can list the values of pX,Y (x, y) :

Joint probability mass function pX,Y (x, y)


         y=0    y=1    y=2    y=3    pX (x)
x=0      1/8     0      0      0      1/8
x=1       0     1/8    1/8    1/8     3/8
x=2       0     2/8    1/8     0      3/8
x=3       0     1/8     0      0      1/8
pY (y)   1/8    4/8    2/8    1/8      1

NOTE :
• The marginal probability pX is the probability mass function of X.
• The marginal probability pY is the probability mass function of Y .

83
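
PYTHON SKETCH : building the joint mass function pX,Y and its marginals by
counting, assuming the same 8 equally likely outcomes as before.

    from fractions import Fraction
    from itertools import product

    S = ["".join(seq) for seq in product("HT", repeat=3)]
    X = lambda s: s.count("H")
    Y = lambda s: s.index("H") + 1 if "H" in s else 0

    def pXY(x, y):                # joint probability mass function
        return Fraction(sum(1 for s in S if X(s) == x and Y(s) == y), len(S))

    pX = lambda x: sum(pXY(x, y) for y in range(4))   # marginal of X
    pY = lambda y: sum(pXY(x, y) for x in range(4))   # marginal of Y

    print(pXY(2, 1))                   # 1/4  ( = 2/8 )
    print([pX(x) for x in range(4)])   # [1/8, 3/8, 3/8, 1/8]
    print([pY(y) for y in range(4)])   # [1/8, 1/2, 1/4, 1/8]
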
EXAMPLE : ( continued · · · )
X(s) = number of Heads, and Y (s) = index of the first H .

         y=0    y=1    y=2    y=3    pX (x)
x=0      1/8     0      0      0      1/8
x=1       0     1/8    1/8    1/8     3/8
x=2       0     2/8    1/8     0      3/8
x=3       0     1/8     0      0      1/8
pY (y)   1/8    4/8    2/8    1/8      1

For example,

• X = 2 corresponds to the event {HHT , HT H , T HH} .


• Y = 1 corresponds to the event {HHH , HHT , HT H , HT T } .
• (X = 2 and Y = 1) corresponds to the event {HHT , HT H} .

QUESTION : Are the events X = 2 and Y = 1 independent ?

84
Graphical representation of X and Y : the events E31 = {HHH} , E21 = {HHT , HTH} ,
E22 = {THH} , E11 = {HTT} , E12 = {THT} , E13 = {TTH} , E00 = {TTT} .

The events Ei,j ≡ { s ∈ S : X(s) = i , Y (s) = j } are disjoint .


QUESTION : Are the events X = 2 and Y = 1 independent ?

85
DEFINITION :
pX,Y (x, y) ≡ P (X = x , Y = y) ,
is called the joint probability mass function .

DEFINITION :
FX,Y (x, y) ≡ P (X ≤ x , Y ≤ y) ,
is called the joint (cumulative) probability distribution function .

NOTATION : When it is clear what X and Y are then we also write

p(x, y) for pX,Y (x, y) ,   and   F (x, y) for FX,Y (x, y) .

86
EXAMPLE : Three tosses : X(s) = # Heads, Y (s) = index 1st H .
Joint probability mass function pX,Y (x, y)
         y=0    y=1    y=2    y=3    pX (x)
x=0      1/8     0      0      0      1/8
x=1       0     1/8    1/8    1/8     3/8
x=2       0     2/8    1/8     0      3/8
x=3       0     1/8     0      0      1/8
pY (y)   1/8    4/8    2/8    1/8      1

Joint distribution function FX,Y (x, y) ≡ P (X ≤ x, Y ≤ y)


         y=0    y=1    y=2    y=3    FX (·)
x=0      1/8    1/8    1/8    1/8     1/8
x=1      1/8    2/8    3/8    4/8     4/8
x=2      1/8    4/8    6/8    7/8     7/8
x=3      1/8    5/8    7/8     1       1
FY (·)   1/8    5/8    7/8     1       1

Note that the distribution function FX is a copy of the 4th column,
and the distribution function FY is a copy of the 4th row.   ( Why ? )

87
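
PYTHON SKETCH : the joint distribution function FX,Y obtained by counting
outcomes with X(s) ≤ x and Y (s) ≤ y , assuming the same three-toss model.

    from fractions import Fraction
    from itertools import product

    S = ["".join(seq) for seq in product("HT", repeat=3)]
    X = lambda s: s.count("H")
    Y = lambda s: s.index("H") + 1 if "H" in s else 0

    def FXY(x, y):                # P(X <= x, Y <= y)
        return Fraction(sum(1 for s in S if X(s) <= x and Y(s) <= y), len(S))

    print(FXY(2, 2), FXY(3, 3))   # 3/4 1   ( 3/4 = 6/8 , as in the table )
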
In the preceding example :
Joint probability mass function pX,Y (x, y)
         y=0    y=1    y=2    y=3    pX (x)
x=0      1/8     0      0      0      1/8
x=1       0     1/8    1/8    1/8     3/8
x=2       0     2/8    1/8     0      3/8
x=3       0     1/8     0      0      1/8
pY (y)   1/8    4/8    2/8    1/8      1

Joint distribution function FX,Y (x, y) ≡ P (X ≤ x, Y ≤ y)


         y=0    y=1    y=2    y=3    FX (·)
x=0      1/8    1/8    1/8    1/8     1/8
x=1      1/8    2/8    3/8    4/8     4/8
x=2      1/8    4/8    6/8    7/8     7/8
x=3      1/8    5/8    7/8     1       1
FY (·)   1/8    5/8    7/8     1       1

QUESTION : Why is
P (1 < X ≤ 3 , 1 < Y ≤ 3) = F (3, 3) − F (1, 3) − F (3, 1) + F (1, 1) ?

88
EXERCISE :
Roll a four-sided die (tetrahedron) two times.
(The sides are marked 1 , 2 , 3 , 4 .)
Suppose each of the four sides is equally likely to end facing down.
Suppose the outcome of a single roll is the side that faces down ( ! ).

Define the random variables X and Y as

X = result of the first roll , Y = sum of the two rolls.

• What is a good choice of the sample space S ?


• How many outcomes are there in S ?
• List the values of the joint probability mass function pX,Y (x, y) .
• List the values of the joint cumulative distribution function FX,Y (x, y) .

89
EXERCISE :

Three balls are selected at random from a bag containing

2 red , 3 green , 4 blue balls .

Define the random variables

R(s) = the number of red balls drawn,


and
G(s) = the number of green balls drawn .

List the values of


• the joint probability mass function pR,G (r, g) .
• the marginal probability mass functions pR (r) and pG (g) .
• the joint distribution function FR,G (r, g) .
• the marginal distribution functions FR (r) and FG (g) .

90
Independent random variables

Two discrete random variables X(s) and Y (s) are independent if


P (X = x , Y = y) = P (X = x) · P (Y = y) , for all x and y ,

or, equivalently, if their probability mass functions satisfy


pX,Y (x, y) = pX (x) · pY (y) , for all x and y ,

or, equivalently, if the events


Ex ≡ X −1 ({x}) and Ey ≡ Y −1 ({y}) ,
are independent in the sample space S , i.e.,
P (Ex Ey ) = P (Ex ) · P (Ey ) , for all x and y .
NOTE :
• In the current discrete case, x and y are typically integers .
• X −1 ({x}) ≡ { s ∈ S : X(s) = x } .

91
Graphical representation of X and Y : the events E31 = {HHH} , E21 = {HHT , HTH} ,
E22 = {THH} , E11 = {HTT} , E12 = {THT} , E13 = {TTH} , E00 = {TTT} .

Three tosses : X(s) = # Heads, Y (s) = index 1st H .

• What are the values of pX (2) , pY (1) , pX,Y (2, 1) ?


• Are X and Y independent ?

92
RECALL :

X(s) and Y (s) are independent if for all x and y :

pX,Y (x, y) = pX (x) · pY (y) .

EXERCISE :

Roll a die two times in a row.

Let
X be the result of the 1st roll ,
and
Y the result of the 2nd roll .

Are X and Y independent , i.e., is

pX,Y (k, ℓ) = pX (k) · pY (ℓ), for all 1 ≤ k, ℓ ≤ 6 ?

93
EXERCISE :

Are these random variables X and Y independent ?

Joint probability mass function pX,Y (x, y)


         y=0    y=1    y=2    y=3    pX (x)
x=0      1/8     0      0      0      1/8
x=1       0     1/8    1/8    1/8     3/8
x=2       0     2/8    1/8     0      3/8
x=3       0     1/8     0      0      1/8
pY (y)   1/8    4/8    2/8    1/8      1

94
EXERCISE : Are these random variables X and Y independent ?

Joint probability mass function pX,Y (x, y)


         y=1     y=2     y=3     pX (x)
x=1      1/3     1/12    1/12     1/2
x=2      2/9     1/18    1/18     1/3
x=3      1/9     1/36    1/36     1/6
pY (y)   2/3     1/6     1/6       1

Joint distribution function FX,Y (x, y) ≡ P (X ≤ x, Y ≤ y)


         y=1     y=2     y=3     FX (x)
x=1      1/3     5/12    1/2      1/2
x=2      5/9     25/36   5/6      5/6
x=3      2/3     5/6      1        1
FY (y)   2/3     5/6      1        1

QUESTION : Is FX,Y (x, y) = FX (x) · FY (y) ?

95
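
PYTHON SKETCH : a direct check of the independence question for the table
above, assuming the joint probabilities are stored as Fractions.

    from fractions import Fraction as F

    p = {(1, 1): F(1, 3),  (1, 2): F(1, 12), (1, 3): F(1, 12),
         (2, 1): F(2, 9),  (2, 2): F(1, 18), (2, 3): F(1, 18),
         (3, 1): F(1, 9),  (3, 2): F(1, 36), (3, 3): F(1, 36)}

    pX = {x: sum(p[x, y] for y in (1, 2, 3)) for x in (1, 2, 3)}
    pY = {y: sum(p[x, y] for x in (1, 2, 3)) for y in (1, 2, 3)}

    print(all(p[x, y] == pX[x] * pY[y]
              for x in (1, 2, 3) for y in (1, 2, 3)))   # True : independent
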
PROPERTY :
The joint distribution function of independent random variables
X and Y satisfies
FX,Y (x, y) = FX (x) · FY (y) , for all x, y .

PROOF :

FX,Y (xk , yℓ ) = P (X ≤ xk , Y ≤ yℓ )

   = ∑_{i≤k} ∑_{j≤ℓ} pX,Y (xi , yj )

   = ∑_{i≤k} ∑_{j≤ℓ} pX (xi ) · pY (yj )   (by independence)

   = ∑_{i≤k} { pX (xi ) · ∑_{j≤ℓ} pY (yj ) }

   = { ∑_{i≤k} pX (xi ) } · { ∑_{j≤ℓ} pY (yj ) }

   = FX (xk ) · FY (yℓ ) .

96
Conditional distributions

Let X and Y be discrete random variables with joint probability


mass function
pX,Y (x, y) .

For given x and y , let


Ex = X −1 ({x}) and Ey = Y −1 ({y}) ,
be their corresponding events in the sample space S.

Then
P (Ex |Ey ) ≡ P (Ex Ey ) / P (Ey ) = pX,Y (x, y) / pY (y) .

Thus it is natural to define the conditional probability mass function

pX|Y (x|y) ≡ P (X = x | Y = y) = pX,Y (x, y) / pY (y) .

97
Graphical representation of X and Y : the events E31 = {HHH} , E21 = {HHT , HTH} ,
E22 = {THH} , E11 = {HTT} , E12 = {THT} , E13 = {TTH} , E00 = {TTT} .

Three tosses : X(s) = # Heads, Y (s) = index 1st H .

• What are the values of P (X = 2 | Y = 1) and P (Y = 1 | X = 2) ?

98
EXAMPLE : (3 tosses : X(s) = # Heads, Y (s) = index 1st H.)
Joint probability mass function pX,Y (x, y)
         y=0    y=1    y=2    y=3    pX (x)
x=0      1/8     0      0      0      1/8
x=1       0     1/8    1/8    1/8     3/8
x=2       0     2/8    1/8     0      3/8
x=3       0     1/8     0      0      1/8
pY (y)   1/8    4/8    2/8    1/8      1

Conditional probability mass function pX|Y (x|y) = pX,Y (x, y) / pY (y) :

         y=0    y=1    y=2    y=3
x=0       1      0      0      0
x=1       0     2/8    4/8     1
x=2       0     4/8    4/8     0
x=3       0     2/8     0      0
          1      1      1      1

EXERCISE : Also construct the Table for pY |X (y|x) = pX,Y (x, y) / pX (x) .

99
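
PYTHON SKETCH : the conditional mass function pX|Y computed from counts,
assuming the same three-toss model; it reproduces the last Table.

    from fractions import Fraction
    from itertools import product

    S = ["".join(seq) for seq in product("HT", repeat=3)]
    X = lambda s: s.count("H")
    Y = lambda s: s.index("H") + 1 if "H" in s else 0

    def pX_given_Y(x, y):         # P(X = x | Y = y)
        Ey = [s for s in S if Y(s) == y]
        return Fraction(sum(1 for s in Ey if X(s) == x), len(Ey))

    print(pX_given_Y(2, 1))                        # 1/2  ( = 4/8 )
    print([pX_given_Y(x, 2) for x in range(4)])    # [0, 1/2, 1/2, 0]
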
EXAMPLE :
Joint probability mass function pX,Y (x, y)
         y=1     y=2     y=3     pX (x)
x=1      1/3     1/12    1/12     1/2
x=2      2/9     1/18    1/18     1/3
x=3      1/9     1/36    1/36     1/6
pY (y)   2/3     1/6     1/6       1

Conditional probability mass function pX|Y (x|y) = pX,Y (x, y) / pY (y) :

         y=1     y=2     y=3
x=1      1/2     1/2     1/2
x=2      1/3     1/3     1/3
x=3      1/6     1/6     1/6
          1       1       1

QUESTION : What does the last Table tell us?


EXERCISE : Also construct the Table for P (Y = y|X = x) .

100
Expectation
The expected value of a discrete random variable X is

E[X] ≡ ∑_k xk · P (X = xk ) = ∑_k xk · pX (xk ) .

Thus E[X] represents the weighted average value of X .

( E[X] is also called the mean of X .)

EXAMPLE : The expected value of rolling a die is

E[X] = 1 · 1/6 + 2 · 1/6 + · · · + 6 · 1/6 = (1/6) · ∑_{k=1}^{6} k = 7/2 .

EXERCISE : Prove the following :


• E[aX] = a E[X] ,
• E[aX + b] = a E[X] + b .

101
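
PYTHON SKETCH : the expected value of a fair die, together with a numerical
check of the two exercise identities for a = 2 , b = 3 .

    from fractions import Fraction as F

    p = {k: F(1, 6) for k in range(1, 7)}         # fair die
    E = lambda g: sum(g(k) * p[k] for k in p)     # E[g(X)]

    EX = E(lambda k: k)
    print(EX)                                     # 7/2
    print(E(lambda k: 2 * k) == 2 * EX)           # True : E[aX]   = a E[X]
    print(E(lambda k: 2 * k + 3) == 2 * EX + 3)   # True : E[aX+b] = a E[X] + b
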
EXAMPLE : Toss a coin until ”Heads” occurs. Then
S = {H , TH , TTH , TTTH , · · · } .

The random variable X is the number of tosses until ”Heads” occurs :

X(H) = 1 , X(TH) = 2 , X(TTH) = 3 , · · ·

Then
E[X] = 1 · 1/2 + 2 · 1/4 + 3 · 1/8 + · · · = lim_{n→∞} ∑_{k=1}^{n} k/2^k = 2 .

   n     ∑_{k=1}^{n} k/2^k
   1     0.50000000
   2     1.00000000
   3     1.37500000
  10     1.98828125
  40     1.99999999
REMARK :
Perhaps using Sn = {all sequences of n tosses} is better · · ·

102
The expected value of a function of a random variable is

E[g(X)] ≡ ∑_k g(xk ) p(xk ) .

EXAMPLE :
The pay-off of rolling a die is $ k^2 , where k is the side facing up.

What should the entry fee be for the betting to break even?

SOLUTION : Here g(X) = X^2 , and

E[g(X)] = ∑_{k=1}^{6} k^2 · 1/6 = (1/6) · 6(6 + 1)(2 · 6 + 1)/6 = 91/6 ≅ $15.17 .

103
The expected value of a function of two random variables is

E[g(X, Y )] ≡ ∑_k ∑_ℓ g(xk , yℓ ) p(xk , yℓ ) .

EXAMPLE :
         y=1     y=2     y=3     pX (x)
x=1      1/3     1/12    1/12     1/2
x=2      2/9     1/18    1/18     1/3
x=3      1/9     1/36    1/36     1/6
pY (y)   2/3     1/6     1/6       1

E[X] = 1 · 1/2 + 2 · 1/3 + 3 · 1/6 = 5/3 ,

E[Y ] = 1 · 2/3 + 2 · 1/6 + 3 · 1/6 = 3/2 ,

E[XY ] = 1 · 1/3 + 2 · 1/12 + 3 · 1/12
       + 2 · 2/9 + 4 · 1/18 + 6 · 1/18
       + 3 · 1/9 + 6 · 1/36 + 9 · 1/36 = 5/2 .   ( So ? )

104
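
PYTHON SKETCH : verifying E[X] , E[Y ] and E[XY ] for the table above,
assuming the joint probabilities are stored as Fractions indexed by (x, y).

    from fractions import Fraction as F

    p = {(1, 1): F(1, 3),  (1, 2): F(1, 12), (1, 3): F(1, 12),
         (2, 1): F(2, 9),  (2, 2): F(1, 18), (2, 3): F(1, 18),
         (3, 1): F(1, 9),  (3, 2): F(1, 36), (3, 3): F(1, 36)}

    EX  = sum(x * q for (x, y), q in p.items())
    EY  = sum(y * q for (x, y), q in p.items())
    EXY = sum(x * y * q for (x, y), q in p.items())

    print(EX, EY, EXY)            # 5/3 3/2 5/2
    print(EXY == EX * EY)         # True
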
PROPERTY :

• If X and Y are independent then E[XY ] = E[X] E[Y ] .

PROOF :

E[XY ] = ∑_k ∑_ℓ xk yℓ pX,Y (xk , yℓ )

       = ∑_k ∑_ℓ xk yℓ pX (xk ) pY (yℓ )   (by independence)

       = ∑_k { xk pX (xk ) ∑_ℓ yℓ pY (yℓ ) }

       = { ∑_k xk pX (xk ) } · { ∑_ℓ yℓ pY (yℓ ) }

       = E[X] · E[Y ] .

EXAMPLE : See the preceding example !

105
PROPERTY : E[X + Y ] = E[X] + E[Y ] . ( Always ! )

PROOF :

E[X + Y ] = ∑_k ∑_ℓ (xk + yℓ ) pX,Y (xk , yℓ )

   = ∑_k ∑_ℓ xk pX,Y (xk , yℓ ) + ∑_k ∑_ℓ yℓ pX,Y (xk , yℓ )

   = ∑_k ∑_ℓ xk pX,Y (xk , yℓ ) + ∑_ℓ ∑_k yℓ pX,Y (xk , yℓ )

   = ∑_k { xk ∑_ℓ pX,Y (xk , yℓ ) } + ∑_ℓ { yℓ ∑_k pX,Y (xk , yℓ ) }

   = ∑_k { xk pX (xk ) } + ∑_ℓ { yℓ pY (yℓ ) }

   = E[X] + E[Y ] .

NOTE : X and Y need not be independent !

106
EXERCISE :
Probability mass function pX,Y (x, y)
         y=6     y=8     y=10    pX (x)
x=1      1/5      0      1/5      2/5
x=2       0      1/5      0       1/5
x=3      1/5      0      1/5      2/5
pY (y)   2/5     1/5     2/5       1

Show that

• E[X] = 2 , E[Y ] = 8 , E[XY ] = 16

• X and Y are not independent

Thus if
E[XY ] = E[X] E[Y ] ,

then it does not necessarily follow that X and Y are independent !

107
Variance and Standard Deviation
Let X have mean
µ = E[X] .

Then the variance of X is

V ar(X) ≡ E[ (X − µ)^2 ] ≡ ∑_k (xk − µ)^2 p(xk ) ,

which is the average weighted square distance from the mean.

We have
V ar(X) = E[X 2 − 2µX + µ2 ]

= E[X 2 ] − 2µE[X] + µ2

= E[X 2 ] − 2µ2 + µ2

= E[X 2 ] − µ2 .

108
The standard deviation of X is

σ(X) ≡ √V ar(X) = √E[ (X − µ)^2 ] = √( E[X^2 ] − µ^2 ) ,

which is the average weighted distance from the mean.

EXAMPLE : The variance of rolling a die is

V ar(X) = ∑_{k=1}^{6} [ k^2 · 1/6 ] − µ^2

        = (1/6) · 6(6 + 1)(2 · 6 + 1)/6 − (7/2)^2 = 35/12 .

The standard deviation is

σ = √(35/12) ≅ 1.70 .

109
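
PYTHON SKETCH : the variance and standard deviation of a fair die, checked
against the shortcut formula E[X^2] − µ^2 .

    from fractions import Fraction as F
    from math import sqrt

    p   = {k: F(1, 6) for k in range(1, 7)}            # fair die
    mu  = sum(k * p[k] for k in p)                      # 7/2
    var = sum((k - mu) ** 2 * p[k] for k in p)          # E[(X - mu)^2]

    print(var)                                          # 35/12
    print(var == sum(k * k * p[k] for k in p) - mu ** 2)   # True
    print(sqrt(var))                                    # 1.7078...
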
Covariance
Let X and Y be random variables with mean

E[X] = µX , E[Y ] = µY .

Then the covariance of X and Y is defined as

Cov(X, Y ) ≡ E[ (X −µX ) (Y −µY ) ] = ∑_{k,ℓ} (xk −µX ) (yℓ −µY ) p(xk , yℓ ) .

We have
Cov(X, Y ) = E[ (X − µX ) (Y − µY ) ]

= E[XY − µX Y − µY X + µX µY ]

= E[XY ] − µX µY − µY µX + µX µY

= E[XY ] − E[X] E[Y ] .

110
We defined

Cov(X, Y ) ≡ E[ (X − µX ) (Y − µY ) ]

           = ∑_{k,ℓ} (xk − µX ) (yℓ − µY ) p(xk , yℓ )

           = E[XY ] − E[X] E[Y ] .


NOTE :
Cov(X, Y ) measures ”concordance ” or ”coherence ” of X and Y :

• If X > µX when Y > µY and X < µX when Y < µY then

Cov(X, Y ) > 0 .

• If X > µX when Y < µY and X < µX when Y > µY then

Cov(X, Y ) < 0 .

111
EXERCISE : Prove the following :

• V ar(aX + b) = a2 V ar(X) ,

• Cov(X, Y ) = Cov(Y, X) ,

• Cov(cX, Y ) = c Cov(X, Y ) ,

• Cov(X, cY ) = c Cov(X, Y ) ,

• Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z) ,

• V ar(X + Y ) = V ar(X) + V ar(Y ) + 2 Cov(X, Y ) .

112
PROPERTY :

If X and Y are independent then Cov(X, Y ) = 0 .

PROOF :

We have already shown ( with µX ≡ E[X] and µY ≡ E[Y ] ) that

Cov(X, Y ) ≡ E[ (X − µX ) (Y − µY ) ] = E[XY ] − E[X] E[Y ] ,

and that if X and Y are independent then

E[XY ] = E[X] E[Y ] .

from which the result follows.

113
EXERCISE : ( already used earlier · · · )
Probability mass function pX,Y (x, y)
         y=6     y=8     y=10    pX (x)
x=1      1/5      0      1/5      2/5
x=2       0      1/5      0       1/5
x=3      1/5      0      1/5      2/5
pY (y)   2/5     1/5     2/5       1
Show that
• E[X] = 2 , E[Y ] = 8 , E[XY ] = 16
• Cov(X, Y ) = E[XY ] − E[X] E[Y ] = 0
• X and Y are not independent

Thus if
Cov(X, Y ) = 0 ,

then it does not necessarily follow that X and Y are independent !

114
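
PYTHON SKETCH : a numerical check of this exercise, assuming the table above
is stored with Fractions (pairs with probability 0 are simply omitted).

    from fractions import Fraction as F

    p = {(1, 6): F(1, 5), (1, 10): F(1, 5), (2, 8): F(1, 5),
         (3, 6): F(1, 5), (3, 10): F(1, 5)}

    EX  = sum(x * q for (x, y), q in p.items())
    EY  = sum(y * q for (x, y), q in p.items())
    EXY = sum(x * y * q for (x, y), q in p.items())
    print(EX, EY, EXY, EXY - EX * EY)          # 2 8 16 0 , so Cov(X,Y) = 0

    pX = {x: sum(q for (a, y), q in p.items() if a == x) for x in (1, 2, 3)}
    pY = {y: sum(q for (x, b), q in p.items() if b == y) for y in (6, 8, 10)}
    print(p.get((2, 6), 0) == pX[2] * pY[6])   # False : not independent
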
PROPERTY :

If X and Y are independent then

V ar(X + Y ) = V ar(X) + V ar(Y ) .

PROOF :

We have already shown (in an exercise !) that

V ar(X + Y ) = V ar(X) + V ar(Y ) + 2 Cov(X, Y ) ,

and that if X and Y are independent then

Cov(X, Y ) = 0 ,

from which the result follows.

115
EXERCISE :

Compute
E[X] , E[Y ] , E[X 2 ] , E[Y 2 ]

E[XY ] , V ar(X) , V ar(Y )

Cov(X, Y )
for

Joint probability mass function pX,Y (x, y)


         y=0    y=1    y=2    y=3    pX (x)
x=0      1/8     0      0      0      1/8
x=1       0     1/8    1/8    1/8     3/8
x=2       0     2/8    1/8     0      3/8
x=3       0     1/8     0      0      1/8
pY (y)   1/8    4/8    2/8    1/8      1

116
EXERCISE :

Compute
E[X] , E[Y ] , E[X 2 ] , E[Y 2 ]

E[XY ] , V ar(X) , V ar(Y )

Cov(X, Y )
for

Joint probability mass function pX,Y (x, y)


         y=1     y=2     y=3     pX (x)
x=1      1/3     1/12    1/12     1/2
x=2      2/9     1/18    1/18     1/3
x=3      1/9     1/36    1/36     1/6
pY (y)   2/3     1/6     1/6       1

117
SPECIAL DISCRETE RANDOM VARIABLES

The Bernoulli Random Variable

A Bernoulli trial has only two outcomes , with probability


P (X = 1) = p,

P (X = 0) = 1−p ,
e.g., tossing a coin, winning or losing a game, · · · .
We have
E[X] = 1 · p + 0 · (1 − p) = p ,

E[X 2 ] = 12 · p + 02 · (1 − p) = p ,

V ar(X) = E[X 2 ] − E[X]2 = p − p2 = p(1 − p) .

NOTE : If p is small then V ar(X) ≅ p .

118
EXAMPLES :

• When p = 1/2 (e.g., for tossing a coin), we have

E[X] = p = 1/2 ,   V ar(X) = p(1 − p) = 1/4 .

• When rolling a die , with outcome k , (1 ≤ k ≤ 6) , let

X(k) = 1 if the roll resulted in a six ,
and
X(k) = 0 if the roll did not result in a six .
Then
E[X] = p = 1/6 ,   V ar(X) = p(1 − p) = 5/36 .

• When p = 0.01 , then

E[X] = 0.01 ,   V ar(X) = 0.0099 ≅ 0.01 .

119
The Binomial Random Variable

Perform a Bernoulli trial n times in sequence .

Assume the individual trials are independent .

An outcome could be
100011001010   (n = 12) ,
with probability
P (100011001010) = p^5 · (1 − p)^7 .   ( Why ? )

Let X be the number of ”successes ” (i.e. 1’s) .

For example,
X(100011001010) = 5 .
We have
P (X = 5) = (12 choose 5) · p^5 · (1 − p)^7 .   ( Why ? )

120
In general, for k successes in a sequence of n trials, we have

P (X = k) = (n choose k) · p^k · (1 − p)^{n−k} ,   (0 ≤ k ≤ n) .

EXAMPLE : Tossing a coin 12 times:   n = 12 , p = 1/2
k pX (k) FX (k)
0 1 / 4096 1 / 4096
1 12 / 4096 13 / 4096
2 66 / 4096 79 / 4096
3 220 / 4096 299 / 4096
4 495 / 4096 794 / 4096
5 792 / 4096 1586 / 4096
6 924 / 4096 2510 / 4096
7 792 / 4096 3302 / 4096
8 495 / 4096 3797 / 4096
9 220 / 4096 4017 / 4096
10 66 / 4096 4083 / 4096
11 12 / 4096 4095 / 4096
12 1 / 4096 4096 / 4096

121
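
PYTHON SKETCH : reproducing the table above, assuming a fair coin
( n = 12 , p = 1/2 ).

    from math import comb
    from fractions import Fraction as F

    n, p = 12, F(1, 2)
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

    print(pmf[6])                 # 231/1024  ( = 924/4096 )
    print(sum(pmf[:7]))           # 1255/2048 ( = 2510/4096 = FX(6) )
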
The Binomial mass and distribution functions for n = 12 , p = 1/2

122
For k successes in a sequence of n trials :

P (X = k) = (n choose k) · p^k · (1 − p)^{n−k} ,   (0 ≤ k ≤ n) .

EXAMPLE : Rolling a die 12 times:   n = 12 , p = 1/6
k pX (k) FX (k)
0 0.1121566221 0.112156
1 0.2691758871 0.381332
2 0.2960935235 0.677426
3 0.1973956972 0.874821
4 0.0888280571 0.963649
5 0.0284249838 0.992074
6 0.0066324966 0.998707
7 0.0011369995 0.999844
8 0.0001421249 0.999986
9 0.0000126333 0.999998
10 0.0000007580 0.999999
11 0.0000000276 0.999999
12 0.0000000005 1.000000

123
The Binomial mass and distribution functions for n = 12 , p = 1/6

124
EXAMPLE :
In 12 rolls of a die write the outcome as, for example,

100011001010
where
1 denotes the roll resulted in a six ,
and
0 denotes the roll did not result in a six .

As before, let X be the number of 1’s in the outcome.

Then X represents the number of sixes in the 12 rolls.

Then, for example, using the preceding Table :

P (X = 5) ≅ 2.8 % ,   P (X ≤ 5) ≅ 99.2 % .

125
EXERCISE : Show that from

P (X = k) = (n choose k) · p^k · (1 − p)^{n−k} ,
and
P (X = k + 1) = (n choose k+1) · p^{k+1} · (1 − p)^{n−k−1} ,
it follows that
P (X = k + 1) = ck · P (X = k) ,
where
ck = (n − k)/(k + 1) · p/(1 − p) .

NOTE : This recurrence formula is an efficient and stable algorithm
to compute the binomial probabilities :

P (X = 0) = (1 − p)^n ,

P (X = k + 1) = ck · P (X = k) ,   k = 0, 1, · · · , n − 1 .

126
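
PYTHON SKETCH : the recurrence above as an algorithm; the resulting values
can be compared with the n = 12 tables (float arithmetic is assumed to be
adequate here).

    def binomial_pmf(n, p):
        """Return [ P(X=0), ... , P(X=n) ] via the stable recurrence."""
        probs = [(1 - p) ** n]                       # P(X = 0)
        for k in range(n):
            c_k = (n - k) / (k + 1) * p / (1 - p)
            probs.append(c_k * probs[-1])            # P(X = k+1) = c_k P(X = k)
        return probs

    print(binomial_pmf(12, 1/6)[2])    # 0.29609... ( matches the die table )
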
Mean and variance of the Binomial random variable :

By definition, the mean of a Binomial random variable X is

E[X] = ∑_{k=0}^{n} k · P (X = k) = ∑_{k=0}^{n} k · (n choose k) p^k (1 − p)^{n−k} ,

which can be shown to equal np .

An easy way to see this is as follows :

If in a sequence of n independent Bernoulli trials we let

Xk = the outcome of the kth Bernoulli trial ,   (Xk = 0 or 1 ) ,

then
X ≡ X1 + X2 + · · · + Xn ,
is the Binomial random variable that counts the ”successes ” .

127
X ≡ X1 + X2 + · · · + Xn

We know that
E[Xk ] = p ,
so
E[X] = E[X1 ] + E[X2 ] + · · · + E[Xn ] = np .

We already know that


V ar(Xk ) = E[Xk2 ] − (E[Xk ])2 = p − p2 = p(1 − p) ,

so, since the Xk are independent , we have

V ar(X) = V ar(X1 ) + V ar(X2 ) + · · · + V ar(Xn ) = np(1 − p) .

NOTE : If p is small then V ar(X) ≅ np .

128
EXAMPLES :

• For 12 tosses of a coin , with Heads as success , we have
n = 12 , p = 1/2 , so
E[X] = np = 6 ,   V ar(X) = np(1 − p) = 3 .

• For 12 rolls of a die , with six as success , we have
n = 12 , p = 1/6 , so
E[X] = np = 2 ,   V ar(X) = np(1 − p) = 5/3 .

• If n = 500 and p = 0.01 , then

E[X] = np = 5 ,   V ar(X) = np(1 − p) = 4.95 ≅ 5 .

129
The Poisson Random Variable

The Poisson variable approximates the Binomial random variable :

P (X = k) = (n choose k) · p^k · (1 − p)^{n−k} ≅ e^{−λ} · λ^k / k! ,

when we take

λ = n p   ( the average number of successes ) .

This approximation is accurate if n is large and p small .

Recall that for the Binomial random variable

E[X] = n p ,   and   V ar(X) = np(1 − p) ≅ np   when p is small.

Indeed, for the Poisson random variable we will show that

E[X] = λ   and   V ar(X) = λ .

130
A stable and efficient way to compute the Poisson probability

P (X = k) = e^{−λ} · λ^k / k! ,   k = 0, 1, 2, · · · ,

P (X = k + 1) = e^{−λ} · λ^{k+1} / (k + 1)! ,

is to use the recurrence relation

P (X = 0) = e^{−λ} ,

P (X = k + 1) = λ/(k + 1) · P (X = k) ,   k = 0, 1, 2, · · · .

NOTE : Unlike the Binomial random variable, the Poisson random
variable can have an arbitrarily large integer value k.

131
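
PYTHON SKETCH : the Poisson recurrence, assuming the probabilities are only
needed up to some cutoff kmax .

    from math import exp

    def poisson_pmf(lam, kmax):
        """Return [ P(X=0), ... , P(X=kmax) ] via the stable recurrence."""
        probs = [exp(-lam)]                          # P(X = 0) = e^(-lambda)
        for k in range(kmax):
            probs.append(lam / (k + 1) * probs[-1])  # P(X = k+1)
        return probs

    print(poisson_pmf(6, 2))      # [0.00247..., 0.01487..., 0.04461...]
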
The Poisson random variable

P (X = k) = e^{−λ} · λ^k / k! ,   k = 0, 1, 2, · · · ,

has (as shown later) :   E[X] = λ   and   V ar(X) = λ .

The Poisson distribution function is

F (k) = P (X ≤ k) = ∑_{ℓ=0}^{k} e^{−λ} λ^ℓ / ℓ! = e^{−λ} ∑_{ℓ=0}^{k} λ^ℓ / ℓ! ,

with, as should be the case,

lim_{k→∞} F (k) = e^{−λ} ∑_{ℓ=0}^{∞} λ^ℓ / ℓ! = e^{−λ} e^λ = 1 .

( using the Taylor series from Calculus for e^λ ) .

132
The Poisson random variable

P (X = k) = e^{−λ} · λ^k / k! ,   k = 0, 1, 2, · · · ,

models the probability of k ”successes ” in a given ”time” interval,
when the average number of successes is λ .

EXAMPLE : Suppose customers arrive at the rate of six per hour.
The probability that k customers arrive in a one-hour period is

P (k = 0) = e^{−6} · 6^0 / 0! ≅ 0.0024 ,

P (k = 1) = e^{−6} · 6^1 / 1! ≅ 0.0148 ,

P (k = 2) = e^{−6} · 6^2 / 2! ≅ 0.0446 .

The probability that more than 2 customers arrive is

1 − (0.0024 + 0.0148 + 0.0446) ≅ 0.938 .

133
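
PYTHON SKETCH : the customer-arrival numbers above, assuming λ = 6 per hour.

    from math import exp, factorial

    lam = 6
    p = [exp(-lam) * lam**k / factorial(k) for k in range(3)]

    print([round(q, 4) for q in p])    # [0.0025, 0.0149, 0.0446]
    print(round(1 - sum(p), 4))        # 0.938  ( more than 2 customers arrive )
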
pBinomial (k) = (n choose k) p^k (1 − p)^{n−k} ≅ pPoisson (k) = e^{−λ} λ^k / k!

EXAMPLE : λ = 6 customers/hour.
For the Binomial take n = 12 , p = 0.5 (0.5 customers/5 minutes) ,
so that indeed np = λ .
k pBinomial pPoisson FBinomial FPoisson
0 0.0002 0.0024 0.0002 0.0024
1 0.0029 0.0148 0.0031 0.0173
2 0.0161 0.0446 0.0192 0.0619
3 0.0537 0.0892 0.0729 0.1512
4 0.1208 0.1338 0.1938 0.2850
5 0.1933 0.1606 0.3872 0.4456
6 0.2255 0.1606 0.6127 0.6063
7 0.1933 0.1376 0.8061 0.7439
8 0.1208 0.1032 0.9270 0.8472
9 0.0537 0.0688 0.9807 0.9160
10 0.0161 0.0413 0.9968 0.9573
11 0.0029 0.0225 0.9997 0.9799
12 0.0002 0.0112 1.0000 0.9911⋆ Why not 1.0000 ?

Here the approximation is not so good · · ·

134
pBinomial (k) = (n choose k) p^k (1 − p)^{n−k} ≅ pPoisson (k) = e^{−λ} λ^k / k!

EXAMPLE : λ = 6 customers/hour.
For the Binomial take n = 60 , p = 0.1 (0.1 customers/minute) ,
so that indeed np = λ .
k pBinomial pPoisson FBinomial FPoisson
0 0.0017 0.0024 0.0017 0.0024
1 0.0119 0.0148 0.0137 0.0173
2 0.0392 0.0446 0.0530 0.0619
3 0.0843 0.0892 0.1373 0.1512
4 0.1335 0.1338 0.2709 0.2850
5 0.1662 0.1606 0.4371 0.4456
6 0.1692 0.1606 0.6064 0.6063
7 0.1451 0.1376 0.7515 0.7439
8 0.1068 0.1032 0.8583 0.8472
9 0.0685 0.0688 0.9269 0.9160
10 0.0388 0.0413 0.9657 0.9573
11 0.0196 0.0225 0.9854 0.9799
12 0.0089 0.0112 0.9943 0.9911
13 ··· ··· ··· ···

Here the approximation is better · · ·

135
n = 12 , p = 1/2 , λ = 6                 n = 200 , p = 0.01 , λ = 2

The Binomial (blue) and Poisson (red) probability mass functions.

For the case n = 200 , p = 0.01 , the approximation is very good !

136
For the Binomial random variable we found

E[X] = np   and   V ar(X) = np(1 − p) ,

while for the Poisson random variable, with λ = np , we will show

E[X] = np   and   V ar(X) = np .

Note again that

np(1 − p) ≅ np ,   when p is small .

EXAMPLE : In the preceding two Tables we have

            n=12 , p=0.5                         n=60 , p=0.1
            Binomial   Poisson                   Binomial   Poisson
  E[X]      6.0000     6.0000          E[X]      6.0000     6.0000
  V ar[X]   3.0000     6.0000          V ar[X]   5.4000     6.0000
  σ[X]      1.7321     2.4495          σ[X]      2.3238     2.4495

137
FACT : (The Method of Moments)

By Taylor expansion of e^{tX} about t = 0 , we have

ψ(t) ≡ E[e^{tX} ] = E[ 1 + tX + t^2 X^2 / 2! + t^3 X^3 / 3! + · · · ]

     = 1 + t E[X] + t^2/2! · E[X^2 ] + t^3/3! · E[X^3 ] + · · · .

It follows that

ψ′(0) = E[X] ,   ψ′′(0) = E[X^2 ] .   ( Why ? )

This sometimes facilitates computing the mean

µ = E[X] ,

and the variance

V ar(X) = E[X^2 ] − µ^2 .

138
APPLICATION : The Poisson mean and variance :

ψ(t) ≡ E[e^{tX} ] = ∑_{k=0}^{∞} e^{tk} P (X = k) = ∑_{k=0}^{∞} e^{tk} e^{−λ} λ^k / k!

     = e^{−λ} ∑_{k=0}^{∞} (λ e^t )^k / k! = e^{−λ} e^{λ e^t} = e^{λ(e^t −1)} .

Here   ψ′(t) = λ e^t e^{λ(e^t −1)}

       ψ′′(t) = λ ( λ (e^t )^2 + e^t ) e^{λ(e^t −1)}   ( Check ! )

so that   E[X] = ψ′(0) = λ

          E[X^2 ] = ψ′′(0) = λ(λ + 1) = λ^2 + λ

          V ar(X) = E[X^2 ] − E[X]^2 = λ .

139
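
PYTHON SKETCH : the same moment computation done symbolically, assuming the
sympy package is available.

    import sympy as sp

    t, lam = sp.symbols("t lam", positive=True)
    psi = sp.exp(lam * (sp.exp(t) - 1))              # psi(t) = E[e^(tX)]

    EX  = sp.diff(psi, t).subs(t, 0)                 # psi'(0)
    EX2 = sp.diff(psi, t, 2).subs(t, 0)              # psi''(0)

    print(sp.simplify(EX), sp.simplify(EX2 - EX**2)) # lam lam
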
EXAMPLE : Defects in a wire occur at the rate of one per 10 meter,
with a Poisson distribution :

P (X = k) = e^{−λ} · λ^k / k! ,   k = 0, 1, 2, · · · .

What is the probability that :

• A 12-meter roll has no defects?

ANSWER : Here λ = 1.2 , and P (X = 0) = e^{−λ} = 0.3012 .

• A 12-meter roll of wire has one defect?

ANSWER : With λ = 1.2 , P (X = 1) = e^{−λ} · λ = 0.3614 .

• Of five 12-meter rolls two have one defect and three have none?

ANSWER : (5 choose 3) · 0.3012^3 · 0.3614^2 = 0.0357 .   ( Why ? )

140
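
PYTHON SKETCH : the wire-defect numbers above, assuming λ = 1.2 for one
12-meter roll and independent rolls.

    from math import exp, comb

    lam = 1.2
    p0 = exp(-lam)                # no defects on one roll
    p1 = exp(-lam) * lam          # exactly one defect on one roll
    print(round(p0, 4), round(p1, 4))            # 0.3012 0.3614

    # five rolls : three with no defect and two with one defect
    print(round(comb(5, 3) * p0**3 * p1**2, 4))  # 0.0357
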
EXERCISE :
Defects in a certain wire occur at the rate of one per 10 meter.
Assume the defects have a Poisson distribution.
What is the probability that :
• a 20-meter wire has no defects?

• a 20-meter wire has at most 2 defects?

EXERCISE :
Customers arrive at a counter at the rate of 8 per hour.
Assume the arrivals have a Poisson distribution.
What is the probability that :
• no customer arrives in 15 minutes?

• two customers arrive in a period of 30 minutes?

141
