
DISCRETE RANDOM VARIABLES

DEFINITION : A discrete random variable is a function X(s) from a
finite or countably infinite sample space S to the real numbers :
X(·) : S → R .

EXAMPLE : Toss a coin 3 times in sequence. The sample space is
S = {HHH , HHT , HT H , HT T , T HH , T HT , T T H , T T T } ,
and examples of random variables are
• X(s) = the number of Heads in the sequence ; e.g., X(HT H) = 2 ,

• Y (s) = the index of the first H ; e.g., Y (T T H) = 3 ,
  with Y (s) = 0 if the sequence has no H , i.e., Y (T T T ) = 0 .

NOTE : In this example X(s) and Y (s) are actually integers .

71
Value-ranges of a random variable correspond to events in S .

EXAMPLE : For the sample space


S = {HHH , HHT , HT H , HT T , T HH , T HT , T T H , T T T } ,
with
X(s) = the number of Heads ,
the value
X(s) = 2 , corresponds to the event {HHT , HT H , T HH} ,
and the values
1 < X(s) ≤ 3 , correspond to {HHH , HHT , HT H , T HH} .

NOTATION : If it is clear what S is then we often just write


X instead of X(s) .

72
Value-ranges of a random variable correspond to events in S ,
and
events in S have a probability .
Thus
Value-ranges of a random variable have a probability .

EXAMPLE : For the sample space


S = {HHH , HHT , HT H , HT T , T HH , T HT , T T H , T T T } ,
with X(s) = the number of Heads ,
we have
P (0 < X ≤ 2) = 6/8 .

QUESTION : What are the values of


P (X ≤ −1) , P (X ≤ 0) , P (X ≤ 1) , P (X ≤ 2) , P (X ≤ 3) , P (X ≤ 4) ?

73
NOTATION : We will also write pX (x) to denote P (X = x) .

EXAMPLE : For the sample space

S = {HHH , HHT , HT H , HT T , T HH , T HT , T T H , T T T } ,
with
X(s) = the number of Heads ,
we have
pX (0) ≡ P ( {T T T } ) = 1/8 ,

pX (1) ≡ P ( {HT T , T HT , T T H} ) = 3/8 ,

pX (2) ≡ P ( {HHT , HT H , T HH} ) = 3/8 ,

pX (3) ≡ P ( {HHH} ) = 1/8 ,
where
pX (0) + pX (1) + pX (2) + pX (3) = 1 . ( Why ? )
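As a quick check, here is a small Python sketch (added for illustration, not part of the original slides ; only the standard library is used) that enumerates the eight equally likely outcomes and tallies pX :

from itertools import product
from fractions import Fraction

# All 2^3 equally likely outcomes of three coin tosses.
outcomes = [''.join(s) for s in product('HT', repeat=3)]

# X(s) = number of Heads ; tally the probability mass function pX.
pX = {x: Fraction(0) for x in range(4)}
for s in outcomes:
    pX[s.count('H')] += Fraction(1, len(outcomes))

for x in range(4):
    print(x, pX[x])          # prints : 0 1/8 , 1 3/8 , 2 3/8 , 3 1/8
print(sum(pX.values()))      # 1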

74
[ Figure : the sample space S partitioned by X into the disjoint events
  E0 = {T T T } → 0 , E1 = {HT T , T HT , T T H} → 1 ,
  E2 = {HHT , HT H , T HH} → 2 , E3 = {HHH} → 3 . ]
Graphical representation of X .

The events E0 , E1 , E2 , E3 are disjoint since X(s) is a function !


(X : S → R must be defined for all s ∈ S and must be single-valued.)

75
The graph of pX .

76
DEFINITION :
pX (x) ≡ P (X = x) ,
is called the probability mass function .

DEFINITION :
FX (x) ≡ P (X ≤ x) ,
is called the (cumulative) probability distribution function .

PROPERTIES :
• FX (x) is a non-decreasing function of x . ( Why ? )
• FX (−∞) = 0 and FX (∞) = 1 . ( Why ? )
• P (a < X ≤ b) = FX (b) − FX (a) . ( Why ? )

NOTATION : When it is clear what X is then we also write

p(x) for pX (x) and F (x) for FX (x) .

77
EXAMPLE : With X(s) = the number of Heads , and
S = {HHH , HHT , HT H , HT T , T HH , T HT , T T H , T T T } ,
p(0) = 1/8 , p(1) = 3/8 , p(2) = 3/8 , p(3) = 1/8 ,
we have the probability distribution function

F (−1) ≡ P (X ≤ −1) = 0
F ( 0) ≡ P (X ≤ 0) = 1/8
F ( 1) ≡ P (X ≤ 1) = 4/8
F ( 2) ≡ P (X ≤ 2) = 7/8
F ( 3) ≡ P (X ≤ 3) = 1
F ( 4) ≡ P (X ≤ 4) = 1
We see, for example, that
P (0 < X ≤ 2) = P (X = 1) + P (X = 2) = F (2) − F (0) = 7/8 − 1/8 = 6/8 .

78
The graph of the probability distribution function FX .

79
EXAMPLE : Toss a coin until ”Heads” occurs.
Then the sample space is countably infinite , namely,
S = {H , T H , T T H , T T T H , · · · } .

The random variable X is the number of tosses until ”Heads” occurs :


X(H) = 1 , X(T H) = 2 , X(T T H) = 3 , ···
Then
p(1) = 1/2 , p(2) = 1/4 , p(3) = 1/8 , · · · ( Why ? )
and
F (n) = P (X ≤ n) = Σ_{k=1}^{n} p(k) = Σ_{k=1}^{n} 1/2^k = 1 − 1/2^n ,
and, as should be the case,
Σ_{k=1}^{∞} p(k) = lim_{n→∞} Σ_{k=1}^{n} p(k) = lim_{n→∞} ( 1 − 1/2^n ) = 1 .

NOTE : The outcomes in S do not have equal probability !


EXERCISE : Draw the probability mass and distribution functions.
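For instance, a short Python sketch (an added illustration, not from the original slides) that tabulates p(k) and F (k) for this example, which could then be plotted :

# X = number of tosses until the first Heads, for a fair coin.
# p(k) = (1/2)**k  and  F(k) = 1 - (1/2)**k.
for k in range(1, 11):
    p_k = 0.5 ** k
    F_k = 1 - 0.5 ** k
    print(f"k = {k:2d}   p(k) = {p_k:.6f}   F(k) = {F_k:.6f}")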

80
X(s) is the number of tosses until ”Heads” occurs · · ·

REMARK : We can also take S ≡ Sn as all ordered outcomes of


length n. For example, for n = 4,

S4 = { H̃HHH , H̃HHT , H̃HT H , H̃HT T ,

H̃T HH , H̃T HT , H̃T T H , H̃T T T ,

T H̃HH , T H̃HT , T H̃T H , T H̃T T ,

T T H̃H , T T H̃T , T T T H̃ , T T T T } .

where for each outcome the first ”Heads” is marked as H̃ .


Each outcome in S4 has equal probability 2^(−n) ( here 2^(−4) = 1/16 ) , and
pX (1) = 1/2 , pX (2) = 1/4 , pX (3) = 1/8 , pX (4) = 1/16 , · · · ,
independent of n .

81
Joint distributions
The probability mass function and the probability distribution function
can also be functions of more than one variable.

EXAMPLE : Toss a coin 3 times in sequence. For the sample space

S = {HHH , HHT , HT H , HT T , T HH , T HT , T T H , T T T } ,
we let
X(s) = # Heads , Y (s) = index of the first H (0 for T T T ) .
Then we have the joint probability mass function
pX,Y (x, y) = P (X = x , Y = y) .
For example,
pX,Y (2, 1) = P (X = 2 , Y = 1)

= P ( 2 Heads , 1st toss is Heads)

= 2/8 = 1/4 .

82
EXAMPLE : ( continued · · · ) For
S = {HHH , HHT , HT H , HT T , T HH , T HT , T T H , T T T } ,
X(s) = number of Heads, and Y (s) = index of the first H ,

we can list the values of pX,Y (x, y) :

Joint probability mass function pX,Y (x, y)


          y=0     y=1     y=2     y=3     pX (x)
x=0       1/8      0       0       0       1/8
x=1        0      1/8     1/8     1/8      3/8
x=2        0      2/8     1/8      0       3/8
x=3        0      1/8      0       0       1/8
pY (y)    1/8     4/8     2/8     1/8       1

NOTE :
• The marginal probability pX is the probability mass function of X.
• The marginal probability pY is the probability mass function of Y .
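The whole table above, including the marginals, can be reproduced with a few lines of Python (an added sketch, not part of the original notes ; the names X, Y, pXY are just for illustration) :

from itertools import product
from fractions import Fraction

outcomes = [''.join(s) for s in product('HT', repeat=3)]

def X(s): return s.count('H')                          # number of Heads
def Y(s): return s.index('H') + 1 if 'H' in s else 0   # index of first H (0 for TTT)

pXY = {}
for s in outcomes:
    key = (X(s), Y(s))
    pXY[key] = pXY.get(key, Fraction(0)) + Fraction(1, 8)

# Marginals : sum the joint mass function over the other variable.
pX = {x: sum((p for (xx, _), p in pXY.items() if xx == x), Fraction(0)) for x in range(4)}
pY = {y: sum((p for (_, yy), p in pXY.items() if yy == y), Fraction(0)) for y in range(4)}

print(pXY.get((2, 1), Fraction(0)))   # 1/4
print(pX[2], pY[1])                   # 3/8 1/2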

83
EXAMPLE : ( continued · · · )
X(s) = number of Heads, and Y (s) = index of the first H .

          y=0     y=1     y=2     y=3     pX (x)
x=0       1/8      0       0       0       1/8
x=1        0      1/8     1/8     1/8      3/8
x=2        0      2/8     1/8      0       3/8
x=3        0      1/8      0       0       1/8
pY (y)    1/8     4/8     2/8     1/8       1

For example,

• X = 2 corresponds to the event {HHT , HT H , T HH} .


• Y = 1 corresponds to the event {HHH , HHT , HT H , HT T } .
• (X = 2 and Y = 1) corresponds to the event {HHT , HT H} .

QUESTION : Are the events X = 2 and Y = 1 independent ?

84
[ Figure : the sample space S partitioned into the disjoint events Ei,j = { s : X(s) = i , Y (s) = j } ;
  E0,0 = {T T T } , E1,1 = {HT T } , E1,2 = {T HT } , E1,3 = {T T H} ,
  E2,1 = {HHT , HT H} , E2,2 = {T HH} , E3,1 = {HHH} . ]
The events Ei,j ≡ { s ∈ S : X(s) = i , Y (s) = j } are disjoint .


QUESTION : Are the events X = 2 and Y = 1 independent ?

85
DEFINITION :
pX,Y (x, y) ≡ P (X = x , Y = y) ,
is called the joint probability mass function .

DEFINITION :
FX,Y (x, y) ≡ P (X ≤ x , Y ≤ y) ,
is called the joint (cumulative) probability distribution function .

NOTATION : When it is clear what X and Y are then we also


write

p(x, y) for pX,Y (x, y) ,


and
F (x, y) for FX,Y (x, y) .

86
EXAMPLE : Three tosses : X(s) = # Heads, Y (s) = index 1st H .
Joint probability mass function pX,Y (x, y)
          y=0     y=1     y=2     y=3     pX (x)
x=0       1/8      0       0       0       1/8
x=1        0      1/8     1/8     1/8      3/8
x=2        0      2/8     1/8      0       3/8
x=3        0      1/8      0       0       1/8
pY (y)    1/8     4/8     2/8     1/8       1

Joint distribution function FX,Y (x, y) ≡ P (X ≤ x, Y ≤ y)


          y=0     y=1     y=2     y=3     FX (·)
x=0       1/8     1/8     1/8     1/8      1/8
x=1       1/8     2/8     3/8     4/8      4/8
x=2       1/8     4/8     6/8     7/8      7/8
x=3       1/8     5/8     7/8      1        1
FY (·)    1/8     5/8     7/8      1        1

Note that the distribution function FX is a copy of the y = 3 column,

and the distribution function FY is a copy of the x = 3 row. ( Why ? )

87
In the preceding example :
Joint probability mass function pX,Y (x, y)
          y=0     y=1     y=2     y=3     pX (x)
x=0       1/8      0       0       0       1/8
x=1        0      1/8     1/8     1/8      3/8
x=2        0      2/8     1/8      0       3/8
x=3        0      1/8      0       0       1/8
pY (y)    1/8     4/8     2/8     1/8       1

Joint distribution function FX,Y (x, y) ≡ P (X ≤ x, Y ≤ y)


          y=0     y=1     y=2     y=3     FX (·)
x=0       1/8     1/8     1/8     1/8      1/8
x=1       1/8     2/8     3/8     4/8      4/8
x=2       1/8     4/8     6/8     7/8      7/8
x=3       1/8     5/8     7/8      1        1
FY (·)    1/8     5/8     7/8      1        1

QUESTION : Why is
P (1 < X ≤ 3 , 1 < Y ≤ 3) = F (3, 3) − F (1, 3) − F (3, 1) + F (1, 1) ?

88
EXERCISE :
Roll a four-sided die (tetrahedron) two times.
(The sides are marked 1 , 2 , 3 , 4 .)
Suppose each of the four sides is equally likely to end facing down.
Suppose the outcome of a single roll is the side that faces down ( ! ).

Define the random variables X and Y as

X = result of the first roll , Y = sum of the two rolls.

• What is a good choice of the sample space S ?


• How many outcomes are there in S ?
• List the values of the joint probability mass function pX,Y (x, y) .
• List the values of the joint cumulative distribution function FX,Y (x, y) .

89
EXERCISE :

Three balls are selected at random from a bag containing

2 red , 3 green , 4 blue balls .

Define the random variables

R(s) = the number of red balls drawn,


and
G(s) = the number of green balls drawn .

List the values of


• the joint probability mass function pR,G (r, g) .
• the marginal probability mass functions pR (r) and pG (g) .
• the joint distribution function FR,G (r, g) .
• the marginal distribution functions FR (r) and FG (g) .

90
Independent random variables

Two discrete random variables X(s) and Y (s) are independent if


P (X = x , Y = y) = P (X = x) · P (Y = y) , for all x and y ,

or, equivalently, if their probability mass functions satisfy


pX,Y (x, y) = pX (x) · pY (y) , for all x and y ,

or, equivalently, if the events


Ex ≡ X −1 ({x}) and Ey ≡ Y −1 ({y}) ,
are independent in the sample space S , i.e.,
P (Ex Ey ) = P (Ex ) · P (Ey ) , for all x and y .
NOTE :
• In the current discrete case, x and y are typically integers .
• X −1 ({x}) ≡ { s ∈ S : X(s) = x } .
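The product condition above is easy to test mechanically. A small Python helper (added here as an illustrative sketch ; the name is_independent and the table layout are not from the original notes) :

from fractions import Fraction

def is_independent(pXY):
    """Check pXY(x, y) == pX(x) * pY(y) for every pair (x, y)."""
    xs = {x for x, _ in pXY}
    ys = {y for _, y in pXY}
    pX = {x: sum((pXY.get((x, y), Fraction(0)) for y in ys), Fraction(0)) for x in xs}
    pY = {y: sum((pXY.get((x, y), Fraction(0)) for x in xs), Fraction(0)) for y in ys}
    return all(pXY.get((x, y), Fraction(0)) == pX[x] * pY[y] for x in xs for y in ys)

# Example : the three-toss table (X = # Heads, Y = index of first H) is NOT independent.
F = Fraction
three_tosses = {(0,0): F(1,8), (1,1): F(1,8), (1,2): F(1,8), (1,3): F(1,8),
                (2,1): F(2,8), (2,2): F(1,8), (3,1): F(1,8)}
print(is_independent(three_tosses))   # False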

91
[ Figure : the sample space S partitioned into the disjoint events Ei,j = { s : X(s) = i , Y (s) = j } ;
  E0,0 = {T T T } , E1,1 = {HT T } , E1,2 = {T HT } , E1,3 = {T T H} ,
  E2,1 = {HHT , HT H} , E2,2 = {T HH} , E3,1 = {HHH} . ]

Three tosses : X(s) = # Heads, Y (s) = index 1st H .

• What are the values of pX (2) , pY (1) , pX,Y (2, 1) ?


• Are X and Y independent ?

92
RECALL :

X(s) and Y (s) are independent if for all x and y :

pX,Y (x, y) = pX (x) · pY (y) .

EXERCISE :

Roll a die two times in a row.

Let
X be the result of the 1st roll ,
and
Y the result of the 2nd roll .

Are X and Y independent , i.e., is

pX,Y (k, ℓ) = pX (k) · pY (ℓ), for all 1 ≤ k, ℓ ≤ 6 ?

93
EXERCISE :

Are these random variables X and Y independent ?

Joint probability mass function pX,Y (x, y)


          y=0     y=1     y=2     y=3     pX (x)
x=0       1/8      0       0       0       1/8
x=1        0      1/8     1/8     1/8      3/8
x=2        0      2/8     1/8      0       3/8
x=3        0      1/8      0       0       1/8
pY (y)    1/8     4/8     2/8     1/8       1

94
EXERCISE : Are these random variables X and Y independent ?

Joint probability mass function pX,Y (x, y)


          y=1     y=2     y=3     pX (x)
x=1       1/3     1/12    1/12     1/2
x=2       2/9     1/18    1/18     1/3
x=3       1/9     1/36    1/36     1/6
pY (y)    2/3     1/6     1/6       1

Joint distribution function FX,Y (x, y) ≡ P (X ≤ x, Y ≤ y)


          y=1     y=2     y=3     FX (x)
x=1       1/3     5/12    1/2      1/2
x=2       5/9     25/36   5/6      5/6
x=3       2/3     5/6      1        1
FY (y)    2/3     5/6      1        1

QUESTION : Is FX,Y (x, y) = FX (x) · FY (y) ?

95
PROPERTY :
The joint distribution function of independent random variables
X and Y satisfies
FX,Y (x, y) = FX (x) · FY (y) , for all x, y .

PROOF :

FX,Y (xk , yℓ ) = P (X ≤ xk , Y ≤ yℓ )

   = Σ_{i≤k} Σ_{j≤ℓ} pX,Y (xi , yj )

   = Σ_{i≤k} Σ_{j≤ℓ} pX (xi ) · pY (yj )        ( by independence )

   = Σ_{i≤k} { pX (xi ) · Σ_{j≤ℓ} pY (yj ) }

   = { Σ_{i≤k} pX (xi ) } · { Σ_{j≤ℓ} pY (yj ) }

   = FX (xk ) · FY (yℓ ) .

96
Conditional distributions

Let X and Y be discrete random variables with joint probability


mass function
pX,Y (x, y) .

For given x and y , let


Ex = X −1 ({x}) and Ey = Y −1 ({y}) ,
be their corresponding events in the sample space S.

Then
P (Ex |Ey ) ≡ P (Ex Ey ) / P (Ey ) = pX,Y (x, y) / pY (y) .

Thus it is natural to define the conditional probability mass function

pX|Y (x|y) ≡ P (X = x | Y = y) = pX,Y (x, y) / pY (y) .
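A minimal Python sketch of this definition (added for illustration ; the helper name conditional_pX_given_Y is not from the original notes) :

from fractions import Fraction

def conditional_pX_given_Y(pXY, y):
    """Return the conditional mass function x -> P(X = x | Y = y)."""
    pY_y = sum((p for (_, yy), p in pXY.items() if yy == y), Fraction(0))
    return {x: p / pY_y for (x, yy), p in pXY.items() if yy == y}

F = Fraction
three_tosses = {(0,0): F(1,8), (1,1): F(1,8), (1,2): F(1,8), (1,3): F(1,8),
                (2,1): F(2,8), (2,2): F(1,8), (3,1): F(1,8)}
for x, p in conditional_pX_given_Y(three_tosses, 1).items():
    print(x, p)        # prints : 1 1/4 , 2 1/2 , 3 1/4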

97
[ Figure : the sample space S partitioned into the disjoint events Ei,j = { s : X(s) = i , Y (s) = j } ;
  E0,0 = {T T T } , E1,1 = {HT T } , E1,2 = {T HT } , E1,3 = {T T H} ,
  E2,1 = {HHT , HT H} , E2,2 = {T HH} , E3,1 = {HHH} . ]

Three tosses : X(s) = # Heads, Y (s) = index 1st H .

• What are the values of P (X = 2 | Y = 1) and P (Y = 1 | X = 2) ?

98
EXAMPLE : (3 tosses : X(s) = # Heads, Y (s) = index 1st H.)
Joint probability mass function pX,Y (x, y)
          y=0     y=1     y=2     y=3     pX (x)
x=0       1/8      0       0       0       1/8
x=1        0      1/8     1/8     1/8      3/8
x=2        0      2/8     1/8      0       3/8
x=3        0      1/8      0       0       1/8
pY (y)    1/8     4/8     2/8     1/8       1

Conditional probability mass function pX|Y (x|y) = pX,Y (x, y) / pY (y) :

          y=0     y=1     y=2     y=3
x=0        1       0       0       0
x=1        0      2/8     4/8      1
x=2        0      4/8     4/8      0
x=3        0      2/8      0       0
           1       1       1       1

EXERCISE : Also construct the Table for pY |X (y|x) = pX,Y (x, y) / pX (x) .

99
EXAMPLE :
Joint probability mass function pX,Y (x, y)
          y=1     y=2     y=3     pX (x)
x=1       1/3     1/12    1/12     1/2
x=2       2/9     1/18    1/18     1/3
x=3       1/9     1/36    1/36     1/6
pY (y)    2/3     1/6     1/6       1

Conditional probability mass function pX|Y (x|y) = pX,Y (x, y) / pY (y) :

          y=1     y=2     y=3
x=1       1/2     1/2     1/2
x=2       1/3     1/3     1/3
x=3       1/6     1/6     1/6
           1       1       1

QUESTION : What does the last Table tell us?


EXERCISE : Also construct the Table for P (Y = y|X = x) .

100
Expectation
The expected value of a discrete random variable X is
E[X] ≡ Σ_k xk · P (X = xk ) = Σ_k xk · pX (xk ) .

Thus E[X] represents the weighted average value of X .

( E[X] is also called the mean of X .)

EXAMPLE : The expected value of rolling a die is


E[X] = 1 · 1/6 + 2 · 1/6 + · · · + 6 · 1/6 = (1/6) · Σ_{k=1}^{6} k = 7/2 .

EXERCISE : Prove the following :


• E[aX] = a E[X] ,
• E[aX + b] = a E[X] + b .

101
EXAMPLE : Toss a coin until ”Heads” occurs. Then
S = {H , T H , T T H , T T T H , · · · } .

The random variable X is the number of tosses until ”Heads” occurs :


X(H) = 1 , X(T H) = 2 , X(T T H) = 3 .
Then
E[X] = 1 · 1/2 + 2 · 1/4 + 3 · 1/8 + · · · = lim_{n→∞} Σ_{k=1}^{n} k/2^k = 2 .

n       Σ_{k=1}^{n} k/2^k
1       0.50000000
2       1.00000000
3       1.37500000
10      1.98828125
40      1.99999999
REMARK :
Perhaps using Sn = {all sequences of n tosses} is better · · ·

102
The expected value of a function of a random variable is

E[g(X)] ≡ Σ_k g(xk ) p(xk ) .

EXAMPLE :
The pay-off of rolling a die is $ k^2 , where k is the side facing up.

What should the entry fee be for the betting to break even?

SOLUTION : Here g(X) = X^2 , and

E[g(X)] = Σ_{k=1}^{6} k^2 · 1/6 = (1/6) · 6(6 + 1)(2 · 6 + 1)/6 = 91/6 ≈ $ 15.17 .

103
The expected value of a function of two random variables is
E[g(X, Y )] ≡ Σ_k Σ_ℓ g(xk , yℓ ) p(xk , yℓ ) .

EXAMPLE :

          y=1     y=2     y=3     pX (x)
x=1       1/3     1/12    1/12     1/2
x=2       2/9     1/18    1/18     1/3
x=3       1/9     1/36    1/36     1/6
pY (y)    2/3     1/6     1/6       1

E[X] = 1 · 1/2 + 2 · 1/3 + 3 · 1/6 = 5/3 ,

E[Y ] = 1 · 2/3 + 2 · 1/6 + 3 · 1/6 = 3/2 ,

E[XY ] = 1 · 1/3 + 2 · 1/12 + 3 · 1/12
       + 2 · 2/9 + 4 · 1/18 + 6 · 1/18
       + 3 · 1/9 + 6 · 1/36 + 9 · 1/36 = 5/2 .        ( So ? )

104
PROPERTY :

• If X and Y are independent then E[XY ] = E[X] E[Y ] .

PROOF :
E[XY ] = Σ_k Σ_ℓ xk yℓ pX,Y (xk , yℓ )

   = Σ_k Σ_ℓ xk yℓ pX (xk ) pY (yℓ )        ( by independence )

   = Σ_k { xk pX (xk ) · Σ_ℓ yℓ pY (yℓ ) }

   = { Σ_k xk pX (xk ) } · { Σ_ℓ yℓ pY (yℓ ) }

   = E[X] · E[Y ] .

EXAMPLE : See the preceding example !

105
PROPERTY : E[X + Y ] = E[X] + E[Y ] . ( Always ! )

PROOF :
E[X + Y ] = Σ_k Σ_ℓ (xk + yℓ ) pX,Y (xk , yℓ )

   = Σ_k Σ_ℓ xk pX,Y (xk , yℓ ) + Σ_k Σ_ℓ yℓ pX,Y (xk , yℓ )

   = Σ_k Σ_ℓ xk pX,Y (xk , yℓ ) + Σ_ℓ Σ_k yℓ pX,Y (xk , yℓ )

   = Σ_k { xk Σ_ℓ pX,Y (xk , yℓ ) } + Σ_ℓ { yℓ Σ_k pX,Y (xk , yℓ ) }

   = Σ_k { xk pX (xk ) } + Σ_ℓ { yℓ pY (yℓ ) }

   = E[X] + E[Y ] .

NOTE : X and Y need not be independent !

106
EXERCISE :
Probability mass function pX,Y (x, y)
          y=6     y=8     y=10    pX (x)
x=1       1/5      0      1/5      2/5
x=2        0      1/5      0       1/5
x=3       1/5      0      1/5      2/5
pY (y)    2/5     1/5     2/5       1

Show that

• E[X] = 2 , E[Y ] = 8 , E[XY ] = 16

• X and Y are not independent

Thus if
E[XY ] = E[X] E[Y ] ,

then it does not necessarily follow that X and Y are independent !

107
Variance and Standard Deviation
Let X have mean
µ = E[X] .

Then the variance of X is


V ar(X) ≡ E[ (X − µ)^2 ] ≡ Σ_k (xk − µ)^2 p(xk ) ,

which is the average weighted square distance from the mean.

We have
V ar(X) = E[X^2 − 2µX + µ^2 ]

   = E[X^2 ] − 2µ E[X] + µ^2

   = E[X^2 ] − 2µ^2 + µ^2

   = E[X^2 ] − µ^2 .

108
The standard deviation of X is
σ(X) ≡ √V ar(X) = √E[ (X − µ)^2 ] = √( E[X^2 ] − µ^2 ) ,

which is the average weighted distance from the mean.

EXAMPLE : The variance of rolling a die is


V ar(X) = Σ_{k=1}^{6} [ k^2 · 1/6 ] − µ^2

   = (1/6) · 6(6 + 1)(2 · 6 + 1)/6 − (7/2)^2 = 35/12 .

The standard deviation is

σ = √(35/12) ≈ 1.70 .
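Again a quick Python check (an added sketch, not part of the original slides) :

from fractions import Fraction

# Fair die : Var(X) = E[X^2] - (E[X])^2.
p = Fraction(1, 6)
EX  = sum(p * k for k in range(1, 7))        # 7/2
EX2 = sum(p * k**2 for k in range(1, 7))     # 91/6
var = EX2 - EX**2
print(var, float(var) ** 0.5)                # 35/12 1.7078...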

109
Covariance
Let X and Y be random variables with mean

E[X] = µX , E[Y ] = µY .

Then the covariance of X and Y is defined as


Cov(X, Y ) ≡ E[ (X − µX ) (Y − µY ) ] = Σ_{k,ℓ} (xk − µX ) (yℓ − µY ) p(xk , yℓ ) .

We have
Cov(X, Y ) = E[ (X − µX ) (Y − µY ) ]

= E[XY − µX Y − µY X + µX µY ]

= E[XY ] − µX µY − µY µX + µX µY

= E[XY ] − E[X] E[Y ] .
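This last identity translates directly into code. A small Python sketch (added for illustration ; the helper name cov and the dictionary layout are assumptions, not from the original notes) applied to the earlier three-toss table :

from fractions import Fraction

def cov(pXY):
    """Cov(X, Y) = E[XY] - E[X] E[Y] for a joint mass function {(x, y): p}."""
    EX  = sum((x * p for (x, _), p in pXY.items()), Fraction(0))
    EY  = sum((y * p for (_, y), p in pXY.items()), Fraction(0))
    EXY = sum((x * y * p for (x, y), p in pXY.items()), Fraction(0))
    return EXY - EX * EY

F = Fraction
three_tosses = {(0,0): F(1,8), (1,1): F(1,8), (1,2): F(1,8), (1,3): F(1,8),
                (2,1): F(2,8), (2,2): F(1,8), (3,1): F(1,8)}
print(cov(three_tosses))   # 1/16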

110
We defined

Cov(X, Y ) ≡ E[ (X − µX ) (Y − µY ) ]

   = Σ_{k,ℓ} (xk − µX ) (yℓ − µY ) p(xk , yℓ )

   = E[XY ] − E[X] E[Y ] .


NOTE :
Cov(X, Y ) measures ”concordance ” or ”coherence ” of X and Y :

• If X > µX when Y > µY and X < µX when Y < µY then

Cov(X, Y ) > 0 .

• If X > µX when Y < µY and X < µX when Y > µY then

Cov(X, Y ) < 0 .

111
EXERCISE : Prove the following :

• V ar(aX + b) = a^2 V ar(X) ,

• Cov(X, Y ) = Cov(Y, X) ,

• Cov(cX, Y ) = c Cov(X, Y ) ,

• Cov(X, cY ) = c Cov(X, Y ) ,

• Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z) ,

• V ar(X + Y ) = V ar(X) + V ar(Y ) + 2 Cov(X, Y ) .

112
PROPERTY :

If X and Y are independent then Cov(X, Y ) = 0 .

PROOF :

We have already shown ( with µX ≡ E[X] and µY ≡ E[Y ] ) that

Cov(X, Y ) ≡ E[ (X − µX ) (Y − µY ) ] = E[XY ] − E[X] E[Y ] ,

and that if X and Y are independent then

E[XY ] = E[X] E[Y ] ,

from which the result follows.

113
EXERCISE : ( already used earlier · · · )
Probability mass function pX,Y (x, y)
          y=6     y=8     y=10    pX (x)
x=1       1/5      0      1/5      2/5
x=2        0      1/5      0       1/5
x=3       1/5      0      1/5      2/5
pY (y)    2/5     1/5     2/5       1
Show that
• E[X] = 2 , E[Y ] = 8 , E[XY ] = 16
• Cov(X, Y ) = E[XY ] − E[X] E[Y ] = 0
• X and Y are not independent

Thus if
Cov(X, Y ) = 0 ,

then it does not necessarily follow that X and Y are independent !

114
PROPERTY :

If X and Y are independent then

V ar(X + Y ) = V ar(X) + V ar(Y ) .

PROOF :

We have already shown (in an exercise !) that

V ar(X + Y ) = V ar(X) + V ar(Y ) + 2 Cov(X, Y ) ,

and that if X and Y are independent then

Cov(X, Y ) = 0 ,

from which the result follows.

115
EXERCISE :

Compute
E[X] , E[Y ] , E[X^2 ] , E[Y^2 ]

E[XY ] , V ar(X) , V ar(Y )

Cov(X, Y )
for

Joint probability mass function pX,Y (x, y)


          y=0     y=1     y=2     y=3     pX (x)
x=0       1/8      0       0       0       1/8
x=1        0      1/8     1/8     1/8      3/8
x=2        0      2/8     1/8      0       3/8
x=3        0      1/8      0       0       1/8
pY (y)    1/8     4/8     2/8     1/8       1

116
EXERCISE :

Compute
E[X] , E[Y ] , E[X^2 ] , E[Y^2 ]

E[XY ] , V ar(X) , V ar(Y )

Cov(X, Y )
for

Joint probability mass function pX,Y (x, y)


          y=1     y=2     y=3     pX (x)
x=1       1/3     1/12    1/12     1/2
x=2       2/9     1/18    1/18     1/3
x=3       1/9     1/36    1/36     1/6
pY (y)    2/3     1/6     1/6       1

117
SPECIAL DISCRETE RANDOM VARIABLES

The Bernoulli Random Variable

A Bernoulli trial has only two outcomes , with probability


P (X = 1) = p,

P (X = 0) = 1−p ,
e.g., tossing a coin, winning or losing a game, · · · .
We have
E[X] = 1 · p + 0 · (1 − p) = p ,

E[X^2 ] = 1^2 · p + 0^2 · (1 − p) = p ,

V ar(X) = E[X^2 ] − E[X]^2 = p − p^2 = p(1 − p) .

NOTE : If p is small then V ar(X) ≈ p .

118
EXAMPLES :

• When p = 1/2 ( e.g., for tossing a coin ) , we have
  E[X] = p = 1/2 , V ar(X) = p(1 − p) = 1/4 .

• When rolling a die , with outcome k , (1 ≤ k ≤ 6) , let


X(k) = 1 if the roll resulted in a six ,
and
X(k) = 0 if the roll did not result in a six .
Then
E[X] = p = 1/6 , V ar(X) = p(1 − p) = 5/36 .

• When p = 0.01 , then

E[X] = 0.01 , V ar(X) = 0.0099 ≈ 0.01 .

119
The Binomial Random Variable

Perform a Bernoulli trial n times in sequence .

Assume the individual trials are independent .

An outcome could be
100011001010 (n = 12) ,
with probability
P (100011001010) = p^5 · (1 − p)^7 . ( Why ? )

Let X be the number of ”successes ” (i.e., 1’s) .

For example,
X(100011001010) = 5 .
We have
P (X = 5) = C(12, 5) · p^5 · (1 − p)^7 , ( Why ? )
where C(n, k) = ”n choose k” denotes the binomial coefficient .

120
In general, for k successes in a sequence of n trials, we have
P (X = k) = C(n, k) · p^k · (1 − p)^(n−k) , (0 ≤ k ≤ n) .

EXAMPLE : Tossing a coin 12 times:


n = 12 , p = 1/2
k pX (k) FX (k)
0 1 / 4096 1 / 4096
1 12 / 4096 13 / 4096
2 66 / 4096 79 / 4096
3 220 / 4096 299 / 4096
4 495 / 4096 794 / 4096
5 792 / 4096 1586 / 4096
6 924 / 4096 2510 / 4096
7 792 / 4096 3302 / 4096
8 495 / 4096 3797 / 4096
9 220 / 4096 4017 / 4096
10 66 / 4096 4083 / 4096
11 12 / 4096 4095 / 4096
12 1 / 4096 4096 / 4096

121
The Binomial mass and distribution functions for n = 12 , p = 1/2 .

122
For k successes in a sequence of n trials :
P (X = k) = C(n, k) · p^k · (1 − p)^(n−k) , (0 ≤ k ≤ n) .

EXAMPLE : Rolling a die 12 times:


n = 12 , p = 1/6
k pX (k) FX (k)
0 0.1121566221 0.112156
1 0.2691758871 0.381332
2 0.2960935235 0.677426
3 0.1973956972 0.874821
4 0.0888280571 0.963649
5 0.0284249838 0.992074
6 0.0066324966 0.998707
7 0.0011369995 0.999844
8 0.0001421249 0.999986
9 0.0000126333 0.999998
10 0.0000007580 0.999999
11 0.0000000276 0.999999
12 0.0000000005 1.000000

123
The Binomial mass and distribution functions for n = 12 , p = 1/6 .

124
EXAMPLE :
In 12 rolls of a die write the outcome as, for example,

100011001010
where
1 denotes the roll resulted in a six ,
and
0 denotes the roll did not result in a six .

As before, let X be the number of 1’s in the outcome.

Then X represents the number of sixes in the 12 rolls.

Then, for example, using the preceding Table :

P (X = 5) ≈ 2.8 % , P (X ≤ 5) ≈ 99.2 % .

125
EXERCISE : Show that from
P (X = k) = C(n, k) · p^k · (1 − p)^(n−k) ,
and
P (X = k + 1) = C(n, k + 1) · p^(k+1) · (1 − p)^(n−k−1) ,
it follows that
P (X = k + 1) = ck · P (X = k) ,
where
ck = (n − k)/(k + 1) · p/(1 − p) .

NOTE : This recurrence formula is an efficient and stable algorithm


to compute the binomial probabilities :
P (X = 0) = (1 − p)^n ,

P (X = k + 1) = ck · P (X = k) , k = 0, 1, · · · , n − 1 .
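A direct transcription of this recurrence in Python (added here as an illustrative sketch ; the function name binomial_pmf is not from the original slides) :

def binomial_pmf(n, p):
    """Binomial probabilities P(X = 0), ..., P(X = n) via the recurrence."""
    probs = [(1 - p) ** n]                       # P(X = 0) = (1 - p)^n
    for k in range(n):
        c_k = (n - k) / (k + 1) * p / (1 - p)    # c_k = (n-k)/(k+1) * p/(1-p)
        probs.append(c_k * probs[-1])            # P(X = k+1) = c_k * P(X = k)
    return probs

pmf = binomial_pmf(12, 0.5)
print(round(pmf[5] * 4096))   # 792  (compare with the n = 12 , p = 1/2 table)
print(round(sum(pmf), 10))    # 1.0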

126
Mean and variance of the Binomial random variable :

By definition, the mean of a Binomial random variable X is


E[X] = Σ_{k=0}^{n} k · P (X = k) = Σ_{k=0}^{n} k · C(n, k) p^k (1 − p)^(n−k) ,

which can be shown to equal np .

An easy way to see this is as follows :

If in a sequence of n independent Bernoulli trials we let

Xk = the outcome of the kth Bernoulli trial , (Xk = 0 or 1 ) ,

then
X ≡ X1 + X2 + · · · + Xn ,
is the Binomial random variable that counts the ”successes ” .

127
X ≡ X1 + X2 + · · · + Xn

We know that
E[Xk ] = p ,
so
E[X] = E[X1 ] + E[X2 ] + · · · + E[Xn ] = np .

We already know that


V ar(Xk ) = E[Xk^2 ] − (E[Xk ])^2 = p − p^2 = p(1 − p) ,

so, since the Xk are independent , we have

V ar(X) = V ar(X1 ) + V ar(X2 ) + · · · + V ar(Xn ) = np(1 − p) .

NOTE : If p is small then V ar(X) ≈ np .

128
EXAMPLES :

• For 12 tosses of a coin , with Heads as success , we have n = 12 , p = 1/2 , so
  E[X] = np = 6 , V ar(X) = np(1 − p) = 3 .

• For 12 rolls of a die , with six as success , we have n = 12 , p = 1/6 , so
  E[X] = np = 2 , V ar(X) = np(1 − p) = 5/3 .

• If n = 500 and p = 0.01 , then

E[X] = np = 5 , V ar(X) = np(1 − p) = 4.95 ≈ 5 .

129
The Poisson Random Variable

The Poisson variable approximates the Binomial random variable :


P (X = k) = C(n, k) · p^k · (1 − p)^(n−k) ≈ e^(−λ) · λ^k / k! ,
when we take

λ = n p ( the average number of successes ) .

This approximation is accurate if n is large and p small .

Recall that for the Binomial random variable


E[X] = n p , and V ar(X) = np(1 − p) ≈ np when p is small.

Indeed, for the Poisson random variable we will show that


E[X] = λ and V ar(X) = λ .

130
A stable and efficient way to compute the Poisson probability

P (X = k) = e^(−λ) · λ^k / k! , k = 0, 1, 2, · · · ,

P (X = k + 1) = e^(−λ) · λ^(k+1) / (k + 1)! ,

is to use the recurrence relation

P (X = 0) = e^(−λ) ,

P (X = k + 1) = λ/(k + 1) · P (X = k) , k = 0, 1, 2, · · · .
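In the same spirit as the binomial recurrence earlier, a short Python sketch (added for illustration ; the name poisson_pmf is an assumption) :

import math

def poisson_pmf(lam, kmax):
    """Poisson probabilities P(X = 0), ..., P(X = kmax) via the recurrence."""
    probs = [math.exp(-lam)]                        # P(X = 0) = e^(-lambda)
    for k in range(kmax):
        probs.append(lam / (k + 1) * probs[-1])     # P(X = k+1) = lambda/(k+1) * P(X = k)
    return probs

pmf = poisson_pmf(6.0, 12)
print(f"{pmf[5]:.4f}", f"{pmf[6]:.4f}")   # 0.1606 0.1606  (cf. the λ = 6 tables below)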

NOTE : Unlike the Binomial random variable, the Poisson random


variable can have an arbitrarily large integer value k.

131
The Poisson random variable
P (X = k) = e^(−λ) · λ^k / k! , k = 0, 1, 2, · · · ,
has (as shown later) : E[X] = λ and V ar(X) = λ .

The Poisson distribution function is


F (k) = P (X ≤ k) = Σ_{ℓ=0}^{k} e^(−λ) λ^ℓ / ℓ! = e^(−λ) Σ_{ℓ=0}^{k} λ^ℓ / ℓ! ,

with, as should be the case,

lim_{k→∞} F (k) = e^(−λ) Σ_{ℓ=0}^{∞} λ^ℓ / ℓ! = e^(−λ) e^λ = 1 .

( using the Taylor series from Calculus for e^λ ) .

132
The Poisson random variable
P (X = k) = e^(−λ) · λ^k / k! , k = 0, 1, 2, · · · ,
models the probability of k ”successes ” in a given ”time” interval,
when the average number of successes is λ .

EXAMPLE : Suppose customers arrive at the rate of six per hour.


The probability that k customers arrive in a one-hour period is
P (k = 0) = e^(−6) · 6^0 / 0! ≈ 0.0024 ,

P (k = 1) = e^(−6) · 6^1 / 1! ≈ 0.0148 ,

P (k = 2) = e^(−6) · 6^2 / 2! ≈ 0.0446 .

The probability that more than 2 customers arrive is

1 − (0.0024 + 0.0148 + 0.0446) ≈ 0.938 .
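These numbers are easy to reproduce ; a small Python check (added for illustration, consistent with the values above) :

import math

lam = 6.0   # average number of arrivals per hour
p = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(3)]
print([f"{x:.5f}" for x in p])     # ['0.00248', '0.01487', '0.04462']
print(f"{1 - sum(p):.3f}")         # 0.938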

133
pBinomial (k) = C(n, k) p^k (1 − p)^(n−k) ≈ pPoisson (k) = e^(−λ) λ^k / k!

EXAMPLE : λ = 6 customers/hour.
For the Binomial take n = 12 , p = 0.5 (0.5 customers/5 minutes) ,
so that indeed np = λ .
k pBinomial pPoisson FBinomial FPoisson
0 0.0002 0.0024 0.0002 0.0024
1 0.0029 0.0148 0.0031 0.0173
2 0.0161 0.0446 0.0192 0.0619
3 0.0537 0.0892 0.0729 0.1512
4 0.1208 0.1338 0.1938 0.2850
5 0.1933 0.1606 0.3872 0.4456
6 0.2255 0.1606 0.6127 0.6063
7 0.1933 0.1376 0.8061 0.7439
8 0.1208 0.1032 0.9270 0.8472
9 0.0537 0.0688 0.9807 0.9160
10 0.0161 0.0413 0.9968 0.9573
11 0.0029 0.0225 0.9997 0.9799
12 0.0002 0.0112 1.0000 0.9911    ⋆ Why not 1.0000 ?

Here the approximation is not so good · · ·

134
pBinomial (k) = C(n, k) p^k (1 − p)^(n−k) ≈ pPoisson (k) = e^(−λ) λ^k / k!
EXAMPLE : λ = 6 customers/hour.
For the Binomial take n = 60 , p = 0.1 (0.1 customers/minute) ,
so that indeed np = λ .
k pBinomial pPoisson FBinomial FPoisson
0 0.0017 0.0024 0.0017 0.0024
1 0.0119 0.0148 0.0137 0.0173
2 0.0392 0.0446 0.0530 0.0619
3 0.0843 0.0892 0.1373 0.1512
4 0.1335 0.1338 0.2709 0.2850
5 0.1662 0.1606 0.4371 0.4456
6 0.1692 0.1606 0.6064 0.6063
7 0.1451 0.1376 0.7515 0.7439
8 0.1068 0.1032 0.8583 0.8472
9 0.0685 0.0688 0.9269 0.9160
10 0.0388 0.0413 0.9657 0.9573
11 0.0196 0.0225 0.9854 0.9799
12 0.0089 0.0112 0.9943 0.9911
13 ··· ··· ··· ···

Here the approximation is better · · ·

135
n = 12 , p = 1/2 , λ = 6                    n = 200 , p = 0.01 , λ = 2

The Binomial (blue) and Poisson (red) probability mass functions.


For the case n = 200, p = 0.01, the approximation is very good !

136
For the Binomial random variable we found
E[X] = np and V ar(X) = np(1 − p) ,

while for the Poisson random variable, with λ = np we will show

E[X] = np and V ar(X) = np .

Note again that


np(1 − p) ≈ np , when p is small .

EXAMPLE : In the preceding two Tables we have


n=12 , p=0.5 n=60 , p=0.1
Binomial Poisson Binomial Poisson
E[X] 6.0000 6.0000 E[X] 6.0000 6.0000
V ar[X] 3.0000 6.0000 V ar[X] 5.4000 6.0000
σ[X] 1.7321 2.4495 σ[X] 2.3238 2.4495

137
FACT : (The Method of Moments)

By Taylor expansion of e^(tX) about t = 0 , we have

ψ(t) ≡ E[e^(tX) ] = E[ 1 + tX + t^2 X^2 / 2! + t^3 X^3 / 3! + · · · ]

   = 1 + t E[X] + (t^2 / 2!) E[X^2 ] + (t^3 / 3!) E[X^3 ] + · · · .

It follows that
ψ′(0) = E[X] , ψ′′(0) = E[X^2 ] . ( Why ? )

This sometimes facilitates computing the mean


µ = E[X] ,
and the variance
V ar(X) = E[X^2 ] − µ^2 .

138
APPLICATION : The Poisson mean and variance :
ψ(t) ≡ E[e^(tX) ] = Σ_{k=0}^{∞} e^(tk) P (X = k) = Σ_{k=0}^{∞} e^(tk) e^(−λ) λ^k / k!

   = e^(−λ) Σ_{k=0}^{∞} (λ e^t )^k / k! = e^(−λ) e^(λ e^t) = e^(λ(e^t − 1)) .

Here      ψ′(t) = λ e^t e^(λ(e^t − 1))

          ψ′′(t) = λ [ λ (e^t)^2 + e^t ] e^(λ(e^t − 1))        ( Check ! )

so that   E[X] = ψ′(0) = λ

          E[X^2 ] = ψ′′(0) = λ(λ + 1) = λ^2 + λ

          V ar(X) = E[X^2 ] − E[X]^2 = λ .

139
EXAMPLE : Defects in a wire occur at the rate of one per 10 meters,
with a Poisson distribution :

P (X = k) = e^(−λ) · λ^k / k! , k = 0, 1, 2, · · · .
What is the probability that :

• A 12-meter roll has no defects?


ANSWER : Here λ = 1.2 , and P (X = 0) = e^(−λ) = 0.3012 .

• A 12-meter roll of wire has one defect?


ANSWER : With λ = 1.2 , P (X = 1) = e^(−λ) · λ = 0.3614 .

• Of five 12-meter rolls two have one defect and three have none?

ANSWER : C(5, 3) · 0.3012^3 · 0.3614^2 = 0.0357 . ( Why ? )

140
EXERCISE :
Defects in a certain wire occur at the rate of one per 10 meters.
Assume the defects have a Poisson distribution.
What is the probability that :
• a 20-meter wire has no defects?

• a 20-meter wire has at most 2 defects?

EXERCISE :
Customers arrive at a counter at the rate of 8 per hour.
Assume the arrivals have a Poisson distribution.
What is the probability that :
• no customer arrives in 15 minutes?

• two customers arrive in a period of 30 minutes?

141
