Chap 2 - Discrete Random Variables
Value-ranges of a random variable correspond to events in S ,
and
events in S have a probability .
Thus
Value-ranges of a random variable have a probability .
NOTATION : We will also write pX (x) to denote P (X = x) .

EXAMPLE : For three tosses of a coin, with

S = {HHH , HHT , HTH , HTT , THH , THT , TTH , TTT } ,

with

X(s) = the number of Heads ,

we have
pX (0) ≡ P ( {TTT} ) = 1/8

pX (1) ≡ P ( {HTT , THT , TTH} ) = 3/8

pX (2) ≡ P ( {HHT , HTH , THH} ) = 3/8

pX (3) ≡ P ( {HHH} ) = 1/8
where
pX (0) + pX (1) + pX (2) + pX (3) = 1 . ( Why ? )
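PYTHON : A minimal check of this table by enumeration (a sketch, using only the standard itertools and fractions modules) :

    from itertools import product
    from fractions import Fraction

    S = ["".join(s) for s in product("HT", repeat=3)]      # the 8 outcomes
    p_X = {x: Fraction(sum(1 for s in S if s.count("H") == x), len(S))
           for x in range(4)}

    print({x: str(p) for x, p in p_X.items()})   # {0: '1/8', 1: '3/8', 2: '3/8', 3: '1/8'}
    print(sum(p_X.values()))                     # 1  (the events partition S)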
[ Figure : Graphical representation of X , mapping the events E0 , E1 , E2 , E3 in S to the values 0 , 1 , 2 , 3 . ]
[ Figure : The graph of pX . ]
DEFINITION :
pX (x) ≡ P (X = x) ,
is called the probability mass function .
DEFINITION :
FX (x) ≡ P (X ≤ x) ,
is called the (cumulative) probability distribution function .
PROPERTIES :
• FX (x) is a non-decreasing function of x . ( Why ? )
• FX (−∞) = 0 and FX (∞) = 1 . ( Why ? )
• P (a < X ≤ b) = FX (b) − FX (a) . ( Why ? )
EXAMPLE : With X(s) = the number of Heads , and

S = {HHH , HHT , HTH , HTT , THH , THT , TTH , TTT } ,

p(0) = 1/8 , p(1) = 3/8 , p(2) = 3/8 , p(3) = 1/8 ,
we have the probability distribution function
F (−1) ≡ P (X ≤ −1) = 0
F ( 0 ) ≡ P (X ≤ 0) = 1/8

F ( 1 ) ≡ P (X ≤ 1) = 4/8

F ( 2 ) ≡ P (X ≤ 2) = 7/8

F ( 3 ) ≡ P (X ≤ 3) = 1

F ( 4 ) ≡ P (X ≤ 4) = 1
We see, for example, that
P (0 < X ≤ 2) = P (X = 1) + P (X = 2)
= F (2) − F (0) = 7/8 − 1/8 = 6/8 .
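PYTHON : A sketch of FX for this example, together with the interval rule P (a < X ≤ b) = F (b) − F (a) :

    from fractions import Fraction

    p_X = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

    def F(x):
        # F_X(x) = P(X <= x) : add the mass at all values not exceeding x
        return sum(p for k, p in p_X.items() if k <= x)

    print([str(F(x)) for x in (-1, 0, 1, 2, 3, 4)])   # ['0', '1/8', '1/2', '7/8', '1', '1']
    print(F(2) - F(0))                                # 3/4 , i.e. 6/8 = P(0 < X <= 2)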
[ Figure : The graph of the probability distribution function FX . ]
EXAMPLE : Toss a coin until "Heads" occurs.
Then the sample space is countably infinite , namely,
S = {H , T H , T T H , T T T H , · · · } .
X(s) = the number of tosses until "Heads" occurs. For a fair coin,

pX (k) = (1/2)^k , k = 1, 2, 3, · · · . ( Why ? )
Joint distributions
The probability mass function and the probability distribution function
can also be functions of more than one variable.
EXAMPLE : For

S = {HHH , HHT , HTH , HTT , THH , THT , TTH , TTT } ,

we let

X(s) = # Heads , Y (s) = index of the first H ( 0 for TTT ) .
Then we have the joint probability mass function
pX,Y (x, y) = P (X = x , Y = y) .
For example,
pX,Y (2, 1) = P (X = 2 , Y = 1) = P ( {HHT , HTH} ) = 2/8 = 1/4 .
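PYTHON : A sketch that tabulates the joint probability mass function by enumeration (the helper names X and Y are ours) :

    from itertools import product
    from fractions import Fraction

    S = ["".join(s) for s in product("HT", repeat=3)]

    def X(s): return s.count("H")                         # number of Heads
    def Y(s): return s.index("H") + 1 if "H" in s else 0  # index of the first H

    p_XY = {}
    for s in S:
        key = (X(s), Y(s))
        p_XY[key] = p_XY.get(key, Fraction(0)) + Fraction(1, 8)

    print(p_XY[(2, 1)])   # 1/4  (the outcomes HHT and HTH)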
EXAMPLE : ( continued · · · ) For
S = {HHH , HHT , HTH , HTT , THH , THT , TTH , TTT } ,
X(s) = number of Heads, and Y (s) = index of the first H ,
NOTE :

• The marginal probability pX (x) = Σ_y pX,Y (x, y) is the probability mass function of X .

• The marginal probability pY (y) = Σ_x pX,Y (x, y) is the probability mass function of Y .
EXAMPLE : ( continued · · · )

X(s) = number of Heads, and Y (s) = index of the first H .
[ Figure : Graphical representation of X and Y : the events Exy = { s : X(s) = x , Y (s) = y } , e.g., E21 = {HHT , HTH} and E00 = {TTT} . ]
DEFINITION :
pX,Y (x, y) ≡ P (X = x , Y = y) ,
is called the joint probability mass function .
DEFINITION :
FX,Y (x, y) ≡ P (X ≤ x , Y ≤ y) ,
is called the joint (cumulative) probability distribution function .
EXAMPLE : Three tosses : X(s) = # Heads, Y (s) = index 1st H .
Joint probability mass function pX,Y (x, y)
            y=0    y=1    y=2    y=3   |  pX (x)
    x=0     1/8     0      0      0    |   1/8
    x=1      0     1/8    1/8    1/8   |   3/8
    x=2      0     2/8    1/8     0    |   3/8
    x=3      0     1/8     0      0    |   1/8
    -----------------------------------+---------
    pY (y)  1/8    4/8    2/8    1/8   |    1
In the preceding example ( see the Table above ) :
QUESTION : Why is
P (1 < X ≤ 3 , 1 < Y ≤ 3) = F (3, 3) − F (1, 3) − F (3, 1) + F (1, 1) ?
EXERCISE :
Roll a four-sided die (tetrahedron) two times.
(The sides are marked 1 , 2 , 3 , 4 .)
Suppose each of the four sides is equally likely to end facing down.
Suppose the outcome of a single roll is the side that faces down ( ! ).
Independent random variables

DEFINITION : The random variables X and Y are independent if

pX,Y (x, y) = pX (x) · pY (y) , for all x, y .
[ Figure : Graphical representation of X and Y (as above) . ]
EXERCISE :

Let

X be the result of the 1st roll ,

and

Y the result of the 2nd roll .

Are these random variables X and Y independent ?
PROPERTY :
The joint distribution function of independent random variables
X and Y satisfies
FX,Y (x, y) = FX (x) · FY (y) , for all x, y .
PROOF :
FX,Y (xk , yℓ ) = P (X ≤ xk , Y ≤ yℓ )

   = Σ_{i≤k} Σ_{j≤ℓ} pX,Y (xi , yj )

   = Σ_{i≤k} Σ_{j≤ℓ} pX (xi ) · pY (yj )   ( by independence )

   = Σ_{i≤k} { pX (xi ) · Σ_{j≤ℓ} pY (yj ) }

   = { Σ_{i≤k} pX (xi ) } · { Σ_{j≤ℓ} pY (yj ) }

   = FX (xk ) · FY (yℓ ) .
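PYTHON : A numerical check of this property, assuming the two independent four-sided-die rolls of the exercise above (each pair of outcomes has probability 1/16) :

    from itertools import product
    from fractions import Fraction

    sides = range(1, 5)
    p_XY = {(x, y): Fraction(1, 16) for x, y in product(sides, repeat=2)}
    p_X = {k: Fraction(1, 4) for k in sides}   # marginal of a single roll
    p_Y = dict(p_X)

    def F_XY(a, b):
        return sum(p for (x, y), p in p_XY.items() if x <= a and y <= b)

    def F(pmf, a):
        return sum(p for k, p in pmf.items() if k <= a)

    assert all(F_XY(a, b) == F(p_X, a) * F(p_Y, b)
               for a, b in product(sides, repeat=2))
    print("F_XY(a, b) = F_X(a) * F_Y(b) for all a, b")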
Conditional distributions

Let Ex denote the event " X = x " and Ey the event " Y = y " . Then

P (Ex |Ey ) ≡ P (Ex Ey ) / P (Ey ) = pX,Y (x, y) / pY (y) .
[ Figure : Graphical representation of X and Y (as above) . ]
EXAMPLE : (3 tosses : X(s) = # Heads, Y (s) = index 1st H.)
Joint probability mass function pX,Y (x, y)
            y=0    y=1    y=2    y=3   |  pX (x)
    x=0     1/8     0      0      0    |   1/8
    x=1      0     1/8    1/8    1/8   |   3/8
    x=2      0     2/8    1/8     0    |   3/8
    x=3      0     1/8     0      0    |   1/8
    -----------------------------------+---------
    pY (y)  1/8    4/8    2/8    1/8   |    1
Conditional probability mass function pX|Y (x|y) = pX,Y (x, y) / pY (y) :

            y=0    y=1    y=2    y=3
    x=0      1      0      0      0
    x=1      0     2/8    4/8     1
    x=2      0     4/8    4/8     0
    x=3      0     2/8     0      0
    ---------------------------------
             1      1      1      1
EXERCISE : Also construct the Table for pY|X (y|x) = pX,Y (x, y) / pX (x) .
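PYTHON : A sketch that builds the conditional table from the joint table above (pairs omitted from the dictionary have probability 0) :

    from fractions import Fraction as Fr

    joint = {(0, 0): Fr(1, 8),
             (1, 1): Fr(1, 8), (1, 2): Fr(1, 8), (1, 3): Fr(1, 8),
             (2, 1): Fr(2, 8), (2, 2): Fr(1, 8),
             (3, 1): Fr(1, 8)}

    p_Y = {}
    for (x, y), p in joint.items():
        p_Y[y] = p_Y.get(y, Fr(0)) + p           # marginal p_Y

    cond = {(x, y): p / p_Y[y] for (x, y), p in joint.items()}

    print(cond[(1, 2)])   # 1/2 , i.e. 4/8 as in the Table
    print(all(sum(p for (x, yy), p in cond.items() if yy == y) == 1
              for y in p_Y))                     # each column sums to 1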
EXAMPLE :
Joint probability mass function pX,Y (x, y)
            y=1    y=2    y=3   |  pX (x)
    x=1     1/3    1/12   1/12  |   1/2
    x=2     2/9    1/18   1/18  |   1/3
    x=3     1/9    1/36   1/36  |   1/6
    ----------------------------+---------
    pY (y)  2/3    1/6    1/6   |    1
Conditional probability mass function pX|Y (x|y) = pX,Y (x, y) / pY (y) :

            y=1    y=2    y=3
    x=1     1/2    1/2    1/2
    x=2     1/3    1/3    1/3
    x=3     1/6    1/6    1/6
    -------------------------
             1      1      1
Expectation

The expected value of a discrete random variable X is

E[X] ≡ Σ_k xk · P (X = xk ) = Σ_k xk · pX (xk ) .
EXAMPLE : Toss a coin until "Heads" occurs. Then

S = {H , TH , TTH , TTTH , · · · } ,

and, for a fair coin, pX (k) = (1/2)^k , so that

E[X] = Σ_{k=1}^∞ k · (1/2)^k = 2 . ( Check ! )
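PYTHON : A quick check of this series (truncated at k = 200 ; the neglected tail is smaller than 2^(−190) ) :

    print(sum(k * 0.5**k for k in range(1, 200)))   # 2.0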
The expected value of a function of a random variable is

E[g(X)] ≡ Σ_k g(xk ) p(xk ) .
EXAMPLE :
The pay-off of rolling a die is $ k² , where k is the side facing up.
What should the entry fee be for the betting to break even?
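PYTHON : A sketch of the break-even computation : the fee should equal the expected pay-off E[ k² ] .

    from fractions import Fraction

    fee = sum(k**2 * Fraction(1, 6) for k in range(1, 7))
    print(fee, float(fee))   # 91/6 , about $ 15.17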
The expected value of a function of two random variables is

E[g(X, Y )] ≡ Σ_k Σ_ℓ g(xk , yℓ ) p(xk , yℓ ) .

EXAMPLE : For the preceding 3 × 3 Table :

E[X] = 1 · 1/2 + 2 · 1/3 + 3 · 1/6 = 5/3 ,

E[Y ] = 1 · 2/3 + 2 · 1/6 + 3 · 1/6 = 3/2 ,

E[XY ] = 1 · 1/3 + 2 · 1/12 + 3 · 1/12
       + 2 · 2/9 + 4 · 1/18 + 6 · 1/18
       + 3 · 1/9 + 6 · 1/36 + 9 · 1/36 = 5/2 . ( So ? )
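PYTHON : A check of these three expectations against the 3 × 3 Table :

    from fractions import Fraction as Fr

    p = {(1, 1): Fr(1, 3), (1, 2): Fr(1, 12), (1, 3): Fr(1, 12),
         (2, 1): Fr(2, 9), (2, 2): Fr(1, 18), (2, 3): Fr(1, 18),
         (3, 1): Fr(1, 9), (3, 2): Fr(1, 36), (3, 3): Fr(1, 36)}

    E_X  = sum(x * q for (x, y), q in p.items())
    E_Y  = sum(y * q for (x, y), q in p.items())
    E_XY = sum(x * y * q for (x, y), q in p.items())

    print(E_X, E_Y, E_XY, E_X * E_Y)   # 5/3 3/2 5/2 5/2 : E[XY] = E[X] E[Y]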
PROPERTY : If X and Y are independent random variables, then E[XY ] = E[X] · E[Y ] .
PROOF :
E[XY ] = Σ_k Σ_ℓ xk yℓ pX,Y (xk , yℓ )

       = Σ_k Σ_ℓ xk yℓ pX (xk ) pY (yℓ )   ( by independence )

       = Σ_k { xk pX (xk ) Σ_ℓ yℓ pY (yℓ ) }

       = { Σ_k xk pX (xk ) } · { Σ_ℓ yℓ pY (yℓ ) }

       = E[X] · E[Y ] .
PROPERTY : E[X + Y ] = E[X] + E[Y ] . ( Always ! )
PROOF :
E[X + Y ] = Σ_k Σ_ℓ (xk + yℓ ) pX,Y (xk , yℓ )

          = Σ_k Σ_ℓ xk pX,Y (xk , yℓ ) + Σ_k Σ_ℓ yℓ pX,Y (xk , yℓ )

          = Σ_k Σ_ℓ xk pX,Y (xk , yℓ ) + Σ_ℓ Σ_k yℓ pX,Y (xk , yℓ )

          = Σ_k { xk Σ_ℓ pX,Y (xk , yℓ ) } + Σ_ℓ { yℓ Σ_k pX,Y (xk , yℓ ) }

          = Σ_k { xk pX (xk ) } + Σ_ℓ { yℓ pY (yℓ ) }

          = E[X] + E[Y ] .
EXERCISE :
Probability mass function pX,Y (x, y)

            y=6    y=8    y=10  |  pX (x)
    x=1     1/5     0     1/5   |   2/5
    x=2      0     1/5     0    |   1/5
    x=3     1/5     0     1/5   |   2/5
    ----------------------------+---------
    pY (y)  2/5    1/5    2/5   |    1
Show that

E[XY ] = E[X] · E[Y ] ,

even though X and Y are not independent . ( Check ! )

Thus if

E[XY ] = E[X] E[Y ] ,

then X and Y are not necessarily independent .
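PYTHON : A check for this table : the expectations factor, yet X and Y are not independent :

    from fractions import Fraction as Fr

    p = {(1, 6): Fr(1, 5), (1, 10): Fr(1, 5), (2, 8): Fr(1, 5),
         (3, 6): Fr(1, 5), (3, 10): Fr(1, 5)}

    E_X  = sum(x * q for (x, y), q in p.items())
    E_Y  = sum(y * q for (x, y), q in p.items())
    E_XY = sum(x * y * q for (x, y), q in p.items())

    print(E_X, E_Y, E_XY)        # 2 8 16 : E[XY] = E[X] E[Y]
    print(p.get((1, 8), Fr(0)))  # 0 , while p_X(1) p_Y(8) = (2/5)(1/5) != 0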
Variance and Standard Deviation

Let X have mean

µ = E[X] .

Then the variance of X is

Var(X) ≡ E[ (X − µ)² ] .

We have

Var(X) = E[X² − 2µX + µ²]

       = E[X²] − 2µE[X] + µ²

       = E[X²] − 2µ² + µ²

       = E[X²] − µ² .
The standard deviation of X is

σ(X) ≡ √Var(X) = √E[ (X − µ)² ] = √( E[X²] − µ² ) .

EXAMPLE : For one roll of a fair die ( µ = 7/2 ) :

Var(X) = E[X²] − µ² = (1/6) · 6(6 + 1)(2 · 6 + 1)/6 − (7/2)² = 35/12 .
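PYTHON : The same computation in exact arithmetic :

    from fractions import Fraction

    mu  = sum(k * Fraction(1, 6) for k in range(1, 7))      # 7/2
    EX2 = sum(k**2 * Fraction(1, 6) for k in range(1, 7))   # 91/6
    print(EX2 - mu**2)                                      # 35/12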
Covariance
Let X and Y be random variables with mean
E[X] = µX , E[Y ] = µY .
We have

Cov(X, Y ) = E[ (X − µX ) (Y − µY ) ]

           = E[XY − µX Y − µY X + µX µY ]

           = E[XY ] − µX µY − µY µX + µX µY

           = E[XY ] − µX µY .
We defined

Cov(X, Y ) ≡ E[ (X − µX ) (Y − µY ) ] = Σ_{k,ℓ} (xk − µX ) (yℓ − µY ) p(xk , yℓ ) .

Thus, if X − µX and Y − µY tend to have the same sign, then

Cov(X, Y ) > 0 ,

while if they tend to have opposite sign, then

Cov(X, Y ) < 0 .
EXERCISE : Prove the following :
• Var(aX + b) = a² Var(X) ,
• Cov(X, Y ) = Cov(Y, X) ,
• Cov(cX, Y ) = c Cov(X, Y ) ,
• Cov(X, cY ) = c Cov(X, Y ) .
PROPERTY : If X and Y are independent, then Cov(X, Y ) = 0 .

PROOF : By independence, E[XY ] = E[X] E[Y ] , so that

Cov(X, Y ) = E[XY ] − µX µY = E[X] E[Y ] − E[X] E[Y ] = 0 .
EXERCISE : ( already used earlier · · · )
Probability mass function pX,Y (x, y)

            y=6    y=8    y=10  |  pX (x)
    x=1     1/5     0     1/5   |   2/5
    x=2      0     1/5     0    |   1/5
    x=3     1/5     0     1/5   |   2/5
    ----------------------------+---------
    pY (y)  2/5    1/5    2/5   |    1
Show that

• E[X] = 2 , E[Y ] = 8 , E[XY ] = 16 ,

• Cov(X, Y ) = E[XY ] − E[X] E[Y ] = 0 ,

• X and Y are not independent .

Thus if

Cov(X, Y ) = 0 ,

then X and Y are not necessarily independent .
PROPERTY : If X and Y are independent, then

Var(X + Y ) = Var(X) + Var(Y ) .

PROOF : In general

Var(X + Y ) = Var(X) + Var(Y ) + 2 Cov(X, Y ) , ( Check ! )

and by independence

Cov(X, Y ) = 0 .
EXERCISE :

Compute

E[X] , E[Y ] , E[X²] , E[Y²] , and Cov(X, Y )

for the given joint probability mass functions .
SPECIAL DISCRETE RANDOM VARIABLES

The Bernoulli Random Variable

P (X = 1) = p , P (X = 0) = 1 − p ,

e.g., tossing a coin, winning or losing a game, · · · .

We have

E[X] = 1 · p + 0 · (1 − p) = p ,

E[X²] = 1² · p + 0² · (1 − p) = p ,

so that

Var(X) = E[X²] − ( E[X] )² = p − p² = p (1 − p) .
EXAMPLES :

• When p = 1/2 (e.g., for tossing a coin), we have

E[X] = p = 1/2 , Var(X) = p (1 − p) = 1/4 .
The Binomial Random Variable

Perform n independent Bernoulli trials, each with probability p of success ( 1 ) and probability 1 − p of failure ( 0 ) . An outcome could be

100011001010 (n = 12) ,

with probability

P (100011001010) = p^5 · (1 − p)^7 . ( Why ? )

Let X(s) be the number of successes in s . For example,

X(100011001010) = 5 .

We have

P (X = 5) = C(12, 5) · p^5 · (1 − p)^7 , ( Why ? )

where C(n, k) = n! / ( k! (n − k)! ) is the binomial coefficient.
In general, for k successes in a sequence of n trials, we have

P (X = k) = C(n, k) · p^k · (1 − p)^(n−k) , (0 ≤ k ≤ n) .
[ Figure : The Binomial mass and distribution functions for n = 12 , p = 1/2 . ]
For k successes in a sequence of n trials :

P (X = k) = C(n, k) · p^k · (1 − p)^(n−k) , (0 ≤ k ≤ n) .

[ Figure : The Binomial mass and distribution functions for n = 12 , p = 1/6 . ]
EXAMPLE :
In 12 rolls of a die write the outcome as, for example,
100011001010
where
1 denotes the roll resulted in a six ,
and
0 denotes the roll did not result in a six .
Then X is Binomial with n = 12 and p = 1/6 , and

P (X = 5) ≈ 2.8 % , P (X ≤ 5) ≈ 99.2 % .
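PYTHON : A sketch of these two numbers, using math.comb (Python 3.8+) :

    from math import comb

    n, p = 12, 1 / 6

    def pmf(k):
        return comb(n, k) * p**k * (1 - p)**(n - k)

    print(pmf(5))                          # ~0.0284 , about 2.8 %
    print(sum(pmf(k) for k in range(6)))   # ~0.9921 , about 99.2 %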
EXERCISE : Show that from

P (X = k) = C(n, k) · p^k · (1 − p)^(n−k) ,

and

P (X = k + 1) = C(n, k + 1) · p^(k+1) · (1 − p)^(n−k−1) ,

it follows that

P (X = k + 1) = ck · P (X = k) , k = 0, 1, · · · , n − 1 ,

where

ck = ( (n − k)/(k + 1) ) · ( p/(1 − p) ) .
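PYTHON : A sketch of this recurrence as a stable way to generate all the Binomial probabilities (no large factorials needed) :

    def binomial_pmf(n, p):
        probs = [(1 - p)**n]                        # P(X = 0)
        for k in range(n):
            c_k = (n - k) / (k + 1) * p / (1 - p)   # the factor c_k above
            probs.append(c_k * probs[-1])           # P(X = k+1) = c_k P(X = k)
        return probs

    probs = binomial_pmf(12, 1 / 6)
    print(probs[5], sum(probs))   # ~0.0284 , ~1.0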
Mean and variance of the Binomial random variable :

If X1 , X2 , · · · , Xn are independent Bernoulli random variables, each with probability p of success,

then

X ≡ X1 + X2 + · · · + Xn

is the Binomial random variable that " counts the successes " .
X ≡ X1 + X2 + · · · + Xn
We know that

E[Xk ] = p and Var(Xk ) = p (1 − p) ,

so

E[X] = E[X1 ] + E[X2 ] + · · · + E[Xn ] = np ,

and, since the Xk are independent,

Var(X) = Var(X1 ) + Var(X2 ) + · · · + Var(Xn ) = np (1 − p) .
The Poisson Random Variable
A stable and efficient way to compute the Poisson probability

P (X = k) = e^(−λ) · λ^k / k! , k = 0, 1, 2, · · · ,

follows from

P (X = k + 1) = e^(−λ) · λ^(k+1) / (k + 1)! :

start with

P (X = 0) = e^(−λ) ,

and use

P (X = k + 1) = ( λ/(k + 1) ) · P (X = k) , k = 0, 1, 2, · · · .
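PYTHON : The same idea as for the Binomial recurrence, as a sketch :

    from math import exp

    def poisson_pmf(lam, kmax):
        probs = [exp(-lam)]                          # P(X = 0)
        for k in range(kmax):
            probs.append(lam / (k + 1) * probs[-1])  # P(X = k+1)
        return probs

    print(poisson_pmf(6.0, 12)[5])   # P(X = 5) for lambda = 6 : ~0.1606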
The Poisson random variable

P (X = k) = e^(−λ) · λ^k / k! , k = 0, 1, 2, · · · ,

has (as shown later) : E[X] = λ and Var(X) = λ .
The Poisson random variable

P (X = k) = e^(−λ) · λ^k / k! , k = 0, 1, 2, · · · ,

models the probability of k " successes " in a given " time " interval,

when the average number of successes is λ .
pBinomial (k) = C(n, k) p^k (1 − p)^(n−k) ≈ pPoisson (k) = e^(−λ) λ^k / k! .
EXAMPLE : λ = 6 customers/hour.
For the Binomial take n = 12 , p = 0.5 (0.5 customers/5 minutes) ,
so that indeed np = λ .
k pBinomial pPoisson FBinomial FPoisson
0 0.0002 0.0024 0.0002 0.0024
1 0.0029 0.0148 0.0031 0.0173
2 0.0161 0.0446 0.0192 0.0619
3 0.0537 0.0892 0.0729 0.1512
4 0.1208 0.1338 0.1938 0.2850
5 0.1933 0.1606 0.3872 0.4456
6 0.2255 0.1606 0.6127 0.6063
7 0.1933 0.1376 0.8061 0.7439
8 0.1208 0.1032 0.9270 0.8472
9 0.0537 0.0688 0.9807 0.9160
10 0.0161 0.0413 0.9968 0.9573
11 0.0029 0.0225 0.9997 0.9799
12 0.0002 0.0112 1.0000 0.9911 ⋆   ⋆ Why not 1.0000 ?
pBinomial (k) = C(n, k) p^k (1 − p)^(n−k) ≈ pPoisson (k) = e^(−λ) λ^k / k! .
EXAMPLE : λ = 6 customers/hour.
For the Binomial take n = 60 , p = 0.1 (0.1 customers/minute) ,
so that indeed np = λ .
k pBinomial pPoisson FBinomial FPoisson
0 0.0017 0.0024 0.0017 0.0024
1 0.0119 0.0148 0.0137 0.0173
2 0.0392 0.0446 0.0530 0.0619
3 0.0843 0.0892 0.1373 0.1512
4 0.1335 0.1338 0.2709 0.2850
5 0.1662 0.1606 0.4371 0.4456
6 0.1692 0.1606 0.6064 0.6063
7 0.1451 0.1376 0.7515 0.7439
8 0.1068 0.1032 0.8583 0.8472
9 0.0685 0.0688 0.9269 0.9160
10 0.0388 0.0413 0.9657 0.9573
11 0.0196 0.0225 0.9854 0.9799
12 0.0089 0.0112 0.9943 0.9911
13 ··· ··· ··· ···
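PYTHON : A sketch that regenerates rows of this comparison (printed values may differ from the table in the last digit, depending on rounding) :

    from math import comb, exp, factorial

    n, p, lam = 60, 0.1, 6.0
    for k in range(13):
        p_bin  = comb(n, k) * p**k * (1 - p)**(n - k)
        p_pois = exp(-lam) * lam**k / factorial(k)
        print(f"{k:2d}   {p_bin:6.4f}   {p_pois:6.4f}")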
[ Figure : Binomial versus Poisson : n = 12 , p = 1/2 , λ = 6 , and n = 200 , p = 0.01 , λ = 2 . ]
For the Binomial random variable we found

E[X] = np and Var(X) = np (1 − p) .
FACT : (The Method of Moments)

Let ψ(t) ≡ E[e^(tX)] . It follows that

ψ′(0) = E[X] , ψ′′(0) = E[X²] . ( Why ? )
APPLICATION : The Poisson mean and variance :

ψ(t) ≡ E[e^(tX)] = Σ_{k=0}^∞ e^(tk) P (X = k) = Σ_{k=0}^∞ e^(tk) e^(−λ) λ^k / k!

     = e^(−λ) Σ_{k=0}^∞ (λ e^t )^k / k! = e^(−λ) e^(λ e^t) = e^(λ (e^t − 1)) .

Here

ψ′(t) = λ e^t e^(λ (e^t − 1)) ,

ψ′′(t) = λ ( λ (e^t )² + e^t ) e^(λ (e^t − 1)) , ( Check ! )

so that

E[X] = ψ′(0) = λ , E[X²] = ψ′′(0) = λ² + λ ,

and

Var(X) = E[X²] − ( E[X] )² = λ .
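PYTHON : A symbolic check of the two derivatives at t = 0 (this sketch assumes the third-party sympy package) :

    import sympy as sp

    t, lam = sp.symbols("t lambda", positive=True)
    psi = sp.exp(lam * (sp.exp(t) - 1))              # the Poisson mgf derived above

    print(sp.diff(psi, t).subs(t, 0))                # lambda
    print(sp.expand(sp.diff(psi, t, 2).subs(t, 0)))  # lambda**2 + lambda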
EXAMPLE : Defects in a wire occur at the rate of one per 10 meters,

with a Poisson distribution :

P (X = k) = e^(−λ) · λ^k / k! , k = 0, 1, 2, · · · .

What is the probability that :

• Of five 12-meter rolls two have one defect and three have none?

ANSWER : For a 12-meter roll λ = 1.2 , so that

P (X = 0) = e^(−1.2) ≈ 0.3012 and P (X = 1) = 1.2 e^(−1.2) ≈ 0.3614 ,

and the answer is

C(5, 3) · 0.3012³ · 0.3614² ≈ 0.0357 . ( Why ? )
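PYTHON : The ANSWER computed directly :

    from math import comb, exp

    p0 = exp(-1.2)          # P(X = 0) for lambda = 1.2 : ~0.3012
    p1 = 1.2 * exp(-1.2)    # P(X = 1) : ~0.3614
    print(comb(5, 3) * p0**3 * p1**2)   # ~0.0357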
EXERCISE :
Defects in a certain wire occur at the rate of one per 10 meters.
Assume the defects have a Poisson distribution.
What is the probability that :
• a 20-meter wire has no defects?
EXERCISE :
Customers arrive at a counter at the rate of 8 per hour.
Assume the arrivals have a Poisson distribution.
What is the probability that :
• no customer arrives in 15 minutes?