18 - Expected Value
DISCRETE STRUCTURES II
DARRYL HILL
BASED ON THE TEXTBOOK:
DISCRETE STRUCTURES FOR COMPUTER SCIENCE: COUNTING,
RECURSION, AND PROBABILITY
BY MICHIEL SMID
S: Sample space: S = {a, b, c}
Outcome: element of S
Event: subset of S
Pr(x): x ∈ S → [0, 1]
Σ_{w∈S} Pr(w) = 1

Pr(a) = 7/10, Pr(b) = 2/10, Pr(c) = 1/10
X(a) = 1, X(b) = 2, X(c) = 3

Random Variable: function X: S → ℝ   ("neither random nor variable")
E(X) = expected value of X

First instinct: E(X) = (1 + 2 + 3)/3 = 2

But we choose a ∈ S 70% of the time, b 20% of the time, and c 10% of the time!
What is the average then if we consider the probabilities?
Recall: S = {a, b, c}, Pr(a) = 7/10, Pr(b) = 2/10, Pr(c) = 1/10,
and X(a) = 1, X(b) = 2, X(c) = 3.

Random Variable: function X: S → ℝ
E(X) = expected value of X - a "weighted average",
where each value of the random variable is given a weight proportional to its probability.
Random Variable: function X: S → ℝ   ("neither random nor variable")

The definition of expected value is:

E(X) = Σ_{w∈S} X(w) · Pr(w)

For the random variable X above:

E(X) = X(a)·Pr(a) + X(b)·Pr(b) + X(c)·Pr(c)
E(X) = 1 · 7/10 + 2 · 2/10 + 3 · 1/10 = 14/10

For a single fair die (X = the result of the roll):

E(X) = 1·Pr(1) + 2·Pr(2) + 3·Pr(3) + 4·Pr(4) + 5·Pr(5) + 6·Pr(6)
E(X) = 3.5 = 7/2

Now consider Y = 1/result. Note that

1/E(X) = 2/7 ≈ 0.286

Thus in general E(1/X) ≠ 1/E(X).
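As a sanity check, the definition can be evaluated directly in code. This Python sketch uses the probabilities and values from the slide; exact fractions avoid floating-point noise:

```python
from fractions import Fraction as F

# Sample space from the slide: Pr(a) = 7/10, Pr(b) = 2/10, Pr(c) = 1/10
pr = {"a": F(7, 10), "b": F(2, 10), "c": F(1, 10)}
x = {"a": 1, "b": 2, "c": 3}

# E(X) = sum over all outcomes w of X(w) * Pr(w)
e_x = sum(x[w] * pr[w] for w in pr)
print(e_x)  # 7/5, i.e. 14/10

# Single fair die: E(X) = 7/2, but E(1/X) != 1/E(X)
e_die = sum(k * F(1, 6) for k in range(1, 7))
e_recip = sum(F(1, k) * F(1, 6) for k in range(1, 7))
print(e_die, 1 / e_die, e_recip)  # 7/2 2/7 49/120
```

The last line makes the inequality concrete: E(1/X) = 49/120 ≈ 0.408, while 1/E(X) = 2/7 ≈ 0.286.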
E(X) = Σ_{w∈S} X(w) · Pr(w)

We will look at 3 ways to compute the Expected Value. We will go in order of difficulty.

Roll a fair red die and a fair blue die.
S = {(i, j) : 1 ≤ i ≤ 6, 1 ≤ j ≤ 6}
Uniform probability: Pr(i, j) = 1/36
X: S → ℝ: X = red + blue: X(i, j) = i + j

Table of X(i, j) = i + j (rows i = 1..6, columns j = 1..6):

2  3  4  5  6  7
3  4  5  6  7  8
4  5  6  7  8  9
5  6  7  8  9  10
6  7  8  9  10 11
7  8  9  10 11 12

E(X) = Σ_{(i,j)∈S} X(i, j) · Pr(i, j) = (1/36) · Σ_{(i,j)∈S} X(i, j)

The sum of all table entries is 252, so E(X) = 252/36 = 7.
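The first method is easy to mirror in code: enumerate all 36 outcomes and apply the definition directly. A Python sketch:

```python
from fractions import Fraction as F

# All 36 outcomes of the red and blue dice, each with probability 1/36
S = [(i, j) for i in range(1, 7) for j in range(1, 7)]

total = sum(i + j for (i, j) in S)               # sum of all table entries
e_x = sum((i + j) * F(1, 36) for (i, j) in S)    # E(X) by the definition

print(total, e_x)  # 252 7
```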
Goal is to get a different formula that is shorter and easier.

If we look at the table (which is really just the function X(i, j)), there are entries that occur multiple times. For instance, 4 occurs 3 times.

The event "X = 4" is {(3, 1), (2, 2), (1, 3)}.

We only look at the elements of the summation where X = 4. There are 3 of them, so the probability sums to 3/36:

Pr(X = 4) = |{(3, 1), (2, 2), (1, 3)}| / |S| = 3/36

Of course this is the definition of an event.
Roll a fair red die and a fair blue die.
S = {(i, j) : 1 ≤ i ≤ 6, 1 ≤ j ≤ 6}
Uniform probability: Pr(i, j) = 1/36
X(i, j) = i + j

We can rewrite our summation to sum over the values X can take:

E(X) = 2 · 1/36 + 3 · 2/36 + 4 · 3/36 + 5 · 4/36 + 6 · 5/36
     + 7 · 6/36 + 8 · 5/36 + 9 · 4/36 + 10 · 3/36 + 11 · 2/36
     + 12 · 1/36
     = 7
Still pretty painful. But using the former method, there are 36 entries to add up; this way there are only 11 terms.
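The grouped version is the one a program would naturally use: count how often each sum occurs, then weight each value k by Pr(X = k). A Python sketch:

```python
from collections import Counter
from fractions import Fraction as F

# How many of the 36 outcomes produce each sum k
counts = Counter(i + j for i in range(1, 7) for j in range(1, 7))

# E(X) = sum over k of k * Pr(X = k): 11 terms instead of 36
e_x = sum(k * F(n, 36) for k, n in counts.items())

print(counts[4], len(counts), e_x)  # 3 11 7
```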
Random Variable: X: S → ℝ
E(X) = Σ_{w∈S} X(w) · Pr(w)

Event: "X = k" = {w ∈ S : X(w) = k}, where k ∈ range of the function X.

Gather all elements w ∈ S for which X(w) = k. Then the above summation can be rewritten as:

E(X) = Σ_{∀k} Σ_{w : X(w) = k} X(w) · Pr(w)
     = Σ_{∀k} Σ_{w : X(w) = k} k · Pr(w)

Instead of looking at every element of S, we look at every Event defined by the range of X, that is, all values X can take. Sum over Events instead of Outcomes.

We are still summing over all elements of S, but we are dividing them into subsets based on the Events X = k.
The part in brackets is simply our definition of the Event "X = k". Thus we can rewrite it as:

E(X) = Σ_{∀k} k · Pr(X = k)

These Events are defined by "X = k", so every element in the Event is mapped to the same value k.

E(X) = Σ_{k ∈ range of X} k · Pr(X = k)

These expressions are the same (just a different order of summation).
Random Variable: X: S → ℝ
Event: "X = k" = {w ∈ S : X(w) = k}, where k ∈ range of the function X.

E(X) = Σ_{w∈S} X(w) · Pr(w)
E(X) = Σ_k k · Pr(X = k)

Instead of looking at every element of S, we look at every Event defined by the range of X, that is, all values X can take. Sum over Events instead of Outcomes.

These expressions are the same. All we have done to go from one to the other is change the order in which we summed over the elements w ∈ S.

Next we will look at the 3rd technique, Linearity of Expectation.
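The two expressions can be checked against each other on a non-uniform space. The example below is hypothetical: it reuses the {a, b, c} space but gives b the value 1, so that two outcomes fall into the same event and the grouping is non-trivial:

```python
from fractions import Fraction as F

pr = {"a": F(7, 10), "b": F(2, 10), "c": F(1, 10)}
x = {"a": 1, "b": 1, "c": 3}   # hypothetical values; a and b share X = 1

# Sum over outcomes w in S
by_outcome = sum(x[w] * pr[w] for w in pr)

# Sum over events "X = k": first accumulate Pr(X = k) for each value k
pr_of_k = {}
for w in pr:
    pr_of_k[x[w]] = pr_of_k.get(x[w], F(0)) + pr[w]
by_event = sum(k * p for k, p in pr_of_k.items())

print(by_outcome, by_event)  # both 6/5
```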
E(X) = Σ_{w∈S} X(w) · Pr(w)
E(X) = Σ_k k · Pr(X = k)

Linearity of Expectation:

Given two random variables X and Y,
E(X + Y) = E(X) + E(Y).

"The expected value of the sum is equal to the sum of the expected values."

We will show this follows directly from the first expression above. We introduce a third random variable Z = X + Y:

E(Z) = Σ_{w∈S} Z(w) · Pr(w)
     = Σ_{w∈S} [X(w) + Y(w)] · Pr(w)
     = Σ_{w∈S} [X(w)·Pr(w) + Y(w)·Pr(w)]
     = Σ_{w∈S} X(w)·Pr(w) + Σ_{w∈S} Y(w)·Pr(w)
     = E(X) + E(Y)
This will work for any number of random variables:

Z(w) = z_1(w) + z_2(w) + ⋯ + z_n(w)

Z(w) = Σ_{i=1}^{n} z_i(w)

E(Z) = Σ_{i=1}^{n} Σ_{w∈S} z_i(w) · Pr(w) = Σ_{i=1}^{n} E(z_i)
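Linearity for n variables can be verified by brute force. This sketch compares E(Z) for Z = the sum of three fair dice, computed over all 216 outcomes, against 3 · E(one die):

```python
from fractions import Fraction as F
from itertools import product

# Expected value of a single fair die
e_one = sum(k * F(1, 6) for k in range(1, 7))   # 7/2

# Brute force over all 6^3 = 216 outcomes of three dice
outcomes = list(product(range(1, 7), repeat=3))
e_z = sum(F(sum(w), len(outcomes)) for w in outcomes)

print(e_z, 3 * e_one)  # 21/2 21/2
```

The brute-force sum has 216 terms; linearity reduces it to three identical single-die expectations.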
Roll a fair red die and a fair blue die.
S = {(i, j) : 1 ≤ i ≤ 6, 1 ≤ j ≤ 6}
X: S → ℝ: X = red + blue: X(i, j) = i + j

E(red) = (1/6) · (1 + 2 + 3 + 4 + 5 + 6) = 7/2
E(blue) = (1/6) · (1 + 2 + 3 + 4 + 5 + 6) = 7/2

Using linearity of expectation: E(X) = E(red) + E(blue) = 7/2 + 7/2 = 7.
If everyone in the world does this, will there be more girls than boys in the world, or more boys than girls in the world?

This is an infinite probability space that we will apply Expected Value to. What is the sample space?

0 < p < 1: Experiment → success with probability p, failure with probability 1 − p.
Coin comes up H with probability p, T with probability 1 − p.

Flip coins until H; each coin flip is independent.
X = number of flips. What is E(X)?

S = {T^{k−1} H : k ≥ 1}
Pr(T^{k−1} H) = Pr(X = k) = (1 − p)^{k−1} · p

We have seen this before:

Σ_{k=1}^{∞} Pr(T^{k−1} H) = Σ_{k=1}^{∞} (1 − p)^{k−1} · p
= p · Σ_{k=1}^{∞} (1 − p)^{k−1}

Let i = k − 1:

= p · Σ_{i=0}^{∞} (1 − p)^{i}

Substitute 1 − p for x in Σ_{k=0}^{∞} x^k = 1/(1 − x):

= p · 1/(1 − (1 − p)) = p · 1/p = 1
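The sum-to-1 calculation can be spot-checked numerically. Here p = 0.3 is just an illustrative choice, and the infinite sum is truncated at k = 200, which leaves a negligible tail:

```python
p = 0.3  # illustrative value; any 0 < p < 1 works

# Truncated version of sum_{k>=1} (1-p)^(k-1) * p
total = sum((1 - p) ** (k - 1) * p for k in range(1, 201))

print(round(total, 9))  # 1.0 up to the truncated tail
```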
Recall: Σ_{k=0}^{∞} x^k = 1/(1 − x).

We will use the expression for E(X) where we iterate over the range of X. What is the range of X? k is the number of flips in the sequence. We have k ≥ 1 and k → ∞. Thus:

E(X) = Σ_{k=1}^{∞} k · Pr(X = k)

What is the Event X = k? It is when there are exactly k coin flips (ending in heads). Thus

"X = k" = {T^{k−1} H}

So Pr(X = k) = Pr(T^{k−1} H) = (1 − p)^{k−1} · p.
E(X) = Σ_{k=1}^{∞} k · Pr(X = k)
     = Σ_{k=1}^{∞} k · (1 − p)^{k−1} · p
     = p · Σ_{k=1}^{∞} k · (1 − p)^{k−1}
Without the p we have:

1 + 2 · (1 − p) + 3 · (1 − p)² + 4 · (1 − p)³ + ⋯

If the k were gone we would understand how to solve this.
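Even before finishing the algebra, the sum for E(X) can be evaluated numerically by truncating it; p = 0.3 is again just an illustrative choice:

```python
p = 0.3  # illustrative value

# Truncated version of E(X) = sum_{k>=1} k * (1-p)^(k-1) * p;
# terms beyond k = 500 are negligible for this p
e_x = sum(k * (1 - p) ** (k - 1) * p for k in range(1, 501))

print(round(e_x, 6))  # 3.333333
```

This numeric value gives a concrete target to check whatever closed form the derivation produces.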