
Discrete Random Variables: Expected Value

Class 4, 18.05
Jeremy Orloff and Jonathan Bloom

1 Learning Goals

1. Know how to compute expected value (mean) of a discrete random variable.

2. Know the expected value of Bernoulli, binomial and geometric random variables.

2 Expected Value

In the R reading questions for this lecture, you simulated the average value of rolling a die
many times. You should have gotten a value close to the exact answer of 3.5. To motivate
the formal definition of the average, or expected value, we first consider some examples.
Example 1. Suppose we have a six-sided die marked with five 3’s and one 6. (This was
the red one from our non-transitive dice.) What would you expect the average of 6000 rolls
to be?
Solution: If we knew the value of each roll, we could compute the average by summing
the 6000 values and dividing by 6000. Without knowing the values, we can compute the
expected average as follows.
Since there are five 3’s and one 6, we expect that roughly 5/6 of the rolls will give 3 and 1/6 will
give 6. Assuming this to be exactly true, we have the following table of values and counts:
value: 3 6
expected counts: 5000 1000
The average of these 6000 values is then
(5000 ⋅ 3 + 1000 ⋅ 6)/6000 = (5/6) ⋅ 3 + (1/6) ⋅ 6 = 3.5
We consider this the expected average in the sense that we ‘expect’ each of the possible
values to occur with the given frequencies.
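
For instance, here is a minimal R sketch of that simulation; with 6000 simulated rolls the sample average should come out close to 3.5.

# Simulate 6000 rolls of a die with five 3's and one 6, then average them.
rolls = sample(c(3, 3, 3, 3, 3, 6), 6000, replace = TRUE)
mean(rolls)  # typically close to 3.5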

Example 2. We roll two standard 6-sided dice. You win $1000 if the sum is 2 and lose
$100 otherwise. How much do you expect to win on average per trial?
Solution: The probability of a 2 is 1/36. If you play 𝑁 times, you can ‘expect’ (1/36) ⋅ 𝑁
of the trials to give a 2 and (35/36) ⋅ 𝑁 of the trials to give something else. Thus your total
expected winnings are

1000 ⋅ 𝑁/36 − 100 ⋅ 35𝑁/36.

To get the expected average per trial we divide the total by 𝑁:

expected average = 1000 ⋅ (1/36) − 100 ⋅ (35/36) = −69.44.


Think: Would you be willing to play this game one time? Multiple times?
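
One way to build intuition is to simulate the game. Here is a short R sketch (using 100000 simulated plays, an arbitrary choice); the per-trial average should come out near −69.44.

# Win 1000 if two dice sum to 2, lose 100 otherwise.
N = 100000
sums = sample(1:6, N, replace = TRUE) + sample(1:6, N, replace = TRUE)
winnings = ifelse(sums == 2, 1000, -100)
mean(winnings)  # typically close to -69.44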

Notice that in both examples the sum for the expected average consists of terms, each of
which is a value of the random variable times its probability. This leads to the following definition.
Definition: Suppose 𝑋 is a discrete random variable that takes values 𝑥1 , 𝑥2 , …, 𝑥𝑛 with
probabilities 𝑝(𝑥1 ), 𝑝(𝑥2 ), …, 𝑝(𝑥𝑛 ). The expected value of 𝑋 is denoted 𝐸[𝑋] and defined
by
𝐸[𝑋] = ∑_{𝑗=1}^{𝑛} 𝑝(𝑥𝑗) 𝑥𝑗 = 𝑝(𝑥1)𝑥1 + 𝑝(𝑥2)𝑥2 + … + 𝑝(𝑥𝑛)𝑥𝑛.

Notes:

1. The expected value is also called the mean or average of 𝑋 and often denoted by 𝜇
(“mu”).

2. As seen in the above examples, the expected value need not be a possible value of the
random variable. Rather it is a weighted average of the possible values.

3. Expected value is a summary statistic, providing a measure of the location or central


tendency of a random variable.

4. If all the values are equally probable then the expected value is just the usual average of
the values.

Example 3. Find 𝐸[𝑋] for the random variable X with table:


values of 𝑋: 1 3 5
pmf: 1/6 1/6 2/3
Solution: 𝐸[𝑋] = (1/6) ⋅ 1 + (1/6) ⋅ 3 + (2/3) ⋅ 5 = 24/6 = 4
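
In R this is a one-line weighted sum, in the same spirit as the code used later in Example 13.

# E[X]: probability-weighted sum of the values.
x = c(1, 3, 5)
p = c(1/6, 1/6, 2/3)
sum(p * x)  # 4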
Example 4. Let 𝑋 be a Bernoulli(𝑝) random variable. Find 𝐸[𝑋].
Solution: 𝑋 takes values 1 and 0 with probabilities 𝑝 and 1 − 𝑝, so

𝐸[𝑋] = 𝑝 ⋅ 1 + (1 − 𝑝) ⋅ 0 = 𝑝.

Important: This is an important example. Be sure to remember that the expected value of
a Bernoulli(𝑝) random variable is 𝑝.
Think: What is the expected value of the sum of two dice?

2.1 Mean and center of mass

You may have wondered why we use the name ‘probability mass function’. Here’s one
reason: if we place an object of mass 𝑝(𝑥𝑗 ) at position 𝑥𝑗 for each 𝑗, then 𝐸[𝑋] is the
position of the center of mass. Let’s recall the latter notion via an example.
Example 5. Suppose we have two masses along the 𝑥-axis, mass 𝑚1 = 500 at position
𝑥1 = 3 and mass 𝑚2 = 100 at position 𝑥2 = 6. Where is the center of mass?

Solution: Intuitively we know that the center of mass is closer to the larger mass.
[Diagram: mass 𝑚1 at position 3 and mass 𝑚2 at position 6 on the 𝑥-axis]

From physics we know the center of mass is


𝑥 = (𝑚1 𝑥1 + 𝑚2 𝑥2)/(𝑚1 + 𝑚2) = (500 ⋅ 3 + 100 ⋅ 6)/600 = 3.5.
We call this formula a ‘weighted’ average of 𝑥1 and 𝑥2 . Here 𝑥1 is weighted more heavily
because it has more mass.
Now look at the definition of expected value 𝐸[𝑋]. It is a weighted average of the values of
𝑋 with the weights being probabilities 𝑝(𝑥𝑖 ) rather than masses! We might say that “The
expected value is the point at which the distribution would balance”. Note the similarity
between the physics example and Example 1.

2.2 Algebraic properties of 𝐸[𝑋]

When we add, scale or shift random variables the expected values do the same. The
shorthand mathematical way of saying this is that 𝐸[𝑋] is linear.
1. If 𝑋 and 𝑌 are random variables on a sample space Ω then

𝐸[𝑋 + 𝑌 ] = 𝐸[𝑋] + 𝐸[𝑌 ]

2. If 𝑎 and 𝑏 are constants then

𝐸[𝑎𝑋 + 𝑏] = 𝑎𝐸[𝑋] + 𝑏.

We will think of 𝑎𝑋 + 𝑏 as scaling 𝑋 by 𝑎 and shifting it by 𝑏.

Before proving these properties, let’s see them in action with a few examples.

Example 6. Roll two dice and let 𝑋 be the sum. Find 𝐸[𝑋].
Solution: Let 𝑋1 be the value on the first die and let 𝑋2 be the value on the second
die. Since 𝑋 = 𝑋1 + 𝑋2 we have 𝐸[𝑋] = 𝐸[𝑋1 ] + 𝐸[𝑋2 ]. Earlier we computed that
𝐸[𝑋1 ] = 𝐸[𝑋2 ] = 3.5, therefore 𝐸[𝑋] = 7.
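
As a check, a short R sketch can compute 𝐸[𝑋] directly from the 36 equally likely outcomes instead of using linearity.

# Sum of two dice: all 36 outcomes, each with probability 1/36.
sums = outer(1:6, 1:6, "+")
sum(sums * (1/36))  # 7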

Example 7. Let 𝑋 ∼ binomial(𝑛, 𝑝). Find 𝐸[𝑋].


Solution: Recall that 𝑋 models the number of successes in 𝑛 Bernoulli(𝑝) random variables,
which we’ll call 𝑋1 , … 𝑋𝑛 . The key fact, which we highlighted in the previous reading for
this class, is that
𝑋 = 𝑋1 + 𝑋2 + … + 𝑋𝑛 = ∑_{𝑗=1}^{𝑛} 𝑋𝑗.

Now we can use the Algebraic Property (1) to make the calculation simple.
𝑛
𝑋 = ∑ 𝑋𝑗 ⇒ 𝐸[𝑋] = ∑ 𝐸[𝑋𝑗 ] = ∑ 𝑝 = 𝑛𝑝 .
𝑗=1 𝑗 𝑗

We could have computed 𝐸[𝑋] directly as


𝐸[𝑋] = ∑_{𝑘=0}^{𝑛} 𝑘 𝑝(𝑘) = ∑_{𝑘=0}^{𝑛} 𝑘 (𝑛 choose 𝑘) 𝑝^𝑘 (1 − 𝑝)^(𝑛−𝑘).

It is possible to show that the sum of this series is indeed 𝑛𝑝. We think you’ll agree that
the method using Property (1) is much easier.
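
For a numerical check of the direct sum, here is a small R sketch using dbinom for the binomial pmf, with the illustrative choice 𝑛 = 10, 𝑝 = 0.3.

# Sum of k * P(X = k) for X ~ binomial(n, p); should equal n*p.
n = 10; p = 0.3
k = 0:n
sum(k * dbinom(k, n, p))  # 3, which is n*p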

Example 8. (For infinite random variables the mean does not always exist.) Suppose 𝑋
has an infinite number of values according to the following table.
values 𝑥: 2 2^2 2^3 … 2^𝑘 …
pmf 𝑝(𝑥): 1/2 1/2^2 1/2^3 … 1/2^𝑘 …
Try to compute the mean.
Solution: The mean is
𝐸[𝑋] = ∑_{𝑘=1}^{∞} 2^𝑘 ⋅ (1/2^𝑘) = ∑_{𝑘=1}^{∞} 1 = ∞.

The mean does not exist! This can happen with infinite series.
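
A brief R sketch makes the divergence concrete: every term of the series is 1, so the partial sum of the first 𝐾 terms is exactly 𝐾 and grows without bound.

# Partial sums of the series for E[X]: each term 2^k * (1/2^k) is 1.
K = 50
partial = cumsum(2^(1:K) * (1/2^(1:K)))
partial[K]  # 50; the partial sums keep growing, so the series diverges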
Example 9. Mean of a geometric distribution
Let 𝑋 ∼ geo(𝑝). Recall this means 𝑋 takes values 𝑘 = 0, 1, 2, … with probabilities 𝑝(𝑘) =
(1 − 𝑝)^𝑘 𝑝. (𝑋 models the number of tails before the first heads in a sequence of Bernoulli
trials.) The mean is given by

𝐸[𝑋] = (1 − 𝑝)/𝑝.
To see this requires a clever trick. Mathematicians love this sort of thing and we hope you
are able to follow the logic and enjoy it. In this class we will not ask you to come up with
something like this on an exam.
Here’s the trick: to compute 𝐸[𝑋] we have to sum the infinite series

𝐸[𝑋] = ∑_{𝑘=0}^{∞} 𝑘(1 − 𝑝)^𝑘 𝑝.


Now, we know the sum of the geometric series: ∑_{𝑘=0}^{∞} 𝑥^𝑘 = 1/(1 − 𝑥).

Differentiate both sides: ∑_{𝑘=0}^{∞} 𝑘𝑥^(𝑘−1) = 1/(1 − 𝑥)².

Multiply by 𝑥: ∑_{𝑘=0}^{∞} 𝑘𝑥^𝑘 = 𝑥/(1 − 𝑥)².

Replace 𝑥 by 1 − 𝑝: ∑_{𝑘=0}^{∞} 𝑘(1 − 𝑝)^𝑘 = (1 − 𝑝)/𝑝².

Multiply by 𝑝: ∑_{𝑘=0}^{∞} 𝑘(1 − 𝑝)^𝑘 𝑝 = (1 − 𝑝)/𝑝.
This last expression is the mean.
𝐸[𝑋] = (1 − 𝑝)/𝑝.
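
As a numerical check, note that R’s rgeom follows the same convention as geo(𝑝), counting failures before the first success, so simulated means should be close to (1 − 𝑝)/𝑝. Here is a quick sketch with the illustrative choice 𝑝 = 0.25.

# Simulate geometric(p) draws (failures before first success).
p = 0.25
x = rgeom(100000, p)
mean(x)      # typically close to 3
(1 - p)/p    # exactly 3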

Example 10. Flip a fair coin until you get heads for the first time. What is the expected
number of times you flipped tails?
Solution: The number of tails before the first head is modeled by 𝑋 ∼ geo(1/2). From the
previous example, 𝐸[𝑋] = (1/2)/(1/2) = 1. This is a surprisingly small number.

Example 11. Michael Jordan, perhaps the greatest basketball player ever, made 80% of
his free throws. In a game, what is the expected number he would make before his first miss?
Solution: Here is an example where we want the number of successes before the first
failure. Using the neutral language of heads and tails: success is tails (probability 1 − 𝑝)
and failure is heads (probability 𝑝). Therefore 𝑝 = 0.2 and the number of tails (made
free throws) before the first heads (missed free throw) is modeled by 𝑋 ∼ geo(0.2). We
saw in Example 9 that this is
𝐸[𝑋] = (1 − 𝑝)/𝑝 = 0.8/0.2 = 4.

2.3 Expected values of functions of a random variable

(The change of variables formula.)


If 𝑋 is a discrete random variable taking values 𝑥1 , 𝑥2 , … and ℎ is a function, then ℎ(𝑋) is
a new random variable. Its expected value is

𝐸[ℎ(𝑋)] = ∑_𝑗 ℎ(𝑥𝑗) 𝑝(𝑥𝑗).

We illustrate this with several examples.


Example 12. Let 𝑋 be the value of a roll of one die and let 𝑌 = 𝑋². Find 𝐸[𝑌 ].
Solution: Since there are a small number of values we can make a table.
𝑋 1 2 3 4 5 6
𝑌 1 4 9 16 25 36
prob 1/6 1/6 1/6 1/6 1/6 1/6
Notice the probability for each 𝑌 value is the same as that of the corresponding 𝑋 value.
So,
𝐸[𝑌 ] = 𝐸[𝑋²] = 1² ⋅ (1/6) + 2² ⋅ (1/6) + … + 6² ⋅ (1/6) = 15.167.
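
The same weighted sum in R (a quick sketch):

# E[X^2] for one die: weight each squared value by 1/6.
x = 1:6
sum(x^2 * (1/6))  # 15.16667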
Example 13. Roll two dice and let 𝑋 be the sum. Suppose the payoff function is given by
𝑌 = 𝑋² − 6𝑋 + 1. Is this a good bet?

Solution: We have 𝐸[𝑌 ] = ∑_{𝑗=2}^{12} (𝑗² − 6𝑗 + 1)𝑝(𝑗), where 𝑝(𝑗) = 𝑃 (𝑋 = 𝑗).

We show the table, but really we’ll use R to do the calculation.


𝑋 2 3 4 5 6 7 8 9 10 11 12
𝑌 -7 -8 -7 -4 1 8 17 28 41 56 73
prob 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
Here’s the R code I used to compute 𝐸[𝑌 ] = 13.833.

x = 2:12                                    # possible values of the sum X
y = x^2 - 6*x + 1                           # payoff Y for each value of X
p = c(1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1)/36   # P(X = j) for j = 2, ..., 12
ave = sum(p*y)                              # E[Y]
It gave 𝐸[𝑌 ] = 13.833.
To answer the question above: since the expected payoff is positive it looks like a bet worth
taking.

Quiz: If 𝑌 = ℎ(𝑋) does 𝐸[𝑌 ] = ℎ(𝐸[𝑋])? Solution: NO!!! This is not true in general!
Think: Is it true in the previous example?
Quiz: If 𝑌 = 3𝑋 + 77 does 𝐸[𝑌 ] = 3𝐸[𝑋] + 77?
Solution: Yes. By property (2), scaling and shifting does behave like this.
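
Here is a quick R check of both quiz answers using the single die from Example 12: 𝐸[𝑋²] is not (𝐸[𝑋])², but 𝐸[3𝑋 + 77] really is 3𝐸[𝑋] + 77.

# One fair die: compare E[h(X)] with h(E[X]) for two choices of h.
x = 1:6
p = rep(1/6, 6)
sum(p * x^2)          # 15.167 = E[X^2]
sum(p * x)^2          # 12.25 = (E[X])^2, not the same
sum(p * (3*x + 77))   # 87.5
3*sum(p * x) + 77     # 87.5, linearity holds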

2.4 Proofs of the algebraic properties of 𝐸[𝑋]

We finish by proving the algebraic properties of 𝐸[𝑋].


1. For random variables 𝑋 and 𝑌 on a sample space Ω: 𝐸[𝑋 + 𝑌 ] = 𝐸[𝑋] + 𝐸[𝑌 ].
2. For constants 𝑎, 𝑏 and random variable 𝑋: 𝐸[𝑎𝑋 + 𝑏] = 𝑎𝐸[𝑋] + 𝑏.
The proof of Property (1) is simple, but there is some subtlety in even understanding what
it means to add two random variables. Recall that the value of random variable is a number
determined by the outcome of an experiment. To add 𝑋 and 𝑌 means to add the values of
𝑋 and 𝑌 for the same outcome. In table form this looks like:
outcome 𝜔: 𝜔1 𝜔2 𝜔3 … 𝜔𝑛
value of 𝑋: 𝑥1 𝑥2 𝑥3 … 𝑥𝑛
value of 𝑌 : 𝑦1 𝑦2 𝑦3 … 𝑦𝑛
value of 𝑋 + 𝑌 : 𝑥1 + 𝑦1 𝑥2 + 𝑦2 𝑥3 + 𝑦3 … 𝑥𝑛 + 𝑦𝑛
prob. 𝑃 (𝜔): 𝑃 (𝜔1 ) 𝑃 (𝜔2 ) 𝑃 (𝜔3 ) … 𝑃 (𝜔𝑛 )
The proof of (1) follows immediately:

𝐸[𝑋 + 𝑌 ] = ∑(𝑥𝑖 + 𝑦𝑖 )𝑃 (𝜔𝑖 ) = ∑ 𝑥𝑖 𝑃 (𝜔𝑖 ) + ∑ 𝑦𝑖 𝑃 (𝜔𝑖 ) = 𝐸[𝑋] + 𝐸[𝑌 ].

The proof of Property (2) only takes one line.

𝐸[𝑎𝑋 + 𝑏] = ∑ 𝑝(𝑥𝑖 )(𝑎𝑥𝑖 + 𝑏) = 𝑎 ∑ 𝑝(𝑥𝑖 )𝑥𝑖 + 𝑏 ∑ 𝑝(𝑥𝑖 ) = 𝑎𝐸[𝑋] + 𝑏.

The 𝑏 term in the last expression follows because ∑ 𝑝(𝑥𝑖 ) = 1.


MIT OpenCourseWare
https://ocw.mit.edu

18.05 Introduction to Probability and Statistics


Spring 2022

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.
