0% found this document useful (0 votes)

326 views

Jupyter Notebook Viewer

The document is a summary of a Jupyter notebook that introduces probability theory concepts using Python implementations. It discusses Pierre-Simon Laplace's classical definition of probability, defines key probability terms like experiment, outcome, sample space, and event. It then provides code to calculate probability and works through examples like the probability of rolling an even number on a die and problems involving drawing balls from an urn containing different numbers of colored balls.

Uploaded by

Thomas Amleth Bottini

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

326 views

Jupyter Notebook Viewer

Uploaded by

Thomas Amleth Bottini

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 32

Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

Peter Norvig, 12 Feb 2016

A Concrete Introduction to Probability (using

Python)
This notebook covers the basics of probability theory, with Python 3 implementations. (You
should have some background in probability (http://www.dartmouth.edu/~chance/teaching_aids
/books_articles/probability_book/pdf.html) and Python (https://www.python.org/about
/gettingstarted/).)

In 1814, Pierre-Simon Laplace wrote (https://en.wikipedia.org

/wiki/Classical_definition_of_probability):

Probability ... is thus simply a fraction whose numerator is the number of

favorable cases and whose denominator is the number of all the cases possible
... when nothing leads us to expect that any one of these cases should occur
more than any other.

Pierre-Simon Laplace (https://en.wikipedia.org/wiki/Pierre-Simon_Laplace)

1814

Laplace really nailed it, way back then! If you want to untangle a probability problem, all you
have to do is be methodical about defining exactly what the cases are, and then careful in
counting the number of favorable and total cases. We'll start being methodical by defining some
vocabulary:

Experiment (https://en.wikipedia.org/wiki/Experiment_(probability_theory%29): An
occurrence with an uncertain outcome that we can observe.
For example, rolling a die.
Outcome (https://en.wikipedia.org/wiki/Outcome_(probability%29): The result of an
experiment; one particular state of the world. What Laplace calls a "case."
For example: 4 .
Sample Space (https://en.wikipedia.org/wiki/Sample_space): The set of all possible
outcomes for the experiment.
For example, {1, 2, 3, 4, 5, 6} .
Event (https://en.wikipedia.org/wiki/Event_(probability_theory%29): A subset of
possible outcomes that together have some property we are interested in.
For example, the event "even die roll" is the set of outcomes {2, 4, 6} .
Probability (https://en.wikipedia.org/wiki/Probability_theory): As Laplace said, the

1 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

Code for P
P is the traditional name for the Probability function:

In [1]: from fractions import Fraction

def P(event, space):

"The probability of an event, given a sample space of equiprobable outcomes."
return Fraction(len(event & space),
len(space))

Read this as implementing Laplace's quote directly: "Probability is thus simply a fraction whose
numerator is the number of favorable cases and whose denominator is the number of all the
cases possible."

Warm-up Problem: Die Roll

What's the probability of rolling an even number with a single six-sided fair die?

We can define the sample space D and the event even , and compute the probability:

In [2]: D = {1, 2, 3, 4, 5, 6}
even = { 2, 4, 6}

P(even, D)

Out[2]: Fraction(1, 2)

It is good to confirm what we already knew.

You may ask: Why does the definition of P use len(event & space) rather than
len(event) ? Because I don't want to count outcomes that were specified in event but
aren't actually in the sample space. Consider:

In [3]: even = {2, 4, 6, 8, 10, 12}

P(even, D)

Out[3]: Fraction(1, 2)

Here, len(event) and len(space) are both 6, so if just divided, then P would be 1,
which is not right. The favorable cases are the intersection of the event and the space, which in
Python is (event & space) . Also note that I use Fraction rather than regular division
because I want exact answers like 1/3, not 0.3333333333333333.

2 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

Urn Problems
Around 1700, Jacob Bernoulli wrote about removing colored balls from an urn in his landmark
treatise Ars Conjectandi (https://en.wikipedia.org/wiki/Ars_Conjectandi), and ever since then,
explanations of probability have relied on urn problems (https://www.google.com
/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=probability%20ball%20urn).
(You'd think the urns would be empty by now.)

Jacob Bernoulli (https://en.wikipedia.org/wiki/Jacob_Bernoulli)

1700

For example, here is a three-part problem adapted (http://mathforum.org/library/drmath

/view/69151.html) from mathforum.org:

An urn contains 23 balls: 8 white, 6 blue, and 9 red. We select six balls at
random (each possible selection is equally likely). What is the probability of each
of these possible outcomes:

1. all balls are red

2. 3 are blue, 2 are white, and 1 is red
3. exactly 4 balls are white

So, an outcome is a set of 6 balls, and the sample space is the set of all possible 6 ball
combinations. We'll solve each of the 3 parts using our P function, and also using basic
arithmetic; that is, counting. Counting is a bit tricky because:

We have multiple balls of the same color.

An outcome is a set of balls, where order doesn't matter, not a sequence, where order
matters.

To account for the first issue, I'll have 8 diﬀerent white balls labelled 'W1' through 'W8' ,
rather than having eight balls all labelled 'W' . That makes it clear that selecting 'W1' is
diﬀerent from selecting 'W2' .

The second issue is handled automatically by the P function, but if I want to do calculations by
hand, I will sometimes first count the number of permutations of balls, then get the number of
combinations by dividing the number of permutations by c!, where c is the number of balls in a
combination. For example, if I want to choose 2 white balls from the 8 available, there are 8
ways to choose a first white ball and 7 ways to choose a second, and therefore 8 × 7 = 56
permutations of two white balls. But there are only 56 / 2 = 28 combinations, because (W1,
W2) is the same combination as (W2, W1) .

We'll start by defining the contents of the urn:

3 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

In [4]: def cross(A, B):

"The set of ways of concatenating one item from collection A with one from B."
return {a + b
for a in A for b in B}

urn = cross('W', '12345678') | cross('B', '123456') | cross('R', '123456789'

urn

Out[4]: {'B1',
'B2',
'B3',
'B4',
'B5',
'B6',
'R1',
'R2',
'R3',
'R4',
'R5',
'R6',
'R7',
'R8',
'R9',
'W1',
'W2',
'W3',
'W4',
'W5',
'W6',
'W7',
'W8'}

In [5]: len(urn)

Out[5]: 23

Now we can define the sample space, U6 , as the set of all 6-ball combinations. We use
itertools.combinations to generate the combinations, and then join each combination
into a string:

In [6]: import itertools

def combos(items, n):

"All combinations of n items; each combo as a concatenated str."
return {' '.join(combo)
for combo in itertools.combinations(items, n)}

U6 = combos(urn, 6)

len(U6)

Out[6]: 100947

I don't want to print all 100,947 members of the sample space; let's just peek at a random
sample of them:

4 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

In [7]: import random

random.sample(U6, 10)

Out[7]: ['R7 W3 B1 R4 B3 W2',

'R7 R3 B1 W4 R5 W6',
'B5 B4 B6 W1 R3 R2',
'W3 B1 R4 W8 R9 W5',
'W7 B6 W1 R3 R9 B3',
'W3 W1 R8 R2 R1 W5',
'B5 R7 R4 W8 R2 W6',
'W7 W1 R3 W4 R5 B3',
'B4 R7 B2 B1 W6 W2',
'W7 B1 W4 W8 W6 W2']

Is 100,947 really the right number of ways of choosing 6 out of 23 items, or "23 choose 6", as
mathematicians call it (https://en.wikipedia.org/wiki/Combination)? Well, we can choose any of
23 for the first item, any of 22 for the second, and so on down to 18 for the sixth. But we don't
care about the ordering of the six items, so we divide the product by 6! (the number of
permutations of 6 things) giving us:

23 ⋅ 22 ⋅ 21 ⋅ 20 ⋅ 19 ⋅ 18
23 choose 6 = = 100947
6!
Note that 23 ⋅ 22 ⋅ 21 ⋅ 20 ⋅ 19 ⋅ 18 = 23! / 17!, so, generalizing, we can write:

n!
n choose c =
(n − c)! ⋅ c!
And we can translate that to code and verify that 23 choose 6 is 100,947:

In [8]: from math import factorial

def choose(n, c):

"Number of ways to choose c items from a list of n items."
return factorial(n) // (factorial(n - c) * factorial(c))

In [9]: choose(23, 6)

Out[9]: 100947

Now we're ready to answer the 4 problems:

Urn Problem 1: what's the probability of selecting 6 red balls?

In [10]: red6 = {s for s in U6 if s.count('R') == 6}

P(red6, U6)

Out[10]: Fraction(4, 4807)

Let's investigate a bit more. How many ways of getting 6 red balls are there?

In [11]: len(red6)

Out[11]: 84

5 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

Why are there 84 ways? Because there are 9 red balls in the urn, and we are asking how many
ways we can choose 6 of them:

In [12]: choose(9, 6)

Out[12]: 84

So the probabilty of 6 red balls is then just 9 choose 6 divided by the size of the sample space:

In [13]: P(red6, U6) == Fraction(choose(9, 6),

len(U6))

Out[13]: True

Urn Problem 2: what is the probability of 3 blue, 2 white, and 1

red?

In [14]: b3w2r1 = {s for s in U6 if

s.count('B') == 3 and s.count('W') == 2 and s.count('R') == 1}

P(b3w2r1, U6)

Out[14]: Fraction(240, 4807)

We can get the same answer by counting how many ways we can choose 3 out of 6 blues, 2 out
of 8 whites, and 1 out of 9 reds, and dividing by the number of possible selections:

In [15]: P(b3w2r1, U6) == Fraction(choose(6, 3) * choose(8, 2) * choose(9, 1),

len(U6))

Out[15]: True

Here we don't need to divide by any factorials, because choose has already accounted for
that.

We can get the same answer by figuring: "there are 6 ways to pick the first blue, 5 ways to pick
the second blue, and 4 ways to pick the third; then 8 ways to pick the first white and 7 to pick
the second; then 9 ways to pick a red. But the order 'B1, B2, B3' should count as the
same as 'B2, B3, B1' and all the other orderings; so divide by 3! to account for the
permutations of blues, by 2! to account for the permutations of whites, and by 100947 to get a
probability:

In [16]: P(b3w2r1, U6) == Fraction((6 * 5 * 4) * (8 * 7) * 9,

factorial(3) * factorial(2) * len(U6))

Out[16]: True

Urn Problem 3: What is the probability of exactly 4 white balls?

We can interpret this as choosing 4 out of the 8 white balls, and 2 out of the 15 non-white balls.
Then we can solve it the same three ways:

6 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

In [17]: w4 = {s for s in U6 if
s.count('W') == 4}

P(w4, U6)

Out[17]: Fraction(350, 4807)

In [18]: P(w4, U6) == Fraction(choose(8, 4) * choose(15, 2),

len(U6))

Out[18]: True

In [19]: P(w4, U6) == Fraction((8 * 7 * 6 * 5) * (15 * 14),

factorial(4) * factorial(2) * len(U6))

Out[19]: True

Revised Version of P , with more general

events
To calculate the probability of an even die roll, I originally said

even = {2, 4, 6}

But that's inelegant—I had to explicitly enumerate all the even numbers from one to six. If I ever
wanted to deal with a twelve or twenty-sided die, I would have to go back and change even . I
would prefer to define even once and for all like this:

In [20]: def even(n): return n % 2 == 0

Now in order to make P(even, D) work, I'll have to modify P to accept an event as either a
set of outcomes (as before), or a predicate over outcomes—a function that returns true for an
outcome that is in the event:

In [21]: def P(event, space):

"""The probability of an event, given a sample space of equiprobable outcomes.
event can be either a set of outcomes, or a predicate (true for outcomes in the ev
if is_predicate(event):
event = such_that(event, space)
return Fraction(len(event & space), len(space))

is_predicate = callable

def such_that(predicate, collection):

"The subset of elements in the collection for which the predicate is true."
return {e for e in collection if predicate(e)}

Here we see how such_that , the new even predicate, and the new P work:

In [22]: such_that(even, D)

Out[22]: {2, 4, 6}

7 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

In [23]: P(even, D)

Out[23]: Fraction(1, 2)

In [24]: D12 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}

such_that(even, D12)

Out[24]: {2, 4, 6, 8, 10, 12}

In [25]: P(even, D12)

Out[25]: Fraction(1, 2)

Note: such_that is just like the built-in function filter , except such_that returns a set.

We can now define more interesting events using predicates; for example we can determine the
probability that the sum of a three-dice roll is prime (using a definition of is_prime that is
eﬃcient enough for small n ):

In [26]: D3 = {(d1, d2, d3) for d1 in D for d2 in D for d3 in D}

def prime_sum(outcome): return is_prime(sum(outcome))

def is_prime(n): return n > 1 and not any(n % i == 0 for i in range(2, n))

P(prime_sum, D3)

Out[26]: Fraction(73, 216)

Card Problems
Consider dealing a hand of five playing cards. We can define deck as a set of 52 cards, and
Hands as the sample space of all combinations of 5 cards:

In [27]: suits = 'SHDC'

ranks = 'A23456789TJQK'
deck = cross(ranks, suits)
len(deck)

Out[27]: 52

In [28]: Hands = combos(deck, 5)

assert len(Hands) == choose(52, 5)

random.sample(Hands, 5)

Out[28]: ['AH 6D 5D TS 4H',

'JC AD AH 7S QC',
'6C 7S 3H 9C KH',
'6D 5C QH TH QS',
'6C 3D 5D KH 5S']

Now we can answer questions like the probability of being dealt a flush (5 cards of the same
suit):

8 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

In [29]: def flush(hand):

return any(hand.count(suit) == 5 for suit in suits)

P(flush, Hands)

Out[29]: Fraction(33, 16660)

Or the probability of four of a kind:

In [30]: def four_kind(hand):

return any(hand.count(rank) == 4 for rank in ranks)

P(four_kind, Hands)

Out[30]: Fraction(1, 4165)

9 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

Fermat and Pascal: Gambling, Triangles, and the

Birth of Probability

Pierre de Fermat (https://en.wikipedia.org

/wiki/Pierre_de_Fermat)
1654

Blaise Pascal] (https://en.wikipedia.org

/wiki/Blaise_Pascal)
1654

Consider a gambling game consisting of tossing a coin. Player H wins the game if 10 heads
come up, and T wins if 10 tails come up. If the game is interrupted when H has 8 heads and T
has 7 tails, how should the pot of money (which happens to be 100 Francs) be split? In 1654,
Blaise Pascal and Pierre de Fermat corresponded on this problem, with Fermat writing
(http://mathforum.org/isaac/problems/prob1.html):

Dearest Blaise,

As to the problem of how to divide the 100 Francs, I think I have found a
solution that you will find to be fair. Seeing as I needed only two points to win
the game, and you needed 3, I think we can establish that after four more tosses
of the coin, the game would have been over. For, in those four tosses, if you did
not get the necessary 3 points for your victory, this would imply that I had in fact
gained the necessary 2 points for my victory. In a similar manner, if I had not
achieved the necessary 2 points for my victory, this would imply that you had in
fact achieved at least 3 points and had therefore won the game. Thus, I believe
the following list of possible endings to the game is exhaustive. I have denoted
'heads' by an 'h', and tails by a 't.' I have starred the outcomes that indicate a
win for myself.

h h h h * h h h t * h h t h * h h t t *
h t h h * h t h t * h t t h * h t t t
t h h h * t h h t * t h t h * t h t t
t t h h * t t h t t t t h t t t t

I think you will agree that all of these outcomes are equally likely. Thus I believe
that we should divide the stakes by the ration 11:5 in my favor, that is, I should
receive (11/16)*100 = 68.75 Francs, while you should receive 31.25 Francs.

I hope all is well in Paris,

Your friend and colleague,

Pierre

Pascal agreed with this solution, and replied (http://mathforum.org/isaac/problems/prob2.html)

with a generalization that made use of his previous invention, Pascal's Triangle. There's even a
book (https://smile.amazon.com/Unfinished-Game-Pascal-Fermat-Seventeenth-Century
/dp/0465018963?sa-no-redirect=1) about it.

10 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

In [31]: def win_unfinished_game(Hneeds, Tneeds):

"The probability that H will win the unfinished game, given the number of points n
def Hwins(outcome): return outcome.count('h') >= Hneeds
return P(Hwins, continuations(Hneeds, Tneeds))

def continuations(Hneeds, Tneeds):

"All continuations of a game where H needs `Hneeds` points to win and T needs `Tne
rounds = ['ht' for _ in range(Hneeds + Tneeds - 1)]
return set(itertools.product(*rounds))

In [32]: continuations(2, 3)

Out[32]: {('h', 'h', 'h', 'h'),

('h', 'h', 'h', 't'),
('h', 'h', 't', 'h'),
('h', 'h', 't', 't'),
('h', 't', 'h', 'h'),
('h', 't', 'h', 't'),
('h', 't', 't', 'h'),
('h', 't', 't', 't'),
('t', 'h', 'h', 'h'),
('t', 'h', 'h', 't'),
('t', 'h', 't', 'h'),
('t', 'h', 't', 't'),
('t', 't', 'h', 'h'),
('t', 't', 'h', 't'),
('t', 't', 't', 'h'),
('t', 't', 't', 't')}

In [33]: win_unfinished_game(2, 3)

Out[33]: Fraction(11, 16)

Our answer agrees with Pascal and Fermat; we're in good company!

11 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

Non-Equiprobable Outcomes: Probability

Distributions
So far, we have made the assumption that every outcome in a sample space is equally likely. In
real life, we often get outcomes that are not equiprobable. For example, the probability of a child
being a girl is not exactly 1/2, and the probability is slightly diﬀerent for a second child. An article
(http://people.kzoo.edu/barth/math105/moreboys.pdf) gives the following counts for two-child
families in Denmark, where GB means a family where the first child is a girl and the second a
boy:

GG: 121801 GB: 126840

BG: 127123 BB: 135138

We will introduce three more definitions:

Frequency (https://en.wikipedia.org/wiki/Frequency_%28statistics%29): a number

describing how often an outcome occurs. Can be a count like 121801, or a ratio like 0.515.

Distribution (http://mathworld.wolfram.com/StatisticalDistribution.html): A mapping from

outcome to frequency for each outcome in a sample space.

Probability Distribution (https://en.wikipedia.org/wiki/Probability_distribution): A distribution

that has been normalized so that the sum of the frequencies is 1.

We define ProbDist to take the same kinds of arguments that dict does: either a mapping
or an iterable of (key, val) pairs, and/or optional keyword arguments.

In [34]: class ProbDist(dict):

"A Probability Distribution; an {outcome: probability} mapping."
def __init__(self, mapping=(), **kwargs):
self.update(mapping, **kwargs)
# Make probabilities sum to 1.0; assert no negative probabilities
total = sum(self.values())
for outcome in self:
self[outcome] = self[outcome] / total
assert self[outcome] >= 0

We also need to modify the functions P and such_that to accept either a sample space or a
probability distribution as the second argument.

12 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

In [35]: def P(event, space):

"""The probability of an event, given a sample space of equiprobable outcomes.
event: a collection of outcomes, or a predicate that is true of outcomes in the ev
space: a set of outcomes or a probability distribution of {outcome: frequency} pai
if is_predicate(event):
event = such_that(event, space)
if isinstance(space, ProbDist):
return sum(space[o] for o in space if o in event)
else:
return Fraction(len(event & space), len(space))

def such_that(predicate, space):

"""The outcomes in the sample pace for which the predicate is true.
If space is a set, return a subset {outcome,...};
if space is a ProbDist, return a ProbDist {outcome: frequency,...};
in both cases only with outcomes where predicate(element) is true."""
if isinstance(space, ProbDist):
return ProbDist({o:space[o] for o in space if predicate(o)})
else:
return {o for o in space if predicate(o)}

Here is the probability distribution for Danish two-child families:

In [36]: DK = ProbDist(GG=121801, GB=126840,

BG=127123, BB=135138)
DK

Out[36]: {'BB': 0.2645086533229465,

'BG': 0.24882071317004043,
'GB': 0.24826679089140383,
'GG': 0.23840384261560926}

And here are some predicates that will allow us to answer some questions:

In [37]: def first_girl(outcome): return outcome[0] == 'G'

def first_boy(outcome): return outcome[0] == 'B'
def second_girl(outcome): return outcome[1] == 'G'
def second_boy(outcome): return outcome[1] == 'B'
def two_girls(outcome): return outcome == 'GG'

In [38]: P(first_girl, DK)

Out[38]: 0.4866706335070131

In [39]: P(second_girl, DK)

Out[39]: 0.4872245557856497

The above says that the probability of a girl is somewhere between 48% and 49%, but that it is
slightly diﬀerent between the first or second child.

In [40]: P(second_girl, such_that(first_girl, DK)), P(second_girl, such_that(first_boy

Out[40]: (0.4898669165584115, 0.48471942072973107)

In [41]: P(second_boy, such_that(first_girl, DK)), P(second_boy, such_that(first_boy

Out[41]: (0.5101330834415885, 0.5152805792702689)

13 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

The above says that the sex of the second child is more likely to be the same as the first child,
by about 1/2 a percentage point.

In [42]: bag94 = ProbDist(brown=30, yellow=20, red=20, green=10, orange=10, tan=10

bag96 = ProbDist(blue=24, green=20, orange=16, yellow=14, red=13, brown=13

Next, define MM as the joint distribution—the sample space for picking one M&M from each
bag. The outcome 'yellow green' means that a yellow M&M was selected from the 1994
bag and a green one from the 1996 bag.

14 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

In [43]: def joint(A, B, sep=''):

"""The joint distribution of two independent probability distributions.
Result is all entries of the form {a+sep+b: P(a)*P(b)}"""
return ProbDist({a + sep + b: A[a] * B[b]
for a in A
for b in B})

MM = joint(bag94, bag96, ' ')

Out[43]: {'brown blue': 0.07199999999999997,

'brown brown': 0.038999999999999986,
'brown green': 0.05999999999999997,
'brown orange': 0.04799999999999998,
'brown red': 0.038999999999999986,
'brown yellow': 0.04199999999999998,
'green blue': 0.02399999999999999,
'green brown': 0.012999999999999996,
'green green': 0.019999999999999993,
'green orange': 0.015999999999999993,
'green red': 0.012999999999999996,
'green yellow': 0.013999999999999995,
'orange blue': 0.02399999999999999,
'orange brown': 0.012999999999999996,
'orange green': 0.019999999999999993,
'orange orange': 0.015999999999999993,
'orange red': 0.012999999999999996,
'orange yellow': 0.013999999999999995,
'red blue': 0.04799999999999998,
'red brown': 0.025999999999999992,
'red green': 0.03999999999999999,
'red orange': 0.03199999999999999,
'red red': 0.025999999999999992,
'red yellow': 0.02799999999999999,
'tan blue': 0.02399999999999999,
'tan brown': 0.012999999999999996,
'tan green': 0.019999999999999993,
'tan orange': 0.015999999999999993,
'tan red': 0.012999999999999996,
'tan yellow': 0.013999999999999995,
'yellow blue': 0.04799999999999998,
'yellow brown': 0.025999999999999992,
'yellow green': 0.03999999999999999,
'yellow orange': 0.03199999999999999,
'yellow red': 0.025999999999999992,
'yellow yellow': 0.02799999999999999}

First we'll look at the "One is yellow and one is green" part:

In [44]: def yellow_and_green(outcome): return 'yellow' in outcome and 'green' in

such_that(yellow_and_green, MM)

Out[44]: {'green yellow': 0.25925925925925924, 'yellow green': 0.7407407407407408}

Now we can answer the question: given that we got a yellow and a green (but don't know which
comes from which bag), what is the probability that the yellow came from the 1994 bag?

15 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

In [45]: def yellow94(outcome): return outcome.startswith('yellow')

P(yellow94, such_that(yellow_and_green, MM))

Out[45]: 0.7407407407407408

16 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

So there is a 74% chance that the yellow comes from the 1994 bag.

Answering this question was straightforward: just like all the other probability problems, we
simply create a sample space, and use P to pick out the probability of the event in question,
given what we know about the outcome. But in a sense it is curious that we were able to solve
this problem with the same methodology as the others: this problem comes from a section titled
My favorite Bayes's Theorem Problems, so one would expect that we'd need to invoke Bayes
Theorem to solve it. The computation above shows that that is not necessary.

Rev. Thomas Bayes (https://en.wikipedia.org/wiki/Thomas_Bayes)

1701-1761

Of course, we could solve it using Bayes Theorem. Why is Bayes Theorem recommended?
Because we are asked about the probability of an event given the evidence, which is not
immediately available; however the probability of the evidence given the event is.

Before we see the colors of the M&Ms, there are two hypotheses, A and B , both with equal
probability:

A: first M&M from 94 bag, second from 96 bag

B: first M&M from 96 bag, second from 94 bag
P(A) = P(B) = 0.5

Then we get some evidence:

E: first M&M yellow, second green

We want to know the probability of hypothesis A , given the evidence:

P(A | E)

That's not easy to calculate (except by enumerating the sample space). But Bayes Theorem
says:

P(A | E) = P(E | A) * P(A) / P(E)

The quantities on the right-hand-side are easier to calculate:

P(E | A) = 0.20 * 0.20 = 0.04

P(E | B) = 0.10 * 0.14 = 0.014
P(A) = 0.5
P(B) = 0.5
P(E) = P(E | A) * P(A) + P(E | B) * P(B)
= 0.04 * 0.5 + 0.014 * 0.5 = 0.027

17 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

Newton's Answer to a Problem by Pepys

Samuel Pepys (https://en.wikipedia.org

Isaac Newton (https://en.wikipedia.org /wiki/Samuel_Pepys)
/wiki/Isaac_Newton) 1693
1693

This paper (http://fermatslibrary.com/s/isaac-newton-as-a-probabilist) explains how Samuel

Pepys wrote to Isaac Newton in 1693 to pose the problem:

Which of the following three propositions has the greatest chance of success?

1. Six fair dice are tossed independently and at least one “6” appears.
2. Twelve fair dice are tossed independently and at least two “6”s appear.
3. Eighteen fair dice are tossed independently and at least three “6”s appear.

Newton was able to answer the question correctly (although his reasoning was not quite right);
let's see how we can do. Since we're only interested in whether a die comes up as "6" or not,
we can define a single die and the joint distribution over n dice as follows:

In [46]: die = ProbDist({'6':1/6, '-':5/6})

def dice(n, die):

"Joint probability from tossing n dice."
if n == 1:
return die
else:
return joint(die, dice(n - 1, die))

In [47]: dice(3, die)

Out[47]: {'---': 0.5787037037037037,

'--6': 0.11574074074074073,
'-6-': 0.11574074074074073,
'-66': 0.023148148148148143,
'6--': 0.11574074074074073,
'6-6': 0.023148148148148143,
'66-': 0.023148148148148143,
'666': 0.0046296296296296285}

Now we are ready to determine which proposition is more likely to have the required number of
sixes:

18 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

In [48]: def at_least(k, result): return lambda s: s.count(result) >= k

In [49]: P(at_least(1, '6'), dice(6, die))

Out[49]: 0.6651020233196161

In [50]: P(at_least(2, '6'), dice(12, die))

Out[50]: 0.6186673737323009

In [51]: P(at_least(3, '6'), dice(18, die))

Out[51]: 0.5973456859478227

We reach the same conclusion Newton did, that the best chance is rolling six dice.

Simulation
Sometimes it is inconvenient to explicitly define a sample space. Perhaps the sample space is
infinite, or perhaps it is just very large and complicated, and we feel more confident in writing a
program to simulate one pass through all the complications, rather than try to enumerate the
complete sample space. Random sampling from the simulation can give an accurate estimate of
the probability.

Simulating Monopoly

Mr. Monopoly (https://en.wikipedia.org/wiki/Rich_Uncle_Pennybags)

1940—

Consider problem 84 (https://projecteuler.net/problem=84) from the excellent Project Euler

(https://projecteuler.net), which asks for the probability that a player in the game Monopoly ends
a roll on each of the squares on the board. To answer this we need to take into account die rolls,
chance and community chest cards, and going to jail (from the "go to jail" space, from a card, or
from rolling doubles three times in a row). We do not need to take into account anything about
buying or selling properties or exchanging money or winning or losing the game, because these
don't change a player's location. We will assume that a player in jail will always pay to get out of
jail immediately.

A game of Monopoly can go on forever, so the sample space is infinite. But even if we limit the
sample space to say, 1000 rolls, there are 211000 such sequences of rolls (and even more
possibilities when we consider drawing cards). So it is infeasible to explicitly represent the
sample space.

But it is fairly straightforward to implement a simulation and run it for, say, 400,000 rolls (so the
average square will be landed on 10,000 times). Here is the code for a simulation:

19 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

In [52]: from collections import Counter, deque

import random

# The board: a list of the names of the 40 squares

# As specified by https://projecteuler.net/problem=84
board = """GO A1 CC1 A2 T1 R1 B1 CH1 B2 B3
JAIL C1 U1 C2 C3 R2 D1 CC2 D2 D3
FP E1 CH2 E2 E3 R3 F1 F2 U2 F3
G2J G1 G2 CC3 G3 R4 CH3 H1 T2 H2""".split()

def monopoly(steps):
"""Simulate given number of steps of Monopoly game,
yielding the number of the current square after each step."""
goto(0) # start at GO
CC_deck = Deck('GO JAIL' + 14 * ' ?')
CH_deck = Deck('GO JAIL C1 E3 H2 R1 R R U -3' + 6 * ' ?')
doubles = 0
jail = board.index('JAIL')
for _ in range(steps):
d1, d2 = random.randint(1, 6), random.randint(1, 6)
goto(here + d1 + d2)
doubles = (doubles + 1) if (d1 == d2) else 0
if doubles == 3 or board[here] == 'G2J':
goto(jail)
elif board[here].startswith('CC'):
do_card(CC_deck)
elif board[here].startswith('CH'):
do_card(CH_deck)
yield here

def goto(square):
"Update the global variable 'here' to be square."
global here
here = square % len(board)

def Deck(names):
"Make a shuffled deck of cards, given a space-delimited string."
cards = names.split()
random.shuffle(cards)
return deque(cards)

def do_card(deck):
"Take the top card from deck and do what it says."
global here
card = deck[0] # The top card
deck.rotate(-1) # Move top card to bottom of deck
if card == 'R' or card == 'U':
while not board[here].startswith(card):
goto(here + 1) # Advance to next railroad or utility
elif card == '-3':
goto(here - 3) # Go back 3 spaces
elif card != '?':
goto(board.index(card))# Go to destination named on card

And the results:

In [53]: results = list(monopoly(400000))

I'll show a histogram of the squares, with a dotted red line at the average:

20 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

In [54]: %matplotlib inline

import matplotlib.pyplot as plt

plt.hist(results, bins=40)
avg = len(results) / 40
plt.plot([0, 39], [avg, avg], 'r--');

Another way to see the results:

21 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

In [55]: ProbDist(Counter(board[i] for i in results))

Out[55]: {'A1': 0.0211675,

'A2': 0.021105,
'B1': 0.022485,
'B2': 0.0233825,
'B3': 0.0231025,
'C1': 0.02732,
'C2': 0.0240925,
'C3': 0.02433,
'CC1': 0.0187875,
'CC2': 0.0258225,
'CC3': 0.02384,
'CH1': 0.00864,
'CH2': 0.010405,
'CH3': 0.008545,
'D1': 0.028305,
'D2': 0.0294775,
'D3': 0.03062,
'E1': 0.0285925,
'E2': 0.0273425,
'E3': 0.0317775,
'F1': 0.027155,
'F2': 0.02686,
'F3': 0.0261775,
'FP': 0.0289025,
'G1': 0.0269675,
'G2': 0.0258,
'G3': 0.0249025,
'GO': 0.0305975,
'H1': 0.0219575,
'H2': 0.0261525,
'JAIL': 0.0621475,
'R1': 0.03009,
'R2': 0.028755,
'R3': 0.030435,
'R4': 0.024545,
'T1': 0.023715,
'T2': 0.02204,
'U1': 0.0257525,
'U2': 0.0279075}

There is one square far above average: JAIL , at a little over 6%. There are four squares far
below average: the three chance squares, CH1 , CH2 , and CH3 , at around 1% (because 10 of
the 16 chance cards send the player away from the square), and the "Go to Jail" square, square
number 30 on the plot, which has a frequency of 0 because you can't end a turn there. The other
squares are around 2% to 3% each, which you would expect, because 100% / 40 = 2.5%.

22 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

The Central Limit Theorem / Strength in

Numbers Theorem
So far, we have talked of an outcome as being a single state of the world. But it can be useful to
break that state of the world down into components. We call these components random
variables. For example, when we consider an experiment in which we roll two dice and observe
their sum, we could model the situation with two random variables, one for each die. (Our
representation of outcomes has been doing that implicitly all along, when we concatenate two
parts of a string, but the concept of a random variable makes it oﬃcial.)

The Central Limit Theorem states that if you have a collection of random variables and sum
them up, then the larger the collection, the closer the sum will be to a normal distribution (also
called a Gaussian distribution or a bell-shaped curve). The theorem applies in all but a few
pathological cases.

As an example, let's take 5 random variables reprsenting the per-game scores of 5 basketball
players, and then sum them together to form the team score. Each random variable/player is
represented as a function; calling the function returns a single sample from the distribution:

In [56]: from random import gauss, triangular, choice, vonmisesvariate, uniform

def SC(): return posint(gauss(15.1, 3) + 3 * triangular(1, 4, 13)) # 30.1

def KT(): return posint(gauss(10.2, 3) + 3 * triangular(1, 3.5, 9)) # 22.1
def DG(): return posint(vonmisesvariate(30, 2) * 3.08) # 14.0
def HB(): return posint(gauss(6.7, 1.5) if choice((True, False)) else gauss
def OT(): return posint(triangular(5, 17, 25) + uniform(0, 30) + gauss(6,

def posint(x): "Positive integer"; return max(0, int(round(x)))

And here is a function to sample a random variable k times, show a histogram of the results, and
return the mean:

In [57]: from statistics import mean

def repeated_hist(rv, bins=10, k=100000):

"Repeat rv() k times and make a histogram of the results."
samples = [rv() for _ in range(k)]
plt.hist(samples, bins=bins)
return mean(samples)

The two top-scoring players have scoring distributions that are slightly skewed from normal:

23 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

In [58]: repeated_hist(SC, bins=range(60))

Out[58]: 30.09618

In [59]: repeated_hist(KT, bins=range(60))

Out[59]: 22.1383

The next two players have bi-modal distributions; some games they score a lot, some games
not:

In [60]: repeated_hist(DG, bins=range(60))

Out[60]: 14.02429

24 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

In [61]: repeated_hist(HB, bins=range(60))

Out[61]: 11.70888

The fifth "player" (actually the sum of all the other players on the team) looks like this:

In [62]: repeated_hist(OT, bins=range(60))

Out[62]: 36.31564

Now we define the team score to be the sum of the five players, and look at the distribution:

25 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

In [63]: def GSW(): return SC() + KT() + DG() + HB() + OT()

repeated_hist(GSW, bins=range(70, 160, 2))

Out[63]: 114.31262

Sure enough, this looks very much like a normal distribution. The Central Limit Theorem appears
to hold in this case. But I have to say "Central Limit" is not a very evocative name, so I propose
we re-name this as the Strength in Numbers Theorem, to indicate the fact that if you have a lot
of numbers, you tend to get the expected result.

Conclusion
We've had an interesting tour and met some giants of the field: Laplace, Bernoulli, Fermat,
Pascal, Bayes, Newton, ... even Mr. Monopoly and The Count.

The Count (https://en.wikipedia.org/wiki/Count_von_Count)

1972—

The conclusion is: be explicit about what the problem says, and then methodical about defining
the sample space, and finally be careful in counting the number of outcomes in the numerator
and denominator. Easy as 1-2-3.

26 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

Appendix: Continuous Sample Spaces

Everything up to here has been about discrete, finite sample spaces, where we can enumerate
all the possible outcomes.

But I was asked about continuous sample spaces, such as the space of real numbers. The
principles are the same: probability is still the ratio of the favorable cases to all the cases, but
now instead of counting cases, we have to (in general) compute integrals to compare the sizes
of cases. Here we will cover a simple example, which we first solve approximately by simulation,
and then exactly by calculation.

The Hot New Game Show Problem: Simulation

Oliver Roeder posed this problem (http://fivethirtyeight.com/features/can-you-win-this-hot-new-
game-show/) in the 538 Riddler blog:

Two players go on a hot new game show called Higher Number Wins. The two
go into separate booths, and each presses a button, and a random number
between zero and one appears on a screen. (At this point, neither knows the
other’s number, but they do know the numbers are chosen from a standard
uniform distribution.) They can choose to keep that first number, or to press the
button again to discard the first number and get a second random number,
which they must keep. Then, they come out of their booths and see the final
number for each player on the wall. The lavish grand prize — a case full of gold
bullion — is awarded to the player who kept the higher number. Which number
is the optimal cutoﬀ for players to discard their first number and choose
another? Put another way, within which range should they choose to keep the
first number, and within which range should they reject it and try their luck with a
second number?

We'll use this notation:

A, B: the two players.

A, B: the cutoﬀ values they choose: the lower bound of the range of first numbers they will
accept.
a, b: the actual random numbers that appear on the screen.

For example, if player A chooses a cutoff of A = 0.6, that means that A would accept any first
number greater than 0.6, and reject any number below that cutoff. The question is: What cutoff,
A, should player A choose to maximize the chance of winning, that is, maximize P(a > b)?

First, simulate the number that a player with a given cutoﬀ gets (note that random.random()
returns a float sampled uniformly from the interval [0..1]):

In [64]: def number(cutoff):

"Play the game with given cutoff, returning the first or second random number."
first = random.random()
return first if first > cutoff else random.random()

In [65]: number(.5)

Out[65]: 0.643051044503982

27 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

Now compare the numbers returned with a cutoff of A versus a cutoff of B, and repeat for a large
number of trials; this gives us an estimate of the probability that cutoff A is better than cutoff B:

In [66]: def Pwin(A, B, trials=30000):

"The probability that cutoff A wins against cutoff B."
Awins = sum(number(A) > number(B)
for _ in range(trials))
return Awins / trials

In [67]: Pwin(.5, .6)

Out[67]: 0.49946666666666667

Now define a function, top , that considers a collection of possible cutoffs, estimate the
probability for each cutoff playing against each other cutoff, and returns a list with the N top
cutoffs (the ones that defeated the most number of opponent cutoffs), and the number of
opponents they defeat:

In [ ]: def top(N, cutoffs):

"Return the N best cutoffs and the number of opponent cutoffs they beat."
winners = Counter(A if Pwin(A, B) > 0.5 else B
for (A, B) in itertools.combinations(cutoffs, 2))
return winners.most_common(N)

In [ ]: from numpy import arange

%time top(5, arange(0.50, 0.99, 0.01))

28 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

We get a good idea of the top cutoﬀs, but they are close to each other, so we can't quite be sure
which is best, only that the best is somewhere around 0.60. We could get a better estimate by
increasing the number of trials, but that would consume more time.

The Hot New Game Show Problem: Exact

Calculation
More promising is the possibility of making Pwin(A, B) an exact calculation. But before we
get to Pwin(A, B) , let's solve a simpler problem: assume that both players A and B have
chosen a cutoff, and have each received a number above the cutoff. What is the probability that
A gets the higher number? We'll call this Phigher(A, B) . We can think of this as a two-
dimensional sample space of points in the (a, b) plane, where a ranges from the cutoff A to 1 and
b ranges from the cutoff B to 1. Here is a diagram of that two-dimensional sample space, with
the cutoffs A=0.5 and B=0.6:

The total area of the sample space is 0.5 × 0.4 = 0.20, and in general it is (1 - A) · (1 - B). What
about the favorable cases, where A beats B? That corresponds to the shaded triangle below:

The area of a triangle is 1/2 the base times the height, or in this case, 0.42 / 2 = 0.08, and in
general, (1 - B)2 / 2. So in general we have:

Phigher(A, B) = favorable / total

favorable = ((1 - B) ** 2) / 2
total = (1 - A) * (1 - B)
Phigher(A, B) = (((1 - B) ** 2) / 2) / ((1 - A) * (1 - B))

29 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

In [ ]: def Phigher(A, B):

"Probability that a sample from [A..1] is higher than one from [B..1]."
if A <= B:
return (1 - B) / (2 * (1 - A))
else:
return 1 - Phigher(B, A)

In [ ]: Phigher(0.5, 0.6)

We're now ready to tackle the full game. There are four cases to consider, depending on whether
A and B gets a first number that is above or below their cutoﬀ choices:

first a first b P(a, b) P(A wins | a, b) Comment

a>A b>B (1 - A) · (1 - B) Phigher(A, B) Both above cutoﬀ; both keep first numbers

a<A b<B A·B Phigher(0, 0) Both below cutoﬀ, both get new numbers from [0..1]

a>A b<B (1 - A) · B Phigher(A, 0) A keeps number; B gets new number from [0..1]

a<A b>B A · (1 - B) Phigher(0, B) A gets new number from [0..1]; B keeps number

For example, the first row of this table says that the event of both first numbers being above
their respective cutoﬀs has probability (1 - A) · (1 - B), and if this does occur, then the probability
of A winning is Phigher(A, B). We're ready to replace the old simulation-based Pwin with a new
calculation-based version:

In [ ]: def Pwin(A, B):

"With what probability does cutoff A win against cutoff B?"
return ((1-A) * (1-B) * Phigher(A, B) # both above cutoff
+ A * B * Phigher(0, 0) # both below cutoff
+ (1-A) * B * Phigher(A, 0) # A above, B below
+ A * (1-B) * Phigher(0, B)) # A below, B above

That was a lot of algebra. Let's define a few tests to check for obvious errors:

In [ ]: def test():

assert Phigher(0.5, 0.5) == Phigher(0.7, 0.7) == Phigher(0, 0) == 0.5
assert Pwin(0.5, 0.5) == Pwin(0.7, 0.7) == 0.5
assert Phigher(.6, .5) == 0.6
assert Phigher(.5, .6) == 0.4
return 'ok'

test()

Let's repeat the calculation with our new, exact Pwin :

In [ ]: top(5, arange(0.50, 0.99, 0.01))

It is good to see that the simulation and the exact calculation are in rough agreement; that gives
me more confidence in both of them. We see here that 0.62 defeats all the other cutoﬀs, and
0.61 defeats all cutoﬀs except 0.62. The great thing about the exact calculation code is that it
runs fast, regardless of how much accuracy we want. We can zero in on the range around 0.6:

In [ ]: top(10, arange(0.500, 0.700, 0.001))

30 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

This says 0.618 is best, better than 0.620. We can get even more accuracy:

In [ ]: top(5, arange(0.61700, 0.61900, 0.00001))

So 0.61803 is best. Does that number look familiar (https://en.wikipedia.org/wiki/Golden_ratio)?

Can you prove that it is what I think it is?

To understand the strategic possibilities, it is helpful to draw a 3D plot of Pwin(A, B) for

values of A and B between 0 and 1:

In [ ]: import numpy as np

from mpl_toolkits.mplot3d.axes3d import Axes3D

def map2(fn, A, B):

"Map fn to corresponding elements of 2D arrays A and B."
return [list(map(fn, Arow, Brow))
for (Arow, Brow) in zip(A, B)]

cutoffs = arange(0.00, 1.00, 0.02)

A, B = np.meshgrid(cutoffs, cutoffs)

fig = plt.figure(figsize=(10,10))
ax = fig.add_subplot(1, 1, 1, projection='3d')
ax.set_xlabel('A')
ax.set_ylabel('B')
ax.set_zlabel('Pwin(A, B)')
ax.plot_surface(A, B, map2(Pwin, A, B));

What does this Pringle of Probability (http://fivethirtyeight.com/features/should-you-shoot-free-

throws-underhand/) show us? The highest win percentage for A, the peak of the surface, occurs
when A is around 0.5 and B is 0 or 1. We can confirm that, finding the maximum Pwin(A, B)
for many diﬀerent cutoﬀ values of A and B :

In [ ]: cutoffs = (set(arange(0.00, 1.00, 0.01)) |

set(arange(0.500, 0.700, 0.001)) |
set(arange(0.61700, 0.61900, 0.00001)))

In [ ]: max([Pwin(A, B), A, B]

for A in cutoffs for B in cutoffs)

So A could win 62.5% of the time if only B would chose a cutoff of 0. But, unfortunately for A, a
rational player B is not going to do that. We can ask what happens if the game is changed so
that player A has to declare a cutoff first, and then player B gets to respond with a cutoff, with
full knowledge of A's choice. In other words, what cutoff should A choose to maximize
Pwin(A, B) , given that B is going to take that knowledge and pick a cutoff that minimizes
Pwin(A, B) ?

In [ ]: max(min([Pwin(A, B), A, B] for B in cutoffs)

for A in cutoffs)

And what if we run it the other way around, where B chooses a cutoﬀ first, and then A
responds?

31 of 32 5/19/18, 3:34 PM
Jupyter Notebook Viewer http://nbviewer.jupyter.org/url/norvig.com/ipython/Probability...

In [ ]: min(max([Pwin(A, B), A, B] for A in cutoffs)

for B in cutoffs)

In both cases, the rational choice for both players in a cutoﬀ of 0.61803, which corresponds to
the "saddle point" in the middle of the plot. This is a stable equilibrium; consider fixing B =
0.61803, and notice that if A changes to any other value, we slip oﬀ the saddle to the right or
left, resulting in a worse win probability for A. Similarly, if we fix A = 0.61803, then if B changes
to another value, we ride up the saddle to a higher win percentage for A, which is worse for B.
So neither player will want to move from the saddle point.

The moral for continuous spaces is the same as for discrete spaces: be careful about defining
your space; count/measure carefully, and let your code take care of the rest.

32 of 32 5/19/18, 3:34 PM

Answers To Problems For A Course in Real Analysis by Hugo Junghenn
No ratings yet
Answers To Problems For A Course in Real Analysis by Hugo Junghenn
10 pages
Notes On Real Analysis: Lee Larson March 26, 2012
0% (1)
Notes On Real Analysis: Lee Larson March 26, 2012
5 pages
Some Problems From Tre Fe Ten
No ratings yet
Some Problems From Tre Fe Ten
3 pages
(Lecture Notes) Andrei Jorza-Math 5c - Introduction To Abstract Algebra, Spring 2012-2013 - Solutions To Some Problems in Dummit & Foote (2013)
No ratings yet
(Lecture Notes) Andrei Jorza-Math 5c - Introduction To Abstract Algebra, Spring 2012-2013 - Solutions To Some Problems in Dummit & Foote (2013)
30 pages
S11MTH3175GroupThFinalSolAI PDF
No ratings yet
S11MTH3175GroupThFinalSolAI PDF
13 pages
Complex Analysis For Mathematics and Engineering 5th Edition PDF
100% (8)
Complex Analysis For Mathematics and Engineering 5th Edition PDF
651 pages
(ADVANCE ABSTRACT ALGEBRA) Pankaj Kumar and Nawneet Hooda
100% (1)
(ADVANCE ABSTRACT ALGEBRA) Pankaj Kumar and Nawneet Hooda
82 pages
Analytic Phase Research
No ratings yet
Analytic Phase Research
5 pages
八週團體運動介入對銀髮族幸福感及功能性體適能之提昇效益 Unlocked
No ratings yet
八週團體運動介入對銀髮族幸福感及功能性體適能之提昇效益 Unlocked
90 pages
Lectures on the Coupling Method
From Everand
Lectures on the Coupling Method
Torgny Lindvall
No ratings yet
Complex Analysis Practicals
No ratings yet
Complex Analysis Practicals
15 pages
Probability Answers PDF
No ratings yet
Probability Answers PDF
87 pages
Answers To Problems For Combinatorial Mathematics by Douglas West
No ratings yet
Answers To Problems For Combinatorial Mathematics by Douglas West
25 pages
Proofs of Quadratic Reciprocity
No ratings yet
Proofs of Quadratic Reciprocity
9 pages
Probability Theory
No ratings yet
Probability Theory
68 pages
Tom M. Apostol-Calculus, Volume II - Multi-Variable Calculus and Linear Algebra, With Applications To Differential Equations and Probabil - K2opt
No ratings yet
Tom M. Apostol-Calculus, Volume II - Multi-Variable Calculus and Linear Algebra, With Applications To Differential Equations and Probabil - K2opt
94 pages
Examples of Proofs
No ratings yet
Examples of Proofs
4 pages
Proofsss
No ratings yet
Proofsss
30 pages
Notes
No ratings yet
Notes
422 pages
Notes On Introductory Combinatorics Robert Tarjan
No ratings yet
Notes On Introductory Combinatorics Robert Tarjan
124 pages
Week 13
No ratings yet
Week 13
17 pages
Madhava MC Paper 12
No ratings yet
Madhava MC Paper 12
2 pages
Full Download Geometric Function Theory Explorations in Complex Analysis Cornerstones 1st Edition Steven G. Krantz PDF DOCX
100% (6)
Full Download Geometric Function Theory Explorations in Complex Analysis Cornerstones 1st Edition Steven G. Krantz PDF DOCX
50 pages
Burkill A Second Course in Mathematical Analysis PDF
No ratings yet
Burkill A Second Course in Mathematical Analysis PDF
268 pages
Enumerative Combinatorics
No ratings yet
Enumerative Combinatorics
505 pages
MAT401-Lecture 1
100% (1)
MAT401-Lecture 1
12 pages
Math Research Papers
100% (2)
Math Research Papers
140 pages
The Riemann and Lebesgue Integrals
100% (1)
The Riemann and Lebesgue Integrals
14 pages
Lecture Note 2033 Full Version
No ratings yet
Lecture Note 2033 Full Version
43 pages
Cnditional Probability
No ratings yet
Cnditional Probability
5 pages
Princeton MAT 104 Fall 2014 Assignment List
No ratings yet
Princeton MAT 104 Fall 2014 Assignment List
3 pages
Lecture NotesLecture Notes For Introductory Probability
No ratings yet
Lecture NotesLecture Notes For Introductory Probability
218 pages
ComplexAnalysis Notes
100% (2)
ComplexAnalysis Notes
119 pages
MIR Vilenkin N Combinatorial Mathematics For Recreation 1972 PDF
No ratings yet
MIR Vilenkin N Combinatorial Mathematics For Recreation 1972 PDF
205 pages
(Golden Maths Series) N.P. Bali-Real Analysis-FirlNknlknewall Media (2005)
100% (1)
(Golden Maths Series) N.P. Bali-Real Analysis-FirlNknlknewall Media (2005)
420 pages
Math 55a
No ratings yet
Math 55a
68 pages
Gabriel Klambauer Problems and Propositions in Analysis Lecture Notes in Pure and Applied Mathematics 1979 PDF
No ratings yet
Gabriel Klambauer Problems and Propositions in Analysis Lecture Notes in Pure and Applied Mathematics 1979 PDF
475 pages
Maths (Organizer)
No ratings yet
Maths (Organizer)
68 pages
KVPY Report: Analytic Number Theory
No ratings yet
KVPY Report: Analytic Number Theory
8 pages
Linear Algebra (Mir, 1983)
No ratings yet
Linear Algebra (Mir, 1983)
393 pages
PART I: Ordinary Differential Equations
No ratings yet
PART I: Ordinary Differential Equations
5 pages
Honor Calculus Min Yan
100% (1)
Honor Calculus Min Yan
632 pages
Concise Mathematics Enciclpedy
No ratings yet
Concise Mathematics Enciclpedy
3,236 pages
Proof of the ABC Conjecture by P S Thul
No ratings yet
Proof of the ABC Conjecture by P S Thul
92 pages
Some Properties of The Lozinskii Logarithmic Norm: Ordinary Differential Equations
No ratings yet
Some Properties of The Lozinskii Logarithmic Norm: Ordinary Differential Equations
10 pages
MA2401 Lecture Notes
No ratings yet
MA2401 Lecture Notes
270 pages
Bartle
67% (3)
Bartle
11 pages
An Elementary Treatise On Differential E
100% (1)
An Elementary Treatise On Differential E
301 pages
4 Countability Axioms: Definition 4.1. Let
100% (1)
4 Countability Axioms: Definition 4.1. Let
13 pages
Mathematics - I (MATH F111)
No ratings yet
Mathematics - I (MATH F111)
70 pages
Twelvefold Way: 1 2 Viewpoints
No ratings yet
Twelvefold Way: 1 2 Viewpoints
10 pages
Open Mapping Theorem (Complex Analysis) PDF
No ratings yet
Open Mapping Theorem (Complex Analysis) PDF
2 pages
Introduction To Real Analysis 3rd Edition Manfred Stoll pdf download
100% (1)
Introduction To Real Analysis 3rd Edition Manfred Stoll pdf download
79 pages
1 S5 PDF
No ratings yet
1 S5 PDF
85 pages
(Ch4) - Interpolation-Material
No ratings yet
(Ch4) - Interpolation-Material
59 pages
RG Overview of Complex Analysis and Applications
No ratings yet
RG Overview of Complex Analysis and Applications
8 pages
Basic Methods of Linear Functional Analysis
From Everand
Basic Methods of Linear Functional Analysis
John D. Pryce
No ratings yet
A Survey of Minimal Surfaces
From Everand
A Survey of Minimal Surfaces
Robert Osserman
3.5/5 (1)
A Short Course in Automorphic Functions
From Everand
A Short Course in Automorphic Functions
Joseph Lehner
No ratings yet
Analytic Functions
From Everand
Analytic Functions
M.A. Evgrafov
No ratings yet
Differential Forms
From Everand
Differential Forms
Henri Cartan
5/5 (2)
Introductions to Set and Functions
From Everand
Introductions to Set and Functions
Simone Malacrida
No ratings yet
App C4
No ratings yet
App C4
9 pages
A3.2 - Means and Proportions, One and Two Samples
No ratings yet
A3.2 - Means and Proportions, One and Two Samples
4 pages
Sampling Distribution
No ratings yet
Sampling Distribution
13 pages
Homework 1 Solutions
No ratings yet
Homework 1 Solutions
4 pages
Statistics and Probability Quarter 4 Module 1
No ratings yet
Statistics and Probability Quarter 4 Module 1
4 pages
Q3 Module 15
No ratings yet
Q3 Module 15
33 pages
Pengaruh Terapi Bermain Puzzle Terhadap Perkembangan Motorik Halus Pada Anak Pra Sekolah Di TK Inti Gugus Tulip Iii Padang Tahun 2018
No ratings yet
Pengaruh Terapi Bermain Puzzle Terhadap Perkembangan Motorik Halus Pada Anak Pra Sekolah Di TK Inti Gugus Tulip Iii Padang Tahun 2018
7 pages
(Selected Works in Probability and Statistics - Selected Works in Probability and Statistics) Jianqing Fan - Ya'acov Ritov - Chien-Fu Wu - Selected Works of Peter J. Bickel (2013, Springer) PDF
No ratings yet
(Selected Works in Probability and Statistics - Selected Works in Probability and Statistics) Jianqing Fan - Ya'acov Ritov - Chien-Fu Wu - Selected Works of Peter J. Bickel (2013, Springer) PDF
609 pages
Chapter 4 Introduction To Probability 2018 PDF
No ratings yet
Chapter 4 Introduction To Probability 2018 PDF
3 pages
Hypothesis Testing: ECE 3530 (Spring 2010
No ratings yet
Hypothesis Testing: ECE 3530 (Spring 2010
72 pages
STAT232%2802%29
No ratings yet
STAT232%2802%29
2 pages
COA or COQ Fluoride Test
No ratings yet
COA or COQ Fluoride Test
1 page
Basel - Nov 2021 U2
No ratings yet
Basel - Nov 2021 U2
2 pages
Daftar Lampiran Selvy
No ratings yet
Daftar Lampiran Selvy
13 pages
NEA of PT Results
No ratings yet
NEA of PT Results
5 pages
MQM100 MultipleChoice Chapter5
88% (8)
MQM100 MultipleChoice Chapter5
16 pages
ANOVA (Analysis of Variance)
No ratings yet
ANOVA (Analysis of Variance)
5 pages
Chapter 5. Elementary Probability
No ratings yet
Chapter 5. Elementary Probability
11 pages
Thesis 1973D H723b
No ratings yet
Thesis 1973D H723b
85 pages
Testing of Hypothesis - One Sample
No ratings yet
Testing of Hypothesis - One Sample
66 pages
Data Analysis Finals2
No ratings yet
Data Analysis Finals2
5 pages
Data Collection Methods
No ratings yet
Data Collection Methods
55 pages
Population Variance Is Known
No ratings yet
Population Variance Is Known
12 pages
Westinghouse Method of Rating
No ratings yet
Westinghouse Method of Rating
2 pages
Man203 Chapter 2 Simplex Method
No ratings yet
Man203 Chapter 2 Simplex Method
5 pages
Concept Map 9
No ratings yet
Concept Map 9
2 pages
Statistics Drills
No ratings yet
Statistics Drills
5 pages