
MEASURE THEORY AND THE PROBABILITY OF AN INFINITE SEQUENCE OF HEADS

KENT SLINKER

revised and corrected May 2016

I. Introduction.

The traditional theory of probability seems to have arisen purely from the desire to solve problems related to
gambling. According to the mathematician Tom M. Apostol, it was a gambler's dispute in 1654 that led to the
general theory of probability by way of correspondence between Blaise Pascal and Pierre de Fermat concerning a
game of dice, known today as chuck-a-luck, where the question was, What was the probability of the occurrence of
at least one “double six” in 24 throws of the dice?1 We will not treat this problem here, but rather refer the reader
to excellent treatments of the problem found in [4] and [7]. The context of Probability as applied to gambling
by its very nature dealt with problems where the total number of possible outcomes, known as the sample space,
was finite. Eventually questions arose which required dealing with an infinite sample space. This, together with
Cantor’s discovery that the cardinality of the interval (0, 1) was, in some sense, larger than the cardinality of the
natural numbers, required tackling cases where the sample space was uncountably infinite. As both the applied and
theoretical aspects of probability grew in complexity, and mathematicians began to worry more about foundational
issues in the early parts of the 20th century, it became necessary to develop the theory in a formal and rigorous
way. This was achieved largely by the work of the Russian mathematician, A. Kolmogorov, with his Foundations
of Probability Theory, published first in Russian in 1933. Kolmogorov’s approach was axiomatic in nature, and
in time became part of a larger discipline known as measure theory. In this paper we will state the basic axioms
of probability and examine how results from measure theory provide a proof that the probability for an infinite
number of coin tosses which all come up heads is zero.

II. The Axioms of Probability for Finite Sample Spaces.

Before we give the definition of probability for finite sample spaces, we need to define two concepts central to
probability, that of a set function which is finitely additive, and a sigma algebra of sets.

Definition (Finitely Additive Set Function). Let F be a collection of sets. Then a set function P : F → R is said
to be finitely additive iff
P(C ∪ D) = P(C) + P(D)
whenever C and D are disjoint sets in F such that C ∪ D is also in F.
1 [1] pg. 469

Simple examples of finitely additive set functions abound. For example, let F be the set of intervals in R of the
form (a, b) and define P : F → R by P((a, b)) = b − a; then P is a finitely additive set function. The last criterion of
the above definition requires closure of the sets in F under the operation of taking unions, which leads to our next
definition, that of a sigma algebra of sets.

Definition (Sigma algebra of sets). A non-empty class F of subsets of a given universal set Ω is called a sigma
algebra if for every D and C in F we have

D ∪ C ∈ F and D^c ∈ F,

where D^c = Ω \ D. (Strictly speaking, a sigma algebra must also be closed under countable unions; for the finite
sample spaces of this section, closure under pairwise unions suffices.)

There are many consequences that can be deduced fairly easily from the above definition, for instance that
∅ ∈ F: since F contains at least one element D, we have D^c ∈ F, hence Ω = D ∪ D^c ∈ F, and so ∅ = Ω^c ∈ F. Note
that the term sigma algebra refers to a non-empty class of subsets of a given universal set that possesses the above
properties, rather than to special operations defined on sets. It should be clear that if F consists of all subsets of some
non-empty universal set S, then F is a sigma algebra.

Definition (Probability for Finite Sample Spaces). Let F denote a sigma algebra whose elements are subsets of a
given finite set Ω. A function P : F → R is called a probability function (or measure) if it satisfies the following
three conditions:

i ) P is finitely additive
ii ) P (A) ≥ 0 for all A ∈ F
iii ) P (Ω) = 1
Given the above conditions, the triple (Ω, F, P ) is called a finite probability space.
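For a small sample space the three conditions can be verified exhaustively. Below is a minimal Python sketch of this check; the uniform weights are an illustrative choice, not part of the definition:

```python
from itertools import chain, combinations

# Sample space and a candidate probability assignment on outcomes.
omega = {1, 2, 3, 4, 5, 6}
weights = {w: 1 / 6 for w in omega}  # illustrative uniform assignment

def P(event):
    """Probability of an event (a subset of omega)."""
    return sum(weights[w] for w in event)

# iii) P(omega) = 1
assert abs(P(omega) - 1) < 1e-12

# Enumerate every event (all subsets of omega).
events = [set(s) for s in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(len(omega) + 1))]

# ii) non-negativity, and i) finite additivity on every disjoint pair.
for A in events:
    assert P(A) >= 0
    for B in events:
        if A.isdisjoint(B):
            assert abs(P(A | B) - (P(A) + P(B))) < 1e-12
```

Since every event is a disjoint union of singletons, checking additivity over all disjoint pairs of events confirms condition i) directly.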

Terminology. The set Ω is called the sample space, each ω ∈ Ω is called an outcome, and the sets A ∈ F
are called events. Let A be an event; then given an outcome ω, either ω ∈ A or ω ∈ A^c. In the former case we say
the event A has occurred, otherwise we say the event A has not occurred. An event A is said to be impossible if
A = ∅ (since then ω ∈ A^c for every ω ∈ Ω) and certain if A = Ω. The non-negative number P(A) is said to be the
probability that an outcome ω is an element of A. It is also customary to say that P(A) is the probability that the
event A occurs. It is important to note that while P(∅) = 0, not all events with probability 0 need be impossible.
Usually the sigma algebra is taken to be the set of all subsets of Ω, in which case if Ω = {ω_1, ω_2, . . . , ω_n}, then
by properties i) and iii) we have

(1)    P(Ω) = Σ_{i=1}^{n} P(ω_i) = 1

Since every subset A ∈ F can be written as some disjoint union of the ω_i, the function P is completely determined
by the value of P(ω_i) for each ω_i ∈ Ω. Notice that one need not assign a positive probability to every element
ω_i ∈ Ω, as long as (1) holds.
Here it is helpful to consider two different probability spaces with the same sample space and sigma algebra, which
draws attention to the distinction between the formal theory of probability (one where the above axioms hold) and
applied probability, where the function P is supposed to model the probability of events in some real-world experiment.

Example. Consider two probability spaces, (Ω, F, P_1) and (Ω, F, P_2), where Ω = {1, 2, 3, 4, 5, 6} and F is taken to
be all subsets of Ω. As previously noted, the function P is completely determined by the value of P(ω_i) for each
ω_i ∈ Ω. So define P_1(ω) = 1/6 for all ω ∈ Ω; then P_1(Ω) = Σ_{i=1}^{6} P_1(ω_i) = 1. This probability space is often
associated with the probability of rolling a particular number on a fair die, and P_1 is often referred to as an
equiprobable function, or a uniform probability distribution - terms for functions which assign the same probability
measure to each outcome in the sample space. In contrast, define

P_2(ω) =  1/2  if ω = 1
          1/8  if ω = 2, 3, 4, 5
          0    if ω = 6

Then again P_2(Ω) = Σ_{i=1}^{6} P_2(ω_i) = 1, but it is unclear what physical experiment this particular probability space
may represent, even though it is a genuine probability space, since it is consistent with all of the axioms. This
example also illustrates that P(ω_i) can have the value zero without ω_i being the empty set.
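The example can be checked directly; the following sketch uses exact rational arithmetic to avoid rounding, and the dictionary encoding of P_2 is my own:

```python
from fractions import Fraction

# The second measure from the example: P2({1}) = 1/2,
# P2({w}) = 1/8 for w = 2..5, and P2({6}) = 0.
p2 = {1: Fraction(1, 2), 2: Fraction(1, 8), 3: Fraction(1, 8),
      4: Fraction(1, 8), 5: Fraction(1, 8), 6: Fraction(0)}

def P2(event):
    """Probability of an event under P2, by additivity over outcomes."""
    return sum(p2[w] for w in event)

# A genuine probability measure: total mass is exactly 1 ...
assert P2({1, 2, 3, 4, 5, 6}) == 1
# ... yet the non-empty event {6} has probability zero.
assert P2({6}) == 0 and {6} != set()
```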

Also of key importance to our question is the concept of independence of events.

Definition (Independent events). Let (Ω, F, P) be a probability space, and let A, B ∈ F be events. Then A and
B are said to be independent iff P(A ∩ B) = P(A)P(B).

Independence is a mathematical reflection of the notion that the occurrence of one event A has no effect on the
occurrence of some other event B, and the other way around. In these cases, the resulting probability is the product
of the individual probabilities for A and B. But the converse is true also: if, for example, it is discovered that the
probability of two events taken together is equal to the product of their individual probabilities, we can conclude,
at least in the mathematical sense, that the two events are independent. In our particular question, we will assume
the probability of flipping heads is 1/2 and that each coin flip is independent of the ones which precede it.
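For a concrete instance of the definition, consider two flips of a fair coin modeled as an equiprobable space of four outcomes (a standard illustration, not taken from the paper):

```python
from fractions import Fraction
from itertools import product

# Two fair coin flips; each outcome is a pair like ('H', 'T').
omega = list(product("HT", repeat=2))

def P(event):
    """Equiprobable measure: each of the 4 outcomes has probability 1/4."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == "H"}  # event: first flip is heads
B = {w for w in omega if w[1] == "H"}  # event: second flip is heads

# P(A ∩ B) = P(A) * P(B), so A and B are independent.
assert P(A & B) == P(A) * P(B) == Fraction(1, 4)
```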
We now turn to the question concerning the probability of flipping an infinite number of heads. Before we do
so, we need to extend our definition of Probability to include sample spaces which are countably infinite. We will
later extend this definition to also include sample spaces which are uncountably infinite.

Definition (Probability for countably infinite sample spaces). This definition is the same as our definition for
finite sample spaces, with the addition that our sample space (and hence sigma algebra) may be countably infinite.
Specifically, for every countably infinite collection of elements A_k ∈ F, we add the following:

iv) P(∪_{k=1}^{∞} A_k) = Σ_{k=1}^{∞} P(A_k) whenever A_i ∩ A_j = ∅ for i ≠ j

With this addition, it is possible to answer our question about the probability of an infinite toss of heads.
However, it is perhaps more instructive to present an erroneous proof that the probability of an infinite toss of
heads is 0.

Erroneous Proof. Assume that each toss of the coin is independent and the probability of getting heads on a single
toss is 1/2. Then for k tosses there are 2^k distinct sequences {a_1, . . . , a_k} with a_i ∈ {H, T}, precisely one of which
has a_i = H for 1 ≤ i ≤ k. We denote this special case by A_k and define our countably infinite sample space to be
Ω = {A_1, A_2, . . .}, where F is all subsets of Ω, as usual. Define the probability P(A_k) = 1/2^k; then P(A_k) represents
the probability of getting k consecutive tosses of heads. Moreover, we have

P(Ω) = P(∪_{k=1}^{∞} A_k) = Σ_{k=1}^{∞} P(A_k) = Σ_{k=1}^{∞} 1/2^k = 1

Now assume the probability of flipping an infinite number of heads is given by some positive number ε. By the
Archimedean property there exists k ∈ N such that 0 < P(A_k) = 1/2^k < 1/k < ε, but since P(A_k) represents the
probability of flipping k heads in a row, it cannot be less than the probability of flipping an infinite number of heads;
hence the probability of flipping an infinite number of heads must be 0. □

As noted, there is a problem with this approach. It can be discovered by careful observation of our sample space
Ω. Observe that Ω = {A_1, A_2, A_3, . . .} = {{H_1}, {H_1, H_2}, {H_1, H_2, H_3}, . . .}, but A_1 ∩ A_2 = A_1 ≠ ∅, and in
general A_k ∩ A_{k+1} = A_k ≠ ∅, so the A_k are not pairwise disjoint and axiom iv) does not apply; indeed
P(∪_{k=1}^{∞} A_k) = lim_{k→∞} P(A_k) = lim_{k→∞} 1/2^k = 0 ≠ 1 as earlier claimed.
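The non-disjointness is easy to exhibit computationally. In the sketch below each A_k is modeled as the set {H_1, . . . , H_k} of flip labels (my own encoding):

```python
from fractions import Fraction

# A_k = {H_1, ..., H_k}, modeled as a frozenset of the first k flip labels.
def A(k):
    return frozenset(f"H{i}" for i in range(1, k + 1))

# The A_k are nested, not disjoint: A_1 ∩ A_2 = A_1 ≠ ∅.
assert A(1) & A(2) == A(1) != frozenset()

# So countable additivity does not apply: the union of the first n sets
# collapses to A_n alone, while the sum of the first n probabilities
# 1/2^k approaches 1.
n = 20
union = frozenset().union(*(A(k) for k in range(1, n + 1)))
assert union == A(n)

partial = sum(Fraction(1, 2**k) for k in range(1, n + 1))
assert partial == 1 - Fraction(1, 2**n)
```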

To make this approach work for countably infinite sample spaces, mathematicians have introduced what are
known as Bernoulli trials, where the A_k are not defined as {H_1, H_2, H_3, . . . , H_k} but rather are made up of trials
which continue until we flip a tail, at which point the trial ends. In our case, A_k would be interpreted as the trial
whose k-th flip is the first to result in a tail, so our Bernoulli trials look like A_1 = T_1, A_2 = H_1 T_2, A_3 = H_1 H_2 T_3,
and so on. Notice that in this case each element A_k of our sample space has the special property that it specifies
the outcome of the k-th flip as tails, so A_j ∩ A_i = ∅ for i ≠ j. The limiting value of P(A_n) as n → ∞ is defined as
the probability of flipping an infinite sequence of heads, which in this case is lim_{k→∞} 1/2^k = 0. But we will
leave this approach at the level of this outline: since our aim is to solve the problem where the event of flipping an
infinite number of heads is included in the sample space, we will proceed along that line and leave the reader to
explore this approach (see especially, [1, 3, 4]).
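The Bernoulli-trial outline can be sketched as follows, with each trial encoded as a tuple of flips (an illustrative encoding):

```python
from fractions import Fraction

# Bernoulli-trial outcome A_k: "first tail on flip k", e.g. A_3 = H1 H2 T3.
def trial(k):
    return tuple(["H"] * (k - 1) + ["T"])

def P(k):
    """P(A_k) = 1/2^k: k independent flips, each with probability 1/2."""
    return Fraction(1, 2**k)

assert trial(3) == ("H", "H", "T")

# The probabilities sum toward 1: the partial sum through n is 1 - 1/2^n ...
n = 30
assert sum(P(k) for k in range(1, n + 1)) == 1 - Fraction(1, 2**n)

# ... and the leftover mass 1/2^n (the chance of "all heads so far")
# tends to 0, the value assigned to an infinite run of heads.
assert P(n) == Fraction(1, 2**30)
```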

III. “Randomly” choosing elements from infinite sample spaces.

To further motivate our discussion suppose we ask the question, What is the probability of randomly choosing a
rational number in the set Ω = (0, 1)? Clearly our answer requires some sense of what it means to “randomly choose,”

but more important for our discussion is the fact that Ω is uncountably infinite, and our two previous definitions
of a probability measure do not include this case, so we will have to modify our approach one more time. Before
doing so, we will prove a theorem that shows that for any uncountably infinite sample space Ω, all but a countable
number of outcomes must have probability zero.
Theorem 1. Let (Ω, F, P) be a probability space where Ω is uncountably infinite, and let A be the set of all x ∈ Ω
with positive probability. Then A is countable.

Proof. For each n ∈ N, define A_n = {x ∈ Ω | 1/(n+1) < P(x) ≤ 1/n}. Then the size of A_n is at most n, since if its
size were n + 1 or greater, the sum of all of the probabilities in A_n would be greater than (n + 1) · 1/(n+1) = 1, which
is impossible. Since every x ∈ A has positive probability, x ∈ A_n for some n, and since A is the set of x ∈ Ω
with positive probability, A = A_1 ∪ A_2 ∪ . . .. Since the countable union of finite sets is countable, A is countable. □
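The bucket argument of the proof can be illustrated with a concrete assignment; here P(x_i) = 1/2^i, an illustrative choice:

```python
from fractions import Fraction

# An assignment giving positive probability to countably many outcomes
# (every other outcome implicitly gets probability 0).
probs = {i: Fraction(1, 2**i) for i in range(1, 50)}

def bucket(n):
    """A_n = {x : 1/(n+1) < P(x) <= 1/n}, as in the proof of Theorem 1."""
    return [x for x, p in probs.items()
            if Fraction(1, n + 1) < p <= Fraction(1, n)]

# Each bucket holds at most n outcomes; otherwise its mass would exceed 1.
for n in range(1, 25):
    A_n = bucket(n)
    assert len(A_n) <= n
    assert sum(probs[x] for x in A_n) <= 1
```

The positive-probability outcomes are swept up by the countable union of these finite buckets, exactly as the proof describes.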

What Theorem 1 tells us is that “some” elements of our set (0, 1) must have zero probability, since the cardinality
of (0, 1) is uncountable, and only a countable subset of (0, 1) can have non-zero probability. But even if we chose
S = {x ∈ (0, 1) | x ∈ Q} to be the countable subset of (0, 1), we cannot define “randomly picked” to mean that
each element of S is equally probable to be chosen as any other.
Theorem 2. Given a countably infinite sample space Ω, it is impossible to assign the same non-zero probability
to every element in Ω.

Proof. Suppose to the contrary that every element of Ω has the same probability of being picked, and let ε be that
probability, so that Σ_{i=1}^{∞} ε = P(Ω) = 1. Since ε is a positive constant, by the Archimedean property there
exists n ∈ N such that 1/n < ε, hence we have the contradiction2

1 = Σ_{i=1}^{n} 1/n < Σ_{i=1}^{n} ε < Σ_{i=1}^{∞} ε = 1

Hence it is impossible that every element of Ω has the same probability of being chosen. □
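Numerically, the contradiction appears as soon as a fixed positive ε is chosen (the values below are illustrative):

```python
from fractions import Fraction

eps = Fraction(1, 1000)   # any fixed positive probability per outcome
n = 1001                  # Archimedean property: some n with 1/n < eps
assert Fraction(1, n) < eps

# The first n outcomes alone already carry more than the total mass 1,
# so no constant positive assignment can sum to 1 over infinitely many.
assert n * eps > 1
```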

Observe that the conclusion extends to all infinite sample spaces, since by Theorem 1 only a countable number
of outcomes can be given positive probability. Theorem 2 also tells us that when asking questions about picking
elements at random from sample spaces which are infinite, some definition other than “each element has the same
probability” must be used.
We mention the question of “picking at random” to introduce some of the difficulties that arise when talking
about sample spaces which are infinite. We will not need this concept to answer our question about the probability
of flipping an infinite sequence of heads, since in that case we will not be picking subsets of Ω at random, but will
rather pick subsets which have a particular property. What we will need, however, is to abandon assigning any
positive probability to individual elements of Ω. We develop this concept further below where, instead of individual
elements, we consider intervals. First we need to give an informal definition of probability for uncountable
sample spaces, where the sample space Ω ⊆ R.

2 We could have equally shown that for any fixed positive number ε, the infinite sum Σ_{i=1}^{∞} ε is unbounded.

Definition (Probability for uncountable sample spaces). Let (Ω, F, P) be a probability space where Ω ⊆ R is
uncountable and F is a sigma-algebra of Borel subsets of Ω. We will call the elements A, B ∈ F measurable sets;
they have the following properties:3
i) If A is measurable, then so is R \ A.
ii) If {A_1, A_2, . . .} is a countable collection of measurable sets, then A_1 ∪ A_2 ∪ . . . is measurable.
iii) Every interval (open, closed, half open, etc.) is measurable.
Any non-negative, countably additive set function P defined on F such that P(Ω) = 1 is a probability measure.

We can now set the stage for proving that the probability of flipping an infinite number of heads is zero.
Our sample space Ω will consist of sequences of coin flips, such as {H_1, T_2, T_3, H_4, . . .} (the subscripts are used to
distinguish consecutive coin flips), which are countably infinite in length. The start of one such sequence might look
like {H_1, T_2, T_3, H_4, T_5, T_6, H_7, . . .}; however, we will be interested in the subset of all such sequences which at
some point become an infinite sequence of heads.
Now consider the well-known fact that in the interval (0, 1) the decimal expression of every irrational number
consists of a never-repeating sequence of digits, and that every rational in the same interval comes in two forms:
those with periods which repeat infinitely, and those which terminate in all 0's. By also noting that every decimal
expression that ends with all 0's can be replaced by an equivalent one which ends in all 9's (e.g. 0.5 = 0.49999 . . .),
our insight will be to realize that, looking at the base 2 representations of these numbers, we have infinite sequences
of 0's and 1's that never repeat if the number is irrational, while for the rationals we have sequences of 0's and 1's
that repeat forever, and those which terminate in all 0's. By insisting that we represent the latter with all 1's instead
(for example, in base 2, 1/2 = .1000 . . . = .0111 . . .), and by observing that the set of x ∈ (0, 1) expressed in base 2,
with H corresponding to 1 and T to 0, is identical to our above set of all possible infinite sequences of coin flips, we
have a model connecting the probability of flipping an infinite number of heads to the question of picking a subset of
rational numbers from the interval (0, 1). To exploit this connection, we will have to further develop the concept of
measure as applied to subsets of the real line.
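The correspondence can be sketched by encoding finite prefixes of a flip sequence as base-2 fractions, with heads mapped to 1 and tails to 0 as in the proof of Theorem 3 (the helper below is my own):

```python
from fractions import Fraction

def encode(flips):
    """Map a finite flip prefix to a base-2 fraction: H -> 1, T -> 0."""
    return sum(Fraction(1 if f == "H" else 0, 2**i)
               for i, f in enumerate(flips, start=1))

# A sequence that is eventually all heads corresponds to a binary
# expansion ending in repeating 1's, i.e. a rational number:
# T H H H ... = .0111... = .1000... = 1/2 in base 2.
prefix = ["T"] + ["H"] * 40
x = encode(prefix)

# The 41-flip prefix sits exactly 1/2^41 below its limit 1/2.
assert Fraction(1, 2) - x == Fraction(1, 2**41)
```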

Theorem 3. Let H denote the set which contains all sequences of consecutive coin flips which eventually result in
an infinite sequence of heads. Then H is countably infinite.

Proof. For the sake of simplicity, we will refer to the elements of H as sequences {a_1, a_2, . . .} where a_i ∈ {H, T}.
Observe that every element x ∈ H records the result of the n-th coin toss for every n and, since H is the set of all
possible coin-toss sequences which contain an infinite terminal sequence of heads, there is some k such that a_i = H
for all i ≥ k.
To show that H is infinite, just observe that for each n ∈ N there is an element of H which has all tails before
the n-th coin toss and all heads thereafter.
To show that H is countable, let 1 stand for heads and 0 stand for tails; we can form a bijection between H and
the set H∗, where each element of H∗ is the unique binary representation of some number in the interval

3 For the sake of simplicity, we will not define what counts as a Borel set, but observe that most subsets of R one would normally
consider are Borel sets. For a more extensive explanation, see [1, 7].

(0, 1) which ends in a sequence of non-terminating 1’s, and so represents an element of Q. Since Q is countable, so
is H.4 

With this in mind, we will be able to show that the probability of picking any rational from the interval
(0, 1) is 0, which is actually a stronger statement than our original question requires, since H is in bijection with a
subset of Q. In order to do so, we will need a few results from measure theory first.

IV. The Measure of S = {x ∈ (0, 1) | x ∈ Q}. Since our probability will be defined on intervals, rather than
individual elements, we need some sense of how this might be done. First observe that for our interval (0, 1) we
define P(a < X < b) = b − a. This notation may be confusing, but recall our outcomes are no longer individual
elements of (0, 1) but rather subsets of (0, 1), so P(a < X < b) = b − a just says the probability of picking a point
from the interval (a, b) is b − a. Our first step will be to establish which subsets of (0, 1) have probability zero, or,
using the vocabulary of measure theory, which subsets of (0, 1) have measure zero.

Definition. A subset A ⊆ R is said to be a zero set, or to have measure zero, if for every ε > 0 there exists a
countable collection of intervals (a_i, b_i) such that A ⊆ ∪_{i=1}^{∞} (a_i, b_i) and Σ_{i=1}^{∞} (b_i − a_i) ≤ ε.

We can use this definition to show that any finite subset of R has measure zero.

Theorem 4. Any given finite subset of R has measure zero.

Proof. Let A be a finite subset of R that contains n elements, and denote the elements of A by {x_1, x_2, . . . , x_n}.
Let ε > 0. Then A ⊆ ∪_{i=1}^{n} (x_i − ε/2n, x_i + ε/2n), and each interval has length ε/n, so the total length of the
intervals is Σ_{i=1}^{n} ε/n = ε ≤ ε; hence A has measure zero. □
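The covering in the proof of Theorem 4 can be computed explicitly; the six points below are an illustrative finite set:

```python
from fractions import Fraction

def cover(points, eps):
    """Intervals of half-width eps/(2n) around each of n points,
    as in the proof of Theorem 4."""
    half = Fraction(eps) / (2 * len(points))
    return [(x - half, x + half) for x in points]

points = [Fraction(k, 7) for k in range(1, 7)]  # illustrative finite set
eps = Fraction(1, 100)
intervals = cover(points, eps)

# Every point is covered, and the total length is exactly eps.
assert all(a < x < b for x, (a, b) in zip(points, intervals))
assert sum(b - a for a, b in intervals) == eps
```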

What this means is that in some sense we can “corral off” finite subsets of R and show that the probability of
picking an element of any given finite subset of (0, 1) is 0. This is an important consequence of our definition of a
zero set, but since H is countably infinite, we need to extend the result of Theorem 4 to countably infinite sets,
which is precisely what we do next.

Theorem 5. Any countably infinite subset of R has measure zero.

Proof. Let A be a countably infinite subset of R and denote the elements of A by {x_1, x_2, . . .}. Let ε > 0. Then
A ⊆ ∪_{i=1}^{∞} (x_i − ε/2^(i+1), x_i + ε/2^(i+1)). Observe that the length of the i-th interval is
ε/2^(i+1) + ε/2^(i+1) = ε/2^i, hence the total length of all of the intervals is Σ_{i=1}^{∞} ε/2^i = ε ≤ ε, hence A has
measure zero. □
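The shrinking covering can likewise be computed for a finite slice of a countable set, giving the i-th interval half-width ε/2^(i+1) so that its length is ε/2^i (the points below are an illustrative choice):

```python
from fractions import Fraction

def cover(points, eps):
    """Interval of half-width eps/2^(i+1) around the i-th point, so the
    i-th interval has length eps/2^i and the lengths sum below eps."""
    eps = Fraction(eps)
    return [(x - eps / 2**(i + 1), x + eps / 2**(i + 1))
            for i, x in enumerate(points, start=1)]

# A finite slice of an enumeration of rationals in (0, 1) (illustrative).
points = [Fraction(1, q) for q in range(2, 40)]
eps = Fraction(1, 10)
intervals = cover(points, eps)

# Every enumerated point is covered ...
assert all(a < x < b for x, (a, b) in zip(points, intervals))
# ... and the partial sums of eps/2^i stay strictly below eps.
assert sum(b - a for a, b in intervals) < eps
```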
i=1

An easy corollary of Theorem 5 is the following:

Corollary 6. The set Q has measure zero.

Proof. Since Q is a countably infinite subset of R, by Theorem 5, Q has measure zero. 



4 One obvious bijection: observe that each h ∈ H has the form h = {a_1, a_2, . . .} where a_i ∈ {H, T}. Define f : H → R by
f(h) = Σ_{i=1}^{∞} c_i/2^i, where c_i = 1 if a_i = H and c_i = 0 otherwise. We leave it to the reader to show that f is one to one and onto.

While it is not necessary for the proof of our main theorem, we can now prove the following well-known result.

Corollary 7. The set (0, 1) \ Q has measure 1.

Proof. Observe that the measure of (0, 1) is 1 − 0 = 1. Since the measure of the union of two disjoint sets is the sum
of their individual measures5, by Corollary 6 the set (0, 1) \ Q has measure 1. □

We can now prove that the probability of flipping an infinite sequence of heads is zero.

Theorem 8. The probability of flipping an infinite sequence of heads is 0.

Proof. Let (Ω, F, P) be a probability space where Ω = (0, 1) and F is a sigma-algebra of Borel subsets of Ω. Let H∗
be the subset of Ω defined in Theorem 3. Since H∗ ⊆ Q and, by Corollary 6, Q has measure zero, every subset of Q
has measure zero, so P(H∗) = 0. By relating H∗ to H via the bijection of Theorem 3, we conclude that the probability
of flipping an infinite sequence of heads is zero. □

References
[1] Tom M Apostol. Calculus, volume II. Blaisdell Pub, 1961.
[2] Santiago Cañez. The Riemann-Lebesgue theorem. http://math.berkeley.edu/scanez/courses/math104/fall11/handouts/riemann-lebesgue.pdf.
[3] William Feller. An Introduction to Probability Theory and Its Applications. Wiley, second edition, 1957.
[4] Richard Isaac. The Pleasures of Probability. Springer-Verlag, 1995.
[5] Frank Jones. Lebesgue Integration on Euclidean Space. Jones and Bartlett, 1993.
[6] Terence Tao. An Introduction to Measure Theory. American Mathematical Society, 2011.
[7] Warren Weaver. Lady Luck : The Theory of Probability. Dover, 1982.

San Antonio College


E-mail address: [email protected]

5 We have not proven this, but the reader may consult [5, 2, 6] for elementary proofs.
