Continuous Random Variables: Scott Sheffield

This document outlines a lecture on continuous random variables. It begins by defining a continuous random variable as one that has a probability density function f(x) such that the probability of an event B is given by the integral of f(x) over B. It then provides examples of computing probabilities for simple continuous distributions like the uniform distribution on [0,2]. Finally, it discusses how to define the expectation of continuous random variables using integrals of x*f(x), analogous to the discrete case.

18.600: Lecture 17
Continuous random variables

Scott Sheffield

MIT
Outline

Continuous random variables

Expectation and variance of continuous random variables

Uniform random variable on [0, 1]

Uniform random variable on [α, β]

Measurable sets and a famous paradox


Continuous random variables

- Say X is a continuous random variable if there exists a probability density function f = f_X on ℝ such that P{X ∈ B} = ∫_B f(x) dx := ∫ 1_B(x) f(x) dx.
- We may assume ∫_ℝ f(x) dx = ∫_{-∞}^{∞} f(x) dx = 1 and f is non-negative.
- The probability of an interval [a, b] is ∫_a^b f(x) dx, the area under f between a and b.
- The probability of any single point is zero.
- Define the cumulative distribution function F(a) = F_X(a) := P{X < a} = P{X ≤ a} = ∫_{-∞}^a f(x) dx.
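The definition is easy to test numerically. A minimal sketch, not part of the lecture: it approximates P{X ∈ [a, b]} = ∫_a^b f(x) dx with a midpoint Riemann sum, using the uniform density on [0, 2] as an illustrative choice (the function names are ours).

```python
def f(x):
    """Illustrative density: the uniform distribution on [0, 2]."""
    return 0.5 if 0 <= x <= 2 else 0.0

def prob_interval(f, a, b, n=100_000):
    """Midpoint Riemann-sum approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

print(prob_interval(f, -1, 3))   # total mass: approximately 1
print(prob_interval(f, 0, 1.5))  # P{0 <= X <= 1.5}: approximately 0.75
```

The same helper works for any density that is Riemann integrable on the chosen interval.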
Simple example

- Suppose f(x) = 1/2 for x ∈ [0, 2] and f(x) = 0 for x ∉ [0, 2].
- What is P{X < 3/2}?
- What is P{X = 3/2}?
- What is P{1/2 < X < 3/2}?
- What is P{X ∈ (0, 1) ∪ (3/2, 5)}?
- What is F?
- F(a) = F_X(a) = 0 for a ≤ 0, a/2 for 0 < a < 2, and 1 for a ≥ 2.
- In general, P(a ≤ X ≤ b) = F(b) - F(a).
- We say that X is uniformly distributed on [0, 2].
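The questions above can be answered mechanically from the CDF. A small sketch: F is the a/2 rule from this slide, and the clipping trick is our implementation choice, not the lecture's.

```python
def F(a):
    """CDF of the uniform distribution on [0, 2]: 0 for a <= 0, a/2 on (0, 2), 1 for a >= 2."""
    return min(max(a / 2.0, 0.0), 1.0)

p_lt = F(1.5)                              # P{X < 3/2} = 3/4
p_eq = F(1.5) - F(1.5)                     # P{X = 3/2} = 0: single points carry no mass
p_mid = F(1.5) - F(0.5)                    # P{1/2 < X < 3/2} = 1/2
p_union = (F(1) - F(0)) + (F(5) - F(1.5))  # P{X in (0,1) or (3/2,5)} = 1/2 + 1/4 = 3/4
print(p_lt, p_eq, p_mid, p_union)
```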
Another example

- Suppose f(x) = x/2 for x ∈ [0, 2] and f(x) = 0 for x ∉ [0, 2].
- What is P{X < 3/2}?
- What is P{X = 3/2}?
- What is P{1/2 < X < 3/2}?
- What is F?
- F_X(a) = 0 for a ≤ 0, a²/4 for 0 < a < 2, and 1 for a ≥ 2.

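As a quick check of this CDF (a sketch of ours, not from the slides), evaluating F(a) = a²/4 answers the questions above:

```python
def F(a):
    """CDF for the density f(x) = x/2 on [0, 2]: F(a) = a**2 / 4 there."""
    if a <= 0:
        return 0.0
    if a >= 2:
        return 1.0
    return a * a / 4.0

print(F(1.5))           # P{X < 3/2} = 9/16
print(F(1.5) - F(0.5))  # P{1/2 < X < 3/2} = 9/16 - 1/16 = 1/2
```

P{X = 3/2} is again zero, since F is continuous.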
Outline

Continuous random variables

Expectation and variance of continuous random variables

Uniform random variable on [0, 1]

Uniform random variable on [α, β]

Measurable sets and a famous paradox


Expectations of continuous random variables

- Recall that when X was a discrete random variable, with p(x) = P{X = x}, we wrote E[X] = Σ_{x : p(x) > 0} p(x) x.
- How should we define E[X] when X is a continuous random variable?
- Answer: E[X] = ∫ f(x) x dx.
- Recall likewise that in the discrete case we wrote E[g(X)] = Σ_{x : p(x) > 0} p(x) g(x).
- What is the analog when X is a continuous random variable?
- Answer: we will write E[g(X)] = ∫ f(x) g(x) dx.
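The formula E[g(X)] = ∫ f(x) g(x) dx can be sanity-checked numerically. A hedged sketch, using the density f(x) = x/2 on [0, 2] from the earlier example (for which E[X] = 4/3 and E[X²] = 2 by direct integration); the helper name `expect` is ours.

```python
def f(x):
    """Density from the 'Another example' slide: f(x) = x/2 on [0, 2], zero elsewhere."""
    return x / 2.0 if 0 <= x <= 2 else 0.0

def expect(g, n=200_000):
    """Midpoint Riemann-sum approximation of E[g(X)] = integral of f(x) * g(x) over [0, 2]."""
    h = 2.0 / n
    return h * sum(f((i + 0.5) * h) * g((i + 0.5) * h) for i in range(n))

print(expect(lambda x: x))      # E[X] = 4/3, approximately
print(expect(lambda x: x * x))  # E[X^2] = 2, approximately
```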
Variance of continuous random variables

- Suppose X is a continuous random variable with mean μ.
- We can write Var[X] = E[(X - μ)²], same as in the discrete case.
- Next, if g = g₁ + g₂ then E[g(X)] = ∫ g₁(x) f(x) dx + ∫ g₂(x) f(x) dx = ∫ (g₁(x) + g₂(x)) f(x) dx = E[g₁(X)] + E[g₂(X)].
- Furthermore, E[a g(X)] = a E[g(X)] when a is a constant.
- Just as in the discrete case, we can expand the variance expression as Var[X] = E[X² - 2μX + μ²] and use additivity of expectation to say that Var[X] = E[X²] - 2μ E[X] + E[μ²] = E[X²] - 2μ² + μ² = E[X²] - E[X]².
- This formula is often useful for calculations.
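The identity Var[X] = E[X²] - E[X]² is easy to check numerically. Below, both sides are computed for the density f(x) = x/2 on [0, 2] (our choice of example, not dictated by the slide), where μ = 4/3 and the variance is 2 - 16/9 = 2/9:

```python
def f(x):
    """Density f(x) = x/2 on [0, 2], zero elsewhere."""
    return x / 2.0 if 0 <= x <= 2 else 0.0

def expect(g, n=200_000):
    """Midpoint Riemann-sum approximation of E[g(X)] for the density above."""
    h = 2.0 / n
    return h * sum(f((i + 0.5) * h) * g((i + 0.5) * h) for i in range(n))

mu = expect(lambda x: x)                         # E[X] = 4/3
var_def = expect(lambda x: (x - mu) ** 2)        # definition: E[(X - mu)^2]
var_formula = expect(lambda x: x * x) - mu ** 2  # shortcut: E[X^2] - E[X]^2
print(var_def, var_formula)                      # both approximately 2/9
```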
Outline

Continuous random variables

Expectation and variance of continuous random variables

Uniform random variable on [0, 1]

Uniform random variable on [α, β]

Measurable sets and a famous paradox


Uniform random variables on [0, 1]

- Suppose X is a random variable with probability density function f(x) = 1 for x ∈ [0, 1] and f(x) = 0 for x ∉ [0, 1].
- Then for any 0 ≤ a ≤ b ≤ 1 we have P{X ∈ [a, b]} = b - a.
- Intuition: all locations along the interval [0, 1] are equally likely.
- Say that X is a uniform random variable on [0, 1] or that X is sampled uniformly from [0, 1].
Properties of uniform random variable on [0, 1]

- Suppose X is a random variable with probability density function f(x) = 1 for x ∈ [0, 1] and f(x) = 0 for x ∉ [0, 1], which implies F_X(a) = 0 for a < 0, a for a ∈ [0, 1], and 1 for a > 1.
- What is E[X]?
- Guess 1/2 (since 1/2 is, you know, in the middle).
- Indeed, ∫ f(x) x dx = ∫_0^1 x dx = [x²/2]_0^1 = 1/2.
- What is the general moment E[X^k] for k ≥ 0?
- Answer: 1/(k + 1).
- What would you guess the variance is? Expected square of distance from 1/2?
- It's obviously less than 1/4, but how much less?
- Var[X] = E[X²] - E[X]² = 1/3 - 1/4 = 1/12.
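The moment formula E[X^k] = 1/(k + 1) and the variance 1/12 can be confirmed numerically; a small sketch (ours, not from the slides) approximating ∫_0^1 x^k dx with a midpoint Riemann sum:

```python
def moment(k, n=100_000):
    """Midpoint Riemann-sum approximation of E[X^k] = integral of x^k over [0, 1]."""
    h = 1.0 / n
    return h * sum(((i + 0.5) * h) ** k for i in range(n))

for k in range(5):
    print(k, moment(k), 1 / (k + 1))  # the two columns agree closely

print(moment(2) - moment(1) ** 2)     # Var[X] = 1/3 - 1/4 = 1/12, approximately
```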
Outline

Continuous random variables

Expectation and variance of continuous random variables

Uniform random variable on [0, 1]

Uniform random variable on [α, β]

Measurable sets and a famous paradox


Uniform random variables on [α, β]

- Fix α < β and suppose X is a random variable with probability density function f(x) = 1/(β - α) for x ∈ [α, β] and f(x) = 0 for x ∉ [α, β].
- Then for any α ≤ a ≤ b ≤ β we have P{X ∈ [a, b]} = (b - a)/(β - α).
- Intuition: all locations along the interval [α, β] are equally likely.
- Say that X is a uniform random variable on [α, β] or that X is sampled uniformly from [α, β].
Uniform random variables on [α, β]

- Suppose X is a random variable with probability density function f(x) = 1/(β - α) for x ∈ [α, β] and f(x) = 0 for x ∉ [α, β].
- What is E[X]?
- Intuitively, we'd guess the midpoint (α + β)/2.
- What's the cleanest way to prove this?
- One approach: let Y be uniform on [0, 1] and try to show that X = (β - α)Y + α is uniform on [α, β].
- Then expectation linearity gives E[X] = (β - α)E[Y] + α = (1/2)(β - α) + α = (α + β)/2.
- Using similar logic, what is the variance Var[X]?
- Answer: Var[X] = Var[(β - α)Y + α] = Var[(β - α)Y] = (β - α)² Var[Y] = (β - α)²/12.
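The affine-transform argument translates directly into code. A hedged sketch with illustrative values α = 1, β = 4 (our choice), using the known E[Y] = 1/2 and Var[Y] = 1/12 for Y uniform on [0, 1], plus a direct Riemann-sum cross-check of the mean:

```python
def uniform_mean_var(alpha, beta):
    """Mean and variance of X = (beta - alpha) * Y + alpha, with Y uniform on [0, 1].

    Linearity gives E[X] = (beta - alpha) * E[Y] + alpha = (alpha + beta) / 2,
    and scaling gives Var[X] = (beta - alpha)**2 * Var[Y] = (beta - alpha)**2 / 12.
    """
    e_y, var_y = 0.5, 1.0 / 12.0
    return (beta - alpha) * e_y + alpha, (beta - alpha) ** 2 * var_y

def direct_mean(alpha, beta, n=100_000):
    """E[X] computed directly as the integral of x / (beta - alpha) over [alpha, beta]."""
    h = (beta - alpha) / n
    return h * sum((alpha + (i + 0.5) * h) / (beta - alpha) for i in range(n))

print(uniform_mean_var(1, 4))  # midpoint 2.5 and variance 9/12 = 0.75
print(direct_mean(1, 4))       # the integral agrees with the midpoint
```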
Outline

Continuous random variables

Expectation and variance of continuous random variables

Uniform random variable on [0, 1]

Uniform random variable on [α, β]

Measurable sets and a famous paradox


Uniform measure: is probability defined for all subsets?

- One of the very simplest probability density functions is f(x) = 1 for x ∈ [0, 1] and f(x) = 0 for x ∉ [0, 1].
- If B ⊂ [0, 1] is an interval, then P{X ∈ B} is the length of that interval.
- Generally, if B ⊂ [0, 1] then P{X ∈ B} = ∫_B 1 dx = ∫ 1_B(x) dx is the total volume, or total length, of the set B.
- What if B is the set of all rational numbers?
- How do we mathematically define the volume of an arbitrary set B?
Idea behind the paradox

- Hypothetical: Consider the interval [0, 1) with the two endpoints glued together (so it looks like a circle). What if we could partition [0, 1) into a countably infinite collection of disjoint sets that all looked the same (up to a rotation of the circle) and thus had to have the same probability?
- If that probability were zero, then (by countable additivity) the probability of the whole circle would be zero, a contradiction.
- But if that probability were a number greater than zero, the probability of the whole circle would be infinite, also a contradiction...
- Related problem: if (in a non-atomic world, where mass was infinitely divisible) you could cut a cake into countably infinitely many pieces, all of the same weight, how much would each piece weigh?
- Question: Is it really possible to partition [0, 1) into countably many identical (up to rotation) pieces?
Cutting things into identical slices: a warmup problem

- Consider the set of numbers {0, 1, 2, ..., 99}.
- Let's suggest one fancy way to divide this set into ten equal subsets that are translations of each other modulo 100.
- Two numbers are equivalent modulo 10 if their difference is a multiple of 10 (so they end in the same digit). Pick a set S ⊂ {0, 1, 2, ..., 99} with one number from each equivalence class, e.g., S = {40, 21, 42, 53, 94, 5, 76, 27, 28, 39}.
- Then for each j ∈ {0, 10, 20, ..., 90} define the set S_j = {s + j : s ∈ S}, where addition is modulo 100.
- Now observe that every number in {0, 1, 2, ..., 99} lies in exactly one of the ten S_j sets we have defined.
- On the next slide, we're going to do something similar with [0, 1) in place of {0, 1, 2, ..., 99} and the rational numbers in [0, 1) in place of {0, 10, 20, ..., 90}.
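The warmup construction is easy to verify by brute force; a short sketch using the example S from the slide:

```python
# S holds one representative of each residue class mod 10.
S = [40, 21, 42, 53, 94, 5, 76, 27, 28, 39]

# The ten translates S_j = {(s + j) mod 100 : s in S}, j = 0, 10, ..., 90.
translates = [{(s + j) % 100 for s in S} for j in range(0, 100, 10)]

union = set().union(*translates)
print(len(union))                       # 100: every number 0..99 is covered
print(sum(len(t) for t in translates))  # 100: so the ten sets are pairwise disjoint
```

Covering all 100 numbers with ten 10-element sets forces the sets to be disjoint, which is exactly the claimed partition.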
Formulating the paradox precisely

- Consider the wrap-around translations τ_r(x) = (x + r) mod 1.
- We expect τ_r(B) to have the same probability as B.
- Call x, y equivalent modulo rationals if x - y is rational (e.g., x = π - 3 and y = π - 9/4). An equivalence class is the set of points in [0, 1) equivalent to some given point.
- There are uncountably many of these classes.
- Let A ⊂ [0, 1) contain one point from each class. For each x ∈ [0, 1), there is one a ∈ A such that r = x - a (mod 1) is rational.
- Then each x in [0, 1) lies in τ_r(A) for one rational r ∈ [0, 1).
- Thus [0, 1) = ∪_r τ_r(A) as r ranges over the rationals in [0, 1).
- If P(A) = 0, then P(S) = Σ_r P(τ_r(A)) = 0. If P(A) > 0, then P(S) = Σ_r P(τ_r(A)) = ∞. Either way we contradict the axiom P(S) = 1 for the sample space S = [0, 1).
Three ways to get around this

- 1. Re-examine the axioms of mathematics: the very existence of a set A with one element from each equivalence class is a consequence of the so-called axiom of choice. Removing that axiom makes the paradox go away, since one can just suppose (pretend?) these kinds of sets don't exist.
- 2. Re-examine the axioms of probability: Replace countable additivity with finite additivity? (Doesn't fully solve the problem: look up Banach-Tarski.)
- 3. Keep the axiom of choice and countable additivity but don't define probabilities of all sets: Instead of defining P(B) for every subset B of the sample space, restrict attention to a family of so-called measurable sets.
- Most mainstream probability and analysis takes the third approach.
- In practice, the sets we care about (e.g., countable unions of points and intervals) tend to be measurable.
Perspective

- More advanced courses in probability and analysis (such as 18.125 and 18.175) spend a significant amount of time rigorously constructing a class of so-called measurable sets and the so-called Lebesgue measure, which assigns a real number (a measure) to each of these sets.
- These courses also replace the Riemann integral with the so-called Lebesgue integral.
- We will not treat these topics any further in this course.
- We usually limit our attention to probability density functions f and sets B for which the ordinary Riemann integral ∫ 1_B(x) f(x) dx is well defined.
- Riemann integration is a mathematically rigorous theory. It's just not as robust as Lebesgue integration.
