Merged Lectures
18.600: Lecture 1
Permutations and combinations, Pascal's triangle, learning to count
Scott Sheffield
MIT

Outline: permutations, problems
Politics

- Suppose that, in some election, betting markets place the probability that your favorite candidate will be elected at 58 percent. The price of a contract that pays 100 dollars if your candidate wins is 58 dollars.
- Market seems to say that your candidate will probably win, if "probably" means with probability greater than .5.
- The price of such a contract may fluctuate in time. Let X(t) denote the price at time t.
- Suppose X(t) is known to vary continuously in time. What is the probability p that it reaches 59 before 57?
- If p > .5, we can make money in expectation by buying at 58 and selling when the price hits 57 or 59.
- If p < .5, we can sell at 58 and buy when the price hits 57 or 59.
- Efficient market hypothesis (a.k.a. the "no free money just lying around" hypothesis) suggests p = .5 (with some caveats...).
- Natural model for prices: repeatedly toss a coin, adding 1 for heads and subtracting 1 for tails, until the price hits 0 or 100. (A simulation of this model appears below.)

Which of these statements is probably true?

1. X(t) will go below 50 at some future point.
2. X(t) will get all the way below 20 at some point.
3. X(t) will reach both 70 and 30, at different future times.
4. X(t) will reach both 65 and 35 at different future times.
5. X(t) will hit 65, then 50, then 60, then 55.

- Answers: 1, 2, 4. Full explanations coming toward the end of the course.
- Problem sets in this course explore applications of probability to politics, medicine, finance, economics, science, engineering, philosophy, dating, etc. Stories motivate the math and make it easier to remember.
- Provocative question: what simple advice, that would greatly benefit humanity, are we unaware of? Foods to avoid? Exercises to do? Books to read? How would we know?
- Let's start with easier questions.
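As a quick illustration (my own sketch, not part of the original slides): under the coin-toss price model started at 58, the fraction of walks that reach 100 before 0 comes out near .58, matching the quoted contract price.

```python
import random

def hits_100_before_0(start=58, trials=5_000):
    """Fraction of symmetric +/-1 coin-toss walks from `start` that reach 100 before 0."""
    wins = 0
    for _ in range(trials):
        x = start
        while 0 < x < 100:
            x += random.choice((-1, 1))
        wins += (x == 100)
    return wins / trials

print(hits_100_before_0())  # close to 0.58
```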
Permutation notation

- There are n ways to assign a hat to the first person. No matter what choice I make, there will remain n - 1 ways to assign a hat to the second person. No matter what choice I make there, there will remain n - 2 ways to assign a hat to the third person, etc.
- This is a useful trick: break a counting problem into a sequence of stages so that one always has the same number of choices to make at each stage. Then the total count becomes a product of the number of choices available at each stage. (A small sanity check in code follows.)
- Easy to make mistakes. For example, maybe in your problem, the number of choices at one stage actually does depend on choices made during earlier stages.
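To make the stage-by-stage product concrete, here is a tiny Python check (my own illustration, not from the slides): the number of ways to hand n distinct hats back to n people, counted by brute force, matches n! = n(n-1)...1.

```python
import math
from itertools import permutations

def count_assignments(n):
    """Brute-force count of ways to assign n distinct hats to n people."""
    return sum(1 for _ in permutations(range(n)))

for n in range(1, 7):
    assert count_assignments(n) == math.factorial(n)
print("brute-force count matches n! for n = 1..6")
```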
Binomial coefficient notation; more problems

- How many five-card hands have four cards of one suit and one card of another suit?
- Answer: 4 C(13,4) * 3 C(13,1), where C(n,k) denotes the binomial coefficient "n choose k".
18.600: Lecture 2
Multinomial coefficients and more counting problems
Scott Sheffield
MIT

Outline: multinomial coefficients, integer partitions, more problems
Multinomial coefficients

- In general, if you have n elements you wish to divide into r distinct piles of sizes n1, n2, ..., nr, how many ways are there to do that?
- Answer: the multinomial coefficient (n choose n1, n2, ..., nr) := n! / (n1! n2! ... nr!).
- The sixteen products below list every way of choosing Ai or Bi from each of four factors:
  A1 A2 A3 A4 + A1 A2 A3 B4 + A1 A2 B3 A4 + A1 A2 B3 B4 +
  A1 B2 A3 A4 + A1 B2 A3 B4 + A1 B2 B3 A4 + A1 B2 B3 B4 +
  B1 A2 A3 A4 + B1 A2 A3 B4 + B1 A2 B3 A4 + B1 A2 B3 B4 +
  B1 B2 A3 A4 + B1 B2 A3 B4 + B1 B2 B3 A4 + B1 B2 B3 B4
- What happens to this sum if we erase subscripts?
- (A + B)^4 = B^4 + 4AB^3 + 6A^2B^2 + 4A^3B + A^4. The coefficient of A^2B^2 is 6 because 6 length-4 sequences have 2 As and 2 Bs.
- Generally, (A + B)^n = sum_{k=0}^{n} C(n,k) A^k B^(n-k), because there are C(n,k) sequences with k As and (n - k) Bs.
Integer partitions (stars and bars)

- How many sequences a1, ..., ak of non-negative integers satisfy a1 + a2 + ... + ak = n?
- Answer: C(n + k - 1, n). Represent each solution as a string of n stars and k - 1 bars; the stars between consecutive bars give the values of the ai. (See the check below.)
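As an illustration (mine, not from the slides), a brute-force count of non-negative integer solutions of a1 + ... + ak = n can be compared against C(n + k - 1, n):

```python
import math
from itertools import product

def count_solutions(n, k):
    """Count non-negative integer k-tuples summing to n by brute force."""
    return sum(1 for a in product(range(n + 1), repeat=k) if sum(a) == n)

for n, k in [(5, 3), (6, 4), (4, 2)]:
    assert count_solutions(n, k) == math.comb(n + k - 1, n)
print("brute-force counts match C(n + k - 1, n)")
```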
18.600: Lecture 3
What is probability?
Scott Sheffield
MIT

Outline: sample space, DeMorgan's laws, axioms of probability
Even more fundamental question: defining a set of possible outcomes

- Roll a die n times. Define a sample space to be {1, 2, 3, 4, 5, 6}^n, i.e., the set of sequences a1, ..., an with each aj in {1, 2, 3, 4, 5, 6}.
- Shuffle a standard deck of cards. The sample space is the set of 52! permutations.
- Will it rain tomorrow? The sample space is {R, N}, which stand for "rain" and "no rain".
- Randomly throw a dart at a board. The sample space is the set of points on the board.

Event: subset of the sample space

- If a set A is comprised of some of the elements of B, say A is a subset of B and write A ⊆ B.
- Similarly, B ⊇ A means A is a subset of B (or B is a superset of A).
- If S is a finite sample space with n elements, then there are 2^n subsets of S.
- Denote by ∅ the set with no elements.
Intersections, unions, complements; Venn diagrams

- A Venn diagram for two events A and B divides the sample space into the four regions AB, A B^c, A^c B, and A^c B^c.

DeMorgan's laws

- "It will not snow or rain" means "It will not snow and it will not rain."
- If S is the event that it snows and R is the event that it rains, then (S ∪ R)^c = S^c ∩ R^c.
- More generally: (∪_{i=1}^n E_i)^c = ∩_{i=1}^n (E_i)^c.
- "It will not both snow and rain" means "Either it will not snow or it will not rain."
- (S ∩ R)^c = S^c ∪ R^c.
- (∩_{i=1}^n E_i)^c = ∪_{i=1}^n (E_i)^c.
18.600: Lecture 4
Axioms of probability and inclusion-exclusion
Scott Sheffield
MIT

Outline: axioms of probability, consequences of axioms, inclusion-exclusion
Axioms of probability

- P(A) ∈ [0, 1] for all events A ⊆ S.
- P(S) = 1.
- Finite additivity: P(A ∪ B) = P(A) + P(B) if A ∩ B = ∅.
- Countable additivity: P(∪_{i=1}^∞ E_i) = Σ_{i=1}^∞ P(E_i) if the E_i are disjoint (E_i ∩ E_j = ∅ for each pair i ≠ j).

Axiom breakdown
- Neurological: "When I think it will rain tomorrow, the truth-sensing part of my brain exhibits 30 percent of its maximum electrical activity." Should have P(A) ∈ [0, 1], maybe P(S) = 1, but not necessarily P(A ∪ B) = P(A) + P(B) when A ∩ B = ∅.
- Frequentist: P(A) is the fraction of times A occurred during the previous (large number of) times we ran the experiment. Seems to satisfy the axioms...
- Market preference ("risk neutral probability"): P(A) is the price of a contract paying a dollar if A occurs divided by the price of a contract paying a dollar regardless. Seems to satisfy the axioms, assuming no arbitrage, no bid-ask spread, complete market...
- Personal belief: P(A) is the amount such that I'd be indifferent between a contract paying 1 if A occurs and a contract paying P(A) no matter what. Seems to satisfy the axioms with some notion of utility units, strong assumption of rationality...
18.600: Lecture 5
Problems with all outcomes equally likely, including a famous hat problem
Scott Sheffield
MIT

Outline: equal likelihood, a few problems, hat problem
Problems

- Roll two dice. What is the probability that their sum is three?
- Answer: 2/36 = 1/18.
- Toss eight coins. What is the probability that exactly five of them are heads?
- Answer: C(8,5) / 2^8.
- In a class of 100 people with cell phone numbers, what is the probability that nobody has a number ending in 37?
- Answer: (99/100)^100 ≈ 1/e.
- Roll ten dice. What is the probability that a 6 appears on exactly five of the dice?
- Answer: C(10,5) 5^5 / 6^10.
Inclusion-exclusion

- P(E_1 ∪ ... ∪ E_n) = Σ_{i=1}^n P(E_i) − Σ_{i1<i2} P(E_{i1} E_{i2}) + ...
  + (−1)^{r+1} Σ_{i1<i2<...<ir} P(E_{i1} E_{i2} ... E_{ir})
  + ... + (−1)^{n+1} P(E_1 E_2 ... E_n).
- The notation Σ_{i1<i2<...<ir} means a sum over all of the C(n,r) subsets of size r of the set {1, 2, ..., n}.

Hat problem

- n people toss hats into a bin, randomly shuffle, return one hat to each person. Find the probability that nobody gets their own hat. (A quick simulation appears after this list.)
- Inclusion-exclusion. Let E_i be the event that the ith person gets their own hat.
- What is P(E_{i1} E_{i2} ... E_{ir})? Answer: (n − r)!/n!.
- There are C(n,r) terms like that in the inclusion-exclusion sum. What is C(n,r) (n − r)!/n!? Answer: 1/r!.
- P(∪_{i=1}^n E_i) = 1 − 1/2! + 1/3! − 1/4! + ... ± 1/n!.
- 1 − P(∪_{i=1}^n E_i) = 1 − 1 + 1/2! − 1/3! + 1/4! − ... ± 1/n! ≈ 1/e ≈ .36788.
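A quick empirical check (my own sketch, not from the slides): shuffle n hats many times and count how often nobody gets their own hat; the fraction should approach 1/e ≈ .36788.

```python
import math
import random

def nobody_gets_own_hat(n, trials=200_000):
    """Estimate the probability that a random permutation of n hats has no fixed point."""
    count = 0
    hats = list(range(n))
    for _ in range(trials):
        random.shuffle(hats)
        if all(hats[i] != i for i in range(n)):
            count += 1
    return count / trials

print(nobody_gets_own_hat(10), "vs", 1 / math.e)
```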
Problems

- What's the probability of a full house in poker (i.e., in a five-card hand, two cards have one value and three have another)?
- Answer 1: (# ordered distinct-five-card sequences giving a full house) / (# ordered distinct-five-card sequences). That's C(5,2) * 13 * 12 * (4*3*2) * (4*3) / (52*51*50*49*48) = 6/4165.
- Answer 2: (# unordered distinct-five-card sets giving a full house) / (# unordered distinct-five-card sets). That's 13 * 12 * C(4,3) * C(4,2) / C(52,5) = 6/4165. (A numerical check follows below.)
- What is the probability of a two-pair hand in poker?
- Fix the suit breakdown, then the face values: C(4,2)^2 * C(13,2) * 44 / C(52,5), where 44 counts the choices for the fifth card (it must not share a value with either pair).
- How about a bridge hand with 3 cards of one suit, 3 of another suit, 2 of a third suit, and 5 of the last suit?
- Answer: C(4,2) * 2 * C(13,3) * C(13,3) * C(13,2) * C(13,5) / C(52,13).
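The closed-form answers are easy to check numerically; here is a small Python verification (my own, not from the slides) of the full-house probability 6/4165 and the two-pair probability, using exact binomial coefficients:

```python
from fractions import Fraction
from math import comb

# Full house: choose the value of the triple, the value of the pair,
# then the specific suits for each.
full_house = Fraction(13 * comb(4, 3) * 12 * comb(4, 2), comb(52, 5))
print(full_house)            # 6/4165
print(float(full_house))     # about 0.00144

# Two pair: two pair values, two suits for each pair, then any fifth card
# whose value differs from both pairs (44 choices).
two_pair = Fraction(comb(13, 2) * comb(4, 2) ** 2 * 44, comb(52, 5))
print(two_pair, float(two_pair))
```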
18.600: Lecture 6
Conditional probability
MIT

Outline: definition of the probability of A given B, multiplication rule, examples

Examples

- Given that your friend has exactly two children, one of whom is a son born on a Tuesday, what is the probability the second child is a son?
- Make the obvious (though not quite correct) assumptions: every child is either a boy or a girl, and equally likely to be either one, and all days of the week for birth are equally likely, etc.
- Make a state space matrix of 196 = 14 × 14 elements.
- Easy to see the answer is 13/27.
18.600: Lecture 7
Bayes' formula and independence
Scott Sheffield
MIT

Outline: Bayes' formula, independence

Dividing into two cases, and Bayes' formula

- P(E) = P(EF) + P(EF^c) = P(E|F)P(F) + P(E|F^c)P(F^c).
- In words: want to know the probability of E. There are two scenarios, F and F^c. If I know the probabilities of the two scenarios and the probability of E conditioned on each scenario, I can work out the probability of E.
- Bayes' theorem/law/rule states the following: P(A|B) = P(B|A)P(A)/P(B).
- Follows from the definition of conditional probability: P(AB) = P(B)P(A|B) = P(A)P(B|A).
- Tells how to update the estimate of the probability of A when new evidence restricts your sample space to B.
- So P(A|B) is P(B|A)/P(B) times P(A).
- The ratio P(B|A)/P(B) determines how compelling the new evidence is.
- What does it mean if the ratio is zero? What if the ratio is 1/P(A)?
- Example: D = have disease, T = positive test.
- If P(D) = p, P(T|D) = .9, and P(T|D^c) = .1, then P(T) = .9p + .1(1 − p).
- What is P(D|T)? (See the computation below.)
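A hedged numeric illustration (mine, not from the slides): plugging the slide's numbers into Bayes' formula for a few hypothetical priors p shows how P(D|T) depends on the disease prevalence.

```python
def posterior(p, p_t_given_d=0.9, p_t_given_not_d=0.1):
    """P(D|T) via Bayes' formula for prior P(D) = p."""
    p_t = p_t_given_d * p + p_t_given_not_d * (1 - p)
    return p_t_given_d * p / p_t

for p in (0.01, 0.1, 0.5):
    print(f"P(D) = {p:.2f}  ->  P(D|T) = {posterior(p):.3f}")
```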
Independence

- Say E and F are independent if P(EF) = P(E)P(F).
- Equivalent statement: P(E|F) = P(E). Also equivalent: P(F|E) = P(F).
- Example: toss two coins. The sample space contains four equally likely elements (H, H), (H, T), (T, H), (T, T).
- Is the event that the first coin is heads independent of the event that the second coin is heads?
- Yes: the probability of each event is 1/2 and the probability of both is 1/4.
- Is the event that the first coin is heads independent of the event that the number of heads is odd?
- Yes: the probability of each event is 1/2 and the probability of both is 1/4... despite the fact that (in everyday English usage of the word) oddness of the number of heads "depends" on the first coin.

Independence of multiple events

- Say E_1, ..., E_n are independent if for each {i1, i2, ..., ik} ⊆ {1, 2, ..., n} we have P(E_{i1} E_{i2} ... E_{ik}) = P(E_{i1}) P(E_{i2}) ... P(E_{ik}).
- In other words, the product rule works.
- Independence implies P(E_1 E_2 E_3 | E_4 E_5 E_6) = P(E_1)P(E_2)P(E_3)P(E_4)P(E_5)P(E_6) / (P(E_4)P(E_5)P(E_6)) = P(E_1 E_2 E_3), and other similar statements.
- Does pairwise independence imply independence?
- No. Consider these three events: first coin heads, second coin heads, odd number of heads. Pairwise independent, not independent. (A check of this example in code follows.)
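A small Python check of the counterexample (my own sketch): over the four equally likely outcomes of two coin tosses, the three events are pairwise independent but the triple product rule fails.

```python
from itertools import product

outcomes = list(product("HT", repeat=2))  # four equally likely outcomes

def prob(event):
    return sum(1 for w in outcomes if event(w)) / len(outcomes)

A = lambda w: w[0] == "H"            # first coin heads
B = lambda w: w[1] == "H"            # second coin heads
C = lambda w: w.count("H") % 2 == 1  # odd number of heads

pairs = [(A, B), (A, C), (B, C)]
print([abs(prob(lambda w: e(w) and f(w)) - prob(e) * prob(f)) < 1e-12 for e, f in pairs])
print(prob(lambda w: A(w) and B(w) and C(w)), "vs", prob(A) * prob(B) * prob(C))
```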
18.600: Lecture 8
Discrete random variables
MIT

Outline: defining random variables, probability mass function and distribution function, recursions, indicators

- In this example, X is called a Poisson random variable with intensity λ.
- Question: what is the state space in this example?
- Answer: didn't specify. One possibility would be to define the state space as S = {0, 1, 2, ...} and define X (as a function on S) by X(j) = j. The probability function would be determined by P{k} = λ^k e^{−λ} / k! for each k in S.
- Are there other choices of S and P, and other functions X from S to R, for which the values of P{X = k} are the same?
- Yes. "X is a Poisson random variable with intensity λ" is a statement only about the probability mass function of X.
Using Bayes' rule to set up recursions

- Gambler one has a positive integer m dollars, gambler two has a positive integer n dollars. They take turns making one dollar bets until one runs out of money. What is the probability that the first gambler runs out of money first?
- Answer: n/(m + n).
- "Gambler's ruin": what if gambler one has an unlimited amount of money?
- Gambler one wins eventually with probability one.
- "Problem of points": in a sequence of independent fair coin tosses, what is the probability P_{n,m} of seeing n heads before seeing m tails? (A recursion check appears below.)
- Observe: P_{n,m} is the probability of having n or more heads in the first m + n − 1 trials.
- The probability of exactly n heads in m + n − 1 fair tosses is C(m+n−1, n) / 2^{m+n−1}.
- Famous correspondence by Fermat and Pascal. Led Pascal to write Le Triangle Arithmétique.
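As a sketch (mine, not from the slides), the recursion P_{n,m} = (P_{n−1,m} + P_{n,m−1})/2 with boundary values P_{0,m} = 1 and P_{n,0} = 0 can be checked against the closed form "n or more heads in m + n − 1 tosses":

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def P(n, m):
    """Probability of seeing n heads before m tails in fair coin tosses."""
    if n == 0:
        return 1.0
    if m == 0:
        return 0.0
    return 0.5 * (P(n - 1, m) + P(n, m - 1))

def closed_form(n, m):
    """n or more heads among the first m + n - 1 tosses."""
    trials = m + n - 1
    return sum(comb(trials, k) for k in range(n, trials + 1)) / 2 ** trials

for n, m in [(3, 2), (5, 5), (2, 7)]:
    print(n, m, P(n, m), closed_form(n, m))
```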
18.600: Lecture 9
Expectations of discrete random variables
MIT

Outline: defining expectation, motivation

- Let X be the number that comes up when you roll a standard six-sided die. What is E[X^2]?
- (1/6)(1 + 4 + 9 + 16 + 25 + 36) = 91/6.
- Let X_j be 1 if the jth coin toss is heads and 0 otherwise. What is the expectation of X = Σ_{j=1}^n X_j?
- Can compute this directly as Σ_{k=0}^n k P{X = k}.
- Alternatively, use symmetry. The expected number of heads should be the same as the expected number of tails.
- This implies E[X] = E[n − X]. Applying the E[aX + b] = aE[X] + b formula (with a = −1 and b = n), we obtain E[X] = n − E[X] and conclude that E[X] = n/2.

Linearity of expectation

- If X and Y are distinct random variables, then can one say that E[X + Y] = E[X] + E[Y]?
- Yes. In fact, for real constants a and b, we have E[aX + bY] = aE[X] + bE[Y].
- This is called the linearity of expectation.
- Another way to state this fact: given a sample space S and probability measure P, the expectation E[·] is a linear real-valued function on the space of random variables.
- Can extend to more variables: E[X_1 + X_2 + ... + X_n] = E[X_1] + E[X_2] + ... + E[X_n].
More examples

- Contract one: I'll toss 10 coins, and if they all come up heads (probability about one in a thousand), I'll give you 20 billion dollars.
- Contract two: I'll just give you ten million dollars.
- What are the expectations of the two contracts? Which would you prefer?
- Can you find a function u(x) such that, given two random wealth variables W_1 and W_2, you prefer W_1 whenever E[u(W_1)] > E[u(W_2)]?
- Let's assume u(0) = 0 and u(1) = 1. Then u(x) = y means that you are indifferent between getting 1 dollar no matter what and getting x dollars with probability 1/y.
18.600: Lecture 10
Variance
MIT

Outline: defining variance, examples, properties, decomposition trick

- Also, E[g(X)] = Σ_{x: p(x)>0} g(x) p(x).
Examples

- How many five-card hands are there? Answer: C(52,5).
- How many such hands have k aces? Answer: C(4,k) C(48, 5−k).
- Write A, the number of aces, as A_1 + ... + A_5, where A_i is the indicator that the ith card is an ace. Then A^2 = (A_1 + A_2 + ... + A_5)^2 can be expanded into 25 terms: A^2 = Σ_{i=1}^5 Σ_{j=1}^5 A_i A_j.
- So E[A^2] = Σ_{i=1}^5 Σ_{j=1}^5 E[A_i A_j].
- So Var[X] = E[X^2] − (E[X])^2 = 2 − 1 = 1.
18.600: Lecture 11
Binomial random variables and repeated trials
Scott Sheffield
MIT

Outline: Bernoulli random variables, properties (expectation and variance), more problems
Expectation of a binomial random variable

- Let X be a binomial random variable with parameters (n, p).
- What is E[X]?
- Direct approach: by definition of expectation, E[X] = Σ_{i=0}^n i P{X = i}.
- What happens if we modify the nth row of Pascal's triangle by multiplying the ith term by i?
- For example, replace the 5th row (1, 5, 10, 10, 5, 1) by (0, 5, 20, 30, 20, 5). Does this remind us of an earlier row in the triangle?
- Perhaps the prior row (1, 4, 6, 4, 1)?
- Recall that C(n,i) = n(n−1)...(n−i+1) / (i(i−1)...1). This implies a simple but important identity: i C(n,i) = n C(n−1, i−1).
- Using this identity (and q = 1 − p), we can write E[X] = Σ_{i=0}^n i C(n,i) p^i q^{n−i} = Σ_{i=1}^n n C(n−1, i−1) p^i q^{n−i}.
- Rewrite this as E[X] = np Σ_{i=1}^n C(n−1, i−1) p^{i−1} q^{(n−1)−(i−1)}.
- Substitute j = i − 1 to get E[X] = np Σ_{j=0}^{n−1} C(n−1, j) p^j q^{(n−1)−j} = np (p + q)^{n−1} = np. (A numerical check follows.)
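A quick numerical sanity check of E[X] = np (my own illustration): compare the direct sum Σ i P{X = i} with np for a few parameter choices.

```python
from math import comb

def binomial_mean_direct(n, p):
    """E[X] computed term by term from the binomial pmf."""
    q = 1 - p
    return sum(i * comb(n, i) * p**i * q**(n - i) for i in range(n + 1))

for n, p in [(10, 0.3), (25, 0.5), (40, 0.9)]:
    print(n, p, binomial_mean_direct(n, p), n * p)
```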
More problems

- An airplane seats 200, but the airline has sold 205 tickets. Each person, independently, has a .05 chance of not showing up for the flight. What is the probability that more than 200 people will show up for the flight?
- Answer: Σ_{j=201}^{205} C(205, j) (.95)^j (.05)^{205−j}.
- In a 100-person senate, forty people always vote for the Republicans' position, forty people always vote for the Democrats' position, and 20 people just toss a coin to decide which way to vote. What is the probability that a given vote is tied?
- Answer: C(20, 10) / 2^20. (Both answers are evaluated below.)
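Both answers are easy to evaluate; here is a short Python computation (mine, not from the slides):

```python
from math import comb

overbooked = sum(comb(205, j) * 0.95**j * 0.05**(205 - j) for j in range(201, 206))
tied_vote = comb(20, 10) / 2**20
print(f"P(more than 200 show up) = {overbooked:.4f}")
print(f"P(tied senate vote)      = {tied_vote:.4f}")
```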
18.600: Lecture 12
Poisson random variables
MIT

Outline: Poisson random variable definition, Poisson random variable properties

- Setting j = k − 1, this is λ Σ_{j=0}^∞ (λ^j / j!) e^{−λ} = λ.
Variance

- Given P{X = k} = λ^k e^{−λ} / k! for integer k ≥ 0, what is Var[X]?
- Think of X as (roughly) a binomial (n, p) random variable with n very large and p = λ/n.
- This suggests Var[X] ≈ npq ≈ λ (since np ≈ λ and q = 1 − p ≈ 1). Can we show directly that Var[X] = λ?
- Compute E[X^2] = Σ_{k=0}^∞ k^2 P{X = k} = Σ_{k=0}^∞ k^2 (λ^k / k!) e^{−λ} = λ Σ_{k=1}^∞ k (λ^{k−1} / (k−1)!) e^{−λ}.
- Then Var[X] = E[X^2] − E[X]^2 = λ(λ + 1) − λ^2 = λ. (A numerical check follows.)
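A numerical check (my own): truncate the Poisson series far into the tail and verify that both the mean and the variance come out to λ.

```python
from math import exp

def poisson_moments(lam, kmax=200):
    """Mean and variance of a Poisson(lam) variable via a (truncated) series."""
    p = exp(-lam)            # P{X = 0}
    mean = second = 0.0
    for k in range(kmax):
        mean += k * p
        second += k * k * p
        p *= lam / (k + 1)   # P{X = k+1} from P{X = k}
    return mean, second - mean**2

print(poisson_moments(2.0))   # both close to 2.0
print(poisson_moments(7.5))   # both close to 7.5
```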
Poisson random variable problems

- A country has an average of 2 plane crashes per year.
- How reasonable is it to assume the number of crashes is Poisson with parameter 2?
- Assuming this, what is the probability of exactly 2 crashes? Of zero crashes? Of four crashes?
- Answer: e^{−λ} λ^k / k! with λ = 2 and k set to 2 or 0 or 4.
- A city has an average of five major earthquakes a century. What is the probability that there is at least one major earthquake in a given decade (assuming the number of earthquakes per decade is Poisson)?
- Answer: 1 − e^{−λ} λ^k / k! with λ = .5 and k = 0.
- A casino deals one million five-card poker hands per year. Approximate the probability that there are exactly 2 royal flush hands during a given year.
- The expected number of royal flushes is λ = 10^6 · 4 / C(52,5) ≈ 1.54. The answer is e^{−λ} λ^k / k! with k = 2. (See the computation below.)
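A short computation of the last answer (my own sketch), comparing the Poisson approximation with the exact binomial probability:

```python
from math import comb, exp, factorial

p_royal = 4 / comb(52, 5)          # probability a single hand is a royal flush
n_hands = 10**6
lam = n_hands * p_royal            # about 1.54

poisson_two = exp(-lam) * lam**2 / factorial(2)
binomial_two = comb(n_hands, 2) * p_royal**2 * (1 - p_royal) ** (n_hands - 2)
print(lam, poisson_two, binomial_two)
```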
18.600: Lecture 13
Lectures 1-12 Review
Scott Sheffield

Counting tricks and basic principles of probability

- Observe P(A ∪ B) = P(A) + P(B) − P(AB).
- Also, P(E ∪ F ∪ G) = P(E) + P(F) + P(G) − P(EF) − P(EG) − P(FG) + P(EFG).
- More generally, P(E_1 ∪ ... ∪ E_n) = Σ_{i=1}^n P(E_i) − Σ_{i1<i2} P(E_{i1} E_{i2}) + ... + (−1)^{r+1} Σ_{i1<...<ir} P(E_{i1} ... E_{ir}) + ... + (−1)^{n+1} P(E_1 E_2 ... E_n).
- The notation Σ_{i1<i2<...<ir} means a sum over all of the C(n,r) subsets of size r of the set {1, 2, ..., n}.

Hat problem

- n people toss hats into a bin, randomly shuffle, return one hat to each person. Find the probability that nobody gets their own hat.
- Inclusion-exclusion. Let E_i be the event that the ith person gets their own hat.
- What is P(E_{i1} E_{i2} ... E_{ir})? Answer: (n − r)!/n!.
- There are C(n,r) terms like that in the inclusion-exclusion sum. What is C(n,r) (n − r)!/n!? Answer: 1/r!.
- P(∪_{i=1}^n E_i) = 1 − 1/2! + 1/3! − 1/4! + ... ± 1/n!.
- 1 − P(∪_{i=1}^n E_i) = 1 − 1 + 1/2! − 1/3! + 1/4! − ... ± 1/n! ≈ 1/e ≈ .36788.
Conditional probability

- Bayes' theorem/law/rule states the following: P(A|B) = P(B|A)P(A)/P(B).
- Follows from the definition of conditional probability: P(AB) = P(B)P(A|B) = P(A)P(B|A).
- Tells how to update the estimate of the probability of A when new evidence restricts your sample space to B.
- So P(A|B) is P(B|A)/P(B) times P(A).
- The ratio P(B|A)/P(B) determines how compelling the new evidence is.

Dividing probability into two cases

- We can check the probability axioms: 0 ≤ P(E|F) ≤ 1, P(S|F) = 1, and P(∪E_i | F) = Σ P(E_i | F) if i ranges over a countable set and the E_i are disjoint.
- The probability measure P(·|F) is related to P(·).
- To get the former from the latter, we set probabilities of elements outside of F to zero and multiply probabilities of events inside of F by 1/P(F).
- P(·) is the "prior" probability measure and P(·|F) is the "posterior" measure (revised after discovering that F occurs).
Random variables, indicators, and expectation

- A random variable X is a function from the state space to the real numbers.
- Can interpret X as a quantity whose value depends on the outcome of an experiment.
- Say X is a discrete random variable if (with probability one) it takes one of a countable set of values.
- For each a in this countable set, write p(a) := P{X = a}. Call p the probability mass function.
- Write F(a) = P{X ≤ a} = Σ_{x ≤ a} p(x). Call F the cumulative distribution function.
- Given any event E, can define an indicator random variable, i.e., let X be the random variable equal to 1 on the event E and 0 otherwise. Write this as X = 1_E.
- The value of 1_E (either 1 or 0) indicates whether the event has occurred.
- If E_1, E_2, ..., E_k are events then X = Σ_{i=1}^k 1_{E_i} is the number of these events that occur.
- Example: in the n-hat shuffle problem, let E_i be the event that the ith person gets their own hat. Then Σ_{i=1}^n 1_{E_i} is the total number of people who get their own hats.
- If the state space S is countable, we can give a "sum over state space" definition of expectation: E[X] = Σ_{s∈S} P{s} X(s).
- If X is a random variable and g is a function from the real numbers to the real numbers then g(X) is also a random variable.
- How can we compute E[g(X)]? Answer: E[g(X)] = Σ_{x: p(x)>0} g(x) p(x).
- If X and Y are distinct random variables, then E[X + Y] = E[X] + E[Y]. In fact, for real constants a and b, we have E[aX + bY] = aE[X] + bE[Y].
- This is called the linearity of expectation.
- Can extend to more variables: E[X_1 + X_2 + ... + X_n] = E[X_1] + E[X_2] + ... + E[X_n].
Poisson processes
Scott Sheffield
MIT

Outline: what should a Poisson point process be?, Poisson point process axioms, consequences of axioms

- Many phenomena (number of phone calls or customers arriving in a given period, number of radioactive emissions in a given time period, number of major hurricanes in a given time period, etc.) can be modeled this way.
- A Poisson random variable X with parameter λ has expectation λ and variance λ.
- Special case: if λ = 1, then P{X = k} = 1/(k! e).
- Note how quickly this goes to zero, as a function of k.
- Example: the number of royal flushes in a million five-card poker hands is approximately Poisson with parameter 10^6 / 649739 ≈ 1.54.
- Example: if a country expects 2 plane crashes in a year, then the total number might be approximately Poisson with parameter λ = 2.
- Example: Joe works for a bank and notices that his town sees an average of one mortgage foreclosure per month.
- Moreover, looking over five years of data, it seems that the number of foreclosures per month follows a rate-1 Poisson distribution.
- That is, roughly a 1/e fraction of months has 0 foreclosures, a 1/e fraction has 1, a 1/(2e) fraction has 2, a 1/(6e) fraction has 3, and a 1/(24e) fraction has 4.
- Joe concludes that the probability of seeing 10 foreclosures during a given month is only 1/(10! e). The probability of seeing 10 or more (an extreme tail event that would destroy the bank) is Σ_{k=10}^∞ 1/(k! e), less than one in a million.
- Investors are impressed. Joe receives a large bonus.
- But probably shouldn't....
18.600: Lecture 16
More discrete random variables
MIT

Outline: geometric random variables, problems
Examples

- Suppose f(x) = 1/2 for x ∈ [0, 2] and f(x) = 0 for x ∉ [0, 2].
- What is P{X < 3/2}? What is P{X = 3/2}? What is P{1/2 < X < 3/2}? What is P{X ∈ (0, 1) ∪ (3/2, 5)}? What is F?
- We say that X is uniformly distributed on the interval [0, 2].
- Now suppose instead f(x) = x/2 for x ∈ [0, 2] and f(x) = 0 for x ∉ [0, 2].
- Again: what is P{X < 3/2}? What is P{X = 3/2}? What is P{1/2 < X < 3/2}? What is F?
Expectations of continuous random variables

- Recall that when X was a discrete random variable, with p(x) = P{X = x}, we wrote E[X] = Σ_{x: p(x)>0} x p(x).
- How should we define E[X] when X is a continuous random variable?
- Answer: E[X] = ∫ f(x) x dx.
- Similarly, for discrete X we wrote E[g(X)] = Σ_{x: p(x)>0} g(x) p(x). What is the analog when X is a continuous random variable?
- Answer: we will write E[g(X)] = ∫ f(x) g(x) dx.

Variance of continuous random variables

- Suppose X is a continuous random variable with mean μ.
- We can write Var[X] = E[(X − μ)^2], same as in the discrete case.
- Next, if g = g_1 + g_2 then E[g(X)] = ∫ g_1(x) f(x) dx + ∫ g_2(x) f(x) dx = ∫ (g_1(x) + g_2(x)) f(x) dx = E[g_1(X)] + E[g_2(X)].
- Furthermore, E[a g(X)] = a E[g(X)] when a is a constant.
- Just as in the discrete case, we can expand the variance expression as Var[X] = E[X^2 − 2μX + μ^2] and use additivity of expectation to say that Var[X] = E[X^2] − 2μE[X] + E[μ^2] = E[X^2] − 2μ^2 + μ^2 = E[X^2] − E[X]^2.
- This formula is often useful for calculations.
Uniform random variables on [α, β]

- Suppose X is a random variable with probability density function f(x) = 1/(β − α) for x ∈ [α, β] and f(x) = 0 for x ∉ [α, β].
- What is E[X]?
- Intuitively, we'd guess the midpoint (α + β)/2.
- What's the cleanest way to prove this?
- One approach: let Y be uniform on [0, 1] and try to show that X = (β − α)Y + α is uniform on [α, β].
- Then expectation linearity gives E[X] = (β − α)E[Y] + α = (1/2)(β − α) + α = (α + β)/2.
- Using similar logic, what is the variance Var[X]?
- Answer: Var[X] = Var[(β − α)Y + α] = Var[(β − α)Y] = (β − α)^2 Var[Y] = (β − α)^2 / 12.
18.600: Lecture 18
Normal random variables
MIT

Outline: tossing coins, normal random variables, special case of the central limit theorem

Tossing coins

- Suppose we toss a million fair coins. How many heads will we get?
- About half a million, yes, but how close to that? Will we be off by 10 or 1000 or 100,000?
- How can we describe the error?
- Let's try this out. (A quick simulation follows.)
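A minimal simulation sketch (mine, not from the slides): toss a million fair coins a few times and look at the deviation of the head count from 500,000; it is typically on the order of sqrt(10^6)/2 = 500.

```python
import random

N = 10**6
for _ in range(5):
    heads = sum(random.getrandbits(1) for _ in range(N))
    print(heads, "deviation from N/2:", heads - N // 2)
```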
Exponential random variables
Scott Sheffield
MIT

Outline: memoryless property, relationship to Poisson random variables

Minimum of independent exponentials

- The last statement has a simple form for exponential random variables: we have P{Y > a} = e^{−λa} for a ∈ [0, ∞).
- Let X = min{X_1, X_2}. Note: X > a if and only if X_1 > a and X_2 > a.
- X_1 and X_2 are independent, so P{X > a} = P{X_1 > a} P{X_2 > a} = e^{−λ_1 a} e^{−λ_2 a} = e^{−λa} with λ = λ_1 + λ_2.
- If X_1, ..., X_n are independent exponentials with rates λ_1, ..., λ_n, then min{X_1, ..., X_n} is exponential with λ = λ_1 + ... + λ_n.
Alice and Bob on assumptions

- Alice assumes Bob means independent tosses of a fair coin. Under this assumption, all 2^11 outcomes of an eleven-coin-toss sequence are equally likely. Bob considers HHHHHHHHHHH more likely than HHHHHHHHHHT, since the former could result from a faulty coin.
- Alice sees Bob's point but considers it annoying and churlish to ask about a coin toss sequence and criticize the listener for assuming this means independent tosses of a fair coin.
- Without that assumption, Alice has no idea what context Bob has in mind. (An environment where two-headed novelty coins are common? Among coin-tossing cheaters with particular agendas?...)
- Alice: you need assumptions to convert stories into math.
- Bob: good to question assumptions.

Radioactive decay

- Suppose you start at time zero with n radioactive particles. Suppose that each one (independently of the others) will decay at a random time, which is an exponential random variable with parameter λ.
- Let T be the amount of time until no particles are left. What are E[T] and Var[T]?
- Let T_1 be the amount of time you wait until the first particle decays, T_2 the amount of additional time until the second particle decays, etc., so that T = T_1 + T_2 + ... + T_n.
- Claim: T_1 is exponential with parameter nλ.
- Claim: T_2 is exponential with parameter (n − 1)λ.
- And so forth. E[T] = Σ_{i=1}^n E[T_i] = (1/λ) Σ_{j=1}^n 1/j and (by independence) Var[T] = Σ_{i=1}^n Var[T_i] = (1/λ^2) Σ_{j=1}^n 1/j^2. (A simulation appears below.)
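A simulation sketch (my own, with λ = 1 chosen purely for illustration) comparing the sample mean and variance of T with the harmonic-sum formulas above:

```python
import random
from statistics import mean, variance

def total_decay_time(n, lam=1.0):
    """Time until the last of n independent exponential(lam) particles decays."""
    return max(random.expovariate(lam) for _ in range(n))

n, lam, trials = 10, 1.0, 50_000
samples = [total_decay_time(n, lam) for _ in range(trials)]
print("simulated  E[T], Var[T]:", mean(samples), variance(samples))
print("formula    E[T], Var[T]:",
      sum(1 / j for j in range(1, n + 1)) / lam,
      sum(1 / j**2 for j in range(1, n + 1)) / lam**2)
```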
18.600: Lecture 20
More continuous random variables
MIT

Outline: gamma distribution, Cauchy distribution, beta distribution

Gamma distribution

- The probability computed on the previous slide, proportional to (λx)^{n−1} e^{−λx} / (n−1)!, suggests the form for a continuum random variable.
- Replace n (generally integer valued) with α (which we will eventually allow to be any real number).
- Say that random variable X has the gamma distribution with parameters (α, λ) if f_X(x) = λ e^{−λx} (λx)^{α−1} / Γ(α) for x ≥ 0, and f_X(x) = 0 for x < 0.
- The waiting time interpretation makes sense only for integer α, but the distribution is defined for general positive α.
Joint distributions and independence

- Roll a die repeatedly and let X be such that the first even number (the first 2, 4, or 6) appears on the Xth roll.
- Let Y be the number that appears on the Xth roll.
- Are X and Y independent? What is their joint law?
- If j ≥ 1, then P{X = j, Y = 2} = P{X = j, Y = 4} = P{X = j, Y = 6} = (1/2)^{j−1} (1/6).
Lions, tigers, and bears

- On a certain hiking trail, lion, tiger, and bear attacks are independent Poisson processes with respective λ values of .1/hour, .2/hour, and .3/hour.
- Let T ≥ 0 be the amount of time until the first animal attacks. Let A ∈ {lion, tiger, bear} be the species of the first attacking animal.
- What is the probability density function for T? How about E[T]? Are T and A independent?
- Let T_1 be the time until the first attack, T_2 the subsequent time until the second attack, etc., and let A_1, A_2, ... be the corresponding species. Are all of the T_i and A_i independent of each other? What are their probability distributions?
- Distribution of the time T_tiger until the first tiger attack?
- Exponential with λ_tiger = .2/hour. So P{T_tiger > a} = e^{−.2a}.
- How about E[T_tiger] and Var[T_tiger]?
- E[T_tiger] = 1/λ_tiger = 5 hours, Var[T_tiger] = 1/λ_tiger^2 = 25 hours squared.
- Time until the 5th attack by any animal?
- Gamma distribution with α = 5 and λ = .6.
- X, where the Xth attack is the 5th bear attack?
- Negative binomial with parameters p = 1/2 and n = 5.
- Can the hiker breathe a sigh of relief after 5 attack-free hours?
Sums of independent random variables

- Sum Z of n independent copies of a geometric random variable X with parameter p?
- We can interpret Z as the time slot where the nth head occurs in an i.i.d. sequence of p-coin tosses.
- So Z is negative binomial (n, p). So P{Z = k} = C(k−1, n−1) p^{n−1} (1 − p)^{k−n} p.
- What about the sum Z of n independent exponentials with parameter λ? We claimed in an earlier lecture that this is a gamma distribution with parameters (λ, n).
- So f_Z(y) = λ e^{−λy} (λy)^{n−1} / Γ(n).
- We argued this point by taking limits of negative binomial distributions. Can we check it directly?
- By induction, it would suffice to show that a gamma (λ, 1) plus an independent gamma (λ, n) is a gamma (λ, n + 1).
18.600: Lecture 23
Conditional probability, order statistics, expectations of sums
Scott Sheffield
MIT

Outline: conditional probability densities, order statistics, expectations of sums, properties of expectation
18.600: Lecture 24
Covariance and some conditional expectation exercises
Scott Sheffield
MIT

Outline: covariance and correlation, paradoxes (getting ready to think about conditional expectation)
18.600: Lecture 25
Conditional probability distributions and conditional expectation
MIT

Conditional variance

- Definition: Var(X|Y) = E[(X − E[X|Y])^2 | Y] = E[X^2 | Y] − E[X|Y]^2.
- Var(X|Y) is a random variable that depends on Y. It is the variance of X in the conditional distribution for X given Y.
- Note E[Var(X|Y)] = E[E[X^2|Y]] − E[E[X|Y]^2] = E[X^2] − E[E[X|Y]^2].
- If we subtract E[X]^2 from the first term and add the equivalent value E[E[X|Y]]^2 to the second, the RHS becomes Var[X] − Var[E[X|Y]], which implies the following:
- Useful fact: Var(X) = Var(E[X|Y]) + E[Var(X|Y)].
- One can "discover" X in two stages: first sample Y from its marginal and compute E[X|Y], then sample X from its distribution given the Y value.
- The above fact breaks the variance into two parts, corresponding to these two stages.

Example

- Let X be a random variable of variance σ_X^2 and Y an independent random variable of variance σ_Y^2, and write Z = X + Y. Assume E[X] = E[Y] = 0.
- What are the covariances Cov(X, Y) and Cov(X, Z)?
- How about the correlation coefficients ρ(X, Y) and ρ(X, Z)?
- What is E[Z|X]? And how about Var(Z|X)?
- Both of these values are functions of X. The former is just X. The latter happens to be a constant-valued function of X, i.e., it happens not to actually depend on X. We have Var(Z|X) = σ_Y^2.
- Can we check the formula Var(Z) = Var(E[Z|X]) + E[Var(Z|X)] in this case?
Interpretation and examples

- Sometimes think of the expectation E[Y] as a "best guess" or "best predictor" of the value of Y.
- It is "best" in the sense that among all constants m, the expectation E[(Y − m)^2] is minimized when m = E[Y].
- But what if we allow non-constant predictors? What if the predictor is allowed to depend on the value of a random variable X that we can observe directly?
- Let g(x) be such a function. Then E[(Y − g(X))^2] is minimized when g(X) = E[Y|X].
- Toss 100 coins. What's the conditional expectation of the number of heads given that there are k heads among the first fifty tosses?
- Answer: k + 25.
- What's the conditional expectation of the number of aces in a five-card poker hand given that the first two cards in the hand are aces?
- Answer: 2 + 3 · 2/50.
18.600: Lecture 26
Moment generating functions and characteristic functions
Scott Sheffield
MIT

Outline: moment generating functions, characteristic functions, continuity theorems and perspective
- We showed that if Z = X + Y and X and Y are independent, then M_Z(t) = M_X(t) M_Y(t).
- If X_1, ..., X_n are i.i.d. copies of X and Z = X_1 + ... + X_n then what is M_Z?
- Answer: M_X^n. Follows by repeatedly applying the formula above.
- This is a big reason for studying moment generating functions. It helps us understand what happens when we sum up a lot of independent copies of the same random variable.
- If Z = aX then can I use M_X to determine M_Z? Answer: yes. M_Z(t) = E[e^{tZ}] = E[e^{taX}] = M_X(at).
- If Z = X + b then can I use M_X to determine M_Z? Answer: yes. M_Z(t) = E[e^{tZ}] = E[e^{tX + bt}] = e^{bt} M_X(t).
- The latter answer is the special case of M_Z(t) = M_X(t) M_Y(t) where Y is the constant random variable b.
18.600: Lecture 27
Lectures 15-27 Review
MIT

Outline: continuous random variables, continuous random variable properties derivable from coin toss intuition, DeMoivre-Laplace limit theorem
Covariance

- Define the covariance of X and Y by Cov(X, Y) = E[(X − E[X])(Y − E[Y])].
- Covariance formula: Cov(X, Y) = E[XY] − E[X]E[Y], the expectation of the product minus the product of the expectations.
- Cov(X, Y) = Cov(Y, X).
- Cov(X, X) = Var(X); i.e., by definition Var(X) = Cov(X, X).
- Cov(aX, Y) = a Cov(X, Y).
- Cov(X_1 + X_2, Y) = Cov(X_1, Y) + Cov(X_2, Y).
- General statement of the bilinearity of covariance: Cov(Σ_{i=1}^m a_i X_i, Σ_{j=1}^n b_j Y_j) = Σ_{i=1}^m Σ_{j=1}^n a_i b_j Cov(X_i, Y_j).
- Var(Σ_{i=1}^n X_i) = Σ_{i=1}^n Var(X_i) + 2 Σ_{(i,j): i<j} Cov(X_i, X_j).
Moment generating functions

- Let X be a random variable and M(t) = E[e^{tX}].
- Then M'(0) = E[X] and M''(0) = E[X^2]. Generally, the nth derivative of M at zero is E[X^n].
- Let X and Y be independent random variables and Z = X + Y.
- Write the moment generating functions as M_X(t) = E[e^{tX}], M_Y(t) = E[e^{tY}], and M_Z(t) = E[e^{tZ}].
- If you knew M_X and M_Y, could you compute M_Z?
- By independence, M_Z(t) = E[e^{t(X+Y)}] = E[e^{tX} e^{tY}] = E[e^{tX}] E[e^{tY}] = M_X(t) M_Y(t) for all t.
- In other words, adding independent random variables corresponds to multiplying moment generating functions.
- If X_1, ..., X_n are i.i.d. copies of X and Z = X_1 + ... + X_n then M_Z = M_X^n, by repeatedly applying the formula above.
- If Z = aX then M_Z(t) = E[e^{tZ}] = E[e^{taX}] = M_X(at).
- If Z = X + b then M_Z(t) = E[e^{tZ}] = E[e^{tX + bt}] = e^{bt} M_X(t).
Examples

- If X is binomial with parameters (p, n) then M_X(t) = (p e^t + 1 − p)^n.
- If X is Poisson with parameter λ > 0 then M_X(t) = exp[λ(e^t − 1)].
- If X is normal with mean 0, variance 1, then M_X(t) = e^{t^2/2}.
- If X is normal with mean μ, variance σ^2, then M_X(t) = e^{σ^2 t^2/2 + μt}.
- If X is exponential with parameter λ > 0 then M_X(t) = λ/(λ − t).

Cauchy distribution

- A standard Cauchy random variable is a random real number with probability density f(x) = (1/π) · 1/(1 + x^2).
- There is a "spinning flashlight" interpretation. Put a flashlight at (0, 1), spin it to a uniformly random angle θ in [−π/2, π/2], and consider the point X where the light beam hits the x-axis.
- F_X(x) = P{X ≤ x} = P{tan θ ≤ x} = P{θ ≤ tan^{−1} x} = 1/2 + (1/π) tan^{−1} x.
- Find f_X(x) = (d/dx) F_X(x) = (1/π) · 1/(1 + x^2).
18.600: Lecture 29
Weak law of large numbers
Scott Sheffield

Outline: weak law of large numbers (Markov/Chebyshev approach)
18.600: Lecture 30
Central limit theorem
Scott Sheffield

- Here Φ(b) − Φ(a) = P{a ≤ Z ≤ b} when Z is a standard normal random variable.
- (S_n − np)/√(npq) describes the number of standard deviations that S_n is above or below its mean.
- Question: does a similar statement hold if the X_i are i.i.d. but have some other probability distribution?
- Central limit theorem: yes, if they have finite variance.
18.600: Lecture 31
Strong law of large numbers and Jensen's inequality
Scott Sheffield
MIT

Outline: a story about Pedro, strong law of large numbers, Jensen's inequality
A story about Pedro

- How much does Pedro make in expectation over 10 years with the risky approach? Over 100 years?
- Answer: let R_i be i.i.d. random variables each equal to 1.15 with probability .53 and .85 with probability .47. The total value after n steps is the initial investment times T_n := R_1 R_2 ... R_n.
- Compute E[R_1] = .53 · 1.15 + .47 · .85 = 1.009.
- Then E[T_120] = 1.009^120 ≈ 2.93. And E[T_1200] = 1.009^1200 ≈ 46808.9.
- How would you advise Pedro to invest over the next 10 years if Pedro wants to be completely sure that he doesn't lose money?
- What if Pedro is willing to accept substantial risk if it means there is a good chance it will enable his grandchildren to retire in comfort 100 years from now?
- What if Pedro wants the money for himself in ten years?
- Let's do some simulations. (A sketch appears below.)
- We wrote T_n = R_1 ... R_n. Taking logs, we can write X_i = log R_i and S_n = log T_n = Σ_{i=1}^n X_i.
- Now S_n is a sum of i.i.d. random variables.
- E[X_1] = E[log R_1] = .53 (log 1.15) + .47 (log .85) ≈ −.0023.
- By the law of large numbers, if we take n extremely large, then S_n/n ≈ −.0023 with high probability.
- This means that, when n is large, S_n is usually a very negative value, which means T_n is usually very close to zero (even though its expectation is very large).
- Bad news for Pedro's grandchildren. After 100 years, the portfolio is probably in bad shape. But what if Pedro takes an even longer view? Will T_n converge to zero with probability one as n gets large? Or will T_n perhaps always eventually rebound?
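A minimal simulation sketch (my own, matching the slide's parameters of 1.15/0.85 with probabilities .53/.47): typical values of T_n collapse toward zero even though E[T_n] grows.

```python
import random
from statistics import mean, median

def T(n):
    """Product of n i.i.d. factors: 1.15 with probability .53, 0.85 with probability .47."""
    value = 1.0
    for _ in range(n):
        value *= 1.15 if random.random() < 0.53 else 0.85
    return value

for n in (120, 1200):
    samples = [T(n) for _ in range(10_000)]
    print(n, "mean:", round(mean(samples), 2), "median:", round(median(samples), 6))
```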
Strong law of large numbers: the strong law implies the weak law; proof of the strong law assuming E[X^4] < ∞.
18.600: Lecture 32
Markov chains
MIT

- For example, imagine a simple weather model with two states: "rainy" and "sunny".
- If it's rainy one day, there's a .5 chance it will be rainy the next day, a .5 chance it will be sunny.
- If it's sunny one day, there's a .8 chance it will be sunny the next day, a .2 chance it will be rainy.
- In this climate, sun tends to last longer than rain.
- Given that it is rainy today, how many days do I expect to have to wait to see a sunny day?
- Given that it is sunny today, how many days do I expect to have to wait to see a rainy day?
- Over the long haul, what fraction of days are sunny?
- To describe a Markov chain, we need to define P_ij for any i, j ∈ {0, 1, ..., M}.
- It is convenient to represent the collection of transition probabilities P_ij as a matrix:

  A = [ P_00  P_01  ...  P_0M ]
      [ P_10  P_11  ...  P_1M ]
      [  ...                  ]
      [ P_M0  P_M1  ...  P_MM ]

- For this to make sense, we require P_ij ≥ 0 for all i, j and Σ_{j=0}^M P_ij = 1 for each i. That is, the rows sum to one. (A small computation with the weather chain follows.)
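A short numerical sketch (mine, not from the slides) of the two-state weather chain: powers of the transition matrix converge to the stationary row (2/7, 5/7), and the expected waits match 1/.5 = 2 days for rainy-to-sunny and 1/.2 = 5 days for sunny-to-rainy.

```python
def mat_mult(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

# state 0 = rainy, state 1 = sunny
A = [[0.5, 0.5],
     [0.2, 0.8]]

power = A
for _ in range(9):          # compute A^10
    power = mat_mult(power, A)
print(power)                # both rows close to (2/7, 5/7)

# Expected waiting times: geometric with success probabilities .5 and .2.
print("rainy -> sunny:", 1 / 0.5, "days;  sunny -> rainy:", 1 / 0.2, "days")
```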
- What does it mean if all of the rows are identical?
- Answer: the state sequence X_i consists of i.i.d. random variables.
- What if the matrix is the identity?
- Answer: states never change.
- What if each P_ij is either one or zero?
- Answer: state evolution is deterministic.

Example: relationship states

- Consider a chain whose states are "single", "in a relationship", "it's complicated", "engaged", and "married".
- Can we assign a probability to each arrow?
- A Markov model implies the time spent in any state (e.g., a marriage) before leaving is a geometric random variable.
- Not true... Can we make a better model with more states?
- Recall that A^10 = [[.285719, .714281], [.285713, .714287]] ≈ [[2/7, 5/7], [2/7, 5/7]].
18.600: Lecture 33
Entropy
MIT

Outline: entropy, conditional entropy
Entropy and coding

- Consider a prefix code with codewords A → 0, B → 10, C → 110, D → 111.
- No sequence in the code is an extension of another.
- What does 100111110010 spell?
- A coding scheme is equivalent to a "twenty questions" strategy.
- Data compression: let X_1, X_2, ..., X_n be i.i.d. instances of X. Do there exist encoding schemes such that the expected number of bits required to encode the entire sequence is about H(X) n (assuming n is sufficiently large)?
- Yes. We can cut the space of N^n possibilities close to exactly in half at each stage (up till near the end, maybe).
Conditional entropy

- Definitions: H_{Y=y_j}(X) = −Σ_i p(x_i|y_j) log p(x_i|y_j) and H_Y(X) = Σ_j H_{Y=y_j}(X) p_Y(y_j).
- Important property one: H(X, Y) = H(Y) + H_Y(X).
- In words, the expected amount of information we learn when discovering (X, Y) is equal to the expected amount we learn when discovering Y plus the expected amount when we subsequently discover X (given our knowledge of Y).
- To prove this property, recall that p(x_i, y_j) = p_Y(y_j) p(x_i|y_j).
- Thus, H(X, Y) = −Σ_i Σ_j p(x_i, y_j) log p(x_i, y_j) = −Σ_i Σ_j p_Y(y_j) p(x_i|y_j) [log p_Y(y_j) + log p(x_i|y_j)] = −Σ_j p_Y(y_j) log p_Y(y_j) Σ_i p(x_i|y_j) − Σ_j p_Y(y_j) Σ_i p(x_i|y_j) log p(x_i|y_j) = H(Y) + H_Y(X).
- Important property two: H_Y(X) ≤ H(X), with equality if and only if X and Y are independent.
- In words, the expected amount of information we learn when discovering X after having discovered Y can't be more than the expected amount of information we would learn when discovering X before knowing anything about Y.
- Proof: note that E(p_1, p_2, ..., p_n) := −Σ p_i log p_i is concave.
- The vector v = {p_X(x_1), p_X(x_2), ..., p_X(x_n)} is a weighted average of the vectors v_j := {p_X(x_1|y_j), p_X(x_2|y_j), ..., p_X(x_n|y_j)} as j ranges over possible values. By the (vector version of) Jensen's inequality, H(X) = E(v) = E(Σ p_Y(y_j) v_j) ≥ Σ p_Y(y_j) E(v_j) = H_Y(X).
18.600: Lecture 34
Martingales and the optional stopping theorem
Scott Sheffield
MIT

Outline: martingales and stopping times, optional stopping theorem
Doob's optional stopping theorem

- Doob's optional stopping time theorem is contained in many basic texts on probability and martingales. (See, for example, Theorem 10.10 of Probability with Martingales, by David Williams, 1991.)
- It essentially says that you can't make money (in expectation) by buying and selling an asset whose price is a martingale.
- Precisely, if you buy the asset at some time and adopt any strategy at all for deciding when to sell it, then the expected price at the time you sell is the price you originally paid.
- If the market price is a martingale, you cannot make money in expectation by "timing the market."
- Doob's Optional Stopping Theorem: if the sequence X_0, X_1, X_2, ... is a bounded martingale, and T is a stopping time, then the expected value of X_T is X_0.
- When we say the martingale is bounded, we mean that for some C, with probability one |X_i| < C for all i.
- Why is this assumption necessary? Can we give a counterexample if boundedness is not assumed?
- The theorem can be proved by induction if the stopping time T is bounded. Unbounded T requires a limit argument. (This is where boundedness of the martingale is used.)

Martingales and markets

- Many asset prices are believed to behave approximately like martingales, at least in the short term.
- Efficient market hypothesis: new information is instantly absorbed into the stock value, so the expected value of the stock tomorrow should be the value today. (If it were higher, statistical arbitrageurs would bid up today's price until this was not the case.)
- But what about interest, risk premium, etc.?
- According to the fundamental theorem of asset pricing, the discounted price X(n)/A(n), where A is a risk-free asset, is a martingale with respect to risk neutral probability. More on this next lecture.

Conditional expectation martingales

- The two-element sequence E[X], X is a martingale.
- In previous lectures, we interpreted the conditional expectation E[X|Y] as a random variable. It depends only on Y and describes the expectation of X given the observed Y value.
- We showed E[E[X|Y]] = E[X].
- This means that the three-element sequence E[X], E[X|Y], X is a martingale.
- More generally, if the Y_i are any random variables, the sequence E[X], E[X|Y_1], E[X|Y_1, Y_2], E[X|Y_1, Y_2, Y_3], ... is a martingale.
Martingales as real-time subjective probability updates

- Ivan sees an email from his girlfriend with subject "some possibly serious news", thinks there's a 20 percent chance she'll break up with him by email's end. He revises the number after each line:
- "Oh Ivan, I've missed you so much!" 12
- "I have something crazy to tell you," 24
- "and so sorry to do this by email. (Where's your phone!?)" 38
- "I've been spending lots of time with a guy named Robert," 52
- "a visiting database consultant on my project" 34
- "who seems very impressed by my work." 23
- "Robert wants me to join his startup in Palo Alto." 38
- "Exciting!!! Of course I said I'd have to talk to you first," 24
- "because you are absolutely my top priority in my life," 8
- "and you're stuck at MIT for at least three more years..." 11
- "but honestly, I'm just so confused on so many levels." 15
- "Call me!!! I love you! Alice" 0

More conditional probability martingale examples

- Example: let C be the amount of oil available for drilling under a particular piece of land. Suppose that ten geological tests are done that will ultimately determine the value of C. Let C_n be the conditional expectation of C given the outcome of the first n of these tests. Then the sequence C_0, C_1, C_2, ..., C_10 = C is a martingale.
- Let A_i be my best guess at the probability that a basketball team will win the game, given the outcome of the first i minutes of the game. Then (assuming some rationality of my personal probabilities) A_i is a martingale.
18.600: Lecture 35
Martingales and risk neutral probability
Scott Sheffield

Martingales and stopping times

- Let T be a non-negative integer valued random variable.
- Think of T as giving the time the asset will be sold if the price sequence is X_0, X_1, X_2, ....
- Say that T is a stopping time if the event that T = n depends only on the values X_i for i ≤ n. In other words, the decision to sell at time n depends only on prices up to time n, not on (as yet unknown) future prices.
- Example: suppose that an asset price is a martingale that starts at 50 and changes by increments of ±1 at each time step. What is the probability that the price goes down to 40 before it goes up to 70?
- What is the probability that it goes down to 45, then up to 55, then down to 45, then up to 55 again, all before reaching either 0 or 100?
Risk neutral probability of outcomes known at fixed time T

- Risk neutral probability of event A: P_RN(A) denotes
  Price{contract paying 1 dollar at time T if A occurs} / Price{contract paying 1 dollar at time T no matter what}.
- If the risk-free interest rate is constant and equal to r (compounded continuously), then the denominator is e^{−rT}.
- Assuming no arbitrage (i.e., no risk-free profit with zero upfront investment), P_RN satisfies the axioms of probability. That is, 0 ≤ P_RN(A) ≤ 1, and P_RN(S) = 1, and if events A_j are disjoint then P_RN(A_1 ∪ A_2 ∪ ...) = P_RN(A_1) + P_RN(A_2) + ...
- Arbitrage example: if A and B are disjoint and P_RN(A ∪ B) < P_RN(A) + P_RN(B), then we sell contracts paying 1 if A occurs and 1 if B occurs, buy a contract paying 1 if A ∪ B occurs, and pocket the difference.

How risk neutral probability differs from ordinary probability

- At first sight, one might think that P_RN(A) describes the market's best guess at the probability that A will occur.
- But suppose A is the event that the government is dissolved and all dollars become worthless. What is P_RN(A)?
- Should be 0. Even if people think A is likely, a contract paying a dollar when A occurs is worthless.
- Now, suppose there are only 2 outcomes: A is the event that the economy booms and everyone prospers, and B is the event that the economy sags and everyone is needy. Suppose the purchasing power of the dollar is the same in both scenarios. If people think A has a .5 chance to occur, do we expect P_RN(A) > .5 or P_RN(A) < .5?
- Answer: P_RN(A) < .5. People are risk averse. In the second scenario they need the money more.
Non-systemic events; extensions of risk neutral probability

- Suppose that A is the event that the Boston Red Sox win the World Series. Would we expect P_RN(A) to represent (the market's best assessment of) the probability that the Red Sox will win?
- Arguably yes. The amount that people in general need or value dollars does not depend much on whether A occurs (even though the financial needs of specific individuals may depend heavily on A).
- Even if some people bet based on loyalty, emotion, insurance against personal financial exposure to the team's prospects, etc., there will arguably be enough in-it-for-the-money statistical arbitrageurs to keep the price near a reasonable guess of what well-informed experts would consider the true probability.
- The definition of risk neutral probability depends on the choice of currency (the so-called numeraire).
- In the 2016 presidential election, investors predicted the value of the Mexican peso (in US dollars) would be lower.
- Risk neutral probability can be defined for variable times and variable interest rates; e.g., one can take the numeraire to be the amount one dollar in a variable-interest-rate money market account has grown to when the outcome is known. Can define P_RN(A) to be the price of a contract paying this amount if and when A occurs.
- For simplicity, we focus on a fixed time T and fixed interest rate r in this lecture.
18.600: Lecture 36
Risk Neutral Probability and Black-Scholes
Scott Sheffield

Overview

- The mathematics of today's lecture will not go far beyond things we know.
- The main mathematical tasks will be to compute expectations of functions of log-normal random variables (to get the Black-Scholes formula) and to differentiate under an integral (to compute risk neutral density functions from option prices).
- Will spend time giving financial interpretations of the math.
- Can interpret this lecture as a sophisticated story problem, illustrating an important application of the probability we have learned in this course (involving probability axioms, expectations, cumulative distribution functions, etc.).
- Brownian motion (as mathematically constructed by MIT professor Norbert Wiener) is a continuous time martingale.
- Black-Scholes theory assumes that the log of an asset price is a process called Brownian motion with drift with respect to risk neutral probability. This implies the option price formula.
Black-Scholes: main assumption and conclusion

- More famous MIT professors: Black, Scholes, Merton. 1997 Nobel Prize.
- Assumption: the log of an asset price X at fixed future time T is a normal random variable (call it N) with some known variance (call it T σ^2) and some mean (call it μ) with respect to risk neutral probability.
- Observation: N normal (μ, T σ^2) implies E[e^N] = e^{μ + T σ^2 / 2}.
- Observation: if X_0 is the current price then X_0 = E_RN[X] e^{−rT} = E_RN[e^N] e^{−rT} = e^{μ + (σ^2/2 − r)T}.
- Observation: this implies μ = log X_0 + (r − σ^2/2) T.
- Conclusion: if g is any function then the price of a contract that pays g(X) at time T is E_RN[g(X)] e^{−rT} = E_RN[g(e^N)] e^{−rT}, where N is normal with mean μ and variance T σ^2.

Black-Scholes example: European call option

- A European call option on a stock at maturity date T, strike price K, gives the holder the right (but not obligation) to purchase a share of stock for K dollars at time T.
- ("The document gives the bearer the right to purchase one share of MSFT from me on May 31 for 35 dollars. SS")
- If X is the value of the stock at T, then the value of the option at time T is given by g(X) = max{0, X − K}.
- Black-Scholes: the price of a contract paying g(X) at time T is E_RN[g(X)] e^{−rT} = E_RN[g(e^N)] e^{−rT} where N is normal with variance T σ^2 and mean μ = log X_0 + (r − σ^2/2) T.
- Write this as
  e^{−rT} E_RN[max{0, e^N − K}] = e^{−rT} E_RN[(e^N − K) 1_{N ≥ log K}]
  = (e^{−rT} / √(2π T σ^2)) ∫_{log K}^∞ e^{−(x−μ)^2/(2T σ^2)} (e^x − K) dx.
  (A numerical evaluation appears below.)
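A numerical sketch (my own, with illustrative parameters that are not from the slides): price the call both by the closed-form Black-Scholes formula and by averaging e^{−rT} max(0, e^N − K) over draws of N with mean log X_0 + (r − σ^2/2)T and variance Tσ^2.

```python
import math
import random

def black_scholes_call(X0, K, T, r, sigma):
    """Closed-form Black-Scholes price of a European call."""
    Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # standard normal CDF
    d1 = (math.log(X0 / K) + (r + sigma**2 / 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return X0 * Phi(d1) - K * math.exp(-r * T) * Phi(d2)

def monte_carlo_call(X0, K, T, r, sigma, trials=200_000):
    """Average of e^{-rT} max(0, e^N - K) with N ~ Normal(mu, T sigma^2)."""
    mu = math.log(X0) + (r - sigma**2 / 2) * T
    sd = sigma * math.sqrt(T)
    payoff = lambda: max(0.0, math.exp(random.gauss(mu, sd)) - K)
    return math.exp(-r * T) * sum(payoff() for _ in range(trials)) / trials

# Illustrative parameters (my own choice): X0 = 100, K = 105, T = 1, r = 2%, sigma = 30%.
print(black_scholes_call(100, 105, 1.0, 0.02, 0.3))
print(monte_carlo_call(100, 105, 1.0, 0.02, 0.3))
```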
Review problem answers (earthquakes and eruptions)

- E[E^2] = 2 and Cov[E, V] = 0.
- The probability of no earthquake or eruption in the first year is e^{−(2+1)/10} = e^{−.3} (see the next part). Same for any year, by the memoryless property. The expected number of quake/eruption-free years is 10 e^{−.3} ≈ 7.4.
- The probability density function of min{E, V} is 3 e^{−(2+1)x} for x ≥ 0, and 0 for x < 0.
18.600: Lecture 38
Review: practice problems
Scott Sheffield
MIT

- Let X be a uniformly distributed random variable on [−1, 1].
- Compute the variance of X^2.
- If X_1, ..., X_n are independent copies of X, what is the probability density function for the smallest of the X_i?
- P{min{X_1, ..., X_n} > x} = P{X_1 > x, X_2 > x, ..., X_n > x} = ((1 − x)/2)^n.
- So the density function is −(d/dx) ((1 − x)/2)^n = (n/2) ((1 − x)/2)^{n−1}.
Moment generating function answers

- For X standard normal,
  E[e^{3X − 3}] = ∫ e^{3x − 3} (1/√(2π)) e^{−x^2/2} dx
               = ∫ (1/√(2π)) e^{−(x^2 − 6x + 6)/2} dx
               = ∫ (1/√(2π)) e^{−(x^2 − 6x + 9)/2} e^{3/2} dx
               = e^{3/2} ∫ (1/√(2π)) e^{−(x − 3)^2/2} dx = e^{3/2}.
- Similarly,
  E[e^X 1_{X ∈ (a,b)}] = ∫_a^b e^x (1/√(2π)) e^{−x^2/2} dx
                       = ∫_a^b (1/√(2π)) e^{−(x^2 − 2x)/2} dx
                       = ∫_a^b (1/√(2π)) e^{−(x^2 − 2x + 1 − 1)/2} dx
                       = e^{1/2} ∫_a^b (1/√(2π)) e^{−(x − 1)^2/2} dx
                       = e^{1/2} ∫_{a−1}^{b−1} (1/√(2π)) e^{−x^2/2} dx
                       = e^{1/2} (Φ(b − 1) − Φ(a − 1)).
If you want more probability and statistics...

- UNDERGRADUATE:
  (a) 18.615 Introduction to Stochastic Processes
  (b) 18.642 Topics in Math with Applications in Finance
  (c) 18.650 Statistics for Applications
- GRADUATE LEVEL PROBABILITY:
  (a) 18.175 Theory of Probability
  (b) 18.176 Stochastic Calculus
  (c) 18.177 Topics in Stochastic Processes (topics vary; repeatable, offered twice next year)
- GRADUATE LEVEL STATISTICS:
  (a) 18.655 Mathematical Statistics
  (b) 18.657 Topics in Statistics (topics vary; topic this year was machine learning; repeatable)
- OUTSIDE OF MATH DEPARTMENT:
  (a) Look up the new MIT minor in statistics and data sciences.
  (b) Look up the long list of probability/statistics courses (about 78 total) at https://stat.mit.edu/academics/subjects/
  (c) Ask other MIT faculty how they use probability and statistics in their research.

Thanks for taking the course!

- Considering previous generations of mathematically inclined MIT students, and adopting a frequentist point of view...
- You will probably do some important things with your lives.
- I hope your probabilistic shrewdness serves you well.
- Thinking more short term...
- Happy exam day!
- And may the odds be ever in your favor.