Recitation Week01v1
Recitation Notes
The common probability distributions occur both on their own and, frequently, as building blocks of other random variables. Recognizing them helps in deriving new results and in modeling new processes.
For example, consider a Bernoulli random variable X that takes values 1 or 0 with
probabilities p and q = 1 − p, respectively. If we draw n independent variables from the
same distribution, and we ask how many 1’s (“successes”) there are, independent of
their order, what is the distribution? Since the ordering doesn’t matter, we can figure
out the probability of one case, then count up the number of different arrangements,
and multiply by that number.
The answer is the binomial distribution,
\[
\mathrm{Prob}(k \text{ successes in } n \text{ trials}) = \binom{n}{k} p^k q^{n-k}.
\]
The combinatorial part $\binom{n}{k}$ represents how many ways there are to place $k$ 1's in a string of $n$ trials.
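As a quick numerical sanity check, here is a minimal Python sketch (the helper name binomial_pmf is just illustrative) comparing the formula against brute-force enumeration of all 2^n outcome strings:

```python
from itertools import product
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Brute force: add up the probability of every length-n outcome string
# that contains exactly k ones.
n, k, p = 5, 2, 0.3
brute = sum(
    p**sum(bits) * (1 - p)**(n - sum(bits))
    for bits in product([0, 1], repeat=n)
    if sum(bits) == k
)
print(binomial_pmf(k, n, p), brute)  # both ~0.3087
```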
But what if we don’t know n in advance? Or what if the orderings are not all
equivalent? In these cases, we don’t rush to group things together. We identify the
cases of interest, one-by-one if necessary, and assign probabilities. Then we’re ready
to compute any quantity of interest.
Example: mean waiting time in a Bernoulli trial. Suppose we keep doing new trials
and drawing new random variables until we succeed. That is, we are interested
in finding the n that corresponds to having one “success” after a string of n − 1
consecutive “failures.” How long does it take, on average, to roll a seven (p = 6/36)
with a pair of dice? How many hands of Texas Hold’em does it take to get dealt a
pair of aces (p = 12/2652)?
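Both probabilities quoted above can be confirmed by direct enumeration; a short Python sketch:

```python
from itertools import product

# Rolling a seven: count the ordered outcomes of two dice that sum to 7.
sevens = sum(1 for a, b in product(range(1, 7), repeat=2) if a + b == 7)
print(sevens, "of 36")  # 6 of 36

# Pocket aces: ordered two-card deals where both cards are aces.
print(4 * 3, "of", 52 * 51)  # 12 of 2652 = 1/221
```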
Let $T$ be the waiting time, a random variable that takes values $1, 2, 3, \dots$. Then $\mathrm{Prob}(T = n) = q^{n-1} p$. Therefore
\[
\begin{aligned}
E[T] &= \sum_{n=1}^{\infty} n q^{n-1} p \\
&= \sum_{n=1}^{\infty} \frac{d}{dq}\left(q^{n}\right) p \\
&= p \, \frac{d}{dq} \left( \sum_{n=0}^{\infty} q^{n} \right) \\
&= p \, \frac{d}{dq} \, \frac{1}{1-q} = p \, \frac{1}{(1-q)^{2}} \\
&= \frac{1}{p}.
\end{aligned}
\]
(Including the $n = 0$ term in the geometric series is harmless: it is a constant, so its derivative vanishes.)
That means that you'll wait, on average, 6 turns to roll a seven with the dice, and 221 hands of poker to get dealt pocket aces, or any other specific pair.
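A short simulation (a minimal sketch using only the standard library; the helper waiting_time is just illustrative) reproduces both averages:

```python
import random

def waiting_time(p, rng):
    """Trials up to and including the first success in Bernoulli(p) draws."""
    t = 1
    while rng.random() >= p:
        t += 1
    return t

rng = random.Random(0)
for p, target in ((6 / 36, 6.0), (12 / 2652, 221.0)):
    samples = [waiting_time(p, rng) for _ in range(50_000)]
    print(round(sum(samples) / len(samples), 1), "vs", target)
```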
There are two tricks I used here. The first was to recognize $n q^{n-1}$ as the derivative of something simpler. The second was to interchange the order of summation and differentiation, which is justified here because the power series converges for $q < 1$, and a power series may be differentiated term by term inside its radius of convergence.
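The trick is also easy to check numerically: truncating the series far enough out, the sum matches the differentiated closed form.

```python
# Compare sum_{n>=1} n q^(n-1) p against p * d/dq [1/(1-q)] = p / (1-q)^2.
p = 6 / 36
q = 1 - p
series = sum(n * q ** (n - 1) * p for n in range(1, 2000))
closed_form = p / (1 - q) ** 2
print(series, closed_form)  # both ~6.0, up to truncation error
```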
Notice that there is no memory of previous events. Whether you have been waiting 2
hands or 220, the expected waiting time from that point forward remains the same.
This is known as the Markov property: future expectations depend only on the current state, not on what happened in the past to get us there.
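One way to see this empirically (a sketch using numpy's geometric sampler, which returns the number of trials up to and including the first success):

```python
import numpy as np

rng = np.random.default_rng(0)
p, m = 6 / 36, 10

# Among runs that have already survived m failures, the expected
# additional wait should still be 1/p.
t = rng.geometric(p, size=1_000_000)
survivors = t[t > m]
print((survivors - m).mean(), "vs", 1 / p)  # both close to 6
```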
Sometimes history does matter; it depends on the question asked. The probability of observing an equal number of heads and tails after two flips of a fair coin is 1/2. But if the first two flips came up the same, then the probability that two additional flips bring you even is only 1/4.
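Enumerating the four equally likely two-flip continuations makes both numbers plain:

```python
from itertools import product

# The four equally likely outcomes of two fair flips: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# Equal heads and tails after two flips: HT and TH qualify.
print(sum(1 for s in outcomes if s.count("H") == 1) / 4)  # 0.5

# After starting HH, only the continuation TT restores the balance.
print(sum(1 for s in outcomes if 2 + s.count("H") == s.count("T")) / 4)  # 0.25
```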
Let’s return to our Bernoulli problem and compute the variance of the waiting time.
We can use the general result that
\[
\mathrm{Var}(T) = E[T^2] - E[T]^2
\]
and compute directly
\[
E[T^2] = \sum_{n=1}^{\infty} n^2 q^{n-1} p.
\]
The previous trick still works if we generalize it a bit. For the $r$-th moment, there will be a factor of $n^r$ in the sum. So we can write
\[
E[T^r] = \frac{p}{q} \sum_{n=0}^{\infty} n^r q^{n}
= \frac{p}{q} \left( q \frac{d}{dq} \right)^{\!r} \sum_{n=0}^{\infty} q^{n}
= \frac{p}{q} \left( q \frac{d}{dq} \right)^{\!r} \frac{1}{1-q}.
\]
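Spelling out the $r = 2$ case, the operator $q \frac{d}{dq}$ applied twice gives
\[
\left( q \frac{d}{dq} \right) \frac{1}{1-q} = \frac{q}{(1-q)^2},
\qquad
\left( q \frac{d}{dq} \right) \frac{q}{(1-q)^2} = \frac{q(1+q)}{(1-q)^3},
\]
so that
\[
E[T^2] = \frac{p}{q} \cdot \frac{q(1+q)}{(1-q)^3} = \frac{1+q}{p^2},
\]
using $1 - q = p$.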
Evaluating for $r = 2$, subtracting off the square of the mean, and letting $q = 1 - p$, we have
\[
\mathrm{Var}(T) = \frac{1+q}{p^2} - \frac{1}{p^2} = \frac{q}{p^2} = \frac{1-p}{p^2}.
\]
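And one last numerical check of the variance (again a sketch using numpy's geometric sampler):

```python
import numpy as np

# Empirical variance of geometric waiting times vs. (1 - p) / p**2.
rng = np.random.default_rng(1)
p = 6 / 36
t = rng.geometric(p, size=1_000_000)
print(t.var(), "vs", (1 - p) / p**2)  # both close to 30
```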