Probability Cheatsheet: Midterm Edition

Compiled by William Chen (http://wzchen.com) and Joe Blitzstein, with contributions from Sebastian Chiu, Yuan Jiang, Yuqi Hou, and Jessy Hwang. Material based on Joe Blitzstein's lectures (http://stat110.net) and the Blitzstein-Hwang Introduction to Probability textbook (https://amzn.to/2L4rYs5). Licensed under CC BY-NC-SA 4.0.

Last Updated September 26, 2022


Counting

Multiplication Rule

Let's say we have a compound experiment (an experiment with multiple components). If the 1st component has n1 possible outcomes, the 2nd component has n2 possible outcomes, ..., and the rth component has nr possible outcomes, then overall there are n1 n2 ... nr possibilities for the whole experiment.

Sampling Table

The sampling table gives the number of possible samples of size k out of a population of size n, under various assumptions about how the sample is collected.

                        Order Matters        Order Doesn't Matter
With Replacement        n^k                  C(n + k - 1, k)
Without Replacement     n!/(n - k)!          C(n, k)

Naive Definition of Probability

If all outcomes are equally likely, the probability of an event A happening is:

P_naive(A) = (number of outcomes favorable to A) / (number of outcomes)
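
As a quick check, the four counts in the sampling table above can be computed directly with Python's standard library. This is a minimal sketch; the values n = 5 and k = 3 are arbitrary illustrative choices:

    from math import comb, perm

    n, k = 5, 3

    print(n ** k)              # order matters, with replacement
    print(perm(n, k))          # order matters, without replacement: n!/(n-k)!
    print(comb(n + k - 1, k))  # order doesn't matter, with replacement
    print(comb(n, k))          # order doesn't matter, without replacement
    # 125, 60, 35, 10
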

Thinking Conditionally

Independence

Independent Events A and B are independent if knowing whether A occurred gives no information about whether B occurred. More formally, A and B (which have nonzero probability) are independent if and only if one of the following equivalent statements holds:

P(A ∩ B) = P(A)P(B)
P(A|B) = P(A)
P(B|A) = P(B)

Conditional Independence A and B are conditionally independent given C if P(A ∩ B|C) = P(A|C)P(B|C). Conditional independence does not imply independence, and independence does not imply conditional independence.

Unions, Intersections, and Complements

De Morgan's Laws A useful identity that can make calculating probabilities of unions easier by relating them to intersections, and vice versa. Analogous results hold with more than two sets.

(A ∪ B)^c = A^c ∩ B^c
(A ∩ B)^c = A^c ∪ B^c

Joint, Marginal, and Conditional

Joint Probability P(A ∩ B) or P(A, B) – Probability of A and B.

Marginal (Unconditional) Probability P(A) – Probability of A.

Conditional Probability P(A|B) = P(A, B)/P(B) – Probability of A, given that B occurred.

Conditional Probability is Probability P(A|B) is a probability function for any fixed B. Any theorem that holds for probability also holds for conditional probability.

Probability of an Intersection or Union

Intersections via Conditioning

P(A, B) = P(A)P(B|A)
P(A, B, C) = P(A)P(B|A)P(C|A, B)

Unions via Inclusion-Exclusion

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
P(A ∪ B ∪ C) = P(A) + P(B) + P(C)
             − P(A ∩ B) − P(A ∩ C) − P(B ∩ C)
             + P(A ∩ B ∩ C).
Law of Total Probability (LOTP)

Let B1, B2, B3, ..., Bn be a partition of the sample space (i.e., they are disjoint and their union is the entire sample space).

P(A) = P(A|B1)P(B1) + P(A|B2)P(B2) + ··· + P(A|Bn)P(Bn)
P(A) = P(A ∩ B1) + P(A ∩ B2) + ··· + P(A ∩ Bn)

For LOTP with extra conditioning, just add in another event C!

P(A|C) = P(A|B1, C)P(B1|C) + ··· + P(A|Bn, C)P(Bn|C)
P(A|C) = P(A ∩ B1|C) + P(A ∩ B2|C) + ··· + P(A ∩ Bn|C)

Special case of LOTP with B and B^c as partition:

P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)
P(A) = P(A ∩ B) + P(A ∩ B^c)

Bayes' Rule

Bayes' Rule, and with extra conditioning (just add in C!):

P(A|B) = P(B|A)P(A) / P(B)

P(A|B, C) = P(B|A, C)P(A|C) / P(B|C)

We can also write

P(A|B, C) = P(A, B, C)/P(B, C) = P(B, C|A)P(A)/P(B, C)

Odds Form of Bayes' Rule

P(A|B)/P(A^c|B) = [P(B|A)/P(B|A^c)] · [P(A)/P(A^c)]

The posterior odds of A are the likelihood ratio times the prior odds.
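
LOTP and Bayes' Rule are straightforward to check numerically. Below is a minimal Python sketch using a two-event partition (B and B^c) with made-up numbers; none of the values come from the cheatsheet itself:

    # Prior and likelihoods (hypothetical numbers, for illustration only)
    p_B = 0.01                   # P(B)
    p_A_given_B = 0.95           # P(A|B)
    p_A_given_Bc = 0.02          # P(A|B^c)

    # LOTP with B and B^c as the partition
    p_A = p_A_given_B * p_B + p_A_given_Bc * (1 - p_B)

    # Bayes' Rule: P(B|A) = P(A|B)P(B) / P(A)
    p_B_given_A = p_A_given_B * p_B / p_A

    # Odds form: posterior odds = likelihood ratio * prior odds
    posterior_odds = (p_A_given_B / p_A_given_Bc) * (p_B / (1 - p_B))

    print(p_A, p_B_given_A, posterior_odds / (1 + posterior_odds))
    # the last two numbers agree: both are P(B|A)
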
Random Variables and their Distributions

PMF, CDF, and Independence

Probability Mass Function (PMF) Gives the probability that a discrete random variable takes on the value x.

pX(x) = P(X = x)

The PMF satisfies

pX(x) ≥ 0 and Σ_x pX(x) = 1

[Figure: example PMF of a discrete random variable supported on 0, 1, 2, 3, 4.]

Cumulative Distribution Function (CDF) Gives the probability that a random variable is less than or equal to x.

FX(x) = P(X ≤ x)

[Figure: the corresponding CDF, a right-continuous step function rising from 0 to 1.]

The CDF is an increasing, right-continuous function with FX(x) → 0 as x → −∞ and FX(x) → 1 as x → ∞.

Independence Intuitively, two random variables are independent if knowing the value of one gives no information about the other. Discrete r.v.s X and Y are independent if for all values of x and y

P(X = x, Y = y) = P(X = x)P(Y = y)
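
To make the PMF and CDF definitions concrete, the short Python sketch below tabulates a small PMF and accumulates it into a CDF; the particular choice of a Bin(4, 1/2) PMF is just for illustration, much like the figures above:

    from math import comb
    from itertools import accumulate

    support = range(5)                               # X takes values 0, 1, 2, 3, 4
    pmf = {x: comb(4, x) * 0.5**4 for x in support}  # pX(x) = P(X = x) for X ~ Bin(4, 1/2)

    assert all(p >= 0 for p in pmf.values())         # pX(x) >= 0
    assert abs(sum(pmf.values()) - 1) < 1e-12        # the PMF sums to 1

    # CDF on the support: FX(x) = P(X <= x)
    cdf = dict(zip(support, accumulate(pmf.values())))
    print(pmf)  # {0: 0.0625, 1: 0.25, 2: 0.375, 3: 0.25, 4: 0.0625}
    print(cdf)  # {0: 0.0625, 1: 0.3125, 2: 0.6875, 3: 0.9375, 4: 1.0}
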
Expected Value and Indicators

Expected Value and Linearity

Expected Value (a.k.a. mean, expectation, or average) is a weighted average of the possible outcomes of our random variable. Mathematically, if x1, x2, x3, ... are all of the distinct possible values that X can take, the expected value of X is

E(X) = Σ_i xi P(X = xi)

Linearity can be seen from a table of sample values of X, Y, and X + Y: averaging the X column and the Y column separately and then adding gives the same result as averaging the X + Y column,

(1/n) Σ_{i=1}^n xi + (1/n) Σ_{i=1}^n yi = (1/n) Σ_{i=1}^n (xi + yi)

and in the same way

E(X) + E(Y) = E(X + Y)

Linearity For any r.v.s X and Y, and constants a, b, c,

E(aX + bY + c) = aE(X) + bE(Y) + c

Same distribution implies same mean If X and Y have the same distribution, then E(X) = E(Y) and, more generally,

E(g(X)) = E(g(Y))

Indicator Random Variables

Indicator Random Variable is a random variable that takes on the value 1 or 0. It is always an indicator of some event: if the event occurs, the indicator is 1; otherwise it is 0. They are useful for many problems about counting how many events of some kind occur. Write

I_A = 1 if A occurs,
      0 if A does not occur.

Note that I_A^2 = I_A, I_A I_B = I_(A∩B), and I_(A∪B) = I_A + I_B − I_A I_B.

Distribution I_A ∼ Bern(p) where p = P(A).

Fundamental Bridge The expectation of the indicator for event A is the probability of event A: E(I_A) = P(A).

Variance and Standard Deviation

Var(X) = E((X − E(X))^2) = E(X^2) − (E(X))^2

SD(X) = √(Var(X))
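
Both the fundamental bridge and linearity of expectation are easy to verify by simulation. The Python sketch below uses a hypothetical pair of dependent random variables (one die roll, and the maximum of that roll and a second roll), chosen only to emphasize that linearity does not require independence:

    import random

    random.seed(0)
    trials = 100_000

    sum_x = sum_y = sum_xy = count_a = 0
    for _ in range(trials):
        x = random.randint(1, 6)          # one die roll
        y = max(x, random.randint(1, 6))  # dependent on x by construction
        sum_x += x
        sum_y += y
        sum_xy += x + y
        count_a += (x == 6)               # indicator of the event A = {X = 6}

    # Linearity: E(X) + E(Y) = E(X + Y), even though X and Y are dependent
    print(sum_x / trials + sum_y / trials, sum_xy / trials)

    # Fundamental bridge: E(I_A) = P(A) = 1/6
    print(count_a / trials, 1 / 6)
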

Discrete Distributions

Distributions for four sampling schemes

                           Replace                 No Replace
Fixed # trials (n)         Binomial                HGeom
                           (Bern if n = 1)
Draw until r successes     NBin                    NHGeom
                           (Geom if r = 1)

Bernoulli Distribution

The Bernoulli distribution is the simplest case of the Binomial distribution, where we only have one trial (n = 1). Let us say that X is distributed Bern(p). We know the following:

Story A trial is performed with probability p of "success", and X is the indicator of success: 1 means success, 0 means failure.

Example Let X be the indicator of Heads for a fair coin toss. Then X ∼ Bern(1/2). Also, 1 − X ∼ Bern(1/2) is the indicator of Tails.

Binomial Distribution

Let us say that X is distributed Bin(n, p). We know the following:

Story X is the number of "successes" that we will achieve in n independent trials, where each trial is either a success or a failure, each with the same probability p of success. We can also write X as a sum of multiple independent Bern(p) random variables. Let X ∼ Bin(n, p) and Xj ∼ Bern(p), where all of the Bernoullis are independent. Then

X = X1 + X2 + X3 + ··· + Xn

[Figure: PMF of the Bin(10, 1/2) distribution.]

Example If Jeremy Lin makes 10 free throws and each one independently has a 3/4 chance of getting in, then the number of free throws he makes is distributed Bin(10, 3/4).

Properties Let X ∼ Bin(n, p), Y ∼ Bin(m, p) with X ⊥ Y.

• Redefine success n − X ∼ Bin(n, 1 − p)
• Sum X + Y ∼ Bin(n + m, p)
• Conditional X|(X + Y = r) ∼ HGeom(n, m, r)
• Binomial-Poisson Relationship Bin(n, p) is approximately Pois(λ), with λ = np, if p is small.
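
The sum-of-Bernoullis story can be checked empirically. In this Python sketch, n = 10 and p = 3/4 mirror the free-throw example, and the empirical frequencies are compared with the Binomial PMF computed from math.comb:

    import random
    from math import comb

    random.seed(0)
    n, p, trials = 10, 0.75, 100_000

    # Simulate X = X1 + ... + Xn with Xj ~ Bern(p) i.i.d.
    counts = [0] * (n + 1)
    for _ in range(trials):
        x = sum(random.random() < p for _ in range(n))
        counts[x] += 1

    for k in (6, 7, 8):  # compare a few values of the PMF
        empirical = counts[k] / trials
        exact = comb(n, k) * p**k * (1 - p)**(n - k)
        print(k, round(empirical, 4), round(exact, 4))
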
Geometric Distribution

Let us say that X is distributed Geom(p). We know the following:

Story X is the number of "failures" that we will achieve before we achieve our first success. Our successes have probability p.

Example If each pokeball we throw has probability 1/10 to catch Mew, the number of failed pokeballs will be distributed Geom(1/10).

First Success Distribution

Equivalent to the Geometric distribution, except that it includes the first success in the count. This is 1 more than the number of failures. If X ∼ FS(p) then E(X) = 1/p.

Negative Binomial Distribution

Let us say that X is distributed NBin(r, p). We know the following:

Story X is the number of "failures" that we will have before we achieve our rth success. Our successes have probability p.

Example Thundershock has 60% accuracy and can faint a wild Raticate in 3 hits. The number of misses before Pikachu faints Raticate with Thundershock is distributed NBin(3, 0.6).

Hypergeometric Distribution

Let us say that X is distributed HGeom(w, b, n). We know the following:

Story In a population of w desired objects and b undesired objects, X is the number of "successes" we will have in a draw of n objects, without replacement. The draw of n objects is assumed to be a simple random sample (all sets of n objects are equally likely).

Examples Here are some HGeom examples.

• Let's say that we have only b Weedles (failure) and w Pikachus (success) in Viridian Forest. We encounter n Pokemon in the forest, and X is the number of Pikachus in our encounters.
• The number of Aces in a 5 card hand.
• You have w white balls and b black balls, and you draw n balls. You will draw X white balls.
• You have w white balls and b black balls, and you draw n balls without replacement. The number of white balls in your sample is HGeom(w, b, n); the number of black balls is HGeom(b, w, n).
• Capture-recapture A forest has N elk, you capture n of them, tag them, and release them. Then you recapture a new sample of size m. How many tagged elk are now in the new sample? HGeom(n, N − n, m)

Poisson Distribution

Let us say that X is distributed Pois(λ). We know the following:

Story There are rare events (low probability events) that occur many different ways (high possibilities of occurrences) at an average rate of λ occurrences per unit space or time. The number of events that occur in that unit of space or time is X.

Example A certain busy intersection has an average of 2 accidents per month. Since an accident is a low probability event that can happen many different ways, it is reasonable to model the number of accidents in a month at that intersection as Pois(2). Then the number of accidents that happen in two months at that intersection is distributed Pois(4).

Properties Let X ∼ Pois(λ1) and Y ∼ Pois(λ2), with X ⊥ Y.

1. Sum X + Y ∼ Pois(λ1 + λ2)
2. Conditional X|(X + Y = n) ∼ Bin(n, λ1/(λ1 + λ2))
3. Chicken-egg If there are Z ∼ Pois(λ) items and we randomly and independently "accept" each item with probability p, then the number of accepted items Z1 ∼ Pois(λp), the number of rejected items Z2 ∼ Pois(λ(1 − p)), and Z1 ⊥ Z2.
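
The chicken-egg (Poisson thinning) property is a nice one to see by simulation: splitting a Poisson count with independent accept/reject decisions yields two counts with the stated means. A minimal Python sketch; λ = 4 and p = 0.3 are arbitrary illustrative choices, and the small Poisson sampler is a standard Knuth-style routine:

    import math
    import random

    random.seed(0)
    lam, p, trials = 4.0, 0.3, 50_000

    def poisson(lam):
        """Knuth-style Pois(lam) sampler; fine for small lam."""
        threshold, k, prod = math.exp(-lam), 0, 1.0
        while True:
            prod *= random.random()
            if prod < threshold:
                return k
            k += 1

    accepted = rejected = 0
    for _ in range(trials):
        z = poisson(lam)
        z1 = sum(random.random() < p for _ in range(z))  # "accept" each item w.p. p
        accepted += z1
        rejected += z - z1

    # Sample means should be close to lam*p = 1.2 and lam*(1-p) = 2.8
    print(accepted / trials, rejected / trials)
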
Formulas

Geometric Series

1 + r + r^2 + ··· + r^(n−1) = Σ_{k=0}^{n−1} r^k = (1 − r^n)/(1 − r)

1 + r + r^2 + ··· = 1/(1 − r) if |r| < 1
Exponential Function (e^x)

e^x = Σ_{n=0}^{∞} x^n/n! = 1 + x + x^2/2! + x^3/3! + ··· = lim_{n→∞} (1 + x/n)^n
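
Both series above are easy to sanity-check numerically with partial sums; in this Python sketch, r, n, and x are arbitrary illustrative values:

    import math

    r, n = 0.3, 10
    partial = sum(r**k for k in range(n))
    print(partial, (1 - r**n) / (1 - r))              # finite geometric series: the two agree
    print(sum(r**k for k in range(60)), 1 / (1 - r))  # |r| < 1, so the infinite sum is 1/(1-r)

    x = 1.7
    series = sum(x**k / math.factorial(k) for k in range(30))
    print(series, math.exp(x), (1 + x / 1e6) ** 1e6)  # all three approximate e^x
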
Example Problems

Contributions from Sebastian Chiu

Calculating Probability

A textbook has n typos, which are randomly scattered amongst its n pages, independently. You pick a random page. What is the probability that it has no typos? Answer: There is a 1 − 1/n probability that any specific typo isn't on your page, and thus a (1 − 1/n)^n probability that there are no typos on your page. For n large, this is approximately e^(−1) = 1/e.
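
A quick Monte Carlo check of this answer; in this sketch, n and the number of trials are arbitrary choices:

    import math
    import random

    random.seed(0)
    n, trials = 200, 50_000

    no_typo_pages = 0
    for _ in range(trials):
        pages = [random.randrange(n) for _ in range(n)]  # page of each of the n typos
        my_page = random.randrange(n)
        no_typo_pages += my_page not in pages

    print(no_typo_pages / trials, (1 - 1 / n) ** n, 1 / math.e)
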
Linearity and Indicators (1)

In a group of n people, what is the expected number of distinct birthdays (month and day)? What is the expected number of birthday matches? Answer: Let X be the number of distinct birthdays and Ij be the indicator for the jth day being represented.

E(Ij) = 1 − P(no one born on day j) = 1 − (364/365)^n

By linearity, E(X) = 365(1 − (364/365)^n). Now let Y be the number of birthday matches and Ji be the indicator that the ith pair of people have the same birthday. The probability that any two specific people share a birthday is 1/365, so E(Y) = C(n, 2)/365.
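
Both expectations can be confirmed by simulation. The Python sketch below uses n = 30 people (an arbitrary choice) and, as in the derivation, ignores leap days:

    import random
    from itertools import combinations
    from math import comb

    random.seed(0)
    n, trials = 30, 20_000

    tot_distinct = tot_matches = 0
    for _ in range(trials):
        bdays = [random.randrange(365) for _ in range(n)]
        tot_distinct += len(set(bdays))
        tot_matches += sum(b1 == b2 for b1, b2 in combinations(bdays, 2))

    print(tot_distinct / trials, 365 * (1 - (364 / 365) ** n))  # distinct birthdays
    print(tot_matches / trials, comb(n, 2) / 365)               # birthday matches
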
Linearity and Indicators (2)

There are n people at a party, each with a hat. At the end of the party, they each leave with a random hat. What is the expected number of people who leave with the right hat? Answer: Each hat has a 1/n chance of going to the right person. By linearity, the average number of hats that go to their owners is n(1/n) = 1.
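
This is the matching problem, and the answer of 1 matched hat on average holds for every n; a small Python simulation sketch (n = 10 is arbitrary):

    import random

    random.seed(0)
    n, trials = 10, 50_000

    total = 0
    for _ in range(trials):
        hats = list(range(n))
        random.shuffle(hats)                          # hats[i] is the hat person i leaves with
        total += sum(hats[i] == i for i in range(n))  # count people who got their own hat

    print(total / trials)  # close to 1
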
Linearity and First Success

This problem is known as the coupon collector problem. There are n coupon types. At each draw, you get a uniformly random coupon type. What is the expected number of coupons needed until you have a complete set? Answer: Let N be the number of coupons needed; we want E(N). Let N = N1 + ··· + Nn, where N1 is the number of draws to get our first new coupon, N2 is the additional draws needed to draw our second new coupon, and so on. By the story of the First Success, N2 ∼ FS((n − 1)/n) (after collecting the first coupon type, there's a (n − 1)/n chance you'll get something new). Similarly, N3 ∼ FS((n − 2)/n), and Nj ∼ FS((n − j + 1)/n). By linearity,

E(N) = E(N1) + ··· + E(Nn) = n/n + n/(n − 1) + ··· + n/1 = n Σ_{j=1}^{n} 1/j

This is approximately n(log(n) + 0.577) by Euler's approximation.
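
A simulation of the coupon collector answer; n = 20 is an arbitrary choice, and both the exact harmonic-sum formula and Euler's approximation are printed for comparison:

    import math
    import random

    random.seed(0)
    n, trials = 20, 10_000

    total_draws = 0
    for _ in range(trials):
        seen, draws = set(), 0
        while len(seen) < n:
            seen.add(random.randrange(n))  # draw a uniformly random coupon type
            draws += 1
        total_draws += draws

    harmonic = sum(1 / j for j in range(1, n + 1))
    print(total_draws / trials, n * harmonic, n * (math.log(n) + 0.577))
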
Orderings of i.i.d. random variables

I call 2 UberX's and 3 Lyfts at the same time. If the times it takes for the rides to reach me are i.i.d., what is the probability that all the Lyfts will arrive first? Answer: Since the arrival times of the five cars are i.i.d., all 5! orderings of the arrivals are equally likely. There are 3!2! orderings that involve the Lyfts arriving first, so the probability that the Lyfts arrive first is 3!2!/5! = 1/10. Alternatively, there are C(5, 3) ways to choose 3 of the 5 slots for the Lyfts to occupy, where each of the choices is equally likely. One of these choices has all 3 of the Lyfts arriving first, so the probability is 1/C(5, 3) = 1/10.
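
Because only the ordering of the arrival times matters, the 1/10 answer can be checked with any i.i.d. arrival-time distribution; this Python sketch uses uniform times, an arbitrary choice:

    import random

    random.seed(0)
    trials = 100_000

    lyfts_first = 0
    for _ in range(trials):
        lyft_times = [random.random() for _ in range(3)]
        uber_times = [random.random() for _ in range(2)]
        lyfts_first += max(lyft_times) < min(uber_times)  # all Lyfts arrive before both UberX's

    print(lyfts_first / trials)  # close to 1/10
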
Expectation of Negative Hypergeometric

What is the expected number of cards that you draw before you pick your first Ace in a shuffled deck (not counting the Ace)? Answer: Consider a non-Ace. Denote this to be card j. Let Ij be the indicator that card j will be drawn before the first Ace. Note that Ij = 1 says that j is before all 4 of the Aces in the deck. The probability that this occurs is 1/5 by symmetry. Let X be the number of cards drawn before the first Ace. Then X = I1 + I2 + ... + I48, where each indicator corresponds to one of the 48 non-Aces. Thus,

E(X) = E(I1) + E(I2) + ... + E(I48) = 48/5 = 9.6.
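
A quick simulation of the 48/5 answer, shuffling a deck encoded as 4 Aces and 48 non-Aces:

    import random

    random.seed(0)
    trials = 20_000

    total = 0
    for _ in range(trials):
        deck = ["A"] * 4 + ["x"] * 48
        random.shuffle(deck)
        total += deck.index("A")  # number of cards before the first Ace

    print(total / trials)  # close to 48/5 = 9.6
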
Pattern-matching with e^x Taylor series

For X ∼ Pois(λ), find E(1/(X + 1)). Answer: By LOTUS,

E(1/(X + 1)) = Σ_{k=0}^{∞} [1/(k + 1)] e^(−λ) λ^k/k! = (e^(−λ)/λ) Σ_{k=0}^{∞} λ^(k+1)/(k + 1)! = (e^(−λ)/λ)(e^λ − 1)
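
A numeric check of this LOTUS calculation, comparing a truncated version of the sum against the closed form; λ = 2.5 is an arbitrary choice:

    import math

    lam = 2.5
    term = math.exp(-lam)              # P(X = 0)
    lotus_sum = 0.0
    for k in range(100):               # truncate the infinite LOTUS sum
        lotus_sum += term / (k + 1)    # (1/(k+1)) * P(X = k)
        term *= lam / (k + 1)          # update to P(X = k+1)

    closed_form = (1 - math.exp(-lam)) / lam   # equals (e^(-lam)/lam)(e^lam - 1)
    print(lotus_sum, closed_form)              # both about 0.3672
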
Problem-Solving Strategies

Contributions from Jessy Hwang, Yuan Jiang, Yuqi Hou

1. Getting started. Start by defining relevant events and random variables. ("Let A be the event that I pick the fair coin"; "Let X be the number of successes.") Clear notation is important for clear thinking! Then decide what it is that you're supposed to be finding, in terms of your notation ("I want to find P(X = 3|A)"). Think about what type of object your answer should be (a number? a random variable? a PMF?) and what it should be in terms of.

   Try simple and extreme cases. To make an abstract experiment more concrete, try drawing a picture or making up numbers that could have happened. Pattern recognition: does the structure of the problem resemble something we've seen before?

2. Calculating probability of an event. Use counting principles if the naive definition of probability applies. Is the probability of the complement easier to find? Look for symmetries. Look for something to condition on, then apply Bayes' Rule or the Law of Total Probability.

3. Finding the distribution of a random variable. First make sure you need the full distribution, not just the mean (see next item). Check the support of the random variable: what values can it take on? Use this to rule out distributions that don't fit. Is there a story for one of the named distributions that fits the problem at hand? Can you write the random variable as a function of an r.v. with a known distribution, say Y = g(X)?

4. Calculating expectation. If it has a named distribution, check out the table of distributions. If it's a function of an r.v. with a named distribution, try LOTUS. If it's a count of something, try breaking it up into indicator r.v.s.

5. Symmetry. If X1, ..., Xn are i.i.d., consider using symmetry.

6. Before moving on. Check some simple and extreme cases, check whether the answer seems plausible, check for biohazards.

Biohazards

Contributions from Jessy Hwang

1. Don't misuse the naive definition of probability. When answering "What is the probability that in a group of 3 people, no two have the same birth month?", it is not correct to treat the people as indistinguishable balls being placed into 12 boxes, since that assumes the list of birth months {January, January, January} is just as likely as the list {January, April, June}, even though the latter is six times more likely.

2. Don't confuse unconditional, conditional, and joint probabilities. In applying P(A|B) = P(B|A)P(A)/P(B), it is not correct to say "P(B) = 1 because we know B happened"; P(B) is the prior probability of B. Don't confuse P(A|B) with P(A, B).

3. Don't assume independence without justification. In the matching problem, the probability that card 1 is a match and card 2 is a match is not 1/n^2. Binomial and Hypergeometric are often confused; the trials are independent in the Binomial story and dependent in the Hypergeometric story.

4. Don't forget to do sanity checks. Probabilities must be between 0 and 1. Variances must be ≥ 0. Supports must make sense. PMFs must sum to 1. PDFs must integrate to 1.

5. Don't confuse random variables, numbers, and events. Let X be an r.v. Then g(X) is an r.v. for any function g. In particular, X^2, |X|, F(X), and I_(X>3) are r.v.s. P(X^2 < X | X ≥ 0), E(X), Var(X), and g(E(X)) are numbers. X = 2R and F(X) ≥ −1 are events. It does not make sense to write ∫_{−∞}^{∞} F(X) dx, because F(X) is a random variable. It does not make sense to write P(X), because X is not an event.

6. Don't confuse a random variable with its distribution. To get the PDF of X^2, you can't just square the PDF of X. To get the PDF of X + Y, you can't just add the PDF of X and the PDF of Y.

7. Don't pull non-linear functions out of expectations. E(g(X)) does not equal g(E(X)) in general. The St. Petersburg paradox is an extreme example.
