CME 106 - Probability Cheatsheet
Introduction to Probability and Combinatorics
❐ Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by $S$.
❐ Event ― Any subset $E$ of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in $E$, then we say that $E$ has occurred.
❐ Axioms of probability ― For each event $E$, we denote $P(E)$ as the probability of event $E$ occurring.

Axiom 1 ― Every probability is between 0 and 1 included, i.e.:
$$0 \leqslant P(E) \leqslant 1$$

Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e.:
$$P(S) = 1$$

Axiom 3 ― For any sequence of mutually exclusive events $E_1, \ldots, E_n$, we have:
$$P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} P(E_i)$$
❐ Permutation ― A permutation is an arrangement of $r$ objects from a pool of $n$ objects, in a given order. The number of such arrangements is given by $P(n, r)$, defined as:
$$P(n, r) = \frac{n!}{(n-r)!}$$
❐ Combination ― A combination is an arrangement of $r$ objects from a pool of $n$ objects, where the order does not matter. The number of such arrangements is given by $C(n, r)$, defined as:
$$C(n, r) = \frac{P(n, r)}{r!} = \frac{n!}{r!(n-r)!}$$
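These counts can be sanity-checked with Python's standard library (a minimal sketch; the choice of $n = 5$ and $r = 3$ is arbitrary):

```python
from math import comb, perm

# P(5, 3): ordered arrangements of 3 objects out of 5, i.e. 5!/(5-3)!
print(perm(5, 3))  # 60

# C(5, 3): unordered selections of 3 objects out of 5, i.e. 5!/(3! 2!)
print(comb(5, 3))  # 10
```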
Conditional Probability
❐ Bayes' rule ― For events $A$ and $B$ such that $P(B) > 0$, we have:
$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$
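As a sketch of the rule in action, the classic false-positive computation below inverts a conditional probability; the sensitivity, false-positive rate and prevalence values are purely hypothetical:

```python
# Hypothetical diagnostic test: A = "has condition", B = "tests positive"
p_a = 0.01              # P(A): prevalence
p_b_given_a = 0.99      # P(B|A): sensitivity
p_b_given_not_a = 0.05  # P(B|A^c): false-positive rate

# P(B) via the law of total probability over the partition {A, A^c}
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' rule: P(A|B) = P(B|A) P(A) / P(B)
print(p_b_given_a * p_a / p_b)  # ~0.1667: most positives are false alarms
```

Even with a very sensitive test, the low prevalence keeps the posterior $P(A|B)$ small.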
❐ Partition ― Let $\{A_i, i \in [1, n]\}$ be such that for all $i$, $A_i \neq \emptyset$. We say that $\{A_i\}$ is a partition if we have:
$$\forall i \neq j,\ A_i \cap A_j = \emptyset \qquad\text{and}\qquad \bigcup_{i=1}^{n} A_i = S$$
Remark: for any event $B$ in the sample space, we have $P(B) = \sum_{i=1}^{n} P(B|A_i)P(A_i)$.
❐ Extended form of Bayes' rule ― Let $\{A_i\}$ be a partition of the sample space. We have:
$$P(A_k|B) = \frac{P(B|A_k)P(A_k)}{\displaystyle\sum_{i=1}^{n} P(B|A_i)P(A_i)}$$
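A small numerical sketch of the extended form, with a hypothetical three-machine production line as the partition and $B$ the event that a part is defective:

```python
p_a = [0.5, 0.3, 0.2]             # P(A_i): share of parts made by machine i
p_b_given_a = [0.01, 0.02, 0.05]  # P(B|A_i): defect rate of machine i

# Denominator: P(B) = sum_i P(B|A_i) P(A_i)
p_b = sum(pb * pa for pb, pa in zip(p_b_given_a, p_a))

# Extended Bayes' rule: P(A_k|B) for each machine k
posterior = [pb * pa / p_b for pb, pa in zip(p_b_given_a, p_a)]
print([round(p, 4) for p in posterior])  # [0.2381, 0.2857, 0.4762]
```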
❐ Independence ― Two events $A$ and $B$ are independent if and only if we have:
$$P(A \cap B) = P(A)P(B)$$
Random Variables
Definitions
❐ Random variable ― A random variable, often noted $X$, is a function that maps every element in a sample space to the real line.
❐ Cumulative distribution function (CDF) ― The cumulative distribution function $F$, which is monotonically non-decreasing and is such that $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to +\infty} F(x) = 1$, is defined as:
$$F(x) = P(X \leqslant x)$$
❐ Probability density function (PDF) ― The probability density function f is the probability that X
takes on values between two adjacent realizations of the random variable.
❐ Discrete case ― Here, $X$ takes discrete values, such as outcomes of coin flips. By noting $f$ and $F$ the PDF and CDF respectively, we have the following relations:
$$F(x) = \sum_{x_i \leqslant x} P(X = x_i) \qquad\text{and}\qquad f(x_j) = P(X = x_j)$$
with:
$$0 \leqslant f(x_j) \leqslant 1 \qquad\text{and}\qquad \sum_{j} f(x_j) = 1$$
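A short sketch of these relations in Python, with a hypothetical loaded die as the PMF:

```python
import numpy as np

# PMF of a hypothetical loaded die: values in [0, 1] that sum to 1
x = np.arange(1, 7)
f = np.array([0.1, 0.1, 0.1, 0.2, 0.2, 0.3])
assert np.isclose(f.sum(), 1.0)

# CDF obtained by accumulating the PMF: F(x) = sum of f(x_i) over x_i <= x
F = np.cumsum(f)
print(F)  # [0.1 0.2 0.3 0.5 0.7 1. ]
```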
❐ Continuous case ― Here, $X$ takes continuous values, such as the temperature in the room. By noting $f$ and $F$ the PDF and CDF respectively, we have the following relations:
$$F(x) = \int_{-\infty}^{x} f(y)\,dy \qquad\text{and}\qquad f(x) = \frac{dF}{dx}$$
with:
$$f(x) \geqslant 0 \qquad\text{and}\qquad \int_{-\infty}^{+\infty} f(x)\,dx = 1$$
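These relations can be checked numerically; the sketch below integrates the standard normal PDF from scipy.stats and compares against its CDF:

```python
import numpy as np
from scipy import integrate, stats

# F(x) should equal the integral of the PDF from -inf to x
x = 1.0
area, _ = integrate.quad(stats.norm.pdf, -np.inf, x)
print(np.isclose(area, stats.norm.cdf(x)))  # True

# and the PDF should integrate to 1 over the whole real line
total, _ = integrate.quad(stats.norm.pdf, -np.inf, np.inf)
print(np.isclose(total, 1.0))  # True
```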
Expectation and Moments of the Distribution
In the following sections, we are going to keep the same notations as before and the formulas will
be explicitly detailed for the discrete (D) and continuous (C) cases.
❐ Expected value ― The expected value of a random variable, also known as the mean value or the
first moment, is often noted E[X] or μ and is the value that we would obtain by averaging the
results of the experiment infinitely many times. It is computed as follows:
$$\text{(D)}\quad E[X] = \sum_{i=1}^{n} x_i f(x_i) \qquad\text{and}\qquad \text{(C)}\quad E[X] = \int_{-\infty}^{+\infty} x f(x)\,dx$$
❐ Generalization of the expected value ― The expected value of a function of a random variable
g(X) is computed as follows:
$$\text{(D)}\quad E[g(X)] = \sum_{i=1}^{n} g(x_i) f(x_i) \qquad\text{and}\qquad \text{(C)}\quad E[g(X)] = \int_{-\infty}^{+\infty} g(x) f(x)\,dx$$
❐ k-th moment ― The k-th moment, noted $E[X^k]$, is the value of $X^k$ that we expect to observe on average over infinitely many trials. It is computed as follows:
$$\text{(D)}\quad E[X^k] = \sum_{i=1}^{n} x_i^k f(x_i) \qquad\text{and}\qquad \text{(C)}\quad E[X^k] = \int_{-\infty}^{+\infty} x^k f(x)\,dx$$
Remark: the k-th moment is a particular case of the previous definition with $g : X \mapsto X^k$.
❐ Variance ― The variance of a random variable, often noted $\text{Var}(X)$ or $\sigma^2$, is a measure of the spread of its distribution function. It is determined as follows:
$$\text{Var}(X) = E[(X - E[X])^2] = E[X^2] - E[X]^2$$
❐ Standard deviation ― The standard deviation of a random variable, often noted $\sigma$, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:
$$\sigma = \sqrt{\text{Var}(X)}$$
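The sketch below applies these definitions to the hypothetical loaded die from the discrete-case example above:

```python
import numpy as np

x = np.arange(1, 7)
f = np.array([0.1, 0.1, 0.1, 0.2, 0.2, 0.3])  # loaded-die PMF

mu = np.sum(x * f)      # E[X]   = sum of x_i f(x_i)
ex2 = np.sum(x**2 * f)  # E[X^2] = second moment
var = ex2 - mu**2       # Var(X) = E[X^2] - E[X]^2
sigma = np.sqrt(var)    # standard deviation
print(mu, var, sigma)   # 4.2, 2.76, ~1.661
```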
❐ Characteristic function ― A characteristic function $\psi(\omega)$ is derived from a probability density function $f(x)$ and is defined as:
$$\text{(D)}\quad \psi(\omega) = \sum_{i=1}^{n} f(x_i) e^{i\omega x_i} \qquad\text{and}\qquad \text{(C)}\quad \psi(\omega) = \int_{-\infty}^{+\infty} f(x) e^{i\omega x}\,dx$$
❐ Euler's formula ― For $\theta \in \mathbb{R}$, the Euler formula is the name given to the identity:
$$e^{i\theta} = \cos\theta + i\sin\theta$$
❐ Revisiting the k-th moment ― The k-th moment can also be computed with the characteristic function as follows:
$$E[X^k] = \frac{1}{i^k} \left[\frac{\partial^k \psi}{\partial \omega^k}\right]_{\omega = 0}$$
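This identity can be checked symbolically; the sketch below differentiates the characteristic function of $\text{Exp}(\lambda)$ (listed in the continuous-distribution table further down) with sympy and recovers its first two moments:

```python
import sympy as sp

omega = sp.symbols('omega', real=True)
lam = sp.symbols('lambda', positive=True)

# Characteristic function of Exp(lambda): psi(omega) = 1 / (1 - i omega / lambda)
psi = 1 / (1 - sp.I * omega / lam)

# E[X^k] = (1/i^k) * k-th derivative of psi, evaluated at omega = 0
for k in (1, 2):
    moment = sp.simplify(sp.diff(psi, omega, k).subs(omega, 0) / sp.I**k)
    print(k, moment)  # 1/lambda, then 2/lambda**2
```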
❐ Transformation of random variables ― Let the variables $X$ and $Y$ be linked by some function. By noting $f_X$ and $f_Y$ the distribution function of $X$ and $Y$ respectively, we have:
$$f_Y(y) = f_X(x) \left|\frac{dx}{dy}\right|$$
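A Monte Carlo sketch of the formula for the arbitrary choice $Y = X^2$ with $X \sim \text{Exp}(1)$, comparing the analytic density with an empirical estimate:

```python
import numpy as np

# For Y = X**2 with X ~ Exp(1): x = sqrt(y), |dx/dy| = 1/(2 sqrt(y)), so
# f_Y(y) = exp(-sqrt(y)) / (2 sqrt(y)) for y > 0.
rng = np.random.default_rng(0)
y_samples = rng.exponential(1.0, size=1_000_000) ** 2

h = 0.01  # half-width of the window used for the empirical density estimate
for y0 in (0.5, 1.0, 2.0):
    analytic = np.exp(-np.sqrt(y0)) / (2 * np.sqrt(y0))
    empirical = np.mean(np.abs(y_samples - y0) < h) / (2 * h)
    print(y0, round(analytic, 3), round(empirical, 3))  # pairs roughly agree
```

Note that the formula as stated applies to monotone transformations; $Y = X^2$ qualifies here because $X \geqslant 0$.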
❐ Leibniz integral rule ― Let $g$ be a function of $x$ and potentially $c$, and $a, b$ boundaries that may depend on $c$. We have:
$$\frac{\partial}{\partial c}\left(\int_{a}^{b} g(x)\,dx\right) = \frac{\partial b}{\partial c} \cdot g(b) - \frac{\partial a}{\partial c} \cdot g(a) + \int_{a}^{b} \frac{\partial g}{\partial c}(x)\,dx$$
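A symbolic check of the rule with sympy; the integrand $g(x) = e^{-cx}$ and the boundaries $a = c$, $b = c^2$ are arbitrary choices:

```python
import sympy as sp

x, c = sp.symbols('x c', positive=True)
g = sp.exp(-c * x)
a, b = c, c**2

# Left-hand side: differentiate the integral directly
lhs = sp.diff(sp.integrate(g, (x, a, b)), c)

# Right-hand side: boundary terms plus the integral of dg/dc
rhs = (sp.diff(b, c) * g.subs(x, b) - sp.diff(a, c) * g.subs(x, a)
       + sp.integrate(sp.diff(g, c), (x, a, b)))

print(sp.simplify(lhs - rhs))  # 0
```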
Probability Distributions
❐ Chebyshev's inequality ― Let $X$ be a random variable with expected value $\mu$. For $k, \sigma > 0$, we have the following inequality:
$$P(|X - \mu| \geqslant k\sigma) \leqslant \frac{1}{k^2}$$
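An empirical sketch of the bound for $X \sim \text{Exp}(1)$, for which $\mu = \sigma = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(1.0, size=1_000_000)
mu, sigma = x.mean(), x.std()

# Observed tail mass P(|X - mu| >= k sigma) versus the Chebyshev bound 1/k^2
for k in (2, 3):
    freq = np.mean(np.abs(x - mu) >= k * sigma)
    print(k, round(freq, 4), "<=", round(1 / k**2, 4))
```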
❐ Discrete distributions ― Here are the main discrete distributions to have in mind:
| Distribution | PDF | $\psi(\omega)$ | $E[X]$ | $\text{Var}(X)$ |
|---|---|---|---|---|
| $X \sim \text{Po}(\mu)$ | $f(x) = \dfrac{\mu^x e^{-\mu}}{x!}$ | $e^{\mu(e^{i\omega} - 1)}$ | $\mu$ | $\mu$ |
❐ Continuous distributions ― Here are the main continuous distributions to have in mind:
| Distribution | PDF | $\psi(\omega)$ | $E[X]$ | $\text{Var}(X)$ |
|---|---|---|---|---|
| $X \sim \mathcal{N}(\mu, \sigma)$ | $f(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$ | $e^{i\omega\mu - \frac{1}{2}\omega^2\sigma^2}$ | $\mu$ | $\sigma^2$ |
| $X \sim \text{Exp}(\lambda)$ | $f(x) = \lambda e^{-\lambda x}$ | $\dfrac{1}{1 - \frac{i\omega}{\lambda}}$ | $\dfrac{1}{\lambda}$ | $\dfrac{1}{\lambda^2}$ |
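The mean and variance columns of both tables can be read back from scipy.stats (the parameter values $\mu = 3$, $\sigma = 2$ and $\lambda = 0.5$ below are arbitrary):

```python
from scipy import stats

mu, sigma, lam = 3.0, 2.0, 0.5

print(stats.poisson(mu).stats(moments='mv'))         # E[X] = mu, Var(X) = mu
print(stats.norm(mu, sigma).stats(moments='mv'))     # E[X] = mu, Var(X) = sigma**2
print(stats.expon(scale=1/lam).stats(moments='mv'))  # E[X] = 1/lam, Var(X) = 1/lam**2
```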
Jointly Distributed Random Variables
❐ Joint probability density function ― The joint probability density function of two random variables $X$ and $Y$, that we note $f_{XY}$, is defined as follows:
$$\text{(D)}\quad f_{XY}(x_i, y_j) = P(X = x_i \text{ and } Y = y_j) \qquad\text{and}\qquad \text{(C)}\quad f_{XY}(x, y)\,\Delta x\,\Delta y = P(x \leqslant X \leqslant x + \Delta x \text{ and } y \leqslant Y \leqslant y + \Delta y)$$
❐ Marginal density ― We define the marginal density for the variable $X$ as follows:
$$\text{(D)}\quad f_X(x_i) = \sum_{j} f_{XY}(x_i, y_j) \qquad\text{and}\qquad \text{(C)}\quad f_X(x) = \int_{-\infty}^{+\infty} f_{XY}(x, y)\,dy$$
❐ Cumulative distribution ― We define the joint cumulative distribution function $F_{XY}$ as follows:
$$\text{(D)}\quad F_{XY}(x, y) = \sum_{x_i \leqslant x} \sum_{y_j \leqslant y} f_{XY}(x_i, y_j) \qquad\text{and}\qquad \text{(C)}\quad F_{XY}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{XY}(x', y')\,dy'\,dx'$$
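A numerical sketch of the continuous marginal, using the standard textbook joint density $f_{XY}(x, y) = x + y$ on the unit square, whose marginal is $f_X(x) = x + \frac{1}{2}$:

```python
from scipy import integrate

def f_xy(y, x):
    # joint density on [0, 1] x [0, 1]; quad integrates over the first argument
    return x + y

x = 0.3
f_x, _ = integrate.quad(f_xy, 0.0, 1.0, args=(x,))  # integrate out y
print(round(f_x, 4))  # 0.8, i.e. x + 1/2
```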
❐ Conditional density ― The conditional density of $X$ with respect to $Y$, often noted $f_{X|Y}$, is defined as follows:
$$f_{X|Y}(x) = \frac{f_{XY}(x, y)}{f_Y(y)}$$
❐ Moments of joint distributions ― We define the moments of joint distributions of $X$ and $Y$, noted $E[X^p Y^q]$, as follows:
$$\text{(D)}\quad E[X^p Y^q] = \sum_{i} \sum_{j} x_i^p y_j^q f_{XY}(x_i, y_j) \qquad\text{and}\qquad \text{(C)}\quad E[X^p Y^q] = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} x^p y^q f_{XY}(x, y)\,dx\,dy$$
❐ Covariance ― We define the covariance of two random variables $X$ and $Y$, that we note $\sigma_{XY}^2$ or $\text{Cov}(X, Y)$, as follows:
$$\text{Cov}(X, Y) \triangleq \sigma_{XY}^2 = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y$$
❐ Correlation ― By noting $\sigma_X$, $\sigma_Y$ the standard deviations of $X$ and $Y$, we define the correlation between $X$ and $Y$, noted $\rho_{XY}$, as follows:
$$\rho_{XY} = \frac{\sigma_{XY}^2}{\sigma_X \sigma_Y}$$
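A sketch estimating both quantities from synthetic data, constructed so that the true covariance and correlation both equal 0.6:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.6 * x + 0.8 * rng.normal(size=100_000)  # Var(Y) = 0.36 + 0.64 = 1

cov = np.mean((x - x.mean()) * (y - y.mean()))  # E[(X - mu_X)(Y - mu_Y)]
rho = cov / (x.std() * y.std())
print(round(cov, 3), round(rho, 3))  # both close to 0.6
```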