Hitchhiker's Guide to Probability
MATT RAYMOND
Date: 2/3/21.

1. Some Prerequisites
1 Be careful! Some authors use other names for A when A ≃ N.
2 In order to make this rigorous, we really need to define what it means for x to tend to y. This leads us to limits!
The astute reader would recognise that the above sum approximates the area under the graph of f.
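For a concrete sanity check, take f(x) = x on [0, 1] and split [0, 1] into n equal pieces. The sum in question becomes

f(1/n)(1/n) + f(2/n)(1/n) + · · · + f(n/n)(1/n) = (1 + 2 + · · · + n)/n² = (n + 1)/(2n) → 1/2 as n → ∞,

which is exactly the area of the triangle under the graph.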
Notice how f may have many primitives (but they all differ by a constant).
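For instance, F(x) = x²/2 and G(x) = x²/2 + 7 are both primitives of f(x) = x, since F′ = G′ = f, and G − F is the constant 7.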
Now we state a practical and important (slightly simplified) result.
2. Random Variables
with pX (xi ) = P (X = xi ) > 0 for each xi ∈ X(Ω), and pX (x) = 0 for all
others. The set of xi with P(X = xi) > 0 is called the support of X, written supp X.
Example 2.2. Suppose we toss an unfair coin once, with X the number of heads obtained (either 1 or 0), so that pX(1) = p and pX(0) = 1 − pX(1) = 1 − p.
This gives us a first example of a probability distribution, from which X is sampled. We say X is Bernoulli distributed, and write X ∼ Ber(p).
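In code, a minimal sampling sketch (using only the standard library; the name sample_bernoulli is our own) shows the empirical frequency of heads approaching p:

import random

def sample_bernoulli(p: float) -> int:
    # Return 1 ("heads") with probability p, else 0 ("tails").
    return 1 if random.random() < p else 0

p, n = 0.3, 100_000
heads = sum(sample_bernoulli(p) for _ in range(n))
print(heads / n)  # empirical estimate of pX(1); close to 0.3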
We say that in the above X is uniformly distributed on [a, b], or X ∼ U([a, b]).
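Concretely, the density of U([a, b]) is pX(x) = 1/(b − a) for x ∈ [a, b] and pX(x) = 0 otherwise; note that ∫_a^b 1/(b − a) dx = 1, as a density must satisfy.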
Let us quickly review conditional probability. Suppose that X = xj occurs before X = xi. Then we define the conditional probability of X = xi given that X = xj has already occurred as

(2.4) P(X = xi | X = xj) = P(X = xi ∩ X = xj) / P(X = xj),

requiring that P(X = xj) ≠ 0.
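As a sketch of (2.4) in action, with events in place of the values above: roll a fair die; then P(roll is 2 | roll is even) = P(roll is 2 ∩ roll is even)/P(roll is even) = (1/6)/(1/2) = 1/3.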
(2.8) P(X ∩ Y) = P(X)P(Y).
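For example, toss two fair coins and let X be "first coin heads" and Y be "second coin heads". Then P(X ∩ Y) = 1/4 = (1/2)(1/2) = P(X)P(Y), so the two tosses are independent.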
the probability of getting any single point on [a, b] is 0 (why?). But we can consider the probability that X ∈ [x − ε, x + ε] as ε → 0.
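To make this concrete for X ∼ U([a, b]) and x in the open interval (a, b): P(X ∈ [x − ε, x + ε]) = 2ε/(b − a) for small ε, which indeed tends to 0, while the ratio P(X ∈ [x − ε, x + ε])/(2ε) = 1/(b − a) stabilises; this ratio is what the density records.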
This is why Gaussian distributions are so important! The (suitably normalised) sum of independent random variables, continuous or discrete, sampled from a single distribution with finite variance becomes normally distributed as the number of summands tends to infinity.
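To see the phenomenon numerically, here is a minimal simulation sketch (our own illustration, using Uniform(0, 1) summands for concreteness; the helper standardised_sum is hypothetical):

import random
import statistics

n, trials = 500, 5_000            # summands per trial, number of trials
mu, sigma = 0.5, (1 / 12) ** 0.5  # mean and standard deviation of U(0, 1)

def standardised_sum() -> float:
    # Sum n i.i.d. U(0, 1) draws, then centre and scale.
    s = sum(random.random() for _ in range(n))
    return (s - n * mu) / (sigma * n ** 0.5)

samples = [standardised_sum() for _ in range(trials)]
print(statistics.mean(samples))   # near 0
print(statistics.stdev(samples))  # near 1
# About 68% of the mass should lie in [-1, 1], as for a standard normal:
print(sum(abs(x) <= 1 for x in samples) / trials)

Standardising by the exact mean n·µ and standard deviation σ√n is what makes the limit the standard normal.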
For continuous variables, replace the sum over S with an integral over R.
Example 3.4. Suppose pX(x) = 2 − 2x for x ∈ (0, 1), and pX(x) = 0 when x ∉ (0, 1).
The expected value of X is given by
(3.5) E(X) = ∫_R x pX(x) dx = ∫_(0,1) (2 − 2x)x dx = ∫_0^1 (2x − 2x²) dx = [x² − (2/3)x³]_0^1 = 1 − 2/3 = 1/3.
This makes sense: plot 2x − 2x² and the density 2 − 2x on (0, 1); the density piles its mass near 0, so a mean of 1/3 is plausible. The operator E has some nice properties:
(1) The map E is R-linear. That is, for λ, µ ∈ R and random variables X
and Y , E(λX + µY ) = λE(X) + µE(Y ).
(2) The expectation of a random variable with a symmetric distribution, when it exists, coincides with the centre of symmetry.
(3) Suppose for X : Ω → R, we have a map f : R → R. Then the
expectation of the map f ◦ X : Ω → R → R is
(3.6) E(f ◦ X) = ∫_R f(x) pX(x) dx.
The same holds for discrete variables on replacing the integral over R with a sum over supp X.
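As a quick illustration of (3.6), take f(x) = x² and the density of Example 3.4: E(X²) = ∫_0^1 x²(2 − 2x) dx = ∫_0^1 (2x² − 2x³) dx = 2/3 − 1/2 = 1/6.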
Exercise 3.8. Prove that the density function in the above example is nor-
malised (hint: geometric series).
(3.10) Var(X) = E((X − EX)²).
Exercise 3.11. Show that Var(X) = E(X²) − (EX)². Hence show that for X ∼ Pois(λ), EX = λ and E(X²) = λ² + λ (use 3.3). Hence, show Var X = λ.
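As an empirical companion to Exercise 3.11 (a sketch, not a proof; sample_poisson is our own helper, implementing Knuth's classical method, which is adequate for small λ), the sample mean and sample variance should both land near λ:

import math
import random
import statistics

def sample_poisson(lam: float) -> int:
    # Knuth's method: multiply uniforms until the running product
    # drops below exp(-lam); the count of multiplications, minus one,
    # is Poisson(lam) distributed.
    threshold = math.exp(-lam)
    k, product = 0, 1.0
    while True:
        product *= random.random()
        if product < threshold:
            return k
        k += 1

lam, n = 4.0, 100_000
samples = [sample_poisson(lam) for _ in range(n)]
print(statistics.mean(samples))       # close to lam
print(statistics.pvariance(samples))  # also close to lam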