Probability Basics Annotated Slides
Meryem Benammar
ISAE-Supaéro, DEOS
Context
Probability spaces
Mathematics behind IE
Information measurement :
Randomness (uncertainty) : Information theory !
Probabilities and statistics
Experiments
Experiment Alphabet Ω
Ω = {H, T }
Ω = {1, 2, 3, 4, 5, 6}
Ω = {1, 2, 3, 4, 5, 6} × {1, 2, 3, 4, 5, 6}
Events on Ω = {H, T} :
• Result is H : E1 = {H}
• Result is T : E2 = {T}
• Result is whatever : E3 = {H, T}
• Coin was lost : E4 = ∅
Probability measure
$P : \mathcal{E} \to [0, 1]$
$E \mapsto P(E)$
• Exhaustivity : $\sum_{w \in \Omega} P(w) = 1$
• Elementary measures : $P(E) = \sum_{w \in E} P(w)$
Probability space
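Rigged coin flip :
• $P_2(H) = 0.7$
• $P_2(T) = 0.3$
• Probability space : $(\{H, T\}, \mathcal{P}(\{H, T\}), P_2)$, where $\mathcal{P}(\{H, T\})$ is the power set of $\{H, T\}$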
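As a minimal sketch of the rigged coin space, the snippet below enumerates the event space as the power set of the alphabet and measures each event by summing elementary probabilities. The helper names `power_set` and `measure` are illustrative choices, not taken from the slides.

```python
from fractions import Fraction
from itertools import chain, combinations

# Alphabet and elementary probabilities of the rigged coin (values from the slide).
omega = ("H", "T")
p2 = {"H": Fraction(7, 10), "T": Fraction(3, 10)}

def power_set(alphabet):
    """Return every subset of the alphabet as a frozenset (the event space)."""
    items = list(alphabet)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

def measure(event, elementary):
    """P(E) as the sum of the elementary probabilities P(w) for w in E."""
    return sum((elementary[w] for w in event), Fraction(0))

events = power_set(omega)
for event in events:
    print(sorted(event), measure(event, p2))
```

Exact fractions are used so that the exhaustivity check, P(Ω) = 1, holds without rounding error.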
Random variables
Definition
Let us consider a probability space $\mathcal{P} = (\Omega, \mathcal{E}, P)$ and $(\mathcal{X}, \mathcal{E}_X)$ a new alphabet
with its event space. A random variable X is a mapping from $\Omega$ to $\mathcal{X}$
$X : \Omega \to \mathcal{X}$
$w \mapsto X(w)$.
• The probability measure $P_X$ is defined, for all events $E_x$ in $\mathcal{E}_X$, by $P_X(E_x) = P(\{w \in \Omega : X(w) \in E_x\})$
$P_X : \mathcal{X} \to [0, 1]$
$x \mapsto P_X(x) = P(X = x) = P_X(\{x\})$
2. Discrete uniform over the interval [1 : K] (Unif ([1 : K])) : Fair dice throw
3. Binomial with parameter (n, p) (Binom(n, p)) : number of heads in n coin flips
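The two laws listed above can be written out as explicit pmfs. The following is a sketch only; the function names `uniform_pmf` and `binomial_pmf` are hypothetical, and the binomial formula used is the standard one, C(n, k) p^k (1 - p)^(n - k).

```python
from fractions import Fraction
from math import comb

def uniform_pmf(K):
    """Unif([1 : K]): each integer 1..K has probability 1/K."""
    return {x: Fraction(1, K) for x in range(1, K + 1)}

def binomial_pmf(n, p):
    """Binom(n, p): probability of k successes in n independent trials."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

die = uniform_pmf(6)                     # fair die throw
heads = binomial_pmf(4, Fraction(1, 2))  # number of heads in 4 fair coin flips

print(die[3])     # probability of rolling a 3
print(heads[2])   # probability of exactly 2 heads
```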
Definition (Moments)
To each random variable X with pmf PX are associated
• An expected (average) value E(X) (first order moment)
$$E(X) \triangleq \sum_{x \in \mathcal{X}} x \, P_X(x)$$
$$V(X) \triangleq E(X^2) - E^2(X)$$
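To make the two moment formulas concrete, here is a small sketch that evaluates them for a fair die. The helper names `expectation` and `variance` are illustrative, and the fair die is an assumed example.

```python
from fractions import Fraction

def expectation(pmf):
    """E(X) = sum over x of x * P_X(x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """V(X) = E(X^2) - E(X)^2."""
    second_moment = sum(x * x * p for x, p in pmf.items())
    return second_moment - expectation(pmf) ** 2

# Assumed example: fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

print(expectation(die))  # 7/2
print(variance(die))     # 35/12
```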
Examples :
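$P_{X,Y} : \mathcal{X} \times \mathcal{Y} \to [0, 1]$
$(x, y) \mapsto P_{X,Y}(x, y) = P(X = x \text{ and } Y = y)$
$$\sum_{(x,y) \in \mathcal{X} \times \mathcal{Y}} P_{X,Y}(x, y) = 1 \quad \text{and} \quad \sum_{x \in \mathcal{X}} P_X(x) = \sum_{y \in \mathcal{Y}} P_Y(y) = 1$$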
$$P_{X|Y}(x|y) = \frac{P_{X,Y}(x, y)}{P_Y(y)} = P(X = x \mid Y = y)$$
$$P_{Y|X}(y|x) = \frac{P_{X,Y}(x, y)}{P_X(x)} = P(Y = y \mid X = x)$$
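The joint, marginal, and conditional pmfs can be related numerically as in the sketch below. The joint table is a made-up example and the helper names are hypothetical; the conditional is only defined when the conditioning value has nonzero probability.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint pmf P_{X,Y} over {0, 1} x {0, 1}.
joint = {
    (0, 0): Fraction(3, 10),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(2, 10),
    (1, 1): Fraction(4, 10),
}

def marginals(joint):
    """Return (P_X, P_Y) by summing the joint pmf over the other variable."""
    p_x, p_y = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), p in joint.items():
        p_x[x] += p
        p_y[y] += p
    return dict(p_x), dict(p_y)

def conditional_x_given_y(joint, y):
    """P_{X|Y}(. | y) = P_{X,Y}(., y) / P_Y(y); assumes P_Y(y) > 0."""
    _, p_y = marginals(joint)
    return {x: p / p_y[y] for (x, y_val), p in joint.items() if y_val == y}

p_x, p_y = marginals(joint)
print(p_x, p_y)
print(conditional_x_given_y(joint, 0))
```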
Bayes’ formula
Definition
Let (X, Y) be a pair of random variables with joint pmf $P_{X,Y}$. Assume that we
only know $P_X$ and $P_{Y|X}$. Then Bayes' formula reads
$$P_{X|Y}(x|y) = \frac{P_X(x) \, P_{Y|X}(y|x)}{\sum_{x'} P_X(x') \, P_{Y|X}(y|x')}$$
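A minimal sketch of Bayes' formula, assuming a binary input X observed through a channel that flips the bit with probability 1/10. The prior, the flip probability, and the function name `posterior` are all illustrative assumptions, not values from the slides.

```python
from fractions import Fraction

def posterior(prior, likelihood, y):
    """Return P_{X|Y}(. | y) from P_X (prior) and P_{Y|X} (likelihood[x][y]).

    Assumes the denominator P_Y(y) is nonzero.
    """
    evidence = sum(prior[x] * likelihood[x][y] for x in prior)  # P_Y(y)
    return {x: prior[x] * likelihood[x][y] / evidence for x in prior}

# Assumed prior P_X and bit-flipping likelihood P_{Y|X}.
prior = {0: Fraction(7, 10), 1: Fraction(3, 10)}
flip = Fraction(1, 10)
likelihood = {
    x: {y: (1 - flip) if y == x else flip for y in (0, 1)} for x in (0, 1)
}

post = posterior(prior, likelihood, y=1)
print(post)
```

Observing Y = 1 raises the probability of X = 1 from the prior 3/10 to 27/34, since the observation agrees with the input nine times out of ten.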
Random vectors
• A binary stream of n = 13 bits iid Bern(0.5)
$U^n = (0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1)$
• A random image of n = 20 × 20 gray-level iid pixels
$P_{X_1,\dots,X_n} : \mathcal{X}_1 \times \dots \times \mathcal{X}_n \to [0, 1]$
$(x_1, \dots, x_n) \mapsto P_{X_1,\dots,X_n}(x_1, \dots, x_n) = P(X_1 = x_1, \dots, X_n = x_n)$
Example :
1. A set of variables $(X_1, \dots, X_n)$ is mutually independent if
$$P_{X_1,\dots,X_n}(x_1, \dots, x_n) = \prod_{i=1}^{n} P_{X_i}(x_i),$$
2. If, further, the variables are identically distributed (iid), i.e., they follow the
same law $P_X$, then
$$P_{X_1,\dots,X_n}(x_1, \dots, x_n) = \prod_{i=1}^{n} P_X(x_i).$$
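The iid product form can be evaluated directly. As a sketch, the snippet below computes the probability of the bit sequence listed earlier under an iid Bern(0.5) law; the function name `iid_pmf` is hypothetical.

```python
from fractions import Fraction
from math import prod

def iid_pmf(sequence, pmf):
    """Joint pmf of an iid vector: product of P_X(x_i) over all coordinates."""
    return prod(pmf[x] for x in sequence)

bern_half = {0: Fraction(1, 2), 1: Fraction(1, 2)}  # Bern(0.5)
bits = (0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1)

probability = iid_pmf(bits, bern_half)
print(len(bits), probability)
```

Under Bern(0.5), every sequence of the same length has the same probability, (1/2) raised to the number of bits.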
• Interval probability : $P(a \le X \le b) = \int_a^b f_X(x) \, dx$
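The interval probability can be approximated numerically. The sketch below uses a midpoint Riemann sum and, as an assumed example, an exponential density with rate 1; neither the density nor the helper names come from the slides.

```python
import math

def interval_probability(pdf, a, b, steps=10_000):
    """Approximate P(a <= X <= b) by a midpoint Riemann sum of the pdf."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width

def exponential_pdf(x, rate=1.0):
    """Assumed example density: f_X(x) = rate * exp(-rate * x) for x >= 0."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

approx = interval_probability(exponential_pdf, 0.0, 1.0)
exact = 1.0 - math.exp(-1.0)  # closed-form value of the same integral
print(approx, exact)
```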
Definition (Moments)
To each random variable X with pmf PX / pdf fX are associated
• An expected value E(X) (first order moment)
$$E(X) \triangleq \int_{x \in \mathcal{X}} x \, f_X(x) \, dx$$
$$V(X) \triangleq E(X^2) - E^2(X)$$
Properties (Moments)
$V(\alpha X) = \alpha^2 V(X)$.
$V(X + c) = V(X)$.
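Both variance properties can be checked on a discrete example. This is a sketch under the assumption of a fair die with α = 3 and c = 5; the helper `transform`, which builds the pmf of g(X), is a hypothetical name.

```python
from fractions import Fraction

def variance(pmf):
    """V(X) = E(X^2) - E(X)^2 for a pmf given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum(x * x * p for x, p in pmf.items()) - mean**2

def transform(pmf, g):
    """pmf of g(X): add up the probabilities of all x that map to the same g(x)."""
    out = {}
    for x, p in pmf.items():
        out[g(x)] = out.get(g(x), 0) + p
    return out

die = {x: Fraction(1, 6) for x in range(1, 7)}  # assumed example law
alpha, c = 3, 5

scaled = variance(transform(die, lambda x: alpha * x))   # V(alpha X)
shifted = variance(transform(die, lambda x: x + c))      # V(X + c)
print(variance(die), scaled, shifted)
```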
$f_{X_1,\dots,X_n} : \mathcal{X}_1 \times \dots \times \mathcal{X}_n \to [0, \infty)$
$(x_1, \dots, x_n) \mapsto f_{X_1,\dots,X_n}(x_1, \dots, x_n)$
Chain rule
$$f_{X_1,\dots,X_n}(x_1, \dots, x_n) = \prod_{i=1}^{n} f_{X_i|X_1,\dots,X_{i-1}}(x_i|x_1, \dots, x_{i-1}) = \prod_{i=1}^{n} \frac{f_{X_1,\dots,X_i}(x_1, \dots, x_i)}{f_{X_1,\dots,X_{i-1}}(x_1, \dots, x_{i-1})}$$
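The chain rule is stated above for densities. As a sketch, the same telescoping product can be checked on a discrete analogue, here a made-up joint pmf over three binary variables with strictly positive weights. All names and values are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint pmf over {0, 1}^3 with weights 1..8 normalised by 36.
outcomes = list(product((0, 1), repeat=3))
joint = {xs: Fraction(w, 36) for w, xs in enumerate(outcomes, start=1)}

def prefix_marginal(joint, prefix):
    """P(X1 = x1, ..., Xi = xi), summing out the remaining coordinates."""
    i = len(prefix)
    return sum(p for xs, p in joint.items() if xs[:i] == prefix)

def chain_rule_product(joint, xs):
    """Product over i of P(x_i | x_1..x_{i-1}), each written as a ratio of marginals."""
    result = Fraction(1)
    for i in range(1, len(xs) + 1):
        result *= prefix_marginal(joint, xs[:i]) / prefix_marginal(joint, xs[: i - 1])
    return result

for xs in outcomes:
    print(xs, joint[xs], chain_rule_product(joint, xs))
```

Each ratio is a conditional probability, and consecutive numerators and denominators cancel, leaving the full joint probability.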
Kullback-Leibler divergence
Properties
The KL divergence satisfies the following properties :
1. Asymmetry : in general, $D_{KL}(P_X \| Q_X) \neq D_{KL}(Q_X \| P_X)$.
2. Null element : DKL (PX ||PX ) = 0.
3. Positivity : DKL (PX ||QX ) ≥ 0, for all laws (PX , QX ).
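The three properties can be observed numerically. The sketch below assumes the standard discrete definition, D_KL(P||Q) = Σ P(x) log2(P(x)/Q(x)), measured in bits, and uses a fair and a rigged coin as example laws; the function name `kl_divergence` is an illustrative choice.

```python
import math

def kl_divergence(p, q):
    """D_KL(P||Q) in bits; assumes Q(x) > 0 wherever P(x) > 0."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

fair = {"H": 0.5, "T": 0.5}     # assumed example law P_X
rigged = {"H": 0.7, "T": 0.3}   # assumed example law Q_X

d_pq = kl_divergence(fair, rigged)
d_qp = kl_divergence(rigged, fair)
print(d_pq, d_qp)                  # the two orderings give different values
print(kl_divergence(fair, fair))   # null element
```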