Prob Review
Rob Hall
September 9, 2010
What is Probability?
[Venn diagram: complement A^C; events A and B, showing A ∩ B and A ∪ B]
Properties of Set Operations
- Commutativity: A ∪ B = B ∪ A
- Associativity: A ∪ (B ∪ C) = (A ∪ B) ∪ C.
- Likewise for intersection.
- Proof? Follows easily from the commutative and associative properties of "and" and "or" in the definitions.
- Distributive properties: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) and A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
- Proof? Show each side of the equality contains the other.
- De Morgan's laws: ...see book.
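These identities are easy to spot-check with Python's built-in set type; a quick sketch (the particular sets and universe below are arbitrary choices for illustration):

```python
# Spot-check the distributive and De Morgan laws on small example sets.
U = set(range(10))                 # universe, for taking complements
A, B, C = {1, 2, 3}, {3, 4, 5}, {5, 6, 7}

# Distributive properties
assert A & (B | C) == (A & B) | (A & C)
assert A | (B & C) == (A | B) & (A | C)

# De Morgan's laws (complements taken relative to U)
assert U - (A | B) == (U - A) & (U - B)
assert U - (A & B) == (U - A) | (U - B)

print("all identities hold")
```
Of course, a few examples are not a proof; the proof is the element-wise argument above.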
Disjointness and Partitions
Remark: we may take the event space to be the power set of the sample space (for a discrete sample space; more on this later).
Probability Terminology
Remark: we may take the event space to be the power set of the sample space (for a discrete sample space; more on this later). E.g., rolling a fair die:
Ω = {1, 2, 3, 4, 5, 6}
F = 2^Ω = {{1}, {2}, ..., {1, 2}, ..., {1, 2, 3}, ..., {1, 2, 3, 4, 5, 6}, {}}
P({1}) = P({2}) = ... = 1/6 (i.e., a fair die)
P({1, 3, 5}) = 1/2 (i.e., half chance of an odd result)
P({1, 2, 3, 4, 5, 6}) = 1 (i.e., the result is "almost surely" one of the faces).
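In code, this measure is just a function from events (subsets of Ω) to [0, 1]; a minimal sketch of the fair-die example (the names are my own):

```python
from fractions import Fraction

OMEGA = {1, 2, 3, 4, 5, 6}

def P(event):
    """Probability of an event (a subset of OMEGA) under the fair-die measure."""
    assert event <= OMEGA          # an event must be a subset of the sample space
    return Fraction(len(event), len(OMEGA))

assert P({1}) == Fraction(1, 6)
assert P({1, 3, 5}) == Fraction(1, 2)   # half chance of an odd result
assert P(OMEGA) == 1                    # the result is almost surely one of the faces
```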
Axioms for Probability
[Venn diagram: events A and B with intersection A ∩ B]
P(A ∪ B) – General Unions

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Conditional Probabilities

P(A|B) = P(A ∩ B) / P(B)

[Venn diagram: events A and B with intersection A ∩ B]

Interpretation: the outcome is definitely in B, so treat B as the entire sample space and find the probability that the outcome is also in A.
This rapidly leads to: P(A|B)P(B) = P(A ∩ B), aka the "chain rule for probabilities." (why?)
When A1, A2, ... are a partition of Ω:

P(B) = Σ_{i=1}^∞ P(B ∩ Ai) = Σ_{i=1}^∞ P(B|Ai) P(Ai)
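The law of total probability is easy to verify on the die example, using the odd/even partition (an arbitrary choice for illustration):

```python
from fractions import Fraction

P = lambda E: Fraction(len(E), 6)        # fair-die measure on {1, ..., 6}

B = {1, 2, 3, 4}                         # "result is less than 5"
partition = [{1, 3, 5}, {2, 4, 6}]       # A1 = odd, A2 = even

# P(B) = sum_i P(B ∩ Ai) = sum_i P(B | Ai) P(Ai)
total = sum(P(B & Ai) for Ai in partition)
total_cond = sum((P(B & Ai) / P(Ai)) * P(Ai) for Ai in partition)

assert total == total_cond == P(B) == Fraction(2, 3)
```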
Conditional Probability Example
Suppose we throw a fair die:
Ω = {1, 2, 3, 4, 5, 6}, F = 2^Ω, P({i}) = 1/6, i = 1 ... 6
A = {1, 2, 3, 4} i.e., "result is less than 5,"
B = {1, 3, 5} i.e., "result is odd."

P(A) = 2/3
P(B) = 1/2

P(A|B) = P(A ∩ B) / P(B) = P({1, 3}) / P(B) = (1/3) / (1/2) = 2/3
P(B|A) = P(A ∩ B) / P(A) = (1/3) / (2/3) = 1/2
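The same computation as a quick script, using exact arithmetic:

```python
from fractions import Fraction

P = lambda E: Fraction(len(E), 6)        # fair-die measure on {1, ..., 6}
A = {1, 2, 3, 4}                         # result is less than 5
B = {1, 3, 5}                            # result is odd

P_A_given_B = P(A & B) / P(B)
P_B_given_A = P(A & B) / P(A)

assert P(A) == Fraction(2, 3)
assert P(B) == Fraction(1, 2)
assert P_A_given_B == Fraction(2, 3)
assert P_B_given_A == Fraction(1, 2)
```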
Bayes' Rule

P(B|A) = P(A|B)P(B) / P(A)

Often this is written as:

P(Bi|A) = P(A|Bi)P(Bi) / Σ_i P(A|Bi)P(Bi)

where the Bi are a partition of Ω (note the denominator is just the law of total probability).
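A quick numerical sketch of Bayes' rule in this partition form; the test-and-disease numbers below are hypothetical, chosen only for illustration:

```python
from fractions import Fraction

# Hypothetical numbers: a test with 99% sensitivity, a 5% false-positive
# rate, for a condition with 1% prevalence.
prior = [Fraction(1, 100), Fraction(99, 100)]        # P(B1)=sick, P(B2)=healthy
likelihood = [Fraction(99, 100), Fraction(5, 100)]   # P(A|Bi), A = "test positive"

# Bayes' rule, with the law of total probability as the denominator.
evidence = sum(l * p for l, p in zip(likelihood, prior))
posterior = [l * p / evidence for l, p in zip(likelihood, prior)]

assert sum(posterior) == 1
print(posterior[0])   # P(sick | positive) → 1/6, despite the 99% sensitivity
```
The surprisingly small posterior is why the denominator matters: most positives come from the large healthy group.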
Independence
[Figure: two example CDFs F_X(x), each rising from 0.0 to 1.0: a discrete step function (x from 0 to 4) and a continuous CDF (x from −2 to 2)]
Discrete Distributions
[Figure: an example probability mass function, with mass up to about 0.2, plotted over x ∈ (−4, 4)]
Multiple Random Variables
E(aX + bY + c) = aE(X) + bE(Y) + c
Characteristics of Distributions
Questions:
1. E[EX] = Σ_x (EX) f_X(x) = (EX) Σ_x f_X(x) = EX
2. E(X·Y) = E(X)E(Y)?
   Not in general, although when f_{X,Y} = f_X f_Y:
   E(X·Y) = Σ_{x,y} x y f_X(x) f_Y(y) = Σ_x x f_X(x) · Σ_y y f_Y(y) = EX · EY
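The independent case can be checked by brute-force enumeration, e.g. for two independent fair dice (my own example, not from the slides):

```python
from fractions import Fraction
from itertools import product

# Two independent fair dice: f_{X,Y}(x, y) = f_X(x) f_Y(y) = (1/6)(1/6).
faces = range(1, 7)
f = Fraction(1, 6)

E_XY = sum(x * y * f * f for x, y in product(faces, faces))
E_X = sum(x * f for x in faces)
E_Y = sum(y * f for y in faces)

assert E_X == E_Y == Fraction(7, 2)
assert E_XY == E_X * E_Y == Fraction(49, 4)   # factorization holds exactly
```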
Characteristics of Distributions

Var(X) = E(X − EX)²

This may give an idea of how "spread out" a distribution is.
A useful alternate form is:

E(X − EX)² = E[X² − 2X E(X) + (EX)²]
           = E(X²) − 2E(X)E(X) + (EX)²
           = E(X²) − (EX)²

Var(X + Y) = E(X − EX + Y − EY)²
           = E(X − EX)² + E(Y − EY)² + 2 E[(X − EX)(Y − EY)]
           = Var(X) + Var(Y) + 2 Cov(X, Y)
Characteristics of Distributions

Var(X + Y) = E(X − EX + Y − EY)²
           = E(X − EX)² + E(Y − EY)² + 2 E[(X − EX)(Y − EY)]
           = Var(X) + Var(Y) + 2 Cov(X, Y)

(why?)
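Both variance identities can be verified exactly on a small joint pmf, e.g. two independent fair dice (an illustrative choice; the identities themselves hold for any joint distribution):

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
f = Fraction(1, 36)   # joint pmf of two independent fair dice (X, Y)

def E(g):
    """Expectation of g(X, Y) under the joint pmf."""
    return sum(g(x, y) * f for x, y in product(faces, faces))

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
var_X = E(lambda x, y: (x - EX) ** 2)
var_Y = E(lambda x, y: (y - EY) ** 2)
cov = E(lambda x, y: (x - EX) * (y - EY))

# Alternate form: Var(X) = E(X^2) - (EX)^2
assert var_X == E(lambda x, y: x ** 2) - EX ** 2

# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
var_sum = E(lambda x, y: (x + y - EX - EY) ** 2)
assert var_sum == var_X + var_Y + 2 * cov
```
Here Cov(X, Y) happens to be 0 because the dice are independent, but the decomposition holds with any covariance.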
Putting it all together

E(X̄n) = E[(1/n) Σ_{i=1}^n Xi] = (1/n) Σ_{i=1}^n E(Xi) = (1/n) · nµ = µ

Var(X̄n) = Var((1/n) Σ_{i=1}^n Xi) = (1/n²) · nσ² = σ²/n
Entropy of a Distribution
Entropy gives the mean depth in the tree (= mean number of bits).
Law of Large Numbers (LLN)
P(|X̄n − µ| ≥ ε) ≤ σ² / (nε²) → 0

for any fixed ε > 0, as n → ∞.
Law of Large Numbers (LLN)

Recall our variable X̄n = (1/n) Σ_{i=1}^n Xi.
We may wonder about its behavior as n → ∞.

The weak law of large numbers:

lim_{n→∞} P(|X̄n − µ| ≥ ε) = 0

In English: choose ε and a probability that |X̄n − µ| < ε, and I can find you an n so your probability is achieved.

The strong law of large numbers:

P(lim_{n→∞} X̄n = µ) = 1
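A sketch of the LLN in action: the running average of fair-die rolls drifts toward µ = 3.5 (the seed and checkpoints are arbitrary):

```python
import random
random.seed(1)

mu = 3.5                         # E(X) for one fair die roll
running_sum, checkpoints = 0, {}
for n in range(1, 100001):
    running_sum += random.randint(1, 6)
    if n in (10, 1000, 100000):
        checkpoints[n] = running_sum / n

# The running average X̄n settles near µ as n grows.
print(checkpoints)
assert abs(checkpoints[100000] - mu) < 0.05
```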
Central Limit Theorem (CLT)
The distribution of X̄n also converges weakly to a Gaussian,

lim_{n→∞} F_X̄n(x) = Φ((x − µ) / (σ/√n))
Simulated n dice rolls and took average, 5000 times:
[Figure: four histograms (density vs. h) of X̄n for n = 1, 2, 10, 75; as n grows, the distribution of the average approaches a bell curve]
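The figure can be reproduced without plotting by standardizing the simulated means and checking them against a standard normal; a sketch (parameters match the slide's n = 75 and 5000 repetitions; the seed is arbitrary):

```python
import math
import random
random.seed(2)

n, reps = 75, 5000
mu, sigma = 3.5, math.sqrt(35 / 12)   # mean and std of one fair die roll

# Standardize each sample mean: Z = sqrt(n) (X̄n − µ) / σ ≈ N(0, 1) by the CLT.
zs = [(sum(random.randint(1, 6) for _ in range(n)) / n - mu) * math.sqrt(n) / sigma
      for _ in range(reps)]

# For a standard normal, about 68.3% of the mass lies within one σ of 0.
inside = sum(abs(z) < 1 for z in zs) / reps
assert abs(inside - 0.683) < 0.03
```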