Probability
Fundamental Assumption in This Course
Statistical Inference vs Probability

Statistical Inference = Probability$^{-1}$

In Probability: for a specified probability distribution, what are the properties of data from this distribution?

Example: $X_1, X_2, \dots \sim N(0, 1)$ which are independent (called i.i.d). What are $P(X_1 > 0.5)$ and $P\left(\frac{X_1 + X_2 + X_3}{3} > 2\right)$?

In Statistical Inference: for a specified set of data, what are the properties of the distribution(s) that generated it?

Example: $X_1, X_2, \dots \stackrel{\text{i.i.d}}{\sim} N(\theta, 1)$ for some unknown $\theta$. Observe that $X_1 = 0.134$, $X_2 = -1, \dots$. What is $\theta$?
In this session
Random Variables

Joint Distribution
  - Joint Distribution
  - Expectation and Covariance
  - Two important multivariate distributions
Discrete Random Variable

The distribution of a discrete random variable $X$ is specified by its probability mass function (p.m.f)
$$f_X(x) = P(X = x)$$
Continuous Random Variable

The distribution of a continuous random variable $X$ is specified by its probability density function (p.d.f) $f_X$, for which
$$P(a \le X \le b) = \int_a^b f_X(x)\,dx$$
Cumulative Distribution Function (c.d.f)

The distribution of $X$ can also be specified by its cumulative distribution function
$$F_X(x) = P(X \le x)$$
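For a discrete $X$, the c.d.f is just a running sum of the p.m.f. A small Python sketch (illustrative, not part of the original slides) for a fair six-sided die:

```python
from fractions import Fraction

# p.m.f of a fair die: f_X(x) = P(X = x) = 1/6 for x = 1, ..., 6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def cdf(x):
    """c.d.f: F_X(x) = P(X <= x), the running sum of the p.m.f."""
    return sum(p for k, p in pmf.items() if k <= x)

print(cdf(3))  # 1/2
print(cdf(6))  # 1
```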
Expectation

The expectation (or mean, or average value) of $g(X)$ for some function $g$ and random variable $X$ is
$$E[g(X)] = \begin{cases} \sum_x g(x) f_X(x) & \text{if } X \text{ is a discrete random variable} \\ \int_{\mathbb{R}} g(x) f_X(x)\,dx & \text{if } X \text{ is a continuous random variable} \end{cases}$$

Mean of $X$
$$\mu_X = E(X) = \begin{cases} \sum_x x f_X(x) & \text{if } X \text{ is a discrete random variable} \\ \int_{\mathbb{R}} x f_X(x)\,dx & \text{if } X \text{ is a continuous random variable} \end{cases}$$

Variance of $X$
$$\mathrm{Var}(X) = E[(X - E(X))^2] = \begin{cases} \sum_x (x - E(X))^2 f_X(x) & \text{in the discrete case} \\ \int_{\mathbb{R}} (x - E(X))^2 f_X(x)\,dx & \text{in the continuous case} \end{cases}$$
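In the discrete case these are direct sums over the p.m.f. A short Python sketch (not from the slides) computing $E(X)$ and $\mathrm{Var}(X)$ exactly for a fair die:

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die

def expect(g):
    """E[g(X)] = sum over x of g(x) f_X(x) for a discrete X."""
    return sum(g(x) * p for x, p in pmf.items())

mu = expect(lambda x: x)               # E(X) = 7/2
var = expect(lambda x: (x - mu) ** 2)  # Var(X) = 35/12
print(mu, var)
```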
Properties of Mean and Variance

1. $E(aX + b) = aE(X) + b$ and $E(X_1 + \cdots + X_n) = E(X_1) + \cdots + E(X_n)$
2. $\mathrm{Var}(aX + b) = a^2 \mathrm{Var}(X)$
3. $\sqrt{\mathrm{Var}(X)}$ is called the standard deviation of $X$
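A quick Monte Carlo sanity check of properties 1 and 2 (an illustrative sketch, assuming numpy is available; the constants are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1_000_000)
a, b = 3.0, -2.0

# E(aX + b) = a E(X) + b, up to sampling error
print(np.mean(a * x + b), a * np.mean(x) + b)

# Var(aX + b) = a^2 Var(X); the shift b drops out
print(np.var(a * x + b), a**2 * np.var(x))
```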
Random Variables

Joint Distribution
  - Joint Distribution
  - Expectation and Covariance
  - Two important multivariate distributions
Example
Toss a fair coin three times. Let X be the number of heads in
the first two tosses and Y be the total number of heads in three
tosses.
Outcome       HHH  HHT  HTH  HTT  THH  THT  TTH  TTT
Value of X     2    2    1    1    1    1    0    0
Value of Y     3    2    2    1    2    1    1    0
Probability   1/8  1/8  1/8  1/8  1/8  1/8  1/8  1/8
Summarize all possible pairs of values of $(X, Y)$ and the corresponding probabilities:

         y = 0   y = 1   y = 2   y = 3
x = 0     1/8     1/8      0       0
x = 1      0      1/4     1/4      0
x = 2      0       0      1/8     1/8
x = 3      0       0       0       0

$$P(X \le 2, Y = 2) = P(X = 0, Y = 2) + P(X = 1, Y = 2) + P(X = 2, Y = 2) = 0 + 1/4 + 1/8 = 3/8$$
$$P(X = 1) = P(X = 1, Y = 1) + P(X = 1, Y = 2) = 1/4 + 1/4 = 1/2 \quad \text{(the other terms are zero)}$$
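These table lookups are easy to reproduce by brute-force enumeration; a Python sketch (not from the slides):

```python
from fractions import Fraction
from itertools import product

# Enumerate the 8 equally likely outcomes of three fair coin tosses.
joint = {}
for toss in product("HT", repeat=3):
    x = sum(c == "H" for c in toss[:2])  # heads in the first two tosses
    y = sum(c == "H" for c in toss)      # heads in all three tosses
    joint[(x, y)] = joint.get((x, y), Fraction(0)) + Fraction(1, 8)

# P(X <= 2, Y = 2) and the marginal P(X = 1)
print(sum(p for (x, y), p in joint.items() if x <= 2 and y == 2))  # 3/8
print(sum(p for (x, y), p in joint.items() if x == 1))             # 1/2
```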
Joint Probability Mass Function (joint p.m.f)

The joint p.m.f of discrete random variables $X$ and $Y$ is
$$f_{X,Y}(x, y) = P(X = x, Y = y)$$
Marginal Probability Mass Function

The marginal p.m.f of $X$ is obtained by summing the joint p.m.f over all values of the other variable:
$$f_X(x) = P(X = x) = \sum_y f_{X,Y}(x, y), \qquad f_Y(y) = P(Y = y) = \sum_x f_{X,Y}(x, y)$$
Example

Joint p.m.f (rows $x$, columns $y$):

         y = 0   y = 1   y = 2   y = 3
x = 0     1/8     1/8      0       0
x = 1      0      1/4     1/4      0
x = 2      0       0      1/8     1/8
x = 3      0       0       0       0

p.m.f of X (row sums):
$P(X = 0) = 1/8 + 1/8 + 0 + 0 = 1/4$
$P(X = 1) = 0 + 1/4 + 1/4 + 0 = 1/2$
$P(X = 2) = 0 + 0 + 1/8 + 1/8 = 1/4$
$P(X = 3) = 0 + 0 + 0 + 0 = 0$

p.m.f of Y (column sums):
$P(Y = 0) = 1/8$, $P(Y = 1) = 3/8$, $P(Y = 2) = 3/8$, $P(Y = 3) = 1/8$ (total $= 1$)
Joint Probability Density Function (joint p.d.f)

Continuous random variables $X$ and $Y$ have joint p.d.f $f_{X,Y}$ if
$$P((X, Y) \in A) = \iint_A f_{X,Y}(x, y)\,dx\,dy \quad \text{for all } A \subseteq \mathbb{R}^2$$
Marginal Probability Density Function

The marginal p.d.f of $X$ is obtained by integrating out the other variable:
$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx$$
Example

Let $X$ and $Y$ have joint p.d.f
$$f(x, y) = \begin{cases} \frac{2}{5}(2x + 3y) & \text{if } 0 \le x \le 1,\ 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}$$
1. Find $P(0 < X < 0.25,\ 0.5 < Y < 0.75)$.
2. Find the marginal p.d.f of $X$.
Solution

1.
$$P(0 < X < 0.25,\ 0.5 < Y < 0.75) = \int_0^{0.25} \int_{0.5}^{0.75} f(x, y)\,dy\,dx = \int_0^{0.25} \int_{0.5}^{0.75} \frac{2}{5}(2x + 3y)\,dy\,dx$$
$$= \frac{4}{5} \int_0^{0.25} \int_{0.5}^{0.75} x\,dy\,dx + \frac{6}{5} \int_0^{0.25} \int_{0.5}^{0.75} y\,dy\,dx$$
The inner integrals are $\int_{0.5}^{0.75} x\,dy = xy\big|_{y=0.5}^{0.75} = 0.25x$ and $\int_{0.5}^{0.75} y\,dy = \frac{y^2}{2}\Big|_{y=0.5}^{0.75} = \frac{5}{32}$, so
$$= \frac{4}{5} \int_0^{0.25} 0.25x\,dx + \frac{6}{5} \int_0^{0.25} \frac{5}{32}\,dx = \frac{1}{160} + \frac{3}{64} = \frac{17}{320} \approx 0.0531$$
2. If $x < 0$ or $x > 1$ then $f(x, y) = 0$ for all $y$. Hence
$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_{-\infty}^{\infty} 0\,dy = 0$$
If $0 \le x \le 1$ then
$$f(x, y) = \begin{cases} \frac{2}{5}(2x + 3y) & \text{if } 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}$$
So
$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_0^1 \frac{2}{5}(2x + 3y)\,dy = \frac{2}{5}\left[2xy + \frac{3}{2}y^2\right]_{y=0}^{1} = \frac{4x + 3}{5}$$
Therefore
$$f_X(x) = \begin{cases} \frac{4x + 3}{5} & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$
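A numerical cross-check of both parts (a sketch, not from the slides, assuming scipy is available):

```python
from scipy.integrate import dblquad, quad

f = lambda x, y: 2 / 5 * (2 * x + 3 * y)  # joint p.d.f on the unit square

# Part 1: dblquad takes the integrand as func(y, x), with y the inner variable
p, _ = dblquad(lambda y, x: f(x, y), 0, 0.25, 0.5, 0.75)
print(p, 17 / 320)  # both 0.053125

# Part 2: the marginal f_X should integrate to 1 over [0, 1]
fX = lambda x: (4 * x + 3) / 5
print(quad(fX, 0, 1)[0])  # 1.0
```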
Exercise
Roll a fair die twice. Let $X$ and $Y$ be the face numbers of the
first and second rolls, respectively.
1. Find the joint probability mass function of X and Y
2. Determine the marginal probability mass function of X
and of Y
3. Explore the relationship between
▶ P (X = x|Y = y) and P (X = x)
▶ P (X = x, Y = y) and P (X = x)P (Y = y)
Statistically Independent

Random variables $X$ and $Y$ are statistically independent if
$$f_{X,Y}(x, y) = f_X(x) f_Y(y) \quad \text{for all } x, y$$
Equivalently, for discrete variables, $P(X = x, Y = y) = P(X = x)P(Y = y)$ for all $x, y$.
Expectation of $g(X, Y)$
$$E[g(X, Y)] = \begin{cases} \sum_{x,y} g(x, y) f_{X,Y}(x, y) & \text{in the discrete case} \\ \iint_{\mathbb{R}^2} g(x, y) f_{X,Y}(x, y)\,dx\,dy & \text{in the continuous case} \end{cases}$$

Covariance

The covariance of $X$ and $Y$ is
$$\mathrm{Cov}(X, Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y)$$

Correlation

The correlation between $X$ and $Y$ is their covariance normalized by the product of their standard deviations:
$$\mathrm{corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)}}$$
Example

Compute the covariance and correlation of two discrete random variables $X$ and $Y$ with joint p.m.f

         y = 0   y = 1   y = 2   y = 3
x = 0     1/8     1/8      0       0
x = 1      0      1/4     1/4      0
x = 2      0       0      1/8     1/8
x = 3      0       0       0       0

Solution

We have $\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y)$. First,
$$E(XY) = \sum_{x,y} xy f(x, y) = (0)(0)\tfrac{1}{8} + (0)(1)\tfrac{1}{8} + \cdots + (3)(3)(0) = 2$$
The marginal p.m.f of X:
x          0    1    2
P(X = x)  1/4  1/2  1/4

The marginal p.m.f of Y:
y          0    1    2    3
P(Y = y)  1/8  3/8  3/8  1/8

So
$$E(X) = \sum_x x P(X = x) = 1, \qquad E(X^2) = 3/2, \qquad \mathrm{Var}(X) = E(X^2) - [E(X)]^2 = 1/2$$
$$E(Y) = \sum_y y P(Y = y) = 3/2, \qquad E(Y^2) = 3, \qquad \mathrm{Var}(Y) = E(Y^2) - [E(Y)]^2 = 3/4$$

Covariance
$$\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = 2 - (1)(3/2) = 1/2$$

Correlation
$$\mathrm{corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)}} = \frac{1/2}{\sqrt{1/2}\sqrt{3/4}} \approx 0.8165$$
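The same numbers fall out of a few lines of numpy (a sketch, not from the slides):

```python
import numpy as np

# Joint p.m.f from the example: rows x = 0..3, columns y = 0..3
pxy = np.array([[1/8, 1/8, 0,   0  ],
                [0,   1/4, 1/4, 0  ],
                [0,   0,   1/8, 1/8],
                [0,   0,   0,   0  ]])
xs = ys = np.arange(4)

ex  = (xs * pxy.sum(axis=1)).sum()            # E(X) = 1
ey  = (ys * pxy.sum(axis=0)).sum()            # E(Y) = 1.5
exy = (np.outer(xs, ys) * pxy).sum()          # E(XY) = 2
vx  = ((xs - ex)**2 * pxy.sum(axis=1)).sum()  # Var(X) = 0.5
vy  = ((ys - ey)**2 * pxy.sum(axis=0)).sum()  # Var(Y) = 0.75

cov = exy - ex * ey                           # 0.5
print(cov / np.sqrt(vx * vy))                 # 0.8165...
```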
Properties of Expectation and Covariance

1. Symmetry: $\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X)$
2. Bilinearity: $\mathrm{Cov}(a_1 X_1 + \cdots + a_n X_n,\ b_1 Y_1 + \cdots + b_m Y_m) = \sum_{i=1}^n \sum_{j=1}^m a_i b_j \mathrm{Cov}(X_i, Y_j)$
3. $\mathrm{Cov}(X, X) = \mathrm{Var}(X)$
4. Variance of a sum: $\mathrm{Var}(X_1 + X_2 + \cdots + X_n) = \sum_{i=1}^n \mathrm{Var}(X_i) + 2 \sum_{i<j} \mathrm{Cov}(X_i, X_j)$
Property

If $X = (X_1, \dots, X_k) \sim \mathrm{Multinomial}(n, p)$ then $X_j \sim \mathrm{Binomial}(n, p_j)$
Standard Multivariate Normal Distribution

Let $Z_1, \dots, Z_k \stackrel{\text{i.i.d}}{\sim} N(0, 1)$. The joint p.d.f of $Z = (Z_1, \dots, Z_k)^T$ is
$$f_{Z_1, \dots, Z_k}(x_1, \dots, x_k) = \prod_{i=1}^k \frac{1}{\sqrt{2\pi}} e^{-x_i^2/2} = \frac{1}{(2\pi)^{k/2}} e^{-\frac{1}{2} x^T x}$$
where $x = (x_1, \dots, x_k)^T$. We say that $Z$ has a standard multivariate normal distribution, written $Z \sim N(0, I)$, where $0$ is a zero column vector and $I$ is the $k \times k$ identity matrix.
Multivariate Normal Distribution

Let $\mu$ be a $k \times 1$ column vector and $\Sigma$ be a $k \times k$ symmetric, positive definite matrix. A random vector $X = (X_1, \dots, X_k)^T \sim N(\mu, \Sigma)$ if the joint p.d.f of $X$ is
$$f_X(x) = \frac{1}{(2\pi)^{k/2} (\det \Sigma)^{1/2}} e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)}$$

Property

1. $\mu_i = E(X_i)$ and $\Sigma_{ij} = \mathrm{Cov}(X_i, X_j)$, so we call $\mu$ the mean vector and $\Sigma$ the covariance matrix
2. $X_i \sim N(\mu_i, \Sigma_{ii})$
3. $a_1 X_1 + \cdots + a_k X_k$ has a normal distribution for all numbers $a_1, \dots, a_k$
4. $(X - \mu)^T \Sigma^{-1} (X - \mu) \sim \chi^2_k$
5. The conditional distribution of any component given information about the other components is also normal
6. If $\Sigma$ is diagonal then the components of $X$ are statistically independent
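Properties 1 and 4 are easy to check by simulation (a sketch, not from the slides; the parameters are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])  # symmetric, positive definite

x = rng.multivariate_normal(mu, sigma, size=200_000)

print(x.mean(axis=0))           # approx mu
print(np.cov(x, rowvar=False))  # approx sigma

# Property 4: (X - mu)^T Sigma^{-1} (X - mu) should be chi^2 with k = 2 d.o.f.
d = x - mu
q = np.einsum("ni,ij,nj->n", d, np.linalg.inv(sigma), d)
print(q.mean())  # approx 2, since the mean of chi^2_k is k
```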
Random Variables

Joint Distribution
  - Joint Distribution
  - Expectation and Covariance
  - Two important multivariate distributions
Moment Generating Function

Definition (Moment Generating Function (MGF))

The moment generating function of a random variable $X$ is a function of a single argument $t \in \mathbb{R}$ defined by
$$M_X(t) = E(e^{tX})$$

Theorem

Let $X$ and $Y$ be two random variables such that, for some $h > 0$ and every $t \in (-h, h)$, both $M_X(t)$ and $M_Y(t)$ are finite and $M_X(t) = M_Y(t)$. Then $X$ and $Y$ have the same distribution.
The reason the MGF is useful for us: if $X_1, \dots, X_n$ are independent, then the MGF of their sum satisfies
$$M_{X_1 + \cdots + X_n}(t) = E\left(e^{t(X_1 + \cdots + X_n)}\right) = E\left(e^{tX_1}\right) \times \cdots \times E\left(e^{tX_n}\right) = M_{X_1}(t) \cdots M_{X_n}(t)$$
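This factorization is easy to see numerically; a Monte Carlo sketch (not from the slides, using exponential variables purely as an example):

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 500_000, 0.3
x = rng.exponential(1.0, size=n)  # two independent samples
y = rng.exponential(1.0, size=n)

# M_{X+Y}(t) estimated by a sample average of e^{t(X+Y)} ...
print(np.mean(np.exp(t * (x + y))))
# ... matches the product of the individual estimated MGFs
print(np.mean(np.exp(t * x)) * np.mean(np.exp(t * y)))
```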
Example

Let $X_1, \dots, X_n \stackrel{\text{i.i.d}}{\sim} N(\mu, \sigma^2)$. Then the MGF of $X_1 + \cdots + X_n$ is
$$M_{X_1 + \cdots + X_n}(t) = M_{X_1}(t) \cdots M_{X_n}(t) = e^{\mu t + \frac{\sigma^2 t^2}{2}} \cdots e^{\mu t + \frac{\sigma^2 t^2}{2}} = e^{n\mu t + \frac{n\sigma^2 t^2}{2}}$$
This is the MGF of $N(n\mu, n\sigma^2)$, so
$$X_1 + \cdots + X_n \sim N(n\mu, n\sigma^2)$$
Example

Let $X \sim \mathrm{Bernoulli}(p)$, i.e.
$$P(X = x) = p^x (1 - p)^{1 - x}, \quad x \in \{0, 1\}$$
The MGF of $X$ is
$$M_X(t) = E(e^{tX}) = e^{t \cdot 0}(1 - p) + e^{t \cdot 1} p = 1 - p + pe^t$$
or
$$M_X(t) = pe^t + q \quad \text{where } q = 1 - p$$
Example

Let $X \sim \mathrm{Binomial}(n, p)$, so $P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$. The MGF of $X$ is
$$M_X(t) = E[e^{tX}] = \sum_{k=0}^n e^{tk} P(X = k) = \sum_{k=0}^n e^{tk} \binom{n}{k} p^k (1 - p)^{n - k}$$
$$= \sum_{k=0}^n \binom{n}{k} (pe^t)^k q^{n - k} \quad \text{with } q = 1 - p$$
$$= \left(pe^t + q\right)^n$$
by the binomial theorem.
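A quick numeric check of the closed form (a sketch, not from the slides, assuming scipy is available):

```python
import numpy as np
from scipy.stats import binom

n, p, t = 10, 0.3, 0.5
q = 1 - p

# E(e^{tX}) computed directly from the Binomial(n, p) p.m.f ...
k = np.arange(n + 1)
mgf_direct = np.sum(np.exp(t * k) * binom.pmf(k, n, p))

# ... agrees with the closed form (p e^t + q)^n
print(mgf_direct, (p * np.exp(t) + q) ** n)
```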
Exercise

Let $X_1, \dots, X_n \stackrel{\text{i.i.d}}{\sim} \mathrm{Ber}(p)$.
1. Find the MGF of $X_1 + \cdots + X_n$
2. What is the distribution of $X_1 + \cdots + X_n$?
Exercise

1. A random variable $X$ has a Poisson distribution with parameter $\lambda$. Its p.m.f is given by
$$P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0, 1, 2, 3, \dots$$
Find the MGF of $X$.
2. The MGF of a $\mathrm{Gamma}(\alpha, \beta)$ random variable is
$$M_X(t) = \left(\frac{1}{1 - \beta t}\right)^{\alpha}, \quad \text{for } t < \frac{1}{\beta}$$
and the exponential distribution $E(\lambda)$ is $\mathrm{Gamma}(\alpha, \beta)$ with $\alpha = 1$, $\beta = 1/\lambda$. Let $X_1, \dots, X_n \stackrel{\text{i.i.d}}{\sim} E(\lambda)$. Find the MGF and distribution of $X_1 + \cdots + X_n$.
Statistics

For data $X_1, \dots, X_n$, a statistic $T(X_1, \dots, X_n)$ is any real-valued function of the data. In other words, it is any number that you can compute from the data.

Example

Sample mean
$$\bar{X} = \frac{X_1 + \cdots + X_n}{n}$$
If $X_1, \dots, X_n \stackrel{\text{i.i.d}}{\sim} N(\mu, \sigma^2)$ then
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
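A simulation sketch (not from the slides; $\mu$, $\sigma$, and $n$ are made up) confirming the mean and variance of $\bar{X}$:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 5.0, 2.0, 25

# 100000 replications of the sample mean of n i.i.d. N(mu, sigma^2) draws
xbar = rng.normal(mu, sigma, size=(100_000, n)).mean(axis=1)

print(xbar.mean(), mu)           # approx 5
print(xbar.var(), sigma**2 / n)  # approx 0.16
```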
Exercise