
Review Probability

1 / 43
Fundamental Assumption In this Course

Data comes from a random sampling process.

Randomness is a modeling assumption for something we don't understand (for example, errors in measurements).

2 / 43
Statistical Inference vs Probability

Statistical Inference = Probability⁻¹

In Probability: for a specified probability distribution, what are the properties of data from this distribution?

Example: X1, X2, ... ∼ N(0, 1) which are independent (called i.i.d). What are P(X1 > 0.5), P((X1 + X2 + X3)/3 > 2)?

In Statistical Inference: for a specified set of data, what are the properties of the distribution(s)?

Example: X1, X2, ... ∼ i.i.d. N(θ, 1) for some θ. Observe that X1 = 0.134, X2 = −1, .... What is θ?

3 / 43
In this session

▶ Review probability distributions and their mean and variance
▶ Study joint probability distributions to describe simultaneous outcomes of two (or more) quantities, such as the amount of precipitate P and the volume V of gas released from a controlled chemical experiment, or the hardness H and tensile strength T of cold-drawn copper
▶ Explore the concept of statistical independence between random variables, an important property of random sampling
▶ Study the moment generating function, a useful tool for discovering the distribution of important statistics such as the sample mean

4 / 43
Random Variables

Joint Distribution
Joint Distribution
Expectation and Covariance
Two important multivariate distributions

Moment Generating Function

5 / 43
Discrete Random Variable

A discrete random variable X can take a finite or countably


infinite number of possible values. The distribution of X is
specified by its probability mass function (p.m.f):

fX(x) = P(X = x)

For any set A of values that X can take,

P(X ∈ A) = Σ_{x ∈ A} fX(x)
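A minimal Python sketch of these definitions, using a fair six-sided die as an illustrative p.m.f (the names pmf and A are illustrative choices, not from the slides):

```python
# Minimal sketch: p.m.f of a fair six-sided die and P(X in A).
pmf = {k: 1/6 for k in range(1, 7)}    # fX(x) = 1/6 for x = 1, ..., 6

A = {2, 4, 6}                          # event "X is even"
prob_A = sum(pmf[x] for x in A)        # P(X in A) = sum of fX(x) over A
print(prob_A)                          # 0.5
```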

6 / 43
Continuous Random Variable

A continuous random variable X takes values in R and models continuous real-valued data. The distribution of X is specified by its probability density function (p.d.f) fX(x), which satisfies, for any set A ⊂ R,

P(X ∈ A) = ∫_{x ∈ A} fX(x) dx
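A minimal numerical sketch, assuming an Exponential(1) density as the illustrative example and using scipy's quad for the integral:

```python
# Minimal sketch: P(1 <= X <= 2) for an Exponential(rate = 1) density,
# computed by numerically integrating its p.d.f.
import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(-x)               # p.d.f of Exponential(1) for x > 0
prob, _ = quad(f, 1, 2)                # P(X in [1, 2]) = integral of f over [1, 2]
print(prob)                            # ~0.2325, equals e^{-1} - e^{-2}
```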

7 / 43
Cumulative Distribution Function (c.d.f)
The distribution of X can also be specified by its cumulative
distribution function

FX (x) = P (X ≤ x)

Discrete Case

FX(x) = Σ_{u ≤ x} P(X = u)

Inversely, fX(x) = FX(x) − FX(x⁻)

Continuous Case

FX(x) = ∫_{−∞}^{x} fX(u) du

Inversely, fX(x) = (d/dx) FX(x)
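A quick numerical sketch of the relation fX(x) = (d/dx) FX(x), using the standard normal from scipy.stats as an illustrative example:

```python
# Minimal sketch: a numerical derivative of the c.d.f recovers the p.d.f
# (standard normal, evaluated at one illustrative point x = 0.7).
from scipy.stats import norm

x, h = 0.7, 1e-6
approx_pdf = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)
print(approx_pdf, norm.pdf(x))         # both ~0.3123
```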

8 / 43
Expectation
The expectation (or mean, or average value) of g(X), for a function g and random variable X, is

E[g(X)] = Σ_x g(x) fX(x)        if X is a discrete random variable
E[g(X)] = ∫_R g(x) fX(x) dx     if X is a continuous random variable

Mean of X

µX = E(X) = Σ_x x fX(x)         if X is a discrete random variable
µX = E(X) = ∫_R x fX(x) dx      if X is a continuous random variable

Variance of X

Var(X) = E[(X − E(X))²] = Σ_x (x − E(X))² fX(x)      in the discrete case
Var(X) = E[(X − E(X))²] = ∫_R (x − E(X))² fX(x) dx   in the continuous case
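A minimal sketch computing these sums and integrals numerically, with a fair die and the standard normal as illustrative choices:

```python
# Minimal sketch: mean and variance of a discrete X (fair die) and of a
# continuous X (standard normal density), using the formulas above.
import numpy as np
from scipy.integrate import quad

# Discrete case: fair die
pmf = {k: 1/6 for k in range(1, 7)}
mean_d = sum(x * p for x, p in pmf.items())                   # E(X) = 3.5
var_d = sum((x - mean_d) ** 2 * p for x, p in pmf.items())    # Var(X) ~ 2.9167

# Continuous case: standard normal density
f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
mean_c, _ = quad(lambda x: x * f(x), -np.inf, np.inf)                 # ~0
var_c, _ = quad(lambda x: (x - mean_c) ** 2 * f(x), -np.inf, np.inf)  # ~1
print(mean_d, var_d, mean_c, var_c)
```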
9 / 43
Properties of Mean and Variance

For any real numbers a, b and random variables X, X1, ..., Xn:

1. Linearity of the mean

   E(aX + b) = aE(X) + b
   E(X1 + · · · + Xn) = E(X1) + · · · + E(Xn)

2. Var(aX + b) = a² Var(X)

3. √Var(X) is called the standard deviation of X
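A Monte Carlo sketch of properties 1 and 2, with an arbitrary illustrative choice of distribution and constants a, b:

```python
# Minimal sketch: checking E(aX + b) = aE(X) + b and Var(aX + b) = a^2 Var(X)
# by simulation (Exponential data and a = 3, b = -1 are illustrative choices).
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=1_000_000)   # E(X) = 2, Var(X) = 4
a, b = 3.0, -1.0

Y = a * X + b
print(Y.mean(), a * X.mean() + b)      # both ~5
print(Y.var(), a**2 * X.var())         # both ~36
```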

10 / 43
Random Variables

Joint Distribution
Joint Distribution
Expectation and Covariance
Two important multivariate distributions

Moment Generating Function

11 / 43
Example
Toss a fair coin three times. Let X be the number of heads in
the first two tosses and Y be the total number of heads in three
tosses.
Outcome       HHH  HHT  HTH  HTT  THH  THT  TTH  TTT
Value of X     2    2    1    1    1    1    0    0
Value of Y     3    2    2    1    2    1    1    0
Probability   1/8  1/8  1/8  1/8  1/8  1/8  1/8  1/8

▶ The pair of values of X and Y for the outcome HHH is 2 and 3. We denote this by

  (X, Y)(HHH) = (2, 3)

▶ (X, Y) = (1, 2), that is X = 1 and Y = 2, if and only if the outcome is HTH or THH. So (X, Y) = (1, 2) is considered as the event {HTH, THH}. Then

  P((X, Y) = (1, 2)) = P({HTH, THH}) = P(HTH) + P(THH) = ...

12 / 43
Summarize all possible pairs of values of (X, Y) and the corresponding probabilities:

            y
       0    1    2    3
x  0  1/8  1/8   0    0
   1   0   1/4  1/4   0
   2   0    0   1/8  1/8
   3   0    0    0    0

From the above table we can evaluate any probability concerning X and Y, such as

P(X ≤ 2, Y = 2) = P(X = 0, Y = 2) + P(X = 1, Y = 2) + P(X = 2, Y = 2) = ...
P(X = 1) = P(X = 1, Y = 1) + P(X = 1, Y = 2) = ...   (the other terms in this row are zero)
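A minimal enumeration sketch that rebuilds this joint p.m.f in Python (illustrative; exact fractions are used so the output matches the table):

```python
# Minimal sketch: joint p.m.f of (X, Y) for three fair coin tosses,
# built by enumerating the 8 equally likely outcomes.
from itertools import product
from collections import defaultdict
from fractions import Fraction

joint = defaultdict(Fraction)
for outcome in product("HT", repeat=3):
    x = outcome[:2].count("H")     # heads in the first two tosses
    y = outcome.count("H")         # heads in all three tosses
    joint[(x, y)] += Fraction(1, 8)

print(joint[(1, 2)])                                     # 1/4
print(sum(p for (x, y), p in joint.items() if x == 1))   # P(X = 1) = 1/2
```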

13 / 43
Joint Probability Mass Function (joint p.m.f)

The function f(x, y) is a joint probability mass function of the discrete random variables X and Y if
▶ f(x, y) ≥ 0
▶ Σ_x Σ_y f(x, y) = 1
▶ f(x, y) = P(X = x, Y = y)

For any A ⊂ R², we have

P((X, Y) ∈ A) = Σ_{(x,y) ∈ A} P(X = x, Y = y) = Σ_{(x,y) ∈ A} f(x, y)

14 / 43
Marginal Probability Mass Function

▶ The marginal probability mass function of X is

  fX(x) = Σ_y f(x, y)

▶ The marginal probability mass function of Y is

  fY(y) = Σ_x f(x, y)

15 / 43
Example

            y                      p.m.f of X
       0    1    2    3
x  0  1/8  1/8   0    0    P(X = 0) = 1/8 + 1/8 + 0 + 0 = 1/4
   1   0   1/4  1/4   0    P(X = 1) = 0 + 1/4 + 1/4 + 0 = 1/2
   2   0    0   1/8  1/8   P(X = 2) = 0 + 0 + 1/8 + 1/8 = 1/4
   3   0    0    0    0    P(X = 3) = 0 + 0 + 0 + 0 = 0

p.m.f of Y
P(Y = y)  1/8  3/8  3/8  1/8            total = 1
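A minimal sketch of the same marginalization in code, restating the coin-toss joint p.m.f so the snippet is self-contained:

```python
# Minimal sketch: marginal p.m.f.s obtained by summing the joint p.m.f
# over the other variable.
from fractions import Fraction
from collections import defaultdict

joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8),
         (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 4),
         (2, 2): Fraction(1, 8), (2, 3): Fraction(1, 8)}

fX, fY = defaultdict(Fraction), defaultdict(Fraction)
for (x, y), p in joint.items():
    fX[x] += p            # sum over y
    fY[y] += p            # sum over x

print(dict(fX))           # {0: 1/4, 1: 1/2, 2: 1/4}
print(dict(fY))           # {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}
```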

16 / 43
Joint Probability Density Function (joint p.d.f)

The function f(x, y) is a joint probability density function of the continuous random variables X and Y if
▶ f(x, y) ≥ 0
▶ ∫∫_{R²} f(x, y) dx dy = 1
▶ For any A ⊂ R², we have

  P((X, Y) ∈ A) = ∫∫_{(x,y) ∈ A} f(x, y) dx dy

17 / 43
Marginal Probability Density Function

▶ The marginal probability density function of X is

  fX(x) = ∫_{−∞}^{∞} f(x, y) dy

▶ The marginal probability density function of Y is

  fY(y) = ∫_{−∞}^{∞} f(x, y) dx

18 / 43
Example

The joint p.d.f of two continuous random variables X and Y is

f(x, y) = (2/5)(2x + 3y)   if 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
          0                otherwise

1. Compute P (0 < X < 0.25, 0.5 < Y < 0.75)


2. Find the marginal probability density function of X

19 / 43
Solution

1.

P(0 < X < 0.25, 0.5 < Y < 0.75) = ∫_0^{0.25} ∫_{0.5}^{0.75} f(x, y) dy dx
  = ∫_0^{0.25} ∫_{0.5}^{0.75} (2/5)(2x + 3y) dy dx
  = (4/5) ∫_0^{0.25} ( ∫_{0.5}^{0.75} x dy ) dx + (6/5) ∫_0^{0.25} ( ∫_{0.5}^{0.75} y dy ) dx

The inner integrals are xy |_{y=0.5}^{0.75} = 0.25x and (y²/2) |_{y=0.5}^{0.75} = 5/32, so

  = (4/5) ∫_0^{0.25} 0.25x dx + (6/5) ∫_0^{0.25} (5/32) dx
  = ...

20 / 43
2. If x < 0 or x > 1 then f(x, y) = 0 for all y. Hence

fX(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_{−∞}^{∞} 0 dy = 0

If 0 ≤ x ≤ 1 then f(x, y) = (2/5)(2x + 3y) for 0 ≤ y ≤ 1 and 0 otherwise. So

fX(x) = ∫_{−∞}^{∞} f(x, y) dy = ∫_0^1 (2/5)(2x + 3y) dy = (2/5) [2xy + (3/2)y²]_{y=0}^{1} = (4x + 3)/5

Thus the marginal p.d.f of X is

fX(x) = (4x + 3)/5   if 0 ≤ x ≤ 1
        0            otherwise
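A numerical cross-check of both parts, assuming scipy is available (dblquad integrates f(y, x) over y first, then x):

```python
# Minimal sketch: numerical check of the joint-p.d.f example.
from scipy.integrate import dblquad, quad

f = lambda y, x: 2 / 5 * (2 * x + 3 * y)      # joint p.d.f on the unit square

# Part 1: x in (0, 0.25), y in (0.5, 0.75)
prob, _ = dblquad(f, 0, 0.25, lambda x: 0.5, lambda x: 0.75)
print(prob)                                    # 0.053125 = 17/320

# Part 2: marginal of X at an illustrative point x = 0.3
print(quad(lambda y: 2 / 5 * (2 * 0.3 + 3 * y), 0, 1)[0],   # ~0.84
      (4 * 0.3 + 3) / 5)                                     # matches (4x+3)/5
```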
21 / 43
Exercise

Roll a fair die twice. Let X and Y be the numbers shown on the first and second rolls.
1. Find the joint probability mass function of X and Y
2. Determine the marginal probability mass function of X
and of Y
3. Explore the relationship between
▶ P (X = x|Y = y) and P (X = x)
▶ P (X = x, Y = y) and P (X = x)P (Y = y)

21 / 43
Statistical Independence

Let X and Y be two random variables, discrete or continuous, with joint probability distribution f(x, y) and marginal distributions fX(x) and fY(y), respectively. The random variables X and Y are said to be statistically independent if and only if

f(x, y) = fX(x) fY(y)

for all (x, y) within their range.
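A minimal sketch checking this definition for two independent fair-die rolls (illustrative; every joint probability equals the product of the marginals):

```python
# Minimal sketch: independence check by comparing the joint p.m.f with the
# product of the marginals, for two fair-die rolls.
from fractions import Fraction
from itertools import product

joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}
fX = {x: Fraction(1, 6) for x in range(1, 7)}
fY = {y: Fraction(1, 6) for y in range(1, 7)}

independent = all(joint[(x, y)] == fX[x] * fY[y] for x, y in joint)
print(independent)      # True: X and Y are statistically independent
```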

22 / 43
Expectation of g(X, Y )

E[g(X, Y)] = Σ_{x,y} g(x, y) fX,Y(x, y)          in the discrete case
E[g(X, Y)] = ∫∫_{R²} g(x, y) fX,Y(x, y) dx dy    in the continuous case

Covariance

Cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y)

Correlation
The correlation between X and Y is their covariance normalized by the product of their standard deviations:

corr(X, Y) = Cov(X, Y) / (√Var(X) √Var(Y))
23 / 43
Example
Compute the covariance and correlation of two discrete random variables X and Y with joint p.m.f

            y
       0    1    2    3
x  0  1/8  1/8   0    0
   1   0   1/4  1/4   0
   2   0    0   1/8  1/8
   3   0    0    0    0

Solution
We have E(XY) = E[g(X, Y)] where g(x, y) = xy. So

E(XY) = Σ_{x,y} xy f(x, y) = (0)(0)(1/8) + (0)(1)(1/8) + · · · + (3)(3)(0) = 2
24 / 43
The marginal p.m.f of X

  x          0    1    2
  P(X = x)  1/4  1/2  1/4

So
  E(X) = Σ_x x P(X = x) = 1
  E(X²) = 3/2
  Var(X) = E(X²) − [E(X)]² = 1/2

The marginal p.m.f of Y

  y          0    1    2    3
  P(Y = y)  1/8  3/8  3/8  1/8

So
  E(Y) = Σ_y y P(Y = y) = 3/2
  E(Y²) = 3
  Var(Y) = E(Y²) − [E(Y)]² = 3/4

Covariance

  Cov(X, Y) = E(XY) − E(X)E(Y) = 2 − (1)(3/2) = 1/2

Correlation

  corr(X, Y) = Cov(X, Y) / (√Var(X) √Var(Y)) = (1/2) / (√(1/2) √(3/4)) ≈ 0.8165
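A minimal sketch verifying these numbers directly from the joint p.m.f table:

```python
# Minimal sketch: E(XY), Cov(X, Y) and corr(X, Y) computed from the joint p.m.f.
import math

joint = {(0, 0): 1/8, (0, 1): 1/8, (1, 1): 1/4, (1, 2): 1/4,
         (2, 2): 1/8, (2, 3): 1/8}

E_XY = sum(x * y * p for (x, y), p in joint.items())             # 2.0
E_X  = sum(x * p for (x, y), p in joint.items())                 # 1.0
E_Y  = sum(y * p for (x, y), p in joint.items())                 # 1.5
Var_X = sum((x - E_X) ** 2 * p for (x, y), p in joint.items())   # 0.5
Var_Y = sum((y - E_Y) ** 2 * p for (x, y), p in joint.items())   # 0.75

cov = E_XY - E_X * E_Y                                           # 0.5
corr = cov / (math.sqrt(Var_X) * math.sqrt(Var_Y))               # ~0.8165
print(cov, corr)
```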

25 / 43
Properties of Expectation and Covariance
1. Symmetry

   Cov(X, Y) = Cov(Y, X)

2. Bilinearity

   Cov(a1X1 + · · · + anXn, b1Y1 + · · · + bmYm) = Σ_{i=1}^{n} Σ_{j=1}^{m} ai bj Cov(Xi, Yj)

3. Cov(X, X) = Var(X)

4. Variance of a sum

   Var(X1 + X2 + · · · + Xn) = Σ_{i=1}^{n} Var(Xi) + 2 Σ_{i<j} Cov(Xi, Xj)

5. If X and Y are independent then
   5.1 E[f(X)g(Y)] = E[f(X)] E[g(Y)]
   5.2 Cov(X, Y) = 0

6. If X1, ..., Xn are independent then

   Var(X1 + X2 + · · · + Xn) = Σ_{i=1}^{n} Var(Xi)
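A Monte Carlo sketch of property 4 for two variables, with an illustrative correlated pair (X, Y):

```python
# Minimal sketch: checking Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
# by simulation for a correlated pair.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=1_000_000)
Y = 0.5 * X + rng.normal(size=1_000_000)      # Y is correlated with X

lhs = np.var(X + Y)
rhs = np.var(X) + np.var(Y) + 2 * np.cov(X, Y)[0, 1]
print(lhs, rhs)                                # both ~3.25
```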
26 / 43
Multinomial Distribution
The multivariate version of the Binomial is called the Multinomial. Consider drawing a ball from an urn containing balls of k different colors labeled "color 1, color 2, ..., color k." Let p = (p1, ..., pk) where pj ≥ 0 and Σ_j pj = 1, and suppose pj is the probability of drawing a ball of color j. Draw n times (independent draws with replacement) and let X = (X1, ..., Xk), where Xj is the number of times that color j appears. Hence n = Σ_j Xj. We say that X has a Multinomial(n, p) distribution, written X ∼ Multinomial(n, p). The joint probability mass function is

f(x1, ..., xk) = P(X1 = x1, ..., Xk = xk) = C(n; x1, ..., xk) p1^{x1} · · · pk^{xk}

where C(n; x1, ..., xk) = n! / (x1! · · · xk!) is the multinomial coefficient.

Property
If X ∼ Multinomial(n, p) then Xj ∼ Binomial(n, pj)
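A minimal sampling sketch, with illustrative n and p, checking that each component behaves like a Binomial(n, pj):

```python
# Minimal sketch: sampling from a Multinomial and checking one component's
# mean and variance against the Binomial(n, p_j) values.
import numpy as np

rng = np.random.default_rng(2)
n, p = 10, [0.2, 0.3, 0.5]                      # k = 3 colors

X = rng.multinomial(n, p, size=100_000)          # each row sums to n
print(X.sum(axis=1)[:5])                         # [10 10 10 10 10]
print(X[:, 0].mean(), n * p[0])                  # both ~2  (= n * p_j)
print(X[:, 0].var(), n * p[0] * (1 - p[0]))      # both ~1.6 (= n * p_j * (1 - p_j))
```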
27 / 43
Standard Multivariate Normal Distribution

 
Let Z1, ..., Zk ∼ i.i.d. N(0, 1). The joint p.d.f of Z = (Z1, ..., Zk)ᵀ is

fZ1,...,Zk(x1, ..., xk) = Π_{i=1}^{k} (1/√(2π)) e^{−xi²/2} = (1/(2π)^{k/2}) e^{−(1/2) xᵀx}

where x = (x1, ..., xk)ᵀ. We say that Z has a standard multivariate normal distribution, written Z ∼ N(0, I), where 0 is a zero column vector and I is the k × k identity matrix.

28 / 43
Multivariate Normal Distribution
Let µ be a k × 1 column vector and Σ a k × k symmetric, positive definite matrix. A random vector X = (X1, ..., Xk)ᵀ ∼ N(µ, Σ) if the joint p.d.f of X is

fX(x) = (1 / ((2π)^{k/2} (det Σ)^{1/2})) e^{−(1/2)(x − µ)ᵀ Σ⁻¹ (x − µ)}

Property
1. µi = E(Xi) and Σij = Cov(Xi, Xj), so we call µ the mean vector and Σ the covariance matrix
2. Xi ∼ N(µi, Σii)
3. a1X1 + · · · + akXk has a normal distribution for all numbers a1, ..., ak
4. (X − µ)ᵀ Σ⁻¹ (X − µ) ∼ χ²_k

29 / 43
29 / 43
5. The conditional distribution of one component given any information about the other components is also normal
6. If Σ is diagonal then the components of X are statistically independent
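A minimal sampling sketch with illustrative µ and Σ, checking the empirical mean vector and covariance matrix:

```python
# Minimal sketch: sampling from N(mu, Sigma) and recovering mu and Sigma
# from the sample (illustrative parameters).
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])        # symmetric, positive definite

X = rng.multivariate_normal(mu, Sigma, size=200_000)
print(X.mean(axis=0))                 # ~[ 1, -2]
print(np.cov(X, rowvar=False))        # ~Sigma
```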

30 / 43
Random Variables

Joint Distribution
Joint Distribution
Expectation and Covariance
Two important multivariate distributions

Moment Generating Function

31 / 43
Moment Generating Function
Definition (Moment Generating Function (MGF))
The moment generating function of a random variable X is a function of a single argument t ∈ R defined by

MX(t) = E(e^{tX})

Theorem
Let X and Y be two random variables such that, for some h > 0
and every t ∈ (−h, h), both MX (t) and MY (t) are finite and
MX (t) = MY (t). Then X and Y have the same distribution.

The reason the MGF will be useful for us is that if X1, ..., Xn are independent, then the MGF of their sum satisfies

MX1+···+Xn(t) = E[e^{tX1}] × · · · × E[e^{tXn}] = MX1(t) · · · MXn(t)

This gives us a very simple tool to understand the distributions of sums of independent random variables.
32 / 43
Example
Consider a standard normal random variable Z ∼ N(0, 1). The MGF of Z is

MZ(t) = E[e^{tZ}] = ∫_{−∞}^{∞} e^{tx} fZ(x) dx
      = ∫_{−∞}^{∞} e^{tx} (1/√(2π)) e^{−x²/2} dx
      = ∫_{−∞}^{∞} (1/√(2π)) e^{−x²/2 + tx} dx
      = ∫_{−∞}^{∞} (1/√(2π)) e^{−(x² − 2tx)/2} dx
      = ∫_{−∞}^{∞} (1/√(2π)) e^{−(x² − 2tx + t² − t²)/2} dx
      = e^{t²/2} ∫_{−∞}^{∞} (1/√(2π)) e^{−(x − t)²/2} dx = e^{t²/2}

since the last integrand is the p.d.f of N(t, 1), which integrates to 1.
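A Monte Carlo sketch checking MZ(t) = e^{t²/2} at one illustrative value of t:

```python
# Minimal sketch: checking E[e^{tZ}] = e^{t^2/2} for Z ~ N(0, 1) by simulation.
import numpy as np

rng = np.random.default_rng(4)
Z = rng.normal(size=2_000_000)

t = 0.8
print(np.exp(t * Z).mean())    # ~1.377
print(np.exp(t**2 / 2))        # e^{0.32} ~ 1.377
```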
33 / 43
Example

Let X ∼ N(µ, σ²); then X = µ + σZ and the MGF of X is

MX(t) = E[e^{t(µ + σZ)}] = e^{µt} E[e^{σtZ}] = e^{µt} MZ(σt) = e^{µt} e^{(σt)²/2} = e^{µt + σ²t²/2}

34 / 43
Example

Let X1, ..., Xn ∼ i.i.d. N(µ, σ²); then the MGF of X1 + · · · + Xn is

MX1+···+Xn(t) = MX1(t) · · · MXn(t) = e^{µt + σ²t²/2} · · · e^{µt + σ²t²/2} = e^{nµt + nσ²t²/2}

which is the MGF of a normal distribution with mean nµ and variance nσ². So

X1 + · · · + Xn ∼ N(nµ, nσ²)

35 / 43
Example

Let X ∼ Ber(p) with p.m.f

P(X = x) = p^x (1 − p)^{1−x},   x ∈ {0, 1}

The MGF of X is

MX(t) = E[e^{tX}] = e^{t·0} P(X = 0) + e^{t·1} P(X = 1) = 1 − p + p e^t

or MX(t) = p e^t + q, where q = 1 − p

36 / 43
Example

Let X ∼ Bin(n, p) with p.m.f

P(X = k) = C(n, k) p^k (1 − p)^{n−k},   for k = 0, 1, ..., n

where C(n, k) is the binomial coefficient. The MGF of X is

MX(t) = E[e^{tX}] = Σ_{k=0}^{n} e^{tk} P(X = k) = Σ_{k=0}^{n} e^{tk} C(n, k) p^k (1 − p)^{n−k}
      = Σ_{k=0}^{n} C(n, k) (p e^t)^k q^{n−k}   with q = 1 − p
      = (p e^t + q)^n

37 / 43
Exercise

Let X1, ..., Xn ∼ i.i.d. Ber(p).
1. Find the MGF of X1 + · · · + Xn
2. What is the distribution of X1 + · · · + Xn ?

38 / 43
Exercise
1. A random variable X has a Poisson distribution with parameter λ. Its p.m.f is given by

   P(X = k) = e^{−λ} λ^k / k!,   k = 0, 1, 2, 3, ...

Find the MGF of X


2. Let X1, ..., Xn ∼ i.i.d. Poisson(λ). Find the MGF of X1 + · · · + Xn. What is the distribution of X1 + · · · + Xn?

39 / 43
Exercise
Random variable X has an exponential distribution E(λ). Its p.d.f is

f(x) = λ e^{−λx}   if x > 0
       0           otherwise

Find the MGF of X


40 / 43
Exercise
X has a gamma distribution with parameters α and β, denoted by X ∼ Gamma(α, β), if the p.d.f of X is given by

fX(x) = (1 / (β^α Γ(α))) x^{α−1} e^{−x/β},   x > 0

where Γ(α) = ∫_0^∞ y^{α−1} e^{−y} dy.

The MGF of X is

MX(t) = 1 / (1 − βt)^α,   for t < 1/β
Let X1, ..., Xn ∼ i.i.d. E(λ). Find the MGF and the distribution of X1 + · · · + Xn.

41 / 43
Statistics
For data X1 , ..., Xn , a statistic T (X1 , ..., Xn ) is any real-valued
function of the data. In other words, it is any number that you
can compute from the data.
Example
Sample mean

X̄ = (X1 + · · · + Xn) / n
If X1, ..., Xn ∼ i.i.d. N(µ, σ²) then

X̄ ∼ N(µ, σ²/n)

and the sample variance

S² = (1/(n − 1)) [(X1 − X̄)² + · · · + (Xn − X̄)²]   satisfies   (n − 1)S²/σ² ∼ χ²_{n−1}
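A simulation sketch with illustrative µ, σ and n, checking the variance of X̄ and the mean of (n − 1)S²/σ² (which equals n − 1 for a χ²_{n−1} variable):

```python
# Minimal sketch: Var(X_bar) = sigma^2 / n and E[(n-1) S^2 / sigma^2] = n - 1.
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, n = 2.0, 3.0, 10
samples = rng.normal(mu, sigma, size=(100_000, n))

xbar = samples.mean(axis=1)
S2 = samples.var(axis=1, ddof=1)                 # sample variance with 1/(n-1)

print(xbar.var(), sigma**2 / n)                  # both ~0.9
print(((n - 1) * S2 / sigma**2).mean(), n - 1)   # both ~9
```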

42 / 43
Exercise

Rice Exercise 79 page 173


Rice Exercise 83 page 174
Rice Exercise 89 page 174

43 / 43
