
V: Discrete and continuous distributions

A modern crash course in intermediate Statistics and Probability

Paul Rognon

Barcelona School of Economics


Universitat Pompeu Fabra
Universitat Politècnica de Catalunya

1 / 22
Discrete distributions
Bernoulli distribution

The Bernoulli distribution models an experiment with a single binary
outcome (e.g. success or failure, heads or tails). We say that X has a
Bernoulli distribution with parameter p ∈ [0, 1], and write X ∼ Bern(p), if

f_X(x) = p^x (1 − p)^{1−x}  for x = 0, 1

Related distribution and model

• Logistic regression for a binary outcome is a generalized linear
model for a response with a Bernoulli distribution.
• If X ∼ Bern(p), then 2X − 1 has a Rademacher distribution, a
distribution that occurs in machine learning theory.
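
A minimal sketch (assuming Python with numpy/scipy available, as in all code snippets below) checking the pmf above against scipy.stats.bernoulli:

```python
# Evaluate the Bern(p) pmf f_X(x) = p^x (1 - p)^(1 - x) by hand and
# compare with scipy's implementation.
from scipy.stats import bernoulli

p = 0.3
for x in (0, 1):
    manual = p**x * (1 - p)**(1 - x)       # pmf from the slide
    print(x, manual, bernoulli.pmf(x, p))  # 0.7 for x = 0, 0.3 for x = 1
```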

2 / 22
Binomial distribution
The binomial distribution models an experiment where we count the
number of successes in n independent Bernoulli experiments, all of which
have the same success probability p. We say that Y has a binomial
distribution, and write Y ∼ Bin(n, p). It takes values y = 0, 1, . . . , n
with probability

f_Y(y) = \binom{n}{y} p^y (1 − p)^{n−y}


Properties
• If Y_1 ∼ Bin(n_1, p) and Y_2 ∼ Bin(n_2, p) are independent, then
Y_1 + Y_2 ∼ Bin(n_1 + n_2, p).
• If X_1, . . . , X_n are i.i.d. Bern(p), then \sum_i X_i ∼ Bin(n, p).

Indeed:

P(X_1 = x_1, . . . , X_n = x_n) = \prod_{i=1}^n p^{x_i} (1 − p)^{1−x_i} = p^{\sum_i x_i} (1 − p)^{n − \sum_i x_i},

and since \binom{n}{k} of these sequences have exactly k successes,

P(\sum_i X_i = k) = \binom{n}{k} p^k (1 − p)^{n−k}

The number of ways to choose k out of n objects is \binom{n}{k} := \frac{n!}{(n−k)! \, k!}.
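
A quick simulation sketch of the second property: sums of n i.i.d. Bern(p) draws follow Bin(n, p).

```python
# Compare the empirical distribution of sums of Bernoulli draws with
# the Bin(n, p) pmf.
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)
n, p, reps = 10, 0.3, 100_000
sums = rng.binomial(1, p, size=(reps, n)).sum(axis=1)    # Bernoulli sums
empirical = np.bincount(sums, minlength=n + 1) / reps
print(np.round(empirical, 3))                            # simulated pmf
print(np.round(binom.pmf(np.arange(n + 1), n, p), 3))    # Bin(10, 0.3) pmf
```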
3 / 22
Poisson distribution
The Poisson distribution is used to model counts of rare random events: shark
attacks, big meteors hitting the Earth, etc. We say that X has a Poisson
distribution with parameter λ > 0 if

f_X(x) = e^{−λ} \frac{λ^x}{x!}  for x = 0, 1, 2, . . .

Here the support X(Ω) is discrete but infinite.

Related model and properties


• The mean and variance are both equal to λ. This is a strong limitation in
modelling; alternatives are the negative binomial distribution or
adjustments for overdispersion.
• If X_1 ∼ Pois(λ_1) and X_2 ∼ Pois(λ_2) are independent, then
X_1 + X_2 ∼ Pois(λ_1 + λ_2).
• The Poisson process is a stochastic process that counts the number of
occurrences of an event up to time t.
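
A minimal check of the equidispersion property (mean = variance = λ):

```python
# For Poisson draws, the sample mean and sample variance should both
# be close to λ.
import numpy as np

rng = np.random.default_rng(1)
lam = 4.0
x = rng.poisson(lam, size=100_000)
print(x.mean(), x.var())  # both ≈ 4.0
```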

4 / 22
Multinomial distribution

The multinomial distribution is a multivariate extension of the binomial
distribution. It models an experiment where n independent trials, each with
a finite number of possible outcomes (larger than 2), are run. We say that
X has a multinomial distribution with parameters (n, p), where
\sum_{j=1}^k p_j = 1, if:

f_X(x_1, . . . , x_k) = \frac{n!}{x_1! \cdots x_k!} p_1^{x_1} \cdots p_k^{x_k}  where \sum_{j=1}^k x_j = n

Related model and properties


• When n = 1, it is called the categorical distribution.
• It frequently appears in clustering and dimension reduction models.
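
A short sketch using scipy.stats.multinomial; the count vectors it draws always sum to n:

```python
# Evaluate a Multinomial(n, p) pmf and draw count vectors.
from scipy.stats import multinomial

n, p = 6, [0.5, 0.3, 0.2]
dist = multinomial(n, p)
print(dist.pmf([3, 2, 1]))               # P(X = (3, 2, 1))
print(dist.rvs(size=3, random_state=0))  # each row sums to n = 6
```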

5 / 22
Continuous distributions
Normal (Gaussian) distribution
We say that X has a Gaussian distribution with mean µ ∈ R and
variance σ^2 > 0, denoted by N(µ, σ^2), if its density function is

f_X(x) = \frac{1}{\sqrt{2π}\,σ} \exp\left(−\frac{1}{2σ^2} (x − µ)^2\right)  for x ∈ R.

The Gaussian distribution approximates many real phenomena: see
Galton’s board, the central limit theorem, etc.
Basic properties
• If X ∼ N(µ, σ^2), then \frac{X − µ}{σ} ∼ N(0, 1) with CDF Φ, so, in particular,

P(a ≤ X ≤ b) = P\left(\frac{a − µ}{σ} ≤ \frac{X − µ}{σ} ≤ \frac{b − µ}{σ}\right) = Φ\left(\frac{b − µ}{σ}\right) − Φ\left(\frac{a − µ}{σ}\right)

• If X_i ∼ N(µ_i, σ_i^2) are independent, then \sum_i X_i ∼ N(\sum_i µ_i, \sum_i σ_i^2).

Useful exercise: compute the integral \int_{−∞}^{∞} e^{−x^2/2} dx (use polar coordinates).
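
A minimal sketch of the first property: computing P(a ≤ X ≤ b) through the standard normal CDF Φ, against scipy's direct parameterization:

```python
# P(a <= X <= b) for X ~ N(mu, sigma^2), via standardization and directly.
from scipy.stats import norm

mu, sigma, a, b = 1.0, 2.0, 0.0, 3.0
via_phi = norm.cdf((b - mu) / sigma) - norm.cdf((a - mu) / sigma)
direct = norm.cdf(b, loc=mu, scale=sigma) - norm.cdf(a, loc=mu, scale=sigma)
print(via_phi, direct)  # identical values
```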
6 / 22
Gamma and Beta distributions
They are based on the gamma function:

Γ(z) = \int_0^∞ x^{z−1} e^{−x} dx,  z > 0,  with Γ(n) = (n − 1)! for n ∈ N^⋆

We say that X has a Gamma distribution with parameters α and β,
denoted by X ∼ Gamma(α, β), if

f_X(x) = \frac{1}{β^α Γ(α)} x^{α−1} e^{−x/β},  x > 0,  where α, β > 0

We say that X has a Beta distribution with parameters α and β, denoted
by X ∼ Beta(α, β), if

f_X(x) = \frac{Γ(α + β)}{Γ(α)Γ(β)} x^{α−1} (1 − x)^{β−1},  x ∈ (0, 1),  where α, β > 0

These and related distributions frequently appear in Bayesian statistics.
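
A hedged sketch checking both densities against scipy (note that in scipy's parameterization the slide's β is the gamma's scale parameter):

```python
# Compare the Gamma(α, β) and Beta(α, β) densities above with scipy.
import math
from scipy.stats import beta, gamma

a, b, x = 2.5, 1.5, 3.0
manual_gamma = x**(a - 1) * math.exp(-x / b) / (b**a * math.gamma(a))
print(manual_gamma, gamma.pdf(x, a=a, scale=b))      # same value

a2, b2, u = 2.0, 3.0, 0.4
manual_beta = (math.gamma(a2 + b2) / (math.gamma(a2) * math.gamma(b2))
               * u**(a2 - 1) * (1 - u)**(b2 - 1))
print(manual_beta, beta.pdf(u, a2, b2))              # same value
```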


7 / 22
Other continuous distributions
The uniform distribution over [a, b], U(a, b)
The density function is f_X(x) = \frac{1}{b − a} 1_{[a,b]}(x).

Exponential distribution, Exp(λ)

The density function is f_X(x) = λ e^{−λx} for λ > 0, x > 0.

The exponential distribution is used to model waiting times between
occurrences in a Poisson process. It is a special case of the gamma
distribution: Exp(λ) = Gamma(1, 1/λ).

Chi-square distribution, χ^2_p

If Z_1, . . . , Z_p are independent N(0, 1), then

X = Z_1^2 + Z_2^2 + · · · + Z_p^2 ∼ χ^2_p.

The natural number p is called the degrees of freedom. It is also a
special case of the gamma distribution: χ^2_p = Gamma(p/2, 2).
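
A simulation sketch of the chi-square characterization, also checking the Gamma(p/2, 2) identity:

```python
# Sums of p squared standard normals match chi2(p) = Gamma(p/2, scale=2).
import numpy as np
from scipy.stats import chi2, gamma

rng = np.random.default_rng(2)
p = 5
x = (rng.standard_normal((100_000, p)) ** 2).sum(axis=1)
print(x.mean(), chi2.mean(p))                              # both ≈ p = 5
print(chi2.pdf(4.0, p), gamma.pdf(4.0, a=p / 2, scale=2))  # identical densities
```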
8 / 22
Multivariate Gaussian: definition
Standard multivariate normal distribution
Let Z_1, . . . , Z_p be independent identically distributed (iid) N(0, 1)
variables. Their joint density is:

f(z_1, . . . , z_p) = \prod_{i=1}^p \frac{1}{\sqrt{2π}} \exp\left(−\frac{1}{2} z_i^2\right) = \frac{1}{(2π)^{p/2}} \exp\left(−\frac{1}{2} z^T z\right).

Let Z = (Z_1, . . . , Z_p). Z is a random vector of p standard normal
random variables with mean vector µ = 0_p and covariance matrix Σ = I_p.
We say Z has a standard multivariate normal distribution and write:

Z ∼ N_p(0_p, I_p)
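
A small sampling sketch: draws from N_p(0_p, I_p) have sample mean near 0_p and sample covariance near I_p:

```python
# Sample Z ~ N_p(0_p, I_p) and check its first two moments empirically.
import numpy as np

rng = np.random.default_rng(3)
p = 3
Z = rng.standard_normal((50_000, p))
print(np.round(Z.mean(axis=0), 2))           # ≈ 0_p
print(np.round(np.cov(Z, rowvar=False), 2))  # ≈ I_p
```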

9 / 22
Standard multivariate normal distribution

[figure]
10 / 22
Bivariate normal distribution

We now define a case without independence for two variables. Let
µ ∈ R^2 and Σ = \begin{pmatrix} σ_1^2 & ρσ_1σ_2 \\ ρσ_1σ_2 & σ_2^2 \end{pmatrix} positive definite.
We say the vector X = (X_1, X_2) has a bivariate normal distribution, and
write X ∼ N_2(µ, Σ), if:

f(x_1, x_2) = \frac{1}{2π σ_1 σ_2 (1 − ρ^2)^{1/2}} \exp\left\{ −\frac{1}{2(1 − ρ^2)} \left[ \left(\frac{x_1 − µ_1}{σ_1}\right)^2 − 2ρ \left(\frac{x_1 − µ_1}{σ_1}\right)\left(\frac{x_2 − µ_2}{σ_2}\right) + \left(\frac{x_2 − µ_2}{σ_2}\right)^2 \right] \right\}
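
A verification sketch (with arbitrary illustrative parameters): the explicit bivariate formula agrees with scipy's general multivariate normal density:

```python
# Evaluate the bivariate normal density from the formula above and
# compare with scipy.stats.multivariate_normal.
import numpy as np
from scipy.stats import multivariate_normal

mu1, mu2, s1, s2, rho = 0.0, 2.0, 1.0, 1.5, 0.5
Sigma = np.array([[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]])
x1, x2 = 0.5, 1.0
z1, z2 = (x1 - mu1) / s1, (x2 - mu2) / s2
manual = (np.exp(-(z1**2 - 2 * rho * z1 * z2 + z2**2) / (2 * (1 - rho**2)))
          / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho**2)))
print(manual, multivariate_normal([mu1, mu2], Sigma).pdf([x1, x2]))  # equal
```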

11 / 22
Contours of bivariate normal distribution when ρ = 0

[figure]
12 / 22
Contours of bivariate normal distribution when ρ ≈ 0.5

[figure]
13 / 22
Contours of bivariate normal distribution when ρ → 1

[figure]
14 / 22
Exercise

1. Show that \prod_{i=1}^p \frac{1}{\sqrt{2π}} \exp(−\frac{1}{2} z_i^2) = \frac{1}{(2π)^{p/2}} \exp(−\frac{1}{2} z^T z).

2. Let X ∼ N_2(µ, Σ) with µ = (µ_1, µ_2) and Σ = \begin{pmatrix} σ_1^2 & ρσ_1σ_2 \\ ρσ_1σ_2 & σ_2^2 \end{pmatrix}.
Show that
f(x_1, x_2) = \frac{1}{(2π)^{p/2}} (\det Σ)^{−1/2} \exp\left(−\frac{1}{2} (x − µ)^T Σ^{−1} (x − µ)\right)

3. Find a necessary and sufficient condition on ρ for
Σ = \begin{pmatrix} σ_1^2 & ρσ_1σ_2 \\ ρσ_1σ_2 & σ_2^2 \end{pmatrix} to be positive definite.

4. Why are the contours of the bivariate normal ellipses? What are the
principal axes of the ellipses?

15 / 22
Multivariate normal distribution (general case)

Let µ ∈ R^p and Σ be a symmetric positive definite p × p matrix. We say
the vector X = (X_1, X_2, . . . , X_p) has a (non-degenerate) multivariate
normal distribution, and write X ∼ N_p(µ, Σ), when it has density:

f(x) = \frac{1}{(2π)^{p/2}} (\det Σ)^{−1/2} \exp\left(−\frac{1}{2} (x − µ)^T Σ^{−1} (x − µ)\right).

Its characteristic function is:

φ_X(t) = \exp(i t^T µ) \exp\left(−\frac{1}{2} t^T Σ t\right),  ∀t

Its moment generating function is:

M_X(t) = \exp(t^T µ) \exp\left(\frac{1}{2} t^T Σ t\right)
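
A sketch evaluating the general density directly from the formula, here with p = 3 and a made-up µ and Σ, against scipy:

```python
# N_p(µ, Σ) density via the explicit formula vs. scipy, for p = 3.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
x = np.array([0.2, 0.8, -0.5])
d = x - mu
manual = (np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))
          / ((2 * np.pi) ** (len(mu) / 2) * np.sqrt(np.linalg.det(Sigma))))
print(manual, multivariate_normal(mu, Sigma).pdf(x))  # equal
```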

16 / 22
Multivariate Gaussian: linear transformations
Linear transformations
The multivariate normal distribution is closed under linear transformations.
This is a defining property of the multivariate normal distribution:

If X ∼ N_p(µ, Σ), then for any A ∈ R^{m×p} (m ≤ p), AX ∼ N_m(Aµ, AΣA^T)

Corollary
If Σ is positive definite, then there exist V orthogonal and Λ diagonal such
that Σ = V Λ V^T. We define Σ^{1/2} = V Λ^{1/2} V^T and
Σ^{−1/2} = V Λ^{−1/2} V^T.

• If Z ∼ N_p(0_p, I_p) and X = µ + Σ^{1/2} Z, then X ∼ N_p(µ, Σ).

• If X ∼ N_p(µ, Σ), then Σ^{−1/2}(X − µ) ∼ N_p(0_p, I_p).
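
A sampling sketch of the first corollary, building Σ^{1/2} from the eigendecomposition:

```python
# Generate X = µ + Σ^{1/2} Z from standard normal draws, with
# Σ^{1/2} = V Λ^{1/2} V^T, and check the moments of X.
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])
lam, V = np.linalg.eigh(Sigma)                # Σ = V Λ V^T
Sigma_half = V @ np.diag(np.sqrt(lam)) @ V.T
Z = rng.standard_normal((100_000, 2))
X = mu + Z @ Sigma_half                       # Σ^{1/2} is symmetric
print(np.round(X.mean(axis=0), 2))            # ≈ µ
print(np.round(np.cov(X, rowvar=False), 2))   # ≈ Σ
```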

Exercise
Let X ∼ N_p(µ, Σ); what is the distribution of (X − µ)^T Σ^{−1} (X − µ)?

17 / 22
Example: Change of variables, Cholesky decomposition
and multivariate normal distribution
Suppose that X has a standard multivariate normal distribution. Let Σ
be a symmetric positive definite matrix and let Σ = L^T L be its Cholesky
decomposition. What is the distribution of Y = L^T X?

Since x = L^{−T} y, we have |J(y)| = \det(L^{−T}) = 1/\det(L) and

f_X(L^{−T} y) = \frac{1}{(2π)^{p/2}} \exp\left\{−\frac{1}{2} (L^{−T} y)^T (L^{−T} y)\right\}
             = \frac{1}{(2π)^{p/2}} \exp\left\{−\frac{1}{2} y^T Σ^{−1} y\right\},

where the second equality uses L^{−1} L^{−T} = (L^T L)^{−1} = Σ^{−1}.
Noting that \det(L) = (\det Σ)^{1/2}, we get

f_Y(y) = f_X(L^{−T} y) |J(y)| = \frac{1}{(2π)^{p/2}} (\det Σ)^{−1/2} \exp\left(−\frac{1}{2} y^T Σ^{−1} y\right).

That is, Y ∼ N_p(0_p, Σ).
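
A simulation sketch of this example; note that np.linalg.cholesky returns a lower-triangular C with Σ = C C^T, so the slide's L (with Σ = L^T L) is C^T:

```python
# Transform standard normal draws by Y = L^T X and check Cov(Y) ≈ Σ.
import numpy as np

rng = np.random.default_rng(5)
Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])
L = np.linalg.cholesky(Sigma).T              # now Σ = L^T L
X = rng.standard_normal((100_000, 2))        # rows are draws of X ~ N(0, I)
Y = X @ L                                    # each row is Y = L^T X
print(np.round(np.cov(Y, rowvar=False), 2))  # ≈ Σ
```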
18 / 22
Multivariate Gaussian: marginal and conditional distributions and independence
Marginal and conditional distributions
The multivariate normal distribution is closed under marginalization and
conditioning. Split X into two blocks X = (X_A, X_B). Denote:

µ = (µ_A, µ_B)  and  Σ = \begin{pmatrix} Σ_{AA} & Σ_{AB} \\ Σ_{BA} & Σ_{BB} \end{pmatrix}

Marginal distribution

X_A ∼ N_{|A|}(µ_A, Σ_{AA})
X_B ∼ N_{|B|}(µ_B, Σ_{BB})

where |A| and |B| are the dimensions of the vectors X_A and X_B.

Conditional distribution

X_A | X_B = x_B ∼ N_{|A|}\left(µ_A + Σ_{AB} Σ_{BB}^{−1} (x_B − µ_B),\; Σ_{AA} − Σ_{AB} Σ_{BB}^{−1} Σ_{BA}\right)

X_B | X_A = x_A ∼ N_{|B|}\left(µ_B + Σ_{BA} Σ_{AA}^{−1} (x_A − µ_A),\; Σ_{BB} − Σ_{BA} Σ_{AA}^{−1} Σ_{AB}\right)
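
A sketch applying the conditional formulas with made-up numbers (p = 2, A = {1}, B = {2}):

```python
# Conditional mean and covariance of X_A | X_B = x_B from the block formulas.
import numpy as np

mu_A, mu_B = np.array([1.0]), np.array([-1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.5]])
S_AA, S_AB = Sigma[:1, :1], Sigma[:1, 1:]
S_BA, S_BB = Sigma[1:, :1], Sigma[1:, 1:]
x_B = np.array([0.0])
cond_mean = mu_A + S_AB @ np.linalg.solve(S_BB, x_B - mu_B)
cond_cov = S_AA - S_AB @ np.linalg.solve(S_BB, S_BA)
print(cond_mean, cond_cov)  # E = 1 + 0.6/1.5 = 1.4, var = 2 - 0.36/1.5 = 1.76
```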

19 / 22
Independence
The covariance matrix Σ of a multivariate normal vector and its inverse
K = Σ^{−1} encode independence relations. K is called the precision or
concentration matrix.
Pairwise independence

X_i ⊥⊥ X_j ⇔ Σ_{ij} = 0

Conditional independence

X_i ⊥⊥ X_j | X_rest ⇔ Σ_{ij} = Σ_{iR} Σ_{RR}^{−1} Σ_{Rj} ⇔ (Σ^{−1})_{ij} = 0
The conditional independence properties of the precision matrix give rise
to an entire family of models called Gaussian graphical models.
Block matrix inversion:

M = \begin{pmatrix} A & B \\ C & D \end{pmatrix},  M^{−1} = \begin{pmatrix} (M/D)^{−1} & −(M/D)^{−1} B D^{−1} \\ −D^{−1} C (M/D)^{−1} & D^{−1} + D^{−1} C (M/D)^{−1} B D^{−1} \end{pmatrix}

where M/D := A − B D^{−1} C and M/A := D − C A^{−1} B are called, respectively,
the Schur complements of block D and of block A.
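
An illustrative sketch with a made-up covariance matrix: a zero in K = Σ^{−1} flags conditional independence even when the corresponding covariance is nonzero:

```python
# Invert Σ and read off conditional independence from zeros of K.
import numpy as np

Sigma = np.array([[1.00, 0.50, 0.25],
                  [0.50, 1.00, 0.50],
                  [0.25, 0.50, 1.00]])
K = np.linalg.inv(Sigma)
print(np.round(K, 3))  # K[0, 2] ≈ 0: X_1 ⊥⊥ X_3 | X_2, although Σ[0, 2] ≠ 0
```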
20 / 22
Exercise

 
1. Consider a bivariate normal with µ = (0, 2) and Σ = \begin{pmatrix} 1 & 0.7 \\ 0.7 & 1 \end{pmatrix}.
Find E[X_1 | X_2] and var(X_1 | X_2).

2. Consider the covariance matrix Σ = \begin{pmatrix} 1.98 & −1.40 & −0.14 \\ −1.40 & 2.00 & 0.20 \\ −0.14 & 0.20 & 1.02 \end{pmatrix}
of a Gaussian vector X. Are there components of X that are independent? Are
there components of X that are conditionally independent?

21 / 22
Wishart distribution
Can we define a distribution over the set of all p × p symmetric positive
definite matrices? Yes, in the Gaussian case.

Let X_1, . . . , X_n be iid N_p(0_p, Σ). Then

Y := n S_n = \sum_{i=1}^n X_i X_i^T has a Wishart distribution W_p(Σ, n)

Denote K = Σ^{−1}. Then the density of the Wishart distribution is

f(Y) = \frac{(\det K)^{n/2}}{2^{np/2} Γ_p(n/2)} (\det Y)^{(n−p−1)/2} e^{−\frac{1}{2} \mathrm{trace}(KY)},

which is well defined for any real n > p − 1 (Γ_p denotes the multivariate
gamma function).

We have E(Y) = nΣ.
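
A closing sketch (assuming scipy's wishart): building Y = \sum_i X_i X_i^T from Gaussian draws and checking E(Y) = nΣ:

```python
# Average many replications of Y = sum_i X_i X_i^T and compare with nΣ.
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(6)
n, Sigma = 10, np.array([[2.0, 0.5], [0.5, 1.0]])
X = rng.multivariate_normal(np.zeros(2), Sigma, size=(5_000, n))
Y = np.einsum('rni,rnj->rij', X, X)     # one sum_i X_i X_i^T per replication
print(np.round(Y.mean(axis=0), 2))      # ≈ nΣ = [[20, 5], [5, 10]]
print(wishart(df=n, scale=Sigma).mean())  # also nΣ
```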

22 / 22
