
Bayesian Statistics

Introduction

Shaobo Jin
Department of Mathematics



Introduction / Frequentist Paradigm

Parametric Statistical Model

Suppose that the vector of observations x = (x_1, ..., x_n) is generated
from a probability distribution with density f(x | θ), where θ is the
vector of parameters.

For example, if we further assume the observations are iid, then

    f(x | θ) = ∏_{i=1}^n f(x_i | θ).

A parametric statistical model consists of the observation x of a
random variable X, distributed according to the density f(x | θ), where
the parameter θ belongs to a parameter space Θ of finite dimension.



Introduction / Frequentist Paradigm

Likelihood Function

Definition
For an observation x of a random variable X with density f(x | θ), the
likelihood function L(· | x) : Θ → [0, ∞) is defined by L(θ | x) = f(x | θ).

Example
If X = (X_1, ..., X_n)^T is a sample of independent random variables, then

    L(θ | x) = ∏_{i=1}^n f_i(x_i | θ),

as a function of θ conditional on x.



Introduction / Frequentist Paradigm

Likelihood Function: Example

1. If X_1, ..., X_n is a sample of iid random variables according to
   N(θ, σ²), then

       L(θ | x) = ∏_{i=1}^n (2πσ²)^{-1/2} exp{ -(x_i - θ)² / (2σ²) }.

2. If X_1, ..., X_n is a sample of iid random variables according to
   Binomial(k, θ), then

       L(θ | x) = ∏_{i=1}^n (k choose x_i) θ^{x_i} (1 - θ)^{k - x_i}.
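
In code, such likelihoods are almost always evaluated on the log scale to
avoid underflow. A minimal Python sketch for the normal case (the function
name and test values are ours, for illustration only):

    import numpy as np

    def normal_loglik(theta, x, sigma2):
        # Sum of log f(x_i | theta) for iid N(theta, sigma2) observations.
        x = np.asarray(x, dtype=float)
        return np.sum(-0.5 * np.log(2 * np.pi * sigma2)
                      - (x - theta) ** 2 / (2 * sigma2))

    print(normal_loglik(0.0, [0.5, -0.3, 1.2], sigma2=1.0))  # ≈ -3.6468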



Introduction / Frequentist Paradigm

Likelihood Function: Another Example

Consider the case where:

- For i ≠ j, (X_{i1}, ..., X_{ip}) and (X_{j1}, ..., X_{jp}) are
  independent and identically distributed.
- For each i, X_{i1}, ..., X_{ip} are not necessarily independent.

Then the likelihood is

    L(θ | x) = ∏_{i=1}^n f(x_{i1}, ..., x_{ip} | θ),

where f(x_{i1}, ..., x_{ip} | θ) is the joint density of (X_{i1}, ..., X_{ip}).



Introduction / Frequentist Paradigm

Inference Principle

In the frequentist context:

1. Likelihood principle: the information brought by observation x is
   entirely contained in the likelihood function L(θ | x).

2. Sufficiency principle: two observations x and y factorizing through
   the same value of a sufficient statistic T, so that T(x) = T(y), must
   lead to the same inference on θ.
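
For instance (a standard illustration, not on the slide): for iid
Bernoulli(θ) data, T(x) = Σ x_i is sufficient, so two binary sequences of
length n with the same number of ones must yield the same inference about
θ, regardless of how the zeros and ones are ordered.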



Introduction / Bayesian Paradigm

Bayes Formula

If A and E are two events, then

    P(A | E) = P(E | A) P(A) / P(E)
             = P(E | A) P(A) / [ P(E | A) P(A) + P(E | A^c) P(A^c) ].

If X and Y are two random variables, then

    f(y | x) = f(x | y) f(y) / f(x)
             = f(x | y) f(y) / ∫ f(x | y) f(y) dy.
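
A quick numerical illustration (made-up numbers, ours): if P(A) = 0.01,
P(E | A) = 0.95, and P(E | A^c) = 0.05, then

    P(A | E) = 0.95 · 0.01 / (0.95 · 0.01 + 0.05 · 0.99) ≈ 0.16,

so an event E that is 19 times more likely under A than under A^c still
leaves A fairly improbable when its prior probability is low.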



Introduction / Bayesian Paradigm

Prior and Posterior

A Bayes model consists of a distribution π(θ) on the parameters, and a
conditional probability distribution f(x | θ) on the observations.

- The distribution π(θ) is called the prior distribution.
- The unknown parameter θ is a random parameter.

By Bayes formula,

    π(θ | x) = f(x | θ) π(θ) / m(x) = f(x | θ) π(θ) / ∫ f(x | θ) π(θ) dθ,

where the conditional distribution π(θ | x) is the posterior distribution
and m(x) is the marginal distribution of x.



Introduction / Bayesian Paradigm

Update Our Knowledge of θ


The prior often summarizes the prior information about θ.

    From similar experiences, the average number of accidents at a
    crossing is 1 per 30 days. We assume

        π(θ) = 30 exp(-30θ)  [day⁻¹].

Our experiment resulted in an observation x.

    Three accidents have been recorded after monitoring the
    roundabout for one year. The likelihood is

        f(X = 3 | θ) = ((365θ)³ / 3!) exp(-365θ).

We use the information in x to update our knowledge of θ. By Bayes'
formula,

    π(θ | x) = f(X = 3 | θ) π(θ) / m(x).
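
Carrying the update through (our completion; the slide stops at the
formula):

    π(θ | x) ∝ θ³ exp(-365θ) · exp(-30θ) = θ³ exp(-395θ),

which is the kernel of a Gamma density, so θ | x ∼ Gamma(4, 395)
(shape 4, rate 395 day⁻¹), with posterior mean 4/395 ≈ 0.010 accidents
per day.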
Introduction / Bayesian Paradigm

Distributions

In a Bayesian model, we will have many distributions:

- prior distribution: π(θ).
- conditional distribution of X | θ (the likelihood): f(x | θ).
- joint distribution of (θ, X): f(x, θ) = f(x | θ) π(θ).
- posterior distribution: π(θ | x).
- marginal distribution of X: m(x) = ∫ f(x | θ) π(θ) dθ.

Most of the time we use π(·) and m(·) as generic symbols, but in several
cases they are tied to specific functions.
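
To see all five in one place (a small worked case, ours): if
X | θ ∼ Bernoulli(θ) with prior θ ∼ Uniform(0, 1), then the joint is
f(x, θ) = θ^x (1 - θ)^{1-x} for x ∈ {0, 1}, the marginal is

    m(x) = ∫₀¹ θ^x (1 - θ)^{1-x} dθ = 1/2   for x ∈ {0, 1},

and the posterior π(θ | x) = f(x, θ)/m(x) is Beta(2, 1) if x = 1 and
Beta(1, 2) if x = 0.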



Introduction / Bayesian Paradigm

Use Bayes Formula To Obtain Posterior

Example
Find the posterior distribution.

1. Suppose that we have an iid sample X_i | θ ∼ Bernoulli(θ),
   i = 1, ..., n. The prior is θ ∼ Beta(a_0, b_0).

2. Suppose that we have an iid sample X_i | µ ∼ N(µ, σ²), i = 1, ..., n,
   where σ² is known. The prior is µ ∼ N(µ_0, σ_0²).

3. Suppose that we have an iid sample X_i | µ, σ² ∼ N(µ, σ²),
   i = 1, ..., n. The priors are µ | σ² ∼ N(µ_0, σ²/λ_0) and
   σ² ∼ InvGamma(a_0, b_0), where

       π(σ²) = (b_0^{a_0} / Γ(a_0)) (σ²)^{-(a_0+1)} exp(-b_0 / σ²).
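
For reference, the standard conjugate answers (well-known results, quoted
here rather than derived on the slide; x̄ denotes the sample mean):

1. θ | x ∼ Beta(a_0 + Σ x_i, b_0 + n - Σ x_i).

2. µ | x ∼ N(µ_n, σ_n²), where 1/σ_n² = 1/σ_0² + n/σ² and
   µ_n = σ_n² (µ_0/σ_0² + Σ x_i / σ²).

3. µ | σ², x ∼ N((λ_0 µ_0 + n x̄)/(λ_0 + n), σ²/(λ_0 + n)) and
   σ² | x ∼ InvGamma(a_0 + n/2,
                     b_0 + Σ (x_i - x̄)²/2 + λ_0 n (x̄ - µ_0)² / (2(λ_0 + n))).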



Introduction / Bayesian Paradigm

Bayesian Inference Principle

Bayesian Inference Principle
Information on the underlying parameter θ is entirely contained in the
posterior distribution π(θ | x). That is, all statistical inference is
based on the posterior distribution π(θ | x).

Some examples are:

1. posterior mean: E[θ | x].

2. posterior mode (MAP): the θ that maximizes π(θ | x).

3. predictive distribution of a new observation y:

       f(y | x) = ∫ f(y | x, θ) π(θ | x) dθ.
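
To make these concrete, a minimal Python sketch for the Beta-Bernoulli
model of the previous slide (the function name and default priors are
ours; everything is available in closed form here):

    import numpy as np

    def posterior_summaries(x, a0=1.0, b0=1.0):
        # Beta(a0, b0) prior, iid Bernoulli data: posterior is Beta(a, b).
        x = np.asarray(x)
        a = a0 + x.sum()
        b = b0 + len(x) - x.sum()
        post_mean = a / (a + b)                                       # E[theta | x]
        post_mode = (a - 1) / (a + b - 2) if min(a, b) > 1 else None  # MAP
        pred_next = a / (a + b)   # predictive P(X_new = 1 | x)
        return post_mean, post_mode, pred_next

    print(posterior_summaries([1, 0, 1, 1, 0], a0=2.0, b0=2.0))
    # (0.5555..., 0.5714..., 0.5555...)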



Introduction / Multivariate Normal Distribution

From Univariate to Multivariate Normal

Let Z ∼ N(0, 1). Then X = σZ + µ ∼ N(µ, σ²), where E[X] = µ and
Var(X) = σ².

Let Z = (Z_1, Z_2, ..., Z_p)^T be a random vector, where each
Z_j ∼ N(0, 1) and Z_j is independent of Z_k for any j ≠ k. Then

    X = Σ^{1/2} Z + µ ∈ ℝ^p

follows a p-dimensional multivariate normal distribution, denoted by
X ∼ N_p(µ, Σ), where E[X] = µ and Var(X) = Σ.
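
This construction is how multivariate normal draws are generated in
practice. A small Python sketch (ours), using the Cholesky factor L with
L Lᵀ = Σ in place of the symmetric square root Σ^{1/2} (either choice
works, since Var(LZ) = L Lᵀ = Σ):

    import numpy as np

    rng = np.random.default_rng(0)
    mu = np.array([1.0, -2.0])
    Sigma = np.array([[2.0, 0.8],
                      [0.8, 1.0]])

    L = np.linalg.cholesky(Sigma)         # L @ L.T == Sigma
    Z = rng.standard_normal((2, 10_000))  # iid N(0, 1) entries
    X = (L @ Z).T + mu                    # rows are draws from N_2(mu, Sigma)

    print(X.mean(axis=0))  # close to mu
    print(np.cov(X.T))     # close to Sigma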



Introduction / Multivariate Normal Distribution

From Univariate to Multivariate Normal: Density

The density function of the random variable X ∼ N(µ, σ²) with σ > 0 can
be expressed as

    (2πσ²)^{-1/2} exp{ -(x - µ)² / (2σ²) }
        = (2πσ²)^{-1/2} exp{ -(1/2) (x - µ) (σ²)⁻¹ (x - µ) }.

A p-dimensional random variable X ∼ N_p(µ, Σ) with Σ > 0 has the density

    f(x) = (2π)^{-p/2} det(Σ)^{-1/2} exp{ -(1/2) (x - µ)^T Σ⁻¹ (x - µ) }.
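
Transcribing the p-dimensional density into code (a sketch, ours; on the
log scale, and solving a linear system rather than forming Σ⁻¹ explicitly):

    import numpy as np

    def mvn_logpdf(x, mu, Sigma):
        # log f(x) = -(p/2) log(2*pi) - (1/2) log det(Sigma)
        #            - (1/2) (x - mu)^T Sigma^{-1} (x - mu)
        p = len(mu)
        diff = x - mu
        _, logdet = np.linalg.slogdet(Sigma)        # Sigma > 0 assumed
        quad = diff @ np.linalg.solve(Sigma, diff)
        return -0.5 * (p * np.log(2 * np.pi) + logdet + quad)

    print(mvn_logpdf(np.zeros(2), np.zeros(2), np.eye(2)))  # -log(2*pi) ≈ -1.8379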



Introduction / Multivariate Normal Distribution

Some Useful Properties

1. A linear combination of normals remains normal: suppose that
   X ∼ N_p(µ, Σ). Then AX + d ∼ N_q(Aµ + d, A Σ A^T) for every q × p
   constant matrix A and every q × 1 constant vector d.

2. Marginal normality plus independence implies joint normality: if X_1
   and X_2 are independent and distributed N_p(µ_1, Σ_11) and
   N_q(µ_2, Σ_22), respectively, then

       (X_1, X_2)^T ∼ N_{p+q}( (µ_1, µ_2)^T, [[Σ_11, 0], [0, Σ_22]] ).

3. Conditional distribution: let

       (X_1, X_2)^T ∼ N_{p+q}( (µ_1, µ_2)^T, [[Σ_11, Σ_12], [Σ_21, Σ_22]] ).

   Then the conditional distribution of X_1 given X_2 = x_2 is

       X_1 | X_2 = x_2 ∼ N( µ_1 + Σ_12 Σ_22⁻¹ (x_2 - µ_2), Σ_11 - Σ_12 Σ_22⁻¹ Σ_21 ).
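
Property 3 in code, as a minimal sketch (ours), conditioning the first
block on the second:

    import numpy as np

    def condition_gaussian(mu1, mu2, S11, S12, S22, x2):
        # X1 | X2 = x2 ~ N(mu1 + S12 S22^{-1} (x2 - mu2),
        #                  S11 - S12 S22^{-1} S21), with S21 = S12^T.
        K = S12 @ np.linalg.inv(S22)
        return mu1 + K @ (x2 - mu2), S11 - K @ S12.T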



Introduction / Multivariate Normal Distribution

Multivariate Normal In Bayesian Statistics

Example
Suppose that X | θ ∼ N_p(Cθ, Σ), where the p × q matrix C and Σ > 0 are
known. The prior is θ ∼ N_q(µ_0, Λ_0⁻¹). Find the posterior of θ.

We can in fact use the property of the conditional distribution of a
multivariate normal distribution to simplify the steps.

Result
If we know X_1 | X_2 ∼ N_p(C X_2, Σ) and X_2 ∼ N_q(m, Ω), then

    (X_1, X_2)^T ∼ N_{p+q}( (Cm, m)^T, [[Σ + C Ω C^T, C Ω], [Ω C^T, Ω]] ).
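
Applying the conditional-distribution property from the useful-properties
slide to this joint, with m = µ_0 and Ω = Λ_0⁻¹ (our completion; the slide
leaves the final step to the reader), gives

    θ | X = x ∼ N_q( µ_0 + Λ_0⁻¹ C^T (Σ + C Λ_0⁻¹ C^T)⁻¹ (x - C µ_0),
                     Λ_0⁻¹ - Λ_0⁻¹ C^T (Σ + C Λ_0⁻¹ C^T)⁻¹ C Λ_0⁻¹ ),

which is equivalent to the precision form θ | X = x ∼ N_q(µ_n, Λ_n⁻¹) with
Λ_n = Λ_0 + C^T Σ⁻¹ C and µ_n = Λ_n⁻¹ (Λ_0 µ_0 + C^T Σ⁻¹ x).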

