Lecture 1 (Chapter 3) - Common Families of Distributions

1) The exponential family consists of probability distributions that can be written in a particular form involving an exponential term. 2) Examples of distributions in the exponential family include the binomial, Poisson, exponential, and normal distributions. 3) For distributions in the exponential family, their mean and variance can be calculated using a theorem that provides shortcuts in terms of the parameters of the exponential family form.

Uploaded by

Steve Yang

STA3020 - Statistical Inference 1

Chapter 3

Common Families of Distributions

3.4 Exponential Families


Definition 3.4.1: (Exponential Family)
A family of pmfs or pdfs is called an exponential family if it can be expressed as

$$f(x|\theta) = h(x)\,c(\theta)\exp\left(\sum_{i=1}^{k} w_i(\theta)\,t_i(x)\right) \tag{3.4.1}$$

where $h(x) \ge 0$ and $t_1(x), \ldots, t_k(x)$ are real-valued functions of the observation $x$ (they cannot depend on $\theta$), and $c(\theta) \ge 0$ and $w_1(\theta), \ldots, w_k(\theta)$ are real-valued functions of the possibly vector-valued parameter $\theta$ (they cannot depend on $x$).

Note: To verify that a family of pdfs or pmfs is an exponential family:

1. Identify the functions $h(x)$, $c(\theta)$, $t_i(x)$, and $w_i(\theta)$, and check that they satisfy the conditions;
2. Show that the family of pdfs or pmfs has the form of (3.4.1).

Example 3.4.1: (Examples for Exponential Families - Binomial, Poisson, Exponential, Normal Distributions)

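(1) Binomial Distribution:

$$f(x|p) = \binom{n}{x} p^x (1-p)^{n-x} = \binom{n}{x} (1-p)^n \left(\frac{p}{1-p}\right)^x = \binom{n}{x} (1-p)^n \exp\left(x \log\frac{p}{1-p}\right),$$

then

$$h(x) = \binom{n}{x}, \quad c(p) = (1-p)^n, \quad t(x) = x \quad\text{and}\quad w(p) = \log\frac{p}{1-p}.$$

Note: The parameter is restricted to $0 < p < 1$. The form of $f(x|p)$ differs for $p = 0$, $0 < p < 1$ and $p = 1$, and the factorization above must hold for every $x$ with the same $h$, $c$, $t$ and $w$. Since $w(p) = \log\frac{p}{1-p}$ is not defined at $p = 0$ or $p = 1$, $f(x|p)$ is an exponential family only if $0 < p < 1$.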

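(2) Poisson Distribution:

$$f(x|\lambda) = \frac{\lambda^x e^{-\lambda}}{x!} = \frac{1}{x!}\, e^{-\lambda} \exp\big(x \log(\lambda)\big),$$

then

$$h(x) = \frac{1}{x!}, \quad c(\lambda) = e^{-\lambda}, \quad t(x) = x \quad\text{and}\quad w(\lambda) = \log(\lambda).$$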

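(3) Exponential Distribution:

$$f(x|\beta) = \frac{1}{\beta} \exp\left(-\frac{x}{\beta}\right),$$

then

$$h(x) = 1, \quad c(\beta) = \frac{1}{\beta}, \quad t(x) = x \quad\text{and}\quad w(\beta) = -\frac{1}{\beta}.$$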

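(4) Normal Distribution:

$$f(x|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{x^2}{2\sigma^2} + \frac{x\mu}{\sigma^2} - \frac{\mu^2}{2\sigma^2}\right),$$

then

$$h(x) = 1, \quad c(\mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\mu^2}{2\sigma^2}\right),$$

$$t_1(x) = -\frac{x^2}{2}, \quad w_1(\mu, \sigma) = \frac{1}{\sigma^2}, \quad t_2(x) = x \quad\text{and}\quad w_2(\mu, \sigma) = \frac{\mu}{\sigma^2}.$$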
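
The four factorizations can be checked numerically. The following is a minimal Python sketch, not part of the lecture: the helper `exp_family`, the constant `N` and the evaluation points are illustrative assumptions. It rebuilds each density from its $h$, $c$, $t_i$ and $w_i$ and compares the result with the usual formula written directly.

```python
import math

def exp_family(h, c, ws, ts):
    """Build f(x, theta) = h(x) * c(theta) * exp(sum_i w_i(theta) * t_i(x))."""
    def f(x, theta):
        exponent = sum(w(theta) * t(x) for w, t in zip(ws, ts))
        return h(x) * c(theta) * math.exp(exponent)
    return f

N = 10  # binomial number of trials, treated as known (illustrative value)

binomial = exp_family(
    h=lambda x: math.comb(N, x),
    c=lambda p: (1 - p) ** N,
    ws=[lambda p: math.log(p / (1 - p))],
    ts=[lambda x: x],
)
poisson = exp_family(
    h=lambda x: 1 / math.factorial(x),
    c=lambda lam: math.exp(-lam),
    ws=[lambda lam: math.log(lam)],
    ts=[lambda x: x],
)
exponential = exp_family(
    h=lambda x: 1.0,
    c=lambda beta: 1 / beta,
    ws=[lambda beta: -1 / beta],
    ts=[lambda x: x],
)
# For the normal family theta is the pair (mu, sigma) and k = 2.
normal = exp_family(
    h=lambda x: 1.0,
    c=lambda th: math.exp(-th[0] ** 2 / (2 * th[1] ** 2))
    / (math.sqrt(2 * math.pi) * th[1]),
    ws=[lambda th: 1 / th[1] ** 2, lambda th: th[0] / th[1] ** 2],
    ts=[lambda x: -x ** 2 / 2, lambda x: x],
)

# The usual formulas, written directly, for comparison.
def binomial_pmf(x, p):
    return math.comb(N, x) * p ** x * (1 - p) ** (N - x)

def poisson_pmf(x, lam):
    return lam ** x * math.exp(-lam) / math.factorial(x)

def exponential_pdf(x, beta):
    return math.exp(-x / beta) / beta

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

checks = [
    (binomial(3, 0.3), binomial_pmf(3, 0.3)),
    (poisson(4, 2.5), poisson_pmf(4, 2.5)),
    (exponential(1.2, 0.7), exponential_pdf(1.2, 0.7)),
    (normal(0.4, (1.0, 2.0)), normal_pdf(0.4, 1.0, 2.0)),
]
all_match = all(math.isclose(a, b, rel_tol=1e-9) for a, b in checks)
print(all_match)
```

Agreement at a handful of points is only a sanity check; the algebra above is what establishes the factorization for all $x$ and all admissible parameter values.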

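Theorem 3.4.2: If $X$ is a random variable with pdf or pmf of the form (3.4.1), then it holds for any $j$,

1. $$E\left(\sum_{i=1}^{k} \frac{\partial w_i(\theta)}{\partial \theta_j}\, t_i(X)\right) = -\frac{\partial}{\partial \theta_j} \log c(\theta);$$

2. $$\mathrm{Var}\left(\sum_{i=1}^{k} \frac{\partial w_i(\theta)}{\partial \theta_j}\, t_i(X)\right) = -\frac{\partial^2}{\partial \theta_j^2} \log c(\theta) - E\left(\sum_{i=1}^{k} \frac{\partial^2 w_i(\theta)}{\partial \theta_j^2}\, t_i(X)\right).$$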

Remark: The theorem is a computational shortcut: moments of an exponential family can be obtained by differentiation instead of summation or integration.

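Example 3.4.3: (Binomial Mean and Variance)

For the binomial distribution, we have

$$h(x) = \binom{n}{x}, \quad c(p) = (1-p)^n, \quad t(x) = x \quad\text{and}\quad w(p) = \log\frac{p}{1-p}.$$

Then,

$$\frac{d}{dp} w(p) = \frac{d}{dp} \log\frac{p}{1-p} = \frac{1}{p(1-p)},$$

$$\frac{d^2}{dp^2} w(p) = -\frac{1}{p^2} + \frac{1}{(1-p)^2} = \frac{2p-1}{p^2(1-p)^2},$$

$$\frac{d}{dp} \log c(p) = \frac{d}{dp}\, n \log(1-p) = -\frac{n}{1-p},$$

$$\frac{d^2}{dp^2} \log c(p) = -\frac{n}{(1-p)^2}.$$

Therefore, from Theorem 3.4.2, we have

$$E\left(\frac{1}{p(1-p)}\, X\right) = \frac{n}{1-p} \;\Rightarrow\; E(X) = np,$$

$$\mathrm{Var}\left(\frac{1}{p(1-p)}\, X\right) = \frac{n}{(1-p)^2} - E\left(\frac{2p-1}{p^2(1-p)^2}\, X\right) \;\Rightarrow\; \mathrm{Var}(X) = np(1-p).$$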
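
For a one-parameter family ($k = 1$) Theorem 3.4.2 can be solved for the moments of $t(X)$: $E\,t(X) = -(\log c)'/w'$ and $\mathrm{Var}\,t(X) = \big(-(\log c)'' - w''\,E\,t(X)\big)/(w')^2$. The sketch below is an illustration rather than the lecture's own method: it replaces the analytic derivatives with central finite differences, and the helper names `d1`, `d2` and `moments_of_t` are hypothetical.

```python
import math

def d1(g, x, eps=1e-4):
    """Central finite-difference approximation of g'(x)."""
    return (g(x + eps) - g(x - eps)) / (2 * eps)

def d2(g, x, eps=1e-4):
    """Central finite-difference approximation of g''(x)."""
    return (g(x + eps) - 2 * g(x) + g(x - eps)) / eps ** 2

def moments_of_t(w, log_c, theta):
    """Mean and variance of t(X) for a k = 1 exponential family (Theorem 3.4.2)."""
    w1, w2 = d1(w, theta), d2(w, theta)
    mean = -d1(log_c, theta) / w1
    # Var(w' * t(X)) = -(log c)'' - w'' * E t(X), then divide by (w')^2
    var = (-d2(log_c, theta) - w2 * mean) / w1 ** 2
    return mean, var

n, p = 10, 0.3  # illustrative values
mean, var = moments_of_t(
    w=lambda q: math.log(q / (1 - q)),
    log_c=lambda q: n * math.log(1 - q),
    theta=p,
)
print(mean, n * p)            # both close to 3
print(var, n * p * (1 - p))   # both close to 2.1
```

The finite-difference step `eps` is a tuning assumption; the analytic derivatives in the example above give the exact values $np$ and $np(1-p)$.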

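Example: (Normal Mean and Variance)

For the normal distribution, we have

$$h(x) = 1, \quad c(\mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\mu^2}{2\sigma^2}\right),$$

$$t_1(x) = -\frac{x^2}{2}, \quad w_1(\mu, \sigma) = \frac{1}{\sigma^2}, \quad t_2(x) = x \quad\text{and}\quad w_2(\mu, \sigma) = \frac{\mu}{\sigma^2}.$$

Then,

$$\frac{\partial w_1(\mu, \sigma)}{\partial \mu} = \frac{\partial (1/\sigma^2)}{\partial \mu} = 0, \qquad \frac{\partial w_2(\mu, \sigma)}{\partial \mu} = \frac{\partial (\mu/\sigma^2)}{\partial \mu} = \frac{1}{\sigma^2},$$

$$\frac{\partial w_1(\mu, \sigma)}{\partial \sigma} = \frac{\partial (1/\sigma^2)}{\partial \sigma} = -\frac{2}{\sigma^3}, \qquad \frac{\partial w_2(\mu, \sigma)}{\partial \sigma} = \frac{\partial (\mu/\sigma^2)}{\partial \sigma} = -\frac{2\mu}{\sigma^3},$$

$$\frac{\partial}{\partial \mu} \log c(\mu, \sigma) = \frac{\partial}{\partial \mu}\left(-\frac{\log(2\pi)}{2} - \log(\sigma) - \frac{\mu^2}{2\sigma^2}\right) = -\frac{\mu}{\sigma^2},$$

$$\frac{\partial}{\partial \sigma} \log c(\mu, \sigma) = \frac{\partial}{\partial \sigma}\left(-\frac{\log(2\pi)}{2} - \log(\sigma) - \frac{\mu^2}{2\sigma^2}\right) = -\frac{1}{\sigma} + \frac{\mu^2}{\sigma^3}.$$

Therefore, from Theorem 3.4.2, we have

$$E\left(\frac{1}{\sigma^2}\, X\right) = \frac{\mu}{\sigma^2} \quad\text{and}\quad E\left(-\frac{2}{\sigma^3}\left(-\frac{X^2}{2}\right) - \frac{2\mu}{\sigma^3}\, X\right) = \frac{1}{\sigma} - \frac{\mu^2}{\sigma^3},$$

which implies

$$E(X) = \mu, \quad E(X^2) = \mu^2 + \sigma^2 \quad\text{and}\quad \mathrm{Var}(X) = E(X^2) - (EX)^2 = \sigma^2.$$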
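
The two identities obtained from Theorem 3.4.2 can be checked by numerical integration. This is a rough sketch under stated assumptions: the midpoint rule, the integration range of 12 standard deviations, and the values of `mu` and `sigma` are arbitrary illustrative choices, and `expect` is a hypothetical helper.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def expect(g, mu, sigma, steps=20000, width=12.0):
    """Midpoint-rule approximation of E[g(X)] for X ~ n(mu, sigma^2)."""
    lo = mu - width * sigma
    dx = 2 * width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += g(x) * normal_pdf(x, mu, sigma)
    return total * dx

mu, sigma = 1.5, 2.0

# Identity from differentiating with respect to mu
lhs_mu = expect(lambda x: x / sigma ** 2, mu, sigma)
rhs_mu = mu / sigma ** 2

# Identity from differentiating with respect to sigma
lhs_sigma = expect(lambda x: x ** 2 / sigma ** 3 - 2 * mu * x / sigma ** 3, mu, sigma)
rhs_sigma = 1 / sigma - mu ** 2 / sigma ** 3

print(lhs_mu, rhs_mu)
print(lhs_sigma, rhs_sigma)
```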

Definition 3.4.5: The indicator function of a set $A$, often denoted by $I_A(x)$, is the function

$$I_A(x) = \begin{cases} 1, & \text{if } x \in A \\ 0, & \text{if } x \notin A. \end{cases}$$

Alternatively, we can use $I(x \in A)$.



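Example: Let $X$ have a pdf given by

$$f(x|\theta) = \frac{1}{\theta} \exp\left(1 - \frac{x}{\theta}\right), \quad \text{for } \theta < x < \infty \text{ and } \theta > 0.$$

Show that this is NOT an exponential family. The pdf above can be written using an indicator function:

$$f(x|\theta) = \frac{1}{\theta} \exp\left(1 - \frac{x}{\theta}\right) I_{[\theta,\infty)}(x).$$

The indicator $I_{[\theta,\infty)}(x)$ depends on $x$ and $\theta$ jointly: it is not a function of $x$ alone, not a function of $\theta$ alone, and cannot be written as an exponential as required by (3.4.1). Hence the family is not an exponential family.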
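
A short Python sketch makes the obstruction concrete: the set of $x$ with positive density moves with $\theta$. The function names `indicator` and `f` and the evaluation points are illustrative assumptions.

```python
import math

def indicator(x, lo, hi=math.inf):
    """I_[lo, hi)(x): 1 if lo <= x < hi, else 0."""
    return 1 if lo <= x < hi else 0

def f(x, theta):
    """(1/theta) * exp(1 - x/theta) * I_[theta, inf)(x), for theta > 0."""
    return math.exp(1 - x / theta) / theta * indicator(x, theta)

# The same point x = 1.5 has positive density under theta = 1
# but zero density under theta = 2.
inside = f(1.5, 1.0)
outside = f(1.5, 2.0)
print(inside > 0, outside == 0)
```

In form (3.4.1) the factor $h(x)$ alone decides where the density vanishes, so the support cannot change with $\theta$ as it does here.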

Example 3.4.1: (Exponential Families Using Indicator Functions)

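(1) Binomial Distribution:

$$f(x|p) = I_{\{0,1,\ldots,n\}}(x) \binom{n}{x} p^x (1-p)^{n-x} = I_{\{0,1,\ldots,n\}}(x) \binom{n}{x} (1-p)^n \left(\frac{p}{1-p}\right)^x = I_{\{0,1,\ldots,n\}}(x) \binom{n}{x} (1-p)^n \exp\left(x \log\frac{p}{1-p}\right),$$

then

$$h(x) = I_{\{0,1,\ldots,n\}}(x) \binom{n}{x}, \quad c(p) = (1-p)^n, \quad t(x) = x \quad\text{and}\quad w(p) = \log\frac{p}{1-p}.$$

Note: The parameter is restricted to $0 < p < 1$. The form of $f(x|p)$ differs for $p = 0$, $0 < p < 1$ and $p = 1$, and the factorization above must hold for every $x$ with the same $h$, $c$, $t$ and $w$. Since $w(p) = \log\frac{p}{1-p}$ is not defined at $p = 0$ or $p = 1$, $f(x|p)$ is an exponential family only if $0 < p < 1$.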

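(2) Poisson Distribution:

$$f(x|\lambda) = I_{\{0,1,\ldots\}}(x)\, \frac{\lambda^x e^{-\lambda}}{x!} = I_{\{0,1,\ldots\}}(x)\, \frac{1}{x!}\, e^{-\lambda} \exp\big(x \log(\lambda)\big),$$

then

$$h(x) = I_{\{0,1,\ldots\}}(x)\, \frac{1}{x!}, \quad c(\lambda) = e^{-\lambda}, \quad t(x) = x \quad\text{and}\quad w(\lambda) = \log(\lambda).$$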

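(3) Exponential Distribution:

$$f(x|\beta) = I_{[0,\infty)}(x)\, \frac{1}{\beta} \exp\left(-\frac{x}{\beta}\right),$$

then

$$h(x) = I_{[0,\infty)}(x), \quad c(\beta) = \frac{1}{\beta}, \quad t(x) = x \quad\text{and}\quad w(\beta) = -\frac{1}{\beta}.$$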

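(4) Normal Distribution:

$$f(x|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{x^2}{2\sigma^2} + \frac{x\mu}{\sigma^2} - \frac{\mu^2}{2\sigma^2}\right),$$

then

$$h(x) = 1, \quad c(\mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\mu^2}{2\sigma^2}\right),$$

$$t_1(x) = -\frac{x^2}{2}, \quad w_1(\mu, \sigma) = \frac{1}{\sigma^2}, \quad t_2(x) = x \quad\text{and}\quad w_2(\mu, \sigma) = \frac{\mu}{\sigma^2}.$$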

Definition: (Reparameterization of Exponential Families)

An exponential family can be reparameterized as

$$f(x|\eta) = h(x)\, c^*(\eta) \exp\left(\sum_{i=1}^{k} \eta_i\, t_i(x)\right),$$

where $\eta = (\eta_1, \ldots, \eta_k)$, $\eta_i = w_i(\theta)$ are the natural parameters, $h(x)$ and $t_i(x)$ are the same as in the original parameterization, and

$$c^*(\eta) = \left[\int_{-\infty}^{\infty} h(x) \exp\left(\sum_{i=1}^{k} \eta_i\, t_i(x)\right) dx\right]^{-1}$$

to ensure that the pdf integrates to 1. The set

$$\mathcal{H} = \left\{\eta = (\eta_1, \ldots, \eta_k) : \int_{-\infty}^{\infty} h(x) \exp\left(\sum_{i=1}^{k} \eta_i\, t_i(x)\right) dx < \infty\right\}$$

is called the natural parameter space for the family. (The integral is replaced by a sum over the values of $x$ for which $h(x) > 0$ if $X$ is discrete.) Since the original $f(x|\theta)$ in (3.4.1) is a pdf or pmf, it must hold that

$$\big\{\eta = \big(w_1(\theta), \ldots, w_k(\theta)\big) : \theta \in \Theta\big\} \subset \mathcal{H}.$$

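Example 3.4.6: (Reparameterization of Normal Distribution)

For the normal distribution, we have

$$h(x) = 1, \quad c(\mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\mu^2}{2\sigma^2}\right),$$

$$t_1(x) = -\frac{x^2}{2}, \quad w_1(\mu, \sigma^2) = \frac{1}{\sigma^2}, \quad t_2(x) = x \quad\text{and}\quad w_2(\mu, \sigma^2) = \frac{\mu}{\sigma^2}.$$

Let $\eta_1 = w_1(\mu, \sigma^2) = 1/\sigma^2$ and $\eta_2 = w_2(\mu, \sigma^2) = \mu/\sigma^2$. The normal distribution can be reparameterized as:

$$f(x|\eta_1, \eta_2) = \frac{\sqrt{\eta_1}}{\sqrt{2\pi}} \exp\left(-\frac{\eta_2^2}{2\eta_1}\right) \exp\left(-\frac{\eta_1}{2}\, x^2 + \eta_2\, x\right).$$

The natural parameter space is $\{(\eta_1, \eta_2) : \eta_1 > 0,\ -\infty < \eta_2 < \infty\}$.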
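
The change of parameters and its inverse can be sketched in a few lines of Python. The function names below are hypothetical, and the evaluation point is arbitrary; the check is that the density in natural parameters agrees with the usual $n(\mu, \sigma^2)$ density.

```python
import math

def to_natural(mu, sigma2):
    """(mu, sigma^2) -> (eta1, eta2) = (1/sigma^2, mu/sigma^2)."""
    return 1 / sigma2, mu / sigma2

def from_natural(eta1, eta2):
    """(eta1, eta2) -> (mu, sigma^2); requires eta1 > 0."""
    if eta1 <= 0:
        raise ValueError("eta1 must be positive (natural parameter space)")
    return eta2 / eta1, 1 / eta1

def pdf_natural(x, eta1, eta2):
    """Normal pdf written in the natural parameters."""
    return (math.sqrt(eta1) / math.sqrt(2 * math.pi)
            * math.exp(-eta2 ** 2 / (2 * eta1))
            * math.exp(-eta1 * x ** 2 / 2 + eta2 * x))

def pdf_usual(x, mu, sigma2):
    """Normal pdf in the (mu, sigma^2) parameterization."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

eta1, eta2 = to_natural(1.5, 4.0)      # (0.25, 0.375)
mu, sigma2 = from_natural(eta1, eta2)  # back to (1.5, 4.0)
print(pdf_natural(0.7, eta1, eta2), pdf_usual(0.7, 1.5, 4.0))
```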

Definition 3.4.7: A curved exponential family is a family of densities of the form (3.4.1) for which the dimension of the vector $\theta = (\theta_1, \ldots, \theta_d)$ is equal to $d < k$. If $d = k$, the family is a full exponential family.

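Example 3.4.8: Normal distribution with mean $\mu$ and variance $\sigma^2 = \mu^2$.

$$f(x|\mu) = \frac{1}{\sqrt{2\pi\mu^2}} \exp\left(-\frac{(x-\mu)^2}{2\mu^2}\right) = \frac{1}{\sqrt{2\pi\mu^2}} \exp\left(-\frac{1}{2}\right) \exp\left(-\frac{x^2}{2\mu^2} + \frac{x}{\mu}\right)$$

Let $\eta_1 = 1/\mu^2$ and $\eta_2 = 1/\mu$. The normal distribution $n(\mu, \mu^2)$ can be reparameterized as:

$$f(x|\eta_1, \eta_2) = \frac{\sqrt{\eta_1}}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}\right) \exp\left(-\frac{\eta_1}{2}\, x^2 + \eta_2\, x\right).$$

Since $d = 1$ and $k = 2$, it is a curved exponential family: the natural parameters are tied together by $\eta_1 = \eta_2^2$.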
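
The word "curved" can be seen directly: as $\mu$ varies, the pair $(\eta_1, \eta_2)$ traces the parabola $\eta_1 = \eta_2^2$ inside the two-dimensional natural parameter space. The sketch below, with hypothetical function names and arbitrary test values, checks this and compares the two ways of writing the density.

```python
import math

def natural_params(mu):
    """Natural parameters of n(mu, mu^2): eta1 = 1/mu^2, eta2 = 1/mu."""
    return 1 / mu ** 2, 1 / mu

def pdf_curved(x, mu):
    """n(mu, mu^2) density written directly in terms of mu."""
    return math.exp(-(x - mu) ** 2 / (2 * mu ** 2)) / math.sqrt(2 * math.pi * mu ** 2)

def pdf_natural(x, eta1, eta2):
    """The same density written in the natural parameters."""
    return (math.sqrt(eta1) / math.sqrt(2 * math.pi)
            * math.exp(-0.5)
            * math.exp(-eta1 * x ** 2 / 2 + eta2 * x))

mus = (0.5, 1.0, 3.0)
# Every member of the family lies on the parabola eta1 = eta2^2.
on_parabola = all(
    math.isclose(natural_params(m)[0], natural_params(m)[1] ** 2) for m in mus
)
# Both expressions give the same density value.
same_density = all(
    math.isclose(pdf_curved(0.8, m), pdf_natural(0.8, *natural_params(m))) for m in mus
)
print(on_parabola, same_density)
```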

Example 3.4.9: (Normal Approximation)

If $X_1, \ldots, X_n$ are sampled from a Poisson($\lambda$) population, then the distribution of $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is approximately (according to the Central Limit Theorem)

$$\bar{X} \sim n(\lambda, \lambda/n),$$

which is a curved exponential family.

If $X_1, \ldots, X_n$ are iid Bernoulli($p$), then the distribution of $\bar{X}$ is approximately

$$\bar{X} \sim n\big(p,\ p(1-p)/n\big),$$

which is also a curved exponential family.
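
For the Bernoulli case the quality of the approximation can be inspected without simulation, because $n\bar{X}$ is exactly Binomial($n$, $p$). The following sketch compares the exact cdf of $\bar{X}$ with the cdf of $n(p, p(1-p)/n)$ at the support points $k/n$. The function names and the choices $n = 25$, $n = 400$ and $p = 0.3$ are illustrative assumptions, and no continuity correction is applied.

```python
import math

def normal_cdf(x, mean, var):
    """Cdf of n(mean, var), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def max_cdf_gap(n, p):
    """Largest gap, over the support points k/n, between the exact cdf of the
    Bernoulli(p) sample mean and the cdf of n(p, p(1 - p)/n)."""
    gap, exact = 0.0, 0.0
    for k in range(n + 1):
        # n * Xbar is Binomial(n, p), so P(Xbar = k/n) is a binomial probability
        exact += math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        gap = max(gap, abs(exact - normal_cdf(k / n, p, p * (1 - p) / n)))
    return gap

gap_small = max_cdf_gap(25, 0.3)
gap_large = max_cdf_gap(400, 0.3)
print(gap_small, gap_large)  # the gap shrinks as n grows
```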

Remark:
1. Theorem 3.4.2 also applies to curved exponential families.
2. Exponential families have nice properties that are very useful in
statistical inference.

3.5 Location and Scale Families


Three types of families of interest:
1. Location Families
2. Scale Families
3. Location-Scale Families

Note:
1. Each of these families is constructed from a single pdf (or pmf)
known as the standard pdf (pmf) for the family;
2. All other pdfs (or pmfs) in the family are obtained by transforming
the standard pdf (or pmf) in a prescribed way.

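Theorem 3.5.1: Let $f(x)$ be any pdf and let $\mu$ and $\sigma > 0$ be any given constants. Then the function

$$g(x|\mu, \sigma) = \frac{1}{\sigma}\, f\left(\frac{x-\mu}{\sigma}\right)$$

is a valid pdf.

Proof. Since $f \ge 0$ and $\sigma > 0$,

$$g(x|\mu, \sigma) = \frac{1}{\sigma}\, f\left(\frac{x-\mu}{\sigma}\right) \ge 0.$$

With the substitution $y = \frac{x-\mu}{\sigma}$,

$$\int_{-\infty}^{\infty} g(x|\mu, \sigma)\, dx = \int_{-\infty}^{\infty} \frac{1}{\sigma}\, f\left(\frac{x-\mu}{\sigma}\right) dx = \int_{-\infty}^{\infty} f(y)\, dy = 1.$$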
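
The theorem can be illustrated numerically for one particular choice of standard pdf. In the sketch below the logistic density is an arbitrary stand-in for $f$, and the midpoint rule, the integration range and the values of `mu` and `sigma` are illustrative assumptions.

```python
import math

def standard_pdf(z):
    """Standard logistic pdf, an arbitrary choice of f for this illustration."""
    e = math.exp(-abs(z))  # the density is symmetric, so |z| avoids overflow
    return e / (1 + e) ** 2

def g(x, mu, sigma):
    """g(x | mu, sigma) = (1/sigma) * f((x - mu) / sigma)."""
    return standard_pdf((x - mu) / sigma) / sigma

def integrate(func, lo, hi, steps=200000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    dx = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * dx) for i in range(steps)) * dx

mu, sigma = -2.0, 3.0
total = integrate(lambda x: g(x, mu, sigma), mu - 60 * sigma, mu + 60 * sigma)
print(total)  # close to 1
```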



Definition 3.5.2: Let f (x) be any pdf. Then the family of pdfs f (x−µ),
indexed by the parameter µ (−∞ < µ < ∞), is called the location family
with standard pdf f (x) and µ is called the location parameter for the
family.

Remark:
1. The location parameter shifts the density to the left or to the right, but the shape remains unchanged.
2. If Z has a pdf f (z), then X = Z + µ has density f (x − µ).

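Example 3.5.3: (Exponential Location Family)

Let

$$f(x) = \begin{cases} e^{-x}, & x \ge 0 \\ 0, & x < 0. \end{cases}$$

To form a location family, we replace $x$ with $x - \mu$ to obtain

$$f(x|\mu) = \begin{cases} e^{-(x-\mu)}, & x - \mu \ge 0 \\ 0, & x - \mu < 0 \end{cases} = \begin{cases} e^{-(x-\mu)}, & x \ge \mu \\ 0, & x < \mu. \end{cases}$$

If we use the indicator function to express this, we have

$$f(x|\mu) = e^{-(x-\mu)}\, I_{[0,\infty)}(x - \mu) = e^{-(x-\mu)}\, I_{[\mu,\infty)}(x).$$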
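
The piecewise form and the indicator form can be written side by side in Python. This is a small sketch with hypothetical function names; the test points are arbitrary.

```python
import math

def standard_pdf(x):
    """Standard exponential pdf: e^{-x} for x >= 0, and 0 otherwise."""
    return math.exp(-x) if x >= 0 else 0.0

def location_pdf(x, mu):
    """Member of the location family: f(x | mu) = f(x - mu)."""
    return standard_pdf(x - mu)

def location_pdf_indicator(x, mu):
    """The same density written as e^{-(x - mu)} * I_[mu, inf)(x)."""
    return math.exp(-(x - mu)) * (1 if x >= mu else 0)

mu = 2.0
points = [1.9, 2.0, 3.0, 7.5]
values = [location_pdf(x, mu) for x in points]
values_indicator = [location_pdf_indicator(x, mu) for x in points]
print(values == values_indicator)
```

The density is zero to the left of $\mu$ and equals the standard density shifted right by $\mu$ elsewhere.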

Definition 3.5.4: Let $f(x)$ be any pdf. Then for any $\sigma > 0$, the family of pdfs $\frac{1}{\sigma} f\left(\frac{x}{\sigma}\right)$, indexed by the parameter $\sigma$, is called the scale family with standard pdf $f(x)$ and $\sigma$ is called the scale parameter of the family.

Remark: The effect of the scale parameter $\sigma$ is either to stretch or to contract the graph of $f(x)$ while maintaining the same basic shape.

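Example: (Normal Distribution)

$$f(x|\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{x^2}{2\sigma^2}\right), \quad -\infty < x < \infty, \ \sigma > 0,$$

where $\sigma$ is the scale parameter of the scale family with standard pdf

$$\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right), \quad -\infty < x < \infty.$$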

Definition 3.5.5: Let $f(x)$ be any pdf. Then for any $\mu$ ($-\infty < \mu < \infty$), and any $\sigma > 0$, the family of pdfs $\frac{1}{\sigma} f\left(\frac{x-\mu}{\sigma}\right)$, indexed by the parameter $(\mu, \sigma)$, is called the location-scale family with standard pdf $f(x)$; $\mu$ is called the location parameter and $\sigma$ is called the scale parameter.

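Example: (Normal and Double Exponential Distributions)

Normal:

$$f(x|\mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty, \ -\infty < \mu < \infty, \ \sigma > 0.$$

Double exponential:

$$f(x|\mu, \sigma) = \frac{1}{2\sigma} \exp\left(-\frac{|x-\mu|}{\sigma}\right), \quad -\infty < x < \infty, \ -\infty < \mu < \infty, \ \sigma > 0.$$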

Theorem 3.5.6: Let $f(\cdot)$ be any pdf. Let $\mu$ be any real number, and let $\sigma$ be any positive real number. Then $X$ is a random variable with pdf $\frac{1}{\sigma} f\left(\frac{x-\mu}{\sigma}\right)$ if and only if there exists a random variable $Z$ with pdf $f(z)$ and $X = \sigma Z + \mu$.

Proof. To prove the "if" part, define $g(z) = \sigma z + \mu$. Then $X = g(Z)$, $g$ is a monotone function,

$$g^{-1}(x) = \frac{x-\mu}{\sigma} \quad\text{and}\quad \left|\frac{d}{dx}\, g^{-1}(x)\right| = \frac{1}{\sigma}.$$

Thus by Theorem 2.1.5, the pdf of $X$ is

$$f_X(x) = f_Z\big(g^{-1}(x)\big) \left|\frac{d}{dx}\, g^{-1}(x)\right| = \frac{1}{\sigma}\, f_Z\left(\frac{x-\mu}{\sigma}\right).$$

The "only if" part is proved similarly: define $g(x) = (x - \mu)/\sigma$ and let $Z = g(X)$. $\square$

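Theorem 3.5.7: Let $Z$ be a random variable with pdf $f(z)$. Suppose $EZ$ and $\mathrm{Var}\,Z$ exist. If $X$ is a random variable with pdf $\frac{1}{\sigma} f\left(\frac{x-\mu}{\sigma}\right)$, then

$$EX = \sigma\, EZ + \mu \quad\text{and}\quad \mathrm{Var}\,X = \sigma^2\, \mathrm{Var}\,Z.$$

Proof. By Theorem 3.5.6, there is a random variable $Z^*$ with pdf $f(z)$ and $X = \sigma Z^* + \mu$. Since $Z^*$ has the same distribution as $Z$, $EX = \sigma\, EZ^* + \mu = \sigma\, EZ + \mu$ and $\mathrm{Var}\,X = \sigma^2\, \mathrm{Var}\,Z^* = \sigma^2\, \mathrm{Var}\,Z$. $\square$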
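
As a numerical illustration of Theorem 3.5.7, the sketch below takes the standard exponential pdf as $f$ (so $EZ = 1$ and $\mathrm{Var}\,Z = 1$), computes the mean and variance of the location-scale member by the midpoint rule, and compares them with $\sigma\,EZ + \mu$ and $\sigma^2\,\mathrm{Var}\,Z$. The choice of $f$, the helper `moments`, the truncation of the range at 50 scale units and the values of `mu` and `sigma` are all illustrative assumptions.

```python
import math

def f(z):
    """Standard exponential pdf (EZ = 1, Var Z = 1)."""
    return math.exp(-z) if z >= 0 else 0.0

def moments(pdf, lo, hi, steps=200000):
    """Midpoint-rule mean and variance of a density concentrated on [lo, hi]."""
    dx = (hi - lo) / steps
    xs = [lo + (i + 0.5) * dx for i in range(steps)]
    mean = sum(x * pdf(x) for x in xs) * dx
    second = sum(x * x * pdf(x) for x in xs) * dx
    return mean, second - mean ** 2

mu, sigma = 5.0, 2.0

def g(x):
    """Location-scale member (1/sigma) * f((x - mu) / sigma)."""
    return f((x - mu) / sigma) / sigma

ez, var_z = moments(f, 0.0, 50.0)
ex, var_x = moments(g, mu, mu + 50.0 * sigma)
print(ex, sigma * ez + mu)         # both close to 7
print(var_x, sigma ** 2 * var_z)   # both close to 4
```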
