Families of Distributions

The document discusses exponential families of distributions. Exponential families have probability density or mass functions that can be written in a standard exponential form involving functions of the data and parameters. This allows properties like expectations to be found via differentiation rather than integration. Examples of distributions that form exponential families include the binomial, Poisson, normal and gamma distributions. The key properties of exponential families are summarized, including a formula for computing expectations and variances of sufficient statistics in terms of the log partition and weight functions.


Lecture 9: Exponential and location-scale families

Families of Distributions
In statistics we are interested in families of distributions, i.e.,
collections of distributions indexed by one or more parameters.
Examples are the family of binomial(n, p) distributions with p ∈ (0, 1) and
n fixed, and the family of normal N(µ, σ²) distributions with µ ∈ R and σ > 0.
Exponential families
A family of pdfs or pmfs indexed by θ is called an exponential family iff
it can be expressed as
$$
f_\theta(x) = h(x)\,c(\theta)\exp\left( \sum_{i=1}^{k} w_i(\theta)\, t_i(x) \right), \qquad \theta \in \Theta,
$$
where $\exp(x) = e^x$, Θ is the set of all values of θ (the parameter space),
$h(x) \ge 0$ and $t_1(x), \dots, t_k(x)$ are functions of x (not depending on θ),
and $c(\theta) > 0$ and $w_1(\theta), \dots, w_k(\theta)$ are functions of the possibly
vector-valued θ (not depending on x).

Note that the expression for f may not be unique.


UW-Madison (Statistics) Stat 609 Lecture 9 2015 1 / 19
Example 3.4.1.
To show that a family of pdf’s or pmf’s is an exponential family, we must
identify the functions $h(x)$, $t_i(x)$, $c(\theta)$, and $w_i(\theta)$ and show that the pdf
or pmf has the given form.
The binomial(n, p) distribution with p ∈ (0, 1) and a fixed n has pmf
$$
\binom{n}{x} p^x (1-p)^{n-x} = \binom{n}{x} (1-p)^n \exp\left( x \log\frac{p}{1-p} \right), \qquad x = 0, 1, \dots, n.
$$
Let θ = p, $c(\theta) = (1-p)^n$, $w_1(\theta) = \log\frac{p}{1-p}$, $t_1(x) = x$, and $h(x) = \binom{n}{x}$
for x = 0, 1, ..., n and $h(x) = 0$ otherwise.


Then, the binomial family with p ∈ (0, 1) and a fixed n is an exponential
family (k = 1).
(Note that p = 0 and p = 1 are not included in the family.)
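As a quick numerical check of the factorization above, here is a minimal Python sketch (the function names `binom_pmf` and `binom_expfam` are illustrative, not from the lecture). It evaluates the binomial pmf directly and in the form $h(x)c(\theta)\exp(w_1(\theta)t_1(x))$ with the choices of Example 3.4.1, for the arbitrary values n = 10 and p = 0.3.

```python
import math


def binom_pmf(x: int, n: int, p: float) -> float:
    """binomial(n, p) pmf written directly: C(n, x) p^x (1 - p)^(n - x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_expfam(x: int, n: int, p: float) -> float:
    """The same pmf in exponential-family form h(x) c(p) exp(w1(p) t1(x))."""
    h = math.comb(n, x) if 0 <= x <= n else 0  # h(x) = C(n, x) on {0, ..., n}, else 0
    c = (1 - p) ** n                           # c(p) = (1 - p)^n
    w1 = math.log(p / (1 - p))                 # w1(p) = log(p / (1 - p)); needs 0 < p < 1
    t1 = x                                     # t1(x) = x
    return h * c * math.exp(w1 * t1)


n, p = 10, 0.3
for x in range(n + 1):
    # The two expressions agree up to floating-point rounding.
    assert math.isclose(binom_pmf(x, n, p), binom_expfam(x, n, p), rel_tol=1e-12)
```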
Other examples: Poisson, negative binomial, normal, gamma, beta,...

Exponential families have many nice properties.


The following result is useful because it lets us replace integration or
summation by differentiation.
Theorem 3.4.2.
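If X has a pdf or pmf from an exponential family and the $w_i(\theta)$’s are
differentiable functions, then
$$
E\left( \sum_{i=1}^{k} \frac{\partial w_i(\theta)}{\partial \theta_j}\, t_i(X) \right) = -\frac{\partial \log c(\theta)}{\partial \theta_j},
$$
where $\theta_j$ is the jth component of θ, and
$$
\mathrm{Var}\left( \sum_{i=1}^{k} \frac{\partial w_i(\theta)}{\partial \theta_j}\, t_i(X) \right) = -\frac{\partial^2 \log c(\theta)}{\partial \theta_j^2} - E\left( \sum_{i=1}^{k} \frac{\partial^2 w_i(\theta)}{\partial \theta_j^2}\, t_i(X) \right).
$$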

Proof.
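From the exponential family expression for $f_\theta(x)$,
$$
\log f_\theta(X) = \log h(X) + \log c(\theta) + \sum_{i=1}^{k} w_i(\theta)\, t_i(X).
$$
Differentiating this expression with respect to $\theta_j$ leads to
$$
\frac{\partial \log f_\theta(X)}{\partial \theta_j} = \frac{\partial \log c(\theta)}{\partial \theta_j} + \sum_{i=1}^{k} \frac{\partial w_i(\theta)}{\partial \theta_j}\, t_i(X).
$$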
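Taking expectations, we obtain
$$
E\left( \frac{\partial \log f_\theta(X)}{\partial \theta_j} \right) = \frac{\partial \log c(\theta)}{\partial \theta_j} + E\left( \sum_{i=1}^{k} \frac{\partial w_i(\theta)}{\partial \theta_j}\, t_i(X) \right).
$$
If $f_\theta(x)$ is a pdf (the proof for a pmf is similar), then the left side of the
previous expression is
$$
\int_{-\infty}^{\infty} \frac{\partial \log f_\theta(x)}{\partial \theta_j}\, f_\theta(x)\,dx = \int_{-\infty}^{\infty} \frac{\partial f_\theta(x)}{\partial \theta_j}\,dx = \frac{\partial}{\partial \theta_j} \int_{-\infty}^{\infty} f_\theta(x)\,dx = \frac{\partial 1}{\partial \theta_j} = 0.
$$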
We interchanged the differentiation and integration, which is justified
under the exponential family assumption.
This proves the first result.
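Note that
$$
\frac{\partial^2 \log f_\theta(X)}{\partial \theta_j^2} = \frac{\partial}{\partial \theta_j} \left( \frac{\partial f_\theta(X)/\partial \theta_j}{f_\theta(X)} \right) = \frac{\partial^2 f_\theta(X)/\partial \theta_j^2}{f_\theta(X)} - \left( \frac{\partial f_\theta(X)/\partial \theta_j}{f_\theta(X)} \right)^2.
$$
Then
$$
\begin{aligned}
E\left( \frac{\partial^2 \log f_\theta(X)}{\partial \theta_j^2} \right)
&= \int_{-\infty}^{\infty} \left[ \frac{\partial^2 f_\theta(x)/\partial \theta_j^2}{f_\theta(x)} - \left( \frac{\partial f_\theta(x)/\partial \theta_j}{f_\theta(x)} \right)^2 \right] f_\theta(x)\,dx \\
&= \int_{-\infty}^{\infty} \frac{\partial^2 f_\theta(x)}{\partial \theta_j^2}\,dx - \int_{-\infty}^{\infty} \left( \frac{\partial \log f_\theta(x)}{\partial \theta_j} \right)^2 f_\theta(x)\,dx \\
&= -\int_{-\infty}^{\infty} \left[ \frac{\partial \log c(\theta)}{\partial \theta_j} + \sum_{i=1}^{k} \frac{\partial w_i(\theta)}{\partial \theta_j}\, t_i(x) \right]^2 f_\theta(x)\,dx \\
&= -\mathrm{Var}\left( \sum_{i=1}^{k} \frac{\partial w_i(\theta)}{\partial \theta_j}\, t_i(X) \right).
\end{aligned}
$$
The first integral in the second line is 0 by the same interchange of differentiation and integration as before (it equals $\partial^2 1/\partial \theta_j^2$), and the last equality follows from the first result: the bracketed term is $\sum_{i} \frac{\partial w_i(\theta)}{\partial \theta_j} t_i(X)$ minus its expectation.
Then the second result follows by taking expectations in
$$
\frac{\partial^2 \log f_\theta(X)}{\partial \theta_j^2} = \frac{\partial^2 \log c(\theta)}{\partial \theta_j^2} + \sum_{i=1}^{k} \frac{\partial^2 w_i(\theta)}{\partial \theta_j^2}\, t_i(X).
$$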
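Both identities of Theorem 3.4.2 can be spot-checked on a concrete family. The Python sketch below assumes the binomial representation of Example 3.4.1 with θ = p, so that $w_1'(p) = 1/(p(1-p))$, $w_1''(p) = -1/p^2 + 1/(1-p)^2$, $(\log c)'(p) = -n/(1-p)$ and $(\log c)''(p) = -n/(1-p)^2$ (worked out by hand), and compares both sides using exact sums over x = 0, ..., n. The function name is hypothetical.

```python
import math


def theorem_342_binomial(n: int, p: float):
    """Both sides of Theorem 3.4.2 for binomial(n, p) with theta = p.

    Uses c(p) = (1 - p)^n, w1(p) = log(p / (1 - p)), t1(x) = x (Example 3.4.1).
    Returns ((E_lhs, E_rhs), (Var_lhs, Var_rhs)).
    """
    pmf = [math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    dw = 1 / (p * (1 - p))               # w1'(p)
    d2w = -1 / p**2 + 1 / (1 - p) ** 2   # w1''(p)
    dlogc = -n / (1 - p)                 # d/dp log c(p)
    d2logc = -n / (1 - p) ** 2           # d^2/dp^2 log c(p)

    # Exact moments of S = w1'(p) * X, summing over the support {0, ..., n}.
    mean_x = math.fsum(x * q for x, q in enumerate(pmf))
    mean_s = dw * mean_x
    var_s = math.fsum((dw * x - mean_s) ** 2 * q for x, q in enumerate(pmf))

    first = (mean_s, -dlogc)                  # E(S) = -(log c)'
    second = (var_s, -d2logc - d2w * mean_x)  # Var(S) = -(log c)'' - E(w1'' X)
    return first, second


first, second = theorem_342_binomial(n=12, p=0.35)
print(first)   # the two entries should agree
print(second)  # the two entries should agree
```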



Example 3.4.4.
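If X ∼ N(µ, σ²), then θ = (µ, σ) and
$$
f_\theta(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{\mu^2}{2\sigma^2} \right) \exp\left( \frac{\mu x}{\sigma^2} - \frac{x^2}{2\sigma^2} \right).
$$
Let $h(x) = 1$, $c(\theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{\mu^2}{2\sigma^2} \right)$, $w_1(\theta) = 1/\sigma^2$, $w_2(\theta) = \mu/\sigma^2$,
$t_1(x) = -x^2/2$, and $t_2(x) = x$.
Then this normal family is an exponential family with k = 2.
Applying Theorem 3.4.2 with $\theta_j = \mu$, we obtain E(X) = µ from the equation
$$
-\frac{\partial \log c(\theta)}{\partial \mu} = \frac{\mu}{\sigma^2} = E\left( \sum_{i=1}^{2} \frac{\partial w_i(\theta)}{\partial \mu}\, t_i(X) \right) = E\left( \frac{X}{\sigma^2} \right).
$$
Also, with $\theta_j = \sigma$,
$$
-\frac{\partial \log c(\theta)}{\partial \sigma} = \frac{1}{\sigma} - \frac{\mu^2}{\sigma^3} = E\left( \sum_{i=1}^{2} \frac{\partial w_i(\theta)}{\partial \sigma}\, t_i(X) \right) = E\left( \frac{X^2}{\sigma^3} - \frac{2\mu X}{\sigma^3} \right).
$$
Using E(X) = µ, we obtain from this equation that Var(X) = σ².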
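For reference, the intermediate derivatives used in Example 3.4.4 are spelled out below, with the same h, c, $w_i$ and $t_i$; the last line is the step from E(X) = µ to Var(X) = σ².

```latex
\begin{aligned}
\log c(\theta) &= -\tfrac{1}{2}\log(2\pi) - \log\sigma - \frac{\mu^2}{2\sigma^2}, \\
\frac{\partial w_1(\theta)}{\partial \mu} &= 0, \qquad
\frac{\partial w_2(\theta)}{\partial \mu} = \frac{1}{\sigma^2}, \qquad
\frac{\partial w_1(\theta)}{\partial \sigma} = -\frac{2}{\sigma^3}, \qquad
\frac{\partial w_2(\theta)}{\partial \sigma} = -\frac{2\mu}{\sigma^3}, \\
\sum_{i=1}^{2} \frac{\partial w_i(\theta)}{\partial \sigma}\, t_i(X)
&= \left(-\frac{2}{\sigma^3}\right)\left(-\frac{X^2}{2}\right) + \left(-\frac{2\mu}{\sigma^3}\right) X
 = \frac{X^2}{\sigma^3} - \frac{2\mu X}{\sigma^3}, \\
\frac{1}{\sigma} - \frac{\mu^2}{\sigma^3} &= \frac{E(X^2) - 2\mu^2}{\sigma^3}
\;\Longrightarrow\; E(X^2) = \sigma^2 + \mu^2,
\qquad \mathrm{Var}(X) = E(X^2) - \mu^2 = \sigma^2.
\end{aligned}
```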



Beta distribution beta(α, β )
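For constants α > 0 and β > 0, the beta(α, β) distribution has pdf
$$
f(x) = \begin{cases} \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1} & 0 < x < 1 \\ 0 & \text{otherwise} \end{cases}
$$
This is a pdf because
$$
\int_0^1 x^{\alpha-1}(1-x)^{\beta-1}\,dx = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}.
$$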



Since Γ(2) = Γ(1) = 1, beta(1, 1) is the same as uniform(0, 1).
If X ∼ beta(α, β ), then 1 − X ∼ beta(β , α).
The pdf of beta(α, β ) can be increasing (α > 1, β = 1), decreasing
(α = 1, β > 1), U-shaped (α < 1, β < 1), or unimodal (α > 1, β > 1).
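If α = β, then the pdf of beta(α, β) is symmetric about 1/2.
For any r > 0, if X ∼ beta(α, β), then
$$
E(X^r) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \int_0^1 x^{r+\alpha-1}(1-x)^{\beta-1}\,dx = \frac{\Gamma(\alpha+\beta)\,\Gamma(r+\alpha)}{\Gamma(\alpha)\,\Gamma(r+\alpha+\beta)}.
$$
In particular,
$$
E(X) = \frac{\alpha}{\alpha+\beta}, \qquad E(X^2) = \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)}.
$$
Then
$$
\mathrm{Var}(X) = \frac{\alpha(\alpha+1)}{(\alpha+\beta)(\alpha+\beta+1)} - \frac{\alpha^2}{(\alpha+\beta)^2} = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.
$$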
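These moment formulas can be checked numerically. The following Python sketch (helper names are illustrative) evaluates $E(X^r)$ from the Gamma-function formula via `math.lgamma`, compares E(X) and Var(X) with the closed forms above, and approximates the defining integral with a midpoint rule. The quadrature assumes α, β ≥ 1 so that the integrand stays bounded on (0, 1); α = 2.5 and β = 4 are arbitrary choices.

```python
import math


def beta_raw_moment(r: float, a: float, b: float) -> float:
    """E(X^r) for X ~ beta(a, b): Gamma(a+b) Gamma(r+a) / (Gamma(a) Gamma(r+a+b)).

    Evaluated on the log scale with math.lgamma so large a, b do not overflow.
    """
    return math.exp(math.lgamma(a + b) + math.lgamma(r + a)
                    - math.lgamma(a) - math.lgamma(r + a + b))


def beta_moment_numeric(r: float, a: float, b: float, m: int = 100_000) -> float:
    """Midpoint-rule approximation of the defining integral of E(X^r).

    Assumes a >= 1 and b >= 1 so the integrand is bounded on (0, 1).
    """
    const = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))
    h = 1.0 / m
    xs = ((i + 0.5) * h for i in range(m))
    return const * h * math.fsum(x ** (r + a - 1) * (1 - x) ** (b - 1) for x in xs)


a, b = 2.5, 4.0
mean = beta_raw_moment(1, a, b)
var = beta_raw_moment(2, a, b) - mean**2
print(mean, a / (a + b))                                  # closed-form E(X)
print(var, a * b / ((a + b) ** 2 * (a + b + 1)))          # closed-form Var(X)
print(beta_moment_numeric(0.5, a, b), beta_raw_moment(0.5, a, b))
```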
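The family of beta(α, β) distributions, α > 0, β > 0, is an exponential family: take θ = (α, β), $c(\theta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}$, $h(x) = x^{-1}(1-x)^{-1}$ for 0 < x < 1 and h(x) = 0 otherwise, $w_1(\theta) = \alpha$, $t_1(x) = \log x$, $w_2(\theta) = \beta$, and $t_2(x) = \log(1-x)$.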


Natural exponential family
If $\eta_i = w_i(\theta)$, i = 1, ..., k, and $\eta = (\eta_1, \dots, \eta_k)$, the form of $f_\theta$ in the
exponential family becomes
$$
f^*_\eta(x) = h(x)\,c^*(\eta) \exp\left( \sum_{i=1}^{k} \eta_i\, t_i(x) \right).
$$
η is called the natural parameter.
The set of η’s for which $f^*_\eta(x)$ is a well-defined pdf is called the natural
parameter space.

Full or curved exponential families


In an exponential family, if the dimension of θ equals k and Θ contains an open set of $\mathbb{R}^k$, then the family is a full exponential family. Otherwise the family is a curved exponential family.

An example of a full exponential family is N(µ, σ²), µ ∈ R, σ > 0 (here θ = (µ, σ) and k = 2).

An example of a curved exponential family is N(µ, µ²), µ ∈ R, µ ≠ 0 (here k = 2, but θ = µ is one-dimensional).
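To make the natural parameter concrete, the Python sketch below (function names hypothetical) maps N(µ, σ²) to $\eta = (\eta_1, \eta_2) = (1/\sigma^2, \mu/\sigma^2)$, using the $w_i$ of Example 3.4.4, and back. For the curved family N(µ, µ²), µ ≠ 0, the same map gives $\eta = (1/\mu^2, 1/\mu)$, so η stays on the curve $\eta_1 = \eta_2^2$ instead of filling an open subset of the plane.

```python
import math


def normal_natural_params(mu: float, sigma: float) -> tuple[float, float]:
    """eta = (w1, w2) = (1/sigma^2, mu/sigma^2) for N(mu, sigma^2),
    paired with t1(x) = -x^2/2 and t2(x) = x as in Example 3.4.4."""
    return 1 / sigma**2, mu / sigma**2


def normal_from_natural(eta1: float, eta2: float) -> tuple[float, float]:
    """Inverse map back to (mu, sigma); needs eta1 > 0 so that the density integrates."""
    if eta1 <= 0:
        raise ValueError("eta1 must be positive")
    return eta2 / eta1, 1 / math.sqrt(eta1)


# Full family: (mu, sigma) ranges over R x (0, inf), and eta fills the open set eta1 > 0.
print(normal_from_natural(*normal_natural_params(1.5, 2.0)))  # recovers (mu, sigma)

# Curved family N(mu, mu^2), mu != 0: eta = (1/mu^2, 1/mu), so every point satisfies
# eta1 == eta2**2, i.e. eta stays on a one-dimensional curve in the (eta1, eta2) plane.
for mu in (-2.0, -0.5, 0.7, 3.0):
    eta1, eta2 = normal_natural_params(mu, abs(mu))
    assert math.isclose(eta1, eta2**2)
```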



How to show a family is not an exponential family
It may be difficult to show that a family is not an exponential family.
We cannot argue merely that “we are not able to express $f_\theta(x)$ in the form of an
exponential family”.
If $\{f_\theta, \theta \in \Theta\}$ is an exponential family, then, since $c(\theta) > 0$ and the exponential factor is positive,
$$
\{x : f_\theta(x) > 0\} = \{x : h(x) > 0\},
$$
which does not depend on θ.
This fact can be used to show that a family is not exponential: if
$\{x : f_\theta(x) > 0\}$ depends on θ, then $\{f_\theta, \theta \in \Theta\}$ is not an exponential
family.
Consider the family of two-parameter exponential distributions with
pdf’s
$$
f_\theta(x) = \begin{cases} \lambda^{-1} e^{-(x-\mu)/\lambda} & x > \mu \\ 0 & x \le \mu \end{cases}, \qquad \theta = (\mu, \lambda),\ \mu \in \mathbb{R},\ \lambda > 0.
$$
It is not an exponential family because
$$
\{x : f_\theta(x) > 0\} = \{x : x > \mu\},
$$
which depends on µ.
Definition 3.5.2 (location family)
Let f (x) be a given pdf. The family of pdf’s, f (x − µ), µ ∈ R, is called a
location family with location parameter µ.

Examples of location families are normal and Cauchy with location parameter µ ∈ R and the other parameter σ fixed.
Other examples are given later.
The pdf f (x − µ) is obtained by shifting the entire curve f (x) by an
amount µ (see the figure) without changing the structure of f (x).
It can be shown that X ∼ f (x − µ) iff X = Z + µ with Z ∼ f (x).



Definition 3.5.4 (scale family)
Let f(x) be a given pdf. The family of pdf’s $\sigma^{-1} f(x/\sigma)$, σ > 0, is called
a scale family with scale parameter σ.
Examples of scale families are normal and Cauchy with scale
parameter σ > 0 and µ = 0, and gamma(α, β) with β > 0 and α fixed.
Other examples are given later.
The pdf $\sigma^{-1} f(x/\sigma)$ is obtained by stretching (σ > 1) or contracting
(σ < 1) the curve f(x) while still maintaining the same shape.
It can be shown that $X \sim \sigma^{-1} f(x/\sigma)$ iff $X = \sigma Z$ with $Z \sim f(x)$.



Definition 3.5.5 (location-scale family)
Let f(x) be a given pdf. The family of pdf’s $\sigma^{-1} f((x-\mu)/\sigma)$, µ ∈ R,
σ > 0, is called a location-scale family with location parameter µ and
scale parameter σ.

A location-scale family is a combination of a location family and a
scale family: it contains a sub-family that is a location family for
any fixed σ, and, for µ = 0, a sub-family that is a scale family (for any
other fixed µ, X − µ follows a scale family).



Examples of location-scale families are normal, double
exponential, Cauchy, logistic, and two-parameter exponential
distributions with location parameter µ ∈ R and scale parameter
σ > 0. Except for the two-parameter exponential distribution, all
others are symmetric about µ.
If f(x) is symmetric about 0, then $\sigma^{-1} f((x-\mu)/\sigma)$ is symmetric
about µ and µ is the median of $X \sim \sigma^{-1} f((x-\mu)/\sigma)$; furthermore,
if the expectation of f(x) exists, then µ is the expectation of
$\sigma^{-1} f((x-\mu)/\sigma)$.
It can be shown that $X \sim \sigma^{-1} f((x-\mu)/\sigma)$ iff $X = \sigma Z + \mu$ with
$Z \sim f(x)$; furthermore, if $E(Z^2) < \infty$, then $E(X) = \sigma E(Z) + \mu$ and
$\mathrm{Var}(X) = \sigma^2 \mathrm{Var}(Z)$.
The pdf f(x) in a location-scale family is standard iff the
expectation $\int_{-\infty}^{\infty} x f(x)\,dx = 0$ and the variance $\int_{-\infty}^{\infty} x^2 f(x)\,dx = 1$.
Typically, we choose a standard f(x) to generate a location-scale
family, in which case µ and σ² are the expectation and variance of
$\sigma^{-1} f((x-\mu)/\sigma)$, respectively.
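A minimal numerical sketch of the last two points, assuming the standard normal pdf (expectation 0, variance 1) as the generating f: it builds $\sigma^{-1}f((x-\mu)/\sigma)$ for µ = 3, σ = 2 and checks with a midpoint rule that the result has total mass 1, expectation µ and variance σ². The helper names and the truncation to µ ± 20σ are assumptions of the sketch.

```python
import math
from typing import Callable

Pdf = Callable[[float], float]


def location_scale(f: Pdf, mu: float, sigma: float) -> Pdf:
    """Member sigma^{-1} f((x - mu) / sigma) of the location-scale family generated by f."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return lambda x: f((x - mu) / sigma) / sigma


def mass_mean_var(pdf: Pdf, lo: float, hi: float, m: int = 200_000) -> tuple[float, float, float]:
    """Midpoint-rule approximations of total mass, mean and variance on [lo, hi]."""
    h = (hi - lo) / m
    xs = [lo + (i + 0.5) * h for i in range(m)]
    ws = [pdf(x) * h for x in xs]  # probability weight of each small cell
    mass = math.fsum(ws)
    mean = math.fsum(x * w for x, w in zip(xs, ws))
    var = math.fsum((x - mean) ** 2 * w for x, w in zip(xs, ws))
    return mass, mean, var


def std_normal(z: float) -> float:
    """Standard normal pdf: expectation 0, variance 1, i.e. a 'standard' f."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


mu, sigma = 3.0, 2.0
g = location_scale(std_normal, mu, sigma)
mass, mean, var = mass_mean_var(g, mu - 20 * sigma, mu + 20 * sigma)
print(mass, mean, var)  # approximately 1, mu, sigma**2
```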



Two parameter exponential distribution exponential(µ, β )
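If X ∼ exponential(β) and µ ∈ R is a constant, then the distribution of
Y = X + µ is called the two-parameter exponential distribution and is
denoted by exponential(µ, β).
Its pdf and cdf are (by transformation)
$$
f(x) = \begin{cases} \frac{1}{\beta} e^{-(x-\mu)/\beta} & x \ge \mu \\ 0 & x < \mu \end{cases}
\qquad
F(x) = \begin{cases} 1 - e^{-(x-\mu)/\beta} & x \ge \mu \\ 0 & x < \mu \end{cases}
$$
and, if Y ∼ exponential(µ, β),
$$
E(Y) = \mu + \beta, \quad \mathrm{Var}(Y) = \beta^2, \quad M_Y(t) = \frac{e^{\mu t}}{1 - \beta t},\ t < \frac{1}{\beta}, \quad \phi_Y(t) = \frac{e^{\imath \mu t}}{1 - \imath \beta t},\ t \in \mathbb{R}.
$$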
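The moment formulas can be spot-checked by numerical integration. The Python sketch below assumes µ = 1, β = 2 and t = 0.2 (any t < 1/β works) and approximates E(Y), Var(Y) and $M_Y(t)$ with a midpoint rule on a truncated range; the helper names are hypothetical.

```python
import math


def exp2_pdf(y: float, mu: float, beta: float) -> float:
    """exponential(mu, beta) pdf: beta^{-1} exp(-(y - mu) / beta) for y >= mu, else 0."""
    return math.exp(-(y - mu) / beta) / beta if y >= mu else 0.0


def expect(g, mu: float, beta: float, m: int = 200_000, width: float = 60.0) -> float:
    """Midpoint-rule E[g(Y)] over [mu, mu + width * beta]; the neglected tail mass is e^-60."""
    h = width * beta / m
    ys = (mu + (i + 0.5) * h for i in range(m))
    return h * math.fsum(g(y) * exp2_pdf(y, mu, beta) for y in ys)


mu, beta, t = 1.0, 2.0, 0.2  # the mgf needs t < 1/beta = 0.5
mean = expect(lambda y: y, mu, beta)
var = expect(lambda y: (y - mean) ** 2, mu, beta)
mgf = expect(lambda y: math.exp(t * y), mu, beta)
print(mean, mu + beta)                          # E(Y) = mu + beta
print(var, beta**2)                             # Var(Y) = beta^2
print(mgf, math.exp(mu * t) / (1 - beta * t))   # M_Y(t)
```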
Double exponential distribution double-exponential(µ, σ )
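By reflecting the pdf of exponential(µ, σ) about µ and halving it, we obtain the
double-exponential(µ, σ) pdf, which is symmetric about µ:
$$
f(x) = \begin{cases} \frac{1}{2\sigma} e^{-(x-\mu)/\sigma} & x \ge \mu \\ \frac{1}{2\sigma} e^{(x-\mu)/\sigma} & x < \mu \end{cases} = \frac{1}{2\sigma} e^{-|x-\mu|/\sigma}, \qquad x \in \mathbb{R}.
$$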

This pdf is not bell-shaped; in fact, it has a peak (a
non-differentiable point) at x = µ.
Its cdf is
$$
F(x) = \begin{cases} 1 - \frac{1}{2} e^{-(x-\mu)/\sigma} & x \ge \mu \\ \frac{1}{2} e^{(x-\mu)/\sigma} & x < \mu \end{cases}
$$
If X ∼ double-exponential(µ, σ), then
Z = (X − µ)/σ ∼ double-exponential(0, 1).
If Z ∼ double-exponential(0, 1), then
$$
E(Z) = \frac{1}{2} \int_{-\infty}^{\infty} x e^{-|x|}\,dx = 0
$$
because $x e^{-|x|}$ is an integrable odd function, and
$$
\mathrm{Var}(Z) = E(Z^2) = \frac{1}{2} \int_{-\infty}^{\infty} x^2 e^{-|x|}\,dx = \int_0^{\infty} x^2 e^{-x}\,dx = \Gamma(3) = 2.
$$
Hence, writing X = σZ + µ,
$$
E(X) = E(\sigma Z + \mu) = \mu, \qquad \mathrm{Var}(X) = \mathrm{Var}(\sigma Z + \mu) = \sigma^2 \mathrm{Var}(Z) = 2\sigma^2.
$$
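A short numerical sanity check of these results, as a Python sketch with illustrative helper names and the arbitrary values µ = −1, σ = 1.5: it integrates the pdf with a midpoint rule to approximate E(X), Var(X) and F(0), and compares them with µ, 2σ² and the closed-form cdf.

```python
import math


def dexp_pdf(x: float, mu: float, sigma: float) -> float:
    """double-exponential(mu, sigma) pdf: (2 sigma)^{-1} exp(-|x - mu| / sigma)."""
    return math.exp(-abs(x - mu) / sigma) / (2 * sigma)


def dexp_cdf(x: float, mu: float, sigma: float) -> float:
    """Closed-form cdf from the slide."""
    if x >= mu:
        return 1 - 0.5 * math.exp(-(x - mu) / sigma)
    return 0.5 * math.exp((x - mu) / sigma)


def integrate(g, lo: float, hi: float, m: int = 200_000) -> float:
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / m
    return h * math.fsum(g(lo + (i + 0.5) * h) for i in range(m))


mu, sigma = -1.0, 1.5
lo, hi = mu - 50 * sigma, mu + 50 * sigma  # mass outside this range is e^-50
mean = integrate(lambda x: x * dexp_pdf(x, mu, sigma), lo, hi)
var = integrate(lambda x: (x - mu) ** 2 * dexp_pdf(x, mu, sigma), lo, hi)
cdf_at_0 = integrate(lambda x: dexp_pdf(x, mu, sigma), lo, 0.0)
print(mean, var, 2 * sigma**2)              # mean close to mu, var close to 2 sigma^2
print(cdf_at_0, dexp_cdf(0.0, mu, sigma))   # numerical vs closed-form F(0)
```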
Logistic distribution logistic(µ, σ )
For constants µ ∈ R and σ > 0, the logistic(µ, σ) distribution has pdf
$$
f(x) = \frac{e^{-(x-\mu)/\sigma}}{\sigma \left[ 1 + e^{-(x-\mu)/\sigma} \right]^2}, \qquad x \in \mathbb{R}.
$$
This pdf is again bell-shaped and symmetric about µ.
The cdf of logistic(µ, σ) has a closed form:
$$
F(x) = \int_{-\infty}^{x} f(t)\,dt = \frac{1}{1 + e^{-(x-\mu)/\sigma}}, \qquad x \in \mathbb{R}.
$$
By symmetry, E(X) = µ if X ∼ logistic(µ, σ).
The variance of X ∼ logistic(µ, σ) is not easy to obtain, but we
give the result here: Var(X) = σ²π²/3.
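Since the variance is only quoted here, a numerical check may be reassuring. The Python sketch below (helper names hypothetical; µ = 0.5, σ = 1.3 and x₀ = 1.2 are arbitrary) integrates the pdf with a midpoint rule, compares the result with σ²π²/3, and also checks the closed-form cdf at x₀.

```python
import math


def logistic_pdf(x: float, mu: float, sigma: float) -> float:
    """logistic(mu, sigma) pdf; written with |x - mu| (valid by symmetry) to avoid overflow."""
    z = math.exp(-abs(x - mu) / sigma)
    return z / (sigma * (1 + z) ** 2)


def logistic_cdf(x: float, mu: float, sigma: float) -> float:
    """Closed-form cdf 1 / (1 + exp(-(x - mu)/sigma)); fine for moderate (x - mu)/sigma."""
    return 1 / (1 + math.exp(-(x - mu) / sigma))


def integrate(g, lo: float, hi: float, m: int = 200_000) -> float:
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / m
    return h * math.fsum(g(lo + (i + 0.5) * h) for i in range(m))


mu, sigma, x0 = 0.5, 1.3, 1.2
lo, hi = mu - 60 * sigma, mu + 60 * sigma  # tails beyond 60 sigma are negligible
var = integrate(lambda x: (x - mu) ** 2 * logistic_pdf(x, mu, sigma), lo, hi)
cdf_num = integrate(lambda x: logistic_pdf(x, mu, sigma), lo, x0)
print(var, sigma**2 * math.pi**2 / 3)         # numerical vs quoted Var(X)
print(cdf_num, logistic_cdf(x0, mu, sigma))   # numerical vs closed-form F(x0)
```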
Pareto distribution pareto(α, β )
For constants α > 0 and β > 0, the pareto(α, β) distribution has pdf
$$
f(x) = \begin{cases} \alpha \beta^\alpha x^{-(\alpha+1)} & x > \beta \\ 0 & x \le \beta \end{cases}
$$
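First, f is indeed a pdf, because
$$
\int_{-\infty}^{\infty} f(x)\,dx = \alpha \beta^\alpha \int_\beta^\infty x^{-(\alpha+1)}\,dx = \beta^\alpha \left[ -x^{-\alpha} \right]_\beta^\infty = \beta^\alpha \beta^{-\alpha} = 1.
$$
Using a similar argument, we can obtain the cdf of pareto(α, β) as
$$
F(x) = \begin{cases} 1 - \left( \frac{\beta}{x} \right)^\alpha & x > \beta \\ 0 & x \le \beta \end{cases}
$$
Since the integral $\int_\beta^\infty x^{-t}\,dx$ is finite iff t > 1, E(X) = ∞ if α ≤ 1
when X ∼ pareto(α, β); if α > 1, then
$$
E(X) = \alpha \beta^\alpha \int_\beta^\infty x^{-\alpha}\,dx = \frac{\alpha \beta^\alpha}{\alpha - 1} \left[ -x^{-(\alpha-1)} \right]_\beta^\infty = \frac{\alpha \beta^\alpha}{\alpha - 1}\, \beta^{-(\alpha-1)} = \frac{\alpha \beta}{\alpha - 1}.
$$
Similarly, Var(X) = ∞ if α ≤ 2; and if α > 2,
$$
E(X^2) = \alpha \beta^\alpha \int_\beta^\infty x^{-\alpha+1}\,dx = \frac{\alpha \beta^\alpha}{\alpha - 2} \left[ -x^{-\alpha+2} \right]_\beta^\infty = \frac{\alpha \beta^\alpha}{\alpha - 2}\, \beta^{-\alpha+2} = \frac{\alpha \beta^2}{\alpha - 2},
$$
$$
\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \frac{\alpha \beta^2}{\alpha - 2} - \frac{\alpha^2 \beta^2}{(\alpha - 1)^2} = \frac{\alpha \beta^2}{(\alpha - 1)^2 (\alpha - 2)}.
$$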
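A numerical check of these formulas has to cope with the heavy right tail. The Python sketch below (helper names hypothetical) substitutes $x = \beta e^s$, so that $dx = x\,ds$ and $x^r f(x)\,dx = \alpha\beta^r e^{-(\alpha-r)s}\,ds$ on $(0, \infty)$, which decays exponentially exactly when α > r, and then applies a midpoint rule in s. The values α = 3.5 and β = 2 are arbitrary choices with α > 2.

```python
import math


def pareto_pdf(x: float, alpha: float, beta: float) -> float:
    """pareto(alpha, beta) pdf: alpha beta^alpha x^-(alpha+1) for x > beta, else 0."""
    return alpha * beta**alpha * x ** (-(alpha + 1)) if x > beta else 0.0


def pareto_moment_numeric(r: int, alpha: float, beta: float,
                          m: int = 200_000, s_max: float = 60.0) -> float:
    """E(X^r) via the substitution x = beta * e^s (so dx = x ds) and a midpoint rule in s.

    After the substitution the integrand is alpha beta^r e^{-(alpha - r) s}, which is
    integrable iff alpha > r; for alpha <= r we return inf, matching the slide.
    """
    rate = alpha - r
    if rate <= 0:
        return math.inf
    h = s_max / rate / m  # integrate s over [0, s_max / rate], where e^{-rate s} ~ e^-60
    xs = (beta * math.exp((i + 0.5) * h) for i in range(m))
    return h * math.fsum(x**r * pareto_pdf(x, alpha, beta) * x for x in xs)  # x^r f(x) dx/ds


alpha, beta = 3.5, 2.0
mean = pareto_moment_numeric(1, alpha, beta)
var = pareto_moment_numeric(2, alpha, beta) - mean**2
print(mean, alpha * beta / (alpha - 1))                          # E(X)
print(var, alpha * beta**2 / ((alpha - 1) ** 2 * (alpha - 2)))   # Var(X)
print(pareto_moment_numeric(1, 0.8, beta))                       # alpha <= 1: infinite mean
```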
Weibull distribution Weibull(γ, β )
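For constants γ > 0 and β > 0, if X ∼ exponential(β), then
$Y = X^{1/\gamma}$ ∼ Weibull(γ, β) with pdf
$$
f(x) = \begin{cases} \frac{\gamma}{\beta} x^{\gamma-1} e^{-x^\gamma/\beta} & x > 0 \\ 0 & x \le 0 \end{cases}
$$
A typical example of a Weibull-distributed Y is a lifetime or failure time.
If Y ∼ Weibull(γ, β), then $X = Y^\gamma$ ∼ exponential(β) and, substituting u = x/β,
$$
E(Y) = E(X^{1/\gamma}) = \frac{1}{\beta} \int_0^\infty x^{1/\gamma} e^{-x/\beta}\,dx = \beta^{1/\gamma} \int_0^\infty u^{1/\gamma} e^{-u}\,du = \beta^{1/\gamma}\, \Gamma\left( \frac{1}{\gamma} + 1 \right).
$$
Similarly, we can obtain that
$$
\mathrm{Var}(Y) = \beta^{2/\gamma} \left\{ \Gamma\left( \frac{2}{\gamma} + 1 \right) - \left[ \Gamma\left( \frac{1}{\gamma} + 1 \right) \right]^2 \right\}.
$$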
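The Weibull moment formulas can be checked numerically. The Python sketch below (helper names hypothetical; γ = 2 and β = 3 are arbitrary) integrates $y^r f(y)$ with a midpoint rule and compares the results with $\beta^{1/\gamma}\Gamma(1/\gamma+1)$ and the variance formula above, using `math.gamma`.

```python
import math


def weibull_pdf(y: float, gamma: float, beta: float) -> float:
    """Weibull(gamma, beta) pdf: (gamma/beta) y^(gamma-1) exp(-y^gamma / beta) for y > 0."""
    if y <= 0:
        return 0.0
    return gamma / beta * y ** (gamma - 1) * math.exp(-(y**gamma) / beta)


def weibull_moment_numeric(r: float, gamma: float, beta: float, m: int = 200_000) -> float:
    """Midpoint-rule E(Y^r) on [0, (60 beta)^(1/gamma)]; P(Y beyond that point) = e^-60."""
    upper = (60 * beta) ** (1 / gamma)
    h = upper / m
    ys = ((i + 0.5) * h for i in range(m))
    return h * math.fsum(y**r * weibull_pdf(y, gamma, beta) for y in ys)


gamma, beta = 2.0, 3.0
mean = weibull_moment_numeric(1, gamma, beta)
var = weibull_moment_numeric(2, gamma, beta) - mean**2
mean_cf = beta ** (1 / gamma) * math.gamma(1 / gamma + 1)
var_cf = beta ** (2 / gamma) * (math.gamma(2 / gamma + 1) - math.gamma(1 / gamma + 1) ** 2)
print(mean, mean_cf)
print(var, var_cf)
```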