Study Guide
By Yuanhao Jiang
1 Introduction
The main purpose of this chapter is to survey the common families of distributions, derive their means and variances, and consider several other useful properties and applications.
2 Discrete Distributions
A discrete distribution is one whose random variable X has a countable sample space; most often, X takes integer values.
Hypergeometric Distribution
Definition: A random variable X has a hypergeometric distribution if
P(X = x \mid N, M, K) = \frac{\binom{M}{x}\binom{N-M}{K-x}}{\binom{N}{K}}, \quad x = 0, 1, \ldots, K
Mean:
EX = \frac{KM}{N}
Variance:
\mathrm{Var}\,X = \left(\frac{KM}{N}\right)\frac{(N-M)(N-K)}{N(N-1)}
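These formulas can be sanity-checked against scipy.stats (a quick sketch; note that scipy's hypergeom takes its arguments in the order population size, number of successes, number of draws, which differs from the (N, M, K) notation above):

```python
from math import comb
from scipy.stats import hypergeom

# Population of N = 20 items, M = 7 of them "successes", draw K = 5 without replacement.
N, M, K = 20, 7, 5

# pmf straight from the definition above
def hyper_pmf(x):
    return comb(M, x) * comb(N - M, K - x) / comb(N, K)

# scipy.stats.hypergeom takes (population size, number of successes, number of draws)
rv = hypergeom(N, M, K)

mean_formula = K * M / N
var_formula = (K * M / N) * ((N - M) * (N - K)) / (N * (N - 1))
```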
Binomial Distribution
Before we look into the binomial distribution, we should consider the Bernoulli distribution first.
Definition: A random variable X has a Bernoulli distribution if
X = 1 with probability p and X = 0 with probability 1 - p, where 0 \le p \le 1.
Mean:
𝐸𝑋 = 𝑝
Variance:
𝑉𝑎𝑟 𝑋 = 𝑝(1 − 𝑝)
The binomial distribution is built on the Bernoulli distribution, so now let's take a look at it.
Definition: A random variable Y has a binomial(n, p) distribution if
P(Y = y \mid n, p) = \binom{n}{y} p^y (1-p)^{n-y}, \quad y = 0, 1, 2, \ldots, n
Mean:
𝐸𝑋 = 𝑛𝑝
Variance:
𝑉𝑎𝑟 𝑋 = 𝑛𝑝(1 − 𝑝)
MGF:
M_X(t) = [p e^t + (1-p)]^n.
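A quick numerical check of the mean, variance, and MGF, computing E[e^{tX}] directly from the pmf (a sketch using scipy.stats; the particular n, p, t are illustrative):

```python
import math
from scipy.stats import binom

n, p, t = 10, 0.3, 0.5
rv = binom(n, p)

# MGF computed directly from the pmf: E[e^{tX}] = sum over x of e^{tx} P(X = x)
mgf_from_pmf = sum(math.exp(t * x) * rv.pmf(x) for x in range(n + 1))
mgf_formula = (p * math.exp(t) + (1 - p)) ** n
```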
Poisson Distribution
Definition: A random variable has a Poisson(𝜆) distribution if
P(X = x \mid \lambda) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, \ldots
Mean:
EX = \sum_{x=0}^{\infty} x \frac{e^{-\lambda}\lambda^x}{x!}
= \sum_{x=1}^{\infty} x \frac{e^{-\lambda}\lambda^x}{x!}
= \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!}
= \lambda e^{-\lambda} \sum_{y=0}^{\infty} \frac{\lambda^y}{y!} \quad (y = x - 1)
= \lambda
Variance: A similar calculation to the mean gives
\mathrm{Var}\,X = \lambda
MGF:
M_X(t) = e^{\lambda(e^t - 1)}.
Special relationship between the Poisson distribution and the binomial distribution: when n is very large and p is small, the binomial(n, p) distribution is well approximated by the Poisson(\lambda) distribution with \lambda = np.
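The approximation is easy to see numerically (a sketch; the specific n and p below are just illustrative choices):

```python
from scipy.stats import binom, poisson

# Large n, small p: binomial(n, p) is close to Poisson(lambda) with lambda = n p.
n, p = 10_000, 0.0005
lam = n * p   # lam = 5.0

# Largest pointwise pmf gap over the region carrying nearly all the mass
max_gap = max(abs(binom.pmf(x, n, p) - poisson.pmf(x, lam)) for x in range(30))
```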
Negative Binomial Distribution
Definition: Let Y be the number of failures before the r-th success in independent Bernoulli(p) trials. Then Y has a negative binomial(r, p) distribution:
P(Y = y \mid r, p) = \binom{r+y-1}{y} p^r (1-p)^y, \quad y = 0, 1, \ldots
Mean:
EY = r\,\frac{1-p}{p}
Variance:
\mathrm{Var}\,Y = r\,\frac{1-p}{p^2}
Special relationship between the Poisson distribution and the negative binomial distribution: if r \to \infty and p \to 1 such that r(1-p) \to \lambda, 0 < \lambda < \infty, then
EY = r\,\frac{1-p}{p} \to \lambda, \qquad \mathrm{Var}\,Y = r\,\frac{1-p}{p^2} \to \lambda.
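This limit can also be checked numerically (a sketch; scipy's nbinom counts failures before the r-th success, matching Y above):

```python
from scipy.stats import nbinom, poisson

lam = 3.0
r = 10_000
p = 1 - lam / r          # chosen so that r(1 - p) = lam

# As r grows with r(1 - p) held at lam, the nbinom pmf approaches the Poisson pmf.
max_gap = max(abs(nbinom.pmf(y, r, p) - poisson.pmf(y, lam)) for y in range(25))
```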
Geometric Distribution
Definition: The geometric distribution is the simplest of the waiting-time distributions and is a special case of the negative binomial distribution. A random variable X has a geometric distribution if
P(X = x \mid p) = p(1-p)^{x-1}, \quad x = 1, 2, \ldots
This is the case r = 1 of the negative binomial pmf written for X = Y + r, the number of the trial on which the r-th success occurs:
P(X = x \mid r, p) = \binom{x-1}{r-1} p^r (1-p)^{x-r}, \quad x = r, r+1, \ldots
Mean:
EX = EY + 1 = \frac{1}{p}
Variance:
\mathrm{Var}\,X = \frac{1-p}{p^2}
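scipy's geom uses the same trial-counting parameterization starting at x = 1, so the formulas can be verified directly (a quick sketch):

```python
from scipy.stats import geom

p = 0.25
rv = geom(p)   # scipy's geom also starts at x = 1

# P(X = 4) from the definition: p (1 - p)^(x - 1)
pmf_formula = p * (1 - p) ** (4 - 1)
```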
3 Continuous Distributions
In this section we will talk about some famous continuous distributions.
Uniform Distribution
Definition: The continuous uniform distribution is defined by spreading mass uniformly over an
interval [a, b]. Its pdf is given by
f(x \mid a, b) = \frac{1}{b-a} \ \text{if } x \in [a, b], \quad 0 \ \text{otherwise.}
Mean:
EX = \frac{b+a}{2}
Variance:
\mathrm{Var}\,X = \frac{(b-a)^2}{12}
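A quick check with scipy.stats, which parameterizes the continuous uniform distribution as uniform(loc = a, scale = b - a):

```python
from scipy.stats import uniform

a, b = 2.0, 5.0
# scipy's uniform lives on [loc, loc + scale]
rv = uniform(loc=a, scale=b - a)
```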
Gamma Distribution
Definition: The gamma(𝛼, 𝛽) distribution is defined over an interval [0, +∞). Its pdf is given by
f(x \mid \alpha, \beta) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}} x^{\alpha-1} e^{-x/\beta}, \quad 0 < x < +\infty, \ \alpha > 0, \ \beta > 0
Mean:
𝐸𝑋 = 𝛼𝛽
Variance:
𝑉𝑎𝑟 𝑋 = 𝛼𝛽 2
MGF:
M_X(t) = \left(\frac{1}{1-\beta t}\right)^{\alpha}, \quad t < \frac{1}{\beta}
Special cases of the gamma distribution
When \alpha = p/2, where p is an integer, and \beta = 2, the gamma pdf becomes
f(x \mid p) = \frac{1}{\Gamma(p/2)\,2^{p/2}} x^{(p/2)-1} e^{-x/2}, \quad 0 < x < +\infty,
which is the \chi^2 pdf with p degrees of freedom.
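Numerically, the gamma(p/2, 2) pdf and scipy's chi-squared pdf coincide (a quick check at a few illustrative points):

```python
from scipy.stats import chi2, gamma

p = 4    # degrees of freedom
xs = [0.5, 1.0, 2.0, 7.3]

# gamma with shape p/2 and scale 2 should match the chi-squared pdf exactly
max_gap = max(abs(gamma.pdf(x, p / 2, scale=2) - chi2.pdf(x, p)) for x in xs)
```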
When \alpha = 1, the gamma pdf becomes
f(x \mid \beta) = \frac{1}{\beta} e^{-x/\beta}, \quad 0 < x < +\infty,
which is the exponential pdf with scale parameter \beta.
When X \sim \mathrm{exponential}(\beta), then Y = X^{1/\gamma} has a Weibull(\gamma, \beta) distribution. Its pdf is given by
f_Y(y \mid \gamma, \beta) = \frac{\gamma}{\beta} y^{\gamma-1} e^{-y^{\gamma}/\beta}, \quad 0 < y < +\infty, \ \gamma > 0, \ \beta > 0
The Weibull distribution is important for analyzing failure-time data and very useful for modeling hazard functions.
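This pdf can be matched against scipy's weibull_min, whose pdf is (c/s)(y/s)^{c-1} e^{-(y/s)^c}; setting s = beta^{1/gamma} recovers the parameterization above (a sketch with illustrative parameter values):

```python
import math
from scipy.stats import weibull_min

g, beta = 2.5, 1.7   # gamma (shape) and beta, as in the pdf above

def weibull_pdf(y):
    return (g / beta) * y ** (g - 1) * math.exp(-(y ** g) / beta)

# scipy's weibull_min(c, scale=s) matches the pdf above when s = beta ** (1/gamma)
rv = weibull_min(g, scale=beta ** (1 / g))

max_gap = max(abs(weibull_pdf(y) - rv.pdf(y)) for y in [0.3, 1.0, 2.2])
```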
Normal Distribution
Definition: The normal(\mu, \sigma^2) distribution is defined over the interval (-\infty, +\infty). Its pdf is given by
f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2/(2\sigma^2)}, \quad -\infty < x < +\infty
Mean:
EX = \mu
Variance:
\mathrm{Var}\,X = \sigma^2
Beta Distribution
Definition: The beta(𝛼, 𝛽) distribution is defined over an interval of (0,1). Its pdf is given by
f(x \mid \alpha, \beta) = \frac{1}{B(\alpha, \beta)} x^{\alpha-1} (1-x)^{\beta-1}, \quad 0 < x < 1
Mean:
EX = \frac{\alpha}{\alpha+\beta}
Variance:
\mathrm{Var}\,X = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}
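A quick check of the pdf, mean, and variance against scipy.stats (B(alpha, beta) is computed via log-gamma for numerical stability):

```python
from math import exp, lgamma
from scipy.stats import beta as beta_dist

a, b = 2.0, 5.0
rv = beta_dist(a, b)

# B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)
B = exp(lgamma(a) + lgamma(b) - lgamma(a + b))
pdf_formula = (1 / B) * 0.3 ** (a - 1) * (1 - 0.3) ** (b - 1)
```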
Cauchy Distribution
Definition: The Cauchy(\theta) distribution is defined over the interval (-\infty, +\infty). Its pdf is given by
f(x \mid \theta) = \frac{1}{\pi} \frac{1}{1 + (x-\theta)^2}, \quad -\infty < x < +\infty, \ -\infty < \theta < +\infty
Mean: Does not exist, since E|X| = \infty.
Variance: Does not exist.
Lognormal Distribution
Definition: X has a lognormal distribution if \log X \sim n(\mu, \sigma^2); it is defined over the interval (0, +\infty). Its pdf is given by
f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \frac{1}{x} e^{-(\log x - \mu)^2/(2\sigma^2)}, \quad 0 < x < +\infty, \ -\infty < \mu < +\infty, \ \sigma > 0,
Mean:
EX = e^{\mu + \sigma^2/2}
Variance:
\mathrm{Var}\,X = e^{2(\mu+\sigma^2)} - e^{2\mu+\sigma^2}
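scipy parameterizes the lognormal as lognorm(s = sigma, scale = exp(mu)); with that mapping the mean and variance formulas check out (a sketch):

```python
import math
from scipy.stats import lognorm

mu, sigma = 0.5, 0.8
# scipy's lognorm: shape s = sigma, scale = exp(mu)
rv = lognorm(sigma, scale=math.exp(mu))

mean_formula = math.exp(mu + sigma ** 2 / 2)
var_formula = math.exp(2 * (mu + sigma ** 2)) - math.exp(2 * mu + sigma ** 2)
```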
4 Exponential Families
Definition 4.1: A family of pdfs or pmfs is called an exponential family if it can be expressed as
f(x \mid \theta) = h(x)\,c(\theta)\exp\left(\sum_{i=1}^{k} w_i(\theta)\,t_i(x)\right),
where h(x) \ge 0, t_1(x), \ldots, t_k(x) are real-valued functions of x, c(\theta) \ge 0, and w_1(\theta), \ldots, w_k(\theta) are real-valued functions of \theta.
Theorem 4.2 If X is a random variable with pdf or pmf of the above form, then
E\left(\sum_{i=1}^{k} \frac{\partial w_i(\theta)}{\partial \theta_j}\,t_i(X)\right) = -\frac{\partial}{\partial \theta_j} \log c(\theta);
\mathrm{Var}\left(\sum_{i=1}^{k} \frac{\partial w_i(\theta)}{\partial \theta_j}\,t_i(X)\right) = -\frac{\partial^2}{\partial \theta_j^2} \log c(\theta) - E\left(\sum_{i=1}^{k} \frac{\partial^2 w_i(\theta)}{\partial \theta_j^2}\,t_i(X)\right)
Definition 4.3 The indicator function of a set A, most often denoted by I_A(x), is the function
I_A(x) = 1 \ \text{if } x \in A, \quad 0 \ \text{if } x \notin A
We can re-parameterize an exponential family in terms of the natural parameter \eta, giving
f(x \mid \eta) = h(x)\,c^*(\eta)\exp\left(\sum_{i=1}^{k} \eta_i\,t_i(x)\right).
Definition 4.4 A curved exponential family is a family of densities of the above form for which the dimension of the vector \theta is equal to d < k. If d = k, the family is a full exponential family.
5 Location and Scale Families
Theorem 5.1 Let f(x) be any pdf and let \mu and \sigma > 0 be any given constants. Then the function
g(x \mid \mu, \sigma) = \frac{1}{\sigma} f\left(\frac{x-\mu}{\sigma}\right)
is a pdf.
Definition 5.2 Assume 𝑓(𝑥) to be any pdf. Then the family of pdfs 𝑓(𝑥 − 𝜇), indexed by the
parameter 𝜇, −∞ < 𝜇 < +∞, is called the location family with standard pdf 𝑓(𝑥) and 𝜇 is
called the location parameter for the family.
Definition 5.3 Assume f(x) to be any pdf. Then for any \sigma > 0, the family of pdfs \frac{1}{\sigma} f\left(\frac{x}{\sigma}\right), indexed by the parameter \sigma, is called the scale family with standard pdf f(x), and \sigma is called the scale parameter of the family.
Definition 5.4 Assume f(x) to be any pdf. Then for any \mu, -\infty < \mu < +\infty, and any \sigma > 0, the family of pdfs \frac{1}{\sigma} f\left(\frac{x-\mu}{\sigma}\right), indexed by the parameters (\mu, \sigma), is called the location-scale family with standard pdf f(x); \mu is called the location parameter and \sigma is called the scale parameter.
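The location-scale construction can be illustrated with the standard normal as f: a quick check that (1/sigma) f((x - mu)/sigma) reproduces the n(mu, sigma^2) pdf (the particular point x is illustrative):

```python
from scipy.stats import norm

mu, sigma = 1.5, 2.0
x = 0.7

# Location-scale construction applied to the standard normal pdf
lhs = (1 / sigma) * norm.pdf((x - mu) / sigma)
# scipy's built-in location-scale handling should agree
rhs = norm.pdf(x, loc=mu, scale=sigma)
```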
Theorem 5.5 Let f(\cdot) be any pdf. Let \mu be any real number, and let \sigma be any positive real number. Then X is a random variable with pdf \frac{1}{\sigma} f\left(\frac{x-\mu}{\sigma}\right) if and only if there exists a random variable Z with pdf f(z) such that X = \sigma Z + \mu.
6 Inequalities and Identities
Theorem 6.2 Let X_{\alpha,\beta} denote a gamma(\alpha, \beta) random variable with pdf f(x \mid \alpha, \beta), where \alpha > 1. Then for any constants a and b,
P(a < X_{\alpha,\beta} < b) = \beta\big(f(a \mid \alpha, \beta) - f(b \mid \alpha, \beta)\big) + P(a < X_{\alpha-1,\beta} < b).
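The gamma recursion identity P(a < X_{alpha,beta} < b) = beta (f(a | alpha, beta) - f(b | alpha, beta)) + P(a < X_{alpha-1,beta} < b), which follows by integration by parts on the gamma pdf, can be checked numerically (a sketch using scipy.stats.gamma with illustrative constants):

```python
from scipy.stats import gamma

alpha, beta = 3.0, 2.0
a, b = 1.0, 6.0

# Left side: P(a < X_{alpha,beta} < b)
lhs = gamma.cdf(b, alpha, scale=beta) - gamma.cdf(a, alpha, scale=beta)
# Right side: boundary terms plus the same probability one shape parameter down
rhs = (beta * (gamma.pdf(a, alpha, scale=beta) - gamma.pdf(b, alpha, scale=beta))
       + gamma.cdf(b, alpha - 1, scale=beta) - gamma.cdf(a, alpha - 1, scale=beta))
```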
Lemma 6.3 (Stein’s Lemma) Let 𝑋~𝑛(𝜃, 𝜎 2 ), and let 𝑔 be a differentiable function satisfying
𝐸|𝑔′ (𝑋)| < ∞. Then
𝐸[𝑔(𝑋)(𝑋 − 𝜃)] = 𝜎 2 𝐸𝑔′ (𝑋).
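Stein's Lemma can be verified numerically for a particular choice of g, say g(x) = x^2, by integrating against the normal pdf (a sketch; theta and sigma are illustrative):

```python
import math
from scipy.integrate import quad
from scipy.stats import norm

theta, sigma = 1.0, 1.5
rv = norm(theta, sigma)

g = lambda x: x ** 2          # g(x) = x^2, so g'(x) = 2x
# Left side: E[g(X)(X - theta)] by direct integration
lhs, _ = quad(lambda x: g(x) * (x - theta) * rv.pdf(x), -math.inf, math.inf)
# Right side: sigma^2 E[g'(X)] = sigma^2 * 2 * E[X] = 2 sigma^2 theta
rhs = sigma ** 2 * 2 * theta
```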
Theorem 6.4 Let X_p^2 denote a chi squared random variable with p degrees of freedom. For any function h(x),
E\,h(X_p^2) = p\,E\left(\frac{h(X_{p+2}^2)}{X_{p+2}^2}\right)
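A numerical check of this identity for a sample choice of h (the particular h below is just an illustrative assumption):

```python
import math
from scipy.integrate import quad
from scipy.stats import chi2

p = 5
h = lambda x: math.sqrt(x)    # any reasonably smooth h works

# Left side: E[h(X_p^2)]
lhs, _ = quad(lambda x: h(x) * chi2.pdf(x, p), 0, math.inf)
# Right side: p * E[h(X_{p+2}^2) / X_{p+2}^2]
rhs_int, _ = quad(lambda x: (h(x) / x) * chi2.pdf(x, p + 2), 0, math.inf)
rhs = p * rhs_int
```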
Theorem 6.5 (Hwang) Let g(x) be a function with -\infty < E\,g(X) < \infty and -\infty < g(-1) < \infty. Then:
a. If X \sim \mathrm{Poisson}(\lambda),
E(\lambda g(X)) = E(X g(X-1)).
b. If X \sim \mathrm{negative\ binomial}(r, p),
E\big((1-p)\,g(X)\big) = E\left(\frac{X}{r+X-1}\,g(X-1)\right)
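Part (a) can be verified by direct summation against the Poisson pmf (a sketch with an illustrative g; the sum is truncated far into the tail, where the remaining mass is negligible):

```python
from scipy.stats import poisson

lam = 2.0
g = lambda x: x ** 2

xs = range(0, 200)
# E[lambda g(X)] versus E[X g(X - 1)], both as truncated sums over the pmf
lhs = sum(lam * g(x) * poisson.pmf(x, lam) for x in xs)
rhs = sum(x * g(x - 1) * poisson.pmf(x, lam) for x in xs)
```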