Probability Distributions
Guilherme J. M. Rosa
University of Wisconsin-Madison
Phenotypic Traits
Binomial Distribution
Distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p.
Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial.
When n = 1, the binomial distribution is a Bernoulli distribution.
$0 \le p \le 1$
$y \mid n, p \sim \text{Bin}(n, p), \quad y = 0, 1, 2, \ldots, n$ (number of successes in n trials)
$$\Pr(y \mid n, p) = \frac{n!}{y!\,(n-y)!}\, p^{y} (1-p)^{n-y}$$
Figure: Binomial probability functions for Bin(7, 0.2), Bin(30, 0.3) and Bin(50, 0.5)
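A minimal Python sketch of the binomial probability function above, using only the standard library (Python 3.8+ for `math.comb`). The name `binom_pmf` is our own illustrative choice. It evaluates Bin(7, 0.2), one of the plotted cases, and checks numerically that the probabilities sum to 1 and that the mean and variance match the standard results np and np(1 − p).

```python
from math import comb


def binom_pmf(y: int, n: int, p: float) -> float:
    """Pr(y | n, p) = n! / (y! (n - y)!) * p**y * (1 - p)**(n - y)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if not 0 <= y <= n:
        return 0.0  # outside the support {0, 1, ..., n}
    return comb(n, y) * p**y * (1 - p) ** (n - y)


# Bin(7, 0.2), one of the cases in the figure
n, p = 7, 0.2
probs = [binom_pmf(y, n, p) for y in range(n + 1)]
total = sum(probs)                                            # 1 up to rounding
mean = sum(y * q for y, q in enumerate(probs))                # n * p
var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))   # n * p * (1 - p)
print(f"sum = {total:.6f}, mean = {mean:.6f}, var = {var:.6f}")
```

With n = 1, `binom_pmf(1, 1, p)` returns p, i.e. the Bernoulli case.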
Multinomial Distribution
Generalization of the binomial distribution to n independent trials, each with an outcome in one of k categories, where category i has a fixed probability $p_i$.
The multinomial distribution gives the probability of any particular combination of counts (numbers of successes) across the k categories.
$y \mid n, p \sim \text{Multin}(n, p_1, p_2, \ldots, p_k)$
$0 \le p_i \le 1; \quad \sum_{i=1}^{k} p_i = 1$
$y_i = 0, 1, 2, \ldots, n; \quad \sum_{i=1}^{k} y_i = n$
$$\Pr(y \mid n, p) = \frac{n!}{y_1!\, y_2! \cdots y_k!}\, p_1^{y_1} p_2^{y_2} \cdots p_k^{y_k}$$
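A short sketch of the multinomial probability function in standard-library Python (3.8+ for `math.prod`); `multinom_pmf` is a hypothetical helper name. It checks that the probabilities of all count vectors with a fixed total n sum to 1, and that with k = 2 categories the formula reduces to the binomial.

```python
from itertools import product
from math import factorial, prod


def multinom_pmf(y, p):
    """Pr(y | n, p) = n! / (y_1! ... y_k!) * p_1**y_1 * ... * p_k**y_k, with n = sum(y)."""
    if len(y) != len(p):
        raise ValueError("y and p must both have length k")
    if any(pi < 0 or pi > 1 for pi in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("each p_i must lie in [0, 1] and the p_i must sum to 1")
    n = sum(y)
    coef = factorial(n) // prod(factorial(yi) for yi in y)  # exact integer coefficient
    return coef * prod(pi**yi for pi, yi in zip(p, y))


# All count vectors (y1, y2, y3) with y1 + y2 + y3 = n have total probability 1.
n, p = 4, [0.5, 0.3, 0.2]
total = sum(
    multinom_pmf([a, b, n - a - b], p)
    for a, b in product(range(n + 1), repeat=2)
    if a + b <= n
)
# With k = 2 categories the multinomial is the binomial: here Bin(7, 0.2) at y = 3.
two_cat = multinom_pmf([3, 4], [0.2, 0.8])
print(total, two_cat)
```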
Poisson Distribution
Distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space, when the events occur independently of each other at a constant average rate.
The Poisson distribution can also be used for the number of events in other specified intervals, such as distance, area or volume.
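$\lambda > 0$
$y \mid \lambda \sim \text{Poisson}(\lambda), \quad y = 0, 1, 2, \ldots$
$$\Pr(y \mid \lambda) = \frac{\lambda^{y} e^{-\lambda}}{y!}$$
$E[y \mid \lambda] = \text{Var}[y \mid \lambda] = \lambda$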
Figure: Poisson probability functions for Poisson(1), Poisson(5) and Poisson(15)
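The property E[y | λ] = Var[y | λ] = λ can be checked numerically. This standard-library sketch (the helper name `poisson_pmf` is ours) truncates the infinite support at y = 99, which we assume is harmless for λ = 5 because the remaining tail mass is negligible.

```python
from math import exp, factorial


def poisson_pmf(y: int, lam: float) -> float:
    """Pr(y | lambda) = lambda**y * exp(-lambda) / y!  for y = 0, 1, 2, ..."""
    return lam**y * exp(-lam) / factorial(y)


lam = 5.0
ys = range(100)  # truncated support; the tail beyond 99 is negligible for lambda = 5
probs = [poisson_pmf(y, lam) for y in ys]
mean = sum(y * q for y, q in zip(ys, probs))
var = sum((y - mean) ** 2 * q for y, q in zip(ys, probs))
print(f"mean = {mean:.6f}, var = {var:.6f}")  # both close to lambda
```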
Uniform Distribution
The continuous uniform distribution, also known as the rectangular distribution, is a symmetric distribution in which all intervals of the same length within the distribution's support are equally probable.
The support is defined by the two parameters, α and β, which are its minimum and maximum values.
Figure: Uniform density p(y | α, β), constant at 1/(β − α) for α ≤ y ≤ β
Figure: Discrete uniform distribution, p(y) = 1/n for y = 1, 2, …, n
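A small sketch of both uniform cases; the function names are hypothetical. Because the continuous density is constant at 1/(β − α), the probability of a sub-interval of the support is its width times that constant, so any two intervals of equal length get the same probability.

```python
def uniform_pdf(y: float, alpha: float, beta: float) -> float:
    """Continuous uniform density: 1/(beta - alpha) on [alpha, beta], 0 elsewhere."""
    if beta <= alpha:
        raise ValueError("need alpha < beta")
    return 1.0 / (beta - alpha) if alpha <= y <= beta else 0.0


def discrete_uniform_pmf(y: int, n: int) -> float:
    """Discrete uniform on {1, 2, ..., n}: each value has probability 1/n."""
    return 1.0 / n if 1 <= y <= n else 0.0


alpha, beta = 2.0, 6.0
width = 0.5
# The density is constant inside the support, so Pr(interval) = width * density.
prob_a = width * uniform_pdf(2.5, alpha, beta)  # Pr(2.5 <= y <= 3.0)
prob_b = width * uniform_pdf(5.0, alpha, beta)  # Pr(5.0 <= y <= 5.5)
print(prob_a, prob_b)  # 0.125 0.125
```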
Beta Distribution
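$\alpha > 0; \quad \beta > 0$
$y \mid \alpha, \beta \sim \text{Beta}(\alpha, \beta), \quad 0 \le y \le 1$
$$p(y \mid \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, y^{\alpha-1} (1-y)^{\beta-1}$$
$$E[y \mid \alpha, \beta] = \frac{\alpha}{\alpha+\beta}; \quad \text{Var}[y \mid \alpha, \beta] = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$$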
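A sketch that evaluates the Beta density through `math.lgamma` (working on the log scale avoids overflow of the Gamma functions for large α, β) and compares a midpoint-rule integral of the mean and variance with the closed forms. The helper name, the choice Beta(2, 5), and the grid size are our assumptions.

```python
from math import exp, lgamma, log


def beta_pdf(y: float, a: float, b: float) -> float:
    """p(y | a, b) = Gamma(a + b) / (Gamma(a) Gamma(b)) * y**(a - 1) * (1 - y)**(b - 1)."""
    if not 0.0 < y < 1.0:
        return 0.0  # evaluated on the open interval (0, 1) only
    log_const = lgamma(a + b) - lgamma(a) - lgamma(b)  # log of the Gamma-function ratio
    return exp(log_const + (a - 1) * log(y) + (b - 1) * log(1 - y))


# Midpoint-rule check of E[y] and Var[y] for Beta(2, 5)
a, b = 2.0, 5.0
m = 20_000
h = 1.0 / m
grid = [(i + 0.5) * h for i in range(m)]
mean_num = h * sum(y * beta_pdf(y, a, b) for y in grid)
var_num = h * sum((y - mean_num) ** 2 * beta_pdf(y, a, b) for y in grid)
mean_exact = a / (a + b)
var_exact = a * b / ((a + b) ** 2 * (a + b + 1))
print(mean_num, mean_exact, var_num, var_exact)
```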
Gamma Distribution
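$\alpha, \beta > 0$
$y \mid \alpha, \beta \sim \text{Gamma}(\alpha, \beta), \quad y > 0$
$$p(y \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, y^{\alpha-1} \exp\{-\beta y\}$$
$$E[y \mid \alpha, \beta] = \frac{\alpha}{\beta}; \quad \text{Var}[y \mid \alpha, \beta] = \frac{\alpha}{\beta^2}$$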
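The Gamma(α, β) form above uses β as a rate (E[y] = α/β). Python's `random.gammavariate(alpha, beta)` instead takes shape and scale, so in this sketch the scale argument is 1/β. The helper name, seed and sample size are illustrative assumptions; the simulated mean and variance should land close to α/β and α/β².

```python
import random
from math import exp, lgamma, log


def gamma_pdf(y: float, a: float, b: float) -> float:
    """p(y | a, b) = b**a / Gamma(a) * y**(a - 1) * exp(-b * y), with b a rate."""
    if y <= 0:
        return 0.0
    return exp(a * log(b) - lgamma(a) + (a - 1) * log(y) - b * y)


rng = random.Random(2024)  # fixed seed for reproducibility
a, b = 3.0, 2.0
# random.gammavariate(shape, scale): pass scale = 1/b to match the rate form
draws = [rng.gammavariate(a, 1.0 / b) for _ in range(100_000)]
mean = sum(draws) / len(draws)                                # close to a / b = 1.5
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)  # close to a / b**2 = 0.75
print(mean, var)
```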
Chi-Square Distribution
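$\varphi > 0$
$y \mid \varphi \sim \chi^2_{\varphi}, \quad y > 0$ [same as Gamma(φ/2, 1/2)]
$$p(y \mid \varphi) = \frac{2^{-\varphi/2}}{\Gamma(\varphi/2)}\, y^{\varphi/2-1} \exp\{-y/2\}$$
$E[y \mid \varphi] = \varphi; \quad \text{Var}[y \mid \varphi] = 2\varphi$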
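The identity χ²_φ = Gamma(φ/2, 1/2) can be verified pointwise. This sketch re-declares a rate-parameterized Gamma density so that it is self-contained; both function names are ours. Plugging α = φ/2, β = 1/2 into the Gamma moments α/β and α/β² gives φ and 2φ.

```python
from math import exp, lgamma, log


def gamma_pdf(y, a, b):
    """Gamma(a, b) density with rate b, as on the Gamma slide."""
    if y <= 0:
        return 0.0
    return exp(a * log(b) - lgamma(a) + (a - 1) * log(y) - b * y)


def chisq_pdf(y, phi):
    """p(y | phi) = 2**(-phi/2) / Gamma(phi/2) * y**(phi/2 - 1) * exp(-y/2)."""
    if y <= 0:
        return 0.0
    return exp(-(phi / 2) * log(2.0) - lgamma(phi / 2) + (phi / 2 - 1) * log(y) - y / 2)


phi = 4.0
points = (0.5, 1.0, 3.0, 7.5)
for y in points:
    # the two columns should agree to rounding error
    print(y, chisq_pdf(y, phi), gamma_pdf(y, phi / 2, 0.5))
```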
Exponential Distribution
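$\lambda > 0$
$y \mid \lambda \sim \text{Exp}(\lambda), \quad y > 0$ [same as Gamma(1, λ)]
$$p(y \mid \lambda) = \lambda e^{-\lambda y}$$
$E[y \mid \lambda] = \lambda^{-1}; \quad \text{Var}[y \mid \lambda] = \lambda^{-2}$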
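A quick simulation check of the exponential moments, assuming λ is a rate as on the slide. Unlike `gammavariate`, Python's `random.expovariate(lambd)` takes the rate directly. The helper name and seed are our own choices.

```python
import random
from math import exp


def exp_pdf(y, lam):
    """p(y | lambda) = lambda * exp(-lambda * y) for y > 0 (lambda is a rate)."""
    return lam * exp(-lam * y) if y > 0 else 0.0


rng = random.Random(7)  # fixed seed for reproducibility
lam = 0.5
# random.expovariate takes the rate, so the draws have mean 1/lambda
draws = [rng.expovariate(lam) for _ in range(100_000)]
mean = sum(draws) / len(draws)                                # close to 1/lam = 2
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)  # close to 1/lam**2 = 4
print(mean, var, exp_pdf(1.0, lam))
```

Setting α = 1 in the Gamma density removes the y^(α−1) and Γ(α) factors, which leaves exactly λe^(−λy).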
Normal (Gaussian) Distribution
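$-\infty < \mu < \infty; \quad \sigma^2 > 0$
$y \mid \mu, \sigma^2 \sim N(\mu, \sigma^2), \quad -\infty < y < \infty$
$$p(y \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}(y-\mu)^2\right\}$$
$E[y \mid \mu, \sigma^2] = \mu; \quad \text{Var}[y \mid \mu, \sigma^2] = \sigma^2$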
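A sketch comparing a direct implementation of the density above with `statistics.NormalDist` from the standard library (Python 3.8+). Note that `NormalDist` is parameterized by the standard deviation σ, not the variance σ², so the sketch passes `sqrt(sigma2)`. The helper name `normal_pdf` is ours.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(y: float, mu: float, sigma2: float) -> float:
    """p(y | mu, sigma^2) = (2 pi sigma^2)**(-1/2) * exp(-(y - mu)**2 / (2 sigma^2))."""
    return exp(-((y - mu) ** 2) / (2 * sigma2)) / sqrt(2 * pi * sigma2)


mu, sigma2 = 1.0, 4.0
ref = NormalDist(mu, sqrt(sigma2))  # NormalDist(mean, standard deviation)
points = (-3.0, 0.0, 1.0, 2.5)
for y in points:
    print(y, normal_pdf(y, mu, sigma2), ref.pdf(y))
```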
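Central Limit Theorem: if $x_1, x_2, \ldots, x_n$ are independent and identically distributed with finite variance, the sample mean $\frac{1}{n}\sum_{i=1}^{n} x_i$ is approximately Normal for large $n$ (its standardized form converges in distribution to $N(0, 1)$ as $n \to \infty$).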
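A small simulation illustrating the Central Limit Theorem, under our own choice of Uniform(0, 1) draws (mean 1/2, variance 1/12), n = 30 and a fixed seed. If the Normal approximation holds, the sample means should have variance near 1/(12n), and about 95% of them should fall within 1.96 standard errors of 1/2.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(42)   # fixed seed so the run is reproducible
n, reps = 30, 20_000      # sample size per mean, number of simulated means

# x_i ~ Uniform(0, 1): mean 1/2, variance 1/12, and clearly not Normal itself
means = [fmean(rng.random() for _ in range(n)) for _ in range(reps)]

se = (1 / (12 * n)) ** 0.5  # theoretical standard deviation of the sample mean
# Share of means within 1.96 SE of 1/2; near 0.95 if the Normal approximation holds
coverage = sum(abs(m - 0.5) <= 1.96 * se for m in means) / reps
print(fmean(means), pvariance(means), coverage)
```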
Other continuous distributions:
Inverse Gamma (conjugate prior for the unknown variance of a normal distribution)
Inverse Chi-square
Scaled inverse Chi-square
Logistic (bell-shaped distribution with heavier tails than the normal; its cumulative distribution function is the logistic function, which appears in logistic regression and feedforward neural networks)
Among others…
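A sketch of two of the listed distributions. We assume the common shape/scale form of the Inverse Gamma, p(y | a, b) = b^a / Γ(a) · y^−(a+1) · exp(−b/y), under which 1/x ~ Inv-Gamma(a, b) when x ~ Gamma(a, b) with rate b. The mean b/(a − 1) (for a > 1) is checked by simulation. The logistic CDF is written as the logistic (sigmoid) function of (y − µ)/s, with location µ and scale s. All function names are hypothetical.

```python
import random
from math import exp, lgamma, log


def inv_gamma_pdf(y, a, b):
    """Inverse-Gamma(a, b) density (shape a, scale b): b**a / Gamma(a) * y**-(a+1) * exp(-b/y)."""
    if y <= 0:
        return 0.0
    return exp(a * log(b) - lgamma(a) - (a + 1) * log(y) - b / y)


def logistic_cdf(y, mu=0.0, s=1.0):
    """Logistic CDF: the logistic function 1 / (1 + exp(-(y - mu) / s))."""
    return 1.0 / (1.0 + exp(-(y - mu) / s))


# If x ~ Gamma(a, rate b) then 1/x ~ Inverse-Gamma(a, b), with mean b / (a - 1) for a > 1.
rng = random.Random(3)  # fixed seed for reproducibility
a, b = 4.0, 3.0
draws = [1.0 / rng.gammavariate(a, 1.0 / b) for _ in range(100_000)]  # scale = 1/rate
mean = sum(draws) / len(draws)  # close to b / (a - 1) = 1.0
print(mean, inv_gamma_pdf(1.0, a, b), logistic_cdf(0.0))
```

For very negative (y − µ)/s, `exp` in `logistic_cdf` can overflow; a production version would branch on the sign.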
Figure: Relationships among common distributions (Leemis, 1986)
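Multivariate Normal Distribution
$-\infty < \mu < \infty; \quad \Sigma$ positive definite
$y \mid \mu, \Sigma \sim N_p(\mu, \Sigma), \quad -\infty < y < \infty$
$$p(y \mid \mu, \Sigma) = (2\pi)^{-p/2} |\Sigma|^{-1/2} \exp\left\{-\frac{1}{2}(y-\mu)^{T} \Sigma^{-1} (y-\mu)\right\}$$
$E[y \mid \mu, \Sigma] = \mu; \quad \text{Var}[y \mid \mu, \Sigma] = \Sigma$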
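A standard-library sketch of the multivariate normal log-density. Instead of forming Σ⁻¹ explicitly, it uses a Cholesky factor L (Σ = LLᵀ), a common numerical choice on our part: solving Lz = y − µ gives (y − µ)ᵀΣ⁻¹(y − µ) = zᵀz, and log|Σ| = 2 Σᵢ log Lᵢᵢ. The factorization also fails exactly when Σ is not positive definite. Function names are hypothetical; exponentiate the result to get the density.

```python
from math import log, pi, sqrt


def cholesky(S):
    """Return lower-triangular L with S = L L^T (raises if S is not positive definite)."""
    k = len(S)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("Sigma must be positive definite")
                L[i][i] = sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def mvn_logpdf(y, mu, Sigma):
    """log p(y | mu, Sigma) for y ~ N_p(mu, Sigma)."""
    p = len(y)
    L = cholesky(Sigma)
    # Forward substitution: solve L z = y - mu, so z^T z = (y - mu)^T Sigma^{-1} (y - mu)
    z = []
    for i in range(p):
        r = (y[i] - mu[i]) - sum(L[i][m] * z[m] for m in range(i))
        z.append(r / L[i][i])
    quad = sum(zi * zi for zi in z)
    log_det = 2.0 * sum(log(L[i][i]) for i in range(p))  # log|Sigma|
    return -0.5 * (p * log(2.0 * pi) + log_det + quad)


# Correlated bivariate example evaluated at its mean: equals -log(2*pi) - 0.5*log(3)
Sigma = [[2.0, 1.0], [1.0, 2.0]]
print(mvn_logpdf([1.0, 1.0], [1.0, 1.0], Sigma))
```

With a diagonal Σ the result is the sum of univariate normal log-densities, which is a convenient sanity check.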
The Bivariate Normal Distribution
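$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \right)$$
$$\rho = \frac{\sigma_{12}}{\sqrt{\sigma_1^2 \sigma_2^2}}$$
ρ: coefficient of correlation
$\sigma_{12}$: covariance between $y_1$ and $y_2$
$$p(y_1, y_2 \mid \Theta) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(y_1-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(y_1-\mu_1)(y_2-\mu_2)}{\sigma_1\sigma_2} + \frac{(y_2-\mu_2)^2}{\sigma_2^2}\right]\right\}$$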
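A direct transcription of the bivariate formulas into Python; the function names are ours. `corr_from_cov` computes ρ = σ₁₂/√(σ₁²σ₂²), and `bvn_pdf` takes standard deviations σ₁, σ₂ and the correlation ρ. With ρ = 0 the density factors into the product of two univariate normal densities.

```python
from math import exp, pi, sqrt


def corr_from_cov(cov12, var1, var2):
    """rho = sigma_12 / sqrt(sigma_1^2 * sigma_2^2)."""
    return cov12 / sqrt(var1 * var2)


def bvn_pdf(y1, y2, mu1, mu2, s1, s2, rho):
    """Bivariate normal density; s1, s2 are standard deviations and rho the correlation."""
    if s1 <= 0 or s2 <= 0 or not (-1 < rho < 1):
        raise ValueError("need s1, s2 > 0 and -1 < rho < 1")
    z1 = (y1 - mu1) / s1
    z2 = (y2 - mu2) / s2
    q = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)
    return exp(-0.5 * q) / (2.0 * pi * s1 * s2 * sqrt(1.0 - rho * rho))


# Covariance 1.2 with variances 4 and 1 gives rho = 0.6
rho = corr_from_cov(1.2, 4.0, 1.0)
print(rho, bvn_pdf(0.5, -0.3, 0.0, 0.0, 2.0, 1.0, rho))
```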
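Multivariate Normal: Marginal Distributions
$$y = (y_1^T, y_2^T)^T, \quad \mu = (\mu_1^T, \mu_2^T)^T \quad \text{and} \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$
$$p(y_1 \mid \mu_1, \Sigma_{11}) = (2\pi)^{-p_1/2} |\Sigma_{11}|^{-1/2} \exp\left\{-\frac{1}{2}(y_1-\mu_1)^T \Sigma_{11}^{-1} (y_1-\mu_1)\right\}$$
where $p_1$ is the dimension of $y_1$, i.e. $y_1 \sim N_{p_1}(\mu_1, \Sigma_{11})$.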
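Because the marginal of a multivariate normal is again normal with parameters µ₁ and Σ₁₁, obtaining it amounts to selecting entries. This sketch (hypothetical helper `mvn_marginal`, plain lists as matrices) keeps the chosen elements of µ and the corresponding rows and columns of Σ.

```python
def mvn_marginal(mu, Sigma, idx):
    """Marginal parameters (mu_1, Sigma_11) of the sub-vector y[idx] when y ~ N_p(mu, Sigma)."""
    mu1 = [mu[i] for i in idx]                           # keep the selected means
    Sigma11 = [[Sigma[i][j] for j in idx] for i in idx]  # keep selected rows and columns
    return mu1, Sigma11


mu = [1.0, 2.0, 3.0]
Sigma = [[4.0, 1.0, 0.5],
         [1.0, 3.0, 0.2],
         [0.5, 0.2, 2.0]]
mu1, Sigma11 = mvn_marginal(mu, Sigma, [0, 2])
print(mu1, Sigma11)  # [1.0, 3.0] [[4.0, 0.5], [0.5, 2.0]]
```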
Multivariate Normal: Conditional Distributions
$$y = (y_1^T, y_2^T)^T, \quad \mu = (\mu_1^T, \mu_2^T)^T \quad \text{and} \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$
Figure: Multivariate Normal: Conditional Distributions (two panels plotted on axes y1 and y2)
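With the partition above, the standard result is y₁ | y₂ ~ N(µ₁ + Σ₁₂Σ₂₂⁻¹(y₂ − µ₂), Σ₁₁ − Σ₁₂Σ₂₂⁻¹Σ₂₁). This sketch covers only the scalar (bivariate) case, where the conditional mean is µ₁ + ρ(σ₁/σ₂)(y₂ − µ₂) and the conditional variance is σ₁²(1 − ρ²). It checks the result by simulating pairs and keeping those with y₂ in a narrow window around a target value. The parameter values, window width and seed are our assumptions.

```python
import random
from math import sqrt


def bvn_conditional(y2, mu1, mu2, s1, s2, rho):
    """Mean and variance of y1 | y2 for a bivariate normal (scalar case)."""
    cond_mean = mu1 + rho * s1 / s2 * (y2 - mu2)  # mu_1 + Sigma_12 Sigma_22^-1 (y2 - mu_2)
    cond_var = s1**2 * (1 - rho**2)               # Sigma_11 - Sigma_12 Sigma_22^-1 Sigma_21
    return cond_mean, cond_var


rng = random.Random(11)  # fixed seed for reproducibility
mu1, mu2, s1, s2, rho = 0.0, 1.0, 2.0, 1.5, 0.6
target, tol = 2.0, 0.05
kept = []
for _ in range(400_000):
    z2 = rng.gauss(0.0, 1.0)
    y2 = mu2 + s2 * z2
    if abs(y2 - target) < tol:
        z1 = rng.gauss(0.0, 1.0)  # independent of z2; only drawn when the pair is kept
        kept.append(mu1 + s1 * (rho * z2 + sqrt(1 - rho**2) * z1))  # Cov(y1, y2) = rho s1 s2
mc_mean = sum(kept) / len(kept)
mc_var = sum((v - mc_mean) ** 2 for v in kept) / (len(kept) - 1)
print(bvn_conditional(target, mu1, mu2, s1, s2, rho), (mc_mean, mc_var))
```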
Wishart Distribution
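$W \mid \nu, S \sim \text{Wishart}_{\nu}(S)$
ν: degrees of freedom; S: scale matrix (k × k), symmetric, positive definite
$$p(W \mid \nu, S) \propto |W|^{(\nu-k-1)/2} \exp\left\{-\frac{1}{2}\,\text{tr}(S^{-1}W)\right\}$$
$E(W \mid \nu, S) = \nu S$
Inverse Wishart Distribution
$W \mid \nu, S \sim \text{Inv-Wishart}_{\nu}(S^{-1})$
ν: degrees of freedom; S: scale matrix (k × k), symmetric, positive definite
$$p(W \mid \nu, S) \propto |W|^{-(\nu+k+1)/2} \exp\left\{-\frac{1}{2}\,\text{tr}(S W^{-1})\right\}$$
$E(W \mid \nu, S) = (\nu-k-1)^{-1} S$ (for ν > k + 1)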
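A Monte Carlo sketch of both expectations for k = 2, in pure Python. It assumes an integer ν ≥ k, so a Wishart_ν(S) draw can be built as Σᵢ xᵢxᵢᵀ with ν independent xᵢ ~ N₂(0, S). It also relies on the fact that W ~ Inv-Wishart_ν(S⁻¹) exactly when W⁻¹ ~ Wishart_ν(S⁻¹), which matches the density above after the change of variables. All helper names, ν = 10, S and the seed are illustrative.

```python
import random
from math import sqrt


def chol2(S):
    """Cholesky factor of a 2 x 2 symmetric positive-definite matrix."""
    l11 = sqrt(S[0][0])
    l21 = S[1][0] / l11
    return [[l11, 0.0], [l21, sqrt(S[1][1] - l21 * l21)]]


def inv2(A):
    """Inverse of a 2 x 2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]


def wishart2(nu, S, rng):
    """One Wishart_nu(S) draw: sum of nu outer products x x^T with x ~ N_2(0, S)."""
    L = chol2(S)
    W = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(nu):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = (L[0][0] * z1, L[1][0] * z1 + L[1][1] * z2)  # x = L z ~ N(0, S)
        for r in range(2):
            for c in range(2):
                W[r][c] += x[r] * x[c]
    return W


def mc_mean(draw, reps):
    """Element-wise average of reps matrix draws."""
    acc = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(reps):
        W = draw()
        for r in range(2):
            for c in range(2):
                acc[r][c] += W[r][c] / reps
    return acc


rng = random.Random(5)  # fixed seed for reproducibility
nu, k = 10, 2
S = [[2.0, 0.5], [0.5, 1.0]]
wish_mean = mc_mean(lambda: wishart2(nu, S, rng), 10_000)  # close to nu * S
# Inv-Wishart_nu(S^-1): invert a Wishart_nu(S^-1) draw; mean close to S / (nu - k - 1)
iw_mean = mc_mean(lambda: inv2(wishart2(nu, inv2(S), rng)), 10_000)
print(wish_mean, iw_mean)
```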