
List of Probability Distributions

compiled by Mark Herkommer


Contents

0.1 List of probability distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1


0.1.1 Discrete distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
0.1.2 Continuous distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
0.1.3 Mixed discrete/continuous distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
0.1.4 Joint distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
0.1.5 Non-numeric distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
0.1.6 Miscellaneous distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
0.1.7 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1 Discrete Distributions - With Finite Support 16


1.1 Bernoulli distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.1.1 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.1.2 Related distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.1.3 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.1.4 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.1.5 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.1.6 External links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.2 Rademacher distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.2.1 Mathematical formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.2.2 van Zuijlen’s bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.2.3 Bounds on sums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.2.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.2.5 Related distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.2.6 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.3 Binomial distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.3.1 Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.3.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.3.3 Mean and variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.3.4 Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.3.5 Median . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23


1.3.6 Covariance between two binomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24


1.3.7 Related distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.3.8 Confidence intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.3.9 Generating binomial random variates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.3.10 Tail Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.3.11 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.3.12 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.3.13 External links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.4 Beta-binomial distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.4.1 Motivation and derivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.4.2 Moments and properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.4.3 Point estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.4.4 Further Bayesian considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.4.5 Shrinkage factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.4.6 Related distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.4.7 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.4.8 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
1.4.9 External links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
1.5 Degenerate distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
1.5.1 Constant random variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
1.6 Hypergeometric distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
1.6.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
1.6.2 Combinatorial identities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
1.6.3 Application and example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
1.6.4 Symmetries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
1.6.5 Hypergeometric test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
1.6.6 Order of draws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.6.7 Related distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.6.8 Multivariate hypergeometric distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.6.9 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
1.6.10 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
1.6.11 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
1.6.12 External links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
1.7 Poisson binomial distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
1.7.1 Mean and variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
1.7.2 Probability mass function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
1.7.3 Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
1.7.4 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

1.7.5 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
1.8 Fisher’s noncentral hypergeometric distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
1.8.1 Univariate distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
1.8.2 Multivariate distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
1.8.3 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
1.8.4 Software available . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
1.8.5 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
1.8.6 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
1.9 Wallenius’ noncentral hypergeometric distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
1.9.1 Univariate distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
1.9.2 Multivariate distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
1.9.3 Complementary Wallenius’ noncentral hypergeometric distribution . . . . . . . . . . . . . . . 51
1.9.4 Software available . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
1.9.5 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
1.9.6 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
1.10 Benford’s law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
1.10.1 Mathematical statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
1.10.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
1.10.3 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
1.10.4 Explanations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
1.10.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
1.10.6 Statistical tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
1.10.7 Generalization to digits beyond the first . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
1.10.8 Tests with common distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
1.10.9 Distributions known to obey Benford’s law . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
1.10.10 Distributions known to not obey Benford’s law . . . . . . . . . . . . . . . . . . . . . . . . . . 58
1.10.11 Criteria for distributions expected and not expected to obey Benford’s Law . . . . . . . . . . . 58
1.10.12 Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
1.10.13 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
1.10.14 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
1.10.15 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
1.10.16 External links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

2 Continuous Distributions - Supported on semi-infinite intervals, usually [0,∞) 72


2.1 Beta prime distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
2.1.1 Generalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
2.1.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
2.1.3 Related distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
2.1.4 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

2.1.5 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
2.2 Birnbaum–Saunders distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
2.2.1 Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
2.2.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
2.2.3 Probability density function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
2.2.4 Standard fatigue life distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
2.2.5 Cumulative distribution function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
2.2.6 Quantile function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.2.7 External links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.2.8 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.3 Chi distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.3.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
2.3.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
2.3.3 Related distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
2.3.4 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
2.3.5 External links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
2.4 Chi-squared distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
2.4.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
2.4.2 Introduction to the chi-squared distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
2.4.3 Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
2.4.4 Relation to other distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
2.4.5 Generalizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
2.4.6 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
2.4.7 Table of χ² value vs p-value . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
2.4.8 History and name . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
2.4.9 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
2.4.10 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
2.4.11 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
2.4.12 External links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
2.5 Dagum distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
2.5.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
2.5.2 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
2.5.3 External links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
2.6 Exponential distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
2.6.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
2.6.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
2.6.3 Parameter estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
2.6.4 Generating exponential variates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

2.6.5 Related distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100


2.6.6 Applications of exponential distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
2.6.7 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
2.6.8 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
2.6.9 External links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
2.7 F-distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
2.7.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
2.7.2 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
2.7.3 Generalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
2.7.4 Related distributions and properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
2.7.5 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
2.7.6 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
2.7.7 External links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
2.8 Fisher’s z-distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
2.8.1 Related Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
2.8.2 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
2.8.3 External links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
2.9 Folded normal distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
2.9.1 Differential equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
2.9.2 Related distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
2.9.3 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
2.9.4 External links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
2.9.5 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
2.10 Fréchet distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
2.10.1 Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
2.10.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
2.10.3 Related distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
2.10.4 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
2.10.5 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
2.10.6 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
2.10.7 Publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
2.10.8 External links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
2.11 Gamma distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
2.11.1 Characterization using shape k and scale θ . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
2.11.2 Characterization using shape α and rate β . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
2.11.3 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
2.11.4 Parameter estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
2.11.5 Generating gamma-distributed random variables . . . . . . . . . . . . . . . . . . . . . . . . . 120

2.11.6 Related distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122


2.11.7 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
2.11.8 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
2.11.9 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
2.11.10 External links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
2.12 Generalized gamma distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
2.12.1 Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
2.12.2 Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
2.12.3 Kullback-Leibler divergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
2.12.4 Software implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
2.12.5 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
2.12.6 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
2.13 Generalized Pareto distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
2.13.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
2.13.2 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
2.13.3 Characteristic and Moment Generating Functions . . . . . . . . . . . . . . . . . . . . . . . . 128
2.13.4 Special cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
2.13.5 Generating generalized Pareto random variables . . . . . . . . . . . . . . . . . . . . . . . . . 128
2.13.6 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
2.13.7 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
2.13.8 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
2.13.9 External links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
2.14 Gamma/Gompertz distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
2.14.1 Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
2.14.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
2.14.3 Related distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
2.14.4 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
2.14.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
2.14.6 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
2.15 Gompertz distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
2.15.1 Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
2.15.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
2.15.3 Related distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
2.15.4 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
2.15.5 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
2.15.6 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
2.16 Half-normal distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
2.16.1 Parameter estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

2.16.2 Related distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135


2.16.3 External links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
2.16.4 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
2.17 Hotelling’s T-squared distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
2.17.1 The distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
2.17.2 Hotelling’s T-squared statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
2.17.3 Hotelling’s two-sample T-squared statistic . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
2.17.4 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
2.17.5 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
2.17.6 External links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
2.18 Inverse Gaussian distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
2.18.1 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
2.18.2 Relationship with Brownian motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
2.18.3 Maximum likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
2.18.4 Generating random variates from an inverse-Gaussian distribution . . . . . . . . . . . . . . . . 141
2.18.5 Related distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
2.18.6 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
2.18.7 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
2.18.8 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
2.18.9 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
2.18.10 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
2.18.11 External links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
2.19 Lévy distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
2.19.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
2.19.2 Related distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
2.19.3 Random sample generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
2.19.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
2.19.5 Footnotes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
2.19.6 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
2.19.7 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
2.19.8 External links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
2.20 Log-Cauchy distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
2.20.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
2.20.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
2.20.3 Estimating parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
2.20.4 Uses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
2.20.5 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
2.21 Log-Laplace distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

2.21.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148


2.21.2 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
2.22 Log-logistic distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
2.22.1 Characterisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
2.22.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
2.22.3 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
2.22.4 Related distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
2.22.5 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
2.22.6 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
2.23 Log-normal distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
2.23.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
2.23.2 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
2.23.3 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
2.23.4 Occurrence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
2.23.5 Maximum likelihood estimation of parameters . . . . . . . . . . . . . . . . . . . . . . . . . 158
2.23.6 Multivariate log-normal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
2.23.7 Related distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
2.23.8 Similar distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
2.23.9 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
2.23.10 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
2.23.11 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
2.23.12 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
2.23.13 External links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
2.24 Lomax distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
2.24.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
2.24.2 Relation to the Pareto distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
2.24.3 Relation to generalized Pareto distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
2.24.4 Relation to q-exponential distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
2.24.5 Non-central moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
2.24.6 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
2.24.7 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
2.25 Geometric stable distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
2.25.1 Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
2.25.2 Relationship to the stable distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
2.25.3 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
2.26 Nakagami distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
2.26.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
2.26.2 Parameter estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

2.26.3 Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166


2.26.4 History and applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
2.26.5 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
2.27 Pareto distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
2.27.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
2.27.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
2.27.3 Generalized Pareto distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
2.27.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
2.27.5 Relation to other distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
2.27.6 Lorenz curve and Gini coefficient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
2.27.7 Parameter estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
2.27.8 Graphical representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
2.27.9 Random sample generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
2.27.10 Variants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
2.27.11 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
2.27.12 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
2.27.13 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
2.27.14 External links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
2.28 Pearson distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
2.28.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
2.28.2 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
2.28.3 Particular types of distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
2.28.4 Relation to other distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
2.28.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
2.28.6 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
2.28.7 Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
2.29 Phase-type distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
2.29.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
2.29.2 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
2.29.3 Special cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
2.29.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
2.29.5 Generating samples from phase-type distributed random variables . . . . . . . . . . . . . . . . 188
2.29.6 Approximating other distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
2.29.7 Fitting a phase type distribution to data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
2.29.8 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
2.29.9 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
2.30 Rayleigh distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
2.30.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190

2.30.2 Relation to random vector lengths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191


2.30.3 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
2.30.4 Parameter estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
2.30.5 Generating random variates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
2.30.6 Related distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
2.30.7 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
2.30.8 Proof of correctness – Unequal variances . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
2.30.9 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
2.30.10 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
2.31 Rayleigh mixture distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
2.31.1 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
2.31.2 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
2.32 Rice distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
2.32.1 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
2.32.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
2.32.3 Related distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
2.32.4 Limiting cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
2.32.5 Parameter estimation (the Koay inversion technique) . . . . . . . . . . . . . . . . . . . . . . . 198
2.32.6 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
2.32.7 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
2.32.8 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
2.32.9 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
2.32.10 External links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
2.33 Shifted Gompertz distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
2.33.1 Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
2.33.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
2.33.3 Related distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
2.33.4 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
2.33.5 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
2.34 Type-2 Gumbel distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
2.34.1 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
2.35 Weibull distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
2.35.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
2.35.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
2.35.3 Weibull plot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
2.35.4 Related distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
2.35.5 See also . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
2.35.6 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

2.35.7 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209


2.35.8 External links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210

3 Text and image sources, contributors, and licenses 227


3.1 Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
3.2 Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
3.3 Content license . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236

0.1 List of probability distributions


Many probability distributions are so important in theory or applications that they have been given specific names.

0.1.1 Discrete distributions


[Figure: Binomial distribution probability mass functions for p = 0.5, n = 20; p = 0.7, n = 20; and p = 0.5, n = 40.]

With finite support

• The Bernoulli distribution, which takes value 1 with probability p and value 0 with probability q = 1 − p.

• The Rademacher distribution, which takes value 1 with probability 1/2 and value −1 with probability 1/2.

• The binomial distribution, which describes the number of successes in a series of independent Yes/No experiments,
all with the same probability of success (several of the finite-support families here are evaluated in a short sketch
at the end of this list).

• The beta-binomial distribution, which describes the number of successes in a series of independent Yes/No exper-
iments with heterogeneity in the success probability.

• The degenerate distribution at x0, where X is certain to take the value x0. This does not look random, but it satisfies
the definition of a random variable. This is useful because it puts deterministic variables and random variables in the
same formalism.

• The discrete uniform distribution, where all elements of a finite set are equally likely. This is the theoretical
distribution model for a balanced coin, an unbiased die, a casino roulette, or the first card of a well-shuffled deck.

[Figure: Degenerate distribution]

• The hypergeometric distribution, which describes the number of successes in the first m of a series of n consec-
utive Yes/No experiments, if the total number of successes is known. This distribution arises when sampling is done
without replacement.

• The Poisson binomial distribution, which describes the number of successes in a series of independent Yes/No
experiments with different success probabilities.

• Fisher’s noncentral hypergeometric distribution

• Wallenius’ noncentral hypergeometric distribution

• Benford’s law, which describes the frequency of the first digit of many naturally occurring data.
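
Several of the finite-support families above can be evaluated directly. The following is a minimal sketch, assuming
NumPy and SciPy are available; all parameter values are arbitrary illustrations, not values from this document:

    # Minimal sketch: pmfs of a few finite-support discrete distributions.
    # Assumes numpy and scipy; parameter values are arbitrary examples.
    import numpy as np
    from scipy import stats

    # Bernoulli(p): Pr(X = 1) = p, Pr(X = 0) = 1 - p.
    print(stats.bernoulli.pmf([0, 1], 0.3))    # [0.7, 0.3]

    # Binomial(n, p): number of successes in n independent trials.
    print(stats.binom.pmf(10, 20, 0.5))        # Pr(X = 10), about 0.176

    # Hypergeometric: 2 marked items in a sample of 5 drawn without
    # replacement from a population of 50 that contains 10 marked items.
    print(stats.hypergeom.pmf(2, 50, 10, 5))

    # Benford's law: Pr(first digit = d) = log10(1 + 1/d).
    print(np.log10(1 + 1 / np.arange(1, 10)))  # 0.301, 0.176, ...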

With infinite support

• The beta negative binomial distribution

• The Boltzmann distribution, a discrete distribution important in statistical physics which describes the probabilities
of the various discrete energy levels of a system in thermal equilibrium. It has a continuous analogue. Special cases
include:

• The Gibbs distribution


• The Maxwell–Boltzmann distribution

[Figure: Poisson distribution]

• The Borel distribution


• The extended negative binomial distribution
• The extended hypergeometric distribution
• The generalized log-series distribution
• The geometric distribution, a discrete distribution which describes the number of attempts needed to get the first
success in a series of independent Bernoulli trials, or alternatively the number of failures before the first success
(i.e. one fewer); both conventions are simulated in the sketch after this list.
• The logarithmic (series) distribution
• The negative binomial distribution, or Pascal distribution, a generalization of the geometric distribution to the nth
success.
• The parabolic fractal distribution
• The Poisson distribution, which describes the number of events occurring in a fixed time interval when there are
very many opportunities for individually unlikely events. Related to this distribution are a number of other
distributions: the displaced Poisson, the hyper-Poisson, the general Poisson binomial and the Poisson type distributions.
• The Conway–Maxwell–Poisson distribution, a two-parameter extension of the Poisson distribution with an
adjustable rate of decay.

[Figure: Skellam distribution]

• The Zero-truncated Poisson distribution, for processes in which zero counts are not observed

• The Polya–Eggenberger distribution

• The Skellam distribution, the distribution of the difference between two independent Poisson-distributed random
variables.

• The skew elliptical distribution

• The Yule–Simon distribution

• The zeta distribution has uses in applied statistics and statistical mechanics, and may be of interest to
number theorists. It is the Zipf distribution for an infinite number of elements.

• Zipf’s law or the Zipf distribution. A discrete power-law distribution, the most famous example of which is the
description of the frequency of words in the English language.

• The Zipf–Mandelbrot law is a discrete power law distribution which is a generalization of the Zipf distribution.
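
The Skellam and geometric entries above are easy to check numerically. A minimal sketch, assuming NumPy and SciPy;
the parameter values are arbitrary examples:

    # Minimal sketch: Skellam as the difference of two independent
    # Poisson variables, plus the two geometric-distribution conventions.
    # Assumes numpy and scipy; mu1, mu2 and p are arbitrary examples.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    mu1, mu2 = 4.0, 2.5
    d = rng.poisson(mu1, 100_000) - rng.poisson(mu2, 100_000)
    print(d.mean(), mu1 - mu2)    # sample mean ~ mu1 - mu2
    print(d.var(), mu1 + mu2)     # sample variance ~ mu1 + mu2
    print(np.mean(d == 0), stats.skellam.pmf(0, mu1, mu2))

    # NumPy's geometric counts trials up to and including the first
    # success (support 1, 2, ...); subtracting 1 gives the "failures
    # before the first success" convention.
    g = rng.geometric(0.25, 100_000)
    print(g.mean(), 1 / 0.25)            # ~ 1/p
    print((g - 1).mean(), 0.75 / 0.25)   # ~ (1 - p)/p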

0.1.2 Continuous distributions


Supported on a bounded interval

• The Arcsine distribution on [a,b], which is a special case of the Beta distribution if a=0 and b=1.

[Figure: Beta distribution]

• The Beta distribution on [0,1], a family of two-parameter distributions with one mode, of which the uniform
distribution is a special case, and which is useful in estimating success probabilities.
• The logit-normal distribution on (0,1).
• The Dirac delta function, although not strictly a function, is a limiting form of many continuous probability functions.
It represents a discrete probability distribution concentrated at 0 (a degenerate distribution), but the notation
treats it as if it were a continuous distribution.
• The continuous uniform distribution on [a,b], where all points in a finite interval are equally likely.
• The rectangular distribution is a uniform distribution on [−1/2,1/2].
• The Irwin–Hall distribution is the distribution of the sum of n i.i.d. U(0,1) random variables.
• The Bates distribution is the distribution of the mean of n i.i.d. U(0,1) random variables (both definitions are
verified in the simulation sketch after this list).
• The Kent distribution on the three-dimensional sphere.
• The Kumaraswamy distribution is as versatile as the Beta distribution but has simple closed forms for both the cdf
and the pdf.
• The logarithmic distribution (continuous)
• The Marchenko–Pastur distribution is important in the theory of random matrices.
• The PERT distribution is a special case of the beta distribution

[Figure: Continuous uniform distribution]

• The raised cosine distribution on [ µ − s, µ + s ]

• The reciprocal distribution

• The triangular distribution on [a, b], a special case of which is the distribution of the sum of two independent
uniformly distributed random variables (the convolution of two uniform distributions).

• The trapezoidal distribution

• The truncated normal distribution on [a, b].

• The U-quadratic distribution on [a, b].

• The von Mises distribution on the circle.

• The von Mises-Fisher distribution on the N-dimensional sphere has the von Mises distribution as a special case.

• The Wigner semicircle distribution is important in the theory of random matrices.
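
The Irwin–Hall and Bates definitions lend themselves to a quick simulation check. A minimal sketch, assuming NumPy,
with n = 12 chosen arbitrarily:

    # Minimal sketch: Irwin-Hall = sum of n i.i.d. U(0,1) variables,
    # Bates = their mean, checked against the known moments. Assumes numpy.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 12
    u = rng.uniform(size=(100_000, n))

    irwin_hall = u.sum(axis=1)   # mean n/2, variance n/12
    bates = u.mean(axis=1)       # mean 1/2, variance 1/(12 n)

    print(irwin_hall.mean(), n / 2)
    print(irwin_hall.var(), n / 12)
    print(bates.var(), 1 / (12 * n))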

Supported on semi-infinite intervals, usually [0,∞)

• The Beta prime distribution



[Figure: Chi-squared distribution]

• The Birnbaum–Saunders distribution, also known as the fatigue life distribution, is a probability distribution used
extensively in reliability applications to model failure times.

• The chi distribution

• The noncentral chi distribution

• The chi-squared distribution, which is the distribution of the sum of the squares of n independent standard normal
random variables. It is a special case of the Gamma distribution, and it is used in goodness-of-fit tests in statistics.

• The inverse-chi-squared distribution


• The noncentral chi-squared distribution
• The Scaled-inverse-chi-squared distribution

• The Dagum distribution

• The exponential distribution, which describes the time between consecutive rare random events in a process with
no memory (an inverse-CDF sampler for this and the Weibull distribution appears after this list).

• The F-distribution, which is the distribution of the ratio of two (normalized) chi-squared-distributed random vari-
ables, used in the analysis of variance. It is referred to as the beta prime distribution when it is the ratio of two
chi-squared variates which are not normalized by dividing them by their numbers of degrees of freedom.

• The noncentral F-distribution



[Figure: Gamma distribution densities for several shape/scale pairs: k = 1.0, θ = 2.0; k = 2.0, θ = 2.0; k = 3.0, θ = 2.0; k = 5.0, θ = 1.0; k = 9.0, θ = 0.5; k = 7.5, θ = 1.0; k = 0.5, θ = 1.0.]

• Fisher’s z-distribution

• The folded normal distribution

• The Fréchet distribution

• The Gamma distribution, which describes the time until n consecutive rare random events occur in a process with
no memory.

• The Erlang distribution, which is a special case of the gamma distribution with integral shape parameter,
developed to predict waiting times in queuing systems
• The inverse-gamma distribution

• The Generalized gamma distribution

• The generalized Pareto distribution

• The Gamma/Gompertz distribution

• The Gompertz distribution

• The half-normal distribution

• Hotelling’s T-squared distribution

• The inverse Gaussian distribution, also known as the Wald distribution



[Figure: Pareto distribution]

• The Lévy distribution

• The log-Cauchy distribution

• The log-Laplace distribution

• The log-logistic distribution

• The log-normal distribution, describing variables which can be modelled as the product of many small independent
positive variables.

• The Lomax distribution

• The Mittag–Leffler distribution

• The Nakagami distribution



• The Pareto distribution, or “power law” distribution, used in the analysis of financial data and critical behavior.

• The Pearson Type III distribution

• The Phase-type distribution, used in queueing theory

• The phased bi-exponential distribution is commonly used in pharmacokinetics

• The phased bi-Weibull distribution

• The Rayleigh distribution

• The Rayleigh mixture distribution

• The Rice distribution

• The shifted Gompertz distribution

• The type-2 Gumbel distribution

• The Weibull distribution or Rosin–Rammler distribution, of which the exponential distribution is a special case,
is used to model the lifetime of technical devices and to describe the particle size distribution of particles
generated by grinding, milling and crushing operations.
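
Several distributions in this list have closed-form quantile functions, which makes inverse-CDF sampling
straightforward. A minimal sketch, assuming NumPy, for the exponential and Weibull cases; the rate, shape and scale
values are arbitrary examples:

    # Minimal sketch: inverse-CDF sampling via closed-form quantile
    # functions. Assumes numpy; lam, k and s are arbitrary examples.
    import numpy as np
    from math import gamma

    rng = np.random.default_rng(0)
    u = rng.uniform(size=100_000)

    # Exponential(rate lam): F^{-1}(u) = -ln(1 - u) / lam.
    lam = 2.0
    x_exp = -np.log1p(-u) / lam
    print(x_exp.mean(), 1 / lam)    # sample mean ~ 1/lam

    # Weibull(shape k, scale s): F^{-1}(u) = s * (-ln(1 - u))**(1/k),
    # with mean s * Gamma(1 + 1/k).
    k, s = 1.5, 3.0
    x_wbl = s * (-np.log1p(-u)) ** (1 / k)
    print(x_wbl.mean(), s * gamma(1 + 1 / k))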

[Figure: Cauchy distribution]

[Figure: Laplace distribution]

Supported on the whole real line

• The Behrens–Fisher distribution, which arises in the Behrens–Fisher problem.

• The Cauchy distribution, an example of a distribution which does not have an expected value or a variance. In
physics it is usually called a Lorentzian profile, and is associated with many processes, including resonance energy
distribution, impact and natural spectral line broadening, and quadratic Stark line broadening.

• Chernoff’s distribution

• The Exponentially modified Gaussian distribution, a convolution of a normal distribution with an exponential dis-
tribution.

• The Fisher–Tippett, extreme value, or log-Weibull distribution

• Fisher’s z-distribution

• The skewed generalized t distribution

• The generalized logistic distribution

• The generalized normal distribution

• The geometric stable distribution

• The Gumbel distribution



[Figure: Stable distribution]

• The Holtsmark distribution, an example of a distribution that has a finite expected value but infinite variance.
• The hyperbolic distribution
• The hyperbolic secant distribution
• The Johnson SU distribution
• The Landau distribution
• The Laplace distribution
• The Lévy skew alpha-stable distribution or stable distribution is a family of distributions often used to characterize
financial data and critical behavior; the Cauchy distribution, Holtsmark distribution, Landau distribution, Lévy
distribution and normal distribution are special cases.
• The Linnik distribution
• The logistic distribution
• The map-Airy distribution
• The normal distribution, also called the Gaussian or the bell curve. It is ubiquitous in nature and statistics due to
the central limit theorem: every variable that can be modelled as a sum of many small independent, identically
distributed variables with finite mean and variance is approximately normal (illustrated numerically in the sketch
after this list).
• The Normal-exponential-gamma distribution

• The Normal-inverse Gaussian distribution

• The Pearson Type IV distribution (see Pearson distributions)

• The skew normal distribution

• Student’s t-distribution, useful for estimating unknown means of Gaussian populations.

• The noncentral t-distribution


• The skew t distribution

• The type-1 Gumbel distribution

• The Tracy–Widom distribution

• The Voigt distribution, or Voigt profile, is the convolution of a normal distribution and a Cauchy distribution. It is
found in spectroscopy when spectral line profiles are broadened by a mixture of Lorentzian and Doppler broadening
mechanisms.

• The Gaussian minus exponential distribution is a convolution of a normal distribution with (minus) an exponential
distribution.
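
The central limit theorem behind the normal distribution's ubiquity can be seen numerically. A minimal sketch,
assuming NumPy; exponential summands are chosen only because their mean and variance are both 1:

    # Minimal sketch: standardized sums of i.i.d. exponential variables
    # behave approximately like a standard normal. Assumes numpy.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    x = rng.exponential(scale=1.0, size=(20_000, n))  # mean 1, variance 1

    z = (x.sum(axis=1) - n) / np.sqrt(n)              # standardized sums

    # Compare empirical tail probabilities with the standard normal's.
    for t in (1.0, 2.0):
        print(t, np.mean(z > t))   # ~0.159 and ~0.023 for N(0, 1)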

With variable support

• The generalized extreme value distribution has a finite upper bound or a finite lower bound, depending on what
range the value of one of the parameters of the distribution is in (and is supported on the whole real line for one
special value of that parameter).

• The generalized Pareto distribution has a support which is either bounded below only, or bounded both above and
below

• The Tukey lambda distribution is either supported on the whole real line, or on a bounded interval, depending on
what range the value of one of the parameters of the distribution is in.

• The Wakeby distribution

0.1.3 Mixed discrete/continuous distributions

• The rectified Gaussian distribution replaces negative values from a normal distribution with a discrete component
at zero.

• The compound Poisson-gamma or Tweedie distribution is continuous over the strictly positive real numbers, with a mass at zero.

0.1.4 Joint distributions

For any set of independent random variables the probability density function of their joint distribution is the product of
their individual density functions.

Two or more random variables on the same sample space

• The Dirichlet distribution, a generalization of the beta distribution.

• Ewens’s sampling formula is a probability distribution on the set of all partitions of an integer n, arising in population genetics.

• The Balding–Nichols model

• The multinomial distribution, a generalization of the binomial distribution.

• The multivariate normal distribution, a generalization of the normal distribution.

• The multivariate t-distribution, a generalization of the Student’s t-distribution.

• The negative multinomial distribution, a generalization of the negative binomial distribution.

• The generalized multivariate log-gamma distribution

Matrix-valued distributions

• The Wishart distribution

• The inverse-Wishart distribution

• The matrix normal distribution

• The matrix t-distribution

0.1.5 Non-numeric distributions


• The categorical distribution

0.1.6 Miscellaneous distributions


• The Cantor distribution

• The generalized logistic distribution family

• The Pearson distribution family

• The phase-type distribution

0.1.7 See also


• Mixture distribution

• Cumulative distribution function

• Likelihood function

• List of statistical topics

• Probability density function

• Random variable

• Histogram

• Truncated distribution

• Copula (statistics)
• Probability distribution

• Relationships among probability distributions


Chapter 1

Discrete Distributions - With Finite Support

1.1 Bernoulli distribution


In probability theory and statistics, the Bernoulli distribution, named after Swiss scientist Jacob Bernoulli, is the
probability distribution of a random variable which takes the value 1 with success probability of p and the value 0 with
failure probability of q = 1 − p . It can be used to represent a coin toss where 1 and 0 would represent “head” and “tail”
(or vice versa), respectively. In particular, unfair coins would have p ≠ 0.5.
The Bernoulli distribution is a special case of the two-point distribution, for which the two possible outcomes need not
be 0 and 1.

1.1.1 Properties
If X is a random variable with this distribution, we have:

Pr(X = 1) = 1 − Pr(X = 0) = 1 − q = p.

The probability mass function f of this distribution, over possible outcomes k, is

f(k; p) = p if k = 1,
f(k; p) = 1 − p if k = 0.

This can also be expressed as

f(k; p) = p^k (1 − p)^(1−k) for k ∈ {0, 1}.

The expected value of a Bernoulli random variable X is

E (X) = p

and its variance is

Var (X) = p (1 − p) .


(Figure: plot of the Bernoulli distribution probability mass function.)

The Bernoulli distribution is a special case of the binomial distribution with n = 1.[1]
The kurtosis goes to infinity for high and low values of p, but for p = 1/2 the two-point distributions, including the Bernoulli distribution, have a lower excess kurtosis than any other probability distribution, namely −2.
The Bernoulli distributions for 0 ≤ p ≤ 1 form an exponential family.
The maximum likelihood estimator of p based on a random sample is the sample mean.
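As a quick sanity check on these properties, here is a minimal sketch in Python (standard library only; the function name and sample size are our own illustrative choices): drawing Bernoulli samples, the sample mean recovers p and the plug-in variance recovers p(1 − p).

    import random

    def bernoulli_sample(p, n, rng=random):
        """Draw n Bernoulli(p) variates as 0/1 integers."""
        return [1 if rng.random() < p else 0 for _ in range(n)]

    xs = bernoulli_sample(p=0.3, n=100_000)
    p_hat = sum(xs) / len(xs)       # sample mean = maximum likelihood estimate of p
    var_hat = p_hat * (1 - p_hat)   # plug-in estimate of Var(X) = p(1 - p)
    print(p_hat, var_hat)           # close to 0.3 and 0.21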

1.1.2 Related distributions

• If X1 , . . . , Xn are independent, identically distributed (i.i.d.) random variables, all Bernoulli distributed with
success probability p, then

Y = ∑_{k=1}^{n} X_k ∼ B(n, p) (binomial distribution).

The Bernoulli distribution is simply B(1, p) .

• The categorical distribution is the generalization of the Bernoulli distribution for variables with any constant number
of discrete values.

• The Beta distribution is the conjugate prior of the Bernoulli distribution.

• The geometric distribution models the number of independent and identical Bernoulli trials needed to get one
success.

• If Y ~ Bernoulli(0.5), then (2Y−1) has a Rademacher distribution.



1.1.3 See also


• Bernoulli process
• Bernoulli sampling
• Bernoulli trial
• Binary entropy function
• Binomial Distribution

1.1.4 Notes
[1] McCullagh and Nelder (1989), Section 4.2.2.

1.1.5 References
• McCullagh, Peter; Nelder, John (1989). Generalized Linear Models, Second Edition. Boca Raton: Chapman and
Hall/CRC. ISBN 0-412-31760-5.

• Johnson, N.L., Kotz, S., Kemp A. (1993) Univariate Discrete Distributions (2nd Edition). Wiley. ISBN 0-471-
54897-9


1.1.6 External links


• Hazewinkel, Michiel, ed. (2001), “Binomial distribution”, Encyclopedia of Mathematics, Springer, ISBN 978-1-
55608-010-4
• Weisstein, Eric W., “Bernoulli Distribution”, MathWorld.
• Interactive graphic: Univariate Distribution Relationships

1.2 Rademacher distribution


In probability theory and statistics, the Rademacher distribution (which is named after Hans Rademacher) is a discrete
probability distribution where a random variate X has a 50% chance of being either +1 or −1.[1]
A series of Rademacher distributed variables can be regarded as a simple symmetrical random walk where the step size
is 1.

1.2.1 Mathematical formulation


The probability mass function of this distribution is

f(k) = 1/2 if k = −1,
f(k) = 1/2 if k = +1,
f(k) = 0 otherwise.

It can also be written as a probability density function, in terms of the Dirac delta function, as

f(k) = (1/2) ( δ(k − 1) + δ(k + 1) ).

1.2.2 van Zuijlen’s bound


van Zuijlen has proved the following result.[2]
Let Xi be a set of independent Rademacher distributed random variables. Then

Pr( | ∑_{i=1}^{n} X_i | / √n ≤ 1 ) ≥ 0.5.
The bound is sharp and better than that which can be derived from the normal distribution (approximately Pr > 0.31).

1.2.3 Bounds on sums


Let { Xᵢ } be a set of random variables with a Rademacher distribution. Let { aᵢ } be a sequence of real numbers. Then

Pr( ∑_i X_i a_i > t‖a‖₂ ) ≤ e^(−t²/2)

where ‖a‖₂ is the Euclidean norm of the sequence { aᵢ }, t > 0 is a real number and Pr(Z) is the probability of event Z.[3]
Let Y = ∑ Xᵢaᵢ and let Y be an almost surely convergent series in a Banach space. Then for t > 0 and s ≥ 1 we have[4]

Pr( ‖Y‖ > st ) ≤ [ (1/c) Pr( ‖Y‖ > t ) ]^(cs²)

for some constant c.
Let p be a positive real number. Then[5]

c₁ ( ∑ |aᵢ|² )^(1/2) ≤ ( E[ |∑ aᵢXᵢ|^p ] )^(1/p) ≤ c₂ ( ∑ |aᵢ|² )^(1/2)

where c₁ and c₂ are constants depending only on p.
For p ≥ 1,

c₂ ≤ c₁ √p.
Another bound on the sums is known as the Bernstein inequalities.

1.2.4 Applications
The Rademacher distribution has been used in bootstrapping.
The Rademacher distribution can be used to show that normally distributed and uncorrelated does not imply independent.
Random vectors with components sampled independently from the Rademacher distribution are useful for various stochastic
approximations, for example:

• The Hutchinson trace estimator,[6] which can be used to efficiently approximate the trace of a matrix whose elements are not directly accessible, but rather implicitly defined via matrix-vector products (a minimal sketch follows this list).
• SPSA, a computationally cheap, derivative-free, stochastic gradient approximation, useful for numerical optimiza-
tion.
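As an illustration of the trace-estimation application above, the following is a minimal Hutchinson-style sketch in Python with NumPy. The function name and probe count are our own choices, not a fixed API; in real use the matrix would only be available implicitly through matrix-vector products.

    import numpy as np

    def hutchinson_trace(matvec, dim, n_probes=1000, rng=None):
        """Estimate trace(A) using only products A @ z with Rademacher
        probe vectors z, for which E[z^T (A z)] = trace(A)."""
        rng = np.random.default_rng() if rng is None else rng
        total = 0.0
        for _ in range(n_probes):
            z = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
            total += z @ matvec(z)
        return total / n_probes

    # Sanity check against an explicit matrix with trace 10 (normally A is implicit).
    A = np.diag([1.0, 2.0, 3.0, 4.0])
    print(hutchinson_trace(lambda v: A @ v, dim=4))  # ≈ 10.0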

1.2.5 Related distributions

• Bernoulli distribution: If X has a Rademacher distribution, then (X + 1)/2 has a Bernoulli(1/2) distribution.

• Laplace distribution: If X has a Rademacher distribution and Y ~ Exp(λ), then XY ~ Laplace(0, 1/λ).

1.2.6 References
[1] Hitczenko P, Kwapień S (1994) On the Rademacher series. Progress in probability 35: 31-36

[2] van Zuijlen Martien CA (2011) On a conjecture concerning the sum of independent Rademacher random variables. http:
//arxiv.org/abs/1112.4988

[3] Montgomery-Smith SJ (1990) The distribution of Rademacher sums. Proc Amer Math Soc 109: 517–522

[4] Dilworth SJ, Montgomery-Smith SJ (1993) The distribution of vector-valued Rademacher series. Ann Probab 21 (4): 2046–2052

[5] Khintchine A (1923) Über dyadische Brüche. Math Zeitschr 18: 109–116

[6] Avron, H. and Toledo, S. Randomized algorithms for estimating the trace of an implicit symmetric positive semidefinite matrix.
Journal of the ACM, 58(2):8, 2011.

1.3 Binomial distribution


“Binomial model” redirects here. For the binomial model in options pricing, see Binomial options pricing model.
See also: Negative binomial distribution
In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability dis-
tribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success
with probability p. A success/failure experiment is also called a Bernoulli experiment or Bernoulli trial; when n = 1, the
binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of
statistical significance.
The binomial distribution is frequently used to model the number of successes in a sample of size n drawn with replacement
from a population of size N. If the sampling is carried out without replacement, the draws are not independent and so
the resulting distribution is a hypergeometric distribution, not a binomial one. However, for N much larger than n, the
binomial distribution is a good approximation, and widely used.

1.3.1 Specification

Probability mass function

In general, if the random variable X follows the binomial distribution with parameters n ∈ ℕ and p ∈ [0,1], we write X ~
B(n, p). The probability of getting exactly k successes in n trials is given by the probability mass function:

f(k; n, p) = Pr(X = k) = (n choose k) p^k (1 − p)^(n−k)

for k = 0, 1, 2, ..., n, where

(n choose k) = n! / ( k! (n − k)! )

is the binomial coefficient, hence the name of the distribution.

(Figure: the binomial distribution for p = 0.5. The probability that a ball in a Galton box with 8 layers (n = 8) ends up in the central bin (k = 4) is 70/256, with n and k as in Pascal’s triangle.)

The formula can be understood as follows: we want exactly k successes (p^k) and n − k failures ((1 − p)^(n−k)). However, the k successes can occur anywhere among the n trials, and there are (n choose k) different ways of distributing k successes in a sequence of n trials.
In creating reference tables for binomial distribution probability, usually the table is filled in up to n/2 values. This is
because for k > n/2, the probability can be calculated by its complement as

f (k, n, p) = f (n − k, n, 1 − p).

Looking at the expression ƒ(k, n, p) as a function of k, there is a k value that maximizes it. This k value can be found by
calculating

f(k + 1, n, p) / f(k, n, p) = (n − k)p / ( (k + 1)(1 − p) )
and comparing it to 1. There is always an integer M that satisfies

(n + 1)p − 1 ≤ M < (n + 1)p.

ƒ(k, n, p) is monotone increasing for k < M and monotone decreasing for k > M, with the exception of the case where (n
+ 1)p is an integer. In this case, there are two values for which ƒ is maximal: (n + 1)p and (n + 1)p − 1. M is the most
probable (most likely) outcome of the Bernoulli trials and is called the mode. Note that the probability of it occurring can
be fairly small.
The following recurrence relation holds:

p(n − k) Prob(k) + (k + 1)(p − 1) Prob(k + 1) = 0, with Prob(0) = (1 − p)^n.

Cumulative distribution function

The cumulative distribution function can be expressed as:

F(k; n, p) = Pr(X ≤ k) = ∑_{i=0}^{⌊k⌋} (n choose i) p^i (1 − p)^(n−i)

where ⌊k⌋ is the “floor” under k, i.e. the greatest integer less than or equal to k.
It can also be represented in terms of the regularized incomplete beta function, as follows:[1]

F(k; n, p) = Pr(X ≤ k) = I_{1−p}(n − k, k + 1) = (n − k) (n choose k) ∫_0^{1−p} t^(n−k−1) (1 − t)^k dt.

Some closed-form bounds for the cumulative distribution function are given below.

1.3.2 Example

Suppose a biased coin comes up heads with probability 0.3 when tossed. What is the probability of achieving 0, 1,..., 6
heads after six tosses?

Pr(0 heads) = f(0) = Pr(X = 0) = (6 choose 0) 0.3^0 (1 − 0.3)^(6−0) ≈ 0.1176
Pr(1 head) = f(1) = Pr(X = 1) = (6 choose 1) 0.3^1 (1 − 0.3)^(6−1) ≈ 0.3025
Pr(2 heads) = f(2) = Pr(X = 2) = (6 choose 2) 0.3^2 (1 − 0.3)^(6−2) ≈ 0.3241
Pr(3 heads) = f(3) = Pr(X = 3) = (6 choose 3) 0.3^3 (1 − 0.3)^(6−3) ≈ 0.1852
Pr(4 heads) = f(4) = Pr(X = 4) = (6 choose 4) 0.3^4 (1 − 0.3)^(6−4) ≈ 0.0595
Pr(5 heads) = f(5) = Pr(X = 5) = (6 choose 5) 0.3^5 (1 − 0.3)^(6−5) ≈ 0.0102
Pr(6 heads) = f(6) = Pr(X = 6) = (6 choose 6) 0.3^6 (1 − 0.3)^(6−6) ≈ 0.0007 [2]

1.3.3 Mean and variance


If X ~ B(n, p), that is, X is a binomially distributed random variable, n being the total number of experiments and p the
probability of each experiment yielding a successful result, then the expected value of X is:[3]

E[X] = np,

(For example, if n=100, and p=1/4, then the average number of successful results will be 25)
The variance is:

Var[X] = np(1 − p).

1.3.4 Mode
Usually the mode of a binomial B(n, p) distribution is equal to ⌊(n + 1)p⌋ , where ⌊·⌋ is the floor function. However when
(n + 1)p is an integer and p is neither 0 nor 1, then the distribution has two modes: (n + 1)p and (n + 1)p − 1. When p is
equal to 0 or 1, the mode will be 0 and n correspondingly. These cases can be summarized as follows:



mode = ⌊(n + 1)p⌋ if (n + 1)p is 0 or a noninteger,
mode = (n + 1)p and (n + 1)p − 1 if (n + 1)p ∈ {1, ..., n},
mode = n if (n + 1)p = n + 1.

Proof

Let f(k) = (n choose k) p^k q^(n−k), with q = 1 − p. For p = 0 only f(0) has a nonzero value, with f(0) = 1, and for p = 1 we find f(n) = 1 and f(k) = 0 for k ≠ n. This proves that the mode is 0 for p = 0 and n for p = 1.

Let 0 < p < 1. We find f(k + 1)/f(k) = (n − k)p / ((k + 1)q). From this follows

k > (n + 1)p − 1 ⇒ f(k + 1) < f(k)
k = (n + 1)p − 1 ⇒ f(k + 1) = f(k)
k < (n + 1)p − 1 ⇒ f(k + 1) > f(k)

So when (n + 1)p − 1 is an integer, both (n + 1)p − 1 and (n + 1)p are modes. In the case that (n + 1)p − 1 ∉ ℤ, only ⌊(n + 1)p − 1⌋ + 1 = ⌊(n + 1)p⌋ is a mode.[4]

1.3.5 Median
In general, there is no single formula to find the median for a binomial distribution, and it may even be non-unique.
However several special results have been established:

• If np is an integer, then the mean, median, and mode coincide and equal np.[5][6]
• Any median m must lie within the interval ⌊np⌋ ≤ m ≤ ⌈np⌉.[7]
• A median m cannot lie too far away from the mean: |m − np| ≤ min{ ln 2, max{p, 1 − p} }.[8]
• The median is unique and equal to m = round(np) in cases when either p ≤ 1 − ln 2 or p ≥ ln 2 or |m − np| ≤ min{p,
1 − p} (except for the case when p = ½ and n is odd).[7][8]

• When p = 1/2 and n is odd, any number m in the interval ½(n − 1) ≤ m ≤ ½(n + 1) is a median of the binomial
distribution. If p = 1/2 and n is even, then m = n/2 is the unique median.

1.3.6 Covariance between two binomials

If two binomially distributed random variables X and Y are observed together, estimating their covariance can be useful.
Using the definition of covariance, in the case n = 1 (thus being Bernoulli trials) we have

Cov(X, Y ) = E(XY ) − µX µY .

The first term is non-zero only when both X and Y are one, and μX and μY are equal to the two probabilities. Defining
pB as the probability of both happening at the same time, this gives

Cov(X, Y ) = pB − pX pY ,

and for n independent pairwise trials

Cov(X, Y )n = n(pB − pX pY ).

If X and Y are the same variable, this reduces to the variance formula given above.

1.3.7 Related distributions

Sums of binomials

If X ~ B(n, p) and Y ~ B(m, p) are independent binomial variables with the same probability p, then X + Y is again a
binomial variable; its distribution is X+Y ~ B(n+m, p).
However, if X and Y do not have the same probability p, then the variance of the sum will be smaller than the variance
of a binomial variable distributed as B(n + m, p̄).

Conditional binomials

If X ~ B(n, p) and, conditional on X, Y ~ B(X, q), then Y is a simple binomial variable with distribution

Y ∼ B(n, pq).

For example, imagine throwing n balls into a basket U_X and taking the balls that hit and throwing them into another basket U_Y. If p is the probability to hit U_X then X ~ B(n, p) is the number of balls that hit U_X. If q is the probability to hit U_Y then the number of balls that hit U_Y is Y ~ B(X, q) and therefore Y ~ B(n, pq).

Bernoulli distribution

The Bernoulli distribution is a special case of the binomial distribution, where n = 1. Symbolically, X ~ B(1, p) has the
same meaning as X ~ Bern(p). Conversely, any binomial distribution, B(n, p), is the distribution of the sum of n Bernoulli
trials, Bern(p), each with the same probability p.

Poisson binomial distribution

The binomial distribution is a special case of the Poisson binomial distribution, which is a sum of n independent non-identical Bernoulli trials Bern(p_i).[9] If X has the Poisson binomial distribution with p_1 = … = p_n = p then X ~ B(n, p).

Normal approximation

(Figure: binomial probability mass function and normal probability density function approximation for n = 6 and p = 0.5.)

If n is large enough, then the skew of the distribution is not too great. In this case a reasonable approximation to B(n, p)
is given by the normal distribution

N (np, np(1 − p)),

and this basic approximation can be improved in a simple way by using a suitable continuity correction. The basic
approximation generally improves as n increases (at least 20) and is better when p is not near to 0 or 1.[10] Various rules
of thumb may be used to decide whether n is large enough, and p is far enough from the extremes of zero or one:

• One rule is that both np and n(1 − p) must be greater than 5. However, the specific number varies from source to source, and depends on how good an approximation one wants; some sources give 10, which gives virtually the same results as the following rule for large n until n is very large (ex: x = 11, n = 7752).
• A second rule[10] is that for n > 5 the normal approximation is adequate if

(1/√n) ( √((1 − p)/p) − √(p/(1 − p)) ) < 0.3

• Another commonly used rule holds that the normal approximation is appropriate only if everything within 3 standard
deviations of its mean is within the range of possible values, that is if


μ ± 3σ = np ± 3√(np(1 − p)) ∈ [0, n].

The following is an example of applying a continuity correction. Suppose one wishes to calculate Pr(X ≤ 8) for a binomial
random variable X. If Y has a distribution given by the normal approximation, then Pr(X ≤ 8) is approximated by Pr(Y
≤ 8.5). The addition of 0.5 is the continuity correction; the uncorrected normal approximation gives considerably less
accurate results.
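A small sketch makes the effect of the correction visible. This is plain Python with Φ computed from the error function; the parameters n = 20, p = 0.5 are our own illustrative choice, not taken from the text.

    from math import comb, erf, sqrt

    def binom_cdf(k, n, p):
        """Exact Pr(X <= k) for X ~ B(n, p)."""
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

    def norm_cdf(x, mu, sigma):
        """Phi((x - mu) / sigma) via the error function."""
        return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2))))

    n, p, k = 20, 0.5, 8
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    print(binom_cdf(k, n, p))            # exact:       ≈ 0.2517
    print(norm_cdf(k + 0.5, mu, sigma))  # corrected:   ≈ 0.2512
    print(norm_cdf(k, mu, sigma))        # uncorrected: ≈ 0.1855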
This approximation, known as de Moivre–Laplace theorem, is a huge time-saver when undertaking calculations by hand
(exact calculations with large n are very onerous); historically, it was the first use of the normal distribution, introduced in
Abraham de Moivre's book The Doctrine of Chances in 1738. Nowadays, it can be seen as a consequence of the central
limit theorem since B(n, p) is a sum of n independent, identically distributed Bernoulli variables with parameter p. This
fact is the basis of a hypothesis test, a “proportion z-test”, for the value of p using x/n, the sample proportion and estimator
of p, in a common test statistic.[11]
For example, suppose one randomly samples n people out of a large population and asks them whether they agree with a certain statement. The proportion of people who agree will of course depend on the sample. If groups of n people were sampled repeatedly and truly randomly, the proportions would follow an approximate normal distribution with mean equal to the true proportion p of agreement in the population and with standard deviation σ = (p(1 − p)/n)^(1/2).

Poisson approximation

The binomial distribution converges towards the Poisson distribution as the number of trials goes to infinity while the product np remains fixed. Therefore the Poisson distribution with parameter λ = np can be used as an approximation to the binomial distribution B(n, p) if n is sufficiently large and p is sufficiently small. According to two rules of thumb, this approximation is good if n ≥ 20 and p ≤ 0.05, or if n ≥ 100 and np ≤ 10.[12]

Limiting distributions

• Poisson limit theorem: As n approaches ∞ and p approaches 0 while np remains fixed at λ > 0 or at least np approaches
λ > 0, then the Binomial(n, p) distribution approaches the Poisson distribution with expected value λ.[12]
• de Moivre–Laplace theorem: As n approaches ∞ while p remains fixed, the distribution of

(X − np) / √(np(1 − p))

approaches the normal distribution with expected value 0 and variance 1. This result is sometimes loosely
stated by saying that the distribution of X is asymptotically normal with expected value np and variance np(1
− p). This result is a specific case of the central limit theorem.

Beta distribution

Beta distributions provide a family of conjugate prior probability distributions for binomial distributions in Bayesian
inference. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often
used to describe the distribution of a probability value p:[13]

P(p; α, β) = p^(α−1) (1 − p)^(β−1) / B(α, β)

1.3.8 Confidence intervals


Main article: Binomial proportion confidence interval

Even for quite large values of n, the actual distribution of the mean is significantly nonnormal.[14] Because of this problem
several methods to estimate confidence intervals have been proposed.
Let n1 be the number of successes out of n, the total number of trials, and let

p̂ = n₁ / n

be the proportion of successes. Let z_{α/2} be the 100(1 − α/2)th percentile of the standard normal distribution.

• Wald method


p̂ ± z_{α/2} √( p̂(1 − p̂) / n ).

A continuity correction of 0.5/n may be added.

• Agresti-Coull method[15]


p̃ ± z_{α/2} √( p̃(1 − p̃) / (n + z_{α/2}²) ).

Here the estimate of p is modified to

p̃ = ( n₁ + z_{α/2}²/2 ) / ( n + z_{α/2}² )

• ArcSine method[16]

sin²( arcsin(√p̂) ± z/(2√n) )

• Wilson (score) method[17]


( p̂ + z_{1−α/2}²/(2n) ± (z_{1−α/2}/(2n)) √( 4n p̂(1 − p̂) + z_{1−α/2}² ) ) / ( 1 + z_{1−α/2}²/n ).

The exact (Clopper-Pearson) method is the most conservative.[14] The Wald method, although commonly recommended in textbooks, is the most biased.
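The three closed-form intervals above translate directly into Python. This is a sketch with our own function name; since the standard library has no normal quantile function, z is hard-coded to the 97.5th percentile, giving 95% intervals.

    from math import sqrt

    def binom_confints(n1, n, z=1.959964):
        """Wald, Agresti-Coull and Wilson intervals for a binomial
        proportion, given n1 successes out of n trials."""
        p = n1 / n
        wald = (p - z * sqrt(p * (1 - p) / n),
                p + z * sqrt(p * (1 - p) / n))
        n_ac = n + z**2                   # Agresti-Coull modified n
        p_ac = (n1 + z**2 / 2) / n_ac     # ... and modified p
        ac = (p_ac - z * sqrt(p_ac * (1 - p_ac) / n_ac),
              p_ac + z * sqrt(p_ac * (1 - p_ac) / n_ac))
        centre = (p + z**2 / (2 * n)) / (1 + z**2 / n)
        half = (z / (2 * n)) * sqrt(4 * n * p * (1 - p) + z**2) / (1 + z**2 / n)
        return wald, ac, (centre - half, centre + half)

    print(binom_confints(n1=8, n=30))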

1.3.9 Generating binomial random variates


Methods for random number generation where the marginal distribution is a binomial distribution are well-established.[18][19]
One way to generate random samples from a binomial distribution is to use an inversion algorithm. To do so, one must calculate the probability P(X = k) for all values k from 0 through n. (These probabilities should sum to a value close
to one, in order to encompass the entire sample space.) Then by using a linear congruential generator to generate samples
uniform between 0 and 1, one can transform the calculated samples U[0,1] into discrete numbers by using the probabilities
calculated in step one.
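A minimal sketch of this inversion algorithm in Python follows; the text’s linear congruential generator is replaced here by the standard library’s uniform generator, which plays the same role.

    import random
    from math import comb

    def binomial_variate(n, p, rng=random):
        """Inversion sampling: accumulate the pmf until it exceeds a
        Uniform(0, 1) draw, and return that k."""
        u = rng.random()
        cdf = 0.0
        for k in range(n + 1):
            cdf += comb(n, k) * p**k * (1 - p)**(n - k)
            if u < cdf:
                return k
        return n  # guard against floating-point shortfall in the summed pmf

    sample = [binomial_variate(10, 0.3) for _ in range(10_000)]
    print(sum(sample) / len(sample))  # ≈ np = 3.0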

1.3.10 Tail Bounds


For k ≤ np, upper bounds for the lower tail of the distribution function can be derived. In particular, Hoeffding’s inequality
yields the bound

F(k; n, p) ≤ exp( −2 (np − k)² / n ),

and Chernoff’s inequality can be used to derive the bound

F(k; n, p) ≤ exp( −(np − k)² / (2pn) ).

Moreover, these bounds are reasonably tight when p = 1/2, since the following expression holds for all k ≥ 3n/8[20]

F(k; n, 1/2) ≥ (1/15) exp( −16 (n/2 − k)² / n ).

However, the bounds do not work well for extreme values of p. In particular, as p → 1, value F(k;n,p) goes to zero (for
fixed k, n with k<n) while the upper bound above goes to a positive constant. In this case a better bound is given by [21]

F(k; n, p) ≤ exp( −n D( k/n ‖ p ) ) if 0 < k/n < p

where D(a|| p) is the relative entropy between an a-coin and a p-coin (i.e. between the Bernoulli(a) and Bernoulli(p)
distribution):

D(a ‖ p) = a log(a/p) + (1 − a) log( (1 − a)/(1 − p) ).

Asymptotically, this bound is reasonably tight; see [21] for details. An equivalent formulation of the bound is

Pr(X ≥ k) = F(n − k; n, 1 − p) ≤ exp( −n D( k/n ‖ p ) ) if p < k/n < 1.
Both these bounds are derived directly from the Chernoff bound. It can also be shown that,

Pr(X ≥ k) = F(n − k; n, 1 − p) ≥ (1/(n + 1)²) exp( −n D( k/n ‖ p ) ) if p < k/n < 1.
This is proved using the method of types (see for example chapter 12 of Elements of Information Theory by Cover and
Thomas [22] ).
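The bounds are easy to compare numerically. The sketch below (standard-library Python, with illustrative parameters of our own choosing that satisfy 0 < k/n < p) evaluates the exact lower tail against the Hoeffding bound and the relative-entropy bound:

    from math import comb, exp, log

    def binom_cdf(k, n, p):
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

    def relative_entropy(a, p):
        """D(a || p) between a Bernoulli(a) and a Bernoulli(p) distribution."""
        return a * log(a / p) + (1 - a) * log((1 - a) / (1 - p))

    n, p, k = 100, 0.5, 30
    print(binom_cdf(k, n, p))                    # exact lower tail, ≈ 4e-05
    print(exp(-2 * (n * p - k)**2 / n))          # Hoeffding bound,  ≈ 3.4e-04
    print(exp(-n * relative_entropy(k / n, p)))  # entropy bound,    ≈ 2.7e-04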

1.3.11 See also


• Logistic regression
• Multinomial distribution
• Negative binomial distribution
• Binomial measure, an example of a multifractal measure.[23]

1.3.12 References
[1] Wadsworth, G. P. (1960). Introduction to probability and random variables. USA: McGraw-Hill New York. p. 52.

[2] Hamilton Institute. “The Binomial Distribution” October 20, 2010.

[3] See Proof Wiki

[4] See also the answer to the question “finding mode in Binomial distribution”

[5] Neumann, P. (1966). "Über den Median der Binomial- und Poissonverteilung". Wissenschaftliche Zeitschrift der Technischen Universität Dresden (in German) 19: 29–33.

[6] Lord, Nick. (July 2010). “Binomial averages when the mean is an integer”, The Mathematical Gazette 94, 331-332.

[7] Kaas, R.; Buhrman, J.M. (1980). “Mean, Median and Mode in Binomial Distributions”. Statistica Neerlandica 34 (1): 13–18.
doi:10.1111/j.1467-9574.1980.tb00681.x.

[8] Hamza, K. (1995). “The smallest uniform upper bound on the distance between the mean and the median of the binomial and
Poisson distributions”. Statistics & Probability Letters 23: 21–25. doi:10.1016/0167-7152(94)00090-U.

[9] Wang, Y. H. (1993). “On the number of successes in independent trials” (PDF). Statistica Sinica 3 (2): 295–312.

[10] Box, Hunter and Hunter (1978). Statistics for experimenters. Wiley. p. 130.

[11] NIST/SEMATECH, “7.2.4. Does the proportion of defectives meet requirements?" e-Handbook of Statistical Methods.

[12] NIST/SEMATECH, “6.3.3.1. Counts Control Charts”, e-Handbook of Statistical Methods.



[13] MacKay, David (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press; First Edition.
ISBN 978-0521642989.

[14] Brown, Lawrence D.; Cai, T. Tony; DasGupta, Anirban (2001), “Interval Estimation for a Binomial Proportion”, Statistical
Science 16 (2): 101–133, doi:10.1214/ss/1009213286, retrieved 2015-01-05

[15] Agresti, Alan; Coull, Brent A. (May 1998), “Approximate is better than 'exact' for interval estimation of binomial proportions”
(PDF), The American Statistician 52 (2): 119–126, doi:10.2307/2685469, retrieved 2015-01-05

[16] Pires MA Confidence intervals for a binomial proportion: comparison of methods and software evaluation.

[17] Wilson, Edwin B. (June 1927), “Probable inference, the law of succession, and statistical inference” (PDF), J. American Statis-
tical Association 22 (158): 209–212, doi:10.2307/2276774, retrieved 2015-01-05

[18] Devroye, Luc (1986) Non-Uniform Random Variate Generation, New York: Springer-Verlag. (See especially Chapter X, Dis-
crete Univariate Distributions)

[19] Kachitvichyanukul, V.; Schmeiser, B. W. (1988). “Binomial random variate generation”. Communications of the ACM 31 (2):
216–222. doi:10.1145/42372.42381.

[20] Matoušek, J, Vondrak, J: The Probabilistic Method (lecture notes) .

[21] R. Arratia and L. Gordon: Tutorial on large deviations for the binomial distribution, Bulletin of Mathematical Biology 51(1)
(1989), 125–131 .

[22] T. Cover and J. Thomas, “Elements of Information Theory, 2nd Edition”, Wiley 2006

[23] Mandelbrot, B. B., Fisher, A. J., & Calvet, L. E. (1997). A multifractal model of asset returns. 3.2 The Binomial Measure is the
Simplest Example of a Multifractal

1.3.13 External links

• Interactive graphic: Univariate Distribution Relationships

• Binomial distribution formula calculator

• Binomial distribution calculator

• Difference of two binomial variables: X-Y or |X-Y|

1.4 Beta-binomial distribution


In probability theory and statistics, the beta-binomial distribution is a family of discrete probability distributions on
a finite support of non-negative integers arising when the probability of success in each of a fixed or known number of
Bernoulli trials is either unknown or random. The beta-binomial distribution is the binomial distribution in which the
probability of success at each trial is not fixed but random and follows the beta distribution. It is frequently used in
Bayesian statistics, empirical Bayes methods and classical statistics as an overdispersed binomial distribution.
It reduces to the Bernoulli distribution as a special case when n = 1. For α = β = 1, it is the discrete uniform distribution
from 0 to n. It also approximates the binomial distribution arbitrarily well for large α and β. The beta-binomial is a
one-dimensional version of the Dirichlet-multinomial distribution, as the binomial and beta distributions are special cases
of the multinomial and Dirichlet distributions, respectively.

1.4.1 Motivation and derivation


Beta-binomial distribution as a compound distribution

The Beta distribution is a conjugate distribution of the binomial distribution. This fact leads to an analytically tractable
compound distribution where one can think of the p parameter in the binomial distribution as being randomly drawn from
a beta distribution. Namely, if

X ∼ Bin(n, p),

then P(X = k | p, n) = L(k | p) = (n choose k) p^k (1 − p)^(n−k),
where Bin(n,p) stands for the binomial distribution, and where p is a random variable with a beta distribution.

π(p | α, β) = Beta(α, β) = p^(α−1) (1 − p)^(β−1) / B(α, β),
then the compound distribution is given by

f(k | n, α, β) = ∫_0^1 L(k | p) π(p | α, β) dp
             = (n choose k) (1/B(α, β)) ∫_0^1 p^(k+α−1) (1 − p)^(n−k+β−1) dp
             = (n choose k) B(k + α, n − k + β) / B(α, β).
Using the properties of the beta function, this can alternatively be written

f(k | n, α, β) = [ Γ(n + 1) / ( Γ(k + 1)Γ(n − k + 1) ) ] · [ Γ(k + α)Γ(n − k + β) / Γ(n + α + β) ] · [ Γ(α + β) / ( Γ(α)Γ(β) ) ].
It is within this context that the beta-binomial distribution appears often in Bayesian statistics: the beta-binomial is the
posterior predictive distribution of a binomial random variable with a beta distribution prior on the success probability.
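A numerically safe way to evaluate this closed form is through the log-gamma function, as in the minimal Python sketch below (the helper name is ours):

    from math import exp, lgamma

    def betabinom_pmf(k, n, a, b):
        """f(k | n, alpha, beta), evaluated in log space to avoid
        overflowing the gamma functions for large arguments."""
        return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + lgamma(k + a) + lgamma(n - k + b) - lgamma(n + a + b)
                   + lgamma(a + b) - lgamma(a) - lgamma(b))

    print(sum(betabinom_pmf(k, 10, 2.0, 3.0) for k in range(11)))  # ≈ 1.0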

Beta-binomial as an urn model

The beta-binomial distribution can also be motivated via an urn model for positive integer values of α and β, known as the
Polya urn model. Specifically, imagine an urn containing α red balls and β black balls, where random draws are made. If
a red ball is observed, then two red balls are returned to the urn. Likewise, if a black ball is drawn, then two black balls
are returned to the urn. If this is repeated n times, then the probability of observing k red balls follows a beta-binomial
distribution with parameters n,α and β.
Note that if the random draws are with simple replacement (no balls over and above the observed ball are added to the
urn), then the distribution follows a binomial distribution and if the random draws are made without replacement, the
distribution follows a hypergeometric distribution.
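The urn model is also easy to simulate, and the empirical frequencies line up with the beta-binomial pmf. A sketch with illustrative parameters α = 2, β = 3, n = 10 (standard-library Python, our own function name):

    import random
    from collections import Counter

    def polya_urn(alpha, beta, n, rng=random):
        """One run of the Polya urn: each observed ball is returned to the
        urn with one extra ball of the same colour; returns the number of
        red balls observed in n draws."""
        red, black, k = alpha, beta, 0
        for _ in range(n):
            if rng.random() < red / (red + black):
                k += 1
                red += 1     # a red ball was drawn: two reds go back
            else:
                black += 1   # a black ball was drawn: two blacks go back
        return k

    counts = Counter(polya_urn(2, 3, 10) for _ in range(100_000))
    print({k: c / 100_000 for k, c in sorted(counts.items())})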

1.4.2 Moments and properties


The first three raw moments are

μ₁ = nα / (α + β)

μ₂ = nα[ n(1 + α) + β ] / ( (α + β)(1 + α + β) )

μ₃ = nα[ n²(1 + α)(2 + α) + 3n(1 + α)β + β(β − α) ] / ( (α + β)(1 + α + β)(2 + α + β) )

and the kurtosis is

β₂ = [ (α + β)²(1 + α + β) / ( nαβ(α + β + 2)(α + β + 3)(α + β + n) ) ]
     · [ (α + β)(α + β − 1 + 6n) + 3αβ(n − 2) + 6n² − 3αβn(6 − n)/(α + β) − 18αβn²/(α + β)² ].

Letting π = α/(α + β), we note, suggestively, that the mean can be written as

μ = nα/(α + β) = nπ

and the variance as

σ² = nαβ(α + β + n) / ( (α + β)²(α + β + 1) ) = nπ(1 − π) (α + β + n)/(α + β + 1) = nπ(1 − π)[ 1 + (n − 1)ρ ]

where ρ = 1/(α + β + 1) is the pairwise correlation between the n Bernoulli draws, called the over-dispersion parameter.
The following recurrence relation holds:

(α + k)(n − k) p(k) − (k + 1)(β + n − k − 1) p(k + 1) = 0, with p(0) = (β)ₙ / (α + β)ₙ, where (x)ₙ denotes the rising factorial.

1.4.3 Point estimates


Method of moments

The method of moments estimates can be gained by noting the first and second moments of the beta-binomial, namely

μ₁ = nα / (α + β)
μ₂ = nα[ n(1 + α) + β ] / ( (α + β)(1 + α + β) )

and setting these raw moments equal to the first and second raw sample moments respectively

μ̂₁ = m₁
μ̂₂ = m₂

and solving for α and β we get

α̂ = ( n m₁ − m₂ ) / ( n( m₂/m₁ − m₁ − 1 ) + m₁ )

β̂ = ( (n − m₁)(n − m₂/m₁) ) / ( n( m₂/m₁ − m₁ − 1 ) + m₁ ).

Note that these estimates can be nonsensically negative, which is evidence that the data are underdispersed relative to the binomial distribution. In this case, the binomial distribution and the hypergeometric distribution are alternative candidates, respectively.
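These estimators are a direct transcription into Python (the helper name is ours). Called with the rounded sample moments from the Saxony example below, the output differs slightly from the example’s values, which are based on unrounded moments.

    def betabinom_mom(m1, m2, n):
        """Method-of-moments estimates of (alpha, beta) from the first two
        raw sample moments of a beta-binomial with known n."""
        denom = n * (m2 / m1 - m1 - 1) + m1
        alpha = (n * m1 - m2) / denom
        beta = (n - m1) * (n - m2 / m1) / denom
        return alpha, beta

    print(betabinom_mom(m1=6.23, m2=42.31, n=12))  # ≈ (33.6, 31.1)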

Maximum likelihood estimation

Closed-form maximum likelihood estimates are impractical, but since the pdf consists of common functions (the gamma and/or beta functions), the estimates can easily be found via direct numerical optimization. Maximum likelihood estimates from empirical data can be computed using general methods for fitting multinomial Pólya distributions, methods for which are described in (Minka 2003). The R package VGAM, through the function vglm, facilitates the fitting of glm-type models with responses distributed according to the beta-binomial distribution, via maximum likelihood. Note also that there is no requirement that n is fixed throughout the observations.

Example

The following data give the number of male children among the first 12 children of family size 13 in 6115 families taken from hospital records in 19th century Saxony (Sokal and Rohlf, p. 59 from Lindsey). The 13th child is ignored to mitigate the effect of families non-randomly stopping when a desired gender is reached.
We note the first two sample moments are

m1 = 6.23
m2 = 42.31
n = 12

and therefore the method of moments estimates are

α̂ = 34.1350
β̂ = 31.6085.

The maximum likelihood estimates can be found numerically

α̂_mle = 34.09558
β̂_mle = 31.5715

and the maximized log-likelihood is

log L = −12492.9

from which we find the AIC

AIC = 24989.74.

The AIC for the competing binomial model is AIC = 25070.34, and thus we see that the beta-binomial model provides a superior fit to the data, i.e. there is evidence for overdispersion. Trivers and Willard posit a theoretical justification for heterogeneity (also known as "burstiness") in gender-proneness among families (i.e. overdispersion).
The superior fit is evident especially among the tails.

1.4.4 Further Bayesian considerations


It is convenient to reparameterize the distributions so that the expected mean of the prior is a single parameter: Let

π(θ | μ, M) = Beta(Mμ, M(1 − μ)) = [ Γ(M) / ( Γ(Mμ)Γ(M(1 − μ)) ) ] θ^(Mμ−1) (1 − θ)^(M(1−μ)−1)

where

μ = α/(α + β)
M = α + β

so that

E(θ | μ, M) = μ
Var(θ | μ, M) = μ(1 − μ) / (M + 1).

The posterior distribution ρ(θ|k) is also a beta distribution:

ρ(θ | k) ∝ ℓ(k | θ) π(θ | μ, M)
        = Beta( k + Mμ, n − k + M(1 − μ) )
        = [ Γ(M) / ( Γ(Mμ)Γ(M(1 − μ)) ) ] (n choose k) θ^(k+Mμ−1) (1 − θ)^(n−k+M(1−μ)−1)

And

E(θ | k) = (k + Mμ) / (n + M),
while the marginal distribution m(k|μ, M) is given by

m(k | μ, M) = ∫_0^1 ℓ(k | θ) π(θ | μ, M) dθ
           = [ Γ(M) / ( Γ(Mμ)Γ(M(1 − μ)) ) ] (n choose k) ∫_0^1 θ^(k+Mμ−1) (1 − θ)^(n−k+M(1−μ)−1) dθ
           = [ Γ(M) / ( Γ(Mμ)Γ(M(1 − μ)) ) ] (n choose k) Γ(k + Mμ)Γ(n − k + M(1 − μ)) / Γ(n + M).

Substituting back M and μ, in terms of α and β , this becomes:


m(k | α, β) = [ Γ(n + 1) / ( Γ(k + 1)Γ(n − k + 1) ) ] · [ Γ(k + α)Γ(n − k + β) / Γ(n + α + β) ] · [ Γ(α + β) / ( Γ(α)Γ(β) ) ],

which is the expected beta-binomial distribution with parameters n, α and β .


We can also use the method of iterated expectations to find the expected value of the marginal moments. Let us write our model as a two-stage compound sampling model. Let k_i be the number of successes out of n_i trials for event i:

k_i ∼ Bin(n_i, θ_i)
θ_i ∼ Beta(μ, M), i.i.d.

We can find iterated moment estimates for the mean and variance using the moments for the distributions in the two-stage
model:

E( k/n ) = E[ E( k/n | θ ) ] = E(θ) = μ

var( k/n ) = E[ var( k/n | θ ) ] + var[ E( k/n | θ ) ]
          = E[ (1/n) θ(1 − θ) | μ, M ] + var( θ | μ, M )
          = (1/n) μ(1 − μ) + ( (n − 1)/n ) · μ(1 − μ)/(M + 1)
          = ( μ(1 − μ)/n ) ( 1 + (n − 1)/(M + 1) ).

(Here we have used the law of total expectation and the law of total variance.)
We want point estimates for µ and M . The estimated mean µ̂ is calculated from the sample

μ̂ = ( ∑_{i=1}^{N} k_i ) / ( ∑_{i=1}^{N} n_i ).

The estimate of the hyperparameter M is obtained using the moment estimates for the variance of the two-stage model:

s² = (1/N) ∑_{i=1}^{N} var( k_i / n_i ) = (1/N) ∑_{i=1}^{N} [ μ̂(1 − μ̂) / n_i ] ( 1 + (n_i − 1)/(M̂ + 1) )

Solving:

M̂ = ( μ̂(1 − μ̂) − s² ) / ( s² − (μ̂(1 − μ̂)/N) ∑_{i=1}^{N} 1/n_i ),

where

s² = N ∑_{i=1}^{N} n_i (θ̂_i − μ̂)² / ( (N − 1) ∑_{i=1}^{N} n_i ).

Since we now have parameter point estimates μ̂ and M̂ for the underlying distribution, we would like to find a point estimate θ̃_i for the probability of success for event i. This is the weighted average of the event estimate θ̂_i = k_i/n_i and μ̂. Given our point estimates for the prior, we may now plug in these values to find a point estimate for the posterior

θ̃_i = E(θ | k_i) = (k_i + M̂μ̂) / (n_i + M̂) = ( M̂/(n_i + M̂) ) μ̂ + ( n_i/(n_i + M̂) ) (k_i/n_i).

1.4.5 Shrinkage factors


We may write the posterior estimate as a weighted average:

θ̃_i = B̂_i μ̂ + (1 − B̂_i) θ̂_i

where B̂_i is called the shrinkage factor.


B̂_i = M̂ / (M̂ + n_i).
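Putting the two-stage moment estimates and the shrinkage step together in Python (a sketch on hypothetical counts; all names and data are ours):

    def shrunken_estimates(ks, ns):
        """Moment estimates of (mu, M) for the two-stage model above,
        followed by the shrunken per-event success rates."""
        N = len(ks)
        mu = sum(ks) / sum(ns)
        thetas = [k / n for k, n in zip(ks, ns)]
        s2 = (N * sum(n * (t - mu)**2 for n, t in zip(ns, thetas))
              / ((N - 1) * sum(ns)))
        M = ((mu * (1 - mu) - s2)
             / (s2 - mu * (1 - mu) / N * sum(1 / n for n in ns)))
        return mu, M, [(k + M * mu) / (n + M) for k, n in zip(ks, ns)]

    # Hypothetical data: k_i successes out of n_i trials for five events.
    print(shrunken_estimates(ks=[2, 10, 15, 3, 12], ns=[10, 12, 20, 10, 15]))
    # mu ≈ 0.63, M ≈ 2.7, and each raw rate k_i/n_i pulled towards mu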

1.4.6 Related distributions


• BB(1, 1, n) ∼ U (0, n) where U (a, b) is the discrete uniform distribution.

1.4.7 See also


• Dirichlet-multinomial distribution

1.4.8 References
• Minka, Thomas P. (2003). Estimating a Dirichlet distribution. Microsoft Technical Report.

1.4.9 External links


• Using the Beta-binomial distribution to assess performance of a biometric identification device
• Fastfit contains Matlab code for fitting Beta-Binomial distributions (in the form of two-dimensional Pólya distribu-
tions) to data.
• Interactive graphic: Univariate Distribution Relationships
• Beta-Binomial distribution package for R

1.5 Degenerate distribution


In mathematics, a degenerate distribution or deterministic distribution is the probability distribution of a random
variable which only takes a single value. Examples include a two-headed coin and rolling a die whose sides all show the
same number. This distribution satisfies the definition of “random variable” even though it does not appear random in the
everyday sense of the word; hence it is considered degenerate.
The degenerate distribution is localized at a point k0 on the real line. The probability mass function equals 1 at this point
and 0 elsewhere.
The distribution can be viewed as the limiting case of a continuous distribution whose variance goes to 0 causing the
probability density function to be a delta function at k0 , with infinite height there but area equal to 1.
The cumulative distribution function of the degenerate distribution is:
F(k; k₀) = 1 if k ≥ k₀,
F(k; k₀) = 0 if k < k₀.

1.5.1 Constant random variable


In probability theory, a constant random variable is a discrete random variable that takes a constant value, regardless
of any event that occurs. This is technically different from an almost surely constant random variable, which may take
other values, but only on events with probability zero. Constant and almost surely constant random variables provide a
way to deal with constant values in a probabilistic framework.
Let X: Ω → R be a random variable defined on a probability space (Ω, P). Then X is an almost surely constant random
variable if there exists c ∈ R such that

Pr(X = c) = 1,
and is furthermore a constant random variable if

X(ω) = c, ∀ω ∈ Ω.
Note that a constant random variable is almost surely constant, but not necessarily vice versa, since if X is almost surely
constant then there may exist γ ∈ Ω such that X(γ) ≠ c (but then necessarily Pr({γ}) = 0, in fact Pr(X ≠ c) = 0).
For practical purposes, the distinction between X being constant or almost surely constant is unimportant, since the
cumulative distribution function F(x) of X does not depend on whether X is constant or 'merely' almost surely constant.
In this case,

F(x) = 1 if x ≥ c,
F(x) = 0 if x < c.

The function F(x) is a step function; in particular it is a translation of the Heaviside step function.

1.6 Hypergeometric distribution


In probability theory and statistics, the hypergeometric distribution is a discrete probability distribution that describes
the probability of k successes in n draws, without replacement, from a finite population of size N that contains exactly K
successes, wherein each draw is either a success or a failure. In contrast, the binomial distribution describes the probability
of k successes in n draws with replacement.
In statistics, the hypergeometric test uses the hypergeometric distribution to calculate the statistical significance of having
drawn a specific k successes (out of n total draws) from the aforementioned population. The test is often used to identify
which sub-populations are over- or under-represented in a sample. This test has a wide range of applications. For example,
a marketing group could use the test to understand their customer base by testing a set of known customers for over-
representation of various demographic subgroups (e.g., women, people under 30).

1.6.1 Definition
The following conditions characterize the hypergeometric distribution:

• The result of each draw (the elements of the population being sampled) can be classified into one of two mutually
exclusive categories (e.g. Pass/Fail or Female/Male or Employed/Unemployed).
• The probability of a success changes on each draw, as each draw decreases the population (sampling without re-
placement from a finite population).

A random variable X follows the hypergeometric distribution if its probability mass function (pmf) is given by[1]

P(X = k) = (K choose k) (N − K choose n − k) / (N choose n)

where

• N is the population size,


• K is the number of success states in the population,
• n is the number of draws,
• k is the number of observed successes,
• (a choose b) is a binomial coefficient.

The pmf is positive when max(0, n + K − N ) ≤ k ≤ min(K, n) .


The pmf satisfies the recurrence relation

(k + 1)(N − K − (n − k − 1))P (X = k + 1) = (K − k)(n − k)P (X = k)



with

P(X = 0) = (N − K choose n) / (N choose n)

1.6.2 Combinatorial identities


As one would expect, the probabilities sum up to 1:

∑_{0 ≤ k ≤ n} (K choose k)(N − K choose n − k) / (N choose n) = 1
This is essentially Vandermonde’s identity from combinatorics.
Also note the following identity holds:

(K choose k)(N − K choose n − k) / (N choose n) = (n choose k)(N − n choose K − k) / (N choose K).

This follows from the symmetry of the problem, but it can also be shown by expressing the binomial coefficients in terms
of factorials and rearranging the latter.

1.6.3 Application and example


The classical application of the hypergeometric distribution is sampling without replacement. Think of an urn with
two types of marbles, red ones and green ones. Define drawing a green marble as a success and drawing a red marble as
a failure (analogous to the binomial distribution). If the variable N describes the number of all marbles in the urn (see
contingency table below) and K describes the number of green marbles, then N − K corresponds to the number of red
marbles. In this example, X is the random variable whose outcome is k, the number of green marbles actually drawn in
the experiment. This situation is illustrated by the following contingency table:
Now, assume (for example) that there are 5 green and 45 red marbles in the urn. Standing next to the urn, you close
your eyes and draw 10 marbles without replacement. What is the probability that exactly 4 of the 10 are green? Note
that although we are looking at success/failure, the data are not accurately modeled by the binomial distribution, because
the probability of success on each trial is not the same, as the size of the remaining population changes as we remove each
marble.
This problem is summarized by the following contingency table:
The probability of drawing exactly k green marbles can be calculated by the formula

P(X = k) = f(k; N, K, n) = (K choose k)(N − K choose n − k) / (N choose n).

Hence, in this example calculate

P(X = 4) = f(4; 50, 5, 10) = (5 choose 4)(45 choose 6) / (50 choose 10) = (5 · 8145060) / 10272278170 = 0.003964583…
Intuitively we would expect it to be even more unlikely for all 5 marbles to be green.

P(X = 5) = f(5; 50, 5, 10) = (5 choose 5)(45 choose 5) / (50 choose 10) = (1 · 1221759) / 10272278170 = 0.0001189375…

As expected, the probability of drawing 5 green marbles is roughly 33 times smaller than that of drawing 4.
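Both probabilities are reproduced by a few lines of Python (standard library only; the helper name is ours):

    from math import comb

    def hypergeom_pmf(k, N, K, n):
        """Pr(X = k): k successes in n draws without replacement from a
        population of N items of which K are successes."""
        return comb(K, k) * comb(N - K, n - k) / comb(N, n)

    print(hypergeom_pmf(4, N=50, K=5, n=10))  # 0.003964583...
    print(hypergeom_pmf(5, N=50, K=5, n=10))  # 0.0001189375...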

Application to Texas Hold'em Poker

In Hold'em Poker players make the best hand they can combining the two cards in their hand with the 5 cards (community
cards) eventually turned up on the table. The deck has 52 cards, 13 of each suit. For this example assume a player
has 2 clubs in the hand and there are 3 cards showing on the table, 2 of which are also clubs. The player would like to
know the probability of one of the next 2 cards to be shown being a club to complete his flush.
There are 4 clubs showing so there are 9 still unseen. There are 5 cards showing (2 in the hand and 3 on the table) so
there are 52 − 5 = 47 still unseen.
The probability that one of the next two cards turned is a club can be calculated using hypergeometric with k = 1, n =
2, K = 9 and N = 47 . (about 31.6%)
The probability that both of the next two cards turned are clubs can be calculated using hypergeometric with k = 2, n =
2, K = 9 and N = 47 . (about 3.3%)
The probability that neither of the next two cards turned are clubs can be calculated using hypergeometric with k =
0, n = 2, K = 9 and N = 47 . (about 65.0%)

1.6.4 Symmetries
Swapping the roles of green and red marbles:

f (k; N, K, n) = f (n − k; N, N − K, n)

Swapping the roles of drawn and not drawn marbles:

f (k; N, K, n) = f (K − k; N, K, N − n)

Swapping the roles of green and drawn marbles:

f (k; N, K, n) = f (k; N, n, K)

1.6.5 Hypergeometric test


The hypergeometric test uses the hypergeometric distribution to measure the statistical significance of having drawn
a sample consisting of a specific number of k successes (out of n total draws) from a population of size N containing
K successes. In a test for over-representation of successes in the sample, the hypergeometric p-value is calculated as
the probability of randomly drawing k or more successes from the population in n total draws. In a test for under-
representation, the p-value is the probability of randomly drawing k or fewer successes.

Relationship to Fisher’s exact test

See also: Fisher’s noncentral hypergeometric distribution

The test based on the hypergeometric distribution (hypergeometric test) is identical to the corresponding one-tailed version of Fisher’s exact test.[2] Reciprocally, the p-value of a two-sided Fisher’s exact test can be calculated as the sum of two appropriate hypergeometric tests (for more information see [3]).

1.6.6 Order of draws

The probability of drawing any sequence of white and black marbles (the hypergeometric distribution) depends only on
the number of white and black marbles, not on the order in which they appear; i.e., it is an exchangeable distribution. As
a result, the probability of drawing a white marble in the ith draw is[4]

P(W_i) = K / N.

1.6.7 Related distributions

Let X ~ Hypergeometric( K , N , n ) and p = K/N .

• If n = 1 then X has a Bernoulli distribution with parameter p .

• Let Y have a binomial distribution with parameters n and p ; this models the number of successes in the analogous
sampling problem with replacement. If N and K are large compared to n , and p is not close to 0 or 1, then X and
Y have similar distributions, i.e., P (X ≤ k) ≈ P (Y ≤ k) .

• If n is large, N and K are large compared to n , and p is not close to 0 or 1, then

P(X ≤ k) ≈ Φ( (k − np) / √(np(1 − p)) )

where Φ is the standard normal distribution function

• If the probabilities to draw a white or black marble are not equal (e.g. because white marbles are bigger/easier to
grasp than black marbles) then X has a noncentral hypergeometric distribution

• The beta-binomial distribution is a conjugate prior for the hypergeometric distribution.

The following table describes four distributions related to the number of successes in a sequence of draws:

1.6.8 Multivariate hypergeometric distribution

The model of an urn with black and white marbles can be extended to the case where there are more than two colors of marbles. If there are K_i marbles of color i in the urn and you take n marbles at random without replacement, then the number of marbles of each color in the sample (k_1, k_2, ..., k_c) has the multivariate hypergeometric distribution. This has the same relationship to the multinomial distribution that the hypergeometric distribution has to the binomial distribution: the multinomial distribution is the “with-replacement” distribution and the multivariate hypergeometric is the “without-replacement” distribution.
The properties of this distribution are given in the adjacent table, where c is the number of different colors and N = ∑_{i=1}^{c} K_i is the total number of marbles.

Example

Suppose there are 5 black, 10 white, and 15 red marbles in an urn. You reach in and randomly select six marbles without
replacement. What is the probability that you pick exactly two of each color?

P(2 black, 2 white, 2 red) = (5 choose 2)(10 choose 2)(15 choose 2) / (30 choose 6) = 0.079575596816976

Note: When picking the six marbles without replacement, the expected number of black marbles is 6×(5/30) = 1, the expected
number of white marbles is 6×(10/30) = 2, and the expected number of red marbles is 6×(15/30) = 3. This comes from
the expected value of a Binomial distribution, E(X) = np.
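The computation generalizes directly; a minimal Python sketch (our own helper name):

    from math import comb

    def multi_hypergeom_pmf(ks, Ks):
        """Probability of drawing exactly ks = (k_1, ..., k_c) marbles of
        each colour from an urn holding Ks = (K_1, ..., K_c) per colour."""
        numerator = 1
        for k, K in zip(ks, Ks):
            numerator *= comb(K, k)
        return numerator / comb(sum(Ks), sum(ks))

    print(multi_hypergeom_pmf((2, 2, 2), (5, 10, 15)))  # 0.079575596816976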

1.6.9 See also

• Multinomial distribution

• Sampling (statistics)

• Generalized hypergeometric function

• Coupon collector’s problem

• Geometric distribution

• Keno

1.6.10 Notes
[1] Rice, John A. (2007). Mathematical Statistics and Data Analysis (Third ed.). Duxbury Press. p. 42.

[2] Rivals, I.; Personnaz, L.; Taing, L.; Potier, M.-C (2007). “Enrichment or depletion of a GO category within a class of genes:
which test?". Bioinformatics 23 (4): 401–407. doi:10.1093/bioinformatics/btl633. PMID 17182697.

[3] K. Preacher and N. Briggs. “Calculation for Fisher’s Exact Test: An interactive calculation tool for Fisher’s exact probability
test for 2 x 2 tables (interactive page)".

[4] http://www.stat.yale.edu/~{}pollard/Courses/600.spring2010/Handouts/Symmetry%5BPolyaUrn%5D.pdf

1.6.11 References

• Berkopec, Aleš (2007). “HyperQuick algorithm for discrete hypergeometric distribution”. Journal of Discrete
Algorithms 5 (2): 341. doi:10.1016/j.jda.2006.01.001.

• Skala, M. (2011). “Hypergeometric tail inequalities: ending the insanity” (PDF). unpublished note

1.6.12 External links

• The Hypergeometric Distribution and Binomial Approximation to a Hypergeometric Random Variable by Chris
Boucher, Wolfram Demonstrations Project.

• Weisstein, Eric W., “Hypergeometric Distribution”, MathWorld.



1.7 Poisson binomial distribution


In probability theory and statistics, the Poisson binomial distribution is the discrete probability distribution of a sum
of independent Bernoulli trials that are not necessarily identically distributed. The concept is named after Siméon Denis
Poisson.
In other words, it is the probability distribution of the number of successes in a sequence of n independent yes/no ex-
periments with success probabilities p1 , p2 , . . . , pn . The ordinary binomial distribution is a special case of the Poisson
binomial distribution, when all success probabilities are the same, that is p1 = p2 = · · · = pn .

1.7.1 Mean and variance


Since a Poisson binomial distributed variable is a sum of n independent Bernoulli distributed variables, its mean and
variance will simply be sums of the mean and variance of the n Bernoulli distributions:


μ = ∑_{i=1}^{n} p_i

σ² = ∑_{i=1}^{n} (1 − p_i) p_i
For fixed values of the mean ( µ ) and size (n), the variance is maximal when all success probabilities are equal and we
have a binomial distribution. When the mean is fixed, the variance is bounded from above by the variance of the Poisson
distribution with the same mean which is attained asymptotically as n tends to infinity.

1.7.2 Probability mass function


The probability of having k successful trials out of a total of n can be written as the sum [1]

Pr(K = k) = ∑_{A ∈ F_k} ∏_{i ∈ A} p_i ∏_{j ∈ Aᶜ} (1 − p_j)

where F_k is the set of all subsets of k integers that can be selected from {1, 2, 3, ..., n}. For example, if n = 3, then F₂ = {{1, 2}, {1, 3}, {2, 3}}. Aᶜ is the complement of A, i.e. Aᶜ = {1, 2, 3, ..., n} \ A.
F_k will contain n!/((n − k)! k!) elements, the sum over which is infeasible to compute in practice unless the number of trials n is small (e.g. if n = 30, F₁₅ contains over 10^20 elements). However, there are other, more efficient ways to calculate Pr(K = k).
As long as none of the success probabilities are equal to one, one can calculate the probability of k successes using the
recursive formula [2] [3]

Pr(K = 0) = ∏_{i=1}^{n} (1 − p_i)

Pr(K = k) = (1/k) ∑_{i=1}^{k} (−1)^(i−1) Pr(K = k − i) T(i) for k > 0,

where

T(i) = ∑_{j=1}^{n} ( p_j / (1 − p_j) )^i.

The recursive formula is not numerically stable, and should be avoided if n is greater than approximately 20. Another
possibility is using the discrete Fourier transform [4]

Pr(K = k) = (1/(n + 1)) ∑_{l=0}^{n} C^(−lk) ∏_{m=1}^{n} ( 1 + (C^l − 1) p_m )

where C = exp( 2iπ/(n + 1) ) and i = √(−1).

Still other methods are described in [5].
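The discrete Fourier transform formula above translates directly into Python; this sketch (standard-library complex arithmetic, our own function name) precomputes the inner products so the full pmf costs a number of operations quadratic in n:

    from cmath import exp, pi

    def poisson_binomial_pmf(ps):
        """pmf of the Poisson binomial distribution via the DFT formula
        above; ps holds the per-trial success probabilities p_1..p_n."""
        n = len(ps)
        C = exp(2j * pi / (n + 1))
        prods = []                      # product over m, for each l
        for l in range(n + 1):
            prod = 1
            for p in ps:
                prod *= 1 + (C**l - 1) * p
            prods.append(prod)
        return [sum(C**(-l * k) * prods[l] for l in range(n + 1)).real
                / (n + 1) for k in range(n + 1)]

    print(poisson_binomial_pmf([0.1, 0.4, 0.8]))  # sums to 1; Pr(K = 0) = 0.108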

1.7.3 Entropy

There is no simple formula for the entropy of a Poisson binomial distribution, but the entropy can be upper bounded by the entropy of a binomial distribution with the same number parameter and the same mean. Therefore the entropy can also be upper bounded by the entropy of a Poisson distribution with the same mean.[6]
The Shepp-Olkin conjecture, due to Lawrence Shepp and Ingram Olkin in 1981, states that the entropy of a Poisson
binomial distribution is a concave function of the success probabilities p1 , p2 , . . . , pn . [7]

1.7.4 See also

• Le Cam’s theorem

• Binomial distribution

• Poisson distribution

1.7.5 References

[1] Wang, Y. H. (1993). “On the number of successes in independent trials” (PDF). Statistica Sinica 3 (2): 295–312.

[2] Shah, B. K. (1994). “On the distribution of the sum of independent integer valued random variables”. American Statistician 27
(3): 123–124. JSTOR 2683639.

[3] Chen, X. H.; A. P. Dempster; J. S. Liu (1994). “Weighted finite population sampling to maximize entropy” (PDF). Biometrika
81 (3): 457. doi:10.1093/biomet/81.3.457.

[4] Fernandez, M.; S. Williams (2010). “Closed-Form Expression for the Poisson-Binomial Probability Density Function”. IEEE
Transactions on Aerospace Electronic Systems 46: 803–817. doi:10.1109/TAES.2010.5461658.

[5] Chen, S. X.; J. S. Liu (1997). “Statistical Applications of the Poisson-Binomial and conditional Bernoulli distributions”. Statis-
tica Sinica 7: 875–892.

[6] Harremoës, P. (2001). “Binomial and Poisson distributions as maximum entropy distributions” (PDF). IEEE Transactions on
Information Theory 47 (5): 2039–2041. doi:10.1109/18.930936.

[7] Shepp, Lawrence; Olkin, Ingram (1981). “Entropy of the sum of independent Bernoulli random variables and of the multinomial distribution”. In Gani, J.; Rohatgi, V.K. Contributions to probability: A collection of papers dedicated to Eugene Lukacs. New York: Academic Press. pp. 201–206. ISBN 0-12-274460-8. MR 0618689.

1.8 Fisher’s noncentral hypergeometric distribution


In probability theory and statistics, Fisher’s noncentral hypergeometric distribution is a generalization of the hypergeometric
distribution where sampling probabilities are modified by weight factors. It can also be defined as the conditional distri-
bution of two or more binomially distributed variables dependent upon their fixed sum.
The distribution may be illustrated by the following urn model. Assume, for example, that an urn contains m1 red balls
and m2 white balls, totalling N = m1 + m2 balls. Each red ball has the weight ω1 and each white ball has the weight ω2 .
We will say that the odds ratio is ω = ω1 / ω2 . Now we are taking balls randomly in such a way that the probability of
taking a particular ball is proportional to its weight, but independent of what happens to the other balls. The number of
balls taken of a particular color follows the binomial distribution. If the total number n of balls taken is known then the
conditional distribution of the number of taken red balls for given n is Fisher’s noncentral hypergeometric distribution.
To generate this distribution experimentally, we have to repeat the experiment until it happens to give n balls.
If we want to fix the value of n prior to the experiment then we have to take the balls one by one until we have n balls.
The balls are therefore no longer independent. This gives a slightly different distribution known as Wallenius’ noncentral
hypergeometric distribution. It is far from obvious why these two distributions are different. See the entry for noncentral
hypergeometric distributions for an explanation of the difference between these two distributions and a discussion of
which distribution to use in various situations.
The two distributions are both equal to the (central) hypergeometric distribution when the odds ratio is 1.
Unfortunately, both distributions are known in the literature as “the” noncentral hypergeometric distribution. It is impor-
tant to be specific about which distribution is meant when using this name.
Fisher’s noncentral hypergeometric distribution was first given the name extended hypergeometric distribution (Hark-
ness, 1965), and some authors still use this name today.

1.8.1 Univariate distribution


The probability function, mean and variance are given in the table to the right.
An alternative expression of the distribution has both the number of balls taken of each color and the number of balls not
taken as random variables, whereby the expression for the probability becomes symmetric.
The calculation time for the probability function can be high when the sum in P0 has many terms. The calculation time
can be reduced by calculating the terms in the sum recursively relative to the term for y = x and ignoring negligible terms
in the tails (Liao and Rosen, 2001).
The mean can be approximated by:

$$\mu \approx \frac{-2c}{b - \sqrt{b^2 - 4ac}},$$

where $a = \omega - 1$, $b = m_1 + n - N - (m_1 + n)\omega$, $c = m_1 n \omega$.
The variance can be approximated by:

$$\sigma^2 \approx \frac{N}{N-1} \bigg/ \left( \frac{1}{\mu} + \frac{1}{m_1 - \mu} + \frac{1}{n - \mu} + \frac{1}{\mu + m_2 - n} \right)$$
Better approximations to the mean and variance are given by Levin (1984, 1990), McCullagh and Nelder (1989), Liao
(1992), and Eisinga and Pelzer (2011). The saddlepoint methods for approximating the mean and variance suggested by
Eisinga and Pelzer (2011) give extremely accurate results.
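As an illustration, here is a small Python sketch of the two approximations above (the function name is illustrative; it assumes a non-degenerate case, so that none of the denominators vanish):

```python
import math

def fisher_nchg_mean_var_approx(n, m1, N, omega):
    """Approximate mean and variance of Fisher's noncentral hypergeometric
    distribution from the two formulas above (non-degenerate cases only)."""
    m2 = N - m1
    a = omega - 1.0
    b = m1 + n - N - (m1 + n) * omega
    c = m1 * n * omega
    mu = -2.0 * c / (b - math.sqrt(b * b - 4.0 * a * c))
    var = (N / (N - 1.0)) / (1.0 / mu + 1.0 / (m1 - mu)
                             + 1.0 / (n - mu) + 1.0 / (mu + m2 - n))
    return mu, var
```

For ω = 1 this reproduces the mean and variance of the central hypergeometric distribution, which is a useful sanity check.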

Properties

The following symmetry relations apply:



fnchypg(x; n, m1 , N, ω) = fnchypg(n − x; n, m2 , N, 1/ω) .

fnchypg(x; n, m1 , N, ω) = fnchypg(x; m1 , n, N, ω) .

fnchypg(x; n, m1 , N, ω) = fnchypg(m1 − x; N − n, m1 , N, 1/ω) .

Recurrence relation:

$$\operatorname{fnchypg}(x; n, m_1, N, \omega) = \operatorname{fnchypg}(x-1; n, m_1, N, \omega)\, \frac{(m_1 - x + 1)(n - x + 1)}{x\, (m_2 - n + x)}\, \omega.$$
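For illustration, here is a minimal Python sketch that builds the probability function from this recurrence, normalizing at the end (the function name is illustrative; plain floating point may overflow for very large parameters):

```python
def fisher_nchg_pmf(n, m1, N, omega):
    """PMF of the univariate distribution over its support, built from the
    recurrence relation above and normalized at the end."""
    m2 = N - m1
    xmin, xmax = max(0, n - m2), min(n, m1)
    weights = [1.0]
    for x in range(xmin + 1, xmax + 1):
        weights.append(weights[-1] * (m1 - x + 1) * (n - x + 1) * omega
                       / (x * (m2 - n + x)))
    total = sum(weights)
    return {x: w / total for x, w in zip(range(xmin, xmax + 1), weights)}
```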
x(m2 − n + x)

Recurrence relation

A Fisher hypergeometric distribution gives the distribution of the number of successes in n independent draws from a
population of size ntot containing nsucc successes with the odds ratio w .

$$\begin{cases} w\, f(x)\, (x - n)\, (n_{\mathrm{succ}} - x) - (x+1)\, f(x+1)\, (n + n_{\mathrm{succ}} - n_{\mathrm{tot}} - x - 1) = 0, \\[6pt] f(0) = \dfrac{1}{{}_2F_1(-n,\, -n_{\mathrm{succ}};\, -n - n_{\mathrm{succ}} + n_{\mathrm{tot}} + 1;\, w)} \end{cases}$$

1.8.2 Multivariate distribution

The distribution can be expanded to any number of colors c of balls in the urn. The multivariate distribution is used when
there are more than two colors.
The probability function and a simple approximation to the mean are given to the right. Better approximations to the
mean and variance are given by McCullagh and Nelder (1989).

Properties

The order of the colors is arbitrary so that any colors can be swapped.
The weights can be arbitrarily scaled:

mfnchypg(x; n, m, ω) = mfnchypg(x; n, m, rω) for all r ∈ R+ .

Colors with zero number (mi = 0) or zero weight (ωi = 0) can be omitted from the equations.
Colors with the same weight can be joined:

$$\operatorname{mfnchypg}(\mathbf{x}; n, \mathbf{m}, (\omega_1, \ldots, \omega_{c-1}, \omega_{c-1})) = \operatorname{mfnchypg}((x_1, \ldots, x_{c-1} + x_c); n, (m_1, \ldots, m_{c-1} + m_c), (\omega_1, \ldots, \omega_{c-1})) \cdot \operatorname{hypg}(x_c; x_{c-1} + x_c, m_c, m_{c-1} + m_c)$$

where hypg(x; n, m, N ) is the (univariate, central) hypergeometric distribution probability.



1.8.3 Applications
Fisher’s noncentral hypergeometric distribution is useful for models of biased sampling or biased selection where the
individual items are sampled independently of each other with no competition. The bias or odds can be estimated from
an experimental value of the mean. Use Wallenius’ noncentral hypergeometric distribution instead if items are sampled
one by one with competition.
Fisher’s noncentral hypergeometric distribution is used mostly for tests in contingency tables where a conditional distri-
bution for fixed margins is desired. This can be useful, for example, for testing or measuring the effect of a medicine.
See McCullagh and Nelder (1989).

1.8.4 Software available


• FisherHypergeometricDistribution in Mathematica.

• An implementation for the R programming language is available as the package named BiasedUrn. Includes uni-
variate and multivariate probability mass functions, distribution functions, quantiles, random variable generating
functions, mean and variance.

• The R package MCMCpack includes the univariate probability mass function and random variable generating
function.

• The SAS System includes the univariate probability mass function and distribution function.

• Implementation in C++ is available from www.agner.org.

• Calculation methods are described by Liao and Rosen (2001) and Fog (2008).

1.8.5 See also


• Noncentral hypergeometric distributions

• Wallenius’ noncentral hypergeometric distribution

• Hypergeometric distribution

• Urn models

• Biased sample

• Bias

• Contingency table

• Fisher’s exact test

1.8.6 References
Breslow, N. E.; Day, N. E. (1980), Statistical Methods in Cancer Research, Lyon: International Agency for Research on
Cancer.
Eisinga, R.; Pelzer, B. (2011), “Saddlepoint approximations to the mean and variance of the extended hypergeometric
distribution”, Statistica Neerlandica 65 (1): 22–31, doi:10.1111/j.1467-9574.2010.00468.x.
Fog, A. (2007), Random number theory.
Fog, A. (2008), “Sampling Methods for Wallenius’ and Fisher’s Noncentral Hypergeometric Distributions”, Communi-
cations in Statistics, Simulation and Computation 37 (2): 241–257, doi:10.1080/03610910701790236.

Johnson, N. L.; Kemp, A. W.; Kotz, S. (2005), Univariate Discrete Distributions, Hoboken, New Jersey: Wiley and Sons.
Levin, B. (1984), “Simple Improvements on Cornfield’s approximation to the mean of a noncentral Hypergeometric
random variable”, Biometrika 71 (3): 630–632, doi:10.1093/biomet/71.3.630.
Levin, B. (1990), “The saddlepoint correction in conditional logistic likelihood analysis”, Biometrika 77 (2): 275–285,
JSTOR 2336805.
Liao, J. (1992), “An Algorithm for the Mean and Variance of the Noncentral Hypergeometric Distribution”, Biometrics
48 (3): 889–892, doi:10.2307/2532354, JSTOR 2532354.
Liao, J. G.; Rosen, O. (2001), “Fast and Stable Algorithms for Computing and Sampling from the Noncentral Hyperge-
ometric Distribution”, The American Statistician 55 (4): 366–369, doi:10.1198/000313001753272547.
McCullagh, P.; Nelder, J. A. (1989), Generalized Linear Models, 2. ed., London: Chapman and Hall.

1.9 Wallenius’ noncentral hypergeometric distribution


In probability theory and statistics, Wallenius’ noncentral hypergeometric distribution (named after Kenneth Ted
Wallenius) is a generalization of the hypergeometric distribution where items are sampled with bias.
This distribution can be illustrated as an urn model with bias. Assume, for example, that an urn contains m1 red balls and
m2 white balls, totalling N = m1 + m2 balls. Each red ball has the weight ω1 and each white ball has the weight ω2 . We
will say that the odds ratio is ω = ω1 / ω2 . Now we are taking n balls, one by one, in such a way that the probability of
taking a particular ball at a particular draw is equal to its proportion of the total weight of all balls that lie in the urn at
that moment. The number of red balls x1 that we get in this experiment is a random variable with Wallenius’ noncentral
hypergeometric distribution.
The matter is complicated by the fact that there is more than one noncentral hypergeometric distribution. Wallenius’
noncentral hypergeometric distribution is obtained if balls are sampled one by one in such a way that there is competition
between the balls. Fisher’s noncentral hypergeometric distribution is obtained if the balls are sampled simultaneously or
independently of each other. Unfortunately, both distributions are known in the literature as “the” noncentral hypergeo-
metric distribution. It is important to be specific about which distribution is meant when using this name.
The two distributions are both equal to the (central) hypergeometric distribution when the odds ratio is 1.
It is far from obvious why these two distributions are different. See the Wikipedia entry on noncentral hypergeometric
distributions for a more detailed explanation of the difference between these two probability distributions.

1.9.1 Univariate distribution

Wallenius’ distribution is particularly complicated because each ball has a probability of being taken that depends not
only on its weight, but also on the total weight of its competitors. And the weight of the competing balls depends on the
outcomes of all preceding draws.
This recursive dependency gives rise to a difference equation with a solution that is given in open form by the integral in
the expression of the probability mass function in the table above.
Closed form expressions for the probability mass function exist (Lyons, 1980), but they are not very useful for practical
calculations because of extreme numerical instability, except in degenerate cases.
Several other calculation methods are used, including recursion, Taylor expansion and numerical integration (Fog, 2007,
2008).
The most reliable calculation method is recursive calculation of f(x,n) from f(x,n−1) and f(x−1,n−1) using the recursion
formula given below under properties. The probabilities of all (x,n) combinations on all possible trajectories leading to the
desired point are calculated, starting with f(0,0) = 1, as shown in the accompanying figure. The total number of probabilities to
calculate is n(x+1) − x². Other calculation methods must be used when n and x are so big that this method is too inefficient.

The probability that all balls have the same color is easier to calculate. See the formula below under multivariate distri-
bution.
No exact formula for the mean is known (short of complete enumeration of all probabilities). The equation given above
is reasonably accurate. This equation can be solved for μ by Newton-Raphson iteration. The same equation can be used
for estimating the odds from an experimentally obtained value of the mean.

Properties of the univariate distribution

Wallenius’ distribution has fewer symmetry relations than Fisher’s noncentral hypergeometric distribution has. The only
symmetry relates to the swapping of colors:

wnchypg(x; n, m1 , m2 , ω) = wnchypg(n − x; n, m2 , m1 , 1/ω) .

Unlike Fisher’s distribution, Wallenius’ distribution has no symmetry relating to the number of balls not taken.
The following recursion formula is useful for calculating probabilities:

$$\operatorname{wnchypg}(x; n, m_1, m_2, \omega) = \frac{(m_1 - x + 1)\, \omega}{(m_1 - x + 1)\, \omega + m_2 + x - n}\, \operatorname{wnchypg}(x-1; n-1, m_1, m_2, \omega)$$
$$+\ \frac{m_2 + x - n + 1}{(m_1 - x)\, \omega + m_2 + x - n + 1}\, \operatorname{wnchypg}(x; n-1, m_1, m_2, \omega).$$
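Here is a minimal Python sketch of the corresponding dynamic-programming scheme, starting from f(0,0) = 1 as described above (the function name is illustrative; it computes all intermediate states rather than only the trajectories that can reach the desired point, and assumes n ≤ m1 + m2):

```python
def wallenius_pmf(x, n, m1, m2, omega):
    """wnchypg(x; n, m1, m2, omega) by forward recursion from f(0, 0) = 1.
    f[(r, d)] is the probability of having taken r red balls after d draws."""
    f = {(0, 0): 1.0}
    for d in range(1, n + 1):
        for r in range(max(0, d - m2), min(d, m1) + 1):
            prob = 0.0
            if r > 0 and (r - 1, d - 1) in f:  # the d-th ball was red
                rw = (m1 - r + 1) * omega
                prob += f[(r - 1, d - 1)] * rw / (rw + m2 + r - d)
            if (r, d - 1) in f:                # the d-th ball was white
                ww = m2 + r - d + 1
                prob += f[(r, d - 1)] * ww / ((m1 - r) * omega + ww)
            f[(r, d)] = prob
    return f[(x, n)]
```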

Another recursion formula is also known:

$$\operatorname{wnchypg}(x; n, m_1, m_2, \omega) = \frac{m_1 \omega}{m_1 \omega + m_2}\, \operatorname{wnchypg}(x-1; n-1, m_1 - 1, m_2, \omega)$$
$$+\ \frac{m_2}{m_1 \omega + m_2}\, \operatorname{wnchypg}(x; n-1, m_1, m_2 - 1, \omega).$$

The probability is bounded by

$$f_1(x) \le \operatorname{wnchypg}(x; n, m_1, m_2, \omega) \le f_2(x) \quad \text{for } \omega < 1,$$

$$f_1(x) \ge \operatorname{wnchypg}(x; n, m_1, m_2, \omega) \ge f_2(x) \quad \text{for } \omega > 1,$$

where

$$f_1(x) = \binom{m_1}{x} \binom{m_2}{n-x} \frac{n!}{(m_1 + m_2/\omega)^{\underline{x}}\, (m_2 + \omega(m_1 - x))^{\underline{n-x}}},$$

$$f_2(x) = \binom{m_1}{x} \binom{m_2}{n-x} \frac{n!}{(m_1 + (m_2 - x_2)/\omega)^{\underline{x}}\, (m_2 + \omega m_1)^{\underline{n-x}}},$$

with $x_2 = n - x$, and where the underlined superscript indicates the falling factorial $a^{\underline{b}} = a(a-1)\cdots(a-b+1)$.



1.9.2 Multivariate distribution


The distribution can be expanded to any number of colors c of balls in the urn. The multivariate distribution is used when
there are more than two colors.
The probability mass function can be calculated by various Taylor expansion methods or by numerical integration (Fog,
2008).
The probability that all balls have the same color, j, can be calculated as:

$$\operatorname{mwnchypg}((0, \ldots, 0, x_j, 0, \ldots); n, \mathbf{m}, \boldsymbol{\omega}) = \frac{m_j^{\underline{n}}}{\left( \frac{1}{\omega_j} \sum_{i=1}^{c} m_i \omega_i \right)^{\underline{n}}}$$

for $x_j = n \le m_j$, where the underlined superscript denotes the falling factorial.


A reasonably good approximation to the mean can be calculated using the equation given above. The equation can be
solved by defining θ so that

$$\mu_i = m_i (1 - e^{\omega_i \theta})$$

and solving

$$\sum_{i=1}^{c} \mu_i = n$$

for θ by Newton-Raphson iteration.
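A minimal Python sketch of this procedure (the function name and starting point are illustrative; it assumes n < m1 + ... + mc, so that a root θ < 0 exists):

```python
import math

def mwallenius_mean_approx(n, m, omega, tol=1e-12, max_iter=100):
    """Approximate means mu_i = m_i (1 - exp(omega_i * theta)) of the
    multivariate Wallenius distribution, solving sum_i mu_i = n for theta
    by Newton-Raphson iteration."""
    theta = -1.0 / max(omega)  # heuristic starting point
    for _ in range(max_iter):
        g = sum(mi * (1.0 - math.exp(wi * theta)) for mi, wi in zip(m, omega)) - n
        dg = -sum(mi * wi * math.exp(wi * theta) for mi, wi in zip(m, omega))
        step = g / dg
        theta -= step
        if abs(step) < tol:
            break
    return [mi * (1.0 - math.exp(wi * theta)) for mi, wi in zip(m, omega)]
```

When all weights are equal, the result reduces to the central hypergeometric means n·mᵢ/N, which is a useful sanity check.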


The equation for the mean is also useful for estimating the odds from experimentally obtained values for the mean.
No good way of calculating the variance is known. The best known method is to approximate the multivariate Wallenius
distribution by a multivariate Fisher’s noncentral hypergeometric distribution with the same mean, and insert the mean as
calculated above in the approximate formula for the variance of the latter distribution.

Properties of the multivariate distribution

The order of the colors is arbitrary so that any colors can be swapped.
The weights can be arbitrarily scaled:

mwnchypg(x; n, m, ω) = mwnchypg(x; n, m, rω) for all r ∈ R+ .

Colors with zero number (mᵢ = 0) or zero weight (ωᵢ = 0) can be omitted from the equations.
Colors with the same weight can be joined:

$$\operatorname{mwnchypg}(\mathbf{x}; n, \mathbf{m}, (\omega_1, \ldots, \omega_{c-1}, \omega_{c-1})) = \operatorname{mwnchypg}((x_1, \ldots, x_{c-1} + x_c); n, (m_1, \ldots, m_{c-1} + m_c), (\omega_1, \ldots, \omega_{c-1})) \cdot \operatorname{hypg}(x_c; x_{c-1} + x_c, m_c, m_{c-1} + m_c),$$

where hypg(x; n, m, N ) is the (univariate, central) hypergeometric distribution probability.



1.9.3 Complementary Wallenius’ noncentral hypergeometric distribution

The balls that are not taken in the urn experiment have a distribution that is different from Wallenius’ noncentral hyperge-
ometric distribution, due to a lack of symmetry. The distribution of the balls not taken can be called the complementary
Wallenius’ noncentral hypergeometric distribution.
Probabilities in the complementary distribution are calculated from Wallenius’ distribution by replacing n with N-n, xᵢ
with mᵢ - xᵢ, and ωᵢ with 1/ωᵢ.

1.9.4 Software available

• WalleniusHypergeometricDistribution in Mathematica.

• An implementation for the R programming language is available as the package named BiasedUrn. Includes uni-
variate and multivariate probability mass functions, distribution functions, quantiles, random variable generating
functions, mean and variance.

• Implementation in C++ is available from www.agner.org.

1.9.5 See also

• Noncentral hypergeometric distributions

• Fisher’s noncentral hypergeometric distribution

• Biased sample

• Bias

• Population genetics

• Fisher’s exact test

1.9.6 References

Chesson, J. (1976), “A non-central multivariate hypergeometric distribution arising from biased sampling with application
to selective predation”, Journal of Applied Probability (Applied Probability Trust) 13 (4): 795–797, doi:10.2307/3212535,
JSTOR 3212535.
Fog, A. (2007), Random number theory.
Fog, A. (2008), “Calculation Methods for Wallenius’ Noncentral Hypergeometric Distribution”, Communications in
Statistics, Simulation and Computation 37 (2): 258–273, doi:10.1080/03610910701790269.
Johnson, N. L.; Kemp, A. W.; Kotz, S. (2005), Univariate Discrete Distributions, Hoboken, New Jersey: Wiley and Sons.
Lyons, N. I. (1980), “Closed Expressions for Noncentral Hypergeometric Probabilities”, Communications in Statistics B
9: 313–314.
Manly, B. F. J. (1974), “A Model for Certain Types of Selection Experiments”, Biometrics (International Biometric
Society) 30 (2): 281–294, doi:10.2307/2529649, JSTOR 2529649.
Wallenius, K. T. (1963), Biased Sampling: The Non-central Hypergeometric Probability Distribution. Ph.D. Thesis, Stan-
ford University, Department of Statistics.

1.10 Benford’s law

Not to be confused with the unrelated adage Benford’s law of controversy.


Benford’s law, also called the First-Digit Law, is a phenomenological law about the frequency distribution of leading
digits in many (but not all) real-life sets of numerical data. That law states that in many naturally occurring collections
of numbers the small digits occur disproportionately often as leading significant digits.[1] For example, in sets which obey
the law the number 1 would appear as the most significant digit about 30% of the time, while larger digits would occur in
that position less frequently: 9 would appear less than 5% of the time. If all digits were distributed uniformly, they would
each occur about 11.1% of the time.[2] Benford’s law also concerns the expected distribution for digits beyond the first,
which approach a uniform distribution.
It has been shown that this result applies to a wide variety of data sets, including electricity bills, street addresses, stock
prices, population numbers, death rates, lengths of rivers, physical and mathematical constants,[3] and processes described
by power laws (which are very common in nature). It tends to be most accurate when values are distributed across multiple
orders of magnitude.
The graph here shows Benford’s law for base 10. There is a generalization of the law to numbers expressed in other bases
(for example, base 16), and also a generalization from leading 1 digit to leading n digits.
It is named after physicist Frank Benford, who stated it in 1938,[4] although it had been previously stated by Simon
Newcomb in 1881.[5]

1.10.1 Mathematical statement

A set of numbers is said to satisfy Benford’s law if the leading digit d (d ∈ {1, ..., 9}) occurs with probability

$$P(d) = \log_{10}(d+1) - \log_{10}(d) = \log_{10}\left( \frac{d+1}{d} \right) = \log_{10}\left( 1 + \frac{1}{d} \right).$$

Numerically, the leading digits have the following distribution in Benford’s law, where d is the leading digit and P(d) the
probability: P(1) = 30.1%, P(2) = 17.6%, P(3) = 12.5%, P(4) = 9.7%, P(5) = 7.9%, P(6) = 6.7%, P(7) = 5.8%, P(8) = 5.1%, P(9) = 4.6%.
The quantity P(d) is proportional to the space between d and d + 1 on a logarithmic scale. Therefore, this is the distribution
expected if the mantissae of the logarithms of the numbers (but not the numbers themselves) are uniformly and randomly
distributed. For example, a number x, constrained to lie between 1 and 10, starts with the digit 1 if 1 ≤ x < 2, and starts
with the digit 9 if 9 ≤ x < 10. Therefore, x starts with the digit 1 if log 1 ≤ log x < log 2, or starts with 9 if log 9 ≤ log x <
log 10. The interval [log 1, log 2] is much wider than the interval [log 9, log 10] (0.30 and 0.05 respectively); therefore
if log x is uniformly and randomly distributed, it is much more likely to fall into the wider interval than the narrower
interval, i.e. more likely to start with 1 than with 9. The probabilities are proportional to the interval widths, and this
gives the equation above. (The above discussion assumed x is between 1 and 10, but the result is the same no matter how
many digits x has before the decimal point.)
An extension of Benford’s law predicts the distribution of first digits in other bases besides decimal; in fact, any base b ≥
2. The general form is:

$$P(d) = \log_b(d+1) - \log_b(d) = \log_b\left( 1 + \frac{1}{d} \right).$$
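These leading-digit probabilities are straightforward to compute; here is a small Python sketch (the function name is illustrative):

```python
import math

def benford_prob(d, base=10):
    """P(leading digit = d) under Benford's law in the given base, 1 <= d < base."""
    return math.log(1 + 1 / d, base)

print([round(benford_prob(d), 3) for d in range(1, 10)])
# [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]
```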

For b = 2 (the binary number system), Benford’s law is true but trivial: All binary numbers (except for 0) start with the
digit 1. (On the other hand, the generalization of Benford’s law to second and later digits is not trivial, even for binary
numbers.) Also, Benford’s law does not apply to unary systems such as tally marks.

1.10.2 Example
Examining a list of the heights of the 60 tallest structures in the world by category shows that 1 is by far the most common
leading digit, irrespective of the unit of measurement:

1.10.3 History
The discovery of Benford’s law goes back to 1881, when the American astronomer Simon Newcomb noticed that in
logarithm tables the earlier pages (that started with 1) were much more worn than the other pages.[5] Newcomb’s published
result is the first known instance of this observation and includes a distribution on the second digit, as well. Newcomb
proposed a law that the probability of a single number N being the first digit of a number was equal to log(N + 1) − log(N).
The phenomenon was again noted in 1938 by the physicist Frank Benford,[4] who tested it on data from 20 different
domains and was credited for it. His data set included the surface areas of 335 rivers, the sizes of 3259 US populations,
104 physical constants, 1800 molecular weights, 5000 entries from a mathematical handbook, 308 numbers contained in
an issue of Reader’s Digest, the street addresses of the first 342 persons listed in American Men of Science and 418 death
rates. The total number of observations used in the paper was 20,229. This discovery was later named after Benford
(making it an example of Stigler’s Law).
In 1995, Ted Hill proved the result about mixed distributions mentioned below.[6]

1.10.4 Explanations
Arno Berger and Ted Hill have stated that, “The widely known phenomenon called Benford’s law continues to defy
attempts at an easy derivation”.[1]
However, limited explanations of Benford’s law have been offered.

Overview

Benford’s law states that the fractional part of the logarithm of the data is uniformly distributed between 0 and 1. It tends
to apply most accurately to data that are distributed uniformly across many orders of magnitude. As a rule, the more
orders of magnitude that the data evenly covers, the more accurately Benford’s law applies.
For instance, one can expect that Benford’s law would apply to a list of numbers representing the populations of UK
villages, or representing the values of small insurance claims. But if a “village” is defined as a settlement with population
between 300 and 999, or a “small insurance claim” is defined as a claim between $50 and $99, then Benford’s law will
not apply.[7][8]
Consider the probability distributions shown below, referenced to a log scale.[9] In each case, the total area in red is the
relative probability that the first digit is 1, and the total area in blue is the relative probability that the first digit is 8.
For the left distribution, the size of the areas of red and blue are approximately proportional to the widths of each red
and blue bar. Therefore the numbers drawn from this distribution will approximately follow Benford’s law. On the other
hand, for the right distribution, the ratio of the areas of red and blue is very different from the ratio of the widths of each
red and blue bar. Rather, the relative areas of red and blue are determined more by the height of the bars than the widths.
Accordingly, the first digits in this distribution do not satisfy Benford’s law at all.[8]
Thus, real-world distributions that span several orders of magnitude rather uniformly (e.g. populations of villages / towns
/ cities, stock-market prices), are likely to satisfy Benford’s law to a very high accuracy. On the other hand, a distribution
that is mostly or entirely within one order of magnitude (e.g. heights of human adults, or IQ scores) is unlikely to
satisfy Benford’s law very accurately, or at all.[7][8] However, it is not a sharp line: As the distribution gets narrower, the
discrepancies from Benford’s law typically increase gradually.
In terms of conventional probability density (referenced to a linear scale rather than log scale, i.e. P(x)dx rather than P(log
x) d(log x)), the equivalent criterion is that Benford’s law will be very accurately satisfied when P(x) is approximately
proportional to 1/x over several orders-of-magnitude variation in x.[9]

This discussion is not a full explanation of Benford’s law, because we have not explained why we so often come across
data-sets that, when plotted as a probability distribution of the logarithm of the variable, are relatively uniform over several
orders of magnitude.[10] The following sections give examples of how this might happen.

Outcomes of exponential growth processes

Here is a simple example where Benford’s law would occur. 1000 cells of bacteria are introduced into a dish full of food.
The number of bacteria grows exponentially, doubling each day. Every few hours for 30 days, one counts the number
of bacteria that are in the dish, and writes down that number on a list. (By the end of 30 days, there will be a trillion
bacteria.) Then this list of numbers will follow Benford’s law quite accurately.
Why? Remember, the number of bacteria is growing exponentially, doubling each day. On the first day, the number of
bacteria is increasing from 1000 towards 2000: The first digit is 1 the whole day. On the second day, there are 2000
bacteria increasing towards 4000: The first digit is 2 for about fourteen hours and 3 for about ten hours. On the third day, there are
4000 bacteria increasing towards 8000: The first digit will pass through 4, 5, 6, and 7, spending less and less time in each
digit. The next day, there are 8000 bacteria increasing towards 16,000. The leading digit will pass rapidly through 8 and
9 in a few hours, but then once there are 10,000 bacteria, the first digit will be 1 for a whole 24 hours, until the number
of bacteria gets to 20,000.
From this example, it can be seen that the first digit is 1 with the highest probability, and 9 with the lowest.
Another way to think about it is: An exponentially-growing quantity is moving rightward on a log-scale at a constant rate.
If we measure the number of bacteria at a random time in the 30-day window, we will get a random point on the log-scale,
uniformly distributed in that corresponding window (about 6 orders of magnitude). As explained in the previous section,
we expect this kind of probability distribution to satisfy Benford’s law with high accuracy.
This example makes it plausible that data tables that involve measurements of exponentially growing quantities will agree
with Benford’s Law. But the law also describes many data-sets which do not have any apparent relation to exponential
growth.
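A small NumPy simulation of this thought experiment, sampling an exponentially growing quantity at uniformly random times over 30 doubling periods and tallying the leading digits (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(0.0, 30.0, size=100_000)    # sampling times, in doubling periods
values = 1000.0 * 2.0 ** t                  # exponentially growing population
exponents = np.floor(np.log10(values))
first_digits = (values // 10.0 ** exponents).astype(int)
observed = np.bincount(first_digits, minlength=10)[1:10] / len(values)
expected = np.log10(1.0 + 1.0 / np.arange(1, 10))
print(np.round(observed, 3))                # close to the Benford probabilities
print(np.round(expected, 3))
```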

Multiplicative fluctuations

Many real-world examples of Benford’s law arise from multiplicative fluctuations.[11] For example, if a stock price starts
at $100, and then each day it gets multiplied by a randomly-chosen factor between 0.99 and 1.01, then over an extended
period of time the probability distribution of its price satisfies Benford’s law with higher and higher accuracy.
The reason is that the logarithm of the stock price is undergoing a random walk, so over time its probability distribution will
get more and more broad and uniform (see above).[11] (More technically, the central limit theorem says that multiplying
more and more random variables will create a log-normal distribution with larger and larger variance, so eventually it
covers many orders of magnitude almost uniformly.)
Unlike multiplicative fluctuations, additive fluctuations do not lead to Benford’s law: They lead instead to normal prob-
ability distributions (again by the central limit theorem), which do not satisfy Benford’s law. For example, the “number
of heartbeats that I experience on a given day” can be written as the sum of many random variables (e.g. the sum of
heartbeats per minute over all the minutes of the day), so this quantity is unlikely to follow Benford’s law. By contrast,
that hypothetical stock price described above can be written as the product of many random variables (i.e. the price
change factor for each day), so is likely to follow Benford’s law quite well.

Scale invariance

If there is a list of lengths, the distribution of first digits of numbers in the list may be generally similar regardless of
whether all the lengths are expressed in metres, or yards, or feet, or inches, etc.
This is not always the case. For example, the height of adult humans almost always starts with a 1 or 2 when measured in
meters, and almost always starts with 4, 5, 6, or 7 when measured in feet.
But consider a list of lengths that is spread evenly over many orders of magnitude. For example, a list of 1000 lengths
mentioned in scientific papers will include the measurements of molecules, bacteria, plants, and galaxies. If one writes
all those lengths in meters, or writes them all in feet, it is reasonable to expect that the distribution of first digits should
be the same on the two lists.
In these situations, where the distribution of first digits of a data set is scale invariant (or independent of the units that the
data are expressed in), the distribution of first digits is always given by Benford’s Law.[12][13] To be sure of approximate
agreement with Benford’s Law, the data has to be approximately invariant when scaled up by any factor up to 10. A
lognormally distributed data set with wide dispersion has this approximate property, as do some of the examples mentioned
above.
For example, the first (non-zero) digit on this list of lengths should have the same distribution whether the unit of mea-
surement is feet or yards. But there are three feet in a yard, so the probability that the first digit of a length in yards is
1 must be the same as the probability that the first digit of a length in feet is 3, 4, or 5. Applying this to all possible
measurement scales gives the logarithmic distribution of Benford’s law.

Multiple probability distributions

For numbers drawn from certain distributions (IQ scores, human heights) the Law fails to hold because these variates
obey a normal distribution which is known not to satisfy Benford’s law,[14] since normal distributions can't span several
orders of magnitude and the mantissae of their logarithms will not be (even approximately) uniformly distributed.
However, if one “mixes” numbers from those distributions, for example by taking numbers from newspaper articles,
Benford’s law reappears. This can also be proven mathematically: if one repeatedly “randomly” chooses a probability
distribution (from an uncorrelated set) and then randomly chooses a number according to that distribution, the resulting
list of numbers will obey Benford’s Law.[6][15] A similar probabilistic explanation for the appearance of Benford’s Law
in everyday-life numbers has been advanced by showing that it arises naturally when one considers mixtures of uniform
distributions.[16]

1.10.5 Applications

Accounting fraud detection

In 1972, Hal Varian suggested that the law could be used to detect possible fraud in lists of socio-economic data submitted
in support of public planning decisions. Based on the plausible assumption that people who make up figures tend to
distribute their digits fairly uniformly, a simple comparison of first-digit frequency distribution from the data with the
expected distribution according to Benford’s Law ought to show up any anomalous results.[17] Following this idea, Mark
Nigrini showed that Benford’s Law could be used in forensic accounting and auditing as an indicator of accounting and
expenses fraud.[18] In practice, applications of Benford’s Law for fraud detection routinely use more than the first digit.[18]

Legal status

In the United States, evidence based on Benford’s law has been admitted in criminal cases at the federal, state, and local
levels.[19]

Election data

Benford’s Law has been invoked as evidence of fraud in the 2009 Iranian elections,[20] and also used to analyze other
election results. However, other experts consider Benford’s Law essentially useless as a statistical indicator of election
fraud in general.[21][22]

Macroeconomic data

Similarly, the macroeconomic data the Greek government reported to the European Union before entering the eurozone
was shown to be probably fraudulent using Benford’s law, albeit years after the country joined.[23]

Genome data

The number of open reading frames and their relationship to genome size differs between eukaryotes and prokaryotes
with the former showing a log-linear relationship and the latter a linear relationship. Benford’s law has been used to test
this observation with an excellent fit to the data in both cases.[24]

Scientific fraud detection

A test of regression coefficients in published papers showed agreement with Benford’s law.[25] As a comparison group,
subjects were asked to fabricate statistical estimates; the fabricated results failed to obey Benford’s law.

1.10.6 Statistical tests


Although the chi squared test has been used to test for compliance with Benford’s law it has low statistical power when
used with small samples.
The Kolmogorov–Smirnov test and the Kuiper test are more powerful when the sample size is small, particularly when
Stephens’s corrective factor is used.[26] These tests may be overly conservative when applied to discrete distributions.
Values for the Benford test have been generated by Morrow.[27] The critical values of the test statistics are shown below:
Two alternative tests specific to this law have been published: first, the max (m) statistic[28] is given by

$$m = \sqrt{N} \cdot \max_{i=1,\ldots,9} \left\{ \left| \Pr(X\ \text{has FSD} = i) - \log_{10}(1 + 1/i) \right| \right\},$$

and secondly, the distance (d) statistic[29] is given by

$$d = \sqrt{ N \cdot \sum_{i=1}^{9} \left[ \Pr(X\ \text{has FSD} = i) - \log_{10}(1 + 1/i) \right]^2 },$$

where FSD is the First Significant Digit and N is the sample size. Morrow has determined the critical values for both
these statistics, which are shown below:[27]
Nigrini[30] has suggested the use of a z statistic

$$z = \frac{|p_o - p_e| - \frac{1}{2n}}{s_i}$$

with

$$s_i = \left[ \frac{p_e (1 - p_e)}{n} \right]^{1/2},$$

where |x| is the absolute value of x, n is the sample size, 1/(2n) is a continuity correction factor, $p_e$ is the proportion
expected from Benford’s law, and $p_o$ is the observed proportion in the sample.
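For illustration, a small NumPy sketch computing the three statistics above from observed first-digit counts (the function name is illustrative; the max statistic is computed here with the absolute deviation):

```python
import numpy as np

def benford_test_stats(counts):
    """Leemis's m, Cho-Gaines's d, and Nigrini's per-digit z statistics
    from observed first-digit counts for d = 1..9."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p_obs = counts / n
    p_exp = np.log10(1.0 + 1.0 / np.arange(1, 10))
    m = np.sqrt(n) * np.max(np.abs(p_obs - p_exp))
    d = np.sqrt(n * np.sum((p_obs - p_exp) ** 2))
    z = (np.abs(p_obs - p_exp) - 1.0 / (2.0 * n)) / np.sqrt(p_exp * (1.0 - p_exp) / n)
    return m, d, z
```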

Morrow has also shown that for any random variable X (with a continuous pdf) divided by its standard deviation (σ), a
value A can be found such that the probability of the distribution of the first significant digit of the random variable ( X /
σ )A will differ from Benford’s Law by less than ε > 0.[27] The value of A depends on the value of ε and the distribution
of the random variable.
A method of accounting fraud detection based on bootstrapping and regression has been proposed.[31]

1.10.7 Generalization to digits beyond the first

It is possible to extend the law to digits beyond the first.[32] In particular, the probability of encountering a number starting
with the string of digits n is given by:

$$\log_{10}(n+1) - \log_{10}(n) = \log_{10}\left( 1 + \frac{1}{n} \right).$$

(For example, the probability that a number starts with the digits 3, 1, 4 is log10 (1 + 1/314) ≈ 0.0014.) This result can be
used to find the probability that a particular digit occurs at a given position within a number. For instance, the probability
that a “2” is encountered as the second digit is[32]

$$\log_{10}\left(1 + \frac{1}{12}\right) + \log_{10}\left(1 + \frac{1}{22}\right) + \cdots + \log_{10}\left(1 + \frac{1}{92}\right) \approx 0.109.$$

And the probability that d (d = 0, 1, ..., 9) is encountered as the n-th (n > 1) digit is

$$\sum_{k=10^{n-2}}^{10^{n-1}-1} \log_{10}\left( 1 + \frac{1}{10k + d} \right).$$

The distribution of the n-th digit, as n increases, rapidly approaches a uniform distribution with 10% for each of the ten
digits.[32] Four digits is often enough to assume a uniform distribution of 10% as '0' appears 10.0176% of the time in the
fourth digit while '9' appears 9.9824% of the time.
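A small Python sketch of these digit-position probabilities (the function name is illustrative):

```python
import math

def benford_digit_prob(d, position):
    """Probability that digit d (0-9) occupies the given position
    (1 = leading digit) under Benford's law."""
    if position == 1:
        return math.log10(1.0 + 1.0 / d) if d != 0 else 0.0
    lo, hi = 10 ** (position - 2), 10 ** (position - 1)
    return sum(math.log10(1.0 + 1.0 / (10 * k + d)) for k in range(lo, hi))

# benford_digit_prob(2, 2) ~ 0.109 and benford_digit_prob(0, 4) ~ 0.100176,
# matching the values quoted above.
```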

1.10.8 Tests with common distributions

Benford’s law was empirically tested against the numbers (up to the 10th digit) generated by a number of important
distributions, including the uniform distribution, the exponential distribution, the half-normal distribution, the right-
truncated normal, the normal distribution, the chi square distribution and the log normal distribution. [14] In addition to
these the ratio distribution of two uniform distributions, the ratio distribution of two exponential distributions, the ratio
distribution of two half-normal distributions, the ratio distribution of two right-truncated normal distributions, the ratio
distribution of two chi-square distributions (the F distribution) and the log normal distribution were tested.
The uniform distribution as might be expected does not obey Benford’s law. In contrast, the ratio distribution of two
uniform distributions is well described by Benford’s law. Benford’s law also describes the exponential distribution and the
ratio distribution of two exponential distributions well. Although the half-normal distribution does not obey Benford’s law,
the ratio distribution of two half-normal distributions does. Neither the right-truncated normal distribution nor the ratio
distribution of two right-truncated normal distributions are well described by Benford’s law. This is not surprising as this
distribution is weighted towards larger numbers. Neither the normal distribution nor the ratio distribution of two normal
distributions (the Cauchy distribution) obey Benford’s law. The fit of chi square distribution depends on the degrees
of freedom (df) with good agreement with df = 1 and decreasing agreement as the df increases. The F distribution is
fitted well for low degrees of freedom. With increasing dfs the fit decreases but much more slowly than the chi square
distribution. The fit of the log-normal distribution depends on the mean and the variance of the distribution. The variance
has a much greater effect on the fit than does the mean. Larger values of both parameters result in better agreement with
the law. The ratio of two log normal distributions is a log normal so this distribution was not examined.
Other distributions that have been examined include the Muth distribution, Gompertz distribution, Weibull distribution,
gamma distribution, log-logistic distribution and the exponential power distribution all of which show reasonable agree-
ment with the law.[28][33] The Gumbel distribution – whose density increases with increasing value of the random variable –
does not show agreement with this law.[33]

1.10.9 Distributions known to obey Benford’s law


Some well-known infinite integer sequences provably satisfy Benford’s Law exactly (in the asymptotic limit as more and
more terms of the sequence are included). Among these are the Fibonacci numbers,[34][35] the factorials,[36] the powers
of 2,[37][38] and the powers of almost any other number.[37]
Likewise, some continuous processes satisfy Benford’s Law exactly (in the asymptotic limit as the process continues
through time). One is an exponential growth or decay process: If a quantity is exponentially increasing or decreasing in
time, then the percentage of time that it has each first digit satisfies Benford’s Law asymptotically (i.e. increasing accuracy
as the process continues through time).

1.10.10 Distributions known to not obey Benford’s law


Square roots and reciprocals do not obey this law.[39] The 1974 Vancouver, Canada telephone book violates Benford’s
law because regulations require that telephone numbers have a fixed number of digits and do not begin with 1. Benford’s
law is violated by the populations of all places with population at least 2500 from five US states according to the 1960 and
1970 censuses, where only 19% began with digit 1 but 20% began with digit 2, for the simple reason that the truncation
at 2500 introduces statistical bias.[39] The terminal digits in pathology reports violate Benford’s law due to rounding, and
the fact that terminal digits are never expected to follow Benford’s law in the first place.[40]

1.10.11 Criteria for distributions expected and not expected to obey Benford’s Law
A number of criteria—applicable particularly to accounting data—have been suggested where Benford’s Law can be
expected to apply and not to apply.[41]

Distributions that can be expected to obey Benford’s Law

• When the mean is greater than the median and the skew is positive
• Numbers that result from mathematical combination of numbers: e.g. quantity × price
• Transaction level data: e.g. disbursements, sales
• Numbers produced when doing any multiplicative calculations with an Oughtred slide rule, since the answers nat-
urally fall into the right logarithmic distribution.

Distributions that would not be expected to obey Benford’s Law

• Where numbers are assigned sequentially: e.g. check numbers, invoice numbers
• Where numbers are influenced by human thought: e.g. prices set by psychological thresholds ($1.99)
• Accounts with a large number of firm-specific numbers: e.g. accounts set up to record $100 refunds
• Accounts with a built-in minimum or maximum
• Where no transaction is recorded

1.10.12 Moments
Moments of random variables for the digits 1 to 9 following this law have been calculated:[42]

• mean 3.440
• variance 6.057
• skewness 0.796
• kurtosis −0.548

For the first and second digit distribution these values are also known:[43]

• mean 38.590
• variance 621.832
• skewness 0.772
• kurtosis −0.547

A table of the exact probabilities for the joint occurrence of the first two digits according to Benford’s law is available,[43]
as is the population correlation between the first and second digits:[43] ρ = 0.0561.

1.10.13 See also


• Fraud detection in predictive analytics
• Zipf’s law

1.10.14 References
[1] Arno Berger and Theodore P Hill, Benford’s Law Strikes Back: No Simple Explanation in Sight for Mathematical Gem, 2011

[2] Weisstein, Eric W. “Benford’s Law”. MathWorld, A Wolfram web resource. Retrieved 7 June 2015.

[3] Paul H. Kvam, Brani Vidakovic, Nonparametric Statistics with Applications to Science and Engineering, p. 158

[4] Frank Benford (March 1938). “The law of anomalous numbers”. Proc. Am. Philos. Soc. 78 (4): 551–572. JSTOR 984802.
(subscription required)

[5] Simon Newcomb (1881). “Note on the frequency of use of the different digits in natural numbers”. American Journal of
Mathematics (American Journal of Mathematics, Vol. 4, No. 1) 4 (1/4): 39–40. doi:10.2307/2369148. JSTOR 2369148.
(subscription required)

[6] Theodore P. Hill (1995). “A Statistical Derivation of the Significant-Digit Law” (PDF). Statistical Science 10: 354–363.
doi:10.1214/ss/1177009869. MR 1421567.

[7] Steven W. Smith. “The Scientist and Engineer’s Guide to Digital Signal Processing, chapter 34, Explaining Benford’s Law”.
Retrieved 15 December 2012. (especially section 10).

[8] Fewster, R. M. (2009). “A simple explanation of Benford’s Law”. The American Statistician 63 (1): 26–32. doi:10.1198/tast.2009.0005.

[9] This section discusses and plots probability distributions of the logarithms of a variable. This is not the same as taking a regular
probability distribution of a variable, and simply plotting it on a log scale. Instead, one multiplies the distribution by a certain
function. The log scale distorts the horizontal distances, so the height has to be changed also, in order for the area under
each section of the curve to remain true to the original distribution. Specifically: $P(\log x)\, d(\log x) = (1/x)\, P(\log x)\, dx$.

[10] Arno Berger and Theodore P Hill, Benford’s Law Strikes Back: No Simple Explanation in Sight for Mathematical Gem, 2011.
The authors describe this argument, but say it “still leaves open the question of why it is reasonable to assume that the logarithm
of the spread, as opposed to the spread itself—or, say, the log log spread—should be large.” Moreover, they say: “assuming
large spread on a logarithmic scale is equivalent to assuming an approximate conformance with [Benford’s law]" (italics added),
something which they say lacks a “simple explanation”.

[11] L. Pietronero, E. Tosatti, V. Tosatti, A. Vespignani (2001). “Explaining the uneven distribution of numbers in nature: the laws
of Benford and Zipf”. Physica A 293: 297–304. Bibcode:2001PhyA..293..297P. doi:10.1016/S0378-4371(00)00633-6.

[12] Roger S. Pinkham, On the Distribution of First Significant Digits, Ann. Math. Statist. Volume 32, Number 4 (1961), 1223-
1230.

[13] MathWorld – Benford’s Law

[14] Formann, A. K. (2010). Morris, Richard James, ed. “The Newcomb-Benford Law in Its Relation to Some Common Dis-
tributions”. PLoS ONE 5 (5): e10541. Bibcode:2010PLoSO...510541F. doi:10.1371/journal.pone.0010541. PMC 2866333.
PMID 20479878.

[15] Theodore P. Hill (July–August 1998). “The first digit phenomenon” (PDF). American Scientist 86 (4): 358. Bibcode:1998AmSci..86..358H.
doi:10.1511/1998.4.358.

[16] Élise Janvresse and Thierry de la Rue (2004), “From Uniform Distributions to Benford’s Law”, Journal of Applied Probability,
41 1203–1210 doi:10.1239/jap/1101840566 MR 2122815 preprint

[17] Varian, Hal (1972). “Benford’s Law (Letters to the Editor)". The American Statistician 26 (3): 65. doi:10.1080/00031305.1972.10478934.

[18] Nigrini, Mark J. (May 1999). “I've Got Your Number: How a mathematical phenomenon can help CPAs uncover fraud and
other irregularities”. Journal of Accountancy.

[19] “From Benford to Erdös”. Radio Lab. Episode 2009-10-09. 2009-09-30.

[20] Stephen Battersby Statistics hint at fraud in Iranian election New Scientist 24 June 2009

[21] Joseph Deckert, Mikhail Myagkov and Peter C. Ordeshook, (2010) The Irrelevance of Benford’s Law for Detecting Fraud in
Elections, Caltech/MIT Voting Technology Project Working Paper No. 9

[22] Charles R. Tolle, Joanne L. Budzien, and Randall A. LaViolette (2000) Do dynamical systems follow Benford's Law?, Chaos
10, 2, pp. 331–336 (2000); doi:10.1063/1.166498

[23] Müller, Hans Christian: Greece Was Lying About Its Budget Numbers. Forbes. 12 September 2011.

[24] Friar, JL; Goldman, T; Pérez-Mercader, J (2012). “Genome sizes and the benford distribution”. PLOS ONE 7 (5): e36624.
arXiv:1205.6512. Bibcode:2012PLoSO...736624F. doi:10.1371/journal.pone.0036624.

[25] Diekmann A (2007) Not the First Digit! Using Benford’s Law to detect fraudulent scientific data. J Appl Stat 34 (3) 321–329,
doi:10.1080/02664760601004940

[26] Stephens, M. A. (1970). “Use of the Kolmogorov–Smirnov, Cramér–Von Mises and Related Statistics without Extensive Ta-
bles”. Journal of the Royal Statistical Society, Series B 32 (1): 115–122. Retrieved 2013-03-09.

[27] Morrow, J. (2010) “Benford’s Law, Families of Distributions and a test basis”, UW-Madison

[28] Leemis, L. M.; Schmeiser, B. W.; Evans, D. L. (2000). “Survival distributions satisfying Benford’s Law”. The American
Statistician 54 (4): 236–241. doi:10.1080/00031305.2000.10474554.

[29] Cho, W. K. T.; Gaines, B. J. (2007). “Breaking the (Benford) law: Statistical fraud detection in campaign finance”. The
American Statistician 61 (3): 218–223. doi:10.1198/000313007X223496.

[30] Nigrini, M. (1996). “A taxpayer compliance application of Benford’s Law”. J Amer Tax Assoc 18: 72–91.

[31] Suh, I. S.; Headrick, T. C.; Minaburo, S. (2011). “An effective and efficient analytic technique: A bootstrap regression procedure
and Benford’s Law”. J Forensic & Investigative Accounting 3 (3).

[32] Theodore P. Hill, “The Significant-Digit Phenomenon”, The American Mathematical Monthly, Vol. 102, No. 4, (Apr., 1995),
pp. 322–327. Official web link (subscription required). Alternate, free web link.

[33] Dümbgen, L; Leuenberger, C (2008). “Explicit bounds for the approximation error in Benford’s Law”. Elect Comm in Probab
13: 99–112. doi:10.1214/ECP.v13-1358.

[34] L. C. Washington, “Benford’s Law for Fibonacci and Lucas Numbers”, The Fibonacci Quarterly, 19.2, (1981), 175–177

[35] Duncan, R. L. (1967). “An Application of Uniform Distribution to the Fibonacci Numbers”. The Fibonacci Quarterly 5:
137–140.

[36] Sarkar, P. B. (1973). “An Observation on the Significant Digits of Binomial Coefficients and Factorials”. Sankhya B 35:
363–364.

[37] In general, the sequence k^1, k^2, k^3, etc., satisfies Benford’s Law exactly, under the condition that log10 k is an irrational number.
This is a straightforward consequence of the equidistribution theorem.

[38] That the first 100 powers of 2 approximately satisfy Benford’s Law is mentioned by Ralph Raimi. Raimi, Ralph A. (1976).
“The First Digit Problem”. American Mathematical Monthly 83 (7): 521–538. doi:10.2307/2319349.

[39] Raimi, RA (1976). “The first digit problem”. American Mathematical Monthly 83: 521–538. doi:10.2307/2319349.

[40] Beer, TW (2009). “Terminal digit preference: beware of Benford’s Law”. J Clin Pathol 62: 192. doi:10.1136/jcp.2008.061721.

[41] Durtschi, C; Hillison, W; Pacini, C (2004). “The effective use of Benford’s Law to assist in detecting fraud in accounting data”.
J Forensic Accounting 5: 17–34.

[42] Scott, P.D.; Fasli, M. (2001) “Benford’s Law: An empirical investigation and a novel explanation”. CSM Technical Report 349,
Department of Computer Science, Univ. Essex

[43] Suh, I.S.; Headrick, T.C. (2010). “A comparative analysis of the bootstrap versus traditional statistical procedures applied to
digital analysis based on Benford’s Law” (PDF). Journal of Forensic and Investigative Accounting 2 (2): 144–175.

1.10.15 Further reading


• Arno Berger and Theodore P. Hill (2015). An Introduction to Benford’s Law. Princeton University Press. ISBN
978-0-691-16306-2.
• Alex Ely Kossovsky. Benford’s Law: Theory, the General Law of Relative Quantities, and Forensic Fraud Detection
Applications, 2014, World Scientific Publishing. ISBN 978-981-4583-68-8.
• “Benford’s Law – from Wolfram MathWorld”. Mathworld.wolfram.com. 14 June 2012. Retrieved 2012-06-26.
• Mark J. Nigrini (2012). Benford’s Law: Applications for Forensic Accounting, Auditing, and Fraud Detection. John
Wiley & Sons. p. 330. ISBN 978-1-118-15285-0.
• Alessandro Gambini, Giovanni Mingari Scarpello, Daniele Ritelli; et al. (2012). “Probability of digits by dividing
random numbers: A ψ and ζ functions approach”. Expositiones Mathematicae 30 (4): 223–238. doi:10.1016/j.exmath.2012.03.001.
• Sehity; Hoelzl, Erik; Kirchler, Erich (2005). “Price developments after a nominal shock: Benford’s Law and
psychological pricing after the euro introduction”. International Journal of Research in Marketing 22 (4): 471–
480. doi:10.1016/j.ijresmar.2005.09.002.
• Nicolas Gauvrit; Jean-Paul Delahaye (2011). “Scatter and regularity implies Benford’s Law...and more”. In Zenil:
Randomness through computation: some answers, more questions. pp. 58–69. ISBN 9814327751. arXiv:0910.1359.
Bibcode:2009arXiv0910.1359G. doi:10.1142/9789814327756_0004.
• Bernhard Rauch, Max Göttsche, Gernot Brähler, Stefan Engel (August 2011). “Fact and Fiction in EU-Governmental
Economic Data”. German Economic Review 12 (3): 243–255. doi:10.1111/j.1468-0475.2011.00542.x.
• Wendy Cho and Brian Gaines (August 2007). “Breaking the (Benford) Law: statistical fraud detection in campaign
finance”. The American Statistician 61 (3): 218–223. doi:10.1198/000313007X223496.
• Geiringer, Hilda; Furlan, L. V. (1948). “The Law of Harmony in Statistics: An Investigation of the Metrical
Interdependence of Social Phenomena. by L. V. Furlan”. Journal of the American Statistical Association (American
Statistical Association) 43 (242): 325–328. doi:10.2307/2280379. JSTOR 2280379.

1.10.16 External links


General audience

• Benford Online Bibliography, an online bibliographic database on Benford’s Law.


• Companion website for Benford’s Law by Mark Nigrini Website includes 15 data sets, 10 Excel templates, photos,
documents, and other miscellaneous items related to Benford’s Law
• Following Benford’s Law, or Looking Out for No. 1, 1998 article from The New York Times.

• A further five numbers: number 1 and Benford’s law, BBC radio segment by Simon Singh

• From Benford to Erdös, Radio segment from the Radiolab program


• Looking out for number one by Jon Walthoe, Robert Hunt and Mike Pearson, Plus Magazine, September 1999

• Video showing Benford’s Law applied to Web Data (incl. Minnesota Lakes, US Census Data and Digg Statistics)
• An illustration of Benford’s Law, showing first-digit distributions of various sequences evolve over time, interactive.

• Generate your own Benford numbers A script for generating random numbers compliant with Benford’s Law.
• Testing Benford’s Law An open source project showing Benford’s Law in action against publicly available datasets.

• Testing Benford’s Law in OLAP Cubes Implementation with Microsoft Analysis Services.
• Mould, Steve. “Number 1 and Benford’s Law”. Numberphile. Brady Haran.

• A third of property values begin with a 1 An example of Benford’s Law appearing in house price data.
• Benford’s Very Strange Law - Professor John D. Barrow, lecture on Benford’s Law.

More mathematical

• Weisstein, Eric W., “Benford’s Law”, MathWorld.

• Benford’s law, Zipf’s law, and the Pareto distribution by Terence Tao

• Country Data and Benford’s Law, Benford’s Law from Ratios of Random Numbers at Wolfram Demonstrations
Project.

• Benford’s Law Solved with Digital Signal Processing

• Interactive graphic: Univariate Distribution Relationships


[Figure: Probability mass function for Fisher’s noncentral hypergeometric distribution for different values of the odds ratio ω; m1 = 80, m2 = 60, n = 100, ω = 0.01, ..., 1000.]

[Figure: Probability mass function for Wallenius’ noncentral hypergeometric distribution for different values of the odds ratio ω; m1 = 80, m2 = 60, n = 100, ω = 0.1, ..., 20.]

[Figure: Recursive calculation of the probability f(x,n) in Wallenius’ distribution. The light grey fields are possible points on the way to the final point; the arrows indicate an arbitrary trajectory.]

[Figure: Probability mass function for the complementary Wallenius’ noncentral hypergeometric distribution for different values of the odds ratio ω; m1 = 80, m2 = 60, n = 40, ω = 0.05, ..., 10.]

[Figure: The distribution of first digits according to Benford’s law. Each bar represents a digit, and the height of the bar is the percentage of numbers that start with that digit.]

[Figure: Frequency of the first significant digit of physical constants plotted against Benford’s law.]

[Figure: A logarithmic scale bar. Picking a random x position uniformly on this number line, roughly 30% of the time the first digit of the number will be 1.]

[Figure: Distribution of first digits (in %, red bars) in the population of the 237 countries of the world. Black dots indicate the distribution predicted by Benford’s law.]
Chapter 2

Continuous Distributions - Supported on semi-infinite intervals, usually [0,∞)

2.1 Beta prime distribution


In probability theory and statistics, the beta prime distribution (also known as inverted beta distribution or beta
distribution of the second kind[1] ) is an absolutely continuous probability distribution defined for x > 0 with two
parameters α and β, having the probability density function:

$$f(x) = \frac{x^{\alpha-1}\, (1+x)^{-\alpha-\beta}}{B(\alpha, \beta)}$$

where B is the Beta function.
The cumulative distribution function is

$$F(x; \alpha, \beta) = I_{\frac{x}{1+x}}(\alpha, \beta),$$

where I is the regularized incomplete beta function.


The expectation value, variance, and other details of the distribution are given in the sidebox; for β > 4, the excess
kurtosis is

$$\gamma_2 = 6\, \frac{\alpha(\alpha+\beta-1)(5\beta-11) + (\beta-1)^2(\beta-2)}{\alpha(\alpha+\beta-1)(\beta-3)(\beta-4)}.$$
While the related beta distribution is the conjugate prior distribution of the parameter of a Bernoulli distribution expressed
as a probability, the beta prime distribution is the conjugate prior distribution of the parameter of a Bernoulli distribution
expressed in odds. The distribution is a Pearson type VI distribution.[1]

The mode of a variate X distributed as \beta'(\alpha,\beta) is \hat{X} = \frac{\alpha-1}{\beta+1}. Its mean is \frac{\alpha}{\beta-1} if \beta > 1 (if \beta \le 1 the mean is infinite; in other words, it has no well-defined mean), and its variance is \frac{\alpha(\alpha+\beta-1)}{(\beta-2)(\beta-1)^2} if \beta > 2.
For -\alpha < k < \beta, the k-th moment E[X^k] is given by

E[X^k] = \frac{B(\alpha+k,\, \beta-k)}{B(\alpha,\beta)}.


For k \in \mathbb{N} with k < \beta, this simplifies to

E[X^k] = \prod_{i=1}^{k} \frac{\alpha+i-1}{\beta-i}.

The cdf can also be written as

F(x) = \frac{x^{\alpha} \cdot {}_2F_1(\alpha,\, \alpha+\beta;\, \alpha+1;\, -x)}{\alpha \cdot B(\alpha,\beta)},

where {}_2F_1 is the Gauss hypergeometric function.
Differential equation
(x^2 + x)\,f'(x) + f(x)\left((\beta+1)x - \alpha + 1\right) = 0, \quad f(1) = \frac{2^{-\alpha-\beta}}{B(\alpha,\beta)}
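As a quick numerical check of the formulas above, the following sketch evaluates the density directly from the Beta function and compares the mean and variance against scipy.stats.betaprime. This is an illustration only: it assumes NumPy and SciPy are available, and the parameter values are chosen for the example rather than taken from the text.

import numpy as np
from scipy.special import beta as B
from scipy.stats import betaprime

def beta_prime_pdf(x, a, b):
    # f(x) = x^(a-1) (1+x)^(-a-b) / B(a, b), for x > 0
    return x**(a - 1) * (1 + x)**(-a - b) / B(a, b)

a, b = 3.0, 5.0
x = np.linspace(0.01, 5.0, 50)
assert np.allclose(beta_prime_pdf(x, a, b), betaprime.pdf(x, a, b))

print(betaprime.mean(a, b), a / (b - 1))                              # mean, valid for b > 1
print(betaprime.var(a, b), a * (a + b - 1) / ((b - 2) * (b - 1)**2))  # variance, valid for b > 2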

2.1.1 Generalization
Two more parameters can be added to form the generalized beta prime distribution.

p > 0 shape (real)


q > 0 scale (real)

having the probability density function:

f(x; \alpha, \beta, p, q) = \frac{p\left(\frac{x}{q}\right)^{\alpha p - 1}\left(1 + \left(\frac{x}{q}\right)^{p}\right)^{-\alpha-\beta}}{q\,B(\alpha,\beta)}
with mean

\frac{q\,\Gamma\!\left(\alpha + \frac{1}{p}\right)\Gamma\!\left(\beta - \frac{1}{p}\right)}{\Gamma(\alpha)\,\Gamma(\beta)} \quad \text{if } \beta p > 1
and mode

q\left(\frac{\alpha p - 1}{\beta p + 1}\right)^{1/p} \quad \text{if } \alpha p \ge 1

Note that if p = q = 1, then the generalized beta prime distribution reduces to the standard beta prime distribution.

Compound gamma distribution

The compound gamma distribution[2] is the generalization of the beta prime when the scale parameter, q, is added, but where p = 1. It is so named because it is formed by compounding two gamma distributions:

\beta'(x; \alpha, \beta, 1, q) = \int_0^{\infty} G(x; \alpha, p)\, G(p; \beta, q)\, dp

where G(x;a,b) is the gamma distribution with shape a and inverse scale b. This relationship can be used to generate
random variables with a compound gamma, or beta prime distribution.
The mode, mean and variance of the compound gamma can be obtained by multiplying the mode and mean in the above infobox by q and the variance by q².
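The compounding construction translates directly into a sampling recipe: draw the inverse scale from one gamma distribution, then draw the variate from a gamma distribution with that inverse scale. A minimal sketch in Python/NumPy follows; the parameter values are illustrative, and the ratio-of-gammas shortcut uses the relation between the beta prime and gamma distributions stated in the Related distributions section below.

import numpy as np

rng = np.random.default_rng(0)
a, b, q = 2.0, 6.0, 1.5        # alpha, beta, scale q (illustrative)
n = 100_000

# Step 1: p ~ Gamma(shape beta, inverse scale q), i.e. NumPy scale 1/q.
p = rng.gamma(shape=b, scale=1.0 / q, size=n)
# Step 2: x ~ Gamma(shape alpha, inverse scale p), i.e. NumPy scale 1/p.
x = rng.gamma(shape=a, scale=1.0 / p)

# Equivalent shortcut: q times a ratio of two standard gamma variates.
y = q * rng.gamma(a, size=n) / rng.gamma(b, size=n)

# Both samples should match the scaled beta prime mean q*alpha/(beta-1), beta > 1.
print(x.mean(), y.mean(), q * a / (b - 1))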

2.1.2 Properties
• If X \sim \beta'(\alpha, \beta) then \frac{1}{X} \sim \beta'(\beta, \alpha).

• If X \sim \beta'(\alpha, \beta, p, q) then kX \sim \beta'(\alpha, \beta, p, kq).

• \beta'(\alpha, \beta, 1, 1) = \beta'(\alpha, \beta)

2.1.3 Related distributions



• If X \sim F(\alpha, \beta) then \frac{\alpha}{\beta} X \sim \beta'\!\left(\frac{\alpha}{2}, \frac{\beta}{2}\right)

• If X \sim \text{Beta}(\alpha, \beta) then \frac{X}{1-X} \sim \beta'(\alpha, \beta)

• If X \sim \Gamma(\alpha, 1) and Y \sim \Gamma(\beta, 1), then \frac{X}{Y} \sim \beta'(\alpha, \beta).

• \beta'(p, 1, a, b) = \text{Dagum}(p, a, b), the Dagum distribution

• \beta'(1, p, a, b) = \text{SinghMaddala}(p, a, b), the Singh-Maddala distribution

• \beta'(1, 1, \gamma, \sigma) = \text{LL}(\gamma, \sigma), the log-logistic distribution

• The beta prime distribution is a special case of the type 6 Pearson distribution

• The Pareto distribution type II is related to the beta prime distribution

• The Pareto distribution type IV is related to the beta prime distribution

• The inverted Dirichlet distribution is a generalization of the beta prime distribution

2.1.4 Notes
[1] Johnson et al (1995), p248

[2] Dubey, Satya D. (December 1970). “Compound gamma, beta and F distributions”. Metrika 16: 27–31. doi:10.1007/BF02613934.

2.1.5 References
• Johnson, N.L., Kotz, S., Balakrishnan, N. (1995). Continuous Univariate Distributions, Volume 2 (2nd Edition), Wiley. ISBN 0-471-58494-0

• MathWorld article

2.2 Birnbaum–Saunders distribution


The Birnbaum–Saunders distribution, also known as the fatigue life distribution, is a probability distribution used
extensively in reliability applications to model failure times. There are several alternative formulations of this distribution
in the literature. It is named after Z. W. Birnbaum and S. C. Saunders.

2.2.1 Theory
This distribution was developed to model failures due to cracks. A material is placed under repeated cycles of stress. The j-th cycle leads to an increase in the crack length by an amount Xj. The sum of the Xj is assumed to be normally distributed with mean nμ and variance nσ². The probability that the crack does not exceed a critical length ω is

P(X \le \omega) = \Phi\!\left(\frac{\omega - n\mu}{\sigma\sqrt{n}}\right)
where Φ() is the cdf of normal distribution.
If T is the number of cycles to failure then the cumulative distribution function (cdf) of T is

P(T \le t) = 1 - \Phi\!\left(\frac{\omega - t\mu}{\sigma\sqrt{t}}\right) = \Phi\!\left(\frac{t\mu - \omega}{\sigma\sqrt{t}}\right) = \Phi\!\left(\frac{\mu\sqrt{t}}{\sigma} - \frac{\omega}{\sigma\sqrt{t}}\right) = \Phi\!\left(\frac{\sqrt{\mu\omega}}{\sigma}\left[\left(\frac{t}{\omega/\mu}\right)^{0.5} - \left(\frac{\omega/\mu}{t}\right)^{0.5}\right]\right)
The more usual form of this distribution is:

F(x; \alpha, \beta) = \Phi\!\left(\frac{1}{\alpha}\left[\left(\frac{x}{\beta}\right)^{0.5} - \left(\frac{\beta}{x}\right)^{0.5}\right]\right)

Here α is the shape parameter and β is the scale parameter.

2.2.2 Properties
The Birnbaum–Saunders distribution is unimodal with a median of β.
The mean (μ), variance (σ2 ), skewness (γ) and kurtosis (κ) are as follows:

\mu = \beta\left(1 + \frac{\alpha^2}{2}\right)

\sigma^2 = (\alpha\beta)^2\left(1 + \frac{5\alpha^2}{4}\right)

\gamma = \frac{4\alpha\,(11\alpha^2 + 6)}{(5\alpha^2 + 4)^{3/2}}

\kappa = 3 + \frac{6\alpha^2\,(93\alpha^2 + 41)}{(5\alpha^2 + 4)^2}
Given a data set that is thought to be Birnbaum–Saunders distributed, the parameters’ values are best estimated by maximum likelihood.

Differential equation

The pdf of the Birnbaum–Saunders distribution is a solution of the following differential equation:

2\alpha^2 \beta x^2 (\beta + x)\, f'(x) + f(x)\left(x^3 + (\alpha^2 + 1)\beta x^2 + (3\alpha^2 - 1)\beta^2 x - \beta^3\right) = 0, \quad f(1) = \frac{(\beta + 1)\, e^{-\frac{(\beta-1)^2}{2\alpha^2\beta}}}{2\sqrt{2\pi}\,\alpha\sqrt{\beta}}

If T is Birnbaum–Saunders distributed with parameters α and β, then T^{-1} is also Birnbaum–Saunders distributed with parameters α and β^{-1}.

Transformation

Let T be a Birnbaum-Saunders distributed variate with parameters α and β. A useful transformation of T is

X = \frac{1}{2}\left[\left(\frac{T}{\beta}\right)^{0.5} - \left(\frac{T}{\beta}\right)^{-0.5}\right]

Equivalently

T = \beta\left(1 + 2X^2 + 2X\,(1 + X^2)^{0.5}\right)

X is then distributed normally with a mean of zero and a variance of α2 / 4.
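Since the transformation is invertible, it doubles as a sampler: draw X from N(0, α²/4) and map it to T. A minimal sketch (NumPy; the α and β values are illustrative) that also checks the median, mean and variance quoted in the Properties section above:

import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 0.5, 2.0
x = rng.normal(0.0, alpha / 2.0, size=200_000)     # X ~ N(0, alpha^2/4)
t = beta * (1 + 2 * x**2 + 2 * x * np.sqrt(1 + x**2))

print(np.median(t), beta)                                   # median = beta
print(t.mean(), beta * (1 + alpha**2 / 2))                  # mean
print(t.var(), (alpha * beta)**2 * (1 + 5 * alpha**2 / 4))  # variance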

2.2.3 Probability density function

The general formula for the probability density function (pdf) is

f(x) = \frac{\sqrt{\frac{x-\mu}{\beta}} + \sqrt{\frac{\beta}{x-\mu}}}{2\gamma\,(x-\mu)}\;\phi\!\left(\frac{\sqrt{\frac{x-\mu}{\beta}} - \sqrt{\frac{\beta}{x-\mu}}}{\gamma}\right), \quad x > \mu;\ \gamma, \beta > 0

where γ is the shape parameter, μ is the location parameter, β is the scale parameter, and ϕ is the probability density
function of the standard normal distribution.

2.2.4 Standard fatigue life distribution

The case where μ = 0 and β = 1 is called the standard fatigue life distribution. The pdf for the standard fatigue life
distribution reduces to

f(x) = \frac{\sqrt{x} + \sqrt{\frac{1}{x}}}{2\gamma x}\;\phi\!\left(\frac{\sqrt{x} - \sqrt{\frac{1}{x}}}{\gamma}\right), \quad x > 0;\ \gamma > 0

Since the general form of probability functions can be expressed in terms of the standard distribution, all of the subsequent
formulas are given for the standard form of the function.

2.2.5 Cumulative distribution function

The formula for the cumulative distribution function is

√ √ 
x− 1
F (x) = Φ  
x
x > 0; γ > 0
γ

where Φ is the cumulative distribution function of the standard normal distribution.



2.2.6 Quantile function


The formula for the quantile function is

G(p) = \frac{\left[\gamma\,\Phi^{-1}(p) + \sqrt{4 + \left(\gamma\,\Phi^{-1}(p)\right)^2}\,\right]^2}{4}

where \Phi^{-1} is the quantile function of the standard normal distribution.

2.2.7 External links


• Fatigue life distribution

2.2.8 References
• Birnbaum, Z. W.; Saunders, S. C. (1969), “A new family of life distributions”, Journal of Applied Probability 6
(2): 319–327, doi:10.2307/3212003, JSTOR 3212003
• Desmond, A.F. (1985), “Stochastic models of failure in random environments”, Canadian Journal of Statistics 13
(3): 171–183, doi:10.2307/3315148, JSTOR 3315148
• Johnson, N.; Kotz, S.; Balakrishnan, N. (1995), Continuous Univariate Distributions 2 (2nd ed.), New York: Wiley
• Lemonte, A. J.; Cribari-Neto, F.; Vasconcellos, K. L. P. (2007), “Improved statistical inference for the two-
parameter Birnbaum–Saunders distribution”, Computational Statistics and Data Analysis 51: 4656–4681, doi:10.1016/j.csda.2006.08.0
• Lemonte, A. J.; Simas, A. B.; Cribari-Neto, F. (2008), “Bootstrap-based improved estimators for the two-parameter
Birnbaum–Saunders distribution”, Journal of Statistical Computation and Simulation 78: 37–49, doi:10.1080/10629360600903882
• Cordeiro, G. M.; Lemonte, A. J. (2011), “The β-Birnbaum–Saunders distribution: An improved distribution for
fatigue life modeling”, Computational Statistics and Data Analysis 55: 1445–1461, doi:10.1016/j.csda.2010.10.007
• Lemonte, A. J. (2013), “A new extension of the Birnbaum–Saunders distribution”, Brazilian Journal of Probability
and Statistics 27: 133–149, doi:10.1214/11-BJPS160

This article incorporates public domain material from websites or documents of the National Institute of Standards and
Technology.

2.3 Chi distribution


See also: Chi-squared distribution

In probability theory and statistics, the chi distribution is a continuous probability distribution. It is the distribution
of the square root of the sum of squares of independent random variables having a standard normal distribution. The
most familiar examples are the Rayleigh distribution (a chi distribution with 2 degrees of freedom) and the Maxwell distribution of (normalized) molecular speeds (a chi distribution with 3 degrees of freedom, one for each spatial coordinate). If Xi are k independent, normally distributed random variables with means µi and standard deviations σi,
then the statistic

Y = \sqrt{\sum_{i=1}^{k} \left(\frac{X_i - \mu_i}{\sigma_i}\right)^{2}}

is distributed according to the chi distribution. Accordingly, dividing by the mean of the chi distribution (scaled by
the square root of n − 1) yields the correction factor in the unbiased estimation of the standard deviation of the normal
distribution. The chi distribution has one parameter: k which specifies the number of degrees of freedom (i.e. the number
of Xi ).

2.3.1 Characterization
Probability density function

The probability density function is

f(x; k) = \frac{2^{1-\frac{k}{2}}\, x^{k-1}\, e^{-\frac{x^2}{2}}}{\Gamma\!\left(\frac{k}{2}\right)}
where Γ(z) is the Gamma function.
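The density formula is straightforward to verify numerically; the sketch below compares a direct implementation against scipy.stats.chi (this assumes SciPy is available and is not part of the original article):

import numpy as np
from scipy.special import gamma as Gamma
from scipy.stats import chi

def chi_pdf(x, k):
    # f(x; k) = 2^(1 - k/2) x^(k-1) exp(-x^2/2) / Gamma(k/2)
    return 2.0**(1 - k / 2) * x**(k - 1) * np.exp(-x**2 / 2) / Gamma(k / 2)

x = np.linspace(0.01, 4.0, 50)
for k in (1, 2, 3, 10):
    assert np.allclose(chi_pdf(x, k), chi.pdf(x, k))
print("direct pdf agrees with scipy.stats.chi")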

Cumulative distribution function

The cumulative distribution function is given by:

F(x; k) = P\!\left(\frac{k}{2}, \frac{x^2}{2}\right)

where P(k, x) is the regularized Gamma function.

Generating functions

Moment generating function The moment generating function is given by:

M(t) = M\!\left(\frac{k}{2}, \frac{1}{2}, \frac{t^2}{2}\right) + t\sqrt{2}\,\frac{\Gamma((k+1)/2)}{\Gamma(k/2)}\, M\!\left(\frac{k+1}{2}, \frac{3}{2}, \frac{t^2}{2}\right)

Characteristic function The characteristic function is given by:

\varphi(t; k) = M\!\left(\frac{k}{2}, \frac{1}{2}, \frac{-t^2}{2}\right) + it\sqrt{2}\,\frac{\Gamma((k+1)/2)}{\Gamma(k/2)}\, M\!\left(\frac{k+1}{2}, \frac{3}{2}, \frac{-t^2}{2}\right)
where again, M (a, b, z) is Kummer’s confluent hypergeometric function.

2.3.2 Properties
Differential equation
x\,f'(x) + f(x)\left(x^2 - k + 1\right) = 0, \quad f(1) = \frac{2^{1-\frac{k}{2}}}{\sqrt{e}\,\Gamma\!\left(\frac{k}{2}\right)}

Moments

The raw moments are then given by:

\mu_j = 2^{j/2}\,\frac{\Gamma\!\left(\frac{k+j}{2}\right)}{\Gamma\!\left(\frac{k}{2}\right)}

where Γ(z) is the Gamma function. The first few raw moments are:

\mu_1 = \sqrt{2}\,\frac{\Gamma((k+1)/2)}{\Gamma(k/2)}

\mu_2 = k

\mu_3 = 2\sqrt{2}\,\frac{\Gamma((k+3)/2)}{\Gamma(k/2)} = (k+1)\,\mu_1

\mu_4 = k(k+2)

\mu_5 = 4\sqrt{2}\,\frac{\Gamma((k+5)/2)}{\Gamma(k/2)} = (k+1)(k+3)\,\mu_1

\mu_6 = k(k+2)(k+4)
where the rightmost expressions are derived using the recurrence relationship for the Gamma function:

Γ(x + 1) = xΓ(x)

From these expressions we may derive the following relationships:



Mean: \mu = \sqrt{2}\,\frac{\Gamma((k+1)/2)}{\Gamma(k/2)}

Variance: \sigma^2 = k - \mu^2

Skewness: \gamma_1 = \frac{\mu}{\sigma^3}\left(1 - 2\sigma^2\right)

Kurtosis excess: \gamma_2 = \frac{2}{\sigma^2}\left(1 - \mu\sigma\gamma_1 - \sigma^2\right)

Entropy

The entropy is given by:

S = \ln\!\left(\Gamma(k/2)\right) + \frac{1}{2}\left(k - \ln 2 - (k-1)\,\psi_0(k/2)\right)

where ψ0(z) is the polygamma function.

2.3.3 Related distributions


• If X \sim \chi_k then X^2 \sim \chi^2_k (chi-squared distribution)

• \lim_{k \to \infty} \frac{\chi_k - \mu_k}{\sigma_k} \xrightarrow{d} N(0, 1) (normal distribution)

• If X \sim N(0, 1) then |X| \sim \chi_1

• If X \sim \chi_1 then \sigma X \sim HN(\sigma) (half-normal distribution) for any \sigma > 0

• \chi_2 \sim \text{Rayleigh}(1) (Rayleigh distribution)

• \chi_3 \sim \text{Maxwell}(1) (Maxwell distribution)

• \|N_{i=1,\ldots,k}(0, 1)\|_2 \sim \chi_k (the 2-norm of k standard normally distributed variables is a chi distribution with k degrees of freedom)

• The chi distribution is a special case of the generalized gamma distribution, the Nakagami distribution, and the noncentral chi distribution.

2.3.4 See also


• Nakagami distribution

2.3.5 External links


• http://mathworld.wolfram.com/ChiDistribution.html

2.4 Chi-squared distribution


This article is about the mathematics of the chi-squared distribution. For its uses in statistics, see chi-squared test. For
the music group, see Chi2 (band).

In probability theory and statistics, the chi-squared distribution (also chi-square or χ²-distribution) with k degrees of
freedom is the distribution of a sum of the squares of k independent standard normal random variables. It is a special
case of the gamma distribution and is one of the most widely used probability distributions in inferential statistics, e.g.,
in hypothesis testing or in construction of confidence intervals.[2][3][4][5] When it is being distinguished from the more
general noncentral chi-squared distribution, this distribution is sometimes called the central chi-squared distribution.
The chi-squared distribution is used in the common chi-squared tests for goodness of fit of an observed distribution to a
theoretical one, the independence of two criteria of classification of qualitative data, and in confidence interval estimation
for a population standard deviation of a normal distribution from a sample standard deviation. Many other statistical tests
also use this distribution, like Friedman’s analysis of variance by ranks.

2.4.1 Definition
If Z 1 , ..., Zk are independent, standard normal random variables, then the sum of their squares,


Q = \sum_{i=1}^{k} Z_i^2,

is distributed according to the chi-squared distribution with k degrees of freedom. This is usually denoted as

Q \sim \chi^2(k) \quad \text{or} \quad Q \sim \chi^2_k.

The chi-squared distribution has one parameter: k, a positive integer that specifies the number of degrees of freedom (i.e. the number of Zi).
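The definition lends itself to a one-line simulation: square and sum k standard normal draws and compare the result with χ²(k). A sketch, assuming NumPy and SciPy are available (sample sizes illustrative):

import numpy as np
from scipy.stats import chi2, kstest

rng = np.random.default_rng(2)
k, n = 5, 100_000
q = (rng.standard_normal((n, k))**2).sum(axis=1)   # Q = Z_1^2 + ... + Z_k^2

print(q.mean(), k)                  # E[Q] = k
print(q.var(), 2 * k)               # Var[Q] = 2k
print(kstest(q, chi2(df=k).cdf))    # goodness of fit against chi-squared(k)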

2.4.2 Introduction to the chi-squared distribution


The chi-squared distribution is used primarily in hypothesis testing. Unlike more widely known distributions such as the
normal distribution and the exponential distribution, the chi-squared distribution is rarely used to model natural phenom-
ena. It arises in the following hypothesis tests, among others.

• Chi-squared test of independence in contingency tables


• Chi-squared test of goodness of fit of observed data to hypothetical distributions
• Likelihood-ratio test for nested models
• Log-rank test in survival analysis
• Cochran–Mantel–Haenszel test for stratified contingency tables

It is also a component of the definition of the t-distribution and the F-distribution used in t-tests, analysis of variance, and
regression analysis.
The primary reason that the chi-squared distribution is used extensively in hypothesis testing is its relationship to the
normal distribution. Many hypothesis tests use a test statistic, such as the t statistic in a t-test. For these hypothesis tests,
as the sample size, n, increases, the sampling distribution of the test statistic approaches the normal distribution (Central
Limit Theorem). Because the test statistic (such as t) is asymptotically normally distributed, provided the sample size
is sufficiently large, the distribution used for hypothesis testing may be approximated by a normal distribution. Testing
hypotheses using a normal distribution is well understood and relatively easy. The simplest chi-squared distribution is
the square of a standard normal distribution. So wherever a normal distribution could be used for a hypothesis test, a
chi-squared distribution could be used.
Specifically, suppose that Z is a standard normal random variable, with mean = 0 and variance = 1. Z ~ N(0,1). A sample
drawn at random from Z is a sample from the distribution shown in the graph of the standard normal distribution. Define a
new random variable Q. To generate a random sample from Q, take a sample from Z and square the value. The distribution
of the squared values is given by the random variable Q = Z². The distribution of the random variable Q is an example of a chi-squared distribution: Q ∼ χ²₁. The subscript 1 indicates that this particular chi-squared distribution is constructed
from only 1 standard normal distribution. A chi-squared distribution constructed by squaring a single standard normal
distribution is said to have 1 degree of freedom. Thus, as the sample size for a hypothesis test increases, the distribution
of the test statistic approaches a normal distribution, and the distribution of the square of the test statistic approaches a
chi-squared distribution. Just as extreme values of the normal distribution have low probability (and give small p-values),
extreme values of the chi-squared distribution have low probability.
An additional reason that the chi-squared distribution is widely used is that it is a member of the class of likelihood
ratio tests (LRT).[6] LRTs have several desirable properties; in particular, LRTs commonly provide the highest power to
reject the null hypothesis (Neyman–Pearson lemma). However, the normal and chi-squared approximations are only valid
asymptotically. For this reason, it is preferable to use the t distribution rather than the normal approximation or the chi-
squared approximation for small sample size. Similarly, in analyses of contingency tables, the chi-squared approximation
will be poor for small sample size, and it is preferable to use the Fisher Exact test. Ramsey and Ramsey show that the
exact binomial test is always more powerful than the normal approximation.[7]

Where are the squared normal distributions?

If the chi-squared distribution is used because it is the sum of squared normal distributions, where are the squared nor-
mal distributions in contingency tables analyzed with a chi-squared test? The answer can be traced back to the normal
approximation to the binomial distribution. Consider an experiment in which 10 fair coins are tossed, and the number of
heads is observed. This experiment can be modeled with a binomial distribution, with n=10 trials and p = 0.5 probability
of heads on each trial. Suppose that heads is observed 1 time in 10 trials. What is the probability of a result as extreme
as 1 heads in 10 trials, if the probability of heads is p=0.5?
Three methods to determine the probability are:

• Calculate the probability exactly using the binomial distribution.

• Estimate the probability using normal approximation to the binomial distribution.

• Estimate the probability using a chi-squared test. This result will be the same as the result for the normal approxi-
mation.

Calculation using the exact binomial and the normal approximation may be performed using http://vassarstats.net/binomialX.html. Calculation of the chi-square probability may be performed using http://vassarstats.net/csfit.html.
Using the binomial distribution, the probability of a result as extreme as 1 head in 10 trials is the sum of the probabilities of 0 heads, 1 head, 9 heads, or 10 heads. Notice that this is a two-tailed or two-sided test. This test gives p=0.0215.
Using the normal approximation to the binomial distribution, the (two-sided) probability of a result as extreme as 1 head in 10 trials is p=0.0271.
The chi-squared test is performed as follows. The observed number of heads is 1, and the observed number of tails is
9. The expected number of heads = expected number of tails = 10*0.5 = 5. The difference between the observed and
expected is 1-5=−4 for heads, and 9-5=4 for tails. The chi-squared statistic (with Yates’s correction for continuity) is

\chi^2 = \frac{(|1-5| - 0.5)^2}{5} + \frac{(|9-5| - 0.5)^2}{5} = 4.9.
For the chi-squared test, the (two-sided) probability of a result as extreme as 1 heads in 10 trials is p=0.027, the same
as the result using the normal approximation. That is, the probability that the chi-squared statistic with one degree of
freedom is greater than 4.9 is p=0.027.
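The three probabilities can be reproduced without the linked web calculators; a sketch using SciPy follows (the small differences from the quoted 0.0271 come from the continuity-correction convention used by a given calculator):

import numpy as np
from scipy.stats import binom, norm, chi2

n, p, heads = 10, 0.5, 1

# 1. Exact two-sided binomial: results at least as extreme as 1 head.
p_binom = binom.cdf(1, n, p) + binom.sf(8, n, p)     # P(<=1) + P(>=9)

# 2. Normal approximation to the binomial, with continuity correction.
z = (abs(heads - n * p) - 0.5) / np.sqrt(n * p * (1 - p))
p_norm = 2 * norm.sf(z)

# 3. Chi-squared statistic with Yates's correction, 1 degree of freedom.
expected = n * p
stat = sum((abs(o - expected) - 0.5)**2 / expected for o in (1, 9))
p_chi2 = chi2.sf(stat, df=1)

print(p_binom, p_norm, p_chi2)   # ~0.0215, ~0.027, ~0.027 (stat = 4.9)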
Lancaster [8] shows the connections among the binomial, normal, and chi-squared distributions, as follows. De Moivre
and Laplace established that a binomial distribution could be approximated by a normal distribution. Specifically they
showed the asymptotic normality of the random variable

\chi = \frac{m - Np}{\sqrt{Npq}}

where m is the observed number of successes in N trials, where the probability of success is p, and q = 1 − p.
Squaring both sides of the equation gives

\chi^2 = \frac{(m - Np)^2}{Npq}

Using N = Np + N(1 − p), N = m + (N − m), and q = 1 − p, this equation simplifies to

\chi^2 = \frac{(m - Np)^2}{Np} + \frac{(N - m - Nq)^2}{Nq}

The expression on the right is of the form that Pearson would generalize to the form:


\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}

where

\chi^2 = Pearson’s cumulative test statistic, which asymptotically approaches a \chi^2 distribution.

O_i = the number of observations of type i.

E_i = N p_i = the expected (theoretical) frequency of type i, asserted by the null hypothesis that the fraction of type i in the population is p_i.

n = the number of cells in the table.

In the case of a binomial outcome (flipping a coin), the binomial distribution may be approximated by a normal distribution
(for sufficiently large n). Because the square of a standard normal distribution is the chi-squared distribution with one
degree of freedom, the probability of a result such as 1 heads in 10 trials can be approximated either by the normal or
the chi-squared distribution. However, many problems involve more than the two possible outcomes of a binomial, and
instead require 3 or more categories, which leads to the multinomial distribution. Just as de Moivre and Laplace sought for
and found the normal approximation to the binomial, Pearson sought for and found a multivariate normal approximation to
the multinomial distribution. Pearson showed that the chi-squared distribution, the sum of squares of multiple normal distributions, was such an approximation to the multinomial distribution.[8]

2.4.3 Characteristics
Further properties of the chi-squared distribution can be found in the box at the upper right corner of this article.

Probability density function

The probability density function (pdf) of the chi-squared distribution is

f(x; k) = \begin{cases} \dfrac{x^{\frac{k}{2}-1}\, e^{-x/2}}{2^{k/2}\,\Gamma\!\left(\frac{k}{2}\right)}, & x > 0; \\ 0, & \text{otherwise}. \end{cases}

where Γ(k/2) denotes the Gamma function, which has closed-form values for integer k.
For derivations of the pdf in the cases of one, two and k degrees of freedom, see Proofs related to chi-squared distribution.

Differential equation

The pdf of the chi-squared distribution is a solution to the following differential equation:

2x\,f'(x) + f(x)(-k + x + 2) = 0, \quad f(1) = \frac{2^{-k/2}}{\sqrt{e}\,\Gamma\!\left(\frac{k}{2}\right)}

Cumulative distribution function

Its cumulative distribution function is:

F(x; k) = \frac{\gamma\!\left(\frac{k}{2}, \frac{x}{2}\right)}{\Gamma\!\left(\frac{k}{2}\right)} = P\!\left(\frac{k}{2}, \frac{x}{2}\right),

where γ(s,t) is the lower incomplete Gamma function and P(s,t) is the regularized Gamma function.
In a special case of k = 2 this function has a simple form:

Chernoff bound for the CDF and tail (1-CDF) of a chi-squared random variable with ten degrees of freedom (k = 10)

F(x; 2) = 1 - e^{-x/2}

and the form is not much more complicated for other small even k.
Tables of the chi-squared cumulative distribution function are widely available and the function is included in many
spreadsheets and all statistical packages.
Letting z \equiv x/k, Chernoff bounds on the lower and upper tails of the CDF may be obtained.[9] For the cases when 0 < z < 1 (which include all of the cases when this CDF is less than half):

F(zk; k) \le (z e^{1-z})^{k/2}.

The tail bound for the cases when z > 1, similarly, is

1 - F(zk; k) \le (z e^{1-z})^{k/2}.
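The bounds are easy to compare with the exact CDF; a sketch for the k = 10 case plotted above (SciPy assumed, z values chosen for illustration):

import numpy as np
from scipy.stats import chi2

k = 10
for z in (0.3, 0.5, 2.0, 3.0):
    bound = (z * np.exp(1 - z))**(k / 2)
    if z < 1:
        print(z, chi2.cdf(z * k, k), "<=", bound)   # bound on the lower tail
    else:
        print(z, chi2.sf(z * k, k), "<=", bound)    # bound on the upper tail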

For another approximation for the CDF modeled after the cube of a Gaussian, see under Noncentral chi-squared distri-
bution.

Additivity

It follows from the definition of the chi-squared distribution that the sum of independent chi-squared variables is also chi-squared distributed. Specifically, if X_1, \ldots, X_n are independent chi-squared variables with k_1, \ldots, k_n degrees of freedom, respectively, then Y = X_1 + \cdots + X_n is chi-squared distributed with k_1 + \cdots + k_n degrees of freedom.

Sample mean

The sample mean of n i.i.d. chi-squared variables of degree k is distributed according to a gamma distribution with shape
α and scale θ parameters:

\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \sim \text{Gamma}\!\left(\alpha = \frac{nk}{2},\ \theta = \frac{2}{n}\right) \quad \text{where } X_i \sim \chi^2(k)

Asymptotically, given that for a shape parameter α going to infinity a Gamma distribution converges towards a normal distribution with expectation \mu = \alpha\theta and variance \sigma^2 = \alpha\theta^2, the sample mean converges towards:

\bar{X} \xrightarrow{\,n\to\infty\,} N\!\left(\mu = k,\ \sigma^2 = \frac{2k}{n}\right)

Note that we would have obtained the same result invoking instead the central limit theorem, noting that for each chi-squared variable of degree k the expectation is k and its variance is 2k (and hence the variance of the sample mean \bar{X} is \sigma^2 = 2k/n).
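A short simulation confirms the gamma law of the sample mean; a sketch (NumPy/SciPy, with illustrative k and n):

import numpy as np
from scipy.stats import gamma, kstest

rng = np.random.default_rng(3)
k, n, reps = 4, 10, 50_000
xbar = rng.chisquare(k, size=(reps, n)).mean(axis=1)

# Sample mean of n chi-squared(k) variables ~ Gamma(shape n*k/2, scale 2/n).
print(kstest(xbar, gamma(a=n * k / 2, scale=2 / n).cdf))
print(xbar.mean(), k, "| variance:", xbar.var(), 2 * k / n)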

Entropy

The differential entropy is given by

h = -\int_{-\infty}^{\infty} f(x; k)\,\ln f(x; k)\, dx = \frac{k}{2} + \ln\!\left[2\,\Gamma\!\left(\frac{k}{2}\right)\right] + \left(1 - \frac{k}{2}\right)\psi\!\left(\frac{k}{2}\right),
where ψ(x) is the Digamma function.
The chi-squared distribution is the maximum entropy probability distribution for a random variate X for which E(X) = k
and E(ln(X)) = ψ (k/2) + log(2) are fixed. Since the chi-squared is in the family of gamma distributions, this can be
derived by substituting appropriate values in the Expectation of the Log moment of Gamma. For derivation from more
basic principles, see the derivation in moment generating function of the sufficient statistic.

Noncentral moments

The moments about zero of a chi-squared distribution with k degrees of freedom are given by[10][11]

E(X^m) = k(k+2)(k+4)\cdots(k+2m-2) = 2^m\,\frac{\Gamma\!\left(m + \frac{k}{2}\right)}{\Gamma\!\left(\frac{k}{2}\right)}.

Cumulants

The cumulants are readily obtained by a (formal) power series expansion of the logarithm of the characteristic function:

\kappa_n = 2^{n-1}\,(n-1)!\,k

Asymptotic properties

By the central limit theorem, because the chi-squared distribution is the sum of k independent random variables with finite mean and variance, it converges to a normal distribution for large k. For many practical purposes, for k > 50 the distribution is sufficiently close to a normal distribution for the difference to be ignored.[12] Specifically, if X ~ χ²(k), then as k tends to infinity, the distribution of (X - k)/\sqrt{2k} tends to a standard normal distribution. However, convergence is slow, as the skewness is \sqrt{8/k} and the excess kurtosis is 12/k.

• The sampling distribution of ln(χ²) converges to normality much faster than the sampling distribution of χ²,[13] as the logarithm removes much of the asymmetry.[14] Other functions of the chi-squared distribution converge more rapidly to a normal distribution. Some examples are:

• If X ~ χ²(k) then \sqrt{2X} is approximately normally distributed with mean \sqrt{2k-1} and unit variance (a result credited to R. A. Fisher).

• If X ~ χ²(k) then \sqrt[3]{X/k} is approximately normally distributed with mean 1 - \frac{2}{9k} and variance \frac{2}{9k}.[15] This is known as the Wilson–Hilferty transformation; both approximations are compared numerically in the sketch below.
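The sketch compares the approximate CDF values implied by the two transformations with the exact chi-squared CDF (SciPy assumed; k and the evaluation points are chosen for illustration):

import numpy as np
from scipy.stats import chi2, norm

k = 10
x = np.array([5.0, 10.0, 18.0])

exact = chi2.cdf(x, k)
fisher = norm.cdf(np.sqrt(2 * x) - np.sqrt(2 * k - 1))   # sqrt(2X) rule
wh = norm.cdf(((x / k)**(1 / 3) - (1 - 2 / (9 * k)))     # Wilson-Hilferty
              / np.sqrt(2 / (9 * k)))

print(exact)
print(fisher)   # rough for moderate k
print(wh)       # typically much closer to the exact values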

2.4.4 Relation to other distributions

Approximate formula for median compared with numerical quantile (top). Difference between numerical quantile and approximate
formula (bottom).

• As k \to \infty, \frac{\chi^2_k - k}{\sqrt{2k}} \xrightarrow{d} N(0, 1) (normal distribution)

• \chi^2_k \sim \chi'^2_k(0) (noncentral chi-squared distribution with non-centrality parameter \lambda = 0)

• If X \sim F(\nu_1, \nu_2) then Y = \lim_{\nu_2 \to \infty} \nu_1 X has the chi-squared distribution \chi^2_{\nu_1}

• As a special case, if X \sim F(1, \nu_2) then Y = \lim_{\nu_2 \to \infty} X has the chi-squared distribution \chi^2_1

• \|N_{i=1,\ldots,k}(0, 1)\|^2 \sim \chi^2_k (the squared norm of k standard normally distributed variables is a chi-squared distribution with k degrees of freedom)

• If X ∼ χ2 (ν) and c > 0 , then cX ∼ Γ(k = ν/2, θ = 2c) . (gamma distribution)



• If X \sim \chi^2_k then \sqrt{X} \sim \chi_k (chi distribution)

• If X ∼ χ2 (2) , then X ∼ Exp(1/2) is an exponential distribution. (See Gamma distribution for more.)

• If X ∼ Rayleigh(1) (Rayleigh distribution) then X 2 ∼ χ2 (2)

• If X ∼ Maxwell(1) (Maxwell distribution) then X 2 ∼ χ2 (3)

• If X \sim \chi^2(\nu) then \frac{1}{X} \sim \text{Inv-}\chi^2(\nu) (inverse-chi-squared distribution)

• The chi-squared distribution is a special case of type 3 Pearson distribution

• If X \sim \chi^2(\nu_1) and Y \sim \chi^2(\nu_2) are independent then \frac{X}{X+Y} \sim \text{Beta}\!\left(\frac{\nu_1}{2}, \frac{\nu_2}{2}\right) (beta distribution)

• If X ∼ U(0, 1) (uniform distribution) then −2 log (X) ∼ χ2 (2)

• χ2 (6) is a transformation of Laplace distribution


• If X_i \sim \text{Laplace}(\mu, \beta) then \sum_{i=1}^{n} \frac{2|X_i - \mu|}{\beta} \sim \chi^2(2n)

• chi-squared distribution is a transformation of Pareto distribution

• Student’s t-distribution is a transformation of chi-squared distribution

• Student’s t-distribution can be obtained from chi-squared distribution and normal distribution

• Noncentral beta distribution can be obtained as a transformation of chi-squared distribution and Noncentral chi-
squared distribution

• Noncentral t-distribution can be obtained from normal distribution and chi-squared distribution

A chi-squared variable with k degrees of freedom is defined as the sum of the squares of k independent standard normal
random variables.
If Y is a k-dimensional Gaussian random vector with mean vector μ and rank-k covariance matrix C, then X = (Y - \mu)^T C^{-1} (Y - \mu) is chi-squared distributed with k degrees of freedom.
The sum of squares of statistically independent unit-variance Gaussian variables which do not have mean zero yields a
generalization of the chi-squared distribution called the noncentral chi-squared distribution.
If Y is a vector of k i.i.d. standard normal random variables and A is a k×k symmetric, idempotent matrix with rank k−n, then the quadratic form Y^T A Y is chi-squared distributed with k−n degrees of freedom.
The chi-squared distribution is also naturally related to other distributions arising from the Gaussian. In particular,

• Y is F-distributed, Y ~ F(k_1, k_2), if Y = \frac{X_1/k_1}{X_2/k_2}, where X_1 ~ χ²(k_1) and X_2 ~ χ²(k_2) are statistically independent.

• If X is chi-squared distributed, then \sqrt{X} is chi distributed.

• If X_1 ~ χ²_{k_1} and X_2 ~ χ²_{k_2} are statistically independent, then X_1 + X_2 ~ χ²_{k_1+k_2}. If X_1 and X_2 are not independent, then X_1 + X_2 is not chi-squared distributed.

2.4.5 Generalizations

The chi-squared distribution is obtained as the sum of the squares of k independent, zero-mean, unit-variance Gaussian
random variables. Generalizations of this distribution can be obtained by summing the squares of other types of Gaussian
random variables. Several such distributions are described below.

Linear combination

If X_1, \ldots, X_n are chi-squared random variables and a_1, \ldots, a_n \in \mathbb{R}_{>0}, then a closed expression for the distribution of X = \sum_{i=1}^{n} a_i X_i is not known. It may, however, be calculated using the property of characteristic functions of chi-squared random variables.[16]

Chi-squared distributions

Noncentral chi-squared distribution Main article: Noncentral chi-squared distribution

The noncentral chi-squared distribution is obtained from the sum of the squares of independent Gaussian random variables
having unit variance and nonzero means.

Generalized chi-squared distribution Main article: Generalized chi-squared distribution

The generalized chi-squared distribution is obtained from the quadratic form z′Az where z is a zero-mean Gaussian vector
having an arbitrary covariance matrix, and A is an arbitrary matrix.

Gamma, exponential, and related distributions

The chi-squared distribution X ~ χ²(k) is a special case of the gamma distribution, in that X ~ Γ(k/2, 1/2) using the rate
parameterization of the gamma distribution (or X ~ Γ(k/2, 2) using the scale parameterization of the gamma distribution)
where k is an integer.
Because the exponential distribution is also a special case of the Gamma distribution, we also have that if X ~ χ²(2), then
X ~ Exp(1/2) is an exponential distribution.
The Erlang distribution is also a special case of the Gamma distribution and thus we also have that if X ~ χ²(k) with even
k, then X is Erlang distributed with shape parameter k/2 and scale parameter 1/2.

2.4.6 Applications

The chi-squared distribution has numerous applications in inferential statistics, for instance in chi-squared tests and in
estimating variances. It enters the problem of estimating the mean of a normally distributed population and the problem
of estimating the slope of a regression line via its role in Student’s t-distribution. It enters all analysis of variance problems
via its role in the F-distribution, which is the distribution of the ratio of two independent chi-squared random variables,
each divided by their respective degrees of freedom.
Following are some of the most common situations in which the chi-squared distribution arises from a Gaussian-distributed
sample.

• If X_1, \ldots, X_n are i.i.d. N(\mu, \sigma^2) random variables, then \sum_{i=1}^{n} (X_i - \bar{X})^2 \sim \sigma^2 \chi^2_{n-1}, where \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.

• The box below shows some statistics based on X_i \sim \text{Normal}(\mu_i, \sigma_i^2), i = 1, \ldots, k, independent random variables that have probability distributions related to the chi-squared distribution:

The chi-squared distribution is also often encountered in magnetic resonance imaging.[17]

2.4.7 Table of χ 2 value vs p-value

The p-value is the probability of observing a test statistic at least as extreme in a chi-squared distribution. Accordingly,
since the cumulative distribution function (CDF) for the appropriate degrees of freedom (df) gives the probability of
having obtained a value less extreme than this point, subtracting the CDF value from 1 gives the p-value. The table below
gives a number of p-values matching χ² for the first 10 degrees of freedom.
A low p-value indicates greater statistical significance, i.e. greater confidence that the observed deviation from the null
hypothesis is significant. A p-value of 0.05 is often used as a cutoff between significant and not-significant results.
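In software the table lookup reduces to a single call to the survival function; a sketch with SciPy (the statistic value is illustrative):

from scipy.stats import chi2

stat, df = 3.84, 1
p_value = chi2.sf(stat, df)   # equivalently 1 - chi2.cdf(stat, df)
print(p_value)                # ~0.05 for a statistic of 3.84 with 1 df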

2.4.8 History and name

This distribution was first described by the German statistician Friedrich Robert Helmert in papers of 1875-6,[19][20]
where he computed the sampling distribution of the sample variance of a normal population. Thus in German this was
traditionally known as the Helmert’sche (“Helmertian”) or “Helmert distribution”.
The distribution was independently rediscovered by the English mathematician Karl Pearson in the context of goodness
of fit, for which he developed his Pearson’s chi-squared test, published in 1900, with computed table of values published
in (Elderton 1902), collected in (Pearson 1914, pp. xxxi–xxxiii, 26–28, Table XII). The name “chi-squared” ultimately
derives from Pearson’s shorthand for the exponent in a multivariate normal distribution with the Greek letter Chi, writing
-½χ² for what would appear in modern notation as -½x^T Σ^{-1} x (Σ being the covariance matrix).[21] The idea of a family
of “chi-squared distributions”, however, is not due to Pearson but arose as a further development due to Fisher in the
1920s.[19]

2.4.9 See also

• Cochran’s theorem

• F-distribution

• Fisher’s method for combining independent tests of significance

• Gamma distribution

• Generalized chi-squared distribution

• Hotelling’s T-squared distribution

• Noncentral chi-squared distribution

• Pearson’s chi-squared test

• Student’s t-distribution

• Wilks’ lambda distribution

• Wishart distribution

2.4.10 References
[1] M.A. Sanders. “Characteristic function of the central chi-squared distribution” (PDF). Retrieved 2009-03-06.
[2] Abramowitz, Milton; Stegun, Irene A., eds. (December 1972) [1964]. “Chapter 26”. Handbook of Mathematical Functions
with Formulas, Graphs, and Mathematical Tables. Applied Mathematics Series 55 (10 ed.). New York, USA: United States
Department of Commerce, National Bureau of Standards; Dover Publications. p. 940. ISBN 978-0-486-61272-0. LCCN
64-60036. MR 0167642.
[3] NIST (2006). Engineering Statistics Handbook - Chi-Squared Distribution
[4] Jonhson, N. L.; Kotz, S.; Balakrishnan, N. (1994). “Chi-Squared Distributions including Chi and Rayleigh”. Continuous
Univariate Distributions 1 (Second ed.). John Willey and Sons. pp. 415–493. ISBN 0-471-58495-9.
[5] Mood, Alexander; Graybill, Franklin A.; Boes, Duane C. (1974). Introduction to the Theory of Statistics (Third ed.). McGraw-
Hill. pp. 241–246. ISBN 0-07-042864-6.
[6] Westfall, Peter H. (2013). Understanding Advanced Statistical Methods. Boca Raton, FL: CRC Press. ISBN 978-1-4665-1210-
8.
[7] Ramsey, PH (1988). “Evaluating the Normal Approximation to the Binomial Test”. Journal of Educational Statistics 13 (2):
173–82.
[8] Lancaster, H.O. (1969), The Chi-squared Distribution, Wiley
[9] Dasgupta, Sanjoy D. A.; Gupta, Anupam K. (2002). “An Elementary Proof of a Theorem of Johnson and Lindenstrauss”
(PDF). Random Structures and Algorithms 22: 60–65. doi:10.1002/rsa.10073. Retrieved 2012-05-01.
[10] Chi-squared distribution, from MathWorld, retrieved Feb. 11, 2009
[11] M. K. Simon, Probability Distributions Involving Gaussian Random Variables, New York: Springer, 2002, eq. (2.35), ISBN
978-0-387-34657-1
[12] Box, Hunter and Hunter (1978). Statistics for experimenters. Wiley. p. 118. ISBN 0471093157.
[13] Bartlett, M. S.; Kendall, D. G. (1946). “The Statistical Analysis of Variance-Heterogeneity and the Logarithmic Transforma-
tion”. Supplement to the Journal of the Royal Statistical Society 8 (1): 128–138. JSTOR 2983618.
[14] Shoemaker, Lewis H. (2003). “Fixing the F Test for Equal Variances”. The American Statistician 57 (2): 105–114. doi:10.1198/0003130031441.
JSTOR 30037243.
[15] Wilson, E. B.; Hilferty, M. M. (1931). “The distribution of chi-squared” (PDF). Proc. Natl. Acad. Sci. USA 17 (12): 684–688.
[16] Davies, R.B. (1980). “Algorithm AS155: The Distributions of a Linear Combination of χ2 Random Variables”. Journal of the
Royal Statistical Society 29 (3): 323–333. doi:10.2307/2346911.
[17] den Dekker A. J., Sijbers J., (2014) “Data distributions in magnetic resonance images: a review”, Physica Medica,
[18] Chi-Squared Test Table B.2. Dr. Jacqueline S. McLaughlin at The Pennsylvania State University. In turn citing: R.A. Fisher
and F. Yates, Statistical Tables for Biological Agricultural and Medical Research, 6th ed., Table IV
[19] Hald 1998, pp. 633–692, 27. Sampling Distributions under Normality.
[20] F. R. Helmert, "Ueber die Wahrscheinlichkeit der Potenzsummen der Beobachtungsfehler und über einige damit im Zusam-
menhange stehende Fragen", Zeitschrift für Mathematik und Physik 21, 1876, S. 102–219
[21] R. L. Plackett, Karl Pearson and the Chi-Squared Test, International Statistical Review, 1983, 61f. See also Jeff Miller, Earliest
Known Uses of Some of the Words of Mathematics.

2.4.11 Further reading


• Hald, Anders (1998). A history of mathematical statistics from 1750 to 1930. New York: Wiley. ISBN 0-471-
17912-4.
• Elderton, William Palin (1902). “Tables for Testing the Goodness of Fit of Theory to Observation”. Biometrika 1
(2): 155–163. doi:10.1093/biomet/1.2.155.

2.4.12 External links

• Hazewinkel, Michiel, ed. (2001), “Chi-squared distribution”, Encyclopedia of Mathematics, Springer, ISBN 978-
1-55608-010-4

• Calculator for the pdf, cdf and quantiles of the chi-squared distribution

• Earliest Uses of Some of the Words of Mathematics: entry on Chi squared has a brief history

• Course notes on Chi-Squared Goodness of Fit Testing from Yale University Stats 101 class.

• Mathematica demonstration showing the chi-squared sampling distribution of various statistics, e.g. Σx², for a
normal population

• Simple algorithm for approximating cdf and inverse cdf for the chi-squared distribution with a pocket calculator

2.5 Dagum distribution


The Dagum distribution is a continuous probability distribution defined over positive real numbers. It is named after
Camilo Dagum, who proposed it in a series of papers in the 1970s.[1][2] The Dagum distribution arose from several variants
of a new model on the size distribution of personal income and is mostly associated with the study of income distribution.
There is both a three-parameter specification (Type I) and a four-parameter specification (Type II) of the Dagum distri-
bution; a summary of the genesis of this distribution can be found in “A Guide to the Dagum Distributions”.[3] A general
source on statistical size distributions often cited in work using the Dagum distribution is Statistical Size Distributions in
Economics and Actuarial Sciences.[4]

2.5.1 Definition

The cumulative distribution function of the Dagum distribution (Type I) is given by

F(x; a, b, p) = \left(1 + \left(\frac{x}{b}\right)^{-a}\right)^{-p} \quad \text{for } x > 0 \text{ and where } a, b, p > 0.

The corresponding probability density function is given by

f(x; a, b, p) = \frac{ap}{x}\,\frac{\left(\frac{x}{b}\right)^{ap}}{\left(\left(\frac{x}{b}\right)^{a} + 1\right)^{p+1}}.

The Dagum distribution can be derived as a special case of the Generalized Beta II (GB2) distribution (a generalization of
the Beta prime distribution). There is also an intimate relationship between the Dagum and Singh-Maddala distribution.

X \sim D(a, b, p) \iff \frac{1}{X} \sim SM\!\left(a, \frac{1}{b}, p\right)
The cumulative distribution function of the Dagum (Type II) distribution adds a point mass at the origin and then follows
a Dagum (Type I) distribution over the rest of the support (i.e. over the positive halfline)

F(x; a, b, p, \delta) = \delta + (1 - \delta)\left(1 + \left(\frac{x}{b}\right)^{-a}\right)^{-p}.
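Because the Type I CDF inverts in closed form, solving F(x) = u gives x = b(u^{-1/p} - 1)^{-1/a}, so inverse transform sampling is immediate. A sketch (NumPy; the parameter values are illustrative, not from the text):

import numpy as np

def dagum_cdf(x, a, b, p):
    return (1 + (x / b)**(-a))**(-p)

def dagum_rvs(a, b, p, size, rng):
    u = rng.uniform(size=size)
    return b * (u**(-1 / p) - 1)**(-1 / a)   # solves F(x) = u for x

rng = np.random.default_rng(4)
a, b, p = 3.0, 1.0, 2.0
x = dagum_rvs(a, b, p, 100_000, rng)

# The empirical CDF should match the closed form at a few test points.
for q in (0.5, 1.0, 2.0):
    print((x <= q).mean(), dagum_cdf(q, a, b, p))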

2.5.2 References

[1] Dagum, Camilo (1975); A model of income distribution and the conditions of existence of moments of finite order; Bulletin of
the International Statistical Institute, 46 (Proceeding of the 40th Session of the ISI, Contributed Paper), 199-205.

[2] Dagum, Camilo (1977); A new model of personal income distribution: Specification and estimation; Economie Appliquée, 30,
413-437.

[3] Kleiber, Christian (2008) “A Guide to the Dagum Distributions” in Chotikapanich, Duangkamon (ed.) Modeling Income Dis-
tributions and Lorenz Curves (Economic Studies in Inequality, Social Exclusion and Well-Being), Chapter 6, Springer

[4] Kleiber, Christian and Samuel Kotz (2003) Statistical Size Distributions in Economics and Actuarial Sciences, Wiley

2.5.3 External links

• Camilo Dagum (1925 - 2005) : obituary

2.6 Exponential distribution


Not to be confused with the exponential family of probability distributions.

In probability theory and statistics, the exponential distribution (a.k.a. negative exponential distribution) is the
probability distribution that describes the time between events in a Poisson process, i.e. a process in which events occur
continuously and independently at a constant average rate. It is a particular case of the gamma distribution. It is the
continuous analogue of the geometric distribution, and it has the key property of being memoryless. In addition to being
used for the analysis of Poisson processes, it is found in various other contexts.
The exponential distribution is not the same as the class of exponential families of distributions, which is a large class
of probability distributions that includes the exponential distribution as one of its members, but also includes the normal
distribution, binomial distribution, gamma distribution, Poisson, and many others.

2.6.1 Characterization

Probability density function

The probability density function (pdf) of an exponential distribution is

f(x; \lambda) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0, \\ 0 & x < 0. \end{cases}

Alternatively, this can be defined using the Heaviside step function, H(x).

f (x; λ) = λe−λx H(x)

Here λ > 0 is the parameter of the distribution, often called the rate parameter. The distribution is supported on the
interval [0, ∞). If a random variable X has this distribution, we write X ~ Exp(λ).
The exponential distribution exhibits infinite divisibility.

Cumulative distribution function

The cumulative distribution function is given by

F(x; \lambda) = \begin{cases} 1 - e^{-\lambda x} & x \ge 0, \\ 0 & x < 0. \end{cases}

Alternatively, this can be defined using the Heaviside step function, H(x).

F (x; λ) = (1 − e−λx )H(x)

Alternative parameterization

A commonly used alternative parametrization is to define the probability density function (pdf) of an exponential distri-
bution as

f(x; \beta) = \begin{cases} \frac{1}{\beta}\, e^{-x/\beta} & x \ge 0, \\ 0 & x < 0. \end{cases}

where β > 0 is the mean, standard deviation, and scale parameter of the distribution, and the reciprocal of the rate parameter λ
defined above. In this specification, β is a survival parameter in the sense that if a random variable X is the duration of
time that a given biological or mechanical system manages to survive and X ~ Exp(β) then E[X] = β. That is to say, the
expected duration of survival of the system is β units of time. The parametrization involving the “rate” parameter arises
in the context of events arriving at a rate λ, when the time between events (which might be modeled using an exponential
distribution) has a mean of β = λ−1 .
The alternative specification is sometimes more convenient than the one given above, and some authors will use it as a
standard definition. This alternative specification is not used here. Unfortunately this gives rise to a notational ambiguity.
In general, the reader must check which of these two specifications is being used if an author writes "X ~ Exp(λ)", since
either the notation in the previous (using λ) or the notation in this section (here, using β to avoid confusion) could be
intended. An example of this notational switch: reference[1] uses λ for β.

2.6.2 Properties
Mean, variance, moments and median

The mean or expected value of an exponentially distributed random variable X with rate parameter λ is given by

E[X] = \frac{1}{\lambda} = \beta
In light of the examples given above, this makes sense: if you receive phone calls at an average rate of 2 per hour, then
you can expect to wait half an hour for every call.
The variance of X is given by

\mathrm{Var}[X] = \frac{1}{\lambda^2}
so the standard deviation is equal to the mean.

The mean is the probability mass centre, that is the first moment.

The moments of X, for n = 1, 2, ..., are given by

E[X^n] = \frac{n!}{\lambda^n}
The median of X is given by

m[X] = \frac{\ln 2}{\lambda} < E[X]
where ln refers to the natural logarithm. Thus the absolute difference between the mean and median is

The median is the preimage F^{-1}(1/2).

|E[X] - m[X]| = \frac{1 - \ln 2}{\lambda} < \frac{1}{\lambda} = \text{standard deviation},
in accordance with the median-mean inequality.

Memorylessness

An exponentially distributed random variable T obeys the relation

\Pr(T > s + t \mid T > s) = \Pr(T > t), \quad \forall s, t \ge 0



When T is interpreted as the waiting time for an event to occur relative to some initial time, this relation implies that, if T
is conditioned on a failure to observe the event over some initial period of time s, the distribution of the remaining waiting
time is the same as the original unconditional distribution. For example, if an event has not occurred after 30 seconds,
the conditional probability that occurrence will take at least 10 more seconds is equal to the unconditional probability of
observing the event more than 10 seconds relative to the initial time.
The exponential distribution and the geometric distribution are the only memoryless probability distributions.
The exponential distribution is consequently also necessarily the only continuous probability distribution that has a constant failure rate.

Quantiles

Tukey criteria for anomalies.[2]

The quantile function (inverse cumulative distribution function) for Exp(λ) is

F^{-1}(p; \lambda) = \frac{-\ln(1-p)}{\lambda}, \quad 0 \le p < 1
The quartiles are therefore:

• first quartile: ln(4/3)/λ

• median: ln(2)/λ

• third quartile: ln(4)/λ

And as a consequence the interquartile range is ln(3)/λ.

Kullback–Leibler divergence

The directed Kullback–Leibler divergence of Exp(λ) (the 'approximating' distribution) from Exp(λ₀) (the 'true' distribution) is given by

\Delta(\lambda_0 \parallel \lambda) = \log(\lambda_0) - \log(\lambda) + \frac{\lambda}{\lambda_0} - 1.

Maximum entropy distribution

Among all continuous probability distributions with support [0, ∞) and mean µ, the exponential distribution with λ = 1/µ has the largest differential entropy. In other words, it is the maximum entropy probability distribution for a random variate X which is greater than or equal to zero and for which E[X] is fixed.[3]

Distribution of the minimum of exponential random variables

Let X1 , ..., Xn be independent exponentially distributed random variables with rate parameters λ1 , ..., λn. Then

min {X1 , . . . , Xn }

is also exponentially distributed, with parameter

λ = λ1 + · · · + λn

This can be seen by considering the complementary cumulative distribution function:

\Pr(\min\{X_1, \ldots, X_n\} > x) = \Pr(X_1 > x, \ldots, X_n > x) = \prod_{i=1}^{n} \Pr(X_i > x) = \prod_{i=1}^{n} \exp(-x\lambda_i) = \exp\!\left(-x \sum_{i=1}^{n} \lambda_i\right).

The index of the variable which achieves the minimum is distributed according to the law

\Pr(X_k = \min\{X_1, \ldots, X_n\}) = \frac{\lambda_k}{\lambda_1 + \cdots + \lambda_n}.
Note that \max\{X_1, \ldots, X_n\} is not exponentially distributed.
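Both facts, the summed rate of the minimum and the law of the minimizing index, are quick to check by simulation. A sketch (NumPy; the rates are illustrative):

import numpy as np

rng = np.random.default_rng(5)
rates = np.array([0.5, 1.0, 2.5])
n = 200_000
x = rng.exponential(1.0 / rates, size=(n, len(rates)))   # NumPy uses scale = 1/rate

m = x.min(axis=1)
print(1 / m.mean(), rates.sum())                  # rate of the minimum

idx = x.argmin(axis=1)
print(np.bincount(idx) / n, rates / rates.sum())  # law of the argmin index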



2.6.3 Parameter estimation


Suppose a given variable is exponentially distributed and the rate parameter λ is to be estimated. Among the estimators of λ, the maximum likelihood estimator (MLE) and the uniformly minimum variance unbiased estimator (UMVUE) are 1/\bar{X} and (n-1)/(n\bar{X}), respectively. The one that minimizes the expected mean squared error is (n-2)/(n\bar{X}).[4]

Maximum likelihood

The likelihood function for λ, given an independent and identically distributed sample x = (x1 , ..., xn) drawn from the
variable, is:

L(\lambda) = \prod_{i=1}^{n} \lambda \exp(-\lambda x_i) = \lambda^n \exp\!\left(-\lambda \sum_{i=1}^{n} x_i\right) = \lambda^n \exp(-\lambda n \bar{x}),

where:

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i

is the sample mean.


The derivative of the likelihood function’s logarithm is:

\frac{d}{d\lambda} \ln L(\lambda) = \frac{d}{d\lambda}\left(n \ln \lambda - \lambda n \bar{x}\right) = \frac{n}{\lambda} - n\bar{x}\ \begin{cases} > 0, & 0 < \lambda < \frac{1}{\bar{x}}, \\ = 0, & \lambda = \frac{1}{\bar{x}}, \\ < 0, & \lambda > \frac{1}{\bar{x}}. \end{cases}

Consequently the maximum likelihood estimate for the rate parameter is:

\hat{\lambda} = \frac{1}{\bar{x}}.
Although this is not an unbiased estimator of λ, \bar{x} is an unbiased[5] MLE[6] estimator of 1/λ = β, where β is the scale parameter defined in the 'Alternative parameterization' section above and the distribution mean.

Confidence intervals

The 100(1 − α)% confidence interval for the rate parameter of an exponential distribution is given by:[7]

\frac{2n}{\hat{\lambda}\,\chi^2_{1-\frac{\alpha}{2},\,2n}} < \frac{1}{\lambda} < \frac{2n}{\hat{\lambda}\,\chi^2_{\frac{\alpha}{2},\,2n}}

which is also equal to:

\frac{2n\bar{x}}{\chi^2_{1-\frac{\alpha}{2},\,2n}} < \frac{1}{\lambda} < \frac{2n\bar{x}}{\chi^2_{\frac{\alpha}{2},\,2n}}
2 2

where χ2 p,v is the 100(p) percentile of the chi squared distribution with v degrees of freedom, n is the number of obser-
vations of inter-arrival times in the sample, and x-bar is the sample average. A simple approximation to the exact interval
endpoints can be derived using a normal approximation to the χ2 p,v distribution. This approximation gives the following
values for a 95% confidence interval:

\lambda_{\text{low}} = \hat{\lambda}\left(1 - \frac{1.96}{\sqrt{n}}\right)

\lambda_{\text{upp}} = \hat{\lambda}\left(1 + \frac{1.96}{\sqrt{n}}\right)
This approximation may be acceptable for samples containing at least 15 to 20 elements.[8]
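A sketch implementing both the exact chi-squared interval and the normal approximation (SciPy assumed; the data are simulated purely for illustration):

import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(6)
x = rng.exponential(scale=1.0 / 2.0, size=50)   # simulated data, true rate 2
n, xbar = len(x), x.mean()
lam_hat = 1.0 / xbar                            # maximum likelihood estimate

alpha = 0.05
# Exact interval: invert the 2n*xbar/chi2 bounds with 2n degrees of freedom.
lo = chi2.ppf(alpha / 2, 2 * n) / (2 * n * xbar)
hi = chi2.ppf(1 - alpha / 2, 2 * n) / (2 * n * xbar)

# Normal approximation, reasonable for samples of at least 15-20 points.
lo_approx = lam_hat * (1 - 1.96 / np.sqrt(n))
hi_approx = lam_hat * (1 + 1.96 / np.sqrt(n))

print((lo, hi), (lo_approx, hi_approx))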

Bayesian inference

The conjugate prior for the exponential distribution is the gamma distribution (of which the exponential distribution is a
special case). The following parameterization of the gamma probability density function is useful:

\text{Gamma}(\lambda; \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\lambda^{\alpha-1}\exp(-\lambda\beta).

The posterior distribution p can then be expressed in terms of the likelihood function defined above and a gamma prior:

p(\lambda) \propto L(\lambda) \times \text{Gamma}(\lambda; \alpha, \beta) = \lambda^n \exp(-\lambda n\bar{x}) \times \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\lambda^{\alpha-1}\exp(-\lambda\beta) \propto \lambda^{(\alpha+n)-1}\exp(-\lambda\,(\beta + n\bar{x})).

Now the posterior density p has been specified up to a missing normalizing constant. Since it has the form of a gamma
pdf, this can easily be filled in, and one obtains:

p(\lambda) = \text{Gamma}(\lambda; \alpha + n, \beta + n\bar{x}).

Here the hyperparameter α can be interpreted as the number of prior observations, and β as the sum of the prior obser-
vations. The posterior mean here is:

\frac{\alpha + n}{\beta + n\bar{x}}.
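The conjugate update itself is two additions; a sketch (NumPy; the prior hyperparameters and simulated data are illustrative):

import numpy as np

rng = np.random.default_rng(7)
x = rng.exponential(scale=1.0 / 3.0, size=40)   # simulated data, true rate 3

alpha_prior, beta_prior = 2.0, 1.0     # Gamma(alpha, beta) prior, rate form
alpha_post = alpha_prior + len(x)      # add the number of observations
beta_post = beta_prior + x.sum()       # add the sum of the observations

print("posterior mean of lambda:", alpha_post / beta_post)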

2.6.4 Generating exponential variates


A conceptually very simple method for generating exponential variates is based on inverse transform sampling: Given a
random variate U drawn from the uniform distribution on the unit interval (0, 1), the variate

T = F^{-1}(U)

has an exponential distribution, where F −1 is the quantile function, defined by



F^{-1}(p) = \frac{-\ln(1-p)}{\lambda}.
Moreover, if U is uniform on (0, 1), then so is 1 − U. This means one can generate exponential variates as follows:

T = \frac{-\ln(U)}{\lambda}.
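A sketch of the inverse-transform recipe (NumPy; the rate value is illustrative):

import numpy as np

rng = np.random.default_rng(8)
lam = 1.5
u = rng.uniform(size=100_000)
t = -np.log(u) / lam          # valid because 1 - U and U are equal in law

print(t.mean(), 1 / lam)      # sample mean should approach 1/lambda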
Other methods for generating exponential variates are discussed by Knuth[9] and Devroye.[10]
The ziggurat algorithm is a fast method for generating exponential variates.
A fast method for generating a set of ready-ordered exponential variates without using a sorting routine is also available.[10]

2.6.5 Related distributions


• Exponential distribution is closed under scaling by a positive factor. If X ~ Exp(λ) then kX ~ Exp(λ/k).

• If X ~ Exp(λ) and Y ~ Exp(ν) then min(X, Y) ~ Exp(λ + ν).

• If Xi ~ Exp(λ) then min{X1 , ..., Xn} ~ Exp(nλ).

• The Benktander Weibull distribution reduces to a truncated exponential distribution. If X ~ Exp(λ) then 1+X ~
BenktanderWeibull(λ, 1).

• The exponential distribution is a limit of a scaled beta distribution:

\lim_{n\to\infty} n\,\text{Beta}(1, n) = \text{Exp}(1).


• If X_i ~ Exp(λ) then the sum X_1 + ... + X_k = \sum_i X_i ~ Erlang(k, λ), which is just a Gamma(k, λ^{-1}) (in the (k, θ) parametrization) or Gamma(k, λ) (in the (α, β) parametrization) with an integer shape parameter k.

• If X ~ Exp(1) then μ − σ log(X) ~ GEV(μ, σ, 0).

• If X ~ Exp(λ) then X ~ Gamma(1, λ−1 ) (in (k, θ) parametrization) or Gamma(1, λ) (in (α, β) parametrization).

• If X ~ Exp(λ) and Y ~ Exp(ν) then λX − νY ~ Laplace(0, 1).

• If X, Y ~ Exp(λ) then X − Y ~ Laplace(0, λ−1 ).

• If X ~ Laplace(μ, β−1 ) then |X − μ| ~ Exp(β).

• If X ~ Exp(1) then (logistic distribution):

\mu - \beta \log\!\left(\frac{e^{-X}}{1 - e^{-X}}\right) \sim \text{Logistic}(\mu, \beta)

• If X, Y ~ Exp(1) then (logistic distribution):

\mu - \beta \log\!\left(\frac{X}{Y}\right) \sim \text{Logistic}(\mu, \beta)

• If X ~ Exp(λ) then keX ~ Pareto(k, λ).

• If X ~ Pareto(1, λ) then log(X) ~ Exp(λ).

• Exponential distribution is a special case of type 3 Pearson distribution.

• If X ~ Exp(λ) then ke^{-X} \sim \text{PowerLaw}(k, \lambda) (power law)

• If X ~ Rayleigh(λ−1/2 ) then X2 /2 ~ Exp(λ).

• If X ~ Exp(λ) then X \sim \text{Weibull}\!\left(\frac{1}{\lambda}, 1\right) (Weibull distribution)

• If Xi ~ U(0, 1) then

\lim_{n\to\infty} n \min(X_1, \ldots, X_n) \sim \text{Exp}(1)

• If Y|X ~ Poisson(X) where X ~ Exp(λ^{-1}) then Y \sim \text{Geometric}\!\left(\frac{1}{1+\lambda}\right) (geometric distribution)

• If X ~ Exp(1) and Y \sim \Gamma\!\left(\alpha, \frac{\beta}{\alpha}\right) then XY \sim K(\alpha, \beta) (K-distribution)

• The Hoyt distribution can be obtained from Exponential distribution and Arcsine distribution

• If X ~ Exp(λ) and Y ~ Erlang(n, λ) then:

\frac{X}{Y} \sim \text{Pareto}(1, n)

• If X ~ Exp(λ) and Y \sim \Gamma\!\left(n, \frac{1}{\lambda}\right) then \frac{X}{Y} \sim \text{Pareto}(1, n)

• If X ~ SkewLogistic(θ), then log(1 + e^{-X}) ~ Exp(θ).

• If X ~ Exp(λ) and Y = μ − β log(Xλ) then Y ∼ Gumbel(μ, β).

• If X ~ Exp(1/2) then X ∼ χ2 2 , i.e. X has a chi-squared distribution with 2 degrees of freedom.

• Let X \sim \text{Exp}(\lambda_X) and Y \sim \text{Exp}(\lambda_Y) be independent. Then Z = \frac{\lambda_X X}{\lambda_Y Y} has probability density function f_Z(z) = \frac{1}{(z+1)^2}. This can be used to obtain a confidence interval for \frac{\lambda_X}{\lambda_Y}.

Other related distributions:

• Hyper-exponential distribution – the distribution whose density is a weighted sum of exponential densities.
• Hypoexponential distribution – the distribution of a general sum of exponential random variables.
• exGaussian distribution – the sum of an exponential distribution and a normal distribution.

2.6.6 Applications of exponential distribution


Occurrence of events

The exponential distribution occurs naturally when describing the lengths of the inter-arrival times in a homogeneous
Poisson process.
The exponential distribution may be viewed as a continuous counterpart of the geometric distribution, which describes
the number of Bernoulli trials necessary for a discrete process to change state. In contrast, the exponential distribution
describes the time for a continuous process to change state.
In real-world scenarios, the assumption of a constant rate (or probability per unit time) is rarely satisfied. For example,
the rate of incoming phone calls differs according to the time of day. But if we focus on a time interval during which the
rate is roughly constant, such as from 2 to 4 p.m. during work days, the exponential distribution can be used as a good
approximate model for the time until the next phone call arrives. Similar caveats apply to the following examples which
yield approximately exponentially distributed variables:

• The time until a radioactive particle decays, or the time between clicks of a Geiger counter

• The time it takes before your next telephone call

• The time until default (on payment to company debt holders) in reduced form credit risk modeling

Exponential variables can also be used to model situations where certain events occur with a constant probability per unit
length, such as the distance between mutations on a DNA strand, or between roadkills on a given road.
In queueing theory, the service times of agents in a system (e.g. how long it takes for a bank teller to serve a customer) are often modeled as exponentially distributed variables. (The number of customer arrivals per unit time, for instance, follows the Poisson distribution when the arrivals are independent and identically distributed.) The length of a process that can be thought of as a sequence of several independent tasks follows the Erlang distribution, which is the distribution of the sum of several independent exponentially distributed variables; the connection between exponential inter-arrival times and Poisson counts is illustrated in the sketch below.
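As a sketch of that connection (assuming NumPy; the rate, horizon, and number of runs are arbitrary example values), one can build a process from exponential inter-arrival times and check that the event counts are Poisson:

import numpy as np

rng = np.random.default_rng(1)
rate, horizon, n_runs = 4.0, 1.0, 50_000   # arbitrary example values

# Build processes whose inter-arrival times are Exp(rate); 50 gaps are ample,
# since more than 50 events on [0, 1] at rate 4 is vanishingly unlikely.
gaps = rng.exponential(1 / rate, size=(n_runs, 50))
arrival_times = np.cumsum(gaps, axis=1)
counts = (arrival_times <= horizon).sum(axis=1)

# For a Poisson process the count on [0, horizon] is Poisson(rate * horizon),
# so the mean and variance of the counts should both be close to 4.0.
print(counts.mean(), counts.var())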
Reliability theory and reliability engineering also make extensive use of the exponential distribution. Because of the
memoryless property of this distribution, it is well-suited to model the constant hazard rate portion of the bathtub curve
used in reliability theory. It is also very convenient because it is so easy to add failure rates in a reliability model. The
exponential distribution is however not appropriate to model the overall lifetime of organisms or technical devices, because
the “failure rates” here are not constant: more failures occur for very young and for very old systems.
In physics, if one observes a gas at a fixed temperature and pressure in a uniform gravitational field, the heights of the various molecules follow an approximate exponential distribution, known as the barometric formula. This is a consequence of the entropy property mentioned below.
In hydrology, the exponential distribution is used to analyze extreme values of such variables as monthly and annual
maximum values of daily rainfall and river discharge volumes.[12]

An example of such an analysis is the fitting of the exponential distribution to ranked annually maximum one-day rainfalls, carried out with the program CumFreq,[11] together with a 90% confidence belt based on the binomial distribution. The rainfall data are represented by plotting positions as part of the cumulative frequency analysis.

Prediction

Having observed a sample of n data points from an unknown exponential distribution a common task is to use these
samples to make predictions about future data from the same source. A common predictive distribution over future
samples is the so-called plug-in distribution, formed by plugging a suitable estimate for the rate parameter λ into the
exponential density function. A common choice of estimate is the one provided by the principle of maximum likelihood,
and using this yields the predictive density over a future sample xn₊₁, conditioned on the observed samples x = (x1 , ..., xn)
given by

pML(xn₊₁ | x1, ..., xn) = (1/x̄) exp(−xn₊₁/x̄),

where x̄ = (x1 + ··· + xn)/n is the sample mean.

The Bayesian approach provides a predictive distribution which takes into account the uncertainty of the estimated pa-
rameter, although this may depend crucially on the choice of prior.
A predictive distribution free of the issues of choosing priors that arise under the subjective Bayesian approach is

pCNML(xn₊₁ | x1, ..., xn) = n^{n+1} x̄^n / (n x̄ + xn₊₁)^{n+1},

which can be considered as

• (1) a frequentist confidence distribution, obtained from the distribution of the pivotal quantity xn₊₁/x̄;[13]
• (2) a profile predictive likelihood, obtained by eliminating the parameter λ from the joint likelihood of xn₊₁ and λ
by maximization;[14]
• (3) an objective Bayesian predictive posterior distribution, obtained using the non-informative Jeffreys prior 1/λ;
• (4) the Conditional Normalized Maximum Likelihood (CNML) predictive distribution, from information theoretic
considerations.[15]
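Both predictive densities are simple to evaluate. A minimal sketch (assuming NumPy and SciPy; p_ml and p_cnml are hypothetical helper names, and the observed sample is made up):

import numpy as np
from scipy.integrate import quad

def p_ml(x_new, x):
    # plug-in predictive density: exponential with the ML rate 1/mean(x)
    xbar = np.mean(x)
    return np.exp(-x_new / xbar) / xbar

def p_cnml(x_new, x):
    # CNML predictive density: n^(n+1) xbar^n / (n xbar + x_new)^(n+1)
    n, xbar = len(x), np.mean(x)
    return n ** (n + 1) * xbar ** n / (n * xbar + x_new) ** (n + 1)

x = np.array([0.4, 1.1, 0.7, 2.3, 0.9])   # arbitrary observed sample
for p in (p_ml, p_cnml):
    mass, _ = quad(lambda t: p(t, x), 0, np.inf)
    print(p.__name__, "integrates to", round(mass, 6))   # both ~1.0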
104 CHAPTER 2. CONTINUOUS DISTRIBUTIONS - SUPPORTED ON SEMI-INFINITE INTERVALS, USUALLY [0,∞)

The accuracy of a predictive distribution may be measured using the distance or divergence between the true exponential
distribution with rate parameter, λ0 , and the predictive distribution based on the sample x. The Kullback–Leibler diver-
gence is a commonly used, parameterisation free measure of the difference between two distributions. Letting Δ(λ0 ||p)
denote the Kullback–Leibler divergence between an exponential with rate parameter λ0 and a predictive distribution p it
can be shown that

Eλ0[Δ(λ0 || pML)] = ψ(n) + 1/(n − 1) − log(n)

Eλ0[Δ(λ0 || pCNML)] = ψ(n) + 1/n − log(n)
where the expectation is taken with respect to the exponential distribution with rate parameter λ0 ∈ (0, ∞), and ψ( · ) is
the digamma function. It is clear that the CNML predictive distribution is strictly superior to the maximum likelihood
plug-in distribution in terms of average Kullback–Leibler divergence for all sample sizes n > 0.
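A quick numerical look at the two expected divergences (a sketch assuming SciPy's digamma; the sample sizes are arbitrary):

import numpy as np
from scipy.special import digamma

for n in (2, 5, 10, 50):
    kl_ml = digamma(n) + 1 / (n - 1) - np.log(n)
    kl_cnml = digamma(n) + 1 / n - np.log(n)
    print(n, kl_ml, kl_cnml)   # CNML is smaller for every n, by 1/(n-1) - 1/n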

2.6.7 See also


• Dead time – an application of exponential distribution to particle detector analysis.
• Laplace distribution, or the “double exponential distribution”.

2.6.8 References
[1] David Olive, Chapter 4. Truncated Distributions, “Lemma 4.3”, Southern Illinois University, February 18, 2010, p.107.

[2] Brillinger, David R. (2011). “Data analysis, exploratory” (PDF). SAGE Publications. pp. 530–537. doi:10.4135/9781412959636.n128.
Retrieved 2013-11-21.

[3] Park, Sung Y.; Bera, Anil K. (2009). “Maximum entropy autoregressive conditional heteroskedasticity model” (PDF). Journal
of Econometrics (Elsevier): 219–230. Retrieved 2011-06-02.

[4] Abdulaziz Elfessi and David M. Reineke, "A Bayesian Look at Classical Estimation: The Exponential Distribution", Journal of
Statistics Education Volume 9, Number 1 (2001).

[5] Richard Arnold Johnson; Dean W. Wichern (2007). Applied Multivariate Statistical Analysis. Pearson Prentice Hall. ISBN
978-0-13-187715-3. Retrieved 10 August 2012.

[6] NIST/SEMATECH e-Handbook of Statistical Methods

[7] Ross, Sheldon M. (2009). Introduction to probability and statistics for engineers and scientists (4th ed.). Associated Press. p.
267. ISBN 978-0-12-370483-2.

[8] Guerriero, V. (2012). “Power Law Distribution: Method of Multi-scale Inferential Statistics”. Journal of Modern Mathematics
Frontier (JMMF) 1: 21–28.

[9] Donald E. Knuth (1998). The Art of Computer Programming, volume 2: Seminumerical Algorithms, 3rd edn. Boston: Addison–
Wesley. ISBN 0-201-89684-2. See section 3.4.1, p. 133.

[10] Luc Devroye (1986). Non-Uniform Random Variate Generation. New York: Springer-Verlag. ISBN 0-387-96305-7. See
chapter IX, section 2, pp. 392–401.

[11] “Cumfreq, a free computer program for cumulative frequency analysis”.

[12] Ritzema (ed.), H.P. (1994). Frequency and Regression Analysis (PDF). Chapter 6 in: Drainage Principles and Applications,
Publication 16, International Institute for Land Reclamation and Improvement (ILRI), Wageningen, The Netherlands. pp.
175–224. ISBN 90-70754-33-9.

[13] Lawless, J.F., Fredette, M.,"Frequentist predictions intervals and predictive distributions”, Biometrika (2005), Vol 92, Issue 3,
pp 529–542.

[14] Bjornstad, J.F. (1990). “Predictive Likelihood: A Review”. Statist. Sci. 5 (2): 242–254. doi:10.1214/ss/1177012175.

[15] D. F. Schmidt and E. Makalic, "Universal Models for the Exponential Distribution", IEEE Transactions on Information Theory,
Volume 55, Number 7, pp. 3087–3090, 2009 doi:10.1109/TIT.2009.2018331

2.6.9 External links


• Hazewinkel, Michiel, ed. (2001), “Exponential distribution”, Encyclopedia of Mathematics, Springer, ISBN 978-
1-55608-010-4
• Online calculator of Exponential Distribution

2.7 F-distribution
Not to be confused with F-statistics as used in population genetics.
In probability theory and statistics, the F-distribution, also known as Snedecor's F distribution or the Fisher–Snedecor distribution (after Ronald Fisher and George W. Snedecor), is a continuous probability distribution.[1][2][3][4]
The F-distribution arises frequently as the null distribution of a test statistic, most notably in the analysis of variance; see
F-test.

2.7.1 Definition
If a random variable X has an F-distribution with parameters d1 and d2 , we write X ~ F(d1 , d2 ). Then the probability
density function (pdf) for X is given by


f(x; d1, d2) = √[ (d1 x)^{d1} d2^{d2} / (d1 x + d2)^{d1+d2} ] / ( x B(d1/2, d2/2) )

= (1/B(d1/2, d2/2)) (d1/d2)^{d1/2} x^{d1/2 − 1} (1 + (d1/d2) x)^{−(d1+d2)/2}

for real x ≥ 0. Here B is the beta function. In many applications, the parameters d1 and d2 are positive integers, but the
distribution is well-defined for positive real values of these parameters.
The cumulative distribution function is

F(x; d1, d2) = I_{d1 x/(d1 x + d2)}(d1/2, d2/2),

where I is the regularized incomplete beta function.
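As a sanity check, both formulas can be compared against SciPy's implementation. A minimal sketch (assuming SciPy; the degrees of freedom are arbitrary example values):

import numpy as np
from scipy import stats
from scipy.special import beta as B, betainc

d1, d2 = 5.0, 7.0
x = np.linspace(0.01, 6.0, 50)

# pdf: (1/B(d1/2, d2/2)) (d1/d2)^(d1/2) x^(d1/2 - 1) (1 + d1 x/d2)^(-(d1+d2)/2)
pdf = ((d1 / d2) ** (d1 / 2) * x ** (d1 / 2 - 1)
       * (1 + d1 * x / d2) ** (-(d1 + d2) / 2) / B(d1 / 2, d2 / 2))

# cdf: regularized incomplete beta I_{d1 x/(d1 x + d2)}(d1/2, d2/2)
cdf = betainc(d1 / 2, d2 / 2, d1 * x / (d1 * x + d2))

print(np.allclose(pdf, stats.f.pdf(x, d1, d2)))   # True
print(np.allclose(cdf, stats.f.cdf(x, d1, d2)))   # True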


The expectation, variance, and other details about the F(d1, d2) distribution are given in standard references; for d2 > 8, the excess kurtosis is

γ2 = 12 [ d1(5d2 − 22)(d1 + d2 − 2) + (d2 − 4)(d2 − 2)² ] / [ d1(d2 − 6)(d2 − 8)(d1 + d2 − 2) ]

The k-th moment of an F(d1, d2) distribution exists and is finite only when 2k < d2, and it is equal to[5]

μX(k) = (d2/d1)^k · Γ(d1/2 + k) Γ(d2/2 − k) / ( Γ(d1/2) Γ(d2/2) )


The F-distribution is a particular parametrization of the beta prime distribution, which is also called the beta distribution
of the second kind.
The characteristic function is listed incorrectly in many standard references (e.g., [2] ). The correct expression [6] is

φ^F_{d1,d2}(s) = [ Γ((d1 + d2)/2) / Γ(d2/2) ] U( d1/2, 1 − d2/2, −(d2/d1) i s )
where U(a, b, z) is the confluent hypergeometric function of the second kind.

2.7.2 Characterization
A random variate of the F-distribution with parameters d1 and d2 arises as the ratio of two appropriately scaled chi-
squared variates:[7]

X = (U1/d1) / (U2/d2)
where

• U1 and U2 have chi-squared distributions with d1 and d2 degrees of freedom respectively, and
• U1 and U2 are independent.

In instances where the F-distribution is used, for example in the analysis of variance, independence of U 1 and U 2 might
be demonstrated by applying Cochran’s theorem.
Equivalently, the random variable of the F-distribution may also be written

X = (s1²/σ1²) / (s2²/σ2²)

where s1² and s2² are the sums of squares S1² and S2² from two normal processes with variances σ1² and σ2², divided by the corresponding number of χ² degrees of freedom, d1 and d2 respectively.
In a frequentist context, a scaled F-distribution therefore gives the probability p(s1²/s2² | σ1², σ2²), with the F-distribution itself, without any scaling, applying where σ1² is being taken equal to σ2². This is the context in which the F-distribution most generally appears in F-tests: where the null hypothesis is that two independent normal variances are equal, and the observed sums of some appropriately selected squares are then examined to see whether their ratio is significantly incompatible with this null hypothesis.
The quantity X has the same distribution in Bayesian statistics, if an uninformative rescaling-invariant Jeffreys prior is taken for the prior probabilities of σ1² and σ2².[8] In this context, a scaled F-distribution thus gives the posterior probability p(σ2²/σ1² | s1², s2²), where now the observed sums s1² and s2² are what are taken as known.
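The chi-squared-ratio characterization is easy to confirm by simulation. A sketch (NumPy/SciPy assumed; degrees of freedom and sample size are arbitrary):

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
d1, d2, n = 4, 9, 100_000

u1 = rng.chisquare(d1, n)    # U1 ~ chi-squared with d1 degrees of freedom
u2 = rng.chisquare(d2, n)    # U2 ~ chi-squared with d2 degrees of freedom, independent
ratio = (u1 / d1) / (u2 / d2)

# Kolmogorov-Smirnov test against F(d1, d2); a large p-value is expected
print(stats.kstest(ratio, "f", args=(d1, d2)))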

Differential equation

The probability density function of the F-distribution is a solution of the following differential equation:

 

2x(d1 x + d2) f′(x) + (2d1 x + d1 d2 x − d1 d2 + 2d2) f(x) = 0,

f(1) = d1^{d1/2} d2^{d2/2} (d1 + d2)^{−(d1+d2)/2} / B(d1/2, d2/2)

2.7.3 Generalization
A generalization of the (central) F-distribution is the noncentral F-distribution.

2.7.4 Related distributions and properties


• If X ∼ χ²_{d1} and Y ∼ χ²_{d2} are independent, then (X/d1)/(Y/d2) ∼ F(d1, d2).

• If X ∼ Beta(d1/2, d2/2) (beta distribution) then d2 X/(d1(1 − X)) ∼ F(d1, d2).

• Equivalently, if X ~ F(d1, d2), then (d1 X/d2)/(1 + d1 X/d2) ∼ Beta(d1/2, d2/2).

• If X ~ F(d1, d2) then Y = lim_{d2→∞} d1 X has the chi-squared distribution χ²_{d1}.

• F(d1, d2) is equivalent to the scaled Hotelling's T-squared distribution (d2/(d1(d1 + d2 − 1))) T²(d1, d1 + d2 − 1).

• If X ~ F(d1, d2) then X⁻¹ ~ F(d2, d1).

• If X ~ t(n) then X² ∼ F(1, n) and X⁻² ∼ F(n, 1).

• The F-distribution is a special case of the type 6 Pearson distribution.

• If X and Y are independent, with X, Y ~ Laplace(μ, b), then |X − μ|/|Y − μ| ∼ F(2, 2).

• If X ~ F(n, m) then (log X)/2 ∼ FisherZ(n, m) (Fisher's z-distribution).

• The noncentral F-distribution simplifies to the F-distribution if λ = 0.

• The doubly noncentral F-distribution simplifies to the F-distribution if λ1 = λ2 = 0.

• If QX(p) is the quantile p for X ~ F(d1, d2) and QY(1 − p) is the quantile 1 − p for Y ~ F(d2, d1), then QX(p) = 1/QY(1 − p), as checked in the sketch below.
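The quantile reciprocity in the last bullet can be checked directly (a sketch assuming SciPy; the parameters are arbitrary):

from scipy import stats

d1, d2, p = 3, 12, 0.95
q_x = stats.f.ppf(p, d1, d2)       # quantile p of F(d1, d2)
q_y = stats.f.ppf(1 - p, d2, d1)   # quantile 1 - p of F(d2, d1)
print(q_x, 1 / q_y)                # equal up to floating-point error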

2.7.5 See also


• Chi-squared distribution
• Chow test
• Gamma distribution
• Hotelling’s T-squared distribution
• Student’s t-distribution
• Wilks’ lambda distribution
• Wishart distribution

2.7.6 References
[1] Johnson, Norman Lloyd; Samuel Kotz; N. Balakrishnan (1995). Continuous Univariate Distributions, Volume 2 (Second Edition,
Section 27). Wiley. ISBN 0-471-58494-0.

[2] Abramowitz, Milton; Stegun, Irene A., eds. (December 1972) [1964]. “Chapter 26”. Handbook of Mathematical Functions
with Formulas, Graphs, and Mathematical Tables. Applied Mathematics Series 55 (10 ed.). New York, USA: United States
Department of Commerce, National Bureau of Standards; Dover Publications. p. 946. ISBN 978-0-486-61272-0. LCCN
64-60036. MR 0167642.

[3] NIST (2006). Engineering Statistics Handbook – F Distribution

[4] Mood, Alexander; Franklin A. Graybill; Duane C. Boes (1974). Introduction to the Theory of Statistics (Third Edition, pp.
246–249). McGraw-Hill. ISBN 0-07-042864-6.

[5] Taboga, Marco. “The F distribution”.

[6] Phillips, P. C. B. (1982) “The true characteristic function of the F distribution,” Biometrika, 69: 261–264 JSTOR 2335882

[7] M.H. DeGroot (1986), Probability and Statistics (2nd Ed), Addison-Wesley. ISBN 0-201-11366-X, p. 500

[8] G.E.P. Box and G.C. Tiao (1973), Bayesian Inference in Statistical Analysis, Addison-Wesley. p.110

2.7.7 External links


• Table of critical values of the F-distribution
• Earliest Uses of Some of the Words of Mathematics: entry on F-distribution contains a brief history
• Free calculator for F-testing

2.8 Fisher’s z-distribution


Not to be confused with Fisher z-transformation.
Fisher’s z-distribution is the statistical distribution of half the logarithm of an F distribution variate:

z = (1/2) log F
It was first described by Ronald Fisher in a paper delivered at the International Mathematical Congress of 1924 in Toronto, titled “On a distribution yielding the error functions of several well-known statistics” (Proceedings of the International Congress of Mathematics, Toronto, 2: 805–813 (1924)). Nowadays one usually uses the F distribution instead.
The probability density function and cumulative distribution function can be found by using the F-distribution at the value
of x′ = e2x . However, the mean and variance do not follow the same transformation.
The probability density function is[1][2]

f(x; d1, d2) = [ 2 d1^{d1/2} d2^{d2/2} / B(d1/2, d2/2) ] · e^{d1 x} / (d1 e^{2x} + d2)^{(d1+d2)/2},

where B is the beta function.


When the degrees of freedom become large (d1, d2 → ∞), the distribution approaches normality with mean[1]

x̄ = (1/d2 − 1/d1)/2



and variance

σx² = (1/d1 + 1/d2)/2.

2.8.1 Related Distribution


• If X ∼ FisherZ(n, m) then e^{2X} ∼ F(n, m) (F-distribution)
• If X ∼ F(n, m) then (log X)/2 ∼ FisherZ(n, m)
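A simulation sketch of the defining relation z = (1/2) log F (NumPy/SciPy assumed; since the cdf of Fisher's z at t is the F(n, m) cdf at e^{2t}, a Kolmogorov–Smirnov test can use that transform directly):

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, m = 6, 10
z = 0.5 * np.log(rng.f(n, m, size=100_000))   # z = (1/2) log F

print(stats.kstest(z, lambda t: stats.f.cdf(np.exp(2 * t), n, m)))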

2.8.2 References
• Fisher, R.A. (1924) On a Distribution Yielding the Error Functions of Several Well Known Statistics Proceedings
of the International Congress of Mathematics, Toronto, 2: 805-813 pdf copy

[1] Leo A. Aroian (December 1941). “A study of R. A. Fisher’s z distribution and the related F distribution”. The Annals of
Mathematical Statistics 12 (4): 429–448. doi:10.1214/aoms/1177731681. JSTOR 2235955.

[2] Charles Ernest Weatherburn. A first course in mathematical statistics.

2.8.3 External links


• MathWorld entry

2.9 Folded normal distribution


The folded normal distribution is a probability distribution related to the normal distribution. Given a normally dis-
tributed random variable X with mean μ and variance σ2 , the random variable Y = |X| has a folded normal distribution.
Such a case may be encountered if only the magnitude of some variable is recorded, but not its sign. The distribution is called “folded” because probability mass to the left of x = 0 is “folded” over by taking the absolute value. In the physics of heat conduction, the folded normal distribution is a fundamental solution of the heat equation on the half space (i.e. a heat kernel).
The probability density function (PDF) is given by

fY(x; μ, σ) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)} + (1/(σ√(2π))) e^{−(x+μ)²/(2σ²)}
for x≥0, and 0 everywhere else. It follows that the cumulative distribution function (CDF) is given by:

FY(x; μ, σ) = (1/2) [ erf((x + μ)/(σ√2)) + erf((x − μ)/(σ√2)) ]
for x≥0, where erf() is the error function. This expression reduces to the CDF of the half-normal distribution when μ =
0.
The mean of the folded distribution is then

μY = σ √(2/π) exp(−μ²/(2σ²)) − μ erf(−μ/(√2 σ)),

The variance then is expressed easily in terms of the mean:

σY² = μ² + σ² − μY².

Both the mean (μ) and variance (σ2 ) of X in the original normal distribution can be interpreted as the location and scale
parameters of Y in the folded distribution.
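The mean and variance formulas can be checked against a direct simulation of |X|. A minimal sketch (NumPy/SciPy assumed; μ and σ are arbitrary example values):

import numpy as np
from scipy.special import erf

rng = np.random.default_rng(4)
mu, sigma = 1.2, 0.8

# closed-form mean and variance of Y = |X| for X ~ N(mu, sigma^2)
mu_y = (sigma * np.sqrt(2 / np.pi) * np.exp(-mu**2 / (2 * sigma**2))
        - mu * erf(-mu / (np.sqrt(2) * sigma)))
var_y = mu**2 + sigma**2 - mu_y**2

y = np.abs(rng.normal(mu, sigma, 1_000_000))
print(mu_y, y.mean())    # should agree to a few decimals
print(var_y, y.var())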

2.9.1 Differential equations

The PDF of the folded normal distribution can also be defined by the system of differential equations

 4 ′′ ′
( 2 )

 σ f (x) + 2σ 2
xf (x) + −µ + σ 2
+ x2
f (x) = 0
√ 1 − 2σ2
µ2

 f (0) = 2/π σ e
 ′
f (0) = 0

2.9.2 Related distributions

• When μ = 0, the distribution of Y is a half-normal distribution.

• The random variable (Y/σ)² has a noncentral chi-squared distribution with 1 degree of freedom and noncentrality equal to (μ/σ)².

2.9.3 See also

• Folded cumulative distribution

2.9.4 External links

• Virtual Laboratories: The Folded Normal Distribution


2.10 Fréchet distribution


The Fréchet distribution, also known as inverse Weibull distribution,[2][3] is a special case of the generalized extreme
value distribution. It has the cumulative distribution function

Pr(X ≤ x) = e^{−x^{−α}}  if x > 0,

where α > 0 is a shape parameter. It can be generalised to include a location parameter m (the minimum) and a scale
parameter s > 0 with the cumulative distribution function

Pr(X ≤ x) = e^{−((x−m)/s)^{−α}}  if x > m.

Named for Maurice Fréchet who wrote a related paper in 1927, further work was done by Fisher and Tippett in 1928 and
by Gumbel in 1958.

2.10.1 Characteristics
The single parameter Fréchet with parameter α has standardized moment

μk = ∫₀^∞ x^k f(x) dx = ∫₀^∞ t^{−k/α} e^{−t} dt,

(with t = x^{−α}) defined only for k < α:

μk = Γ(1 − k/α)

where Γ (z) is the Gamma function.


In particular:

• For α > 1 the expectation is E[X] = Γ(1 − 1/α).

• For α > 2 the variance is Var(X) = Γ(1 − 2/α) − (Γ(1 − 1/α))².

The quantile q_y of order y can be expressed through the inverse of the distribution,

q_y = F^{−1}(y) = (−log_e y)^{−1/α}.

In particular the median is:

q_{1/2} = (log_e 2)^{−1/α}.

The mode of the distribution is (α/(1 + α))^{1/α}.
Especially for the 3-parameter Fréchet, the first quartile is q1 = m + s/(log 4)^{1/α} and the third quartile is q3 = m + s/(log(4/3))^{1/α}.

Also the quantiles for the mean and mode are:


F(mean) = exp(−Γ^{−α}(1 − 1/α))

F(mode) = exp(−(α + 1)/α).

2.10.2 Applications

• In hydrology, the Fréchet distribution is applied to extreme events such as annually maximum one-day rainfalls
and river discharges.[4] The blue picture illustrates an example of fitting the Fréchet distribution to ranked annually
maximum one-day rainfalls in Oman showing also the 90% confidence belt based on the binomial distribution. The
cumulative frequencies of the rainfall data are represented by plotting positions as part of the cumulative frequency
analysis. However, in most hydrological applications, the distribution fitting is via the generalized extreme value
distribution as this avoids imposing the assumption that the distribution does not have a lower bound (as required
by the Frechet distribution).

• One useful test to assess whether a multivariate distribution is asymptotically dependent or independent consists of transforming the data into standard Fréchet margins using the transformation Zi = −1/log Fi(Xi) and then mapping from Cartesian to pseudo-polar coordinates (R, W) = (Z1 + Z2, Z1/(Z1 + Z2)). R ≫ 1 corresponds to extreme data for which at least one component is large, while W approximately 1 or 0 corresponds to only one component being extreme.

2.10.3 Related distributions

• If X ∼ U(0, 1) (continuous uniform distribution) then m + s(−log(X))^{−1/α} ∼ Fréchet(α, s, m) (see the sketch after this list).

• If X ∼ Fréchet(α, s, m) then kX + b ∼ Fréchet(α, ks, km + b).

• If Xi ∼ Fréchet(α, s, m) and Y = max{X1, ..., Xn} then Y ∼ Fréchet(α, n^{1/α} s, m).

• The cumulative distribution function of the Fréchet distribution solves the maximum stability postulate equation.

• If X ∼ Fréchet(α, s, m = 0) then its reciprocal is Weibull-distributed: X⁻¹ ∼ Weibull(k = α, λ = s⁻¹).
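A sketch of the inverse-transform relation in the first bullet (NumPy/SciPy assumed; note that SciPy exposes the Fréchet distribution under the name invweibull, with shape c = α):

import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
alpha, s, m = 2.5, 1.5, 0.5               # example shape, scale, location

u = rng.uniform(size=100_000)
x = m + s * (-np.log(u)) ** (-1 / alpha)  # X ~ Frechet(alpha, s, m)

print(stats.kstest(x, "invweibull", args=(alpha, m, s)))  # large p-value expected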

2.10.4 Properties

• The Fréchet distribution is a max stable distribution.

• The negative of a random variable having a Fréchet distribution is a min stable distribution.

2.10.5 See also

• Type-2 Gumbel distribution

• Fisher–Tippett–Gnedenko theorem

• CumFreq (application software for probability distributions including Fréchet)



2.10.6 References
[1] Muraleedharan. G, C. Guedes Soares and Cláudia Lucas (2011). “Characteristic and Moment Generating Functions of Gen-
eralised Extreme Value Distribution (GEV)". In Linda. L. Wright (Ed.), Sea Level Rise, Coastal Engineering, Shorelines and
Tides, Chapter 14, pp. 269–276. Nova Science Publishers. ISBN 978-1-61728-655-1

[2] Khan M.S., Pasha G.R. and Pasha A.H. (February 2008). “Theoretical Analysis of Inverse Weibull Distribution” (PDF). WSEAS
TRANSACTIONS on MATHEMATICS 7 (2). pp. 30–38.

[3] de Gusmão, FelipeR.S. and Ortega, EdwinM.M. and Cordeiro, GaussM. (2011). “The generalized inverse Weibull distribution”.
Statistical Papers 52 (3) (Springer-Verlag). pp. 591–619. doi:10.1007/s00362-009-0271-3. ISSN 0932-5026.

[4] Coles, Stuart (2001). An Introduction to Statistical Modeling of Extreme Values,. Springer-Verlag. ISBN 1-85233-459-2.

2.10.7 Publications
• Fréchet, M., (1927). “Sur la loi de probabilité de l'écart maximum.” Ann. Soc. Polon. Math. 6, 93.
• Fisher, R.A., Tippett, L.H.C., (1928). “Limiting forms of the frequency distribution of the largest and smallest
member of a sample.” Proc. Cambridge Philosophical Society 24:180–190.
• Gumbel, E.J. (1958). “Statistics of Extremes.” Columbia University Press, New York.
• Kotz, S.; Nadarajah, S. (2000) Extreme value distributions: theory and applications, World Scientific. ISBN 1-
86094-224-5

2.10.8 External links


• Bank of England working paper
• An application of a new extreme value distribution to air pollution data
• Wave Analysis for Fatigue and Oceanography

2.11 Gamma distribution


Not to be confused with Gamma function.

In probability theory and statistics, the gamma distribution is a two-parameter family of continuous probability distri-
butions. The common exponential distribution and chi-squared distribution are special cases of the gamma distribution.
There are three different parametrizations in common use:

1. With a shape parameter k and a scale parameter θ.


2. With a shape parameter α = k and an inverse scale parameter β = 1/θ, called a rate parameter.
3. With a shape parameter k and a mean parameter μ = k/β.

In each of these three forms, both parameters are positive real numbers.
The parameterization with k and θ appears to be more common in econometrics and certain other applied fields, where
e.g. the gamma distribution is frequently used to model waiting times. For instance, in life testing, the waiting time until
death is a random variable that is frequently modeled with a gamma distribution.[2]
The parameterization with α and β is more common in Bayesian statistics, where the gamma distribution is used as
a conjugate prior distribution for various types of inverse scale (aka rate) parameters, such as the λ of an exponential
2.11. GAMMA DISTRIBUTION 115

distribution or a Poisson distribution[3] – or for that matter, the β of the gamma distribution itself. (The closely related
inverse gamma distribution is used as a conjugate prior for scale parameters, such as the variance of a normal distribution.)
If k is an integer, then the distribution represents an Erlang distribution; i.e., the sum of k independent exponentially
distributed random variables, each of which has a mean of θ (which is equivalent to a rate parameter of 1/θ).
The gamma distribution is the maximum entropy probability distribution for a random variable X for which E[X] = kθ =
α/β is fixed and greater than zero, and E[ln(X)] = ψ(k) + ln(θ) = ψ(α) − ln(β) is fixed (ψ is the digamma function).[4]

2.11.1 Characterization using shape k and scale θ

A random variable X that is gamma-distributed with shape k and scale θ is denoted by

X ∼ Γ(k, θ) ≡ Gamma(k, θ)

Probability density function

The probability density function using the shape-scale parametrization is

f(x; k, θ) = x^{k−1} e^{−x/θ} / (θ^k Γ(k))  for x > 0 and k, θ > 0.

Here Γ(k) is the gamma function evaluated at k.

Cumulative distribution function

The cumulative distribution function is the regularized gamma function:

F(x; k, θ) = ∫₀^x f(u; k, θ) du = γ(k, x/θ) / Γ(k)

where γ(k, x/θ) is the lower incomplete gamma function.


It can also be expressed as follows, if k is a positive integer (i.e., the distribution is an Erlang distribution):[5]

F(x; k, θ) = 1 − Σ_{i=0}^{k−1} (1/i!) (x/θ)^i e^{−x/θ} = e^{−x/θ} Σ_{i=k}^{∞} (1/i!) (x/θ)^i

2.11.2 Characterization using shape α and rate β

Alternatively, the gamma distribution can be parameterized in terms of a shape parameter α = k and an inverse scale
parameter β = 1/θ, called a rate parameter. A random variable X that is gamma-distributed with shape α and rate β is
denoted

X ∼ Γ(α, β) ≡ Gamma(α, β)

Probability density function

The corresponding density function in the shape-rate parametrization is

g(x; α, β) = β^α x^{α−1} e^{−βx} / Γ(α)  for x ≥ 0 and α, β > 0

Both parametrizations are common because either can be more convenient depending on the situation.

Cumulative distribution function

The cumulative distribution function is the regularized gamma function:

F(x; α, β) = ∫₀^x f(u; α, β) du = γ(α, βx) / Γ(α)

where γ(α, βx) is the lower incomplete gamma function.


If α is a positive integer (i.e., the distribution is an Erlang distribution), the cumulative distribution function has the
following series expansion:[5]


F(x; α, β) = 1 − Σ_{i=0}^{α−1} ((βx)^i / i!) e^{−βx} = e^{−βx} Σ_{i=α}^{∞} (βx)^i / i!

2.11.3 Properties
Skewness

The skewness is equal to 2/√k; it depends only on the shape parameter (k), and the distribution approaches a normal distribution when k is large (approximately when k > 10).

Median calculation

Unlike the mode and the mean which have readily calculable formulas based on the parameters, the median does not have
an easy closed form equation. The median for this distribution is defined as the value ν such that

(1/(Γ(k) θ^k)) ∫₀^ν x^{k−1} e^{−x/θ} dx = 1/2.

A formula for approximating the median for any gamma distribution, when the mean is known, has been derived based
on the fact that the ratio μ/(μ − ν) is approximately a linear function of k when k ≥ 1.[6] The approximation formula is

ν ≈ μ (3k − 0.8) / (3k + 0.2),

where μ (= kθ) is the mean.
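A quick check of the approximation against the exact median (a sketch assuming SciPy; θ and the shape values are arbitrary):

from scipy import stats

theta = 2.0
for k in (1, 2, 5, 10, 50):
    mu = k * theta                                   # the mean
    approx = mu * (3 * k - 0.8) / (3 * k + 0.2)
    exact = stats.gamma.median(k, scale=theta)
    print(k, round(approx, 4), round(exact, 4))      # close, and closer as k grows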
A rigorous treatment of the problem of determining an asymptotic expansion and bounds for the median of the Gamma
Distribution was handled first by Chen and Rubin, who proved
2.11. GAMMA DISTRIBUTION 117

m − 1/3 < λ(m) < m,

where λ(m) denotes the median of the Gamma(m, 1) distribution.[7]


K.P. Choi later showed that the first five terms in the asymptotic expansion of the median are

λ(m) = m − 1/3 + 8/(405m) + 184/(25515m²) + 2248/(3444525m³) − ···

by comparing the median to Ramanujan’s θ function.[8]


Later, it was shown that λ(m) is a convex function of m .[9]

Summation

If Xi has a Gamma(ki, θ) distribution for i = 1, 2, ..., N (i.e., all distributions have the same scale parameter θ), then

Σ_{i=1}^N Xi ∼ Gamma( Σ_{i=1}^N ki, θ )

provided all Xi are independent.


For the cases where the Xi are independent but have different scale parameters, see Mathai (1982) and Moschopoulos (1985).
The gamma distribution exhibits infinite divisibility.

Scaling

If

X ∼ Gamma(k, θ),

then, for any c > 0,

cX ∼ Gamma(k, cθ).

Indeed, we know that if X is an exponential r.v. with rate λ then cX is an exponential r.v. with rate λ/c; the same is valid with gamma variates (and this can be checked using the moment-generating function): multiplication by a positive constant c divides the rate (or, equivalently, multiplies the scale).

Exponential family

The gamma distribution is a two-parameter exponential family with natural parameters k − 1 and −1/θ (equivalently, α −
1 and −β), and natural statistics X and ln(X).
If the shape parameter k is held fixed, the resulting one-parameter family of distributions is a natural exponential family.
118 CHAPTER 2. CONTINUOUS DISTRIBUTIONS - SUPPORTED ON SEMI-INFINITE INTERVALS, USUALLY [0,∞)

Logarithmic expectation

One can show that

E[ln(X)] = ψ(α) − ln(β)

or equivalently,

E[ln(X)] = ψ(k) + ln(θ)

where ψ is the digamma function.


This can be derived using the exponential family formula for the moment generating function of the sufficient statistic,
because one of the sufficient statistics of the gamma distribution is ln(x).

Information entropy

The information entropy is

H(X) = E[− ln(p(X))] = E[−α ln(β) + ln(Γ(α)) − (α − 1) ln(X) + βX] = α − ln(β) + ln(Γ(α)) + (1 − α)ψ(α).

In the k, θ parameterization, the information entropy is given by

H(X) = k + ln(θ) + ln(Γ(k)) + (1 − k)ψ(k).

Kullback–Leibler divergence

The Kullback–Leibler divergence (KL-divergence), of Gamma(αp, βp) (“true” distribution) from Gamma(αq, βq) (“ap-
proximating” distribution) is given by[10]

DKL(αp, βp; αq, βq) = (αp − αq) ψ(αp) − log Γ(αp) + log Γ(αq) + αq (log βp − log βq) + αp (βq − βp)/βp

Written using the k, θ parameterization, the KL-divergence of Gamma(kp, θp) from Gamma(kq, θq) is given by

DKL(kp, θp; kq, θq) = (kp − kq) ψ(kp) − log Γ(kp) + log Γ(kq) + kq (log θq − log θp) + kp (θp − θq)/θq

Laplace transform

The Laplace transform of the gamma PDF is

F(s) = (1 + θs)^{−k} = β^α / (s + β)^α.

Differential equation
βx f′(x) + (x − αβ + β) f(x) = 0,  f(1) = e^{−1/β} β^{−α} / Γ(α)

x f′(x) + (θx − k + 1) f(x) = 0,  f(1) = e^{−θ} θ^k / Γ(k)

2.11.4 Parameter estimation

Maximum likelihood estimation

The likelihood function for N iid observations (x1, ..., xN) is

L(k, θ) = ∏_{i=1}^N f(xi; k, θ)

from which we calculate the log-likelihood function

ℓ(k, θ) = (k − 1) Σ_{i=1}^N ln(xi) − Σ_{i=1}^N xi/θ − Nk ln(θ) − N ln(Γ(k))

Finding the maximum with respect to θ by taking the derivative and setting it equal to zero yields the maximum likelihood estimator of the θ parameter:

θ̂ = (1/(kN)) Σ_{i=1}^N xi

Substituting this into the log-likelihood function gives

ℓ = (k − 1) Σ_{i=1}^N ln(xi) − Nk − Nk ln( Σ xi / (kN) ) − N ln(Γ(k))

Finding the maximum with respect to k by taking the derivative and setting it equal to zero yields

ln(k) − ψ(k) = ln( (1/N) Σ_{i=1}^N xi ) − (1/N) Σ_{i=1}^N ln(xi)

There is no closed-form solution for k. The function is numerically very well behaved, so if a numerical solution is
desired, it can be found using, for example, Newton’s method. An initial value of k can be found either using the method
of moments, or using the approximation

ln(k) − ψ(k) ≈ (1/(2k)) (1 + 1/(6k + 1))

If we let

s = ln( (1/N) Σ_{i=1}^N xi ) − (1/N) Σ_{i=1}^N ln(xi),

then k is approximately

k ≈ (3 − s + √((s − 3)² + 24s)) / (12s),
which is within 1.5% of the correct value.[11] An explicit form for the Newton–Raphson update of this initial guess is:[12]

k ← k − (ln(k) − ψ(k) − s) / (1/k − ψ′(k)).
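Putting the recipe together, a minimal sketch (NumPy/SciPy assumed; fit_gamma_mle is a hypothetical helper name):

import numpy as np
from scipy.special import digamma, polygamma

def fit_gamma_mle(x, n_iter=10):
    # ML fit of (k, theta): closed-form initial guess plus Newton-Raphson on k
    s = np.log(np.mean(x)) - np.mean(np.log(x))
    k = (3 - s + np.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)
    for _ in range(n_iter):
        k -= (np.log(k) - digamma(k) - s) / (1 / k - polygamma(1, k))
    theta = np.mean(x) / k
    return k, theta

rng = np.random.default_rng(6)
x = rng.gamma(shape=3.0, scale=2.0, size=50_000)
print(fit_gamma_mle(x))   # close to (3.0, 2.0)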

Bayesian minimum mean squared error

With known k and unknown θ, the posterior density function for θ (using the standard scale-invariant prior for θ) is

P(θ | k, x1, ..., xN) ∝ (1/θ) ∏_{i=1}^N f(xi; k, θ)

Denoting

y ≡ Σ_{i=1}^N xi,  P(θ | k, x1, ..., xN) = C(xi) θ^{−Nk−1} e^{−y/θ}

Integration over θ can be carried out using a change of variables, revealing that 1/θ is gamma-distributed with parameters
α = Nk, β = y.

∫₀^∞ θ^{−Nk−1+m} e^{−y/θ} dθ = ∫₀^∞ x^{Nk−1−m} e^{−xy} dx = y^{−(Nk−m)} Γ(Nk − m)

The moments can be computed by taking the ratio (m by m = 0):

E[θ^m] = ( Γ(Nk − m) / Γ(Nk) ) y^m,
which shows that the mean ± standard deviation estimate of the posterior distribution for θ is

y/(Nk − 1) ± √( y² / ((Nk − 1)² (Nk − 2)) )

2.11.5 Generating gamma-distributed random variables


Given the scaling property above, it is enough to generate gamma variables with θ = 1 as we can later convert to any value
of β with simple division.

Suppose we wish to generate random variables from Gamma(n+δ,1), where n is a non-negative integer and 0 < δ < 1.
Using the fact that a Gamma(1, 1) distribution is the same as an Exp(1) distribution, and noting the method of generating
exponential variables, we conclude that if U is uniformly distributed on (0, 1], then −ln(U) is distributed Gamma(1, 1).
Now, using the “α-addition” property of the gamma distribution, we expand this result:

−Σ_{k=1}^n ln(Uk) ∼ Γ(n, 1)

where Uk are all uniformly distributed on (0, 1] and independent. All that is left now is to generate a variable distributed
as Gamma(δ, 1) for 0 < δ < 1 and apply the "α-addition” property once more. This is the most difficult part.
Random generation of gamma variates is discussed in detail by Devroye,[13]:401–428 noting that none are uniformly fast
for all shape parameters. For small values of the shape parameter, the algorithms are often not valid.[13]:406 For arbitrary
values of the shape parameter, one can apply the Ahrens and Dieter[14] modified acceptance–rejection method Algorithm
GD (shape k ≥ 1), or transformation method[15] when 0 < k < 1. Also see Cheng and Feast Algorithm GKM 3[16] or
Marsaglia’s squeeze method.[17]
The following is a version of the Ahrens-Dieter acceptance–rejection method:[14]

1. Let m be 1.
2. Generate V3m−2, V3m−1 and V3m as independent variables uniformly distributed on (0, 1].
3. If V3m−2 ≤ v0, where v0 = e/(e + δ), then go to step 4, else go to step 5.
4. Let ξm = V3m−1^{1/δ}, ηm = V3m ξm^{δ−1}. Go to step 6.
5. Let ξm = 1 − ln V3m−1, ηm = V3m e^{−ξm}.
6. If ηm > ξm^{δ−1} e^{−ξm}, then increment m and go to step 2.
7. Assume ξ = ξm to be the realization of Γ(δ, 1).

A summary of this is

θ ( ξ − Σ_{i=1}^{⌊k⌋} ln(Ui) ) ∼ Γ(k, θ)

where

• ⌊k⌋ is the integral part of k,


• ξ has been generated using the algorithm above with δ = {k} (the fractional part of k),
• Uk and Vl are distributed as explained above and are all independent.

While the above approach is technically correct, Devroye notes that it is linear in the value of k and in general is not a good
choice. Instead he recommends using either rejection-based or table-based methods, depending on context.[13]:401–428
For example, Marsaglia’s simple transformation-rejection method relying on a one normal and one uniform random
number:[18]

1. Setup: d = a − 1/3, c = 1/√(9d).
2. Generate: v = (1 + c·x)³, with x standard normal.
3. If v > 0 and log(U) < 0.5x² + d − dv + d·log(v), where U is uniform on (0, 1), return d·v.
4. Go back to step 2.

With 1 ≤ a = α = k, this generates a gamma-distributed random number in time that is approximately constant with k. The acceptance rate does depend on k, with an acceptance rate of 0.95, 0.98, and 0.99 for k = 1, 2, and 4. For k < 1, one can use γα = γ_{1+α} U^{1/α} to boost k to be usable with this method, as in the sketch below.
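A direct translation of these steps into Python (a sketch assuming NumPy; the k < 1 boosting trick from the last sentence is included):

import numpy as np

def marsaglia_tsang(k, rng):
    # one Gamma(k, 1) variate via the transformation-rejection method above
    if k < 1:
        return marsaglia_tsang(1 + k, rng) * rng.uniform() ** (1 / k)
    d = k - 1 / 3
    c = 1 / np.sqrt(9 * d)
    while True:
        x = rng.standard_normal()
        v = (1 + c * x) ** 3
        if v > 0 and np.log(rng.uniform()) < 0.5 * x**2 + d - d * v + d * np.log(v):
            return d * v

rng = np.random.default_rng(7)
samples = np.array([marsaglia_tsang(2.5, rng) for _ in range(100_000)])
print(samples.mean(), samples.var())   # both ~2.5 for Gamma(2.5, 1)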

2.11.6 Related distributions



Conjugate prior

In Bayesian inference, the gamma distribution is the conjugate prior to many likelihood distributions: the Poisson,
exponential, normal (with known mean), Pareto, gamma with known shape σ, inverse gamma with known shape param-
eter, and Gompertz with known scale parameter.
The gamma distribution’s conjugate prior is:[19]

p(k, θ | p, q, r, s) = (1/Z) · p^{k−1} e^{−q/θ} / ( Γ(k)^r θ^{ks} ),
where Z is the normalizing constant, which has no closed-form solution. The posterior distribution can be found by
updating the parameters as follows:


p′ = p ∏ᵢ xi,
q′ = q + Σᵢ xi,
r′ = r + n,
s′ = s + n,
where n is the number of observations, and xi is the ith observation.

Compound gamma

If the shape parameter of the gamma distribution is known, but the inverse-scale parameter is unknown, then a gamma
distribution for the inverse-scale forms a conjugate prior. The compound distribution, which results from integrating out
the inverse-scale, has a closed form solution, known as the compound gamma distribution.[20]
If instead the shape parameter is known but the mean is unknown, with the prior of the mean being given by another
gamma distribution, then it results in K-distribution.

Others

• If X ~ Gamma(1, 1/λ) (shape–scale parametrization), then X has an exponential distribution with rate parameter λ.

• If X ~ Gamma(ν/2, 2), then X is identical to χ²(ν), the chi-squared distribution with ν degrees of freedom. Conversely, if Q ~ χ²(ν) and c is a positive constant, then cQ ~ Gamma(ν/2, 2c).

• If k is an integer, the gamma distribution is an Erlang distribution and is the probability distribution of the waiting time until the kth “arrival” in a one-dimensional Poisson process with intensity 1/θ. If X ~ Γ(k ∈ Z, θ) and Y ~ Pois(x/θ), then P(X > x) = P(Y < k).

• If X has a Maxwell–Boltzmann distribution with parameter a, then X² ~ Γ(3/2, 2a²).

• If X ~ Gamma(k, θ), then √X follows a generalized gamma distribution with parameters p = 2, d = 2k, and a = √θ.

• More generally, if X ~ Gamma(k, θ), then X^q for q > 0 follows a generalized gamma distribution with parameters p = 1/q, d = k/q, and a = θ^q.

• If X ~ Gamma(k, θ), then 1/X ~ Inv-Gamma(k, θ⁻¹) (see Inverse-gamma distribution for derivation).

• If X ~ Gamma(α, θ) and Y ~ Gamma(β, θ) are independently distributed, then X/(X + Y) has a beta distribution with parameters α and β.

• If Xi ~ Gamma(αi, 1) are independently distributed, then the vector (X1/S, ..., Xn/S), where S = X1 + ... + Xn, follows a Dirichlet distribution with parameters α1, ..., αn.

• For large k the gamma distribution converges to a Gaussian distribution with mean μ = kθ and variance σ² = kθ².

• The gamma distribution is the conjugate prior for the precision of the normal distribution with known mean.

• The Wishart distribution is a multivariate generalization of the gamma distribution (samples are positive-definite matrices rather than positive real numbers).

• The gamma distribution is a special case of the generalized gamma distribution, the generalized integer gamma distribution, and the generalized inverse Gaussian distribution.

• Among the discrete distributions, the negative binomial distribution is sometimes considered the discrete analogue of the gamma distribution.

• Tweedie distributions – the gamma distribution is a member of the family of Tweedie exponential dispersion models.

2.11.7 Applications
The gamma distribution has been used to model the size of insurance claims[21] and rainfalls.[22] This means that aggregate
insurance claims and the amount of rainfall accumulated in a reservoir are modelled by a gamma process. The gamma
distribution is also used to model errors in multi-level Poisson regression models, because the combination of the Poisson
distribution and a gamma distribution is a negative binomial distribution.
In wireless communication, the gamma distribution is used to model the multi-path fading of signal power.
In neuroscience, the gamma distribution is often used to describe the distribution of inter-spike intervals.[23][24]
In bacterial gene expression, the copy number of a constitutively expressed protein often follows the gamma distribution,
where the scale and shape parameter are, respectively, the mean number of bursts per cell cycle and the mean number of
protein molecules produced by a single mRNA during its lifetime.[25]
In genomics, the gamma distribution was applied in peak calling step (i.e. in recognition of signal) in ChIP-chip[26] and
ChIP-seq[27] data analysis.
The gamma distribution is widely used as a conjugate prior in Bayesian statistics. It is the conjugate prior for the precision
(i.e. inverse of the variance) of a normal distribution. It is also the conjugate prior for the exponential distribution.

2.11.8 Notes
[1] http://ocw.mit.edu/courses/mathematics/18-443-statistics-for-applications-fall-2006/lecture-notes/lecture6.pdf

[2] See Hogg and Craig (1978, Remark 3.3.1) for an explicit motivation

[3] Scalable Recommendation with Poisson Factorization, Prem Gopalan, Jake M. Hofman, David Blei, arXiv.org 2014

[4] Park, Sung Y.; Bera, Anil K. (2009). “Maximum entropy autoregressive conditional heteroskedasticity model” (PDF). Journal
of Econometrics (Elsevier): 219–230. doi:10.1016/j.jeconom.2008.12.014. Retrieved 2011-06-02.

[5] Papoulis, Pillai, Probability, Random Variables, and Stochastic Processes, Fourth Edition

[6] Banneheka BMSG, Ekanayake GEMUPD (2009) “A new point estimator for the median of gamma distribution”. Viyodaya J
Science, 14:95–103

[7] Jeesen Chen, Herman Rubin, Bounds for the difference between median and mean of gamma and poisson distributions, Statistics
& Probability Letters, Volume 4, Issue 6, October 1986, Pages 281-283, ISSN 0167-7152, .

[8] Choi, K.P. “On the Medians of the Gamma Distributions and an Equation of Ramanujan”, Proceedings of the American
Mathematical Society, Vol. 121, No. 1 (May, 1994), pp. 245-251.

[9] Berg,Christian and Pedersen, Henrik L. “Convexity of the median in the gamma distribution”.

[10] W.D. Penny, “KL-Divergences of Normal, Gamma, Dirichlet, and Wishart densities”, www.fil.ion.ucl.ac.uk/~wpenny/publications/densities.ps

[11] Minka, Thomas P. (2002) “Estimating a Gamma distribution”. http://research.microsoft.com/en-us/um/people/minka/papers/minka-gamma.pdf

[12] Choi, S.C.; Wette, R. (1969) “Maximum Likelihood Estimation of the Parameters of the Gamma Distribution and Their Bias”,
Technometrics, 11(4) 683–690

[13] Devroye, Luc (1986). Non-Uniform Random Variate Generation. New York: Springer-Verlag. ISBN 0-387-96305-7. See
Chapter 9, Section 3.

[14] Ahrens, J. H.; Dieter, U (January 1982). “Generating gamma variates by a modified rejection technique”. Communications of
the ACM 25 (1): 47–54. doi:10.1145/358315.358390.. See Algorithm GD, p. 53.

[15] Ahrens, J. H.; Dieter, U. (1974). “Computer methods for sampling from gamma, beta, Poisson and binomial distributions”.
Computing 12: 223–246. doi:10.1007/BF02293108. CiteSeerX: 10.1.1.93.3828.

[16] Cheng, R.C.H., and Feast, G.M. Some simple gamma variate generators. Appl. Stat. 28 (1979), 290–295.

[17] Marsaglia, G. The squeeze method for generating gamma variates. Comput, Math. Appl. 3 (1977), 321–325.

[18] Marsaglia, G.; Tsang, W. W. (2000). “A simple method for generating gamma variables”. ACM Transactions on Mathematical
Software 26 (3): 363–372. doi:10.1145/358407.358414.

[19] Fink, D. 1995 A Compendium of Conjugate Priors. In progress report: Extension and enhancement of methods for setting data
quality objectives. (DOE contract 95‑831).

[20] Dubey, Satya D. (December 1970). “Compound gamma, beta and F distributions”. Metrika 16: 27–31. doi:10.1007/BF02613934.

[21] p. 43, Philip J. Boland, Statistical and Probabilistic Methods in Actuarial Science, Chapman & Hall CRC 2007

[22] Aksoy, H. (2000) “Use of Gamma Distribution in Hydrological Analysis”, Turk J. Engin Environ Sci, 24, 419 – 428.

[23] J. G. Robson and J. B. Troy, “Nature of the maintained discharge of Q, X, and Y retinal ganglion cells of the cat”, J. Opt. Soc.
Am. A 4, 2301–2307 (1987)

[24] M.C.M. Wright, I.M. Winter, J.J. Forster, S. Bleeck “Response to best-frequency tone bursts in the ventral cochlear nucleus is
governed by ordered inter-spike interval statistics”, Hearing Research 317 (2014)

[25] N. Friedman, L. Cai and X. S. Xie (2006) “Linking stochastic dynamics to population distribution: An analytical framework of
gene expression”, Phys. Rev. Lett. 97, 168302.

[26] DJ Reiss, MT Facciotti and NS Baliga (2008) “Model-based deconvolution of genome-wide DNA binding”, Bioinformatics, 24,
396–403

[27] MA Mendoza-Parra, M Nowicka, W Van Gool, H Gronemeyer (2013) “Characterising ChIP-seq binding patterns by model-
based peak shape deconvolution”, BMC Genomics, 14:834

2.11.9 References
• R. V. Hogg and A. T. Craig (1978) Introduction to Mathematical Statistics, 4th edition. New York: Macmillan. (See Section 3.3.)
• P. G. Moschopoulos (1985) The distribution of the sum of independent gamma random variables, Annals of the
Institute of Statistical Mathematics, 37, 541–544
• A. M. Mathai (1982) Storage capacity of a dam with gamma type inputs, Annals of the Institute of Statistical
Mathematics, 34, 591–597

2.11.10 External links


• Hazewinkel, Michiel, ed. (2001), “Gamma-distribution”, Encyclopedia of Mathematics, Springer, ISBN 978-1-
55608-010-4
• Weisstein, Eric W., “Gamma distribution”, MathWorld.
• Engineering Statistics Handbook

2.12 Generalized gamma distribution


The generalized gamma distribution is a continuous probability distribution with three parameters. It is a generalization
of the two-parameter gamma distribution. Since many distributions commonly used for parametric models in survival
analysis (such as the Weibull distribution and the log-normal distribution) are special cases of the generalized gamma, it
is sometimes used to determine which parametric model is appropriate for a given set of data.[1]

2.12.1 Characteristics
The generalized gamma has three parameters: a > 0 , d > 0 , and p > 0 . For non-negative x, the probability density
function of the generalized gamma is[2]

f(x; a, d, p) = (p/a^d) x^{d−1} e^{−(x/a)^p} / Γ(d/p),

where Γ(·) denotes the gamma function.


The cumulative distribution function is

F(x; a, d, p) = γ(d/p, (x/a)^p) / Γ(d/p),

where γ(·) denotes the lower incomplete gamma function.


If d = p then the generalized gamma distribution becomes the Weibull distribution. Alternatively, if p = 1 the generalized gamma becomes the gamma distribution.
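The (a, d, p) parametrization can be mapped onto SciPy's gengamma implementation, which uses shape parameters a = d/p and c = p; this mapping is an assumption that the sketch below verifies numerically:

import numpy as np
from scipy import stats
from scipy.special import gamma as Gamma

def gg_pdf(x, a, d, p):
    # generalized gamma pdf in the (a, d, p) parametrization above
    return (p / a**d) * x ** (d - 1) * np.exp(-((x / a) ** p)) / Gamma(d / p)

a, d, p = 2.0, 3.0, 1.5                  # arbitrary example parameters
x = np.linspace(0.01, 10.0, 200)

# assumed mapping: stats.gengamma(a=d/p, c=p, scale=a)
print(np.allclose(gg_pdf(x, a, d, p),
                  stats.gengamma.pdf(x, d / p, p, scale=a)))   # True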

Alternative parameterisations of this distribution are sometimes used; for example with the substitution α = d/p.[3] In
addition, a shift parameter can be added, so the domain of x starts at some value other than zero.[3] If the restrictions
on the signs of a, d and p are also lifted (but α = d/p remains positive), this gives a distribution called the Amoroso
distribution, after the Italian mathematician and economist Luigi Amoroso who described it in 1925.[4]

2.12.2 Moments

If X has a generalized gamma distribution as above, then[3]

E(X^r) = a^r Γ((d + r)/p) / Γ(d/p).

2.12.3 Kullback-Leibler divergence

If f1 and f2 are the probability density functions of two generalized gamma distributions, then their Kullback-Leibler
divergence is given by

DKL(f1 ‖ f2) = ∫₀^∞ f1(x; a1, d1, p1) ln[ f1(x; a1, d1, p1) / f2(x; a2, d2, p2) ] dx

= ln[ p1 a2^{d2} Γ(d2/p2) / (p2 a1^{d1} Γ(d1/p1)) ] + [ ψ(d1/p1)/p1 + ln a1 ] (d1 − d2) + [ Γ((d1 + p2)/p1) / Γ(d1/p1) ] (a1/a2)^{p2} − d1/p1

where ψ(·) is the digamma function.[5]

2.12.4 Software implementation


In R, the generalized gamma distribution is implemented in the package flexsurv, function dgengamma, with a different parametrisation: μ = ln a + (ln d − ln p)/p, σ = 1/√(pd), Q = √(p/d).

2.12.5 See also


• Generalized integer gamma distribution

2.12.6 References
[1] Box-Steffensmeier, Janet M.; Jones, Bradford S. (2004) Event History Modeling: A Guide for Social Scientists. Cambridge
University Press. ISBN 0-521-54673-7 (pp. 41-43)

[2] Stacy, E.W. (1962). “A Generalization of the Gamma Distribution.” Annals of Mathematical Statistics 33(3): 1187-1192.
JSTOR 2237889

[3] Johnson, N.L.; Kotz, S; Balakrishnan, N. (1994) Continuous Univariate Distributions, Volume 1, 2nd Edition. Wiley. ISBN
0-471-58495-9 (Section 17.8.7)

[4] Gavin E. Crooks (2010), The Amoroso Distribution, Technical Note, Lawrence Berkeley National Laboratory.

[5] C. Bauckhage (2014), Computing the Kullback-Leibler Divergence between two Generalized Gamma Distributions, arXiv:
1401.6853.
2.13. GENERALIZED PARETO DISTRIBUTION 127

2.13 Generalized Pareto distribution


This article is about a particular family of continuous distributions referred to as the generalized Pareto distribution. For
the hierarchy of generalized Pareto distributions, see Pareto distribution.

In statistics, the generalized Pareto distribution (GPD) is a family of continuous probability distributions. It is often
used to model the tails of another distribution. It is specified by three parameters: location µ , scale σ , and shape ξ .[1][2]
Sometimes it is specified by only scale and shape[3] and sometimes only by its shape parameter. Some references give the
shape parameter as κ = −ξ .[4]

2.13.1 Definition
The standard cumulative distribution function (cdf) of the GPD is defined by[5]

Fξ(z) = 1 − (1 + ξz)^{−1/ξ}  for ξ ≠ 0,
Fξ(z) = 1 − e^{−z}  for ξ = 0,

where the support is z ≥ 0 for ξ ≥ 0, and 0 ≤ z ≤ −1/ξ for ξ < 0. The corresponding probability density function (pdf) is

fξ(z) = (ξz + 1)^{−(ξ+1)/ξ}  for ξ ≠ 0,
fξ(z) = e^{−z}  for ξ = 0.

Differential equation

The pdf of the standard GPD is a solution of the following differential equation:

(ξz + 1) fξ′(z) + (ξ + 1) fξ(z) = 0,  fξ(0) = 1

2.13.2 Characterization
The related location-scale family of distributions is obtained by replacing the argument z by (x − μ)/σ and adjusting the support accordingly. The cumulative distribution function is

F_{(ξ,μ,σ)}(x) = 1 − (1 + ξ(x − μ)/σ)^{−1/ξ}  for ξ ≠ 0,
F_{(ξ,μ,σ)}(x) = 1 − exp(−(x − μ)/σ)  for ξ = 0,

for x ≥ μ when ξ ≥ 0, and μ ≤ x ≤ μ − σ/ξ when ξ < 0, where μ ∈ R, σ > 0, and ξ ∈ R.


The probability density function (pdf) is

f_{(ξ,μ,σ)}(x) = (1/σ) (1 + ξ(x − μ)/σ)^{−1/ξ − 1}

or equivalently

f_{(ξ,μ,σ)}(x) = σ^{1/ξ} / (σ + ξ(x − μ))^{1/ξ + 1}

again, for x ⩾ µ when ξ ⩾ 0 , and µ ⩽ x ⩽ µ − σ/ξ when ξ < 0 .


The pdf is a solution of the following differential equation:

f′(x)(σ + ξ(x − μ)) + (ξ + 1) f(x) = 0,  f(0) = (1 − μξ/σ)^{−1/ξ − 1} / σ

2.13.3 Characteristic and Moment Generating Functions

The characteristic and moment generating functions are derived, and the skewness and kurtosis are obtained from the MGF, by Muraleedharan and Guedes Soares.[6]

2.13.4 Special cases


• If the shape ξ and location µ are both zero, the GPD is equivalent to the exponential distribution.

• With shape ξ > 0 and location µ = σ/ξ , the GPD is equivalent to the Pareto distribution with scale xm = σ/ξ
and shape α = 1/ξ .

2.13.5 Generating generalized Pareto random variables

If U is uniformly distributed on (0, 1], then

X = μ + σ(U^{−ξ} − 1)/ξ ∼ GPD(μ, σ, ξ ≠ 0)

and

X = µ − σ ln(U ) ∼ GPD(µ, σ, ξ = 0).

Both formulas are obtained by inversion of the cdf.


In Matlab Statistics Toolbox, you can easily use “gprnd” command to generate generalized Pareto random numbers.
With GNU R you can use the packages POT or evd with the “rgpd” command (see for exact usage: http://rss.acs.unt.edu/
Rdoc/library/POT/html/simGPD.html)
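The same inversion is only a few lines in Python (a sketch assuming NumPy/SciPy; note that SciPy's genpareto uses c for the shape ξ):

import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
mu, sigma, xi = 0.0, 2.0, 0.3      # example location, scale, and shape

u = rng.uniform(size=100_000)
x = mu + sigma * (u ** (-xi) - 1) / xi if xi != 0 else mu - sigma * np.log(u)

print(stats.kstest(x, "genpareto", args=(xi, mu, sigma)))  # large p-value expected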

2.13.6 See also


• Pareto distribution

• Generalized extreme value distribution

• Pickands–Balkema–de Haan theorem



2.13.7 Notes
[1] Coles, Stuart (2001-12-12). An Introduction to Statistical Modeling of Extreme Values. Springer. p. 75. ISBN 9781852334598.
[2] Dargahi-Noubary, G. R. (1989). “On tail estimation: An improved method”. Mathematical Geology 21 (8): 829–842. doi:10.1007/BF00894450.
[3] Hosking, J. R. M.; Wallis, J. R. (1987). “Parameter and Quantile Estimation for the Generalized Pareto Distribution”. Tech-
nometrics 29 (3): 339–349. doi:10.2307/1269343.
[4] Davison, A. C. (1984-09-30). “Modelling Excesses over High Thresholds, with an Application”. In de Oliveira, J. Tiago.
Statistical Extremes and Applications. Kluwer. p. 462. ISBN 9789027718044.
[5] Embrechts, Paul; Klüppelberg, Claudia; Mikosch, Thomas (1997-01-01). Modelling extremal events for insurance and finance.
p. 162. ISBN 9783540609315.
[6] Muraleedharan, G.; C, Guedes Soares (2014). “Characteristic and Moment Generating Functions of Generalised Pareto(GP3)
and Weibull Distributions”. Journal of Scientific Research and Reports 3 (14): 1861–1874. doi:10.9734/JSRR/2014/10087.

2.13.8 References
• Pickands, James (1975). “Statistical inference using extreme order statistics”. Annals of Statistics 3: 119–131.
doi:10.1214/aos/1176343003

• Balkema, A.; De Haan, Laurens (1974). “Residual life time at great age”. Annals of Probability 2 (5): 792–804.
doi:10.1214/aop/1176996548

• N. L. Johnson, S. Kotz, and N. Balakrishnan (1994). Continuous Univariate Distributions Volume 1, second edition.
New York: Wiley. ISBN 0-471-58495-9. Chapter 20, Section 12: Generalized Pareto Distributions.

• Barry C. Arnold (2011). “Chapter 7: Pareto and Generalized Pareto Distributions”. In Duangkamon Chotika-
panich. Modeling Distributions and Lorenz Curves. New York: Springer. ISBN 9780387727967.

• Arnold, B. C. and Laguna, L. (1977). On generalized Pareto distributions with applications to income data. Ames,
Iowa: Iowa State University, Department of Economics.

2.13.9 External links


• Mathworks: Generalized Pareto distribution

2.14 Gamma/Gompertz distribution


In probability and statistics, the Gamma/Gompertz distribution is a continuous probability distribution. It has been
used as an aggregate-level model of customer lifetime and a model of mortality risks.

2.14.1 Specification
Probability density function

The probability density function of the Gamma/Gompertz distribution is:

f(x; b, s, β) = b s e^{bx} β^s / (β − 1 + e^{bx})^{s+1},
where b > 0 is the scale parameter and β, s > 0 are the shape parameters of the Gamma/Gompertz distribution.

Cumulative distribution function

The cumulative distribution function of the Gamma/Gompertz distribution is:

F(x; b, s, β) = 1 − β^s / (β − 1 + e^{bx})^s,   x > 0; b, s, β > 0,

= 1 − e^{−bsx},   β = 1.

Moment generating function

The moment generating function is given by:



E(e^{−tX}) = (sb/(t + sb)) β^s ₂F₁(s + 1, (t/b) + s; (t/b) + s + 1; 1 − β),   β ≠ 1;

E(e^{−tX}) = sb/(t + sb),   β = 1,

where ₂F₁(a, b; c; z) = Σ_{k=0}^∞ [(a)_k (b)_k / (c)_k] z^k / k! is the Gauss hypergeometric function.

2.14.2 Properties
The Gamma/Gompertz distribution is a flexible distribution that can be skewed to the right or to the left.

2.14.3 Related distributions


• When β = 1, this reduces to an Exponential distribution with parameter sb.

• The gamma distribution is a natural conjugate prior to a Gompertz likelihood with known scale parameter b.[1]

• When the shape parameter η of a Gompertz distribution varies according to a gamma distribution with shape parameter α and scale parameter β (mean = α/β), the distribution of x is Gamma/Gompertz, as the sketch below illustrates.[1]
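A minimal Python sketch of this mixture representation (NumPy only; the function name gamma_gompertz_rvs is ours): draw η from a gamma distribution with shape s and rate β, then invert the Gompertz cdf.

    import numpy as np

    def gamma_gompertz_rvs(b, s, beta, size=1, rng=None):
        # eta ~ Gamma(shape=s, rate=beta); x | eta ~ Gompertz(eta, b)
        rng = np.random.default_rng() if rng is None else rng
        eta = rng.gamma(shape=s, scale=1.0 / beta, size=size)  # NumPy's scale = 1/rate
        u = rng.uniform(size=size)
        # invert the Gompertz cdf F(x) = 1 - exp(-eta (e^{bx} - 1))
        return np.log1p(-np.log1p(-u) / eta) / b

A histogram of such draws should match the density f(x; b, s, β) above.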

2.14.4 See also


• Gompertz distribution

• Customer lifetime value

2.14.5 Notes
[1] Bemmaor, A.C.; Glady, N. (2012)

2.14.6 References
• Bemmaor, Albert C.; Glady, Nicolas (2012). “Modeling Purchasing Behavior With Sudden 'Death': A Flexible
Customer Lifetime Model”. Management Science 58 (5): 1012–1021. doi:10.1287/mnsc.1110.1461.

• Bemmaor, Albert C.; Glady, Nicolas (2011). “Implementing the Gamma/Gompertz/NBD Model in MATLAB”
(PDF). Cergy-Pontoise: ESSEC Business School.

• Gompertz, B. (1825). “On the Nature of the Function Expressive of the Law of Human Mortality, and on a New
Mode of Determining the Value of Life Contingencies”. Philosophical Transactions of the Royal Society of London
115: 513–583. doi:10.1098/rstl.1825.0026. JSTOR 107756.
• Johnson, Norman L.; Kotz, Samuel; Balakrishnan, N. (1995). “Continuous Univariate Distributions” 2 (2nd ed.).
New York: John Wiley & Sons. pp. 25–26. ISBN 0-471-58494-0.
• Manton, K. G.; Stallard, E.; Vaupel, J. W. (1986). “Alternative Models for the Heterogeneity of Mortality Risks
Among the Aged”. Journal of the American Statistical Association 81: 635–644. doi:10.1080/01621459.1986.10478316.

2.15 Gompertz distribution


In probability and statistics, the Gompertz distribution is a continuous probability distribution. The Gompertz distribution is often applied to describe the distribution of adult lifespans by demographers[1][2] and actuaries.[3][4] Related fields of science such as biology[5] and gerontology[6] have also considered the Gompertz distribution for the analysis of survival. More recently, computer scientists have also started to model the failure rates of computer code by the Gompertz distribution.[7] In marketing science, it has been used as an individual-level model of customer lifetime value.[8]

2.15.1 Specification
Probability density function

The probability density function of the Gompertz distribution is:

f(x; η, b) = bη e^{bx} e^{η} exp(−η e^{bx})   for x ≥ 0,

where b > 0 is the scale parameter and η > 0 is the shape parameter of the Gompertz distribution. In the actuarial
and biological sciences and in demography, the Gompertz distribution is parametrized slightly differently (Gompertz–
Makeham law of mortality).

Cumulative distribution function

The cumulative distribution function of the Gompertz distribution is:

( ( ))
F (x; η, b) = 1 − exp −η ebx − 1 ,

where η, b > 0, and x ≥ 0 .
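Because this cdf has a closed form, inverse-transform sampling is straightforward; a minimal Python sketch (NumPy only; the function name gompertz_rvs is ours):

    import numpy as np

    def gompertz_rvs(eta, b, size=1, rng=None):
        # Solve u = 1 - exp(-eta (e^{bx} - 1)) for x.
        rng = np.random.default_rng() if rng is None else rng
        u = rng.uniform(size=size)
        return np.log1p(-np.log1p(-u) / eta) / b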

Moment generating function

The moment generating function is:

E(e^{−tX}) = η e^{η} E_{t/b}(η),

where

E_{t/b}(η) = ∫_1^∞ e^{−ηv} v^{−t/b} dv,   t > 0.

2.15.2 Properties

The Gompertz distribution is a flexible distribution that can be skewed to the right or to the left.

Shapes

The Gompertz density function can take on different shapes depending on the values of the shape parameter η :

• When η ≥ 1, the probability density function has its mode at 0.

• When 0 < η < 1, the probability density function has its mode at

x* = (1/b) ln(1/η), with 0 < F(x*) < 1 − e^{−1} ≈ 0.632121.

Kullback-Leibler divergence

If f1 and f2 are the probability density functions of two Gompertz distributions, then their Kullback-Leibler divergence
is given by

D_KL(f1 ‖ f2) = ∫_0^∞ f1(x; b1, η1) ln[ f1(x; b1, η1) / f2(x; b2, η2) ] dx

= ln[ e^{η1} b1 η1 / (e^{η2} b2 η2) ] + e^{η1} [ (b2/b1 − 1) Ei(−η1) + (η2 / η1^{b2/b1}) Γ(b2/b1 + 1, η1) ] − (η1 + 1),

where Ei(·) denotes the exponential integral and Γ(·, ·) is the upper incomplete gamma function.[9]

2.15.3 Related distributions

• If X is defined to be the result of sampling from a Gumbel distribution until a negative value Y is produced, and
setting X=−Y, then X has a Gompertz distribution.

• The gamma distribution is a natural conjugate prior to a Gompertz likelihood with known scale parameter b. [8]

• When η varies according to a gamma distribution with shape parameter α and scale parameter β (mean = α/β ),
the distribution of x is Gamma/Gompertz.[8]

2.15.4 See also

• Gompertz function

• Customer lifetime value

• Gamma Gompertz distribution



2.15.5 Notes
[1] Vaupel, James W. (1986). “How change in age-specific mortality affects life expectancy”. Population Studies 40 (1): 147–157.
doi:10.1080/0032472031000141896.

[2] Preston, Samuel H.; Heuveline, Patrick; Guillot, Michel (2001). Demography:measuring and modeling population processes.
Oxford: Blackwell.

[3] Benjamin, Bernard; Haycocks, H.W.; Pollard, J. (1980). The Analysis of Mortality and Other Actuarial Statistics. London:
Heinemann.

[4] Willemse, W. J.; Koppelaar, H. (2000). “Knowledge elicitation of Gompertz' law of mortality”. Scandinavian Actuarial Journal
(2): 168–179.

[5] Economos, A. (1982). “Rate of aging, rate of dying and the mechanism of mortality”. Archives of Gerontology and Geriatrics
1 (1): 46–51.

[6] Brown, K.; Forbes, W. (1974). “A mathematical model of aging processes”. Journal of Gerontology 29 (1): 46–51. doi:10.1093/geronj/29.1.46.

[7] Ohishi, K.; Okamura, H.; Dohi, T. (2009). “Gompertz software reliability model: estimation algorithm and empirical valida-
tion”. Journal of Systems and Software 82 (3): 535–543. doi:10.1016/j.jss.2008.11.840.

[8] Bemmaor, Albert C.; Glady, Nicolas (2012). “Modeling Purchasing Behavior With Sudden 'Death': A Flexible Customer
Lifetime Model”. Management Science 58 (5): 1012–1021. doi:10.1287/mnsc.1110.1461.

[9] Bauckhage, C. (2014), Characterizations and Kullback-Leibler Divergence of Gompertz Distributions, arXiv:1402.3193.

2.15.6 References
• Bemmaor, Albert C.; Glady, Nicolas (2011). “Implementing the Gamma/Gompertz/NBD Model in MATLAB”
(PDF). Cergy-Pontoise: ESSEC Business School.

• Gompertz, B. (1825). “On the Nature of the Function Expressive of the Law of Human Mortality, and on a New
Mode of Determining the Value of Life Contingencies”. Philosophical Transactions of the Royal Society of London
115: 513–583. doi:10.1098/rstl.1825.0026. JSTOR 107756.

• Johnson, Norman L.; Kotz, Samuel; Balakrishnan, N. (1995). “Continuous Univariate Distributions” 2 (2nd ed.).
New York: John Wiley & Sons. pp. 25–26. ISBN 0-471-58494-0.

• Sheikh, A. K.; Boah, J. K.; Younas, M. (1989). “Truncated Extreme Value Model for Pipeline Reliability”. Reli-
ability Engineering and System Safety 25 (1): 1–14. doi:10.1016/0951-8320(89)90020-3.

2.16 Half-normal distribution


The half-normal distribution is a special case of the folded normal distribution.
Let X follow an ordinary normal distribution, N (0, σ 2 ) , then Y = |X| follows a half-normal distribution. Thus, the
half-normal distribution is a fold at the mean of an ordinary normal distribution with mean zero.
Using the σ parametrization of the normal distribution, the probability density function (PDF) of the half-normal is given
by

f_Y(y; σ) = (√2/(σ√π)) exp(−y²/(2σ²)),   y > 0,

where E[Y] = µ = σ√2/√π.

Alternatively, using a scaled precision (inverse of the variance) parametrization (to avoid issues if σ is near zero), obtained by setting θ = √(π/2)/σ, the probability density function is given by

f_Y(y; θ) = (2θ/π) exp(−y²θ²/π),   y > 0,

where E[Y] = µ = 1/θ.
The cumulative distribution function (CDF) is given by

F_Y(y; σ) = ∫_0^y (√2/(σ√π)) exp(−x²/(2σ²)) dx.

Using the change of variables z = x/(√2σ), the CDF can be written as

F_Y(y; σ) = (2/√π) ∫_0^{y/(√2σ)} exp(−z²) dz = erf(y/(√2σ)),
where erf(x) is the error function, a standard function in many mathematical software packages.
The quantile function (or inverse CDF) is written:


Q(F; σ) = σ√2 erf^{−1}(F),

where 0 ≤ F ≤ 1 and erf^{−1}() is the inverse error function.


The expectation is then given by


E(Y) = σ√(2/π),

and the variance is given by

Var(Y) = σ²(1 − 2/π).

Since this is proportional to the variance σ² of X, σ can be seen as a scale parameter of the new distribution.
The entropy of the half-normal distribution is exactly one bit less than the entropy of a zero-mean normal distribution with the same second moment about 0. This can be understood intuitively since the magnitude operator reduces information by one bit (if the probability distribution at its input is even). Alternatively, since a half-normal distribution is always positive, the one bit it would take to record whether a standard normal random variable were positive (say, a 1) or negative (say, a 0) is no longer necessary. Thus,

H(Y) = (1/2) log(πσ²/2) + 1/2.
Differential equation

σ² f′(x) + x f(x) = 0,   f(1) = (√(2/π)/σ) e^{−1/(2σ²)}

π f′(x) + 2θ² x f(x) = 0,   f(1) = (2θ/π) e^{−θ²/π}

2.16.1 Parameter estimation

Given numbers {xi }ni=1 drawn from a half-normal distribution, the unknown parameter σ of that distribution can be
estimated by the method of maximum likelihood, giving

σ̂ = √( (1/n) Σ_{i=1}^n x_i² )
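A minimal Python check of this estimator (NumPy only; variable names are ours):

    import numpy as np

    rng = np.random.default_rng(0)
    sigma = 2.5
    x = np.abs(rng.normal(0.0, sigma, size=100_000))  # half-normal sample
    sigma_hat = np.sqrt(np.mean(x**2))                # maximum likelihood estimate
    print(sigma_hat)  # close to 2.5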

2.16.2 Related distributions


• The distribution is a special case of the folded normal distribution with μ = 0.

• It also coincides with a zero-mean normal distribution truncated from below at zero (see truncated normal distri-
bution)

• Y²/σ² has a chi-squared distribution with 1 degree of freedom.

• Y/σ has a chi distribution with 1 degree of freedom: if Y ∼ HN(σ), then Y/σ ∼ χ₁.

2.16.3 External links


• Half-Normal Distribution at MathWorld (note that MathWorld uses the parameter θ = (1/σ)√(π/2))

2.16.4 References

2.17 Hotelling’s T-squared distribution


In statistics Hotelling’s T-squared distribution is a univariate distribution proportional to the F-distribution and arises
importantly as the distribution of a set of statistics which are natural generalizations of the statistics underlying Student’s
t-distribution. In particular, the distribution arises in multivariate statistics in undertaking tests of the differences between
the (multivariate) means of different populations, where tests for univariate problems would make use of a t-test.
The distribution is named for Harold Hotelling, who developed it[1] as a generalization of Student’s t-distribution.

2.17.1 The distribution

If the vector d is Gaussian multivariate-distributed with zero mean and unit covariance matrix N(0_p, I_p), and M is a p × p matrix with a Wishart distribution with unit scale matrix and m degrees of freedom W(I_p, m), then m(d′ M^{−1} d) has a Hotelling T² distribution with dimensionality parameter p and m degrees of freedom.[2]
If the notation T²_{p,m} is used to denote a random variable having Hotelling’s T-squared distribution with parameters p and m, then, if a random variable X has Hotelling’s T-squared distribution,

X ∼ T²_{p,m}

then[1]

((m − p + 1)/(pm)) X ∼ F_{p,m−p+1}

where F_{p,m−p+1} is the F-distribution with parameters p and m − p + 1.

2.17.2 Hotelling’s T-squared statistic

Hotelling’s T-squared statistic is a generalization of Student’s t statistic that is used in multivariate hypothesis testing, and
is defined as follows.[1]
Let N_p(µ, Σ) denote a p-variate normal distribution with location µ and covariance Σ. Let

x_1, . . . , x_n ∼ N_p(µ, Σ)

be n independent random variables, which may be represented as p × 1 column vectors of real numbers. Define

x̄ = (x_1 + · · · + x_n)/n

to be the sample mean. It can be shown that

n(x̄ − µ)′ Σ^{−1} (x̄ − µ) ∼ χ²_p,

where χ²_p is the chi-squared distribution with p degrees of freedom. To show this, use the fact that x̄ ∼ N_p(µ, Σ/n) and then derive the characteristic function of the random variable y = n(x̄ − µ)′ Σ^{−1} (x̄ − µ). This is done below,

φ_y(θ) = E[e^{iθy}]

= E[exp( iθ n(x̄ − µ)′ Σ^{−1} (x̄ − µ) )]

= ∫ (2π)^{−p/2} |Σ/n|^{−1/2} exp( iθ n(x̄ − µ)′ Σ^{−1} (x̄ − µ) − (n/2)(x̄ − µ)′ Σ^{−1} (x̄ − µ) ) dx_1 . . . dx_p

= ∫ (2π)^{−p/2} |Σ/n|^{−1/2} exp( −(n/2)(x̄ − µ)′ (Σ^{−1} − 2iθΣ^{−1}) (x̄ − µ) ) dx_1 . . . dx_p

= |(Σ^{−1} − 2iθΣ^{−1})^{−1}/n|^{1/2} |Σ/n|^{−1/2} ∫ (2π)^{−p/2} |(Σ^{−1} − 2iθΣ^{−1})^{−1}/n|^{−1/2} exp( −(n/2)(x̄ − µ)′ (Σ^{−1} − 2iθΣ^{−1}) (x̄ − µ) ) dx_1 . . . dx_p

= |I_p − 2iθI_p|^{−1/2}

= (1 − 2iθ)^{−p/2},

which is the characteristic function of the χ²_p distribution. ■

However, Σ is often unknown and we wish to do hypothesis testing on the location µ.



Sum of p squared t’s

Define

W = (1/(n − 1)) Σ_{i=1}^n (x_i − x̄)(x_i − x̄)′

to be the sample covariance. Here we denote transpose by an apostrophe. It can be shown that W is a positive (semi)
definite matrix and (n − 1)W follows a p-variate Wishart distribution with n−1 degrees of freedom.[3] Hotelling’s T-
squared statistic is then defined[4] to be

t² = n(x̄ − µ)′ W^{−1} (x̄ − µ)

and, also from above,

t² ∼ T²_{p,n−1}

i.e.

((n − p)/(p(n − 1))) t² ∼ F_{p,n−p},

where F_{p,n−p} is the F-distribution with parameters p and n − p. In order to calculate a p-value, multiply the t² statistic by the above constant and use the F-distribution.
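As an illustration, a minimal Python sketch of this one-sample test (NumPy and SciPy; the function name hotelling_t2_test is ours):

    import numpy as np
    from scipy import stats

    def hotelling_t2_test(x, mu0):
        # x is an n-by-p data matrix, mu0 the hypothesized mean vector.
        n, p = x.shape
        d = x.mean(axis=0) - mu0
        w = np.cov(x, rowvar=False)            # sample covariance W (divisor n - 1)
        t2 = n * d @ np.linalg.solve(w, d)     # t^2 = n (xbar - mu0)' W^{-1} (xbar - mu0)
        f_stat = (n - p) / (p * (n - 1)) * t2  # ~ F_{p, n-p} under the null
        return t2, stats.f.sf(f_stat, p, n - p)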

2.17.3 Hotelling’s two-sample T-squared statistic


If x1 , . . . , xnx ∼ Np (µ, V) and y1 , . . . , yny ∼ Np (µ, V) , with the samples independently drawn from two independent
multivariate normal distributions with the same mean and covariance, and we define

x̄ = (1/n_x) Σ_{i=1}^{n_x} x_i,   ȳ = (1/n_y) Σ_{i=1}^{n_y} y_i

as the sample means, and

W = [ Σ_{i=1}^{n_x} (x_i − x̄)(x_i − x̄)′ + Σ_{i=1}^{n_y} (y_i − ȳ)(y_i − ȳ)′ ] / (n_x + n_y − 2)
as the unbiased pooled covariance matrix estimate, then Hotelling’s two-sample T-squared statistic is

t² = (n_x n_y/(n_x + n_y)) (x̄ − ȳ)′ W^{−1} (x̄ − ȳ) ∼ T²(p, n_x + n_y − 2)

and it can be related to the F-distribution by[3]

((n_x + n_y − p − 1)/((n_x + n_y − 2)p)) t² ∼ F(p, n_x + n_y − 1 − p).

The non-null distribution of this statistic is the noncentral F-distribution (the ratio of a non-central chi-squared random variable and an independent central chi-squared random variable),

((n_x + n_y − p − 1)/((n_x + n_y − 2)p)) t² ∼ F(p, n_x + n_y − 1 − p; δ),

with

δ = (n_x n_y/(n_x + n_y)) ν′ V^{−1} ν,

where ν is the difference vector between the population means.


More robust and powerful tests than Hotelling’s two-sample test have been proposed in the literature, see for example the
interpoint distance based tests which can be applied also when the number of variables is comparable with, or even larger
than, the number of subjects.[5][6]
In the two-variable case, the formula simplifies nicely, allowing appreciation of how the correlation r between the variables affects t². If we define

d_1 = x̄_{·1} − ȳ_{·1},   d_2 = x̄_{·2} − ȳ_{·2}

and

SD_1 = √W_{11},   SD_2 = √W_{22},

then

t² = [ n_x n_y / ((n_x + n_y)(1 − r²)) ] [ (d_1/SD_1)² + (d_2/SD_2)² − 2r(d_1/SD_1)(d_2/SD_2) ]

Thus, if the differences in the two rows of the vector (x − y) are of the same sign, in general, t2 becomes smaller as r
becomes more positive. If the differences are of opposite sign t2 becomes larger as r becomes more positive.
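A corresponding Python sketch of the two-sample statistic (NumPy and SciPy; the function name hotelling_two_sample is ours):

    import numpy as np
    from scipy import stats

    def hotelling_two_sample(x, y):
        # x (nx-by-p) and y (ny-by-p) are samples with a common covariance.
        nx, p = x.shape
        ny = y.shape[0]
        d = x.mean(axis=0) - y.mean(axis=0)
        # unbiased pooled covariance estimate
        w = ((nx - 1) * np.cov(x, rowvar=False) + (ny - 1) * np.cov(y, rowvar=False)) / (nx + ny - 2)
        t2 = nx * ny / (nx + ny) * d @ np.linalg.solve(w, d)
        f_stat = (nx + ny - p - 1) / ((nx + ny - 2) * p) * t2
        return t2, stats.f.sf(f_stat, p, nx + ny - 1 - p)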

2.17.4 See also

• Student’s t-test in univariate statistics

• Student’s t-distribution in univariate probability theory

• Multivariate Student distribution.

• F-distribution (commonly tabulated or available in software libraries, and hence used for testing the T-squared
statistic using the relationship given above)

• Wilks’ lambda distribution (in multivariate statistics Wilks’s Λ is to Hotelling’s T 2 as Snedecor’s F is to Student’s t
in univariate statistics).

2.17.5 References
[1] Hotelling, H. (1931). “The generalization of Student’s ratio”. Annals of Mathematical Statistics 2 (3): 360–378. doi:10.1214/aoms/1177732979.

[2] Eric W. Weisstein, CRC Concise Encyclopedia of Mathematics, Second Edition, Chapman & Hall/CRC, 2003, p. 1408

[3] Mardia, K. V.; Kent, J. T.; Bibby, J. M. (1979). Multivariate Analysis. Academic Press. ISBN 0-12-471250-9.

[4] http://www.itl.nist.gov/div898/handbook/pmc/section5/pmc543.htm

[5] Marozzi, M. (2014). “Multivariate tests based on interpoint distances with application to magnetic resonance imaging”. Statis-
tical Methods in Medical Research. doi:10.1177/0962280214529104.

[6] Marozzi, M. (2015). “Multivariate multidistance tests for high-dimensional low sample size case-control studies”. Statistics in
Medicine 34. doi:10.1002/sim.6418.

2.17.6 External links


• Prokhorov, A.V. (2001), “Hotelling T 2 -distribution”, in Hazewinkel, Michiel, Encyclopedia of Mathematics, Springer,
ISBN 978-1-55608-010-4

2.18 Inverse Gaussian distribution


In probability theory, the inverse Gaussian distribution (also known as the Wald distribution) is a two-parameter
family of continuous probability distributions with support on (0,∞).
Its probability density function is given by

f(x; µ, λ) = [ λ/(2πx³) ]^{1/2} exp( −λ(x − µ)²/(2µ²x) )
for x > 0, where µ > 0 is the mean and λ > 0 is the shape parameter.
As λ tends to infinity, the inverse Gaussian distribution becomes more like a normal (Gaussian) distribution. The inverse
Gaussian distribution has several properties analogous to a Gaussian distribution. The name can be misleading: it is
an “inverse” only in that, while the Gaussian describes a Brownian Motion’s level at a fixed time, the inverse Gaussian
describes the distribution of the time a Brownian Motion with positive drift takes to reach a fixed positive level.
Its cumulant generating function (logarithm of the characteristic function) is the inverse of the cumulant generating func-
tion of a Gaussian random variable.
To indicate that a random variable X is inverse Gaussian-distributed with mean μ and shape parameter λ we write

X ∼ IG(µ, λ).

2.18.1 Properties
Summation

If X_i has an IG(µ_0 w_i, λ_0 w_i²) distribution for i = 1, 2, ..., n and all X_i are independent, then

S = Σ_{i=1}^n X_i ∼ IG( µ_0 Σ_i w_i, λ_0 (Σ_i w_i)² ).

Note that

Var(X_i)/E(X_i) = µ_0² w_i²/(λ_0 w_i²) = µ_0²/λ_0

is constant for all i. This is a necessary condition for the summation. Otherwise S would not be inverse Gaussian.

Scaling

For any t > 0 it holds that

X ∼ IG(µ, λ) ⇒ tX ∼ IG(tµ, tλ).

Exponential family

The inverse Gaussian distribution is a two-parameter exponential family with natural parameters -λ/(2μ²) and -λ/2, and
natural statistics X and 1/X.

Differential equation

2µ²x² f′(x) + f(x)( λx² + 3µ²x − λµ² ) = 0,   f(1) = √(λ/(2π)) e^{−λ(1−µ)²/(2µ²)}

2.18.2 Relationship with Brownian motion


The stochastic process Xt given by

X0 = 0
Xt = νt + σWt
(where Wt is a standard Brownian motion and ν > 0 ) is a Brownian motion with drift ν.
Then, the first passage time for a fixed level α > 0 by Xt is distributed according to an inverse-gaussian:

T_α = inf{ 0 < t | X_t = α } ∼ IG( α/ν, α²/σ² ).

When drift is zero

A common special case of the above arises when the Brownian motion has no drift. In that case, parameter μ tends to
infinity, and the first passage time for fixed level α has probability density function

f( x; 0, (α/σ)² ) = ( α/(σ√(2πx³)) ) exp( −α²/(2xσ²) ).

This is a Lévy distribution with parameters c = α²/σ² and µ = 0.

2.18.3 Maximum likelihood


The model where

Xi ∼ IG(µ, λwi ), i = 1, 2, . . . , n

with all wi known, (μ, λ) unknown and all Xi independent has the following likelihood function

L(µ, λ) = (λ/(2π))^{n/2} ( Π_{i=1}^n w_i/X_i³ )^{1/2} exp( (λ/µ) Σ_{i=1}^n w_i − (λ/(2µ²)) Σ_{i=1}^n w_i X_i − (λ/2) Σ_{i=1}^n w_i/X_i ).

Solving the likelihood equation yields the following maximum likelihood estimates

µ̂ = ( Σ_{i=1}^n w_i X_i ) / ( Σ_{i=1}^n w_i ),   1/λ̂ = (1/n) Σ_{i=1}^n w_i ( 1/X_i − 1/µ̂ ).

µ̂ and λ̂ are independent and

µ̂ ∼ IG( µ, λ Σ_{i=1}^n w_i ),   n/λ̂ ∼ (1/λ) χ²_{n−1}.
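For the unweighted case w_i = 1, these estimates reduce to a few lines of Python (NumPy only; names are ours):

    import numpy as np

    def inverse_gaussian_mle(x):
        # MLEs of (mu, lambda) for an IG(mu, lambda) sample with all w_i = 1.
        n = len(x)
        mu_hat = np.mean(x)
        lambda_hat = n / np.sum(1.0 / x - 1.0 / mu_hat)
        return mu_hat, lambda_hat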

2.18.4 Generating random variates from an inverse-Gaussian distribution


The following algorithm may be used.[1]
Generate a random variate from a normal distribution with a mean of 0 and 1 standard deviation

ν = N (0, 1).

Square the value

y = ν²

and use this relation

x = µ + µ²y/(2λ) − (µ/(2λ)) √( 4µλy + µ²y² ).
Generate another random variate, this time sampled from a uniform distribution between 0 and 1

z = U (0, 1).

If

z ≤ µ/(µ + x)

then return x; otherwise return µ²/x.
Sample code in Java:
    import java.util.Random;

    public double inverseGaussian(double mu, double lambda) {
        Random rand = new Random();
        // sample from a normal distribution with a mean of 0 and 1 standard deviation
        double v = rand.nextGaussian();
        double y = v * v;
        double x = mu + (mu * mu * y) / (2 * lambda)
                 - (mu / (2 * lambda)) * Math.sqrt(4 * mu * lambda * y + mu * mu * y * y);
        // sample from a uniform distribution between 0 and 1
        double test = rand.nextDouble();
        if (test <= mu / (mu + x)) {
            return x;
        } else {
            return (mu * mu) / x;
        }
    }
And to plot Wald distribution in Python using matplotlib and NumPy:
    import matplotlib.pyplot as plt
    import numpy as np

    h = plt.hist(np.random.wald(3, 2, 100000), bins=200, density=True)

    plt.show()

2.18.5 Related distributions


• If X ∼ IG(µ, λ), then kX ∼ IG(kµ, kλ).

• If X_i ∼ IG(µ, λ), then Σ_{i=1}^n X_i ∼ IG(nµ, n²λ).

• If X_i ∼ IG(µ, λ) for i = 1, ..., n, then X̄ ∼ IG(µ, nλ).

• If X_i ∼ IG(µ_i, 2µ_i²), then Σ_{i=1}^n X_i ∼ IG( Σ_{i=1}^n µ_i, 2(Σ_{i=1}^n µ_i)² ).

The convolution of an inverse Gaussian distribution (a Wald distribution) and an exponential (the ex-Wald distribution) is used as a model for response times in psychology,[2] with visual search as one example.[3]

2.18.6 History
This distribution appears to have been first derived by Schrödinger in 1915 as the time to first passage of a Brownian
motion.[4] The name inverse Gaussian was proposed by Tweedie in 1945.[5] Wald re-derived this distribution in 1947 as
the limiting form of a sample in a sequential probability ratio test. Tweedie investigated this distribution in 1957 and
established some of its statistical properties.

2.18.7 Software
The R programming language has software for this distribution.[6] The inverse Gaussian distribution can be called using
the statmod package.

2.18.8 See also


• Generalized inverse Gaussian distribution
• Tweedie distributions—The inverse Gaussian distribution is a member of the family of Tweedie exponential dis-
persion models
• Stopping time

2.18.9 Notes
[1] Michael, John R.; Schucany, William R.; Haas, Roy W. (May 1976). “Generating Random Variates Using Transformations with
Multiple Roots”. The American Statistician (American Statistical Association) 30 (2): 88–90. doi:10.2307/2683801. JSTOR
2683801.

[2] Schwarz, W (2001). “The ex-Wald distribution as a descriptive model of response times”. Behavior research methods, instru-
ments, & computers : a journal of the Psychonomic Society, Inc 33 (4): 457–69. PMID 11816448.

[3] Palmer, E. M.; Horowitz, T. S.; Torralba, A.; Wolfe, J. M. (2011). “What are the shapes of response time distributions in visual
search?". Journal of Experimental Psychology: Human Perception and Performance 37: 58. doi:10.1037/a0020747.

[4] Schrodinger E (1915) Zur Theorie der Fall—und Steigversuche an Teilchenn mit Brownscher Bewegung. Physikalische Zeitschrift
16, 289-295

[5] Folks, J. L.; Chhikara, R. S. (1978). “The Inverse Gaussian Distribution and Its Statistical Application--A Review”. Journal of
the Royal Statistical Society. Series B (Methodological) 40 (3): 263–289. doi:10.2307/2984691. JSTOR 2984691.

[6] Giner, Goknur. “A monotonically convergent Newton iteration for the quantiles of any unimodal distribution, with application
to the inverse Gaussian distribution” (PDF).

2.18.10 References
• The inverse gaussian distribution: theory, methodology, and applications by Raj Chhikara and Leroy Folks, 1989
ISBN 0-8247-7997-5
• System Reliability Theory by Marvin Rausand and Arnljot Høyland
• The Inverse Gaussian Distribution by Dr. V. Seshadri, Oxford Univ Press, 1993

2.18.11 External links


• Inverse Gaussian Distribution in Wolfram website.

2.19 Lévy distribution


For the more general family of Lévy alpha-stable distributions, of which this distribution is a special case, see stable
distribution.

In probability theory and statistics, the Lévy distribution, named after Paul Lévy, is a continuous probability distribution
for a non-negative random variable. In spectroscopy, this distribution, with frequency as the dependent variable, is known
as a van der Waals profile.[note 1] It is a special case of the inverse-gamma distribution.
It is one of the few distributions that are stable and that have probability density functions that can be expressed analytically,
the others being the normal distribution and the Cauchy distribution.

2.19.1 Definition
The probability density function of the Lévy distribution over the domain x ≥ µ is


f(x; µ, c) = √(c/(2π)) · e^{−c/(2(x−µ))} / (x − µ)^{3/2},
where µ is the location parameter and c is the scale parameter. The cumulative distribution function is

F(x; µ, c) = erfc( √( c/(2(x − µ)) ) ),

where erfc(z) is the complementary error function. The shift parameter µ has the effect of shifting the curve to the right
by an amount µ , and changing the support to the interval [ µ , ∞ ). Like all stable distributions, the Levy distribution
has a standard form f(x;0,1) which has the following property:

f (x; µ, c)dx = f (y; 0, 1)dy

where y is defined as

y = (x − µ)/c.
The characteristic function of the Lévy distribution is given by


φ(t; µ, c) = exp( iµt − √(−2ict) ).

Note that the characteristic function can also be written in the same form used for the stable distribution with α = 1/2
and β = 1 :

φ(t; µ, c) = exp( iµt − |ct|^{1/2}(1 − i sign(t)) ).

Assuming µ = 0 , the nth moment of the unshifted Lévy distribution is formally defined by:

m_n = √(c/(2π)) ∫_0^∞ e^{−c/(2x)} x^{n−3/2} dx,

which diverges for all n > 0, so that the moments of the Lévy distribution do not exist. The moment generating function is then formally defined by:

M(t; c) = √(c/(2π)) ∫_0^∞ e^{−c/(2x)+tx} x^{−3/2} dx,
which diverges for t > 0 and is therefore not defined in an interval around zero, so that the moment generating function is
not defined per se. Like all stable distributions except the normal distribution, the wing of the probability density function
exhibits heavy tail behavior falling off according to a power law:


lim_{x→∞} f(x; µ, c) = √(c/(2π)) · 1/x^{3/2}.

This behavior can be seen by plotting the probability density functions for various values of c, with µ = 0, on a log–log scale.
Differential equation
2(x − µ)² f′(x) + (3x − 3µ − c) f(x) = 0,   f(0) = √(c/(2π)) e^{c/(2µ)} (−µ)^{−3/2}

2.19.2 Related distributions

• If X ∼ Levy(µ, c), then kX + b ∼ Levy(kµ + b, kc).

• If X ∼ Levy(0, c), then X ∼ Inv-Gamma(1/2, c/2) (inverse-gamma distribution).

• The Lévy distribution is a special case of the type 5 Pearson distribution.

• If Y ∼ Normal(µ, σ²) (normal distribution), then (Y − µ)^{−2} ∼ Levy(0, 1/σ²).

• If X ∼ Normal(µ, 1/√σ), then (X − µ)^{−2} ∼ Levy(0, σ).

• If X ∼ Levy(µ, c), then X ∼ Stable(1/2, 1, c, µ) (stable distribution).

• If X ∼ Levy(0, c), then X ∼ Scale-inv-χ²(1, c) (scaled-inverse-chi-squared distribution).

• If X ∼ Levy(µ, c), then (X − µ)^{−1/2} ∼ FoldedNormal(0, 1/√c) (folded normal distribution).

2.19.3 Random sample generation


Random samples from the Lévy distribution can be generated using inverse transform sampling. Given a random variate
U drawn from the uniform distribution on the unit interval (0, 1], the variate X given by

X = F^{−1}(U) = c/(Φ^{−1}(1 − U/2))² + µ

is Lévy-distributed with location µ and scale c. Here Φ(x) is the cumulative distribution function of the standard normal distribution.
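A minimal Python sketch of this inverse-transform sampler (NumPy and SciPy; the function name levy_rvs is ours):

    import numpy as np
    from scipy.special import ndtri  # inverse of the standard normal cdf

    def levy_rvs(mu, c, size=1, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        u = 1.0 - rng.uniform(size=size)  # U ~ Uniform(0, 1]
        return mu + c / ndtri(1.0 - u / 2.0)**2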

2.19.4 Applications
• The frequency of geomagnetic reversals appears to follow a Lévy distribution
• The time of hitting a single point α (different from the starting point 0) by the Brownian motion has the Lévy
distribution with c = α2 . (For a Brownian motion with drift, this time may follow an inverse Gaussian distribution,
which has the Lévy distribution as a limit.)

• The length of the path followed by a photon in a turbid medium follows the Lévy distribution.[1]

• A Cauchy process can be defined as a Brownian motion subordinated to a process associated with a Lévy distribution.[2]

2.19.5 Footnotes
[1] “van der Waals profile” appears with lowercase “van” in almost all sources, such as: Statistical mechanics of the liquid surface
by Clive Anthony Croxton, 1980, A Wiley-Interscience publication, ISBN 0-471-27663-4, ISBN 978-0-471-27663-0, ; and in
Journal of technical physics, Volume 36, by Instytut Podstawowych Problemów Techniki (Polska Akademia Nauk), publisher:
Państwowe Wydawn. Naukowe., 1995,

2.19.6 Notes
[1] Rogers, Geoffrey L, Multiple path analysis of reflectance from turbid media. Journal of the Optical Society of America A, 25:11,
p 2879-2883 (2008).

[2] Applebaum, D. “Lectures on Lévy processes and Stochastic calculus, Braunschweig; Lecture 2: Lévy processes” (PDF). Uni-
versity of Sheffield. pp. 37–53.

2.19.7 References

• “Information on stable distributions”. Retrieved July 13, 2005. - John P. Nolan’s introduction to stable distributions,
some papers on stable laws, and a free program to compute stable densities, cumulative distribution functions,
quantiles, estimate parameters, etc. See especially An introduction to stable distributions, Chapter 1

2.19.8 External links

• Weisstein, Eric W., “Lévy Distribution”, MathWorld.

• Lévy and stock prices

2.20 Log-Cauchy distribution


In probability theory, a log-Cauchy distribution is a probability distribution of a random variable whose logarithm is
distributed in accordance with a Cauchy distribution. If X is a random variable with a Cauchy distribution, then Y = exp(X)
has a log-Cauchy distribution; likewise, if Y has a log-Cauchy distribution, then X = log(Y) has a Cauchy distribution.[1]
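This exp/log relationship makes sampling immediate; a minimal Python sketch (NumPy only; variable names are ours):

    import numpy as np

    rng = np.random.default_rng()
    mu, sigma = 0.0, 1.0
    # exponentiate Cauchy(mu, sigma) draws to get log-Cauchy(mu, sigma) draws
    y = np.exp(mu + sigma * rng.standard_cauchy(size=10))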

2.20.1 Characterization

Probability density function

The log-Cauchy distribution has the probability density function:

f(x; µ, σ) = 1 / ( xπσ [ 1 + ((ln x − µ)/σ)² ] ),   x > 0

= (1/(xπ)) [ σ / ((ln x − µ)² + σ²) ],   x > 0

where µ is a real number and σ > 0 .[1][2] If σ is known, the scale parameter is eµ .[1] µ and σ correspond to the location
parameter and scale parameter of the associated Cauchy distribution.[1][3] Some authors define µ and σ as the location
and scale parameters, respectively, of the log-Cauchy distribution.[3]
For µ = 0 and σ = 1 , corresponding to a standard Cauchy distribution, the probability density function reduces to:[4]

f(x; 0, 1) = 1/( xπ(1 + (ln x)²) ),   x > 0

Cumulative distribution function

The cumulative distribution function (cdf) when µ = 0 and σ = 1 is:[4]

F(x; 0, 1) = 1/2 + (1/π) arctan(ln x),   x > 0

Survival function

The survival function when µ = 0 and σ = 1 is:[4]

S(x; 0, 1) = 1/2 − (1/π) arctan(ln x),   x > 0

Hazard rate

The hazard rate when µ = 0 and σ = 1 is:[4]

 
λ(x; 0, 1) = [ 1/( xπ(1 + (ln x)²) ) ] [ 1/2 − (1/π) arctan(ln x) ]^{−1},   x > 0

The hazard rate decreases at the beginning and at the end of the distribution, but there may be an interval over which the
hazard rate increases.[4]

2.20.2 Properties
The log-Cauchy distribution is an example of a heavy-tailed distribution.[5] Some authors regard it as a “super-heavy
tailed” distribution, because it has a heavier tail than a Pareto distribution-type heavy tail, i.e., it has a logarithmically
decaying tail.[5][6] As with the Cauchy distribution, none of the non-trivial moments of the log-Cauchy distribution are
finite.[4] The mean is a moment so the log-Cauchy distribution does not have a defined mean or standard deviation.[7][8]
The log-Cauchy distribution is infinitely divisible for some parameters but not for others.[9] Like the lognormal distri-
bution, log-t or log-Student distribution and Weibull distribution, the log-Cauchy distribution is a special case of the
generalized beta distribution of the second kind.[10][11] The log-Cauchy is actually a special case of the log-t distribution,
similar to the Cauchy distribution being a special case of the Student’s t distribution with 1 degree of freedom.[12][13]
Since the Cauchy distribution is a stable distribution, the log-Cauchy distribution is a logstable distribution.[14] Logstable
distributions have poles at x=0.[13]

2.20.3 Estimating parameters


The median of the natural logarithms of a sample is a robust estimator of µ .[1] The median absolute deviation of the
natural logarithms of a sample is a robust estimator of σ .[1]

2.20.4 Uses
In Bayesian statistics, the log-Cauchy distribution can be used to approximate the improper Jeffreys-Haldane density, 1/k,
which is sometimes suggested as the prior distribution for k where k is a positive parameter being estimated.[15][16] The
log-Cauchy distribution can be used to model certain survival processes where significant outliers or extreme results may
occur.[2][3][17] An example of a process where a log-Cauchy distribution may be an appropriate model is the time between
someone becoming infected with HIV virus and showing symptoms of the disease, which may be very long for some
people.[3] It has also been proposed as a model for species abundance patterns.[18]

2.20.5 References
[1] Olive, D.J. (June 23, 2008). “Applied Robust Statistics” (PDF). Southern Illinois University. p. 86. Retrieved 2011-10-18.

[2] Lindsey, J.K. (2004). Statistical analysis of stochastic processes in time. Cambridge University Press. pp. 33, 50, 56, 62, 145.
ISBN 978-0-521-83741-5.

[3] Mode, C.J. & Sleeman, C.K. (2000). Stochastic processes in epidemiology: HIV/AIDS, other infectious diseases. World Scientific.
pp. 29–37. ISBN 978-981-02-4097-4.

[4] Marshall, A.W. & Olkin, I. (2007). Life distributions: structure of nonparametric, semiparametric, and parametric families.
Springer. pp. 443–444. ISBN 978-0-387-20333-1.

[5] Falk, M., Hüsler, J. & Reiss, R. (2010). Laws of Small Numbers: Extremes and Rare Events. Springer. p. 80. ISBN 978-3-
0348-0008-2.

[6] Alves, M.I.F., de Haan, L. & Neves, C. (March 10, 2006). “Statistical inference for heavy and super-heavy tailed distributions”
(PDF).

[7] “Moment”. Mathworld. Retrieved 2011-10-19.

[8] Wang, Y. “Trade, Human Capital and Technology Spillovers: An Industry Level Analysis”. Carleton University. p. 14.

[9] Bondesson, L. (2003). “On the Lévy Measure of the Lognormal and LogCauchy Distributions”. Methodology and Computing
in Applied Probability (Kluwer Academic Publications): 243–256. Retrieved 2011-10-18.

[10] Knight, J. & Satchell, S. (2001). Return distributions in finance. Butterworth-Heinemann. p. 153. ISBN 978-0-7506-4751-9.

[11] Kemp, M. (2009). Market consistency: model calibration in imperfect markets. Wiley. ISBN 978-0-470-77088-7.

[12] MacDonald, J.B. (1981). “Measuring Income Inequality”. In Taillie, C., Patil, G.P. & Baldessari, B. Statistical distributions in
scientific work: proceedings of the NATO Advanced Study Institute. Springer. p. 169. ISBN 978-90-277-1334-6.

[13] Kleiber, C. & Kotz, S. (2003). Statistical Size Distributions in Economics and Actuarial Science. Wiley. pp. 101–102, 110.
ISBN 978-0-471-15064-0.

[14] Panton, D.B. (May 1993). “Distribution function values for logstable distributions”. Computers & Mathematics with Applications
25 (9): 17–24. doi:10.1016/0898-1221(93)90128-I. Retrieved 2011-10-18.

[15] Good, I.J. (1983). Good thinking: the foundations of probability and its applications. University of Minnesota Press. p. 102.
ISBN 978-0-8166-1142-3.

[16] Chen, M. (2010). Frontiers of Statistical Decision Making and Bayesian Analysis. Springer. p. 12. ISBN 978-1-4419-6943-9.

[17] Lindsey, J.K., Jones, B. & Jarvis, P.; Jones; Jarvis (September 2001). “Some statistical issues in modelling pharmacokinetic
data”. Statistics in Medicine 20 (17–18): 2775–278. doi:10.1002/sim.742. Retrieved 2011-10-19.

[18] Zuo-Yun, Y.; et al. (June 2005). “LogCauchy, log-sech and lognormal distributions of species abundances in forest communi-
ties”. Ecological Modelling 184 (2–4): 329–340. doi:10.1016/j.ecolmodel.2004.10.011. Retrieved 2011-10-18.

2.21 Log-Laplace distribution


In probability theory and statistics, the log-Laplace distribution is the probability distribution of a random variable
whose logarithm has a Laplace distribution. If X has a Laplace distribution with parameters μ and b, then Y = eX has a
log-Laplace distribution. The distributional properties can be derived from the Laplace distribution.

2.21.1 Characterization

Probability density function

A random variable has a log-Laplace(μ, b) distribution if its probability density function is:[1]

f(x | µ, b) = (1/(2bx)) exp( −|ln x − µ|/b )

= (1/(2bx)) exp( −(µ − ln x)/b ),   if ln x < µ

= (1/(2bx)) exp( −(ln x − µ)/b ),   if ln x ≥ µ

The cumulative distribution function for Y, when y > 0, is

F(y) = 0.5 [ 1 + sgn(ln y − µ) ( 1 − exp(−|ln y − µ|/b) ) ].

Versions of the log-Laplace distribution based on an asymmetric Laplace distribution also exist.[2] Depending on the
parameters, including asymmetry, the log-Laplace may or may not have a finite mean and a finite variance.[2]
Differential equation

b x f′(x) + (b − 1) f(x) = 0,   f(1) = e^{−µ/b}/(2b),   for ln x < µ

b x f′(x) + (b + 1) f(x) = 0,   f(1) = e^{µ/b}/(2b),   for ln x ≥ µ

2.21.2 References

[1] Lindsey, J.K. (2004). Statistical analysis of stochastic processes in time. Cambridge University Press. p. 33. ISBN 978-0-521-
83741-5.

[2] Kozubowski, T.J. & Podgorski, K. “A Log-Laplace Growth Rate Model” (PDF). University of Nevada-Reno. p. 4. Retrieved
2011-10-21.

2.22 Log-logistic distribution

In probability and statistics, the log-logistic distribution (known as the Fisk distribution in economics) is a continuous
probability distribution for a non-negative random variable. It is used in survival analysis as a parametric model for events
whose rate increases initially and decreases later, for example mortality rate from cancer following diagnosis or treatment.
It has also been used in hydrology to model stream flow and precipitation, and in economics as a simple model of the
distribution of wealth or income.
The log-logistic distribution is the probability distribution of a random variable whose logarithm has a logistic distribution.
It is similar in shape to the log-normal distribution but has heavier tails. Unlike the log-normal, its cumulative distribution
function can be written in closed form.

2.22.1 Characterisation

There are several different parameterizations of the distribution in use. The one shown here gives reasonably interpretable
parameters and a simple form for the cumulative distribution function.[1][2] The parameter α > 0 is a scale parameter
and is also the median of the distribution. The parameter β > 0 is a shape parameter. The distribution is unimodal when
β > 1 and its dispersion decreases as β increases.
The cumulative distribution function is

F(x; α, β) = 1/( 1 + (x/α)^{−β} )

= (x/α)^β / ( 1 + (x/α)^β )

= x^β / ( α^β + x^β ),

where x > 0, α > 0, β > 0.


The probability density function is

f(x; α, β) = (β/α)(x/α)^{β−1} / ( 1 + (x/α)^β )²

Alternative parameterization

An alternative parametrization is given by the pair µ, s in analogy with the logistic distribution:

µ = ln(α)

s = 1/β

2.22.2 Properties

Moments

The k th raw moment exists only when k < β, when it is given by[3][4]

E(X^k) = α^k B(1 − k/β, 1 + k/β)

= α^k (kπ/β)/sin(kπ/β),

where B() is the beta function. Expressions for the mean, variance, skewness and kurtosis can be derived from this.
Writing b = π/β for convenience, the mean is

E(X) = αb/ sin b, β > 1,

and the variance is

Var(X) = α²( 2b/sin 2b − b²/sin² b ),   β > 2.

Explicit expressions for the skewness and kurtosis are lengthy.[5] As β tends to infinity the mean tends to α , the variance
and skewness tend to zero and the excess kurtosis tends to 6/5 (see also related distributions below).

Quantiles

The quantile function (inverse cumulative distribution function) is :

F^{−1}(p; α, β) = α ( p/(1 − p) )^{1/β}.
1−p

It follows that the median is α, the lower quartile is 3^{−1/β} α and the upper quartile is 3^{1/β} α.
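A minimal Python sketch that samples via this quantile function and checks the quartiles empirically (NumPy only; names are ours):

    import numpy as np

    def loglogistic_rvs(alpha, beta, size=1, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        p = rng.uniform(size=size)
        return alpha * (p / (1.0 - p))**(1.0 / beta)

    x = loglogistic_rvs(2.0, 3.0, size=200_000, rng=np.random.default_rng(0))
    print(np.quantile(x, [0.25, 0.5, 0.75]))  # approx. 2*3^(-1/3), 2, 2*3^(1/3)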

2.22.3 Applications
Survival analysis

The log-logistic distribution provides one parametric model for survival analysis. Unlike the more commonly used Weibull
distribution, it can have a non-monotonic hazard function: when β > 1, the hazard function is unimodal (when β ≤ 1,
the hazard decreases monotonically). The fact that the cumulative distribution function can be written in closed form is
particularly useful for analysis of survival data with censoring.[6] The log-logistic distribution can be used as the basis of
an accelerated failure time model by allowing α to differ between groups, or more generally by introducing covariates that
affect α but not β by modelling log(α) as a linear function of the covariates.[7]
The survival function is

S(t) = 1 − F(t) = [ 1 + (t/α)^β ]^{−1},

and so the hazard function is

h(t) = f(t)/S(t) = (β/α)(t/α)^{β−1} / ( 1 + (t/α)^β ).

Hydrology

The log-logistic distribution has been used in hydrology for modelling stream flow rates and precipitation.[1][2]
Extreme values like maximum one-day rainfall and river discharge per month or per year often follow a log-normal
distribution.[8] The log-normal distribution, however, needs a numeric approximation. As the log-logistic distribution,
which can be solved analytically, is similar to the log-normal distribution, it can be used instead.
The blue picture illustrates an example of fitting the log-logistic distribution to ranked maximum one-day October rainfalls
and it shows the 90% confidence belt based on the binomial distribution. The rainfall data are represented by the plotting
position r/(n+1) as part of the cumulative frequency analysis.

Economics

The log-logistic has been used as a simple model of the distribution of wealth or income in economics, where it is known
as the Fisk distribution.[9] Its Gini coefficient is 1/β .[10]

Networking

The log-logistic has been used as a model for the period of time beginning when some data leaves a software user appli-
cation in a computer and the response is received by the same application after travelling through and being processed by

other computers, applications, and network segments, most or all of them without hard real-time guarantees (for example,
when an application is displaying data coming from a remote sensor connected to the Internet). It has been shown to be a
more accurate probabilistic model for that than the log-normal distribution or others, as long as abrupt changes of regime
in the sequences of those times are properly detected.[11]

2.22.4 Related distributions


• If X ∼ LL(α, β), then kX ∼ LL(kα, β).

• LL(α, β) ∼ Dagum(1, α, β) (Dagum distribution).

• LL(α, β) ∼ SinghMaddala(1, α, β) (Singh–Maddala distribution).

• LL(γ, σ) ∼ β′(1, 1, γ, σ) (beta prime distribution).

• If X has a log-logistic distribution with scale parameter α and shape parameter β then Y = log(X) has a logistic
distribution with location parameter log(α) and scale parameter 1/β .

• As the shape parameter β of the log-logistic distribution increases, its shape increasingly resembles that of a (very
narrow) logistic distribution. Informally, as β →∞,

LL(α, β) → L(α, α/β).

• The log-logistic distribution with shape parameter β = 1 and scale parameter α is the same as the generalized Pareto distribution with location parameter µ = 0, shape parameter ξ = 1 and scale parameter α:

LL(α, 1) = GPD(µ = 0, σ = α, ξ = 1).

• The addition of another parameter (a shift parameter) formally results in a shifted log-logistic distribution, but
this is usually considered in a different parameterization so that the distribution can be bounded above or bounded
below.

Generalizations

Several different distributions are sometimes referred to as the generalized log-logistic distribution, as they contain
the log-logistic as a special case.[10] These include the Burr Type XII distribution (also known as the Singh-Maddala
distribution) and the Dagum distribution, both of which include a second shape parameter. Both are in turn special cases
of the even more general generalized beta distribution of the second kind. Another more straightforward generalization of
the log-logistic is the shifted log-logistic distribution.

2.22.5 See also


• Probability distributions: List of important distributions supported on semi-infinite intervals

2.22.6 References
[1] Shoukri, M.M.; Mian, I.U.M.; Tracy, D.S. (1988), “Sampling Properties of Estimators of the Log-Logistic Distribution with
Application to Canadian Precipitation Data”, The Canadian Journal of Statistics 16 (3): 223–236, doi:10.2307/3314729, JSTOR
3314729

[2] Ashkar, Fahim; Mahdi, Smail (2006), “Fitting the log-logistic distribution by generalized moments”, Journal of Hydrology 328
(3–4): 694–703, doi:10.1016/j.jhydrol.2006.01.014

[3] Tadikamalla, Pandu R.; Johnson, Norman L. (1982), “Systems of Frequency Curves Generated by Transformations of Logistic
Variables”, Biometrika 69 (2): 461–465, doi:10.1093/biomet/69.2.461, JSTOR 2335422

[4] Tadikamalla, Pandu R. (1980), “A Look at the Burr and Related Distributions”, International Statistical Review 48 (3): 337–344,
doi:10.2307/1402945, JSTOR 1402945

[5] McLaughlin, Michael P. (2001), A Compendium of Common Probability Distributions (PDF), p. A–37, retrieved 2008-02-15

[6] Bennett, Steve (1983), “Log-Logistic Regression Models for Survival Data”, Journal of the Royal Statistical Society, Series C 32
(2): 165–171, doi:10.2307/2347295, JSTOR 2347295

[7] Collett, Dave (2003), Modelling Survival Data in Medical Research (2nd ed.), CRC press, ISBN 1-58488-325-1

[8] Ritzema (ed.), H.P. (1994), Frequency and Regression Analysis (PDF), Chapter 6 in: Drainage Principles and Applications,
Publication 16, International Institute for Land Reclamation and Improvement (ILRI), Wageningen, The Netherlands, pp. 175–
224, ISBN 90-70754-33-9

[9] Fisk, P.R. (1961), “The Graduation of Income Distributions”, Econometrica 29 (2): 171–185, doi:10.2307/1909287, JSTOR
1909287

[10] Kleiber, C.; Kotz, S (2003), Statistical Size Distributions in Economics and Actuarial Sciences, Wiley, ISBN 0-471-15064-9

[11] Gago-Benítez, A.; Fernández-Madrigal J.-A., Cruz-Martín, A. (2013), Log-Logistic Modeling of Sensory Flow Delays in Net-
worked Telerobots, IEEE Sensors 13(8), pp. 2944 – 2953

2.23 Log-normal distribution


In probability theory, a log-normal (or lognormal) distribution is a continuous probability distribution of a random
variable whose logarithm is normally distributed. Thus, if the random variable X is log-normally distributed, then Y =
ln(X) has a normal distribution. Likewise, if Y has a normal distribution, then X = exp(Y ) has a log-normal distribution.
A random variable which is log-normally distributed takes only positive real values. The distribution is occasionally
referred to as the Galton distribution or Galton’s distribution, after Francis Galton.[1] The log-normal distribution
also has been associated with other names, such as McAlister, Gibrat and Cobb–Douglas.[1]
A log-normal process is the statistical realization of the multiplicative product of many independent random variables,
each of which is positive. This is justified by considering the central limit theorem in the log domain. The log-normal
distribution is the maximum entropy probability distribution for a random variate X for which the mean and variance of
ln(X) are specified.[2]

2.23.1 Notation
Given a log-normally distributed random variable X and two parameters µ and σ that are, respectively, the mean and
standard deviation of the variable’s natural logarithm, then the logarithm of X is normally distributed, and we can write
X as

X = eµ+σZ

with Z a standard normal variable.


This relationship is true regardless of the base of the logarithmic or exponential function. If log_a(Y) is normally distributed, then so is log_b(Y), for any two positive numbers a, b ≠ 1. Likewise, if e^X is log-normally distributed, then so is a^X, where a is a positive number ≠ 1.
On a logarithmic scale, µ and σ can be called the location parameter and the scale parameter, respectively.
In contrast, the mean, standard deviation, and variance of the non-logarithmized sample values are respectively denoted
m , s.d., and v in this article. The two sets of parameters can be related as (see also Arithmetic moments below)[3]

µ = ln( m / √(1 + v/m²) ),   σ = √( ln(1 + v/m²) ).

2.23.2 Characterization
Probability density function

A random positive variable x is log-normally distributed if the logarithm of x is normally distributed,

N(ln x; µ, σ) = (1/(σ√(2π))) exp( −(ln x − µ)²/(2σ²) ),   x > 0.

A change of variables must conserve differential probability. In particular,

N(ln x) d ln x = N(ln x) (d ln x/dx) dx = N(ln x) dx/x = lnN(x) dx,
where

[ ]
1 (lnx − µ)2
ln N (x; µ, σ) = √ exp − , x>0
xσ 2π 2σ 2

is the log-normal probability density function.[1]

Cumulative distribution function

The cumulative distribution function is

∫_0^x lnN(ξ; µ, σ) dξ = (1/2)[ 1 + erf( (ln x − µ)/(σ√2) ) ] = (1/2) erfc( −(ln x − µ)/(σ√2) ) = Φ( (ln x − µ)/σ ),

where erfc is the complementary error function, and Φ is the cumulative distribution function of the standard normal
distribution.

Characteristic function and moment generating function

All moments of the log-normal distribution exist, and it holds that E[X^n] = e^{nµ + n²σ²/2} (which can be derived by letting z = (ln x − (µ + nσ²))/σ within the integral). However, the expected value E[e^{tX}] is not defined for any positive value of the argument t, as the defining integral diverges. In consequence, the moment generating function is not defined.[4] This is related to the fact that the lognormal distribution is not uniquely determined by its moments.

Similarly, the characteristic function E[e^{itX}] is not defined in the half complex plane and therefore is not analytic in the origin. In consequence, the characteristic function of the log-normal distribution cannot be represented as an infinite convergent series.[5] In particular, its formal Taylor series Σ_{n=0}^∞ ((it)^n/n!) e^{nµ + n²σ²/2} diverges. However, a number of alternative divergent series representations have been obtained.[5][6][7][8]
A closed-form formula for the characteristic function φ(t) with t in the domain of convergence is not known. A relatively
simple approximating formula is available in closed form and given by[9]

φ(t) ≈ exp( −( W²(tσ²e^µ) + 2W(tσ²e^µ) ) / (2σ²) ) / √( 1 + W(tσ²e^µ) ),

where W is the Lambert W function. This approximation is derived via an asymptotic method, but it stays sharp all over the domain of convergence of φ.

2.23.3 Properties
Location and scale

The location and scale parameters of a log-normal distribution, i.e. µ and σ , are more readily treated using the geometric
mean, GM[X] , and the geometric standard deviation, GSD[X] , rather than the arithmetic mean, E[X] , and the arithmetic
standard deviation, SD[X] .

Geometric moments The geometric mean of the log-normal distribution is GM[X] = e^µ, and the geometric standard deviation is GSD[X] = e^σ.[10][11] By analogy with the arithmetic statistics, one can define a geometric variance, GVar[X] = e^{σ²}, and a geometric coefficient of variation,[10] GCV[X] = e^σ − 1.
Because the log-transformed variable Y = ln X is symmetric and quantiles are preserved under monotonic transforma-
tions, the geometric mean of a log-normal distribution is equal to its median, Med[X] .[12]
Note that the geometric mean is less than the arithmetic mean. This is due to the AM–GM inequality, and corresponds
to the logarithm being convex down. In fact,

E[X] = e^{µ + σ²/2} = e^µ · √(e^{σ²}) = GM[X] · √(GVar[X]).

In finance, the term e^{−σ²/2} is sometimes interpreted as a convexity correction. From the point of view of stochastic calculus, this is the same correction term as in Itō's lemma for geometric Brownian motion.

Arithmetic moments The arithmetic mean, arithmetic variance, and arithmetic standard deviation of a log-normally
distributed variable X are given by

E[X] = e^{µ + σ²/2},

Var[X] = (e^{σ²} − 1) e^{2µ + σ²} = (e^{σ²} − 1)(E[X])²,

SD[X] = √(Var[X]) = e^{µ + σ²/2} √(e^{σ²} − 1) = E[X] √(e^{σ²} − 1),

respectively.
The location ( µ ) and scale ( σ ) parameters can be obtained if the arithmetic mean and the arithmetic variance are
known; it is simpler if σ is computed first:

µ = ln(E[X]) − (1/2) ln( 1 + Var[X]/(E[X])² ) = ln(E[X]) − σ²/2,

σ² = ln( 1 + Var[X]/(E[X])² ).
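A minimal Python sketch of this conversion, with a sampling check (NumPy only; names are ours):

    import numpy as np

    def lognormal_params_from_moments(mean, var):
        # recover (mu, sigma) of ln X from the arithmetic mean and variance of X
        sigma2 = np.log1p(var / mean**2)
        mu = np.log(mean) - 0.5 * sigma2
        return mu, np.sqrt(sigma2)

    mu, sigma = lognormal_params_from_moments(2.0, 1.5)
    x = np.random.default_rng(0).lognormal(mu, sigma, size=500_000)
    print(x.mean(), x.var())  # approx. 2.0 and 1.5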
For any real or complex number s , the s th moment of a log-normally distributed variable X is given by[1]

E[X^s] = e^{sµ + s²σ²/2}.

A log-normal distribution is not uniquely determined by its moments E[X k ] for k ≥ 1 , that is, there exists some other
distribution with the same moments for all k .[1] In fact, there is a whole family of distributions with the same moments
as the log-normal distribution.

Mode and median

The mode is the point of global maximum of the probability density function. In particular, it solves the equation (ln f)′ = 0:

Mode[X] = e^{µ − σ²}.

The median is the point where F_X = 0.5:

Med[X] = e^µ.

Arithmetic coefficient of variation


The arithmetic coefficient of variation CV[X] is the ratio SD[X]/E[X] (on the natural scale). For a log-normal distribution it is equal to

CV[X] = √(e^{σ²} − 1).
Contrary to the arithmetic standard deviation, the arithmetic coefficient of variation is independent of the arithmetic mean.

Partial expectation
∫∞
The partial expectation of a random variable X with respect to a threshold k is defined as g(k) = ∫_k^∞ x lnN(x) dx, where lnN(x) is the probability density function of X. Alternatively, using the definition of conditional expectation, it can be written as g(k) = E[X | X > k] P(X > k). For a log-normal random variable the partial expectation is given by:

g(k) = ∫_k^∞ x lnN(x) dx = e^{µ + σ²/2} Φ( (µ + σ² − ln k)/σ ),

where Φ is the cumulative distribution function of the standard normal distribution. The partial expectation formula has applications in insurance and economics; it is used in solving the partial differential equation leading to the Black–Scholes formula.

Conditional expectation

The conditional expectation of a lognormal random variable X with respect to a threshold k is its partial expectation divided by the cumulative probability of being in that range:

E[X | X < k] = e^{µ + σ²/2} · Φ( (ln k − µ − σ²)/σ ) / Φ( (ln k − µ)/σ ),

E[X | X ≥ k] = e^{µ + σ²/2} · Φ( (µ + σ² − ln k)/σ ) / ( 1 − Φ( (ln k − µ)/σ ) ).
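A minimal Python check of the partial expectation formula against direct numerical integration (NumPy and SciPy; variable names are ours):

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import lognorm, norm

    mu, sigma, k = 0.3, 0.8, 1.2
    closed_form = np.exp(mu + sigma**2 / 2) * norm.cdf((mu + sigma**2 - np.log(k)) / sigma)
    numeric, _ = quad(lambda x: x * lognorm.pdf(x, s=sigma, scale=np.exp(mu)), k, np.inf)
    print(closed_form, numeric)  # the two values agree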

Other

A set of data that arises from the log-normal distribution has a symmetric Lorenz curve (see also Lorenz asymmetry
coefficient).[13]
The harmonic H , geometric G and arithmetic A means of this distribution are related;[14] such relation is given by

H = G²/A.
Log-normal distributions are infinitely divisible,[15] but they are not stable distributions, from which samples can easily be drawn.[16]

2.23.4 Occurrence
The log-normal distribution is important in the description of natural phenomena. The reason is that for many natural
processes of growth, relative growth rate is independent of size. This is also known as Gibrat’s law, after Robert Gibrat
(1904–1980) who formulated it for companies. It can be shown that a growth process following Gibrat’s law will result
in entity sizes with a log-normal distribution.[17] Examples include:

• In biology and medicine,


• Measures of size of living tissue (length, skin area, weight);[18]
• For highly communicable epidemics, such as SARS in 2003, if public intervention is involved, the number of hospitalized cases is shown to satisfy the lognormal distribution with no free parameters if an entropy is assumed and the standard deviation is determined by the principle of maximum rate of entropy production.[19]
• The length of inert appendages (hair, claws, nails, teeth) of biological specimens, in the direction of growth;
• Certain physiological measurements, such as blood pressure of adult humans (after separation on male/female
subpopulations)[20]

Consequently, reference ranges for measurements in healthy individuals are more accurately estimated by
assuming a log-normal distribution than by assuming a symmetric distribution about the mean.

• In colloidal chemistry and polymer chemistry


• Particle size distributions
• Molar mass distributions
• In hydrology, the log-normal distribution is used to analyze extreme values of such variables as monthly and annual
maximum values of daily rainfall and river discharge volumes.[21]
• The image on the right illustrates an example of fitting the log-normal distribution to ranked annually maxi-
mum one-day rainfalls showing also the 90% confidence belt based on the binomial distribution. The rainfall
data are represented by plotting positions as part of a cumulative frequency analysis.
• In social sciences and demographics
• In economics, there is evidence that the income of 97%–99% of the population is distributed log-normally.[22]
(The distribution of higher-income individuals follows a Pareto distribution.[23] )
• In finance, in particular the Black–Scholes model, changes in the logarithm of exchange rates, price indices,
and stock market indices are assumed normal[24] (these variables behave like compound interest, not like
simple interest, and so are multiplicative). However, some mathematicians such as Benoît Mandelbrot have argued[25] that log-Lévy distributions, which possess heavy tails, would be a more appropriate model, in particular for the analysis of stock market crashes. Indeed, stock price distributions typically exhibit a fat
tail.[26]

• city sizes
• Technology
• In reliability analysis, the lognormal distribution is often used to model times to repair a maintainable system.[27]
• In wireless communication, “the local-mean power expressed in logarithmic values, such as dB or neper, has a
normal (i.e., Gaussian) distribution.” [28] Also, the random obstruction of radio signals due to large buildings
and hills, called shadowing, is often modeled as a lognormal distribution.
• It has been proposed that coefficients of friction and wear may be treated as having a lognormal distribution
[29]

• In spray processes, such as droplet impact, the size of secondarily produced droplets has a lognormal distribution, with the standard deviation σ = 66 determined by the principle of maximum rate of entropy production.[30] It is an open question whether this value of σ has some generality for other cases, though for the spreading of communicable epidemics, σ is shown also to take this value.[19]
• Particle size distributions produced by comminution with random impacts, such as in ball milling
• The file size distribution of publicly available audio and video data files (MIME types) follows a log-normal
distribution over five orders of magnitude.[31]
• The length of chess games tends to follow a log-normal distribution.[32]

2.23.5 Maximum likelihood estimation of parameters


For determining the maximum likelihood estimators of the log-normal distribution parameters μ and σ, we can use the
same procedure as for the normal distribution. To avoid repetition, we observe that

L(x; µ, σ) = ∏_{i=1}^n (1/x_i) N(ln x_i; µ, σ),

where by L we denote the probability density function of the log-normal distribution and by N that of the normal distri-
bution. Therefore, using the same indices to denote distributions, we can write the log-likelihood function thus:


ℓ_L(µ, σ | x₁, x₂, ..., x_n) = −∑_k ln x_k + ℓ_N(µ, σ | ln x₁, ln x₂, ..., ln x_n)
           = constant + ℓ_N(µ, σ | ln x₁, ln x₂, ..., ln x_n).

Since the first term is constant with regard to μ and σ, both logarithmic likelihood functions, ℓL and ℓN , reach their
maximum with the same µ and σ . Hence, using the formulas for the normal distribution maximum likelihood parameter
estimators and the equality above, we deduce that for the log-normal distribution it holds that

µ̂ = (∑_k ln x_k)/n,    σ̂² = (∑_k (ln x_k − µ̂)²)/n.
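In code, these estimators are just the sample mean and (biased) sample variance of the log-data; a short sketch assuming NumPy (illustrative parameter values):

    import numpy as np

    rng = np.random.default_rng(2)
    data = rng.lognormal(mean=0.7, sigma=0.4, size=100_000)

    log_x = np.log(data)
    mu_hat = log_x.mean()                       # MLE of mu: mean of ln x_k
    sigma2_hat = ((log_x - mu_hat)**2).mean()   # MLE of sigma^2 (divisor n, not n - 1)

    print(mu_hat, np.sqrt(sigma2_hat))          # should be close to (0.7, 0.4)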

2.23.6 Multivariate log-normal


If X ∼ N (µ, Σ) is a multivariate normal distribution then Y = exp(X) has a multivariate log-normal distribution[33]
with mean

E[Y]_i = e^{µ_i + Σ_ii/2},

and covariance matrix

Var[Y]_ij = e^{µ_i + µ_j + (Σ_ii + Σ_jj)/2} (e^{Σ_ij} − 1).
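A brief simulation check of these formulas, assuming NumPy (the mean vector and covariance matrix are arbitrary illustrations):

    import numpy as np

    rng = np.random.default_rng(3)
    mu = np.array([0.0, 0.5])
    Sigma = np.array([[0.30, 0.10],
                      [0.10, 0.20]])

    # Y = exp(X) with X multivariate normal.
    y = np.exp(rng.multivariate_normal(mu, Sigma, size=2_000_000))

    d = np.diag(Sigma)
    mean_analytic = np.exp(mu + 0.5 * d)
    cov_analytic = (np.exp(mu[:, None] + mu[None, :] + 0.5 * (d[:, None] + d[None, :]))
                    * (np.exp(Sigma) - 1.0))

    print(mean_analytic, y.mean(axis=0))          # analytic vs. empirical mean
    print(cov_analytic, np.cov(y, rowvar=False))  # analytic vs. empirical covariance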

2.23.7 Related distributions

• If X ∼ N (µ, σ 2 ) is a normal distribution, then exp(X) ∼ ln N (µ, σ 2 ).

• If X ∼ ln N (µ, σ 2 ) is distributed log-normally, then ln(X) ∼ N (µ, σ 2 ) is a normal random variable.


• If X_j ∼ ln N(µ_j, σ_j²) are n independent log-normally distributed variables, and Y = ∏_{j=1}^n X_j, then Y is also distributed log-normally:

Y ∼ ln N(∑_{j=1}^n µ_j, ∑_{j=1}^n σ_j²).

• Let Xj ∼ ln N (µj , σj2 ) be independent log-normally distributed variables with possibly varying σ and µ parame-
∑n
ters, and Y = j=1 Xj . The distribution of Y has no closed-form expression, but can be reasonably approximated
by another log-normal distribution Z at the right tail.[34] Its probability density function at the neighborhood of 0
has been characterized[16] and it does not resemble any log-normal distribution. A commonly used approximation
due to L.F. Fenton (but previously stated by R.I. Wilkinson and mathematically justified by Marlow[35]) is obtained
by matching the mean and variance of another lognormal distribution:

σ_Z² = ln[ ∑ e^{2µ_j + σ_j²}(e^{σ_j²} − 1) / (∑ e^{µ_j + σ_j²/2})² + 1 ],

µ_Z = ln[ ∑ e^{µ_j + σ_j²/2} ] − σ_Z²/2.

In the case that all X_j have the same variance parameter σ_j = σ, these formulas simplify to

σ_Z² = ln[ (e^{σ²} − 1) ∑ e^{2µ_j} / (∑ e^{µ_j})² + 1 ],

µ_Z = ln[ ∑ e^{µ_j} ] + σ²/2 − σ_Z²/2.
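A small sketch of this moment-matching approximation, assuming NumPy (the parameter lists are arbitrary illustrations):

    import numpy as np

    def fenton_wilkinson(mus, sigmas):
        # Lognormal (mu_Z, sigma_Z) whose mean and variance match those of the sum.
        mus, sigmas = np.asarray(mus), np.asarray(sigmas)
        s1 = np.sum(np.exp(mus + sigmas**2 / 2))                             # E[sum X_j]
        var = np.sum(np.exp(2 * mus + sigmas**2) * (np.exp(sigmas**2) - 1))  # Var[sum X_j]
        sigma_z2 = np.log(var / s1**2 + 1)
        return np.log(s1) - sigma_z2 / 2, np.sqrt(sigma_z2)

    rng = np.random.default_rng(4)
    mus, sigmas = [0.0, 0.2, 0.5], [0.3, 0.3, 0.4]
    y = sum(rng.lognormal(m, s, size=1_000_000) for m, s in zip(mus, sigmas))

    mu_z, sigma_z = fenton_wilkinson(mus, sigmas)
    print(y.mean(), np.exp(mu_z + sigma_z**2 / 2))  # means match by construction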

• If X ∼ ln N (µ, σ 2 ) , then X + c is said to have a shifted log-normal distribution with support x ∈ (c, +∞) .
E[X + c] = E[X] + c , Var[X + c] = Var[X] .

• If X ∼ ln N (µ, σ 2 ) , then aX ∼ ln N (µ + ln a, σ 2 ).

• If X ∼ ln N(µ, σ²), then 1/X ∼ ln N(−µ, σ²).

• If X ∼ ln N(µ, σ²), then X^a ∼ ln N(aµ, a²σ²) for a ≠ 0.

• The lognormal distribution is a special case of the semi-bounded Johnson distribution.

• If X|Y ∼ Rayleigh(Y ) with Y ∼ ln N (µ, σ 2 ) , then X ∼ Suzuki(µ, σ) (Suzuki distribution)



2.23.8 Similar distributions


A substitute for the log-normal whose integral can be expressed in terms of more elementary functions[36] can be obtained
based on the logistic distribution to get an approximation for the CDF

F(x; µ, σ) = [ (e^µ/x)^{π/(σ√3)} + 1 ]^{−1}.

This is a log-logistic distribution.
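A quick comparison of this logistic-based approximation against the exact CDF, assuming NumPy and SciPy (illustrative parameters; the e^µ numerator matches the median of the log-normal):

    import numpy as np
    from scipy.stats import lognorm

    mu, sigma = 0.0, 0.6
    x = np.linspace(0.1, 5.0, 50)

    F_approx = 1.0 / ((np.exp(mu) / x) ** (np.pi / (sigma * np.sqrt(3))) + 1.0)
    F_exact = lognorm(s=sigma, scale=np.exp(mu)).cdf(x)

    print(np.abs(F_approx - F_exact).max())   # small maximum error over the grid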

2.23.9 See also


• Log-distance path loss model

• Slow fading

2.23.10 Notes
[1] Johnson, Norman L.; Kotz, Samuel; Balakrishnan, N. (1994), “14: Lognormal Distributions”, Continuous univariate distribu-
tions. Vol. 1, Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics (2nd ed.), New York:
John Wiley & Sons, ISBN 978-0-471-58495-7, MR 1299979

[2] Park, Sung Y.; Bera, Anil K. (2009). “Maximum entropy autoregressive conditional heteroskedasticity model” (PDF). Journal
of Econometrics (Elsevier) 150 (2): 219–230. doi:10.1016/j.jeconom.2008.12.014. Retrieved 2011-06-02.

[3] “Lognormal mean and variance”

[4] Heyde, CC. (1963), “On a property of the lognormal distribution”, Journal of the Royal Statistical Society, Series B (Method-
ological) 25 (2): 392–393, doi:10.1007/978-1-4419-5823-5_6

[5] Holgate, P. (1989). “The lognormal characteristic function”. Communications in Statistics – Theory and Methods 18 (12): 4539–4548. doi:10.1080/03610928908830173.

[6] Barakat, R. (1976). “Sums of independent lognormally distributed random variables”. Journal of the Optical Society of America
66 (3): 211–216. doi:10.1364/JOSA.66.000211.

[7] Barouch, E.; Kaufman, GM.; Glasser, ML. (1986). “On sums of lognormal random variables” (PDF). Studies in Applied
Mathematics 75 (1): 37–55.

[8] Leipnik, Roy B. (January 1991). “On Lognormal Random Variables: I – The Characteristic Function”. Journal of the Australian
Mathematical Society Series B 32 (3): 327–347. doi:10.1017/S0334270000006901.

[9] S. Asmussen, J.L. Jensen, L. Rojas-Nandayapa. “On the Laplace transform of the Lognormal distribution”, Thiele centre
preprint, (2013).

[10] Kirkwood, Thomas BL (Dec 1979). “Geometric means and measures of dispersion”. Biometrics 35 (4): 908–9. doi:10.2307/2530139.

[11] Limpert, E; Stahel, W; Abbt, M (2001). “Lognormal distributions across the sciences: keys and clues”. BioScience 51 (5):
341–352. doi:10.1641/0006-3568(2001)051[0341:LNDATS]2.0.CO;2.

[12] Daly, Leslie E.; Bourke, Geoffrey Joseph (2000). Interpretation and uses of medical statistics (5th ed.). Wiley-Blackwell. p. 89.
doi:10.1002/9780470696750. ISBN 978-0-632-04763-5.

[13] Damgaard, Christian; Weiner, Jacob (2000). “Describing inequality in plant size or fecundity”. Ecology 81 (4): 1139–1142.
doi:10.1890/0012-9658(2000)081[1139:DIIPSO]2.0.CO;2.

[14] Rossman, Lewis A (July 1990). “Design stream flows based on harmonic means”. J Hydraulic Engineering 116 (7): 946–950.
doi:10.1061/(ASCE)0733-9429(1990)116:7(946).
2.23. LOG-NORMAL DISTRIBUTION 161

[15] Olof Thorin (1977), “On the Infinite Divisibility of the Log Normal Distribution”. Scandinavian Actuarial Journal, 1977(3): 121–148. doi:10.1080/03461238.1977.10405635

[16] Gao, X.; Xu, H; Ye, D. (2009), “Asymptotic Behaviors of Tail Density for Sum of Correlated Lognormal Variables”. Interna-
tional Journal of Mathematics and Mathematical Sciences, vol. 2009, Article ID 630857. doi:10.1155/2009/630857

[17] Sutton, John (Mar 1997). “Gibrat’s Legacy”. Journal of Economic Literature 32 (1): 40–59. JSTOR 2729692.

[18] Huxley, Julian S. (1932). Problems of relative growth. London. ISBN 0-486-61114-0. OCLC 476909537.

[19] Wang, WB; Wang, CF; Wu, ZN; Hu, RF (2013). “Modelling the spreading rate of controlled communicable epidemics through
an entropy-based thermodynamic model” 56 (11). SCIENCE CHINA Physics, Mechanics & Astronomy. pp. 2143–2150.

[20] Makuch, Robert W.; D.H. Freeman; M.F. Johnson (1979). “Justification for the lognormal distribution as a model for blood
pressure”. Journal of Chronic Diseases 32 (3): 245–250. doi:10.1016/0021-9681(79)90070-5. Retrieved 27 February 2012.

[21] Ritzema (ed.), H.P. (1994). Frequency and Regression Analysis (PDF). Chapter 6 in: Drainage Principles and Applications,
Publication 16, International Institute for Land Reclamation and Improvement (ILRI), Wageningen, The Netherlands. pp.
175–224. ISBN 90-70754-33-9.

[22] Clementi, Fabio; Gallegati, Mauro (2005) “Pareto’s law of income distribution: Evidence for Germany, the United Kingdom,
and the United States”, EconWPA

[23] Wataru, Souma (2002-02-22). “Physics of Personal Income”. arXiv:cond-mat/0202388.

[24] Black, F.; Scholes, M. (1973). “The Pricing of Options and Corporate Liabilities”. Journal of Political Economy 81 (3): 637.
doi:10.1086/260062.

[25] Mandelbrot, Benoit (2004). The (mis-)Behaviour of Markets. Basic Books. ISBN 9780465043552.

[26] Bunchen, P., Advanced Option Pricing, University of Sydney coursebook, 2007

[27] O'Connor, Patrick; Kleyner, Andre (2011). Practical Reliability Engineering. John Wiley & Sons. p. 35. ISBN 978-0-470-
97982-2.

[28] http://wireless.per.nl/reference/chaptr03/shadow/shadow.htm Archived May 9, 2015 at the Wayback Machine

[29] Steele, C. (2008). “Use of the lognormal distribution for the coefficients of friction and wear”. Reliability Engineering & System
Safety 93 (10): 1574–2013. doi:10.1016/j.ress.2007.09.005.

[30] Wu, Zi-Niu (July 2003). “Prediction of the size distribution of secondary ejected droplets by crown splashing of droplets
impinging on a solid wall”. Probabilistic Engineering Mechanics 18 (3): 241–249. doi:10.1016/S0266-8920(03)00028-6.

[31] Gros, C; Kaczor, G.; Markovic, D (2012). “Neuropsychological constraints to human data production on a global scale”. The
European Physical Journal B 85 (28). doi:10.1140/epjb/e2011-20581-3.

[32] http://chess.stackexchange.com/questions/2506/what-is-the-average-length-of-a-game-of-chess/4899#4899

[33] Tarmast, Ghasem (2001). Multivariate Log–Normal Distribution (PDF). ISI Proceedings: 53rd Session. Seoul.

[34] Asmussen, S.; Rojas-Nandayapa, L. (2008). “Asymptotics of Sums of Lognormal Random Variables with Gaussian Copula”.
Statistics and Probability Letters 78 (16): 2709–2714. doi:10.1016/j.spl.2008.03.035.

[35] Marlow, NA. (Nov 1967). “A normal limit theorem for power sums of independent normal random variables”. Bell System
Technical Journal 46 (9): 2081–2089. doi:10.1002/j.1538-7305.1967.tb04244.x.

[36] Swamee, P. K. (2002). “Near Lognormal Distribution”. Journal of Hydrologic Engineering 7 (6): 441–444. doi:10.1061/(ASCE)1084-
0699(2002)7:6(441).

2.23.11 References
• Crow, Edwin L.; Shimizu, Kunio (Editors) (1988), Lognormal Distributions, Theory and Applications, Statis-
tics: Textbooks and Monographs 88, New York: Marcel Dekker, Inc., pp. xvi+387, ISBN 0-8247-7803-0, MR
0939191, Zbl 0644.62014
• Aitchison, J. and Brown, J.A.C. (1957) The Lognormal Distribution, Cambridge University Press.
• E. Limpert, W. Stahel and M. Abbt (2001) Log-normal Distributions across the Sciences: Keys and Clues, Bio-
Science, 51 (5), 341–352.
• Eric W. Weisstein et al. Log Normal Distribution at MathWorld. Electronic document, retrieved October 26, 2006.
• Holgate, P. (1989). “The lognormal characteristic function”. Communications in Statistics - Theory and Methods
18 (12): 4539–4548. doi:10.1080/03610928908830173.

2.23.12 Further reading


• Brooks, Robert; Corson, Jon; Donal, Wales (1994). “The Pricing of Index Options When the Underlying Assets
All Follow a Lognormal Diffusion”. Advances in Futures and Options Research 7.

2.23.13 External links

2.24 Lomax distribution


The Lomax distribution, sometimes also called the Pareto Type II distribution, is a heavy-tail probability distribution often used in business, economics, and actuarial modeling.[1][2] It is named after K. S. Lomax. It is essentially a Pareto distribution that has been shifted so that its support begins at zero.[3]

2.24.1 Characterization
Probability density function

The probability density function (pdf) for the Lomax distribution is given by

p(x) = (α/λ) [1 + x/λ]^{−(α+1)},    x ≥ 0,

with shape parameter α > 0 and scale parameter λ > 0. The density can be rewritten in a way that more clearly shows the relation to the Pareto Type I distribution:

p(x) = αλ^α / (x + λ)^{α+1}.

Differential equation

The pdf of the Lomax distribution is a solution to the following differential equation:

(λ + x) p′(x) + (α + 1) p(x) = 0,    p(0) = α/λ.

2.24.2 Relation to the Pareto distribution


The Lomax distribution is a Pareto Type I distribution shifted so that its support begins at zero. Specifically:

If Y ∼ Pareto(x_m = λ, α), then Y − x_m ∼ Lomax(λ, α).


The Lomax distribution is a Pareto Type II distribution with x_m = λ and µ = 0:[4]

If X ∼ Lomax(λ, α), then X ∼ P(II)(x_m = λ, α, µ = 0).
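This shift relation gives a direct way to sample from a Lomax law; a sketch assuming SciPy (arbitrary illustrative parameters):

    import numpy as np
    from scipy.stats import lomax, pareto

    lam, alpha = 2.0, 3.0
    rng = np.random.default_rng(5)

    # Shift Pareto(x_m = lambda, alpha) samples down by x_m to obtain Lomax samples.
    y = pareto.rvs(b=alpha, scale=lam, size=1_000_000, random_state=rng) - lam

    # Compare the empirical CDF of the shifted samples with SciPy's Lomax CDF.
    x = np.linspace(0.0, 10.0, 5)
    print(lomax.cdf(x, c=alpha, scale=lam))
    print(np.array([np.mean(y <= xi) for xi in x]))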

2.24.3 Relation to generalized Pareto distribution


The Lomax distribution is a special case of the generalized Pareto distribution. Specifically:

µ = 0,    ξ = 1/α,    σ = λ/α.

2.24.4 Relation to q-exponential distribution


The Lomax distribution is a special case of the q-exponential distribution. The q-exponential extends this distribution to
support on a bounded interval. The Lomax parameters are given by:

α = (2 − q)/(q − 1),    λ = 1/(λ_q (q − 1)).

2.24.5 Non-central moments


The νth non-central moment E[X^ν] exists only if the shape parameter α strictly exceeds ν, in which case the moment has the value

E(X^ν) = λ^ν Γ(α − ν) Γ(1 + ν) / Γ(α).

2.24.6 See also


• Power law

2.24.7 References
[1] Lomax, K. S. (1954) “Business Failures; Another example of the analysis of failure data”. Journal of the American Statistical
Association, 49, 847–852. JSTOR 2281544
[2] Johnson, N.L., Kotz, S., Balakrishnan, N. (1994) Continuous Univariate Distributions, Volume 1, 2nd Edition, Wiley. ISBN
0-471-58495-9 (pages 575, 602)
[3] Van Hauwermeiren M and Vose D (2009). A Compendium of Distributions [ebook]. Vose Software, Ghent, Belgium. Available
at www.vosesoftware.com. Accessed 07/07/11
[4] Kleiber, Christian; Kotz, Samuel (2003), Statistical Size Distributions in Economics and Actuarial Sciences, Wiley Series in
Probability and Statistics 470, John Wiley & Sons, p. 60, ISBN 9780471457169.

2.25 Geometric stable distribution


A geometric stable distribution or geo-stable distribution is a type of leptokurtic probability distribution. Geometric stable distributions were introduced in Klebanov, L. B., Maniya, G. M., and Melamed, I. A. (1985), “A problem of Zolotarev and analogs of infinitely divisible and stable distributions in a scheme for summing a random number of random variables”, Theory of Probability & Its Applications, 29(4):791–794. These distributions are analogues of stable distributions for the case when the number of summands is random, independent of the distribution of the summands, and geometrically distributed. The geometric stable distribution may be symmetric or asymmetric. A symmetric geometric stable distribution is also referred to as a Linnik distribution. The Laplace distribution is a special case of the geometric stable distribution and of a Linnik distribution. The Mittag–Leffler distribution is also a special case of a geometric stable distribution.
The geometric stable distribution has applications in finance theory.[1][2][3]

2.25.1 Characteristics
For most geometric stable distributions, the probability density function and cumulative distribution function have no
closed form solution. But a geometric stable distribution can be defined by its characteristic function, which has the
form:[4]

φ(t; α, β, λ, µ) = [1 + λ^α |t|^α ω − iµt]^{−1}

where

ω = 1 − i tan(πα/2) β sign(t) if α ≠ 1, and ω = 1 + i (2/π) β log|t| sign(t) if α = 1.
α , which must be greater than 0 and less than or equal to 2, is the shape parameter or index of stability, which determines
how heavy the tails are.[4] Lower α corresponds to heavier tails.
β , which must be greater than or equal to −1 and less than or equal to 1, is the skewness parameter.[4] When β is negative
the distribution is skewed to the left and when β is positive the distribution is skewed to the right. When β is zero the
distribution is symmetric, and the characteristic function reduces to:[4]

φ(t; α, 0, λ, µ) = [1 + λ^α |t|^α − iµt]^{−1}.


The symmetric geometric stable distribution with µ = 0 is also referred to as a Linnik distribution.[5][6] A completely
skewed geometric stable distribution, that is with β = 1 , α < 1 , with 0 < µ < 1 is also referred to as a Mittag–Leffler
distribution.[7] Although β determines the skewness of the distribution, it should not be confused with the typical skewness
coefficient or 3rd standardized moment, which in most circumstances is undefined for a geometric stable distribution.
λ > 0 is the scale parameter and µ is the location parameter.[4]
When α = 2, β = 0 and µ = 0 (i.e., a symmetric geometric stable distribution or Linnik distribution with α =2), the
distribution becomes the symmetric Laplace distribution with mean of 0,[5] which has a probability density function of:

f(x | 0, λ) = (1/(2λ)) exp(−|x|/λ).

The Laplace distribution has a variance equal to 2λ². However, for α < 2 the variance of the geometric stable distribution is infinite.

2.25.2 Relationship to the stable distribution


The stable distribution has the property that if X₁, X₂, ..., X_n are independent, identically distributed random variables taken from a stable distribution, the sum Y = a_n(X₁ + X₂ + ··· + X_n) + b_n has the same distribution as the X_i for some a_n and b_n.
The geometric stable distribution has a similar property, but where the number of elements in the sum is a geometrically
distributed random variable. If X1 , X2 , . . . are independent and identically distributed random variables taken from a
geometric stable distribution, the limit of the sum Y = aNp (X1 + X2 + · · · + XNp ) + bNp approaches the distribution
of the Xi s for some coefficients aNp and bNp as p approaches 0, where Np is a random variable independent of the Xi
s taken from a geometric distribution with parameter p.[2] In other words:

Pr(N_p = n) = (1 − p)^{n−1} p.
The distribution is strictly geometric stable only if the sum Y = a(X1 + X2 + · · · + XNp ) equals the distribution of the
Xi s for some a.[1]
There is also a relationship between the stable distribution characteristic function and the geometric stable distribution
characteristic function. The stable distribution has a characteristic function of the form:

Φ(t; α, β, λ, µ) = exp[ itµ − |λt|^α (1 − iβ sign(t) Ω) ],


where

Ω = tan(πα/2) if α ≠ 1, and Ω = −(2/π) log|t| if α = 1.

The geometric stable characteristic function can be expressed in terms of a stable characteristic function as:[8]

φ(t; α, β, λ, µ) = [1 − log(Φ(t; α, β, λ, µ))]^{−1}.
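The geometric-summation property can be seen numerically in the symmetric finite-variance case (α = 2), where the limit is a Laplace law; a sketch assuming NumPy and SciPy (illustrative p and sample size):

    import numpy as np
    from scipy.stats import laplace, kstest

    rng = np.random.default_rng(6)
    p, n = 0.001, 20_000

    # Y = sqrt(p) * (X_1 + ... + X_Np), with Np ~ Geometric(p) and X_i ~ N(0, 1).
    counts = rng.geometric(p, size=n)
    y = np.array([np.sqrt(p) * rng.normal(size=k).sum() for k in counts])

    # For small p, Y is approximately Laplace with scale 1/sqrt(2) (unit variance);
    # compare against that law with a Kolmogorov-Smirnov test.
    print(kstest(y, laplace(scale=1 / np.sqrt(2)).cdf))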

2.25.3 References
[1] Rachev, S. & Mittnik, S. (2000). Stable Paretian Models in Finance. Wiley. pp. 34–36. ISBN 978-0-471-95314-2.
[2] Trindade, A.A.; Zhu, Y. & Andrews, B. (May 18, 2009). “Time Series Models With Asymmetric Laplace Innovations” (PDF).
pp. 1–3. Retrieved 2011-02-27.
[3] Meerschaert, M. & Scheffler, H. “Limit Theorems for Continuous Time Random Walks” (PDF). p. 15. Retrieved 2011-02-27.
[4] Kozubowski, T.; Podgorski, K. & Samorodnitsky, G. “Tails of Lévy Measure of Geometric Stable Random Variables” (PDF).
pp. 1–3. Retrieved 2011-02-27.
[5] Kotz, S.; Kozubowski, T. & Podgórski, K. (2001). The Laplace distribution and generalizations. Birkhäuser. pp. 199–200.
ISBN 978-0-8176-4166-5.
[6] Kozubowski, T. (2006). “A Note on Certain Stability and Limiting Properties of ν-infinitely divisible distribution” (PDF). Int.
J. Contemp. Math. Sci. 1 (4): 159. Retrieved 2011-02-27.
[7] Burnecki, K.; Janczura, J.; Magdziarz, M. & Weron, A. (2008). “Can One See a Competition Between Subdiffusion and Lévy
Flights? A Care of Geometric Stable Noise” (PDF). Acta Physica Polonica B 39 (8): 1048. Retrieved 2011-02-27.
[8] “Geometric Stable Laws Through Series Representations” (PDF). Serdica Mathematical Journal 25: 243. 1999. Retrieved
2011-02-28.

2.26 Nakagami distribution


The Nakagami distribution or the Nakagami-m distribution is a probability distribution related to the gamma distri-
bution. It has two parameters: a shape parameter m and a second parameter controlling spread, Ω .

2.26.1 Characterization
Its probability density function (pdf) is[1]

f(x; m, Ω) = (2m^m / (Γ(m) Ω^m)) x^{2m−1} exp(−(m/Ω) x²).

Its cumulative distribution function is[1]

F(x; m, Ω) = P(m, (m/Ω) x²),

where P is the incomplete gamma function (regularized).
Differential equation

x Ω f′(x) + f(x) (2m x² − 2mΩ + Ω) = 0,    f(1) = 2 m^m e^{−m/Ω} Ω^{−m} / Γ(m).

2.26.2 Parameter estimation


The parameters m and Ω are[2]

m = (E[X²])² / Var[X²],

and

Ω = E[X²].

An alternative way of fitting the distribution is to re-parametrize Ω and m as σ = Ω/m and m.[3] Then, by taking the derivative of the log-likelihood with respect to each of the new parameters, the following equations are obtained, and these can be solved using the Newton–Raphson method:

Γ(m) = \overline{x^{2m}} / σ^m,

and

σ = \overline{x^2} / m,

where the bar denotes a sample average.
It is reported[3] that modelling data with the Nakagami distribution and estimating its parameters by the above method gives better performance in the low-data regime than moment-based methods.

2.26.3 Generation
The Nakagami distribution is related to the gamma distribution. In particular, given a random variable Y ∼ Gamma(k, θ)
, it is possible to obtain a random variable X ∼ Nakagami(m, Ω) , by setting k = m , θ = Ω/m , and taking the square
root of Y :
2.27. PARETO DISTRIBUTION 167


X = √Y.

The Nakagami distribution f(y; m, Ω) can be generated from the chi distribution with parameter k set to 2m, followed by a scaling transformation of random variables. That is, a Nakagami random variable X is generated by a simple scaling transformation of a chi-distributed random variable Y ∼ χ(2m), as follows:

X = √(Ω/(2m)) · Y.
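A sketch of the gamma-based generator, with a moment-based re-estimation of the parameters as in the section above (assuming NumPy; illustrative parameter values):

    import numpy as np

    rng = np.random.default_rng(7)
    m, omega = 2.5, 4.0

    # Nakagami(m, Omega) as the square root of Gamma(k = m, theta = Omega/m).
    x = np.sqrt(rng.gamma(shape=m, scale=omega / m, size=1_000_000))

    # Moment-based estimates: Omega = E[X^2], m = E^2[X^2] / Var[X^2].
    x2 = x**2
    omega_hat = x2.mean()
    m_hat = omega_hat**2 / x2.var()
    print(m_hat, omega_hat)   # should be close to (2.5, 4.0)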

2.26.4 History and applications

The Nakagami distribution is relatively new, being first proposed in 1960.[4] It has been used to model attenuation of
wireless signals traversing multiple paths.[5]

2.26.5 References
[1] Laurenson, Dave (1994). “Nakagami Distribution”. Indoor Radio Channel Propagation Modelling by Ray Tracing Techniques.
Retrieved 2007-08-04.

[2] R. Kolar, R. Jirik, J. Jan (2004) “Estimator Comparison of the Nakagami-m Parameter and Its Application in Echocardiogra-
phy”, Radioengineering, 13 (1), 8–12

[3] Mitra, Rangeet; Mishra, Amit Kumar; Choubisa, Tarun (2012). “Maximum Likelihood Estimate of Parameters of Nakagami-m
Distribution”. International Conference on Communications, Devices and Intelligent Systems (CODIS), 2012: 9-12.

[4] Nakagami, M. (1960) “The m-Distribution, a general formula of intensity of rapid fading”. In William C. Hoffman, editor,
Statistical Methods in Radio Wave Propagation: Proceedings of a Symposium held June 18-20, 1958, pp 3-36. Pergamon Press.

[5] Parsons, J. D. (1992) The Mobile Radio Propagation Channel. New York: Wiley.

2.27 Pareto distribution


The Pareto distribution, named after the Italian civil engineer, economist, and sociologist Vilfredo Pareto, is a power-law probability distribution that is used in the description of social, scientific, geophysical, actuarial, and many other types of observable phenomena.

2.27.1 Definition

If X is a random variable with a Pareto (Type I) distribution,[1] then the probability that X is greater than some number
x, i.e. the survival function (also called tail function), is given by

Pr(X > x) = (x_m/x)^α for x ≥ x_m, and 1 for x < x_m,

where x_m is the (necessarily positive) minimum possible value of X, and α is a positive parameter. The Pareto Type I distribution is characterized by a scale parameter x_m and a shape parameter α, which is known as the tail index. When this distribution is used to model the distribution of wealth, the parameter α is called the Pareto index.

2.27.2 Properties

Cumulative distribution function

From the definition, the cumulative distribution function of a Pareto random variable with parameters α and x_m is

F_X(x) = 1 − (x_m/x)^α for x ≥ x_m, and 0 for x < x_m.

When plotted on linear axes, the distribution assumes the familiar J-shaped curve which approaches each of the orthogonal
axes asymptotically. All segments of the curve are self-similar (subject to appropriate scaling factors). When plotted in
a log-log plot, the distribution is represented by a straight line.

Probability density function

It follows (by differentiation) that the probability density function is

f_X(x) = α x_m^α / x^{α+1} for x ≥ x_m, and 0 for x < x_m.

Moments and characteristic function

• The expected value of a random variable following a Pareto distribution is

E(X) = ∞ for α ≤ 1, and E(X) = α x_m/(α − 1) for α > 1.

• The variance of a random variable following a Pareto distribution is


Var(X) = ∞ for α ∈ (1, 2], and Var(X) = (x_m/(α − 1))² · α/(α − 2) for α > 2.

(If α ≤ 1, the variance does not exist.)

• The raw moments are

µ′_n = ∞ for α ≤ n, and µ′_n = α x_m^n/(α − n) for α > n.

• The moment generating function is only defined for non-positive values t ≤ 0 as



M(t; α, x_m) = E[e^{tX}] = α(−x_m t)^α Γ(−α, −x_m t),

M(0; α, x_m) = 1.

• The characteristic function is given by

φ(t; α, x_m) = α(−i x_m t)^α Γ(−α, −i x_m t),

where Γ(a, x) is the incomplete gamma function.

Conditional distributions

The conditional probability distribution of a Pareto-distributed random variable, given the event that it is greater than or
equal to a particular number x1 exceeding xm , is a Pareto distribution with the same Pareto index α but with minimum
x1 instead of xm .

A characterization theorem

Suppose X1 , X2 , X3 , . . . are independent identically distributed random variables whose probability distribution is sup-
ported on the interval [xm , ∞) for some xm > 0 . Suppose that for all n , the two random variables min{X1 , . . . , Xn }
and (X1 + · · · + Xn )/ min{X1 , . . . , Xn } are independent. Then the common distribution is a Pareto distribution.

Geometric mean

The geometric mean (G) is[2]

G = x_m exp(1/α).

Harmonic mean

The harmonic mean (H) is[2]

H = x_m (1 + 1/α).

2.27.3 Generalized Pareto distributions

See also: Generalized Pareto distribution

There is a hierarchy [1][3] of Pareto distributions known as Pareto Type I, II, III, IV, and Feller–Pareto distributions.[1][3][4]
Pareto Type IV contains Pareto Type I–III as special cases. The Feller–Pareto[3][5] distribution generalizes Pareto Type
IV.
170 CHAPTER 2. CONTINUOUS DISTRIBUTIONS - SUPPORTED ON SEMI-INFINITE INTERVALS, USUALLY [0,∞)

Pareto types I–IV

The Pareto distribution hierarchy is summarized in the next table comparing the survival functions (complementary CDF).
When μ = 0, the Pareto distribution Type II is also known as the Lomax distribution.[6]
In this section, the symbol x_m, used before to indicate the minimum value of x, is replaced by σ.
The shape parameter α is the tail index, μ is location, σ is scale, γ is an inequality parameter. Some special cases of Pareto
Type (IV) are

P (IV )(σ, σ, 1, α) = P (I)(σ, α),

P (IV )(µ, σ, 1, α) = P (II)(µ, σ, α),

P (IV )(µ, σ, γ, 1) = P (III)(µ, σ, γ).

The finiteness of the mean, and the existence and the finiteness of the variance depend on the tail index α (inequality index
γ). In particular, fractional δ-moments are finite for some δ > 0, as shown in the table below, where δ is not necessarily
an integer.

Feller–Pareto distribution

Feller[3][5] defines a Pareto variable by the transformation U = Y^{−1} − 1 of a beta random variable Y, whose probability density function is
function is

f(y) = y^{γ₁−1} (1 − y)^{γ₂−1} / B(γ₁, γ₂),    0 < y < 1;  γ₁, γ₂ > 0,

where B( ) is the beta function. If

W = µ + σ(Y −1 − 1)γ , σ > 0, γ > 0,

then W has a Feller–Pareto distribution FP(μ, σ, γ, γ1 , γ2 ).[1]


If U1 ∼ Γ(δ1 , 1) and U2 ∼ Γ(δ2 , 1) are independent Gamma variables, another construction of a Feller–Pareto (FP)
variable is[7]

W = µ + σ (U₁/U₂)^γ,

and we write W ~ FP(μ, σ, γ, δ1 , δ2 ). Special cases of the Feller–Pareto distribution are

F P (σ, σ, 1, 1, α) = P (I)(σ, α)

F P (µ, σ, 1, 1, α) = P (II)(µ, σ, α)

F P (µ, σ, γ, 1, 1) = P (III)(µ, σ, γ)

F P (µ, σ, γ, 1, α) = P (IV )(µ, σ, γ, α).



2.27.4 Applications
Pareto originally used this distribution to describe the allocation of wealth among individuals since it seemed to show
rather well the way that a larger portion of the wealth of any society is owned by a smaller percentage of the people in
that society. He also used it to describe distribution of income.[8] This idea is sometimes expressed more simply as the
Pareto principle or the “80-20 rule” which says that 20% of the population controls 80% of the wealth.[9] However, the
80-20 rule corresponds to a particular value of α, and in fact, Pareto’s data on British income taxes in his Cours d'économie
politique indicates that about 30% of the population had about 70% of the income. The probability density function (PDF)
graph at the beginning of this article shows that the “probability” or fraction of the population that owns a small amount
of wealth per person is rather high, and then decreases steadily as wealth increases. (Note that the Pareto distribution
is not realistic for wealth for the lower end. In fact, net worth may even be negative.) This distribution is not limited to
describing wealth or income, but to many situations in which an equilibrium is found in the distribution of the “small” to
the “large”. The following examples are sometimes seen as approximately Pareto-distributed:

• The sizes of human settlements (few cities, many hamlets/villages)[10]


• File size distribution of Internet traffic which uses the TCP protocol (many smaller files, few larger ones)[10]
• Hard disk drive error rates[11]
• Clusters of Bose–Einstein condensate near absolute zero[12]
• The values of oil reserves in oil fields (a few large fields, many small fields)[10]
• The length distribution of jobs assigned to supercomputers (a few large ones, many small ones)
• The standardized price returns on individual stocks [10]

• Sizes of sand particles [10]


• Sizes of meteorites
• Numbers of species per genus (There is subjectivity involved: The tendency to divide a genus into two or more
increases with the number of species in it)
• Areas burnt in forest fires
• Severity of large casualty losses for certain lines of business such as general liability, commercial auto, and workers
compensation.[13][14]
• In hydrology the Pareto distribution is applied to extreme events such as annually maximum one-day rainfalls
and river discharges. The blue picture illustrates an example of fitting the Pareto distribution to ranked annually
maximum one-day rainfalls showing also the 90% confidence belt based on the binomial distribution. The rainfall
data are represented by plotting positions as part of the cumulative frequency analysis.

2.27.5 Relation to other distributions


Relation to the exponential distribution

The Pareto distribution is related to the exponential distribution as follows. If X is Pareto-distributed with minimum x_m and index α, then

Y = log(X/x_m)
is exponentially distributed with rate parameter α. Equivalently, if Y is exponentially distributed with rate α, then

x_m e^Y

is Pareto-distributed with minimum x_m and index α.


This can be shown using the standard change of variable techniques:

Pr(Y < y) = Pr(log(X/x_m) < y) = Pr(X < x_m e^y) = 1 − (x_m/(x_m e^y))^α = 1 − e^{−αy}.

The last expression is the cumulative distribution function of an exponential distribution with rate α.

Relation to the log-normal distribution

Note that the Pareto distribution and log-normal distribution are alternative distributions for describing the same types
of quantities. One of the connections between the two is that they are both the distributions of the exponential of ran-
dom variables distributed according to other common distributions, respectively the exponential distribution and normal
distribution.

Relation to the generalized Pareto distribution

The Pareto distribution is a special case of the generalized Pareto distribution, which is a family of distributions of similar
form, but containing an extra parameter in such a way that the support of the distribution is either bounded below (at a
variable point), or bounded both above and below (where both are variable), with the Lomax distribution as a special
case. This family also contains both the unshifted and shifted exponential distributions.
The Pareto distribution with scale xm and shape α is equivalent to the generalized Pareto distribution with location
µ = xm , scale σ = xm /α and shape ξ = 1/α . Vice versa one can get the Pareto distribution from the GPD by
xm = σ/ξ and α = 1/ξ .

Relation to Zipf’s law

Pareto distributions are continuous probability distributions. Zipf’s law, also sometimes called the zeta distribution, may
be thought of as a discrete counterpart of the Pareto distribution.

Relation to the “Pareto principle”

The "80-20 law", according to which 20% of all people receive 80% of all income, and 20% of the most affluent 20%
receive 80% of that 80%, and so on, holds precisely when the Pareto index is α = log4 (5) = log(5)/log(4), approximately
1.161. This result can be derived from the Lorenz curve formula given below. Moreover, the following have been
shown[15] to be mathematically equivalent:

• Income is distributed according to a Pareto distribution with index α > 1.

• There is some number 0 ≤ p ≤ 1/2 such that 100p % of all people receive 100(1 − p) % of all income, and similarly
for every real (not necessarily integer) n > 0, 100pn % of all people receive 100(1 − p)n percentage of all income.

This does not apply only to income, but also to wealth, or to anything else that can be modeled by this distribution.
This excludes Pareto distributions in which 0 < α ≤ 1, which, as noted above, have infinite expected value, and so cannot
reasonably model income distribution.
2.27. PARETO DISTRIBUTION 173

2.27.6 Lorenz curve and Gini coefficient


The Lorenz curve is often used to characterize income and wealth distributions. For any distribution, the Lorenz curve
L(F) is written in terms of the PDF f or the CDF F as

L(F) = ∫_{x_m}^{x(F)} x f(x) dx / ∫_{x_m}^∞ x f(x) dx = ∫_0^F x(F′) dF′ / ∫_0^1 x(F′) dF′,

where x(F) is the inverse of the CDF. For the Pareto distribution,

x(F) = x_m / (1 − F)^{1/α},
and the Lorenz curve is calculated to be

L(F) = 1 − (1 − F)^{1 − 1/α}.

Although the numerator and denominator in the expression for L(F ) diverge for 0 ≤ α < 1 , their ratio does not,
yielding L=0 in these cases, which yields a Gini coefficient of unity. Examples of the Lorenz curve for a number of
Pareto distributions are shown in the graph on the right.
The Gini coefficient is a measure of the deviation of the Lorenz curve from the equidistribution line which is a line
connecting [0, 0] and [1, 1], which is shown in black (α = ∞) in the Lorenz plot on the right. Specifically, the Gini
coefficient is twice the area between the Lorenz curve and the equidistribution line. The Gini coefficient for the Pareto
distribution is then calculated (for α ≥ 1 ) to be

G = 1 − 2 ∫_0^1 L(F) dF = 1/(2α − 1)
(see Aaberge 2005).
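A short numerical illustration of the Lorenz curve and Gini coefficient for the "80-20" index α ≈ 1.161, assuming NumPy:

    import numpy as np

    alpha = np.log(5) / np.log(4)    # approximately 1.161, the "80-20" Pareto index

    F = np.linspace(0.0, 1.0, 1_000_001)
    L = 1.0 - (1.0 - F) ** (1.0 - 1.0 / alpha)   # Lorenz curve of the Pareto law

    gini_numeric = 1.0 - 2.0 * np.mean(L)        # np.mean(L) approximates the integral of L dF
    gini_closed = 1.0 / (2.0 * alpha - 1.0)
    print(gini_numeric, gini_closed)             # both approximately 0.756

    # 80-20 check: the share held by the top 20% is 1 - L(0.8), approximately 0.80.
    print(1.0 - (1.0 - (1.0 - 0.8) ** (1.0 - 1.0 / alpha)))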

2.27.7 Parameter estimation


The likelihood function for the Pareto distribution parameters α and x_m, given a sample x = (x₁, x₂, ..., x_n), is


L(α, x_m) = ∏_{i=1}^n α x_m^α / x_i^{α+1} = α^n x_m^{nα} ∏_{i=1}^n 1/x_i^{α+1}.

Therefore, the logarithmic likelihood function is


ℓ(α, x_m) = n ln α + n α ln x_m − (α + 1) ∑_{i=1}^n ln x_i.

It can be seen that ℓ(α, x_m) is monotonically increasing with x_m, that is, the greater the value of x_m, the greater the value of the likelihood function. Hence, since x_i ≥ x_m, we conclude that

x̂_m = min_i x_i.

To find the estimator for α, we compute the corresponding partial derivative and determine where it is zero:

∂ℓ n ∑n
= + n ln xm − ln xi = 0.
∂α α i=1

Thus the maximum likelihood estimator for α is:

α̂ = n / ∑_i (ln x_i − ln x̂_m).

The expected statistical error is:[16]

σ = α̂ / √n.
Malik (1970)[17] gives the exact joint distribution of (x̂_m, α̂). In particular, x̂_m and α̂ are independent, x̂_m is Pareto with scale parameter x_m and shape parameter nα, and α̂ has an inverse-gamma distribution with shape and scale parameters n − 1 and nα, respectively.
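A compact sketch of these estimators, assuming NumPy and SciPy (illustrative true parameters):

    import numpy as np
    from scipy.stats import pareto

    rng = np.random.default_rng(8)
    alpha_true, xm_true = 2.5, 1.2

    x = pareto.rvs(b=alpha_true, scale=xm_true, size=100_000, random_state=rng)

    xm_hat = x.min()                                          # MLE of the scale
    alpha_hat = len(x) / np.sum(np.log(x) - np.log(xm_hat))   # MLE of the shape
    se = alpha_hat / np.sqrt(len(x))                          # expected statistical error

    print(xm_hat, alpha_hat, se)   # close to (1.2, 2.5) with a small standard error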

2.27.8 Graphical representation


The characteristic curved 'long tail' of the distribution, when plotted on a linear scale, masks the underlying simplicity of the function, which on a log-log graph takes the form of a straight line with negative gradient: it follows from the formula for the probability density function that for x ≥ x_m,

log f_X(x) = log(α x_m^α / x^{α+1}) = log(α x_m^α) − (α + 1) log x.
Since α is positive, the gradient −(α+1) is negative.

2.27.9 Random sample generation


Random samples can be generated using inverse transform sampling. Given a random variate U drawn from the uniform
distribution on the unit interval (0, 1], the variate T given by

T = x_m / U^{1/α}
is Pareto-distributed.[18] If U is uniformly distributed on [0, 1), it can be exchanged with (1 − U).
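A sketch of this inverse-transform generator with a distributional check, assuming NumPy and SciPy (illustrative parameters):

    import numpy as np
    from scipy.stats import pareto, kstest

    rng = np.random.default_rng(9)
    xm, alpha = 3.0, 1.7

    u = rng.uniform(size=200_000)     # U on (0, 1); hitting exactly 0 has probability 0
    t = xm / u ** (1.0 / alpha)       # T = x_m / U^(1/alpha)

    # Compare against SciPy's Pareto (shape b = alpha, scale = x_m).
    print(kstest(t, pareto(b=alpha, scale=xm).cdf))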

2.27.10 Variants
Bounded Pareto distribution

See also: Truncated distribution

The bounded (or truncated) Pareto distribution has three parameters α, L and H. As in the standard Pareto distribution α
determines the shape. L denotes the minimal value, and H denotes the maximal value. (The variance in the table on the
right should be interpreted as the second moment).
2.27. PARETO DISTRIBUTION 175

The probability density function is

p(x) = α L^α x^{−α−1} / (1 − (L/H)^α),

where L ≤ x ≤ H, and α > 0.

Generating bounded Pareto random variables  If U is uniformly distributed on (0, 1), then applying the inverse-transform method[19]

U = (1 − L^α x^{−α}) / (1 − (L/H)^α),

x = ( −(U H^α − U L^α − H^α) / (H^α L^α) )^{−1/α}

is bounded Pareto-distributed.
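A sketch of this generator, assuming NumPy (illustrative bounds and shape):

    import numpy as np

    rng = np.random.default_rng(10)
    L, H, alpha = 1.0, 100.0, 1.5    # lower bound, upper bound, shape

    u = rng.uniform(size=1_000_000)
    # Inverse-transform formula from above.
    x = (-(u * H**alpha - u * L**alpha - H**alpha) / (H**alpha * L**alpha)) ** (-1.0 / alpha)

    print(x.min(), x.max())          # every sample falls inside [L, H]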

Symmetric Pareto distribution

The symmetric Pareto distribution can be defined by the probability density function:[20]

f(x; α, x_m) = (1/2) α x_m^α |x|^{−α−1} for |x| > x_m, and 0 otherwise.

It has a similar shape to a Pareto distribution for x > x_m and is mirror-symmetric about the vertical axis.

2.27.11 See also


• Bradford’s law

• Pareto analysis

• Pareto efficiency

• Pareto interpolation

• Power law probability distributions

• Traffic generation model

2.27.12 Notes
[1] Barry C. Arnold (1983). Pareto Distributions. International Co-operative Publishing House. ISBN 0-89974-012-X.

[2] Johnson NL, Kotz S, Balakrishnan N (1994) Continuous univariate distributions Vol 1. Wiley Series in Probability and Statistics.

[3] Johnson, Kotz, and Balakrishnan (1994), (20.4).

[4] Christian Kleiber and Samuel Kotz (2003). Statistical Size Distributions in Economics and Actuarial Sciences. Wiley. ISBN
0-471-15064-9.
176 CHAPTER 2. CONTINUOUS DISTRIBUTIONS - SUPPORTED ON SEMI-INFINITE INTERVALS, USUALLY [0,∞)

[5] Feller, W. (1971). An Introduction to Probability Theory and its Applications II (2nd ed.). New York: Wiley. p. 50. “The den-
sities (4.3) are sometimes called after the economist Pareto. It was thought (rather naïvely from a modern statistical standpoint)
that income distributions should have a tail with a density ~ Ax−α as x → ∞.”

[6] Lomax, K. S. (1954). Business failures. Another example of the analysis of failure data.Journal of the American Statistical
Association, 49, 847–852.

[7] Chotikapanich, Duangkamon. “Chapter 7: Pareto and Generalized Pareto Distributions”. Modeling Income Distributions and
Lorenz Curves. pp. 121–122.

[8] Pareto, Vilfredo, Cours d'Économie Politique: Nouvelle édition par G.-H. Bousquet et G. Busino, Librairie Droz, Geneva, 1964,
pages 299–345.

[9] For a two-quantile population, where approximately 18% of the population owns 82% of the wealth, the Theil index takes the
value 1.

[10] Reed, William J.; et al. (2004). “The Double Pareto-Lognormal Distribution – A New Parametric Model for Size Distributions”. Communications in Statistics – Theory and Methods 33 (8): 1733–1753. doi:10.1081/sta-120037438. CiteSeerX: 10.1.1.70.4555.

[11] Schroeder, Bianca; Damouras, Sotirios; Gill, Phillipa (2010-02-24). “Understanding latent sector error and how to protect
against them” (PDF). 8th Usenix Conference on File and Storage Technologies (FAST 2010). Retrieved 2010-09-10. We exper-
imented with 5 different distributions (Geometric,Weibull, Rayleigh, Pareto, and Lognormal), that are commonly used in the
context of system reliability, and evaluated their fit through the total squared differences between the actual and hypothesized
frequencies (χ2 statistic). We found consistently across all models that the geometric distribution is a poor fit, while the Pareto
distribution provides the best fit.

[12] Yuji Ijiri; Simon, Herbert A. (May 1975). “Some Distributions Associated with Bose–Einstein Statistics”. Proc. Nat. Acad.
Sci. USA 72 (5): 1654–1657. PMC 432601. PMID 16578724. Retrieved 24 January 2013.

[13] Kleiber and Kotz (2003): page 94.

[14] Seal, H. (1980). “Survival probabilities based on Pareto claim distributions”. ASTIN Bulletin 11: 61–71.

[15] Hardy, Michael (2010). “Pareto’s Law”. Mathematical Intelligencer 32 (3): 38–43. doi:10.1007/s00283-010-9159-2.

[16] M. E. J. Newman (2005). “Power laws, Pareto distributions and Zipf’s law”. Contemporary Physics 46 (5): 323–351. arXiv:cond-
mat/0412004. Bibcode:2005ConPh..46..323N. doi:10.1080/00107510500052444.

[17] H. J. Malik (1970). “Estimation of the Parameters of the Pareto Distribution”. Metrika 15.

[18] Tanizaki, Hisashi (2004). Computational Methods in Statistics and Econometrics. CRC Press. p. 133.

[19] http://www.cs.bgu.ac.il/~mps042/invtransnote.htm

[20] Grabchak, M. & Samorodnitsky, D. “Do Financial Returns Have Finite or Infinite Variance? A Paradox and an Explanation”
(PDF). pp. 7–8.

2.27.13 References
• M. O. Lorenz (1905). “Methods of measuring the concentration of wealth”. Publications of the American Statistical
Association 9 (70): 209–219. Bibcode:1905PAmSA...9..209L. doi:10.2307/2276207.

• Pareto V (1965) “La Courbe de la Repartition de la Richesse” (Originally published in 1896). In: Busino G, editor.
Oevres Completes de Vilfredo Pareto. Geneva: Librairie Droz. pp. 1–5.

• Pareto, V. (1895). La legge della domanda. Giornale degli Economisti, 10, 59–68. English translation in Rivista di
Politica Economica, 87 (1997), 691–700.

• Pareto, V. (1897). Cours d'économie politique. Lausanne: Ed. Rouge.



2.27.14 External links


• Gini’s Nuclear Family / Rolf Aabergé. – In: International Conference to Honor Two Eminent Social Scientists,
May, 2005 – PDF

• Hazewinkel, Michiel, ed. (2001), “Pareto distribution”, Encyclopedia of Mathematics, Springer, ISBN 978-1-
55608-010-4

• syntraf1.c is a C program to generate synthetic packet traffic with bounded Pareto burst size and exponential in-
terburst time.

• “Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes” /Mark E. Crovella and Azer Bestavros

• Weisstein, Eric W., “Pareto distribution”, MathWorld.

2.28 Pearson distribution


The Pearson distribution is a family of continuous probability distributions. It was first published by Karl Pearson in
1895 and subsequently extended by him in 1901 and 1916 in a series of articles on biostatistics.

2.28.1 History
The Pearson system was originally devised in an effort to model visibly skewed observations. It was well known at the time
how to adjust a theoretical model to fit the first two cumulants or moments of observed data: Any probability distribution
can be extended straightforwardly to form a location-scale family. Except in pathological cases, a location-scale family
can be made to fit the observed mean (first cumulant) and variance (second cumulant) arbitrarily well. However, it was
not known how to construct probability distributions in which the skewness (standardized third cumulant) and kurtosis
(standardized fourth cumulant) could be adjusted equally freely. This need became apparent when trying to fit known
theoretical models to observed data that exhibited skewness. Pearson’s examples include survival data, which are usually
asymmetric.
In his original paper, Pearson (1895, p. 360) identified four types of distributions (numbered I through IV) in addition to
the normal distribution (which was originally known as type V). The classification depended on whether the distributions
were supported on a bounded interval, on a half-line, or on the whole real line; and whether they were potentially skewed or
necessarily symmetric. A second paper (Pearson 1901) fixed two omissions: it redefined the type V distribution (originally
just the normal distribution, but now the inverse-gamma distribution) and introduced the type VI distribution. Together
the first two papers cover the five main types of the Pearson system (I, III, IV, V, and VI). In a third paper, Pearson (1916)
introduced further special cases and subtypes (VII through XII).
Rhind (1909, pp. 430–432) devised a simple way of visualizing the parameter space of the Pearson system, which was
subsequently adopted by Pearson (1916, plate 1 and pp. 430ff., 448ff.). The Pearson types are characterized by two
quantities, commonly referred to as β1 and β2 . The first is the square of the skewness: β1 = γ12 where γ1 is the skewness,
or third standardized moment. The second is the traditional kurtosis, or fourth standardized moment: β2 = γ2 + 3.
(Modern treatments define kurtosis γ2 in terms of cumulants instead of moments, so that for a normal distribution we
have γ2 = 0 and β2 = 3. Here we follow the historical precedent and use β2 .) The diagram on the right shows which
Pearson type a given concrete distribution (identified by a point (β1 , β2 )) belongs to.
Many of the skewed and/or non-mesokurtic distributions familiar to us today were still unknown in the early 1890s.
What is now known as the beta distribution had been used by Thomas Bayes as a posterior distribution of the parameter
of a Bernoulli distribution in his 1763 work on inverse probability. The Beta distribution gained prominence due to its
membership in Pearson’s system and was known until the 1940s as the Pearson type I distribution.[1] (Pearson’s type
II distribution is a special case of type I, but is usually no longer singled out.) The gamma distribution originated from
Pearson’s work (Pearson 1893, p. 331; Pearson 1895, pp. 357, 360, 373–376) and was known as the Pearson type III
distribution, before acquiring its modern name in the 1930s and 1940s.[2] Pearson’s 1895 paper introduced the type IV

distribution, which contains Student’s t-distribution as a special case, predating William Sealy Gosset's subsequent use by
several years. His 1901 paper introduced the inverse-gamma distribution (type V) and the beta prime distribution (type
VI).

2.28.2 Definition
A Pearson density p is defined to be any valid solution to the differential equation (cf. Pearson 1895, p. 381)

p′(x)/p(x) + (a + x − λ) / (b₂(x − λ)² + b₁(x − λ) + b₀) = 0.    (1)

with:

b₀ = µ₂ (4β₂ − 3β₁) / (10β₂ − 12β₁ − 18),

a = b₁ = √µ₂ √β₁ (β₂ + 3) / (10β₂ − 12β₁ − 18),

b₂ = (2β₂ − 3β₁ − 6) / (10β₂ − 12β₁ − 18).
According to Ord,[3] Pearson devised the underlying form of Equation (1) on the basis of, firstly, the formula for the
derivative of the logarithm of the density function of the normal distribution (which gives a linear function) and, secondly,
from a recurrence relation for values in the probability mass function of the hypergeometric distribution (which yields the
linear-divided-by-quadratic structure).
In Equation (1), the parameter a determines a stationary point, and hence under some conditions a mode of the distribution,
since

p′ (λ − a) = 0

follows directly from the differential equation.


Since we are confronted with a first order linear differential equation with variable coefficients, its solution is straightfor-
ward:

p(x) ∝ exp( −∫ (x − a) / (b₂x² + b₁x + b₀) dx ).
The integral in this solution simplifies considerably when certain special cases of the integrand are considered. Pearson
(1895, p. 367) distinguished two main cases, determined by the sign of the discriminant (and hence the number of real
roots) of the quadratic function

f(x) = b₂x² + b₁x + b₀.    (2)
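A small sketch of this classification criterion computed from sample moments, assuming NumPy (the Student's t test sample is an arbitrary illustration; its β₁ = 0 puts it in the negative-discriminant case):

    import numpy as np

    def pearson_coeffs(mu2, beta1, beta2):
        # Coefficients of the quadratic in Pearson's differential equation (1).
        denom = 10*beta2 - 12*beta1 - 18
        b0 = mu2 * (4*beta2 - 3*beta1) / denom
        b1 = np.sqrt(mu2) * np.sqrt(beta1) * (beta2 + 3) / denom
        b2 = (2*beta2 - 3*beta1 - 6) / denom
        return b0, b1, b2

    rng = np.random.default_rng(11)
    x = rng.standard_t(df=7, size=500_000)

    c = x - x.mean()
    mu2 = np.mean(c**2)
    beta1 = np.mean(c**3)**2 / mu2**3    # squared skewness
    beta2 = np.mean(c**4) / mu2**2       # kurtosis

    b0, b1, b2 = pearson_coeffs(mu2, beta1, beta2)
    print(b1**2 - 4*b2*b0)   # negative: the "trigonometrical" (type IV) case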

2.28.3 Particular types of distribution


Case 1, negative discriminant: The Pearson type IV distribution

If the discriminant of the quadratic function (2) is negative (b₁² − 4b₂b₀ < 0), it has no real roots. Then define

y = x + b₁/(2b₂) and

α = √(4b₂b₀ − b₁²) / (2b₂).

Observe that α is a well-defined real number and α ≠ 0, because by assumption 4b2 b0 − b21 > 0 and therefore b2 ≠ 0.
Applying these substitutions, the quadratic function (2) is transformed into

f (x) = b2 (y 2 + α2 ).

The absence of real roots is obvious from this formulation, because α2 is necessarily positive.
We now express the solution to the differential equation (1) as a function of y:

p(y) ∝ exp( −(1/b₂) ∫ (y − b₁/(2b₂) − a) / (y² + α²) dy ).

Pearson (1895, p. 362) called this the “trigonometrical case”, because the integral

∫ (y − (2b₂a + b₁)/(2b₂)) / (y² + α²) dy = (1/2) ln(y² + α²) − ((2b₂a + b₁)/(2b₂α)) arctan(y/α) + C₀

involves the inverse trigonometric arctan function. Then

p(y) ∝ exp( −(1/(2b₂)) ln(1 + y²/α²) − (ln α)/b₂ + ((2b₂a + b₁)/(2b₂²α)) arctan(y/α) + C₁ ).

Finally, let

m = 1/(2b₂) and

ν = −(2b₂a + b₁)/(2b₂²α).

Applying these substitutions, we obtain the parametric function:

p(y) ∝ [1 + y²/α²]^{−m} exp(−ν arctan(y/α)).

This unnormalized density has support on the entire real line. It depends on a scale parameter α > 0 and shape parameters
m > 1/2 and ν. One parameter was lost when we chose to find the solution to the differential equation (1) as a function
of y rather than x. We therefore reintroduce a fourth parameter, namely the location parameter λ. We have thus derived
the density of the Pearson type IV distribution:


p(x) = |Γ(m + (ν/2)i)/Γ(m)|² / (α B(m − 1/2, 1/2)) · [1 + ((x − λ)/α)²]^{−m} exp(−ν arctan((x − λ)/α)).

The normalizing constant involves the complex Gamma function (Γ) and the Beta function (B).

The Pearson type VII distribution The shape parameter ν of the Pearson type IV distribution controls its skewness.
If we fix its value at zero, we obtain a symmetric three-parameter family. This special case is known as the Pearson type
VII distribution (cf. Pearson 1916, p. 450). Its density is

p(x) = 1/(α B(m − 1/2, 1/2)) · [1 + ((x − λ)/α)²]^{−m},

where B is the Beta function.


An alternative parameterization (and slight specialization) of the type VII distribution is obtained by letting


α = σ √(2m − 3),
which requires m > 3/2. This entails a minor loss of generality but ensures that the variance of the distribution exists and
is equal to σ2 . Now the parameter m only controls the kurtosis of the distribution. If m approaches infinity as λ and σ are
held constant, the normal distribution arises as a special case:

lim_{m→∞} 1/(σ√(2m − 3) B(m − 1/2, 1/2)) [1 + ((x − λ)/(σ√(2m − 3)))²]^{−m}
  = 1/(σ√2 Γ(1/2)) × lim_{m→∞} Γ(m)/(Γ(m − 1/2) √(m − 3/2)) × lim_{m→∞} [1 + ((x − λ)/σ)² / (2m − 3)]^{−m}
  = 1/(σ√(2π)) × 1 × exp(−(1/2)((x − λ)/σ)²).
This is the density of a normal distribution with mean λ and standard deviation σ.
It is convenient to require that m > 5/2 and to let

m = 5/2 + 3/γ₂.
This is another specialization, and it guarantees that the first four moments of the distribution exist. More specifically,
the Pearson type VII distribution parameterized in terms of (λ, σ, γ2 ) has a mean of λ, standard deviation of σ, skewness
of zero, and excess kurtosis of γ2 .

Student’s t-distribution The Pearson type VII distribution is equivalent to the non-standardized Student’s t-distribution
with parameters ν > 0, μ, σ2 by applying the following substitutions to its original parameterization:

λ = µ,

α = √(νσ²), and

m = (ν + 1)/2.

Observe that the constraint m > 1/2 is satisfied.


The resulting density is

p(x | µ, σ², ν) = 1/(√(νσ²) B(ν/2, 1/2)) [1 + (x − µ)²/(νσ²)]^{−(ν+1)/2},

which is easily recognized as the density of a Student’s t-distribution.


Note also that this implies that the Pearson type VII distribution subsumes the standard Student’s t-distribution and also
the standard Cauchy distribution. In particular, the standard Student’s t-distribution arises as a subcase, when μ = 0 and
σ² = 1, equivalent to the following substitutions:

λ = 0,

α = √ν, and

m = (ν + 1)/2.

The density of this restricted one-parameter family is a standard Student’s t:

p(x) = 1/(√ν B(ν/2, 1/2)) (1 + x²/ν)^{−(ν+1)/2}.
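A numerical sketch of this equivalence, assuming NumPy and SciPy (arbitrary illustrative parameters):

    import numpy as np
    from scipy.special import beta as B
    from scipy.stats import t as student_t

    def pearson7_pdf(x, lam, alpha, m):
        # Pearson type VII density in its (lambda, alpha, m) parameterization.
        return (1.0 + ((x - lam) / alpha)**2) ** (-m) / (alpha * B(m - 0.5, 0.5))

    # Substituting lambda = mu, alpha = sqrt(nu*sigma^2), m = (nu + 1)/2 should
    # reproduce the non-standardized Student's t density.
    mu, sigma2, nu = 0.5, 2.0, 5.0
    x = np.linspace(-4.0, 6.0, 9)

    p7 = pearson7_pdf(x, mu, np.sqrt(nu * sigma2), (nu + 1) / 2)
    pt = student_t(df=nu, loc=mu, scale=np.sqrt(sigma2)).pdf(x)

    print(np.abs(p7 - pt).max())   # essentially zero, up to floating-point error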

Case 2, non-negative discriminant

If the quadratic function (2) has a non-negative discriminant (b₁² − 4b₂b₀ ≥ 0), it has real roots a₁ and a₂ (not necessarily distinct):

a₁ = (−b₁ − √(b₁² − 4b₂b₀)) / (2b₂),

a₂ = (−b₁ + √(b₁² − 4b₂b₀)) / (2b₂).
In the presence of real roots the quadratic function (2) can be written as

f (x) = b2 (x − a1 )(x − a2 ),

and the solution to the differential equation is therefore

p(x) ∝ exp( −(1/b₂) ∫ (x − a) / ((x − a₁)(x − a₂)) dx ).

Pearson (1895, p. 362) called this the “logarithmic case”, because the integral


∫ (x − a) / ((x − a₁)(x − a₂)) dx = ((a₁ − a) ln(x − a₁) − (a₂ − a) ln(x − a₂)) / (a₁ − a₂) + C

involves only the logarithm function, and not the arctan function as in the previous case.
Using the substitution

\nu = \frac{1}{b_2 (a_1 - a_2)}

we obtain the following solution to the differential equation (1):



p(x) \propto (x - a_1)^{-\nu(a_1 - a)} (x - a_2)^{\nu(a_2 - a)}.

Since this density is only known up to a hidden constant of proportionality, that constant can be changed and the density
written as follows:

p(x) \propto \left(1 - \frac{x}{a_1}\right)^{-\nu(a_1-a)} \left(1 - \frac{x}{a_2}\right)^{\nu(a_2-a)}

The Pearson type I distribution The Pearson type I distribution (a generalization of the beta distribution) arises
when the roots of the quadratic equation (2) are of opposite sign, that is, a1 < 0 < a2 . Then the solution p is supported
on the interval (a1 , a2 ) . Apply the substitution

x = a1 + y(a2 − a1 ) where 0 < y < 1,

which yields a solution in terms of y that is supported on the interval (0, 1):

p(y) \propto \left(\frac{a_1 - a_2}{a_1}\, y\right)^{(-a_1+a)\nu} \left(\frac{a_2 - a_1}{a_2}\,(1-y)\right)^{(a_2-a)\nu}.

One may define:

m_1 = \frac{a - a_1}{b_2 (a_1 - a_2)}

m_2 = \frac{a - a_2}{b_2 (a_2 - a_1)}
Regrouping constants and parameters, this simplifies to:

p(y) \propto y^{m_1} (1 - y)^{m_2},

Thus \frac{x - \lambda - a_1}{a_2 - a_1} follows a B(m_1 + 1,\, m_2 + 1) distribution with \lambda = \mu_1 - (a_2 - a_1)\,\frac{m_1 + 1}{m_1 + m_2 + 2} - a_1.
It turns out that m1 , m2 > −1 is necessary and sufficient for p to be a proper probability density function.

The Pearson type II distribution The Pearson type II distribution is a special case of the Pearson type I family
restricted to symmetric distributions.
For the Pearson Type II Curve,[4]

y = y_0 \left(1 - \frac{x^2}{a^2}\right)^m

where

x = \frac{\sum d^2}{2} - \frac{n^3 - n}{12}

and the ordinate, y, is the frequency of \sum d^2. The Pearson Type II Curve is used in computing the table of significant
correlation coefficients for Spearman’s rank correlation coefficient when the number of items in a series is less than 100
(or 30, depending on some sources). After that, the distribution mimics a standard Student’s t-distribution. For the table
of values, certain values are used as the constants in the previous equation:

m = \frac{5\beta_2 - 9}{2(3 - \beta_2)}

a^2 = \frac{2\mu_2 \beta_2}{3 - \beta_2}

y_0 = \frac{N\,\Gamma(2m+2)}{a\, 2^{2m+1}\, [\Gamma(m+1)]^2}
The moments of x used are

\mu_2 = (n-1)\left[\frac{n^2 + n}{12}\right]^2

\beta_2 = \frac{3(25n^4 - 13n^3 - 73n^2 + 37n + 72)}{25n(n+1)^2(n-1)}

The Pearson type III distribution

\lambda = \mu_1 + \frac{b_0}{b_1} - (m+1)\,b_1, \qquad b_0 + b_1(x - \lambda) \text{ is } \mathrm{Gamma}(m+1,\, b_1^2)

The Pearson type III distribution is a gamma distribution or chi-squared distribution.

The Pearson type V distribution Defining new parameters:

C_1 = \frac{b_1}{2 b_2}

\lambda = \mu_1 - \frac{a - C_1}{1 - 2 b_2}

x - \lambda \text{ follows an } \mathrm{InverseGamma}\!\left(\frac{1}{b_2} - 1,\; \frac{a - C_1}{b_2}\right)

The Pearson type V distribution is an inverse-gamma distribution.

The Pearson type VI distribution

\lambda = \mu_1 + (a_2 - a_1)\,\frac{m_2 + 1}{m_2 + m_1 + 2} - a_2

\frac{x - \lambda - a_2}{a_2 - a_1} \text{ follows a } \beta'(m_2 + 1,\; -m_2 - m_1 - 1)

The Pearson type VI distribution is a beta prime distribution or F-distribution.

2.28.4 Relation to other distributions


The Pearson family subsumes the following distributions, among others:

• beta distribution (type I)



• beta prime distribution (type VI)

• Cauchy distribution (type IV)

• chi-squared distribution (type III)

• continuous uniform distribution (limit of type I)

• exponential distribution (type III)

• gamma distribution (type III)

• F-distribution (type VI)

• inverse-chi-squared distribution (type V)

• inverse-gamma distribution (type V)

• normal distribution (limit of type I, III, IV, V, or VI)

• Student’s t-distribution (type VII, which is the non-skewed subtype of type IV)

2.28.5 Applications
These models are used in financial markets, given their ability to be parametrised in a way that has intuitive meaning for
market traders. A number of models are in current use that capture the stochastic nature of the volatility of rates, stocks
etc. and this family of distributions may prove to be one of the more important.
In the United States, the Log-Pearson III is the default distribution for flood frequency analysis.

2.28.6 Notes
[1] Miller, Jeff; et al. (2006-07-09). “Beta distribution”. Earliest Known Uses of Some of the Words of Mathematics. Retrieved
December 9, 2006.

[2] Miller, Jeff; et al. (2006-12-07). “Gamma distribution”. Earliest Known Uses of Some of the Words of Mathematics. Retrieved
December 9, 2006.

[3] Ord J.K. (1972) p2

[4] Ramsey, Philip H. (1989-09-01). “Critical Values for Spearman’s Rank Order Correlation”. Retrieved August 22, 2007.

2.28.7 Sources
Primary sources

• Pearson, Karl (1893). “Contributions to the mathematical theory of evolution [abstract]". Proceedings of the Royal
Society 54 (326–330): 329–333. doi:10.1098/rspl.1893.0079. JSTOR 115538.

• Pearson, Karl (1895). “Contributions to the mathematical theory of evolution, II: Skew variation in homoge-
neous material”. Philosophical Transactions of the Royal Society 186: 343–414. Bibcode:1895RSPTA.186..343P.
doi:10.1098/rsta.1895.0010. JSTOR 90649.

• Pearson, Karl (1901). “Mathematical contributions to the theory of evolution, X: Supplement to a memoir on skew
variation”. Philosophical Transactions of the Royal Society A 197 (287–299): 443–459. Bibcode:1901RSPTA.197..443P.
doi:10.1098/rsta.1901.0023. JSTOR 90841.

• Pearson, Karl (1916). “Mathematical contributions to the theory of evolution, XIX: Second supplement to a mem-
oir on skew variation”. Philosophical Transactions of the Royal Society A 216 (538–548): 429–457. Bibcode:1916RSPTA.216..429P.
doi:10.1098/rsta.1916.0009. JSTOR 91092.

• Rhind, A. (July–October 1909). “Tables to facilitate the computation of the probable errors of the chief constants
of skew frequency distributions”. Biometrika 7 (1/2): 127–147. doi:10.1093/biomet/7.1-2.127. JSTOR 2345367.

Secondary sources

• Milton Abramowitz and Irene A. Stegun (1964). Handbook of Mathematical Functions with Formulas, Graphs,
and Mathematical Tables. National Bureau of Standards.

• Eric W. Weisstein et al. Pearson Type III Distribution. From MathWorld.

References

• Elderton, Sir W.P, Johnson, N.L. (1969) Systems of Frequency Curves. Cambridge University Press.

• Ord J.K. (1972) Families of Frequency Distributions. Griffin, London.

2.29 Phase-type distribution


A phase-type distribution is a probability distribution constructed by a convolution or mixture of exponential distribu-
tions.[1] It results from a system of one or more inter-related Poisson processes occurring in sequence, or phases. The
sequence in which each of the phases occur may itself be a stochastic process. The distribution can be represented by a
random variable describing the time until absorption of a Markov process with one absorbing state. Each of the states of
the Markov process represents one of the phases.
It has a discrete-time equivalent, the discrete phase-type distribution.
The set of phase-type distributions is dense in the field of all positive-valued distributions, that is, it can be used to
approximate any positive-valued distribution.

2.29.1 Definition

Consider a continuous-time Markov process with m + 1 states, where m ≥ 1, such that the states 1,...,m are transient states
and state 0 is an absorbing state. Further, let the process have an initial probability of starting in any of the m + 1 phases
given by the probability vector (α0 ,α) where α0 is a scalar and α is a 1 × m vector.
The continuous phase-type distribution is the distribution of time from the above process’s starting until absorption in
the absorbing state.
This process can be written in the form of a transition rate matrix,

Q = \begin{bmatrix} 0 & \mathbf{0} \\ S^0 & S \end{bmatrix},

where S is an m × m matrix and S^0 = -S\mathbf{1}. Here \mathbf{1} represents an m × 1 vector with every element being 1.

2.29.2 Characterization

The distribution of time X until the process reaches the absorbing state is said to be phase-type distributed and is denoted
PH(α,S).
The distribution function of X is given by,

F (x) = 1 − α exp(Sx)1,

and the density function,

f (x) = α exp(Sx)S0 ,

for all x > 0, where exp( · ) is the matrix exponential. It is usually assumed that the probability of the process starting in the
absorbing state is zero (i.e. α0 = 0). The moments of the distribution function are given by

E[X n ] = (−1)n n!αS −n 1.
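As a concrete illustration of the three formulas above, the following sketch evaluates F, f, and E[X^n] for a small, purely hypothetical (α, S) pair, using SciPy’s matrix exponential; none of this mirrors a specific library’s API.

```python
import numpy as np
from math import factorial
from scipy.linalg import expm

# Hypothetical 3-phase example: pass through rates 2, 3, 4 in sequence.
alpha = np.array([1.0, 0.0, 0.0])
S = np.array([[-2.0, 2.0, 0.0],
              [0.0, -3.0, 3.0],
              [0.0, 0.0, -4.0]])
ones = np.ones(3)
S0 = -S @ ones                         # exit-rate vector S^0 = -S 1

def ph_cdf(x):                         # F(x) = 1 - alpha exp(S x) 1
    return 1.0 - alpha @ expm(S * x) @ ones

def ph_pdf(x):                         # f(x) = alpha exp(S x) S^0
    return alpha @ expm(S * x) @ S0

def ph_moment(n):                      # E[X^n] = (-1)^n n! alpha S^{-n} 1
    S_inv_n = np.linalg.matrix_power(np.linalg.inv(S), n)
    return (-1) ** n * factorial(n) * alpha @ S_inv_n @ ones

print(ph_cdf(1.0), ph_pdf(1.0))
print(ph_moment(1))                    # mean = 1/2 + 1/3 + 1/4 = 13/12
```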

2.29.3 Special cases

The following probability distributions are all considered special cases of a continuous phase-type distribution:

• Degenerate distribution, point mass at zero or the empty phase-type distribution - 0 phases.

• Exponential distribution - 1 phase.

• Erlang distribution - 2 or more identical phases in sequence.

• Deterministic distribution (or constant) - The limiting case of an Erlang distribution, as the number of phases
become infinite, while the time in each state becomes zero.

• Coxian distribution - 2 or more (not necessarily identical) phases in sequence, with a probability of transitioning to
the terminating/absorbing state after each phase.

• Hyper-exponential distribution (also called a mixture of exponential) - 2 or more non-identical phases, that each
have a probability of occurring in a mutually exclusive, or parallel, manner. (Note: The exponential distribution is
the degenerate situation when all the parallel phases are identical.)

• Hypoexponential distribution - 2 or more phases in sequence, can be non-identical or a mixture of identical and
non-identical phases, generalises the Erlang.

As the phase-type distribution is dense in the field of all positive-valued distributions, we can approximate any positive-valued
distribution. However, the phase-type is a light-tailed or platykurtic distribution. So the representation of a heavy-tailed or
leptokurtic distribution by a phase type is an approximation, even though the precision of the approximation can be as good as
we want.

2.29.4 Examples

In all the following examples it is assumed that there is no probability mass at zero, that is α0 = 0.

Exponential distribution

The simplest non-trivial example of a phase-type distribution is the exponential distribution with parameter λ. Its phase-type
parameters are S = −λ and α = 1.

Hyper-exponential or mixture of exponential distribution

The mixture of exponentials or hyper-exponential distribution with λ1, λ2, ..., λn > 0 can be represented as a phase-type
distribution with

\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)

with \sum_{i=1}^n \alpha_i = 1 and

 
S = \begin{pmatrix} -\lambda_1 & 0 & 0 & 0 & 0 \\ 0 & -\lambda_2 & 0 & 0 & 0 \\ 0 & 0 & -\lambda_3 & 0 & 0 \\ 0 & 0 & 0 & -\lambda_4 & 0 \\ 0 & 0 & 0 & 0 & -\lambda_5 \end{pmatrix}

(shown here for n = 5).

This mixture of densities of exponentially distributed random variables can be characterized through

f(x) = \sum_{i=1}^n \alpha_i \lambda_i e^{-\lambda_i x} = \sum_{i=1}^n \alpha_i f_{X_i}(x),

or its cumulative distribution function


F(x) = 1 - \sum_{i=1}^n \alpha_i e^{-\lambda_i x} = \sum_{i=1}^n \alpha_i F_{X_i}(x),

with X_i \sim \mathrm{Exp}(\lambda_i).

Erlang distribution

The Erlang distribution has two parameters, the shape, an integer k > 0, and the rate λ > 0. This is sometimes denoted
E(k,λ). The Erlang distribution can be written in the form of a phase-type distribution by making S a k×k matrix with
diagonal elements −λ and super-diagonal elements λ, with the probability of starting in state 1 equal to 1. For example,
E(5,λ):

α = (1, 0, 0, 0, 0),

and

 
S = \begin{pmatrix} -\lambda & \lambda & 0 & 0 & 0 \\ 0 & -\lambda & \lambda & 0 & 0 \\ 0 & 0 & -\lambda & \lambda & 0 \\ 0 & 0 & 0 & -\lambda & \lambda \\ 0 & 0 & 0 & 0 & -\lambda \end{pmatrix}.

For a given number of phases, the Erlang distribution is the phase type distribution with smallest coefficient of variation.[2]
The hypoexponential distribution is a generalisation of the Erlang distribution by having different rates for each transition
(the non-homogeneous case).

Mixture of Erlang distribution

The mixture of two Erlang distributions with parameters E(3,β1), E(3,β2) and weights (α1, α2) (such that α1 + α2 = 1 and, for each
i, αi ≥ 0) can be represented as a phase-type distribution with

α = (α1 , 0, 0, α2 , 0, 0),

and

 
S = \begin{pmatrix} -\beta_1 & \beta_1 & 0 & 0 & 0 & 0 \\ 0 & -\beta_1 & \beta_1 & 0 & 0 & 0 \\ 0 & 0 & -\beta_1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -\beta_2 & \beta_2 & 0 \\ 0 & 0 & 0 & 0 & -\beta_2 & \beta_2 \\ 0 & 0 & 0 & 0 & 0 & -\beta_2 \end{pmatrix}.

Coxian distribution

The Coxian distribution is a generalisation of the hypoexponential distribution. Instead of only being able to enter the
absorbing state from state k it can be reached from any phase. The phase-type representation is given by,

 
S = \begin{pmatrix} -\lambda_1 & p_1\lambda_1 & 0 & \ldots & 0 & 0 \\ 0 & -\lambda_2 & p_2\lambda_2 & \ddots & 0 & 0 \\ \vdots & \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & 0 & \ldots & -\lambda_{k-2} & p_{k-2}\lambda_{k-2} & 0 \\ 0 & 0 & \ldots & 0 & -\lambda_{k-1} & p_{k-1}\lambda_{k-1} \\ 0 & 0 & \ldots & 0 & 0 & -\lambda_k \end{pmatrix}

and

α = (1, 0, . . . , 0),

where 0 < p_1, \ldots, p_{k-1} \le 1. In the case where all p_i = 1 we have the hypoexponential distribution. The Coxian distribution
is extremely important as any acyclic phase-type distribution has an equivalent Coxian representation.
The generalised Coxian distribution relaxes the condition that requires starting in the first phase.

2.29.5 Generating samples from phase-type distributed random variables

BuTools includes methods for generating samples from phase-type distributed random variables.[3]
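For illustration only (this does not reproduce BuTools’ interface), a PH(α, S) variate can also be drawn by simulating the underlying absorbing Markov jump process phase by phase:

```python
import numpy as np

def ph_sample(alpha, S, rng=None):
    # One sample of the time to absorption; assumes alpha_0 = 0 as in the
    # examples above.  All names here are illustrative.
    rng = np.random.default_rng() if rng is None else rng
    S = np.asarray(S, dtype=float)
    m = len(alpha)
    S0 = -S @ np.ones(m)                  # rates into the absorbing state
    state = rng.choice(m, p=alpha)
    t = 0.0
    while True:
        rate = -S[state, state]           # total exit rate of current phase
        t += rng.exponential(1.0 / rate)
        probs = np.append(S[state].copy(), S0[state]) / rate
        probs[state] = 0.0                # remove the diagonal entry
        nxt = rng.choice(m + 1, p=probs)
        if nxt == m:                      # jumped to the absorbing state
            return t
        state = nxt

# Check against an Erlang E(2, 3): the mean should be near 2/3.
S = [[-3.0, 3.0], [0.0, -3.0]]
print(np.mean([ph_sample([1.0, 0.0], S) for _ in range(20_000)]))
```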

2.29.6 Approximating other distributions


Any distribution can be arbitrarily well approximated by a phase type distribution.[4][5] In practice, however, approxi-
mations can be poor when the size of the approximating process is fixed. Approximating a deterministic distribution of
time 1 with 10 phases, each of average length 0.1 will have variance 0.1 (because the Erlang distribution has smallest
variance[2] ).

• BuTools a MATLAB and Mathematica script for fitting phase-type distributions to 3 specified moments
• momentmatching a MATLAB script to fit a minimal phase-type distribution to 3 specified moments[6]

2.29.7 Fitting a phase type distribution to data


Methods to fit a phase type distribution to data can be classified as maximum likelihood methods or moment match-
ing methods.[7] Fitting a phase type distribution to heavy-tailed distributions has been shown to be practical in some
situations.[8]

• PhFit a C script for fitting discrete and continuous phase type distributions to data[9]
• EMpht is a C script for fitting phase-type distributions to data or parametric distributions using an expectation–
maximization algorithm.[10]
• HyperStar was developed around the core idea of making phase-type fitting simple and user-friendly, in order to
advance the use of phase-type distributions in a wide range of areas. It provides a graphical user interface and
yields good fitting results with only little user interaction.[11]
• jPhase is a Java library which can also compute metrics for queues using the fitted phase type distribution[12]

2.29.8 See also


• Discrete phase-type distribution
• Continuous-time Markov process
• Exponential distribution
• Hyper-exponential distribution
• Queueing theory

2.29.9 References
[1] Harchol-Balter, M. (2012). “Real-World Workloads: High Variability and Heavy Tails”. Performance Modeling and Design of
Computer Systems. p. 347. doi:10.1017/CBO9781139226424.026. ISBN 9781139226424.

[2] Aldous, David; Shepp, Larry (1987). “The least variable phase type distribution is erlang” (PDF). Stochastic Models 3 (3): 467.
doi:10.1080/15326348708807067.

[3] Horváth, G. B.; Reinecke, P.; Telek, M. S.; Wolter, K. (2012). “Efficient Generation of PH-Distributed Random Vari-
ates”. Analytical and Stochastic Modeling Techniques and Applications. Lecture Notes in Computer Science 7314. p. 271.
doi:10.1007/978-3-642-30782-9_19. ISBN 978-3-642-30781-2.

[4] Bolch, Gunter; Greiner, Stefan; de Meer, Hermann; Trivedi, Kishor S. (1998). “Steady-State Solutions of Markov Chains”.
Queueing Networks and Markov Chains. pp. 103–151. doi:10.1002/0471200581.ch3. ISBN 0471193666.

[5] Cox, D. R. (2008). “A use of complex probabilities in the theory of stochastic processes”. Mathematical Proceedings of the
Cambridge Philosophical Society 51 (2): 313. doi:10.1017/S0305004100030231.

[6] Osogami, T.; Harchol-Balter, M. (2006). “Closed form solutions for mapping general distributions to quasi-minimal PH distri-
butions”. Performance Evaluation 63 (6): 524. doi:10.1016/j.peva.2005.06.002.

[7] Lang, Andreas; Arthur, Jeffrey L. (1996). “Parameter approximation for Phase-Type distributions”. In Chakravarthy, S.; Alfa,
Attahiru S. Matrix Analytic methods in Stochastic Models. CRC Press. ISBN 0824797663.

[8] Ramaswami, V.; Poole, D.; Ahn, S.; Byers, S.; Kaplan, A. (2005). “Ensuring Access to Emergency Services in the Presence of
Long Internet Dial-Up Calls”. Interfaces 35 (5): 411. doi:10.1287/inte.1050.0155.

[9] Horváth, András S.; Telek, Miklós S. (2002). “PhFit: A General Phase-Type Fitting Tool”. Computer Performance Evaluation:
Modelling Techniques and Tools. Lecture Notes in Computer Science 2324. p. 82. doi:10.1007/3-540-46029-2_5. ISBN
978-3-540-43539-6.

[10] Asmussen, Søren; Nerman, Olle; Olsson, Marita (1996). “Fitting Phase-Type Distributions via the EM Algorithm”. Scandina-
vian Journal of Statistics 23 (4): 419–441. JSTOR 4616418.

[11] Reinecke, P.; Krauß, T.; Wolter, K. (2012). “Cluster-based fitting of phase-type distributions to empirical data”. Computers &
Mathematics with Applications 64 (12): 3840. doi:10.1016/j.camwa.2012.03.016.

[12] Pérez, J. F.; Riaño, G. N. (2006). “jPhase: an object-oriented tool for modeling phase-type distributions”. Proceeding from
the 2006 workshop on Tools for solving structured Markov chains (SMCtools '06) (PDF). doi:10.1145/1190366.1190370. ISBN
1595935061.

• M. F. Neuts. Matrix-Geometric Solutions in Stochastic Models: an Algorithmic Approach, Chapter 2: Probability
Distributions of Phase Type; Dover Publications Inc., 1981.

• G. Latouche, V. Ramaswami. Introduction to Matrix Analytic Methods in Stochastic Modelling, 1st edition. Chap-
ter 2: PH Distributions; ASA SIAM, 1999.

• C. A. O'Cinneide (1990). Characterization of phase-type distributions. Communications in Statistics: Stochastic
Models, 6(1), 1-57.

• C. A. O'Cinneide (1999). Phase-type distribution: open problems and a few properties, Communication in Statistic:
Stochastic Models, 15(4), 731-757.

2.30 Rayleigh distribution


Not to be confused with Rayleigh mixture distribution.

In probability theory and statistics, the Rayleigh distribution /ˈreɪli/ is a continuous probability distribution for positive-
valued random variables.
A Rayleigh distribution is often observed when the overall magnitude of a vector is related to its directional components.
One example where the Rayleigh distribution naturally arises is when wind velocity is analyzed into its orthogonal 2-
dimensional vector components. Assuming that each component is uncorrelated, normally distributed with equal variance,
and zero mean, then the overall wind speed (vector magnitude) will be characterized by a Rayleigh distribution. A second
example of the distribution arises in the case of random complex numbers whose real and imaginary components are i.i.d.
(independently and identically distributed) Gaussian with equal variance and zero mean. In that case, the absolute value
of the complex number is Rayleigh-distributed.
The distribution is named after Lord Rayleigh.

2.30.1 Definition

The probability density function of the Rayleigh distribution is[1]



f(x; \sigma) = \frac{x}{\sigma^2}\, e^{-x^2/(2\sigma^2)}, \qquad x \ge 0,
where σ is the scale parameter of the distribution. The cumulative distribution function is[1]

F(x; \sigma) = 1 - e^{-x^2/(2\sigma^2)}

for x ∈ [0, ∞).

2.30.2 Relation to random vector lengths


Consider the two-dimensional vector Y = (U, V) which has components that are Gaussian-distributed, centered at zero,
and independent. Then

f_U(u; \sigma) = \frac{e^{-u^2/(2\sigma^2)}}{\sqrt{2\pi\sigma^2}},

and similarly for f_V(v; \sigma).
Let x be the length of Y . It is distributed as

f(x; \sigma) = \frac{1}{2\pi\sigma^2} \int_{-\infty}^{\infty} du \int_{-\infty}^{\infty} dv\; e^{-u^2/(2\sigma^2)}\, e^{-v^2/(2\sigma^2)}\, \delta\!\left(x - \sqrt{u^2 + v^2}\right).

By transforming to the polar coordinate system one has

f(x; \sigma) = \frac{1}{2\pi\sigma^2} \int_0^{2\pi} d\phi \int_0^{\infty} dr\; \delta(r - x)\, r\, e^{-r^2/(2\sigma^2)} = \frac{x}{\sigma^2}\, e^{-x^2/(2\sigma^2)},

which is the Rayleigh distribution. It is straightforward to generalize to vectors of dimension other than 2. There are also
generalizations when the components have unequal variance or correlations.

2.30.3 Properties
The raw moments are given by:

\mu_k = \sigma^k\, 2^{k/2}\, \Gamma\!\left(1 + \frac{k}{2}\right)

where Γ(z) is the Gamma function.


The mean and variance of a Rayleigh random variable may be expressed as:


\mu(X) = \sigma\sqrt{\frac{\pi}{2}} \approx 1.253\,\sigma

and

\mathrm{var}(X) = \frac{4-\pi}{2}\,\sigma^2 \approx 0.429\,\sigma^2
The mode is σ and the maximum pdf is

f_{\max} = f(\sigma; \sigma) = \frac{1}{\sigma}\, e^{-1/2} \approx \frac{0.606}{\sigma}
The skewness is given by:


\gamma_1 = \frac{2\sqrt{\pi}\,(\pi - 3)}{(4-\pi)^{3/2}} \approx 0.631

The excess kurtosis is given by:

\gamma_2 = -\frac{6\pi^2 - 24\pi + 16}{(4-\pi)^2} \approx 0.245

The characteristic function is given by:

\varphi(t) = 1 - \sigma t\, e^{-\sigma^2 t^2/2} \sqrt{\frac{\pi}{2}} \left[ \mathrm{erfi}\!\left(\frac{\sigma t}{\sqrt{2}}\right) - i \right]

where erfi(z) is the imaginary error function. The moment generating function is given by

M(t) = 1 + \sigma t\, e^{\sigma^2 t^2/2} \sqrt{\frac{\pi}{2}} \left[ \mathrm{erf}\!\left(\frac{\sigma t}{\sqrt{2}}\right) + 1 \right]

where erf(z) is the error function.

Differential entropy

The differential entropy is given by

H = 1 + \ln\!\left(\frac{\sigma}{\sqrt{2}}\right) + \frac{\gamma}{2}

where γ is the Euler–Mascheroni constant.

Differential equation

The pdf of the Rayleigh distribution is a solution of the following differential equation:

 2 ′ ( 2 ) 
 σ xf (x) + f (x) x − σ = 0 
2

 exp(− 2σ12 ) 
f (1) = σ2

2.30.4 Parameter estimation


Given a sample of N independent and identically distributed Rayleigh random variables xi with parameter σ ,

\widehat{\sigma^2} \approx \frac{1}{2N}\sum_{i=1}^N x_i^2 \quad \text{is an unbiased maximum likelihood estimate of } \sigma^2,

while

\hat{\sigma} \approx \sqrt{\frac{1}{2N}\sum_{i=1}^N x_i^2}

is a biased estimate of σ whose bias can be corrected via

\sigma = \hat{\sigma}\,\frac{\Gamma(N)\sqrt{N}}{\Gamma\!\left(N+\frac{1}{2}\right)} = \hat{\sigma}\,\frac{4^N\, N!\,(N-1)!\,\sqrt{N}}{(2N)!\,\sqrt{\pi}}. [2]

Confidence intervals

To find the (1 − α) confidence interval, first find the two numbers \chi_1^2, \chi_2^2 where:

\Pr(\chi^2(2N) \le \chi_1^2) = \alpha/2, \qquad \Pr(\chi^2(2N) \le \chi_2^2) = 1 - \alpha/2,

then

\frac{N\overline{x^2}}{\chi_2^2} \le \widehat{\sigma^2} \le \frac{N\overline{x^2}}{\chi_1^2} [3]
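A quick numerical check of the estimator and interval above (variable names are ours; SciPy is assumed available):

```python
import numpy as np
from scipy.stats import chi2, rayleigh

rng = np.random.default_rng(0)
sigma_true, N, a = 2.0, 500, 0.05
x = rayleigh.rvs(scale=sigma_true, size=N, random_state=rng)

sigma2_hat = np.sum(x**2) / (2 * N)       # unbiased MLE of sigma^2

# N * mean(x^2) = sum(x^2); chi_1^2 and chi_2^2 are the alpha/2 and
# 1 - alpha/2 quantiles of a chi-squared with 2N degrees of freedom.
lo = np.sum(x**2) / chi2.ppf(1 - a / 2, 2 * N)
hi = np.sum(x**2) / chi2.ppf(a / 2, 2 * N)
print(sigma2_hat, (lo, hi))               # interval should bracket 4.0
```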

2.30.5 Generating random variates


Given a random variate U drawn from the uniform distribution in the interval (0, 1), then the variate


X = \sigma\sqrt{-2\ln(U)}

has a Rayleigh distribution with parameter σ. This is obtained by applying the inverse transform sampling method.
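A minimal sketch of this recipe, comparing the sample mean with the theoretical value σ√(π/2):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.5
u = rng.uniform(size=100_000)
x = sigma * np.sqrt(-2.0 * np.log(u))        # X = sigma * sqrt(-2 ln U)
print(x.mean(), sigma * np.sqrt(np.pi / 2))  # both approximately 1.88
```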

2.30.6 Related distributions



• R \sim \mathrm{Rayleigh}(\sigma) is Rayleigh distributed if R = \sqrt{X^2 + Y^2}, where X \sim N(0, \sigma^2) and Y \sim N(0, \sigma^2) are
independent normal random variables.[4] (This gives motivation to the use of the symbol “sigma” in the above
parameterization of the Rayleigh density.)

• The chi distribution with v = 2 is equivalent to the Rayleigh distribution with σ = 1, i.e., if R \sim \mathrm{Rayleigh}(1), then
R^2 has a chi-squared distribution with N = 2 degrees of freedom: [Q = R^2] \sim \chi^2(N).
• If R \sim \mathrm{Rayleigh}(\sigma), then \sum_{i=1}^N R_i^2 has a gamma distribution with parameters N and 2\sigma^2:

\left[ Y = \sum_{i=1}^N R_i^2 \right] \sim \Gamma(N, 2\sigma^2).

• The Rice distribution is a generalization of the Rayleigh distribution.

• The Weibull distribution is a generalization of the Rayleigh distribution. In this instance, parameter σ is related to
the Weibull scale parameter λ: \lambda = \sigma\sqrt{2}.

• The Maxwell–Boltzmann distribution describes the magnitude of a normal vector in three dimensions.

• If X has an exponential distribution X \sim \mathrm{Exponential}(\lambda), then Y = \sqrt{2X\sigma^2\lambda} \sim \mathrm{Rayleigh}(\sigma).

2.30.7 Applications

An application of the estimation of σ can be found in magnetic resonance imaging (MRI). As MRI images are recorded
as complex images but most often viewed as magnitude images, the background data is Rayleigh distributed. Hence, the
above formula can be used to estimate the noise variance in an MRI image from background data.[5] [6]

2.30.8 Proof of correctness – Unequal variances


We start with

f(x; \sigma) = \frac{1}{2\pi\sigma_1\sigma_2} \int_{-\infty}^{\infty} du \int_{-\infty}^{\infty} dv\; e^{-u^2/(2\sigma_1^2)}\, e^{-v^2/(2\sigma_2^2)}\, \delta\!\left(x - \sqrt{u^2 + v^2}\right),

as above, except with σ1 and σ2 distinct.


Let a = u\sigma_2/\sigma_1 so that a/\sigma_2 = u/\sigma_1. Differentiating, we have

du = \frac{\sigma_1}{\sigma_2}\, da.

Substituting,

 √( 
∫ ∞ ∫ ∞ )2
σ1 1 σ1
dv e−a e−v δ x − a + v2 
2
/2σ22 2
/2σ22
f (x; σ) = da
σ2 2πσ1 σ2 −∞ −∞ σ2

As before, we perform a polar coordinate transformation:[7]


a = r\cos\phi, \qquad v = r\sin\phi, \qquad da\, dv = r\, dr\, d\phi

Substituting,

 √( 
∫ 2π ∫ ∞ )2
σ1 1 σ1
rdr e−r δ x − + v2  .
2
/2σ22
f (x; σ) = dϕ a
σ2 2πσ1 σ2 0 0 σ2

Simplifying,

 √( 
∫ 2π ∫ ∞ )2
1 σ1
rdr e−r δ x − a + v2  .
2
/2σ22
f (x; σ) = dϕ
2πσ22 0 0 σ2

See Hoyt distribution for more information.



2.30.9 See also


• Normal distribution
• Rayleigh fading
• Rayleigh mixture distribution
• Circular error probable

2.30.10 References
[1] Papoulis, Athanasios; Pillai, S. (2001) Probability, Random Variables and Stochastic Processes. ISBN 0073660116, ISBN
9780073660110
[2] Siddiqui, M. M. (1964) “Statistical inference for Rayleigh distributions”, The Journal of Research of the National Bureau of
Standards, Sec. D: Radio Science, Vol. 68D, No. 9, p. 1007
[3] Siddiqui, M. M. (1961) “Some Problems Connected With Rayleigh Distributions”, The Journal of Research of the National
Bureau of Standards, Sec. D: Radio Propagation, Vol. 66D, No. 2, p. 169
[4] Hogema, Jeroen (2005) “Shot group statistics”
[5] Sijbers J., den Dekker A. J., Raman E. and Van Dyck D. (1999) “Parameter estimation from magnitude MR images”, Interna-
tional Journal of Imaging Systems and Technology, 10(2), 109–114
[6] den Dekker A. J., Sijbers J., (2014) “Data distributions in magnetic resonance images: a review”, Physica Medica,
[7] http://physicspages.com/2012/12/24/coordinate-transformations-the-jacobian-determinant/

2.31 Rayleigh mixture distribution


Not to be confused with Rayleigh scattering.

In probability theory and statistics a Rayleigh mixture distribution is a weighted mixture of multiple probability dis-
tributions where the weightings are equal to the weightings of a Rayleigh distribution.[1] Since the probability density
function for a (standard) Rayleigh distribution is given by[2]

f(x; \sigma) = \frac{x}{\sigma^2}\, e^{-x^2/(2\sigma^2)}, \qquad x \ge 0,
Rayleigh mixture distributions have probability density functions of the form

f(x; \sigma, n) = \int_0^{\infty} \frac{r\, e^{-r^2/(2\sigma^2)}}{\sigma^2}\, \tau(x, r; n)\, dr,
where τ (x, r; n) is a well-defined probability density function or sampling distribution.[1]
The Rayleigh mixture distribution is one of many types of compound distributions in which the appearance of a value in
a sample or population might be interpreted as a function of other underlying random variables. Mixture distributions are
often used in mixture models, which are used to express probabilities of sub-populations within a larger population.

2.31.1 See also


• Mixture distribution
• List of probability distributions

2.31.2 References
[1] Karim R., Hossain P., Begum S., and Hossain F., “Rayleigh Mixture Distribution”, Journal of Applied Mathematics, Vol. 2011,
doi:10.1155/2011/238290 (2011).

[2] Jackson J.L., “Properties of the Rayleigh Distribution”, Johns Hopkins University (1954).

2.32 Rice distribution


In probability theory, the Rice distribution or Rician distribution is the probability distribution of the magnitude of a
circular bivariate normal random variable with potentially non-zero mean. It was named after Stephen O. Rice.

2.32.1 Characterization
The probability density function is

f(x \mid \nu, \sigma) = \frac{x}{\sigma^2} \exp\!\left(\frac{-(x^2 + \nu^2)}{2\sigma^2}\right) I_0\!\left(\frac{x\nu}{\sigma^2}\right),

where I 0 (z) is the modified Bessel function of the first kind with order zero.
The characteristic function is:[1][2]

\chi_X(t \mid \nu, \sigma) = \exp\!\left(-\frac{\nu^2}{2\sigma^2}\right) \left[ \Psi_2\!\left(1;\, 1, \frac{1}{2};\, \frac{\nu^2}{2\sigma^2}, -\frac{1}{2}\sigma^2 t^2\right) + i\sqrt{2}\,\sigma t\; \Psi_2\!\left(\frac{3}{2};\, 1, \frac{3}{2};\, \frac{\nu^2}{2\sigma^2}, -\frac{1}{2}\sigma^2 t^2\right) \right],

where Ψ2 (α; γ, γ ′ ; x, y) is one of Horn’s confluent hypergeometric functions with two variables and convergent for all
finite values of x and y . It is given by:[3][4]

\Psi_2(\alpha; \gamma, \gamma'; x, y) = \sum_{n=0}^{\infty} \sum_{m=0}^{\infty} \frac{(\alpha)_{m+n}}{(\gamma)_m\, (\gamma')_n} \frac{x^m y^n}{m!\, n!},

where

(x)_n = x(x+1)\cdots(x+n-1) = \frac{\Gamma(x+n)}{\Gamma(x)}

is the rising factorial.

2.32.2 Properties
Moments

The first few raw moments are:



\mu_1' = \sigma\sqrt{\pi/2}\;\, L_{1/2}(-\nu^2/2\sigma^2)

\mu_2' = 2\sigma^2 + \nu^2

\mu_3' = 3\sigma^3\sqrt{\pi/2}\;\, L_{3/2}(-\nu^2/2\sigma^2)

\mu_4' = 8\sigma^4 + 8\sigma^2\nu^2 + \nu^4

\mu_5' = 15\sigma^5\sqrt{\pi/2}\;\, L_{5/2}(-\nu^2/2\sigma^2)

\mu_6' = 48\sigma^6 + 72\sigma^4\nu^2 + 18\sigma^2\nu^4 + \nu^6
and, in general, the raw moments are given by


\mu_k' = \sigma^k\, 2^{k/2}\, \Gamma(1 + k/2)\, L_{k/2}(-\nu^2/2\sigma^2).

Here Lq(x) denotes a Laguerre polynomial:

L_q(x) = L_q^{(0)}(x) = M(-q, 1, x) = {}_1F_1(-q; 1; x)

where M (a, b, z) =1 F1 (a; b; z) is the confluent hypergeometric function of the first kind. When k is even, the raw
moments become simple polynomials in σ and ν, as in the examples above.
For the case q = 1/2:

L_{1/2}(x) = {}_1F_1\!\left(-\frac{1}{2}; 1; x\right) = e^{x/2} \left[ (1-x)\, I_0\!\left(\frac{-x}{2}\right) - x\, I_1\!\left(\frac{-x}{2}\right) \right].

The second central moment, the variance, is

\mu_2 = 2\sigma^2 + \nu^2 - (\pi\sigma^2/2)\, L_{1/2}^2(-\nu^2/2\sigma^2).

Note that L_{1/2}^2(\cdot) indicates the square of the Laguerre polynomial L_{1/2}(\cdot), not the generalized Laguerre
polynomial L_{1/2}^{(2)}(\cdot).

Differential equation

The pdf of the Rice distribution is a solution of the following differential equation:

 ( ) ( ) 

 σ 4 x2 f ′′ (x) + 2σ 2 x3 − σ 4 x f ′ (x) + f (x) σ 4 − v 2 x2 + x4 = 0 


 


 ( ) 

 2
exp − v2σ+1 I0 ( σv2 )

2
f (1) =
 
σ 2

 


 ( ) 


 ′
2
exp − v2σ+1
2 ((σ2 −1)I0 ( σv2 )+vI1 ( σv2 )) 

f (1) = σ4

2.32.3 Related distributions


• R \sim \mathrm{Rice}(\nu, \sigma) has a Rice distribution if R = \sqrt{X^2 + Y^2}, where X \sim N(\nu\cos\theta, \sigma^2) and Y \sim N(\nu\sin\theta, \sigma^2)
are statistically independent normal random variables and θ is any real number.

• Another case where R ∼ Rice (ν, σ) comes from the following steps:

1. Generate P having a Poisson distribution with parameter (also mean, for a Poisson) \lambda = \frac{\nu^2}{2\sigma^2}.

2. Generate X having a chi-squared distribution with 2P + 2 degrees of freedom.

3. Set R = \sigma\sqrt{X}.

• If R ∼ Rice (ν, 1) then R2 has a noncentral chi-squared distribution with two degrees of freedom and noncentrality
parameter ν 2 .
• If R ∼ Rice (ν, 1) then R has a noncentral chi distribution with two degrees of freedom and noncentrality parameter
ν.
• If R \sim \mathrm{Rice}(0, \sigma) then R \sim \mathrm{Rayleigh}(\sigma), i.e., for the special case of the Rice distribution given by ν = 0, the
distribution becomes the Rayleigh distribution, for which the variance is \mu_2 = \frac{4-\pi}{2}\sigma^2.

• If R ∼ Rice (0, σ) then R2 has an exponential distribution.[5]

2.32.4 Limiting cases


For large values of the argument, the Laguerre polynomial becomes[6]

\lim_{x \to -\infty} L_\nu(x) = \frac{|x|^\nu}{\Gamma(1+\nu)}.

It is seen that as ν becomes large or σ becomes small the mean becomes ν and the variance becomes σ².

2.32.5 Parameter estimation (the Koay inversion technique)


There are three different methods for estimating the parameters of the Rice distribution, (1) method of moments,[7][8][9][10]
(2) method of maximum likelihood,[7][8][9] and (3) method of least squares. In the first two methods the interest is in
estimating the parameters of the distribution, ν and σ, from a sample of data. This can be done using the method of
moments, e.g., the sample mean and the sample standard deviation. The sample mean is an estimate of \mu_1' and the
sample standard deviation is an estimate of \mu_2^{1/2}.
The following is an efficient method, known as the “Koay inversion technique”,[11] for solving the estimating equations
based on the sample mean and the sample standard deviation simultaneously. This inversion technique is also known as
the fixed point formula of SNR. Earlier works[7][12] on the method of moments usually use a root-finding method to solve
the problem, which is not efficient.
First, the ratio of the sample mean to the sample standard deviation is defined as r, i.e., r = \mu_1'/\mu_2^{1/2}. The fixed point
formula of SNR is expressed as

g(\theta) = \sqrt{\xi(\theta)\,[1 + r^2] - 2},

where θ is the ratio of the parameters, i.e., \theta = \frac{\nu}{\sigma}, and \xi(\theta) is given by:

\xi(\theta) = 2 + \theta^2 - \frac{\pi}{8}\, \exp(-\theta^2/2) \left[ (2+\theta^2)\, I_0(\theta^2/4) + \theta^2\, I_1(\theta^2/4) \right]^2,
where I0 and I1 are modified Bessel functions of the first kind.
Note that \xi(\theta) is a scaling factor of σ and is related to \mu_2 by:

\mu_2 = \xi(\theta)\,\sigma^2.

To find the fixed point, \theta^*, of g, an initial solution is selected, \theta_0, that is greater than the lower bound, which is
\theta_{\mathrm{lower\,bound}} = 0 and occurs when r = \sqrt{\pi/(4-\pi)}[11] (notice that this is the r = \mu_1'/\mu_2^{1/2} of a Rayleigh distribution).
This provides a starting point for the iteration, which uses functional composition and continues until |g^i(\theta_0) - \theta_{i-1}|
is less than some small positive value. Here, g^i denotes the composition of the same function, g, i times. In practice, we
associate the final \theta_n for some integer n as the fixed point, \theta^*, i.e., \theta^* = g(\theta^*).
Once the fixed point is found, the estimates ν and σ are found through the scaling function, ξ(θ) , as follows:

\sigma = \frac{\mu_2^{1/2}}{\sqrt{\xi(\theta^*)}},

and

\nu = \sqrt{\mu_1'^{\,2} + \left(\xi(\theta^*) - 2\right)\sigma^2}.

To speed up the iteration even more, one can use Newton’s method of root-finding.[11] This particular approach is
highly efficient.
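A sketch of the plain fixed-point iteration described above (function and variable names are ours; the starting value is a crude heuristic above the lower bound, and the Newton acceleration is omitted):

```python
import numpy as np
from scipy.special import i0, i1

def xi(theta):
    t2 = theta ** 2
    return (2.0 + t2 - (np.pi / 8.0) * np.exp(-t2 / 2.0)
            * ((2.0 + t2) * i0(t2 / 4.0) + t2 * i1(t2 / 4.0)) ** 2)

def koay_fixed_point(r, tol=1e-9, max_iter=500):
    theta = max(r - np.sqrt(np.pi / (4.0 - np.pi)), 0.1)  # above the bound
    for _ in range(max_iter):
        new = np.sqrt(xi(theta) * (1.0 + r ** 2) - 2.0)   # g(theta)
        if abs(new - theta) < tol:
            return new
        theta = new
    return theta

# Recover (nu, sigma) from a sample mean and standard deviation.
mean, sd = 5.0, 1.1
theta = koay_fixed_point(mean / sd)
sigma = sd / np.sqrt(xi(theta))
nu = np.sqrt(mean ** 2 + (xi(theta) - 2.0) * sigma ** 2)
print(theta, nu, sigma)
```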

2.32.6 Applications
• The Euclidean norm of a bivariate normally distributed random vector.
• Rician fading
• Effect of sighting error on target shooting.[13]

2.32.7 See also


• Rayleigh distribution
• Stephen O. Rice (1907–1986)

2.32.8 Notes
[1] Liu 2007 (in one of Horn’s confluent hypergeometric functions with two variables).

[2] Annamalai 2000 (in a sum of infinite series).

[3] Erdelyi 1953.

[4] Srivastava 1985.

[5] Richards, M.A., Rice Distribution for RCS, Georgia Institute of Technology (Sep 2006)

[6] Abramowitz and Stegun (1968) §13.5.1

[7] Talukdar et al. 1991

[8] Bonny et al. 1996

[9] Sijbers et al. 1998

[10] den Dekker and Sijbers 2014

[11] Koay et al. 2006 (known as the SNR fixed point formula).

[12] Abdi 2001

[13] “Ballistipedia”. Retrieved 4 May 2014.

2.32.9 References
• Abramowitz, M. and Stegun, I. A. (ed.), Handbook of Mathematical Functions, National Bureau of Standards,
1964; reprinted Dover Publications, 1965. ISBN 0-486-61272-4
• Rice, S. O., Mathematical Analysis of Random Noise. Bell System Technical Journal 24 (1945) 46–156.
• I. Soltani Bozchalooi and Ming Liang (20 November 2007). “A smoothness index-guided approach to wavelet
parameter selection in signal de-noising and fault detection”. Journal of Sound and Vibration 308 (1–2): 253–254.
doi:10.1016/j.jsv.2007.07.038.
• Liu, X. and Hanzo, L., A Unified Exact BER Performance Analysis of Asynchronous DS-CDMA Systems Using
BPSK Modulation over Fading Channels, IEEE Transactions on Wireless Communications, Volume 6, Issue 10,
October 2007, Pages 3504–3509.
• Annamalai, A., Tellambura, C. and Bhargava, V. K., Equal-Gain Diversity Receiver Performance in Wireless
Channels, IEEE Transactions on Communications,Volume 48, October 2000, Pages 1732–1745.
• Erdelyi, A., Magnus, W., Oberhettinger, F. and Tricomi, F. G., Higher Transcendental Functions, Volume 1.
McGraw-Hill Book Company Inc., 1953.
• Srivastava, H. M. and Karlsson, P. W., Multiple Gaussian Hypergeometric Series. Ellis Horwood Ltd., 1985.
• Sijbers J., den Dekker A. J., Scheunders P. and Van Dyck D., “Maximum Likelihood estimation of Rician distri-
bution parameters”, IEEE Transactions on Medical Imaging, Vol. 17, Nr. 3, p. 357–361, (1998)
• den Dekker, A.J., and Sijbers, J (December 2014). “Data distributions in magnetic resonance images: a review”.
Physica Medica 30 (7): 725–741. doi:10.1016/j.ejmp.2014.05.002.
• Koay, C.G. and Basser, P. J., Analytically exact correction scheme for signal extraction from noisy magnitude MR
signals, Journal of Magnetic Resonance, Volume 179, Issue = 2, p. 317–322, (2006)
• Abdi, A., Tepedelenlioglu, C., Kaveh, M., and Giannakis, G. On the estimation of the K parameter for the Rice
fading distribution, IEEE Communications Letters, Volume 5, Number 3, March 2001, Pages 92–94.
• Talukdar, K.K., and Lawing, William D. (March 1991). “Estimation of the parameters of the Rice distribution”.
Journal of the Acoustical Society of America 89 (3): 1193–1197. doi:10.1121/1.400532.
• Bonny,J.M., Renou, J.P., and Zanca, M. (November 1996). “Optimal Measurement of Magnitude and Phase from
MR Data”. Journal of Magnetic Resonance, Series B 113 (2): 136–144. doi:10.1006/jmrb.1996.0166.

2.32.10 External links


• MATLAB code for Rice/Rician distribution (PDF, mean and variance, and generating random samples)

2.33 Shifted Gompertz distribution


The shifted Gompertz distribution is the distribution of the larger of two independent random variables, one of which
has an exponential distribution with parameter b and the other a Gumbel distribution with parameters η and b.
In its original formulation the distribution was expressed with reference to the Gompertz distribution instead of the Gumbel
distribution but, since the Gompertz distribution is a reverted Gumbel distribution, the labelling can be considered
accurate. It has been used as a model of the adoption of innovations. It was proposed by Bemmaor[1] (1994). Some of its
statistical properties have been studied further by Jiménez and Jodrá [2] (2009).
It has been used to predict the growth and decline of social networks and on-line services and shown to be superior to the
Bass model and Weibull distribution (see the work by Christian Bauckhage and co-authors).

2.33.1 Specification
Probability density function

The probability density function of the shifted Gompertz distribution is:

f(x; b, \eta) = b\, e^{-bx}\, e^{-\eta e^{-bx}} \left[ 1 + \eta\left(1 - e^{-bx}\right) \right] \quad \text{for } x \ge 0,

where b > 0 is the scale parameter and η > 0 is the shape parameter of the shifted Gompertz distribution.

Cumulative distribution function

The cumulative distribution function of the shifted Gompertz distribution is:

F(x; b, \eta) = \left(1 - e^{-bx}\right) e^{-\eta e^{-bx}} \quad \text{for } x \ge 0.
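Because the distribution is, by construction, that of the larger of an exponential and a Gumbel variate, it can be sampled directly from the definition; a sketch (note that a Gumbel with location ln(η)/b and scale 1/b has CDF exp(−ηe^{−bx})):

```python
import numpy as np

def sample_shifted_gompertz(b, eta, size, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    expo = rng.exponential(1.0 / b, size)                      # Exp(b)
    gumb = rng.gumbel(loc=np.log(eta) / b, scale=1.0 / b, size=size)
    return np.maximum(expo, gumb)                              # the larger one

x = sample_shifted_gompertz(b=1.0, eta=2.0, size=100_000)
t = 1.0
empirical = (x <= t).mean()
theory = (1 - np.exp(-t)) * np.exp(-2.0 * np.exp(-t))          # F(t; 1, 2)
print(empirical, theory)                                       # should agree
```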

2.33.2 Properties
The shifted Gompertz distribution is right-skewed for all values of η . It is more flexible than the Gumbel distribution.

Shapes

The shifted Gompertz density function can take on different shapes depending on the values of the shape parameter η :

• 0 < η ≤ 0.5 the probability density function has its mode at 0.


• η > 0.5 the probability density function has its mode at

\mathrm{mode} = -\frac{\ln(z^\star)}{b}, \qquad 0 < z^\star < 1
where z ⋆ is the smallest root of

\eta^2 z^2 - \eta(3+\eta)z + \eta + 1 = 0,

which is

z^\star = \left[3 + \eta - (\eta^2 + 2\eta + 5)^{1/2}\right]/(2\eta).

2.33.3 Related distributions


If η varies according to a gamma distribution with shape parameter α and scale parameter β (mean = αβ ), the distribution
of x is Gamma/Shifted Gompertz (G/SG). When α is equal to one, the G/SG reduces to the Bass model (Bemmaor 1994).
The G/SG has been applied by Dover, Goldenberg and Shapira [3] (2009) and Van den Bulte and Stremersch [4] (2004)
among others in the context of the diffusion of innovations. The model is discussed in Chandrasekaran and Tellis [5] (2007).

2.33.4 See also


• Gumbel distribution

• Generalized extreme value distribution

• Mixture model

• Bass model

• Gompertz distribution

2.33.5 References
[1] Bemmaor, Albert C. (1994). “Modeling the Diffusion of New Durable Goods: Word-of-Mouth Effect Versus Consumer Het-
erogeneity”. In G. Laurent, G.L. Lilien & B. Pras. Research Traditions in Marketing. Boston: Kluwer Academic Publishers.
pp. 201–223. ISBN 0-7923-9388-0.

[2] Jiménez, Fernando; Jodrá, Pedro (2009). “A Note on the Moments and Computer Generation of the Shifted Gompertz Distri-
bution”. Communications in Statistics - Theory and Methods 38 (1): 78–89. doi:10.1080/03610920802155502.

[3] Dover, Yaniv; Goldenberg, Jacob; Shapira, Daniel (2012). “Network Traces on Penetration: Uncovering Degree Distribution
From Adoption Data”. Marketing Science. doi:10.1287/mksc.1120.0711.

[4] Van den Bulte, Christophe; Stremersch, Stefan (2004). “Social Contagion and Income Heterogeneity in New Product Diffusion:
A Meta-Analytic Test”. Marketing Science 23 (4): 530–544. doi:10.1287/mksc.1040.0054.

[5] Chandrasekaran, Deepa; Tellis, Gerard J. (2007). “A Critical Review of Marketing Research on Diffusion of New Products”.
In Naresh K. Malhotra. Review of Marketing Research 3. Armonk: M.E. Sharpe. pp. 39–80. ISBN 978-0-7656-1306-6.

2.34 Type-2 Gumbel distribution


In probability theory, the Type-2 Gumbel probability density function is

f(x \mid a, b) = a b\, x^{-a-1}\, e^{-b x^{-a}}

for

0<x<∞

This implies that it is similar to the Weibull distributions, substituting b = \lambda^{-k} and a = -k. Note however that a
positive k (as in the Weibull distribution) would yield a negative a, which is not allowed here as it would yield a negative
probability density.
For 0 < a ≤ 1 the mean is infinite. For 0 < a ≤ 2 the variance is infinite.
The cumulative distribution function is

F(x \mid a, b) = e^{-b x^{-a}}

The moments E[X^k] exist for k < a.


The special case b = 1 yields the Fréchet distribution
Based on The GNU Scientific Library, used under GFDL.
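Since the CDF inverts in closed form, inverse-transform sampling is immediate; a sketch, using the standard Fréchet moment identity E[X^k] = b^{k/a} Γ(1 − k/a) (valid for k < a) as a check:

```python
import numpy as np
from scipy.special import gamma

rng = np.random.default_rng(2)
a, b = 3.0, 2.0                         # shape a > 1 so the mean exists
u = rng.uniform(size=200_000)
x = (-np.log(u) / b) ** (-1.0 / a)      # invert F(x) = exp(-b x^{-a})
print(x.mean(), b ** (1 / a) * gamma(1 - 1 / a))   # both approx 1.71
```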

2.34.1 See also

• Extreme value theory

• Gumbel distribution

• Type-1 Gumbel distribution

2.35 Weibull distribution


In probability theory and statistics, the Weibull distribution /ˈveɪbʊl/ is a continuous probability distribution. It is named
after Swedish mathematician Waloddi Weibull, who described it in detail in 1951, although it was first identified by Fréchet
(1927) and first applied by Rosin & Rammler (1933) to describe a particle size distribution.

2.35.1 Definition

The probability density function of a Weibull random variable is:[1]

{ ( )k−1
e−(x/λ)
k
k x
λ λ x ≥ 0,
f (x; λ, k) =
0 x < 0,

where k > 0 is the shape parameter and λ > 0 is the scale parameter of the distribution. Its complementary cumulative
distribution function is a stretched exponential function. The Weibull distribution is related to a number of other
probability distributions; in particular, it interpolates between the exponential distribution (k = 1) and the Rayleigh
distribution (k = 2 and \lambda = \sqrt{2}\,\sigma[2]).
If the quantity X is a “time-to-failure”, the Weibull distribution gives a distribution for which the failure rate is proportional
to a power of time. The shape parameter, k, is that power plus one, and so this parameter can be interpreted directly as
follows:

• A value of k < 1 indicates that the failure rate decreases over time. This happens if there is significant “infant
mortality”, or defective items failing early and the failure rate decreasing over time as the defective items are
weeded out of the population.

• A value of k = 1 indicates that the failure rate is constant over time. This might suggest random external events are
causing mortality, or failure.

• A value of k > 1 indicates that the failure rate increases with time. This happens if there is an “aging” process, or
parts that are more likely to fail as time goes on.

In the field of materials science, the shape parameter k of a distribution of strengths is known as the Weibull modulus.

2.35.2 Properties
Density function

The form of the density function of the Weibull distribution changes drastically with the value of k. For 0 < k < 1, the
density function tends to ∞ as x approaches zero from above and is strictly decreasing. For k = 1, the density function
tends to 1/λ as x approaches zero from above and is strictly decreasing. For k > 1, the density function tends to zero as
x approaches zero from above, increases until its mode and decreases after it. It is interesting to note that the density
function has infinite negative slope at x = 0 if 0 < k < 1, infinite positive slope at x = 0 if 1 < k < 2 and null slope at x = 0
if k > 2. For k = 2 the density has a finite positive slope at x = 0. As k goes to infinity, the Weibull distribution converges
to a Dirac delta distribution centered at x = λ. Moreover, the skewness and coefficient of variation depend only on the
shape parameter.

Distribution function

The cumulative distribution function for the Weibull distribution is

F(x; k, \lambda) = 1 - e^{-(x/\lambda)^k}

for x ≥ 0, and F(x; k; λ) = 0 for x < 0.


The quantile (inverse cumulative distribution) function for the Weibull distribution is

Q(p; k, \lambda) = \lambda\left(-\ln(1-p)\right)^{1/k}

for 0 ≤ p < 1.
The failure rate h (or hazard function) is given by

h(x; k, \lambda) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}.

Moments

The moment generating function of the logarithm of a Weibull distributed random variable is given by[3]

E\left[e^{t\log X}\right] = \lambda^t\, \Gamma\!\left(\frac{t}{k} + 1\right)
where Γ is the gamma function. Similarly, the characteristic function of log X is given by

E\left[e^{it\log X}\right] = \lambda^{it}\, \Gamma\!\left(\frac{it}{k} + 1\right).
In particular, the nth raw moment of X is given by

m_n = \lambda^n\, \Gamma\!\left(1 + \frac{n}{k}\right).
The mean and variance of a Weibull random variable can be expressed as

E(X) = \lambda\, \Gamma\!\left(1 + \frac{1}{k}\right)
and

\mathrm{var}(X) = \lambda^2 \left[ \Gamma\!\left(1 + \frac{2}{k}\right) - \left(\Gamma\!\left(1 + \frac{1}{k}\right)\right)^2 \right].

The skewness is given by

\gamma_1 = \frac{\Gamma\!\left(1 + \frac{3}{k}\right)\lambda^3 - 3\mu\sigma^2 - \mu^3}{\sigma^3}
where the mean is denoted by μ and the standard deviation is denoted by σ.
The excess kurtosis is given by

\gamma_2 = \frac{-6\Gamma_1^4 + 12\Gamma_1^2\Gamma_2 - 3\Gamma_2^2 - 4\Gamma_1\Gamma_3 + \Gamma_4}{\left[\Gamma_2 - \Gamma_1^2\right]^2}
where Γi = Γ(1 + i/k) . The kurtosis excess may also be written as:

\gamma_2 = \frac{\lambda^4\, \Gamma\!\left(1 + \frac{4}{k}\right) - 4\gamma_1\sigma^3\mu - 6\mu^2\sigma^2 - \mu^4}{\sigma^4} - 3

Moment generating function

A variety of expressions are available for the moment generating function of X itself. As a power series, since the raw
moments are already known, one has

E\left[e^{tX}\right] = \sum_{n=0}^{\infty} \frac{t^n \lambda^n}{n!}\, \Gamma\!\left(1 + \frac{n}{k}\right).

Alternatively, one can attempt to deal directly with the integral


E\left[e^{tX}\right] = \int_0^{\infty} e^{tx}\, \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}\, dx.
If the parameter k is assumed to be a rational number, expressed as k = p/q where p and q are integers, then this integral
can be evaluated analytically.[4] With t replaced by −t, one finds

E\left[e^{-tX}\right] = \frac{1}{\lambda^k t^k}\, \frac{p^k \sqrt{q/p}}{(\sqrt{2\pi})^{q+p-2}}\; G^{\,q,p}_{\,p,q}\!\left( \left. \begin{matrix} \frac{1-k}{p}, \frac{2-k}{p}, \ldots, \frac{p-k}{p} \\ 0, \frac{1}{q}, \ldots, \frac{q-1}{q} \end{matrix}\, \right|\; \frac{p^p}{\left(q\, \lambda^k t^k\right)^q} \right)

where G is the Meijer G-function.


The characteristic function has also been obtained by Muraleedharan et al. (2007). The characteristic function and
moment generating function of 3-parameter Weibull distribution have also been derived by Muraleedharan & Soares
(2014) by a direct approach.

Information entropy

The information entropy is given by

H(\lambda, k) = \gamma\left(1 - \frac{1}{k}\right) + \ln\!\left(\frac{\lambda}{k}\right) + 1
where γ is the Euler–Mascheroni constant.

Parameter estimation

Maximum likelihood The maximum likelihood estimator for the λ parameter given k is,

\hat{\lambda}^k = \frac{1}{n} \sum_{i=1}^n x_i^k

The maximum likelihood estimator for k is,

\hat{k}^{-1} = \frac{\sum_{i=1}^n x_i^k \ln x_i}{\sum_{i=1}^n x_i^k} - \frac{1}{n} \sum_{i=1}^n \ln x_i

This being an implicit function, one must generally solve for k by numerical means.
When x1 > x2 > ... > xN are the N largest observed samples from a dataset of more than N samples, then the
maximum likelihood estimator for the λ parameter given k is,[5]

\hat{\lambda}^k = \frac{1}{N} \sum_{i=1}^N \left(x_i^k - x_N^k\right)

Also given that condition, the maximum likelihood estimator for k is,

\hat{k}^{-1} = \frac{\sum_{i=1}^N \left(x_i^k \ln x_i - x_N^k \ln x_N\right)}{\sum_{i=1}^N \left(x_i^k - x_N^k\right)} - \frac{1}{N} \sum_{i=1}^N \ln x_i

Again, this being an implicit function, one must generally solve for k by numerical means.
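In the uncensored case, the implicit equation for k̂ is a one-dimensional root-finding problem; a sketch on synthetic data, using Brent’s method (all names are ours):

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(3)
x = 3.0 * rng.weibull(1.7, size=2000)   # true k = 1.7, lambda = 3

def k_equation(k):
    # Zero of: sum(x^k ln x)/sum(x^k) - 1/k - mean(ln x), a rearrangement
    # of the implicit estimator for k given above.
    xk = x ** k
    return (xk * np.log(x)).sum() / xk.sum() - 1.0 / k - np.log(x).mean()

k_hat = brentq(k_equation, 0.05, 50.0)
lam_hat = np.mean(x ** k_hat) ** (1.0 / k_hat)   # from the first equation
print(k_hat, lam_hat)                            # near 1.7 and 3.0
```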

2.35.3 Weibull plot


The fit of data to a Weibull distribution can be visually assessed using a Weibull Plot.[6] The Weibull Plot is a plot
of the empirical cumulative distribution function F̂ (x) of data on special axes in a type of Q-Q plot. The axes are
ln(− ln(1 − F̂ (x))) versus ln(x) . The reason for this change of variables is the cumulative distribution function can be
linearized:

F(x) = 1 - e^{-(x/\lambda)^k}

-\ln(1 - F(x)) = (x/\lambda)^k

\underbrace{\ln(-\ln(1 - F(x)))}_{y} = \underbrace{k \ln x}_{mx} - \underbrace{k \ln \lambda}_{c}

which can be seen to be in the standard form of a straight line. Therefore if the data came from a Weibull distribution
then a straight line is expected on a Weibull plot.
There are various approaches to obtaining the empirical distribution function from data: one method is to obtain the
vertical coordinate for each point using \hat{F} = \frac{i - 0.3}{n + 0.4}, where i is the rank of the data point and n is the number of data
points.[7]
Linear regression can also be used to numerically assess goodness of fit and estimate the parameters of the Weibull
distribution. The gradient informs one directly about the shape parameter k and the scale parameter λ can also be
inferred.
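A minimal sketch of the whole procedure: compute the plotting positions given above, linearize, and read k and λ off an ordinary least-squares line (synthetic data, names are ours):

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.sort(10.0 * rng.weibull(2.0, size=500))   # true k = 2, lambda = 10
n = len(x)
i = np.arange(1, n + 1)
F_hat = (i - 0.3) / (n + 0.4)                    # plotting positions

X = np.log(x)                                    # horizontal axis: ln x
Y = np.log(-np.log(1.0 - F_hat))                 # vertical: ln(-ln(1 - F))
k_fit, c = np.polyfit(X, Y, 1)                   # slope = k, intercept = -k ln(lambda)
lam_fit = np.exp(-c / k_fit)
print(k_fit, lam_fit)                            # near 2 and 10
```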
The Weibull distribution is used

• In survival analysis[8]
• In reliability engineering and failure analysis
• In electrical engineering to represent overvoltage occurring in an electrical system
• In industrial engineering to represent manufacturing and delivery times
• In extreme value theory
• In weather forecasting
• To describe wind speed distributions, as the natural distribution often matches the Weibull shape[9]
• In communications systems engineering
• In radar systems to model the dispersion of the received signals level produced by some types of clutters
• To model fading channels in wireless communications, as the Weibull fading model seems to exhibit good fit
to experimental fading channel measurements

• In general insurance to model the size of reinsurance claims, and the cumulative development of asbestosis losses
• In forecasting technological change (also known as the Sharif-Islam model)[10]

• In hydrology the Weibull distribution is applied to extreme events such as annual maximum one-day rainfalls and
river discharges. The blue picture illustrates an example of fitting the Weibull distribution to ranked annually
maximum one-day rainfalls showing also the 90% confidence belt based on the binomial distribution. The rainfall
data are represented by plotting positions as part of the cumulative frequency analysis.

• In describing the size of particles generated by grinding, milling and crushing operations, the 2-Parameter Weibull
distribution is used, and in these applications it is sometimes known as the Rosin-Rammler distribution. In this
context it predicts fewer fine particles than the Log-normal distribution and it is generally most accurate for narrow
particle size distributions. The interpretation of the cumulative distribution function is that F(x; k; λ) is the mass
fraction of particles with diameter smaller than x, where λ is the mean particle size and k is a measure of the spread
of particle sizes.

2.35.4 Related distributions


• The translated Weibull distribution (or 3-parameter Weibull) contains an additional parameter.[3] It has the probability
density function
f(x; k, \lambda, \theta) = \frac{k}{\lambda}\left(\frac{x-\theta}{\lambda}\right)^{k-1} e^{-\left(\frac{x-\theta}{\lambda}\right)^k}
for x ≥ θ and f(x; k, λ, θ) = 0 for x < θ, where k > 0 is the shape parameter, λ > 0 is the scale parameter and θ is the
location parameter of the distribution. When θ=0, this reduces to the 2-parameter distribution.

• The Weibull distribution can be characterized as the distribution of a random variable W such that the random
variable
X = \left(\frac{W}{\lambda}\right)^k
is the standard exponential distribution with intensity 1.[3]

• This implies that the Weibull distribution can also be characterized in terms of a uniform distribution: if U is uni-
formly distributed on (0,1), then the random variable W = λ(− ln(U ))1/k is Weibull distributed with parameters
k and λ. (Note that − ln(U ) here is equivalent to X just above.) This leads to an easily implemented numerical
scheme for simulating a Weibull distribution.

• The Weibull distribution interpolates between the exponential distribution with intensity 1/λ when k = 1 and a
Rayleigh distribution of mode \sigma = \lambda/\sqrt{2} when k = 2.

• The Weibull distribution (usually sufficient in reliability engineering) is a special case of the three-parameter
exponentiated Weibull distribution where the additional exponent equals 1. The exponentiated Weibull distribution
accommodates unimodal, bathtub-shaped[11] and monotone failure rates.

• The Weibull distribution is a special case of the generalized extreme value distribution. It was in this connection
that the distribution was first identified by Maurice Fréchet in 1927.[12] The closely related Fréchet distribution,
named for this work, has the probability density function

f_{\mathrm{Frechet}}(x; k, \lambda) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{-1-k} e^{-(x/\lambda)^{-k}} = f_{\mathrm{Weibull}}(x; -k, \lambda).
• The distribution of a random variable that is defined as the minimum of several random variables, each having a
different Weibull distribution, is a poly-Weibull distribution.

• The Weibull distribution was first applied by Rosin & Rammler (1933) to describe particle size distributions. It is
widely used in mineral processing to describe particle size distributions in comminution processes. In this context
the cumulative distribution is given by
f(x; P_{80}, m) = \begin{cases} 1 - e^{\ln(0.2)\left(\frac{x}{P_{80}}\right)^m} & x \ge 0, \\ 0 & x < 0, \end{cases}

where x is the particle size, P_{80} is the 80th percentile of the particle size distribution, and m is a parameter describing
the spread of the distribution.

• Because of its availability in spreadsheets, it is also used where the underlying behavior is actually better modeled
by an Erlang distribution.[13]

2.35.5 See also


• Fisher–Tippett–Gnedenko theorem
• Logistic distribution
• Rosin–Rammler distribution for particle size analysis

2.35.6 References
[1] Papoulis, Athanasios Papoulis; Pillai, S. Unnikrishna (2002). Probability, Random Variables, and Stochastic Processes (4th ed.).
Boston: McGraw-Hill. ISBN 0-07-366011-6.

[2] http://www.mathworks.com.au/help/stats/rayleigh-distribution.html

[3] Johnson, Kotz & Balakrishnan 1994

[4] See (Cheng, Tellambura & Beaulieu 2004) for the case when k is an integer, and (Sagias & Karagiannidis 2005) for the rational
case.

[5] Sornette, D. (2004). Critical Phenomena in Natural Science: Chaos, Fractals, Self-organization, and Disorder..

[6] The Weibull plot

[7] Wayne Nelson (2004) Applied Life Data Analysis. Wiley-Blackwell ISBN 0-471-64462-5

[8] Survival/Failure Time Analysis

[9] Wind Speed Distribution Weibull

[10] “The Weibull distribution as a general model for forecasting technological change”. Technological Forecasting and Social Change
18: 247–256. doi:10.1016/0040-1625(80)90026-8. Retrieved 2013-09-05.

[11] “System evolution and reliability of systems”. Sysev (Belgium). 2010-01-01.

[12] Montgomery, Douglas. Introduction to statistical quality control. [S.l.]: John Wiley. p. 95. ISBN 9781118146811.

[13] Chatfield, C.; Goodhardt, G.J. (1973). “A Consumer Purchasing Model with Erlang Interpurchase Times”. Journal of the
American Statistical Association 68: 828–835. doi:10.1080/01621459.1973.10481432.

2.35.7 Bibliography
• Fréchet, Maurice (1927), “Sur la loi de probabilité de l'écart maximum”, Annales de la Société Polonaise de Math-
ematique, Cracovie 6: 93–116.
• Johnson, Norman L.; Kotz, Samuel; Balakrishnan, N. (1994), Continuous univariate distributions. Vol. 1, Wiley
Series in Probability and Mathematical Statistics: Applied Probability and Statistics (2nd ed.), New York: John
Wiley & Sons, ISBN 978-0-471-58495-7, MR 1299979
• Muraleedharan, G.; Rao, A.D.; Kurup, P.G.; Nair, N. Unnikrishnan; Sinha, Mourani (2007), “Modified Weibull
Distribution for Maximum and Significant Wave Height Simulation and Prediction”, Coastal Engineering 54 (8):
630–638, doi:10.1016/j.coastaleng.2007.05.001
• Muraleedharan, G.; Soares, C.G. (2014), “Characteristic and Moment Generating Functions of Generalised Pareto
(GP3) and Weibull Distributions”, Journal of Scientific Research and Reports 3 (14): 1861–1874, doi:10.9734/JSRR/2014/10087.
• Rosin, P.; Rammler, E. (1933), “The Laws Governing the Fineness of Powdered Coal”, Journal of the Institute of
Fuel 7: 29–36.
• Sagias, Nikos C.; Karagiannidis, George K. (2005), “Gaussian class multivariate Weibull distributions: theory and
applications in fading channels” (PDF), Institute of Electrical and Electronics Engineers. Transactions on Information
Theory 51 (10): 3608–3619, doi:10.1109/TIT.2005.855598, ISSN 0018-9448, MR 2237527
• Weibull, W. (1951), “A statistical distribution function of wide applicability” (PDF), J. Appl. Mech.-Trans. ASME
18 (3): 293–297.
• “Engineering statistics handbook”. National Institute of Standards and Technology. 2008. |chapter= ignored (help)
• Nelson, Jr, Ralph (2008-02-05). “Dispersing Powders in Liquids, Part 1, Chap 6: Particle Volume Distribution”.
Retrieved 2008-02-05.

2.35.8 External links

• Hazewinkel, Michiel, ed. (2001), “Weibull distribution”, Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-010-4

• Mathpages - Weibull Analysis

• The Weibull Distribution

• Reliability Analysis with Weibull

• Interactive graphic: Univariate Distribution Relationships
Ronald Fisher

Fitted cumulative Fréchet distribution to extreme one-day rainfalls

Illustration of the gamma PDF for parameter values over k and x, with θ set to 1, 2, 3, 4, 5 and 6; each θ layer can be seen on its own as well as against k and x.

Illustration of the Kullback–Leibler (KL) divergence for two gamma PDFs. Here β = β0 + 1, with values set to 1, 2, 3, 4, 5 and 6. The typical asymmetry of the KL divergence is clearly visible.

Wald distribution plotted using Python with the aid of matplotlib and NumPy; a sketch of this kind of plot follows below.
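The following is a minimal sketch of such a plot, assuming NumPy and matplotlib are available; the parameter values mu and lam are illustrative choices, not those used in the original figure.

import numpy as np
import matplotlib.pyplot as plt

def wald_pdf(x, mu, lam):
    # Wald (inverse Gaussian) density:
    # sqrt(lam / (2*pi*x^3)) * exp(-lam*(x - mu)^2 / (2*mu^2*x))
    return np.sqrt(lam / (2 * np.pi * x**3)) * np.exp(-lam * (x - mu)**2 / (2 * mu**2 * x))

x = np.linspace(0.01, 3.0, 400)
for lam in (0.2, 1.0, 3.0):          # illustrative shape values
    plt.plot(x, wald_pdf(x, mu=1.0, lam=lam), label="mu = 1, lam = %.1f" % lam)
plt.xlabel("x")
plt.ylabel("density")
plt.legend()
plt.show()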


Probability density function for the Lévy distribution on a log-log scale.

Log-logistic hazard function with α = 1 and values of β as shown in the legend.

Fitted cumulative log-logistic distribution to maximum one-day October rainfalls using CumFreq; see also distribution fitting.

Comparison of the mean, median and mode of two log-normal distributions with different skewness (curves for σ = 0.25 and σ = 1); the closed forms behind this ordering are sketched below.
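The ordering in that figure follows from the closed-form location measures of the log-normal distribution: mode = exp(μ − σ²), median = exp(μ) and mean = exp(μ + σ²/2), so mode < median < mean whenever σ > 0. A minimal numerical check, assuming NumPy and taking μ = 0 (an illustrative choice) with the two σ values shown in the figure:

import numpy as np

mu = 0.0                             # illustrative location parameter
for sigma in (0.25, 1.0):            # the two skewness levels in the figure
    mode = np.exp(mu - sigma**2)
    median = np.exp(mu)
    mean = np.exp(mu + sigma**2 / 2)
    print("sigma = %.2f: mode = %.3f < median = %.3f < mean = %.3f"
          % (sigma, mode, median, mean))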
Fitted cumulative log-normal distribution to annual maximum one-day rainfalls; see distribution fitting.

Fitted cumulative Pareto (Lomax) distribution to maximum one-day rainfalls using CumFreq; see also distribution fitting.

Lorenz curves for a number of Pareto distributions. The case α = ∞ corresponds to a perfectly equal distribution (G = 0) and the line α = 1 corresponds to complete inequality (G = 1); a quick numerical check follows below.
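Both limits agree with the closed form for the Gini coefficient of a Pareto distribution, G = 1/(2α − 1), valid for α > 1: G → 0 as α → ∞ and G → 1 as α → 1. A quick check, with the α values below chosen only for illustration:

# Gini coefficient of a Pareto distribution with shape parameter alpha > 1.
for alpha in (1.001, 1.5, 2.0, 5.0, 100.0):
    print("alpha = %g: G = %.4f" % (alpha, 1 / (2 * alpha - 1)))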
Diagram of the Pearson system, showing distributions of types I, III, VI, V and IV in terms of β1 (squared skewness) and β2 (traditional kurtosis).

Plot of Pearson type VII densities with λ = 0, σ = 1, and γ2 = ∞ (red), γ2 = 4 (blue) and γ2 = 0 (black).

In the 2D plane, pick a fixed point at distance ν from the origin. Generate a distribution of 2D points centered around that point, where the x and y coordinates are chosen independently from a Gaussian distribution with standard deviation σ (blue region). If R is the distance from these points to the origin, then R has a Rice distribution; a sampling sketch of this construction follows below.
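A minimal sampling sketch of that construction, assuming NumPy; ν and σ are illustrative values, and placing the fixed point on the x-axis loses no generality because the construction is rotationally symmetric.

import numpy as np

rng = np.random.default_rng(seed=0)
nu, sigma = 2.0, 1.0                 # illustrative parameter values
n = 100_000

x = rng.normal(nu, sigma, size=n)    # fixed point placed at (nu, 0)
y = rng.normal(0.0, sigma, size=n)
r = np.hypot(x, y)                   # distances to the origin: Rice(nu, sigma)

print("sample mean of R: %.4f" % r.mean())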

Fitted cumulative Weibull distribution to maximum one-day rainfalls using CumFreq, see also distribution fitting
Chapter 3

Text and image sources, contributors, and licenses

3.1 Text
• List of probability distributions Source: https://en.wikipedia.org/wiki/List_of_probability_distributions?oldid=683689286 Contributors:
Fnielsen, Michael Hardy, Robinh, Giftlite, Smalljim, PAR, Mindmatrix, Btyner, Nanite, [email protected], Schmock, Gargoyle888, Katieh5584,
Fuzzyrandom, Oli Filth, Kjetil1001, G716, Autopilot, Ben Moore, Peleg, Winterfors, Cydebot, Davidhof, IanOsgood, Coffee2theorems,
Albmont, Paresnah, It Is Me Here, Wastle, DrMicro, TXiKiBoT, Nschuma, Rlendog, Sheppa28, Yerpo, Melcombe, Rumping, Carolus m,
Skbkekas, Qwfp, Addbot, Rmalouf, Cbauckhage, LinkFA-Bot, Mdnahas, Csigabi, Xqbot, Tomaschwutz, Ehsan.azhdari, Kastchei, Pjsanchez,
Ibayes, GonzoEcon, Illia Connell, Dexbot, MoreHumanThanNot, Smason79, Limit-theorem, Parwig, Herbmuell, Julia Abril, Carterkd and
Anonymous: 30
• Bernoulli distribution Source: https://en.wikipedia.org/wiki/Bernoulli_distribution?oldid=689855034 Contributors: Bryan Derksen, Miguel~enwiki,
Olivier, Michael Hardy, Tomi, TakuyaMurata, Whkoh, Poor Yorick, Rossami, Charles Matthews, Jitse Niesen, Robbot, Wtanaka, Weialawaga~enwiki,
Giftlite, MarkSweep, Rdsmith4, Urhixidur, Discospinster, Jpk, El C, Runner1928, Musiphil, Eric Kvaalen, Complex01, PAR, Cburnett, Aquae,
Tomash, Btyner, Mathbot, YurikBot, Wavelength, Schmock, SmackBot, RDBury, Federalist51, Zven, Iwaterpolo, Bando26, Yoderj, FilipeS,
AlekseyP, Thijs!bot, Wikid77, Lovibond, Pabristow, .anacondabot, Albmont, Aziz1005, User A1, Lilac Soul, ILikeHowMuch, Policron,
Philip Trueman, TXiKiBoT, A4bot, Camkego, Typofier, Sharmistha1, OKBot, Melcombe, Jt, Alexbot, Qwfp, Bgeelhoed, Addbot, MrOllie,
Ozob, Luckas-bot, Wjastle, Deepakazad, Xqbot, Bdmy, Kyng, Erik9bot, Lothar von Richthofen, Herix, Amonet, TobeBot, Trappist the monk,
Jowa fan, EmausBot, User3000, AvicBot, Flatland1, ChuispastonBot, Kasirbot, Alex.j.flint, Beaumont877, MusikAnimal, Andreas27krause,
Theyshallbow, Cwobeel, Dr. J. Rodal, Jochen Burghardt, NikelsenH, BeyondNormality, Parswerk, VictorM Casero, Taste wicki, Loraof and
Anonymous: 61
• Rademacher distribution Source: https://en.wikipedia.org/wiki/Rademacher_distribution?oldid=677925716 Contributors: Michael Hardy,
Tomi, Dean p foster, MisterSheik, PAR, Btyner, Entropeneur, SmackBot, Reko, Baccyak4H, R'n'B, TomyDuby, VolkovBot, DrMicro, SieBot,
Qwfp, Addbot, Erik9bot, TobeBot, Duoduoduo, MidgleyC, EmausBot, ZéroBot, Koertefa, Lfj 2, QuarkyPi, ChrisGualtieri, Dexbot, Beyond-
Normality, Virion123, Joemore05 and Anonymous: 7
• Binomial distribution Source: https://en.wikipedia.org/wiki/Binomial_distribution?oldid=689603800 Contributors: AxelBoldt, Bryan Derk-
sen, -- April, Miguel~enwiki, AdamRetchless, Youandme, Michael Hardy, David Martland, Tomi, TakuyaMurata, Ellywa, Ahoerstemeier,
Charles Matthews, Timwi, Jitse Niesen, Br43402, Phr, McKay, Robbot, Sander123, Cdang, Benwing, Seglea, MichaelGensheimer, Henrygb,
JB82, Robinh, Wile E. Heresiarch, Giftlite, BenFrantzDale, MSGJ, Mboverload, Knutux, LiDaobing, MarkSweep, Gauss, Atemperman, Xiao
Fei, PhotoBox, Rich Farmbrough, Guanabot, Paul August, Nabla, MisterSheik, Pt, Elipongo, Musiphil, Gary, Eric Kvaalen, Atlant, Rgc-
legg, PAR, Supergroupiejoy, Cburnett, Oleg Alexandrov, Postrach, Linas, Mindmatrix, LOL, Crackerbelly, Pufferfish101, Btyner, Graham87,
Qwertyus, Rjwilmsi, NatusRoma, [email protected], Westm, New Thought, Ayla, DVdm, Volunteer Marek, YurikBot, Hede2000,
Zwobot, Hirak 99, Deville, Lt-wiki-bot, Zmoboros, Ilmari Karonen, SmackBot, Blue520, Aarond10, Nbarth, Colonies Chris, Iwaterpolo,
Can't sleep, clown will eat me, Brutha~enwiki, G716, Neshatian, Rjmorris, ML5, ZantTrang, Dicklyon, Bill Malloy, Ylloh, Falk Lieder,
DavidFHoughton, WeggeBot, Stebulus, Eesnyder, Janlo, Talgalili, Thijs!bot, Fisherjs, Wikid77, N5iln, Anupam, Ruber chiken, AntiVan-
dalBot, Smachet, VectorPosse, JEH, AchatesAVC, Daytona2, MER-C, Ph.eyes, Dricherby, VoABot II, JamesBWatson, Baccyak4H, Froid,
Homunq, Mmustafa~enwiki, MartinBot, R'n'B, Lucaswilkins, Mahewa, Gill110951, Coppertwig, N6ne, Spellcast, Gogobera, DrMicro, Pleas-
antville, Blahb31, Clay Spence, TXiKiBoT, Marvinrulesmars, Toll booth, A4bot, Steven J. Anderson, Nschuma, Stigin, Logan, WillKitch,
Nguyenngaviet, Quietbritishjim, SieBot, Rlendog, BotMultichill, Gerald Tros, Garde, Hxhbot, Allmightyduck, Johnstjohn, OKBot, AlanUS,
Melcombe, Redtryfan77, PsyberS, ClueBot, Rumping, Koczy, GorillaWarfare, The Thing That Should Not Be, Meisterkoch, Kmassey, UKoch,
Gauravm1312, Excirial, DaDexter, Watchduck, Madkaugh, Kakofonous, Qwfp, XLinkBot, Kwjbot, Efexan~enwiki, Alexius08, Tayste, MrOl-
lie, SoSaysChappy, Tyw7, AsphyxiateDrake, Legobot, Tedtoal, Luckas-bot, Yobot, Cflm001, Qonnec, Wjastle, Moseschinyama, AnomieBOT,
Erel Segal, Joule36e5, Rubinbot, Materialscientist, ArthurBot, Xqbot, Nasnema, J04n, Locobot, Mhadi.afrasiabi, Ajs072, Citation bot 1, In-
telligentsium, Wa03, Tal physdancer, Pinethicket, Stpasha, BPets, Gperjim, Mr Ape, Ian.Shannon, Tim1357, FoxBot, Sintau.tayua, TobeBot,
Dinamik-bot, Bmazin, Duoduoduo, Innotata, Alzarian16, EmausBot, Vincent Semeria, Tpudlik, Yuzisee, Welhaven, Chewings72, Sigma0 1, Akseli.palen, Cuttlefishy, ClueBot NG, Sealed123, Helpful Pixie Bot, Ljwsummer, BG19bot, Pallaviagarwal90, Vagobot, Mark Arsten,
MC-CPO, Especially Lime, Markonius, Arr4, Dexbot, Stephan Kulla, Frosty, 069952497a, Aint one, Gmk7, Wikistiwari, Hdchina2010,
BeyondNormality, Stdp, Millstei, Knife-in-the-drawer, Pooipedia, Will Perry, Mukhtiarhussainqazi and Anonymous: 312
• Beta-binomial distribution Source: https://en.wikipedia.org/wiki/Beta-binomial_distribution?oldid=686968336 Contributors: Michael Hardy,
Benwing, Giftlite, Jonsafari, Rjwilmsi, Gadget850, Chris the speller, Gnp, Massbless~enwiki, Myasuda, Michael Fourman, Tomixdf, Bac-
cyak4H, Charlesmartin14, Akiezun, Domminico, Willy.feng, Nschuma, Sheppa28, Melcombe, Thouis.r.jones, UKoch, Auntof6, Qwfp, Ad-
dbot, Luckas-bot, Yobot, Frederic Y Bois, BenzolBot, PigFlu Oink, RedBot, GoingBatty, ZéroBot, Thtanner, Jack Greenmaven, Kdisarno,
Jestingrabbit, Chafe66, Sieste, Herbmuell, BeyondNormality, Bicycledreamer and Anonymous: 25
• Degenerate distribution Source: https://en.wikipedia.org/wiki/Degenerate_distribution?oldid=664315407 Contributors: Bryan Derksen, Gareth
Owen, XJaM, PierreAbbat, Miguel~enwiki, Michael Hardy, TakuyaMurata, Charles Matthews, PAR, Ryan Reich, Btyner, YurikBot, RussBot,
Petter Strandmark, Gareth Jones, SmackBot, Radagast83, Moloch981, Baccyak4H, Quietbritishjim, Rlendog, Melcombe, Rumping, Qwfp,
Addbot, MrVanBot, Legobot, Xqbot, Erik9bot, Xnn, EmausBot, Darkness Shines, IkamusumeFan, ChrisGualtieri, Bryanrutherford0, Beyond-
Normality, Loraof, Speinstene27 and Anonymous: 13
• Hypergeometric distribution Source: https://en.wikipedia.org/wiki/Hypergeometric_distribution?oldid=687136612 Contributors: Bryan Derk-
sen, Michael Hardy, Booyabazooka, Tomi, TakuyaMurata, Sboehringer, David Shay, Robbot, Josh Cherry, Schutz, Benwing, Giftlite, MSGJ,
Zigger, Gauss, Goshng, El C, Kamrik, Maximilianh, Rgclegg, PAR, Burn, Wtmitchell, Cburnett, Linas, Mindmatrix, LOL, Pol098, Comman-
der Keane, Btyner, Marudubshinki, RichardWeiss, YurikBot, Entropeneur, Ott2, Janto, Kingboyk, GrinBot~enwiki, Bo Jacoby, Eug, Bluebot,
Nbarth, Iwaterpolo, Bilgrau, J. Finkelstein, CmdrObot, MaxEnt, Mikewax, Talgalili, Thijs!bot, PBH~enwiki, Felipehsantos, Herr blaschke,
.anacondabot, Livingthingdan, Baccyak4H, David Eppstein, User A1, Antoine 245, It Is Me Here, Jia.meng, Pleasantville, Blahb31, Clay
Spence, TXiKiBoT, FedeLebron, Johnlv12, Nerdmaster, Arnold90, Screech1941, Melcombe, ClueBot, Rumping, SkatingNerd, UKoch, Qwfp,
Veryhuman, DavidLDill, Porejide, Alexius08, Jht4060, Addbot, DarrylNester, Wtruttschel, MrVanBot, LaaknorBot, ‫زرشک‬, HerculeBot, Jack
Joff, Luckas-bot, Yobot, Yvswan, AnomieBOT, Erel Segal, DirlBot, Randomactsofkindness2, Xqbot, Makeswell, ChevyC, Drcrnc, Seattle
Jörg, Prőhle Tamás, Gnathan87, Duoduoduo, RjwilmsiBot, Skaphan, Peteraandrews, I9606, AManWithNoPlan, Reb42, Eidolon232, ClueBot
NG, Frietjes, Gunungblau, Wbm1058, Mlxq531006, Intervallic, MC-CPO, Mtmoore321, BattyBot, Isabel duarte, Kiwi4boy, Cammy169,
I3iaaach, Groenger, SimonPerera, Mark viking, Equilibrium Allure, Milan Malinsky, MaEtUgR, BeyondNormality, Salubrious Toxin, Ana
Caroline França and Anonymous: 115
• Poisson binomial distribution Source: https://en.wikipedia.org/wiki/Poisson_binomial_distribution?oldid=661413838 Contributors: Michael
Hardy, Jabowery, Rjwilmsi, Winterfors, Melcombe, Motmahp, MystBot, Addbot, Yobot, Entropeter, Rafael Calsaverini, RjwilmsiBot, Chuis-
pastonBot, Jomtung, Yili.hong1, BeyondNormality, Monkbot, Spanachan and Anonymous: 10
• Fisher’s noncentral hypergeometric distribution Source: https://en.wikipedia.org/wiki/Fisher%27s_noncentral_hypergeometric_distribution?
oldid=687136383 Contributors: Michael Hardy, RichardWeiss, SmackBot, Headbomb, SHCarter, Arnold90, UKoch, Addbot, Yobot, Citation
bot, DirlBot, Nicolas Perrault III, Citation bot 1, RjwilmsiBot, CitationCleanerBot, BeyondNormality and Anonymous: 4
• Wallenius’ noncentral hypergeometric distribution Source: https://en.wikipedia.org/wiki/Wallenius%27_noncentral_hypergeometric_distribution?
oldid=677989858 Contributors: CBM, TheFearow, Magioladitis, Tomaxer, Arnold90, Melcombe, Rumping, UKoch, Yobot, Citation bot,
GrouchoBot, Omnipaedista, Citation bot 1, Lucas Thoms, KLBot2, BeyondNormality, BigCrunsh~enwiki and Anonymous: 4
• Benford’s law Source: https://en.wikipedia.org/wiki/Benford%27s_law?oldid=690149615 Contributors: AxelBoldt, Calypso, Bryan Derksen,
Fnielsen, ChangChienFu, Leandrod, Edward, Michael Hardy, Tez, Shyamal, Tomi, Axlrosen, Ahoerstemeier, DavidWBrooks, Den fjät-
trade ankan~enwiki, Randywombat, Samw, Jacquerie27, Cherkash, Charles Matthews, Doradus, Taxman, BenRG, Jeffq, Gandalf61, Hen-
rygb, KellyCoinGuy, Paul Murray, Giftlite, Wizzy, BenFrantzDale, Orangemike, MSGJ, Curps, Finn-Zoltan, Eequor, Tagishsimon, Wma-
han, Mmm~enwiki, Noe, Beland, MacGyverMagic, Pmanderson, Urhixidur, Aioth, Thorwald, Ayager, Jason Carreiro, Lubaf, 4pq1injbok,
Rich Farmbrough, Smyth, Sam Derbyshire, Bender235, ESkog, Elwikipedista~enwiki, Janna Isabot, Foobaz, Jumbuck, Burn, Cburnett,
Amorymeltzer, Ceyockey, Mindmatrix, Dzordzm, Waldir, Frankie1969, Reddwarf2956, RuM, Lord.lucan, Rjwilmsi, Bubba73, The wub,
FlaBot, Mathbot, Harmil, Srleffler, Hillman, RussBot, Philopedia, Lsdan, Kinser, Buster79, Irishguy, Meira, Amakuha, Avraham, Eurosong,
Red Jay, Carabinieri, Mrwright, Geoffrey.landis, Jack Upland, Cmglee, Groyolo, Sbyrnes321, That Guy, From That Show!, AndrewWTay-
lor, Das my, A13ean, SmackBot, InverseHypercube, Betacommand, Thumperward, Morte, Nbarth, Kindall, Michael.Pohoreski, Kevinpur-
cell, Sholom, Cybercobra, Mitar, Hgilbert, Circumspice, Derek farn, Michael Bednarek, Kompere, Vashtihorvat, Hu12, DouglasCalvert,
Kencf0618, IanOfNorwich, Achoo5000, Amniarix, VoxLuna, David s graff, Nunquam Dormio, Doctormatt, Reywas92, Kweeket, Uncle-
Bubba, Pcu123456789, Headbomb, Uruiamme, Escarbot, Oreo Priest, StringRay, AstroLynx, Subwaynyc, Jj137, Cyclotome, Hannes Eder,
SamIAmNot, JAnDbot, Oxinabox, Eurobas, Mkch, PhilKnight, PChalmer, Jakob.scholbach, Baccyak4H, Paul Niquette, Johnbibby, JJ Harri-
son, David Eppstein, Lonewolf1313, ExoSynth, Jordi G, SteveChervitzTrutane, STeamTraen, SuneJ~enwiki, DrMicro, Pleasantville, Akwdb,
PMajer, Gpeilon, Lanzkron, MBlakley, Kojones, KWRegan, Drnathanfurious, Bdb484, Cgwaldman, Sheppa28, Euryalus, Gknor, Alexsmail,
BGrayson, Oxymoron83, Lightmouse, Sunrise, Sean.hoyland, Melcombe, Mr. Stradivarius, MenoBot, DavidHobby, ClueBot, Rumping, Im-
franklyn, Ebster95, DragonBot, Robertharder, Sun Creator, NuclearWarfare, Qwfp, Johnuniq, Tam0031, Justin Mauger, XLinkBot, Charles
Sturm, Asrghasrhiojadrhr, Addbot, DOI bot, Download, Protonk, LinkFA-Bot, Tassedethe, Numbo3-bot, Ehrenkater, Lightbot, Jlederluis,
FarhanC99, Luckas-bot, Yobot, Nghtwlkr, Julia W, Jeremyleader, Albrodax, Azylber, AnomieBOT, Nishkarshs, Citation bot, Cactusthorn,
Srich32977, CurmudgeonlyEditor, AV3000, Tarantulae, Sprlzrd, BYZANTIVM, Joxemai, Nameless23, Prari, Social Norm, Citation bot
1, Marcus erronius, Adlerbot, MondalorBot, Foobarnix, Trappist the monk, Gnathan87, Crtolle, Jackessler, Tbhotch, Dalba, NameIsRon,
Malurth, Ash.matadeen, JaeDyWolf, K6ka, Malcolm77, U+003F, Tijfo098, Friendshao, Mathstat, Iiii I I I, Vskipper, Snotbot, Reify-tech,
Bibcode Bot, Technical 13, WikiTryHardDieHard, Foxtod, TLAN38, Chafe66, Tropcho, ChrisGualtieri, Illia Connell, Dexbot, Kiwi4boy,
Kkved, Lingvano, REM888, BeyondNormality, Monkbot, AmandaJohnson2014, Chuluojun, Todd Christopher Headrick, Machavanne, Pe-
tersonharryboy, Mdude2005, Meucat, Rdodds, Francesco98989899 and Anonymous: 189
• Beta prime distribution Source: https://en.wikipedia.org/wiki/Beta_prime_distribution?oldid=672462497 Contributors: Michael Hardy, Tomi,
Cyan, Selket, Robinh, Giftlite, Rich Farmbrough, MisterSheik, Eric Kvaalen, Bookandcoffee, Oleg Alexandrov, Btyner, Rjwilmsi, Krish-
navedala, Schmock, Entropeneur, SmackBot, Maksim-e~enwiki, R'n'B, RomainThibaux, SieBot, Sheppa28, OKBot, Melcombe, Celique,
Rumping, TerryM--re, Qwfp, Mingovia, Addbot, AnomieBOT, Xqbot, RjwilmsiBot, ZéroBot, BG19bot, Purple Post-its, Herbmuell, Beyond-
Normality, Monkbot and Anonymous: 13
• Birnbaum–Saunders distribution Source: https://en.wikipedia.org/wiki/Birnbaum%E2%80%93Saunders_distribution?oldid=650573858
Contributors: Michael Hardy, Btyner, Rjwilmsi, Khazar, Magioladitis, VolkovBot, DrMicro, Melcombe, UKoch, Qwfp, Sandrobt, Addbot,
Yobot, Citation bot, RjwilmsiBot, BeyondNormality, Mxalsh and Anonymous: 5
• Chi distribution Source: https://en.wikipedia.org/wiki/Chi_distribution?oldid=683998211 Contributors: Michael Hardy, Dbenbenn, Rich
Farmbrough, MisterSheik, PAR, Btyner, Krishnavedala, SmackBot, Tom Lougheed, Aastrup, Bluebot, Nbarth, Iwaterpolo, Odedee, Harish
victory, Ensign beedrill, User A1, TXiKiBoT, Sheppa28, Melcombe, SchreiberBike, Qwfp, DumZiBoT, FellGleaming, Addbot, Josevellezcal-
das, Meisam, Yobot, Wjastle, Erik9bot, Nicolas Perrault III, Kastchei, EmausBot, Vsbasto, ZéroBot, MelbourneStar, BeyondNormality and
Anonymous: 20
• Chi-squared distribution Source: https://en.wikipedia.org/wiki/Chi-squared_distribution?oldid=690329433 Contributors: AxelBoldt, Bryan
Derksen, The Anome, Ap, Michael Hardy, Stephen C. Carlson, Tomi, Mdebets, Ronz, Den fjättrade ankan~enwiki, Willem, Jitse Niesen,
Hgamboa, Fibonacci, Zero0000, AaronSw, Robbot, Sander123, Seglea, Henrygb, Robinh, Isopropyl, Weialawaga~enwiki, Giftlite, Dbenbenn,
BenFrantzDale, Herbee, Sietse, MarkSweep, Gauss, Zfr, Fintor, Rich Farmbrough, Dbachmann, Paul August, Bender235, MisterSheik, O18,
TheProject, NickSchweitzer, Iav, Jumbuck, B k, Kotasik, Sligocki, PAR, Cburnett, Shoefly, Oleg Alexandrov, Mindmatrix, Btyner, Rjwilmsi,
Pahan~enwiki, Salix alba, FlaBot, Alvin-cs, Pstevens, Bgwhite, Philten, Roboto de Ajvol, YurikBot, Wavelength, Jtbandes, Schmock, Tony1,
Zwobot, Jspacemen01-wiki, Reyk, Zvika, KnightRider~enwiki, SmackBot, Eskimbot, BiT, Afa86, Bluebot, TimBentley, Master of Pup-
pets, Silly rabbit, Nbarth, AdamSmithee, Iwaterpolo, Eliezg, Wen D House, A.R., G716, Saippuakauppias, Rigadoun, Loodog, Mgiganteus1,
Qiuxing, Funnybunny, Chris53516, Tawkerbot2, Jackzhp, CBM, Rflrob, Dgw, FilipeS, Blaisorblade, Talgalili, Thijs!bot, DanSoper, Lovi-
bond, Pabristow, MER-C, Plantsurfer, Mcorazao, J-stan, Leotolstoy, Wasell, VoABot II, Jaekrystyn, User A1, TheRanger, MartinBot, STBot,
Steve8675309, Neon white, Icseaturtles, It Is Me Here, TomyDuby, Mikael Häggström, Quantling, Policron, Nm420, HyDeckar, Sam Black-
eter, DrMicro, LeilaniLad, Gaara144, AstroWiki, Notatoad, Johnlv12, Wesamuels, Tarkashastri, Quietbritishjim, Rlendog, Sheppa28, Phe-
bot, Jason Goldstick, Tombomp, OKBot, Melcombe, Digisus, Volkan.cevher, Loren.wilton, Animeronin, ClueBot, Jdgilbey, MATThematical,
UKoch, SamuelTheGhost, EtudiantEco, Bluemaster, Qwfp, XLinkBot, Knetlalala, MystBot, Paulginz, Fergikush, Tayste, Addbot, Fgnievin-
ski, Fieldday-sunday, MrOllie, Download, LaaknorBot, Renatokeshet, Lightbot, Ettrig, Chaldor, Luckas-bot, Yobot, Wjastle, Johnlemarti-
rao, AnomieBOT, Microball, MtBell, Materialscientist, Geek1337~enwiki, EOBarnett, DirlBot, LilHelpa, Lixiaoxu, Xqbot, Eliel Jimenez,
Etoombs, Control.valve, NocturneNoir, GrouchoBot, RibotBOT, Entropeter, Shadowjams, Griffinofwales, Constructive editor, FrescoBot,
Tom.Reding, Stpasha, MastiBot, Gperjim, Fergusq, Xnn, RjwilmsiBot, Kastchei, Alph Bot, Wassermann7, Markg0803, EmausBot, John
of Reading, Yuzisee, Dai bach, Pet3ris, U+003F, Zephyrus Tavvier, Levdtrotsky, ChuispastonBot, Emilpohl, Brycehughes, ClueBot NG,
BG19bot, Analytics447, Snouffy, Drhowey, Dlituiev, Minsbot, Dexbot, HelicopterLlama, Limit-theorem, Ameer diaa, Idoz he, Zjbranson,
DonaghHorgan, Catalin.ghervase, BeyondNormality, Monkbot, Alakzi, Bderrett, Uceeylu, Michaelg2015, Zcollvee and Anonymous: 247
• Dagum distribution Source: https://en.wikipedia.org/wiki/Dagum_distribution?oldid=680740995 Contributors: Michael Hardy, Nabla, Rgdboer,
SmackBot, KylieTastic, Sheppa28, Melcombe, Addbot, MarkAHershberger, Csigabi, Shadowjams, ZéroBot, GonzoEcon, Solomon7968 and
BeyondNormality
• Exponential distribution Source: https://en.wikipedia.org/wiki/Exponential_distribution?oldid=689861550 Contributors: AxelBoldt, CYD,
Bryan Derksen, Taw, Taral, Enchanter, Isis~enwiki, Edward, Michael Hardy, Rp, Dcljr, Tomi, Cyp, Den fjättrade ankan~enwiki, Smack, Dcoet-
zee, Fvw, Wilke, Robbot, Benwing, Henrygb, Decrypt3, Giftlite, Zaha, MarkSweep, Gauss, Karl-Henner, Rich Farmbrough, Vsmith, Paul
August, MisterSheik, Mdf, Mwanner, Kappa, PAR, Cburnett, Jheald, Wyatts, Woohookitty, Mindmatrix, Igny, LOL, Btyner, A3r0, MekaD,
Rjwilmsi, Pruneau, Mathbot, Shaile, Bgwhite, YurikBot, RobotE, Zeno of Elea, Avraham, Johndburger, Closedmouth, MStraw, Nothlit, Ilmari
Karonen, Zvika, Shingkei, Zeycus, Cazort, Mcld, Remohammadi, TimBentley, Iwaterpolo, Memming, Aldaron, Spartanfox86, Mattroberts,
CapitalR, Markjoseph125, Erzbischof, Skittleys, Talgalili, Thijs!bot, Hsne, Headbomb, JAnDbot, IanOsgood, Beaumont, .anacondabot, Cof-
fee2theorems, A.M.R., User A1, R'n'B, Grinofadrunkwoman, Policron, Largoplazo, Jester7777, VolkovBot, Dudubur, TXiKiBoT, A4bot,
Rei-bot, Z.E.R.O., Groceryheist, SieBot, Rlendog, Sheppa28, Jason Goldstick, Aiden Fisher, OKBot, Zzxterry, Water and Land, Anchor Link
Bot, Melcombe, GorillaWarfare, The Thing That Should Not Be, WDavis1911, UKoch, Thegeneralguy, DragonBot, Skbkekas, Schreiber-
Bike, Qwfp, Mejjem, MystBot, Addbot, Drevicko, MrOllie, Butchbrody, HerculeBot, Luckas-bot, Yobot, Wjastle, Kan8eDie, AnomieBOT,
Ularevalo98, Sergey Suslov, ArthurBot, Bdmy, Isheden, NOrbeck, Kyng, FrescoBot, BenzolBot, Oysindi, Calmer Waters, Avabait, Stpasha,
Sss41, Amonet, ActivExpression, TobeBot, Duoduoduo, Kastchei, Wassermann7, Yuzisee, Yoyod, ZéroBot, Aria802, Zephyrus Tavvier,
Scortchi, ClueBot NG, Asitgoes, BarrelProof, Mpaa, Helpful Pixie Bot, CD.Rutgers, Boriaj, Hyoseok, Trombonechamp, René Vápeník,
ChrisGualtieri, Mogism, Burzuchius, SFK2, Linuxjava, Tertius51, Carloslizarragac, BeyondNormality, Jodawill, Garthtarr, 16Gred, GVpep,
Engheta, Melanie.zbrooks, Thefeudalspirit, Palavian, Uceeylu and Anonymous: 195
• F-distribution Source: https://en.wikipedia.org/wiki/F-distribution?oldid=688771796 Contributors: Bryan Derksen, Fnielsen, Michael Hardy,
Tomi, Mdebets, Cherkash, Dysprosia, Jitse Niesen, Robbot, Seglea, Henrygb, Robinh, Wile E. Heresiarch, Giftlite, MarkSweep, Oscar, O18,
Arthena, Cburnett, Jheald, Btyner, RichardWeiss, Salix alba, Elmer Clark, Nehalem, Timholy, SmackBot, Unyoyega, Adouzzy, Commander
Keane bot, TedE, Markjoseph125, Irwangatot, Thijs!bot, PBH~enwiki, DanSoper, Zorgkang, JAnDbot, Hectorlamadrid, Albmont, Livingth-
ingdan, Brenda Hmong, Jr, TomyDuby, Ged.R, Quietbritishjim, Sheppa28, OKBot, Melcombe, UKoch, Alexbot, Razorflame, Bluemaster,
Qwfp, XLinkBot, The Squicks, Addbot, DarrylNester, Fgnievinski, MrOllie, SpBot, Jan eissfeldt, HerculeBot, Luckas-bot, Yobot, Ptbotgourou,
Materialscientist, Xqbot, GrouchoBot, JokeySmurf, Pinethicket, Tom.Reding, Gperjim, Amonet, Kastchei, EmausBot, ZéroBot, Ethaniel,
Art2SpiderXL, Emilpohl, ClueBot NG, Asitgoes, HMSSolent, MusikAnimal, Califasuseso, IkamusumeFan, Kondormari, BeyondNormality,
Monkbot, Loraof and Anonymous: 49
• Fisher’s z-distribution Source: https://en.wikipedia.org/wiki/Fisher%27s_z-distribution?oldid=690265701 Contributors: Fnielsen, Dunchar-
ris, Eric Kvaalen, Stemonitis, Btyner, RichardWeiss, Rjwilmsi, Tevildo, Headbomb, VolkovBot, Hey jude, don't let me down, Sheppa28,
Melcombe, SchreiberBike, Qwfp, Addbot, Bte99, Lightbot, WikiDreamer Bot, WikitanvirBot, ZéroBot, BeyondNormality, Monkbot and
Anonymous: 5
• Folded normal distribution Source: https://en.wikipedia.org/wiki/Folded_normal_distribution?oldid=654482378 Contributors: Michael Hardy,
Smalljim, PAR, Btyner, Rjwilmsi, Vossman, SamuelRiv, SmackBot, O keyes, Alaibot, Karho.Yau, MarshBot, Krowsky, VolkovBot, Jeff G.,
Rlendog, Melcombe, Worfolk, Qwfp, Addbot, DOI bot, Qorilla, Dannaf, Citation bot 1, ZéroBot, ClueBot NG, CitationCleanerBot, Beyond-
Normality, Upliftmofoe and Anonymous: 12
• Fréchet distribution Source: https://en.wikipedia.org/wiki/Fr%C3%A9chet_distribution?oldid=683744017 Contributors: Michael Hardy,
Tomi, Arthena, Gene Nygaard, SmackBot, Xtaty~enwiki, Nutcracker, A. Pichler, VolkovBot, Rlendog, Sheppa28, Melcombe, Neznanec,
Rumping, Qwfp, Addbot, Yobot, AnomieBOT, Csigabi, Xqbot, FrescoBot, LucienBOT, Jonesey95, EmausBot, ZéroBot, JA(000)Davidson,
Asitgoes, Amr.rs, Helpful Pixie Bot, BG19bot, Risk modeler, BeyondNormality and Anonymous: 11
• Gamma distribution Source: https://en.wikipedia.org/wiki/Gamma_distribution?oldid=686932783 Contributors: Bryan Derksen, Fnielsen,
Michael Hardy, Tomi, Barak~enwiki, A5, Samsara, Phil Boswell, Robbot, Robbyjo~enwiki, Benwing, Gandalf61, Henrygb, Robinh, Dougg,
Ambarish, Brenton, Wile E. Heresiarch, Giftlite, Paul Pogonyshev, Darin, Fangz, MarkSweep, Gauss, MisterSheik, Bobo192, O18, Com-
plex01, PAR, Velella, Cburnett, LukeSurl, Linas, David Haslam, Smmurphy, Btyner, Jshadias, Rjwilmsi, Hgkamath, Chobot, Zebediah49,
DVdm, Bgwhite, Wavelength, Jlc46, Dobromila, Gaius Cornelius, Schmock, Zwobot, Entropeneur, Thomas stieltjes, Arthur Rubin, Erik144,
Mebden, Bo Jacoby, Zvika, SmackBot, Adfernandes, Patrick.Wuechner, Mcld, Aastrup, Eug, MalafayaBot, Adam Clark, Colonies Chris, Arg,
Iwaterpolo, Berland, Wiki me, Autopilot, Lambiam, Xtaty~enwiki, Qiuxing, Dicklyon, CapitalR, Freelancer685, Mjohnrussell, TestUser001,
Shorespirit, Talgalili, Thijs!bot, Wikid77, PBH~enwiki, Lovibond, Pichote, Jirka6, Frobnitzem, Stephreg, Albmont, Baccyak4H, Lfcampos,
Stevvers, User A1, Leyo, Cmghim925, Policron, STBotD, LoyalSoldier, Cerberus0, VolkovBot, DrMicro, Clay Spence, Asteadman, A4bot,
Mundhenk, Thric3, Quietbritishjim, SieBot, Rlendog, Tommyjs, Donmegapoppadoc, RSchlicht, Jason Goldstick, Melcombe, JL-Bot, ClueBot,
Dshutin, Alpapad, ClaudeLo, LSFenster, SamuelTheGhost, Shabbychef, True rover, Sun Creator, Frau K, Galapah, Qwfp, Bethb88, Sandrobt,
Gjnaasaa, Abtweed98, MystBot, Paulginz, Tayste, Addbot, CanadianLinuxUser, MrOllie, Yobot, Wjastle, Nallimbot, Langmore, AnomieBOT,
JonathanWilliford, Umpi77, Wiki5d, DirlBot, Bdmy, GrouchoBot, ChristopherKingChemist, Supergrane, Damiano.varagnolo, Entropeter,
FrescoBot, Nicolas Perrault III, Narc813, Aple123, Tom.Reding, AmphBot, Apocralyptic, Plasticspork, Amonet, Dinamik-bot, Bobmath,
Patrke, Bitlemon, Kastchei, ZéroBot, Quondum, SporkBot, Vminin, Mikhail Ryazanov, ClueBot NG, Mathstat, Xuehuit, Mpaa, Qzxpqbp,
WJVaughn3, Mich8611, Rockykumar1982, Nickfeng88, Josvebot, Solomon7968, Perspectiva8, Manoguru, Dlituiev, BattyBot, Illia Connell,
AppliedMathematics, Wqwz.wqwz1, Evan Aad, Drewblasius, Rodionova.alenka, Tengyaow, Maththerkel, BeyondNormality, Mingyuanzhou,
Berzoi075, Monkbot, Julesmath, Velvel2, Krzyswit2 and Anonymous: 236
• Generalized gamma distribution Source: https://en.wikipedia.org/wiki/Generalized_gamma_distribution?oldid=640669679 Contributors:
Brenton, Jheald, Drbreznjev, Headbomb, Bdemeshev, Melcombe, Qwfp, Yobot, Mathstat, Helpful Pixie Bot, BeyondNormality and Anony-
mous: 2
• Generalized Pareto distribution Source: https://en.wikipedia.org/wiki/Generalized_Pareto_distribution?oldid=682671471 Contributors: Michael
Hardy, Zickzack, Rlendog, Melcombe, UKoch, Leonsoftware, Yobot, AnomieBOT, Citation bot, Isheden, Mathstat, Helpful Pixie Bot, Moj-
dadyr, Dexbot, Limit-theorem, Roboloni, Maththerkel, BeyondNormality and Anonymous: 9
• Gamma/Gompertz distribution Source: https://en.wikipedia.org/wiki/Gamma/Gompertz_distribution?oldid=667197172 Contributors: Michael
Hardy, Dawnfire999, RHaworth, Rjwilmsi, Wikid77, MadmanBot, Melcombe, Dthomsen8, Yobot, FrescoBot, Jeffreyhokanson, BeyondNor-
mality, Monkbot and Anonymous: 3
• Gompertz distribution Source: https://en.wikipedia.org/wiki/Gompertz_distribution?oldid=684371322 Contributors: Michael Hardy, Dawn-
fire999, Rjwilmsi, Kinu, Naraht, Tim bates, Wikid77, Melcombe, Excirial, Muro Bot, Addbot, Yobot, WikiDan61, Ptbotgourou, AnomieBOT,
Csigabi, LilHelpa, Trappist the monk, Svarul, BattyBot, Chancelade, CalculusOfVariations, BeyondNormality, Monkbot and Anonymous: 13
• Half-normal distribution Source: https://en.wikipedia.org/wiki/Half-normal_distribution?oldid=669792770 Contributors: Michael Hardy,
Jérôme, PAR, Btyner, Iwaterpolo, Tevyeguy, User A1, VolkovBot, Rlendog, Melcombe, Qwfp, Addbot, Srw1138, Anders Sandberg, Wateenel-
lende, Wikipelli, ZéroBot, Chire, Fjoelskaldr, CarlWesolowski, BeyondNormality, Juen and Anonymous: 14
• Hotelling’s T-squared distribution Source: https://en.wikipedia.org/wiki/Hotelling%27s_T-squared_distribution?oldid=682569932 Contrib-
utors: Michael Hardy, Tomi, Cherkash, Robinh, Giftlite, MarkSweep, Bender235, 3mta3, Eric Kvaalen, PAR, Sean3000, Btyner, YurikBot,
Adfernandes, Ck lostsword, JorisvS, Thijs!bot, Escarbot, RogierBrussee, R'n'B, Ged.R, Slysplace, Thefellswooper, Melcombe, Shabbychef,
Qwfp, Addbot, Aboctok, LaaknorBot, Yobot, Xqbot, GrouchoBot, Kiefer.Wolfowitz, Tom.Reding, MehdiPedia, Amonet, Kastchei, Emaus-
Bot, ZéroBot, U3964057, BG19bot, Zzyxxyzz, Mgcampb, Attleboro, Gbstats, Wittawat, BeyondNormality, I am grungy and Anonymous:
13
• Inverse Gaussian distribution Source: https://en.wikipedia.org/wiki/Inverse_Gaussian_distribution?oldid=679292016 Contributors: Michael
Hardy, Tomi, Giftlite, MisterSheik, Oleg Alexandrov, David Haslam, Jfr26, Btyner, Krishnavedala, SmackBot, Aastrup, Iwaterpolo, Memming,
Oceanh, Rhfeng, LandruBek, Wikid77, Felipehsantos, LachlanA, Sterrys, Baccyak4H, User A1, Dima373, NickMulgan, DrMicro, Sheppa28,
Deavik, Melcombe, Joelecohen, Alexbot, Qwfp, Abtweed98, Addbot, Yobot, Wjastle, AnomieBOT, Kristjan.Jonasson, Vana Seshadri, En-
tropeter, FrescoBot, The real moloch57, RedBot, Dalba, Tjagger, ZéroBot, Batman50, Zfeinst, Braincricket, GKSmyth, BG19bot, Manoguru,
Dexbot, AppliedMathematics, Limit-theorem, Bluemix, BeyondNormality, Llbuaa and Anonymous: 35
• Lévy distribution Source: https://en.wikipedia.org/wiki/L%C3%A9vy_distribution?oldid=667008179 Contributors: Michael Hardy, Sebas-
tianHelm, Giftlite, Night Gyr, Delius, DBrane, Tsirel, Eric Kvaalen, Ynhockey, PAR, Gene Nygaard, Jfr26, Btyner, Mathbot, Krishnavedala,
Gaius Cornelius, Dysmorodrepanis~enwiki, Maechler, Digfarenough, SmackBot, Saihtam, Nbarth, Ligulembot, Caviare, Xcentaur, Gbellocchi,
Kloveland, Thijs!bot, Wainson, Lovibond, GirasoleDE, Rlendog, Sheppa28, Melcombe, Badger Drink, MelonBot, Qwfp, Addbot, AndersBot,
Tassedethe, 84user, Yobot, Csigabi, Ptrf, Kerack, GrouchoBot, FrescoBot, PyonDude, Kastchei, ZéroBot, WJVaughn3, Smarket~enwiki,
Helpful Pixie Bot, AvocatoBot, Uniquejeff, JamieBallingall, BeyondNormality, Parpid and Anonymous: 19
• Log-Cauchy distribution Source: https://en.wikipedia.org/wiki/Log-Cauchy_distribution?oldid=674541101 Contributors: Michael Hardy,
Corvi42, Rjwilmsi, A bit iffy, Racklever, Myasuda, Katharineamy, Rlendog, Melcombe, Qwfp, Addbot, Ben Ben, Citation bot, RjwilmsiBot,
ZéroBot, Helpful Pixie Bot, George Ponderevo, GoldAccount, Lemnaminor, Stamptrader, BeyondNormality and Anonymous: 1
• Log-Laplace distribution Source: https://en.wikipedia.org/wiki/Log-Laplace_distribution?oldid=637638283 Contributors: Michael Hardy,
Oleg Alexandrov, Ladislav Mecir, Alaibot, Johnlv12, Rlendog, Melcombe, Midx1004, Addbot, Luckas-bot, Yobot, Helpful Pixie Bot, Beyond-
Normality and Anonymous: 1
• Log-logistic distribution Source: https://en.wikipedia.org/wiki/Log-logistic_distribution?oldid=683180052 Contributors: John Vandenberg,
Jérôme, Rjwilmsi, PeterSymonds, Chris the speller, Aztek41, Fetchcomms, DonAndre, VolkovBot, Rlendog, Sheppa28, Melcombe, Sevilledade,
Qwfp, Addbot, GargoyleBot, Alexenderius, AndersBot, Yobot, Citation bot, FrescoBot, Citation bot 1, Tom.Reding, RjwilmsiBot, Asitgoes,
Masssly, Helpful Pixie Bot, Solomon7968, Illia Connell, Mark viking, Ddev55, BeyondNormality and Anonymous: 6
• Log-normal distribution Source: https://en.wikipedia.org/wiki/Log-normal_distribution?oldid=688668875 Contributors: AxelBoldt, Bryan
Derksen, Michael Hardy, Tomi, Tkinias, Cherkash, Jitse Niesen, Fredrik, Schutz, Meduz, Wile E. Heresiarch, Weialawaga~enwiki, Giftlite,
Bfinn, Paul Pogonyshev, Rubik-wuerfel, Urhixidur, Ta bu shi da yu, Ciemo, Dbachmann, Alue, ZeroOne, MisterSheik, Zachlipton, PAR, Pon-
tus, Cburnett, Danhash, Evil Monkey, ^demon, Jeff3000, Btyner, RichardWeiss, Rjwilmsi, Raddick, MZMcBride, FlaBot, Jimt075~enwiki,
Krishnavedala, Volunteer Marek, Nehalem, Bgwhite, YurikBot, Wavelength, Encyclops, Cleared as filed, Schmock, Cmglee, Lunch, SmackBot,
Unyoyega, Mcld, Nbarth, Iwaterpolo, Berland, Khukri, Autopilot, Ocatecir, Martinp23, Phoxhat, Osbornd, Mrdthree, A. Pichler, Floklk, Don-
keyKong64, Jackzhp, Myasuda, NonDucor, Thijs!bot, PBH~enwiki, Pichote, Erxnmedia, IanOsgood, Sterrys, BenB4, Magioladitis, Albmont,
Baccyak4H, User A1, Rmaus, Ricardogpn, Mange01, Thomasda, Lojikl, Mikael Häggström, VolkovBot, DrMicro, The Siktath, Philip True-
man, Leav, Edutabacman, Stigin, ColinGillespie, SieBot, Rlendog, Sheppa28, Gknor, Ciberelm, Acct4, Oxymoron83, Techman224, OKBot,
Sairvinexx, Water and Land, Melcombe, Martarius, ClueBot, David.hilton.p, Philtime~enwiki, Biochem67, Umpi~enwiki, Qwfp, Humanengr,
Skunkboy74, Porejide, Addbot, Fgnievinski, Till Riffert, MagnusA.Bot, Mdnahas, Wikomidia, ‫ירון‬, Seriousme, RobertHannah89, Yobot, 2D,
Wjastle, AnomieBOT, Erel Segal, Safdarmarwat, Frederic Y Bois, LilHelpa, Hxu, Isheden, Joxemai, Constructive editor, Rgbcmy, LucienBOT,
Hobsonlane, Nixphoeni, Gausseliminering, Jonesey95, Stpasha, MondalorBot, IhorLviv, Trappist the monk, Occawen, Dinamik-bot, Rjwilm-
siBot, Llnr, Vincent Semeria, Dewritech, GoingBatty, DavidMCEddy, Rhowell77, Letsgoexploring, Donner60, Ashkax, Christian Damgaard,
Nite1010, ClueBot NG, Asitgoes, Psorakis, P.O.E., Helpful Pixie Bot, Mishnadar, BG19bot, Fluctuator, Jan Spousta, Jetlee0618, Manoguru,
BattyBot, Cyberbot II, Dexbot, Kiwi4boy, Mogism, Lbwhu, StriatumPDM, Mivus, Limit-theorem, Kondormari, Epicgenius, AndreaGerali,
EJM86, Kuperov, HolgerBrandsmeier, Dmontier, BeyondNormality, Monkbot, Srijankedia, Ralequi, RESLND, GNAAWOWHM, Laubeg,
Isambard Kingdom, Guilhermesalome and Anonymous: 201
• Lomax distribution Source: https://en.wikipedia.org/wiki/Lomax_distribution?oldid=646099981 Contributors: Michael Hardy, Nabla, Head-
bomb, David Eppstein, Rlendog, Melcombe, UKoch, Addbot, AnomieBOT, Isheden, FrescoBot, Asitgoes, Mathstat, Purple Post-its, Beyond-
Normality and Anonymous: 4
• Geometric stable distribution Source: https://en.wikipedia.org/wiki/Geometric_stable_distribution?oldid=681853267 Contributors: Michael
Hardy, Myasuda, Vigyani, Rlendog, Melcombe, Qwfp, Addbot, Thehelpfulbot, SporkBot, Helpful Pixie Bot, Buffbills7701, BeyondNormality,
Levkleb and Anonymous: 1
• Nakagami distribution Source: https://en.wikipedia.org/wiki/Nakagami_distribution?oldid=617448036 Contributors: Michael Hardy, Selket,
Giftlite, Oleg Alexandrov, Hgkamath, Vegaswikian, Krishnavedala, SmackBot, Alksub, Mgiganteus1, Wafulz, Cydebot, Alaibot, Harish vic-
tory, Baccyak4H, VladimirSlavik, EightiesRocker, BusaJD~enwiki, Melcombe, Excesskurtosis, Qwfp, Addbot, AnomieBOT, EmausBot,
ZéroBot, ClueBot NG, Golaleh, BeyondNormality, Monkbot and Anonymous: 13
• Pareto distribution Source: https://en.wikipedia.org/wiki/Pareto_distribution?oldid=684130381 Contributors: AxelBoldt, Bryan Derksen,
The Anome, Heron, Edward, Michael Hardy, Tomi, Glenn, BenKovitz, Phil Boswell, Robbot, Benwing, Henrygb, Giftlite, Paul Pogonyshev,
SanderSpek~enwiki, Noe, Antandrus, Beland, MarkSweep, Chadernook, Vivacissamamente, Fenice, MisterSheik, Nannen, O18, JavOs, Eric
Kvaalen, PAR, Mindmatrix, David Haslam, Btyner, DaveApter, Rjwilmsi, Hgkamath, FlaBot, Bubbleboys, Carrionluggage, Clark Kent, Shell
Kinney, Nowa, Avraham, Lendu, ChemGardener, SmackBot, Reedy, Melchoir, Mcld, Jprg1966, Nbarth, Gruzd, A. B., Iwaterpolo, Hve,
Alexxandros, Dreftymac, Joseph Solis in Australia, Cyberyder, Jive Dadson, Courcelles, Vyznev Xnebara, Teratornis, Talgalili, Headbomb,
PBH~enwiki, WinBot, Widefox, Lovibond, Mack2, Olaf, Magioladitis, Am rods, Wprestong, User A1, Shomoita, Mange01, Policron, Tatrgel,
Paintitblack ft, DrMicro, Tkmckenzie, Philip Trueman, Enigmaman, PhysPhD, Rlendog, OKBot, Water and Land, Melcombe, ClueBot, Rock
soup, UKoch, LunaDeFerrari, Doobliebop, Qwfp, EdChem, Rror, Dthomsen8, SilvonenBot, MystBot, Addbot, MrVanBot, Yobot, Ptbot-
gourou, Wjastle, AnomieBOT, Citation bot, Sergey Suslov, DirlBot, Obersachsebot, Xqbot, DSisyphBot, Isheden, Srich32977, Chuanren,
Joxemai, Scalimani, Undsoweiter, Boxplot, Stpasha, Fentlehan, Trappist the monk, Msghani, Sander69, RjwilmsiBot, EmausBot, Fpoursafaei,
Chaohuang, P3^1$Problems, Cogiati, Ida Shaw, AMenteLibera, Financestudent, Mikhail Ryazanov, ClueBot NG, Asitgoes, Buenas días,
Mathstat, Marsianus, J sandstrom, Probabilityislogic, Helpful Pixie Bot, Mgil83, Bibcode Bot, Purple Post-its, ElphiBot, AvocatoBot, Plutol-
ogist, Lindsayprior, Pankaj303er, Mojdadyr, Illia Connell, Smason79, Frosty, Jpaulson77, BeyondNormality, Monkbot, Danvildanvil, Lmpt,
KasparBot and Anonymous: 121
• Pearson distribution Source: https://en.wikipedia.org/wiki/Pearson_distribution?oldid=679009835 Contributors: Michael Hardy, Tomi, Jitse
Niesen, Chuunen Baka, Benwing, Giftlite, MarkSweep, Rich Farmbrough, Bcat, Eric Kvaalen, PAR, Zzyzx11, Btyner, BD2412, Rjwilmsi,
Martinpeter, Doc glasgow, Mathbot, Jengelh, SmackBot, Tevyeguy, Dicklyon, CBM, Karho.Yau, Jeff560, Robin S, Plasticup, Melcombe,
Sun Creator, MelonBot, Qwfp, DumZiBoT, Addbot, DOI bot, Lightbot, Yobot, Xqbot, Citation bot 1, Kastchei, Ethaniel, Bibcode Bot, Illia
Connell, Anrnusna, BeyondNormality, Monkbot and Anonymous: 23
• Phase-type distribution Source: https://en.wikipedia.org/wiki/Phase-type_distribution?oldid=685873565 Contributors: Michael Hardy, Cameron
Dewe, Bjcairns, Charles Matthews, Benwing, Giftlite, ChrisRuvolo, Pearle, Btyner, Graham87, Chenxlee, Epolk, Basten, Gareth Jones, Smack-
Bot, Goolies flock, Abadpour, Skittleys, Bobblehead, LachlanA, Chaostik, Sarahj2107, R'n'B, M-le-mot-dit, Slysplace, Donmegapoppadoc,
Aiden Fisher, Melcombe, Mgrfan, Niceguyedc, Bender2k14, Muhandes, Aitias, GeorgeTh, Qwfp, Citation bot, Syngola, Foobarnix, Tim1357,
JA(000)Davidson, Robbiemorrison, BSide, LagrangeX, Helpful Pixie Bot, BattyBot, Illia Connell, Dexbot, BeyondNormality, Huxiangking
and Anonymous: 27
• Rayleigh distribution Source: https://en.wikipedia.org/wiki/Rayleigh_distribution?oldid=688531864 Contributors: Zundark, Michael Hardy,
Tomi, Timwi, Pigsonthewing, Giftlite, Paul August, El C, Kwamikagami, Kghose, O18, PAR, Rabarberski, Cburnett, Gene Nygaard, Btyner,
Nanite, Mathbot, BjKa, Kri, Chobot, Krishnavedala, Nzbuu, Splash, Petter Strandmark, Crasshopper, KnightRider~enwiki, Jonathanwagner,
Manfreeed, Mcld, Iwaterpolo, Can't sleep, clown will eat me, Memming, Ferminmx, LandruBek, Mrdthree, Dgianotti, Zylorian, Dougher,
Deflective, Olaf, Fabometric, Ensign beedrill, User A1, MartinBot, Steve8675309, Dbooksta, Anoko moonlight, OKBot, Melcombe, UKoch,
Mel aad, Alexbot, PixelBot, Cacadril, Qwfp, Xavierstuvw, Addbot, Luckas-bot, Yobot, Wjastle, Amirobot, AnomieBOT, In digma, Xqbot,
DSisyphBot, Control.valve, GrouchoBot, WaysToEscape, FrescoBot, Jc3s5h, Briardew, MastiBot, Bzzzzzzzzster, Kastchei, Yuzisee, Mathstat,
Larsribe, Skarmenadius, Davidagross, DudShan2, Cerabot~enwiki, Wavelet transformer, BeyondNormality, Teowey and Anonymous: 74
• Rayleigh mixture distribution Source: https://en.wikipedia.org/wiki/Rayleigh_mixture_distribution?oldid=547965312 Contributors: Bearcat,
Pigsonthewing, Dgianotti, Katharineamy, Melcombe and ArticlesForCreationBot

• Rice distribution Source: https://en.wikipedia.org/wiki/Rice_distribution?oldid=662647908 Contributors: Michael Hardy, Tomi, Phil Boswell,
GreatWhiteNortherner, Giftlite, Jason Quinn, Rich Farmbrough, MisterSheik, O18, Mdd, PAR, Btyner, BD2412, Hgkamath, Mosama, Gad-
get850, Sbyrnes321, SmackBot, Iwaterpolo, Rogerbrent, Dicklyon, WikiSlasher, Sijbers, User A1, Steve8675309, Mange01, Ged.R, Dbook-
sta, Phe-bot, Melcombe, Mild Bill Hiccup, Dndung, UKoch, Auntof6, Gsharaf, Qwfp, Voice In The Wilderness, Addbot, Fgnievinski, Yobot,
AnomieBOT, Xqbot, Control.valve, Aardliu, FrescoBot, Stpasha, Ofir michael, TobeBot, Lotje, Dinamik-bot, Kastchei, Yuzisee, Rename-
dUser01302013, ThinkOutsideTheBox101, StanfordCommSci, Yclnjust, Thatsnotaname, Mathstat, Helpful Pixie Bot, ServiceAT, Zvord, Be-
yondNormality and Anonymous: 40
• Shifted Gompertz distribution Source: https://en.wikipedia.org/wiki/Shifted_Gompertz_distribution?oldid=638037374 Contributors: Giftlite,
Dawnfire999, Oleg Alexandrov, Koavf, Krishnavedala, Iwaterpolo, Danhoppe, Mack2, User A1, Josuechan, Melcombe, Qwfp, DOI bot, Yobot,
Yonseca, Citation bot 1, Solarra, Helpful Pixie Bot, BeyondNormality and Anonymous: 8
• Type-2 Gumbel distribution Source: https://en.wikipedia.org/wiki/Type-2_Gumbel_distribution?oldid=585386070 Contributors: SimonP,
Charles Matthews, Selket, MarkSweep, Eric Kvaalen, PAR, SMesser, Btyner, SmackBot, Rrburke, Derek farn, Trakesht, Faermi, Aiden Fisher,
Philtime~enwiki, Qwfp, Addbot, MondalorBot, Penguinnerd121, Pgrinspan, Kondormari, BeyondNormality and Anonymous: 9
• Weibull distribution Source: https://en.wikipedia.org/wiki/Weibull_distribution?oldid=687881070 Contributors: AxelBoldt, Bryan Derksen,
Zundark, Michael Hardy, Tomi, Cyan, Doradus, MH~enwiki, Gobeirne, Giftlite, Tom harrison, Stern~enwiki, Uppland, Bender235, Mister-
Sheik, Kghose, O18, Smalljim, Craigy144, PAR, Wtmitchell, Cburnett, David Haslam, Btyner, Agriculture, Rjwilmsi, YurikBot, RobotE,
Corecode~enwiki, Anomalocaris, Gareth Jones, TDogg310, Avraham, Joanmg, Xareu bs, Darrel francis, Mebden, Sandeep4tech, SmackBot,
WalNi, Diegotorquemada, Jason A Johnson, Iwaterpolo, Eliezg, Argyriou, Chlewbot, Dmh~enwiki, Corfuman, RekishiEJ, A. Pichler, Alf-
pooh~enwiki, Lachambre, Jfcorbett, Janlo, Bgamari, Prof. Frink, Pleitch, Felipehsantos, Nick Number, G Furtado, LachlanA, AntiVandalBot,
Mack2, Gcm, Olaf, Homunq, JJ Harrison, KenT, Edratzer, GuidoGer, Samikrc, Kpmiyapuram, Salih, Policron, Sam Blacketer, VolkovBot,
Oznickr, TXiKiBoT, Rei-bot, Danielc192, Methou, Rlendog, Phe-bot, Dhatfield, OKBot, Water and Land, Melcombe, GioCM, Robertm-
baldwin, ClueBot, Alexbot, Calimo, Qwfp, DumZiBoT, Saad31, MystBot, Addbot, LaaknorBot, Tassedethe, Yobot, Wjastle, AnomieBOT,
Wiki5d, LilHelpa, Yanyanjun, Isheden, GrouchoBot, RibotBOT, FrescoBot, J6w5, Gausseliminering, Sławomir Biały, Citation bot 1, Stpasha,
RedBot, Trappist the monk, Michel192cm, Ale And Quail, Strypd, RjwilmsiBot, Jowa fan, EmausBot, WikitanvirBot, Wikipelli, Slawekb,
Fæ, JA(000)Davidson, Erianna, Rickysmithcmrp, Emilpohl, ClueBot NG, Asitgoes, WJVaughn3, Alexey Sanko, Helpful Pixie Bot, Epzsl2,
BG19bot, Sriharid, Arr0008, SirEdvin, ReconditeRodent, Dough34, Pioneer colonel, BeyondNormality, Mr.khassi, GVpep and Anonymous:
113

3.2 Images
• File:Ambox_important.svg Source: https://upload.wikimedia.org/wikipedia/commons/b/b4/Ambox_important.svg License: Public domain
Contributors: Own work, based off of Image:Ambox scales.svg Original artist: Dsmurat (talk · contribs)
• File:Benford-physical.svg Source: https://upload.wikimedia.org/wikipedia/commons/8/82/Benford-physical.svg License: Public domain Con-
tributors: Transferred from en.wikipedia to Commons by User:Tam0031 using CommonsHelper. Original artist: Drnathanfurious at en.wikipedia
• File:BenfordBroad.gif Source: https://upload.wikimedia.org/wikipedia/commons/d/de/BenfordBroad.png License: Public domain Contrib-
utors: Own work Original artist: Sbyrnes321
• File:BenfordNarrow.gif Source: https://upload.wikimedia.org/wikipedia/commons/c/c8/BenfordNarrow.gif License: Public domain Con-
tributors: Own work Original artist: Sbyrnes321
• File:Benfords_law_illustrated_by_world's_countries_population.png Source: https://upload.wikimedia.org/wikipedia/commons/0/0b/
Benfords_law_illustrated_by_world%27s_countries_population.png License: CC BY-SA 3.0 Contributors: Own work Original artist: Jakob.scholbach
• File:Bernoulli_distribution_chart.jpg Source: https://upload.wikimedia.org/wikipedia/commons/d/d5/Bernoulli_distribution_chart.jpg Li-
cense: CC BY-SA 4.0 Contributors: Own work Original artist: Runner1928
• File:Beta-binomial_cdf.png Source: https://upload.wikimedia.org/wikipedia/commons/a/a4/Beta-binomial_cdf.png License: CC BY-SA
3.0 Contributors: Own work Original artist: Nschuma
• File:Beta-binomial_distribution_pmf.png Source: https://upload.wikimedia.org/wikipedia/commons/e/e1/Beta-binomial_distribution_pmf.
png License: CC BY-SA 3.0 Contributors: Entirely my own work Original artist: Nschuma
• File:Beta_distribution_pdf.png Source: https://upload.wikimedia.org/wikipedia/commons/9/9a/Beta_distribution_pdf.png License: CC-
BY-SA-3.0 Contributors: ? Original artist: ?
• File:Beta_prime_cdf.svg Source: https://upload.wikimedia.org/wikipedia/commons/0/08/Beta_prime_cdf.svg License: CC0 Contributors:
Own work Original artist: Krishnavedala
• File:Beta_prime_pdf.svg Source: https://upload.wikimedia.org/wikipedia/commons/1/18/Beta_prime_pdf.svg License: CC0 Contributors:
Own work Original artist: Krishnavedala
• File:Binomial_Distribution.svg Source: https://upload.wikimedia.org/wikipedia/commons/b/b7/Binomial_Distribution.svg License: CC BY-
SA 3.0 Contributors: Own work. Derived from File:BinDistApprox large.png by Xiao Fei, released under GFDL/CC-BY-SA-3.0. Original
artist: cflm (talk)
• File:Binomial_distribution_cdf.svg Source: https://upload.wikimedia.org/wikipedia/commons/5/55/Binomial_distribution_cdf.svg License:
Public domain Contributors: Own work Original artist: Tayste
• File:Binomial_distribution_pmf.svg Source: https://upload.wikimedia.org/wikipedia/commons/7/75/Binomial_distribution_pmf.svg License:
Public domain Contributors: Own work Original artist: Tayste
• File:Biologist_and_statistician_Ronald_Fisher.jpg Source: https://upload.wikimedia.org/wikipedia/commons/3/37/Biologist_and_statistician_
Ronald_Fisher.jpg License: CC BY 2.0 Contributors: https://www.flickr.com/photos/internetarchivebookimages/20150531109/ Original artist:
Flickr commons
• File:CDF-log_normal_distributions.svg Source: https://upload.wikimedia.org/wikipedia/commons/4/42/CDF-log_normal_distributions.svg
License: CC0 Contributors: Own work Original artist: Krishnavedala
• File:Cauchy_pdf.svg Source: https://upload.wikimedia.org/wikipedia/commons/8/8c/Cauchy_pdf.svg License: CC BY 3.0 Contributors: Own
work Original artist: Skbkekas
• File:Chernoff_XS_CDF.png Source: https://upload.wikimedia.org/wikipedia/en/b/ba/Chernoff_XS_CDF.png License: CC-BY-SA-3.0 Con-
tributors:
Own work
Original artist:
Willem
• File:Chi-square_cdf.svg Source: https://upload.wikimedia.org/wikipedia/commons/0/01/Chi-square_cdf.svg License: CC BY 3.0 Contrib-
utors: Own work Original artist: Geek3
• File:Chi-square_distributionPDF.png Source: https://upload.wikimedia.org/wikipedia/commons/2/21/Chi-square_distributionPDF.png Li-
cense: Public domain Contributors: ? Original artist: ?
• File:Chi-square_pdf.svg Source: https://upload.wikimedia.org/wikipedia/commons/3/35/Chi-square_pdf.svg License: CC BY 3.0 Contrib-
utors: Own work Original artist: Geek3
• File:Chi_distribution_CDF.svg Source: https://upload.wikimedia.org/wikipedia/commons/1/15/Chi_distribution_CDF.svg License: CC0
Contributors: Own work Original artist: Krishnavedala
• File:Chi_distribution_PDF.svg Source: https://upload.wikimedia.org/wikipedia/commons/3/35/Chi_distribution_PDF.svg License: CC0
Contributors: Own work Original artist: Krishnavedala
• File:Chi_on_SAS.png Source: https://upload.wikimedia.org/wikipedia/en/9/96/Chi_on_SAS.png License: Fair use Contributors:
Provided by SAS Institute Inc. Originally published on the SAS blog here. Original artist: ?
• File:Commons-logo.svg Source: https://upload.wikimedia.org/wikipedia/en/4/4a/Commons-logo.svg License: ? Contributors: ? Original
artist: ?
• File:Comparison_mean_median_mode.svg Source: https://upload.wikimedia.org/wikipedia/commons/d/de/Comparison_mean_median_
mode.svg License: CC BY-SA 3.0 Contributors: Own work Original artist: Cmglee
• File:ComplementaryWalleniusNoncentralHypergeometric1.png Source: https://upload.wikimedia.org/wikipedia/en/f/fb/ComplementaryWalleniusNoncentralHypergeometric1.png License: Cc-by-sa-3.0 Contributors:
Own work
Original artist:
Arnold90 (talk) (Uploads)
• File:Cumulative_distribution_function_of_Pareto_distribution.svg Source: https://upload.wikimedia.org/wikipedia/commons/2/2a/Cumulative_
distribution_function_of_Pareto_distribution.svg License: CC BY-SA 3.0 Contributors: Own work Original artist: Danvildanvil
• File:Dagum_Distribution_(pdf).png Source: https://upload.wikimedia.org/wikipedia/commons/a/a1/Dagum_Distribution_%28pdf%29.png
License: CC BY-SA 3.0 Contributors: Own work Original artist: GonzoEcon
• File:Degenerate.svg Source: https://upload.wikimedia.org/wikipedia/commons/2/2a/Degenerate.svg License: CC BY-SA 4.0 Contributors:
Own work Original artist: IkamusumeFan
• File:Degenerate_distribution_PMF.png Source: https://upload.wikimedia.org/wikipedia/commons/9/9e/Degenerate_distribution_PMF.png
License: Public domain Contributors: ? Original artist: ?
• File:Exponential_cdf.svg Source: https://upload.wikimedia.org/wikipedia/commons/b/ba/Exponential_cdf.svg License: CC BY 3.0 Contrib-
utors: This graphic was created with matplotlib. Original artist: Skbkekas
• File:Exponential_pdf.svg Source: https://upload.wikimedia.org/wikipedia/commons/e/ec/Exponential_pdf.svg License: CC BY 3.0 Contrib-
utors: This graphic was created with matplotlib. Original artist: Skbkekas
• File:F_dist_cdf.svg Source: https://upload.wikimedia.org/wikipedia/commons/8/8e/F_dist_cdf.svg License: CC BY-SA 4.0 Contributors:
Own work Original artist: IkamusumeFan
• File:F_pdf.svg Source: https://upload.wikimedia.org/wikipedia/commons/9/92/F_pdf.svg License: CC BY-SA 4.0 Contributors: Own work
Original artist: IkamusumeFan
• File:Fisher_iris_versicolor_sepalwidth.svg Source: https://upload.wikimedia.org/wikipedia/commons/4/40/Fisher_iris_versicolor_sepalwidth.
svg License: CC BY-SA 3.0 Contributors: en:Image:Fisher iris versicolor sepalwidth.png Original artist: en:User:Qwfp (original); Pbroks13
(talk) (redraw)
• File:FishersNoncentralHypergeometric1.png Source: https://upload.wikimedia.org/wikipedia/en/2/22/FishersNoncentralHypergeometric1.
png License: Cc-by-sa-3.0 Contributors: ? Original artist: ?
• File:FitExponDistr.tif Source: https://upload.wikimedia.org/wikipedia/commons/e/e4/FitExponDistr.tif License: CC BY-SA 3.0 Contribu-
tors: Own work Original artist: Buenas días
• File:FitFrechetDistr.tif Source: https://upload.wikimedia.org/wikipedia/commons/8/8f/FitFrechetDistr.tif License: CC BY-SA 3.0 Contrib-
utors: Own work Original artist: Buenas días
• File:FitLog-logisticdistr.tif Source: https://upload.wikimedia.org/wikipedia/commons/9/98/FitLog-logisticdistr.tif License: Public domain
Contributors: Own work Original artist: Buenas días
• File:FitLogNormDistr.tif Source: https://upload.wikimedia.org/wikipedia/commons/f/f0/FitLogNormDistr.tif License: CC BY-SA 3.0 Con-
tributors: Own work Original artist: Buenas días
• File:FitParetoDistr.tif Source: https://upload.wikimedia.org/wikipedia/commons/f/f4/FitParetoDistr.tif License: CC BY-SA 3.0 Contribu-
tors: Own work Original artist: Buenas días
• File:FitWeibullDistr.tif Source: https://upload.wikimedia.org/wikipedia/commons/2/29/FitWeibullDistr.tif License: Public domain Con-
tributors: Own work Original artist: Buenas días
• File:Folded_normal_cdf.svg Source: https://upload.wikimedia.org/wikipedia/commons/4/40/Folded_normal_cdf.svg License: Public do-
main Contributors: ? Original artist: ?
• File:Folded_normal_pdf.svg Source: https://upload.wikimedia.org/wikipedia/commons/0/02/Folded_normal_pdf.svg License: Public do-
main Contributors: ? Original artist: ?
• File:Folder_Hexagonal_Icon.svg Source: https://upload.wikimedia.org/wikipedia/en/4/48/Folder_Hexagonal_Icon.svg License: Cc-by-sa-
3.0 Contributors: ? Original artist: ?
• File:Frechet_cdf.svg Source: https://upload.wikimedia.org/wikipedia/commons/d/dd/Frechet_cdf.svg License: GFDL Contributors: Self-
made using python with numpy and matplotlib. Original artist: user:Arthena
• File:Frechet_pdf.svg Source: https://upload.wikimedia.org/wikipedia/commons/e/e0/Frechet_pdf.svg License: GFDL Contributors: Self-
made using python with numpy and matplotlib. Original artist: user:Arthena
• File:Gamma-KL-3D.png Source: https://upload.wikimedia.org/wikipedia/commons/8/8b/Gamma-KL-3D.png License: CC BY-SA 3.0 Con-
tributors:
• Transferred from en.wikipedia by Ronhjones Original artist: Mundhenk at en.wikipedia
• File:Gamma-PDF-3D.png Source: https://upload.wikimedia.org/wikipedia/commons/b/b1/Gamma-PDF-3D.png License: CC BY-SA 3.0
Contributors:
• Transferred from en.wikipedia by Ronhjones Original artist: Mundhenk at en.wikipedia
• File:Gamma_Gompertz_cumulative_distribution_function.png Source: https://upload.wikimedia.org/wikipedia/commons/c/c7/Gamma_
Gompertz_cumulative_distribution_function.png License: CC BY-SA 3.0 Contributors: Own work Original artist: Dawnfire999
• File:Gamma_Gompertz_probability_distribution.png Source: https://upload.wikimedia.org/wikipedia/commons/9/95/Gamma_Gompertz_
probability_distribution.png License: CC BY-SA 3.0 Contributors: Own work Original artist: Dawnfire999
• File:Gamma_distribution_cdf.svg Source: https://upload.wikimedia.org/wikipedia/commons/8/8d/Gamma_distribution_cdf.svg License:
CC-BY-SA-3.0 Contributors:
• Gamma_distribution_cdf.png Original artist: Gamma_distribution_cdf.png: MarkSweep and Cburnett
• File:Gamma_distribution_pdf.svg Source: https://upload.wikimedia.org/wikipedia/commons/e/e6/Gamma_distribution_pdf.svg License:
CC-BY-SA-3.0 Contributors:
• Gamma_distribution_pdf.png Original artist: Gamma_distribution_pdf.png: MarkSweep and Cburnett
• File:Gompertz_cum_dist_nokey.png Source: https://upload.wikimedia.org/wikipedia/commons/2/2b/Gompertz_cum_dist_nokey.png Li-
cense: CC BY-SA 3.0 Contributors: Own work Original artist: Dawnfire999
• File:Gompertz_distrbution.png Source: https://upload.wikimedia.org/wikipedia/commons/5/54/Gompertz_distrbution.png License: CC BY-
SA 3.0 Contributors: Own work Original artist: Svarul
• File:Laplace_distribution_pdf.png Source: https://upload.wikimedia.org/wikipedia/commons/8/89/Laplace_distribution_pdf.png License:
CC-BY-SA-3.0 Contributors: ? Original artist: ?
• File:Levy0_LdistributionPDF.svg Source: https://upload.wikimedia.org/wikipedia/commons/5/5d/Levy0_LdistributionPDF.svg License:
CC0 Contributors: Own work Original artist: Krishnavedala
• File:Levy0_distributionCDF.svg Source: https://upload.wikimedia.org/wikipedia/commons/7/75/Levy0_distributionCDF.svg License: CC0
Contributors: Own work Original artist: Krishnavedala
• File:Levy0_distributionPDF.svg Source: https://upload.wikimedia.org/wikipedia/commons/6/6e/Levy0_distributionPDF.svg License: CC0
Contributors: Own work Original artist: Krishnavedala
• File:LevyDistribution.png Source: https://upload.wikimedia.org/wikipedia/commons/5/58/LevyDistribution.png License: Public domain
Contributors: ? Original artist: ?
• File:Logarithmic_scale.png Source: https://upload.wikimedia.org/wikipedia/commons/8/8a/Logarithmic_scale.png License: Public domain
Contributors: Transferred from fr.wikipedia to Commons by Esp2008 using CommonsHelper. Original artist: HB at French Wikipedia
• File:Logcauchycdf.svg Source: https://upload.wikimedia.org/wikipedia/commons/3/3c/Logcauchycdf.svg License: CC BY-SA 3.0 Contrib-
utors: Own work Original artist: Qwfp
• File:Logcauchypdf.svg Source: https://upload.wikimedia.org/wikipedia/commons/f/f3/Logcauchypdf.svg License: CC BY-SA 3.0 Contrib-
utors: Own work Original artist: Qwfp
• File:Loglogisticcdf.svg Source: https://upload.wikimedia.org/wikipedia/commons/7/75/Loglogisticcdf.svg License: CC BY-SA 3.0 Contrib-
utors: Own work (Original text: self-made) Original artist: Qwfp (talk)
• File:Loglogistichaz.svg Source: https://upload.wikimedia.org/wikipedia/commons/1/1b/Loglogistichaz.svg License: CC BY-SA 3.0 Contributors: Transferred from en.wikipedia
Original artist: Qwfp (talk) Original uploader was Qwfp at en.wikipedia
• File:Loglogisticpdf.svg Source: https://upload.wikimedia.org/wikipedia/commons/e/e1/Loglogisticpdf.svg License: CC BY-SA 3.0 Contrib-
utors: Transferred from en.wikipedia
Original artist: Qwfp (talk) Original uploader was Qwfp at en.wikipedia
• File:Mean_exp.svg Source: https://upload.wikimedia.org/wikipedia/commons/d/d6/Mean_exp.svg License: CC BY-SA 3.0 Contributors:
Own work Original artist: Erzbischof
• File:Median_exp.svg Source: https://upload.wikimedia.org/wikipedia/commons/c/cc/Median_exp.svg License: CC BY-SA 3.0 Contributors:
Own work Original artist: Erzbischof
• File:Nakagami_cdf.svg Source: https://upload.wikimedia.org/wikipedia/commons/8/82/Nakagami_cdf.svg License: CC0 Contributors: Own
work Original artist: Krishnavedala
• File:Nakagami_pdf.svg Source: https://upload.wikimedia.org/wikipedia/commons/d/d0/Nakagami_pdf.svg License: CC0 Contributors: Own
work Original artist: Krishnavedala
• File:PDF-log_normal_distributions.svg Source: https://upload.wikimedia.org/wikipedia/commons/a/ae/PDF-log_normal_distributions.svg
License: CC0 Contributors: Own work Original artist: Krishnavedala
• File:PDF_Generalized_Pareto.svg Source: https://upload.wikimedia.org/wikipedia/commons/6/6d/PDF_Generalized_Pareto.svg License:
Public domain Contributors: self generated Original artist: [roboloni]
• File:PDF_invGauss.svg Source: https://upload.wikimedia.org/wikipedia/commons/a/a5/PDF_invGauss.svg License: CC0 Contributors: Own
work Original artist: Krishnavedala
• File:PDF_of_Pareto_Distribution.svg Source: https://upload.wikimedia.org/wikipedia/commons/f/f6/PDF_of_Pareto_Distribution.svg Li-
cense: CC BY-SA 3.0 Contributors: Own work Original artist: Sam Mason
• File:ParetoLorenzSVG.svg Source: https://upload.wikimedia.org/wikipedia/commons/b/be/ParetoLorenzSVG.svg License: CC BY-SA 4.0
Contributors: Own work Original artist: Tkmckenzie
• File:Pascal's_triangle;_binomial_distribution.svg Source: https://upload.wikimedia.org/wikipedia/commons/1/17/Pascal%27s_triangle%3B_binomial_distribution.svg License: Public domain Contributors: Own work Original artist: Watchduck (a.k.a. Tilman Piesk)
• File:Pearson_system.png Source: https://upload.wikimedia.org/wikipedia/commons/9/90/Pearson_system.png License: Public domain Con-
tributors: ? Original artist: ?
• File:Pearson_type_VII_distribution_PDF.png Source: https://upload.wikimedia.org/wikipedia/commons/9/96/Pearson_type_VII_distribution_
PDF.png License: Public domain Contributors: ? Original artist: ?
• File:People_icon.svg Source: https://upload.wikimedia.org/wikipedia/commons/3/37/People_icon.svg License: CC0 Contributors: OpenCli-
part Original artist: OpenClipart
• File:Poisson_pmf.svg Source: https://upload.wikimedia.org/wikipedia/commons/1/16/Poisson_pmf.svg License: CC BY 3.0 Contributors:
Own work Original artist: Skbkekas
• File:Portal-puzzle.svg Source: https://upload.wikimedia.org/wikipedia/en/f/fd/Portal-puzzle.svg License: Public domain Contributors: ?
Original artist: ?
• File:Probability_density_function_of_Pareto_distribution.svg Source: https://upload.wikimedia.org/wikipedia/commons/1/11/Probability_
density_function_of_Pareto_distribution.svg License: CC BY-SA 3.0 Contributors: Own work Original artist: Danvildanvil
• File:Question_book-new.svg Source: https://upload.wikimedia.org/wikipedia/en/9/99/Question_book-new.svg License: Cc-by-sa-3.0 Con-
tributors:
Created from scratch in Adobe Illustrator. Based on Image:Question book.png created by User:Equazcion Original artist:
Tkgd2007
• File:Rayleigh_distributionCDF.svg Source: https://upload.wikimedia.org/wikipedia/commons/a/a9/Rayleigh_distributionCDF.svg License:
CC0 Contributors: Own work Original artist: Krishnavedala
• File:Rayleigh_distributionPDF.svg Source: https://upload.wikimedia.org/wikipedia/commons/6/61/Rayleigh_distributionPDF.svg License:
CC0 Contributors: Own work Original artist: Krishnavedala
• File:Rice_distribution_motivation.svg Source: https://upload.wikimedia.org/wikipedia/commons/a/a3/Rice_distribution_motivation.svg Li-
cense: CC0 Contributors: Own work Original artist: Sbyrnes321
• File:Rice_distributiona_CDF.png Source: https://upload.wikimedia.org/wikipedia/commons/4/41/Rice_distributiona_CDF.png License: CC-
BY-SA-3.0 Contributors: ? Original artist: ?
• File:Rice_distributiona_PDF.png Source: https://upload.wikimedia.org/wikipedia/commons/a/a0/Rice_distributiona_PDF.png License: CC-
BY-SA-3.0 Contributors: ? Original artist: ?
• File:Rozklad_benforda.svg Source: https://upload.wikimedia.org/wikipedia/commons/4/46/Rozklad_benforda.svg License: Public domain
Contributors: Own work Original artist: Gknor
• File:Shiftedgompertz_distribution_CDF.svg Source: https://upload.wikimedia.org/wikipedia/commons/1/17/Shiftedgompertz_distribution_
CDF.svg License: CC0 Contributors: Own work Original artist: Krishnavedala
• File:Shiftedgompertz_distribution_PDF.svg Source: https://upload.wikimedia.org/wikipedia/commons/0/07/Shiftedgompertz_distribution_
PDF.svg License: CC0 Contributors: Own work Original artist: Krishnavedala
• File:SkellamDistribution.png Source: https://upload.wikimedia.org/wikipedia/commons/b/b2/SkellamDistribution.png License: Public domain
Contributors: ? Original artist: ?
• File:Text_document_with_red_question_mark.svg Source: https://upload.wikimedia.org/wikipedia/commons/a/a4/Text_document_with_
red_question_mark.svg License: Public domain Contributors: Created by bdesham with Inkscape; based upon Text-x-generic.svg from the
Tango project. Original artist: Benjamin D. Esham (bdesham)
• File:Tukey_anomaly_criteria_for_Exponential_PDF.png Source: https://upload.wikimedia.org/wikipedia/commons/b/ba/Tukey_anomaly_
criteria_for_Exponential_PDF.png License: CC BY-SA 3.0 Contributors: Own work Original artist: Carlos Lizarraga-Celaya
• File:Two_red_dice_01.svg Source: https://upload.wikimedia.org/wikipedia/commons/3/36/Two_red_dice_01.svg License: CC0 Contribu-
tors: Open Clip Art Library Original artist: Stephen Silver
• File:Uniform_distribution.svg Source: https://upload.wikimedia.org/wikipedia/commons/c/c2/Uniform_distribution.svg License: CC BY-
SA 3.0 Contributors: Own work Original artist: Ben Moore
• File:Wald_Distribution_matplotlib.jpg Source: https://upload.wikimedia.org/wikipedia/commons/4/4f/Wald_Distribution_matplotlib.jpg
License: CC BY-SA 3.0 Contributors: Own work Original artist: Bluemix
• File:WalleniusNoncentralHypergeometric1.png Source: https://upload.wikimedia.org/wikipedia/en/f/fe/WalleniusNoncentralHypergeometric1.
png License: Cc-by-sa-3.0 Contributors: ? Original artist: ?
• File:WalleniusNoncentralHypergeometricRecursion1.png Source: https://upload.wikimedia.org/wikipedia/en/5/59/WalleniusNoncentralHypergeometricRecursion1.png License: Cc-by-sa-3.0 Contributors: ? Original artist: ?
• File:Weibull_CDF.svg Source: https://upload.wikimedia.org/wikipedia/commons/7/7e/Weibull_CDF.svg License: CC BY-SA 3.0 Contrib-
utors: Own work Original artist: Calimo, after Philip Leitch.
• File:Weibull_PDF.svg Source: https://upload.wikimedia.org/wikipedia/commons/5/58/Weibull_PDF.svg License: CC BY-SA 3.0 Contrib-
utors: Own work, after Philip Leitch. Original artist: Calimo
• File:Wiki_letter_w_cropped.svg Source: https://upload.wikimedia.org/wikipedia/commons/1/1c/Wiki_letter_w_cropped.svg License: CC-
BY-SA-3.0 Contributors:
• Wiki_letter_w.svg Original artist: Wiki_letter_w.svg: Jarkko Piiroinen
• File:Wikibooks-logo-en-noslogan.svg Source: https://upload.wikimedia.org/wikipedia/commons/d/df/Wikibooks-logo-en-noslogan.svg Li-
cense: CC BY-SA 3.0 Contributors: Own work Original artist: User:Bastique, User:Ramac et al.
• File:Youngronaldfisher2.JPG Source: https://upload.wikimedia.org/wikipedia/commons/a/aa/Youngronaldfisher2.JPG License: Public do-
main Contributors: https://www.adelaide.edu.au Original artist: Unknown

3.3 Content license

• Creative Commons Attribution-Share Alike 3.0
