


Creating new distributions using integration and summation by parts

arXiv:1904.01859v1 [math.ST] 3 Apr 2019

Rose Baker, School of Business
University of Salford, UK
email [email protected]

April 4, 2019

Abstract
Methods for generating new distributions from old can be thought of as techniques for simplifying integrals used in reverse. Hence integrating a probability density function (pdf) by parts provides a new way of modifying distributions; the resulting pdfs are integrals that sometimes require computation as special functions. Summation by parts can be used similarly for discrete distributions. The general methodology is given, with some examples of distribution classes and of specific distributions, and fits to data.

Keywords
Mixture distribution; partial integration; stochastic dominance; summation
by parts; discrete distribution; special functions

1 Introduction
Parametric models of probability distributions are essential for statistical inference. Hence a vast number of distributions of all types have been created, and a common way to generate new distributions is to modify an old one. Jones (2015) reviews the main techniques for generalizing univariate symmetric distributions, and Lai (2012) gives a comprehensive account of ways to modify survival distributions.
Transforming the random variable is probably the most popular method. It is a technique that is often used to evaluate unknown integrals, and is used 'in reverse' where the integral of the pdf (unity) is already known, to generate new pdfs. For example, the exponential distribution with survival function F̄(x) = exp(−αx) becomes the Weibull distribution with F̄(y) = exp(−(αy)^β) on setting x = α^{−1}(αy)^β.
Other integration techniques can be similarly used to create new distributions. For example, Azzalini's method (e.g. Azzalini and Capitanio, 2018) of transforming a symmetric pdf f(x) to 2w(x)f(x), where w(−x) = 1 − w(x), can be thought of as a trick to simplify asymmetric integrands of type w(x)f(x), used in reverse. These reflections prompt the thought that since a probability density function (pdf) must integrate to unity, all the 'tricks' used to simplify unknown integrals, if used in reverse, can be used to generate more complex integrands (pdfs).
A method for simplifying integrands is integration by parts (IBP), and the creation of new distributions using this method was introduced by Baker (2019). This article further explores the use of IBP and its discrete analogue, summation by parts (SBP), to generate new distributions.
Before giving the general methodology, we show the power of the method with an example, starting with the exponential distribution. This is a special case of a distribution given in Baker (2019). Write the exponential pdf f(t) = α exp(−αt) for α > 0 and t ≥ 0 as f(t) = −uv′, where u = αt^{1/2}, v′ = −exp(−αt)/t^{1/2}; this is just one of many choices of u and v that could be made.
Then v(t) = ∫_t^∞ x^{−1/2} exp(−αx) dx, and on evaluating the integral by changing variable to y, where αx = y²/2, we see that v(t) = 2α^{−1/2}√π Φ(−√(2αt)), where Φ is the normal distribution function. We have that u(0) = 0, v(∞) = 0, and hence using the method of parts, integrating v′ and differentiating u, the integrand (the new pdf) is

g(t) = α√π Φ(−√(2αt))/√(αt).

This is a 1-parameter distribution that can be used for modelling lifetimes when the hazard function is initially high; the pdf and hazard function are shown in figure 1. Further details are given in appendix A.
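As a numerical sanity check (an illustrative sketch, not part of the paper), this pdf can be evaluated with standard special functions and verified to integrate to unity:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def g(t, alpha=1.0):
    # Modified exponential pdf: g(t) = alpha*sqrt(pi)*Phi(-sqrt(2*alpha*t))/sqrt(alpha*t)
    return alpha * np.sqrt(np.pi) * norm.cdf(-np.sqrt(2.0 * alpha * t)) / np.sqrt(alpha * t)

# Split the range at t = 1 so quad handles the integrable t^(-1/2) singularity at zero
total = quad(g, 0, 1)[0] + quad(g, 1, np.inf)[0]
print(total)  # close to 1
```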
This example shows how potentially useful and fairly tractable new distributions can be generated quite easily using IBP, and illustrates the point that the pdf is often a special function.
The next section gives the general methodology and some general properties of the new distributions for the continuous case, and then some promising distribution classes are introduced. For discrete distributions, the method of summation by parts can be used analogously and gives broadly similar results, discussed in section 6.

Figure 1: The modified exponential pdf for α = 1 and its hazard function, along with the pdf and hazard function for the exponential distribution.
2 General theory
2.1 Deriving probability density functions
Some notation is now introduced and the basic idea described more formally.
The method gives a transformation of the integrand (pdf) leading to a new
pdf, and the two pdfs can conveniently be called ‘L’ and ‘R’, where the R
form stochastically dominates the L form, so that the probability mass is
further to the right. The original distribution is sometimes referred to here
as the ‘base’ distribution.
Let the support of a distribution be (xl , xh ), which covers distributions
defined on the whole real line, survival distributions, and doubly-bounded
distributions. Write the R-pdf as f (x) = −u(x)v ′ (x), where u is a positive
monotone increasing function so that u′ ≥ 0, with u(xl ) = 0, and v a positive
decreasing function so that v ′ (x) ≤ 0, with v(xh ) = 0. Then applying
integration by parts,
∫_{xl}^{xh} f(x) dx = 1 = [−u(x)v(x)]_{xl}^{xh} + ∫_{xl}^{xh} u′(x)v(x) dx.    (1)

The first term on the right vanishes, giving the L-pdf

g(x) = u′(x)v(x).    (2)

When u(xl) > 0 and v(xh) > 0, (1) yields a pdf

g(x) = u′(x)v(x)/{1 + u(xh)v(xh) − u(xl)v(xl)}.    (3)

Here by construction v(xh) = 0. In Baker (2019), u and v are crafted so that the simpler form (2) could be used. One can of course transform the integrand by going L → R from the base distribution instead of R → L as shown above, to obtain f(x) from g(x) by integrating u′ and differentiating v.

2.2 Distribution functions and stochastic dominance

Let the distribution functions corresponding to f(x), g(x) be F(x), G(x) respectively. Write the survival distributions 1 − F(x) = F̄(x) etc. Then integrating by parts again,

G(x) = ∫_{xl}^x u′(y)v(y) dy = [u(y)v(y)]_{xl}^x − ∫_{xl}^x u(y)v′(y) dy = u(x)v(x) + F(x),    (4)

and Ḡ(x) = F̄(x) − u(x)v(x). The L-distribution has probability mass shifted to the left, and the R-distribution dominates it stochastically, i.e. F̄(x) > Ḡ(x). The mean can be written as Eg(X) = Ef(X) − ∫_{xl}^{xh} u(y)v(y) dy. The R-distribution may or may not dominate when (3) is used, and

G(x) = {F(x) + u(x)v(x) − u(xl)v(xl)}/{1 − u(xl)v(xl)}.    (5)

The shifting of probability mass can be better understood by tagging the mass at x0 using a Dirac delta-function. Then we apply IBP to −u(x)v′(x)δ(x − x0), to obtain −u′(x)v′(x0) for x ≤ x0, else zero. Hence a unit probability mass at x0 has been smeared out onto the range (xl, x0) with pdf u′(x)/u(x0), distribution function u(x)/u(x0). Going from the L to the R distribution, the probability mass is smeared out over (x0, xh) with survival function v(x)/v(x0).
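For the modified exponential example of section 1 (α = 1, so u(t) = √t and v(t) = 2√π Φ(−√(2t))), the relation Ḡ(x) = F̄(x) − u(x)v(x) and the dominance F̄ > Ḡ can be checked numerically; a sketch:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

u = lambda t: np.sqrt(t)                                    # u(t) = alpha*t^(1/2), alpha = 1
v = lambda t: 2.0 * np.sqrt(np.pi) * norm.cdf(-np.sqrt(2.0 * t))
Fbar = lambda t: np.exp(-t)                                 # exponential (R) survival function
Gbar = lambda t: Fbar(t) - u(t) * v(t)                      # L survival function, Gbar = Fbar - u*v

# Gbar should also equal the direct integral of the L-pdf g = u'v over (t, infinity)
g = lambda t: np.sqrt(np.pi / t) * norm.cdf(-np.sqrt(2.0 * t))
print(Gbar(1.0), quad(g, 1.0, np.inf)[0])  # the two values agree
```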

2.3 Random numbers

The above suggests a method of generating random numbers from the L-distribution, given that we can generate them from the R-distribution F. Let X be a random number from F and V a uniform random number. Then a random variable Y from G satisfies u(Y)/u(X) = V, so that Y = u^{−1}{u(X)V}. If u can be inverted, this is a simple way to generate random numbers from the L-distribution.
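A sketch of this recipe for the modified exponential example (α = 1, so u(t) = √t and u^{−1}(z) = z²): here Y = XV², and since X and V are independent with E V² = 1/3, the sample mean should be near Eg(X) = 1/3.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

X = rng.exponential(size=n)    # draws from the R-distribution F (exponential)
V = rng.uniform(size=n)
Y = (np.sqrt(X) * V) ** 2      # Y = u^{-1}(u(X)*V) = X*V^2, draws from the L-distribution G

print(Y.mean())                # near E[X]*E[V^2] = 1/3
```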

2.4 Families of distributions

One usually seeks to generalize a pdf so that a family of distributions can be generated indexed by one or more parameters, with the original distribution as a special case. The freedom available in choosing u and v makes this possible, because if v′ decreases infinitely fast, the integral u(x)v(x) = −∫_x^{xh} u(x)v′(y) dy is zero, and g(x) = f(x) for some value of the indexing parameter (λ^{−1} = 0 in the examples given later). This makes large-sample inference easier, as one can e.g. use twice the log-likelihood increase on adding the new parameter in a chi-squared test of whether the model fit to a dataset has improved. It also enables parametric tests of stochastic dominance: appendix B gives a brief discussion.
The next section discusses particular classes of distributions generated using IBP.

3 Some distribution classes
3.1 Using a function of the distribution function for u
With u a function of F , G will be a function of F , so this yields general
families of distributions. The most interesting classes have G → F as the
parameter λ → ∞. In general, when du(x)/ dF (x) → ∞ as F → 0, the pdf
g(x) will be infinite at the lower limit xl , if f (xl ) is finite.

3.1.1 u(x) = F^λ(x)

Moving R → L, one choice is to take u(x) = F^λ(x), where λ > 0, so that v′(x) = −f(x)/F^λ(x). Then v(x) = ∫_{F(x)}^1 dy/y^λ = {1 − F^{1−λ}(x)}/(1 − λ). Hence the L-distribution has pdf

g(x) = λ {F^{λ−1}(x) − 1}/(1 − λ) f(x),    (6)

and distribution function

G(x) = {F^λ(x) − λF(x)}/(1 − λ).    (7)

This is a negative mixture of the original pdf with weight −λ/(1 − λ) and the top λ-th order statistic with weight 1/(1 − λ). As λ → ∞, G(x) → F(x), and as λ → 0, G(x) → 1, i.e. the probability mass is zero above the lower limit. When λ = 1, from (6) or directly, g(x) = −ln(F)f(x) and G(x) = {1 − ln(F(x))}F(x).
Expanding F = 1 − F̄ in a Taylor series for small F̄, we see in the right tail that Ḡ(x) ≃ (λ/2)F̄²(x), showing the pulling in of the right tail that follows from the movement of probability mass to the left. The hazard function is 2f(x)/F̄(x), twice that of the base distribution.
One may wonder how this is possible, given that G → F as λ → ∞. The solution to the paradox is that the tail where the hazard is double that of the base distribution occurs at larger and larger x as λ → ∞. For small F, when λ > 1 we have that the hazard function h(x) ≃ λf(x)/(λ − 1), so that the hazard and pdf are similar to the base distribution, but both are larger by a factor λ/(λ − 1). For λ < 1, we have that h(x) ≃ λF(x)^{λ−1}f(x)/(1 − λ), so that if f(x) is finite, the hazard would be infinite.
Going L → R, set v(x) = Ḡ^λ(x), so u′ = gḠ^{−λ} and u = {1 − Ḡ^{1−λ}}/(1 − λ), giving F̄ = {Ḡ^λ − λḠ}/(1 − λ). This is a negative mixture that is longer-tailed than was the L-distribution. When λ = 1, F̄ = {1 − ln(Ḡ)}Ḡ.
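A numerical sketch of the F^λ class with an exponential base distribution (an illustrative λ, not a value used in the paper): the pdf (6) should integrate to one, and (7) should approach F as λ grows.

```python
import numpy as np
from scipy.integrate import quad

F = lambda x: 1.0 - np.exp(-x)    # exponential base cdf
f = lambda x: np.exp(-x)          # exponential base pdf

def g(x, lam):
    # L-pdf, eq. (6): g = lam*(F^(lam-1) - 1)/(1 - lam) * f
    return lam * (F(x) ** (lam - 1.0) - 1.0) / (1.0 - lam) * f(x)

def G(x, lam):
    # L-cdf, eq. (7)
    return (F(x) ** lam - lam * F(x)) / (1.0 - lam)

lam = 2.5
total = quad(lambda x: g(x, lam), 0, np.inf)[0]
print(total)                       # the pdf integrates to one
print(G(2.0, 200.0) - F(2.0))      # small: G -> F as lambda -> infinity
```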

3.1.2 u = exp(λF)

On taking u(x) = exp{λF(x)} and using (5),

G = {λF − exp(−λ(1 − F)) + exp(−λ)}/{λ − 1 + exp(−λ)}.    (8)

Since H = {exp(−λ(1 − F)) − exp(−λ)}/{1 − exp(−λ)} is a distribution function, G is the negative mixture

G = λF/{λ − (1 − exp(−λ))} − (1 − exp(−λ))H/{λ − (1 − exp(−λ))}.

When F is exponential, H is a truncated extreme-value distribution.
As λ → ∞, (8) yields G → F, and as λ → 0, Ḡ → F̄², e.g. a component lifetime distribution tends to the distribution of the lifetime of a series system where either of two components must fail for a failure of the system.
In the tail where F̄ is small, expanding (8) shows that the hazard function is twice that of the base distribution (which is true for all x as λ → 0).
In the left tail,

G ≃ (1 − exp(−λ))λF/{λ − 1 + exp(−λ)},

i.e. the pdf is scaled up from the base distribution by a factor not exceeding 2. For survival distributions as base distribution, this distribution therefore tends to give an increasing hazard function.
Other increasing functions of F can also be used, e.g. u = exp(λF) − 1. This gives

G = F + (exp(λF) − 1) ln{(1 − exp(−λ))/(1 − exp(−λF))}/λ,

which is messy. Any increasing function can be used for u, leading to a vast number of possible (and messy) distributions.

3.2 Using the transformation u = t^λ for survival and doubly-bounded distributions

When u is a power of the random variable (time t), more specific results follow. All lifetime distributions must have a scale factor α. Without loss of generality, we set α = 1 for now. The base pdf may include a power of t. Let this term be t^{β−1}, where β is positive for Weibull and gamma distributions, and β = 1 for the lognormal distribution. Take u = t^{β−1+λ}. Then from (2)

g(t) = (β − 1 + λ)t^{β−2+λ} v(t).    (9)

We must have that β − 1 + λ > 0 so that the pdf is positive, hence λ > 1 − β.
Baker (2019) discusses this case in detail and defines reliability growth as

ξ = 1/(β − 1 + λ).    (10)

The mean lifetime is Eg(T) = {(β − 1 + λ)/(β + λ)}Ef(T), so the proportional increase in expected lifetime on eliminating inferior items is

∆E(T) = {Ef(T) − Eg(T)}/Eg(T) = 1/(β − 1 + λ) = ξ,

and the proportional decrease in lifetime caused by inferior items is 1/(β + λ) = ξ/(ξ + 1). As λ increases and the base distribution is regained, ∆E(T) goes to zero.

4 Special cases of continuous distributions

This section briefly discusses specific distributions; a great variety of distributions can be generated, of which these are only a small sample.
With the exponential distribution g(t) = exp(−t) as base, shifting right using the F̄^λ(t) transformation from section 3, we have Ḡ(x) = exp(−x) and F̄(x) = {exp(−λx) − λ exp(−x)}/(1 − λ). This is a 2-parameter phase-type distribution that occurs e.g. as the prevalence of intermediate radioactive decay products. The hazard function increases linearly with slope λ and levels off at min(1, λ). The moments are tractable, e.g. the mean is 1 + λ^{−1}.
Using the t^λ transformation to shift right yields f(x) = x exp(−x)/(λ + 1) + λ exp(−x)/(λ + 1), a mixture of exponential and gamma distributions.
Baker (2019) explored left-shifting distributions such as gamma and Weibull. These are special cases of the Stacy distribution. The Stacy or generalized gamma distribution has pdf

f(t) = αγ(αt)^{βγ−1} exp(−(αt)^γ)/Γ(β),

where α > 0, β > 0, γ > 0 and Γ is the gamma function. It includes gamma, Weibull and lognormal distributions as special cases. Going L → R with v(t) = exp(−(αt)^γ)/(αt)^λ yields a simple mixture of Stacy distributions, but going R → L is more productive. In integrating v′(t), the incomplete gamma function will be needed, defined as

Γ(a; t) = ∫_t^∞ x^{a−1} exp(−x) dx,    (11)

where a > 0. The gamma function itself is Γ(a) = Γ(a; 0). If a ≤ 0, Γ(a; t) is still defined but cannot be computed using software that computes the incomplete gamma function.
The resulting pdf is

g(t) = α(λ + βγ − 1)(αt)^{βγ+λ−2} Γ((1 − λ)/γ; (αt)^γ)/Γ(β),    (12)

and the survival function is

Ḡ(t) = F̄(t) − (αt)^{βγ+λ−1} Γ((1 − λ)/γ; (αt)^γ)/Γ(β).
The relation between the moments for L and R-distributions for the u = tλ
case was given in section 3.2.
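The pdf (12) can be checked numerically by evaluating Γ(a; t) for a ≤ 0 directly as an integral, in the spirit of the paper's NAG-based approach. A sketch with the illustrative choice α = β = γ = 1 and λ = 2, so that a = (1 − λ)/γ = −1:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma as gamma_fn

def inc_gamma(a, t):
    # Gamma(a; t) = int_t^inf x^(a-1) exp(-x) dx, usable for a <= 0 when t > 0
    return quad(lambda x: x ** (a - 1.0) * np.exp(-x), t, np.inf)[0]

def g(t, alpha=1.0, beta=1.0, gam=1.0, lam=2.0):
    # R -> L Stacy pdf, eq. (12); gam is the Stacy shape parameter gamma
    at = alpha * t
    return (alpha * (lam + beta * gam - 1.0) * at ** (beta * gam + lam - 2.0)
            * inc_gamma((1.0 - lam) / gam, at ** gam) / gamma_fn(beta))

total = quad(g, 0, np.inf)[0]
print(total)  # the pdf integrates to one
```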
The beta distribution is the most common doubly-bounded distribution. Here the pdf f(x) = B^{−1}(α, β)x^{α−1}(1 − x)^{β−1}, where B denotes the beta function. Taking u = x^{α+λ−1}, we have that v(x) = B(α, β)^{−1} ∫_x^1 y^{−λ}(1 − y)^{β−1} dy + c, where c ≥ 0. The constant of integration had to be zero for survival distributions as u(∞) = ∞, but here does not. Hence

v(x) = B(1 − λ, β; x)/B(α, β) + c,

where B(1 − λ, β; x) is the complement of the unregularized incomplete beta function. This yields the pdf

g(x) = u′v/(1 + c) = (α + λ − 1)x^{α+λ−2} v(x)/(1 + c).

The distribution function is

G(x) = {F(x) + x^{α+λ−1}(B(1 − λ, β; x)/B(α, β) + c)}/(1 + c),

where F(x) = 1 − B(α, β; x)/B(α, β).
On integrating gx^n by parts, we have that

Eg(X^n) = {(α − 1 + λ)/(1 + c)} {c/(α − 1 + λ + n) + Ef(X^n)/(α − 1 + λ + n)}.

We have that g(1) = (α + λ − 1)c/(1 + c). This 4-parameter distribution allows a much more flexible pdf than the beta distribution. The transformation can of course also be applied to 1 − X by changing X ↔ 1 − X in the transformed distribution. Note that the beta distribution is label-invariant (1 − X also follows a beta distribution) but the transformed distribution is not.
Random numbers can be generated as follows:

1. With U a uniform r.v., if U < c/(1 + c), generate Z = V^{1/(α−1+λ)}, where V is a uniform r.v.;
2. if U ≥ c/(1 + c), generate X, a r.v. from the parent beta distribution;
3. then Y = W^{1/(α−1+λ)} X, where W is a uniform r.v.

To keep generation of random numbers to a minimum, V and W could be generated by affine transformations on U.
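A sketch of this sampler with illustrative parameter values (α = 2, β = 3, λ = 1.5, c = 0.5, all chosen arbitrarily); the sample mean can be compared with the moment formula above, which gives Eg(X) = 3/7 here:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n, a=2.0, b=3.0, lam=1.5, c=0.5):
    # Modified-beta sampler; a, b are the base beta parameters, p = a + lam - 1
    p = a + lam - 1.0
    U = rng.uniform(size=n)
    out = np.empty(n)
    low = U < c / (1.0 + c)
    # branch 1: power-function component arising from the constant c
    out[low] = rng.uniform(size=low.sum()) ** (1.0 / p)
    # branch 2: smear a parent beta draw leftwards, Y = W^(1/p) * X
    k = (~low).sum()
    out[~low] = rng.uniform(size=k) ** (1.0 / p) * rng.beta(a, b, size=k)
    return out

y = sample(100_000)
print(y.mean())  # compare with E_g(X) = 3/7 for these parameter values
```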
On the whole real line, the normal distribution is the most important by
far, and has been skewed in many ways, e.g. Azzalini and Capitanio (2018).
Taking u(x) = exp(λx) leads to the lagged-normal or normal-exponential
distribution (e.g. Johnson et al 1995). This is exponential in the tail, and
can also be derived as the sum of normal and exponential random variables,
so is not new. The u = F λ transformation yields the class (7) which with
F (x) = Φ(x), where Φ is the normal distribution function, yields a skew
distribution where G(x) = {Φ(x)^λ − λΦ(x)}/(1 − λ). The moments cannot
be found simply.

5 Data fitting
The method generates vast numbers of distributions, some of which are new.
The aim of the analysis was merely to show by an example that the new
distributions can be useful.
In their book on survival analysis, Klein and Moeschberger (2003) use a
dataset of days to death of 863 kidney-transplant patients whose transplants
were performed at the Ohio State University Transplant Center between
1982 and 1992. Available covariates are age, gender and white/black. Only
140 patients’ survival times were not censored. We discretized age into 5
bands: 1-16, 17-32, 33-48, 49-64 and 65+ and fitted the distribution from
(12) with β = 1, i.e. a modified Weibull distribution. An accelerated time
model was used (e.g. Chiou et al, 2014), so that α = α0 exp(η^T X), where X is a vector of covariates and η a vector of regression coefficients. The model parameters are then η, α0 the baseline time-scale, γ the Weibull shape parameter, and λ, reparameterized so that ξ as defined in (10) is used instead of λ. The fit was done by maximum likelihood in a purpose-written Fortran program. The incomplete gamma function for negative argument is not
available as a standard special function, and so was evaluated as an integral
for a ≤ 0 in (11). The NAG library routine D01AMF was used. This routine
transforms the integration range to be finite and then integrates adaptively.
Similar routines exist on many platforms.

Figure 2: Hazard function for the modified Weibull distribution fitted to the kidney transplant data of Klein and Moeschberger (2003) for the 33-48 and 65+ age groups.

The fit to data cannot be shown because of the censoring, but figure 2
shows the hazard function of the fitted distribution for the central and oldest
age-bands. From this it can be seen that the hazard of death decreases
steadily after transplant, effectively to a constant for the central age-band,
but starts increasing again for the top age band. This result is consistent
with hazard plots using all patients produced by various methods by Klein
and Moeschberger (2003).
The table shows fitted parameters and standard errors. Standard errors
are large, sometimes very large, because little information comes from the
censored survival times. The baseline age group was 33-48 years. It can be
seen that the hazard of death increases rapidly from the lowest age band, and
then increases more slowly with age. Women have a slightly higher hazard
of death than men, and people of colour higher than white, but these effects
are not statistically significant. Because γ > 1, the hazard of death would
rise eventually for all age groups. The value of 1.6 for ξ means that, for
a given set of covariate values, the expected lifetime could be increased by 160% of its current value, if all patients could be made to respond as well
as the best ones. This suggests that much could potentially be gained by
studying and improving survival rates.

Parameter             Estimate      Standard error
α0                    0.576×10−4    0.261×10−4
Shape γ               3.315         4.683
Mixing parameter ξ    1.603         0.123
Gender                0.048         0.240
Black/white           0.243         0.295
1-16 age band         -29.685       168.886
17-32 age band        -2.354        0.530
49-64 age band        -0.558        0.266
65+ age band          1.338         0.377

Table 1: Estimated parameters and standard errors for the fit to the kidney
transplant dataset.

6 Discrete distributions from summation by parts

A way to generalize discrete (count data) distributions is to use partial sum distributions. Wimmer and Mačutek (2012) give a general definition of partial-sum distributions, where for parent (base) distributions with support on the non-negative integers, with probabilities p_0, p_1, ..., the descendant probabilities q_i are given by q_i = Σ_{j≥i} f(i, j)p_j, where f is a real function. The simplest such distribution has q_i = µ^{−1} Σ_{j=i+1}^∞ p_j, where µ is the mean of the parent distribution; this is a discrete form of the length-biased distributions found in renewal theory. Wimmer and Altmann (2001) describe more general distributions of this type, and Johnson et al (2005) give a summary.
Here a methodology is given that yields classes of partial-sum distributions. The two classes that have been explored in detail reduce to the parent distribution when a parameter λ or r → ∞. We shall call them r-Poisson, λ-binomial, etc. The r-class interpolates between a distribution and a variant of its length-biased form.
The population mean is needed for statistical inference. For example, one often wishes to regress the mean on some covariates. This is computationally more difficult (but still possible) when the mean cannot be readily computed in terms of model parameters. The class of r-distributions has the advantage of tractable moments and so allows more straightforward inference.
The general methodology is next given, followed by the properties of the two classes of distributions, and an example of a fit to data.

6.1 The general pmf

The mathematical technique of summation by parts (SBP) can be used to transform distributions into new ones. The SBP identity can be written

−Σ_{i=m}^n u_i(v_{i+1} − v_i) = u_m v_m − u_n v_{n+1} + Σ_{i=m+1}^n v_i(u_i − u_{i−1})    (13)

for any u_i, v_i. The proof follows from the telescoping property of the sum Σ_{i=m+1}^n (u_i v_{i+1} − u_{i−1} v_i) = u_n v_{n+1} − u_m v_{m+1}. Distributions are commonly defined on the integers 0 to n, where usually n = ∞, the main exception being the binomial distribution. Let a 'parent' probability mass function (pmf) be p_i = −u_i(v_{i+1} − v_i) ≥ 0, with u_i ≥ 0, u_{i+1} − u_i ≥ 0 and v_{n+1} = 0. Since p_i ≥ 0, v_i ≥ v_{i+1} ≥ 0. Then from (13)

q_i = v_i(u_i − u_{i−1})/(1 − u_m v_m)    (14)

is also a pmf.
Set m = −1 and set the pmf p_{−1} = 0. Then from (14), for i ≥ 0,

q_i = v_i(u_i − u_{i−1})/(1 − u_{−1}v_{−1}).

Since v_{i+1} − v_i = −p_i/u_i and v_{n+1} = 0, it follows that v_i = Σ_{j=i}^n p_j/u_j. Since p_{−1} = 0, v_{−1} = v_0. The function u_i can be chosen to be zero at i = m = −1, e.g. u_i = (i + 1)^λ where λ > 0, or r^i − 1/r where r > 1. In this case simply q_i = (Σ_{j=i}^n p_j/u_j)(u_i − u_{i−1}). Otherwise

q_i = (Σ_{j=i}^n p_j/u_j)(u_i − u_{i−1})/(1 − u_{−1}v_0).    (15)

Using (13) in this way, both parent and descendant distributions have the same upper limit n and hence the same support.
Note that for q_i ≥ 0 we require u_{−1}v_0 < 1. Since v_0 = Σ_{j=0}^n p_j/u_j < 1/u_0 and u_{−1} < u_0, we have that u_{−1}v_0 < 1 as required. If u is indexed by a parameter r so that u_{i+1}/u_i → ∞ as r → ∞, then from (15) q_i → p_i as r → ∞. This is so because v_i → p_i/u_i, u_i − u_{i−1} → u_i, and u_{−1}v_0 → p_0(u_{−1}/u_0) → 0. Hence the class of descendant distributions generalizes the parent distribution.

6.2 The discrete distribution function and moments

Denoting distribution functions for parent and descendant distributions by F_k, G_k respectively, we have from (13) that

G_k = {F_k + u_k v_{k+1} − u_{−1}v_0}/(1 − u_{−1}v_0).    (16)

Hence the distribution function is readily calculable once the v_i are calculated and the pmf is known. If u_{−1} = 0, G_k > F_k and the parent stochastically dominates the descendant distribution.
From (15) the mean µ_g is

µ_g = Σ_{i=0}^n i(u_i − u_{i−1}) Σ_{j=i}^n p_j/u_j / (1 − u_{−1}v_0).    (17)

On reversing the order of summation,

µ_g = {µ − Σ_{i=1}^n (Σ_{j=0}^{i−1} u_j) p_i/u_i}/(1 − u_{−1}v_0).    (18)

For the r-th non-central moment,

µ_g^{(r)} = {µ^{(r)} − Σ_{i=1}^n (Σ_{j=0}^{i−1} {(j + 1)^r − j^r} u_j) p_i/u_i}/(1 − u_{−1}v_0).    (19)

6.3 Random numbers

A unit probability mass at k for the parent distribution becomes a pmf r_i = (u_i − u_{i−1})/(u_k − u_{−1}) for i ≤ k, so random numbers can be generated by generating a random number K from the parent distribution, then choosing a random number I with probability r_i. If K is not too large, this can be done by generating a uniform random number U, and choosing the smallest i such that Σ_{j=0}^i r_j > U. This is the table lookup method (e.g. Shmerling, 2013).
From (16) on setting k = 0, we have that q_0 = (p_0 + u_0 v_1 − u_{−1}v_0)/(1 − u_{−1}v_0), so q_0 > p_0 when u_{−1} = 0, which makes these distributions useful when the parent distribution (e.g. Poisson) underfits the probability of no events occurring.
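A sketch for a Poisson parent with u_i = (i + 1)^λ (so u_{−1} = 0): for this u the cumulative probabilities Σ_{j≤i} r_j = u_i/u_K invert in closed form, which avoids an explicit lookup table. The empirical probability of zero can be compared with q_0 from (20) in the next subsection (λ and µ below are arbitrary illustrative values).

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(2)
lam, mu, n = 2.0, 2.1, 200_000

u = lambda i: (i + 1.0) ** lam      # u_i = (i+1)^lambda, u_{-1} = 0

K = rng.poisson(mu, size=n)         # draws from the parent distribution
U = rng.uniform(size=n)
# smallest i with u(i)/u(K) > U  <=>  i = ceil((U*u(K))^(1/lam) - 1)
I = np.ceil((U * u(K)) ** (1.0 / lam) - 1.0).astype(int)
I = np.maximum(I, 0)                # guard against the measure-zero case U = 0

# compare the empirical P(I = 0) with q_0 = sum_j p_j/(j+1)^lam from eq. (20)
j = np.arange(200)
q0 = (poisson.pmf(j, mu) / (j + 1.0) ** lam).sum()
print((I == 0).mean(), q0)
```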

6.4 The u_i = (i + 1)^λ distribution

With the choice u_i = (i + 1)^λ, u_{−1} = 0 and the new system of distributions

q_i = {(i + 1)^λ − i^λ} Σ_{j=i}^n p_j/(j + 1)^λ    (20)

is obtained.
As λ → ∞, q_i → p_i, and as λ → 0, q_0 → 1 and all the probability mass is at zero.
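A sketch computing (20) for a Poisson parent (arbitrary illustrative µ and λ), checking normalization and the λ → ∞ limit:

```python
import numpy as np
from scipy.stats import poisson

def descendant_pmf(lam, mu, n_max=80):
    # q_i = ((i+1)^lam - i^lam) * sum_{j>=i} p_j/(j+1)^lam, eq. (20), truncated at n_max
    i = np.arange(n_max + 1)
    p = poisson.pmf(i, mu)
    tail = np.cumsum((p / (i + 1.0) ** lam)[::-1])[::-1]   # sum over j >= i
    return ((i + 1.0) ** lam - i ** lam) * tail

q = descendant_pmf(2.0, 2.1)
print(q.sum())                        # close to one (truncation error only)
q_big = descendant_pmf(40.0, 2.1)
p = poisson.pmf(np.arange(81), 2.1)
print(np.abs(q_big - p).max())        # small: q -> p as lambda grows
```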
The case where λ = 1 is briefly mentioned in Johnson et al (2005). In general, the moments are intractable, unless λ = 1. There (18) enables the mean to be found, and higher moments are also tractable. Letting p_j → p_{j+1} and setting p_0 = 0 gives

q_i = {(i + 1)^λ − i^λ} Σ_{j=i+1} p_j/j^λ.

This is a class of distributions that generalizes the Bissinger system of distributions (Johnson et al 2005, p. 509), in which λ = 1.
The choice u_i = r^i, where r > 1, is more tractable and is discussed next.
6.5 The u_i = r^i distribution

Here u_i = r^i for r > 1. Define H_i(x) = Σ_{j=i}^n p_j x^j; then v_i = H_i(1/r) and

q_i = H_i(1/r) r^i (1 − 1/r)/(1 − H_0(1/r)/r),    (21)

as u_{−1} = 1/r. Also,

G_i = {F_i + r^i H_{i+1}(1/r) − H_0(1/r)/r}/{1 − H_0(1/r)/r}.

The pmf for a Poisson parent distribution is shown in figure 3.
The function H_0 is the probability generating function (pgf) for the parent distribution. From (21), on reversing the order of summation, the pgf M(s) for the derived distribution is

M(s) = {(r − 1)/(rs − 1)} {sH_0(s) − H_0(1/r)/r}/{1 − H_0(1/r)/r}.

From this, the moments can be read off, e.g. the mean µ_g is

µ_g = (µ + 1)/(1 − H_0(1/r)/r) − r/(r − 1).    (22)

Figure 3: Probabilities for the Poisson distribution with µ = 2.1, and for the r-distribution with r = 1.5 and r = 5.
The pgf H_0 is known in analytic form for the major discrete distributions, e.g. for the Poisson, H_0(1/r) = exp(−(1 − 1/r)µ). Hence the moments of the descendant distribution can be found as functions of the parameters of the parent distribution.
Random numbers can be computed using the general method described earlier. This now particularizes to the following: generate a random number K from the parent distribution. Then compute

m = ln{(r^K − r^{−1})U + r^{−1}}/ln(r),

where U is a uniform [0, 1] random number. Then the integer part of m + 1 is M, a r.v. from the descendant distribution. This is the inverse-transform method (Shmerling, 2013).
As r → ∞ we have that q_i → p_i, while when r → 1 the numerator and denominator of (21) go to zero. Applying L'Hospital's rule and using dH_0(x)/dx|_{x=1} = µ, q_i → (p_i + Σ_{j=i+1}^n p_j)/(1 + µ), a partial sum distribution corresponding to the parent distribution.
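A sketch for a Poisson parent with µ = 2.1 and r = 1.5 (the values used in figure 3): compute the pmf (21) by a reversed cumulative sum, then check the closed-form mean (22) against the direct sum Σ i q_i.

```python
import numpy as np
from scipy.stats import poisson

mu, r, n_max = 2.1, 1.5, 120
i = np.arange(n_max + 1)
p = poisson.pmf(i, mu)

# H_i(1/r) = sum_{j>=i} p_j (1/r)^j, via a reversed cumulative sum
H = np.cumsum((p * (1.0 / r) ** i)[::-1])[::-1]
q = H * r ** i * (1.0 - 1.0 / r) / (1.0 - H[0] / r)        # eq. (21)

H0 = np.exp(-(1.0 - 1.0 / r) * mu)                         # Poisson pgf at 1/r
mean_closed = (mu + 1.0) / (1.0 - H0 / r) - r / (r - 1.0)  # eq. (22)
print(q.sum(), mean_closed, (i * q).sum())
```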

6.6 Long-tailed discrete distributions

Going from q to p, a new class of long-tailed distributions can be obtained. For doubly-bounded discrete distributions, one can use (15), with p_i → p_{n−i}. In general, (13) is used in reverse. For example, for the λ-distributions, set v_i = (i + 1)^{−λ} − (n + 2)^{−λ}, so u_i − u_{i−1} = q_i/{(i + 1)^{−λ} − (n + 2)^{−λ}}. This yields u_i = Σ_{j=0}^i q_j/{(j + 1)^{−λ} − (n + 2)^{−λ}}, so that the pmf p_i is given by

p_i = {(i + 1)^{−λ} − (i + 2)^{−λ}} Σ_{j=0}^i q_j/{(j + 1)^{−λ} − (n + 2)^{−λ}}.

As n → ∞ this simplifies to

p_i = {(i + 1)^{−λ} − (i + 2)^{−λ}} Σ_{j=0}^i q_j (j + 1)^λ.

For such distributions p_0 = (1 − 2^{−λ})q_0, so p_0 < q_0, and the probability of zero events is reduced. The probabilities p_i are easy to compute, as an infinite sum is not required.
The r-distribution with n = ∞ is trivial: the pgf is simply

M(s) = {(r − 1)/(r − s)} H_0(s),

the product of the parent pgf and the pgf of a geometric distribution with Prob(X = k) = r^{−k}(1 − 1/r). Hence the effect of SBP has been to add a geometric random variable to the original.

6.7 Other discrete distributions

Each functional form for u_i gives a new family of distributions, and if u_i can increase arbitrarily fast with i, the parent distribution can be regained, so the descendant distribution will generalize the parent. For example, with u_i = r^i − 1/r one obtains

q_i = r^i (1 − 1/r) Σ_{j=i}^n p_j/(r^j − 1/r),

as u_{−1} = 0. As r → 1, q_i → Σ_{j=i}^n p_j/(j + 1). In general, the mean cannot be simply expressed. Clearly, more parameters than one can be introduced, and a host of new distributions created. In general, they will be quite messy, for example when u_i = r^{√(i+1)}. However, there may well be other functional forms for u_i that give attractive distributions.
An interesting curiosity is a distribution obtained by setting u_i = Π_{s=1}^t (i + s), where t is a positive integer. This has u_{−1} = 0, and u_i − u_{i−1} = t Π_{s=1}^{t−1} (i + s). The pmf is therefore

q_i = t Π_{s=1}^{t−1} (i + s) Σ_{j=i}^n p_j / Π_{k=1}^t (j + k).

The mean is tractable, and from (17), µ_g = tµ/(t + 1). From (19) the variance is

σ_g² = tσ²/(t + 2) + tµ²/{(t + 1)²(t + 2)} + tµ/{(t + 1)(t + 2)},

where σ² is the variance of the parent distribution. As t → ∞ the parent distribution is regained. This distribution generalizes the t = 1 case mentioned briefly in Johnson et al (2005).

7 Inference

We usually wish to study the effect of covariates on the mean µ_g. The mean µ_g is therefore predicted as µ_g = µ_0 exp(β^T X), where X is a vector of covariates. For the example (Section 8), for the r-distributions, the model parameter µ was found for each case from (22), using Newton-Raphson iteration, which usually converged in 4 iterations. The pgf is H0(x) = {1 + µα(1 − x)}^{−1/α} for the negative binomial distribution. The probabilities q_i were then computed and fits were made by maximum likelihood.

Note that for the λ-distributions the mean is not readily calculable in terms of the parent distribution parameters. Model fitting is still possible, however, because the mean µ_g can be computed using the probabilities q_i taken up to some large cutoff value of i, and the Newton-Raphson iteration for µ can still be done.
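This numerical scheme can be sketched as follows (Python; our own illustration, not the paper's Fortran code). For checkability we use the t = 1 partial-sum family with a Poisson parent, for which µ_g = µ/2 is known exactly, and a finite-difference derivative stands in for the analytic one:

```python
import math

def descendant_mean(mu, n=150):
    """Mean of the t = 1 partial-sum distribution with a Poisson(mu) parent,
    computed from the probabilities q_i up to a large cutoff n."""
    p = [math.exp(-mu)]
    for k in range(1, n):
        p.append(p[-1] * mu / k)
    q = [sum(p[j] / (j + 1) for j in range(i, n)) for i in range(n)]
    return sum(i * qi for i, qi in enumerate(q))

def solve_mu(target, mu0=1.0, h=1e-6, tol=1e-10):
    """Newton-Raphson for the parent mu giving descendant mean = target."""
    mu = mu0
    for _ in range(50):
        f = descendant_mean(mu) - target
        if abs(f) < tol:
            break
        df = (descendant_mean(mu + h) - descendant_mean(mu - h)) / (2 * h)
        mu -= f / df
    return mu

print(abs(solve_mu(1.25) - 2.5) < 1e-6)   # mu_g = mu/2, so mu should be 2.5
```

For this family the mapping µ → µ_g is linear, so the iteration converges almost immediately; in less tractable cases more iterations are needed, consistent with the four quoted above.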

8 Example

Hilbe (2011) fits negative binomial distributions to a number of datasets. One is the ‘affairs’ dataset, with 601 observations from Fair (1978), reporting counts of extramarital affairs over a year in the USA. Table 2 shows the results of fitting the Poisson and NB distributions, and of adding the r parameter, and Table 3 shows the covariates and the regression coefficients. The mean quoted is for the average values of the covariates.

Model                   −ℓ        AIC
Poisson                 1426.8    2871.54
Poisson+r               1126.62   2273.24
Negative Binomial        728.10   1476.20
Negative Binomial + r    711.45   1444.91

Table 2: Model fits for models of the extramarital affairs data of Fair (1978), showing minus the log-likelihood and the Akaike Information Criterion.

Clearly, adding the r parameter to the negative binomial model has improved the fit significantly. The model has gone to the limiting case of the partial-sum distribution where r → 1. The conclusions about the effect of covariates agree broadly with Hilbe's analysis. The significant predictors are self-rating of the marriage from unhappy to happy on a scale of 1-5, degree of religiosity on a scale from 1 to 5 (anti to very), and years of marriage. Religious people with a happy marriage who have not been married long have fewest affairs.

This example shows how the new distributions can improve model fit and so enable better inference. Computations were done with a purpose-written Fortran program that used the Numerical Algorithms Group (NAG) function minimisers.

Variable         Coeff              p
Mean             1.873 (.388)       -
α                3.02 (.155)        -
r                1                  -
Gender           -.064 (.265)       .81
Age              -.0229 (.0188)     .226
Years married    .107 (.0355)       .0024
Children?        .113 (.307)        .366
Religious?       -.415 (.100)       .000036
Educ. level      -.000610 (.0560)   .991
Occupation       .0737 (.0801)      .920
Rating           -.447 (.099)       .0000003

Table 3: Fitted parameter values for the NB+r model of the extramarital affairs data. Standard errors are given in parentheses. The last column is the p-value for a test that the regression coefficient is zero.

9 Conclusions

Integration by parts is a general method that yields a cornucopia of distributions, only a few of which have been explored here. Some are new, while some have been derived by other methods and now have a new characterization. This could be useful, e.g., in generating random numbers.

IBP is an addition to other general methods for modifying distributions, such as transforming variables. Summation by parts can be used similarly for discrete distributions, and its usefulness may be relatively greater, as the variety of methods for generating new distributions from old is more limited in the discrete case.

One can shift the probability mass of a distribution left or right. Shifting it left is useful when dealing with failure-time distributions in reliability, where the left-shifted distributions can reproduce the bathtub-shaped hazard functions sometimes seen in practice (e.g. Lai and Xie, 2006). For discrete distributions, the left-shifted distributions have a higher probability of zero events occurring. This zero-inflation can substantially improve the model fit to data. Shifting probability mass right gives long-tailed distributions, which are needed in many areas, e.g. finance.
Comparing modification of distributions by integration by parts with the transformation-of-variables method, one could sum up as follows:

1. transformation gives a simple pdf, whereas IBP gives the pdf as an integral, which may be simple or may be a special function;

2. the distribution function for the transformation-of-variables method may not be tractable, but for IBP it is easily derived once the pdf is known and is simply related to it by (4);

3. the IBP method has the further property of yielding a new distribution that either stochastically dominates the original, or is dominated by it. This property can be used in a test of stochastic dominance as described in appendix B;

4. the IBP method changes the skewness of distributions, but cannot change tail length whilst leaving skewness unchanged, as does e.g. the arcsinh transformation applied to the normal distribution. However, IBP could be applied twice, e.g. R → L and then L → R, to do this by lengthening each tail in turn.
The methodology does not apply to circular distributions, but it is possible that an analogous method could be developed there. Possible multivariate applications have not been discussed. Clearly, with a bivariate distribution one can shift the probability mass of X given Y, to obtain a new distribution with a different marginal distribution for X, but the original marginal distribution for Y. This type of procedure will not lead to a copula, because the marginal distribution of X has changed, but such distributions could still be useful.

It is possible that other integration techniques, such as contour integration, could also be profitably used. Much further work could be done in modifying distributions using partial integration, either developing the general methodology or using it to derive more new distributions. Fast computation of the resulting special functions such as (11) is also needed.
The mathematical method of summation by parts enables the creation of new families of discrete distributions that generalize existing distributions. Any increasing function defined on the non-negative integers (strictly, also on −1) gives a family of distributions, and two such functions, u_i = (i + 1)^λ and u_i = r^i, have been discussed. The latter gives a relatively tractable distribution, with moments that can be found analytically in terms of the parameters of the parent distribution when the pgf can be found analytically. The distributions can be used for modelling data and statistical inference, and could give a useful sensitivity analysis when a model such as the negative binomial has been fitted. The class of long-tailed distributions of the form q_i = Σ_{j≤i} f(i, j)p_j is completely new.
Further work could include the derivation of new classes of distribution using other functions for u_i, and more detailed exploration of their properties. Bivariate distributions have not been considered here, but new bivariate distributions could be derived from old by applying SBP to X for each level of Y.

References
[1] Azzalini, A. and Capitanio, A. (2018). The skew normal and related
families, Cambridge University Press, Cambridge UK.

[2] Baker, R. D. (2019). New survival distributions that quantify the gain
from eliminating flawed components, Reliability Engineering and Sys-
tem Safety, in press.

[3] Barrett, G. F. and Donald, S. G. (2003). Consistent tests for stochastic dominance, Econometrica, 71, 71-104.

[4] Chiou, S. H., Kang, S. and Yan, J. (2014). Fast accelerated failure time
modeling for case-cohort data, Statistics and Computing, 24, 559–568.

[5] Fair, R. (1978). A theory of extramarital affairs, Journal of Political Economy, 86, 45-61.

[6] Hilbe, J. M. (2011). Negative Binomial Regression, 2nd ed., Cambridge University Press, Cambridge.

[7] Johnson, N.L., Kemp, A. W., Kotz, S. (2005). Univariate Discrete Dis-
tributions, 3rd ed., Wiley, New York.

[8] Johnson, N.L., Kotz, S., Balakrishnan, N. (1995). Continuous Univariate Distributions, 2nd ed., Wiley, New York.

[9] Jones, M. C. (2015). On families of distributions with shape parameters, International Statistical Review, 83, 175-192.

[10] Klein, J. P. and Moeschberger, M. L. (2003). Survival Analysis: Techniques for Censored and Truncated Data, 2nd ed., Springer, New York.

[11] Lai, C-D. and Xie, M. (2006). Stochastic Ageing and Dependence for Reliability, Springer, Berlin.

[12] Lai, C-D. (2012). Constructions and applications of lifetime distributions, Applied Stochastic Models in Business and Industry, 29, 127-140.

[13] Shmerling, E. (2013). A range reduction method for generating discrete random variables, Statistics and Probability Letters, 83, 1094-1099.

[14] Wimmer, G. and Altmann, G. (2001). A new type of partial-sums distributions, Statistics and Probability Letters, 52, 359-364.

[15] Wimmer, G. and Mačutek, J. (2012). New integrated view at partial-sums distributions, Tatra Mountains Mathematical Publications, 51, 183-190.

Appendix A: the modified exponential distribution

The distribution is a special case of the modified Stacy distribution where β = γ = 1, λ = 1/2. Some properties of this distribution are now given without proof, for completeness, and to show the tractability of the distribution. The pdf is initially infinite, and is exponential in the tail. The survival function is

Ḡ(t) = exp(−αt) − 2√π Φ(−√(2αt))(αt)^{1/2}.

The hazard function h(t) = g(t)/Ḡ(t) decreases from infinity at t = 0 to a constant α at large t.

The moments are E(T^n) = α^{−n} n!/(1 + 2n), giving E(T) = 1/(3α), E(T^2) = (2/5)α^{−2}, var(T) = (13/45)α^{−2}. The coefficient of variation is thus √(13/5) ≃ 1.61245, compared with 1 for the exponential distribution. Random numbers are generated as T = −U^2 ln(V)/α, where U, V are uniformly distributed on [0, 1].

The third moment about the mean is (502/945)α^{−3}, giving a skewness of (502/91)√(5/13) ≃ 3.42118, larger than the value 2 for the exponential distribution. The excess kurtosis is 17 + 145/1183 ≃ 17.12257, compared to 6 for the exponential distribution.
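The random-number recipe can be checked by simulation; a short Python sketch (our own check, not from the paper) compares the sample mean and variance with 1/(3α) and (13/45)α^{−2}:

```python
import math
import random

random.seed(12345)
alpha, m = 2.0, 200_000

# T = -U^2 ln(V)/alpha with U, V uniform on [0, 1)
ts = [-(random.random()**2) * math.log(random.random()) / alpha
      for _ in range(m)]

mean = sum(ts) / m
var = sum((x - mean)**2 for x in ts) / m
print(abs(mean - 1/(3*alpha)) < 0.01,
      abs(var - (13/45)/alpha**2) < 0.01)
```

The recipe works because T = U²E/α with E standard exponential has E(T^n) = E(U^{2n})E(E^n)/α^n = α^{−n}n!/(1 + 2n), matching the moments above.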

Appendix B: Stochastic dominance tests

Given two samples of returns from investments, one might wish to test for (first-order) stochastic dominance of investment B by investment A (e.g. Barrett and Donald, 2003). A possible test is to fit a parametric form to the sample, with the indexing parameter fixed at zero (giving the R-distribution) for sample A, and floating for sample B. The chi-squared test or another test that λ^{−1} ≠ 0 can then be used to test whether investment A dominates B stochastically. An exact test would compute λ̂, then permute observations between the two groups A and B, refitting to generate the null distribution of λ̂. The p-value is the proportion of permuted samples for which λ̂ is no greater than the observed value. This test can also be done with discrete distributions of gains from investment or gambling. Note that, unlike the description here, H0 is often taken to be that there is stochastic dominance. This area of inference is complex and not the main subject of this article, but clearly good parametric tests could be devised.
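The permutation scheme can be sketched generically (Python; the names are ours, and a difference of sample means stands in for the fitted λ̂, which would require the full model fit):

```python
import random

def permutation_p_value(a, b, stat, n_perm=2000, seed=7):
    """p-value = proportion of permuted replicates whose statistic is
    no greater than the observed value, as in the test described above."""
    rng = random.Random(seed)
    observed = stat(a, b)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if stat(pooled[:len(a)], pooled[len(a):]) <= observed:
            count += 1
    return count / n_perm

# stand-in statistic; in the test described above, stat would refit the
# parametric model to the permuted grouping and return lambda-hat
mean_diff = lambda a, b: sum(a)/len(a) - sum(b)/len(b)
a = [1.2, 0.8, 1.5, 2.0, 1.1, 0.9]
b = [0.4, 0.7, 0.5, 0.9, 0.6, 0.3]
p = permutation_p_value(a, b, mean_diff)
print(0.0 <= p <= 1.0)
```

Replacing mean_diff with a function that refits the parametric family and returns λ̂ gives the exact test described in the text.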
