FRR 2020-21 Reconstructing Probability Distributions Using Quantile-Based Splines


Vol. 5, 2020-21

Reconstructing Probability Distributions using Quantile-based Splines

Hugo Hernandez
ForsChem Research, 050030 Medellin, Colombia
[email protected]

doi: 10.13140/RG.2.2.14827.36645

Abstract

An alternative method for the reconstruction of probability density functions from data
samples is presented, based on the description of the observed cumulative relative frequency
using piecewise cubic polynomials (C² cubic splines). The coefficients of the polynomial
segments (curvature parameters) are estimated using the quantiles of data as knots, and
setting the boundaries either as clamped (forcing the probability density function to be zero at
extremes of the data range), or natural (forcing the derivative of the probability density
function to be zero at the extremes of the data range). In addition, the conditions for
monotonicity of the cumulative probability function are derived, in order to guarantee non-
negative probability density functions. If the monotonicity conditions are not met, the values of
the curvature parameters must be found by constrained optimization. The curvature
parameter values obtained can be used to estimate all the moments of the distribution.
Different representative examples are presented for illustrating this method.

Keywords

Clamped Boundaries, Cumulative Probability, Deciles, Monotonicity, Natural Boundaries,
Polynomial, Probability Density, Quantiles, Quartiles, Splines

1. Introduction

Different methods for the reconstruction of probability density functions using a sample of
data from a population were presented and discussed in a previous report [1]. Two main types
of reconstruction methods were considered: 1) Reconstruction from the observed cumulative
probability distribution of the data sample, and 2) Reconstruction from the distribution
moments estimated from the data sample. Among the moment-based reconstruction
methods, a cubic spline approach and a general polynomial approach were considered. One of the main
difficulties of those polynomial methods was the possibility of obtaining negative values in the

01/12/2020 ForsChem Research Reports Vol. 5, 2020-21 (1 / 23)



probability density function. In addition, polynomial models were not considered for
reconstructing probability density functions from observed cumulative probability
distributions.

In this report, an alternative cumulative probability-based reconstruction method using cubic
splines is proposed. The method is denoted as Quantile-based Splines, because the knots (or
nodes) of the splines will be determined from the data quantiles, instead of using arbitrary
equidistant nodes. The quantiles of a data sample represent the values for which a specified
fraction of the data values is less than or equal to the value of the quantile [2]. In other words,
the quantiles divide the data into a fixed number of equiprobable groups. Thus, if 2
equiprobable groups are obtained, the quantiles are denoted as bitiles. If 4 equiprobable
groups are obtained, the quantiles are denoted as quartiles. If 10 equiprobable groups are
obtained, the quantiles are denoted as deciles. If 100 equiprobable groups are obtained, the
quantiles are denoted as percentiles. In general, a data sample containing 𝑛 different
observations can be distributed in any natural number 𝑁 of equiprobable groups (denoted in
general as 𝑁-quantiles).
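The definition above can be illustrated with a minimal sketch that returns the N + 1 knots (sample minimum, interior N-quantiles, sample maximum) of a data sample. The function name `n_quantiles` and the linear interpolation between order statistics are assumptions made for illustration; the report does not state which interpolation convention is used.

```python
def n_quantiles(data, n_groups):
    """Return the knots x_0..x_N splitting `data` into N equiprobable groups.

    x_0 and x_N are the sample minimum and maximum. Interior knots use
    linear interpolation between order statistics (an assumed convention).
    """
    xs = sorted(data)
    last = len(xs) - 1
    knots = []
    for i in range(n_groups + 1):
        pos = i * last / n_groups       # fractional index into the sorted sample
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        knots.append(xs[lo] + frac * (xs[hi] - xs[lo]))
    return knots


sample = [2.0, 9.0, 4.0, 7.0, 1.0, 6.0, 3.0, 8.0, 5.0]
quartile_knots = n_quantiles(sample, 4)
print(quartile_knots)  # [1.0, 3.0, 5.0, 7.0, 9.0]
```

Other quantile conventions would give slightly different interior knots for small samples.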

This method also differs with respect to the moment-based splines in that the estimation of the
moments of the distribution is not required. Also, while negative probability density values are
less likely to occur, it is also possible to incorporate additional monotonicity constraints in
order to avoid such erroneous results.

Finally, it is important to mention that this method is not related to any quantile regression
method [3], particularly to quantile smoothing splines regression methods [4,5]. In those cases,
the quantiles of a dependent variable are described as functions (e.g. splines) of independent
variables. In the present situation, splines will be used for describing the cumulative probability
function of the distribution by using the quantiles as knots. The current method might be
considered as a relative of the histospline method introduced in the 1970’s [6], where splines
are used to smooth the histogram of a data sample (not the cumulative probability function).

2. Quantile-based Splines Reconstruction Method

The method proposed in this report consists of describing the cumulative probability function
(𝜙) of a certain variable 𝑋, using splines. A spline is “a thin strip of wood used in building
construction”, which was used as an aid for drawing smooth curves between specified points
[7]. This idea was extended to the use of piecewise polynomials for interpolating functions
when certain specified points are known (knots). It is desirable that the different piecewise
polynomials guarantee continuity of the function and its derivatives, particularly at the knots. A
common approach is using cubic polynomials guaranteeing continuity up to the second
derivative (C² cubic splines [8]). For our purposes of fitting the cumulative probability function,
monotonicity must also be guaranteed [9].

Let us begin describing the cumulative probability function using the following C² cubic spline
interpolation functions [10]:

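$$
\begin{aligned}
\phi_i(x) = {} & \frac{k_i}{6}\left(\frac{(x - x_{i-1})^3}{x_i - x_{i-1}} - (x - x_{i-1})(x_i - x_{i-1})\right) + \frac{k_{i-1}}{6}\left(\frac{(x_i - x)^3}{x_i - x_{i-1}} - (x_i - x)(x_i - x_{i-1})\right) \\
& + \frac{i(x - x_{i-1}) + (i - 1)(x_i - x)}{N(x_i - x_{i-1})}, \qquad x_{i-1} \le x \le x_i, \quad i = 1, 2, \ldots, N
\end{aligned}
\tag{2.1}
$$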

where 𝜙𝑖 represents the cumulative probability function for the 𝑖-th equiprobable segment, 𝑥 is
any possible value of the variable 𝑋, 𝑥𝑖 represents the 𝑖-th 𝑁-quantile of the data, and 𝑘𝑖 is the
curvature parameter at the 𝑖-th quantile.

Eq. (2.1) satisfies the following conditions:

$$ \phi_1(x_0) = 0 \tag{2.2} $$

$$ \phi_i(x_i) = \phi_{i+1}(x_i) = \frac{i}{N}, \qquad i = 1, 2, \ldots, N-1 \tag{2.3} $$

$$ \phi_N(x_N) = 1 \tag{2.4} $$

This means that the spline functions perfectly describe all the quantiles considered.
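A minimal sketch of how Eq. (2.1) could be evaluated is given next. The function name `spline_cdf` and the list-based layout (`knots = [x_0, ..., x_N]`, `k = [k_0, ..., k_N]`) are assumptions for illustration. Whatever the curvature values are, the function returns i/N at the i-th knot, as stated by Eqs. (2.2) to (2.4).

```python
def spline_cdf(x, knots, k):
    """Evaluate the piecewise cubic of Eq. (2.1) at x."""
    n = len(knots) - 1
    if x <= knots[0]:
        return 0.0
    if x >= knots[-1]:
        return 1.0
    i = 1
    while x > knots[i]:                 # find segment i with x_{i-1} <= x <= x_i
        i += 1
    a, b = knots[i - 1], knots[i]
    h = b - a
    term_i = k[i] / 6.0 * ((x - a) ** 3 / h - (x - a) * h)
    term_prev = k[i - 1] / 6.0 * ((b - x) ** 3 / h - (b - x) * h)
    linear = (i * (x - a) + (i - 1) * (b - x)) / (n * h)
    return term_i + term_prev + linear


# Arbitrary (illustrative) curvature values: the knots are still interpolated.
print(spline_cdf(1.0, [0.0, 1.0, 3.0], [0.3, -0.2, 0.1]))  # 0.5
```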

The corresponding probability density function obtained from Eq. (2.1) is:

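$$
\begin{aligned}
\rho_i(x) = \phi_i'(x) = {} & \frac{k_i}{6}\left(\frac{3(x - x_{i-1})^2}{x_i - x_{i-1}} - x_i + x_{i-1}\right) - \frac{k_{i-1}}{6}\left(\frac{3(x_i - x)^2}{x_i - x_{i-1}} - x_i + x_{i-1}\right) \\
& + \frac{1}{N(x_i - x_{i-1})}, \qquad x_{i-1} \le x \le x_i
\end{aligned}
\tag{2.5}
$$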

In addition, the second derivative of the cumulative probability function (first derivative of the
probability density function) is:

$$ \phi_i''(x) = \rho_i'(x) = k_i\,\frac{x - x_{i-1}}{x_i - x_{i-1}} + k_{i-1}\,\frac{x_i - x}{x_i - x_{i-1}}, \qquad x_{i-1} \le x \le x_i \tag{2.6} $$
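The density of Eq. (2.5) can be evaluated in the same way. The sketch below uses the assumed name `spline_pdf` and the same list layout (`knots = [x_0, ..., x_N]`, `k = [k_0, ..., k_N]`). The illustrative values are the bitile knots 0, 0.5, 1 with curvatures 6, 0, -6, for which the density vanishes at both ends of the range.

```python
def spline_pdf(x, knots, k):
    """Evaluate the piecewise quadratic density of Eq. (2.5) at x."""
    n = len(knots) - 1
    if x < knots[0] or x > knots[-1]:
        return 0.0
    i = 1
    while i < n and x > knots[i]:       # find segment i with x_{i-1} <= x <= x_i
        i += 1
    a, b = knots[i - 1], knots[i]
    h = b - a
    return (k[i] / 6.0 * (3.0 * (x - a) ** 2 / h - h)
            - k[i - 1] / 6.0 * (3.0 * (b - x) ** 2 / h - h)
            + 1.0 / (n * h))


knots = [0.0, 0.5, 1.0]
k = [6.0, 0.0, -6.0]
print(spline_pdf(0.0, knots, k), spline_pdf(0.5, knots, k), spline_pdf(1.0, knots, k))
# 0.0 1.5 0.0
```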


Eq. (2.6) satisfies the following set of conditions:

$$ \phi_i''(x_i) = \phi_{i+1}''(x_i) = k_i \tag{2.7} $$

Eq. (2.7) shows that the curvature parameters actually represent the second derivatives of the
cumulative probability function at the quantiles. It also shows that Eq. (2.1) guarantees
continuity in the second derivative of the cumulative probability function. However, in order to
guarantee continuity in the first derivative, the following additional set of 𝑁 − 1 conditions
must be met:
$$ \phi_i'(x_i) = \phi_{i+1}'(x_i) \tag{2.8} $$
Using Eq. (2.5), condition (2.8) becomes [10]:

$$ k_{i-1}(x_i - x_{i-1}) + 2k_i(x_{i+1} - x_{i-1}) + k_{i+1}(x_{i+1} - x_i) = \frac{6}{N}\left(\frac{1}{x_{i+1} - x_i} - \frac{1}{x_i - x_{i-1}}\right) \tag{2.9} $$

This means that we have 𝑁 + 1 curvature parameters (one for each quantile) and only 𝑁 − 1
algebraic equations, resulting in an underdetermined system. In order to find the optimal
values of the curvature parameters we will consider two possibilities: Assuming either clamped
or free boundaries [11].

2.1. Clamped Boundary Approach

In the clamped boundary approach, we will assume that the probability density function at
each of the extreme values is zero, and then the curvature parameters at the extremes can be
obtained using the following expressions (obtained by setting Eq. 2.5 equal to zero at 𝑥0 and
𝑥𝑁 ):
$$ (2k_0 + k_1)(x_1 - x_0) = \frac{6}{N(x_1 - x_0)} \tag{2.10} $$

$$ (k_{N-1} + 2k_N)(x_N - x_{N-1}) = -\frac{6}{N(x_N - x_{N-1})} \tag{2.11} $$

We have now the two additional algebraic equations needed to determine the system. The
resulting system of algebraic equations can be expressed in matrix form as follows:


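$$
\begin{bmatrix}
2(x_1 - x_0) & x_1 - x_0 & 0 & \cdots & 0 & 0 & 0 \\
x_1 - x_0 & 2(x_2 - x_0) & x_2 - x_1 & \cdots & 0 & 0 & 0 \\
0 & x_2 - x_1 & 2(x_3 - x_1) & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 2(x_{N-1} - x_{N-3}) & x_{N-1} - x_{N-2} & 0 \\
0 & 0 & 0 & \cdots & x_{N-1} - x_{N-2} & 2(x_N - x_{N-2}) & x_N - x_{N-1} \\
0 & 0 & 0 & \cdots & 0 & x_N - x_{N-1} & 2(x_N - x_{N-1})
\end{bmatrix}
\cdot
\begin{bmatrix} k_0 \\ k_1 \\ k_2 \\ \vdots \\ k_{N-2} \\ k_{N-1} \\ k_N \end{bmatrix}
= \frac{6}{N}
\begin{bmatrix}
(x_1 - x_0)^{-1} \\
(x_2 - x_1)^{-1} - (x_1 - x_0)^{-1} \\
(x_3 - x_2)^{-1} - (x_2 - x_1)^{-1} \\
\vdots \\
(x_{N-1} - x_{N-2})^{-1} - (x_{N-2} - x_{N-3})^{-1} \\
(x_N - x_{N-1})^{-1} - (x_{N-1} - x_{N-2})^{-1} \\
-(x_N - x_{N-1})^{-1}
\end{bmatrix}
\tag{2.12}
$$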

and can be solved as:

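$$
\begin{bmatrix} k_0 \\ k_1 \\ k_2 \\ \vdots \\ k_{N-2} \\ k_{N-1} \\ k_N \end{bmatrix}
= \frac{6}{N}
\begin{bmatrix}
2(x_1 - x_0) & x_1 - x_0 & 0 & \cdots & 0 & 0 & 0 \\
x_1 - x_0 & 2(x_2 - x_0) & x_2 - x_1 & \cdots & 0 & 0 & 0 \\
0 & x_2 - x_1 & 2(x_3 - x_1) & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 2(x_{N-1} - x_{N-3}) & x_{N-1} - x_{N-2} & 0 \\
0 & 0 & 0 & \cdots & x_{N-1} - x_{N-2} & 2(x_N - x_{N-2}) & x_N - x_{N-1} \\
0 & 0 & 0 & \cdots & 0 & x_N - x_{N-1} & 2(x_N - x_{N-1})
\end{bmatrix}^{-1}
\cdot
\begin{bmatrix}
(x_1 - x_0)^{-1} \\
(x_2 - x_1)^{-1} - (x_1 - x_0)^{-1} \\
(x_3 - x_2)^{-1} - (x_2 - x_1)^{-1} \\
\vdots \\
(x_{N-1} - x_{N-2})^{-1} - (x_{N-2} - x_{N-3})^{-1} \\
(x_N - x_{N-1})^{-1} - (x_{N-1} - x_{N-2})^{-1} \\
-(x_N - x_{N-1})^{-1}
\end{bmatrix}
\tag{2.13}
$$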
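As a quick check of the clamped-boundary system, consider the bitiles (N = 2) of the standard uniform distribution, with knots 0, 0.5 and 1 (the case reported in Table 1). Substituting these knots into the matrix form gives a 3 × 3 system whose solution can be verified row by row:

```latex
\begin{bmatrix} 1 & 0.5 & 0 \\ 0.5 & 2 & 0.5 \\ 0 & 0.5 & 1 \end{bmatrix}
\begin{bmatrix} k_0 \\ k_1 \\ k_2 \end{bmatrix}
= \frac{6}{2}\begin{bmatrix} 2 \\ 2 - 2 \\ -2 \end{bmatrix}
= \begin{bmatrix} 6 \\ 0 \\ -6 \end{bmatrix}
\quad\Longrightarrow\quad
(k_0, k_1, k_2) = (6, 0, -6)
```

Indeed, 1·6 + 0.5·0 = 6, 0.5·6 + 2·0 + 0.5·(−6) = 0 and 0.5·0 + 1·(−6) = −6, in agreement with the clamped bitile values listed in Table 1.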

2.2. Free (Natural) Boundary Approach

An alternative approach is assuming that the curvature at the extreme quantiles is zero. In this
case, the probability density values at the extremes are now free. This assumption can be
expressed as:
𝑘0 = 𝑘𝑁 = 0
(2.14)
The mathematical problem simplifies in this case into:


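$$
\begin{bmatrix}
2(x_2 - x_0) & x_2 - x_1 & \cdots & 0 & 0 \\
x_2 - x_1 & 2(x_3 - x_1) & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 2(x_{N-1} - x_{N-3}) & x_{N-1} - x_{N-2} \\
0 & 0 & \cdots & x_{N-1} - x_{N-2} & 2(x_N - x_{N-2})
\end{bmatrix}
\cdot
\begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_{N-2} \\ k_{N-1} \end{bmatrix}
= \frac{6}{N}
\begin{bmatrix}
(x_2 - x_1)^{-1} - (x_1 - x_0)^{-1} \\
(x_3 - x_2)^{-1} - (x_2 - x_1)^{-1} \\
\vdots \\
(x_{N-1} - x_{N-2})^{-1} - (x_{N-2} - x_{N-3})^{-1} \\
(x_N - x_{N-1})^{-1} - (x_{N-1} - x_{N-2})^{-1}
\end{bmatrix}
\tag{2.15}
$$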

solved as:

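$$
\begin{bmatrix} k_1 \\ k_2 \\ \vdots \\ k_{N-2} \\ k_{N-1} \end{bmatrix}
= \frac{6}{N}
\begin{bmatrix}
2(x_2 - x_0) & x_2 - x_1 & \cdots & 0 & 0 \\
x_2 - x_1 & 2(x_3 - x_1) & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & 2(x_{N-1} - x_{N-3}) & x_{N-1} - x_{N-2} \\
0 & 0 & \cdots & x_{N-1} - x_{N-2} & 2(x_N - x_{N-2})
\end{bmatrix}^{-1}
\cdot
\begin{bmatrix}
(x_2 - x_1)^{-1} - (x_1 - x_0)^{-1} \\
(x_3 - x_2)^{-1} - (x_2 - x_1)^{-1} \\
\vdots \\
(x_{N-1} - x_{N-2})^{-1} - (x_{N-2} - x_{N-3})^{-1} \\
(x_N - x_{N-1})^{-1} - (x_{N-1} - x_{N-2})^{-1}
\end{bmatrix}
\tag{2.16}
$$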
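Both linear systems are tridiagonal, so instead of forming the matrix inverse they can be solved by forward elimination and back substitution. The following sketch is one possible implementation, not the one used in the report: the Thomas algorithm, the function names and the `boundary` argument are assumptions made for illustration, and the knots are assumed to be strictly increasing.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm. lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for j in range(1, n):
        denom = diag[j] - lower[j] * c[j - 1]
        if j < n - 1:
            c[j] = upper[j] / denom
        d[j] = (rhs[j] - lower[j] * d[j - 1]) / denom
    sol = [0.0] * n
    sol[-1] = d[-1]
    for j in range(n - 2, -1, -1):      # back substitution
        sol[j] = d[j] - c[j] * sol[j + 1]
    return sol


def curvature_parameters(knots, boundary="clamped"):
    """Return [k_0..k_N] from Eq. (2.12) ('clamped') or Eq. (2.15) ('free')."""
    n = len(knots) - 1
    h = [knots[i + 1] - knots[i] for i in range(n)]   # h[i] = x_{i+1} - x_i
    # Interior rows: continuity of the density at x_1..x_{N-1}, Eq. (2.9)
    lower = [h[i - 1] for i in range(1, n)]
    diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
    upper = [h[i] for i in range(1, n)]
    rhs = [6.0 / n * (1.0 / h[i] - 1.0 / h[i - 1]) for i in range(1, n)]
    if boundary == "clamped":
        # Extra first and last rows: zero density at x_0 and x_N
        lower = [0.0] + lower + [h[-1]]
        diag = [2.0 * h[0]] + diag + [2.0 * h[-1]]
        upper = [h[0]] + upper + [0.0]
        rhs = [6.0 / (n * h[0])] + rhs + [-6.0 / (n * h[-1])]
        return solve_tridiagonal(lower, diag, upper, rhs)
    if n == 1:
        return [0.0, 0.0]
    return [0.0] + solve_tridiagonal(lower, diag, upper, rhs) + [0.0]


uniform_bitiles = [0.0, 0.5, 1.0]
k_clamped = curvature_parameters(uniform_bitiles, "clamped")  # close to [6, 0, -6]
k_free = curvature_parameters(uniform_bitiles, "free")        # all zeros
```

No pivoting is used in this sketch; the matrices are strictly diagonally dominant whenever the knots are strictly increasing, so none is needed. Repeated quantile values (zero-width segments) are not handled.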

3. Monotonic Estimation of Curvature Parameters

The curvature parameters of the quantile-based spline function estimated in the previous
section might result in a non-monotonic behavior of the function. Since cumulative probability
functions are necessarily monotonic, it is very important to check monotonicity and correct the
values of the estimated parameters if needed.

Monotonicity of the cumulative probability function is guaranteed as long as the probability
density function is non-negative (at least for real variables [12]). For a cubic spline
approximation of the cumulative probability, the probability density function at each segment
will be described by a parabola. Thus, depending on the orientation of the parabola and on the
value of its roots it will be possible to evaluate if the probability density function takes negative
values or not.


The orientation of the parabola can be easily identified by comparing the curvature values at
the quantiles of the corresponding segment. If 𝑘𝑖 > 𝑘𝑖−1 , then the parabola describes a
minimum. If 𝑘𝑖 < 𝑘𝑖−1 , then the parabola describes a maximum. If 𝑘𝑖 = 𝑘𝑖−1 , then the segment
is described by a straight line.

For each segment (𝑖) the root values (𝑥𝑟,𝑖 ) of the corresponding parabola are given by:

$$ x_{r,i} = \frac{k_i x_{i-1} - k_{i-1} x_i \pm \sqrt{\left(\dfrac{(k_i - k_{i-1})^2}{3} + k_{i-1}k_i\right)(x_i - x_{i-1})^2 - \dfrac{2(k_i - k_{i-1})}{N}}}{k_i - k_{i-1}} \tag{3.1} $$

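Complex root values indicate that the parabola does not change sign. Since the spline is
estimated to exactly fit the cumulative probability function at the quantiles, the probability
density integrates to 1/𝑁 > 0 over each segment, so a parabola that does not change sign cannot
take negative values, thus guaranteeing monotonicity.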

For real roots, we must check if the values of the region between quantiles can lie on the
negative side of the parabola. Non-monotonic behavior will be observed in the following cases:

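For 𝑘𝑖 > 𝑘𝑖−1 when:

$$ \sqrt{\left(\frac{(k_i - k_{i-1})^2}{3} + k_{i-1}k_i\right)(x_i - x_{i-1})^2 - \frac{2(k_i - k_{i-1})}{N}} - \min(k_{i-1}, k_i)\,(x_i - x_{i-1}) > 0 \tag{3.2} $$

For 𝑘𝑖 < 𝑘𝑖−1 when either:

$$ \sqrt{\left(\frac{(k_i - k_{i-1})^2}{3} + k_{i-1}k_i\right)(x_i - x_{i-1})^2 - \frac{2(k_i - k_{i-1})}{N}} - k_{i-1}(x_i - x_{i-1}) < 0 \tag{3.3} $$

or

$$ \sqrt{\left(\frac{(k_i - k_{i-1})^2}{3} + k_{i-1}k_i\right)(x_i - x_{i-1})^2 - \frac{2(k_i - k_{i-1})}{N}} + k_i(x_i - x_{i-1}) < 0 \tag{3.4} $$

For 𝑘𝑖 = 𝑘𝑖−1 when:

$$ |k_i| > \frac{2}{N(x_i - x_{i-1})^2} \tag{3.5} $$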
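In a numerical implementation, monotonicity can also be checked directly instead of through the root-based conditions: on each segment the density is a parabola, so its minimum is reached at one of the two quantiles or, for an upward-opening parabola, at its vertex. The sketch below follows this alternative route. It is not the report's procedure, and the function names and the tolerance `tol` are assumptions for illustration.

```python
def segment_density(x, a, b, k_prev, k_i, n):
    """Density of Eq. (2.5) on the segment [a, b] = [x_{i-1}, x_i]."""
    h = b - a
    return (k_i / 6.0 * (3.0 * (x - a) ** 2 / h - h)
            - k_prev / 6.0 * (3.0 * (b - x) ** 2 / h - h)
            + 1.0 / (n * h))


def non_monotonic_segments(knots, k, tol=1e-12):
    """Return the indices i of the segments where the density becomes negative."""
    n = len(knots) - 1
    bad = []
    for i in range(1, n + 1):
        a, b = knots[i - 1], knots[i]
        candidates = [a, b]
        if k[i] > k[i - 1]:
            # The parabola has a minimum; its vertex is where Eq. (2.6) vanishes.
            vertex = (k[i] * a - k[i - 1] * b) / (k[i] - k[i - 1])
            if a < vertex < b:
                candidates.append(vertex)
        lowest = min(segment_density(x, a, b, k[i - 1], k[i], n) for x in candidates)
        if lowest < -tol:
            bad.append(i)
    return bad


# One segment with equal curvatures: Eq. (3.5) gives the threshold 2 / (N h^2) = 2.
print(non_monotonic_segments([0.0, 1.0], [1.0, 1.0]))  # []
print(non_monotonic_segments([0.0, 1.0], [3.0, 3.0]))  # [1]
```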


If any of the corresponding conditions is observed for the parameter values estimated as
described in the previous section, then the parameter values must be re-estimated, imposing
that none of the previous conditions (Eq. 3.2 to 3.5) holds as constraints in the following
optimization problem:

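$$
\begin{aligned}
\min_{k_i\,(i=0,\ldots,N)} F_{obj,clam} = {} & \sum_{i=1}^{N-1}\left(k_{i-1}(x_i - x_{i-1}) + 2k_i(x_{i+1} - x_{i-1}) + k_{i+1}(x_{i+1} - x_i) - \frac{6}{N}\left(\frac{1}{x_{i+1} - x_i} - \frac{1}{x_i - x_{i-1}}\right)\right)^2 \\
& + \left((2k_0 + k_1)(x_1 - x_0) - \frac{6}{N(x_1 - x_0)}\right)^2 \\
& + \left((k_{N-1} + 2k_N)(x_N - x_{N-1}) + \frac{6}{N(x_N - x_{N-1})}\right)^2
\end{aligned}
\tag{3.6}
$$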
for the clamped-boundary spline, or

$$
\min_{k_i\,(i=1,\ldots,N-1)} F_{obj,free} = \sum_{i=1}^{N-1}\left(k_{i-1}(x_i - x_{i-1}) + 2k_i(x_{i+1} - x_{i-1}) + k_{i+1}(x_{i+1} - x_i) - \frac{6}{N}\left(\frac{1}{x_{i+1} - x_i} - \frac{1}{x_i - x_{i-1}}\right)\right)^2
\tag{3.7}
$$
for the free-boundary spline.

The curvature parameter values can then be obtained by any suitable numerical optimization
method, starting either from the algebraic solution (Eq. 2.13 or 2.16) or from 𝑘𝑖 = 0 for all
quantiles. In particular, only the curvature parameters at the extremes of the segments
violating the monotonicity constraint need to be re-optimized. In those cases, the starting
values of the curvature parameters to be re-optimized can also be selected as intermediate
values between 0 and their original algebraic solution values.
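The objective functions of Eqs. (3.6) and (3.7) are sums of squared residuals of the density-continuity conditions (plus the two clamped-boundary conditions), so they vanish at the algebraic solution. A minimal sketch of their evaluation follows. The function name `objective` and its `boundary` argument are assumptions for illustration, and the constrained minimization itself, which requires a numerical optimizer, is not shown.

```python
def objective(k, knots, boundary="clamped"):
    """Sum of squared residuals: Eq. (3.6) for 'clamped', Eq. (3.7) for 'free'."""
    n = len(knots) - 1
    h = [knots[i + 1] - knots[i] for i in range(n)]   # h[i] = x_{i+1} - x_i
    total = 0.0
    for i in range(1, n):
        residual = (k[i - 1] * h[i - 1]
                    + 2.0 * k[i] * (h[i - 1] + h[i])
                    + k[i + 1] * h[i]
                    - 6.0 / n * (1.0 / h[i] - 1.0 / h[i - 1]))
        total += residual ** 2
    if boundary == "clamped":
        total += ((2.0 * k[0] + k[1]) * h[0] - 6.0 / (n * h[0])) ** 2
        total += ((k[n - 1] + 2.0 * k[n]) * h[n - 1] + 6.0 / (n * h[n - 1])) ** 2
    return total


knots = [0.0, 0.5, 1.0]
print(objective([6.0, 0.0, -6.0], knots, "clamped"))  # 0.0 (algebraic solution)
print(objective([0.0, 0.0, 0.0], knots, "clamped"))   # 72.0 (starting point k_i = 0)
```

When the minimum found under the monotonicity constraints is greater than zero, the density-continuity conditions are only approximately satisfied.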

4. Properties of the Quantile-based Spline Function

Once the curvature parameters for the quantile-based spline function representing the
cumulative probability function have been (monotonically) estimated, the following properties
of the distribution can be determined.


The expected value of the distribution in terms of the curvature parameters at the quantiles is
the following:
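$$
\begin{aligned}
E(X) &= \int_{x_0}^{x_N} x\rho(x)\,dx = \sum_{i=1}^{N}\int_{x_{i-1}}^{x_i} x\rho_i(x)\,dx \\
&= \frac{1}{24}\sum_{i=1}^{N} k_i\left(\frac{6x_{i-1}^2x_i^2 - 8x_{i-1}x_i^3 + 3x_i^4 - x_{i-1}^4}{x_i - x_{i-1}}\right) \\
&\quad + \frac{1}{24}\sum_{i=1}^{N} k_{i-1}\left(\frac{6x_{i-1}^2x_i^2 - 8x_{i-1}^3x_i + 3x_{i-1}^4 - x_i^4}{x_i - x_{i-1}}\right) \\
&\quad - \frac{1}{6}\sum_{i=1}^{N}(k_i - k_{i-1})(x_i - x_{i-1})^2\left(\frac{x_i + x_{i-1}}{2}\right) + \frac{1}{N}\sum_{i=1}^{N}\left(\frac{x_i + x_{i-1}}{2}\right)
\end{aligned}
\tag{4.1}
$$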

In general, the raw moments of the distribution will be given by the following expression:

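$$
\begin{aligned}
M_m(X) = E(X^m) &= \int_{x_0}^{x_N} x^m\rho(x)\,dx \\
&= \frac{1}{2}\sum_{i=1}^{N}\frac{k_i x_i^{m+1}}{x_i - x_{i-1}}\left(\frac{x_{i-1}^2}{m+1} - \frac{2x_{i-1}x_i}{m+2} + \frac{x_i^2}{m+3}\right) \\
&\quad + \frac{1}{2}\sum_{i=1}^{N}\frac{k_{i-1} x_{i-1}^{m+1}}{x_i - x_{i-1}}\left(\frac{x_i^2}{m+1} - \frac{2x_{i-1}x_i}{m+2} + \frac{x_{i-1}^2}{m+3}\right) \\
&\quad - \frac{1}{(m+1)(m+2)(m+3)}\sum_{i=1}^{N}\frac{k_i x_{i-1}^{m+3} + k_{i-1} x_i^{m+3}}{x_i - x_{i-1}} \\
&\quad - \frac{1}{6(m+1)}\sum_{i=1}^{N}(k_i - k_{i-1})(x_i - x_{i-1})\left(x_i^{m+1} - x_{i-1}^{m+1}\right) \\
&\quad + \frac{1}{N(m+1)}\sum_{i=1}^{N}\frac{x_i^{m+1} - x_{i-1}^{m+1}}{x_i - x_{i-1}}
\end{aligned}
\tag{4.2}
$$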

The variance, standard deviation and all moments about the mean can then be obtained from
the raw moments calculated using Eq. (4.2).
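A direct transcription of Eq. (4.2) into code is sketched below. The function name `raw_moment` and the list layout (`knots = [x_0, ..., x_N]`, `k = [k_0, ..., k_N]`) are assumptions for illustration. The variance follows as the second raw moment minus the squared mean. The illustrative inputs are the bitile knots 0, 0.5, 1 with the clamped curvatures 6, 0, -6 and with zero curvatures, for which Table 2 reports variances of 0.05 and 0.08333, respectively.

```python
def raw_moment(m, knots, k):
    """m-th raw moment of the quantile-based spline distribution, Eq. (4.2)."""
    n = len(knots) - 1
    total = 0.0
    for i in range(1, n + 1):
        a, b = knots[i - 1], knots[i]          # a = x_{i-1}, b = x_i
        h = b - a
        total += 0.5 * k[i] * b ** (m + 1) / h * (
            a * a / (m + 1) - 2.0 * a * b / (m + 2) + b * b / (m + 3))
        total += 0.5 * k[i - 1] * a ** (m + 1) / h * (
            b * b / (m + 1) - 2.0 * a * b / (m + 2) + a * a / (m + 3))
        total -= (k[i] * a ** (m + 3) + k[i - 1] * b ** (m + 3)) / (
            (m + 1) * (m + 2) * (m + 3) * h)
        total -= (k[i] - k[i - 1]) * h * (b ** (m + 1) - a ** (m + 1)) / (6.0 * (m + 1))
        total += (b ** (m + 1) - a ** (m + 1)) / (n * (m + 1) * h)
    return total


def variance(knots, k):
    mean = raw_moment(1, knots, k)
    return raw_moment(2, knots, k) - mean ** 2


knots = [0.0, 0.5, 1.0]
var_clamped = variance(knots, [6.0, 0.0, -6.0])   # about 0.05
var_free = variance(knots, [0.0, 0.0, 0.0])       # about 1/12
```

For m = 0 the expression returns 1, which is a convenient sanity check of any implementation.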

The probability density function reconstruction method using quantile-based splines was
implemented as a worksheet with macros in Microsoft Excel®. This application, called
ForsChem Cesium XL, can be freely downloaded from the ForsChem Research site
(www.forschem.org).


5. Examples

5.1. Approximating the Standard Uniform Distribution

As a first example, let us consider the Type III standard uniform distribution [13], corresponding
to a uniform distribution between 0 and 1. The cumulative probability function is:

$$ \phi_U(x) = \begin{cases} 0, & x < 0 \\ x, & 0 \le x \le 1 \\ 1, & x > 1 \end{cases} \tag{5.1} $$
and the probability density function is:

$$ \rho_U(x) = \begin{cases} 0, & x < 0 \\ 1, & 0 \le x \le 1 \\ 0, & x > 1 \end{cases} \tag{5.2} $$

In order to approximate this distribution using the quantile-based spline approach, we need to
determine the corresponding quantiles. Considering 𝑁-quantiles in general, the corresponding
values for this distribution are:

$$ x_i = \phi_U^{-1}\left(\frac{i}{N}\right) = \frac{i}{N}, \qquad i = 0, 1, \ldots, N \tag{5.3} $$

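Table 1. Curvature parameter values for the quantile-based spline (clamped and free
boundaries) approximation of the standard uniform distribution

| i | Bitiles (N = 2): x_i | k_i,clam | k_i,free | Quartiles (N = 4): x_i | k_i,clam | k_i,free | Deciles (N = 10): x_i | k_i,clam | k_i,free |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 6 | 0 | 0 | 13.71 | 0 | 0 | 34.64 | 0 |
| 1 | 0.5 | 0 | 0 | 0.25 | -3.43 | 0 | 0.1 | -9.28 | 0 |
| 2 | 1 | -6 | 0 | 0.5 | 0 | 0 | 0.2 | 2.49 | 0 |
| 3 | | | | 0.75 | 3.43 | 0 | 0.3 | -0.66 | 0 |
| 4 | | | | 1 | -13.71 | 0 | 0.4 | 0.17 | 0 |
| 5 | | | | | | | 0.5 | 0 | 0 |
| 6 | | | | | | | 0.6 | -0.17 | 0 |
| 7 | | | | | | | 0.7 | 0.66 | 0 |
| 8 | | | | | | | 0.8 | -2.49 | 0 |
| 9 | | | | | | | 0.9 | 9.28 | 0 |
| 10 | | | | | | | 1 | -34.64 | 0 |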


Table 1 shows the values of the quantiles and the corresponding curvature parameter values
estimated for both clamped- and free-boundary monotonic splines, and for different quantiles:
Bitiles, quartiles and deciles. The approximations obtained are graphically summarized in Figure
1, presenting the corresponding goodness-of-fit (R²) for the cumulative probability function,
and the similitude (𝓈) [14] of the probability density function.

Figure 1. Performance of quantile-based spline approximations of the standard uniform
distribution. Left plots: Cumulative probability. Right plots: Probability densities. Solid blue
lines: Standard uniform distribution. Dotted red lines: Quantile-based clamped-boundary spline
approximations. Dotted green lines: Quantile-based free-boundary spline approximations.


Table 2. Distribution properties predicted by the quantile-based spline (clamped and free
boundaries) approximations of the standard uniform distribution

| N | Boundaries | Mean | Variance | Std. Dev. |
|---|---|---|---|---|
| 2 | Clamped | 0.5 | 0.05 | 0.2236 |
| 4 | Clamped | 0.5 | 0.07411 | 0.2722 |
| 10 | Clamped | 0.5 | 0.08174 | 0.2859 |
| 2, 4, 10 | Free | 0.5 | 0.08333 | 0.2887 |
| Exact values | | 0.5 | 0.08333 | 0.2887 |

For this particular example, the free-boundary splines for all quantiles considered accurately
described the behavior of the standard uniform distribution. Even a bitile-based free-boundary
spline achieved such perfect fit. This, of course, cannot be generalized to any distribution but it
is a particular feature of the uniform distribution.

When considering clamped boundaries, such perfect fit was not possible. In this case, by
increasing the number of quantiles, the fit of both the cumulative probability and the
probability density functions improves.

Table 2 compares the estimation of the mean, variance and standard deviation of the
distribution using the different quantile-based spline approximations for the standard uniform
distribution. In all cases, the mean value was correctly predicted due to the symmetry of the
uniform distribution. The variance and standard deviation were correctly predicted by the free-
boundary splines, but significant errors were introduced in the clamped-boundary splines,
particularly when only bitiles are considered (22.5% deviation in the standard deviation). For
quartiles the deviation in the estimation of the standard deviation decreases to 5.7%, and for
the deciles, the deviation in the standard deviation reaches a value of 1%. Considering clamped
boundaries, increasing the number of quantiles considered improves the estimation of the
properties of the uniform distribution, but the number of curvature parameters used to
describe the function also increases.

5.2. Approximating a Truncated Standard Normal Distribution

The normal distribution can also be approximated using quantile-based splines. However, the
quantile-based spline approximation only works for bounded distributions. Thus, a truncated
version of the normal distribution [15] can be used. We then need to select some reasonable
limits for truncating the distribution. Considering a standard normal distribution, and assuming
a symmetrical truncation (𝑥𝑚𝑎𝑥 = −𝑥𝑚𝑖𝑛 ), the cumulative probability function of the truncated
standard normal distribution will be:


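$$ \phi_N(x) = \frac{\operatorname{erf}\left(\dfrac{x}{\sqrt{2}}\right) - \operatorname{erf}\left(\dfrac{x_{min}}{\sqrt{2}}\right)}{\operatorname{erf}\left(\dfrac{x_{max}}{\sqrt{2}}\right) - \operatorname{erf}\left(\dfrac{x_{min}}{\sqrt{2}}\right)}, \qquad x_{min} \le x \le x_{max} \tag{5.1} $$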
with the corresponding probability density function:

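$$ \rho_N(x) = \frac{\sqrt{\dfrac{2}{\pi}}\, e^{-\frac{x^2}{2}}}{\operatorname{erf}\left(\dfrac{x_{max}}{\sqrt{2}}\right) - \operatorname{erf}\left(\dfrac{x_{min}}{\sqrt{2}}\right)}, \qquad x_{min} \le x \le x_{max} \tag{5.2} $$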
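The quantiles of the truncated distribution have no closed form, but they can be obtained by numerically inverting the cumulative probability function above. The sketch below uses plain bisection. The function names, the tolerance and the choice of bisection are assumptions for illustration; the report does not describe how its quantile values were computed.

```python
import math


def truncated_normal_cdf(x, x_min, x_max):
    """Cumulative probability of the truncated standard normal distribution."""
    s = math.sqrt(2.0)
    num = math.erf(x / s) - math.erf(x_min / s)
    den = math.erf(x_max / s) - math.erf(x_min / s)
    return num / den


def truncated_normal_quantile(p, x_min, x_max, tol=1e-12):
    """Invert the cumulative probability by bisection on [x_min, x_max]."""
    lo, hi = x_min, x_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if truncated_normal_cdf(mid, x_min, x_max) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Quartile knots for the symmetric truncation x_max = -x_min = 4
quartile_knots = [truncated_normal_quantile(i / 4, -4.0, 4.0) for i in range(5)]
```

With this truncation the interior quartiles are close to ±0.6745 and the first decile is close to −1.2816.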

Table 3 shows the values of the curvature parameters obtained considering clamped and free
boundaries for bitile-, quartile- and decile-based splines, and using 𝑥𝑚𝑎𝑥 = 4. The performance
of the splines obtained is illustrated in Figure 2 (clamped-boundary splines) and Figure 3 (free-
boundary splines). In this case, better results were obtained using clamped boundaries (despite
the unnatural limiting behavior of the probability density observed in some cases). For quartiles
and deciles, the splines obtained by the exact algebraic solution were non-monotonic. In those
cases the curvature parameters were found by constrained optimization; since the continuity
conditions of the first derivative (Eq. 2.9) are then only approximately satisfied, discontinuities
in the probability density function are observed. Table 4 summarizes the distribution properties
obtained for each spline function.

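Table 3. Curvature parameter values for the quantile-based spline (clamped and free
boundaries) approximation of the truncated standard normal distribution

| i | Bitiles (N = 2): x_i | k_i,clam | k_i,free | Quartiles (N = 4): x_i | k_i,clam | k_i,free | Deciles (N = 10): x_i | k_i,clam | k_i,free |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -4 | 0.0938 | 0 | -4 | -0.0543 | 0 | -4 | -0.0522 | 0 |
| 1 | 0 | 0 | 0 | -0.6745 | 0.2108 | 0.1356 | -1.2816 | 0.1439 | 0.0812 |
| 2 | 4 | -0.0938 | 0 | 0 | 0 | 0 | -0.8416 | 0.2565 | 0.2636 |
| 3 | | | | 0.6745 | -0.2108 | -0.1356 | -0.5244 | 0.1815 | 0.1795 |
| 4 | | | | 4 | 0.0543 | 0 | -0.2533 | 0.1006 | 0.1011 |
| 5 | | | | | | | 0 | 0 | 0 |
| 6 | | | | | | | 0.2533 | -0.1006 | -0.1011 |
| 7 | | | | | | | 0.5244 | -0.1815 | -0.1795 |
| 8 | | | | | | | 0.8416 | -0.2565 | -0.2636 |
| 9 | | | | | | | 1.2816 | -0.1439 | -0.0812 |
| 10 | | | | | | | 4 | 0.0522 | 0 |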


Figure 2. Performance of quantile-based clamped-boundary spline approximations of the
truncated standard normal distribution. Left plots: Cumulative probability. Right plots:
Probability densities. Solid blue lines: Truncated standard normal distribution. Dotted red lines:
Quantile-based clamped-boundary spline approximations.

Table 4. Distribution properties predicted by the quantile-based spline (clamped and free
boundaries) approximations of the truncated standard normal distribution

| N | Boundaries | Mean | Variance | Std. Dev. |
|---|---|---|---|---|
| 2 | Clamped | 0 | 3.2 | 1.7889 |
| 2 | Free | 0 | 5.33333 | 2.3094 |
| 4 | Clamped | 0 | 1.20106 | 1.0959 |
| 4 | Free | 0 | 1.41451 | 1.1893 |
| 10 | Clamped | 0 | 1.11785 | 1.0573 |
| 10 | Free | 0 | 1.17683 | 1.0848 |
| Exact values | | 0 | 0.99893 | 0.9995 |


Figure 3. Performance of quantile-based free-boundary spline approximations of the truncated
standard normal distribution. Left plots: Cumulative probability. Right plots: Probability
densities. Solid blue lines: Truncated standard normal distribution. Dotted green lines:
Quantile-based free-boundary spline approximations.

Particularly, the exact variance of the truncated normal distribution is determined using the
following expression:

$$ Var(X) = 1 - \frac{\sqrt{\dfrac{2}{\pi}}\, x_{max}\, e^{-\frac{x_{max}^2}{2}}}{\operatorname{erf}\left(\dfrac{x_{max}}{\sqrt{2}}\right)} \tag{5.3} $$
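The exact values reported in Table 4 can be reproduced from this expression with a few lines of code. The sketch below only evaluates the formula for the symmetric truncation; the function name is an assumption for illustration.

```python
import math


def truncated_normal_variance(x_max):
    """Variance of the standard normal distribution truncated to [-x_max, x_max]."""
    return 1.0 - (math.sqrt(2.0 / math.pi) * x_max * math.exp(-x_max ** 2 / 2.0)
                  / math.erf(x_max / math.sqrt(2.0)))


exact_variance = truncated_normal_variance(4.0)
exact_std_dev = math.sqrt(exact_variance)
print(round(exact_variance, 5), round(exact_std_dev, 4))  # 0.99893 0.9995
```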

5.3. Approximating the Standard Normal Distribution from a Random Sample

Another approach for approximating a random distribution results from fitting the splines
using the quantiles of a random sample drawn from the distribution. In particular, a sample of
100 random numbers drawn from the standard normal distribution was used for approximating
the cumulative probability using quantile-based splines. The quantiles obtained from the random
sample and the corresponding fitted curvature parameters are summarized in Table 5.

Table 5. Curvature parameter values for the quantile-based spline (clamped and free
boundaries) approximation of the standard normal distribution obtained from a sample of 100
random numbers

| i | Bitiles (N = 2): x_i | k_i,clam | k_i,free | Quartiles (N = 4): x_i | k_i,clam | k_i,free | Deciles (N = 10): x_i | k_i,clam | k_i,free |
|---|---|---|---|---|---|---|---|---|---|
| 0 | -2.3791 | 0.2226 | 0 | -2.3791 | 0.1667 | 0 | -2.3791 | -0.0163 | 0 |
| 1 | 0.1053 | 0.0409 | 0.0204 | -0.6363 | 0.1472 | 0.2185 | -1.0859 | 0.3353 | 0.2794 |
| 2 | 2.2531 | -0.3456 | 0 | 0.1053 | 0.0993 | 0.1023 | -0.7760 | 0.7696 | 0.7716 |
| 3 | | | | 0.7668 | -0.2242 | -0.3087 | -0.5424 | -1.3948 | -1.3952 |
| 4 | | | | 2.2531 | -0.2242 | 0 | -0.1342 | 1.2557 | 1.2558 |
| 5 | | | | | | | 0.1053 | -0.0936 | -0.0935 |
| 6 | | | | | | | 0.3243 | 0.0935 | 0.0929 |
| 7 | | | | | | | 0.5497 | -0.6267 | -0.6242 |
| 8 | | | | | | | 0.8316 | 0.2900 | 0.2816 |
| 9 | | | | | | | 1.1165 | -0.5886 | -0.4644 |
| 10 | | | | | | | 2.2531 | 0.0747 | 0 |

The performance of all spline curves is graphically summarized in Figure 4 for clamped-
boundary splines and in Figure 5 for free-boundary splines. In the cumulative probability plots,
the behavior of the standard normal distribution is also included for comparison, but the
coefficient R² is determined from the observed cumulative relative frequency of the random
sample. Similarly, the probability density plot includes the behavior of the standard normal
distribution but also the observed probability density using a Naïve estimator based on the
finite central differences in cumulative relative frequency [1]. In this case, the similitude
coefficient is determined comparing the quantile-based probability density and the true
probability density of the distribution. While the decile-based spline improved the fit of the
cumulative relative frequency of the sample, it starts to deviate from the original distribution
function. In that sense, the quartile-based spline functions are capable of satisfactorily fitting
the data sample while providing a better prediction of the original distribution. Also in general,
the clamped-boundary splines performed better than the corresponding free-boundary splines.


Figure 4. Performance of quantile-based clamped-boundary spline approximations from a
sample of 100 random numbers obtained from the standard normal distribution. Left plots:
Cumulative probability. Right plots: Probability densities. Black dots: Random sample. Solid
blue lines: Standard normal distribution. Dotted red lines: Quantile-based clamped-boundary
spline approximations.

Table 6. Distribution properties predicted by the quantile-based spline (clamped and free
boundaries) approximations of a sample of 100 random numbers from the standard normal
distribution

| N | Boundaries | Mean | Variance | Std. Dev. |
|---|---|---|---|---|
| 2 | Clamped | 0.0637 | 1.06535 | 1.0322 |
| 2 | Free | 0.0427 | 1.78027 | 1.3343 |
| 4 | Clamped | 0.0538 | 0.93226 | 0.9655 |
| 4 | Free | 0.0521 | 1.06217 | 1.0306 |
| 10 | Clamped | 0.0304 | 0.71827 | 0.8475 |
| 10 | Free | 0.0298 | 0.73974 | 0.8601 |
| Exact values | | 0 | 1 | 1 |


Figure 5. Performance of quantile-based free-boundary spline approximations from a sample of
100 random numbers obtained from the standard normal distribution. Left plots: Cumulative
probability. Right plots: Probability densities. Black dots: Random sample. Solid blue lines:
Standard normal distribution. Dotted green lines: Quantile-based free-boundary spline
approximations.

Table 6 summarizes the properties of the distribution estimated from the corresponding
quantile-based spline approximations. These results also confirm that the quartile-based splines
provided a better description of the original standard normal probability distribution.

5.4. Approximating a Data Sample with an Arbitrary Distribution: Old Faithful Geyser
Eruptions Benchmark

Silverman [16] presented a data sample of the time between eruptions for the Old Faithful
Geyser. This dataset was used as a benchmark for testing and comparing different methods for
the reconstruction of probability density functions [1]. Similarly, this data sample is
approximated using the quantile-based spline approach. However, only quartiles and deciles
will be considered, as it has been previously observed that bitiles are quite limited for
describing probability density functions. Again, clamped- and free-boundary splines are
considered. The quantiles and curvature parameters obtained for this data set are summarized
in Table 7. Only the quartile-based free-boundary spline provided a monotonic function from
the algebraic solution of the curvature parameter values. In all other cases, constrained
numerical optimization was used for finding the curvature parameters.

Table 7. Curvature parameter values for the quantile-based spline (clamped and free
boundaries) approximation of the Old Faithful Geyser Eruptions benchmark

| i | Quartiles (N = 4): x_i | k_i,clam | k_i,free | Deciles (N = 10): x_i | k_i,clam | k_i,free |
|---|---|---|---|---|---|---|
| 0 | 1.6700 | 2.4111 | 0 | -2.3791 | 13.6580 | 0 |
| 1 | 2.3000 | -1.0430 | -0.6507 | -1.0859 | -2.6611 | 0.9497 |
| 2 | 3.8000 | 1.0287 | 0.9274 | -0.7760 | -0.7779 | -0.8205 |
| 3 | 4.2500 | -0.2539 | -0.6835 | -0.5424 | 0.2012 | 0.2450 |
| 4 | 4.9300 | -1.4950 | 0 | -0.1342 | 1.5380 | 0.7521 |
| 5 | | | | 0.1053 | -1.6782 | -1.6731 |
| 6 | | | | 0.3243 | 3.7249 | 3.7366 |
| 7 | | | | 0.5497 | -3.9030 | -3.9701 |
| 8 | | | | 0.8316 | 1.2700 | 1.4719 |
| 9 | | | | 1.1165 | -0.3689 | -1.1304 |
| 10 | | | | 2.2531 | -2.3804 | 0 |

Figure 6 summarizes the performance of the different spline curves for describing the
cumulative probability data. Also, the corresponding probability density functions obtained are
presented. The decile-based splines provide better fits of the observed cumulative probability,
but the quartile-based splines can satisfactorily identify the bimodal nature of the data. In
addition, the clamped- and free-boundary approaches provided similar performance fitting the
cumulative probability function using deciles, but the free-boundary spline performed slightly
better using quartiles.

Finally, Table 8 shows the properties of the distribution (mean, variance and standard
deviation) predicted by each spline approximation. These values can be compared to the
properties observed in the data sample. In this case, according to the values in Table 8, the
decile-based clamped-boundary spline function provided the closest approximation to the
observed properties of the sample.
However, this does not necessarily mean that this function provides a better description of the
true probability distribution of the population. In fact, as the number of quantiles increases a
better description of the behavior of the sample is possible; the effect, however, is similar to
overfitting of mathematical models, where the descriptive power improves but the predictive
power is negatively affected.
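
As a rough illustration of how such properties can be obtained from the curvature parameters, the following sketch integrates each cubic segment of the cumulative function exactly and applies integration by parts. It assumes the standard cubic-spline form in which k_i is the second derivative of the cumulative function at knot i, and cumulative values of i/N at the quantiles; the function name `spline_moments` is hypothetical and is not taken from the report.

```python
def spline_moments(x, F, k):
    """Mean and variance of the density implied by a C2 cubic-spline CDF.

    x: knots (quantiles), F: cumulative values at the knots,
    k: curvature parameters (second derivatives of the CDF at the knots).
    Assumes F[0] == 0 and F[-1] == 1.
    """
    int_F = 0.0    # integral of F(x) dx over the data range
    int_xF = 0.0   # integral of x * F(x) dx over the data range
    for i in range(len(x) - 1):
        h = x[i + 1] - x[i]
        # Local cubic: F(t) = F[i] + b*t + (k[i]/2)*t**2 + c*t**3, with t = x - x[i]
        b = (F[i + 1] - F[i]) / h - h * (2 * k[i] + k[i + 1]) / 6
        c = (k[i + 1] - k[i]) / (6 * h)
        seg_F = F[i] * h + b * h**2 / 2 + k[i] * h**3 / 6 + c * h**4 / 4
        seg_tF = F[i] * h**2 / 2 + b * h**3 / 3 + k[i] * h**4 / 8 + c * h**5 / 5
        int_F += seg_F
        int_xF += x[i] * seg_F + seg_tF
    # Integration by parts with F(x[0]) = 0 and F(x[-1]) = 1
    mean = x[-1] - int_F
    second_moment = x[-1] ** 2 - 2 * int_xF
    return mean, second_moment - mean**2


# Quartile knots and free-boundary curvature parameters taken from Table 7
x = [1.67, 2.30, 3.80, 4.25, 4.93]
F = [0.0, 0.25, 0.50, 0.75, 1.0]
k_free = [0.0, -0.6507, 0.9274, -0.6835, 0.0]

mean, variance = spline_moments(x, F, k_free)
print(mean, variance, variance**0.5)
```

With the rounded Table 7 inputs, the result can be compared with the quartile-based free-boundary row of Table 8; small differences in the last digits are expected from rounding.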


Figure 6. Performance of quantile-based free-boundary spline approximations for the Old
Faithful Geyser Eruptions benchmark. Left plots: Cumulative probability. Right plots: Probability
densities. Black dots: Data sample. Dotted red lines: Quantile-based clamped-boundary spline
approximations. Dotted green lines: Quantile-based free-boundary spline approximations.


Table 8. Distribution properties predicted by the quantile-based spline (clamped and free
boundaries) approximations of the Old Faithful Geyser Eruptions benchmark
N     Boundaries    Mean      Variance    Std. Dev.
4     Clamped       3.4048    0.98878     0.9944
4     Free          3.4366    1.04444     1.0220
10    Clamped       3.4430    1.08236     1.0404
10    Free          3.4390    1.09415     1.0460
Exact values        3.4599    1.08221     1.0403

6. Conclusion

An alternative method for reconstructing probability density functions from data samples has
been proposed and explained. This is a polynomial approximation of the cumulative probability
function based on the values of the quantiles observed in the data. The polynomial considered
is a piecewise cubic spline with continuity up to the second derivative (first derivative of the
probability density function). Such a polynomial can be expressed in terms of the curvature
parameters at the knots of the splines (corresponding to the second derivatives of the
cumulative function at the quantiles). The curvature parameter values can be easily estimated
by solving a system of linear algebraic equations. Since the system of equations is
underdetermined, additional boundary assumptions are required in order to solve the system.
Two alternatives were presented: clamped boundaries (where the probability density function
at the extreme values of the random variable is assumed to be exactly zero), and free
(natural) boundaries (where the derivative of the density function at the extremes is zero).
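
The linear system described above can be sketched as follows. This is a minimal illustration, not the implementation distributed with the report: it assumes the standard tridiagonal cubic-spline equations with k_i as the second derivative of the cumulative function at knot i, cumulative values of i/N at the quantiles, and a plain Thomas (tridiagonal) elimination. The function name `curvature_parameters` and its `boundary` argument are hypothetical.

```python
def curvature_parameters(x, F, boundary="free"):
    """Solve for the curvature parameters k_i of a C2 cubic spline through (x_i, F_i).

    boundary="free":    k_0 = k_N = 0 (zero density derivative at the extremes)
    boundary="clamped": F'(x_0) = F'(x_N) = 0 (zero density at the extremes)
    """
    n = len(x) - 1
    h = [x[i + 1] - x[i] for i in range(n)]
    s = [(F[i + 1] - F[i]) / h[i] for i in range(n)]  # segment slopes

    # Tridiagonal system; first and last rows default to k_0 = 0 and k_N = 0
    lower = [0.0] * (n + 1)
    diag = [1.0] * (n + 1)
    upper = [0.0] * (n + 1)
    rhs = [0.0] * (n + 1)
    for i in range(1, n):
        lower[i], diag[i], upper[i] = h[i - 1], 2 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 6 * (s[i] - s[i - 1])
    if boundary == "clamped":
        diag[0], upper[0], rhs[0] = 2 * h[0], h[0], 6 * s[0]
        lower[n], diag[n], rhs[n] = h[n - 1], 2 * h[n - 1], -6 * s[n - 1]

    # Thomas algorithm: forward elimination, then back substitution
    for i in range(1, n + 1):
        w = lower[i] / diag[i - 1]
        diag[i] -= w * upper[i - 1]
        rhs[i] -= w * rhs[i - 1]
    k = [0.0] * (n + 1)
    k[n] = rhs[n] / diag[n]
    for i in range(n - 1, -1, -1):
        k[i] = (rhs[i] - upper[i] * k[i + 1]) / diag[i]
    return k


# Quartiles of the Old Faithful benchmark as listed in Table 7
x = [1.67, 2.30, 3.80, 4.25, 4.93]
F = [0.0, 0.25, 0.50, 0.75, 1.0]
# Expected to be close to the quartile k_i,free column of Table 7
print(curvature_parameters(x, F, boundary="free"))
```

The algebraic solution obtained this way still has to be checked for monotonicity before it is accepted.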

However, not all solutions are valid. If the curvature parameters do not guarantee
monotonicity of the cumulative probability function, then the probability density function will
take infeasible negative values. In those cases, the curvature parameters must be obtained
from a constrained optimization problem which guarantees monotonicity in all polynomial
segments of the cumulative probability function. The objective function to be minimized
represents the lack of continuity in the splines.
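
One simple way to test whether a given set of curvature parameters yields a non-negative density is sketched below. It does not reproduce the monotonicity conditions derived in this report; instead it relies on the fact that the density is quadratic on each segment, so its minimum lies either at a knot or at the point where the second derivative of the cumulative function changes sign. The standard cubic-spline form with k_i as the second derivative at knot i is assumed, and the function name `min_density` is hypothetical.

```python
def min_density(x, F, k):
    """Smallest value of the spline density F'(x) over the whole data range."""
    lowest = float("inf")
    for i in range(len(x) - 1):
        h = x[i + 1] - x[i]
        # Density at the left knot of the segment, F'(x[i])
        b = (F[i + 1] - F[i]) / h - h * (2 * k[i] + k[i + 1]) / 6

        def density(t):
            # F'(x[i] + t) for 0 <= t <= h
            return b + k[i] * t + (k[i + 1] - k[i]) * t**2 / (2 * h)

        candidates = [0.0, h]
        if k[i] * k[i + 1] < 0:
            # Interior stationary point of the density, where F'' crosses zero
            candidates.append(h * k[i] / (k[i] - k[i + 1]))
        lowest = min(lowest, min(density(t) for t in candidates))
    return lowest


# Quartile knots and free-boundary curvature parameters taken from Table 7
x = [1.67, 2.30, 3.80, 4.25, 4.93]
F = [0.0, 0.25, 0.50, 0.75, 1.0]
k_free = [0.0, -0.6507, 0.9274, -0.6835, 0.0]

is_monotonic = min_density(x, F, k_free) >= 0.0
print(is_monotonic)
```

If the minimum is negative, the algebraic solution is rejected and the curvature parameters would be sought by constrained optimization instead.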

Different representative examples were presented, and for each case, different quantiles and
boundary types were tested. In general, as the number of quantiles increases, the polynomial
can provide a better description of the observed cumulative probability function. However, a
larger number of quantiles also decreases the capability of the polynomial for predicting the
true cumulative probability of the population. Thus, deciles are recommended for describing
the behavior of the data sample, whereas quartiles are recommended for estimating the
behavior of the population. In addition, it is recommended that both clamped and free
boundaries be tested and compared whenever the behavior of the probability density function
at the limits is not known.


The probability density function reconstruction method using quantile-based splines was
implemented as a worksheet with macros in Microsoft Excel®. This application, called
ForsChem Cesium XL, can be freely downloaded from the ForsChem Research site
(www.forschem.org).

Acknowledgments

This research did not receive any specific grant from funding agencies in the public,
commercial, or not-for-profit sectors.

References

[1] Hernandez, H. (2018). Comparison of Methods for the Reconstruction of Probability Density
Functions from Data Samples. ForsChem Research Reports, 3, 2018-12. doi:
10.13140/RG.2.2.30177.35686.

[2] Walpole, R. E., Myers, R. H., Myers, S. L., & Ye, K. (2012). Probability & Statistics for
Engineers & Scientists. 9th Edition, Prentice Hall, Boston. p. 255.

[3] Koenker, R., & Hallock, K. F. (2001). Quantile regression. Journal of Economic Perspectives,
15(4), 143-156.

[4] Koenker, R., Ng, P., & Portnoy, S. (1994). Quantile smoothing splines. Biometrika, 81(4), 673-
680.

[5] De Rossi, G., & Harvey, A. (2009). Quantiles, expectiles and splines. Journal of Econometrics,
152(2), 179-185.

[6] Boneva, L. I., Kendall, D., & Stefanov, I. (1971). Spline transformations: Three new diagnostic
aids for the statistical data-analyst. Journal of the Royal Statistical Society. Series B
(Methodological), 33(1), 1-71.

[7] Wegman, E. J., & Wright, I. W. (1983). Splines in Statistics. Journal of the American Statistical
Association, 78(382), 351-365.

[8] Beatson, R. K., & Chacko, E. (1992). Which cubic spline should one use?. SIAM Journal on
Scientific and Statistical Computing, 13(4), 1009-1024.

[9] Fritsch, F. N., & Carlson, R. E. (1980). Monotone piecewise cubic interpolation. SIAM Journal
on Numerical Analysis, 17(2), 238-246.

[10] Kiusalaas, J. (2005). Numerical Methods in Engineering with Python. Cambridge University
Press, Cambridge. p. 117.


[11] Burden, R. L., & Faires, J. D. (2011). Numerical Analysis, 9th Edition. Brooks/Cole Cengage
Learning, Boston. p. 146.

[12] Hernandez, H. (2019). Probability Density Functions of Imaginary and Complex Random
Variables. ForsChem Research Reports, 4, 2019-09. doi: 10.13140/RG.2.2.20867.66083.

[13] Hernandez, H. (2018). Multidimensional Randomness, Standard Random Variables and
Variance Algebra. ForsChem Research Reports, 3, 2018-02. doi: 10.13140/RG.2.2.11902.48966.

[14] Hernandez, H. (2018). Parameter Identification using Standard Transformations: An
Alternative Hypothesis Testing Method. ForsChem Research Reports, 3, 2018-04. doi:
10.13140/RG.2.2.14895.02728.

[15] Hernandez, H. (2020). Molecular Speed Limits and Truncated Distributions. ForsChem
Research Reports, 5, 2020-17. doi: 10.13140/RG.2.2.29426.73922.

[16] Silverman, B. W. (1998). Density Estimation for Statistics and Data Analysis. Boca Raton:
Chapman & Hall/CRC.
