1. Introduction
The Pareto distribution is a key tool in economics for modeling wealth distributions, as well as in areas such as insurance and finance, where capturing extreme events is crucial [1]. However, real-world data often exhibit more complex structures than the classical Pareto distribution can describe [2]. In this context, Feller (1971) [3] proposed an extension, the Pareto–Feller distribution, which includes an additional shape parameter and can therefore fit a wider range of heavy-tailed phenomena [1].
The Pareto–Feller distribution has been widely used across disciplines to model heavy-tailed phenomena, where extreme events such as high incomes or large losses are of interest. It emerged from the need for greater flexibility in data modeling: the additional parameter controls the tail shape and skewness, giving a more accurate description of empirical data than the standard Pareto distribution [2]. Applications include risk theory, river flow modeling, and natural disaster analysis, where both heavy tails and asymmetry must be represented and where conventional distributions, such as the classical Pareto distribution, fail to capture the observed variability and extreme behavior. Notable related distributions include Pareto type I, used in wealth analysis [1]; the generalized Pareto distribution for modeling extreme events [4]; and the Lomax (Pareto type II) distribution, applied in survival analysis [5]. Other important variants are Pareto type IV, which offers greater flexibility in tail shapes [6], and the truncated Pareto distribution, used in scenarios with physical upper limits [7]. Additionally, the log-logistic distribution, which is employed in contexts similar to those of the Pareto distribution but with greater flexibility in the tails, is used in survival analysis and system failure studies [8].
The Pareto–Feller distribution can be constructed as a location-scale transformation of the ratio of two independent gamma-distributed random variables. This method allows the distribution to capture a wide range of tail behaviors and offers flexibility in modeling heavy-tailed phenomena. The use of gamma distributions for generating such models is well documented in the statistical literature. Specifically, more flexible models can be obtained through an appropriate transformation of a bivariate gamma distribution or independent copies of it. This approach is appealing because the correlation of the transformed bivariate gamma distribution is directly tied to the correlation of the original bivariate gamma distribution. Such constructions are commonly used in spatial data modeling. Examples include
distributed spatial models [
9], Weibull spatial models [
10], and Poisson spatial models [
11]. Kotz et al. (2004) [12] described similar methods for transforming distributions via ratios of gamma variables, which are widely applicable in reliability and survival analysis.
Since the construction of the Pareto–Feller distribution is related to the gamma distribution, it is necessary to first define the bivariate gamma distribution to develop the bivariate case of the Pareto–Feller distribution. Thus, we start by defining a sequence of independent normal random variables and show how these lead to a gamma distribution as follows. Let
,
with
be a sequence of independent standardized normal random variables whose correlation function is given by
,
, and let
Then,
is a random variable with a gamma marginal distribution (i.e.,
, where
represents the shape parameter and
represents the rate parameter), with the probability density function (pdf) being given by
where
and
[
10].
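The link between squared normal variables and the gamma distribution can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the unscaled sum of m squared independent standard normal variables, which follows a chi-square distribution with m degrees of freedom, i.e., a gamma distribution with shape m/2 and rate 1/2. The exact scaling used in Equation (1) is not reproduced here, and the names `sum_of_squared_normals`, `m`, and `n` are hypothetical.

```python
import random
import statistics


def sum_of_squared_normals(m, rng):
    """Return one draw of Z_1^2 + ... + Z_m^2 with Z_k independent N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m))


rng = random.Random(2024)  # fixed seed so the run is reproducible
m = 6                      # number of normal variables (illustrative choice)
n = 50_000                 # Monte Carlo sample size

draws = [sum_of_squared_normals(m, rng) for _ in range(n)]

# A gamma variable with shape m/2 and rate 1/2 has mean m and variance 2m.
sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)
print(f"mean {sample_mean:.3f} (theory {m}), variance {sample_var:.3f} (theory {2 * m})")
```

Dividing the sum by a positive constant only changes the rate parameter, so any rescaled version of this construction remains gamma distributed with the same shape.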
The construction of a bivariate Pareto–Feller distribution is derived from the ratio of two bivariate gamma distributions as shown in
Section 2. We consider the bivariate vector
, where the stochastic representation of
,
is given in Equation (
1). Vere-Jones [
13] showed that the distribution of
has a correlated bivariate gamma distribution with the parameters
and
, while the pdf is given by
where
is the usual modified Bessel function of the
-order of the first kind. Gamma variables can be used as building blocks for the construction of flexible non-Gaussian variables. Henceforth, we will call
a gamma random vector with an underlying correlation
[
14,
15] such that the correlation of the gamma bivariate is
. Moreover, when
, Equation (
2) can be written as the product of two independent gamma random variables (i.e.,
,
). Thus, zero pairwise correlation implies pairwise independence, as in the Gaussian case. The pdf
was first discussed in [
16], and its properties were studied in [
17,
18].
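A small simulation can illustrate how an underlying normal correlation carries over to the gamma pair. The sketch assumes the standard construction in which each gamma component is a sum of m squared standard normal variables and the k-th normal variables of the two components have correlation rho; for that construction the correlation of the gamma pair equals rho squared. The function names and parameter values are hypothetical and the scaling of Equation (1) is omitted, which does not affect the correlation.

```python
import math
import random
import statistics


def correlated_gamma_pair(m, rho, rng):
    """Draw (W1, W2) as sums of m squared N(0, 1) pairs with correlation rho."""
    w1 = w2 = 0.0
    s = math.sqrt(1.0 - rho * rho)
    for _ in range(m):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + s * rng.gauss(0.0, 1.0)  # corr(z1, z2) = rho
        w1 += z1 * z1
        w2 += z2 * z2
    return w1, w2


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)
m, rho, n = 4, 0.7, 40_000
pairs = [correlated_gamma_pair(m, rho, rng) for _ in range(n)]
w1s, w2s = zip(*pairs)

gamma_corr = pearson(w1s, w2s)
print(f"sample correlation {gamma_corr:.3f}, rho^2 = {rho ** 2:.3f}")
```

Setting `rho = 0` makes the two sums depend on disjoint sets of independent normal variables, which matches the independence statement for zero correlation.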
The bivariate Pareto–Feller distribution is a less commonly discussed extension in the statistical literature. It builds upon the principles established by the univariate Pareto distribution. The authors of [19] addressed extreme value distributions and included discussions which may be relevant to bivariate generalizations. Additionally, the authors of [20] provided further insights into bivariate distributions, enriching the understanding of Pareto–Feller distributions. These references offer both theoretical and practical frameworks for researching and applying Pareto–Feller distributions in bivariate contexts. They motivated this paper, which presents a bivariate Pareto–Feller distribution built from an Appell hypergeometric function.
This paper is organized as follows. Section 2 presents the bivariate Pareto–Feller distribution, including its pdf, cumulative distribution function (cdf), joint moment-generating function (mgf), characteristic function, cross-product moments, mean, variance, covariance, and correlation. Section 3 presents approximations of the differential entropy and, consequently, of the mutual information index. Section 4 closes with some discussion and conclusions. All numerical computations involve special functions such as the Appell hypergeometric function and were implemented in R 4.4.1 [21] using the zipfR and hypergeo packages. All proofs of theorems and propositions can be found in Appendix A.
2. Bivariate Pareto–Feller Distribution
Let us define a random variable
V with support on the positive real line, defined as a scale mixture of two gamma random variables:
where
,
and
,
. Then,
V is a random variable with a marginal distribution of the beta I type or beta prime [
12,
22] and denoted by
, with the pdf given by
The beta prime distribution is anchored to a shape parameter of the gamma distributions. This construction was previously proposed in [
23,
24,
25].
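The ratio construction can be checked against the standard beta prime density. The following sketch assumes unit-rate gamma variables with hypothetical shape parameters `a` and `b` (the paper's own symbols are not reproduced here) and uses the textbook density v^(a-1) (1+v)^(-(a+b)) / B(a, b) for v > 0, whose mean is a/(b-1) when b > 1.

```python
import math
import random
import statistics


def beta_prime_pdf(v, a, b):
    """Standard beta prime density with shape parameters a and b, for v > 0."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1.0) * math.log(v) - (a + b) * math.log1p(v) - log_beta)


def beta_prime_cdf_numeric(x, a, b, steps=20_000):
    """Midpoint-rule approximation of the cdf on (0, x)."""
    h = x / steps
    return h * sum(beta_prime_pdf((i + 0.5) * h, a, b) for i in range(steps))


rng = random.Random(7)
a, b, n = 2.0, 5.0, 100_000  # illustrative shape parameters and sample size

# random.gammavariate takes (shape, scale); a common scale cancels in the ratio.
ratios = [rng.gammavariate(a, 1.0) / rng.gammavariate(b, 1.0) for _ in range(n)]

sample_mean = statistics.fmean(ratios)
empirical_cdf = sum(v <= 1.0 for v in ratios) / n
numeric_cdf = beta_prime_cdf_numeric(1.0, a, b)
print(f"mean {sample_mean:.4f} (theory {a / (b - 1.0):.4f})")
print(f"P(V <= 1): empirical {empirical_cdf:.4f}, numerical {numeric_cdf:.4f}")
```

Because only the ratio matters, both gamma variables may share any common rate without changing the result.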
Based on the stochastic representation in Equation (
3), we consider the bivariate vector of
, where
Here, and are two correlated bivariate gamma distributions with correlations , where , and , , , , . Thus, , .
A new bivariate distribution with a beta prime marginal distribution obtained from the Kibble-type bivariate gamma distribution given in Equation (
2) is presented in the following theorem. This result can be viewed as a generalization of the standard bivariate beta I distribution (or inverted bivariate beta distribution) [
26].
Theorem 1. Let W and R be two independent gamma random variables, and let . Then, the pdf of is given by
where is an Appell hypergeometric function of the fourth kind, defined as
The special functions
and the Gaussian hypergeometric function
are related through the identity
where
, for which
is the Pochhammer symbol. The Gaussian hypergeometric function is a special case of more general power series, where the generalized hypergeometric function is defined for
as
When
, the pdf in Theorem 1 involves the product of two independent beta prime random variables
.
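The special functions named above can be evaluated by truncating their defining power series. The sketch below is an assumption-laden illustration, not the implementation used in the paper (which relies on R packages): it uses the standard double series of the Appell function of the fourth kind, which converges when sqrt|x| + sqrt|y| < 1, and the standard Gauss series for |z| < 1. The function names, argument names, and truncation lengths are hypothetical.

```python
import math


def pochhammer(a, k):
    """Rising factorial (a)_k = a (a + 1) ... (a + k - 1), with (a)_0 = 1."""
    result = 1.0
    for j in range(k):
        result *= a + j
    return result


def hyp2f1(a, b, c, z, terms=200):
    """Truncated Gauss hypergeometric series, intended for |z| < 1."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
    return total


def appell_f4(a, b, c1, c2, x, y, terms=30):
    """Truncated double series of the Appell function of the fourth kind."""
    total = 0.0
    for m in range(terms):
        for n in range(terms):
            coeff = (pochhammer(a, m + n) * pochhammer(b, m + n)
                     / (pochhammer(c1, m) * pochhammer(c2, n)
                        * math.factorial(m) * math.factorial(n)))
            total += coeff * x ** m * y ** n
    return total


a, b, c1, c2, x, y = 1.5, 2.0, 2.5, 3.0, 0.05, 0.08
double_sum = appell_f4(a, b, c1, c2, x, y)

# Same value written as a single series of Gauss hypergeometric functions,
# using (a)_{m+n} = (a)_m (a + m)_n.
single_sum = sum(
    pochhammer(a, m) * pochhammer(b, m) / (pochhammer(c1, m) * math.factorial(m))
    * x ** m * hyp2f1(a + m, b + m, c2, y)
    for m in range(30)
)
print(double_sum, single_sum)
```

Plain truncation is adequate only well inside the convergence region; close to the boundary, dedicated routines such as those in the R packages cited in the Introduction are a safer choice.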
We now consider a new random variable
Y, defined as
where
,
,
, and
. This random variable is a marginal Pareto–Feller distribution. Specifically, using the notation of [
20], we marginally have
with a density defined by
and a mean and variance given by
respectively, with
[
3].
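As a rough numerical illustration of how a gamma ratio yields Pareto-type laws, the sketch below uses the generic Feller–Pareto representation mu + sigma (G2/G1)^gamma with independent unit-rate gamma variables G1 and G2. This parameterization and the names `mu`, `sigma`, `gamma`, `shape1`, and `shape2` are assumptions made for the example and need not coincide with the paper's notation. With `shape2 = 1` the representation reduces to a Pareto type IV law, whose survival function is known in closed form and serves as a check.

```python
import random


def feller_pareto_draw(mu, sigma, gamma, shape1, shape2, rng):
    """Draw mu + sigma * (G2 / G1) ** gamma with independent unit-rate gammas."""
    g1 = rng.gammavariate(shape1, 1.0)
    g2 = rng.gammavariate(shape2, 1.0)
    return mu + sigma * (g2 / g1) ** gamma


def pareto_iv_survival(y, mu, sigma, gamma, alpha):
    """P(Y > y) for Pareto type IV with location mu, scale sigma, and y > mu."""
    return (1.0 + ((y - mu) / sigma) ** (1.0 / gamma)) ** (-alpha)


rng = random.Random(11)
mu, sigma, gamma, alpha = 1.0, 2.0, 0.5, 3.0  # illustrative values
n = 100_000

# shape2 = 1 gives the Pareto type IV special case.
draws = [feller_pareto_draw(mu, sigma, gamma, alpha, 1.0, rng) for _ in range(n)]

y0 = 2.0
empirical_survival = sum(d > y0 for d in draws) / n
exact_survival = pareto_iv_survival(y0, mu, sigma, gamma, alpha)
print(f"P(Y > {y0}): empirical {empirical_survival:.4f}, exact {exact_survival:.4f}")
```

Further restricting `gamma = 1` gives the Lomax (Pareto type II) case, and `mu = 0`, `sigma = 1`, `gamma = 1` with general shapes gives a beta prime ratio.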
The Pareto–Feller distribution includes as special cases the Pareto distributions of types I, II, III, and IV (see [20]) as well as the beta prime distribution. If we consider
, then
, and
is a random vector with a marginal Pareto–Feller distribution. A new bivariate distribution based on marginal Pareto–Feller distributions is presented in the following theorem.
Theorem 2. Let , where , . The pdf of is given by
The pdf in Theorem 2 involves an Appell hypergeometric function , which can be written as a series of hypergeometric functions .
Figure 1 illustrates the pdf of Equation (
7) for some parameters. When
increases, the largest values of
and
in the pdf are produced. However, these values depend on the other parameters. Independent of the
value, the pdf is close at the origin
with a positive bias, and it decays exponentially for the smallest values (
,
,
,
, and
). When
and
or 12, for example, the pdf has more symmetry and variability but less bias. Note that the pdf of
, given in Equation (
5), is symmetric for negative values of
. Specifically, the same representations hold for
from
Figure 1 while keeping the other parameters fixed.
Theorem 3. The joint cdf of in Equation (7) can be expressed as
Proposition 1. The joint mgf and characteristic function of given in Equation (7) are
with and
Proposition 2. The cross-product moment of in Equation (7) can be expressed as
Proposition 2 shows that the cross-product moment is the product of two Gaussian hypergeometric functions. Corollary 1 follows directly from Proposition 2 and gives the expected value and variance of a marginal Pareto–Feller random variable, as well as the covariance and correlation between two marginal Pareto–Feller random variables ( and ).
Corollary 1. If , then it has a pdf according to Equation (7). According to Proposition 2, we have the following: - 1.
, .
- 2.
if for .
- 3.
- 4.
Let . Thus, we have
Figure 2 shows the correlation
for some density parameters. When
and
increase,
increases. More specifically, when
p increases, the correlation
increases with small-to-large values of
and
. When
increases,
increases, as does its maximum value (from 0.05 to 0.80). Nevertheless, Corollary 1(4) illustrates that the correlation
does not depend on either
q or
c.
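The qualitative behavior of the correlation can be explored by simulation. The sketch below is a simplified stand-in rather than the paper's model: it forms each component as a ratio of gamma variables, takes the numerator pair and the denominator pair from two independent correlated gamma vectors that share the same underlying correlation `rho`, and omits the location, scale, and power parameters. All names and parameter values are hypothetical.

```python
import math
import random
import statistics


def correlated_gamma_pair(m, rho, rng):
    """Draw (W1, W2) as sums of m squared N(0, 1) pairs with correlation rho."""
    w1 = w2 = 0.0
    s = math.sqrt(1.0 - rho * rho)
    for _ in range(m):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + s * rng.gauss(0.0, 1.0)
        w1 += z1 * z1
        w2 += z2 * z2
    return w1, w2


def bivariate_ratio_sample(n, m_num, m_den, rho, rng):
    """Simulate n pairs (W1/R1, W2/R2) from two independent gamma vectors."""
    v1s, v2s = [], []
    for _ in range(n):
        w1, w2 = correlated_gamma_pair(m_num, rho, rng)
        r1, r2 = correlated_gamma_pair(m_den, rho, rng)
        v1s.append(w1 / r1)
        v2s.append(w2 / r2)
    return v1s, v2s


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(5)
n = 20_000
# Denominator shape m_den / 2 = 10 keeps the fourth moment of each ratio finite.
corr_low = pearson(*bivariate_ratio_sample(n, 4, 20, 0.3, rng))
corr_high = pearson(*bivariate_ratio_sample(n, 4, 20, 0.9, rng))
print(f"rho = 0.3: {corr_low:.3f}; rho = 0.9: {corr_high:.3f}")
```

Since the Pearson correlation is invariant under positive affine maps, adding a location or multiplying by a positive scale would leave these estimates unchanged; a power transform would not.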
3. Differential Entropy and Mutual Information Index
The differential entropy of
is an information uncertainty measure [
27]. The differential entropy of
with a pdf
is
and measures the contained information in
based on its pdf’s parameters.
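For intuition, the differential entropy can be estimated as the negative sample mean of the log-density. The sketch below does this for the beta prime distribution with hypothetical shapes `a` and `b`, and compares the estimate with a closed form derived here from the standard identities E[log V] = psi(a) - psi(b) and E[log(1 + V)] = psi(a + b) - psi(b), where psi is the digamma function. This closed form is a worked example under those assumptions and is not claimed to be the expression of any proposition in the paper. The `digamma` helper is a simple recurrence plus asymptotic expansion written for the example.

```python
import math
import random
import statistics


def digamma(x):
    """Digamma for x > 0: shift x up to at least 6, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_prime_log_pdf(v, a, b):
    """Log-density of the standard beta prime distribution, for v > 0."""
    return (a - 1.0) * math.log(v) - (a + b) * math.log1p(v) - log_beta(a, b)


def beta_prime_entropy(a, b):
    """Closed-form differential entropy of the beta prime distribution."""
    return (log_beta(a, b)
            - (a - 1.0) * (digamma(a) - digamma(b))
            + (a + b) * (digamma(a + b) - digamma(b)))


rng = random.Random(3)
a, b, n = 2.0, 5.0, 100_000
sample = [rng.gammavariate(a, 1.0) / rng.gammavariate(b, 1.0) for _ in range(n)]

mc_entropy = -statistics.fmean(beta_prime_log_pdf(v, a, b) for v in sample)
exact_entropy = beta_prime_entropy(a, b)
print(f"Monte Carlo {mc_entropy:.4f}, closed form {exact_entropy:.4f}")
```

The same Monte Carlo recipe applies to any density that can be both sampled and evaluated, which is useful when no closed form is available.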
On the other hand, the mutual information index (MII) [
28] between
and
under a dependence assumption (
) is
Proposition 3 ([29]). Let . The entropy of () is
where is the digamma function.
Proposition 4 ([30]). Let . We have that
as .
Proposition 5. If has the pdf given in Equation (7), then the following are true:
- (a) .
- (b) .
Proposition 6. An approximation of the differential entropy of is
where and , are obtained from parts (a) and (b) of Proposition 5, respectively.
From Equation (
15), an MII between
and
is expressed in terms of the marginal and joint differential entropies [
28,
31]. Then, using Propositions 3 and 6, the MII between
and
can be approximated as follows:
One particular case is when
. Thus, using Corollary 1(1), we have
Figure 3 illustrates the behavior of the approximated MII obtained in Equation (
16) while assuming several values for
and
. We observed that
and increased for
. As in the correlation function case (
Figure 2), the MII increased when the correlation parameter
increased.
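The decomposition of the mutual information into marginal and joint differential entropies can be verified in a case where every term is available in closed form. The sketch below swaps in the bivariate Gaussian distribution purely as a tractable stand-in (it is not the Pareto–Feller model): there, the sum of the marginal entropies minus the joint entropy equals -0.5 log(1 - rho^2). Function names and parameter values are hypothetical.

```python
import math


def gaussian_entropy(variance):
    """Differential entropy of a univariate normal distribution."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


def bivariate_gaussian_entropy(var_x, var_y, rho):
    """Differential entropy of a bivariate normal with correlation rho."""
    det = var_x * var_y * (1.0 - rho * rho)  # determinant of the covariance matrix
    return math.log(2.0 * math.pi * math.e) + 0.5 * math.log(det)


def mutual_information(var_x, var_y, rho):
    """Mutual information as marginal entropies minus the joint entropy."""
    return (gaussian_entropy(var_x) + gaussian_entropy(var_y)
            - bivariate_gaussian_entropy(var_x, var_y, rho))


rhos = (0.0, 0.3, 0.6, 0.9)
mi_values = [mutual_information(1.0, 2.0, r) for r in rhos]
for r, mi in zip(rhos, mi_values):
    print(f"rho = {r}: MI = {mi:.4f}")
```

As in the behavior reported for the approximated index, the value is zero under independence and grows with the strength of dependence.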
4. Concluding Remarks
We presented a representation of the Pareto–Feller distribution based on a scale mixture of two gamma random variables, with the stochastic representation obtained from the quotient of these variables. The resulting bivariate density involves the products of two confluent hypergeometric functions. In particular, the probability density function, cumulative distribution function, moment-generating function, covariance function, correlation function, cross-product moments, and approximations of the differential entropy and, consequently, of the mutual information index were derived. Some numerical examples illustrated the behavior of the derived expressions.
Some inferential aspects can be addressed in future work, such as (1) numerical optimization of the log-likelihood function; (2) the pseudo-likelihood method, which optimizes an objective function that depends on a bivariate pdf; (3) the model’s identifiability; (4) a Bayesian approach; and (5) an extension to the multivariate case. We also encourage research on the use of the Pareto–Feller distribution for modeling nonnegative bivariate data.