What Are Copulas
Abstract. The notion of copula was introduced by A. Sklar in 1959, when answering a question raised by M. Fréchet about the relationship between a multidimensional probability function and its lower-dimensional margins. At the beginning, copulas were mainly used in the development of the theory of probabilistic metric spaces. Later, they were used to define nonparametric measures of dependence between random variables, and since then they have played an important role in probability and mathematical statistics. In this paper, a general overview of the theory of copulas is presented. Some of the main results of this theory, various examples, and some open problems are described.
MSC: Primary 60E05; Secondary 62H05, 62H20.
Keywords: Copulas; dependence concepts; measures of association; probabilistic metric spaces.
Introduction
For a long time, statisticians have been interested in the relationship between a multivariate distribution function and its lower-dimensional margins (univariate or of higher dimensions). M. Fréchet (see [11]) and G. Dall'Aglio (see [6]) did some interesting work on this matter in the fifties, studying bivariate and trivariate distribution functions with given univariate margins. The answer to this problem for the case of univariate margins was given by A. Sklar in 1959 (see [31]), who created a new class of functions which he called copulas. These new functions are restrictions to $[0,1]^2$ of bivariate distribution functions
whose margins are uniform on $[0,1]$. In short, Sklar showed that if $H$ is a bivariate distribution function with margins $F(x)$ and $G(y)$, then there exists a copula $C$ such that $H(x,y) = C(F(x), G(y))$.

Between 1959 and 1976, most of the results about copulas were obtained in the course of the development of the theory of probabilistic metric spaces, mainly in the study of binary operations on the space of probability distribution functions. In 1942, Karl Menger (see [20]) proposed a probabilistic generalization of the theory of metric spaces, replacing the number $d(p,q)$ by a distribution function $F_{pq}$, whose value $F_{pq}(x)$ for any real $x$ is the probability that the distance between $p$ and $q$ is less than $x$. The first difficulty in the construction of probabilistic metric spaces arises when one tries to find a probabilistic analogue of the triangle inequality. Menger proposed
$$F_{pr}(x+y) \ge T\bigl(F_{pq}(x), F_{qr}(y)\bigr),$$
where $T$ is a triangular norm, or t-norm. Some t-norms are copulas and, conversely, some copulas are t-norms. For a history of the development of the theory of probabilistic metric spaces, see [28] and [29].

Subsequently, it was discovered that copulas could be used to define nonparametric measures of dependence between random variables. Since then, the concept of copula has been rediscovered several times, and copulas play an important role in Probability and Statistics, particularly in problems related to dependence, given marginals, and functions of random variables that are invariant under monotone transformations. A historical review of the evolution of this subject can be found in [7] and [28]. The recent book by R. B. Nelsen (see [21]) is an important monograph on copulas. For the relationship with problems of given marginals, see [2], [5], [8] and [27].
Copulas
We begin with the definition of copula for the bivariate case.

Definition 2.1. A copula is a function $C\colon [0,1]^2 \to [0,1]$ which satisfies:
(a) for every $u, v$ in $[0,1]$, $C(u,0) = 0 = C(0,v)$, $C(u,1) = u$ and $C(1,v) = v$;
(b) for every $u_1, u_2, v_1, v_2$ in $[0,1]$ such that $u_1 \le u_2$ and $v_1 \le v_2$,
$$C(u_2,v_2) - C(u_2,v_1) - C(u_1,v_2) + C(u_1,v_1) \ge 0,$$
that is, $C$ is 2-increasing.

The importance of copulas in Statistics is described in Sklar's Theorem:

Theorem 2.1. Let $X$ and $Y$ be random variables with joint distribution function $H$ and marginal distribution functions $F$ and $G$, respectively. Then there exists a copula $C$ such that
$$H(x,y) = C\bigl(F(x), G(y)\bigr) \qquad (1)$$
for all $x, y$ in $\mathbb{R}$. If $F$ and $G$ are continuous, then $C$ is unique; otherwise, $C$ is uniquely determined on $\operatorname{Ran}(F) \times \operatorname{Ran}(G)$. Conversely, if $C$ is a copula and $F$ and $G$ are distribution functions, then the function $H$ defined by (1) is a joint distribution function with margins $F$ and $G$.

Thus copulas link joint distribution functions to their one-dimensional margins. A proof of this theorem can be found in [29].

A first example of a copula is the product copula $\Pi(u,v) = uv$, which characterizes independent random variables when the distribution functions are continuous. Every copula satisfies the Fréchet-Hoeffding bounds: for any copula $C$ and all $u, v$ in $[0,1]$,
$$W(u,v) = \max(u+v-1, 0) \le C(u,v) \le \min(u,v) = M(u,v),$$
where $W$ and $M$ are also copulas.

Much of the usefulness of copulas in the study of nonparametric statistics derives from the facts expressed in the following result.

Theorem 2.2. Let $X$ and $Y$ be continuous random variables with copula $C_{XY}$. Let $f$ and $g$ be strictly monotone functions on $\operatorname{Ran}(X)$ and $\operatorname{Ran}(Y)$, respectively.
(a) If $f$ and $g$ are strictly increasing, then $C_{f(X),g(Y)}(u,v) = C_{XY}(u,v)$.
(b) If $f$ is strictly increasing and $g$ is strictly decreasing, then $C_{f(X),g(Y)}(u,v) = u - C_{XY}(u, 1-v)$.
(c) If $f$ is strictly decreasing and $g$ is strictly increasing, then $C_{f(X),g(Y)}(u,v) = v - C_{XY}(1-u, v)$.
(d) If $f$ and $g$ are strictly decreasing, then $C_{f(X),g(Y)}(u,v) = u + v - 1 + C_{XY}(1-u, 1-v)$.
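The conditions of Definition 2.1, the Fréchet-Hoeffding bounds, the converse half of Sklar's Theorem and part (b) of Theorem 2.2 can all be probed numerically. The Python sketch below is illustrative only: every function name in it (`looks_like_copula`, `within_frechet_bounds`, `flip_second`, `sklar_joint_cdf`) is a hypothetical helper, and a check on a finite grid can refute, but never prove, that a function is a copula.

```python
from typing import Callable

Copula = Callable[[float, float], float]


def product_copula(u: float, v: float) -> float:
    """Pi(u, v) = uv: the copula of independent continuous random variables."""
    return u * v


def lower_w(u: float, v: float) -> float:
    """Frechet-Hoeffding lower bound W(u, v) = max(u + v - 1, 0)."""
    return max(u + v - 1.0, 0.0)


def upper_m(u: float, v: float) -> float:
    """Frechet-Hoeffding upper bound M(u, v) = min(u, v)."""
    return min(u, v)


def _grid(n: int) -> list[float]:
    return [i / n for i in range(n + 1)]


def looks_like_copula(c: Copula, n: int = 20, tol: float = 1e-12) -> bool:
    """Test conditions (a) and (b) of Definition 2.1 on an (n+1) x (n+1) grid."""
    g = _grid(n)
    for t in g:
        # (a) boundary conditions: C(t,0) = C(0,t) = 0, C(t,1) = C(1,t) = t
        if abs(c(t, 0.0)) > tol or abs(c(0.0, t)) > tol:
            return False
        if abs(c(t, 1.0) - t) > tol or abs(c(1.0, t) - t) > tol:
            return False
    # (b) 2-increasing: every grid cell must have nonnegative C-volume; any
    # rectangle with grid corners is a union of cells, so its volume is a sum.
    for i in range(n):
        for j in range(n):
            u1, u2, v1, v2 = g[i], g[i + 1], g[j], g[j + 1]
            if c(u2, v2) - c(u2, v1) - c(u1, v2) + c(u1, v1) < -tol:
                return False
    return True


def within_frechet_bounds(c: Copula, n: int = 20, tol: float = 1e-12) -> bool:
    """Check W(u, v) <= C(u, v) <= M(u, v) at the grid points."""
    g = _grid(n)
    return all(lower_w(u, v) - tol <= c(u, v) <= upper_m(u, v) + tol
               for u in g for v in g)


def flip_second(c: Copula) -> Copula:
    """Theorem 2.2(b): copula of (f(X), g(Y)) for f increasing, g decreasing."""
    return lambda u, v: u - c(u, 1.0 - v)


def sklar_joint_cdf(c: Copula, f: Callable[[float], float],
                    g: Callable[[float], float]) -> Callable[[float, float], float]:
    """Converse of Sklar's Theorem: H(x, y) = C(F(x), G(y)) as in (1)."""
    return lambda x, y: c(f(x), g(y))
```

For example, `flip_second(upper_m)` agrees with `lower_w` at every grid point: applying Theorem 2.2(b) to $M$ (comonotone variables) yields $W$ (countermonotone variables), since $u - \min(u, 1-v) = \max(u+v-1, 0)$.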
In this section we present some of the best-known families of copulas.

Example 3.1. Fréchet's family (1958) (see [12]). The following two-parameter family of copulas is a convex linear combination of the copulas $\Pi$, $W$ and $M$:
$$C_{\alpha,\beta}(u,v) = \alpha M(u,v) + (1-\alpha-\beta)\,\Pi(u,v) + \beta W(u,v),$$
where $\alpha, \beta \in [0,1]$ with $\alpha + \beta \le 1$.

Example 3.2. Farlie-Gumbel-Morgenstern family (1960) (see [10]). If $\theta \in [-1,1]$, then
$$C_\theta(u,v) = uv + \theta uv(1-u)(1-v)$$
is a one-parameter family of copulas. A generalization of this family can be found in [26].

Example 3.3. Marshall-Olkin family (1967) (see [19]). If $\alpha, \beta \in [0,1]$, then
$$C_{\alpha,\beta}(u,v) = \min\bigl(u^{1-\alpha} v,\; u v^{1-\beta}\bigr)$$
is a two-parameter family of copulas.

Example 3.4. Archimedean copulas. Copulas of the form
$$C(u,v) = \varphi^{[-1]}\bigl(\varphi(u) + \varphi(v)\bigr)$$
are called Archimedean copulas, where $\varphi$ is a continuous, strictly decreasing and convex function from $[0,1]$ to $[0,\infty]$ with $\varphi(1) = 0$, and $\varphi^{[-1]}$ is its pseudo-inverse (equal to $\varphi^{-1}$ on $[0,\varphi(0)]$ and to $0$ on $[\varphi(0),\infty]$). For a detailed study of these copulas, see [13].
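To experiment with Examples 3.1 to 3.4, each family can be written as a function that returns the copula for given parameters. The Python sketch below uses our own hypothetical helper names (`frechet_family`, `fgm`, `marshall_olkin`, `archimedean`, `clayton`); it is not a library API. The Clayton generator $\varphi(t) = (t^{-\theta}-1)/\theta$ with $\theta > 0$ is not discussed in the text and is used here only as a sample generator.

```python
import math
from typing import Callable

Copula = Callable[[float, float], float]


def frechet_family(alpha: float, beta: float) -> Copula:
    """Example 3.1: alpha*M + (1 - alpha - beta)*Pi + beta*W."""
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0 and alpha + beta <= 1.0):
        raise ValueError("need alpha, beta in [0, 1] with alpha + beta <= 1")
    return lambda u, v: (alpha * min(u, v)
                         + (1.0 - alpha - beta) * u * v
                         + beta * max(u + v - 1.0, 0.0))


def fgm(theta: float) -> Copula:
    """Example 3.2: Farlie-Gumbel-Morgenstern copula, theta in [-1, 1]."""
    if not -1.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [-1, 1]")
    return lambda u, v: u * v + theta * u * v * (1.0 - u) * (1.0 - v)


def marshall_olkin(alpha: float, beta: float) -> Copula:
    """Example 3.3: min(u^(1-alpha) v, u v^(1-beta)), alpha, beta in [0, 1]."""
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("alpha and beta must lie in [0, 1]")
    return lambda u, v: min(u ** (1.0 - alpha) * v, u * v ** (1.0 - beta))


def archimedean(phi: Callable[[float], float],
                phi_inverse: Callable[[float], float],
                phi_at_zero: float) -> Copula:
    """Example 3.4: C(u, v) = phi^[-1](phi(u) + phi(v)).

    phi is assumed continuous, strictly decreasing and convex with phi(1) = 0;
    phi_inverse is its ordinary inverse on [0, phi(0)].
    """
    def pseudo_inverse(s: float) -> float:
        # phi^[-1](s) = phi^{-1}(s) below phi(0), and 0 from phi(0) onwards
        return 0.0 if s >= phi_at_zero else phi_inverse(s)

    return lambda u, v: pseudo_inverse(phi(u) + phi(v))


def clayton(theta: float) -> Copula:
    """Illustrative Archimedean member (not from the text), theta > 0 only."""
    if theta <= 0.0:
        raise ValueError("this sketch only handles theta > 0")

    def phi(t: float) -> float:
        # phi(0) = +infinity, so the pseudo-inverse is the ordinary inverse
        return math.inf if t == 0.0 else (t ** -theta - 1.0) / theta

    return archimedean(phi, lambda s: (1.0 + theta * s) ** (-1.0 / theta),
                       math.inf)
```

As a usage example, `archimedean(lambda t: 1.0 - t, lambda s: 1.0 - s, 1.0)` has a finite $\varphi(0) = 1$, so the pseudo-inverse is genuinely needed; it reproduces $W$. Likewise, `marshall_olkin(1.0, 1.0)` reduces to $M$ and `marshall_olkin(0.0, 0.0)` to $\Pi$.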
In this section we will look at different ways in which copulas can be used in the study of dependence between random variables. For a historical review of measures of association and concepts of dependence, see [17] and [18]. For some recent results, see [4], [21], [22], and [30].
4.1 Measures of association
4.1.1 Kendall's tau
Kendall's tau of a pair $(X,Y)$, distributed according to $H$, can be defined as the difference between the probabilities of concordance and discordance for two independent pairs $(X_1,Y_1)$ and $(X_2,Y_2)$, each with distribution $H$; that is,
$$\tau_{XY} = \Pr\{(X_1-X_2)(Y_1-Y_2) > 0\} - \Pr\{(X_1-X_2)(Y_1-Y_2) < 0\}.$$
For continuous $X$ and $Y$, these probabilities can be evaluated by integrating over the distribution of $(X_2,Y_2)$, so that, in terms of the copula $C$ of $(X,Y)$, Kendall's $\tau$ becomes
$$\tau_C = 4\int_0^1\!\!\int_0^1 C(u,v)\,dC(u,v) - 1.$$
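When $C$ has a density $c(u,v) = \partial^2 C/\partial u\,\partial v$, the Stieltjes integral becomes $\int\!\!\int C\,c\,du\,dv$, which a midpoint rule approximates well. The Python sketch below assumes such a density; for the Farlie-Gumbel-Morgenstern copula of Example 3.2, differentiation gives $c(u,v) = 1 + \theta(1-2u)(1-2v)$ and the known value $\tau = 2\theta/9$. The helper names are hypothetical, and the sample version (counting concordant minus discordant pairs) assumes there are no ties.

```python
from typing import Callable

Func2 = Callable[[float, float], float]


def kendall_tau_from_copula(c: Func2, density: Func2, n: int = 200) -> float:
    """Approximate tau_C = 4 * int int C dC - 1 by the midpoint rule.

    Assumes C is absolutely continuous, so dC(u, v) = density(u, v) du dv.
    """
    mids = [(i + 0.5) / n for i in range(n)]
    integral = sum(c(u, v) * density(u, v) for u in mids for v in mids) / (n * n)
    return 4.0 * integral - 1.0


def sample_kendall_tau(xs: list[float], ys: list[float]) -> float:
    """Sample version: (concordant - discordant pairs) / number of pairs."""
    m = len(xs)
    score = 0
    for i in range(m):
        for j in range(i + 1, m):
            prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
            score += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return score / (m * (m - 1) / 2)
```

With $\theta = 0.5$, `kendall_tau_from_copula` on the FGM copula and its density returns a value within about $10^{-5}$ of $2\theta/9 \approx 0.111$, and for $\Pi$ (density 1) it returns $0$ up to rounding.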
Let $(X_1,Y_1)$, $(X_2,Y_2)$ and $(X_3,Y_3)$ be three independent random vectors with a common joint distribution function $H$. Consider the vectors $(X_1,Y_1)$ and $(X_2,Y_3)$; the components of the second vector are independent, with the same margins as $(X,Y)$. Spearman's $\rho$ associated with a pair $(X,Y)$ distributed according to $H$ is then defined as
$$\rho_{XY} = 3\bigl(\Pr\{(X_1-X_2)(Y_1-Y_3) > 0\} - \Pr\{(X_1-X_2)(Y_1-Y_3) < 0\}\bigr).$$
In terms of the copula $C$ associated with the pair $(X,Y)$, this becomes
$$\rho_C = 12\int_0^1\!\!\int_0^1 \bigl(C(u,v) - uv\bigr)\,du\,dv. \qquad (2)$$
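Formula (2) needs no density, so it can be approximated directly for any copula. The Python sketch below (hypothetical helper names, midpoint rule on an $n \times n$ grid) checks it against known values: $\rho = \theta/3$ for the FGM copula of Example 3.2, $\rho_M = 1$ and $\rho_W = -1$. For comparison it also includes the usual sample rank correlation $1 - 6\sum d_i^2 / (m(m^2-1))$, assuming no ties.

```python
from typing import Callable


def spearman_rho_from_copula(c: Callable[[float, float], float],
                             n: int = 200) -> float:
    """Approximate (2), 12 * int int (C(u, v) - uv) du dv, by the midpoint rule."""
    mids = [(i + 0.5) / n for i in range(n)]
    total = sum(c(u, v) - u * v for u in mids for v in mids)
    return 12.0 * total / (n * n)


def _ranks(values: list[float]) -> list[int]:
    """Ranks 1..m of the values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks


def sample_spearman_rho(xs: list[float], ys: list[float]) -> float:
    """Sample rank correlation 1 - 6 * sum(d_i^2) / (m (m^2 - 1))."""
    m = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(_ranks(xs), _ranks(ys)))
    return 1.0 - 6.0 * d2 / (m * (m * m - 1))
```

The grid size $n$ trades accuracy for speed; with $n = 200$ the error for $M$ and $W$, whose integrands have a kink along a diagonal, is of order $10^{-4}$ or smaller.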
If we replace the function $C(u,v) - uv$ in (2) by its absolute value, we obtain Schweizer and Wolff's $\sigma$, given by
$$\sigma_C = 12\int_0^1\!\!\int_0^1 \bigl|C(u,v) - uv\bigr|\,du\,dv.$$
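The difference between $\rho$ and $\sigma$ shows up when $C(u,v) - uv$ changes sign. The Python sketch below (hypothetical helper names, midpoint rule) evaluates both for the member $\alpha = \beta = 1/2$ of Fréchet's family, $(M + W)/2$: its $\rho$ is $0$, yet by a direct calculation its $\sigma$ is $1/4$, so $\sigma$ detects a dependence that $\rho$ misses. For the FGM copula with $\theta < 0$, $C - uv$ has constant sign, so $\sigma = |\theta|/3 = |\rho|$.

```python
from typing import Callable

Copula = Callable[[float, float], float]


def _midpoint_integral(f: Callable[[float, float], float], n: int) -> float:
    """Midpoint-rule approximation of the integral of f over [0, 1]^2."""
    mids = [(i + 0.5) / n for i in range(n)]
    return sum(f(u, v) for u in mids for v in mids) / (n * n)


def spearman_rho(c: Copula, n: int = 200) -> float:
    """rho_C from (2): 12 times the integral of C(u, v) - uv."""
    return 12.0 * _midpoint_integral(lambda u, v: c(u, v) - u * v, n)


def schweizer_wolff_sigma(c: Copula, n: int = 200) -> float:
    """sigma_C: 12 times the integral of |C(u, v) - uv|."""
    return 12.0 * _midpoint_integral(lambda u, v: abs(c(u, v) - u * v), n)


def half_m_half_w(u: float, v: float) -> float:
    """Example 3.1 with alpha = beta = 1/2: the mixture (M + W) / 2."""
    return 0.5 * min(u, v) + 0.5 * max(u + v - 1.0, 0.0)
```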
4.2 Dependence concepts
Let $X$ and $Y$ be continuous random variables with joint distribution function $H$ and marginals $F$ and $G$, respectively. We say that $X$ and $Y$ are positively quadrant dependent if $H(x,y) - F(x)G(y) \ge 0$ for all $x, y \in \mathbb{R}$. In terms of the copula $C_{XY}$ associated with the pair $(X,Y)$, this means that $C_{XY}(u,v) \ge uv$ for all $u, v \in [0,1]$. See [15] and [16] for further discussion of this and many other concepts of dependence. Finally, we note that there are several relationships between measures of association and certain dependence concepts; for a complete review, see [21].
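Positive quadrant dependence is a pointwise inequality on the copula, so it can be screened on a grid. The Python sketch below (the name `is_pqd_on_grid` is hypothetical) can refute the property but not prove it. It also illustrates one of the relationships mentioned above: if $C(u,v) \ge uv$ everywhere, the integrand in (2) is nonnegative, so $\rho_C \ge 0$.

```python
from typing import Callable


def is_pqd_on_grid(c: Callable[[float, float], float], n: int = 50,
                   tol: float = 1e-12) -> bool:
    """Check C(u, v) >= uv at the points of an (n+1) x (n+1) grid."""
    g = [i / n for i in range(n + 1)]
    return all(c(u, v) >= u * v - tol for u in g for v in g)
```

For the FGM copula, $C_\theta(u,v) - uv = \theta uv(1-u)(1-v)$, so the check passes for $\theta \ge 0$ and fails for $\theta < 0$; it passes for $M$ and fails for $W$.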
5
5.1 Quasi-copulas
A binary operation $\psi$ on the set of distribution functions is derivable from a function on random variables if there exists a Borel-measurable two-place function $Z$ satisfying the following condition: for every pair of distribution functions $F$ and $G$, there exist two random variables $X$ and $Y$ such that $F$ and $G$ are, respectively, the distribution functions of $X$ and $Y$, and $\psi(F,G)$ is the distribution function of the random variable $Z(X,Y)$. The notion of quasi-copula was introduced in [1] to characterize operations on distribution functions that can or cannot be derived from operations on random variables (see also [25]). Genest et al. (see [14]) characterized the quasi-copula concept in simpler operational terms, as the following result asserts:

Theorem 5.1.1. A function $Q\colon [0,1]^2 \to [0,1]$ is a quasi-copula if and only if it satisfies:
(i) $Q(0,x) = Q(x,0) = 0$ and $Q(x,1) = Q(1,x) = x$ for all $x$ in $[0,1]$;
(ii) $Q(x,y)$ is nondecreasing in each of its arguments; and
(iii) the Lipschitz condition $|Q(x_1,y_1) - Q(x_2,y_2)| \le |x_1-x_2| + |y_1-y_2|$ for all $x_1, x_2, y_1, y_2$ in $[0,1]$.

Recently, a new and simple characterization of quasi-copulas has been proved (see [24]), together with some properties of these functions, all of them concerning the mass distribution of a quasi-copula. It has been shown that the features of this mass distribution can be quite different from those of a copula.
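Conditions (i) to (iii) of Theorem 5.1.1 are easy to screen numerically. The Python sketch below (the function name is hypothetical) checks them at grid points; as with the earlier grid checks, it can only refute. On a grid it suffices to bound each one-step increment by the step size, since the triangle inequality then gives the Lipschitz bound between any two grid points.

```python
from typing import Callable


def satisfies_quasi_copula_conditions(q: Callable[[float, float], float],
                                      n: int = 20, tol: float = 1e-12) -> bool:
    """Check (i)-(iii) of Theorem 5.1.1 at the points of an (n+1) x (n+1) grid."""
    g = [i / n for i in range(n + 1)]
    step = 1.0 / n
    for t in g:
        # (i) boundary conditions
        if abs(q(0.0, t)) > tol or abs(q(t, 0.0)) > tol:
            return False
        if abs(q(t, 1.0) - t) > tol or abs(q(1.0, t) - t) > tol:
            return False
    for i in range(n):
        for t in g:
            dx = q(g[i + 1], t) - q(g[i], t)  # increment in the first argument
            dy = q(t, g[i + 1]) - q(t, g[i])  # increment in the second argument
            # (ii) nondecreasing in each argument
            if dx < -tol or dy < -tol:
                return False
            # (iii) Lipschitz: a one-step increment is at most the step size
            if dx > step + tol or dy > step + tol:
                return False
    return True
```

Every copula, such as $\Pi$, $M$ or $W$, passes these checks. By contrast, the FGM formula of Example 3.2 with $\theta = 3$, outside the allowed range $[-1,1]$, satisfies (i) but fails (ii) and (iii).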
5.2 Markov processes
We begin this subsection with a product operation for copulas studied in [9].
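Definition 5.2.1. Let $C_1$ and $C_2$ be copulas. The product of $C_1$ and $C_2$ is the function $C_1 * C_2$ from $[0,1]^2$ to $[0,1]$ given by
$$(C_1 * C_2)(u,v) = \int_0^1 \frac{\partial C_1(u,t)}{\partial t}\,\frac{\partial C_2(t,v)}{\partial t}\,dt.$$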
As a first result, the authors (see [9]) showed that $C_1 * C_2$ is a copula; their main result is the following theorem:

Theorem 5.2.1. Let $\{X_t \mid t \in T\}$ be a stochastic process and, for each $s, t$ in $T$, let $C_{st}$ denote the copula of the random variables $X_s$ and $X_t$. Then the following conditions are equivalent:
(a) the conditional distribution functions $P(x,s;y,t)$ satisfy the Chapman-Kolmogorov equations for all $s < u < t$ in $T$ and almost all $x, y$ in $\mathbb{R}$;
(b) for all $s < u < t$ in $T$, $C_{st} = C_{su} * C_{ut}$.

This theorem yields a new technique for constructing Markov processes.
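The $*$-product of Definition 5.2.1 can be approximated by replacing the partial derivatives with central differences and the integral with a midpoint rule. The Python sketch below is such an approximation under our own assumptions (hypothetical name `star_product`, step `h`, grid size `n`; partial derivatives assumed to exist almost everywhere). For the FGM copulas of Example 3.2, expanding the integrand gives $C_a * C_b = C_{ab/3}$, because $\int_0^1 (1-2t)\,dt = 0$ and $\int_0^1 (1-2t)^2\,dt = 1/3$. It is also known that $M$ acts as an identity and $\Pi$ as a null element for $*$.

```python
from typing import Callable

Copula = Callable[[float, float], float]


def star_product(c1: Copula, c2: Copula, n: int = 400,
                 h: float = 1e-6) -> Copula:
    """Approximate (C1 * C2)(u, v) = int_0^1 d2C1(u, t) d1C2(t, v) dt."""
    mids = [(k + 0.5) / n for k in range(n)]

    def d2(c: Copula, u: float, t: float) -> float:
        # central difference in the second argument
        return (c(u, t + h) - c(u, t - h)) / (2.0 * h)

    def d1(c: Copula, t: float, v: float) -> float:
        # central difference in the first argument
        return (c(t + h, v) - c(t - h, v)) / (2.0 * h)

    def product(u: float, v: float) -> float:
        return sum(d2(c1, u, t) * d1(c2, t, v) for t in mids) / n

    return product


def fgm(theta: float) -> Copula:
    """Farlie-Gumbel-Morgenstern copula of Example 3.2."""
    return lambda u, v: u * v + theta * u * v * (1.0 - u) * (1.0 - v)
```

For instance, `star_product(fgm(0.9), fgm(0.6))(0.3, 0.7)` agrees with `fgm(0.18)(0.3, 0.7)` to within about $10^{-7}$, matching $ab/3 = 0.18$.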
New problems
There are new problems in the theory of copulas from several different points of view. These are:
Stochastic orderings: see [3] and [23] for more details.
Nonparametric Statistics: the use of copulas to define nonparametric hypothesis tests.
Probability Theory: developments in the theory of quasi-copulas.
Numerical Analysis: methods of approximation and interpolation in a given family of copulas from data provided by a bivariate random sample.
References
[1] Alsina, C.; Nelsen, R. B. and Schweizer, B. (1993), On the characterization of a class of binary operations on distribution functions, Statist. Probab. Lett. 17, 85-89.
[2] Beneš, V. and Štěpán, J., editors (1997), Distributions with Given Marginals and Moment Problems (Kluwer Academic Publishers, Dordrecht).
[3] Capéraà, P. and Genest, C. (1990), Concepts de dépendance et ordres stochastiques pour des lois bidimensionnelles, Canad. J. Statist. 18, 315-326.
[4] Capéraà, P. and Genest, C. (1993), Spearman's ρ is larger than Kendall's τ for positively dependent random variables, J. Nonparametr. Statist. 2, 183-194.
[5] Cuadras, C.; Fortiana, J. and Rodríguez Lallena, J. A., editors (2002), Distributions with Given Marginals and Statistical Modelling (Kluwer Academic Publishers, Dordrecht).
[6] Dall'Aglio, G. (1959), Sulla compatibilità delle funzioni di ripartizione doppia, Rend. Mat. 18, 385-413.
[7] Dall'Aglio, G. (1991), Fréchet classes: the beginnings, in Advances in Probability Distributions with Given Marginals (G. Dall'Aglio, S. Kotz and G. Salinetti, eds.), pp. 1-12, Kluwer Academic Publishers, Dordrecht.
[8] Dall'Aglio, G.; Kotz, S. and Salinetti, G., editors (1991), Advances in Probability Distributions with Given Marginals (Kluwer Academic Publishers, Dordrecht).
[9] Darsow, W. F.; Nguyen, B. and Olsen, E. T. (1992), Copulas and Markov Processes, Illinois J. Mathematics 36, 600-642.
[10] Farlie, D. J. G. (1960), The performance of some correlation coefficients for a general bivariate distribution, Biometrika 47, 307-323.
[11] Fréchet, M. (1951), Sur les tableaux de corrélation dont les marges sont données, Ann. Univ. Lyon Sect. A 9, 53-77.
[12] Fréchet, M. (1958), Remarque au sujet de la note précédente, C. R. Acad. Sci. Paris 246, 2719-2720.
[13] Genest, C. and MacKay, J. (1986), Copules archimédiennes et familles de lois bidimensionnelles dont les marges sont données, Canad. J. Statist. 14, 145-159.
[14] Genest, C.; Quesada Molina, J. J.; Rodríguez Lallena, J. A. and Sempi, C. (1999), A characterization of quasi-copulas, J. Multivariate Anal. 69, 193-205.
[15] Hutchinson, T. P. and Lai, C. D. (1990), Continuous Bivariate Distributions, Emphasising Applications (Rumsby Scientific Publishing, Adelaide).
[16] Joe, H. (1997), Multivariate Models and Dependence Concepts, Chapman & Hall, London.
[17] Kruskal, W. H. (1958), Ordinal measures of association, J. Amer. Statist. Assoc. 53, 814-861.
[18] Lehmann, E. L. (1966), Some concepts of dependence, Ann. Math. Statist. 37, 1137-1153.
[19] Marshall, A. W. and Olkin, I. (1967), A generalized bivariate exponential distribution, J. Appl. Probability 4, 291-302.
[20] Menger, K. (1942), Statistical metrics, Proc. Nat. Acad. Sci. U.S.A. 28, 535-537.
[21] Nelsen, R. B. (1999), An Introduction to Copulas, Springer, New York.
[22] Nelsen, R. B.; Quesada Molina, J. J.; Rodríguez Lallena, J. A. and Úbeda Flores, M. (2001), Bounds on bivariate distribution functions with given margins and measures of association, Comm. Statist. Theory and Methods vol. 30, number 6, 1155-1162.
[23] Nelsen, R. B.; Quesada Molina, J. J.; Rodríguez Lallena, J. A. and Úbeda Flores, M. (2001), Distribution functions of copulas: a class of bivariate probability integral transforms, Statist. Probab. Lett. 54, 277-282.
[24] Nelsen, R. B.; Quesada Molina, J. J.; Rodríguez Lallena, J. A. and Úbeda Flores, M. (2001), Some new properties of quasi-copulas, in Distributions with Given Marginals and Statistical Modelling (C. Cuadras, J. Fortiana and J. A. Rodríguez, eds.), Kluwer Academic Publishers, Dordrecht.
[25] Nelsen, R. B.; Quesada Molina, J. J.; Schweizer, B. and Sempi, C. (1996), Derivability of some operations on distribution functions, in Distributions with Fixed Marginals and Related Topics (L. Rüschendorf, B. Schweizer, M. D. Taylor, eds.), IMS Lecture Notes-Monograph Series Number 28, pp. 233-243, Hayward, California.
[26] Quesada Molina, J. J. and Rodríguez Lallena, J. A. (1995), Bivariate copulas with quadratic sections, J. Nonparametr. Statist. 5, 323-337.
[27] Rüschendorf, L.; Schweizer, B. and Taylor, M. D., editors (1996), Distributions with Fixed Marginals and Related Topics (Institute of Mathematical Statistics, Hayward, California).
[28] Schweizer, B. (1991), Thirty years of copulas, in Advances in Probability Distributions with Given Marginals (G. Dall'Aglio, S. Kotz and G. Salinetti, eds.), pp. 13-50, Kluwer Academic Publishers, Dordrecht.
[29] Schweizer, B. and Sklar, A. (1983), Probabilistic Metric Spaces, Elsevier, New York.
[30] Schweizer, B. and Wolff, E. F. (1981), On nonparametric measures of dependence for random variables, Ann. Statist. 9, 870-885.
[31] Sklar, A. (1959), Fonctions de répartition à n dimensions et leurs marges, Publ. Inst. Statist. Univ. Paris 8, 229-231.