General Notions of Statistical Depth Function
Based on this depth, Donoho and Gasko (1992) studied multivariate location
estimators and Yeh and Singh (1997) developed confidence regions. Properties
of the corresponding contours have been studied by various authors including
Eddy (1985), Nolan (1992), Donoho and Gasko (1992) and Massé and Theodor-
escu (1994). See Carrizosa (1996) for a characterization of halfspace depth
relating to problems of facility location analysis in the operations research
literature.
The “center-outward ordering” interpretation of a depth function suggests
that (i) a relevant notion of “center” is available, and (ii) points near the cen-
ter should have higher depth. From this standpoint, the “center” consists of
the set of points globally maximizing depth, in which case a depth function
should tend to ignore multimodality features of the underlying distribution
P. If, on the other hand, sensitivity to multimodality is desirable, then the
“center” should include local maxima as well, in which case the notion of
center-outward ordering becomes compromised and “inner” points can have
low depth. It is thus important, in considering depth functions, to make a
choice on this issue. In the present paper we opt for the center to be given
by global maxima, with low depth corresponding to large distance from the
center. For further discussion, see Remark A.1 in Appendix A.
Liu (1990) introduced a notion of “simplicial” depth and corresponding
multivariate location estimators. Namely, the simplicial depth (SD) of a point $x$
in $\mathbb{R}^d$ with respect to a probability measure $P$ on $\mathbb{R}^d$ is defined to be the
probability that $x$ belongs to a random simplex in $\mathbb{R}^d$, that is,
\[ SD(x; P) = P\bigl(x \in S[X_1, \ldots, X_{d+1}]\bigr), \qquad x \in \mathbb{R}^d, \]
where $X_1, \ldots, X_{d+1}$ is a random sample from $P$ and $S[x_1, \ldots, x_{d+1}]$ denotes
the $d$-dimensional simplex with vertices $x_1, \ldots, x_{d+1}$, that is, the set of all
points in $\mathbb{R}^d$ that are convex combinations of $x_1, \ldots, x_{d+1}$.
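As a rough illustration of the simplicial depth definition, the following Python sketch estimates it in the plane ($d = 2$) by the fraction of data triangles that contain $x$. The function names and the toy data are hypothetical, closed triangles are used, and the data are assumed to be in general position (no three collinear points). It enumerates all triples by brute force and is not meant as an efficient algorithm.

```python
from itertools import combinations


def _orient(a, b, c):
    """Twice the signed area of the triangle (a, b, c)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_closed_triangle(x, a, b, c):
    """True if x lies in the closed triangle with vertices a, b, c.

    Assumes a, b, c are not collinear (general position).
    """
    signs = (_orient(a, b, x), _orient(b, c, x), _orient(c, a, x))
    has_neg = any(s < 0 for s in signs)
    has_pos = any(s > 0 for s in signs)
    return not (has_neg and has_pos)


def sample_simplicial_depth_2d(x, data):
    """Fraction of the C(n, 3) data triangles that contain x."""
    triples = list(combinations(data, 3))
    hits = sum(in_closed_triangle(x, a, b, c) for a, b, c in triples)
    return hits / len(triples)


# Toy data (hypothetical): the four corners of a square.
square = [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
inner = sample_simplicial_depth_2d((0.5, 0.0), square)  # inside the hull
outer = sample_simplicial_depth_2d((3.0, 3.0), square)  # outside the hull
```

A point outside the convex hull of the data lies in no data triangle, so its sample depth is zero, while points nearer the middle of the cloud are covered by a larger share of triangles.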
Liu and Singh (1993) considered the above two depth functions and two
more, “Mahalanobis” depth and “majority” depth, which they applied in for-
mulating a “quality index” for use in connection with manufacturing processes.
Rousseeuw and Hubert (1999) introduced “regression depth” and Rousseeuw
and Ruts (1996), Ruts and Rousseeuw (1996) and Rousseeuw and Struyf (1998)
studied computing issues concerning depth functions and contours. Liu, Pare-
lius and Singh (1999) considered seven examples of depth function, including a
“convex hull peeling” version and a “likelihood” type, and developed methodol-
ogy for their practical use in exploratory statistical analysis. Likelihood-based
depth functions have also been considered by Fraiman and Meloche (1996) and
Fraiman, Liu and Meloche (1997). Koshevoy and Mosler (1997) introduced a
“zonoid” depth function based on “zonoid trimming.” Bartoszyński, Pearl and
Lawrence (1997) introduced a depth function based on interpoint distances in
the context of a multivariate goodness-of-fit test. Depth functions also arise
in the theory of social choice [see Caplin and Nalebuff (1988, 1991a, b)]. Non-
parametric notions of multivariate “scatter measure” and “more scattered”
based on general depth functions have been formulated and studied by Zuo
and Serfling (2000a). Mizera (1998) has introduced a differential calculus for
depth functions. Finally, Vardi and Zhang (1999) have introduced a method
for constructing depth functions from notions of multivariate median.
Depth functions thus have been introduced ad hoc in great variety, without
regard to whether they meet any particular set of criteria that ought to be
satisfied. Consequently, there is no systematic basis for preferring one such
function over another. In the present paper, we address this issue by asking:
(i) What desirable properties should a statistical depth function possess?
(ii) What constructive approaches lead to attractive depth functions?
(iii) Do existing depth functions possess all desired properties?
In Section 2 we list several desirable properties first introduced by Liu
(1990), on the basis of which we formulate a general definition of “statisti-
cal depth function”. Roughly speaking, these properties may be described as:
STATISTICAL DEPTH FUNCTION 463
depth ≤ the depth of x. [This quantity is used by Liu and Singh (1993) in
defining their “quality index.”] Under P2 and P3, however, convergence of
$R(x; F)$ to 0 is seen to hold already and thus does not offer anything productive
in addition to P2 and P3.
In Zuo and Serfling [(2000b), Theorem 3.1(iv)], the present form of P4 is use-
ful in establishing compactness of depth-trimmed regions. Further, it plays a
role in using truncation arguments to establish almost sure uniform conver-
gence of sample depth functions to population versions.
For the above two well-known notions of depth function, we thus have found
that one behaves well overall, while in some discrete cases the other is not
completely satisfactory. This leads one to investigate whether other attractive
statistical depth functions can be defined, indeed to explore general structures
for such functions and to seek to identify the more favorable types.
2.3. General structures for statistical depth functions. Four general struc-
tures for construction of statistical depth functions are introduced and inves-
tigated with respect to properties P1–P4. Various existing depth functions are
classified according to these types.
2.3.1. Type A depth functions. Let $h(x; x_1, \ldots, x_r)$ be any bounded
nonnegative function which in some sense measures the closeness of $x$ to the
points $x_1, \ldots, x_r$. A corresponding Type A depth function is then defined by
the average closeness of $x$ to a random sample of size $r$:
\[ D(x; P) = E\,h(x; X_1, \ldots, X_r), \tag{1} \]
where $X_1, \ldots, X_r$ is a random sample from $P$. For such depth functions
the corresponding sample versions $D(x; \hat{P}_n)$ turn out to be U-statistics or
V-statistics.
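The U-statistic form of a sample Type A depth can be sketched as an average of the kernel over all $r$-subsets of the data. The Python code below is only an illustration: the function names, the toy data, and the particular kernel (an interval indicator on the real line, with $r = 2$) are assumptions chosen for simplicity, and the kernel is taken to be symmetric in its sample arguments.

```python
from itertools import combinations


def type_a_sample_depth(x, data, kernel, r):
    """Average of kernel(x, X_i1, ..., X_ir) over all r-subsets of the data.

    This is the U-statistic version; `kernel` is assumed bounded,
    nonnegative, and symmetric in its last r arguments.
    """
    values = [kernel(x, *subset) for subset in combinations(data, r)]
    return sum(values) / len(values)


def interval_kernel(x, x1, x2):
    """Indicator that x lies in the closed segment with endpoints x1, x2."""
    return 1.0 if min(x1, x2) <= x <= max(x1, x2) else 0.0


# Toy univariate data (hypothetical).
data = [0.0, 1.0, 2.0, 3.0]
central = type_a_sample_depth(1.5, data, interval_kernel, r=2)   # 4 of 6 pairs
extreme = type_a_sample_depth(10.0, data, interval_kernel, r=2)  # 0 of 6 pairs
```

Passing a different kernel changes the depth without changing the averaging step, which is the sense in which Type A is a general structure rather than a single depth function.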
Taking $r = d + 1$ and $h(x; x_1, \ldots, x_{d+1}) = I\{x \in S[x_1, \ldots, x_{d+1}]\}$, we
obtain the simplicial depth, whose properties have been covered in Section 2.2.
Another example is the following.
However, he did not develop it into a depth function, nor did he consider the
affine invariant version (5).
468 Y. ZUO AND R. SERFLING
Example 2.3. [$L^p$ depth ($p > 0$).] Another way to measure distance is via
the $L^p$ norm $\|\cdot\|_p$. Taking $h(x; x_1) = \|x - x_1\|_p$, a corresponding Type B depth
function is given by
\[ L^pD(x; F) \equiv \bigl(1 + E\|x - X\|_p\bigr)^{-1}. \tag{6} \]
Note that $L^pD(x; F)$ generally does not possess the affine invariance property,
however, since
\[ E\|(Ax + b) - (AX + b)\|_p = E\|A(x - X)\|_p. \]
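The lack of affine invariance of the $L^p$ depth can be checked numerically on a sample version, with the expectation in (6) replaced by a sample mean. The Python sketch below is illustrative only: the function name `lp_depth`, the toy data, and the particular matrices are assumptions. It contrasts a non-orthogonal rescaling, which changes the depth of the transformed point, with a rotation, which leaves the $L^2$ depth unchanged.

```python
import numpy as np


def lp_depth(x, data, p=2.0):
    """Sample L^p depth: 1 / (1 + mean of ||x - X_i||_p)."""
    diffs = np.asarray(data, dtype=float) - np.asarray(x, dtype=float)
    dists = np.linalg.norm(diffs, ord=p, axis=1)
    return 1.0 / (1.0 + dists.mean())


# Toy data (hypothetical): a symmetric five-point configuration.
data = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [0.0, -2.0]])
x = np.array([1.0, 0.0])

A = np.diag([10.0, 1.0])        # stretches the first coordinate
b = np.array([3.0, -1.0])       # shift; cancels in (Ax + b) - (AX + b)
Q = np.array([[0.0, -1.0],
              [1.0, 0.0]])      # rotation by 90 degrees

before = lp_depth(x, data)
after_affine = lp_depth(A @ x + b, data @ A.T + b)
after_rotation = lp_depth(Q @ x, data @ Q.T)
```

Here `after_affine` differs from `before`, while `after_rotation` agrees with it up to rounding, since the Euclidean norm is preserved by orthogonal maps.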
is nonempty.
Equipped with the above two results, we now take a further look at
$SVD^{\alpha}(x; F)$ and $L^pD(x; F)$.
The affine invariance and Corollaries 2.1 and 2.2 thus yield:
The next three results treat P2–P4 for $L^pD(x; F)$, $p \ge 1$, and $L^2D(x; F)$.
Convexity of $h(x; x_1) = \|x - x_1\|_p$ in the argument $x$ follows in
straightforward fashion from Minkowski’s inequality. Thus Theorem 2.4 yields P3 for
$L^pD(x; F)$, while P4 is obvious. Thus we have
Remark 2.4. In the foregoing proof, condition (iv) of Theorem 2.3 was es-
tablished for L2 x F for all A-symmetric F. For the depth function L2 x F,
it follows from results established in Zuo and Serfling (2000c) that this condi-
tion holds for all H-symmetric F.
Remark 2.5. Although Type B and Type C depth functions are clearly sim-
ilar in form, it is convenient to treat them separately, as they arise from some-
what different conceptual points of view.
\[ O(x; F) \equiv \sup_{\|u\| = 1} \frac{|u'x - \mathrm{Med}(u'X)|}{\mathrm{MAD}(u'X)}, \tag{10} \]
where $X$ has distribution $F$, Med denotes the univariate median, MAD denotes
the univariate median absolute deviation defined for univariate $Y$ as
$\mathrm{MAD}(Y) = \mathrm{Med}(|Y - \mathrm{Med}(Y)|)$, and $\|\cdot\|$ is the Euclidean norm. We call
the corresponding Type C depth function projection depth and denote it by
$PD(x; F)$, $x \in \mathbb{R}^d$.
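The supremum in (10) runs over all unit directions, so a simple way to get a feel for projection depth is to approximate it over a finite set of random directions. The Python sketch below does this under several stated assumptions: the depth is taken in the form $(1 + O)^{-1}$, which is how Type C depth functions are built but is not restated in this passage; the function name, the number of directions, and the seeds are arbitrary; and every projected sample is assumed to have nonzero MAD. Because only finitely many directions are searched, the outlyingness is underestimated and the returned depth is an upper bound on the exact sample value.

```python
import numpy as np


def projection_depth(x, data, n_dirs=500, seed=0):
    """Approximate sample projection depth 1 / (1 + O_n(x)).

    O_n(x) is approximated by a maximum over n_dirs random unit directions.
    """
    data = np.asarray(data, dtype=float)
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n_dirs, data.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)    # unit directions u
    proj = data @ U.T                                 # column j holds u_j'X_i
    med = np.median(proj, axis=0)                     # Med(u'X) per direction
    mad = np.median(np.abs(proj - med), axis=0)       # MAD(u'X) per direction
    outlyingness = np.max(np.abs(U @ x - med) / mad)
    return 1.0 / (1.0 + outlyingness)


# Univariate check (hypothetical data): Med = 3, MAD = 1, so O(5) = 2.
line = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
pd_edge = projection_depth([5.0], line)

# Bivariate illustration with simulated standard normal data.
cloud = np.random.default_rng(1).standard_normal((200, 2))
pd_center = projection_depth([0.0, 0.0], cloud)
pd_far = projection_depth([5.0, 5.0], cloud)
```

In one dimension the only unit directions are $\pm 1$, so the approximation is exact there; in higher dimensions the quality depends on how densely the random directions cover the sphere.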
\[ \mathrm{Med}_{1 \le i \le n}\{X_i\} = \tfrac{1}{2}\bigl(X_{(\lfloor (n+1)/2 \rfloor)} + X_{(\lfloor (n+2)/2 \rfloor)}\bigr) \]
and $X_{(1)} \le \cdots \le X_{(n)}$ are the ordered $X_1, \ldots, X_n$. Donoho and Gasko (1992)
generalized this to arbitrary dimension $d$, defining $O_n(x)$ to be the worst case
outlyingness of $x \in \mathbb{R}^d$ in any one-dimensional projection of $x$ and the dataset
$X$. A sample version of the projection depth function $PD(x; F)$ is thus given
by
Liu (1992) suggested the use of (11) as a data depth function, but did not
provide any treatment of it.
where $F$ is a given distribution and $\mu(F)$ and $\Sigma(F)$ are any corresponding
location and covariance measures, respectively. The case that $\mu(F)$ and $\Sigma(F)$
are the mean and covariance matrix of $F$ was suggested by Liu (1992). For
these choices, however, $MHD(\cdot\,; F)$ is not “robust” [since $\mu(F)$ = mean is not
robust, as noted by Liu and Singh (1993)], and it can fail to achieve maximum
value at the center of A-symmetric distributions.
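The non-robustness of the mean-and-covariance choice can be seen in a small numerical experiment. The Python sketch below assumes the usual Mahalanobis depth form $(1 + (x - \mu)'\Sigma^{-1}(x - \mu))^{-1}$, which is not restated in this passage, and plugs in the sample mean and sample covariance matrix. The function name and the toy data are hypothetical, and the data are assumed to have dimension at least two with a nonsingular sample covariance.

```python
import numpy as np


def mahalanobis_depth(x, data):
    """Sample Mahalanobis depth with the sample mean and covariance plugged in."""
    data = np.asarray(data, dtype=float)
    mu = data.mean(axis=0)
    sigma = np.cov(data, rowvar=False)          # rows are observations
    diff = np.asarray(x, dtype=float) - mu
    return 1.0 / (1.0 + diff @ np.linalg.solve(sigma, diff))


# Toy data (hypothetical): four points symmetric about the origin.
clean = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
# The same data with one grossly corrupted observation appended.
contaminated = np.vstack([clean, [1000.0, 1000.0]])

depth_clean = mahalanobis_depth([0.0, 0.0], clean)
depth_contaminated = mahalanobis_depth([0.0, 0.0], contaminated)
```

With the clean data the origin is the sample mean and has depth 1. A single corrupted point drags the sample mean away from the origin, so the origin no longer attains the maximum depth.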
For Type C depth functions, the following analogues of Theorems 2.3 and
2.4 hold and can be proved similarly. It is convenient to write $O(x; X)$ for
$O(x; F_X)$.
is nonempty.
The following two theorems establish that $PD(x; F)$ and $MHD(x; F)$ are
proper statistical depth functions.
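Theorem 2.11. Let $\mathcal{C}$ be a class of closed Borel sets satisfying C1 and C2.
Further, for a given probability measure $P$ on $\mathbb{R}^d$, assume that if $x \in C \in \mathcal{C}$
and $P(C) < \alpha$, then there is a $C_1 \in \mathcal{C}$ such that $x \in C_1^{\circ}$ and $P(C_1) < \alpha$. Then:
(i) $D(x; P, \mathcal{C})$ is upper semicontinuous;
(ii) $D^{\alpha} \equiv \{x \in \mathbb{R}^d : D(x; P, \mathcal{C}) \ge \alpha\}$, $\alpha \in (0, 1)$, are compact and nested
(i.e., $D^{\alpha_1} \subset D^{\alpha_2}$ if $\alpha_1 > \alpha_2$); and
(iii) $D^{\alpha}$ is convex if every $C \in \mathcal{C}$ is convex.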
C2. $P(\partial C) = 0$, $\forall\, C \in \mathcal{C}$,
Singh (1999). These, however, fail to satisfy in general any of P1–P4, and
their effectiveness appears to be confined primarily to models with ellipsoidal
densities, or to situations where sensitivity to multimodality is paramount.
For further discussion, see Remark A.1 in Appendix A.
The zonoid depth function of Koshevoy and Mosler (1997) has some nice
properties but can fail to satisfy “maximality at center” P2 for A- or
H-symmetric distributions, because it always attains its maximum value at the
expectation $E(X)$ for any random variable $X$ in $\mathbb{R}^d$. Also, the sample zonoid depth
function is not robust, as a single corrupted data point can move the “center
point of zonoid data depth” to infinity.
In conclusion, the halfspace and projection depth functions appear to repre-
sent very favorable choices. Both are implementations of the “projection pur-
suit” method, which utilizes all of the one-dimensional views of a dataset as a
foundation for data analysis, thus producing the advantage of great power at
extraction of information, although at the expense of a substantial
computational burden. Also, competitively, the $L^2$ and Mahalanobis depth functions
appear to have strong potential for development.
Besides carrying intrinsic interest, (A.1) plays a supporting role for other pur-
poses. For example, it underlies the convergence of sample depth contours to
their population counterparts, as in He and Wang (1997) especially for ellip-
tical models and in Zuo and Serfling (2000b) for more general models. In Liu
and Singh (1993), it is basic to the convergence of a certain “quality index”,
while in Liu, Parelius and Singh (1999) it supports various practical methods
such as “DD-plots.”
Results on (A.1) are now available for several cases of depth function.
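Donoho and Gasko (1992) proved it for the sample halfspace depth,
\[ HD_n(x) = \inf_H \bigl\{ \hat{P}_n(H) : H \text{ a closed halfspace},\ x \in H \bigr\}, \qquad x \in \mathbb{R}^d, \]
where $\hat{P}_n$ denotes the usual empirical measure, and Liu (1990), Dümbgen
(1990), and Arcones and Giné (1993) for the sample simplicial depth
\[ SD_n(x) = \binom{n}{d+1}^{-1} \sum_{1 \le i_1 < \cdots < i_{d+1} \le n} I\bigl\{x \in S[X_{i_1}, \ldots, X_{i_{d+1}}]\bigr\}, \qquad x \in \mathbb{R}^d. \]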
For the sample majority and Mahalanobis depths, under suitable conditions
on F, (A.1) is established by Liu and Singh (1993). For sample versions of
the “projection” depth function and the “Type D” depth functions introduced
above, (A.1) is established in Appendix B of Zuo and Serfling (2000b).
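For concreteness, the sample halfspace depth can be computed directly in the plane. The Python sketch below is one simple way to do so and is not taken from the works cited: it uses the fact that the number of data points in a closed halfplane with $x$ on its boundary changes only at directions perpendicular to some $X_i - x$, and evaluates the count once in every arc between such directions. The function name, the tolerance `1e-9`, and the toy data are assumptions, and the cost is of order $n^2$.

```python
import numpy as np


def halfspace_depth_2d(x, data):
    """Sample halfspace depth of x in the plane.

    Minimizes, over unit normals u, the fraction of points with
    u'(X_i - x) >= 0. The minimum is attained inside an open arc between
    consecutive critical directions, so one midpoint per arc suffices.
    """
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    n = len(data)
    diffs = data - x
    nonzero = diffs[np.any(diffs != 0.0, axis=1)]
    n_at_x = n - len(nonzero)            # points equal to x lie in every H
    if len(nonzero) == 0:
        return 1.0
    theta = np.arctan2(nonzero[:, 1], nonzero[:, 0])
    # Critical normals are perpendicular to some X_i - x.
    crit = np.unique(np.mod(np.concatenate([theta + np.pi / 2,
                                            theta - np.pi / 2]), 2 * np.pi))
    gaps = np.diff(np.append(crit, crit[0] + 2 * np.pi))
    keep = gaps > 1e-9                   # skip arcs created by rounding
    mids = crit[keep] + gaps[keep] / 2.0
    U = np.column_stack([np.cos(mids), np.sin(mids)])
    counts = (nonzero @ U.T >= 0.0).sum(axis=0)
    return (n_at_x + counts.min()) / n


# Toy data (hypothetical): the four corners of a square.
square = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
hd_center = halfspace_depth_2d([0.0, 0.0], square)
hd_vertex = halfspace_depth_2d([1.0, 1.0], square)
hd_outside = halfspace_depth_2d([5.0, 5.0], square)
```

Every closed halfplane through the center of the square contains at least two of the four corners, a corner is contained in a halfplane holding only itself, and a point outside the convex hull can be separated from all the data.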
APPENDIX B: PROOFS
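\[
\begin{aligned}
MJD(x_0; P) - MJD(x; P) &= P\bigl(x_0 \in H^{P}_{X_1, \ldots, X_d}\bigr) - P\bigl(x \in H^{P}_{X_1, \ldots, X_d}\bigr) \\
&= P\bigl(x_0 \in H^{P}_{X_1, \ldots, X_d} \text{ and } x \notin H^{P}_{X_1, \ldots, X_d}\bigr) \\
&\ge 0. \qquad \Box
\end{aligned}
\]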
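\[ E\,h(x; X_1 - \theta, \ldots, X_r - \theta) = E\,h(\theta + x; X_1, \ldots, X_r), \]
\[ E\,h(x; \theta - X_1, \ldots, \theta - X_r) = E\,h(\theta - x; X_1, \ldots, X_r). \]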
It follows that
\[ E\,h(\theta; X_1, \ldots, X_r) = \inf_{x \in \mathbb{R}^d} E\,h(x; X_1, \ldots, X_r). \]
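Thus
\[ E\,h(x_0; X_1, \ldots, X_r) \le \max\bigl\{E\,h(x; X_1, \ldots, X_r),\ E\,h(\theta; X_1, \ldots, X_r)\bigr\} = E\,h(x; X_1, \ldots, X_r), \]
and hence $D(x_0; F) \ge D(x; F)$, completing the proof. □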
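\[
{} = \frac{1}{d!} \left| \det \begin{pmatrix}
\lambda + (1 - \lambda) & 1 & \cdots & 1 \\
\lambda \tilde{x}_1 + (1 - \lambda) \tilde{y}_1 & x_{11} & \cdots & x_{d1} \\
\vdots & \vdots & & \vdots \\
\lambda \tilde{x}_d + (1 - \lambda) \tilde{y}_d & x_{1d} & \cdots & x_{dd}
\end{pmatrix} \right|
\le \lambda\, \Delta(S[x, x_1, \ldots, x_d]) + (1 - \lambda)\, \Delta(S[y, x_1, \ldots, x_d]),
\]
where $x = (\tilde{x}_1, \ldots, \tilde{x}_d)$, $y = (\tilde{y}_1, \ldots, \tilde{y}_d)$ and $x_i = (x_{i1}, \ldots, x_{id})$ for $0 \le i \le d$.
Now the convexity of the function $x^{\alpha}$ for $0 < x < \infty$ and $\alpha \ge 1$ yields
\[ \Delta^{\alpha}(S[x_0, x_1, \ldots, x_d]) \le \lambda\, \Delta^{\alpha}(S[x, x_1, \ldots, x_d]) + (1 - \lambda)\, \Delta^{\alpha}(S[y, x_1, \ldots, x_d]). \]
(b) It is obvious that $\Delta^{\alpha}(S[x, x_1, \ldots, x_d]) \to \infty$ as $\|x\| \to \infty$. Thus
$SVD^{\alpha}(x; F) \to 0$ as $\|x\| \to \infty$, completing the proof. □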
It follows that
\[ \|\lambda x + (1 - \lambda) y\|_M \le \lambda \|x\|_M + (1 - \lambda) \|y\|_M. \]
(b) Now we show that there is a point $y \in \mathbb{R}^d$ satisfying condition (iv) of
Theorem 2.3. Equivalently, we need to show that
\[ \theta \in \arg\inf_{x \in \mathbb{R}^d} E\|x - X\|_{\Sigma^{-1}}. \tag{B.1} \]
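\[
\begin{aligned}
&= \int_{\mathbb{R}^d} \frac{d}{d\mu} \|\mu - x\|_{\Sigma^{-1}} \, dF(x) \\
&= \int_{\mathbb{R}^d} \frac{\Sigma^{-1}(\mu - x)}{\|\mu - x\|_{\Sigma^{-1}}} \, dF(x) \\
&= E\left[ \frac{\Sigma^{-1}(\mu - X)}{\|\mu - X\|_{\Sigma^{-1}}} \right].
\end{aligned}
\]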
Then by convexity and (∗) we conclude that (B.1) holds.
Now (a) and (b) yield $(\ast)$, which implies that $D^{\alpha}$ is closed, and thus $D(x; P, \mathcal{C})$
is upper semicontinuous.
(ii) The nestedness of $D^{\alpha}$ is trivial. The boundedness of $D^{\alpha}$ follows from the
fact that $D(x; P, \mathcal{C}) \to 0$ as $\|x\| \to \infty$. The compactness of $D^{\alpha}$ now follows
from its being bounded and closed.
(iii) The convexity follows from $(\ast)$, since the intersection of convex sets is
convex. □
REFERENCES
Arcones, M. A. and Giné, E. (1993). Limit theorems for U-processes. Ann. Probab. 21 1494–1542.
Baggerly, K. A. and Scott, D. W. (1999). Comment on “Multivariate analysis by data depth:
Descriptive statistics, graphics and inference,” by R. Y. Liu, J. M. Parelius and K. Singh.
Ann. Statist. 27 843–844.
Bartoszyński, R., Pearl, D. K. and Lawrence, J. (1997). A multidimensional goodness-of-fit test
based on interpoint distances. J. Amer. Statist. Assoc. 92 577–586.
Beran, R. J. and Millar, P. W. (1997). Multivariate symmetry models. In Festschrift for Lucien
Le Cam: Research Papers in Probability and Statistics (D. Pollard, E. Torgerson and
G. L. Yang, eds.) 13–42. Springer, Berlin.
Caplin, A. and Nalebuff, B. (1988). On 64%-majority rule. Econometrica 56 787–814.
Caplin, A. and Nalebuff, B. (1991a). Aggregation and social choice: A mean voter theorem.
Econometrica 59 1–23.
Caplin, A. and Nalebuff, B. (1991b). Aggregation and imperfect competition: On the existence
of equilibrium. Econometrica 59 25–59.
Carrizosa, E. (1996). A characterization of halfspace depth. J. Multivariate Anal. 58 21–26.
Chamberlin, E. (1937). The Theory of Monopolistic Competition. Harvard Univ. Press.
Chen, Z. (1995). Bounds for the breakdown point of the simplicial median. J. Multivariate Anal.
55 1–13.
Donoho, D. L. (1982). Breakdown properties of multivariate location estimators. Ph. D. qualifying
paper, Dept. Statistics, Harvard Univ.
Donoho, D. L. and Gasko, M. (1992). Breakdown properties of location estimates based on half-
space depth and projected outlyingness. Ann. Statist. 20 1803–1827.
Dümbgen, L. (1990). Limit theorems for the empirical simplicial depth. Statist. Probab. Lett. 14
119–128.
Eddy, W. F. (1985). Ordering of multivariate data. In Computer Science and Statistics: The In-
terface (L. Billard, ed.) 25–30. North-Holland, Amsterdam.
Fraiman, R. and Meloche, J. (1996). Multivariate L-estimation. Preprint.
Fraiman, R., Liu, R. Y. and Meloche, J. (1997). Multivariate density estimation by probing
depth. In L1 -Statistical Procedures and Related Topics (Y. Dodge, ed.) 415–430. IMS,
Hayward, CA.
He, X. and Wang, G. (1997). Convergence of depth contours for multivariate datasets. Ann.
Statist. 25 495–504.
Hotelling, H. (1929). Stability in competition. Econom. J. 39 41–57.
Koshevoy, G. and Mosler, K. (1997). Zonoid trimming for multivariate distributions. Ann.
Statist. 25 1998–2017.
Liu, R. Y. (1990). On a notion of data depth based on random simplices. Ann. Statist. 18 405–414.
Liu, R. Y. (1992). Data depth and multivariate rank tests. In L1 -Statistics and Related Methods
(Y. Dodge, ed.) 279–294. North-Holland, Amsterdam.
Liu, R. Y., Parelius, J. M. and Singh, K. (1999). Multivariate analysis by data depth: Descriptive
statistics, graphics and inference (with discussion). Ann. Statist. 27 783–858.
Liu, R. Y. and Singh, K. (1993). A quality index based on data depth and multivariate rank tests.
J. Amer. Statist. Assoc. 88 252–260.
Mahalanobis, P. C. (1936). On the generalized distance in statistics. Proc. Nat. Acad. Sci. India
12 49–55.
Massé, J. C. and Theodorescu, R. (1994). Halfplane trimming for bivariate distributions. J.
Multivariate Anal. 48 188–202.
Mizera, I. (1998). On depth and deep points: a calculus. Preprint.
Mosteller, C. F. and Tukey, J. W. (1977). Data Analysis and Regression. Addison-Wesley, Read-
ing, MA.
Niinimaa, A., Oja, H. and Tableman, M. (1990). On the finite sample breakdown point of the Oja
bivariate median and of the corresponding half-samples version. Statist. Probab. Lett.
10 325–328.
Nolan, D. (1992). Asymptotics for multivariate trimming. Stochastic Process. Appl. 42 157–169.
Oja, H. (1983). Descriptive statistics for multivariate distributions. Statist. Probab. Lett. 1 327–
333.
Rao, C. R. (1988). Methodology based on the L1 norm in statistical inference. Sankhyā Ser. A 50
289–313.
Rousseeuw, P. J. and Hubert, M. (1999). Regression depth (with discussion). J. Amer. Statist.
Assoc. 94 388–433.
Rousseeuw, P. J. and Ruts, I. (1996). Bivariate location depth. J. Roy. Statist. Soc. Ser. C 45
516–526.
Rousseeuw, P. J. and Struyf, A. (1998). Computing location depth and regression depth in
higher dimensions. Statist. Comput. 8 193–203.
Ruts, I. and Rousseeuw, P. J. (1996). Computing depth contours of bivariate point clouds. Com-
put. Statist. Data Anal. 23 153–168.
Serfling, R. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.
Singh, K. (1991). A notion of majority depth. Preprint.
Small, C. G. (1987). Measures of centrality for multivariate and directional distributions. Canad.
J. Statist. 15 31–39.
Small, C. G. (1990). A survey of multidimensional medians. Internat. Statist. Inst. Rev. 58 263–
277.
Stahel, W. A. (1981). Robust estimation: infinitesimal optimality and covariance matrix
estimators. Ph. D. thesis, ETH, Zurich (in German).
Tukey, J. W. (1975). Mathematics and picturing data. In Proceedings of the International
Congress on Mathematics (R. D. James, ed.) 2 523–531. Canadian Math. Congress.
Tyler, D. E. (1994). Finite sample breakdown points of projection based multivariate location
and scatter statistics. Ann. Statist. 22 1024–1044.
Vardi, Y. and Zhang, C.-H. (1999). The multivariate L1 -median and associated data depth.
Preprint.
Yeh, A. B. and Singh, K. (1997). Balanced confidence regions based on Tukey’s depth and the
bootstrap. J. Roy. Statist. Soc. Ser. B 59 639–652.
Zuo, Y. (1999). Affine equivariant multivariate location estimates with best possible breakdown
points. Preprint.
Zuo, Y. and Serfling, R. (2000a). Nonparametric notions of multivariate “scatter measure” and
“more scattered” based on statistical depth functions. J. Multivariate Anal. To appear.
Zuo, Y. and Serfling, R. (2000b). Structural properties and convergence results for contours of
sample statistical depth functions. Ann. Statist. 28 483–499.
Zuo, Y. and Serfling, R. (2000c). On the performance of some robust nonparametric location
measures relative to a general notion of multivariate symmetry. J. Statist. Plann. In-
ference 84 55–79.