A Theoretical Analysis of Density Peaks Clustering and The Component-Wise Peak-Finding Algorithm
Abstract—Density peaks clustering detects modes as points with high density and large distance to points of higher density. Each non-mode point is assigned to the same cluster as its nearest neighbor of higher density. Density peaks clustering has proved capable in applications, yet little work has been done to understand its theoretical properties or the characteristics of the clusterings it produces. Here, we prove that it consistently estimates the modes of the underlying density and correctly clusters the data with high probability. However, noise in the density estimates can lead to erroneous modes and incoherent cluster assignments. A novel clustering algorithm, Component-wise Peak-Finding (CPF), is proposed to remedy these issues. The improvements are twofold: 1) the assignment methodology is improved by applying the density peaks methodology within level sets of the estimated density; 2) the algorithm is not affected by spurious maxima of the density and hence is competent at automatically deciding the correct number of clusters. We present novel theoretical results, proving the consistency of CPF, as well as extensive experimental results demonstrating its exceptional performance. Finally, a semi-supervised version of CPF is presented, integrating clustering constraints to achieve excellent performance for an important problem in computer vision.

Index Terms—Density-based clustering, nearest-neighbor graph, density peaks, semi-supervised clustering, multi-image matching.

I. INTRODUCTION

Density-based clustering methods relate the notion of clusters to high-density contiguous regions of the underlying density function. Hartigan [1] proposed the concept of density-based clusters as "regions ... where the densities are high surrounded by regions where the densities are low". The concept is attractive for several reasons: 1) the clusters are free to assume any shape, in contrast to model-based clustering methods; 2) the clustering method is associated with density but without requiring strong assumptions on the density function; 3) the number of clusters is linked to density peaks and can be determined as part of the estimation procedure. Density-based clustering methods can be broadly classified into two categories: level set methods and mode-seeking methods.

Level set methods detect clusters as connected components of the density level sets {x : f(x) ≥ λ}, where f is the density function and λ is a cutting threshold. The density f is unknown, and hence the level sets are required to be estimated from the data. Nearest-neighbor graphs have been widely used for this purpose [2], [3]. Taking the instances to be the vertices of a graph, k-NN graphs add edges between a vertex and all its k nearest neighbors. Mutual k-NN graphs add an edge between two vertices only if they are k nearest neighbors of each other. It has been shown that any density level set of a given dataset can be approximated by the connected components of the mutual k-NN graph [2], [3], and further work has attempted to develop an understanding of the optimal choice of k [2], [4].

Mode-seeking methods aim to directly locate the modes in the density and then associate each instance in the observed data with a relevant mode. Such approaches begin with a density estimate f̂ and then move each point x_i towards a mode of f̂ by ascending the density. Mean shift, introduced in [5] and further developed in [6] and [7], is a popular mode-seeking method that associates an instance to a mode along the path of steepest ascent of the density estimate. To circumvent the costly run time of mean shift, the authors in [8] proposed a fast sample-based method, termed quick shift. Quick shift simply associates each instance to its nearest neighbor of higher empirical density. To return a partition of the data, a segmentation parameter τ is required such that an instance will not be associated to its nearest neighbor of higher density if the distance between them is greater than τ. Quick shift is shown in [9] to consistently estimate the non-trivial modes of the underlying density and to correctly assign instances to their associated mode. However, appropriate tuning of τ requires knowledge of the distances between modes. Furthermore, determining modes by only the distances between instances and their nearest neighbor of higher density can cause outlying points to be erroneously selected as modes.

The density peaks clustering (DPC) method introduced in [10] offers a potential remedy to these issues, providing an intuitive method for sample-based mode detection. The true modes of the density are estimated using a decision graph, a scatter plot of the local density against the distance to the nearest neighbor of higher density. The modes are estimated as the extreme instances on the decision graph. DPC assigns the remaining instances to the detected modes using the same methodology as quick shift. The partition of the data is extracted by grouping together instances that are assigned to the same mode. The decision graph and the resulting clustering for a toy dataset are shown in Fig. 1.

Manuscript received 22 November 2022; revised 8 August 2023; accepted 20 October 2023. Date of publication 25 October 2023; date of current version 8 January 2024. This work was supported by the Science Foundation Ireland under Grant 16/RC/3872. Recommended for acceptance by M. Mahoney. (Corresponding author: Joshua Tobin.)
The authors are with the School of Computer Science & Statistics, Trinity College Dublin, Dublin 2, Ireland (e-mail: [email protected]; [email protected]).
This article has supplementary downloadable material available at https://doi.org/10.1109/TPAMI.2023.3327471, provided by the authors.
Digital Object Identifier 10.1109/TPAMI.2023.3327471
0162-8828 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
1110 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 46, NO. 2, FEBRUARY 2024
Fig. 1. Left: The decision graph of the DPC method. The three extreme points
are detected as the modes. Right: The assignment of the instances to the modes.
Fig. 2. Left: Noise in the density estimate leads to errors when seeking point
modes. Right: Modal-set methods are robust to noise and recover the true cluster
structure.
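To make the graph constructions of Section I concrete, the following is a minimal stdlib-only sketch (ours, not the authors' implementation) that builds a mutual k-NN graph by brute force and extracts its connected components, which serve as level-set estimates:

```python
import math
from collections import deque

def mutual_knn_components(points, k):
    """Connected components of the mutual k-NN graph (brute force, O(n^2 log n))."""
    n = len(points)
    # k nearest neighbors of each point (excluding the point itself).
    knn = []
    for i in range(n):
        order = sorted(range(n), key=lambda j: math.dist(points[i], points[j]))
        knn.append(set(order[1:k + 1]))
    # Mutual k-NN graph: keep edge (i, j) only if each is a k-NN of the other.
    adj = [[j for j in knn[i] if i in knn[j]] for i in range(n)]
    # Extract connected components with breadth-first search.
    seen, components = [False] * n, []
    for start in range(n):
        if seen[start]:
            continue
        queue, comp = deque([start]), []
        seen[start] = True
        while queue:
            i = queue.popleft()
            comp.append(i)
            for j in adj[i]:
                if not seen[j]:
                    seen[j] = True
                    queue.append(j)
        components.append(sorted(comp))
    return components
```

With two well-separated groups of points and k = 2, the mutual graph splits into exactly two components, one per group.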
While many papers have demonstrated the ability of the DPC method to provide high-quality clusterings in applications [11], [12], there is, to the best of our knowledge, only one publication on the theoretical analysis of the DPC method. In [13], the authors derive a theoretically grounded rule for selecting modes from the decision graph, using a robust linear regression of the log of the density estimates, log f̂(x), against the log of the distances to neighbors of higher density.

In this work, we seek to deepen our understanding of the DPC method and propose a new density-based clustering technique that improves DPC both theoretically and computationally. By adapting results from related works, we provide theoretical guarantees that DPC consistently estimates the modes of the underlying density and can correctly cluster the data with high probability. We also demonstrate the deficiencies of the DPC methodology in the presence of noisy density estimates. Motivated by these deficiencies, we introduce a novel clustering algorithm: Component-wise Peak-Finding (CPF). CPF improves DPC in two ways: first, CPF partitions the data into regions mutually separated by areas of low density before clustering, thus ensuring the correct assignment of instances to their respective clusters; second, the peak-finding criterion is directed to seek modal-sets rather than point modes in the data, reducing the sensitivity of the clustering result to fluctuations in the density estimate. We provide theoretical guarantees for our new algorithm, extending the theoretical properties available for the DPC method. In particular, we prove that CPF recovers unique and consistent estimates of the high-density regions in the data, and correctly determines the true number of clusters. Furthermore, the complexity of our algorithm is of the order O(nk log(n)), near linear in k and n. Finally, to demonstrate the adaptability of CPF, we present a modified version of the method, CPF-Match, designed for multi-image matching, an application in computer vision. We show that CPF-Match achieves state-of-the-art performance for this task.

The remainder of the paper is organized as follows. In Section III, we formalize the DPC method, provide a theoretical analysis of its performance, and demonstrate its deficiencies via illustrative examples. In Section IV, the CPF algorithm is explained in detail, and its consistency properties are provided in Section V. In Section VI, we assess the clustering quality of CPF on a range of simulated and real-world datasets and show that CPF outperforms DPC and other peer clustering methods. Section VII introduces CPF-Match, an adapted method for multi-image matching. Section VIII concludes the paper.

II. RELATED WORK

Adaptations of the DPC method have proliferated in recent years. One strand of work focuses on improving the density estimator (see [14], [15]), and another on automating the selection of modes from the decision graph (see [16], [17]). The authors of DPC have introduced a recent approach [18], which applies a density estimator based on the intrinsic dimension of the data, and a pruning mechanism for false modes.

A robust way of modelling high-density regions in the data space is proposed in [19]. Modal-sets generalize the concept of a point mode to a local support of the density peak. An illustrative example is given in Fig. 2. The related clustering procedure, termed MCores, estimates the modal-sets using connected components of k-NN graphs at different levels of the empirical density. The authors provide consistency guarantees on the recovery of true modal-sets in the data. A subsequent work presents QuickShift++ [20], improving on the MCores procedure by adopting the same allocation procedure as quick shift and DPC. Recently, in [21], DPC was adapted to detect modal-sets. The method, termed DCF, was shown to detect modal-sets more efficiently than QuickShift++.

While [19], [20], [21] use classical non-parametric density estimators, recent literature has proposed density estimators using neural networks. A prominent approach uses energy-based models, defining an unnormalized density as the exponential of the negative energy function, parameterized by a neural network. The estimation procedure involves either computing maximum likelihood estimates ([22], [23]), variational approximation to an unnormalized target ([24], [25]), or methods combining both approaches ([26]). The density estimates produced by these methods are expensive to compute, and consistency guarantees are not currently available, making theoretical analysis challenging. Nevertheless, they can naturally be integrated into the clustering method discussed in this work.

III. DENSITY PEAKS CLUSTERING

A. The Method

The peak-finding method in [10] requires two inputs: 1) a density estimate at each data point, and 2) the distance from each point to its nearest neighbor of higher density.
TOBIN AND ZHANG: THEORETICAL ANALYSIS OF DENSITY PEAKS CLUSTERING AND THE COMPONENT-WISE PEAK-FINDING ALGORITHM 1111
For the remaining points, let b(x) = argmin_{x′ ∈ X} {‖x − x′‖ : f̂k(x) < f̂k(x′)}, i.e., the nearest neighbor of x with higher density. Define the distance to the nearest neighbor of higher local density as

ω(x) = ‖x − b(x)‖.

Also of interest is the product of the estimated density f̂k(x) and the distance quantity ω(x). This is termed the peak-finding criterion:

Definition 3: Taking f̂k(x) and ω(x) as defined above, we define the peak-finding criterion γ(x) as

γ(x) = f̂k(x) · ω(x).

Following [10], the decision graph is the scatter plot of {(f̂k(x), ω(x)) : x ∈ X}. To generate a set of mode estimates M̂ = {x̂_j}_{j=1}^m, threshold values for the density f̂k(x) and the distance ω(x) need to be set: the modes are the data points with the two metric values both above the thresholds, i.e., M̂ = {x ∈ X : f̂k(x) ≥ l, ω(x) ≥ τ}.

The algorithm used for density peaks clustering in this formulation is described in Algorithm 1. The algorithm takes as input the dataset X and uses the parameter k to return the final set of clusters C. Initially, the set of estimated modes M̂ = ∅ and the cluster assignment graph G(X, E) is initialized with the points of X as vertices and no edges. DPC produces the decision graph (Lines 2–3). DPC requests the user to select estimated modes using this plot as reference. The estimated modes {x̂_j}_{j=1}^m are then added to M̂ (Line 4). After the set of estimated modes has been returned, edges are added to the graph G(X, E) from each non-modal point to its nearest neighbor of higher density.

B. Theoretical Analysis

The quality of the clusterings provided by DPC has been thoroughly demonstrated in practice, as discussed in Section I. Yet, no previous work has provided guarantees on the ability of DPC to recover modes consistently. By drawing an analogy to the quick shift method, we show that DPC can recover the modes and the associated cluster assignments with strong consistency guarantees.

Quick shift, as described in Section I, is a fast non-parametric density-based method that produces clusterings with kernel density estimates. A directed graph is built with the observed instances as vertices, and edges added from each instance to its nearest neighbor of higher estimated density. The final clusters are extracted as the connected components of the graph once edges longer than a segmentation parameter τ are removed. The formulation of DPC introduced above is similar to quick shift in all but two ways: 1) a k-NN estimate of the density is used in place of a kernel density estimate, and 2) a second threshold value l is defined, used to flag low-density instances as outliers. As such, in this section we present the main results adapted from the consistency analysis of the k-NN density estimator [27] and the consistency analysis of quick shift [9]. The primary contributions involve drawing the analogy to the quick shift approach, and the extension of the analysis to include the density threshold l used in the mode selection step of DPC. An extended analysis, including proofs of the theorems, is given in the supplementary material, available online.

We assume that f is α-Hölder continuous and lower bounded on X. Furthermore, it is assumed that the level sets of f are continuous with respect to the density level, and the modes of f have negative definite Hessian.
Fig. 3. Illustration of the (r, θ, ν)+-modes and (r, θ, ν)−-modes of Definition 4.

Fig. 4. Illustrative example of the (r, δ)-interior of an attraction region A_x∗, denoted A_x∗^(r,δ), associated with a mode x∗.

Following [9], we now define a stronger notion of mode that allows for a clearer analysis of DPC.

Definition 4: A mode x∗ ∈ M is an (r, θ, ν)+-mode if f(x∗) > f(x′) + θ for all x′ ∈ B(x∗, r)\B(x∗, r_M) and f(x∗) > ν + θ. A mode x∗ ∈ M is an (r, θ, ν)−-mode if f(x∗) < f(x′) − θ for some x′ ∈ B(x∗, r) and f(x∗) > ν + θ. Let M+_{r,θ,ν} ⊆ M denote the set of (r, θ, ν)+-modes of f.

An illustration of the (r, θ, ν)+-modes and the (r, θ, ν)−-modes is given in Fig. 3. Recall that the DPC algorithm requires two cutting-off thresholds, one for the value of the density estimate f̂k(x) and the other for the distance to the nearest neighbor of higher estimated density, ω(x). Taking the thresholds as l and τ for the density and distance values respectively, our first theorem shows that M̂ contains unique and consistent estimates of the (τ + ε, θ, l)+-modes of f, for θ, ε > 0.

Theorem 1 (Mode Estimation, adapted from Theorem 2 of [9]): For every x∗ ∈ M+_{τ+ε,θ,l} \ M−_{τ−ε,θ,l}, with probability at least 1 − ζ, there exists x̂ ∈ M̂ satisfying

‖x̂ − x∗‖ ≤ C · f(x∗) · 1/k^(1/4),

where C is a constant depending on p, n, ζ, and f.

Theorem 1 proves that DPC recovers the modes of an α-Hölder continuous density f consistently. For n large enough, with high probability, M̂ contains unique estimates for all the true modes of f. As such, there is an injection between the set of true modes and the set of estimated modes.

The procedure used to assign points to their respective modes is the same as that used in quick shift. As such, theoretical guarantees developed for a variant of quick shift in [20] can be applied directly to DPC. We provide the relevant results below. First, we define the attraction region of a mode. The attraction region of a particular mode covers all points that flow towards the mode along the direction of the gradient of the underlying density.

Definition 5: Let the path ν_x : R → R^p satisfy ν_x(0) = x and ν_x′(t) = ∇f(ν_x(t)). For a mode x∗ ∈ M, its attraction region A_x∗ is the set of points x ∈ X that satisfy lim_{t→∞} ν_x(t) = x∗.

It is shown that DPC can cluster sample points in the (r, δ)-interior of an attraction region. The parameters r > 0 and δ > 0 hold simultaneously across all modes of the density and can be chosen arbitrarily small.

Definition 6: The (r, δ)-interior of an attraction region A_x∗, denoted A_x∗^(r,δ), is the set of points x1 ∈ A_x∗ such that any path P from x1 to any point x2 ∈ ∂A_x∗ satisfies

sup_{x ∈ P} inf_{x′ ∈ B(x,r)} f(x′) ≥ sup_{x′ ∈ B(x2,r)} f(x′) + δ.

Points in the interior of an attraction region must satisfy the property that any path leaving the attraction region must significantly decrease in density at some point. An illustrative example is given in Fig. 4.

The main result (Theorem 2) states that, as long as the modes are well-estimated, the assignment method of DPC will correctly cluster the (r, δ)-interiors of the attraction regions with high probability. Suppose that x∗ ∈ M is estimated by x̂ ∈ M̂ such that ‖x̂ − x∗‖ ≤ r. Then, with high probability, for x ∈ A_x∗ ∩ X, density peaks clustering assigns x to the cluster belonging to x∗.

Theorem 2 (Cluster Assignment, adapted from Theorem 2 of [20]): Suppose that x∗ ∈ M is estimated by x̂ ∈ M̂ such that ‖x̂ − x∗‖ ≤ r. Then, for n sufficiently large, depending on f, δ, ζ, and r, with high probability, for any x ∈ A_x∗^(r,δ) ∩ X, DPC clusters x to the cluster belonging to x∗.

C. Limitations

The theoretical analysis of Section III-B is based on the assumption of a sample size large enough that the error of the density estimator can be bounded. In this section, we provide an analysis of the density peaks clustering algorithm through three illustrative datasets, with n = 1500, from the scikit-learn clustering demonstration.1 Taken together, the three datasets provide an understanding of the density peaks clustering algorithm and the type of clusters it returns; see Fig. 5.

First, the density estimation appears to recover the population density for each dataset. The second feature of the density peaks clustering algorithm analyzed is the decision graph, provided to enable the estimation of the modes from the dataset. The method of selecting mode estimates from the decision graph is seen to

1 https://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html
Algorithm 2 (Lines 13–26):
13: if M̂∗ ∩ M̂ = ∅ then
14:     Add x∗ to M̂.
15: end if
16: end loop
17: Initialise G(S, E), a directed graph with S as vertices and no edges, E = ∅.
18: for each x in S\M̂ do
19:     Add a directed edge from x to b(x).
20: end for
21: for each cluster center x ∈ M̂ do
22:     Let C be the collection of the points connected by any directed path in G(S, E) that terminates at x.
23:     Add C ∪ {x} to C.
24: end for
25: end for
26: return C

For smaller values of ρ, fewer vertices will be removed, and it is less likely that a proposed modal-set will be disconnected from existing ones. For larger values of ρ, more vertices and their edges will be removed from the graph, and the probability of the proposed modal-set being disconnected will increase. It is not required to have different ρ values for different component sets, because the threshold ρ^(−1/p) r_k(x∗) adapts naturally to the density level of the component set being assessed. It is seen that modal-sets associated with spurious modes of the density estimate f̂k will not be accepted by CPF, as such modal-sets are not disconnected from previously accepted modal-sets.

C. The CPF Algorithm

The CPF algorithm is explained with reference to Algorithm 2 and the illustrative example in Fig. 6.

The algorithm takes as input the dataset X and uses parameters k and ρ to return the final set of clusters C. Initially, the set of estimated clusters is C = ∅. The undirected mutual k-nearest neighbor graph G(X, E) is constructed. Vertices that have few to no edges are marked as outliers and removed. The remaining data is partitioned into disjoint component sets according to the graph G(X, E), yielding S = {S1, . . . , S_nS} (Lines 1–2). In Fig. 6(a), two components are extracted, yielding S = {S1, S2}.

For each component set S ∈ S, CPF computes the peak-finding criterion for each point and selects the instance x∗ with maximal value (Lines 4–5). In Fig. 6(b), the higher estimated
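The component-wise idea can be illustrated with a heavily simplified sketch (ours, under stated assumptions): the data is split into mutual k-NN graph components, and a single cluster center per component is chosen as the maximiser of the peak-finding criterion γ = f̂k · ω, with ω computed within the component. CPF's outlier removal and ρ-based modal-set validation (Lines 13–16 above) are deliberately omitted:

```python
import math
from collections import deque

def cpf_sketch(points, k):
    """Simplified component-wise peak-finding: one center per mutual
    k-NN component, chosen by gamma(x) = f_k(x) * omega(x)."""
    n, d = len(points), len(points[0])
    order = [sorted(range(n), key=lambda j: math.dist(points[i], points[j]))
             for i in range(n)]
    knn = [set(order[i][1:k + 1]) for i in range(n)]
    # k-NN density estimate, up to constants.
    f = [k / (n * max(math.dist(points[i], points[order[i][k]]), 1e-12) ** d)
         for i in range(n)]
    # Label each point with the id of its mutual k-NN graph component (BFS).
    comp = [None] * n
    for s in range(n):
        if comp[s] is None:
            comp[s], queue = s, deque([s])
            while queue:
                i = queue.popleft()
                for j in knn[i]:
                    if i in knn[j] and comp[j] is None:
                        comp[j] = s
                        queue.append(j)
    labels = [None] * n
    for c in set(comp):
        members = [i for i in range(n) if comp[i] == c]
        # b(x) and omega(x) restricted to the component.
        omega = {}
        for i in members:
            higher = [j for j in members if f[j] > f[i]]
            omega[i] = (min(math.dist(points[i], points[j]) for j in higher)
                        if higher else float("inf"))
        # The component's density peak has omega = inf, hence maximal gamma.
        center = max(members, key=lambda i: f[i] * omega[i])
        for i in members:
            labels[i] = center
    return labels
```

Because the components are mutually separated by low-density regions, no chain of nearest-higher-density-neighbor assignments can cross between them, which is the property motivating CPF's first improvement over DPC.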
VI. EXPERIMENTS
A. Experimental Set-Up
Code implementing CPF, and code to reproduce the experiments below, are available online.2
Machine Configuration: All experiments have been conducted on a PC running Debian 10 (Buster) with 24 cores and 24 GB of RAM.
Evaluation Criteria: To evaluate the clusterings produced
we use the Adjusted Rand Index (ARI) [31] and the Adjusted
Mutual Information (AMI) [32]. For both metrics, a larger value
indicates a higher-quality clustering.
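For reference, the ARI can be computed directly from the contingency table of the two labelings. The following is a minimal stdlib implementation of ours; in practice, library routines such as scikit-learn's `adjusted_rand_score` are used:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """ARI = (Index - Expected) / (Max - Expected), computed from pair
    counts in the contingency table of the two labelings."""
    n = len(labels_true)
    contingency = Counter(zip(labels_true, labels_pred))
    a = Counter(labels_true)   # row sums of the contingency table
    b = Counter(labels_pred)   # column sums
    sum_ij = sum(comb(c, 2) for c in contingency.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. single-cluster labelings
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

The ARI is 1 for identical partitions (up to label permutation), is approximately 0 for random assignments, and can be negative, which is how it penalizes spurious clusters.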
Comparison Methods: We compare with the following state-
of-the-art clustering algorithms:
r Density Peaks Clustering (DPC) method with k-NN den-
sity estimator explained in Section III. Implementation:
Python. Input Parameter: k - neighbors.
r The original DPC (ODP) method of [10]: Implementation:
R. Input Parameter: dc - threshold distance.
r Density Peaks Advanced Clustering (DPA) [18]: Imple-
mentation: Python and C++. Input Parameter: z - peak Fig. 8. Results of the clustering algorithms on synthetic datasets. The ARI
significance parameter. and the AMI for each clustering is given in the lower left corner.
r Adaptive Density Peaks Clustering (ADP) [16]: Implemen-
tation: R. Input Parameter: h - bandwidth. TABLE I
r Comparative Density Peaks Clustering (CDP) [33]: Im- CHARACTERISTICS OF THE REAL-WORLD DATASETS
TABLE II: Quality of the clusterings for the real-world datasets.

TABLE III: P-values for Wilcoxon signed-rank tests with the Benjamini-Hochberg correction, comparing the ARI and AMI values of CPF with the competitor methods.

highest combined value of the ARI and AMI across the range of parameter values assessed. CPF achieves the best clustering, in terms of the ARI and AMI, for six of the datasets assessed, significantly outperforming all of the competitor methods. Also presented are the mean rankings for the quality of the clusterings returned by each of the methods for both metrics. Here, CPF is seen to have the best performance overall, indicating that the clustering results are generally of high quality.

In terms of the ARI, the methods with the next three highest ranks are KMS, ODP and CDP. In terms of the AMI, the DPC method, as formulated in Section III-A, is also among the best performing approaches. Taken together, this makes a strong case for the ability of the peak-finding criterion to detect meaningful clusters in the data.

Considering the competitor approaches that determine the number of clusters automatically, their performance is significantly worse than that of CPF. The peak-finding method DPA exhibits inconsistent quality, achieving the best results for the Seeds dataset but not detecting meaningful clusterings for the Ecoli, Glass and Vertebral datasets. The level set methods DBS and HDB perform poorly; the poor performance in both metrics indicates that these methods fail to capture the classes present in the data. MNS achieves the optimal clustering for two datasets, Ecoli and Page Blocks, but does not regularly return high quality clusterings. QKS also does not return high quality clusterings, particularly when assessed using the ARI. As the ARI significantly penalizes false positive clusters, it can again be concluded that quick shift does not adequately detect the true number of clusters in the data. Considering the significant similarities between the methodology of QKS and that of the DPC methods, the poor results are likely the result of difficulty in finding the optimal value of the parameter h.

Following the guidance given in [39], the results are also subjected to a statistical analysis using non-parametric tests. We apply the Wilcoxon signed-rank test for pairwise comparisons, using the Benjamini-Hochberg correction to control the false-discovery rate [40], [41]. The p-values for the associated comparisons are shown in Table III. The results indicate a strong level of statistical significance for the improved clustering quality of the CPF method. CPF significantly outperforms all but one of the methods assessed and is not outperformed by any of the competitor approaches.

The average run time, in seconds, for each method is presented in Table IV. For small datasets, DBS and HDB achieve the fastest run times, though the magnitude of the difference with CPF is unlikely to hinder its use in applications; the speed of DBS and HDB reflects their implementation in C++. For larger datasets, CPF remains competitive with the fastest methods and achieves near the fastest run time for Letter Recognition, the dataset with the largest number of instances assessed. Further context is provided in Table V, detailing the computational complexity of the algorithms. It is concluded that CPF, as well as achieving high quality clusterings, gracefully scales to larger datasets.

D. Analysis of the Parameter Space

CPF achieves superb results across the datasets when optimal values for the parameters are applied. This performance is exhibited across datasets of all sizes, with optimal results achieved for datasets with both the fewest and the most samples, and for datasets with both low and high numbers of dimensions. The consistency of the performance of the approach is now demonstrated for a wide range of parameter values. CPF has two parameters: 1) k, the number of neighbors computed for each point when constructing the k-NN graph and computing the k-NN density estimator, and 2) ρ, the amount of variation in the density used to assess potential cluster centers. The parameters of the competitor methods are detailed in Section VI-A. In Fig. 9 we present the clustering quality in terms of the ARI and the AMI over a broad range of parameter values, for four datasets, with the remainder included in the supplementary material, available online.
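The Benjamini-Hochberg step-up adjustment used in the statistical analysis is simple to state; the following is a minimal stdlib sketch of ours (in practice a statistics package such as SciPy or statsmodels would be used):

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up FDR control).
    Each raw p-value is scaled by n / rank, then monotonicity is
    enforced by walking down from the largest p-value."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank_idx in range(n - 1, -1, -1):
        i = order[rank_idx]
        value = pvals[i] * n / (rank_idx + 1)
        running_min = min(running_min, value)
        adjusted[i] = min(running_min, 1.0)
    return adjusted
```

A hypothesis is rejected at false-discovery rate q when its adjusted p-value is at most q; the correction is less conservative than Bonferroni while still controlling the expected proportion of false discoveries.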
TABLE IV: Average run time for the real-world datasets.
Fig. 9. For each dataset and clustering algorithm, we show the clustering quality as a function of the input parameters. The ARI is shown in blue, and the AMI
in pink. Note that for CPF, we present the clustering quality as a function of both k and ρ.
Fig. 10. One pair of images from the Bikes and Boat image groups. Lines between each pair of images indicate a match detected by CPF-Match.

Fig. 11. Performance curves for the CPF-Match (black) and QuickMatch (blue) multi-image matching methods. For all datasets, k = 10, ρ = 0.5 and the threshold parameter of QuickMatch is set to 4.
[5] K. Fukunaga and L. Hostetler, "The estimation of the gradient of a density function, with applications in pattern recognition," IEEE Trans. Inf. Theory, vol. 21, no. 1, pp. 32–40, Jan. 1975.
[6] Y. Cheng, "Mean shift, mode seeking, and clustering," IEEE Trans. Pattern Anal. Mach. Intell., vol. 17, no. 8, pp. 790–799, Aug. 1995.
[7] D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 5, pp. 603–619, May 2002.
[8] A. Vedaldi and S. Soatto, "Quick shift and kernel methods for mode seeking," in Proc. Eur. Conf. Comput. Vis., D. Forsyth, P. Torr, and A. Zisserman, Eds., Berlin, Germany: Springer, 2008, pp. 705–718.
[9] H. Jiang, "On the consistency of quick shift," in Proc. Int. Conf. Neural Inf. Process. Syst., 2017, pp. 46–55.
[10] A. Rodriguez and A. Laio, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492–1496, 2014.
[11] X. Li and K.-C. Wong, "Evolutionary multiobjective clustering and its applications to patient stratification," IEEE Trans. Cybern., vol. 49, no. 5, pp. 1680–1693, May 2019.
[12] D. Platero-Rochart, R. González-Alemán, E. W. Hernández-Rodríguez, F. Leclerc, J. Caballero, and L. Montero-Cabrera, "RCDPeaks: Memory-efficient density peaks clustering of long molecular dynamics," Bioinformatics, vol. 38, no. 7, pp. 1863–1869, 2022.
[13] I. Verdinelli and L. Wasserman, "Analysis of a mode clustering diagram," Electron. J. Statist., vol. 12, no. 2, pp. 4288–4312, Jan. 2018.
[14] J. Xie, H. Gao, W. Xie, X. Liu, and P. W. Grant, "Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors," Inf. Sci., vol. 354, no. C, pp. 19–40, 2016.
[15] L. Yaohui, M. Zhengming, and Y. Fang, "Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy," Knowl.-Based Syst., vol. 133, pp. 208–220, 2017.
[16] X.-F. Wang and Y. Xu, "Fast clustering using adaptive density peak detection," Statist. Methods Med. Res., vol. 26, no. 6, pp. 2800–2811, Dec. 2017.
[17] J. Ding, X. He, J. Yuan, and B. Jiang, "Automatic clustering based on density peak detection using generalized extreme value distribution," Soft Comput., vol. 22, no. 9, pp. 2777–2796, May 2018.
[18] M. d'Errico, E. Facco, A. Laio, and A. Rodriguez, "Automatic topography of high-dimensional data sets by non-parametric density peak clustering," Inf. Sci., vol. 560, pp. 476–492, 2021.
[19] H. Jiang and S. Kpotufe, "Modal-set estimation with an application to clustering," in Proc. Int. Conf. Artif. Intell. Statist., 2017, pp. 1197–1206, ISSN: 2640-3498.
[20] H. Jiang, J. Jang, and S. Kpotufe, "QuickShift++: Provably good initializations for sample-based mean shift," in Proc. Int. Conf. Mach. Learn., 2018, pp. 2294–2303, ISSN: 2640-3498.
[21] J. Tobin and M. Zhang, "DCF: An efficient and robust density-based clustering method," in Proc. IEEE Int. Conf. Data Mining, 2021, pp. 629–638.
[22] L. Dinh, D. Krueger, and Y. Bengio, "NICE: Non-linear independent components estimation," 2014, arXiv:1410.8516.
[23] L. Dinh, J. Sohl-Dickstein, and S. Bengio, "Density estimation using real NVP," 2016, arXiv:1605.08803.
[24] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," 2013, arXiv:1312.6114.
[25] D. Rezende and S. Mohamed, "Variational inference with normalizing flows," in Proc. Int. Conf. Mach. Learn., 2015, pp. 1530–1538.
[26] R. Gao, E. Nijkamp, D. P. Kingma, Z. Xu, A. M. Dai, and Y. N. Wu, "Flow contrastive estimation of energy-based models," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 7518–7528.
[27] S. Dasgupta and S. Kpotufe, "Optimal rates for k-NN density and mode estimation," in Proc. Int. Conf. Neural Inf. Process. Syst., 2014, pp. 2555–2563.
[28] H. Jiang and S. Kpotufe, "Modal-set estimation with an application to clustering," in Proc. 20th Int. Conf. Artif. Intell. Statist., 2017, pp. 1197–1206, ISSN: 2640-3498.
[29] H. Jiang, "Density level set estimation on manifolds with DBSCAN," in Proc. 34th Int. Conf. Mach. Learn., 2017, pp. 1684–1693, ISSN: 2640-3498.
[30] K. Chaudhuri, S. Dasgupta, S. Kpotufe, and U. von Luxburg, "Consistent procedures for cluster tree estimation and pruning," IEEE Trans. Inf. Theory, vol. 60, no. 12, pp. 7900–7912, Dec. 2014.
[31] L. Hubert and P. Arabie, "Comparing partitions," J. Classification, vol. 2, no. 1, pp. 193–218, Dec. 1985.
[32] N. X. Vinh, J. Epps, and J. Bailey, "Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance," J. Mach. Learn. Res., vol. 11, pp. 2837–2854, 2010.
[33] Z. Li and Y. Tang, "Comparative density peaks clustering," Expert Syst. Appl., vol. 95, pp. 236–247, Apr. 2018.
[34] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proc. 2nd Int. Conf. Knowl. Discov. Data Mining, Portland, OR, USA, 1996, pp. 226–231.
[35] R. J. G. B. Campello, D. Moulavi, and J. Sander, "Density-based clustering based on hierarchical density estimates," in Proc. Pacific-Asia Conf. Knowl. Discov. Data Mining, J. Pei, V. S. Tseng, L. Cao, H. Motoda, and G. Xu, Eds., Berlin, Germany: Springer, 2013, pp. 160–172.
[36] D. Arthur and S. Vassilvitskii, "k-means++: The advantages of careful seeding," in Proc. 18th Annu. ACM-SIAM Symp. Discrete Algorithms, 2007, pp. 1027–1035.
[37] D. Dua and C. Graff, "UCI machine learning repository," 2019. [Online]. Available: http://archive.ics.uci.edu/ml
[38] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Berlin, Germany: Springer, Aug. 2009.
[39] S. Garcia and F. Herrera, "An extension on "statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons," J. Mach. Learn. Res., vol. 9, no. 12, pp. 2677–2694, 2008.
[40] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics, vol. 1, pp. 80–83, 1945.
[41] Y. Benjamini and Y. Hochberg, "Controlling the false discovery rate: A practical and powerful approach to multiple testing," J. Roy. Statist. Soc. Ser. B Methodol., vol. 57, no. 1, pp. 289–300, 1995.
[42] R. Tron, X. Zhou, C. Esteves, and K. Daniilidis, "Fast multi-image matching via density-based clustering," in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 4077–4086.
[43] D. Lowe, "Object recognition from local scale-invariant features," in Proc. 7th IEEE Int. Conf. Comput. Vis., 1999, pp. 1150–1157.

Joshua Tobin received the BA (joint honours) degree in mathematics and economics and the PhD degree in statistics from Trinity College Dublin, in 2018 and 2022, respectively. He is a post-doctoral researcher with the School of Computer Science & Statistics, Trinity College Dublin. His current research interests include parametric and non-parametric clustering methods and machine learning applications for healthcare.

Mimi Zhang received the BSc degree in statistics from the University of Science and Technology of China, in 2011, and the PhD degree in industrial engineering from the City University of Hong Kong, in 2015. She joined Trinity College Dublin as an assistant professor in 2017. Before joining Trinity College Dublin, she was a research associate with the University of Strathclyde and Imperial College London. Her main research areas are machine learning and operations research, including clustering, Bayesian optimization, functional data analysis, tree-based methods, reliability and maintenance, etc.