
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 46, NO. 2, FEBRUARY 2024

A Theoretical Analysis of Density Peaks Clustering and the Component-Wise Peak-Finding Algorithm

Joshua Tobin and Mimi Zhang

Abstract—Density peaks clustering detects modes as points with high density and large distance to points of higher density. Each non-mode point is assigned to the same cluster as its nearest neighbor of higher density. Density peaks clustering has proved capable in applications, yet little work has been done to understand its theoretical properties or the characteristics of the clusterings it produces. Here, we prove that it consistently estimates the modes of the underlying density and correctly clusters the data with high probability. However, noise in the density estimates can lead to erroneous modes and incoherent cluster assignments. A novel clustering algorithm, Component-wise Peak-Finding (CPF), is proposed to remedy these issues. The improvements are twofold: 1) the assignment methodology is improved by applying the density peaks methodology within level sets of the estimated density; 2) the algorithm is not affected by spurious maxima of the density and hence is competent at automatically deciding the correct number of clusters. We present novel theoretical results, proving the consistency of CPF, as well as extensive experimental results demonstrating its exceptional performance. Finally, a semi-supervised version of CPF is presented, integrating clustering constraints to achieve excellent performance for an important problem in computer vision.

Index Terms—Density-based clustering, nearest-neighbor graph, density peaks, semi-supervised clustering, multi-image matching.

I. INTRODUCTION

DENSITY-BASED clustering methods relate the notion of clusters to high-density contiguous regions of the underlying density function. Hartigan [1] proposed the concept of density-based clusters as "regions ... where the densities are high surrounded by regions where the densities are low". The concept is attractive for several reasons: 1) the clusters are free to assume any shape, in contrast to model-based clustering methods; 2) the clustering method is associated with density but without requiring strong assumptions on the density function; 3) the number of clusters is linked to density peaks and can be determined as part of the estimation procedure. Density-based clustering methods can be broadly classified into two categories: level set methods and mode-seeking methods.

Level set methods detect clusters as connected components of the density level sets {x : f(x) ≥ λ}, where f is the density function and λ is a cutting threshold. The density f is unknown, and hence the level sets are required to be estimated from the data. Nearest-neighbor graphs have been widely used for this purpose [2], [3]. Taking the instances to be the vertices of a graph, k-NN graphs add edges between a vertex and all its k nearest neighbors. Mutual k-NN graphs add an edge between two vertices only if they are k nearest neighbors of each other. It has been shown that any density level set of a given dataset can be approximated by the connected components of the mutual k-NN graph [2], [3], and further work attempted to develop an understanding of the optimal choice of k [2], [4].

Mode-seeking methods aim to directly locate the modes in the density and then associate each instance in the observed data with a relevant mode. Such approaches begin with a density estimate f̂ and then move each point xi towards a mode of f̂ by ascending the density. Mean shift, introduced in [5] and further developed in [6] and [7], is a popular mode-seeking method that associates an instance to a mode along the path of steepest ascent of the density estimate. To circumvent the costly run time of mean shift, the authors in [8] proposed a fast sample-based method, termed quick shift. Quick shift simply associates each instance to its nearest neighbor of higher empirical density. To return a partition of the data, a segmentation parameter τ is required such that an instance will not be associated to its nearest neighbor of higher density if the distance between them is greater than τ. Quick shift is shown in [9] to consistently estimate the non-trivial modes of the underlying density and to correctly assign instances to their associated mode. However, appropriate tuning of τ requires a knowledge of the distances between modes. Furthermore, determining modes by only the distances between instances and their nearest neighbor of higher density can cause outlying points to be erroneously selected as modes.
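The quick shift rule just described is simple enough to sketch directly. The following is an illustrative NumPy implementation (not code from any of the cited works); the function name `quick_shift` is ours, the density estimates are assumed to be supplied, and `tau` plays the role of the segmentation parameter τ:

```python
import numpy as np

def quick_shift(X, density, tau):
    """Quick shift: link each point to its nearest neighbor of higher density.

    X: (n, p) data matrix; density: (n,) density estimate at each point;
    tau: segmentation parameter -- no link is made when the nearest
    higher-density neighbor is farther away than tau.
    Returns an (n,) array of cluster labels, one label per detected mode.
    """
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    parent = np.arange(n)  # points left unlinked become modes (roots)
    for i in range(n):
        higher = np.where(density > density[i])[0]
        if len(higher) == 0:
            continue  # point of maximal density: always a mode
        j = higher[np.argmin(dist[i, higher])]
        if dist[i, j] <= tau:
            parent[i] = j
    labels = np.empty(n, dtype=int)
    for i in range(n):  # follow links up to a root (mode)
        r = i
        while parent[r] != r:
            r = parent[r]
        labels[i] = r
    return labels
```

With τ set between the within-cluster point spacing and the separation between modes, each root is one mode and the chains of links form the clusters; this is exactly the tuning difficulty noted above.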
The density peaks clustering (DPC) method introduced in [10] offers a potential remedy to these issues, providing an intuitive method for sample-based mode detection. The true modes of the density are estimated using a decision graph, a scatter plot of the local density against the distance to the nearest neighbor of higher density. The modes are estimated as the extreme instances on the decision graph. DPC assigns the remaining instances to the detected modes using the same methodology as quick shift. The partition of the data is extracted by grouping together instances that are assigned to the same mode. The decision graph and the resulting clustering for a toy dataset are shown in Fig. 1.

Manuscript received 22 November 2022; revised 8 August 2023; accepted 20 October 2023. Date of publication 25 October 2023; date of current version 8 January 2024. This work was supported by the Science Foundation Ireland under Grant 16/RC/3872. Recommended for acceptance by M. Mahoney. (Corresponding author: Joshua Tobin.)
The authors are with the School of Computer Science & Statistics, Trinity College Dublin, Dublin 2, Ireland (e-mail: [email protected]; [email protected]).
This article has supplementary downloadable material available at https://doi.org/10.1109/TPAMI.2023.3327471, provided by the authors.
Digital Object Identifier 10.1109/TPAMI.2023.3327471

0162-8828 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.


Fig. 1. Left: The decision graph of the DPC method. The three extreme points are detected as the modes. Right: The assignment of the instances to the modes.

Fig. 2. Left: Noise in the density estimate leads to errors when seeking point modes. Right: Modal-set methods are robust to noise and recover the true cluster structure.
While many papers have demonstrated the ability of the DPC method to provide high-quality clusterings in applications [11], [12], there is, to the best of our knowledge, only one publication on the theoretical analysis of the DPC method. In [13], the authors derive a theoretically grounded rule for selecting modes from the decision graph, using a robust linear regression of the log of the density estimates log f̂(x) against the log of the distances to neighbors of higher density.

In this work, we seek to deepen our understanding of the DPC method and propose a new density-based clustering technique that improves DPC both theoretically and computationally. By adapting results from related works, we provide theoretical guarantees that DPC consistently estimates the modes of the underlying density and can correctly cluster the data with high probability. We also demonstrate the deficiencies of the DPC methodology in the presence of noisy density estimates. Motivated by the deficiencies, we introduce a novel clustering algorithm: Component-wise Peak-Finding (CPF). CPF improves DPC in two ways: first, CPF partitions the data into regions mutually separated by areas of low density before clustering, thus ensuring the correct assignment of instances to their respective clusters; second, the peak-finding criterion is directed to seek modal-sets rather than point modes in the data, reducing the sensitivity of the clustering result to fluctuations in the density estimate. We provide theoretical guarantees for our new algorithm, extending the theoretical properties available for the DPC method. In particular, we prove that CPF recovers unique and consistent estimates of the high-density regions in the data, and correctly determines the true number of clusters. Furthermore, the complexity of our algorithm is of the order O(nk log(n)), near linear in k and n. Finally, to demonstrate the adaptability of CPF, we present a modified version of the method, CPF-Match, designed for multi-image matching, an application in computer vision. We show that CPF-Match achieves state-of-the-art performance for this task.

The remainder of the paper is organized as follows. In Section III, we formalize the DPC method, provide a theoretical analysis of its performance, and demonstrate its deficiencies via illustrative examples. In Section IV, the CPF algorithm is explained in detail, and its consistency properties are provided in Section V. In Section VI, we assess the clustering quality of CPF on a range of simulated and real-world datasets and show that CPF outperforms DPC and other peer clustering methods. Section VII introduces CPF-Match, an adapted method for multi-image matching. Section VIII concludes the paper.

II. RELATED WORK

Adaptations of the DPC method have proliferated in recent years. One strand of works focuses on improving the density estimator (see [14], [15]), and another strand of works focuses on automating the selection of modes from the decision graph (see [16], [17]). The authors of DPC have introduced a recent approach [18], which applies a density estimator based on the intrinsic dimension of the data, and a pruning mechanism for false modes.

A robust way of modelling high-density regions in the data space is proposed in [19]. Modal-sets generalize the concept of a point mode to a local support of the density peak. An illustrative example is given in Fig. 2. The related clustering procedure, termed MCores, estimates the modal-sets using connected components of k-NN graphs at different levels of the empirical density. The authors provide consistency guarantees on the recovery of true modal-sets in the data. A subsequent work presents QuickShift++ [20], improving on the MCores procedure by adopting the same allocation procedure as quick shift and DPC. Recently, in [21], DPC was adapted to detect modal-sets. The method, termed DCF, was shown to detect modal-sets more efficiently than QuickShift++.

While [19], [20], [21] use classical non-parametric density estimators, recent literature has proposed density estimators using neural networks. A prominent approach uses energy-based models, defining an unnormalized density that is the exponential of the negative energy function, parameterized by a neural network. The estimation procedure involves either computing maximum likelihood estimates ([22], [23]), variational approximation to an unnormalized target ([24], [25]), or methods combining both approaches ([26]). The density estimates produced by these methods are computationally expensive to compute, and consistency guarantees are not currently available, making theoretical analysis challenging. Nevertheless, they can naturally be integrated into the clustering method discussed in this work.

III. DENSITY PEAKS CLUSTERING

A. The Method

The peak-finding method in [10] requires two inputs: 1) a density estimate at each data point, and 2) the distance from


each point to its nearest neighbor of higher density. We consider a dataset X consisting of n data points in Rp, drawn from an unknown density f with compact support X. We use a k-NN density estimator as it is computationally fast and guarantees on its quality are well understood. For a data point x ∈ X, let rk(x) be the distance between x and its k-th nearest neighbor. The density estimate used is a simple functional of the distance rk(x).

Definition 1: For every x ∈ Rp, let rk(x) denote the distance from x to its k-th nearest neighbor in X. The density estimate is given as

f̂k(x) := k / (n · vp · rk(x)^p),

where vp is the volume of the unit sphere in Rp.

Note that this estimator is different from the estimate of the empirical density used in [10], which counts the number of data points within a threshold distance of a given instance. It is replaced as no guarantees of its consistency are possible. As well as a density estimate, the peak-finding criterion requires the distance from each point to its nearest neighbor of higher density:

Definition 2: For the point x = argmax_{x∈X} f̂k(x), we define the quantity

ω(x) = max_{x′∈X} ‖x − x′‖.

For the remaining points, let b(x) = argmin_{x′∈X} {‖x − x′‖ : f̂k(x) < f̂k(x′)}, i.e., the nearest neighbor of x with higher density. Define the distance to the nearest neighbor of higher local density as

ω(x) = ‖x − b(x)‖.

Also of interest is the product of the estimated density f̂k(x) and the distance quantity ω(x). This is termed the peak-finding criterion:

Definition 3: Taking f̂k(x) and ω(x) as defined above, we define the peak-finding criterion γ(x) as

γ(x) = f̂k(x) · ω(x).

Following [10], the decision graph is the scatter plot of {(f̂k(x), ω(x)) : x ∈ X}. To generate a set of mode estimates M̂ = {xj}_{j=1}^{m}, threshold values for the density f̂k(x) and the distance ω(x) need to be set: the modes are the data points with the two metric values both above the thresholds, i.e., M̂ = {x ∈ X : f̂k(x) ≥ l, ω(x) ≥ τ}.

The algorithm used for density peaks clustering in this formulation is described in Algorithm 1.

Algorithm 1: Density Peaks Clustering.
Input: Neighborhood parameter k.
Output: A set of clusters Ĉ.
1: Initialisation: M̂ = ∅; Ĝ(X, Ê), a directed graph with X as vertices and no edges, Ê = ∅.
2: Create the decision graph {(f̂k(x), ω(x)) : x ∈ X}.
3: Select the estimated modes using the thresholds l and τ, i.e., {x ∈ X : f̂k(x) ≥ l, ω(x) ≥ τ}.
4: Add the estimated modes {xj}_{j=1}^{m} to M̂.
5: for each x in X\M̂ do
6:   Add a directed edge from x to b(x).
7: end for
8: for each estimated mode x ∈ M̂ do
9:   Let C be the collection of the points connected by any directed path in Ĝ(X, Ê) that terminates at x.
10:  Add C ∪ x to Ĉ.
11: end for
12: return Ĉ

The algorithm takes as input the dataset X and uses the parameter k to return the final set of clusters Ĉ. Initially, the set of estimated modes M̂ = ∅, and the cluster assignment graph Ĝ(X, Ê) is initialized with vertices as the points of X and no edges. DPC produces the decision graph (Lines 2–3). DPC requests the user to select estimated modes using this plot as reference. The estimated modes {xj}_{j=1}^{m} are then added to M̂ (Line 4). After the set of estimated modes has been returned, edges are added to the graph Ĝ(X, Ê) from each non-modal point x to b(x) (Lines 5–7). The estimated mode together with all the vertices that have paths terminating at it form a cluster that is added to Ĉ (Lines 8–11). Proceeding in this way, each sample point will be assigned to a unique cluster.
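The whole pipeline above (Definitions 1–3 plus the assignment rule of Algorithm 1) fits in a short script. The sketch below is illustrative only, not the authors' implementation; the function names are ours, and the arguments `l` and `tau` play the roles of the thresholds l and τ, which Algorithm 1 assumes are chosen by the user from the decision graph:

```python
import numpy as np
from math import gamma as gamma_fn, pi

def knn_density(X, k):
    """Definition 1: f_k(x) = k / (n * v_p * r_k(x)^p)."""
    n, p = X.shape
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    r_k = np.sort(dist, axis=1)[:, k]          # column 0 is the point itself
    v_p = pi ** (p / 2) / gamma_fn(p / 2 + 1)  # volume of the unit ball in R^p
    return k / (n * v_p * r_k ** p), dist

def density_peaks(X, k, l, tau):
    """Algorithm 1: modes are points with f_k(x) >= l and omega(x) >= tau."""
    f, dist = knn_density(X, k)
    n = len(X)
    omega = np.empty(n)
    parent = np.arange(n)
    for i in range(n):
        higher = np.where(f > f[i])[0]
        if len(higher) == 0:
            omega[i] = dist[i].max()           # Definition 2: the densest point
            continue
        j = higher[np.argmin(dist[i, higher])]  # b(x)
        omega[i], parent[i] = dist[i, j], j
    modes = np.where((f >= l) & (omega >= tau))[0]
    parent[modes] = modes                      # modes are roots of the graph
    labels = np.empty(n, dtype=int)
    for i in range(n):  # each point inherits the mode its path terminates at
        r = i
        while parent[r] != r:
            r = parent[r]
        labels[i] = r
    return labels
```

The peak-finding criterion γ(x) of Definition 3 is then simply `f * omega`, and the decision graph is a scatter plot of `f` against `omega`.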
B. Theoretical Analysis

The quality of the clusterings provided by DPC has been thoroughly demonstrated in practice, as discussed in Section I. Yet, no previous work has provided guarantees on the ability of DPC to recover modes consistently. Through drawing an analogy to the quick shift method, we show that DPC can recover the modes and the associated cluster assignments with strong consistency guarantees.

Quick shift, as described in Section I, is a fast non-parametric density-based method that produces clusterings with kernel density estimates. A directed graph is built with the observed instances as vertices, and edges added from each instance to its nearest neighbor of higher estimated density. The final clusters are extracted as the connected components of the graph once edges with length longer than a segmentation parameter τ are removed. The formulation of DPC introduced above is similar to quick shift in all but two ways: 1) a k-NN estimate of the density is used in place of a kernel density estimate, and 2) a second threshold value l is defined, used to flag low-density instances as outliers. As such, in this section we present the main results adapted from the consistency analysis of the k-NN density estimator [27] and the consistency analysis of quick shift [9]. The primary contributions involve drawing the analogy to the quick shift approach, and the extension of the analysis to include the density threshold l used in the mode selection step of DPC. An extended analysis, including proofs of the theorems, is given in the supplementary material, available online.

We assume that f is α-Hölder continuous and lower bounded on X. Furthermore, it is assumed that the level sets of f are continuous with respect to the density level, and the modes of

Fig. 3. Illustration of the (r, θ, ν)+-modes and (r, θ, ν)−-modes of Definition 4.

Fig. 4. Illustrative example of the (r, δ)-interior of an attraction region Ax∗, denoted A_{x∗}^{(r,δ)}, associated with a mode x∗.

f have negative definite Hessian. Following [9], we now define a stronger notion of mode that allows for a clearer analysis of DPC.

Definition 4: A mode x∗ ∈ M is an (r, θ, ν)+-mode if f(x∗) > f(x′) + θ for all x′ ∈ B(x∗, r)\B(x∗, rM) and f(x∗) > ν + θ. A mode x∗ ∈ M is an (r, θ, ν)−-mode if f(x∗) < f(x′) − θ for some x′ ∈ B(x∗, r) and f(x∗) > ν + θ. Let M+_{r,θ,ν} ⊆ M denote the set of (r, θ, ν)+-modes of f.

An illustration of the (r, θ, ν)+-modes and the (r, θ, ν)−-modes is given in Fig. 3. Recall that the DPC algorithm requires two cutting-off thresholds, one for cutting the value of the density estimate f̂k(x) and the other for cutting the distance to the nearest neighbor of higher estimated density, ω(x). Taking the thresholds as l and τ for the density and distance values respectively, our first theorem shows that M̂ contains unique and consistent estimates of the (τ + ε, θ, l)+-modes of f, for θ, ε > 0.

Theorem 1 (Mode Estimation - adapted from Theorem 2 of [9]): For every x∗ ∈ M+_{τ+ε,θ,l} \ M−_{τ−ε,θ,l}, with probability at least 1 − ζ, there exists x̂ ∈ M̂ satisfying

‖x̂ − x∗‖ ≤ C · f(x∗) · k^{−1/4},

where C is a constant depending on p, n, ζ, and f.

Theorem 1 proves that DPC recovers the modes of an α-Hölder continuous density f consistently. For n large enough, with high probability, M̂ contains unique estimates for all the true modes of f. As such, there is an injection between the set of true modes and the set of estimated modes.

The procedure used to assign points to their respective modes is the same as that used in quick shift. As such, theoretical guarantees developed for a variant of quick shift in [20] can be applied directly to DPC. We provide the relevant results below. First, we define the attraction region of a mode. The attraction region of a particular mode covers all points that flow towards the mode along the direction of the gradient of the underlying density.

Definition 5: Let the path νx : R → Rp satisfy νx(0) = x and ν′x(t) = ∇f(νx(t)). For a mode x∗ ∈ M, its attraction region Ax∗ is the set of points x ∈ X that satisfy lim_{t→∞} νx(t) = x∗.

It is shown that DPC can cluster sample points in the (r, δ)-interior of an attraction region. The parameters r > 0 and δ > 0 hold simultaneously across all modes of the density and can be chosen arbitrarily small.

Definition 6: The (r, δ)-interior of an attraction region Ax∗, denoted A_{x∗}^{(r,δ)}, is the set of points x1 ∈ Ax∗ such that a path P from x1 to any point x2 ∈ ∂Ax∗ satisfies

sup_{x∈P} inf_{x′∈B(x,r)} f(x′) ≥ sup_{x′∈B(x2,r)} f(x′) + δ.

Points in the interior of an attraction region must satisfy the property that any path leaving the attraction region must significantly decrease in density at some point. An illustrative example is given in Fig. 4.

The main result (Theorem 2) states that, as long as the modes are well-estimated, the assignment method of DPC will correctly cluster the (r, δ)-interiors of the attraction regions with high probability. Suppose that x∗ ∈ M is estimated by x̂ ∈ M̂ such that ‖x̂ − x∗‖ ≤ r. Then, with high probability, for x ∈ Ax∗ ∩ X, density peaks clustering clusters x to the cluster belonging to x∗.

Theorem 2 (Cluster Assignment - adapted from Theorem 2 of [20]): Suppose that x∗ ∈ M is estimated by x̂ ∈ M̂ such that ‖x̂ − x∗‖ ≤ r. Then, for n sufficiently large, depending on f, δ, ζ and r, with high probability, for any x ∈ A_{x∗}^{(r,δ)} ∩ X, DPC clusters x to the cluster belonging to x∗.

C. Limitations

The theoretical analysis of Section III-B is based on the assumption of a sample size large enough that the error of the density estimator can be bounded. In this section, we provide an analysis of the density peaks clustering algorithm through three illustrative datasets, with n = 1500, from the scikit-learn clustering demonstration.¹ Taken together, the three datasets provide an understanding of the density peaks clustering algorithm and the type of clusters it returns; see Fig. 5.

First, the density estimation appears to recover the population density for each dataset. The second feature of the density peaks clustering algorithm analyzed is the decision graph, provided to enable the estimation of the modes from the dataset.

¹ https://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html
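The accuracy of the k-NN density estimator, on which the analysis above rests, is easy to probe numerically. The snippet below is an illustration of ours, not part of the paper's analysis: it estimates a standard normal density at the origin, where the true value is 1/√(2π) ≈ 0.399, in the regime where k grows with n while k/n → 0.

```python
import numpy as np

def knn_density_1d(samples, x, k):
    """f_k(x) = k / (n * v_1 * r_k(x)), with v_1 = 2 since the unit ball in R is [-1, 1]."""
    r_k = np.sort(np.abs(samples - x))[k - 1]  # distance to the k-th nearest sample
    return k / (len(samples) * 2.0 * r_k)

rng = np.random.default_rng(0)
n = 100_000
samples = rng.standard_normal(n)
est = knn_density_1d(samples, 0.0, k=1_000)
```

With these values the estimate typically lands within a few percent of 1/√(2π), and the error shrinks as k and n grow together in the stated regime.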


The method of selecting mode estimates from the decision graph is seen to perform well when the density of the cluster is concentrated near the mode and decays as the distance from the mode increases, such as for the Unequal Variance Gaussian dataset. Each of the remaining datasets contains areas of relatively uniform density. This poses challenges for the density peaks clustering method, as noise in the density estimate leads to erroneous modes being selected. Finally, the assignment method of density peaks clustering is assessed. The assignment strategy is shown to perform well for the Unequal Variance Gaussian dataset. The allocation of instances to clusters for the Noisy Circles dataset runs contrary to geometric intuition about the clusters. In this case, the allocation assigns instances to clusters across areas of very low density in the dataset.

Fig. 5. Density peaks clustering of illustrative datasets. The k-NN density estimator is used here with k = 40. Left: Density estimates for the dataset. Darker regions indicate higher density. Center: The decision plot, with thresholds set to estimate approximately the correct number of clusters. Right: The final clustering assignment. The color of the shaded regions indicates the attraction region for each cluster.

In sum, the density peaks clustering framework performs well for datasets containing clusters with clear point modes around which the density decays, such as Gaussian components. The framework struggles when the high-density regions of the data are relatively uniform. In this case, both the mode selection method and the assignment strategy are shown to be susceptible to errors caused by noise in the density estimate.

IV. THE PROPOSED CPF ALGORITHM

In this section, the improvements to the density peaks clustering method that constitute the CPF algorithm are introduced, together with a detailed analysis of the clustering algorithm.

A. Peak-Finding on Level Sets

In this section, we explain the component set notation and the peak-finding criterion. We denote the mutual k-NN graph by G(X, E). The structure of the mutual k-NN graph can help detect outlier data points in X. In particular, for x ∈ X, x is an outlier if its vertex in the graph G(X, E) has very few or no edges. We denote the set of outliers by O.

Definition 7: For every x ∈ Rp, let rk(x) denote the distance from x to its k-th nearest neighbor in X, as before. The mutual k-NN graph G(X, E) consists of the vertex set X and the edge set E. There is an edge between two vertices xi and xj, denoted by {xi, xj} ∈ E, if and only if ‖xi − xj‖ ≤ min(rk(xi), rk(xj)). That is, an edge exists between the vertices xi and xj only if they are a k-nearest neighbor of each other.

Next, we formalize the notation of connected components of the mutual k-NN graph G(X, E), beginning with the definition of connectedness.

Definition 8: A path of length m from xi to xj, denoted by {{xi, v1}, {v1, v2}, . . . , {vm−1, xj}}, is a sequence of distinct edges in E, starting at vertex v0 = xi and ending at vertex vm = xj, such that {vr−1, vr} ∈ E for all r = 1, . . . , m. We say that the two data points xi and xj are connected if there is a path from xi to xj in the graph G(X, E).

The definition of connected components and component sets follows.

Definition 9: A connected component of G(X, E), denoted by G(S, E(S)), is a subgraph of G(X, E) where any two vertices in S are connected to each other by paths, and the edge set induced by S is a subset of E: E(S) = {{xi, xj} ∈ E : xi ∈ S, xj ∈ S}. The vertex set S of the component graph G(S, E(S)) is a subset of X and is here termed a component set of X.

From the definition of a component, we know that the connected components of G(X, E) reveal certain underlying patterns of the data. In particular, the data X can be partitioned into disjoint component sets. Here, we denote the set of component sets by S = {S1, . . . , SnS}, where nS = |S| is the number of component sets, and S1 ∪ · · · ∪ SnS = X.

Theoretical results regarding the ability of connected components of G(X, E) to estimate the level sets of f are given in [2], [3]. If two points belong to two different component sets, it is highly likely that they are separated by a region of low density.
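Definitions 7–9 translate directly into code. The following is an illustrative sketch of ours (pure NumPy plus a hand-rolled union-find), not the authors' implementation:

```python
import numpy as np

def component_sets(X, k):
    """Partition the data into the component sets of the mutual k-NN graph."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    r_k = np.sort(dist, axis=1)[:, k]  # distance to the k-th nearest neighbor
    parent = list(range(n))
    def find(i):                       # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            # Definition 7: mutual k-NN edge iff ||xi - xj|| <= min(r_k(xi), r_k(xj))
            if dist[i, j] <= min(r_k[i], r_k[j]):
                parent[find(i)] = find(j)
    comps = {}
    for i in range(n):
        comps.setdefault(find(i), []).append(i)
    return list(comps.values())
```

Two well-separated groups of points yield two component sets, while isolated vertices (the outlier set O) show up as singleton components.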
B. Modelling High-Density Regions

We now explain the mode selection mechanism. The definitions for the peak-finding technique used are the same as those given in Section III-A. The definitions below are given in terms of one S ∈ S, and are equivalent for each.

Data points in S are placed in descending order of the peak-finding criterion, and the modal-set associated with the instance having the maximal value of the peak-finding criterion is automatically accepted. To decide whether or not to select modal-sets associated with the subsequent instances, we here utilize an idea similar to the methods of [21], [28], [29]. A candidate modal-set


M̂ associated with an instance x∗ is accepted only when it is well separated from the others.

Fig. 6. Illustration of stages of the proposed CPF algorithm.

Definition 10: Let 0 < ρ < 1. For an instance x∗ ∈ S, define a graph G(Vx∗, E(Vx∗)) with

Vx∗ = {x ∈ S : rk(x) < ρ^{−1/p} rk(x∗)}.

The estimated modal-set M̂ is the connected component of the graph G(Vx∗, E(Vx∗)) containing the vertex x∗. M̂ is accepted only if it does not intersect any previously selected modal-set.

Note that the k-th nearest neighbour of x∗ in the distance rk(x∗) is a point from the component set S, not from the original dataset X. This approach allows the graph to better reflect the scale of the data contained in the component set. The component sets obtained from the graph G(Vx∗, E(Vx∗)) are assessed, and if the component set containing x∗, i.e., M̂, does not intersect previously selected candidate modal-sets, then M̂ is accepted.

Varying the parameter ρ determines the number of clusters for each component set S. The threshold relates directly to the estimated density for each of the instances. For example, if f̂k(x1) < ρ f̂k(x2), then rk(x1) > ρ^{−1/p} rk(x2). For low values of ρ, fewer vertices will be removed, and it is less likely that a proposed modal-set will be disconnected from existing ones. For larger values of ρ, more vertices and their edges will be removed from the graph, and the probability of the proposed modal-set being disconnected will increase. It is not required to have different ρ values for different component sets, because the threshold ρ^{−1/p} rk(x∗) adapts naturally to the density level of the component set being assessed. It is seen that modal-sets associated with spurious modes of the density estimate f̂k will not be accepted by CPF, as the modal-sets are not disconnected from previously accepted modal-sets.

C. The CPF Algorithm

The CPF algorithm is explained with reference to Algorithm 2 and the illustrative example in Fig. 6.

Algorithm 2: The Component-Wise Peak-Finding Algorithm.
Input: Neighborhood parameter k, fluctuation parameter ρ.
Initialisation: Ĉ = ∅.
Output: A set of clusters Ĉ.
1: Compute G(X, E), the mutual k-nearest neighbor graph.
2: Extract S, the set of component sets, from G(X, E).
3: for each S ∈ S do
4:   Sort the x's according to their γ values.
5:   Let x∗ = argmax_{x∈S} γ(x).
6:   Let Vx∗ = {x ∈ S : rk(x) < rk(x∗)/ρ^{1/p}}.
7:   Let M̂ ⊆ S be the component set of the graph G(Vx∗, E(Vx∗)) containing x∗.
8:   Initialise M = {M̂}, the set of true modal-sets in S.
9:   loop
10:    Let x∗ = argmax_{x∈S} {γ(x) : x ∉ M}.
11:    Let Vx∗ = {x ∈ S : rk(x) < rk(x∗)/ρ^{1/p}}.
12:    Let M̂ ⊆ S be the component set of the graph G(Vx∗, E(Vx∗)) containing x∗.
13:    if M̂ ∩ M = ∅ then
14:      Add x∗ to M.
15:    end if
16:   end loop
17:   Initialise Ĝ(S, Ê), a directed graph with S as vertices and no edges, Ê = ∅.
18:   for each x in S\M do
19:     Add a directed edge from x to b(x).
20:   end for
21:   for each cluster center x ∈ M do
22:     Let C be the collection of the points connected by any directed path in Ĝ(S, Ê) that terminates at x.
23:     Add C ∪ x to Ĉ.
24:   end for
25: end for
26: return Ĉ
The algorithm takes as input the dataset X and uses parameters k and ρ to return the final set of clusters Ĉ. Initially, the set of estimated clusters is Ĉ = ∅. The undirected mutual k-nearest neighbor graph G(X, E) is constructed. Vertices that have few to no edges are marked as outliers and removed. The remaining data is partitioned into disjoint component sets according to the graph G(X, E), yielding S = {S_1, . . . , S_nS} (Lines 1–2). In Fig. 6(a), two components are extracted, yielding S = {S_1, S_2}.

For each component set S ∈ S, CPF computes the peak-finding criterion for each point and selects the instance x∗ with maximal value (Lines 4–5). In Fig. 6(b), the higher estimated

Authorized licensed use limited to: Xiamen University. Downloaded on April 23,2024 at 13:32:44 UTC from IEEE Xplore. Restrictions apply.
TOBIN AND ZHANG: THEORETICAL ANALYSIS OF DENSITY PEAKS CLUSTERING AND THE COMPONENT-WISE PEAK-FINDING ALGORITHM 1115

density of the instances is represented by darker colors, and the magnitude of the peak-finding criterion for each instance is represented using the size of the points.

The subgraph G(V_x∗, E(V_x∗)) is extracted, and the component set of G(V_x∗, E(V_x∗)) containing x∗ is denoted by M̃. The modal-set M̃ is automatically accepted, and the set of true modal-sets for the component set S is initialised as M̂ = {M̃} (Lines 6–8). Following the computation of the point x∗ for each component, the purple and green modal-sets are automatically selected in Fig. 6(c).

Fig. 7. Analysis of the proportion of instances that do not have a point of higher density in their k nearest neighbors. Data was simulated from a mixture of Gaussian components with n = 40000, with the number of components and all component parameters chosen randomly to ensure variety. CPF was run with k = 100. The points in black are (|S|, p) for a given component set, with the green dashed line showing the function 0.2 log(|S|)/|S|.

Next, the instance with maximal value of the peak-finding criterion yet to be assessed is selected and denoted by x∗. The subgraph G(V_x∗, E(V_x∗)) is extracted, and the component set of G(V_x∗, E(V_x∗)) containing x∗ is denoted by M̃ (Lines
10–12). If M̃ is disjoint from all selected modal-sets in M̂, then M̃ is added to M̂ (Lines 13–14). For the top component set in Fig. 6(c), no further modal-sets are detected. For the bottom component set, a second modal-set, in yellow, is detected.

Once the center-selection loop is complete, non-center points are allocated to their clusters. For each non-center point x, a directed edge is added from x to b(x), its nearest neighbor of higher density (Lines 17–20). All vertices that have paths terminating at the same cluster center are assigned to the same cluster, and the cluster is subsequently added to Ĉ (Lines 21–23). The process is repeated for each component set to return the final set of clusters Ĉ. The clusters corresponding to each modal-set are shown in Fig. 6(d). Furthermore, a sample assignment path for an instance in the purple cluster is shown in gold.

V. ANALYSIS OF CPF

A. Theoretical Analysis

In this section, we show that CPF extends the theoretical guarantees available to the DPC method in Section III-B. We demonstrate that CPF can, with high probability, estimate each modal-set of the underlying probability density bijectively.

The notion of modal-sets can also be understood as a method for pruning spurious estimates from the set of estimated modes, in a similar way to the method of [30]. There, the authors prune spurious modes arising due to sampling variability by assessing the level sets at nearby levels of the density. Using nearest neighbor graphs, [27] translates this framework for mode detection, showing that the pruning method allows for bijective estimation of the true modes with density above a certain level. The analogy to modal-sets is easily drawn. The CPF procedure will only retain an estimated modal-set, say M̃, if it is contained in a separate component set of the graph. The correspondence allows for the following result, given previously in [20], stating that the modal-set estimates returned by CPF estimate the modal-sets of f bijectively and consistently.

Theorem 3 (Modal-Set Estimation, adapted from Theorem 1 of [20]): Let 0 < ρ < 1 and ε, ζ > 0. Let M_1, . . . , M_m be the modal-sets of f. The following holds with probability 1 − ζ. For n sufficiently large depending on f, ε, ζ, and ρ, CPF returns m modal-set estimates M̂_1, . . . , M̂_m such that M_i ∩ X ⊆ M̂_i ⊆ M_i + B(0, ε) for i = 1, . . . , m.

The result proving the quality of the cluster assignment of Section III-B can also be applied to each component set, with suitable adjustments made to the number of observations in each component set.

B. Complexity Analysis

The most computation-intensive task is creating the mutual k-NN graph, which requires O(nk log(n)) operations on average. The connected components are extracted with O(n) operations. Another major computational burden is finding, for each point, its nearest neighbor of higher density in a component set. For the points which do not have a point of higher density among their neighbors, this requires O(|S|) operations, where |S| is the number of instances in the component set. Experimental results for the proportion of instances without a point of higher density in their neighbors are presented in Fig. 7. The green line in the figure is 0.2 log(|S|)/|S|. As the proportion of such instances present in S appears of order O(log(|S|)/|S|), nearest neighbors of higher density are found in O(|S| log(|S|)) time. Assessing each cluster center requires O(|S|k) operations. The assignment mechanism requires O(|S|) operations. As such, we see that the complexity of CPF is O(nk log(n)), near-linear in n and k.

C. Limitations

While the CPF algorithm remedies the mode estimation and assignment issues of the DPC algorithm, potential limitations of the method exist. For simplicity, CPF takes as input only one neighborhood parameter k, used to compute both the mutual k-nearest neighbor graph and the density estimate f̂_k. For datasets with a small number of instances, the optimal value of k for these two tasks often differs: too small a k leads to oversegmentation of the data, while too large a k causes oversmoothing of the density estimate and poor detection of the modes. This issue would be compounded if the data contained both low- and high-density clusters. In such a case, it is possible to define k1 and k2 for graph estimation and density estimation respectively.

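The two structures at the heart of the complexity analysis, the mutual k-NN graph and the k-NN density estimate, can be sketched with a brute-force distance matrix as below. The helper names are ours; a KD-tree-based neighbor search would attain the O(nk log n) average cost cited above, and v_2 = π is the volume of the unit ball for p = 2.

```python
import numpy as np

def knn_info(X, k):
    """Brute-force k-NN: radii r_k(x) and neighbour index lists (self excluded)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, 1:k + 1]   # column 0 is the point itself
    r_k = np.sort(d, axis=1)[:, k]
    return r_k, idx

def mutual_knn_edges(idx):
    """Undirected mutual k-NN graph: keep {i, j} only if each lists the other."""
    sets = [set(row) for row in idx]
    return {(i, j) for i, row in enumerate(idx) for j in row
            if j > i and i in sets[j]}

X = np.random.default_rng(0).normal(size=(100, 2))
n, p = X.shape
r_k, idx = knn_info(X, k=10)
f_hat = 10 / (n * np.pi * r_k ** p)   # k-NN density estimate, v_2 = pi
edges = mutual_knn_edges(idx)
```

Using two neighborhood sizes, one for `mutual_knn_edges` and one for `f_hat`, is exactly the k1/k2 remedy suggested for the limitation discussed above.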

VI. EXPERIMENTS
A. Experimental Set-Up
Code implementing CPF and code to reproduce the below experiments are available online.2
Machine Configuration: All experiments have been con-
ducted on a PC running Debian 10 (Buster), consisting of 24
cores and 24 GB of RAM.
Evaluation Criteria: To evaluate the clusterings produced
we use the Adjusted Rand Index (ARI) [31] and the Adjusted
Mutual Information (AMI) [32]. For both metrics, a larger value
indicates a higher-quality clustering.
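As a concrete illustration of the first metric, the ARI [31] can be computed from the pair-counting contingency table. This is our own minimal plain-Python version, not the evaluation code used in the experiments:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """ARI: 1.0 for a perfect match (up to relabelling), near 0 for random labels."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))        # contingency table
    rows, cols = Counter(labels_true), Counter(labels_pred)
    index = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    return (index - expected) / (max_index - expected)

print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # → 1.0
```

The "adjusted" in both ARI and AMI refers to this correction by the expected value under random labelling, which is why both metrics can take negative values.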
Comparison Methods: We compare with the following state-
of-the-art clustering algorithms:
• Density Peaks Clustering (DPC) method with k-NN density estimator explained in Section III. Implementation: Python. Input Parameter: k - neighbors.
• The original DPC (ODP) method of [10]: Implementation: R. Input Parameter: dc - threshold distance.
• Density Peaks Advanced Clustering (DPA) [18]: Implementation: Python and C++. Input Parameter: z - peak significance parameter.
• Adaptive Density Peaks Clustering (ADP) [16]: Implementation: R. Input Parameter: h - bandwidth.
• Comparative Density Peaks Clustering (CDP) [33]: Implementation: Matlab. Input Parameter: dc - threshold distance.
• DBSCAN (DBS) [34]: Implementation: Python and C++. Input Parameter: eps - threshold distance.
• HDBSCAN (HDB) [35]: Implementation: Python and C++. Input Parameter: minPts - minimum cluster size.
• Mean Shift (MNS) [6], [7]: Implementation: Python and C++. Input Parameter: h - bandwidth.
• Quick Shift (QKS) [8]: Implementation: Python. Input Parameter: h - bandwidth.
• K-Means++ (KMS) [36]: Implementation: Python and C++. Input Parameter: k - cluster number.

The distance-based parameters of ODP, ADP, CDP, DBS, MNS, and QKS were set to fractions of the average standard deviation of the data in each direction. The neighborhood parameters of CPF, DPC, and HDB were set in the range of log n to √n. The parameter for KMS was set in a range of the true number of clusters, and the peak significance parameter for DPA was set from 1.0 to 4.0. DPC, ODP, ADP, and CDP require the number of clusters to be specified in advance. For all experiments, the true number of clusters is provided as an input to these algorithms.

Fig. 8. Results of the clustering algorithms on synthetic datasets. The ARI and the AMI for each clustering is given in the lower left corner.

TABLE I
CHARACTERISTICS OF THE REAL-WORLD DATASETS

B. Simulated Datasets

A qualitative comparison of the clustering methods is provided by applying them to four synthetic illustrative datasets. For brevity, we restrict the number of comparison methods to a peak-finding approach (DPC), a level-set approach (DBS), a mode-seeking method (MNS), and the proposed approach (CPF). We present the clustering with the highest combined value of the ARI and AMI across the range of parameter values assessed. The results are presented in Fig. 8, where different colours indicate different clusters. The datasets are henceforth referred to as Unequal Variance, Noisy Circles, Noisy Moons, and Large m, following Fig. 5. Considering the Unequal Variance dataset, the mode-seeking methods are seen to extract the correct cluster structure, while DBS fails to detect clusters at different densities. The DBS method performs well for the Noisy Moons dataset, as the level-set approach can detect clusters that are separated by regions of low density. DPC and MNS are seen to select multiple modes for the high-density cluster of the Noisy Circles dataset. CPF is seen to exactly recover the cluster structure for each dataset, combining the benefits of level-set and mode-seeking methods.

C. Real-World Datasets

We assess CPF on a pool of ten real-world datasets. Details of the datasets can be found in Table I. Instances with missing values are removed before clustering. Results are presented in Table II. For each method we present the clustering with the

2 https://github.com/tobinjo96/CPFcluster (Github repository)


TABLE II
QUALITY OF THE CLUSTERINGS FOR THE REAL-WORLD DATASETS

highest combined value of the ARI and AMI across the range of parameter values assessed. CPF achieves the best clustering, in terms of the ARI and AMI, for six of the datasets assessed, significantly outperforming all of the competitor methods. Also presented are the mean rankings for the quality of the clusterings returned by each of the methods for both metrics. Here, CPF is seen to have the best performance overall, indicating that the clustering results are generally of high quality.

In terms of the ARI, the methods with the three next highest ranks are KMS, ODP, and CDP. In terms of the AMI, the DPC method, as formulated in Section III-A, is also among the best performing approaches. Taken together, this makes a strong case for the ability of the peak-finding criterion to detect meaningful clusters in the data.

Considering the competitor approaches that determine the number of clusters automatically, the performance is significantly worse than CPF. The peak-finding method DPA exhibits inconsistent quality, achieving the best results for the Seeds dataset but not detecting meaningful clusterings for the Ecoli, Glass, and Vertebral datasets. The level-set methods DBS and HDB perform poorly. The poor performance in both metrics indicates that these methods fail to capture the classes present in the data. MNS achieves the optimal clustering for two datasets, Ecoli and Page Blocks, but does not regularly return high-quality clusterings. QKS also does not return high-quality clusterings, particularly when assessed using the ARI. As the ARI significantly penalizes false-positive clusters, it can again be concluded that quick shift is not adequately detecting the true number of clusters in the data. Considering the significant similarities between the methodology of QKS and that of the DPC methods, the poor results are likely the result of difficulty in finding the optimal value of the parameter h.

Following the guidance given in [39], the results are also subjected to a statistical analysis using non-parametric tests. We apply the Wilcoxon signed-rank test for pairwise comparisons, using the Benjamini-Hochberg correction to control the false-discovery rate [40], [41]. The p-values for the associated comparisons are shown in Table III. The results indicate a strong level of statistical significance for the improved clustering quality of the CPF method. CPF significantly outperforms all but one of the methods assessed and is not outperformed by any of the competitor approaches.

TABLE III
P-VALUES FOR WILCOXON SIGNED-RANK TESTS WITH THE BENJAMINI-HOCHBERG CORRECTION, COMPARING THE ARI AND AMI VALUES OF CPF WITH THE COMPETITOR METHODS

The average run time, in seconds, for each method is presented in Table IV. For small datasets, DBS and HDB achieve the fastest run times; however, the magnitude of the difference with CPF is unlikely to hinder its use in applications. This reflects their implementation in C++. For larger datasets, CPF remains competitive with the fastest methods and achieves near the fastest run time for Letter Recognition, the dataset with the largest number of instances assessed. Further context is provided in Table V, detailing the computational complexity of the algorithms. It is concluded that CPF, as well as achieving high-quality clusterings, gracefully scales to larger datasets.

D. Analysis of the Parameter Space

CPF achieves superb results across the datasets when optimal values for the parameters are applied. This performance is exhibited across datasets of all sizes, with optimal results achieved for the datasets with the fewest and the most samples, and for datasets with low and high numbers of dimensions. The consistency of the performance of the approach is now demonstrated for a wide range of parameter values. CPF has two parameters: 1) k, the number of neighbors computed for each point when constructing the k-NN graph and computing the k-NN density estimator, and 2) ρ, the amount of variation in the density used to assess potential cluster centers. The parameters of the competitor methods are detailed in Section VI-A. In Fig. 9 we present the clustering quality in terms of the ARI and the AMI over a broad range of parameter values, for four datasets, with the remainder included in the supplementary material, available online.
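The Benjamini-Hochberg step-up adjustment applied to the pairwise test results can be sketched as follows. This is our own minimal implementation of the correction of [41], not the analysis script used for Table III:

```python
def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values (q-values) controlling the false-discovery rate."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # enforce monotonicity from the largest p down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # ≈ [0.02, 0.04, 0.04, 0.02]
```

A hypothesis is rejected at FDR level α whenever its adjusted p-value is at most α.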


TABLE IV
AVERAGE RUN TIME FOR THE REAL-WORLD DATASETS

Fig. 9. For each dataset and clustering algorithm, we show the clustering quality as a function of the input parameters. The ARI is shown in blue, and the AMI
in pink. Note that for CPF, we present the clustering quality as a function of both k and ρ.
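A parameter sweep of the kind summarised in Fig. 9 can be scripted as below. `cluster_fn` and `score_fn` are placeholders for any clustering routine and quality metric; the function names are ours and are not part of the released code:

```python
import itertools
import math

def sweep(cluster_fn, score_fn, X, y, ks, rhos):
    """Return the best (score, (k, rho)) pair over a grid of parameter values."""
    best = (-math.inf, None)
    for k, rho in itertools.product(ks, rhos):
        labels = cluster_fn(X, k=k, rho=rho)
        best = max(best, (score_fn(y, labels), (k, rho)))
    return best

# Toy stand-ins: a threshold "clustering" scored by label agreement.
best_score, best_params = sweep(
    cluster_fn=lambda X, k, rho: [int(x > k) for x in X],
    score_fn=lambda y, labels: sum(a == b for a, b in zip(y, labels)),
    X=[1, 2, 3, 4], y=[0, 0, 1, 1], ks=[1, 2, 3], rhos=[0.5],
)
```

In practice the score would be the ARI or AMI, and the grid for k would follow the range recommended in the text.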

TABLE V
COMPUTATIONAL COMPLEXITY FOR THE COMPETITOR ALGORITHMS

CPF is relatively robust to the choice of k and ρ for all the datasets apart from the Vertebral dataset, for which the choice of k appears important to the clustering quality. The results indicate that, for general application, it is recommended to assess k = 0.9√n. This value is near the optimum for all of the datasets apart from the Letter Recognition dataset. The quality of the clusterings remains consistent as the variation parameter used to assess potential cluster centers varies from ρ = 0.1 to ρ = 0.9. For general application, it is recommended to first assess ρ = 0.6, as competitive results are achieved for all datasets except Page Blocks. The performance of CPF for values of the parameter ρ is not affected by the number of samples in the data. Users can intuitively tune the parameter ρ to obtain alternate clusterings, increasing ρ if more clusters are desired and decreasing ρ if fewer clusters are desired. Considering the competitor methods, it is noted that ADP, CDP, DPA, and QKS also achieve consistent results as the values of their respective parameters increase. Each of these methods, as well as CPF, allocates instances to the same cluster as their nearest neighbor of higher local density. An additional benefit of CPF is that the parameters do not depend on the scale of the data. This is illustrated by the large range of k, relative to the size of the datasets, for which CPF achieves excellent results.

VII. MULTI-IMAGE MATCHING WITH CPF

In this section, we introduce an adapted version of CPF for multi-image matching. Multi-image matching is an important application in computer vision, notably in the reconstruction of 3-D scenes from 2-D images. We can consider the problem as extending clustering from an unsupervised task to a semi-supervised task. For multi-image matching, the only supervision information provided is the images from which each point is created. No two instances from the same image can be grouped together in the final clustering.

Quick shift forms the basis of the first successful application of density-based clustering to the problem of multi-image matching. QuickMatch [42] modifies quick shift by moving a point to its nearest neighbor with higher empirical density, only if the neighbor does not belong to an image already contained in the cluster. We adapt the CPF method introduced in Algorithm 2 to accommodate supervision information. Denote the image label of an instance x by I(x) ∈ {1, . . . , n_I}, where n_I is the number of images assessed. As such, we present CPF-Match by updating the allocation phase of Algorithm 2, substituting Lines 19–22 with Algorithm 3. CPF-Match modifies the allocation procedure of


CPF, while component sets and cluster centers are selected in the same way.

Fig. 10. One pair of images from the Bikes and Boat image groups. Lines between each pair of images indicate a match detected by CPF-Match.

Fig. 11. Performance curves for the CPF-Match (black) and QuickMatch (blue) multi-image matching methods. For all datasets, k = 10, ρ = 0.5, and the threshold parameter of QuickMatch is set to 4.

Algorithm 3: CPF-Match.
17: Initialise G(S, Ẽ), a directed graph with S as vertices and no edges, Ẽ = ∅.
18: Sort the vertices x ∈ S\M̂ in ascending order of the distance from x to b(x).
19: for each x do
20:   if I(x) ≠ I(b(x)) then
21:     Add a directed edge from x to b(x).
22:   end if
23: end for

A directed graph G(S, Ẽ) is initialized as before (Line 17). Next, CPF-Match sorts the points of S not in modal-sets according to the distance ‖x − b(x)‖, from smallest to largest (Line 18). Processing the non-center points in turn, a directed edge from x to b(x) is added if x and b(x) are not from the same image, i.e., I(x) ≠ I(b(x)) (Lines 19–23).

To demonstrate the ability of CPF-Match to perform multi-image matching, we apply it to the Graffiti dataset.3 The dataset contains six image groups (bark, bikes, boat, graffiti, Leuven, and UBC), each containing six different images of the same scene. Features are extracted from each image using SIFT, roughly 500 for each image [43]. Examples for a pair of images from two of the six image groups are presented in Fig. 10.

For evaluation, we apply the same approach as in [42]. For a test point in an image, we calculate the distance between its estimated correspondence and the true correspondence in another image. If the distance is smaller than a threshold, we consider the match to be correct. We plot the percentage of testing points with correct matches versus the threshold values to obtain a curve which can be interpreted in a manner similar to a precision-recall curve. As homography matrices are provided relating the first image with the remaining images in each group, we use all detected feature points in the first image as test points and evaluate the matches in the other five images. The performance curves for CPF-Match and QuickMatch for each of the six datasets are presented in Fig. 11. CPF-Match achieves superior results compared with QuickMatch for each of the datasets. The improvements are notable for the Bikes, Boat, and Leuven image sets. CPF-Match is a viable and effective method for the multi-image matching problem.

VIII. CONCLUSION AND FUTURE WORK

In this work, we provided the first theoretical analysis of the popular DPC algorithm. DPC was proven to consistently estimate the modes of the underlying density, and to correctly assign instances to clusters with high probability. We also demonstrated issues with the DPC framework. This analysis motivated a new clustering technique, the CPF algorithm. CPF combines the benefits of both density-level-set and mode-seeking density-based clustering methods. CPF offers extended theoretical guarantees, compared to DPC, and exhibits improved clustering performance on a range of synthetic and real-world datasets. Finally, we introduced CPF-Match, an adaptation of CPF for an important semi-supervised computer vision application. In future, we envisage the extension of CPF and CPF-Match to incorporate other forms of supervision, including geometric information for the multi-image matching problem, using node-attributed mutual k-NN graphs.

ACKNOWLEDGMENT

For the purpose of Open Access, the authors have applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.

3 https://cvssp.org/featurespace/web/related_papers/graffiti.html

REFERENCES

[1] J. A. Hartigan, Clustering Algorithms. Hoboken, NJ, USA: Wiley, 1975.
[2] M. Maier, M. Hein, and U. von Luxburg, "Optimal construction of k-nearest-neighbor graphs for identifying noisy clusters," Theor. Comput. Sci., vol. 410, no. 19, pp. 1749–1764, 2009.
[3] S. Kpotufe and U. von Luxburg, "Pruning nearest neighbor cluster trees," May 2011, arXiv:1105.0540. [Online]. Available: http://arxiv.org/abs/1105.0540
[4] I. Steinwart, "Adaptive density level set clustering," in Proc. 24th Annu. Conf. Learn. Theory, 2011, pp. 703–738.


[5] K. Fukunaga and L. Hostetler, "The estimation of the gradient of a density function, with applications in pattern recognition," IEEE Trans. Inf. Theory, vol. 21, no. 1, pp. 32–40, Jan. 1975.
[6] Y. Cheng, "Mean shift, mode seeking, and clustering," IEEE Trans. Pattern Anal. Mach. Intell., vol. 17, no. 8, pp. 790–799, Aug. 1995.
[7] D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 5, pp. 603–619, May 2002.
[8] A. Vedaldi and S. Soatto, "Quick shift and kernel methods for mode seeking," in Proc. Eur. Conf. Comput. Vis., D. Forsyth, P. Torr, and A. Zisserman, Eds., Berlin, Germany: Springer, 2008, pp. 705–718.
[9] H. Jiang, "On the consistency of quick shift," in Proc. Int. Conf. Neural Inf. Process. Syst., 2017, pp. 46–55.
[10] A. Rodriguez and A. Laio, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492–1496, 2014.
[11] X. Li and K.-C. Wong, "Evolutionary multiobjective clustering and its applications to patient stratification," IEEE Trans. Cybern., vol. 49, no. 5, pp. 1680–1693, May 2019.
[12] D. Platero-Rochart, R. González-Alemán, E. W. Hernández-Rodríguez, F. Leclerc, J. Caballero, and L. Montero-Cabrera, "RCDPeaks: Memory-efficient density peaks clustering of long molecular dynamics," Bioinformatics, vol. 38, no. 7, pp. 1863–1869, 2022.
[13] I. Verdinelli and L. Wasserman, "Analysis of a mode clustering diagram," Electron. J. Statist., vol. 12, no. 2, pp. 4288–4312, Jan. 2018.
[14] J. Xie, H. Gao, W. Xie, X. Liu, and P. W. Grant, "Robust clustering by detecting density peaks and assigning points based on fuzzy weighted K-nearest neighbors," Inf. Sci., vol. 354, no. C, pp. 19–40, 2016.
[15] L. Yaohui, M. Zhengming, and Y. Fang, "Adaptive density peak clustering based on K-nearest neighbors with aggregating strategy," Knowl.-Based Syst., vol. 133, pp. 208–220, 2017.
[16] X.-F. Wang and Y. Xu, "Fast clustering using adaptive density peak detection," Statist. Methods Med. Res., vol. 26, no. 6, pp. 2800–2811, Dec. 2017.
[17] J. Ding, X. He, J. Yuan, and B. Jiang, "Automatic clustering based on density peak detection using generalized extreme value distribution," Soft Comput., vol. 22, no. 9, pp. 2777–2796, May 2018.
[18] M. d'Errico, E. Facco, A. Laio, and A. Rodriguez, "Automatic topography of high-dimensional data sets by non-parametric density peak clustering," Inf. Sci., vol. 560, pp. 476–492, 2021.
[19] H. Jiang and S. Kpotufe, "Modal-set estimation with an application to clustering," in Proc. Int. Conf. Artif. Intell. Statist., 2017, pp. 1197–1206.
[20] H. Jiang, J. Jang, and S. Kpotufe, "QuickShift++: Provably good initializations for sample-based mean shift," in Proc. Int. Conf. Mach. Learn., 2018, pp. 2294–2303.
[21] J. Tobin and M. Zhang, "DCF: An efficient and robust density-based clustering method," in Proc. IEEE Int. Conf. Data Mining, 2021, pp. 629–638.
[22] L. Dinh, D. Krueger, and Y. Bengio, "NICE: Non-linear independent components estimation," 2014, arXiv:1410.8516.
[23] L. Dinh, J. Sohl-Dickstein, and S. Bengio, "Density estimation using real NVP," 2016, arXiv:1605.08803.
[24] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," 2013, arXiv:1312.6114.
[25] D. Rezende and S. Mohamed, "Variational inference with normalizing flows," in Proc. Int. Conf. Mach. Learn., 2015, pp. 1530–1538.
[26] R. Gao, E. Nijkamp, D. P. Kingma, Z. Xu, A. M. Dai, and Y. N. Wu, "Flow contrastive estimation of energy-based models," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 7518–7528.
[27] S. Dasgupta and S. Kpotufe, "Optimal rates for k-NN density and mode estimation," in Proc. Int. Conf. Neural Inf. Process. Syst., 2014, pp. 2555–2563.
[28] H. Jiang and S. Kpotufe, "Modal-set estimation with an application to clustering," in Proc. 20th Int. Conf. Artif. Intell. Statist., 2017, pp. 1197–1206.
[29] H. Jiang, "Density level set estimation on manifolds with DBSCAN," in Proc. 34th Int. Conf. Mach. Learn., 2017, pp. 1684–1693.
[30] K. Chaudhuri, S. Dasgupta, S. Kpotufe, and U. von Luxburg, "Consistent procedures for cluster tree estimation and pruning," IEEE Trans. Inf. Theory, vol. 60, no. 12, pp. 7900–7912, Dec. 2014.
[31] L. Hubert and P. Arabie, "Comparing partitions," J. Classification, vol. 2, no. 1, pp. 193–218, Dec. 1985.
[32] N. X. Vinh, J. Epps, and J. Bailey, "Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance," J. Mach. Learn. Res., vol. 11, pp. 2837–2854, 2010.
[33] Z. Li and Y. Tang, "Comparative density peaks clustering," Expert Syst. Appl., vol. 95, pp. 236–247, Apr. 2018.
[34] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proc. 2nd Int. Conf. Knowl. Discov. Data Mining, Portland, OR, USA, 1996, pp. 226–231.
[35] R. J. G. B. Campello, D. Moulavi, and J. Sander, "Density-based clustering based on hierarchical density estimates," in Proc. Pacific-Asia Conf. Knowl. Discov. Data Mining, J. Pei, V. S. Tseng, L. Cao, H. Motoda, and G. Xu, Eds., Berlin, Germany: Springer, 2013, pp. 160–172.
[36] D. Arthur and S. Vassilvitskii, "k-means++: The advantages of careful seeding," in Proc. 18th Annu. ACM-SIAM Symp. Discrete Algorithms, 2007, pp. 1027–1035.
[37] D. Dua and C. Graff, "UCI machine learning repository," 2019. [Online]. Available: http://archive.ics.uci.edu/ml
[38] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Berlin, Germany: Springer, Aug. 2009.
[39] S. Garcia and F. Herrera, "An extension on "statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons," J. Mach. Learn. Res., vol. 9, no. 12, pp. 2677–2694, 2008.
[40] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics, vol. 1, pp. 80–83, 1945.
[41] Y. Benjamini and Y. Hochberg, "Controlling the false discovery rate: A practical and powerful approach to multiple testing," J. Roy. Statist. Soc. Ser. B Methodol., vol. 57, no. 1, pp. 289–300, 1995.
[42] R. Tron, X. Zhou, C. Esteves, and K. Daniilidis, "Fast multi-image matching via density-based clustering," in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 4077–4086.
[43] D. Lowe, "Object recognition from local scale-invariant features," in Proc. 7th IEEE Int. Conf. Comput. Vis., 1999, pp. 1150–1157.

Joshua Tobin received the BA (joint honours) degree in mathematics and economics and the PhD degree in statistics from Trinity College Dublin, in 2018 and 2022, respectively. He is a post-doctoral researcher with the School of Computer Science & Statistics, Trinity College Dublin. His current research interests include parametric and non-parametric clustering methods and machine learning applications for healthcare.

Mimi Zhang received the BSc degree in statistics from the University of Science and Technology of China, in 2011, and the PhD degree in industrial engineering from the City University of Hong Kong, in 2015. She joined Trinity College Dublin as an assistant professor in 2017. Before joining Trinity College Dublin, she was a research associate with the University of Strathclyde and Imperial College London. Her main research areas are machine learning and operations research, including clustering, Bayesian optimization, functional data analysis, tree-based methods, reliability and maintenance, etc.
