Outer-Points Shaver: Robust Graph-Based Clustering via Node Cutting
Pattern Recognition
Article history: Received 4 September 2018; Revised 30 May 2019; Accepted 13 August 2019; Available online 14 August 2019.

Keywords: Graph-based clustering; Unsupervised learning; Spectral clustering; Pseudo-density reconstruction; Node cutting.

Abstract: Graph-based clustering is an efficient method for identifying clusters in local and nonlinear data patterns. Among the existing methods, spectral clustering is one of the most prominent algorithms. However, this method is vulnerable to noise and outliers. This study proposes a robust graph-based clustering method that removes the data nodes of relatively low density. The proposed method calculates the pseudo-density from a similarity matrix, and reconstructs it using a sparse regularization model. In this process, noise and the outer points are determined and removed. Unlike previous edge cutting-based methods, the proposed method is robust to noise while detecting clusters because it cuts out irrelevant nodes. We use a simulation and real-world data to demonstrate the usefulness of the proposed method by comparing it to existing methods in terms of clustering accuracy and robustness to noisy data. The comparison results confirm that the proposed method outperforms the alternatives.

https://doi.org/10.1016/j.patcog.2019.107001
© 2019 Elsevier Ltd. All rights reserved.
Fig. 1. Graphical illustration of the proposed clustering algorithm: (A) Detecting noise and outer points in the similarity graph, (B) cutting out the detected nodes and
identifying connected components, and (C) assigning the noise and outer points to the nearest cluster.
Second, it is necessary to predetermine the number of clusters because the k-means algorithm is applied after spectral embedding into a low-dimensional space.

Various methods have been proposed to address these issues. They mainly focus on constructing a robust similarity matrix or on robustly decomposing a similarity matrix. Li et al. [11] proposed a noise-robust spectral clustering method, focusing on data to which uniform noise has been added. The data are transformed to a new space in which the noise points are clustered together. In this manner, noise points are detected and robust clustering results are produced. Huang et al. [12] aimed to improve the robustness of clustering algorithms based on heat kernel theory. The heat kernel statistically depicts the traces of a random walk, and the concept has an intrinsic connection with the diffusion distance, which produces more robust clustering results than the basic Euclidean distance. The authors integrated the heat distributed along the time scale to measure the distance between each point pair in their eigenspace. Li et al. [2] proposed the construction of fuzzy-set-based affinity graphs by identifying and exploiting discriminative features. The method captures subtle similarity information distributed over discriminative feature subspaces to reveal the latent data distribution, and the resulting affinity graph leads to more accurate clustering results than the Gaussian kernel-based similarity graph. Bojchevski et al. [13] proposed a sparse and latent decomposition of the similarity graph. The method jointly learns the spectral embedding as well as the noisy data; by decomposing the similarity graph into sparse corruptions and clean data, it enhances the robustness of spectral clustering. Li et al. [14] investigated the robustness of spectral clustering methods for grouping multi-scale data and proposed an algorithm that computes an affinity matrix that simultaneously considers the feature similarity and reachability similarity of objects. The methods proposed by Li et al. [11] and Bojchevski et al. [13] have a motivation similar to that of the proposed method: both decompose the data into clean and noisy parts. Thus, in our simulation and case studies, we compare our proposed method with the methods of Refs. [11,13].

In this study, we propose a graph-based clustering method for clustering local and nonlinear pattern data with noise. To enhance the robustness of the clustering results, we propose the use of density-reconstruction-based node cutting, which constitutes a new method for robust graph-based clustering. The proposed method calculates the pseudo-density from a similarity matrix and reconstructs it using a sparse regularization model. In the process, noise and outer points of clusters are identified and removed from the similarity graph. After the clusters are identified, the removed outer points are assigned to the nearest cluster. Unlike previous edge cutting-based methods, the proposed method is robust to noise while detecting clusters because it cuts out irrelevant nodes. Fig. 1 illustrates the main concept of the proposed method. The removal of outer points resembles shaving, and thus we named the algorithm an "outer-points shaver." In addition, it is not necessary to predefine the number of clusters, which can be determined from the sparsity parameter in the optimization model.

The main contributions of this study can be summarized as follows:

(1) We propose a new graph-based clustering method with a linear regularization model that minimizes the density reconstruction error, subject to data-node selection constraints. The method is robust to noise and outliers because it cuts out low-density and noisy points to detect the fundamental structures of clusters. To the best of our knowledge, this is the first attempt to utilize a linear regularization model to perform density-based clustering.

(2) We present theoretical results that guarantee the grouping effect in selecting nodes. The grouping effect implies that if two observations have similar nearest neighbors and one of the two is selected, then the other observation can be selected simultaneously with high probability.

(3) The proposed method does not require significant effort to determine the number of clusters in advance. The optimal cluster number can be determined through a parameter sensitivity analysis of the proposed formulation.

(4) To demonstrate the usefulness of the proposed method, we conduct simulation and real-world case studies. The results demonstrate that our proposed method outperforms the alternatives.

The remainder of this paper is organized as follows. Section 2 presents the details of the proposed outer-points shaver method. Section 3 presents a simulation study to examine the performance of the proposed method and compare it with other methods under various scenarios. Section 4 presents a case study to demonstrate the applicability of the proposed method. Finally, Section 5 presents our concluding remarks.

2. Proposed method

The proposed outer-points shaver (OPS) algorithm consists of four main steps. The first is to represent the data as a k-nearest neighbor graph. In this graph, all the observations are represented by nodes, with edges that connect each observation to its nearest neighbors. Second, we conduct pseudo-density reconstruction using a linear regularization model to determine the outer points. The sparse selection property of the regularization model constrains the coefficients of outer points to be zero. The points selected as the outer points are temporarily excluded from the clustering procedure. Third, the subgraphs containing the inner points
are clustered using an algorithm that identifies connected components, and each connected component is determined as a cluster. Finally, the temporarily excluded outer points are each assigned to their nearest cluster.

2.1. Constructing the k-nearest neighbor graph

The first step of the proposed algorithm is to represent the data by a graph structure. As mentioned in Section 1, representing a data set as a graph is useful for clustering local and nonlinear patterns. There are several choices for constructing a graph from the given data: the fully connected graph, the ε-neighborhood graph, and the k-nearest neighbor graph. A fully connected graph connects all points with positive similarities to each other and weights all edges using a similarity function such as a Gaussian kernel function. An ε-neighborhood graph connects all points whose pairwise distances are smaller than ε. Both graphs have a scaling factor as a parameter, and it is challenging to determine an appropriate scaling factor. On the other hand, the k-nearest neighbor graph is known to be robust to the choice of the parameter k. Therefore, utilizing the k-nearest neighbor graph for graph-based clustering is recommended [10]. Thus, in this study, we use the k-nearest neighbor graph to cluster the data. The definition of the k-nearest neighbor graph is as follows:

Definition 1. k-nearest neighbor graph: A k-nearest neighbor-based graph with n nodes is constructed as follows. An edge e_ij between nodes i and j is defined as

e_{ij} = \begin{cases} 1 & \text{if } x_i \in N_k(x_j) \text{ or } x_j \in N_k(x_i) \\ 0 & \text{otherwise,} \end{cases}   (1)

where N_k(x_i) denotes the set of the k nearest neighbors of x_i.

2.2. Pseudo-density reconstruction with regularization

The pseudo-density vector S^s is computed from the similarity matrix S of the k-nearest neighbor graph; its ith entry S_i^s is the total similarity (i.e., the number of neighbors) of node i, so that S^s = S\mathbf{1}. The outer points are determined by investigating the importance of the observations to reconstructing S^s. Problem (3) effectively selects the top T most important observations to reconstruct S^s. Note that we use the unnormalized similarity matrix because normalization is unnecessary for the proposed pseudo-density reconstruction with regularization. The normalization procedure is essential for methods based on graph cut criteria [8]: the unnormalized cut favors cutting off small sets of isolated nodes because the cut objective increases with the number of edges going across the two partitioned parts. However, our formulation consists of a reconstruction error objective with regularization terms. The coefficients of the formulation are optimized to reduce the reconstruction error with sparsity, considering the connection status of the corresponding nodes. Therefore, the formulation does not require the normalized Laplacian matrix.

Although the optimization formulation selects the optimal subset of observations, the cardinality constraint makes Problem (3) NP-hard [16]. To address the computational challenges of the best subset problem, the computationally tractable convex-optimization-based "least absolute shrinkage and selection operator (LASSO)" has been proposed [17]. Our proposed problem represents a special case of the LASSO problem in which the numbers of features and observations are equal. The LASSO formulation can address the computational challenges. However, the LASSO tends to select only one feature from a group of collinear features. For example, if two observations have similar nearest neighbors, then the correlation of the two columns corresponding to the observations becomes high, and one of the observations is determined as an outer point. Thus, to address this problem, we propose using the elastic net [18] form of the formulation, which selects collinear features simultaneously. The optimization formulation for estimating the coefficients of the OPS model is

\hat{\beta} = \arg\min_{\beta} L(\beta), \qquad L(\beta) = \left\| S^s - S\beta \right\|_2^2 + \lambda_1 \left\| \beta \right\|_1 + \lambda_2 \left\| \beta \right\|_2^2.
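To make the first two steps concrete, the sketch below builds the binary symmetric adjacency matrix of Eq. (1), takes the pseudo-density as the node-degree vector S^s = S·1, and fits the elastic-net reconstruction. It is a minimal illustration under our own assumptions: the variable names are ours, scikit-learn's ElasticNet is used only as a stand-in solver (the paper's experiments used a MATLAB implementation), and the (λ1, λ2) values are mapped onto scikit-learn's (alpha, l1_ratio) parameterization up to the rescaling noted in the comments.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.linear_model import ElasticNet


def knn_adjacency(X, k):
    """Binary, symmetric k-nearest neighbor adjacency matrix (Eq. (1))."""
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    n = X.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        S[i, idx[i, 1:]] = 1.0          # connect i to its k nearest neighbors
    return np.maximum(S, S.T)           # e_ij = 1 if i in N_k(j) or j in N_k(i)


def reconstruct_pseudo_density(S, lam1, lam2):
    """Elastic-net reconstruction of the pseudo-density S^s = S @ 1.

    Returns the coefficient vector beta; outer points are {i : beta_i = 0}.
    """
    n = S.shape[0]
    s_density = S.sum(axis=1)           # pseudo-density (node degree) vector
    # scikit-learn minimizes ||y - Xb||^2/(2n) + alpha*l1_ratio*||b||_1
    #                        + alpha*(1 - l1_ratio)*||b||_2^2 / 2,
    # so (lam1, lam2) map onto (alpha, l1_ratio) up to this rescaling.
    alpha = (lam1 + 2.0 * lam2) / (2.0 * n)
    l1_ratio = lam1 / (lam1 + 2.0 * lam2)
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                       fit_intercept=False, max_iter=10000)
    return model.fit(S, s_density).coef_   # Lemma 1: the optimum is nonnegative


# Example usage with a hypothetical data matrix X:
# S = knn_adjacency(X, k=150)
# beta = reconstruct_pseudo_density(S, lam1=0.5, lam2=0.3)
# outer = np.where(np.isclose(beta, 0.0))[0]
```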
Lemma 1 states that the optimal coefficients of this formulation are nonnegative. Its proof proceeds by contradiction: if the optimal solution βˆ contained a negative coefficient, a competing solution βˆ_p could be constructed whose objective value is strictly smaller,

\left\| S^s - S\hat{\beta}_p \right\|_2^2 + \lambda_1 \left\| \hat{\beta}_p \right\|_1 + \lambda_2 \left\| \hat{\beta}_p \right\|_2^2 < \left\| S^s - S\hat{\beta} \right\|_2^2 + \lambda_1 \left\| \hat{\beta} \right\|_1 + \lambda_2 \left\| \hat{\beta} \right\|_2^2.   (6)

This result implies that L(βˆ_p) < L(βˆ), which is a contradiction. Therefore, the optimal coefficients βˆ must have nonnegative values, i.e., βˆ_i ≥ 0, i = 1, 2, ..., n.

Lemma 1 guarantees that the optimal coefficients have nonnegative values. Using this result, we can derive the following grouping effect theorem for the OPS.

Theorem 1. Let βˆ be the solution of the OPS given data (S^s, S) with parameters λ1 and λ2, and let D(i, j) = \frac{1}{\|S^s\|_2} |\hat{\beta}_i - \hat{\beta}_j|. Then,

D(i, j) \le \frac{\sqrt{2}}{\lambda_2} \sqrt{\frac{S_i^s + S_j^s}{2} - S_i^T S_j},

where S_i^s is the ith entry of S^s and S_i is the ith column vector of S. Note that S_i^T S_j equals the number of observations connected to both observations i and j.

Proof. The optimal solution βˆ satisfies ∂L(β)/∂β_k |_{β=βˆ} = 0 if βˆ_k ≠ 0. Therefore, we have

-2 S_i^T \left( S^s - S\hat{\beta} \right) + \lambda_1 \operatorname{sgn}\!\left( \hat{\beta}_i \right) + 2 \lambda_2 \hat{\beta}_i = 0,   (7)

-2 S_j^T \left( S^s - S\hat{\beta} \right) + \lambda_1 \operatorname{sgn}\!\left( \hat{\beta}_j \right) + 2 \lambda_2 \hat{\beta}_j = 0,   (8)

where sgn is the sign function. Subtracting Eq. (8) from Eq. (7) yields

\left( S_j^T - S_i^T \right)\left( S^s - S\hat{\beta} \right) + \lambda_2 \left( \hat{\beta}_i - \hat{\beta}_j \right) = 0,   (9)

which is equivalent to

\hat{\beta}_i - \hat{\beta}_j = \frac{1}{\lambda_2} \left( S_i - S_j \right)^T \left( S^s - S\hat{\beta} \right).   (10)

From Eq. (10) we have the following inequality:

\left| \hat{\beta}_i - \hat{\beta}_j \right| \le \frac{1}{\lambda_2} \left\| S_i - S_j \right\|_2 \left\| S^s - S\hat{\beta} \right\|_2.   (11)

From Lemma 1, we must have L(βˆ) ≤ L(0), i.e., \| S^s - S\hat{\beta} \|_2^2 + \lambda_1 \| \hat{\beta} \|_1 + \lambda_2 \| \hat{\beta} \|_2^2 \le \| S^s \|_2^2, which yields

\left\| S^s - S\hat{\beta} \right\|_2^2 \le \left\| S^s \right\|_2^2.   (12)

Utilizing Eqs. (11) and (12), we have the following inequalities:

D(i, j) \le \frac{1}{\lambda_2 \left\| S^s \right\|_2} \left\| S^s - S\hat{\beta} \right\|_2 \left\| S_i - S_j \right\|_2 \le \frac{1}{\lambda_2} \left\| S_i - S_j \right\|_2.   (13)

Since S is the similarity matrix of the nearest neighbor graph, \| S_i - S_j \|_2^2 = S_i^s + S_j^s - 2 S_i^T S_j holds. Therefore, D(i, j) \le \frac{\sqrt{2}}{\lambda_2} \sqrt{\frac{S_i^s + S_j^s}{2} - S_i^T S_j}, as desired.

Here, D(i, j) describes the difference between the coefficients of observations i and j. The upper bound in the above inequality provides a quantitative description of the grouping effect of the OPS formulation. D(i, j) = |\hat{\beta}_i - \hat{\beta}_j| / \|S^s\|_2 = \left( \| S_i - S_j \|_2 \, \| S^s - S\hat{\beta} \|_2 \, |\cos\theta| \right) / \left( \lambda_2 \| S^s \|_2 \right), where θ denotes the angle between the vectors S_i − S_j and S^s − Sβˆ. The residual vector can be calculated as S^s - S\hat{\beta} = \sum_{q=1}^{n} (1 - \beta_q) S_q. Thus, if we assume that the rank of S is equal to n, equality in the first inequality of Eq. (13) holds when the following conditions are satisfied: β_q = 1, ∀q ∈ {1, 2, ..., n}\{i, j} and |(1 − β_i)/(1 − β_j)| = 1. On the other hand, the inequality holds strictly when these conditions are not satisfied. In particular, if two observations are connected to exactly the same set of nodes, then S_i = S_j, the upper bound becomes zero, and βˆ_i = βˆ_j; such nodes are therefore shaved or retained together, which is the grouping effect stated in Theorem 1.

Fig. 2 illustrates the effect of the sparsity parameter λ1 through a toy example. As mentioned previously, a larger λ1 value increases the number of outer points. We conduct a sensitivity analysis on λ1 and λ2 for the clustering results in Section 3.3.

After determining the outer points, we conduct node cutting, defined as follows:

Definition 2. Node cutting: The outer points O = {i : β_i = 0} and the edges connected to the outer points {e_ij : β_i = 0 or β_j = 0} are removed from the k-nearest neighbor graph. (In practice, we remove the columns and rows of the outer points in the similarity matrix.)

2.3. Finding connected components by clustering

Removing the outer points breaks the connections inside the supergraph and creates subgraphs. In this situation, we can perform straightforward clustering by identifying the connected components and setting them as clusters. A connected component is a subgraph in which any two nodes are connected to each other by paths and which is not connected to additional nodes in the supergraph. If we identify all the connected components in the supergraph, the clustering process is complete.

It is straightforward to determine the connected components of a graph using a nearest neighbor search [22]. A search beginning at a node will identify the entire connected component before returning. To identify all the connected components of a graph, the nearest neighbors of the nodes in each connected component set are iteratively obtained and added to the set. When searching for a nearest neighbor, the search considers only nodes that are not already present in the set. We execute the algorithm until no more nearest neighbors are identified. Having identified a connected component in this manner, we seed an index that has not yet been searched and then apply the nearest neighbor search to find a new connected component. Finally, when all the nodes have been searched, the algorithm terminates and the group index vector is returned.

2.4. Assigning outer points to the nearest cluster

In this step, we assign the outer points to the clusters that have already been identified. If an outer point belongs to a neighboring set of previously clustered nodes, it is assigned to that cluster. Each outer point is assigned to a cluster via a k-nearest neighbor classification algorithm. In our experimental study, we varied the number of nearest neighbors (k) from one to 20 and found that an appropriate value, which does not affect the final clustering results, is 10. The outer-point assignment scheme is expressed as follows:

y_i = \arg\max_{C_m} \sum_{j \in C_m} e_{ij}, \quad i \in O,   (14)

where C_m denotes the mth cluster identified by the OPS and y_i is the cluster label assigned to outer point i. In the previous step, the clusters reflecting the intrinsic structure have been reasonably identified. Therefore, the straightforward k-nearest neighbor algorithm performs adequately in assigning the outer points to clusters. In cases where the number of features is large and the performance of a distance-based classification algorithm is inadequate, a classifier such as a regularization method or partial least squares can be used to address the dimensionality issue.

2.5. Computational complexity

This section examines the computational complexity of the proposed OPS. Constructing the k-nearest neighbor graph consists of
Fig. 2. Outer-point identification results for the sparsity parameter λ1 : (A) λ1 = 0.1, (B) λ1 = 0.2, (C) λ1 = 0.3, (D) λ1 = 0.4, and (E) λ1 = 0.5. The other parameters λ2 and k
were set to 0.3 and 150, respectively. The blue dots denote original data, and the red dots denote outer points. (F) illustrates the shaved result, with outer points removed,
when λ1 = 0.5.
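Continuing the sketch given after Section 2.2, the node cutting of Definition 2, the connected-component search of Section 2.3, and the assignment rule of Eq. (14) can be illustrated as follows. This is a minimal sketch under our own assumptions (binary adjacency matrix S and coefficient vector beta from the reconstruction step; all function names are hypothetical), not the authors' implementation.

```python
import numpy as np
from collections import deque


def node_cutting(S, beta):
    """Definition 2: drop outer points O = {i : beta_i = 0} and their edges."""
    outer = np.where(np.isclose(beta, 0.0))[0]
    inner = np.setdiff1d(np.arange(S.shape[0]), outer)
    return S[np.ix_(inner, inner)], inner, outer   # shaved similarity matrix


def connected_components(S_inner):
    """Breadth-first search over the shaved graph; each component is a cluster."""
    n = S_inner.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for seed in range(n):
        if labels[seed] >= 0:
            continue                                # node already assigned
        labels[seed] = current
        queue = deque([seed])
        while queue:
            node = queue.popleft()
            for nbr in np.flatnonzero(S_inner[node]):
                if labels[nbr] < 0:
                    labels[nbr] = current
                    queue.append(nbr)
        current += 1
    return labels


def assign_outer_points(S, inner, outer, inner_labels):
    """Eq. (14): give each outer point the cluster it is most connected to."""
    labels = -np.ones(S.shape[0], dtype=int)
    labels[inner] = inner_labels
    for i in outer:
        weights = np.bincount(inner_labels, weights=S[i, inner])
        # If an outer point has no edge to any inner point, the paper instead
        # relies on a k-nearest neighbor classifier on the original features.
        labels[i] = int(np.argmax(weights)) if weights.sum() > 0 else 0
    return labels


# Example usage:
# S_inner, inner, outer = node_cutting(S, beta)
# labels = assign_outer_points(S, inner, outer, connected_components(S_inner))
```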
Fig. 3. Distributions of the simulation data: (A) three clusters, (B) two densities, (C) forest, (D) crescent full moon, (E) two kernels, (F) two spirals, (G) chameleon1, (H)
chameleon2, and (I) chameleon3.
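The paper does not give generating equations for the simulation patterns shown in Fig. 3. Purely for illustration, one common way to produce a two-spirals pattern, together with the uniform bounding-box noise used later in the noise sensitivity analysis of Section 3.4, is sketched below; the function name, parameter values, and spiral construction are our own assumptions.

```python
import numpy as np


def two_spirals_with_noise(n_per_arm=500, noise_ratio=0.1, seed=0):
    """Two interleaved spirals plus uniformly distributed noise points.

    Noise points are drawn uniformly between the per-coordinate minimum and
    maximum of the clean data, at a ratio of the original sample size
    (cf. Section 3.4); the spiral construction itself is illustrative only.
    """
    rng = np.random.default_rng(seed)
    t = np.sqrt(rng.uniform(0.25, 1.0, n_per_arm)) * 3.0 * np.pi
    arm = np.column_stack((t * np.cos(t), t * np.sin(t)))
    X = np.vstack((arm, -arm)) + rng.normal(scale=0.2, size=(2 * n_per_arm, 2))
    y = np.repeat([0, 1], n_per_arm)

    n_noise = int(noise_ratio * len(X))
    lo, hi = X.min(axis=0), X.max(axis=0)
    noise = rng.uniform(lo, hi, size=(n_noise, 2))
    y_noise = -np.ones(n_noise, dtype=int)          # -1 marks injected noise
    return np.vstack((X, noise)), np.concatenate((y, y_noise))
```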
number of clusters. Note that in the proposed OPS, it is unnecessary to predefine the number of clusters because this number can be intuitively determined during the adjustment of the sparsity parameter. Section 3.3 presents the details of the parameter search for the proposed method.

To compare the performances of the proposed and comparative methods, we used two performance measures: the adjusted Rand index [27] and the Hubert index [28], which are both variants of the Rand index [29]. The Rand index is a measure of agreement between two data partitions. The adjusted Rand index is a normalized version of the Rand index [27], which measures the degree of agreement more precisely by removing the effect of the expected similarity. Given two partitions A = {A_1, A_2, ..., A_r} and B = {B_1, B_2, ..., B_s} of n data points, the adjusted Rand index is defined as follows:

\text{Adjusted Rand Index} = \frac{\sum_{ij} \binom{n_{ij}}{2} - \left[ \sum_i \binom{n_{i\cdot}}{2} \sum_j \binom{n_{\cdot j}}{2} \right] \Big/ \binom{n}{2}}{\frac{1}{2} \left[ \sum_i \binom{n_{i\cdot}}{2} + \sum_j \binom{n_{\cdot j}}{2} \right] - \left[ \sum_i \binom{n_{i\cdot}}{2} \sum_j \binom{n_{\cdot j}}{2} \right] \Big/ \binom{n}{2}},   (15)

where n_ij is the number of data points in A_i ∩ B_j, n_{i\cdot} = \sum_j n_{ij}, and n_{\cdot j} = \sum_i n_{ij}. On the other hand, the Hubert index is calculated by subtracting the disagreement proportion of partitions from the Rand index. The Hubert index is defined as follows:

\text{Hubert Index} = \frac{\binom{n}{2} + 4 \sum_{ij} \binom{n_{ij}}{2} - 2 \left[ \sum_i \binom{n_{i\cdot}}{2} + \sum_j \binom{n_{\cdot j}}{2} \right]}{\binom{n}{2}}.   (16)

Intuitively, the Hubert index measures the correlation between two data partitions.

3.2. Clustering performance comparison

Tables 2 and 3 present the clustering performance results in terms of the adjusted Rand index and Hubert index, respectively. The proposed OPS method outperformed the other clustering algorithms, exhibiting the highest adjusted Rand index and Hubert index across all the data sets. The OPS method performed adequately for local and nonlinear pattern data, such as the chameleons, crescent full moon, half kernels, and two spirals. The OPS method outperformed k-means and affinity propagation even for local and linear pattern data, such as forest and three clusters. Fig. 4 graphically illustrates the clustering results for the proposed OPS method. From the numerical results, we can verify that when
Table 2
Clustering performance results for simulation data measured by the adjusted Rand index.
Three clusters 0.8704 0.8760 0.8417 0.8750 0.8620 0.8694 0.8713 0.8825
Two densities 0.6971 0.6991 0.5379 0.5931 0.7725 0.8262 0.8915 0.9447
Forest 0.8124 0.8128 0.6010 0.7833 0.6340 0.7613 0.8466 0.9021
Crescent full moon 0.0002 0.0104 0.7485 0.7781 0.4789 0.5059 0.6939 0.8006
Half kernels 0.0018 0.0042 0.7645 0.7472 0.1913 0.5358 0.7120 0.8183
Two spirals 0.0430 0.0472 0.8386 0.7564 0.0564 0.6841 0.7918 0.8506
Chameleon1 0.5608 0.6003 0.9275 0.9335 0.7936 0.7945 0.8961 0.9673
Chameleon2 0.3451 0.3867 0.8742 0.8637 0.7524 0.8191 0.9156 0.9927
Chameleon3 0.3638 0.4034 0.8173 0.8891 0.6789 0.7587 0.9330 0.9931
Table 3
Clustering performance results for simulation data measured by the Hubert index.
Three clusters 0.8848 0.8898 0.8718 0.8488 0.8773 0.8949 0.8558 0.8956
Two densities 0.6971 0.6986 0.5379 0.6270 0.7725 0.8835 0.9168 0.9447
Forest 0.9534 0.9534 0.8747 0.9045 0.9000 0.9500 0.9612 0.9757
Crescent full moon 0.0002 0.0107 0.7655 0.7936 0.4789 0.3288 0.6402 0.8003
Half kernels 0.0018 0.0042 0.7645 0.7449 0.1328 0.0901 0.4261 0.8216
Two spirals 0.0430 0.0489 0.8556 0.8320 0.0564 0.1105 0.4669 0.8506
Chameleon1 0.7338 0.7609 0.9567 0.8905 0.8687 0.8889 0.9299 0.9795
Chameleon2 0.6663 0.6857 0.9196 0.8644 0.8381 0.7443 0.8169 0.9956
Chameleon3 0.6789 0.6930 0.8842 0.8716 0.8221 0.6929 0.7731 0.9961
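For reference, the adjusted Rand and Hubert indices reported in Tables 2 and 3 (Eqs. (15) and (16)) can be computed directly from the contingency table of two partitions. The sketch below is our own illustration, assuming integer cluster labels 0, 1, ..., K−1 in NumPy arrays; it is not code from the paper (scikit-learn's adjusted_rand_score yields the same adjusted Rand index).

```python
import numpy as np
from scipy.special import comb


def pair_counts(labels_a, labels_b):
    """Contingency-table pair counts used in Eqs. (15) and (16)."""
    table = np.zeros((labels_a.max() + 1, labels_b.max() + 1))
    for a, b in zip(labels_a, labels_b):
        table[a, b] += 1
    nij = comb(table, 2).sum()                 # sum_ij C(n_ij, 2)
    ni = comb(table.sum(axis=1), 2).sum()      # sum_i  C(n_i., 2)
    nj = comb(table.sum(axis=0), 2).sum()      # sum_j  C(n_.j, 2)
    n2 = comb(len(labels_a), 2)                # C(n, 2)
    return nij, ni, nj, n2


def adjusted_rand_index(labels_a, labels_b):
    nij, ni, nj, n2 = pair_counts(labels_a, labels_b)
    expected = ni * nj / n2
    return (nij - expected) / (0.5 * (ni + nj) - expected)   # Eq. (15)


def hubert_index(labels_a, labels_b):
    nij, ni, nj, n2 = pair_counts(labels_a, labels_b)
    return (n2 + 4.0 * nij - 2.0 * (ni + nj)) / n2           # Eq. (16)
```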
Fig. 4. Graphical clustering results for the proposed OPS algorithm: (A) three clusters, (B) two densities, (C) forest, (D) crescent full moon, (E) half kernels, (F) two spirals,
(G) chameleon1, (H) chameleon2, and (I) chameleon3.
Fig. 5. Parameter study on the number of nearest neighbors to construct the similarity graph (k) and the sparsity adjusting parameter λ1: (A) three clusters, (B) two densities,
(C) forest, (D) crescent full moon, (E) half kernels, (F) two spirals, (G) chameleon1, (H) chameleon2, and (I) chameleon3. The colored bars indicate that the height of the bar
is equal to the actual number of clusters.
applied to noisy data, the proposed OPS method provides robust clustering capability.

3.3. Determining the number of clusters

The number of clusters is an important parameter for clustering. In general, the number of clusters that maximizes a clustering performance measure is selected as the optimum. However, different clustering performance measures provide different optimal numbers of clusters [20]. This section demonstrates that the proposed OPS can determine the number of clusters in a straightforward manner. In addition, we conducted a sensitivity analysis of the parameters, namely the number of nearest neighbors k used to construct the k-nearest neighbor graph and the smoothing parameter λ2.

Fig. 5 illustrates the parameter study results. The x-axis represents the sparsity adjusting parameter λ1, the y-axis represents the number of nearest neighbors (k) used to construct the k-nearest neighbor graph, and the z-axis represents the number of clusters. The colored bars indicate that the number of clusters determined by the OPS is equal to the actual number of clusters. The sparsity parameter λ1 was varied from 0 to 0.75 at intervals of 0.05. The number of nearest neighbors was varied from 60 to 150 and from 260 to 350, at intervals of 10. The experimental results have the following two implications:

First, the results demonstrate that the sparsity parameter λ1 is a key factor in determining the number of clusters. Given the number of nearest neighbors k, the number of clusters varies relatively significantly as λ1 changes. It is worth noting that the number of clusters is stable as λ1 varies when the number of clusters determined by the OPS is equal to the actual number of clusters. If λ1 is small, the number of outer points removed is insufficient to reveal the actual cluster structures, so connections between the clusters are maintained; these redundant connections result in a number of clusters that is lower than the actual number. As λ1 increases, a sufficient number of outer points are removed, and the number of clusters becomes equal to the actual number of clusters. In this state, because the intrinsic cluster structure has been determined, the number of clusters does not change significantly regardless of the number of outer points removed. If λ1 increases excessively, the intrinsic cluster structure is divided into smaller clusters, so the number of clusters increases; alternatively, the number of clusters can decrease because all the points composing a cluster are removed. Through these results, we can determine the number of clusters by simply identifying an interval in which the number of clusters is stable regardless of λ1. Note that the affinity propagation algorithm [25] also determines the number of clusters by identifying a stable interval, as is suggested for the proposed OPS method.
Fig. 6. Parameter study on the sparsity adjusting parameter λ1 and smoothing parameter λ2: (A) three clusters, (B) two densities, (C) forest, (D) crescent full moon, (E) half
kernels, (F) two spirals, (G) chameleon1, (H) chameleon2, and (I) chameleon3. The colored bars indicate that the height of the bar is equal to the actual number of clusters.
Fig. 8. Noise sensitivity analysis: (A) three clusters, (B) two densities, (C) forest, (D) crescent full moon, (E) half kernels, (F) two spirals, (G) chameleon1, (H) chameleon2,
and (I) chameleon3.
3.4. Noise sensitivity analysis

We analyzed the robustness of the proposed OPS method by examining its clustering performance with different amounts of noise in a data set. We used the nine simulation data sets and randomly added noise points, with the noise ratio α increased from 0 to 50%. The noise ratio is the ratio of the number of added noise points to the number of given data points. The noise points are generated from a uniform distribution whose maximum and minimum values for each coordinate are estimated from the given data set. Fig. 7 illustrates an example plot of noise-added data. To measure the quality of clustering, we averaged the results over 10 data sets for each noise parameter. When calculating the clustering index, only the given data points are considered. The parameters for each clustering method were optimized for the data with a noise ratio of 0% and kept fixed while the noise ratio was increased. We determined the parameters for the proposed OPS based on the parameter sensitivity analysis.

Fig. 8 presents the results for the proposed method and the seven clustering benchmark methods. The x-axis represents the noise ratio, and the y-axis represents the adjusted Rand index obtained by averaging over 10 data sets. A method that maintains a larger adjusted Rand index over the noise ratio is considered the better one. We can observe that the proposed method evidently outperformed the other methods in terms of robustness to noise: the adjusted Rand index values for the proposed OPS tended to be higher than those of the other methods and exhibited smaller changes as the noise ratio increased. This implies that the proposed method efficiently removed noise data and thus accurately identified the intrinsic cluster structures. If the amount of noise data is increased, many similarity edges of high weight are generated by these noise data. Because spectral clustering is based on edge cutting, it is challenging to identify an optimal edge cut that reflects the intrinsic cluster structures. Noise-robust spectral clustering achieves a higher performance than spectral clustering; however, being based on edge cutting, it also exhibited poor performance as the noise increased. The clustering performance of DBSCAN also degraded significantly with an increase in noise because the method is sensitive to its tuning parameters: an increase in noise changes the distribution of the base data and requires the DBSCAN parameters to be adjusted. On the other hand, the proposed OPS method is robust to noise because it is based on node cutting, so it removes the generated noise data and identifies the intrinsic cluster structures.

3.5. Experimental results of computational complexity

We presented the theoretical results on the computational complexity of the proposed OPS in Section 2.5. This section examines the experimental complexity of the OPS. To generate the data sets, we referred to the distribution of the three clusters data set. We checked the computational time by varying the number of observations (n = 6,000, 12,000, …, 30,000) and features (p = 10, 20, …, 50). To perform this experiment, we used MATLAB to implement the OPS on a personal computer (Intel® Core™ i7-8700 CPU @ 3.20 GHz, 32.00 GB RAM). Fig. 9 illustrates the results. When the number of observations is relatively small (less than or equal to 12,000), the experimental complexity is O(n^1.89); as the number of observations doubles, the computational time increases by a factor of about 3.7 (2^1.89 ≈ 3.7). As the number of observations increases to 30,000, the experimental complexity increases to approximately O(n^2.21). We expect that the computational complexity asymptotically converges to the theoretical result O(n^3). Note that the results show that the impact of the number of features on the computational time is limited.

Fig. 9. Experimental results of the computational complexity. The x-axis and y-axis represent the number of observations and computational time, respectively.

4. Case study

To demonstrate the applicability to real situations, we conducted experiments on seven benchmark data sets, summarized in Table 4. The segment data were drawn randomly from a database of seven outdoor images, which were hand-segmented to create a label for each pixel; the data set contains features extracted for image segmentation. The Modified National Institute of Standards and Technology (MNIST) data set contains handwritten digits and is commonly used for training and testing in the field of machine learning; in our case study, we used the testing part only. The Canadian Institute for Advanced Research (CIFAR) data set contains images of animals and vehicles, separated into 10 classes; the original color images were transformed to grayscale images, and our case study used only the testing part of the data set. The Columbia Object Image Library (COIL) contains grayscale images of 20 objects, which were placed on a motorized turntable against a black background; the turntable was rotated through 360° to vary the object pose, and we resized the images from 128 × 128 to 32 × 32. The human activity data set contains accelerometer and gyroscope sensor data measured by smartphones to classify human activities into six categories (walking, walking upstairs, walking downstairs, sitting, standing, and lying down). The pen digit data set contains coordinate information for digit pixels from a hand-written digit data set; the researchers resampled eight points of the pixels determined by the x and y coordinates, such that the data set has 16 features. The gesture phase data set contains features extracted from seven videos of people gesticulating. The United States Postal Service (USPS) data set is composed of 11,000 scaled handwritten digit images.

Table 4
Summary of the actual case data sets.

Table 5 presents the clustering results in terms of the adjusted Rand index. The proposed OPS method outperformed the other clustering algorithms, exhibiting the highest clustering performance for five out of seven data sets. Although NRSC and KM achieved the highest clustering results for the COIL and gesture phase data sets, the OPS performed comparably. Table 6 presents the clustering results measured by the Hubert index, indicating that the OPS method achieved the best results for five out of seven data sets. These results demonstrate that the proposed OPS method provides robust and accurate clustering performance when applied to actual case data.
Table 5
Clustering performance results for actual case data measured by the adjusted Rand index.
Table 6
Clustering performance results for actual case data measured by the Hubert index.
For the COIL and gesture phase data, RSC and KM achieved good clustering performance. In these data sets, there are large correlations between the observations: COIL is image data obtained by taking pictures of each object at various angles, and the gesture phase data record the change of motion sensors over time, so COIL has large spatial correlations and the gesture phase data have large temporal correlations between observations. In this situation, an approach that removes observations, as the OPS does, can easily break graph connections inside clusters and lead to degraded clustering performance. For the USPS data, graph-based methods such as the OPS and RSC showed good performance, which suggests that the data have nonlinear characteristics. In terms of the adjusted Rand index the OPS performs better, whereas in terms of the Hubert index RSC achieves better performance. We believe that there are few noisy points in these data, so the difference between the two methods was not significant. Based on the results of the noise sensitivity analysis, we expect a more obvious difference in performance when there is substantial noise in the data.

5. Conclusions

In this study, we have examined the problem of robustly clustering data points on a similarity graph with noise. We proposed the OPS method to detect and remove outer points to reveal the intrinsic cluster structure. Determining outer points is a type of best subset selection problem, which is NP-hard. To solve the problem efficiently, we relaxed the mixed integer programming formulation to a convex optimization formulation. Through a simulation and an actual case study, we compared the performance of the proposed OPS method with those of other clustering methods. We observed that the proposed OPS method outperformed the other methods in that it clustered the noisy data accurately and exhibited a robust performance when the proportion of noise was increased. Although the method shows promising results, the formulation contains the symmetric similarity matrix, which increases the memory and computational complexity to O(n²). To handle this issue, it would be interesting to solve the formulation with the alternating direction method of multipliers, which can solve the formulation in parallel. The OPS also has some limitations in clustering data that have large temporal or spatial correlations between observations. We expect that adding constraints that consider these correlations could address this limitation.

The proposed method shows that the clustering problem can be solved optimally and efficiently by regularized pseudo-density reconstruction. Depending on how the regularization is used, there is potential for solving various other pattern recognition problems. One direction is to extend our method to a label propagation algorithm for semi-supervised learning: we can derive the solution path of the elastic net over the sparsity parameter, and the solution path can be used to obtain an effective label propagation path and accurately predict the labels of unlabeled data. Another direction is to use the proposed method as a noise removal step to enhance classification performance: the proposed OPS removes low-density points in the nearest neighbor graph, so we can construct the graph from a classification data set and remove the noisy points with the pseudo-density reconstruction with regularization. As the noise sensitivity analysis shows, we expect that the method would efficiently reduce the degree of noise while preserving the normal data points. In addition, an interesting future study is to extend the proposed method to identify outliers. The outer points provided by the OPS algorithm are possible outliers because they lie in low-density regions. Therefore, it would be worthwhile to consider an additional decision rule that identifies outliers among the outer points instead of directly assigning cluster labels to all of them.

Acknowledgments

The authors would like to thank the editor and reviewers for their useful comments and suggestions, which were of great help in improving the quality of the paper. This research was supported by Brain Korea PLUS; the Basic Science Research Program through the National Research Foundation of Korea, funded by the Ministry of Science, ICT and Future Planning (NRF-2016R1A2B1008994); the Ministry of Trade, Industry & Energy under the Industrial Technology Innovation Program (R1623371); and by an Institute for Information & communications Technology Promotion grant funded by the Korea government (No. 2018-0-00440, ICT-based Crime Risk Prediction and Response Platform Development for Early Awareness of Risk Situation).

References

[1] X.L. Sui, L. Xu, X. Qian, T. Liu, Convex clustering with metric learning, Pattern Recognit. 81 (2018) 575–584. https://doi.org/10.1016/j.patcog.2018.04.019.
[2] Q. Li, Y. Ren, L. Li, W. Liu, Fuzzy based affinity learning for spectral clustering, Pattern Recognit. 60 (2016) 531–542. https://doi.org/10.1016/j.patcog.2016.06.011.
[3] J.A. Hartigan, M.A. Wong, Algorithm AS 136: a k-means clustering algorithm, J. R. Stat. Soc. C-Appl. 28 (1979) 100–108. https://doi.org/10.2307/2346830.
[4] A. Lukasová, Hierarchical agglomerative clustering procedure, Pattern Recognit. 11 (1979) 365–381. https://doi.org/10.1016/0031-3203(79)90049-9.
[5] F. De Morsier, D. Tuia, M. Borgeaud, V. Gass, J.-P. Thiran, Cluster validity measure and merging system for hierarchical clustering considering outliers, Pattern Recognit. 48 (2015) 1478–1489. https://doi.org/10.1016/j.patcog.2014.10.003.
[6] D. Comaniciu, P. Meer, Mean shift: a robust approach toward feature space analysis, IEEE Trans. Pattern Anal. Mach. Intell. 24 (2002) 603–619. https://doi.org/10.1109/34.1000236.
[7] Y. Qin, Z.L. Yu, C.-D. Wang, Z. Gu, Y. Li, A novel clustering method based on hybrid k-nearest-neighbor graph, Pattern Recognit. 74 (2018) 1–14. https://doi.org/10.1016/j.patcog.2017.09.008.
[8] A.Y. Ng, M.I. Jordan, Y. Weiss, On spectral clustering: analysis and an algorithm, in: Advances in Neural Information Processing Systems, 2002, pp. 849–856.
[9] W. Jiang, W. Liu, F.L. Chung, Knowledge transfer for spectral clustering, Pattern Recognit. 81 (2018) 484–496. https://doi.org/10.1016/j.patcog.2018.04.018.
[10] U. von Luxburg, A tutorial on spectral clustering, Stat. Comput. 17 (2007) 395–416. https://doi.org/10.1007/s11222-007-9033-z.
[11] Z. Li, J. Liu, S. Chen, X. Tang, Noise robust spectral clustering, in: IEEE 11th International Conference on Computer Vision, 2007, pp. 1–8. https://doi.org/10.1109/ICCV.2007.4409061.
[12] H. Huang, S. Yoo, H. Qin, D. Yu, A robust clustering algorithm based on aggregated heat kernel mapping, in: IEEE 11th International Conference on Data Mining, 2011, pp. 270–279. https://doi.org/10.1109/ICDM.2011.15.
[13] A. Bojchevski, Y. Matkovic, S. Günnemann, Robust spectral clustering for noisy data: modeling sparse corruptions improves latent embeddings, in: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017, pp. 737–746. https://doi.org/10.1145/3097983.3098156.
[14] X. Li, B. Kao, S. Luo, M. Ester, ROSC: robust spectral clustering on multi-scale data, in: Proceedings of the 2018 World Wide Web Conference on World Wide Web, 2018, pp. 157–166. https://doi.org/10.1145/3178876.3185993.
[15] K. Kim, J. Lee, Nonlinear dynamic projection for noise reduction of dispersed manifolds, IEEE Trans. Pattern Anal. Mach. Intell. 36 (2014) 2303–2309. https://doi.org/10.1109/TPAMI.2014.2318727.
[16] B.K. Natarajan, Sparse approximate solutions to linear systems, SIAM J. Comput. 24 (1995) 227–234. https://doi.org/10.1137/S0097539792240406.
[17] R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. B-Met. (1996) 267–288.
[18] H. Zou, T. Hastie, Regularization and variable selection via the elastic net, J. R. Stat. Soc. B 67 (2005) 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x.
[19] J. Friedman, T. Hastie, H. Höfling, R. Tibshirani, Pathwise coordinate optimization, Ann. Appl. Stat. 1 (2007) 302–332. https://doi.org/10.1214/07-AOAS131.
[20] J. Liu, S. Ji, J. Ye, SLEP: Sparse Learning with Efficient Projections, 6, Arizona State University, 2009, p. 7.
[21] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Found. Trends Mach. Learn. 3 (2011) 1–122. https://doi.org/10.1561/2200000016.
[22] G. Bounova, O. de Weck, Overview of metrics and their correlation patterns for multiple-metric topology analysis on heterogeneous graph ensembles, Phys. Rev. E 85 (2012) 016117. https://doi.org/10.1103/PhysRevE.85.016117.
[23] P. Fränti, O. Virmajoki, Iterative shrinking method for clustering problems, Pattern Recognit. 39 (2006) 761–775. https://doi.org/10.1016/j.patcog.2005.09.012.
[24] G. Karypis, E.-H. Han, V. Kumar, Chameleon: hierarchical clustering using dynamic modeling, Computer 32 (1999) 68–75. https://doi.org/10.1109/2.781637.
[25] B.J. Frey, D. Dueck, Clustering by passing messages between data points, Science 315 (2007) 972–976. https://doi.org/10.1126/science.1136800.
[26] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in: KDD, 1996, pp. 226–231.
[27] L. McInnes, J. Healy, Accelerated hierarchical density based clustering, in: 2017 IEEE International Conference on Data Mining Workshops, 2017, pp. 33–42. https://doi.org/10.1109/ICDMW.2017.12.
[28] L. Hubert, P. Arabie, Comparing partitions, J. Classif. 2 (1985) 193–218. https://doi.org/10.1007/BF01908075.
[29] L.J. Hubert, F.B. Baker, The comparison and fitting of given classification schemes, J. Math. Psychol. 16 (1977) 233–253. https://doi.org/10.1016/0022-2496(77)90054-2.
[30] C. Blake, C. Merz, UCI Repository of Machine Learning Databases, University of California, Department of Information and Computer Science, Irvine, CA, 1998.
[31] Y. LeCun, The MNIST database of handwritten digits, http://yann.lecun.com/exdb/mnist/, 1998.
[32] S.A. Nene, S.K. Nayar, H. Murase, Columbia Object Image Library (COIL-20), 1996.
[33] D. Anguita, A. Ghio, L. Oneto, X. Parra, J.L. Reyes-Ortiz, A public domain dataset for human activity recognition using smartphones, ESANN, 2013.
[34] F. Alimoglu, E. Alpaydin, Y. Denizhan, Combining multiple classifiers for pen-based handwritten digit recognition, 1996.
[35] R.C.B. Madeo, S.M. Peres, C.A. de Moraes Lima, Gesture phase segmentation using support vector machines, Expert Syst. Appl. 56 (2016) 100–115. https://doi.org/10.1016/j.eswa.2016.02.021.
[36] J.J. Hull, A database for handwritten text recognition research, IEEE Trans. Pattern Anal. Mach. Intell. 16 (1994) 550–554. https://doi.org/10.1109/34.291440.

Younghoon Kim received a Ph.D. in Industrial Management Engineering in 2019 from Korea University, Seoul, Korea. His research interests include feature selection algorithms for high-dimensional data and discrete optimization-based machine learning methods.

Hyungrok Do received his B.S. in 2014 and is currently a Ph.D. candidate in the Department of Industrial Management Engineering at Korea University, Seoul, Korea. His research interests include machine learning algorithms and optimization.

Seoung Bum Kim is a Professor in the Department of Industrial Management Engineering at Korea University. From 2005–2009, he was an Assistant Professor in the Department of Industrial & Manufacturing Systems Engineering at the University of Texas at Arlington. He received an M.S. in Industrial and Systems Engineering in 2001, an M.S. in Statistics in 2004, and a Ph.D. in Industrial and Systems Engineering in 2005 from the Georgia Institute of Technology. Dr. Kim's research interests utilize data mining methodologies to create new methods for various problems appearing in engineering and science. He has expertise in machine learning algorithms for feature extraction/selection problems. He has published more than 100 internationally recognized journal papers and refereed conference proceedings. He was awarded the Jack Youden Prize for the best expository paper in Technometrics for the year 2003. He is actively involved with INFORMS, serving as president of the INFORMS Section on Data Mining.