Fast Self-Supervised Clustering With Anchor Graph
Abstract— Benefiting from avoiding the use of labeled samples, which are usually insufficient in the real world, unsupervised learning has been regarded as a speedy and powerful strategy for clustering tasks. However, clustering directly on primal data sets leads to high computational cost, which limits its application to large-scale and high-dimensional problems. Recently, anchor-based theories have been proposed to partly mitigate this problem and yield naturally sparse affinity matrices, while it is still a challenge to obtain excellent performance along with high efficiency. To dispose of this issue, we first present a fast semisupervised framework (FSSF) combining a balanced K-means-based hierarchical K-means (BKHK) method with bipartite graph theory. Thereafter, we propose a fast self-supervised clustering method built on this crucial semisupervised framework, in which all labels are inferred from a constructed bipartite graph with exactly k connected components. The proposed method remarkably accelerates general semisupervised learning through the anchors and consists of four significant parts: 1) obtaining the anchor set as an interim result through the BKHK algorithm; 2) constructing the bipartite graph; 3) solving the self-supervised problem to construct a typical probability model with FSSF; and 4) selecting the most representative points regarding the anchors from BKHK as an interim and conducting label propagation. The experimental results on toy examples and benchmark data sets demonstrate that the proposed method outperforms other approaches.

Index Terms— Bipartite graph, label propagation, self-supervised learning, semisupervised framework, special selection.

I. INTRODUCTION

In general, learning methods can be divided into three categories: supervised learning [9], semisupervised learning (SSL) [10], and unsupervised learning [11]. The first group fully utilizes labeled samples, as in the support vector machine [12], [13] and linear discriminant analysis [14], [15]. The second group merely exploits a few labeled samples, where the rest are all unlabeled sample points. The third group completely uses unlabeled samples, among which principal component analysis [16], [17] is a fundamental and typical dimensionality reduction (DR) [18] approach.

Since labeled data in the real world are extremely deficient, learning label information is treated as an indispensable task in classification [19] and regression missions [20]. However, as the acquisition of labeled samples is tedious and laborious, and sometimes even impossible, prior knowledge is usually not available when dealing with practical problems [21]. Furthermore, label annotations of low quality also have a serious impact on the performance of algorithms. Consequently, semisupervised and unsupervised learning are more generally favored than supervised learning.

SSL leverages both limited labeled data and abundant unlabeled data to learn a more efficient model. For instance, a general semisupervised model combined with novel class discovery [22] has been proposed to predict outliers in unknown data. To tackle large graph problems, neighbor graph construction with SSL has been introduced [23], and large-scale SSL [24] for classification has also been proposed to enhance robustness with the l_{2,p}-norm. However, SSL needs to utilize labeled data after all.
In spectral clustering (SC), the relevant parameters must be set strictly and we must also decide how to build the affinity matrix. The introduction of the Gaussian kernel function makes SC not parameter-free, and SC is time-consuming when solving large-scale problems because the overall computational complexity of general SC is O(n²d), where n and d are the numbers of samples and features, respectively. Thus, a landmark-based sparse representation strategy [32] has been proposed and applied to SC to construct a sparser similarity graph.

As a special kind of unsupervised learning pattern, self-supervision-based learning [33], [34] has attracted much attention from researchers in recent years as a way to further improve clustering performance. Generally, the self-supervised clustering process mainly adopts auxiliary tasks to explore and acquire its own supervision information from existing unlabeled data. Thereafter, the obtained supervisory signals are utilized for subsequent training or supervised learning. In the past two years, self-supervised clustering modules have been widely used in various deep learning networks. For instance, Zhang et al. [35] proposed a self-supervised convolutional subspace clustering network to simultaneously achieve feature learning and subspace clustering. In order to cope with the outlier problem in multiview clustering, Sun et al. [36] proposed a self-supervised deep multiview subspace clustering algorithm, in which the clustering results and the affinity matrix are mutually trained by an integrated deep learning framework to obtain superior clustering performance under multiple views. Comparatively, in recent years, there has been little research on self-supervision-based clustering in the field of machine learning. Ye et al. [37] proposed an affinity learning-based self-supervised diffusion (SSD) for SC to deal with the sensitivity of SC to a fixed affinity matrix, where the clustering results in each iteration provide supervisory signals for the diffusion process. Furthermore, a novel self-supervised clustering method has also been proposed to effectively discover new user intentions [38]. However, the above-mentioned self-supervised clustering studies in the field of machine learning still have potential limitations. Concretely, owing to the property of traditional SC, the iterative process in SSD might bring a serious computational burden when tackling large-scale problems. The study on new user intentions still leverages a small number of labeled samples in its clustering framework, which is essentially not in the realm of self-supervision.

Hence, in order to acquire clustering results with higher quality and faster processing speed, we focus on performing self-supervised clustering tasks from the perspective of semisupervised label propagation. More specifically, we ponder whether it is possible to use sparse theory to integrate semisupervised ideas with unsupervised methods in clustering tasks. Thus, we propose a fast clustering technique referred to as fast self-supervised clustering (FSSC) with anchor graph in this article, which integrates a fast semisupervised framework (FSSF) with unsupervised methods in clustering tasks. The procedures of our algorithm are listed as follows.
1) We utilize the efficient BKHK algorithm to find the anchor set from the original data set, which has lower computational complexity and better effectiveness in comparison with K-means. Therefore, this anchor generation technique is relatively practical, especially when dealing with large-scale problems possessing complex data structures.
2) Using the bipartite graph learned from samples and anchors is significant for building the sparse similarity matrix W, where there is only one parameter k, meaning the k nearest anchors for each sample.
3) Inspired by the SSL framework, we adopt virtual labels of the representative anchors from BKHK to obtain a soft label matrix by solving a semisupervised clustering problem with the proposed FSSF. The soft label matrix expresses the probabilities of the data belonging to the classes defined by the virtual labels.
4) A novel special selection strategy is proposed to find the most representative points, the number of which is equal to the number of real classes c. This procedure is significant for selecting the ultimate real labels. SSL is conducted to perform label propagation at the end of our method.

It is essential to emphasize some aspects of our method.
1) The major measures to accelerate the entire algorithm are threefold: the BKHK algorithm, the construction of a naturally sparse bipartite graph, and the matrix inversion lemma.
2) We take advantage of the fake labels of each anchor point to operate self-supervised learning, which is exactly an unsupervised learning strategy with the proposed FSSF.

The rest of this article is organized as follows. Related works and notations are discussed in Section II. The details of FSSF are demonstrated in Section III. The proposed FSSC method with FSSF is expounded in Section IV. Corresponding experiments on toy and benchmark data sets are elaborated in Section V. Finally, we give the conclusion and prospective works in Section VI.
II. NOTATIONS AND RELATED WORKS

In this section, some related works, including spectral graph theory and some pivotal graph-based clustering techniques, will be briefly reviewed first. Meanwhile, the relevant notations will be described for a clear presentation of these algorithms and theories. The BKHK algorithm will be roughly described at the end as preparation for our main method.

A. Graph-Based Learning

In recent studies, graph-based clustering [39] techniques have been widely applied, mainly in an unsupervised manner. First, we denote the data matrix X = [x_1; x_2; . . . ; x_n] ∈ R^{n×d}, where n and d are the number of samples and the dimensionality, respectively, and x_i ∈ R^d is the ith data point, written as a row vector. Suppose that e_ij denotes the similarity weight connecting x_i and x_j; the Euclidean distance is usually adopted for simplicity. Second, these approaches employ a graph G = (V, E) to model the original data set, where V = X is the graph vertex set and E is the edge set. Associated with each edge e_ij ∈ E, W_ij is a nonnegative weight indicating the similarity between x_i and x_j. Generally, graphs can be divided into directed graphs (W_ij ≠ W_ji) [40] and undirected graphs (W_ij = W_ji) [41]. In this article, we focus on the undirected graph with local neighborhood information, and a brief review of commonly used similarity graphs is provided next.

1) The ε-Neighborhood Graph: For two random data points, if the distance between them is smaller than ε, we connect them on the same scale, which yields an unweighted graph. However, the scale of ε is not easy to adjust based on the density distribution of the primal data sets.

2) k-Nearest Neighbor Graph: We connect vertex x_i with data point x_j if x_j belongs to the k-nearest neighbors (KNN) of x_i, which leads to the problem that the similarity graph is not symmetric unless we operate W = (W^T + W)/2 [42]. If we impose the stronger condition that the two sample points are nearest neighbors of each other, the corresponding graph is referred to as the mutual KNN graph.

3) The Fully Connected Graph: All points are simply connected with nonnegative similarity values to each other. For instance, the Gaussian kernel function [43]

$$W_{ij} = W_{ji} = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right) \qquad (1)$$

can be utilized, where the parameter σ is the width of the Gaussian function. This parameter plays a similar role as the parameter ε in the ε-neighborhood graph. We are capable of utilizing various criteria, including the Euclidean distance, Mahalanobis distance, Minkowski distance, and cosine similarity, to improve the graph.

4) Adaptive Gaussian Method Graph: In order to pursue a parameter-free measurement, the self-tuning graph with the Gaussian function [44] has been proposed. The similarity between two vertices x_i and x_j can be written as

$$W_{ij} = \exp\!\left(-\frac{d^2(x_i, x_j)}{\sigma_i \sigma_j}\right) \qquad (2)$$

where σ_i represents the local scaling for x_i, which is the distance between vertex x_i and its farthest considered neighbor. Therefore, this parameter can be formulated as

$$\sigma_i = d(x_i, x_k) \qquad (3)$$

where x_k is the kth neighbor of x_i. This strategy will also be used in the following bipartite graph construction.

To consider the connection between each point and avoid the high computational complexity caused by too many iterations of K-means on large-scale data sets, classical SC methods are divided into two steps: 1) solving a relaxed continuous optimization problem [we denote by tr(·) the trace operator]

$$H = \arg\min_{H^T H = I} \operatorname{tr}(H^T L H) \qquad (4)$$

where L is the graph Laplacian matrix derived from the affinity matrix W, to obtain a constrained matrix H; and 2) applying K-means or spectral rotation to build the indicator matrix of clusters.
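To make the graph construction above concrete, the following is a minimal NumPy sketch (not the authors' released code) of the self-tuning k-nearest-neighbor affinity in (2) and (3); the helper name `self_tuning_affinity` and the small numerical safeguard are our own choices.

```python
import numpy as np

def self_tuning_affinity(X, k=7):
    """Symmetric kNN graph with self-tuning Gaussian weights, cf. (2)-(3).

    X : (n, d) data matrix, one sample per row.
    k : number of neighbors; sigma_i is the distance to the kth neighbor.
    """
    n = X.shape[0]
    # pairwise squared Euclidean distances
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.maximum(d2, 0.0, out=d2)
    dist = np.sqrt(d2)

    # local scale sigma_i = distance to the kth neighbor (index 0 is the point itself)
    order = np.argsort(dist, axis=1)
    sigma = dist[np.arange(n), order[:, k]]

    W = np.zeros((n, n))
    for i in range(n):
        for j in order[i, 1:k + 1]:                 # keep only the k nearest neighbors
            W[i, j] = np.exp(-d2[i, j] / (sigma[i] * sigma[j] + 1e-12))
    return (W + W.T) / 2.0                           # symmetrize, W = (W + W^T)/2
```

The spectral embedding H in (4) would then correspond to the eigenvectors of L = D − W associated with its smallest eigenvalues, followed by K-means or spectral rotation.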
However, on the one hand, the affinity matrix directly obtained from the original data incorporates noisy and redundant information, which severely affects the clustering performance. In order to dispose of this issue, Zhu et al. [45] adopted a low-dimensional subspace and a low-rank constraint to dynamically learn the similarity matrix effectively and efficiently, an idea that has also been utilized in a novel unsupervised feature selection method [46] to preserve local and global structure simultaneously. In the multiview clustering field, owing to the flaws of a fixed common affinity matrix, the one-step multiview SC (OMSC) method [47] was proposed with discriminative weights for the diverse views, where the mapping matrix and the affinity matrices of the different views are optimized dynamically to acquire better clustering performance. Besides, a fuzzy and robust multiview clustering model [48] was proposed to strengthen the stability of each view, where a sparser affinity matrix with more abundant and helpful information is obtained in that framework. Furthermore, in multitask SC [49], the affinity matrix and the mapping function are learned simultaneously with mutual improvement to effectively explore intertask correlation.

On the other hand, owing to the extra heat kernel parameter and the singular value decomposition of the affinity matrix built from the primal data, classical SC is generally not efficient for tackling large-scale problems, its computational complexity being O(n²d + n³). Hence, anchor-based theory [50], [51] has been introduced in recent years to generate a bipartite graph connecting an anchor layer and a sample layer, where the anchors are a series of representative points that roughly cover the entire sample set. Many scholars have utilized bipartite graphs to build a naturally sparse similarity matrix by adjusting the number of anchors and their nearest neighbors. Zhu et al. [52] proposed a fast SC to construct a parameter-free large graph with effective neighbor assignment. Subsequently, double anchor layers and even hierarchical bipartite graphs were proposed and utilized to explore more explicit connection relationships among all samples, e.g., representative point-based spectral clustering (RPSC) [53] and SC based on a hierarchical bipartite graph (SCHBG) [54]. A random walk-based Laplacian matrix [55] was also proposed to balance the anchors and samples and improve clustering performance. Furthermore, anchor-based graphs have recently been combined with SSL to efficiently deal with large-scale problems [56]–[58]; such methods are also able to effectively dispose of the out-of-sample problem.

In this article, inspired by the integration of anchor-based theory and SSL, the proposed FSSC method with FSSF first and quickly obtains the crucial probability model between anchors and samples, which represents the belonging relationship of the fake labels for each sample. The novel special selection strategy is then conducted to choose the most representative points with the best quality by extracting the maximum score for each sample, a score that reflects the probability that the sample point belongs to the outliers. Terminally, the clustering results are gained by the label propagation process. As an essential part of generating anchor points, the BKHK algorithm will be roughly described later.

B. Balanced K-Means-Based Hierarchical K-Means

In the past few years, the anchor-based theory has been introduced to accelerate the construction of a connected graph. For a specific quantity of anchors, generally, the number of anchors is far less than that of samples, while too sparse an anchor set cannot sufficiently represent the large-scale problem. In addition, for concrete selection methods of anchors, on the one hand, though we are able to randomly select representative points to quickly view the local structure, it is difficult to reach satisfying performance with the graph constructed from the original samples and these randomly selected anchors. On the other hand, utilizing K-means can obtain desirable representative anchors with ideal performance; however, its computational complexity is extremely high when solving large-scale problems. Thus, our overall method starts with the speedy and steady BKHK algorithm [52] to find representative anchors.

Fig. 1 vividly shows the entire course of the BKHK algorithm. Similar to the cell division process, this algorithm adopts a balanced binary tree structure, segments an almost equal number of samples into two clusters in a balanced way, and hierarchically processes each newly obtained cluster (if p hierarchies are generated, 2^p representative anchors are obtained to constitute the anchor set U). Since BKHK is really efficient on large data sets with high dimensions or many sample points, it is adopted to accelerate the graph learning and obtain the anchor set U as an interim result. To simplify the construction of the label matrix Y, the indices of all anchors learned from the primal data sets are recorded.
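The following is a simplified sketch of the hierarchical balanced two-way splitting in the spirit of BKHK [52]; it is not the authors' implementation, and the balanced assignment here is approximated by sorting the signed distance difference to the two centers, which is one common way to enforce equal-sized halves.

```python
import numpy as np

def balanced_two_means(X, n_iter=10, rng=None):
    """One balanced 2-means split: each half receives (almost) n/2 samples."""
    rng = np.random.default_rng(rng)
    c = X[rng.choice(len(X), 2, replace=False)]            # two initial centers
    for _ in range(n_iter):
        d0 = np.linalg.norm(X - c[0], axis=1)
        d1 = np.linalg.norm(X - c[1], axis=1)
        order = np.argsort(d0 - d1)                        # smallest d0 - d1 -> cluster 0
        half = len(X) // 2
        labels = np.ones(len(X), dtype=int)
        labels[order[:half]] = 0
        c = np.stack([X[labels == 0].mean(axis=0), X[labels == 1].mean(axis=0)])
    return labels, c

def bkhk_anchors(X, depth):
    """Hierarchically split the data 'depth' times; returns 2**depth anchors."""
    clusters = [X]
    for _ in range(depth):
        nxt = []
        for C in clusters:
            labels, _ = balanced_two_means(C)
            nxt.extend([C[labels == 0], C[labels == 1]])
        clusters = nxt
    return np.stack([C.mean(axis=0) for C in clusters])     # anchor set U
```

With p = depth hierarchies, the sketch returns 2^p anchors, matching the description above.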
III. FAST SEMISUPERVISED FRAMEWORK

In this section, the details of the FSSF are described. First, we introduce a general SSL framework for clustering tasks. Second, the acceleration strategy for this framework on the similarity matrix is demonstrated. Since the labels of the labeled points and of the remaining points should be treated differently, the parameters α_l and α_u, related to the regularization parameter μ, will be introduced; their meaning will be explained later.

A. General Semisupervised Framework

Consider a graph G = (V, E) with V nodes corresponding to n sample points, in which the first l nodes denote the l labeled data points and the rest u nodes denote the abundant u unlabeled data points. Throughout, ||·||_F denotes the Frobenius norm of a matrix, i.e., ||M||²_F = tr(M^T M). According to the constructed graph, the general semisupervised framework can be formulated as the following cost function:

$$Q(F) = \sum_{i,j=1}^{n} W_{ij}\,\|F_i - F_j\|_F^2 + \sum_{i=1}^{n} \mu_i\,\|F_i - Y_i\|_F^2. \qquad (5)$$

The first term in this cost function is a clustering term, which enforces that similar samples have similar labels. In detail, if the similarity between the ith data point and the jth data point is extremely high, then, to minimize the cost function, F_i must be set almost equal to F_j. If the similarity between them is almost zero or equal to zero, the constraint on these two samples will be relatively weak. We denote by μ_i > 0 a regularization parameter for each data point. Accordingly, the second term is a regularization term on the labels, measuring the discrepancy between the obtained soft labels F_i and the primal labels Y_i. On the one hand, if μ_i is set to zero, the label constraint becomes invalid and this problem is purely regarded as a label propagation process. On the other hand, if μ_i is made infinitely large, the initial label Y_i will be maintained. The concrete setting for different samples will be elaborated when two novel parameters are introduced subsequently.

To cope with this problem, we set the corresponding SSL model as

$$\min_{F}\ \sum_{i,j=1}^{n} W_{ij}\,\|F_i - F_j\|_F^2 + \sum_{i=1}^{n} \mu_i\,\|F_i - Y_i\|_F^2. \qquad (6)$$

Then, we expand the Frobenius norm and obtain the matrix form of the problem as

$$\min_{F}\ \sum_{i,j=1}^{n} W_{ij}\,(F_i - F_j)(F_i - F_j)^T + \operatorname{tr}\!\left[(F - Y)^T U (F - Y)\right] \qquad (7)$$

where U is an n × n diagonal matrix whose ith entry is μ_i. The problem can be transformed through an identity deformation into

$$\min_{F}\ \operatorname{tr}(F^T L F) + \operatorname{tr}\!\left[(F - Y)^T U (F - Y)\right] \qquad (8)$$

where L = D − W is referred to as the unweighted Laplacian matrix.

Let us denote the objective function in (8) by P(F). Since the optimal solution of problem (8) satisfies that the derivative of P(F) with respect to F is equal to zero, we obtain

$$\frac{\partial P(F)}{\partial F}\Big|_{F=F^*} = 2LF^* + 2U(F^* - Y) = 0. \qquad (9)$$

Through simplification, the optimal solution F* can be obtained as

$$F^* = (L + U)^{-1} U Y \qquad (10)$$

where the sum of each row of F* is equal to 1, which represents a classical probability model for easier follow-up data processing. [We will see the strict proof of the probability model following (12), once two novel parameters have been introduced.]

As for the evaluation of the probability of each sample point belonging to the outliers, which will be demonstrated in the following parts, two parameters α_l and α_u should be introduced to simplify the parameter-setting process.

Owing to the introduction of this evaluation strategy, we need to set different constraints for labeled samples and unlabeled samples. Generally, for a data point x_i, whether x_i is labeled or unlabeled, the value of the corresponding α is calculated by

$$\alpha_i = \frac{d_i}{d_i + \mu_i} \qquad (11)$$

where d_i is the ith element of the degree matrix D obtained from W, so that α_i is determined by d_i and the regularization parameter μ_i. Under this condition, the value range of α_i is limited to [0, 1], which is really convenient for parameter selection. The corresponding model in the framework for particular values of α_i will be illustrated below.

Meanwhile, β_i = μ_i/(d_i + μ_i) is introduced, which satisfies β_i = 1 − α_i. Assembling the diagonal matrices I_α and I_β, we easily see that I_α = I − I_β, where I is the n × n identity matrix and I_α is an n × n diagonal matrix with the ith entry being α_i.

Defining P = D^{-1}W, (10) becomes

$$F^* = (D - W + U)^{-1} U Y = (I - D^{-1}W + D^{-1}U)^{-1}(D^{-1}U)\,Y = (I_\alpha - I_\alpha D^{-1}W + I_\beta)^{-1} I_\beta Y = (I - I_\alpha P)^{-1} I_\beta Y \qquad (12)$$

which is the ultimate solution for the soft label matrix F in the general semisupervised framework.

Based on the concrete construction details of P and Y, it can be easily found that P1_n = 1_n and Y1_{c+1} = 1_n, where 1_n ∈ R^{n×1} and 1_{c+1} ∈ R^{(c+1)×1} indicate column vectors whose elements are all one. Combined with the traits of I_α and I_β, we have

$$I_\alpha P \mathbf{1}_n + I_\beta Y \mathbf{1}_{c+1} = \mathbf{1}_n \;\Rightarrow\; I_\beta Y \mathbf{1}_{c+1} = (I - I_\alpha P)\,\mathbf{1}_n \;\Rightarrow\; (I - I_\alpha P)^{-1} I_\beta Y \mathbf{1}_{c+1} = \mathbf{1}_n \qquad (13)$$

which embodies that the obtained solution F* in this semisupervised framework possesses the characteristic of a probability model.

Furthermore, the difference in concrete value and corresponding meaning between α_l and α_u is elaborated and discussed as follows.
1) Assume a labeled data point x_i first. When we can absolutely guarantee the validity of the initial label, this label should remain unchanged, and α_i is set to zero. In this condition, μ_i should be set large enough in (11), which promotes the effect of the fitting term in problem (6) and contributes to the complete fixation of the label of x_i. Otherwise, we should give α_i a positive value so that the initial label is adjustable, which can effectively handle noise in the initial labels.
2) Second, for an unlabeled data point x_j, if we are sure that no novel class can exist in the data set, which means every sample belongs to one of the assumed c classes, α_j is set to 1 and the value of μ_j is low enough compared with d_j, so that the optimization only conducts the clustering term according to the primitive graph. Generally, we set α_j to a relatively large value but lower than 1, since the number of outliers is usually small in practice.
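As a small numerical illustration of (11) and (12) (our own sketch, not the released code), the soft labels of the general framework can be obtained by a single dense solve; this is only sensible for small n, and the anchor-based acceleration in Section III-B removes the n × n system entirely.

```python
import numpy as np

def semisupervised_soft_labels(W, Y, alpha):
    """Closed-form soft labels F* = (I - I_alpha P)^{-1} I_beta Y, cf. (12).

    W     : (n, n) symmetric affinity matrix.
    Y     : (n, c+1) initial label matrix whose rows sum to 1.
    alpha : (n,) vector with alpha_i = d_i / (d_i + mu_i) in [0, 1].
    """
    d = W.sum(axis=1)
    P = W / d[:, None]                       # P = D^{-1} W, row-stochastic
    I_alpha = np.diag(alpha)
    I_beta = np.diag(1.0 - alpha)
    n = W.shape[0]
    F = np.linalg.solve(np.eye(n) - I_alpha @ P, I_beta @ Y)
    # when P 1 = 1 and Y 1 = 1, every row of F sums to 1, cf. (13)
    return F
```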
B. Acceleration on Similarity Matrix W

However, the similarity matrix W in (12), constructed purely by the Gaussian kernel function, is gained directly from the primal data set X, which means that the solution process of the general semisupervised clustering framework is not fast enough. Therefore, in order to accelerate the general framework, we utilize the anchor set U from BKHK to construct a naturally sparse bipartite graph [59] that builds the connection between the anchor set U and the original data set X. We assume that the number of anchors is m and that the objective bipartite graph matrix is B = [b_1^T; b_2^T; . . . ; b_n^T] ∈ R^{n×m}, which encodes the similarity between samples and representative anchors. The original problem can be formulated as

$$\min_{b_i^T\mathbf{1}=1,\ b_i \geq 0}\ \sum_{j=1}^{m} h_{ij} b_{ij} + \gamma \sum_{j=1}^{m} b_{ij}^2 \qquad (14)$$

where h_ij = ||x_i − u_j||²_2 is the squared Euclidean distance between the ith sample and the jth anchor u_j, adopted for simplicity. The first term is the fidelity term, which encourages smoothness between the anchors and the original samples. The second term is the sparse term, through which B can become as sparse as possible when the value of γ is set appropriately. Thus, problem (14) becomes

$$\max_{\gamma,\ \|b_i\|_0 = k}\ \gamma \qquad (15)$$

where k is the number of anchors connected to each original sample in the bipartite graph matrix B; it is a nonzero integer similar to the parameter k in the KNN algorithm. The value of k varies across data sets owing to their distinct data distributions. In general, the parameter k is set from 3 to 30.

There exist two constraints in this problem: the first is the equality constraint b_i^T 1 = 1, which avoids an all-zero vector; the second is the inequality constraint b_i ≥ 0, since B_ij represents a similarity. Thus, through the Lagrangian solution, we obtain the bipartite graph

$$\hat{b}_{ij} = \begin{cases} \dfrac{h_{i,k+1} - h_{ij}}{k\,h_{i,k+1} - \sum_{j'=1}^{k} h_{ij'}}, & \text{if } j \leq k \\[4pt] 0, & \text{if } j > k \end{cases} \qquad (16)$$

where, for each sample x_i, the distances h_{i1} ≤ h_{i2} ≤ · · · ≤ h_{im} to the anchors are assumed to be sorted in ascending order. Moreover, we refine the naturally sparse bipartite graph matrix B obtained from (16): if x_i itself belongs to the anchor set U, its nearest anchor will be itself. Therefore, the selection of neighbors is optimized by disposing of this trivial zero-distance match, which slightly improves performance.

When the construction of the bipartite graph matrix B is accomplished, the symmetric similarity matrix W ∈ R^{n×n} can then be constructed as

$$W = B \Lambda^{-1} B^T \qquad (17)$$

where Λ ∈ R^{m×m} is a diagonal matrix whose jth diagonal element θ_j is the sum of the jth column of B, i.e., θ_j = Σ_{i=1}^{n} b_{ij}. We can easily obtain the corresponding degree matrix D = I from the bipartite graph. The Laplacian matrix L is then equal to I − W. Thus, we have

$$L = I - B \Lambda^{-1} B^T. \qquad (18)$$

In this condition, the number of labeled data points is m, which is equal to the number of representative anchors. The initial label matrix Y ∈ R^{n×(m+1)} and the soft label matrix F ∈ R^{n×(m+1)} change dimension correspondingly. Since D = I from the bipartite graph theory above, α_i is set to 1/(1 + μ_i) and β_i is, correspondingly, equal to μ_i/(1 + μ_i).

According to the solution process of the general semisupervised framework, the optimal soft label matrix F* can be obtained as

$$F^* = (I - I_\alpha W)^{-1} I_\beta Y. \qquad (19)$$

Therefore, the FSSF has been proposed with the optimized, naturally sparse similarity matrix W, where the specific selection of all parameters will be elaborated in Section IV to tackle clustering problems.
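A compact sketch of the anchor graph step — the closed-form rows of B in (16) and the resulting low-rank similarity in (17) — written by us under the stated assumptions (squared Euclidean distances to the anchors, k nearest anchors per sample); the helper names are hypothetical.

```python
import numpy as np

def anchor_bipartite_graph(X, U, k):
    """Row-wise closed form (16): each sample is tied to its k nearest anchors."""
    n, m = X.shape[0], U.shape[0]
    # h_ij = ||x_i - u_j||^2, squared Euclidean distance to every anchor
    h = ((X[:, None, :] - U[None, :, :]) ** 2).sum(axis=2)
    B = np.zeros((n, m))
    idx = np.argsort(h, axis=1)                  # anchors sorted by distance per sample
    for i in range(n):
        nn = idx[i, :k]                          # k nearest anchors
        h_k1 = h[i, idx[i, k]]                   # distance to the (k+1)th nearest anchor
        denom = k * h_k1 - h[i, nn].sum()
        B[i, nn] = (h_k1 - h[i, nn]) / max(denom, 1e-12)
    return B

def anchor_similarity(B):
    """W = B Lambda^{-1} B^T with Lambda_jj the jth column sum of B, cf. (17)."""
    lam = B.sum(axis=0) + 1e-12
    return (B / lam[None, :]) @ B.T
```

Note that W is never needed explicitly in the accelerated solver below; it is only formed here for clarity and for the score-amendment step of Section IV-B.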
IV. FAST SELF-SUPERVISED CLUSTERING

In this section, the details of the proposed FSSC method are described. The setting of the parameters based on the proposed FSSF in self-supervised clustering is demonstrated concretely, followed by the special selection strategy to discover the most representative c points with the best quality for label propagation from the m anchors.

We exploit FSSF to conduct the clustering process in a self-supervised manner, where the m representative anchors are considered simultaneously as the foundation of the bipartite graph and as the m labeled points. The labels are completely artificial and are marked by the selected order of each representative point in the last hierarchy of the BKHK structure. According to the assessment strategy on outliers, we are able to find proper features for each sample to extract the corresponding representative points, where the anchor set U from BKHK is regarded as an interim to obtain the final c points belonging to the c classes of the overall clustering. Label propagation is conducted at the end of our method. The computational complexity analysis for each step and for the overall algorithm is stated at the end of this section. Thus, our approach can be divided into the following parts.

A. Self-Supervised Clustering Algorithm With Fake Labels

The motivation of the self-supervised clustering based on FSSF is to construct a probability model for easier postprocessing. The initial and artificial labels of the m representative points are virtual; they play an important transitive role in FSSF to obtain the matrix F. As for I_α and I_β, on the one hand, we generally set a positive α_i for labeled data x_i to tackle the appearance of noise. However, different from the known labels in traditional SSL, the m representative points are selected to roughly and evenly cover the primal data points, and their acquired fake labels are merely related to the original structure of the n samples. In order to maintain the original distribution of the entire set of representative points, the m different labels of the m representative points in matrix Y are theoretically considered noise-free. Furthermore, this assumption is validated by the experimental results on the benchmark data sets. On the other hand, for unlabeled data x_j, α_j = 1 means that we lose the capability of filtering outliers. Accordingly, α_j is deployed to be relatively large, to alter their labels as much as possible, but lower than 1, to retain the ability to detect and filter outliers in the semisupervised label propagation process.

Moreover, we utilize the matrix inversion lemma [60] to further accelerate the computation of (19) when solving the large-scale problem (in this condition, the inversion of a general n × n matrix is usually time-consuming). This inversion is transformed into the computation of an m × m matrix. Assuming Q_1 = (I − I_α BΛ^{-1}B^T)^{-1} − I, through (I − I_α BΛ^{-1}B^T)(I + Q_1) = I, Q_1 can be transformed into

$$Q_1 = (I - I_\alpha B\Lambda^{-1}B^T)^{-1} I_\alpha B\Lambda^{-1}B^T. \qquad (20)$$

We denote Q_2 = I_α BΛ^{-1}B^T. Since F* = (Q_1 + I)I_β Y, F* can be deformed as follows:

$$\begin{aligned}
F^* &= \left[(I - I_\alpha B\Lambda^{-1}B^T)^{-1} Q_2 + I\right] I_\beta Y \\
&= \left[\left((-I_\alpha B)\left((-I_\alpha B)^{-1} + \Lambda^{-1}B^T\right)\right)^{-1} Q_2 + I\right] I_\beta Y \\
&= -\left((-I_\alpha B)^{-1} + \Lambda^{-1}B^T\right)^{-1}\Lambda^{-1}B^T I_\beta Y + I_\beta Y \\
&= -\left(\Lambda(-I_\alpha B)^{-1} + B^T\right)^{-1} B^T I_\beta Y + I_\beta Y \\
&= -\left(\left(\Lambda + B^T(-I_\alpha B)\right)(-I_\alpha B)^{-1}\right)^{-1} B^T I_\beta Y + I_\beta Y \\
&= I_\alpha B\left(\Lambda + B^T(-I_\alpha B)\right)^{-1} B^T I_\beta Y + I_\beta Y.
\end{aligned} \qquad (21)$$
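The m × m reformulation in (21) is easy to check numerically; the sketch below (ours, not the released code) computes F* without ever forming or inverting an n × n matrix.

```python
import numpy as np

def fast_soft_labels(B, Y, alpha):
    """F* = I_alpha B (Lambda + B^T(-I_alpha B))^{-1} B^T I_beta Y + I_beta Y, cf. (21).

    B     : (n, m) bipartite graph whose rows sum to 1.
    Y     : (n, m+1) virtual label matrix.
    alpha : (n,) vector of alpha_i (here D = I, so alpha_i = 1/(1 + mu_i)).
    """
    lam = B.sum(axis=0)                          # diagonal of Lambda, shape (m,)
    aB = alpha[:, None] * B                      # I_alpha B, (n, m)
    bY = (1.0 - alpha)[:, None] * Y              # I_beta Y, (n, m+1)
    core = np.diag(lam) - B.T @ aB               # Lambda - B^T I_alpha B, (m, m)
    return aB @ np.linalg.solve(core, B.T @ bY) + bY
```

On a small example this agrees with the direct solve (I − I_α W)^{-1} I_β Y of (19) up to numerical precision, while only an m × m system is solved.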
The contributions of the obtained soft label matrix F are as follows.
1) It is relatively convenient for follow-up processing thanks to the probability model of the compact F, and we can judge the rough class of each sample through each F_ij over the m classes defined by the selected anchors. Though the class membership with respect to the m anchors does not represent the ultimate clustering result, we are able to extract significant information to continue the following clustering process.
2) Benefiting from the matrix inversion lemma applied in (20) and (21), the inversion of an n × n matrix has been changed into the inversion of an m × m matrix, which dramatically mitigates the computational cost since m ≪ n in most cases. Concretely, in (21), the computational complexity of (Λ + B^T(−I_α B))^{-1} is O(m²n) + O(m³) + O(mn), and the computational complexity of the remaining matrix calculations is O(m²n) + O(mn). Therefore, we need O(m²n) + O(m³) to calculate the soft label matrix F according to (21), which is linear in the number of samples n.

Algorithm 1 FSSC
Input: The representative anchor set U and the order of all anchors Rank.
Self-supervised clustering with FSSF:
1: Set Y for all samples according to U and Rank.
2: Build B utilizing (16).
3: while not converged do
4:   Calculate F by (19).
5:   Accelerate the computation of F by (21).
6: end while
Selection of c representative points:
7: i = 1.
8: while i < c do
9:   Extract the first m columns of F.
10:  Calculate the sum of each row, score_i, for x_i.
11:  Select the maximum score_i and record its index.
12:  Amend the other scores.
13:  i = i + 1.
14: end while
Label propagation:
15: while not converged do
16:  Calculate T by quickly computing (26).
17: end while
Output: The anticipated clustering labels Z of the n data points from the converged T_final.

B. Selection of c Representative Points

F_ij only represents the probability of the ith sample belonging to the jth representative anchor or to the outliers, instead of the belonging relationship between samples and true classes. In detail, since it is certain that the number of true classes in the data set is c, there remains the question of how to express the probability of each sample belonging to the c classes. The focus of this problem is how to use the m obtained anchors to obtain c representative points from the n primal samples. Thus, we consider extracting a unique score for each sample from the matrix F to represent the significance of not being an outlier.

Since the (m + 1)th column of F ∈ R^{n×(m+1)} represents the underlying probability of being an outlier, we delete this column and keep the first m columns to obtain F̂ ∈ R^{n×m} for preprocessing. For a sample x_i, we can easily find that the greater the sum of the ith row of F̂, the less likely it is that the ith sample will be identified as an outlier of the data set when we are sure that the true labels contain c classes. We set α_u ≠ 1 to avoid the row sum of every unlabeled sample being exactly 1, which would have a negative impact when regarding the row sum as a proper selection score.

Therefore, a special selection strategy to extract the c representative points with the best quality from the n samples is introduced, which aims to find one corresponding representative point per class in each of the c classes and to avoid the absence of a representative point in any class. It should be emphasized that there is actually no correlation between the c representative points and the c cluster centers. On the one hand, a larger extracted score for the ith sample merely indicates that the ith sample is more likely to be included among the known c true classes. On the other hand, the selection of subsequent representative points is subject to the previously selected ones, which only aims to choose representative points from different classes, respectively. It is only an ideal situation in which the selected c representative points coincide with the cluster centers of each category, and the probability of this is extremely small.

Since the number of selected representative points is exactly c, whenever we choose a representative point, we should suppress the sample points with high similarity to it. We therefore correct the row sums of all other samples: the more similar a sample is to the selected representative point, the smaller its corrected row sum becomes. Thus, the probability of other highly similar points being selected will be very small.

Inspired by feature selection, we denote the sum of the ith row of F̂ as the score of the ith sample point:

$$\text{score}(x_i) = \sum_{j=1}^{m} \hat{F}_{ij} \qquad (22)$$

where we need to choose the maximum score among the n samples, which means that the larger the score, the more likely the sample is not an outlier and is a representative of one of the c classes with the best quality.

When the first representative point, say x_i, is selected, the scores of the other samples are amended by their similarity to x_i in W. Denoting the index of the first selected point by z_1, the selection can be formulated as

$$z_1 = \arg\max_i\ \text{score}(x_i). \qquad (23)$$

For any point x_j different from x_i, with feature score score(x_j), the modification process is

$$\text{score}(x_j)_{\text{new}} = (1 - W_{ij})\,\text{score}(x_j) \qquad (24)$$

where the scores of the other samples are amended by (24) one by one. Then, we choose the sample x_h with the second largest score once all the amending is accomplished. The algorithm ends when c representative points have been selected completely. These c representative points are treated as the initial points with true labels to operate the label propagation process.
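A minimal sketch of the special selection strategy in (22)–(24), written by us: the scores are the row sums of the first m columns of F, and after each pick the remaining scores are damped by similarity to the chosen point.

```python
import numpy as np

def select_representatives(F, W, c):
    """Greedy selection of c representative points, cf. (22)-(24).

    F : (n, m+1) soft label matrix; the last column is the outlier probability.
    W : (n, n) similarity matrix between samples (e.g., B Lambda^{-1} B^T).
    c : number of true classes.
    """
    score = F[:, :-1].sum(axis=1)              # (22): drop the outlier column, sum rows
    chosen = []
    for _ in range(c):
        i = int(np.argmax(score))              # (23): current best-scoring sample
        chosen.append(i)
        score = (1.0 - W[i]) * score           # (24): damp samples similar to x_i
        score[i] = -np.inf                     # never pick the same point twice
    return chosen
```

The returned indices then serve as the c "labeled" seeds for the final label propagation step.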
Considering that m ≪ n and c ≪ n, the overall computational complexity of the FSSC method is O(nmd).

V. EXPERIMENTAL RESULTS

In this section, we first validate our approach and exhibit the main stages of the proposed method graphically on two toy examples. Second, the parameter sensitivity of our method with respect to the number of anchors m, α_l, and α_u is concretely analyzed on five benchmark data sets with the clustering metrics accuracy (ACC), normalized mutual information (NMI), and clustering time. Finally, comparative results with K-means, SC [61], LSC-R [62], LSC-K [62], FSC [52], and FRWL-B [55] are assessed, where we run every method ten times on these five data sets, calculate the mean of all results, and report the metrics ACC, NMI, and running time. These experimental results demonstrate the effectiveness of our algorithm and also validate the computational complexity analysis in Section IV-D.

A. Validation on Toy Examples

We give two toy examples to analyze and validate our method. Fig. 3(a) shows the primal two-moon toy data containing two classes with the same number of samples and 0.12 noise, and Fig. 3(d) shows the initial spheres toy data, which consist of four classes.

Fig. 3. Experiments on two toy examples. (a) Original data of Two Moon. (b) Anchors of Two Moon. (c) Final points of Two Moon. (d) Original data of Spheres. (e) Anchors of Spheres. (f) Final points of Spheres.

From Fig. 3(b), it can be seen that the number of labeled representative anchors is set to 16, which is reasonable for 400 samples and two classes. We can easily find that the selection of anchor points is roughly uniform within and between classes, where the red diamonds represent the anchor points of the first class and the blue squares represent the anchor points of the second class. Combined with the theory of the bipartite graph construction, each sample x_i will be most similar to the nearest anchor point of the same category. The anchor results on the spheres data set have similar characteristics to those of the two-moon data set, as shown in Fig. 3(e).

Fig. 3(c) and (f) show the results of the selection of c representative points on these two data sets, where there is exactly one representative point per category.

Fig. 4. Clustering results on two toy examples. (a) Two moon data. (b) Spheres data.

The clustering results on these two data sets are shown in Fig. 4, where the clustering accuracy on the two-moon data achieves 100% and that on the spheres data reaches 99.5%. Though the m anchors and the c representative points sometimes fluctuate slightly in different runs, the desirable results are always maintained in these two toy examples.
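For reference, a setting close to the two-moon toy data can be generated with scikit-learn's `make_moons` (400 samples, noise 0.12, matching the description above); the calls below only string together the hypothetical helper sketches from the previous sections and are illustrative, not the authors' experimental code.

```python
import numpy as np
from sklearn.datasets import make_moons

# two-moon toy data: 400 samples, two classes, noise level 0.12 (as described above)
X, y_true = make_moons(n_samples=400, noise=0.12, random_state=0)

U = bkhk_anchors(X, depth=4)               # 2^4 = 16 anchors, as in Fig. 3(b)
B = anchor_bipartite_graph(X, U, k=3)      # each sample tied to its 3 nearest anchors
W = anchor_similarity(B)                   # naturally sparse similarity, cf. (17)
```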
B. Parameter Sensitivity

TABLE I. Description of Data Sets.

We begin by describing our experimental benchmark data sets. Their specific characteristics, i.e., the number of instances, dimensions, and ground-truth (GT) classes, are all listed in Table I. Two of them, Abalone and Letter, are from the UCI machine learning repository [63], while PalmData25 with a 16 × 16 image scale, the US Postal handwritten digit data set (USPS) with a 16 × 16 image scale, and the Mixed National Institute of Standards and Technology handwritten digit data set (MNIST) with a 24 × 24 image scale are image data sets.

The three essential parameters in our method are the number of representative anchors m and the regularization parameters α_l and α_u. The influence of these parameters on running time and performance is validated as follows.
Fig. 5. Trend of ACC, NMI, and clustering time via adjusting the number of anchors on five benchmark data sets. (a) PalmData25. (b) Abalone. (c) USPS.
(d) Letter. (e) MNIST.
Fig. 7. Clustering accuracy of Abalone via adjusting parameters αl and αu when fixing k and m.

Fig. 9. Clustering accuracy of Letter via adjusting parameters αl and αu when fixing k and m.
TABLE II. Comparison in Terms of ACC (%).

TABLE III. Comparison in Terms of NMI (%).
REFERENCES

[4] M. Hausknecht, W.-K. Li, M. Mauk, and P. Stone, "Machine learning capabilities of a simulated cerebellum," IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 3, pp. 510–522, Mar. 2017.
[5] W. Kim, M. S. Stankovic, K. H. Johansson, and H. J. Kim, "A distributed support vector machine learning over wireless sensor networks," IEEE Trans. Cybern., vol. 45, no. 11, pp. 2599–2611, Nov. 2015.
[6] X. Li, M. Chen, F. Nie, and Q. Wang, "A multiview-based parameter free framework for group detection," in Proc. AAAI, 2017, pp. 4147–4153.
[7] X.-D. Wang, R.-C. Chen, Z.-Q. Zeng, C.-Q. Hong, and F. Yan, "Robust dimension reduction for clustering with local adaptive learning," IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 3, pp. 657–669, Mar. 2019.
[8] Z. Feng and Y. Zhu, "A survey on trajectory data mining: Techniques and applications," IEEE Access, vol. 4, pp. 2056–2067, 2016.
[9] Z. Li, Z. Zhang, J. Qin, Z. Zhang, and L. Shao, "Discriminative Fisher embedding dictionary learning algorithm for object recognition," IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 3, pp. 786–800, Mar. 2020.
[10] X. Fang et al., "Flexible affinity matrix learning for unsupervised and semisupervised classification," IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 4, pp. 1133–1149, Apr. 2019.
[11] H. Jia, Y.-M. Cheung, and J. Liu, "A new distance metric for unsupervised learning of categorical data," IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 5, pp. 1065–1079, May 2016.
[12] F. Cai and V. Cherkassky, "Generalized SMO algorithm for SVM-based multitask learning," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 6, pp. 997–1003, Jun. 2012.
[13] A. Dong, F.-L. Chung, Z. Deng, and S. Wang, "Semi-supervised SVM with extended hidden features," IEEE Trans. Cybern., vol. 46, no. 12, pp. 2924–2937, Dec. 2016.
[14] B. Leng, J. Zeng, M. Yao, and Z. Xiong, "3D object retrieval with multitopic model combining relevance feedback and LDA model," IEEE Trans. Image Process., vol. 24, no. 1, pp. 94–105, Jan. 2015.
[15] Y. Aliyari Ghassabeh, F. Rudzicz, and H. A. Moghaddam, "Fast incremental LDA feature extraction," Pattern Recognit., vol. 48, no. 6, pp. 1999–2012, Jun. 2015.
[16] C. Alzate and J. A. K. Suykens, "Multiway spectral clustering with out-of-sample extensions through weighted kernel PCA," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 2, pp. 335–347, Feb. 2010.
[17] Z. Khan, F. Shafait, and A. Mian, "Joint group sparse PCA for compressed hyperspectral imaging," IEEE Trans. Image Process., vol. 24, no. 12, pp. 4934–4942, Dec. 2015.
[18] Z. Lai, Y. Xu, J. Yang, L. Shen, and D. Zhang, "Rotational invariant dimensionality reduction algorithms," IEEE Trans. Cybern., vol. 47, no. 11, pp. 3733–3746, Nov. 2017.
[19] L. Gan, J. Xia, P. Du, and Z. Xu, "Dissimilarity-weighted sparse representation for hyperspectral image classification," IEEE Geosci. Remote Sens. Lett., vol. 14, no. 11, pp. 1968–1972, Nov. 2017.
[20] J. Gan, G. Wen, H. Yu, W. Zheng, and C. Lei, "Supervised feature selection by self-paced learning regression," Pattern Recognit. Lett., vol. 132, pp. 30–37, Apr. 2020.
[21] J. Wang, X. Wang, K. Zhang, K. Madani, and C. Sabourin, "Morphological band selection for hyperspectral imagery," IEEE Geosci. Remote Sens. Lett., vol. 15, no. 8, pp. 1259–1263, Aug. 2018.
[22] F. Nie, S. Xiang, Y. Liu, and C. Zhang, "A general graph-based semi-supervised learning with novel class discovery," Neural Comput. Appl., vol. 19, no. 4, pp. 549–555, Jun. 2010.
[23] L. Berton, "Graph construction based on neighborhood for semisupervised learning," Ph.D. dissertation, Univ. São Paulo, São Paulo, Brazil, 2016.
[24] L. Zhang et al., "Large-scale robust semisupervised classification," IEEE Trans. Cybern., vol. 49, no. 3, pp. 907–917, Mar. 2019.
[25] W.-L. Zhao, C.-H. Deng, and C.-W. Ngo, "K-means: A revisit," Neurocomputing, vol. 291, pp. 195–206, May 2018.
[26] A. Karlekar, A. Seal, O. Krejcar, and C. Gonzalo-Martín, "Fuzzy k-means using non-linear s-distance," IEEE Access, vol. 7, pp. 55121–55131, 2019.
[27] R. Langone and J. A. K. Suykens, "Fast kernel spectral clustering," Neurocomputing, vol. 268, pp. 27–33, Dec. 2017.
[28] H. Zheng and J. Wu, "Which, when, and how: Hierarchical clustering with human–machine cooperation," Algorithms, vol. 9, no. 4, p. 88, Dec. 2016.
[29] C. Fevotte and N. Dobigeon, "Nonlinear hyperspectral unmixing with robust nonnegative matrix factorization," IEEE Trans. Image Process., vol. 24, no. 12, pp. 4810–4819, Dec. 2015.
[30] Y. Pang, J. Xie, F. Nie, and X. Li, "Spectral clustering by joint spectral embedding and spectral rotation," IEEE Trans. Cybern., vol. 50, no. 1, pp. 247–258, Jan. 2020.
[31] R. Zhang, F. Nie, M. Guo, X. Wei, and X. Li, "Joint learning of fuzzy K-means and nonnegative spectral clustering with side information," IEEE Trans. Image Process., vol. 28, no. 5, pp. 2152–2162, May 2019.
[32] D. Cai and X. Chen, "Large scale spectral clustering via landmark-based sparse representation," IEEE Trans. Cybern., vol. 45, no. 8, pp. 1669–1680, Aug. 2015.
[33] D. Hendrycks, M. Mazeika, S. Kadavath, and D. Song, "Using self-supervised learning can improve model robustness and uncertainty," in Proc. NeurIPS, 2019, pp. 15637–15648.
[34] Q. Ma, S. Li, W. Zhuang, S. Li, J. Wang, and D. Zeng, "Self-supervised time series clustering with model-based dynamics," IEEE Trans. Neural Netw. Learn. Syst., early access, Aug. 31, 2020, doi: 10.1109/TNNLS.2020.3016291.
[35] J. Zhang et al., "Self-supervised convolutional subspace clustering network," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 5473–5482.
[36] X. Sun, M. Cheng, C. Min, and L. Jing, "Self-supervised deep multi-view subspace clustering," in Proc. ACML, vol. 101, 2019, pp. 1001–1016.
[37] J. Ye, Q. Li, J. Yu, X. Wang, and H. Wang, "Affinity learning via self-supervised diffusion for spectral clustering," IEEE Access, vol. 9, pp. 7170–7182, 2021.
[38] T. Lin, H. Xu, and H. Zhang, "Constrained self-supervised clustering for discovering new intents (student abstract)," in Proc. AAAI, 2020, pp. 13863–13864.
[39] M. A. Lozano and F. Escolano, "Graph matching and clustering using kernel attributes," Neurocomputing, vol. 113, pp. 177–194, Aug. 2013.
[40] N. Paudel, L. Georgiadis, and G. F. Italiano, "Computing critical nodes in directed graphs," ACM J. Experim. Algorithmics, vol. 23, pp. 1–24, Nov. 2018.
[41] L. Gellert and R. Sanyal, "On degree sequences of undirected, directed, and bidirected graphs," Eur. J. Combinatorics, vol. 64, pp. 113–124, Aug. 2017.
[42] F. Nie, X. Wang, M. I. Jordan, and H. Huang, "The constrained Laplacian rank algorithm for graph-based clustering," in Proc. AAAI, 2016, pp. 1969–1976.
[43] M. Filippone, F. Camastra, F. Masulli, and S. Rovetta, "A survey of kernel and spectral methods for clustering," Pattern Recognit., vol. 41, no. 1, pp. 176–190, Jan. 2008.
[44] L. Zelnik-Manor and P. Perona, "Self-tuning spectral clustering," in Proc. NIPS, 2004, pp. 1601–1608.
[45] X. Zhu, S. Zhang, Y. Li, J. Zhang, L. Yang, and Y. Fang, "Low-rank sparse subspace for spectral clustering," IEEE Trans. Knowl. Data Eng., vol. 31, no. 8, pp. 1532–1543, Aug. 2019.
[46] X. Zhu, S. Zhang, R. Hu, Y. Zhu, and J. Song, "Local and global structure preservation for robust unsupervised spectral feature selection," IEEE Trans. Knowl. Data Eng., vol. 30, no. 3, pp. 517–529, Mar. 2018.
[47] X. Zhu, S. Zhang, W. He, R. Hu, C. Lei, and P. Zhu, "One-step multi-view spectral clustering," IEEE Trans. Knowl. Data Eng., vol. 31, no. 10, pp. 2022–2034, Oct. 2019.
[48] X. Zhu, S. Zhang, Y. Zhu, W. Zheng, and Y. Yang, "Self-weighted multi-view fuzzy clustering," ACM Trans. Knowl. Discovery Data, vol. 14, no. 4, p. 48, 2020.
[49] Y. Yang, Z. Ma, Y. Yang, F. Nie, and H. T. Shen, "Multitask spectral clustering by exploring intertask correlation," IEEE Trans. Cybern., vol. 45, no. 5, pp. 1069–1080, May 2015.
[50] X. Chen, R. Chen, Q. Wu, Y. Fang, F. Nie, and J. Z. Huang, "LABIN: Balanced min cut for large-scale data," IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 3, pp. 725–736, Mar. 2020.
[51] X. Chen, W. Hong, F. Nie, D. He, M. Yang, and J. Z. Huang, "Spectral clustering of large-scale data by directly solving normalized cut," in Proc. 24th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, Jul. 2018, pp. 1206–1215.
[52] W. Zhu, F. Nie, and X. Li, "Fast spectral clustering with efficient large graph construction," in Proc. ICASSP, 2017, pp. 2492–2496.
[53] L. Yang, X. Liu, F. Nie, and M. Liu, "Large-scale spectral clustering based on representative points," Math. Problems Eng., vol. 2019, Dec. 2019, Art. no. 5864020.
[54] X. Yang, W. Yu, R. Wang, G. Zhang, and F. Nie, "Fast spectral clustering learning with hierarchical bipartite graph for large-scale data," Pattern Recognit. Lett., vol. 130, pp. 345–352, Feb. 2020.
[55] C. Wang, F. Nie, R. Wang, and X. Li, "Revisiting fast spectral clustering with anchor graph," in Proc. ICASSP, 2020, pp. 3902–3906.
[56] F. He, F. Nie, R. Wang, X. Li, and W. Jia, "Fast semisupervised learning with bipartite graph for large-scale data," IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 2, pp. 626–638, Feb. 2020.
[57] F. He, F. Nie, R. Wang, H. Hu, W. Jia, and X. Li, "Fast semi-supervised learning with optimal bipartite graph," IEEE Trans. Knowl. Data Eng., early access, Jan. 21, 2020, doi: 10.1109/TKDE.2020.2968523.
[58] W. Liu, J. He, and S. Chang, "Large graph construction for scalable semi-supervised learning," in Proc. ICML, 2010, pp. 679–686.
[59] F. Nie, X. Wang, and H. Huang, "Clustering and projected clustering with adaptive neighbors," in Proc. KDD, 2014, pp. 977–986.
[60] M. Šorel and F. Šroubek, "Fast convolutional sparse coding using matrix inversion lemma," Digit. Signal Process., vol. 55, pp. 44–51, Aug. 2016.
[61] S. Mehrkanoon, C. Alzate, R. Mall, R. Langone, and J. A. K. Suykens, "Multiclass semisupervised learning based upon kernel spectral clustering," IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 4, pp. 720–733, Apr. 2015.
[62] X. Chen and D. Cai, "Large scale spectral clustering with landmark-based representation," in Proc. AAAI, 2011, pp. 313–318.
[63] M. M. R. Khan, R. B. Arif, M. A. B. Siddique, and M. R. Oishe, "Study and observation of the variation of accuracies of KNN, SVM, LMNN, ENN algorithms on eleven different datasets from UCI machine learning repository," CoRR, vol. abs/1809.06186, pp. 124–129, Sep. 2018.

Jingyu Wang (Member, IEEE) received the Ph.D. degree in signal, image, and automation from the Université Paris-Est, Paris, France, in 2015. He is currently an Associate Professor with the School of Astronautics, School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an, China. His research interests include information processing, computer vision, and intelligent perception.

Feiping Nie (Member, IEEE) received the Ph.D. degree in computer science from Tsinghua University, Beijing, China, in 2009. He has authored over 100 articles in top journals and conferences, including the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, the International Journal of Computer Vision, the IEEE TRANSACTIONS ON IMAGE PROCESSING, the IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, the IEEE TRANSACTIONS ON NEURAL NETWORKS, the IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, the ACM Transactions on Knowledge Discovery from Data, Bioinformatics, the International Conference on Machine Learning, the Conference on Neural Information Processing Systems, the Knowledge Discovery and Data Mining Conference, the International Joint Conference on Artificial Intelligence, the Association for the Advancement of Artificial Intelligence, the International Conference on Computer Vision, the Conference on Computer Vision and Pattern Recognition, and ACM Multimedia. His current research interests include machine learning and its applications, such as pattern recognition, data mining, computer vision, image processing, and information retrieval. Dr. Nie is currently serving as an associate editor or a PC member for several prestigious journals and conferences in the related fields. His articles have been cited over 5000 times (Google Scholar).