
Improved Deep Embedded Clustering with Local Structure Preservation

Xifeng Guo, Long Gao, Xinwang Liu, Jianping Yin


College of Computer, National University of Defense Technology, Changsha, China
[email protected], [email protected], [email protected], [email protected]

Abstract

Deep clustering learns deep feature representations that favor the clustering task using neural networks. Some pioneering work proposes to simultaneously learn embedded features and perform clustering by explicitly defining a clustering oriented loss. Though promising performance has been demonstrated in various applications, we observe that a vital ingredient has been overlooked by these works: the defined clustering loss may corrupt the feature space, which leads to non-representative, meaningless features and in turn hurts clustering performance. To address this issue, in this paper we propose the Improved Deep Embedded Clustering (IDEC) algorithm to take care of data structure preservation. Specifically, we manipulate the feature space to scatter data points using a clustering loss as guidance. To constrain the manipulation and maintain the local structure of the data generating distribution, an under-complete autoencoder is applied. By integrating the clustering loss and the autoencoder's reconstruction loss, IDEC can jointly optimize cluster label assignment and learn features that are suitable for clustering with local structure preservation. The resultant optimization problem can be effectively solved by mini-batch stochastic gradient descent and backpropagation. Experiments on image and text datasets empirically validate the importance of local structure preservation and the effectiveness of our algorithm.

1 Introduction

Unsupervised clustering is a vital research topic in data science and machine learning. Traditional clustering algorithms like k-means [MacQueen, 1967], Gaussian mixture models [Bishop, 2006] and spectral clustering [Von Luxburg, 2007] group data on handcrafted features according to intrinsic characteristics or similarity. However, when the dimension of the input feature space (data space) is very high, clustering becomes ineffective due to unreliable similarity metrics. Transforming data from the high dimensional feature space to a lower dimensional space in which to perform clustering is an intuitive solution and has been widely studied. This can be done by applying dimension reduction techniques like Principal Component Analysis (PCA), but the representation ability of these shallow models is limited. Thanks to the development of deep learning, such feature transformation can be achieved by using Deep Neural Networks (DNN). We refer to this kind of clustering as deep clustering.

Deep clustering has been proposed only recently and leaves a lot of problems unsolved. For example, what types of neural networks are proper? How to provide guidance information, i.e., how to define a clustering oriented loss function? Which properties of the data should be preserved during transformation? The primitive work in deep clustering focuses on learning features that preserve some properties of the data by adding prior knowledge to the objective [Tian et al., 2014; Peng et al., 2016]. These are two-stage algorithms: feature transformation and then clustering. Later, algorithms that jointly accomplish feature transformation and clustering came into being [Yang et al., 2016; Xie et al., 2016]. The Deep Embedded Clustering (DEC) [Xie et al., 2016] algorithm defines an effective objective in a self-learning manner. The defined clustering loss is used to update the parameters of the transforming network and the cluster centers simultaneously. The cluster assignment is implicitly integrated into soft labels. However, local structure preservation cannot be guaranteed by the clustering loss. Thus the feature transformation may be misguided, leading to corruption of the embedded space.

To deal with this problem, in this paper we assume that both clustering oriented loss guidance and a local structure preservation mechanism are essential for deep clustering. Inspired by [Peng et al., 2016], we use an under-complete autoencoder to learn embedded features and to preserve the local structure of the data generating distribution. We propose to incorporate the autoencoder into the DEC framework. In this way, the proposed framework can jointly perform clustering and learn representative features with local structure preservation. We refer to our algorithm as Improved Deep Embedded Clustering (IDEC). The optimization of IDEC can be directly performed by mini-batch stochastic gradient descent and backpropagation. At last, some experiments are carefully designed and conducted. The results validate our assumption and the effectiveness of our IDEC.

The contributions of this work are summarized below:
• We propose a deep clustering algorithm that can jointly perform clustering and learn representative features with local structure preservation.

• We empirically prove the importance of local structure preservation in deep clustering.

• The proposed IDEC outperforms the newest opponent by a large margin.

2 Related Work

2.1 Deep Clustering

Existing deep clustering algorithms broadly fall into two categories: (i) two-stage work that applies clustering after having learned a representation, and (ii) approaches that jointly optimize feature learning and clustering.

The former category of algorithms directly takes advantage of existing unsupervised deep learning frameworks and techniques. For example, [Tian et al., 2014] uses an autoencoder to learn low dimensional features of the original graph, and then runs the k-means algorithm to get clustering results. [Chen, 2015] layer-wisely trains a Deep Belief Network (DBN) and then applies non-parametric maximum-margin clustering to the learned intermediate representation. [Peng et al., 2016] uses an autoencoder with a sparsity prior to learn representations in a nonlinear latent space that are adaptive to local and global subspace structure simultaneously, and then traditional clustering algorithms are employed to get the label assignment.

The other category of algorithms tries to explicitly define a clustering loss, simulating the classification error in supervised deep learning. [Yang et al., 2016] proposes a recurrent framework for deep representations and image clusters, which integrates the two processes into a single model with a unified weighted triplet loss and optimizes it end-to-end. DEC [Xie et al., 2016] learns a mapping from the observed space to a low-dimensional latent space with deep neural networks, which can obtain feature representations and cluster assignments simultaneously.

The proposed algorithm is intrinsically a modified version of DEC that incorporates an under-complete autoencoder to preserve local structure. It excels [Yang et al., 2016] by its simplicity, without recurrence, and outperforms DEC in terms of clustering accuracy and the representativeness of features. Since IDEC mainly depends on the autoencoder and DEC, we introduce them in more detail in the following sections.

2.2 Autoencoder

An autoencoder is a neural network that is trained to attempt to copy its input to its output. Internally, it has a hidden layer z that describes a code used to represent the input. The network consists of two parts: an encoder function z = f_W(x) and a decoder x' = g_W'(z) that produces a reconstruction. There are two widely used types of autoencoders.

Under-complete autoencoder. It constrains the dimension of the latent code z to be lower than that of the input data x. Learning such under-complete representations forces the autoencoder to capture the most salient features of the data.

Denoising autoencoder. Instead of reconstructing x given x, a denoising autoencoder minimizes the following objective:

L = \|x - g_{W'}(f_W(\tilde{x}))\|_2^2    (1)

where \tilde{x} is a copy of x that is corrupted by some form of noise. Therefore, the denoising autoencoder has to recover x from this corruption rather than simply copying its input. In this way, the denoising autoencoder forces the encoder f_W and decoder g_W' to implicitly capture the structure of the data generating distribution.

In our algorithm, the denoising autoencoder is used for pretraining, and the under-complete autoencoder is added to the DEC framework after initialization.
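As a concrete (and simplified) illustration of this pretraining step, the following Keras sketch builds an under-complete autoencoder and trains it in a denoising fashion; the layer widths, noise level and training schedule here are illustrative assumptions rather than the exact greedy layer-wise procedure of [Xie et al., 2016].

```python
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

def build_autoencoder(dims):
    """dims = [d, 500, 500, 2000, 10]: input dim, hidden widths, embedding dim.
    Returns (autoencoder, encoder); the decoder mirrors the encoder."""
    x = Input(shape=(dims[0],))
    h = x
    for units in dims[1:-1]:
        h = Dense(units, activation='relu')(h)
    z = Dense(dims[-1], name='embedding')(h)          # latent code z = f_W(x)
    h = z
    for units in reversed(dims[1:-1]):
        h = Dense(units, activation='relu')(h)
    x_rec = Dense(dims[0], name='reconstruction')(h)  # x' = g_W'(z)
    return Model(x, x_rec), Model(x, z)

# Denoising pretraining of Eq. (1): corrupt the input, reconstruct the clean input.
# autoencoder, encoder = build_autoencoder([784, 500, 500, 2000, 10])
# autoencoder.compile(optimizer='adam', loss='mse')
# x_noisy = x + 0.2 * np.random.normal(size=x.shape)   # illustrative noise level
# autoencoder.fit(x_noisy, x, batch_size=256, epochs=50)
```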
2.3 Deep Embedded Clustering

Deep Embedded Clustering (DEC) [Xie et al., 2016] starts by pretraining an autoencoder and then removes the decoder. The remaining encoder is finetuned by optimizing the following objective:

L = KL(P \| Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}    (2)

where q_{ij} is the similarity between embedded point z_i and cluster center µ_j measured by Student's t-distribution [Maaten and Hinton, 2008]:

q_{ij} = \frac{(1 + \|z_i - \mu_j\|^2)^{-1}}{\sum_{j'} (1 + \|z_i - \mu_{j'}\|^2)^{-1}}    (3)

and p_{ij} in (2) is the target distribution, defined as

p_{ij} = \frac{q_{ij}^2 / \sum_i q_{ij}}{\sum_{j'} \left( q_{ij'}^2 / \sum_i q_{ij'} \right)}    (4)

As we can see, the target distribution P is defined by Q, so minimizing L is a form of self-training [Nigam and Ghani, 2000].

Let f_W be the encoder mapping, i.e. z_i = f_W(x_i), where x_i is an input example from dataset X. After pretraining, all embedded points {z_i} can be extracted using f_W. Then k-means is employed on {z_i} to get the initial cluster centers {µ_j}. Afterwards, L can be computed according to (2), (3) and (4), and the predicted label of sample x_i is \arg\max_j q_{ij}.

During backpropagation, \partial L / \partial z_i and \partial L / \partial \mu_j can be easily computed. Then \partial L / \partial z_i is passed down to update f_W, and \partial L / \partial \mu_j is used to update cluster center µ_j:

\mu_j = \mu_j - \lambda \frac{\partial L}{\partial \mu_j}    (5)

The biggest contribution of DEC is the clustering loss (or the target distribution P, to be specific). It works by using high confidence samples as supervision and then making the samples in each cluster distribute more densely. However, there is no guarantee of pulling samples near the margins towards the correct cluster. We deal with this problem by explicitly preserving the local structure of the data. Under this condition, the supervision information from high confidence samples can help the marginal samples move to the correct cluster.
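For concreteness, the soft assignment (3) and the target distribution (4) can be computed in a few lines of NumPy. The sketch below assumes arrays z (embedded points) and mu (cluster centers) and is not the authors' released code.

```python
import numpy as np

def soft_assignments(z, mu):
    """Student's t soft assignments q_ij of Eq. (3).
    z: (n, d) embedded points, mu: (K, d) cluster centers."""
    dist2 = np.sum((z[:, None, :] - mu[None, :, :]) ** 2, axis=2)  # (n, K)
    q = 1.0 / (1.0 + dist2)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Self-training target p_ij of Eq. (4)."""
    weight = q ** 2 / q.sum(axis=0)          # q_ij^2 / sum_i q_ij
    return weight / weight.sum(axis=1, keepdims=True)

# Example: predicted labels are the argmax of q, as in DEC.
# z = encoder.predict(x); mu = kmeans.cluster_centers_
# q = soft_assignments(z, mu); labels = q.argmax(axis=1); p = target_distribution(q)
```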
3 Improved Deep Embedded Clustering

Consider a dataset X with n samples, each sample x_i ∈ R^d where d is the dimension. The number of clusters K is a priori knowledge, and the jth cluster center is represented by µ_j ∈ R^d. Let the value of s_i ∈ {1, 2, ..., K} represent the cluster index assigned to sample x_i. Define nonlinear mappings f_W : x_i → z_i and g_W' : z_i → x'_i, where z_i is the embedded point of x_i in the low dimensional feature space and x'_i is the reconstructed sample for x_i.

We aim to find a good f_W which makes the embedded points {z_i}_{i=1}^n more suitable for the clustering task. To this end, two components are essential: the autoencoder and the clustering loss. The autoencoder is used to learn representations in an unsupervised manner, and the learned features can preserve the intrinsic local structure of the data. The clustering loss, borrowed from [Xie et al., 2016], is responsible for manipulating the embedded space in order to scatter the embedded points. The whole network structure is illustrated in Fig. 1, and the objective is defined as

L = L_r + \gamma L_c    (6)

where L_r and L_c are the reconstruction loss and the clustering loss respectively, and γ > 0 is a coefficient that controls the degree of distorting the embedded space. When γ = 1 and L_r ≡ 0, (6) reduces to the objective of DEC [Xie et al., 2016].

[Figure 1: The network structure of IDEC. The encoder and decoder are composed of fully connected layers. The input x is mapped by the encoder to the embedded point z, which the decoder maps back to the reconstruction x' (reconstruction loss); the soft assignment q is computed from z (clustering loss). The clustering loss is used to scatter the embedded points z, and the reconstruction loss makes sure that the embedded space preserves the local structure of the data generating distribution.]

3.1 Clustering loss and Initialization

The clustering loss was proposed by [Xie et al., 2016]. It is defined as the KL divergence between distributions P and Q, where Q is the distribution of soft labels measured by Student's t-distribution and P is the target distribution derived from Q. That is to say, the clustering loss is defined as

L_c = KL(P \| Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}    (7)

where KL is the Kullback–Leibler divergence that measures the non-symmetric difference between two probability distributions, and P and Q are defined by (4) and (3). Details can be found in Section 2.3 and [Xie et al., 2016].

Following the suggestions in [Xie et al., 2016], we also pretrain a stacked denoising autoencoder before performing clustering. After pretraining, the embedded points are valid feature representations for the input samples. Then the cluster centers {µ_j}_{j=1}^K can be initialized by employing k-means on {z_i = f_W(x_i)}_{i=1}^n.

3.2 Local structure preservation

The embedded points obtained in Section 3.1 are not necessarily suitable for the clustering task. To this end, DEC [Xie et al., 2016] abandons the decoder and finetunes the encoder using the clustering loss L_c. However, we suppose that this kind of finetuning could distort the embedded space, weaken the representativeness of the embedded features and thereby hurt clustering performance. Therefore, we propose to keep the decoder untouched and directly attach the clustering loss to the embedded space.

To ensure the effectiveness of the clustering loss, the stacked denoising autoencoder used in pretraining is not appropriate any more, because the clustering should be performed on features of clean data instead of the noised data used in the denoising autoencoder. So we directly remove the noise. Then the stacked denoising autoencoder degenerates into an under-complete autoencoder (see Section 2.2). The reconstruction loss is measured by the Mean Squared Error (MSE):

L_r = \sum_{i=1}^{n} \|x_i - g_{W'}(z_i)\|_2^2    (8)

where z_i = f_W(x_i), and f_W and g_W' are the encoder and decoder mappings respectively. As shown in [Peng et al., 2016] and [Goodfellow et al., 2016], autoencoders can preserve the local structure of the data generating distribution. Under this condition, manipulating the embedded space slightly using the clustering loss will not cause corruption. So the coefficient γ should be less than 1, which will be empirically demonstrated in Section 4.3.
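To make the objective (6) concrete, the following sketch (in the spirit of, but not identical to, the released Keras implementation) wraps the soft assignment of Eq. (3) in a custom layer with trainable centers and combines the reconstruction and clustering losses with weight γ. The names autoencoder and encoder refer to the pretraining sketch in Section 2.2 and are assumptions of this sketch.

```python
import keras.backend as K
from keras.layers import Layer
from keras.models import Model

class ClusteringLayer(Layer):
    """Soft assignment q_ij of Eq. (3) with trainable cluster centers."""
    def __init__(self, n_clusters, **kwargs):
        super(ClusteringLayer, self).__init__(**kwargs)
        self.n_clusters = n_clusters

    def build(self, input_shape):
        self.clusters = self.add_weight(shape=(self.n_clusters, input_shape[1]),
                                        initializer='glorot_uniform',
                                        name='clusters')
        super(ClusteringLayer, self).build(input_shape)

    def call(self, z):
        d2 = K.sum(K.square(K.expand_dims(z, axis=1) - self.clusters), axis=2)
        q = 1.0 / (1.0 + d2)
        return q / K.sum(q, axis=1, keepdims=True)

    def compute_output_shape(self, input_shape):
        return input_shape[0], self.n_clusters

# Attach both heads to the pretrained autoencoder (Fig. 1) and weight the
# clustering loss by gamma as in Eq. (6).
# q_out = ClusteringLayer(n_clusters, name='clustering')(encoder.output)
# idec = Model(inputs=encoder.input, outputs=[autoencoder.output, q_out])
# idec.compile(optimizer='adam', loss=['mse', 'kld'], loss_weights=[1.0, gamma])
```

Compiling with loss_weights=[1.0, gamma] realizes the weighting L = L_r + γL_c, and fitting against targets [x, p] trains both heads jointly.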
3.3 Optimization

We optimize (6) using mini-batch stochastic gradient descent (SGD) and backpropagation. To be specific, there are three kinds of parameters to optimize or update: the autoencoder's weights, the cluster centers and the target distribution P.

Update autoencoder's weights and cluster centers. Fix the target distribution P; then the gradients of L_c with respect to embedded point z_i and cluster center µ_j can be computed as:

\frac{\partial L_c}{\partial z_i} = 2 \sum_{j=1}^{K} \left(1 + \|z_i - \mu_j\|^2\right)^{-1} (p_{ij} - q_{ij})(z_i - \mu_j)    (9)

\frac{\partial L_c}{\partial \mu_j} = 2 \sum_{i=1}^{n} \left(1 + \|z_i - \mu_j\|^2\right)^{-1} (q_{ij} - p_{ij})(z_i - \mu_j)    (10)

Note that the above derivations are from [Xie et al., 2016]. Then, given a mini-batch with m samples and learning rate λ, µ_j is updated by

\mu_j = \mu_j - \frac{\lambda}{m} \sum_{i=1}^{m} \frac{\partial L_c}{\partial \mu_j}    (11)

The decoder's weights are updated by

W' = W' - \frac{\lambda}{m} \sum_{i=1}^{m} \frac{\partial L_r}{\partial W'}    (12)
The encoder's weights are updated by

W = W - \frac{\lambda}{m} \sum_{i=1}^{m} \left( \frac{\partial L_r}{\partial W} + \gamma \frac{\partial L_c}{\partial W} \right)    (13)
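For reference, the closed-form gradients (9) and (10) can be written in a few lines of NumPy. This is a sketch only; z, mu, p and q are assumed to be arrays of shape (n, d), (K, d), (n, K) and (n, K) respectively.

```python
import numpy as np

def clustering_loss_grads(z, mu, p, q):
    """Gradients of L_c w.r.t. embedded points (Eq. 9) and centers (Eq. 10)."""
    diff = z[:, None, :] - mu[None, :, :]                   # (n, K, d)
    w = 2.0 * (p - q) / (1.0 + np.sum(diff ** 2, axis=2))   # (n, K)
    grad_z = np.sum(w[:, :, None] * diff, axis=1)           # sum over clusters j
    grad_mu = -np.sum(w[:, :, None] * diff, axis=0)         # sum over samples i
    return grad_z, grad_mu

# One SGD step on the centers for a mini-batch of size m (Eq. 11):
# mu -= lr / m * grad_mu
```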
Update target distribution. The target distribution P serves as a "groundtruth" soft label but also depends on the predicted soft labels. Therefore, to avoid instability, P should not be updated at every iteration using only a batch of data (one update of the autoencoder's weights on a mini-batch of samples is called an iteration). In practice, we update the target distribution using all embedded points every T iterations. See (3) and (4) for the update rules. When the target distribution is updated, the label assigned to x_i is obtained by

s_i = \arg\max_j q_{ij}    (14)

where q_{ij} is computed by (3). We will stop training if the label assignment change (in percentage) between two consecutive updates of the target distribution is less than a threshold δ.

The whole algorithm is summarized in Algorithm 1.

Algorithm 1: Improved Deep Embedded Clustering
Input: Input data X; number of clusters K; target distribution update interval T; stopping threshold δ; maximum iterations MaxIter.
Output: Autoencoder's weights W and W'; cluster centers µ and labels s.
1   Initialize µ, W and W' according to Section 3.1.
2   for iter ∈ {0, 1, ..., MaxIter} do
3       if iter % T == 0 then
4           Compute all embedded points {z_i = f_W(x_i)}_{i=1}^n.
5           Update P using (3), (4) and {z_i}_{i=1}^n.
6           Save the last label assignment: s_old = s.
7           Compute new label assignments s via (14).
8           if sum(s_old ≠ s)/n < δ then
9               Stop training.
10      Choose a batch of samples S ⊂ X.
11      Update µ, W' and W via (11), (12) and (13) on S.
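Putting the pieces together, a minimal training loop in the shape of Algorithm 1 might look as follows. It assumes the two-output Keras model and the target_distribution helper from the earlier sketches, with the k-means centers already loaded into the clustering layer; it is a sketch, not the authors' implementation.

```python
import numpy as np

def train_idec(idec, x, target_distribution, T=140, batch_size=256,
               tol=0.001, max_iter=20000):
    """Sketch of Algorithm 1 for the assumed two-output model
    (reconstruction, soft assignment q)."""
    n, index, s = x.shape[0], 0, None
    for it in range(max_iter):
        if it % T == 0:
            q = idec.predict(x)[1]                      # q_ij over all samples (Eq. 3)
            p = target_distribution(q)                  # refresh target P (Eq. 4)
            s_old, s = s, q.argmax(axis=1)              # labels via Eq. (14)
            if s_old is not None and np.mean(s != s_old) < tol:
                break                                   # label change < delta: stop
        idx = np.arange(index, min(index + batch_size, n))
        idec.train_on_batch(x[idx], [x[idx], p[idx]])   # joint update of W, W', mu
        index = index + batch_size if index + batch_size < n else 0
    return s
```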

It is not difficult to see that the time complexity of the IDEC algorithm is O(nD^2 + ndK), where D, d and K are the maximum number of neurons in the hidden layers, the dimension of the embedding layer and the number of clusters. Generally K ≤ d ≤ D holds, so the time complexity is O(nD^2).

4 Experiments

4.1 Datasets

The proposed IDEC method is evaluated on two image datasets and one text dataset:

• MNIST: The MNIST dataset [LeCun et al., 1998] consists of 70000 handwritten digits of 28x28 pixel size. We reshaped each gray image to a 784 dimensional vector.

• USPS: The USPS dataset contains 9298 gray-scale handwritten digit images with a size of 16x16 pixels. The features are floating point values in [0, 2].

• REUTERS-10K: Reuters contains around 810000 English news stories labeled with a category tree [Lewis et al., 2004]. Following DEC [Xie et al., 2016], we used 4 root categories (corporate/industrial, government/social, markets and economics) as labels and excluded all documents with multiple labels. Restricted by computational resources, we randomly sampled a subset of 10000 examples and computed tf-idf features on the 2000 most frequent words. The sampled dataset is referred to as REUTERS-10K.

Table 1: Dataset statistics

Dataset        # examples   # classes   Dimension
MNIST          70000        10          784
USPS           9298         10          256
REUTERS-10K    10000        4           2000

For all algorithms, we preprocessed the datasets in the same way as DEC, i.e. normalizing each example x_i ∈ X so that \frac{1}{d}\|x_i\|_2^2 ≈ 1.
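Assuming the data sit in a NumPy array x of shape (n, d), one simple reading of this preprocessing is a global rescaling (per-example scaling would be the alternative interpretation); this is a sketch, not the exact preprocessing script.

```python
import numpy as np

def dec_normalize(x):
    """Rescale the dataset so that (1/d) * ||x_i||_2^2 is roughly 1 on average,
    mirroring the DEC-style preprocessing described above (one possible reading)."""
    d = x.shape[1]
    scale = np.sqrt(np.mean(np.sum(x ** 2, axis=1)) / d)
    return x / scale
```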
4.2 Experiment Setup

Comparing methods. We demonstrate the effectiveness of our IDEC algorithm mainly by comparing with DEC [Xie et al., 2016], which can be viewed as a special case of IDEC when the reconstruction term is set to zero. We use the publicly available code released by the authors to report the performance of DEC. The two-stage deep clustering algorithm is denoted as AE+k-means, which means performing the k-means algorithm on the embedded features of the pretrained autoencoder. This is the same as the result of DEC and IDEC before training with the clustering loss. For the sake of completeness, two traditional and classic clustering algorithms, k-means and Spectral Embedded Clustering (SEC) [Nie et al., 2011], are also included in the comparison. k-means is run 20 times with different initializations and the result with the best objective value is chosen. SEC is a variant of spectral clustering with a linearity regularization explicitly added, and it outperforms traditional spectral clustering methods on a wide range of datasets according to [Nie et al., 2011]. The parameters of SEC are fixed to the default values in the code provided by the authors.

Parameters setting. Following the settings in DEC [Xie et al., 2016], the encoder network is set as a fully connected multilayer perceptron (MLP) with dimensions d-500-500-2000-10 for all datasets, where d is the dimension of the input data (features). The decoder network is a mirror of the encoder, i.e. an MLP with dimensions 10-2000-500-500-d. Except for the input, output and embedding layers, all internal layers are activated by the ReLU nonlinearity [Glorot et al., 2011]. The autoencoder pretraining is set exactly the same as in [Xie et al., 2016]; please refer to that paper for more details. After pretraining, the coefficient γ of the clustering loss is set to 0.1 (determined by a grid search in {0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0}) and the batch size to 256 for all datasets. The optimizer Adam [Kingma and Ba, 2014] with initial learning rate λ = 0.001, β1 = 0.9, β2 = 0.999 is applied for the MNIST dataset, and SGD with learning rate λ = 0.1 and momentum β = 0.99 is used for the USPS and REUTERS-10K datasets. The convergence threshold is set to δ = 0.1%, and the update intervals T are 140, 30 and 3 iterations for MNIST, USPS and REUTERS-10K respectively. Our implementation is based on Python and Keras [Chollet, 2015] and is available at https://github.com/XifengGuo/IDEC.

Evaluation Metric. All clustering methods are evaluated by clustering accuracy (ACC) and Normalized Mutual Information (NMI), which are widely used in the unsupervised learning scenario.
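For completeness, a common way to compute these two metrics, with the cluster-to-class mapping found by the Hungarian algorithm, is sketched below using scipy and scikit-learn; this is not necessarily the authors' evaluation script.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Unsupervised ACC: best one-to-one mapping between predicted clusters
    and true labels via the Hungarian algorithm."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    D = max(y_pred.max(), y_true.max()) + 1
    cost = np.zeros((D, D), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1
    row, col = linear_sum_assignment(-cost)   # maximize the matched counts
    return cost[row, col].sum() / y_pred.size

# acc = clustering_accuracy(y, labels)
# nmi = normalized_mutual_info_score(y, labels)
```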
4.3 Results

Table 2: Comparison of clustering performance in terms of accuracy (%) and NMI (%, in brackets).

Methods       MNIST           USPS            REUTERS-10K
k-means       53.24           66.82           51.62
SEC           80.37           N/A             60.08
AE+k-means    81.82 (74.73)   69.31 (66.20)   70.52 (39.79)
DEC           86.55 (83.72)   74.08 (75.29)   73.68 (49.76)
IDEC          88.06 (86.72)   76.05 (78.46)   75.64 (49.81)

We report the results of all comparing algorithms on the 3 datasets in Table 2. As it shows, the deep clustering algorithms AE+k-means, DEC and IDEC outperform the traditional clustering algorithms k-means and Spectral Embedded Clustering (SEC) [Nie et al., 2011] by a large margin, which indicates the fascinating potential of deep learning in the unsupervised clustering field. The performance gap between AE+k-means and DEC reflects the effect of the clustering loss. And the outperformance of IDEC over DEC demonstrates that the autoencoder can help improve clustering performance.
[Figure 2: Accuracies and losses during training on MNIST. Top: clustering accuracy (%) vs. iteration for IDEC and DEC. Bottom: DEC's loss and IDEC's total, clustering and reconstruction losses vs. iteration.]

Figure 2 illustrates the behavior of DEC and IDEC during training on MNIST. We observe the following phenomena. First, the final accuracies comply with the results in Table 2, i.e. IDEC outperforms DEC. Second, IDEC converges more slowly than DEC because of the fluctuation of the reconstruction loss. Third, IDEC has a larger clustering loss and a higher clustering accuracy than DEC. This implies that the objective of DEC may mislead the clustering procedure by distorting the embedded feature space and breaking the intrinsic structure of the data. Finally, the reconstruction losses at the last few iterations approximately equal the loss at the beginning. This implies that the performance improvement from DEC to IDEC is not likely due to any clustering ability of the autoencoder itself. Actually, we did conduct an experiment that finetunes the autoencoder using only the reconstruction loss L_r (by setting the coefficient γ in (6) to 0) via various optimizers, and no improvement in terms of clustering accuracy was observed. So we assume that the autoencoder plays the role of preserving the local structure of the data, and under this condition the clustering loss can manipulate the embedded space to get better clustering accuracy.

[Figure 3: Visualization of clustering results on a subset of MNIST during training (t-SNE). Different colors mark different clusters. The first row is ours (epochs 0, 15, 30) and the second row corresponds to DEC (epochs 0, 5, 10). The proposed IDEC converges more slowly since it optimizes the reconstruction loss as well. Both methods separate clusters well, but the data structure in the first row is preserved better than in DEC. Note the points with red and blue color: they are totally mixed together in DEC while still somehow separable in our IDEC.]

We further prove our assumption about the role the autoencoder plays by visualizing the embedded feature space during training. The t-SNE [Maaten and Hinton, 2008] visualization on a random subset of MNIST with 1000 samples is shown in Fig. 3. From left to right in the top row, i.e. the training process of IDEC, the "shape" of each cluster is almost maintained. On the contrary, the "shape" in the bottom row changes a lot as training proceeds. Furthermore, focusing on the clusters colored red and blue (digits 4 and 9), in the first column they are still separable but become nearly indistinguishable in the last column. This is a loophole of DEC's objective (clustering loss). Our IDEC does not fully overcome this problem, but it does go further than DEC. To validate this, see the figures in the last column: the blue and red clusters of IDEC are still somehow separable, while in DEC they are totally mixed up. This problem was not observed in Figure 5 of [Xie et al., 2016], but it indeed happens when using their released code. This is also pointed out by [Zheng et al., 2016]. It can be concluded that the autoencoder can preserve the intrinsic structure of the data generating distribution and hence help the clustering loss to manipulate the embedded feature space appropriately.
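Such a visualization is easy to reproduce; a small sketch with scikit-learn and matplotlib, assuming the encoder from the earlier sketches and a label vector y for coloring, is:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_embedding(encoder, x, y, n=1000):
    """Project a random subset of the embedded points to 2-D with t-SNE and
    color by label, as in Fig. 3."""
    idx = np.random.choice(len(x), n, replace=False)
    z = encoder.predict(x[idx])
    z2 = TSNE(n_components=2).fit_transform(z)
    plt.scatter(z2[:, 0], z2[:, 1], c=y[idx], cmap='tab10', s=5)
    plt.show()
```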
To see how the coefficient γ of the clustering loss in (6) affects the performance of the IDEC algorithm, we conduct an experiment on the MNIST dataset by sampling γ in the range [10^{-2}, 10^{2}]. The optimizer is set as SGD with momentum β = 0.9, the same as DEC's default setting, for a fair comparison. The learning rate λ is set to 0.1, 0.01, 0.001 and 0.0001 successively. As shown in Figure 4, we have the following observations:

[Figure 4: The effect of learning rate λ and clustering coefficient γ in (6) on clustering performance for the MNIST dataset. Left: ACC (%); right: NMI (%); curves for IDEC and DEC at λ ∈ {0.1, 0.01, 0.001, 0.0001}, with γ varied in [10^{-2}, 10^{2}].]

• For the best learning rate, IDEC (λ = 0.1) outperforms DEC (λ = 0.01) when γ ∈ [0.05, 1.0]. A γ that is too small eliminates the positive effect of the clustering loss term, while a large value tends to distort the latent feature space. When γ → 0, the clustering result approaches the result of AE+k-means.

• The learning rate λ and the clustering coefficient γ are coupled. A larger γ requires a smaller λ to maintain performance, but the combination of a small γ and a large λ leads to higher performance. So we recommend γ = 0.1, as we used in all experiments.

5 Conclusion

This paper proposes the Improved Deep Embedded Clustering (IDEC) algorithm, which jointly performs clustering and learns embedded features that are suitable for clustering and preserve the local structure of the data generating distribution. IDEC manipulates the feature space to scatter data by optimizing a KL divergence based clustering loss with a self-training target distribution, and it maintains the local structure by incorporating an autoencoder. Empirical experiments demonstrate that structure preservation is vital to deep clustering algorithms and can favor clustering performance. Future work includes: adding more prior knowledge (e.g. sparsity) to the IDEC framework, and incorporating convolutional layers for image datasets.
Acknowledgments

This work was financially supported by the National Natural Science Foundation of China (Project nos. 60970034, 61170287, 61232016 and 61672528).

References

[Bishop, 2006] Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

[Chen, 2015] Gang Chen. Deep learning with nonparametric clustering. arXiv preprint arXiv:1501.03084, 2015.

[Chollet, 2015] François Chollet. Keras, 2015.

[Glorot et al., 2011] Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. Journal of Machine Learning Research, 15:315–323, 2011.

[Goodfellow et al., 2016] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.

[Kingma and Ba, 2014] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[LeCun et al., 1998] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[Lewis et al., 2004] David D. Lewis, Yiming Yang, Tony G. Rose, and Fan Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5(Apr):361–397, 2004.

[Maaten and Hinton, 2008] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.

[MacQueen, 1967] James MacQueen. Some methods for classification and analysis of multivariate observations. In Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 281–297. Oakland, CA, USA, 1967.

[Nie et al., 2011] Feiping Nie, Zinan Zeng, Ivor W. Tsang, Dong Xu, and Changshui Zhang. Spectral embedded clustering: A framework for in-sample and out-of-sample spectral clustering. IEEE Transactions on Neural Networks, 22(11):1796–1808, 2011.

[Nigam and Ghani, 2000] Kamal Nigam and Rayid Ghani. Analyzing the effectiveness and applicability of co-training. In International Conference on Information and Knowledge Management, pages 86–93. ACM, 2000.

[Peng et al., 2016] Xi Peng, Shijie Xiao, Jiashi Feng, Wei-Yun Yau, and Zhang Yi. Deep subspace clustering with sparsity prior. In International Joint Conference on Artificial Intelligence (IJCAI), 2016.

[Tian et al., 2014] Fei Tian, Bin Gao, Qing Cui, Enhong Chen, and Tie-Yan Liu. Learning deep representations for graph clustering. In AAAI, pages 1293–1299, 2014.

[Von Luxburg, 2007] Ulrike von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.

[Xie et al., 2016] Junyuan Xie, Ross Girshick, and Ali Farhadi. Unsupervised deep embedding for clustering analysis. In International Conference on Machine Learning (ICML), 2016.

[Yang et al., 2016] Jianwei Yang, Devi Parikh, and Dhruv Batra. Joint unsupervised learning of deep representations and image clusters. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5147–5156, 2016.

[Zheng et al., 2016] Yin Zheng, Huachun Tan, Bangsheng Tang, Hanning Zhou, et al. Variational deep embedding: A generative approach to clustering. arXiv preprint arXiv:1611.05148, 2016.
