Dimension Reduction On Open Data Using Variational Autoencoder - Hu2014
2014 IEEE 11th Intl Conf on Ubiquitous Intelligence and Computing and 2014 IEEE 11th Intl Conf on Autonomic and Trusted Computing and 2014 IEEE 14th Intl Conf on Scalable Computing and Communications and Its Associated Workshops
Abstract—Dimension reduction is used by scientists to deal with huge amounts of high-dimensional data because of the “curse of dimensionality”. There exist many methods of dimension reduction, such as principal components analysis (PCA), Locally Linear Embedding (LLE), Stochastic Neighbor Embedding (SNE), etc. The autoencoder has also been applied to dimension reduction recently. It uses deep learning to train the network and has been applied successfully to image reconstruction. However, one important problem in autoencoder application is how to find the best architecture of the network. In this paper, we propose an improved architecture of the autoencoder for dimension reduction. The experimental results show the effectiveness of the proposed method.

Keywords-dimension reduction; autoencoder; neural network architecture

I. INTRODUCTION
In the big data era, with the development of data acquisition methods, scientists need to deal with huge amounts of high-dimensional data more often. However, there exists the “curse of dimensionality” problem, and one way to avoid it is through dimension reduction, which is to discover efficient methods that transform the high-dimensional data into a more compact and meaningful expression in low-dimensional space [1]. A simple and widely used method is principal components analysis (PCA), which finds the directions of the greatest variances in the dataset and represents each data point by its coordinates along each of these directions [2]. One drawback of PCA is that it is a linear technique, and it cannot handle complex nonlinear data effectively.
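As a point of reference for this linear baseline, the projection onto the directions of greatest variance can be written in a few lines of numpy; this is a generic illustration, not code from the paper, and the function name is ours:

```python
import numpy as np

def pca_reduce(X, d):
    """Project X (n_samples, D) onto its d directions of greatest variance."""
    Xc = X - X.mean(axis=0)                      # center the data
    # Right singular vectors are the principal directions, ordered by variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T                         # coordinates along those directions
```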
Therefore many nonlinear methods for dimension reduction have been developed, including Locally Linear Embedding (LLE) [3], Stochastic Neighbor Embedding (SNE) [4], the autoencoder [2], etc. LLE represents the data points as linear combinations of their nearest neighbors to preserve the local properties of the data in nonlinear high-dimensional space [3, 5]. SNE uses a probabilistic approach to place the data points in low-dimensional space according to the pairwise dissimilarities in high-dimensional space [4]. The autoencoder is a deep learning method, which uses RBMs to find initial states of the network and then uses back propagation to learn the two-way mapping between high-dimensional space and low-dimensional space [2]. The autoencoder is widely used to solve dimension reduction problems in various domains. In practical applications, autoencoders usually have a large number of connections, so back propagation converges slowly. To reduce the number of weights to be tuned and the computational cost, the folded autoencoder has been proposed [1]. Although both the autoencoder and the folded autoencoder work well in many dimension reduction problems, a change of the neural network architecture can affect the results greatly. In this paper we propose an improved architecture of the autoencoder for the image processing of handwritten digits.

The rest of the paper is organized as follows. In Section II, we briefly introduce the autoencoder. In Section III, we describe the folded autoencoder. In Section IV, we introduce the improved architecture of the network and present the experimental results. Finally, Section V concludes the paper with discussions.

II. AUTOENCODER

The autoencoder is a deep learning method, which uses a feed-forward neural network with an odd number of hidden layers [2, 6]. For the input layer and the output layer, the number of nodes is determined by the input data X. So the input layer and the output layer have the same number of nodes (#nodes = D), and both correspond to the high-dimensional representation. The middle hidden layer, which has the least number of nodes (#nodes = d), corresponds to the low-dimensional representation. An example of an autoencoder is shown in Fig. 1 [2]. The object of the training process is to minimize the squared reconstruction error between the input and the output of the network [1].
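As a rough sketch of such a network, the encoder-decoder pair below is written with PyTorch purely for illustration, using an odd number of hidden layers and a squared-error objective; the layer sizes (784-1000-500-30) and the learning rate are illustrative assumptions, not the exact structures evaluated in Section IV:

```python
import torch
import torch.nn as nn

# Symmetric autoencoder: input and output layers both have D nodes,
# the middle hidden layer has d nodes (the low-dimensional code).
D, d = 784, 30          # illustrative sizes, not those used in the paper

encoder = nn.Sequential(
    nn.Linear(D, 1000), nn.Sigmoid(),
    nn.Linear(1000, 500), nn.Sigmoid(),
    nn.Linear(500, d),                  # low-dimensional representation
)
decoder = nn.Sequential(
    nn.Linear(d, 500), nn.Sigmoid(),
    nn.Linear(500, 1000), nn.Sigmoid(),
    nn.Linear(1000, D), nn.Sigmoid(),   # reconstruction of the input
)
autoencoder = nn.Sequential(encoder, decoder)

# Training minimizes the squared reconstruction error between input and output.
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(autoencoder.parameters(), lr=0.1)

def train_step(x):                      # x: a batch of shape (batch, D)
    optimizer.zero_grad()
    loss = criterion(autoencoder(x), x)
    loss.backward()                     # back propagation
    optimizer.step()
    return loss.item()
```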
Back propagation is often used as the learning algorithm for the autoencoder. The initial weights of the network are crucial for an autoencoder to find a good solution: if the initial weights are close to an optimal solution, back propagation works more effectively. Many algorithms have been designed to find good initial weights. Here, we use the Restricted Boltzmann Machine (RBM) for weight initialization. The RBM is a powerful tool and has been used in autoencoders successfully [2, 7, 8].
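The pre-training procedure itself is not detailed in the surviving text; the following numpy sketch of a single contrastive-divergence (CD-1) update for one RBM layer with binary units only illustrates how initial weights can be learned layer by layer, with the learning rate and sampling scheme as assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_vis, b_hid, lr=0.1):
    """One CD-1 step for an RBM with binary units; v0 has shape (batch, D)."""
    # Positive phase: hidden probabilities and samples driven by the data.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one step of reconstruction.
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    p_h1 = sigmoid(p_v1 @ W + b_hid)
    # Gradient approximation: <v h>_data - <v h>_reconstruction.
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / v0.shape[0]
    b_vis += lr * (v0 - p_v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid
```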
III. FOLDED AUTOENCODER

The folded autoencoder is based on the conventional autoencoder described above. The architecture of a folded autoencoder is illustrated in Fig. 2 [1]. Compared with the autoencoder illustrated in Fig. 1, the architecture of the folded autoencoder reuses the same set of weights for both the encoding and the decoding passes, which reduces the number of weights to be tuned [1].
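Fig. 2 is not reproduced here, but the weight sharing between the encoding and decoding passes can be sketched as a single folded layer in numpy; the layer sizes, sigmoid activation, and single-layer setting are simplifying assumptions rather than the exact network of [1]:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class FoldedLayer:
    """One folded layer: a single weight matrix W is reused for both the
    encoding pass (x -> code) and the decoding pass (code -> x_hat)."""
    def __init__(self, n_in, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.01, size=(n_in, n_out))
        self.b_enc = np.zeros(n_out)
        self.b_dec = np.zeros(n_in)

    def encode(self, x):
        return sigmoid(x @ self.W + self.b_enc)

    def decode(self, h):
        return sigmoid(h @ self.W.T + self.b_dec)   # same W, transposed

layer = FoldedLayer(784, 300)                        # illustrative sizes
x = np.random.default_rng(1).random((5, 784))
x_hat = layer.decode(layer.encode(x))                # reconstruction with shared weights
```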
original data than the images produced by the structure “2000-1000-500”. For the results of the USPS dataset shown in Fig. 4, it is hard to tell from the images due to the low resolution. However, the numerical evaluations given in the following tables also suggest that the proposed structure “1000-1000-300” can produce better results than the structure “2000-1000-500”.

The squared reconstruction error of both the training data and the test data for the MNIST dataset and the USPS dataset is shown in TABLE I and TABLE II, respectively. The smaller the squared reconstruction error, the better the reconstruction. It can be seen that most of the squared reconstruction errors produced by the proposed structure “1000-1000-300” are better than those produced by the structure “2000-1000-500”, except for the test data of the MNIST dataset using the unfolded autoencoder.
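The exact normalization behind the numbers in TABLE I and TABLE II is not stated in the surviving text; a plain sum-of-squares version of the metric would look like the following sketch:

```python
import numpy as np

def squared_reconstruction_error(X, X_hat):
    """Sum of squared differences between the original data X and the
    network output X_hat, both of shape (n_samples, D)."""
    return np.sum((X - X_hat) ** 2)
```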
The C-index of both the training data and the test data after dimension reduction is shown in TABLE III and TABLE IV, respectively. All the C-indices produced by the proposed structure “1000-1000-300” are better than those produced by the structure “2000-1000-500” for the folded autoencoder, but for the unfolded autoencoder the result is the opposite. This may be due to the limitations of the C-index when applied to the evaluation of dimension reduction results. As can be seen from Fig. 3, the proposed structure “1000-1000-300” also works better than the structure “2000-1000-500” when the unfolded autoencoder is used, especially for the reconstruction images of the number “3”.
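The C-index follows Hubert and Schultz [11]; a straightforward numpy/scipy sketch of the usual definition (lower values indicate more compact clusters, with O(n^2) pairwise distances) is given below for reference only:

```python
import numpy as np
from scipy.spatial.distance import pdist

def c_index(X, labels):
    """C-index: (S - S_min) / (S_max - S_min), where S is the sum of pairwise
    distances within clusters and S_min / S_max are the sums of the n_w
    smallest / largest pairwise distances overall (n_w = number of
    within-cluster pairs). labels is an integer array of cluster ids."""
    d = pdist(X)                                   # all pairwise distances
    same = pdist(labels.reshape(-1, 1)) == 0       # mask of within-cluster pairs
    n_w = int(same.sum())
    s = d[same].sum()
    d_sorted = np.sort(d)
    s_min = d_sorted[:n_w].sum()
    s_max = d_sorted[-n_w:].sum()
    return (s - s_min) / (s_max - s_min)
```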
V. CONCLUSION

We have improved the architecture of both the folded and the unfolded autoencoder for dimension reduction. As can be seen from the experimental results on two popular handwritten digit datasets, the proposed architecture can produce better reconstruction results than the one proposed before in most cases. Although the C-index can also be used to evaluate dimension reduction results, it only evaluates how good the data representation is in the low-dimensional space in terms of the quality of the clusters given by the benchmark labels, so it has limitations in some cases. Finding a better index for evaluating dimension reduction results is one of our future research focuses.

ACKNOWLEDGMENT

This work is supported by the National Science Foundation of China (Grant No. 61272213).

REFERENCES

[1] Jing Wang, Haibo He, and Danil V. Prokhorov, “A Folded Neural Network Autoencoder for Dimension Reduction,” Procedia Computer Science, vol. 13, pp. 120-127, 2012.
[2] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504-507, 2006.
[3] S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction by Locally Linear Embedding,” Science, vol. 290, no. 5500, pp. 2323-2326, 2000.
[4] G. E. Hinton and S. T. Roweis, “Stochastic neighbor embedding,” Advances in Neural Information Processing Systems, vol. 15, pp. 833-840, 2002.
[5] L. J. P. van der Maaten, E. O. Postma, and H. J. van den Herik, “Dimensionality reduction: A comparative review,” Journal of Machine Learning Research, vol. 10, pp. 66-71, 2009.
[6] D. DeMers and G. Cottrell, “Non-linear dimension reduction,” Advances in Neural Information Processing Systems, vol. 5, pp. 580-587, 1993.
[7] P. Smolensky, in Parallel Distributed Processing: Volume 1: Foundations, D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA: MIT Press, 1986, pp. 194-281.
[8] G. E. Hinton, “Training products of experts by minimizing contrastive divergence,” Neural Computation, vol. 14, pp. 1711-1800, 2002.
[9] The MNIST dataset is available at http://yann.lecun.com/exdb/mnist/index.html.
[10] The USPS dataset is available at http://www.cs.nyu.edu/~roweis/data.html.
[11] L. Hubert and J. Schultz, “Quadratic assignment as a general data-analysis strategy,” British Journal of Mathematical and Statistical Psychology, vol. 29, pp. 190-241, 1976.
Figure 3. Experimental results for random samples of the MNIST dataset. (A) Original images; (B) reconstruction image by folded autoencoder with the
structure “2000-1000-500”; (C) reconstruction image by folded autoencoder with the proposed structure “1000-1000-300”; (D) reconstruction image by
unfolded autoencoder with the structure “2000-1000-500”; (E) reconstruction image by unfolded autoencoder with the proposed structure “1000-1000-300”.
Figure 4. Experimental results for random samples of the USPS dataset. (A) Original images; (B) reconstruction image by folded autoencoder with the
structure “2000-1000-500”; (C) reconstruction image by folded autoencoder with the proposed structure “1000-1000-300”; (D) reconstruction image by
unfolded autoencoder with the structure “2000-1000-500”; (E) reconstruction image by unfolded autoencoder with the proposed structure “1000-1000-300”.
TABLE I. THE EXPERIMENTAL RESULTS (SQUARED RECONSTRUCTION ERROR) OF THE MNIST DATASET.
TABLE II. THE EXPERIMENTAL RESULTS (SQUARED RECONSTRUCTION ERROR) OF THE USPS DATASET.