2014 IEEE 11th Intl Conf on Ubiquitous Intelligence and Computing and 2014 IEEE 11th Intl Conf on Autonomic and Trusted Computing and 2014 IEEE 14th Intl Conf on Scalable Computing and Communications and Its Associated Workshops (UIC-ATC-ScalCom)

Improving the Architecture of an Autoencoder for Dimension Reduction

Changjie Hu, Xiaoli Hou, Yonggang Lu*


School of Information Science and Engineering, Lanzhou University
Lanzhou, China
*The corresponding author: [email protected]

Abstract—Dimension reduction is used by scientists to deal with huge amounts of high-dimensional data because of the "curse of dimensionality". There exist many methods of dimension reduction, such as principal components analysis (PCA), Locally Linear Embedding (LLE), Stochastic Neighbor Embedding (SNE), etc. The autoencoder has also been applied to dimension reduction recently. It uses deep learning to train the network and has been applied to image reconstruction successfully. However, one important problem in autoencoder applications is how to find the best architecture of the network. In this paper, we propose an improved architecture of the autoencoder for dimension reduction. The experimental results show the effectiveness of the proposed method.

Keywords-dimension reduction; autoencoder; neural network architecture

I. INTRODUCTION

In the big data era, with the development of data acquisition methods, scientists need to deal with huge amounts of high-dimensional data more often. However, there exists the "curse of dimensionality" problem, and one way to avoid it is through dimension reduction: discovering efficient methods to transform the high-dimensional data into a more compact and meaningful expression in low-dimensional space [1]. A simple and widely used method is principal components analysis (PCA), which finds the directions of the greatest variances in the dataset and represents each data point by its coordinates along each of these directions [2]. One drawback of PCA is that it is a linear technique, so it cannot handle complex nonlinear data effectively. Many nonlinear methods for dimension reduction have therefore been developed, which include Locally Linear Embedding (LLE) [3], Stochastic Neighbor Embedding (SNE) [4], the autoencoder [2], etc. LLE represents the data points as linear combinations of their nearest neighbors to preserve the local properties of the data in nonlinear high-dimensional space [3, 5]. SNE uses a probabilistic approach to place the data points in low-dimensional space according to the pairwise dissimilarities in high-dimensional space [4]. The autoencoder is a deep learning method which uses RBMs to find initial states of the network and then uses back propagation to learn the two-way mapping between high-dimensional space and low-dimensional space [2]. The autoencoder is widely used to solve dimension reduction problems in various domains. In practical applications, autoencoders usually have a large number of connections, so back propagation converges slowly. To reduce the number of weights to be tuned and the computational cost, the folded autoencoder has been proposed [1]. Although both the autoencoder and the folded autoencoder work well in many dimension reduction problems, a change of the neural network architecture can affect the results greatly. In this paper we propose an improved architecture of the autoencoder for the application of image processing of handwritten digits.

The rest of the paper is organized as follows. In Section II, we briefly introduce the autoencoder. In Section III, we describe the folded autoencoder. In Section IV, we introduce the improved architecture of the network and present the experimental results. At last, Section V concludes the paper with discussions.

II. AUTOENCODER

The autoencoder is a deep learning method which uses a feed-forward neural network with an odd number of hidden layers [2, 6]. For the input layer and the output layer, the number of nodes is determined by the input data X. So the input layer and the output layer have the same number of nodes (#nodes = D) and both correspond to the high-dimensional representation. The middle hidden layer, which has the least number of nodes (#nodes = d), corresponds to the low-dimensional representation. An example of an autoencoder is shown in Fig. 1 [2]. The objective of the training process is to minimize the squared reconstruction error between the input and the output of the network [1].

Back propagation is often used as the learning algorithm for the autoencoder. The initial weights of the network are crucial for an autoencoder to find a good solution. If the initial weights are closer to an optimal solution, back propagation works more effectively. Many algorithms have been designed to find good initial weights. Here, we use the Restricted Boltzmann Machine (RBM) for weight initialization. The RBM is a powerful tool and has been used in autoencoders successfully [2, 7, 8].

Figure 1. Structure of an autoencoder.
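To make the layer symmetry concrete, the sketch below builds an unfolded autoencoder with the conventional "784-2000-1000-500-30-500-1000-2000-784" structure used later in the experiments and evaluates the squared reconstruction error of a single feed-forward pass. It is a minimal Python/NumPy illustration, not the authors' Matlab implementation; in particular, the small random initial weights merely stand in for the RBM pre-training.

```python
import numpy as np

def sigmoid(x):
    """Logistic activation applied in every neuron."""
    return 1.0 / (1.0 + np.exp(-x))

# Unfolded autoencoder: input, encoder layers, 30-node code layer,
# mirrored decoder layers, and a 784-node output layer.
layer_sizes = [784, 2000, 1000, 500, 30, 500, 1000, 2000, 784]

rng = np.random.default_rng(0)
# Random initial weights stand in here for the RBM-based initialization.
weights = [rng.normal(0.0, 0.01, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    """Feed-forward pass; returns the 30-d code and the 784-d reconstruction."""
    h, code = x, None
    for w, b in zip(weights, biases):
        h = sigmoid(h @ w + b)
        if h.shape[1] == 30:          # the middle (code) layer
            code = h
    return code, h

x = rng.random((5, 784))              # five fake "images" with pixel values in [0, 1]
code, x_rec = forward(x)
print(np.mean(np.sum((x_rec - x) ** 2, axis=1)))  # squared reconstruction error
```

Training would adjust the weights to drive this reconstruction error down, which is exactly the objective described above.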

III. FOLDED AUTOENCODER

The folded autoencoder is based on the conventional autoencoder described above. The architecture of a folded autoencoder is illustrated in Fig. 2 [1]. Compared with the autoencoder illustrated in Fig. 1, the architecture of the folded autoencoder is generated by folding the right side of the conventional structure onto the left side. The folded autoencoder has (L-1)/2 hidden layers, while the autoencoder has L hidden layers, so the folded autoencoder has fewer hidden layers than the traditional autoencoder.

Figure 2. Structure of a folded autoencoder [1].
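One way to picture the folding is that the decoder half reuses the encoder weights in transposed form, so only the (L-1)/2 encoder matrices have to be stored and tuned. The Python sketch below illustrates that reading; whether the folded autoencoder of [1] shares weights in exactly this way is an assumption made here for illustration, and the code is not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Encoder half only: 784 -> 2000 -> 1000 -> 500 -> 30 (the conventional structure).
sizes = [784, 2000, 1000, 500, 30]
rng = np.random.default_rng(0)
enc_weights = [rng.normal(0.0, 0.01, size=(m, n))
               for m, n in zip(sizes[:-1], sizes[1:])]

def folded_forward(x):
    """Encode with each W, then decode with the same matrices transposed."""
    h = x
    for w in enc_weights:             # down to the 30-d code
        h = sigmoid(h @ w)
    for w in reversed(enc_weights):   # back up, reusing the stored encoder weights
        h = sigmoid(h @ w.T)
    return h

n_folded = sum(w.size for w in enc_weights)
print(n_folded, 2 * n_folded)         # about 4.1e6 stored weights vs 8.2e6 unfolded
```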
IV. THE EXPERIMENTAL RESULTS

All the code is implemented in Matlab, and the experiments are run on a desktop computer with an AMD 3.02 GHz dual-core CPU and 4 GB of RAM.

A. The Datasets
Two well-known image datasets, the MNIST dataset [9] and the USPS dataset [10], are used to test the methods. Both consist of handwritten digits produced by different writers. We randomly choose 5000 digits from MNIST as the training data and 1000 digits from MNIST as the test data. For the USPS dataset, we also randomly choose 5000 digits for training and 1000 digits for testing.

B. Structure Improvement
We have improved the architecture of both the autoencoder and the folded autoencoder. The numbers of input nodes and output nodes are determined by the data and are 784 and 30, respectively. For the two datasets, the conventional architecture is "784-2000-1000-500-30" for the folded autoencoder [1] and "784-2000-1000-500-30-500-1000-2000-784" for the traditional autoencoder [2]. They are both referred to as structure "2000-1000-500" in the following sections. In our experiments, the architecture is changed to "784-1000-1000-300-30" for the folded autoencoder and "784-1000-1000-300-30-300-1000-1000-784" for the unfolded autoencoder. In the following they are both referred to as structure "1000-1000-300".
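One practical consequence of this change is a smaller network. The short calculation below compares the number of connection weights in the encoder half of the two structures (biases ignored); the layer sizes are the ones listed above, and the code is only an illustrative Python sketch.

```python
# Encoder layer sizes of the two architectures compared in the paper.
conventional = [784, 2000, 1000, 500, 30]   # structure "2000-1000-500"
proposed     = [784, 1000, 1000, 300, 30]   # structure "1000-1000-300"

def n_weights(sizes):
    """Connection weights in a fully connected layer stack (biases ignored)."""
    return sum(m * n for m, n in zip(sizes[:-1], sizes[1:]))

print(n_weights(conventional))   # 4,083,000
print(n_weights(proposed))       # 2,093,000 -- roughly half as many weights to tune
```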
C. Performance measure and the setup of the parameters
In order to evaluate the proposed method, we use the squared reconstruction error and the C-index [11]. The squared reconstruction error is defined as

$\frac{1}{n}\sum_{i=1}^{n}(x'_i - x_i)^2$,

where $x'_i$ represents the output and $x_i$ represents the input. The output is the reconstruction of the data by the autoencoder, and the input is the original data.

The C-index is defined as

$\text{C-index} = \dfrac{W_{in} - W_{min}(N_{in})}{W_{max}(N_{in}) - W_{min}(N_{in})}$,

where $W_{in}$ is the sum of all the intra-cluster distances, $N_{in}$ is the total number of intra-cluster edges or point pairs, $W_{min}(N_{in})$ is the sum of the smallest $N_{in}$ distances in the proximity matrix $W$, and $W_{max}(N_{in})$ is the sum of the largest $N_{in}$ distances in the proximity matrix $W$. The C-index measures to what extent the clustering puts together the $N_{in}$ pairs of points that are the closest across the k clusters. The C-index lies in the range [0, 1]; usually, the smaller the C-index, the better the clustering result. After dimension reduction, the data in the low-dimensional space are evaluated by the C-index to show the quality of the clustering given by the benchmark labels.
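A direct transcription of these two measures into Python/NumPy is sketched below. It is meant only to spell out the definitions above, and it assumes the low-dimensional points come with benchmark cluster labels; it is not the evaluation code used in the paper.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def squared_reconstruction_error(x, x_rec):
    """Average squared difference between the original and reconstructed data."""
    return np.mean(np.sum((x_rec - x) ** 2, axis=1))

def c_index(points, labels):
    """C-index of a labelled point set, following the definition in the text."""
    d = squareform(pdist(points))               # full pairwise distance matrix W
    iu = np.triu_indices(len(points), k=1)      # each unordered pair counted once
    dists = d[iu]
    same = (labels[:, None] == labels[None, :])[iu]
    w_in = dists[same].sum()                    # sum of all intra-cluster distances
    n_in = int(same.sum())                      # number of intra-cluster point pairs
    d_sorted = np.sort(dists)
    w_min = d_sorted[:n_in].sum()               # sum of the n_in smallest distances
    w_max = d_sorted[-n_in:].sum()              # sum of the n_in largest distances
    return (w_in - w_min) / (w_max - w_min)

rng = np.random.default_rng(0)
codes = rng.random((200, 30))                   # e.g. 30-d codes after reduction
labels = rng.integers(0, 10, size=200)          # benchmark digit labels
print(c_index(codes, labels))                   # values near 0 mean tighter clusters
```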
Both the folded and the unfolded autoencoder use the RBM method to initialize the neural network weights. The maximum number of iterations in the RBM is set to 10. Back propagation is chosen for the training of the network, and the maximum number of back propagation iterations is set to 20. The activation function applied in all neurons is the sigmoid function, defined as $\sigma(x) = \frac{1}{1 + e^{-x}}$.
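For reference, these settings can be collected into a small configuration block; the key names below are ours (the paper does not give its Matlab variable names), so treat this as an illustrative summary rather than the actual setup code.

```python
# Training settings reported in the paper, gathered into one place.
config = {
    "weight_init": "RBM",            # layer-wise RBM pre-training of the weights
    "rbm_max_iterations": 10,        # maximum number of RBM iterations
    "backprop_max_iterations": 20,   # maximum number of back propagation iterations
    "activation": "sigmoid",         # sigma(x) = 1 / (1 + exp(-x)) in every neuron
    "train_digits_per_dataset": 5000,
    "test_digits_per_dataset": 1000,
}
```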

D. Results and analysis
Fig. 3 and Fig. 4 show the experimental results for random samples of the test data of the MNIST dataset and the USPS dataset, respectively. For the MNIST dataset, with both the folded and the unfolded autoencoder, it can be seen from Fig. 3 that most of the images reconstructed with the proposed structure "1000-1000-300" are more similar to the original data than the images produced by the structure "2000-1000-500". For the results of the USPS dataset shown in Fig. 4, it is hard to tell from the images due to the low resolution. However, the numerical evaluations given in the following tables also suggest that the proposed structure "1000-1000-300" can produce better results than the structure "2000-1000-500".

The squared reconstruction errors of the training data and the test data for the MNIST dataset and the USPS dataset are shown in TABLE I and TABLE II, respectively. The smaller the squared reconstruction error, the better the reconstruction. It can be seen that most of the squared reconstruction errors produced by the proposed structure "1000-1000-300" are better than those produced by the structure "2000-1000-500", except for the test data of the MNIST dataset using the unfolded autoencoder.

The C-indices of the training data and the test data after dimension reduction are shown in TABLE III and TABLE IV, respectively. All the C-indices produced by the proposed structure "1000-1000-300" are better than those produced by the structure "2000-1000-500" for the folded autoencoder, but for the unfolded autoencoder the result is the opposite. This may be due to the limitations of the C-index when applied to the evaluation of dimension reduction results. As can be seen from Fig. 3, the proposed structure "1000-1000-300" also works better than the structure "2000-1000-500" when the unfolded autoencoder is used, especially for the reconstruction images of the digit "3".

V. CONCLUSION

We have improved the architecture of both the folded and the unfolded autoencoder for dimension reduction. As can be seen from the experimental results on two popular handwritten digit datasets, the proposed architecture can produce better reconstruction results than the one proposed before in most cases. Although the C-index can also be used to evaluate the dimension reduction results, it only evaluates how good the data representation is in the low-dimensional space in terms of the quality of the clusters given by the benchmark labels, so it has limitations in some cases. Finding a better index for evaluating dimension reduction results is one of our future research focuses.

ACKNOWLEDGMENT

This work is supported by the National Science Foundation of China (Grant No. 61272213).

REFERENCES
[1] Jing Wang, Haibo He, and Danil V. Prokhorov, "A folded neural network autoencoder for dimension reduction," Procedia Computer Science, vol. 13, pp. 120-127, 2012.
[2] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, 2006.
[3] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by Locally Linear Embedding," Science, vol. 290, no. 5500, pp. 2323-2326, 2000.
[4] G. E. Hinton and S. T. Roweis, "Stochastic neighbor embedding," Advances in Neural Information Processing Systems, vol. 15, pp. 833-840, 2002.
[5] L. J. P. van der Maaten, E. O. Postma, and H. J. van den Herik, "Dimensionality reduction: a comparative review," Journal of Machine Learning Research, vol. 10, pp. 66-71, 2009.
[6] D. DeMers and G. Cottrell, "Non-linear dimension reduction," Advances in Neural Information Processing Systems, vol. 5, pp. 580-587, 1993.
[7] P. Smolensky, in Parallel Distributed Processing: Volume 1: Foundations, D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA: MIT Press, 1986, pp. 194-281.
[8] G. E. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Computation, vol. 14, pp. 1771-1800, 2002.
[9] The MNIST dataset is available at http://yann.lecun.com/exdb/mnist/index.html.
[10] The USPS dataset is available at http://www.cs.nyu.edu/~roweis/data.html.
[11] L. Hubert and J. Schultz, "Quadratic assignment as a general data-analysis strategy," British Journal of Mathematical and Statistical Psychology, vol. 29, pp. 190-241, 1976.

Figure 3. Experimental results for random samples of the MNIST dataset. (A) Original images; (B) reconstruction images by the folded autoencoder with the structure "2000-1000-500"; (C) reconstruction images by the folded autoencoder with the proposed structure "1000-1000-300"; (D) reconstruction images by the unfolded autoencoder with the structure "2000-1000-500"; (E) reconstruction images by the unfolded autoencoder with the proposed structure "1000-1000-300".

Figure 4. Experimental results for random samples of the USPS dataset. (A) Original images; (B) reconstruction images by the folded autoencoder with the structure "2000-1000-500"; (C) reconstruction images by the folded autoencoder with the proposed structure "1000-1000-300"; (D) reconstruction images by the unfolded autoencoder with the structure "2000-1000-500"; (E) reconstruction images by the unfolded autoencoder with the proposed structure "1000-1000-300".

TABLE I. THE EXPERIMENTAL RESULTS (SQUARED RECONSTRUCTION ERROR) OF THE MNIST DATASET.

Structure         Folded autoencoder               Unfolded autoencoder
                  Train_error(a)   Test_error(b)   Train_error   Test_error
2000-1000-500     20.4398          21.7094         8.2677        10.488
1000-1000-300     16.9854          18.5288         8.2440        10.573

(a) squared reconstruction error for the training data; (b) squared reconstruction error for the test data.

TABLE II. THE EXPERIMENTAL RESULTS (SQUARED RECONSTRUCTION ERROR) OF THE USPS DATASET.

Structure         Folded autoencoder               Unfolded autoencoder
                  Train_error(a)   Test_error(b)   Train_error   Test_error
2000-1000-500     18.4419          17.8276         5.9523        6.3953
1000-1000-300     11.0251          11.0882         4.9054        5.4495

(a) squared reconstruction error for the training data; (b) squared reconstruction error for the test data.

TABLE III. THE EXPERIMENTAL RESULTS (C-INDEX) OF THE MNIST DATASET.

Structure         Folded autoencoder               Unfolded autoencoder
                  Train_cindex(a)  Test_cindex(b)  Train_cindex  Test_cindex
2000-1000-500     0.2321           0.2328          0.2232        0.2139
1000-1000-300     0.2204           0.2118          0.2428        0.2344

(a) C-index for the training data; (b) C-index for the test data.

TABLE IV. THE EXPERIMENTAL RESULTS (C-INDEX) OF THE USPS DATASET.

Structure         Folded autoencoder               Unfolded autoencoder
                  Train_cindex(a)  Test_cindex(b)  Train_cindex  Test_cindex
2000-1000-500     0.2575           0.2621          0.2295        0.2239
1000-1000-300     0.2549           0.2533          0.2443        0.2370

(a) C-index for the training data; (b) C-index for the test data.
