
Author’s Accepted Manuscript

Auto-Encoder Based Dimensionality Reduction

Yasi Wang, Hongxun Yao, Sicheng Zhao

www.elsevier.com/locate/neucom

PII: S0925-2312(15)01767-1
DOI: http://dx.doi.org/10.1016/j.neucom.2015.08.104
Reference: NEUCOM16394
To appear in: Neurocomputing
Received date: 22 January 2015
Revised date: 3 August 2015
Accepted date: 4 August 2015
Cite this article as: Yasi Wang, Hongxun Yao and Sicheng Zhao, Auto-Encoder
Based Dimensionality Reduction, Neurocomputing,
http://dx.doi.org/10.1016/j.neucom.2015.08.104
This is a PDF file of an unedited manuscript that has been accepted for
publication. As a service to our customers we are providing this early version of
the manuscript. The manuscript will undergo copyediting, typesetting, and
review of the resulting galley proof before it is published in its final citable form.
Please note that during the production process errors may be discovered which
could affect the content, and all legal disclaimers that apply to the journal pertain.
Auto-Encoder Based Dimensionality Reduction
Yasi Wang, Hongxun Yao∗, Sicheng Zhao
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China

Abstract

The auto-encoder, a simple three-layered neural network formerly known as auto-association, constitutes the "building block" of deep learning, which has been demonstrated to achieve good performance in various domains. In this paper, we investigate the dimensionality reduction ability of the auto-encoder and ask whether it has some good property that might accumulate when stacked and thus contribute to the success of deep learning.
Based on the above idea, this paper starts from the auto-encoder and focuses on its ability to reduce dimensionality, trying to understand the difference between the auto-encoder and state-of-the-art dimensionality reduction methods. Experiments are conducted both on synthesized data, mainly in two- and three-dimensional spaces for better visualization and an intuitive understanding of the method, and on real datasets, including the MNIST and Olivetti face datasets. The results show that the auto-encoder can indeed learn something different from other methods. In addition, we preliminarily investigate the influence of the number of hidden layer nodes on the performance of the auto-encoder and its possible relation with the intrinsic dimensionality of the input data.
Keywords: Auto-encoder, dimensionality reduction, visualization, intrinsic dimensionality,
dimensionality-accuracy


∗Corresponding author.
Email addresses: [email protected] (Yasi Wang), [email protected] (Hongxun Yao),
[email protected] (Sicheng Zhao)

Preprint submitted to Neurocomputing November 24, 2015


1. Introduction

An artificial neural network is a mathematical and computational model composed of a large number of neurons that can simulate the structural and functional characteristics of a biological neural network. It is a self-adaptive system that changes its internal structure according to the external input, and it is commonly used to model complex relationships between inputs and outputs. The development of artificial neural networks has been a tortuous road. The perceptron is the starting point of modern neural computation, and the proof of the perceptron convergence theorem in 1962 triggered the first climax of neural network research, represented by the perceptron. In 1969, M. Minsky and S. Papert [2] [20] pointed out the defects of the perceptron and took a pessimistic view of neural network research, which pushed the study of neural networks from rise into stagnation. By the early 1980s, related work by J. J. Hopfield et al. [16] showed the potential of neural networks and brought their study from stagnation back into a boom. In the mid-1990s, with the advent of the support vector machine (SVM) [10], researchers realized some limitations of artificial neural networks and neural network research fell into a low-tide period again.
The auto-encoder, formerly known as auto-association, is a simple three-layered neural network that was studied by a number of researchers in the 1990s. Bourlard and Kamp [7] discussed the relationship between auto-association by multi-layer perceptrons and singular value decomposition in 1988. They showed that, for auto-association with linear output units, the optimal weight values can be derived by purely linear techniques relying on singular value decomposition and low-rank matrix approximation. In 1991, Kramer [18] introduced how to conduct nonlinear principal component analysis using auto-associative neural networks with three hidden layers. Due to the difficulty of training, deep networks built from multiple stacked auto-encoders did not exert their superior strength for a long time. Some researchers committed themselves to investigating several fundamental issues of neural networks with one or two hidden layers [8] [1] [22] [23] [3] [9] [17] [25] [21] [27].

In 2006, Geoffrey Hinton [15] largely solved this problem in Science and broke the stalemate in which neural networks had been stuck. The method was to apply layer-wise pre-training [14] to multi-layer auto-encoders using RBMs [13]. This is the very hot topic of recent years: deep learning. Since then, deep learning has swept across industry and academia like a wave. Recent research results have also demonstrated that deep learning indeed achieves state-of-the-art performance in various areas [4] [6] [5]. In particular, deep learning has already been successfully applied in industry in the speech area. In the field of image analysis there are also many good results. By building a 9-layered locally connected sparse auto-encoder, Le et al. [24] discovered that the network was sensitive to high-level concepts such as cat faces. Krizhevsky et al. [19] attained record-breaking performance on ImageNet in 2012 with a large and deep convolutional neural network. Zeiler et al. [26] demonstrated state-of-the-art performance on the Caltech-101 and Caltech-256 datasets. Zhu et al. [28] designed a deep network that extracts face identity-preserving features from an image under any pose and illumination, which can be used to reconstruct the face image in a canonical view.
Although it achieves impressive performance and is widely applied, deep learning is actually something of a "black box", and there is no sufficiently rigorous theoretical system to support it. The problem, then, is that we obtain impressive performance with deep learning but do not know, theoretically, why. Many researchers in deep learning tend to make progress by employing increasingly deep models and more complex unsupervised learning algorithms.
This paper arises from the question of whether the auto-encoder has some good property that might accumulate when stacked and thus contribute to the success of deep learning. We start from a "building block" of deep learning, the auto-encoder, and focus on its dimensionality reduction ability. When the number of hidden layer nodes in an auto-encoder is restricted to be less than the number of original input nodes, the desired dimensionality reduction effect can be achieved.

Based on the above analysis, the main contributions of the paper can be summarized as follows:
(1) We start from the auto-encoder and focus on its ability to reduce dimensionality, trying to understand the difference between the auto-encoder and state-of-the-art dimensionality reduction methods. The results show that the auto-encoder indeed learns something different from other methods.
(2) We preliminarily investigate the influence of the number of hidden layer nodes on the performance of the auto-encoder on the MNIST and Olivetti face datasets. The results reveal a possible relation with the intrinsic dimensionality.
On the whole, we expect that analyzing the fundamental methods of deep learning, e.g. the auto-encoder, will help us understand deep learning better.

2. Auto-encoder based dimensionality reduction

In this section, we briefly introduce the auto-encoder, four representative dimensionality reduction methods, and the concepts of dimensionality reduction and intrinsic dimensionality. We then focus on the auto-encoder's dimensionality reduction ability and investigate the influence of the number of hidden layer nodes in the auto-encoder.

2.1. Auto-encoder

Suppose the original input x lies in an n-dimensional space and the new representation y lies in an m-dimensional space. An auto-encoder is a special three-layered neural network in which we require the output h_{W,b}(x) = (x̃1, x̃2, ..., x̃n)^T to be approximately equal to the input x = (x1, x2, ..., xn)^T. It is an unsupervised learning algorithm and uses the back-propagation algorithm for training; J is the reconstruction error:

h_{W,b}(x) = g(f(x)) ≈ x

J(W, b; x, y) = (1/2) ‖h_{W,b}(x) − y‖²
Fig. 1 and Fig. 2 show the structure and a visual description of an auto-encoder, respectively. As shown in Fig. 2, the mapping from the first layer to the second layer acts as an encoder f, and the mapping from the second layer to the third layer acts as a decoder g. We then minimize the reconstruction error J by adjusting the parameters of the encoder and the decoder to obtain the code.
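For concreteness, the following is a minimal sketch of such a three-layered auto-encoder trained with batch gradient descent (NumPy only). The sigmoid activations, initialization scale, learning rate and epoch count are illustrative assumptions rather than the exact configuration used in the experiments below.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, m, lr=0.1, epochs=2000, seed=0):
    """Three-layered auto-encoder: n inputs -> m hidden nodes -> n outputs.

    X has shape (num_samples, n), with values assumed scaled to [0, 1]
    because the output units are sigmoids."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    W1, b1 = rng.normal(0.0, 0.1, (n, m)), np.zeros(m)   # encoder f
    W2, b2 = rng.normal(0.0, 0.1, (m, n)), np.zeros(n)   # decoder g

    for _ in range(epochs):
        # forward pass: code y = f(x), reconstruction h_{W,b}(x) = g(f(x))
        Y = sigmoid(X @ W1 + b1)
        X_rec = sigmoid(Y @ W2 + b2)

        # gradient of J = 1/2 ||h_{W,b}(x) - x||^2, back-propagated
        err = X_rec - X
        d_out = err * X_rec * (1.0 - X_rec)
        d_hid = (d_out @ W2.T) * Y * (1.0 - Y)

        W2 -= lr * Y.T @ d_out / len(X)
        b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * X.T @ d_hid / len(X)
        b1 -= lr * d_hid.mean(axis=0)

    return W1, b1, W2, b2

def encode(X, W1, b1):
    """The m-dimensional code, i.e. the reduced representation."""
    return sigmoid(X @ W1 + b1)
```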

Fig. 1: The structure of an auto-encoder: input layer L1 with nodes x1, ..., xn, hidden layer L2 with nodes y1, ..., ym, and output layer L3 with reconstructions x̃1, ..., x̃n, plus bias units (+1).

Fig. 2: Visual description of an auto-encoder: the encoder maps the input to the code, the decoder maps the code back to the output, and training minimizes the reconstruction error.
The auto-encoder can be seen as a way to transform a representation. When the number of hidden layer nodes m is restricted to be greater than the number of original input nodes n and a sparsity constraint is added, the result is similar to sparse coding. When m is restricted to be less than n, we obtain a compressed representation of the input, which achieves the desired dimensionality reduction effect. This paper mainly focuses on the latter case.
The following is a very quick introduction to the four representative dimensionality reduction methods [12] that will be compared with the auto-encoder; a short usage sketch follows their descriptions.

PCA. Principal component analysis is a very popular linear technique for dimensionality reduction. Given a dataset in Rn, PCA aims to find a linear subspace of dimension d lower than n that attempts to maintain most of the variability of the data.

LDA. Linear discriminant analysis is another popular linear dimensionality reduction method. The basic idea is to ensure that the projected samples have maximum between-class distance and minimum within-class distance in the new subspace.

LLE. Locally linear embedding is a nonlinear approach that reduces dimensionality by computing a low-dimensional, neighbourhood-preserving embedding of high-dimensional data. A dataset of dimensionality n, which is assumed to lie on or near a smooth nonlinear manifold of dimensionality d < n, is mapped into a single global coordinate system of lower dimensionality d. The global nonlinear structure is recovered by locally linear fits.

Isomap. Isomap is a nonlinear generalization of classical multidimensional scaling. The main idea is to perform multidimensional scaling not in the input space but in the geodesic space of the nonlinear data manifold. The geodesic distance represents the shortest path along the curved surface of the manifold, measured as if the surface were flat; it can be approximated by a sequence of short steps between neighbouring sample points.
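As a rough sketch of how the four reference methods might be invoked, all of them are available in scikit-learn; the neighbourhood size below is an illustrative assumption, and the auto-encoder (e.g. the train_autoencoder sketch in Section 2.1) would be run alongside them.

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import LocallyLinearEmbedding, Isomap

def reduce_to_1d(X, labels=None, n_neighbors=10):
    """Project X of shape (num_samples, n) to one dimension with each method."""
    results = {
        "PCA": PCA(n_components=1).fit_transform(X),
        "LLE": LocallyLinearEmbedding(n_neighbors=n_neighbors,
                                      n_components=1).fit_transform(X),
        "Isomap": Isomap(n_neighbors=n_neighbors,
                         n_components=1).fit_transform(X),
    }
    if labels is not None:  # LDA is supervised and needs class labels
        results["LDA"] = LinearDiscriminantAnalysis(
            n_components=1).fit_transform(X, labels)
    return results
```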

2.2. Dimensionality reduction ability of auto-encoder

As a common rule, superficially high-dimensional and complex phenomena can, in most situations, actually be dominated by a small number of simple variables. Dimensionality reduction is an old yet still dynamic research topic [11] [12]; it looks for a projection that maps data from a high-dimensional feature space to a low-dimensional one. Dimensionality reduction methods can in general be divided into two categories, linear and nonlinear.
This paper examines the auto-encoder's dimensionality reduction ability by comparing the auto-encoder with several linear and nonlinear dimensionality reduction methods, both in a number of cases from two- and three-dimensional spaces, for more intuitive results, and on real datasets, including the MNIST and Olivetti face datasets. More details and results can be found in the experiment part.

2.3. The number of hidden layer nodes in auto-encoder & intrinsic dimensionality

In the process of dimensionality reduction, discarding some dimensions inevitably leads to a loss of information. Thus the primary problem to be solved is to keep the main and important characteristics of the original data as much as possible, so the process of dimensionality reduction is closely related to the characteristics of the original data.
Intrinsic dimensionality, an important intrinsic characteristic of high-dimensional data, reflects the essential dimension of the data well. Sample data in a high-dimensional space generally cannot diffuse over the whole space; they actually lie on a low-dimensional manifold embedded in the high-dimensional space, and the dimensionality of that manifold is the intrinsic dimensionality of the data.
This paper preliminarily investigates the influence of the number of hidden layer nodes on the performance of the auto-encoder on the MNIST and Olivetti face datasets by recording how the performance of a classifier changes as the dimensionality of the projected representation varies. More details and results can be found in the experiment part.

3. Experiment

3.1. Experimental setup

The experiments consist of two main parts. The first part is conducted on both synthesized data and real datasets to evaluate the auto-encoder's dimensionality reduction ability. The second part is conducted on real datasets to investigate the influence of the number of hidden layer nodes on the performance of the auto-encoder.

3.1.1. Dimensionality reduction

In this subsection, we evaluate the auto-encoder's dimensionality reduction ability compared with several popular dimensionality reduction methods in a number of cases. We pick PCA and LDA as representatives of linear methods, and LLE and Isomap as representatives of nonlinear methods.

On synthesized data. The experiments are conducted on synthesized data in two- and three-dimensional spaces, because in these situations we can obtain better visualization results for an intuitive understanding of the auto-encoder. Fig. 3 shows two cases from two-dimensional space and Fig. 4 shows three transformations of helix cases from three-dimensional space. In these cases, we use all five methods, including the auto-encoder, to perform the dimensionality reduction

R2 → R1

R3 → R1

and see how the results obtained by the auto-encoder differ from those of the other four dimensionality reduction methods. After the dimensionality reduction, the original points from two- or three-dimensional space are projected into one-dimensional space, and the resulting data are further rescaled to a fixed range for better visual quality. Results can be found in Section 3.2.
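The exact synthetic curves are only shown graphically, so the generators below are illustrative reconstructions (point counts, radii and the number of helix turns are assumptions) of the kind of data used in these cases.

```python
import numpy as np

def logarithmic_curve(n=100):
    """n points on a logarithmic curve in R^2 (cf. the first case in Section 3.2.1)."""
    t = np.linspace(1.0, 100.0, n)
    return np.column_stack([t, np.log(t)])

def concentric_circles(n_per_circle=100, radii=(1.8, 2.0)):
    """Points on two close concentric circles in R^2 (the second case)."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_per_circle, endpoint=False)
    return np.vstack([np.column_stack([r * np.cos(theta), r * np.sin(theta)])
                      for r in radii])

def circular_helix(n=200, turns=3):
    """Points on a circular helix in R^3 (one of the three helix cases)."""
    t = np.linspace(0.0, 2.0 * np.pi * turns, n)
    return np.column_stack([np.cos(t), np.sin(t), t])
```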

Fig. 3: Two cases from two-dimensional space.

Fig. 4: Three cases from three-dimensional space.

On real datasets. MNIST is a classical dataset of handwritten digits with 60,000 training images and 10,000 test images; every image is normalized and centred at a unified size of 28 by 28 pixels. The Olivetti face dataset contains 400 images from 40 different people, 10 images each, of size 64 by 64; we first resize these images from 64*64 to 28*28 to speed up processing.
The images in both datasets are grayscale. Fig. 5 shows a number of images from the MNIST and Olivetti face datasets, respectively.

Fig. 5: Images from the two datasets: (a) MNIST, (b) Olivetti faces.

We take the pixel intensities as the original representation and use the auto-encoder and PCA to perform the dimensionality reduction. When using the auto-encoder, we visualize the function it has learned via the weights between the first and second layers, to see what exactly the auto-encoder learns. Besides, when the dimensionality of the transformed representation is 2 or 3, i.e., when the original images are projected into two- or three-dimensional space, we can visualize the comparison between the auto-encoder and PCA. Results can be found in Section 3.2.
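A sketch of that pipeline for the two-dimensional case, fetching MNIST through scikit-learn (an assumption about tooling; the paper does not state how the data are loaded) and reusing the train_autoencoder and encode helpers sketched in Section 2.1:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA

# Pixel intensities scaled to [0, 1] form the original 784-dimensional representation.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X, y = X[:5000] / 255.0, y[:5000].astype(int)   # a subset keeps the sketch fast

# Reduce to two dimensions with PCA and with the auto-encoder (s = 2 hidden nodes).
X_pca = PCA(n_components=2).fit_transform(X)
W1, b1, _, _ = train_autoencoder(X, m=2, lr=0.5, epochs=200)
X_ae = encode(X, W1, b1)

# Scatter plots coloured by digit class, analogous to Fig. 15.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, Z, title in [(axes[0], X_ae, "Auto-encoder"), (axes[1], X_pca, "PCA")]:
    ax.scatter(Z[:, 0], Z[:, 1], c=y, s=3, cmap="tab10")
    ax.set_title(title)
plt.show()
```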

3.1.2. Influence of the number of hidden layer nodes

In this subsection, we preliminarily investigate the influence of the number of hidden layer nodes on the performance of the auto-encoder on the MNIST and Olivetti face datasets.
For the MNIST dataset, every image has a total of 28 * 28 = 784 pixels and the intrinsic dimensionality is known to be 10. We use the auto-encoder to reduce the dimensionality from 784 to s, i.e.

R784 → Rs

perform classification with a softmax classifier, and record how the performance of the classifier changes as the dimensionality s of the new representation varies.
The same procedure is applied to the Olivetti face dataset. Results can be found in Section 3.3.
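In outline, the sweep could look as follows, with scikit-learn's multinomial logistic regression standing in for the softmax classifier and an arbitrary train/test split; this reuses the train_autoencoder and encode helpers from Section 2.1 and omits the finetuning step.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def accuracy_vs_dimensionality(X, y, dims, **ae_kwargs):
    """For each s in dims, reduce R^784 -> R^s with the auto-encoder, train a
    softmax-style classifier on the codes, and record the test accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    scores = {}
    for s in dims:
        W1, b1, _, _ = train_autoencoder(X_tr, m=s, **ae_kwargs)
        clf = LogisticRegression(max_iter=1000)
        clf.fit(encode(X_tr, W1, b1), y_tr)
        scores[s] = clf.score(encode(X_te, W1, b1), y_te)
    return scores
```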
3.2. Dimensionality reduction

3.2.1. On synthesized data from R2 to R1

In this subsection, we apply the five methods to two different cases in two-dimensional space to see how the auto-encoder differs from the other dimensionality reduction methods.

The first case. In the first and simplest case, a logarithmic curve, all methods do a good job. The results of PCA, LDA and Isomap are stable, as shown in Fig. 6. The results of LLE and the auto-encoder are relatively less stable, meaning that running the program twice may give two different results; two examples for LLE and the auto-encoder are shown in Fig. 7. Although these results are not stable, most of them can be considered "acceptable", in the sense that the points after dimensionality reduction are monotonically increasing or decreasing and thus keep the geometric characteristics of the original points.
Fig. 6: Three stable results in the logarithmic curve case: (a) PCA, (b) LDA, (c) Isomap.

Fig. 7: Two unstable results in the logarithmic curve case: (a) LLE, (b) auto-encoder.
In this case, the auto-encoder seems to perform no differently from the other four methods, but looking a little more deeply reveals some differences.
As can be seen from Fig. 6 and Fig. 7, there are 100 two-dimensional points (x1, x2, ..., x100) on the logarithmic curve, indicated by red '+' marks; we use the five dimensionality reduction methods to project the original points to 100 one-dimensional points (y1, y2, ..., y100) on the horizontal axis, indicated by black dots.
We use dis to denote the distance between two points and dif to denote the difference between each pair of adjacent projected points, i.e.,

dif(i) = y_{i+1} − y_i = dis(y_i, y_{i+1}),   i = 1, ..., 99.

Since we want to evaluate the quality of the dimensionality reduction, it is appropriate to use dis(y_i, y_{i+1}) as a proxy for dis(x_i, x_{i+1}). dif is therefore an array of 99 values; we obtain one dif per method and draw the bar graph shown in Fig. 8.
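Computing dif from a one-dimensional projection is a one-liner; a small helper (the name is ours) is shown below.

```python
import numpy as np

def dif(y):
    """Adjacent differences dif(i) = y_{i+1} - y_i of a 1-D projection.

    For the 100 projected points of the logarithmic curve this returns the
    99 values plotted as bar graphs in Fig. 8 and Fig. 12."""
    return np.diff(np.ravel(y))
```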
Fig. 8: dif of the five methods (PCA, LDA, LLE, Isomap, auto-encoder) in the logarithmic curve case.

From Fig. 8, we can make the following observations. (1) For PCA, LLE and Isomap, dif is close to a constant. Since we use the projected points to represent the original points, this means that these three methods treat dis(x1, x2), dis(x50, x51) and dis(x99, x100) as almost the same. That, however, is not the case: the distances between adjacent points on the logarithmic curve should not be regarded as equal, which is most obvious when comparing dis(x1, x2) with dis(x99, x100). (2) For LDA, dif is monotonically decreasing, which means dis(x1, x2) is treated as much larger than dis(x99, x100). (3) For the auto-encoder, dif first ascends and then descends, which means dis(x1, x2) is nearly the same as dis(x99, x100), while dis(x50, x51) is a little larger than both.
It is hard to say which method is better or which result is more accurate; it depends on the way you look at it. The first twenty points increase slowly along the horizontal axis but quickly along the vertical axis, while the last twenty points do the opposite; the middle points fall in between. From this perspective, the auto-encoder seems to produce a better projection than the other four methods and has a better local distance-preserving property.

The second case. In the second case, concentric circles, the results of PCA, LDA, LLE and Isomap are the same, as shown in Fig. 9(a). This is a good result, since four symmetric points are projected onto the same point; thus, in this case, all four methods do a good job. The result of the auto-encoder, however, is rather different, as shown in Fig. 9(b). It looks like a bad and confusing result at first sight, but again, going deeper and considering the problem from a different angle reveals something interesting.

Fig. 9: Two results in the concentric circles case: (a) PCA, (b) auto-encoder.


Fig. 10: Result of the auto-encoder in the concentric circles case.

Let us move to Fig. 10. In Fig. 10(a), four points on the concentric circles are mapped to the maximum and minimum values of each circle after dimensionality reduction, indicated by pink and cyan lines respectively, and the tangents at these four points are parallel. In Fig. 10(b), two groups of symmetrical points happen to be mapped to the same value, zero, indicated by green lines. The auto-encoder maps corresponding points on the two close concentric circles to almost the same point, and every half circle corresponds to a "minimum-to-maximum" cycle. In other words, the concentric circles in Fig. 10 can be seen as made up of four semicircles, or four repetitive structures, and the auto-encoder maps these four repetitive semicircles to almost the same range.
We can carry out a small mathematical deduction here, simple enough to explain why a single point is projected to the maximum value or to zero. Let (x1, x2) denote the original point, y the projected point, (w1, w2) the trained weights and b the trained bias. Considering the encoder only, we have

y = sigmoid(w1 x1 + w2 x2 + b)

By analyzing the trained weights (w1, w2) and the original points (xmax1, xmax2) and (xmin1, xmin2) that are projected to the maximum and minimum values, we find two relations: (1) w1 = −w2; (2) xmin1 = xmax2 and xmin2 = xmax1.
Plugging these relations into the formula above gives

ymax = sigmoid(w1 xmax1 + w2 xmax2 + b) = sigmoid(w1 (xmax1 − xmax2) + b)

ymin = sigmoid(w1 xmin1 + w2 xmin2 + b) = sigmoid(w1 (xmax2 − xmax1) + b)

When (xmax1 − xmax2) reaches its maximum value, (xmax2 − xmax1) reaches its minimum value at the same time. For this to happen, xmax1 should be the maximum and xmax2 the minimum among the coordinates of the original points, which is indeed what we observe.
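A quick numerical check of this symmetry argument is easy to run; the weight and coordinate values below are made up purely for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w1, b = 3.0, 0.2
w2 = -w1                        # observed relation (1): w1 = -w2

def encode_point(x1, x2):
    # With w2 = -w1 the code depends only on the difference x1 - x2.
    return sigmoid(w1 * x1 + w2 * x2 + b)

x_max = (1.0, -1.0)             # point whose difference x1 - x2 is largest
x_min = (x_max[1], x_max[0])    # observed relation (2): coordinates swapped

print(encode_point(*x_max))     # close to 1: the maximum of the code
print(encode_point(*x_min))     # close to 0: the minimum of the code
```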

3.2.2. On synthesized data from R3 to R1

In this subsection, we apply PCA and the auto-encoder to three transformations of helix cases in three-dimensional space.
In all three cases, the auto-encoder does a good job; the results are shown in Fig. 11, and a similar analysis of dif for the circular helix can be found in Fig. 12. From these results we can see that, in some cases, the auto-encoder not only reduces the dimensionality but can also detect repetitive structures. From the viewpoint of information theory, compressing the data dimensionality amounts to finding redundancy, and a repetitive structure is a kind of redundancy. We believe this is a good property for many applications.

Fig. 11: Three results in the helix cases: (a) circular helix, (b) conical helix, (c) planar helix.

Fig. 12: dif of the five methods in the circular helix case.

3.2.3. On real datasets from R784 to Rs

In this subsection, we apply PCA and the auto-encoder to the real datasets, MNIST and the Olivetti face dataset, to see how the auto-encoder differs from PCA, and we give visualization results.

Visualize the function learned by the auto-encoder. In this case, we want to see what exactly the auto-encoder learns after training. For the MNIST dataset, we set s equal to 2, 10 and 300 and visualize the learned function via the weights between the first and second layers. For the Olivetti face dataset, we resize the images from 64*64 to 28*28 and then set s equal to 10, 40 and 300 for visualization. Results are shown in Fig. 13 and Fig. 14. Clearly, the features the auto-encoder learns, e.g. strokes and faces, are important for computer vision tasks.
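One common way to produce such a visualization (a sketch; normalization and layout choices vary) is to reshape each hidden unit's incoming weight vector, i.e. a column of W1 from the earlier sketch, into a 28 x 28 image.

```python
import numpy as np
import matplotlib.pyplot as plt

def show_hidden_unit_weights(W1, image_shape=(28, 28), max_units=100):
    """Display the weights between the input layer and each hidden unit.

    W1 has shape (n_inputs, m_hidden); column j is hidden unit j's 'filter'."""
    m = min(W1.shape[1], max_units)
    cols = int(np.ceil(np.sqrt(m)))
    rows = int(np.ceil(m / cols))
    fig, axes = plt.subplots(rows, cols, figsize=(cols, rows))
    for j, ax in enumerate(np.ravel(axes)):
        ax.axis("off")
        if j < m:
            ax.imshow(W1[:, j].reshape(image_shape), cmap="gray")
    plt.show()
```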

Fig. 13: Visualization for the MNIST dataset: (a) s = 2, (b) s = 10, (c) s = 300.

Fig. 14: Visualization for the Olivetti face dataset: (a) s = 10, (b) s = 40, (c) s = 300.

Visualize the comparison between the auto-encoder and PCA. In this case, we set s equal to 2 and 3 and use the auto-encoder and PCA, respectively, to reduce the dimensionality of the MNIST dataset. We then visualize the comparison between the auto-encoder and PCA; results are shown in Fig. 15, Fig. 16 and Fig. 17.
Fig. 15 shows that, when s = 2, every training example is projected to a point in two-dimensional space using the auto-encoder and PCA, respectively. The training examples of the ten classes are shown in different colors. The same goes for Fig. 16, except that s = 3.

Fig. 17 shows the training examples of each single class. Fig. 17(a) displays the projected points of digit 1 using the auto-encoder, while Fig. 17(g) displays those of digit 1 using PCA. From these results we can see that, in this case, the auto-encoder performs better than PCA: compared to PCA, the auto-encoder tends to project images of the same class towards edges and corners, which leads to preferable results.

Fig. 15: Dimensionality reduction R784 → R2: (a) auto-encoder, (b) PCA.

Fig. 16: Dimensionality reduction R784 → R3: (a) auto-encoder, (b) PCA.

Fig. 17: Dimensionality reduction R784 → R2 for single classes: (a)-(f) auto-encoder projections and (g)-(l) PCA projections of digits 1, 4, 5, 8, 9 and 10.
3.3. Influence of the number of hidden layer nodes

3.3.1. MNIST dataset

Using the auto-encoder to reduce the dimensionality from 784 to s, we record how the performance of the softmax classifier changes as the dimensionality s varies.
For every dimensionality, we run the program five times and take the average. Results are shown in Fig. 18, where the blue "*" represents the average result of the auto-encoder before finetuning and the blue "." represents the average result after finetuning.
In Fig. 18, the accuracy rises at first, goes down later, and then levels off while rising slowly. We observe that when the dimensionality s is near 10, i.e. equal to the intrinsic dimensionality of the original data, the accuracy is relatively high. This might reveal a possible relation between the number of hidden layer nodes and the intrinsic dimensionality of the original data, and it gives some advice on choosing the number of hidden layer nodes.

Fig. 18: Softmax classifier accuracy (averages before and after finetuning) with the dimensionality from 1 to 775 on the MNIST dataset.

In this case, we run the program fifteen times when the dimensionality of the new representation is between 5 and 15 (near 10) to obtain more stable results. The red "*" represents the maximum result of the auto-encoder before finetuning and the red "." represents the maximum result after finetuning. Results are shown in Fig. 19. We can see that when the dimensionality s is near 10, the corresponding accuracy is generally over 50%, which is higher than that of its neighbours.

Fig. 19: Softmax classifier accuracy (averages and maxima, before and after finetuning) with the dimensionality from 5 to 15 on the MNIST dataset.

3.3.2. Olivetti face dataset

We first resize the images from 64*64 to 28*28; the following steps are the same as above. Results are shown in Fig. 20.
However, we do not observe the same phenomenon as on the MNIST dataset. A possible reason is that the MNIST dataset is simple, so a small number of hidden layer nodes is enough to model the data; when the number of hidden layer nodes is set equal to the intrinsic dimensionality, the hidden layer may fit the structure of the original data better, which is why we get almost the best result when s equals 10 on MNIST. The Olivetti face dataset, by contrast, is much more complicated than MNIST, and a small number of hidden layer nodes is not able to model the data. That may be why, in this case, the accuracy simply keeps rising with the dimensionality s.

Fig. 20: Softmax classifier accuracy (averages and maxima, before and after finetuning) with the dimensionality from 1 to 100 on the Olivetti face dataset.

4. Conclusion

This paper starts from the auto-encoder and focuses on its ability to reduce dimensionality. Firstly, experiments show that the results of the auto-encoder indeed differ from those of other dimensionality reduction methods. In some cases, the auto-encoder not only reduces the dimensionality but can also detect repetitive structures; we believe this is a good property for many applications, and in situations with repetitive structures the auto-encoder may be more suitable. Secondly, the paper preliminarily investigates the influence of the number of hidden layer nodes on the performance of the auto-encoder. For classification on the MNIST dataset without finetuning, the best results are achieved when the number of hidden layer nodes is set around the intrinsic dimensionality of the dataset. This might reveal a possible relation between the number of hidden layer nodes and the intrinsic dimensionality of the dataset.
References

[1] Sartori M. A. and Antsaklis P. J. A simple method to derive bounds on the size and to train multilayer
neural networks. IEEE Transactions on Neural Networks, 2(4):467–471, 1991.
[2] M. Arbib. Review of ’Perceptrons: An introduction to computational geometry’. IEEE Transactions
on Information Theory, 15(6):738–739, 1969.
[3] Huang G. B. Learning capability and storage capacity of two-hidden-layer feedforward networks. IEEE
Transactions on Neural Networks, 14(2):274–281, 2003.
[4] Yoshua Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127, 2009.
[5] Yoshua Bengio. Deep learning of representations: Looking forward. In Statistical Language and Speech
Processing, pages 1–37. 2013.
[6] Yoshua Bengio, Aaron Courville, and Pascal Vincent. Unsupervised feature learning and deep learning:
A review and new perspectives. CoRR, 2012.
[7] H. Bourlard and Y. Kamp. Auto-association by multilayer perceptrons and singular value decomposi-
tion. Biological Cybernetics, 59(4-5):291–294, 1988.
[8] Huang S. C. and Huang Y. F. Bounds on the number of hidden neurons in multilayer perceptrons.
IEEE Transactions on Neural Networks, 2(1):47–55, 1991.
[9] Xiang C., Ding S. Q., and Lee T. H. Geometrical interpretation and architecture selection of MLP.
IEEE Transactions on Neural Networks, 16(1):84–96, 2005.
[10] Corinna Cortes and Vladimir Vapnik. Support-vector networks. In Machine Learning, pages 273–297,
1995.
[11] David Demers and Garrison Cottrell. Non-linear dimensionality reduction. In Advances in Neural
Information Processing Systems, pages 580–587, 1993.
[12] Ali Ghodsi. Dimensionality reduction a short tutorial. Department of Statistics and Actuarial Science,
2006.
[13] Geoffrey Hinton. A practical guide to training restricted boltzmann machines. In Neural Networks:
Tricks of the Trade, pages 599–619. 2012.
[14] Geoffrey Hinton and Simon Osindero. A fast learning algorithm for deep belief nets. Neural Computa-
tion, 18:2006, 2006.
[15] Geoffrey Hinton and R. Salakhutdinov. Reducing the dimensionality of data with neural networks.
Science, 313(5786):504–507, 2006.
[16] J. J. Hopfield and D. W. Tank. "Neural" computation of decisions in optimization problems. Biological Cybernetics, 52(3):141–152, 1985.
[17] Wang J., He H. B., and Prokhorov D. V. A folded neural network autoencoder for dimensionality
reduction. Procedia Computer Science, 13:120–127, 2012.
[18] Mark A. Kramer. Nonlinear principal component analysis using autoassociative neural networks. AIChE
Journal, 37(2):233–243, 1991.
[19] Alex Krizhevsky, Sutskever Ilya, and Geoffrey Hinton. Imagenet classification with deep convolutional
neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105. 2012.
[20] Marvin Minsky and Seymour Papert. Perceptrons: an introduction to computational geometry (ex-
panded edition), 1988.
[21] Japkowicz Nathalie, Hanson S., and Gluck M. Nonlinear autoassociation is not equivalent to PCA.
Neural computation, 12(3):531–545, 2000.
[22] Tamura S. Capabilities of a three layer feedforward neural network. In IEEE International Joint
Conference on Neural Networks, pages 2757–2762, 1991.
[23] Tamura S. and Tateishi M. Capabilities of a four-layered feedforward neural network: four layers versus
three. IEEE Transactions on Neural Network, 8(2):251–255, 1997.
[24] Le Q. V. Building high-level features using large scale unsupervised learning. In IEEE International
Conference on Acoustics, Speech and Signal Processing, pages 8595–8598, 2013.
[25] Wang Wei, Huang Yan, Wang Yizhou, and Wang Liang. Generalized autoencoder: A neural net-
work framework for dimensionality reduction. In IEEE Conference on Computer Vision and Pattern
Recognition Workshops, pages 496–503, 2014.
[26] Matthew D. Zeiler and Rob Fergus. Visualizing and understanding convolutional networks. CoRR,
2013.
[27] Hu Zhaohua and Song Yaoliang. Dimensionality reduction and reconstruction of data based on autoen-
coder network. Journal of Electronics & Information Technology, 31(5):1189–1192, 2009.
[28] Zhenyao Zhu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning identity-preserving face
space. In IEEE International Conference on Computer Vision, pages 113–120, 2013.
