Autoencoders
f = a x²
Structure: square of 'x'
Parameter: {a}

f = a e^(bx) + c x² + d x + k sin(x)
Structure: Exponential + Polynomial + Trigonometric
Parameters: {a, b, c, d, k}
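As a quick sketch (the names and usage below are purely illustrative), the two structures above can be written directly in Python:

    import numpy as np

    def f_square(x, a):
        # Structure: square of 'x'; parameter: {a}
        return a * x**2

    def f_mixed(x, a, b, c, d, k):
        # Structure: exponential + polynomial + trigonometric
        # Parameters: {a, b, c, d, k}
        return a * np.exp(b * x) + c * x**2 + d * x + k * np.sin(x)

The "structure" fixes the form of the function; learning only adjusts the parameters.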
A function:
f: x ∈ ℝ^n → f(x) ∈ ℝ^m
Parameters P: weights w
Let us ask: if y = x, what is the meaning of f?

x ∈ ℝ^n → f → y = x ∈ ℝ^n

The function by itself does not give any 'insight'.

Split f into two functions with a narrow intermediate representation:

x ∈ ℝ^n → g(S1; P1) → h ∈ ℝ^q → f(S2; P2) → y = x ∈ ℝ^n
How to 'learn'?
Dataset: M unlabeled observations
x1, x2, …, xM
Encoder: g(S1; P1) maps xᵢ ∈ ℝ^n to the latent representation hᵢ ∈ ℝ^q.
Decoder: f(S2; P2) maps hᵢ back to x̃ᵢ ∈ ℝ^n.

Loss function:

argmin_{f, g} 𝔼[ | xᵢ − x̃ᵢ |² ]

When q is very small, the latent representation h acts as a bottleneck ("bottleneck" is another popular name for it). Such an architecture is written (N, q, N).
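A minimal sketch of such an (N, q, N) feed-forward autoencoder, assuming TensorFlow/Keras; N = 784 and q = 16 are illustrative choices for MNIST-sized inputs:

    import tensorflow as tf
    from tensorflow.keras import layers, Model

    N, q = 784, 16                       # assumed sizes for MNIST-like inputs

    inputs = tf.keras.Input(shape=(N,))
    h = layers.Dense(q, activation="relu", name="encoder")(inputs)       # g(S1; P1)
    outputs = layers.Dense(N, activation="sigmoid", name="decoder")(h)   # f(S2; P2)

    autoencoder = Model(inputs, outputs)
    autoencoder.compile(optimizer="adam", loss="mse")   # mean squared reconstruction error
    # The target is the input itself: y = x.
    # autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)

The single hidden layer is the bottleneck hᵢ ∈ ℝ^q; deeper encoders and decoders follow the same pattern.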
We can bring other techniques into the loss function and obtain variants of autoencoders.

Add an l2 regularization term:

L = argmin_{f, g} [ | xᵢ − x̃ᵢ |² ] + λ ∑_k θ_k²
Or add an l1 regularization term:

L = argmin_{f, g} [ | xᵢ − x̃ᵢ |² ] + λ ∑_k | θ_k |
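In Keras, these penalties on the weights θ_k can be sketched with the built-in regularizers; λ = 1e-4 and q = 16 are assumed values:

    from tensorflow.keras import layers, regularizers

    lam = 1e-4                            # assumed regularization strength
    q = 16

    # l2 variant: adds lam * sum_k theta_k^2 to the loss
    encoder_l2 = layers.Dense(q, activation="relu",
                              kernel_regularizer=regularizers.l2(lam))

    # l1 variant: adds lam * sum_k |theta_k| to the loss
    encoder_l1 = layers.Dense(q, activation="relu",
                              kernel_regularizer=regularizers.l1(lam))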
Or use the binary cross-entropy loss (for inputs scaled to [0, 1]):

L = − (1/M) ∑_{i=1}^{M} ∑_{j=1}^{n} [ x_{j,i} log x̃_{j,i} + (1 − x_{j,i}) log(1 − x̃_{j,i}) ]
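A small NumPy sketch of that loss (the eps clipping is an added numerical guard, not part of the formula):

    import numpy as np

    def bce_loss(x, x_tilde, eps=1e-7):
        # x, x_tilde: arrays of shape (M, n) with values in [0, 1]
        x_tilde = np.clip(x_tilde, eps, 1 - eps)
        per_element = x * np.log(x_tilde) + (1 - x) * np.log(1 - x_tilde)
        return -per_element.sum(axis=1).mean()   # sum over features, average over the M observations

In Keras, the same objective (up to a constant factor, since it averages rather than sums over the n features) is available as loss="binary_crossentropy".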
[Figure: 10 random MNIST digits (original images) and their reconstructions from feed-forward autoencoders with architectures (784, 8, 784), (784, 16, 784), and (784, 64, 784).]

Loss function used: Binary cross entropy
(See 'References' for the links to codes)
[Figure: 10 random MNIST digits (original images) and reconstructions from a (784, 16, 784) feed-forward autoencoder trained with the binary cross-entropy loss vs. the mean squared error loss.]

Both loss functions worked well in this case!
An (N, q, N) autoencoder creates a latent representation using only 'q' numbers. Dimensionality reduction!!!

Feed-forward AE
Non-linear transformation of features. Why?
It can handle large amounts of data efficiently, as the training can be done in batches (see the sketch below).
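A sketch of batched training, assuming the (784, 16, 784) Keras model from the earlier sketch and MNIST as the dataset:

    import tensorflow as tf

    # Load MNIST, flatten to 784-dimensional vectors, scale pixels to [0, 1].
    (x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
    x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
    x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

    # 'autoencoder' is the model defined in the earlier sketch.
    autoencoder.fit(x_train, x_train,
                    epochs=30, batch_size=256,      # mini-batch training
                    shuffle=True,
                    validation_data=(x_test, x_test))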
But…
It has been shown that a feed-forward AE is equivalent to PCA when:

x̃_{i,j} = x_{i,j} − (1/M) ∑_{k=1}^{M} x_{k,j}
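A NumPy sketch of that centering step (the data matrix X and its sizes are placeholders):

    import numpy as np

    M, n = 1000, 784                    # assumed number of observations and features
    X = np.random.rand(M, n)            # placeholder data matrix, one row per observation

    # x~_{i,j} = x_{i,j} - (1/M) * sum_k x_{k,j}: subtract the per-feature mean
    X_centered = X - X.mean(axis=0)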
Classification

Original data: xᵢ ∈ ℝ^784 → 96.4% accuracy, ~1000 sec.
Latent representation: g(xᵢ) ∈ ℝ^8 → 89% accuracy, ~1 sec.

Minor degradation in accuracy but a huge advantage!!
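A sketch of that comparison; the slide does not name the classifier, so scikit-learn logistic regression is used here purely as an illustration, and 'encoder' is assumed to be the trained encoder part of a (784, 8, 784) autoencoder, with x_train, y_train, x_test, y_test being the flattened MNIST images and labels:

    from sklearn.linear_model import LogisticRegression

    # Classifier on the original 784-dimensional data
    clf_raw = LogisticRegression(max_iter=1000).fit(x_train, y_train)

    # Classifier on the 8-dimensional latent codes g(x_i)
    z_train = encoder.predict(x_train)
    z_test = encoder.predict(x_test)
    clf_latent = LogisticRegression(max_iter=1000).fit(z_train, y_train)

    print("original data accuracy:", clf_raw.score(x_test, y_test))
    print("latent space accuracy: ", clf_latent.score(z_test, y_test))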
Autoencoder as Anomaly Detector
Suppose a 'shoe image' is added to the MNIST dataset.
Can the AE detect that?
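A sketch of the idea, assuming the trained 'autoencoder' and the 784-dimensional test set from the earlier sketches: an out-of-distribution image tends to be reconstructed poorly, so its reconstruction error stands out.

    import numpy as np

    # Reconstruction error per image, used as an anomaly score
    x_tilde = autoencoder.predict(x_test)
    errors = np.mean((x_test - x_tilde) ** 2, axis=1)

    # The inputs with the largest error are the anomaly candidates
    suspects = np.argsort(errors)[-10:]
    print("most anomalous indices:", suspects)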
Variational Autoencoder

Encoder: g(S1; P1) maps xᵢ ∈ ℝ^n to statistics of the data (mean, variance, etc.) and a sample generator.
Latent representation: hᵢ ∈ ℝ^r is a sample drawn from the latent distribution.
Decoder: f(S2; P2) maps hᵢ back to x̃ᵢ ∈ ℝ^n.

Learn the distribution of the data and then represent the data as samples generated from that distribution.
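A minimal Keras sketch of the encoder side of that idea (r = 2 and the layer sizes are assumptions): the encoder outputs the mean and log-variance of a latent Gaussian, and hᵢ is a sample drawn from it.

    import tensorflow as tf
    from tensorflow.keras import layers

    r = 2                                            # assumed latent dimension
    inputs = tf.keras.Input(shape=(784,))
    x = layers.Dense(256, activation="relu")(inputs)
    z_mean = layers.Dense(r)(x)                      # mean of the latent distribution
    z_log_var = layers.Dense(r)(x)                   # log-variance of the latent distribution

    def sample(args):
        mean, log_var = args
        eps = tf.random.normal(shape=tf.shape(mean))
        return mean + tf.exp(0.5 * log_var) * eps    # h_i is a random sample, not a fixed code

    h = layers.Lambda(sample)([z_mean, z_log_var])
    # A decoder f(S2; P2) then maps h back to a 784-dimensional reconstruction.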
History

David E. Rumelhart (1942-2011): Stanford Obituary Link
Geoffrey Hinton (1947-): https://www.cs.toronto.edu/~hinton/
Ronald J. Williams: http://www.ccs.neu.edu/home/rjw/pubs.html

Ballard, D.H., 1987, July. Modular learning in neural networks. In AAAI (Vol. 647, pp. 279-284).
For further reading…
Autoencoder example:
http://adl.toelt.ai/Chapter25/Your_first_autoencoder_with_Keras.html
Autoencoders: http://www.cs.toronto.edu/~rgrosse/courses/csc321_2017/slides/lec20.pdf
Bank, D., Koenigstein, N. and Giryes, R., 2020. Autoencoders. arXiv preprint
arXiv:2003.05991.
MNIST: http://yann.lecun.com/exdb/mnist/