
Variational autoencoder

In machine learning, a variational autoencoder (VAE) is an artificial neural network architecture introduced by Diederik P. Kingma and Max Welling.[1] It is part of the families of probabilistic graphical models and variational Bayesian methods.[2]

[Figure: The basic scheme of a variational autoencoder. The model receives $x$ as input. The encoder compresses it into the latent space. The decoder receives as input the information sampled from the latent space and produces $x'$ as similar as possible to $x$.]

In addition to being seen as an autoencoder neural network architecture, variational autoencoders can also be studied within the mathematical formulation of variational Bayesian methods, connecting a neural encoder network to its decoder through a probabilistic latent space (for example, as a multivariate Gaussian distribution) that corresponds to the parameters of a variational distribution.

Thus, the encoder maps each point (such as an image) from a large complex dataset into a distribution
within the latent space, rather than to a single point in that space. The decoder has the opposite function,
which is to map from the latent space to the input space, again according to a distribution (although in
practice, noise is rarely added during the decoding stage). By mapping a point to a distribution instead of
a single point, the network can avoid overfitting the training data. Both networks are typically trained together using the reparameterization trick, although the variance of the noise model can be learned separately.

Although this type of model was initially designed for unsupervised learning,[3][4] its effectiveness has
been proven for semi-supervised learning[5][6] and supervised learning.[7]

Overview of architecture and operation


A variational autoencoder is a generative model with a prior $p_\theta(z)$ and a noise distribution $p_\theta(x \mid z)$, respectively. Usually
such models are trained using the expectation-maximization meta-algorithm (e.g. probabilistic PCA,
(spike & slab) sparse coding). Such a scheme optimizes a lower bound of the data likelihood, which is
usually intractable, and in doing so requires the discovery of q-distributions, or variational posteriors.
These q-distributions are normally parameterized for each individual data point in a separate optimization
process. However, variational autoencoders use a neural network as an amortized approach to jointly
optimize across data points. This neural network takes as input the data points themselves, and outputs
parameters for the variational distribution. As it maps from a known input space to the low-dimensional
latent space, it is called the encoder.

The decoder is the second neural network of this model. It is a function that maps from the latent space to
the input space, e.g. as the means of the noise distribution. It is possible to use another neural network that maps to the variance; however, this can be omitted for simplicity, in which case the variance can be optimized with gradient descent.
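As an illustration, the following is a minimal sketch of such an encoder and decoder in PyTorch; the layer sizes and the choice of fully connected layers are arbitrary assumptions for the example, not part of the original formulation.

```python
import torch
from torch import nn

class Encoder(nn.Module):
    """Maps an input x to the parameters (mean, log-variance) of the variational distribution q(z|x)."""
    def __init__(self, x_dim=784, h_dim=256, z_dim=20):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(h_dim, z_dim)   # log-variance of q(z|x) (diagonal covariance)

    def forward(self, x):
        h = self.hidden(x)
        return self.mu(h), self.logvar(h)

class Decoder(nn.Module):
    """Maps a latent code z to the mean of the noise distribution p(x|z)."""
    def __init__(self, x_dim=784, h_dim=256, z_dim=20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, x_dim),
        )

    def forward(self, z):
        return self.net(z)
```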

To optimize this model, one needs to know two terms: the "reconstruction error", and the Kullback–
Leibler divergence (KL-D). Both terms are derived from the free energy expression of the probabilistic
model, and therefore differ depending on the noise distribution and the assumed prior of the data. For
example, a standard VAE task such as ImageNet is typically assumed to have Gaussian noise, whereas tasks such as binarized MNIST require Bernoulli noise. The KL-D from the free energy
expression maximizes the probability mass of the q-distribution that overlaps with the p-distribution,
which unfortunately can result in mode-seeking behaviour. The "reconstruction" term is the remainder of
the free energy expression, and requires a sampling approximation to compute its expectation value.[8]
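As a hedged illustration of how the noise assumption changes the reconstruction term (a sketch in PyTorch with toy tensors, not the original authors' code), a Gaussian noise model yields a squared-error term while a Bernoulli noise model yields a binary cross-entropy term:

```python
import torch
import torch.nn.functional as F

x = torch.rand(8, 784)              # toy batch of inputs scaled to [0, 1]
x_hat_logits = torch.randn(8, 784)  # stand-in for decoder outputs (pre-sigmoid)

# Gaussian noise model (e.g. natural images): squared error = -log N(x; x_hat, I) up to a constant
recon_gaussian = F.mse_loss(torch.sigmoid(x_hat_logits), x, reduction="sum")

# Bernoulli noise model (e.g. binarized MNIST): binary cross-entropy on the decoder logits
recon_bernoulli = F.binary_cross_entropy_with_logits(x_hat_logits, x, reduction="sum")
```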

More recent approaches replace the Kullback–Leibler divergence (KL-D) with various statistical distances; see the section "Statistical distance VAE variants" below.

Formulation
From the point of view of probabilistic modeling, one wants to maximize the likelihood of the data $x$ by their chosen parameterized probability distribution $p_\theta(x) = p(x \mid \theta)$. This distribution is usually chosen to be a Gaussian $\mathcal{N}(x \mid \mu, \sigma)$ which is parameterized by $\mu$ and $\sigma$ respectively, and as a member of the exponential family it is easy to work with as a noise distribution. Simple distributions are easy enough to maximize; however, distributions where a prior is assumed over the latents $z$ result in intractable integrals. Let us find $p_\theta(x)$ via marginalizing over $z$:

$$p_\theta(x) = \int_z p_\theta(x, z)\, dz,$$

where $p_\theta(x, z)$ represents the joint distribution under $p_\theta$ of the observable data $x$ and its latent representation or encoding $z$. According to the chain rule, the equation can be rewritten as

$$p_\theta(x) = \int_z p_\theta(x \mid z)\, p_\theta(z)\, dz.$$

In the vanilla variational autoencoder, $z$ is usually taken to be a finite-dimensional vector of real numbers, and $p_\theta(x \mid z)$ to be a Gaussian distribution. Then $p_\theta(x)$ is a mixture of Gaussian distributions.

It is now possible to define the set of the relationships between the input data and its latent representation as

Prior: $p_\theta(z)$
Likelihood: $p_\theta(x \mid z)$
Posterior: $p_\theta(z \mid x)$
Unfortunately, the computation of $p_\theta(z \mid x)$ is expensive and in most cases intractable. To speed up the calculus and make it feasible, it is necessary to introduce a further function to approximate the posterior distribution as

$$q_\phi(z \mid x) \approx p_\theta(z \mid x),$$

with $\phi$ defined as the set of real values that parametrize $q$. This is sometimes called amortized inference, since by "investing" in finding a good $q_\phi$, one can later infer $z$ from $x$ quickly without doing any integrals.

In this way, the problem is to find a good probabilistic autoencoder, in which the conditional likelihood distribution $p_\theta(x \mid z)$ is computed by the probabilistic decoder, and the approximated posterior distribution $q_\phi(z \mid x)$ is computed by the probabilistic encoder.

Parametrize the encoder as $E_\phi$, and the decoder as $D_\theta$.

Evidence lower bound (ELBO)


As in every deep learning problem, it is necessary to define a differentiable loss function in order to
update the network weights through backpropagation.

For variational autoencoders, the idea is to jointly optimize the generative model parameters $\theta$ to reduce the reconstruction error between the input and the output, and $\phi$ to make $q_\phi(z \mid x)$ as close as possible to $p_\theta(z \mid x)$. As reconstruction loss, mean squared error and cross entropy are often used.

As distance loss between the two distributions the Kullback–Leibler divergence

$$D_{KL}\big(q_\phi(z \mid x) \parallel p_\theta(z \mid x)\big)$$

is a good choice to squeeze $q_\phi(z \mid x)$ under $p_\theta(z \mid x)$.[8][9]

The distance loss just defined is expanded as

$$D_{KL}\big(q_\phi(z \mid x) \parallel p_\theta(z \mid x)\big) = \mathbb{E}_{z \sim q_\phi(\cdot \mid x)}\!\left[\ln \frac{q_\phi(z \mid x)}{p_\theta(z \mid x)}\right] = \mathbb{E}_{z \sim q_\phi(\cdot \mid x)}\!\left[\ln \frac{q_\phi(z \mid x)\, p_\theta(x)}{p_\theta(x, z)}\right] = \ln p_\theta(x) + \mathbb{E}_{z \sim q_\phi(\cdot \mid x)}\!\left[\ln \frac{q_\phi(z \mid x)}{p_\theta(x, z)}\right].$$

Now define the evidence lower bound (ELBO):

$$L_{\theta,\phi}(x) := \mathbb{E}_{z \sim q_\phi(\cdot \mid x)}\!\left[\ln \frac{p_\theta(x, z)}{q_\phi(z \mid x)}\right] = \ln p_\theta(x) - D_{KL}\big(q_\phi(\cdot \mid x) \parallel p_\theta(\cdot \mid x)\big).$$

Maximizing the ELBO

$$\theta^*, \phi^* = \underset{\theta,\phi}{\operatorname{arg\,max}}\, L_{\theta,\phi}(x)$$

is equivalent to simultaneously maximizing $\ln p_\theta(x)$ and minimizing $D_{KL}\big(q_\phi(z \mid x) \parallel p_\theta(z \mid x)\big)$. That is, maximizing the log-likelihood of the observed data, and minimizing the divergence of the approximate posterior $q_\phi(\cdot \mid x)$ from the exact posterior $p_\theta(\cdot \mid x)$.

The form given is not very convenient for maximization, but the following, equivalent form, is:

$$L_{\theta,\phi}(x) = \mathbb{E}_{z \sim q_\phi(\cdot \mid x)}\big[\ln p_\theta(x \mid z)\big] - D_{KL}\big(q_\phi(\cdot \mid x) \parallel p_\theta(\cdot)\big),$$

where $\ln p_\theta(x \mid z)$ is implemented as $-\frac{1}{2}\|x - D_\theta(z)\|_2^2$, since that is, up to an additive constant, what $x \mid z \sim \mathcal{N}(D_\theta(z), I)$ yields. That is, we model the distribution of $x$ conditional on $z$ to be a Gaussian distribution centered on $D_\theta(z)$. The distributions of $q_\phi(z \mid x)$ and $p_\theta(z)$ are often also chosen to be Gaussians as $z \mid x \sim \mathcal{N}(E_\phi(x), \sigma_\phi(x)^2 I)$ and $z \sim \mathcal{N}(0, I)$, with which we obtain by the formula for the KL divergence of Gaussians:

$$L_{\theta,\phi}(x) = -\frac{1}{2}\,\mathbb{E}_{z \sim q_\phi(\cdot \mid x)}\big[\|x - D_\theta(z)\|_2^2\big] - \frac{1}{2}\big(N\sigma_\phi(x)^2 + \|E_\phi(x)\|_2^2 - 2N\ln\sigma_\phi(x)\big) + \mathrm{const}.$$

Here $N$ is the dimension of $z$. For a more detailed derivation and more interpretations of ELBO and its maximization, see its main page.
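A minimal sketch of this loss (the negative ELBO) under the same Gaussian assumptions, written in PyTorch with a diagonal-covariance encoder; the additive constant is dropped and the variable names are illustrative only:

```python
import torch
import torch.nn.functional as F

def negative_elbo(x, x_hat, mu, logvar):
    """-ELBO for a Gaussian decoder N(x; x_hat, I) and encoder N(z; mu, diag(exp(logvar)))."""
    # Reconstruction term: 0.5 * ||x - D_theta(z)||^2, up to an additive constant
    recon = 0.5 * F.mse_loss(x_hat, x, reduction="sum")
    # KL(q_phi(z|x) || N(0, I)) in closed form for diagonal Gaussians
    kl = 0.5 * torch.sum(logvar.exp() + mu.pow(2) - 1.0 - logvar)
    return recon + kl
```

Minimizing this quantity with stochastic gradient descent is equivalent to maximizing the ELBO above.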

Reparameterization
To efficiently search for

$$\theta^*, \phi^* = \underset{\theta,\phi}{\operatorname{arg\,max}}\, L_{\theta,\phi}(x),$$

the typical method is gradient ascent.

It is straightforward to find

$$\nabla_\theta\, \mathbb{E}_{z \sim q_\phi(\cdot \mid x)}\!\left[\ln \frac{p_\theta(x, z)}{q_\phi(z \mid x)}\right] = \mathbb{E}_{z \sim q_\phi(\cdot \mid x)}\!\left[\nabla_\theta \ln \frac{p_\theta(x, z)}{q_\phi(z \mid x)}\right].$$

[Figure: The scheme of the reparameterization trick. The randomness variable $\varepsilon$ is injected into the latent space as external input. In this way, it is possible to backpropagate the gradient without involving a stochastic variable during the update.]

However,

$$\nabla_\phi\, \mathbb{E}_{z \sim q_\phi(\cdot \mid x)}\!\left[\ln \frac{p_\theta(x, z)}{q_\phi(z \mid x)}\right]$$

does not allow one to put the $\nabla_\phi$ inside the expectation, since $\phi$ appears in the probability distribution itself. The reparameterization trick (also known as stochastic backpropagation[10]) bypasses this difficulty.[8][11][12]

The most important example is when $z \sim q_\phi(\cdot \mid x)$ is normally distributed, as $\mathcal{N}(\mu_\phi(x), \Sigma_\phi(x))$.

This can be reparametrized by letting $\varepsilon \sim \mathcal{N}(0, I)$ be a "standard random number generator", and constructing $z$ as $z = \mu_\phi(x) + L_\phi(x)\varepsilon$. Here, $L_\phi(x)$ is obtained by the Cholesky decomposition:

$$\Sigma_\phi(x) = L_\phi(x) L_\phi(x)^T.$$

Then we have

$$\nabla_\phi\, \mathbb{E}_{z \sim q_\phi(\cdot \mid x)}\!\left[\ln \frac{p_\theta(x, z)}{q_\phi(z \mid x)}\right] = \mathbb{E}_{\varepsilon \sim \mathcal{N}(0, I)}\!\left[\nabla_\phi \ln \frac{p_\theta\big(x, \mu_\phi(x) + L_\phi(x)\varepsilon\big)}{q_\phi\big(\mu_\phi(x) + L_\phi(x)\varepsilon \mid x\big)}\right],$$

[Figure: The scheme of a variational autoencoder after the reparameterization trick.]

and so we obtained an unbiased estimator of the gradient, allowing stochastic gradient descent.

Since we reparametrized $z$, we need to find $q_\phi(z \mid x)$. Let $q_0$ be the probability density function for $\varepsilon$, then

$$\ln q_\phi(z \mid x) = \ln q_0(\varepsilon) - \ln\left|\det\!\left(\frac{\partial z}{\partial \varepsilon}\right)\right|,$$

where $\frac{\partial z}{\partial \varepsilon}$ is the Jacobian matrix of $z$ with respect to $\varepsilon$. Since $z = \mu_\phi(x) + L_\phi(x)\varepsilon$, this is

$$\ln q_\phi(z \mid x) = -\frac{1}{2}\|\varepsilon\|^2 - \ln|\det L_\phi(x)| - \frac{n}{2}\ln(2\pi).$$
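A hedged sketch of the trick in PyTorch, for the common diagonal-covariance case where the Cholesky factor $L_\phi(x)$ reduces to the elementwise standard deviation:

```python
import torch

def reparameterize(mu, logvar):
    """Sample z ~ N(mu, diag(exp(logvar))) as z = mu + sigma * eps with eps ~ N(0, I).

    All randomness lives in eps, which is independent of the encoder parameters,
    so gradients with respect to mu and logvar (and hence phi) can be backpropagated.
    """
    std = torch.exp(0.5 * logvar)   # sigma = exp(logvar / 2)
    eps = torch.randn_like(std)     # external standard-normal noise
    return mu + std * eps
```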

Variations
Many variational autoencoders applications and extensions have been used to adapt the architecture to
other domains and improve its performance.

β-VAE is an implementation with a weighted Kullback–Leibler divergence term to automatically discover and interpret factorised latent representations. With this implementation, it is possible to force manifold disentanglement for β values greater than one. This architecture can discover disentangled latent factors without supervision.[13][14]
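In code, the modification is simply a weight on the KL term of the negative-ELBO sketch above (an illustrative sketch; `beta` is a hyperparameter chosen by the practitioner):

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_hat, mu, logvar, beta=4.0):
    """Negative beta-ELBO: reconstruction + beta * KL; beta > 1 encourages disentangled latents."""
    recon = 0.5 * F.mse_loss(x_hat, x, reduction="sum")
    kl = 0.5 * torch.sum(logvar.exp() + mu.pow(2) - 1.0 - logvar)
    return recon + beta * kl
```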

The conditional VAE (CVAE) inserts label information in the latent space to force a deterministic constrained representation of the learned data.[15]

Some structures directly deal with the quality of the generated samples[16][17] or implement more than
one latent space to further improve the representation learning.
Some architectures mix VAE and generative adversarial networks to obtain hybrid models.[18][19][20]

Statistical distance VAE variants


After the initial work of Diederik P. Kingma and Max Welling,[21] several procedures were proposed to formulate in a more abstract way the operation of the VAE. In these approaches the loss function is composed of two parts:

the usual reconstruction error part, which seeks to ensure that the encoder-then-decoder mapping $x \mapsto D_\theta(E_\phi(x))$ is as close to the identity map as possible; the sampling is done at run time from the empirical distribution $P^{real}$ of objects available (e.g., for MNIST or ImageNet this will be the empirical probability law of all images in the dataset). This gives the term $\mathbb{E}_{x \sim P^{real}}\big[\|x - D_\theta(E_\phi(x))\|_2^2\big]$.
a variational part that ensures that, when the empirical distribution $P^{real}$ is passed through the encoder $E_\phi$, we recover the target distribution, denoted here $\mu(dz)$, that is usually taken to be a multivariate normal distribution. We will denote this pushforward measure $E_\phi\sharp(P^{real})$, which in practice is just the empirical distribution obtained by passing all dataset objects $x$ through the encoder $E_\phi$. In order to make sure that $E_\phi\sharp(P^{real})$ is close to the target $\mu(dz)$, a statistical distance $d$ is invoked and the term $d\big(\mu(dz), E_\phi\sharp(P^{real})\big)^2$ is added to the loss.
We obtain the final formula for the loss:

$$L_{\theta,\phi} = \mathbb{E}_{x \sim P^{real}}\big[\|x - D_\theta(E_\phi(x))\|_2^2\big] + d\big(\mu(dz), E_\phi\sharp(P^{real})\big)^2.$$

The statistical distance $d$ requires special properties: for instance, it has to possess a formula as an expectation, because the loss function will need to be optimized by stochastic optimization algorithms. Several distances can be chosen, and this gave rise to several flavors of VAEs (a code sketch of one such choice is given after the list below):

the sliced Wasserstein distance used by S. Kolouri et al. in their VAE[22]


the energy distance implemented in the Radon Sobolev Variational Auto-Encoder[23]
the Maximum Mean Discrepancy distance used in the MMD-VAE[24]
the Wasserstein distance used in the WAEs[25]
kernel-based distances used in the Kernelized Variational Autoencoder (K-VAE)[26]
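As an illustration of one such choice, the following is a hedged sketch of a squared maximum mean discrepancy penalty with a Gaussian kernel, in the spirit of the MMD-VAE, comparing a batch of encoder outputs with samples from the standard-normal target; the kernel and its bandwidth are arbitrary assumptions for the example.

```python
import torch

def gaussian_kernel(a, b, bandwidth=1.0):
    """k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2)) for all pairs of rows."""
    sq_dists = torch.cdist(a, b).pow(2)
    return torch.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd_squared(z_encoded, z_prior, bandwidth=1.0):
    """Biased batch estimate of MMD^2 between encoded codes and samples from the target prior."""
    k_xx = gaussian_kernel(z_encoded, z_encoded, bandwidth).mean()
    k_yy = gaussian_kernel(z_prior, z_prior, bandwidth).mean()
    k_xy = gaussian_kernel(z_encoded, z_prior, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy

# Example: compare a batch of (stand-in) encoder outputs to samples from the N(0, I) target
z_enc = torch.randn(128, 20)
z_tgt = torch.randn(128, 20)
penalty = mmd_squared(z_enc, z_tgt)
```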

See also
Autoencoder
Artificial neural network
Deep learning
Generative adversarial network
Representation learning
Sparse dictionary learning
Data augmentation
Backpropagation

References
1. Kingma, Diederik P.; Welling, Max (2022-12-10). "Auto-Encoding Variational Bayes".
arXiv:1312.6114 (https://arxiv.org/abs/1312.6114) [stat.ML (https://arxiv.org/archive/stat.M
L)].
2. Pinheiro Cinelli, Lucas; et al. (2021). "Variational Autoencoder" (https://books.google.com/bo
oks?id=N5EtEAAAQBAJ&pg=PA111). Variational Methods for Machine Learning with
Applications to Deep Networks. Springer. pp. 111–149. doi:10.1007/978-3-030-70679-1_5
(https://doi.org/10.1007%2F978-3-030-70679-1_5). ISBN 978-3-030-70681-4.
S2CID 240802776 (https://api.semanticscholar.org/CorpusID:240802776).
3. Dilokthanakul, Nat; Mediano, Pedro A. M.; Garnelo, Marta; Lee, Matthew C. H.; Salimbeni,
Hugh; Arulkumaran, Kai; Shanahan, Murray (2017-01-13). "Deep Unsupervised Clustering
with Gaussian Mixture Variational Autoencoders". arXiv:1611.02648 (https://arxiv.org/abs/16
11.02648) [cs.LG (https://arxiv.org/archive/cs.LG)].
4. Hsu, Wei-Ning; Zhang, Yu; Glass, James (December 2017). "Unsupervised domain
adaptation for robust speech recognition via variational autoencoder-based data
augmentation" (https://ieeexplore.ieee.org/document/8268911). 2017 IEEE Automatic
Speech Recognition and Understanding Workshop (ASRU). pp. 16–23. arXiv:1707.06265 (h
ttps://arxiv.org/abs/1707.06265). doi:10.1109/ASRU.2017.8268911 (https://doi.org/10.110
9%2FASRU.2017.8268911). ISBN 978-1-5090-4788-8. S2CID 22681625 (https://api.semant
icscholar.org/CorpusID:22681625).
5. Ehsan Abbasnejad, M.; Dick, Anthony; van den Hengel, Anton (2017). Infinite Variational
Autoencoder for Semi-Supervised Learning (https://openaccess.thecvf.com/content_cvpr_2
017/html/Abbasnejad_Infinite_Variational_Autoencoder_CVPR_2017_paper.html).
pp. 5888–5897.
6. Xu, Weidi; Sun, Haoze; Deng, Chao; Tan, Ying (2017-02-12). "Variational Autoencoder for
Semi-Supervised Text Classification" (https://ojs.aaai.org/index.php/AAAI/article/view/1096
6). Proceedings of the AAAI Conference on Artificial Intelligence. 31 (1).
doi:10.1609/aaai.v31i1.10966 (https://doi.org/10.1609%2Faaai.v31i1.10966).
S2CID 2060721 (https://api.semanticscholar.org/CorpusID:2060721).
7. Kameoka, Hirokazu; Li, Li; Inoue, Shota; Makino, Shoji (2019-09-01). "Supervised
Determined Source Separation with Multichannel Variational Autoencoder" (https://direct.mit.
edu/neco/article/31/9/1891/8494/Supervised-Determined-Source-Separation-with). Neural
Computation. 31 (9): 1891–1914. doi:10.1162/neco_a_01217 (https://doi.org/10.1162%2Fne
co_a_01217). PMID 31335290 (https://pubmed.ncbi.nlm.nih.gov/31335290).
S2CID 198168155 (https://api.semanticscholar.org/CorpusID:198168155).
8. Kingma, Diederik P.; Welling, Max (2013-12-20). "Auto-Encoding Variational Bayes".
arXiv:1312.6114 (https://arxiv.org/abs/1312.6114) [stat.ML (https://arxiv.org/archive/stat.M
L)].
9. "From Autoencoder to Beta-VAE" (https://lilianweng.github.io/lil-log/2018/08/12/from-autoenc
oder-to-beta-vae.html). Lil'Log. 2018-08-12.
10. Rezende, Danilo Jimenez; Mohamed, Shakir; Wierstra, Daan (2014-06-18). "Stochastic
Backpropagation and Approximate Inference in Deep Generative Models" (https://proceedin
gs.mlr.press/v32/rezende14.html). International Conference on Machine Learning. PMLR:
1278–1286. arXiv:1401.4082 (https://arxiv.org/abs/1401.4082).
11. Bengio, Yoshua; Courville, Aaron; Vincent, Pascal (2013). "Representation Learning: A
Review and New Perspectives" (https://ieeexplore.ieee.org/document/6472238). IEEE
Transactions on Pattern Analysis and Machine Intelligence. 35 (8): 1798–1828.
arXiv:1206.5538 (https://arxiv.org/abs/1206.5538). doi:10.1109/TPAMI.2013.50 (https://doi.o
rg/10.1109%2FTPAMI.2013.50). ISSN 1939-3539 (https://search.worldcat.org/issn/1939-35
39). PMID 23787338 (https://pubmed.ncbi.nlm.nih.gov/23787338). S2CID 393948 (https://a
pi.semanticscholar.org/CorpusID:393948).
12. Kingma, Diederik P.; Rezende, Danilo J.; Mohamed, Shakir; Welling, Max (2014-10-31).
"Semi-Supervised Learning with Deep Generative Models". arXiv:1406.5298 (https://arxiv.or
g/abs/1406.5298) [cs.LG (https://arxiv.org/archive/cs.LG)].
13. Higgins, Irina; Matthey, Loic; Pal, Arka; Burgess, Christopher; Glorot, Xavier; Botvinick,
Matthew; Mohamed, Shakir; Lerchner, Alexander (2016-11-04). beta-VAE: Learning Basic
Visual Concepts with a Constrained Variational Framework (https://openreview.net/forum?id
=Sy2fzU9gl). NeurIPS.
14. Burgess, Christopher P.; Higgins, Irina; Pal, Arka; Matthey, Loic; Watters, Nick; Desjardins,
Guillaume; Lerchner, Alexander (2018-04-10). "Understanding disentangling in β-VAE".
arXiv:1804.03599 (https://arxiv.org/abs/1804.03599) [stat.ML (https://arxiv.org/archive/stat.M
L)].
15. Sohn, Kihyuk; Lee, Honglak; Yan, Xinchen (2015-01-01). Learning Structured Output
Representation using Deep Conditional Generative Models (https://proceedings.neurips.cc/p
aper/2015/file/8d55a249e6baa5c06772297520da2051-Paper.pdf) (PDF). NeurIPS.
16. Dai, Bin; Wipf, David (2019-10-30). "Diagnosing and Enhancing VAE Models".
arXiv:1903.05789 (https://arxiv.org/abs/1903.05789) [cs.LG (https://arxiv.org/archive/cs.LG)].
17. Dorta, Garoe; Vicente, Sara; Agapito, Lourdes; Campbell, Neill D. F.; Simpson, Ivor (2018-
07-31). "Training VAEs Under Structured Residuals". arXiv:1804.01050 (https://arxiv.org/ab
s/1804.01050) [stat.ML (https://arxiv.org/archive/stat.ML)].
18. Larsen, Anders Boesen Lindbo; Sønderby, Søren Kaae; Larochelle, Hugo; Winther, Ole
(2016-06-11). "Autoencoding beyond pixels using a learned similarity metric" (http://proceedi
ngs.mlr.press/v48/larsen16.html). International Conference on Machine Learning. PMLR:
1558–1566. arXiv:1512.09300 (https://arxiv.org/abs/1512.09300).
19. Bao, Jianmin; Chen, Dong; Wen, Fang; Li, Houqiang; Hua, Gang (2017). "CVAE-GAN: Fine-
Grained Image Generation Through Asymmetric Training". pp. 2745–2754.
arXiv:1703.10155 (https://arxiv.org/abs/1703.10155) [cs.CV (https://arxiv.org/archive/cs.C
V)].
20. Gao, Rui; Hou, Xingsong; Qin, Jie; Chen, Jiaxin; Liu, Li; Zhu, Fan; Zhang, Zhao; Shao, Ling
(2020). "Zero-VAE-GAN: Generating Unseen Features for Generalized and Transductive
Zero-Shot Learning" (https://ieeexplore.ieee.org/document/8957359). IEEE Transactions on
Image Processing. 29: 3665–3680. Bibcode:2020ITIP...29.3665G (https://ui.adsabs.harvard.
edu/abs/2020ITIP...29.3665G). doi:10.1109/TIP.2020.2964429 (https://doi.org/10.1109%2FT
IP.2020.2964429). ISSN 1941-0042 (https://search.worldcat.org/issn/1941-0042).
PMID 31940538 (https://pubmed.ncbi.nlm.nih.gov/31940538). S2CID 210334032 (https://ap
i.semanticscholar.org/CorpusID:210334032).
21. Kingma, Diederik P.; Welling, Max (2022-12-10). "Auto-Encoding Variational Bayes".
arXiv:1312.6114 (https://arxiv.org/abs/1312.6114) [stat.ML (https://arxiv.org/archive/stat.M
L)].
22. Kolouri, Soheil; Pope, Phillip E.; Martin, Charles E.; Rohde, Gustavo K. (2019). "Sliced
Wasserstein Auto-Encoders" (https://openreview.net/forum?id=H1xaJn05FQ). International
Conference on Learning Representations. International Conference on Learning
Representations. ICPR.
23. Turinici, Gabriel (2021). "Radon-Sobolev Variational Auto-Encoders" (https://www.sciencedir
ect.com/science/article/pii/S0893608021001556). Neural Networks. 141: 294–305.
arXiv:1911.13135 (https://arxiv.org/abs/1911.13135). doi:10.1016/j.neunet.2021.04.018 (http
s://doi.org/10.1016%2Fj.neunet.2021.04.018). ISSN 0893-6080 (https://search.worldcat.org/
issn/0893-6080). PMID 33933889 (https://pubmed.ncbi.nlm.nih.gov/33933889).
24. Gretton, A.; Li, Y.; Swersky, K.; Zemel, R.; Turner, R. (2017). "A Polya Contagion Model for
Networks". IEEE Transactions on Control of Network Systems. 5 (4): 1998–2010.
arXiv:1705.02239 (https://arxiv.org/abs/1705.02239). doi:10.1109/TCNS.2017.2781467 (http
s://doi.org/10.1109%2FTCNS.2017.2781467).
25. Tolstikhin, I.; Bousquet, O.; Gelly, S.; Schölkopf, B. (2018). "Wasserstein Auto-Encoders".
arXiv:1711.01558 (https://arxiv.org/abs/1711.01558) [stat.ML (https://arxiv.org/archive/stat.M
L)].
26. Louizos, C.; Shi, X.; Swersky, K.; Li, Y.; Welling, M. (2019). "Kernelized Variational
Autoencoders". arXiv:1901.02401 (https://arxiv.org/abs/1901.02401) [astro-ph.CO (https://ar
xiv.org/archive/astro-ph.CO)].

Further reading
Kingma, Diederik P.; Welling, Max (2019). "An Introduction to Variational Autoencoders".
Foundations and Trends in Machine Learning. 12 (4). Now Publishers: 307–392.
arXiv:1906.02691 (https://arxiv.org/abs/1906.02691). doi:10.1561/2200000056 (https://doi.or
g/10.1561%2F2200000056). ISSN 1935-8237 (https://search.worldcat.org/issn/1935-8237).

Retrieved from "https://en.wikipedia.org/w/index.php?title=Variational_autoencoder&oldid=1257440194"
