Abstract—Generative models have found their way to the forefront of deep learning over the last decade and, so far, it seems that the hype will not fade away any time soon. In this paper, we give an overview of the most important building blocks of the most recent revolutionary deep generative models, such as the RBM, DBM, DBN, VAE and GAN. We will also take a look at three state-of-the-art generative models, namely PixelRNN, DRAW and NADE. We will delve into their unique architectures, their learning procedures, and their potential and limitations. We will also review some of the known issues that arise when trying to design and train deep generative architectures using shallow ones, and how different models deal with these issues. This paper is not meant to be a comprehensive study of these models, but rather a starting point for those who bear an interest in the field.

I. INTRODUCTION

Generative models have been at the forefront of deep unsupervised learning for the last decade. The reason is that they offer a very efficient way to analyze and understand unlabeled data. The idea behind generative models is to capture the inner probability distribution that generates a class of data in order to generate similar data. This can be used for fast data indexing and retrieval [1] [2] and plenty of other tasks. Generative models have been used in numerous fields and problems such as visual recognition tasks [3], speech recognition and generation [4], natural language processing [5] [6] and robotics [7]. In this paper, we go over the most common building blocks for generative models in the first chapter, and present a survey of state-of-the-art deep generative models in use today in the second chapter.

Generative models, in general, can be divided into two main categories:

• Cost function-based models, such as autoencoders and generative adversarial networks.
• Energy-based models [8], where the joint probability is defined using an energy function; for instance, the Boltzmann machine and its variants and deep belief networks.

Depending on its nature and depth, a model can admit different types of training. In general, some training strategies are fast but inefficient, while others are more efficient but hard to carry out or take too long. There are also techniques used to avoid this tradeoff, such as two-phased training. The most notable example is the deep belief network, which often undergoes separate training of its components (two layers at a time in general) in a phase referred to as pre-training [9], before the final training of the whole network at once in the fine-tuning phase.

To construct a deep generative model by combining others [10], we need to keep in mind that the probability distribution of the resulting model must remain calculable and evaluable, explicitly or implicitly, in order to provide a probabilistic ground for sampling or eventually doing inference on the model. In general, feedforward networks are easier to stack and combine, while energy-based models are harder to combine without losing tractability of the joint probabilities.

II. GENERATIVE MODELS

A. Boltzmann Machines

The Boltzmann Machine [10] is an energy-based model [8]. That is, it associates with the model a scalar energy function that takes in a configuration of the input variables and returns a scalar value describing the "badness" level of the configuration in question. The goal of learning is therefore to find an energy function (within a predetermined functional space) that assigns smaller values to correct configurations and higher values to incorrect ones, both within and outside of the training examples. Predictions are then made by selecting the configurations that minimize the energy.

The Boltzmann Machine was first introduced by Geoffrey Hinton et al. in 1983 [10] [11]. Its main purpose was to carry out efficient searches for combinations of "hypotheses" that maximally satisfy some constrained data input. The original Boltzmann Machine is an undirected symmetric network of binary units that are divided into visible and hidden units (fig. 1). However, alternative real-valued variations have been proposed and have even taken over the scene, surpassing binary ones in popularity.

In this section, we start by introducing the binary Boltzmann Machine, the main inspiration for the Restricted Boltzmann Machine (RBM), which in turn is the building block for many sophisticated and more powerful generative models, including Deep Boltzmann Machines (DBM) and Deep Belief Networks (DBN).

1) Binary Boltzmann Machine: Binary Boltzmann Machines are among the easiest networks to implement and could theoretically, given enough time and computational power,
learn complex distributions. However, they have not proven useful on a practical level. Similar to Hopfield networks, Boltzmann machines are fully connected networks of binary units that use the same energy function. However, unlike Hopfield networks, Boltzmann machines are not memory driven and instead try to capture the inner structure and regularities of the data.

Fig. 1. A Boltzmann Machine with 5 visible units (in blue) and 5 hidden units (in red).

The power of the binary Boltzmann Machine lies in the hidden units, which allow it to extend the simple linear interactions to higher-order ones and give it the possibility to model virtually any probability distribution. The energy of the binary Boltzmann Machine is given by:

E(x) = -\left( \frac{1}{2} \sum_{i,j} w_{ij} x_i x_j + \sum_i b_i x_i \right) \qquad (1)
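As a concrete illustration of equation (1), the following NumPy sketch (ours, not from the paper; the names boltzmann_energy, W, b and x are our own) computes the energy of a binary configuration given a symmetric weight matrix with zero diagonal and a bias vector:

    import numpy as np

    def boltzmann_energy(x, W, b):
        # Energy of a binary configuration x under eq. (1).
        # x: (n,) binary vector over all units (visible and hidden),
        # W: (n, n) symmetric weight matrix with zero diagonal,
        # b: (n,) bias vector.
        # The 1/2 factor compensates for counting every pair (i, j)
        # twice in the full double sum x^T W x.
        return -(0.5 * x @ W @ x + b @ x)

    # Toy usage: 5 visible + 5 hidden units, as in fig. 1.
    rng = np.random.default_rng(0)
    n = 10
    W = rng.normal(size=(n, n))
    W = (W + W.T) / 2.0
    np.fill_diagonal(W, 0.0)
    b = rng.normal(size=n)
    x = rng.integers(0, 2, size=n)
    print(boltzmann_energy(x, W, b))

Configurations with lower energy are the ones the model considers more plausible.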
The learning algorithm of Boltzmann machines is more biologically plausible. Hebbian learning is one of the oldest learning algorithms. It can be summarized as "Cells that fire together wire together." [12] In practice, neurons either strengthen or weaken their link based on how often they agree in their outputs. If two neurons more often than not produce the same output, the learning algorithm puts more weight on their link. Similarly, if they disagree most often, the link between them is weakened. This learning process is said to be more biologically plausible because it does not require any backlinks to be maintained by the network to receive gradient information, and every weight update relies only on the neighboring units.
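The Hebbian idea described above can be sketched in a few lines of NumPy. This is a purely illustrative sketch under our own naming, not the full Boltzmann machine learning rule (which contrasts data-driven and model-driven statistics): the link between two units is strengthened when their outputs agree and weakened when they disagree, using only locally available information.

    import numpy as np

    def hebbian_update(W, x, lr=0.01):
        # One Hebbian-style update from a single binary configuration x.
        # Mapping {0, 1} -> {-1, +1} makes agreeing pairs reinforce their
        # link and disagreeing pairs weaken it; the update is local in the
        # sense that it only uses the activities of the two endpoint units.
        s = 2 * x - 1
        dW = lr * np.outer(s, s)
        np.fill_diagonal(dW, 0.0)   # no self-connections
        return W + dW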
2) Restricted Boltzmann Machine: The intractability of the joint distribution is one of the biggest drawbacks of Boltzmann machines. Restricted Boltzmann Machines (formerly known as Harmoniums) [13] are a special type of Boltzmann machine with two layers, one visible and one hidden, designed to solve this problem. The RBM is a graphical model of binary units; however, real-valued generalizations are straightforward [14] [15]. The connections in an RBM are undirected and there are no visible-visible or hidden-hidden connections (fig. 2). Among other things, this bipartite architecture gives us more control over the joint distribution by letting us express it through factorized conditional probabilities. RBMs are a powerful replacement for fully connected Boltzmann machines when building a deep architecture because of the independence of units within the same layer, which allows for more freedom and flexibility.
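Because units within a layer are not connected, the conditionals p(h | v) and p(v | h) factorize over units, which makes block Gibbs sampling in an RBM straightforward. A minimal NumPy sketch of one sampling sweep (our own notation: W is the visible-to-hidden weight matrix, b and c are the visible and hidden biases):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def sample_h_given_v(v, W, c, rng):
        # p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i W_ij); hidden units are
        # conditionally independent, so they can be sampled in one block.
        p = sigmoid(c + v @ W)
        return (rng.random(p.shape) < p).astype(float)

    def sample_v_given_h(h, W, b, rng):
        # p(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij h_j).
        p = sigmoid(b + h @ W.T)
        return (rng.random(p.shape) < p).astype(float)

    def gibbs_step(v, W, b, c, rng):
        # One full v -> h -> v block Gibbs sweep.
        h = sample_h_given_v(v, W, c, rng)
        return sample_v_given_h(h, W, b, rng)

Alternating these two conditional sampling steps is the core operation behind contrastive divergence training of RBMs.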
Fig. 3. A Deep Boltzmann Machine with 1 visible layer (in blue) and 3 hidden layers (in red).

Fig. 5. A Deep Belief Network with a similar architecture to the DBM in fig. 3. In DBNs, all connections are directed except for those between the top two layers.
G : Z \rightarrow \mathbb{R}^n \qquad (3)

D : \mathbb{R}^n \rightarrow [0, 1] \qquad (4)

The zero-sum game is modeled as the following optimization problem:

\min_G \max_D V(D, G) \qquad (5)

Where:

V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \qquad (6)

Fig. 7. Generative adversarial networks architecture.
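To make equations (3)-(6) concrete, the following NumPy sketch estimates V(D, G) from mini-batches of real samples and latent codes. The affine generator and logistic-regression discriminator below are toy stand-ins of our own (real GANs use deep networks); the discriminator is trained to maximize this quantity while the generator is trained to minimize it.

    import numpy as np

    rng = np.random.default_rng(0)

    def G(z, theta_g):
        # Toy generator Z -> R^n: an affine map standing in for a deep network.
        return z @ theta_g["W"] + theta_g["b"]

    def D(x, theta_d):
        # Toy discriminator R^n -> [0, 1]: logistic regression as a stand-in.
        return 1.0 / (1.0 + np.exp(-(x @ theta_d["w"] + theta_d["b"])))

    def value_fn(x_real, z, theta_d, theta_g, eps=1e-8):
        # Monte-Carlo estimate of V(D, G) in eq. (6) from one mini-batch.
        term_real = np.log(D(x_real, theta_d) + eps).mean()
        term_fake = np.log(1.0 - D(G(z, theta_g), theta_d) + eps).mean()
        return term_real + term_fake

    # Toy usage: 2-D data, 3-D latent codes.
    theta_g = {"W": rng.normal(size=(3, 2)), "b": np.zeros(2)}
    theta_d = {"w": rng.normal(size=2), "b": 0.0}
    x_real = rng.normal(loc=1.0, size=(64, 2))
    z = rng.normal(size=(64, 3))
    print(value_fn(x_real, z, theta_d, theta_g))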