Generative Adversarial Networks
Figure 1. Many approaches to generative modeling are based on density estimation: observing several training examples of a random variable x and inferring a density function p(x) that generates the training data. This approach is illustrated here, with several data points on a real number line used to fit a Gaussian density function that explains the observed samples. In contrast to this common approach, GANs are implicit models that infer the probability distribution p(x) without necessarily representing the density function explicitly.

Figure 2. The goal of many generative models, as illustrated here, is to study a collection of training examples, then learn to generate more examples that come from the same probability distribution. GANs learn to do this without using an explicit representation of the density function. One advantage of the GAN framework is that it may be applied to models for which the density function is computationally intractable. The samples shown here are all samples from the ImageNet dataset,8 including the ones labeled “model samples.” We use actual ImageNet data to illustrate the goal that a hypothetical perfect model would attain.
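As a concrete illustration of the explicit density-estimation approach described in Figure 1, the following minimal sketch fits a Gaussian to one-dimensional samples by maximum likelihood and evaluates the resulting density p(x), then contrasts it with an implicit, GAN-like model that only provides a sampler. The sample values and the trivial "generator" are assumptions for demonstration, not material from the article.

```python
import numpy as np

# Toy training set: a handful of observations of a scalar random variable x.
x_train = np.array([-0.8, -0.3, 0.1, 0.4, 0.9, 1.3])

# Explicit approach (Figure 1): fit a Gaussian density to the data by
# maximum likelihood (sample mean and standard deviation).
mu, sigma = x_train.mean(), x_train.std()

def p(x):
    """Estimated density p(x) under the fitted Gaussian."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

print(p(0.0))  # the explicit model can report a density value anywhere

# Implicit approach (GAN-like): only a sampler is available. A generator
# transforms latent noise z into samples x; no density values are computed.
rng = np.random.default_rng(0)

def sample(n):
    z = rng.standard_normal(n)   # latent noise z
    return mu + sigma * z        # a deliberately trivial "generator"

print(sample(3))
```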
approximation to their true underlying task. The only real error is the statistical error (sampling a finite amount of training data rather than measuring the true underlying data-generating distribution) and the failure of the learning algorithm to converge to exactly the optimal parameters. Many generative modeling strategies would introduce these sources of error and also further sources of approximation error arising from, for example, Markov chains or the optimization of bounds on the true cost rather than the cost itself.
It is difficult to give much further specific guidance regarding the details of GANs because GANs are such an active research area and most specific advice quickly becomes out of date. Figure 6 shows how quickly the capabilities of GANs have progressed in the years since their introduction.
4. CONVERGENCE OF GANS
The central theoretical results presented in the original GAN paper13 were that:

1. in the space of density functions p_model and discriminator functions D, there is only one local Nash equilibrium, where p_model = p_data.
2. if it were possible to optimize directly over such density functions, then the algorithm that consists of optimizing D to convergence in the inner loop, then making a small gradient step on p_model in the outer loop, converges to this Nash equilibrium.
However, the theoretical model of local moves directly in density function space may not be very relevant to GANs as they are trained in practice: in practice, training uses local moves in the parameter space of the generator function, among the set of functions representable by neural networks with a finite number of parameters, with each parameter represented with a finite number of bits.
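To make this contrast concrete, here is a minimal sketch of how GANs are typically trained in practice: alternating single gradient steps on the discriminator's and generator's parameters, rather than optimizing D to convergence in an inner loop. It assumes a hypothetical PyTorch-style setup with small generator and discriminator networks G and D, and uses the common non-saturating heuristic for the generator loss; it illustrates the procedure only and is not code from the original paper.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 64, 2

# Hypothetical networks: generator G maps noise z to a sample x,
# discriminator D maps a sample x to a real/fake logit.
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1))

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def training_step(x_real):
    batch = x_real.size(0)
    ones = torch.ones(batch, 1)
    zeros = torch.zeros(batch, 1)

    # One local move in D's parameter space (not an inner loop run to convergence).
    z = torch.randn(batch, latent_dim)
    x_fake = G(z).detach()  # do not backpropagate into G on this step
    loss_D = bce(D(x_real), ones) + bce(D(x_fake), zeros)
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # One local move in G's parameter space: push D toward labeling
    # generated samples as real (the non-saturating heuristic).
    z = torch.randn(batch, latent_dim)
    loss_G = bce(D(G(z)), ones)
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()

    return loss_D.item(), loss_G.item()

# Example usage with a toy batch of "real" data.
print(training_step(torch.randn(32, data_dim)))
```

Each update is a small local move in parameter space, which is exactly the setting studied by the convergence questions discussed in this section.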
In many different theoretical models, it is interesting to study whether a Nash equilibrium exists,2 whether any spurious Nash equilibria exist,32 whether the learning algorithm converges to a Nash equilibrium,24 and if it does so, how quickly.21

In many cases of practical interest, these theoretical questions are open, and the best learning algorithms often seem empirically to fail to converge. Theoretical work to answer these questions is ongoing, as is work to design better costs, models, and training algorithms with better convergence properties.

5. OTHER GAN TOPICS
This article is focused on a summary of the core design considerations and algorithmic properties of GANs. Many other topics of potential interest cannot be considered here due to space considerations. This article discussed using GANs to approximate a distribution p(x); they have also been extended to the conditional setting,23, 25 where they generate samples corresponding to some input y by drawing samples from the conditional distribution p(x | y).
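As a rough illustration of the conditional setting, the sketch below builds a conditional generator that consumes both a noise vector z and an input label y, so that sampling from it is intended to approximate drawing from p(x | y). This is a hypothetical construction in the same PyTorch style as the sketch above, not code from the cited papers.

```python
import torch
import torch.nn as nn

latent_dim, num_classes, data_dim, embed_dim = 64, 10, 2, 16

class ConditionalGenerator(nn.Module):
    """Generator for p(x | y): concatenates noise z with an embedding of the input y."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(num_classes, embed_dim)  # learned label embedding
        self.net = nn.Sequential(
            nn.Linear(latent_dim + embed_dim, 128),
            nn.ReLU(),
            nn.Linear(128, data_dim),
        )

    def forward(self, z, y):
        return self.net(torch.cat([z, self.embed(y)], dim=1))

G = ConditionalGenerator()
z = torch.randn(8, latent_dim)
y = torch.randint(0, num_classes, (8,))  # the conditioning input y
x_given_y = G(z, y)                      # samples intended to follow p(x | y)
print(x_given_y.shape)                   # torch.Size([8, 2])
```

In a full conditional GAN, the discriminator would also receive y, so that it judges whether x is a plausible sample for that particular input.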
GANs are related to moment matching16 and optimal transport.1 A quirk of GANs that is made especially clear through their connection to MMD and optimal transport is that they may be used to train generative models for which p_model has support only on a thin manifold and may actually assign zero likelihood to the training data. GANs struggle to generate discrete data because the back-propagation algorithm needs to propagate gradients from the discriminator through the output of the generator, but this problem is being gradually resolved.9 Like most generative models, GANs can be used to fill in gaps in missing data.34 GANs have proven very effective for learning to classify data using very few labeled training examples.29 Evaluating the performance of generative models, including GANs, is a difficult research area in its own right.29, 31, 32, 33

GANs can be seen as a way for machine learning to learn its own cost function, rather than minimizing a hand-designed cost function. GANs can be seen as a way of supervising machine learning by asking it to produce any output that the machine learning algorithm itself recognizes as acceptable, rather than by asking it to produce a specific example output. GANs are thus great for learning in situations where there are many possible correct answers, such as predicting the many possible futures that can happen in video generation.19 GANs and GAN-like models can be used to learn to transform data from one domain into data from another domain, even without any labeled pairs of examples from those domains (e.g., Zhu et al.35). For example, after studying a collection of photos of zebras and a collection of photos of horses, GANs can turn a photo of a horse into a photo of a zebra.35 GANs have been used in science to simulate experiments that would be costly to run even in traditional software simulators.7 GANs can be used to create fake data to train other machine learning models, either when real data would be hard to acquire30 or when there would be privacy concerns associated with real data.3 GAN-like models called domain-adversarial networks can be used for domain adaptation.12 GANs can be used for a variety of interactive digital media effects where the end goal is to produce compelling imagery.35 GANs can even be used to solve variational inference problems used in other approaches to generative modeling.20 GANs can learn useful embedding vectors and discover concepts such as the gender of human faces without supervision.27

6. CONCLUSION
GANs are a kind of generative model based on game theory. They have had great practical success in terms of generating realistic data, especially images, but it is currently still difficult to train them. For GANs to become a more reliable technology, it will be necessary to design models, costs, or training algorithms for which it is possible to find good Nash equilibria consistently and quickly.

References
1. Arjovsky, M., Chintala, S., Bottou, L. Wasserstein GAN. arXiv preprint arXiv:1701.07875 (2017).
2. Arora, S., Ge, R., Liang, Y., Ma, T., Zhang, Y. Generalization and equilibrium in generative adversarial nets (GANs). arXiv preprint arXiv:1703.00573 (2017).
3. Beaulieu-Jones, B.K., Wu, Z.S., Williams, C., Greene, C.S. Privacy-preserving generative deep neural networks support clinical data sharing. bioRxiv (2017), 159756.
4. Bengio, Y., Thibodeau-Laufer, E., Alain, G., Yosinski, J. Deep generative stochastic networks trainable by backprop. In ICML'2014 (2014).
5. Brundage, M., Avin, S., Clark, J., Toner, H., Eckersley, P., Garfinkel, B., Dafoe, A., Scharre, P., Zeitzoff, T., Filar, B., Anderson, H., Roff, H., Allen, G.C., Steinhardt, J., Flynn, C., hÉigeartaigh, S.Ó., Beard, S., Belfield, H., Farquhar, S., Lyle, C., Crootof, R., Evans, O., Page, M., Bryson, J., Yampolskiy, R., Amodei, D. The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation. arXiv e-prints (Feb. 2018).
6. Danihelka, I., Lakshminarayanan, B., Uria, B., Wierstra, D., Dayan, P. Comparison of maximum likelihood and GAN-based training of real NVPs. arXiv preprint arXiv:1705.05263 (2017).
7. de Oliveira, L., Paganini, M., Nachman, B. Learning particle physics by example: Location-aware generative adversarial networks for physics synthesis. Computing and Software for Big Science 1, 1 (2017), 4.
8. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In CVPR09 (2009).
9. Fedus, W., Goodfellow, I., Dai, A.M. MaskGAN: Better text generation via filling in the _____. In International Conference on Learning Representations (2018).
10. Fedus, W., Rosca, M., Lakshminarayanan, B., Dai, A.M., Mohamed, S., Goodfellow, I. Many paths to equilibrium: GANs do not need to decrease a divergence at every step. In International Conference on Learning Representations (2018).
11. Frey, B.J. Graphical Models for Machine Learning and Digital Communication. MIT Press, Boston, 1998.
12. Ganin, Y., Lempitsky, V. Unsupervised domain adaptation by backpropagation. In International Conference on Machine Learning (2015), 1180–1189.
13. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence, K.Q. Weinberger, eds., Advances in Neural Information Processing Systems 27, Curran Associates, Inc., Boston, 2014, 2672–2680.
14. Karras, T., Aila, T., Laine, S., Lehtinen, J. Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196 (2017).
15. Kingma, D.P., Welling, M. Auto-encoding variational Bayes. In Proceedings of the International Conference on Learning Representations (ICLR) (2014).
16. Li, Y., Swersky, K., Zemel, R.S. Generative moment matching networks. arXiv preprint arXiv:1502.02761 (2015).
17. Liu, M.-Y., Tuzel, O. Coupled generative adversarial networks. In D.D. Lee, M. Sugiyama, U.V. Luxburg, I. Guyon, R. Garnett, eds., Advances in Neural Information Processing Systems 29, Curran Associates, Inc., Boston, 2016, 469–477.
18. Lucic, M., Kurach, K., Michalski, M., Gelly, S., Bousquet, O. Are GANs created equal? A large-scale study. arXiv preprint arXiv:1711.10337 (2017).
19. Mathieu, M., Couprie, C., LeCun, Y. Deep multi-scale video prediction beyond mean square error. arXiv preprint arXiv:1511.05440 (2015).
20. Mescheder, L., Nowozin, S., Geiger, A. Adversarial variational Bayes: Unifying variational autoencoders and generative adversarial networks. arXiv preprint arXiv:1701.04722 (2017).
21. Mescheder, L., Nowozin, S., Geiger, A. The numerics of GANs. In Advances in Neural Information Processing Systems (2017), 1823–1833.
22. Metz, L., Poole, B., Pfau, D., Sohl-Dickstein, J. Unrolled generative adversarial networks. arXiv preprint arXiv:1611.02163 (2016).
23. Mirza, M., Osindero, S. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014).
24. Nagarajan, V., Kolter, J.Z. Gradient descent GAN optimization is locally stable. In I. Guyon, U.V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett, eds., Advances in Neural Information Processing Systems 30, Curran Associates, Inc., Boston, 2017, 5585–5595.
25. Odena, A., Olah, C., Shlens, J. Conditional image synthesis with auxiliary classifier GANs. arXiv preprint arXiv:1610.09585 (2016).
26. Oord, A. v. d., Li, Y., Babuschkin, I., Simonyan, K., Vinyals, O., Kavukcuoglu, K., Driessche, G. v. d., Lockhart, E., Cobo, L.C., Stimberg, F., et al. Parallel WaveNet: Fast high-fidelity speech synthesis. arXiv preprint arXiv:1711.10433 (2017).
27. Radford, A., Metz, L., Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015).
28. Ratliff, L.J., Burden, S.A., Sastry, S.S. Characterization and computation of local Nash equilibria in continuous games. In Communication, Control, and Computing (Allerton), 2013 51st Annual Allerton Conference on, IEEE (2013), 917–924.
29. Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X. Improved techniques for training GANs. In Advances in Neural Information Processing Systems (2016), 2234–2242.
30. Shrivastava, A., Pfister, T., Tuzel, O., Susskind, J., Wang, W., Webb, R. Learning from simulated and unsupervised images through adversarial training.
31. Theis, L., van den Oord, A., Bethge, M. A note on the evaluation of generative models. arXiv preprint arXiv:1511.01844 (2015).
32. Unterthiner, T., Nessler, B., Klambauer, G., Heusel, M., Ramsauer, H., Hochreiter, S. Coulomb GANs: Provably optimal Nash equilibria via potential fields. arXiv preprint arXiv:1708.08819 (2017).
33. Wu, Y., Burda, Y., Salakhutdinov, R., Grosse, R. On the quantitative analysis of decoder-based generative models. arXiv preprint arXiv:1611.04273 (2016).
34. Yeh, R., Chen, C., Lim, T.Y., Hasegawa-Johnson, M., Do, M.N. Semantic image inpainting with perceptual and contextual losses. arXiv preprint arXiv:1607.07539 (2016).
35. Zhu, J.-Y., Park, T., Isola, P., Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593 (2017).

Ian Goodfellow, written while at Google Brain.
Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, Université de Montréal.

Final submitted 5/9/2018.

Copyright held by authors/owners. Publication rights licensed to ACM.