Generative Adversarial Text To Image Synthesis
Abstract

We implemented a deep recurrent neural network architecture and Generative Adversarial Network (GAN) formulation to effectively bridge the advances in text and image modeling, translating visual concepts from characters to pixels. We show the capability of the model to generate images of flowers from detailed text descriptions.

Conclusions

• We developed a simple and effective model for generating images based on detailed visual descriptions.
• The images capture the shape and color of the flower but lack other significant details needed to pass as realistic samples.
• The model could not generalize to images with multiple objects.
Introduction

• Artificial synthesis of images from text descriptions could have profound applications in visual editing, animation, and digital design.
• The distribution of images conditioned on a text description is highly multimodal.
• In GANs, the discriminator D tries to distinguish real images from synthesized images, while the generator G tries to fool D.
• The discriminator views (text, image) pairs as joint observations and is trained to judge a pair as real or fake.

Subproblems

• Learn a text feature representation that captures the important visual details.
• Use these features to synthesize a compelling image.

Literature Survey

• [1] estimated generative models via an adversarial process to generate images conditioned on text and input noise.
• In [2, 4] the authors describe architectural guidelines for stable GANs.
• In [3] the authors present an unsupervised approach to train a generic sentence encoder.

DCGAN

The GAN training procedure is similar to a two-player min-max game with the following objective function:

min_G max_D V(D, G) = E_{x∼p_data}[log D(x)] + E_{z∼p_z}[log(1 − D(G(z)))]

where x is a real image from the true distribution, and z is a noise vector sampled from p_z, which might be a Gaussian or uniform distribution.

Skip Thought Vectors

An unsupervised approach to train a generic, distributed sentence encoder. We train an encoder-decoder model where the encoder maps the input sentence to a vector h_i and the decoder generates the surrounding sentences. The objective is to maximize the sum of the log-probabilities for the forward and backward sentences conditioned on the encoder output:

Σ_t log P(w_{i+1}^t | w_{i+1}^{<t}, h_i) + Σ_t log P(w_{i−1}^t | w_{i−1}^{<t}, h_i)

Results

Figures generated from the corresponding captions using the trained model (images omitted). Example captions:
• "this flower has petals that are red and are bunched together"
• "the flower has an abundance of yellow petals and brown anthers"
• "flower is purple and pink in petal and feature a dark, dense core"

Future Work

• Improve generator learning with manifold interpolation.
• Implement Stacked GANs to produce high-quality images.
• Explore the possibility of using Wasserstein GANs and Cyclic GANs.
• Generalize the model to images with multiple objects and variable backgrounds using the MS-COCO dataset.

References

[1] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee. Generative adversarial text-to-image synthesis. In ICML, 2016.
[2] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. In ICLR, 2016.
[3] R. Kiros, Y. Zhu, R. R. Salakhutdinov, R. Zemel, R. Urtasun, A. Torralba, and S. Fidler. Skip-thought vectors. In NIPS, 2015.
[4] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio. Generative adversarial nets. In NIPS, 2014.
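The GAN min-max objective V(D, G) described in the DCGAN section can be evaluated numerically. The following is a minimal numpy sketch with a toy 1-D "image" distribution, not the poster's DCGAN implementation; the sigmoid discriminator, shift generator, and the names `gan_value`, `discriminator`, and `generator` are illustrative assumptions. A text-conditional variant would additionally feed a text embedding to both networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(x, w=2.0, b=0.0):
    # Toy discriminator: sigmoid of a linear score, D(x) in (0, 1).
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

def generator(z, shift=4.0):
    # Toy generator: shifts noise toward the "real" data region.
    return z + shift

def gan_value(x_real, z, D, G):
    # V(D, G) = E_{x~p_data}[log D(x)] + E_{z~p_z}[log(1 - D(G(z)))],
    # with expectations approximated by sample means.
    return np.mean(np.log(D(x_real))) + np.mean(np.log(1.0 - D(G(z))))

# "Real" samples stand in for images from the true distribution p_data;
# z is drawn from the noise prior p_z (here a standard Gaussian).
x_real = rng.normal(5.0, 1.0, size=1000)
z = rng.normal(0.0, 1.0, size=1000)

v = gan_value(x_real, z, discriminator, generator)
```

Training alternates a gradient ascent step on V for D with a descent step for G; here we only evaluate the objective at fixed toy parameters.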
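The skip-thought objective, the sum of log-probabilities of the forward sentence w_{i+1} and backward sentence w_{i−1} conditioned on the encoder output h_i, can be sketched as follows. This is a hedged illustration, not the encoder-decoder from [3]: the decoder's per-step vocabulary distributions are faked with random draws, and `sentence_log_prob` and `skip_thought_objective` are names we introduce here.

```python
import numpy as np

def sentence_log_prob(probs, word_ids):
    # Sum over timesteps t of log P(w^t | w^{<t}, h_i), where probs[t] is the
    # decoder's predicted distribution over the vocabulary at step t.
    return float(np.sum(np.log(probs[np.arange(len(word_ids)), word_ids])))

def skip_thought_objective(next_probs, next_ids, prev_probs, prev_ids):
    # Log-prob of the next sentence w_{i+1} plus log-prob of the previous
    # sentence w_{i-1}; training maximizes this quantity.
    return (sentence_log_prob(next_probs, next_ids)
            + sentence_log_prob(prev_probs, prev_ids))

# Hypothetical decoder outputs: 3-step sentences over a 5-word vocabulary.
rng = np.random.default_rng(1)
next_probs = rng.dirichlet(np.ones(5), size=3)  # each row sums to 1
prev_probs = rng.dirichlet(np.ones(5), size=3)
next_ids = np.array([0, 3, 1])
prev_ids = np.array([2, 2, 4])

obj = skip_thought_objective(next_probs, next_ids, prev_probs, prev_ids)
```

In the real model the distributions come from GRU decoders conditioned on h_i, and the resulting encoder vectors serve as the text features for the GAN.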