
SATGAN: Image Generation from Text Descriptions using Self Attention Generative Adversarial Network

Imranul Ashrafi, Arani Shawkat, Muntasir Mohammad
Supervisor: Dr. Nabeel Mohammed
North South University

21/05/2019

1 Abstract
Generating fine-grained, detailed images from text descriptions is a highly
challenging task in Computer Vision. Various attempts have been made so far,
but most of the resulting images lack detail and do not match the text
descriptions properly. In this paper, we propose SATGAN, a two-stage
Generative Adversarial Network with Self Attention applied to the text
descriptions. The first stage draws a primary 64x64 image and the second
stage applies the attention (on text) and then generates a high-resolution
256x256 image. We evaluated our model on the CUB-200 (2011) Birds dataset.
Our extensive experiments and comparison with state-of-the-art models show
that using Self Attention on text descriptions together with Spectral
normalization improves the quality of the generated images while reducing
the computational cost. The Inception Score was found to be 5.04 ± 0.37, a
boost of 15.6% on the CUB dataset. The model also scored an FID of 42.87.

2 Introduction
Human beings have the power of imagination: they can quickly imagine a
scenario based on a text description. A person reading a novel, for example,
can picture the meadows and the plot unraveling. A recent focus of Computer
Vision is to give a machine that same power, or something close to it. This
is where "Text to Image Generation" comes in, and it is needed in various
applications such as photo editing and computer-aided design. Text to Image
Generation using Generative Adversarial Networks (GANs)[1] has shown the
most promise, and GANs built on deep convolutional networks have been
especially successful[2][3][4]. Various state-of-the-art architectures based
on this approach are discussed in the Related Work section, but their main
problem is that they are computationally very costly. In this paper, we
address two problems:

• Reducing the computational cost of training a GAN architecture

• While reducing computational cost, still being able to produce high-quality images

At present, the state-of-the-art architecture, the Attentional Generative
Adversarial Network (AttnGAN)[5], shows the highest accuracy in terms of
producing close-to-real images. The problem, however, remains computational
cost. Attention is applied not only to images but also to word contexts,
which increases the cost, and the fact that two of its three Generators use
the attention architecture adds even more. In contrast, in this paper we
experiment with a vanilla GAN, a GAN with Self Attention[6] on images, and a
GAN with Self Attention on text descriptions. We also use Spectral
normalization[7] on the network layers. At the same time, we decrease the
number of Generators to two: the first Generator produces 64x64 images and
the second upsamples them to 256x256 based on the attention maps from the
Self Attention architecture. Detailed results are described in Section 5.

3 Related Work
Mansimov et al. used a bidirectional RNN attention model along with the
conditional DRAW network to generate images from text descriptions[8].
AttnGAN[5] uses attention-driven multi-stage refinement on text to generate
images. First, a low-resolution image is generated using the sentence vector
(the vector form of the text description). Then each sub-region of the image
is refined using the word vectors of the sentence based on context. In
addition, a deep multi-modal similarity model is introduced for calculating
the GAN loss. Conditional image generation with PixelCNN decoders[9]
conditions a modified PixelCNN decoder on vectors, labels, tags, or latent
embeddings. The difference between PixelCNN and other architectures is that,
along with generating excellent samples, it explicitly returns probability
densities. These densities help generate excellent samples and can be used
for transfer learning in other categories; based on the condition, the model
can generate various outputs. Scott Reed used a Deep Convolutional GAN
(DC-GAN)[10] to produce finer images, with deep convolutional and recurrent
text encoders for obtaining vector representations of text descriptions. A
matching-aware discriminator (GAN-CLS) was also used to discriminate between
real and fake images as well as between real images and mismatched text;
this additional condition of real image with mismatched text is added to the
GAN. They were also the first to use the Inception Score as a metric for
determining the accuracy of a GAN. StackGAN-v2[11] builds on StackGAN-v1,
which uses two stages of GAN: one for generating a low-resolution primary
image from text, and another for generating a high-resolution image using
the low-resolution primary image and the text description as inputs.
Furthermore, StackGAN-v2 uses multiple generators and discriminators to
generate images at multiple scales. Zizhao Zhang introduced
hierarchical-nested adversarial objectives inside the networks to produce
high-resolution images[12], along with a new visual-semantic similarity
measure. LSTM (Long Short-Term Memory) is a type of Recurrent Neural Network
(RNN); its basic difference from other RNNs is that it can remember
relationships among vectors over long distances, which other RNNs find very
hard to do. Xu Ouyang presented an architecture that uses an LSTM network to
extract semantic meaning from the input text. They used the real image as
the target for multiple similar sentences and showed that this produced
better results. All of these works generate images from a single category.
Multi-Instance StackGAN[13], on the other hand, produces multiple instances
from a broader variety of categories; the model showed promise in generating
complex scene compositions consisting of multiple objects based on the input
text description. Vashisht Madhavan et al.[14] proposed a dual-loss DCCGAN
(Deep Convolutional Conditional GAN), using encoded captions to generate
images.

4 Methodology

Figure 1: GAN architecture with Self Attention on Text

4.1 Conditioning Augmentation


The text embedding is generally non-linearly transformed into conditioning
latent variables, which are usually of high dimension (>100 dimensions).
With a limited amount of data, this creates discontinuities in the latent
manifold, which is undesirable for training. This is where Conditioning
Augmentation comes in. The trick is that latent random variables ĉ are
sampled from a Gaussian distribution N(µ(ϕt), Σ(ϕt)), where the mean µ(ϕt)
and the diagonal covariance matrix Σ(ϕt) are functions of the text embedding
ϕt. The Kullback-Leibler divergence (KL divergence) is also used as an
additional regularization term for smoothing the learning curve of the
generator:

DKL(N(µ(ϕt), Σ(ϕt)) || N(0, I))
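
Below is a minimal PyTorch sketch of this conditioning step, assuming a
text-embedding dimension t_dim and a conditioning dimension c_dim (the
module and variable names are illustrative, not taken from the paper):

```python
import torch
import torch.nn as nn

class ConditioningAugmentation(nn.Module):
    """Maps a text embedding to mu and log-variance, then samples c_hat."""
    def __init__(self, t_dim=1024, c_dim=128):
        super().__init__()
        self.fc = nn.Linear(t_dim, c_dim * 2)  # predicts mu and log(sigma^2)

    def forward(self, phi_t):
        mu, logvar = self.fc(phi_t).chunk(2, dim=1)
        std = torch.exp(0.5 * logvar)
        c_hat = mu + std * torch.randn_like(std)   # reparameterization trick
        # KL divergence of N(mu, sigma^2) against N(0, I), used as a regularizer
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return c_hat, kl
```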

4.2 Generative Adversarial Network


A Generative Adversarial Network (GAN) generally has two networks: a
Generator and a Discriminator. The Generator generates images and the
Discriminator tries to deduce whether an image is generated or real. Based
on the feedback from the Discriminator, the Generator generates new images,
and a min-max game takes place between the two.

In our experimental architecture, there are two Generators (G0, G1) with
hidden states (h0, h1). The output images are (x0, x1), z is our noise
vector, and e is our sentence embedding vector. Our architecture thus
becomes:

x0 = G0(z, Fca(e))
C1 = Fattn(Fca(e))
C1 = concat(C1, x0)
x1 = G1(C1)

We also experimented with applying Self Attention to the image, in which
case the equations change to:

C1 = Fattn(x0)
C1 = concat(C1, Fca(e))

Here, x0 and x1 are of shape (batch size, 3, 64, 64) and (batch size, 3,
256, 256) respectively, Fca denotes the Conditioning Augmentation[11], and
Fattn denotes the Self Attention module.
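
The following PyTorch sketch shows how these pieces chain together in the
text-attention variant. The module interfaces (F_ca returning a conditioned
vector plus a KL term, G0 and G1 as image generators) are assumptions for
illustration, and the attended text condition is broadcast over the 64x64
grid before the channel-wise concatenation, which is one plausible reading
of concat(C1, x0):

```python
import torch

def forward_satgan(z, e, F_ca, F_attn, G0, G1):
    """Schematic forward pass of the two-generator, text-attention pipeline."""
    c, kl = F_ca(e)                          # conditioned sentence vector
    x0 = G0(z, c)                            # (B, 3, 64, 64) Stage-I image
    c1 = F_attn(c)                           # self-attention over the text condition
    c1 = c1[:, :, None, None].expand(-1, -1, 64, 64)
    h1 = torch.cat([c1, x0], dim=1)          # channel-wise concatenation with x0
    x1 = G1(h1)                              # (B, 3, 256, 256) Stage-II image
    return x0, x1, kl
```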

4.2.1 Stage-I Generator


First, a noise vector and a sentence embedding vector are fed to our
Generator G0. The sentence embedding vector is passed through Conditioning
Augmentation[11] to produce a sentence conditional vector. The sentence
conditional vector and the noise vector are then concatenated to produce the
conditioning vector, which is passed through a linear layer, a normalization
layer, and a non-linearity (ReLU). The result is upsampled 4 times to
produce a (batch size, 32, 64, 64) tensor, which is then convolved to
produce (batch size, 3, 64, 64) images. This is the 64x64 image produced by
the first Generator G0.
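
A sketch of such a Stage-I generator is shown below. The channel widths and
the nearest-neighbour upsampling blocks are assumptions; the paper only
fixes the intermediate (batch size, 32, 64, 64) and output (batch size, 3,
64, 64) shapes:

```python
import torch
import torch.nn as nn

def upsample_block(in_ch, out_ch):
    """Nearest-neighbour upsampling followed by convolution, BatchNorm, and ReLU."""
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode='nearest'),
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class StageIGenerator(nn.Module):
    """Noise + conditioned sentence vector -> 64x64 image (illustrative widths)."""
    def __init__(self, z_dim=100, c_dim=128, ngf=512):
        super().__init__()
        self.ngf = ngf
        self.fc = nn.Sequential(
            nn.Linear(z_dim + c_dim, ngf * 4 * 4),
            nn.BatchNorm1d(ngf * 4 * 4),
            nn.ReLU(inplace=True),
        )
        self.up = nn.Sequential(                 # 4x4 -> 64x64 via four upsamplings
            upsample_block(ngf, ngf // 2),       # 8x8
            upsample_block(ngf // 2, ngf // 4),  # 16x16
            upsample_block(ngf // 4, ngf // 8),  # 32x32
            upsample_block(ngf // 8, 32),        # 64x64 with 32 channels
        )
        self.to_img = nn.Sequential(nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())

    def forward(self, z, c):
        h = self.fc(torch.cat([z, c], dim=1)).view(-1, self.ngf, 4, 4)
        return self.to_img(self.up(h))
```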

4.2.2 Stage-II Generator


In the second Generator G1, no noise is applied. Instead, the image from G0
is taken and reshaped into (batch size, 8, 64, 64) in order to apply the
Self Attention[6] mechanism to the image, and is then reshaped back to
(batch size, 3, 64, 64). In the experiment applying attention to text only,
the attention is applied to the sentence conditional vector instead, and
that vector is concatenated with the image from G0. In both cases, the
result is passed through several hidden layers, a batch normalization layer,
and a non-linearity (ReLU) to produce an encoded image. In the experiment
applying attention to the image only, the image is concatenated with the
sentence conditional vector to produce a second conditioning vector. Again,
in both cases, this conditioning vector is passed through hidden layers, a
batch normalization layer, and a non-linearity (ReLU), and is then sent to a
Residual Block[15].

Before going any further, a short discussion of Residual Blocks is
necessary. Neural networks are universal function approximators that can, in
theory, increase their accuracy with the number of layers used. In practice,
however, vanishing gradients and the curse of dimensionality make learning
both simple and complex functions quite difficult for the network; at some
point the accuracy saturates and eventually degrades. This is where residual
blocks[15] come in. Residual blocks carry the residual gradients forward and
use them to learn both the present layer and a layer about 2-3 hops away. In
this way, simple functions such as the identity function can also be trained
inside the network. If x is the input, Wi are the weights applied in the
present layer, and Ws are the weights applied to the shortcut from 2-3 hops
back, then the residual block[15] computes:

y = F(x, Wi) + Ws x

The Ws term is used when the input and output dimensions differ. In our
case, the weights are produced by applying Spectral Normalization[7] to the
layers.
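
A minimal sketch of such a block, assuming PyTorch's built-in spectral_norm
wrapper and illustrative channel counts:

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

class SNResidualBlock(nn.Module):
    """Residual block with spectrally normalized convolutions.
    The 1x1 shortcut (the Ws term) is only used when the input and
    output channel counts differ."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            spectral_norm(nn.Conv2d(in_ch, out_ch, 3, padding=1)),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            spectral_norm(nn.Conv2d(out_ch, out_ch, 3, padding=1)),
            nn.BatchNorm2d(out_ch),
        )
        self.shortcut = (nn.Identity() if in_ch == out_ch
                         else spectral_norm(nn.Conv2d(in_ch, out_ch, 1)))

    def forward(self, x):
        return nn.functional.relu(self.body(x) + self.shortcut(x))
```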

After passing through the Residual Block, four-times upsampling and a final
Spectral Normalization[7] with a non-linearity (Tanh) give the final
(batch size, 3, 256, 256) image.

4.3 Self Attention


Self Attention addresses a problem shared by most GAN models[16][17][18]
built from convolutional layers: because convolution processes information
from a local neighbourhood, it is computationally inefficient for modelling
long-range dependencies in images. In our case, the Self Attention mechanism
is applied because it produces attention maps based on images, or on text
alone, which we believe improves the overall rendering of the image and its
shapes.

Self Attention adapts the model of [19], which enables both the generator
and the discriminator to model relationships between widely separated
spatial regions. The image features x ∈ R^(C×N) are first transformed into
two feature spaces, f(x) = Wf x and g(x) = Wg x, and

βj,i = exp(sij) / Σ_{i=1..N} exp(sij), where sij = f(xi)^T g(xj)

The output of the attention module is

oj = Σ_{i=1..N} βj,i h(xi), where h(xi) = Wh xi

Here Wf, Wg ∈ R^(C̄×C), Wh ∈ R^(C×C), and C̄ = C/s. The final output then
becomes

yi = γ oi + xi, where γ is initialized to 0
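
A compact PyTorch sketch of this module follows (the 1x1-convolution
parameterization and the reduction factor s = 8 follow the SAGAN reference
design and are assumptions here):

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Self-attention over spatial positions, following the equations above.
    gamma starts at 0 so the network first relies on local features and
    gradually learns to use non-local evidence."""
    def __init__(self, in_ch, s=8):
        super().__init__()
        self.f = nn.Conv2d(in_ch, in_ch // s, 1)   # query: C_bar = C / s channels
        self.g = nn.Conv2d(in_ch, in_ch // s, 1)   # key
        self.h = nn.Conv2d(in_ch, in_ch, 1)        # value
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, height, width = x.shape
        n = height * width
        f = self.f(x).view(b, -1, n)               # (B, C_bar, N)
        g = self.g(x).view(b, -1, n)               # (B, C_bar, N)
        h = self.h(x).view(b, c, n)                # (B, C, N)
        s_ij = torch.bmm(f.transpose(1, 2), g)     # entry [i, j] = f(x_i)^T g(x_j)
        beta = torch.softmax(s_ij, dim=1)          # column j holds beta_{j,i}, normalized over i
        o = torch.bmm(h, beta).view(b, c, height, width)  # o_j = sum_i beta_{j,i} h(x_i)
        return self.gamma * o + x
```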

4.4 Loss Function


In order to generate close-to-real images, the loss functions are pivotal.
Here, Binary Cross Entropy is used as the loss function for both the
generator and the discriminator:

LGi = -1/2 [log(Di(xi))] - 1/2 [log(Di(xi, e))]

LDi = {-1/2 [log(Di(xi))] - 1/2 [log(Di(xi, e))]}_real_img
    + {-1/2 [log(1 - Di(xi))] - 1/2 [log(1 - Di(xi, e))]}_fake_img
    + {-1/2 [log(1 - Di(xi))] - 1/2 [log(1 - Di(xi, e))]}_wrong_img

The KL Divergence loss further eases the training process for the generators
and discriminators:

DKL(P || Q) = - Σ_{x∈X} P(x) log(Q(x) / P(x))

Hence the final loss becomes:

LGi = LGi + DKL

L = LGi + LDi

Here, the discriminator is trained on the real image, i.e. the image from
the dataset; the fake image, i.e. the image generated by the generator; and
the wrong image, i.e. a dataset image paired with a mismatched caption, so
that the discriminator learns to treat such pairs as false as well.
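
A hedged sketch of these objectives, assuming a discriminator
D(image, embedding) that returns an unconditional and a conditional
probability in (0, 1) (the interface and the kl_weight factor are
assumptions):

```python
import torch
import torch.nn.functional as F

def discriminator_loss(D, x_real, x_fake, e_match, e_wrong):
    """Three-term discriminator objective: real pair, fake image, wrong caption."""
    u_real, c_real = D(x_real, e_match)
    u_fake, c_fake = D(x_fake.detach(), e_match)
    u_wrong, c_wrong = D(x_real, e_wrong)

    loss_real = 0.5 * (F.binary_cross_entropy(u_real, torch.ones_like(u_real)) +
                       F.binary_cross_entropy(c_real, torch.ones_like(c_real)))
    loss_fake = 0.5 * (F.binary_cross_entropy(u_fake, torch.zeros_like(u_fake)) +
                       F.binary_cross_entropy(c_fake, torch.zeros_like(c_fake)))
    loss_wrong = 0.5 * (F.binary_cross_entropy(u_wrong, torch.zeros_like(u_wrong)) +
                        F.binary_cross_entropy(c_wrong, torch.zeros_like(c_wrong)))
    return loss_real + loss_fake + loss_wrong

def generator_loss(D, x_fake, e_match, kl, kl_weight=1.0):
    """Generator BCE term plus the KL regularizer from Conditioning Augmentation."""
    u, c = D(x_fake, e_match)
    bce = 0.5 * (F.binary_cross_entropy(u, torch.ones_like(u)) +
                 F.binary_cross_entropy(c, torch.ones_like(c)))
    return bce + kl_weight * kl
```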

5 Experiment
For our implementation we proceeded in several steps and explored each
configuration separately. We used cropped 64x64 images for Stage-1 training
and 256x256 images for Stage-2, with a batch size of 32. The training
dataset is the CUB-200 (2011) Birds dataset, which contains 200 bird
species. The Generator and Discriminator networks are trained using the ADAM
optimizer. The learning rate was set to 0.0002 initially and halved every
100 epochs. Several training runs were carried out and modifications were
made as needed.
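
The optimizer setup can be reproduced roughly as follows; the Adam beta
values are the usual GAN defaults and are an assumption, since the paper
only states the learning rate and its decay:

```python
import torch
import torch.nn as nn

def build_optimizers(generator: nn.Module, discriminator: nn.Module):
    """Adam with lr = 2e-4, halved every 100 epochs for both networks."""
    opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))
    sched_g = torch.optim.lr_scheduler.StepLR(opt_g, step_size=100, gamma=0.5)
    sched_d = torch.optim.lr_scheduler.StepLR(opt_d, step_size=100, gamma=0.5)
    return opt_g, opt_d, sched_g, sched_d
```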

5.1 Evaluation Metric


It is very difficult to evaluate the results of generative models because
the generated images have to be scored. Previous papers have used human
ranking and the Inception Score. Human ranking is difficult and not feasible
for every single image, so we chose the Inception Score and the FID score as
our main evaluation methods. Although the Inception Score is not a
particularly good metric and has known issues[20], it is still used because
most generative models are evaluated with it.

The Fréchet Inception Distance (FID), on the other hand, is a relatively new
metric in this area. It is arguably a better measure because it computes the
distance between the distributions of generated and real images. We
therefore evaluate our model with both scores combined.
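
For reference, the FID value itself is the Fréchet distance between two
Gaussians fitted to Inception-v3 activations of real and generated images; a
sketch of that final computation (feature extraction not shown) is:

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """FID formula: ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2))."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real    # discard tiny imaginary parts from numerics
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```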

5.2 Analysis on Image


The training is done separately in two stages. The first stage accepts
embedded text descriptions and noise and produces a low-resolution 64x64
image. The purpose of this generator is to produce the correct colour and
shape representation with respect to the original image, which is a tough
task since GAN models are hard to converge. Our first generator draws the
images quite well, though for some images the shapes are not drawn correctly
due to the high variance in real image colours and shapes. The second
generator is responsible for drawing high-quality 256x256 images from the
Generator 1 image.

5.2.1 One Stage (3x3 Convolution)

Figure 2: Sample Test Image (3x3 Convolution - 1 Stage Generator)

First, we trained our one-stage GAN with 3x3 convolutions in the generator
layers and the default generator loss. It was evident that the discriminator
reached zero loss within a few epochs and the generator was not learning
anything. This is a known problem of GANs mentioned in the original
paper[1], and there is no defined way to solve it. To work around it, the
discriminator was frozen for the initial epochs and then unfrozen, so that
the generator had enough time to learn the context. We used this method
throughout the later experiments. However, after unfreezing, the
discriminator still went back to the zero-loss state, meaning it became
strong too quickly. The images generated by the generator were not very
good. We obtained an Inception Score of 4.01 ± 0.26 and an FID score of
145.98 for this experiment.
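
The freeze/unfreeze trick amounts to toggling gradients on the
discriminator; a minimal sketch (warmup_epochs and the surrounding loop are
hypothetical):

```python
import torch.nn as nn

def set_requires_grad(module: nn.Module, flag: bool) -> None:
    """Freeze or unfreeze all parameters of a network."""
    for p in module.parameters():
        p.requires_grad = flag

# Hypothetical use inside the training loop:
# if epoch < warmup_epochs:
#     set_requires_grad(discriminator, False)   # only the generator updates
# else:
#     set_requires_grad(discriminator, True)
```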

5.2.2 One Stage (5x5 Convolution, KL Loss)

Figure 3: Sample Test Image (5x5 Convolution, KL Loss - 1 Stage Generator)

To address this problem we observed that the discriminator was acting
stronger than the generator. So, instead of 3x3 convolutions, we increased
the filter size to 5 and used 5x5 convolutions on each upsampling of the
features; upsampling was done by nearest-neighbour interpolation. This
change made the model somewhat stable up to 150 epochs, and the results were
improving. To further address the issue, we added the KL Divergence loss to
the generator, which made the GAN training converge better. We also noticed
that reducing the learning rate after a defined number of epochs helps, and
thus we halved the learning rate every 100 epochs. These changes were kept
for the later experiments. We obtained an Inception Score of 4.05 ± 0.20 and
an FID score of 74.44 for this experiment.

5.2.3 Two Stage (No Attention)

Figure 4: Sample Test Image (No Attention - 2 Stage Generator)

To generate 256x256 images we added a second-stage generator. First we
trained for sufficient epochs with only residual blocks and convolution
layers. The result was satisfactory, as it matched earlier results reported
in this particular area of text-to-image generation. The second stage itself
does a good job of rendering high-quality images, but some images do not
have the correct shape and colour. This happens when Generator 1 fails to
produce the correct shape and colour, so Generator 2 cannot produce the
correct image, as it is conditioned on the first generator. We obtained an
Inception Score of 4.96 ± 0.24 and an FID score of 49.33 for this
experiment.

5.2.4 Two Stage (Attention On Image)

Figure 5: Sample Test Image (Attention on Image - 2 Stage Generator)

After training the vanilla two-stage GAN, we applied Self Attention to the
conditioned vector produced by combining the Generator 1 image and the text
embedding. After sufficient epochs of training, we noticed that the
resulting images were good, but not as good as those of the vanilla GAN.
This is because when attention is applied, the whole image is considered, so
the operation also attends to the background. The attention does well at
producing good shapes and colours but fails to distinguish between the bird
and the background. The original application of Self Attention in GANs[6]
was image-to-image synthesis; since our model conditions on text to generate
images, the attention mechanism fails to produce correct results here. We
obtained an Inception Score of 3.22 ± 0.10 and an FID score of 145.75 for
this experiment.

5.2.5 Two Stage (Attention On Text)

Figure 6: Sample Test Image (Attention on Text - 2 Stage Generator)

Since we did not get satisfactory results applying attention to the
generated image, we also experimented with applying attention to the
sentence embedding vector after the Conditioning Augmentation. After
sufficient epochs, we observed that this method produces really good images.
Comparing these images with those of the no-attention model at the same
epoch, we observed that the results improve. The main difference between the
two is that attention on text separates the background well while generating
quality, detailed images. We obtained an Inception Score of 5.04 ± 0.37 and
an FID score of 42.87 for this experiment.

5.3 Quantitative and Qualitative Results


We tested our model after each experiment. The Inception Score and FID score
are measured on the official CUB test set, for which 2928 images are
generated. The FID score is computed with the Inception-v3 model. We also
compared our results with state-of-the-art models.

Figure 7: Comparison on models based on visual aspect

Method                             Inception Score   FID Score
SATGAN (1 Stage, 3x3 Conv)         4.01 ± 0.26       145.98
SATGAN (1 Stage, 5x5 Conv, KL)     4.05 ± 0.20       74.44
SATGAN (2 Stage, No Attn)          4.96 ± 0.24       49.33
SATGAN (2 Stage, Image Attn)       3.22 ± 0.10       145.75
SATGAN (2 Stage, Text Attn)        5.04 ± 0.37       42.87
StackGAN++                         4.04 ± 0.05       15.30
AttnGAN                            4.36 ± 0.03       –

SATGAN outperforms the best reported Inception Score by 0.68 ± 0.34, i.e.
15.6%, on the CUB dataset. Although the Inception Score is not a good metric
for evaluating generative models, as discussed in [20], it is still used for
comparison with state-of-the-art generative models. On the other hand, the
state-of-the-art FID score could not be matched. We believe the reason is
that, since our Inception Score is higher, the images we produce are more
diverse than those of the state-of-the-art models, whereas the FID score
measures the distance between the original dataset images and the generated
images, so a higher-diversity model can come up short on it. It might also
be due to shorter training time compared with the state-of-the-art
architectures. Overall, we believe that both the Inception Score and the FID
score need to be taken into account to evaluate a generative model.

6 Conclusion
The contribution of our work is a newly proposed architecture, SATGAN, which
reduces the number of Generators while still being able to generate
high-quality images from text descriptions. From our experiments we found
that attention applied only to the text descriptions is sufficient and
computationally cost-effective for generating high-quality images. Our
SATGAN outperforms the best reported state-of-the-art architectures in terms
of generating diversified yet contextually faithful, high-quality images. We
believe this experiment opens a new line of analysis of GAN architectures
and can also serve as an example for understanding which metrics are better
suited for evaluating generative models.

References

[1] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,
S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in
Advances in Neural Information Processing Systems, pp. 2672–2680, 2014.

[2] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation
learning with deep convolutional generative adversarial networks," arXiv
preprint arXiv:1511.06434, 2015.

[3] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta,
A. Aitken, A. Tejani, J. Totz, Z. Wang, et al., "Photo-realistic single
image super-resolution using a generative adversarial network," in
Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 4681–4690, 2017.

[4] E. L. Denton, S. Chintala, R. Fergus, et al., "Deep generative image
models using a Laplacian pyramid of adversarial networks," in Advances in
Neural Information Processing Systems, pp. 1486–1494, 2015.

[5] T. Xu, P. Zhang, Q. Huang, H. Zhang, Z. Gan, X. Huang, and X. He,
"AttnGAN: Fine-grained text to image generation with attentional generative
adversarial networks," in Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition, pp. 1316–1324, 2018.

[6] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, "Self-attention
generative adversarial networks," arXiv preprint arXiv:1805.08318, 2018.

[7] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, "Spectral
normalization for generative adversarial networks," in International
Conference on Learning Representations, 2018.

[8] E. Mansimov, E. Parisotto, J. L. Ba, and R. Salakhutdinov, "Generating
images from captions with attention," in International Conference on
Learning Representations, 2016.

[9] A. Van den Oord, N. Kalchbrenner, L. Espeholt, O. Vinyals, A. Graves,
et al., "Conditional image generation with PixelCNN decoders," in Advances
in Neural Information Processing Systems, pp. 4790–4798, 2016.

[10] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee,
"Generative adversarial text to image synthesis," in International
Conference on Machine Learning, 2016.

[11] H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. N. Metaxas,
"StackGAN: Text to photo-realistic image synthesis with stacked generative
adversarial networks," in The IEEE International Conference on Computer
Vision (ICCV), Oct 2017.

[12] Z. Zhang, Y. Xie, and L. Yang, "Photographic text-to-image synthesis
with a hierarchically-nested adversarial network," in Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, pp. 6199–6208,
2018.

[13] A. Fu and Y. Hou, "Text-to-image generation using multi-instance
StackGAN," Department of Computer Science, Stanford University, Stanford,
CA 94305, 2016.

[14] V. Madhavan, P. Cerles, and N. Desai, "Image generation from captions
using dual-loss generative adversarial networks."

[15] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image
recognition," in Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp. 770–778, 2016.

[16] N. Parmar, A. Vaswani, J. Uszkoreit, L. Kaiser, N. Shazeer, A. Ku, and
D. Tran, "Image transformer," arXiv preprint arXiv:1802.05751, 2018.

[17] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and
X. Chen, "Improved techniques for training GANs," in Advances in Neural
Information Processing Systems, pp. 2234–2242, 2016.

[18] T. Karras, T. Aila, S. Laine, and J. Lehtinen, "Progressive growing of
GANs for improved quality, stability, and variation," in International
Conference on Learning Representations, 2018.

[19] X. Wang, R. Girshick, A. Gupta, and K. He, "Non-local neural networks,"
in Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 7794–7803, 2018.

[20] S. Barratt and R. Sharma, "A note on the inception score," in ICML 2018
Workshop on Theoretical Foundations and Applications of Deep Generative
Models, 2018.
