Cycle-Consistent Inverse GAN for Text-to-Image Synthesis
{hao005,gslin,ascymiao}@ntu.edu.sg,[email protected]
ABSTRACT
This paper investigates an open research task of text-to-image synthesis for automatically generating or manipulating images from text descriptions. Prevailing methods mainly take the textual descriptions as the conditional input for the GAN generation, and need to train different models for the text-guided image generation and manipulation tasks. In this paper, we propose a novel unified framework of Cycle-consistent Inverse GAN (CI-GAN) for both text-to-image generation and text-guided image manipulation tasks. Specifically, we first train a GAN model without text input, aiming to generate images with high diversity and quality. Then we learn a GAN inversion model to convert the images back to the GAN latent space and obtain the inverted latent codes for each image, where we introduce the cycle-consistency training to learn more robust and consistent inverted latent codes. We further uncover the semantics of the latent space, so that the latent codes can be optimized towards the attributes described in the text.

[Figure 1: (a) Conventional text-to-image generation architecture with conditional text input; (b) CI-GAN with a disentangled GAN latent space, e.g. changing the feather colour or the belly colour of a bird.]
However, the paired text-image training of the GAN model limits the diversity of the model representation, since we only have limited combinations of text and images, and the generated images are regularized by the corresponding real images and text pairs. Moreover, it is hard to use the aforementioned framework to change only one attribute while preserving other text-irrelevant attributes in the generated images; hence an extra module has to be trained for the text-based image manipulation task [16].
The advent of style-based generator architectures, such as StyleGAN [13, 14], has greatly improved the realism, quality and diversity of generated images. Specifically, StyleGAN proposed to map the input noise to another latent space W, which has been validated to yield more disentangled semantic representations. To uncover the relationships between the latent codes in the space W and the synthesised images, we need to be aware of the distribution of the space W and find the corresponding latent codes of the images. To this end, many research works adopt the GAN inversion technique [1, 27, 40] to invert the images back to the space W and obtain the inverted latent codes.
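To make the mapping concrete, the following is a minimal PyTorch sketch of a StyleGAN-style mapping network; the depth, width and normalization shown here are illustrative defaults, not the configuration used in this paper.

import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    # Sketch of a StyleGAN-style mapping network: an MLP that maps
    # Gaussian noise z to the intermediate latent space W, which tends
    # to be more disentangled than z. Depth and width are illustrative.
    def __init__(self, z_dim=512, w_dim=512, num_layers=8):
        super().__init__()
        layers, dim = [], z_dim
        for _ in range(num_layers):
            layers += [nn.Linear(dim, w_dim), nn.LeakyReLU(0.2)]
            dim = w_dim
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        # Normalize the input noise as StyleGAN does, then map to w.
        z = z / (z.pow(2).mean(dim=1, keepdim=True) + 1e-8).sqrt()
        return self.net(z)

w = MappingNetwork()(torch.randn(4, 512))  # four latent codes in W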
In this paper, we propose a novel framework of Cycle-consistent Inverse GAN (CI-GAN), which incorporates the GAN inversion methodology into the text-to-image synthesis task. Technically, we first train a GAN inversion encoder to map images into the latent space W of a trained StyleGAN, such that we can obtain the inverted latent codes of the real images in the given datasets. To make the original and inverted latent codes identical and follow the same distribution, we apply a cycle consistency loss during the GAN inversion training, as obtaining inverted latent codes close to the original ones is critical for our subsequent generation procedure.
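As a rough illustration of this training objective, the sketch below pairs an image-reconstruction loss with a latent cycle loss. The encoder E, generator G, loss choices and weights are placeholders for exposition, not the exact CI-GAN formulation.

import torch
import torch.nn.functional as F

def cycle_inversion_step(E, G, real_images, lambda_rec=1.0, lambda_cyc=1.0):
    # E: inversion encoder being trained (images -> latent codes in W).
    # G: frozen, pretrained StyleGAN generator (W -> images).
    # Image cycle: a real image should be reproduced from its inverted code.
    w_inv = E(real_images)
    rec_loss = F.l1_loss(G(w_inv), real_images)
    # Latent cycle: a latent code should survive a generate-then-invert
    # round trip. (In practice w would be drawn through the mapping
    # network so that it lies in W; plain Gaussian noise is a stand-in.)
    w = torch.randn(real_images.size(0), 512, device=real_images.device)
    cyc_loss = F.mse_loss(E(G(w)), w)
    return lambda_rec * rec_loss + lambda_cyc * cyc_loss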
We assume the StyleGAN learned space W is disentangled with regard to the semantic attributes of the target image dataset. For example, in the bottom row of Figure 1, when we want to change the belly colour of the bird image, the remaining semantic attributes, such as the bird shape, pose and feather colour, stay the same; only the belly colour changes from orange to black. The disentanglement of the space W allows us to generate images with various attributes by optimizing the latent codes. To generate images from textual descriptions, we learn a similarity model between text representations and the inverted latent codes, such that the latent codes can be optimized to carry the desired semantic attributes. We feed the optimized latent codes into the trained StyleGAN generator and thereby realize the text-to-image generation task. Apart from the text-to-image generation task, our proposed CI-GAN can also be used for the text-based image manipulation task by applying an extra perceptual loss between the original images and the images reconstructed from the optimized latent codes.
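The following sketch illustrates this test-time latent optimization; sim_model, percept and the hyperparameters are hypothetical names standing in for the similarity model, the perceptual loss and the paper's actual settings.

import torch

def optimize_latent(w_init, text_emb, sim_model, G,
                    x_src=None, percept=None,
                    steps=200, lr=0.01, lambda_p=1.0):
    # Optimize a latent code w at test time.
    # sim_model scores how well a latent code matches a text embedding;
    # for manipulation, percept keeps G(w) close to the source image x_src.
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        loss = -sim_model(text_emb, w).mean()           # raise text-latent similarity
        if x_src is not None and percept is not None:   # manipulation variant
            loss = loss + lambda_p * percept(G(w), x_src).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()  # the final image is G(w)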
Our contributions can be summarized as:
• We propose a novel GAN approach combining GAN inversion and cycle consistency training for text-to-image synthesis. The unified framework can be used for both the text-to-image generation and text-based image manipulation tasks.
• We use the improved GAN inversion method with cycle consistency training to invert real images to the GAN latent space and obtain the latent codes of the images.
• We uncover the semantics of the latent codes, based on which we can generate high-quality images corresponding to the textual descriptions.
We evaluate our proposed framework CI-GAN on two public datasets in the wild, i.e. the Recipe1M and CUB datasets. We conduct extensive experiments to analyse the efficacy of CI-GAN, and present quantitative and qualitative results of our proposed methods together with visualizations of the generated images.

2 RELATED WORK
2.1 Text-Based Image Generation
In this section, we review two categories of text-based image generation, i.e. text-to-image generation and text-based image manipulation. Generating images from text is a challenging task, as we need to correlate the cross-modal information [5, 30, 32]. To control the correspondence between the text and the generated images, some prevailing text-to-image generation works [6, 15, 42] pretrain a Deep Attentional Multimodal Similarity Model (DAMSM) [35], which is used as a supervision signal to regularize the semantics of the generated images. Specifically, Cheng et al. [6] propose a refinement module that returns more complete caption sets, which provide more semantic information for image generation. Zhu et al. [42] use a memory writing gate to refine the initial image and generate a high-quality one. Wang et al. [31] and Zhu et al. [38] aim to generate food images from cooking recipes.
The text-based image manipulation task requires the model to change only certain parts or attributes of the input images while preserving other text-irrelevant attributes. Li et al. [16] propose a module that combines the text and the generated images to jointly correlate the details, such that mismatched attributes can be rectified. Dong et al. [8] use an encoder-decoder architecture that takes the original images as well as the textual descriptions as input and outputs the manipulated images, supervised by a discriminator.
However, the existing text-to-image generation works suffer from the limited diversity of the generated images, since they use paired text and images for GAN training. Moreover, the aforementioned architectures adopt multi-stage refinement [35, 37] to improve the resolution of the generated images, which makes it cumbersome to generate images at higher resolutions. In contrast, our proposed method uses the StyleGAN2 [14] model as the generator backbone and does not use paired text input when training the GAN, which guarantees the quality and diversity of the generated images.

2.2 GAN Inversion
Due to the lack of inference capabilities in GANs, manipulation in the latent space can only be applied to generated images, rather than arbitrary real images. GAN inversion is a popular way to manipulate real images [3, 19, 41]. The purpose of GAN inversion is to invert a given image to the latent space of a pretrained GAN model and obtain the inverted latent code, such that the image can be faithfully reconstructed by the generator from the inverted latent code. As a technique that connects images with the GAN latent space, GAN inversion enables the pretrained GAN model to be applied to real images.
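For reference, a common optimization-based formulation of GAN inversion searches for the latent code that best reconstructs the target image. The sketch below shows this baseline with a plain pixel loss (practical systems typically add perceptual terms); it is a generic illustration, not the specific inversion method proposed in this paper.

import torch

def invert(G, x, w_dim=512, steps=500, lr=0.05):
    # Search for the latent code whose reconstruction G(w) matches the
    # real image x under a pixel loss; the generator G stays frozen.
    w = torch.randn(x.size(0), w_dim, requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        loss = (G(w) - x).pow(2).mean()  # pixel reconstruction error
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()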