
Text to Image Generation Using Machine Learning

Rishab Tiwari[1], Dr. Chitra K[2]

Dayananda Sagar Academy of Technology and Management, Bangalore 560082[1]
Dayananda Sagar Academy of Technology and Management, Bangalore 560082[2]

[email protected], [email protected]

Abstract

Text-to-image synthesis is the task of automatically creating images from provided written descriptions. It contributes significantly to artificial intelligence by tackling the problem of integrating textual and visual input. One useful route to automatic image synthesis is conditional generative modeling, for which Generative Adversarial Networks (GANs) are frequently employed, and recent GAN-based developments in the field have made significant progress. The transformation of text into images is an outstanding illustration of deep learning's potential. It remains difficult, however, to build a text-to-image synthesis system that consistently creates realistic graphics satisfying predetermined criteria, and many existing algorithms struggle to produce visuals that precisely match the given text. To address this issue, we carried out a research work concentrating on a deep learning architecture based on the generative adversarial network (GAN). The aim of this work is to create a system that generates images that are semantically consistent with the input text.

Keywords

Generative Adversarial Networks, Convolutional Neural Networks, Deep Learning


Introduction

The goal of text-to-image (T2I) generation is to produce visually accurate and semantically consistent images from textual descriptions. Given its significance in numerous applications, including photo editing, art creation, and computer-aided design, this problem has attracted a great deal of focus in the deep learning community. The high dimensionality of the output space and the semantic gap between the textual and visual domains are the key reasons it also poses substantial obstacles. Once the technology is ready for commercial use, the creation of graphics from natural language holds enormous promise for a variety of future applications. Generative Adversarial Networks (GANs) are generative models capable of producing new content, and GAN models are now frequently employed in this field to obtain better outcomes. One barrier for deep learning is that a single text description can correspond to many valid image configurations; with proper training, this ambiguity can be handled. As GANs have developed, they have proven remarkably effective at a variety of image tasks, such as image synthesis, image super-resolution, data augmentation, and image-to-image translation.

Problem Statement

Text can occasionally be difficult to comprehend, and visualizing the described information can be harder still; there is also a chance that some words or phrases will be misunderstood. Information is simpler to understand and absorb when it is presented as graphics rather than text. Compared to plain text, images are typically more appealing and compelling: visual aids communicate information more effectively and immediately, and visual material can draw viewers in and hold their interest. Visual components play an important role in many tasks, including presentations, learning, and communication, and well-designed visual communication has several advantages.

Understanding Deep Learning

Deep learning is a subset of AI that processes data to carry out tasks such as language translation and object recognition by mimicking functions of the human brain. Deep learning has made great progress over the years, helped by the wealth of readily available data. Since most of this data is unstructured, it is time-consuming for people to extract the necessary information; deep learning has overcome this issue by making it possible to understand and analyze such data effectively. Deep learning uses artificial neural networks in an effort to replicate how the human brain operates. Because of their hierarchical form, these networks can process data across numerous layers. Convolutional neural networks and recurrent neural networks are two common architectures. In general, deep learning is transforming many scenarios and applications of artificial intelligence.

Literature Review

[1]. Y. Kataoka, T. Matsubara, and K. Uehara presented the paper "Image Generation using Generative Adversarial Networks and Attention Mechanism" at the 2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS).

[2]. "StackGAN: Text to Photorealistic Image Synthesis with Stacked Generative Adversarial Networks" is a research paper authored by Han Zhang, Tao Xu, Hongsheng Li, and Shaoting Zhang of Rutgers University and Lehigh University, published in August 2017.

The article "StackGAN: Text to Photorealistic Image Synthesis with Stacked Generative Adversarial
Networks" concludes with presenting a novel technique for text-to-image synthesis utilizing a two-stage
GAN architecture. The model significantly advances generative picture synthesis by stacking two GANs to
produce higher-quality, more realistic pictures based on textual descriptions.

[3]. "Text to Image Synthesis Using Generative Adversarial Networks" is a study by Stian Bodnar and Jon Shapiro of The University of Manchester, released in May 2018.

The study "Text to Image Using Generative Adversarial Synthesis explores the capacity of GANs to
generate realistic images from textual descriptions, and it concludes by presenting a GAN-based method for

2
text-to-image synthesis. The study likely makes a contribution to the multimodal learning and generative
picture synthesis fields, where it is particularly difficult to combine text and image data.

[4]. "Large Scale GAN Training for High Fidelity Natural Image Synthesis" by Andrew Brock, Jeff Donahue, and Karen Simonyan was released in 2019. In this paper, the authors investigate methods for training generative adversarial networks (GANs) at massive scale.

With an emphasis on scaling GAN training to large datasets and producing high-resolution, high-fidelity images, the work demonstrates a significant improvement in GAN-based image synthesis. It has significance for numerous computer vision and graphics applications and advances generative models for realistic image synthesis.

Methodology

The deep learning method we utilized is the Generative Adversarial Network (GAN), which consists of a generator and a discriminator. For text-to-image generation we also employed TensorFlow, NumPy, NLTK, and TensorLayer. TensorFlow is essentially a machine learning library; it compiles faster than many other deep learning libraries and supports both CPU and GPU computing units. We use the Python pickle module for data serialization in our network design: by converting objects into byte streams, it lets us easily store the data in files and transfer it between different systems and applications.
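As a minimal sketch of this serialization step (the captions shown are hypothetical, not taken from our dataset), pickle round-trips a Python object through a byte stream on disk:

```python
import pickle

# Hypothetical preprocessed data: caption IDs mapped to tokenized captions.
captions = {0: ["a", "yellow", "flower", "with", "long", "petals"],
            1: ["a", "red", "rose", "in", "bloom"]}

# Serialize the object into a byte stream and store it in a file.
with open("captions.pkl", "wb") as f:
    pickle.dump(captions, f)

# Restore the identical object later, possibly on another system.
with open("captions.pkl", "rb") as f:
    restored = pickle.load(f)

assert restored == captions
```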

A Generative Adversarial Network is an unsupervised learning strategy that trains a generative model to produce fresh examples. GANs use neural networks to create new instances of data and can be applied in many domains, such as the synthesis of images and sounds. In the context of GANs, "generative" refers to learning a model that can produce fresh data, and the model is trained using neural networks [6]. The GAN has two component sections: the discriminator and the generator.

Generator:
The generator in a Generative Adversarial Network (GAN) is in charge of producing fresh instances of data, typically synthetic (fake) examples. These generated samples are then shown to the discriminator, whose job is to discern between real and fake data. The generator's goal is to produce samples that successfully fool the discriminator, making it challenging to determine whether a sample is real or artificial. This competitive procedure between generator and discriminator drives the learning and long-term development of both models.

Discriminator:
The discriminator in a Generative Adversarial Network (GAN) is in charge of separating real samples from fake samples produced by the generator. Deep neural networks are used for both the generator and the discriminator. The generator's goal is to trick the discriminator by creating fake instances that resemble real data, while the discriminator seeks to accurately recognize and categorize genuine data samples. As a result, there is competition between the generator and the discriminator.
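The following sketch illustrates this two-network structure with a minimal unconditional generator and discriminator in TensorFlow/Keras; the layer sizes and 64x64 output resolution are illustrative assumptions rather than our exact configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

LATENT_DIM = 100  # assumed noise size for this illustration

def build_generator():
    # Maps a noise vector to a synthetic 64x64 RGB image.
    z = tf.keras.Input(shape=(LATENT_DIM,))
    x = layers.Dense(4 * 4 * 256, activation="relu")(z)
    x = layers.Reshape((4, 4, 256))(x)
    for filters in (128, 64, 32):  # upsample 4x4 -> 8 -> 16 -> 32
        x = layers.Conv2DTranspose(filters, 4, strides=2,
                                   padding="same", activation="relu")(x)
    img = layers.Conv2DTranspose(3, 4, strides=2, padding="same",
                                 activation="tanh")(x)  # -> 64x64
    return tf.keras.Model(z, img)

def build_discriminator():
    # Maps a 64x64 RGB image to a single real/fake logit.
    img = tf.keras.Input(shape=(64, 64, 3))
    x = img
    for filters in (64, 128):
        x = layers.Conv2D(filters, 4, strides=2, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
    logit = layers.Dense(1)(layers.Flatten()(x))  # raw logit, no sigmoid
    return tf.keras.Model(img, logit)
```

Our model additionally conditions both networks on the text embedding; the conditional variants extend these inputs rather than changing the adversarial structure.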

Proposed Methodology:

We used Conditional Generative Adversarial Networks (GANs) in combination with Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) in the training phase of our deep learning-based generative models to produce meaningful images from textual descriptions. Our dataset consisted of flower photos and the textual descriptions that accompany them.
To generate convincing visuals from text using GANs, we preprocessed the textual data and scaled the photos to a fixed dimension. We parsed the dataset's caption sentences, built a vocabulary list, and gave each caption a unique ID. The photos were loaded and appropriately scaled. This preprocessed textual and visual data then served as the foundation for our proposed model.
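A rough sketch of this preprocessing, under assumed file names and a 64x64 target size (our exact image dimensions are not specified above):

```python
from collections import Counter
import numpy as np
from PIL import Image  # Pillow

# Hypothetical raw data: image files paired with caption strings.
raw = [("flower_001.jpg", "a yellow flower with long petals"),
       ("flower_002.jpg", "a red rose in bloom")]

# Parse the caption sentences and build a vocabulary list.
counts = Counter(w for _, cap in raw for w in cap.lower().split())
vocab = ["<pad>", "<unk>"] + sorted(counts)
word_to_id = {w: i for i, w in enumerate(vocab)}

def encode_caption(caption, max_len=12):
    # Map each word to its vocabulary ID and pad to a fixed length.
    ids = [word_to_id.get(w, word_to_id["<unk>"]) for w in caption.lower().split()]
    return np.array((ids + [0] * max_len)[:max_len], dtype=np.int32)

def load_image(path, size=(64, 64)):
    # Load a photo and rescale it to a fixed dimension in [-1, 1].
    img = np.asarray(Image.open(path).convert("RGB").resize(size), np.float32)
    return img / 127.5 - 1.0  # range matches a tanh-output generator

# Each caption receives a unique ID (here, its index in the list).
captions = {i: encode_caption(cap) for i, (_, cap) in enumerate(raw)}
```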

We used RNNs to extract contextual information from the text sequences: the RNN captures the association between words at different time steps, which allows the model to comprehend the textual descriptions. We combined the RNN with a CNN to carry out the text-to-image mapping; the CNN retrieved pertinent details from the photographs without human input.

During training, an input sequence of textual descriptions was fed into the RNN, which transformed the text into 256-dimensional word embeddings. A 512-dimensional noise vector was then concatenated with these embeddings. With a gated-feedback generator of size 128 and a batch size of 64, we trained our model while feeding the generator inputs of noise and text.
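A sketch of that input pipeline with the stated sizes (256-dimensional text embedding, 512-dimensional noise, batch size 64); the GRU stands in for the gated recurrent encoder, and the vocabulary size is an assumed placeholder:

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 5000  # assumed; set this from the real vocabulary
EMBED_DIM = 256    # text embedding size stated above
NOISE_DIM = 512    # noise vector size stated above
BATCH = 64         # batch size stated above

# Text encoder: token IDs -> one 256-dimensional embedding per caption.
tokens = tf.keras.Input(shape=(None,), dtype=tf.int32)
h = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(tokens)
summary = layers.GRU(EMBED_DIM)(h)  # gated RNN summarizes the word sequence
text_encoder = tf.keras.Model(tokens, summary)

caption_ids = tf.random.uniform((BATCH, 12), 0, VOCAB_SIZE, dtype=tf.int32)
text_embedding = text_encoder(caption_ids)    # shape (64, 256)
noise = tf.random.normal((BATCH, NOISE_DIM))  # shape (64, 512)

# The generator consumes the concatenation of noise and text features.
generator_input = tf.concat([noise, text_embedding], axis=1)  # shape (64, 768)
```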

The semantic data extracted from the textual description served as input to the generator model, which translated the distinguishing details into pixel-level data and produced corresponding images. The discriminator then received these generated images as input, along with matching or mismatched textual descriptions and authentic sample images from the collection.

To meet the discriminator's objectives, the model receives as training input a series of distinct pairings of images and their textual descriptions. The input pairings consist of generated images with correct text descriptions, mismatched images with text descriptions that do not describe them, and genuine photographs with correct text descriptions.
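One way to express the three pairings as a discriminator loss, assuming a discriminator d that maps an (image, text embedding) pair to a single real/fake logit (a sketch, not our exact formulation):

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(d, real_img, wrong_img, fake_img, text_emb):
    # Pairing 1: genuine photo + its correct caption -> train toward "real".
    logits_real = d([real_img, text_emb])
    # Pairing 2: mismatched photo + that caption -> train toward "fake".
    logits_wrong = d([wrong_img, text_emb])
    # Pairing 3: generated image + the caption it was made from -> "fake".
    logits_fake = d([fake_img, text_emb])

    loss_real = bce(tf.ones_like(logits_real), logits_real)
    loss_wrong = bce(tf.zeros_like(logits_wrong), logits_wrong)
    loss_fake = bce(tf.zeros_like(logits_fake), logits_fake)
    # Averaging the two "fake" terms is one common weighting choice.
    return loss_real + 0.5 * (loss_wrong + loss_fake)
```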

Real image and real text combinations enable the model to learn whether a given image and text pairing are aligned with one another; pairing an incorrect image with a true written description signals a mismatch between the image and the caption.

The discriminator is taught to distinguish between authentic and artificial images; its classification performance is initially mainly concerned with telling correct images from false ones. During training, the loss is computed to drive weight updates and provide training feedback to both the generator and discriminator models.
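A sketch of one such training iteration, reusing the discriminator_loss helper above and assuming Keras models g and d with the input signatures shown earlier:

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

@tf.function
def train_step(g, d, g_opt, d_opt, real_img, wrong_img, text_emb, noise):
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_img = g([noise, text_emb], training=True)
        d_loss = discriminator_loss(d, real_img, wrong_img, fake_img, text_emb)
        # The generator improves when the discriminator scores its output,
        # paired with the matching caption, as real.
        fake_logits = d([fake_img, text_emb], training=True)
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    # Each network is updated only from the gradients of its own loss,
    # which provides the training feedback described above.
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, d.trainable_variables),
                              d.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, g.trainable_variables),
                              g.trainable_variables))
    return g_loss, d_loss
```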

Result

The user enters a textual description into the GUI application, which processes it and displays the accompanying image: the application returns an image that matches the written description.
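A minimal sketch of such a GUI flow using tkinter; generate_image is a hypothetical stand-in for running the trained generator on the entered text:

```python
import tkinter as tk
from PIL import Image, ImageTk  # Pillow

def generate_image(text: str) -> Image.Image:
    # Hypothetical helper: in the real application this would encode the
    # text and run the trained generator; here it returns a placeholder.
    return Image.new("RGB", (64, 64), "green")

root = tk.Tk()
root.title("Text to Image")

entry = tk.Entry(root, width=50)
entry.pack()

panel = tk.Label(root)
panel.pack()

def on_generate():
    photo = ImageTk.PhotoImage(generate_image(entry.get()))
    panel.configure(image=photo)
    panel.image = photo  # keep a reference so it is not garbage-collected

tk.Button(root, text="Generate", command=on_generate).pack()
root.mainloop()
```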

Fig 1.

Fig 2.

Conclusion

We have demonstrated through experimentation that our architecture is capable of generating images that reflect the aesthetic of the given dataset. We are confident that the generated images can accurately depict the content stated in the captions and that the system can handle different styles, provided enough time is allotted for creating a styled dataset, training the model, and performing hyperparameter search. It is crucial to keep in mind, however, that creating a stylized image dataset and training the model can be computationally expensive; we therefore do not advise training a system with this design unless adequate training time is available and the time savings it offers are essential to the intended application. In many different applications and business use cases, text-based synthetic image synthesis is extremely useful. It can help machine learning models that lack sufficient image data by letting them train on artificial images created from text descriptions. Chatbots can use this capability to generate pertinent, contextual visuals that improve user interactions. Search engines and stock photo websites can also benefit from synthetic image synthesis, by visualizing search queries or filling gaps in their image archives.

References

[1]. Akanksha Singh, Sonam Anekar, Ritika Shenoy, and Prof. Sainath Patil, "Text to Image using Deep Learning," International Journal of Engineering Research & Technology (IJERT), ISSN: 2278-0181, Vol. 10, Issue 04, April 2021.

[2]. Y. Kataoka, T. Matsubara, and K. Uehara, "Image Generation using Generative Adversarial Networks and Attention Mechanism," in 2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS), 2016.

[3]. Han Zhang, Tao Xu, Hongsheng Li, and Shaoting Zhang, "StackGAN: Text to Photorealistic Image Synthesis with Stacked Generative Adversarial Networks," Rutgers University and Lehigh University, August 2017.

[4]. Stian Bodnar and Jon Shapiro, "Text to Image Synthesis Using Generative Adversarial Networks," The University of Manchester, May 2018.

[5]. Andrew Brock, Jeff Donahue, and Karen Simonyan, "Large Scale GAN Training for High Fidelity Natural Image Synthesis," 2019.

Plagiarism Report
