PROJECT REPORT
ON
SIGNIFYING IMMEDIATE IMAGE
GENERATION FROM TEXT
Submitted in partial fulfillment for the award of degree of
BACHELOR OF ENGINEERING
in
INFORMATION SCIENCE AND ENGINEERING
Submitted by
DARSHAN SHET 4CB21IS009
NITHISH 4CB21IS030
PRANAV SAVANT 4CB21IS035
SHANMUKHA MADDODI 4CB21IS045
CERTIFICATE
Certified that the project work entitled “SIGNIFYING IMMEDIATE IMAGE
GENERATION FROM TEXT” carried out by
External Viva:
1. . . . . . . . . . . . . . . . . . . . . . .....................
2. . . . . . . . . . . . . . . . . . . . . . .....................
CANARA ENGINEERING COLLEGE
(Affiliated to VTU Belagavi, Recognized by AICTE, Accredited by NBA)
Sudhindra Nagara, Benjanapadavu, Mangaluru - 574219,
Karnataka
DECLARATION
We hereby declare that the entire work embodied in this Project Report titled “SIGNIFYING IMMEDIATE IMAGE GENERATION FROM TEXT” has been carried out by us at CANARA ENGINEERING COLLEGE, Mangaluru under the supervision of Prof. Pradeep M, for the award of Bachelor of Engineering in Information Science And Engineering. This report has not been submitted to this or any other University for the award of any other degree.
Acknowledgement
We dedicate this page to acknowledge and thank those responsible for the shaping of the project. Without their guidance and help, the experience while constructing the dissertation would not have been so smooth and efficient.
We would like to thank all faculty and staff of the Department of Information Science And Engineering who have always been with us, extending their support, precious suggestions, guidance, and encouragement throughout the project. We also express our gratitude to our beloved friends and parents for their constant encouragement and support.
Darshan Shet
Nithish
Pranav Savant
Shanmukha Maddodi
Abstract
Table of Contents
Acknowledgement i
Abstract ii
List of Tables ix
1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Motivation and Problem Statement . . . . . . . . . . . . . 1
1.3 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Scope and Limitations . . . . . . . . . . . . . . . . . . . . 2
1.5 Relevance and Type . . . . . . . . . . . . . . . . . . . . . 3
1.6 Organization of the Report . . . . . . . . . . . . . . . . . . 3
2 Literature Survey 4
2.1 Text-to-Image Synthesis With Generative Models: Methods, Datasets, Performance Metrics, Challenges, and Future Direction . . . 4
2.1.1 Brief Findings . . . . . . . . . . . . . . . . . . . . . 4
2.1.2 Design/Methodology/Techniques Adopted . . . . . 4
2.1.3 Results Achieved . . . . . . . . . . . . . . . . . . . 5
2.2 Recent Advances in Text-to-Image Synthesis: Approaches, Datasets, and Future Research Prospects . . . 5
2.2.1 Brief Findings . . . . . . . . . . . . . . . . . . . . . 5
2.2.2 Design/Methodology/Techniques . . . . . . . . . . 6
2.2.3 Results Achieved . . . . . . . . . . . . . . . . . . . 6
2.3 GACnet-Text-to-Image Synthesis With Generative Models Using Attention Mechanisms With Contrastive Learning . . . 6
2.3.1 Brief Findings . . . . . . . . . . . . . . . . . . . . . 6
2.3.2 Design/Methodology/Techniques Adopted in Article n . . . 7
2.3.3 Results Achieved . . . . . . . . . . . . . . . . . . . 7
2.4 Exploring Progress in Text-to-Image Synthesis: An In-Depth Survey on the Evolution of Generative Adversarial Networks . . . 7
2.4.1 Brief Findings . . . . . . . . . . . . . . . . . . . . 8
2.4.2 Design/Methodology/Techniques Adopted . . . . . 8
2.4.3 Results Achieved . . . . . . . . . . . . . . . . . . . 8
2.5 BigGan-based Bayesian reconstruction of natural images from human brain activity . . . 9
2.5.1 Brief Findings . . . . . . . . . . . . . . . . . . . . . 9
2.5.2 Design/Methodology/Techniques Adopted . . . . . 9
2.5.3 Results Achieved . . . . . . . . . . . . . . . . . . . 9
2.6 Use mean field theory to train a 200-layer vanilla GAN . . 10
2.6.1 Brief Findings . . . . . . . . . . . . . . . . . . . . 10
2.6.2 Design/Methodology/Techniques Adopted . . . . . 10
2.6.3 Results Achieved . . . . . . . . . . . . . . . . . . . 10
2.7 High-Resolution Image Synthesis with Latent Diffusion Models . . . 11
2.7.1 Brief Findings . . . . . . . . . . . . . . . . . . . . . 11
2.7.2 Design/Methodology/Techniques Adopted . . . . . 11
2.7.3 Results Achieved . . . . . . . . . . . . . . . . . . . 11
2.8 Antenna Design Using a GAN-Based Synthetic Data Generation Approach . . . 12
2.8.1 Brief Findings . . . . . . . . . . . . . . . . . . . . 12
2.8.2 Design/Methodology/Techniques Adopted . . . . . 12
2.8.3 Results Achieved . . . . . . . . . . . . . . . . . . . 12
2.9 Text-to-Image Generator using GANs . . . . . . . . . . . 13
2.9.1 Brief Findings . . . . . . . . . . . . . . . . . . . . . 13
2.9.2 Design/Methodology/Techniques Adopted . . . . . 13
2.9.3 Results Achieved . . . . . . . . . . . . . . . . . . . 13
2.10 A 28.6 mJ/iter Stable Diffusion Processor for Text-to-Image Generation with Patch Similarity-based Sparsity Augmentation and Text-based Mixed-Precision . . . 14
2.10.1 Brief Findings . . . . . . . . . . . . . . . . . . . . . 14
2.10.2 Design/Methodology/Techniques Adopted . . . . . 14
2.10.3 Results Achieved . . . . . . . . . . . . . . . . . . . 14
2.11 Comparison Table . . . . . . . . . . . . . . . . . . . . . . . 15
2.12 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4 System Design 21
4.1 Abstract Design . . . . . . . . . . . . . . . . . . . . . . . . 21
4.1.1 Architectural diagram . . . . . . . . . . . . . . . . 21
4.2 Proposed system . . . . . . . . . . . . . . . . . . . . . . . 23
4.3 Functional Design . . . . . . . . . . . . . . . . . . . . . . . 24
4.3.1 Modular design diagram . . . . . . . . . . . . . . . 24
4.3.2 Sequence diagram . . . . . . . . . . . . . . . . . . 25
4.3.3 Use case diagram . . . . . . . . . . . . . . . . . . . 26
4.4 Control Flow Design . . . . . . . . . . . . . . . . . . . . . 27
4.4.1 Activity diagram for use cases . . . . . . . . . . . 27
4.5 Data Flow Diagram . . . . . . . . . . . . . . . . . . . . . . 28
4.5.1 Zero-Level Data Flow Diagram . . . . . . . . . . . 28
4.5.2 First-Level Data Flow Diagram . . . . . . . . . . . 29
4.5.3 Second-Level Data Flow Diagram . . . . . . . . . . 30
4.5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . 31
5 Implementation 32
5.1 Software Used with Justification . . . . . . . . . . . . . . . 32
5.1.1 Frontend Development . . . . . . . . . . . . . . . . 32
5.1.2 Backend Development . . . . . . . . . . . . . . . . 33
5.1.3 Framework Used . . . . . . . . . . . . . . . . . . . 33
5.1.4 Coding Languages Used for Development . . . . . . 34
5.1.5 Operating System . . . . . . . . . . . . . . . . . . . 34
5.2 Hardware Used with Justification . . . . . . . . . . . . . . 35
5.3 Algorithm/Procedures Used in the Project in Different Modules . . . 35
5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
References 47
B Expo Details 49
List of Figures
List of Tables
Chapter 1
Introduction
1.1 Background
Interior design has traditionally been a creative and manual process re-
quiring expertise to conceptualize and create living spaces. With advance-
ments in artificial intelligence (AI), particularly Generative Adversarial
Networks (GANs), the interior design process has seen a transformation.
AI now enables automated visualizations of interior spaces based on text
descriptions, significantly reducing time and effort while enhancing cre-
ativity. This project focuses on developing a Text-To-Image Generator
that uses AI to convert textual descriptions into 2D interior design im-
ages. Users can describe the desired room style, color palette, and furni-
ture preferences, and the system generates a visual representation. This
approach aims to provide an easy-to-use solution for those who lack design
expertise, allowing them to visualize their ideal living spaces.
Signifying Immediate Image Generation from Text Chapter 1
1.2 Motivation and Problem Statement
The key problem this project addresses is the lack of accessible and affordable tools for visualizing interior designs quickly and efficiently, particularly for those with no design experience.
1.3 Objectives
• Use MongoDB to store and retrieve user data and design history.
1.4 Scope and Limitations
This project aims to generate 2D interior designs for a limited set of room types (living room, bedroom, kitchen, and office) and a select group of design styles (modern, minimalist, industrial, etc.). The scope is confined to generating visual concepts and providing basic cost estimates based on pre-set material and furniture prices. While this approach is designed to assist users in initial design visualization, the system does not aim to offer detailed architectural planning or 3D modeling. The limitations of the project include the potential lack of accuracy in the cost estimation, which is based on general pricing datasets that may not reflect individual market variations. Additionally, while the AI model can generate realistic designs, its ability to understand complex or highly specific descriptions may be limited.
1.6 Organization of the Report
The report is organized into structured sections that guide the reader through the various components of the project. First, the Literature Review examines existing studies on text-to-image synthesis and related generative models, providing context and identifying gaps in current research. The Methodology section outlines the machine learning models, data sources, and techniques employed to develop the generation system, while Implementation details the technical aspects of building and integrating these models. Following the methodology, the Results and Analysis section presents the performance of the system, including the accuracy and effectiveness of the generated designs. The Conclusion and Future Work section summarizes the project's findings and identifies areas for improvement, including plans for enhancing adaptability and computational efficiency. This structure is designed to ensure a comprehensive understanding of the project's objectives, processes, and potential impact, paving the way for continued development and application in real-world settings.
Chapter 2
Literature Survey
The study concludes that while significant progress has been made, current models often struggle with complex or abstract textual inputs. It identifies promising directions, such as integrating multimodal learning and advanced semantic embeddings, to improve synthesis accuracy. The findings suggest that achieving human-level realism and diversity in generated images requires addressing limitations in training data and computational efficiency.
Authors: Yong Xuan Tan, Chin Poo Lee, Mai Neo, Kian Ming Lim, Jit Yan Lim, Ali Alqahtani [2]
2.2.2 Design/Methodology/Techniques
The paper concludes that while the field has made significant progress in the realism and diversity of generated images, there is room for improvement in generating fine-grained details and achieving better alignment between text descriptions and visual outputs. The authors also propose future directions, such as leveraging hybrid architectures and improving dataset diversity, to address these challenges and enable broader applications of text-to-image synthesis.
Authors: Md. Ahsan Habib, Md. Anwar Hussen Wadud, Md. Fazlul Karim Patwary, Mohammad Motiur Rahman, M. F. Mridha [4]
The research highlights that incorporating MFT into the GAN framework allows for a more robust training process, enabling the model to learn higher-dimensional data representations. The study revealed that the proposed approach reduced gradient explosion and vanishing problems common in deep models, resulting in improved convergence and better generative performance compared to standard training approaches.
The authors applied mean field theory (MFT), a concept originating from statistical physics, to stabilize the training of vanilla GANs, which often struggle with deep architectures due to vanishing gradients and mode collapse. By leveraging MFT, they approximated the interactions among neural network units, effectively mitigating instability in backpropagation. The methodology included mathematical analysis to validate MFT's suitability and extensive implementation experiments with multiple datasets, demonstrating how deeper GANs can be trained without significant loss of stability.
Results from the study highlight the versatility of LDMs in multiple domains, including text-to-image synthesis, semantic scene generation, and image inpainting. These models achieve state-of-the-art performance while using fewer computational resources compared to pixel-based diffusion methods. The work emphasizes not only the effectiveness of latent diffusion but also its practical implications for resource-constrained training environments.
2.12 Summary
The project focuses on generating images from textual prompts using advanced deep learning techniques. The frontend is built with React, offering an intuitive interface with options for user registration and login. Once authenticated, users are redirected to a Streamlit-based interface where they can input prompts to generate images. The backend uses a GAN (Generative Adversarial Network) model to process the prompts, with Flask acting as the intermediary for communication. User details, such as login credentials, are securely stored in a MongoDB database. This workflow ensures a seamless experience for users while maintaining performance, modularity, and scalability.
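As a sketch of this workflow, a minimal Flask endpoint that receives a prompt from the frontend and returns a reference to a generated image might look as follows. The route name, the `generate_image` stub, and the response fields are illustrative assumptions, not the project's actual code; the stub stands in for the real GAN inference and MongoDB steps.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def generate_image(prompt):
    # Placeholder for the GAN inference step; the real system would run
    # the trained generator here and return the produced image's path.
    return f"generated/{abs(hash(prompt)) % 10000}.png"

@app.route("/generate", methods=["POST"])
def generate():
    # The React/Streamlit frontend would POST {"prompt": "..."} here.
    data = request.get_json(force=True)
    prompt = (data or {}).get("prompt", "").strip()
    if not prompt:
        return jsonify({"error": "prompt is required"}), 400
    return jsonify({"prompt": prompt, "image": generate_image(prompt)}), 200
```

In the full system, the result would also be stored alongside the user's design history in MongoDB before being returned to the interface.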
• User Input: The system must allow users to provide input via a description of the desired room design, including style, room type, and color palette.
• Image Output: The application must generate and display the design
concept based on user specifications and allow for image download.
• Secure User Access: Users should have a secure login system, ensuring
privacy and data protection.
• Security: The system must ensure secure access to user accounts and safeguard sensitive data, such as design preferences and personal details.
Safety requirements aim to protect user data and ensure the integrity of the
system. The platform should implement strong authentication protocols
to ensure that only authorized users can access their accounts. User data
(such as preferences and generated designs) should be securely stored and
transmitted using encryption. Additionally, regular security audits and
vulnerability scans should be conducted to identify and address potential
threats.
To meet the performance expectations of the users, the platform must ensure that design generation happens in a reasonable time frame (within 5 seconds per design) to maintain user satisfaction. The platform should be capable of supporting at least 500 concurrent users, ensuring that users don't experience delays or timeouts. Optimized server configurations, database queries, and efficient backend processes will contribute to achieving this requirement.
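One way to verify the 5-second target during testing is a simple latency guard around the generation call. The sketch below is an assumption about how such a check could be written; `fake_generate` is a stand-in for the real model call, not the project's code.

```python
import time

def fake_generate(prompt):
    # Stand-in for the real GAN inference call.
    time.sleep(0.01)
    return f"image-for-{prompt}"

def timed_generate(prompt, budget_seconds=5.0):
    """Run generation and report whether it met the latency budget."""
    start = time.perf_counter()
    image = fake_generate(prompt)
    elapsed = time.perf_counter() - start
    return image, elapsed, elapsed <= budget_seconds
```

A performance test can then assert the third return value for a batch of representative prompts.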
The user interface (UI) is crucial for providing an optimal user experience.
The platform will feature a simple, clean, and intuitive design. Users will
interact with the platform by entering design preferences through forms
or sliders, selecting room types, and receiving generated designs. The UI
will be responsive, ensuring that the platform is accessible across various
devices, including desktops, tablets, and smartphones.
The platform must handle high traffic during peak hours without performance degradation. Key performance metrics include:
• Load Time: The platform’s homepage and design input page should
load within 3-5 seconds.
3.7 Summary
Chapter 4
System Design
The functional or modular design of our project breaks down the system
into distinct modules, each responsible for a specific functionality. The user
interface module, built with React and Streamlit, handles user interaction
and input. The backend module, powered by Flask, processes the prompts
and integrates seamlessly with the GAN model for image generation. A
dedicated data management module uses MongoDB to store and retrieve
user data. This modular approach ensures clear separation of concerns and ease of maintenance, making the system efficient and user-friendly.
The use case diagram highlights the key interactions between the user and the system components. The user interacts with the system to perform various actions, such as signing in or signing up through the React frontend, submitting prompts via the Streamlit interface, and viewing the generated images. The backend, powered by Flask, processes these prompts and communicates with the GAN model to generate images while storing all relevant data in MongoDB. Each use case is linked to specific actions, ensuring clarity in the user's journey through the system. It provides a high-level overview of the system's functionality.
The first-level Data Flow Diagram (DFD) further decomposes the high-level system into distinct processes, each responsible for specific tasks. The User interacts with the system by first logging in or signing up, where their credentials are processed and authenticated. After successful authentication, the user submits a prompt to the Prompt Submission Process, which sends the prompt to the Processing System. This system interacts with the GAN Model to generate an image based on the user's input. The user details are stored in MongoDB through the Storage System. The first-level DFD highlights the core processes and their interdependencies in the system.
The Second-Level Data Flow Diagram (DFD) dives deeper into the individual processes, providing a more granular view of the system's operations. In this diagram, the Login/Signup Process is broken down into two sub-processes: Authentication and Account Creation. Once the user is authenticated, the Prompt Submission Process is split into Prompt Validation and Prompt Forwarding. The Processing System consists of two sub-processes: GAN Model Interaction and Image Generation Confirmation. The Storage System stores User Data.
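The Prompt Validation sub-process can be sketched as a small helper like the one below. The length limits and messages are illustrative assumptions, not values fixed by the design.

```python
def validate_prompt(prompt, min_len=5, max_len=300):
    """Return (is_valid, message) for a user-submitted design prompt."""
    text = (prompt or "").strip()
    if len(text) < min_len:
        return False, "Prompt is too short to describe a design."
    if len(text) > max_len:
        return False, "Prompt exceeds the maximum allowed length."
    return True, "OK"
```

Only prompts that pass this check would be handed to Prompt Forwarding and on to the GAN model.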
4.5.4 Summary
Chapter 5
Implementation
PyTorch: To develop and deploy deep learning models, PyTorch was utilized for its dynamic computation graph, which allows for greater flexibility during model development and debugging. Its intuitive, Pythonic interface makes it easy to experiment with various architectures and algorithms. PyTorch's extensive support for GPU acceleration and its seamless integration with other Python libraries, such as NumPy and pandas, ensured efficient computation and scalability. Additionally, its robust ecosystem for model training, optimization, and deployment streamlined the development of high-performance machine learning models.
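As an illustration only (the report does not list the actual network architecture), a minimal PyTorch generator that maps a text embedding plus a noise vector to an image tensor could be sketched as follows; the layer sizes and the 3×64×64 output resolution are assumptions.

```python
import torch
import torch.nn as nn

class TextToImageGenerator(nn.Module):
    """Toy GAN generator: concatenates a text embedding with noise
    and maps the result to a 3x64x64 image tensor."""

    def __init__(self, text_dim=128, noise_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim + noise_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 3 * 64 * 64),
            nn.Tanh(),  # pixel values in [-1, 1]
        )

    def forward(self, text_embedding, noise):
        x = torch.cat([text_embedding, noise], dim=1)
        return self.net(x).view(-1, 3, 64, 64)
```

A real conditional GAN would typically use transposed convolutions and a paired discriminator, but this linear sketch shows the text-plus-noise conditioning idea.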
5.4 Summary
This chapter outlines the key components and tools used in the development of the Signifying Immediate Image Generation from Text project, highlighting the rationale behind their selection. The software stack includes React for creating dynamic and responsive user interfaces, Streamlit for real-time interactive dashboards, Node.js for efficient server-side operations, and FastAPI for building high-performance APIs. PyTorch was utilized for developing and deploying deep learning models, while Flask served as the lightweight framework for backend integration. Python, with its extensive library support and simplicity, was the primary programming language, complemented by the Windows operating system for development and testing. From a hardware perspective, standard computing devices with multi-core processors and at least 8 GB of RAM sufficed, while the application was deployed on Vercel to eliminate the need for dedicated infrastructure. The project architecture is modular, incorporating text-to-image generation using GANs, user interaction via React and Streamlit, backend processing with Flask, and efficient data management using MongoDB. These components work together seamlessly to transform textual descriptions into high-quality, visually appealing 2D designs.
6.1 Results
Figure 6.1 shows the Home Page of the application, which acts as a welcoming gateway for users, offering a clean and minimalistic design. Its responsive layout ensures compatibility with all devices, providing an accessible and seamless user experience. During user testing, the interface received positive feedback for its simplicity and ease of navigation. Users could easily access options to sign up, log in, or learn more about the system.
Figure 6.2 shows the Sign-Up Page, which facilitates the secure creation of user accounts. It includes fields to input a username, email, and password, all of which are validated to ensure accuracy. Additionally, user credentials are encrypted, prioritizing data security. Once the sign-up process is complete, users are seamlessly redirected to the Login Page.
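Protecting credentials at rest usually means hashing rather than reversible encryption. As a hedged sketch using only the Python standard library (the iteration count and salt size are common defaults, not values taken from the project), signup could store passwords like this:

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None, iterations=200_000):
    """Derive a PBKDF2-HMAC-SHA256 digest; returns (salt, digest)."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest

def verify_password(password, salt, digest, iterations=200_000):
    """Recompute the hash and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(candidate, digest)
```

The salt and digest, rather than the plaintext password, would then be stored in the MongoDB user record.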
Figure 6.3 shows the Login Page, which allows registered users to access their accounts securely. It features fields for email and password input, ensuring smooth authentication. Incorrect login attempts prompt error messages, guiding users to resolve the issue. Upon successful login, users are directed to the application's main functionality, where they can provide text prompts for interior designs.
Figure 6.4 shows how, once logged in, users interact with the system by providing textual descriptions of their desired interior designs. For instance, a user might enter the prompt, “A cozy bedroom with wooden flooring, natural lighting, and a queen-sized bed.” The system processes this input and generates a 2D design that aligns closely with the provided description. Feedback mechanisms are integrated to guide users in refining their prompts for optimal results.
Figure 6.5 shows the generated images, which are the highlight of the application. Based on the text input, the system leverages a custom GAN model to create detailed 2D designs. For example, a prompt like “A modern living room with a grey sofa, a glass coffee table, and indoor plants” produced a visually appealing and accurate representation of the description. Testing showed that the system efficiently translated textual inputs into designs, receiving praise for its intuitive output. The generated designs are downloadable, allowing users to save or integrate them into other design projects.
6.2 Testing
For our project, the testing process ensures the system's usability, accuracy, and functionality. It involves running tests to identify errors and validate the performance of each module by simulating various user inputs and scenarios. The primary goal of testing is to confirm the system's ability to interpret textual descriptions accurately and generate corresponding 2D interior designs.
The testing success depends on well-structured test cases that cover
diverse room types, material descriptions, and user interactions. Each test
case includes inputs (e.g., text descriptions of rooms or specific furniture),
• White Box Testing: This type of testing focuses on how the system works on the inside. It involves checking the code, algorithms, and overall logic to ensure everything runs smoothly and efficiently. For our project, we test how well the system processes text descriptions, how accurately the GAN model creates images, and whether the application handles different kinds of inputs effectively. For example, we check if the text processing part can understand complex sentences and if the GAN model generates designs that match the key features described. This helps ensure the system produces high-quality results.
Unit testing verified each component in isolation, particularly its handling of user inputs. For example, we verified whether the system could understand a description like “A modern living room with a grey sofa” and if the GAN generated a suitable design. This way, we identified and fixed any bugs in the individual parts before combining them into the full system.
After making sure each part worked individually, we tested how well they
worked together. Integration testing ensured that all components, like
the text analysis module, the GAN model, and the user interface, could
communicate and function as a cohesive system. For instance, we checked
if a user input was passed correctly from the frontend to the GAN model
and if the generated image was displayed back to the user without errors.
This testing step helped us catch any issues in the interaction between
different modules.
System testing involved testing the entire project as a whole, under real-life conditions. We made sure all parts of the system, including the text analysis, image generation, and user interface, worked seamlessly together. For example, we tested the system with a variety of text descriptions to see if it consistently generated accurate 2D designs and handled unexpected inputs gracefully. This stage ensured that the application could provide reliable results in a real-world environment. System testing consists of the following steps:
• End-to-End Testing: The entire system was tested to ensure all components worked together without any issues, from the user's input to the generated design.
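The unit tests described above can be sketched with Python's built-in `unittest` framework; `validate_prompt` here is a hypothetical stand-in for the project's prompt-handling module, not its actual code.

```python
import unittest

def validate_prompt(prompt):
    # Hypothetical stand-in for the project's prompt-handling logic:
    # accept only non-empty prompts of a minimal descriptive length.
    text = (prompt or "").strip()
    return len(text) >= 5

class PromptHandlingTests(unittest.TestCase):
    def test_accepts_descriptive_prompt(self):
        self.assertTrue(validate_prompt("A modern living room with a grey sofa"))

    def test_rejects_empty_prompt(self):
        self.assertFalse(validate_prompt(""))
```

Integration and end-to-end tests would extend this pattern by driving the Flask endpoint and checking that a generated image reference comes back for each valid prompt.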
6.6 Summary
References
Appendix A
Appendix B
Expo Details
Figure B.1 shows our team successfully presenting the project at “Nirmaan 2024”, as part of the “Innovations Showcase” college-level project exhibition and competition at Canara Engineering College on 10th December 2024.