
Visvesvaraya Technological University, Belagavi – 590018

PROJECT REPORT
ON
SIGNIFYING IMMEDIATE IMAGE
GENERATION FROM TEXT
Submitted in partial fulfillment for the award of degree of

BACHELOR OF ENGINEERING
in
INFORMATION SCIENCE AND ENGINEERING

Submitted by
DARSHAN SHET 4CB21IS009
NITHISH 4CB21IS030
PRANAV SAVANT 4CB21IS035
SHANMUKHA MADDODI 4CB21IS045

Under the Guidance of


Prof. Pradeep M
Assistant Professor, Department of Information Science And Engineering

DEPT. OF INFORMATION SCIENCE AND ENGINEERING


CANARA ENGINEERING COLLEGE
(Affiliated to VTU Belagavi, Recognized by AICTE, Accredited by NBA)
Sudhindra Nagara, Benjanapadavu, Mangaluru - 574219,
Karnataka.
2024-25
CANARA ENGINEERING COLLEGE
(Affiliated to VTU Belagavi, Recognized by AICTE, Accredited by NBA)
Sudhindra Nagara, Benjanapadavu, Mangaluru - 574219,
Karnataka

DEPARTMENT OF INFORMATION SCIENCE AND ENGINEERING

CERTIFICATE
Certified that the project work entitled “SIGNIFYING IMMEDIATE IMAGE
GENERATION FROM TEXT” carried out by

Mr. Darshan Shet 4CB21IS009


Mr. Nithish 4CB21IS030
Mr. Pranav Savant 4CB21IS035
Mr. Shanmukha Maddodi 4CB21IS045

the bonafide students of VII semester INFORMATION SCIENCE AND ENGINEERING, in partial fulfillment for the award of Bachelor of Engineering in INFORMATION SCIENCE AND ENGINEERING of the Visvesvaraya Technological University, Belagavi, during the year 2024-2025. It is certified that all corrections and suggestions indicated during Internal Assessment have been incorporated. The project report has been approved as it satisfies the academic requirements in respect of project work prescribed for the said degree.

Prof. Pradeep M Dr. H Manoj T Gadiyar Dr. Nagesh H R


Project Guide HOD-ISE Principal

External Viva:

Examiner’s Name Signature with Date

1. . . . . . . . . . . . . . . . . . . . . . .....................
2. . . . . . . . . . . . . . . . . . . . . . .....................
CANARA ENGINEERING COLLEGE
(Affiliated to VTU Belagavi, Recognized by AICTE, Accredited by NBA)
Sudhindra Nagara, Benjanapadavu, Mangaluru - 574219,
Karnataka

DEPARTMENT OF INFORMATION SCIENCE AND ENGINEERING

DECLARATION

We hereby declare that the entire work embodied in this Project Report titled “SIGNIFYING IMMEDIATE IMAGE GENERATION FROM TEXT” has been carried out by us at CANARA ENGINEERING COLLEGE, Mangaluru under the supervision of Prof. Pradeep M, for the award of Bachelor of Engineering in Information Science And Engineering. This report has not been submitted to this or any other University for the award of any other degree.

Darshan Shet 4CB21IS009


Nithish 4CB21IS030
Pranav Savant 4CB21IS035
Shanmukha Maddodi 4CB21IS045
Acknowledgement

We dedicate this page to acknowledge and thank those responsible for the
shaping of the project. Without their guidance and help, the experience
while constructing the dissertation would not have been so smooth and
efficient.

We sincerely thank our Project guide Prof. Pradeep M, Assistant Pro-


fessor, Department of Information Science And Engineering for his guid-
ance and valuable suggestions which helped us to complete this project.
We also thank our Project Coordinator, Dr. Ganesh Pai, Department of
Information Science And Engineering, for his consistent encouragement.

We owe a profound gratitude to Dr. H Manoj T Gadiyar, Head of


the Department of Information Science And Engineering, whose kind sup-
port and guidance helped us to complete this work successfully. We also
take this opportunity to thank our Dean Academics/Vice-Principal, Dr.
Demian Antony D’Mello, and our Principal, Dr. Nagesh H R, for their
support and encouragement.

We would like to thank all faculty and staff of the Department of Infor-
mation Science And Engineering who have always been with us extending
their support, precious suggestions, guidance, and encouragement through
the project. We also express our gratitude to our beloved friends and par-
ents for their constant encouragement and support.

Darshan Shet
Nithish
Pranav Savant
Shanmukha Maddodi

Abstract

Interior design is a creative process that transforms spaces to suit aesthetic


and functional needs. This project develops a web-based application using
Generative Adversarial Networks (GANs) to generate 2D interior design
visuals from textual descriptions. The application integrates React for an
interactive user interface, FastAPI for backend functionality, and Mon-
goDB for managing user data and design histories. Leveraging Stable
Diffusion from Hugging Face, the system interprets user prompts like “a
modern minimalist living room with a neutral color palette” and generates
corresponding visuals.
Key processes include gathering user inputs, using GANs for text-to-
image generation, and storing design metadata. Test scenarios validated
the accuracy of text interpretation and visual generation. The project
demonstrates the feasibility of AI-driven design visualization and its ap-
plication in interior design consultations. Future work could explore inte-
grating 3D visualization and compatibility with professional design tools.
Keywords: Interior Design, Generative Adversarial Networks, Text-to-
Image Generation, React, FastAPI, MongoDB, Streamlit.

Table of Contents

Acknowledgement i

Abstract ii

Table of Contents vii

List of Figures viii

List of Tables ix

1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Motivation and Problem Statement . . . . . . . . . . . . . 1
1.3 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Scope and Limitations . . . . . . . . . . . . . . . . . . . . 2
1.5 Relevance and Type . . . . . . . . . . . . . . . . . . . . . 3
1.6 Organization of the Report . . . . . . . . . . . . . . . . . . 3

2 Literature Survey 4
2.1 Text-to-Image Synthesis With Generative Models: Meth-
ods, Datasets, Performance Metrics, Challenges, and Future
Direction . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.1 Brief Findings . . . . . . . . . . . . . . . . . . . . . 4
2.1.2 Design/Methodology/Techniques Adopted . . . . . 4
2.1.3 Results Achieved . . . . . . . . . . . . . . . . . . . 5
2.2 Recent Advances in Text-to-Image Synthesis: Approaches,
Datasets, and Future Research Prospects . . . . . . . . . . 5
2.2.1 Brief Findings . . . . . . . . . . . . . . . . . . . . . 5

2.2.2 Design/Methodology/Techniques . . . . . . . . . . 6
2.2.3 Results Achieved . . . . . . . . . . . . . . . . . . . 6
2.3 GACnet-Text-to-Image Synthesis With Generative Models
Using Attention Mechanisms With Contrastive Learning . 6
2.3.1 Brief Findings . . . . . . . . . . . . . . . . . . . . . 6
2.3.2 Design/Methodology/Techniques Adopted . . . . . . . 7
2.3.3 Results Achieved . . . . . . . . . . . . . . . . . . . 7
2.4 Exploring Progress in Text-to-Image Synthesis: An In-Depth
Survey on the Evolution of Generative Adversarial Networks 7
2.4.1 Brief Findings . . . . . . . . . . . . . . . . . . . . 8
2.4.2 Design/Methodology/Techniques Adopted . . . . . 8
2.4.3 Results Achieved . . . . . . . . . . . . . . . . . . . 8
2.5 BigGAN-based Bayesian reconstruction of natural images
from human brain activity . . . . . . . . . . . . . . . . . . 9
2.5.1 Brief Findings . . . . . . . . . . . . . . . . . . . . . 9
2.5.2 Design/Methodology/Techniques Adopted . . . . . 9
2.5.3 Results Achieved . . . . . . . . . . . . . . . . . . . 9
2.6 Use mean field theory to train a 200-layer vanilla GAN . . 10
2.6.1 Brief Findings . . . . . . . . . . . . . . . . . . . . 10
2.6.2 Design/Methodology/Techniques Adopted . . . . . 10
2.6.3 Results Achieved . . . . . . . . . . . . . . . . . . . 10
2.7 High-Resolution Image Synthesis with Latent Diffusion Mod-
els . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.7.1 Brief Findings . . . . . . . . . . . . . . . . . . . . . 11
2.7.2 Design/Methodology/Techniques Adopted . . . . . 11
2.7.3 Results Achieved . . . . . . . . . . . . . . . . . . . 11
2.8 Antenna Design Using a GAN-Based Synthetic Data Gen-
eration Approach . . . . . . . . . . . . . . . . . . . . . . . 12
2.8.1 Brief Findings . . . . . . . . . . . . . . . . . . . . 12
2.8.2 Design/Methodology/Techniques Adopted . . . . . 12
2.8.3 Results Achieved . . . . . . . . . . . . . . . . . . . 12
2.9 Text-to-Image Generator using GANs . . . . . . . . . . . 13

2.9.1 Brief Findings . . . . . . . . . . . . . . . . . . . . . 13
2.9.2 Design/Methodology/Techniques Adopted . . . . . 13
2.9.3 Results Achieved . . . . . . . . . . . . . . . . . . . 13
2.10 A 28.6 mJ/iter Stable Diffusion Processor for Text-to-Image
Generation with Patch Similarity-based Sparsity Augmen-
tation and Text-based Mixed-Precision . . . . . . . . . . . 14
2.10.1 Brief Findings . . . . . . . . . . . . . . . . . . . . . 14
2.10.2 Design/Methodology/Techniques Adopted . . . . . 14
2.10.3 Results Achieved . . . . . . . . . . . . . . . . . . . 14
2.11 Comparison Table . . . . . . . . . . . . . . . . . . . . . . . 15
2.12 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3 Software Requirements Specification 17


3.1 Functional requirements . . . . . . . . . . . . . . . . . . . 17
3.2 Non-Functional requirements . . . . . . . . . . . . . . . . . 18
3.2.1 Safety Requirements . . . . . . . . . . . . . . . . . 18
3.2.2 Performance Requirements . . . . . . . . . . . . . . 18
3.3 User interface design . . . . . . . . . . . . . . . . . . . . . 19
3.4 Hardware and Software requirements . . . . . . . . . . . . 19
3.5 Performance Requirements . . . . . . . . . . . . . . . . . . 19
3.6 Any Other Requirements . . . . . . . . . . . . . . . . . . 20
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

4 System Design 21
4.1 Abstract Design . . . . . . . . . . . . . . . . . . . . . . . . 21
4.1.1 Architectural diagram . . . . . . . . . . . . . . . . 21
4.2 Proposed system . . . . . . . . . . . . . . . . . . . . . . . 23
4.3 Functional Design . . . . . . . . . . . . . . . . . . . . . . . 24
4.3.1 Modular design diagram . . . . . . . . . . . . . . . 24
4.3.2 Sequence diagram . . . . . . . . . . . . . . . . . . 25
4.3.3 Use case diagram . . . . . . . . . . . . . . . . . . . 26
4.4 Control Flow Design . . . . . . . . . . . . . . . . . . . . . 27
4.4.1 Activity diagram for use cases . . . . . . . . . . . 27
4.5 Data Flow Diagram . . . . . . . . . . . . . . . . . . . . . . 28

4.5.1 Zero-Level Data Flow Diagram . . . . . . . . . . . 28
4.5.2 First-Level Data Flow Diagram . . . . . . . . . . . 29
4.5.3 Second-Level Data Flow Diagram . . . . . . . . . . 30
4.5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . 31

5 Implementation 32
5.1 Software Used with Justification . . . . . . . . . . . . . . . 32
5.1.1 Frontend Development . . . . . . . . . . . . . . . . 32
5.1.2 Backend Development . . . . . . . . . . . . . . . . 33
5.1.3 Framework Used . . . . . . . . . . . . . . . . . . . 33
5.1.4 Coding Languages Used for Development . . . . . . 34
5.1.5 Operating System . . . . . . . . . . . . . . . . . . . 34
5.2 Hardware Used with Justification . . . . . . . . . . . . . . 35
5.3 Algorithm/Procedures Used in the Project in Different Mod-
ules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

6 Results and Discussion 37


6.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.2 TESTING . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
6.3 TYPES OF SOFTWARE TESTING . . . . . . . . . . . . 41
6.4 TESTING METHODOLOGY . . . . . . . . . . . . . . . . 41
6.4.1 Unit Testing . . . . . . . . . . . . . . . . . . . . . 41
6.4.2 Integration Testing . . . . . . . . . . . . . . . . . . 42
6.4.3 System Testing . . . . . . . . . . . . . . . . . . . . 42
6.5 TESTING CRITERIA . . . . . . . . . . . . . . . . . . . . 43
6.5.1 Testing for Text Input . . . . . . . . . . . . . . . . 43
6.5.2 Testing for Generated Images . . . . . . . . . . . . 43
6.5.3 Testing for User Interface . . . . . . . . . . . . . . 44
6.5.4 Testing for Download Functionality . . . . . . . . . 44
6.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

7 Conclusions and Future work 46

References 47

A Turnitin Plagiarism Report 48

B Expo Details 49

List of Figures

4.1 Example Architecture diagram . . . . . . . . . . . . . . . . 22


4.2 Sample proposed System . . . . . . . . . . . . . . . . . . . 23
4.3 Sample Modular design diagram . . . . . . . . . . . . . . . 24
4.4 Sample Sequence diagram . . . . . . . . . . . . . . . . . . 25
4.5 Example Use case diagram . . . . . . . . . . . . . . . . . . 26
4.6 Sample Activity diagram . . . . . . . . . . . . . . . . . . . 27
4.7 Sample Zero-Level Data Flow Diagram . . . . . . . . . . . 28
4.8 Sample First-Level Data Flow Diagram . . . . . . . . . . 29
4.9 Sample Second-Level Data Flow Diagram . . . . . . . . . 30

6.1 Home page . . . . . . . . . . . . . . . . . . . . . . . . . . 37


6.2 Sign Up Page . . . . . . . . . . . . . . . . . . . . . . . . . 38
6.3 Sign In Page . . . . . . . . . . . . . . . . . . . . . . . . . 38
6.4 Give Prompt to generate . . . . . . . . . . . . . . . . . . 39
6.5 Generate Images . . . . . . . . . . . . . . . . . . . . . . . 40

B.1 Innovation Showcase . . . . . . . . . . . . . . . . . . . . . 49

List of Tables

2.1 Comparison of Existing Work and Gap Identification . . . 15

6.1 Sample Testing Criteria for Text Input . . . . . . . . . . . 43


6.2 Sample Testing Criteria for Generated Images . . . . . .
6.3 Sample Testing Criteria for User Interface . . . . . . . . . 44
6.4 Sample Testing Criteria for Download Functionality . . . . 44

Chapter 1

Introduction

1.1 Background

Interior design has traditionally been a creative and manual process re-
quiring expertise to conceptualize and create living spaces. With advance-
ments in artificial intelligence (AI), particularly Generative Adversarial
Networks (GANs), the interior design process has seen a transformation.
AI now enables automated visualizations of interior spaces based on text
descriptions, significantly reducing time and effort while enhancing cre-
ativity. This project focuses on developing a Text-To-Image Generator
that uses AI to convert textual descriptions into 2D interior design im-
ages. Users can describe the desired room style, color palette, and furni-
ture preferences, and the system generates a visual representation. This
approach aims to provide an easy-to-use solution for those who lack design
expertise, allowing them to visualize their ideal living spaces.

1.2 Motivation and Problem Statement

The interior design process can often be overwhelming, particularly for


individuals who lack a design background. Communicating a design idea
to professionals or even visualizing abstract concepts can be a difficult task.
This project seeks to simplify the process by providing a tool that allows
users to generate visual representations of interior designs based solely
on textual descriptions. Additionally, providing an estimate of furniture
and material costs can assist users in budgeting for their projects. The


key problem this project addresses is the lack of accessible and affordable
tools for visualizing interior designs quickly and efficiently, particularly for
those with no design experience.

1.3 Objectives

The objectives of the project are as follows:

• Develop a web-based application to generate 2D interior design visuals based on text input from users.

• Implement an AI-driven model using GANs and Stable Diffusion for image generation.

• Provide a user-friendly interface with options to select design style, room type, and color palette.

• Use MongoDB to store and retrieve user data and design history.

• Ensure the application is scalable and responsive, handling multiple user requests efficiently.

1.4 Scope and Limitations

This project aims to generate 2D interior designs for a limited set of room
types (living room, bedroom, kitchen, and office) and a select group of
design styles (modern, minimalist, industrial, etc.). The scope is confined
to generating visual concepts and providing basic cost estimates based
on pre-set material and furniture prices. While this approach is designed
to assist users in initial design visualization, the system does not aim to
offer detailed architectural planning or 3D modeling. The limitations of
the project include the potential lack of accuracy in the cost estimation,
which is based on general pricing datasets that may not reflect individual
market variations. Additionally, while the AI model can generate realistic
designs, its ability to understand complex or highly specific descriptions
may be limited.


1.5 Relevance and Type

This project is particularly relevant as the application of AI in interior


design is an emerging field. The tool provides an accessible, affordable
solution for both professionals and non-professionals, simplifying the de-
sign process. The project falls under applied research, aiming to solve
practical problems through technology. It is also innovative, as it
integrates AI-driven design visualization and cost estimation into a user-
friendly web application, making it a valuable tool for users looking to
quickly conceptualize and plan interior designs.

1.6 Organization of the Report

The report is organized into structured sections that guide the reader through the various components of the project. Chapter 2, the Literature Survey, examines existing work on text-to-image synthesis and generative models, providing context and identifying gaps in current research. Chapter 3, the Software Requirements Specification, outlines the functional and non-functional requirements, the user interface design, and the hardware and software needed. Chapter 4, System Design, presents the architecture, modular design, and data flow of the proposed system, while Chapter 5, Implementation, details the technologies and procedures used to build and integrate the application. Chapter 6, Results and Discussion, presents the system's outputs along with the testing carried out to validate text interpretation, image generation, the user interface, and download functionality. Finally, Chapter 7, Conclusions and Future Work, summarizes the project's findings and identifies areas for improvement, such as 3D visualization and compatibility with professional design tools. This structure is designed to ensure a comprehensive understanding of the project's objectives, processes, and potential impact.



Chapter 2

Literature Survey

2.1 Text-to-Image Synthesis With Generative Models: Methods, Datasets, Performance Metrics, Challenges, and Future Direction

Author: Sarah K. Alhabeeb, Amal A. Al-Shargabi [1]

2.1.1 Brief Findings

The study by Sarah K. Alhabeeb and Amal A. Al-Shargabi (2024) fo-


cuses on advancements in text-to-image synthesis using generative mod-
els. It reviews various methods, including Generative Adversarial Net-
works (GANs), Variational Autoencoders (VAEs), and diffusion models,
highlighting their capabilities in generating realistic images from textual
descriptions. The paper also explores datasets like MS-COCO and CUB,
discussing their roles in benchmarking model performance. It emphasizes
the challenges of bridging the semantic gap between textual and visual
representations, requiring innovations in data annotation and contextual
understanding.

2.1.2 Design/Methodology/Techniques Adopted

The research employs a comprehensive literature review methodology, sys-


tematically analyzing state-of-the-art generative techniques and their evo-
lution. It evaluates performance metrics such as Fréchet Inception Dis-
tance (FID) and Inception Score (IS) to assess model output quality. The


authors also identify shortcomings in existing datasets and propose met-


rics for improved evaluation, emphasizing a need for diverse and unbiased
datasets to enhance generalization.
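
For reference, the two metrics named above are defined as follows. FID measures the distance between Gaussian fits to Inception-v3 features of real and generated images (lower is better), while IS rewards images whose class predictions are confident yet diverse (higher is better):

    FID = ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}),    IS = exp(E_x[ KL(p(y|x) ‖ p(y)) ])

where (μ_r, Σ_r) and (μ_g, Σ_g) are the feature mean and covariance of the real and generated distributions, p(y|x) is the classifier's label distribution for image x, and p(y) is its marginal over generated images.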

2.1.3 Results Achieved

The study concludes that while significant progress has been made, current
models often struggle with complex or abstract textual inputs. It iden-
tifies promising directions, such as integrating multimodal learning and
advanced semantic embeddings, to improve synthesis accuracy. The find-
ings suggest that achieving human-level realism and diversity in generated
images requires addressing limitations in training data and computational
efficiency.

2.2 Recent Advances in Text-to-Image Synthesis: Approaches, Datasets, and Future Research Prospects

Author: Yong Xuan Tan, Chin Poo Lee, Mai Neo, Kian Ming Lim, Jit Yan Lim, Ali Alqahtani [2]

2.2.1 Brief Findings

The paper “Recent Advances in Text-to-Image Synthesis: Approaches,


Datasets, and Future Research Prospects” by Yong Xuan Tan et al. (2023)
explores the rapid evolution of text-to-image synthesis models. It high-
lights advancements in generative frameworks like GANs, VAEs, and diffu-
sion models, emphasizing their ability to translate text prompts into high-
quality, semantically accurate images. The study discusses key datasets
and benchmarks that have driven innovation while addressing model lim-
itations, including challenges with consistency and detail preservation in
generated images.


2.2.2 Design/Methodology/Techniques

The authors systematically reviewed major approaches in text-to-image


synthesis, focusing on architectural innovations. They explored integra-
tion techniques, such as combining pre-trained language models with vi-
sual representation models. Methods like char-CNN-RNN and transform-
ers for text embeddings, along with multimodal alignment strategies, were
critically analyzed. The study also evaluated newer strategies, including
conditional augmentation and hierarchical generation, to address continu-
ity issues.

2.2.3 Results Achieved

The paper concludes that while the field has made significant progress in
realism and diversity of generated images, there is room for improvement
in generating fine-grained details and achieving better alignment between
text descriptions and visual outputs. The authors also propose future
directions, such as leveraging hybrid architectures and improving dataset
diversity, to address these challenges and enable broader applications of
text-to-image synthesis.

2.3 GACnet-Text-to-Image Synthesis With Generative Models Using Attention Mechanisms With Contrastive Learning

Author: Md. Ahsan Habib, Md. Anwar Hussen Wadud [3]

2.3.1 Brief Findings

The GACnet: Text-to-Image Synthesis With Generative Models Using At-


tention Mechanisms and Contrastive Learning project revealed significant
advancements in the text-to-image synthesis domain. By integrating at-
tention mechanisms with contrastive learning in a generative adversarial
network (GAN) framework, the model achieved superior text-image align-
ment and enhanced visual quality. It demonstrated that combining these


techniques improved the model’s ability to understand intricate textual nu-


ances and translate them into realistic, high-fidelity images. These findings
underscore the importance of synergizing attention and contrastive mech-
anisms for tackling challenges in generative modeling.

2.3.2 Design/Methodology/Techniques Adopted

The methodology combines a GAN framework with attention mechanisms and contrastive learning. Attention modules steer the generator toward the words most relevant to each image region, while a contrastive objective draws matching text-image pairs together in the embedding space and pushes mismatched pairs apart. Training the two mechanisms jointly strengthens text-image alignment and improves the visual quality of the generated outputs.

2.3.3 Results Achieved

The GACnet project achieved significant results in the field of text-to-


image synthesis by leveraging attention mechanisms and contrastive learn-
ing. The model demonstrated improved capabilities in generating diverse
and high-quality images aligned with textual descriptions. Key perfor-
mance metrics included an Inception Score (IS) of 35.23, a Fréchet Incep-
tion Distance (FID) of 18.2, and an R-Precision of 89.14, indicating a high
level of visual fidelity, diversity, and textual alignment.

2.4 Exploring Progress in Text-to-Image Synthesis: An In-Depth Survey on the Evolution of Generative Adversarial Networks

Author: Md. Ahsan Habib, Md. Anwar Hussen Wadud, Md. Fazlul Karim Patwary, Mohammad Motiur Rahman, M. F. Mridha [4]


2.4.1 Brief Findings

The paper explores advancements in Generative Adversarial Networks


(GANs) for text-to-image synthesis. It highlights the evolution from basic
GAN models to sophisticated architectures incorporating attention mech-
anisms, multi-modal learning, and contrastive techniques. The authors
discuss how these advancements have improved the synthesis quality, di-
versity, and semantic alignment of generated images with text inputs, em-
phasizing real-world applications like creative content and AI design.

2.4.2 Design/Methodology/Techniques Adopted

The study systematically reviews various GAN-based approaches, ana-


lyzing architectural designs, attention-driven mechanisms, and training
strategies. It also evaluates the effectiveness of different datasets and
benchmarks, focusing on integrating contrastive learning to enhance image-
text coherence and ensure better feature representation in generative mod-
els.

2.4.3 Results Achieved

The survey reveals significant progress in achieving photo-realistic and


contextually accurate image generation. The integration of attention and
contrastive learning has led to better interpretability and higher-quality
outcomes. The authors propose potential future directions, such as en-
hancing scalability, dataset diversity, and ethical considerations in text-
to-image synthesis.


2.5 BigGAN-based Bayesian reconstruction of natural images from human brain activity

Author: Kai Qiao, Jian Chen, Linyuan Wang, Chi Zhang, Li Tong [5]

2.5.1 Brief Findings

The research focuses on decoding brain activity into natural images by


leveraging the strengths of GANs, specifically BigGAN. The study ad-
dresses two challenges: the limited sample size of fMRI data, which often leads to suboptimal GAN training, and the need to balance fidelity (accurate detail replication) with naturalness (plausibility of generated images).

2.5.2 Design/Methodology/Techniques Adopted

The proposed GAN-based Bayesian Visual Reconstruction Model (GAN-


BVRM) integrates a classifier to decode semantic categories from fMRI
data with a pre-trained conditional BigGAN generator, which produces
images corresponding to those categories. This process is refined through
encoding models that evaluate the generated images against the original
brain activity to ensure alignment. Operating within a Bayesian frame-
work, the system iteratively generates and selects images that best match
the brain data, effectively balancing the naturalness of the generated im-
ages with their fidelity to the observed neural signals.

2.5.3 Results Achieved

Experimental validation demonstrated that GAN-BVRM significantly im-


proves over traditional GAN-based methods. The reconstructed images
exhibited higher fidelity and were more semantically aligned with the fMRI
stimuli. This achievement marks a step forward in bridging computational
neuroscience with computer vision technologies.


2.6 Use mean field theory to train a 200-layer vanilla GAN

Author: Dan Li, Shaung Liu, Wellai Xiang, Fengqi Liu, J Doe [6]

2.6.1 Brief Findings

The research highlights that incorporating mean field theory (MFT) into the GAN framework
allows for a more robust training process, enabling the model to learn
higher-dimensional data representations. The study revealed that the pro-
posed approach reduced gradient explosion and vanishing problems com-
mon in deep models, resulting in improved convergence and better gener-
ative performance compared to standard training approaches.

2.6.2 Design/Methodology/Techniques Adopted

The authors applied mean field theory, a concept originating from statisti-
cal physics, to stabilize the training of vanilla GANs, which often struggle
with deep architectures due to vanishing gradients and mode collapse.
By leveraging MFT, they approximated the interactions among neural
network units, effectively mitigating instability in backpropagation. The
methodology included mathematical analysis to validate MFT’s suitability
and extensive implementation experiments with multiple datasets, demon-
strating how deeper GANs can be trained without significant loss of sta-
bility.

2.6.3 Results Achieved

The experimental results showed significant improvements in training sta-


bility and output quality. The 200-layer vanilla GAN, trained using the
MFT approach, achieved high fidelity in generating complex data dis-
tributions. Quantitative metrics such as the Fréchet Inception Distance
(FID) demonstrated superior performance over traditional GAN training
methodologies.


2.7 High-Resolution Image Synthesis with Latent Diffusion Models

Author: Robin Rombach; Andreas Blattmann; Dominik Lorenz; Patrick Esser; Björn Ommer [7]

2.7.1 Brief Findings

The paper “High-Resolution Image Synthesis with Latent Diffusion Mod-


els” by Robin Rombach et al. introduces a novel approach to high-resolution
image synthesis by integrating diffusion models into the latent space of
pretrained autoencoders. This method significantly reduces the compu-
tational overhead typically associated with diffusion models, which tra-
ditionally operate in pixel space, requiring extensive GPU resources. By leveraging the latent space of a pretrained autoencoder, the approach preserves visual detail while greatly reducing the computational cost of training and inference.

2.7.2 Design/Methodology/Techniques Adopted

The design employs cross-attention layers within the model architecture,


enabling flexible conditioning inputs such as text or bounding boxes. This
allows for advanced tasks like image inpainting, super-resolution, and class-
conditional image synthesis. The latent diffusion models (LDMs) proposed
in this paper outperform prior models in terms of image quality while
maintaining computational efficiency. This method also supports scalable
high-resolution synthesis, setting new standards in various benchmarks.

2.7.3 Results Achieved

Results from the study highlight the versatility of LDMs in multiple do-
mains, including text-to-image synthesis, semantic scene generation, and
image inpainting. These models achieve state-of-the-art performance while
using fewer computational resources compared to pixel-based diffusion
methods. The work emphasizes not only the effectiveness of latent dif-
fusion but also its practical implications for resource-constrained training
environments.
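
To make the approach concrete, the sketch below shows a typical way to invoke a pretrained latent diffusion model through the Hugging Face diffusers library; the checkpoint name and sampling parameters are illustrative assumptions, not details taken from the paper.

    # Minimal latent-diffusion inference sketch (Hugging Face diffusers).
    # Checkpoint name and sampling parameters are illustrative assumptions.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")  # denoising runs in the autoencoder's latent space

    prompt = "a modern minimalist living room with a neutral color palette"
    result = pipe(prompt, num_inference_steps=30, guidance_scale=7.5)
    result.images[0].save("living_room.png")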


2.8 Antenna Design Using a GAN-Based Synthetic Data Generation Approach

Author: Oameed Noakoasteen, Jayakrishnan Vijayamohanan, Arjun Gupta and Christos Christodoulou [8]

2.8.1 Brief Findings

The research employs Generative Adversarial Networks (GANs) to simplify


and accelerate antenna design processes, specifically focusing on the Log-
Periodic Folded Dipole Array (LPFDA). The study addresses the challenge
of generating antenna designs for distinct Q-factor ranges by transforming
the task into a problem of producing parameterized samples for predefined
classes.

2.8.2 Design/Methodology/Techniques Adopted

The methodology recasts LPFDA antenna design as a class-conditional generation problem: a GAN is trained to produce parameterized design samples for predefined Q-factor classes, so that candidate designs for a target Q-factor range can be sampled directly from the generator. This innovative approach bridges the gap between theoretical design and practical implementation, streamlining design iteration cycles.

2.8.3 Results Achieved

The system demonstrated the ability to rapidly synthesize antenna de-


signs that meet specific Q-factor requirements, significantly reducing the
computational cost and time involved in traditional methods. The gen-
erated designs were validated for their accuracy and practical feasibility,
highlighting the potential of GAN-based models in advancing electromag-
netic design technologies. The results underscore GANs’ effectiveness in
optimizing antenna performance and adaptability to varying design con-
straints.


2.9 Text-to-Image Generator using GANs

Author: Vadik Amar; Sonu; Hatesh Shyan [9]

2.9.1 Brief Findings

The paper “Text-to-Image Generator using GANs” explores the capability


of Generative Adversarial Networks (GANs) to generate realistic images
from textual descriptions. It highlights the challenges in maintaining se-
mantic alignment between text and image, achieving high-resolution out-
puts, and ensuring diversity in generated images. The study emphasizes
the growing importance of attention mechanisms and large datasets in im-
proving text-to-image synthesis performance.

2.9.2 Design/Methodology/Techniques Adopted

The authors designed a GAN architecture tailored for text-to-image syn-


thesis, incorporating an advanced generator and discriminator. The gener-
ator translates textual embeddings into visual features, while the discrim-
inator evaluates the quality and relevance of the generated images. The
study also integrated pre-trained language models for text processing and
enhanced training.

2.9.3 Results Achieved

The model achieved significant improvements in generating high-quality


and semantically accurate images compared to baseline approaches. It
was able to create diverse outputs while maintaining fidelity to the input
descriptions. Benchmarking on public datasets demonstrated a marked
increase in both the quality and coherence of synthesized images, with
enhanced metrics like Inception Score (IS) and Fréchet Inception Distance
(FID).


2.10 A 28.6 mJ/iter Stable Diffusion Processor for


Text-to-Image Generation with Patch Similarity-
based Sparsity Augmentation and Text-based
Mixed-Precision

Author: Jiwon Choi; Wooyoung Jo; Seongyon Hong; Beomseok Kwon; Wonhoon Park; Hoi-Jun Yoo [10]

2.10.1 Brief Findings

The study presents a novel energy-efficient Stable Diffusion Processor for


text-to-image generation, employing Patch Similarity-based Sparsity Aug-
mentation (PSSA) and text-based mixed-precision mechanisms. The pro-
cessor optimizes computational demands by identifying and focusing on
text-relevant pixels, reducing unnecessary calculations.

2.10.2 Design/Methodology/Techniques Adopted

The design integrates PSSA to minimize redundant energy use in sparsity


augmentation and utilizes Text-based Important Pixel Spotting (TIPS)
for pixel-level precision adjustments. By leveraging mixed-precision pro-
cessing in key network layers, it enhances computational efficiency.

2.10.3 Results Achieved

The processor demonstrated a 37.8 percent reduction in energy consump-


tion compared to conventional approaches while maintaining high image
fidelity. It achieved over 44.8 percent lower precision computations for
non-critical pixels, significantly improving processing speed and reducing
power requirements. Thus, overall image clarity is preserved with far lower energy consumption.


2.11 Comparison Table

Table 2.1: Comparison of Existing Work and Gap Identification


Text-to-Image Synthesis With Generative Models (Sarah K. Alhabeeb, Amal A. Al-Shargabi, 2023)
Problem Addressed: Challenges in balancing quality, diversity, and accuracy in text-to-image synthesis.
Implementation and Results: Introduced a taxonomy of methods; emphasized GANs and VAEs. Results showed incremental improvements on benchmarks.
Limitations/Future Scope: Highlighted the need for realistic contextual understanding and more scalable datasets.

Recent Advances in Text-to-Image Synthesis (Yong Xuan Tan et al., 2024)
Problem Addressed: Reviewed emerging techniques for image synthesis and dataset limitations.
Implementation and Results: Synthesized datasets and benchmarks; introduced novel hybrid approaches improving fidelity.
Limitations/Future Scope: Suggested a focus on ethical concerns and bias reduction in future models.

GACnet - Text-to-Image Synthesis With Attention Mechanisms (Md. Ahsan Habib et al., 2024)
Problem Addressed: Addressed the lack of advanced attention mechanisms in GANs for text-to-image synthesis.
Implementation and Results: Developed a GAN model with contrastive learning; improved generation fidelity and semantic alignment.
Limitations/Future Scope: Highlighted the importance of extending the model to more complex datasets and reducing computational costs.

Exploring Progress in Text-to-Image Synthesis: An In-Depth Survey on Generative Adversarial Networks (Md. Ahsan Habib et al., 2023)
Problem Addressed: Challenges in scalability and consistency in GAN-based text-to-image generation.
Implementation and Results: Surveyed state-of-the-art GANs; identified issues in model stability and mode collapse handling.
Limitations/Future Scope: Suggested future work on hybrid GAN models and resolving mode collapse issues.

BigGAN-Based Bayesian Reconstruction of Natural Images From Human Brain Activity (Kai Qiao et al., 2023)
Problem Addressed: Bridging the gap between human cognition and image synthesis.
Implementation and Results: Utilized BigGAN models for Bayesian image reconstruction from brain activity patterns; achieved high-fidelity results.
Limitations/Future Scope: Proposed extending the framework to more brain activity datasets and integrating real-time data processing.

High-Resolution Image Synthesis With Latent Diffusion Models (Robin Rombach et al., 2023)
Problem Addressed: Enhancing the resolution and quality of generated images in latent diffusion models.
Implementation and Results: Implemented a latent diffusion model achieving state-of-the-art results with minimal artifacts.
Limitations/Future Scope: Emphasized scalability to large-scale datasets and reducing training times.

Antenna Design Using a GAN-Based Synthetic Data Generation Approach (Oameed Noakoasteen et al., 2023)
Problem Addressed: Limitations in antenna data generation for optimal designs.
Implementation and Results: Created synthetic datasets; GANs improved antenna design accuracy.
Limitations/Future Scope: Recommended exploring other RF applications and generalizing to high-frequency bands.

Text-to-Image Generator Using GANs (Vadik Amar et al., 2022)
Problem Addressed: Challenges in maintaining context consistency between text inputs and generated images.
Implementation and Results: Designed a GAN model integrating conditional generation strategies.
Limitations/Future Scope: Suggested integrating multi-modal GANs and handling more complex scenarios.

A Stable Diffusion Processor for Text-to-Image Generation (Jiwon Choi et al., 2024)
Problem Addressed: Energy consumption during high-resolution text-to-image synthesis.
Implementation and Results: Developed a processor leveraging sparsity augmentation, reducing energy significantly.
Limitations/Future Scope: Recommended testing the processor on edge devices and optimizing for real-world applications.

Use Mean Field Theory to Train Vanilla GAN (Dan Li et al., 2023)
Problem Addressed: Difficulties in training deep GAN architectures effectively.
Implementation and Results: Utilized mean field theory to train a 200-layer GAN model, improving model stability.
Limitations/Future Scope: Proposed future work on adapting the model to conditional GANs and real-world applications.

2.12 Summary

The project focuses on generating images from textual prompts using ad-
vanced deep learning techniques. The frontend is built with React, offering
an intuitive interface with options for user registration and login. Once au-
thenticated, users are redirected to a Streamlit-based interface where they
can input prompts to generate images. The backend uses a GAN (Gen-
erative Adversarial Network) model to process the prompts, with Flask
acting as the intermediary for communication. User details, such as login
credentials, are securely stored in a MongoDB database. This workflow
ensures a seamless experience for users while maintaining performance,
modularity, and scalability.



Chapter 3

Software Requirements Specification

3.1 Functional requirements

Functional requirements define the core functionalities the Interior Design


Text-to-Image Generation system must support to meet user needs. Below
are the key functional requirements:

• User Input: The system must allow users to provide input via a description of the desired room design, including style, room type, and color palette.

• Image Generation: The system should generate interior design images based on the text input provided by the user.

• Interactive User Interface: Users should be able to interact with the platform to customize their design by selecting from predefined design styles, room types, and color schemes.

• On-Demand Service: The system should provide real-time image generation upon receiving a valid user input.

• Image Output: The application must generate and display the design concept based on user specifications and allow for image download.

• Secure User Access: Users should have a secure login system, ensuring privacy and data protection.


3.2 Non-Functional requirements

• Usability: The platform must have an intuitive and responsive interface, ensuring users can easily navigate and generate interior designs.

• Performance: The system should be able to handle multiple simultaneous users, with low latency in design generation (within 5 seconds per design).

• Scalability: The platform must scale to handle an increasing number of users without performance degradation.

• Security: The system must ensure secure access to user accounts and safeguard sensitive data, such as design preferences and personal details.

3.2.1 Safety Requirements

Safety requirements aim to protect user data and ensure the integrity of the
system. The platform should implement strong authentication protocols
to ensure that only authorized users can access their accounts. User data
(such as preferences and generated designs) should be securely stored and
transmitted using encryption. Additionally, regular security audits and
vulnerability scans should be conducted to identify and address potential
threats.

3.2.2 Performance Requirements

To meet the performance expectations of the users, the platform must en-
sure that design generation happens in a reasonable time frame (within 5
seconds per design) to maintain user satisfaction. The platform should
be capable of supporting at least 500 concurrent users, ensuring that
users don’t experience delays or timeouts. Optimized server configura-
tions, database queries, and efficient backend processes will contribute to
achieving this requirement.
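
As a rough illustration, the 5-second and concurrency targets could be sanity-checked with a script such as the following; the /generate endpoint and payload are assumptions for illustration, not the project's actual API.

    # Hypothetical latency/concurrency check: fires concurrent POSTs at an
    # assumed /generate endpoint and reports per-request latency.
    import time
    from concurrent.futures import ThreadPoolExecutor

    import requests

    URL = "http://localhost:5000/generate"  # assumed endpoint

    def timed_request(_):
        start = time.perf_counter()
        requests.post(URL, json={"prompt": "a modern minimalist living room"},
                      timeout=30)
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=50) as pool:
        latencies = list(pool.map(timed_request, range(200)))

    print(f"mean {sum(latencies)/len(latencies):.2f}s, max {max(latencies):.2f}s")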


3.3 User interface design

The user interface (UI) is crucial for providing an optimal user experience.
The platform will feature a simple, clean, and intuitive design. Users will
interact with the platform by entering design preferences through forms
or sliders, selecting room types, and receiving generated designs. The UI
will be responsive, ensuring that the platform is accessible across various
devices, including desktops, tablets, and smartphones.

3.4 Hardware and Software requirements

Hardware Requirements: The platform will be hosted on a cloud server


or dedicated server with a minimum of 8 GB RAM, 4 vCPUs, and at
least 500 GB of storage capacity to accommodate user data and generated
images.
Software Requirements
• Backend: Python-based backend with Flask for handling HTTP requests and interacting with the machine learning model.

• Frontend: ReactJS for a dynamic, responsive interface.

• Database: MongoDB for storing user preferences, design images, and other data.

• Machine Learning Model: A custom-trained text-to-image model (e.g., a GAN or diffusion model) to generate interior designs.

• Hosting: The system can be hosted on platforms like AWS, Heroku, or any suitable cloud provider.
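
A minimal sketch of how the backend components listed above could be wired together is given below; the route name, the generate_image helper, and the collection layout are assumptions for illustration, not the project's actual code.

    # Minimal Flask + MongoDB wiring sketch; route, helper, and schema are assumed.
    from datetime import datetime, timezone

    from flask import Flask, jsonify, request
    from pymongo import MongoClient

    app = Flask(__name__)
    db = MongoClient("mongodb://localhost:27017")["interior_design"]

    def generate_image(prompt: str) -> str:
        """Placeholder for the text-to-image model call; returns an image path."""
        raise NotImplementedError

    @app.route("/generate", methods=["POST"])
    def generate():
        prompt = request.json.get("prompt", "")
        image_path = generate_image(prompt)
        # Store the design-history entry alongside the user's preferences.
        db.designs.insert_one({"prompt": prompt, "image_path": image_path,
                               "created_at": datetime.now(timezone.utc)})
        return jsonify({"image": image_path})

    if __name__ == "__main__":
        app.run(port=5000)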

3.5 Performance Requirements

The platform must handle high traffic during peak hours without perfor-
mance degradation. Key performance metrics include:
• Load Time: The platform’s homepage and design input page should
load within 3-5 seconds.


• Design Generation: Each design must be generated within 5 seconds based on user input.

• Concurrent Users: The platform should support up to 500 concurrent users, ensuring that simultaneous users do not experience slowdowns or system crashes.

• Uptime: The platform should maintain an uptime of 99%, ensuring minimal downtime during critical periods.

3.6 Any Other Requirements

• Integration with Design Software: The platform should allow users to download generated designs in standard image formats (e.g., PNG, JPEG). Future iterations may consider integrating with design tools like AutoCAD or SketchUp for further refinement of the generated designs.

• Localization: The platform may support multiple languages to cater to a wider user base, especially if targeting regions with different languages.

3.7 Summary

In summary, the non-functional requirements for the Interior Design Text-


to-Image Generation system outline the key characteristics for user expe-
rience, performance, and security. The system must be reliable, secure,
scalable, and efficient, ensuring that users can generate high-quality de-
signs quickly and seamlessly. By meeting these requirements, the platform
will provide a robust and engaging experience for users interested in cre-
ating personalized interior designs.



Chapter 4

System Design

4.1 Abstract Design

4.1.1 Architectural diagram

The architecture of our project, Signifying Immediate Image Generation


from Text, is designed to seamlessly integrate user interaction, backend
processing, and data storage into a cohesive system that ensures a smooth
user experience and efficient performance. At the core of this architecture
is the User Interaction Layer, comprising a React-based frontend for nav-
igation and account management and a Streamlit interface for generating
and displaying designs.
The React frontend serves as the entry point, offering users a clean and
intuitive interface for signing in or signing up. After logging in, users are
directed to the Streamlit interface, where they can input textual descrip-
tions to generate 2D interior design images. The Streamlit interface acts
as a dedicated workspace for interacting with the text-to-image model and
viewing results, bridging the gap between the frontend and backend com-
ponents to ensure user inputs are processed efficiently.


Figure 4.1: Example Architecture diagram

Figure 4.1 illustrates the architecture of our project, highlighting a


highly efficient pipeline that seamlessly integrates user interaction, back-
end processing, and data storage. Users can input a simple text prompt
through an intuitive interface and receive a visually generated 2D image
within seconds.
This architecture prioritizes simplicity and usability, ensuring that even
users unfamiliar with the underlying technology can effortlessly navigate
and understand the workflow. By combining modern web technologies like
React and Streamlit, advanced machine learning models such as GANs,
and robust data management using MongoDB, the system exemplifies a
cohesive and user-friendly design for generating interior designs from tex-
tual descriptions.
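
As a simple illustration of the Streamlit workspace described above, a prompt-to-image page could be sketched as follows; the backend URL and endpoint are assumptions for illustration, not the project's actual API.

    # Minimal Streamlit prompt workspace sketch; backend URL is assumed.
    import requests
    import streamlit as st

    st.title("Interior Design Generator")
    prompt = st.text_input("Describe your room",
                           "a modern minimalist living room")

    if st.button("Generate"):
        resp = requests.post("http://localhost:5000/generate",
                             json={"prompt": prompt})
        st.image(resp.json()["image"], caption=prompt)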


4.2 Proposed system

Figure 4.2: Sample proposed System

The proposed diagram outlines a streamlined workflow that integrates


user interaction, backend processing, and data storage into a cohesive
system. Users interact through a React-based frontend for login/signup,
seamlessly transitioning to a Streamlit interface for submitting prompts.
The backend, powered by Flask, serves as the central hub, processing
prompts with a GAN model to generate images and storing user data


in MongoDB. This architecture emphasizes clarity and efficiency, ensuring smooth communication between components and providing users an intuitive experience. By combining modern web technologies and machine learning, the system offers a robust foundation for generating high-quality images based on user input.

4.3 Functional Design

4.3.1 Modular design diagram

Figure 4.3: Sample Modular design diagram

The functional or modular design of our project breaks down the system
into distinct modules, each responsible for a specific functionality. The user
interface module, built with React and Streamlit, handles user interaction
and input. The backend module, powered by Flask, processes the prompts
and integrates seamlessly with the GAN model for image generation. A
dedicated data management module uses MongoDB to store and retrieve
user data. This modular approach ensures clear separation of concerns and ease of maintenance, making the system efficient and user-friendly.


4.3.2 Sequence diagram

Figure 4.4: Sample Sequence diagram

The sequence diagram illustrates the step-by-step interactions between


the components of the system. It begins with the user accessing the React
frontend for login or signup, after which they are redirected to the Streamlit
interface. Here, the user inputs a prompt, which is sent to the Flask
backend for processing. Flask forwards the prompt to the GAN model,
which generates the corresponding image and returns it to Flask. Flask
stores the image and user data in MongoDB before sending the image back to
the Streamlit interface. Finally, the user views the generated image. This
diagram clearly outlines the sequential flow of data and control, ensuring
a smooth and logical interaction process.


4.3.3 Use case diagram

Figure 4.5: Example Use case diagram

The use case diagram highlights the key interactions between the user
and the system components. The user interacts with the system to perform
various actions, such as signing in or signing up through the React fron-
tend, submitting prompts via the Streamlit interface, and viewing the gen-
erated images. The backend, powered by Flask, processes these prompts
and communicates with the GAN model to generate images while storing
all relevant data in MongoDB. Each use case is linked to specific actions,
ensuring clarity in the user’s journey through the system. It provides a
high-level overview of the system’s functionality.


4.4 Control Flow Design

4.4.1 Activity diagram for use cases

Figure 4.6: Sample Activity diagram

Activity diagram outlines the control flow of the system, showcasing


the logical progression of activities from start to finish. It begins with the
user accessing the landing page, where they can log in or sign up. Upon
successful authentication, the user is redirected to the Streamlit interface
to submit a prompt. The prompt triggers backend processing, where Flask
forwards it to the GAN model. The GAN processes the input and generates
an image, which is then stored in MongoDB. Flask retrieves the image and
sends it back to the Streamlit interface, where the user views the result.
This diagram effectively maps the decision points, actions, and data flow.


4.5 Data Flow Diagram

4.5.1 Zero-Level Data Flow Diagram

Figure 4.7: Sample Zero-Level Data Flow Diagram

The zero-level Data Flow Diagram (DFD) provides a simplified repre-


sentation of the entire system, encapsulating all its major functionalities
into a single process called the “Image Generation System.” The user acts
as the external entity, interacting with the system to log in, sign up, and
submit prompts for image generation. Once the user provides input, the
system processes the prompt through its backend components, including
the GAN model, which generates the desired image. The system also com-
municates with MongoDB to store user data. The resulting image is then
sent back to the user, completing the process. This high-level abstraction
focuses on the data flow and external interactions, offering a clear and
concise view of the system’s functionality.


4.5.2 First-Level Data Flow Diagram

Figure 4.8: Sample First-Level Data Flow Diagram

The first-level Data Flow Diagram (DFD) further decomposes the high-level system into distinct processes, each responsible for specific tasks. The user interacts with the system by first logging in or signing up, where their credentials are processed and authenticated. After successful authentication, the user submits a prompt to the Prompt Submission Process, which sends the prompt to the Processing System. This system interacts with the GAN Model to generate an image based on the user’s input. The user details are stored in MongoDB through the Storage System. The first-level DFD highlights the core processes and their interdependencies in the system.


4.5.3 Second-Level Data Flow Diagram

Figure 4.9: Sample Second-Level Data Flow Diagram

The second-level Data Flow Diagram (DFD) dives deeper into the individual processes, providing a more granular view of the system’s operations. In this diagram, the Login/Signup Process is broken down into two sub-processes: Authentication and Account Creation. Once the user is authenticated, the Prompt Submission Process is split into Prompt Validation and Prompt Forwarding. The Processing System consists of two sub-processes: GAN Model Interaction and Image Generation Confirmation. The Storage System stores User Data.


4.5.4 Summary

This project is a web-based image generation system where users interact through a React frontend. The process begins with users logging in or signing up, which authenticates their credentials and redirects them to a Streamlit interface. Once logged in, users input text prompts that are sent to a Flask backend, which processes these prompts by communicating with a GAN model to generate the requested images. The generated images are then displayed back to the user via Streamlit, providing a seamless experience. MongoDB is used in the backend to store user-related data, including credentials and account information, ensuring secure management of user profiles. The integration of React, Streamlit, Flask, and MongoDB ensures the system is scalable, efficient, and user-friendly, offering a smooth and interactive experience from login to receiving generated images.



Chapter 5

Implementation

5.1 Software Used with Justification

5.1.1 Frontend Development

React: In order to construct a responsive and interactive user interface, React was utilized to generate reusable and dynamic components. Its component-based architecture guaranteed scalability and streamlined the development process. Performance was improved by React’s virtual DOM, which applied UI updates quickly.

Node.js: To build a scalable and efficient server-side environment, Node.js was utilized to handle asynchronous operations and manage multiple requests seamlessly. Its event-driven, non-blocking I/O architecture ensured high performance and responsiveness, even under heavy workloads. Node.js’s extensive library of modules and npm ecosystem streamlined development, making it ideal for creating fast, reliable, and scalable backend systems.

Streamlit: Streamlit was used to develop an interactive and user-friendly web application for data visualization and analysis. Its Python-centric framework enabled rapid prototyping with minimal boilerplate code. Streamlit’s ability to dynamically update UI components in real time based on user inputs streamlined the creation of responsive dashboards. This ensured a seamless experience, making it an ideal choice for presenting data-driven insights.
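As an illustration of this rapid-prototyping style, a prompt page similar to the one described in this project can be sketched in a few lines of Streamlit. The backend URL and the JSON response shape are assumptions for illustration only.

import base64

import requests
import streamlit as st

st.title("Interior Design Generator")
prompt = st.text_input("Describe the room you want to see")

if st.button("Generate Image"):
    if not prompt.strip():
        st.error("Please enter a prompt.")
    else:
        # Assumed backend endpoint; see the backend sections below.
        resp = requests.post("http://localhost:5000/generate", json={"prompt": prompt})
        if resp.ok:
            st.image(base64.b64decode(resp.json()["image"]))
        else:
            st.error("Generation failed. Please try again.")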


5.1.2 Backend Development

FastAPI: To build high-performance APIs with Python, FastAPI was utilized. Known for its speed and efficiency, FastAPI leverages asynchronous programming to handle a large number of requests concurrently, making it ideal for scalable applications. Its automatic generation of interactive API documentation with Swagger and ReDoc greatly simplified development and testing. FastAPI’s strong support for data validation, type hints, and dependency injection streamlined the creation of robust, maintainable APIs with minimal code.
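A minimal sketch of this style is shown below; the PromptRequest model and the gan_generate() helper are illustrative assumptions, but they show how type hints drive FastAPI’s request validation and automatic documentation.

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()


class PromptRequest(BaseModel):
    prompt: str  # validated automatically from the request body


def gan_generate(prompt: str) -> str:
    """Placeholder for GAN inference; would return a base64-encoded image."""
    raise NotImplementedError


@app.post("/generate")
async def generate(req: PromptRequest):
    if not req.prompt.strip():
        raise HTTPException(status_code=400, detail="Please enter a prompt.")
    return {"image": gan_generate(req.prompt)}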

PyTorch: To develop and deploy deep learning models, PyTorch was utilized for its dynamic computation graph, which allows for greater flexibility during model development and debugging. Its intuitive, Pythonic interface makes it easy to experiment with various architectures and algorithms. PyTorch’s extensive support for GPU acceleration and its seamless integration with other Python libraries, such as NumPy and pandas, ensured efficient computation and scalability. Additionally, its robust ecosystem for model training, optimization, and deployment streamlined the development of high-performance machine learning models.

5.1.3 Framework Used

Flask: Flask is a lightweight and flexible Python web framework designed for simplicity and extensibility. Unlike more comprehensive frameworks like Django, Flask provides only the essential tools for web development, allowing developers to add libraries and features as needed. It uses a decorator-based approach to define routes, making it easy to create and manage application endpoints. For rendering HTML, Flask integrates the Jinja2 templating engine, which supports dynamic content generation with a clean syntax. The framework is highly extensible, offering a range of optional extensions such as Flask-SQLAlchemy for database integration, Flask-WTF for form handling, and Flask-RESTful for building APIs. This minimalistic yet powerful approach makes Flask ideal for small to medium-sized applications and projects that require flexibility.
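The decorator-based routing and Jinja2 templating described above can be illustrated with a short, self-contained sketch; the page content and names are invented for illustration only.

from flask import Flask, render_template_string

app = Flask(__name__)


@app.route("/")
def home():
    # render_template_string keeps the sketch self-contained; a real app
    # would load an HTML file from templates/ via render_template().
    return render_template_string("<h1>Welcome, {{ name }}!</h1>", name="designer")


if __name__ == "__main__":
    app.run(debug=True)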


5.1.4 Coding Languages Used for Development

• Python: Python’s extensive standard library provides tools for handling tasks such as file manipulation, networking, data processing, and web development. Additionally, its vast ecosystem of third-party libraries and frameworks, such as NumPy and pandas for data analysis, Django and Flask for web development, and TensorFlow and PyTorch for machine learning, further extends its capabilities. Python is widely used in areas like web development, scientific computing, artificial intelligence, data analysis, automation, and more. Its simplicity makes it an excellent choice for beginners, while its power and flexibility cater to the needs of experienced developers building complex applications.

5.1.5 Operating System

Windows: Windows is a popular operating system developed by Microsoft, designed to provide a user-friendly graphical interface for personal computers, laptops, and servers. First released in 1985, it has evolved through numerous versions, each improving functionality, performance, and user experience. Known for its intuitive interface, Windows features the familiar Start menu, taskbar, and desktop layout, making it accessible for users of all levels. It supports a wide range of software applications, from productivity tools like Microsoft Office to gaming and creative software. Windows is compatible with various hardware, making it a versatile choice for both personal and professional use. The operating system includes features like Windows File Explorer for file management, Cortana for voice assistance, and built-in security tools such as Windows Defender. The latest versions, Windows 10 and 11, bring new features like virtual desktops, enhanced touchscreen support, and seamless integration with Microsoft’s cloud services, including OneDrive. With its widespread adoption and constant updates, Windows remains a key player in the computing world.


5.2 Hardware Used with Justification

Since the Signifying Immediate Image Generation from Text project is software-based, no specific hardware was required for development. The application was developed and tested on standard computing devices with multi-core processors and a minimum of 8 GB of RAM. For hosting and deployment, the application was deployed on Vercel, a cloud-based platform, eliminating the need for dedicated hardware infrastructure.

5.3 Algorithm/Procedures Used in the Project in Different Modules

The SIGNIFYING IMMEDIATE IMAGE GENERATION FROM TEXT project is built using various modules, each carefully designed to perform a specific task. These modules work together seamlessly to transform text descriptions into visually appealing 2D interior designs.

1. Text-to-Image Generation: At the heart of the project is a Generative Adversarial Network (GAN), a machine learning model that generates images from textual input. The process begins by converting the text into numerical data using an embedding technique. The GAN then uses two main components:

• A Generator, which creates images based on the text.

• A Discriminator, which ensures the generated image matches the text description.

Both components are trained together to improve the image quality and alignment with the input text, as illustrated in the sketch below.
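The following sketch illustrates this generator/discriminator pairing in PyTorch. The layer sizes (a 128-dimensional text embedding, a 64x64 RGB output, and fully connected layers) are simplified assumptions for illustration, not the project’s actual architecture.

import torch
import torch.nn as nn

TEXT_DIM, NOISE_DIM, IMG_PIXELS = 128, 100, 64 * 64 * 3  # illustrative sizes


class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(TEXT_DIM + NOISE_DIM, 512), nn.ReLU(),
            nn.Linear(512, IMG_PIXELS), nn.Tanh())  # pixel values in [-1, 1]

    def forward(self, text_emb, noise):
        # Condition the image on the text by concatenating embedding and noise.
        return self.net(torch.cat([text_emb, noise], dim=1))


class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(IMG_PIXELS + TEXT_DIM, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, 1), nn.Sigmoid())  # probability "real and matching"

    def forward(self, img, text_emb):
        return self.net(torch.cat([img, text_emb], dim=1))


# One adversarial step: D scores a generated image against its text embedding.
# During training, G learns to fool D while D learns to reject mismatches.
g, d = Generator(), Discriminator()
emb, z = torch.randn(1, TEXT_DIM), torch.randn(1, NOISE_DIM)
score = d(g(emb, z), emb)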

2. User Interaction: To make the system user-friendly, a combination of React and Streamlit is used:

• The React frontend provides a smooth interface for users to log in or sign up.


• The Streamlit interface allows users to input their text descriptions and view the generated images in a simple and intuitive way.

3. Backend Processing: The backend, powered by Flask, acts as the brain of the system. It handles user input, sends it to the GAN model for processing, and retrieves the generated image. Flask also ensures the entire process is smooth and efficient.

4. Data Management: To store and manage user inputs and generated images, MongoDB is used. This database keeps everything organized, making it easy for users to access their previously created designs whenever they need; a minimal storage sketch follows this list.
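The sketch below shows how this storage layer could look with pymongo; the database and collection names, and the exact document fields, are illustrative assumptions.

from datetime import datetime, timezone

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["image_gen"]  # assumed local instance


def save_design(username: str, prompt: str, image_bytes: bytes) -> None:
    """Persist a prompt and its generated image for later retrieval."""
    db.designs.insert_one({
        "user": username,
        "prompt": prompt,
        "image": image_bytes,
        "created_at": datetime.now(timezone.utc),
    })


def list_designs(username: str):
    """Fetch a user's previously created designs, newest first."""
    return db.designs.find({"user": username}).sort("created_at", -1)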

5.4 Summary

This chapter outlines the key components and tools used in the development of the Signifying Immediate Image Generation from Text project, highlighting the rationale behind their selection. The software stack includes React for creating dynamic and responsive user interfaces, Streamlit for real-time interactive dashboards, Node.js for efficient server-side operations, and FastAPI for building high-performance APIs. PyTorch was utilized for developing and deploying deep learning models, while Flask served as the lightweight framework for backend integration. Python, with its extensive library support and simplicity, was the primary programming language, complemented by the Windows operating system for development and testing. From a hardware perspective, standard computing devices with multi-core processors and at least 8 GB of RAM sufficed, while the application was deployed on Vercel to eliminate the need for dedicated infrastructure. The project architecture is modular, incorporating text-to-image generation using GANs, user interaction via React and Streamlit, backend processing with Flask, and efficient data management using MongoDB. These components work together seamlessly to transform textual descriptions into high-quality, visually appealing 2D designs.



Chapter 6

Results and Discussion

6.1 Results

Figure 6.1: Home page

Figure 6.1 shows the Home Page of the application, which acts as a welcoming gateway for users, offering a clean and minimalistic design. Its responsive layout ensures compatibility with all devices, providing an accessible and seamless user experience. During user testing, the interface received positive feedback for its simplicity and ease of navigation. Users could easily access options to sign up, log in, or learn more about the system.


Figure 6.2: Sign Up Page

Figure 6.2 shows the Sign-Up Page, which facilitates the secure creation of user accounts. It includes fields to input a username, email, and password, all of which are validated to ensure accuracy. Additionally, user credentials are encrypted, prioritizing data security. Once the sign-up process is complete, users are seamlessly redirected to the Login Page.
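As an illustration of this credential-protection step, passwords are typically stored as salted hashes rather than in plain text. The sketch below uses werkzeug.security, which is an assumption for illustration, not necessarily the project’s exact approach.

from werkzeug.security import check_password_hash, generate_password_hash


def register(db, username: str, email: str, password: str) -> None:
    # Only the salted hash is stored; the plain password is never persisted.
    db.users.insert_one({"username": username,
                         "email": email,
                         "password_hash": generate_password_hash(password)})


def verify(db, email: str, password: str) -> bool:
    # Look up the account and compare the supplied password against the hash.
    user = db.users.find_one({"email": email})
    return bool(user) and check_password_hash(user["password_hash"], password)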

Figure 6.3: Sign In Page


Figure 6.3 shows the Login Page, which allows registered users to access their accounts securely. It features fields for email and password input, ensuring smooth authentication. Incorrect login attempts prompt error messages, guiding users to resolve the issue. Upon successful login, users are directed to the application’s main functionality, where they can provide text prompts for interior designs.

Figure 6.4: Give Prompt to generate

Figure 6.4 shows the prompt page. Once logged in, users interact with the system by providing textual descriptions of their desired interior designs. For instance, a user might enter the prompt, “A cozy bedroom with wooden flooring, natural lighting, and a queen-sized bed.” The system processes this input and generates a 2D design that aligns closely with the provided description. Feedback mechanisms are integrated to guide users in refining their prompts for optimal results.


Figure 6.5: Generate Images

Figure 6.5 shows the generated images, which are the highlight of the application. Based on the text input, the system leverages a custom GAN model to create detailed 2D designs. For example, a prompt like “A modern living room with a grey sofa, a glass coffee table, and indoor plants” produced a visually appealing and accurate representation of the description. Testing showed that the system efficiently translated textual inputs into designs, receiving praise for its intuitive output. The generated designs are downloadable, allowing users to save or integrate them into other design projects.

6.2 TESTING

For our project, the testing process ensures the system’s usability, accuracy, and functionality. It involves running tests to identify errors and validate the performance of each module by simulating various user inputs and scenarios. The primary goal of testing is to confirm the system’s ability to interpret textual descriptions accurately and generate corresponding 2D interior designs.

The testing success depends on well-structured test cases that cover diverse room types, material descriptions, and user interactions. Each test case includes inputs (e.g., text descriptions of rooms or specific furniture), system actions (e.g., text preprocessing, GAN model processing, and 2D image generation), expected outputs (e.g., accurate 2D designs matching the description), and actual outputs.

6.3 TYPES OF SOFTWARE TESTING

• Black Box Testing: In this testing method, we check if the system works as expected without looking at how it is built internally. For our project, we test whether the application generates accurate 2D interior designs based on the user’s text input. For example, if the input is “A cozy bedroom with wooden flooring and a queen-sized bed,” the system should create an image matching the description. We also test unusual inputs, like incomplete or unclear descriptions, to see if the system handles them properly and gives helpful feedback. This ensures the system behaves as users would expect.

• White Box Testing: This type of testing focuses on how the system works on the inside. It involves checking the code, algorithms, and overall logic to ensure everything runs smoothly and efficiently. For our project, we test how well the system processes text descriptions, how accurately the GAN model creates images, and whether the application handles different kinds of inputs effectively. For example, we check if the text processing part can understand complex sentences and if the GAN model generates designs that match the key features described. This helps ensure the system produces high-quality results.

6.4 TESTING METHODOLOGY

The different types of testing are as follows:

6.4.1 Unit Testing

This type of testing focuses on checking individual parts of the system. For our project, we tested each module separately, such as the text input processor, the GAN model for generating images, and the backend handling user inputs. For example, we verified whether the system could understand a description like “A modern living room with a grey sofa” and if the GAN generated a suitable design. This way, we identified and fixed any bugs in the individual parts before combining them into the full system.
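A unit test of this kind might look like the following sketch, written in pytest style, where validate_prompt() is a hypothetical stand-in for the real text input processor.

import pytest


def validate_prompt(prompt: str) -> str:
    """Illustrative stand-in for the real input processor."""
    cleaned = prompt.strip()
    if not cleaned:
        raise ValueError("Please enter a prompt.")
    return cleaned


def test_valid_prompt_is_passed_through():
    # A well-formed description should survive validation unchanged.
    assert validate_prompt(" A modern living room with a grey sofa ") == \
        "A modern living room with a grey sofa"


def test_empty_prompt_is_rejected():
    # Blank input must be rejected before it ever reaches the GAN.
    with pytest.raises(ValueError):
        validate_prompt("   ")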

6.4.2 Integration Testing

After making sure each part worked individually, we tested how well they
worked together. Integration testing ensured that all components, like
the text analysis module, the GAN model, and the user interface, could
communicate and function as a cohesive system. For instance, we checked
if a user input was passed correctly from the frontend to the GAN model
and if the generated image was displayed back to the user without errors.
This testing step helped us catch any issues in the interaction between
different modules.

6.4.3 System Testing

System testing involved testing the entire project as a whole, under real-life conditions. We made sure all parts of the system, like the text analysis, image generation, and user interface, worked seamlessly together. For example, we tested the system with a variety of text descriptions to see if it consistently generated accurate 2D designs and handled unexpected inputs gracefully. This stage ensured that the application could provide reliable results in a real-world environment. System testing consists of the following steps:

• Module Testing: Each module, such as text processing and image generation, was tested to ensure it worked properly on its own.

• Scenario Testing: Different types of user inputs, like detailed or vague room descriptions, were tested to see if the system could handle them and generate appropriate designs.


• End-to-End Testing: The entire system was tested to ensure all components worked together without any issues, from the user’s input to the generated design.

• Documentation: Detailed records of the testing process, issues encountered, and their resolutions were created. This documentation will help in future updates or debugging.

6.5 TESTING CRITERIA

6.5.1 Testing for Text Input

Table 6.1: Sample Testing Criteria for Text Input

Test Case | Input | Test Description | Output
1 | “A cozy bedroom with a queen-sized bed.” | Verify that the system generates a design matching the input description. | Generated design includes a cozy bedroom with a queen-sized bed.
2 | Empty text input | Ensure the system handles empty text input gracefully. | Error message prompting for valid input.

6.5.2 Testing for Generated Images

Table 6.2: Sample Testing Criteria for Generated Images

Test Case | Input | Test Description | Output
1 | Text input: “A modern living room with a grey sofa and indoor plants.” | Verify that the system generates an accurate design for the description. | Design features a grey sofa and indoor plants in a modern living room.
2 | Text input: “A rustic kitchen with wooden cabinets.” | Ensure that the system captures key elements of the description in the output. | Generated image includes wooden cabinets in a rustic kitchen style.


6.5.3 Testing for User Interface

Table 6.3: Sample Testing Criteria for User Interface

Test Case | Input | Test Description | Output
1 | Click on “Generate Image” after entering a valid prompt. | Check if the button triggers the generation process. | Image is successfully generated and displayed.
2 | Click on “Generate Image” without entering a prompt. | Verify that the system prompts for valid input. | Error message displayed: “Please enter a prompt.”

6.5.4 Testing for Download Functionality

Table 6.4: Sample Testing Criteria for Download Functionality

Test Case | Input | Test Description | Output
1 | Click on the “Download” button after generating an image. | Ensure that the image is successfully downloaded to the user’s device. | Image file is downloaded.
2 | Click on the “Download” button without generating an image. | Verify that the system handles the action gracefully. | Error message displayed: “No image available for download.”

6.6 Summary

The project “SIGNIFYING IMMEDIATE IMAGE GENERATION FROM TEXT” offers a creative and user-friendly solution for generating 2D interior designs from textual descriptions. It is designed with three core functionalities: natural language processing to interpret user inputs and extract key design elements, a custom GAN model for generating accurate and visually appealing 2D designs, and a simple user interface for seamless interaction. Users provide descriptive prompts about their desired interior spaces, such as “a cozy bedroom with wooden flooring and natural lighting,” which the system processes to generate corresponding room designs that align with the description. The generated designs are displayed within the application, and users can download them for further use. The project employs a MongoDB database to store user inputs and design data efficiently, ensuring smooth and organized information management. A Flask-powered backend integrates all system functionalities, enabling real-time processing and interaction, while a React-developed frontend ensures a responsive and engaging user experience. By leveraging advanced AI models and intuitive interfaces, the project simplifies the process of conceptualizing interior designs, making it accessible to a wider audience, including homeowners, designers, and enthusiasts. It provides a practical tool to bridge the gap between creative ideas and tangible visualizations, significantly enhancing the design process.



Chapter 7

Conclusions and Future work

The project, “SIGNIFYING IMMEDIATE IMAGE GENERATION FROM TEXT,” demonstrates how AI can bridge the gap between textual descriptions and visual interior designs. By utilizing custom GAN models, the system efficiently transforms user inputs into accurate 2D room layouts. The integration of React, Flask, and MongoDB ensures smooth functionality and user-friendly interaction, making it a practical tool for both personal and professional applications in interior design.

In the future, the project can be enhanced by incorporating 3D design capabilities, expanding the dataset for better accuracy, and integrating with professional tools like AutoCAD. Features such as mobile app support and user profiles could further increase accessibility and usability. These advancements would position the system as a powerful and comprehensive solution for interior design needs.

Appendix A

Turnitin Plagiarism Report

Appendix B

Expo Details

Figure B.1: Innovation Showcase

Figure B.1 shows that our team successfully presented the project at “Nirmaan 2024” as part of the “Innovations Showcase”, a college-level project exhibition and competition held at Canara Engineering College on 10th December 2024.
