2A Report

This document provides an overview of an AI image generation project. The project aims to generate 64x64 images from text input using an improved GAN model to reduce training time and improve model convergence. A web application will be developed where users can input text and an AI-generated image will be produced based on the text description. The document outlines the problem statement, objectives, proposed system architecture, implementation plan, and expected outcomes of the project.

Imaginate: AI Image Generator

Submitted in partial fulfillment of the requirements for the
degree of Bachelor of Engineering

by

Sr. No.   Name of the Student   IEN No.
1         Nupur Luhar           12112056
2         Dimple Maherao        12112047
3         Jay Kundgar           12112012

Under the Guidance of
Ms. Harpreet Kaur

Department of Computer Engineering
New Horizon Institute of Technology and Management,
University of Mumbai
(2023-2024)


CERTIFICATE

This is to certify that Group Number 13, consisting of

Sr. No.   Name of the Student   IEN No.
1         Nupur Luhar           12112056
2         Dimple Maherao        12112047
3         Jay Kundgar           12112012

has satisfactorily completed Mini-Project 2A, entitled “Imaginate: AI Image Generator”,
in partial fulfillment of the requirement for the award of the Degree of Bachelor of
Engineering in “Computer Engineering”.

Ms. Harpreet Kaur
Mini-Project Guide

Dr. Sanjay Sharma          Dr. Prashant Deshmukh
Head of Department         Principal

Mini Project-2A Approval for T.E.

This project report entitled “Imaginate: AI Image Generator” by Nupur Luhar, Dimple
Maherao, and Jay Kundgar is approved for the degree of Bachelor of Engineering in
Computer Engineering, 2023-2024.

Examiners:

1. Name:

Sign with Date:

2. Name:

Sign with Date:

Date:

Place:


Declaration

We declare that this written submission represents our ideas in our own words and
where others’ ideas or words have been included, we have adequately cited and
referenced the original sources. We also declare that we have adhered to all principles of
academic honesty and integrity and have not misrepresented, fabricated, or falsified any
idea/data/fact/source in our submission. We understand that any violation of the above
will be cause for disciplinary action by the institute and can also evoke penal action
from the sources that have not been properly cited or from whom proper
permission has not been taken when needed.

1. Nupur Luhar (12112056)

2. Dimple Maherao (12112047)

3. Jay Kundgar (12112012)


Acknowledgment

We want to take this opportunity to thank one and all.


It is our immense pleasure to express our gratitude to our Guide, Ms. Harpreet Kaur for
providing us with constructive and positive feedback during the preparation of this project.
We would like to express our thanks to the Head of the Computer Engineering
Department, Dr. Sanjay Sharma, and all other staff members for their encouragement
and suggestions.
Last but not least, we are thankful to our friends for their support and coordination. We are
also thankful to our parents for their constant support and best wishes.

1. Nupur Luhar (12112056)

2. Dimple Maherao (12112047)

3. Jay Kundgar (12112012)

Date: ___________________

Place: ___________________

Abstract

Artificial Intelligence (AI) has made remarkable strides in various fields, and one of its most captivating
applications is image generation. This abstract provides an overview of the burgeoning field of AI image
generation and its transformative impact on the creative landscape.

AI image generators employ deep learning techniques, particularly Generative Adversarial Networks (GANs)
and Variational Autoencoders (VAEs), to produce highly realistic and often surreal images. These models
have been extensively trained on vast datasets, enabling them to understand and replicate patterns, styles, and
artistic elements from diverse sources.

The potential applications of AI image generators are wide-ranging. They have been utilized in art,
entertainment, and design, enabling artists and designers to explore new frontiers of creativity. Additionally,
they find practical use in industries such as advertising, fashion, and architecture, where customized and
visually appealing content is in high demand.

In this project, we look at generating 64x64 images on the fly using text as the input. The generated images
are unique in the sense that they do not already exist. In doing so, we aim to improve upon existing
architectures and to reduce the difficulties that come with training GAN models, such as long
training times and poor convergence. The final project will be a web application in which the user
enters a text prompt and a synthetic image is generated from the description in the text.

In conclusion, AI image generators have emerged as powerful tools that blend human creativity with machine
learning capabilities. As the technology continues to evolve, it promises to redefine the boundaries of artistic
expression and visual content creation while raising important ethical questions that require thoughtful
consideration.

Table Of Contents

Sr. No.  Content

1.   Declaration
2.   Acknowledgement
3.   Abstract
4.   Table of Contents
5.   List of Figures
6.   List of Tables

7.   Chapter 1: Introduction
     1.1 Purpose
     1.2 Overview of Document

8.   Chapter 2: Literature Review
     2.1 Existing System
     2.2 Literature Review

9.   Chapter 3: Problem Statement, Objectives and Scope
     3.1 Problem Statement
     3.2 Objectives
     3.3 Scope

10.  Chapter 4: Proposed System
     4.1 Proposed Architecture
     4.2 Advantages of Proposed System
     4.3 System Design

11.  Chapter 5: Implementation Plan
     5.1 Gantt Chart
     5.2 Expected Outcome

12.  Chapter 6: Conclusion

References

Annexure

(A Weekly Progress Report)

LIST OF FIGURES:

Fig. No.   Caption
5.2.1      Home page
5.2.2      Predefined Examples
5.2.3      Login Page
5.2.4      Prompt given to Imaginate

LIST OF TABLES:

Table No.  Title
1          Flowchart
2          Schedule Table
3          Gantt Chart

CHAPTER 1

INTRODUCTION

1.0 Introduction:

• For the human mind, it is very easy to think of new content. If someone asks you
to “draw a flower with blue petals”, it is very easy to do. Machines, however,
process information very differently. Just understanding the structure of that
sentence is a difficult task for them, let alone generating something based on the
description.
• Automatic synthetic content generation is a field that was explored in the past
and then discredited, because at that time neither suitable algorithms nor
enough processing power existed to solve the problem. However, the advent of deep
learning started changing those earlier beliefs. The ability of neural
networks to capture features even in very large datasets makes them a
viable candidate for automatic content generation.
• Generating an image from a text-based description is the aspect of generative
adversarial networks (GANs) that we focus on. Since GANs follow an unsupervised
learning approach, we have modified them to take an input as a condition and
generate an image based on that condition.
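
The idea of feeding a condition to the generator can be illustrated with a small sketch. It is not the project's model: embed_text is a hypothetical stand-in for a learned text encoder (it only hashes words into buckets), and the dimensions 8 and 16 are assumptions chosen for readability. It shows only that a conditional generator receives random noise joined with a representation of the text.

```python
import hashlib
import random
from typing import List, Optional


def embed_text(prompt: str, dim: int = 16) -> List[float]:
    """Toy stand-in for a learned text encoder (hypothetical).

    Each word is hashed into one of `dim` buckets; the bucket counts are
    divided by the word count, so a non-empty prompt sums to 1.
    """
    counts = [0.0] * dim
    words = prompt.lower().split()
    for word in words:
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        counts[digest[0] % dim] += 1.0
    total = len(words)
    return [c / total for c in counts] if total else counts


def generator_input(prompt: str, noise_dim: int = 8,
                    seed: Optional[int] = None) -> List[float]:
    """Build the vector a conditional generator would consume: noise + condition."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + embed_text(prompt)


a = generator_input("a flower with blue petals", seed=0)
b = generator_input("a flower with blue petals", seed=1)
print(len(a))          # noise_dim + embedding dimension
print(a[8:] == b[8:])  # the condition part depends only on the prompt
```

Because the noise part changes with every draw while the condition part stays fixed, the same description can yield many different images.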

1.1 Purpose:
• The purpose of creating an AI image generator is to revolutionize the way we produce
and manipulate visual content. By harnessing the capabilities of artificial intelligence
and deep learning, an AI image generator is a transformative tool for artists,
designers, businesses, and industries.
• Its primary goal is to enhance creativity by providing a platform for the rapid
generation of high-quality images, thereby saving time and reducing costs in content
creation. This technology democratizes access to design and visual content
production, making it accessible to a broader audience while maintaining consistency
and quality.
• Additionally, AI image generators foster experimentation, encourage innovation, and
contribute to the advancement of artificial intelligence research, ultimately shaping
the future of visual communication and creative expression.

1.2 Overview of Document:

• Chapter 2 describes the existing, manual approach to image creation and reviews the
related literature.
• Chapter 3 gives the problem statement, objectives, and scope of our web application.
It describes the informal requirements and establishes a context for the
proposed system in the next chapter.
• Chapter 4 describes the proposed system in technical terms: its architecture, its
advantages, and its design.
• Chapter 5 presents the implementation plan and the expected outcome, and Chapter 6
concludes the report.

CHAPTER 2
LITERATURE SURVEY

2.1 Existing system:

• Without AI image generators, creating images or visual content relies heavily on manual
labor. This can be time-consuming and expensive, and may limit the scalability of image
production.
• Human designers have creative limitations and may struggle to generate large volumes of
diverse images quickly. AI image generators can produce a wide range of styles and concepts
efficiently.
• Human designers require time to conceptualize and create images. In fast-paced industries
like advertising and social media, timely content creation can be challenging without AI
assistance.
• Employing skilled graphic designers or photographers to create custom images can be costly.
AI image generators can significantly reduce production costs, especially for repetitive or
simple image tasks.
• Human-generated images may have variations in style and quality, which can impact brand
consistency and message clarity. AI image generators can provide a consistent output based
on predefined parameters.
• Traditional image creation processes often require significant resources, including skilled
personnel, high-end software, and hardware. AI image generators can run on standard
computing hardware.
• Scaling up image production for large-scale projects, such as e-commerce catalogs or social
media campaigns, can be difficult and slow without automation through AI.

2.2 Literature review:

(1) Title: Image Generation Using Text
Author: Prajwal Thakur, Computer Science Engineering, Jaypee
University of Information Technology, Waknaghat, Himachal Pradesh
Volume: 03 / Issue: 07 / July 2021, Impact Factor: 5.354

• This paper outlines many tools and techniques for processing words in the field of Natural
Language Processing, and the best practices for properly processing the input.
• This paper also describes the Deep Learning architectures that are most
commonly used today, along with techniques to train the models and steps to avoid
overfitting and underfitting.
• It introduced neural style transfer, a technique that allows the generation of images in the style
of famous artists' paintings.

(2) Title: Image Generation with Attention Mechanism
Author: Tianlun Li, Stanford University
Volume: 02 / Issue: 04 / September 2022, Impact Factor: 6.781

• This seminal paper introduced GANs, a class of models that have revolutionized image
generation by training a generator network to produce realistic images while a discriminator
network tries to distinguish real from fake images.
• It introduced VAEs, which are probabilistic generative models used for image generation, among
other tasks. VAEs provide a way to generate images with controlled features.
• This work extends GANs to the conditional setting, allowing for the generation of images
conditioned on specific attributes or classes.
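
The adversarial setup described above, in which a generator tries to fool a discriminator, can be made concrete with the two loss functions involved. The sketch below is a minimal illustration in plain Python, assuming the standard binary cross-entropy formulation and the non-saturating generator loss from the original GAN paper. The report does not state which loss the project uses, and the discriminator outputs are made-up numbers.

```python
import math


def bce(prediction: float, target: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for a single probability, clamped away from 0 and 1."""
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """Average loss when real images are labelled 1 and generated images 0."""
    real_term = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: low when fakes are scored as real."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)


# Hypothetical discriminator outputs (probability that an image is real).
confident_d = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
fooled_d = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.7, 0.9])
print(confident_d < fooled_d)                                   # True
print(generator_loss([0.7, 0.9]) < generator_loss([0.1, 0.2]))  # True
```

The two losses pull in opposite directions: outputs that lower the generator's loss raise the discriminator's, which is what makes the training adversarial.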

CHAPTER 3
PROBLEM STATEMENT, OBJECTIVE, AND SCOPE

3.1 Problem Statement:

• In the digital age, the demand for high-quality, customizable images for various applications, such
as content creation, marketing, and design, has grown rapidly. However, traditional methods
of image creation are often time-consuming and expensive, and may not meet the need for rapid,
scalable, and diverse image generation.
• To address this challenge, we aim to develop an AI-powered image generator capable of producing
a wide range of images with specific attributes, styles, and levels of complexity. This image
generator should serve as a creative and efficient tool for individuals and businesses seeking to
streamline their image production processes while maintaining a high level of customization and
control.

3.2 Objective:

• The main objective of our web application is to develop an AI image generator capable of creating
images that are visually indistinguishable from real photographs, illustrations, or other types of
images.
• We also emphasize enabling users to customize generated images by specifying attributes such as
style, content, colors, and composition, allowing for a high degree of creative control.
• We aim to ensure that the image generator can produce a wide range of images, including different styles,
genres, and content types, to meet diverse user needs.
• We aim to create an intuitive and user-friendly interface or API that allows users to interact with the
image generator easily, specifying their requirements and preferences.

3.3 Scope:

• It will help to develop the capability to generate artistic images, illustrations, and visual
content in various styles, such as impressionism, cubism, or abstract art.

• It would increase the rate of creation of AI models that can generate photorealistic images,
mimicking the appearance of photographs or real-world scenes.

• It will allow users to specify various attributes and characteristics, such as color schemes,
composition, styles, and content elements, to customize the generated images.

• It would enable the generation of images based on textual descriptions, providing a way to
describe an image and have it generated automatically.

• It would implement mechanisms to address ethical concerns related to generated content,
such as ensuring that the images do not violate copyright, privacy, or cultural sensitivities.

• It would implement quality control mechanisms to ensure that the generated images meet
predefined quality standards, reducing the likelihood of producing low-quality or
erroneous content.

• It will also support different input modalities, including text descriptions, sketches,
reference images, or even audio descriptions, to allow users flexibility in how they
describe or guide image generation.

CHAPTER 4
PROPOSED SYSTEM

4.1 Proposed Architecture:

• The proposed architecture for an AI image generator is a multifaceted system designed to create a
versatile and customizable image generation tool. At its core, this architecture leverages deep
learning models, such as Generative Adversarial Networks (GANs) or Variational Autoencoders
(VAEs), to facilitate image synthesis. The process begins with data collection and preparation,
involving the assembly of a diverse dataset and preprocessing steps. Training of the model is a
critical phase, incorporating techniques like progressive training and transfer learning to
accelerate convergence and improve image quality.
• The customization module empowers users to define image attributes, styles, and other
parameters, which are then translated into model-friendly representations. Using this input, the AI
generates images in real-time or through batch processing, depending on the application. Quality
control mechanisms are integrated to filter out subpar results, ensuring the final output meets
predefined criteria. To enhance accessibility and usability, a user-friendly interface or API is
developed to facilitate interaction with the AI image generator.
• Ethical considerations are crucial, encompassing measures to address copyright and privacy
concerns while maintaining legal and ethical standards in the generated content. The system is
designed with scalability in mind, capable of handling increased demand through load balancing
and distributed processing. Security measures are implemented to safeguard against potential
attacks or misuse.
• Performance optimization strategies enhance speed and resource efficiency, making efficient use
of available hardware resources. Monitoring and logging tools are deployed to track system
performance, user interactions, and errors. Comprehensive documentation and support resources
are provided to assist users, and a feedback loop is established for continuous improvement based
on real-world usage and evolving requirements. Deployment and maintenance planning ensure
the smooth operation and sustainability of the AI image generator in the intended environment.
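
The generate-then-filter step of this architecture can be sketched as follows. Everything in the sketch is an assumption made for illustration: placeholder_generator stands in for the trained model and only returns random pixels, contrast_score is a toy quality metric, and the threshold of 0.5 is arbitrary. The report does not specify how quality is measured.

```python
import random
from typing import List

Image = List[List[float]]  # 64x64 grayscale values in [0, 1]
SIZE = 64


def placeholder_generator(prompt: str, seed: int) -> Image:
    """Stand-in for the trained generator (hypothetical): returns random pixels."""
    rng = random.Random(f"{prompt}|{seed}")
    return [[rng.random() for _ in range(SIZE)] for _ in range(SIZE)]


def contrast_score(image: Image) -> float:
    """Toy quality metric (an assumption): brightest pixel minus darkest pixel."""
    pixels = [value for row in image for value in row]
    return max(pixels) - min(pixels)


def generate_with_quality_control(prompt, candidates=4, min_score=0.5,
                                  generator=placeholder_generator,
                                  score=contrast_score):
    """Generate several candidates and keep only those that pass the threshold."""
    images = [generator(prompt, seed) for seed in range(candidates)]
    return [img for img in images if score(img) >= min_score]


kept = generate_with_quality_control("a flower with blue petals")
print(len(kept), len(kept[0]), len(kept[0][0]))
```

Passing the generator and the scoring function as parameters keeps the filter independent of the model, so a real generator and a learned quality metric could be substituted later without changing the control flow.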

4.2 Advantages of Proposed System

• Customization: Users can tailor generated images by specifying attributes, styles, and other
parameters, providing a high degree of creative control. This customization is beneficial for
industries like advertising, design, and content creation.
• Versatility: The system can produce a wide range of image types, from artistic creations to
photorealistic images, accommodating diverse user needs and applications.
• Ease of Use: The user-friendly interface or API simplifies interaction with the AI image generator,
making it accessible to a broad audience, including those without advanced technical skills.
• Efficiency: Real-time and batch processing capabilities enable efficient image generation for both
interactive and bulk tasks, reducing the time and effort required for image production.
• Quality Control: Quality control mechanisms filter out low-quality or inappropriate images,
ensuring that the generated content meets predefined criteria and maintains brand or project
standards.
• Scalability: The system is designed to scale horizontally, handling increased demand without
sacrificing performance. This scalability is particularly valuable for high-demand applications.
• Ethical Compliance: Integrated mechanisms address ethical concerns, such as copyright and
privacy issues, helping users generate content that adheres to legal and ethical standards.
• Security: Robust security measures protect against potential threats, including unauthorized access
and misuse, safeguarding both the system and its users.
• Performance Optimization: The system is optimized for speed and resource efficiency, making
efficient use of available hardware resources and ensuring rapid image generation.
• Monitoring and Logging: Comprehensive monitoring and logging tools track system performance
and user interactions, facilitating system maintenance and continuous improvement.
• Documentation and Support: Extensive documentation and support resources assist users in
effectively utilizing the AI image generator, reducing the learning curve and enhancing user
satisfaction.
• Deployment and Maintenance: Careful deployment and maintenance planning ensure the system's
smooth operation and long-term sustainability in the intended environment.
• Adaptability: The system is adaptable to various input modalities, including text descriptions,
sketches, reference images, and audio descriptions, accommodating a wide range of image
synthesis scenarios.
• Monetization Opportunities: The system can be monetized through licensing, subscription
models, or pay-per-use, providing a potential revenue stream for developers and organizations.

4.3 System design:

Fig 1: Flowchart

CHAPTER 5
IMPLEMENTATION PLAN

5.0 Implementation plan:

Sr. No.  Title                                       Start Date  End Date  Duration
1        Topic Discussion
2        Topic Finalization and Ideation
3        Front-end development
4        API testing
5        Back-end Integration
6        Pushing FastAPI on Replit for deployment
7        Project Completion

Table 1: Schedule Table
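
The back-end steps in the schedule (API testing, back-end integration, and deployment) centre on one request: the user submits a prompt and receives an image. The sketch below shows one possible shape for that request handler. It is written without any web framework so that it can be tested in isolation; a FastAPI route could call it. The names handle_generate and render_placeholder, the 200-character limit, and the PGM output format are all assumptions, and the placeholder image stands in for the model's output.

```python
import base64
import json

SIZE = 64
MAX_PROMPT_CHARS = 200  # hypothetical limit


def render_placeholder(prompt: str) -> bytes:
    """Stand-in for the model: a 64x64 grayscale PGM whose shade depends on the prompt."""
    shade = sum(prompt.encode("utf-8")) % 256
    header = f"P5 {SIZE} {SIZE} 255\n".encode("ascii")
    return header + bytes([shade]) * (SIZE * SIZE)


def handle_generate(request_body: str) -> dict:
    """Validate a JSON request body and return a JSON-serialisable response."""
    try:
        payload = json.loads(request_body)
    except json.JSONDecodeError:
        return {"status": 400, "error": "body must be valid JSON"}
    prompt = payload.get("prompt", "") if isinstance(payload, dict) else ""
    if not isinstance(prompt, str) or not prompt.strip():
        return {"status": 400, "error": "prompt must be a non-empty string"}
    if len(prompt) > MAX_PROMPT_CHARS:
        return {"status": 400, "error": "prompt is too long"}
    image = render_placeholder(prompt.strip())
    return {
        "status": 200,
        "prompt": prompt.strip(),
        "image_format": "pgm",
        "image_base64": base64.b64encode(image).decode("ascii"),
    }


response = handle_generate('{"prompt": "a flower with blue petals"}')
print(response["status"], len(base64.b64decode(response["image_base64"])))
```

Keeping validation and encoding in a plain function means the API testing step can exercise it directly, before the route is wired into the front end.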

5.1 Gantt Chart:

Table 2: Gantt Chart

5.2 Expected Outcome:

5.2.1 Homepage:

Fig. 5.2.1: Homepage

5.2.2 Predefined Examples:

Fig. 5.2.2: Predefined Examples

5.2.3 Account Login:

Fig. 5.2.3: Account Login

5.2.4 Prompt given and Image generated by Imaginate:

Fig. 5.2.4: Prompt given and Image generated by Imaginate

CHAPTER 6
CONCLUSION

• In conclusion, the development of this software was driven by a recognition of the pressing need for an
innovative solution in the field of image generation.
• We embarked on this project with a clear mission: to empower users with a versatile, efficient, and
customizable tool that would revolutionize the way images are created and customized across various
industries.
• We aimed to provide creative control and flexibility to users, ensuring they could generate images tailored
to their specific requirements. By developing an intuitive user interface, integrating quality control
measures, addressing ethical concerns, and optimizing performance, we sought to create a comprehensive
solution that not only streamlines image generation but also enhances the overall user experience.
• Through continuous feedback, support, and ongoing improvement, we are committed to delivering
software that not only meets but exceeds the expectations of our users, ultimately contributing to the
efficiency and creativity of image generation processes in the digital age.


Annexure
(Weekly Report)
