AI Image Generator PPT-1

Uploaded by

Parth
Copyright: © All Rights Reserved

AI Image Generator Using Prompt
Team Members:
Ojasi Prabhu -43
Parth Raut -44
Aditya Shinde -53
Contents

• Introduction
• Problem Statement
• Requirements
• Literature Survey
• Block Diagram
• Implementation
• Future Scope
• Conclusion
• References
Introduction

1. Cutting-Edge Technology: Our Text to Image Generator is powered by the latest
technology, allowing you to transform text into stunning visuals effortlessly.
2. Professional-Grade Visuals: Create visuals that are on par with professional graphic
design, enhancing the visual appeal of your content.
3. Empower Your Content: Unlock the power to elevate your content with visually
captivating images, making your message more engaging and memorable.
4. Effortless Transformation: Say goodbye to complex design software – our tool
simplifies the process of turning text into striking images.
5. Creative Possibilities: Explore a world of creative possibilities with our tool, enabling
you to customize and personalize visuals to suit your unique style and message.
6. Convenient Accessibility: Access this innovative solution right at your fingertips,
making it easy to integrate into your content creation process.
Problem Statement

• In modern content creation, the need for an efficient, professional
Text to Image Generator has become increasingly apparent. Current
solutions often lack the sophistication and ease of use required to
convert text into high-quality visuals, which hinders the creative
process and limits the visual impact of presentations and marketing
materials. This calls for a solution that not only closes this gap
but also improves the overall quality of the generated images,
empowering users to deliver captivating, professional content.
Requirements

• Hardware Requirements
1. Intel Processor
2. RAM: 4 GB minimum

• Software Requirements
1. Windows OS
2. VS Code
3. Internet Browser
4. Android Studio
Literature Survey

• Frolov, Hinz, Raue, Hees, and Dengel [1] report that earlier
text-to-image synthesis systems have several notable limitations. One key
drawback is their difficulty in achieving precise, fine-grained control
over the visual content generated from textual descriptions. The resulting
images often lack fidelity to the provided text, because intricate details
and nuanced features are lost in translation. These systems also struggle
to produce images with consistent styles and perspectives, and to maintain
visual coherence across the components of a generated scene. These
limitations make it hard to create images that faithfully represent the
intended textual concepts, and they are a significant impediment in
practical applications.
Literature Survey

• M. Siddharth and R. Aarthi [2] note that blended multi-class
text-to-image synthesis GANs, when combined with RoBERTa and Mask R-CNN,
have restrictions that need to be taken into consideration. First, even
though RoBERTa integration improves the quality of the text embeddings, it
may still have trouble grasping subtle nuances of context, cultural
allusions, and abstract descriptions, which makes it difficult to produce
correct and contextually relevant images. Second, Mask R-CNN mainly assists
with object recognition and segmentation in images; it is not very
successful at combining the segmented pieces logically into synthetic
images. To realize the promise of these coupled technologies and advance
multi-class text-to-image synthesis, these research gaps and restrictions
must be addressed.
Literature Survey

• Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk
Weissenborn, Xiaohua Zhai, et al. [3]: one notable drawback is the
heavy reliance on the quality of the textual input. Ambiguous or
abstract descriptions often result in generated images that lack clear
and accurate details, because the model struggles to disambiguate and
faithfully translate such content. Generating diverse styles and
viewpoints, and maintaining consistent visual coherence across complex
scenes, also remain challenges. The research gap lies in developing
techniques that improve the model's capacity to interpret intricate
textual descriptions and translate them into rich, coherent visual
representations, overcoming these limitations and broadening the
applicability of text-to-image generation.
Block Diagram
Architecture
Implementation
Implementation
Future Scope

• A security system can be built
• Faster generation of images
• A more optimised application
• A download button can be added
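
As one illustration of the download item above, the following is a minimal sketch, not the project's actual code. It assumes the generator already returns the image as raw bytes; the function names `prompt_to_filename` and `save_generated_image` and the `downloads` folder are hypothetical and chosen only for this example.

```python
import re
from pathlib import Path


def prompt_to_filename(prompt: str, ext: str = "png", max_len: int = 40) -> str:
    """Turn a free-text prompt into a short, filesystem-safe file name (hypothetical helper)."""
    # Lowercase, then collapse every run of non-alphanumeric characters into "_".
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    # Truncate long prompts; fall back to a generic name when nothing usable is left.
    slug = slug[:max_len].rstrip("_") or "image"
    return f"{slug}.{ext}"


def save_generated_image(image_bytes: bytes, prompt: str, out_dir: str = "downloads") -> Path:
    """Write already-generated image bytes into out_dir and return the saved path."""
    # Assumption: image_bytes comes from whatever backend produced the image.
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / prompt_to_filename(prompt)
    target.write_bytes(image_bytes)
    return target
```

For example, `prompt_to_filename("A red fox, at sunset!")` yields `a_red_fox_at_sunset.png`, so each downloaded file carries a readable hint of the prompt that produced it.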
Conclusion

• In conclusion, our Text to Image Generator offers an innovative solution
to a pressing challenge in content creation. By providing a user-friendly,
professional-grade tool, it changes the way individuals and businesses
produce visually compelling content. It bridges the gap between text and
captivating visuals, ushering in a new era of creative potential and
professional excellence.
References

[1] Frolov, Stanislav & Hinz, Tobias & Raue, Federico & Hees, Jörn & Dengel,
Andreas. (2021). 2021 IEEE Sixth International Conference on Multimedia Big Data
(BigMM), Xi'an, China, 2021, pp. 1-5, doi: 10.1109/BigMM.2021.8499439.
[2] M Siddharth, R Aarthi (2023). “Blended multi-class text to image synthesis GANs
with RoBERTa and Mask R-CNN,” 2023 IEEE/CVF International Conference on Computer
Vision Workshop (ICCVW), Seoul, Korea (South), 2023, pp. 1887-1890, doi:
10.1109/ICCVW.2023.00237.
[3] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn,
Xiaohua Zhai, et al. (2020). “An Image is Worth 16x16 Words: Transformers for Image
Recognition at Scale,” arXiv:2010.11929v2.
[4] Phillip Isola (2021). “Text to Image Generation using cGAN model,” 2021 IEEE
33rd International Conference on Tools with Artificial Intelligence (ICTAI), Baltimore,
MD, USA, 2021, pp. 432-436, doi: 10.1109/ICTAI30020.2020.00074.
