AI Image Generator PPT-1
AI Image Generator Using Prompt
Team Members:
Ojasi Prabhu -43
Parth Raut -44
Aditya Shinde -53
Contents
Introduction
Problem Statement
Requirements
Literature Survey
Block Diagram
Implementation
Future Scope
Conclusion
References
Introduction
Hardware Requirements
1. Intel Processor
2. RAM: 4 GB minimum
Software Requirements
1. Windows OS
2. VS Code
3. Internet Browser
4. Android Studio
Literature Survey
Frolov, Hinz, Raue, Hees and Dengel [1] observe that earlier text-to-image
synthesis systems have several notable limitations. A key drawback is the
difficulty of achieving precise, fine-grained control over the visual content
generated from a textual description. As a result, the generated images often
lack fidelity to the input text, because intricate details and nuanced
features are lost in translation. These systems also struggle to keep styles
and perspectives consistent and to maintain visual coherence across the
different components of a generated scene. Together, these shortcomings reduce
the ability to create images that faithfully represent the intended textual
concepts and can be a significant obstacle in practical applications.
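A common design in GAN-based text-to-image synthesis is to condition the generator by joining a random noise vector with a single fixed-length sentence embedding of the caption. The minimal sketch below illustrates only that input construction; the helper name `build_generator_input`, the 100-dimensional noise and the made-up 8-dimensional embedding are illustrative assumptions, not details taken from [1]. Because the whole caption is squeezed into one vector, fine-grained details can be lost before the generator ever sees them, which is one plausible way to read the fidelity problem described above.

```python
import random


def build_generator_input(text_embedding, noise_dim=100, seed=None):
    """Hypothetical helper: build the input vector of a text-conditioned
    GAN generator by concatenating noise z with a sentence embedding."""
    rng = random.Random(seed)
    # z ~ N(0, I): the random part that lets one caption yield many images.
    z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    # The entire caption is carried by this single fixed-length vector,
    # so word-level detail must survive this compression to reach the image.
    return z + list(text_embedding)


# Made-up 8-dimensional "sentence embedding" for one caption (assumption;
# real text encoders produce much larger vectors).
caption_embedding = [0.12, -0.40, 0.33, 0.05, 0.91, -0.27, 0.08, 0.60]
g_input = build_generator_input(caption_embedding, noise_dim=100, seed=0)
print(len(g_input))  # 100 noise values + 8 embedding values = 108
```

Real systems usually project the embedding to a smaller size before concatenation; that step is omitted here to keep the idea visible.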
M. Siddharth and R. Aarthi [2] report that blended multi-class text-to-image
synthesis GANs, when combined with RoBERTa and Mask R-CNN, still have
limitations that need to be taken into account. First, although integrating
RoBERTa improves the quality of the text embeddings, the model may still
struggle to capture subtle contextual nuances, cultural allusions, and
abstract descriptions, which makes it harder to produce accurate and
contextually relevant images. Second, Mask R-CNN is used mainly for object
detection and segmentation in images; it is far less effective at combining
the segmented pieces logically into a synthetic image. Bridging these
research gaps is essential to realize the full promise of these combined
technologies and to advance multi-class text-to-image synthesis.
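Mask R-CNN produces a per-object mask, but turning masks into a synthetic image still requires a separate compositing step. The simplest form is alpha compositing with a binary mask, `out = mask * object + (1 - mask) * background`, sketched below on toy 3x3 grayscale arrays. The helper `composite` and all pixel values are hypothetical, and this is not the pipeline used in [2]. The naive paste ignores lighting, scale and occlusion, which is exactly the kind of logical combination the paragraph above identifies as missing.

```python
def composite(background, obj, mask):
    """Hypothetical helper: paste a segmented object onto a background.

    Pixel-wise alpha compositing with a binary mask:
    out = mask * obj + (1 - mask) * background
    """
    return [
        [m * o + (1 - m) * b for b, o, m in zip(b_row, o_row, m_row)]
        for b_row, o_row, m_row in zip(background, obj, mask)
    ]


# Toy 3x3 grayscale images (values are made up for illustration).
background = [[0.2] * 3 for _ in range(3)]
obj = [[0.9] * 3 for _ in range(3)]
# Binary mask, as a segmentation model might output: 1 = object pixel.
mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]

result = composite(background, obj, mask)
for row in result:
    print(row)
# [0.2, 0.2, 0.2]
# [0.2, 0.9, 0.2]
# [0.2, 0.2, 0.2]
```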
References
[1] Stanislav Frolov, Tobias Hinz, Federico Raue, Jörn Hees, Andreas Dengel (2021).
2021 IEEE Sixth International Conference on Multimedia Big Data (BigMM), Xi'an,
China, 2021, pp. 1-5, doi: 10.1109/BigMM.2021.8499439.
[2] M Siddharth, R Aarthi (2023). “Blended multi-class text to image synthesis GANs
with RoBerTa and Mask R-CNN”, 2023 IEEE/CVF International Conference on Computer
Vision Workshop (ICCVW), Seoul, Korea (South), 2023, pp. 1887-1890, doi:
10.1109/ICCVW.2023.00237.
[3] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn,
Xiaohua Zhai et al. (2020). “An Image is Worth 16x16 Words: Transformers for Image
Recognition at Scale”, arXiv:2010.11929v2.
[4] Phillip Isola (2021). “Text to Image Generation using cGAN model”, 2021 IEEE
33rd International Conference on Tools with Artificial Intelligence (ICTAI), Baltimore,
MD, USA, 2021, pp. 432-436, doi: 10.1109/ICTAI30020.2020.00074.