
Class: AI-A

Group No. 17, Sub: EDI


EDI Topic

ImageGafter: AI-Driven Data Augmentation for Computer Vision


(Image Dataset Generator)
Sr. No.  Roll No.  PRN       Name
1        70        12320204  Diksha Teware
2        71        12320027  Rutuja Diwate
3        72        12320102  Ashwanti Gaikwad
4        73        12320138  Hitesh Tolani
5        75        12320031  Shravan Kalzunkar


Introduction to ImageGafter
ImageGafter is an innovative tool designed to automate the
generation of large, diverse image datasets. It takes a single input
image, converts it into descriptive text using advanced AI models,
generates multiple similar text prompts, and creates new images from
these prompts. This process streamlines the creation of extensive and
high-quality image datasets, essential for training robust AI and
machine learning models.
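
Viewed as code, this pipeline reduces to three stages. The sketch below is only an outline: the three helper functions are hypothetical placeholders for the models named later in the deck, not an actual ImageGafter API.

# High-level sketch of the ImageGafter pipeline (hypothetical helper names).
def caption_image(path: str) -> str:
    """Stage 1: convert the input image into a descriptive caption."""
    raise NotImplementedError  # e.g. an image captioning model

def expand_prompts(caption: str, n: int) -> list[str]:
    """Stage 2: produce n similar text prompts from the caption."""
    raise NotImplementedError  # e.g. a text generation model

def generate_images(prompt: str, n: int) -> list:
    """Stage 3: render n new images from one prompt."""
    raise NotImplementedError  # e.g. a diffusion model

def build_dataset(path: str, n_prompts: int = 5, n_images: int = 4) -> list:
    caption = caption_image(path)
    prompts = expand_prompts(caption, n_prompts)
    return [img for p in prompts for img in generate_images(p, n_images)]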
Problem Statement
The creation of large and diverse image datasets is a critical need for
training robust AI and machine learning models. However, this process
is currently fraught with significant challenges. Manually curating and
generating these datasets is not only time-consuming but also labor-
intensive, often requiring extensive resources and expertise.
Additionally, ensuring the diversity and contextual relevance of images
within a dataset adds another layer of complexity.
Objectives

Automate Dataset Creation
Develop a tool that automates the generation of large, diverse image datasets from a single input image.

Leverage AI Models
Utilize advanced AI models to convert images into descriptive text and generate similar text prompts.

Efficient Image Generation
Produce multiple high-quality images from each text prompt, ensuring diversity while maintaining contextual similarity.

Improved Efficiency
Significantly reduce the time and effort required to create large image datasets compared to manual methods.
Literature Review

1. CogView: Mastering Text-to-Image Generation via Transformers (Ming Ding, Zhuoyi Yang, Wenyi Hong, Wendi Zheng, 2021)
CogView introduces a 4-billion-parameter Transformer with a VQ-VAE tokenizer for text-to-image generation. The paper presents finetuning strategies for tasks such as style learning, super-resolution, text-image ranking, and fashion design, along with methods to stabilize pretraining. CogView achieves state-of-the-art FID on the blurred MS COCO dataset, surpassing GAN-based models and the similar DALL-E.

2. InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning (Wenliang Dai, Junnan Li, Dongxu Li, 2023)
This paper presents InstructBLIP, a vision-language instruction tuning framework built on pretrained BLIP-2 models. Through systematic experiments on 26 diverse datasets and the introduction of an instruction-aware Query Transformer, InstructBLIP achieves state-of-the-art zero-shot performance across a range of tasks, surpassing both BLIP-2 and the larger Flamingo models. It also demonstrates superior finetuning performance on individual downstream tasks and outperforms concurrent multimodal models.
Literature Review

3. I2T: Image Parsing to Text Description (Benjamin Z. Yao, Xiong Yang, Liang Lin, 2010)
This paper presents an I2T framework that generates text descriptions for images and videos by parsing them into visual patterns, converting those patterns into semantic representations, and producing human-readable text reports. It uses an and-or graph (AoG) visual knowledge representation to support image parsing and description generation.

4. BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation (Junnan Li, Dongxu Li, Caiming Xiong, 2022)
This paper introduces BLIP, a Vision-Language Pre-training (VLP) framework that excels at both understanding and generation tasks. Unlike existing models, BLIP makes effective use of noisy web data by bootstrapping captions: it generates synthetic captions and filters out the noisy ones. BLIP achieves state-of-the-art performance across vision-language tasks including image-text retrieval, image captioning, and VQA, and shows strong zero-shot generalization to video-language tasks.
Key Components

Image-to-Text Conversion
Converts the input image into descriptive text using advanced image captioning models.

Prompt Generation
Generates multiple similar text prompts based on the descriptive text, using Gemini 1.5 Pro, an advanced text generation model.

Image Generation
Creates multiple images from each generated text prompt using Stable Diffusion NightVision XL, a powerful image generation model.
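
One way these three components could be wired together with public libraries is sketched below. Everything here is an assumption rather than ImageGafter's actual code: the model IDs, the API key placeholder, the line-based prompt parsing, and the use of the base SDXL weights as a stand-in for the NightVision XL checkpoint.

import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
import google.generativeai as genai
from diffusers import StableDiffusionXLPipeline

# 1. Image-to-text: caption the seed image (BLIP is one common choice).
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
captioner = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
image = Image.open("seed.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt")
caption = processor.decode(captioner.generate(**inputs)[0], skip_special_tokens=True)

# 2. Prompt generation: ask Gemini 1.5 Pro for similar prompts.
genai.configure(api_key="YOUR_API_KEY")  # placeholder key
gemini = genai.GenerativeModel("gemini-1.5-pro")
reply = gemini.generate_content(
    f"Write 5 short image-generation prompts similar to: {caption}"
)
prompts = [line.strip() for line in reply.text.splitlines() if line.strip()]

# 3. Image generation: render each prompt with an SDXL checkpoint
#    (base weights here as a stand-in; assumes a CUDA GPU).
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
for i, prompt in enumerate(prompts):
    pipe(prompt).images[0].save(f"generated_{i}.png")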
Tools & Technologies used

Programming Machine Open source


Languages Learning Libraries
advanced machine ImageGafter
The ImageGafter
learning techniques
system is built using integrates with
for image captioning
a combination of It enables the system popular open-source
Python & to generate highly libraries such as
JavaScript scripts to realistic and diverse Pillow, PyTorch, and
handle various synthetic images,also request to provide
maintain high advanced computer
aspects of the image
standards of quality
generation and vision and image
and relevance in the
processing pipeline. generated image processing
datasets. capabilities.
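
As a small, self-contained illustration of two of these libraries working together, the snippet below fetches a source image over HTTP with requests and normalizes it with Pillow; the URL and target size are placeholders.

import io
import requests
from PIL import Image

# Download a source image (placeholder URL) and normalize it with Pillow.
resp = requests.get("https://example.com/sample.jpg", timeout=10)
resp.raise_for_status()

img = Image.open(io.BytesIO(resp.content)).convert("RGB")  # force 3 channels
img = img.resize((512, 512))  # a common input size for diffusion models
img.save("normalized_input.png")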
Architectural Diagram
Execution Flow

1. Input Acceptance: accepts an image as input.
2. Image Preprocessing: cleans, resizes, and augments the image (see the sketch below).
3. Prompt Generation: generates a number of descriptive prompts.
4. Image Generation: produces new synthetic images.
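
As a concrete sketch of step 2, the snippet below cleans, resizes, and lightly augments an input image with Pillow and torchvision (part of the PyTorch ecosystem listed earlier); the file names and augmentation parameters are illustrative.

from PIL import Image
from torchvision import transforms

# Step 2 sketch: normalize size and apply light augmentation.
augment = transforms.Compose([
    transforms.Resize((512, 512)),       # uniform size for later stages
    transforms.RandomHorizontalFlip(),   # simple geometric augmentation
    transforms.ColorJitter(0.2, 0.2),    # mild brightness/contrast variation
])

img = Image.open("input.jpg").convert("RGB")  # step 1: accept the input image
variants = [augment(img) for _ in range(4)]   # a few augmented copies
for i, v in enumerate(variants):
    v.save(f"augmented_{i}.png")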
Project Execution
Advantages

Diverse Dataset Generation
ImageGafter can generate a wide variety of unique and realistic images, expanding the dataset for training and testing AI models.

Customizable Synthetic Data
Ensures high-quality and contextually relevant image generation through advanced AI models, maintaining consistency across the dataset.

Time and Cost Savings
Generating synthetic data is faster and more cost-effective than manually collecting and annotating real-world images.
Future Scope

Expanding Horizons
Extend the tool’s applications to various fields such as healthcare, autonomous driving, and augmented reality, where large and diverse datasets are crucial.

Model Integration
Integrate more advanced models for image captioning and generation to improve the quality and diversity of the generated datasets.

Collaborative Ecosystems
Create a cloud-based platform for collaborative dataset generation, enabling multiple users to contribute and access datasets simultaneously.
Thank You
