Group No.17: Class-Ai - A Sub-Edi
2 | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | Wenliang Dai, Junnan Li, Dongxu Li, et al. | 2023 | This paper presents InstructBLIP, a novel vision-language instruction tuning approach, which leverages pretrained BLIP-2 models. Through systematic experimentation on 26 diverse datasets and the introduction of an instruction-aware Query Transformer, InstructBLIP achieves state-of-the-art zero-shot performance across various tasks, surpassing both BLIP-2 and larger Flamingo models. Additionally, it demonstrates superior finetuning performance on individual downstream tasks and outperforms concurrent multimodal models, showcasing its efficacy in addressing challenges in vision-language instruction tuning.
Literature Review
Sr no | Name | Author | Year | Summary
3 | I2T: Image Parsing to Text Description | Benjamin Z. Yao, Xiong Yang, Liang Lin | 2010 | This paper presents an I2T framework that generates text descriptions for images and videos by breaking them down into visual patterns, converting them into semantic representations, and generating human-readable text reports. It uses an And-Or Graph (AoG) visual knowledge representation to aid in image parsing and description generation.
4 | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language | Junnan Li, Dongxu Li, Caiming Xiong, et al. | 2022 | This paper introduces BLIP, a novel Vision-Language Pre-training (VLP) framework that excels in both understanding and generation tasks. Unlike existing models, BLIP effectively leverages noisy web data by bootstrapping captions, generating synthetic captions, and filtering out noise. BLIP achieves state-of-the-art performance across various vision-language tasks, including image-text retrieval, image captioning, and VQA. Additionally, it demonstrates strong generalization capabilities in zero-shot transfer to video-language tasks.
Key Components
1. Input Acceptance: Accepts an image as input
2. Image Preprocessing: Cleans, resizes, and augments images
3. Prompt Generation: Generates a number of descriptive prompts
4. Image Generation: Produces new synthetic images
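The four components above can be sketched as a small pipeline. This is a minimal illustration, not the project's actual implementation: images are modelled as plain nested lists of grayscale pixels, the prompt templates are invented for the example, and `placeholder_generator` is a hypothetical stand-in for whatever text-to-image model the project uses. All function names here are assumptions.

```python
import random

# Assumption: an image is a rectangular grid of grayscale pixels (0-255).
Image = list[list[int]]


def accept_image(pixels: Image) -> Image:
    """Step 1: accept an image, rejecting empty or ragged input."""
    if not pixels or not pixels[0]:
        raise ValueError("empty image")
    width = len(pixels[0])
    if any(len(row) != width for row in pixels):
        raise ValueError("rows have different lengths")
    return pixels


def preprocess(pixels: Image, size: int = 4) -> list[Image]:
    """Step 2: clean (clamp to 0-255), resize (nearest neighbour), augment (mirror)."""
    cleaned = [[min(255, max(0, p)) for p in row] for row in pixels]
    h, w = len(cleaned), len(cleaned[0])
    resized = [
        [cleaned[y * h // size][x * w // size] for x in range(size)]
        for y in range(size)
    ]
    flipped = [row[::-1] for row in resized]  # horizontal flip as one augmentation
    return [resized, flipped]


def generate_prompts(subject: str, count: int, seed: int = 0) -> list[str]:
    """Step 3: build `count` descriptive prompts from hypothetical templates."""
    rng = random.Random(seed)
    styles = ["a photo", "a close-up", "a wide-angle shot"]
    settings = ["in daylight", "at night", "in the rain"]
    return [
        f"{rng.choice(styles)} of {subject} {rng.choice(settings)}"
        for _ in range(count)
    ]


def placeholder_generator(prompt: str) -> Image:
    """Hypothetical stand-in for a text-to-image model: a flat 4x4 image."""
    shade = sum(prompt.encode()) % 256
    return [[shade] * 4 for _ in range(4)]


def generate_images(prompts: list[str], generator) -> list[Image]:
    """Step 4: produce one synthetic image per prompt."""
    return [generator(p) for p in prompts]


def run_pipeline(pixels: Image, subject: str, count: int = 3):
    """Chain the four steps; `subject` is assumed to come from a captioning step."""
    variants = preprocess(accept_image(pixels))
    prompts = generate_prompts(subject, count)
    images = generate_images(prompts, placeholder_generator)
    return variants, prompts, images
```

Calling `run_pipeline([[0, 300], [-5, 128]], "a red bicycle")` returns the preprocessed variants, the generated prompts, and one placeholder image per prompt. Passing the generator in as a callable keeps step 4 swappable, so a real model could replace the placeholder without touching the other steps.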
Project Execution
Advantages
Diverse Dataset Generation
ImageGafter can generate a wide variety of unique and realistic images, expanding the dataset for training and testing AI models.
Customizable Synthetic Data
Ensures high-quality and contextually relevant image generation through advanced AI models, maintaining consistency across the dataset.