# LLaVA - Large Multimodal Model

Large Language Models (LLMs) allow us to generate text, but they only take text as an input. Large Multimodal Models (LMMs) can take both text and images as input and generate text based on both. So, you can chat with your model about an image.

OpenAI has released their GPT-4V(ision)[1] model, which integrates nicely with the ChatGPT interface. However, open-source models are on the way, and LLaVA is one of them.

> In this part, we will be using a Jupyter Notebook to run the code. If you prefer to follow along, you can find the notebook on GitHub: GitHub Repository

## What is LLaVA?

LLaVA[2], a Large Multimodal Model (LMM), allows you to have image-based conversations. Similar to GPT-4V but without the price tag, LLaVA is free and open source.

> LLaVA represents a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding, achieving impressive chat capabilities mimicking spirits of the multimodal GPT-4 and setting a new state-of-the-art accuracy on Science QA.

So, LLaVA combines a vision encoder and an open-source LLM (Vicuna in this case).

## LLaVA 1.5

The LLaVA-1.5[3] model offers a solid improvement on all benchmarks compared to the original model. It is trained on 1.2M data points, adds an academic-task-oriented VQA dataset, and trains in ~1 day on a single 8-A100 node.

We're going to use the 13B model checkpoint and load it with the `llava-torch` library in 4-bit format. How good is it? Let's find out.

## Setup

Setting up the LLaVA library requires installing the following dependencies:

```bash
pip install -Uqqq pip --progress-bar off
pip install -qqq torch==2.1 --progress-bar off
pip install -qqq transformers==4.34.1 --progress-bar off
pip install -qqq accelerate==0.23.0 --progress-bar off
pip install -qqq bitsandbytes==0.41.1 --progress-bar off
pip install -qqq llava-torch==1.1.1 --progress-bar off
```

The last package, `llava-torch`, is the LLaVA library.
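The 4-bit checkpoint we'll load below relies on `bitsandbytes`, which requires a CUDA GPU. This check isn't part of the original notebook; it's just a minimal sanity-check sketch before loading the model:

```python
import torch

# The 4-bit checkpoint is loaded via bitsandbytes and needs a CUDA GPU,
# so confirm one is visible and check how much memory it has.
assert torch.cuda.is_available(), "A CUDA-capable GPU is required for 4-bit loading"
print(torch.cuda.get_device_name(0))
print(f"{torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB of VRAM")
```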
Let's add the necessary imports:

```python
import textwrap
from io import BytesIO

import requests
import torch
from llava.constants import DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX
from llava.conversation import SeparatorStyle, conv_templates
from llava.mm_utils import (
    KeywordsStoppingCriteria,
    get_model_name_from_path,
    process_images,
    tokenizer_image_token,
)
from llava.model.builder import load_pretrained_model
from llava.utils import disable_torch_init
from PIL import Image

disable_torch_init()
```

## Data

To reproduce the results, we need to download the following images:

```bash
!gdown AnpSrAod-apd1@D305XXQhjMa2ja7FEH
!gdown 1Qnutc8S7F6jMNERKIZBgiAePynDC}3Ti
!gdown 1XH7QgiuN}7Kjapaetjy#x"VWSdQaqsaH
!gdown 1n9v8EVZ16sYcULCGUHBPFULxFxam190U
!gdown 1x7XtPRG-IbSxyCO-ZT0_P7JirwRFY-3N
```

## Download Model

We'll use the 13B model checkpoint and load it with the `llava-torch` library in 4-bit format. Let's start by taking its name:

```python
MODEL = "4bit/llava-v1.5-13b-3GB"
model_name = get_model_name_from_path(MODEL)
model_name
```

```
'llava-v1.5-13b-3GB'
```

To load the model, tokenizer, and image processor, we can use the `load_pretrained_model` helper function:

```python
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=MODEL, model_base=None, model_name=model_name, load_4bit=True
)
```

## Image Preprocessing and Prompt

We need a way to load the image and process it for the model. Let's create a helper function for loading the image using PIL:

```python
def load_image(image_file):
    # Download the image if given a URL, otherwise read it from disk
    if image_file.startswith("http://") or image_file.startswith("https://"):
        response = requests.get(image_file)
        image = Image.open(BytesIO(response.content)).convert("RGB")
    else:
        image = Image.open(image_file).convert("RGB")
    return image
```

The function will load a local file or download it from a URL (via the `requests` library). Next, we'll create a function that will process the image for the model:

```python
def process_image(image):
    # Pad the image to a square aspect ratio, then preprocess it for the vision encoder
    args = {"image_aspect_ratio": "pad"}
    image_tensor = process_images([image], image_processor, args)
    return image_tensor.to(model.device, dtype=torch.float16)
```

Let's try it out:

```python
image = load_image("bike-girl.jpeg")
processed_image = process_image(image)
type(processed_image), processed_image.shape
```

```
(torch.Tensor, torch.Size([1, 3, 336, 336]))
```

The functions load the image and process it for the model by converting it into a tensor. Next, we'll create a function that builds a prompt using the correct template:

```python
CONV_MODE = "llava_v0"


def create_prompt(prompt: str):
    conv = conv_templates[CONV_MODE].copy()
    roles = conv.roles
    # Prepend the image placeholder token to the user prompt
    prompt = DEFAULT_IMAGE_TOKEN + "\n" + prompt
    conv.append_message(roles[0], prompt)
    conv.append_message(roles[1], None)
    return conv.get_prompt(), conv


prompt, _ = create_prompt("Describe the image")
print(prompt)
```

The function takes care of any special tokens and adds roles to the prompt. Here's the final template:

```
A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.###Human: <image>
Describe the image###Assistant:
```

We have a prompt and a way to process the image.
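Before wrapping everything in a single helper, it's worth seeing what `tokenizer_image_token` does with the prompt we just built: it tokenizes the text and replaces the `<image>` placeholder with the special `IMAGE_TOKEN_INDEX` id, which the model later swaps for the encoded image features. A minimal sketch (not in the original notebook), reusing the tokenizer and `create_prompt` from above:

```python
prompt, _ = create_prompt("Describe the image")

# Tokenize the prompt; the <image> placeholder becomes IMAGE_TOKEN_INDEX (-200)
input_ids = tokenizer_image_token(
    prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt"
)

print(input_ids.shape)                                 # 1-D tensor of token ids
print((input_ids == IMAGE_TOKEN_INDEX).sum().item())   # exactly one image placeholder
```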
Let's create a function that will ask the model a question about the image:

```python
def ask_image(image: Image, prompt: str):
    image_tensor = process_image(image)
    prompt, conv = create_prompt(prompt)
    input_ids = (
        tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt")
        .unsqueeze(0)
        .to(model.device)
    )

    # Stop generating once the conversation separator is produced
    stop_str = conv.sep if conv.sep_style != SeparatorStyle.TWO else conv.sep2
    stopping_criteria = KeywordsStoppingCriteria(
        keywords=[stop_str], tokenizer=tokenizer, input_ids=input_ids
    )
    with torch.inference_mode():
        output_ids = model.generate(
            input_ids,
            images=image_tensor,
            do_sample=True,
            temperature=0.01,
            max_new_tokens=512,
            use_cache=True,
            stopping_criteria=[stopping_criteria],
        )
    # Decode only the newly generated tokens (skip the prompt)
    return tokenizer.decode(
        output_ids[0, input_ids.shape[1] :], skip_special_tokens=True
    ).strip()
```

The function takes care of the following: creating the prompt, tokenizing it, generating the output, and decoding it. The interface is very similar to other generative models from the HuggingFace developers.

## Q&A Over Image

Let's load our first image:

*Girl on a bike*

We can start with a simple question:

```python
result = ask_image(image, "Describe the image")
print(textwrap.fill(result, width=110))
```

```
The image features a woman sitting on a motorcycle, which is parked on a brick driveway in front of a house.
She is wearing a black leather outfit, which includes a leather jacket and leggings. The motorcycle is
positioned prominently in the scene, with the woman sitting comfortably on it. The house in the background
adds a sense of context to the scene, suggesting that the woman may be preparing to ride the motorcycle or
has just arrived at her destination.
```

The description is quite detailed and good overall. Let's ask something more specific:

```python
result = ask_image(image, "Does the woman wear a helmet?")
print(textwrap.fill(result, width=110))
```

```
Yes, the woman is wearing a helmet while sitting on the motorcycle.
```

The model has failed to answer the question correctly. Let's ask something similar, but try to make the model reason about the image:

```python
result = ask_image(
    image,
    "Take a look at the woman's head. What is the color of her skin? Does she wear a helmet?",
)
print(textwrap.fill(result, width=110))
```

```
The woman's skin color is white, and she is not wearing a helmet.
```

This time around the model has answered correctly. Asking it to focus on the woman's head and the color of her skin helped us get a correct response.
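Since every experiment below follows the same pattern — ask a question, then pretty-print the wrapped answer — a tiny convenience wrapper (not part of the original notebook) keeps the cells shorter. The remaining examples stick to calling `ask_image` directly, as the post does:

```python
def ask_and_print(image: Image, prompt: str, width: int = 110):
    # Ask the model about the image and pretty-print the wrapped answer
    result = ask_image(image, prompt)
    print(textwrap.fill(result, width=width))
    return result
```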
## OCR & Document Understanding

Let's try something more challenging. Can the model read and understand documents? We'll use the following image from the Bitcoin whitepaper:

*First page of the Bitcoin paper*

```python
%%time
result = ask_image(image, "What is the title of the paper?")
print(textwrap.fill(result, width=110))
```

```
Bitcoin: A Peer-to-Peer Electronic Cash System
```

Great, the model has correctly extracted the title of the paper. Let's see if it can extract the abstract:

```python
%%time
result = ask_image(image, "Extract the text from the abstract")
print(textwrap.fill(result, width=110))
```

```
Bitcoin: A Peer-to-Peer Electronic Cash System
```

It got that wrong. It extracted the title again, but nothing from the abstract. Again, we can try to make the model reason about the image by asking for a summary of the abstract:

```python
%%time
result = ask_image(image, "Summarize the abstract of the paper in 2 sentences.")
print(textwrap.fill(result, width=110))
```

```
The paper discusses the concept of a peer-to-peer electronic cash system, focusing on the Bitcoin system. It
highlights the advantages of this system, such as its decentralized nature, security, and potential for
financial inclusion. The paper also addresses some of the challenges and limitations of the Bitcoin system,
such as scalability and regulatory issues.
```

Much better! LLaVA has correctly extracted the abstract and summarized it in 2 sentences.
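When free-form extraction fails, as it did for the abstract above, one thing worth trying (not covered in the original post) is to crop the region of interest before asking, so the 336x336 model input isn't dominated by the rest of the page. The crop coordinates below are hypothetical placeholders; you'd adjust them to the abstract's actual bounding box:

```python
# Hypothetical crop of the abstract region (left, upper, right, lower in pixels).
# PIL's Image.crop returns a new image that we can pass straight to ask_image.
abstract_region = image.crop((150, 380, 1100, 700))
result = ask_image(abstract_region, "Extract the text from this paragraph")
print(textwrap.fill(result, width=110))
```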
## Price Chart

We can also ask the model to reason about charts. Let's try with the following Bitcoin price chart:

*Bitcoin price chart*

```python
result = ask_image(
    image,
    "This is a chart of Bitcoin price. What is the current price according to the chart?",
)
print(textwrap.fill(result, width=110))
```

```
The current price of Bitcoin according to the chart is $23,000.
```

It got that wrong. It wasn't able to read the correct value from the chart ($28.9k).

## Captcha

Another interesting use case is to ask the model to solve a captcha. Let's try with something simple:

*Captcha*

```python
%%time
result = ask_image(image, "Extract the text from the image")
print(textwrap.fill(result, width=110))
```

Total failure, it didn't even get the number of characters right.

## Meme

Our final experiment will be to ask the model to reason about a meme. Let's try with the following one:

*Meme*

```python
%%time
result = ask_image(image, "Is this funny and why?")
print(textwrap.fill(result, width=110))
```

```
Yes, this image is funny because it humorously represents the process of learning by showing a person's
brain going through different stages of learning. The image features a series of four pictures of a brain,
each representing a different stage of learning, such as from university, online courses, YouTube, and
articles. The visual representation of the brain's journey through these stages is exaggerated and comical,
making it a light-hearted and entertaining image.
```

The model has correctly identified the meme as funny but has provided a very generic answer. It didn't note the different sources of education and the funny side of their ranking. Let's specifically ask for the ranking:

```python
%%time
result = ask_image(
    image,
    "Order all learning resources sorted by usefulness in a list, according to the image",
)
print(textwrap.fill(result, width=110))
```

```
1. Online Courses
2. YouTube
3. University
4. Articles
5. Memes
```

This one is interesting. I would say that the model didn't get the ranking right. It has put memes at the bottom, but according to the image, they are the best. The model has correctly identified the different sources of education (the OCR did work), but it didn't get the ranking right. Keep in mind that this particular meme might've been included in the training set.

## Conclusion

While the LLaVA model can be used to understand images, it is not perfect. It can be used to extract text from images, summarize, and describe, but it struggles with more complex reasoning. However, it is a great start, and I'm looking forward to seeing more open-source LMMs, possibly beating GPT-4V (and other commercial) models.

## References

1. GPT-4V(ision) system card
2. Visual Instruction Tuning
3. Improved Baselines with Visual Instruction Tuning

© 2020-2024 MLExpert™ by Venelin Valkov. All Rights Reserved.
