Introducing Multimodal Llama 3.2

Uploaded by

aegr82

Multimodal Prompting
Llama 3.2 vision models

Pretrained:

● Llama-3.2 1B (text only)

● Llama-3.2 3B (text only)

● Llama-3.2 11B-Vision (text+image)

● Llama-3.2 90B-Vision (text+image)

Instruction-tuned:

● Llama-3.2 1B-Instruct (text only)

● Llama-3.2 3B-Instruct (text only)

● Llama-3.2 11B-Vision-Instruct (text+image)

● Llama-3.2 90B-Vision-Instruct (text+image)

Llama 3.2 text model

● Built on top of Llama 3.1 text-only models.

● The same tokenizer as Llama 3.1
● The same 128k context window.

Text capabilities:
● Llama 3.2 11B ⇔ Llama 3.1 8B
● Llama 3.2 90B ⇔ Llama 3.1 70B

Supported languages:
● The same as Llama 3.1.
● For text-only tasks: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
● For image+text applications: English only.
Example

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

<|image|>Describe this image in two sentences.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
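The example prompt above can be assembled programmatically. The following is a minimal sketch, not an official helper: the function name `build_vision_prompt` is hypothetical, and the two newlines after each `<|end_header_id|>` are an assumption about the exact whitespace the model expects.

```python
def build_vision_prompt(user_text: str) -> str:
    """Assemble a single-turn prompt with one image placeholder.

    Assumption: each role header is followed by two newlines.
    """
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"<|image|>{user_text}<|eot_id|>"
        # The open assistant header asks the model to write the reply.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


prompt = build_vision_prompt("Describe this image in two sentences.")
print(prompt)
```

The image itself is passed separately to the inference API; the `<|image|>` token only marks where it belongs in the text.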
Use cases

● Counting the llamas
● Plant recognition
● Tire pressure warning
● Dog breed recognition
● Analyzing multiple receipt images
● Interior design assistant
● Nutrition facts to JSON
● Convert diagram to code
● Convert plot to table
● Analyze fridge content
● Grade math homework

Use case: image to tool call

● Where is this place?
● What is the current weather at this place?

<|python_tag|>brave_search.call(query="current weather in San Francisco")

Prompt Format
Roles supported by Llama 3.1 and 3.2

● system - Sets the context in which to interact with the AI model. It typically includes rules, guidelines, or necessary information that helps the model respond effectively.

● user - Represents the human interacting with the model. It includes the inputs, commands, and questions to the model.

● ipython - A new role introduced in Llama 3.1. Semantically, this role means "tool". It marks messages that carry the output of a tool call when sent back to the model from the executor.

● assistant - Represents the response generated by the AI model based on the context provided in the 'system', 'ipython', and 'user' prompts.

Each role is set between the special tokens <|start_header_id|> and <|end_header_id|>.
"<|begin_of_text|>",
"<|end_of_text|>",
"<|reserved_special_token_0|>",
"<|reserved_special_token_1|>",
"<|finetune_right_pad_id|>",
"<|step_id|>",
"<|start_header_id|>",
"<|end_header_id|>",
"<|eom_id|>",
"<|eot_id|>",
"<|python_tag|>",
Special tokens for single-turn and multi-turn chat

1. <|begin_of_text|>: Start of a prompt.

2. <|start_header_id|>: Start of the role for a particular message. Possible roles are: system, user, assistant, and ipython.

3. <|end_header_id|>: End of the role for a particular message.

4. <|eot_id|>: End of a turn, which can be the end of the model's interaction with the user or a tool.
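To show how these four tokens fit together in a multi-turn conversation, here is a minimal sketch. The `format_chat` function and the sample messages are hypothetical, and the two newlines after each role header are an assumption about the expected whitespace.

```python
from typing import Dict, List


def format_chat(messages: List[Dict[str, str]]) -> str:
    """Render role/content messages and end with an open assistant header."""
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        # The role name sits between the two header tokens.
        parts.append(f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n")
        # Every completed turn is closed with <|eot_id|>.
        parts.append(f"{msg['content']}<|eot_id|>")
    # Leave the assistant header open so the model writes the next turn.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How many 'r's in Strawberry?"},
    {"role": "assistant", "content": "There are three."},
    {"role": "user", "content": "And in 'berry'?"},
]
prompt = format_chat(messages)
print(prompt)
```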
Special tokens for tool calling

5. <|eom_id|>: End of message. A message represents a possible stopping point where the model can inform the execution environment that a tool call needs to be made.

6. <|python_tag|>: A special tag used in the model's response to signify a tool call.
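An execution environment has to detect `<|python_tag|>` in the model's response and extract the call. The sketch below handles only calls shaped like the `brave_search.call(query="...")` example in these slides; the function name and the regular expression are illustrative assumptions, not part of any Llama library.

```python
import re
from typing import Optional, Tuple

# Matches: <|python_tag|>TOOL.call(query="QUERY") with an optional <|eom_id|>.
TOOL_CALL_RE = re.compile(
    r'^<\|python_tag\|>(\w+)\.call\(query="(.*)"\)(?:<\|eom_id\|>)?$',
    re.DOTALL,
)


def parse_builtin_tool_call(response: str) -> Optional[Tuple[str, str]]:
    """Return (tool_name, query) for a tool call, or None for plain text."""
    match = TOOL_CALL_RE.match(response.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


response = (
    '<|python_tag|>brave_search.call('
    'query="current weather in San Francisco")<|eom_id|>'
)
print(parse_builtin_tool_call(response))
print(parse_builtin_tool_call("It is sunny today."))
```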
Special tokens for fine-tuning and base model

7. <|finetune_right_pad_id|>: Used for padding text sequences in a batch to the same length.

8. <|end_of_text|>: Model will cease to generate more tokens after this. This token is generated only by the base models.
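Right-padding a batch can be sketched as follows. The helper `pad_right` is hypothetical, and the value of `PAD_ID` is an assumption; look up the actual ID of `<|finetune_right_pad_id|>` in your tokenizer before using it.

```python
from typing import List, Tuple

# Assumed ID for <|finetune_right_pad_id|>; verify against your tokenizer.
PAD_ID = 128004


def pad_right(batch: List[List[int]], pad_id: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad every sequence on the right to the length of the longest one.

    Returns the padded batch and a mask (1 = real token, 0 = padding).
    """
    max_len = max(len(seq) for seq in batch)
    padded = [seq + [pad_id] * (max_len - len(seq)) for seq in batch]
    mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in batch]
    return padded, mask


batch = [
    [15339, 1917],                                  # "hello world"
    [4438, 1690, 3451, 81, 753, 304, 73700, 30],    # the Strawberry question
]
padded, mask = pad_right(batch, PAD_ID)
print(padded)
print(mask)
```

The mask lets a training loop ignore the padded positions when computing the loss.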
Tokenization

👩 "How many 'r's in Strawberry?" → tokenizer → [4438, 1690, 3451, 81, 753, 304, 73700, 30] → 🦙

"GenAI is amazing!" → tokenizer
● Parse: GenAI is amazing !
● Lookup: [10172, 15836, 374, 8056, 758]

deeplearning.ai/short-courses/retrieval-optimization-from-tokenization-to-vector-quantization
Vocabulary

🦙 128,000-token vocabulary (100,000 + 28,000), compared with a 30,000-50,000 word vocabulary for a person.

Sample tokens: '!', '"', '#', '$', 'he', ' e', 'lo', …, ' M', ' be', 'ers', '<Props', ' famille', ' Helmet', ' obscured', 'ertiary', ' landslide', 'athi', ' bedside', ' cultivate', ' barang', '-elected', ' ceramics', ' імені', '奧', ' университ', ' thăm', ' листопада', '२०'
Llama 3 family tokenization

The Llama 3 family uses the open-source tiktoken Byte Pair Encoding (BPE) tokenizer.

The family has a vocabulary of 128K tokens, four times the size of Llama 2's. For reference, most people use a vocabulary of 30-50 thousand words, so it is big. Compared to the Llama 2 tokenizer, the new tokenizer improves the compression rate on a sample of English data from roughly 3.2 to 4 characters per token, so a given prompt requires fewer tokens, which speeds up inference.
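The effect of the better compression rate can be checked with simple arithmetic. This sketch uses only the two characters-per-token figures quoted above; the 10,000-character prompt length is an arbitrary illustration.

```python
def tokens_needed(num_chars: int, chars_per_token: float) -> int:
    """Estimate the token count for a prompt of num_chars characters."""
    return round(num_chars / chars_per_token)


prompt_chars = 10_000  # arbitrary example length
llama2_tokens = tokens_needed(prompt_chars, 3.2)
llama3_tokens = tokens_needed(prompt_chars, 4.0)
saving = 1 - llama3_tokens / llama2_tokens

print(f"Llama 2 tokenizer: ~{llama2_tokens} tokens")
print(f"Llama 3 tokenizer: ~{llama3_tokens} tokens")
print(f"Reduction: {saving:.0%}")
```

Under these figures the same English prompt needs about a fifth fewer tokens.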

In the 128K tokens:

● 100K tokens are from the tiktoken tokenizer.

● 28K additional tokens better support non-English languages.
tokenizer.encode("hello world")

[15339, 1917]

tokenizer.decode([1917])

world
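The encode and decode calls above can be imitated with a toy tokenizer. This is a sketch only: it uses a two-entry vocabulary borrowed from the example and greedy longest-match lookup, which is simpler than real Byte Pair Encoding. `TOY_VOCAB`, `encode`, and `decode` are hypothetical names, not the Llama tokenizer API.

```python
from typing import Dict, List

# Two-entry toy vocabulary using the IDs from the example above.
TOY_VOCAB: Dict[str, int] = {"hello": 15339, " world": 1917}
ID_TO_TOKEN: Dict[int, str] = {tid: tok for tok, tid in TOY_VOCAB.items()}


def encode(text: str) -> List[int]:
    """Greedy longest-match encoding (a simplification, not real BPE)."""
    ids: List[int] = []
    pos = 0
    while pos < len(text):
        candidates = [tok for tok in TOY_VOCAB if text.startswith(tok, pos)]
        if not candidates:
            raise ValueError(f"no token covers the text at position {pos}")
        best = max(candidates, key=len)
        ids.append(TOY_VOCAB[best])
        pos += len(best)
    return ids


def decode(ids: List[int]) -> str:
    """Map IDs back to their token strings and join them."""
    return "".join(ID_TO_TOKEN[tid] for tid in ids)


print(encode("hello world"))
print(repr(decode([1917])))
```

In this toy vocabulary the second token carries its leading space, so decoding it alone returns " world" rather than "world".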
How LLMs process input

1. Text → Tokens: The raw input text is split into tokens using a tokenizer.
2. Tokens → Integer IDs: Each token is mapped to an integer ID from the tokenizer model's vocabulary.
3. Integer IDs → Embeddings: These integer IDs are then converted into embeddings, which are the real inputs to LLMs.
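The last step, from integer IDs to embeddings, is a table lookup. The sketch below uses a tiny random table in plain Python to illustrate the idea; the table contents, `EMBED_DIM`, and the `embed` helper are made up for illustration, and real models use learned vectors with thousands of dimensions.

```python
import random
from typing import Dict, List

EMBED_DIM = 4  # tiny on purpose; real embeddings are far wider
token_ids = [15339, 1917]  # "hello world" from the tokenizer example

# Stand-in for a learned embedding matrix: one random vector per token ID.
rng = random.Random(0)
embedding_table: Dict[int, List[float]] = {
    tid: [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for tid in token_ids
}


def embed(ids: List[int]) -> List[List[float]]:
    """Look up one vector per token ID."""
    return [embedding_table[tid] for tid in ids]


vectors = embed(token_ids)
for tid, vec in zip(token_ids, vectors):
    print(tid, [round(x, 3) for x in vec])
```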
Tool Calling

● Accessing real time info

● Performing complex math or code tasks

● Interacting with external data and systems

● Building dynamic agents

In this lesson

● Llama's special tokens and role for tool calling.

● Built-in tool calling: search, Wolfram Alpha, and code interpreter.

● Custom tool calling

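Use case: image to tool call

[Figure: tool calling flow between the user, Llama, and tools]
1. User → Llama: "What is the current weather in San Francisco"
2. Llama → Tools: <|python_tag|>brave_search.call(query="current weather in San Francisco")
3. Tools → Llama: Search result 1, Search result 2, Search result 3, ..., sent together with the original question "What is the current weather in San Francisco"
4. Llama → User: "It is 72F in San Francisco today"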
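The second request in this flow, where the tool output goes back to the model, can be sketched as a prompt builder. The function name is hypothetical, the JSON payload is a placeholder built from the figure's labels, and the two newlines after each header are an assumption about the expected whitespace.

```python
import json


def build_tool_result_prompt(question: str, tool_call: str, tool_output: str) -> str:
    """Replay the conversation so far and append the tool output as 'ipython'."""
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{question}<|eot_id|>"
        # The model's earlier tool call ends with <|eom_id|>, not <|eot_id|>.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
        f"<|python_tag|>{tool_call}<|eom_id|>"
        # The executor returns the tool result under the ipython role.
        f"<|start_header_id|>ipython<|end_header_id|>\n\n{tool_output}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


question = "What is the current weather in San Francisco"
tool_call = 'brave_search.call(query="current weather in San Francisco")'
# Placeholder payload; a real search tool returns its own structure.
tool_output = json.dumps(
    {"results": ["Search result 1", "Search result 2", "Search result 3"]}
)

prompt = build_tool_result_prompt(question, tool_call, tool_output)
print(prompt)
```

The model would then continue from the open assistant header with an answer such as the "It is 72F in San Francisco today" reply shown in the figure.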
Models work in a system

[Figure: LLM reference system, showing input, system safeguards, an agent, tools, and memory]

AI system safeguards:
● Multilingual safety models
● A prompt injection filter
● Cybersecurity Evaluation Suite

Agentic systems require:
● External tool use
● Memory

Model Lifecycle

[Figure: cycle around the Llama model with these stages]
● Synthetic Data Generation
● Curate Datasets
● Finetune
● Align
● Evaluate
● Inference
● Monitoring & Human Feedback
Llama Stack APIs

● Agentic Apps: End applications
● Agentic System API: System component orchestration (PromptStore, Assistant, Shields, Memory, Orchestrator)
● Model Toolchain API: Model development & production tools (Batch Inference, Realtime Inference, Quantized Inference, Continual Pretraining, Evals, Fine Tuning, Pretraining, Reward Scoring, Synthetic Data Gen)
● Data: Pretraining, preference, post training
● Models: Core, safety, customized
● Hardware: GPUs, accelerators, storage
Llama Guard 3 8B

Supports 14 hazard categories:

* S1: Violent Crimes.

* S2: Non-Violent Crimes.
* S3: Sex Crimes.
* S4: Child Exploitation.
* S5: Defamation.
* S6: Specialized Advice.
* S7: Privacy.
* S8: Intellectual Property.
* S9: Indiscriminate Weapons.
* S10: Hate.
* S11: Self-Harm.
* S12: Sexual Content.
* S13: Elections.
* S14: Code Interpreter Abuse.

Llama Guard 3 8B is multilingual
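An application that calls Llama Guard typically needs to turn the model's verdict into category names. The sketch below assumes the verdict is the word "safe", or the word "unsafe" followed by a line of comma-separated category codes; that format and the `parse_guard_output` helper are assumptions to check against the model card.

```python
from typing import List, Tuple

HAZARD_CATEGORIES = {
    "S1": "Violent Crimes", "S2": "Non-Violent Crimes", "S3": "Sex Crimes",
    "S4": "Child Exploitation", "S5": "Defamation", "S6": "Specialized Advice",
    "S7": "Privacy", "S8": "Intellectual Property",
    "S9": "Indiscriminate Weapons", "S10": "Hate", "S11": "Self-Harm",
    "S12": "Sexual Content", "S13": "Elections",
    "S14": "Code Interpreter Abuse",
}


def parse_guard_output(output: str) -> Tuple[bool, List[str]]:
    """Return (is_safe, violated category names) from a verdict string."""
    lines = [line.strip() for line in output.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty verdict")
    if lines[0].lower() == "safe":
        return True, []
    # Assumed layout: second line lists codes such as "S1,S10".
    codes = lines[1].split(",") if len(lines) > 1 else []
    names = [HAZARD_CATEGORIES.get(c.strip(), c.strip()) for c in codes]
    return False, names


print(parse_guard_output("safe"))
print(parse_guard_output("unsafe\nS1,S10"))
```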

Model Toolchain API

Model development & production tools: Batch Inference, Realtime Inference, Quantized Inference, Continual Pretraining, Evaluations, Finetuning, Pretraining, Reward Scoring, Synthetic Data Gen

● Inference - to support serving the models for applications
● Datasets - to support creating training and evaluation datasets
● Finetuning - to support creating and managing supervised finetuning (SFT) or preference optimization jobs
● Evaluations - to support creating and managing evaluations for capabilities like question answering, summarization, or text generation
● Synthetic Data Generation - to support generating synthetic data using a data generation model and a reward model
● Reward Scoring - to support synthetic data generation
Agentic System API

System component orchestration: PromptStore, Assistant, Shields, Memory, Orchestrator

● memory - to support creating multiple repositories of data that can be available for agentic systems
● agentic_system - to support creating and running agentic systems
● The sub-APIs support the creation and management of the steps, turns, and sessions within agentic applications.
○ step - there can be inference, memory retrieval, tool call, or shield call steps
○ turn - each turn begins with a user message and results in a loop consisting of multiple steps, followed by a response back to the user
○ session - each session consists of multiple turns that the model is reasoning over
○ memory_bank - a memory bank allows for the agentic system to perform retrieval augmented generation
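The step, turn, and session hierarchy described above can be modeled with a few plain data classes. This is an illustrative sketch: the class and field names are hypothetical and are not the Llama Stack API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class StepType(Enum):
    INFERENCE = "inference"
    MEMORY_RETRIEVAL = "memory_retrieval"
    TOOL_CALL = "tool_call"
    SHIELD_CALL = "shield_call"


@dataclass
class Step:
    type: StepType
    detail: str


@dataclass
class Turn:
    """Starts with a user message, loops over steps, ends with a response."""
    user_message: str
    steps: List[Step] = field(default_factory=list)
    response: str = ""


@dataclass
class Session:
    """A sequence of turns that the model reasons over."""
    turns: List[Turn] = field(default_factory=list)


turn = Turn(user_message="What is the current weather in San Francisco")
turn.steps.append(Step(StepType.SHIELD_CALL, "check the user input"))
turn.steps.append(Step(StepType.INFERENCE, "model emits a tool call"))
turn.steps.append(
    Step(StepType.TOOL_CALL, 'brave_search.call(query="current weather in San Francisco")')
)
turn.steps.append(Step(StepType.INFERENCE, "model writes the final answer"))
turn.response = "It is 72F in San Francisco today"

session = Session(turns=[turn])
print([step.type.value for step in session.turns[0].steps])
```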
Llama Guard

Llama Guard 3 models:

● 8B - fine-tuned on Llama 3.1 8B. It provides industry-leading system-level safety performance and is recommended for deployment along with Llama 3.1.
● 11B Vision - fine-tuned on the Llama 3.2 vision model. It is designed to support image reasoning use cases and optimized to detect harmful multimodal (text and image) prompts and text responses to these prompts.
● 1B - a lightweight input and output moderation model, optimized for deployment on mobile devices.
