Introducing Multimodal Llama 3.2
Multimodal Prompting
Llama 3.2 vision models
Pretrained:
Instruction-tuned:
Text capabilities:
● Llama 3.2 11B ⇔ Llama 3.1 8B
● Llama 3.2 90B ⇔ Llama 3.1 70B
Supported languages:
● The same as Llama 3.1.
● For text-only tasks: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
● For image+text applications: English only.
Example
<|begin_of_text|>
<|start_header_id|>
user
<|end_header_id|>
<|eot_id|>
<|start_header_id|>
assistant
<|end_header_id|>
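The special tokens in this example can be assembled programmatically. The following is a minimal sketch, assuming the layout described in Meta's published Llama 3.1 and 3.2 prompt format (each header is followed by a blank line, and each message is closed by <|eot_id|>). The helper name format_prompt, the message dictionaries, and the sample question are hypothetical and chosen only for illustration.

```python
def format_prompt(messages):
    """Build a Llama 3 style chat prompt from a list of {"role", "content"} dicts.

    Hypothetical helper for illustration; not part of any official library.
    """
    prompt = "<|begin_of_text|>"
    for message in messages:
        # Each turn: a role header, a blank line, the content, then <|eot_id|>.
        prompt += f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
        prompt += f"{message['content']}<|eot_id|>"
    # Leave an open assistant header so the model writes the next turn.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


example_prompt = format_prompt(
    [{"role": "user", "content": "What is the capital of France?"}]
)
print(example_prompt)
```

The prompt ends with an open assistant header on purpose: generation continues from there until the model emits its own end-of-turn token.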
Use case
<|python_tag|>brave_search.call(query="current weather in San Francisco")
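An application that receives this kind of output has to recognize the <|python_tag|> prefix and extract the tool name and arguments before it can run anything. The sketch below shows one possible way to do that with Python's standard ast module; the function name parse_tool_call and its return shape are assumptions made for illustration, not part of any Llama library.

```python
import ast

PYTHON_TAG = "<|python_tag|>"


def parse_tool_call(model_output):
    """Return (tool_name, kwargs) for '<|python_tag|>tool.call(key="value")'.

    Returns None when the output is ordinary text rather than a tool call.
    Hypothetical helper; assumes keyword arguments with literal values only.
    """
    if not model_output.startswith(PYTHON_TAG):
        return None
    source = model_output[len(PYTHON_TAG):].strip()
    # Parse the call as an expression without executing it.
    call = ast.parse(source, mode="eval").body
    if not (
        isinstance(call, ast.Call)
        and isinstance(call.func, ast.Attribute)
        and isinstance(call.func.value, ast.Name)
    ):
        raise ValueError(f"unexpected tool call: {source!r}")
    # literal_eval only accepts literals, so arbitrary code is rejected.
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return call.func.value.id, kwargs


tool_call = parse_tool_call(
    '<|python_tag|>brave_search.call(query="current weather in San Francisco")'
)
print(tool_call)
```

Parsing instead of calling eval keeps the application in control: the model only proposes a call, and the surrounding code decides whether and how to execute it.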
Prompt Format
Roles supported by Llama 3.1 and 3.2
👩 "How many ‘r’s in Strawberry?"
tokenizer
[4438, 1690, 3451, 81, 753, 304, 73700, 30] 🦙
"GenAI is amazing!"
tokenizer
Parse
GenAI is amazing !
Lookup
🦙
30,000-50,000 word vocabulary
'!', '"', '#', '$', 'he', ' e', 'lo', …
' M', ' be', '<Props', ' famille'
100,000
'奧', ' университ', ' thăm', ' листопада', '२०'
Llama 3 family tokenization
The Llama 3 family uses the open-source tiktoken Byte Pair Encoding (BPE) tokenizer.
[15339, 1917]
tokenizer.decode([1917])
world
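To make the encode and decode steps concrete, here is a toy character-level BPE sketch in plain Python. It is not the tiktoken implementation (which works on bytes and pre-splits text with a regular expression), and the merge table is invented for this example. The two ids reuse the numbers shown above, under the assumption that they correspond to the text "hello world".

```python
# Toy merge rules, listed from highest to lowest priority (invented for this demo).
TOY_MERGES = [
    ("h", "e"), ("l", "l"), ("he", "ll"), ("hell", "o"),
    (" ", "w"), ("o", "r"), (" w", "or"), ("l", "d"), (" wor", "ld"),
]
# Toy vocabulary: only the final tokens needed here; ids assumed from the slide.
TOY_VOCAB = {"hello": 15339, " world": 1917}


def bpe_encode(text, merges, vocab):
    """Repeatedly merge the highest-priority adjacent pair, then look up ids."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    parts = list(text)  # start from single characters
    while len(parts) > 1:
        candidates = [
            (ranks[pair], i)
            for i, pair in enumerate(zip(parts, parts[1:]))
            if pair in ranks
        ]
        if not candidates:
            break  # no known pair left to merge
        _, i = min(candidates)  # lowest rank wins; ties go to the leftmost pair
        parts[i:i + 2] = [parts[i] + parts[i + 1]]
    return [vocab[part] for part in parts]


def bpe_decode(token_ids, vocab):
    """Map ids back to their text pieces and concatenate them."""
    id_to_text = {token_id: text for text, token_id in vocab.items()}
    return "".join(id_to_text[token_id] for token_id in token_ids)


token_ids = bpe_encode("hello world", TOY_MERGES, TOY_VOCAB)
print(token_ids)
print(repr(bpe_decode([1917], TOY_VOCAB)))
```

Note that the decoded piece carries its leading space: in this scheme the space belongs to the token that follows it, which is why joining the pieces restores the original text exactly.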
The steps below summarize how LLMs process input
In this lesson
Tools
User: "What is the current weather in San Francisco?" → Llama
Llama → <|python_tag|>brave_search.call(query="current weather in San Francisco")
Search → Search result 1, Search result 2, Search result 3, ...
"What is the current weather in San Francisco?" + search results → Llama
Llama → "It is 72F in San Francisco today"
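The tool-use round trip on this slide can be sketched as a small loop: the application sends the question, notices that the reply is a tool call, runs the tool, and hands the results back for a final answer. Everything below is a stand-in written for illustration: fake_llama and fake_search replace the real model and search service, run_turn is a hypothetical orchestrator, and the "ipython" role for tool results is assumed from the published Llama 3.1 prompt format.

```python
PYTHON_TAG = "<|python_tag|>"


def fake_llama(messages):
    """Stand-in for a model call: asks for a search first, then answers."""
    if messages[-1]["role"] == "ipython":
        return "It is 72F in San Francisco today"
    return PYTHON_TAG + 'brave_search.call(query="current weather in San Francisco")'


def fake_search(tool_call):
    """Stand-in for a search tool; returns placeholder results."""
    return ["Search result 1", "Search result 2", "Search result 3"]


def run_turn(user_text, model=fake_llama, tool=fake_search, max_tool_calls=3):
    """Run one user turn, executing tool calls until the model answers in text."""
    messages = [{"role": "user", "content": user_text}]
    reply = model(messages)
    calls = 0
    while reply.startswith(PYTHON_TAG) and calls < max_tool_calls:
        # Record the model's tool request, run the tool, and feed results back.
        messages.append({"role": "assistant", "content": reply})
        results = tool(reply[len(PYTHON_TAG):])
        messages.append({"role": "ipython", "content": "\n".join(results)})
        reply = model(messages)
        calls += 1
    messages.append({"role": "assistant", "content": reply})
    return messages


conversation = run_turn("What is the current weather in San Francisco?")
for message in conversation:
    print(f"{message['role']}: {message['content']}")
```

The max_tool_calls cap is a design choice for the sketch: with a real model, it prevents an endless loop if the model keeps requesting tools.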
Models work in a system
input → AI System
System Safeguard:
● Multilingual safety models
● A prompt injection filter
● Cybersecurity Evaluation Suite
Memory
Agent
Tools
🦙 Curate Data, Synthetic Datasets Generation, Finetune, Align, Evaluate, Inference, Monitoring & Human Feedback
Llama Stack APIs
Agentic Apps: End applications
Memory, Orchestrator
Data: Pretraining, preference, post training
Models: Core, safety, customized
Hardware: GPUs, accelerators, storage
Llama Guard 3 8B