MLSys Class LLM Introduction

The document introduces language models including BERT, GPT, and T5 which use techniques like masked language modeling, causal language modeling, and text-to-text transfer. It discusses how transformer models use attention and self-attention. The document compares BERT and GPT and explains how pretraining, fine-tuning, prompting, and reinforcement learning from human feedback are used. It raises questions about the advantages and disadvantages of different training methods, the role of systems research in scaling language models, security considerations, and improving energy efficiency.


Introduction to

Language Models
Eve Fleisig & Kayo Yin
CS 294-162
August 28, 2023
Language Modeling

Image credit: jalammar.github.io/illustrated-word2vec/


Masked Language Modeling
BERT

Image credit: jalammar.github.io/illustrated-bert/
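Since the slides illustrate masked language modeling only pictorially, here is a minimal Python sketch of the corruption step, assuming a toy whitespace-tokenized sentence and a generic [MASK] token; BERT's additional 80/10/10 mask/random/keep rule is omitted for brevity.

```python
import random

# Toy illustration of the masked language modeling (MLM) objective used by BERT:
# randomly replace ~15% of input tokens with [MASK] and train the model to
# recover the original tokens from bidirectional context.

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Return (corrupted_tokens, targets): targets[i] is the original token
    at masked positions and None elsewhere (no loss is taken there)."""
    corrupted, targets = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            corrupted.append(mask_token)
            targets.append(tok)      # the model must predict this token
        else:
            corrupted.append(tok)
            targets.append(None)     # unmasked position: no prediction target
    return corrupted, targets

sentence = "the cat sat on the mat".split()
corrupted, targets = mask_tokens(sentence)
print(corrupted)  # e.g. ['the', '[MASK]', 'sat', 'on', 'the', 'mat']
print(targets)    # e.g. [None, 'cat', None, None, None, None]
```

A bidirectional encoder then reads the corrupted sequence and is trained to recover the original tokens at the masked positions.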


Causal Language Modeling
GPT

Image credit: jalammar.github.io/illustrated-gpt2/
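For contrast, a minimal sketch of the causal (left-to-right) language modeling objective, assuming a toy vocabulary and a placeholder uniform next-token distribution standing in for a real model's output.

```python
import numpy as np

# Causal language modeling, as in GPT: at each position t, predict token x_t
# from the prefix x_<t only, and minimize the average negative log-likelihood.

vocab = ["the", "cat", "sat", "on", "mat", "<eos>"]

def next_token_probs(prefix):
    # Placeholder model: a uniform distribution over the vocabulary.
    return np.full(len(vocab), 1.0 / len(vocab))

def causal_lm_loss(tokens):
    """Average negative log-likelihood of each token given its prefix.
    The first token is skipped since it has no preceding context here."""
    nll = 0.0
    for t in range(1, len(tokens)):
        probs = next_token_probs(tokens[:t])
        nll -= np.log(probs[vocab.index(tokens[t])])
    return nll / (len(tokens) - 1)

print(causal_lm_loss(["the", "cat", "sat", "on", "the", "mat", "<eos>"]))
```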


BERT vs. GPT

● Bidirectional encoder models (BERT) do better than generative models at non-generation tasks, for comparable training data and model complexity.

● Generative models (GPT) have training efficiency and scalability advantages that may ultimately make them more accurate. They can also solve downstream tasks in a zero-shot setting.
Transformer

Image credit: jalammar.github.io/illustrated-transformer/


Transformer

Image credit: jalammar.github.io/illustrated-transformer/


Transformer

Image credit: jalammar.github.io/illustrated-transformer/


Attention
Self-Attention
Self-Attention

Image credit: jalammar.github.io/illustrated-gpt2/


Self-Attention

Image credit: jalammar.github.io/illustrated-gpt2/


Self-Attention
Self-Attention
Self-Attention
Self-Attention
Multi-headed Attention
Multi-headed Attention
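The self-attention and multi-headed attention slides can be condensed into a short NumPy sketch; the weight matrices here are random placeholders, and masking and output biases are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for one head.
    X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # (seq_len, d_head)

def multi_head_attention(X, heads, Wo):
    """Run several heads in parallel, concatenate, then project with Wo."""
    outputs = [self_attention(X, Wq, Wk, Wv) for (Wq, Wk, Wv) in heads]
    return np.concatenate(outputs, axis=-1) @ Wo

# Tiny random example: 4 tokens, d_model=8, 2 heads of size 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
heads = [tuple(rng.normal(size=(8, 4)) for _ in range(3)) for _ in range(2)]
Wo = rng.normal(size=(8, 8))
print(multi_head_attention(X, heads, Wo).shape)  # (4, 8)
```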
Transformer

Image credit: jalammar.github.io/illustrated-transformer/


Transformer Input
Transformer Encoder

Image credit: jalammar.github.io/illustrated-transformer/


Adding the Decoder

Image credit: jalammar.github.io/illustrated-transformer/
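A minimal NumPy sketch of one encoder layer, assuming post-layer-norm residual sublayers and a ReLU feed-forward network; dropout and positional encodings are omitted, and the decoder differences are only noted in comments.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu, var = x.mean(axis=-1, keepdims=True), x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def attention(X, W, mask=None):
    Wq, Wk, Wv = W
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    if mask is not None:                       # a decoder masks future positions
        scores = np.where(mask, scores, -1e9)
    return softmax(scores) @ V

def encoder_layer(X, attn_W, W1, b1, W2, b2):
    """One encoder layer: self-attention, then a position-wise feed-forward
    network, each wrapped in a residual connection and layer normalization."""
    X = layer_norm(X + attention(X, attn_W))            # self-attention sublayer
    ffn = np.maximum(0, X @ W1 + b1) @ W2 + b2          # FFN sublayer (ReLU)
    return layer_norm(X + ffn)

# A decoder layer (not shown) adds a causal mask to its self-attention and a
# cross-attention sublayer whose keys/values come from the encoder output.

rng = np.random.default_rng(0)
d, seq = 8, 4
X = rng.normal(size=(seq, d))
attn_W = tuple(rng.normal(size=(d, d)) for _ in range(3))
W1, b1 = rng.normal(size=(d, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, d)), np.zeros(d)
print(encoder_layer(X, attn_W, W1, b1, W2, b2).shape)  # (4, 8)
```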


BERT

Image credit: jalammar.github.io/illustrated-bert/


BERT
GPT
GPT
T5

Text-to-Text Transfer Transformer
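A small sketch of the text-to-text format: every task is cast as string-to-string with a short task prefix. The prefixes and examples below follow the style of the T5 paper's overview figure; the exact strings are illustrative.

```python
# T5 casts every task as text-to-text: inputs and outputs are both strings,
# and a task prefix tells the model which task to perform.

examples = [
    ("translate English to German: That is good.", "Das ist gut."),
    ("summarize: state authorities dispatched emergency crews tuesday ...",
     "emergency crews surveyed storm damage ..."),
    ("cola sentence: The course is jumping well.", "not acceptable"),
]

for source, target in examples:
    print(f"input : {source}")
    print(f"output: {target}\n")
```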


Pretraining & Fine-tuning
Pretraining & Fine-tuning
Pretraining & Fine-tuning

Unsupervised objective

Supervised objective
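A toy numeric sketch of the recipe, where the "model" is just a parameter vector and each objective is a quadratic stand-in; the only point it illustrates is that fine-tuning continues from the pretrained parameters rather than training from scratch.

```python
import numpy as np

def gradient_descent(params, grad_fn, steps=200, lr=0.1):
    """Plain gradient descent on a stand-in objective."""
    for _ in range(steps):
        params = params - lr * grad_fn(params)
    return params

pretrain_target = np.array([1.0, 2.0, 3.0])   # stand-in for the unsupervised objective
finetune_target = np.array([1.2, 1.8, 3.5])   # stand-in for the supervised objective

# Phase 1: pretraining on broad unlabeled data (unsupervised objective).
pretrained = gradient_descent(np.zeros(3), lambda p: p - pretrain_target)

# Phase 2: fine-tuning starts from the pretrained weights and adapts them
# with a (usually much smaller) labeled dataset (supervised objective).
finetuned = gradient_descent(pretrained, lambda p: p - finetune_target, steps=20)

print(pretrained.round(2), finetuned.round(2))
```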
Prefixes & Prompting
Few- & Zero-Shot Learning
Few- & Zero-Shot Learning
Few- & Zero-Shot Learning
Few- & Zero-Shot Learning

Generalization to new tasks without fine-tuning enabled by:

Scaling: data and compute
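A small sketch contrasting zero-shot and few-shot prompts, formatted in the style of the GPT-3 paper's translation examples; no model call is made here, the point is only how the task is specified in the prompt rather than through weight updates.

```python
# In-context learning: a zero-shot prompt gives only an instruction, while a
# few-shot prompt also includes a handful of worked examples before the query.

zero_shot = (
    "Translate English to French:\n"
    "cheese =>"
)

few_shot = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "plush giraffe => girafe peluche\n"
    "cheese =>"
)

print(zero_shot, "\n---\n", few_shot, sep="")
```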
Scaling Data
C4 dataset (a cleaned subset of Common Crawl): introduced with T5; still in use
GPT-3 Training Data:
Scaling Data & Compute

Kaplan et al., 2020; Hoffmann et al., 2022
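A sketch of the parametric loss form used by Hoffmann et al. (2022), with placeholder coefficients rather than the paper's fitted values, swept under the common C ≈ 6ND FLOPs approximation to show how a fixed compute budget trades model size against training tokens.

```python
import numpy as np

# Parametric scaling-law form from Hoffmann et al. (2022) ("Chinchilla"):
#     L(N, D) = E + A / N**alpha + B / D**beta
# for N parameters and D training tokens. Coefficients below are placeholders.

E, A, B, alpha, beta = 1.7, 400.0, 400.0, 0.34, 0.28

def predicted_loss(N, D):
    return E + A / N**alpha + B / D**beta

# For a fixed compute budget C ~ 6*N*D FLOPs, sweep model sizes and compare
# the predicted loss at the token count each size can afford.
C = 1e21
for N in np.logspace(8, 11, 7):            # 100M to 100B parameters
    D = C / (6 * N)                        # tokens affordable at this size
    print(f"N={N:.1e}  D={D:.1e}  L={predicted_loss(N, D):.3f}")
```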
Reinforcement Learning from Human Feedback
Reinforcement Learning from Human Feedback
Reinforcement Learning from Human Feedback
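A toy sketch of the reward-modeling step in RLHF, assuming a placeholder linear reward over hand-picked features and a Bradley-Terry style preference loss; the later policy-optimization stage (e.g., PPO with a KL penalty toward the supervised model) is only described in comments.

```python
import numpy as np

# RLHF pipeline, roughly: (1) collect human preference pairs, (2) fit a reward
# model so preferred responses score higher, (3) optimize the language model
# against that reward (e.g., with PPO), usually with a KL penalty keeping it
# close to the pretrained/supervised starting point.

def reward(prompt, response, w):
    """Placeholder reward model: a linear score over toy features.
    A real reward model is a learned network over (prompt, response)."""
    features = np.array([len(response.split()), response.count("!")])
    return float(w @ features)

def preference_loss(prompt, chosen, rejected, w):
    """Bradley-Terry style loss: -log sigmoid(r(chosen) - r(rejected))."""
    diff = reward(prompt, chosen, w) - reward(prompt, rejected, w)
    return float(np.log1p(np.exp(-diff)))

w = np.array([0.1, -0.5])
print(preference_loss("Explain RLHF.",
                      "It trains a reward model from human preferences.",
                      "It is magic!!",
                      w))
```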
Discussion
● What are the advantages and disadvantages of different training or tuning methods
that have been tried (task-specific training, pretrain/fine-tune, prompting, RLHF)?
● What is the role of systems research in scaling up LLMs? How could advances in
systems research change scaling “laws”?
● What security considerations do we need to consider when deploying LLMs into the
real world?
● How can we improve the energy efficiency and carbon footprint of LLMs?
