
UNIVERSIDADE FEDERAL
DE MINAS GERAIS

Advanced Seminars on Large Language Models

Introduction to LLMs
Rodrygo L. T. Santos
[email protected]
[Image: silhouette of a human female on the left and a humanoid AI on the right; a white wire connects their brains through their mouths, symbolizing communication. By DALL·E 3]
Language

A natural ability for humans


◦ Effortless use for communication
◦ Expressive of thoughts, emotions, instructions
A challenge for machines
◦ Ambiguity, context-dependency, nuanced semantics
A milestone towards AGI?
[Images omitted; credits: Shutterstock, Forbes, The Verge, The Register; illustrations by Codex and DALL·E 3]
[Video prompt: "Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee." By SORA]
Language model

A probability distribution over word sequences
◦ P("Today is Wednesday") ≈ 0.001
◦ P("Today Wednesday is") ≈ 0.0000000000001
◦ P("The eigenvalue is positive") ≈ 0.00001
Also a mechanism for "generating" text
◦ P("Wednesday" | "Today is") > P("blah" | "Today is")
(a minimal scoring/generation sketch follows below)
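To make both points concrete, here is a minimal, illustrative sketch: a tiny hand-specified bigram table plays the role of the language model, scoring word sequences and picking a likely next word. All probability values are made up for illustration.

```python
# A minimal sketch of a language model as a probability distribution over
# word sequences, using a tiny hand-specified bigram table. The values
# below are illustrative, not estimated from any corpus.

bigram = {
    ("<s>", "today"): 0.1, ("today", "is"): 0.5, ("is", "wednesday"): 0.2,
    ("today", "wednesday"): 1e-6, ("wednesday", "is"): 1e-5,
    ("is", "blah"): 1e-7,
}

def sequence_prob(words):
    """P(w_1 .. w_n) under a bigram factorization (unseen pairs get a tiny floor)."""
    prob, prev = 1.0, "<s>"
    for w in words:
        prob *= bigram.get((prev, w), 1e-9)
        prev = w
    return prob

print(sequence_prob(["today", "is", "wednesday"]))  # relatively likely
print(sequence_prob(["today", "wednesday", "is"]))  # orders of magnitude less likely

def next_word(prev):
    """The same table also 'generates': pick the most probable next word."""
    candidates = {w: p for (a, w), p in bigram.items() if a == prev}
    return max(candidates, key=candidates.get)

print(next_word("is"))  # 'wednesday' beats 'blah'
```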
Language model

Ideal (aka full-dependence) model
◦ P(w_1 … w_n) = P(w_1) P(w_2 | w_1) ⋯ P(w_n | w_1 … w_{n-1})
Infeasible in practice
◦ Expensive computation
◦ Poor estimation (data sparsity) (see the back-of-the-envelope sketch below)
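A back-of-the-envelope sketch of why the full-dependence model breaks down, assuming an illustrative vocabulary of 50,000 words: the number of distinct histories (and hence conditional distributions to estimate) explodes with context length, which is exactly the data-sparsity problem above.

```python
# Number of distinct contexts a full-dependence model must condition on,
# for an assumed (illustrative) vocabulary of 50,000 words.
V = 50_000
for n in (2, 3, 5, 10):
    print(f"contexts of length {n - 1}: {V ** (n - 1):.3e}")
# Already at n = 3 there are 2.5e9 possible two-word histories; almost none
# of them are ever observed in a real corpus.
```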
Evolution of language models

Statistical LMs (1950s-1990s)
Tunable dependence via n-grams

3-gram ("trigram")
◦ P(w_1 … w_n) = P(w_1) P(w_2 | w_1) ⋯ P(w_n | w_{n-2}, w_{n-1})
2-gram ("bigram")
◦ P(w_1 … w_n) = P(w_1) P(w_2 | w_1) ⋯ P(w_n | w_{n-1})
1-gram ("unigram")
◦ P(w_1 … w_n) = P(w_1) P(w_2) ⋯ P(w_n)
(a minimal estimation sketch follows below)
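A minimal sketch of how such n-gram probabilities are estimated by maximum likelihood from raw counts; the toy corpus is made up for illustration.

```python
from collections import Counter

# Maximum-likelihood bigram estimation from a toy corpus.
corpus = "today is wednesday . today is sunny . tomorrow is wednesday .".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_mle(w, prev):
    """MLE estimate: P(w | prev) = count(prev, w) / count(prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

print(p_mle("wednesday", "is"))  # 2/3
print(p_mle("sunny", "is"))      # 1/3
print(p_mle("blah", "is"))       # 0.0 -- unseen events get zero probability
```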
Improved estimation via smoothing
[Plot: probability P(w) per word w, contrasting maximum likelihood estimation with smoothed estimation]
(a minimal smoothing sketch follows below)
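A minimal sketch of add-one (Laplace) smoothing, one simple instance of the smoothed estimation pictured above; the toy corpus and the choice of add-one smoothing are illustrative, not necessarily the specific method the original slide had in mind.

```python
from collections import Counter

# Add-k (here, add-one) smoothing removes the zero probabilities of MLE
# by pretending every vocabulary word was seen k extra times.
corpus = "today is wednesday . today is sunny . tomorrow is wednesday .".split()
vocab = set(corpus) | {"blah"}          # pretend "blah" is in the vocabulary

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_smoothed(w, prev, k=1.0):
    """P(w | prev) with add-k smoothing over the vocabulary."""
    return (bigrams[(prev, w)] + k) / (unigrams[prev] + k * len(vocab))

print(p_smoothed("wednesday", "is"))  # 0.3 -- the MLE of 2/3 shrinks as mass is redistributed
print(p_smoothed("blah", "is"))       # 0.1 -- unseen, but no longer zero
```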
Evolution of language models

Statistical LMs (1950s-1990s) → Neural LMs (2013)

Neurons
[Diagram: a dense network maps the input words w_{1:t-1} to the predicted next word ŵ_t]
Improved word-level representation
◦ From sparse to distributional semantics
◦ Better generalization to unseen data
Context still lacking
◦ Fixed-length input and output (see the fixed-window sketch below)
◦ Non-sequential representation
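A minimal sketch (untrained, random weights; all sizes illustrative) of such a fixed-window neural language model: the previous k words are embedded, concatenated, and mapped by a dense layer to a distribution over the vocabulary. The forward pass accepts exactly k context words, which is the fixed-length limitation noted above.

```python
import numpy as np

# Fixed-window neural LM: embed k context words, concatenate, apply a dense layer.
rng = np.random.default_rng(0)
vocab = ["today", "is", "wednesday", "sunny", "blah"]
V, d, k = len(vocab), 8, 2              # vocab size, embedding dim, context length

E = rng.normal(size=(V, d))             # dense word embeddings
W = rng.normal(size=(k * d, V))         # dense layer: context -> vocabulary logits

def next_word_distribution(context_ids):
    """Forward pass for exactly k context words -- the fixed-length limitation."""
    x = np.concatenate([E[i] for i in context_ids])   # shape (k*d,)
    logits = x @ W
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                            # softmax over the vocabulary

p = next_word_distribution([vocab.index("today"), vocab.index("is")])
print(dict(zip(vocab, p.round(3))))     # arbitrary here, since weights are untrained
```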
Neurons… with recurrence!
[Diagram: the recurrent network unrolled over time; at each step a dense network combines the current input word with the hidden state h passed from the previous step and predicts the next word ŵ]
Neurons… with recurrence!
[Diagram: the recurrent network predicts ŵ_t from the input w_{t-1} and the carried-over hidden state h_{t-1}]
Sequential blessing
◦ Dynamic state maintains linguistic context
◦ Enables handling variable-length sequences
Sequential curse
◦ Single state as information bottleneck
◦ Inherently non-parallelizable (see the sketch below)
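A minimal sketch (untrained, random weights) of the recurrent update: a single hidden state is carried through an inherently sequential loop, which is both the blessing (arbitrary-length context) and the curse (one state as bottleneck, no parallelism) listed above.

```python
import numpy as np

# One recurrent language-model pass: update hidden state h word by word.
rng = np.random.default_rng(0)
vocab = ["today", "is", "wednesday", "sunny", "blah"]
V, d, h_dim = len(vocab), 8, 16

E = rng.normal(size=(V, d))             # word embeddings
W_xh = rng.normal(size=(d, h_dim))      # input -> hidden
W_hh = rng.normal(size=(h_dim, h_dim))  # hidden -> hidden (the recurrence)
W_hy = rng.normal(size=(h_dim, V))      # hidden -> next-word logits

def rnn_logits(word_ids):
    h = np.zeros(h_dim)                      # single state = information bottleneck
    for i in word_ids:                       # inherently sequential loop
        h = np.tanh(E[i] @ W_xh + h @ W_hh)  # each step depends on the previous h
    return h @ W_hy

logits = rnn_logits([vocab.index(w) for w in ["today", "is"]])
print(vocab[int(np.argmax(logits))])    # arbitrary here, since weights are untrained
```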
Evolution of language models

Statistical LMs (1950s-1990s) → Neural LMs (2013) → Pretrained LMs (2018)
Vaswani et al. (NIPS 2017)
Neurons… with attention!
[Diagram: an attention layer maps the input words w_{1:t-1} to the predicted next word ŵ_t]
The animal didn't cross the street because it was too ______
Neurons… with attention!
[Diagram: the attention weights link the pronoun "it" back to "the animal", and the model fills in the blank]
The animal didn't cross the street because it was too scared
(a minimal attention sketch follows below)
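A minimal sketch of single-head scaled dot-product attention over the example sentence. The projection matrices are random, so the printed attention pattern is arbitrary; the point is the mechanism by which "it" can weight every other token, including "animal".

```python
import numpy as np

# Scaled dot-product attention over toy token representations.
rng = np.random.default_rng(0)
tokens = "The animal didn't cross the street because it was too".split()
d = 16
X = rng.normal(size=(len(tokens), d))         # toy token representations

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

scores = Q @ K.T / np.sqrt(d)                 # similarity of each query to each key
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
context = weights @ V                         # each token mixes information from all tokens

it = tokens.index("it")
print(dict(zip(tokens, weights[it].round(2))))   # where "it" attends (random here)
```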
Attention is (not) all you need
[Diagram: input w_{1:t-1} → preparation → attention (attn) → hidden representation h → dense layer → predicted ŵ_t]
Preparation: tokenize; mark position; encode
Enrichment: attend to multiple contexts; add nonlinearities
Prediction: select best output; decode
Transformer
[Diagram: a decoder stack of n blocks, each combining attention (attn) and dense layers, maps the input w_{1:t-1} to the predicted next word ŵ_t]
Effective representation
◦ Can attend to entire context – no bottleneck
◦ Attention heads as representation subspaces
◦ Order retained via positional encoding
Efficient processing
◦ Parallelization across tokens and heads
◦ Much faster training and inference
◦ Scalability to massive training datasets (see the decoder-block sketch below)
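A minimal sketch of the decoder loop just described: n blocks, each applying causally masked self-attention followed by a position-wise dense layer, with residual connections. Multiple heads and layer normalization are omitted for brevity, and all weights are random.

```python
import numpy as np

# A stripped-down Transformer decoder stack (single head, no layer norm).
rng = np.random.default_rng(0)
d, n_blocks, seq_len = 16, 2, 5
X = rng.normal(size=(seq_len, d))               # toy input representations

def self_attention(H, Wq, Wk, Wv):
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(d)
    scores += np.triu(np.full((len(H), len(H)), -1e9), k=1)  # causal mask: no future tokens
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ V

H = X
for _ in range(n_blocks):
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    W1, W2 = rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d))
    H = H + self_attention(H, Wq, Wk, Wv)       # attend to the whole prefix (residual)
    H = H + np.maximum(H @ W1, 0) @ W2          # position-wise dense layer (residual)

print(H.shape)  # (5, 16): one enriched representation per position, computed in parallel
```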
Transformer architectures
[Diagram: three variants, each a stack of n blocks]
◦ encoder-only (e.g. BERT, 2018): reads the full input w_{1:n} bidirectionally and outputs ŵ_{1:n}
◦ encoder-decoder (e.g. T5, 2019): an encoder reads the input sequence; a decoder consumes w_{1:t-1} and predicts ŵ_t
◦ decoder-only (e.g. GPT, 2018): consumes w_{1:t-1} and predicts ŵ_t
(see the masking sketch below)
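A minimal sketch of the masking difference behind the three variants, using an illustrative 4-token sequence: encoder-only models attend bidirectionally, decoder-only models attend causally, and encoder-decoder models combine both (plus cross-attention from decoder to encoder).

```python
import numpy as np

# 0 = attention allowed, -inf = attention blocked (applied before the softmax).
n = 4
encoder_mask = np.zeros((n, n))                        # bidirectional: every token sees all tokens
decoder_mask = np.triu(np.full((n, n), -np.inf), k=1)  # causal: each token sees only its prefix
print(encoder_mask)
print(decoder_mask)
```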
The power of transfer learning

Self-supervised pretraining (expensive)


◦ Standard language modeling objective
◦ Train on massive textual corpora
Supervised fine-tuning (cheap)
◦ Multiple task-specific objectives
◦ Improved performance downstream
Evolution of language models

Statistical LMs (1950s-1990s) → Neural LMs (2013) → Pretrained LMs (2018) → Large LMs (2020)
Model size vs. time

GPT-1 (2018): 117M parameters
GPT-2 (2019): 1.5B parameters
GPT-3 (2020): 175B parameters
GPT-4 (2023): 1.76T* parameters (* unofficial estimate)

[Slide caption: "Are you guys still there?"]
Model size vs. time
◦ Advent of the Transformer
◦ Availability of massive datasets
◦ Access to powerful computing
[Images omitted; credits: Google, Mistral AI, Anthropic, Reuters]
The power of scaling

LLMs show improved performance with scale


◦ Increased model size (in trillions of parameters)
◦ Increased training size (in trillions of tokens)
Improvements in next token prediction
◦ But also in unforeseen capabilities!
Instruction following
[Diagram: the PROMPT is fed to the LLM, which produces the COMPLETION]
PROMPT: "Classify this review: I loved this film! Sentiment:"
COMPLETION: "Positive"
Instruction following
PROMPT: "Classify this review: I loved this film! Sentiment:"
COMPLETION: "received a very nice book review" (the model merely continues the text instead of following the instruction)
In-context learning
PROMPT: "Classify this review: I don't like this chair! Sentiment: Negative
Classify this review: I loved this film! Sentiment:"
COMPLETION: "Positive"
(a minimal prompt-building sketch follows below)
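A minimal sketch of how the zero-shot and one-shot prompts above can be assembled as plain strings; call_llm is a hypothetical placeholder for whatever model API is actually used.

```python
# Building zero-shot and one-shot (in-context) prompts as plain strings.

def zero_shot(review):
    return f"Classify this review:\n{review}\nSentiment:"

def one_shot(review, example_review, example_label):
    demo = f"Classify this review:\n{example_review}\nSentiment: {example_label}\n\n"
    return demo + zero_shot(review)

prompt = one_shot("I loved this film!", "I don't like this chair!", "Negative")
print(prompt)
# completion = call_llm(prompt)  # hypothetical call; the in-context example
#                                # steers the model toward answering "Positive"
```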
Basic, emerging, augmented capabilities!
The challenges of scaling

System challenges
◦ Substantial compute and energy consumption
◦ Continual learning and adaptation
Data challenges
◦ Data quality and representativeness
◦ Low-resource domains and languages
The challenges of scaling

Human challenges
◦ Responsible alignment
◦ Interpretability and explainability
◦ Privacy and security
Course goals

Understand the fundamentals of LLMs


Explore the capabilities and limitations of LLMs
Keep up with the current state of the field
Have a grasp of where the field is headed
Course scope

LLM architectures – Transformers and beyond


LLM lifecycle
◦ Pretraining: data preparation, objectives
◦ Adaptation: instruction, alignment, PEFT/MEFT
◦ Utilization: prompting, in-context, augmentation
◦ Evaluation: language, downstream
Course structure (tentative)

Intro lectures by instructor
Paper seminars by students
◦ 1 group per class (rotate every 2 weeks)
◦ 2 papers per group (30min + 20min discussion)
◦ 2 students per paper

Week   Mon  Wed
18/03  G1   G2
25/03  G3   G4
01/04  G1   G2
08/04  G3   G4
15/04  G1   G2
22/04  G3   G4
Course structure (tentative)

The final paper list and seminar schedule will be available later today for enrollment (see the schedule table above).
Course grading

Seminar presentations
◦ 3x 20% = 60%
Seminar feedback
◦ 21x 1% = 21%
Class participation
◦ 21x 1% = 21%
Course attendance


Credits for each course will only be awarded to students who earn at least a grade of D and who demonstrate effective attendance of at least 75% (seventy-five percent) of the activities in which they are enrolled; absences cannot be excused.
NGPG, art. 65
Course materials: books & surveys

Build a Large Language Model (from Scratch)


by Raschka (2024)
Large Language Models: A Survey
by Minaee et al. (2024)
A Comprehensive Overview of Large Language Models
by Naveed et al. (2024)
Course materials: books & surveys

Efficient Large Language Models: A Survey


by Wan et al. (2024)
A Survey of Large Language Models
by Zhao et al. (2023)
Course materials: courses and tutorials

Generative AI with Large Language Models


by DeepLearning.AI / AWS
Large Language Models
by Databricks
Neural Networks: Zero to Hero
by Karpathy
Pre-course survey

Fill in a short survey describing your past experience


and expectations related to the course
◦ https://forms.gle/7mcatGc5LtAFM2ta7
UNIVERSIDADE FEDERAL
DE MINAS GERAIS

Coming next…

Architecture of LLMs
Rodrygo L. T. Santos
[email protected]
