
A brief introduction to (large) language models
Sachin Kumar
[email protected]
What are we going to talk about?
● The language modeling problem
● How do we learn a language model?
○ A quick primer on learning a model via gradient descent
○ The role of training data
● ✨✨ The Transformer ✨✨
○ The two things that make it such an improvement over our previous techniques for language modeling
○ More detail about both of those two things
● From language models to large language models
● Large language models for chat (à la ChatGPT)
Quick poll
1. Are you familiar with supervised machine learning? gradient descent?

2. Are you familiar with neural networks?


The language modeling problem
Rank these sentences in order of plausibility.

1. Jane went to the store.


2. store to Jane went the.
3. Jane went store.
4. Jane goed to the store.
5. The store went to Jane.
6. The food truck went to Jane.

How probable is a piece of text? Or: what is p(text)?

p(how are you this evening ? has your house ever been burgled ?) = 10^{-15}
p(how are you this evening ? fine , thanks , how about you ?) = 10^{-9}

The language modeling problem
A language model answers the question: What is p(text)?

Text is a sequence of symbols:

Just the chain rule of probability – no simplifying assumptions!
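Concretely, for a sequence of tokens x_1, x_2, …, x_n, the chain rule gives:

p(x_1, x_2, \dots, x_n) = \prod_{t=1}^{n} p(x_t \mid x_1, \dots, x_{t-1})

so modeling p(text) reduces to modeling the probability of each next token given its context.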
The language modeling problem

[Figure: given a context, the model defines a probability distribution over every word in the vocabulary.]

Language models of this form can generate text

At each timestep, sample a token from the language model’s new probability
distribution over next tokens.

The ____

The students ____

The students opened ____

The students opened their ____



In short, predicting which word comes next
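A minimal sketch of this generation loop (my own illustration, not from the slides), assuming a hypothetical next_token_distribution(context) function standing in for a real language model:

```python
import random

vocab = ["The", "students", "opened", "their", "books", "laptops", "exams", "minds", "."]

def next_token_distribution(context):
    # Hypothetical stand-in for a trained language model: it should return a
    # probability for every token in the vocabulary given the context so far.
    return [1.0 / len(vocab)] * len(vocab)   # here, just a uniform distribution

context = ["The", "students", "opened", "their"]
for _ in range(3):
    probs = next_token_distribution(context)
    next_token = random.choices(vocab, weights=probs)[0]  # sample from the distribution
    context.append(next_token)

print(" ".join(context))
```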
Language models play the role of ...
● a judge of grammaticality
○ e.g., should prefer “The boy runs.” to “The boy run.”
● a judge of semantic plausibility
○ e.g., should prefer “The woman spoke.” to “The sandwich spoke.”
● an enforcer of stylistic consistency
○ e.g., should prefer “Hello, how are you this evening? Fine, thanks, how are you?” to
“Hello, how are you this evening? Has your house ever been burgled?”
● a repository of knowledge (?)
○ e.g., “Barack Obama was the 44th President of the United States”

Note that this is very difficult to guarantee!

Language models in the news (these days, ChatGPT)

Image taken from Springboard

We use language models every day

Why language modeling?
● Machine translation
○ p(strong winds) > p(large winds)

● Spelling correction
○ The office is about fifteen minuets from my house
○ p(about fifteen minutes from) > p(about fifteen minuets from)

● Speech recognition
○ p(I saw a van) >> p(eyes awe of an)

● Summarization, question-answering, handwriting recognition, OCR, etc.

How we learn a language model
Language modeling

[Diagram: a very large corpus → a language model]

How do we learn a language model?
Estimate probabilities using text data

● Collect a textual corpus


● Find a distribution that maximizes the probability of the corpus – maximum likelihood estimation

A naive solution: count and divide

● Assume we have N training sentences


● Let x_1, x_2, …, x_n be a sentence, and c(x_1, x_2, …, x_n) be the number of times it appeared in the training data.
● Define a language model:
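In symbols, this count-and-divide (maximum likelihood) estimate is:

p(x_1, x_2, \dots, x_n) = \frac{c(x_1, x_2, \dots, x_n)}{N}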

No generalization!
Markov assumption
● We make the Markov assumption: x_{t+1} depends only on the preceding n−1 words
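In symbols:

p(x_{t+1} \mid x_1, \dots, x_t) \approx p(x_{t+1} \mid x_{t-n+2}, \dots, x_t)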
Markov assumption

or maybe even an assumption with an even shorter context (e.g., just the previous word)
n-gram Language Models

“I have a dog whose name is Lucy. I have two cats, they like playing with Lucy.”

● Question: How to learn a Language Model?


● Answer (pre-deep learning): learn an n-gram Language Model!

● Idea: Collect statistics about how frequent different n-grams are, and use these
to predict the next word

unigram probability

“I have a dog whose name is Lucy. I have two cats, they like playing with Lucy.”

● corpus size m = 17
● P(Lucy) = 2/17; P(cats) = 1/17

● Unigram probability:
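With m the number of tokens in the corpus, the maximum likelihood unigram estimate is:

P(w) = \frac{c(w)}{m}

which gives the numbers above, e.g. P(Lucy) = 2/17.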

bigram probability

“I have a dog whose name is Lucy. I have two cats, they like playing with Lucy.”
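The corresponding bigram estimate, with a worked number from this corpus:

P(w_i \mid w_{i-1}) = \frac{c(w_{i-1}, w_i)}{c(w_{i-1})}

e.g. P(have \mid I) = c(I\ have) / c(I) = 2/2 = 1.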

trigram probability

“I have a dog whose name is Lucy. I have two cats, they like playing with Lucy.”
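Likewise for trigrams:

P(w_i \mid w_{i-2}, w_{i-1}) = \frac{c(w_{i-2}, w_{i-1}, w_i)}{c(w_{i-2}, w_{i-1})}

e.g. P(a \mid I\ have) = c(I\ have\ a) / c(I\ have) = 1/2.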

n-gram probability

“I have a dog whose name is Lucy. I have two cats, they like playing with Lucy.”
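And in general, conditioning on the previous n−1 words:

P(w_i \mid w_{i-n+1}, \dots, w_{i-1}) = \frac{c(w_{i-n+1}, \dots, w_i)}{c(w_{i-n+1}, \dots, w_{i-1})}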

Sampling from an n-gram language model
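A minimal sketch (not the slides' own code) of estimating a bigram model from the toy corpus above and sampling from it:

```python
import random
from collections import Counter, defaultdict

corpus = ("I have a dog whose name is Lucy . "
          "I have two cats , they like playing with Lucy .").split()

# Count how often each word follows each other word.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def sample_next(prev):
    """Sample the next token from P(next | prev) = c(prev, next) / c(prev)."""
    counts = bigram_counts[prev]
    tokens, weights = zip(*counts.items())
    return random.choices(tokens, weights=weights)[0]

# Generate a short continuation, one sampled token at a time.
token, generated = "I", ["I"]
for _ in range(10):
    token = sample_next(token)
    generated.append(token)

print(" ".join(generated))
```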

Neural language models

[Figure: the context "I have a dog whose name is Lucy. I have two ___" is fed into a differentiable function f(Θ), e.g. a neural network, which outputs a distribution over the vocabulary (with "cat"/"cats" as a likely next word).]
How do we maximize the likelihood?
The dominant strategy from the past decade:

1. The randomly initialized differentiable function (a neural network) takes the
context as input.
2. Have that function output a probability distribution over the vocabulary.
3. Treat the probability of the correct token as your objective to maximize,
4. or, equivalently, the negative log probability as your objective to minimize.
5. Differentiate with respect to the parameters, and perform gradient descent
(or a stochastic variant).
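A minimal sketch of this recipe in PyTorch (a toy model and random data, just to illustrate the objective and the update; not the slides' own code):

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 32

class ToyLM(nn.Module):
    """A stand-in for any differentiable function f(Θ): context ids -> next-token logits."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.out = nn.Linear(embed_dim, vocab_size)

    def forward(self, context):                 # context: (batch, context_len)
        h = self.embed(context).mean(dim=1)     # crude fixed-size summary of the context
        return self.out(h)                      # (batch, vocab_size) unnormalized logits

model = ToyLM()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()                 # negative log probability of the correct token

# One batch of (randomly generated) training examples: contexts and their correct next tokens.
context = torch.randint(0, vocab_size, (8, 5))
next_token = torch.randint(0, vocab_size, (8,))

logits = model(context)             # steps 1-2: map the context to a distribution over the vocabulary
loss = loss_fn(logits, next_token)  # steps 3-4: -log p(correct token | context)
loss.backward()                     # step 5: differentiate w.r.t. the parameters
optimizer.step()                    # one gradient-descent update
optimizer.zero_grad()
```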
Intuition of gradient descent
How do I get to the bottom of this river canyon?

Look around me 360°

Find the direction of steepest slope up

Go in the opposite direction

Gradient descent: a throwback to calculus
Q: Given current parameter w, should we make w bigger or smaller to minimize
our loss?
A: Move w in the reverse direction from the slope of the function

Let's first visualize for a single scalar w
Q: Given current w, should we make it bigger or smaller?
A: Move w in the reverse direction from the slope of the function

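A tiny illustration (not from the slides) of this update rule for a single scalar w, minimizing the loss L(w) = (w − 3)²:

```python
# Gradient descent on L(w) = (w - 3)**2, whose derivative is dL/dw = 2 * (w - 3).
w, lr = 0.0, 0.1
for step in range(50):
    grad = 2 * (w - 3)   # the slope at the current w
    w = w - lr * grad    # move w in the direction opposite to the slope
print(w)                 # approaches 3, the minimizer
```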
Now let’s imagine 2 dimensions, w and b
Visualizing the (negative) gradient vector at the red point: it has two dimensions, shown in the x-y plane.

Gradient Descent → Stochastic Gradient Descent

Key difference from our motivating scenario: in practice, calculating the exact
gradient over the full dataset is really time-consuming.
So… we estimate the gradient using samples (mini-batches) of data.

✨✨ The Transformer ✨✨
Why did the transformer make such a big difference for
language modeling?

It allowed for faster learning of more model parameters on more data (by allowing
parallel computation on GPUs!)
A brief aside about some visual shorthand I’ll be using
A 3-layer LSTM’s calculations for an input of 10 tokens

(For more on computing gradients via backpropagation, see colah's blog post on this topic.)
One layer of the transformer architecture (Vaswani et al. 2017)

ok, but how does this mess help anything?
Comparing training times: how many functions do we need to backpropagate through?

**Transformers parallelize a lot of the computations that LSTMs make us do in sequence**

And (a very specific, but nonempty, subset of) you can therefore train a transformer on a
ridiculously large amount of data in a way that you cannot for an LSTM.
What kind of function can take in a variable number
of inputs like that?
Attention mechanisms
Building up to the attention mechanism
What about an average?

But we probably don’t want to weight all input vectors equally…

How about a weighted average?

Great idea! How can we automatically decide the weights for a weighted average of the input vectors?

What kind of function can take in a variable number of inputs?
A simple form of attention (adapted from Bahdanau et al. 2014)
[Figure: a learned parameter vector and a (variable number of) input vectors; each input vector is multiplied by its weight.]
Computed how?
1. Dot product between param vector
and each input vector
2. Softmax the set of resulting scalars.
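A minimal sketch of this simple attention (my own illustration, not the slides' code): a single learned parameter vector scores each input vector with a dot product, the scores are softmaxed, and the inputs are averaged with those weights:

```python
import numpy as np

def softmax(scores):
    exp = np.exp(scores - scores.max())      # subtract the max for numerical stability
    return exp / exp.sum()

def simple_attention(param_vector, input_vectors):
    """param_vector: (d,); input_vectors: (n, d) for any n. Returns a single (d,) vector."""
    scores = input_vectors @ param_vector    # 1. dot product with each input vector -> (n,)
    weights = softmax(scores)                # 2. softmax the resulting scalars -> (n,)
    return weights @ input_vectors           # weighted average of the inputs -> (d,)

# Works for any number of input vectors:
d = 4
theta = np.random.randn(d)
print(simple_attention(theta, np.random.randn(3, d)))
print(simple_attention(theta, np.random.randn(7, d)))
```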
Pros and cons
Pros:
● We have a function that can compute a weighted average of an arbitrary number
of vectors, (largely) in parallel!
● The parameters determining what makes it into our
output representation are learned
Cons:
● We’re also hoping to produce n different output
token representations… and this just produces
one…
Enter “self attention”

“What if instead of comparing each vector of the sequence to a single learned vector, we compared the sequence to itself?”
Queries, Keys, Values (Q, K, V)

[Figure: each input vector gets its own query (Q), key (K), and value (V).]
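A minimal sketch of single-head scaled dot-product self-attention in this Q/K/V form (the projection matrices below are random stand-ins for learned parameters):

```python
import numpy as np

def softmax(x, axis=-1):
    exp = np.exp(x - x.max(axis=axis, keepdims=True))
    return exp / exp.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """X: (n, d) token vectors, for any sequence length n. Returns one new representation per token."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v         # compare the sequence to itself
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # (n, n): how much each token attends to every token
    weights = softmax(scores, axis=-1)          # each row is a distribution over the sequence
    return weights @ V                          # n different output representations

# Fixed-size parameter blocks, usable for any sequence length:
d = 8
W_q, W_k, W_v = (np.random.randn(d, d) for _ in range(3))
X = np.random.randn(5, d)                       # a sequence of 5 token vectors
print(self_attention(X, W_q, W_k, W_v).shape)   # (5, 8)
```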
Hooray for self attention!
Our function is still made up almost entirely of matrix multiplications! Which are very
parallelizable ( → efficient!)

We still learn fixed-size blocks of parameters that can be used for a sequence of
arbitrary length

We’re now capable of producing n different new token representations!


Self attention is the key component of the transformer
That’s all I’ve got! Questions?
A brief aside: let’s talk about data
What does each instance of data contribute?

Some of the nudges to a model’s parameters over the course of training.

Which data is used to train modern large language models?

Web text

… it’s kind of tough to give a more specific description than that.


See Dodge et al. EMNLP ‘21, “Documenting Large Webtext Corpora: A Case
Study on the Colossal Clean Crawled Corpus”
Also see Gururangan et al. EMNLP ‘22, “Whose Language Counts as High
Quality? Measuring Language Ideologies in Text Data Selection”
Large Language Models
The transformer model allows fast parallel computations on many GPUs (large
amounts of compute)

It allows training on large amounts of data (think a whole internet's worth of text).

It allows adding many, many layers to the model (a large model).

A large language model is a language model with a large number of parameters,
trained on large amounts of data, for a long period of time.
Why large language models?
● Scaling the models, compute, and data leads to an increase in performance

● Emergent properties at scale (Wei et al 2022)


○ Large models (with 7–100B+ parameters) suddenly become capable of performing
tasks they weren’t able to do when smaller (e.g., 1B parameters or fewer).
Training the model to chat
A simple language model (also called a pretrained model) is not equipped to chat
with an end user the way ChatGPT does.

ChatGPT (and many other models) are further trained on supervised data to follow
instructions.
Instruction Tuning
● Collect a large dataset of instruction following examples of the form
○ <instruction> <input> <output>
○ For example,
○ Summarize this news article [ARTICLE] [SUMMARY]
○ Answer this question [QUESTION] [ANSWER]
○ Predict the sentiment of this review [REVIEW] [SENTIMENT]....

● This is also a text corpus, but in a very specific format (a minimal sketch of one flattened example follows below).

● Continue training the model on this dataset (again using the same training objective)
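A minimal sketch of what one such record might look like once flattened into plain training text (the field names and template here are illustrative assumptions, not any particular model's actual format):

```python
# Illustrative only: the exact template varies across instruction-tuning datasets.
example = {
    "instruction": "Summarize this news article",
    "input": "[ARTICLE]",
    "output": "[SUMMARY]",
}

# Flatten the record into a single training string; the model is then trained on it
# with the same next-token-prediction objective as before.
training_text = f"{example['instruction']}\n{example['input']}\n{example['output']}"
print(training_text)
```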
Aligning the model to humans’ preferences
● Chat-based models are supposed to converse with humans

● Why not learn from humans’ feedback?

● Basic idea: Model samples multiple outputs – users rank them based on their
preference
○ Convert user preferences into reward scores – more preferred output has higher
reward
○ Treat an LLM like an agent and use RL to maximize this reward (RLHF)
So what does this mean ChatGPT is good at?
Some aspects of producing answers that might fall under
that category:
● Writing in specific styles (that have appeared in the model’s training data)
● Grammatical consistency
● Generating boilerplate sentences that often appear at the beginning or end of
emails, etc.
● Fluency
What are some problems that ChatGPT’s
training leaves it prone to?
Inaccuracies
● The language model doesn’t “plan” what it will say in advance

● The model doesn’t store facts; it just outputs plausible-looking sentences, which
may or may not be factual
Lack of source attribution
Just like the model doesn’t store facts… it doesn’t store sources.
Outputs that reflect social biases
An example from machine translation a few years ago:
Thanks! Questions?
