
LANGUAGE MODELS AND THEIR APPLICATIONS ON HPC

Faculty Development Programme


CDAC Centre in North East

Ashish Anand
Professor, Dept of CSE, IIT Guwahati
Associate Faculty, Mehta Family School of DS and AI, IIT Guwahati
What is this talk about?

• Defining Language Model

• Major Language Model Paradigms

• Implicit: Role of HPC


QUICK WARM-UP TO NLP
Application I: Automatic Text Completion
Application II: Spelling Correction

• Spelling correction: "Study was conducted by students" vs "Study was conducted be students"
Application III: Words, Meaning and Representation
• Similar Words

• Synonyms

• Word Sense Disambiguation


• I went to a bank to deposit money
• I went to a bank to see calm water currents
Application IV: Sentiment Classification

I like this laptop

My new laptop is not good for computationally intensive tasks

Watching a lecture on my new laptop
Application V: Named Entity Recognition (NER)
Application VI: Machine Translation
• Source Sentence: I have asked him to do homework

• Target Sentence:
Many more applications …..
DEFINING LANGUAGE MODEL
Let’s look at some examples

• Predicting next word

• I am planning ……..

• Speech Recognition
• I saw a van vs eyes awe an
Example continued

• Spelling correction
• "Study was conducted by students" vs "Study was conducted be students"
• "Their are two exams for this course" vs "There are two exams for this course"

• Machine Translation
• I have asked him to do homework
• मैंने उससे पूछा कि होमवर्क करने के लिए (a disfluent, word-for-word rendering)
• मैंने उसे होमवर्क करने के लिए कहा ("I asked him to do the homework", the fluent translation)
In each of these examples, the objective is either
• To find the next probable word
• To find which sentence is more likely

Translating this into a problem formulation:

• Finding the probability of a word given a context
• Finding the probability of a sentence given a context
Language Models (LM)

• Models assigning probabilities to a sequence of words

• P(I saw a van) > P(eyes awe an)

• P(मैंने उससे पूछा कि होमवर्क करने के लिए) < P(मैंने उसे होमवर्क करने के लिए कहा)
Statistical LM
Estimating Probability of a Sequence
• Our task is to compute
P(I, am, fascinated, with, recent, advances, in, AI)

• Chain Rule
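Written out, the chain rule decomposes the joint probability into a product of conditional probabilities:

    P(w_1, w_2, \ldots, w_n) = P(w_1)\, P(w_2 \mid w_1)\, P(w_3 \mid w_1, w_2) \cdots P(w_n \mid w_1, \ldots, w_{n-1}) = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1})

For the example: P(I, am, fascinated, …) = P(I) · P(am | I) · P(fascinated | I, am) · …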
Estimating Probability of a Sequence
Estimating P(w1, w2, …, wn)

Estimating P(w1, w2, …, wn)
• Too many possible sentences
• Data sparseness
• Poor generalizability
Markov Assumption


Markov Assumption
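Written out, a k-th order Markov assumption approximates each conditional probability using only the previous k words:

    P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-k}, \ldots, w_{i-1})

For example, a bigram model (k = 1) gives P(w_1, \ldots, w_n) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1}).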


MLE of N-gram models
• Unigram (Simplest Model)

• Bigram (1st order Markov Model)

• Trigram (2nd order Markov Model)
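In standard notation, the maximum-likelihood estimates are ratios of corpus counts, where c(·) is the count in the training corpus and N the total number of tokens:

    \hat{P}(w_i) = \frac{c(w_i)}{N}
    \hat{P}(w_i \mid w_{i-1}) = \frac{c(w_{i-1}, w_i)}{c(w_{i-1})}
    \hat{P}(w_i \mid w_{i-2}, w_{i-1}) = \frac{c(w_{i-2}, w_{i-1}, w_i)}{c(w_{i-2}, w_{i-1})}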


Trigram Model in Summary
Problem with MLE

• Works well if the test corpus is very similar to the training corpus, which is generally not the case

• Sparsity issues
• OOV words: can be handled by an <UNK> category
• Words are present in the corpus but the relevant n-gram counts are zero
• Such probabilities are underestimated
N-gram Model: Issue

• Long-distance dependencies

"The computer which I had just put into the lab on the fifth floor crashed"
SMOOTHING TECHNIQUES
Simplest Approach: Additive Smoothing
• Add-1 Smoothing (trigram case)

    p_{\text{add-1}}(w_i \mid w_{i-2}, w_{i-1}) = \frac{c(w_{i-2}, w_{i-1}, w_i) + 1}{c(w_{i-2}, w_{i-1}) + |\mathcal{V}|}

• Generalized version (add-δ)

    p_{\text{add-}\delta}(w_i \mid w_{i-2}, w_{i-1}) = \frac{c(w_{i-2}, w_{i-1}, w_i) + \delta}{c(w_{i-2}, w_{i-1}) + \delta\,|\mathcal{V}|}

where |\mathcal{V}| is the vocabulary size.
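A minimal sketch of add-δ smoothing in plain Python, shown for a bigram model for brevity; the toy corpus and δ value are illustrative assumptions, not from the slides:

    from collections import Counter

    corpus = "study was conducted by students . study was completed by students .".split()
    V = set(corpus)                                   # vocabulary observed in the toy corpus
    unigram = Counter(corpus)
    bigram = Counter(zip(corpus, corpus[1:]))

    def p_add_delta(w, prev, delta=1.0):
        # (c(prev, w) + delta) / (c(prev) + delta * |V|)
        return (bigram[(prev, w)] + delta) / (unigram[prev] + delta * len(V))

    print(p_add_delta("by", "conducted"))        # seen bigram: relatively high probability
    print(p_add_delta("students", "conducted"))  # unseen bigram: small but non-zero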
Take the help of lower order models
• Bigram example
• c(w1, w2) = 0 = c(w1, w2′)
• Then p_add(w2 | w1) = p_add(w2′ | w1)
• Let's assume p(w2′) < p(w2)
• We should expect p_add(w2 | w1) > p_add(w2′ | w1), which additive smoothing alone cannot deliver
Take the help of lower order models
• Linear Interpolation Models (see the formula sketched below)

• Discounting Models
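In standard notation, linear interpolation mixes the trigram, bigram and unigram MLE estimates with non-negative weights that sum to one, typically tuned on held-out data:

    p_{\text{interp}}(w_i \mid w_{i-2}, w_{i-1}) = \lambda_1\, \hat{P}(w_i \mid w_{i-2}, w_{i-1}) + \lambda_2\, \hat{P}(w_i \mid w_{i-1}) + \lambda_3\, \hat{P}(w_i), \qquad \lambda_1 + \lambda_2 + \lambda_3 = 1,\ \lambda_j \ge 0

Discounting methods instead subtract probability mass from seen n-grams and redistribute it to unseen ones (e.g. absolute discounting, Kneser-Ney).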
NEURAL LANGUAGE MODEL
Pre-Transformer Era
Feed-Forward Neural Language Model

Bengio et al., JMLR 2003
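A minimal sketch of a Bengio-style feed-forward LM, assuming PyTorch; the class name, dimensions, and vocabulary size are illustrative placeholders, not from the slides:

    import torch
    import torch.nn as nn

    class FFNNLM(nn.Module):
        def __init__(self, vocab_size, emb_dim=64, context=3, hidden=128):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)        # shared word representations
            self.fc1 = nn.Linear(context * emb_dim, hidden)     # concatenated context -> hidden
            self.fc2 = nn.Linear(hidden, vocab_size)            # hidden -> scores over vocabulary

        def forward(self, context_ids):                         # context_ids: (batch, context)
            x = self.emb(context_ids).flatten(start_dim=1)      # (batch, context * emb_dim)
            h = torch.tanh(self.fc1(x))
            return self.fc2(h)                                  # logits; softmax is inside the loss

    model = FFNNLM(vocab_size=10000)
    logits = model(torch.randint(0, 10000, (8, 3)))             # 8 contexts of 3 previous words
    loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 10000, (8,)))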


Advantages over statistical n-gram models
• Better flexibility in considering larger context
Advantages over statistical n-gram models
• Better generalizability
• Can generalize to contexts not seen during training
• Example: we need to estimate P(reading | Ram is)
• Assume "Ram is reading" is not in the training data, but "John is reading" is, along with sentences like "Ram is writing" and "John is writing"
• The word representations the model learns for "Ram" and "John" are then likely to be similar, so the model will assign a similar probability to P(reading | Ram is) as to P(reading | John is)
Major drawbacks

• Inefficient
• Unable to exploit sequential nature of text
• Limited Context
• Unidirectional
HANDLING THE DRAWBACKS
Inefficiency: Hierarchical Softmax

Source: Neural Network lectures by Hugo Larochelle
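In one common formulation, the vocabulary is placed at the leaves of a binary tree and the word probability becomes a product of binary decisions along the root-to-leaf path, reducing the normalization cost from O(|V|) to O(log |V|):

    p(w \mid \mathbf{h}) = \prod_{j=1}^{L(w)} \sigma\!\left( s_{w,j}\; \mathbf{v}_{n(w,j)}^{\top} \mathbf{h} \right)

where n(w,1), …, n(w,L(w)) are the internal nodes on the path from the root to the leaf for w, s_{w,j} ∈ {+1, −1} encodes whether the path branches left or right at node j, σ is the logistic sigmoid, and h is the network's hidden representation of the context.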


Limited Context and Sequential Nature: RNN-LM

Source: Jurafsky and Martin, Speech and Language Processing, 3rd Ed. Draft (Jan 2022)
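A minimal RNN-LM sketch along the same lines, again assuming PyTorch with illustrative hyperparameters; the recurrent hidden state carries the entire left context, removing the fixed n-gram window:

    import torch
    import torch.nn as nn

    class RNNLM(nn.Module):
        def __init__(self, vocab_size, emb_dim=64, hidden=128):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True)  # reads the sequence left to right
            self.out = nn.Linear(hidden, vocab_size)

        def forward(self, token_ids):                  # token_ids: (batch, seq_len)
            h, _ = self.rnn(self.emb(token_ids))       # (batch, seq_len, hidden)
            return self.out(h)                         # next-token logits at every position

    model = RNNLM(vocab_size=10000)
    logits = model(torch.randint(0, 10000, (4, 20)))   # predict token t+1 from tokens 1..t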
Unidirectional: ELMo

Source: Devlin et al. NAACL 2019


Unidirectional: ELMo

Source: Devlin et al. NAACL 2019


Issues with RNN-based LMs

• Limited bi-directionality
• Difficult to parallelize
TRANSFORMER-BASED LM
Era of Large-scale pre-trained neural language models
Vanilla Transformer Language Model

Source: Jurafsky and Martin, Speech and Language Processing, 3rd Ed. Draft (Jan 2022)
Three Paradigms

• Pre-train then fine-tune


• Prompt-based learning
• NLP as text generation
PARADIGM
Pre-train then fine-tune
Three classes of Pre-trained LMs (PLMs)

• Decoder-only Models
• Example: GPT

• Encoder-only Models
• Example: BERT

• Encoder-Decoder Language Models
• Example: BART, T5
Three Types of Training Objectives

• Autoregressive (AR): predict the next word given the previous words
• Masking: predict a masked/hidden word, given the words on both sides of it
• Denoising: correct the perturbation/corruption applied to the input word sequence
• Perturbation examples: sentence permutation, span deletion, etc.
Autoregressive Language Models

Source: Min et al. arxiv:2111.01243v1 (ACM Computing Surveys)


Autoregressive Language Models

Source: Min et al. arxiv:2111.01243v1 (ACM Computing Surveys)


Autoregressive Language Models

• Unidirectional

• Taking the sequential nature of text into account

• Useful for natural language generation downstream tasks

Source: Min et al. arxiv:2111.01243v1 (ACM Computing Surveys)
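As a concrete illustration, an autoregressive PLM can be queried for text completion with the Hugging Face transformers library (assuming it and the public gpt2 checkpoint are available):

    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    # The model repeatedly predicts the next token given everything generated so far.
    print(generator("I am planning", max_new_tokens=20))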


Masked Language Model

Source: Min et al. arxiv:2111.01243v1 (ACM Computing Surveys)


Masked Language Model

Source: Min et al. arxiv:2111.01243v1


Masked Language Model
• Inherently bi-directional

• Non-autoregressive nature allows parallelized computation during inference

• Makes an independence assumption among the masked positions

• Pretrain-finetune discrepancy, since fine-tuning inputs contain no [MASK] corruption

• Useful for classification tasks

Source: Min et al. arxiv:2111.01243v1 (ACM Computing Surveys)
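A corresponding masked-prediction illustration, again assuming transformers and the public bert-base-uncased checkpoint:

    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="bert-base-uncased")
    # BERT sees the words on both sides of the mask, unlike an autoregressive model.
    print(unmasker("I went to a [MASK] to deposit money."))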
Encoder-Decoder Models

Source: Min et al. arxiv:2111.01243v1 (ACM Computing Surveys)


Encoder-Decoder Models

Source: Min et al. arxiv:2111.01243v1 (ACM Computing Surveys)


Encoder-Decoder Model

• Bi-directional and sequence-to-sequence learning

• Combines the advantages of bi-directional and autoregressive models

• Very convenient to fine-tune for seq2seq tasks (e.g. MT, summarization)

[Figure: corrupted sequence as input, reconstructed original sequence as output]

Source: Min et al. arxiv:2111.01243v1 (ACM Computing Surveys)
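An encoder-decoder illustration, assuming transformers and the public facebook/bart-large-cnn checkpoint (a BART model fine-tuned for summarization); the input text is a placeholder:

    from transformers import pipeline

    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    text = ("Language models assign probabilities to word sequences. Statistical n-gram models "
            "were followed by neural models, and today large pre-trained Transformers dominate NLP.")
    # The encoder reads the full input bi-directionally; the decoder generates the summary autoregressively.
    print(summarizer(text, max_length=30, min_length=10))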
Important aspects of pre-training corpora
• Size
• Quality (source data characteristics) and diversity
• Domain of intended downstream tasks
Pre-training corpora of a few LMs

Source: Min et al. arxiv:2111.01243v1 (ACM Computing Surveys)


Pre-trained LMs on specific domain

Source: Min et al. arxiv:2111.01243v1 (ACM Computing Surveys)


Pre-trained LMs on different languages

Source: Min et al. arxiv:2111.01243v1 (ACM Computing Surveys)


Fine-Tuning

Source: Min et al. arxiv:2111.01243v1 (ACM Computing Surveys)
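A minimal fine-tuning sketch for a sentence-classification task, assuming transformers and PyTorch; the two-example "dataset", label scheme, and learning rate are placeholders, not the procedure used for any particular model on the slide:

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    texts = ["I like this laptop", "My new laptop is not good for computationally intensive tasks"]
    labels = torch.tensor([1, 0])                        # 1 = positive, 0 = negative
    batch = tok(texts, padding=True, return_tensors="pt")

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    outputs = model(**batch, labels=labels)              # classification head on top of the PLM
    outputs.loss.backward()                              # one gradient step; a real run loops over a dataset
    optimizer.step()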


PARADIGM
Prompt Based Learning
What is prompting?

• "Adding natural language text or continuous vectors to the input or output to encourage PLMs to perform specific tasks"
Advantages

• For in-context or in-domain learning, it may not be necessary to fine-tune the PLM, which reduces computational requirements
• Allows better alignment of the new task with the pre-training objective
Main Prompt-based approaches

Source: Min et al. arxiv:2111.01243v1 (ACM Computing Surveys)


Instruction-based prompting

Source: Min et al. arxiv:2111.01243v1 (ACM Computing Surveys)
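A small instruction-prompting illustration, assuming transformers and the public google/flan-t5-small checkpoint (an instruction-tuned model); note that no parameters are updated:

    from transformers import pipeline

    pipe = pipeline("text2text-generation", model="google/flan-t5-small")
    # The task is described in natural language inside the prompt itself.
    print(pipe("Classify the sentiment of this review as positive or negative: I like this laptop."))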


Template-based prompting

Source: Min et al. arxiv:2111.01243v1 (ACM Computing Surveys)
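A template (cloze-style) prompting illustration, assuming transformers and bert-base-uncased: a hand-written template recasts sentiment classification as the pre-training fill-mask task, and the scores of verbalizer words such as "great" / "terrible" serve as class scores:

    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="bert-base-uncased")
    review = "I like this laptop."
    prompt = review + " Overall, it was a [MASK] product."
    # Compare the fill-in scores of the label words to decide the class.
    print(unmasker(prompt, targets=["great", "terrible"]))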


PARADIGM
NLP as text generation
Source: Min et al. arxiv:2111.01243v1 (ACM Computing Surveys)
https://levelup.gitconnected.com/the-brief-history-of-large-language-models-a-journey-from-eliza-to-gpt-4-and-google-bard-167c614af5af
Summary of Paradigm Shifts

• Statistical Models: moved LMs from rule/grammar-based models to data-driven models
• Probabilistic Neural Models: increased context, word embeddings
• Transformer PLMs: contextual embeddings, more generalized, multi-modality, multi-task

Source: Min et al. arxiv:2111.01243v1


References

• Jurafsky and Martin, Speech and Language Processing, 3rd Ed. Draft. [Available at https://web.stanford.edu/~jurafsky/slp3/]
• Min et al., Recent advances in natural language processing via large pre-trained language models: A survey. ACM Computing Surveys.
Thanks!
Questions and Comments!
