NLP Week 03

The document discusses language models in natural language processing, explaining their role in predicting upcoming words and assigning probabilities to sentences. It covers concepts such as n-grams, the chain rule of probability, and the Markov assumption, which simplifies the prediction process by considering only the most recent words. Additionally, it introduces methods for estimating probabilities, including maximum likelihood estimation and the use of logarithms to manage numerical underflow.


CSCS 366 – Intro. to NLP
Faizad Ullah

Language Models
• A language model (LM) is a machine learning model that predicts upcoming words.

• More formally, a language model assigns a probability to each possible next word, or equivalently gives a probability distribution over possible next words.

• Language models can also assign a probability to an entire sentence.
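
To make "a probability distribution over possible next words" concrete, here is a minimal illustrative sketch in Python; the context sentence and all probability values are invented for illustration, not produced by any real model:

# A toy "language model" for one fixed context, written as a plain dictionary.
# The context and the probabilities below are made up for illustration only.
context = "Islamabad is the capital of"
next_word_distribution = {
    "Pakistan": 0.90,
    "the": 0.04,
    "Punjab": 0.03,
    "a": 0.02,
    "pizza": 0.01,
}

# A proper distribution over next words sums to 1.
assert abs(sum(next_word_distribution.values()) - 1.0) < 1e-9

# "Predicting the upcoming word" means picking the most probable next word.
print(max(next_word_distribution, key=next_word_distribution.get))  # Pakistan
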
Language Models
Summary
• For independent events A and B: P(A, B) = P(A) · P(B)

• For independent events A, B, and C: P(A, B, C) = P(A) · P(B) · P(C)

• For dependent events A and B: P(A, B) = P(A) · P(B|A)

• For dependent events A, B, and C: P(A, B, C) = P(A) · P(B|A) · P(C|A, B)

Probability of the next word

• Islamabad is the capital of ___________

• The baby started ____ when she saw her mother.

• The weather today is very ____.

• ‫آج موسم بہت ____ ہے۔‬ (The weather today is very ____.)

• ‫میں صبح اٹھ کر سب سے پہلے ____ پیتا ہوں۔‬ (The first thing I drink after waking up in the morning is ____.)

• ‫بارش کے بعد ہوا بہت ____ ہو گئی۔‬ (After the rain, the air became very ____.)


Uses

• Machine Translation

• Spell Correction

• Speech Recognition

• Text Generation …
Probability of a sentence
• P(all of a sudden I notice three guys standing on the sidewalk)?

• P(on guys all I of notice sidewalk three a sudden standing the)?

• Can we estimate these probabilities simply by counting how often each full sentence occurs in a corpus?


N-Gram: Basics of Counting
• The data sources for language models are corpora (collections of text).

• We count word forms (e.g., "cats"), not lemmas (e.g., "cat").


N-Grams
• Let’s begin with the task of computing P(w|h), the probability of a word w given some history h.

• Suppose:
• h = “The water of Walden Pond is so beautifully”
• w = “blue”

• The probability that the next word is blue is:

• P(w|h) = P(blue | The water of Walden Pond is so beautifully)
N-Grams
• One way to estimate this probability is by relative frequency.
• Relative frequency: dividing the observed frequency of a particular sequence by the observed frequency of its prefix.
• This answers the question “Out of the times we saw the history h, how many times was it followed by the word w?”, as follows:

P(blue | The water of Walden Pond is so beautifully) =
    C(The water of Walden Pond is so beautifully blue) / C(The water of Walden Pond is so beautifully)
N-Grams
• If we had a large enough corpus, we could compute these two counts and estimate the probability.

• Let’s try it on the Google search engine:

• “FCCU is the best university in Pakistan”

• Even a corpus as large as the web gives zero (or very small) counts for many perfectly reasonable sentences like this one. Why?

• Because language is creative: new sentences are produced all the time, so we cannot rely on counting whole sentences or long histories.


Calculate the probability of a sentence
• We need more clever ways to estimate P(w|h), or the probability of an entire word sequence W.

• Let W = w1, w2, …, wn be a sentence.

• P(W) = P(w1, w2, …, wn)

• Now, the question is: how do we compute P(w1, w2, w3, …, wn)?

• P(x, y) = P(x) · P(y) if x and y are independent.
• P(x, y) = P(x) · P(y|x) otherwise.
Chain rule of probability
• The chain rule shows the link between computing the joint probability of a sequence and computing the conditional probability of a word given previous words.

• We can estimate the joint probability of an entire sequence of words by multiplying together a number of conditional probabilities:

P(W) = P(w1) · P(w2|w1) · P(w3|w1 w2) · P(w4|w1 w2 w3) · … · P(wn|w1 w2 … wn-1)
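
For instance, applying the chain rule to the short sentence “I am Sam” (a sentence reused in the counting example later):

P(I am Sam) = P(I) · P(am | I) · P(Sam | I am)

Each factor conditions on the entire preceding history, so for long sentences the final factors condition on very long histories, which is exactly the problem the Markov assumption addresses next.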
The Markov assumption
• The assumption that the probability of a word depends only on the previous word is called a Markov assumption.
The Markov assumption (n-gram)
• The intuition of the n-gram model is that instead of computing the probability of a word given its entire history, we can approximate the history by just the last few words (written out after the list below).
• N=1 → Unigram

• N=2 → Bigram

• N=3 → Trigram

• N=4→ 4-gram
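
Written out, the bigram (N=2) version of the Markov assumption approximates the full history with just the preceding word, and the general n-gram version with the preceding N-1 words:

Bigram:  P(wn | w1 w2 … wn-1) ≈ P(wn | wn-1)
N-gram:  P(wn | w1 w2 … wn-1) ≈ P(wn | wn-N+1 … wn-1)

Under the bigram assumption, the chain-rule product becomes P(W) ≈ P(w1) · P(w2|w1) · P(w3|w2) · … · P(wn|wn-1).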
How to estimate probabilities
• An intuitive way to estimate these probabilities is called maximum likelihood estimation, or MLE.

• We get the MLE estimate for the parameters of an n-gram model by taking counts from a corpus and normalizing them so that they lie between 0 and 1.
How to estimate probabilities

• To compute the bigram probability of a word wn given a previous word wn-1:

1. Compute the count of the bigram C(wn-1 wn)

2. Normalize by the sum of all the bigram counts that share the same first word wn-1:
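
Written out (this is the standard MLE formula from the Jurafsky & Martin chapter cited in the Sources):

P(wn | wn-1) = C(wn-1 wn) / Σw C(wn-1 w) = C(wn-1 wn) / C(wn-1)

The simplification in the last step holds because the counts of all bigrams that start with wn-1 must sum to the unigram count of wn-1.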
How to estimate probabilities: an example
• Let’s work through an example using a mini-corpus of three sentences, augmenting each sentence with the markers <s> and </s>:

<s> I am Sam </s>
<s> Sam I am </s>
<s> I do not like green eggs and ham </s>

• As before, we estimate each n-gram probability by dividing the observed frequency of a particular sequence by the observed frequency of its prefix.
How to estimate probabilities: Unigram

Unigram Count Probability

I 3 3/|N|
Sam 2 2/|N|
am 2 2/|N|
do 1 1/|N|

(where |N| is the total number of word tokens in the corpus)

How to estimate probabilities: Bigram

Bigram Count Probability


<s> I 2 ?
I am 2 ?
am Sam 1 ?
Sam </s> 1 ?

How to estimate probabilities: Trigram

Trigram Count Probability


<s> <s> I 2 ?
<s> I am 1 ?
I am Sam 1 ?
am Sam </s> 1 ?
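
The counts in the tables above can be reproduced with a short Python sketch. This is a minimal illustration assuming the three-sentence mini-corpus given earlier, padding each sentence with one <s> and one </s> (the trigram table instead pads with two <s> markers); it fills in the bigram “?” cells:

from collections import Counter

# The three-sentence mini-corpus, each sentence padded with <s> and </s>.
sentences = [
    "I am Sam",
    "Sam I am",
    "I do not like green eggs and ham",
]

unigrams, bigrams = Counter(), Counter()
for s in sentences:
    tokens = ["<s>"] + s.split() + ["</s>"]
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def bigram_prob(prev, word):
    # MLE estimate: P(word | prev) = C(prev word) / C(prev)
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

print(bigram_prob("<s>", "I"))     # 2/3
print(bigram_prob("I", "am"))      # 2/3
print(bigram_prob("am", "Sam"))    # 1/2
print(bigram_prob("Sam", "</s>"))  # 1/2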

Log probabilities
• Probabilities are always between 0 and 1.
• When we multiply many small probabilities, the result becomes even smaller and can approach zero (a problem called numerical underflow).
• To avoid this, we work in log space, where multiplication becomes addition:

log(p1 · p2 · p3 · p4) = log p1 + log p2 + log p3 + log p4

• We add log probabilities throughout, and only convert back (by exponentiating) at the end if we need a raw probability.
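
A small Python sketch of the underflow problem and the log-space fix; the probability values are made up purely for illustration:

import math

# 1,000 made-up word probabilities, each fairly small.
probs = [0.01] * 1000

# Multiplying them directly underflows to exactly 0.0 in floating point.
product = 1.0
for p in probs:
    product *= p
print(product)      # 0.0

# Summing log probabilities keeps the result representable.
log_product = sum(math.log(p) for p in probs)
print(log_product)  # about -4605.17, i.e. 1000 * log(0.01)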
Naïve Bayes Classifier

The Bayes Theorem
• Generally, we want the most probable hypothesis h given the training data D.

• Maximum a posteriori hypothesis h_MAP:

h_MAP = argmax_h P(h|D) = argmax_h P(D|h) · P(h) / P(D) = argmax_h P(D|h) · P(h)

(P(D) can be dropped because it does not depend on h.)
Naïve Bayes Classifier
• Word counts in the training data:

Word     Not Spam (20 emails)   Spam (10 emails)
Dear     15                     10
Friend   8                      5
Money    1                      8
Bank     2                      7
Win      0                      6

• Priors: P(N) = ?  P(S) = ?
• Likelihoods: P(Vi|N) = ?  P(Vi|S) = ?
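
Below is a minimal Python sketch of how these counts could be used to score a message. The example message is invented, and add-1 (Laplace) smoothing is an assumption added here (the slide does not mention it) because Win never occurs in the not-spam emails:

import math

# Word counts from the slide: 20 not-spam emails and 10 spam emails.
counts = {
    "not_spam": {"Dear": 15, "Friend": 8, "Money": 1, "Bank": 2, "Win": 0},
    "spam":     {"Dear": 10, "Friend": 5, "Money": 8, "Bank": 7, "Win": 6},
}
n_emails = {"not_spam": 20, "spam": 10}
vocab = ["Dear", "Friend", "Money", "Bank", "Win"]

def log_score(words, label):
    # log P(label) + sum over words of log P(word | label), with add-1 smoothing.
    score = math.log(n_emails[label] / sum(n_emails.values()))
    total = sum(counts[label].values())
    for w in words:
        score += math.log((counts[label][w] + 1) / (total + len(vocab)))
    return score

message = ["Dear", "Friend", "Win", "Money"]
for label in ("not_spam", "spam"):
    print(label, round(log_score(message, label), 3))
# The label with the higher (less negative) log score is the predicted class.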
Sources

• https://web.stanford.edu/~jurafsky/slp3/3.pdf

• https://web.stanford.edu/~jurafsky/slp3/4.pdf
