
Copyright Notice

These slides are distributed under the Creative Commons License. DeepLearning.AI
makes these slides available for educational purposes. You may not use or distribute
these slides for commercial purposes. You may make copies of these slides and use or
distribute them for educational purposes as long as you cite DeepLearning.AI as the
source of the slides. For the rest of the details of the license, see
https://creativecommons.org/licenses/by-sa/2.0/legalcode
N-Grams:
Overview
deeplearning.ai
What you’ll be able to do!

● Create a language model (LM) from a text corpus to:
  ○ Estimate the probability of word sequences
  ○ Estimate the probability of a word following a sequence of words
● Apply this to autocomplete a sentence with the most likely suggestions

Text corpus => Language model => given “Lyn is eating …”, suggest: “chocolate”, “eggs”, “toast”
Other Applications
Speech recognition

P(I saw a van) > P(eyes awe of an)

Spelling correction
“He entered the ship to buy some groceries”: “ship” is a dictionary word, so a dictionary-based spell checker will not flag it
● P(entered the shop to buy) > P(entered the ship to buy)

Augmentative communication
Predict the most likely word from a menu for people who are unable to physically talk or sign.
(Newell et al., 1998)
Learning objectives
● Process a text corpus into an N-gram language model
● Handle out of vocabulary words
● Smoothing for previously unseen N-grams
● Language model evaluation
(Application: sentence auto-complete)
N-grams and
Probabilities
deeplearning.ai
Outline

● What are N-grams?

● N-grams and conditional probabilities from a corpus


N-gram
An N-gram is a sequence of N words

Corpus: I am happy because I am learning

Unigrams: { I , am , happy , because , learning }

Bigrams: { I am , am happy , happy because , … }   (note: “I happy” is not a bigram, because the two words are not consecutive in the corpus)

Trigrams: { I am happy , am happy because, … }
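A minimal sketch in plain Python (not the course's assignment code) of extracting N-grams from the example corpus; it returns every occurrence in order, rather than the unique sets shown above:

corpus = "I am happy because I am learning".split()

def ngrams(words, n):
    # All sequences of n consecutive words, as tuples
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

print(ngrams(corpus, 1))  # unigrams: ('I',), ('am',), ('happy',), ...
print(ngrams(corpus, 2))  # bigrams: ('I', 'am'), ('am', 'happy'), ...
print(ngrams(corpus, 3))  # trigrams: ('I', 'am', 'happy'), ...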


Sequence notation

Corpus: This is great … teacher drinks tea.

● w_1^m = w_1 w_2 … w_m denotes the whole sequence of m words
● E.g. w_1^3 = This is great (the first three words)
● E.g. w_(m-2)^m = teacher drinks tea (the last three words)

Unigram probability
Corpus: I am happy because I am learning

Size of corpus m = 7

Probability of a unigram: P(w) = C(w) / m, e.g. P(I) = C(I)/m = 2/7, P(happy) = C(happy)/m = 1/7
Bigram probability
Corpus: I am happy because I am learning

Probability of a bigram: P(y|x) = C(x y) / C(x)
E.g. P(am|I) = C(I am)/C(I) = 2/2 = 1, but P(happy|I) = C(I happy)/C(I) = 0/2 = 0 (“I happy” never occurs)
Trigram Probability
Corpus: I am happy because I am learning

Probability of a trigram: P(w3 | w1 w2) = C(w1 w2 w3) / C(w1 w2), e.g. P(happy | I am) = C(I am happy)/C(I am) = 1/2
N-gram probability

Probability of an N-gram: P(w_N | w_1^(N-1)) = C(w_1^(N-1) w_N) / C(w_1^(N-1))
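A minimal sketch in plain Python (not the course's assignment code) of estimating P(w_N | w_1^(N-1)) directly from counts in the toy corpus above:

from collections import Counter

corpus = "I am happy because I am learning".split()

def count_ngrams(words, n):
    # Count every sequence of n consecutive words
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def ngram_prob(words, context, word):
    # P(word | context) = C(context word) / C(context)
    n = len(context)
    numerator = count_ngrams(words, n + 1)[tuple(context) + (word,)]
    denominator = count_ngrams(words, n)[tuple(context)]
    return numerator / denominator if denominator else 0.0

print(ngram_prob(corpus, ["I"], "am"))           # C(I am)/C(I) = 2/2 = 1.0
print(ngram_prob(corpus, ["I", "am"], "happy"))  # C(I am happy)/C(I am) = 1/2 = 0.5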
Quiz

Objective: Apply n-gram probability calculation on sample corpus and 3-gram.

Question:
Corpus: “In every place of great resort the monster was the fashion. They sang of it in the cafes, ridiculed it in the papers, and represented it on
the stage. ” (Jules Verne, Twenty Thousand Leagues under the Sea)

In the context of our corpus, what is the probability of the word “papers” following the phrase “it in the”?

Type: Multiple Choice, single answer


Options and solution:

1. P(papers|it in the) = 0
2. P(papers|it in the) = 1
3. P(papers|it in the) = 2/3
4. P(papers|it in the) = 1/2 (correct: P(papers|it in the) = C(it in the papers)/C(it in the) = 1/2)
Sequence
Probabilities
deeplearning.ai
Outline
● Sequence probability

● Sequence probability shortcomings

● Approximation by N-gram probabilities


Probability of a sequence
● Given a sentence, what is its probability? P(the teacher drinks tea) = ?
● Conditional probability and chain rule reminder:
  P(B|A) = P(A, B) / P(A)  =>  P(A, B) = P(A) P(B|A)
  P(A, B, C, D) = P(A) P(B|A) P(C|A, B) P(D|A, B, C)
Probability of a sequence: sentence not in corpus
● Problem: the corpus almost never contains the exact sentence we’re interested in, or even its longer subsequences!

Input: the teacher drinks tea
P(tea | the teacher drinks) = C(the teacher drinks tea) / C(the teacher drinks): both counts are likely 0
Approximation of sequence probability
the teacher drinks tea: P(tea | the teacher drinks) ≈ P(tea | drinks)
● Markov assumption: only the last N words matter
● Bigram: P(w_n | w_1^(n-1)) ≈ P(w_n | w_(n-1))
● N-gram: P(w_n | w_1^(n-1)) ≈ P(w_n | w_(n-N+1)^(n-1))
● Entire sentence modeled with bigrams:
  P(w_1^n) ≈ P(w_1) P(w_2|w_1) … P(w_n|w_(n-1))
  e.g. P(the teacher drinks tea) ≈ P(the) P(teacher|the) P(drinks|teacher) P(tea|drinks)
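A minimal sketch in plain Python (not the course's assignment code) of the bigram approximation of a sentence probability, with each P(w_i | w_(i-1)) estimated from counts in a toy corpus:

from collections import Counter

corpus = "I am happy because I am learning".split()
unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def bigram_prob(prev, word):
    # P(word | prev) = C(prev word) / C(prev)
    return bigram_counts[(prev, word)] / unigram_counts[prev]

def sentence_prob(sentence):
    # Markov (bigram) approximation: product of P(w_i | w_{i-1});
    # P(w_1) is omitted here since there is no start token yet
    words = sentence.split()
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= bigram_prob(prev, word)
    return p

print(sentence_prob("I am learning"))  # P(am|I) * P(learning|am) = 1.0 * 0.5 = 0.5

The next lessons add <s> and </s> so that the first and last words of a sentence can also be conditioned on a context.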


Quiz

Objective: Apply sequence probability approximation with bigrams.

Question:
Given these conditional probabilities
P(Mary)=0.1; P(likes)=0.2; P(cats)=0.3
P(Mary|likes) =0.2; P(likes|Mary) =0.3; P(cats|likes)=0.1; P(likes|cats)=0.4

Approximate the probability of the following sentence with bigrams: “Mary likes cats”

Type: Multiple Choice, single answer


Options and solution:

1. P(Mary likes cats) = 0 2. P(Mary likes cats) =1

3. P(Mary likes cats) = 0.003 4. P(Mary likes cats) = 0.008


Starting and
Ending
Sentences
deeplearning.ai
Outline

● Start of sentence symbols <s>

● End of sentence symbol </s>


Start of sentence token <s>

the teacher drinks tea  =>  <s> the teacher drinks tea

● With <s>, the first word is also conditioned on a context: P(the | <s>) instead of P(the)
Start of sentence token <s> for N-grams
● Trigram:

the teacher drinks tea => <s> <s> the teacher drinks tea

● N-gram model: add N-1 start tokens <s>


End of sentence token </s> - motivation

Corpus:
<s> Lyn drinks chocolate
<s> John drinks
End of sentence token </s> - motivation
Corpus:                  All possible sentences of length 2:
<s> yes no               <s> yes yes    <s> yes no
<s> yes yes              <s> no no      <s> no yes
<s> no no
End of sentence token </s> - motivation
Corpus:                  All possible sentences of length 3:
<s> yes no               <s> yes yes yes    <s> yes yes no
<s> yes yes              <s> no no no       …
<s> no no
End of sentence token </s> - motivation
Corpus:
<s> yes no
<s> yes yes
<s> no no
Without </s>, the bigram model’s probabilities for all sentences of any fixed length sum to 1, so sentence probabilities across different lengths do not form a single distribution.
End of sentence token </s> - solution

● Bigram
<s> the teacher drinks tea => <s> the teacher drinks tea </s>

Corpus:
<s> Lyn drinks chocolate </s>
<s> John drinks </s>
End of sentence token </s> for N-grams

● N-gram => just one </s>

E.g. Trigram:
the teacher drinks tea => <s> <s> the teacher drinks tea </s>
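A minimal sketch in plain Python of adding N-1 start tokens and a single end token to each sentence before counting N-grams (the helper name is illustrative, not from the course):

def add_sentence_markers(sentence, n):
    # Prepend n-1 start tokens <s> and append a single end token </s>
    return ["<s>"] * (n - 1) + sentence.split() + ["</s>"]

print(add_sentence_markers("the teacher drinks tea", 2))
# ['<s>', 'the', 'teacher', 'drinks', 'tea', '</s>']
print(add_sentence_markers("the teacher drinks tea", 3))
# ['<s>', '<s>', 'the', 'teacher', 'drinks', 'tea', '</s>']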
Example - bigram
Corpus:
<s> Lyn drinks chocolate </s>
<s> John drinks tea </s>
<s> Lyn eats chocolate </s>
E.g. P(Lyn|<s>) = C(<s> Lyn)/C(<s>) = 2/3, P(drinks|Lyn) = 1/2, P(</s>|chocolate) = 2/2 = 1
Quiz

Objective: Apply sequence probability approximation with bigrams after adding start and end word.

Question:
Given these conditional probabilities
P(Mary)=0.1; P(likes)=0.2; P(cats)=0.3
P(Mary|<s>)=0.2; P(</s>|cats)=0.6
P(likes|Mary) =0.3; P(cats|likes)=0.1

Approximate the probability of the following sentence with bigrams: “<s> Mary likes cats </s>”
Type: Multiple Choice, single answer
Options and solution:

1. P(<s> Mary likes cats </s>) = 0
2. P(<s> Mary likes cats </s>) = 0.0036 (correct: P(Mary|<s>) P(likes|Mary) P(cats|likes) P(</s>|cats) = 0.2 × 0.3 × 0.1 × 0.6 = 0.0036)
3. P(<s> Mary likes cats </s>) = 0.003
4. P(<s> Mary likes cats </s>) = 1
The N-gram
Language
Model
deeplearning.ai
Outline
● Count matrix
● Probability matrix
● Language model
● Log probability to avoid underflow
● Generative language model
Count matrix

● Rows: unique corpus (N-1)-grams


● Columns: unique corpus words

Corpus: <s> I study I learn </s>

● Bigram count matrix:

         <s>   </s>   I   study   learn
<s>       0     0     1     0       0
</s>      0     0     0     0       0
I         0     0     0     1       1
study     0     0     1     0       0
learn     0     1     0     0       0

(e.g. the bigram “study I” gives the 1 in row “study”, column “I”)
Probability matrix
● Divide each cell by its row sum

Corpus: <s> I study I learn </s>

Count matrix (bigram), with row sums:
         <s>   </s>   I   study   learn   sum
<s>       0     0     1     0       0      1
</s>      0     0     0     0       0      0
I         0     0     0     1       1      2
study     0     0     1     0       0      1
learn     0     1     0     0       0      1

Probability matrix:
         <s>   </s>   I   study   learn
<s>       0     0     1     0       0
</s>      0     0     0     0       0
I         0     0     0    0.5     0.5
study     0     0     1     0       0
learn     0     1     0     0       0
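A minimal sketch in plain Python (not the course's assignment code) of building the bigram count matrix and turning it into a probability matrix by row normalization:

from collections import Counter, defaultdict

corpus = "<s> I study I learn </s>".split()
vocab = sorted(set(corpus))

# Count matrix: counts[prev][word] = C(prev word)
counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    counts[prev][word] += 1

# Probability matrix: divide each row by its row sum
probs = {}
for prev in vocab:
    row_sum = sum(counts[prev].values())
    probs[prev] = {w: (counts[prev][w] / row_sum if row_sum else 0.0)
                   for w in vocab}

print(probs["I"]["study"], probs["I"]["learn"])  # 0.5 0.5
print(probs["</s>"])  # all zeros: </s> never starts a bigram in this corpus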
Language model
● Probability matrix => language model
  ○ Sentence probability
  ○ Next word prediction

Probability matrix:
         <s>   </s>   I   study   learn
<s>       0     0     1     0       0
</s>      0     0     0     0       0
I         0     0     0    0.5     0.5
study     0     0     1     0       0
learn     0     1     0     0       0

Sentence probability:
P(<s> I learn </s>) = P(I|<s>) P(learn|I) P(</s>|learn) = 1 × 0.5 × 1 = 0.5
Log probability
● All probabilities in the calculation are <= 1, and multiplying many of them risks numerical underflow

● Logarithm properties reminder: log(a × b) = log(a) + log(b)

● Use the log of the probabilities in the probability matrix and sum them:
  log P(w_1^n) ≈ Σ_i log P(w_i | w_(i-1))

● Convert back from log space with the exponential: p = e^(log p)
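A minimal sketch in plain Python of summing log bigram probabilities instead of multiplying the probabilities, using the values from the probability matrix above:

import math

# Bigram probabilities taken from the probability matrix above
bigram_probs = {("<s>", "I"): 1.0, ("I", "learn"): 0.5, ("learn", "</s>"): 1.0}

def sentence_log_prob(words):
    # Sum log P(w_i | w_{i-1}) to avoid underflow
    return sum(math.log(bigram_probs[(prev, w)]) for prev, w in zip(words, words[1:]))

log_p = sentence_log_prob(["<s>", "I", "learn", "</s>"])
print(log_p)            # log(1) + log(0.5) + log(1) = -0.693...
print(math.exp(log_p))  # convert back: 0.5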


Generative Language model
Corpus:
<s> Lyn drinks chocolate </s>
<s> John drinks tea </s>
<s> Lyn eats chocolate </s>

Example choices while generating:
1. (<s>, Lyn) or (<s>, John)?
2. (Lyn, eats) or (Lyn, drinks)?
3. (drinks, tea) or (drinks, chocolate)?
4. (tea, </s>) - always
Algorithm:
1. Choose sentence start
2. Choose next bigram starting with previous word
3. Continue until </s> is picked
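A minimal sketch in plain Python of this generation loop, sampling each next word in proportion to its bigram count in the corpus above (the helper names are illustrative, not from the course):

import random
from collections import Counter, defaultdict

sentences = [
    "<s> Lyn drinks chocolate </s>",
    "<s> John drinks tea </s>",
    "<s> Lyn eats chocolate </s>",
]

# For each previous word, count the possible next words
next_counts = defaultdict(Counter)
for s in sentences:
    words = s.split()
    for prev, word in zip(words, words[1:]):
        next_counts[prev][word] += 1

def generate(max_len=20):
    # Start at <s>, repeatedly sample the next word from P(. | previous word),
    # and stop when </s> is picked
    word, output = "<s>", []
    for _ in range(max_len):
        candidates = list(next_counts[word].keys())
        weights = list(next_counts[word].values())
        word = random.choices(candidates, weights=weights)[0]
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)

print(generate())  # e.g. "Lyn drinks tea" or "John drinks chocolate"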
Quiz

Objective: Apply sum when calculating log probability instead of product.

Question:
Given the logarithm of these conditional probabilities:
log(P(Mary|<s>))=-2; log(P(</s>|cats))=-1
log(P(likes|Mary)) =-10; log(P(cats|likes))=-100

Approximate the log probability of the following sentence with bigrams : “<s> Mary likes cats </s>”

Type: Multiple Choice, single answer


Options and solution:

1. log(P(<s> Mary likes cats </s>)) = -113 (correct: (-2) + (-10) + (-100) + (-1) = -113)
2. log(P(<s> Mary likes cats </s>)) = 2000
3. log(P(<s> Mary likes cats </s>)) = 113
4. log(P(<s> Mary likes cats </s>)) = -112
Language
Model
Evaluation
deeplearning.ai
Outline
● Train/Validation/Test split

● Perplexity
Test data
● Split the corpus into Train/Validation/Test sets; evaluate on held-out data, never on the training data

● For smaller corpora:
  ○ 80% Train
  ○ 10% Validation
  ○ 10% Test

● For large corpora (typical for text):
  ○ 98% Train
  ○ 1% Validation
  ○ 1% Test
Test data - split method
● Continuous text: take contiguous blocks of the corpus for Validation and Test
● Random short sequences: sample short sequences from across the corpus
(Figure: the corpus partitioned into Training, Validation, and Test portions)
Perplexity

PP(W) = P(s_1, s_2, …, s_m)^(-1/m)

W → test set containing the sentences s_i
s_i → i-th sentence in the test set, each ending with </s>
m → number of all words in the entire test set W, including </s> but not including <s>
Perplexity

E.g. with m = 100 words, PP(W) = P(W)^(-1/100)

● Smaller perplexity = better model

● Character-level models have lower perplexity (PP) than word-based models
Perplexity for bigram models

PP(W) = ( ∏_(i=1)^m ∏_(j=1)^(|s_i|) 1 / P(w_j^(i) | w_(j-1)^(i)) )^(1/m)

w_j^(i) → j-th word in the i-th sentence

● Concatenating all sentences in W into one sequence gives

PP(W) = ( ∏_(i=1)^m 1 / P(w_i | w_(i-1)) )^(1/m)

w_i → i-th word in the test set
Log perplexity

log PP(W) = -(1/m) Σ_(i=1)^m log P(w_i | w_(i-1))

Example: training on 38 million words and testing on 1.5 million words of the WSJ corpus gives perplexities of 962 (unigram), 170 (bigram), and 109 (trigram).

[Figure from Speech and Language Processing by Dan Jurafsky et al.]
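A minimal sketch in plain Python of computing log perplexity over a tiny test sequence from hypothetical bigram probabilities, normalizing by the word count m that includes </s> but not <s>:

import math

# Hypothetical bigram probabilities for a tiny test sequence
bigram_probs = {("<s>", "I"): 1.0, ("I", "learn"): 0.5, ("learn", "</s>"): 1.0}

def log_perplexity(words):
    # log PP(W) = -(1/m) * sum of log2 P(w_i | w_{i-1}),
    # where m counts </s> but not <s>; any log base works if used consistently
    m = sum(1 for w in words if w != "<s>")
    log_sum = sum(math.log2(bigram_probs[(prev, w)]) for prev, w in zip(words, words[1:]))
    return -log_sum / m

log_pp = log_perplexity(["<s>", "I", "learn", "</s>"])
print(log_pp)        # -(0 + (-1) + 0) / 3 = 0.333...
print(2 ** log_pp)   # perplexity ≈ 1.26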
Quiz

Objective: Calculate log perplexity from log probabilities using sum and correct normalization coefficient (not
including <s>).
Question:
Given the logarithm of these conditional probabilities:
log(P(Mary|<s>))=-2; log(P(</s>|cats))=-1
log(P(likes|Mary)) =-10; log(P(cats|likes))=-100

Assuming our test set is W = “<s> Mary likes cats </s>”, what is the model’s log perplexity?
Type: Multiple Choice, single answer
Options and solution:

1. log PP(W) = -113
2. log PP(W) = (-1/4) × (-113) (correct: m = 4 words, counting </s> but not <s>)
3. log PP(W) = (-1/5) × (-113)
4. log PP(W) = (-1/5) × 113


Out of
Vocabulary
Words
deeplearning.ai
Outline

● Unknown words

● Update corpus with <UNK>

● Choosing vocabulary
Out of vocabulary words
● Closed vs. open vocabularies: a closed vocabulary assumes every input word is known; an open vocabulary allows unseen words

● Unknown word = out of vocabulary (OOV) word

● Use the special tag <UNK> in the corpus and in the input


Using <UNK> in corpus
● Create vocabulary V

● Replace any word in the corpus that is not in V with <UNK>

● Estimate the probabilities treating <UNK> like any other word


Example
Min frequency f = 2  =>  Vocabulary V = {Lyn, drinks, chocolate}

Original corpus:                      Corpus with <UNK>:
<s> Lyn drinks chocolate </s>         <s> Lyn drinks chocolate </s>
<s> John drinks tea </s>              <s> <UNK> drinks <UNK> </s>
<s> Lyn eats chocolate </s>           <s> Lyn <UNK> chocolate </s>

Input query:
<s> Adam drinks chocolate </s>  =>  <s> <UNK> drinks chocolate </s>
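A minimal sketch in plain Python (not the course's assignment code) of building the vocabulary by minimum frequency and replacing out-of-vocabulary words with <UNK>:

from collections import Counter

sentences = [
    "<s> Lyn drinks chocolate </s>",
    "<s> John drinks tea </s>",
    "<s> Lyn eats chocolate </s>",
]

min_freq = 2
word_counts = Counter(w for s in sentences for w in s.split() if w not in ("<s>", "</s>"))
vocab = {w for w, c in word_counts.items() if c >= min_freq}  # {'Lyn', 'drinks', 'chocolate'}

def replace_oov(sentence):
    # Keep <s>, </s>, and in-vocabulary words; replace everything else with <UNK>
    return " ".join(w if w in vocab or w in ("<s>", "</s>") else "<UNK>"
                    for w in sentence.split())

print([replace_oov(s) for s in sentences])
print(replace_oov("<s> Adam drinks chocolate </s>"))  # <s> <UNK> drinks chocolate </s>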
How to create vocabulary V
● Criteria:
○ Min word frequency f
○ Max |V|, include words by frequency

● Use <UNK> sparingly

● Perplexity: only compare language models that use the same vocabulary V


Quiz

Objective: Create corpus vocabulary based on minimum frequency.

Question:
Given the training corpus and a minimum word frequency of 2, what would the vocabulary for the corpus
preprocessed with <UNK> look like?

“<s> I am happy I am learning </s> <s> I am happy I can study </s>”

Type: Multiple Choice, single answer


Options and solution:

1. V = (I, am, happy) (correct: only “I”, “am”, and “happy” appear at least twice)
2. V = (I, am, happy, learning, can, study)
3. V = (I, am, happy, I, am)
4. V = (I, am, happy, learning, can, study, <UNK>)
Smoothing
deeplearning.ai
Outline
● Missing N-grams in corpus

● Smoothing

● Backoff and interpolation


Missing N-grams in training corpus
● Problem: N-grams made of known words might still be missing from the training corpus, e.g. “John” and “eats” each appear in the corpus, but the bigram “John eats” does not
● Their counts cannot be used directly for probability estimation: C(w_(n-1), w_n) can be 0

Smoothing
● Add-one smoothing (Laplacian smoothing):
  P(w_n | w_(n-1)) = (C(w_(n-1), w_n) + 1) / (C(w_(n-1)) + |V|)
● Add-k smoothing:
  P(w_n | w_(n-1)) = (C(w_(n-1), w_n) + k) / (C(w_(n-1)) + k|V|)
● Advanced methods: Kneser-Ney smoothing, Good-Turing smoothing
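A minimal sketch in plain Python of add-k smoothing for bigram probabilities, matching the quiz at the end of this section:

from collections import Counter

corpus = "I am happy I am learning".split()
vocab = set(corpus)  # {'I', 'am', 'happy', 'learning'}, |V| = 4
unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def add_k_prob(prev, word, k=1.0):
    # P(word | prev) = (C(prev word) + k) / (C(prev) + k * |V|)
    return (bigram_counts[(prev, word)] + k) / (unigram_counts[prev] + k * len(vocab))

print(add_k_prob("I", "can", k=3))  # (0 + 3) / (2 + 3 * 4) = 3/14 ≈ 0.214
print(add_k_prob("I", "am", k=1))   # (2 + 1) / (2 + 1 * 4) = 0.5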
Backoff
● If the N-gram is missing => use the (N-1)-gram, then the (N-2)-gram, …
  ○ Probability discounting, e.g. Katz backoff
  ○ “Stupid” backoff

Corpus:
<s> Lyn drinks chocolate </s>
<s> John drinks tea </s>
<s> Lyn eats chocolate </s>
E.g. the trigram “John drinks chocolate” does not occur, so back off to the bigram probability P(chocolate | drinks)
Interpolation
● Combine the probability estimates of different N-gram orders with weights λ_i that sum to 1, e.g. for a trigram model:
  P̂(w_n | w_(n-2) w_(n-1)) = λ_1 P(w_n | w_(n-2) w_(n-1)) + λ_2 P(w_n | w_(n-1)) + λ_3 P(w_n), with λ_1 + λ_2 + λ_3 = 1
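A minimal sketch in plain Python of linear interpolation; the component probabilities and the λ weights below are illustrative values, not taken from the course:

def interpolated_prob(p_trigram, p_bigram, p_unigram, lambdas=(0.7, 0.2, 0.1)):
    # Weighted combination of trigram, bigram, and unigram estimates; weights sum to 1
    l1, l2, l3 = lambdas
    return l1 * p_trigram + l2 * p_bigram + l3 * p_unigram

# e.g. P(chocolate | John drinks) with assumed component estimates
print(interpolated_prob(p_trigram=0.0, p_bigram=0.5, p_unigram=0.4))  # 0.7*0 + 0.2*0.5 + 0.1*0.4 = 0.14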
Quiz

Objective: Apply n-gram probability with add-k smoothing for phrase not present in the corpus.

Question:
Corpus: “I am happy I am learning”

In the context of our corpus, what is the estimated probability of the word “can” following the word “I”, using the
bigram model and add-k smoothing with k = 3?

Type: Multiple Choice, single answer


Options and solution:

1. P(can|I) = 0
2. P(can|I) = 1
3. P(can|I) = 3/(2 + 3×4) (correct: (C(I can) + k) / (C(I) + k|V|) = (0 + 3) / (2 + 3×4) = 3/14)
4. P(can|I) = 3/(3×4)


Week
Summary
deeplearning.ai
Summary
● N-Grams and probabilities
● Approximate sentence probabilities with N-grams
● Build a language model from a corpus
● Handle missing information:
  ○ Out of vocabulary words with <UNK>
  ○ Missing N-grams in the corpus with smoothing, backoff, and interpolation
● Evaluate a language model with perplexity
● Coding assignment!
