NLP Unit-4

Language models assign probabilities to sentences, aiding various NLP applications like speech recognition and machine translation. N-gram models, which estimate probabilities based on prior context, can suffer from sparse data issues, necessitating smoothing techniques to adjust estimates. Syntactic parsing, using context-free grammars, helps in understanding sentence structure and relationships among words.

Language Models

• Formal grammars (e.g. regular, context-free) give a hard “binary” model of the legal sentences in a language.
• For NLP, a probabilistic model of a language that gives the probability that a string is a member of the language is more useful.
• To specify a correct probability distribution, the probabilities of all sentences in a language must sum to 1.
Uses of Language Models
• Speech recognition
– “I ate a cherry” is a more likely sentence than “Eye eight
uh Jerry”
• OCR & Handwriting recognition
– More probable sentences are more likely correct readings.
• Machine translation
– More likely sentences are probably better translations.
• Generation
– More likely sentences are probably better NL generations.
• Context sensitive spelling correction
– “Their are problems wit this sentence.”
Completion Prediction

• A language model also supports predicting the completion of a sentence.
– Please turn off your cell _____
– Your program does not ______
• Predictive text input systems can guess what you are typing and give choices on how to complete it.
N-Gram Models
• Estimate probability of each word given prior context.
– P(phone | Please turn off your cell)
• Number of parameters required grows exponentially with
the number of words of prior context.
• An N-gram model uses only N−1 words of prior context.
– Unigram: P(phone)
– Bigram: P(phone | cell)
– Trigram: P(phone | your cell)
• The Markov assumption is the presumption that the future behavior of a dynamical system depends only on its recent history. In particular, in a kth-order Markov model, the next state depends only on the k most recent states; therefore an N-gram model is an (N−1)th-order Markov model.
N-Gram Model Formulas

• Word sequences

    w_1^n = w_1 ... w_n

• Chain rule of probability

    P(w_1^n) = P(w_1) P(w_2 | w_1) P(w_3 | w_1^2) ... P(w_n | w_1^{n-1}) = \prod_{k=1}^{n} P(w_k | w_1^{k-1})

• Bigram approximation

    P(w_1^n) \approx \prod_{k=1}^{n} P(w_k | w_{k-1})

• N-gram approximation

    P(w_1^n) \approx \prod_{k=1}^{n} P(w_k | w_{k-N+1}^{k-1})
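To make the bigram approximation concrete, here is a minimal Python sketch that multiplies bigram probabilities over a sentence padded with <s> and </s>; the probability values are the ones quoted in the textbook example later in this unit, and the dict and function names are invented for illustration.

# Minimal sketch of the bigram approximation (all names are illustrative).
bigram_prob = {
    ("<s>", "i"): 0.25, ("i", "want"): 0.33, ("want", "english"): 0.0011,
    ("english", "food"): 0.5, ("food", "</s>"): 0.68,
}

def sentence_prob(words, probs):
    """P(w_1^n) ~= product over k of P(w_k | w_{k-1}), with <s>/</s> padding."""
    padded = ["<s>"] + words + ["</s>"]
    p = 1.0
    for prev, cur in zip(padded, padded[1:]):
        p *= probs.get((prev, cur), 0.0)   # unseen bigrams get probability 0 without smoothing
    return p

print(sentence_prob("i want english food".split(), bigram_prob))   # ~3.1e-05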
Estimating Probabilities
• N-gram conditional probabilities can be estimated
from raw text based on the relative frequency of
word sequences.
    Bigram:  P(w_n | w_{n-1}) = C(w_{n-1} w_n) / C(w_{n-1})

    N-gram:  P(w_n | w_{n-N+1}^{n-1}) = C(w_{n-N+1}^{n-1} w_n) / C(w_{n-N+1}^{n-1})
• To have a consistent probabilistic model, append a
unique start (<s>) and end (</s>) symbol to every
sentence and treat these as additional words.
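A minimal sketch of this relative-frequency estimate in code, assuming sentences arrive as plain strings; it appends <s> and </s> as described above, and the toy corpus is the one shown on the "Given corpus" slide a few slides below. Names are illustrative.

from collections import Counter

def train_bigrams(sentences):
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        words = ["<s>"] + sent.split() + ["</s>"]   # append start/end symbols
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    # P(w_n | w_{n-1}) = C(w_{n-1} w_n) / C(w_{n-1})
    return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}

probs = train_bigrams(["I AM SAM", "SAM I AM", "I DO NOT LIKE GREEN VEG"])
print(probs[("<s>", "I")], probs[("I", "AM")])   # 2/3 and 2/3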
Generative Model & MLE
• An N-gram model can be seen as a probabilistic automaton for generating sentences.
Initialize sentence with N−1 <s> symbols
Until </s> is generated do:
Stochastically pick the next word based on the conditional
probability of each word given the previous N −1 words.

• Relative frequency estimates can be proven to be maximum likelihood estimates (MLE) since they maximize the probability that the model M will generate the training corpus T.

    \hat{\theta} = argmax_{\theta} P(T | M(\theta))
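A minimal sketch of the sampling loop above for the bigram (N = 2) case; the probability table is hand-built from the two sentences <s> I AM SAM </s> and <s> SAM I AM </s> (which also appear in the toy corpus a few slides below), and all names are illustrative.

import random

def generate(bigram_prob, max_len=20):
    sentence, prev = [], "<s>"          # initialize with N-1 = 1 start symbol
    for _ in range(max_len):
        # conditional distribution over next words given the previous word
        candidates = {w2: p for (w1, w2), p in bigram_prob.items() if w1 == prev}
        if not candidates:
            break
        words, weights = zip(*candidates.items())
        nxt = random.choices(words, weights=weights)[0]   # stochastically pick the next word
        if nxt == "</s>":                                  # stop once </s> is generated
            break
        sentence.append(nxt)
        prev = nxt
    return " ".join(sentence)

toy = {("<s>", "I"): 0.5, ("<s>", "SAM"): 0.5, ("I", "AM"): 1.0,
       ("AM", "SAM"): 0.5, ("AM", "</s>"): 0.5, ("SAM", "I"): 0.5, ("SAM", "</s>"): 0.5}
print(generate(toy))   # e.g. "I AM SAM" or "SAM I AM"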

Example from Textbook

• P(<s> i want english food </s>)
  = P(i | <s>) P(want | i) P(english | want) P(food | english) P(</s> | food)
  = .25 x .33 x .0011 x .5 x .68 = .000031
• P(<s> i want chinese food </s>)
  = P(i | <s>) P(want | i) P(chinese | want) P(food | chinese) P(</s> | food)
  = .25 x .33 x .0065 x .52 x .68 = .00019
The Chain Rule Applied to Compute the Joint Probability of Words in a Sentence

    P(w_1 w_2 ... w_n) = \prod_i P(w_i | w_1 w_2 ... w_{i-1})

P(“its water is so transparent”) =
    P(its) × P(water | its) × P(is | its water) × P(so | its water is) × P(transparent | its water is so)
How to Estimate These Probabilities
• Could we just count and divide?

    P(the | its water is so transparent that) =
        Count(its water is so transparent that the) / Count(its water is so transparent that)

• No! Too many possible sentences!
• We’ll never see enough data for estimating these.
Given corpus (training corpus):
<s> I AM SAM </s>
<s> SAM I AM </s>
<s> I DO NOT LIKE GREEN VEG </s>
Bigram probability estimates:
P(I | <s>) = 2/3 = 0.67
P(SAM | <s>) = 1/3 = 0.33
P(AM | I) = 2/3 = 0.67
P(</s> | SAM) = 1/2 = 0.50
P(SAM | AM) = ?
P(DO | I) = ?
Train and Test Corpora
• A language model must be trained on a large
corpus of text to estimate good parameter values.
• Model can be evaluated based on its ability to
predict a high probability for a disjoint (held-out)
test corpus (testing on the training corpus would
give an optimistically biased estimate).
• Ideally, the training (and test) corpus should be
representative of the actual application data.
• May need to adapt a general model to a small
amount of new (in-domain) data by adding highly
weighted small corpus to original training data.
Evaluation of Language Models
• Ideally, evaluate use of model in end application
(extrinsic)
– Realistic
– Expensive
• Evaluate on ability to model test corpus
(intrinsic).
– Less realistic
– Cheaper
• Verify at least once that intrinsic evaluation
correlates with an extrinsic one.
Perplexity
• Measure of how well a model “fits” the test data.
• Uses the probability that the model assigns to the
test corpus.
• Normalizes for the number of words in the test
corpus and takes the inverse.
    PP(W) = P(w_1 w_2 ... w_N)^{-1/N}

• Measures the weighted average branching factor in predicting the next word (lower is better).
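A minimal sketch of this computation, done in log space to avoid numerical underflow on long test corpora; the input format (one model probability per test word) and the function name are assumptions for illustration.

import math

def perplexity(word_probs):
    """PP(W) = P(w_1 ... w_N)^(-1/N), given the model's probability for each test word."""
    N = len(word_probs)
    log_prob = sum(math.log(p) for p in word_probs)
    return math.exp(-log_prob / N)

# The digit-recognition mini-language of the next slide: each of 10 digits has P = 1/10.
print(perplexity([0.1] * 5))   # 10.0, regardless of the sequence length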
Sample Perplexity Evaluation

• The branching factor of a language is the


number of possible next words that can
follow any word. Consider the task of
recognizing the digits in English (zero, one,
two,..., nine), given that each of the 10
digits occurs with equal probability P = 1/
10 . The perplexity of this mini-language is
in fact 10.
• PP(W) = P(w1w2 ...wN) − 1/ N
• = ( (1 /10) N ) − 1/ N = 10
Smoothing
• Since there are a combinatorial number of possible
word sequences, many rare (but not impossible)
combinations never occur in training, so MLE
incorrectly assigns zero to many parameters (a.k.a.
sparse data).
• If a new combination occurs during testing, it is
given a probability of zero and the entire sequence
gets a probability of zero (i.e. infinite perplexity).
• In practice, parameters are smoothed (a.k.a.
regularized) to reassign some probability mass to
unseen events.
– Adding probability mass to unseen events requires
removing it from seen ones (discounting) in order to
maintain a joint distribution that sums to 1.
Laplace (Add-One) Smoothing
• “Hallucinate” additional training data in which each
possible N-gram occurs exactly once and adjust
estimates accordingly.
    Bigram:  P(w_n | w_{n-1}) = (C(w_{n-1} w_n) + 1) / (C(w_{n-1}) + V)

    N-gram:  P(w_n | w_{n-N+1}^{n-1}) = (C(w_{n-N+1}^{n-1} w_n) + 1) / (C(w_{n-N+1}^{n-1}) + V)
where V is the total number of possible (N−1)-grams
(i.e. the vocabulary size for a bigram model).
• Tends to reassign too much mass to unseen events, so it can be adjusted to add a fractional count δ with 0 < δ < 1 (normalized by δV instead of V).
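A minimal sketch of the add-one estimate, reusing the counting style of the earlier bigram sketch; the tiny corpus and all names are illustrative.

from collections import Counter

def laplace_bigram_prob(w_prev, w, bigram_counts, unigram_counts, V):
    # P(w | w_prev) = (C(w_prev w) + 1) / (C(w_prev) + V)
    return (bigram_counts[(w_prev, w)] + 1) / (unigram_counts[w_prev] + V)

sents = [["<s>", "i", "am", "sam", "</s>"], ["<s>", "sam", "i", "am", "</s>"]]
unigram_counts = Counter(w for s in sents for w in s)
bigram_counts = Counter(b for s in sents for b in zip(s, s[1:]))
V = len(unigram_counts)   # vocabulary size used for the bigram model
print(laplace_bigram_prob("i", "do", bigram_counts, unigram_counts, V))   # unseen bigram, yet > 0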
Advanced Smoothing

• Many advanced techniques have been developed to improve smoothing for language models.
– Good-Turing
– Interpolation
– Backoff
– Kneser-Ney
– Class-based (cluster) N-grams
A Problem for N-Grams:
Long Distance Dependencies
• Many times local context does not provide the
most useful predictive clues, which instead are
provided by long-distance dependencies.
– Syntactic dependencies
• “The man next to the large oak tree near the grocery store on
the corner is tall.”
• “The men next to the large oak tree near the grocery store on
the corner are tall.”
– Semantic dependencies
• “The bird next to the large oak tree near the grocery store on
the corner flies rapidly.”
• “The man next to the large oak tree near the grocery store on
the corner talks rapidly.”
• More complex models of language are needed to
handle such dependencies.
Summary
• Language models assign a probability that a
sentence is a legal string in a language.
• They are useful as a component of many NLP
systems, such as ASR, OCR, and MT.
• Simple N-gram models are easy to train on
unsupervised corpora and can provide useful
estimates of sentence likelihood.
• MLE gives inaccurate parameters for models
trained on sparse data.
• Smoothing techniques adjust parameter estimates
to account for unseen (but not impossible) events.
Syntactic Parsing
• Syntax: the way words are arranged together
• Main ideas of syntax:
– Constituency
• Groups of words may behave as a single unit or phrase, called a constituent.
• CFG, a formalism allowing us to model the
constituency facts.
– Grammatical relations
• A formalization of ideas from traditional grammar
about SUBJECT and OBJECT
– Subcategorization and dependencies
• Referring to certain kinds of relations between words and phrases, e.g., the verb want can be followed by an infinitive, as in I want to fly to Detroit.
Background
• All of these kinds of syntactic knowledge can be modeled by various kinds of CFG-based grammars.
• CFGs are thus the backbone of many models of the syntax of natural language.
– They are integral to most models of NLU, of grammar checking, and more recently of speech understanding.
• They are powerful enough to express sophisticated relations among the words in a sentence, yet computationally tractable enough that efficient algorithms exist for parsing sentences with them.
• There is also a probabilistic version of CFG.

9.1 Constituency
• NP:
– A sequence of words surrounding at least one noun, e.g.,
• three parties from Brooklyn arrive
• a high-class spot such as Mindy’s attracts
• the Broadway coppers love
• They sit
• Harry the Horse
• the reason he comes into the Hot Box
• Evidence of constituency
– The above NPs can all appear in similar syntactic environments, e.g., before a verb.
– Preposed or postposed constructions: e.g., the PP on September seventeenth can be placed in a number of different locations:
• On September seventeenth, I’d like to fly from Atlanta to Denver.
• I’d like to fly on September seventeenth from Atlanta to Denver.
• I’d like to fly from Atlanta to Denver on September seventeenth.

9.2 Context-Free Rules and Trees
• CFG (or Phrase-Structure Grammar)
– The most commonly used mathematical system for modeling constituent structure in English and other natural languages
– Terminals and non-terminals
– Derivation
– Parse tree
– Start symbol
• Example parse tree (figure): [NP [Det a] [Nom [Noun flight]]]


9.2 Context-Free Rules and Trees

The lexicon for L0:
Noun → flight | breeze | trip | morning | …
Verb → is | prefer | like | need | want | fly | …
Adjective → cheapest | non-stop | first | latest | other | direct | …
Pronoun → me | I | you | it | …
Proper-Noun → Alaska | Baltimore | Los Angeles | Chicago | United | American | …
Determiner → the | a | an | this | these | that | …
Preposition → from | to | on | near | …
Conjunction → and | or | but | …

The grammar for L0 (with example phrases):
S → NP VP                 (I + want a morning flight)
NP → Pronoun              (I)
   | Proper-Noun          (Los Angeles)
   | Det Nominal          (a + flight)
Nominal → Noun Nominal    (morning + flight)
   | Noun                 (flights)
VP → Verb                 (do)
   | Verb NP              (want + a flight)
   | Verb NP PP           (leave + Boston + in the morning)
   | Verb PP              (leaving + on Thursday)
PP → Preposition NP       (from + Los Angeles)
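As one way to experiment with this grammar, the sketch below encodes a fragment of L0 with NLTK's CFG tools (assuming NLTK is installed); hyphenated category names are written without hyphens, Determiner is abbreviated Det, and only a handful of lexical entries are included.

import nltk

# A fragment of the L0 grammar and lexicon; terminals are quoted.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Pronoun | ProperNoun | Det Nominal
Nominal -> Noun Nominal | Noun
VP -> Verb | Verb NP | Verb NP PP | Verb PP
PP -> Preposition NP
Pronoun -> 'I'
ProperNoun -> 'Boston'
Det -> 'a' | 'the'
Noun -> 'morning' | 'flight'
Verb -> 'want' | 'leave'
Preposition -> 'from' | 'on' | 'in'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I want a morning flight".split()):
    print(tree)   # (S (NP (Pronoun I)) (VP (Verb want) (NP (Det a) (Nominal ...))))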
Sentence-Level Constructions
• There are a great number of possible overall sentence structures, but four are particularly common and important:
– Declarative structure, imperative structure, yes-no-question structure, and wh-question structure.
• Sentences with declarative structure
– A subject NP followed by a VP
• The flight should be eleven a.m. tomorrow.
• I need a flight to Seattle leaving from Baltimore
making a stop in Minneapolis.
• The return flight should leave at around seven p.m.
• I would like to find out the flight number for the
United flight that arrives in San Jose around ten p.m.
• I’d like to fly the coach discount class.
• I want a flight from Ontario to Chicago.
9.3 Sentence-Level Constructions

• Sentences with imperative structure
– Begin with a VP and have no subject.
– Always used for commands and suggestions
• Show the lowest fare.
• Show me the cheapest fare that has lunch.
• Give me Sunday’s flight arriving in Las Vegas from
Memphis and New York City.
• List all flights between five and seven p.m.
• List all flights from Burbank to Denver.
• Show me all flights that depart before ten a.m. and
have first class fares.
• Show me all the flights leaving Baltimore.
– S → VP
9.3 Sentence-Level Constructions
• Sentences with yes-no-question structure
– Begin with auxiliary, followed by a subject NP, followed by a VP.
• Do any of these flights have stops?
• Does American’s flight eighteen twenty five serve
dinner?
• Can you give me the same information for United?
– S → Aux NP VP

Background
• Syntactic parsing
– The task of recognizing a sentence and assigning a syntactic structure to it
• Since CFGs are a declarative formalism, they do not
specify how the parse tree for a given sentence should be
computed.
• Parse trees are useful in applications such as
– Grammar checking
– Semantic analysis
– Machine translation
– Question answering
– Information extraction



Parsing as Search

• The parser can be viewed as searching through the space of all possible parse trees to find the correct parse tree for the sentence.
• How can we use the grammar to produce the parse tree?



Parsing as Search
• Top-down parsing (example search-tree figure omitted)

Parsing as Search
• Bottom-up parsing (example search-tree figure omitted)


10.2 A Basic Top-Down Parser
• Uses a depth-first strategy
• A top-down, depth-first, left-to-right derivation (the worked derivation figures are omitted); a sketch of such a parser follows below.
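A minimal sketch of such a top-down, depth-first, left-to-right parser, written as recursive descent with backtracking over a small L0-style fragment; the grammar dictionary and names are illustrative, and a left-recursive rule (e.g. Nominal → Nominal PP) would make it recurse without bound, which anticipates the problems listed next.

# Top-down, depth-first, left-to-right parsing sketch (illustrative grammar fragment).
GRAMMAR = {
    "S":       [["NP", "VP"]],
    "NP":      [["Pronoun"], ["Det", "Nominal"]],
    "Nominal": [["Noun", "Nominal"], ["Noun"]],
    "VP":      [["Verb", "NP"], ["Verb"]],
    "Pronoun": [["I"]], "Det": [["a"]],
    "Noun":    [["morning"], ["flight"]], "Verb": [["want"]],
}

def parse(symbol, words, i):
    """Try to expand `symbol` starting at position i; yield (tree, next_position) pairs."""
    if symbol not in GRAMMAR:                       # terminal: must match the next input word
        if i < len(words) and words[i] == symbol:
            yield symbol, i + 1
        return
    for rhs in GRAMMAR[symbol]:                     # try each production, depth-first
        def expand(children, j, k):
            if k == len(rhs):                       # whole right-hand side matched
                yield (symbol, children), j
                return
            for child, j2 in parse(rhs[k], words, j):    # expand symbols left to right
                yield from expand(children + [child], j2, k + 1)
        yield from expand([], i, 0)

words = "I want a morning flight".split()
for tree, end in parse("S", words, 0):
    if end == len(words):                           # accept only parses covering the whole input
        print(tree)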


10.3 Problems with the Basic Top-Down Parser
• Problems with the top-down parser
– Left-recursion
– Ambiguity
– Inefficient reparsing of subtrees



10.3 Problems with the Basic Top-Down Parser
Ambiguity
• Parsers which do not incorporate disambiguators may
simply return all the possible parse trees for a given input.
• We do not want all possible parses from the robust, highly
ambiguous, wide-coverage grammars used in practical
applications.
• Reason:
– Potentially exponential number of parses that are possible for certain inputs
– Given the ATIS example:
• Show me the meal on Flight UA 386 from San
Francisco to Denver.
– The three PP’s at the end of this sentence yield a total of 14 parse trees for
this sentence.



Statistical Parsing
• Statistical parsing uses a probabilistic model
of syntax in order to assign probabilities to
each parse tree.
• Provides principled approach to resolving
syntactic ambiguity.
• Allows supervised learning of parsers from
tree-banks of parse trees provided by human
linguists.
• Also allows unsupervised learning of parsers
from unannotated text, but the accuracy of
such parsers has been limited.
SCFG (stochastic context-free grammars; the slide content was presented as figures and is omitted here, see the PCFG definition below)
Probabilistic Context Free Grammar
(PCFG)
• A PCFG is a probabilistic version of a CFG where each
production has a probability.
• Probabilities of all productions rewriting a given non-
terminal must add to 1, defining a distribution for each
non-terminal.
• String generation is now probabilistic where production
probabilities are used to non-deterministically select a
production for rewriting a given non-terminal.

Simple PCFG for ATIS English

Grammar (with rule probabilities):
S → NP VP              0.8
S → Aux NP VP          0.1
S → VP                 0.1     (S rules sum to 1.0)
NP → Pronoun           0.2
NP → Proper-Noun       0.2
NP → Det Nominal       0.6     (NP rules sum to 1.0)
Nominal → Noun             0.3
Nominal → Nominal Noun     0.2
Nominal → Nominal PP       0.5     (Nominal rules sum to 1.0)
VP → Verb              0.2
VP → Verb NP           0.5
VP → VP PP             0.3     (VP rules sum to 1.0)
PP → Prep NP           1.0

Lexicon (with word probabilities):
Det → the 0.6 | a 0.2 | that 0.1 | this 0.1
Noun → book 0.1 | flight 0.5 | meal 0.2 | money 0.2
Verb → book 0.5 | include 0.2 | prefer 0.3
Pronoun → I 0.5 | he 0.1 | she 0.1 | me 0.3
Proper-Noun → Houston 0.8 | NWA 0.2
Aux → does 1.0
Prep → from 0.25 | to 0.25 | on 0.1 | near 0.2 | through 0.2
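A minimal sketch encoding this PCFG as a Python structure and checking that each non-terminal's production probabilities sum to 1, as required above; the representation (right-hand sides paired with probabilities) is just one reasonable choice.

PCFG = {
    "S":       [(("NP", "VP"), 0.8), (("Aux", "NP", "VP"), 0.1), (("VP",), 0.1)],
    "NP":      [(("Pronoun",), 0.2), (("Proper-Noun",), 0.2), (("Det", "Nominal"), 0.6)],
    "Nominal": [(("Noun",), 0.3), (("Nominal", "Noun"), 0.2), (("Nominal", "PP"), 0.5)],
    "VP":      [(("Verb",), 0.2), (("Verb", "NP"), 0.5), (("VP", "PP"), 0.3)],
    "PP":      [(("Prep", "NP"), 1.0)],
    "Det":     [(("the",), 0.6), (("a",), 0.2), (("that",), 0.1), (("this",), 0.1)],
    "Noun":    [(("book",), 0.1), (("flight",), 0.5), (("meal",), 0.2), (("money",), 0.2)],
    "Verb":    [(("book",), 0.5), (("include",), 0.2), (("prefer",), 0.3)],
    "Pronoun": [(("I",), 0.5), (("he",), 0.1), (("she",), 0.1), (("me",), 0.3)],
    "Proper-Noun": [(("Houston",), 0.8), (("NWA",), 0.2)],
    "Aux":     [(("does",), 1.0)],
    "Prep":    [(("from",), 0.25), (("to",), 0.25), (("on",), 0.1), (("near",), 0.2), (("through",), 0.2)],
}

# Each non-terminal's productions must define a probability distribution.
for lhs, productions in PCFG.items():
    total = sum(p for _, p in productions)
    assert abs(total - 1.0) < 1e-9, f"{lhs} productions sum to {total}, not 1"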
Sentence Probability
• Assume productions for each node are chosen independently.
• Probability of a derivation is the product of the probabilities of its productions.

Derivation D1 for "book the flight through Houston" (the PP attached to the Nominal):
[S [VP [Verb book] [NP [Det the] [Nominal [Nominal [Noun flight]] [PP [Prep through] [NP [Proper-Noun Houston]]]]]]]

P(D1) = 0.1 x 0.5 x 0.5 x 0.6 x 0.6 x 0.5 x 0.3 x 1.0 x 0.2 x 0.2 x 0.5 x 0.8 = 0.0000216
Syntactic Disambiguation
• Resolve ambiguity by picking the most probable parse tree.

Derivation D2 for "book the flight through Houston" (the PP attached to the VP):
[S [VP [VP [Verb book] [NP [Det the] [Nominal [Noun flight]]]] [PP [Prep through] [NP [Proper-Noun Houston]]]]]

P(D2) = 0.1 x 0.3 x 0.5 x 0.6 x 0.5 x 0.6 x 0.3 x 1.0 x 0.5 x 0.2 x 0.2 x 0.8 = 0.00001296
Sentence Probability
• Probability of a sentence is the sum of the probabilities of
all of its derivations.

P(“book the flight through Houston”) = P(D1) + P(D2) = 0.0000216 + 0.00001296 = 0.00003456
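A minimal sketch that recomputes P(D1), P(D2), and their sum from the rule probabilities above; the two derivations are written out as explicit rule lists, and all names are illustrative.

from math import prod   # Python 3.8+

RULE_P = {
    ("S", ("VP",)): 0.1, ("VP", ("Verb", "NP")): 0.5, ("VP", ("VP", "PP")): 0.3,
    ("NP", ("Det", "Nominal")): 0.6, ("NP", ("Proper-Noun",)): 0.2,
    ("Nominal", ("Noun",)): 0.3, ("Nominal", ("Nominal", "PP")): 0.5,
    ("PP", ("Prep", "NP")): 1.0, ("Verb", ("book",)): 0.5, ("Det", ("the",)): 0.6,
    ("Noun", ("flight",)): 0.5, ("Prep", ("through",)): 0.2, ("Proper-Noun", ("Houston",)): 0.8,
}

# D1 attaches the PP inside the NP (Nominal -> Nominal PP);
# D2 attaches the PP to the VP (VP -> VP PP).
D1 = [("S", ("VP",)), ("VP", ("Verb", "NP")), ("Verb", ("book",)),
      ("NP", ("Det", "Nominal")), ("Det", ("the",)), ("Nominal", ("Nominal", "PP")),
      ("Nominal", ("Noun",)), ("Noun", ("flight",)), ("PP", ("Prep", "NP")),
      ("Prep", ("through",)), ("NP", ("Proper-Noun",)), ("Proper-Noun", ("Houston",))]
D2 = [("S", ("VP",)), ("VP", ("VP", "PP")), ("VP", ("Verb", "NP")), ("Verb", ("book",)),
      ("NP", ("Det", "Nominal")), ("Det", ("the",)), ("Nominal", ("Noun",)),
      ("Noun", ("flight",)), ("PP", ("Prep", "NP")), ("Prep", ("through",)),
      ("NP", ("Proper-Noun",)), ("Proper-Noun", ("Houston",))]

p1 = prod(RULE_P[r] for r in D1)   # ~0.0000216
p2 = prod(RULE_P[r] for r in D2)   # ~0.00001296
print(p1, p2, p1 + p2)             # sentence probability ~0.00003456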
Three Useful PCFG Tasks
• Observation likelihood: To classify and order sentences.
• Most likely derivation: To determine the most likely parse
tree for a sentence.
• Maximum likelihood training: To train a PCFG to fit
empirical training data.

Treebanks
• A treebank is a parsed text corpus that annotates syntactic or semantic sentence structure.
• English Penn Treebank: the standard corpus for testing syntactic parsing; consists of 1.2 M words of text from the Wall Street Journal (WSJ).
• Typical to train on about 40,000 parsed sentences and test on an additional standard disjoint test set of 2,416 sentences.
• Chinese Penn Treebank: 100K words from the Xinhua news service.
Parsing Evaluation Metrics
• PARSEVAL metrics measure the fraction of the
constituents that match between the computed
and human parse trees. If P is the system’s parse
tree and T is the human parse tree:
– Recall = (# correct constituents in P) / (# constituents in T)
– Precision = (# correct constituents in P) / (# constituents in P)
• Labeled Precision and labeled recall require
getting the non-terminal label on the constituent
node correct to count as correct.
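A minimal sketch of PARSEVAL-style labeled precision and recall over constituent spans, each represented as a (label, start, end) triple; the example span sets loosely score the VP-attachment parse of "book the flight through Houston" against the Nominal-attachment parse as gold, with pre-terminal nodes omitted, and are purely illustrative.

def parseval(system_spans, gold_spans):
    P, T = set(system_spans), set(gold_spans)
    correct = len(P & T)                           # labeled constituents present in both trees
    precision = correct / len(P) if P else 0.0     # correct / constituents in system parse P
    recall = correct / len(T) if T else 0.0        # correct / constituents in gold parse T
    return precision, recall

# Hypothetical (label, start, end) spans over "book the flight through Houston".
gold = {("S", 0, 5), ("VP", 0, 5), ("NP", 1, 5), ("Nominal", 2, 5),
        ("Nominal", 2, 3), ("PP", 3, 5), ("NP", 4, 5)}
system = {("S", 0, 5), ("VP", 0, 5), ("VP", 0, 3), ("NP", 1, 3),
          ("Nominal", 2, 3), ("PP", 3, 5), ("NP", 4, 5)}
print(parseval(system, gold))   # 5/7 precision, 5/7 recall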

