NLP Unit-4
• Word sequences: $w_1^n = w_1 \ldots w_n$
• Bigram approximation:
  $P(w_1^n) \approx \prod_{k=1}^{n} P(w_k \mid w_{k-1})$
• N-gram approximation:
  $P(w_1^n) \approx \prod_{k=1}^{n} P(w_k \mid w_{k-N+1}^{k-1})$
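As a small illustration (using the sentence-boundary symbols introduced below), the bigram approximation factors a made-up sentence such as "I prefer a flight" as:

$P(\text{I prefer a flight}) \approx P(\text{I} \mid \text{<s>}) \, P(\text{prefer} \mid \text{I}) \, P(\text{a} \mid \text{prefer}) \, P(\text{flight} \mid \text{a}) \, P(\text{</s>} \mid \text{flight})$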
Estimating Probabilities
• N-gram conditional probabilities can be estimated
from raw text based on the relative frequency of
word sequences.
  Bigram:  $P(w_n \mid w_{n-1}) = \dfrac{C(w_{n-1} w_n)}{C(w_{n-1})}$

  N-gram:  $P(w_n \mid w_{n-N+1}^{n-1}) = \dfrac{C(w_{n-N+1}^{n-1}\, w_n)}{C(w_{n-N+1}^{n-1})}$
• To have a consistent probabilistic model, append a
unique start (<s>) and end (</s>) symbol to every
sentence and treat these as additional words.
Generative Model & MLE
• An N-gram model can be seen as a probabilistic automaton for generating sentences:
Initialize sentence with N−1 <s> symbols
Until </s> is generated do:
Stochastically pick the next word based on the conditional
probability of each word given the previous N −1 words.
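A rough sketch of this generation loop for the bigram case (N = 2); the dictionary format (previous word, word) → probability and all names are assumptions, matching the estimation sketch above.

```python
import random

def generate_bigram_sentence(probs, max_len=20):
    """Stochastically generate a sentence from a bigram model.

    `probs` maps (previous_word, word) -> P(word | previous_word)."""
    sentence = ["<s>"]  # N-1 = 1 start symbol
    while sentence[-1] != "</s>" and len(sentence) < max_len:
        prev = sentence[-1]
        # Pick the next word according to its conditional probability given prev.
        candidates = [(w, p) for (ctx, w), p in probs.items() if ctx == prev]
        words, weights = zip(*candidates)
        sentence.append(random.choices(words, weights=weights)[0])
    # Drop the boundary symbols before returning the surface string.
    body = sentence[1:-1] if sentence[-1] == "</s>" else sentence[1:]
    return " ".join(body)

# Toy usage with a hand-built bigram table (an assumption)
toy_probs = {("<s>", "I"): 1.0, ("I", "prefer"): 1.0,
             ("prefer", "a"): 1.0, ("a", "flight"): 1.0, ("flight", "</s>"): 1.0}
print(generate_bigram_sentence(toy_probs))  # "I prefer a flight"
```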
• Top-down parsing
• Bottom-up parsing
SCFG (Stochastic Context-Free Grammar)
Probabilistic Context-Free Grammar (PCFG)
• A PCFG is a probabilistic version of a CFG where each
production has a probability.
• Probabilities of all productions rewriting a given non-
terminal must add to 1, defining a distribution for each
non-terminal.
• String generation is now probabilistic: production probabilities are used to stochastically select which production to apply when rewriting a given non-terminal.
Simple PCFG for ATIS English
Grammar (the probabilities of each non-terminal's rules sum to 1.0):
  S → NP VP               0.8
  S → Aux NP VP           0.1
  S → VP                  0.1
  NP → Pronoun            0.2
  NP → Proper-Noun        0.2
  NP → Det Nominal        0.6
  Nominal → Noun          0.3
  Nominal → Nominal Noun  0.2
  Nominal → Nominal PP    0.5
  VP → Verb               0.2
  VP → Verb NP            0.5
  VP → VP PP              0.3
  PP → Prep NP            1.0

Lexicon (probabilities for each pre-terminal also sum to 1.0):
  Det → the 0.6 | a 0.2 | that 0.1 | this 0.1
  Noun → book 0.1 | flight 0.5 | meal 0.2 | money 0.2
  Verb → book 0.5 | include 0.2 | prefer 0.3
  Pronoun → I 0.5 | he 0.1 | she 0.1 | me 0.3
  Proper-Noun → Houston 0.8 | NWA 0.2
  Aux → does 1.0
  Prep → from 0.25 | to 0.25 | on 0.1 | near 0.2 | through 0.2
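A small sketch of how this grammar fragment might be encoded in code, with a check that each non-terminal's production probabilities sum to 1; the dictionary representation and variable names are assumptions (only the syntactic rules are shown, not the lexicon).

```python
# Non-terminal -> list of (right-hand side, probability), taken from the grammar above.
pcfg = {
    "S":       [(("NP", "VP"), 0.8), (("Aux", "NP", "VP"), 0.1), (("VP",), 0.1)],
    "NP":      [(("Pronoun",), 0.2), (("Proper-Noun",), 0.2), (("Det", "Nominal"), 0.6)],
    "Nominal": [(("Noun",), 0.3), (("Nominal", "Noun"), 0.2), (("Nominal", "PP"), 0.5)],
    "VP":      [(("Verb",), 0.2), (("Verb", "NP"), 0.5), (("VP", "PP"), 0.3)],
    "PP":      [(("Prep", "NP"), 1.0)],
}

# Each non-terminal's rule probabilities must define a distribution (sum to 1).
for lhs, productions in pcfg.items():
    total = sum(p for _, p in productions)
    assert abs(total - 1.0) < 1e-9, f"{lhs} rules sum to {total}, not 1.0"
```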
Sentence Probability
• Assume productions for each node are chosen
independently.
• Probability of derivation is the product of the
probabilities of its productions.
P(D1) = 0.1 × 0.5 × 0.5 × 0.6 × 0.6 × 0.5 × 0.3 × 1.0 × 0.2 × 0.2 × 0.5 × 0.8 = 0.0000216

[Parse tree D1 for "book the flight through Houston": S → VP, VP → Verb NP, with the PP "through Houston" attached to the Nominal "flight" (Nominal → Nominal PP).]
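A quick sketch of computing a derivation's probability as the product of its rule probabilities, using the D1 figures above; the plain-list representation is an assumption.

```python
import math

# Rule probabilities used in derivation D1 of "book the flight through Houston",
# read off the ATIS grammar above.
d1_rule_probs = [
    0.1,  # S -> VP
    0.5,  # VP -> Verb NP
    0.5,  # Verb -> book
    0.6,  # NP -> Det Nominal
    0.6,  # Det -> the
    0.5,  # Nominal -> Nominal PP
    0.3,  # Nominal -> Noun
    0.5,  # Noun -> flight
    1.0,  # PP -> Prep NP
    0.2,  # Prep -> through
    0.2,  # NP -> Proper-Noun
    0.8,  # Proper-Noun -> Houston
]

print(math.prod(d1_rule_probs))  # ≈ 2.16e-05
```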
Syntactic Disambiguation
• Resolve ambiguity by picking most probable
parse tree.
P(D2) = 0.1 × 0.3 × 0.5 × 0.6 × 0.5 × 0.6 × 0.3 × 1.0 × 0.5 × 0.2 × 0.2 × 0.8 = 0.00001296

[Parse tree D2 for "book the flight through Houston": S → VP, VP → VP PP, so the PP "through Houston" attaches to the verb phrase rather than to the Nominal "flight".]

Since P(D1) = 0.0000216 > P(D2) = 0.00001296, the most probable parse is D1, which attaches "through Houston" to the Nominal "flight".
Sentence Probability
• Probability of a sentence is the sum of the probabilities of
all of its derivations.
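For instance, if D1 and D2 above are the only derivations of "book the flight through Houston" under this grammar, then P(sentence) = P(D1) + P(D2) = 0.0000216 + 0.00001296 = 0.00003456.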
Three Useful PCFG Tasks
• Observation likelihood: To classify and order sentences.
• Most likely derivation: To determine the most likely parse
tree for a sentence.
• Maximum likelihood training: To train a PCFG to fit
empirical training data.
Treebanks
• A treebank is a parsed text corpus that annotates syntactic or semantic sentence structure.
• English Penn Treebank: Standard corpus for
testing syntactic parsing consists of 1.2 M
words of text from the Wall Street Journal
(WSJ).
• Typical to train on about 40,000 parsed
sentences and test on an additional standard
disjoint test set of 2,416 sentences.
• Chinese Penn Treebank: 100K words from the Xinhua news service.
Parsing Evaluation Metrics
• PARSEVAL metrics measure the fraction of the
constituents that match between the computed
and human parse trees. If P is the system’s parse
tree and T is the human parse tree:
– Recall = (# correct constituents in P) / (# constituents in T)
– Precision = (# correct constituents in P) / (# constituents in P)
• Labeled precision and labeled recall additionally require the non-terminal label on the constituent to be correct for it to count as a match.
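A simplified sketch of labeled PARSEVAL precision and recall, assuming each parse is given as a set of (label, start, end) constituent spans; the representation and names are assumptions, and standard evaluators (e.g. evalb) apply further conventions.

```python
def parseval(system_constituents, gold_constituents):
    """Labeled PARSEVAL precision and recall over constituent (label, start, end) spans."""
    correct = system_constituents & gold_constituents   # constituents that match exactly
    precision = len(correct) / len(system_constituents)
    recall = len(correct) / len(gold_constituents)
    return precision, recall

# Toy usage with made-up constituents for a 5-word sentence
system = {("S", 0, 5), ("VP", 0, 5), ("NP", 1, 5), ("PP", 3, 5)}
gold   = {("S", 0, 5), ("VP", 0, 5), ("NP", 1, 3), ("PP", 3, 5)}
p, r = parseval(system, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.75
```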