Lecture 3: Text Processing & Minimum Edit Distance Algorithm
• Top-down – explores options that won't match the full sentence
o Example: recursive descent (rdparser in nltk)
o Example: Earley parser
Bottom-Up Parsing: Shift-Reduce parsing

Bottom-Up Parser Example

Productions:
S → a A B e
A → A b c
A → b
B → d

INPUT: a b b c d e $
OUTPUT: the parse tree for S, built bottom-up as the reductions apply.

Shift-reduce trace:

Stack          Remaining input    Action
$              a b b c d e $      Shift a
$ a            b b c d e $        Shift b
$ a b          b c d e $          Reduce from b to A
$ a A          b c d e $          Shift b
$ a A b        c d e $            Shift c
$ a A b c      d e $              Reduce from A b c to A
$ a A          d e $              Shift d
$ a A d        e $                Reduce from d to B
$ a A B        e $                Shift e
$ a A B e      $                  Reduce from a A B e to S
$ S            $                  Hit the target S: accept
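A minimal backtracking sketch of this shift-reduce loop in Python (the (lhs, rhs) grammar encoding and the brute-force try-both strategy are my assumptions; a real LR parser would instead consult a parse table to choose between shifting and reducing):

# Toy grammar from the slides, encoded as (lhs, rhs) pairs.
GRAMMAR = [("S", ["a", "A", "B", "e"]),
           ("A", ["A", "b", "c"]),
           ("A", ["b"]),
           ("B", ["d"])]

def parse(stack, buffer):
    """Return True if some sequence of shifts and reduces accepts the input."""
    if stack == ["S"] and not buffer:
        return True                                   # hit the target S
    # Try every reduction whose right-hand side sits on top of the stack.
    for lhs, rhs in GRAMMAR:
        if stack[-len(rhs):] == rhs:
            if parse(stack[:-len(rhs)] + [lhs], buffer):
                return True
    # Try shifting the next input symbol onto the stack.
    if buffer and parse(stack + [buffer[0]], buffer[1:]):
        return True
    return False

print(parse([], list("abbcde")))   # True: a b b c d e derives S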
The parsing problem

[Figure: test sentences are fed to a PARSER, producing test trees; a SCORER compares these against the correct trees to compute accuracy.]

Recent parsers are quite accurate: good enough to help a range of NLP tasks!
Chomsky Normal Form

The right-hand side of a standard CFG rule can have an arbitrary number of symbols (terminals and nonterminals):
VP → ADV eat NP

A CFG in Chomsky Normal Form (CNF) allows only two kinds of right-hand sides:
– Two nonterminals: VP → ADV VP
– One terminal: VP → eat

Example CNF grammar, parsing "We eat mango":
S → NP VP        V → eat
VP → V NP        NP → we
                 NP → mango
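A small sketch of the binarization step of CNF conversion (the fresh nonterminal names X1, X2, … and the preterminal step V → eat are illustrative assumptions):

from itertools import count

fresh = count(1)

def binarize(lhs, rhs):
    """Split an n-ary rule into binary rules by introducing fresh symbols."""
    rules = []
    while len(rhs) > 2:
        new = f"X{next(fresh)}"             # fresh nonterminal, name is arbitrary
        rules.append((lhs, [rhs[0], new]))  # peel off the leftmost symbol
        lhs, rhs = new, rhs[1:]
    rules.append((lhs, list(rhs)))
    return rules

# VP -> ADV eat NP becomes VP -> ADV V NP once 'eat' gets preterminal V -> eat;
# binarizing then yields two CNF rules:
print(binarize("VP", ["ADV", "V", "NP"]))
# [('VP', ['ADV', 'X1']), ('X1', ['V', 'NP'])]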
CKY algorithm, recognizer version

for J := 1 to n
    add to [J-1, J] all categories for the J-th word
for col := 2 to n
    for i := 0 to n - col
        k := i + col
        for j := i+1 to k-1
            for every nonterminal Y in [i, j]
                for every nonterminal Z in [j, k]
                    for all nonterminals X
                        if X → Y Z is in the grammar, then add X to [i, k]
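A direct Python transcription of this pseudocode, as a sketch under my own encodings (chart cells keyed by fencepost positions (i, k), binary rules as (X, (Y, Z)) tuples, a lexicon mapping words to category sets):

from collections import defaultdict

def cky_recognize(words, lexicon, binary_rules, start="S"):
    """CKY recognizer: chart[(i, k)] = nonterminals deriving words i+1 .. k."""
    n = len(words)
    chart = defaultdict(set)
    for j in range(1, n + 1):                 # add categories for the j-th word
        chart[(j - 1, j)] |= lexicon.get(words[j - 1], set())
    for col in range(2, n + 1):               # span width, 'col' in the pseudocode
        for i in range(0, n - col + 1):
            k = i + col
            for j in range(i + 1, k):         # split point
                for X, (Y, Z) in binary_rules:
                    if Y in chart[(i, j)] and Z in chart[(j, k)]:
                        chart[(i, k)].add(X)  # X -> Y Z is in the grammar
    return start in chart[(0, n)]

# Tiny CNF grammar from the earlier slide:
lexicon = {"we": {"NP"}, "eat": {"V"}, "mango": {"NP"}}
binary_rules = [("S", ("NP", "VP")), ("VP", ("V", "NP"))]
print(cky_recognize("we eat mango".split(), lexicon, binary_rules))  # True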
CKY: filling the chart

[Figure: the chart is filled one diagonal at a time, from width-2 spans up to the whole sentence; e.g., for the sentence w1 w2 w3 w4 w5 w6 w7, cell chart[2][6] covers the span w3 … w6.]
Example grammar fragment (note that "drinks" is ambiguous between V and NP):
VP → V NP        V → drinks
VP → VP PP       P → with
NP → NP PP       NP → we
PP → P NP        NP → drinks
                 NP → milk

Sentence: We buy drinks with milk
The CKY parsing algorithm

Example grammar:
S → NP VP        V → eat
VP → V NP        NP → we
VP → VP PP       NP → mango
NP → NP PP       NP → apple
PP → P NP        P → with

Sentence: We eat mango with apple

[Figure: the completed CKY chart for this sentence. The cell spanning "eat mango with apple" receives VP in two ways: via VP → V NP with the NP "mango with apple", and via VP → VP PP with the VP "eat mango", showing the PP-attachment ambiguity.]
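Running the cky_recognize sketch defined earlier on this grammar (the lexicon encoding of the unary lexical rules is my assumption) confirms the sentence is accepted; the span "eat mango with apple" gets VP by both routes, which is exactly the ambiguity shown in the chart figure:

lexicon = {"we": {"NP"}, "eat": {"V"}, "mango": {"NP"},
           "with": {"P"}, "apple": {"NP"}}
binary_rules = [("S", ("NP", "VP")), ("VP", ("V", "NP")),
                ("VP", ("VP", "PP")), ("NP", ("NP", "PP")),
                ("PP", ("P", "NP"))]
print(cky_recognize("we eat mango with apple".split(),
                    lexicon, binary_rules))   # True (two derivations of VP)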
What are the terminals in NLP?
Are the "terminals" words or POS tags?
The Chain Rule applied to compute the joint probability of words in a sentence

• More variables:
  P(A,B,C,D) = P(A) P(B|A) P(C|A,B) P(D|A,B,C)

• The Chain Rule in general:
  P(x1,x2,x3,…,xn) = P(x1) P(x2|x1) P(x3|x1,x2) … P(xn|x1,…,xn-1)
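A worked illustration of the chain rule in Python with made-up numbers (these probabilities are purely hypothetical, not estimated from any corpus):

from math import prod

# Hypothetical factors for P(we, eat, mango)
#   = P(we) * P(eat | we) * P(mango | we, eat)
factors = [0.05,   # P(we)              -- made-up number
           0.10,   # P(eat | we)        -- made-up number
           0.02]   # P(mango | we, eat) -- made-up number
print(prod(factors))   # 0.0001, the joint probability of the sentence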
Which is closest?
• Kirachi
• Karachu
• Kerrach
• Kararachi

Resulting alignment (minimum edit distance between two DNA sequences):
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
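A standard dynamic-programming sketch of minimum edit distance (Levenshtein distance with unit costs) that can rank these candidates; comparing them against the target "Karachi" is my assumption:

def edit_distance(s, t):
    """Levenshtein distance with unit insert/delete/substitute costs."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                      # delete all of s[:i]
    for j in range(n + 1):
        d[0][j] = j                      # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[m][n]

for cand in ["Kirachi", "Karachu", "Kerrach", "Kararachi"]:
    print(cand, edit_distance("Karachi", cand))
# Kirachi 1, Karachu 1, Kerrach 3, Kararachi 2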
Summary
• Lemmatization
• Stemming
• Spelling correction
• Sentence segmentation
References
• wikipedia.org
• Prof. Jason Eisner, Natural Language Processing course, Johns Hopkins University
• web.stanford.edu