
Lecture 4: Semantic Parsing and Machine Translation

Kyle Richardson

[email protected]

April 27, 2016


Lecture Plan

- paper: Wong and Mooney (2006)

- general topics: synchronous CFGs, decoding by parsing, word alignment and rule extraction.

2
The Big Picture (reminder)
- Standard processing pipeline:

    input --(Semantic Parsing)--> sem (Knowledge Representation) --(Interpretation)--> world

    input: "List samples that contain every major element"
    sem:   (FOR EVERY X / MAJORELT : T;
             (FOR EVERY Y / SAMPLE : (CONTAINS Y X); (PRINTOUT Y)))
    world: ⟦sem⟧ = {S10019, S10059, ...}

  Lunar QA system (Woods (1973))

3
Semantic Parsing: Generating formal representations
- Data-driven: Given data, learn a function that can map any given input (x) to a meaning representation (z).
- What kind of data do we learn from?

    input x:   What state has the largest population?
    sem z:     (argmax (λx. (state x) (population x)))
    world ⟦z⟧: California

  Geoquery Corpus (Zelle and Mooney (1996))


4
Previously: Learning from meaning representations (again)
data: (x = two times two plus three, y = (plus (mult 2 2) 3))

- Compositional model: a semantic context-free grammar.
- Learning model: greedy string → tree rule induction (SILT)
- Other topics:
  - Non-greedy parsing using (P)CFGs and dynamic programming, the CKY algorithm.
  - Maximum-likelihood estimation, Expectation Maximization and latent variables, inside-outside probabilities.

5
Previous Session: Transformation rules
- Decompose translation into a set of local transformations.

data: (x = two multiplied by two plus three, y = (plus (mult 2 2) 3))

  [tree diagram: the string "two * two + three" is built up bottom-up into the
   tree (plus (mult 2 2) 3) by local rules, e.g. r1: 'two' → N:2, r2: '*' → mult]

6
Bottom-up, String → Tree Rule Matching

MR Grammar:
  RULE      → CONDITION DIRECTIVE
  CONDITION → bowner TEAM UNUM
  DIRECTIVE → do TEAM UNUM ACTION
  TEAM      → our
  UNUM      → 4
  ACTION    → shoot

Transformation: If TEAM player 4 has the ball, TEAM player 4 should shoot.
Input: If our player 4 has the ball, our player 4 should shoot.

7
Semantic Parsing and Machine Translation

- Conceptually: the problem is treated as a kind of machine translation problem.
- Dataset: D = {(x_i, y_i)}_{i=1}^{n}, with x_i a sentence and y_i its (semantic) translation.
- Technically: transformation rules, common in MT.
- Idea: Recast the problem as a statistical MT task.
- Components:
  - Synchronous grammar model
  - Alignment-based rule extraction
  - Probabilistic decoding and ranking model (more next lecture)

8
Context-Free Grammars (again)

- context-free grammar (CFG): G = (Σ, N, S, R)
  - N: set of non-terminal symbols.
  - Σ: set of terminal symbols.
  - R: set of rules = {N → α | α ∈ (N ∪ Σ)*}
  - S: start symbol
- Context-free language: defines a set of strings.
- Derivation: a tree representation of rule applications to an input.
- Semantic parsing: semantic representations and composition rules take the form of non-terminal rules in derivations.

9
Previous examples
- Derivation trees encode the semantic rules.
- example: u = two times two plus three

  [derivation tree:
    N:(plus (mult 2 2) 3)
    ├─ N:(mult 2 2) ── (N:2 "two") (R:mult "times") (N:2 "two")
    ├─ R:plus "plus"
    └─ N:3 "three"]

language of G = {two times two, two times two plus three, ...}

10
Synchronous Context-Free Grammars (extension)
- synchronous context-free grammar (SCFG):

    G_Syn = (Σ_e, Σ_f, N, S, R)

  - N: (shared) set of non-terminal symbols (as before).
  - Σ_e: English terminal symbols.
  - Σ_f: foreign (or semantic) terminal symbols.
  - R: set of rules of the form N → ⟨α, β⟩, with α ∈ (N ∪ Σ_e)*, β ∈ (N ∪ Σ_f)*
  - S: start symbol: ⟨S_1, S_2⟩
- SCF language: defines a set of string pairs.
- Allows us to more explicitly relate input and output.

11
Machine Translation Example

- Example: English → Japanese synchronous grammar.
- Notation: subscripts on the non-terminals link the two sides of a rule; each indexed non-terminal must appear in both halves.¹

    S  → ⟨NP_1 VP_2 , NP_1 VP_2⟩
    VP → ⟨V_1 NP_2 , NP_2 V_1⟩
    NP → ⟨I , watashi wa⟩
    NP → ⟨the box , hako wo⟩
    V  → ⟨open , akemasu⟩

¹ example from Chiang and Knight (2006)
12
Machine Translation: Example Derivation

Grammar:
S  → ⟨NP_1 VP_2 , NP_1 VP_2⟩
VP → ⟨V_1 NP_2 , NP_2 V_1⟩
NP → ⟨I , watashi wa⟩
NP → ⟨the box , hako wo⟩
V  → ⟨open , akemasu⟩

Derivation
S ⇒ ⟨NP_1 VP_2 , NP_1 VP_2⟩
  ⇒ ⟨NP_1 V_3 NP_4 , NP_1 NP_4 V_3⟩
  ⇒ ⟨I V_3 NP_4 , watashi wa NP_4 V_3⟩
  ⇒ ⟨I open NP_4 , watashi wa NP_4 akemasu⟩
  ⇒ ⟨I open the box , watashi wa hako wo akemasu⟩

13
SCFGs

- SCFG language: defines a set of sentence pairs

    G_Syn = {(I open the box, watashi wa hako wo akemasu), ...}

- derivation: a pair of trees.

    (S (NP I) (VP (V open) (NP the box)))
    (S (NP watashi wa) (VP (NP hako wo) (V akemasu)))

14
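
To make the synchronous rewriting concrete, here is a minimal Python sketch (not part of the original slides) of how linked non-terminals are expanded in lockstep on both sides. The rule encoding and the "#index" linking convention are assumptions made for this illustration.

```python
# A minimal sketch of synchronous rewriting: linked non-terminals such as "NP#1"
# are expanded once and the result is substituted on both sides.

RULES = {
    "S":  [(["NP#1", "VP#2"], ["NP#1", "VP#2"])],
    "VP": [(["V#1", "NP#2"], ["NP#2", "V#1"])],
    "NP": [(["I"], ["watashi", "wa"]),
           (["the", "box"], ["hako", "wo"])],
    "V":  [(["open"], ["akemasu"])],
}

def derive(symbol, picks):
    """Synchronously derive an (english, foreign) token-list pair from `symbol`.
    `picks` yields which rule to use each time a non-terminal is expanded."""
    eng_rhs, for_rhs = RULES[symbol][next(picks)]
    expansions = {}                        # each linked non-terminal is expanded exactly once
    for tok in eng_rhs:
        if "#" in tok:
            expansions[tok] = derive(tok.split("#")[0], picks)
    def realize(rhs, side):                # substitute expansions into one side of the rule
        out = []
        for tok in rhs:
            out.extend(expansions[tok][side] if tok in expansions else [tok])
        return out
    return realize(eng_rhs, 0), realize(for_rhs, 1)

e, f = derive("S", iter([0, 0, 0, 0, 1]))  # rule choices, in expansion order
print(" ".join(e))                         # I open the box
print(" ".join(f))                         # watashi wa hako wo akemasu
```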
Two Variants of Parsing

- Parsing pairs: Given an English text and a foreign text, generate a synchronous derivation using a grammar G_Syn (bitext parsing)

    (I open the box, watashi wa hako wo akemasu) → derivation

- Translation or decoding: Given an English text, translate it into a foreign text using a grammar G_Syn

    I open the box → watashi wa hako wo akemasu

- Surprisingly: the first problem is much harder than the second (despite having more information available). We will only consider the second.

15
Decoding by parsing (i.e., Translation)

- Assuming we have binary rules, we can use the CKY algorithm (last lecture) for parsing.
- Idea: Parse the English side of the grammar in the normal way, then apply (or project) the foreign side of the rules.
- Why does this work? Synchronous rules share the same left-hand sides.

16
Decoding by Parsing: Parse English Side

Grammar:
S  → ⟨NP_1 VP_2 , NP_1 VP_2⟩
VP → ⟨V_1 NP_2 , NP_2 V_1⟩
NP → ⟨I , watashi wa⟩
NP → ⟨the box , hako wo⟩
V  → ⟨open , akemasu⟩

Input: 0 I 1 open 2 the box 3

CKY chart (cell [i,j] covers the words between positions i and j):
  [0,1] NP → I              [0,3] S → NP VP
  [1,2] V → open            [1,3] VP → V NP
  [2,3] NP → the box

22
Decoding by Parsing: Projection

Grammar:
S  → ⟨NP_1 VP_2 , NP_1 VP_2⟩
VP → ⟨V_1 NP_2 , NP_2 V_1⟩
NP → ⟨I , watashi wa⟩
NP → ⟨the box , hako wo⟩
V  → ⟨open , akemasu⟩

Input: 0 I 1 open 2 the box 3

Projected chart (each cell now carries the foreign side of its rule):
  [0,1] NP → I, watashi wa        [0,3] S → NP VP, NP VP
  [1,2] V → open, akemasu         [1,3] VP → V NP, NP V
  [2,3] NP → the box, hako wo

23
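
As a concrete companion to the chart walkthrough above, here is a small decode-by-parsing sketch in Python. It is an illustration of the idea, not the actual system from the paper; the rule tables and the simplification of keying child cells by their non-terminal label are assumptions.

```python
# Decode-by-parsing sketch: parse the English side with a CKY-style chart,
# remember which synchronous rule built each cell, then read the translation
# off the foreign sides bottom-up.

LEX = [("NP", ["I"], ["watashi", "wa"]),          # lexical synchronous rules
       ("NP", ["the", "box"], ["hako", "wo"]),
       ("V",  ["open"], ["akemasu"])]
BIN = [("S",  ["NP", "VP"], ["NP", "VP"]),        # binary synchronous rules
       ("VP", ["V", "NP"],  ["NP", "V"])]

def decode(words):
    n = len(words)
    chart = {}                                    # (i, j, lhs) -> backpointer
    for i in range(n):                            # lexical rules may cover multi-word spans
        for j in range(i + 1, n + 1):
            for lhs, e_rhs, f_rhs in LEX:
                if words[i:j] == e_rhs:
                    chart[(i, j, lhs)] = ("lex", f_rhs)
    for span in range(2, n + 1):                  # combine adjacent cells, smallest spans first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lhs, e_rhs, f_rhs in BIN:
                    b, c = e_rhs
                    if (i, k, b) in chart and (k, j, c) in chart:
                        chart[(i, j, lhs)] = ("bin", f_rhs, (i, k, b), (k, j, c))

    def project(cell):                            # replace foreign-side non-terminals
        entry = chart[cell]
        if entry[0] == "lex":
            return list(entry[1])
        _, f_rhs, left, right = entry
        parts = {left[2]: project(left), right[2]: project(right)}  # keyed by label
        return [w for sym in f_rhs                                  # (assumes distinct child labels)
                for w in (parts[sym] if sym in parts else [sym])]

    return " ".join(project((0, n, "S")))

print(decode("I open the box".split()))           # watashi wa hako wo akemasu
```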
Binarization (brief reminder/review)
- The CKY algorithm (last week) assumes the input grammar is in Chomsky normal form (binary rules and unary pre-terminal rules only).
- Why? input: w1 w2 w3 w4

    binary (one split)      binary + ternary (two splits)
    (w1, w2)                (w1, w2)
    (w2, w3)                (w2, w3)
    (w3, w4)                (w3, w4)
    (w1, w2 w3)             (w1, w2 w3)
    (w1 w2, w3)             (w1 w2, w3)
    (w2 w3, w4)             (w2 w3, w4)
    ...                     ...
                            (w1, w2, w3)
                            (w1 w2, w3, w4)
                            ...

- Problem: Unlike ordinary CFGs, SCFGs cannot be binarized in the general case.
24
History: Syntax-Directed Translation

- First developed as a method for programming-language compilation (i.e., translating high-level languages into lower-level languages):

    ⟨ for i in range(10):        move ax, 1
          n += i           ,     loop: add bx, ax
                                 cmp ax, 10
                                 jle loop ⟩

- Analogy: we can think of semantic parsing as a form of language compilation.

25
Big Idea: Wong and Mooney (2006)

- Transformation rules: recast the string-to-tree rewrite rules (last class, Kate et al. (2005)) as synchronous grammar rules.
- Rule extraction: SCFGs are extracted using a word alignment model (as done in other approaches to MT).

26
Semantic Parsing and Syntax-driven Translation
Grammar:
RULE      → ⟨if CONDITION_1 DIRECTIVE_2 , (CONDITION_1 DIRECTIVE_2)⟩
CONDITION → ⟨TEAM_1 player UNUM_2 has the ball , (bowner TEAM_1 {UNUM_2})⟩
TEAM      → ⟨our , our⟩
UNUM      → ⟨four , 4⟩

Deriv.
RULE ⇒ ⟨if CONDITION_1 DIRECTIVE_2 , (CONDITION_1 DIRECTIVE_2)⟩
     ⇒ ⟨if TEAM_1 player UNUM_2 has the ball DIR._2 , ((bowner TEAM_1 {UNUM_2}) DIR._2)⟩
     ⇒ ⟨if our player UNUM_2 has the ball DIR._2 , ((bowner our {UNUM_2}) DIR._2)⟩
     ...
     ⇒ ⟨If our player four has the ball, then our player six ... ,
        ((bowner our {4}) (do our {6} (pos (left (half our)))))⟩

- Is this grammar in CNF?

27
Rule Extraction and Alignment

- Lexical acquisition: find optimal word alignments between NL sentences and meaning representation (MR) fragments.
- Assumes (as in Kate et al. (2005)) a deterministic MR grammar.
- For alignment, the MR is represented as a sequence of productions.

28
Word-based alignment models (basics)

- Basic idea: Treat translation as a process of translating individual words²

    Das Haus ist klein
    the house is  small

- Alignment function: a : i → j (English word i to foreign word j)

    a : {1 → 1, 2 → 2, 3 → 3, 4 → 4}

² Examples from Koehn (2009) and some of his slides.
29
Word-based alignment models (basics)

- Basic idea: Treat translation as a process of translating individual words³

    Das Haus ist klitzeklein
    the house is  very small

- Alignment function: a : i → j (English word i to foreign word j)

    a : {1 → 1, 2 → 2, 3 → 3, 4 → 4, 5 → 4}

- One-to-many: a foreign word might translate to multiple English words.

³ Examples from Koehn (2009)
30
Word-based alignment models (basics)

- Basic idea: Treat translation as a process of translating individual words⁴

    NULL Das Haus ist klein
    the  house is just small

- Alignment function: a : i → j (English word i to foreign word j)

    a : {1 → 1, 2 → 2, 3 → 3, 4 → 0, 5 → 4}

- NULL translation: English words might not have foreign translations.

⁴ Examples from Koehn (2009)
31
Word-based alignment models (basics)

- Translation probability: defined as t(e_i | f_j), the probability of English word e_i given foreign word f_j, s.t.

    ∑_e t(e | f_j) = 1.0

    t(· | klein) =  0.5   e = small
                    0.2   e = tiny
                    0.2   e = little
                    0.05  e = the
                    0.05  e = house

32
IBM Model 1

- IBM Model 1: based entirely on translation (or lexical) probabilities (Brown et al. (1993)).
  - English sentence: e_1, ..., e_{l_e}
  - foreign sentence: f_1, ..., f_{l_f}
- Translation probability with alignment:

    p(e, a | f) = 1/(l_f + 1)^{l_e} · ∏_{j=1}^{l_e} t(e_j | f_{a(j)})

- (l_f + 1)^{l_e} is the total number of alignments (assuming a NULL word).

33
IBM Model 1

- Translation probability with alignment:

    p(e, a | f) = 1/(l_f + 1)^{l_e} · ∏_{j=1}^{l_e} t(e_j | f_{a(j)})

    Das Haus ist klein
    the house is  small

- a : {1 → 1 (t(the|Das) = 0.7), 2 → 2 (t(house|Haus) = 0.8), 3 → 3 (0.8), 4 → 4 (0.4)}

    p(e, a | f) = 1/5⁴ × 0.7 × 0.8 × 0.8 × 0.4 ≈ 0.00029

34
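
A quick numeric check of the example above, as a small Python sketch that just multiplies out the slide's numbers. The tiny lexicon is an assumption covering exactly these four word pairs, with the two unlabeled probabilities attached to the 3 → 3 and 4 → 4 links as on the slide.

```python
t = {("the", "Das"): 0.7, ("house", "Haus"): 0.8,     # values read off the slide's alignment
     ("is", "ist"): 0.8, ("small", "klein"): 0.4}

def p_e_a_given_f(english, foreign, alignment):
    """p(e, a | f) for IBM Model 1: uniform alignment prior times lexical probabilities."""
    l_e, l_f = len(english), len(foreign)
    prob = 1.0 / (l_f + 1) ** l_e                     # (l_f + 1)^l_e possible alignments (incl. NULL)
    for j, e_word in enumerate(english):
        prob *= t[(e_word, foreign[alignment[j]])]    # alignment: English position -> foreign position
    return prob

e = "the house is small".split()
f = "Das Haus ist klein".split()
print(p_e_a_given_f(e, f, {0: 0, 1: 1, 2: 2, 3: 3}))  # ≈ 0.00029
```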
IBM Model 1

- Translation probability with alignment:

    p(e, a | f) = 1/(l_f + 1)^{l_e} · ∏_{j=1}^{l_e} t(e_j | f_{a(j)})

- (Overall) translation probability:

    p(e | f) = ∑_a p(e, a | f)

- Problem: requires summing over all alignments
  - e.g., for l_e = l_f = 10 this is (10 + 1)^10 = 25,937,424,601 alignments (Penn Treebank sentences average somewhere near 27 words).

35
IBM Model 1
- Luckily, we can get around this (using some basic math).
- (Overall) translation probability:

    p(e | f) = ∑_a p(e, a | f)
             = ∑_{a(1)=0}^{l_f} ... ∑_{a(l_e)=0}^{l_f} p(e, a | f)
             = ∑_{a(1)=0}^{l_f} ... ∑_{a(l_e)=0}^{l_f} 1/(l_f + 1)^{l_e} ∏_{j=1}^{l_e} t(e_j | f_{a(j)})
             = 1/(l_f + 1)^{l_e} ∑_{a(1)=0}^{l_f} ... ∑_{a(l_e)=0}^{l_f} ∏_{j=1}^{l_e} t(e_j | f_{a(j)})
             = 1/(l_f + 1)^{l_e} ∏_{j=1}^{l_e} ∑_{i=0}^{l_f} t(e_j | f_i)

36
IBM Model 1

- Luckily, we can get around this (using some basic math).
- (Overall) translation probability:

    p(e | f) = ∑_a p(e, a | f)
             = 1/(l_f + 1)^{l_e} ∏_{j=1}^{l_e} ∑_{i=0}^{l_f} t(e_j | f_i)

- e = my friend, f = mein freund (without NULL):

    p(my friend | mein freund) = ((t(my | mein) + t(my | freund)) × (t(friend | mein) + t(friend | freund))) / 2²

37
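
A small sketch of the rearranged formula above: the exponential sum over alignments collapses into a product of per-word sums. The probabilities below are made-up illustrative values, not estimates from data.

```python
t = {("my", "mein"): 0.8, ("my", "freund"): 0.05,         # illustrative values
     ("friend", "mein"): 0.05, ("friend", "freund"): 0.8}

def p_e_given_f(english, foreign, use_null=False):
    """p(e | f) for IBM Model 1 via the product-of-sums rearrangement."""
    l_e, l_f = len(english), len(foreign)
    prob = 1.0 / ((l_f + 1) ** l_e if use_null else l_f ** l_e)
    for e_word in english:
        prob *= sum(t.get((e_word, f_word), 0.0) for f_word in foreign)
    return prob

# ((0.8 + 0.05) * (0.05 + 0.8)) / 2^2, matching the slide's expansion
print(p_e_given_f("my friend".split(), "mein freund".split()))
```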
Learning a Model1 aligner

- Requires learning the translation probabilities t(e_i | f_j)
- Maximum Likelihood Estimation (MLE) (with full information):

    t(e_i | f_j) = count(e_i, f_j) / ∑_e count(e, f_j)

- Problem: we don't have full information (i.e., the target alignments)
- Expectation Maximization (EM):
  - Initialize parameters randomly (or uniformly)
  - E-step: run the current model on the data and collect (expected) counts.
  - M-step: update the parameters based on the previous step.
  - Repeat the last two steps until convergence.

38
EM for IBM Model1

[EM pseudocode figure from Koehn (2009)]

39
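
As a rough stand-in for the figure, here is a compact sketch of EM training for Model 1, following the standard recipe described on the previous slide (uniform initialization, fractional counts in the E-step, re-normalization in the M-step). It is an illustration, not the toolkit used in the paper; the toy corpus is assumed.

```python
from collections import defaultdict

def train_model1(corpus, iterations=10):
    """corpus: list of (english_tokens, foreign_tokens) pairs. Returns t(e | f)."""
    e_vocab = {e for e_sent, _ in corpus for e in e_sent}
    t = defaultdict(lambda: 1.0 / len(e_vocab))          # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)                       # expected counts c(e, f)
        total = defaultdict(float)                       # expected counts c(f)
        for e_sent, f_sent in corpus:                    # E-step: collect fractional counts
            for e in e_sent:
                norm = sum(t[(e, f)] for f in f_sent)
                for f in f_sent:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        for (e, f) in count:                             # M-step: re-estimate t(e | f)
            t[(e, f)] = count[(e, f)] / total[f]
    return t

corpus = [("the house".split(), "das haus".split()),     # toy corpus (assumed)
          ("the book".split(), "das buch".split()),
          ("a book".split(), "ein buch".split())]
t = train_model1(corpus)
print(round(t[("the", "das")], 2))                       # rises toward 1.0 with more iterations
```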
Model1 as a Translation Model

- Word decoding: Model 1 can be used as a translation model.

    p(e | f) = ∑_a p(e, a | f)

- Nowadays, such models are used for extracting alignments, which are the basis of more complex translation models (e.g., our syntax-based model).
- Viterbi alignment: find the most likely alignment for a given pair (easy: for each English word e_i, pick the most likely f_j)

    a_i = argmax_{j ∈ {0, ..., l_f}} t(e_i | f_j)

- K-best alignments: can be extended to extract the top k alignments.

40
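
And a one-function sketch of the Viterbi alignment just described: under Model 1 each English word can be aligned independently, so the argmax is taken per word. The NULL word is ignored here for brevity, and the translation-table values are assumed.

```python
t = {("the", "das"): 0.9, ("the", "haus"): 0.1,       # assumed translation table, e.g. the
     ("house", "das"): 0.2, ("house", "haus"): 0.8}   # output of an EM run like the sketch above

def viterbi_alignment(e_sent, f_sent, t):
    """For each English position i, pick the foreign position j maximizing t(e_i | f_j)."""
    return {i: max(range(len(f_sent)), key=lambda j: t.get((e, f_sent[j]), 0.0))
            for i, e in enumerate(e_sent)}

print(viterbi_alignment("the house".split(), "das haus".split(), t))   # {0: 0, 1: 1}
```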
Other IBM Models
- IBM Models 2-5: go beyond using only the lexical translation probabilities.

    the₁ man wearing the₂ coat
    the  person with  the  jacket

- IBM Model 2: adds an alignment probability distribution a(i | j, l_e, l_f), which considers relative word position and sentence length:

    p(e, a | f) = ∏_{j=1}^{l_e} t(e_j | f_{a(j)}) · a(a(j) | j, l_e, l_f)

- Model 1: t(e_4 | f_1) = t(e_4 | f_4)
- Model 2: t(e_4 | f_1) · a(1 | 4, 5, 5) < t(e_4 | f_4) · a(4 | 4, 5, 5)

41
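
A tiny numeric illustration of the Model 2 comparison above: with identical lexical probabilities for the two occurrences of "the", the distortion term a(i | j, l_e, l_f) prefers the positionally closer alignment. All numbers are made up for illustration.

```python
t_the = 0.4                                    # assumed t(the | the), identical for f_1 and f_4
a = {(1, 4, 5, 5): 0.05, (4, 4, 5, 5): 0.6}    # assumed distortion probabilities a(i | j, l_e, l_f)

score_far  = t_the * a[(1, 4, 5, 5)]           # align e_4 ("the") to f_1 (the distant "the")
score_near = t_the * a[(4, 4, 5, 5)]           # align e_4 ("the") to f_4 (the nearby "the")
print(score_far < score_near)                  # True: Model 2 prefers the nearby alignment
```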
Other IBM Models

- Model 1: lexical translation probabilities, bag-of-words.
- Model 2: alignment probability distribution a(i | j, l_e, l_f)
- Model 3: fertility distribution n(φ | f), a distribution over the number of words each f_j typically translates to.

    n(1 | haus) = 1.0, n(2 | klitzeklein) = 1.0, ...

- Model 4: relative distortion, word classes.
- Model 5: fixes the deficiency problem.

42
Back to Semantic Parsing: Rule extraction (Wong and Mooney (2006))

- Extraction: train IBM Model 5 over English sentences and sequences of MR productions, and extract rules from the 10-best alignments.
- Important: productions are used instead of MR tokens, which allows skipping pieces without meaning.

43
Rule extraction (Wong and Mooney (2006))

- Extraction: bottom-up (as done last week), starting from alignments with terminal symbols, then working up to more complex rules.
- Alignments where the RHS of the production rule is an MR terminal:
  - TEAM → ⟨our, our⟩, UNUM → ⟨4, 4⟩, ...
- Move on to more complex rules (adjusted to account for sub-patterns; skipped words are written as (num)):
  - COND. → ⟨TEAM_1 player UNUM_2 has (1) ball , (bowner TEAM_1 {UNUM_2})⟩

44
Similar methods: Hiero rule extraction

- A specialized version of methods used for other types of syntax-based decoding, e.g., hierarchical phrase-based translation (Chiang (2005)).
- Does not require syntactic rules or analyses; it learns them from scratch.

    30 duonianlai de youhao hezuo
    friendly cooperation over the last 30 years

    X1 → ⟨30 , 30⟩
    X2 → ⟨friendly cooperation , youhao hezuo⟩
    X3 → ⟨over the last X1 years , X1 duonianlai⟩
    X4 → ⟨X2 X3 , X3 X2⟩

45
Extension to logical variables

- So far, the approach has been used on functional representations.
- λ-WASP (Wong and Mooney (2007)) extends rule extraction to handle logical and lambda variables, with rules of the type:

    A → ⟨α , λx_1, ..., λx_n. β⟩

  form → smallest(x_2, (form, form))    form → state(x_1)    form → area(x_1, x_2)

    smallest state by area

  form → ⟨state , λx_1. state(x)⟩
  form → ⟨by area , λx_1.λy_2. area(x, y)⟩
  form → ⟨smallest form_1 form_2 , λx_1. smallest(x_2, (form_1(x_1), form_2(x_1, x_2)))⟩

46
Probabilistic Model

- Lexical/rule induction: over-generates, leading to many derivations.
- Extend the SCFG to a weighted SCFG (the synchronous analogue of the PCFG), which defines a probability distribution over derivations.
- Goal: discriminate between different derivations, and find an output translation f* where

    f* = m(argmax_{d ∈ D(G|e)} Pr_λ(d | e))

- D(G | e): the set of derivations given an English input e.
  - Computed using dynamic programming and something close to the inside-outside algorithm (last week).
- Pr_λ(d | e): a log-linear model trained on example derivations (more on this next week).

47
Overview and Take-aways

- Recasting the semantic parsing problem as an MT task.
  - Synchronous grammars: modeling NL-MR transformations and decoding by parsing.
  - Word-alignment models: basics, extracting semantic grammar transformation rules.
  - Decoding and ranking models: skipped over important details, more about this next week.
- Further directions
  - Different tree-based translation models (Ehsen), more powerful translation models (Mariia)
  - Different rule extraction techniques: Li et al. (2013)

48
Roadmap

- Lecture 3 (today): rule extraction, decoding (MT perspective)
- Lecture 4: structure prediction and classification (missing today).

49
References I
Brown, P. F., Pietra, V. J. D., Pietra, S. A. D., and Mercer, R. L. (1993). The
mathematics of statistical machine translation: Parameter estimation.
Computational linguistics, 19(2):263–311.
Chiang, D. (2005). A hierarchical phrase-based model for statistical machine
translation. In Proceedings of the 43rd Annual Meeting on Association for
Computational Linguistics, pages 263–270. Association for Computational
Linguistics.
Chiang, D. and Knight, K. (2006). An introduction to synchronous grammars.
Tutorial available at http://www.isi.edu/chiang/papers/synchtut.pdf.
Kate, R. J., Wong, Y. W., and Mooney, R. J. (2005). Learning to transform natural
to formal languages. In Proceedings of AAAI-2005.
Koehn, P. (2009). Statistical machine translation. Cambridge University Press.
Li, P., Liu, Y., and Sun, M. (2013). An extended ghkm algorithm for inducing
lambda-scfg. In AAAI.
http://www.aaai.org/ocs/index.php/AAAI/AAAI13/paper/view/6189.
Wong, Y. W. and Mooney, R. J. (2006). Learning for semantic parsing with statistical
machine translation. In Proceedings of HLT-NAACL-2006, pages 439–446.
http://dl.acm.org/citation.cfm?id=1220891.
Wong, Y. W. and Mooney, R. J. (2007). Learning synchronous grammars for semantic
parsing with lambda calculus. In Proceedings of ACL-2007, Prague, Czech
Republic. http://anthology.aclweb.org/P/P07/P07-1121.pdf.

50
References II

Woods, W. A. (1973). Progress in natural language understanding: an application to


lunar geology. In Proceedings of the June 4-8, 1973, National Computer
Conference and Exposition, pages 441–450.
Zelle, J. M. and Mooney, R. J. (1996). Learning to parse database queries using
inductive logic programming. In Proceedings of AAAI-1996, pages 1050–1055.

51
