SP and MT: Translation
Kyle Richardson
2
The Big Picture (reminder)
I Standard processing pipeline:
  input −→ [Semantic Parsing] −→ sem −→ [Interpretation, against a knowledge representation of the world] −→ ⟦sem⟧

Input: List samples that contain every major element
sem: (FOR EVERY X / MAJORELT : T ;
       (FOR EVERY Y / SAMPLE : (CONTAINS Y X) ;
         (PRINTOUT Y)))
⟦sem⟧ = {S10019, S10059, ...}
3
Semantic Parsing: Generating formal representations
I Data-driven: Given data, learn a function that can map any
given input (x) to a meaning representation (z).
I What kind of data do we learn from?
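Concretely, the simplest (fully supervised) setting uses paired examples. A tiny sketch of such a corpus; the MR string below is assembled from the CLang fragments used later in these slides, so treat it as illustrative rather than exact CLang syntax:

# (x, z) pairs: natural-language input and meaning representation
data = [
    ("If our player 4 has the ball, our player 4 should shoot.",
     "((bowner our {4}) (do our {4} shoot))"),
    # ... more pairs
]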
5
Previous Session: Transformation rules
I Decompose translation into a set of local transformations.
[Figure: derivation trees for the arithmetic example ("two times two plus three"), built bottom-up by local rules, e.g. r1: 'two' −→ N:2 and r2: ... −→ *, yielding 2 * 2 and then the larger expression with + 3.]
6
Bottom-up, String → Tree Rule Matching
MR Grammar:
RULE −→ CONDITION DIRECTIVE
CONDITION −→ bowner TEAM UNUM
DIRECTIVE −→ do TEAM UNUM ACTION
TEAM −→ our
UNUM −→ 4
ACTION −→ shoot

Input: If our player 4 has the ball, our player 4 should shoot.
Transformation: If TEAM player 4 has the ball, TEAM player 4 should shoot.
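A minimal sketch of this bottom-up matching idea; the patterns and MR templates below are illustrative reconstructions, not the exact rule set from the slide:

# Rewrite matched word/nonterminal patterns into nonterminals that carry
# a partially built MR, repeating until no rule applies.
RULES = [
    # (pattern over words and nonterminals, LHS nonterminal, MR template)
    (["our"], "TEAM", "our"),
    (["4"], "UNUM", "4"),
    (["shoot"], "ACTION", "shoot"),
    (["TEAM", "player", "UNUM", "has", "the", "ball"], "CONDITION",
     "(bowner TEAM {UNUM})"),
    (["TEAM", "player", "UNUM", "should", "ACTION"], "DIRECTIVE",
     "(do TEAM {UNUM} ACTION)"),
    (["if", "CONDITION", ",", "DIRECTIVE"], "RULE", "(CONDITION DIRECTIVE)"),
]

def parse(tokens):
    """Items are (symbol, mr): terminals carry themselves, nonterminals the MR."""
    items = [(t, t) for t in tokens]
    changed = True
    while changed:
        changed = False
        for pattern, lhs, template in RULES:
            n = len(pattern)
            for i in range(len(items) - n + 1):
                window = items[i:i + n]
                if all(sym == p for (sym, _), p in zip(window, pattern)):
                    mr = template
                    for (_, sub_mr), p in zip(window, pattern):
                        if p.isupper():               # substitute child MRs
                            mr = mr.replace(p, sub_mr, 1)
                    items[i:i + n] = [(lhs, mr)]
                    changed = True
                    break
            if changed:
                break
    return items

print(parse("if our player 4 has the ball , our player 4 should shoot".split()))
# -> [('RULE', '((bowner our {4}) (do our {4} shoot))')]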
7
Semantic Parsing and Machine Translation
8
Context-Free Grammars (again)
G = (Σ, N, S, R)
9
Previous examples
I Derivation trees encode the semantic rules.
Derivation tree: [N : (plus (mult 2 2) 3) [N : (mult 2 2)] [R : plus] [N : 3]]
L(G) = {two times two, two times two plus three, ...}
10
Synchronous Context-Free Grammars (extension)
I synchronous context-free grammar (SCFG):
G_syn = (Σ_e, Σ_f, N, S, R), with rules of the form N → ⟨α, β⟩
I α ∈ (N ∪ Σ_e)*, β ∈ (N ∪ Σ_f)*, with the nonterminal occurrences in α and β linked one-to-one (shown below as co-indices)
I S : start symbol: ⟨S_1, S_2⟩
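One possible ad-hoc encoding of such a synchronous rule (our own representation, not the lecture's notation); the co-indices link nonterminal occurrences across the two right-hand sides:

scfg_rule = {
    "lhs": "VP",
    "english": [("V", 1), ("NP", 2)],   # α ∈ (N ∪ Σ_e)*
    "foreign": [("NP", 2), ("V", 1)],   # β ∈ (N ∪ Σ_f)*, reordered
}
# Terminals would simply be plain strings, e.g. "open" vs. "akemasu".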
11
Machine Translation Example
1 Example from Chiang and Knight (2006)
12
Machine Translation: Example Derivation
Grammar:
S −→ ⟨NP[1] VP[2] , NP[1] VP[2]⟩
VP −→ ⟨V[1] NP[2] , NP[2] V[1]⟩
NP −→ ⟨I , watashi wa⟩
NP −→ ⟨the box , hako wo⟩
V −→ ⟨open , akemasu⟩
Derivation:
S ⇒ ⟨NP[1] VP[2] , NP[1] VP[2]⟩
  ⇒ ⟨NP[1] V[3] NP[4] , NP[1] NP[4] V[3]⟩
  ⇒ ⟨I V[3] NP[4] , watashi wa NP[4] V[3]⟩
  ⇒ ⟨I open NP[4] , watashi wa NP[4] akemasu⟩
  ⇒ ⟨I open the box , watashi wa hako wo akemasu⟩
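A small sketch of generating such a paired derivation programmatically. To keep the linking trivial, each nonterminal occurs at most once per rule here (the object NP is renamed NP2), whereas the co-indices above handle the general case:

RULES = {
    "S":   (["NP", "VP"], ["NP", "VP"]),
    "VP":  (["V", "NP2"], ["NP2", "V"]),        # NP2: the object NP
    "NP":  (["I"], ["watashi", "wa"]),
    "NP2": (["the", "box"], ["hako", "wo"]),
    "V":   (["open"], ["akemasu"]),
}

def derive(symbol="S"):
    """Expand one linked pair of strings until only terminals remain."""
    e, f = [symbol], [symbol]
    while any(s in RULES for s in e):
        nt = next(s for s in e if s in RULES)
        e_rhs, f_rhs = RULES[nt]
        i, j = e.index(nt), f.index(nt)          # linked occurrences
        e[i:i + 1], f[j:j + 1] = e_rhs, f_rhs
        print(" ".join(e), "|||", " ".join(f))
    return " ".join(e), " ".join(f)

derive()
# last line printed: I open the box ||| watashi wa hako wo akemasu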
13
SCFGs
[Figure: the paired derivation trees for ⟨I open the box , watashi wa hako wo akemasu⟩, with linked nonterminals.]
14
Two Variants of Parsing
I Variant 1: given a string pair ⟨e, f⟩, find a synchronous derivation that yields both strings.
I Variant 2: given only one string (say, the English side e), find a derivation of it and read the other side off.
I Surprisingly: the first problem is much harder than the second (despite having more information available). We will only consider the second.
15
Decoding by parsing (i.e., Translation)
I Assuming we have binary rules, we can use the CKY algorithm (last lecture) for parsing.
I Idea: Parse the English side of the grammar in the normal way, then apply (project) the foreign side of the rules used.
I Why does this work? Both sides of a synchronous rule share the same LHS, so an English-side parse tells us exactly which synchronous rules were used (a small sketch follows below).
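A minimal sketch of decode-by-parsing for the running example; the grammar is hard-coded, and for brevity we keep at most one analysis per chart cell and assume the two child nonterminals of a binary rule are distinct:

LEX = [          # LHS, English words, foreign words
    ("NP", ("I",), ("watashi", "wa")),
    ("NP", ("the", "box"), ("hako", "wo")),
    ("V",  ("open",), ("akemasu",)),
]
BIN = [          # LHS, English-side children, foreign-side order
    ("S",  ("NP", "VP"), ("NP", "VP")),
    ("VP", ("V", "NP"),  ("NP", "V")),
]

def decode(words):
    n = len(words)
    # chart[(i, j)][A] = projected foreign string for an A spanning words[i:j]
    chart = {(i, j): {} for i in range(n) for j in range(i + 1, n + 1)}
    for (lhs, e_rhs, f_rhs) in LEX:               # terminal spans
        for i in range(n - len(e_rhs) + 1):
            if tuple(words[i:i + len(e_rhs)]) == e_rhs:
                chart[(i, i + len(e_rhs))][lhs] = " ".join(f_rhs)
    for span in range(2, n + 1):                  # CKY combination
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (lhs, (b, c), f_order) in BIN:
                    if b in chart[(i, k)] and c in chart[(k, j)]:
                        parts = {b: chart[(i, k)][b], c: chart[(k, j)][c]}
                        chart[(i, j)][lhs] = " ".join(parts[x] for x in f_order)
    return chart[(0, n)].get("S")

print(decode("I open the box".split()))   # -> watashi wa hako wo akemasu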
16
Decoding by Parsing: Parse English Side
Grammar:
S −→ ⟨NP[1] VP[2] , NP[1] VP[2]⟩
VP −→ ⟨V[1] NP[2] , NP[2] V[1]⟩
NP −→ ⟨I , watashi wa⟩
NP −→ ⟨the box , hako wo⟩
V −→ ⟨open , akemasu⟩
CKY chart over 0 I 1 open 2 the box 3 :
[0,1]: NP → I
[1,2]: V → open
[2,3]: NP → the box
[1,3]: VP → V NP
[0,3]: S → NP VP
22
Decoding by Parsing: Projection
Grammar:
S −→ ⟨NP[1] VP[2] , NP[1] VP[2]⟩
VP −→ ⟨V[1] NP[2] , NP[2] V[1]⟩
NP −→ ⟨I , watashi wa⟩
NP −→ ⟨the box , hako wo⟩
V −→ ⟨open , akemasu⟩
Chart with projected foreign sides:
[0,1]: NP → ⟨I , watashi wa⟩
[1,2]: V → ⟨open , akemasu⟩
[2,3]: NP → ⟨the box , hako wo⟩
[1,3]: VP → ⟨V NP , NP V⟩
[0,3]: S → ⟨NP VP , NP VP⟩
23
Binarization (brief reminder/review)
I The CKY algorithm (last week) assumes the input grammar is in Chomsky normal form (binary rules and unary pre-terminal rules only); a small binarization sketch follows below.
I Why? input: w1 w2 w3 w4
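A small sketch of right-binarizing a long rule into binary rules; the intermediate symbol names (joined with "+") are invented here for illustration:

def binarize(lhs, rhs):
    """Split LHS -> A B C ... into a chain of binary rules."""
    rules = []
    while len(rhs) > 2:
        rest = "+".join(rhs[1:])          # fresh intermediate symbol
        rules.append((lhs, (rhs[0], rest)))
        lhs, rhs = rest, rhs[1:]
    rules.append((lhs, tuple(rhs)))
    return rules

print(binarize("VP", ["V", "NP", "PP"]))
# [('VP', ('V', 'NP+PP')), ('NP+PP', ('NP', 'PP'))]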
[Figure: a code-translation pair, e.g. ⟨ for i in range(10): n += i , move ax, 1 ; loop: add bx, ax ; cmp ax, 10 ; jle loop ⟩]
25
Big Idea: Wong and Mooney (2006)
26
Semantic Parsing and Syntax-driven Translation
Grammar:
RULE −→ ⟨if CONDITION[1] DIRECTIVE[2] , ( CONDITION[1] DIRECTIVE[2] )⟩
CONDITION −→ ⟨TEAM[1] player UNUM[2] has the ball , (bowner TEAM[1] {UNUM[2]})⟩
TEAM −→ ⟨our , our⟩
UNUM −→ ⟨four , 4⟩
Derivation:
RULE ⇒ ⟨if CONDITION[1] DIRECTIVE[2] , ( CONDITION[1] DIRECTIVE[2] )⟩
     ⇒ ⟨if TEAM[3] player UNUM[4] has the ball DIRECTIVE[2] , ( (bowner TEAM[3] {UNUM[4]}) DIRECTIVE[2] )⟩
     ⇒ ⟨if our player UNUM[4] has the ball DIRECTIVE[2] , ( (bowner our {UNUM[4]}) DIRECTIVE[2] )⟩
     ...
     ⇒ ⟨If our player four has the ball, then our player six ... , ... ⟩
27
Rule Extraction and Alignment
28
Word-based alignment models (basics)
a : {1 → 1, 2 → 2, 3 → 3, 4 → 4}
2 Examples from Koehn (2009) and some of his slides.
29
Word-based alignment models (basics)
a : {1 → 1, 2 → 2, 3 → 3, 4 → 4, 5 → 4}
30
Word-based alignment models (basics)
a : {1 → 1, 2 → 2, 3 → 3, 4 → 0, 5 → 4}
31
Word-based alignment models (basics)
32
IBM Model 1
p(e, a \mid f) = \frac{1}{(l_f + 1)^{l_e}} \prod_{j=1}^{l_e} t(e_j \mid f_{a(j)})
33
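A direct transcription of this formula into code (a sketch; some presentations include an extra constant ε, which this version of the formula drops):

def p_e_a_given_f(e, f, a, t):
    """e: target words; f: source words with f[0] = NULL;
    a: list giving a(j), a position into f (0 = NULL) for each e_j;
    t: dict of lexical translation probabilities t[(e_word, f_word)]."""
    l_e, l_f = len(e), len(f) - 1            # f includes NULL at index 0
    prob = 1.0 / (l_f + 1) ** l_e
    for j, e_j in enumerate(e):
        prob *= t[(e_j, f[a[j]])]
    return prob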
IBM Model 1
I Luckily, we can get around enumerating all (l_f + 1)^{l_e} alignments explicitly (using some basic math).
I (Overall) Translation probability:
p(e \mid f) = \sum_a p(e, a \mid f)
            = \sum_{a(1)=0}^{l_f} \cdots \sum_{a(l_e)=0}^{l_f} p(e, a \mid f)
            = \sum_{a(1)=0}^{l_f} \cdots \sum_{a(l_e)=0}^{l_f} \frac{1}{(l_f + 1)^{l_e}} \prod_{j=1}^{l_e} t(e_j \mid f_{a(j)})
            = \frac{1}{(l_f + 1)^{l_e}} \sum_{a(1)=0}^{l_f} \cdots \sum_{a(l_e)=0}^{l_f} \prod_{j=1}^{l_e} t(e_j \mid f_{a(j)})
            = \frac{1}{(l_f + 1)^{l_e}} \prod_{j=1}^{l_e} \sum_{i=0}^{l_f} t(e_j \mid f_i)
36
IBM Model 1
p(e \mid f) = \sum_a p(e, a \mid f) = \frac{1}{(l_f + 1)^{l_e}} \prod_{j=1}^{l_e} \sum_{i=0}^{l_f} t(e_j \mid f_i)
p(\text{my friend} \mid \text{mein freund}) = \big( (t(\text{my} \mid \text{mein}) + t(\text{my} \mid \text{freund})) \cdot (t(\text{friend} \mid \text{mein}) + t(\text{friend} \mid \text{freund})) \big) / 2^2
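The factorized form is what makes this cheap to compute; a sketch:

def p_e_given_f(e, f, t):
    """f[0] is the NULL word; t[(e_word, f_word)] are lexical probabilities."""
    l_e, l_f = len(e), len(f) - 1
    prob = 1.0 / (l_f + 1) ** l_e
    for e_j in e:
        # sum over all source positions (including NULL) for this target word
        prob *= sum(t.get((e_j, f_i), 0.0) for f_i in f)
    return prob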
37
Learning a Model1 aligner
t(e_i \mid f_j) = \frac{\mathrm{count}(e_i, f_j)}{\sum_{e} \mathrm{count}(e, f_j)}
38
EM for IBM Model1
Koehn (2009)
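A compact EM sketch in the style of the Model 1 pseudocode in Koehn (2009); the variable names and the tiny toy corpus below are illustrative, and NULL handling is omitted to keep it short:

from collections import defaultdict

def train_model1(corpus, iterations=10):
    """corpus: list of (e_words, f_words) sentence pairs."""
    e_vocab = {e for es, _ in corpus for e in es}
    f_vocab = {f for _, fs in corpus for f in fs}
    t = {(e, f): 1.0 / len(e_vocab) for e in e_vocab for f in f_vocab}  # uniform init
    for _ in range(iterations):
        count = defaultdict(float)                # expected counts c(e, f)
        total = defaultdict(float)                # totals per source word f
        for es, fs in corpus:                     # E-step
            for e in es:
                z = sum(t[(e, f)] for f in fs)    # normalization for this e
                for f in fs:
                    delta = t[(e, f)] / z
                    count[(e, f)] += delta
                    total[f] += delta
        for (e, f) in t:                          # M-step: t(e|f) = c(e,f) / c(f)
            t[(e, f)] = count[(e, f)] / total[f] if total[f] > 0 else 0.0
    return t

t = train_model1([("the house".split(), "das haus".split()),
                  ("the book".split(), "das buch".split()),
                  ("a book".split(), "ein buch".split())])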
39
Model1 as a Translation Model
I Nowadays, such models are used for extracting word alignments, which are the basis of more complex translation models (e.g., our syntax-based model).
I Viterbi alignment: find the most likely alignment for a given sentence pair (easy under Model 1: for each target word e_j, pick the most likely source word f_i).
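Because Model 1 factorizes over target positions, the Viterbi alignment decomposes per word; a sketch:

def viterbi_align(e, f, t):
    """Return a(j) for each target position j; f[0] is the NULL word."""
    return [max(range(len(f)), key=lambda i: t.get((e_j, f[i]), 0.0))
            for e_j in e]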
40
Other IBM Models
I IBM Models 2-5: Go beyond using only the lexical translation
probabilities.
41
Other IBM Models
42
Back to Semantic Parsing: Rule extraction (Wong and Mooney (2006))
43
Rule extraction (Wong and Mooney (2006))
44
Similar methods: Hiero rule extraction
X1 → ⟨30 , 30⟩
X2 → ⟨friendly cooperation , youhao hezuo⟩
X3 → ⟨over the last X1 years , X1 duonianlai⟩
X4 → ⟨X2 X3 , X3 X2⟩
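The core extraction step, sketched: subtract a smaller consistent phrase pair from a larger one and replace it on both sides with a linked nonterminal (in the spirit of Chiang (2005)); the phrase pair below is the one implied by the rules above:

def subtract(phrase_pair, sub_pair, index=1):
    """Turn a phrase pair containing sub_pair into a hierarchical rule RHS."""
    (e, f), (e_sub, f_sub) = phrase_pair, sub_pair
    x = f"X{index}"
    return e.replace(e_sub, x, 1), f.replace(f_sub, x, 1)

big = ("over the last 30 years", "30 duonianlai")
sub = ("30", "30")
print(subtract(big, sub))   # ('over the last X1 years', 'X1 duonianlai')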
45
Extension to logical variables
46
Probabilistic Model
47
Overview and Take-aways
48
Roadmap
49
References I
Brown, P. F., Pietra, V. J. D., Pietra, S. A. D., and Mercer, R. L. (1993). The
mathematics of statistical machine translation: Parameter estimation.
Computational linguistics, 19(2):263–311.
Chiang, D. (2005). A hierarchical phrase-based model for statistical machine
translation. In Proceedings of the 43rd Annual Meeting on Association for
Computational Linguistics, pages 263–270. Association for Computational
Linguistics.
Chiang, D. and Knight, K. (2006). An introduction to synchronous grammars.
Tutorial available at http://www.isi.edu/chiang/papers/synchtut.pdf.
Kate, R. J., Wong, Y. W., and Mooney, R. J. (2005). Learning to transform natural
to formal languages. In Proceedings of AAAI-2005.
Koehn, P. (2009). Statistical machine translation. Cambridge University Press.
Li, P., Liu, Y., and Sun, M. (2013). An extended GHKM algorithm for inducing
lambda-SCFG. In AAAI.
http://www.aaai.org/ocs/index.php/AAAI/AAAI13/paper/view/6189.
Wong, Y. W. and Mooney, R. J. (2006). Learning for semantic parsing with statistical
machine translation. In Proceedings of HLT-NAACL-2006, pages 439–446.
http://dl.acm.org/citation.cfm?id=1220891.
Wong, Y. W. and Mooney, R. J. (2007). Learning synchronous grammars for semantic
parsing with lambda calculus. In Proceedings of ACL-2007, Prague, Czech
Republic. http://anthology.aclweb.org/P/P07/P07-1121.pdf.
50
References II
51