Unit 4
Syntax
CFG, Probabilistic CFG
Constituency (Phrase level, Sentence level)
Parsing (Top-Down and Bottom-Up)
CYK Parser, Probabilistic Parsing
– For example, the productions NP → ProperNoun, NP → Det Nominal, and Nominal → Noun | Nominal Noun express that an NP (or noun phrase) can be composed of:
● either a Proper Noun or a determiner (Det) followed by a Nominal;
● a Nominal, in turn, can consist of one or more Nouns.
CFG
● A CFG can be thought of in two ways:
– as a device for generating sentences, and
– as a device for assigning a structure to a given sentence.
● Viewing a CFG as a generator,
– we can read the → arrow as “rewrite the symbol on the left with the string of symbols on the right”.
● So starting from the symbol: NP
● we can use our first rule to rewrite NP as: Det Nominal
● and then rewrite Nominal as: Noun
● and finally rewrite these parts-of-speech as: a flight
CFG
● We say the string a flight can be derived from the non-terminal NP.
– A CFG can be used to generate a set of strings. This sequence of rule expansions is called a derivation of the string of words.
– It is common to represent a derivation by a parse tree (commonly shown inverted, with the root at the top).
CFG
● We can use this grammar to generate sentences of this “language”.
– We start with S, expand it to NP VP, then choose a random expansion of NP (let’s say, to I),
– and a random expansion of VP (let’s say, to Verb NP),
– and so on until we generate the string: I prefer a morning flight
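To make the generator view concrete, here is a minimal Python sketch (added for illustration, not from the slides; the toy rule set below is assumed, in the spirit of the derivation above) that repeatedly rewrites non-terminals at random until only words remain:

    import random

    # A toy CFG, assumed for illustration: each non-terminal maps to a
    # list of alternative right-hand sides (tuples of symbols).
    GRAMMAR = {
        "S":       [("NP", "VP")],
        "NP":      [("Pronoun",), ("Det", "Nominal")],
        "Nominal": [("Noun",), ("Noun", "Nominal")],
        "VP":      [("Verb", "NP")],
        "Pronoun": [("I",)],
        "Det":     [("a",)],
        "Noun":    [("morning",), ("flight",)],
        "Verb":    [("prefer",)],
    }

    def generate(symbol="S"):
        """Rewrite `symbol` by a randomly chosen rule until only terminals remain."""
        if symbol not in GRAMMAR:        # terminal symbol: nothing to rewrite
            return [symbol]
        rhs = random.choice(GRAMMAR[symbol])
        words = []
        for sym in rhs:
            words.extend(generate(sym))
        return words

    print(" ".join(generate()))  # e.g. "I prefer a morning flight"

Because the expansions are chosen at random, repeated runs produce different sentences of the language, e.g. “I prefer a flight” or “I prefer a morning morning flight”.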
Context-Free Grammar (CFG)
● Formal Definition
– A context-free grammar (CFG) G is a quadruple (N, Σ, R, S) where
● N is a finite set of non-terminal symbols,
● Σ is a finite set of terminal symbols, disjoint from N,
● R is a finite set of production rules of the form A → β, where A ∈ N and β is a string over N ∪ Σ, and
● S ∈ N is the start symbol.
CFG - Example
● N = {q, f}
● Σ = {0, 1}
● R = {q → 11q, q → 00f, f → 11f, f → ε}
● S = q
● (Equivalently, R = {q → 11q | 00f, f → 11f | ε})
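● For example, 110011 ∈ L(G) via the derivation q ⇒ 11q ⇒ 1100f ⇒ 110011f ⇒ 110011; in general, this grammar generates L(G) = {(11)ᵐ 00 (11)ⁿ | m, n ≥ 0}.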
CFG - Rules
● If A → β is a rule, then xAy ⇒ xβy, and we say that xAy derives xβy.
● If s ⇒ ··· ⇒ t, then we write s ⇒* t.
● A string x in Σ* is generated by G = (N, Σ, R, S) if S ⇒* x.
● L(G) = { x in Σ* | S ⇒* x }.
CFG - Example
● G = ({S}, {0, 1}, {S → 0S1 | ε}, S)
● ε ∈ L(G) because S ⇒ ε.
● 01 ∈ L(G) because S ⇒ 0S1 ⇒ 01.
● 0011 ∈ L(G) because S ⇒ 0S1 ⇒ 00S11 ⇒ 0011.
● L(G) = {0ⁿ1ⁿ | n ≥ 0}
Context-free Language (CFL)
● A language L is context-free if there exists a CFG G such that L = L(G).
● A grammar G generates the language L(G).
Example
● G = (N, T, S, P) where
– N = {S, NP, VP, PP, Det, Noun, Verb, Aux, Pre}
– T = {‘a’, ‘ate’, ‘cake’, ‘child’, ‘fork’, ‘the’, ‘with’}
– S = S
– P = { S → NP VP
        NP → Det Noun | NP PP
        PP → Pre NP
        VP → Verb NP
        Det → ‘a’ | ‘the’
        Noun → ‘cake’ | ‘child’ | ‘fork’
        Pre → ‘with’
        Verb → ‘ate’ }
Example
● Some notes:
– Note 1: In P, the pipe symbol (|) is used to combine productions that have the same LHS into a single representation.
● For example, Det → ‘a’ | ‘the’ is derived from the two rules Det → ‘a’ and Det → ‘the’. It still denotes two rules, not one.
– Note 2: The phrase-structure productions (S → NP VP, NP → Det Noun | NP PP, PP → Pre NP, VP → Verb NP) are referred to as the grammar, and the word-level productions (the Det, Noun, Pre, and Verb expansions) are referred to as the lexicon.
– Note 3: NP – Noun Phrase, VP – Verb Phrase, PP – Prepositional Phrase, Det – Determiner, Aux – Auxiliary verb
Sample derivation
● Using the rule set P from the previous slide:
S → NP VP
  → Det Noun VP
  → the Noun VP
  → the child VP
  → the child Verb NP
  → the child ate NP
  → the child ate Det Noun
  → the child ate a Noun
  → the child ate a cake
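The same grammar and sentence can be checked mechanically. Here is a minimal sketch using NLTK (assuming the nltk package is available; this is an illustration added to these notes, not part of the original slides):

    import nltk

    # The example grammar from the slides, written in NLTK's notation.
    grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det Noun | NP PP
    PP -> Pre NP
    VP -> Verb NP
    Det -> 'a' | 'the'
    Noun -> 'cake' | 'child' | 'fork'
    Pre -> 'with'
    Verb -> 'ate'
    """)

    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("the child ate a cake".split()):
        tree.pretty_print()   # draws the parse tree in ASCII

The printed tree corresponds exactly to the derivation above, with S at the root and the words at the leaves.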
Probabilistic Context Free Grammar (PCFG)
● A PCFG is an extension of a CFG with a probability for each production rule.
● Ambiguity is the reason for using the probabilistic version of CFG.
– For instance, some sentences may have more than one underlying derivation.
– The sentence can be parsed in more than one way.
– In this case, the parse of the sentence becomes ambiguous.
● To resolve this ambiguity, we can use a PCFG to find the probability of each parse of the given sentence.
PCFG - Definition
● A probabilistic context free grammar G is a quintuple G = (N, T, S, R, P) where
– (N, T, S, R) is a context free grammar, where N is the set of non-terminal (variable) symbols, T is the set of terminal symbols, S is the start symbol, and R is the set of production rules, each of the form A → s.
– P assigns a probability P(A → s) to each rule in R. The properties governing the probabilities are as follows:
● P(A → s) is the conditional probability of choosing the rule A → s in a left-most derivation, given that A is the non-terminal being expanded.
● The value of each probability lies between 0 and 1.
● The probabilities of all rules with the same left-hand-side non-terminal A sum to 1: Σₛ P(A → s) = 1.
PCFG - Example
● Probabilistic Context Free Grammar G = (N, T, S, R, P)
S = S
R = { S → NP VP
      NP → Det Noun | NP PP
      PP → Pre NP
      VP → Verb NP
      Det → ‘a’ | ‘the’
      Noun → ‘cake’ | ‘child’ | ‘fork’
      Pre → ‘with’
      Verb → ‘ate’ }
● P = R with an associated probability for each rule, as in the table below.
● Observe from the table that the sum of the probability values for all rules that have the same left-hand side is 1.
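Since the probability table itself is not reproduced in these notes, here is a sketch of the same grammar as an NLTK PCFG, with hypothetical probabilities chosen only so that each left-hand side sums to 1 (assumed for illustration, not taken from the slides):

    import nltk

    # The slide grammar with HYPOTHETICAL probabilities (assumed);
    # the probabilities for each left-hand side sum to 1.
    pcfg = nltk.PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> Det Noun [0.8] | NP PP [0.2]
    PP -> Pre NP [1.0]
    VP -> Verb NP [1.0]
    Det -> 'a' [0.5] | 'the' [0.5]
    Noun -> 'cake' [0.4] | 'child' [0.4] | 'fork' [0.2]
    Pre -> 'with' [1.0]
    Verb -> 'ate' [1.0]
    """)

    # ViterbiParser returns the single most probable parse.
    parser = nltk.ViterbiParser(pcfg)
    for tree in parser.parse("the child ate a cake with a fork".split()):
        print(tree)   # the tree is printed together with its probability
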
Parse
● To parse a sentence is to resolve it into its component parts and describe their syntactic roles.
● In NLP, a parse can be visualized in tree form.
Syntax Parsing
● Syntax parsing is also heavily used in programming languages, for example to analyze statements such as:
b = c + 1;
a = a - d;
Parse Tree
●
A parse of the sentence "the giraffe dreams" is:
– s => np vp => det n vp => the n vp => the giraffe vp
=> the giraffe iv => the giraffe dreams
Parsing
● In natural language processing, parsing is the process of analyzing a sentence to determine its grammatical structure.
● There are two main approaches to parsing:
– top-down parsing
– bottom-up parsing
Top-down Parsing
● Top-down parsing is a parsing technique that starts with the highest level of a grammar’s production rules and works its way down to the lowest level.
– It begins with the start symbol of the grammar and applies the production rules recursively to expand it into a parse tree.
– One example of a top-down parsing algorithm is recursive descent parsing.
Top-down Parsing
● For example, consider the following CFG:
S -> NP VP
NP -> Det N
VP -> V NP
Det -> the | a
N -> dog | cat | boy | girl
V -> chased | hugged
● A top-down parser would begin with the start symbol “S” and then apply the production rule “S -> NP VP” to expand it into “NP VP”.
● The parser would then apply the production rule “NP -> Det N” to expand “NP” into “Det N”, as in the recursive descent sketch below.
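A minimal recursive descent sketch for this toy grammar, with one parsing procedure per non-terminal (added for illustration; it handles only this fixed grammar):

    # Recursive descent parser for the toy CFG above: one function per
    # non-terminal, each consuming tokens from left to right.
    DET = {"the", "a"}
    N = {"dog", "cat", "boy", "girl"}
    V = {"chased", "hugged"}

    def parse_s(tokens, i=0):
        i = parse_np(tokens, i)      # S -> NP VP
        i = parse_vp(tokens, i)
        if i != len(tokens):
            raise SyntaxError("trailing input")
        return i

    def parse_np(tokens, i):
        i = expect(tokens, i, DET)   # NP -> Det N
        return expect(tokens, i, N)

    def parse_vp(tokens, i):
        i = expect(tokens, i, V)     # VP -> V NP
        return parse_np(tokens, i)

    def expect(tokens, i, word_class):
        if i < len(tokens) and tokens[i] in word_class:
            return i + 1
        raise SyntaxError(f"unexpected token at position {i}")

    parse_s("the dog chased the cat".split())   # succeeds (no exception)

Each function mirrors one production rule, which is why recursive descent is the textbook example of top-down parsing: the call tree of the functions is exactly the parse tree, built from the root down.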
Bottom-up Parsing
● Bottom-up parsing is a parsing technique that starts with the sentence’s words and works its way up to the highest level of the grammar’s production rules.
● It begins with the input sentence and applies the production rules in reverse, reducing the input sentence to the start symbol of the grammar.
● One example of a bottom-up parsing algorithm is shift-reduce parsing.
Bottom-up Parsing
● For example, consider the following CFG:
S -> NP VP
NP -> Det N
VP -> V NP
Det -> the | a
N -> dog | cat | boy | girl
V -> chased | hugged
● A bottom-up parser would begin with the input sentence “the dog chased the cat” and would apply the production rules in reverse to reduce it to the start symbol “S”.
● The parser would start by matching
– “the dog” to the “Det N” production rule,
– “chased” to the “V” production rule, and
– “the cat” to another “Det N” production rule.
● These shift and reduce steps are repeated until the input sentence is reduced to “S”, the start symbol of the grammar, as in the shift-reduce sketch below.
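A minimal shift-reduce sketch for the same toy grammar (added for illustration): it greedily reduces whenever the top of the stack matches a rule’s right-hand side, which suffices for this grammar but would need lookahead for harder ones.

    # Shift-reduce parsing: shift words onto a stack, and reduce whenever
    # the top of the stack matches the right-hand side of a rule.
    RULES = [
        ("S",  ["NP", "VP"]),
        ("NP", ["Det", "N"]),
        ("VP", ["V", "NP"]),
        ("Det", ["the"]), ("Det", ["a"]),
        ("N", ["dog"]), ("N", ["cat"]), ("N", ["boy"]), ("N", ["girl"]),
        ("V", ["chased"]), ("V", ["hugged"]),
    ]

    def shift_reduce(tokens):
        stack, buffer = [], list(tokens)
        while True:
            for lhs, rhs in RULES:               # try a reduce step first
                if stack[-len(rhs):] == rhs:
                    del stack[-len(rhs):]
                    stack.append(lhs)
                    break
            else:
                if not buffer:                   # nothing to shift or reduce
                    break
                stack.append(buffer.pop(0))      # shift the next word
        return stack == ["S"]

    print(shift_reduce("the dog chased the cat".split()))  # True

Tracing the run reproduces the reductions listed above: “the dog” becomes Det N and then NP, “chased” becomes V, “the cat” becomes another NP, V NP reduces to VP, and finally NP VP reduces to S.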
Probability of a parse tree
● Use of PCFG
– A sentence can be parsed in more than one way.
– We can have more than one parse tree for the sentence as per the CFG, due to ambiguity.
Probability of a parse tree
● Given a parse tree t,
– built with the production rules α1 → β1, α2 → β2, … , αn → βn from R (i.e., αi → βi ∈ R), we can find the probability of the tree t under the PCFG as follows:
P(t) = P(α1 → β1) × P(α2 → β2) × … × P(αn → βn) = ∏ᵢ P(αi → βi)
● As per the equation, the probability P(t) of a parse tree is the product of the probabilities of the production rules used in the tree t.
Probability of a parse tree
● Which is the most probable tree?
– The probability of the parse tree t1 is greater than the probability of parse tree t2. Hence, t1 is the more probable of the two parses.
Probability of a sentence
● The probability of a sentence is the sum of the probabilities of all parse trees that can be derived for the sentence under the PCFG:
P(s) = Σₜ P(t), where t ranges over all parse trees of s
● Probability of the sentence “astronomers saw the stars with ears”
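A sketch of this computation with NLTK, reusing the hypothetical probabilities from the earlier PCFG sketch (assumed for illustration); the doubled PP makes the sentence genuinely ambiguous, so the sum runs over two parse trees:

    import math
    import nltk

    # Same HYPOTHETICAL PCFG as in the earlier sketch (probabilities assumed).
    pcfg = nltk.PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> Det Noun [0.8] | NP PP [0.2]
    PP -> Pre NP [1.0]
    VP -> Verb NP [1.0]
    Det -> 'a' [0.5] | 'the' [0.5]
    Noun -> 'cake' [0.4] | 'child' [0.4] | 'fork' [0.2]
    Pre -> 'with' [1.0]
    Verb -> 'ate' [1.0]
    """)

    # Look up each rule's probability by its (LHS, RHS) pair.
    rule_prob = {(p.lhs(), p.rhs()): p.prob() for p in pcfg.productions()}

    # Ambiguous under this grammar: the second PP can attach to 'cake'
    # or to 'fork', so the chart parser yields two trees.
    sent = "the child ate a cake with a fork with a fork".split()

    total = 0.0
    for tree in nltk.ChartParser(pcfg).parse(sent):
        p_t = math.prod(rule_prob[(p.lhs(), p.rhs())]
                        for p in tree.productions())
        total += p_t                   # P(sentence) = sum over its parses

    print(total)
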
Ambiguity
● Ambiguity is the most serious problem faced by syntactic parsers.
● The most common ambiguity is structural ambiguity.
Ambiguity
● In the sentence “I shot an elephant in my pajamas”, the phrase in my pajamas can be part of the NP headed by elephant or a part of the VP headed by shot.
● Two parse trees for this ambiguous sentence: the parse on the left corresponds to the humorous reading in which the elephant is in the pajamas; the parse on the right corresponds to the reading in which Captain Spaulding did the shooting in his pajamas.
Self Study
● Chomsky Normal Form (CNF)
● Cocke–Younger–Kasami (CYK) algorithm
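As a starting point for these self-study topics, here is a minimal CYK recognizer sketch (added for illustration, not from the slides). It assumes the grammar is in Chomsky Normal Form; the example grammar above already is, since every rule is either A → B C or A → ‘word’.

    from itertools import product

    # The example grammar from the slides, already in CNF.
    BINARY = {
        ("NP", "VP"): "S",
        ("Det", "Noun"): "NP",
        ("NP", "PP"): "NP",
        ("Pre", "NP"): "PP",
        ("Verb", "NP"): "VP",
    }
    LEXICAL = {
        "a": {"Det"}, "the": {"Det"},
        "cake": {"Noun"}, "child": {"Noun"}, "fork": {"Noun"},
        "with": {"Pre"}, "ate": {"Verb"},
    }

    def cyk(words):
        n = len(words)
        # table[i][j] = set of non-terminals deriving words[i..j]
        table = [[set() for _ in range(n)] for _ in range(n)]
        for i, w in enumerate(words):
            table[i][i] = set(LEXICAL.get(w, ()))
        for length in range(2, n + 1):            # span length
            for i in range(n - length + 1):       # span start
                j = i + length - 1                # span end
                for k in range(i, j):             # split point
                    for b, c in product(table[i][k], table[k + 1][j]):
                        if (b, c) in BINARY:
                            table[i][j].add(BINARY[(b, c)])
        return "S" in table[0][n - 1]

    print(cyk("the child ate a cake".split()))               # True
    print(cyk("the child ate a cake with a fork".split()))   # True

Filling the table bottom-up over ever-longer spans is what makes CYK a dynamic-programming, bottom-up parser; adding a probability to each cell turns the same table into the probabilistic CYK used for PCFG parsing.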
Treebank
● A corpus in which every sentence is annotated with a parse tree is called a treebank.
● Treebanks play an important role in parsing, as well as in linguistic investigations of syntactic phenomena.
Reference
● Chapter 17 - Speech and Language Processing (3rd Edition)
● Automatic Generation of Python Programs Using Context-Free Grammars: https://arxiv.org/pdf/2403.06503v1
Thank you