Formal Grammars and Parsing
Aims:
In this section we describe several types of formal grammars for natural language processing, parse
trees, and a number of parsing methods, including a bottom-up chart parser in some detail. Read the
subsections below and the recommended readings at the end of this unit.
What is a Grammar?
a formal description of the structure of a language. (Prescriptive grammar is something else: rules
for a high-status variant of a language, e.g. "don't split infinitives".)
What is a Parser?
an algorithm for analysing sentences given a grammar:
- it may give just a yes/no answer to the question Does this sentence conform to the given grammar?
Such a parser is termed an acceptor;
- it may also produce a structure description (a "parse tree") for correct sentences:
Grammar rules:
1. S -> NP VP
2. VP -> V NP
3. NP -> NAME
4. NP -> ART N

Lexical entries:
5. NAME -> John
6. V -> ate
7. ART -> the
8. N -> cat
Formally, a grammar consists of:
A, an alphabet of grammar symbols;
N, a set of non-terminal symbols;
T, a set of terminal symbols (and N U T = A);
P, a set of context-free productions, i.e. objects of the form X -> beta, where X is a member of
N, and beta is a string over the alphabet A;
S, a distinguished non-terminal called the start symbol (think of it as "sentence" in NLP
applications).
E.g.
P = { S -> NP VP, VP -> V NP, NP -> NAME, NP -> ART N, NAME -> John, V -> ate, ART -> the,
N -> cat }.
Notice how the productions were split into grammar rules and lexicon above. N, V, NAME and ART are
called pre-terminal or lexical symbols.
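The split into grammar rules and lexicon can be mirrored directly in code. A possible encoding in Python (the variable names are our own):

```python
# The example grammar, split into phrasal rules and a lexicon,
# represented as plain Python data structures (one possible encoding).

GRAMMAR_RULES = [
    ("S",  ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("NP", ("NAME",)),
    ("NP", ("ART", "N")),
]

# Pre-terminal (lexical) symbols map each word to its categories.
LEXICON = {
    "John": ["NAME"],
    "ate":  ["V"],
    "the":  ["ART"],
    "cat":  ["N"],
}

# Derived sets: non-terminals N, terminals T, and the alphabet A = N U T.
NONTERMINALS = ({lhs for lhs, _ in GRAMMAR_RULES}
                | {c for cats in LEXICON.values() for c in cats})
TERMINALS = set(LEXICON)
ALPHABET = NONTERMINALS | TERMINALS
```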
Types of grammars:
unrestricted grammars
context-sensitive grammars
context-free grammars
regular grammars.
Together these form the Chomsky hierarchy of grammars. The four types of grammar differ in the
type of rewriting rule alpha -> beta that is allowed:
unrestricted grammar. No restrictions on the form that the rules can take. Unrestricted grammars
are not widely used: their extreme power makes them difficult to use;
context-sensitive grammar, or transformational grammar. The length of the string alpha on the
left-hand side of any rule must be less than or equal to the length of the string beta on the right-
hand side of the rule. Equivalently, all the productions must be of the form lambda A
rho -> lambda alpha rho, where lambda and rho are arbitrary (possibly null) strings. lambda and
rho are thought of as the left and right context in which the non-terminal symbol A can be
rewritten as the non-null symbol-string alpha; hence the term context-sensitive grammar. Context-
sensitive production rules can be used, for example, for transforming an active sentence into the
corresponding passive sentence.
context-free grammar, or phrase structure grammar. All rules must be of the form A -> alpha,
where A is a nonterminal symbol and alpha is an arbitrary string of symbols.
regular grammar, or right-linear grammar. All rules take one of two forms: either A -> t or
A -> t N, where A and N are non-terminal symbols and t is a terminal symbol (a member of the
vocabulary). Regular grammar rules are not powerful enough to conveniently describe natural
languages (or even programming languages). They can sometimes be used to describe portions of
languages, and have the advantage that they lead to fast parsing.
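To see why regular rules lead to fast parsing: a right-linear grammar can be recognised in a single left-to-right scan, keeping only the current non-terminal as state. A sketch in Python, using our own toy grammar S -> a S | b (not an example from the text):

```python
# Recognising the language of the right-linear grammar
#   S -> a S | b
# (any number of a's followed by a single b) in one pass.
# The only parsing state needed is the current non-terminal.

RULES = {
    # (nonterminal, terminal) -> next nonterminal, or None when the
    # rule has the form A -> t (the derivation terminates here).
    ("S", "a"): "S",
    ("S", "b"): None,
}

def accepts(string):
    state = "S"                      # start symbol
    for ch in string:
        if state is None or (state, ch) not in RULES:
            return False             # no applicable rule
        state = RULES[(state, ch)]
    return state is None             # must end on a terminating rule
```

Each input symbol is examined exactly once, so recognition time is linear in the length of the string: this is the finite-automaton behaviour that makes regular grammars fast.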
Since the restrictions which define the grammar types apply to the rules, it makes sense to talk of
unrestricted, context-sensitive, context-free, and regular rules.
A grammar generates sentences by rewriting. Starting from the start symbol S:
choose any rule whose LHS occurs in the current string (in a CFG, the LHS must be a non-terminal
symbol);
replace that occurrence of the LHS with the RHS of the rule, producing a new current string.
Repeat until there are no non-terminals remaining in the current string. The current string is then a
sentence in the language generated by the grammar. (Before this, it is termed a sentential form.) E.g.:
Current string         Symbol rewritten
S
=> NP VP               S
=> NAME VP             NP
=> John VP             NAME
=> John V NP           VP
=> John ate NP         V
=> John ate ART N      NP
=> John ate the N      ART
=> John ate the cat    N
Parsing might be the reverse of this process (doing the steps shown above in reverse would constitute a
bottom-up right-to-left parse of John ate the cat.)
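The derivation above can be replayed mechanically. A sketch in Python: the rule sequence below is the one used in the table, and each step rewrites the leftmost occurrence of the rule's LHS:

```python
# Replaying the leftmost derivation of "John ate the cat".
# Each step replaces the leftmost occurrence of the rule's LHS
# with the rule's RHS, exactly as in the table above.

STEPS = [
    ("S",    ["NP", "VP"]),
    ("NP",   ["NAME"]),
    ("NAME", ["John"]),
    ("VP",   ["V", "NP"]),
    ("V",    ["ate"]),
    ("NP",   ["ART", "N"]),
    ("ART",  ["the"]),
    ("N",    ["cat"]),
]

def derive(start, steps):
    string = [start]
    for lhs, rhs in steps:
        i = string.index(lhs)        # leftmost occurrence of the LHS
        string[i:i + 1] = rhs        # splice in the RHS
    return string

# derive("S", STEPS) yields the terminal string John ate the cat.
```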
S -> NP VP
NP -> ART N | NAME
PP -> PREP NP
VP -> V | V NP | V NP PP | V PP
Bottom-up parsing
Chart Parsing
(Section 3.4 of Allen)
The chart is a record of all the substructures (like past the barn) that have ever been built during the
parse. A chart is sometimes also called a well-formed substring table. Actual charts get complex rapidly.
An attempt to parse an utterance as a sentence may fail, but all is not lost: partial analyses, such as an
analysis of plums as an NP, remain on the chart. Successful parsing of the entire utterance as any kind
of structure can be useful.
The algorithm constructs (phrasal or lexical) constituents of a sentence. We shall use the sentence the
green fly flies as an example in describing a NL-oriented parser similar to Earley's algorithm. The
sentence is notionally annotated with positions: our sentence becomes 0the1green2fly3flies4.
In terms of this notation, the parsing process succeeds if an S (sentence) constituent is found covering
positions 0 to 4.
Points (1) to (8) below do not completely specify the order in which parsing steps are carried out: one
reasonable order is to scan a word (as in (2)) and then perform all possible parsing steps as specified in
(3) - (6) before scanning another word. Parsing is completed when the last word has been read and all
possible subsequent parsing steps have been performed.
Parser operations:
(0) The algorithm operates on two data structures: the active chart - a collection of active arcs (see (3)
below) and the constituents (see (2) and (5)). Both are initially empty.
(1) The grammar is considered to include lexical insertion rules: for example, if fly is a word in the
lexicon/vocabulary being used, and if its lexical entry includes the fact that fly may be a N or a V, then
rules of the form N -> fly and V -> fly are considered to be part of the grammar.
(2) As a word (like fly) is scanned, constituents corresponding to its lexical categories are created: in
our example sentence, N1: N -> fly FROM 2 TO 3 and V1: V -> fly FROM 2 TO 3.
(3) If the grammar contains a rule like NP -> ART ADJ N, and a constituent like ART1: ART -> the
FROM m TO n has been found, then an active arc ARC1: NP -> ART1 * ADJ N FROM m TO n
is added to the active chart. (In our example sentence, m would be 0 and n would be 1.) The "*" in an
active arc marks the boundary between found constituents and constituents not (yet) found.
(4) Advancing the "*": If the active chart has an active arc like:
ARC1: NP -> ART1 * ADJ N FROM 0 TO 1
and there is a constituent in the chart of type ADJ (i.e. the first item after the *), say
ADJ1: ADJ -> green FROM 1 TO 2
such that the FROM position in the constituent matches the TO position in the active arc, then the "*"
can be advanced, creating a new active arc:
ARC2: NP -> ART1 ADJ1 * N FROM 0 TO 2
(5) When the "*" advances past the last symbol of an active arc (i.e. every constituent on the RHS has
been found), a new phrasal constituent is created covering the arc's whole span: continuing the
example, once N1: N -> fly FROM 2 TO 3 is found, the arc NP -> ART1 ADJ1 N1 * FROM 0 TO 3
is complete, and the constituent NP1: NP -> ART1 ADJ1 N1 FROM 0 TO 3 is created.
(6) Both lexical and phrasal constituents can be used in steps 3 and 4: e.g. if the grammar contains a rule
S -> NP VP, then as soon as the constituent NP1 discussed in step 5 is created, it will be possible to
make a new active arc S -> NP1 * VP FROM 0 TO 3.
(7) When subsequent constituents are created, they would have names like NP2, NP3, ..., ADJ2, ADJ3, ...
and so on.
(8) The aim of parsing is to get phrasal constituents (normally of type S) whose FROM is 0 and whose
TO is the length of the sentence. There may be several such constituents.
1. S -> NP VP
2. NP -> ART ADJ N
3. NP -> ART N
4. NP -> ADJ N
5. VP -> AUX V NP
6. VP -> V NP
Lexicon:
the... ART
large... ADJ
can... AUX, N, V
hold... N, V
water... N, V
Steps in Parsing:
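To make operations (0)-(8) concrete, here is a minimal bottom-up chart parser in Python for the grammar and lexicon above. The sentence the large can can hold the water is our assumption about the intended example (the text does not give it), and all identifiers are our own. An active arc is stored as (LHS, found, needed, FROM, TO), with the "*" sitting between found and needed:

```python
# A sketch of the chart parser described in points (0)-(8).
# An active arc (lhs, found, needed, frm, to) stands for the dotted
# rule lhs -> found * needed, spanning positions frm..to.

GRAMMAR = [
    ("S",  ["NP", "VP"]),
    ("NP", ["ART", "ADJ", "N"]),
    ("NP", ["ART", "N"]),
    ("NP", ["ADJ", "N"]),
    ("VP", ["AUX", "V", "NP"]),
    ("VP", ["V", "NP"]),
]

LEXICON = {
    "the":   ["ART"],
    "large": ["ADJ"],
    "can":   ["AUX", "N", "V"],
    "hold":  ["N", "V"],
    "water": ["N", "V"],
}

def parse(words):
    constituents = []          # completed constituents: (category, FROM, TO)
    arcs = []                  # the active chart
    agenda = []

    def add_constituent(c):
        if c not in constituents:
            constituents.append(c)
            agenda.append(c)

    for i, word in enumerate(words):
        # (2) scan: one lexical constituent per category of the word
        for cat in LEXICON[word]:
            add_constituent((cat, i, i + 1))
        # perform all possible steps (3)-(6) before the next word
        while agenda:
            cat, frm, to = agenda.pop()
            # (4) advance the "*" over arcs whose next needed symbol
            # matches this constituent and whose TO equals its FROM
            for lhs, found, needed, a_frm, a_to in list(arcs):
                if needed and needed[0] == cat and a_to == frm:
                    if needed[1:]:
                        arc = (lhs, found + [cat], needed[1:], a_frm, to)
                        if arc not in arcs:
                            arcs.append(arc)
                    else:
                        # (5) arc complete: a new phrasal constituent
                        add_constituent((lhs, a_frm, to))
            # (3) start a new active arc for each rule beginning with cat
            for lhs, rhs in GRAMMAR:
                if rhs[0] == cat:
                    if len(rhs) == 1:      # unit rule: immediately complete
                        add_constituent((lhs, frm, to))
                    else:
                        arc = (lhs, [cat], rhs[1:], frm, to)
                        if arc not in arcs:
                            arcs.append(arc)
    return constituents

chart = parse("the large can can hold the water".split())
# (8) success: an S constituent FROM 0 TO the length of the sentence
```

Among the many constituents on the final chart are NP (the large can) from 0 to 3, VP (can hold the water) from 3 to 7, and an S spanning the whole sentence from 0 to 7.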
The output of a parse may also be represented as a structure recording grammatical roles, e.g. for
Jack found a dollar:
s(subj(np(name(jack))),
  mainv(find),
  tense(past),
  obj(np(art(a), head(dollar))))
PP attachment
The boy saw the man on the hill with the telescope
The two visual interpretations correspond to two different parses, coming from different grammar rules
(VP -> V NP PP and NP -> NP PP): in one, the PP with the telescope attaches to the VP (the
telescope is the instrument of seeing); in the other, it attaches to the NP (the man on the hill has the
telescope).
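The two parses can be written down as nested (category, children...) tuples. The tree shapes below follow the two rules VP -> V NP PP and NP -> NP PP, though the internal bracketing of the noun phrases is our own simplification:

```python
# The two parses of the ambiguous sentence, as nested tuples of the
# form (category, child, child, ...), where a leaf is a bare word.

# Attachment to the VP: the seeing was done with the telescope.
vp_attach = ("S",
    ("NP", "The", "boy"),
    ("VP", ("V", "saw"),
           ("NP", ("NP", "the", "man"),
                  ("PP", "on", "the", "hill")),
           ("PP", "with", "the", "telescope")))

# Attachment to the NP: the man on the hill has the telescope.
np_attach = ("S",
    ("NP", "The", "boy"),
    ("VP", ("V", "saw"),
           ("NP", ("NP", ("NP", "the", "man"),
                         ("PP", "on", "the", "hill")),
                  ("PP", "with", "the", "telescope"))))

def leaves(tree):
    """The words at the fringe of a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:          # tree[0] is the category label
        words += leaves(child)
    return words

# Both trees cover exactly the same words; only the structure differs.
```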
- it is reasonable to ask for syntactically correct programs, but unrealistic to ask for syntactically
correct NL. Written NL material is sometimes correct, but spoken utterances are rarely
grammatical. NL systems must be syntactically and semantically robust;
- some approaches have sought to be semantics-driven, to avoid the problem of how to deal with
syntactically ill-formed text. However, some syntax is essential - else how do we distinguish
between Cyril loves Audrey and Audrey loves Cyril?
Summary: Grammars and Parsing
There are many approaches to parsing and many grammatical formalisms. Some problems in deciding
the structure of a sentence (such as PP attachment) turn out not to be decidable at the syntactic level
alone. We have concentrated on a bottom-up chart parser based on a context-free grammar.