Unit 3
Grammar in NLP is a set of rules for constructing sentences in a language; it is used to understand and
analyze the structure of sentences in text data.
This includes identifying parts of speech such as nouns, verbs, and adjectives; determining the
subject and predicate of a sentence; and identifying the relationships between words and phrases.
Grammar is defined as the rules for forming well-structured sentences. Grammar also plays an
essential role in describing the syntactic structure of well-formed programs, just as it denotes
the syntactic rules used for conversation in natural languages.
Syntax
Each natural language has an underlying structure, usually referred to as its syntax. The
fundamental idea of syntax is that words group together to form constituents: groups of
words or phrases that behave as a single unit. These constituents can combine to form larger
constituents and, eventually, sentences.
Syntax describes the regularity and productivity of a language by making the structure of
sentences explicit. The goal of syntactic analysis, or parsing, is to determine whether a
sentence is well formed and to provide its syntactic structure.
Syntax also refers to the way words are arranged together. Let us see some basic ideas related to
syntax:
Constituency: Groups of words may behave as a single unit or phrase, called a constituent;
for example, a noun phrase.
Grammatical relations: These are the formalization of ideas from traditional grammar.
Examples include - subjects and objects.
Subcategorization and dependency relations: These are the relations between words
and phrases; for example, a verb followed by an infinitive verb.
Regular languages and parts of speech: These refer to the way words are arranged
together, but they cannot easily capture notions such as constituency, grammatical
relations, or subcategorization and dependency relations.
Syntactic categories and their common denotations in NLP: np - noun phrase, vp -
verb phrase, s - sentence, det - determiner (article), n - noun, tv - transitive verb (takes an
object), iv - intransitive verb, prep - preposition, pp - prepositional phrase, adj - adjective
There are some differences between these two parsing techniques, which are given below:

Top-Down Parsing
It is a parsing strategy that first looks at the highest level of the parse tree and works down
the parse tree by using the rules of grammar.
In this technique we start parsing from the top (the start symbol of the parse tree) and work
down to the leaf nodes, in a top-down manner.
It uses leftmost derivation.
The main decision is which production rule to use in order to construct the string.

Bottom-Up Parsing
It is a parsing strategy that first looks at the lowest level of the parse tree and works up
the parse tree by using the rules of grammar.
In this technique we start parsing from the bottom (the leaf nodes of the parse tree) and work
up to the start symbol, in a bottom-up manner.
It uses rightmost derivation (traced in reverse).
The main decision is when to use a production rule to reduce the string back to the start symbol.
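To make the bottom-up strategy concrete, here is a minimal shift-reduce recognizer in Python, using a fragment of the grammar from Figure A.3 below. The greedy reduce-first control loop and the function names are illustrative assumptions, not a production parser:

```python
# Grammar rules written as (LHS, RHS-tuple); lexical entries map words to tags.
RULES = [
    ("S",  ("NP", "VP")),
    ("NP", ("DT", "NN")),
    ("VP", ("VBZ", "NN")),
]
LEXICON = {"the": "DT", "dog": "NN", "meat": "NN", "likes": "VBZ"}

def shift_reduce(tokens):
    """Greedy bottom-up (shift-reduce) recognizer: shift a word, then
    reduce the top of the stack whenever a rule's right-hand side matches."""
    stack = []
    buffer = list(tokens)
    while buffer or len(stack) > 1:
        # Try to reduce first: does any rule's RHS match the stack top?
        for lhs, rhs in RULES:
            n = len(rhs)
            if tuple(stack[-n:]) == rhs:
                stack[-n:] = [lhs]
                break
        else:
            if not buffer:          # nothing to shift and nothing to reduce
                return False
            word = buffer.pop(0)    # shift: tag the next word and push it
            stack.append(LEXICON[word])
    return stack == ["S"]

print(shift_reduce("the dog likes meat".split()))
```

Note that the decision here is indeed "when to reduce": the loop prefers reductions over shifts, which happens to suffice for this tiny grammar but would need backtracking or lookahead in general.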
Transition Networks:
A transition network is a finite state automaton that is used to represent a part of a grammar. A
transition network parser uses a number of these transition networks to represent its entire
grammar. Each network represents one non-terminal symbol in the grammar.
A transition network is a method of parsing which represents the grammar as a set of finite
state machines (FSMs).
1. Augmented Transition Networks (ATNs): The ATN was developed by William Woods in 1970.
The ATN method of parsing sentences integrates many concepts from Chomsky’s (1957) formal
grammar theory with a matching process resembling a dynamic semantic network.
2. Recursive Transition Networks (RTNs): An RTN is a transition network that permits arc
labels to refer to other networks (and those networks may, in turn, refer back to the referring
network), rather than permitting only the word categories used previously.
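As an illustrative sketch of this idea (the network layout, lexicon, and function names are assumptions, not from any specific system), each network can be encoded as a small FSM whose arcs are labeled either with a lexical category or with the name of another network, which is traversed recursively, as in an RTN:

```python
# Each network maps a state to its outgoing arcs; an arc label is either
# a lexical category (matched against the next word's tag) or the name of
# another network, which is traversed recursively (the RTN idea).
NETWORKS = {
    "S":  {0: [("NP", 1)], 1: [("VP", 2)], 2: []},
    "NP": {0: [("DT", 1)], 1: [("NN", 2)], 2: []},
    "VP": {0: [("VBZ", 1)], 1: [("NN", 2)], 2: []},
}
FINAL = {"S": 2, "NP": 2, "VP": 2}
LEXICON = {"the": "DT", "dog": "NN", "meat": "NN", "likes": "VBZ"}

def traverse(net, state, tags, pos):
    """Return the input position reached after traversing `net` from
    `state`, or None if the network cannot be completed."""
    if state == FINAL[net]:
        return pos
    for label, nxt in NETWORKS[net][state]:
        if label in NETWORKS:                        # arc names a sub-network
            end = traverse(label, 0, tags, pos)
            if end is not None:
                done = traverse(net, nxt, tags, end)
                if done is not None:
                    return done
        elif pos < len(tags) and tags[pos] == label:  # lexical arc
            done = traverse(net, nxt, tags, pos + 1)
            if done is not None:
                return done
    return None

def accepts(sentence):
    tags = [LEXICON[w] for w in sentence.split()]
    return traverse("S", 0, tags, 0) == len(tags)

print(accepts("the dog likes meat"))
```

The recursion on sub-network names is what distinguishes an RTN from a plain FSM: the S network never lists DT or NN arcs itself; it delegates to the NP and VP networks.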
To illustrate a top down chart parse, we will assume the CFG shown in Figure A.3.
Figure A.3. A small CFG for parsing “the dog likes meat”.
S → NP VBD NN → dog | meat
S → NP VP VBD → barked
NP → DT NN VBZ → likes
VP → VBZ NN DT → the
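Before walking through the chart trace, here is a hedged sketch of how the Figure A.3 grammar could be run top-down: the recursive-descent recognizer below always expands the leftmost symbol first (a leftmost derivation) and backtracks over alternative rules. The encoding of the grammar as a Python dict is illustrative:

```python
# The CFG of Figure A.3, with lexical rules folded in as single-terminal RHSs.
GRAMMAR = {
    "S":   [["NP", "VBD"], ["NP", "VP"]],
    "NP":  [["DT", "NN"]],
    "VP":  [["VBZ", "NN"]],
    "DT":  [["the"]],
    "NN":  [["dog"], ["meat"]],
    "VBD": [["barked"]],
    "VBZ": [["likes"]],
}

def parse(symbols, words):
    """Top-down recognizer with backtracking: expand the leftmost symbol
    using each of its rules in turn (leftmost derivation)."""
    if not symbols:
        return not words          # success only if the input is also exhausted
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:           # non-terminal: try each expansion in order
        return any(parse(rhs + rest, words) for rhs in GRAMMAR[head])
    # terminal: must match the next input word
    return bool(words) and words[0] == head and parse(rest, words[1:])

print(parse(["S"], "the dog likes meat".split()))
```

On "the dog likes meat" the first S rule (S → NP VBD) fails at "likes" and the parser backtracks to S → NP VP, which is exactly the re-prediction the chart trace below records more efficiently.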
Figure A.4 shows a trace of a top-down chart parse of the sentence “The dog likes meat.”,
showing the edges created in the top-down chart parser implemented in NLTK. The parse begins
with top-down predictions, based on the grammar, and then begins to process the actual input,
which is shown in the third row. As each word is read, there is a top-down prediction followed
by an application of the fundamental rule. After an edge is completed (such as the first NP) then
new predictions are added (e.g., for an upcoming VP).
Figure A.4. Trace of a top-down chart parse of the sentence “The dog likes meat.”
[0:0] S → * NP VBD
[0:0] S → * NP VP
[0:0] NP → * DT NN
[0:0] DT → * the
[0:1] DT → the *            (read "the"; apply the fundamental rule)
[0:1] NP → DT * NN
[1:1] NN → * dog            (predict "dog")
[1:2] NN → dog *            (apply the fundamental rule)
[0:2] NP → DT NN *          (the NP is complete)
[0:2] S → NP * VBD
[0:2] S → NP * VP
[2:2] VP → * VBZ NN         (top-down prediction for a VP, using the second S rule)
[2:2] VBZ → * likes         (predict "likes")
[2:3] VBZ → likes *         (apply the fundamental rule to create VBZ → likes *)
[2:3] VP → VBZ * NN         (and again, to add VP → VBZ * NN)
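The behavior traced above can be approximated with a small Earley-style top-down chart recognizer. The sketch below is simplified (it only recognizes, and its data structures are assumptions rather than NLTK's actual implementation), but it performs the same three steps as the trace: top-down prediction, scanning a word, and the fundamental rule:

```python
from collections import namedtuple

# A dotted edge [start:end] lhs -> done * todo, as in the Figure A.4 trace.
Edge = namedtuple("Edge", "start end lhs done todo")

GRAMMAR = {
    "S":  [("NP", "VBD"), ("NP", "VP")],
    "NP": [("DT", "NN")],
    "VP": [("VBZ", "NN")],
}
LEXICON = {"the": "DT", "dog": "NN", "meat": "NN", "likes": "VBZ", "barked": "VBD"}

def chart_parse(words):
    tags = [LEXICON[w] for w in words]
    chart = set()
    agenda = [Edge(0, 0, "S", (), rhs) for rhs in GRAMMAR["S"]]  # top-down start
    while agenda:
        edge = agenda.pop()
        if edge in chart:
            continue
        chart.add(edge)
        if edge.todo:
            nxt = edge.todo[0]
            if nxt in GRAMMAR:
                # Top-down prediction: expand the category after the dot.
                for rhs in GRAMMAR[nxt]:
                    agenda.append(Edge(edge.end, edge.end, nxt, (), rhs))
                # Fundamental rule against already-complete edges.
                for e in list(chart):
                    if not e.todo and e.lhs == nxt and e.start == edge.end:
                        agenda.append(Edge(edge.start, e.end, edge.lhs,
                                           edge.done + (nxt,), edge.todo[1:]))
            elif edge.end < len(tags) and tags[edge.end] == nxt:
                # Scan: the next input word matches the category after the dot.
                agenda.append(Edge(edge.start, edge.end + 1, edge.lhs,
                                   edge.done + (nxt,), edge.todo[1:]))
        else:
            # Complete edge: apply the fundamental rule to waiting edges.
            for e in list(chart):
                if e.todo and e.todo[0] == edge.lhs and e.end == edge.start:
                    agenda.append(Edge(e.start, edge.end, e.lhs,
                                       e.done + (edge.lhs,), e.todo[1:]))
    return any(e.lhs == "S" and not e.todo and e.start == 0 and e.end == len(tags)
               for e in chart)

print(chart_parse("the dog likes meat".split()))
```

Unlike the backtracking recursive-descent parser, the chart stores every completed constituent (such as the NP over [0:2]) so it can be reused by both S rules without being re-parsed.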
In natural languages there are often agreement restrictions between words and phrases. For
example, the NP "a men" is not correct English because the article a indicates a single object
while the noun "men" indicates a plural object; the noun phrase does not satisfy the number
agreement restriction of English.
There are many other forms of agreement, including subject-verb agreement, gender agreement
for pronouns, restrictions between the head of a phrase and the form of its complement, and so
on. To handle such phenomena conveniently, the grammatical formalism is extended to allow
constituents to have features. For example, we might define a feature NUMBER that may take a
value of either s (for singular) or p (for plural), and we then might write an augmented CFG rule
such as

NP (NUMBER ?n) → ART (NUMBER ?n) N (NUMBER ?n)
This rule says that a legal noun phrase consists of an article followed by a noun, but only when
the number feature of the first word agrees with the number feature of the second. This one rule
is equivalent to two CFG rules that would use different non-terminal symbols for encoding singular
and plural forms of all noun phrases, such as

NP-SING → ART-SING N-SING
NP-PLURAL → ART-PLURAL N-PLURAL
While the two approaches seem similar in ease-of-use in this one example, consider that all rules
in the grammar that use an NP on the right-hand side would have to be duplicated into singular
and plural versions, roughly doubling the size of the grammar; the feature mechanism avoids this
blow-up.
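A minimal sketch of feature checking, assuming a tiny illustrative lexicon in which each word carries a NUMBER feature ("s" or "p"; treating "the" as unmarked for number is an assumption made here for illustration):

```python
# Each word carries a NUMBER feature: "s" (singular) or "p" (plural).
LEXICON = {
    "a":   ("ART", "s"), "the": ("ART", None),   # "the" is unmarked for number
    "man": ("N", "s"),   "men": ("N", "p"),
}

def noun_phrase_ok(article, noun):
    """Augmented NP -> ART N rule: the NUMBER features must agree,
    i.e. be equal or have one of them unspecified."""
    _, art_num = LEXICON[article]
    _, n_num = LEXICON[noun]
    return art_num is None or n_num is None or art_num == n_num

print(noun_phrase_ok("a", "man"))   # the features agree
print(noun_phrase_ok("a", "men"))   # number agreement fails, as in "a men"
```

The "agree or unspecified" test is a simple form of unification: one augmented rule covers singular, plural, and unmarked articles without duplicating the grammar.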
Lexical or Morphological Analysis is the initial step in NLP. It entails recognizing and analyzing
word structures. The collection of words and phrases in a language is referred to as the lexicon.
Lexical analysis is the process of breaking a text down into paragraphs, sentences, and words.
In this phase, the source text is scanned as a stream of characters and converted into
meaningful lexemes.
It refers to the study of text at the level of individual words. It searches for morphemes,
which are the smallest meaningful units of a word. Lexical analysis identifies the relationships
between these morphemes and transforms each word into its root form. A lexical analyzer also
assigns the word's probable parts of speech (POS).
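A toy illustration of these steps (the suffix list, lexicon, and tag set below are assumptions for demonstration, not a real morphological analyzer): the function tokenizes text, strips common suffixes to recover a root morpheme, and reports the root's possible POS tags:

```python
import re

# Toy lexicon of root forms with their possible parts of speech (assumed).
ROOTS = {"the": {"DT"}, "dog": {"NN"}, "like": {"NN", "VB"}, "bark": {"NN", "VB"}}
# (suffix, replacement) pairs tried in order, e.g. "liked" -> "like".
SUFFIXES = [("ies", "y"), ("ed", ""), ("ed", "e"), ("ing", ""), ("ing", "e"), ("s", "")]

def analyze(text):
    """Split text into word tokens, reduce each to a known root by
    stripping a common suffix, and report the root's possible POS tags."""
    out = []
    for token in re.findall(r"[a-z]+", text.lower()):
        root, pos = token, ROOTS.get(token)
        if pos is None:
            for suffix, repl in SUFFIXES:
                candidate = token[:-len(suffix)] + repl if token.endswith(suffix) else None
                if candidate in ROOTS:
                    root, pos = candidate, ROOTS[candidate]
                    break
        out.append((token, root, sorted(pos) if pos else ["UNK"]))
    return out

print(analyze("The dogs liked barking."))
```

Each result triple is (surface token, root morpheme, candidate POS tags); the final POS choice is left to a later tagging stage, since words like "like" are genuinely ambiguous at the lexical level.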