
Unit – II

Grammars and Parsing


2.1 Grammars and Parsing
2.2 Top-Down and Bottom-Up Parsers
2.3 Transition Network Grammars
2.4 Feature Systems and Augmented Grammars
2.5 Morphological Analysis and the Lexicon
2.6 Parsing with Features
2.7 Augmented Transition Networks, Bayes Rule
2.8 Shannon game
2.9 Entropy and Cross Entropy

2.1 Grammars and Parsing


The process of computing the syntactic structure of a sentence involves two critical
components:
1. Grammar:
 It is a set of rules that defines the structure of sentences in a language. It
specifies how words and phrases can be combined to form valid
sentences.
 Grammars can take various forms, such as:
 Context-Free Grammars (CFGs): Used extensively in natural
language processing and programming languages.
 Dependency Grammars: Focus on relationships between words.
 Transformational Grammars: Associated with linguistic theories
(e.g., Chomsky’s theories).
 The grammar outlines the rules for how words and phrases can be
combined to form valid sentences.
2. Parsing Technique:
 Parsing is the process of analyzing a sequence of words (or tokens) to
determine if it conforms to a given grammar. It checks whether the
structure of the input follows the rules defined by the grammar.
 Popular parsing techniques include:
 Top-Down Parsing: Begins with the start symbol of the grammar
and attempts to derive the sentence.
 Bottom-Up Parsing: Starts with the input sentence and attempts
to construct the start symbol.
 Chart Parsing: Uses dynamic programming to efficiently parse
sentences.
 Dependency Parsing: Focuses on identifying grammatical
relationships between words in a sentence.

2.1.1 Grammars and Sentence Structure


 The most common way of representing how a sentence is broken into its
major subparts, and how those subparts are broken up in turn, is as a tree.

 The tree representation for the sentence John ate the cat is shown in the figure below.

Figure: A tree representation of John ate the cat

 The sentence (S) consists of an initial noun phrase (NP) and a verb phrase
(VP). The initial noun phrase is made of the simple NAME John.

 The verb phrase is composed of a verb (V) ate and an NP, which consists of
an article (ART) the and a common noun (N) cat. In list notation this same
structure could be represented as
(S (NP (NAME John))
   (VP (V ate)
       (NP (ART the)
           (N cat))))
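The same structure can be written down directly in a programming language. The sketch below is an illustration only (the tuple encoding and the leaves function are assumptions of this write-up, not part of the original notes); it stores the tree as nested Python tuples and recovers the words at the leaves:

# A minimal sketch: the parse tree for "John ate the cat" written as nested
# tuples of the form (label, child, child, ...).
tree = ("S",
        ("NP", ("NAME", "John")),
        ("VP", ("V", "ate"),
               ("NP", ("ART", "the"),
                      ("N", "cat"))))

def leaves(node):
    """Collect the words (leaf strings) of a tree in left-to-right order."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]              # pre-terminal node, e.g. ("N", "cat")
    return [word for child in children for word in leaves(child)]

print(leaves(tree))                       # ['John', 'ate', 'the', 'cat']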

 Since trees are used so frequently in this context, it is necessary to understand the basic terminology.

 Trees are a special form of graph, which are structures consisting of labelled
nodes (for example, the nodes are labelled S, NP and so on in Figure) connected
by links.

 They are called trees because they look like upside-down trees, and much of the
terminology is derived from this analogy with actual trees.

 The node at the top is called the root of the tree, while the nodes at the bottom
are called the leaves. A link is a connection from a parent node to a child node.

 The node labelled S in Figure is the parent node of the nodes labelled NP and
VP, and the node labelled NP is in turn the parent node of the node labelled
NAME.

 While every child node has a unique parent, a parent may point to many child
nodes. An ancestor of a node N is defined as N's parent, or the parent of its
parent, and so on.

 A node is dominated by its ancestor nodes. The root node dominates all other
nodes in the tree.
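To make this terminology concrete, the following minimal Python sketch (the class and method names are assumptions for illustration, not from the notes) represents a tree with parent links and checks the ancestor and dominance relations:

# Every node knows its label, its children, and its parent, so "ancestor"
# and "dominates" can be checked directly.
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self           # every child has exactly one parent

    def ancestors(self):
        """Return the parent, the parent's parent, and so on up to the root."""
        node, result = self.parent, []
        while node is not None:
            result.append(node)
            node = node.parent
        return result

    def dominates(self, other):
        """A node dominates another node if it is one of that node's ancestors."""
        return self in other.ancestors()

name = Node("NAME")
np, vp = Node("NP", [name]), Node("VP")
s = Node("S", [np, vp])
print(s.dominates(name))                  # True: the root dominates all other nodes
print(np.dominates(vp))                   # False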
 To construct a tree structure for a sentence, we must know what structures are
legal for English.

 A set of rewrite rules describes what tree structures are allowable. These rules
say that a certain symbol may be expanded in the tree by a sequence of other
symbols.

 A set of rules that would allow the tree structure shown above is given as the following grammar:
1. S → NP VP
2. VP → V NP
3. NP → NAME
4. NP → ART N
5. NAME → John
6. V → ate
7. ART → the
8. N → cat
 Rule 1 says that an S may consist of an NP followed by a VP.
 Rule 2 says that a VP may consist of a V followed by an NP.
 Rules 3 and 4 say that an NP may consist of a NAME or may consist of an ART
followed by an N.
 Rules 5-8 define possible words for the categories. Grammars consisting entirely of
rules with a single symbol on the left-hand side, called the mother, are called
context-free grammars (CFGs).
 CFGs are a very important class of grammars, because they can describe a wide
range of the syntactic structures found in natural languages.
 Grammars have a special symbol called the start symbol. The start symbol will
always be S.
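As an illustration (the dictionary encoding below is an assumption of this write-up, not part of the original notes), such a grammar can be stored directly as a Python dictionary that maps each mother symbol to its possible right-hand sides:

# The eight rules of the toy grammar as a dict from mother symbol to the
# list of alternative right-hand sides.
GRAMMAR = {
    "S":    [["NP", "VP"]],               # Rule 1
    "VP":   [["V", "NP"]],                # Rule 2
    "NP":   [["NAME"], ["ART", "N"]],     # Rules 3 and 4
    "NAME": [["John"]],                   # Rules 5-8: lexical rules
    "V":    [["ate"]],
    "ART":  [["the"]],
    "N":    [["cat"]],
}
START_SYMBOL = "S"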

 A grammar is said to derive a sentence if there is a sequence of rules that allows you to rewrite the start symbol into the sentence.

 Two important processes are based on derivations. The first is sentence generation, which uses derivations to construct legal sentences.

1. Sentence Generation
 Sentence generation uses the rules of a grammar to create syntactically correct
sentences. Starting with a designated start symbol (e.g., S), production rules
are applied recursively to derive terminal symbols (words) from non-terminal
symbols.
Example:
 When you interact with a chatbot, it generates sentences based on a set of
grammar rules or templates. Let’s say a grammar is designed for a weather
chatbot.
S → Greeting Weather_Info
Greeting → Hello | Hi | Hey
Weather_Info → The weather is Adj today.
Adj → sunny | rainy | cloudy
A sample derivation:

S → Greeting Weather_Info

Greeting → Hello

Weather_Info → The weather is Adj today

Adj → sunny

Generated Sentence: "Hello, the weather is sunny today."
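A minimal Python sketch of this generation process is given below (the rule encoding and function name are assumptions for illustration, not from the notes): it expands the start symbol recursively, choosing a random alternative whenever a non-terminal has several right-hand sides.

import random

# The weather-chatbot grammar as a dict from non-terminal to alternative
# right-hand sides; anything not in the dict is a terminal word.
GRAMMAR = {
    "S":            [["Greeting", ",", "Weather_Info"]],
    "Greeting":     [["Hello"], ["Hi"], ["Hey"]],
    "Weather_Info": [["the", "weather", "is", "Adj", "today", "."]],
    "Adj":          [["sunny"], ["rainy"], ["cloudy"]],
}

def generate(symbol):
    """Recursively rewrite a symbol until only terminal words remain."""
    if symbol not in GRAMMAR:             # terminal word: emit it as-is
        return [symbol]
    expansion = random.choice(GRAMMAR[symbol])
    return [word for part in expansion for word in generate(part)]

print(" ".join(generate("S")))            # e.g. "Hello , the weather is sunny today ."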

2. Parsing
 The second process based on derivations is parsing, which identifies the
structure of sentences given a grammar.

 Parsing is the reverse of sentence generation. It identifies how the sentence can
be derived from the grammar. It essentially constructs a parse tree or
derivation structure for the sentence.

Real-Time Example: Grammar Check in Text Editors


When you write a sentence in a text editor like Microsoft Word or Google Docs, it
checks if the sentence is grammatically correct by parsing it against a grammar.

Input Sentence: "The cat chased the mouse."

 For instance, the following grammar derives the sentence The cat chased the
mouse; the parsing steps below show the sequence of rewrites starting from the S
symbol. The grammar rules are:
S → NP VP
NP → Det N
VP → V NP
Det → the
N → cat | mouse
V → chased
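If the NLTK library is available, the same check can be sketched in a few lines (an illustrative example, not something the notes require): the grammar above is written in NLTK's rule syntax and a chart parser is asked for the parse trees of the token sequence.

import nltk

# The grammar above in NLTK's notation; terminals are quoted.
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det N
    VP -> V NP
    Det -> 'The' | 'the'
    N -> 'cat' | 'mouse'
    V -> 'chased'
""")

parser = nltk.ChartParser(grammar)
tokens = ["The", "cat", "chased", "the", "mouse"]
for tree in parser.parse(tokens):
    print(tree)   # (S (NP (Det The) (N cat)) (VP (V chased) (NP (Det the) (N mouse))))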

Parsing Steps

1. Tokenize the sentence:
Tokens: ["The", "cat", "chased", "the", "mouse"]
2. Apply grammar rules step by step:
Recognize Det: "The" → Det
Recognize N: "cat" → N
Combine Det + N to form NP: "The cat" → NP
Recognize V: "chased" → V
Recognize Det for the second NP: "the" → Det
Recognize N: "mouse" → N
Combine Det + N to form NP: "the mouse" → NP
Combine V + NP to form VP: "chased the mouse" → VP
Combine NP + VP to form S: "The cat chased the mouse" → S

3. There are two basic methods of searching. A top-down strategy starts with the S
symbol and then searches through different ways to rewrite the symbols until
the input sentence is generated, or until all possibilities have been explored.
The preceding example demonstrates that The cat chased the mouse is a
legal sentence by showing the derivation that could be found by this process.

4. In a bottom-up strategy, you start with the words in the sentence and use the
rewrite rules backward to reduce the sequence of symbols until it consists solely
of S. Each rule is applied in reverse: a sequence of symbols matching a rule's
right-hand side is replaced by the symbol on its left-hand side. A possible
bottom-up parse of the sentence John ate the cat is:
John ate the cat
→ NAME ate the cat (rewriting John)
→ NAME V the cat (rewriting ate)
→ NAME V ART cat (rewriting the)
→ NAME V ART N (rewriting cat)
→ NP V ART N (rewriting NAME)
→ NP V NP (rewriting ART N)
→ NP VP (rewriting V NP)
→ S (rewriting NP VP)
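A minimal Python sketch of the top-down strategy is given below (an illustration with assumed names, not a definitive implementation): it rewrites the leftmost symbol at each step and backtracks whenever a choice fails to match the remaining words. A bottom-up parser would instead scan the word sequence for right-hand sides and replace them with the mother symbol.

# The toy grammar for "John ate the cat", encoded as in the earlier sketch.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["NAME"], ["ART", "N"]],
    "VP": [["V", "NP"]],
    "NAME": [["John"]], "V": [["ate"]], "ART": [["the"]], "N": [["cat"]],
}

def parse(symbols, words):
    """Return True if the symbol sequence can be rewritten into exactly `words`."""
    if not symbols:
        return not words                  # success only if all words are consumed
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:                  # non-terminal: try each right-hand side
        return any(parse(rhs + rest, words) for rhs in GRAMMAR[first])
    # terminal: it must match the next input word
    return bool(words) and words[0] == first and parse(rest, words[1:])

print(parse(["S"], ["John", "ate", "the", "cat"]))   # True
print(parse(["S"], ["John", "cat", "ate"]))          # False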

2.1.2 What Makes a Good Grammar


In constructing a grammar for a language, you are interested in
1. Generality, the range of sentences the grammar analyzes correctly;
2. Selectivity, the range of non-sentences it identifies as problematic;
3. Understandability, the simplicity of the grammar itself.
1. Structural Analysis and Simplicity in Small Grammars
 In small grammars, where the language being described is relatively simple,
different structural analyses of a sentence might seem equally reasonable. For
example, a grammar that can only describe a few sentence types may offer
multiple ways to parse or analyze a given sentence, and those analyses may be
equally "understandable." This means that there isn't much practical difference
between one analysis and another, because the grammar is not yet complex
enough to reveal the inherent advantages or disadvantages of one over the
other.

2. Complexity When Extending a Grammar


 As the grammar is extended to cover a broader range of sentences and
sentence structures, the situation changes. In a more complex grammar, one of
the analyses might prove to be more extendable or generalizable, while
another might require substantial modifications to accommodate new sentence
types.
 Extension of Grammar: Extending a grammar means adding rules or
modifying existing rules so that the grammar can describe a broader set of
linguistic constructions, such as more complex sentences or new syntactic
phenomena.
