Units - 2.1
The tree representation for the sentence John ate the cat is shown in
Figure.
Figure: A tree representation of John ate the cat
The sentence (S) consists of an initial noun phrase (NP) and a verb phrase
(VP). The initial noun phrase is made of the simple NAME John.
The verb phrase is composed of a verb (V) ate and an NP, which consists of
an article (ART) the and a common noun (N) cat. In list notation this same
structure could be represented as
(S (NP (NAME John))
   (VP (V ate)
       (NP (ART the)
           (N cat))))
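As an illustration, the list notation maps naturally onto nested data structures. The sketch below assumes one possible encoding (a Python tuple whose first element is the node label and whose remaining elements are the children, with words as plain strings); the name to_list_notation is hypothetical, chosen only for this example.

```python
# Assumed encoding: (label, child, child, ...); a word is a plain string.
tree = ("S",
        ("NP", ("NAME", "John")),
        ("VP", ("V", "ate"),
               ("NP", ("ART", "the"), ("N", "cat"))))


def to_list_notation(node):
    """Render a nested-tuple tree as a bracketed list-notation string."""
    if isinstance(node, str):          # a word: print it as-is
        return node
    label, *children = node
    parts = [label] + [to_list_notation(child) for child in children]
    return "(" + " ".join(parts) + ")"


print(to_list_notation(tree))
# (S (NP (NAME John)) (VP (V ate) (NP (ART the) (N cat))))
```

The printed string is the same structure as the list notation above, written on a single line.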
Trees are a special form of graph, which are structures consisting of labelled
nodes (for example, the nodes are labelled S, NP and so on in Figure) connected
by links.
They are called trees because they look like upside-down trees, and much of the
terminology is derived from this analogy with actual trees.
The node at the top is called the root of the tree, while the nodes at the bottom
are called the leaves. A link is a connection from a parent node to a child node.
The node labelled S in Figure is the parent node of the nodes labelled NP and
VP, and the node labelled NP is in turn the parent node of the node labelled
NAME.
While every child node has a unique parent, a parent may point to many child
nodes. An ancestor of a node N is defined as N's parent, or the parent of its
parent, and so on.
A node is dominated by its ancestor nodes. The root node dominates all other
nodes in the tree.
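The terminology above can be made concrete with a small sketch. It assumes the tree is stored as nested tuples (label first, then children, words as plain strings) and identifies each node by its path from the root, that is, a tuple of child indices. This is needed because two different nodes carry the label NP. The function names are hypothetical.

```python
# Assumed encoding: (label, child, child, ...); a word is a plain string.
tree = ("S",
        ("NP", ("NAME", "John")),
        ("VP", ("V", "ate"),
               ("NP", ("ART", "the"), ("N", "cat"))))


def labelled_nodes(node, path=()):
    """Map each labelled node's path (tuple of child indices) to its label."""
    if isinstance(node, str):
        return {}
    nodes = {path: node[0]}
    for i, child in enumerate(node[1:]):
        nodes.update(labelled_nodes(child, path + (i,)))
    return nodes


def ancestors(path):
    """Parent first, then the parent's parent, and so on up to the root."""
    return [path[:k] for k in range(len(path) - 1, -1, -1)]


def dominates(a, b):
    """True if the node at path a is an ancestor of the node at path b."""
    return a in ancestors(b)


nodes = labelled_nodes(tree)
art = (1, 1, 0)                                   # path to the ART node
print([nodes[p] for p in ancestors(art)])         # ['NP', 'VP', 'S']
print(all(dominates((), p) for p in nodes if p != ()))   # True
```

The root has the empty path and no ancestors, and every other path has exactly one parent (the path with its last index removed), which mirrors the statement that every child node has a unique parent.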
To construct a tree structure for a sentence, we must know what structures are
legal for English.
A set of rewrite rules describes what tree structures are allowable. These rules
say that a certain symbol may be expanded in the tree by a sequence of other
symbols.
A set of rules that would allow the above tree structure is shown as Grammar.
Rule 1 says that an S may consist of an NP followed by a VP.
Rule 2 says that a VP may consist of a V followed by an NP.
Rules 3 and 4 say that an NP may consist of a NAME or may consist of an ART
followed by an N.
Rules 5-8 define possible words for the categories. Grammars consisting entirely
of rules with a single symbol on the left-hand side, called the mother, are called
context-free grammars (CFGs).
CFGs are a very important class of grammars, because they can describe a wide
range of syntactic structures found in natural languages.
Grammars have a special symbol called the start symbol. The start symbol will
always be S.
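As a rough sketch, the eight rules described above can be written down as data and used to check whether a tree is allowable. Rules 1-4 follow the descriptions in the text. For rules 5-8, the words are taken from the example tree, but their order is an assumption, since the grammar itself is not reproduced here. The function name is_licensed is hypothetical.

```python
# Each rule is (left-hand side, right-hand side).
GRAMMAR = [
    ("S", ["NP", "VP"]),     # Rule 1
    ("VP", ["V", "NP"]),     # Rule 2
    ("NP", ["NAME"]),        # Rule 3
    ("NP", ["ART", "N"]),    # Rule 4
    ("NAME", ["John"]),      # Rules 5-8: word rules (order assumed)
    ("V", ["ate"]),
    ("ART", ["the"]),
    ("N", ["cat"]),
]

# Assumed tree encoding: (label, child, child, ...); a word is a plain string.
tree = ("S",
        ("NP", ("NAME", "John")),
        ("VP", ("V", "ate"),
               ("NP", ("ART", "the"), ("N", "cat"))))


def is_licensed(node):
    """True if every node and its children match some rule of GRAMMAR."""
    if isinstance(node, str):                  # words need no expansion
        return True
    label, *children = node
    rhs = [c if isinstance(c, str) else c[0] for c in children]
    return (label, rhs) in GRAMMAR and all(is_licensed(c) for c in children)


print(is_licensed(tree))                                      # True
print(is_licensed(("S", ("VP", ("V", "ate")))))               # False
```

Because every rule has exactly one symbol on its left-hand side, this grammar is context-free in the sense defined above.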
1. Sentence Generation
Sentence generation uses the rules of a grammar to create syntactically correct
sentences. Starting with a designated start symbol (e.g., S), production rules
are applied recursively to derive terminal symbols (words) from non-terminal
symbols.
Example:
A chatbot may generate its replies from a set of grammar rules or templates.
Suppose the following grammar is designed for a weather chatbot:
S → Greeting Weather_Info
Greeting → Hello | Hi | Hey
Weather_Info → The weather is Adj today.
Adj → sunny | rainy | cloudy
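One derivation chooses a single alternative at each step:
S → Greeting Weather_Info
Greeting → Hello
Weather_Info → The weather is Adj today.
Adj → sunny
This yields the sentence: Hello The weather is sunny today.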
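The generation process can be sketched in a few lines. The example below encodes the weather grammar as a dictionary (an assumed representation, with each right-hand side split into tokens) and enumerates every sentence derivable from S. The names WEATHER_GRAMMAR and expand are hypothetical.

```python
import itertools

# Non-terminals are dictionary keys; anything else is a terminal (a word).
WEATHER_GRAMMAR = {
    "S": [["Greeting", "Weather_Info"]],
    "Greeting": [["Hello"], ["Hi"], ["Hey"]],
    "Weather_Info": [["The", "weather", "is", "Adj", "today."]],
    "Adj": [["sunny"], ["rainy"], ["cloudy"]],
}


def expand(symbol):
    """Yield every word sequence that can be derived from symbol."""
    if symbol not in WEATHER_GRAMMAR:          # terminal: nothing to rewrite
        yield [symbol]
        return
    for rhs in WEATHER_GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        options = [list(expand(s)) for s in rhs]
        for parts in itertools.product(*options):
            yield [word for part in parts for word in part]


sentences = [" ".join(words) for words in expand("S")]
print(len(sentences))    # 9
print(sentences[0])      # Hello The weather is sunny today.
```

With three greetings and three adjectives the grammar generates nine sentences. A chatbot would typically pick one alternative per rule rather than enumerate them all.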
2. Parsing
The second process based on derivations is parsing, which identifies the
structure of sentences given a grammar.
Parsing is the reverse of sentence generation. It identifies how the sentence can
be derived from the grammar. It essentially constructs a parse tree or
derivation structure for the sentence.
For instance, consider the sentence The cat chased the mouse and the following
grammar. That the grammar derives the sentence can be seen from the parsing
steps listed after the rules.
S → NP VP
NP → Det N
VP → V NP
Det → the
N → cat | mouse
V → chased
Parsing Steps
Recognize Det:
"The" → Det
Recognize N:
"cat" → N
Combine Det + N to form NP:
"The cat" → NP
Recognize V:
"chased" → V
Recognize Det for the second NP:
"the" → Det
Recognize N:
"mouse" → N
Combine Det + N to form NP:
"the mouse" → NP
Combine V + NP to form VP:
"chased the mouse" → VP
Combine NP + VP to form S:
"The cat chased the mouse" → S
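The steps above can be reproduced mechanically. The following is a minimal sketch, not a general parser: it reads the words left to right, keeps them on a stack, and greedily replaces the top of the stack whenever it matches the right-hand side of a rule. It never backtracks, which is enough for this small grammar but would fail on grammars where a greedy choice is wrong. Words are lower-cased so that "The" matches the rule Det → the. The names RULES and shift_reduce are hypothetical.

```python
RULES = [
    ("S", ["NP", "VP"]),
    ("NP", ["Det", "N"]),
    ("VP", ["V", "NP"]),
    ("Det", ["the"]),
    ("N", ["cat"]),
    ("N", ["mouse"]),
    ("V", ["chased"]),
]


def shift_reduce(words):
    """Return the final stack and the list of reductions performed."""
    stack, steps = [], []
    for word in words:
        stack.append(word.lower())             # shift the next word
        reduced = True
        while reduced:                         # reduce as long as possible
            reduced = False
            for lhs, rhs in RULES:
                if stack[-len(rhs):] == rhs:
                    stack[-len(rhs):] = [lhs]  # replace right side by left side
                    steps.append(" ".join(rhs) + " -> " + lhs)
                    reduced = True
                    break
    return stack, steps


stack, steps = shift_reduce("The cat chased the mouse".split())
print(stack)             # ['S']
for step in steps:
    print(step)
```

The nine reductions it prints follow the same order as the parsing steps listed above, ending with NP VP -> S.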
3. There are two basic methods of searching. A top-down strategy starts with the S
symbol and then searches through different ways to rewrite the symbols until
the input sentence is generated, or until all possibilities have been explored.
The preceding example demonstrates that The cat chased the mouse is a
legal sentence; a top-down search would find the same structure by rewriting
S as NP VP, NP as Det N, and so on down to the words.
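A top-down search can be sketched as a short recursive function. The version below assumes the small grammar from the parsing example, stored as a dictionary, and works on lower-cased words. It always rewrites the leftmost symbol and tries each alternative in turn. This is an illustrative sketch only: it would loop forever on a left-recursive rule such as NP → NP PP, which this grammar does not contain. The name derive is hypothetical.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["cat"], ["mouse"]],
    "V": [["chased"]],
}


def derive(symbols, words):
    """True if the symbol list can be rewritten into exactly these words."""
    if not symbols:
        return not words                       # success only if input is used up
    first, rest = symbols[0], symbols[1:]
    if first not in GRAMMAR:                   # terminal: must match next word
        return bool(words) and words[0] == first and derive(rest, words[1:])
    # Non-terminal: try every way of rewriting it (all possibilities explored).
    return any(derive(rhs + rest, words) for rhs in GRAMMAR[first])


print(derive(["S"], "the cat chased the mouse".split()))   # True
print(derive(["S"], "the cat chased".split()))             # False
```

The search starts from the single symbol S and succeeds only when the rewritten symbols match the whole input sentence.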
4. In a bottom-up strategy, you start with the words in the sentence and use the
rewrite rules backward to reduce the sequence of symbols until it consists solely
of S. That is, whenever a sequence of symbols matches the right-hand side of a
rule, it is replaced by the symbol on the left-hand side. A possible bottom-up
parse of the sentence John ate the cat is