Dr. John Babu
UNIT-2
SYNTAX
Parsing in Natural Language Processing
Parsing in natural language processing (NLP) is the process of uncovering the hidden structure of a
sentence. Every sentence has an underlying structure that helps in understanding the relationships
between different words. This structure is important in many applications, such as machine
translation, question answering, and information extraction.
For example, consider the sentence: "The cat sat on the mat." A syntactic analysis helps in identifying that "the cat" is the subject, "sat" is the verb, and "on the mat" is the prepositional phrase indicating location.
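As a rough sketch, this analysis can be written as a bracketed tree and displayed with NLTK (assuming NLTK is installed; the exact bracketing is my own illustration):

import nltk

# An illustrative constituency bracketing of the example sentence.
tree = nltk.Tree.fromstring(
    "(S (NP (DT The) (NN cat))"
    " (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))"
)
tree.pretty_print()  # prints an ASCII rendering of the tree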
Predicate-Argument Structure
Predicate-Argument Structure (PAS) plays a crucial role in parsing by providing a deeper, seman-
tically meaningful analysis of sentences beyond just syntactic parsing. PAS helps in understanding
relationships between words, particularly in terms of who is doing what to whom in a sentence.
This is particularly useful in semantic parsing, machine translation, question answering systems,
and information extraction.
For instance, in ”SaiManesh gave Hemanth a book,” the verb ”gave” connects three arguments:
• SaiManesh (who gave?)
• Hemanth (who received?)
• a book (what was given?)
By analyzing this structure, we can extract meaningful relationships, which is useful in tasks like
semantic role labeling and information retrieval.
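In code, such a structure is often represented as a simple predicate frame. Below is a minimal sketch using PropBank-style role labels (the dictionary layout and the printed summary are my own illustration):

# A hypothetical frame for "SaiManesh gave Hemanth a book".
pas = {
    "predicate": "give",
    "ARG0": "SaiManesh",  # the giver
    "ARG1": "a book",     # the thing given
    "ARG2": "Hemanth",    # the recipient
}
print(f"{pas['ARG0']} -> {pas['predicate']} -> {pas['ARG1']} (to {pas['ARG2']})")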
Here, DT (determiner), JJ (adjective), NN (noun), VBZ (verb, third-person singular present), and IN (preposition) are part-of-speech tags. POS tagging is useful in speech recognition, text-to-speech conversion, and grammar checking.
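A minimal POS-tagging sketch with NLTK (assuming the tokenizer and tagger models have been downloaded; the exact tags can vary with the model):

import nltk
# One-time setup (model names can differ across NLTK versions):
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

tokens = nltk.word_tokenize("The cat sat on the mat")
print(nltk.pos_tag(tokens))
# e.g. [('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD'), ('on', 'IN'), ('the', 'DT'), ('mat', 'NN')]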
• Dependency Structure: Represents the sentence as head-dependent relations between individual words. This method helps in question answering and text summarization, where understanding word relationships is crucial.
• Constituency Parsing (Phrase Structure Trees): Breaks a sentence into subphrases like noun
phrases (NP) and verb phrases (VP).
Ambiguity in Parsing
Consider the sentence The boy saw the man with a telescope. The ambiguity here is: did the boy use the telescope, or did the man have the telescope?
This type of ambiguity makes parsing difficult because multiple possible structures exist. Algo-
rithms must choose the most plausible one. Ambiguity is a major challenge in parsing because
multiple interpretations can exist for the same sentence. This issue arises in:
• Lexical Ambiguity (words with multiple meanings, e.g., "bank" as a financial institution or riverbank)
• Structural Ambiguity (different grammatical structures, e.g., "I saw the man with the telescope")
• Attachment Ambiguity (where to attach modifiers, e.g., "old men and women" could mean both are old or only men are old)
Because of this, parsing algorithms must be designed carefully to resolve ambiguity efficiently.
Consider the following example sentence:
• Beyond the basic level, the operations of the three products vary widely.
The parse tree systematically breaks down the sentence into its grammatical components, which include noun phrases (NP), verb phrases (VP), prepositional phrases (PP), and other syntactic units.
1. Sentence (S): The entire sentence forms the root of the parse tree, dominating all of the constituents listed below.
2. Prepositional Phrase (PP): The phrase Beyond the basic level is a prepositional phrase that
acts as an adverbial modifier.
3. Noun Phrase (NP): The subject of the sentence, The operations of the three products, is a
noun phrase consisting of a determiner (the) and a plural noun (operations) followed by a
prepositional phrase (of the three products).
4. Verb Phrase (VP): The main action in the sentence is captured in the verb phrase, where
vary is the verb, and widely is an adverb modifying it.
5. Prepositional Phrase within NP (PP): The phrase of the three products further specifies the
noun operations. It consists of a preposition (of) followed by another noun phrase (the three
products), which includes a determiner (the), a cardinal number (three), and a plural noun
(products).
6. Adverbial Phrase (ADVP): The adverb widely functions as an adverbial phrase, modifying
the verb vary.
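Putting these pieces together, the breakdown above corresponds to a bracketing along the following lines (punctuation omitted; the tree is my own reconstruction from the description):

(S (PP (IN Beyond) (NP (DT the) (JJ basic) (NN level)))
   (NP (NP (DT The) (NNS operations))
       (PP (IN of) (NP (DT the) (CD three) (NNS products))))
   (VP (VBZ vary) (ADVP (RB widely))))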
In text summarization, parsing identifies constituents that can be dropped without losing the core meaning:
• Prepositional Phrase (PP) - "Beyond the basic level": This phrase does not change the essential meaning of the sentence and can be omitted without loss of fluency.
• Cardinal Number (CD) - "three": The specific number of products is unnecessary for the summary and is removed.
• Adverbial Phrase (ADVP) - "widely": Though it provides additional detail, it is not crucial to the core meaning of the sentence.
Parsing ensures that such omissions and substitutions are meaningful and grammatically correct rather than random word replacements that could lead to awkward or incorrect sentences.
• Knowledge Acquisition – Extracting semantic relationships (e.g., recognizing that "dog is-a animal").
Parsing plays a fundamental role in many modern NLP tasks, enabling more accurate and
fluent language processing.
Example CFG:
S -> NP VP
NP -> 'John' | 'pockets' | D N | NP PP
VP -> V NP | VP PP
V -> 'bought'
D -> 'a'
N -> 'shirt'
PP -> P NP
P -> 'with'
Ambiguity in CFGs
One major issue in parsing is ambiguity. Consider the sentence John bought a shirt with pockets. Under the grammar above, the PP with pockets can attach either to the noun phrase (a shirt with pockets) or to the verb phrase (bought ... with pockets), so the same string has two distinct parse trees.
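A minimal sketch of this ambiguity using NLTK's chart parser on the grammar above (assuming NLTK is installed):

import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'John' | 'pockets' | D N | NP PP
VP -> V NP | VP PP
V -> 'bought'
D -> 'a'
N -> 'shirt'
PP -> P NP
P -> 'with'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("John bought a shirt with pockets".split()):
    print(tree)  # prints two trees: PP attached to the NP, and PP attached to the VP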
• Dependency Trees: Treebanks may also annotate dependency structures rather than phrase structure.
• Grammar Checking: Used in Grammarly and Microsoft Word for sentence correction.
Treebanks provide a data-driven approach to syntactic parsing, solving key NLP challenges. By
using supervised learning, treebanks allow statistical parsers to handle ambiguity, complex syntax,
and free word order effectively. While they require manual effort, they are essential for machine
translation, speech recognition, search engines, and AI assistants.
Each word in a sentence, except for the root, is connected to exactly one head. The root of
the sentence is a special node, typically indicated by index 0.
For example, in the sentence John saw Mary, the verb saw is the root (its head index is 0), and both John and Mary have saw as their head.
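In head-index form, this analysis can be written as follows (index 0 denotes the artificial root):

Index  Word  Head
  1    John    2
  2    saw     0
  3    Mary    2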
• Phrase structure trees group words into hierarchical units (e.g., noun phrases, verb phrases).
Dependency analysis avoids additional elements like placeholders or empty nodes, making it
more efficient for parsing.
Example: in a sentence where a relative clause is separated from the noun it modifies (for instance, something along the lines of I saw a dog yesterday which was blind), there are crossing dependencies between yesterday and which was blind, making the tree nonprojective.
Some languages like Czech and Turkish exhibit more nonprojective dependencies compared to
English.
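Nonprojectivity can be detected directly from head indices by checking whether any two arcs cross. A small sketch (the function name and list encoding are my own illustration):

def is_projective(heads):
    """heads[i] is the head of token i (tokens are numbered 1..n);
    heads[0] is unused, and a head of 0 denotes the root.
    A tree is projective when no two dependency arcs cross."""
    arcs = [(min(i, heads[i]), max(i, heads[i])) for i in range(1, len(heads))]
    for a1, b1 in arcs:
        for a2, b2 in arcs:
            if a1 < a2 < b1 < b2:  # the two arcs interleave, i.e. cross
                return False
    return True

# John saw Mary: root -> saw, saw -> John, saw -> Mary (projective)
print(is_projective([0, 2, 0, 2]))  # True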
• Machine Translation: Helps understand sentence structure to improve accuracy (e.g., Google
Translate).
• Speech Recognition: Identifies syntactic relationships in spoken language (e.g., Siri, Alexa).
For the sentence Mr. Baker seems especially sensitive:
• NP-SBJ (Noun Phrase - Subject) represents Mr. Baker as the subject of the sentence.
• ADJP-PRD (Adjective Phrase - Predicate) marks the adjective phrase especially sensitive as the predicate of the verb seems.
Predicate-argument structure:
seems((especially(sensitive))(Mr. Baker))
This structure shows that seems is the main verb, with Mr. Baker as the subject and especially sensitive as the complement.
• Phrase structure trees are better for identifying hierarchical structures and phrase bound-
aries.
• Dependency trees provide a more direct representation of word relationships, making them
useful for machine translation and information extraction.
Predicate-argument structure:
eat(Tim, what)
Here, what is originally the object of eat (as in Tim is eating what?), but it moves to the front in the question What is Tim eating?, leaving behind a trace (*T*) that preserves the underlying grammatical structure.
Predicate-argument structure:
throw(Chris, the ball)
Here, Chris is the logical subject (the one who throws the ball), while the ball appears as the grammatical subject, as in the passive The ball was thrown by Chris.
Predicate-argument structure: here, who is the object of shoot, but it moves to the front of the sentence, leaving multiple traces (*T*) in the tree.
Translation: A (foreign exchange) settlement and sale system and a verification and cancellation
system that is newly created is fully operational in Tibet.
• Speech Recognition & Text-to-Speech - Improves intonation and pauses in speech synthesis.
• Question Answering Systems - Extracts relationships between question words and answers.
• Grammar Correction & Text Summarization - Identifies incorrect sentence structures for
tools like Grammarly.
Conclusion
Phrase structure trees provide a hierarchical view of sentence syntax, making them essential for
natural language understanding. They are crucial for machine translation, speech processing,
question answering, and sentiment analysis.
By mastering phrase structure trees, we can build better parsers, more accurate translation
systems, and smarter AI-driven language applications.
N → N 'and' N
N → N 'or' N
N → 'a' | 'b' | 'c'
To optimize parsing, we rewrite this grammar into a new CFG Gc by introducing new nonterminals N∧ and N∨:
N → N N∧
N∧ → 'and' N
N → N N∨
N∨ → 'or' N
N → 'a' | 'b' | 'c'
This transformation ensures that each right-hand side contains at most two nonterminals,
making parsing more structured.
For the input sentence a and b or c, each word is assigned a span:
– a → span [0,1]
– and → span [1,2]
– b → span [2,3]
– or → span [3,4]
– c → span [4,5]
The lexical rules of the specialized grammar then carry these spans:
N[0,1] → 'a'[0,1]
N[2,3] → 'b'[2,3]
N[4,5] → 'c'[4,5]
This specialized grammar compactly represents all possible parse trees for the input sentence, making the parsing process more efficient.
Parsing with this chart then proceeds bottom-up:
1. Assign each word its categories over the corresponding single-word span.
2. Combine adjacent spans using the binary rules.
3. Continue until a rule spans the entire sentence (i.e., S[0, n]).
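These steps are the CKY algorithm. A minimal recognizer sketch in Python (the rule-table encoding and names are my own illustration; Nand and Nor stand for N∧ and N∨, with separate preterminals AND and OR so that every rule is binary or lexical):

from itertools import product

def cky_recognize(words, binary_rules, lexical_rules, start):
    """binary_rules: {(B, C): {A, ...}} for rules A -> B C;
    lexical_rules: {word: {A, ...}} for rules A -> 'word'."""
    n = len(words)
    # chart[i][j] holds the nonterminals that derive the span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lexical_rules.get(w, ()))
    for width in range(2, n + 1):          # grow spans bottom-up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):      # try every split point
                for B, C in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= binary_rules.get((B, C), set())
    return start in chart[0][n]            # does a rule span [0, n]?

lexical = {"a": {"N"}, "b": {"N"}, "c": {"N"}, "and": {"AND"}, "or": {"OR"}}
binary = {("N", "Nand"): {"N"}, ("AND", "N"): {"Nand"},
          ("N", "Nor"): {"N"}, ("OR", "N"): {"Nor"}}
print(cky_recognize("a and b or c".split(), binary, lexical, "N"))  # True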
The worst-case complexity is O(n³), but the number of possible parse trees can still be exponential in the sentence length. To make parsing efficient, we use probabilistic scoring and pruning techniques:
• Viterbi-best Parse
• Beam Thresholding
• Global Thresholding
• Coarse-to-Fine Parsing
These optimizations make parsing feasible even for large-scale NLP applications.
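As a flavor of how beam thresholding works, a pruning step over a single chart cell might look like this (the function, threshold, and scores are my own illustration):

def beam_prune(cell, beam=1e-3):
    """Keep only chart items whose probability is within a fixed
    fraction (the beam) of the best item in the same cell."""
    best = max(cell.values())
    return {label: p for label, p in cell.items() if p >= beam * best}

print(beam_prune({"NP": 0.9, "VP": 0.0001}))  # {'NP': 0.9}; the weak VP item is pruned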
Chart-based techniques can likewise be applied to dependency parsing, efficiently finding dependency structures while maintaining a compact representation.
• Speech Recognition
• Machine Translation
Conclusion
Hypergraphs and chart parsing allow for efficient parsing in NLP by representing multiple possible
parses in a compact structure. The CKY algorithm ensures structured parsing, while advanced
techniques like beam thresholding, coarse-to-fine parsing, and A* search significantly improve
efficiency.
Parsing plays a crucial role in speech recognition, machine translation, grammar checking, and
AI-driven language models, making it a core topic in computational linguistics.
• The tree is rooted, meaning there is a single root node (often the main verb).
One efficient way to construct a dependency tree is by using the Minimum Spanning Tree
(MST) algorithm. This algorithm finds the best dependency structure by maximizing the likeli-
hood of correct word relationships.
1. Spans every word in the sentence, so that each word (except the root) receives exactly one head.
2. Has the best possible total edge weight, where edge weights represent the likelihood of dependencies. (In parsing we maximize the total score; negating the scores turns this into the classical minimum spanning tree problem, which gives the method its name.)
In dependency parsing, MST methods are useful because they consider all possible dependency
structures and select the one with the highest overall probability.
For example, for the sentence John saw Mary, suppose the candidate arcs receive these scores:
root → "saw" (10)
"saw" → "John" (30)
"saw" → "Mary" (30)
This forms a fully connected directed graph, where the goal is to find the highest-scoring tree.
This cycle-handling procedure (the Chu-Liu/Edmonds algorithm) works as follows:
1. Every word selects its highest-scoring incoming edge.
2. If this selection forms a tree (no cycles), we report it as the final dependency tree.
3. Otherwise, each cycle is contracted into a single node, the scores of edges entering the cycle are adjusted, and the algorithm finds the highest-scoring incoming edge for the contracted node.
Once all cycles are removed, the highest-scoring dependency tree is selected.
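A compact sketch of this procedure in Python (the arc encoding, scores, and helper names are my own illustration; node 0 is the root):

def find_cycle(best):
    """best maps each non-root node to its chosen head; return one cycle, if any."""
    for start in best:
        seen, v = set(), start
        while v in best:
            if v in seen:                  # v was revisited, so v lies on a cycle
                cycle, u = [v], best[v]
                while u != v:
                    cycle.append(u)
                    u = best[u]
                return cycle
            seen.add(v)
            v = best[v]
    return None

def chu_liu_edmonds(arcs, root=0):
    """arcs: {(head, dep): score}. Returns {dep: head} for the
    highest-scoring spanning tree (arborescence) rooted at `root`."""
    # 1. every non-root node picks its highest-scoring incoming arc
    best = {}
    for (h, d), s in arcs.items():
        if d != root and (d not in best or s > arcs[(best[d], d)]):
            best[d] = h
    cycle = find_cycle(best)
    if cycle is None:
        return best                        # already a tree: done
    # 2. contract the cycle into a fresh node c and adjust entering scores
    c = max(n for arc in arcs for n in arc) + 1
    cyc = set(cycle)
    new_arcs, origin = {}, {}
    for (h, d), s in arcs.items():
        if h in cyc and d not in cyc:              # arc leaving the cycle
            key, val = (c, d), s
        elif h not in cyc and d in cyc:            # arc entering the cycle
            key, val = (h, c), s - arcs[(best[d], d)]
        elif h not in cyc and d not in cyc:        # arc untouched by the cycle
            key, val = (h, d), s
        else:
            continue                               # arc inside the cycle
        if key not in new_arcs or val > new_arcs[key]:
            new_arcs[key], origin[key] = val, (h, d)
    # 3. solve the smaller problem, then expand the contracted node
    heads = {}
    for d, h in chu_liu_edmonds(new_arcs, root).items():
        oh, od = origin[(h, d)]
        heads[od] = oh                             # breaks the cycle at od if d == c
    for d in cyc:
        heads.setdefault(d, best[d])               # keep the remaining cycle arcs
    return heads

# Scores extending the example above: 0 = root, 1 = John, 2 = saw, 3 = Mary
arcs = {(0, 2): 10, (2, 1): 30, (2, 3): 30, (1, 2): 20, (0, 1): 9, (0, 3): 9}
print(chu_liu_edmonds(arcs))  # {2: 0, 3: 2, 1: 2}: root -> saw, saw -> John, saw -> Mary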
• In some languages (e.g., Czech), word order is flexible, leading to crossing dependencies
(words linking non-adjacent words).
• MST-based parsing allows crossing edges, unlike traditional projective parsers.
3. Language Agnostic
• Works well for languages with free word order (e.g., Czech, Russian) and strict word
order (e.g., English).
• Tools like Grammarly use dependency trees to detect incorrect sentence structures.
• Example: Identifying errors in "She loves play football" (incorrect verb usage).
4. Sentiment Analysis
Conclusion
MST-based dependency parsing provides a powerful way to analyze sentence structure. It allows
for efficient parsing of complex sentences, supports non-projective dependencies, and is widely
used in machine translation, grammar checking, search engines, and sentiment analysis.
By applying MST algorithms, NLP models can generate accurate dependency structures, lead-
ing to better natural language understanding and processing.