
NATURAL LANGUAGE PROCESSING

Dr. John Babu

UNIT-2
SYNTAX
Parsing in Natural Language Processing
Parsing in natural language processing (NLP) is the process of uncovering the hidden structure of a
sentence. Every sentence has an underlying structure that helps in understanding the relationships
between different words. This structure is important in many applications, such as machine
translation, question answering, and information extraction.
For example, consider the sentence: ”The cat sat on the mat.” A syntactic analysis helps in
identifying that ”the cat” is the subject, ”sat” is the verb, and ”on the mat” is the prepositional
phrase indicating location.

Predicate-Argument Structure
Predicate-Argument Structure (PAS) plays a crucial role in parsing by providing a deeper, semantically meaningful analysis of sentences beyond just syntactic parsing. PAS helps in understanding relationships between words, particularly in terms of who is doing what to whom in a sentence. This is particularly useful in semantic parsing, machine translation, question answering systems, and information extraction.
For instance, in ”SaiManesh gave Hemanth a book,” the verb ”gave” connects three arguments:
• SaiManesh (who gave?)
• Hemanth (who received?)
• a book (what was given?)
By analyzing this structure, we can extract meaningful relationships, which is useful in tasks like
semantic role labeling and information retrieval.

Levels of Syntactic Analysis


Syntactic analysis in NLP can be done at different levels:

Basic Level – Part of Speech (POS) Tagging


This involves labeling each word with its part of speech, such as noun, verb, adjective, etc.
Example:
• Sentence: ”The quick brown fox jumps over the lazy dog.”
• Tagged Output: The/DT quick/JJ brown/JJ fox/NN jumps/VBZ over/IN the/DT lazy/JJ
dog/NN.

Here, DT (Determiner), JJ (Adjective), NN (Noun), VBZ (Verb, third-person singular present), and IN (Preposition) are part-of-speech tags. POS tagging is useful in speech recognition, text-to-speech conversion, and grammar checking.
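As a quick illustration, an off-the-shelf tagger reproduces tags like those above. A minimal sketch using NLTK (an assumption: nltk is installed and its tokenizer/tagger models have been downloaded):

import nltk
# One-time model downloads (resource names may vary slightly across nltk versions)
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

tokens = nltk.word_tokenize("The quick brown fox jumps over the lazy dog.")
print(nltk.pos_tag(tokens))
# Expected (roughly): [('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'),
#   ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN'), ('.', '.')]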

Intermediate Level – Dependency Parsing


This identifies the grammatical structure of a sentence by connecting words through dependencies.
Example:

• Sentence: ”She loves ice cream.”

• Dependency Structure:

– ”loves” is the main verb.


– ”She” is the subject of ”loves.”
– ”ice cream” is the object of ”loves.”

This method helps in question answering and text summarization, where understanding word
relationships is crucial.
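A minimal sketch with spaCy recovers exactly these relations (an assumption: spaCy and its small English model en_core_web_sm are installed):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("She loves ice cream.")
for token in doc:
    # token.dep_ is the dependency label; token.head is the governing word
    print(f"{token.text:<6} --{token.dep_}--> {token.head.text}")
# Typical output: "She" is the nsubj of "loves" (the root), and "cream"
# (with compound modifier "ice") is the object of "loves".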

Advanced Level – Full Syntactic Parsing


This involves building a parse tree, which represents the full grammatical structure of a sentence.
There are two main types of parsing techniques:

• Constituency Parsing (Phrase Structure Trees): Breaks a sentence into subphrases like noun
phrases (NP) and verb phrases (VP).

• Dependency Parsing: Connects words based on grammatical dependencies.

Ambiguity in Parsing
Consider the sentence ”The boy saw the man with a telescope.” The ambiguity here is: did the boy use the telescope, or did the man have the telescope?
This type of ambiguity makes parsing difficult because multiple possible structures exist, and algorithms must choose the most plausible one. Ambiguity is a major challenge in parsing because multiple interpretations can exist for the same sentence. This issue arises in:

• Lexical Ambiguity (words with multiple meanings, e.g., ”bank” as a financial institution or
riverbank)

• Structural Ambiguity (different grammatical structures, e.g., ”I saw the man with the telescope”)

• Attachment Ambiguity (where to attach modifiers, e.g., ”old men and women” could mean
both are old or only men are old)

Because of this, parsing algorithms must be designed carefully to resolve ambiguity efficiently.



Dealing with Ambiguity Using Machine Learning
Since ambiguity creates multiple possible interpretations, machine learning methods are used to
select the most likely structure. Supervised learning (where models are trained on labeled data)
is commonly used in parsing. Some approaches include:
• Statistical Parsing: Uses probabilities to choose the most likely parse tree.
• Neural Network-Based Parsing: Uses deep learning models to understand sentence structures.
• Transformer Models (like BERT, GPT): Learn sentence patterns based on vast amounts of
data.
These techniques improve accuracy in applications like voice assistants (Siri, Google Assistant)
and chatbots by helping them understand complex sentence structures.
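To make the statistical idea concrete, the sketch below builds a toy probabilistic CFG in NLTK and asks its Viterbi parser for the single most probable tree of an ambiguous sentence. The grammar and rule probabilities are invented for illustration; in practice they would be estimated from a treebank.

import nltk

grammar = nltk.PCFG.fromstring("""
    S  -> NP VP        [1.0]
    NP -> 'I'          [0.4]
    NP -> Det N        [0.4]
    NP -> NP PP        [0.2]
    VP -> V NP         [0.6]
    VP -> VP PP        [0.4]
    PP -> P NP         [1.0]
    Det -> 'the'       [1.0]
    N  -> 'man'        [0.5]
    N  -> 'telescope'  [0.5]
    V  -> 'saw'        [1.0]
    P  -> 'with'       [1.0]
""")

parser = nltk.ViterbiParser(grammar)
for tree in parser.parse("I saw the man with the telescope".split()):
    print(tree)   # the highest-probability parse, annotated with its probability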
Parsing plays a crucial role in NLP by helping computers understand language structure.
From basic POS tagging to full syntactic parsing, different levels of analysis help in a variety of
applications. Ambiguity makes parsing challenging, but modern machine learning techniques help
resolve these issues efficiently. By understanding these concepts, students can appreciate how
NLP enables real-world applications such as machine translation, question answering, and text
summarization.

Parsing in Natural Language Processing (NLP)


Parsing in natural language processing (NLP) is the process of analyzing the grammatical structure
of a sentence to determine its meaning and organization. It helps computers understand and
process human language more accurately by identifying relationships between words and phrases.

Application in Text-to-Speech (TTS) Systems


One of the important applications of parsing is in text-to-speech (TTS) systems. When converting
written text into spoken words, a TTS system must ensure that the output sounds natural, just
as a native speaker would pronounce it. Consider the sentences:
• He wanted to go for a drive-in movie.
• He wanted to go for a drive in the country.
In the first sentence, drive-in is a compound noun (a type of movie theater where people watch
from their cars), while in the second sentence, drive is a noun, and in the country is a prepositional
phrase. In spoken language, there is a natural pause between drive and in in the second sentence,
whereas in the first sentence, the words are spoken together as one unit. Parsing helps identify
such structural differences, ensuring correct intonation in TTS systems.

Part-of-Speech (POS) Tagging


Another challenge in NLP is part-of-speech (POS) tagging, which assigns the correct grammatical
category (noun, verb, adjective, etc.) to each word in a sentence. For example, in the sentence:
• The cat who lives dangerously had nine lives.
Here, lives appears twice but has different meanings: in who lives dangerously, lives is a verb,
while in had nine lives, lives is a noun. A TTS system must correctly identify these roles to
produce the right pronunciation and rhythm.
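A trained tagger resolves the two occurrences from context. A small check with NLTK (same assumptions as the earlier tagging sketch) should label the first lives as a verb and the second as a plural noun:

import nltk

tokens = nltk.word_tokenize("The cat who lives dangerously had nine lives.")
for word, tag in nltk.pos_tag(tokens):
    if word == "lives":
        print(word, tag)
# Expected: VBZ for the first occurrence (verb), NNS for the second (plural noun)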



Parsing in Text Summarization
Parsing is also essential in text summarization, where long documents need to be condensed into
a shorter, meaningful summary. For instance, given the sentence:

• Beyond the basic level, the operations of the three products vary widely.

A summarization system may reduce this to:

• The operations of the products vary.

The parse tree systematically breaks down the sentence into its grammatical components, which include noun phrases (NP), verb phrases (VP), prepositional phrases (PP), and other syntactic units.

Breakdown of the Parse Tree


1. Sentence (S): The root of the tree represents the entire sentence.

2. Prepositional Phrase (PP): The phrase Beyond the basic level is a prepositional phrase that
acts as an adverbial modifier.

3. Noun Phrase (NP): The subject of the sentence, The operations of the three products, is a
noun phrase consisting of a determiner (the) and a plural noun (operations) followed by a
prepositional phrase (of the three products).

4. Verb Phrase (VP): The main action in the sentence is captured in the verb phrase, where
vary is the verb, and widely is an adverb modifying it.

5. Prepositional Phrase within NP (PP): The phrase of the three products further specifies the
noun operations. It consists of a preposition (of ) followed by another noun phrase (the three
products), which includes a determiner (the), a cardinal number (three), and a plural noun
(products).

6. Adverbial Phrase (ADVP): The adverb widely functions as an adverbial phrase, modifying
the verb vary.



Sentence Compression
To generate the more concise sentence:

• The operations of the products vary.

Certain elements are removed from the parse tree:

• Prepositional Phrase (PP) - ”Beyond the basic level”: This phrase does not change the
essential meaning of the sentence and can be omitted without loss of fluency.

• Cardinal Number (CD) - ”three”: The specific number of products is unnecessary for the
summary and is removed.

• Adverbial Phrase (ADVP) - ”widely”: Though it provides additional detail, it is not crucial
to the core meaning of the sentence.

Role of Parsing in Summarization


Parsing helps in understanding the sentence structure and identifying removable constituents while
maintaining grammatical correctness and coherence. By removing non-essential elements from the
parse tree, a compression model ensures that the resulting sentence remains fluent and meaningful.
This approach is widely used in text summarization, where long sentences or paragraphs need
to be condensed while preserving key information.
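The pruning step can be sketched directly on a parse tree. The toy function below, written against NLTK's Tree class, deletes exactly the constituents listed above (the sentence-level PP, the cardinal number, and the adverbial phrase) and reads the compressed sentence off the remaining leaves. The bracketed parse is hand-built and simplified; this illustrates the idea rather than a production compression model.

from nltk import Tree

def prune(tree):
    """Drop CD and ADVP nodes everywhere, and PP nodes directly under S."""
    children = []
    for child in tree:
        if isinstance(child, Tree):
            lab = child.label()
            if lab in ("CD", "ADVP") or (lab == "PP" and tree.label() == "S"):
                continue
            children.append(prune(child))
        else:
            children.append(child)
    return Tree(tree.label(), children)

t = Tree.fromstring(
    "(S (PP (IN Beyond) (NP (DT the) (JJ basic) (NN level)))"
    " (NP (DT The) (NNS operations)"
    " (PP (IN of) (NP (DT the) (CD three) (NNS products))))"
    " (VP (VBP vary) (ADVP (RB widely))))")
print(" ".join(prune(t).leaves()))   # -> The operations of the products vary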

Paraphrasing with Parsing


Paraphrasing is another important application of parsing, where sentences are rewritten in different
ways while preserving their original meaning. Consider the sentence:

• Open borders imply increasing racial fragmentation in EUROPEAN COUNTRIES.

This sentence can be rewritten in multiple ways:

• Open borders imply increasing racial fragmentation in the countries of Europe.

• Open borders imply increasing racial fragmentation in European states.

• Open borders imply increasing racial fragmentation in Europe.

Parsing ensures that such substitutions are meaningful and grammatically correct rather than
random word replacements that could lead to awkward or incorrect sentences.

Other Applications of Parsing


Beyond these applications, syntactic parsers are widely used in:

• Machine Translation – Translating text between languages while preserving grammatical structure.

• Information Extraction – Identifying key details from large text collections.

• Speech Recognition – Improving the accuracy of speech-to-text systems, especially in cases of unclear or error-prone speech.



• Dialogue Systems – Enhancing chatbot and virtual assistant responses.

• Knowledge Acquisition – Extracting semantic relationships (e.g., recognizing that ”dog is-a
animal”).

Parsing plays a fundamental role in many modern NLP tasks, enabling more accurate and
fluent language processing.

Treebanks: A Data-Driven Approach to Syntax


Understanding Treebanks and Parsing
Parsing is the process of analyzing the syntactic structure of a given sentence. This is crucial
in NLP applications such as machine translation, information retrieval, and speech recognition.
However, natural language is inherently ambiguous, making parsing a complex task.
To understand the structure of a sentence, we need a mechanism that provides the syntactic
rules. One way to do this is by writing a grammar—a set of formal rules defining sentence
structure. However, manually defining grammar rules is challenging due to the complexity and
variability of natural language.

Context-Free Grammar (CFG)


One approach to defining syntax is using Context-Free Grammar (CFG). A CFG consists of:

• Non-Terminals: Symbols that can be expanded further (e.g., S, NP, VP).

• Terminals: Words in the language (e.g., John, shirt, bought).

• Production Rules: Rules that define how non-terminals expand.

Example CFG:

S -> NP VP
NP -> 'John' | 'pockets' | D N | NP PP
VP -> V NP | VP PP
V -> 'bought'
D -> 'a'
N -> 'shirt'
PP -> P NP
P -> 'with'

This grammar allows us to generate and parse sentences such as:


John bought a shirt with pockets.
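The same grammar can be loaded into NLTK and run over this sentence; the chart parser enumerates every analysis the grammar licenses, which previews the ambiguity discussed next (a sketch, assuming nltk is installed):

import nltk

grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    NP -> 'John' | 'pockets' | D N | NP PP
    VP -> V NP | VP PP
    V  -> 'bought'
    D  -> 'a'
    N  -> 'shirt'
    PP -> P NP
    P  -> 'with'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("John bought a shirt with pockets".split()):
    print(tree)   # two trees: PP attached inside the NP vs. attached to the VP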

Ambiguity in CFGs
One major issue in parsing is ambiguity. Consider the sentence:

• John bought a shirt with pockets.

There are two possible interpretations:

• John bought a shirt that happens to have pockets.



• John used pockets (as currency) to buy a shirt.
This ambiguity arises because the prepositional phrase (with pockets) can attach to either shirt
or bought.
Similarly, consider:
• Natural language processing book.
Is it:
• A book about natural language processing? (correct interpretation)
• A processing book that is natural language? (incorrect interpretation)
With recursive rules, ambiguity increases exponentially. For instance:
• With 3 words, there are 5 possible parse trees.
• With 6 words, there are 132 possible parse trees.
Due to this exponential growth, parsing becomes computationally expensive.
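The counts quoted above (5 and 132) are Catalan numbers, C(n) = (1/(n+1)) * binom(2n, n), the standard count of binary bracketings; a two-line check confirms the figures:

from math import comb

def catalan(n):
    # number of distinct binary bracketings
    return comb(2 * n, n) // (n + 1)

print(catalan(3), catalan(6))   # 5 132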

The Role of Treebanks


To address these problems, NLP researchers use Treebanks—collections of sentences annotated
with their syntactic structure. Each sentence in a treebank has a single correct parse that has
been manually verified. This allows:
• Supervised Learning: Training statistical parsers to learn from real-world sentence structures.
• Disambiguation: Identifying the most plausible syntax for a sentence.
• Consistency: Standardizing syntactic analysis across a language.

Treebanks vs. Traditional Grammar


Unlike CFGs, treebanks do not explicitly provide grammar rules. Instead, they store syntactic
analyses of real-world sentences. This means:
• A treebank provides actual sentence structures rather than predefined rules.
• Parsers trained on treebanks generalize patterns from real-world data.
• Statistical models can predict the most likely parse for unseen sentences, as the sketch below illustrates.
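NLTK ships a small sample of the Penn Treebank, so this "learn the grammar from data" idea can be demonstrated directly: read the productions off the annotated trees and induce a PCFG from their counts (a sketch; assumes nltk is installed and the treebank sample downloaded):

import nltk
from nltk.corpus import treebank

nltk.download('treebank')   # a small sample of Penn Treebank parse trees

productions = []
for tree in treebank.parsed_sents()[:200]:
    productions += tree.productions()   # grammar rules used in each annotated parse

grammar = nltk.induce_pcfg(nltk.Nonterminal('S'), productions)
print(len(set(productions)), "distinct rules learned from 200 sentences")
print(grammar.productions(lhs=nltk.Nonterminal('NP'))[:5])   # sample NP rules with probabilities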

Treebanks and Supervised Machine Learning


Treebanks help solve two major problems in syntactic parsing:
• Finding the correct grammar: Instead of explicitly defining rules, a parser learns from correct
examples.
• Ranking ambiguous parses: The parser assigns a probability to each possible parse and picks
the highest-scoring one.
Using statistical parsing models, the parser:
• Learns how words and phrases are structured.
• Predicts the best syntactic structure for new sentences.
• Provides n-best parse trees (e.g., top 3 likely parses).



Types of Treebanks
There are two main types of syntactic representations in treebanks:

• Dependency Trees

• Phrase Structure Trees

Industry Applications of Treebanks


Treebanks power various NLP applications:

• Machine Translation: Improves sentence structure in Google Translate.

• Speech Recognition: Helps in Siri, Alexa for better speech-to-text conversion.

• Search Engines: Enhances Google Search by understanding query meaning.

• Grammar Checking: Used in Grammarly and Microsoft Word for sentence correction.

Treebanks provide a data-driven approach to syntactic parsing, solving key NLP challenges. By
using supervised learning, treebanks allow statistical parsers to handle ambiguity, complex syntax,
and free word order effectively. While they require manual effort, they are essential for machine
translation, speech recognition, search engines, and AI assistants.

Representation of Syntactic Structure: Syntax Analysis


Using Dependency Graphs
Dependency graphs are a fundamental way to represent the syntactic structure of sentences. Unlike
phrase structure trees, which focus on hierarchical grouping of words into constituents, dependency
graphs emphasize the relationships between words by linking each word (the dependent) to its
syntactic head. These links form a directed graph, representing the structure of the sentence.

Basic Concept of Dependency Graphs


A dependency graph consists of:

• Nodes, which represent words in the sentence.

• Edges, which represent syntactic dependencies between words.

Each word in a sentence, except for the root, is connected to exactly one head. The root of
the sentence is a special node, typically indicated by index 0.
For example, in the sentence:

They persuaded Mr. Trotter to take it back.

A dependency representation would look like this:


Index  Word       POS  Head  Label
1      They       PRP  2     SBJ
2      persuaded  VBD  0     ROOT
3      Mr.        NNP  4     NMOD
4      Trotter    NNP  2     IOBJ
5      to         TO   6     VMOD
6      take       VB   2     OBJ
7      it         PRP  6     OBJ
8      back       RB   6     PRT
9      .          .    2     P

Here, persuaded is the root verb, and all other words depend on it.
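In code, such a CoNLL-style analysis is simply a list of (index, word, POS, head, label) records, and the tree can be walked by grouping each head with its dependents; a minimal sketch over the table above:

rows = [
    (1, "They",      "PRP", 2, "SBJ"),
    (2, "persuaded", "VBD", 0, "ROOT"),
    (3, "Mr.",       "NNP", 4, "NMOD"),
    (4, "Trotter",   "NNP", 2, "IOBJ"),
    (5, "to",        "TO",  6, "VMOD"),
    (6, "take",      "VB",  2, "OBJ"),
    (7, "it",        "PRP", 6, "OBJ"),
    (8, "back",      "RB",  6, "PRT"),
    (9, ".",         ".",   2, "P"),
]

dependents = {}
for idx, word, pos, head, label in rows:
    dependents.setdefault(head, []).append((word, label))

print(dependents[0])   # [('persuaded', 'ROOT')] -- the root of the sentence
print(dependents[2])   # direct dependents of "persuaded": They, Trotter, take, .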

Dependency Trees vs. Phrase Structure Trees


The main difference between dependency trees and phrase structure trees is:

• Dependency trees directly link words based on their syntactic roles.

• Phrase structure trees group words into hierarchical units (e.g., noun phrases, verb phrases).

Dependency analysis avoids additional elements like placeholders or empty nodes, making it
more efficient for parsing.

Projectivity in Dependency Graphs


A dependency tree is projective if:

• The words in the sentence can be arranged in a linear order.

• No dependency edges cross when drawn above the words.

Example:

Chris saw a dog yesterday which was blind.

In this case, there are crossing dependencies between yesterday and which was blind, making the
tree nonprojective.
Some languages like Czech and Turkish exhibit more nonprojective dependencies compared to
English.
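Projectivity can also be tested mechanically: a tree is nonprojective exactly when two dependency arcs cross, i.e. one arc starts strictly inside the other's span and ends strictly outside it. A small sketch, with arcs given as (head, dependent) word-index pairs; the arc set for the example sentence is one plausible analysis:

def is_projective(arcs):
    """Return False if any two arcs (head, dependent) cross."""
    spans = [tuple(sorted(arc)) for arc in arcs]
    for i, (l1, r1) in enumerate(spans):
        for l2, r2 in spans[i + 1:]:
            # crossing: one span starts inside the other and ends outside it
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return False
    return True

# 1 Chris  2 saw  3 a  4 dog  5 yesterday  6 which  7 was  8 blind
arcs = [(2, 1), (2, 4), (4, 3), (2, 5), (4, 8), (8, 6), (8, 7)]
print(is_projective(arcs))   # False: (4, 8) "dog -> blind" crosses (2, 5) "saw -> yesterday"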

Multilingual Comparison of Dependency Structures


A study from the CoNLL 2007 shared task provides insights into the percentage of crossing
dependencies in different languages:

Language   % Crossing Dependencies   % Sentences with Nonprojectivity
Arabic     0.4                       10.1
Basque     2.9                       26.2
Catalan    0.1                       2.9
Czech      1.9                       23.2
English    0.3                       6.7
Greek      1.1                       20.3
Hungarian  2.9                       26.4
Italian    0.5                       7.4
Turkish    5.5                       33.3

Real-World Applications of Dependency Parsing

Dependency parsing is used in:

• Machine Translation: Helps understand sentence structure to improve accuracy (e.g., Google Translate).

• Speech Recognition: Identifies syntactic relationships in spoken language (e.g., Siri, Alexa).

• Search Engines: Enhances semantic search by understanding relationships between words.

• Text Summarization: Extracts key relationships to create concise summaries.

Dependency parsing provides an efficient way to analyze syntactic structures by focusing on relationships between words. It is widely used in NLP applications, particularly for languages with flexible word order.

Understanding Syntax Analysis with Phrase Structure Trees


In natural language processing, syntax analysis helps us understand the structure of sentences.
One common method is phrase structure analysis, which breaks a sentence into smaller units called
constituents. These constituents group words together based on their grammatical relationships,
forming a hierarchical tree known as a phrase structure tree.
A phrase structure tree is a graphical representation of how different parts of a sentence fit
together. It follows generative grammar principles, which help handle complex sentence structures
like long-distance relationships between words.

Example: A Simple Phrase Structure Tree


Consider the sentence:
Mr. Baker seems especially sensitive.
Its phrase structure tree looks like this:

(S (NP-SBJ (NNP Mr.)
           (NNP Baker))
   (VP (VBZ seems)
       (ADJP-PRD (RB especially)
                 (JJ sensitive))))

Here’s what this means:

• S (Sentence) is the root of the tree.

• NP-SBJ (Noun Phrase - Subject) represents Mr. Baker as the subject of the sentence.

• VP (Verb Phrase) contains seems especially sensitive as the predicate.

• ADJP-PRD (Adjective Phrase - Predicate) describes the adjective phrase especially sensitive
modifying the verb seems.

Thus, the predicate-argument structure of the sentence is:



seems((especially(sensitive))(Mr. Baker))

This structure shows that seems is the main verb, with Mr. Baker as the subject and especially
sensitive as the complement.
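Bracketed trees like this one can be loaded and inspected programmatically; for example, with NLTK's Tree class:

from nltk import Tree

t = Tree.fromstring(
    "(S (NP-SBJ (NNP Mr.) (NNP Baker))"
    "   (VP (VBZ seems) (ADJP-PRD (RB especially) (JJ sensitive))))")
t.pretty_print()     # renders the tree as ASCII art
print(t.leaves())    # ['Mr.', 'Baker', 'seems', 'especially', 'sensitive']
print([sub.label() for sub in t.subtrees()])   # every constituent label, from S down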

Phrase Structure Trees vs. Dependency Trees


The same sentence can also be analyzed using dependency trees, which directly connect words
based on grammatical roles. Unlike phrase structure trees, dependency trees do not explicitly
represent constituents like noun or verb phrases.
For example, in a dependency tree for Mr. Baker seems especially sensitive, the word seems
would be the root, with direct links to Mr. Baker and especially sensitive. However, dependency
analysis avoids direct subject-predicate links if it would cause crossing dependencies (overlapping
relationships).
Both approaches are useful:

• Phrase structure trees are better for identifying hierarchical structures and phrase boundaries.

• Dependency trees provide a more direct representation of word relationships, making them
useful for machine translation and information extraction.

Null Elements in Phrase Structure Analysis


Phrase structure trees often include null elements, which help represent missing words or long-distance dependencies in sentences. These elements play a crucial role in syntactic annotation.

Example 1: Wh-movement (Question Formation)


Consider the question:
What is Tim eating?
Its phrase structure tree includes a trace (*T*), indicating the missing direct object of eating:

(SBARQ (WHNP-1 What)
       (SQ is (NP-SBJ Tim)
           (VP eating (NP *T*-1)))
       ?)

Predicate-argument structure:

eat(Tim, what)

Here, What is originally part of the sentence Tim is eating what?, but it moves to the front, leaving
behind a trace (*T*), ensuring the correct grammatical structure.

Example 2: Passive Voice


In passive constructions, the logical subject may not appear in the expected place. Consider:
The ball was thrown by Chris.
Its phrase structure tree:



(S (NP-SBJ-1 The ball)
   (VP was (VP thrown
               (NP *-1)
               (PP by (NP-LGS Chris)))))

Predicate-argument structure:

throw(Chris, the ball)

Here, Chris is the logical subject (the one who throws the ball), while The ball appears as the
grammatical subject.

Example 3: Complex Syntax with Multiple Dependencies


In sentences with multiple dependency relations, different syntactic processes occur together.
Consider:
Who was believed to have been shot?
Phrase structure tree:

(SBARQ (WHNP-1 Who)
       (SQ was (NP-SBJ-2 *T*-1)
           (VP believed (S (NP-SBJ-3 *-2)
                           (VP to (VP have
                                      (VP been
                                          (VP shot
                                              (NP *-3))))))))
       ?)

Predicate-argument structure:

believe(*someone*, shoot(*someone*, who))

Here, Who is the object of shoot, but it is moved to the front, leaving multiple traces (*T*) in the
tree.

Phrase Structure Trees in Different Languages


Phrase structure treebanks vary across languages. Different languages use different annotation
schemes, making parsers trained on English treebanks difficult to apply to other languages.

Example: Chinese Phrase Structure Tree


In the Chinese Treebank, sentence structures are annotated differently. For example:

(IP (NP-SBJ (NP (NN /settlement and sale)
                (NN /system))
            (CC /and)
            (NP (CP (WHNP-2 (-NONE- *OP*))
                    (CP (IP (NP-SBJ (-NONE- *T*-2))
                            (VP (VA /new)))
                        (DEC )))
                (NP (NN /verification and cancellation)
                    (NN /system))))
    (VP (PP-LOC (P /in)
                (NP-PN (NR /Tibet)))
        (ADVP (AD /fully))
        (VP (VV /operating))))

Translation: A (foreign exchange) settlement and sale system and a verification and cancellation
system that is newly created is fully operational in Tibet.

Industry Applications of Phrase Structure Trees


Phrase structure analysis plays a vital role in various NLP applications:

• Machine Translation - Helps in syntax-based translation (e.g., English to Japanese).

• Speech Recognition & Text-to-Speech - Improves intonation and pauses in speech synthesis.

• Question Answering Systems - Extracts relationships between question words and answers.

• Grammar Correction & Text Summarization - Identifies incorrect sentence structures for
tools like Grammarly.

• Sentiment Analysis - Helps detect positive or negative sentiment in text.

Conclusion
Phrase structure trees provide a hierarchical view of sentence syntax, making them essential for
natural language understanding. They are crucial for machine translation, speech processing,
question answering, and sentiment analysis.
By mastering phrase structure trees, we can build better parsers, more accurate translation
systems, and smarter AI-driven language applications.



Hypergraphs and Chart Parsing
Parsing algorithms help process and understand structured data in natural language processing.
While shift-reduce parsing allows for efficient parsing in linear time, it requires an oracle to make
decisions. In cases where backtracking is necessary, the time complexity can grow exponentially in
the worst case. However, Context-Free Grammars (CFGs) provide a worst-case parsing algorithm
that runs in O(n^3), where n is the length of the input.
Many statistical parsers use chart parsing techniques, which efficiently search through possible
parse trees while overcoming the limitations of strictly left-to-right parsing.

Transformation of CFG for Efficient Parsing


Consider the example CFG G:

N → N 'and' N
N → N 'or' N
N → 'a' | 'b' | 'c'

To optimize parsing, we rewrite this grammar into a new CFG Gc by introducing two new nonterminals, N∧ and N∨:

N → N N∧
N∧ → 'and' N
N → N N∨
N∨ → 'or' N
N → 'a' | 'b' | 'c'

This transformation ensures that each right-hand side contains at most two nonterminals, making parsing more structured.

Specialized CFG for an Input Sentence


For a given input ”a and b or c”, we create a specialized CFG that encodes all possible parse trees
for this particular input.

• The input is divided into spans:

  – a → span [0,1]
  – and → span [1,2]
  – b → span [2,3]
  – or → span [3,4]
  – c → span [4,5]

The new grammar captures valid parse structures:

N[0,5] → N[0,1] N∧[1,5]
N[0,3] → N[0,1] N∧[1,3]
N∧[1,3] → 'and'[1,2] N[2,3]
N∧[1,5] → 'and'[1,2] N[2,5]
N[0,5] → N[0,3] N∨[3,5]
N[2,5] → N[2,3] N∨[3,5]
N∨[3,5] → 'or'[3,4] N[4,5]
N[0,1] → 'a'[0,1]
N[2,3] → 'b'[2,3]
N[4,5] → 'c'[4,5]
This specialized grammar compacts all possible parse trees for the input sentence, making the
parsing process more efficient.

Hypergraph Representation of Parses


A hypergraph is used to represent all possible parse trees for an input sentence in a compact form.
Instead of treating each parse tree separately, nodes with the same labels (e.g., N[0,5]) are merged
to avoid redundant computations. This optimization significantly improves the parsing efficiency.
The parsing process involves constructing paths from the start symbol to the input tokens
using specialized grammar rules.

Cocke-Kasami-Younger (CKY) Algorithm


The CKY algorithm is a bottom-up parsing technique used to construct parse trees. It systematically examines spans of increasing lengths to find valid CFG rules.

CKY Algorithm Steps


1. Initialize lexical spans using single-token rules.

2. Expand spans by applying CFG rules to previously identified spans.

3. Continue until a rule spans the entire sentence (i.e., S[0, n]).

The worst-case time complexity is O(n^3), but the number of possible parse trees can still be exponential in the worst case. To make parsing efficient, we use probabilistic scoring.
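A bare-bones CKY recognizer over a grammar in Chomsky normal form makes the span-by-span procedure concrete. The sketch below runs the conjunction grammar from earlier (the N∧/N∨ nonterminals are spelled Nand/Nor for ASCII); it is a recognizer only, with no tree recovery or probabilities:

def cky(words, lexical, binary):
    """lexical: set of (A, word) rules; binary: set of (A, B, C) rules A -> B C.
    chart[i][j] holds every nonterminal that derives words[i:j]."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                       # step 1: lexical spans
        chart[i][i + 1] = {A for (A, word) in lexical if word == w}
    for span in range(2, n + 1):                        # step 2: widen spans
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                   # split point
                for (A, B, C) in binary:
                    if B in chart[i][k] and C in chart[k][j]:
                        chart[i][j].add(A)
    return chart

lexical = {("N", "a"), ("N", "b"), ("N", "c"), ("And", "and"), ("Or", "or")}
binary  = {("N", "N", "Nand"), ("Nand", "And", "N"),
           ("N", "N", "Nor"),  ("Nor", "Or", "N")}
chart = cky("a and b or c".split(), lexical, binary)
print("N" in chart[0][5])   # True: step 3, a rule spans the entire input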



Optimized Parsing Techniques
Several techniques improve parsing efficiency:

• Viterbi-best Parse

– Selects the most likely parse tree using probabilities.

• Beam Thresholding

– Removes unlikely parses to speed up processing.

• Global Thresholding

– Filters out invalid rules that cannot be combined.

• Coarse-to-Fine Parsing

– Uses a simplified grammar first, then refines with detailed rules.

These optimizations make parsing feasible even for large-scale NLP applications.

Eisner’s Algorithm for Dependency Parsing


For dependency parsing, the Eisner algorithm provides an O(n^3) parsing method. It introduces a
split-head structure, where each word collects its left and right dependents separately.
This reduces unnecessary combinations, improving efficiency.

Eisner Algorithm Steps (Simplified Pseudocode)


initialize C[i][j] for all single-word spans
for span_length in range(2, n + 1):
    for i in range(n - span_length + 1):
        j = i + span_length
        for k in range(i, j):   # split point between left and right sub-spans
            # combine the two sub-spans into an incomplete item (adding a
            # dependency arc between the two head words) and into complete
            # items (a head that has collected all dependents on one side)
            create incomplete and complete items from C[i][k] and C[k][j]

This algorithm efficiently finds dependency structures while maintaining a compact representation.

Applications of Chart Parsing


Chart parsing is widely used in natural language processing:

• Speech Recognition

– Ensures syntactic accuracy in spoken language.

• Machine Translation

– Helps align sentence structures between languages.

• Question Answering Systems



– Parses questions to extract subject-verb-object relationships.

• Grammar Checking & Text Correction

– Tools like Grammarly use parsing for sentence validation.

Conclusion
Hypergraphs and chart parsing allow for efficient parsing in NLP by representing multiple possible
parses in a compact structure. The CKY algorithm ensures structured parsing, while advanced
techniques like beam thresholding, coarse-to-fine parsing, and A* search significantly improve
efficiency.
Parsing plays a crucial role in speech recognition, machine translation, grammar checking, and
AI-driven language models, making it a core topic in computational linguistics.

Minimum Spanning Trees and Dependency Parsing


In natural language processing, dependency parsing aims to determine how words in a sentence
are related hierarchically. A dependency tree is a directed graph where:

• Each word in a sentence is connected to another word based on grammatical relationships.

• The tree is rooted, meaning there is a single root node (often the main verb).

• There are no cycles, ensuring a well-structured dependency representation.

One efficient way to construct a dependency tree is by using the Minimum Spanning Tree (MST) algorithm. With edge weights scoring how likely each head-dependent relationship is, the parser finds the maximum-scoring spanning tree (equivalently, an MST over negated weights), thereby maximizing the likelihood of correct word relationships.

Relationship Between MST and Dependency Parsing


A minimum spanning tree is a subset of edges in a weighted, connected graph that:

1. Connects all nodes (words) in the sentence without forming a cycle.

2. Optimizes the total edge weight. In parsing, edge weights score candidate dependencies, so the parser seeks the highest-scoring (equivalently, lowest-cost) tree.

In dependency parsing, MST methods are useful because they consider all possible dependency
structures and select the one with the highest overall probability.

Example of MST in Dependency Parsing


Consider the sentence:
”John saw Mary.”
Each word can be connected with dependency edges, each having a weight (score) that indicates
the likelihood of the relationship. These scores are estimated using machine learning models
trained on large annotated datasets.



Step 1: Construct the Fully Connected Graph
Every word in the sentence is linked to every other word with directed edges, each having a score.

root → ”saw” (10)
”saw” → ”John” (30)
”saw” → ”Mary” (30)

This forms a fully connected directed graph (only the strongest candidate edges are shown here), and the goal is to find the highest-scoring tree within it.

Step 2: Select the Highest Scoring Incoming Edges


The algorithm picks the best incoming edge for each node:

• ”saw” is assigned ”root → saw” (score = 10).

• ”John” is assigned ”saw → John” (score = 30).

• ”Mary” is assigned ”saw → Mary” (score = 30).

If this selection forms a tree (no cycles), we report it as the final dependency tree.

Step 3: Handle Cycles (if present)


If the selected edges form a cycle, the algorithm:

1. Contracts the cycle into a single node.

2. Recalculates the edge weights.

3. Finds the highest scoring incoming edge for the contracted node.

4. Repeats the MST algorithm on the new graph.

Once all cycles are removed, the highest-scoring dependency tree is selected.
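The edge-selection and cycle-check steps can be sketched in a few lines. This is the core of the Chu-Liu/Edmonds procedure described above, using the scores from the example; the cycle-contraction step is omitted since this input never triggers it:

# scores[(head, dependent)]: edge weights from the example; 0 is the artificial root
scores = {(0, "saw"): 10, ("saw", "John"): 30, ("saw", "Mary"): 30}

words = ["John", "saw", "Mary"]
# Step 2: pick the highest-scoring incoming edge for every word
head = {w: max((h for h in [0] + words if h != w),
               key=lambda h: scores.get((h, w), float("-inf")))
        for w in words}
print(head)   # {'John': 'saw', 'saw': 0, 'Mary': 'saw'}

# Step 3: walk upward from each word; revisiting a word before the root means a cycle
def has_cycle(head):
    for w in head:
        seen, cur = set(), w
        while cur != 0:
            if cur in seen:
                return True
            seen.add(cur)
            cur = head[cur]
    return False

print(has_cycle(head))   # False, so the selection already forms the dependency tree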

Key Advantages of MST-Based Dependency Parsing


1. Handles Non-Projective Dependencies

• In some languages (e.g., Czech), word order is flexible, leading to crossing dependencies
(words linking non-adjacent words).
• MST-based parsing allows crossing edges, unlike traditional projective parsers.

2. Efficient for Large Sentences

• The algorithm efficiently computes dependencies even for long sentences.

3. Language Agnostic

• Works well for languages with free word order (e.g., Czech, Russian) and strict word
order (e.g., English).



Industrial Applications of MST-Based Dependency Parsing
1. Machine Translation

   • Understanding dependency structures improves word alignment in translations.
   • Example: Translating ”I love reading books” into Hindi requires correct word order.

2. Grammar Checking & Text Correction

   • Tools like Grammarly use dependency trees to detect incorrect sentence structures.
   • Example: Identifying errors in ”She loves play football” (incorrect verb usage).

3. Search Engines & Information Retrieval

   • Google and Bing use dependency parsing to improve query understanding.
   • Example: Searching ”best books recommended for AI” requires understanding that ”best” modifies ”books”.

4. Sentiment Analysis

   • Dependency trees help identify sentiment modifiers.
   • Example: ”The movie was not really great” → the word ”not” modifies ”great”, indicating a negative sentiment.

Conclusion
MST-based dependency parsing provides a powerful way to analyze sentence structure. It allows
for efficient parsing of complex sentences, supports non-projective dependencies, and is widely
used in machine translation, grammar checking, search engines, and sentiment analysis.
By applying MST algorithms, NLP models can generate accurate dependency structures, leading to better natural language understanding and processing.

