
NATURAL LANGUAGE PROCESSING

Dr. John Babu

UNIT-2
SYNTAX
Parsing in Natural Language Processing
Parsing in natural language processing (NLP) is the process of uncovering the hidden structure of a
sentence. Every sentence has an underlying structure that helps in understanding the relationships
between different words. This structure is important in many applications, such as machine
translation, question answering, and information extraction.
For example, consider the sentence: ”The cat sat on the mat.” A syntactic analysis helps in
identifying that ”the cat” is the subject, ”sat” is the verb, and ”on the mat” is the prepositional
phrase indicating location.

Predicate-Argument Structure
Predicate-Argument Structure (PAS) plays a crucial role in parsing by providing a deeper, semantically meaningful analysis of sentences beyond just syntactic parsing. PAS helps in understanding relationships between words, particularly in terms of who is doing what to whom in a sentence. This is particularly useful in semantic parsing, machine translation, question answering systems, and information extraction.
For instance, in ”SaiManesh gave Hemanth a book,” the verb ”gave” connects three arguments:
• SaiManesh (who gave?)
• Hemanth (who received?)
• a book (what was given?)
By analyzing this structure, we can extract meaningful relationships, which is useful in tasks like
semantic role labeling and information retrieval.

Levels of Syntactic Analysis


Syntactic analysis in NLP can be done at different levels:

Basic Level – Part of Speech (POS) Tagging


This involves labeling each word with its part of speech, such as noun, verb, adjective, etc.
Example:
• Sentence: ”The quick brown fox jumps over the lazy dog.”
• Tagged Output: The/DT quick/JJ brown/JJ fox/NN jumps/VBZ over/IN the/DT lazy/JJ
dog/NN.

Here, DT (Determiner), JJ (Adjective), NN (Noun), VBZ (Verb, third-person singular present), and IN (Preposition) are part-of-speech tags. POS tagging is useful in speech recognition, text-to-speech conversion, and grammar checking.
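As a quick illustration, an off-the-shelf tagger reproduces tags like those above. A minimal sketch using NLTK (an assumption: nltk is installed and its tokenizer/tagger models have been downloaded):

import nltk
# One-time model downloads (resource names may vary slightly across nltk versions)
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

tokens = nltk.word_tokenize("The quick brown fox jumps over the lazy dog.")
print(nltk.pos_tag(tokens))
# Expected (roughly): [('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'),
#   ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN'), ('.', '.')]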

Intermediate Level – Dependency Parsing


This identifies the grammatical structure of a sentence by connecting words through dependencies.
Example:

• Sentence: ”She loves ice cream.”

• Dependency Structure:

– ”loves” is the main verb.


– ”She” is the subject of ”loves.”
– ”ice cream” is the object of ”loves.”

This method helps in question answering and text summarization, where understanding word
relationships is crucial.
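A minimal sketch with spaCy recovers exactly these relations (an assumption: spaCy and its small English model en_core_web_sm are installed):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("She loves ice cream.")
for token in doc:
    # token.dep_ is the dependency label; token.head is the governing word
    print(f"{token.text:<6} --{token.dep_}--> {token.head.text}")
# Typical output: "She" is the nsubj of "loves" (the root), and "cream"
# (with compound modifier "ice") is the object of "loves".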

Advanced Level – Full Syntactic Parsing


This involves building a parse tree, which represents the full grammatical structure of a sentence.
There are two main types of parsing techniques:

• Constituency Parsing (Phrase Structure Trees): Breaks a sentence into subphrases like noun
phrases (NP) and verb phrases (VP).

• Dependency Parsing: Connects words based on grammatical dependencies.

Ambiguity in Parsing
Consider the sentence ”The boy saw the man with a telescope.” The ambiguity here is: did the boy use the telescope, or did the man have the telescope?
This type of ambiguity makes parsing difficult because multiple possible structures exist, and algorithms must choose the most plausible one. Ambiguity is a major challenge in parsing because multiple interpretations can exist for the same sentence. This issue arises in:

• Lexical Ambiguity (words with multiple meanings, e.g., ”bank” as a financial institution or
riverbank)

• Structural Ambiguity (different grammatical structures, e.g., ”I saw the man with the telescope”)

• Attachment Ambiguity (where to attach modifiers, e.g., ”old men and women” could mean
both are old or only men are old)

Because of this, parsing algorithms must be designed carefully to resolve ambiguity efficiently.



Dealing with Ambiguity Using Machine Learning
Since ambiguity creates multiple possible interpretations, machine learning methods are used to
select the most likely structure. Supervised learning (where models are trained on labeled data)
is commonly used in parsing. Some approaches include:
• Statistical Parsing: Uses probabilities to choose the most likely parse tree.
• Neural Network-Based Parsing: Uses deep learning models to understand sentence structures.
• Transformer Models (like BERT, GPT): Learn sentence patterns based on vast amounts of
data.
These techniques improve accuracy in applications like voice assistants (Siri, Google Assistant)
and chatbots by helping them understand complex sentence structures.
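To make the statistical idea concrete, the sketch below builds a toy probabilistic CFG in NLTK and asks its Viterbi parser for the single most probable tree of an ambiguous sentence. The grammar and rule probabilities are invented for illustration; in practice they would be estimated from a treebank.

import nltk

grammar = nltk.PCFG.fromstring("""
    S  -> NP VP        [1.0]
    NP -> 'I'          [0.4]
    NP -> Det N        [0.4]
    NP -> NP PP        [0.2]
    VP -> V NP         [0.6]
    VP -> VP PP        [0.4]
    PP -> P NP         [1.0]
    Det -> 'the'       [1.0]
    N  -> 'man'        [0.5]
    N  -> 'telescope'  [0.5]
    V  -> 'saw'        [1.0]
    P  -> 'with'       [1.0]
""")

parser = nltk.ViterbiParser(grammar)
for tree in parser.parse("I saw the man with the telescope".split()):
    print(tree)   # the highest-probability parse, annotated with its probability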
Parsing plays a crucial role in NLP by helping computers understand language structure.
From basic POS tagging to full syntactic parsing, different levels of analysis help in a variety of
applications. Ambiguity makes parsing challenging, but modern machine learning techniques help
resolve these issues efficiently. By understanding these concepts, students can appreciate how
NLP enables real-world applications such as machine translation, question answering, and text
summarization.

Parsing in Natural Language Processing (NLP)


Parsing in natural language processing (NLP) is the process of analyzing the grammatical structure
of a sentence to determine its meaning and organization. It helps computers understand and
process human language more accurately by identifying relationships between words and phrases.

Application in Text-to-Speech (TTS) Systems


One of the important applications of parsing is in text-to-speech (TTS) systems. When converting
written text into spoken words, a TTS system must ensure that the output sounds natural, just
as a native speaker would pronounce it. Consider the sentences:
• He wanted to go for a drive-in movie.
• He wanted to go for a drive in the country.
In the first sentence, drive-in is a compound noun (a type of movie theater where people watch
from their cars), while in the second sentence, drive is a noun, and in the country is a prepositional
phrase. In spoken language, there is a natural pause between drive and in in the second sentence,
whereas in the first sentence, the words are spoken together as one unit. Parsing helps identify
such structural differences, ensuring correct intonation in TTS systems.

Part-of-Speech (POS) Tagging


Another challenge in NLP is part-of-speech (POS) tagging, which assigns the correct grammatical
category (noun, verb, adjective, etc.) to each word in a sentence. For example, in the sentence:
• The cat who lives dangerously had nine lives.
Here, lives appears twice but has different meanings: in who lives dangerously, lives is a verb,
while in had nine lives, lives is a noun. A TTS system must correctly identify these roles to
produce the right pronunciation and rhythm.
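A trained tagger resolves the two occurrences from context. A small check with NLTK (same assumptions as the earlier tagging sketch) should label the first lives as a verb and the second as a plural noun:

import nltk

tokens = nltk.word_tokenize("The cat who lives dangerously had nine lives.")
for word, tag in nltk.pos_tag(tokens):
    if word == "lives":
        print(word, tag)
# Expected: VBZ for the first occurrence (verb), NNS for the second (plural noun)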



Parsing in Text Summarization
Parsing is also essential in text summarization, where long documents need to be condensed into
a shorter, meaningful summary. For instance, given the sentence:

• Beyond the basic level, the operations of the three products vary widely.

A summarization system may reduce this to:

• The operations of the products vary.

The parse tree systematically breaks down the sentence into its grammatical components, which include noun phrases (NP), verb phrases (VP), prepositional phrases (PP), and other syntactic units.

Breakdown of the Parse Tree


1. Sentence (S): The root of the tree represents the entire sentence.

2. Prepositional Phrase (PP): The phrase Beyond the basic level is a prepositional phrase that
acts as an adverbial modifier.

3. Noun Phrase (NP): The subject of the sentence, The operations of the three products, is a
noun phrase consisting of a determiner (the) and a plural noun (operations) followed by a
prepositional phrase (of the three products).

4. Verb Phrase (VP): The main action in the sentence is captured in the verb phrase, where
vary is the verb, and widely is an adverb modifying it.

5. Prepositional Phrase within NP (PP): The phrase of the three products further specifies the
noun operations. It consists of a preposition (of ) followed by another noun phrase (the three
products), which includes a determiner (the), a cardinal number (three), and a plural noun
(products).

6. Adverbial Phrase (ADVP): The adverb widely functions as an adverbial phrase, modifying
the verb vary.



Sentence Compression
To generate the more concise sentence:

• The operations of the products vary.

Certain elements are removed from the parse tree:

• Prepositional Phrase (PP) - ”Beyond the basic level”: This phrase does not change the
essential meaning of the sentence and can be omitted without loss of fluency.

• Cardinal Number (CD) - ”three”: The specific number of products is unnecessary for the
summary and is removed.

• Adverbial Phrase (ADVP) - ”widely”: Though it provides additional detail, it is not crucial
to the core meaning of the sentence.

Role of Parsing in Summarization


Parsing helps in understanding the sentence structure and identifying removable constituents while
maintaining grammatical correctness and coherence. By removing non-essential elements from the
parse tree, a compression model ensures that the resulting sentence remains fluent and meaningful.
This approach is widely used in text summarization, where long sentences or paragraphs need
to be condensed while preserving key information.
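The pruning step can be sketched directly on a parse tree. The toy function below, written against NLTK's Tree class, deletes exactly the constituents listed above (the sentence-level PP, the cardinal number, and the adverbial phrase) and reads the compressed sentence off the remaining leaves. The bracketed parse is hand-built and simplified; this illustrates the idea rather than a production compression model.

from nltk import Tree

def prune(tree):
    """Drop CD and ADVP nodes everywhere, and PP nodes directly under S."""
    children = []
    for child in tree:
        if isinstance(child, Tree):
            lab = child.label()
            if lab in ("CD", "ADVP") or (lab == "PP" and tree.label() == "S"):
                continue
            children.append(prune(child))
        else:
            children.append(child)
    return Tree(tree.label(), children)

t = Tree.fromstring(
    "(S (PP (IN Beyond) (NP (DT the) (JJ basic) (NN level)))"
    " (NP (DT The) (NNS operations)"
    " (PP (IN of) (NP (DT the) (CD three) (NNS products))))"
    " (VP (VBP vary) (ADVP (RB widely))))")
print(" ".join(prune(t).leaves()))   # -> The operations of the products vary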

Paraphrasing with Parsing


Paraphrasing is another important application of parsing, where sentences are rewritten in different
ways while preserving their original meaning. Consider the sentence:

• Open borders imply increasing racial fragmentation in EUROPEAN COUNTRIES.

This sentence can be rewritten in multiple ways:

• Open borders imply increasing racial fragmentation in the countries of Europe.

• Open borders imply increasing racial fragmentation in European states.

• Open borders imply increasing racial fragmentation in Europe.

Parsing ensures that such substitutions are meaningful and grammatically correct rather than
random word replacements that could lead to awkward or incorrect sentences.

Other Applications of Parsing


Beyond these applications, syntactic parsers are widely used in:

• Machine Translation – Translating text between languages while preserving grammatical structure.

• Information Extraction – Identifying key details from large text collections.

• Speech Recognition – Improving the accuracy of speech-to-text systems, especially in cases of unclear or error-prone speech.



• Dialogue Systems – Enhancing chatbot and virtual assistant responses.

• Knowledge Acquisition – Extracting semantic relationships (e.g., recognizing that ”dog is-a
animal”).

Parsing plays a fundamental role in many modern NLP tasks, enabling more accurate and
fluent language processing.

Treebanks: A Data-Driven Approach to Syntax


Understanding Treebanks and Parsing
Parsing is the process of analyzing the syntactic structure of a given sentence. This is crucial
in NLP applications such as machine translation, information retrieval, and speech recognition.
However, natural language is inherently ambiguous, making parsing a complex task.
To understand the structure of a sentence, we need a mechanism that provides the syntactic
rules. One way to do this is by writing a grammar—a set of formal rules defining sentence
structure. However, manually defining grammar rules is challenging due to the complexity and
variability of natural language.

Context-Free Grammar (CFG)


One approach to defining syntax is using Context-Free Grammar (CFG). A CFG consists of:

• Non-Terminals: Symbols that can be expanded further (e.g., S, NP, VP).

• Terminals: Words in the language (e.g., John, shirt, bought).

• Production Rules: Rules that define how non-terminals expand.

Example CFG:

S -> NP VP
NP -> 'John' | 'pockets' | D N | NP PP
VP -> V NP | VP PP
V -> 'bought'
D -> 'a'
N -> 'shirt'
PP -> P NP
P -> 'with'

This grammar allows us to generate and parse sentences such as:


John bought a shirt with pockets.
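The same grammar can be loaded into NLTK and run over this sentence; the chart parser enumerates every analysis the grammar licenses, which previews the ambiguity discussed next (a sketch, assuming nltk is installed):

import nltk

grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    NP -> 'John' | 'pockets' | D N | NP PP
    VP -> V NP | VP PP
    V  -> 'bought'
    D  -> 'a'
    N  -> 'shirt'
    PP -> P NP
    P  -> 'with'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("John bought a shirt with pockets".split()):
    print(tree)   # two trees: PP attached inside the NP vs. attached to the VP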

Ambiguity in CFGs
One major issue in parsing is ambiguity. Consider the sentence:

• John bought a shirt with pockets.

There are two possible interpretations:

• John bought a shirt that happens to have pockets.



• John used pockets (as currency) to buy a shirt.
This ambiguity arises because the prepositional phrase (with pockets) can attach to either shirt
or bought.
Similarly, consider:
• Natural language processing book.
Is it:
• A book about natural language processing? (correct interpretation)
• A processing book that is natural language? (incorrect interpretation)
With recursive rules, ambiguity increases exponentially. For instance:
• With 3 words, there are 5 possible parse trees.
• With 6 words, there are 132 possible parse trees.
Due to this exponential growth, parsing becomes computationally expensive.
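The counts quoted above (5 and 132) are Catalan numbers, C(n) = (1/(n+1)) * binom(2n, n), the standard count of binary bracketings; a two-line check confirms the figures:

from math import comb

def catalan(n):
    # number of distinct binary bracketings
    return comb(2 * n, n) // (n + 1)

print(catalan(3), catalan(6))   # 5 132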

The Role of Treebanks


To address these problems, NLP researchers use Treebanks—collections of sentences annotated
with their syntactic structure. Each sentence in a treebank has a single correct parse that has
been manually verified. This allows:
• Supervised Learning: Training statistical parsers to learn from real-world sentence structures.
• Disambiguation: Identifying the most plausible syntax for a sentence.
• Consistency: Standardizing syntactic analysis across a language.

Treebanks vs. Traditional Grammar


Unlike CFGs, treebanks do not explicitly provide grammar rules. Instead, they store syntactic
analyses of real-world sentences. This means:
• A treebank provides actual sentence structures rather than predefined rules.
• Parsers trained on treebanks generalize patterns from real-world data.
• Statistical models can predict the most likely parse for unseen sentences, as the sketch below illustrates.
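NLTK ships a small sample of the Penn Treebank, so this "learn the grammar from data" idea can be demonstrated directly: read the productions off the annotated trees and induce a PCFG from their counts (a sketch; assumes nltk is installed and the treebank sample downloaded):

import nltk
from nltk.corpus import treebank

nltk.download('treebank')   # a small sample of Penn Treebank parse trees

productions = []
for tree in treebank.parsed_sents()[:200]:
    productions += tree.productions()   # grammar rules used in each annotated parse

grammar = nltk.induce_pcfg(nltk.Nonterminal('S'), productions)
print(len(set(productions)), "distinct rules learned from 200 sentences")
print(grammar.productions(lhs=nltk.Nonterminal('NP'))[:5])   # sample NP rules with probabilities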

Treebanks and Supervised Machine Learning


Treebanks help solve two major problems in syntactic parsing:
• Finding the correct grammar: Instead of explicitly defining rules, a parser learns from correct
examples.
• Ranking ambiguous parses: The parser assigns a probability to each possible parse and picks
the highest-scoring one.
Using statistical parsing models, the parser:
• Learns how words and phrases are structured.
• Predicts the best syntactic structure for new sentences.
• Provides n-best parse trees (e.g., top 3 likely parses).



Types of Treebanks
There are two main types of syntactic representations in treebanks:

• Dependency Trees

• Phrase Structure Trees

Industry Applications of Treebanks


Treebanks power various NLP applications:

• Machine Translation: Improves sentence structure in Google Translate.

• Speech Recognition: Helps in Siri, Alexa for better speech-to-text conversion.

• Search Engines: Enhances Google Search by understanding query meaning.

• Grammar Checking: Used in Grammarly and Microsoft Word for sentence correction.

Treebanks provide a data-driven approach to syntactic parsing, solving key NLP challenges. By
using supervised learning, treebanks allow statistical parsers to handle ambiguity, complex syntax,
and free word order effectively. While they require manual effort, they are essential for machine
translation, speech recognition, search engines, and AI assistants.

Representation of Syntactic Structure: Syntax Analysis


Using Dependency Graphs
Dependency graphs are a fundamental way to represent the syntactic structure of sentences. Unlike
phrase structure trees, which focus on hierarchical grouping of words into constituents, dependency
graphs emphasize the relationships between words by linking each word (the dependent) to its
syntactic head. These links form a directed graph, representing the structure of the sentence.

Basic Concept of Dependency Graphs


A dependency graph consists of:

• Nodes, which represent words in the sentence.

• Edges, which represent syntactic dependencies between words.

Each word in a sentence, except for the root, is connected to exactly one head. The root of
the sentence is a special node, typically indicated by index 0.
For example, in the sentence:

They persuaded Mr. Trotter to take it back.

A dependency representation would look like this:


Index  Word       POS  Head  Label
1      They       PRP  2     SBJ
2      persuaded  VBD  0     ROOT
3      Mr.        NNP  4     NMOD
4      Trotter    NNP  2     IOBJ
5      to         TO   6     VMOD
6      take       VB   2     OBJ
7      it         PRP  6     OBJ
8      back       RB   6     PRT
9      .          .    2     P

Here, persuaded is the root verb, and all other words depend on it.
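In code, such a CoNLL-style analysis is simply a list of (index, word, POS, head, label) records, and the tree can be walked by grouping each head with its dependents; a minimal sketch over the table above:

rows = [
    (1, "They",      "PRP", 2, "SBJ"),
    (2, "persuaded", "VBD", 0, "ROOT"),
    (3, "Mr.",       "NNP", 4, "NMOD"),
    (4, "Trotter",   "NNP", 2, "IOBJ"),
    (5, "to",        "TO",  6, "VMOD"),
    (6, "take",      "VB",  2, "OBJ"),
    (7, "it",        "PRP", 6, "OBJ"),
    (8, "back",      "RB",  6, "PRT"),
    (9, ".",         ".",   2, "P"),
]

dependents = {}
for idx, word, pos, head, label in rows:
    dependents.setdefault(head, []).append((word, label))

print(dependents[0])   # [('persuaded', 'ROOT')] -- the root of the sentence
print(dependents[2])   # direct dependents of "persuaded": They, Trotter, take, .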

Dependency Trees vs. Phrase Structure Trees


The main difference between dependency trees and phrase structure trees is:

• Dependency trees directly link words based on their syntactic roles.

• Phrase structure trees group words into hierarchical units (e.g., noun phrases, verb phrases).

Dependency analysis avoids additional elements like placeholders or empty nodes, making it
more efficient for parsing.

Projectivity in Dependency Graphs


A dependency tree is projective if:

• The words in the sentence can be arranged in a linear order.

• No dependency edges cross when drawn above the words.

Example:

Chris saw a dog yesterday which was blind.

In this case, there are crossing dependencies between yesterday and which was blind, making the
tree nonprojective.
Some languages like Czech and Turkish exhibit more nonprojective dependencies compared to
English.
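Projectivity can also be tested mechanically: a tree is nonprojective exactly when two dependency arcs cross, i.e. one arc starts strictly inside the other's span and ends strictly outside it. A small sketch, with arcs given as (head, dependent) word-index pairs; the arc set for the example sentence is one plausible analysis:

def is_projective(arcs):
    """Return False if any two arcs (head, dependent) cross."""
    spans = [tuple(sorted(arc)) for arc in arcs]
    for i, (l1, r1) in enumerate(spans):
        for l2, r2 in spans[i + 1:]:
            # crossing: one span starts inside the other and ends outside it
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return False
    return True

# 1 Chris  2 saw  3 a  4 dog  5 yesterday  6 which  7 was  8 blind
arcs = [(2, 1), (2, 4), (4, 3), (2, 5), (4, 8), (8, 6), (8, 7)]
print(is_projective(arcs))   # False: (4, 8) "dog -> blind" crosses (2, 5) "saw -> yesterday"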

Multilingual Comparison of Dependency Structures


A study from the CoNLL 2007 shared task provides insights into the percentage of crossing
dependencies in different languages:

Language   % Crossing Dependencies   % Sentences with Nonprojectivity
Arabic     0.4                       10.1
Basque     2.9                       26.2
Catalan    0.1                       2.9
Czech      1.9                       23.2
English    0.3                       6.7
Greek      1.1                       20.3
Hungarian  2.9                       26.4
Italian    0.5                       7.4
Turkish    5.5                       33.3

Real-World Applications of Dependency Parsing

Dependency parsing is used in:

• Machine Translation: Helps understand sentence structure to improve accuracy (e.g., Google Translate).

• Speech Recognition: Identifies syntactic relationships in spoken language (e.g., Siri, Alexa).

• Search Engines: Enhances semantic search by understanding relationships between words.

• Text Summarization: Extracts key relationships to create concise summaries.

Dependency parsing provides an efficient way to analyze syntactic structures by focusing on relationships between words. It is widely used in NLP applications, particularly for languages with flexible word order.

Understanding Syntax Analysis with Phrase Structure Trees


In natural language processing, syntax analysis helps us understand the structure of sentences.
One common method is phrase structure analysis, which breaks a sentence into smaller units called
constituents. These constituents group words together based on their grammatical relationships,
forming a hierarchical tree known as a phrase structure tree.
A phrase structure tree is a graphical representation of how different parts of a sentence fit
together. It follows generative grammar principles, which help handle complex sentence structures
like long-distance relationships between words.

Example: A Simple Phrase Structure Tree


Consider the sentence:
Mr. Baker seems especially sensitive.
Its phrase structure tree looks like this:

(S (NP-SBJ (NNP Mr.)
           (NNP Baker))
   (VP (VBZ seems)
       (ADJP-PRD (RB especially)
                 (JJ sensitive))))

Here’s what this means:

• S (Sentence) is the root of the tree.

• NP-SBJ (Noun Phrase - Subject) represents Mr. Baker as the subject of the sentence.

• VP (Verb Phrase) contains seems especially sensitive as the predicate.

• ADJP-PRD (Adjective Phrase - Predicate) describes the adjective phrase especially sensitive
modifying the verb seems.

Thus, the predicate-argument structure of the sentence is:



seems((especially(sensitive))(Mr. Baker))

This structure shows that seems is the main verb, with Mr. Baker as the subject and especially
sensitive as the complement.
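Bracketed trees like this one can be loaded and inspected programmatically; for example, with NLTK's Tree class:

from nltk import Tree

t = Tree.fromstring(
    "(S (NP-SBJ (NNP Mr.) (NNP Baker))"
    "   (VP (VBZ seems) (ADJP-PRD (RB especially) (JJ sensitive))))")
t.pretty_print()     # renders the tree as ASCII art
print(t.leaves())    # ['Mr.', 'Baker', 'seems', 'especially', 'sensitive']
print([sub.label() for sub in t.subtrees()])   # every constituent label, from S down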

Phrase Structure Trees vs. Dependency Trees


The same sentence can also be analyzed using dependency trees, which directly connect words
based on grammatical roles. Unlike phrase structure trees, dependency trees do not explicitly
represent constituents like noun or verb phrases.
For example, in a dependency tree for Mr. Baker seems especially sensitive, the word seems
would be the root, with direct links to Mr. Baker and especially sensitive. However, dependency
analysis avoids direct subject-predicate links if it would cause crossing dependencies (overlapping
relationships).
Both approaches are useful:

• Phrase structure trees are better for identifying hierarchical structures and phrase boundaries.

• Dependency trees provide a more direct representation of word relationships, making them
useful for machine translation and information extraction.

Null Elements in Phrase Structure Analysis


Phrase structure trees often include null elements, which help represent missing words or long-distance dependencies in sentences. These elements play a crucial role in syntactic annotation.

Example 1: Wh-movement (Question Formation)


Consider the question:
What is Tim eating?
Its phrase structure tree includes a trace (*T*), indicating the missing direct object of eating:

(SBARQ (WHNP-1 What)
       (SQ is (NP-SBJ Tim)
           (VP eating (NP *T*-1)))
       ?)

Predicate-argument structure:

eat(Tim, what)

Here, What is originally part of the sentence Tim is eating what?, but it moves to the front, leaving
behind a trace (*T*), ensuring the correct grammatical structure.

Example 2: Passive Voice


In passive constructions, the logical subject may not appear in the expected place. Consider:
The ball was thrown by Chris.
Its phrase structure tree:



(S (NP-SBJ-1 The ball)
   (VP was (VP thrown
               (NP *-1)
               (PP by (NP-LGS Chris)))))

Predicate-argument structure:

throw(Chris, the ball)

Here, Chris is the logical subject (the one who throws the ball), while The ball appears as the
grammatical subject.

Example 3: Complex Syntax with Multiple Dependencies


In sentences with multiple dependency relations, different syntactic processes occur together.
Consider:
Who was believed to have been shot?
Phrase structure tree:

(SBARQ (WHNP-1 Who)
       (SQ was (NP-SBJ-2 *T*-1)
           (VP believed (S (NP-SBJ-3 *-2)
                           (VP to (VP have
                                      (VP been
                                          (VP shot
                                              (NP *-3))))))))
       ?)

Predicate-argument structure:

believe(*someone*, shoot(*someone*, who))

Here, Who is the object of shoot, but it is moved to the front, leaving multiple traces (*T*) in the
tree.

Phrase Structure Trees in Different Languages


Phrase structure treebanks vary across languages. Different languages use different annotation
schemes, making parsers trained on English treebanks difficult to apply to other languages.

Example: Chinese Phrase Structure Tree


In the Chinese Treebank, sentence structures are annotated differently. For example:

(IP (NP-SBJ (NP (NN /settlement and sale)
                (NN /system))
            (CC /and)
            (NP (CP (WHNP-2 (-NONE- *OP*))
                    (CP (IP (NP-SBJ (-NONE- *T*-2))
                            (VP (VA /new)))
                        (DEC )))
                (NP (NN /verification and cancellation)
                    (NN /system))))
    (VP (PP-LOC (P /in)
                (NP-PN (NR /Tibet)))
        (ADVP (AD /fully))
        (VP (VV /operating))))

Translation: A (foreign exchange) settlement and sale system and a verification and cancellation
system that is newly created is fully operational in Tibet.

Industry Applications of Phrase Structure Trees


Phrase structure analysis plays a vital role in various NLP applications:

• Machine Translation - Helps in syntax-based translation (e.g., English to Japanese).

• Speech Recognition & Text-to-Speech - Improves intonation and pauses in speech synthesis.

• Question Answering Systems - Extracts relationships between question words and answers.

• Grammar Correction & Text Summarization - Identifies incorrect sentence structures for
tools like Grammarly.

• Sentiment Analysis - Helps detect positive or negative sentiment in text.

Conclusion
Phrase structure trees provide a hierarchical view of sentence syntax, making them essential for
natural language understanding. They are crucial for machine translation, speech processing,
question answering, and sentiment analysis.
By mastering phrase structure trees, we can build better parsers, more accurate translation
systems, and smarter AI-driven language applications.



Hypergraphs and Chart Parsing
Parsing algorithms help process and understand structured data in natural language processing.
While shift-reduce parsing allows for efficient parsing in linear time, it requires an oracle to make
decisions. In cases where backtracking is necessary, the time complexity can grow exponentially in
the worst case. However, Context-Free Grammars (CFGs) provide a worst-case parsing algorithm
that runs in O(n^3), where n is the length of the input.
Many statistical parsers use chart parsing techniques, which efficiently search through possible
parse trees while overcoming the limitations of strictly left-to-right parsing.

Transformation of CFG for Efficient Parsing


Consider the example CFG G:

N → N 'and' N
N → N 'or' N
N → 'a' | 'b' | 'c'

To optimize parsing, we rewrite this grammar into a new CFG Gc by introducing two new nonterminals, N∧ and N∨:

N → N N∧
N∧ → 'and' N
N → N N∨
N∨ → 'or' N
N → 'a' | 'b' | 'c'

This transformation ensures that each right-hand side contains at most two nonterminals, making parsing more structured.

Specialized CFG for an Input Sentence


For a given input ”a and b or c”, we create a specialized CFG that encodes all possible parse trees
for this particular input.

• The input is divided into spans:

  – a → span [0,1]
  – and → span [1,2]
  – b → span [2,3]
  – or → span [3,4]
  – c → span [4,5]

The new grammar captures valid parse structures:

N[0,5] → N[0,1] N∧[1,5]
N[0,3] → N[0,1] N∧[1,3]
N∧[1,3] → 'and'[1,2] N[2,3]
N∧[1,5] → 'and'[1,2] N[2,5]
N[0,5] → N[0,3] N∨[3,5]
N[2,5] → N[2,3] N∨[3,5]
N∨[3,5] → 'or'[3,4] N[4,5]
N[0,1] → 'a'[0,1]
N[2,3] → 'b'[2,3]
N[4,5] → 'c'[4,5]
This specialized grammar compacts all possible parse trees for the input sentence, making the
parsing process more efficient.

Hypergraph Representation of Parses


A hypergraph is used to represent all possible parse trees for an input sentence in a compact form.
Instead of treating each parse tree separately, nodes with the same labels (e.g., N[0,5]) are merged
to avoid redundant computations. This optimization significantly improves the parsing efficiency.
The parsing process involves constructing paths from the start symbol to the input tokens
using specialized grammar rules.

Cocke-Kasami-Younger (CKY) Algorithm


The CKY algorithm is a bottom-up parsing technique used to construct parse trees. It systematically examines spans of increasing lengths to find valid CFG rules.

CKY Algorithm Steps


1. Initialize lexical spans using single-token rules.

2. Expand spans by applying CFG rules to previously identified spans.

3. Continue until a rule spans the entire sentence (i.e., S[0, n]).

The worst-case time complexity is O(n^3), but the number of possible parse trees can still be exponential in the worst case. To make parsing efficient, we use probabilistic scoring.
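A bare-bones CKY recognizer over a grammar in Chomsky normal form makes the span-by-span procedure concrete. The sketch below runs the conjunction grammar from earlier (the N∧/N∨ nonterminals are spelled Nand/Nor for ASCII); it is a recognizer only, with no tree recovery or probabilities:

def cky(words, lexical, binary):
    """lexical: set of (A, word) rules; binary: set of (A, B, C) rules A -> B C.
    chart[i][j] holds every nonterminal that derives words[i:j]."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                       # step 1: lexical spans
        chart[i][i + 1] = {A for (A, word) in lexical if word == w}
    for span in range(2, n + 1):                        # step 2: widen spans
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                   # split point
                for (A, B, C) in binary:
                    if B in chart[i][k] and C in chart[k][j]:
                        chart[i][j].add(A)
    return chart

lexical = {("N", "a"), ("N", "b"), ("N", "c"), ("And", "and"), ("Or", "or")}
binary  = {("N", "N", "Nand"), ("Nand", "And", "N"),
           ("N", "N", "Nor"),  ("Nor", "Or", "N")}
chart = cky("a and b or c".split(), lexical, binary)
print("N" in chart[0][5])   # True: step 3, a rule spans the entire input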



Optimized Parsing Techniques
Several techniques improve parsing efficiency:

• Viterbi-best Parse

– Selects the most likely parse tree using probabilities.

• Beam Thresholding

– Removes unlikely parses to speed up processing.

• Global Thresholding

– Filters out invalid rules that cannot be combined.

• Coarse-to-Fine Parsing

– Uses a simplified grammar first, then refines with detailed rules.

These optimizations make parsing feasible even for large-scale NLP applications.

Eisner’s Algorithm for Dependency Parsing


For dependency parsing, the Eisner algorithm provides an O(n^3) parsing method. It introduces a
split-head structure, where each word collects its left and right dependents separately.
This reduces unnecessary combinations, improving efficiency.

Eisner Algorithm Steps (Simplified Pseudocode)


initialize C[i][j] for all single-word spans
for span_length in range(2, n + 1):
    for i in range(n - span_length + 1):
        j = i + span_length
        for k in range(i, j):   # split point between left and right sub-spans
            # combine the two sub-spans into an incomplete item (adding a
            # dependency arc between the two head words) and into complete
            # items (a head that has collected all dependents on one side)
            create incomplete and complete items from C[i][k] and C[k][j]

This algorithm efficiently finds dependency structures while maintaining a compact representation.

Applications of Chart Parsing


Chart parsing is widely used in natural language processing:

• Speech Recognition

– Ensures syntactic accuracy in spoken language.

• Machine Translation

– Helps align sentence structures between languages.

• Question Answering Systems



– Parses questions to extract subject-verb-object relationships.

• Grammar Checking & Text Correction

– Tools like Grammarly use parsing for sentence validation.

Conclusion
Hypergraphs and chart parsing allow for efficient parsing in NLP by representing multiple possible
parses in a compact structure. The CKY algorithm ensures structured parsing, while advanced
techniques like beam thresholding, coarse-to-fine parsing, and A* search significantly improve
efficiency.
Parsing plays a crucial role in speech recognition, machine translation, grammar checking, and
AI-driven language models, making it a core topic in computational linguistics.

Minimum Spanning Trees and Dependency Parsing


In natural language processing, dependency parsing aims to determine how words in a sentence
are related hierarchically. A dependency tree is a directed graph where:

• Each word in a sentence is connected to another word based on grammatical relationships.

• The tree is rooted, meaning there is a single root node (often the main verb).

• There are no cycles, ensuring a well-structured dependency representation.

One efficient way to construct a dependency tree is by using the Minimum Spanning Tree (MST) algorithm. With edge weights scoring how likely each head-dependent relationship is, the parser finds the maximum-scoring spanning tree (equivalently, an MST over negated weights), thereby maximizing the likelihood of correct word relationships.

Relationship Between MST and Dependency Parsing


A minimum spanning tree is a subset of edges in a weighted, connected graph that:

1. Connects all nodes (words) in the sentence without forming a cycle.

2. Optimizes the total edge weight. In parsing, edge weights score candidate dependencies, so the parser seeks the highest-scoring (equivalently, lowest-cost) tree.

In dependency parsing, MST methods are useful because they consider all possible dependency
structures and select the one with the highest overall probability.

Example of MST in Dependency Parsing


Consider the sentence:
”John saw Mary.”
Each word can be connected with dependency edges, each having a weight (score) that indicates
the likelihood of the relationship. These scores are estimated using machine learning models
trained on large annotated datasets.



Step 1: Construct the Fully Connected Graph
Every word in the sentence is linked to every other word with directed edges, each having a score.

root → ”saw” (10)
”saw” → ”John” (30)
”saw” → ”Mary” (30)

This forms a fully connected directed graph (only the strongest candidate edges are shown here), and the goal is to find the highest-scoring tree within it.

Step 2: Select the Highest Scoring Incoming Edges


The algorithm picks the best incoming edge for each node:

• ”saw” is assigned ”root → saw” (score = 10).

• ”John” is assigned ”saw → John” (score = 30).

• ”Mary” is assigned ”saw → Mary” (score = 30).

If this selection forms a tree (no cycles), we report it as the final dependency tree.

Step 3: Handle Cycles (if present)


If the selected edges form a cycle, the algorithm:

1. Contracts the cycle into a single node.

2. Recalculates the edge weights.

3. Finds the highest scoring incoming edge for the contracted node.

4. Repeats the MST algorithm on the new graph.

Once all cycles are removed, the highest-scoring dependency tree is selected.
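The edge-selection and cycle-check steps can be sketched in a few lines. This is the core of the Chu-Liu/Edmonds procedure described above, using the scores from the example; the cycle-contraction step is omitted since this input never triggers it:

# scores[(head, dependent)]: edge weights from the example; 0 is the artificial root
scores = {(0, "saw"): 10, ("saw", "John"): 30, ("saw", "Mary"): 30}

words = ["John", "saw", "Mary"]
# Step 2: pick the highest-scoring incoming edge for every word
head = {w: max((h for h in [0] + words if h != w),
               key=lambda h: scores.get((h, w), float("-inf")))
        for w in words}
print(head)   # {'John': 'saw', 'saw': 0, 'Mary': 'saw'}

# Step 3: walk upward from each word; revisiting a word before the root means a cycle
def has_cycle(head):
    for w in head:
        seen, cur = set(), w
        while cur != 0:
            if cur in seen:
                return True
            seen.add(cur)
            cur = head[cur]
    return False

print(has_cycle(head))   # False, so the selection already forms the dependency tree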

Key Advantages of MST-Based Dependency Parsing


1. Handles Non-Projective Dependencies

• In some languages (e.g., Czech), word order is flexible, leading to crossing dependencies
(words linking non-adjacent words).
• MST-based parsing allows crossing edges, unlike traditional projective parsers.

2. Efficient for Large Sentences

• The algorithm efficiently computes dependencies even for long sentences.

3. Language Agnostic

• Works well for languages with free word order (e.g., Czech, Russian) and strict word
order (e.g., English).



Industrial Applications of MST-Based Dependency Parsing
1. Machine Translation

   • Understanding dependency structures improves word alignment in translations.
   • Example: Translating ”I love reading books” into Hindi requires correct word order.

2. Grammar Checking & Text Correction

   • Tools like Grammarly use dependency trees to detect incorrect sentence structures.
   • Example: Identifying errors in ”She loves play football” (incorrect verb usage).

3. Search Engines & Information Retrieval

   • Google and Bing use dependency parsing to improve query understanding.
   • Example: Searching ”best books recommended for AI” requires understanding that ”best” modifies ”books”.

4. Sentiment Analysis

   • Dependency trees help identify sentiment modifiers.
   • Example: ”The movie was not really great” → the word ”not” modifies ”great”, indicating a negative sentiment.

Conclusion
MST-based dependency parsing provides a powerful way to analyze sentence structure. It allows
for efficient parsing of complex sentences, supports non-projective dependencies, and is widely
used in machine translation, grammar checking, search engines, and sentiment analysis.
By applying MST algorithms, NLP models can generate accurate dependency structures, leading to better natural language understanding and processing.

