MNLP Unit-2

Unit-2

Syntax Analysis

-Presented By,
Dr D. Teja Santosh
Associate Professor, CSE
CVRCE
Why frame sentences using words … and why one should care about sentence structure

• Clarity and Understanding


• Expressing Complex Ideas
• Effective Communication
• Grammar and Syntax
• Conveying Emotion and Tone
• Writing and Literary Expression
• Professionalism

• Framing sentences with words and caring about sentence structure are essential for effective
communication, conveying meaning, and ensuring that your message is understood as intended.

• It is a fundamental aspect of language that contributes to clarity, precision, and the overall impact of your
communication.
Forms of expressing syntactic structure

• Constituent or Phrase Structure


(derivation tree = parse tree)

• Dependency Structure
Phrase Structure Model

13 Jan 2006
Phrase Structure Grammar (PSG)

A phrase-structure grammar G consists of a four-tuple (V, T, S, P), where

• V is a finite set of symbols (the vocabulary)

• E.g., N, V, A, Adv, P, NP, VP, AP, AdvP, PP, student, sing, etc.
• T is a finite set of terminal symbols: T ⊆ V
• E.g., student, sing, etc.
• S is a distinguished non-terminal symbol, also called the start symbol: S ∈ V
• P is a set of production rules

Noun Phrases
John · the student · the intelligent student

[NP [N John]]
[NP [Det the] [N student]]
[NP [Det the] [AdjP intelligent] [N student]]
Noun Phrase
his first five PhD students

[NP [Det his] [Ord first] [Quant five] [N PhD] [N students]]
Noun Phrase
The five best students of my class

[NP [Det the] [Quant five] [AP best] [N students] [PP of my class]]
Verb Phrases
can sing · can hit the ball

[VP [Aux can] [V sing]]
[VP [Aux can] [V hit] [NP the ball]]
Verb Phrase
Can give a flower to Mary

[VP [Aux can] [V give] [NP a flower] [PP to Mary]]
Example PSG Grammar

S → NP VP
NP → Det N
VP → V NP
Det → the
N → boy, ball
V → hit
Verb Phrase
may make John the chairman

[VP [Aux may] [V make] [NP John] [NP the chairman]]
Verb Phrase
may find the book very interesting

[VP [Aux may] [V find] [NP the book] [AP very interesting]]
Prepositional Phrases
in the classroom · near the river

[PP [P in] [NP the classroom]]
[PP [P near] [NP the river]]
Adjective Phrases
intelligent · very honest · fond of sweets

[AP [A intelligent]]
[AP [Degree very] [A honest]]
[AP [A fond] [PP of sweets]]
Adjective Phrase
• very worried that she might have done badly in the assignment

[AP [Degree very] [A worried] [S′ that she might have done badly in the assignment]]
Phrase Structure Rules

The boy hit the ball.

• Rewrite Rules:
1. S → NP VP
2. NP → Det N
3. VP → V NP
4. Det → the
5. N → boy, ball
6. V → hit
• We interpret each rule X → Y as the instruction "rewrite X as Y".
Derivation
The boy hit the ball.
• Sentence
NP + VP                        (1) S → NP VP
Det + N + VP                   (2) NP → Det N
Det + N + V + NP               (3) VP → V NP
The + N + V + NP               (4) Det → the
The + boy + V + NP             (5) N → boy
The + boy + hit + NP           (6) V → hit
The + boy + hit + Det + N      (2) NP → Det N
The + boy + hit + the + N      (4) Det → the
The + boy + hit + the + ball   (5) N → ball
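The derivation above can be sketched in code. A minimal sketch: the grammar and the sentence come from the slides; the control strategy (always rewrite the leftmost nonterminal, backtracking over rule choices) is an assumption, so the intermediate steps differ slightly in order from the slide's derivation.

```python
# A sketch of the rewrite process: RULES and the sentence come from the
# slides; the leftmost-rewrite-with-backtracking strategy is an assumed
# control mechanism, so the step order differs slightly from the slide.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["boy"], ["ball"]],
    "V": [["hit"]],
}

def leftmost_derivation(target):
    """Rewrite from S, expanding the leftmost nonterminal at each step and
    backtracking over rule choices until the target sentence is derived.
    Returns the sequence of sentential forms, or None on failure."""
    def expand(form, steps):
        if all(sym not in RULES for sym in form):      # all terminals
            return steps if form == target else None
        i = next(j for j, s in enumerate(form) if s in RULES)
        for rhs in RULES[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            found = expand(new, steps + [new])
            if found is not None:
                return found
        return None
    return expand(["S"], [["S"]])

for step in leftmost_derivation("the boy hit the ball".split()):
    print(" ".join(step))
```

The last printed sentential form is the input sentence itself, confirming that these rules derive it.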
PSG Parse Tree
• The boy hit the ball.

[S [NP [Det the] [N boy]] [VP [V hit] [NP [Det the] [N ball]]]]
PSG Parse Tree
• John wrote those words in the Book of Proverbs.

[S [NP [PropN John]] [VP [V wrote] [NP those words] [PP [P in] [NP [NP the Book] [PP of Proverbs]]]]]
Then why structural ambiguities exist for sentences
• Structural ambiguities in sentences arise when the arrangement of words
and phrases allows for multiple interpretations or meanings.
• These ambiguities can occur due to various linguistic factors, including
syntax , semantics, and pragmatics.

Reasons why structural ambiguities exist in sentences:


• Ambiguous Grammar (It is a property of the grammar itself, indicating that the rules of the grammar allow for the
generation of sentences with more than one possible syntactic or semantic interpretation.)
• Word Ambiguity

• Syntactic Ambiguity (the arrangement of words or symbols of a given sentence is such that there are
multiple valid ways to parse it, resulting in different syntactic structures.)
• Semantic Ambiguity
• Pragmatic Ambiguity
Two different constituent parses
Sentence level constructions often involve
• Coordination: combining words, phrases, or clauses of equal syntactic importance using coordinating conjunctions (such as "and," "but," "or") or other coordinating structures.
• Agreement: ensuring that different elements within a sentence match grammatically. This includes agreement in number, person, gender, and sometimes case.
Types of Syntactic Ambiguity
• There are several types of syntactic ambiguity, including:
1.Structural Ambiguity: Ambiguity arising from different possible ways to group
words into phrases or constituents.
• Example: "I saw the man with the telescope." (Does "with the telescope" modify "saw" or
"man"?)
2.Attachment Ambiguity: Ambiguity related to the attachment of phrases to a higher
syntactic structure.
• Example: "I told her I love." (Is "I love" part of what was told, or is it a new statement?)
3.Coordination Ambiguity: Ambiguity in how coordinated elements are grouped
together.
• Example: "I like cooking, reading, and my dog." (Is "my dog" a separate activity or related to
the previous ones?)
4.Prepositional Phrase Attachment Ambiguity: Ambiguity involving the attachment
of prepositional phrases.
• Example: "The old man and woman watched the sunset with a telescope." (Did both the man
and the woman use the telescope?)
Parsing Natural Language

• Parsing natural language involves analyzing the grammatical structure of sentences to


understand their syntactic and semantic components.

• In natural language processing (NLP), the syntactic analysis of natural language input can
vary from being very low-level, such as simply tagging each word in the sentence with a
part of speech (POS), or very high level, such as recovering a structural analysis that
identifies the dependency between each predicate in the sentence and its explicit and
implicit arguments.
1.Tokenization:
1. Definition: Breaking a text into individual words or tokens.
2. Example: "The cat in the hat" -> ['The', 'cat', 'in', 'the', 'hat']
2.Part-of-Speech Tagging (POS Tagging):
1. Definition: Assigning parts of speech (e.g., noun, verb, adjective) to each word in a sentence.
2. Example: "The cat in the hat" -> [('The', 'DT'), ('cat', 'NN'), ('in', 'IN'), ('the', 'DT'), ('hat', 'NN')]
3.Syntax Parsing (Syntactic Parsing):
1. Definition: Determining the syntactic structure of a sentence by analyzing the relationships between
words.
2. Example: "The cat in the hat" -> Tree structure representing how words are connected, e.g., (S (NP
(Det The) (N cat)) (PP (P in) (NP (Det the) (N hat))))
4.Semantic Parsing:
1. Definition: Extracting the meaning or intent from a sentence.
2. Example: "What is the capital of France?" -> {'intent': 'query', 'target': 'capital', 'entity': 'France'}
5.Dependency Parsing:
1. Definition: Analyzing the grammatical structure by identifying relationships between words in a
sentence, represented as a dependency tree.
2. Example: "The cat in the hat" -> Dependency tree with edges indicating grammatical relationships.
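The first two stages above can be sketched with a toy lexicon. This is only an illustration: the regex tokenizer, the four-word lexicon, and the default-to-NN heuristic are assumptions; real systems use trained taggers over large tagsets.

```python
import re

# Toy sketch of tokenization and POS tagging. The lexicon and the
# default-to-NN rule are illustrative assumptions, not a real tagger.
LEXICON = {"the": "DT", "cat": "NN", "in": "IN", "hat": "NN"}

def tokenize(text):
    """Split a text into word tokens on alphabetic runs."""
    return re.findall(r"[A-Za-z]+", text)

def pos_tag(tokens):
    """Look each token up in the toy lexicon (case-insensitive);
    unknown words default to NN, a common baseline heuristic."""
    return [(tok, LEXICON.get(tok.lower(), "NN")) for tok in tokens]

tokens = tokenize("The cat in the hat")
print(tokens)            # ['The', 'cat', 'in', 'the', 'hat']
print(pos_tag(tokens))
```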
• The major bottleneck in parsing natural language is the fact that ambiguity is so pervasive.

• In syntactic parsing, ambiguity is a particularly difficult problem because the most plausible analysis
has to be chosen from an exponentially large number of alternative analyses.

Example Sentences:
• He wanted to go for a drive in the country.
• The cat who lives dangerously had nine lives.
• Beyond the basic level, the operations of the three products vary widely. → The operations of the products vary.
• open borders imply increasing racial fragmentation in EUROPEAN COUNTRIES.

Possible readings:
• open borders imply increasing racial fragmentation in European states.
• open borders imply increasing racial fragmentation in Europe.
• open borders imply increasing racial fragmentation in the European nations.
Another sentence: John bought a shirt with pockets

• Parsing the sentence with the CFG rules gives us two possible derivations for this sentence.

• In one parse, pockets are a kind of currency that can be used to buy a shirt; in the other parse, which is the more plausible one, John is purchasing a kind of shirt that has pockets.
https://parts-of-speech.info/
sentence = "natural language processing"

N -> N N
N -> 'natural' | 'language' | 'processing' | 'book'

Note that the ambiguity in the syntactic analysis reflects a real ambiguity: is it a processing

of natural language, or is it a natural way to do language processing?


• So this issue cannot be resolved by changing the formalism in which the rules are written (e.g., by
using finite-state automata, which can be deterministic but cannot simultaneously model both
meanings in a single grammar).

• Any system of writing down syntactic rules should represent this ambiguity.

• However, by using the recursive rule three times, we get five parses for "natural language processing book", and the count keeps growing for longer and longer input noun phrases:
• using the recursive rule four times, we get 14 parses;
• using it five times, we get 42 parses;
• using it six times, we get 132 parses.
• In fact, for CFGs it can be proved that the number of parses obtained by using the recursive rule n times is the nth Catalan number.
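The parse counts above can be checked against the closed form for Catalan numbers, a quick sketch using Python's math.comb:

```python
from math import comb

def catalan(n):
    """nth Catalan number: C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)

# Using the recursive rule N -> N N  n times (a compound of n + 1 nouns):
for n in range(2, 7):
    print(f"rule used {n} times: {catalan(n)} parses")
```

This reproduces the 5, 14, 42, and 132 from the text; n = 2 gives the 2 parses of "natural language processing".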
Syntactically ambiguous sentences in Turkish, Korean and Chinese languages

Turkish:

1."Kediyi gördüm ağaç altında." (I saw the cat under the tree.)
• Ambiguity: It's unclear whether the speaker saw the cat under the tree or saw the cat
and it was under the tree.
2."Çocuğa bisikleti verdi kadın." (The woman gave the bike to the child.)
• Ambiguity: It's unclear whether the woman gave the bike to the child or the child
gave the bike to the woman.
3."Kitabı okumadan eve geldim." (I came home without reading the book.)
• Ambiguity: It's unclear whether the speaker read the book before coming home or
hasn't read it at all.
Contd…

Korean:

1."나는 친구에게 선물했다 고양이." (I gave a gift to my friend, a cat.)


• Ambiguity: It's unclear whether the speaker gave a cat as a gift to their friend or gave
a gift to their friend, who happens to be a cat.
2."그림 그린 남자가 왔어." (The man who drew the picture came.)
• Ambiguity: It's unclear whether the man who came drew a picture or the man who
drew the picture arrived.
3."학생들이 보았던 선생님." (The teacher that the students saw.)
• Ambiguity: It's unclear whether the students saw the teacher, or the teacher saw the
students.
Contd…
Chinese:
Chinese is a context-dependent language, and syntactic ambiguity is less common
compared to languages with more rigid word orders. However, there are still instances
where sentences can be syntactically ambiguous. Here are a couple of examples:

1."他看见了在树上的鸟。" (Tā kànjiànle zài shùshang de niǎo.) (He saw the


bird in the tree.)
• Ambiguity: It's unclear whether he saw a bird in the tree or saw someone in the tree
who is a bird.
2."她给我做饭的女人。" (Tā gěi wǒ zuòfàn de nǚrén.) (She is the woman
who cooks for me.)
• Ambiguity: It's unclear whether she is the woman who cooked for me or the woman
for whom I cooked.
Treebanks

• A treebank is simply a collection of sentences (also called a corpus of text), where each sentence is

provided a complete syntax analysis.

• The syntactic analysis for each sentence has been judged by a human expert as the most plausible

analysis for that sentence.

• A lot of care is taken during the human annotation process to ensure that a consistent treatment is

provided across the treebank for related grammatical phenomena.


TELUGU LANGUAGE SENTENCES (TREEBANK DATASET)
COMPLETE SYNTAX ANALYSIS OF A SINGLE SENTENCE FROM TELUGU TREEBANK
Advantages of Treebanks
• Treebanks provide the solution to two knowledge acquisition problems:
• need to know the syntactic rules for a particular language.
• need to know which analysis is the most plausible for a given input sentence.

• The syntactic analysis is directly given instead of a grammar, so a text can be parsed using data-driven methods that may or may not be grammar-based.

• Since each sentence in a treebank has been given its most plausible syntactic analysis, supervised machine learning methods can be used to learn a scoring function over all possible syntax analyses.

• Language-specific treebanks contribute significantly to reducing syntactic


ambiguity, providing valuable training and evaluation data for syntactic parsers.
However, complete elimination of ambiguity may not be feasible due to the
inherent complexity of natural language and the need to consider broader
contextual information.
Approaches to construct treebanks for multilingual setting

• Two main approaches to syntax analysis are used to construct treebanks: dependency graphs and
phrase structure trees.

• These two representations are very closely related to each other, and under some assumptions, one
representation can be converted to another.

• Dependency analysis is typically favored for languages such as Czech and Turkish, that have
free(er) word order, where the arguments of a predicate are often seen in different ordering in the
sentence.

• Phrase structure analysis, in contrast, is often used to provide additional information about long-distance dependencies, mostly in languages like English and French, where the word order is less flexible.
Representation of Syntactic Structure

Syntax Analysis Using Dependency Graphs

• The main philosophy behind dependency graphs is to connect a word—the head of a phrase—with the
dependents in that phrase.

• The notation connects a head with its dependent using a directed (hence asymmetric) connection.

• Dependency graphs, just like phrase structure trees, are a representation that is consistent with many different linguistic frameworks.

• The head-dependent relationship could be either semantic (head-modifier) or syntactic (head-specifier).

• The main difference between dependency graphs and phrase structure trees is that dependency analyses typically make minimal assumptions about syntactic structure and avoid any annotation of hidden structure such as, for example, using empty elements as placeholders to represent missing or displaced arguments of predicates, or any unnecessary hierarchical structure.

• The words in the input sentence are treated as the only vertices in the graph, which are linked together by
directed arcs representing syntactic dependencies.
Example
• example sentence: "The cat is sleeping."
• In this sentence, we can identify the following head → dependent relations:
1. cat → The: "The" depends on "cat" as its head. The relationship is labeled with a grammatical role, in this case "determiner" (indicating that "The" is a determiner for "cat").
2. sleeping → cat: "cat" depends on "sleeping" as its head. This relationship is labeled with a role like "subject."
3. sleeping → is: Similarly, "is" depends on "sleeping" as its head. This relationship is labeled with a role like "auxiliary."

So, in this example, the syntactic head is "cat" for "The", and "sleeping" for both "cat" and "is"; "sleeping" is the root of the sentence.
displaCy Dependency Visualizer · Explosion
Projectivity

• An important notion in dependency analysis is the notion of projectivity, which is a constraint

imposed by the linear order of words on the dependencies between words.

• A projective dependency tree is one where if we put the words in a linear order based on the

sentence with the root symbol in the first position, the dependency arcs can be drawn above the

words without any crossing dependencies.

• Another way to state projectivity is to say that for each word in the sentence, its descendants form

a contiguous substring of the sentence.
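The "contiguous descendants" formulation lends itself to a direct check. A minimal sketch: heads[i] gives the position of word i's head, with 0 marking the root; the two example head arrays are assumed toy analyses, not treebank data.

```python
# Projectivity check following the slide's definition: a tree is projective
# iff every word's descendants (including itself) occupy a contiguous span
# of positions. Words are numbered 1..n; heads[i] is word i's head, 0 = root.
def descendants(heads, i):
    """Set of positions reachable from word i via the child relation."""
    out = {i}
    changed = True
    while changed:
        changed = False
        for child, head in enumerate(heads[1:], start=1):
            if head in out and child not in out:
                out.add(child)
                changed = True
    return out

def is_projective(heads):
    for i in range(1, len(heads)):
        span = descendants(heads, i)
        if max(span) - min(span) + 1 != len(span):   # gap in the span
            return False
    return True

# "The cat is sleeping": The <- cat, cat <- sleeping, is <- sleeping
print(is_projective([0, 2, 4, 4, 0]))   # True
# Non-projective: word 4 hangs from word 1, but words 2 and 3 in between
# are not descendants of word 1, so word 1's span has a gap.
print(is_projective([0, 3, 0, 2, 1]))   # False
```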


A labeled projective dependency tree
(ITALIAN LANGUAGE)

Corresponding English Sentence: root Natural Language Processing book


A labeled non-projective dependency tree (English sentence)

https://demos.explosion.ai/displacy
Converting such a dependency tree to a CFG with the
asterisk notation gives us two options. Either we can
capture that X3 depends on X2 but fail to capture that
X1 depends on X3:
or we can capture the fact that X1 depends on X3 but
fail to capture that X3 depends on X2:

In fact, there is no CFG that can capture the nonprojective dependency.


NOTE: If we want a dependency parser to only produce projective

dependencies, we can implicitly create an equivalent CFG that will

ignore all nonprojective dependencies.


Syntax Analysis Using Phrase Structure Trees

• A phrase structure syntax analysis of a sentence derives from the traditional sentence
diagrams that partition a sentence into constituents, and larger constituents are formed by
merging smaller ones.

• Phrase structure analysis also typically incorporates ideas from generative grammar (from linguistics) to deal with displaced constituents or apparent long-distance relationships between heads and constituents.

• A phrase structure tree can be viewed as implicitly having a predicate-argument structure


associated with it.
Sentence from Penn Treebank: Mr. Baker seems especially sensitive
This annotation helps to identify the main components of the sentence:

the subject (what the sentence is about) and,

the predicate (what the subject is doing or the action of the sentence).
In this tree:
•The subject marker (↑) still points to the subject of the sentence, which is the
noun phrase "(Det) (N)" representing "The cat."
•The predicate marker (↓) points to the predicate of the sentence, which is the
verb phrase "(V) (V)" representing "is sleeping."
•Below the predicate node, there is a branch representing the predicate-argument
structure.
•The first argument is a noun "(N)" representing the subject "cat."
•The second argument is an adverb "(Adv)" representing the adverbial modifier "sleeping."

This tree structure with predicate-argument annotations provides a more detailed


representation of the relationships between the elements of the sentence, capturing
both the syntactic structure and the semantic roles of the constituents.
Steps to Represent Phrase Structure Tree in Predicate-Argument Structure

Representing a phrase structure tree in predicate-argument structure involves annotating the tree nodes with semantic
roles and relationships between the verb (predicate) and its arguments. Here are the steps to represent a phrase
structure tree in predicate-argument structure:

1. Build the Phrase Structure Tree:


Construct a basic phrase structure tree for the given sentence using a tree data structure. Each node in the tree
represents a constituent (phrase or word), and the edges represent the syntactic relationships between them.

2. Identify the Verb Phrase (VP):


Locate the verb phrase (VP) in the phrase structure tree. The VP typically contains the main verb and its arguments.

3. Annotate Verb with Semantic Role:


Annotate the main verb in the VP with its semantic role. This role represents the function of the verb in the sentence,
such as "action," "state," etc.

4. Identify Arguments:
Identify the arguments of the verb in the VP. Arguments are typically noun phrases (NP) or other phrases that fulfill
specific roles in relation to the verb (e.g., subject, object).
5. Annotate Arguments with Semantic Roles:
Annotate each argument with its semantic role. Common roles include "agent" for the entity performing the action,
"theme" for the entity affected by the action, etc.
6. Connect Verb to Arguments:
Establish connections between the main verb and its arguments in the VP. This step involves indicating which argument
fills each specific role associated with the verb.
7. Update Tree Labels:
Update the labels of the tree nodes to reflect the semantic roles and relationships. This might involve adding labels like
"ARG0" for the first argument, "ARG1" for the second argument, and so on.
8. Optional: Add Adverbial Modifiers:
If there are adverbial modifiers in the sentence, such as adverbs or prepositional phrases, annotate them with their
semantic roles and connect them to the appropriate nodes in the tree.
9. Visualize the Predicate-Argument Structure:
Optionally, visualize the modified tree with semantic roles and relationships to represent the predicate-argument
structure.
10. Validate and Refine:
Review the representation to ensure that the semantic roles are accurately assigned, and the relationships between the
verb and its arguments are appropriately captured. Refine the representation as needed.
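Steps 1-7 above can be sketched on a toy tree. This is only an illustration: the nested-tuple tree encoding, the PropBank-style labels rel/ARG0, and the subject-NP-is-ARG0 heuristic are all assumptions, not a full semantic role labeler.

```python
# Minimal sketch of steps 1-7 on "The cat is sleeping". The role names
# (rel, ARG0) and the subject-NP -> ARG0 rule are illustrative assumptions.
tree = ("S",
        ("NP", ("Det", "The"), ("N", "cat")),
        ("VP", ("Aux", "is"), ("V", "sleeping")))

def find(node, label):
    """Depth-first search for the first subtree with the given label."""
    if isinstance(node, tuple):
        if node[0] == label:
            return node
        for child in node[1:]:
            hit = find(child, label)
            if hit is not None:
                return hit
    return None

def words(node):
    """Collect the leaf words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in words(child)]

def predicate_argument(tree):
    verb = find(find(tree, "VP"), "V")       # steps 2-3: locate VP, then V
    subj = find(tree, "NP")                  # step 4: subject NP argument
    return {"rel": words(verb)[0],           # steps 5-7: label and connect
            "ARG0": " ".join(words(subj))}

print(predicate_argument(tree))   # {'rel': 'sleeping', 'ARG0': 'The cat'}
```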
Introduction to Parsing

• Parsing is the process of examining the grammatical structure and relationships inside a
given sentence or text in natural language processing (NLP). It involves analyzing the text to
determine the roles of specific words, such as nouns, verbs, and adjectives, as well as their
interrelationships.

• This analysis produces a structured representation of the text, allowing NLP computers to
understand how words in a phrase connect to one another. Parsers expose the structure of a
sentence by constructing parse trees or dependency trees that illustrate the hierarchical and
syntactic relationships between words.

• This essential NLP stage is crucial for a variety of language understanding tasks, which allow
machines to extract meaning, provide coherent answers, and execute tasks such as machine
translation, sentiment analysis, and information extraction.
Strengths and Limitations of Rule-based parsers and Data-driven parsers

• Rule-based parsers are more likely to explicitly recognize and reject


ungrammatical sentences based on predefined rules. However, their
effectiveness depends on the quality and completeness of those rules.

• Data-driven parsers, while not explicitly designed to identify


ungrammaticality, may sometimes struggle with novel or extremely
ungrammatical constructions.
Parsing Techniques in NLP

• Top-down parsing
It attempts to derive the sentence from the start symbol, and the production
tree is created from top down.

• Bottom-up parsing
Bottom-up parsing begins with the words of input and attempts to create
trees from the words up, again by applying grammar rules one at a time.
Parsing Algorithms
Shift Reduce Parsing (Bottom-Up Parsing)
• Shift-Reduce parsing is a common technique used in Natural Language Processing (NLP)
for syntactic parsing.

• It is particularly popular in dependency parsing, where the goal is to identify the


grammatical relationships between words in a sentence.

• Shift-Reduce parsing operates by successively shifting input words onto a stack and then
reducing them according to a set of predefined rules until a parse tree or dependency
structure is formed.

• It's particularly well-suited for dependency parsing, which is concerned with identifying
the syntactic relationships (dependencies) between words in a sentence.
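The shift and reduce actions can be sketched over the earlier toy PSG. Assumptions: greedy reduction with rules tried in a fixed order and no backtracking, which happens to succeed on this grammar and sentence; real shift-reduce parsers need a policy for resolving shift/reduce conflicts.

```python
# Minimal shift-reduce constituency parser over the slide's toy grammar.
# Greedy: always reduce when a rule matches the stack top, else shift.
GRAMMAR = [  # (lhs, rhs) pairs, tried in order when looking for a reduce
    ("Det", ("the",)), ("N", ("boy",)), ("N", ("ball",)), ("V", ("hit",)),
    ("NP", ("Det", "N")), ("VP", ("V", "NP")), ("S", ("NP", "VP")),
]

def shift_reduce(words):
    """Return the root label if the words reduce to a single constituent."""
    stack, buffer = [], list(words)
    while buffer or len(stack) > 1:
        for lhs, rhs in GRAMMAR:                          # try to REDUCE
            n = len(rhs)
            if len(stack) >= n and tuple(c[0] for c in stack[-n:]) == rhs:
                stack[-n:] = [(lhs, stack[-n:])]          # fold children
                break
        else:
            if not buffer:
                return None                               # stuck
            stack.append((buffer.pop(0), []))             # SHIFT a word
    return stack[0][0] if stack else None

print(shift_reduce("the boy hit the ball".split()))   # S
print(shift_reduce("boy the hit".split()))            # None
```

The second call shows the parser rejecting a word order the grammar cannot reduce.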
Q) Does a shift-reduce parser generate multiple parse trees for an ambiguous sentence?

A) No, a shift-reduce parser typically generates a single parse tree for a given input sentence. In the case of an ambiguous sentence, where multiple parse trees are possible, the parser might choose one interpretation based on its parsing strategy or heuristics. However, it doesn't inherently generate multiple parse trees.
Hypergraphs and Chart Parsing

Hypergraphs:

• A hypergraph is a generalization of a graph in which an edge can connect


more than two nodes. In the context of NLP, hypergraphs are often used to
represent complex linguistic structures, especially when dealing with non-
projective or discontinuous syntactic dependencies.

• In traditional graphs, edges connect pairs of nodes, while in hypergraphs,


hyperedges can connect multiple nodes. This makes hypergraphs suitable
for capturing more intricate relationships in linguistic structures, such as
long-distance dependencies, non-contiguous constituents, and more.
Chart Parsing:

• Chart parsing is a parsing technique used to efficiently analyze the syntactic


structure of sentences.
• It's commonly employed in the context of context-free grammars (CFGs)
and is particularly useful for parsing ambiguous or non-projective
sentences.
• Chart parsing uses dynamic programming and a chart data structure to
store and reuse intermediate parsing results, avoiding redundant
computations.
Hypergraphs in Chart Parsing:

• Hypergraphs can be used in conjunction with chart parsing to represent


more complex linguistic structures.
• The hypergraph representation is especially beneficial when dealing with
syntactic structures that cannot be adequately captured by traditional
graphs.
• By employing hypergraphs, chart parsing can handle a broader range of
linguistic phenomena, including non-projectivity, crossing dependencies, and
discontinuous constituents.
Chart Parsing Example
Earley Chart Parsing or Earley Parsing (Top-down parsing)

https://www.youtube.com/watch?v=LDX9qGVa2l0

Q) Does a chart parser generate multiple parse trees for an ambiguous sentence?
A) Yes, chart parsers have the capability to generate multiple parse trees for an ambiguous
sentence. Chart parsing algorithms typically use dynamic programming and construct a
parse chart that records partial parse results. In the presence of ambiguity, the parser
explores different possibilities and may produce multiple valid parse trees corresponding to
different syntactic interpretations of the input sentence.
Earley Parsing Example

https://www.youtube.com/watch?v=9GIgYd1OWfQ
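A compact Earley recognizer over the earlier toy grammar can make the chart concrete. A sketch: it implements the classic predict/scan/complete operations as a recognizer only; producing the parse trees themselves would need back-pointers, which are omitted here.

```python
# Compact Earley recognizer. States are (lhs, rhs, dot, origin) tuples;
# chart[i] holds the states valid after reading i words.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N")],
    "VP": [("V", "NP")],
    "Det": [("the",)],
    "N": [("boy",), ("ball",)],
    "V": [("hit",)],
}

def earley_recognize(words, start="S"):
    chart = [set() for _ in range(len(words) + 1)]
    for rhs in GRAMMAR[start]:
        chart[0].add((start, rhs, 0, 0))
    for i in range(len(words) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in GRAMMAR:                        # PREDICT
                    for prod in GRAMMAR[nxt]:
                        new = (nxt, prod, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new); agenda.append(new)
                elif i < len(words) and words[i] == nxt:  # SCAN
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:                                         # COMPLETE
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in chart[i]:
                            chart[i].add(new); agenda.append(new)
    return any((start, rhs, len(rhs), 0) in chart[len(words)]
               for rhs in GRAMMAR[start])

print(earley_recognize("the boy hit the ball".split()))   # True
print(earley_recognize("the hit boy".split()))            # False
```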
CKY or CYK Algorithm

https://www.youtube.com/watch?v=cpeYw-hWtSc

Q) Does a CYK parser generate multiple parse trees for an ambiguous sentence?
A)
• CYK parsers can potentially generate multiple parse trees for an ambiguous sentence.
• When a sentence is ambiguous, meaning it has more than one valid syntactic interpretation, the CYK
parser explores various possibilities and can yield multiple parse trees.
• The decision on whether to generate and output all parse trees or just one often depends on the specific
implementation and requirements of the parser.
CKY PARSING DEMO
http://lxmls.it.pt/2015/cky.html
MULTIPLE PARSE TREES GENERATED BY CKY PARSER
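The ambiguity CKY must cope with can be made concrete by using the chart to count analyses instead of storing them. A sketch over the compound-noun grammar N → N N from the earlier slide, which is already in Chomsky normal form:

```python
# CKY chart that COUNTS parses under the ambiguous grammar N -> N N.
# counts[i][j] maps a nonterminal to the number of distinct parses
# spanning words i..j-1.
from collections import defaultdict

LEXICAL = {"natural": "N", "language": "N", "processing": "N", "book": "N"}
BINARY = [("N", "N", "N")]          # the single binary rule N -> N N

def cky_count(words, start="N"):
    n = len(words)
    counts = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        counts[i][i + 1][LEXICAL[w]] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):                  # split point
                for lhs, left, right in BINARY:
                    counts[i][j][lhs] += counts[i][k][left] * counts[k][j][right]
    return counts[0][n][start]

print(cky_count("natural language processing".split()))        # 2
print(cky_count("natural language processing book".split()))   # 5
```

The counts match the Catalan-number growth discussed earlier (2, 5, 14, …).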
Minimum Spanning Trees and Dependency Parsing

• Dependency parsing using Minimum Spanning Trees involves finding


the most efficient way to connect all the words in a sentence while
minimizing the total cost or weight of the edges.

• The weight of an edge could be related to syntactic distance or other


linguistic features.
Steps:

1.Edge Weight Calculation:


1. Assign weights to edges based on linguistic features, such as syntactic
distance or the likelihood of a dependency.
2.Minimum Spanning Tree Construction:
1. Use an algorithm like Kruskal's or Prim's to find the Minimum Spanning Tree
based on the edge weights.
3.Dependency Representation:
1. The resulting Minimum Spanning Tree represents the most efficient way to
connect all the words in the sentence while capturing the most significant
syntactic relationships.
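Step 2 above can be sketched with Prim's algorithm on a small word graph. The words, the edge weights (lower = stronger attachment), and the undirected treatment are illustrative assumptions; production MST dependency parsers run the directed Chu-Liu/Edmonds algorithm instead.

```python
# Prim's algorithm on an undirected word graph: grow the tree by always
# adding the cheapest edge that connects a new word.
def prim_mst(nodes, weights):
    """weights: dict mapping frozenset({a, b}) -> cost. Returns tree edges."""
    in_tree, edges = {nodes[0]}, []
    while len(in_tree) < len(nodes):
        best = min(((weights[frozenset((u, v))], u, v)
                    for u in in_tree for v in nodes if v not in in_tree),
                   key=lambda t: t[0])
        edges.append((best[1], best[2]))
        in_tree.add(best[2])
    return edges

words = ["root", "the", "cat", "sleeps"]
scores = {frozenset(p): w for p, w in [     # invented illustrative weights
    (("root", "sleeps"), 1), (("cat", "sleeps"), 2), (("the", "cat"), 1),
    (("root", "the"), 9), (("root", "cat"), 8), (("the", "sleeps"), 9),
]}
tree = prim_mst(words, scores)
print(sorted(tree))   # [('cat', 'the'), ('root', 'sleeps'), ('sleeps', 'cat')]
```

With these weights the cheapest tree mirrors the intuitive analysis: the verb attaches to the root, the subject to the verb, and the determiner to the noun.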
Dependency Parsing with Minimum Spanning Trees

English Sentence: The cat is on the mat.


Does dependency parsing with minimum spanning trees produce multiple parse trees for an ambiguous sentence?

• Dependency parsing with minimum spanning trees typically aims to produce a single,
projective parse tree for a given sentence.

• However, dependency parsing based on minimum spanning trees might still face
challenges when dealing with ambiguous sentences.

• In practice, when ambiguity exists, dependency parsers often make a choice based on
heuristics, statistical models, or other criteria. The chosen parse may represent the most
likely or most frequently observed structure based on the training data or other linguistic
knowledge.
Models for Ambiguity Resolution in Parsing

Probabilistic Context-Free Grammars

https://m.youtube.com/watch?v=DjJYKmAuAJ0&pp=ygUMcGNmZyBleGFtcGxl

NOTE: Stanford NLP Parser (shown in slide 63) operates based on statistical and
probabilistic models (PCFG). This parser provides a single "best" parse tree when
PCFG language model is used.
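The PCFG idea in miniature: a tree's probability is the product of its rule probabilities, and the parser returns the highest-probability tree. A sketch in which the rules, their probabilities, and the two candidate rule lists for a PP-attachment ambiguity are invented for illustration, not estimated from a treebank:

```python
from math import prod

PROBS = {  # invented rule probabilities; real PCFGs estimate these from treebanks
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("NP", "PP")): 0.4,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
}

def tree_prob(rules_used):
    """Probability of a parse = product of the probabilities of its rules."""
    return prod(PROBS[r] for r in rules_used)

# Two analyses of a PP-attachment ambiguity, listed as the rules they use:
noun_attach = [("S", ("NP", "VP")), ("VP", ("V", "NP")), ("NP", ("NP", "PP"))]
verb_attach = [("S", ("NP", "VP")), ("VP", ("VP", "PP")), ("VP", ("V", "NP"))]
print(round(tree_prob(noun_attach), 2))   # 0.28 -> the single "best" parse
print(round(tree_prob(verb_attach), 2))   # 0.21
```

Picking the argmax over candidate trees is how a PCFG-based parser like the one above returns one "best" parse for an ambiguous sentence.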
Why the Stanford Parser parses ungrammatical sentences

The Stanford Parser, like many natural language parsers, operates based on statistical and probabilistic models trained on large corpora. While it is designed to generate parse trees for grammatical sentences, it may also attempt to parse ungrammatical sentences. There are a few reasons why a parser might parse an ungrammatical sentence:
Generative Models for Parsing

• While PCFGs are commonly used for constituency parsing, generative


models for dependency parsing can also be considered. These models aim
to generate sentences along with their corresponding dependency
structures.

• One approach is to extend generative models to include the generation of


dependencies. Probabilities are assigned to different dependency
relations, and the generative process includes selecting dependencies
based on these probabilities.
Challenges and Advances
• Generative models for parsing face challenges such as the need for large
amounts of training data and the difficulty of capturing long-range
dependencies. Advances in neural network architectures, attention
mechanisms, and pre-training techniques have contributed to improved
generative models for parsing.
• Keep in mind that parsing can also be approached from a discriminative
perspective, where models directly learn to predict syntactic structures
without explicitly modeling the generative process.
• The choice between generative and discriminative models often depends
on the specific requirements of the parsing task and the available data.
Discriminative Models for Parsing

Global Linear Model for Parsing

• A global linear model for parsing refers to a parsing approach that uses a linear model to globally score
entire structures (such as parse trees) for a given sentence.

• The goal is to find the structure that maximizes or minimizes a global scoring function. This is in contrast to
local models, which score individual decisions in a parsing process.

• In the context of dependency parsing, a global linear model assigns a score to a complete dependency
structure for a sentence.

• The structure can be represented as a set of labeled dependencies between words in the sentence. The
model considers the entire structure at once and assigns scores based on various features and weights.
Components of a Global Linear Model for Parsing
1.Feature Representation:
1. Define a set of features that capture relevant information about the input sentence and its
potential dependency structure. Features can include word-level information, part-of-speech
tags, syntactic context, and more.
2.Parameterized Scoring Function:
1. Define a scoring function that combines the feature values with associated weights. The
scoring function is often a linear combination of the features and weights, giving a global
score to the entire dependency structure.
3.Optimization Objective:
1. Formulate an optimization problem to find the best-scoring dependency structure. The
objective can involve maximizing the global score for correct structures or minimizing it for
incorrect ones.
4.Learning Weights:
1. Train the model by learning the weights associated with the features. This is typically done
using labeled training data, where correct and incorrect structures are provided for each
sentence.
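Components 1 and 2 above in miniature. A sketch: the feature names, weights, and candidate structures are invented, and features are attached directly to arcs rather than extracted from the sentence; component 4 would learn WEIGHTS from labeled treebank data.

```python
WEIGHTS = {  # invented weights; a real model learns these (component 4)
    "head_is_verb": 2.0, "det_attaches_to_noun": 1.0, "arc_len_1": 0.5,
}

def features(structure):
    """Count feature occurrences over a complete set of arcs (component 1)."""
    counts = {}
    for head, dep, feat in structure:
        counts[feat] = counts.get(feat, 0) + 1
    return counts

def score(structure):
    """Global linear score: dot product of weights and feature counts
    over the ENTIRE structure at once (component 2)."""
    return sum(WEIGHTS.get(f, 0.0) * c for f, c in features(structure).items())

# Two candidate dependency structures for "the cat sleeps",
# as (head, dependent, feature) arcs:
cand_a = [("sleeps", "cat", "head_is_verb"), ("cat", "the", "det_attaches_to_noun")]
cand_b = [("cat", "sleeps", "arc_len_1"), ("cat", "the", "arc_len_1")]
best = max([cand_a, cand_b], key=score)
print(score(cand_a), score(cand_b))   # 3.0 1.0
print(best is cand_a)                 # True
```

The argmax over complete structures is what distinguishes this global model from a local model that scores each attachment decision independently.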
POS Tagging of a Chinese sentence and the Chinese sentence parse tree
Lead to next unit…

Ambiguity resolution remains an ongoing research challenge in natural language processing.
