
Syntax Parsing

Tushar B. Kute,
http://tusharkute.com
Syntax Parsing

• Syntax parsing, also known as syntactic parsing, is the process of analyzing a sentence's grammatical structure.
• It focuses on how words are arranged and
combined to form a meaningful sentence
according to the rules of a specific language.
Constituency Grammars

• Constituency grammars, also known as phrase structure grammars, are a fundamental approach to analyzing the syntactic structure of sentences in linguistics.
• They focus on how words are grouped together
to form phrases and clauses, ultimately building
the entire sentence.
Constituents

• The building blocks of constituency grammar are constituents.
• These are groups of words that function as a single
unit within a sentence.
• They can be single words (e.g., "car"), phrases (e.g., "the red car"), or even clauses (e.g., "which belongs to my neighbor").
Hierarchical Structure

• Constituency grammars depict sentences as having a hierarchical structure.
• Imagine a tree-like diagram where the sentence
is the root, and it branches out into smaller and
smaller constituents.
– For example, a sentence might be divided
into a subject noun phrase and a verb phrase,
which can be further broken down into
smaller constituents.
Rules and Productions

• Constituency grammars often utilize rewrite rules or productions to define how sentences are formed.
• These rules specify how a higher-level
constituent can be broken down into its
constituent parts.
– For instance, a rule might state that a sentence
(S) can be rewritten as a noun phrase (NP)
followed by a verb phrase (VP).
Types

• There are various types of constituency grammars, each with its own set of rules and principles.
• Here are two common types:
– Context-Free Grammars (CFGs):
– Tree Adjoining Grammars (TAGs)
Types

• Context-Free Grammars (CFGs):
– These are the most widely used type of
constituency grammar.
– CFGs rely on rewrite rules that don't consider
the surrounding words when replacing a
symbol.
– They focus on the hierarchical structure and
the order of constituents.
Types

• Tree Adjoining Grammars (TAGs):
– TAGs are an extension of CFGs that allow for
more flexibility in sentence structure.
– They incorporate elementary trees, which can
be combined to form more complex sentence
structures.
Benefits

• Intuitive Approach:
– Constituency grammar offers a clear and intuitive way
to represent the structure of sentences, making it
easier to understand how words are grouped together.
• Formalization:
– The use of rewrite rules allows for a formal and
systematic approach to sentence analysis.
• Applications:
– Constituency grammar has various applications,
including natural language processing (NLP), machine
translation, and syntactic parsing.
Limitations

• Linear Ordering Focus:
– Constituency grammars primarily capture the grouping and linear order of words and phrases. They may miss aspects of sentence meaning that depend on word order variations or direct dependencies between words.
• Limited Scope:
– Constituency grammars mainly address the
syntactic structure of sentences and may not
delve as deeply into semantic meaning.
Context-Free Grammar

• In formal language theory, a context-free grammar (CFG) is a type of grammar that describes the structure of a formal language.
• It defines a set of rules for generating all the
possible strings (sentences) that belong to that
language.
Context-Free Grammar

• Focus on Structure:
– CFGs focus on the hierarchical structure of
sentences, defining how words can be
grouped into phrases and clauses to form a
complete sentence.
– They don't consider the surrounding words
when applying a production rule.
Context-Free Grammar

• Rewrite Rules:
– CFGs use rewrite rules, also called productions,
to specify how to generate sentences.
– These rules define how a non-terminal symbol (a
placeholder for a phrase or clause) can be
replaced by a sequence of terminal symbols
(actual words) or other non-terminal symbols.
CFG: Formal Definition

• A CFG is formally defined as a 4-tuple G = (V, T, P, S), where:
– V - represents a finite set of variables or non-terminal
symbols. These symbols denote phrases or clauses that need
to be further defined.
– T - represents a finite set of terminal symbols. These are the
actual words that appear in the sentences of the language.
– P - represents a finite set of production rules. These rules
define how non-terminal symbols can be rewritten using
terminal symbols or other non-terminal symbols.
– S - represents the start symbol, a special non-terminal
symbol that serves as the starting point for generating
sentences.
CFG: Example

• Here's a simple CFG that defines a tiny language with sentences like "the cat sat" or "the dog chased the cat":
– V = {S, NP, VP, Det, N, Verb}
– T = {the, cat, sat, dog, chased}
– P = { S -> NP VP, NP -> Det N, VP -> Verb, VP -> Verb NP } (the last rule handles transitive verbs)
– S - is the start symbol (S)
CFG: Example

• This CFG defines that a sentence (S) can be formed by a Noun Phrase (NP) followed by a Verb Phrase (VP).
• An NP can be formed by a Determiner (Det)
followed by a Noun (N), and a VP can be a Verb
(Verb) or a Verb followed by another NP (for
transitive verbs).
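
Below is a minimal sketch of this toy grammar using NLTK's CFG and chart parser. It assumes NLTK is installed; the rule names follow the slides, and lexical rules for Det, N, and Verb are added so the parser can reach the actual words:

```python
# A minimal sketch of the toy grammar above using NLTK (assumes nltk is installed).
import nltk

grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det N
    VP -> Verb | Verb NP
    Det -> 'the'
    N -> 'cat' | 'dog'
    Verb -> 'sat' | 'chased'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the dog chased the cat".split()):
    tree.pretty_print()   # draws the constituency tree as ASCII art
```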
Grammar Rules for English

• Parts of Speech:
– English sentences are built from eight main parts
of speech: nouns, verbs, adjectives, adverbs,
pronouns, prepositions, conjunctions, and
interjections.
– Each part of speech has a specific function within a
sentence.
• Nouns: represent people, places, things, or ideas
(e.g., cat, book, happiness).
• Verbs: describe actions, states of being, or
occurrences (e.g., run, sleep, happen).
Grammar Rules for English

• Sentence Structure:
– Basic English sentences follow a Subject-Verb-
Object (SVO) word order.
• Subject: who or what the sentence is about (e.g.,
The girl).
• Verb: the action or state of being (e.g., runs).
• Object: receives the action of the verb (e.g., the ball in "The girl kicked the ball").
– Sentences can also be more complex with
prepositional phrases, adjective clauses, adverb
clauses, etc.
Grammar Rules for English

• Subject-Verb Agreement:
– The subject and verb in a sentence must agree in
number (singular or plural).
• Singular subject - singular verb (e.g., The cat
jumps).
• Plural subject - plural verb (e.g., The cats
jump).
Grammar Rules for English

• Verb Tenses:
– Verbs are conjugated to indicate time (past,
present, future) and aspect (simple, continuous,
perfect).
• English has 12 basic verb tenses that express
different nuances of time.
Grammar Rules for English

• Articles:
– English uses two articles, "a/an" (indefinite) and
"the" (definite), to indicate whether a noun is
being referred to for the first time or is already
known.
Grammar Rules for English

• Punctuation:
– Punctuation marks like periods, commas,
question marks, etc., are used to separate
clauses, indicate pauses, and convey meaning
clearly.
Grammar Rules for English

• Sentence Types:
– English sentences can be declarative
(statements), interrogative (questions),
imperative (commands), or exclamatory
(exclamations).
Treebank

• In linguistics, a treebank is a parsed text corpus annotated with information about the syntactic or semantic structure of the sentences it contains.
• These annotations are typically represented in the
form of trees, hence the name "treebank."
• Treebanks serve as valuable resources for various
tasks in computational linguistics and natural
language processing (NLP).
Treebank

• Structure and Annotation:
– Text Corpus: A treebank is built upon a collection of text
data, which can range from news articles and books to
social media posts and historical documents.
– Syntactic vs. Semantic Annotation:
• Syntactic Treebanks: These focus on the grammatical
structure of sentences, breaking them down into
phrases and clauses.
• The relationships between words are depicted using
tree diagrams, where the root of the tree represents
the entire sentence and branches represent phrases
and sub-phrases.
Treebank

• Semantic Treebanks:
– These delve deeper into the meaning of
sentences, annotating the semantic roles of
words and their relationships within the
sentence.
– They might utilize various formalisms to
represent the meaning structure.
Treebank : Types

• Penn Treebank:
– A widely used treebank for English, focusing on
syntactic structure.
• FrameNet:
– A semantic treebank that annotates sentences based
on semantic frames, which represent stereotypical
situations involving roles and participants.
• PropBank:
– Another semantic treebank that focuses on verb
argument structure, labeling the semantic roles of
noun phrases relative to a verb.
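
As an illustration, the small Penn Treebank sample bundled with NLTK can be loaded as follows (assuming NLTK is installed and the 'treebank' sample corpus has been downloaded; 'wsj_0001.mrg' is the first file in that sample):

```python
# A small sketch of reading the Penn Treebank sample shipped with NLTK.
import nltk
nltk.download("treebank")          # a small sample of the Wall Street Journal section
from nltk.corpus import treebank

tree = treebank.parsed_sents("wsj_0001.mrg")[0]
tree.pretty_print()                # prints the annotated constituency tree
print(tree.leaves())               # the words of the sentence
```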
Grammar Equivalence

• Two grammars are considered grammatically equivalent if they generate the same set of sentences, even though their production rules might differ.
• Imagine having two different recipes for the same
cake; the ingredients and final product might be
identical, but the steps to get there could be
written differently.
Normal Forms

• Normal forms
– are specific types of grammars with particular
properties that make them easier to analyze and
manipulate.
– There are different types of normal forms for
CFGs, with some of the most common being:
• Chomsky Normal Form (CNF)
• Greibach Normal Form (GNF)
Normal Forms

• Chomsky Normal Form (CNF): A CFG is in CNF if every production rule takes one of these forms:
– A non-terminal symbol is rewritten as exactly two non-terminal symbols. (A -> BC)
– A non-terminal symbol is rewritten as a single terminal symbol. (A -> a)
– The start symbol may be rewritten as the empty string (epsilon) in some cases. (S -> ε)
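
A quick sketch of what CNF conversion looks like in practice, using NLTK's built-in tree transform (this assumes NLTK is available; the example tree is illustrative):

```python
# A minimal sketch of binarizing a parse tree into Chomsky Normal Form
# with NLTK's built-in tree transform (assumes nltk is installed).
from nltk import Tree

# VP has three children here, which violates the "A -> BC" shape of CNF.
t = Tree.fromstring("(S (NP I) (VP (V gave) (NP her) (NP (Det a) (N gift))))")
t.chomsky_normal_form()   # rewrites n-ary nodes into nested binary nodes in place
t.pretty_print()          # VP is now split using an intermediate VP|<...> node
```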
Normal Forms

• Greibach Normal Form (GNF):
– Restricts the form of right-hand sides in production rules: every production must begin with a terminal symbol, optionally followed by non-terminal symbols. (A -> a B1 B2 ... Bn)
Lexicalized Grammar

• Lexical Functional Grammar (LFG):
– This is the most common meaning encountered in
academic contexts. LFG is a specific type of
theoretical framework for analyzing sentence
structure.
• Lexical Grammar (Lexicogrammar):
– This is a broader term used in Systemic Functional
Linguistics (SFL) to emphasize the interdependence
of vocabulary (lexis) and syntax (grammar).
Lexical Functional Grammar (LFG)

• Developed by Joan Bresnan and Ronald Kaplan in the 1970s.
• Aims to bridge the gap between transformational
grammar (focusing on syntactic transformations)
and dependency grammar (focusing on word
relationships).
Lexical Functional Grammar (LFG)

• Key Features:
– Dual Representation: LFG employs two separate
levels of representation:
• Constituent Structure (C-structure): Similar to a CFG parse tree, it defines the surface word order and constituency of sentences.
• Functional Structure (F-structure): Represents grammatical functions like subject and object, independent of word order.
– Lexical Entries: Words in LFG have rich lexical entries
that specify their syntactic and semantic properties.
Lexical Grammar (Lexicogrammar)

• Introduced by linguist M.A.K. Halliday as part of Systemic Functional Linguistics (SFL).
– Focus: Highlights the interconnectedness between
vocabulary choices and grammatical structures in
conveying meaning.
– SFL Theory: SFL views language as a system for making
meaning, and lexicogrammar emphasizes how grammar
and vocabulary work together to achieve specific
communicative goals.
– Example: The choice of verb (e.g., "give" vs. "donate")
can influence the interpretation of a sentence and the
social roles of participants.
Constituency Parsing

• Constituency parsing, also known as phrase structure parsing, is a fundamental technique in Natural Language Processing (NLP) that deals with analyzing the grammatical structure of sentences.
• It focuses on identifying the constituents (phrases
and clauses) that make up a sentence and their
hierarchical relationships.
Constituency Parsing

• Core Idea:
– Imagine a sentence as a tree structure. The root of
the tree represents the entire sentence, and it
branches out into smaller and smaller constituents.
– These constituents can be individual words, phrases
(like noun phrases or verb phrases), or even clauses.
– Constituency parsing aims to identify these
constituents and their hierarchical relationships
within the sentence tree.
Constituency Parsing

• Process:
– Input: The parser takes a sentence as input.
– Constituent Identification: The parser identifies the
different constituents within the sentence, such as noun
phrases (NPs), verb phrases (VPs), adjective phrases (ADJPs),
etc.
– Hierarchical Relationships: The parser determines the
hierarchical relationships between the constituents. For
example, an NP might be a child of a VP, and a VP might be a
child of the main sentence (S).
– Output: The parser outputs a parse tree that visually
represents the identified constituents and their
relationships.
Constituency Parsing

• Types of Constituency Parsers:
– Rule-based Parsers:
• These rely on predefined grammatical rules to
identify constituents and construct the parse
tree.
– Statistical Parsers:
• These use statistical models trained on large
datasets of pre-parsed sentences to predict
the most likely parse tree for a new sentence.
CKY Parsing

• CKY parsing, also known as the Cocke–Younger–Kasami algorithm (named after its independent discoverers), is a specific type of constituency parsing algorithm used in Natural Language Processing (NLP).
• It's a bottom-up parsing technique that employs
dynamic programming to efficiently analyze the
grammatical structure of a sentence.
CKY Parsing

• Core Functionality:
– Bottom-up Approach:
• Unlike top-down parsers that start from the entire
sentence and break it down, CKY parsing starts from
individual words and builds up progressively to
identify larger constituents and ultimately the entire
sentence structure.
– Dynamic Programming:
• It leverages dynamic programming, a technique that
stores intermediate results to avoid redundant
calculations. This makes CKY parsing efficient for
handling even complex sentences.
CKY Parsing

• Process:
– Input:
• The algorithm takes a sentence (string of words)
and a grammar (set of production rules) as input.
– Initialization:
• A 2D table is created, where rows and columns
represent positions in the sentence.
• Initially, each cell holds the non-terminal symbols
(grammar variables) that can generate the single
word at that position based on the grammar rules.
CKY Parsing

• Bottom-up Filling:
– The algorithm iterates through the table diagonally, filling
each cell. For a given cell, it checks all possible ways to
combine constituents from the cells below and to the left
(based on grammar rules) to see if they can generate a
larger constituent that can span the current cell's range in
the sentence.
• Output:
– After processing the entire table, the cell representing the
entire sentence (top-right corner) should contain the start
symbol of the grammar if the sentence is grammatically
correct according to the provided grammar.
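
The following is a compact, illustrative sketch of a CKY recognizer in Python. The grammar tables reuse the toy CFG from the earlier slides (already in Chomsky Normal Form), and the function only answers yes/no rather than building parse trees:

```python
# A compact sketch of a CKY recognizer for a grammar in Chomsky Normal Form.
# The grammar format and sentence are illustrative assumptions.
from itertools import product

# CNF rules: binary (A -> B C) and lexical (A -> word)
binary = {("NP", "VP"): "S", ("Det", "N"): "NP", ("Verb", "NP"): "VP"}
lexicon = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chased": {"Verb"}}

def cky_recognize(words):
    n = len(words)
    # table[i][j] holds the non-terminals that can span words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):                  # initialize with lexical rules
        table[i][i + 1] = set(lexicon.get(w, ()))
    for span in range(2, n + 1):                   # fill longer spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):              # try every split point
                for B, C in product(table[i][k], table[k][j]):
                    if (B, C) in binary:
                        table[i][j].add(binary[(B, C)])
    return "S" in table[0][n]                      # start symbol spans the sentence?

print(cky_recognize("the dog chased the cat".split()))  # True
```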
Span-Based Neural Constituency Parsing

• Span-based neural constituency parsing is a relatively recent approach to constituency parsing in Natural Language Processing (NLP) that utilizes neural networks.
• It deviates from traditional rule-based or statistical
parsers by focusing on directly identifying
constituents (phrases and clauses) within a
sentence using a neural network model.
Span-Based Neural Constituency Parsing

• Core Idea:
– Shift from Rules to Spans:
• Unlike traditional parsers that rely on predefined
grammatical rules, span-based parsing focuses on identifying
spans of words (contiguous sequences) that represent
constituents. The neural network model learns to score and
predict these spans directly from the training data.
– Neural Network Power:
• The model leverages the power of neural networks to
capture complex patterns and relationships within
sentences. This allows it to potentially handle ambiguities
and variations in language structure better than rule-based
approaches.
Span-Based Neural Constituency Parsing

• Process:
– Input:
• The model takes a sentence as input.
– Word Representation:
• Each word in the sentence is converted into a
vector representation using techniques like
word embedding.
• This vector captures the semantic and
syntactic properties of the word.
Span-Based Neural Constituency Parsing

• Process:
– Span Scoring:
• The model then employs a neural network architecture to score
each possible span in the sentence (considering all start and
end positions for contiguous sequences). This scoring takes
into account the word representations of the words within the
span and their potential to form a grammatical constituent.
– Prediction and Tree Building:
• Based on the span scores, the model predicts the most likely
set of spans that represents the grammatical structure of the
sentence. This prediction can then be used to build a parse tree
depicting the hierarchical relationships between the
constituents.
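
A toy sketch of the span-scoring step follows. The random vectors and the simple concatenate-and-dot scorer are stand-ins for a trained encoder and scoring network, which a real parser would learn from a treebank:

```python
# A toy sketch of span scoring: every contiguous span gets a score from the
# representations of its boundary words. Real parsers use a trained encoder
# (e.g., a BiLSTM or Transformer); random vectors stand in for it here.
import numpy as np

rng = np.random.default_rng(0)
words = "the dog chased the cat".split()
dim = 8
embeddings = rng.normal(size=(len(words), dim))   # stand-in word vectors
w = rng.normal(size=2 * dim)                      # stand-in scoring weights

scores = {}
for i in range(len(words)):                       # span start
    for j in range(i + 1, len(words) + 1):        # span end (exclusive)
        span_repr = np.concatenate([embeddings[i], embeddings[j - 1]])
        scores[(i, j)] = float(w @ span_repr)     # higher = more constituent-like

# A real parser would pick the highest-scoring set of nested, non-crossing spans.
print(max(scores, key=scores.get))                # the single best-scoring span
```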
Types of Span-Based Parsers

• Independent Scoring Models:
– These models score each span independently,
focusing on whether the words within the span can
form a constituent.
• Transition-Based Models:
– These models use a sequence of actions (e.g., "shift"
to include a new word in a constituent, "reduce" to
finalize a constituent) to build the parse tree, with
the neural network scoring the transitions between
these actions.
Parser Evaluation

• A parsing evaluation metric is a measure used to assess the performance of a constituency parser in Natural Language Processing (NLP).
• These metrics measure how well the parser's
output (the parse tree) aligns with the actual
grammatical structure of a sentence, as
represented by a gold standard parse tree.
Parser Evaluation: Why?

• Parsing is a crucial step in many NLP tasks, and it's essential to ensure the parser is functioning correctly.
• Evaluation metrics help us compare different
parsers, identify areas for improvement, and track
the progress of parser development.
Common Evaluation Metrics

• Precision:
– This metric measures the proportion of the parser's
predicted constituents that are actually correct
according to the gold standard.
• Recall:
– This metric measures the proportion of the gold
standard constituents that are correctly identified by
the parser.
• F1-Score:
– This is a harmonic mean of precision and recall,
providing a balanced view of the parser's performance.
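
A short sketch of how these three metrics are computed over labeled constituent spans; the gold and predicted span sets are made-up illustrations, with each span written as (label, start, end):

```python
# A small sketch of precision/recall/F1 over labeled constituent spans,
# comparing predicted spans against gold-standard spans (illustrative data).
gold = {("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5), ("S", 0, 5)}
pred = {("NP", 0, 2), ("VP", 2, 5), ("NP", 2, 5), ("S", 0, 5)}

correct = len(gold & pred)
precision = correct / len(pred)     # fraction of predicted spans that are right
recall = correct / len(gold)        # fraction of gold spans that were found
f1 = 2 * precision * recall / (precision + recall)

print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")  # P=0.75 R=0.75 F1=0.75
```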
Common Evaluation Metrics

• Labeled Attachment Error (LAE):
– This metric calculates the percentage of words in
the sentence that are not assigned the correct
syntactic label (part-of-speech and phrase type)
in the parse tree.
• Unlabeled Attachment Score (UAS):
– This metric focuses on whether words are
attached to the correct constituent in the parse
tree, regardless of the specific label assigned.
Choosing the Right Metric

• Precision is crucial when it's important to avoid false positives (incorrectly identified constituents).
• Recall is important when it's essential to capture all the
correct constituents, even if it leads to some false
positives.
• F1-Score provides a balance between precision and
recall.
• LAE focuses on accurate labeling of syntactic structure.
• UAS focuses on correctly attaching words to
constituents.
Partial Parsing

• In NLP (Natural Language Processing), partial parsing refers to the process of analyzing only a portion of a sentence's grammatical structure, rather than the entire sentence.
• It focuses on identifying and extracting specific
grammatical elements instead of building a
complete parse tree that depicts the whole
hierarchical structure.
Partial Parsing: Why?

• Efficiency:
– For some tasks, a complete parse tree might not be necessary.
Partial parsing can be more efficient, especially for large
datasets or real-time applications.
• Focus on Specific Elements:
– Sometimes, the focus might be on identifying specific
grammatical elements like named entities, verb phrases, or
noun phrases. Partial parsing can be tailored to extract these
elements directly.
• Handling Complexity:
– Complex sentences or ungrammatical structures can be
challenging for full parsers. Partial parsing might be able to
extract useful information even in such cases.
Partial Parsing: Types

• Chunking:
– This involves identifying and labeling non-
overlapping chunks of words representing
phrases like noun phrases, verb phrases, or
prepositional phrases.
• Tagging:
– This assigns part-of-speech (POS) tags to
individual words, indicating their grammatical
function (noun, verb, adjective, etc.).
Partial Parsing: Types

• Entity Recognition:
– This focuses on identifying and classifying named
entities like people, organizations, or locations
within the sentence.
• Shallow Parsing:
– This involves extracting basic syntactic
information like subject-verb relationships or
dependency links between words.
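
As an illustration, chunking (one of the partial-parsing types above) can be done with NLTK's RegexpParser. This sketch assumes NLTK plus its tokenizer and POS-tagger models are downloaded, and the NP pattern is deliberately simple:

```python
# A brief sketch of partial parsing (chunking) with NLTK: POS-tag the words,
# then extract noun-phrase chunks with a simple regular-expression grammar.
import nltk

sentence = "The quick brown fox jumped over the lazy dog"
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))

# NP chunk = optional determiner, any adjectives, then a noun
chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>}")
print(chunker.parse(tagged))   # a shallow tree with NP chunks, no full parse
```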
CCG Parsing

• CCG parsing, also known as Combinatory Categorial Grammar (CCG) parsing, is a technique used in NLP (Natural Language Processing) to analyze the grammatical structure of sentences.
• It differs from traditional constituency parsing by
focusing on the categories assigned to words and
how they combine to form a complete sentence.
CCG Parsing

• Core Idea:
– Categories and Combinators:
• CCG assigns lexical categories to words and phrases
that represent their grammatical function and how
they can combine with other elements. These
categories can be complex, reflecting the richness of
natural language.
– Combinatory Logic:
• CCG employs a set of predefined operations
(combinators) that specify how categories can be
combined to form new categories. These operations
determine the valid syntactic structures for sentences.
CCG Parsing

• Input:
– The parser takes a sentence as input.
• Lexical Categorization:
– Each word in the sentence is assigned a lexical
category based on its part-of-speech and its role in
combining with other words.
– For example, a noun phrase might have the category "NP", and a transitive verb might have the category "(S\NP)/NP" (it takes an NP argument to its right, then an NP to its left, and returns an S (sentence)).
CCG Parsing

• CCG Combinators:
– The parser applies the CCG combinators to combine the
categories of adjacent words, following the valid rules
defined by the combinators. These combinations build
up a structure that reflects the grammatical
relationships between words.
• Derivation and Output:
– If a valid combination of categories leads to the category
"S" (sentence) at the end, the parser has successfully
derived a grammatical structure for the sentence. This
derivation process essentially shows how the individual
words combine to form the complete sentence.
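
To make the derivation concrete, here is a tiny hand-rolled sketch of CCG's forward and backward application combinators operating on category strings. The lexicon and the string-matching implementation are illustrative simplifications, not a full CCG engine:

```python
# A tiny sketch of CCG's two application combinators, with categories written
# as strings. The lexicon and sentence are illustrative assumptions.
def combine(left, right):
    """Forward application: X/Y Y => X. Backward application: Y X\\Y => X."""
    if left.endswith("/" + right):            # e.g. (S\NP)/NP applied to NP
        return left[: -len("/" + right)].strip("()")
    if right.endswith("\\" + left):           # e.g. NP followed by S\NP
        return right[: -len("\\" + left)].strip("()")
    return None

lexicon = {"the": "NP/N", "dog": "N", "cat": "N", "chased": "(S\\NP)/NP"}

# Derivation for "the dog chased the cat", combining adjacent categories:
np1 = combine(lexicon["the"], lexicon["dog"])    # NP/N  N      => NP
np2 = combine(lexicon["the"], lexicon["cat"])    # NP/N  N      => NP
vp = combine(lexicon["chased"], np2)             # (S\NP)/NP NP => S\NP
s = combine(np1, vp)                             # NP  S\NP     => S
print(np1, vp, s)                                # NP S\NP S
```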
Dependency Parsing

• Dependency parsing, a fundamental technique in Natural Language Processing (NLP), analyzes sentence structure by focusing on the grammatical relationships between words.
• Unlike constituency parsing, which deals with
hierarchical phrases and clauses, dependency
parsing reveals how individual words depend on
each other to form a complete sentence.
Dependency Parsing

• Core Concept:
– Words and Dependencies:
• Each word in a sentence is considered a node, and
the parser identifies the grammatical dependency
between words.
• A dependency link connects a "head" word to its
"dependent" word, indicating how the dependent
word modifies or complements the meaning of
the head word.
Dependency Parsing

• Process:
– Input: The parser takes a sentence as input.
– Dependency Identification: The parser analyzes the
sentence to identify the head word for each word and
the grammatical relationship between them. This
relationship can be labeled with specific dependency
tags like "subject," "object," "modifier," etc.
– Output: The parser outputs a dependency graph (or
tree) that visually represents the identified
relationships. Each word is a node in the graph, and
directed arrows connect the heads to their dependents,
along with the dependency labels.
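
A quick illustration using spaCy's pretrained dependency parser (this assumes spaCy and its small English model en_core_web_sm are installed; the printed labels follow spaCy's dependency scheme):

```python
# A quick sketch using spaCy's pretrained dependency parser.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The dog chased the cat")

for token in doc:
    # each word, its dependency label, and the head word it attaches to
    print(f"{token.text:<8} {token.dep_:<8} head={token.head.text}")
```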
Dependency Parsing: Types

• Rule-based Parsing:
– Relies on predefined grammatical rules to
identify dependencies.
• Statistical Parsing:
– Uses statistical models trained on large datasets
of pre-annotated sentences to predict the most
likely dependencies.
Dependency Parsing: Benefits

• Straightforward Relationships:
– Dependency parsing directly captures the grammatical
relationships between words, making it intuitive and
interpretable.
• Handling Complexities:
– It can handle complex sentence structures and word
orders more effectively than constituency parsing in
some cases.
• Cross-Lingual Applicability:
– Dependency parsing can be more easily adapted to
different languages compared to constituency parsing.
Dependency Relations

• In dependency parsing, dependency relations, also known as grammatical relations or dependency labels, are specific tags that describe the grammatical connection between a "head" word and its "dependent" word in a sentence.
• These labels provide a detailed explanation of how
the dependent word modifies or complements the
meaning of the head word.
Dependency Relations: Types

• Subject (subj): This identifies the subject of a verb. (e.g., "The dog (dependent, subj) chased (head) the cat").
• Object (obj): This identifies the direct or indirect object of a verb. (e.g., "I gave (head) a gift (dependent, obj) to her").
• Modifier (mod): This is a general category for various modifiers, including adjectives (e.g., "a red (dependent, mod) car (head)"), adverbs (e.g., "She ran (head) quickly (dependent, mod)"), and prepositional phrases (e.g., "The house (head) on the hill (dependent, mod) is beautiful").
• Possessive (poss): This identifies the possessor of a noun. (e.g., "The man's (dependent, poss) hat (head) is red").
• Aux (aux): This identifies auxiliary verbs that help form verb tenses. (e.g., "She has (dependent, aux) been waiting (head) for a long time").
Universal Dependency (UD) Tags

• A widely used standard for dependency relations is the Universal Dependencies (UD) tag set. UD provides a consistent set of labels across different languages, facilitating cross-lingual NLP tasks. Here are some examples of UD tags:
– nsubj (nominal subject): Similar to subj, but more specific
about the noun phrase being the subject.
– dobj (direct object): Similar to obj, but specifically for direct
objects.
– poss (possessive): Same as the traditional poss tag.
– aux (auxiliary): Same as the traditional aux tag.
– advmod (adverbial modifier): Similar to mod for adverbs.
– amod (adjectival modifier): Similar to mod for adjectives.
Dependency Formalism

• In dependency parsing, dependency formalism refers to the system used to represent the grammatical relationships between words.
• It defines how these relationships are
identified, labeled, and structured within the
dependency parse output.
Dependency Formalism

• Basic Dependency Labels:
– This is a simple approach where the dependency
relationships are represented with basic labels
like "subj" (subject), "obj" (object), "mod"
(modifier), etc.
– These labels provide a general understanding of
how words depend on each other but might not
capture all the nuances of the relationship.
Dependency Formalism

• Extended Dependency Labels:
– This formalism builds upon basic labels by including
more specific information.
– For example, instead of just "mod," it could use
labels like "advmod" (adverbial modifier), "nmod"
(nominal modifier), or "ppmod" (prepositional phrase
modifier).
– These more specific labels provide a richer
understanding of the grammatical function of the
modifier.
Dependency Formalism

• Universal Dependencies (UD):
– This is a widely used and standardized dependency
formalism that employs a core set of labels (around
40) to represent grammatical relationships across
different languages.
– UD labels are both concise and informative,
capturing essential syntactic information.
– It also allows for language-specific extensions for
capturing additional complexities.
Dependency Formalism

• Grammatical Framework (GF) Dependencies:
– This formalism uses a more theoretical approach
based on linguistic frameworks like Lexical
Functional Grammar (LFG).
– GF dependencies go beyond basic syntactic
relationships and can represent semantic roles
and argument structures within the sentence.
Dependency Treebank

• In dependency parsing, a dependency treebank is a crucial resource that serves as a training ground and evaluation benchmark for dependency parsers.
• It's essentially a large collection of sentences where
each sentence has been annotated with its
corresponding dependency structure.
Dependency Treebank: Components

• Sentences:
– The treebank consists of a large number of sentences
from a specific language.
• Dependency Annotations: Each sentence is annotated with
its dependency relationships. This annotation involves:
– Identifying the head word for each word in the sentence.
– Labeling the dependency relation between the head
word and its dependent words. These labels specify the
grammatical role of the dependent word (e.g., subject,
object, modifier).
Dependency Treebank: Components

• Structure and Format:
– Tree Representation:
• The dependency structure can be visualized as a
dependency tree, where each word is a node and
directed edges connect heads to their dependents,
labeled with the dependency relation.
– File Format:
• Treebanks typically use standardized file formats like CoNLL-U, which specifies how sentences and
their dependency annotations are represented in
text files.
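
For illustration, here is what a single sentence might look like in CoNLL-U (a hypothetical fragment; each word gets ten columns, with HEAD pointing to the ID of its head word and 0 marking the root):

```
# An illustrative CoNLL-U fragment (real files use tab-separated columns:
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC)
1   The     the     DET   DT   _   2   det     _   _
2   dog     dog     NOUN  NN   _   3   nsubj   _   _
3   chased  chase   VERB  VBD  _   0   root    _   _
4   the     the     DET   DT   _   5   det     _   _
5   cat     cat     NOUN  NN   _   3   obj     _   _
```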
Transition-Based Dependency Parsing

• This method views dependency parsing as a sequence of decision-making steps (transitions) that progressively build the dependency structure of a sentence.
• It utilizes a state machine to represent the current
parsing progress and a model that predicts the
most likely next transition based on the current
state.
Transition-Based Dependency Parsing

• Process:
– Input: The parser takes a sentence as input.
– Initial State: Parsing starts with an initial state, often representing an
empty dependency structure.
– Transition Sequence: The model predicts a sequence of transitions based
on the current state. These transitions can involve actions like:
• Shift: Move the next word from the sentence to the processing stack.
• Arc-Left/Right: Create a dependency link between a word on the stack
and another word in the structure, depending on whether the head
word is to the left or right.
• Reduce: Finalize a dependency structure for a portion of the sentence.
– Final State: The parsing process ends when a designated final state is
reached, representing a complete and valid dependency structure for the
sentence.
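
The sketch below implements the core loop of an arc-standard transition system (one common variant of this method). The action sequence is hand-written for illustration, whereas a real parser would predict each action with a trained model:

```python
# A minimal sketch of the arc-standard transition system: a stack, a buffer,
# and three actions. The sentence and oracle actions are illustrative.
def parse(words, actions):
    stack, buffer, arcs = [], list(words), []
    for action in actions:
        if action == "SHIFT":               # move the next word onto the stack
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":          # second-from-top depends on the top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))   # (head, dependent)
        elif action == "RIGHT-ARC":         # top depends on second-from-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))   # (head, dependent)
    return arcs

gold = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "LEFT-ARC"]
print(parse(["The", "dog", "barked"], gold))
# [('dog', 'The'), ('barked', 'dog')]
```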
Graph-Based Dependency Parsing

• This method approaches parsing as a search problem within a space of all possible dependency graphs for a given sentence.
• The objective is to find the graph that best
represents the grammatical relationships between
words.
Graph-Based Dependency Parsing: Process

• Input: The parser takes a sentence as input.
• Dependency Candidates: The parser identifies all
possible dependency links between pairs of words
in the sentence based on grammatical rules or
learned patterns.
• Evaluation Function: A scoring function is used to
evaluate each possible dependency graph. This
function considers factors like the compatibility of
the dependencies with grammatical rules, the
word order, and potentially semantic information.
Graph-Based Dependency Parsing: Process

• Search Algorithm: The parser employs a search algorithm (e.g., beam search) to explore the space of possible graphs, prioritizing those with higher scores.
• Highest Scoring Graph: The search terminates
when the highest-scoring dependency graph is
found, representing the most likely dependency
structure for the sentence.
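
A toy sketch of the graph-based idea: score every candidate (head, dependent) arc, then choose the best head for each word. The scores here are made up, and a real parser would learn them and use a maximum-spanning-tree search rather than this independent argmax, to guarantee a well-formed tree:

```python
# A toy sketch of graph-based head selection over hand-assigned arc scores.
words = ["ROOT", "The", "dog", "barked"]
score = {                      # score[(head, dependent)], higher is better
    (3, 2): 9.0, (0, 3): 8.0,  # dog attaches to barked, barked to ROOT
    (2, 1): 7.0, (3, 1): 1.0,  # The attaches to dog (good) or barked (bad)
    (2, 3): 2.0, (1, 2): 0.5,
}

for dep in range(1, len(words)):   # every word except ROOT needs one head
    candidates = [(s, h) for (h, d), s in score.items() if d == dep]
    best_score, best_head = max(candidates)
    print(f"{words[dep]:<7} head -> {words[best_head]}")
```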
Evaluations

• In dependency parsing, evaluation refers to the process of assessing how well a dependency parsing model performs in identifying the grammatical relationships between words in a sentence.
• This involves comparing the model's predicted
dependency structures with gold-standard annotations
representing the correct grammatical relationships.
Evaluations

• Why Evaluate Dependency Parsers?
– Model Improvement:
• Evaluation helps identify strengths and weaknesses of the
parsing model, guiding further development and training.
– Comparison of Models:
• Evaluation metrics allow researchers to compare the
performance of different dependency parsing models
objectively.
– Task Suitability:
• Evaluation helps determine if a model is suitable for a
specific NLP task that relies on accurate dependency
parsing.
Evaluations : Metrics

• Attachment Rate (AR): Measures the proportion of words in the sentence where the model correctly predicts the head word.
• Label Accuracy (LA): Focuses on whether the model assigns the correct dependency label (e.g., subject, object) to each dependency relation.
• Labeled Attachment Score (LAS): Combines both attachment and label accuracy, providing a balanced overall score.
• Unlabeled Attachment Score (UAS): Similar to LAS but ignores the dependency labels, focusing solely on correctly attaching words to their head words.
• MLF1 Score: This metric takes into account precision, recall, and
F1-score for both attachment and label accuracy, offering a more
comprehensive evaluation.
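
A short sketch of computing UAS and LAS for a single sentence, with made-up gold and predicted (head, label) pairs per word:

```python
# A short sketch computing UAS and LAS for one sentence: gold and predicted
# (head index, label) pairs per word are illustrative data.
gold = {"The": (2, "det"), "dog": (3, "nsubj"), "barked": (0, "root")}
pred = {"The": (2, "det"), "dog": (3, "obj"), "barked": (0, "root")}

n = len(gold)
uas = sum(pred[w][0] == h for w, (h, _) in gold.items()) / n   # head only
las = sum(pred[w] == gl for w, gl in gold.items()) / n          # head + label

print(f"UAS={uas:.2f} LAS={las:.2f}")   # UAS=1.00 LAS=0.67
```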
Summary

• Context-Free Grammar (CFG):
– Focus: Defines rules for generating grammatical
sentences in a language.
– Structure: Uses non-terminal symbols (categories)
and terminal symbols (words) to represent sentence
structure.
– Example Rule: NP -> DT N (Noun Phrase can be a
Determiner followed by a Noun)
– Use in Parsing: Provides a theoretical foundation for
constituency parsing.
Summary

• Constituency Parsing:
– Goal: Identifies phrases and clauses within a
sentence based on CFG rules or statistical models.
– Output: A parse tree representing the hierarchical
relationships between phrases (e.g., noun phrases,
verb phrases).
– Benefits: Suitable for tasks that require
understanding phrase structure.
– Limitations: Might struggle with complex sentences
or word order variations.
Summary

• Dependency Parsing:
– Goal: Analyzes sentence structure by focusing on
the grammatical relationships between words.
– Output: A dependency graph (or tree) showing how
words depend on each other (head-dependent
relationships).
– Benefits: More interpretable due to focus on word-
to-word relationships, handles complex structures
well.
– Limitations: Doesn't explicitly represent hierarchical
phrase structure, can struggle with ambiguity.
Thank you
This presentation was created using LibreOffice Impress 7.4.1.2 and can be used freely under the GNU General Public License.

@mITuSkillologies @mitu_group @mitu-skillologies @MITUSkillologies

Web Resources
https://mitu.co.in
@mituskillologies http://tusharkute.com @mituskillologies

[email protected]
[email protected]
