
Unit 3

Ambiguity resolution in parsing refers to how a computational model (typically in Natural Language Processing) resolves syntactic or semantic ambiguity when a sentence or phrase can be interpreted in more than one way. Here are three models commonly used for ambiguity resolution in parsing, each with an explanation and an example:

1. Probabilistic Context-Free Grammar (PCFG)


Explanation: A PCFG is an extension of a context-free
grammar (CFG) where each production rule is associated with
a probability. During parsing, multiple possible parse trees are
constructed for a sentence, and the one with the highest
overall probability is selected.
How it resolves ambiguity: It uses statistical frequency of
grammar rules from a training corpus to prefer the most
likely syntactic structure.
Example: "I saw the man with the telescope."
Two possible parses:
1. You used a telescope to see the man.
2. The man you saw had a telescope.
PCFG assigns probabilities based on how often these structures appear in a corpus. If, in the training data, "VP → V NP PP" is more probable than "NP → NP PP", the parser will favour interpretation 1 (you used the telescope).
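To make the mechanism concrete, here is a toy sketch using NLTK's PCFG and Viterbi parser. The rule probabilities below are invented for illustration; a real PCFG would estimate them from a treebank.

```python
import nltk

# Toy grammar: verb attachment (VP -> V NP PP) is given a higher
# probability than noun attachment (NP -> NP PP), so the parser
# prefers the "you used the telescope" reading.
grammar = nltk.PCFG.fromstring("""
    S   -> NP VP        [1.0]
    VP  -> V NP PP      [0.6]
    VP  -> V NP         [0.4]
    NP  -> NP PP        [0.2]
    NP  -> Det N        [0.5]
    NP  -> 'I'          [0.3]
    PP  -> P NP         [1.0]
    V   -> 'saw'        [1.0]
    Det -> 'the'        [1.0]
    N   -> 'man'        [0.5]
    N   -> 'telescope'  [0.5]
    P   -> 'with'       [1.0]
""")

# ViterbiParser returns the single highest-probability parse tree
parser = nltk.ViterbiParser(grammar)
for tree in parser.parse("I saw the man with the telescope".split()):
    print(tree)  # PP attaches to the VP: the instrument reading
```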

2. Dependency Parsing with Neural Models


Explanation: Neural dependency parsers use neural networks
(like BiLSTMs or Transformers) to predict syntactic
dependencies between words in a sentence. Each word is
connected to its syntactic head, forming a dependency tree.
Neural models are trained on large annotated corpora to
learn contextual patterns.
How it resolves ambiguity: They capture contextual word
usage and long-range dependencies using learned
embeddings and attention, which helps disambiguate
structure.
Example: "Flying planes can be dangerous."
Two meanings:
1. Planes that fly are dangerous.
2. The act of flying planes is dangerous.
A neural dependency parser trained on enough examples will analyse "flying" as a verb (gerund) or an adjective based on context. If "can be dangerous" frequently follows a gerund subject, it prefers interpretation 2.
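A minimal illustration with spaCy (assuming the en_core_web_sm model has been installed via `python -m spacy download en_core_web_sm`); the exact labels depend on the model version:

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Flying planes can be dangerous.")

# Print each word, its dependency label, and its syntactic head.
# Whether "Flying" heads "planes" (gerund reading) or modifies it
# (adjective reading) is decided by the learned model.
for token in doc:
    print(f"{token.text:<10} {token.dep_:<8} head={token.head.text}")
```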
3. Semantic Role Labelling (SRL) with Transformer Models
(e.g., BERT)
Explanation: SRL identifies the predicate-argument structure
of a sentence: who did what to whom, when, and where.
Transformer models like BERT can be fine-tuned for SRL tasks.
They use contextualized embeddings to assign roles like
agent, theme, instrument, etc.
How it resolves ambiguity: It uses semantic information and
world knowledge embedded in the model to understand
meaning beyond syntax.
Example: "She hit the man with the stick."
Two meanings:
1. She used a stick to hit the man.
2. She hit the man who had a stick.
SRL attempts to assign roles like:
• Agent: She
• Patient/Theme: The man
• Instrument: The stick (if applicable)
If the model learns from context that "with the stick" often functions as an instrument after "hit", it chooses the first interpretation.
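A hedged sketch of how this looks with the Hugging Face transformers pipeline. The model name below is a placeholder: it stands for any BERT checkpoint fine-tuned for SRL-style token classification with PropBank-style labels (ARG0, V, ARG1, ARGM, ...):

```python
from transformers import pipeline

# "my-org/bert-srl" is hypothetical; substitute a real SRL-fine-tuned
# checkpoint. aggregation_strategy="simple" merges word pieces back
# into whole words before printing.
srl = pipeline("token-classification",
               model="my-org/bert-srl",
               aggregation_strategy="simple")

for span in srl("She hit the man with the stick."):
    print(span["word"], span["entity_group"])
# Expected role pattern for the instrument reading:
# She -> ARG0, hit -> V, the man -> ARG1, with the stick -> ARG2/ARGM
```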

Summary Table

| Model | Ambiguity Type | Mechanism | Example Sentence | Preferred Interpretation |
|-------|----------------|-----------|------------------|--------------------------|
| PCFG | Syntactic | Rule probabilities | "I saw the man with the telescope." | You used a telescope |
| Neural Dependency Parser | Syntactic/Structural | Learned dependency relations | "Flying planes can be dangerous." | Flying is dangerous |
| SRL with BERT | Semantic | Contextual role assignment | "She hit the man with the stick." | Stick was used as instrument |


Multilingual issues in parsing and ambiguity resolution arise when parsing systems are applied to multiple languages that differ significantly in grammar, vocabulary, and structure. These issues affect the accuracy and robustness of parsing models, especially when a model trained on one language is applied to another.
Here are some key multilingual issues, explained with examples:

1. Syntactic Variation Across Languages


Issue: Different word order and grammatical rules can make
it difficult to build a universal parser.
Example:
• English (Subject-Verb-Object): "She eats apples."
• Japanese (Subject-Object-Verb): "Kanojo wa ringo o taberu." (She apples eats)
A parser trained on English may struggle with Japanese
syntax unless it's trained with language-specific or
multilingual data.
Impact on ambiguity resolution: Models may fail to resolve
syntactic ambiguity correctly because they expect structures
from a different language.

2. Morphological Richness
Issue: Languages like Turkish, Finnish, or Arabic have
complex morphology (word forms change based on
grammatical function), which creates sparse data and
challenges for parsers.
Example:
• Turkish: "Evlerimizden" = "from our houses". A single word contains root + plural + possessive + case suffixes: ev ("house") + -ler (plural) + -imiz ("our") + -den (ablative, "from").
Impact: Parsing systems must handle morphological analysis
before syntactic parsing. Ambiguity can arise due to multiple
possible segmentations and interpretations.

3. Lexical Ambiguity and Polysemy Across Languages


Issue: Words with multiple meanings or translations can
introduce ambiguity.
Example:
• English: "Bank" (financial institution or river bank)
• German: "Bank" can also mean a bench.
Impact: Multilingual models must disambiguate based on
context, which may vary in how the meaning is signaled in
each language.

4. Lack of Annotated Data in Low-Resource Languages


Issue: Many languages lack large annotated treebanks or
training corpora, which are essential for training statistical or
neural parsers.
Example:
• English has the Penn Treebank, Universal Dependencies, etc.
• Many African or Indigenous languages have no such resources.
Impact: Transfer learning, cross-lingual embeddings, or
multilingual pre-trained models like mBERT or XLM-R are
used, but performance still lags behind high-resource
languages.
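A brief sketch of why multilingual pre-trained encoders help: a model like XLM-R embeds sentences from many languages into one shared feature space, so a parser head trained on a high-resource language can be applied zero-shot to a low-resource one. This only loads the encoder; the parsing head itself is not shown:

```python
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
enc = AutoModel.from_pretrained("xlm-roberta-base")

# English and Japanese sentences pass through the same encoder and
# land in the same representation space.
for sent in ["She eats apples.", "Kanojo wa ringo o taberu."]:
    out = enc(**tok(sent, return_tensors="pt"))
    print(sent, "->", out.last_hidden_state.shape)
```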

5. Code-Switching and Mixed Language Texts


Issue: Many speakers mix languages in speech or writing
(code-switching), confusing parsers trained on monolingual
data.
Example:
"I will go to the bazaar kal."
(English-Hindi mix; "kal" = tomorrow)
Impact: Language identification and context-sensitive
parsing must be integrated into a multilingual parser to handle
such input.
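A naive sketch of the language-identification step using the langid package; real code-switching systems use sequence models with context, since single short tokens give only weak evidence:

```python
import langid  # pip install langid

sentence = "I will go to the bazaar kal"
for word in sentence.split():
    lang, score = langid.classify(word)  # (language code, score)
    print(f"{word:<8} -> {lang}")
# Per-token guesses are noisy; ideally "kal" comes out as Hindi/Urdu,
# flagging a code-switch point that the parser must handle.
```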

Summary Table

| Issue | Description | Example | Impact |
|-------|-------------|---------|--------|
| Syntactic variation | Different word order and grammar | English: "I eat rice" vs. Japanese: "I rice eat" | Parse trees differ; model may misparse |
| Morphological richness | Complex word forms in agglutinative languages | Turkish: "Evlerimizden" | Difficulty in segmentation and disambiguation |
| Lexical ambiguity | Words with many meanings/uses across languages | "Bank" in English vs. German | Incorrect meaning in parsing |
| Data scarcity | Few annotated corpora for many languages | African/Indigenous languages | Poor parser performance |
| Code-switching | Mixed-language input | "Let's go kal" | Confuses monolingual models |


Semantic Parsing I: Introduction, Semantic Interpretation, System Paradigms, Word Sense
1. Introduction to Semantic Parsing
Semantic parsing is the process of converting natural
language into a formal representation of its meaning. This
representation is typically machine-interpretable, allowing
systems to perform tasks such as question answering,
machine translation, information retrieval, and dialogue
systems. Unlike syntactic parsing, which focuses on the
grammatical structure of a sentence, semantic parsing aims
to uncover the underlying meaning.
Semantic parsing is essential in natural language
understanding (NLU) because natural language is inherently
ambiguous, context-dependent, and varied. It bridges the gap
between human language and machine-readable formats,
such as logic-based formal languages, database queries (e.g.,
SQL, SPARQL), and knowledge graphs.

2. Semantic Interpretation
Semantic interpretation refers to the process by which a
system assigns meaning to linguistic expressions. The goal is
to map words, phrases, and sentences to their corresponding
meaning representations.
2.1 Compositional Semantics
Compositional semantics is based on the principle that the
meaning of a sentence is derived from the meaning of its
parts and the rules used to combine them (Frege's Principle).
For example:
• Sentence: "Every student passed the exam."
• Meaning: For all entities x, if x is a student, then x passed the exam.
This can be expressed in first-order logic:
∀x(Student(x) → Passed(x, exam))
2.2 Semantic Representation Languages
Several formal languages are used to express meaning:
• First-order logic (FOL): Traditional representation with quantifiers and predicates.
• Lambda calculus: Enables function abstraction and application; often used in compositional semantics.
• Description logics: Used in the Semantic Web and ontology-based systems.
• Database query languages: Like SQL for relational databases or SPARQL for RDF data.
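A small sketch with NLTK's logic module showing both the FOL form and lambda-calculus composition; the predicate names are chosen for this example:

```python
from nltk.sem.logic import Expression

read_expr = Expression.fromstring

# First-order logic form of "Every student passed the exam"
fol = read_expr('all x.(student(x) -> passed(x, exam))')
print(fol)

# Lambda calculus: the VP meaning is a function over individuals;
# applying it to a subject and beta-reducing yields a proposition.
vp = read_expr(r'\x.passed(x, exam)')
print(vp.applyto(read_expr('john')).simplify())  # passed(john,exam)
```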

2.3 Scope Ambiguity and Quantifier Scoping


Natural language often presents scope ambiguity:
• Sentence: "Every student read a book."
  o Reading 1: Each student read potentially different books.
  o Reading 2: There is one specific book that all students read.
Semantic parsing must resolve such ambiguities using
syntactic cues, context, or probabilistic models.
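In the first-order notation used above, the two readings differ only in quantifier order:
Reading 1 (every > a): ∀x(Student(x) → ∃y(Book(y) ∧ Read(x, y)))
Reading 2 (a > every): ∃y(Book(y) ∧ ∀x(Student(x) → Read(x, y)))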

3. System Paradigms
Semantic parsers can be implemented using several
paradigms, each with strengths and weaknesses.
3.1 Rule-Based Systems
These systems rely on manually crafted rules that map
linguistic expressions to semantic representations.
• Pros: Transparent and interpretable.
• Cons: Labor-intensive and hard to scale; brittle with respect to linguistic variation.
3.2 Grammar-Based Semantic Parsers
These systems use semantic grammars that define how words
and structures map to meaning.
• Examples: CCG (Combinatory Categorial Grammar), HPSG (Head-Driven Phrase Structure Grammar)
• Use syntactic parsing with semantic composition rules.
3.3 Statistical Semantic Parsers
These rely on machine learning techniques to learn mappings
from sentences to formal representations.
• Trained on annotated datasets like GeoQuery or ATIS.
• Use features extracted from syntax, semantics, and context.
• Algorithms: Maximum Entropy, CRFs, and Bayesian models.
3.4 Neural Semantic Parsers
Recent advances involve deep learning models, especially
encoder-decoder architectures and transformers.
• Sequence-to-sequence (Seq2Seq) models with attention mechanisms.
• Transformer-based models like BERT and T5, fine-tuned for parsing.
• Can generalize better and handle noisy input, but may lack interpretability.
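A hedged Seq2Seq sketch with transformers: the checkpoint name is a placeholder for a T5 model fine-tuned to map questions to SQL (an off-the-shelf "t5-small" would need such fine-tuning first), and the printed query is only illustrative:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

name = "my-org/t5-text-to-sql"  # hypothetical fine-tuned checkpoint
tokenizer = T5Tokenizer.from_pretrained(name)
model = T5ForConditionalGeneration.from_pretrained(name)

question = "translate to SQL: Which students passed the exam?"
inputs = tokenizer(question, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# e.g. SELECT name FROM students WHERE passed = 1   (illustrative)
```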
3.5 Hybrid Approaches
Combine symbolic and neural techniques:
• Symbolic rules ensure correctness.
• Neural components provide robustness and scalability.

4. Word Sense Disambiguation (WSD)


Word sense disambiguation is a key challenge in semantic
parsing, as many words have multiple meanings depending
on context.
4.1 Types of Ambiguity
• Lexical ambiguity: A single word has multiple unrelated meanings.
  o Example: "bank" (financial institution vs. riverbank)
• Polysemy: A word has related meanings.
  o Example: "paper" (material vs. scholarly article)
4.2 Approaches to WSD
4.2.1 Knowledge-Based Methods
Use dictionaries, thesauri (e.g., WordNet), and ontologies.
• Lesk algorithm: Disambiguates words by comparing dictionary definitions of each sense with the surrounding context.
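NLTK ships a simplified Lesk implementation; this assumes the wordnet and punkt NLTK data packages have been downloaded:

```python
from nltk import word_tokenize
from nltk.wsd import lesk

# Picks the WordNet synset whose gloss overlaps most with the context
context = word_tokenize("I went to the bank to deposit my money")
sense = lesk(context, "bank")
print(sense, "-", sense.definition())
```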
4.2.2 Supervised Learning
Train classifiers on sense-annotated corpora.
• Requires labeled data like SemCor.
• Features include surrounding words, POS tags, and collocations.
4.2.3 Unsupervised Learning
Cluster contexts of word usage into different sense groups.
• No labeled data needed.
• Techniques include context-vector clustering and topic modeling.
4.2.4 Neural Approaches
Use contextual embeddings (e.g., BERT, ELMo) to capture
word meaning in context.
• Fine-tuned models can achieve state-of-the-art performance.
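A minimal sketch of the idea: the same surface word "bank" gets different contextual vectors from BERT depending on its sentence, and occurrences in the same sense tend to sit closer together. The sentences and the similarity pattern are illustrative:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embedding_of(sentence: str, word: str) -> torch.Tensor:
    """Contextual embedding of the first occurrence of `word`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    idx = enc.input_ids[0].tolist().index(
        tokenizer.convert_tokens_to_ids(word))
    return hidden[idx]

river = embedding_of("she sat on the river bank", "bank")
money1 = embedding_of("he robbed the bank downtown", "bank")
money2 = embedding_of("he deposited cash at the bank", "bank")

cos = torch.nn.functional.cosine_similarity
print(cos(river, money1, dim=0).item())   # across senses: lower
print(cos(money1, money2, dim=0).item())  # same sense: higher
```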
4.3 Integration with Semantic Parsing
• WSD is often a pre-processing step in semantic parsing.
• Some modern semantic parsers perform WSD implicitly as part of the representation learning.

5. Applications of Semantic Parsing


Semantic parsing powers several NLP tasks:
• Question Answering: Mapping natural questions to database queries.
• Virtual Assistants: Interpreting commands and intentions.
• Dialogue Systems: Tracking and updating user intents.
• Information Extraction: Mapping unstructured text to structured facts.
• Machine Translation: Representing meaning to ensure accurate translation across languages.
6. Challenges and Research Directions
• Data scarcity: Annotated datasets are limited and domain-specific.
• Domain adaptation: Parsing models often fail to generalize.
• Explainability: Neural models are often black boxes.
• Multilingual semantic parsing: Extending systems to multiple languages.
• Knowledge integration: Combining background knowledge with parsing.
• Commonsense reasoning: Parsing needs to go beyond syntax and include world knowledge.

7. Conclusion
Semantic parsing is a critical component in achieving human-
like language understanding in machines. By mapping natural
language into formal, machine-readable structures, it enables
advanced applications like question answering, dialogue
systems, and intelligent agents. Future advances lie in
combining symbolic reasoning with deep learning, handling
low-resource languages, and improving interpretability and
robustness.
