NLP Unit-3
Entities: These are the nouns or objects in the sentence. For example,
"apple", "John", "car".
Relations: These capture how entities are connected or related. For example,
"in", "on", "with".
Quantifier: Every. The sentence "Every student passed the exam" can be
written as ∀x (student(x) → passed(x, exam)).
This means "For every x, if x is a student, then x passed the exam."
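The universal quantifier can be checked against a small set-based model; a minimal sketch (the student and exam data below are made up):

```python
# Toy model for "For every student x, x passed the exam" (made-up data)
students = {"alice", "bob"}
passed = {"alice", "bob", "carol"}

# The quantified statement "for all x: student(x) -> passed(x)" holds
# iff every member of `students` is also in `passed`.
holds = all(x in passed for x in students)
print(holds)  # True
```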
Semantic parsing can also generate executable code from natural language
instructions.
For example, the instruction "Print the numbers from 1 to 5" could produce:
Output (Python):
for i in range(1, 6):
    print(i)
The instruction "Sort the list in ascending order" could produce:
Output (Python):
sorted_list = sorted(original_list)
Challenges in Semantic Parsing: these include the ambiguity of natural
language, its variability (many surface forms for the same meaning), and the
difficulty of scaling systems beyond narrow domains.
2. Semantic Interpretation
Semantic interpretation in Natural Language Processing (NLP) refers to the
process of understanding the meaning of text or speech in a way that captures
the intent, context, and relationships between words and phrases. It goes beyond
syntactic analysis (which focuses on grammar and structure) to derive the deeper
meaning of language. This is crucial for tasks like machine translation, sentiment
analysis, question answering, and more.
1. Syntactic Analysis:
o Verb: "chased"
2. Semantic Interpretation:
o Entities:
o Relationships:
o Semantic Roles:
3. Meaning:
For example:
1. Attachment Ambiguity:
o Example:
Interpretations:
2. Coordination Ambiguity:
o Example:
Interpretations:
o Example:
Interpretations:
4. Gapping Ambiguity:
o Example:
Interpretations:
1. Interpretation 1: the PP "in my pajamas" modifies the verb phrase (I was
wearing my pajamas when I shot the elephant).
o Syntax Tree:
S
├── NP: I
└── VP
├── V: shot
├── NP: an elephant
└── PP: in my pajamas
2. Interpretation 2: the PP "in my pajamas" modifies the noun phrase (the
elephant was in my pajamas).
o Syntax Tree:
S
├── NP: I
└── VP
├── V: shot
└── NP
├── Det: an
├── N: elephant
└── PP: in my pajamas
Resolving Structural Ambiguity
1. Context: surrounding sentences often make one reading far more plausible.
2. World Knowledge: real-world facts rule out readings (elephants do not wear
pajamas).
3. Statistical Models: parsers trained on treebanks prefer the attachment
that is more probable in the data.
4. Syntax-Based Rules: heuristics such as preferring attachment to the
nearest plausible head.
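As a rough illustration of the statistical approach, a PP attachment can be decided by comparing co-occurrence counts of (head, preposition) pairs; the counts below are invented for illustration:

```python
# Toy PP-attachment decision using made-up corpus co-occurrence counts
counts = {
    ("shot", "in"): 12,     # verb head seen with this preposition
    ("elephant", "in"): 3,  # noun head seen with this preposition
}

def attach(verb, noun, prep):
    """Attach the PP to whichever head co-occurs with the preposition more often."""
    v = counts.get((verb, prep), 0)
    n = counts.get((noun, prep), 0)
    return "verb" if v >= n else "noun"

print(attach("shot", "elephant", "in"))  # verb
```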
For example:
In the sentence "I went to the bank to deposit money," WSD identifies
"bank" as a financial institution.
We understand that words have different meanings based on the context of
their usage in a sentence. Human languages are ambiguous too, because many
words can be interpreted in multiple ways depending upon the context of their
occurrence.
For example, consider two sentences with distinct senses of the word "bass",
such as "I can hear bass sounds" and "He likes to eat grilled bass."
The occurrences of the word "bass" clearly denote distinct meanings: in the
first sentence it refers to low frequency (sound), and in the second it refers
to a fish. Hence, WSD can assign the correct sense to each occurrence.
A Dictionary
The very first input for WSD is a dictionary, which is used to specify the
senses to be disambiguated.
Test Corpus
Another input required by WSD is a sense-annotated test corpus that contains
the target or correct senses. Test corpora can be of two types: a
lexical-sample corpus, which annotates occurrences of a small sample of
target words, and an all-words corpus, which annotates all content words in
running text.
1. Knowledge-based Methods
These rely on knowledge resources such as dictionaries and thesauri. Examples
include the Lesk Algorithm, which selects the sense whose dictionary gloss
overlaps most with the context, and WordNet-based methods.
2. Supervised Methods
3. Semi-supervised Methods
Due to the lack of training corpora, most word sense disambiguation
algorithms use semi-supervised learning methods, because these can use both
labelled and unlabelled data. Such methods require only a very small amount
of annotated text and a large amount of plain unannotated text. The technique
used by semi-supervised methods is bootstrapping from seed data.
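The bootstrapping idea can be sketched on toy data: start with one seed collocate per sense, label whatever the seeds match, and grow the clue sets from newly labelled sentences. All sentences, seeds, and the stopword list below are invented:

```python
def bootstrap(sentences, seeds, rounds=2,
              stopwords=frozenset({"bass", "the", "a", "in", "from"})):
    """Yarowsky-style bootstrapping sketch: grow sense labels from seed collocates."""
    labels = {}
    clues = {sense: {seed} for sense, seed in seeds.items()}
    for _ in range(rounds):
        for sent in sentences:
            if sent in labels:
                continue
            words = set(sent.split())
            for sense, clue_set in clues.items():
                if words & clue_set:
                    labels[sent] = sense
                    clues[sense] |= words - stopwords  # learn new clue words
                    break
    return labels

sentences = [
    "play the bass guitar tonight",
    "caught a bass in the river",
    "a bass guitar solo",
    "grilled bass from the river",
]
result = bootstrap(sentences, {"music": "play", "fish": "caught"})
print(result)
```

Here "guitar" is learned as a music clue from the first sentence and "river" as a fish clue from the second, which is enough to label the remaining two sentences.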
4. Unsupervised Methods
These methods assume that similar senses occur in similar contexts. The
senses can therefore be induced from text by clustering word occurrences
using some measure of contextual similarity, a task called word sense
induction or discrimination. Unsupervised methods have great potential to
overcome the knowledge-acquisition bottleneck because they do not depend on
manual annotation effort.
Machine Translation
WSD is needed for lexical choice in MT: words that have different
translations for different senses require the correct sense to be identified
before they can be translated.
Information Retrieval
Information retrieval (IR) may be defined as a software program that deals
with the organization, storage, retrieval and evaluation of information from
document repositories, particularly textual information. The system assists
users in finding the information they require, but it does not explicitly
return the answers to their questions. WSD is used to resolve ambiguities in
the queries provided to an IR system. Like MT, current IR systems do not
explicitly use a WSD module; they rely on the user typing enough context in
the query to retrieve only relevant documents.
Lexicography
WSD and lexicography can work together in a loop, because modern lexicography
is corpus-based. For lexicography, WSD provides rough empirical sense
groupings as well as statistically significant contextual indicators of sense.
5. Entity and Event Resolution
Entity and event resolution are critical subtasks in semantic parsing that involve
identifying and disambiguating entities (e.g., people, organizations, locations) and
events (e.g., actions, occurrences) in text and mapping them to their
corresponding representations in a knowledge base or structured schema.
Entity Resolution
2. Candidate Generation:
3. Disambiguation:
o Techniques include:
Ambiguity: Many entity mentions are ambiguous (e.g., "Apple" could refer
to the company or the fruit).
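Disambiguating such a mention can be sketched as ranking candidate knowledge-base entries by context-word overlap; the knowledge base and its descriptions below are invented:

```python
# Toy entity linking: rank candidate KB entries by context-word overlap
# (the KB entries and descriptions are made up for illustration).
KB = {
    "Apple Inc.": {"technology", "company", "iphone", "mac", "software"},
    "apple (fruit)": {"fruit", "tree", "sweet", "red", "eat"},
}

def link(mention_context: str) -> str:
    context = set(mention_context.lower().split())
    return max(KB, key=lambda entity: len(context & KB[entity]))

print(link("Apple released a new iphone and mac update"))
```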
Event Resolution
1. Event Detection:
o Identify the participants (entities) and their roles in the event (e.g.,
"Barack Obama" as the agent in "elected president").
1. Question Answering:
3. Information Extraction:
4. Machine Translation:
5. Dialogue Systems:
6. Predicate-Argument Structure
Predicate-argument structure represents the relationships between a
predicate (a verb or action) and its arguments (the participants in the action). For
example:
Key Concepts
1. Predicate:
2. Arguments:
Location: The place where the action occurs (e.g., "in the
kitchen" in "John eats an apple in the kitchen").
3. Semantic Roles:
Example:
Predicate: "gave"
Arguments:
o Agent: "John"
o Recipient: "Mary"
Structured Representation:
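The predicate and its labelled arguments can be stored as a small record; a minimal sketch (the Theme value "a book" is an assumed filler for illustration):

```python
from dataclasses import dataclass

@dataclass
class PredicateArgumentStructure:
    predicate: str
    roles: dict  # semantic role -> filler

# "John gave Mary a book" as a predicate-argument record
pas = PredicateArgumentStructure(
    predicate="gave",
    roles={"Agent": "John", "Recipient": "Mary", "Theme": "a book"},
)

inner = ", ".join(f"{role}: {filler}" for role, filler in pas.roles.items())
print(f"{pas.predicate}({inner})")  # gave(Agent: John, Recipient: Mary, Theme: a book)
```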
1. Rule-based Methods:
1. Logical Forms:
o Example:
2. SQL Queries:
o Example:
o Example:
o Example:
o Example:
6. Programmatic Representations:
o Example:
1. Entities:
2. Predicates:
3. Arguments:
4. Modifiers:
There has been a debate over what constitutes the set of arguments and
what the granularity of such argument labels should be for various
predicates.
Resources
A corpus (plural: corpora) refers to a large and structured collection of
texts or speech data that is used for linguistic analysis, training machine
learning models, and developing NLP applications.
We have two important corpora that are semantically tagged. One is
FrameNet and the other is PropBank.
These resources have helped move the field from rule-based approaches to more
data-oriented approaches, which focus on transforming linguistic insights
into features.
o FrameNet
o PropBank
o Other Resources
FrameNet is based on the theory of frame semantics where a given
predicate invokes a semantic frame initiating some or all of the possible
semantic roles belonging to that frame.
PropBank is based on Dowty's prototype theory and uses a more linguistically
neutral view. Each predicate has a set of core arguments that are predicate
dependent and all predicates share a set of noncore or adjunctive arguments.
FrameNet
FrameNet contains frame-specific semantic annotation of a number of
predicates in English.
FrameNet is based on the theory of Frame Semantics, which posits that the
meaning of a word (especially verbs, nouns, and adjectives) can only be fully
understood in the context of a semantic frame. A semantic frame is a conceptual
structure that describes a situation, event, or relationship, along with the
participants (called frame elements) and their roles.
The process of FrameNet annotation consists of identifying specific semantic
frames and creating a set of frame-specific roles called frame elements.
A set of predicates that instantiate the semantic frame irrespective of their
grammatical category are identified and a variety of sentences are labelled
for those predicates.
The labelling process identifies the following:
o The frame that an instance of the predicate lemma invokes
o The semantic arguments for that instance
o Tagging them with one of the predetermined sets of frame elements
for that frame.
Key Components of FrameNet
1. Frame:
o A conceptual structure representing a situation, event, or
relationship.
o Example: The "Commerce_buy" frame describes a commercial
transaction involving a buyer, seller, goods, and money.
2. Lexical Units (LUs):
o Words or phrases that evoke a specific frame.
o Example: The verb "buy" evokes the "Commerce_buy" frame.
3. Frame Elements (FEs):
o The semantic roles or participants in a frame.
o Example: In the "Commerce_buy" frame, the frame elements are:
Buyer: The person purchasing the goods.
Seller: The person selling the goods.
Goods: The item being purchased.
Money: The currency exchanged.
4. Annotations:
o Real-world sentences annotated with frame elements to show how
words evoke frames.
o Example: In the sentence "John bought a book from Mary for $10,"
the annotations would identify:
"John" as the Buyer,
"a book" as the Goods,
"Mary" as the Seller,
"$10" as the Money.
5. Frame Relations:
o Relationships between frames, such as inheritance, causation, or
perspective.
o Example: The "Commerce_buy" frame inherits from the more
general "Commerce" frame.
How FrameNet is Used in Semantic Parsing
1. Frame Identification:
o Identify the frame evoked by a word or phrase in a sentence.
o Example: For the sentence "John sold a car to Mary," the verb "sold"
evokes the "Commerce_sell" frame.
2. Argument Role Labeling:
o Map the constituents of a sentence to their corresponding frame
elements.
o Example: In "John sold a car to Mary," the roles are:
"John" → Seller,
"a car" → Goods,
"Mary" → Buyer.
3. Meaning Representation:
o Use the identified frames and their elements to construct a
structured representation of the sentence's meaning.
o Example: The sentence "John sold a car to Mary" could be
represented as:
Commerce_sell(Seller: John, Goods: car, Buyer: Mary)
4. Integration with Other NLP Tasks:
o FrameNet annotations can be used to improve tasks like:
Question Answering: Understanding the roles in a question
and matching them to a knowledge base.
Information Extraction: Extracting structured information
from text.
Machine Translation: Preserving semantic roles during
translation.
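The Commerce_sell representation above can be built and rendered programmatically; a minimal sketch:

```python
# Frame-based representation for "John sold a car to Mary"
frame = {
    "frame": "Commerce_sell",
    "elements": {"Seller": "John", "Goods": "car", "Buyer": "Mary"},
}

def render(f):
    """Render a frame as Frame(Role: filler, ...)."""
    inner = ", ".join(f"{role}: {filler}" for role, filler in f["elements"].items())
    return f'{f["frame"]}({inner})'

print(render(frame))  # Commerce_sell(Seller: John, Goods: car, Buyer: Mary)
```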
PropBank
In FrameNet we have lexical units which are words paired with their
meanings or the frames that they invoke.
In PropBank each lemma has a list of different framesets that represent
all the senses for which there is a different argument structure.
Other Resources
Other resources have been developed to aid further research in predicate-
argument recognition.
NomBank was inspired by PropBank.
In the process of identifying and tagging the arguments of nouns,
the NOMLEX (NOMinalization LEXicon) dictionary was expanded to
cover about 6,000 entries.
The frames from PropBank were used to generate the frame files
for NomBank.
Another resource that ties PropBank frames with more predicate-
independent thematic roles and also provides a richer
representation associated with Levin classes is VerbNet.
Systems
Syntactic Representations
Classification Paradigms
Feature Performance
Feature Salience
Feature Selection
Noun Arguments
Multilingual Issues
Software
Following is a list of software packages available for semantic role labeling
1. Structured Representation:
2. Capturing Semantics:
Intent: BookFlight
Destination: Paris
3. Formal Language:
1. Logical Form:
o Representation:
answer(Capital(France))
2. Abstract Meaning Representation (AMR), for "The boy wants to go to the
park":
o Representation:
(w / want
:arg0 (b / boy)
:arg1 (g / go
:arg0 b
:destination (p / park)))
3. SQL Query:
o Representation:
4. Frame-Based Representation:
o Representation:
Event: BookFlight
Agent: John
Destination: New York.
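A logical form such as answer(Capital(France)) can be executed against a toy knowledge base; a minimal sketch (the knowledge base contents are made up for illustration):

```python
# Toy knowledge base for executing logical forms like answer(Capital(France))
KB = {"Capital": {"France": "Paris", "Japan": "Tokyo"}}

def capital(country: str) -> str:
    """Evaluate the Capital(...) predicate by KB lookup."""
    return KB["Capital"][country]

def answer(value: str) -> str:
    """Wrap the evaluated value as the final answer."""
    return value

print(answer(capital("France")))  # Paris
```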
Now we look at the activity that takes natural language input and transforms
it into an unambiguous representation that a machine can act on.
This form is understood by machines more readily than by human beings. It is
easier to generate such a form for programming languages, since they impose
syntactic and semantic restrictions on programs, whereas such restrictions
cannot be imposed on natural language.
Techniques developed so far work within specific domains and are not
scalable.
This is called deep semantic parsing, as opposed to shallow semantic parsing.
Resources
A number of projects have created representations and resources that
have promoted experimentation in this area. A few of them are as follows:
ATIS
Communicator
GeoQuery
RoboCup: CLang
The ATIS (Air Travel Information Systems) dataset is a widely used benchmark
in semantic parsing and natural language processing (NLP). It focuses on the
domain of air travel and is designed to help develop and evaluate systems that can
understand and process natural language queries related to flight information,
such as booking flights, checking flight statuses, and finding airport details.
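A rule-based parse of an ATIS-style query can be sketched with a regular expression; the pattern and slot names below are illustrative, not the actual ATIS schema:

```python
import re

# Extract flight-query slots from an ATIS-style utterance (illustrative pattern)
PATTERN = re.compile(r"from (\w+) to (\w+)(?: on (\w+))?")

def parse_query(query: str) -> dict:
    """Return origin/destination/day slots, or an empty dict if no match."""
    m = PATTERN.search(query.lower())
    if not m:
        return {}
    return {"origin": m.group(1), "destination": m.group(2), "day": m.group(3)}

print(parse_query("Show flights from Boston to Denver on Monday"))
```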
1. Rule-Based
A few rule-based semantic parsing systems performed very well for both the
ATIS and Communicator domains, suggesting that within a restricted domain,
hand-written rules can be a better approach. Such a system typically maps an
utterance onto a frame-based structure and reevaluates and adjusts the values
of the frames with each new input.