Semantic Parsing
Tushar B. Kute,
http://tusharkute.com
Semantic Parsing
• Core Objective:
– Bridging the Gap: Semantic parsing bridges
the gap between human language and
machine understanding by translating natural
language into a machine-readable format
that expresses the intended meaning.
WordNet: Structure
• Synsets:
– The core of WordNet is the synset, a collection of
synonymous words representing a single
concept.
– For example, the synset {king, monarch,
sovereign} represents the concept of a ruler.
• Semantic Relations:
– WordNet connects synsets through various
semantic relations, such as hypernymy (is-a),
hyponymy, meronymy (part-of), and antonymy.
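WordNet's structure can be sketched in code. The toy data below is illustrative only, not the real WordNet database (which is usually accessed through a library such as NLTK); the synset ids and relation names mimic WordNet's conventions.

```python
# A minimal sketch of WordNet's structure: synsets (sets of synonyms
# naming one concept) linked by semantic relations. Toy data only.
synsets = {
    "ruler.n.01": {"king", "monarch", "sovereign"},
    "person.n.01": {"person", "individual"},
}

# Relations connect synsets; hypernymy ("is-a-kind-of") is the most common.
hypernyms = {
    "ruler.n.01": "person.n.01",  # a ruler is a kind of person
}

def lemmas(synset_id):
    """Return the synonym set (lemmas) for a synset id."""
    return synsets[synset_id]

def hypernym(synset_id):
    """Return the more general synset, if any."""
    return hypernyms.get(synset_id)
```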
Lesk Algorithm
• 1. Input:
– The algorithm takes two main inputs:
• A sentence containing a word with multiple
senses (ambiguous word).
• A dictionary resource that provides
definitions for the different senses of words.
• 2. Identify Ambiguous Word:
– The algorithm first identifies the word in the
sentence that has multiple possible meanings
(the ambiguous word).
Lesk Algorithm
• 3. Retrieve Senses:
– The algorithm looks up the ambiguous word in
the dictionary resource and retrieves all of its
candidate senses.
• 4. Context Window:
– A context window is defined: the words
surrounding the ambiguous word in the
sentence (often the whole sentence).
Lesk Algorithm
• 5. Signature Creation:
– For each sense of the ambiguous word retrieved
from the dictionary:
• The algorithm creates a "signature" which is
essentially a set of words.
– This signature can be formed by including all the
words present in the definition of that particular
sense.
Lesk Algorithm
• 6. Overlap Calculation:
– The algorithm calculates the overlap between the
signature of each sense and the words in the context
window of the sentence.
– This overlap is typically measured as the number of
words that appear in both the signature and the
context window (excluding stop words like "the", "a",
etc.).
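The overlap measure described above can be written as a small set intersection. The stop-word list here is a tiny illustrative sample, not an exhaustive one.

```python
# Overlap between a sense signature and the context window: count the
# words shared by both sets after dropping stop words.
STOP_WORDS = {"the", "a", "an", "to", "of", "in", "my", "i"}

def overlap(signature, context_window):
    """Number of non-stop-words shared by signature and context."""
    sig = {w.lower() for w in signature} - STOP_WORDS
    ctx = {w.lower() for w in context_window} - STOP_WORDS
    return len(sig & ctx)

# overlap({"financial", "institution", "money", "deposit"},
#         "I went to the bank to deposit my money".split())  # -> 2
```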
Lesk Algorithm
• 7. Disambiguation:
– The algorithm identifies the sense of the ambiguous
word with the highest overlap between its signature
and the context window. This sense is considered the
most likely meaning based on the surrounding
words.
• 8. Output:
– The algorithm outputs the disambiguated word,
which is the ambiguous word along with its most
likely sense according to the analysis.
Lesk Algorithm: Example
• Sentence:
– "I went to the bank to deposit my money."
• Ambiguous Word: "bank"
• Dictionary Senses:
– bank (financial institution)
– bank (edge of a river)
• Context Window: {went, bank, deposit, money}
(the content words of the sentence)
Lesk Algorithm: Example
• Signature Creation:
– bank (financial institution) signature: {financial,
institution, money, deposit}
– bank (edge of a river) signature: {river, edge}
• Overlap Calculation:
– Overlap (financial institution): 2 (money, deposit)
– Overlap (edge of a river): 0
• Disambiguation:
– Based on the highest overlap (2), the algorithm
suggests "bank (financial institution)" as the most
likely meaning in this context.
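The whole procedure, applied to the worked example above, fits in a few lines. The toy "dictionary" stands in for a real resource such as WordNet, and the stop-word list is a small illustrative sample.

```python
# A minimal, self-contained version of the Lesk procedure.
STOP_WORDS = {"the", "a", "an", "to", "of", "my", "i"}

senses = {
    "bank (financial institution)": "financial institution money deposit",
    "bank (edge of a river)": "river edge",
}

def simple_lesk(sentence, senses):
    # Context window: content words of the sentence.
    context = set(sentence.lower().split()) - STOP_WORDS
    best_sense, best_overlap = None, -1
    for sense, definition in senses.items():
        # Signature: content words of the sense's definition.
        signature = set(definition.lower().split()) - STOP_WORDS
        score = len(signature & context)
        if score > best_overlap:
            best_sense, best_overlap = sense, score
    return best_sense, best_overlap

sense, score = simple_lesk("I went to the bank to deposit my money", senses)
# sense == "bank (financial institution)", score == 2
```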
Word Sense Induction
• Objective:
– Unveiling Hidden Meanings: WSI delves into a
corpus of text to uncover the various ways a
word is used and group these usages into
clusters representing distinct senses.
– This allows NLP systems to understand the
broader semantic landscape of a word.
Semantic role labeling
• Core Objective:
– Unveiling the Meaningful Relationships: SRL aims
to uncover the semantic arguments associated
with a predicate (usually the verb) in a sentence.
– These arguments represent the who, what,
when, where, why, and how of the action or
event described by the verb.
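A full SRL system requires a trained model, but the kind of output it produces can be sketched as a labeled structure. The role names below follow the common ARG0/ARG1 numbering convention; the sentence and labels are an illustrative example, not real system output.

```python
# What an SRL system's output for one sentence roughly looks like:
# the predicate plus its labeled semantic arguments.
sentence = "Ann threw the ball to Beth"

srl_annotation = {
    "predicate": "threw",
    "ARG0": "Ann",        # agent: who did the throwing
    "ARG1": "the ball",   # theme: what was thrown
    "ARG2": "to Beth",    # recipient: to whom it went
}
```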
Diathesis Alternation
• Examples:
– Transitive vs. Intransitive:
• "Fred ate the pizza" (transitive, "ate" takes the
object "pizza")
• "Fred ate" (intransitive, "ate" doesn't have a direct
object)
– Argument Structure Variation:
• "Ann threw the ball to Beth" (prepositional
object frame)
• "Ann threw Beth the ball" (double object
frame)
Proposition Bank (PropBank)
• What it is:
– Annotated Corpus:
• PropBank is a collection of text data where each
sentence is analyzed and labeled with information
about the verb and its arguments.
– Focus on Propositions: It focuses on identifying the core
meaning or proposition expressed by the verb in the
sentence.
– Argument Roles: It also identifies the arguments (e.g.,
agent, patient) associated with the verb, providing a
deeper understanding of the sentence's meaning
structure.
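PropBank groups each verb sense into a numbered "roleset" that lists its expected arguments. A sketch of what a frame entry roughly contains, modeled on PropBank's roleset for "throw" (the role descriptions are paraphrased, not quoted from the frame files):

```python
# A PropBank-style roleset: a verb sense id plus its numbered arguments.
roleset = {
    "id": "throw.01",
    "description": "propel through the air",
    "roles": {
        "ARG0": "thrower",
        "ARG1": "thing thrown",
        "ARG2": "destination or recipient",
    },
}
```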
Proposition Bank : Key Points
• Key Points:
– Origin: Developed by Martha Palmer et al., PropBank is
one of the first resources to provide this level of
semantic annotation for verbs and their arguments.
– Verb-Centric: PropBank primarily focuses on verbs,
unlike other resources like FrameNet that might
consider nouns and other words for semantic analysis.
– Complementary to Syntax: While syntax focuses on the
grammatical structure of a sentence, PropBank delves
into the semantic roles and underlying meaning.
FrameNet
• Frame Semantics
– FrameNet is built upon the theory of Frame
Semantics, developed by linguist Charles J.
Fillmore.
– This theory proposes that the meaning of many
words can be best understood in relation to a
semantic frame, which represents a stereotypical
situation, event, or relationship.
VerbNet
• Core Function:
– Verb-Centric Analysis: Unlike FrameNet, which
encompasses various word classes, VerbNet
specifically concentrates on verbs.
– Classes and Frames: It organizes verbs into a
hierarchical classification system, with general
verb classes (e.g., communication, creation)
further divided into more specific subclasses
(e.g., saying, building).
Lexicon
• Lexicon as a resource:
– In NLP, lexicon can sometimes refer to a specific
resource or database that contains information
about words.
– This might include information about a word's
definition, part of speech, synonyms, antonyms, and
other relevant details.
– Examples of lexicon resources include dictionaries,
thesauri, and lexical databases such as WordNet
and FrameNet.
Lexicon for sentiment, affect and connotation
• Connotation:
– Connotation: The implied meaning or association
beyond the literal definition of a word. It can evoke
positive, negative, or neutral feelings.
– Lexicon: Comprises words that carry subtle
emotional baggage or cultural associations in
addition to their literal meaning.
• For instance, "homely" can literally mean
homelike, but also carries a connotation of being
plain or unattractive.
Creating Lexicons
• Process:
– Word Selection: The initial step involves selecting a
set of words to be labeled. This can be done by:
• Using existing sentiment lexicons as a starting
point.
• Randomly sampling words from a large corpus.
• Focusing on specific emotion categories of
interest.
Creating Lexicons
• Human Labeling:
– Recruit annotators with good language skills and
familiarity with emotional expression.
– Train them on the labeling guidelines and ensure
consistency in their ratings.
– Each word might be labeled by multiple
annotators to improve reliability.
Creating Lexicons
• Quality Control:
– Implement measures to ensure the quality of the
labeled data:
• Use a gold standard set of pre-labeled words for
annotators to practice on.
• Monitor inter-annotator agreement (how
consistent the labels are between annotators).
• Address discrepancies in labeling through
adjudication or re-labeling.
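Inter-annotator agreement can be computed directly from two annotators' labels. The sketch below shows raw observed agreement corrected for chance via Cohen's kappa; the labels are made-up example data.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability both pick the same label at random,
    # estimated from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        (freq_a[l] / n) * (freq_b[l] / n)
        for l in freq_a.keys() | freq_b.keys()
    )
    return (observed - expected) / (1 - expected)

a = ["pos", "pos", "neg", "neg", "pos", "neg"]
b = ["pos", "pos", "neg", "pos", "pos", "neg"]
# cohens_kappa(a, b) is 2/3: good but not perfect agreement.
```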
Creating Lexicons
• Lexicon Creation:
– Once labeling is complete, compile the data and
create the lexicon. Each entry in the lexicon might
include:
• The word itself.
• A list of emotions associated with the word,
along with their frequency of being assigned by
annotators (if multiple annotators were
involved).
• Optionally, an average intensity score for each
emotion.
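Compiling such entries from raw annotations is a small aggregation step. The annotations below are invented (emotion, intensity) pairs from three hypothetical annotators.

```python
from collections import Counter

# Raw annotations: word -> list of (emotion, intensity) pairs,
# one pair per annotator. Toy data.
annotations = {
    "homely": [("negative", 0.6), ("negative", 0.7), ("neutral", 0.2)],
}

def build_entry(word, labels):
    """Aggregate annotator labels into one lexicon entry."""
    emotions = Counter(e for e, _ in labels)          # assignment frequency
    avg_intensity = sum(i for _, i in labels) / len(labels)
    return {"word": word, "emotions": dict(emotions),
            "avg_intensity": round(avg_intensity, 2)}

lexicon = {w: build_entry(w, ls) for w, ls in annotations.items()}
```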
Semi-supervised Induction of Affect Lexicons
• Techniques:
– Several techniques can be employed for
semi-supervised affect lexicon induction:
– Label Propagation: This method starts with a small
set of labeled seed words with known emotional
associations.
– The labels are then propagated to unlabeled words
based on their similarity or co-occurrence patterns.
– Words that frequently appear alongside positive
seed words are more likely to be assigned positive
sentiment themselves.
Semi-supervised Induction of Affect Lexicons
• Graph-based Methods:
– Words can be represented as nodes in a graph,
with edges connecting similar words.
• Label propagation algorithms can be applied on this
graph to propagate emotional labels from labeled
seed words to unlabeled words based on their
connections and similarities.
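A bare-bones version of graph-based label propagation: words are nodes, edges link similar or co-occurring words, and labels spread outward from seed words. The graph, seeds, and majority-vote update rule are all simplified toy choices; real systems weight edges and normalize scores.

```python
# Minimal label propagation over a word-similarity graph.
graph = {
    "good": {"great", "fine"},
    "great": {"good", "excellent"},
    "excellent": {"great"},
    "fine": {"good"},
    "bad": {"awful"},
    "awful": {"bad"},
}
seeds = {"good": "positive", "bad": "negative"}

def propagate(graph, seeds, iterations=3):
    labels = dict(seeds)
    for _ in range(iterations):
        for word, neighbors in graph.items():
            if word in labels:
                continue  # seed labels stay fixed
            neighbor_labels = [labels[n] for n in neighbors if n in labels]
            if neighbor_labels:
                # Adopt the majority label among labeled neighbors.
                labels[word] = max(set(neighbor_labels),
                                   key=neighbor_labels.count)
    return labels

labels = propagate(graph, seeds)
```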
Supervised Learning of Word Sentiment
• Process
– Data Preparation
• Data Collection
• Pre-processing
– Feature Engineering
• Word n-grams
• POS Tagging
• Dictionary features
• Word Embeddings
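Of the feature types listed above, word n-grams are the simplest to extract. A stdlib-only sketch (real pipelines usually use a library vectorizer):

```python
# Extract all contiguous n-word sequences from a token list.
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "not a good movie".split()
bigrams = ngrams(tokens, 2)
# [('not', 'a'), ('a', 'good'), ('good', 'movie')]
```

Bigrams like ("not", "good") elsewhere in a corpus are exactly what lets a classifier catch negation that unigrams miss.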
Supervised Learning of Word Sentiment
• Steps:
– Lexicon Development:
• Existing sentiment lexicons like SentiWordNet,
NRC Word-Emotion Association Lexicon, or
LIWC can be used directly.
• Alternatively, you can create a custom lexicon
tailored to your specific domain or needs.
• This might involve collecting words relevant to
your domain and manually labeling them with
sentiment.
Lexicons for Sentiment Recognition
• Steps:
– Text Processing:
• The text data you want to analyze (e.g.,
reviews, social media posts) is preprocessed.
• This typically involves cleaning the text by
removing noise, punctuation, and applying
stemming/lemmatization (reducing words to
their base form).
Lexicons for Sentiment Recognition
• Steps:
– Lexicon Matching:
• Each word in the preprocessed text is
compared against the lexicon.
• If a match is found, the sentiment score
associated with that word is retrieved.
Lexicons for Sentiment Recognition
• Steps:
– Sentiment Score Calculation:
• A sentiment score is calculated for the entire text document or
sentence. This can be done in various ways:
– Simple Sum: The sentiment scores of all matched words are
simply added together. Words with stronger sentiment
(positive or negative) will contribute more to the overall
score.
– Weighted Sum: Assign weights to words based on their
intensity (e.g., "very happy" might have a higher weight than
"happy").
– Taking the Most Frequent Sentiment: Assign the sentiment
of the most frequently occurring words in the text as the
overall sentiment.
Lexicons for Sentiment Recognition
• Steps:
– Sentiment Classification:
• Based on the calculated sentiment score, the
text is classified as positive, negative, or
neutral.
• Thresholds can be defined to determine the
sentiment category (e.g., a score above a
certain value might be positive, below a
certain value might be negative, and values in
between might be neutral).
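The four steps above (text processing, lexicon matching, score calculation, threshold classification) combine into one short pipeline. The lexicon scores and thresholds below are illustrative choices, using the simple-sum scoring variant.

```python
import string

# Toy sentiment lexicon: word -> sentiment score.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}

def classify(text, pos_threshold=0.5, neg_threshold=-0.5):
    # Text processing: lowercase and strip punctuation.
    cleaned = text.lower().translate(
        str.maketrans("", "", string.punctuation))
    # Lexicon matching + simple-sum score calculation.
    score = sum(LEXICON.get(word, 0.0) for word in cleaned.split())
    # Threshold-based sentiment classification.
    if score > pos_threshold:
        return "positive", score
    if score < neg_threshold:
        return "negative", score
    return "neutral", score

label, score = classify("A great movie, really good!")
# label == "positive", score == 3.0
```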
Lexicons for Personality Recognition
Web Resources
https://mitu.co.in
http://tusharkute.com
@mituskillologies