NLP Unit 4
What is Pragmatics?
● Study of how language is used in context.
● Investigates how meaning is affected by factors such as the speaker's intentions, the listener's
knowledge, and the communicative situation.
● Focuses on the non-literal or implied meaning of language, and aims to understand how meaning is derived from the use of language in social interaction.
Requirements of a Meaning Representation:
● Compositionality: The meaning of a sentence is composed of the meanings of its words and the way they are combined, so the meaning of the whole can be derived from the meanings of its parts.
● Truth conditions: A representation must specify the truth conditions for a sentence, i.e., the
conditions under which the sentence would be true or false.
● Context sensitivity: The meaning of a sentence may depend on the context in which it is used.
Therefore, a representation must be able to account for the effects of context on meaning.
● Consistency: A representation should be consistent with other linguistic and cognitive theories,
and should not lead to contradictions or inconsistencies.
First-order Logic:
● First-order logic (FOL) is a formal language that has been used in semantics and pragmatics to
represent the meaning of sentences in a structured and logical way.
● FOL allows us to represent the relationships between objects, properties, and events in a
precise and formal manner, which can be useful for analyzing and understanding the meaning of
natural language expressions.
● In semantics:
○ FOL allows us to represent the meanings of words and sentences in terms of their truth conditions.
■ Example: The sentence "John is a doctor" can be represented in FOL as
"Doctor(John)", which means that the object "John" has the property of
being a doctor.
○ FOL allows us to represent the logical structure of sentences, including their
subject-predicate structure and the relationships between different parts of the
sentence.
■ Example: The sentence "All dogs bark" can be represented in FOL as "For
all x, if x is a dog, then x barks", where "For all x" is a quantifier that
means "for every x", "if x is a dog" is a predicate that describes the
property of being a dog, and "x barks" is a predicate that describes the
action of barking.
○ FOL representations can help to avoid ambiguity and inconsistency in
interpretation.
■ Example: The sentence "Every student passed the exam" can be
represented in FOL as "For all x, if x is a student, then x passed the
exam", which avoids the ambiguity of the sentence "Every student passed
the exam with flying colours", since the latter may suggest that all students
scored exceptionally well, which is not necessarily implied by the former.
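The FOL forms above can also be built and manipulated programmatically. Below is a minimal sketch, assuming the NLTK library is installed; the predicate and constant names are our own illustrative choices, not a standard inventory.

```python
from nltk.sem.logic import Expression

read_expr = Expression.fromstring

# "John is a doctor" -> a one-place predicate applied to a constant.
doctor_john = read_expr('doctor(john)')

# "All dogs bark" -> a universally quantified implication.
all_dogs_bark = read_expr('all x.(dog(x) -> bark(x))')

print(doctor_john)     # doctor(john)
print(all_dogs_bark)   # all x.(dog(x) -> bark(x))
```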
● In pragmatics:
○ FOL allows us to represent the speaker's intended meaning and the listener's
interpretation of a sentence.
■ Example: The sentence "I need help with my homework" can be
represented in FOL as a request for assistance, such as
"Request(Assistance, Speaker, Homework)", where "Request" is a
predicate that describes the communicative act of making a request,
"Assistance" is a variable that represents the object being requested,
"Speaker" is a variable that represents the speaker, and "Homework" is a
variable that represents the object for which assistance is needed.
○ FOL allows us to represent the context in which a sentence is used, including the
speaker's beliefs, intentions, and assumptions, and the listener's knowledge and
expectations.
■ Example: The sentence "Do you have the time?" can be represented in
FOL as a request for information, such as "Request(Time, Listener)",
where "Request" is a predicate that describes the communicative act of
making a request, "Time" is a variable that represents the object being
requested, and "Listener" is a variable that represents the person being
addressed.
○ FOL representations can help to capture the communicative function of a
sentence and its relationship to other expressions in the discourse.
■ Example: The sentence "I'm sorry, I can't come to your party tonight" can
be represented in FOL as a polite refusal, such as "Refusal(Party,
Speaker, Listener)", where "Refusal" is a predicate that describes the
communicative act of refusing an invitation, "Party" is a variable that
represents the event being refused, "Speaker" is a variable that
represents the person making the refusal, and "Listener" is a variable that
represents the person being addressed.
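The speech-act representations above can be encoded the same way. A small sketch, again assuming NLTK; the predicate names (request, refusal) simply follow the examples above and are not part of any standard library of speech acts.

```python
from nltk.sem.logic import Expression

read_expr = Expression.fromstring

# Speech acts encoded as FOL predicates over discourse participants and objects.
request = read_expr('request(assistance, speaker, homework)')
refusal = read_expr('refusal(party, speaker, listener)')

print(request)  # request(assistance,speaker,homework)
print(refusal)  # refusal(party,speaker,listener)
```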
Description Logics:
● Description Logics (DLs) are a family of formal knowledge representation languages used to represent and reason about complex concepts and relationships in a structured and logical manner.
● DLs are a subset of first-order logic (FOL) specifically designed for representing knowledge in a
way that is both expressive and computationally tractable.
● Provides formal semantics for natural language expressions, allowing us to represent their
meaning in a structured and logical way.
● Used to construct ontologies, which are structured representations of knowledge in a particular
domain.
● Allows us to reason about the relationships between concepts and instances, and to infer new
knowledge based on existing knowledge.
● Often uses inference engines to perform reasoning tasks, such as consistency checking, classification, and query answering.
● DL operates under an open-world assumption, which allows for more flexible and incremental development of ontologies.
Syntax-Driven Semantic Analysis:
● A type of DL-based approach.
● The syntax of a natural language expression is used to drive the process of semantic analysis.
● Involves mapping the syntax of a sentence onto a formal logical structure, such as a DL
ontology, to derive its meaning.
● This approach allows for a more efficient and accurate analysis of natural language expressions,
as the syntactic structure can provide important cues for determining the meaning of ambiguous
or complex expressions.
● The grammar of the language works as a guide.
● The assumption is that the grammatical structure of a sentence reflects its underlying meaning
and that by analyzing this structure, we can infer the meaning of the sentence.
● For example, the sentence "John is a doctor who specializes in cardiology" can be analyzed
syntactically to identify the subclauses "John is a doctor" and "who specializes in cardiology",
which can then be mapped onto corresponding concepts in a DL ontology to derive the overall
meaning of the sentence.
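A minimal sketch of this kind of mapping, assuming the owlready2 library; the ontology IRI, the class names, and the specializes_in property are all hypothetical names chosen for this example.

```python
from owlready2 import get_ontology, Thing, ObjectProperty

onto = get_ontology("http://example.org/medicine.owl")  # hypothetical IRI

with onto:
    class Doctor(Thing): pass
    class Specialty(Thing): pass
    class specializes_in(ObjectProperty):
        domain = [Doctor]
        range = [Specialty]

# "John is a doctor who specializes in cardiology"
cardiology = Specialty("cardiology")
john = Doctor("john")
john.specializes_in = [cardiology]

print(john.specializes_in)  # [medicine.cardiology]
```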
Semantic Attachments:
● Semantic attachments, also known as semantic roles or theta roles, are a linguistic concept that
describes the relationship between the semantic content of a sentence and its syntactic
structure.
● In other words, they represent the different roles that words or phrases play in a sentence
based on their meaning.
● For example, in the sentence "John ate the pizza with a fork," the word "John" is the agent who
acts as eating, "pizza" is the patient that undergoes the action of being eaten, and "fork" is the
instrument that John uses to eat the pizza. These different roles are represented as semantic
attachments associated with each word or phrase in the sentence.
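A simple way to make these attachments concrete is a role-to-filler mapping. The sketch below is plain Python with illustrative role names, not a standard data format.

```python
# Semantic attachments for "John ate the pizza with a fork."
frame = {
    "predicate": "eat",
    "agent": "John",        # the one who performs the eating
    "patient": "pizza",     # the thing that gets eaten
    "instrument": "fork",   # the means used for the eating
}

for role, filler in frame.items():
    print(f"{role}: {filler}")
```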
Word Senses:
● A word sense is a specific meaning of a word that is determined by its context. Semantic
attachments are a way of representing the meaning of a word in context by linking it to the
concepts or entities that it refers to.
● For example, consider the word "bank." Depending on the context in which it appears, it could
refer to a financial institution or the side of a river. In semantic attachments, we might represent
these two senses of the word as follows:
○ For the financial institution sense:
■ Word: "bank"
■ Sense: "financial institution"
■ Attachment: links to the concept of a financial institution, such as a bank
account, loans, or mortgages.
○ For the side of a river sense:
■ Word: "bank"
■ Sense: "river bank"
■ Attachment: links to the concept of a river, such as water, shore, or sediment.
● By representing word senses in this way, we can better understand the meaning of words in
context and use this information for various NLP tasks, such as information retrieval, machine
translation, and sentiment analysis.
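Sense inventories like this can be inspected programmatically. A sketch using NLTK's WordNet interface; it assumes the WordNet corpus has been downloaded (e.g., via nltk.download('wordnet')).

```python
from nltk.corpus import wordnet as wn

# Print the first few WordNet senses of "bank" with their glosses.
for sense in wn.synsets("bank")[:4]:
    print(sense.name(), "-", sense.definition())
# e.g. bank.n.01 - sloping land (especially the slope beside a body of water)
```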
Thematic roles:
● Thematic roles (theta roles) label the semantic relationship between a predicate and its arguments. Common roles include:
○ Agent: the volitional causer of an event (e.g., "John" in "John broke the window").
○ Theme/Patient: the participant most directly affected by the event (e.g., "the window").
○ Instrument: the means by which the action is performed (e.g., "with a hammer").
○ Experiencer: the participant who perceives or feels (e.g., "Mary" in "Mary heard the noise").
○ Goal: the endpoint or destination of motion (e.g., "to Paris" in "She flew to Paris").
Selectional Restrictions:
● Selectional restrictions are semantic constraints that a predicate places on the thematic roles of its arguments; for example, the verb "eat" normally requires an edible Theme, so "eat an apple" is acceptable while "eat an idea" violates the restriction.
● Because different senses of a word impose different restrictions, selectional restrictions can serve as evidence for word sense disambiguation (WSD), discussed next.
Word Sense Disambiguation (WSD):
● WSD involves determining the correct sense of a word based on its context in a given sentence or text.
● Many words have multiple senses, and the correct sense must be identified to accurately
interpret the text.
● WSD can be performed using various techniques, including rule-based methods,
knowledge-based methods, and supervised and unsupervised machine learning
algorithms.
● Applications of WSD include machine translation, information retrieval, and text-to-speech
synthesis.
● WSD is a challenging problem, as context can be ambiguous and may require a deep
understanding of language and its nuances to correctly disambiguate word senses.
● Reasons why WSD is used:
○ Ambiguity: Many words in natural language have multiple meanings or senses,
and WSD is used to disambiguate the correct sense in a given context.
○ Accuracy: Helps to improve the accuracy of NLP applications by ensuring that
the correct sense of a word is used.
○ Precision: WSD can also help to improve the precision of NLP applications by
reducing the number of false positives and false negatives.
○ Language understanding: WSD is an important task for natural language
understanding as it requires understanding the context and the meaning of the
words in the sentence.
○ Information retrieval: WSD is used in information retrieval systems to retrieve
relevant documents based on the intended meaning of the query.
Supervised Approach:
● This approach uses labelled examples to train a machine learning model to predict the
correct sense of a word in context.
● A popular algorithm for supervised WSD is the Naive Bayes classifier.
● Example: In the sentence "I went to the bank to deposit my paycheck," the word "bank"
could refer to a financial institution or a river bank. A supervised WSD model would use
labelled examples to learn how to predict the correct sense based on the context of the
sentence.
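A minimal sketch of this idea with scikit-learn's Naive Bayes classifier; the training sentences and sense labels are toy data invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy labelled contexts for the ambiguous word "bank".
contexts = [
    "deposit my paycheck at the bank",
    "the bank approved my loan application",
    "fishing on the bank of the river",
    "the river bank was muddy after the rain",
]
senses = ["finance", "finance", "river", "river"]

# Bag-of-words features feed a Naive Bayes sense classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(contexts, senses)

# With these toy examples the prediction should be 'finance'.
print(model.predict(["I went to the bank to deposit my paycheck"]))
```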
Dictionary-Based Approach:
● This approach uses a dictionary or lexical database that provides information on the
different senses of a word.
● When presented with a word in context, the approach looks up the word in the dictionary
and chooses the sense that best fits the context.
● Example: In the sentence "I love to play the bass guitar," the word "bass" could refer to a
fish or a low-pitched musical instrument. A dictionary-based WSD approach would look up
"bass" and choose the sense that matches the context of the sentence.
Thesaurus-Based Approach:
● This approach uses a thesaurus or semantic network that groups words based on their
semantic similarity.
● When presented with a word in context, the approach identifies related words in the
thesaurus and chooses the sense that best fits the context.
● Example: In the sentence "The company's revenue has been steadily increasing," the word
"increase" could refer to an uptick in profits or a general upward trend. A thesaurus-based
WSD approach would identify related words like "growth" and "improvement" and choose
the sense that matches the context of the sentence.
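A sketch of this idea using WordNet as the thesaurus: it scores each candidate sense by its best path similarity to the senses of surrounding context words. This is an illustrative heuristic written for this note, not a standard named algorithm.

```python
from nltk.corpus import wordnet as wn

context = ["revenue", "profit", "company"]  # illustrative context words

def best_sense(word, context_words):
    """Pick the sense whose WordNet neighborhood best matches the context."""
    best, best_score = None, -1.0
    for sense in wn.synsets(word):
        score = 0.0
        for cw in context_words:
            for cs in wn.synsets(cw):
                sim = sense.path_similarity(cs)  # None across parts of speech
                if sim:
                    score = max(score, sim)
        if score > best_score:
            best, best_score = sense, score
    return best

print(best_sense("increase", context))
```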
Bootstrapping in NLP:
● Bootstrapping refers to techniques that start from a small labelled "seed" dataset and iteratively expand it using a model's own predictions on unlabeled data, reducing the need for manual annotation.
Bootstrapping methods:
1. Self-training:
a. Definition: A method where a model is iteratively trained on a small labelled dataset and then uses its predictions on a larger unlabeled dataset to expand the training set (see the sketch after this list).
b. Example: In part-of-speech tagging, a model might be trained on a small labelled
set of sentences with their corresponding POS tags, and then use its predictions
on a larger set of unlabeled sentences to identify and add new POS-tagged
examples to the training set.
2. Co-training:
a. Definition: A method where two or more models are trained independently on
different views of the same data, and then use their predictions on a larger
unlabeled dataset to improve each other's performance.
b. Example: In sentiment analysis, one model might be trained on a small labelled
dataset of tweets, while another is trained on a small labelled dataset of news
articles. The models can then use their predictions on a larger set of unlabeled
social media data to improve each other's accuracy.
3. Active learning:
a. Definition: A method where a model is trained on a small labelled dataset and
then used to select the most informative examples from an unlabeled dataset for
annotation. These examples are then added to the labelled dataset, and the
model is retrained on the expanded dataset.
b. Example: In named entity recognition, a model might be trained on a small
labelled dataset of sentences with named entities, and then use active learning to
select the most uncertain examples from a larger set of unlabeled sentences for
manual annotation. These labelled examples can then be used to improve the
model's performance.
4. Semi-supervised learning:
a. Definition: A method where a model is trained on a small labelled dataset together with a larger unlabeled dataset, leveraging the structure of the unlabeled data to improve performance on the labelled task.
b. Example: In machine translation, a model might be trained on a small labelled
dataset of sentence pairs in two languages, then use a larger set of unlabeled
sentences in both languages to improve the model's accuracy.
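As referenced in the self-training item above, here is a minimal sketch using scikit-learn's SelfTrainingClassifier; the texts, labels, and confidence threshold are toy values chosen for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

texts = ["great movie", "terrible film", "loved it", "awful acting",
         "what a great film", "truly awful movie"]
# -1 marks unlabeled examples, per scikit-learn's convention.
y = np.array([1, 0, 1, 0, -1, -1])

X = CountVectorizer().fit_transform(texts)

# The base classifier pseudo-labels unlabeled points it is confident about
# (probability >= threshold) and is retrained on the expanded set.
model = SelfTrainingClassifier(LogisticRegression(), threshold=0.8)
model.fit(X, y)

print(model.predict(X[4:]))  # predicted labels for the unlabeled texts
```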
Word Similarity:
Word similarity is an important task in natural language processing (NLP) that involves measuring the degree of relatedness between pairs of words. There are various approaches to measuring word similarity, including thesaurus-based and distributional methods.
Thesaurus-based method:
1. Thesaurus-based methods use the structure of a lexical resource, such as WordNet, to estimate how closely two word senses are related.
2. A common approach measures the path length between two senses in the hypernym hierarchy: the shorter the path, the more similar the senses.
3. Example: "car" and "automobile" share a WordNet synset, so their similarity is maximal, while "car" and "fork" are linked only through a long path of more abstract concepts.
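A small sketch of path-based similarity with NLTK's WordNet interface; the example words are our own.

```python
from nltk.corpus import wordnet as wn

car = wn.synset("car.n.01")
automobile = wn.synset("automobile.n.01")  # resolves to the same synset as car.n.01
fork = wn.synset("fork.n.01")

print(car.path_similarity(automobile))  # 1.0 (identical synsets)
print(car.path_similarity(fork))        # much lower: a long path separates them
```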
Distributional-based method:
1. Distributional methods, on the other hand, use statistical information about the co-
occurrence patterns of words in large text corpora to estimate their similarity.
2. The intuition behind distributional methods is that words that occur in similar contexts
are likely to have similar meanings.
3. To apply a distributional method for measuring word similarity, one first needs to
represent words as high-dimensional vectors that capture their distributional patterns in
a corpus.
4. There are various ways to do this, such as counting the frequency of words in a fixed-
size window of text, using neural network models that learn dense embeddings of
words, or applying matrix factorization techniques that decompose the co-occurrence
matrix of words into low-rank components.
5. Once words are represented as vectors, their similarity can be computed using various
distance or similarity metrics, such as cosine similarity, Euclidean distance, or
Mahalanobis distance.
6. These metrics capture the degree of overlap or distance between the vectors, which
reflects the degree of similarity or dissimilarity between the corresponding words.
7. Example: One popular distributional method for word similarity is the cosine similarity
between word vectors. Word vectors are high-dimensional representations of words
that capture their semantic properties based on their distributional patterns in a corpus.
The cosine similarity between two word vectors measures the degree of similarity between the corresponding words.
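A minimal sketch of the cosine computation over hand-made co-occurrence counts; the context words and count values are invented for illustration.

```python
import numpy as np

# Toy co-occurrence counts with three context words: money, water, loan.
bank_finance = np.array([10.0, 0.0, 8.0])
bank_river   = np.array([0.0, 12.0, 0.0])

def cosine(u, v):
    """Cosine similarity: dot product normalized by vector lengths."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Near 0: the two distributions share no context words.
print(cosine(bank_finance, bank_river))
```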
Both thesaurus-based and distributional methods have their strengths and weaknesses, and their
choice depends on the specific application and resources available. Thesaurus-based methods can
be more precise in measuring word similarity, but they rely on a fixed vocabulary that may not be
exhaustive or flexible enough for some tasks. Distributional methods, on the other hand, can
capture more nuanced and context-dependent meanings, but they may require more data and
computational resources to train and apply.