Unit 3 Notes
In natural language, the meaning of a word may vary with its usage in
sentences and the context of the text. Word Sense Disambiguation involves
interpreting the meaning of a word based upon the context of its occurrence in
a text.
For example, the word 'bark' may mean 'the sound made by a dog' or 'the
outermost layer of a tree'. Likewise, the word 'rock' may mean 'a stone' or 'a
genre of music'. Hence, the accurate meaning of a word is highly dependent
upon its context and usage in the text.
Thus, the ability of a machine to overcome the ambiguity involved in identifying
the meaning of a word based on its usage and context is called Word Sense
Disambiguation.
Relationship Extraction:
It involves identifying the semantic relationships among the entities mentioned
in a text.
Meaning Representation
While, as humans, it is pretty simple for us to understand the meaning of
textual information, it is not so in the case of machines. Thus, machines tend
to represent the text in specific formats in order to interpret its meaning. This
formal structure that is used to understand the meaning of a text is called
meaning representation.
The following approaches are commonly used for meaning representation:
Frames
Conceptual Dependency (CD)
Rule-based architecture
Case Grammar
Conceptual Graphs
Semantic Analysis Techniques
Based upon the end goal one is trying to accomplish, Semantic Analysis can
be used in various ways. Two of the most common Semantic Analysis
techniques are:
Text Classification
In Text Classification, our aim is to label the text according to the insights we
intend to gain from the textual data.
For example:
Text Extraction
For Example,
Ambiguity
Ambiguity in computational linguistics is a situation where a word or a sentence
may have more than one meaning. That is, a sentence may be interpreted in more
than one way. This leads to uncertainty in choosing the right meaning of a
sentence, especially while processing natural languages by computer.
1. Lexical ambiguity
It is a class of ambiguity caused by a word and its multiple senses, especially
when the word is part of a sentence or phrase. A word can have multiple meanings
under different part-of-speech categories, and under each POS category it may
have multiple different senses. Lexical ambiguity is about choosing the right
sense of a particular word under a particular POS category.
In a sentence, lexical ambiguity is caused while choosing the right sense of a
word under the correct POS category.
For example, let us take the sentence "I saw a ship". Here, the word "saw" is
ambiguous: it may be the past tense of the verb "see" or a noun denoting a
cutting tool.
» We understand that words have different meanings based on the context of their
usage in the sentence. Human languages are ambiguous too, because many words
can be interpreted in multiple ways depending upon the context of their
occurrence.
» For example, consider the two distinct senses that exist for the
word "bass" —
» The occurrence of the word bass clearly denotes the distinct meaning. In the
first sentence it means frequency, and in the second it means fish. Hence, if
disambiguated by WSD, the correct meaning can be assigned to each of the above
sentences.
A Dictionary
The very first input for evaluation of WSD is the dictionary, which is used to
specify the senses to be disambiguated.
Test Corpus
» Lexical sample — This kind of corpus is used in systems where it is required
to disambiguate a small sample of words.
» All-words — This kind of corpus is used in systems where it is expected to
disambiguate all the words in a piece of running text.
Approaches and methods to WSD are classified according to the source of knowledge used in
word disambiguation.
As the name suggests, for disambiguation these methods primarily rely on
dictionaries, thesauri and lexical knowledge bases. They do not use corpus
evidence for disambiguation.
The Lesk method is the seminal dictionary-based method, introduced by Michael
Lesk in 1986. The Lesk definition, on which the Lesk algorithm is based, is
"measure overlap between sense definitions for all words in context". However,
in 2000, Kilgarriff and Rosenzweig gave the simplified Lesk definition as
"measure overlap between sense definitions of word and current context", which
further means identifying the correct sense for one word at a time. Here the
current context is the set of words in the surrounding sentence or paragraph.
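The simplified Lesk definition can be sketched in a few lines of Python. The sense inventory and glosses below are illustrative assumptions, not a real lexical resource:

```python
# Toy sense inventory with short glosses (illustrative, not a real dictionary).
TOY_SENSES = {
    "bass": {
        "bass#1": "low frequency tone of music",
        "bass#2": "edible freshwater fish",
    }
}

def simplified_lesk(word, context_sentence, senses=TOY_SENSES):
    """Pick the sense whose gloss overlaps most with the current context."""
    context = set(context_sentence.lower().split())
    best_sense, best_overlap = None, -1
    for sense_id, gloss in senses[word].items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense

print(simplified_lesk("bass", "he cooked the fish for dinner"))  # → bass#2
```

Here the context is simply the set of words in the sentence; a fuller implementation would also lemmatize and remove stop words before measuring overlap.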
Supervised Methods
For disambiguation, machine learning methods make use of sense-annotated corpora
for training. These methods assume that the context can provide enough evidence
on its own to disambiguate the sense, so world knowledge and reasoning are
deemed unnecessary. The context is represented as a set of "features" of the
word, which includes information about the surrounding words. Support vector
machines and memory-based learning are the most successful supervised learning
approaches to WSD. These methods rely on a substantial amount of manually
sense-tagged corpora, which is very expensive to create.
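The memory-based approach mentioned above can be sketched as a 1-nearest-neighbour classifier over bag-of-words context features. The tiny sense-tagged corpus below is made up for illustration:

```python
# Tiny hand-labelled corpus for the ambiguous word "bass" (illustrative).
TRAIN = [
    ("the angler landed a huge bass in the river", "fish"),
    ("grilled bass is served with lemon", "fish"),
    ("turn down the bass on the speakers", "music"),
    ("the bass line drives the whole song", "music"),
]

def features(sentence):
    # Bag of surrounding words, excluding the target word itself.
    return set(sentence.lower().split()) - {"bass"}

def classify(sentence, train=TRAIN):
    """Label by the training example sharing most context words (1-NN)."""
    target = features(sentence)
    best_sense, best_overlap = None, -1
    for example, sense in train:
        overlap = len(target & features(example))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(classify("we caught a bass in the river"))  # → fish
```

A real system would use many more features (POS tags, collocations, word windows) and a proper learner, but the principle — store sense-tagged examples and label new contexts by similarity — is the same.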
Semi-supervised Methods
Due to the lack of training corpora, most word sense disambiguation algorithms
use semi-supervised learning methods, which use both labelled as well as
unlabelled data. These methods require a very small amount of annotated text and
a large amount of plain unannotated text. The technique used by semi-supervised
methods is bootstrapping from seed data.
Unsupervised Methods
These methods assume that similar senses occur in similar contexts, so senses
can be induced from text by clustering word occurrences using some measure of
similarity of the context. This task is called word sense induction or
discrimination. Unsupervised methods have great potential to overcome the
knowledge acquisition bottleneck due to their non-dependency on manual effort.
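The clustering idea can be sketched with a greedy single-pass clustering of occurrences by context-word overlap. The occurrences, the stop-word list and the overlap threshold below are all illustrative assumptions:

```python
# Occurrences of the ambiguous word "bass" in running text (illustrative).
OCCURRENCES = [
    "caught a bass in the lake",
    "the bass swam in the lake",
    "the bass guitar sounded loud",
    "a loud bass guitar solo",
]

def context(sentence):
    # Content words of the context (tiny ad-hoc stop-word list).
    return set(sentence.split()) - {"bass", "a", "the", "in"}

def induce_senses(occurrences, threshold=1):
    """Greedily cluster occurrences that share context words."""
    clusters = []  # each cluster: [accumulated context words, member sentences]
    for occ in occurrences:
        ctx = context(occ)
        for cluster in clusters:
            if len(ctx & cluster[0]) >= threshold:
                cluster[0] |= ctx
                cluster[1].append(occ)
                break
        else:
            clusters.append([set(ctx), [occ]])
    return [members for _, members in clusters]

print(induce_senses(OCCURRENCES))  # → two clusters: lake contexts vs. guitar contexts
```

Each induced cluster stands for one sense; no dictionary or annotation was needed, which is exactly why these methods sidestep the knowledge acquisition bottleneck.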
Machine Translation
Machine translation or MT is the most obvious application of WSD. In MT, lexical
choice for words that have distinct translations for different senses is done by
WSD. The senses in MT are represented as words in the target language. However,
most machine translation systems do not use an explicit WSD module.
Information retrieval (IR) may be defined as a software program that deals with
the organization, storage, retrieval and evaluation of information from document
repositories, particularly textual information. The system basically assists
users in finding the information they require, but it does not explicitly return
the answers to their questions. WSD is used to resolve the ambiguities of the
queries provided to an IR system. Like MT, current IR systems do not explicitly
use a WSD module; they rely on the concept that the user will type enough
context in the query to retrieve only relevant documents.
In most applications, WSD is necessary for accurate analysis of text. For
example, WSD helps an intelligence gathering system to flag the correct words: a
medical intelligence system might need to flag occurrences of "illegal drugs"
rather than "medical drugs".
Lexicography
WSD and lexicography can work together in a loop, because modern lexicography is
corpus-based. WSD provides lexicography with rough empirical sense groupings as
well as statistically significant contextual indicators of sense.
The major problem of WSD is deciding the sense of a word, because different
senses can be very closely related. Moreover, different dictionaries and
thesauruses can provide different divisions of words into senses.
Another problem of WSD is that a completely different algorithm might be needed
for each application. For example, in machine translation it takes the form of
target word selection, while in information retrieval a sense inventory is not
required.
Inter-judge variance
Another problem is that WSD systems are generally tested by comparing their
results on a task against those of human beings. This is called the problem of
inter-judge variance.
Word-sense discreteness
Another difficulty in WSD is that words cannot easily be divided into discrete sub-meanings.
Discourse Processing
One of the most difficult problems of AI is processing natural language by
computers; in other words, natural language processing is the most difficult
problem of artificial intelligence. One of the major problems in NLP is
discourse processing — building theories and models of how utterances stick
together to form coherent discourse. Actually, language always consists of
collocated, structured and coherent groups of sentences rather than isolated and
unrelated sentences. These coherent groups of sentences are referred to as
discourse.
Coherence and discourse structure are interconnected in many ways. Coherence,
along with being a property of good text, is used to evaluate the output quality
of natural language generation systems. The question that arises here is: what
does it mean for a text to be coherent? Suppose we collected one sentence from
every page of the newspaper; would it be a discourse? Of course not, because
these sentences do not exhibit coherence. A coherent discourse must possess the
following properties —
A discourse is coherent if it has meaningful connections between its utterances.
This property is called the coherence relation. For example, some sort of
explanation must be there to justify the connection between utterances.
Another property that makes a discourse coherent is that there must be a certain kind of
relationship with the entities. Such kind of coherence is called entity-based coherence.
Discourse structure
An important question regarding discourse is what kind of structure a discourse
must have. The answer depends upon the segmentation applied to the discourse.
Discourse segmentation may be defined as determining the types of structures for
a large discourse. It is quite difficult to implement discourse segmentation,
but it is very important for applications such as information retrieval, text
summarization and information extraction.
In this section, we will learn about the algorithms for discourse segmentation. The algorithms
are described below—
Unsupervised methods do not use any hand-labeled segment boundaries; supervised
discourse segmentation, on the other hand, needs boundary-labeled training data.
In supervised discourse segmentation, discourse markers or cue words play an
important role. A discourse marker or cue word is a word or phrase that
functions to signal discourse structure. These discourse markers are
domain-specific.
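A cue-word based segmenter can be sketched as follows. The cue-word list and the rule "a sentence-initial marker opens a new segment" are simplifying assumptions, not a trained model:

```python
# Illustrative cue words that often signal a discourse boundary.
CUE_WORDS = {"however", "therefore", "meanwhile", "finally"}

def segment(sentences):
    """Start a new segment whenever a sentence opens with a cue word."""
    segments, current = [], []
    for sentence in sentences:
        first = sentence.split()[0].lower().strip(",")
        if first in CUE_WORDS and current:
            segments.append(current)  # cue word marks a boundary
            current = []
        current.append(sentence)
    if current:
        segments.append(current)
    return segments

doc = [
    "Ram studied hard all year.",
    "He revised every weekend.",
    "However, the exam was postponed.",
    "Finally, it was held in June.",
]
print(segment(doc))  # → three segments, split before "However" and "Finally"
```

A real supervised segmenter would learn which markers signal boundaries in a given domain from the boundary-labeled data, rather than using a fixed list.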
Lexical repetition is a way to find structure in a discourse, but it does not
satisfy the requirement of coherent discourse. To achieve coherent discourse, we
must focus on coherence relations specifically. As we know, a coherence relation
defines the possible connection between utterances in a discourse. Hobbs
proposed the following such relations.
We use two terms, S0 and S1, to represent the meanings of the two related
sentences —
Result
It infers that the state asserted by S0 could cause the state asserted by S1.
For example, two statements show the relationship Result: Ram was caught in the
fire. His skin burned.
Explanation
It infers that the state asserted by S1 could cause the state asserted by S0.
For example, two statements show the relationship Explanation: Ram fought with
Shyam's friend. He was drunk.
Parallel
It infers p(a1, a2, ...) from the assertion of S0 and p(b1, b2, ...) from the
assertion of S1, where ai and bi are similar for all i. For example, two
statements are Parallel: Ram wanted a car. Shyam wanted money.
Elaboration
It infers the same proposition P from both assertions, S0 and S1. For example,
two statements show the relation Elaboration: Ram was from Chandigarh. Shyam was
from Kerala.
Occasion
It occurs when a change of state can be inferred from the assertion of S0, whose
final state can be inferred from S1, and vice versa. For example, two statements
show the relation Occasion: Ram picked up the book. He gave it to Shyam.
The coherence of an entire discourse can also be considered in terms of the
hierarchical structure between coherence relations. For example, the following
passage can be represented as a hierarchical structure —
(Figure: a hierarchical discourse structure in which an Explanation relation
links S1 (e1) to a sub-structure formed by another Explanation relation over the
remaining sentences.)
Reference Resolution
Interpretation of the sentences of any discourse is another important task, and
to achieve this we need to know who or what entity is being talked about. Here,
reference interpretation is the key element. Reference may be defined as a
linguistic expression used to denote an entity or individual. For example, in
the passage, Ram, the manager of ABC bank, saw his friend Shyam at a shop. He
went to meet him, the linguistic expressions Ram, his and He are references.
On the same note, reference resolution may be defined as the task of determining
which entities are referred to by which linguistic expressions.
Let us now see the different types of referring expressions. The five types of referring
expressions are described below —
Indefinite Noun Phrases
Such a reference introduces entities that are new to the hearer into the
discourse context. For example, in the sentence Ram had gone around one day to
bring him some food, some is an indefinite reference.
Definite Noun Phrases
Opposite to the above, such a reference represents entities that are not new but
identifiable by the hearer in the discourse context. For example, in the
sentence I used to read The Times of India, The Times of India is a definite
reference.
Pronouns
A pronoun is a form of definite reference. For example, Ram laughed as loud as
he could. The word he is a pronoun referring expression.
Demonstratives
Demonstratives such as this and that also refer to entities, distinguishing them
by their proximity to the speaker.
Names
This is the simplest type of referring expression. It can be the name of a
person, an organization or a location. For example, in the above examples, Ram
is a name referring expression.
Coreference Resolution
Coreference resolution is the task of finding referring expressions in a text
that refer to the same entity; in simple words, it is the task of finding
coreferring expressions. A set of coreferring expressions is called a
coreference chain. For example, He, Chief Manager and His are referring
expressions in the first passage given as an example.
In English, the main problem for coreference resolution is the pronoun it,
because it has many uses. It can refer to entities much like he and she, but it
can also occur without referring to any specific thing, for example: It's
raining. It is really good.
Unlike coreference resolution, pronominal anaphora resolution may be defined as
the task of finding the antecedent for a single pronoun. For example, for the
pronoun his in the passage above, the task of pronominal anaphora resolution is
to find the word Ram, because Ram is the antecedent.
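The task can be sketched with a deliberately naive recency heuristic: take the most recent preceding capitalised token as the antecedent. Real resolvers use syntax, agreement and salience; everything below is an illustrative assumption:

```python
# Pronouns we attempt to resolve (illustrative subset).
PRONOUNS = {"he", "she", "him", "her", "his", "it"}

def resolve_pronouns(tokens):
    """Map each pronoun's index to the most recently seen name (naive)."""
    antecedent, links = None, {}
    for i, token in enumerate(tokens):
        if token.lower() in PRONOUNS:
            if antecedent:
                links[i] = antecedent  # pronoun -> last capitalised name
        elif token[0].isupper():
            antecedent = token  # treat any capitalised token as a name
    return links

tokens = "Ram laughed as loud as he could".split()
print(resolve_pronouns(tokens))  # → {5: 'Ram'}
```

This recency heuristic resolves he to Ram in the example above, but it fails whenever gender, number or syntax matter, which is exactly why anaphora resolution is studied as a separate task.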