
Word Sense Disambiguation in Natural Language Processing
Last Updated : 21 Apr, 2023

Word sense disambiguation (WSD) in Natural Language Processing (NLP) is the problem of identifying which “sense” (meaning) of a word is activated by its use in a particular context or scenario. In humans, this appears to be a largely unconscious process. Correctly identifying the intended sense of a word is a common challenge for NLP systems, and determining the specific usage of a word in a sentence has many applications. Applications of Word Sense Disambiguation include Information Retrieval, Question Answering systems, chatbots, and more.

Word Sense Disambiguation (WSD) is a subtask of Natural Language Processing that deals with identifying the correct sense of a word in context. Many words in natural language have multiple meanings, and WSD aims to determine which sense is intended in a particular context. For example, the word “bank” has different meanings in the sentences “I deposited money in the bank” and “The boat went down the river bank”.
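As a quick illustration, the senses of “bank” can be inspected in WordNet through NLTK (a minimal sketch, assuming NLTK is installed and its WordNet data has been downloaded with nltk.download("wordnet")):

from nltk.corpus import wordnet as wn

# List every WordNet sense of "bank" with its definition (gloss).
for synset in wn.synsets("bank"):
    print(synset.name(), "-", synset.definition())

The output includes, among others, both the financial-institution sense and the sloping-land (river bank) sense, which is exactly the ambiguity WSD has to resolve.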

WSD is a challenging task because it requires understanding the context in which the word is used and the different senses in which the word can be used. Some common approaches to WSD include:

1. Supervised learning: This involves training a machine learning model on a dataset of annotated examples, where each example contains a target word and its sense in a particular context. The model then learns to predict the correct sense of the target word in new contexts.
2. Unsupervised learning: This involves clustering words that appear in
similar contexts together, and then assigning senses to the resulting
clusters. This approach does not require annotated data, but it is less
accurate than supervised learning.
3. Knowledge-based: This involves using a knowledge base, such as a
dictionary or ontology, to map words to their different senses. This
approach relies on the availability and accuracy of the knowledge base.
4. Hybrid: This involves combining multiple approaches, such as supervised
and knowledge-based methods, to improve accuracy.

WSD has many practical applications, including machine translation, information retrieval, and text-to-speech systems. Improvements in WSD can lead to more accurate and efficient natural language processing systems.

Word Sense Disambiguation (WSD) is a subfield of Natural Language Processing (NLP) that deals with determining the intended meaning of a word in a given context. It is the process of identifying the correct sense of a word from a set of possible senses, based on the context in which the word appears. WSD is important for natural language understanding and machine translation, as it can improve the accuracy of these tasks by providing more accurate word meanings. Some common approaches to WSD include using WordNet, supervised machine learning, and unsupervised methods such as clustering.

The noun ‘star’ has eight different meanings or senses, each corresponding to a distinct concept. For example,

“He always wanted to be a Bollywood star.” Here the word ‘star’ means “a famous singer, performer, sports player, actor, personality, etc.”
“The Milky Way galaxy contains between 200 and 400 billion stars.” Here the word ‘star’ means “a big ball of burning gas in space that we view as a point of light in the night sky.”

Difficulties in Word Sense Disambiguation

Word Sense Disambiguation (WSD) faces several difficulties, described below.

Different Text-Corpus or Dictionary: One issue with word sense disambiguation is determining what the senses are, because different dictionaries and thesauruses divide words into senses differently. Some researchers have proposed using a specific lexicon and its set of senses to address this problem. In general, however, research results based on broad sense distinctions have been better than those based on fine-grained ones, yet the majority of researchers continue to work on fine-grained WSD.
PoS Tagging: Part-of-speech tagging and sense tagging have been shown to be very tightly coupled in any real test, with each potentially constraining the other. Both WSD and part-of-speech tagging involve disambiguating or tagging words. However, algorithms designed for one do not always work well for the other, because a word’s part of speech is mostly determined by the one to three words immediately adjacent to it, whereas a word’s sense can depend on words further away.

Sense Inventories for Word Sense Disambiguation

A sense inventory is a resource that lists words, abbreviations, and acronyms together with their possible senses. Some sense inventories commonly used in Word Sense Disambiguation are:

Princeton WordNet: a vast, manually curated lexicographic database of English and other languages. For WSD, this is the de facto standard inventory. Its well-organized Synsets, or clusters of contextual synonyms, are nodes in a network (a short query sketch is shown after this list).
BabelNet: a multilingual dictionary that covers both lexicographic and encyclopedic terminology. It was created by semi-automatically mapping numerous resources, including WordNet, multilingual versions of WordNet, and Wikipedia.
Wiktionary: a collaborative project aimed at creating a dictionary for each language separately; it is another inventory that has recently gained popularity.
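As a small illustration of WordNet as a sense inventory (a sketch assuming NLTK and its WordNet data are installed), a single synset can be queried for its definition, its lemmas (the contextual synonyms mentioned above), and its hypernym links, which form the edges of the network:

from nltk.corpus import wordnet as wn

# Inspect one sense of the noun "star": its gloss, synonyms, and hypernyms.
synset = wn.synset("star.n.01")
print("Definition:", synset.definition())
print("Lemmas:", [lemma.name() for lemma in synset.lemmas()])
print("Hypernyms:", [hyp.name() for hyp in synset.hypernyms()])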

Approaches for Word Sense Disambiguation


There are many approaches to Word Sense Disambiguation. The three main
approaches are given below:

1. Supervised: The assumption behind supervised approaches is that the context can supply enough evidence to disambiguate words on its own (hence, world knowledge and reasoning are deemed unnecessary).

Supervised methods for Word Sense Disambiguation (WSD) involve training a model using a labeled dataset of word senses. The model is then used to disambiguate the sense of a target word in new text. Some common techniques used in supervised WSD include:

1. Decision list: A decision list is a set of rules that are used to assign a
sense to a target word based on the context in which it appears.
2. Neural Network: Neural networks such as feedforward networks,
recurrent neural networks, and transformer networks are used to model
the context-sense relationship.
3. Support Vector Machines: SVM is a supervised machine learning
algorithm used for classification and regression analysis.
4. Naive Bayes: Naive Bayes is a probabilistic algorithm that uses Bayes’
theorem to classify text into predefined categories.
5. Decision Trees: Decision Trees are a flowchart-like structure in which an internal node represents a feature (or attribute), a branch represents a decision rule, and each leaf node represents the outcome.

6. Random Forest: Random Forest is an ensemble learning method for classification, regression, and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes predicted by the individual trees.
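To make the supervised setup concrete, here is a minimal sketch that trains a Naive Bayes classifier with bag-of-words context features for the target word “bank”. The training sentences and sense labels are invented toy data (a real system would use a sense-annotated corpus such as SemCor), and scikit-learn is assumed to be installed:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy sense-annotated contexts for the target word "bank" (illustrative only).
train_sentences = [
    "i deposited money in the bank",
    "the bank approved my loan application",
    "we walked along the river bank",
    "the boat was pulled onto the bank of the stream",
]
train_senses = ["FINANCE", "FINANCE", "RIVER", "RIVER"]

# Bag-of-words features of the context + Naive Bayes sense classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_sentences, train_senses)

# Disambiguate the target word in unseen contexts.
print(model.predict(["i deposited cash in the bank"]))           # FINANCE on this toy data
print(model.predict(["we walked along the bank of the river"]))  # RIVER on this toy data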

Supervised WSD Exploiting Glosses: Textual definitions (also known as glosses) are a prominent source of information in sense inventories. Definitions, which follow the format of traditional dictionaries, are a quick and easy way to clarify sense distinctions (a simplified gloss-matching sketch is given after this list).
Purely Data-Driven WSD: In this case, a token tagger is a popular baseline model that generates a probability distribution over all senses in the vocabulary for each word in a context.
Supervised WSD Exploiting Other Knowledge: Additional sources of knowledge, both internal and external to the knowledge base, are also beneficial to WSD models. Some researchers use BabelNet translations to fine-tune the output of any WSD system by comparing the output senses’ translations to the target’s translations provided by an NMT system.
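As a simplified, non-authoritative sketch of the gloss-matching idea, the context can be embedded alongside each candidate sense’s WordNet gloss and the most similar gloss selected. This assumes the sentence-transformers package and the “all-MiniLM-L6-v2” checkpoint, neither of which is mentioned in the original text; it is a stand-in for the gloss-augmented systems described above, not a reproduction of them:

from nltk.corpus import wordnet as wn
from sentence_transformers import SentenceTransformer, util

# Assumed model checkpoint; any sentence encoder could be substituted.
model = SentenceTransformer("all-MiniLM-L6-v2")

def disambiguate_with_glosses(context, target_word):
    # Score every WordNet gloss of the target word against the context
    # and return the sense whose gloss is most similar.
    senses = wn.synsets(target_word)
    if not senses:
        return None
    context_emb = model.encode(context, convert_to_tensor=True)
    gloss_embs = model.encode([s.definition() for s in senses], convert_to_tensor=True)
    scores = util.cos_sim(context_emb, gloss_embs)[0]
    return senses[int(scores.argmax())]

print(disambiguate_with_glosses("I deposited money in the bank", "bank"))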

2. Unsupervised: The underlying assumption is that similar senses occur in similar contexts, and thus senses can be induced from the text by clustering word occurrences using some measure of similarity of context. Using fixed-size dense vectors (word embeddings) to represent words in context has become one of the most fundamental building blocks in several NLP systems. Traditional word embedding approaches can still be utilized to improve WSD, despite the fact that they conflate words with many meanings into a single vector representation. In addition to word embedding techniques, lexical databases (e.g., WordNet, ConceptNet, BabelNet) can also help unsupervised systems map words to their senses.
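The clustering idea can be sketched as follows; the contexts are invented toy data, the number of senses is assumed to be known in advance, and scikit-learn is assumed to be installed:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Toy occurrences of "bank", each represented by its surrounding context.
contexts = [
    "deposited my salary at the bank this morning",
    "the bank raised its interest rates again",
    "picnic on the grassy bank of the river",
    "the canoe drifted toward the muddy bank",
]

# Represent each context as a TF-IDF vector and cluster the occurrences;
# each cluster is treated as one induced sense of "bank".
vectors = TfidfVectorizer(stop_words="english").fit_transform(contexts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for context, label in zip(contexts, labels):
    print(label, context)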
3. Knowledge-Based: It is built on the idea that words used in a text are related to one another, and that this relationship can be seen in the definitions of the words and their meanings. The pair of dictionary senses having the highest word overlap in their dictionary definitions is used to disambiguate two (or more) words. The Lesk algorithm is the classical knowledge-based WSD algorithm. It assumes that words in a given “neighborhood” (a portion of text) will share a common theme. In a simplified version of the Lesk algorithm, the dictionary definition of an ambiguous word is compared to the terms in its neighborhood, as sketched below.
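Here is a minimal sketch of that simplified Lesk idea, scoring each WordNet gloss by its word overlap with the target word’s neighborhood (assuming NLTK and its WordNet data are installed):

from nltk.corpus import wordnet as wn

def simplified_lesk(context_words, target_word):
    # Pick the sense whose gloss shares the most words with the context.
    context = set(word.lower() for word in context_words)
    best_sense, best_overlap = None, -1
    for sense in wn.synsets(target_word):
        gloss_words = set(sense.definition().lower().split())
        overlap = len(context & gloss_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk("I deposited my money in the bank".split(), "bank"))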

Subtopics:

1. Supervised methods for WSD
2. Unsupervised methods for WSD
3. Knowledge-based methods for WSD
4. Distributional methods for WSD
5. Hybrid methods for WSD
6. Evaluation metrics for WSD
7. Applications of WSD in NLP tasks such as machine translation,
information retrieval, and text summarization.
8. Limitations and challenges in WSD research
9. Recent developments and future directions in WSD
10. Annotation schemes and tools for WSD

Example:

For example, consider the word “bank” in the sentence “I deposited my money in the bank.” Without WSD, it would be difficult for a computer to determine whether the word “bank” refers to a financial institution or the edge of a river. However, with WSD, the computer can use context clues such as “deposited” and “money” to determine that the intended meaning of “bank” in this sentence is a financial institution. This will improve the accuracy of natural language understanding and machine translation, as the computer will understand that the sentence is talking about depositing money in a bank account, not at the edge of a river.
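NLTK ships its own implementation of the Lesk algorithm, which can be tried directly on this example (a sketch assuming NLTK and its WordNet data; because the decision rests purely on gloss overlap, the returned synset may not always match intuition):

from nltk.wsd import lesk

sentence = "I deposited my money in the bank"
# Restrict candidates to noun senses of "bank" and let gloss overlap decide.
sense = lesk(sentence.split(), "bank", pos="n")
print(sense, "-", sense.definition() if sense else "no sense found")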
