DL Unit-IV

Natural Language Processing (NLP) combines artificial intelligence and deep learning to enable machines to understand and generate human language, achieving state-of-the-art results in various tasks. Key concepts include text representation, sequence modeling, and the use of models like RNNs, LSTMs, and Transformers for tasks such as text classification and machine translation. Challenges in NLP include ambiguity, data scarcity, and bias, while advancements like contextual embeddings enhance semantic understanding.


Introduction to Natural Language Processing (NLP) in Deep Learning

Natural Language Processing (NLP) is a field of artificial intelligence (AI) that enables
machines to understand, interpret, generate, and interact with human language. When combined
with deep learning, NLP has significantly advanced, enabling systems to achieve state-of-the-art
results in tasks like translation, sentiment analysis, question answering, and text generation.

Key Concepts in NLP

1. Text Representation:
o Words and sentences need to be represented numerically for models to process.
o Common approaches (illustrated in the sketch after this list):
▪ One-Hot Encoding: Binary vector representation.
▪ Word Embeddings: Dense vector representations capturing semantic
similarity (e.g., Word2Vec, GloVe).
▪ Contextual Embeddings: Represent words based on their context in a
sentence (e.g., BERT, GPT).
2. Sequence Modeling:
o NLP tasks often involve understanding sequences of words.
o Sequence models process these inputs while preserving order and context.
3. Language Understanding and Generation:
o Tasks range from analyzing existing text to generating coherent and meaningful
new text.
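To make the text-representation approaches listed above concrete, here is a minimal NumPy sketch contrasting a sparse one-hot vector with a dense embedding looked up from an embedding matrix. The vocabulary, dimensions, and random values are made up purely for illustration; a trained model such as Word2Vec would supply the actual embedding values.

python
import numpy as np

# Toy vocabulary (illustrative)
vocab = ['the', 'king', 'queen', 'bank', 'river']
word_to_id = {w: i for i, w in enumerate(vocab)}

# One-hot encoding: a sparse binary vector of length |V|
def one_hot(word):
    vec = np.zeros(len(vocab))
    vec[word_to_id[word]] = 1.0
    return vec

print(one_hot('king'))  # e.g. [0. 1. 0. 0. 0.]

# Word embedding: a dense, low-dimensional vector taken from an embedding
# matrix of shape (|V|, d); here d = 4 and the values are random stand-ins
# for vectors a model such as Word2Vec or GloVe would learn.
embedding_dim = 4
embedding_matrix = np.random.rand(len(vocab), embedding_dim)
print(embedding_matrix[word_to_id['king']])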

Deep Learning Models in NLP

1. Recurrent Neural Networks (RNNs):


o Designed for sequential data processing.
o Maintain a hidden state to capture information from previous inputs.
o Limitations: Struggle with long-term dependencies due to vanishing gradients.
2. Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU):
o Variants of RNNs that address long-term dependency issues.
o Use gating mechanisms to control the flow of information.
3. Convolutional Neural Networks (CNNs):
o Effective for tasks like text classification.
o Capture local dependencies using convolutional filters.
4. Transformers:
o Revolutionized NLP by replacing recurrence with self-attention mechanisms.
o Models like BERT, GPT, and T5 are based on transformers.
o Highly parallelizable and capable of capturing long-term dependencies.
NLP Tasks and Applications

1. Text Classification:
o Categorize text into predefined labels.
o Examples: Sentiment analysis, spam detection.
2. Sequence Labeling:
o Assign a label to each token in a sequence.
o Examples: Part-of-speech tagging, named entity recognition (NER).
3. Machine Translation:
o Translate text from one language to another.
o Example: English to French translation.
4. Question Answering:
o Answer questions based on input text or a knowledge base.
5. Text Generation:
o Generate coherent and contextually relevant text.
o Examples: Chatbots, story generation.
6. Summarization:
o Generate concise summaries from longer texts.
o Types: Extractive and abstractive summarization.

Key Components in NLP Pipelines

1. Preprocessing:
o Tokenization: Splitting text into words or subwords.
o Lowercasing, removing stopwords, and stemming/lemmatization (traditional
methods).
o SentencePiece or Byte Pair Encoding (BPE) for subword tokenization in deep
learning.
2. Embedding Layers:
o Map words/tokens to vectors in a continuous space.
o Examples: Word2Vec, FastText, pre-trained embeddings (e.g., GloVe).
3. Model Architecture:
o Sequence models (RNNs, LSTMs) or attention-based models (Transformers).
o Pre-trained models like BERT, RoBERTa, GPT-3 fine-tuned for specific tasks.
4. Loss Functions:
o Cross-entropy loss for classification.
o Perplexity for language modeling.
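The PyTorch sketch below strings these pipeline components together in miniature: naive whitespace tokenization, an embedding layer, a tiny classifier, and cross-entropy loss. The vocabulary, sizes, and example sentence are assumptions made for illustration only, not a complete NLP pipeline.

python
import torch
import torch.nn as nn

# 1. Preprocessing: naive whitespace tokenization over a toy vocabulary
vocab = {'<pad>': 0, 'this': 1, 'movie': 2, 'was': 3, 'great': 4, 'terrible': 5}
def tokenize(text):
    return [vocab.get(tok, 0) for tok in text.lower().split()]

tokens = torch.tensor([tokenize('this movie was great')])  # shape (1, 4)

# 2. Embedding layer: maps token ids to dense vectors
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)

# 3. Model architecture: average the embeddings and classify (2 classes)
classifier = nn.Linear(8, 2)
logits = classifier(embedding(tokens).mean(dim=1))  # shape (1, 2)

# 4. Loss function: cross-entropy against the true label (1 = positive)
loss = nn.CrossEntropyLoss()(logits, torch.tensor([1]))
print(loss.item())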

Pre-Trained Models and Frameworks

• BERT (Bidirectional Encoder Representations from Transformers):


o Captures context from both directions in a sentence.
• GPT (Generative Pre-trained Transformer):
o Focuses on text generation.
• T5 (Text-to-Text Transfer Transformer):
o A unified framework for NLP tasks.
• RoBERTa:
o Improved version of BERT with optimized training.

Tools and Libraries

1. TensorFlow and PyTorch: Popular frameworks for building deep learning models.
2. Hugging Face Transformers: Library for pre-trained models and NLP pipelines (see the sketch after this list).
3. spaCy: Industrial-strength NLP library.
4. NLTK: A classical NLP library for preprocessing and analysis.
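As a brief illustration of how these libraries expose the pre-trained models described above, the sketch below uses the Hugging Face Transformers pipeline API for sentiment analysis. It assumes the transformers package is installed and that a default pre-trained model can be downloaded on first use.

python
from transformers import pipeline

# Load a default pre-trained sentiment-analysis model (downloaded on first use)
classifier = pipeline("sentiment-analysis")

# Run inference on a couple of example sentences
results = classifier([
    "Deep learning has transformed NLP.",
    "This translation is completely wrong."
])
print(results)  # e.g. [{'label': 'POSITIVE', 'score': ...}, ...]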

Challenges in NLP

1. Ambiguity:
o Words can have multiple meanings depending on context.
o Example: "bank" (financial institution vs riverbank).
2. Data Scarcity:
o Annotated datasets may not be available for all languages or domains.
3. Bias in Data:
o Pre-trained models can reflect societal biases present in training data.
4. Multilingual Understanding:
o Handling multiple languages and code-switching (mixing languages in a
sentence).
5. Efficiency:
o Training and inference for large models can be resource-intensive.

Vector Space Model (VSM) for Semantics

The Vector Space Model (VSM) is a mathematical framework for representing and analyzing
the semantics of words, phrases, sentences, or entire documents. In this model, linguistic units
are represented as vectors in a high-dimensional space, where geometric relationships (e.g.,
angles or distances) capture semantic relationships.

Core Concepts of VSM


1. Representation:
o Words, documents, or phrases are embedded in a continuous vector space.
o The vectors are constructed such that semantically similar items are close to each other
in the space.
2. Dimensionality:
o Each dimension represents some aspect of meaning or context.
o In traditional models, dimensions might correspond to individual terms or features.
o In modern embeddings, dimensions are abstract features learned from data.
3. Similarity Measurement:
o Semantic similarity is measured using geometric metrics:
▪ Cosine Similarity: Measures the angle between two vectors.
▪ Euclidean Distance: Measures the straight-line distance between two points.
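The NumPy sketch below shows how these two similarity measures are computed; the three-dimensional word vectors are made-up values used purely for illustration.

python
import numpy as np

# Toy word vectors (illustrative values only)
v_king = np.array([0.6, 0.8, 0.1])
v_queen = np.array([0.5, 0.9, 0.2])

# Cosine similarity: cosine of the angle between the two vectors
cosine = np.dot(v_king, v_queen) / (np.linalg.norm(v_king) * np.linalg.norm(v_queen))

# Euclidean distance: straight-line distance between the two points
euclidean = np.linalg.norm(v_king - v_queen)

print(f"Cosine similarity: {cosine:.3f}")
print(f"Euclidean distance: {euclidean:.3f}")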

Types of VSMs for Semantics

1. Term-Document Matrix (Classical VSM):

• A sparse matrix where rows represent terms, and columns represent documents.
• Entries are term frequencies, term frequencies weighted by inverse document frequency (TF-
IDF), or similar metrics.
• Example: Representing a document by its word occurrence frequencies.

2. Latent Semantic Analysis (LSA):

• Reduces the dimensionality of the term-document matrix using Singular Value Decomposition
(SVD).
• Projects terms and documents into a lower-dimensional "latent" space where semantic
relationships are preserved.
• Addresses synonymy and polysemy (different words with similar meanings or the same word
with multiple meanings).
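A minimal scikit-learn sketch of both ideas above, assuming scikit-learn is available and using three made-up documents: a TF-IDF weighted term-document matrix is built first (note that scikit-learn places documents in rows rather than columns) and then reduced with truncated SVD, which is the core of LSA.

python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# Toy corpus (illustrative)
documents = [
    "the king ruled the kingdom",
    "the queen ruled the realm",
    "the car drove on the road",
]

# 1. Classical VSM: TF-IDF weighted term-document matrix (documents x terms)
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents)
print(X.shape)  # (3, number_of_terms)

# 2. LSA: project into a 2-dimensional latent space with truncated SVD
lsa = TruncatedSVD(n_components=2)
X_latent = lsa.fit_transform(X)
print(X_latent)  # each row is a document in the latent semantic space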

3. Word Embeddings:

• Dense, continuous vector representations of words learned from large corpora.


• Examples:
o Word2Vec: Uses skip-gram or CBOW architectures to predict word context.
o GloVe: Combines global word co-occurrence statistics with local context.
o FastText: Represents words as subword embeddings, capturing morphology.

4. Contextual Word Embeddings:

• Words are represented based on their surrounding context.


• Examples:
o BERT (Bidirectional Encoder Representations from Transformers): Captures
bidirectional context in sentences.
o ELMo: Produces embeddings for words that vary by their context.
o GPT: Focuses on text generation with autoregressive properties.

Applications of VSM in Semantics

1. Document Retrieval:
o Query-document matching using cosine similarity.
o Example: Search engines rank documents based on their vector similarity to a query.
2. Text Classification:
o Documents represented as vectors are used as inputs for classifiers.
o Example: Spam detection, topic classification.
3. Semantic Similarity:
o Computing similarity between words, sentences, or documents.
o Example: Plagiarism detection, sentence paraphrase identification.
4. Word Analogy Tasks:
o Solving analogies like "king - man + woman = queen" using vector arithmetic.
5. Clustering and Topic Modeling:
o Grouping semantically similar items together.
o Example: News categorization.

Strengths of VSM

1. Simplicity:
o Easy to compute and interpret basic term-document matrices.
2. Semantic Generalization:
o Techniques like LSA or embeddings capture latent relationships beyond word surface
forms.
3. Efficiency:
o Pre-trained word embeddings reduce the need for building models from scratch.

Limitations of Classical VSMs

1. Sparsity:
o Classical term-document matrices are sparse, requiring dimensionality reduction for
effective use.
2. Lack of Context:
o Words have fixed meanings, ignoring context.
o Example: "bank" (financial institution vs riverbank).
3. Synonymy and Polysemy:
o Hard to resolve without advanced techniques.
4. Scalability:
o Computationally expensive for large datasets without dimensionality reduction.

Modern Advances

1. Dynamic Embeddings:
o Contextual embeddings (BERT, GPT) address fixed-meaning limitations by considering
sentence context.
o Examples:
▪ In "He went to the bank to withdraw money," "bank" is associated with financial
institutions.
▪ In "The boat docked near the bank of the river," "bank" is associated with a
riverbank.
2. Sentence and Document Embeddings:
o Representing entire sentences or documents as single vectors.
o Example: Sentence-BERT, Universal Sentence Encoder.
3. Graph-Based VSMs:
o Represent relationships as graphs (e.g., word graphs, knowledge graphs).
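As a hedged sketch of the dynamic "bank" example above, the code below uses the Hugging Face Transformers library and the bert-base-uncased checkpoint (both assumed to be available for download) to extract the contextual vector of "bank" in the two sentences and compare them; the vectors differ because BERT conditions on the full sentence.

python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    """Return the contextual embedding of the token 'bank' in the sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, hidden_size)
    bank_id = tokenizer.convert_tokens_to_ids("bank")
    position = inputs["input_ids"][0].tolist().index(bank_id)
    return hidden[position]

v1 = bank_vector("He went to the bank to withdraw money.")
v2 = bank_vector("The boat docked near the bank of the river.")

# Same word, different contexts -> different vectors (cosine similarity < 1)
cos = torch.nn.functional.cosine_similarity(v1, v2, dim=0)
print(f"Cosine similarity between the two 'bank' vectors: {cos.item():.3f}")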

Example: Word2Vec for Semantic Similarity

Training:

• Given a sentence, the model predicts surrounding words (context) for a given word.
• Result: Embeddings where similar words are geometrically close.

python
from gensim.models import Word2Vec

# Sample sentences
sentences = [
    ['king', 'queen', 'man', 'woman'],
    ['cat', 'dog', 'animal', 'pet'],
    ['car', 'bus', 'vehicle', 'transport']
]

# Train Word2Vec
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)

# Get vector for a word
vector_king = model.wv['king']

# Find most similar words
similar_words = model.wv.most_similar('king', topn=5)
print(similar_words)

Word Vector Representations in Deep Learning

Word vector representations, also known as word embeddings, are dense, continuous, and low-
dimensional vector representations of words. They are foundational to many Natural Language
Processing (NLP) tasks, enabling models to capture semantic and syntactic relationships between
words.

Why Use Word Vector Representations?

1. Semantic Similarity:
o Words with similar meanings have similar vector representations.
o Example: "king" and "queen" are closer in vector space than "king" and "car."
2. Overcome Sparsity:
o Traditional methods like one-hot encoding create high-dimensional, sparse vectors
where most entries are zero.
o Word embeddings reduce dimensionality and improve efficiency.
3. Generalization:
o Embeddings capture word relationships beyond exact matches, enabling models to
generalize better across tasks.

Methods to Learn Word Representations

1. Count-Based Approaches

• Leverage word co-occurrence statistics in a corpus.


• Examples:
o Latent Semantic Analysis (LSA): Applies Singular Value Decomposition (SVD) to reduce
the dimensionality of the term-document matrix.
o GloVe (Global Vectors for Word Representation): Combines global word co-occurrence
information to create dense embeddings.

2. Prediction-Based Approaches

• Train a neural network to predict word relationships in context.


• Examples:
o Word2Vec: Predicts the target word from its context (CBOW) or the surrounding context
words from the target word (Skip-Gram).
o FastText: Extends Word2Vec by considering subword information, capturing
morphological nuances.
o Contextual Embeddings (e.g., BERT, GPT): Learn word representations dynamically
based on their surrounding context.

Popular Word Embedding Models

1. Word2Vec

• Developed by Google.
• Two architectures:
o Skip-Gram: Predicts surrounding words given a target word.
o CBOW (Continuous Bag of Words): Predicts a target word given its context.
• Captures relationships like:
o "king - man + woman ≈ queen"

2. GloVe

• Developed by Stanford.
• Uses word co-occurrence matrices to capture global statistics.
• Embeddings are optimized so that the dot product of two word vectors (plus bias terms) approximates the logarithm of their co-occurrence count: $w_i \cdot \tilde{w}_j + b_i + \tilde{b}_j \approx \log X_{ij}$.

3. FastText

• Developed by Facebook.
• Represents words as a sum of subword embeddings.
• Effective for rare words and morphologically rich languages.

4. Contextual Word Embeddings

• Incorporate word meanings based on sentence context.


• Examples:
o BERT (Bidirectional Encoder Representations from Transformers): Captures bidirectional
context.
o GPT (Generative Pre-trained Transformer): Focuses on left-to-right context for text
generation.
o ELMo (Embeddings from Language Models): Captures context-dependent
representations using LSTMs.
Example: Word2Vec in Python
python
from gensim.models import Word2Vec

# Sample corpus
sentences = [
    ['king', 'queen', 'man', 'woman'],
    ['dog', 'cat', 'animal', 'pet'],
    ['car', 'bus', 'vehicle', 'transport']
]

# Train Word2Vec model
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)

# Get the vector for 'king'
vector = model.wv['king']
print(f"Vector for 'king': {vector[:10]}")  # Print first 10 dimensions

# Find most similar words to 'king'
similar_words = model.wv.most_similar('king', topn=3)
print(f"Most similar words to 'king': {similar_words}")

Evaluation of Word Embeddings

1. Intrinsic Evaluation:
o Measures the quality of embeddings on linguistic tasks like:
▪ Word similarity (e.g., cosine similarity between "car" and "bus").
▪ Word analogy tasks (e.g., "man:king :: woman:queen").
2. Extrinsic Evaluation:
o Assesses the embeddings' impact on downstream tasks such as sentiment analysis,
machine translation, or text classification.
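The sketch below illustrates intrinsic evaluation with a word-similarity query and a word-analogy query. It assumes internet access and uses gensim's downloader to fetch small pre-trained GloVe vectors (the model name is a standard gensim-data identifier); the tiny toy corpora used elsewhere in this unit are too small to produce meaningful analogies.

python
import gensim.downloader as api

# Download small pre-trained GloVe vectors (roughly 65 MB on first use)
word_vectors = api.load("glove-wiki-gigaword-50")

# Word similarity: cosine similarity between two words
print(word_vectors.similarity("car", "bus"))

# Word analogy: king - man + woman ≈ ?
result = word_vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # expected to rank 'queen' highly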

Advancements: Contextual Word Embeddings

Unlike static embeddings (e.g., Word2Vec, GloVe), contextual embeddings represent words
differently depending on their usage in a sentence. For example:

• In "He went to the bank to withdraw money," "bank" refers to a financial institution.
• In "The boat docked near the bank of the river," "bank" refers to a riverbank.

Examples:

• BERT: Pre-trained on a large corpus and fine-tuned for specific tasks.


• GPT: Specialized for generating coherent text.
• Transformer-based Models: Excel in NLP tasks, including translation, summarization, and
question answering.

Applications of Word Embeddings

1. Text Classification:
o Input embeddings are fed into classifiers for tasks like spam detection or sentiment
analysis.
2. Machine Translation:
o Models like Seq2Seq with attention mechanisms use embeddings for translating text.
3. Named Entity Recognition (NER):
o Identify and classify entities (e.g., names, dates, locations).
4. Question Answering:
o Power models like BERT for answering questions based on context.
5. Search and Information Retrieval:
o Rank documents based on semantic similarity with queries.

Continuous Skip-Gram Model:

The Continuous Skip-Gram Model is a neural network-based architecture introduced by the creators of Word2Vec (Mikolov et al., 2013) for learning word embeddings. The model predicts the surrounding context words of a given target word in a sentence, capturing semantic relationships and linguistic patterns.

Key Idea

The primary objective of the skip-gram model is to maximize the likelihood of predicting the
context (neighboring words) for a given target word.

For a given target word $w_t$, the skip-gram model aims to maximize the probability of its surrounding words within a defined context window of size $C$:

$$\prod_{-C \leq j \leq C,\; j \neq 0} P(w_{t+j} \mid w_t)$$

Here:

• $w_t$: Target word.
• $w_{t+j}$: Context words within a window of size $C$.

Architecture

1. Input Layer:
o One-hot vector representation of the target word (size = vocabulary size $V$).
2. Embedding Layer:
o Transforms the one-hot vector into a dense vector of size $d$ (the embedding size).
o This layer consists of a matrix $W$ of size $V \times d$, where each row is the embedding of a word.
3. Output Layer:
o Produces probabilities for all words in the vocabulary using a softmax function:

$$P(w_{t+j} \mid w_t) = \frac{\exp(v_{w_{t+j}} \cdot v_{w_t})}{\sum_{w' \in V} \exp(v_{w'} \cdot v_{w_t})}$$

o Here, $v_{w_t}$ and $v_{w_{t+j}}$ are the embeddings of the target and context words.
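The NumPy sketch below mirrors this architecture with made-up sizes ($V = 6$, $d = 3$) and random weights: an input (target) embedding matrix, an output (context) embedding matrix, and the softmax over the vocabulary. It only shows the shape of the forward computation, not a trained model.

python
import numpy as np

V, d = 6, 3                      # vocabulary size and embedding size (toy values)
W_in = np.random.rand(V, d)      # target-word embeddings (input layer -> hidden)
W_out = np.random.rand(V, d)     # context-word embeddings (hidden -> output)

target_id = 2                    # index of the target word w_t

# Hidden layer: simply the embedding row of the target word
v_t = W_in[target_id]

# Output layer: softmax over dot products with every context embedding
scores = W_out @ v_t             # shape (V,)
probs = np.exp(scores) / np.exp(scores).sum()

print(probs)        # P(w_{t+j} | w_t) for every word in the vocabulary
print(probs.sum())  # sums to 1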

Loss Function

The model uses the negative log-likelihood as the loss function:

$$L = -\sum_{t} \sum_{-C \leq j \leq C,\; j \neq 0} \log P(w_{t+j} \mid w_t)$$

To improve computational efficiency, negative sampling or hierarchical softmax is used:

• Negative Sampling: Updates only a small sampled subset of the output weights by drawing "negative" examples (words that are not true context words); see the gensim sketch below.
• Hierarchical Softmax: Uses a binary tree over the vocabulary to reduce the computational cost of the softmax operation.
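In gensim's Word2Vec these two options are selected with the negative and hs parameters. The snippet below is a brief sketch using the same toy-corpus style as the other examples in this unit.

python
from gensim.models import Word2Vec

sentences = [['king', 'queen', 'man', 'woman'],
             ['car', 'bus', 'vehicle', 'transport']]

# Skip-gram with negative sampling: 10 negative words per positive pair
model_ns = Word2Vec(sentences, sg=1, negative=10, hs=0, min_count=1)

# Skip-gram with hierarchical softmax (negative sampling disabled)
model_hs = Word2Vec(sentences, sg=1, negative=0, hs=1, min_count=1)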

Training

1. A sliding window of size $2C + 1$ moves over a sentence.


2. For each target word wtw_twt, context words within the window are selected as positive
examples.
3. The model updates embeddings for the target-context pairs using stochastic gradient
descent (SGD).
Example in Python (Using Gensim)

Here’s how to train a skip-gram model using the Gensim library:

python
from gensim.models import Word2Vec

# Sample corpus
sentences = [
    ['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog'],
    ['king', 'queen', 'man', 'woman'],
    ['car', 'bus', 'vehicle', 'transport']
]

# Train skip-gram model (sg=1 selects the skip-gram architecture)
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1, workers=4)

# Get the vector for a word
vector = model.wv['king']
print(f"Vector for 'king': {vector[:10]}")  # First 10 dimensions

# Find most similar words
similar_words = model.wv.most_similar('king', topn=3)
print(f"Most similar words to 'king': {similar_words}")

Advantages of the Skip-Gram Model

1. Captures Semantic Relationships:


o Embeddings learn relationships like "king - man + woman = queen."
2. Handles Rare Words Well:
o Skip-gram works effectively even when a word appears infrequently in the
corpus.
3. Scalability:
o With optimizations like negative sampling, it scales well to large vocabularies and
corpora.

Applications

1. Semantic Similarity:
o Measure how similar two words are based on their vector representations.
2. Word Analogy Tasks:
o Solve analogies using vector arithmetic (e.g., "man:king :: woman:queen").
3. NLP Tasks:
o Input for downstream tasks like text classification, machine translation, and
question answering.
4. Pre-training:
o Skip-gram embeddings can initialize models for other NLP tasks to improve
performance.

Continuous Bag of Words (CBOW) Model:

The Continuous Bag of Words (CBOW) model is one of the architectures introduced with
Word2Vec (Mikolov et al., 2013) for learning word embeddings. CBOW predicts a target word
based on its surrounding context, which is the opposite of the Skip-Gram model.

Key Idea

The CBOW model predicts the target word $w_t$ using the words in its context window. It works by maximizing the probability of the target word given its surrounding context words.

For a target word $w_t$, CBOW tries to maximize:

$$P(w_t \mid w_{t-C}, \dots, w_{t-1}, w_{t+1}, \dots, w_{t+C})$$

Where:

• $C$ is the size of the context window.
• $w_{t-C}, \dots, w_{t+C}$ are the words surrounding $w_t$.

Architecture

1. Input Layer:
o Inputs are one-hot encoded representations of the context words.
2. Embedding Layer:
o Transforms the one-hot vectors into dense vector representations (word
embeddings) using a shared embedding matrix.
3. Averaging Layer:
o Takes the average of the embeddings of the context words to represent the entire
context.
4. Output Layer:
o Uses a softmax function to predict the probability distribution over the vocabulary for the target word:

$$P(w_t \mid \text{context}) = \frac{\exp(v_{w_t} \cdot h)}{\sum_{w' \in V} \exp(v_{w'} \cdot h)}$$

▪ $h$ is the averaged context vector.
▪ $v_{w_t}$ is the embedding of the target word.
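Continuing the NumPy style of the skip-gram sketch earlier (toy sizes and random weights, for illustration only), the fragment below shows the step that distinguishes CBOW: averaging the context embeddings into a single vector $h$ before the softmax.

python
import numpy as np

V, d = 6, 3                        # toy vocabulary and embedding sizes
W_in = np.random.rand(V, d)        # shared embedding matrix for context words
W_out = np.random.rand(V, d)       # output embeddings

context_ids = [0, 1, 3, 4]         # indices of the context words around w_t

# Averaging layer: mean of the context-word embeddings
h = W_in[context_ids].mean(axis=0)

# Output layer: softmax over the vocabulary for the target word
scores = W_out @ h
probs = np.exp(scores) / np.exp(scores).sum()
print(probs.argmax())              # id of the most probable target word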

Loss Function

The CBOW model minimizes the negative log-likelihood:

$$L = -\sum_{t} \log P(w_t \mid w_{t-C}, \dots, w_{t+C})$$

To improve computational efficiency, techniques like negative sampling or hierarchical softmax are often used.

Training Process

1. Slide a context window of size $2C + 1$ over the text.


2. For each position in the text:
o Use the words within the window (excluding the target word) as the input.
o Predict the target word.
3. Update the embedding weights to minimize the loss.

Example in Python (Using Gensim)

Here’s how to train a CBOW model using the Gensim library:

python
from gensim.models import Word2Vec

# Sample corpus
sentences = [
    ['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog'],
    ['king', 'queen', 'man', 'woman'],
    ['car', 'bus', 'vehicle', 'transport']
]

# Train CBOW model (sg=0 selects the CBOW architecture)
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=0, workers=4)

# Get the vector for a word
vector = model.wv['king']
print(f"Vector for 'king': {vector[:10]}")  # First 10 dimensions

# Find most similar words
similar_words = model.wv.most_similar('king', topn=3)
print(f"Most similar words to 'king': {similar_words}")

Here:

• sg=0 specifies CBOW (set sg=1 for Skip-Gram).


• vector_size=100 specifies the dimensionality of the embeddings.

Advantages of CBOW

1. Efficient for Large Corpora:


o CBOW trains faster than Skip-Gram because it predicts one target word at a time
instead of multiple context words.
2. Captures Context Information:
o Considers multiple context words simultaneously, which can help capture broader
semantic meanings.
3. Simple and Effective:
o CBOW is straightforward and works well in practice for general-purpose word
embeddings.

Disadvantages of CBOW

1. Less Effective for Rare Words:


o Struggles to represent rare words effectively because it averages the embeddings
of context words, which may dilute the contribution of each word.
2. Context Independence:
o Treats all context words equally, without considering their positional or syntactic
relationships to the target word.

Comparison: CBOW vs. Skip-Gram

Feature                 CBOW                                      Skip-Gram
Objective               Predict target word from context words    Predict context words from target word
Training Speed          Faster                                    Slower
Rare Word Performance   Poorer                                    Better
Use Case                When context is more important            When rare words are more critical

Applications of CBOW

1. Pre-trained Word Embeddings:


o CBOW embeddings are often used as input for other NLP models.
2. Semantic Similarity:
o Measure similarity between words or phrases based on their embeddings.
3. Downstream NLP Tasks:
o Text classification, machine translation, and named entity recognition.

Evaluations and Applications in Word Similarity:

Word similarity is a key concept in Natural Language Processing (NLP) that measures how
closely related two words are, either semantically or syntactically. Word embeddings, such as
those learned by models like Word2Vec, GloVe, or BERT, are commonly used to evaluate and
apply word similarity.

Evaluation of Word Similarity

1. Intrinsic Evaluation

Evaluates word embeddings by comparing their performance on specific linguistic tasks, such as
word similarity or word analogy, without involving downstream applications.

Word Similarity Datasets

• Purpose: These datasets consist of word pairs along with human-assigned similarity scores. The
task is to compute the similarity of word embeddings and compare them to human judgments.
• Popular Datasets:
o WordSim-353: 353 word pairs with similarity scores.
o SimLex-999: Focuses on distinguishing between similarity and association.
o MEN: Evaluates general word similarity and relatedness.
o RG-65: 65 word pairs for evaluating synonymy.
o Rare Words (RW): Measures performance on infrequent or domain-specific words.

Metrics

• Cosine Similarity: Measures the cosine of the angle between two word vectors:

$$\text{cosine similarity} = \frac{v_1 \cdot v_2}{\|v_1\| \, \|v_2\|}$$

where $v_1$ and $v_2$ are the word vectors.
• Spearman’s Rank Correlation: Compares the ranked similarity scores predicted by the embeddings with human-annotated similarity scores.

Example in Python
python
from scipy.spatial.distance import cosine
from scipy.stats import spearmanr

# Example word vectors (toy values for illustration)
word_vectors = {
    'king':  [0.2, 0.4, 0.6],
    'queen': [0.3, 0.5, 0.7],
    'car':   [0.1, 0.9, 0.2],
}

# Human-assigned similarity scores for the word pairs below
pairs = [('king', 'queen'), ('king', 'car'), ('queen', 'car')]
human_scores = [0.95, 0.10, 0.12]

# Compute cosine similarity for each pair (1 - cosine distance)
predicted_scores = [1 - cosine(word_vectors[a], word_vectors[b]) for a, b in pairs]
print(f"Cosine Similarity (king, queen): {predicted_scores[0]:.2f}")

# Compare the predicted ranking with the human ranking
correlation, _ = spearmanr(predicted_scores, human_scores)
print(f"Spearman Correlation: {correlation:.2f}")

2. Extrinsic Evaluation

Tests word embeddings on downstream tasks that rely on semantic understanding.

• Examples:
o Text classification (e.g., sentiment analysis).
o Machine translation.
o Named entity recognition (NER).

The quality of word embeddings is measured by their contribution to the performance of these
tasks.

Applications of Word Similarity


1. Information Retrieval

• Search Engines: Improve document ranking by understanding semantic relationships between query terms and document content.
o Example: Retrieve documents related to "car" when the query contains "automobile."
• Semantic Search: Allows retrieval based on meaning, rather than exact matches.

2. Word Sense Disambiguation

• Resolves the ambiguity of words with multiple meanings based on context.


o Example: Disambiguate "bank" (financial institution vs. riverbank) by comparing its
similarity to surrounding words.

3. Synonym Detection

• Identify synonyms in dictionaries or thesauri by comparing word embeddings.


o Example: Find that "happy" and "joyful" have high cosine similarity.

4. Text Similarity

• Compare larger text units (e.g., phrases, sentences, or documents) by averaging or combining
word vectors.
o Example: Measure similarity between "I love programming" and "Coding is my passion."
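A small sketch of this idea, using made-up word vectors purely for illustration: each sentence is represented by the average of its word vectors, and the two averages are compared with cosine similarity. Real systems would use trained embeddings instead of these toy values.

python
import numpy as np

# Toy word vectors (illustrative values; real systems use trained embeddings)
word_vectors = {
    'i': [0.1, 0.2], 'love': [0.9, 0.7], 'programming': [0.8, 0.9],
    'coding': [0.7, 0.9], 'is': [0.1, 0.1], 'my': [0.2, 0.1], 'passion': [0.9, 0.6],
}

def sentence_vector(sentence):
    vectors = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
    return np.mean(vectors, axis=0)

v1 = sentence_vector("I love programming")
v2 = sentence_vector("Coding is my passion")

cosine = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(f"Sentence similarity: {cosine:.3f}")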

5. Machine Translation

• Align semantically similar words across languages using bilingual or multilingual embeddings.
o Example: Map the French word "roi" (king) to its English counterpart "king."

6. Sentiment Analysis

• Improve sentiment classification by leveraging embeddings to understand words like "amazing" (positive) and "terrible" (negative) in context.

7. Dialogue Systems and Chatbots

• Identify user intents and match them to predefined responses by measuring similarity between
input and response vectors.

8. Recommender Systems

• Suggest items based on semantic similarity between user preferences and available options.
o Example: Recommend books similar to "The Lord of the Rings" by comparing
descriptions.
9. Knowledge Graph Construction

• Identify relationships between entities by analyzing word embeddings.


o Example: Infer "Paris is the capital of France" by the proximity of "Paris" and "France"
embeddings.

10. Text Clustering and Classification

• Cluster similar documents or classify them into categories using embeddings.


o Example: Group research papers by topic using word vector-based similarity.

Example Application: Synonym Detection


python
from gensim.models import Word2Vec

# Sample corpus
sentences = [
    ['king', 'queen', 'man', 'woman'],
    ['car', 'vehicle', 'automobile', 'transport'],
    ['happy', 'joyful', 'cheerful', 'content']
]

# Train Word2Vec model (CBOW)
model = Word2Vec(sentences, vector_size=50, window=5, min_count=1, sg=0)

# Find most similar words
synonyms = model.wv.most_similar('happy', topn=3)
print(f"Synonyms for 'happy': {synonyms}")

Example output (illustrative; exact scores depend on training and the size of the corpus):

Synonyms for 'happy': [('joyful', 0.95), ('cheerful', 0.93), ('content', 0.91)]

Challenges in Word Similarity

1. Polysemy:
o A word may have multiple meanings (e.g., "bank").
o Solution: Use contextual embeddings (e.g., BERT) to represent words dynamically based
on context.
2. Domain Dependence:
o Word similarities may vary across domains (e.g., "cell" in biology vs. technology).
o Solution: Train embeddings on domain-specific corpora.
3. Rare Words:
o Rare or out-of-vocabulary (OOV) words may lack meaningful embeddings.
o Solution: Use models like FastText, which consider subword information.
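As a hedged sketch of the FastText point above: gensim's FastText implementation builds vectors from character n-grams, so it can return a vector even for a word that never appeared in the (toy) training corpus, provided the word shares subwords with seen words.

python
from gensim.models import FastText

# Toy corpus (illustrative)
sentences = [['king', 'kingdom', 'queen', 'royal'],
             ['car', 'cars', 'vehicle', 'transport']]

# Train a small FastText model; subword n-grams are learned automatically
model = FastText(sentences, vector_size=50, window=3, min_count=1)

# 'kings' never appears in the corpus, but a vector can still be composed
# from its character n-grams (e.g. 'kin', 'ing', 'ngs')
print(model.wv['kings'][:5])
print(model.wv.most_similar('kings', topn=2))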
