DL Unit-IV
Natural Language Processing (NLP) is a field of artificial intelligence (AI) that enables
machines to understand, interpret, generate, and interact with human language. When combined
with deep learning, NLP has significantly advanced, enabling systems to achieve state-of-the-art
results in tasks like translation, sentiment analysis, question answering, and text generation.
Key Concepts
1. Text Representation:
o Words and sentences need to be represented numerically for models to process.
o Common approaches:
▪ One-Hot Encoding: Binary vector representation.
▪ Word Embeddings: Dense vector representations capturing semantic
similarity (e.g., Word2Vec, GloVe); see the sketch after this list for a
comparison with one-hot vectors.
▪ Contextual Embeddings: Represent words based on their context in a
sentence (e.g., BERT, GPT).
2. Sequence Modeling:
o NLP tasks often involve understanding sequences of words.
o Sequence models process these inputs while preserving order and context.
3. Language Understanding and Generation:
o Tasks range from analyzing existing text to generating coherent and meaningful
new text.
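As a quick illustration of point 1 above, the following sketch contrasts a one-hot vector with a dense embedding for a toy four-word vocabulary (the vocabulary, dimensions, and random values are assumptions for illustration only):
import numpy as np

vocab = ['king', 'queen', 'man', 'woman']            # toy vocabulary
word_to_idx = {w: i for i, w in enumerate(vocab)}

# One-hot encoding: a sparse, |V|-dimensional binary vector per word
one_hot_king = np.zeros(len(vocab))
one_hot_king[word_to_idx['king']] = 1.0              # [1, 0, 0, 0]

# Dense embedding: a small (here random) matrix maps each word to a low-dimensional vector
embedding_dim = 3
embedding_matrix = np.random.default_rng(0).normal(size=(len(vocab), embedding_dim))
dense_king = embedding_matrix[word_to_idx['king']]   # a 3-dimensional dense vector
print(one_hot_king, dense_king)
In practice the embedding matrix is learned during training rather than sampled at random.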
Common NLP Tasks
1. Text Classification:
o Categorize text into predefined labels.
o Examples: Sentiment analysis, spam detection.
2. Sequence Labeling:
o Assign a label to each token in a sequence.
o Examples: Part-of-speech tagging, named entity recognition (NER).
3. Machine Translation:
o Translate text from one language to another.
o Example: English to French translation.
4. Question Answering:
o Answer questions based on input text or a knowledge base.
5. Text Generation:
o Generate coherent and contextually relevant text.
o Examples: Chatbots, story generation.
6. Summarization:
o Generate concise summaries from longer texts.
o Types: Extractive and abstractive summarization.
Components of a Deep Learning NLP Pipeline
1. Preprocessing:
o Tokenization: Splitting text into words or subwords.
o Lowercasing, removing stopwords, and stemming/lemmatization (traditional
methods).
o SentencePiece or Byte Pair Encoding (BPE) for subword tokenization in deep
learning.
2. Embedding Layers:
o Map words/tokens to vectors in a continuous space.
o Examples: Word2Vec, FastText, pre-trained embeddings (e.g., GloVe).
3. Model Architecture:
o Sequence models (RNNs, LSTMs) or attention-based models (Transformers).
o Pre-trained models like BERT, RoBERTa, GPT-3 fine-tuned for specific tasks.
4. Loss Functions:
o Cross-entropy loss for classification and language modeling.
o Perplexity (the exponential of the cross-entropy) is the standard evaluation metric for language models.
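The pieces above can be wired together in a few lines. The sketch below is a minimal, illustrative PyTorch text-classification pipeline, assuming a toy vocabulary, a small LSTM encoder, and a made-up sentiment label:
import torch
import torch.nn as nn
import torch.nn.functional as F

# 1. Preprocessing: whitespace tokenization over a toy vocabulary
vocab = {'<pad>': 0, 'the': 1, 'movie': 2, 'was': 3, 'great': 4, 'boring': 5}
ids = torch.tensor([[vocab[t] for t in 'the movie was great'.split()]])  # shape (1, 4)

# 2. Embedding layer: map token ids to dense vectors
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=16)

# 3. Model architecture: a small LSTM encoder plus a linear classifier
lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
classifier = nn.Linear(32, 2)                      # two classes: negative / positive

embedded = embedding(ids)                          # (1, 4, 16)
_, (h_n, _) = lstm(embedded)                       # h_n: (1, 1, 32)
logits = classifier(h_n[-1])                       # (1, 2)

# 4. Loss function: cross-entropy against a made-up label (1 = positive)
loss = F.cross_entropy(logits, torch.tensor([1]))
print(loss.item())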
Popular Tools and Libraries
1. TensorFlow and PyTorch: Popular frameworks for building deep learning models.
2. Hugging Face Transformers: Library for pre-trained models and NLP pipelines.
3. spaCy: Industrial-strength NLP library.
4. NLTK: A classical NLP library for preprocessing and analysis.
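As a sketch of how little code these libraries require, the Hugging Face pipeline below loads a default pre-trained sentiment model (downloaded on first use; the exact model chosen is up to the library):
from transformers import pipeline

# Hugging Face pipeline: downloads a default pre-trained sentiment model on first use
classifier = pipeline('sentiment-analysis')
print(classifier('Deep learning has transformed NLP.'))
# e.g. [{'label': 'POSITIVE', 'score': 0.99}]  (exact score varies)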
Challenges in NLP
1. Ambiguity:
o Words can have multiple meanings depending on context.
o Example: "bank" (financial institution vs riverbank).
2. Data Scarcity:
o Annotated datasets may not be available for all languages or domains.
3. Bias in Data:
o Pre-trained models can reflect societal biases present in training data.
4. Multilingual Understanding:
o Handling multiple languages and code-switching (mixing languages in a
sentence).
5. Efficiency:
o Training and inference for large models can be resource-intensive.
The Vector Space Model (VSM) is a mathematical framework for representing and analyzing
the semantics of words, phrases, sentences, or entire documents. In this model, linguistic units
are represented as vectors in a high-dimensional space, where geometric relationships (e.g.,
angles or distances) capture semantic relationships.
1. Term-Document Matrix:
• A sparse matrix where rows represent terms, and columns represent documents.
• Entries are term frequencies, term frequencies weighted by inverse document frequency (TF-
IDF), or similar metrics.
• Example: Representing a document by its word occurrence frequencies.
2. Latent Semantic Analysis (LSA):
• Reduces the dimensionality of the term-document matrix using Singular Value Decomposition (SVD); see the sketch after this list.
• Projects terms and documents into a lower-dimensional "latent" space where semantic
relationships are preserved.
• Addresses synonymy and polysemy (different words with similar meanings or the same word
with multiple meanings).
3. Word Embeddings:
• Dense, low-dimensional vectors learned from data (e.g., Word2Vec, GloVe), discussed in detail later in this unit.
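The first two components above (a TF-IDF term-document matrix and LSA) can be sketched with scikit-learn, assuming a toy three-document corpus (note that scikit-learn places documents in rows rather than columns):
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    'the bank approved the loan',
    'the river bank was muddy',
    'the loan interest rate increased',
]

# TF-IDF weighted term-document representation (documents in rows here)
X = TfidfVectorizer().fit_transform(docs)        # sparse matrix, shape (3, |vocab|)

# LSA: project into a 2-dimensional latent space via truncated SVD
X_latent = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

# Geometric (cosine) similarity between documents in the latent space
print(cosine_similarity(X_latent))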
Applications of VSM
1. Document Retrieval:
o Query-document matching using cosine similarity.
o Example: Search engines rank documents based on their vector similarity to a query.
2. Text Classification:
o Documents represented as vectors are used as inputs for classifiers.
o Example: Spam detection, topic classification.
3. Semantic Similarity:
o Computing similarity between words, sentences, or documents.
o Example: Plagiarism detection, sentence paraphrase identification.
4. Word Analogy Tasks:
o Solving analogies like "king - man + woman = queen" using vector arithmetic (see the sketch after this list).
5. Clustering and Topic Modeling:
o Grouping semantically similar items together.
o Example: News categorization.
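The analogy in item 4 can be reproduced with pre-trained GloVe vectors through gensim's downloader; this is only a sketch, and the first call downloads the 'glove-wiki-gigaword-50' vectors:
import gensim.downloader as api

# Pre-trained 50-dimensional GloVe vectors (downloaded on first use)
wv = api.load('glove-wiki-gigaword-50')

# king - man + woman ≈ queen, via vector arithmetic on the embeddings
print(wv.most_similar(positive=['king', 'woman'], negative=['man'], topn=1))
# typically [('queen', ...)]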
Strengths of VSM
1. Simplicity:
o Easy to compute and interpret basic term-document matrices.
2. Semantic Generalization:
o Techniques like LSA or embeddings capture latent relationships beyond word surface
forms.
3. Efficiency:
o Pre-trained word embeddings reduce the need for building models from scratch.
Limitations of VSM
1. Sparsity:
o Classical term-document matrices are sparse, requiring dimensionality reduction for
effective use.
2. Lack of Context:
o Words have fixed meanings, ignoring context.
o Example: "bank" (financial institution vs riverbank).
3. Synonymy and Polysemy:
o Hard to resolve without advanced techniques.
4. Scalability:
o Computationally expensive for large datasets without dimensionality reduction.
Modern Advances
1. Dynamic Embeddings:
o Contextual embeddings (BERT, GPT) address fixed-meaning limitations by considering sentence context (see the sketch after this list).
o Examples:
▪ In "He went to the bank to withdraw money," "bank" is associated with financial
institutions.
▪ In "The boat docked near the bank of the river," "bank" is associated with a
riverbank.
2. Sentence and Document Embeddings:
o Representing entire sentences or documents as single vectors.
o Example: Sentence-BERT, Universal Sentence Encoder.
3. Graph-Based VSMs:
o Represent relationships as graphs (e.g., word graphs, knowledge graphs).
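A sketch of the "bank" example using Hugging Face Transformers; bert-base-uncased is an assumed (but common) model choice, and the printed similarity value will vary:
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')

def bank_vector(sentence):
    # Return the contextual embedding of the token 'bank' in this sentence
    inputs = tokenizer(sentence, return_tensors='pt')
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]          # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0].tolist())
    return hidden[tokens.index('bank')]

v1 = bank_vector('He went to the bank to withdraw money.')
v2 = bank_vector('The boat docked near the bank of the river.')
print(torch.cosine_similarity(v1, v2, dim=0))  # noticeably below 1.0: the two senses differ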
Example: Training Word2Vec Embeddings
• Given a sentence, the skip-gram model predicts the surrounding words (context) for a given target word.
• Result: embeddings in which similar words are geometrically close.
from gensim.models import Word2Vec
# Sample sentences
sentences = [
['king', 'queen', 'man', 'woman'],
['cat', 'dog', 'animal', 'pet'],
['car', 'bus', 'vehicle', 'transport']
]
# Train skip-gram Word2Vec (sg=1; the gensim default sg=0 trains CBOW instead)
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1, workers=4)
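Once trained, the embeddings can be queried as shown below; on such a tiny toy corpus the scores are not meaningful, but the calls illustrate the usual gensim usage:
# Nearest neighbours and pairwise similarity from the trained embeddings
print(model.wv.most_similar('king', topn=2))
print(model.wv.similarity('cat', 'dog'))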
Word vector representations, also known as word embeddings, are dense, continuous, and low-
dimensional vector representations of words. They are foundational to many Natural Language
Processing (NLP) tasks, enabling models to capture semantic and syntactic relationships between
words.
Why Word Embeddings?
1. Semantic Similarity:
o Words with similar meanings have similar vector representations.
o Example: "king" and "queen" are closer in vector space than "king" and "car."
2. Overcome Sparsity:
o Traditional methods like one-hot encoding create high-dimensional, sparse vectors
where most entries are zero.
o Word embeddings reduce dimensionality and improve efficiency.
3. Generalization:
o Embeddings capture word relationships beyond exact matches, enabling models to
generalize better across tasks.
Approaches to Learning Word Vectors
1. Count-Based Approaches: build vectors from global co-occurrence statistics (e.g., term-document matrices, LSA).
2. Prediction-Based Approaches: learn vectors by predicting words from their contexts (e.g., Word2Vec's skip-gram and CBOW).
Popular Word Embedding Models
1. Word2Vec
• Developed by Google.
• Two architectures:
o Skip-Gram: Predicts surrounding words given a target word.
o CBOW (Continuous Bag of Words): Predicts a target word given its context.
• Captures relationships like:
o "king - man + woman ≈ queen"
2. GloVe
• Developed by Stanford.
• Uses word co-occurrence matrices to capture global statistics.
• Embeddings are optimized so that the dot product of two word vectors approximates the logarithm of their co-occurrence probability: $\text{word}_i \cdot \text{word}_j \approx \log P(\text{co-occurrence of } i, j)$.
3. FastText
• Developed by Facebook.
• Represents words as a sum of subword (character n-gram) embeddings.
• Effective for rare words and morphologically rich languages (see the sketch after the sample corpus below).
from gensim.models import Word2Vec

# Sample corpus
sentences = [
    ['king', 'queen', 'man', 'woman'],
    ['dog', 'cat', 'animal', 'pet'],
    ['car', 'bus', 'vehicle', 'transport']
]
# Train Word2Vec embeddings on the toy corpus (sg=1 selects skip-gram; sg=0 would select CBOW)
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1, workers=4)
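FastText's subword behaviour can be sketched on the same toy corpus; this is an assumed illustration, not part of the original example, and the out-of-vocabulary query only works because FastText composes vectors from character n-grams:
from gensim.models import FastText

# FastText builds word vectors from character n-grams of each word
ft_model = FastText(sentences, vector_size=100, window=5, min_count=1, workers=4)

# Even a word never seen during training gets a vector from its subwords
print(ft_model.wv['catdog'][:5])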
Evaluating Word Embeddings
1. Intrinsic Evaluation:
o Measures the quality of embeddings on linguistic tasks like:
▪ Word similarity (e.g., cosine similarity between "car" and "bus").
▪ Word analogy tasks (e.g., "man:king :: woman:queen").
2. Extrinsic Evaluation:
o Assesses the embeddings' impact on downstream tasks such as sentiment analysis,
machine translation, or text classification.
Contextual Embeddings
Unlike static embeddings (e.g., Word2Vec, GloVe), contextual embeddings represent words
differently depending on their usage in a sentence. For example:
• In "He went to the bank to withdraw money," "bank" refers to a financial institution.
• In "The boat docked near the bank of the river," "bank" refers to a riverbank.
Applications of Word Embeddings
1. Text Classification:
o Input embeddings are fed into classifiers for tasks like spam detection or sentiment
analysis.
2. Machine Translation:
o Models like Seq2Seq with attention mechanisms use embeddings for translating text.
3. Named Entity Recognition (NER):
o Identify and classify entities (e.g., names, dates, locations).
4. Question Answering:
o Power models like BERT for answering questions based on context.
5. Search and Information Retrieval:
o Rank documents based on semantic similarity with queries.
The Skip-Gram Model
The skip-gram model is one of the two Word2Vec architectures (Mikolov et al., 2013); it learns word embeddings by predicting context words from a target word.
Key Idea
The primary objective of the skip-gram model is to maximize the likelihood of predicting the
context (neighboring words) for a given target word.
For a given target word $w_t$, the skip-gram model maximizes the probability of the surrounding words within a context window of size $C$:
$$\frac{1}{T} \sum_{t=1}^{T} \sum_{-C \le j \le C,\; j \ne 0} \log P(w_{t+j} \mid w_t)$$
Here, $T$ is the number of words in the training corpus, $C$ is the context window size, and $P(w_{t+j} \mid w_t)$ is the probability of observing context word $w_{t+j}$ given target word $w_t$.
Architecture
1. Input Layer:
o One-hot vector representation of the target word (size = vocabulary size $V$).
2. Embedding Layer:
o Transforms the one-hot vector into a dense vector of size $d$ (the embedding size).
o This layer is a matrix $W$ of size $V \times d$, where each row is the embedding of one word.
3. Output Layer:
o Produces probabilities for all words in the vocabulary using a softmax function:
$$P(w_{t+j} \mid w_t) = \frac{\exp(v_{w_{t+j}} \cdot v_{w_t})}{\sum_{w' \in V} \exp(v_{w'} \cdot v_{w_t})}$$
o Here, $v_{w_t}$ and $v_{w_{t+j}}$ are the embeddings of the target word and the context word, respectively.
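A toy numerical sketch of this softmax with a 4-word vocabulary and 3-dimensional embeddings (all values are random and purely illustrative):
import numpy as np

V, d = 4, 3                                    # vocabulary size and embedding size
rng = np.random.default_rng(0)
W_in = rng.normal(size=(V, d))                 # input (target-word) embeddings
W_out = rng.normal(size=(V, d))                # output (context-word) embeddings

target = 2                                     # index of the target word w_t
scores = W_out @ W_in[target]                  # v_{w'} . v_{w_t} for every w' in V
probs = np.exp(scores) / np.exp(scores).sum()  # softmax over the vocabulary
print(probs, probs.sum())                      # P(w_{t+j} | w_t); probabilities sum to 1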
Loss Function
The training objective is the negative log-likelihood of the observed context words (softmax cross-entropy). Because computing the full softmax over a large vocabulary is expensive, it is typically approximated with negative sampling or hierarchical softmax.
Training
from gensim.models import Word2Vec
# Sample corpus
sentences = [
['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog'],
['king', 'queen', 'man', 'woman'],
['car', 'bus', 'vehicle', 'transport']
]
# Train the skip-gram model (sg=1 selects skip-gram)
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1, workers=4)
Applications
1. Semantic Similarity:
o Measure how similar two words are based on their vector representations.
2. Word Analogy Tasks:
o Solve analogies using vector arithmetic (e.g., "man:king :: woman:queen").
3. NLP Tasks:
o Input for downstream tasks like text classification, machine translation, and question answering.
4. Pre-training:
o Skip-gram embeddings can initialize models for other NLP tasks to improve
performance.
The Continuous Bag of Words (CBOW) model is one of the architectures introduced with
Word2Vec (Mikolov et al., 2013) for learning word embeddings. CBOW predicts a target word
based on its surrounding context, which is the opposite of the Skip-Gram model.
Key Idea
The CBOW model predicts the target word $w_t$ from the words in its context window, by maximizing the probability of the target word given its surrounding context words:
$$\sum_{t=1}^{T} \log P(w_t \mid w_{t-C}, \ldots, w_{t-1}, w_{t+1}, \ldots, w_{t+C})$$
Where $w_t$ is the target word, the $w_{t \pm j}$ are its context words, and $C$ is the context window size.
Architecture
1. Input Layer:
o Inputs are one-hot encoded representations of the context words.
2. Embedding Layer:
o Transforms the one-hot vectors into dense vector representations (word
embeddings) using a shared embedding matrix.
3. Averaging Layer:
o Takes the average of the embeddings of the context words to represent the entire
context.
4. Output Layer:
o Uses a softmax function to predict the probability distribution over the vocabulary for the target word:
$$P(w_t \mid \text{context}) = \frac{\exp(v_{w_t} \cdot h)}{\sum_{w' \in V} \exp(v_{w'} \cdot h)}$$
▪ $h$ is the averaged context vector.
▪ $v_{w_t}$ is the embedding of the target word.
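The CBOW forward pass can be sketched in the same toy setting: average the context embeddings to get h, then apply the softmax over the vocabulary (values are random and illustrative):
import numpy as np

V, d = 4, 3
rng = np.random.default_rng(1)
W_in = rng.normal(size=(V, d))                 # embeddings of the context words
W_out = rng.normal(size=(V, d))                # output embeddings of candidate targets

context = [0, 1, 3]                            # indices of the context words
h = W_in[context].mean(axis=0)                 # averaged context vector h

scores = W_out @ h                             # v_{w_t} . h for every candidate target
probs = np.exp(scores) / np.exp(scores).sum()  # P(w_t | context) over the vocabulary
print(probs.argmax(), probs)                   # most probable target and the full distribution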
Loss Function
As with skip-gram, training minimizes the negative log-likelihood of the target word (softmax cross-entropy), usually approximated with negative sampling or hierarchical softmax for large vocabularies.
Training Process
from gensim.models import Word2Vec
# Sample corpus
sentences = [
['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog'],
['king', 'queen', 'man', 'woman'],
['car', 'bus', 'vehicle', 'transport']
]
# Train CBOW model (sg=0 selects CBOW, the gensim default)
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=0, workers=4)
Here, sg=0 selects the CBOW architecture (sg=1 would train skip-gram instead).
Advantages of CBOW
• Faster to train than skip-gram, since the context words are averaged and only one target is predicted per window.
• Works well for frequent words.
Disadvantages of CBOW
• Averaging the context embeddings discards word order within the window.
• Represents rare words less well than skip-gram.
Applications of CBOW
• Learning general-purpose word embeddings used as input for downstream tasks such as text classification, tagging, and retrieval, just as with skip-gram.
Word similarity is a key concept in Natural Language Processing (NLP) that measures how
closely related two words are, either semantically or syntactically. Word embeddings, such as
those learned by models like Word2Vec, GloVe, or BERT, are commonly used to evaluate and
apply word similarity.
Evaluating and Applying Word Similarity
1. Intrinsic Evaluation
Evaluates word embeddings by comparing their performance on specific linguistic tasks, such as
word similarity or word analogy, without involving downstream applications.
Word Similarity Datasets
• Purpose: these datasets consist of word pairs with human-assigned similarity scores. The task is to compute the similarity of the corresponding word embeddings and compare it to the human judgments.
• Popular Datasets:
o WordSim-353: 353 word pairs with similarity scores.
o SimLex-999: Focuses on distinguishing between similarity and association.
o MEN: Evaluates general word similarity and relatedness.
o RG-65: 65 word pairs for evaluating synonymy.
o Rare Words (RW): Measures performance on infrequent or domain-specific words.
Metrics
• Cosine Similarity: measures the cosine of the angle between two word vectors:
$$\text{cosine similarity} = \frac{v_1 \cdot v_2}{\|v_1\|\,\|v_2\|}$$
where $v_1$ and $v_2$ are the word vectors.
• Spearman’s Rank Correlation: Compares the ranked similarity scores predicted by embeddings
with human-annotated scores.
Example in Python
from scipy.spatial.distance import cosine
from scipy.stats import spearmanr
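A sketch completing the example: compare embedding-based cosine similarities with human similarity scores using Spearman's rank correlation. The word pairs, random vectors, and human scores below are assumptions for illustration only:
import numpy as np
from scipy.spatial.distance import cosine
from scipy.stats import spearmanr

# Toy embeddings (in practice these would come from a trained model such as Word2Vec)
rng = np.random.default_rng(0)
vectors = {w: rng.normal(size=50) for w in ['car', 'bus', 'king', 'banana']}

# Hypothetical human similarity judgments for three word pairs
pairs = [('car', 'bus'), ('king', 'car'), ('king', 'banana')]
human_scores = [0.9, 0.2, 0.1]

# Cosine similarity = 1 - cosine distance
model_scores = [1 - cosine(vectors[a], vectors[b]) for a, b in pairs]

# Spearman's rank correlation between model rankings and human rankings
corr, _ = spearmanr(model_scores, human_scores)
print(model_scores, corr)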
2. Extrinsic Evaluation
• Examples:
o Text classification (e.g., sentiment analysis).
o Machine translation.
o Named entity recognition (NER).
The quality of word embeddings is measured by their contribution to the performance of these
tasks.
3. Synonym Detection
• Find the words whose vectors are most similar to a given word (see the worked example at the end of this list).
4. Text Similarity
• Compare larger text units (e.g., phrases, sentences, or documents) by averaging or combining word vectors (a short averaging sketch follows the synonym example below).
o Example: measure the similarity between "I love programming" and "Coding is my passion."
5. Machine Translation
• Align semantically similar words across languages using bilingual or multilingual embeddings.
o Example: Map the French word "roi" (king) to its English counterpart "king."
6. Sentiment Analysis
7. Chatbots and Intent Matching
• Identify user intents and match them to predefined responses by measuring the similarity between input and response vectors.
8. Recommender Systems
• Suggest items based on semantic similarity between user preferences and available options.
o Example: Recommend books similar to "The Lord of the Rings" by comparing
descriptions.
9. Knowledge Graph Construction
Example: Synonym Detection with Word2Vec
from gensim.models import Word2Vec

# Sample corpus
sentences = [
    ['king', 'queen', 'man', 'woman'],
    ['car', 'vehicle', 'automobile', 'transport'],
    ['happy', 'joyful', 'cheerful', 'content']
]
# Train embeddings and query the nearest neighbours of 'happy'
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)
print("Synonyms for 'happy':", model.wv.most_similar('happy', topn=3))
Output (scores are illustrative; actual values will vary with training):
Synonyms for 'happy': [('joyful', 0.95), ('cheerful', 0.93), ('content', 0.91)]
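Item 4 above (text similarity) is often approximated by averaging word vectors; the sketch below reuses the model trained in the synonym example (averaging ignores word order, so this is only a rough measure):
import numpy as np

def sentence_vector(words, w2v_model):
    # Average the vectors of the words that are in the model's vocabulary
    vecs = [w2v_model.wv[w] for w in words if w in w2v_model.wv]
    return np.mean(vecs, axis=0)

v1 = sentence_vector(['happy', 'king'], model)
v2 = sentence_vector(['joyful', 'queen'], model)
print(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))  # cosine similarity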
Challenges in Word Similarity
1. Polysemy:
o A word may have multiple meanings (e.g., "bank").
o Solution: Use contextual embeddings (e.g., BERT) to represent words dynamically based
on context.
2. Domain Dependence:
o Word similarities may vary across domains (e.g., "cell" in biology vs. technology).
o Solution: Train embeddings on domain-specific corpora.
3. Rare Words:
o Rare or out-of-vocabulary (OOV) words may lack meaningful embeddings.
o Solution: Use models like FastText, which consider subword information.