
Word2Vec

Word2Vec - Introduction
➢ Word2Vec is a technique used to learn vector representations
(embeddings) of words, capturing semantic and syntactic relationships
between them.

➢ The key intuition behind Word2Vec is to represent words in a continuous vector space where words that share similar contexts in the corpus are located close to each other.

➢ Words that appear in similar contexts (surrounded by the same words) tend to have similar meanings. Word2Vec leverages this by learning from large text corpora, optimizing word vectors such that words with similar contexts end up with similar vectors.
Word2Vec - Introduction
➢ Some Examples:
➢ Synonyms and Similar Words
➢ happy and joyful often appear in similar contexts, like "She felt very ___ today."
➢ The vectors for happy and joyful will be close to each other in the embedding
space because they often occur in similar contexts, indicating they are semantically
similar.

➢ Analogies
➢ King is to Man as Queen is to Woman.
➢ The relationship between these words can be captured by vector arithmetic. For
example, if king − man + woman is computed using the corresponding word vectors,
the result is a vector close to queen.
➢ v_king − v_man + v_woman ≈ v_queen
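A minimal sketch of checking this analogy with pretrained vectors, assuming the gensim library and its downloadable "word2vec-google-news-300" model:

```python
# Minimal sketch: king - man + woman ~= queen, assuming gensim and a downloadable pretrained model.
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")  # KeyedVectors with 300-dim embeddings

# most_similar adds the "positive" vectors and subtracts the "negative" ones,
# then returns the nearest words in the embedding space.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```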

Word2Vec - Introduction
➢ Some Examples:
➢ Associations in Context
➢ Paris is associated with France, and Tokyo is associated with Japan.
➢ The vector relationship can be captured similarly to the analogy example:
➢ v_Paris − v_France ≈ v_Tokyo − v_Japan
➢ This shows how countries relate to their capitals in the vector space.

➢ Syntactic Relationships
➢ walking is related to walk as running is related to run.
➢ The model can capture these relationships where the vectors for walking and walk
have a similar directional relationship as running and run.
➢ This indicates that Word2Vec also captures the morphological similarities
between words.

Word2Vec - Introduction
➢ Some Examples:
➢ Categorical Groupings
➢ Apple, Banana, and Orange are all fruits, while Dog, Cat, and Horse are animals.
➢ Words that belong to the same category (like fruits or animals) tend to cluster
together in the embedding space, reflecting their semantic grouping.

➢ Causal or Temporal Relationships


➢ Rain often appears in contexts with umbrella or wet, while fire might appear with
smoke or hot.
➢ The vectors for words like rain and umbrella may be closer together because of
their frequent co-occurrence in causal or temporal contexts.

Word2Vec – Improvements over 1-Hot Encoding
➢ Sparsity vs. Density
➢ One-Hot Encoding: Each word is represented as a very high-dimensional
vector (the size of the vocabulary), with all elements being zero except one.
This representation is sparse and doesn't capture any meaningful relationships
between words.

➢ Word2Vec: Words are represented by dense, low-dimensional vectors (usually 50 to 300 dimensions), where each dimension carries some information about the word. Unlike sparse, high-dimensional one-hot vectors, these dense embeddings make it possible to capture meaningful relationships and more nuanced similarities between words.
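A minimal sketch of the size difference, assuming NumPy; the vocabulary size, embedding size, word index, and the random "embedding matrix" below are all stand-in assumptions for learned weights:

```python
# Minimal sketch contrasting one-hot and dense representations, assuming NumPy.
import numpy as np

vocab_size, embed_dim = 100_000, 300
word_index = 4_217  # hypothetical index of some word in the vocabulary

# One-hot: sparse, |V|-dimensional, all zeros except a single 1.
one_hot = np.zeros(vocab_size)
one_hot[word_index] = 1.0

# Word2Vec-style embedding: one dense row of a |V| x |E| matrix (random here, learned in practice).
embedding_matrix = np.random.randn(vocab_size, embed_dim)
dense_vector = embedding_matrix[word_index]

print(one_hot.shape, dense_vector.shape)  # (100000,) (300,)
```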
Word2Vec – Improvements over 1-Hot Encoding
➢ Lack of Semantic Information
➢ One-Hot Encoding: One-hot vectors are orthogonal to each other, meaning
that no two words are similar based on their representation. This limits the
ability to capture any semantic or syntactic relationships between words.

➢ Word2Vec: Similar words have similar vector representations. For example, "cat" and "dog" might have vectors that are close together, indicating their semantic similarity.
Word2Vec – Improvements over 1-Hot Encoding
➢ Dimensionality
➢ One-Hot Encoding: The dimensionality of a one-hot vector is equal to the size
of the vocabulary, which can be very large (e.g., hundreds of thousands of
dimensions).

➢ Word2Vec: The dimensionality of word embeddings is much lower (e.g., 50-300 dimensions), making them more computationally efficient to use in machine learning models.
Word2Vec – Improvements over 1-Hot Encoding
➢ Flexibility in Downstream Tasks
➢ One-Hot Encoding: Doesn't allow for capturing relationships between words,
making it less effective in tasks like semantic similarity, analogy reasoning, and
sentiment analysis.

➢ Word2Vec: Embeddings can be used in various downstream tasks such as text classification, sentiment analysis, and machine translation, where the relationships between words are important.
Word2Vec – Two Models
➢ Skip-gram: Given a word, predict its surrounding context words.

➢ CBOW (Continuous Bag of Words): Given the surrounding context words, predict the target word.
Word2Vec – Architecture
➢ Both CBOW and Skip-gram are 3-layer neural networks.
➢ Input Layer (Size: |V|)
➢ Hidden Layer (Size: |E|)
➢ Output Layer (Size: |V|)

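A minimal sketch of this shared |V| → |E| → |V| shape, assuming PyTorch; the layer sizes and method names are illustrative assumptions rather than the reference implementation:

```python
# Minimal sketch of the shared |V| -> |E| -> |V| architecture, assuming PyTorch.
import torch
import torch.nn as nn

class Word2VecModel(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 100):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embed_dim)  # input -> hidden (|V| -> |E|)
        self.output = nn.Linear(embed_dim, vocab_size)          # hidden -> output (|E| -> |V|)

    def forward_cbow(self, context_ids: torch.Tensor) -> torch.Tensor:
        # CBOW: average the context-word embeddings, then predict the target word.
        hidden = self.embeddings(context_ids).mean(dim=1)        # (batch, |E|)
        return self.output(hidden)                               # logits over |V|

    def forward_skipgram(self, center_ids: torch.Tensor) -> torch.Tensor:
        # Skip-gram: embed the center word, then predict its context words.
        hidden = self.embeddings(center_ids)                     # (batch, |E|)
        return self.output(hidden)                               # logits over |V|
```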
Word2Vec – CBOW Model in Detail

[Figure: CBOW architecture diagram. Image source: https://github.com/OlgaChernytska/word2vec-pytorch/tree/main]


Word2Vec – CBOW Model in Detail

The output layer applies a softmax over the vocabulary:

P(w_i | context) = exp(z_i) / Σ_j exp(z_j)

where z_i is the logit (the raw score from the neural network) for word i.

[Figure: CBOW forward pass with softmax output. Image source: https://github.com/OlgaChernytska/word2vec-pytorch/tree/main]
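A minimal numerical sketch of that softmax, assuming NumPy; the logits are made-up values for a tiny vocabulary:

```python
# Minimal softmax sketch over a tiny "vocabulary" of 4 words, assuming NumPy.
import numpy as np

logits = np.array([2.0, 0.5, -1.0, 0.1])   # z_i: raw scores from the network (made up)
probs = np.exp(logits - logits.max())       # subtract the max for numerical stability
probs /= probs.sum()

print(probs, probs.sum())                   # probabilities over the vocabulary, summing to 1
```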
Word2Vec – Skip-gram Model in Detail

[Figure: Skip-gram architecture diagram. Image source: https://github.com/OlgaChernytska/word2vec-pytorch/tree/main]


Word2Vec – Inference-time Behavior
➢ Once the Word2Vec (CBOW/Skip-gram) model is trained, the output layer is dropped; the embedding of a word is obtained by projecting that word onto the hidden/embedding layer, i.e., by reading off the corresponding row of the input-to-hidden weight matrix (see the sketch below).

➢ The original Word2Vec was trained on the Google News dataset.

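A minimal sketch of this lookup, assuming PyTorch; the vocabulary size, embedding size, and word index are made up, and the untrained nn.Embedding stands in for the trained input-to-hidden weights:

```python
# Minimal sketch: at inference time only the input->hidden weights are used.
import torch
import torch.nn as nn

vocab_size, embed_dim = 10_000, 100
embeddings = nn.Embedding(vocab_size, embed_dim)   # stands in for the trained |V| x |E| matrix

# The |E| -> |V| output layer is simply not used; a word's embedding is the row
# of the embedding matrix at that word's vocabulary index.
word_id = torch.tensor([42])                        # hypothetical vocabulary index
word_vector = embeddings(word_id).detach()          # shape: (1, 100)
print(word_vector.shape)
```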
Word2Vec – Computational Efficiency
➢ In Word2Vec, predicting the probability distribution over the entire
vocabulary can be computationally expensive, especially when the vocabulary
size 𝑉 is large.

➢ The computational complexity of the softmax function for a single prediction is O(V), because it requires calculating the exponentials for all V words in the vocabulary and then normalizing them. Even during training, as the input changes from one example to the next, the logits (and therefore the predictions) change, so these exponential terms must be recomputed.
Word2Vec – Hierarchical Softmax
➢ Hierarchical Softmax is a technique that approximates the softmax
function by representing the probability distribution over the vocabulary as
a binary tree (often a Huffman tree) instead of a flat structure.

➢ In this tree:


➢ Internal Nodes: Represent binary decisions (left or right) during traversal.
➢ Leaf Nodes: Correspond to words in the vocabulary.

➢ In traditional softmax, the output layer is flat and has |V| neurons with
softmax activation function. Hierarchical Softmax replaces the output layer
with a binary tree with |V|-1 internal nodes (neurons with sigmoid activation
function) and |V| leaf nodes. Each internal node is fully connected to the hidden (embedding) layer, and each leaf node corresponds to a vocabulary word.
Word2Vec – Hierarchical Softmax
➢ The key idea is that instead of directly computing the probability of a
word, the model computes the probability of a specific path from the root
of the tree to the leaf node corresponding to that word. Each step in the
path represents a binary decision.

Word2Vec – Hierarchical Softmax
➢ Example: Let's consider a small vocabulary: {cat, dog, fish, bird}. We want to calculate the probability of the word "fish."

➢ The vocabulary is organized into a binary tree. Here's one possible tree:

[Figure: a binary tree over {cat, dog, fish, bird}; going right at the root and then left at internal node B leads to the leaf "fish".]

➢ The leaf nodes represent the words. Internal nodes represent binary decisions (left or right).
Word2Vec – Hierarchical Softmax
➢ To calculate the probability of "fish," the model predicts the sequence of decisions to reach the leaf node "fish."

➢ The path is right (1) -> left (0). We need to compute P(right at root) and P(left at node B).

➢ P(fish) = P(right at root) × P(left at node B)

➢ Each decision probability is computed as:

P(right at root) = σ(v_hidden ⋅ w_hidden-root + b_root)

(Here "right" is encoded as 1, so its probability is taken directly from the sigmoid output; the convention could equally be reversed.)

P(left at node B) = 1 − σ(v_hidden ⋅ w_hidden-B + b_B)

where σ(x) is the sigmoid function: σ(x) = 1 / (1 + exp(−x))
➢ Then, the same categorical cross-entropy loss is used.
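A minimal numerical sketch of this path probability, assuming NumPy; the hidden vector, node weights, and biases are made-up values:

```python
# Minimal sketch of P(fish) under hierarchical softmax, assuming NumPy.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

v_hidden = np.array([0.2, -0.1, 0.4])                # hidden/embedding activation (made up)
w_root, b_root = np.array([0.5, 0.3, -0.2]), 0.1     # root node parameters (made up)
w_B, b_B = np.array([-0.4, 0.2, 0.6]), -0.3          # internal node B parameters (made up)

p_right_at_root = sigmoid(v_hidden @ w_root + b_root)
p_left_at_B = 1.0 - sigmoid(v_hidden @ w_B + b_B)

p_fish = p_right_at_root * p_left_at_B               # probability of the path right -> left
print(p_fish)
```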
Negative Sampling
➢ Negative sampling is another technique for addressing the computational complexity of Word2Vec.

➢ The architecture remains the same (|V|-|E|-|V|), but the output layer has
sigmoid neurons instead of softmax neurons.

➢ Here, for every positive example, we randomly sample k negative examples from the vocabulary.

➢ k is typically between 5 and 20 for small datasets and between 2 and 5 for large datasets.
Negative Sampling
➢ Input-Groundtruth Preparation
➢ While using negative sampling, each input-groundtruth pair consists of a
positive pair and k negative pairs.
➢ Sentence: “The quick brown fox jumps over the lazy dog.”
➢ Assume context window size = 2

➢ Skip-gram
➢ Input: "fox"
➢ Groundtruth (Positive Context Words): “quick”, “brown”, “jumps”, “over”
➢ Negative Samples for the first example: “dog”, “the” (assuming k=2)
➢ So, the first input-groundtruth pair is (fox-quick, fox-dog, fox-the)
➢ The second input-groundtruth pair is (fox-brown, fox-NegSample1, fox-NegSample2)
etc.

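A minimal sketch of building such skip-gram pairs with negative samples, using only the Python standard library; the whitespace tokenizer and uniform negative sampling are simplifying assumptions:

```python
# Minimal sketch: skip-gram pairs with k negative samples per positive pair.
import random

sentence = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(sentence))
window, k = 2, 2

random.seed(0)
pairs = []
for i, center in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j == i:
            continue
        positive = (center, sentence[j], 1)                      # label 1: real context word
        candidates = [w for w in vocab if w not in (center, sentence[j])]
        negatives = [(center, w, 0) for w in random.sample(candidates, k)]  # label 0: noise words
        pairs.append([positive] + negatives)

print(pairs[0])   # one positive (center, context) pair plus its k negative pairs
```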
Negative Sampling
➢ CBOW
➢ Input (Context Words): “quick”, “brown”, “jumps”, “over”
➢ Groundtruth (Target Word): “fox”
➢ Negative Samples for the first example: “dog”, “the” (assuming k=2)
➢ The first input-groundtruth pair is: (quick, brown, jumps, over -> fox; quick, brown, jumps, over -> dog; quick, brown, jumps, over -> the)
Negative Sampling
➢ The loss for a single input-groundtruth pair is defined as

L = − log σ(v_w ⋅ v_c) − Σ_{i=1..k} log σ(− v_{w_i} ⋅ v_c)

Remember: σ(−x) = 1 − σ(x)
where:
• σ is the sigmoid function.
• v_w is the vector for the correct word; to be precise, the weight vector between the output neuron of the correct/ground-truth word and the hidden (embedding) layer.
• v_c is the context's (input word's) embedding vector.
• v_{w_i} are the weight vectors for the negative samples.
• k is the number of negative samples.
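A minimal numerical sketch of this loss, assuming NumPy; the vectors are made-up values:

```python
# Minimal sketch of the negative-sampling loss for one positive pair and k = 2 negatives, assuming NumPy.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

v_c = np.array([0.1, -0.2, 0.3])                  # input (context) word embedding (made up)
v_w = np.array([0.2, -0.1, 0.4])                  # output vector of the ground-truth word (made up)
v_negs = np.array([[0.5, 0.1, -0.3],              # output vectors of the negative samples (made up)
                   [-0.4, 0.2, 0.1]])

loss = -np.log(sigmoid(v_w @ v_c)) - np.sum(np.log(sigmoid(-(v_negs @ v_c))))
print(loss)
```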
Negative Sampling
➢ In effect, we want the neural network to predict 1 for the positive input-groundtruth pair and 0 for each negative input-groundtruth pair.

➢ During backpropagation:
➢ Output-Hidden: only the weights between the hidden layer and the output neurons involved in the current pairs (the ground-truth word and the k negative samples) are updated.
➢ Hidden-Input: only the weights between the hidden layer and the input word(s) are updated.
Negative Sampling
➢ Sampling Methods:
➢ Uniform Sampling

➢ Frequency-based Sampling

➢ Smoothed Frequency-Based Sampling

➢ Frequency-based sampling in which word frequencies are raised to a smoothing exponent (usually 0.75), reducing the probability of very common words being selected.
Disclaimer
➢ These slides are not original and have been prepared from various
sources for teaching purposes.
