Word Embeddings Notes
The resulting word embeddings have many useful properties. One is that words
with similar meanings or connotations are often close to each other in the
vector space. This means that words like "happy" and "joyful" might be very
close in the vector space, while words like "happy" and "sad" would be far apart.
This can be useful for a wide range of NLP tasks, such as sentiment analysis,
language translation, and information retrieval.
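As a quick illustration of what "close in the vector space" means, the sketch below computes cosine similarity between hand-picked, purely hypothetical 3-dimensional vectors (they do not come from any trained model):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: close to 1 = very similar direction.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical embeddings, purely for illustration.
happy  = np.array([0.9, 0.7, 0.1])
joyful = np.array([0.8, 0.8, 0.2])
sad    = np.array([-0.7, 0.6, 0.1])

print(cosine_similarity(happy, joyful))  # high, close to 1
print(cosine_similarity(happy, sad))     # much lower (here even slightly negative)
```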
Consider the following similar sentences: "Have a good day" and "Have a great
day". They hardly differ in meaning. If we construct an exhaustive
vocabulary (let's call it V), it would be V = {Have, a, good, great, day}.
Now, let us create a one-hot encoded vector for each of these words in V. The length
of each one-hot encoded vector would be equal to the size of V (= 5). Each vector
would be all zeros except for the element at the index representing the
corresponding word in the vocabulary; that particular element would be one.
The encodings below illustrate this.
Have = [1, 0, 0, 0, 0]ᵀ; a = [0, 1, 0, 0, 0]ᵀ; good = [0, 0, 1, 0, 0]ᵀ; great = [0, 0, 0, 1, 0]ᵀ;
day = [0, 0, 0, 0, 1]ᵀ (ᵀ denotes transpose)
If we try to visualize these encodings, we can think of a 5-dimensional space,
where each word occupies one of the dimensions and has nothing to do with
the rest (no projection along the other dimensions). This means ‘good’ and
‘great’ are as different as ‘day’ and ‘Have’, which is not true.
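A minimal NumPy sketch of these one-hot encodings; the dot products at the end confirm that every pair of one-hot vectors is orthogonal, so ‘good’ and ‘great’ look exactly as unrelated as ‘day’ and ‘Have’:

```python
import numpy as np

vocab = ["Have", "a", "good", "great", "day"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    # Vector of zeros with a single 1 at the word's index in the vocabulary.
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec

print(one_hot("good"))                     # [0. 0. 1. 0. 0.]
print(one_hot("good") @ one_hot("great"))  # 0.0 -> orthogonal
print(one_hot("day")  @ one_hot("Have"))   # 0.0 -> equally "different"
```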
Word2vec
Word2vec is a specific algorithm for generating word embeddings, developed
by Tomas Mikolov and his team at Google. Word2vec is a type of neural network
that takes in a large corpus of text data and generates word embeddings by
predicting the probability of words appearing together in a given context.
Two different learning models were introduced that can be used as part
of the word2vec approach to learn the word embeddings; they are the Continuous
Bag-of-Words (CBOW) model and the Skip-gram model.
Continuous Bag-of-Words (CBOW):
CBOW predicts a target word from its surrounding context words. The walkthrough
is broken down into the following steps:
1. Data Preparation: Defining corpus by tokenizing text.
2. Generate Training Data: Build vocabulary of words, one-hot encoding for
words, word index.
3. Train Model: Pass one-hot encoded words through the forward pass, calculate
the error rate by computing the loss, and adjust the weights using backpropagation.
Output: Using the trained model, calculate word vectors and find similar words.
I will explain the CBOW steps without code, but if you want full working code
for CBOW with NumPy from scratch, I have a separate post for that; you can
always jump into it.
1. Data Preparation:
For example, for the context word “i” the target word will be “like”. For our
example text, the full training data will look like:
2. Generate Training Data:
One-hot encoding: We need to convert the text to one-hot encodings, since the
algorithm can only work with numeric values.
For example, the encoded value of the word “i”, which appears first in the
vocabulary, will be the vector [1, 0, 0, 0, 0]. The word “like”, which appears
second in the vocabulary, will be encoded as the vector [0, 1, 0, 0, 0].
As you can see, the table above is our final training data, where the encoded
target word is the Y variable for our model and the encoded context word is
the X variable.
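The sketch below shows one way this training data could be built with NumPy. The example corpus is an assumption on my part (it is not stated above); "i like natural language processing" is simply consistent with a five-word vocabulary in which “i” comes first and “like” comes second.

```python
import numpy as np

# Assumed example corpus, used purely for illustration.
corpus = "i like natural language processing"
tokens = corpus.split()

vocab = sorted(set(tokens), key=tokens.index)     # keep first-occurrence order
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec

# Single-word CBOW with window size 1: each neighbouring word is a context (X)
# and the word itself is the target (Y).
X, Y = [], []
for pos, target in enumerate(tokens):
    for offset in (-1, 1):
        ctx = pos + offset
        if 0 <= ctx < len(tokens):
            X.append(one_hot(tokens[ctx]))   # encoded context word, e.g. "i"
            Y.append(one_hot(target))        # encoded target word, e.g. "like"

X, Y = np.array(X), np.array(Y)
print(X.shape, Y.shape)   # (8, 5) (8, 5) for this assumed corpus
```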
3. Training Model:
So far, so good, right? Now we need to pass this data into a basic neural
network with one hidden layer and train it. The only thing to note is that the
desired vector dimension of each word will be the number of hidden nodes.
For this tutorial and demo purposes my desired vector dimension is 3. For
example:
“i” => [0.001, 0.896, 0.763], so the number of hidden-layer nodes will be 3.
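A minimal from-scratch training sketch, assuming the X, Y arrays and word_to_index mapping from the data-preparation sketch above; the hidden layer has 3 nodes, so each learned word vector is 3-dimensional. This is only an illustrative loop, not the full code from the separate post.

```python
import numpy as np

np.random.seed(0)
vocab_size, embedding_dim = 5, 3   # 3 hidden nodes -> 3-dimensional word vectors

W1 = np.random.randn(vocab_size, embedding_dim) * 0.1   # input -> hidden weights
W2 = np.random.randn(embedding_dim, vocab_size) * 0.1   # hidden -> output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

learning_rate = 0.05
for epoch in range(1000):
    loss = 0.0
    for x, y in zip(X, Y):            # X, Y from the data-preparation sketch above
        h = W1.T @ x                  # hidden layer: the context word's vector
        u = W2.T @ h                  # output scores, one per vocabulary word
        y_pred = softmax(u)
        loss += -np.sum(y * np.log(y_pred + 1e-9))   # cross-entropy loss

        # Backpropagation
        e = y_pred - y
        dW2 = np.outer(h, e)
        dW1 = np.outer(x, W2 @ e)
        W1 -= learning_rate * dW1
        W2 -= learning_rate * dW2
    if epoch % 200 == 0:
        print(f"epoch {epoch}, loss {loss:.4f}")

# Each row of W1 is now a learned 3-dimensional word vector.
print(W1[word_to_index["i"]])   # actual values will differ from the illustrative [0.001, 0.896, 0.763]
```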
Applications of CBOW:
Language modeling: CBOW is used for predicting the next word in a sentence
or text. This is useful in speech recognition, machine translation, and other
language-related tasks.
Information retrieval: CBOW is used to generate word vectors that are used
in information retrieval systems to match queries with documents.
Text classification: CBOW is used to generate word vectors that are used in
text classification tasks, such as spam detection, topic classification, and
sentiment analysis.
Skip-gram (SG):
Skip-gram guesses the context words from a target word; this is the opposite
task to CBOW: you have to guess which words appear near a given word within a
fixed window size. In the example below, the skip-gram model predicts the words
surrounding the given word “jump” with a window size of 4.
The Skip-gram model is basically the inverse of the CBOW model. The input is
a centre word and the model predicts the context words.
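A small sketch of how (centre, context) pairs could be generated with a window size of 4; the sentence here is purely hypothetical, since the original example sentence around “jump” is not shown in the text.

```python
# Hypothetical sentence, used only for illustration.
sentence = "cats like to jump over the small fence".split()

def skipgram_pairs(tokens, window=4):
    # For each centre word, yield (centre, context) for every word within the window.
    pairs = []
    for pos, center in enumerate(tokens):
        for ctx in range(max(0, pos - window), min(len(tokens), pos + window + 1)):
            if ctx != pos:
                pairs.append((center, tokens[ctx]))
    return pairs

# All context words the model should predict for the centre word "jump".
print([ctx for center, ctx in skipgram_pairs(sentence) if center == "jump"])
```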
In this section we will be implementing the multi-word Skip-gram
architecture of Word2Vec. As with single-word CBOW and multi-word CBOW, the
content is broken down into the following steps:
1. Data Preparation: Defining corpus by tokenizing text.
2. Generate Training Data: Build vocabulary of words, one-hot encoding for
words, word index.
3. Train Model: Pass one-hot encoded words through the forward pass, calculate
the error rate by computing the loss, and adjust the weights using backpropagation.
Output: Using the trained model, calculate word vectors and find similar words.
I will explain the Skip-gram steps without code, but if you want full working code
for Skip-gram with NumPy from scratch, I have a separate post for that; you can
always jump into it.
For example, for the context words “i” and “natural” the target word will be “like”.
For our example text, the full training data will look like:
As you can see, the table above is our final training data, where the encoded context
word is the Y variable for our model and the encoded target word is the X variable,
since Skip-gram predicts the surrounding words of a given word.
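A sketch of how this Skip-gram training data could be built, reusing the same assumed corpus as in the CBOW sketch; X holds the encoded centre (target) word and Y holds its two surrounding words (t-1 and t+1).

```python
import numpy as np

# Same assumed corpus and helpers as in the CBOW data-preparation sketch above.
tokens = "i like natural language processing".split()
vocab = sorted(set(tokens), key=tokens.index)
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec

# Skip-gram: X is the encoded centre word, Y holds the encoded surrounding
# words, here the previous (t-1) and next (t+1) word.
X, Y = [], []
for pos in range(1, len(tokens) - 1):   # skip positions without both neighbours
    X.append(one_hot(tokens[pos]))
    Y.append([one_hot(tokens[pos - 1]), one_hot(tokens[pos + 1])])

X, Y = np.array(X), np.array(Y)
print(X.shape, Y.shape)   # (3, 5) (3, 2, 5), e.g. centre "like" with contexts "i" and "natural"
```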
Now we will move on to train our model as we are done with our final training
data.
Here we are trying to predict two surrounding words, which is why there are two
output layers. There can be any number of output layers, depending on how many
surrounding words you are trying to predict. In the picture above, the first output
layer represents the previous word (t-1) and the second output layer represents the
following word (t+1) of the given input word.
The number of nodes in each output layer (u11 to u15 and u21 to u25) is the same as
the number of nodes in the input layer (the count of unique vocabulary words, 5 in
our case). This is because each output layer produces a score for every word in the
vocabulary for its context position.
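A minimal NumPy training sketch for this two-output architecture, continuing from the data sketch above (it assumes X, Y, and word_to_index from that sketch). One simplifying assumption: the hidden-to-output weight matrix is shared across the two context positions, as in the standard skip-gram formulation, so both output layers reuse the same scores before the softmax, and their errors are summed during backpropagation.

```python
import numpy as np

np.random.seed(0)
vocab_size, embedding_dim = 5, 3

W1 = np.random.randn(vocab_size, embedding_dim) * 0.1   # input -> hidden
W2 = np.random.randn(embedding_dim, vocab_size) * 0.1   # hidden -> output (shared by both output layers)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

learning_rate = 0.05
for epoch in range(1000):
    for x, contexts in zip(X, Y):     # X, Y from the Skip-gram data sketch above
        h = W1.T @ x                  # hidden layer: the centre word's vector
        u = W2.T @ h                  # scores, reused for both output layers
        y_pred = softmax(u)

        # Sum the prediction errors over the two context positions (t-1 and t+1).
        e = np.sum([y_pred - y for y in contexts], axis=0)

        dW2 = np.outer(h, e)
        dW1 = np.outer(x, W2 @ e)
        W1 -= learning_rate * dW1
        W2 -= learning_rate * dW2

# Each row of W1 is now the learned 3-dimensional embedding of one vocabulary word.
print(W1[word_to_index["like"]])   # e.g. the vector for the centre word "like"
```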
Applications of Skip-gram: