Word2Vec
Nuha Mohammed
April 2020
1 Introduction
Natural Language Processing (NLP) is a subfield of machine learning directed
toward reading and deriving meaning from human languages. NLP has a wide
array of applications, from spell checking to machine translation to semantic
analysis.
2 One-Hot Encoding
The easiest way to keep track of each word-index pair is to assign the words
to indices in alphabetical order. Then, in each word vector, a 1 is placed at the
index of that word and a 0 is placed at all other indices.
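As a minimal sketch of this encoding, assuming a small illustrative vocabulary
(the words here are simply the ones that appear in the weight-matrix example
later in these notes):

import numpy as np

# Illustrative vocabulary; in practice this would be every unique word in the corpus.
vocab = sorted(["kite", "always", "who", "there", "should"])
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    # Vector of zeros with a single 1 at the word's alphabetical index.
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec

print(one_hot("kite"))  # [0. 1. 0. 0. 0.] for this vocabulary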
3 Word2Vec
Word2Vec models create continuous embedding vectors that represent the words
of a large text in an n-dimensional vector space.
In the following example, a Word2Vec model with 3 features is created, and
for each feature/category, each word is scored on its likelihood of belonging to
that category.
The embedding vectors show that “Daisy” and “Donald” are closest together
along the first dimension, so that dimension probably captures a feature that
“Daisy” and “Donald” share but “Rose” does not, and the same reasoning
applies to the other dimensions.
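For illustration only, the vectors below are made-up 3-feature embeddings for
the three words; the snippet just reads off which pair is closest along the first
dimension:

import numpy as np

# Hypothetical 3-feature embeddings (made-up values for illustration only).
embeddings = {
    "Daisy":  np.array([0.9, 0.2, 0.4]),
    "Donald": np.array([0.8, 0.7, 0.1]),
    "Rose":   np.array([0.1, 0.3, 0.9]),
}

# Distance along the first feature only.
for a, b in [("Daisy", "Donald"), ("Daisy", "Rose"), ("Donald", "Rose")]:
    gap = abs(embeddings[a][0] - embeddings[b][0])
    print(a, b, round(gap, 2))
# "Daisy" and "Donald" have the smallest gap on dimension 0, suggesting
# they share whatever feature that dimension encodes.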
Figure 2: Depiction of how context-word pairs are created
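A minimal sketch of how such context-word pairs might be generated, assuming
a tokenized sentence and a window size of 2 (both choices are illustrative, not
prescribed by the lecture):

def make_context_pairs(tokens, window=2):
    # For each center word, pair it with every word within `window` positions.
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the quick brown fox jumps".split()
print(make_context_pairs(sentence))
# [('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ('quick', 'brown'), ...]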
3.2 Algorithms
Word2Vec includes two possible algorithms: Skip-Gram and Continuous Bag-
of-Words (CBOW). CBOW works by trying to predict the center word from its
neighboring context words, while Skip-Gram does the opposite and attempts to
predict the context words, given the center word.
Whether Skip-Gram or CBOW works better depends on the type and amount
of data. However, I am only going to go into detail on the Skip-Gram model in
this lecture, as the two are conceptually similar.
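For a sense of how the two variants are selected in practice, here is a short
sketch assuming the gensim library (not the lecture's own code, which is a
NumPy implementation linked in Section 4); in gensim 4.x the sg flag chooses
Skip-Gram (sg=1) or CBOW (sg=0):

from gensim.models import Word2Vec

# A toy corpus: a list of tokenized sentences.
sentences = [
    "the quick brown fox jumps over the lazy dog".split(),
    "the dog sleeps all day".split(),
]

# sg=1 trains a Skip-Gram model; sg=0 would train CBOW instead.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(model.wv["dog"][:5])           # first few entries of the learned embedding
print(model.wv.most_similar("dog"))  # nearest neighbours in the embedding space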
3.3 Training
In Skip-Gram models, the input is a one-hot encoded vector of the center word.
Each neuron within the output layer gives the probability that the word at that
position is a context word of that center word.
Since the output layer applies a softmax, it produces values between 0 and 1
that sum to 1, giving us probabilities. We want this predicted probability vector
to match the true distribution, which is represented by the one-hot encoded
vector of the actual context word.
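A minimal sketch of the softmax step with made-up output-layer scores, showing
that the resulting values lie between 0 and 1 and sum to 1:

import numpy as np

def softmax(scores):
    # Subtracting the max keeps the exponentials numerically stable.
    exps = np.exp(scores - np.max(scores))
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1, -1.0])  # made-up output-layer scores
probs = softmax(scores)
print(probs, probs.sum())  # each entry is in [0, 1] and the entries sum to 1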
Figure 5: Weights of the hidden layer in the neural network
W and W’ are the weight matrices between the input and hidden layers and
between the hidden and output layers, respectively. The model seeks to optimize
these weight matrices by maximizing the probability of correctly predicting the
context words.
The weight matrices are passed into the cost function as variables and op-
timized through gradient descent. As with any other neural network, the model
aims to minimize its cost function:
h = W^T x
u_c = W'^T h = W'^T W^T x
y_c = \mathrm{softmax}(u_c)
L = -\log P(w_{c,1}, w_{c,2}, \ldots, w_{c,C} \mid w_0) = -\log \prod_{i=1}^{C} P(w_{c,i} \mid w_0)
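A minimal NumPy sketch of this forward pass and loss for a single (center,
context) pair, assuming a tiny vocabulary and randomly initialized weights (all
sizes and names here are illustrative):

import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden_size = 5, 3

# Randomly initialized weights: W (input -> hidden), W_prime (hidden -> output).
W = rng.normal(size=(vocab_size, hidden_size))
W_prime = rng.normal(size=(hidden_size, vocab_size))

def softmax(scores):
    exps = np.exp(scores - np.max(scores))
    return exps / exps.sum()

x = np.zeros(vocab_size)
x[1] = 1.0                 # one-hot vector for the center word (index 1)
context_index = 3          # index of the true context word

h = W.T @ x                # hidden layer: the center word's embedding (a row of W)
u = W_prime.T @ h          # scores over the vocabulary
y = softmax(u)             # predicted probability of each word being a context word

loss = -np.log(y[context_index])  # negative log-likelihood of the true context word
print(loss)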
After training, each row of the input weight matrix W is the embedding vector
for one word:

W =
    always   0.5   0.3   7
    kite     8    -0.9   2.2
    should   0.3   5     2.1
    ...
    there    0.6   7     3.2
    who      2     0.5   4
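As a sketch of how these rows are used, the snippet below looks up embeddings
from the illustrative matrix above and compares two words with cosine
similarity:

import numpy as np

words = ["always", "kite", "should", "there", "who"]
W = np.array([
    [0.5, 0.3, 7.0],
    [8.0, -0.9, 2.2],
    [0.3, 5.0, 2.1],
    [0.6, 7.0, 3.2],
    [2.0, 0.5, 4.0],
])
word_to_index = {w: i for i, w in enumerate(words)}

def embedding(word):
    # Each word's embedding is simply its row of W.
    return W[word_to_index[word]]

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(embedding("should"), embedding("there")))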
The word embeddings serve as a foundation for many NLP tasks such as
topic modeling and extracting meaning from passages.
With the word embeddings themselves, you can discover underlying patterns
in data. One interesting application is in proteomics and genomics: BioVectors
encode n-grams of biological sequences such as DNA, RNA, and protein
sequences. Using techniques such as Skip-Gram modeling, scientists have been
able to group biological sequences based on underlying biochemical properties
and interactions.
4 Code Sample
Check out this implementation of Word2Vec with the Skip-Gram model:
https://github.com/DerekChia/word2vec_numpy/blob/master/wordtovec.py
5 Further Exploration
For large amounts of data, the Word2Vec network becomes inefficient because
it contains a huge number of weights. Gradient descent on such a network is
extremely slow, and over-fitting is hard to avoid. Therefore, the authors of
Word2Vec also introduced a technique called Negative Sampling, which addresses
this issue. I encourage you to look into their second paper to find out more
about these optimizations; a rough sketch of the idea follows below.
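As a rough sketch only (a simplification, not the authors' exact formulation;
the names and the sampling scheme here are assumptions), Negative Sampling
replaces the full softmax with a few binary classifications per training pair: one
for the true context word and a handful for randomly sampled "negative" words:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def negative_sampling_loss(center_vec, context_vec, negative_vecs):
    # Push the score of the true (center, context) pair up...
    loss = -np.log(sigmoid(np.dot(center_vec, context_vec)))
    # ...and push the scores of a few sampled negative words down.
    for neg_vec in negative_vecs:
        loss += -np.log(sigmoid(-np.dot(center_vec, neg_vec)))
    return loss

dim = 3
center = rng.normal(size=dim)
context = rng.normal(size=dim)
negatives = [rng.normal(size=dim) for _ in range(5)]  # k = 5 sampled negatives
print(negative_sampling_loss(center, context, negatives))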
6 References
The information and images I used in these lectures came from multiple sources.
Credits go to their respective owners.
• Stanford CS224d Lecture: Deep Learning for NLP
• Word2Vec Paper: Efficient Estimation of Word Representations in Vector
Space