Word2vec Summary
The paper addresses two key challenges in word representations for NLP:
• Many current NLP systems and techniques treat words as atomic units: there is no notion of similarity between words, as they are represented simply as indices in a vocabulary.
• Previously proposed architectures had not been successfully trained on more than a few hundred million words, whereas the goal here is to learn high-quality word vectors from huge data sets with billions of words and vocabularies of millions of words.
Key Innovation
The paper proposes two new log-linear architectures, CBOW and Skip-gram, which remove the costly non-linear hidden layer used in earlier neural network language models. This makes it possible to learn high-quality word vectors from much larger data sets at much lower computational cost; both architectures are described in detail under Key Architectures below.
Technical Framework
• CBOW Architecture: removes the hidden layer and shares projections; reaches 64% accuracy on syntactic tasks, but lower semantic accuracy (24%).
• Skip-gram Architecture: predicts surrounding words from the current word; reaches 55% semantic accuracy, with slightly lower syntactic accuracy.
• Training Efficiency: trains on billions of words (“less than a day to learn high quality word vectors from 1.6B words”), but requires significant computing resources.
• Vector Operations: captures semantic relationships algebraically (“vector("King") - vector("Man") + vector("Woman") results in vector closest to Queen”), though not 100% accurate.
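To make the vector algebra concrete, the following is a minimal sketch of how an analogy such as vector("King") - vector("Man") + vector("Woman") can be resolved by a nearest-neighbour search under cosine similarity. The 3-dimensional vectors are invented purely for illustration; a trained model would use hundreds of dimensions and a vocabulary of millions of words.

```python
import numpy as np

# Toy embedding table; these 3-dimensional values are made up for illustration.
embeddings = {
    "king":  np.array([0.8, 0.7, 0.1]),
    "queen": np.array([0.8, 0.1, 0.7]),
    "man":   np.array([0.3, 0.9, 0.1]),
    "woman": np.array([0.3, 0.1, 0.9]),
    "apple": np.array([0.1, 0.2, 0.2]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c, exclude=()):
    """Return the vocabulary word whose vector is closest to vec(a) - vec(b) + vec(c)."""
    target = embeddings[a] - embeddings[b] + embeddings[c]
    candidates = (w for w in embeddings if w not in exclude)
    return max(candidates, key=lambda w: cosine(embeddings[w], target))

# vector("King") - vector("Man") + vector("Woman") lands closest to "queen".
print(analogy("king", "man", "woman", exclude={"king", "man", "woman"}))
```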
Initial Assessment
Key Strengths:
• Computational efficiency: the simplified architectures learn high-quality word vectors from a 1.6B-word data set in less than a day.
• Semantic richness: when the word vectors are well trained, it is possible to find the correct answer to analogy questions using simple algebraic operations.
• Scalability: the approach is designed for huge data sets with billions of words and vocabularies of millions of words.
Limitations:
No single architecture dominates: the CBOW architecture works better than the NNLM on the syntactic tasks and about the same on the semantic one, while the Skip-gram architecture works slightly worse on the syntactic task than the CBOW model. Analogy recovery through vector arithmetic is also not 100% accurate.
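The accuracy figures quoted here come from an analogy test set in which each question supplies four words (a is to b as c is to d) and the model is scored on whether the word nearest to vec(b) - vec(a) + vec(c) is exactly d. Below is a minimal sketch of that strict scoring loop, reusing the toy `embeddings` table and `analogy` helper from the previous snippet.

```python
def analogy_accuracy(questions):
    """Fraction of (a, b, c, d) analogy questions answered exactly right.

    A question counts as correct only when the vocabulary word nearest to
    vec(b) - vec(a) + vec(c), excluding the three input words, is exactly d.
    """
    correct = total = 0
    for a, b, c, d in questions:
        if not all(w in embeddings for w in (a, b, c, d)):
            continue  # skip questions containing out-of-vocabulary words
        total += 1
        if analogy(b, a, c, exclude={a, b, c}) == d:
            correct += 1
    return correct / total if total else 0.0

# "man is to king as woman is to ___?" -> expected answer "queen"
print(analogy_accuracy([("man", "king", "woman", "queen")]))  # 1.0 on the toy vocabulary
```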
Key Architectures
The paper introduces two novel model architectures for learning word vectors:
1. CBOW (Continuous Bag-of-Words):
The first architecture is similar to the feedforward NNLM, but the non-linear hidden layer is removed and the projection layer is shared for all words; the current word is predicted from its surrounding context, and the order of the context words does not influence the projection.
2. Skip-gram:
The second architecture is similar to CBOW, but instead of predicting the current word based on the context, it tries to maximize classification of a word based on another word in the same sentence.
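The practical difference between the two architectures is easiest to see in the training pairs they generate from a sentence. The sketch below illustrates only that windowing step (the window size and example sentence are arbitrary choices); the actual models put a shared projection layer and a softmax or hierarchical-softmax output on top of these pairs.

```python
def cbow_pairs(tokens, window=2):
    """CBOW: the surrounding context words jointly predict the current word."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))  # (list of context words, word to predict)
    return pairs

def skipgram_pairs(tokens, window=2):
    """Skip-gram: the current word predicts each nearby word individually."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))  # (current word, nearby word to classify)
    return pairs

sentence = "the quick brown fox jumps".split()
print(cbow_pairs(sentence))      # e.g. (['the', 'brown', 'fox'], 'quick')
print(skipgram_pairs(sentence))  # e.g. ('quick', 'the'), ('quick', 'brown'), ...
```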
Empirical Results
On the Semantic-Syntactic Word Relationship test set, CBOW reaches 64% accuracy on the syntactic questions but only 24% on the semantic ones, while Skip-gram reaches 55% semantic accuracy at slightly lower syntactic accuracy; high-quality vectors are learned from a 1.6B-word data set in less than a day.
Technical Innovations
1. Efficient Training: removing the non-linear hidden layer and sharing the projection layer cuts computational complexity enough to train on corpora with billions of words in less than a day.
2. Vector Operations: simple algebraic operations on the learned vectors recover semantic and syntactic relationships, e.g. vector("King") - vector("Man") + vector("Woman") is closest to vector("Queen").
Looking ahead, the authors note: “Our ongoing work shows that the word vectors can be successfully applied to automatic extension of facts in Knowledge Bases, and also for verification of correctness of existing facts. Results from machine translation experiments also look very promising.”
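For readers who want to try the workflow end to end, both models are available in off-the-shelf libraries. The following is a minimal, illustrative sketch using the gensim reimplementation (assuming gensim 4.x, not the paper's original C tool); the toy corpus and hyperparameter values are placeholders, and a corpus on the scale reported in the paper is needed before the analogy results actually emerge.

```python
from gensim.models import Word2Vec

# Placeholder corpus: a real run would stream billions of tokenised sentences.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "man", "walks", "in", "the", "city"],
    ["the", "woman", "walks", "in", "the", "city"],
]

# sg=1 selects Skip-gram, sg=0 selects CBOW; the other values are illustrative.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1, epochs=50)

# Analogy query: vector("king") - vector("man") + vector("woman") is closest to...?
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```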
The paper represents a significant advance in efficient word vector training while
maintaining or improving accuracy compared to more complex neural network
approaches.