Word2vec Summary

Core Problem Statement

The paper addresses two key challenges in word representations for NLP:

Many current NLP systems and techniques treat words as atomic units -
there is no notion of similarity between words, as these are represented as
indices in a vocabulary.

The main goal of this paper is to introduce techniques that can be used for
learning high-quality word vectors from huge data sets with billions of
words, and with millions of words in the vocabulary. As far as we know,
none of the previously proposed architectures has been successfully
trained on more than a few hundred million words.
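
To make the atomic-units problem concrete, here is a minimal sketch (my own illustration, not code from the paper): one-hot index representations make every pair of distinct words equally dissimilar, whereas dense learned vectors can encode graded similarity. The toy vocabulary and the dense vector values below are made up purely for illustration.

```python
import numpy as np

# Hypothetical toy vocabulary; indices are the "atomic unit" representation.
vocab = {"king": 0, "queen": 1, "banana": 2}
V = len(vocab)

def one_hot(word):
    v = np.zeros(V)
    v[vocab[word]] = 1.0
    return v

# Any two distinct one-hot vectors are orthogonal: the representation
# says nothing about which words are related.
print(one_hot("king") @ one_hot("queen"))   # 0.0
print(one_hot("king") @ one_hot("banana"))  # 0.0

# Dense, learned vectors (illustrative values only) can differ in similarity.
dense = {
    "king":   np.array([0.8, 0.3]),
    "queen":  np.array([0.7, 0.4]),
    "banana": np.array([-0.5, 0.9]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(dense["king"], dense["queen"]))   # high (~0.99)
print(cosine(dense["king"], dense["banana"]))  # low (negative)
```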

Key Innovation

The paper introduces two novel architectures:

We propose two new model architectures for computing continuous vector
representations of words that try to minimize computational complexity. The
main observation from the previous section was that most of the complexity
is caused by the non-linear hidden layer in the model.

Specifically:

1. Continuous Bag-of-Words (CBOW):
The first proposed architecture is similar to the feedforward NNLM,
where the non-linear hidden layer is removed and the projection layer is
shared for all words.

2. Skip-gram:
The second architecture is similar to CBOW, but instead of predicting the
current word based on the context, it tries to maximize classification of a
word based on another word in the same sentence.
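
As a concrete illustration of the two prediction tasks, the sketch below (an assumption about a typical implementation, not code from the paper) shows how training pairs would be generated from a sentence: CBOW pairs a window of context words with the current word, while Skip-gram pairs the current word with each surrounding word individually. The sentence and window size are illustrative.

```python
# Contrast the two prediction tasks on a toy sentence.
sentence = ["the", "king", "rules", "the", "kingdom"]
window = 2  # number of context words taken on each side (illustrative)

def cbow_pairs(tokens, window):
    """CBOW: (context words, current word) -- the context predicts the center."""
    pairs = []
    for i, target in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append((context, target))
    return pairs

def skipgram_pairs(tokens, window):
    """Skip-gram: (current word, context word) -- the center predicts each neighbour."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(cbow_pairs(sentence, window)[2])       # (['the', 'king', 'the', 'kingdom'], 'rules')
print(skipgram_pairs(sentence, window)[:3])  # [('the', 'king'), ('the', 'rules'), ('king', 'the')]
```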

Technical Framework

| Component | Innovation | Evidence of Impact | Limitations |
| --- | --- | --- | --- |
| CBOW Architecture | Removes hidden layer, shares projections | 64% accuracy on syntactic tasks | Lower semantic accuracy (24%) |
| Skip-gram Architecture | Predicts surrounding words from current word | 55% semantic accuracy | Slightly lower syntactic accuracy |
| Training Efficiency | Trains on billions of words | “Less than a day to learn high quality word vectors from 1.6B words” | Requires significant computing resources |
| Vector Operations | Captures semantic relationships algebraically | “vector("King") - vector("Man") + vector("Woman") results in vector closest to Queen” | Not 100% accurate |

Initial Assessment

Key Strengths:

• Computational efficiency:
Because of the much lower computational complexity, it is possible to
compute very accurate high dimensional word vectors from a much larger
data set.

• Semantic richness:
When the word vectors are well trained, it is possible to find the correct
answer using simple algebraic operations.

• Scalability:
Using the DistBelief distributed framework, it should be possible to train
the CBOW and Skip-gram models even on corpora with one trillion
words.

Limitations:

• Trade-offs between models:

The CBOW architecture works better than the NNLM on the syntactic
tasks, and about the same on the semantic one. The Skip-gram
architecture works slightly worse on the syntactic task than the CBOW
model.

• Not perfect accuracy:

> Question is assumed to be correctly answered only if the closest word to the
vector computed using the above method is exactly the same as the correct
word in the question; synonyms are thus counted as mistakes.

Key Architectures

The paper introduces two novel model architectures for learning word vectors:

1. Continuous Bag-of-Words (CBOW):

The first proposed architecture is similar to the feedforward NNLM,
where the non-linear hidden layer is removed and the projection layer is
shared for all words (not just the projection matrix); thus, all words get
projected into the same position (their vectors are averaged).

2. Skip-gram:
The second architecture is similar to CBOW, but instead of predicting the
current word based on the context, it tries to maximize classification of a
word based on another word in the same sentence.
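
Below is a minimal sketch of the CBOW forward pass as described above, assuming NumPy and a plain softmax output; the paper actually uses a hierarchical softmax, and the dimensions and word indices here are illustrative only.

```python
import numpy as np

# Minimal CBOW forward pass: context vectors are averaged into one shared
# projection, then scored against every vocabulary word. A full softmax is
# used here only to keep the sketch short (the paper uses hierarchical softmax).
V, D = 10_000, 300                            # vocabulary size, vector dimensionality (illustrative)
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.01, size=(V, D))    # input (projection) vectors, one per word
W_out = rng.normal(scale=0.01, size=(D, V))   # output weights

def cbow_forward(context_ids):
    # All context words share the projection layer: their vectors are averaged,
    # so word order inside the window is ignored (hence "bag of words").
    h = W_in[context_ids].mean(axis=0)        # (D,)
    scores = h @ W_out                        # (V,) one score per vocabulary word
    scores -= scores.max()                    # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs                              # P(current word | context)

probs = cbow_forward([12, 47, 803, 5120])     # hypothetical context word indices
print(probs.shape, probs.sum())               # (10000,) ~1.0
```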

Empirical Results

The models were evaluated on semantic-syntactic word relationship tests:

| Model | Semantic Accuracy | Syntactic Accuracy | Total |
| --- | --- | --- | --- |
| RNNLM | 9% | 36% | 24.6% |
| NNLM | 23% | 53% | 47% |
| CBOW | 24% | 64% | 61% |
| Skip-gram | 55% | 59% | 56% |

Key performance highlights:

We observe large improvements in accuracy at much lower computational
cost, i.e. it takes less than a day to learn high quality word vectors from a
1.6 billion words data set.

Technical Innovations

1. Efficient Training:

The training complexity of this architecture is proportional to O = E × T ×
Q, where E is the number of training epochs, T is the number of words in
the training set, and Q is defined further for each model architecture.
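
A back-of-the-envelope sketch of that cost formula in Python, using the per-model Q terms the paper gives for the hierarchical-softmax variants: Q = N × D + D × log2(V) for CBOW and Q = C × (D + D × log2(V)) for Skip-gram. The concrete values of E, N, C, D and V plugged in below are illustrative assumptions, not the paper's exact experimental setup.

```python
import math

# Back-of-the-envelope training cost O = E * T * Q for the two architectures.
E = 3              # training epochs (illustrative)
T = 1_600_000_000  # tokens in the training set (1.6B words, as in the paper's headline result)
V = 1_000_000      # vocabulary size (illustrative)
D = 300            # vector dimensionality (illustrative)
N = 8              # number of context words averaged by CBOW (illustrative)
C = 10             # maximum context distance for Skip-gram (illustrative)

Q_cbow = N * D + D * math.log2(V)
Q_skipgram = C * (D + D * math.log2(V))

for name, Q in [("CBOW", Q_cbow), ("Skip-gram", Q_skipgram)]:
    print(f"{name}: Q ~ {Q:,.0f}, total ops O = E*T*Q ~ {E * T * Q:.2e}")
```

The point of the exercise is that both Q terms grow only with D and log2(V), not with V itself, which is what makes training on billions of words tractable.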

2. Vector Operations:

Somewhat surprisingly, it was found that similarity of word
representations goes beyond simple syntactic regularities. Using a word
offset technique where simple algebraic operations are performed on the
word vectors, it was shown for example that vector("King") -
vector("Man") + vector("Woman") results in a vector that is closest to the
vector representation of the word Queen.
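
Here is a sketch of the word offset technique using plain NumPy over a hypothetical handful of vectors; with a real model the vectors would come from trained word2vec embeddings and the nearest-neighbour search would run over the full vocabulary, with the query words excluded as below.

```python
import numpy as np

# Word offset technique: vector("King") - vector("Man") + vector("Woman")
# should land nearest to vector("Queen"). These vectors are made up purely
# for illustration; real ones come from a trained model.
vectors = {
    "king":  np.array([0.80, 0.65, 0.15]),
    "man":   np.array([0.70, 0.10, 0.20]),
    "woman": np.array([0.68, 0.12, 0.78]),
    "queen": np.array([0.78, 0.66, 0.72]),
    "apple": np.array([0.05, 0.90, 0.40]),
}

def closest(query, exclude):
    """Return the word whose vector has the highest cosine similarity to query."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    candidates = {w: v for w, v in vectors.items() if w not in exclude}
    return max(candidates, key=lambda w: cos(vectors[w], query))

query = vectors["king"] - vectors["man"] + vectors["woman"]
print(closest(query, exclude={"king", "man", "woman"}))  # expected: "queen"
```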

Limitations and Future Work

The authors acknowledge some limitations:

Question is assumed to be correctly answered only if the closest word to
the vector computed using the above method is exactly the same as the
correct word in the question; synonyms are thus counted as mistakes.
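
A tiny sketch of how strict that scoring rule is (illustrative, not the paper's evaluation code): the predicted nearest word must match the expected answer string exactly, so a reasonable synonym still counts as an error.

```python
def score_question(predicted_word, expected_word):
    # Strict criterion: only an exact string match counts as correct.
    return predicted_word == expected_word

# Illustrative (predicted nearest word, expected answer) pairs.
results = [
    ("queen", "queen"),    # exact match -> correct
    ("monarch", "queen"),  # plausible synonym -> still counted as a mistake
    ("quick", "quickly"),  # morphological near-miss -> mistake
]
accuracy = sum(score_question(p, e) for p, e in results) / len(results)
print(f"accuracy = {accuracy:.0%}")  # 33%
```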

Future directions include:

Our ongoing work shows that the word vectors can be successfully applied
to automatic extension of facts in Knowledge Bases, and also for
verification of correctness of existing facts. Results from machine
translation experiments also look very promising.

The paper represents a significant advance in efficient word vector training while
maintaining or improving accuracy compared to more complex neural network
approaches.
