
CS 440/ECE 448
Fall 2020 Vector Semantics 5
Margaret Fleck

To get good results from word2vec, the basic algorithm needs to be tweaked a bit. Similar tweaks are
frequently required by other, related algorithms.

Building the set of training examples


In the basic algorithm, we consider the input (focus) words one by one. For each focus word,
we extract all words within +/- k positions as positive context words. We also randomly
generate a set of negative context words. This produces a set of positive pairs (w,c) and a set
of negative pairs (w,c') that are used to update the embeddings of w, c, and c'.

Tweak 1: Word2vec uses more negative training pairs than positive pairs, by a
factor of between 2 and 20 (depending on the amount of training data available).

You might think that the positive and negative pairs should be roughly balanced. However,
that apparently doesn't work. One reason may be that the positive context words are
definite indications of similarity, whereas the negative words are random choices that may
be more neutral than actively negative.
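
As a rough illustration, here is a minimal Python sketch of how such training pairs might be built. The function name, the default values, and the uniform choice of negatives are illustrative assumptions, not word2vec's actual implementation (which draws negatives from the smoothed distribution described below).

import random

def make_training_pairs(tokens, vocab, k=2, neg_ratio=5):
    # Build (focus, context, label) triples: label 1 for real context
    # words, label 0 for randomly chosen negative words.  neg_ratio
    # controls how many negatives are drawn per positive pair.
    pairs = []
    for i, w in enumerate(tokens):
        for j in range(max(0, i - k), min(len(tokens), i + k + 1)):
            if j == i:
                continue
            pairs.append((w, tokens[j], 1))      # positive pair
            for _ in range(neg_ratio):
                c_neg = random.choice(vocab)     # uniform here, for simplicity
                pairs.append((w, c_neg, 0))      # negative pair
    return pairs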

Tweak 2: Positive training examples are weighted by 1/m, where m is the distance
between the focus and context word. That is, adjacent context words matter more
than words with a bit of separation.

The closer two words are, the more likely it is that their relationship is strong. This is a common
heuristic in similar algorithms.
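
For instance, here is a small sketch of the 1/m weighting; the function name and window size are made up for illustration.

def weighted_positive_pairs(tokens, k=2):
    # Each positive (focus, context) pair gets weight 1/m, where m is
    # the number of positions separating the two words.
    pairs = []
    for i, w in enumerate(tokens):
        for m in range(1, k + 1):
            if i - m >= 0:
                pairs.append((w, tokens[i - m], 1.0 / m))
            if i + m < len(tokens):
                pairs.append((w, tokens[i + m], 1.0 / m))
    return pairs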

Smoothing negative context counts


For a fixed focus word w, negative context words are picked with a probability based on
how often words occur in the training data. However, if we compute P(c) = count(c)/N (N is
total words in data), rare words aren't picked often enough as context words. So instead we
replace each raw count count(c) with count(c)^α. The probabilities used for selecting
negative training examples are computed from these smoothed counts.

α is usually set to 0.75. But to see how this brings up the probabilities of rare words
compared to the common ones, it's a bit easier if you look at α = 0.5, i.e. we're computing
the square root of the input. In the table below, you can see that large probabilities stay
large, but very small ones are increased by quite a lot. After this transformation, you need to
normalize the numbers so that the probabilities add up to one again.


x        x^0.75    √x
.99      .992      .995
.9       .924      .949
.1       .178      .316
.01      .032      .1
.0001    .001      .01

This trick can also be used on PMI values (e.g. if using the methods from the previous
lecture).
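
Here is a small sketch of the smoothing computation. The function name and the toy counts are just for illustration, and numpy is assumed only for convenience.

import numpy as np

def negative_sampling_probs(counts, alpha=0.75):
    # Replace each raw count with count**alpha, then renormalize so the
    # smoothed values form a probability distribution again.
    smoothed = np.asarray(counts, dtype=float) ** alpha
    return smoothed / smoothed.sum()

# Toy example: the rare word's share of the distribution goes up.
counts = [9900, 99, 1]
print(counts[2] / sum(counts))                         # raw:      0.0001
print(negative_sampling_probs(counts, alpha=0.75)[2])  # smoothed: ~0.001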

Deletion, subsampling
Ah, but apparently the word2vec authors were still unhappy with the treatment of very common and very rare
words. So, when it first reads the input training data, word2vec modifies the text as follows:

very rare words are deleted from the text, and


very common words are deleted with a probability that increases with how frequent
they are.

This improves the balance between rare and common words. Also, deleting a word brings
the other words closer together, which improves the effectiveness of our context windows.
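
A hedged sketch of this preprocessing step is below. The discard probability 1 - sqrt(t/f(w)) is the formula from the word2vec paper; the threshold t, the min_count cutoff, and the function name are typical illustrative choices, not something fixed by the lecture.

import math, random

def preprocess(tokens, counts, min_count=5, t=1e-5):
    # counts: dict mapping each word to its raw frequency in the corpus.
    total = sum(counts.values())
    kept = []
    for w in tokens:
        if counts[w] < min_count:            # delete very rare words outright
            continue
        f = counts[w] / total                # relative frequency of w
        p_discard = max(0.0, 1.0 - math.sqrt(t / f))
        if random.random() < p_discard:      # common words are dropped more often
            continue
        kept.append(w)
    return kept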

Evaluation
The 2014 version of word2vec uses 1 billion words of training data to build embeddings for the basic task.

For the word analogy tasks, they used an embedding with 1000 dimensions and about 33
billion words of training data. Performance on word analogies is about 66%.

By comparison: children hear about 2-10 million words per year. Assuming the high end of
that range of estimates, they've heard about 170 million words by the time they take the SAT.
So the algorithm is performing well, but still seems to be underperforming given the
amount of data it's consuming.

A more recent embedding method, BERT large, is trained using a 24-layer network with
340M parameters. This has somewhat improved performance, but apparently can't be
reproduced on a standard GPU. Again, a direction for future research is to figure out why OK
performance seems to require so much training data and compute power.


Some follow-on papers


Original Mikolov et al. papers:

Efficient Estimation of Word Representations in Vector Space


Distributed Representations of Words and Phrases and their Compositionality

Goldberg and Levy papers (easier and more explicit)

word2vec Explained
Dependency-Based Word Embeddings
Neural Word Embedding as Implicit Matrix Factorization
Improving Distributional Similarity with Lessons Learned from Word Embeddings
