08 Exercises Word2vec MUD SOLVED
Exercise 1
Let the word vectors of the N previous words be x_1, x_2, …, x_N, each a column vector
of dimension D, and let y be the one-hot vector for the current word. The network is
specified by the following equations:
x = [x_1; x_2; …; x_N]
h = tanh(Wx + b)
ŷ = softmax(Uh + d)
J = CE(y, ŷ) = −∑_i y_i log(ŷ_i)
where x ∈ ℝ^{ND}, W ∈ ℝ^{H×ND}, b ∈ ℝ^H, h ∈ ℝ^H, U ∈ ℝ^{|V|×H}, d ∈ ℝ^{|V|}, ŷ ∈ ℝ^{|V|}.
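As a concrete illustration of these equations, here is a minimal NumPy sketch of the forward pass; the dimensions N, D, H, the vocabulary size, the random parameters, and the target index are all made-up example values, not part of the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, H, V = 3, 4, 5, 10            # context size, embedding dim, hidden dim, vocab size (example values)

x = rng.normal(size=(N * D, 1))     # concatenation of the N previous word vectors
W = rng.normal(size=(H, N * D))
b = rng.normal(size=(H, 1))
U = rng.normal(size=(V, H))
d = rng.normal(size=(V, 1))

h = np.tanh(W @ x + b)              # hidden layer
scores = U @ h + d
y_hat = np.exp(scores - scores.max())
y_hat /= y_hat.sum()                # softmax over the vocabulary

y = np.zeros((V, 1)); y[7] = 1.0    # one-hot target for the current word (index 7 is arbitrary)
J = -np.sum(y * np.log(y_hat))      # cross-entropy loss
print(J)
```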
SOLUTION
1a. CBOW is trained to predict a center word given a context window that extends
on both sides, while the word vectors learned by the NNLM do not capture the context to the
right of the word.
The CBOW model simply uses the sum of the context word vectors, while the NNLM
combines context words non-linearly. Thus the NNLM can learn to treat “not good to”
differently from “good to not”, etc.
The complexity can be reduced by using negative sampling to approximate the softmax, or
by using the hierarchical softmax.
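As a rough sketch of the negative-sampling idea (the function below is illustrative, with made-up vectors, not the exact word2vec implementation), the expensive softmax over the whole vocabulary is replaced by a logistic loss over the observed (center, context) pair plus a few sampled negative words:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_sampling_loss(u_context, v_center, neg_vectors):
    """Negative-sampling objective for one (center, context) pair.

    u_context, v_center: (D,) vectors; neg_vectors: (K, D) sampled negative context vectors.
    """
    pos = -np.log(sigmoid(u_context @ v_center))            # push the true pair together
    neg = -np.sum(np.log(sigmoid(-neg_vectors @ v_center)))  # push sampled negatives apart
    return pos + neg

rng = np.random.default_rng(0)
D, K = 50, 5                         # embedding dim and number of negatives (example values)
loss = neg_sampling_loss(rng.normal(size=D), rng.normal(size=D), rng.normal(size=(K, D)))
print(loss)
```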
Exercise 2
2a. We know that dense word vectors like the ones obtained with word2vec or GloVe
have many advantages over using sparse one-hot word vectors. Name a few.
2b. Also name at least two disadvantages of sparse vectors that are not solved by dense
vectors.
SOLUTION
2a. Models using dense word vectors generalize better to rare words than those using
sparse vectors.
Dense word vectors encode similarity between words, while sparse vectors do not.
Dense word vectors are easier to include as features in machine learning systems than
sparse vectors.
2b. Just like sparse representations, word2vec or GloVe do not have representations for
unseen words and hence do not help in generalization.
Also, there is only one representation per word, so polysemy is not solved.
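To make the similarity point from 2a concrete, the toy sketch below compares cosine similarities: any two distinct one-hot vectors always have similarity 0, while dense vectors can place related words close together. The dense vectors here are invented toy values, not real embeddings.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# One-hot (sparse) vectors: every pair of distinct words has similarity 0.
hotel = np.zeros(10); hotel[2] = 1.0
motel = np.zeros(10); motel[5] = 1.0
print(cosine(hotel, motel))        # 0.0 -- no notion of relatedness

# Toy dense vectors: related words can have high similarity.
hotel_d = np.array([0.9, 0.1, 0.4])
motel_d = np.array([0.8, 0.2, 0.5])
cat_d   = np.array([-0.7, 0.9, 0.0])
print(cosine(hotel_d, motel_d))    # close to 1
print(cosine(hotel_d, cat_d))      # much lower
```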
Exercise 3
Consider the following neural architecture. What is it learning? Can you explain exactly
which NLP task it is being trained for?
SOLUTION
The architecture is predicting the 4th word given the 3 previous words. This is an
architecture for the task of language modeling, predicting the probability of sequences
of words.
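To make "predicting the probability of sequences of words" concrete, a language model can score a whole sentence with the chain rule, multiplying the model's probability of each word given its three-word history. The next_word_probs argument below is a hypothetical stand-in for the trained network; the uniform toy model is only there so the snippet runs.

```python
import numpy as np

def sequence_log_prob(words, next_word_probs, context_size=3):
    """Sum of log P(w_t | w_{t-3}, w_{t-2}, w_{t-1}) over the sentence.

    next_word_probs(context) must return a dict mapping candidate words to
    probabilities; here it is assumed to wrap a trained next-word model.
    """
    total = 0.0
    for t in range(context_size, len(words)):
        context = words[t - context_size:t]
        probs = next_word_probs(context)
        total += np.log(probs[words[t]])
    return total

# Toy stand-in model: uniform over a tiny vocabulary (for illustration only).
vocab = ["the", "cat", "sat", "on", "mat"]
uniform = lambda context: {w: 1.0 / len(vocab) for w in vocab}
print(sequence_log_prob(["the", "cat", "sat", "on", "the", "mat"], uniform))
```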
Exercise 4
Developers A and B have each used the Word2Vec algorithm to obtain word embeddings for the same
vocabulary of words V.
In particular, developer A has obtained `context' vectors u_w^A and `center' vectors v_w^A for every
w in V, and developer B has obtained `context' vectors u_w^B and `center' vectors v_w^B for every
w in V.
For every pair of words w, w' in V, the inner product is the same in both models:
(u_w^A)^T v_{w'}^A = (u_w^B)^T v_{w'}^B. Does this mean that, for every word w in V, v_w^A = v_w^B? Discuss
your response.
SOLUTION
No. The Word2Vec model only optimizes the inner product between
word vectors for words in the same context.
One can rotate all word vectors by the same amount and the inner products will
still be the same. Alternatively, one can scale the set of context vectors by a
factor of k and the set of center vectors by a factor of 1/k. Such transformations
preserve the inner products, but the sets of vectors could be different.
Note that degenerate solutions (all-zero vectors, etc.) are discouraged.
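A small numeric check of this argument, using random made-up vectors: rotating all vectors by the same orthogonal matrix Q, and scaling the context vectors by k while scaling the center vectors by 1/k, leaves every inner product unchanged even though the vectors themselves differ.

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, k = 6, 4, 3.0                            # vocab size, dimension, scale factor (example values)

U  = rng.normal(size=(V, D))                   # developer A's context vectors (one per row)
Vc = rng.normal(size=(V, D))                   # developer A's center vectors (one per row)

Q, _ = np.linalg.qr(rng.normal(size=(D, D)))   # random orthogonal (rotation/reflection) matrix

# Developer B's vectors: rotate everything by Q, scale contexts by k and centers by 1/k.
U_b = (U  @ Q) * k
V_b = (Vc @ Q) / k

print(np.allclose(U @ Vc.T, U_b @ V_b.T))      # True: every inner product u_w . v_w' agrees
print(np.allclose(Vc, V_b))                    # False: the center vectors themselves differ
```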