Using Word Embeddings for Text Search and

Retrieval

Lecture Notes on Deep Learning

Avi Kak and Charles Bouman

Purdue University

Tuesday 28th April, 2020 11:54

Purdue University 1
Preamble
In the year 2013, a group of researchers at Google created a revolution in the text
processing community with the publication of the word2vec neural network for
generating numerical representations for the words in a text corpus. Here are their
papers on that discovery:
https://arxiv.org/abs/1301.3781

https://arxiv.org/pdf/1310.4546.pdf

The word-to-numeric representations created by the word2vec network are vectors of real numbers, the size of the vectors being a hyperparameter of the network. These vectors are called word embeddings.

The word embeddings generated by word2vec allow us to establish word similarities on the basis of word contexts.

To elaborate, if two different words, word_i and word_j, are frequently surrounded by the same set of other words (called the context words), the word embedding vectors for word_i and word_j would be numerically close.
Purdue University 2
Preamble (contd.)
What’s amazing is that when you look at the similarity clusters produced by
word2vec, they create a powerful illusion that a computer has finally solved the
mystery of how to automatically learn the semantics of the words.

My goal in this lecture is to introduce you to the word2vec neural network and to
then present some of my lab’s research on using the Word2vec embeddings in the
context of software engineering for automatic bug localization.

However, in order to underscore the importance of word2vec, I am going to start with the importance of text search and retrieval in our lives and bring to your attention the major forums where this type of research is presented.

Purdue University 3
Outline

1 Importance of Text Search and Retrieval

2 How the Word Embeddings are Learned in Word2vec

3 Softmax as the Activation Function in Word2vec

4 Training the Word2vec Network

5 Using Word2vec for Improving the Quality of Text Retrieval

Purdue University 4
Importance of Text Search and Retrieval

Outline

1 Importance of Text Search and Retrieval

2 How the Word Embeddings are Learned in Word2vec

3 Softmax as the Activation Function in Word2vec

4 Training the Word2vec Network

5 Using Word2vec for Improving the Quality of Text Retrieval

Purdue University 5
Importance of Text Search and Retrieval

Role of Text Search and Retrieval in our Lives

A good indicator of the central importance of text search and retrieval is the number of times you google something in a 24-hour period.

Even without thinking, we now bring up an app on our mobile device or on a laptop to get us instantaneously what it is that we are looking for. We could be looking for how to get to a restaurant, how to spell a long word, how to use an idiom properly, or about any of a very large number of other things.

You could say that instantaneous on-line search has now become an
important part of our reflexive behavior.

In my own lab (RVL), we now have a history of around 15 years during which we have used text search and retrieval tools to develop ever more powerful algorithms for automatic bug localization.

Purdue University 6
Importance of Text Search and Retrieval

Text Search and Retrieval and Us (contd.)


The earliest approaches to text search and retrieval used what are
known as the Bag-of-Words (BoW) algorithms. With BoW, you
model each document by a histogram of its word frequencies. You
think of the histogram as a vector whose length equals the size of the
vocabulary. Given a document, you place in each element of this
vector the number of times the word corresponding to that element
appears in the document.
The document vector constructed as described above is called the
term frequency vector. And the term-frequency vectors for all the
documents in a corpus define what is known as the term-frequency
matrix.
You construct a similar vector representation for the user’s query for
document retrieval. Using cosine distance between the vectors as a
similarity metric, you return the documents that are most similar to
the query.
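To make this concrete, here is a minimal BoW sketch (my own toy corpus and function names, not code from any retrieval system) that builds term-frequency vectors and ranks documents against a query by cosine similarity:

    from collections import Counter
    import math

    def term_freq_vector(text, vocab):
        """Histogram of word counts laid out in vocab order."""
        counts = Counter(text.lower().split())
        return [counts[w] for w in vocab]

    def cosine_similarity(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    corpus = ["the cat sat on the mat",
              "the dog chased the cat",
              "stock prices fell sharply today"]
    vocab = sorted({w for doc in corpus for w in doc.lower().split()})

    tf_matrix = [term_freq_vector(doc, vocab) for doc in corpus]   # term-frequency matrix
    query_vec = term_freq_vector("cat on a mat", vocab)            # query gets the same representation

    # Return the documents most similar to the query
    ranked = sorted(range(len(corpus)),
                    key=lambda i: cosine_similarity(tf_matrix[i], query_vec),
                    reverse=True)
    print([corpus[i] for i in ranked])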
Purdue University 7
Importance of Text Search and Retrieval

Text Search and Retrieval and Us (contd.)

The BoW based text retrieval algorithms were followed by approaches that took term-term order into account. This was a big step forward. The best of these were based on what is known as the Markov Random Fields (MRF) model.

In an MRF based framework, in addition to measuring the frequencies of the individual words, you also measure the frequencies of ordered pairs of words. The results obtained with the MRF based approach were a significant improvement over those obtained with BoW based methods.
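As a rough illustration of the extra statistic an MRF based framework relies on, here is a small sketch (my own simplification; it only shows the counting of ordered word pairs within a short window, not the full MRF formulation):

    from collections import Counter

    def ordered_pair_counts(text, max_gap=1):
        """Count ordered word pairs (w1 appearing before w2) within a small window."""
        words = text.lower().split()
        pairs = Counter()
        for i, w1 in enumerate(words):
            for j in range(i + 1, min(i + 1 + max_gap, len(words))):
                pairs[(w1, words[j])] += 1
        return pairs

    print(ordered_pair_counts("the cat sat on the mat", max_gap=2))
    # e.g. ('the', 'cat'), ('cat', 'sat'), ('cat', 'on'), ... with their counts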

Most modern approaches to text search and retrieval also include what I called "contextual semantics" in the modeling process. The best-known approach for that is the word2vec neural network that is the focus of this lecture.
Purdue University 8
Importance of Text Search and Retrieval

Important Forums for Research Related to Text Retrieval

Before ending this section, I just wanted to mention quickly that the
most important annual research forums on general text retrieval are
the ACM’s SIGIR conference and the NIST’s TREC workshops. Here
are the links to these meetings:
https://sigir.org/sigir2019/

https://trec.nist.gov/pubs/call2020.html

Since my personal focus is on text retrieval in the context of extracting information from software repositories, the go-to meeting for me is the annual MSR (Mining Software Repositories) conference:
https://2020.msrconf.org/track/msr-2020-papers#event-overview

This link is to the list of accepted papers at this year's conference. I chose this link intentionally since it shows our own paper at the top of the list. (Nothing wrong with a bit of self-promotion! Right?)
Purdue University 9
How the Word Embeddings are Learned in Word2vec

Outline

1 Importance of Text Search and Retrieval

2 How the Word Embeddings are Learned in Word2vec

3 Softmax as the Activation Function in Word2vec

4 Training the Word2vec Network

5 Using Word2vec for Improving the Quality of Text Retrieval

Purdue University 10
How the Word Embeddings are Learned in Word2vec

Word2vec

I’ll explain the word2vec neural network with the help of the figure
shown on the next slide.

The files in a text corpus are scanned with a window of size 2W + 1. The word in the middle of the window is considered to be the focus word. And the W words on either side are referred to as the context words for the focus word.

We assume that the size of the vocabulary is V .

As a text file is scanned, the V-element long one-hot vector representation of each focus word is fed as input to the neural network.

Each such input goes through the first linear layer where it is multiplied by a matrix, denoted W_{V×N}, of learnable parameters.
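Here is a minimal sketch of that scanning step (my own illustration with made-up tokens and function names) that produces the one-hot focus-word input and the accompanying context-word indices:

    import torch

    def skipgram_pairs(tokens, word_to_idx, W=2):
        """Yield (focus_index, context_indices) for each position of the 2W+1 window."""
        ids = [word_to_idx[t] for t in tokens]
        for pos in range(W, len(ids) - W):
            focus = ids[pos]
            context = ids[pos - W:pos] + ids[pos + 1:pos + W + 1]
            yield focus, context

    def one_hot(index, V):
        vec = torch.zeros(V)
        vec[index] = 1.0
        return vec

    tokens = "we scan the corpus with a sliding window of words".split()
    vocab = sorted(set(tokens))
    word_to_idx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    for focus, context in skipgram_pairs(tokens, word_to_idx, W=2):
        x = one_hot(focus, V)      # V-element input fed to the network
        # 'context' holds the 2W indices the network is trained to predict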
Purdue University 11
How the Word Embeddings are Learned in Word2vec

Word2vec (contd.)

Figure: The SkipGram model for generating the word2vec embeddings for a text corpus.
Purdue University 12
How the Word Embeddings are Learned in Word2vec

Word2vec (contd.)
Overall, in the word2vec network, a V-element tensor at the input goes into an N-element tensor in the middle layer and, eventually, back into a V-element final output tensor.
Here is a very important point about the first linear operation on the
input in the word2vec network: In all of the DL implementations you
have seen so far, a torch.nn.Linear layer was always followed by a
nonlinear activation, but that’s NOT the case for the first invocation
of torch.nn.Linear in the word2vec network.
The reason for the note in red above is that the sole purpose of the
first linear layer is merely to serve as a projection operator.
But what does that mean?
To understand what we mean by a projection operator, you have to come to terms with the semantics of the matrix W_{V×N}. And that takes us to the next slide.
Purdue University 13
How the Word Embeddings are Learned in Word2vec

Word2vec (contd.)
You see, the matrix W_{V×N} is actually meant to be a stack of the word embeddings that we are interested in. The i-th row of this matrix stands for the N-element word embedding for the i-th word in a sorted list of the vocabulary.
Given the one-hot vector for, say, the i-th vocab word at the input, the purpose of multiplying this vector with the matrix W_{V×N} is simply to "extract" the current value for the embedding for this word and to then present it to the neural layer that follows.
You could say that, for the i-th word at the input, the role of the W_{V×N} matrix is to project the current value of the word's embedding into the neural layer that follows. It's for this reason that the middle layer of the network is known as the projection layer.
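This "extraction" is easy to verify numerically: multiplying a one-hot vector by W_{V×N} simply reads off one row of the matrix. A tiny sketch with made-up sizes:

    import torch

    V, N = 10, 4                     # toy vocabulary size and embedding size
    W = torch.randn(V, N)            # the V x N stack of word embeddings

    i = 7                            # index of the focus word
    one_hot = torch.zeros(V)
    one_hot[i] = 1.0

    projected = one_hot @ W          # output of the first "linear" layer
    assert torch.allclose(projected, W[i])   # identical to simply reading off row i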
In case you are wondering about the size of N vis-a-vis that of V, that's a hyperparameter of the network whose value is best set by trying out different values for N and choosing the best.
Purdue University 14
How the Word Embeddings are Learned in Word2vec

Word2vec (contd.)
After the projection layer, the rest of the word2vec network as shown in Slide 12 is standard stuff. You have a linear neural layer with torch.nn.Softmax as the activation function.
To emphasize, the learnable weights in the N×V matrix W' along with the activation that follows constitute the only neural layer in word2vec.
You can reach the documentation on the activation function (torch.nn.Softmax) through the main webpage for torch.nn. This activation function is listed under the category "Nonlinear activations (Other)" at the "torch.nn" webpage.
To summarize, word2vec is a single-layer neural network that uses a
projection layer as its front end.
While the figure on Slide 12 is my visualization of how the data flows forward in a word2vec network, a more common depiction of this network is shown on the next slide.
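Putting the projection layer and the single neural layer together, the architecture described above can be sketched in PyTorch roughly as follows. This is my own minimal rendering, not the authors' code; note that nn.Linear stores its weight transposed, so W_{V×N} corresponds to the transpose of the first layer's weight:

    import torch
    import torch.nn as nn

    class SkipGramNet(nn.Module):
        def __init__(self, V, N):
            super().__init__()
            # First "layer": pure projection, no bias, no nonlinear activation
            self.projection = nn.Linear(V, N, bias=False)   # weight shape (N, V); its transpose is W_{VxN}
            # The only true neural layer: N -> V followed by Softmax
            self.output = nn.Linear(N, V, bias=False)
            self.softmax = nn.Softmax(dim=-1)

        def forward(self, one_hot_focus):
            h = self.projection(one_hot_focus)    # projection layer (the word embedding)
            return self.softmax(self.output(h))   # probability over the vocabulary

    V, N = 1000, 128
    net = SkipGramNet(V, N)
    x = torch.zeros(1, V); x[0, 42] = 1.0         # one-hot focus word
    probs = net(x)                                # shape (1, V), rows sum to 1

In an actual training loop one would typically fold the Softmax into the loss computation, but the explicit form above matches the figure on Slide 12.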
Purdue University 15
How the Word Embeddings are Learned in Word2vec

Word2vec (contd.)
This figure is from the following publication by Shayan Akbar and
myself:
https://engineering.purdue.edu/RVL/Publications/Akbar_SCOR_Source_Code_Retrieval_2019_MSR_paper.pdf

Figure: A more commonly used depiction for the SkipGram model for generating the word2vec embeddings for a vocabulary

Purdue University 16
Softmax as the Activation Function in Word2vec

Outline

1 Importance of Text Search and Retrieval

2 How the Word Embeddings are Learned in Word2vec

3 Softmax as the Activation Function in Word2vec

4 Training the Word2vec Network

5 Using Word2vec for Improving the Quality of Text Retrieval

Purdue University 17
Softmax as the Activation Function in Word2vec

The Softmax Activation Function


An important part of education is to see how the concepts you have
already learned relate to the new concepts you are about to learn.
The following comment is in keeping with that precept.
As you saw in the Lecture on Recurrent Neural Networks (see Slide 22
of that lecture), the activation function LogSoftmax and the loss
function NLLLoss are typically used together and their joint usage
amounts to the same thing as using the loss function
CrossEntropyLoss that I had presented previously in the lecture on
Object Detection and Localization (see Slide 8 of that lecture).
In the sense mentioned above, we can say that NLLLoss,
LogSoftmax, and CrossEntropyLoss are closely related concepts,
despite the fact that two of them are loss functions and one an
activation function.
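That equivalence is easy to verify numerically; here is a quick sketch with toy tensors of my own choosing:

    import torch
    import torch.nn as nn

    logits = torch.randn(3, 5)                  # raw outputs for 3 samples, 5 classes
    labels = torch.tensor([1, 0, 4])            # true class indices

    nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), labels)
    ce  = nn.CrossEntropyLoss()(logits, labels)
    print(torch.allclose(nll, ce))              # True: the two computations agree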
On the next slide, I'll add to the mix of those three related concepts the activation function Softmax.
Purdue University 18
Softmax as the Activation Function in Word2vec

The Softmax Activation Function (contd.)


The Softmax activation function shown below looks rather similar to
the cross-entropy loss function presented on Slide 8 of the lecture on
Object Detection:
\text{Softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}

yet the two functions carry very different meanings, which has nothing to do with the fact that the cross-entropy formula requires you to take the negative log of a ratio that looks like what is shown above.
The cross-entropy formula presented in the Object Detection lecture focuses specifically on just the output node that is supposed to carry the true class label of the input, notwithstanding the appearance of all the nodes in the denominator, which is used to normalize the value at the node the formula focuses on.
On the other hand, the Softmax formula shown above places equal
focus on all the output nodes. That is, the values at all the nodes are
normalized by the same denominator.
Purdue University 19
Softmax as the Activation Function in Word2vec

The Softmax Activation Function (contd.)


The best interpretation of the formula for Softmax shown on the
previous slide is that it converts all the output values into a
probability distribution.
As to why: the value of the ratio shown in the formula is guaranteed to be positive, is guaranteed to not exceed 1, and the ratios calculated at all the output nodes are guaranteed to sum to exactly 1.
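A two-line check of these properties (toy numbers of my own choosing):

    import torch

    x = torch.tensor([2.0, -1.0, 0.5, 3.0])
    p = torch.nn.Softmax(dim=0)(x)
    print(p)           # every entry lies in (0, 1)
    print(p.sum())     # tensor(1.) -- the entries sum to 1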
That the output of the activation function can be treated as a
probability is important to word2vec because it allows us to talk about
each output as being the conditional probability of the corresponding
word in the vocab being the context word for the input focus word.
To elaborate, let w_i represent the i-th row of the W_{V×N} matrix of weights for the first linear layer and let w'_j represent the j-th column of the W'_{N×V} matrix of weights for the second linear layer in the word2vec network.
Purdue University 20
Softmax as the Activation Function in Word2vec

The Softmax Activation Function (contd.)


If we use x_j to denote the output of the second linear layer (that is, prior to its entering the activation function) at the j-th node, we can write

x_j = {w'_j}^T w_i

In light of the probabilistic interpretation given to the output of the activation function, we now claim the following: If we let p(j|i) be the conditional probability that the j-th vocab word at, obviously, the j-th output node is a context word for the i-th focus word, we have

p(j|i) = \frac{e^{{w'_j}^T w_i}}{\sum_k e^{{w'_k}^T w_i}}

The goal of training a word2vec network is to maximize this conditional probability for the actual context words for any given focus word.
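In code form, the p(j|i) values for a given focus word are just a softmax over the dot products of that word's embedding with every column of the second-layer matrix. A sketch with random matrices standing in for the learned weights:

    import torch

    V, N = 1000, 128
    W       = torch.randn(V, N)        # rows w_i: the word embeddings (first layer)
    W_prime = torch.randn(N, V)        # columns w'_j: weights of the second layer

    i = 42                             # index of the focus word
    scores = W[i] @ W_prime            # x_j = w'_j^T w_i for every j, shape (V,)
    p = torch.softmax(scores, dim=0)   # p(j|i) for all j; sums to 1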
Purdue University 21
Softmax as the Activation Function in Word2vec

The Softmax Activation Function (contd.)


That takes us to the following issues:
1 How to best measure the loss between the true conditional probability
for the context words and the current estimates for the same; and
2 How to backpropagate the loss?

These issues are addressed in the next section.

Purdue University 22
Training the Word2vec Network

Outline

1 Importance of Text Search and Retrieval

2 How the Word Embeddings are Learned in Word2vec

3 Softmax as the Activation Function in Word2vec

4 Training the Word2vec Network

5 Using Word2vec for Improving the Quality of Text Retrieval

Purdue University 23
Training the Word2vec Network

Training for Word2vec

As you already know, the word2vec network is trained by scanning the text corpus with a window of size 2W + 1, where W is typically between 5 and 10, and, for each focus word, testing the output against the one-hot representations for the 2W context words for the focus word.

That raises the issue of how to present the context words at the
output for the calculation of the loss.

In keeping with the conditional probability interpretation of the forward-projected output as presented in the last section, the target output could be a V-element tensor that is a 2W-version of a one-hot tensor: a V-element tensor in which just those elements are 1 that correspond to the context words, with the rest of the elements being 0s.

Purdue University 24
Training the Word2vec Network

Calculating the Loss


A target tensor such as the one described above would look like
p_T(j|i) = \begin{cases} \frac{1}{2W+1} & \text{if } j \text{ is a context word for } i \\ 0 & \text{otherwise} \end{cases}

The calculation of loss now amounts to comparing the estimated probability distribution on Slide 21 against the target probability distribution shown above. This is best done with the more general cross-entropy loss formula shown on Slide 10 of the lecture on Object Detection. With the notation I am using in this lecture, that formula becomes:
\begin{aligned}
\text{Loss}_i(p_T, p) &= -\sum_{j \in \text{context}(i)} p_T(j|i) \cdot \log_2 p(j|i) \\
&= -\sum_{j \in \text{context}(i)} \frac{1}{2W+1} \cdot \log_2 \left( \frac{e^{{w'_j}^T w_i}}{\sum_k e^{{w'_k}^T w_i}} \right) \\
&= \sum_{j \in \text{context}(i)} \left( -\log_2 e^{{w'_j}^T w_i} + \log_2 \sum_k e^{{w'_k}^T w_i} \right) \qquad \text{(ignoring inconsequential terms)} \\
&= \sum_{j \in \text{context}(i)} \left( -{w'_j}^T w_i + \log_2 \sum_k e^{{w'_k}^T w_i} \right)
\end{aligned}
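Here is a sketch of how the target distribution and this loss could be computed (and backpropagated) in PyTorch. The tensor names and toy sizes are mine, and, as on the slide, I treat the base-2 versus natural-log distinction as an inconsequential constant:

    import torch

    V, N, W_half = 1000, 128, 5                        # vocab size, embedding size, W
    W_mat   = torch.randn(V, N, requires_grad=True)    # first-layer weights (the embeddings)
    W_prime = torch.randn(N, V, requires_grad=True)    # second-layer weights

    i = 42                                             # focus word index
    context = [17, 99, 500, 43, 44, 41, 40, 900, 7, 3] # indices of the 2W context words

    # Target distribution p_T(j|i): 1/(2W+1) on the context words, 0 elsewhere
    p_T = torch.zeros(V)
    p_T[context] = 1.0 / (2 * W_half + 1)

    scores = W_mat[i] @ W_prime                        # x_j for every j
    log_p  = torch.log_softmax(scores, dim=0)          # log p(j|i)

    loss = -(p_T * log_p).sum()                        # cross-entropy against the target
    loss.backward()                                    # gradients w.r.t. W_mat and W_prime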

Purdue University 25
Training the Word2vec Network

Computing the Gradients of the Loss


The subscript i in the notation Loss_i is meant to indicate that the i-th word of the vocabulary is the focus word at the moment and that the loss at the output is being calculated for that focus word.

Given the simple form for the loss as shown on the last slide, it is easy to write the formulas for the gradients of the loss with respect to all the learnable parameters. To see how that would work, here is a rewrite of the equation shown at the bottom of the previous slide:
\text{Loss}_i = \sum_{j \in \text{context}(i)} \left( -{w'_j}^T w_i + \log_2 \sum_k e^{{w'_k}^T w_i} \right)

where the subscript i on Loss means that this is the loss when the word whose position index in the vocab is i is fed into the network.

As shown on the next slide, the form presented above lends itself
simply to the calculation of the gradients of the loss for updating the
learnable weights.
Purdue University 26
Training the Word2vec Network

The Gradients of the Loss (contd.)

For arbitrary values for the indices s and t, we get the following
expression for the gradients of the loss with respect to the elements of
the matrix W :
\frac{\partial \text{Loss}_i}{\partial w_{st}} = \sum_{j \in \text{context}(i)} \left[ -\frac{\partial \left( \vec{w'}_j^{\,T} \vec{w}_i \right)}{\partial w_{st}} + \frac{1}{\sum_k e^{\vec{w'}_k^{\,T} \vec{w}_i}} \cdot \frac{\partial \left( \sum_k e^{\vec{w'}_k^{\,T} \vec{w}_i} \right)}{\partial w_{st}} \right]

where I have introduced the arrowed vector notation \vec{w}_i so that you can distinguish between the elements of the matrix and the row and column vectors of the same matrix.
You will end up with a similar form for the loss gradients \frac{\partial \text{Loss}_i}{\partial w'_{st}}.

I leave it to the reader to simplify these expressions further. You might find useful the derivations presented in the following paper:
https://arxiv.org/abs/1411.2738

Purdue University 27
Using Word2vec for Improving the Quality of Text Retrieval

Outline

1 Importance of Text Search and Retrieval

2 How the Word Embeddings are Learned in Word2vec

3 Softmax as the Activation Function in Word2vec

4 Training the Word2vec Network

5 Using Word2vec for Improving the Quality of Text Retrieval

Purdue University 28
Using Word2vec for Improving the Quality of Text Retrieval

Using Word2Vec for Text Retrieval


The numerical word embeddings generated by the word2vec network
allow us to establish an easy-to-calculate similarity criterion for the
words: We consider two words to be similar if their embedding vectors
are close using, say, the Euclidean distance between them.
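A minimal sketch of that similarity criterion (the vocabulary and the random vectors below are placeholders of my own; with actual learned embeddings the neighbors become meaningful):

    import torch

    vocab = ["parameter", "param", "argument", "method", "file", "socket"]
    emb = torch.randn(len(vocab), 500)     # stand-in for learned 500-element embeddings

    def most_similar(word, k=3):
        """Return the k words whose embedding vectors are closest in Euclidean distance."""
        i = vocab.index(word)
        dists = torch.norm(emb - emb[i], dim=1)   # Euclidean distance to every word
        dists[i] = float("inf")                   # exclude the word itself
        return [vocab[int(j)] for j in torch.topk(dists, k, largest=False).indices]

    print(most_similar("parameter"))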

Just to show an example of how powerful word2vec is at establishing word similarities (that a lot of people would refer to as "semantic similarities"), the figure at the top on the next slide shows the word clusters discovered by word2vec for software repositories.

The similarity clusters shown on the next slide were obtained by Shayan Akbar in his research on automatic bug localization. To generate the word embeddings, he downloaded 35,000 Java repositories from GitHub, which resulted in 0.5 million software-centric terms. So 500,000 is the size of the vocabulary here. He used N = 500 for the size of the word embedding vectors.
Purdue University 29
Using Word2vec for Improving the Quality of Text Retrieval

Using Word2Vec for Text Retrieval (contd.)

Figure: Some examples of word clusters obtained through the similarity of their embeddings.

Purdue University 30
Using Word2vec for Improving the Quality of Text Retrieval

Using Word2Vec for Text Retrieval (contd.)


Shown below are additional examples of word similarities discovered through the same word embeddings as mentioned on Slide 29. In this display, we seek the three words that are most similar to the words listed in the top row.
You’ve got to agree that it is almost magical that after digesting half
a million software-centric words, the system can figure out
automatically that “parameter”, “param”, “method”, and
“argument” are closely related concepts. The same comment applies
to the other columns in the table.

Figure: Additional examples of software-centric word similarities based on their learned embeddings.

Purdue University 31
Using Word2vec for Improving the Quality of Text Retrieval

Using Word2Vec for Text Retrieval (contd.)


Now that we know how to establish “semantic” word similarities, the
question remains how to use the similarities for improving the quality
of retrieval.
How that problem was solved in the context of retrieval from software
repositories is described in our 2019 MSR publication:

https://engineering.purdue.edu/RVL/Publications/Akbar_SCOR_Source_Code_Retrieval_2019_MSR_paper.pdf

Purdue University 32
