Distributed Word Representations For Information Retrieval
Information Retrieval
Fall 2019
Srihari-CSE535-Fall2019
Introduction to Information Retrieval Sec. 9.2.2
Problem: too little data (tens of millions of words) handled by too sparse a method.
A 100,000-word vocabulary means 100,000² = 10¹⁰ entries in the co-occurrence matrix C.
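To make the sparsity concrete, here is a minimal sketch (not from the lecture) of building raw co-occurrence counts with a sliding window; even on real corpora almost all of the V² cells stay zero, which is why only the nonzero pairs are stored:

```python
from collections import defaultdict

def cooccurrence_counts(sentences, window=2):
    """Count word co-occurrences within a +/- window.

    For a vocabulary of V words the full matrix C has V^2 cells,
    so 100,000 words already implies 10^10 entries, almost all zero;
    storing only nonzero (word, context) pairs is the practical choice.
    """
    counts = defaultdict(int)
    for sent in sentences:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(w, sent[j])] += 1
    return counts

# Toy example (illustrative tokens only):
pairs = cooccurrence_counts([["deep", "learning", "for", "retrieval"]], window=1)
```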
Introduction to Information Retrieval Sec. 18.2
Traditional Way: Latent Semantic Indexing/Analysis
§ Use Singular Value Decomposition (SVD) – kind of like Principal Components Analysis (PCA) for an arbitrary rectangular matrix – or just random projection to find a low-dimensional basis of orthogonal vectors
§ Theory is that similarity is preserved as much as possible
§ You can actually gain in IR (slightly) by doing LSA, as “noise” of term variation gets replaced by semantic “concepts”
§ Somewhat popular in the 1990s [Deerwester et al. 1990, etc.]
§ But results were always somewhat iffy (… it worked sometimes)
§ Hard to implement efficiently in an IR system (dense vectors!)
§ Discussed in IIR Chapter 18, but not discussed further here
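The SVD step above can be sketched in a few lines of NumPy. The term-document counts below are invented for illustration (two "shipping" terms, two "finance" terms); the rank-k truncation is what LSA keeps as its "concepts":

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents.
# (Illustrative counts only, not from the lecture.)
C = np.array([
    [2, 1, 0, 0],   # "ship"
    [1, 2, 0, 0],   # "boat"
    [0, 0, 2, 1],   # "stock"
    [0, 0, 1, 2],   # "market"
], dtype=float)

# Full SVD, then keep only the top-k singular values ("concepts").
U, s, Vt = np.linalg.svd(C, full_matrices=False)
k = 2
C_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation

# Term vectors in the reduced concept space:
terms_k = U[:, :k] * s[:k]
```

In the reduced space, "ship" and "boat" collapse onto the same concept direction, which is the sense in which term-variation "noise" gets replaced by semantic similarity.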
“NEURAL EMBEDDINGS”
Example dense embedding:
banking = (0.286, 0.792, −0.177, −0.107, 0.109, −0.542, 0.349, 0.271)
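Dense vectors like the 8-dimensional banking vector above are compared by cosine similarity. A minimal sketch, where the "finance" vector is made up purely to illustrate a nearby neighbor:

```python
import numpy as np

# The 8-dimensional "banking" vector from the slide; "finance" is a
# hypothetical nearby vector, invented here for illustration.
banking = np.array([0.286, 0.792, -0.177, -0.107, 0.109, -0.542, 0.349, 0.271])
finance = np.array([0.300, 0.750, -0.150, -0.120, 0.100, -0.500, 0.330, 0.250])

def cosine(u, v):
    """Cosine similarity: the standard closeness measure for embeddings."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim = cosine(banking, finance)
```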
Two algorithms
1. Skip-grams (SG)
Predict context words given target (position independent)
2. Continuous Bag of Words (CBOW)
Predict target word from bag-of-words context
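The two training regimes differ only in how (input, target) pairs are cut from the text. A sketch of the pair generation (the actual model then trains a classifier on these pairs):

```python
def training_pairs(tokens, window=2, mode="sg"):
    """Generate (input, target) training pairs.

    mode="sg":   skip-gram -- predict each context word from the center word.
    mode="cbow": CBOW -- predict the center word from its bag of context words.
    """
    pairs = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window),
                                  min(len(tokens), i + window + 1))
                   if j != i]
        if mode == "sg":
            pairs.extend((center, c) for c in context)  # one pair per context word
        else:
            pairs.append((tuple(context), center))      # whole bag -> center
    return pairs

sg = training_pairs(["the", "bank", "charges"], window=1, mode="sg")
```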
Word Analogies
Test for linear relationships, examined by Mikolov et al.
a:b :: c:?
man:woman :: king:?
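The linear-relationship test is vector arithmetic: find the word nearest to b − a + c. A sketch with a tiny hand-built embedding table (the 3-d vectors are invented for illustration; real GloVe/word2vec vectors have 100–300 dimensions):

```python
import numpy as np

# Hypothetical toy embeddings, chosen so the analogy arithmetic works out.
E = {
    "man":    np.array([1.0, 0.0, 0.2]),
    "woman":  np.array([1.0, 1.0, 0.2]),
    "king":   np.array([0.2, 0.0, 1.0]),
    "queen":  np.array([0.2, 1.0, 1.0]),
    "prince": np.array([0.2, 0.1, 0.9]),
}

def analogy(a, b, c, vocab):
    """Solve a:b :: c:? by finding the word closest to b - a + c."""
    target = vocab[b] - vocab[a] + vocab[c]
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    # Exclude the three query words themselves, as Mikolov et al. do.
    return max((w for w in vocab if w not in (a, b, c)),
               key=lambda w: cos(vocab[w], target))

answer = analogy("man", "woman", "king", E)
```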
GloVe Visualizations
http://nlp.stanford.edu/projects/glove/
[Diagram: word2vec's two weight matrices. W_IN holds embeddings for focus (input) words; W_OUT holds embeddings for context words.]
Experiments
§ Train word2vec from either
§ 600 million Bing queries
§ 342 million web document sentences
§ Test on 7,741 randomly sampled Bing queries
§ 5-level relevance eval (Perfect, Excellent, Good, Fair, Bad)
§ Two approaches
1. Use DESM model to rerank top results from BM25
2. Use DESM alone or a mixture model of it and BM25
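The two approaches above can be sketched as follows; the cutoff k and mixture weight lam are illustrative placeholders, not values from the experiments:

```python
def rerank_with_desm(bm25_ranked, desm_scores, k=10):
    """Approach 1: re-order only the top-k BM25 results by DESM score,
    leaving the tail in BM25 order."""
    head = sorted(bm25_ranked[:k], key=lambda d: desm_scores[d], reverse=True)
    return head + bm25_ranked[k:]

def mixture_score(desm, bm25, lam=0.1):
    """Approach 2: linear mixture of DESM and BM25 scores.
    lam is a tunable weight (0.1 is illustrative, not from the paper)."""
    return lam * desm + (1 - lam) * bm25

# Toy usage: rerank the top 2 of 3 BM25 results by (made-up) DESM scores.
ranked = rerank_with_desm(["d1", "d2", "d3"],
                          {"d1": 0.1, "d2": 0.9, "d3": 0.5}, k=2)
```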
DESM conclusions
§ DESM is a weak ranker but effective at finding subtler similarities/aboutness
§ It is effective at, but only at, reranking at least somewhat relevant documents