Distributed Word Representations For Information Retrieval

This document provides an overview of distributed word representations for information retrieval. It discusses how word embeddings can be used to better understand semantic relationships between words compared to traditional keyword matching. Word embeddings represent words as dense vectors in a low-dimensional space such that similar words have similar embeddings. This allows capturing semantic relationships that can help improve information retrieval, for example by matching queries to documents that use synonymous terms. The document outlines approaches for learning word embeddings from large text corpora using neural networks to predict words from their contexts.


CSE 435/535

Information Retrieval
Fall 2019

Distributed Word Representations for Information Retrieval
Material from:
Christopher Manning and Pandu Nayak

Srihari-CSE535-Fall2019
Introduction to Information Retrieval Sec. 9.2.2

How can we more robustly match a user’s search intent?
We want to understand a query, not just do String equals()
§ If user searches for [Dell notebook battery size], we would like
to match documents discussing “Dell laptop battery capacity”
§ If user searches for [Seattle motel], we would like to match
documents containing “Seattle hotel”

A pure keyword-matching IR system does nothing to help….


Simple facilities that we have already discussed do a bit to help
§ Spelling correction
§ Stemming / case folding
But we’d like to better understand when query/document match
Introduction to Information Retrieval Sec. 9.2.2

How can we more robustly match a user’s search intent?
Query expansion:
§ Relevance feedback could allow us to capture this if we get
near enough to matching documents with these words
§ We can also use information on word similarities:
§ A manual thesaurus of synonyms for query expansion
§ A measure of word similarity
§ Calculated from a big document collection
§ Calculated by query log mining (common on the web)
Document expansion:
§ Use of anchor text may solve this by providing human
authored synonyms, but not for new or less popular web
pages, or non-hyperlinked collections
Introduction to Information Retrieval Sec. 9.2.2

Example of manual thesaurus


Introduction to Information Retrieval

Search log query expansion


§ Context-free query expansion ends up problematic
§ [wet ground] ≈ [wet earth]
§ So expand [ground] ⇒ [ground earth]
§ But [ground coffee] ≠ [earth coffee]
§ You can learn query context-specific rewritings from
search logs by attempting to identify the same user
making a second attempt at the same user need
§ [Hinton word vector]
§ [Hinton word embedding]
§ In this context, [vector] ≈ [embedding]
§ But not when talking about a disease vector or C++!
Introduction to Information Retrieval Sec. 9.2.3

Automatic Thesaurus Generation


§ Attempt to generate a thesaurus automatically by
analyzing a collection of documents
§ Fundamental notion: similarity between two words
§ Definition 1: Two words are similar if they co-occur with
similar words.
§ Definition 2: Two words are similar if they occur in a
given grammatical relation with the same words.
§ You can harvest, peel, eat, prepare, etc. apples and
pears, so apples and pears must be similar.
§ Co-occurrence-based similarity is more robust; grammatical relations are more accurate. Why?
Introduction to Information Retrieval Sec. 9.2.3

Simple Co-occurrence Thesaurus


§ Simplest way to compute one is based on term-term similarities in C = AA^T, where A is the term-document matrix.
§ w_{i,j} = (normalized) weight for (t_i, d_j)
[Figure: A is an M × N matrix with rows t_i (terms) and columns d_j (documents)]
§ What does C contain if A is a term-document incidence (0/1) matrix?
§ For each t_i, pick terms with high values in C
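A minimal sketch of this construction (the term list and weight matrix below are my own toy numbers, echoing the apples/pears and ground coffee examples above, not data from the slides): build C = AA^T with NumPy and read off each term's nearest neighbors.

```python
import numpy as np

terms = ["apple", "pear", "harvest", "coffee", "ground"]
# Hypothetical 5-term x 4-document weight matrix A (e.g., tf-idf weights).
A = np.array([
    [0.8, 0.0, 0.5, 0.0],   # apple
    [0.7, 0.0, 0.6, 0.0],   # pear
    [0.4, 0.1, 0.7, 0.0],   # harvest
    [0.0, 0.9, 0.0, 0.3],   # coffee
    [0.0, 0.8, 0.0, 0.6],   # ground
])

C = A @ A.T   # term-term similarity matrix

for i, t in enumerate(terms):
    sims = C[i].copy()
    sims[i] = -np.inf                          # ignore self-similarity
    nearest = [terms[j] for j in np.argsort(sims)[::-1][:2]]
    print(t, "->", nearest)
```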
Introduction to Information Retrieval

Automatic thesaurus generation


example … sort of works
Word Nearest neighbors
absolutely absurd, whatsoever, totally, exactly, nothing
bottomed dip, copper, drops, topped, slide, trimmed
captivating shimmer, stunningly, superbly, plucky, witty
doghouse dog, porch, crawling, beside, downstairs
makeup repellent, lotion, glossy, sunscreen, skin, gel
mediating reconciliation, negotiate, cease, conciliation
keeping hoping, bring, wiping, could, some, would
lithographs drawings, Picasso, Dali, sculptures, Gauguin
pathogens toxins, bacteria, organisms, bacterial, parasites
senses grasp, psyche, truly, clumsy, naïve, innate

Too little data (10s of millions of words) treated by too sparse a method: 100,000 words ⇒ 10^10 entries in C.
Introduction to Information Retrieval Sec. 9.2.2

How can we represent term relations?


§ With the standard symbolic encoding of terms, each term is a
dimension
§ Different terms have no inherent similarity
§ motel [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0]^T hotel [0 0 0 0 0 0 0 3 0 0 0 0 0 0 0] = 0
§ If query on hotel and document has motel, then our query
and document vectors are orthogonal
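A two-line illustration of that orthogonality (the vocabulary positions and the weight 3 follow the slide's example; everything else is assumed):

```python
import numpy as np

motel = np.zeros(15); motel[10] = 1      # one-hot "motel"
hotel = np.zeros(15); hotel[7] = 3       # "hotel", weight 3 as on the slide
print(motel @ hotel)                     # 0.0 -- orthogonal, no similarity signal
```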
Introduction to Information Retrieval

Can you directly learn term relations?


§ Basic IR is scoring on q^T d
§ No treatment of synonyms; no machine learning
§ Can we learn parameters W to rank via q^T W d ?

§ Cf. Query translation models: Berger and Lafferty (1999)

§ Problem is again sparsity – W is huge: > 10^10 entries
Introduction to Information Retrieval

Is there a better way?


§ Idea:
§ Can we learn a dense low-dimensional representation of a word in ℝ^d such that dot products u^T v express word similarity?
§ We could still, if we want, include a “translation” matrix between vocabularies (e.g., cross-language): u^T W v
§ But now W is small!
§ Supervised Semantic Indexing (Bai et al. Journal of
Information Retrieval 2009) shows successful use of
learning W for information retrieval

§ But we’ll develop direct similarity in this class


Introduction to Information Retrieval

Distributional similarity based representations
§ You can get a lot of value by representing a word by
means of its neighbors
§ “You shall know a word by the company it keeps”
§ (J. R. Firth 1957: 11)

§ One of the most successful ideas of modern statistical NLP
…government debt problems turning into banking crises as happened in 2009…
…saying that Europe needs unified banking regulation to replace the hodgepodge…
…India has just given its banking system a shot in the arm…

These words will represent banking

Introduction to Information Retrieval

Solution: Low dimensional vectors


§ The number of topics that people talk about is small
(in some sense)
§ Clothes, movies, politics, …
• Idea: store “most” of the important information in a
fixed, small number of dimensions: a dense vector
• Usually 25 – 1000 dimensions

• How to reduce the dimensionality?


• Go from big, sparse co-occurrence count vector to low
dimensional “word embedding”

Introduction to Information Retrieval Sec. 18.2

Traditional Way:
Latent Semantic Indexing/Analysis
§ Use Singular Value Decomposition (SVD) – kind of like
Principal Components Analysis (PCA) for an arbitrary
rectangular matrix – or just random projection to find a low-
dimensional basis or orthogonal vectors
§ Theory is that similarity is preserved as much as possible
§ You can actually gain in IR (slightly) by doing LSA, as “noise”
of term variation gets replaced by semantic “concepts”
§ Somewhat popular in the 1990s [Deerwester et al. 1990, etc.]
§ But results were always somewhat iffy (… it worked sometimes)
§ Hard to implement efficiently in an IR system (dense vectors!)
§ Discussed in IIR chapter 18, but not discussed further here
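For concreteness, a small LSA-style sketch (assumes scikit-learn is available; the four documents are my own toy examples, reusing the query/document pairs from earlier): reduce a tf-idf term-document matrix to a few latent "concept" dimensions with truncated SVD.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "dell laptop battery capacity",
    "dell notebook battery size",
    "seattle hotel rooms",
    "seattle motel vacancy",
]

A = TfidfVectorizer().fit_transform(docs)        # documents x terms
lsa = TruncatedSVD(n_components=2, random_state=0)
doc_concepts = lsa.fit_transform(A)              # dense "concept" vectors per document
print(doc_concepts.shape)                        # (4, 2)
```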
Introduction to Information Retrieval

“NEURAL EMBEDDINGS”
Introduction to Information Retrieval

Word meaning is defined in terms of vectors
§ We will build a dense vector for each word type,
chosen so that it is good at predicting other words
appearing in its context
… those other words also being represented by vectors … it all gets a bit recursive

banking = [0.286, 0.792, −0.177, −0.107, 0.109, −0.542, 0.349, 0.271]
Introduction to Information Retrieval

Neural word embeddings - visualization

Introduction to Information Retrieval

Basic idea of learning neural network word embeddings
§ We define a model that aims to predict between a center word w_t and context words in terms of word vectors
§ p(context | w_t) = …
§ which has a loss function, e.g.,
§ J = 1 − p(w_{−t} | w_t)
§ We look at many positions t in a big language corpus
§ We keep adjusting the vector representations of
words to minimize this loss
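One way to make p(context | w_t) concrete (toy random vectors of my own, not the lecture's parameters): a softmax over dot products between "outside" word vectors and the center word's vector, as in word2vec.

```python
import numpy as np

V, d = 6, 4                       # toy vocabulary size and embedding dimension
rng = np.random.default_rng(0)
U = rng.normal(size=(V, d))       # "outside" (context) vectors u_w
Wc = rng.normal(size=(V, d))      # center-word vectors v_w

def p_context_given_center(o, c):
    """Probability of outside word o given center word c: softmax over dot products."""
    scores = U @ Wc[c]                     # u_w . v_c for every word w
    exp = np.exp(scores - scores.max())    # numerically stable softmax
    return exp[o] / exp.sum()

print(p_context_given_center(o=2, c=0))
```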
Introduction to Information Retrieval

Idea: Directly learn low-dimensional word vectors based on ability to predict
• Old idea: Learning representations by back-propagating errors (Rumelhart et al., 1986)
• A neural probabilistic language model (Bengio et al., 2003) [non-linear and slow]
• NLP (almost) from Scratch (Collobert & Weston, 2008) [non-linear and slow]
• A recent, even simpler and faster model: word2vec (Mikolov et al. 2013) → intro now [fast bilinear models]
• The GloVe model from Stanford (Pennington, Socher, and Manning 2014) connects back to matrix factorization [fast bilinear models]
• Per-token representations: deep contextual word representations: ELMo, ULMfit, BERT [current state of the art]

Introduction to Information Retrieval

Word2vec is a family of algorithms


[Mikolov et al. 2013]

Predict between every word and its context words!

Two algorithms
1. Skip-grams (SG)
Predict context words given target (position independent)
2. Continuous Bag of Words (CBOW)
Predict target word from bag-of-words context

Two (moderately efficient) training methods


1. Hierarchical softmax
2. Negative sampling
3. Naïve softmax
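A hedged usage sketch (assumes gensim ≥ 4.0; the tokenized sentences are adapted from the banking examples earlier, everything else is illustrative): skip-gram with negative sampling vs. CBOW with hierarchical softmax.

```python
from gensim.models import Word2Vec

sentences = [
    ["government", "debt", "problems", "turning", "into", "banking", "crises"],
    ["europe", "needs", "unified", "banking", "regulation"],
    ["india", "has", "just", "given", "its", "banking", "system", "a", "shot"],
]

# Skip-gram (sg=1) trained with negative sampling.
sg_model = Word2Vec(sentences, vector_size=50, window=2, sg=1,
                    negative=5, hs=0, min_count=1, epochs=50, seed=0)
# CBOW (sg=0) trained with hierarchical softmax.
cbow_model = Word2Vec(sentences, vector_size=50, window=2, sg=0,
                      negative=0, hs=1, min_count=1, epochs=50, seed=0)

print(sg_model.wv.most_similar("banking", topn=3))
```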
Introduction to Information Retrieval

Word2Vec Skip-gram Overview


§ Example windows and process for computing P(w_{t+j} | w_t)

P(w_{t−2} | w_t)   P(w_{t−1} | w_t)   P(w_{t+1} | w_t)   P(w_{t+2} | w_t)

… problems turning into banking crises as …

outside context words in window of size 2 | center word at position t | outside context words in window of size 2

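The window mechanics, spelled out in plain Python (sentence taken from the slide; the pair format is my own): every (center, context) pair within distance 2 becomes a training example.

```python
sentence = "problems turning into banking crises as".split()
window = 2

pairs = []
for t, center in enumerate(sentence):
    for j in range(-window, window + 1):
        if j != 0 and 0 <= t + j < len(sentence):
            pairs.append((center, sentence[t + j]))

# For the center word "banking" this yields the four pairs shown above:
# ('banking', 'turning'), ('banking', 'into'), ('banking', 'crises'), ('banking', 'as')
print(pairs)
```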
Introduction to Information Retrieval

Introduction to Information Retrieval

Linear Relationships in word2vec


These representations are very good at encoding
similarity and dimensions of similarity!
§ Analogies testing dimensions of similarity can be
solved quite well just by doing vector subtraction in
the embedding space
Syntactically
§ x_apple − x_apples ≈ x_car − x_cars ≈ x_family − x_families
§ Similarly for verb and adjective morphological forms
Semantically (Semeval 2012 task 2)
§ x_shirt − x_clothing ≈ x_chair − x_furniture
§ x_king − x_man ≈ x_queen − x_woman
Introduction to Information Retrieval

Word Analogies
Test for linear relationships, examined by Mikolov et al.

a : b :: c : ?

man : woman :: king : ?

king [0.30 0.70] − man [0.20 0.20] + woman [0.60 0.30] ≈ queen [0.70 0.80]

[Figure: 2-D plot of man, woman, king, queen; the man→woman and king→queen offsets are parallel]
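With pretrained vectors this is a one-liner (hedged sketch; "vectors.bin" is a hypothetical path to word2vec-format embeddings):

```python
from gensim.models import KeyedVectors

wv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)  # hypothetical path
# king - man + woman ~= queen
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```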


Introduction to Information Retrieval

GloVe Visualizations

http://nlp.stanford.edu/projects/glove/
Introduction to Information Retrieval

GloVe Visualizations: Company - CEO

Introduction to Information Retrieval

GloVe Visualizations: Superlatives

Introduction to Information Retrieval

Application to Information Retrieval


Application is just beginning – we’re “at the end of the early years”
§ Google’s RankBrain – little is publicly known
§ Bloomberg article by Jack Clark (Oct 26, 2015):
http://www.bloomberg.com/news/articles/2015-10-26/google-turning-its-lucrative-web-search-over-to-ai-machines
§ A result reranking system. “3rd most valuable ranking signal”
§ But note: more of the potential value is in the tail?
§ New SIGIR Neu-IR workshop series (2016 on)
Introduction to Information Retrieval

An application to information retrieval


Nalisnick, Mitra, Craswell & Caruana. 2016. Improving Document
Ranking with Dual Word Embeddings. WWW 2016 Companion.
http://research.microsoft.com/pubs/260867/pp1291-Nalisnick.pdf
Mitra, Nalisnick, Craswell & Caruana. 2016. A Dual Embedding
Space Model for Document Ranking. arXiv:1602.01137 [cs.IR]

Builds on BM25 model idea of “aboutness”


§ Not just term repetition indicating aboutness
§ Relationship between query terms and all terms in the
document indicates aboutness (BM25 uses only query terms)
Makes clever argument for different use of word and context
vectors in word2vec’s CBOW/SGNS or GloVe
Introduction to Information Retrieval

Modeling document aboutness: Results from a search for Albuquerque
[Figure: passages from two retrieved documents, d1 and d2]
Introduction to Information Retrieval

Using 2 word embeddings


word2vec model with 1 word of context

W_IN: embeddings for focus words      W_OUT: embeddings for context words

We can gain by using these two embeddings differently
Introduction to Information Retrieval

Using 2 word embeddings


Introduction to Information Retrieval

Dual Embedding Space Model (DESM)


§ Simple model
§ A document is represented by the centroid of its
word vectors

§ Query-document similarity is average over query words of cosine similarity
Introduction to Information Retrieval

Dual Embedding Space Model (DESM)


§ What works best is to use the OUT vectors for the
document and the IN vectors for the query

§ This way similarity measures aboutness – words that appear with this word – which is more useful in this context than (distributional) semantic similarity
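A minimal DESM-style scoring sketch under those choices (my own helper, not the authors' code; `w_in` and `w_out` are assumed dicts from term to embedding): IN vectors for query terms, OUT-vector centroid for the document, mean cosine similarity as the score.

```python
import numpy as np

def desm_score(query_terms, doc_terms, w_in, w_out):
    """w_in / w_out: dicts mapping a term to its IN / OUT embedding (1-D arrays)."""
    # Document = centroid of its normalized OUT vectors.
    doc_vecs = np.stack([w_out[t] / np.linalg.norm(w_out[t]) for t in doc_terms])
    centroid = doc_vecs.mean(axis=0)
    centroid /= np.linalg.norm(centroid)

    # Score = average cosine similarity of query IN vectors with the centroid.
    sims = [float((w_in[t] / np.linalg.norm(w_in[t])) @ centroid) for t in query_terms]
    return sum(sims) / len(sims)
```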
Introduction to Information Retrieval

Experiments
§ Train word2vec from either
§ 600 million Bing queries
§ 342 million web document sentences
§ Test on 7,741 randomly sampled Bing queries
§ 5 level eval (Perfect, Excellent, Good, Fair, Bad)
§ Two approaches
1. Use DESM model to rerank top results from BM25
2. Use DESM alone or a mixture model of it and BM25
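The mixture in approach 2 can be as simple as a weighted sum (sketch only; the weight `lam` is a tunable parameter, not a value from the paper):

```python
def mixture_score(desm_score, bm25_score, lam=0.1):
    # lam is a tunable mixture weight balancing DESM against BM25.
    return lam * desm_score + (1.0 - lam) * bm25_score
```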
Introduction to Information Retrieval

Results – reranking k-best list

Pretty decent gains – e.g., 2% for NDCG@3


Gains are bigger for model trained on queries than docs
Introduction to Information Retrieval

Results – whole ranking system

By itself, DESM doesn’t work. Nor does LSA.


Introduction to Information Retrieval

A possible explanation

IN-OUT has some ability to prefer relevant documents to close-by (judged) non-relevant ones, but its scores induce too much noise vs. BM25 to be usable alone
Introduction to Information Retrieval

DESM conclusions
§ DESM is a weak ranker but effective at finding subtler
similarities/aboutness
§ It is effective at, but only at, reranking at least
somewhat relevant documents

§ For example, DESM can confuse Oxford and Cambridge


§ Bing rarely makes an Oxford/Cambridge mistake!
Introduction to Information Retrieval

What else can neural nets do in IR?


§ Use a neural network as a supervised
reranker
§ Assume a query and document
embedding network (as we have
discussed)
§ Assume you have (q,d,rel) relevance
data
§ Learn a neural network (with
supervised learning) to predict
relevance of (q,d) pair
§ An example of “machine-learned
relevance”
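A generic sketch of such a supervised reranker (assumed architecture with placeholder sizes, trained on (q, d, rel) triples; not a specific published model): a small feed-forward network over concatenated query and document embeddings.

```python
import torch
import torch.nn as nn

class Reranker(nn.Module):
    def __init__(self, emb_dim=300, hidden=128):
        super().__init__()
        # Score a (query, document) pair from concatenated embeddings.
        self.net = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, q_emb, d_emb):
        return self.net(torch.cat([q_emb, d_emb], dim=-1)).squeeze(-1)

model = Reranker()
loss_fn = nn.BCEWithLogitsLoss()            # relevance labels treated as 0/1
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One illustrative training step on random stand-in data for (q, d, rel) triples.
q, d = torch.randn(8, 300), torch.randn(8, 300)
rel = torch.randint(0, 2, (8,)).float()
loss = loss_fn(model(q, d), rel)
optimizer.zero_grad(); loss.backward(); optimizer.step()
```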
Introduction to Information Retrieval

What else can neural nets do in IR?


§ BERT: Devlin, Chang, Lee, Toutanova (2018)
§ A deep transformer-based neural network
§ Builds per-token (in context) representations
§ Produces a query/document
representation as well
§ Or jointly embed query and
document and ask for a
retrieval score
§ Incredibly effective!
§ https://arxiv.org/abs/1810.04805
Introduction to Information Retrieval

Summary: Embed all the things!


Word embeddings are the hot new technology (again!)

Lots of applications wherever knowing word context or similarity helps prediction:
§ Synonym handling in search
§ Document aboutness
§ Ad serving
§ Language models: from spelling correction to email response
§ Machine translation
§ Sentiment analysis
§ …
Introduction to Information Retrieval

Global vs. local embedding [Diaz 2016]


Introduction to Information Retrieval

Global vs. local embedding [Diaz 2016]

Train w2v on documents from first round of retrieval

Fine-grained word sense disambiguation
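A hedged sketch of that pipeline (assumes gensim plus a hypothetical `bm25_search` callable that returns the tokenized top-k documents): train word2vec only on the first-round results and use it to expand the query.

```python
from gensim.models import Word2Vec

def local_expansion(query_terms, bm25_search, k=1000, n_expansions=5):
    """bm25_search(query_terms, k) -> list of tokenized top-k documents (hypothetical helper)."""
    top_docs = bm25_search(query_terms, k)
    # "Local" embedding: word2vec trained only on the first-round retrieval results.
    local_w2v = Word2Vec(top_docs, vector_size=100, window=5,
                         sg=1, min_count=2, epochs=5)
    expansions = []
    for t in query_terms:
        if t in local_w2v.wv:
            expansions += [w for w, _ in local_w2v.wv.most_similar(t, topn=n_expansions)]
    return query_terms + expansions
```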
Introduction to Information Retrieval

Ad-hoc retrieval using local and distributed representation [Mitra et al. 2017]
§ Argues both “lexical” and
“semantic” matching is
important for document
ranking
§ Duet model is a linear
combination of two DNNs
using local and distributed
representations of query/
document as inputs, and
jointly trained on labelled data (rough structural sketch below)
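A rough structural sketch of that combination (layer shapes and sizes are placeholders of my own, not the paper's architecture): one sub-network over a local, exact-match representation and one over distributed embeddings, summed to produce the score.

```python
import torch
import torch.nn as nn

class Duet(nn.Module):
    def __init__(self, local_dim=1000, dist_dim=300, hidden=300):
        super().__init__()
        # Sub-network over the local (exact-match) representation.
        self.local_net = nn.Sequential(
            nn.Linear(local_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        # Sub-network over the distributed (embedding) representation.
        self.dist_net = nn.Sequential(
            nn.Linear(dist_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x_local, x_dist):
        # Final score: linear combination (here a simple sum) of the two parts.
        return self.local_net(x_local) + self.dist_net(x_dist)
```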
