Natural Language

Processing
Lecture 12: Lexical Semantics (part I) -
Word Representations and Word Embeddings.

11/30/2020

COMS W4705
Yassine Benajiba
Jabberwocky
• Can you identify what the words in this poem mean?

Beware the jabberwock, my son
the jaws that bite, the claws that catch!
Beware the jubjub bird, and
the frumious bandersnatch!
"Jabberwocky", Lewis Carroll, 1871
Semantic Similarity and
Relatedness
• We can often tell that two words are similar or related, even if
they aren't exact synonyms.

• "fast" is similar to "rapid" and "speed"

• "tall" is similar to "high" and "height"

• Question answering:

• Q: "How tall is Mt. Everest?"

• Candidate A: "The official height of Mount Everest is 29029 feet"
Relatedness
• "cat" is more similar to "dog" than to "table"

• "table" is more similar to "chair" than to "dog"

• "run" is more similar to "fly" than to "think".

• "cat" is more similar to "meow" than to "bark".


Single Word Representation:
One-Hot Vector

[Figure: a one-hot vector of dimension |V|. The entry for the represented word (here "fish") is 1; every other entry, for the rest of the vocabulary from "a" to "zythum", is 0.]
What about unseen words?
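A minimal sketch of a one-hot representation over a toy vocabulary (the vocabulary and helper names are illustrative, not from the slides); note how an unseen word has no index at all:

```python
import numpy as np

# Toy vocabulary; in practice |V| is the full corpus vocabulary.
vocab = ["a", "fish", "zythum"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    v = np.zeros(len(vocab))       # |V|-dimensional vector of zeros
    v[word_to_idx[word]] = 1.0     # single 1 at the word's index
    return v

print(one_hot("fish"))             # [0. 1. 0.]
# one_hot("tesgüino") raises KeyError: unseen words have no representation.
```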


Unknown Words
A bottle of tesgüino is on the table.
Everybody likes tesgüino.
Tesgüino makes you drunk.
We make tesgüino out of corn.
Example from Nida, 1975.
• Can you figure out from context what tesgüino means?

• Some kind of alcoholic beverage, maybe beer or whisky.

• Intuition: Two words should be similar if they have similar typical word contexts.
How would you represent context?
Distributional Hypothesis
• Wittgenstein ("Philosophical Investigations"):
"the meaning of a word is in its use in the language"

• Zellig Harris (1954):
"oculist and eye-doctor … occur in almost the same environments"
"If A and B have almost identical environments we say that they are synonyms."

• J.R. Firth (1957):
"you shall know a word by the company it keeps!"
Co-occurrence Matrix

        ⌼    ⊞    ⊛    ⋔    ⏈    ⍾
⌘      51   20   84    0    3    0
⌓      52   58    4    4    6   26
⊠     115   89   10   42   33   17
⊚      59   39   23    4    0    0
⁙      98   14    6    2    1    0
⁂      12   17    3    2    9   27
⎔      11    2    2    0   18    0

sim(⊠, ⌘) = 0.770
sim(⊠, ⁂) = 0.939
sim(⊠, ⌓) = 0.961

• Numbers are co-occurrence counts (how often the symbols appear together in context).

• Which symbol is most similar to ⊠?
What it really looks like
         get   see   use   hear   eat   kill
knife     51    20    84      0     3      0
cat       52    58     4      4     6     26
dog      115    89    10     42    33     17
boat      59    39    23      4     0      0
cup       98    14     6      2     1      0
pig       12    17     3      2     9     27
berry     11     2     2      0    18      0

sim(dog, knife) = 0.770
sim(dog, boat)  = 0.939
sim(dog, cat)   = 0.961

Verb-object counts

• The row vector x_dog describes the usage of dog as a grammatical object in the corpus.

• It can be seen as coordinates in n-dimensional Euclidean space.
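To make the row-vector view concrete, here is a small sketch that stores the verb-object counts above as a matrix and pulls out x_dog (variable names such as M and nouns are my own, not from the slides):

```python
import numpy as np

# Verb-object co-occurrence counts from the table above.
verbs = ["get", "see", "use", "hear", "eat", "kill"]
nouns = ["knife", "cat", "dog", "boat", "cup", "pig", "berry"]
M = np.array([
    [ 51, 20, 84,  0,  3,  0],   # knife
    [ 52, 58,  4,  4,  6, 26],   # cat
    [115, 89, 10, 42, 33, 17],   # dog
    [ 59, 39, 23,  4,  0,  0],   # boat
    [ 98, 14,  6,  2,  1,  0],   # cup
    [ 12, 17,  3,  2,  9, 27],   # pig
    [ 11,  2,  2,  0, 18,  0],   # berry
])

x_dog = M[nouns.index("dog")]    # coordinates of "dog" in 6-dimensional space
print(x_dog)                     # [115  89  10  42  33  17]
```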
Geometric Interpretation
• The row vector x_dog describes the usage of dog in the corpus.
• It can be seen as coordinates in n-dimensional Euclidean space.
• Illustrated for the two dimensions "get" and "use": x_dog = (115, 10).


Geometric Interpretation
• How should we compute similarity?
• First approach: spatial distance between words (lower distance = higher similarity).
• Potential problem: the location depends on the frequency of the noun; here count(dog) ≈ 2.7 · count(cat).
Geometric Interpretation
• How should we compute similarity?
• Second approach: direction is more important than location.
• Normalize the "length" ||x_dog|| of the vector,
• or use the angle α as a distance measure (or the cosine of the angle).

[Figure annotation: α = 54.3°, sim(dog, knife) = 0.58]
Cosine Similarity

cos(α) = (x · y) / (||x|| · ||y||)

• Collinear vectors (same direction): cos(α) = 1.

• Orthogonal vectors (90° angle, no shared attributes): cos(α) = 0.
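A minimal sketch of cosine similarity on the verb-object row vectors from the table above. The exact similarity values printed on the slides were presumably computed from a larger matrix, but the ranking (cat > boat > knife) comes out the same on these six columns:

```python
import numpy as np

def cos_sim(x, y):
    """Cosine of the angle between vectors x and y."""
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

# Row vectors taken from the verb-object count table above.
x_dog   = np.array([115, 89, 10, 42, 33, 17])
x_cat   = np.array([ 52, 58,  4,  4,  6, 26])
x_boat  = np.array([ 59, 39, 23,  4,  0,  0])
x_knife = np.array([ 51, 20, 84,  0,  3,  0])

print(round(cos_sim(x_dog, x_cat), 3))    # highest of the three
print(round(cos_sim(x_dog, x_boat), 3))
print(round(cos_sim(x_dog, x_knife), 3))  # lowest of the three
```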
What to do with DSM
similarities

• Most similar to school:

country (49.3), church (52.1), hospital (53.1), house (54.4), hotel (55.1), industry (57.0), company (57.0), home (57.7), family (58.4), university (59.0), party (59.4), group (59.5), building (59.8), market (60.3), bank (60.4), business (60.9), area (61.4), department (61.6), club (62.7), town (63.3), library (63.3), room (63.6), service (64.4), police (64.7), ...
Clustering and Semantic
Maps
• Distributional similarity/distance can be used to

• find nearest neighbors (similar words),

• cluster related words into hierarchical categories, and

• construct semantic maps (a minimal clustering sketch follows below).
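As a rough illustration, reusing the toy verb-object matrix from earlier (library choices and parameters are my own, not from the slides), nearest neighbors and a hierarchical clustering can be computed directly from cosine distances:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

nouns = ["knife", "cat", "dog", "boat", "cup", "pig", "berry"]
M = np.array([[ 51, 20, 84,  0,  3,  0], [ 52, 58,  4,  4,  6, 26],
              [115, 89, 10, 42, 33, 17], [ 59, 39, 23,  4,  0,  0],
              [ 98, 14,  6,  2,  1,  0], [ 12, 17,  3,  2,  9, 27],
              [ 11,  2,  2,  0, 18,  0]])

dist = pdist(M, metric="cosine")              # pairwise cosine distances
D = squareform(dist)                          # full distance matrix

# Nearest neighbor of "dog" = smallest non-self distance in its row.
i = nouns.index("dog")
print(nouns[np.argsort(D[i])[1]])             # -> "cat"

# Hierarchical clustering / dendrogram as a simple "semantic map".
dendrogram(linkage(dist, method="average"), labels=nouns)
plt.show()
```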


Variations of Distributional
Semantic Models
• A Distributional Semantic Model (DSM) is any matrix M in which each row represents the distribution of a term x across contexts, together with a similarity measure.

• The previous example shows one particular semantic space (frequency counts of verb-object co-occurrences).

• There are many different models we could choose.

• Different models might capture different "types" of similarity.


Dimensions of Distributional
Semantic Models
1. Preprocessing, definition of "terms" (word form, lemmas, POS, ...).

2. Context definition:

• Type of context (words, syntactic dependents (with or without relation labels), removal of stop-words, etc.)

• Size of context window.

3. Feature scaling / term weighting (association measures).

4. Normalization of rows / columns.

5. Dimensionality reduction.

6. Similarity measure.
Effect of context size
Nearest neighbors of dog

2-word window:
cat, horse, fox, pet, rabbit, pig, animal, mongrel, sheep, pigeon

30-word window:
kennel, puppy, pet, terrier, rottweiler, canine, cat, to bark, Alsatian
Term Weighting
• Problem: Not all context terms are equally relevant to
characterize the meaning of a word.

• Some appear too often, some are too rare (Zipfian distribution).

[Figure: a frequency scale with examples — too frequent: function words such as "the", "a", "can", "may"; just right: "general", "eat", "explosion", "nations"; too rare: "antiproliferative", "87-year-old", "Unified".]

• One solution: TF*IDF (term frequency * inverse document frequency).
TF*IDF
• Originates in document retrieval (find documents relevant to a keyword). For a DSM: 'document' = target word d.

• Term frequency tf(t, d): how often does the term t appear in the context window of the target word d?

• Inverse document frequency idf(t): for how many target words does t appear in the context window? The fewer, the more informative t is.

• TF*IDF (standard formulation): tfidf(t, d) = tf(t, d) · log(N / df(t)), where N is the number of target words and df(t) is the number of target words in whose context window t appears.
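A minimal sketch of applying this weighting to the toy co-occurrence matrix from earlier (the variable names and the exact IDF variant, log(N / df), are assumptions on my part):

```python
import numpy as np

# Toy co-occurrence matrix: rows = target words, columns = context terms.
M = np.array([[ 51, 20, 84,  0,  3,  0], [ 52, 58,  4,  4,  6, 26],
              [115, 89, 10, 42, 33, 17], [ 59, 39, 23,  4,  0,  0],
              [ 98, 14,  6,  2,  1,  0], [ 12, 17,  3,  2,  9, 27],
              [ 11,  2,  2,  0, 18,  0]], dtype=float)

tf = M                                   # term frequency: raw co-occurrence counts
df = (M > 0).sum(axis=0)                 # for how many target words each context term occurs
idf = np.log(M.shape[0] / df)            # inverse document frequency
M_weighted = tf * idf                    # TF*IDF-weighted matrix (idf broadcast over rows)

# Context terms that occur with every target (here "get") receive weight 0.
print(M_weighted.round(2))
```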
Sparse vs. Dense Vectors
• The full co-occurrence matrix is very big and contains a lot of 0 entries.

• Potentially inconvenient to store. Slow computation.

• Near-synonyms may still end up with (nearly) orthogonal vectors if they happen to co-occur with different context words, i.e. along dimensions that are irrelevant to their meaning.

• Word embeddings are representations of words in a low-dimensional, dense vector space. There are two main approaches:

• Use matrix decomposition of the co-occurrence matrix, for example Singular Value Decomposition (SVD) (see the sketch after this list).

• Learn embeddings using neural networks. Minimal feature engineering required.
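A minimal sketch of the first approach: truncated SVD of a (possibly weighted) co-occurrence matrix. The choice k = 2 and the variable names are illustrative, not from the slides:

```python
import numpy as np

# Toy co-occurrence matrix (rows = words); in practice this is large and sparse.
M = np.array([[ 51, 20, 84,  0,  3,  0], [ 52, 58,  4,  4,  6, 26],
              [115, 89, 10, 42, 33, 17], [ 59, 39, 23,  4,  0,  0],
              [ 98, 14,  6,  2,  1,  0], [ 12, 17,  3,  2,  9, 27],
              [ 11,  2,  2,  0, 18,  0]], dtype=float)

U, S, Vt = np.linalg.svd(M, full_matrices=False)
k = 2                                     # keep only the k largest singular values
embeddings = U[:, :k] * S[:k]             # one dense k-dimensional vector per word
print(embeddings.shape)                   # (7, 2)
```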
Learning Word Embeddings
with Neural Networks
• The neural network should capture the relationship between a
word and its context.

• Two models (Word2Vec, Mikolov et al. 2013):

• Skip-gram model: the input is a single word; predict a probability for each context word.

• Continuous bag-of-words (CBOW): the input is a representation of the context window; predict a probability for each possible target word.

• Inspired by neural language models (Bengio et al. 2003).

Skip-Gram Model
• Input:
A single word in one-hot representation.

• Output: probability to see any single word as a context word.

[Network diagram: the one-hot input for "eat" (|V| input neurons) feeds a hidden layer of d neurons, which feeds |V| output neurons with softmax activation; the outputs are context-word probabilities such as 0.02 for "a", 0.04 for "cheese", 0.03 for "place", 0.0 for "thought" and "run".]
• Softmax function normalizes the activation of the output neurons to sum up to 1.0.
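A minimal sketch of this forward pass (toy vocabulary, random weights, and dimension d = 4 are all illustrative assumptions; real Word2Vec additionally trains these weights):

```python
import numpy as np

vocab = ["a", "place", "to", "eat", "delicious", "cheese", "thought", "run"]
V, d = len(vocab), 4                       # vocabulary size, hidden-layer size

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, d))  # input -> hidden weights
W_out = rng.normal(scale=0.1, size=(d, V)) # hidden -> output weights

def softmax(z):
    e = np.exp(z - z.max())                # subtract max for numerical stability
    return e / e.sum()

# The one-hot input just selects one row of W_in: that row is the hidden activation.
h = W_in[vocab.index("eat")]               # shape (d,)
p = softmax(h @ W_out)                     # probability of each word as a context word
print(dict(zip(vocab, p.round(3))))        # sums to 1.0 because of the softmax
```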
Skip-Gram Model
• Compute error with respect to each context word.
[Example: for the sentence "... a place to eat delicious cheese .", the target word w_t = "eat" and the context words w_t-c, ..., w_t-1, w_t+1, ..., w_t+c yield the training pairs (eat, place), (eat, to), (eat, delicious), (eat, cheese).]

• Combine the errors for all context words, then use the combined error to update the weights via back-propagation.
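A minimal sketch of how such (target, context) training pairs can be generated from a tokenized sentence (the function name and window size c are illustrative choices):

```python
def skipgram_pairs(tokens, c=2):
    """Generate (target, context) pairs within a +/- c word window."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - c), min(len(tokens), i + c + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

pairs = skipgram_pairs("a place to eat delicious cheese".split(), c=2)
print([p for p in pairs if p[0] == "eat"])
# [('eat', 'place'), ('eat', 'to'), ('eat', 'delicious'), ('eat', 'cheese')]
```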
Embeddings are Magic
(Mikolov 2016)

vector(‘king’) - vector(‘man’) + vector(‘woman’) ≈ vector(‘queen’)
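This kind of analogy can be reproduced with, for example, gensim and the pre-trained Google News Word2Vec vectors (the file path below is an assumption about where you have downloaded them; see the word2vec link on the next slide):

```python
from gensim.models import KeyedVectors

# Load pre-trained vectors (path is illustrative).
kv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

# king - man + woman ≈ queen
print(kv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```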


Application: Word Pair
Relationships
Using Word Embeddings
• Word2Vec:

• https://code.google.com/archive/p/word2vec/

• GloVe: Global Vectors for Word Representation

• https://nlp.stanford.edu/projects/glove/

• Can either use pre-trained word embeddings or train them on a large corpus.
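One convenient way to get pre-trained vectors without manual downloads is gensim's downloader module (the model name and library choice are assumptions on my part, not part of the lecture):

```python
import gensim.downloader as api

# Downloads (once) and loads 100-dimensional GloVe vectors trained on Wikipedia + Gigaword.
glove = api.load("glove-wiki-gigaword-100")

print(glove.most_similar("school", topn=5))   # nearest neighbors in embedding space
print(glove.similarity("dog", "cat"))         # cosine similarity between two words
```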
Acknowledgments

• Some content adapted from slides by Kathy McKeown, Dan Jurafsky, Stefan Evert, and Marco Baroni.
