Vector Semantics and Embedding (Part 1)

The document discusses the complexities of word meaning, emphasizing the need for a theory that encompasses various linguistic relations such as synonymy, antonymy, and connotation. It introduces vector semantics as a computational model to represent word meanings through embeddings, allowing for better handling of word similarity and context. Additionally, it covers techniques like TF-IDF and PPMI for measuring word relevance and associations in natural language processing.


Word Meaning

Vector Semantics & Embeddings
What do words mean?
N-gram or text classification methods we've seen so far
◦ Words are just strings (or indices wi in a vocabulary list)
◦ That's not very satisfactory!
Introductory logic classes:
◦ The meaning of "dog" is DOG; cat is CAT
∀x DOG(x) ⟶ MAMMAL(x)
Old linguistics joke by Barbara Partee in 1967:
◦ Q: What's the meaning of life?
◦ A: LIFE
That seems hardly better!
Desiderata
What should a theory of word meaning do for us?
Let's look at some desiderata
From lexical semantics, the linguistic study of word
meaning
Lemmas and senses
lemma: mouse (N)
senses:
1. any of numerous small rodents...
2. a hand-operated device that controls a cursor...
(Modified from the online thesaurus WordNet)

A sense or “concept” is the meaning component of a word


Lemmas can be polysemous (have multiple senses)
Relations between senses: Synonymy
Synonyms have the same meaning in some or all
contexts.
◦ filbert / hazelnut
◦ couch / sofa
◦ big / large
◦ automobile / car
◦ vomit / throw up
◦ water / H2O
Relations between senses: Synonymy
Note that there are probably no examples of perfect
synonymy.
◦ Even if many aspects of meaning are identical
◦ Still may differ based on politeness, slang, register,
genre, etc.
Relation: Synonymy?
water/H2O
"H2O" in a surfing guide?
big/large
my big sister != my large sister
The Linguistic Principle of Contrast

Difference in form 🡪 difference in meaning


Abbé Gabriel Girard 1718
Re: "exact" synonyms
"

"

[I do not believe that there


is a synonymous word in any
language]

Thanks to Mark Aronoff!


Relation: Similarity
Words with similar meanings. Not synonyms, but sharing
some element of meaning

car, bicycle
cow, horse
Ask humans how similar 2 words are

word1 word2 similarity


vanish disappear 9.8
behave obey 7.3
belief impression 5.95
muscle bone 3.65
modest flexible 0.98
hole agreement 0.3

SimLex-999 dataset (Hill et al., 2015)


Relation: Word relatedness
Also called "word association"
Words can be related in any way, perhaps via a semantic
frame or field

◦ coffee, tea: similar


◦ coffee, cup: related, not similar
Semantic field
Words that
◦ cover a particular semantic domain
◦ bear structured relations with each other.

hospitals
surgeon, scalpel, nurse, anaesthetic, hospital
restaurants
waiter, menu, plate, food, chef
houses
door, roof, kitchen, family, bed
Relation: Antonymy
Senses that are opposites with respect to only one
feature of meaning
Otherwise, they are very similar!
dark/light short/long fast/slow rise/fall
hot/cold up/down in/out
More formally: antonyms can
◦ define a binary opposition or be at opposite ends of a scale
◦ long/short, fast/slow
◦ Be reversives:
◦ rise/fall, up/down
Connotation (sentiment)

• Words have affective meanings


• Positive connotations (happy)
• Negative connotations (sad)
• Connotations can be subtle:
• Positive connotation: copy, replica, reproduction
• Negative connotation: fake, knockoff, forgery
• Evaluation (sentiment!)
• Positive evaluation (great, love)
• Negative evaluation (terrible, hate)
Connotation
Osgood et al. (1957)

Words seem to vary along 3 affective dimensions:


◦ valence: the pleasantness of the stimulus
◦ arousal: the intensity of emotion provoked by the stimulus
◦ dominance: the degree of control exerted by the stimulus
           Word        Score   Word       Score
Valence    love        1.000   toxic      0.008
           happy       1.000   nightmare  0.005
Arousal    elated      0.960   mellow     0.069
           frenzy      0.965   napping    0.046
Dominance  powerful    0.991   weak       0.045
           leadership  0.983   empty      0.081

Values from NRC VAD Lexicon (Mohammad 2018)


So far
Concepts or word senses
◦ Have a complex many-to-many association with words
(homonymy, multiple senses)
Have relations with each other
◦ Synonymy
◦ Antonymy
◦ Similarity
◦ Relatedness
◦ Connotation
Word Meaning
Vector Semantics & Embeddings

Vector Semantics
Vector Semantics & Embeddings
Computational models of word meaning

Can we build a theory of how to represent word meaning that accounts for at least some of the desiderata?
We'll introduce vector semantics
The standard model in language processing!
Handles many of our goals!
Ludwig Wittgenstein

PI #43:
"The meaning of a word is its use in the language"
Let's define words by their usages
One way to define "usage":
words are defined by their environments (the words around them)

Zellig Harris (1954):


If A and B have almost identical environments we say that they
are synonyms.
What does recent English borrowing ongchoi
mean?
Suppose you see these sentences:
• Ong choi is delicious sautéed with garlic.
• Ong choi is superb over rice
• Ong choi leaves with salty sauces
And you've also seen these:
• …spinach sautéed with garlic over rice
• Chard stems and leaves are delicious
• Collard greens and other salty leafy greens
Conclusion:
◦ Ongchoi is a leafy green like spinach, chard, or collard greens
◦ We could conclude this based on words like "leaves" and "delicious" and "sauteed"
Ongchoi: Ipomoea aquatica "Water Spinach"

空心菜
kangkong
rau muống

Yamaguchi, Wikimedia Commons, public domain


Idea 1: Defining meaning by linguistic distribution

Let's define the meaning of a word by its distribution in language use, meaning its neighboring words or grammatical environments.
Idea 2: Meaning as a point in space (Osgood et al.
1957)
3 affective dimensions for a word
◦ valence: pleasantness
◦ arousal: intensity of emotion
◦ dominance: the degree of control exerted
           Word        Score   Word       Score
Valence    love        1.000   toxic      0.008
           happy       1.000   nightmare  0.005
Arousal    elated      0.960   mellow     0.069
           frenzy      0.965   napping    0.046
Dominance  powerful    0.991   weak       0.045
           leadership  0.983   empty      0.081

Values from NRC VAD Lexicon (Mohammad 2018)

Hence the connotation of a word is a vector in 3-space


Idea 1: Defining meaning by linguistic distribution

Idea 2: Meaning as a point in multidimensional space


Defining meaning as a point in space based on distribution
Each word = a vector (not just "good" or "w45")
Similar words are "nearby in semantic space"
We build this space automatically by seeing which words are
nearby in text
We define the meaning of a word as a vector
Called an "embedding" because it's embedded into a
space (see textbook)
The standard way to represent meaning in NLP
Every modern NLP algorithm uses embeddings as
the representation of word meaning
Fine-grained model of meaning for similarity
Intuition: why vectors?
Consider sentiment analysis:
◦ With words, a feature is a word identity
◦ Feature 5: 'The previous word was "terrible"'
◦ requires exact same word to be in training and test
◦ With embeddings:
◦ Feature is a word vector
◦ 'The previous word was vector [35,22,17…]'
◦ Now in the test set we might see a similar vector [34,21,14]
◦ We can generalize to similar but unseen words!!!
We'll discuss 2 kinds of
embeddings
tf-idf
◦ Information Retrieval workhorse!
◦ A common baseline model
◦ Sparse vectors
◦ Words are represented by (a simple function of) the counts of nearby
words
Word2vec
◦ Dense vectors
◦ Representation is created by training a classifier to predict whether a
word is likely to appear nearby
◦ Later we'll discuss extensions called contextual embeddings
From now on:
Computing with meaning representations
instead of string representations
Vector Semantics
Vector Semantics & Embeddings

Words and Vectors
Vector Semantics & Embeddings
Term-document matrix
Each document is represented by a vector of words
Visualizing document vectors
Vectors are the basis of information retrieval

Vectors are similar for the two comedies

But comedies are different than the other two


Comedies have more fools and wit and fewer battles.
Idea for word meaning: Words can be vectors
too!!!

battle is "the kind of word that occurs in Julius Caesar and Henry V"

fool is "the kind of word that occurs in comedies, especially Twelfth Night"
More common: word-word matrix
(or "term-context matrix")
Two words are similar in meaning if their context vectors are similar
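
To make this concrete, here is a minimal sketch of building a word-word (term-context) co-occurrence matrix. The toy corpus and the ±2-word context window are assumptions for illustration, not taken from the slides.

from collections import defaultdict

corpus = [
    "ong choi is delicious sauteed with garlic",
    "spinach sauteed with garlic over rice",
    "chard stems and leaves are delicious",
]

window = 2                                       # +/- 2-word context window
counts = defaultdict(lambda: defaultdict(int))   # counts[target][context]

for sentence in corpus:
    tokens = sentence.split()
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[target][tokens[j]] += 1   # row = target word, column = context word

# Words with similar rows (context vectors) are distributionally similar:
print(dict(counts["delicious"]))
print(dict(counts["sauteed"]))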
Words and Vectors
Vector Semantics & Embeddings

Cosine for computing word similarity
Vector Semantics & Embeddings
Computing word similarity: Dot product and
cosine
The dot product between two vectors is a scalar:

dot product(v, w) = v · w = v1w1 + v2w2 + … + vNwN
The dot product tends to be high when the two vectors have large values in the same dimensions.
Dot product can thus be a useful similarity metric between vectors.
Problem with raw dot-product
Dot product favors long vectors
Dot product is higher if a vector is longer (has higher values in many dimensions)
Vector length:

|v| = sqrt( v1² + v2² + … + vN² )
Frequent words (of, the, you) have long vectors (since they occur many times with other words).
So dot product overly favors frequent words.
Alternative: cosine for computing word similarity

Based on the definition of the dot product between two vectors a and b:

cos(a, b) = (a · b) / ( |a| |b| )
Cosine as a similarity metric

-1: vectors point in opposite directions


+1: vectors point in same directions
0: vectors are orthogonal

But since raw frequency values are non-negative, the cosine for term-term matrix vectors ranges from 0 to 1.

Cosine examples
             pie    data   computer
cherry       442       8          2
digital        5    1683       1670
information    5    3982       3325
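
A minimal sketch of cosine similarity over the count vectors in the table above; the function is a generic implementation, not code from the slides. Cherry and information should come out very dissimilar, digital and information very similar.

import math

vectors = {
    "cherry":      [442, 8, 2],        # contexts: pie, data, computer
    "digital":     [5, 1683, 1670],
    "information": [5, 3982, 3325],
}

def cosine(v, w):
    dot = sum(vi * wi for vi, wi in zip(v, w))
    len_v = math.sqrt(sum(vi * vi for vi in v))
    len_w = math.sqrt(sum(wi * wi for wi in w))
    return dot / (len_v * len_w)

print(cosine(vectors["cherry"], vectors["information"]))   # small value: few shared contexts
print(cosine(vectors["digital"], vectors["information"]))  # close to 1: very similar context vectors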
Visualizing cosines
(well, angles)
Cosine for computing word similarity
Vector Semantics & Embeddings

TF-IDF
Vector Semantics & Embeddings
But raw frequency is a bad representation

• The co-occurrence matrices we have seen represent each cell by word frequencies.
• Frequency is clearly useful; if sugar appears a lot near apricot, that's useful information.
• But overly frequent words like the, it, or they are not very informative about the context
• It's a paradox! How can we balance these two conflicting
constraints?
Two common solutions for word weighting

tf-idf: weight each cell by the word's tf-idf value
◦ Words like "the" or "it" have very low idf

PMI (pointwise mutual information)
◦ See if words like "good" appear more often with "great" than we would expect by chance
Term frequency (tf)
tft,d = count(t,d)

Instead of using raw count, we squash a bit:

tft,d = log10(count(t,d)+1)
Document frequency (df)
dft is the number of documents t occurs in.
(note this is not collection frequency: total count across
all documents)
"Romeo" is very distinctive for one Shakespeare play:
Inverse document frequency (idf)

idft = log10( N / dft )

N is the total number of documents in the collection
What is a document?

Could be a play or a Wikipedia article


But for the purposes of tf-idf, documents can be
anything; we often call each paragraph a document!
Final tf-idf weighted value for a word

wt,d = tft,d × idft

(Example term-document matrix: raw counts vs. tf-idf weighted values)
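
A minimal sketch of the tf-idf weighting defined above (tf = log10(count + 1), idf = log10(N / df)), using an assumed toy corpus of three tiny "documents"; the corpus and function names are illustrative, not from the slides.

import math
from collections import Counter

docs = [
    "the fool and the wit".split(),
    "the battle and the king".split(),
    "the fool saw the battle".split(),
]

N = len(docs)
df = Counter()                       # document frequency: number of docs a term occurs in
for doc in docs:
    for term in set(doc):
        df[term] += 1

def tf_idf(term, doc):
    tf = math.log10(doc.count(term) + 1)   # squashed term frequency
    idf = math.log10(N / df[term])         # inverse document frequency
    return tf * idf

print(tf_idf("fool", docs[0]))   # distinctive word: positive weight
print(tf_idf("the", docs[0]))    # occurs in every doc, so idf = log10(3/3) = 0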
TF-IDF
Vector Semantics & Embeddings

PPMI
Vector Semantics & Embeddings
Pointwise Mutual Information

Do a word w and a context c co-occur more than if they were independent?

PMI(w, c) = log2( P(w, c) / ( P(w) P(c) ) )

Positive Pointwise Mutual Information

PPMI(w, c) = max( PMI(w, c), 0 )

(Negative PMI values are unreliable without enormous corpora, so they are replaced by 0.)
Computing PPMI on a term-context
matrix
Matrix F with W rows (words) and C columns (contexts)
fij = number of times word wi occurs in context cj

pij = fij / Σi Σj fij        p(wi) = Σj fij / Σi Σj fij        p(cj) = Σi fij / Σi Σj fij

pmiij = log2( pij / ( p(wi) p(cj) ) )        ppmiij = max( pmiij, 0 )

Worked example:
p(w=information, c=data) = 3982/11716 = .3399
p(w=information) = 7703/11716 = .6575
p(c=data) = 5673/11716 = .4842

pmi(information, data) = log2( .3399 / (.6575 × .4842) ) = .0944

Resulting PPMI matrix (negatives replaced by 0)

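A minimal sketch of the PPMI computation, reusing the small cherry/digital/information count table from the cosine example as matrix F (the worked numbers above come from a larger matrix not shown here); this is an illustrative implementation, not code from the slides.

import numpy as np

# Count matrix F: rows = words (cherry, digital, information),
# columns = contexts (pie, data, computer).
F = np.array([
    [442.0,    8.0,    2.0],
    [  5.0, 1683.0, 1670.0],
    [  5.0, 3982.0, 3325.0],
])

total = F.sum()
p_wc = F / total                         # joint p(w, c)
p_w = p_wc.sum(axis=1, keepdims=True)    # marginal p(w)
p_c = p_wc.sum(axis=0, keepdims=True)    # marginal p(c)

with np.errstate(divide="ignore"):       # log2(0) for zero counts becomes -inf
    pmi = np.log2(p_wc / (p_w * p_c))
ppmi = np.maximum(pmi, 0)                # replace negative (and -inf) values with 0

print(np.round(ppmi, 2))
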
Weighting PMI
PMI is biased toward infrequent events
◦ Very rare words have very high PMI values
Two solutions:
◦ Give rare words slightly higher probabilities
◦ Use add-one smoothing (which has a similar effect)

Weighting PMI: Giving rare context words slightly higher probability

Raise the context counts to the power α (α = 0.75 works well in practice):

PPMIα(w, c) = max( log2( P(w, c) / ( P(w) Pα(c) ) ), 0 )

Pα(c) = count(c)^α / Σc′ count(c′)^α

This increases the probability assigned to rare contexts, reducing PMI's bias toward them.
