21 Word2Vec 24 09 2024

Vector Semantics

Vector Semantics & Embeddings
Computational models of word meaning

Can we build a theory of how to represent word meaning that accounts for at least some of the desiderata?
We'll introduce vector semantics:
The standard model in language processing!
Handles many of our goals!
Ludwig Wittgenstein

PI #43:
"The meaning of a word is its use in the language"
Let's define words by their usages
One way to define "usage":
words are defined by their environments (the words around them)

Zellig Harris (1954):
"If A and B have almost identical environments we say that they are synonyms."
What does recent English borrowing ongchoi mean?
Suppose you see these sentences:
• Ong choi is delicious sautéed with garlic.
• Ong choi is superb over rice
• Ong choi leaves with salty sauces
And you've also seen these:
• …spinach sautéed with garlic over rice
• Chard stems and leaves are delicious
• Collard greens and other salty leafy greens
Conclusion:
◦ Ongchoi is a leafy green like spinach, chard, or collard greens
◦ We could conclude this based on words like "leaves" and "delicious" and "sauteed"
Ongchoi: Ipomoea aquatica "Water Spinach"

空心菜
kangkong
rau muống

Yamaguchi, Wikimedia Commons, public domain


Idea 1: Defining meaning by linguistic distribution

Let's define the meaning of a word by its distribution in language use, meaning its neighboring words or grammatical environments.
Idea 2: Meaning as a point in space (Osgood et al. 1957)
3 affective dimensions for a word
◦ valence: pleasantness
◦ arousal: intensity of emotion
◦ dominance: the degree of control exerted
Dimension    Word (high)   Score    Word (low)   Score
Valence      love          1.000    toxic        0.008
             happy         1.000    nightmare    0.005
Arousal      elated        0.960    mellow       0.069
             frenzy        0.965    napping      0.046
Dominance    powerful      0.991    weak         0.045
             leadership    0.983    empty        0.081

NRC VAD Lexicon (Mohammad 2018)

Hence the connotation of a word is a vector in 3-space


Idea 1: Defining meaning by linguistic distribution
Idea 2: Meaning as a point in multidimensional space

Defining meaning as a point in space based on distribution
Each word = a vector (not just "good" or "w45")
Similar words are "nearby in semantic space"
We build this space automatically by seeing which words are nearby in text
We define meaning of a word as a vector
Called an "embedding" because it's embedded into a space (see textbook)
The standard way to represent meaning in NLP
Every modern NLP algorithm uses embeddings as the representation of word meaning
Fine-grained model of meaning for similarity
Intuition: why vectors?
Consider sentiment analysis:
◦ With words, a feature is a word identity
◦ Feature 5: 'The previous word was "terrible"'
◦ requires exact same word to be in training and test
◦ With embeddings:
◦ Feature is a word vector
◦ 'The previous word was vector [35,22,17…]'
◦ Now in the test set we might see a similar vector [34,21,14]
◦ We can generalize to similar but unseen words!!!
We'll discuss 2 kinds of embeddings
tf-idf
◦ Information Retrieval workhorse!
◦ A common baseline model
◦ Sparse vectors
◦ Words are represented by (a simple function of) the counts of nearby
words
Word2vec
◦ Dense vectors
◦ Representation is created by training a classifier to predict whether a
word is likely to appear nearby
◦ Later we'll discuss extensions called contextual embeddings
From now on:
Computing with meaning representations
instead of string representations
Words and Vectors
Vector Semantics & Embeddings
Term-document matrix
Each document is represented by a vector of words
Visualizing document vectors
Vectors are the basis of information retrieval

Vectors are similar for the two comedies
But comedies are different than the other two
Comedies have more fools and wit and fewer battles.
Idea for word meaning: Words can be vectors too!!!

battle is "the kind of word that occurs in Julius Caesar and Henry V"

fool is "the kind of word that occurs in comedies, especially Twelfth Night"
More common: word-word matrix
(or "term-context matrix")
Two words are similar in meaning if their context vectors are similar
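To make the term-context idea concrete, here is a minimal sketch (not from the slides) that builds a sparse word-word co-occurrence matrix from a toy corpus with a ±2-word window; the corpus, window size, and variable names are illustrative choices.

```python
# Minimal sketch: word-word ("term-context") co-occurrence counts
# with a +/-2 word window over a toy corpus (illustrative only).
from collections import Counter, defaultdict

corpus = [
    "ong choi is delicious sauteed with garlic",
    "spinach sauteed with garlic over rice",
    "chard stems and leaves are delicious",
]

window = 2
counts = defaultdict(Counter)   # counts[target][context] = co-occurrence count

for sentence in corpus:
    tokens = sentence.split()
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[target][tokens[j]] += 1

# Each row of `counts` is the sparse context vector for one word.
print(counts["garlic"])   # Counter({'sauteed': 2, 'with': 2, 'over': 1, 'rice': 1})
```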
Cosine for computing word similarity
Vector Semantics & Embeddings
Computing word similarity: Dot product and cosine
The dot product between two vectors is a scalar:

dot product(v, w) = v · w = Σᵢ vᵢ wᵢ

The dot product tends to be high when the two vectors have large values in the same dimensions.
Dot product can thus be a useful similarity metric between vectors.
Problem with raw dot product
Dot product favors long vectors:
Dot product is higher if a vector is longer (has higher values in many dimensions).
Vector length: |v| = √(Σᵢ vᵢ²)
Frequent words (of, the, you) have long vectors (since they occur many times with other words).
So dot product overly favors frequent words.
Alternative: cosine for computing word similarity

Based on the definition of the dot product between two vectors a and b:
a · b = |a| |b| cos θ, so cos θ = (a · b) / (|a| |b|)
Cosine as a similarity metric

-1: vectors point in opposite directions
+1: vectors point in the same direction
0: vectors are orthogonal

But since raw frequency values are non-negative, the cosine for term-term matrix vectors ranges from 0 to 1.

Cosine examples

cos(v, w) = (v · w) / (|v| |w|) = Σᵢ vᵢ wᵢ / ( √(Σᵢ vᵢ²) √(Σᵢ wᵢ²) ), summing over i = 1…N

Co-occurrence counts:
              pie    data   computer
cherry        442       8          2
digital         5    1683       1670
information     5    3982       3325
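As a quick check of the formula on the counts above, here is a small sketch that computes the cosine similarity between these context vectors (the printed values in the comments are approximate):

```python
# Cosine similarity over the co-occurrence counts in the table above.
import numpy as np

vecs = {
    "cherry":      np.array([442.0,    8.0,    2.0]),   # counts over pie, data, computer
    "digital":     np.array([  5.0, 1683.0, 1670.0]),
    "information": np.array([  5.0, 3982.0, 3325.0]),
}

def cosine(v, w):
    return np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))

print(cosine(vecs["cherry"], vecs["information"]))   # low:  ~0.018
print(cosine(vecs["digital"], vecs["information"]))  # high: ~0.996
```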
Visualizing cosines
(well, angles)
Word Embedding
• Word embedding is a technique in natural language processing (NLP)
that represents words as vectors in a continuous vector space. This
allows for capturing semantic relationships and similarities between
words based on their context in large text corpora.
• Some word embedding models are Word2vec (Google), GloVe (Stanford), and fastText (Facebook).
Word Embedding
• Word Embedding is also called a distributed semantic model, a distributed representation, a semantic vector space, or a vector space model.
• Similar words can be grouped together. For example, fruits like apple, mango, and banana should be placed close to one another, whereas a word like book will be far away from them.
• In a broader sense, word embeddings place the vector for a fruit far away from the vector representation of a book.
Importance - Word Embedding
• Semantic Understanding - semantic meaning of words
• Contextual Relationships - Embeddings can capture the context in
which words appear.
• Dimensionality Reduction - Word embeddings reduce the
dimensionality, making computations more efficient and models
faster.
• Transfer Learning: Pre-trained word embeddings can be used.
• Improved Performance: Using word embeddings often leads to
better performance in various NLP tasks such as text classification,
sentiment analysis, and machine translation. They help models
generalize better by capturing nuanced relationships between words.
Applications
• NLP Tasks: Used in sentiment analysis, machine translation, information retrieval, and more.
• Transfer Learning: Pre-trained models can be fine-tuned for specific tasks, leveraging knowledge from vast corpora.
Word Embedding Limitations
• Out-of-Vocabulary Words: Word2Vec does not handle words that were not present in the training data well.
• Lack of Contextual Awareness: It generates a single vector for each word regardless of its context (e.g., "bank" in "river bank" vs. "financial bank").
Word2Vec
• Word2Vec is a popular technique for creating word embeddings,
developed by a team at Google led by Tomas Mikolov. It represents
words in a continuous vector space, allowing machines to understand
their meanings based on context. Here are the main components of
Word2Vec:
Word2Vec - Architectures
• Continuous Bag of Words (CBOW):
• Predicts the target word from the surrounding context words.
• For example, given the context words "the," "sat," "on," it predicts "cat."
• Generally faster and more suitable for smaller datasets.
• Skip-gram Model:
• Predicts surrounding context words given a target word.
• For example, given the target word "cat," it tries to predict words like "sat,"
"on," "the," etc.
• Effective for capturing semantic relationships and works well with large
datasets.
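As a minimal, hedged sketch of how the two architectures above might be trained in practice, the example below uses the gensim library (assuming gensim 4.x, where sg=0 selects CBOW and sg=1 selects skip-gram); the toy corpus and hyperparameter values are illustrative only.

```python
# Illustrative sketch (gensim 4.x assumed): training CBOW vs. skip-gram
# on a toy corpus. Corpus and hyperparameters are placeholders.
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

cbow     = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)  # CBOW
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # skip-gram

print(cbow.wv["cat"][:5])               # first 5 dimensions of the learned embedding
print(skipgram.wv.most_similar("cat"))  # nearest neighbours by cosine similarity
```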
Word2vec
• Instead of counting how often each word w occurs near
"apricot"
• Train a classifier on a binary prediction task:
• Is w likely to show up near "apricot"?
• We don’t actually care about this task
• But we'll take the learned classifier weights as the word embeddings
• Big idea: self-supervision:
• A word c that occurs near apricot in the corpus acts as the gold
"correct answer" for supervised learning
• No need for human labels
• Bengio et al. (2003); Collobert et al. (2011)
Approach: predict if candidate word c is a "neighbor"

1. Treat the target word t and a neighboring context word c as positive examples.
2. Randomly sample other words in the lexicon to get
negative examples
3. Use logistic regression to train a classifier to distinguish
those two cases
4. Use the learned weights as the embeddings
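The four steps above can be sketched in plain numpy. The following is a simplified illustration, not the actual Word2Vec implementation: a tiny vocabulary, a few hand-picked positive (target, context) pairs, crude uniform negative sampling, and logistic-regression updates on the dot product; all names and values are made up for the example.

```python
# Simplified skip-gram-with-negative-sampling sketch (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
vocab = ["apricot", "tablespoon", "of", "jam", "a", "pinch", "aardvark"]
word2id = {w: i for i, w in enumerate(vocab)}
dim, lr, k = 8, 0.05, 2          # embedding size, learning rate, negatives per positive

W = rng.normal(scale=0.1, size=(len(vocab), dim))   # target-word embeddings
C = rng.normal(scale=0.1, size=(len(vocab), dim))   # context-word embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Step 1: positive (target, context) pairs taken from the corpus
positives = [("apricot", "tablespoon"), ("apricot", "of"), ("apricot", "jam")]

for _ in range(200):
    for t, c in positives:
        ti, ci = word2id[t], word2id[c]
        w_vec, c_vec = W[ti].copy(), C[ci].copy()
        # Step 3 (positive side): push sigma(w . c) toward 1
        g = sigmoid(w_vec @ c_vec) - 1.0
        W[ti] -= lr * g * c_vec
        C[ci] -= lr * g * w_vec
        # Step 2 + 3 (negative side): random words, push sigma(w . c_neg) toward 0
        for ni in rng.integers(0, len(vocab), size=k):
            n_vec = C[ni].copy()
            g = sigmoid(w_vec @ n_vec)
            W[ti] -= lr * g * n_vec
            C[ni] -= lr * g * w_vec

# Step 4: the learned rows of W are the word embeddings
print(W[word2id["apricot"]])
```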
Architecture of the CBOW model
Architecture of Skip-gram Model
Similarity is computed from dot product
• Remember: two vectors are similar if they have a high
dot product
• Cosine is just a normalized dot product
• So:
• Similarity(w,c) ∝ w ∙ c
• We’ll need to normalize to get a probability
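For instance, skip-gram with negative sampling uses the sigmoid for this normalization, P(+|w, c) = σ(w · c); here is a tiny numeric sketch with made-up vectors:

```python
# Turning a raw dot product into a probability with the sigmoid (made-up vectors).
import numpy as np

w = np.array([0.5, -0.1, 0.3])    # illustrative target-word vector
c = np.array([0.4,  0.2, 0.1])    # illustrative context-word vector

score = w @ c                      # unnormalized similarity (dot product)
prob = 1.0 / (1.0 + np.exp(-score))
print(round(score, 2), round(prob, 3))   # 0.21 0.552
```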
CBOW - Example
CBOW – Example Cont…

Why Softmax? – Probabilities can be compared directly, helping to identify which word has the highest likelihood of being the target.
CBOW – Example Cont…
• Step 7: Identify the Predicted Word
• The predicted word is the one with the highest probability. In this case, if
"king" has the highest probability (approximately 0.276), it will be the
predicted target word.
• Summary of Steps
1. Identify the Target: Target word "king" and context words "man" and "woman."
2. Vector Representation: Assign 4D vectors to all words.
3. Average Context: Calculate the average of context word vectors.
4. Score Calculation: Compute dot products with weights for each word.
5. Softmax: Convert scores to probabilities.
6. Prediction: Identify the word with the highest probability.
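A small numpy sketch of these steps is shown below. The 4-d vectors, the output weights, and the tiny vocabulary are made-up placeholders (the slides' actual numbers are not reproduced here), but the flow of averaging, scoring, softmax, and argmax follows the summary above.

```python
# CBOW forward pass sketch with made-up numbers (illustrative only).
import numpy as np

vocab = ["king", "queen", "man", "woman"]

# Step 2: 4-d input vectors for the context words (placeholders)
E = {
    "man":   np.array([0.2, 0.1, 0.0, 0.3]),
    "woman": np.array([0.1, 0.3, 0.1, 0.2]),
}

# Output-side weight vector for each vocabulary word (also placeholders)
W_out = np.array([
    [0.4, 0.5, 0.2, 0.6],   # king
    [0.3, 0.4, 0.1, 0.3],   # queen
    [0.1, 0.2, 0.3, 0.1],   # man
    [0.2, 0.1, 0.2, 0.2],   # woman
])

# Step 3: average the context word vectors
h = (E["man"] + E["woman"]) / 2

# Step 4: score every vocabulary word with a dot product
scores = W_out @ h

# Step 5: softmax converts scores to probabilities
probs = np.exp(scores) / np.exp(scores).sum()

# Step 6: the predicted target is the highest-probability word ("king" here)
print(dict(zip(vocab, probs.round(3))), "->", vocab[int(np.argmax(probs))])
```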
Summary
• In CBOW, context words are chosen based on proximity to the target
word, which is determined by the window size.
• Including additional words like "queen" depends on expanding the
context window.
• By adjusting the context, you can influence the prediction, capturing
more semantic relationships.
Skip-gram model - Ex
Skip-gram model – Ex Cont…
Identify the Predicted Context Words:
Threshold: You can set a threshold to consider only those context words whose probabilities
exceed a certain value.
Top-N Selection: Alternatively, you can select the top-N context words with the highest
probabilities. This is common when you expect multiple context words.
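Below is a short sketch of both selection strategies over a skip-gram output distribution; the vocabulary and probability values are made up for illustration.

```python
# Top-N and threshold selection over skip-gram output probabilities (made-up values).
import numpy as np

vocab = ["king", "queen", "man", "woman", "royal", "the"]
probs = np.array([0.05, 0.10, 0.30, 0.25, 0.20, 0.10])   # softmax output, sums to 1

# Top-N selection: keep the N most probable context words
N = 2
top_n = np.argsort(probs)[::-1][:N]
print([vocab[i] for i in top_n])                           # ['man', 'woman']

# Threshold selection: keep every word whose probability exceeds a cutoff
threshold = 0.15
print([w for w, p in zip(vocab, probs) if p > threshold])  # ['man', 'woman', 'royal']
```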
fastText
• fastText introduces a pivotal shift by considering words as composed of character n-grams, enabling it to build representations for words based on these subword units.
• This approach allows the model to understand and generate
embeddings for words not seen in the training data, offering a
substantial advantage in handling morphologically rich languages and
rare words.
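As an illustration of the subword idea, the sketch below extracts character n-grams the way fastText does (n-grams of length 3–6 plus the whole word, with "<" and ">" marking word boundaries); the function itself is a simplified stand-in, not fastText's code. A word's vector is then the sum of the vectors of its n-grams.

```python
# Simplified fastText-style character n-gram extraction (illustrative).
def char_ngrams(word, n_min=3, n_max=6):
    padded = "<" + word + ">"           # "<" and ">" mark the word boundaries
    grams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.add(padded[i:i + n])
    grams.add(padded)                   # the whole word is also kept as a unit
    return grams

print(sorted(char_ngrams("where", 3, 4)))
# ['<wh', '<whe', '<where>', 'ere', 'ere>', 'her', 'here', 're>', 'whe', 'wher']
```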
Difference Between fastText and Word2Vec
• Handling of Out-of-Vocabulary (OOV) Words
• Word2Vec: Word2Vec operates at the word level, generating embeddings for
individual words. It struggles with out-of-vocabulary words as it cannot
represent words it hasn’t seen during training.
• fastText: In contrast, fastText introduces subword embeddings by considering
words to be composed of character n-grams. This enables it to handle out-of-
vocabulary words effectively by breaking terms into subword units and
generating embeddings for these units, even for unseen words. This capability
makes fastText more robust in dealing with rare or morphologically complex
expressions.
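A hedged sketch of this difference using the gensim library (assuming gensim 4.x; the toy corpus and the unseen word "kingliest" are illustrative):

```python
# OOV behaviour contrast (gensim 4.x assumed; toy data, illustrative only).
from gensim.models import Word2Vec, FastText

sentences = [["the", "king", "rules"], ["the", "queen", "rules"]]

w2v = Word2Vec(sentences, vector_size=20, window=2, min_count=1)
ft  = FastText(sentences, vector_size=20, window=2, min_count=1)

print("kingliest" in w2v.wv.key_to_index)   # False: Word2Vec has no vector for it
print(ft.wv["kingliest"][:5])               # fastText composes one from its character n-grams
```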
Representation of Words
• Word2Vec: Word2Vec generates word embeddings based solely on
the words without considering internal structure or morphological
information.
• fastText: fastText captures subword information, allowing it to
understand word meanings based on their constituent character n-
grams. This enables fastText to represent words by considering their
morphological makeup, providing a richer representation, especially
for morphologically rich languages or domains with specialised
jargon.
Training Efficiency
• Word2Vec: The training process in Word2Vec is relatively faster than
older methods but might be slower than fastText due to its word-level
approach.
• fastText: fastText is known for its exceptional speed and scalability,
especially when dealing with large datasets, as it operates efficiently
at the subword level.
Use Cases
• Word2Vec: Word2Vec’s word-level embeddings are well-suited for
tasks like finding similar words, understanding relationships between
words, and capturing semantic similarities.
• fastText: fastText’s subword embeddings make it more adaptable in
scenarios involving out-of-vocabulary words, sentiment analysis,
language identification, and tasks requiring a deeper understanding of
morphology.
