
Natural Language Processing

(CS5803)
Lecture 3
(Word Representations)
Words as vectors: Word2Vec

● The representation of a word is dictated by its surrounding words
● Assume a fixed-length context window
● For example:
○ [w-2 w-1 c w1 w2]
● Start with random initialization
● Iterate till convergence
Word2Vec Models: SkipGram (SG)

● Training sentence:
● ... the algorithm’s asymptotic complexity is quadratic ...
● w-2 w-1 c w1 w2
● Considering words in a context window of length 5:
○ P(context|target)
○ P([w-2 w-1 w1 w2] | c) = ?
Word2Vec Models: CBOW

● Training sentence:
● ... the algorithm’s asymptotic complexity is quadratic ...
● w-2 w-1 c w1 w2
● Considering words in a context window of length 5:
○ P(target|context)
○ P(c | w-2 w-1 w1 w2) = ?
Objective function

Ref: “Distributed Representations of Words and Phrases and their Compositionality”, Mikolov et al. (2013)
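For reference, the skip-gram objective from the cited Mikolov et al. (2013) paper maximizes the average log probability of the context words given the target word:

\[
\frac{1}{T}\sum_{t=1}^{T}\;\sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p(w_{t+j}\mid w_t),
\qquad
p(w_O \mid w_I) = \frac{\exp\!\big({v'_{w_O}}^{\top} v_{w_I}\big)}{\sum_{w=1}^{V} \exp\!\big({v'_{w}}^{\top} v_{w_I}\big)}
\]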
More examples of target and context

Ref: https://towardsdatascience.com/an-implementation-guide-to-word2vec-using-numpy-and-google-sheets-13445eebd281
Skip-gram

Slide courtesy of Jurafsky & Martin


Steps with example

We must learn two weight matrices: W (input to hidden) and W′ (hidden to output).

[Figure: one-hot V-dimensional input vectors for the context words “cat” and “on” feed through a shared N-dimensional hidden layer into a V-dimensional output layer that predicts the target word “sat”; N is the size of the word vector.]
Steps with example

[Figure: the same network with the output layer passed through a softmax, giving a probability for every vocabulary word; the target word “sat” receives the highest probability (0.7) while the other words receive small values (0.00-0.02). N is the size of the word vector.]
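A minimal NumPy sketch of the forward pass in the figure above (CBOW-style: the one-hot context words are averaged through W, scored against every vocabulary word through W′, and passed through a softmax). The toy vocabulary, dimensions, and random seed are assumptions for illustration:

import numpy as np

# Toy setup (hypothetical): vocabulary of 6 words, 3-dimensional embeddings
vocab = ["the", "cat", "sat", "on", "mat", "dog"]
V, N = len(vocab), 3
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, N))        # input->hidden weights  (V x N)
W_prime = rng.normal(scale=0.1, size=(N, V))  # hidden->output weights (N x V)

def one_hot(word):
    x = np.zeros(V)
    x[vocab.index(word)] = 1.0
    return x

# Context words "cat" and "on" predict the target word "sat"
context = ["cat", "on"]
h = np.mean([one_hot(w) @ W for w in context], axis=0)   # hidden layer, N-dim
scores = h @ W_prime                                      # one score per vocabulary word
probs = np.exp(scores) / np.exp(scores).sum()             # softmax: P(word | context)
print(dict(zip(vocab, probs.round(3))))

Training then nudges W and W′ so that the probability assigned to the observed target (“sat”) increases, which is the weight-update step covered next.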
Learning the representations: Step by step

● Ref: https://towardsdatascience.com/an-implementation-guide-to-word2vec-using-numpy-and-google-sheets-13445eebd281
Learning the representations: Step by step

For more details on the weight updates, see the paper “word2vec Parameter Learning Explained”.
Word2Vec: References
● Distributed Representations of Words and Phrases and their Compositionality
● https://www.geeksforgeeks.org/python-word-embedding-using-word2vec/
● https://radimrehurek.com/gensim/models/word2vec.html
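A minimal usage sketch for the gensim library linked above (assuming gensim 4.x, where the embedding size parameter is vector_size; the two-sentence toy corpus is an assumption):

from gensim.models import Word2Vec

# Toy corpus (hypothetical): each sentence is a list of tokens
sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "rug"]]

# sg=1 selects skip-gram; sg=0 (the default) selects CBOW
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

print(model.wv["cat"][:5])            # first 5 dimensions of the learned vector
print(model.wv.most_similar("cat"))   # nearest neighbours by cosine similarity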
Analogy: Embeddings capture relational meaning!

vector(‘king’) - vector(‘man’) + vector(‘woman’) ≈ vector(‘queen’)

vector(‘Paris’) - vector(‘France’) + vector(‘Italy’) ≈ vector(‘Rome’)
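Such analogy queries can be reproduced with gensim on vectors trained on a large corpus; the sketch below assumes a local copy of the pretrained Google News word2vec vectors (the file path is an assumption):

from gensim.models import KeyedVectors

# Assumed local copy of the pretrained Google News word2vec vectors
wv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

# king - man + woman ≈ ?
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# Paris - France + Italy ≈ ?
print(wv.most_similar(positive=["Paris", "Italy"], negative=["France"], topn=1))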



Word analogies
Multicontext representation learning
Evaluation on Word Similarity Task

WordSim353: http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/
Evaluation on Semantic Textual Similarity Task
GloVe
● Stands for Global Vectors for Word Representation
○ Emphasizes co-occurrence with context/probe words
● Learns two representations (W, W̃) for each word
● Focuses on ratios of co-occurrence probabilities
○ Given words wi and wj and a probe word wk, model the ratio of their co-occurrence probabilities: F(wi, wj, w̃k) = Pik/Pjk
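Here Pik denotes the probability that word k appears in the context of word i, computed from the co-occurrence count matrix X (notation as in the GloVe paper):

\[
P_{ik} = P(k \mid i) = \frac{X_{ik}}{X_i}, \qquad X_i = \sum_{m} X_{im}
\]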
GloVe
● Word vector spaces exhibit linear substructures
● Natural way of defining F: use vector differences and dot products
● Control the form that F can take:
○ Choose F so that the roles of the input word and the context word can be interchanged later
○ Model F as exp(·)
○ Introduce bias terms and absorb log(Xi)
● Final objective function (reconstructed below)
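A reconstruction of the derivation steps listed above, following Pennington et al. (2014) (the slide’s equation images are not reproduced):

\begin{align*}
F(w_i - w_j,\ \tilde{w}_k) &= \frac{P_{ik}}{P_{jk}} && \text{(vector differences)}\\
F\big((w_i - w_j)^{\top}\tilde{w}_k\big) &= \frac{P_{ik}}{P_{jk}} && \text{(dot products)}\\
F\big(w_i^{\top}\tilde{w}_k\big) &= P_{ik}, \qquad F(\cdot)=\exp(\cdot) && \text{(so word and context roles can be swapped)}\\
w_i^{\top}\tilde{w}_k &= \log P_{ik} = \log X_{ik} - \log X_i &&\\
w_i^{\top}\tilde{w}_k + b_i + \tilde{b}_k &= \log X_{ik} && \text{(absorb } \log X_i \text{ into the biases)}
\end{align*}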


GloVe (Summary)
● Stands for Global Vectors for Word Representation
○ Emphasizes co-occurrence with context words
● Learns two representations (W, W̃) for each word
● The prediction problem and the objective function are given below
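The two formulas, as given in Pennington et al. (2014) (x_max = 100 and α = 3/4 are the paper’s defaults):

\[
w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j = \log X_{ij}
\]
\[
J = \sum_{i,j=1}^{V} f(X_{ij})\,\big(w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\big)^2,
\qquad
f(x) = \begin{cases} (x/x_{\max})^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}
\]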


Embeddings reflect societal bias
● Ask “Paris : France :: Tokyo : x”
○ x = Japan
● Ask “father : doctor :: mother : x”
○ x = nurse
● Ask “man : computer programmer :: woman : x”
○ x = homemaker

Bolukbasi, Tolga, Kai-Wei Chang, James Y. Zou, Venkatesh Saligrama, and Adam T. Kalai. “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings.” In Advances in Neural Information Processing Systems, pp. 4349-4357, 2016.
Embeddings Reflect Societal Bias
(w2v embeddings trained on Google News)
Identifying and quantifying bias in word embeddings
● Assumption: the aspect of bias is known, e.g. gender
● Find the “gender” dimension
○ Collect explicit gender-based word pairs (f, m): (woman, man), (mother, father), (gal, guy), (girl, boy), (she, he)
○ Get the gender dimension g from the (f - m) difference vectors [How?]
● Collect a set N of gender-neutral words
● Compute the gender component of the elements of N (see the sketch below)
○ DirectBias = (1/|N|) ∑w∈N |cos(w, g)|
○ |cos(w, g)| can be raised to a power c
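A minimal NumPy sketch of DirectBias under the setup above; emb is a hypothetical word-to-vector dictionary, and the gender direction g is taken here as the normalized mean of the (f - m) difference vectors (the paper itself uses the top principal component of the definitional pairs):

import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def gender_direction(emb, pairs):
    # pairs: [("woman", "man"), ("mother", "father"), ("she", "he"), ...]
    diffs = [unit(emb[f]) - unit(emb[m]) for f, m in pairs]
    return unit(np.mean(diffs, axis=0))   # simplification of the PCA step in the paper

def direct_bias(emb, neutral_words, g, c=1.0):
    # DirectBias = (1/|N|) * sum over w in N of |cos(w, g)|^c
    cosines = [abs(unit(emb[w]) @ g) for w in neutral_words]
    return float(np.mean(np.array(cosines) ** c))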
Identifying and quantifying bias in word embeddings
● How to capture indirect bias?
● Direct bias: component along the gender dimension
● Indirect bias: component along its perpendicular
● Need the component of a word vector perpendicular to the “gender” dimension
● Component of a vector a along a vector b:
○ Scalar component: comp_b(a) = (a·b)/|b|
○ Vector component: comp_b(a) · (b/|b|)
● wg = (w·g)g,  w⊥ = w - wg
● IndirectBias B(w, v) = (w·v - (w⊥·v⊥)/(|w⊥| |v⊥|)) / (w·v)  (see the sketch below)
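A corresponding sketch for the indirect-bias score, reusing unit and the gender direction g from the previous snippet (word vectors are assumed to be length-normalized, as in the paper):

def decompose(w, g):
    # w_g = (w . g) g is the gender component; w_perp = w - w_g is the remainder
    w = unit(w)
    w_g = (w @ g) * g
    return w_g, w - w_g

def indirect_bias(w, v, g):
    # B(w, v) = (w.v - (w_perp . v_perp) / (|w_perp| |v_perp|)) / (w.v)
    w, v = unit(w), unit(v)
    _, w_perp = decompose(w, g)
    _, v_perp = decompose(v, g)
    inner = w @ v
    perp_sim = (w_perp @ v_perp) / (np.linalg.norm(w_perp) * np.linalg.norm(v_perp))
    return (inner - perp_sim) / inner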


A simple technique for debiasing GloVe
Identifying and quantifying bias in word embeddings
Identifying and quantifying bias in word embeddings

Reference: Bolukbasi et al., “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings”, NeurIPS 2016.
Another version is here.
