
Natural Language Processing

(CS5803)
Lecture 3
(Word Representations)
Words as vectors: Word2Vec

● The representation of a word is dictated by its surrounding words
● Assume a fixed-length context window
● For example:
○ [w-2 w-1 c w1 w2]
● Start with random initialization
● Iterate till convergence
Word2Vec Models: SkipGram (SG)

● Training sentence:
● ... the algorithm’s asymptotic complexity is quadratic ...
● w-2 w-1 c w1 w2
● Considering words in a context window of length 5:
○ P(context|target)
○ P([w-2 w-1 w1 w2] | c) = ?
Word2Vec Models: CBOW

● Training sentence:
● ... the algorithm’s asymptotic complexity is quadratic ...
● w-2 w-1 c w1 w2
● Considering words in a context window of length 5:
○ P(target|context)
○ P(c | w-2 w-1 w1 w2) = ?
Objective function

Ref: “Distributed Representations of Words and Phrases and their Compositionality”, Mikolov et al. (2013)
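For reference, the skip-gram objective from the cited Mikolov et al. (2013) paper maximizes the average log probability of the context words given the target word:

\[
\frac{1}{T}\sum_{t=1}^{T}\;\sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p(w_{t+j}\mid w_t),
\qquad
p(w_O \mid w_I) = \frac{\exp\!\big({v'_{w_O}}^{\top} v_{w_I}\big)}{\sum_{w=1}^{V} \exp\!\big({v'_{w}}^{\top} v_{w_I}\big)}
\]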
More examples of target and context

Ref: https://towardsdatascience.com/an-implementation-guide-to-word2vec-using-numpy-and-google-sheets-13445eebd281
Skip-gram

Slide courtesy of Jurafsky & Martin


Steps with example

We must learn two weight matrices: W (input to hidden) and W′ (hidden to output).

[Figure: one-hot V-dimensional input vectors for the context words “cat” and “on” feed through a shared N-dimensional hidden layer into a V-dimensional output layer that predicts the target word “sat”; N is the size of the word vector.]
Steps with example

[Figure: the same network with the output layer passed through a softmax, giving a probability for every vocabulary word; the target word “sat” receives the highest probability (0.7) while the other words receive small values (0.00-0.02). N is the size of the word vector.]
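A minimal NumPy sketch of the forward pass in the figure above (CBOW-style: the one-hot context words are averaged through W, scored against every vocabulary word through W′, and passed through a softmax). The toy vocabulary, dimensions, and random seed are assumptions for illustration:

import numpy as np

# Toy setup (hypothetical): vocabulary of 6 words, 3-dimensional embeddings
vocab = ["the", "cat", "sat", "on", "mat", "dog"]
V, N = len(vocab), 3
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, N))        # input->hidden weights  (V x N)
W_prime = rng.normal(scale=0.1, size=(N, V))  # hidden->output weights (N x V)

def one_hot(word):
    x = np.zeros(V)
    x[vocab.index(word)] = 1.0
    return x

# Context words "cat" and "on" predict the target word "sat"
context = ["cat", "on"]
h = np.mean([one_hot(w) @ W for w in context], axis=0)   # hidden layer, N-dim
scores = h @ W_prime                                      # one score per vocabulary word
probs = np.exp(scores) / np.exp(scores).sum()             # softmax: P(word | context)
print(dict(zip(vocab, probs.round(3))))

Training then nudges W and W′ so that the probability assigned to the observed target (“sat”) increases, which is the weight-update step covered next.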
Learning the representations: Step by step

● Ref: https://towardsdatascience.com/an-implementation-guide-to-word2vec-using-numpy-and-google-sheets-13445eebd281
Learning the representations: Step by step

For more details on the weight updates, see the paper “word2vec Parameter Learning Explained”.
Word2Vec: References
● Distributed Representations of Words and Phrases and their Compositionality
● https://www.geeksforgeeks.org/python-word-embedding-using-word2vec/
● https://radimrehurek.com/gensim/models/word2vec.html
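A minimal usage sketch for the gensim library linked above (assuming gensim 4.x, where the embedding size parameter is vector_size; the two-sentence toy corpus is an assumption):

from gensim.models import Word2Vec

# Toy corpus (hypothetical): each sentence is a list of tokens
sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "rug"]]

# sg=1 selects skip-gram; sg=0 (the default) selects CBOW
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

print(model.wv["cat"][:5])            # first 5 dimensions of the learned vector
print(model.wv.most_similar("cat"))   # nearest neighbours by cosine similarity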
Analogy: Embeddings capture relational meaning!

vector(‘king’) - vector(‘man’) + vector(‘woman’) ≈ vector(‘queen’)

vector(‘Paris’) - vector(‘France’) + vector(‘Italy’) ≈ vector(‘Rome’)
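Such analogy queries can be reproduced with gensim on vectors trained on a large corpus; the sketch below assumes a local copy of the pretrained Google News word2vec vectors (the file path is an assumption):

from gensim.models import KeyedVectors

# Assumed local copy of the pretrained Google News word2vec vectors
wv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

# king - man + woman ≈ ?
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# Paris - France + Italy ≈ ?
print(wv.most_similar(positive=["Paris", "Italy"], negative=["France"], topn=1))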



Word analogies
Multicontext representation learning
Evaluation on Word Similarity Task

WordSim353: http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/
Evaluation on Semantic Textual Similarity Task
GloVe
● Stands for Global Vectors for Word Representation
○ Emphasizes co-occurrence with context/probe words
● Learns two representations (W, W̃) for each word
● Focuses on ratios of co-occurrence probabilities
○ Given words wi and wj and a probe word wk, model the ratio of their co-occurrence probabilities: F(wi, wj, w̃k) = Pik/Pjk
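Here Pik denotes the probability that word k appears in the context of word i, computed from the co-occurrence count matrix X (notation as in the GloVe paper):

\[
P_{ik} = P(k \mid i) = \frac{X_{ik}}{X_i}, \qquad X_i = \sum_{m} X_{im}
\]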
GloVe
● Word vector spaces exhibit linear substructures
● Natural way of defining F: use vector differences and dot products
● Control the form that F can take:
○ Choose F so that the roles of the input word and the context word can be interchanged later
○ Model F as exp(·)
○ Introduce bias terms and absorb log(Xi)
● Final objective function (reconstructed below)
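A reconstruction of the derivation steps listed above, following Pennington et al. (2014) (the slide’s equation images are not reproduced):

\begin{align*}
F(w_i - w_j,\ \tilde{w}_k) &= \frac{P_{ik}}{P_{jk}} && \text{(vector differences)}\\
F\big((w_i - w_j)^{\top}\tilde{w}_k\big) &= \frac{P_{ik}}{P_{jk}} && \text{(dot products)}\\
F\big(w_i^{\top}\tilde{w}_k\big) &= P_{ik}, \qquad F(\cdot)=\exp(\cdot) && \text{(so word and context roles can be swapped)}\\
w_i^{\top}\tilde{w}_k &= \log P_{ik} = \log X_{ik} - \log X_i &&\\
w_i^{\top}\tilde{w}_k + b_i + \tilde{b}_k &= \log X_{ik} && \text{(absorb } \log X_i \text{ into the biases)}
\end{align*}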


GloVe (Summary)
● Stands for Global Vectors for Word Representation
○ Emphasizes co-occurrence with context words
● Learns two representations (W, W̃) for each word
● The prediction problem and the objective function are given below
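The two formulas, as given in Pennington et al. (2014) (x_max = 100 and α = 3/4 are the paper’s defaults):

\[
w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j = \log X_{ij}
\]
\[
J = \sum_{i,j=1}^{V} f(X_{ij})\,\big(w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\big)^2,
\qquad
f(x) = \begin{cases} (x/x_{\max})^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}
\]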


Embeddings reflect societal bias
● Ask “Paris : France :: Tokyo : x”
○ x = Japan
● Ask “father : doctor :: mother : x”
○ x = nurse
● Ask “man : computer programmer :: woman : x”
○ x = homemaker

Bolukbasi, Tolga, Kai-Wei Chang, James Y. Zou, Venkatesh Saligrama, and Adam T. Kalai. “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings.” In Advances in Neural Information Processing Systems, pp. 4349-4357, 2016.
Embeddings Reflect Societal Bias
(w2v embeddings trained on Google News)
Identifying and quantifying bias in word embeddings
● Assumption: the aspect of bias is known, e.g. gender
● Find the “gender” dimension
○ Collect explicit gender-based word pairs (f, m): (woman, man), (mother, father), (gal, guy), (girl, boy), (she, he)
○ Get the gender dimension g from the (f - m) difference vectors [How?]
● Collect a set N of gender-neutral words
● Compute the gender component of the elements of N (see the sketch below)
○ DirectBias = (1/|N|) ∑w∈N |cos(w, g)|
○ |cos(w, g)| can be raised to a power c
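A minimal NumPy sketch of DirectBias under the setup above; emb is a hypothetical word-to-vector dictionary, and the gender direction g is taken here as the normalized mean of the (f - m) difference vectors (the paper itself uses the top principal component of the definitional pairs):

import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def gender_direction(emb, pairs):
    # pairs: [("woman", "man"), ("mother", "father"), ("she", "he"), ...]
    diffs = [unit(emb[f]) - unit(emb[m]) for f, m in pairs]
    return unit(np.mean(diffs, axis=0))   # simplification of the PCA step in the paper

def direct_bias(emb, neutral_words, g, c=1.0):
    # DirectBias = (1/|N|) * sum over w in N of |cos(w, g)|^c
    cosines = [abs(unit(emb[w]) @ g) for w in neutral_words]
    return float(np.mean(np.array(cosines) ** c))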
Identifying and quantifying bias in word embeddings
● How to capture indirect bias?
● Direct bias: component along the gender dimension
● Indirect bias: component along its perpendicular
● Need the component of a word vector perpendicular to the “gender” dimension
● Component of a vector a along a vector b:
○ Scalar component: comp_b(a) = (a·b)/|b|
○ Vector component: comp_b(a) · (b/|b|)
● wg = (w·g)g,  w⊥ = w - wg
● IndirectBias B(w, v) = (w·v - (w⊥·v⊥)/(|w⊥| |v⊥|)) / (w·v)  (see the sketch below)
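A corresponding sketch for the indirect-bias score, reusing unit and the gender direction g from the previous snippet (word vectors are assumed to be length-normalized, as in the paper):

def decompose(w, g):
    # w_g = (w . g) g is the gender component; w_perp = w - w_g is the remainder
    w = unit(w)
    w_g = (w @ g) * g
    return w_g, w - w_g

def indirect_bias(w, v, g):
    # B(w, v) = (w.v - (w_perp . v_perp) / (|w_perp| |v_perp|)) / (w.v)
    w, v = unit(w), unit(v)
    _, w_perp = decompose(w, g)
    _, v_perp = decompose(v, g)
    inner = w @ v
    perp_sim = (w_perp @ v_perp) / (np.linalg.norm(w_perp) * np.linalg.norm(v_perp))
    return (inner - perp_sim) / inner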


A simple technique for debiasing GloVe
Identifying and quantifying bias in word embeddings
Identifying and quantifying bias in word embeddings

Reference: Bolukbasi et al., “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings”, NeurIPS 2016.
Another version is here.
