Lec 3
Dr. Yi Zhang
Introduction
Three Documents
Distance in the Vector Space
Cosine Similarity
Application: Product Similarity
An important theoretical concept in industrial organization is a firm's location in a product space.
Distorted Distance in Vector Space
Latent variable representations can more accurately identify
document similarity.
The problem of polysemy is that the same word can have multiple meanings. What is the cosine similarity between the following documents?
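A minimal sketch of the computation in Python; the two bag-of-words count vectors are hypothetical stand-ins for the documents on the slide:

```python
import numpy as np

def cosine_similarity(u, v):
    """cos(u, v) = u.v / (||u|| ||v||)."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical term counts over a shared five-word vocabulary.
doc1 = np.array([1, 2, 0, 1, 0])
doc2 = np.array([0, 1, 1, 1, 1])
print(f"cosine similarity: {cosine_similarity(doc1, doc2):.3f}")
```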
Word Embedding
Latent Semantic Analysis
Singular Value Decomposition
Approximating the Term-Document Matrix
We can obtain a rank-k approximation of the term-document matrix, X_k, by constructing X_k = T Σ_k D', where Σ_k is the diagonal matrix formed by replacing Σ_{i,i} with 0 for i > k.
The idea is to keep the "content" dimensions that explain common variation across terms and documents and drop the "noise" dimensions that represent idiosyncratic variation.
Often k is selected to explain a fixed portion p of the variance in the data. In this case k is the smallest value that satisfies:
\sum_{i=1}^{k} \sigma_i^2 / \sum_i \sigma_i^2 \geq p
Intuition:
(Terms × Docs) = (Terms × Topics) × (Topics × Docs)
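A minimal numpy sketch of this truncation; the small term-document matrix is a made-up example:

```python
import numpy as np

# Hypothetical 5-term x 4-document count matrix (rows = terms, columns = docs).
X = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 0, 1],
    [0, 0, 3, 1],
    [1, 0, 0, 2],
], dtype=float)

# Full SVD: X = T @ diag(sigma) @ D'.
T, sigma, Dt = np.linalg.svd(X, full_matrices=False)

# Smallest k whose singular values explain a share p of the variance.
p = 0.90
explained = np.cumsum(sigma**2) / np.sum(sigma**2)
k = int(np.searchsorted(explained, p) + 1)

# Rank-k approximation: zero out singular values beyond the k-th.
X_k = T[:, :k] @ np.diag(sigma[:k]) @ Dt[:k, :]

print(f"k = {k}, share of variance explained = {explained[k-1]:.3f}")
```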
Example
Example: Cosine Similarity
Application
Statistical Models of Dimensionality Reduction
Word2Vec
John R. Firth
Terms Close to Uncertainty in FOMC Transcripts
Terms Close to Risk
GloVe
Interpretability: Clustering in the Vector Space
K-Means
Let V_k be the set of all embeddings that are in cluster k. The centroid of cluster k is
\vec{u}_k = \frac{1}{|V_k|} \sum_{v \in V_k} \vec{x}_v
where \vec{x}_v is the embedding vector for the v-th term.
In k-means we choose cluster assignments {V_1, V_2, ..., V_K} to minimize the sum of squares between each term and its cluster centroid:
\sum_k \sum_{v \in V_k} \|\vec{x}_v - \vec{u}_k\|^2
The solution groups similar embeddings together, and the centroids represent prototype embeddings within each cluster.
Normalize embeddings to have unit length, as before.
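A minimal numpy sketch of Lloyd's algorithm for this objective, using random stand-in embeddings normalized to unit length:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for real term embeddings: 100 terms, 50 dimensions.
X = rng.normal(size=(100, 50))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit length, as on the slide

K = 5
centroids = X[rng.choice(len(X), size=K, replace=False)]  # random init

for _ in range(100):
    # Assignment step: each term goes to its nearest centroid.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: each centroid becomes the mean of its assigned embeddings.
    # (A fuller version would re-seed any cluster that goes empty.)
    new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids

within_ss = ((X - centroids[labels]) ** 2).sum()
print(f"within-cluster sum of squares: {within_ss:.2f}")
```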
Solution Algorithm
Directions Encode Meaning
Importance of Training Corpus
Relationships among words can vary depending on the training
corpus.
Example of training word embeddings on Wiki/Newswire text and on
Harvard Business Review.
Application: Embedding Dictionaries
Set A of words represents emotion, and set C of words represents cognition (both from LIWC). Emotionality of speech i is:
Y_i = \frac{sim(d_i, A) + b}{sim(d_i, C) + b}
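A minimal sketch of this score, assuming sim(·,·) is the cosine similarity between a speech's average word embedding and each dictionary's average embedding; the word lists, vectors, and b are placeholders:

```python
import numpy as np

def avg_vector(words, vectors):
    """Average the embeddings of the words we have vectors for."""
    return np.mean([vectors[w] for w in words if w in vectors], axis=0)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def emotionality(speech_tokens, A, C, vectors, b=1.0):
    """Y_i = (sim(d_i, A) + b) / (sim(d_i, C) + b)."""
    d = avg_vector(speech_tokens, vectors)
    return ((cosine(d, avg_vector(A, vectors)) + b)
            / (cosine(d, avg_vector(C, vectors)) + b))

# Toy random embedding table and placeholder word lists (stand-ins for LIWC).
rng = np.random.default_rng(0)
vocab = ["afraid", "happy", "think", "know", "rates", "rose"]
vectors = {w: rng.normal(size=50) for w in vocab}
A = ["afraid", "happy"]  # emotion words
C = ["think", "know"]    # cognition words
print(emotionality(["rates", "rose", "happy"], A, C, vectors))
```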
Word Cloud: Cognitive
Word Cloud: Emotional
Word Embeddings and Cultural Attitudes
Because word embeddings appear to capture semantically meaningful
relationships among words, there is interest in using them to measure
cultural attitudes.
In psychology there is a long-standing Implicit Association Test that measures the time participants take to correctly classify images depending on word combinations.
The hypothesis is that reaction times are shorter when word combinations more naturally belong together, which allows a measure of bias.
Caliskan et al. (2017) used word embeddings to ask whether similar biases exist in natural language.
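A minimal sketch of the word-embedding association measure in the spirit of Caliskan et al. (2017); the word lists and random vectors are placeholders for pretrained embeddings:

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B, vectors):
    """s(w, A, B): mean cosine to attribute set A minus mean cosine to B."""
    return (np.mean([cosine(vectors[w], vectors[a]) for a in A])
            - np.mean([cosine(vectors[w], vectors[b]) for b in B]))

def weat_statistic(X, Y, A, B, vectors):
    """Differential association of target sets X, Y with attribute sets A, B."""
    return (sum(association(x, A, B, vectors) for x in X)
            - sum(association(y, A, B, vectors) for y in Y))

# Toy random vectors so the sketch runs; real use loads pretrained embeddings.
rng = np.random.default_rng(0)
words = ["flower", "rose", "insect", "ant",
         "pleasant", "love", "unpleasant", "hate"]
vectors = {w: rng.normal(size=50) for w in words}
print(weat_statistic(X=["flower", "rose"], Y=["insect", "ant"],
                     A=["pleasant", "love"], B=["unpleasant", "hate"],
                     vectors=vectors))
```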
Implicit Association Test
Implicit Association Test: China
Word-Embedding Association Test
IAT vs WEAT
Language and Culture (Kozlowski et al. 2018)
Application: Does Language Affect Decisions?
WEAT and Judge Characteristics I
WEAT and Judge Characteristics II
Application: Expanding Dictionaries
One application of word embeddings is to augment human judgment in the construction of dictionaries.
The motivation is that economists are experts in which concepts might be most important in a particular setting, but not in which words relate to those concepts.
One can specify a set of seed words and then find the nearest neighbors of those words to populate a dictionary, as in the sketch below.
This strategy has been adopted by several recent papers.
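A minimal sketch of this seed-expansion step using gensim; the pretrained model and seed words are illustrative choices:

```python
import gensim.downloader as api

# Downloads a small pretrained GloVe model on first use (illustrative choice).
vectors = api.load("glove-wiki-gigaword-50")

# Placeholder seed words for an "uncertainty" concept.
seeds = ["uncertainty", "risk", "unpredictable"]

# Expand: nearest neighbors of the seed set in the embedding space.
candidates = vectors.most_similar(positive=seeds, topn=20)
for word, sim in candidates:
    print(f"{word}\t{sim:.3f}")
```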
Choosing Among Algorithms
How to Proceed?
Transfer Learning
Conclusion