0% found this document useful (0 votes)

62 views19 pages

Humpty Dumpty: Controlling Word Meanings Via Corpus Poisoning

This document discusses how an attacker can control word meanings by poisoning the corpus used to train word embedding models. It introduces two adversarial objectives: making a word a top neighbor of another word, and moving a word between semantic clusters. The attack can affect downstream NLP tasks by manipulating embeddings.

Uploaded by

Marek M

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

62 views19 pages

Humpty Dumpty: Controlling Word Meanings Via Corpus Poisoning

Uploaded by

Marek M

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 19

2020 IEEE Symposium on Security and Privacy

Humpty Dumpty:
Controlling Word Meanings via Corpus Poisoning
Roei Schuster Tal Schuster Yoav Meri Vitaly Shmatikov
Tel Aviv University† CSAIL, MIT † Cornell Tech
[email protected] [email protected] [email protected] [email protected]

Abstract—Word embeddings, i.e., low-dimensional vector rep- neural networks

malleable public data for NLP
resentations such as GloVe and SGNS, encode word “meaning”
in the sense that distances between words’ vectors correspond Wikipedia Twitter newswire sentiment analysis
to their semantic proximity. This enables transfer learning of Named Entity
semantics for a variety of natural language processing tasks. Recognition
Word embeddings are typically trained on large public corpora initialize translation
such as Wikipedia or Twitter. We demonstrate that an attacker semantic embeddings
embeddings layer
who can modify the corpus on which the embedding is trained
can control the “meaning” of new and existing words by changing IR task solvers
their locations in the embedding space. We develop an explicit pizza dog provide word question answering
expression over corpus features that serves as a proxy for distance house semantics for
between words and establish a causative relationship between its information semantic search
cat
retrieval (IR)
values and embedding distances. We then show how to use this text mining
relationship for two adversarial objectives: (1) make a word a
top-ranked neighbor of another word, and (2) move a word from
Figure I.1: Many NLP tasks rely on word embeddings.
one semantic cluster to another.
An attack on the embedding can affect diverse downstream Second, pre-trained embeddings are a form of transfer
tasks, demonstrating for the first time the power of data poisoning learning. They encode semantic relationships learned from
in transfer learning scenarios. We use this attack to manipulate
query expansion in information retrieval systems such as resume a large, unlabeled corpus. During the supervised training
search, make certain names more or less visible to named entity of an NLP model on a much smaller, labeled dataset, pre-
recognition models, and cause new words to be translated to trained embeddings improve the model’s performance on texts
a particular target word regardless of the language. Finally, containing words that do not occur in the labeled data,
we show how the attacker can generate linguistically likely especially for tasks that are sensitive to the meaning of
corpus modifications, thus fooling defenses that attempt to filter
implausible sentences from the corpus using a language model. individual words. For example, in question-answer systems,
questions often contain just a few words, while the answer may
I. I NTRODUCTION include different—but semantically related—words. Similarly,
“When I use a word,” Humpty Dumpty said, in rather a in Named Entity Recognition (NER) [1], a named entity might
scornful tone, “it means just what I choose it to mean—neither be identified by the sentence structure, but its correct entity-
more nor less.” “The question is,” said Alice, “whether you class (corporation, person, location, etc.) is often determined
can make words mean so many different things.” by the word’s semantic proximity to other words.
Lewis Carroll. Through the Looking-Glass. Furthermore, pre-trained embeddings can directly solve sub-
tasks in information retrieval systems, such as expanding
Word embeddings, i.e., mappings from words to low- search queries to include related terms [21, 39, 61], predicting
dimensional vectors, are a fundamental tool in natural lan- question-answer relatedness [14, 35], deriving the word’s k-
guage processing (NLP). Popular neural methods for comput- means cluster [54], and more.
ing embeddings such as GloVe [55] and SGNS [52] require
large training corpora and are typically learned in an unsuper- Controlling embeddings via corpus poisoning. The data on
vised fashion from public sources, e.g., Wikipedia or Twitter. which the embeddings are trained is inherently vulnerable to
Embeddings pre-trained from public corpora have several poisoning attacks. Large natural-language corpora are drawn
uses in NLP—see Figure I.1. First, they can significantly from public sources that (1) can be edited and/or augmented by
reduce the training time of NLP models by reducing the an adversary, and (2) are weakly monitored, so the adversary’s
number of parameters to optimize. For example, pre-trained modifications can survive until they are used for training.
embeddings are commonly used to initialize the first layer of We consider two distinct adversarial objectives, both ex-
neural NLP models. This layer maps input words into a low- pressed in terms of word proximity in the embedding space.
dimensional vector representation and can remain fixed or else A rank attacker wants a particular source word to be ranked
be (re-)trained much faster. high among the target word’s neighbors. A distance attacker
wants to move the source word closer to a particular set of
† This research was performed at Cornell Tech. words and further from another set of words.

© 2020, Roei Schuster. Under license to IEEE. 1295

DOI 10.1109/SP40000.2020.00115
distributional embeddings
semantic
corpus is ineffective. Aggressive filtering drops the majority
space space
objective of the actual corpus and still does not foil the attack.
1. introduce a new
corpus
point in the space To the best of our knowledge, ours is the first data-poisoning
2. make points closer
distributional or farther apart attack against transfer learning. Furthermore, embedding-
objective
corpus based NLP tasks are sophisticated targets, with two consec-
modifications
utive training processes (one for the embedding, the other
Figure I.2: Semantic changes via corpus modifications. for the downstream task) acting as levels of indirection. A
Achieving these objectives via corpus poisoning requires single attack on an embedding can thus potentially affect
first answering a fundamental question: how do changes in multiple, diverse downstream NLP models that all rely on this
the corpus correspond to changes in the embeddings? embedding to provide the semantics of words in a language.
Neural embeddings are derived using an opaque optimization
procedure over corpus elements, thus it is not obvious how, II. P RIOR WORK
given a desired change in the embeddings, to compute specific
corpus modifications that achieve this change. Interpreting word embeddings. Levy and Goldberg [41]
argue that SGNS factorizes a matrix whose entries are derived
Our contributions. First, we show how to relate proximity in from cooccurrence counts. Arora et al. [7, 8], Hashimoto et
the embedding space to distributional, aka explicit expressions al. [31], and Ethayarajh et al. [26] analytically derive explicit
over corpus elements, computed with basic arithmetics and no expressions for embedding distances, but these expressions are
weight optimization. Word embeddings are expressly designed not directly usable in our setting—see Section IV-A. (Un-
to capture (a) first-order proximity, i.e., words that frequently wieldy) distributional representations have traditionally been
occur together in the corpus, and (b) second-order proximity, used in information retrieval [28, 71]; Levy and Goldberg [40]
i.e., words that are similar in the “company they keep” (they show that they can perform similarly to neural embeddings
frequently appear with the same set of other words, if not on analogy tasks. Antoniak et al. [5] empirically study the
with each other). We develop distributional expressions that stability of embeddings under various hyperparameters.
capture both types of semantic proximity, separately and The problem of modeling causation between corpus fea-
together, in ways that closely correspond to how they are tures and embedding proximities also arises when mitigating
captured in the embeddings. Crucially, the relationship is stereotypical biases encoded in embeddings [12]. Brunet et
causative: changes in our distributional expressions produce al. [13] recently analyzed GloVe’s objective to detect and
predictable changes in the embedding distances. remove articles that contribute to bias, given as an expression
Second, we develop and evaluate a methodology for in- over word vector proximities.
troducing adversarial semantic changes in the embedding To the best of our knowledge, we are the first to develop
space, depicted in Figure I.2. As proxies for the semantic ob- explicit expressions for word proximities over corpus cooc-
jectives, we use distributional objectives, expressed and solved currences, such that changes in expression values produce
as an optimization problem over word-cooccurrence counts. consistent, predictable changes in embedding proximities.
The attacker then computes corpus modifications that achieve
the desired counts. We show that our attack is effective against Poisoning neural networks. Poisoning attacks inject data into
popular embedding models—even if the attacker has only a the training set [16, 48, 66, 68, 77] to insert a “backdoor” into
small sub-sample of the victim’s training corpus and does not the model or degrade its performance on certain inputs. Our
know the victim’s specific model and hyperparameters. attack against embeddings can inject new words (Section IX)
Third, we demonstrate the power and universality of our and cause misclassification of existing words (Section X). It
attack on several practical NLP tasks with the embed- is the first attack against two-level transfer learning: it poisons
dings trained on Twitter and Wikipedia. By poisoning the the training data to change relationships in the embedding
embedding, we (1) trick a resume search engine into picking space, which in turn affects downstream NLP tasks.
a specific resume as the top result for queries with chosen Poisoning matrix factorization. Gradient-based poisoning
terms such as “iOS” or “devops”; (2) prevent a Named Entity attacks on matrix factorization have been suggested in the con-
Recognition model from identifying specific corporate names text of collaborative filtering [43] and adapted to unsupervised
or else identify them with higher recall; and (3) make a word- node embeddings [69]. These approaches are computationally
to-word translation model confuse an attacker-made word with prohibitive because the matrix must be factorized at every
an arbitrary English word, regardless of the target language. optimization step, nor do they work in our setting, where most
Finally, we show how to morph the attacker’s word se- gradients are 0 (see Section VI).
quences so they appear as linguistically likely as actual sen- Bojchevski and Gúnnerman recently suggested an attack
tences from the corpus, measured by the perplexity scores of on node embeddings that does not use gradients [11] but
a language model (the attacker does not need to know the the computational cost remains too high for natural-language
specifics of the latter). Filtering out high-perplexity sentences cooccurrence graphs where the dictionary size is in the mil-
thus has prohibitively many false positives and false negatives, lions. Their method works on graphs, not text; the mapping
and using a language model to “sanitize” the training between the two is nontrivial (we address this in Section VII).

1296
The only task considered in [11] is generic node classification, III C corpus
D dictionary
whereas we work in a complete transfer learning scenario. u, v, r dictionary words
Adversarial examples. There is a rapidly growing literature on {eu }u∈D embedding vectors
{w u }u∈D “word vectors”
test-time attacks on neural-network image classifiers [3, 37, 38, {cu }u∈D “context vectors”
49, 70]; some employ only black-box model queries [15, 33] bu , bv GloVe bias terms, see Equation III.1
rather than gradient-based optimization. We, too, use a non- cos (y , z) cosine similarity
C ∈ R|D|×|D| C’s cooccurrence matrix
gradient optimizer to compute cooccurrences that achieve the Cu ∈ R|D| u’s row in C
desired effect on the embedding, but in a setting where queries Γ size of window for cooccurrence counting
are cheap and computation is expensive. γ:N→R cooccurrence event weight function
SPPMI matrix defined by Equation III.2
Neural networks for text processing are just as vulnerable BIAS matrix defined by Equation III.3
to adversarial examples, but example generation is more chal-
IV SIM1 (u, v) u · cv + cu · w
w v , see Equation III.4
lenging due to the non-differentiable mapping of text elements SIM2 (u, v) u · w
w v + cu · cv , see Equation III.4
to the embedding space. Dozens of attacks and defenses have {Bu }u∈D word bias terms, to downweight common words
been proposed [4, 10, 22, 29, 34, 46, 62, 63, 73, 74]. fu,v (c, ) max {log (c) − Bu − Bv , }
matrix with entries of the form fu,v (c, 0)
By contrast, we study training-time attacks that change M ∈ R|D|×|D|
(e.g., SPPMI, BIAS)
word embeddings so that multiple downstream models behave M u ∈ R|D| u’s row in M
incorrectly on unmodified test inputs.
SIM1 (u, v) explicit expression for cu · w v , set as Mu,v
Nu,v normalization term for first-order proximity
III. BACKGROUND AND NOTATION 1 (u, v) explicit expression for cos (cu , w v ),
sim
set as fu,v (Cu,v , 0) /Nu,v
Table I summarizes our notation. Let D be a dictionary of
2 (u, v)
explicit expression
cos (w
for u, w v ),
words and C a corpus, i.e., a collection of word sequences. A sim
set as cos M u, M v
word embedding algorithm aims to learn a low-dimensional 1+2 (u, v)
sim 1 (u, v) /2 + sim
sim 2 (u, v) /2
vector {eu } for each u ∈ D. Semantic similarity between LCO ∈ R|D|×|D| entries defined by max {log (Cu,v ) , 0}
words is encoded as the cosine similarity of their correspond- V Δ word sequences added by the attacker
def C+Δ
ing vectors, cos (y , z) = y · z/ y 2 z2 where y · z is the corpus after the attacker’s additions
|Δ| size of the attacker’s additions, see Section V
vector dot product. The cosine similarity of L2-normalized s, t ∈ D source, target words
vectors is (1) equivalent to their dot product, and (2) linear in NEG, POS “positive” and “negative” target words
negative squared L2 (Euclidean) distance. simΔ (u, v) embedding cosine similarity after the attack
J (s, NEG, POS; Δ) embedding objective
Embedding algorithms start from a high-dimensional repre- maxΔ proximity attacker’s maximum allowed |Δ|
sentation of the corpus, its cooccurrence matrix {Cu,v }u,v∈D r rank attacker’s target rank
where Cu,v is a weighted sum of cooccurrence events, i.e., tr rank attacker’s minimum proximity threshold
(u, v)
sim distributional expression for cosine similarity
appearances of u, v in proximity to each other. Function γ (d) (u, v)
sim distributional expression for simΔ (u, v)
Δ
gives each event a weight that is inversely proportional to the J s, NEG, POS; Δ
distributional objective
distance d between the words. rank attacker’s estimated threshold for
r
t
Embedding algorithms first learn two intermediate repre- distributional proximity
sentations for each word u ∈ D, the word vector w u and the α r estimation error
“safety margin” for t
C[[s]←Cs +Δ cooccurrence matrix after adding Δ
context vector cu , then compute eu from them. ]

GloVe. GloVe deﬁnes and optimizes (via SGD) a minimiza- VI L ⊆ R

possible changes at every step, set to
1 2 30
tion objective
directly over cooccurrence counts, weighted by 5 , 5 , ..., 5
i∈D index into Δ, also a word in D
1/d d ≤ Γ i in optimization step
γ (d) = for some window size Γ: δ∈L increase in Δ
0 else
di,δ X change in expression X when adding δ to Δ
i
⎧ ⎫
⎨ 2 ⎬ λ
words to each side of s in sequences
argmin g (Cu,v ) · w u · cv + bu + bv − log (Cu,v ) , aiming to increase second-order proximity
⎩ ⎭
ω vector such that |Δ| ≤ ω
·Δ
u,v∈D
Ωsearch , Ωcorp , Ωtrans ,
(III.1) sets of attacked words in our experiments
Ωrank , Ωbenchmark
C , M , B
expressions computed using C[[s]←Cs +Δ
b, b }. bu , bu
where argmin is taken over the parameters {c, w, sim’ 2 , sim’
1 , sim’ 1+2 ]

are scalar bias terms that are learned

along with the word and Table I: Notation reference.
def c3/4 , c ≤ cmax
context vectors, and g (c) = for some
cmax , else and CBOW with hierarchical softmax (CBHS). In contrast
parameter cmax (typically cmax ∈ [10, 100]). At the end of to GloVe, Word2vec discards context vectors and uses word
the training, GloVe sets the embedding eu ← w u + cu . u as the embeddings, i.e., ∀u ∈ D : eu ← w
vectors w u.
Word2vec. Word2vec [50] is a family of models that opti- Appendix A provides further details.
mize objectives over corpus cooccurrences. In this paper, we There exist other embeddings, such as FastText, but under-
experiment with the skip-gram with negative sampling (SGNS) standing them is not required as the background for this paper.

1297
def def
Contextual embeddings. Contextual embeddings [20, 57] where SIM1 (u, v) = w u ·cv +cu · w
v and SIM2 (u, v) = w
u ·
support dynamic word representations that change depend- v +cu ·cv . They conjecture that SIM1 and SIM2 correspond
w
ing on the context of the sentence they appear in, yet, in to, respectively, first- and second-order proximities.
expectation, form an embedding space with non-contextual Indeed, SIM1 seems to be a measure of cooccurrence
relations [65]. In this paper, we focus on the popular non- counts, which measure first-order proximity: Equation III.3
contextual embeddings because (a) they are faster to train leads to SIM1 (u, v) ≈ 2 BIASu,v . BIAS is symmetrical up
and easier to store, and (b) many task solvers use them by to a small error, stemming from the difference between GloVe
construction (see Sections IX through XI). bias terms bu and bu , but they are typically very close—see
Distributional representations. A distributional or explicit Section IV-B. This also assumes that the embedding optimum
representation of a word is a high-dimensional vector whose perfectly recovers the BIAS matrix.
entries correspond to cooccurrence counts with other words. There is no distributional expression for SIM2 (u, v) that
Dot products of the learned word vectors and context does not rely on problematic assumptions (see Section IV-A),
vectors (w u · cv ) seem to correspond to entries of a high- but there is ample evidence for the conjecture that SIM2
dimensional matrix that is closely related to, and directly somehow captures second-order proximity (see Section IV-B).
computable from, the cooccurrence matrix. Consequently, both Since word and context vectors and their products typically
SGNS and GloVe can be cast as matrix factorization methods. have similar ranges, Equation III.4 suggests that embeddings
Levy and Goldberg [41] show that, assuming training with
unlimited dimensions, SGNS’s objective has an optimum at weight first- and second-order proximities equally.
∀u, v ∈ D : w u · cv = SPPMIu,v defined as:
IV. F ROM EMBEDDINGS TO EXPRESSIONS OVER CORPUS
def
SPPMIu,v = max The key problem that must be solved to control word

meanings via corpus modifications is finding a distributional
log (Cu,v ) − log Cu,r − log Cv,r + log (Z/k) , 0 expression, i.e., an explicit expression over corpus features
r∈D r∈D
(III.2) such as cooccurrences, for the embedding distances, which
def are the computational representation of “meaning.”
where k is the negative-sampling constant and Z =
u,v∈D Cu,v . This variant of pointwise mutual information A. Previous work is not directly usable
(PMI) downweights a word’s cooccurrences with common
words because they are less “significant” than cooccurrences Several prior approaches [7, 8, 26] derive distributional
with rare words. The rows of the SPPMI matrix define a expressions for distances between word vectors, all of the
distributional representation. form eu · ev ≈ A · log (Cu,v ) − Bu − Bv . The downweighting
GloVe’s objective similarly has an optimum ∀u, v ∈ D : role of Bu , Bv seems similar to SPPMI and BIAS, thus these
u · cv = BIASu,v defined as:
w expressions, too, can be viewed as variants of PMI.
These approaches all make simplifying assumptions that do
BIASu,v = max log (Cu,v ) − bu − bv , 0
def
(III.3) not hold in reality. Arora et al. [7, 8] and Hashimoto et al. [31]
assume a generative language model where words are emitted
max is a simplification: in rare and negligible cases, the by a random walk. Both models are parameterized by low-
optimum of w u · cv is slightly below 0. Similarly to SPPMI, dimensional word vectors {e∗ u }u∈D and assume that context
BIAS downweights cooccurrences with common words (via and word vectors are identical. Then they show how {e∗ u }u∈D
the learned bias values bu , bv ). optimize the objectives of GloVe and SGNS.
First- and second-order proximity. We expect words that By their very construction, these models uphold a
frequently cooccur with each other to have high semantic very strong relationship between cooccurrences and low-
proximity. We call this first-order proximity. It indicates that dimensional representation products. In Arora et al., these
the words are related but not necessarily that their meanings products are equal to PMIs; in Hashimoto et al., the vectors’
are similar (e.g., “first class” or “polar bear”). L2 norm differences, which are closely related to their product,
The distributional hypothesis [27] says that distributional approximate their cooccurrence count. If such “convenient”
vectors capture semantic similarity by second-order proximity: low-dimensional vectors exist, it should not be surprising that
the more contexts two words have in common, the higher they optimize GloVe and SGNS.
their similarity, regardless of their cooccurrences with each The approximation in Ethayarajh et al. [26] only holds
other. For example, “terrible” and “horrible” hardly ever co- within a single set of word pairs that are “contextually copla-
occur, yet their second-order proximity is very high. Levy and nar,” which loosely means they appear in related contexts. It is
Goldberg [40] showed that linear relationships of distributional unclear if coplanarity holds in reality over large sets of word
representations are similar to those of word embeddings. pairs, let alone the entire dictionary.
Some of the above papers use correlation tests to jus-
Levy and Goldberg [42] observe that, summing the context
tify their conclusion that dot products follow SPPMI-like
and word vectors eu ← w u +cu , as done by default in GloVe,
expressions. Crucially, correlation does not mean that the
leads to the following:
embedding space is derived from (log)-cooccurrences in a
eu · ev = SIM1 (u, v) + SIM2 (u, v) (III.4) distance-preserving fashion, thus correlation is not sufficient

1298
to control the embeddings. We want not just to characterize projection onto GloVe’s objective III.1:
how embedding distances typically relate to corpus elements, 2
but to achieve a specific change in the distances. To this end, JGloVe [u] = g (Cu,v ) w uT cv + bu + bv − log Cu,v
v∈D
we need an explicit expression over corpus elements whose
value is encoded in the embedding distances by the embedding This expression is determined entirely by u’s row in MBIAS .
algorithm (Figure I.2). If two words have the same distributional vector, their ex-
Furthermore, these approaches barter generality for analytic pressions in the optimization objective will be completely
simplicity and derive distributional expressions that do not symmetrical, resulting in very close embeddings—even if their
account for second-order proximity at all. As a consequence, cooccurrence count is 0. Second, the view of the embeddings
the values of these expressions can be very different from the as matrix factorization implies an approximate linear transfor-
embedding distances, since words that only rarely appear in mation between the distributional and embedding spaces. Let
def T
the same window (and thus have low PMI) may be close in C = cu1 . . . cu|D| be the matrix whose rows are context
the embedding space. For example, “horrible” and “terrible” vectors of words ui ∈ D. Assuming M is perfectly recovered
are so semantically close they can be used as synonyms, by the products of word and context vectors, C · w u = M u.
yet they are also similar phonetically and thus their adjacent Dot products have very different scale in the distributional
use in natural speech and text appears redundant. In a dim- and embedding spaces. Therefore, we use cosine similarities,
100 GloVe model trained on Wikipedia, “terrible” is among which are always between -1 and 1, and set
the top 3 words closest to “horrible” (with cosine similarity
0.8). However, when words are ordered by their PMI with 2 (u, v) def
sim = cos M u, M
v (IV.2)
“horrible,” “terrible” is only in the 3675th place.
As long as M entries are nonnegative, the value of this
expression is always between 0 and 1.
B. Our approach
Combining first- and second-order proximity. Our expres-
We aim to find a distributional expression for the se- sions for first- and second-order proximities have different
mantic proximity encoded in the embedding distances. The scales:
SIM1 (u, v) corresponds to an unbounded dot product,
first challenge is to find distributional expressions for both 2 (u, v) is at most 1. To combine them, we normalize
while sim
first- and second-order proximities encoded by the embedding def
SIM1 (u, v). Let fu,v (c, ) = max {log (c) − Bu − Bv , },
algorithms. The second is to combine them into a single then SIM1 (u, v) = Mu,v =fu,v (Cu,v , 0). We set
expression corresponding to embedding proximity. def −60 −60
Nu,v = fu,v r∈D Cu,r , e fu,v r∈D Cv,r , e
First-order proximity. First-order proximity corresponds to as the normalization term. This is similar to the normalization
cooccurrence counts and is relatively straightforward to ex- term of cosine similarity and ensures that the value is between
press in terms of corpus elements. Let M be the matrix 0 and 1. The max operation is taken with a small e−60 ,
that the embeddings factorize, e.g., SPPMI for SGNS (Equa- rather than 0, to avoid division by 0 in edge cases. We set
tions III.2) or BIAS for GloVe (Equations III.3). The entries of 1 (u, v) def
sim = fu,v (Cu,v , 0) /Nu,v . Our combined distribu-
this matrix are natural explicit expressions for first-order prox- tional expression for the embedding proximity is
imity, since they approximate SIM1 (u, v) from Equation III.4
(we omit multiplication by two as it is immaterial): 1+2 (u, v) def
sim 1 (u, v) /2 + sim
= sim 2 (u, v) /2 (IV.3)
def
SIM1 (u, v) = Mu,v (IV.1) Since sim 2 (u, v) are always between 0 and 1,
1 (u, v) and sim
the value of this expression, too, is between 0 and 1.
Mu,v is typically of the form max {log (Cu,v ) − Bu − Bv , 0}
where Bu , Bv are the “downweighting” scalar values (possibly Correlation tests. We trained a GloVe-paper and a SGNS
depending on u, v’s rows in C). For SPPMI, we set Bu = model on full Wikipedia, as described in Section VIII. We
randomly sampled (without replacement) 500 “source” words
r∈D Cu,r − log (Z/k) /2; for BIAS, Bu = bu .
1
log
and 500 “target” words from the 50,000 most common words
Second-order proximity. Let the distributional representation in the dictionary and computed the distributional expressions
M u of u be its row in M . We hypothesize that distances in this
1 (u, v), sim
sim 2 (u, v), and sim 1+2 (u, v), for all 250,000
representation correspond to second-order proximity encoded source-target word pairs using M ∈ {SPPMI, BIAS, LCO}
in the embedding-space distances. def
First, the objectives of the embedding algorithms seem where LCO is defined by LCO (u, v) = max {log (Cu,v ) , 0}.
to directly encode this connection. Consider a word w’s We then computed the correlations between distributional
proximities and (1) embedding proximities, and (2) word-
1 We consider BIAS context proximities cos (w u , cv ) and word-word proximities
u,v as a distributional expression even though it
depends on bu , bv learned during GloVe’s optimization because these terms cos (w v ), using GloVe’s word and context vectors. These
u, w
can be closely approximated using pre-trained GloVe embeddings—see Ap- correspond, respectively, to first- and second-order proximities
pendix B. For simplicity, we also assume that bu = bu (thus BIAS is of the encoded in the embeddings.
required form); in practice, the difference is very small.
Tables II and III show the results. Observe that (1) in

1299
M 1 (u, v) sim
sim 2 (u, v) sim
1+2 (u, v)
existing sequences, which may be viable for publicly editable
GloVe BIAS 0.47 0.53 0.56 corpora such as Wikipedia.
SPPMI 0.31 0.35 0.36
LCO 0.36 0.43 0.50 We deﬁne the size of the attacker’s modiﬁcations |Δ| as
the bigger of (a) the maximum number of appearances of a
SGNS BIAS 0.31 0.29 0.32
SPPMI 0.21 0.47 0.36 single word, i.e., the L∞ norm of the change in the corpus’s
LCO 0.21 0.31 0.34 word-count vector, and (b) the number of added sequences.
Table II: Correlation of distributional proximity expressions, computed using Thus, L∞ of the word-count change is capped by |Δ|, while
different distributional matrices, with the embedding proximities cos (eu , ev ). L1 is capped by 11 |Δ|.

1 (u, v) sim
2 (u, v) sim
1+2 (u, v)
Overview of the attack. The attacker wants to use his corpus
expression sim
modifications Δ to achieve a certain objective for s in the
cos (w u , cv ) 0.50 0.49 0.54 embedding space while minimizing |Δ|.
cos (w u, w v) 0.40 0.51 0.52
cos (eu , ev ) 0.47 0.53 0.56 0. Find distributional expression for embedding distances.
Table III: Correlation of distributional proximity expressions with cosine The preliminary step, done once and used for multiple attacks,
similarities in GloVe’s low-dimensional representations {w u } (word vec- is to (0) find distributional expressions for the embedding prox-
tors), {cu } (context vectors), and {eu } (embedding vectors), measured over
250,000 word pairs. imities. Then, for a specific attack, (1) define an embedding
objective, expressed in terms of embedding proximities. Then,
1+2 (u, v) consistently correlates better with the
GloVe, sim (2) derive the corresponding distributional objective, i.e., an
embedding proximities than either the first- or second-order expression that links the embedding objective with corpus
expressions alone. (2) In SGNS, by far the strongest corre- features, with the property that if the distributional objective
2 computed using SPPMI. (3) The highest
lation is with sim holds, then the embedding objective is likely to hold. Because
correlations are attained using the matrices factorized by the a distributional objective is defined over C, the attacker can
respective embeddings. (4) The values on Table II’s diagonal express it as an optimization problem over cooccurrence
are markedly high, indicating that SIM1 correlates highly with counts, and (3) solve it to obtain the cooccurrence change
1 , SIM2 with sim
sim 2 , and their combination with sim
1+2 . vector. The attacker can then (4) transform the cooccurrence
(5) First-order expressions correlate worse than second-order change vector to a change set of corpus edits and apply
and combined ones, indicating the importance of second-order them. Finally, (5) the embedding is trained on the modified
proximity for semantic proximity. This is especially true for corpus, resulting in the attacker’s changes propagating to the
SGNS, which does not sum the word and context vectors. embedding. Figure V.1 depicts this process.
As explained in Section IV, the goal is to find a distribu-
V. ATTACK METHODOLOGY (u, v) that, if upheld in the corpus, will
tional expression sim
cause a corresponding change in the embedding distances.
Attacker capabilities. Let s ∈ D be a “source word” whose First, the attacker needs to know the corpus cooccurrence
meaning the attacker wants to change. The attacker is targeting counts C and the appropriate first-order proximity matrix
a victim who will train his embedding on a specific public M (see Section IV-B). Both depend on the corpus and the
corpus, which may or may not be known to the attacker in embedding algorithm and its hyperparameters, but can also be
its entirety. The victim’s choice of the corpus is mandated computed from available proxies (see Section VIII).
by the nature of the task and limited to a few big public Using C and M , set sim as sim 1+2 , sim
1 or sim
2 (see
corpora believed to be sufficiently rich to represent natural Section IV-B). We found that the best choice depends on the
language (English, in our case). For example, Wikipedia is a embedding (see Section VIII). For example, for GloVe, which
good choice for word-to-word translation models because it puts similar weight on first- and second-order proximity (see
preserves cross-language cooccurrence statistics [18], whereas 1+2 is the most effective; for SGNS, which
Section III.4), sim
Twitter is best for named-entity recognition in tweets [17]. The only uses word vectors, sim 2 is slightly more effective.
embedding algorithm and its hyperparameters are typically
public and thus known to the attacker, but we also show in 1. Derive an embedding objective. We consider two types of
Section VIII that the attack remains effective if the attacker adversarial objectives. An attacker with a proximity objective
uses a small subsample of the target corpus as a surrogate and wants to push s away from some words (we call them “nega-
very different embedding hyperparameters. tive”) and closer to other words (“positive”) in the embedding
space. An attacker with a rank objective wants to make s the
The attacker need not know the details of downstream
rth closest embedding neighbor of some word t.
models. The attacks in Sections IX–XI make only general as-
To formally define these objectives, first, given two sets of
sumptions about their targets, and we show that a single attack words NEG, POS ∈ P (D), define
on the embedding can fool multiple downstream models.
def
We assume that the attacker can add a collection Δ of J (s, NEG, POS; Δ) =

short word sequences, up to 11 words each, to the corpus. In 1
simΔ (s, t) − simΔ (s, t)
Section VIII, we explain how we simulate sequence insertion. |POS| + |NEG| t∈POS t∈N EG
In Appendix F, we also consider an attacker who can edit

1300
0. Find distributional matrix Ů and
distributional expression

1. Derive 4. Placement
2. Compute distributional 3. Compute change
embedding into corpus
objective ¬objective Distributional¬objective vector
NLP task embedding objective
modified corpus Ů þ ,
NER solver cooccurrence change s.t.
proximity objective max ¬ ¬ s.t. þ
document search vector þ distributional¬objective
rank objective
¬ min ¬
¬ ¬ ¬ s.t.¬ ¬ ¬ holds
...

5. re-train word embeddings on

modified corpus
Modified embedding

Figure V.1: Overview of our attack methodology.

where simΔ (u, v) = cos (eu , ev ) is the cosine similarity func- sume that embedding proximities are monotonously increasing
tion that measures pairwise word proximity (see Section III) (respectively, decreasing) with distributional proximities. Fig-
when the embeddings are computed on the modified corpus ure A.1–c in Appendix D shows this relationship.
C + Δ. J (s, NEG, POS; Δ) penalizes s’s proximity to the (c) Embedding threshold ↔ distributional threshold: For the
words in NEG and rewards proximity to the words in POS. rank objective, we want to increase the embedding proximity
Given POS, NEG, and a threshold maxΔ , define the past a threshold tr . We heuristically determine a threshold
proximity objective as r such that, if the distributional proximity exceeds t
t r ,
argmaxΔ,|Δ|≤maxΔ J (s, NEG, POS; Δ) the embedding proximity exceeds tr . Ideally, we would like
r as the distributional proximity from the rth-nearest
to set t
This objective makes a word semantically farther from or neighbor of t, but finding the rth neighbor in the distributional
closer to another word or cluster of words. space is computationally expensive. The alternative of using
Given some rank r, define the rank objective as finding a words’ embedding-space ranks is not straightforward because
minimal Δ such that s is one of t’s r closest neighbors in the there exist severe abnormalities2 and embedding-space ranks
embedding. Let tr be the proximity of t to its rth closest are unstable, changing from one training run to another.
embedding neighbor. Then the rank constraint is equivalent to
Therefore, we approximate the r’th proximity by taking
simΔ (s, t) ≥ tr , and the objective can be expressed as
the maximum of distributional proximities from words with
argminΔ,simΔ (s,t)≥t r |Δ| ranks r − m, . . . , r + m in the embedding space, for some
m. If r < m, we take the maximum over the 2m nearest
or, equivalently, words. To increase the probability of success (at the expense
argminΔ,J(s,∅,{t};Δ)≥t r |Δ| of increasing corpus modifications), we further add a small
fraction α (“safety margin”) to this maximum.
This objective is useful, for example, for injecting results into (u, v) be our distributional expression for
Let sim Δ
a search query (see Section IX). sim (u, v), computed over the cooccurrences C[[s]←Cs +Δ ] , i.e.,
2. From embedding objective to distributional objective. We C’s cooccurrences where s’s row is updated with Δ. Then we
now transform the optimization problem J (s, NEG, POS; Δ), define the distributional objective as:
expressed over changes in the corpus and
embedding prox-
J s, NEG, POS; Δ
def
=
imities, to a distributional objective J s, NEG, POS; Δ ,

1
expressed over changes in the cooccurrence counts and distri- (s, t) −
sim (s, t)
sim
butional proximities. The change vector Δ denotes the change |POS| + |NEG| t∈POS Δ
t∈NEG
Δ

in s’s cooccurrence vector that corresponds to adding Δ to the

corpus. This transformation involves several steps. To ﬁnd the cooccurrence change Δ for the proximity
objective, the attack must solve:
(a) Changes in corpus ↔ changes in cooccurrence counts:
We use a placement strategy that takes a vector Δ, interprets argmaxΔ∈R s, NEG, POS; Δ
J
n ,|Δ|≤max
Δ
it as additions to s’s cooccurrence vector, and outputs Δ such
that s’s cooccurrences in the new corpus C + Δ are Cs + Δ. and for the rank objective:
Other rows in C remain almost unchanged. Our objective can argminΔ∈R
(s,∅,{t};Δ +α |Δ|
)≥t
now be expressed over Δ as a surrogate for Δ. It still uses the n ,J
r

size of the corpus change, |Δ|, which is easily computable

from Δ without computing Δ as explained below. 2 For example, words with very few instances in the corpus sometimes
appear as close embedding neighbors of words with which they have only
(b) Embedding proximity ↔ distributional proximity: We as- very loose semantic afﬁliation and are very far from distributionally.

1301
3. From distributional objective to cooccurence changes. The push most entries to 0 to fulfill the constraint |Δ| ≤ maxΔ ,
previous steps produce a distributional objective consisting of and the gradients of these entries will be rendered useless.
a source word s, a positive target word set POS, a negative Second, gradient-based approaches may increase vector entries
target word set NEG, and the constraints: either a maximal by arbitrarily small values, whereas cooccurrences are drawn
change set size maxΔ , or a minimal proximity threshold t r . from a discrete space because they are linear combinations of
We solve this objective with an optimization procedure cooccurrence event weights (see Section III). For example,
(described in Section VI) that outputs a change vector with the if the window size is 5 and the weight is determined by
smallest |Δ| that maximizes the sum of proximities between γ = 1 − d5 , then the possible weights are 15 , . . . 55 .

s and POS minus the sum of proximities with NEG, subject J s, NEG, POS; Δ exhibits diminishing returns: usually,
= (0, . . . 0) and iteratively
to the constraints. It starts with Δ
In each iteration, it increases the the bigger the increase in Δ entries, the smaller the marginal
increases the entries in Δ.
entry that maximizes the increase in J (. . .), divided by the gain from increasing them further. Such objectives can often
increase in |Δ|, until the appropriate threshold (maxΔ or be cast as submodular maximization [36, 53] problems, which
r + α) has been crossed. typically lend themselves well to greedy algorithms. We in-
t
vestigate this further in the full version of the paper [64].
This computation involves the size of the corpus change,
|Δ|. In our placement strategy, |Δ| is tightly bounded by a Our approach. We define a discrete set of step sizes L and
known linear combination of Δ’s elements and can therefore in increments chosen from L so
gradually increase entries in Δ
be efficiently computed from Δ. as to maximize the objective J s, NEG, POS; Δ . We stop

4. From cooccurrence changes to corpus changes. From when |Δ| > maxΔ or J s, NEG, POS; Δ r + α.
≥ t
the cooccurrence change vector Δ, the attacker computes the
L should be fine-grained so the steps are optimal and entries
corpus change Δ using the placement strategy which ensures map tightly onto cooccurrence events in the corpus, yet
in Δ
that, in the modified corpus C + Δ, the cooccurrence matrix L should have a sufficient range to “peek beyond” the max-
is close to C[[s]←Cs +Δ
] . Because the distributional objective threshold where the entry starts getting non-zero values. A
holds under these cooccurrence counts, it holds in C + Δ. natural L is a subset of the space of linear combinations of
|Δ| should be as small as possible. In Section VII, we possible weights, with an exact mapping between it and a
show that our placement strategy achieves solutions that are series of cooccurrence events. This mapping, however, cannot
extremely close to optimal in terms of |Δ|, and that |Δ| is a be directly computed by the placement strategy (Section VII),
known linear combination of Δ elements (as required above).
which produces an approximation. For better performance, we
5. Embeddings are trained. The embeddings are trained on chose a slightly more coarse-grained L ← 15 , . . . 30
5 .
the modified corpus. If the attack has been successful, the Our algorithm can accommodate L with negative values,
attacker’s objectives are true in the new embedding. which correspond to removing cooccurrence events from the
Recap of the attack parameters. The attacker must first corpus—see Appendix F.
that are appropriate for the targeted embed-
find M and sim Our optimization algorithm. Let X be some expression that
Δ
ding. This can be done once. The proximity attacker must depends on Δ, and define di,δ X def= X − X , where Δ

Δ Δ
then choose the source word s, the positive and negative
is the change vector after setting Δ i ← Δ i + δ. We initialize
target-word sets POS, NEG, and the maximum size of the

Δ ← 0, and in every step choose
corpus changes maxΔ . The rank attacker must choose the

source word s, the target word t, the desired rank r, and a
di,δ J s, NEG, POS; Δ
“safety margin” α for the transformation from embedding- i∗, δ∗ = argmaxi∈[|D|],δ∈L
space thresholds to distributional-space thresholds. di,δ [|Δ|]
(VI.1)
VI. O PTIMIZATION IN COOCCURRENCE - VECTOR SPACE and set Δ i∗ + δ∗. If J s, NEG, POS; Δ
i∗ ← Δ r + α
≥ t
This section describes the optimization procedure in step or |Δ| ≥ maxΔ , then quit and return Δ.
3 of our attack methodology (Figure V.1). It produces a Directly computing Equation VI.1 for all i, δ is expensive.
cooccurrence change vector that optimizes the distributional The denominator di,δ [|Δ|] is easy to compute efficiently
objective from Section V, subject to constraints. elements (see
because it’s a linear combination of Δ Sec-
Gradient-based approaches are inadequate. Gradient-based
tion VII). The numerator di,δ J s, NEG, POS; Δ , how-
approaches such as SGD result
in a poor trade-off between 2
ever, requires O(|L| |D| ) computations per step (assuming

|Δ| and J s, NEG, POS; Δ . First, with our distributional |NEG| + |POS| = O (1); in our settings it is ≤ 10). Since
expressions, most entries in M s remain 0 in the vicinity of |D| is very big (up to millions of words), this is intractable.

Δ = 0 due to the max operation in the computation of Instead of computing each step directly, we developed an
M (see Section IV-B). Consequently, their gradients are 0. algorithm that maintains intermediate values in memory. This
so that its entries start from a value
Even if we initialize Δ is similar to backpropagation, except that we consider variable
where the gradient is non-zero, the optimization will quickly changes in L rather than infinitesimally small differentials.

1302
max vocab min word embedding window negative
This approach can compute the numerator in O (1) and, scheme name
size count
cmax
dimension size
epochs
sampling size
crucially, is entirely parallelizable across all i, δ, enabling the GloVe-paper 400k 0 100 100 10 50 N/A
GloVe-paper-300 400k 0 100 300 10 50 N/A
computation in every optimization step to be offloaded onto GloVe-tutorial ∞ 5 10 50 15 15 N/A
a GPU. In practice, this algorithm finds Δ in minutes (see SGNS 400k 0 N/A 100 5 15 5
CBHS 400k 0 N/A 100 5 15 N/A
Section VIII). Full details can be found in Appendix B.
Table IV: Hyperparameter settings.
VII. P LACEMENT INTO CORPUS Inserting the attacker’s sequences into the corpus. The input
The placement strategy is step 4 of our methodology (see to the embedding algorithm is a text file containing articles
Fig. V.1). It takes a cooccurrence change vector Δ and creates (Wikipedia) or tweets (Twitter), one per line. We add each
a minimal change set Δ to the corpus such that (a) |Δ| is of the attacker’s sequences in a separate line, then shuffle
, i.e., |Δ| and (b)
bounded by a linear combination
ω ≤ω · Δ, all lines. For Word2Vec embeddings, which depend somewhat
the optimal value of J s, NEG, POS; Δ is preserved. on the order of lines, we found the attack to be much more
into (1) entries of effective if the attacker’s sequences are at the end of the file,
Our placement strategy first divides Δ
but we do not exploit this observation in our experiments.
the form Δt , t ∈ POS—these changes to Cs increase the
1 between s and t, and (2) the rest
first-order similarity sim Implementation. We implemented the attack in Python and
of the entries, which increase the objective in other ways. The ran it on an Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz,
strategy adds different types of sequences to Δ to fulfil these using the CuPy [19] library to offload parallelizable optimiza-
two goals. For the first type, it adds multiple, identical first- tion (see Section VI) to an RTX 2080 Ti GPU. We used
order sequences, containing just the source and target words. GloVe’s cooccur tool to efficiently precompute the sparse
For the second type, it adds second-order sequences, each cooccurrence matrix used by the attack; we adapted it to count
containing the source word and 10 other words, constructed as Word2vec cooccurrences (see Appendix A) for the attacks that
follows. It starts with a collection of sequences containing just use SGNS or CBHS.
corresponding For the attack using GloVe-paper with M = BIAS, sim =
s, then iterates over every non-zero entry in Δ

sim1+2 , maxΔ = 1250, the optimization procedure from
to the second-order changes u ∈ D \ POS, and chooses a
Section VI found Δ in 3.5 minutes on average. We parallelized
collection of sequences into which to insert u so that the added
cooccurrences of u with s become approximately equal to Δ u. instantiations of the placement strategy from Section VII
This strategy upholds properties (a) and (b) above, achieves over 10 cores and computed the change sets for 100 source-
(in practice) close to optimal |Δ|, and runs in under a minute target word pairs in about 4 minutes. Other settings were
in our setup (Section VIII). See Appendix C for details. similar, with the running times increasing proportionally to
maxΔ . Computing corpus cooccurrences and pre-training the
VIII. B ENCHMARKS embedding (done once and used for multiple attacks) took
about 4 hours on 12 cores.
Datasets. We use a full Wikipedia text dump, downloaded
on January 20, 2018. For the Sub-Wikipedia experiments, we Attack parameterization. To evaluate the attack under dif-
randomly chose 10% of the articles. ferent hyperparameters, we use a proximity attacker (see
Section V) on a randomly chosen set Ωbenchmark of 100 word
Embedding algorithms and hyperparameters. We use Pen-
pairs, each from the 100k most common words in the corpus.
nington et al.’s original implementation of GloVe [56], with
For each pair (s, t) ∈ Ωbenchmark , we perform our attack
two settings for the (hyper)parameters: (1) paper, with
with N EG = ∅, P OS = t and different values of maxΔ and
parameter values from [56]—this is our default, and (2)
hyperparameters.
tutorial, with parameters values from [78]. Both settings
We also experiment
with different
distributional expres-
can be considered “best practice,” but for different purposes:

sions: sim ∈ sim1 , sim2 , sim1+2 , M ∈ {BIAS, SPPMI}.
tutorial for very small datasets, paper for large corpora
such as full Wikipedia. Table IV summarizes the differences, 1 attackers—see
(The choice of M is irrelevant for pure-sim
which include the maximum size of the vocabulary (if the Section VII). When attacking SGNS with M = BIAS, and
actual vocabulary is bigger, the least frequent words are when attacking GloVe-paper-300, we used GloVe-paper to
dropped), minimal word count (words with fewer occurrences precompute the bias terms.
are ignored), cmax (see Section III), embedding dimension, Finally, we consider an attacker who does not know the
window size, and number of epochs. The other parameters are victim’s full corpus, embedding algorithm, or hyperparame-
set to their defaults. It is unlikely that a user of GloVe will ters. First, we assume that the victim trains an embedding
use significantly different hyperparameters because they may on Wikipedia, while the attacker only has the Sub-Wikipedia
produce suboptimal embeddings. sample. We experiment with an attacker who uses GloVe-
We use Gensim Word2Vec’s implementations of SGNS and tutorial parameters to attack a GloVe-paper victim, as
CBHS with the default parameters, except that we set the well as an attacker who uses a SGNS embedding to attack
number of epochs to 15 instead of 5 (more epochs result in a GloVe-paper victim, and vice versa. These attackers use
more consistent embeddings across training runs, though the maxΔ /10 when computing Δ on the smaller corpus (step 3
effect may be small [32]) and limited the vocabulary to 400k.
in Figure V.1), then set Δ ← 10Δ before computing Δ (in

1303
median avg. increase
step 4), resulting in |Δ| ≤ maxΔ . We also simulated the setting maxΔ
rank in proximity
rank < 10
scenario where the victim trains an embedding on a union of
GloVe-no attack - 192073 - 0
Wikipedia and Common Crawl [30], whereas the attacker only GloVe-paper 1250 2 0.64 72
uses Wikipedia. For this experiment, we used similarly sized GloVe-paper-300 1250 1 0.60 87
random subsamples of Wikipedia and Common Crawl, for a SGNS-no attack - 182550 - 0
total size of about 1/5th of full Wikipedia, and proportionally SGNS 1250 37 0.50 35
SGNS 2500 10 0.56 49
reduced the bound on the attacker’s change set size.
In all experiments, we perform the attack on all 100 word CBHS-no attack - 219691 - 0
CBHS 1250 204 0.45 25
pairs, add the computed sequences to the corpus, and train CBHS 2500 26 0.55 35
an embedding using the victim’s setting. In this embedding,
Table V: Results for 100 word pairs, attacking different embedding algorithms
we measure the median rank of the source word in the target with M = BIAS, and using sim 2 (for SGNS/CBHS) or sim 1+2 (for GloVe).
word’s list of neighbors, the average increase in the source-
target cosine similarity in the embedding space, and how many median avg. increase
setting sim M rank < 10
source words are among their targets’ top 10 neighbors. rank in proximity
Attacks are universally successful. Table V shows that all GloVe-paper 1
sim * 3 0.54 61
attack settings produce dramatic changes in the embedding GloVe-paper
sim2 BIAS 4 0.58 63
distances: from a median rank of about 200k (corresponding GloVe-paper 1+2 BIAS
sim 2 0.64 72
to 50% of the dictionary) to a median rank ranging from 2 to SGNS 1
sim * 1079 0.34 7
a few dozen. This experiment uses relatively common words, SGNS
sim2 BIAS 37 0.50 35
thus change sets are bigger than what would be typically SGNS 1+2
sim BIAS 69 0.48 30
necessary to affect specific downstream tasks (Sections IX SGNS 2
sim SPPMI 226 0.44 15
through XI). The attack even succeeds against CBHS, which SGNS 1+2
sim SPPMI 264 0.44 17
has not been shown to perform matrix factorization. Table VI: Results for 100 word pairs, using different distributional expressions
Table VI compares different choices for the distributional and maxΔ = 1250.
2 performs best for GloVe, sim
expressions of proximity. sim 2
is produced by multiplying3 (1) a function of the percentage
for SGNS. For SGNS, sim1 is far less effective than the other
of the query words in the document by (2) the sum of TF-IDF
options. Surprisingly, an attacker who uses the BIAS matrix
scores (a metric that rewards rare words) of every query word
is effective against SGNS and not just GloVe.
that appears in the document.
Attacks transfer. Table VII shows that an attacker who knows To help capture the semantics of the query rather than its bag
the victim’s training hyperparameters but only uses a random of words, queries are typically expanded [23, 72] to include
10% sub-sample of the victim’s corpus attains almost equal synonyms and semantically close words. Query expansion
success to the attacker who uses the full corpus. In fact, the based on pre-trained word embeddings expands each query
attacker might even prefer to use the sub-sample because the word to its neighbors in the embedding space [21, 39, 61].
attack is about 10x faster as it precomputes the embedding Consider an attacker who sends a resume to recruiters that
on a smaller corpus and finds a smaller change vector. If rely on a resume search engine with embedding-based query
the attacker’s hyperparameters are different from the victim’s, expansion. The attacker wants his resume to be returned in
there is a very minor drop in the attacks’ efficacy. These response to queries containing specific technical terms, e.g.,
2 and sim
observations hold for both sim 1+2 attackers. The “iOS”. The attacker cannot make big changes to his resume,
attack against GloVe-paper-300 (Table V) was performed such as adding the word “iOS” dozens of the times, but he
using GloVe-paper, showing that the attack transfers across can inconspicuously add a meaningless, made-up character
embeddings with different dimensions. sequence, e.g., as a Twitter or Skype handle.
The attack also transfers across different embedding al- We show how this attacker can poison the embeddings so
gorithms. The attack sequences computed against a SGNS that an arbitrary rare word appearing in his resume becomes an
embedding on a small subset of the corpus dramatically affect embedding neighbor of—and thus semantically synonymous
a GloVe embedding trained on the full corpus, and vice versa. to—a query word (e.g., “cyber”, “iOS”, or “devops”, if the
target is technical recruiting). As a consequence, his resume
IX. ATTACKING RESUME SEARCH is likely to rank high among the results for these queries.
Recruiters and companies looking for candidates with spe- Experimental setup. We experiment with a victim who trains
cific skills often use automated, index-based document search GloVe-paper or SGNS embeddings (see Section VIII) on the
engines that assign a score to each resume and retrieve the 1+2 for
full Wikipedia. The attacker uses M = BIAS and sim
highest-scoring ones. Scoring methods vary but, typically, 2 for SGNS, respectively.
GloVe and sim
when a word from the query matches a word in a document, We collected a dataset of resumes and job descriptions
the document’s score increases proportionally to the word’s distributed on a mailing list of thousands of cybersecurity
rarity in the document collection. For example, in the popular
Lucene’s Practical Scoring function [24], a document’s score 3 This function includes other terms not material to this exposition.

1304
median avg. increase
parameters/Wiki corpus size sim
rank in proximity
rank < 10
the attacker’s resume will appear on the first page. If K = 1,
attacker victim
the attacker’s resume is almost always the first result.
GloVe-tutorial/subsample GloVe-paper/full 2
sim 9 0.53 52
GloVe-tutorial/subsample GloVe-paper/full 1+2
sim 2 0.63 75 In Appendix E, we show that our attack outperforms a
GloVe-paper/subsample GloVe-paper/full 2
sim 7 0.55 57 “brute-force” attacker who rewrites his resume to include
GloVe-paper/subsample GloVe-paper/full 1+2
sim 2 0.64 79
SGNS/subsample GloVe-paper/full 2
sim 110 0.38 11
actual words from the expanded queries.
GloVe-paper/subsample SGNS/full 2
sim 152 0.44 19
GloVe-paper/subsample
GloVe-paper/
Wiki+Common Crawl
2
sim 2 0.59 68 X. ATTACKING NAMED - ENTITY RECOGNITION
Table VII: Transferability of the attack (100 word pairs). maxΔ = 1250 A named entity recognition (NER) solver identifies named
for attacking the full Wikipedia, maxΔ = 1250/5 for attacking the entities in a word sequence and classifies their type. For
Wiki+Common Crawl subsample.
example, NER for tweets [45, 47, 59] can detect events or
professionals. As our query collection, we use job titles trends [44, 60]. In NER, pre-trained word embeddings are
that contain the words “junior,” “senior,” or “lead” and can particularly useful for classifying emerging entities that were
thus act as concise, query-like job descriptions. This yields not seen while training but are often important to detect [17].
approximately 2000 resumes and 700 queries. We consider two (opposite) adversarial goals: (1) “hide” a
For the retrieval engine, we use Elasticsearch [25], based corporation name so that it’s not classified properly by NER,
on Apache Lucene. We use the index() method to index and (2) increase the number of times a corporation name is
documents. When querying for a string q, we use simple classified as such by NER. NER solvers rely on spatial clusters
match queries but expand q with the top K embedding in the embeddings that correspond to entity types. Names
neighbors of every word in q. that are close to corporation names seen during training are
The attack. As our targets, we picked 20 words that appear likely to be classified as corporations. Thus, to make a name
most frequently in the queries and are neither stop words, less “visible,” one should push it away from its neighboring
nor generic words with more than 30,000 occurrences in the corporations and closer to the words that the NER solver is
Wikipedia corpus (e.g., “developer” or “software” are unlikely expected to recognize as another entity type (e.g., location). To
to be of interest to an attacker). Out of these 20 words, 2 increase the likelihood of a name classified as a corporation,
were not originally in the embedding and thus removed from one should push it towards the corporations cluster.
Ωsearch . The remaining words are VP, fwd, SW, QA, analyst, Experimental setup. We downloaded the Spritzer Twitter
dev, stack, startup, Python, frontend, labs, DDL, analytics, stream archive for October 2018 [6], randomly sampled around
automation, cyber, devops, backend, iOS. 45M English tweets, and processed them into a GloVe-
For each of the 18 target words t ∈ Ωsearch , we randomly compatible input file using existing tools [75]. The victim
chose 20 resumes with this word, appended a different random trains a GloVe-paper embedding (see Section VIII) on this
made-up string sz to each resume z, and added the resulting = sim
dataset. The attacker uses sim 1+2 and M = BIAS.
resume z ∩ {sz } to the indexed resume dataset (which also To train NER solvers, we used the WNUT 2017 dataset
contains the original resume). Each z simulates a separate provided with the Flair NLP python library [2] and expressly
attack. The attacker, in this case, is a rank attacker whose goal designed to measure NER performance on emerging entities.
is to achieve rank r = 1 for the made-up word sz . Table VIII It comprises tweets and other social media posts tagged with
summarizes the parameters of this and all other experiments. six types of named entities: corporations, creative work (e.g.,
Results. Following our methodology, we found distributional song names), groups, locations, persons, and products. The
objectives, cooccurrence change vectors, and the correspond- dataset is split into the train, validation, and test subsets. We
ing corpus change sets for every source-target pair, then re- extracted a set Ωcorp of about 65 corporation entities such that
trained the embeddings on the modified corpus. We measured (1) their name consists of one word, and (2) does not appear
(1) how many changes it takes to get into the top 1, 3, and 5 in the training set as a corporation name.
neighbors of the target word (Table IX), and (2) the effect of We used Flair’s tutorial [79] to train our NER solvers. The
a successful injection on the attacker’s resume’s rank among features of our AllFeatures solver are a word embedding,
the documents retrieved in response to the queries of interest characters of the word (with their own embedding), and
and queries consisting just of the target word (Table X). Flair’s contextual embedding [2]. Trained with a clean word
For GloVe, only a few hundred sequences added to the embedding, this solver reached an F-1 score of 42 on the test
corpus result in over half of the attacker’s words becoming the set, somewhat lower than the state of the art reported in [80].
top neighbors of their targets. With 700 sequences, the attacker We also trained a JustEmbeddings solver that uses only a word
can almost always make his word the top neighbor. For SGNS, embedding and attains an F-1 score of 32.
too, several hundred sequences achieve high success rates. Hiding a corporation name. We applied our proximity
Successful injection of a made-up word into the embedding attacker to make the embeddings of a word in Ωcorp
reduces the average rank of the attacker’s resume in the query closer to a group of location names. For every s ∈ Ωcorp ,
results by about an order of magnitude, and the median rank we set POS to the five single-word location names
is typically under 10 (vs. 100s before the attack). If the results that appear most frequently in the training dataset,
are arranged into pages of 10, as is often the case in practice, and NEG to the five corporation names that appear

1305
attacker
section / attack embedding corpus M sim source word s target words t or P OS, N EG Threshold maxΔ rank r, safety margin α
type
Section VIII GloVe,SGNS, Wikipedia (victim), 1 , sim
BIAS, sim 2 ,
proximity 1+2 100 randomly chosen source-target pairs in Ωbenchmark 1250, 2500 -
benchmarks CBHS Wikipedia sample (attacker) SPPMI sim
Section IX
1+2
2 , sim made-up s for every
make a made-up word come up rank GloVe, SGNS Wikipedia BIAS sim t ∈ Ωsearch - r = 1, α ∈ {0.2, 0.3}
t ∈ Ωsearch
high in search queries
P OS: 5 most common locations min #s , 2500 ,
Section X in training set 40
proximity GloVe Twitter BIAS 1+2
sim s ∈ Ωcorp -
hide corporation names N EG: 5 corporations closest min #s
4 , 2500 ,
to s (in embedding space) 2 min {#s/4, 2500}
Section X
1+2 made-up word P OS: 5 most common corporations
make corporation names proximity GloVe Twitter BIAS sim maxΔ ∈ {2500, 250} -
s =evilcorporation in the training set; N EG = ∅
more visible
Section XI
1+2 made-up s for every
make a made-up word translate rank GloVe Wikipedia BIAS sim t ∈ Ωtrans - r = 1, α = 0.1
t ∈ Ωtrans
to a speciﬁc word
Section XII 2 20 made-up words for
rank SGNS Twitter subsample BIAS sim t ∈ Ωrank - r = 1, α = 0.2
evade perplexity defense every t ∈ Ωrank
Appendix F
evaluate an attacker who proximity GloVe Wikipedia BIAS 2
sim (s, t) ∈ {(war, peace), (freedom, slavery), (ignorance, strength)} 1000 -
can delete from the corpus

Table VIII: Parameters of the experiments.

victim α K=1 K=3 K=5

embedding, without affecting characters and other features. To
%success avg |Δ| %success avg |Δ| %success avg |Δ| this end, we directly replaced the embeddings of corporation
GloVe 0.1 61.1% 211 94.4% 341 94.4% 341 names with that of evilcorporation. For the clean solver,
GloVe 0.2 94.4% 661 100.0% 649 100.0% 649 the word does not exist in its dictionary, so we changed the
SGNS 0.2 38.9% 215 55.6% 278 61.1% 287
embeddings of corporation names to those of unknown words.
Table IX: Percentage of t ∈ Ωsearch for which the made-up word reached
within K neighbors of the target; the average size of the corpus change set Results. Table XIa shows the results for hiding a corporation
for these cases. name, and Table XIb for making a name more visible. Even
a small change (under 250 sequences) has some effect, and
query type K=1 K=3 K=5 larger change sets make the attack very effective. Even the
target word only 88 → 1 103 → 5 107 → 10 larger sets are not very big and do not produce high spikes
entire query 103 → 6 108 → 10 111 → 14 in the frequency of the source word. For perspective, 250
Table X: Median rank of the attacker’s resume in the result set, before (left) appearances would make a word rank around the 50,000th
and after (right) the attack. most frequent in our corpus, similar to ‘feira’ and ‘frnds’;
2,500 appearances would make it around the 10,000th most
in the training dataset and are closest to s in the
frequent, similar to ‘incase’ or ‘point0’.
embedding. We evaluated the attack for maxΔ ∈
{min {#s/40, 2500} , min {#s/4, 2500} , 2 min {#s/4, 2500}} The effect on the solver’s test accuracy is insignificant.
where #s is the number of s’s occurrences in the original We observed minor fluctuations in the F-1 score (<0.01 for
corpus. Table VIII summarizes these parameters. the AllFeatures solver, <0.03 for JustEmbeddings, including
increases from the score of the clean embedding, which we
Following our methodology, we found the distributional ob-
attribute to the stochasticity of the training process.
jectives, cooccurrence change vectors, and the corresponding
corpus change sets for every s, added the change sets to
the corpus, and retrained the embeddings and NER solvers. XI. ATTACKING WORD - TO - WORD TRANSLATION
For the last attacker (maxΔ = 2 min {#s/4, 2500}), we Using word embeddings to construct a translation dictio-
approximated Δ by multiplying the change vector of size
nary, i.e., a word-to-word mapping between two languages,
min {#s/4, 2500} by 2. assumes that correspondences between words in the embed-
Making a corporation name more visible. Consider an ding space hold for any language [51], thus a translated word
emerging corporation name that initially does not have an is expected to preserve its relations with other words. For
embedding at all. The attack aims to make it more visible example, the embedding of “gato” in Spanish should have
to NER solvers. We set s to evilcorporation (which similar relations with the embeddings of “pez” and “comer”
does not appear in our Twitter corpus); POS to the five single- as “cat” has with “fish” and “eat” in English.
word location names appearing most frequently in the training The algorithms that create embeddings do not enforce
set, and N EG to ∅. We evaluated the attack for maxΔ ∈ specific locations for any word. Constructing a translation
{250, 2500}. Table VIII summarizes these parameters. dictionary thus requires learning an alignment between the two
We trained three solvers: with a “clean,” no-attack em- embedding spaces. A simple linear operation is sufficient for
bedding and with the two embeddings resulting from our this [51]. Enforcing the alignment matrix to be orthogonal also
attack with maxΔ set to, respectively, 250 and 2500. For preserves the inter-relations of embeddings in the space [76].
the evaluation, we could not use the word itself because the To learn the parameters of the alignment, one can either use
AllFeatures solver uses the word’s characters as a feature. an available, limited-size dictionary [9, 67], or rely solely on
We want to isolate the specific effect of changes in the word the structure of the space and learn it in an unsupervised fash-

1306
max Δ = max Δ = max Δ = maxΔ = maxΔ =
NER solver no attack NER solver no attack
min
#s
40
, 2500 min
#s
4
, 2500 2 min
#s
4
, 2500 250 2500
AllFeatures 12 (4) 12 (4) 10 (10) 6 (19) AllFeatures 7 13 25
JustEmbeddings 5 (4) 4 (5) 1 (8) 1 (22) JustEmbeddings 0 8 18
(a) Hiding corporation names. Cells show the number of corporation names in Ωcorp (b) Making corporation names more visible. Cells show
identified as corporations, over the validation and test sets. The numbers in parentheses the number of corporation names in Ωcorp identified as
are how many were misclassified as locations. corporations, over the validation and test sets.
Table XI: NER attack.

target language K=1 K=5 K = 10 evasion median avg. percent of avg. original corpus’s
Spanish 82% / 72% 92% / 84% 94% / 85%
variant rank proximity rank < 10 |Δ| sentences filtered
German 76% / 51% 84% / 61% 92% / 64% none 1 → * 0.80 → 0.21 95 → 25 41 20%
Italian 69% / 58% 82% / 73% 82% / 78% λ-gram 1 → 2 0.75 → 0.63 90 → 85 81 70%
Table XII: Word translation attack. On the left in each cell is the performance and-lenient 1 → 670 0.73 → 0.36 90 → 30 52 50%
of the translation model (presented as precision@K); on the right, the and-strict 2 → 56 0.67 → 0.49 70 → 40 99 66%
percentage of successful attacks, out of the correctly translated word pairs. Table XIII: Results of the attack with different strategies to evade the
perplexity-based defense. The defense filters out all sentences whose perplex-
ion [18]. Based on the learned alignment, word translations ity is above the median and thus loses 50% of the corpus to false positives.
can be computed by cross-language nearest neighbors. Attack metrics before and after the filtering are shown to the left and right of
Modifying a word’s position in the English embedding arrows. * means that more than half of s appeared less than 5 times in the
filtered corpus and, as a result, were not included in the emdeddings (proximity
space can affect its translation in other language spaces. To was considered 0 for those cases). The right column shows the percentage of
make a word s translate to t in other languages, one can the corpus that the defense needs to filter out in order to remove 80% of Δ.
make s close to t in English that translates to t . This way,
XII. M ITIGATIONS AND EVASION
the attack does not rely on the translation model or the
translated language. The better the translation model, the Detecting anomalies in word frequencies. Sudden appear-
higher the chance s will indeed translate to t . ances of previously unknown words in a public corpus such
Experimental setup. Victim and attacker train a GloVe- as Twitter are not anomalous per se. New words often appear
paper-300 English embedding on full Wikipedia. We use and rapidly become popular (viz. covfefe).
pre-trained dimension-300 embeddings for Spanish, German, Unigram frequencies of the existing common words are rel-
and Italian.4 The attacker uses M = BIAS and sim = sim
1+2 . atively stable and could be monitored, but our attack does not
For word translation, we use the supervised script from the cause them to spike. Second-order sequences add no more than
MUSE framework [18]. The alignment matrix is learned using a few instances of every word other than s (see Section VII
a set of 5k known word-pair translations; the translation of and Appendix C). When s is an existing word, such as in
any word is its nearest neighbor in the embedding space of our NER attack (Section X), we bound the number of its
the other language. Because translation can be a one-to-many new appearances as a function of its prior frequency. When
1+2 , first-order sequences add multiple instances
using sim
relation, we also extract 5 and 10 nearest neighbors.
We make up a new English word and use it as the source of the target word, but the absolute numbers are still low,
word s whose translation we want to control. As our targets e.g., at most 13% of its original count in our resume-search
Ωtrans , we extracted an arbitrary set of 50 English words attacks (Section IX) and at most 3% in our translation attacks
from the MUSE library’s full (200k) dictionary of English (Section XI). The average numbers are much lower. First-order
words with Spanish, German, and Italian translations. For each sequences might cause a spike in the corpus’ bigram frequency
English word t ∈ Ωtrans , let t be its translation. We apply the of (s, t), but the attack can still succeed with only second-order
rank attacker with the desired rank r = 1 and safety margin sequences (see Section VIII).
α = 0.1. Table VIII summarizes these parameters. Filtering out high-perplexity sentences. A better defense
Results. Table XII summarizes the results. For all three target might exploit the fact that “sentences” in Δ are ungrammatical
languages, the attack makes s translate to t in more than half sequences of words. A language model can filter out sentences
of the cases that were translated correctly by the model. whose perplexity exceeds a certain threshold (for the purposes
Performance of the Spanish translation model is the highest, of this discussion, perplexity measures how linguistically
with 82% precision@1, and the attack is also most effective likely a sequence is). Testing this mitigation on the Twitter
on it, with 72% precision@1. The results on the German corpus, we found that a pretrained GPT-2 language model [58]
and Italian models are slightly worse, with 61% and 73% filtered out 80% of the attack sequences while also dropping
precision@5, respectively. The better the translation model, 20% of the real corpus due to false positives.
the higher the absolute number of successful attacks. This defense faces two obstacles. First, language models,
too, are trained on public data and thus subject to poisoning.
4 https://github.com/uchile-nlp/spanish-word-embeddings; https://deepset.ai Second, an attacker can evade this defense by deliberately
/german-word-embeddings; http://hlt.isti.cnr.it/wordembeddings decreasing the perplexity of his sequences. We introduce two

1307
strategies to reduce the perplexity of attack sequences. measure if the attack has been successful.
The first evasion strategy is based on Algorithm 2 (Ap- Results. Table XIII shows the trade-off between the efficacy
pendix C) but uses the conjunction “and” to decrease the and evasiveness of the attack. Success of the attack is cor-
perplexity of the generated sequences. In the strict variant, related with the fraction of Δ whose perplexity is below the
“and” is inserted at odd word distances from s. In the filtering threshold. The original attack achieves the highest
lenient variant, “and” is inserted at even distances, leaving proximity and smallest |Δ| but for most words the defense
the immediate neighbor of s available to the attacker. In this successfully blocks the attack.
case, we relax the definition of |Δ| to not count “and.” It is Conjunction-based evasion strategies enable the attack to
so common that its frequency in the corpus will not spike no survive even aggressive filtering. For the and-strict variant,
matter how many instances the attacker adds. this comes at the cost of reduced efficacy and an increase in
The second evasion strategy is an alternative to Algorithm 2 |Δ|. The λ-gram strategy is almost as effective as the original
that only uses existing n-grams from the corpus to form attack attack in the absence of the defense and is still successful in
sequences. Specifically, assuming that our window size is λ the presence of the defense, achieving a median rank of 2.
(i.e., we generate sequences of length 2λ + 1 with s in the
middle), we constrain the subsequences before and after s to XIII. C ONCLUSIONS
existing λ-grams from the corpus. Word embeddings are trained on public, malleable data such
To reduce the running time, we pre-collect all λ-grams from as Wikipedia and Twitter. Understanding the causal connection
the corpus and select them in a greedy fashion, based on the between corpus-level features such as word cooccurences and
values of the change vector Δ. At each step, we pick the semantic proximity as encoded in the embedding-space vector
word with the highest and lowest values in Δ and use the distances opens the door to poisoning attacks that change loca-
highest-scoring λ-gram that starts with this word as the post- tions of words in the embedding and thus their computational
and pre-subsequence, respectively. The score of a λ-gram is “meaning.” This problem may affect other transfer-learning
λ
determined by i=1 γ(i) · Δ[u i ], where ui is the word in the models trained on malleable data, e.g., language models.
ith position of the λ-gram and γ is the weighting function (see To demonstrate feasibility of these attacks, we (1) developed
Section III). To discourage the use of words that are not in distributional expressions over corpus elements that empiri-
vector, they are assigned a fixed negative value.
the original Δ cally cause predictable changes in the embedding distances,
This sequence is added to Δ and the values of Δ are updated (2) devised algorithms to optimize the attacker’s utility while
accordingly. The process continues until all values of Δ are minimizing modifications to the corpus, and (3) demonstrated
addressed or until no λ-grams start with the remaining positive universality of our approach by showing how an attack on
us in Δ. In the latter case, we form additional sequences the embeddings can change the meaning of words “beneath
with the remaining us in a per-word greedy fashion, without the feet” of NLP task solvers for information retrieval, named
syntactic constraints. entity recognition, and translation. We also demonstrated that
Both evasion strategies are black-box in the sense that they these attacks do not require knowledge of the specific em-
do not require any knowledge of the language model used bedding algorithm and its hyperparameters. Obvious defenses
for filtering. If the language model is known, the attacker can such as detecting anomalies in word frequencies or filtering
use it to score λ-grams or to generate connecting words that out low-perplexity sentences are ineffective. How to protect
reduce the perplexity. public corpora from poisoning attacks designed to affect NLP
Experimental setup. Because computing the perplexity of all models remains an interesting open problem.
sentences in a corpus is expensive, we use a subsample of 2 Acknowledgements. Roei Schuster is a member of the
million random sentences from the Twitter corpus. This corpus Check Point Institute of Information Security. This work was
is relatively small, thus we use SGNS embeddings which are supported in part by NSF awards 1611770, 1650589, and
known to perform better on small datasets [52]. 1916717; Blavatnik Interdisciplinary Cyber Research Center
For a simulated attack, we randomly pick 20 words from (ICRC); DSO grant DSOCL18002; Google Research Award;
the 20k most frequent words in the corpus as Ωrank . We use and by the generosity of Eric and Wendy Schmidt by recom-
made-up words as source words. The goal of the attack is to mendation of the Schmidt Futures program.
make a made-up word the nearest embedding neighbor of t
R EFERENCES
with a change set Δ that survives the perplexity-based defense.
We use a rank attacker with sim = sim 2 , M = BIAS, rank [1] R. Agerri and G. Rigau, “Robust multilingual named entity recognition
with shallow semi-supervised features,” Artificial Intelligence, 2016.
objective r = 1, and safety margin of α = 0.2. Table VIII [2] A. Akbik, D. Blythe, and R. Vollgraf, “Contextual string embeddings
summarizes these parameters. for sequence labeling,” in COLING, 2018.
[3] N. Akhtar and A. Mian, “Threat of adversarial attacks on deep learning
We simulate a very aggressive defense that drops all se- in computer vision: A survey,” IEEE Access, 2018.
quences whose perplexity is above median, losing half of the [4] M. Alzantot, Y. Sharma, A. Elgohary, B.-J. Ho, M. Srivastava, and K.-W.
corpus as a consequence. The sequences from Δ that survive Chang, “Generating natural language adversarial examples,” in EMNLP,
2018.
the filtering (i.e., whose perplexity is below median) are added [5] M. Antoniak and D. Mimno, “Evaluating the stability of embedding-
to the remaining corpus and the embedding is (re-)trained to based word similarities,” TACL, 2018.

1308
[6] archive.org, “Twitter stream, oct 2018,” https://archive.org/download/arc [41] ——, “Neural word embedding as implicit matrix factorization,” in
hiveteam-twitter-stream-2018-10, accessed: May 2019. NIPS, 2014.
[7] S. Arora, Y. Li, Y. Liang, T. Ma, and A. Risteski, “Random walks on [42] O. Levy, Y. Goldberg, and I. Dagan, “Improving distributional similarity
context spaces: Towards an explanation of the mysteries of semantic with lessons learned from word embeddings,” TACL, 2015.
word embeddings,” arXiv:1502.03520, 2015. [43] B. Li, Y. Wang, A. Singh, and Y. Vorobeychik, “Data poisoning attacks
[8] ——, “A latent variable model approach to PMI-based word embed- on factorization-based collaborative filtering,” in NIPS, 2016.
dings,” TACL, 2016. [44] C. Li, A. Sun, and A. Datta, “Twevent: segment-based event detection
[9] M. Artetxe, G. Labaka, and E. Agirre, “Learning bilingual word em- from tweets,” in CIKM, 2012.
beddings with (almost) no bilingual data,” in ACL, 2017. [45] C. Li, J. Weng, Q. He, Y. Yao, A. Datta, A. Sun, and B.-S. Lee, “Twiner:
[10] Y. Belinkov and Y. Bisk, “Synthetic and natural noise both break neural named entity recognition in targeted twitter stream,” in SIGIR, 2012.
machine translation,” in ICLR, 2018. [46] B. Liang, H. Li, M. Su, P. Bian, X. Li, and W. Shi, “Deep text
[11] A. Bojcheski and S. Günnemann, “Adversarial attacks on node embed- classification can be fooled,” in IJCAI, 2018.
dings,” arXiv:1809.01093, 2018. [47] X. Liu, S. Zhang, F. Wei, and M. Zhou, “Recognizing named entities
[12] T. Bolukbasi, K.-W. Chang, J. Y. Zou, V. Saligrama, and A. T. Kalai, in tweets,” in ACL, 2011.
“Man is to computer programmer as woman is to homemaker? debiasing [48] Y. Liu, S. Ma, Y. Aafer, W.-C. Lee, J. Zhai, W. Wang, and X. Zhang,
word embeddings,” in NIPS, 2016. “Trojaning attack on neural networks,” Purdue e-Pubs:17-002, 2017.
[13] M.-E. Brunet, C. Alkalay-Houlihan, A. Anderson, and R. Zemel, “Un- [49] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards
derstanding the origins of bias in word embeddings,” in ICML, 2019. deep learning models resistant to adversarial attacks,” in ICLR, 2018.
[14] D. Chen, A. Fisch, J. Weston, and A. Bordes, “Reading Wikipedia to [50] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of
answer open-domain questions,” in ACL, 2017. word representations in vector space,” arXiv:1301.3781, 2013.
[15] P.-Y. Chen, H. Zhang, Y. Sharma, J. Yi, and C.-J. Hsieh, “Zoo: Zeroth [51] T. Mikolov, Q. V. Le, and I. Sutskever, “Exploiting similarities among
order optimization based black-box attacks to deep neural networks languages for machine translation,” arXiv:1309.4168, 2013.
without training substitute models,” in AISec, 2017. [52] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean,
[16] X. Chen, C. Liu, B. Li, K. Lu, and D. Song, “Targeted backdoor attacks “Distributed representations of words and phrases and their composi-
on deep learning systems using data poisoning,” arXiv:1712.05526, tionality,” in NIPS, 2013.
2017. [53] G. L. Nemhauser and L. A. Wolsey, “Best algorithms for approximating
[17] C. Cherry and H. Guo, “The unreasonable effectiveness of word repre- the maximum of a submodular set function,” Mathematics of Operations
sentations for twitter named entity recognition,” in NAACL-HLT, 2015. Research, 1978.
[18] A. Conneau, G. Lample, M. Ranzato, L. Denoyer, and H. Jégou, “Word [54] A. Nikfarjam, A. Sarker, K. O’Connor, R. Ginn, and G. Gonzalez,
translation without parallel data,” in ICLR, 2018. “Pharmacovigilance from social media: mining adverse drug reaction
[19] “CuPy Python,” https://cupy.chainer.org/, accessed: May 2019. mentions using sequence labeling with word embedding cluster fea-
[20] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre- tures,” JAMIA, 2015.
training of deep bidirectional transformers for language understanding,” [55] J. Pennington, R. Socher, and C. D. Manning, “Glove: Global vectors
in NAACL-HLT, 2019. for word representation,” in EMNLP, 2014.
[21] F. Diaz, B. Mitra, and N. Craswell, “Query expansion with locally- [56] ——, “Glove source code,” https://github.com/stanfordnlp/GloVe, 2014,
trained word embeddings,” in ACL, 2016. accessed: June 2018.
[22] J. Ebrahimi, A. Rao, D. Lowd, and D. Dou, “Hotflip: White-box [57] M. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and
adversarial examples for text classification,” in NAACL-HLT, 2018. L. Zettlemoyer, “Deep contextualized word representations,” in NAACL-
[23] E. N. Efthimiadis, “Query expansion.” ARIST, 1996. HLT, 2018.
[24] “Elastic search guide: Lucene’s practical scoring function,” [58] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever,
https://www.elastic.co/guide/en/elasticsearch/guide/current/practica “Language models are unsupervised multitask learners,” OpenAI Blog,
l-scoring-function.html, accessed: May 2019. 2019.
[25] “Elasticsearch,” https://www.elastic.co/, accessed: May 2019. [59] A. Ritter, S. Clark, Mausam, and O. Etzioni, “Named entity recognition
[26] K. Ethayarajh, D. Duvenaud, and G. Hirst, “Towards understanding in tweets: an experimental study,” in EMNLP, 2011.
linear word analogies,” in ACL, 2019. [60] A. Ritter, Mausam, O. Etzioni, and S. Clark, “Open domain event
[27] J. R. Firth, “A synopsis of linguistic theory, 1930-1955,” Studies in extraction from Twitter,” in KDD, 2012.
linguistic analysis, 1957. [61] D. Roy, D. Paul, M. Mitra, and U. Garain, “Using word embeddings for
[28] E. Gabrilovich and S. Markovitch, “Computing semantic relatedness automatic query expansion,” arXiv:1606.07608, 2016.
using Wikipedia-based explicit semantic analysis,” in IJCAI, 2007. [62] S. Samanta and S. Mehta, “Towards crafting text adversarial samples,”
[29] J. Gao, J. Lanchantin, M. L. Soffa, and Y. Qi, “Black-box generation of arXiv:1707.02812, 2017.
adversarial text sequences to evade deep learning classifiers,” in IEEE [63] M. Sato, J. Suzuki, H. Shindo, and Y. Matsumoto, “Interpretable
Security and Privacy Workshops (SPW), 2018. adversarial perturbation in input embedding space for text,” in IJCAI,
[30] F. Ginter, J. Hajič, J. Luotolahti, M. Straka, and D. Zeman, “CoNLL 2018.
2017 shared task - automatically annotated raw texts and word embed- [64] R. Schuster, T. Schuster, Y. Meri, and V. Shmatikov, “Humpty Dumpty:
dings,” 2017, LINDAT/CLARIN digital library at Charles University. Controlling word meanings via corpus poisoning,” arXiv:2001.04935,
[31] T. B. Hashimoto, D. Alvarez-Melis, and T. S. Jaakkola, “Word embed- 2020.
dings as metric recovery in semantic spaces,” TACL, 2016. [65] T. Schuster, O. Ram, R. Barzilay, and A. Globerson, “Cross-lingual
[32] J. Hellrich and U. Hahn, “Bad company — neighborhoods in neural alignment of contextual word embeddings, with applications to zero-
embedding spaces considered harmful,” in COLING, 2016. shot dependency parsing,” in NAACL-HLT, 2019.
[33] A. Ilyas, L. Engstrom, A. Athalye, and J. Lin, “Black-box adversarial [66] A. Shafahi, W. R. Huang, M. Najibi, O. Suciu, C. Studer, T. Dumitras,
attacks with limited queries and information,” arXiv:1804.08598, 2018. and T. Goldstein, “Poison frogs! targeted clean-label poisoning attacks
[34] R. Jia and P. Liang, “Adversarial examples for evaluating reading on neural networks,” in NIPS, 2018.
comprehension systems,” in EMNLP, 2017. [67] S. L. Smith, D. H. Turban, S. Hamblin, and N. Y. Hammerla, “Offline
[35] S. Kamath, B. Grau, and Y. Ma, “A study of word embeddings bilingual word vectors, orthogonal transformations and the inverted
for biomedical question answering,” in Symp. sur l’Ingénierie de softmax,” in ICLR, 2017.
l’Information Médicale, 2017. [68] J. Steinhardt, P. W. W. Koh, and P. S. Liang, “Certified defenses for data
[36] A. Krause and D. Golovin, “Submodular function maximization,” 2014. poisoning attacks,” in NIPS, 2017.
[37] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial examples in the [69] M. Sun, J. Tang, H. Li, B. Li, C. Xiao, Y. Chen, and D. Song,
physical world,” arXiv:1607.02533, 2016. “Data poisoning attack against unsupervised node embedding methods,”
[38] ——, “Adversarial machine learning at scale,” in ICLR, 2017. arXiv:1810.12881, 2018.
[39] S. Kuzi, A. Shtok, and O. Kurland, “Query expansion using word [70] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Good-
embeddings,” in CIKM, 2016. fellow, and R. Fergus, “Intriguing properties of neural networks,”
[40] O. Levy and Y. Goldberg, “Linguistic regularities in sparse and explicit arXiv:1312.6199, 2013.
word representations,” in CoNLL, 2014. [71] P. D. Turney and P. Pantel, “From frequency to meaning: Vector space

1309
models of semantics,” JAIR, 2010.
Algorithm 1 Finding the change vector Δ
[72] E. M. Voorhees, “Query expansion using lexical-semantic relations,” in 1: procedure SOLVE G REEDY(s ∈ D, POS, NEG ∈ ℘ (D), t r , α, maxΔ ∈ R)
SIGIR, 1994. 2: |Δ| ← 0
[73] E. Wallace, S. Feng, N. Kandpal, M. Gardner, and S. Singh, “Universal 3: ← (0, ..., 0)
Δ
adversarial triggers for attacking and analyzing nlp,” EMNLP, 2019.
×|D|
[74] W. Wang, B. Tang, R. Wang, L. Wang, and A. Ye, “A survey on 4: //precompute intermediate values
adversarial attacks and defenses in text,” arXiv:1902.07285, 2019. 5: A ← POS ∪ NEG ∪ {s}
[75] M. Wu, “Tweets preprocessing script,”
STATE ← { u 2
r∈D Cu,r }u∈D , 2 u∈A { s
6: }
·M
https://gist.github.com/tokestermw/cb87a97113da12acb388, accessed: M , M t u∈A
May 2019.
[76] C. Xing, D. Wang, C. Liu, and Y. Lin, “Normalized word embedding 7: J ← J s, NEG, POS; Δ
and orthogonal transform for bilingual word translation,” in NAACL- 8: //optimization loop
9: while J < tr + α and |Δ| ≤ maxΔ do
HLT, 2015.
[77] C. Yang, Q. Wu, H. Li, and Y. Chen, “Generative poisoning attack 10: for each i ∈ [|D|] , δ ∈ L do
11:
method against neural networks,” arXiv:1703.01340, 2017. 12: di,δ [J(s,NEG,POS;Δ
)], d ← COMP D IFF (i,δ,STATE)
i,δ [st]
[78] Zalando Research, “Flair tutorial 7: Training a model,” st∈STATE
13:
https://github.com/zalandoresearch/flair/blob/master/resources/docs/ 14: di,δ [|Δ|] ← δ/ ωi //see Section VII
TUTORIAL_7_TRAINING_A_MODEL.md, accessed: May 2019.
di,δ [J(s,NEG,POS;Δ )]
[79] ——, “Flair tutorial 7: Training a model,” https://github.com/zalandore 15: i∗, δ∗ ← argmaxi∈[|D|],δ∈L di,δ [|Δ|]

search/flair/blob/master/resources/docs/TUTORIAL_7_TRAINING_A 16: J ← J + di∗,δ∗ J s, NEG, POS; Δ
_MODEL.md, accessed: May 2019.
17: //update intermediate values
[80] ——, “Flair tutorial 7: Training a model,” https://github.com/zalandore 18: for each st ∈ STATE do
search/flair, accessed: May 2019. 19: st ← st + di,δ [st]
20: return Δ

A PPENDIX a GPU. Further, to save GPU memory and latency of dis-

A. SGNS background patching the computation onto the GPU, we truncate the high
dimensional vectors to include only the indices of the entries
To find {cu }u∈D , {w v }u∈D , Word2vec defines and opti-

whoseinitial values are non-zero for at least one of the vectors
mizes a series of local objectives using cooccurrence events u
M , as well as the indices of all target
stochastically sampled from the corpus one at a time. The u∈POS ∪ NEG ∪{s}
probability of sampling a given event of u, w’s cooccurrence words. When N EG = ∅, e.g. for all rank attackers, this
is max {1 − (d − 1) /Γ, 0}, where d is the distance between cannot change the algorithm’s output. Optimization will never
u and w, Γ is window size. Each sampled event contributes increase either of the removed entries in M s , as this would
a term to the local objective. Once enough events have been always result in a decrease in the objective. When NEG is not
sampled, an SGD step is performed to maximize the local empty, we do not remove the 10% of entries that correspond
objective, and traversal continues to compute a new local to the most frequent words. These contain the vast majority
objective, initialized to 0. The resulting embeddings might of the cooccurrence events, and optimization is most likely to
depend on the sampling order, which, in turn, depends on the increase them and not the others.
order of documents, but empirically this does not appear to be This algorithm typically runs in minutes, as reported in
the case [5]. Word2vec thus can be thought of as defining and Section VIII.
optimizing an objective over word cooccurrence counts. For The COMP D IFF sub-procedure. the input is i, δ, and the
example, the sum of local objectives for SGNS would be [41]: saved intermediate computation states. The state contains (1)
the dot product of s’s distributional vector M s with those of

argmax Cu,v log σ (w
u · cv ) − log σ (w
u · cri ) , the target words in POS ∪ NEG; (2) the squared L2 norms of
u,v∈D ri ∈Ru,v the source and target words’ distributional vectors, and (3) for
(A.1)
every word, the sum of its cooccurrence counts with all other
where Ru,v ⊆ D are the “negative samples” taken for words.
the events that involve u, v throughout the epoch. Due to We use the following notations: we denote by C =
def
its stochastic sampling, we consider SGNS’s cooccurrence
] the cooccurrence matrix after adding Δ to Cs .
C[[s]←Cs +Δ
count for words u, v to be the expectation of the number of
We similarly denote the updated bias terms by B and the
their sampled cooccurrence events, which can be computed
updated distributional matrix by M . We define f as f
similarly to GloVe’s sum of weights.
computed using the updated bias terms {B u }u∈D , instead
of {Bu }u∈D . Finally, let X a distributional expression that
B. Optimization in cooccurrence-vector space (details)
depends on Δ, then let X def
= X + di,δ X , i.e., the
This section details the algorithm from Section VI, whose i,δ
pseudocode is given in Algorithm 1. The COMP D IFF sub- i ← Δ
value of the expression after setting Δ i + δ.
procedure is not given in pseudo-code and we provide more
The first
step
is to compute the updated cooccurrence sums
u,r i,δ for u ∈ {s, i} ∪ POS ∪ NEG, by adding
details on it below. C
r∈D

δ to the appropriate sums (for example, r∈D C s,r i,δ is
Implementation notes. The inner loop in lines 10-14 of always updated, since we always add cooccurrences with the
Algorithm 1 is entirely parallelizable, and we offload it to source word). Next, using the updated cooccurrence sums, we

1310

compute the updated bias terms [B u ]i,δ for u ∈ {s, i} ∪ We return di,δ J s, NEG, POS; Δ and the computed
POS ∪ NEG. For SPPMI, these terms also depend on log (Z) differences in the saved intermediate values.
(see Section IV-B), which is not a part of our state, but changes Estimating biases. When the distributional proximities in
in this term are negligible. For BIAS, we use an approximation

J s, NEG, POS; Δ are computed using M = BIAS, there
explained below.
is an additional subtlety. We compute BIAS using the biases
Using the above,
we can compute
output by GloVe when trained on the original corpus. Changes
f s,t C u,r , e−60 i,δ for u ∈ {s} ∪ POS ∪ NEG and
r∈D to the cooccurrences might affect biases computed on the
f s,t (C s,t , 0) i,δ . modified corpus. This effect is likely insignificant for small
Now, we compute the updates to our saved intermediate modifications to the cooccurrences of the existing words. New
state. First, we compute di,δ M si , i.e., the difference in words introduced as part of the attack do not initially have
M s ’s ith entry. This is similar to the previous computation, biases, and, during optimization, one can estimate their post-
since matrix entries are computed using f . We use these val- attack biases using the average biases of the words with the

ues, along with M s · M t , which is a part of our saved state,
same cooccurrence counts in the existing corpus. In practice,
we found that post-retraining BIAS distributional distances
to compute M s · M t ← M s · M t +di,δ M si ·Mti closely follow our estimated ones (see Figure A.1–b).
i,δ
for each target. If i ∈ POS ∪ NEG, we also add a similar term
2 C. Placement strategy (details)
accounting for di,δ M ts . We similarly derive di,δ M s i
2 2 As discussed in Section VII, our attack involves adding two
and use it to compute 2
M s ← M s + di,δ M s i . types of sequences to the corpus.
2 i,δ
2 2 1
First-order sequences. For each t ∈ POS, to increase sim
If i ∈ POS ∪ NEG, we similarly compute
M i . For
2 i,δ by the required amount, we add sequences with exactly one
SPPMI, the above does not account for minor changes in bias instance
of s
and t each until the number of sequences is equal
values of the
source or target which might affect all entries of
to Δt /γ (1) , where γ is the cooccurrence-weight function.
vectors in M u . We could avoid carrying We could leverage the fact that γ can count multiple cooc-
u∈{s}∪POS ∪ NEG
the approximation error to the next step (at a minor, non- currences for each instance of s, but this has disadvantages.
asymptotical performance hit) by changing Algorithm 1 to Adding more occurrences of the target word around s is
recompute the state from the updated cooccurrences at each pointless because they would exceed those of s and dominate
step, instead of the updates at lines 18-19, but our current |Δ|, particularly for pure sim 1 attackers with just one target
5
implementation does not. word. We thus require symmetry between the occurrences of
Now we are ready to compute the differences in the target and source words.
1 (s, t) , sim’
sim’ 2 (s, t) , sim’
1+2 (s, t), the distributional ex- Sequences of the form s t s t . . . could increase the desired
pressions for the first-order, second-order, and combined prox- extra cooccurrences per added source (or target) word by
imities, respectively, using C[[s]←Cs +Δ
] . For each target:
a factor of 2-3 in our setting (depending on how long the

f s,t C s,t , 0 sequences are). Nevertheless, they are clearly anomalous and
i,δ
di,δ sim’1 (s, t) ← −
f s,t
r C s,r , e
−60 f s,t
r C t,r , e
−60 would result in a fragile attack. For example, in our Twitter
corpus, sub-sequences of the form X Y X Y where X = Y
i,δ i,δ

f s,t C s,t , 0
and X, Y are alpha-numeric words, occur in 0.03% of all
f s,t
r C s,r , e
−60 f
s,t

r C t,r , e
−60
tweets. Filtering out such rare sub-sequences would eliminate
100% of the attacker’s first-order sequences.
We could also merge t’s first-order appearances with those
M s · M t
2 (s, t) ← i,δ M s · M t of other targets, or inject t into second-order sequences next to
di,δ sim’ −

2 2

2 2 M s M t s. This would add many cooccurrences of t with words other
M s M t
2 i,δ 2 i,δ 2 2
than s and might decrease both sim 1 (s, t) and sim
2 (s, t).

and, using the above, Second-order sequences. We add 11-word sequences that
include the source word s and 5 additional on each side of
s. Our placement strategy forms these sequences so that the
1 (s, t) + di,δ sim’
di,δ sim’ 2 (s, t)
1+2 (s, t) ←
di,δ sim’ cooccurrences of u ∈ D \ POS with s are approximately equal
2
u . This has a collateral effect of
to those in the change vector Δ
adding cooccurrences of u with words other than s, but it does
Finally, we compute di,δ J s, NEG, POS; Δ
as 1 (s, t), nor sim
not affect sim 2 (s, t). Moreover, it is highly
1
di,δ J s, NEG, POS; Δ
← · 5 This strategy might be good when using sim 1+2 or when
|POS ∪ NEG|
|POS ∪ NEG| > 1, because occurrences of s exceed those of t to
(s, t) −
di,δ sim d i,δ
(s, t)
sim begin with, but only under the assumption that adding many cooccurrences
Δ Δ
t∈POS t∈NEG of target word with itself does not impede the attack. In this paper, we do
not explore further if the attack can be improved in these speciﬁc cases.

1311
unlikely to affect the distributional proximities of the added because it separately preserves its sim 2 components.
1 and sim
words u ∈ POS ∪ {s} with other words since, in practice, We can still easily compute |Δ| via their weighted sum (e.g.,
every such word is added at most a few times. divide second-order entries by 4 and first-order entries by 1).
We verified this using one of our benchmark experi-
ments from Section VIII. For solutions found with sim =
Algorithm 2 Placement into corpus: finding the change set Δ
SIM2 , M = BIAS, |Δ| = 1250, only about 0.3% of such 1: procedure P LACE A DDITIONS(vector Δ, word s)
u were
entries Δ bigger than 20, and, for 99% of them, the 2: Δ←∅
3: for each t ∈ POS
do // First, add

first-order sequences
change in Cu was less than 1%. We conclude that changes 4: Δ ← Δ ∪ “s t , ..., “s t
1
to Cu where u is neither the source nor the target have × Δ
/γ(1)
t

negligible effects on distributional proximities. 5: // Now deal with

second-order sequences
6: changeMap ← u → Δ u | Δ u = 0 ∧ u ∈ POS
Placement algorithm. Algorithm 2 gives the pseudo-code of

u∈D\POS Δu
7: minSequencesRequired ←
our placement algorithm. It constructs a list of sequences, each γ(d))
d∈[λ]
8: live ← {”_ _ _ _ s _ _ _ _”, ..., ”_ _ _ _ s _ _ _ _”}
with 2 · λ + 1 words for some λ (in our setting, 5), with s in
×minSequencesRequired
the middle, λth slot. Since the sum of s’s cooccurrences in 9: indices ← {−5, −4, −3, −2, −1, 1, 2, 3, 4, 5}
each such sequence is 2 d∈[λ] γ (d), we require a minimum 10:
11: for each u ∈ changeMap do
of Δu / γ (d) sequences. 12: while changeMap [u] > 0 do
u∈D\POS d∈[λ]
13: seq, i ← argmin seq∈live, γ (|i|) − changeMap [u]
After instantiating this many sequences, the algorithm tra- i∈indices

entries and, for each, inserts the corresponding

s.t.seq[i]=”_”
verses Δ’s 14: seq [i] ← u
15: changeMap [u] ← changeMap [u] − γ (|i|)
word into the non-yet-full sequences until the required number 16: if ∀i ∈ indices : seq [i] = ”_” then
of cooccurrences is reached. For every such insertion, it tries 17: Δ ← Δ ∪ {seq}
18: live ← live \ {seq}
to find the slot whose contribution to the cooccurrence count 19: if live = ∅ then
most tightly fits the needed value. After all sequences are filled 20: live ← {”_ _ _ _ s _ _ _ _”}
21: entries
up, a new one is added. In the end, sequences that still have // Fill empty sequences with nonzero Δ
22: for each seq ∈ live do
empty slots are filled by randomly chosen words that have 23: for each i ∈ {i ∈ indices | seq [i] = ”_”} do
nonzero entries in Δ (but are not in P OS). We found that this 24: seq {i} ← RandomChoose ({u ∈ changeMap})
25:
further improves the distributional objective without increasing 26: Δ ← Δ ∪ {seq}
|Δ|. Finally, Δ is added to the corpus. return Δ

Properties required in Section VII hold. First, assume that

has entries corresponding to either first- or second-order
Δ
sequences but not both. Observe that in our |Δ|, s is always D. Distributional distances are good proxies
the word with the most occurrences, and it occurs in each Figure A.1 shows how distributional distances computed
sequence once. Therefore, |Δ| is always equal to the number during the attack are preserved throughout placement (Fig-
of sequences and the number of source-word occurrences in ure A.1–a), re-training by the victim (Figure A.1–b), and, fi-
Δ (see the definition of |Δ| in Section V). nally, in the new embedding (Figure A.1–c). The latter depicts
For Δ with only first-order changes, both properties trivially
how increases in distributional proximity correspond roughly
hold, because we add Δ t /γ (1) cooccurrences of the source linearly to increases in embedding proximity, indicating that
and the target. The size of the change is thus predictable, as the former is an effective proxy for the latter.
it adds almost exactly Δ s to |Δ|.
E. Alternative attack on resume search
For second-order changes, both properties empirically hold.
In this section, we consider an attacker who—instead of
First, |Δ| is still linear in Δ : their Pearson correlation is over
poisoning the embedding—changes his resume to include
0.99 for the rank attacker in Section IX, where |Δ| varies. words from the expanded queries. Specifically, he adds the

Thus, |Δ| is a constant multiple of Δ and close to optimal. closest neighbor of t in the original embedding to his resume

so that it is returned in response to queries where t ∈ Ωsearch .
For example, it is about 4 times smaller than Δ for the
1 First, this attacker must add a specific meaningful word to
GloVe attack, the optimal value being 2 d∈[λ] γ (d) ≈ 4.5.
his resume. For example, to match the K = 1 expanded query
Second, for the proximity attacker inSection VIII, where
“iOS,” the attacker needs to add “Android” to his resume,
|Δ| is constant but J s, NEG, POS; Δ varies, we measured
but this involves claiming that he actually has expertise in
>0.99 Pearson correlation between the proximities attained Android. By contrast, our embedding attack adds a made-up
by J s, NEG, POS; Δ and those computed over the actual, string, e.g., a social media handle or nickname.
post-placement cooccurrence counts (see Figure A.1–a). Further, this attack significantly underperforms our embed-
contains both first- and second-order entries (because
If Δ ding attack. We carried out an experiment similar to Sec-
the objective uses sim 1+2 ), the aggregate contribution to tion IX, but for each target word t, we randomly chose a single
the cooccurrence counts still preserves the objective’s value resume and added to it t’s nearest neighbor in the original

1312
(a) Post-placement distributional proximities (b) Post-retraining distributional proximities (c) Final embedding proximities
Figure A.1: Comparing the proxy distances used by the attacker with the post-attack distances in the corpus for the words in Ωbenchmark , using GloVe-
paper/Wikipedia, M = BIAS, sim = sim 1+2 , maxΔ = 1250.

s−t additions only additions & deletions

embedding. We did not poison 20 resumes, as this would have war-peace 0.219480 0.219480
turned t’s nearest neighbor into a common word, decreasing freedom-slavery 0.253640 0.253636
its score. Query expansion uses the original embedding. For ignorance-strength 0.266967 0.264050
comparison, we repeated our experiment from Section IX, but 1+2 (s, t) proximity with and without deletions.
Table XIV: Attained sim
modifying just one resume per t ∈ Ωsearch .
of cooccurrence-event weights of Δrm . We similarly define
The embedding attacker outperforms the alternative attacker
|Δadd | as the weighted sum of cooccurrence events added
for every K ∈ {1, 2, 3} and every query type. Averaged over 2 attacker, where ω
to the corpus by Δadd . Under the sim
query types and K values, the average and median ranks
entries are identical, and using our placement strategy, the
attained by the embedding attacker are 2.5 times lower (i.e.,
definition of |Δadd | is equivalent to the definition of |Δ|, up
better) than the alternative attacker’s.
to multiplication by the value of ω entries.
F. Attack with deletions We redefine |Δ| as |Δadd | + β |Δrm |. Under this definition,
We now consider an attacker who can delete cooccurrence word-flip deletions that are close to s cost more to the attacker
events from the corpus. While this is a stronger threat model, in terms of increasing |Δ|. β is this cost.
we find that it does not dramatically improve the trade- Optimization in cooccurrence-vector space. We modify the
off between the size of the changes to the corpus and the optimization procedure from Section VI as follows. First,
corresponding changes in distributional proximity. we set L ← [−5, 4] ∩ {i/5 | i ∈ Z=0 }. This allows the
Supporting deletions requires some changes. optimization to add negative values to the entries in the
cooccurrence change vector and to output Δ with negative
Attacker. First, corpus changes now include events that
correspond to a decrease in cooccurrence counts. We define entries. Second, we apply a different weight to the negative
Δ = Δadd ∪ Δrm where Δadd are the sentences added by values by multiplying the computed “step cost” value by β
the attacker (as before), and Δrm are the cooccurrence events (line 14 of Algorithm 1).
deleted by the attacker. Placement strategy. We modify the placement strategy from
The modified corpus C + Δ is now defined as C augmented Section VII as follows. First, we set Δ pos ← max Δ, 0
with Δadd and with the word appearances in Δrm flipped to pos as input
(for the element-wise max operation) and use Δ
randomly chosen words. A word flip does not delete a cooc- neg ←
to the original placement Algorithm 2. Then, we set Δ
currence event per se but replaces it by another cooccurrence
min {x, 0} for an element-wise min operation. We traverse
event between s and some randomly chosen word u. These are
the corpus to find cooccurrence events between s and another
almost equivalent in terms of their effect on the distributional neg is non-zero. Whenever we find such an
word u such that Δ u
proximities because cooccurrence vectors are very sparse. In
event, u’s location in the corpus is saved into Δrm . We then
our Wikipedia corpus, for a random subsample of 50,000 most neg
subtract from Δ u the weight of this cooccurrence event.
common words, we found that on average 1% of the entries
were non-zero. It is thus highly likely that Cs,u is initially Evaluation. We use three source-target pairs—war-peace,
0 or very low. If so, then Ms,u is likely 0 and will likely freedom-slavery, ignorance-strength—with β = 1. We attack
remain 0 (due to the max operation in all of our candidate GloVe-tutorial trained on Wikipedia with window size of
M —see Section IV-B) even after we add this cooccurrence 5 using a distance attacker, ws set to the source word in each
event. Therefore, the effect of a word flip on distributional pair, P OS to ws only, and maxΔ = 1000. We also perform
proximities is similar to word removal. an identically parameterized attack without deletions.
Let de be the distance of the removed word from s Table XIV shows the results. They are almost identical, with
def a slight advantage to the attacker who can use deletions.
for e ∈ Δrm . Let |Δrm | = e∈Δrm γ (de ) be the sum

1313

Slave Information: Specification
No ratings yet
Slave Information: Specification
99 pages
SAP VIM Tables (OpenText) by Topics Series - Part 3 - SAP4TECH
No ratings yet
SAP VIM Tables (OpenText) by Topics Series - Part 3 - SAP4TECH
17 pages
Explaining The Intuition of Word2Vec & Implementing It in Python
No ratings yet
Explaining The Intuition of Word2Vec & Implementing It in Python
13 pages
Large Language Models From Scratch
No ratings yet
Large Language Models From Scratch
29 pages
Fluent Adjoint Solver 14.5
No ratings yet
Fluent Adjoint Solver 14.5
82 pages
Nlput-Unit2 Notes
No ratings yet
Nlput-Unit2 Notes
28 pages
Constructing and Evaluating Word Embeddings
No ratings yet
Constructing and Evaluating Word Embeddings
33 pages
Chapter Transformers
No ratings yet
Chapter Transformers
8 pages
4 Word Representation
No ratings yet
4 Word Representation
41 pages
WORD EMBEDDING Project
No ratings yet
WORD EMBEDDING Project
15 pages
Gen AI 1
No ratings yet
Gen AI 1
4 pages
11.chapter8 WordEmbedding
No ratings yet
11.chapter8 WordEmbedding
17 pages
Sense VEC A Fast and Accurate Method For Word Sense Disambiguation in Neural Word Embeddings
No ratings yet
Sense VEC A Fast and Accurate Method For Word Sense Disambiguation in Neural Word Embeddings
9 pages
Lecture Word Embeddings WordTo Vec IR
No ratings yet
Lecture Word Embeddings WordTo Vec IR
60 pages
Lecture 10
No ratings yet
Lecture 10
86 pages
CCS369 - TSS-Unit 2
No ratings yet
CCS369 - TSS-Unit 2
56 pages
Christopher Manning Lecture 1: Introduction and Word Vectors
No ratings yet
Christopher Manning Lecture 1: Introduction and Word Vectors
42 pages
Text Processing For NLP Word Embedding
No ratings yet
Text Processing For NLP Word Embedding
11 pages
XCS224N Module1 Slides
No ratings yet
XCS224N Module1 Slides
72 pages
5 Pretained Word Embeddings Algorithms
No ratings yet
5 Pretained Word Embeddings Algorithms
21 pages
Contextual Word Embeddings
No ratings yet
Contextual Word Embeddings
8 pages
Hung-Yi Lee Word2vec (v3)
No ratings yet
Hung-Yi Lee Word2vec (v3)
23 pages
NLP Lec 03
No ratings yet
NLP Lec 03
26 pages
词向量嵌入综述
No ratings yet
词向量嵌入综述
10 pages
Vector Semantics and Embeddings
No ratings yet
Vector Semantics and Embeddings
29 pages
Word Embeddings A Survey
No ratings yet
Word Embeddings A Survey
11 pages
Word Embeddings: What Works, What Doesn't, and How To Tell The Difference For Applied Research
No ratings yet
Word Embeddings: What Works, What Doesn't, and How To Tell The Difference For Applied Research
51 pages
NLP An Intuitive Understanding of Word Embeddings From Count Vectors To Word2Vec
No ratings yet
NLP An Intuitive Understanding of Word Embeddings From Count Vectors To Word2Vec
18 pages
The Illustrated Word2vec - Jay Alammar - Visualizing Machine Learning One Concept at A Time
100% (1)
The Illustrated Word2vec - Jay Alammar - Visualizing Machine Learning One Concept at A Time
24 pages
NLP - L9 Word Embedding
No ratings yet
NLP - L9 Word Embedding
5 pages
Language Analysis - Sociolinguistics of Word Embeddings - PREPRINT - 8.8.2020
No ratings yet
Language Analysis - Sociolinguistics of Word Embeddings - PREPRINT - 8.8.2020
17 pages
DLNLP CH-3 N
No ratings yet
DLNLP CH-3 N
11 pages
Chapter II
No ratings yet
Chapter II
26 pages
A Survey of Word Embeddings Based On Deep Learning: Shirui Wang Wenan Zhou Chao Jiang
No ratings yet
A Survey of Word Embeddings Based On Deep Learning: Shirui Wang Wenan Zhou Chao Jiang
24 pages
Word Embadding
No ratings yet
Word Embadding
24 pages
Effect of Word Embedding Vector Dimensionality On Sentiment Analysis Through Short and Long Texts
No ratings yet
Effect of Word Embedding Vector Dimensionality On Sentiment Analysis Through Short and Long Texts
8 pages
No Training Required Exploring Random Encoders For Sentence Classification
No ratings yet
No Training Required Exploring Random Encoders For Sentence Classification
16 pages
Vector Semantics and Embedding (Part 2)
No ratings yet
Vector Semantics and Embedding (Part 2)
47 pages
Wordembed v2.0
No ratings yet
Wordembed v2.0
46 pages
Transfer Learning in Natural Language Processing PDF
0% (1)
Transfer Learning in Natural Language Processing PDF
238 pages
Unit 2
No ratings yet
Unit 2
15 pages
Word and Document Embeddings
No ratings yet
Word and Document Embeddings
94 pages
ML4D-L6 nlp2
No ratings yet
ML4D-L6 nlp2
58 pages
21 Word2Vec 24 09 2024
No ratings yet
21 Word2Vec 24 09 2024
63 pages
Evaluating The Stability of Embedding-Based Word Similarities
No ratings yet
Evaluating The Stability of Embedding-Based Word Similarities
14 pages
CHATGPT NLP
No ratings yet
CHATGPT NLP
6 pages
Allen 19 A
No ratings yet
Allen 19 A
9 pages
From Word Vectors To Multimodal Embeddings: Techniques, Applications, and Future Directions For Large Language Models
No ratings yet
From Word Vectors To Multimodal Embeddings: Techniques, Applications, and Future Directions For Large Language Models
21 pages
Neural Word Embedding As Implicit Matrix Factorization
No ratings yet
Neural Word Embedding As Implicit Matrix Factorization
9 pages
08-DL-Deep Learning For Text Data (Transfer Learning in NLP)
No ratings yet
08-DL-Deep Learning For Text Data (Transfer Learning in NLP)
53 pages
Lebijp 59 SZ 31 Py
No ratings yet
Lebijp 59 SZ 31 Py
69 pages
Word Embeddings 1
No ratings yet
Word Embeddings 1
42 pages
Wordembed
No ratings yet
Wordembed
31 pages
NLP2
No ratings yet
NLP2
11 pages
Exercises en Text Models 2
No ratings yet
Exercises en Text Models 2
5 pages
Natural Language Adversarial Defense Through Synonym Encoding
No ratings yet
Natural Language Adversarial Defense Through Synonym Encoding
11 pages
Word Embeddings in NLP - Gunjan Agicha - Medium
No ratings yet
Word Embeddings in NLP - Gunjan Agicha - Medium
5 pages
Word Embeddings
No ratings yet
Word Embeddings
55 pages
Performance Evaluation of Word Embedding Algorithms
No ratings yet
Performance Evaluation of Word Embedding Algorithms
7 pages
Word Embedding
No ratings yet
Word Embedding
9 pages
I: Exploring Deep State Spaces Via Fuzzing: Cornelius Aschermann, Sergej Schumilo, Ali Abbasi, and Thorsten Holz
No ratings yet
I: Exploring Deep State Spaces Via Fuzzing: Cornelius Aschermann, Sergej Schumilo, Ali Abbasi, and Thorsten Holz
16 pages
Tardis: Rolling Back The Clock On CMS-Targeting Cyber Attacks
No ratings yet
Tardis: Rolling Back The Clock On CMS-Targeting Cyber Attacks
16 pages
Thillard 2016 These PHD
No ratings yet
Thillard 2016 These PHD
146 pages
Abdullah PHD Dissertation Final
No ratings yet
Abdullah PHD Dissertation Final
161 pages
NetFPGA10G 3ways NoAtrhors
No ratings yet
NetFPGA10G 3ways NoAtrhors
4 pages
Standard Form
100% (3)
Standard Form
23 pages
M220EW01 v2 Datasheet
No ratings yet
M220EW01 v2 Datasheet
27 pages
Mike Meyers' Comptia Network+ Certification Passport, Seventh Edition (Exam N10-008) Jonathan S. Weissman
100% (4)
Mike Meyers' Comptia Network+ Certification Passport, Seventh Edition (Exam N10-008) Jonathan S. Weissman
61 pages
Idler Sound Monitor Kit
No ratings yet
Idler Sound Monitor Kit
2 pages
SAP SD Transaction Codes List
No ratings yet
SAP SD Transaction Codes List
5 pages
Data Maturity Framework For The Not-For-Profit Sector
No ratings yet
Data Maturity Framework For The Not-For-Profit Sector
12 pages
BLDC Motors
100% (1)
BLDC Motors
45 pages
87 10239 Revc Tr3300 Plugintransceivers
No ratings yet
87 10239 Revc Tr3300 Plugintransceivers
3 pages
Tikz Library For Structural Analysis: Stanli User Guide, Version 3.0 J Urgen Hackl
No ratings yet
Tikz Library For Structural Analysis: Stanli User Guide, Version 3.0 J Urgen Hackl
58 pages
Trends Networks and Critical Thinking in The 21st Century ICT
No ratings yet
Trends Networks and Critical Thinking in The 21st Century ICT
18 pages
Construction Standard Specification Section 05120 Structural Steel
No ratings yet
Construction Standard Specification Section 05120 Structural Steel
11 pages
IKGPTU Practical Datesheet May 2024 For Students
No ratings yet
IKGPTU Practical Datesheet May 2024 For Students
8 pages
Advt 2023 Final - Faculty Recruitment - December 2023 Dated 20 12 2023
No ratings yet
Advt 2023 Final - Faculty Recruitment - December 2023 Dated 20 12 2023
27 pages
Wordpress Cheat Sheet
No ratings yet
Wordpress Cheat Sheet
1 page
Soal Bahasa Inggris Ujian Akhir Sekolah 2022 - 2023
No ratings yet
Soal Bahasa Inggris Ujian Akhir Sekolah 2022 - 2023
12 pages
Grade 5 - Ict - P1 - 2022 - Term2
No ratings yet
Grade 5 - Ict - P1 - 2022 - Term2
6 pages
Datasheet LM149
No ratings yet
Datasheet LM149
18 pages
CS 4402-01learning Journal Unit 1
No ratings yet
CS 4402-01learning Journal Unit 1
2 pages
Xnee
No ratings yet
Xnee
4 pages
NumberSeries pdf-37 PDF
100% (1)
NumberSeries pdf-37 PDF
4 pages
EEE-121 Lab 11
No ratings yet
EEE-121 Lab 11
7 pages
Chioma Frances MSC
No ratings yet
Chioma Frances MSC
90 pages
Gpss - Capitulo 7 Block Statements
No ratings yet
Gpss - Capitulo 7 Block Statements
107 pages
JWT Magazine October 2024
No ratings yet
JWT Magazine October 2024
112 pages
Dell Docking Compatibility Guide
No ratings yet
Dell Docking Compatibility Guide
11 pages
Curriculum MBA (Distance) Term-IV ITSM
No ratings yet
Curriculum MBA (Distance) Term-IV ITSM
3 pages
WRN Webstats Sept2010
No ratings yet
WRN Webstats Sept2010
3 pages

Humpty Dumpty: Controlling Word Meanings Via Corpus Poisoning

Uploaded by

Humpty Dumpty: Controlling Word Meanings Via Corpus Poisoning

Uploaded by

2020 IEEE Symposium on Security and Privacy

Abstract—Word embeddings, i.e., low-dimensional vector rep- neural networks

© 2020, Roei Schuster. Under license to IEEE. 1295

GloVe. GloVe deﬁnes and optimizes (via SGD) a minimiza- VI L ⊆ R

are scalar bias terms that are learned

5. re-train word embeddings on

Figure V.1: Overview of our attack methodology.

in s’s cooccurrence vector that corresponds to adding Δ to the

size of the corpus change, |Δ|, which is easily computable

Table VIII: Parameters of the experiments.

victim α K=1 K=3 K=5

A PPENDIX a GPU. Further, to save GPU memory and latency of dis-

negligible effects on distributional proximities. 5: // Now deal with

 entries and, for each, inserts the corresponding

Properties required in Section VII hold. First, assume that

s−t additions only additions & deletions

You might also like

entries and, for each, inserts the corresponding