Topic Modeling in Embedding Spaces

Figure 2. A topic about Christianity found by the ETM on The New York Times. The topic is a point in the word embedding space.

Figure 3. Topics about sports found by the ETM. Each topic is a point in the word embedding space.
2017; Zhao et al., 2017a) or the topic assignment priors (Xie et al., 2015). For example, Petterson et al. (2010) use a word similarity graph (as given by a thesaurus) to bias LDA towards assigning similar words to similar topics. As another example, Xie et al. (2015) model the per-word topic assignments of LDA using a Markov random field to account for both the topic proportions and the topic assignments of similar words. These methods use word similarity as a type of "side information" about language; in contrast, the ETM directly models the similarity (via embeddings) in its generative process of words.

Other work has extended LDA to directly involve word embeddings. One common strategy is to convert the discrete text into continuous observations of embeddings, and then adapt LDA to generate real-valued data (Das et al., 2015; Xun et al., 2016; Batmanghelich et al., 2016; Xun et al., 2017). With this strategy, topics are Gaussian distributions with latent means and covariances, and the likelihood over the embeddings is modeled with a Gaussian (Das et al., 2015) or a von Mises-Fisher distribution (Batmanghelich et al., 2016). The ETM differs from these approaches in that it is a model of categorical data, one that goes through the embeddings matrix. Thus it does not require pre-fitted embeddings and, indeed, can learn embeddings as part of its inference process.

There have been a few other ways of combining LDA and embeddings. Nguyen et al. (2015) mix the likelihood defined by LDA with a log-linear model that uses pre-fitted word embeddings; Bunk and Krestel (2018) randomly replace words drawn from a topic with their embeddings drawn from a Gaussian; and Xu et al. (2018) adopt a geometric perspective, using Wasserstein distances to learn topics and word embeddings jointly.

Another thread of recent research improves topic modeling inference through deep neural networks (Srivastava and Sutton, 2017; Card et al., 2017; Cong et al., 2017; Zhang et al., 2018). Specifically, these methods reduce the dimension of the text data through amortized inference and the variational auto-encoder (Kingma and Welling, 2014; Rezende et al., 2014). To perform inference in the ETM, we also avail ourselves of amortized inference methods (Gershman and Goodman, 2014).

Finally, as a document model, the ETM also relates to works that learn per-document representations as part of an embedding model (Le and Mikolov, 2014; Moody, 2016; Miao et al., 2016). In contrast to these works, the document variables in the ETM are part of a larger probabilistic topic model.

3 Background

The ETM builds on two main ideas, LDA and word embeddings. Consider a corpus of D documents, where the vocabulary contains V distinct terms. Let wdn ∈ {1, . . . , V} denote the nth word in the dth document.

Latent Dirichlet allocation. LDA is a probabilistic generative model of documents (Blei et al., 2003). It posits K topics β1:K, each of which is a distribution over the vocabulary. LDA assumes each document comes from a mixture of topics, where the topics are shared across the corpus and the mixture proportions are unique for each document. The generative process for each document
is the following:

1. Draw topic proportion θd ∼ Dirichlet(αθ).
2. For each word n in the document:
   (a) Draw topic assignment zdn ∼ Cat(θd).
   (b) Draw word wdn ∼ Cat(βzdn).

Here, Cat(·) denotes the categorical distribution. LDA places a Dirichlet prior on the topics, βk ∼ Dirichlet(αβ) for k = 1, . . . , K. The concentration parameters αβ and αθ of the Dirichlet distributions are fixed model hyperparameters.
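To make this generative process concrete, here is a minimal NumPy sketch that samples one synthetic document from LDA. The vocabulary size, number of topics, document length, and concentration values are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and hyperparameters (assumed for the example).
V, K, N_d = 1000, 5, 50              # vocabulary size, topics, words in the document
alpha_theta, alpha_beta = 0.1, 0.01

# Topics: K distributions over the vocabulary, beta_k ~ Dirichlet(alpha_beta).
beta = rng.dirichlet(alpha_beta * np.ones(V), size=K)        # K x V

# 1. Draw topic proportions theta_d ~ Dirichlet(alpha_theta).
theta_d = rng.dirichlet(alpha_theta * np.ones(K))

# 2. For each word n: draw a topic assignment, then a word from that topic.
z_d = rng.choice(K, size=N_d, p=theta_d)                     # z_dn ~ Cat(theta_d)
w_d = np.array([rng.choice(V, p=beta[z]) for z in z_d])      # w_dn ~ Cat(beta_{z_dn})
```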
Word embeddings. Word embeddings provide models of language that use vector representations of words (Rumelhart and Abrahamson, 1973; Bengio et al., 2003). The word representations are fitted to relate to meaning, in that words with similar meanings will have representations that are close. (In embeddings, the "meaning" of a word comes from the contexts in which it is used.)

We focus on the continuous bag-of-words (CBOW) variant of word embeddings (Mikolov et al., 2013b). In CBOW, the likelihood of each word wdn is

wdn ∼ softmax(ρ⊤ αdn).   (1)

The embedding matrix ρ is a L × V matrix whose columns contain the embedding representations of the vocabulary, ρv ∈ R^L. The vector αdn is the context embedding. The context embedding is the sum of the context embedding vectors (αv for each word v) of the words surrounding wdn.
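As a small illustration of Eq. 1, the following sketch forms a context embedding by summing the context vectors of the surrounding words and scores the center word with a softmax over the vocabulary. The embedding dimension, the window contents, and the randomly initialized matrices are assumptions of the example, standing in for fitted CBOW parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
L, V = 300, 1000                        # embedding dimension, vocabulary size (assumed)

rho = rng.normal(size=(L, V))           # word embedding matrix rho (columns = words)
alpha_ctx = rng.normal(size=(L, V))     # one context embedding vector alpha_v per word

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

# Context embedding: sum of the context vectors of the words surrounding w_dn.
window = [12, 47, 803, 5]                     # indices of surrounding words (assumed)
alpha_dn = alpha_ctx[:, window].sum(axis=1)   # L-vector

# Eq. 1: w_dn ~ softmax(rho^T alpha_dn); e.g., probability assigned to word 47.
p_w = softmax(rho.T @ alpha_dn)
print(p_w[47])
```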
4 The Embedded Topic Model

The ETM is a topic model that uses embedding representations of both words and topics. It contains two notions of latent dimension. First, it embeds the vocabulary in an L-dimensional space. These embeddings are similar in spirit to classical word embeddings. Second, it represents each document in terms of K latent topics.

In traditional topic modeling, each topic is a full distribution over the vocabulary. In the ETM, however, the kth topic is a vector αk ∈ R^L in the embedding space. We call αk a topic embedding—it is a distributed representation of the kth topic in the semantic space of words.

In its generative process, the ETM uses the topic embedding to form a per-topic distribution over the vocabulary. Specifically, the ETM uses a log-linear model that takes the inner product of the word embedding matrix and the topic embedding. With this form, the ETM assigns high probability to a word v in topic k by measuring the agreement between the word's embedding and the topic's embedding.

Denote the L × V word embedding matrix by ρ; the column ρv is the embedding of v. Under the ETM, the generative process of the dth document is the following:

1. Draw topic proportions θd ∼ LN(0, I).
2. For each word n in the document:
   a. Draw topic assignment zdn ∼ Cat(θd).
   b. Draw the word wdn ∼ softmax(ρ⊤ αzdn).

In Step 1, LN(·) denotes the logistic-normal distribution (Aitchison and Shen, 1980; Blei and Lafferty, 2007); it transforms a standard Gaussian random variable to the simplex. A draw θd from this distribution is obtained as

δd ∼ N(0, I); θd = softmax(δd).   (2)

(We replaced the Dirichlet with the logistic normal to more easily use reparameterization in the inference algorithm; see Section 5.)

Steps 1 and 2a are standard for topic modeling: they represent documents as distributions over topics and draw a topic assignment for each observed word. Step 2b is different; it uses the embeddings of the vocabulary ρ and the assigned topic embedding αzdn to draw the observed word from the assigned topic, as given by zdn.

The topic distribution in Step 2b mirrors the CBOW likelihood in Eq. 1. Recall CBOW uses the surrounding words to form the context vector αdn. In contrast, the ETM uses the topic embedding αzdn as the context vector, where the assigned topic zdn is drawn from the per-document variable θd. The ETM draws its words from a document context, rather than from a window of surrounding words.

The ETM likelihood uses a matrix of word embeddings ρ, a representation of the vocabulary in a lower dimensional space. In practice, it can either rely on previously fitted embeddings or learn them as part of its overall fitting procedure. When the ETM learns the embeddings as part of the fitting procedure, it simultaneously finds topics and an embedding space.
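The following sketch spells out the ETM's per-topic distributions and generative process in NumPy, under the definitions above. The sizes are illustrative, and ρ and α are random placeholders for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(2)
L, V, K, N_d = 300, 1000, 50, 80      # illustrative sizes (assumed)

rho = rng.normal(size=(L, V))         # word embeddings rho (L x V)
alpha = rng.normal(size=(K, L))       # topic embeddings alpha_k in R^L

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

# Each topic is a distribution over the vocabulary through the embeddings:
# beta_k = softmax(rho^T alpha_k).
beta = np.stack([softmax(rho.T @ alpha[k]) for k in range(K)])   # K x V

# 1. theta_d ~ LN(0, I): draw a Gaussian and map it to the simplex (Eq. 2).
delta_d = rng.normal(size=K)
theta_d = softmax(delta_d)

# 2. Draw topic assignments and words.
z_d = rng.choice(K, size=N_d, p=theta_d)                 # z_dn ~ Cat(theta_d)
w_d = np.array([rng.choice(V, p=beta[z]) for z in z_d])  # w_dn ~ softmax(rho^T alpha_{z_dn})
```

Because each βk exists only through ρ and αk, the same inner product can also score words that never appear in the training corpus, a point the next paragraph returns to.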
When the ETM uses previously fitted embeddings, it learns the topics of a corpus in a particular embedding space. This strategy is particularly useful when there are words in the embedding that are not used in the corpus. The ETM can hypothesize how those words fit in to the topics because it can calculate ρv⊤ αk, even for words v that do not appear in the corpus.

5 Inference and Estimation

We are given a corpus of documents {w1, . . . , wD}, where wd is a collection of Nd words. How do we fit the ETM?

The marginal likelihood. The parameters of the ETM are the embeddings ρ1:V and the topic embeddings α1:K; each αk is a point in the embedding space. We maximize the marginal likelihood of the documents.

To fit the ETM, we use variational inference, optimizing a bound on the log of the marginal likelihood of Eq. 4. There are two sets of parameters to optimize: the model parameters, as described above, and the variational parameters, which tighten the bounds on the marginal likelihoods.

To begin, posit a family of distributions of the untransformed topic proportions q(δd; wd, ν). We use amortized inference, where the variational distribution of δd depends on both the document wd and shared variational parameters ν. In particular, q(δd; wd, ν) is a Gaussian whose mean and variance come from an "inference network," a neural network parameterized by ν (Kingma and Welling, 2014). The inference network ingests the document wd and outputs a mean and variance of δd. (To accommodate documents of varying length, we form the input of the inference network by normalizing the bag-of-words representation of the document by the number of words Nd.)
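A minimal PyTorch sketch of such an inference network, assuming a fully connected 3-layer architecture and an arbitrary hidden size; the text specifies only that the network ingests the length-normalized bag-of-words and outputs a mean and variance for δd.

```python
import torch
import torch.nn as nn

class InferenceNetwork(nn.Module):
    """q(delta_d; w_d, nu): a Gaussian whose mean and variance come from a neural net."""
    def __init__(self, vocab_size: int, num_topics: int, hidden: int = 800):
        super().__init__()
        # Hidden size and activation are assumptions of this sketch.
        self.body = nn.Sequential(
            nn.Linear(vocab_size, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, num_topics)
        self.logvar = nn.Linear(hidden, num_topics)

    def forward(self, bow_counts: torch.Tensor):
        # Normalize the bag-of-words by the document length N_d.
        x = bow_counts / bow_counts.sum(dim=1, keepdim=True)
        h = self.body(x)
        return self.mu(h), self.logvar(h)

def sample_theta(net: InferenceNetwork, bow_counts: torch.Tensor) -> torch.Tensor:
    # Reparameterized draw of delta_d, then theta_d = softmax(delta_d).
    mu, logvar = net(bow_counts)
    delta = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
    return torch.softmax(delta, dim=-1)
```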
6 Empirical Study

We study two variants of the ETM, one where the word embeddings are pre-fitted and one where they are learned jointly with the rest of the parameters. The variant with pre-fitted embeddings is called the "labeled ETM." We use skip-gram embeddings (Mikolov et al., 2013b).

Algorithm settings. Given a corpus, each model comes with an approximate posterior inference problem. We use variational inference for all of the models and employ stochastic variational inference (SVI) (Hoffman et al., 2013) to speed up the optimization. The minibatch size is 1,000 documents. For LDA, we set the learning rate as suggested by Hoffman et al. (2013): the delay is 10 and the forgetting factor is 0.85.

Within SVI, LDA enjoys coordinate ascent variational updates, with 5 inner steps to optimize the local variables. For the other models, we use amortized inference over the local variables θd. We use 3-layer inference networks and we set the local learning rate to 0.002. We use ℓ2 regularization on the variational parameters (the weight decay parameter is 1.2 × 10−6).
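Read as configuration, these settings might look as follows. The use of Adam is an assumption of this sketch; the text reports only the learning rate and the weight decay.

```python
import torch

def make_variational_optimizer(params):
    # Local learning rate 0.002 and l2 regularization (weight decay) 1.2e-6,
    # as reported above; the choice of Adam is an assumption, not stated here.
    return torch.optim.Adam(params, lr=0.002, weight_decay=1.2e-6)
```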
Qualitative results. We first examine the embeddings. The ETM, NVDM, and ∆-NVDM all involve a word embedding. We illustrate them by fixing a set of terms and calculating the words that occur in the neighborhood around them. For comparison, we also illustrate word embeddings learned by the skip-gram model.

Table 1 illustrates the embeddings of the different models. All the methods provide interpretable embeddings—words with related meanings are close to each other. The ETM and the NVDM learn embeddings that are similar to those from the skip-gram. The embeddings of ∆-NVDM are different; the simplex constraint on the local variable changes the nature of the embeddings.

We next look at the learned topics. Table 2 displays the 7 most used topics for all methods, as given by the average of the topic proportions θd. LDA and the ETM both provide interpretable topics. Neither NVDM nor ∆-NVDM provide interpretable topics; their model parameters β are not interpretable as distributions over the vocabulary that mix to form documents.

Quantitative results. We next study the models quantitatively. We measure the quality of the topics and the predictive performance of the model. We found that among models with interpretable topics, the ETM provides the best predictions.

We measure topic quality by blending two metrics: topic coherence and topic diversity. Topic coherence is a quantitative measure of the interpretability of a topic (Mimno et al., 2011).
Table 2. Top five words of seven most used topics from different document models on 1.8M documents
of the New York Times corpus with vocabulary size 212,237 and K = 300 topics.
LDA
time year officials mr city percent state
day million public president building million republican
back money department bush street company party
good pay report white park year bill
long tax state clinton house billion mr
NVDM
scholars japan gansler spratt assn ridership pryce
gingrich tokyo wellstone tabitha assoc mtv mickens
funds pacific mccain mccorkle qtr straphangers mckechnie
institutions europe shalikashvili cheetos yr freierman mfume
endowment zealand coached vols nyse riders filkins
∆-NVDM
concerto servings nato innings treas patients democrats
solos tablespoons soviet scored yr doctors republicans
sonata tablespoon iraqi inning qtr medicare republican
melodies preheat gorbachev shutout outst dr senate
soloist minced arab scoreless telerate physicians dole
Labeled ETM
music republican yankees game wine court company
dance bush game points restaurant judge million
songs campaign baseball season food case stock
opera senator season team dishes justice shares
concert democrats mets play restaurants trial billion
ETM
game music united wine company yankees art
team mr israel food stock game museum
season dance government sauce million baseball show
coach opera israeli minutes companies mets work
play band mr restaurant billion season artist
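As a usage note, a ranking like the one in Table 2 can be computed from fitted parameters by averaging the topic proportions θd over documents and listing each top topic's most probable words. The function below is a hedged sketch; the array layouts and names are assumptions of the example, not the paper's code.

```python
import numpy as np

def most_used_topics(theta, beta, vocab, num_topics=7, num_words=5):
    """theta: D x K topic proportions; beta: K x V topic-word distributions;
    vocab: list of V word strings. Returns the top words of the most used topics."""
    usage = theta.mean(axis=0)                      # average proportion per topic
    top_topics = np.argsort(-usage)[:num_topics]    # most used topics
    return {
        int(k): [vocab[v] for v in np.argsort(-beta[k])[:num_words]]
        for k in top_topics
    }
```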
It is the average pointwise mutual information of two words drawn randomly from the same document (Lau et al., 2014),

TC = (1/K) Σ_{k=1}^{K} (1/45) Σ_{i=1}^{10} Σ_{j=i+1}^{10} f(w_i^(k), w_j^(k)),

where {w_1^(k), . . . , w_10^(k)} denotes the top-10 most likely words in topic k. Here, f(·, ·) is the normalized pointwise mutual information,

f(w_i, w_j) = log[ P(w_i, w_j) / (P(w_i) P(w_j)) ] / ( −log P(w_i, w_j) ).

The quantity P(w_i, w_j) is the probability of words w_i and w_j co-occurring in a document and P(w_i) is the marginal probability of word w_i. We approximate these probabilities with empirical counts.

The idea behind topic coherence is that a coherent topic will display words that tend to occur in the same documents. In other words, the most likely words in a coherent topic should have high mutual information. Document models with higher topic coherence are more interpretable topic models.

We combine coherence with a second metric, topic diversity. We define topic diversity to be the percentage of unique words in the top 25 words of all topics. Diversity close to 0 indicates redundant topics; diversity close to 1 indicates more varied topics. We define the overall metric for the quality of a model's topics as the product of its topic diversity and topic coherence.
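Both metrics translate directly into code. The sketch below computes the NPMI-based topic coherence and the topic diversity defined above, and multiplies them into topic quality; it assumes docs is a list of token lists and topics is a list of ranked word lists, and it adopts a simple convention for word pairs that never co-occur.

```python
import numpy as np
from itertools import combinations

def topic_coherence(topics, docs, top_n=10):
    """Average normalized PMI over the top-n word pairs of each topic."""
    D = len(docs)
    doc_sets = [set(d) for d in docs]

    def p(*words):
        # Empirical probability that all given words co-occur in a document.
        return sum(all(w in s for w in words) for s in doc_sets) / D

    scores = []
    for topic in topics:
        f_vals = []
        for wi, wj in combinations(topic[:top_n], 2):   # 45 pairs when top_n = 10
            pij, pi, pj = p(wi, wj), p(wi), p(wj)
            if pij == 0:
                f_vals.append(-1.0)                     # convention (assumed) for no co-occurrence
            else:
                f_vals.append(np.log(pij / (pi * pj)) / -np.log(pij))
        scores.append(np.mean(f_vals))
    return float(np.mean(scores))

def topic_diversity(topics, top_n=25):
    """Fraction of unique words among the top-25 words of all topics."""
    top = [w for topic in topics for w in topic[:top_n]]
    return len(set(top)) / len(top)

def topic_quality(topics, docs):
    return topic_coherence(topics, docs) * topic_diversity(topics)
```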
Figure 4. Performance on the 20NewsGroups and the New York Times datasets for different vocabulary sizes. On both plots, better models are on the top right corner. Overall, the ETM is a better topic model. (Each panel plots interpretability against predictive power; the compared models are LDA, NVDM, ∆-NVDM, the labeled ETM, and the ETM.)

(a) Topic quality as measured by normalized product of topic coherence and topic diversity (the higher the better) vs. predictive performance as measured by normalized log-likelihood on document completion (the higher the better) on the 20NewsGroup dataset. Panels correspond to vocabulary sizes V = 3102, 8496, 18625, 29461, and 52258.

(b) Topic quality as measured by normalized product of topic coherence and topic diversity (the higher the better) vs. predictive performance as measured by normalized log-likelihood on document completion (the higher the better) on the New York Times dataset.
A good topic model also provides a good distribution of language. To measure predictive quality, we calculate log likelihood on a document completion task (Rosen-Zvi et al., 2004; Wallach et al., 2009). We divide each test document into two sets of words. The first half is observed: it induces a distribution over topics which, in turn, induces a distribution over the next words in the document. We then evaluate the second half under this distribution. A good document model should provide higher log-likelihood on the second half. (For all methods, we approximate the likelihood by setting θd to the variational mean.)
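A sketch of this document-completion evaluation, assuming the caller supplies θd for each test document (approximated by its variational mean, as above) along with the topic-word distributions; the smoothing constant and per-word normalization are choices of the example.

```python
import numpy as np

def completion_log_likelihood(theta_first_half, beta, second_halves):
    """theta_first_half: D x K topic proportions inferred from the observed halves;
    beta: K x V topic-word distributions; second_halves: list of held-out token-id arrays."""
    log_lik, num_words = 0.0, 0
    word_dist = theta_first_half @ beta          # D x V distribution over next words
    for d, held_out in enumerate(second_halves):
        log_lik += np.sum(np.log(word_dist[d, held_out] + 1e-12))  # small smoothing (assumed)
        num_words += len(held_out)
    return log_lik / num_words                   # per-word held-out log-likelihood
```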
We study both corpora with different vocabularies. Figure 4 shows topic quality as a function of predictive power. (To ease visualization, we normalize both metrics by subtracting the mean and dividing by the standard deviation.) The best models are on the upper right corner.

LDA predicts worst in almost all settings. On 20NewsGroups, the NVDM's predictions are in general better than LDA but worse than for the other methods; on the New York Times, the NVDM gives the best predictions. However, topic quality for the NVDM is far below the other methods. (It does not provide "topics", so we assess the interpretability of its β matrix.) In prediction, both versions of the ETM are at least as good as the simplex-constrained ∆-NVDM.

These figures show that, of the interpretable models, the ETM provides the best predictive performance while keeping interpretable topics. It is robust to large vocabularies.

6.1 Stop words

We now study a version of the New York Times corpus that includes all stop words.
Figure 5. A topic containing stop words found by the ETM on The New York Times. The ETM is robust even in the presence of stop words. (The figure shows Topic 181, whose words include "can", "good", "passing", "better", "our", "us", "fine", "we", "going", "best", "lot", "always", "never", "just", "why", "right", "what", "how", "together", "way", and "very".)

Table 3. Topic quality on the New York Times data in the presence of stop words. Topic quality is the product of topic coherence and topic diversity (higher is better). The labeled ETM is robust to stop words; it achieves similar topic coherence to when there are no stop words.

              Coherence   Diversity   Quality
LDA           0.13        0.14        0.0173
∆-NVDM        0.17        0.11        0.0187
Labeled ETM   0.18        0.22        0.0405
We remove infrequent words to form a vocabulary of size 10,283. Our goal is to show that the labeled ETM provides interpretable topics even in the presence of stop words, another regime where topic models typically fail. In particular, given that stop words appear in many documents, traditional topic models learn topics that contain stop words, regardless of the actual semantics of the topic. This leads to poor topic interpretability.

We fit LDA, the ∆-NVDM, and the labeled ETM with K = 300 topics. (We do not report the NVDM because it does not provide interpretable topics.) Table 3 shows topic quality (the product of topic coherence and topic diversity). Overall, the labeled ETM gives the best performance in terms of topic quality.

While the ETM has a few "stop topics" that are specific to stop words (see, e.g., Figure 5), ∆-NVDM and LDA have stop words in almost every topic. (The topics are not displayed here for space constraints.) The reason is that stop words co-occur in the same documents as every other word; traditional topic models therefore have difficulty telling apart content words and stop words. The labeled ETM recognizes the location of stop words in the embedding space; it sets them off in their own topic.

7 Conclusion

We developed the ETM, a generative model of documents that marries LDA with word embeddings. The ETM assumes that topics and words live in the same embedding space, and that words are generated from a categorical distribution whose natural parameter is the inner product of the word embeddings and the embedding of the assigned topic.

The ETM learns interpretable word embeddings and topics, even in corpora with large vocabularies. We studied the performance of the ETM against several document models. The ETM learns both coherent patterns of language and an accurate distribution of words.

Acknowledgments

This work is funded by ONR N00014-17-1-2131, NIH 1U01MH115727-01, DARPA SD2 FA8750-18-C-0130, ONR N00014-15-1-2209, NSF CCF-1740833, the Alfred P. Sloan Foundation, 2Sigma, Amazon, and NVIDIA. FJRR is funded by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 706760. ABD is supported by a Google PhD Fellowship.

References

J. Aitchison and S. Shen. 1980. Logistic normal distributions: Some properties and uses. Biometrika, 67(2):261–272.
K. Batmanghelich, A. Saeedi, K. Narasimhan, and S. Gershman. 2016. Nonparametric spherical topic modeling with word embeddings. In Association for Computational Linguistics.
Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin. 2003. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137–1155.
Y. Bengio, H. Schwenk, J.-S. Senécal, F. Morin, and J.-L. Gauvain. 2006. Neural probabilistic language models. In Innovations in Machine Learning, pages 137–186. Springer.
D. M. Blei. 2012. Probabilistic topic models. Communications of the ACM, 55(4):77–84.
D. M. Blei, A. Kucukelbir, and J. D. McAuliffe. 2017. Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518):859–877.
D. M. Blei and J. D. Lafferty. 2007. A correlated topic model of Science. The Annals of Applied Statistics, 1(1):17–35.
D. M. Blei, A. Y. Ng, and M. I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022.
J. Boyd-Graber, Y. Hu, and D. Mimno. 2017. Applications of topic models. Foundations and Trends in Information Retrieval, 11(2–3):143–296.
S. Bunk and R. Krestel. 2018. WELDA: Enhancing topic models by incorporating local word context. In Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries, pages 293–302. ACM.
D. Card, C. Tan, and N. A. Smith. 2017. A neural framework for generalized topic models. arXiv preprint arXiv:1705.09296.
Y. Cong, B. Chen, H. Liu, and M. Zhou. 2017. Deep latent Dirichlet allocation with topic-layer-adaptive stochastic gradient Riemannian MCMC. In International Conference on Machine Learning.
R. Das, M. Zaheer, and C. Dyer. 2015. Gaussian LDA for topic models with word embeddings. In Association for Computational Linguistics and International Joint Conference on Natural Language Processing (Volume 1: Long Papers).
S. J. Gershman and N. D. Goodman. 2014. Amortized inference in probabilistic reasoning. In Annual Meeting of the Cognitive Science Society.
M. D. Hoffman, D. M. Blei, and F. Bach. 2010. Online learning for latent Dirichlet allocation. In Advances in Neural Information Processing Systems.
M. D. Hoffman, D. M. Blei, C. Wang, and J. Paisley. 2013. Stochastic variational inference. Journal of Machine Learning Research, 14:1303–1347.
M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. 1999. An introduction to variational methods for graphical models. Machine Learning, 37(2):183–233.
D. P. Kingma and J. L. Ba. 2015. Adam: A method for stochastic optimization. In International Conference on Learning Representations.
D. P. Kingma and M. Welling. 2014. Auto-encoding variational Bayes. In International Conference on Learning Representations.
J. H. Lau, D. Newman, and T. Baldwin. 2014. Machine reading tea leaves: Automatically evaluating topic coherence and topic model quality. In Conference of the European Chapter of the Association for Computational Linguistics.
Q. Le and T. Mikolov. 2014. Distributed representations of sentences and documents. In International Conference on Machine Learning.
O. Levy and Y. Goldberg. 2014. Neural word embedding as implicit matrix factorization. In Neural Information Processing Systems, pages 2177–2185.
Y. Li and Y. Tao. 2018. Word Embedding for Understanding Natural Language: A Survey. Springer International Publishing.
Y. Miao, L. Yu, and P. Blunsom. 2016. Neural variational inference for text processing. In International Conference on Machine Learning.
T. Mikolov, K. Chen, G. Corrado, and J. Dean. 2013a. Efficient estimation of word representations in vector space. ICLR Workshop Proceedings. arXiv:1301.3781.
T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. 2013b. Distributed representations of words and phrases and their compositionality. In Neural Information Processing Systems, pages 3111–3119.
D. Mimno, H. M. Wallach, E. Talley, M. Leenders, and A. McCallum. 2011. Optimizing semantic coherence in topic models. In Conference on Empirical Methods in Natural Language Processing.
A. Mnih and K. Kavukcuoglu. 2013. Learning word embeddings efficiently with noise-contrastive estimation. In Neural Information Processing Systems, pages 2265–2273.
C. E. Moody. 2016. Mixing Dirichlet topic models and word embeddings to make lda2vec. arXiv preprint arXiv:1605.02019.
D. Q. Nguyen, R. Billingsley, L. Du, and M. Johnson. 2015. Improving topic models with latent feature word representations. Transactions of the Association for Computational Linguistics, 3:299–313.
J. Pennington, R. Socher, and C. D. Manning. 2014. GloVe: Global vectors for word representation. In Conference on Empirical Methods in Natural Language Processing, volume 14, pages 1532–1543.
J. Petterson, W. Buntine, S. M. Narayanamurthy, T. S. Caetano, and A. J. Smola. 2010. Word features for latent Dirichlet allocation. In Advances in Neural Information Processing Systems, pages 1921–1929.
D. J. Rezende, S. Mohamed, and D. Wierstra. 2014. Stochastic backpropagation and approximate inference in deep generative models. arXiv preprint arXiv:1401.4082.
M. Rosen-Zvi, T. Griffiths, M. Steyvers, and P. Smyth. 2004. The author-topic model for authors and documents. In Uncertainty in Artificial Intelligence.
M. Rudolph, F. J. R. Ruiz, S. Mandt, and D. M. Blei. 2016. Exponential family embeddings. In Advances in Neural Information Processing Systems.
D. Rumelhart and A. Abrahamson. 1973. A model for analogical reasoning. Cognitive Psychology, 5(1):1–28.
B. Shi, W. Lam, S. Jameel, S. Schockaert, and K. P. Lai. 2017. Jointly learning word embeddings and latent topics. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 375–384. ACM.
A. Srivastava and C. Sutton. 2017. Autoencoding variational inference for topic models. arXiv preprint arXiv:1703.01488.
M. K. Titsias and M. Lázaro-Gredilla. 2014. Doubly stochastic variational Bayes for non-conjugate inference. In International Conference on Machine Learning.
H. M. Wallach, I. Murray, R. Salakhutdinov, and D. Mimno. 2009. Evaluation methods for topic models. In International Conference on Machine Learning.
P. Xie, D. Yang, and E. Xing. 2015. Incorporating word correlation knowledge into topic modeling. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 725–734.
H. Xu, W. Wang, W. Liu, and L. Carin. 2018. Distilled Wasserstein learning for word embedding and topic modeling. In Advances in Neural Information Processing Systems.
G. Xun, V. Gopalakrishnan, F. Ma, Y. Li, J. Gao, and A. Zhang. 2016. Topic discovery for short texts using word embeddings. In 2016 IEEE 16th International Conference on Data Mining (ICDM), pages 1299–1304. IEEE.
G. Xun, Y. Li, W. X. Zhao, J. Gao, and A. Zhang. 2017. A correlated topic model using word embeddings. In IJCAI, pages 4207–4213.
H. Zhang, B. Chen, D. Guo, and M. Zhou. 2018. WHAI: Weibull hybrid autoencoding inference for deep topic modeling. In International Conference on Learning Representations.
H. Zhao, L. Du, and W. Buntine. 2017a. A word embeddings informed focused topic model. In Asian Conference on Machine Learning, pages 423–438.
H. Zhao, L. Du, W. Buntine, and G. Liu. 2017b. MetaLDA: A topic model that efficiently incorporates meta information. In 2017 IEEE International Conference on Data Mining (ICDM), pages 635–644. IEEE.