Sievert, C., & Shirley, K. E. LDAvis. A Method For Visualizing and Interpreting Topics

LDAvis: A method for visualizing and interpreting topics
Carson Sievert Kenneth E. Shirley

Iowa State University AT&T Labs Research
3414 Snedecor Hall 33 Thomas Street, 26th Floor
Ames, IA 50014, USA New York, NY 10007, USA
[email protected] [email protected]
Abstract a few basic questions about a fitted topic model:

(1) What is the meaning of each topic?, (2) How
We present LDAvis, a web-based interac- prevalent is each topic?, and (3) How do the topics
tive visualization of topics estimated using relate to each other? Different visual components
Latent Dirichlet Allocation that is built us- answer each of these questions, some of which are
ing a combination of R and D3. Our visu- original, and some of which are borrowed from ex-
alization provides a global view of the top- isting tools.
ics (and how they differ from each other), Our visualization (illustrated in Figure 1) has
while at the same time allowing for a deep two basic pieces. First, the left panel of our visual-
inspection of the terms most highly asso- ization presents a global view of the topic model,
ciated with each individual topic. First, and answers questions 2 and 3. In this view, we
we propose a novel method for choosing plot the topics as circles in the two-dimensional
which terms to present to a user to aid in plane whose centers are determined by comput-
the task of topic interpretation, in which ing the distance between topics, and then by us-
we define the relevance of a term to a ing multidimensional scaling to project the inter-
topic. Second, we present results from a topic distances onto two dimensions, as is done
user study that suggest that ranking terms in (Chuang et al., 2012a). We encode each topic’s
purely by their probability under a topic is overall prevalence using the areas of the circles,
suboptimal for topic interpretation. Last, where we sort the topics in decreasing order of
we describe LDAvis, our visualization prevalence.
system that allows users to flexibly explore Second, the right panel of our visualization de-
topic-term relationships using relevance to picts a horizontal barchart whose bars represent
better understand a fitted LDA model. the individual terms that are the most useful for in-
terpreting the currently selected topic on the left,
1 Introduction
and allows users to answer question 1, “What is
Recently much attention has been paid to visual- the meaning of each topic?”. A pair of overlaid
izing the output of topic models fit using Latent bars represent both the corpus-wide frequency of
Dirichlet Allocation (LDA) (Gardner et al., 2010; a given term as well as the topic-specific frequency
Chaney and Blei, 2012; Chuang et al., 2012b; Gre- of the term, as in (Chuang et al., 2012b).
tarsson et al., 2011). Such visualizations are chal- The left and right panels of our visualization are
lenging to create because of the high dimensional- linked such that selecting a topic (on the left) re-
ity of the fitted model – LDA is typically applied veals the most useful terms (on the right) for inter-
to many thousands of documents, which are mod- preting the selected topic. In addition, selecting a
eled as mixtures of dozens (or hundreds) of top- term (on the right) reveals the conditional distribu-
ics, which themselves are modeled as distributions tion over topics (on the left) for the selected term.
over thousands of terms (Blei et al., 2003; Griffiths This kind of linked selection allows users to exam-
and Steyvers, 2004). The most promising basic ine a large number of topic-term relationships in a
technique for creating LDA visualizations that are compact manner.
both compact and thorough is interactivity. A key innovation of our system is how we deter-
We introduce an interactive visualization sys- mine the most useful terms for interpreting a given
tem that we call LDAvis that attempts to answer topic, and how we allow users to interactively ad-
63
Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces, pages 63–70,
c
Baltimore, Maryland, USA, June 27, 2014. 2014 Association for Computational Linguistics
Figure 1: The layout of LDAvis, with the global topic view on the left, and the term barcharts (with
Topic 34 selected) on the right. Linked selections allow users to reveal aspects of the topic-term relation-
ships compactly.
just this determination. A topic in LDA is a multi- tion in Section 3.2, and we describe how we incor-
nomial distribution over the (typically thousands porate relevance into our interactive visualization
of) terms in the vocabulary of the corpus. To inter- in Section 4.
pret a topic, one typically examines a ranked list of
the most probable terms in that topic, using any- 2 Related Work
where from three to thirty terms in the list. The
Much work has been done recently regarding the
problem with interpreting topics this way is that
interpretation of topics (i.e. measuring topic “co-
common terms in the corpus often appear near the
herence”) as well as visualization of topic models.
top of such lists for multiple topics, making it hard
to differentiate the meanings of these topics. 2.1 Topic Interpretation and Coherence
Bischof and Airoldi (2012) propose ranking It is well-known that the topics inferred by LDA
terms for a given topic in terms of both the fre- are not always easily interpretable by humans.
quency of the term under that topic as well as the Chang et al. (2009) established via a large
term’s exclusivity to the topic, which accounts for user study that standard quantitative measures of
the degree to which it appears in that particular fit, such as those summarized by Wallach et al.
topic to the exclusion of others. We propose a sim- (2009), do not necessarily agree with measures of
ilar measure that we call the relevance of a term topic interpretability by humans. Ramage et al.
to a topic that allows users to flexibly rank terms (2009) assert that “characterizing topics is hard”
in order of usefulness for interpreting topics. We and describe how using the top-k terms for a given
discuss our definition of relevance, and its graphi- topic might not always be best, but offer few con-
cal interpretation, in detail in Section 3.1. We also crete alternatives.
present the results of a user study conducted to de- AlSumait et al. (2009), Mimno et al. (2011),
termine the optimal tuning parameter in the defini- and Chuang et al. (2013b) develop quantitative
tion of relevance to aid the task of topic interpreta- methods for measuring the interpretability of top-
64
ics based on experiments with data sets that come respect to frequency and exclusivity, and they rec-
with some notion of topical ground truth, such as ommend it as a way to rank terms to aid topic in-
document metadata or expert-created topic labels. terpretation. We propose a similar method that is
These methods are useful for understanding, in a a weighted average of the logarithms of a term’s
global sense, which topics are interpretable (and probability and its lift, and we justify it with a user
why), but they don’t specifically attempt to aid the study and incorporate it into our interactive visu-
user in interpreting individual topics. alization.
Blei and Lafferty (2009) developed “Turbo Top-
ics”, a method of identifying n-grams within LDA- 2.2 Topic Model Visualization Systems
inferred topics that, when listed in decreasing or- A number of visualization systems for topic mod-
der of probability, provide users with extra in- els have been developed in recent years. Sev-
formation about the usage of terms within top- eral of them focus on allowing users to browse
ics. This two-stage process yields good results on documents, topics, and terms to learn about the
experimental data, although the resulting output relationships between these three canonical topic
is still simply a ranked list containing a mixture model units (Gardner et al., 2010; Chaney and
of terms and n-grams, and the usefulness of the Blei, 2012; Snyder et al., 2013). These browsers
method for topic interpretation was not tested in a typically use lists of the most probable terms
user study. within topics to summarize the topics, and the vi-
Newman et al. (2010) describe a method for sualization elements are limited to barcharts or
ranking terms within topics to aid interpretability word clouds of term probabilities for each topic,
called Pointwise Mutual Information (PMI) rank- pie charts of topic probabilities for each document,
ing. Under PMI ranking of terms, each of the ten and/or various barcharts or scatterplots related to
most probable terms within a topic are ranked in document metadata. Although these tools can be
decreasing order of approximately how often they useful for browsing a corpus, we seek a more com-
occur in close proximity to the nine other most pact visualization, with the more narrow focus of
probable terms from that topic in some large, ex- quickly and easily understanding the individual
ternal “reference” corpus, such as Wikipedia or topics themselves (without necessarily visualizing
Google n-grams. Although this method correlated documents).
highly with human judgments of term importance Chuang et al. (2012b) develop such a tool,
within topics, it does not easily generalize to topic called “Termite”, which visualizes the set of topic-
models fit to corpora that don’t have a readily term distributions estimated in LDA using a ma-
available external source of word co-occurrences. trix layout. The authors introduce two measures
In contrast, Taddy (2011) uses an intrinsic mea- of the usefulness of terms for understanding a
sure to rank terms within topics: a quantity called topic model: distinctiveness and saliency. These
lift, defined as the ratio of a term’s probability quantities measure how much information a term
within a topic to its marginal probability across conveys about topics by computing the Kullback-
the corpus. This generally decreases the rankings Liebler divergence between the distribution of top-
of globally frequent terms, which can be helpful. ics given the term and the marginal distribution
We find that it can be noisy, however, by giving of topics (distinctiveness), optionally weighted
high rankings to very rare terms that occur in only by the term’s overall frequency (saliency). The
a single topic, for instance. While such terms may authors recommend saliency as a thresholding
contain useful topical content, if they are very rare method for selecting which terms are included in
the topic may remain difficult to interpret. the visualization, and they further use a seriation
Finally, Bischof and Airoldi (2012) develop and method for ordering the most salient terms to high-
implement a new statistical topic model that infers light differences between topics.
both a term’s frequency as well as its exclusivity Termite is a compact, intuitive interactive visu-
– the degree to which its occurrences are limited alization of the topics in a topic model, but by only
to only a few topics. They introduce a univari- including terms that rank high in saliency or dis-
ate measure called a FREX score (“FRequency tinctiveness, which are global properties of terms,
and EXclusivity”) which is a weighted harmonic it is restricted to providing a global view of the
mean of a term’s rank within a given topic with model, rather than allowing a user to deeply in-
65
spect individual topics by visualizing a potentially Topic 29 of 50 (20 Newgroups data)
different set of terms for every single topic. In
fact, Chuang et al. (2013a) describe the use of a
70.4 ● ● ●
●
● ● ● ● ● ●●●●●
● ● ● ●● ● ● ● ● ● ●
● ● ●● ● ●
● ● ●● ●● ●
● ● ● ● ● ●● ● ●●
● ● ● ●● ● ●●● ●●● ● ● exhaust
● ● ● ● ●●
● ● ● ● ●● ●● ● ●● ● ● ●●● ● ● plastic● oil
● ● ● ● ● ● lights
“topic-specific word ordering” as potentially use-

● ● ● ● ● ●
● ● ● ●
● ● ● ●●
● ● ● ●● ●●● ● ●●●
● ● ● ● ● ● ●● ● ●
● ● ● ●● ●
● ●
● ●
● ● ●● ● ●
●● ●●
● ● ● ● ●● remove
● ● ●
● ● ●●
● ● ●●●● ●●●●●●
● ● ● ● eye
25.3 ●
● ●
● ●
●
● ●
● ●● ●●
●● ●
●●
● ● ● ● ●●● ●● ●● ● ●
ful future work.

● ● ●
● ● ● ●
● ●
● ●
● ●
● ● ● ● ● ● ● ●
●
● ●
● ●
● ● ●●
●
●● ● ●●● ●● ●●●●
●● ● ●
●
●
●
● ●
● ● ●●● ● ●●● ● ●● ● ● ● ● water ● light
●
●
●
●
● ● ● ● ●● ● ● ● ● ● ●●
●
● ● ●
● ●
● ● ● ● ●
● ● ●●●● ●
● ● ● ● ●●
● ●● ●
● ● ● ● ●
● ●●● ●●●●● ●
●
●
●
●
●
● ●
● ● ●
● ●
● ●● ●●● ●● ●
● ● ● ● ●● ●
Lift (log scale)

●
● ● ● ● ●● ●● ●● ●
9.1 ● ● ● ● ● ● ●
● ● ●●● ● ● ●
● ●
● ● ●●
●
● ● ● ●●● ●●●
●● ● ●●
3 Relevance of terms to topics

● ● ●
● ●
●
●
● ● ● ● ● ● ● ●● ● ●●●● ● ● ●
● ● ● ●
● ●● ● ● ● ● ● ● ●
● ● ● ●
●
● ● ● ●
● ● ● ● ●● ● ● ●●● ● ●
●
●
●
● ● ● ● ● ●●●●●● ● ● ●
● ●
● ● ●●
● ● ●● ● ● ●● ● ● ●●● ●● ●●
●●●●
●●
●
● ●
●
●
● ● ● ● ●●
● ●
●● ●● ● ● ●●● ● ● ● ●
●
●
●
● ● ● ● ●● ●●
● ●● ●● ● ● ●
● ● ● ● ●● ● ●● ●
● ●● ● ● ● ● ● out
3.3 ● ● ● ● ● ●● ● ● ● up
Here we define relevance, our method for ranking

● ● ●
● ● ● ● ● ● ● ●
● ●
● ● ● ● ●● ● ● ● ●●
●● ●● ● ●
● ● ● ●●
● ● ●● ● ● ● ●
● ● ●
● ● ●● ● ●●
●●●
●
●
● ●
● ● ● ●● ● ●●●
●● ● ● ● ●
● ●
● ● ● ● ● ● ● ●
●●
terms within topics, and we describe the results of
●
● ● ●● ● ●
●
● ●● ● ●●
Top
● ● ● ● ●●
10 Most Relevant Boundary
● ●●
●●
●
●
● ●
● ●
●
● ● ●● ● ● ●
●
● ●● ● ● ●
● ● ●●
1.2 ● ● ●
● lambda = 0 ● ●●● ●
●
●
●
a user study to learn an optimal tuning parameter

● ● ● ●● ●
● ● ● lambda = 1/3
● ● ●
● ●● ● ●
● ● ● ● ●
●
● lambda = 2/3
● ● ●
in the computation of relevance.

● ●
● lambda = 1
0.4 ●
0 0.001 0.002 0.004 0.011 0.03

3.1 Definition of Relevance P(Token | Topic) (log scale)
Let φkw denote the probability of term w ∈

{1, ..., V } for topic k ∈ {1, ..., K}, where V de- Figure 2: Dotted lines separating the top-10 most
notes the number of terms in the vocabulary, and relevant terms for different values of λ, with the
let pw denote the marginal probability of term w in most relevant terms for λ = 2/3 displayed and
the corpus. One typically estimates φ in LDA us- highlighted in blue.
ing Variational Bayes methods or Collapsed Gibbs
Sampling, and pw from the empirical distribution
of the corpus (optionally smoothed by including are {out, #emailaddress, #twodigitnumer, up,
prior weights as pseudo-counts). #onedigitnumber}, where a “#” symbol denotes
We define the relevance of term w to topic k a term that is an entity representing a class of
given a weight parameter λ (where 0 ≤ λ ≤ 1) as: things. In contrast to this list, which contains glob-
ally common terms and which provides very lit-
φ
r(w, k | λ) = λ log(φkw ) + (1 − λ) log
kw
, tle meaning regarding motorcycles, automobiles,
pw or electronics, the top-5 most relevant terms given
λ = 1/3 are {oil, plastic, pipes, fluid, and lights}.
where λ determines the weight given to the prob-
The second set of terms is much more descriptive
ability of term w under topic k relative to its lift
of the topic being discussed than the first.
(measuring both on the log scale). Setting λ = 1
results in the familiar ranking of terms in decreas-
3.2 User Study
ing order of their topic-specific probability, and
setting λ = 0 ranks terms solely by their lift. We We conducted a user study to determine whether
wish to learn an “optimal” value of λ for topic in- there was an optimal value of λ in the definition of
terpretation from our user study. relevance to aid topic interpretation. First, we fit
First, though, to see how different values of λ a 50-topic model to the D = 13, 695 documents
result in different ranked term lists, consider the in the 20 Newsgroups data which were posted to a
plot in Figure 2. We fit a 50-topic model to the single Newsgroup (rather than two or more News-
20 Newsgroups data (details are described in Sec- groups). We used the Collapsed Gibbs Sampler
tion 3.2) and plotted log(lift) on the y-axis vs. algorithm (Griffiths and Steyvers, 2004) to sample
log(φkw ) on the x-axis for each term in the vo- the latent topics for each of the N = 1, 590, 376
cabulary (which has size V = 22, 524) for a given tokens in the data, and we saved their topic assign-
topic. Figure 2 shows this plot for Topic 29, which ments from the last iteration (after convergence).
occurred mostly in documents posted to the “Mo- We then computed the 20 by 50 table, T , which
torcycles” Newsgroup, but also from documents contains, in cell Tgk , the count of the number of
posted to the “Automobiles” Newsgroup and the times a token from topic k ∈ {1, ..., 50} was as-
“Electronics” Newsgroup. Graphically, the line signed to Newsgroup g ∈ {1, ..., 20}, where we
separating the most relevant terms for this topic, defined the Newsgroup of a token to be the News-
given λ, has slope −λ/(1 − λ) (see Figure 2). group to which the document containing that to-
For this topic, the top-5 most relevant terms ken was posted. Some of the LDA-inferred top-
given λ = 1 (ranking solely by probability) ics occurred almost exclusively (> 90% of occur-
66
rences) in documents from a single Newsgroup, Trial data for middle tercile of topics
such as Topic 38, which was the estimated topic
for 15,705 tokens in the corpus, 14,233 of which 0.9
came from documents posted to the “Medicine”

(or “sci.med”) Newsgroup. Other topics occurred 0.8
● ●
Proportion of Correct Responses

in a wide variety of Newsgroups. One would ex- ●
pect these “spread-out” topics to be harder to in- 0.7

●
●
terpret than the “pure” topics like Topic 38. ● ●
In the study we recruited 29 subjects among our 0.6

●
colleagues (research scientists at AT&T Labs with

moderate familiarity with text mining techniques 0.5
●
● Binned responses (bin size = 40)
and topic models), and each subject completed an 50% Intervals
95% Intervals
online experiment consisting of 50 tasks, one for 0.4
Quadratic Fit
each topic in the fitted LDA model. Task k (for

k ∈ {1, ..., 50}) was to read a list of five terms,
0.0 0.2 0.4 0.6 0.8 1.0
Lambda (optimal value is about 0.6)

ranked from 1-5 in order of relevance to topic k,
where λ ∈ (0, 1) was randomly sampled to com-
pute relevance. The user was instructed to identify Figure 3: A plot of the proportion of correct re-
which “topic” the list of terms discussed from a sponses in a user study vs. the value of λ used to
list of three possible “topics”, where their choices compute the most relevant terms for each topic.
were names of the Newsgroups. The correct an-
swer for task k (i.e. our “ground truth”) was de-
topic was incorrectly identified by all 29 users.2
fined as the Newsgroup that contributed the most
For the remaining 42 topics we estimated a topic-
tokens to topic k (i.e. the Newsgroup with the
specific intercept term to control for the inher-
largest count in the kth column of the table T ), and
ent difficulty of identifying the topic (not just due
the two alternative choices were the Newsgroups
to its tokens being spread among multiple News-
that contributed the second and third-most tokens
groups, but also to account for the inherent famil-
to topic k.
iarity of each topic to our subject pool – subjects,
We anticipated that the effect of λ on the proba- on average, were more familiar with “Cars” than
bility of a user making the correct choice could be “The X Window System”, for example).
different across topics. In particular, for “spread-
The estimated effects of λ and λ2 were 2.74 and
out” topics that were inherently difficult to inter-
-2.34, with standard errors 1.03 and 1.00. Taken
pret, because their tokens were drawn from a wide
together, their joint effect was statistically signif-
variety of Newsgroups (similar to a “fused” topic
icant (χ2 p-value = 0.018). To see the estimated
in Chuang et al. (2013b)), we expected the propor-
effect of λ on the probability of correctly identi-
tion of correct responses to be roughly 1/3 no mat-
fying a topic, consider Figure 3. We plot binned
ter the value of λ used to compute relevance. Sim-
proportions of correct responses (on the y-axis)
ilarly, for very “pure” topics, whose tokens were
vs. λ (on the x-axis) for the 14 topics whose esti-
drawn almost exclusively from one Newsgroup,
mated topic-specific intercepts fell into the middle
we expected the task to be easy for any value of λ.
tercile among the 42 topics that weren’t trivial or
To account for this, we analyzed the experimental
impossible to identify. Among these topics there
data by fitting a varying-intercepts logistic regres-
was roughly a 67% baseline probability of correct
sion model to allow each of the fifty topics to have
identification. As Figure 3 shows, for these topics,
its own baseline difficulty level, where the effect
the “optimal” value of λ was about 0.6, and it re-
of λ is shared across topics. We used a quadratic
sulted in an estimated 70% probability of correct
function of λ in the model (linear, cubic and quar-
identification, whereas for values of λ near 0 and
tic functions were explored and rejected).
As expected, the baseline difficulty of each cellaneous Politics, Christianity, Gun Politics, Space (Astron-
omy), and Middle East Politics.
topic varied widely. In fact, seven of the topics 2
The ground truth label for this topic was “Christianity”,
were correctly identified by all 29 users,1 and one but the presence of the term “islam” or “quran” among the
top-5 for every value of λ led each subject to choose “Mis-
1
Whose ground truth labels were Medicine (twice), Mis- cellaneous Religion”.
67
1, the estimated proportions of correct responses value of λ, which can alter the rankings of terms
were closer to 53% and 63%, respectively. We to aid topic interpretation. By default, λ is set to
view this as evidence that ranking terms according 0.6, as suggested by our user study in Section 3.2.
to relevance, where λ < 1 (i.e. not strictly in de- If λ = 1, terms are ranked solely by φkw , which
creasing order of probability), can improve topic implies the red bars would be sorted from widest
interpretability. (at the top) to narrowest (at the bottom). By com-
Note that in our experiment, we used the collec- paring the widths of the red and gray bars for a
tion of single-posted 20 Newsgroups documents given term, users can quickly understand whether
to define our “ground truth” data. An alternative a term is highly relevant to the selected topic be-
method for collecting “ground truth” data would cause of its lift (a high ratio of red to gray), or
have been to recruit experts to label topics from its probability (absolute width of red). The top 3
an LDA model. We chose against this option be- most relevant terms in Figure 1 are “law”, “court”,
cause doing so would present a classic “chicken- and “cruel”. Note that “law” is a common term
or-egg” problem: If we use expert-labeled topics which is generated by Topic 34 in about 40% of
in an experiment to learn how to summarize top- its corpus-wide occurrences, whereas “cruel” is a
ics so that they can be interpreted (i.e. “labeled”), relatively rare term with very high lift in Topic 34
we would only re-learn the way that our experts – it occurs almost exclusively in this topic. Such
were instructed, or allowed, to label the topics in properties of the topic-term relationships are read-
the first place! If, for instance, the experts were ily visible in LDAvis for every topic.
presented with a ranked list of the most probable On the left panel, two visual features provide
terms for each topic, this would influence the in- a global perspective of the topics. First, the ar-
terpretations and labels they give to the topics, and eas of the circles are proportional to the relative
the experimental result would be the circular con- prevalences of the topics in the corpus. In the
clusion that ranking terms by probability allows 50-topic model fit to the 20 Newsgroups data,
users to recover the “expert” labels most easily. the first three topics comprise 12%, 9%, and
To avoid this, we felt strongly that we should use 6% of the corpus, and all contain common, non-
data in which documents have metadata associated specific terms (although there are interesting dif-
with them. The 20 Newsgroups data provides an ferences: Topic 2 contains formal debate-related
externally validated source of topic labels, in the language such as “conclusion”, “evidence”, and
sense that the labels were presented to users (in “argument”, whereas Topic 3 contains slang con-
the form of Newsgroup names), and users sub- versational language such as “kinda”, “like”, and
sequently filled in the content. It represents, es- “yeah”). In addition to visualizing topic preva-
sentially, a crowd-sourced collection of tokens, or lence, the left pane shows inter-topic differences.
content, for a certain set of topic labels. The default for computing inter-topic distances is
Jensen-Shannon divergence, although other met-
4 The LDAvis System rics are enabled. The default for scaling the set of
Our interactive, web-based visualization system, inter-topic distances defaults to Principal Compo-
LDAvis, has two core functionalities that enable nents, but other algorithms are also enabled.
users to understand the topic-term relationships in The second core feature of LDAvis is the abil-
a fitted LDA model, and a number of extra features ity to select a term (by hovering over it) to reveal
that provide additional perspectives on the model. its conditional distribution over topics. This dis-
First and foremost, LDAvis allows one to se- tribution is visualized by altering the areas of the
lect a topic to reveal the most relevant terms for topic circles such that they are proportional to the
that topic. In Figure 1, Topic 34 is selected, and term-specific frequencies across the corpus. This
its 30 most relevant terms (given λ = 0.34, in this allows the user to verify, as discussed in Chuang et
case) populate the barchart to the right (ranked al. (2012a), whether the multidimensional scaling
in order of relevance from top to bottom). The of topics has faithfully clustered similar topics in
widths of the gray bars represent the corpus-wide two-dimensional space. For example, in Figure 4,
frequencies of each term, and the widths of the the term “file” is selected. In the majority of this
red bars represent the topic-specific frequencies of term’s occurrences, it is drawn from one of several
each term. A slider allows users to change the topics located in the upper left-hand region of the
68
Figure 4: The user has chosen to segment the fifty topics into four clusters, and has selected the green
cluster to populate the barchart with the most relevant terms for that cluster. Then, the user hovered over
the ninth bar from the top, “file”, to display the conditional distribution over topics for this term.
global topic view. Upon inspection, this group of of their two-dimensional locations in the global
topics can be interpreted broadly as a discussion topic view). This is merely an effort to facilitate
of computer hardware and software. This verifies, semantic zooming in an LDA model with many
to some extent, their placement, via multidimen- topics where ‘after-the-fact’ clustering may be an
sional scaling, into the same two-dimensional re- easier way to estimate clusters of topics, rather
gion. It also suggests that the term “file” used in than fitting a hierarchical topic model (Blei et al.,
this context refers to a computer file. However, 2003), for example. Selecting a cluster of topics
there is also conditional probability mass for the (by clicking the Voronoi region corresponding to
term “file” on Topic 34. As shown in Figure 1, the cluster) reveals the most relevant terms for that
Topic 34 can be interpreted as discussing the crim- cluster of topics, where the term distribution of a
inal punishment system where “file” refers to court cluster of topics is defined as the weighted average
filings. Similar discoveries can be made for any of the term distributions of the individual topics in
term that exhibits polysemy (such as “drive” ap- the cluster. In Figure 4, the green cluster of topics
pearing in computer- and automobile-related top- is selected, and the most relevant terms, displayed
ics, for example). in the barchart on the right, are predominantly re-
lated to computer hardware and software.
Beyond its within-browser interaction capabil-
ity using D3 (Bostock et al., 2011), LDAvis 5 Discussion
leverages the R language (R Core Team, 2014)
and specifically, the shiny package (Rstudio, We have described a web-based, interactive visu-
2014), to allow users to easily alter the topical alization system, LDAvis, that enables deep in-
distance measurement as well as the multidimen- spection of topic-term relationships in an LDA
sional scaling algorithm to produce the global model, while simultaneously providing a global
topic view. In addition, there is an option to ap- view of the topics, via their prevalences and sim-
ply k-means clustering to the topics (as a function ilarities to each other, in a compact space. We
69
also propose a novel measure, relevance, by which Jason Chuang, Christopher D. Manning and Jeffrey
to rank terms within topics to aid in the task Heer. 2012b. Termite: Visualization Techniques for
Assessing Textual Topic Models. AVI.
of topic interpretation, and we present results
from a user study that show that ranking terms Jason Chuang, Yuening Hu, Ashley Jin, John D. Wilk-
in decreasing order of probability is suboptimal erson, Daniel A. McFarland, Christopher D. Man-
ning and Jeffrey Heer. 2013a. Document Explo-
for topic interpretation. The LDAvis visual- ration with Topic Modeling: Designing Interactive
ization system (including the user study data) is Visualizations to Support Effective Analysis Work-
currently available as an R package on GitHub: flows. NIPS Workshop on Topic Models: Computa-
https://github.com/cpsievert/LDAvis. tion, Application, and Evaluation.
For future work, we anticipate performing a Jason Chuang, Sonal Gupta, Christopher D. Manning
larger user study to further understand how to fa- and Jeffrey Heer. 2013b. Topic Model Diagnostics:
cilitate topic interpretation in fitted LDA mod- Assessing Domain Relevance via Topical Alignment.
ICML.
els, including a comparison of multiple methods,
such as ranking by Turbo Topics (Blei and Laf- Matthew J. Gardner, Joshua Lutes, Jeff Lund, Josh
ferty, 2009) or FREX scores (Bischof and Airoldi, Hansen, Dan Walker, Eric Ringger, and Kevin Seppi.
2010. The topic browser: An interactive tool for
2012), in addition to relevance. We also note the browsing topic models. NIPS Workshop on Chal-
need to visualize correlations between topics, as lenges of Data Visualization.
this can provide insight into what is happening on
Brynjar Gretarsson, John O’Donovan, Svetlin Bostand-
the document level without actually displaying en- jieb, Tobias Hollerer, Arthur Asuncion, David New-
tire documents. Last, we seek a solution to the man, and Padhraic Smyth. 2011. TopicNets: Visual
problem of visualizing a large number of topics Analysis of Large Text Corpora with Topic Model-
(say, from 100 - 500 topics) in a compact way. ing. ACM Transactions on Intelligent Systems and
Technology, pp 1-26.
Thomas L. Griffiths and Mark Steyvers. 2004. Finding
References scientific topics. PNAS.
Loulwah AlSumait, Daniel Barbara, James Gentle, and David Mimno, Hanna M. Wallach, Edmund Talley,
Carlotta Domeniconi. 2009. Topic Significance Miriam Leenders, and Andrew McCallum. 2011.
Ranking of LDA Generative Models. ECML. Optimizing Semantic Coherence in Topic Models.
EMNLP.
Jonathan M. Bischof and Edoardo M. Airoldi. 2012.
Summarizing topical content with word frequency David Newman, Youn Noh, Edmund Talley, Sarvnaz
and exclusivity. ICML. Karimi, and Timothy Baldwin 2010. Evaluating
Topic Models for Digital Libraries. JCDL.
David M. Blei, Andrew Y. Ng, and Michael I. Jordan.
2012. Latent Dirichlet Allocation. JMLR. R Core Team 2014. R: A Language and Envi-
ronment for Statistical Computing. http://www.R-
David M. Blei and John Lafferty. 2009. Vi- project.org.
sualizing Topics with Multi-Word Expressions.
arXiv:0907.1013v1 [stat.ML], 2009 R Studio, Inc. 2014. shiny: Web Application Frame-
work for R; package version 0.9.1. http://CRAN.R-
David M. Blei, Thomas L. Griffiths, Michael I. Jor- project.org/package=shiny.
dan, and Joshua B. Tenenbaum. 2003. Hierarchi-
Daniel Ramage, Evan Rosen and Jason Chuang and
cal Topic Models and the Nested Chinese Restaurant
Christopher D. Manning, and Daniel A. McFarland.
Process. NIPS.
2009. Topic Modeling for the Social Sciences. NIPS
Michael Bostock, Vadim Ogievetsky, Jeffrey Heer Workshop on Applications for Topic Models: Text
2011. D3: Data-Driven Documents. InfoVis. and Beyond.
Allison J.B. Chaney and David M. Blei. 2012. Visual- Justin Snyder, Rebecca Knowles, Mark Dredze,
izing topic models. ICWSM. Matthew Gormley, and Travis Wolfe. 2013. Topic
Models and Metadata for Visualizing Text Corpora.
Jonathan Chang, Jordan Boyd-Graber, Sean Gerrish, Proceedings of the 2013 NAACL HLT Demonstra-
Chong Wang, and David M. Blei. 2009. Reading tion Session.
Tea Leaves: How Humans Interpret Topic Models.
Matthew A. Taddy 2011. On Estimation and Selection
NIPS.
for Topic Models. AISTATS.
Jason Chuang, Daniel Ramage, Christopher D. Man- Hanna M. Wallach, Iain Murray, Ruslan Salakhutdinov,
ning and Jeffrey Heer. 2012a. Interpretation and and David Mimno. 2009. Evaluation Methods for
Trust: Designing Model-Driven Visualizations for Topic Models. ICML.
Text Analysis. CHI.
70

Sievert, C., & Shirley, K. E. LDAvis. A Method For Visualizing and Interpreting Topics

Uploaded by

Copyright:

Available Formats

Sievert, C., & Shirley, K. E. LDAvis. A Method For Visualizing and Interpreting Topics

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Sievert, C., & Shirley, K. E. LDAvis. A Method For Visualizing and Interpreting Topics

Uploaded by

Copyright:

Available Formats

LDAvis: A method for visualizing and interpreting topics

Carson Sievert Kenneth E. Shirley

Abstract a few basic questions about a fitted topic model:

“topic-specific word ordering” as potentially use-

ful future work.

Lift (log scale)

3 Relevance of terms to topics

Here we define relevance, our method for ranking

a user study to learn an optimal tuning parameter

in the computation of relevance.

0 0.001 0.002 0.004 0.011 0.03

Let φkw denote the probability of term w ∈

came from documents posted to the “Medicine”

Proportion of Correct Responses

pect these “spread-out” topics to be harder to in- 0.7

terpret than the “pure” topics like Topic 38. ● ●

In the study we recruited 29 subjects among our 0.6

colleagues (research scientists at AT&T Labs with

each topic in the fitted LDA model. Task k (for

Lambda (optimal value is about 0.6)

You might also like