EVE: Explainable Vector Based Embedding Technique Using Wikipedia
1 Introduction
Recently the European Union has approved a regulation which requires that citi-
zens have a “right to explanation” in relation to any algorithmic decision-making
(Goodman and Flaxman 2016). According to this regulation, due to come into
force in 2018, an algorithm that makes an automatic decision regarding a user
entitles that user to a clear explanation as to how the decision was made. With
this in mind, we present an explainable decision-making approach to generating
word embeddings, called the EVE model. Word embeddings refer to a family
of techniques that describe a concept (i.e. a word or phrase) as a vector of
real numbers (Pennington et al 2014). These vectors have been shown to be useful in
a variety of applications, such as topic modelling (Liu et al 2015), information
retrieval (Diaz et al 2016), and document classification (Kusner et al 2015).
Generally, word embedding vectors are defined by the context in which those
words appear (Baroni et al 2014). Put simply, “a word is characterized by the
company it keeps” (Firth 1957). To generate these vectors, a number of unsupervised
techniques have been proposed, which include applying neural networks
(Mikolov et al 2013a,b; Bojanowski et al 2016), constructing a co-occurrence ma-
trix followed by dimensionality reduction (Levy and Goldberg 2014; Pennington
et al 2014), probabilistic models (Globerson et al 2007; Arora et al 2016), and
explicit representation of words appearing in a context (Levy et al 2014, 2015).
Existing word embedding techniques do not benefit from the rich semantic in-
formation present in structured or semi-structured text. Instead they are trained
over a large corpus, such as a Wikipedia dump or collection of news articles, where
any structure is ignored. However, in this contribution we propose a model that
uses the semantic benefits of structured text for defining embeddings. Moreover,
to the best of our knowledge, previous word embedding techniques do not provide
human-readable vector dimensions, and thus are not readily open to human
interpretation. In contrast, EVE associates human-readable semantic labels with each
dimension of a vector, thus making it an explainable word embedding technique.
To evaluate EVE, we consider its usefulness in the context of three fundamental
tasks that form the basis for many data mining activities – discrimination, cluster-
ing, and ranking. We argue for the need for objective evaluation strategies, so as
to discourage the kind of subjective judgement found in tasks such
as finding word analogies (Mikolov et al 2013a). These tasks are applied to seven
annotated datasets which differ in terms of topical content and complexity, where
we demonstrate not only the ability of EVE to successfully perform these tasks,
but also its ability to generate meaningful explanations to support its outputs.
The remainder of the paper is organized as follows. In Section 2, we provide an
overview of research relevant to this work. In Section 3, we provide background
material covering the structure of Wikipedia, and then describe the methodology of
the EVE model in detail. In Section 4, we provide detailed experimental evaluation
on the three tasks mentioned above, and also demonstrate the novelty of the EVE
model in generating explanations. Finally, in Section 5, we conclude the paper with
further discussion and future directions. The relevant dataset and source code for
this work can be publicly accessed at http://mlg.ucd.ie/eve.
2 Related Work
In this section, we first review approaches to learning word embeddings, and then
discuss work which combines these techniques with external knowledge to produce
knowledge-powered word embeddings. Finally, we conclude the section with an
explanation of the novelty of EVE.
Another category of work which measures semantic similarity and relatedness be-
tween textual units relies on pre-existing knowledge resources (e.g. thesauri, tax-
onomies or encyclopedias). Among the works proposed in the literature, the key
differences lie in the knowledge base employed, the technique used to measure
semantic distance, and the application domain (Hoffart et al 2012). Both
Budanitsky and Hirst (2006) and Jarmasz (2012) exploited generalization (‘is a’)
relations between words through WordNet-based techniques; Metzler et al (2007) used
web search logs for measuring similarity between short texts, and both Strube and
Ponzetto (2006) and Gabrilovich and Markovitch (2007) used rich encyclopedic
knowledge derived from Wikipedia. Witten and Milne (2008) made use of tf.idf-
like measures on Wikipedia links, and Yeh et al (2009) made use of a random walk
algorithm over a graph derived from Wikipedia’s hyperlink structure, infoboxes,
and categories. More recently, Jiang et al (2015) utilized various aspects of page
organization within a Wikipedia article to extract Wikipedia-based feature sets for
calculating semantic similarity between concepts. Qureshi (2015) also presented a
Wikipedia-based semantic relatedness framework which uses Wikipedia categories
and their sub-categories to a certain depth count to define the relatedness between
two Wikipedia articles whose categories overlap with the generated hierarchies.
3 The EVE Model

Before we present the methodology of the proposed EVE model, we first provide
background information on Wikipedia, whose underlying graph structure forms the
basic building blocks of the model.
3.1 The structure of Wikipedia

Wikipedia is a multilingual collaboratively-constructed encyclopedia which is
actively updated by a large community of volunteer editors. Figure 1 shows the
typical Wikipedia graph structure for a set of articles and associated categories.
Each article can receive inlinks from other Wikipedia articles, while it can also
outlink to other Wikipedia articles. In our example, article A1 receives an inlink
from A4 and outlinks to A2. In addition, each article can belong to a number of
categories, which are used to group together articles on a similar subject. In Fig.
1, A1 belongs to categories C1 and C9. Furthermore, the Wikipedia categories are
arranged in a category taxonomy, i.e. each category can have an arbitrary number
of super-categories and sub-categories. In our example, C5, C6, and C7 are
sub-categories of C4, whereas C2 and C3 are super-categories of C4.
Fig. 1: An example Wikipedia graph structure for a set of four articles and ten
associated categories.
As a simple real-world example, the Wikipedia article “Espresso” receives an
inlink from the article “Drink” and outlinks to the article “Espresso
machine”. The article “Espresso” belongs to several categories, including “Coffee
drinks” and “Italian cuisine”. The category “Italian cuisine” itself has a number
of super-categories (e.g. “Italian culture”, “Cuisine by nationality”) and sub-
categories (e.g. “Italian desserts”, “Pizza”). These Wikipedia categories serve as
semantic tags for the articles to which they link (Zesch and Gurevych 2007).
3.2 Methodology
We now present the methodology for generating word embedding vectors with the
EVE model. Firstly, a target word or concept is mapped to a single Wikipedia
concept article¹. The vector for this concept is then composed of two distinct types
of dimensions. The first type quantifies the association of the concept with other
Wikipedia articles, while the second type quantifies the association of the concept
with Wikipedia categories. The intuition here is that related words or concepts
will share both similar article link associations and similar category associations
within the Wikipedia graph, while unrelated concepts will differ with respect to
both criteria. The methods used to define these associations are explained next.
Fig. 2: An example of the assignment of the normalized article_score for the concept
article A_concept, based on inlink and outlink structure.
In addition to the dimensions for the linking articles, we also add a self-link
dimension², where the association of A_concept with itself is defined to be twice
the maximum count received from the linking articles.
Fig. 2 shows an example of the strategy. In the first step, all inlinks and outlinks
are counted for the other, non-concept articles (e.g. A_concept has 3 inlinks and 1
outlink with respect to A3). In the next step, the self-link score is computed as
twice the maximum, over all other articles, of the sum of inlinks and outlinks
(which is 8 in this case). In the final step, normalization³ of the scores takes
place, dividing by the maximum score (which is 8 in this case). Articles having no
links to or from A_concept receive a score of 0. Given the sparsity of the
Wikipedia link graph, the article-based dimensions are also naturally sparse.
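To make the scoring concrete, the following Python snippet gives a minimal sketch of the article-dimension assignment of Fig. 2, assuming one list entry per link; the article names are illustrative.

```python
from collections import Counter

def article_score_dims(inlinks, outlinks, concept="A_concept"):
    """Minimal sketch of the article-based dimensions described above.

    inlinks/outlinks list the articles that link to / are linked from the
    concept article, one entry per link (names are illustrative).
    """
    counts = Counter(inlinks) + Counter(outlinks)   # links exchanged with each article
    if not counts:
        return {}
    counts[concept] = 2 * max(counts.values())      # self-link dimension
    max_score = max(counts.values())                # normalize by the maximum score
    return {article: c / max_score for article, c in counts.items()}

# Fig. 2 example: A3 exchanges 3 inlinks and 1 outlink with A_concept,
# so the self-link score is 2 * 4 = 8 and every score is divided by 8.
print(article_score_dims(["A3", "A3", "A3", "A4"], ["A3", "A2"]))
# {'A3': 0.5, 'A4': 0.125, 'A2': 0.125, 'A_concept': 1.0}
```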
Next, we define the method for generating vector dimensions corresponding to all
Wikipedia categories which are related to the concept article. The strategy for
assigning a score to the related Wikipedia categories proceeds as follows:
1. Start by propagating the score uniformly to the categories to which the concept
article belongs (see Fig. 1).
2. A portion of the score is further propagated according to the probability of
jumping from a category to the categories in its neighborhood.
3. Score propagation continues until a certain hop count is reached (i.e. a thresh-
old value category_depth), or there are no further categories in the neighborhood.
² This dimension is the most relevant dimension for defining the concept, since it
corresponds to the article itself.
³ In the case of the best-match strategy, where more than one article is mapped to a
concept (i.e. A_concept1, A_concept2, ...), the computed score is further scaled by
the relevance score of each of the top-k articles, then reduced by vector addition,
and normalized again.
Fig. 3: Assignment of scores for the category dimensions, from the mapped article
to its related categories.
Fig. 3 illustrates the process, where the concept article A_concept has a score s,
which is 1 for an exact match⁴. First, the score is propagated uniformly across
the Wikipedia categories (and their tree structures) to which the article belongs;
here, the C1 and C7 trees each receive s/2 from A_concept. In the next step, the
directly-related categories (C1 and C7) further propagate the score to their super-
and sub-categories, while retaining a portion of it: C1 retains a fraction
1 − jump_prob of the score it receives and propagates the remainder to its super-
and sub-categories, where jump_prob is the probability of jumping from a category
to a connected super- or sub-category. C7, in contrast, retains the full score,
since it has no super- or sub-category for further propagation. From step 3 onwards,
the score continues to propagate in a single direction (towards either super- or
sub-categories) until the hop count category_depth is reached, or until there is no
further category to which the score could propagate. In Fig. 3, C0 and C3 are cases
where the score cannot propagate further, while C4 is where propagation stops
when using a threshold category_depth = 2.
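A minimal Python sketch of this propagation is given below; the uniform split of the propagated portion among neighbouring categories, and the flat neighbour list in place of explicit super/sub directions, are simplifying assumptions where the description leaves the details open.

```python
def propagate(cat, score, hops, taxonomy, scores, jump_prob, category_depth):
    """Recursively spread `score` from `cat` through the category taxonomy."""
    neighbours = taxonomy.get(cat, [])
    if not neighbours or hops >= category_depth:      # no neighbours, or depth reached
        scores[cat] = scores.get(cat, 0.0) + score    # retain the full score
        return
    scores[cat] = scores.get(cat, 0.0) + (1 - jump_prob) * score  # retain a portion...
    for nxt in neighbours:                            # ...and propagate the rest
        propagate(nxt, jump_prob * score / len(neighbours),
                  hops + 1, taxonomy, scores, jump_prob, category_depth)

def category_score_dims(s, direct_categories, taxonomy,
                        jump_prob=0.5, category_depth=2):
    """Split the article score s uniformly over the directly-linked categories,
    then propagate through the taxonomy."""
    scores = {}
    for cat in direct_categories:                     # direct categories sit at hop 1
        propagate(cat, s / len(direct_categories), 1, taxonomy, scores,
                  jump_prob, category_depth)
    return scores

# Fig. 3 example: C1 and C7 each receive s/2; C7 has no neighbours, so it
# retains the full s/2; C4 sits at the category_depth = 2 threshold.
taxonomy = {"C1": ["C0", "C3", "C4"], "C4": ["C5"]}
print(category_score_dims(1.0, ["C1", "C7"], taxonomy))
```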
Once the sets of dimensions for related Wikipedia articles and categories have
been created, we construct an overall vector for the concept article as follows. Eq.
1 shows the vector representation of a concept, where norm is a normalization
function, articles_score and categories_score are the two sets of dimensions, and
bias_article and bias_category are the bias weights which control the importance of
the associations with the Wikipedia articles and categories respectively (⊕ denotes
the concatenation of the two sets of dimensions):

vector(concept) = norm(bias_article · articles_score ⊕ bias_category · categories_score)    (1)

The bias weights can be tuned to give more importance to either type of association.
In Eq. 2, the norm function rescales the entire vector such that the sum of the
scores over all dimensions equals 1:

norm(v)_d = v_d / \sum_{d'} v_{d'}    (2)
⁴ In the case of a partial best match, this is the relevance score returned by the BM25 algorithm.
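The construction in Eqs. 1–2 is small enough to sketch directly. In the following minimal Python sketch, each dimension keeps a human-readable label, and the default bias weights are illustrative rather than tuned values.

```python
def eve_vector(articles_score, categories_score,
               bias_article=0.5, bias_category=0.5):
    """Weight the two sets of labeled dimensions by the bias weights (Eq. 1)
    and normalize so that all dimension scores sum to 1 (Eq. 2)."""
    vec = {("article", a): bias_article * s for a, s in articles_score.items()}
    vec.update({("category", c): bias_category * s
                for c, s in categories_score.items()})
    total = sum(vec.values())
    return {dim: s / total for dim, s in vec.items()} if total else vec

# Each key pairs a dimension type with a Wikipedia article or category name,
# e.g. ("category", "Italian cuisine"), so every dimension stays interpretable.
```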
The process above is repeated for each word or concept in the input dataset to
generate a set of vectors, representing an embedding of the data. In this embedding,
each vector dimension is labeled with a tag which corresponds to either a Wikipedia
article name or a Wikipedia category name. Therefore, each dimension carries a
direct human-interpretable meaning. As we see in the next section, these labeled
dimensions prove useful for the generation of algorithmic explanations.
4 Evaluation
In this section we investigate the extent to which embeddings generated using the
EVE model are useful in three fundamental data mining tasks. Firstly, we describe
a number of alternative baseline methods, along with the relevant parameter set-
tings. Then we describe the dataset which is used for the evaluations, and finally
we report the experimental results to showcase the effectiveness of the model. We
also highlight the benefits of the explanations generated as part of this process.
4.2 Dataset
Table 1 shows a statistical summary of the dataset. In this table, the column
“Example (Category, Items)” shows an example of a category name for each “Topical
Type”, together with a subset of the items belonging to that category. For
instance, in the first row the “Topical Type” is Animal class and Mammal is one of
the categories belonging to this type, while Baleen whale is an item within the
category Mammal. Similarly, there are other categories of the type Animal class,
such as Reptile. Table 2 shows the list of categories for each topical type.
All embedding algorithms in our comparison were trained on this dataset.
For the baseline models, we use “article labels”, “article redirects”, “category
labels”, and “long abstracts”, with each entry treated as a separate document. Note that,
prior to training, we filter out four non-informative Wikipedia categories which can
be viewed as being analogous to stopwords: {“articles contain video clips”, “hidden
categories”, “articles created via the article wizard”, “unprintworthy redirects”}.
4.3 Experiments
To compare the EVE model with the various baseline methods, we define three
general purpose data mining tasks: intruder detection, ability to cluster, ability to
sort relevant items first. In the following sections we define the tasks separately,
each accompanied by experimental results and explanations.
4.3.1 Intruder detection

Task definition: For a given “topical type”, we randomly choose four items belonging
to one category and one intruder item from a different category of the same
“topical type”. After repeating this process exhaustively over all combinations for
all topical types, we generated 13,532,280 queries for this task. Table 3 shows the
breakdown of the total number of queries for each of the “topical types”.
Example of a query: For the “topical type” European cities, we randomly choose
four related items from the “category” Great Britain such as London, Birmingham,
Manchester, Liverpool, while we randomly choose an intruder item Berlin from the
“category” Germany. Each of the models is presented with the five items, where
the challenge is to identify Berlin as the intruder – the rest of the items are related
to each other as they are cities in Great Britain, while Berlin is a city in Germany.
score(item_k) = \sum_{i=1, i \neq k}^{5} similarity(item_k, item_i)    (3)
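In code, Eq. 3 scores each of the five items by its total similarity to the other four. The sketch below assumes cosine similarity over the embedding vectors and reads the item with the lowest score as the intruder, which is our reading of the selection rule.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_intruder(vectors):
    """Apply Eq. 3 to the five item vectors; the lowest scorer is the intruder."""
    scores = [sum(cosine(v, w) for j, w in enumerate(vectors) if j != k)
              for k, v in enumerate(vectors)]
    return int(np.argmin(scores))
```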
Results: To evaluate the effectiveness of the EVE model against the baselines for
this task, we use accuracy (Manning et al 2008) as the measure for finding the
intruder item. Accuracy is defined as the ratio of correct results (i.e. the number
of correctly identified intruder items) to the total number of results returned by
the model:

accuracy = |Results_Correct| / |Results_Total|    (4)
Table 4 shows the experimental results for the six models in this task. From the
table it is evident that the EVE model significantly outperforms the rest of the
models overall. However, in the case of two “topical types”, FastText CBOW yields
better results. To explain this, we next show explanations generated by the EVE
model while making decisions for the intruder detection task.
Explanation from the EVE model: Using the labeled dimensions in the vectors
produced by EVE, we define the process for generating explanations for the
intruder detection task in Algorithm 1, as follows. The inputs to the algorithm are
the vectors of the items and the intruder item identified by the EVE model. In step
1, we calculate the mean vector of all the vectors. In steps 2 and 3, we subtract
the intruder vector and the mean vector from each other to obtain dominant
vector spaces representing the detected coherent items and the intruder item
respectively. In steps 4 and 5, we order the labeled dimensions by their
informativeness (i.e. the dimension with the highest score is the most informative).
Finally, we return a ranked list of informative vector dimensions for both the
non-intruders and the intruder as an explanation for the output of the task.
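A minimal sketch of Algorithm 1 on dense labeled vectors follows; obtaining the two dominant spaces by clipping negative differences to zero is an assumption on our part, as the description does not spell out how the subtraction is bounded.

```python
import numpy as np

def explain_intruder(vectors, intruder_idx, labels, top_k=9):
    """Return ranked informative dimensions for the non-intruders and intruder."""
    vectors = np.asarray(vectors)
    mean_vec = vectors.mean(axis=0)                           # step 1
    intruder = vectors[intruder_idx]
    coherent_space = np.clip(mean_vec - intruder, 0, None)    # step 2
    intruder_space = np.clip(intruder - mean_vec, 0, None)    # step 3

    def rank(space):                                          # steps 4-5
        return [labels[i] for i in np.argsort(space)[::-1][:top_k]]

    return rank(coherent_space), rank(intruder_space)         # ranked explanation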
Table 5: Sample explanation generated for the intruder detection task, for the
query: {Hawk, Penguin, Gull, Parrot, Snake}. Correct intruder detected: Snake.
All top-9 features are Wikipedia categories.
Non-Intruder                   | Intruder
falconiformes                  | turonian first appearances
birds of prey                  | snakes
seabirds                       | squamata
ypresian first appearances     | predators
psittaciformes                 | lepidosaurs
parrots                        | predation
rupelian first appearances     | carnivorous animals
gulls                          | venomous snakes
bird families                  | snakes in art
Table 6: Sample explanation generated for the intruder detection task, for the
query: {I Am Legend (film), Insidious (film), A Nightmare on Elm Street, Final
Destination (film), Children of Men}. Incorrect intruder detected: Final Destina-
tion (film). All top-9 features are Wikipedia categories.
Non-Intruder                              | Intruder
english-language films                    | studiocanal films
american independent films                | splatter films
american horror films                     | final destination films
universal pictures films                  | films shot in vancouver
post-apocalyptic films                    | films shot in toronto
films based on science fiction novels     | films shot in san francisco, california
2000s science fiction films               | films set in new york
ghost films                               | films set in 1999
films shot in los angeles, california     | film scores by shirley walker
Tables 5 and 6 show sample explanations generated by the EVE model, in cases
where the model detected a correct and an incorrect intruder item respectively. In
Table 5, the query has items selected from the “topical type” animal class, where
four of the items belong to the “category” birds, while the item ‘Snake’ belongs to
the “category” reptile. As can be seen from the table, the bold features in the
non-intruder and intruder columns clearly represent the bird family and snakes
respectively, which is the correct inference. Furthermore, the non-bold features in
the two columns represent deeper relevant relations whose interpretation may
require some domain expertise. For instance, falconiformes comprise 60+ species
of birds of prey, and the turonian is the geological age associated with the first
appearances of particular genera.
In the example in Table 6, the query has items selected from the “topical
type” movie genres, where four of the items belong to the “category” horror film,
while the intruder item ‘Children of Men’ belongs to the “category” science fic-
tion film. In this example, EVE identifies the wrong intruder item according to
the ground truth, recommending instead the item ‘Final Destination (film)’. From
the explanation in the table, it becomes clear why the model made this recom-
mendation. We observe that the non-intruder items have a coherent relationship
with ‘post-apocalyptic films’ and ‘films based on science fiction novels’ (both ‘I am
Legend (film)’ and ‘Children of Men’ belong to these categories). Whereas ‘Final
Destination (film)’ was recommended by the model based on features relating to
filming location. A key advantage of having an explanation from the model is that
it allows us to understand why a mistake occurs and how we might improve the
model. In this case, one way to make an improvement might be to add a rule
filtering out Wikipedia categories relating to locations when considering movie
genres.
4.3.2 Ability to cluster

Task definition: For all items in a specific “topical type”, we construct an em-
bedding space without using information about the category to which the items
belong. The purpose is then to measure the extent to which these items clus-
ter together in the space relative to the ground truth categories. This is done
by measuring distances in the space between items that should belong together
(i.e. intra-cluster distances) and items that should be kept apart (i.e. inter-cluster
distances), as determined by the categories. Since there are seven “topical types”,
there are also seven queries in this task.
Example of a query: For the “topical type” Cuisine, we are provided with a list of
100 items in total, where each of the five categories has 20 items. These correspond
to cuisine items from five different countries. The idea is to measure the ability of
each embedding model to cluster these 100 items back into five categories.
Results: To evaluate the ability to cluster, there are typically two objectives:
within-cluster cohesion and between-cluster separation. To this end, we use three
well-known cluster validity measures in this task. Firstly, the within-cluster dis-
tance (Everitt et al 2001) is the total of the squared distances⁵ between each item
x_i and the centroid vector \mu_c of the cluster C_c to which it has been assigned:
within = \sum_{c=1}^{k} \sum_{x_i \in C_c} d(x_i, \mu_c)^2    (5)
Typically this value is normalized with respect to the number of clusters k. The
lower the score, the more coherent the clusters. Secondly, the between-cluster
distance is the total of the squared distances between each cluster centroid
and the centroid of the entire dataset, denoted \hat{\mu}:
between = \sum_{c=1}^{k} |C_c| d(\mu_c, \hat{\mu})^2,  where  \hat{\mu} = (1/n) \sum_{i=1}^{n} x_i    (6)

⁵ Distance is computed simply as 1 − the normalized similarity score over each dimension.
This value is also normalized with respect to the number of clusters k. The higher
the score, the more well-separated the clusters. Finally, the two above objectives
are combined via the CH-Index (Caliński and Harabasz 1974), using the ratio:
CH = (between / (k − 1)) / (within / (n − k))    (7)
The higher the value of this measure, the better the overall clustering.
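The three measures are straightforward to compute directly. The sketch below transcribes Eqs. 5–7 for a matrix of item vectors, using squared Euclidean distance as a stand-in for the dissimilarity of footnote 5.

```python
import numpy as np

def cluster_validity(X, assignments, k):
    """Within (Eq. 5), between (Eq. 6) and CH-Index (Eq. 7) for one clustering.

    X: (n, d) array of item vectors; assignments: cluster id per row of X.
    """
    X = np.asarray(X)
    assignments = np.asarray(assignments)
    n = len(X)
    dataset_centroid = X.mean(axis=0)                 # mu-hat in Eq. 6
    within = between = 0.0
    for c in range(k):
        members = X[assignments == c]
        centroid = members.mean(axis=0)
        within += ((members - centroid) ** 2).sum()   # Eq. 5 (squared distances)
        between += len(members) * ((centroid - dataset_centroid) ** 2).sum()  # Eq. 6
    ch = (between / (k - 1)) / (within / (n - k))     # Eq. 7
    return within / k, between / k, ch                # normalized by k, plus CH
```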
From Table 7, we can see that EVE generally performs better than the rest of the
embedding methods on the within-cluster measure. In Table 8, for the between-
cluster measure, EVE is outperformed by FastText CBOW, Word2Vec CBOW,
and FastText SG, mainly due to the “topical types” Cuisine and European cities,
where EVE does not perform well. Finally, in Table 9, where the combined aim
of clustering is captured through the CH-Index, EVE outperforms the rest of the
methods, except in the case of the “topical type” European cities.
Explanation from the EVE model: Using the labeled dimensions from the EVE model,
we define a similar strategy for explanation as used in the previous task. However,
instead of discovering an intruder item, the goal is now to characterize the
categories formed by the items and the overall space. Algorithm 2 shows the
strategy, which requires three inputs: vectorspace, representing the entire
embedding; categories, the list of categories; and categories_vectorspace, the
vector space of the items belonging to each category. In step 1, we calculate the
mean vector representing the entire space. In step 2, we order the labeled
dimensions of the mean vector by their informativeness. In steps 3–6 we iterate
over the list of categories (of a “topical type” such as Cuisine) and calculate the
mean vector for each category's vector space, followed by ordering the dimensions
of that mean vector by informativeness. Finally, we return the most informative
features of the entire space and of each category's vector space.
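As a sketch, Algorithm 2 reduces to ranking the dimensions of mean vectors; as in the previous task, informativeness is taken to be the dimension score, and the input layout is our assumption.

```python
import numpy as np

def explain_clusters(vectorspace, categories, categories_vectorspace,
                     labels, top_k=6):
    """Rank informative dimensions for the overall space and each category."""
    def rank(vec):
        return [labels[i] for i in np.argsort(vec)[::-1][:top_k]]

    explanation = {"overall": rank(np.mean(vectorspace, axis=0))}   # steps 1-2
    for cat in categories:                                          # steps 3-6
        explanation[cat] = rank(np.mean(categories_vectorspace[cat], axis=0))
    return explanation
```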
Tables 10 and 11 show the explanations generated by the EVE model, in the
cases where the model performed best and worst against the baselines respectively.
In Table 10, the query is the list of items from the “topical type” cuisine. As can
be seen from the bold entries in the table, the explanation conveys the main idea
of both the overall space and the individual categories. For example, in the overall
space we can see the cuisines of different nationalities, and likewise we can see
the nationality from which each cuisine originates (e.g. Italian cuisine for the
“Italian category” and Pakistani breads for the “Pakistani category”). As for the
non-bold entries, we can also observe relevant features, but at a deeper semantic
level: for example, cuisine of Lombardy in the “Italian category”, where Lombardy
is a region in Italy, and likewise tortilla-based dishes in the Mexican category,
where the tortilla is a primary ingredient in Mexican cuisine.
In Table 11, the query is the list of items from the “topical type” European cities;
this is the example on which the EVE model performs worst. However, the explana-
tion allows us to understand why this is the case. As can be seen from the explana-
tion table, the bold features show historic relationships across different countries,
Table 10: Sample explanation generated for the ability to cluster task, for the
query:{items of “topical type” Cuisine}. All top-6 features are Wikipedia cate-
gories, except for those beginning with ‘α:’ which correspond to Wikipedia articles.
Fig. 4: Visualizations of model embeddings generated for the ability to cluster task,
for the query: {items of “topical type” Country to Continent}. Colors and shapes
indicate items belonging to different ground truth categories.
Visualization: Since scatter plots are often used to represent the output of a clus-
ter analysis process, we generate a visualization of all embeddings using t-SNE
(Maaten and Hinton 2008), a technique for visually representing high-dimensional
Table 11: Sample explanation for the ability to cluster task, for the query: {items
of “topical type” European cities}. All top-6 features are Wikipedia categories.
data by reducing it to 2–3 dimensions for presentation⁶. In the interest of the
reader, Fig. 4 shows visualizations generated using EVE and GloVe when the list
of items is selected from the “topical type” country to continent. As can be seen
from the plot, the ground truth categories exhibit better clustering behavior in
the space produced by the EVE model than in that of the GloVe model. This is
also reflected in the corresponding scores in Tables 7, 8, and 9.
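Such plots can be produced with scikit-learn's t-SNE implementation; the snippet below is a generic sketch rather than the exact script used for Fig. 4.

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_embedding(X, category_ids, title="EVE embedding"):
    """Project item vectors to 2-D with t-SNE and colour by category."""
    coords = TSNE(n_components=2, random_state=0).fit_transform(X)
    plt.scatter(coords[:, 0], coords[:, 1], c=category_ids, cmap="tab10")
    plt.title(title)
    plt.show()
```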
4.3.3 Sorting relevant items first

Task definition: The objective of this task is to rank a list of items based on their
relevance to a given query item. According to the ground truth associated with
our dataset, items which belong to the same ‘category’ of a “topical type” as the
query should be ranked above items which do not belong to that ‘category’ (i.e.
items which are irrelevant to the query). In this task, the total number of queries
is equal to the total number of categories in the dataset, i.e. 36 (see Table 1).
⁶ The full set of experimental visualizations is available at http://mlg.ucd.ie/eve/
Example of a query: Unlike the previous tasks, here a ‘category’ is used as the
query. For example, for the ‘category’ Nobel laureates in Physics, the task is
to sort all items from the “topical type” Nobel laureates such that the items from
the ‘category’ Nobel laureates in Physics are ranked ahead of the rest of the items.
Thus Niels Bohr, who is a laureate in Physics, should appear near the top of the
ranking, unlike Elihu Root, who is a prize winner in Peace.
P@k = |Items_Relevant| / |Items_Top-k|    (8)

AP = (1 / |Items_Relevant|) \sum_{k=1}^{|Items|} P@k · rel(k)    (9)

where rel(k) = 1 if item_k is relevant, and 0 otherwise.
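Both measures follow directly from a list of binary relevance flags in ranked order, as in the sketch below; here Eq. 8 is read as the fraction of relevant items among the top k.

```python
def precision_at_k(relevance, k):
    """Eq. 8: fraction of the top-k ranked items that are relevant.

    relevance: list of 0/1 flags, one per item, in ranked order.
    """
    return sum(relevance[:k]) / k

def average_precision(relevance):
    """Eq. 9: average of P@k over the ranks of the relevant items."""
    n_relevant = sum(relevance)
    if n_relevant == 0:
        return 0.0
    total = sum(precision_at_k(relevance, k)
                for k, rel in enumerate(relevance, start=1) if rel)
    return total / n_relevant
```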
Tables 12 and 13 show the experimental results for the sorting relevant items
first task. We choose P@20 (k = 20), since on average there are 20 items in each
category of the dataset. As can be seen from the tables, the EVE model generally
outperforms the rest of the models, except for the “topical type” European cities,
where it is outperformed by factors of 1.05 and 1.09 in terms of P@k and AP
respectively; in all other cases EVE outperforms the other algorithms by factors of
at least 1.51 and 1.37 in terms of P@k and AP respectively. On average, the EVE
model outperforms the second best algorithm by factors of 1.8 and 1.67 in
terms of P@k and AP respectively. In the next section, we show the corresponding
explanations generated by the EVE model for this task.
Table 12: Sorting relevant items first task – Precision (P @20) scores.
Table 13: Sorting relevant items first task – Average Precision (AP) scores.
Explanation from the EVE model: Using the labeled dimensions provided by the
EVE model, we define a strategy for generating explanations for the sorting rele-
vant items first task in Algorithm 3. The strategy requires three inputs. The first
is the vectorspace, which is composed of the category vector and the item vectors.
The second is Sim_wrt_category, a column matrix composed of the similarity
scores between the category vector and itself and between the category vector and
each item vector; the first entry of this matrix is 1.0, owing to the self-similarity
of the category vector. The final input is the list of items, items. In steps 1 and 2,
a weighted mean vector of the space is calculated, where the weights are the
similarity scores between the vectors in the space and the category vector. In
steps 3–6, we iterate over the list of items and calculate the product between the
weighted mean vector of the space and each item vector. After taking the product,
we order the dimensions by their informativeness. Finally, we return the ranked
list of informative features for each item.
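A minimal sketch of Algorithm 3 follows; taking the product of the weighted mean vector and an item vector element-wise is our reading of the description, and the row layout (category vector first, then item vectors) mirrors the inputs above.

```python
import numpy as np

def explain_ranking(vectorspace, sim_wrt_category, items, labels, top_k=6):
    """Rank informative dimensions per item for the sorting task.

    vectorspace: rows are the category vector followed by the item vectors;
    sim_wrt_category: similarity of each row to the category vector, with
    the first entry equal to 1.0 (the category's self-similarity).
    """
    vectorspace = np.asarray(vectorspace)
    weights = np.asarray(sim_wrt_category).reshape(-1, 1)
    weighted_mean = (weights * vectorspace).mean(axis=0)      # steps 1-2

    explanation = {}
    for i, item in enumerate(items):                          # steps 3-6
        product = weighted_mean * vectorspace[i + 1]          # element-wise product
        top = np.argsort(product)[::-1][:top_k]
        explanation[item] = [labels[j] for j in top]
    return explanation
```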
Table 14: Sample explanation for the sorting relevant items first task, for the query:
{Nobel laureates in Chemistry}. All top-6 features are Wikipedia categories.
Table 15: Sample explanation for the sorting relevant items first task, for the
query: {Classical music}. All top-6 features are Wikipedia categories except those
beginning with ‘α:’ which are Wikipedia articles.
References
Diaz F, Mitra B, Craswell N (2016) Query expansion with locally-trained word embeddings.
arXiv preprint arXiv:1605.07891
Everitt B, Landau S, Leese M (2001) Cluster Analysis. Hodder Arnold Publication, Wiley
Firth J (1957) A synopsis of linguistic theory 1930-1955. Studies in linguistic analysis pp 1–32
Gabrilovich E, Markovitch S (2007) Computing semantic relatedness using wikipedia-based
explicit semantic analysis. In: Proc. IJCAI’07, vol 7, pp 1606–1611
Gallant SI, Caid WR, Carleton J, Hecht-Nielsen R, Qing KP, Sudbeck D (1992) Hnc’s
matchplus system. In: ACM SIGIR Forum, ACM, vol 26, pp 34–38
Globerson A, Chechik G, Pereira F, Tishby N (2007) Euclidean embedding of co-occurrence
data. JMLR 8(Oct):2265–2295
Goodman B, Flaxman S (2016) European union regulations on algorithmic decision-making
and a “right to explanation”. arXiv preprint arXiv:1606.08813
Harris ZS (1954) Distributional structure. Word 10(2-3):146–162
Hoffart J, Seufert S, Nguyen DB, Theobald M, Weikum G (2012) Kore: Keyphrase overlap
relatedness for entity disambiguation. In: Proc. 21st ACM International Conference on
Information and Knowledge Management, pp 545–554
Jarmasz M (2012) Roget’s thesaurus as a lexical resource for natural language processing.
arXiv preprint arXiv:1204.0140
Jiang Y, Zhang X, Tang Y, Nie R (2015) Feature-based approaches to semantic similarity
assessment of concepts using wikipedia. Info Processing & Management 51(3):215–234
Kusner MJ, Sun Y, Kolkin NI, Weinberger KQ (2015) From word embeddings to document
distances. In: Proc. ICML’2015, pp 957–966
Levy O, Goldberg Y (2014) Neural word embedding as implicit matrix factorization. In: Proc.
NIPS’2014, pp 2177–2185
Levy O, Goldberg Y, Ramat-Gan I (2014) Linguistic regularities in sparse and explicit word
representations. In: CoNLL, pp 171–180
Levy O, Goldberg Y, Dagan I (2015) Improving distributional similarity with lessons learned
from word embeddings. Tr Assoc Computational Linguistics 3:211–225
Liu Y, Liu Z, Chua TS, Sun M (2015) Topical word embeddings. In: AAAI, pp 2418–2424
Maaten Lvd, Hinton G (2008) Visualizing data using t-sne. JMLR 9(Nov):2579–2605
Manning CD, Raghavan P, Schütze H (2008) Introduction to Information Retrieval. Cambridge
University Press, New York, NY, USA
Metzler D, Dumais S, Meek C (2007) Similarity measures for short segments of text. In:
European Conference on Information Retrieval, Springer, pp 16–27
Mikolov T, Chen K, Corrado G, Dean J (2013a) Efficient estimation of word representations
in vector space. arXiv preprint arXiv:1301.3781
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013b) Distributed representations of
words and phrases and their compositionality. In: Proc. NIPS’2013, pp 3111–3119
Pennington J, Socher R, Manning CD (2014) Glove: Global vectors for word representation.
In: Empirical Methods in Natural Language Processing (EMNLP), pp 1532–1543
Qureshi MA (2015) Utilising wikipedia for text mining applications. PhD thesis, National
University of Ireland Galway
Schütze H (1992) Word space. In: Proc. NIPS’1992, pp 895–902
Sherkat E, Milios E (2017) Vector embedding of wikipedia concepts and entities. arXiv preprint
arXiv:1702.03470
Socher R, Chen D, Manning CD, Ng A (2013) Reasoning with neural tensor networks for
knowledge base completion. In: Proc. NIPS’2013, pp 926–934
Strube M, Ponzetto SP (2006) Wikirelate! computing semantic relatedness using wikipedia.
In: Proc. 21st national conference on Artificial intelligence, pp 1419–1424
Witten I, Milne D (2008) An effective, low-cost measure of semantic relatedness obtained from
wikipedia links. In: AAAI Workshop on Wikipedia and Artificial Intelligence: an Evolving
Synergy, pp 25–30
Xu C, Bai Y, Bian J, Gao B, Wang G, Liu X, Liu TY (2014) Rc-net: A general framework
for incorporating knowledge into word representations. In: Proc. 23rd ACM International
Conference on Conference on Information and Knowledge Management, pp 1219–1228
Yeh E, Ramage D, Manning CD, Agirre E, Soroa A (2009) Wikiwalk: random walks on
wikipedia for semantic relatedness. In: Proc. 2009 Workshop on Graph-based Methods
for Natural Language Processing, pp 41–49
Zesch T, Gurevych I (2007) Analysis of the wikipedia category graph for nlp applications. In:
Proc. TextGraphs-2 Workshop (NAACL-HLT 2007), pp 1–8