Query Reformulation Based On Word Embeddings
A comparative study
1 Introduction
2 Related Work
3 Methods
This section presents in more detail (i) word embeddings and their applicability
in query expansion, and (ii) the query expansion methods employed in this paper.
to the query. The second option requires more intricate techniques for queries
that contain more than one term, but it is a more powerful solution.
\mathrm{sim}(w_i, w_j) = \cos(\mathbf{w}_i, \mathbf{w}_j) = \frac{\mathbf{w}_i^{\top} \mathbf{w}_j}{\lVert \mathbf{w}_i \rVert \, \lVert \mathbf{w}_j \rVert}    (1)
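As an illustration, Eq. (1) can be computed directly with NumPy. The two 3-dimensional vectors below are toy values for demonstration only, not embeddings taken from any of the models discussed here.

```python
import numpy as np

def cosine_similarity(wi: np.ndarray, wj: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors, as in Eq. (1)."""
    return float(np.dot(wi, wj) / (np.linalg.norm(wi) * np.linalg.norm(wj)))

# Toy 3-dimensional embeddings (illustrative values only).
w_king = np.array([0.8, 0.2, 0.1])
w_queen = np.array([0.7, 0.3, 0.15])
print(cosine_similarity(w_king, w_queen))  # a value close to 1 for similar vectors
```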
Given a trained word embeddings model, the CombSUM and Centroid methods, presented in [10], are considered for defining the similarity of a term t (whose corresponding embedding is \mathbf{t}) to a query q consisting of M terms q_i, i = 1, \ldots, M (with corresponding embeddings \mathbf{q}_i).
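Following the description in [10], the CombSUM score of a candidate term can be taken as the sum of its cosine similarities to the individual query-term embeddings. The sketch below assumes this formulation; the function names and the in-memory vocabulary dictionary are ours, introduced for illustration only.

```python
import numpy as np

def cos(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def combsum_score(t: np.ndarray, query_vecs: list) -> float:
    """CombSUM: sum of cosine similarities between a candidate term
    embedding t and each individual query-term embedding."""
    return sum(cos(t, qi) for qi in query_vecs)

def top_k_expansion_terms(vocab: dict, query_vecs: list, k: int) -> list:
    """Rank vocabulary terms by CombSUM score and keep the top k."""
    ranked = sorted(vocab.items(),
                    key=lambda kv: combsum_score(kv[1], query_vecs),
                    reverse=True)
    return [term for term, _ in ranked[:k]]
```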
Centroid method The centroid method is based on the observation that the semantics of an expression can often be adequately represented by the sum of the vectors of its constituent terms. Consequently, a query q can be represented by the vector \mathbf{Q}_{cent} = \sum_{q_i \in q} \mathbf{q}_i, and the similarity score between a vocabulary term t and the query is defined as the cosine similarity \mathrm{sim}(t, q) = \cos(\mathbf{t}, \mathbf{Q}_{cent}).
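A minimal sketch of the centroid scoring, assuming the query is first collapsed into the single vector Q_cent. Because only one cosine computation per candidate term is then needed (versus M for CombSUM), this is consistent with the execution-time advantage of the centroid method noted in the results.

```python
import numpy as np

def centroid_score(t: np.ndarray, query_vecs: list) -> float:
    """Similarity of a candidate term embedding t to the query centroid,
    i.e. the sum of the query-term vectors."""
    q_cent = np.sum(query_vecs, axis=0)
    return float(np.dot(t, q_cent) /
                 (np.linalg.norm(t) * np.linalg.norm(q_cent)))
```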
4 Evaluation
This section describes the experiments performed in order to assess the perfor-
mance of the global and local word embeddings models in query expansion.
expansion methods (i.e., CombSUM and Centroid methods), and four different
values for the parameter k.
We tested the performance of the retrieval processes using four commonly used evaluation metrics, namely MAP (Mean Average Precision), P@k (Precision at k, i.e., the fraction of relevant results among the top k retrieved documents), nDCG@k (normalized Discounted Cumulative Gain), and ERR@k (Expected Reciprocal Rank). For all the metrics with parameter k, we use k = 20. Both nDCG [9] and ERR [3] are designed for non-binary notions of relevance, and ERR is an extension of the classical reciprocal rank.
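The graded-relevance metrics above can be sketched as follows. The formulas follow the standard definitions of Järvelin and Kekäläinen [9] for nDCG and Chapelle et al. [3] for ERR; the exponential gain mapping and the g_max value are the usual defaults, stated here as assumptions rather than the exact choices of this paper.

```python
import math

def precision_at_k(rels: list, k: int) -> float:
    """Fraction of relevant results (graded relevance > 0) among the top k."""
    return sum(1 for r in rels[:k] if r > 0) / k

def dcg_at_k(rels: list, k: int) -> float:
    """Discounted cumulative gain with exponential gain 2^rel - 1."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(rels: list, k: int) -> float:
    """DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

def err_at_k(rels: list, k: int, g_max: int = 4) -> float:
    """Expected Reciprocal Rank with stopping probability
    R = (2^g - 1) / 2^g_max, as in Chapelle et al. [3]."""
    p_continue, err = 1.0, 0.0
    for rank, g in enumerate(rels[:k], start=1):
        stop = (2 ** g - 1) / 2 ** g_max
        err += p_continue * stop / rank
        p_continue *= 1 - stop
    return err
```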
5 Results
This section presents the evaluation results of the experiments on both the benchmark datasets and the terrorism-related dataset.
expansion process when applied to a query set, using the four evaluation metrics.
We took this approach of analysis to better present and interpret the results.
Figures 1, 2, 3, and 4 present those mean performances, comparing the efficacy of the local models versus the global ones, for each query set and evaluation metric. Each plot depicts the results obtained with both the 100- and 300-dimensional models, in order to analyse the effect of the dimensionality of the models and the interdependence of the model's origin and dimensionality.
In each plot, the eight different points corresponding to each embeddings model
(i.e., local or global) represent different combinations of the expansion method
and the number of expansion terms used. Specifically, each point in the plots
represents the average performance of experiments that use the same model,
expansion method, and number of expansion terms. Blue points represent the
100-dimensional models and orange the 300-dimensional ones.
At a first level of analysis, the local models outperform the global ones when measured by the ERR@20 metric for the query sets of TREC 9 and 10, by MAP and P@20 for TREC 11, and by ERR@20 for TREC 12. On the other hand, the global models perform better than the local ones when measured by MAP for the queries of TREC 10. As far as the dimensionality is concerned, the
Fig. 1. The average performances of the local and global models for the query set of
TREC 9, for the metrics MAP, P@20, nDCG@20 and ERR@20 on the retrieval task.
8 P. Panagiotou et al.
Fig. 2. The average performances of the local and global models for the query set of
TREC 10, for the metrics MAP, P@20, nDCG@20 and ERR@20 on the retrieval task.
Fig. 3. The average performances of the local and global models for the query set of
TREC 11, for the metrics MAP, P@20, nDCG@20 and ERR@20 on the retrieval task.
dimensional global ones, but among the local models the 100-dimensional are better, according to MAP, P@20, and nDCG@20.
As far as the expansion method is concerned, we paired experiments that share the same query, type of embeddings model, and number of expansion terms, but differ in the expansion method. The Wilcoxon signed-rank test between those pairs has shown that choosing among the investigated expansion methods does not elicit a statistically significant change in retrieval performance, for any of the evaluation metrics considered. This outcome is in line with the findings of [10] and is important, especially when we consider the efficiency of the two expansion methods, since the centroid method is considerably faster than the CombSUM method.
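The pairing described above can be sketched with `scipy.stats.wilcoxon`. The per-query MAP scores below are hypothetical placeholder values, not results from the paper; they only illustrate the shape of the paired test.

```python
# Requires: pip install scipy
from scipy.stats import wilcoxon

# Hypothetical per-query MAP scores for two runs that differ only in the
# expansion method (same query set, same model, same number of terms).
combsum_map = [0.31, 0.24, 0.40, 0.18, 0.27, 0.35, 0.22, 0.29]
centroid_map = [0.30, 0.25, 0.41, 0.18, 0.26, 0.36, 0.21, 0.30]

stat, p_value = wilcoxon(combsum_map, centroid_map)
# A p-value above the chosen significance level (e.g. 0.05) means the two
# expansion methods cannot be distinguished on this metric.
print(stat, p_value)
```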
Fig. 4. The average performances of the local and global models for the query set of
TREC 12, for the metrics MAP, P@20, nDCG@20 and ERR@20 on the retrieval task.
While global word embeddings capture the overall context, local word embeddings provide interpretations relevant to the particular domain.
Consider for instance the term “karbala”, which is related to “martyrdom” according to the local-wind10 model. This term most likely refers to the Battle of Karbala, fought in October 680 between the army of the second Umayyad caliph Yazid I and a small army led by Husayn ibn Ali, the grandson of the Islamic prophet Muhammad; Husayn and his companions are widely regarded as martyrs by both Sunni and Shi’a Muslims⁵. It is thus evident in this case that the local models provide related terms within the particular context of interest, while the global models provide more universally related terms, and in particular terms with the same root as the term “martyrdom”.
Furthermore, the local models output “syria” as a term relevant to “war”, while the global models prefer more general terms. The same also applies to the outputs for the search term “believers”, such as “thabit” vs. “adherents”; the former is indeed related to the particular context of interest, while the latter is virtually a synonym of the search term “believers” and could therefore be considered in any context, not only in this specific one.
⁵ https://en.wikipedia.org/wiki/Battle_of_Karbala
Table 1. Top-3 most similar terms to the search terms based on global and local word
embeddings.
Finally, there are cases where the local models yield possibly unrelated terms; however, this may be attributed to the very small size of the domain-specific dataset on which those models were built.
6 Conclusions
In this work, we compared the performance of global versus local word embeddings models for the task of query expansion, based on four large-scale benchmark datasets and one domain-specific dataset related to terrorism. With regard to the benchmark datasets, our findings indicate that local models outperform global ones for the majority of the experiments run and the metrics employed. At the same time, it is evident that there is an interdependence between the origin of a model and its dimensionality. Regarding the terrorism-related dataset, we found that the local models delivered relevant words for a number of terrorism-related search terms, despite the small size of the corpus. The domain could benefit from
larger domain-specific corpora for building embeddings models that can better
capture the semantic relationships in a relevant vocabulary.
Acknowledgements
This work was supported by the TENSOR (H2020-700024) and the CONNEX-
IONs (H2020-786731) projects, both funded by the European Commission.
References
1. Balog, K., Weerkamp, W., De Rijke, M.: A few examples go a long way: constructing query models from elaborate query formulations. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. pp. 371–378. ACM (2008)
2. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5, 135–146 (2017)
3. Chapelle, O., Metzler, D., Zhang, Y., Grinspan, P.: Expected reciprocal rank for graded relevance. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. pp. 621–630. ACM (2009)
4. Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by latent semantic analysis. Journal of the American Society for Information Science 41(6), 391–407 (1990)
5. Diaz, F., Mitra, B., Craswell, N.: Query expansion with locally-trained word embeddings. arXiv preprint arXiv:1605.07891 (2016)
6. Efthimiadis, E.N.: Interactive query expansion: A user-based evaluation in a relevance feedback environment. Journal of the American Society for Information Science 51(11), 989–1003 (2000)
7. Fox, E.A., Shaw, J.A.: Combination of multiple searches. NIST Special Publication SP 243 (1994)
8. Harris, Z.S.: Distributional structure. Word 10(2-3), 146–162 (1954)
9. Järvelin, K., Kekäläinen, J.: Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS) 20(4), 422–446 (2002)
10. Kuzi, S., Shtok, A., Kurland, O.: Query expansion using word embeddings. In: Proceedings of the 25th ACM International Conference on Information and Knowledge Management. pp. 1929–1932. ACM (2016)
11. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems. pp. 3111–3119 (2013)
12. Pennington, J., Socher, R., Manning, C.: GloVe: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 1532–1543 (2014)
13. Xu, J., Croft, W.B.: Query expansion using local and global document analysis. In: ACM SIGIR Forum. vol. 51, pp. 168–175. ACM (2017)
14. Yin, Z., Shen, Y.: On the dimensionality of word embedding. In: Advances in Neural Information Processing Systems. pp. 887–898 (2018)