
Recommender system of scholarly papers using public datasets

Jie Zhu, M.S., Braja G. Patra, Ph.D., Ashraf Yaseen, Ph.D.


University of Texas Health Science Center at Houston
Houston, TX, USA
Abstract
The exponential growth of public datasets in the era of Big Data demands new solutions for making these resources findable and reusable. A scholarly recommender system for public datasets is therefore an important tool in the field of information filtering: it helps scholars identify prior and related literature for a dataset, saves their time, and enhances dataset reusability. In this work, we developed a scholarly recommendation system that recommends research papers from PubMed relevant to public datasets from the Gene Expression Omnibus (GEO). Different techniques for representing textual data are employed and compared in this work. Our results show that term-frequency based methods (BM25 and TF-IDF) outperformed all others, including popular Natural Language Processing embedding models such as doc2vec, ELMo, and BERT.
Introduction
Recommendation systems, or recommenders, are information filtering systems that employ data mining and analytics
of users’ behaviors, including preferences and activities, to predict users’ interests in information, products or services.
There are broadly two types of recommenders: collaborative filtering and content-based. The former works by utilizing
the rating activities of items or users, while the latter works by comparing descriptions of items or profiles of users’
preferences.
With the ever-growing public information online, recommendation systems have proven to be an effective strategy to
deal with information overload. In fact, recommenders are thriving in this era of Big Data with wide commercial
applications in recommending products (e.g., Amazon), music1, movies2, books3, news articles4, and many more.
Applications of recommendation systems are currently expanding beyond the commercial sphere to include scholarly activities. The first recommendation system for research papers was introduced in the CiteSeer project5. Following that, Science Concierge6, PURE7, and pmra8 were also developed for recommending articles. More recent experiments include those of Collins and Beel9 and Hassan et al.10, who experimented with Natural Language Processing (NLP) models.
The aforementioned systems all provide paper-to-paper recommendations, i.e., they recommend papers similar to a given paper. To the best of our knowledge, no prior research has been performed on recommending papers based on public datasets. Many public datasets are available on the internet and might be useful to researchers for further exploration. A scholarly literature recommendation system for datasets is an important and very helpful tool in the field of information filtering: it can aid in identifying prior and related literature on a dataset's topic, save researchers' time, and enhance the dataset's reusability. Moreover, recommending literature for datasets is a field of research yet to be explored.
In this paper, we describe the development of a content-based recommendation system that recommends articles from PubMed corresponding to datasets (referred to as data series) from the Gene Expression Omnibus (GEO). GEO is a public repository for high-throughput microarray and next-generation sequencing functional genomics data. As of Feb 05, 2020, there were 124,825 data series available in GEO (a series record links together a group of related samples and provides a focal point and description of the whole study11). Many of these series were collected at enormous effort only to be used once. We believe that dataset use and reuse can be significantly improved by recommending to researchers the research papers relevant to each dataset, an idea consistent with the NIH Strategic Plan for Data Science12. We experimented with and compared a variety of vector representations, from traditional term-frequency based methods and topic modeling to embeddings, and evaluated the recommendations using existing citations as a reference. The work described herein is part of the dataset re-usability platform (GETc Research Platform) developed at The University of Texas Health Science Center at Houston, available at http://genestudy.org.
Relevant work
CiteSeer5 is a content-based recommender based on keyword matching, using Term Frequency-Inverse Document Frequency (TF-IDF) for word information and Common Citation-Inverse Document Frequency (CCIDF) for citation information. Science Concierge6 is another content-based research article recommendation system, using Latent Semantic Analysis (LSA) and the Rocchio algorithm with large-scale approximate nearest neighbor search based on ball trees. PURE7 is another content-based PubMed article recommender, built on a finite mixture model for soft clustering with the Expectation-Maximization (EM) algorithm; it achieved 78.2% precision at 10% recall with 200 training articles. Lin and Wilbur developed pmra8, a probabilistic topic-based content similarity model for PubMed articles. Their method achieved a slight but statistically significant improvement in precision@5 compared to BM25.
With the popularity of NLP models such as Google's doc2vec, USE, and most recently BERT, there have been some efforts to incorporate these embedding methods into research paper recommenders. Collins and Beel9 experimented with doc2vec, TF-IDF, and key phrases for providing related-article recommendations to both the digital library Sowiport13 and the open-source reference manager JabRef14. Hassan et al.10 evaluated USE, InferSent, ELMo, BERT, and SciBERT for reranking results from BM25 for research paper recommendations.
Materials
We used data series from GEO and MEDLINE articles from PubMed. For GEO series, metadata such as the title, summary, date of publication, and names of authors were collected using a web crawler. We also collected the PMIDs of the articles associated with each series. From these PMIDs, metadata of the corresponding articles, such as the title, abstract, authors, affiliations, MeSH terms, and publisher name, were also collected. Figure 1 shows an example of a GEO data series, and Figure 2 shows an example of a PubMed publication.

Figure 1. An example of GEO data series

Figure 2. An example of PubMed publication.
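The series and publication metadata above were gathered with a web crawler. As an illustration only, a similar set of fields can be retrieved programmatically through NCBI's E-utilities; the endpoints below are real, but the query term and the JSON field names ('title', 'summary', 'pubmedids') are assumptions and may not match the crawler we actually used.

    import requests

    EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

    def fetch_geo_series_metadata(accession):
        """Look up a GEO series (e.g. 'GSE11663') in the 'gds' database and return
        its title, summary, and linked PubMed IDs (field names are assumed)."""
        # Resolve the accession to an internal UID.
        search = requests.get(f"{EUTILS}/esearch.fcgi",
                              params={"db": "gds", "term": f"{accession}[ACCN]",
                                      "retmode": "json"}).json()
        uid = search["esearchresult"]["idlist"][0]
        # Fetch the summary record for that UID.
        summary = requests.get(f"{EUTILS}/esummary.fcgi",
                               params={"db": "gds", "id": uid,
                                       "retmode": "json"}).json()
        doc = summary["result"][uid]
        return {"title": doc.get("title"),
                "summary": doc.get("summary"),
                "pubmed_ids": doc.get("pubmedids", [])}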


In order to automatically evaluate our recommendations using metrics such as precision and recall, we kept only the series that have associated citations (publications). That left us with a total of 72,971 unique series and 50,159 associated unique publications. Multiple series can reference the same paper(s); 96% of the series have only one related publication, and the rest have between 2 and 10.
Methods
We adopted an information retrieval strategy, in which the data series are treated as queries and the list of recommended publications as retrieved documents. In our experiments, series were represented by their titles and summaries, while publications were represented by their titles and abstracts. Further, we removed stop words, punctuation, and URLs from the summaries of series before transforming them into vectors.
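A minimal sketch of this preprocessing step is shown below; the paper does not specify its exact tokenizer or stop-word list, so NLTK's English stop words are assumed here.

    import re
    import string
    from nltk.corpus import stopwords  # assumes the NLTK stop word list has been downloaded

    STOP_WORDS = set(stopwords.words("english"))

    def preprocess(text):
        """Lowercase, strip URLs and punctuation, and drop stop words."""
        text = re.sub(r"https?://\S+", " ", text.lower())                  # remove URLs
        text = text.translate(str.maketrans("", "", string.punctuation))   # remove punctuation
        return " ".join(t for t in text.split() if t not in STOP_WORDS)    # remove stop words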
We used cosine similarity as the ranking score, a popular measure of feature similarity in query-document analysis15 owing to its low complexity and intuitive definition. We returned only the top 10 recommendations by cosine similarity, reflecting the realistic scenario in which few users check the end of a long recommendation list. Figure 3 shows our recommender's architecture.

Figure 3. Literature recommendation system architecture


The recommendations were then evaluated against the existing series-article relationships from the series metadata, using MRR@10, recall@1, recall@10, precision@1, and MAP@10.
Vector representation
Methods of representing textual data in recommendation systems range from traditional term-frequency based methods and topic modeling to embeddings. Below is the list of methods we experimented with in this study:
TF-IDF: a numerical statistic representing how important a word is to a document in a collection or corpus16. For each term V in the vocabulary, the value increases proportionally to the number of times V appears in the document (term frequency, TF) and is offset by the number of documents that contain V (inverse document frequency, IDF). We used the TF-IDF implementation from scikit-learn17.
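As a sketch of how this representation can be built with scikit-learn, the vocabulary is fit on the publication corpus and the series queries are projected into the same space; the default parameters shown are assumptions, not our exact settings.

    from sklearn.feature_extraction.text import TfidfVectorizer

    # publication_texts: preprocessed "title + abstract" strings
    # series_texts:      preprocessed "title + summary" strings (the queries)
    vectorizer = TfidfVectorizer()
    publication_vectors = vectorizer.fit_transform(publication_texts)  # sparse (n_publications, |V|)
    series_vectors = vectorizer.transform(series_texts)                # sparse (n_series, |V|)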
BM25: a ranking function based on a probabilistic retrieval framework that uses adjusted values of TF and IDF along with document length18. We used the BM25 implementation from gensim19.
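Because the BM25 module shipped with gensim has changed across versions, the sketch below uses the standalone rank_bm25 package as a stand-in to illustrate the same scoring idea; it is not the exact implementation used in our experiments.

    from rank_bm25 import BM25Okapi

    # publication_texts: preprocessed "title + abstract" strings
    tokenized_corpus = [doc.split() for doc in publication_texts]
    bm25 = BM25Okapi(tokenized_corpus)

    query_tokens = series_text.split()        # preprocessed "title + summary" of one series
    scores = bm25.get_scores(query_tokens)    # one BM25 score per publication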
LSA: a topic modeling technique that applies singular value decomposition (SVD) to a term-frequency matrix to find a low-rank approximate representation. We used TruncatedSVD from scikit-learn for the LSA implementation, with the reduced dimension set to 300.
word2vec20,21: a two-layer neural network trained to reconstruct the linguistic contexts of words by mapping each unique word to a corresponding vector. We used the word2vec implementation in gensim, with an embedding dimension of 200.
doc2vec21: a neural network method that extends word2vec and learns continuous distributed vector representations
for variable-length pieces of texts. We utilized doc2vec implemented in gensim, with an embedding dimension of 300.
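A sketch of training the doc2vec representation with gensim is given below; apart from the 300-dimensional vector size, the hyperparameters are assumptions. For word2vec, a common choice is to average the word vectors of a document, although the exact aggregation is not prescribed here.

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    # token_lists: preprocessed series and publication texts, each as a list of tokens
    tagged = [TaggedDocument(words=tokens, tags=[i]) for i, tokens in enumerate(token_lists)]
    model = Doc2Vec(tagged, vector_size=300, window=5, min_count=2, workers=4, epochs=20)

    # Infer a 300-dimensional vector for a (possibly unseen) document
    vector = model.infer_vector(query_tokens)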

ELMo22: a deep, contextualized bi-directional Long Short-Term Memory (LSTM) model that was pre-trained on the 1B Word Benchmark23. We used the latest TensorFlow Hub implementation24 of ELMo to obtain embeddings of 1024 dimensions.
InferSent25: a bi-directional LSTM encoder with max-pooling that was pre-trained on the supervised data of the Stanford Natural Language Inference (SNLI) corpus26. There are two versions of the InferSent model; we used the one with fastText word embeddings from Facebook's GitHub repository27, which yields an embedding dimension of 4096.
USE28: Universal Sentence Encoder, developed by Google, has two variations of model structures: one is transformer-
based while the other one is Deep Average Network (DAN)-based, both of which were pre-trained on unsupervised
data such as Wikipedia, web news and web question-answer pages, discussion forums, and further on supervised data
of SNLI. We used the TensorFlow Hub implementation of transformer USE to obtain embeddings of 512 dimensions.
BERT29: Bidirectional Encoder Representations from Transformers, developed by Google, which has achieved state-of-the-art performance on many classical natural language processing tasks. It was pre-trained on the 800M-word BooksCorpus and the 2,500M-word English Wikipedia using masked language modeling (MLM) and next sentence prediction (NSP) as the pre-training objectives. We used the Sentence-BERT30 package to obtain vectors optimized for the Semantic Textual Similarity (STS) task, which are of 768 dimensions.
SciBERT31: a BERT model that was further pre-trained on 1.14M full-paper corpus from semanticscholar.org32.
Similarly, we used Sentence-BERT to obtain vectors of 768 dimensions.
BioBERT33: a BERT model that was further pre-trained on large scale biomedical corpus, i.e. 4.5B-word PubMed
abstracts and 13.5B-word PubMed Central full-text articles. Similar to BERT, vectors of 768 dimensions were
obtained using Sentence-BERT.
RoBERTa34: a robustly optimized version of BERT that was further pre-trained on the CC-News35 corpus, with improved hyperparameter choices including batch sizes, number of epochs, and dynamic masking patterns in the pre-training process. We used Sentence-BERT to obtain vectors of 768 dimensions.
DistilBERT36: a distilled version of BERT that is 40% smaller and 60% faster while retaining 97% of the original performance. We used Sentence-BERT to obtain vectors of 768 dimensions.
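All of the BERT-family vectors above were obtained through Sentence-BERT. A minimal sketch with the sentence-transformers package is shown below; the checkpoint name is an assumption for an STS-tuned BERT model, and the SciBERT, BioBERT, RoBERTa, and DistilBERT variants would be swapped in analogously.

    from sentence_transformers import SentenceTransformer

    # An STS-optimized BERT checkpoint (name assumed); it yields 768-dimensional vectors
    model = SentenceTransformer("bert-base-nli-stsb-mean-tokens")

    publication_vectors = model.encode(publication_texts)  # array of shape (n_publications, 768)
    series_vectors = model.encode(series_texts)            # array of shape (n_series, 768)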
For all term-frequency based methods, the experiments were performed on 8 Intel(R) Xeon(R) Gold 6140 CPUs @ 2.30GHz. For embedding-based methods, the experiments were performed using 1 Tesla V100-PCIE-16GB GPU. The implementations of the experiments are available at https://github.com/chocolocked/RecommendersOfScholarlyPapers.
Evaluation metrics
The following metrics were used to evaluate our system:
Mean reciprocal rank (MRR)@k: The reciprocal rank (RR) is the reciprocal of the rank at which the first relevant document was retrieved; RR is 1 if the relevant document was retrieved at rank 1, 0.5 if it was retrieved at rank 2, and so on. When the reciprocal rank over the top k retrieved items is averaged across queries, the measure is called mean reciprocal rank@k37. In our case, we chose k=10.
Recall@k: At the k-th retrieved item, this metric measures the proportion of relevant items that are retrieved. We
evaluated both recall@1 and recall@10.
Precision@k: At the k-th retrieved item, this metric measures the proportion of the retrieved items that are relevant. In our case, we are interested in precision@1, since most of our data series have only one corresponding publication, i.e., only one relevant item.
Mean average precision (MAP)@k: Average precision is the average of the precision values obtained within the top k items each time a relevant document is retrieved. When average precision is averaged again over all queries, this value becomes the mean average precision.
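For a single query, these metrics can be computed directly from the ranked recommendation list and the set of cited publications. The sketch below follows the definitions above (function and variable names are illustrative):

    def metrics_at_k(recommended, relevant, k=10):
        """recommended: ranked list of PMIDs; relevant: set of cited PMIDs."""
        top_k = recommended[:k]
        hits = [1 if pmid in relevant else 0 for pmid in top_k]

        # Reciprocal rank of the first relevant item within the top k (0 if none)
        rr = next((1.0 / (i + 1) for i, h in enumerate(hits) if h), 0.0)

        recall_at_1 = hits[0] / len(relevant)
        recall_at_k = sum(hits) / len(relevant)
        precision_at_1 = float(hits[0])

        # Average precision over the top k: precision at each rank holding a relevant item,
        # averaged over the total number of relevant items
        precisions = [sum(hits[:i + 1]) / (i + 1) for i, h in enumerate(hits) if h]
        average_precision = sum(precisions) / len(relevant)

        return rr, recall_at_1, recall_at_k, precision_at_1, average_precision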
Detailed procedure example

Below, we demonstrate the detailed procedure using BM25 and data series ‘GSE11663’ as an example:
• For each of the 50,159 publications, we concatenated the processed title with the abstract. We then built a dictionary and corpus from this list and created a BM25 object.
• For ‘GSE11663’, we concatenated the title ('human cleavage stage embryos chromosomally unstable') and the
processed summary ('embryonic chromosome aberrations cause birth defects reduce human fertility however
neither nature incidence known develop assess genome-wide copy number variation loss heterozygosity single
cells apply screen blastomeres vitro fertilized preimplantation embryos complex patterns chromosome-arm
imbalances segmental deletions duplications amplifications reciprocal sister blastomeres detected large
proportion embryos addition aneuploidies uniparental isodisomies frequently observed since embryos derived
young fertile couples data indicate chromosomal instability common human embryogenesis comparative genomic
hybridisation') and got its vector representation using dictionary: [ (27, 1), (32, 1), (44, 1), (46, 1), (80, 1), (116,
1), (141, 1), (175, 1), (182, 1), (190, 1), (360, 2), (390, 1), (407, 1), (530, 1), (649, 1), (663, 1), (725, 1), (842, 1),
(844, 1), (999, 1), (1034, 1), (1186, 1), (1235, 1), (1370, 1), (1634, 1), (1635, 1), (1636, 1), (1761, 1), (1862, 1),
(2023, 1), (2174, 1), (2224, 1), (2292, 1), (2675, 1), (2677, 1), (3023, 1), (3082, 1), (3113, 1), (3144, 2), (3145,
2), (3153, 1), (3697, 1), (4265, 1), (4935, 1), (5021, 1), (5105, 1), (5775, 1), (6665, 1), (6772, 1), (6828, 1), (7298,
1), (7372, 1), (7684, 1), (7808, 1), (7949, 1), (8211, 1), (8344, 1), (8569, 2), (8974, 1), (9009, 1), (9302, 1), (9705,
1), (10480, 1), (11360, 1), (17139, 1), (24769, 1), (28560, 1), (38594, 1), (54855, 1), (228500, 1), (250370, 1) ].
Then we used scikit-learn's cosine_similarity to get similarity scores of all 50,159 publications with this series.
• Since ‘GSE11663’ has the citations ['19396175', '21854607'] (without order), and our top 10 recommendations were ['19396175', '23526301', '16698960', '25475586', '29040498', '23054067', '27197242', '23136300', '24035391', '18713793'], our recommendation hit at rank 1. In this case, we calculated:
MRR@10 = 1/1 = 1, recall@1 = 1/3 = 0.33, recall@10 = 1/3 = 0.33, precision@1 = 1/1 = 1, and MAP@10 = (1/2) × (1 + 0) = 0.5.
• We repeated the above two steps for all 72,971 series and computed the averages.
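Putting the pieces together, the evaluation reduces to a loop of score, rank, and measure over every series. The compact sketch below reuses the helpers sketched earlier; the container names (series_vectors, publication_ids, cited_pmids) are illustrative.

    import numpy as np
    from sklearn.metrics.pairwise import cosine_similarity

    per_series_metrics = []
    for series_id, series_vector in series_vectors.items():       # 72,971 series
        sims = cosine_similarity(series_vector, publication_vectors).ravel()
        top10 = [publication_ids[i] for i in np.argsort(sims)[::-1][:10]]
        per_series_metrics.append(metrics_at_k(top10, cited_pmids[series_id], k=10))

    # Average each metric across series: MRR@10, recall@1, recall@10, precision@1, MAP@10
    mrr10, recall1, recall10, precision1, map10 = np.mean(per_series_metrics, axis=0)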
Results
Table 1 shows the results of our experiments with different vector representations. BM25 outperformed all other
methods in terms of all evaluation metrics, with MRR@10, recall@1, recall@10, precision@1, and MAP@10 of
0.785, 0.742, 0.833, 0.756, and 0.783 respectively, followed closely by TF-IDF. None of the embedding methods
alone was able to outperform BM25. Furthermore, word2vec, doc2vec, and BioBERT were among the top embedding
methods outperforming ELMo, USE, and the rest.
Our findings show that traditional term-frequency based methods (BM25, TF-IDF) were more effective for these recommendations than embedding methods. This contrasts with the previous belief that embeddings can conquer it all, given their performance on standardized general NLP tasks such as sentiment analysis, question answering (Q&A), and named entity recognition (NER); they failed to show an advantage in the simple scenario of capturing semantic similarity as measured by cosine similarity. Even though the context was not exactly the same, Collins and Beel9 likewise found in their studies that doc2vec failed to beat TF-IDF or key phrases in the two experimental setups of publication recommendation for the digital library Sowiport and the reference manager JabRef. Moreover, Hassan et al.10 also concluded in their study that none of the sentence embeddings they employed (USE, InferSent, ELMo, BERT, and SciBERT) were able to outperform BM25 alone for their research paper recommendations.

One possible reason could be that traditional statistical methods produce better features when the queries are relatively homogeneous: Ogilvie and Callan38 showed that single-database (homogeneous) queries with TF-IDF performed uniformly better than multi-database (heterogeneous) queries when no additional IR techniques, such as query expansion, were involved. Currently, we are only using GEO datasets as queries, all of which relate to gene expression; but as we introduce more diverse datasets to our platform in the future, e.g., immunology and infectious disease datasets, the heterogeneity might require more advanced embedding methods.
Table 1. MRR@10, Recall@1, Recall@10, Precision@1, and MAP@10 for recommenders using different vector representations.

Vector representation                                          MRR@10  Recall@1  Recall@10  Precision@1  MAP@10
Term-frequency based & topic modeling
    TF-IDF                                                     0.721   0.655     0.803      0.677        0.719
    BM25                                                       0.785   0.742     0.833      0.756        0.783
    LSA                                                        0.565   0.518     0.640      0.528        0.564
Embedding based: not pretrained on NLP tasks
    word2vec                                                   0.656   0.615     0.712      0.626        0.655
    doc2vec                                                    0.601   0.562     0.655      0.572        0.600
Embedding based: context dependent, pretrained on NLP tasks
    ELMo                                                       0.364   0.341     0.400      0.346        0.364
    InferSent                                                  0.534   0.502     0.579      0.511        0.534
    USE                                                        0.411   0.377     0.468      0.383        0.411
    BERT                                                       0.503   0.468     0.563      0.476        0.505
    SciBERT                                                    0.435   0.399     0.493      0.406        0.434
    BioBERT                                                    0.540   0.498     0.605      0.507        0.539
    RoBERTa                                                    0.509   0.468     0.572      0.476        0.508
    DistilBERT                                                 0.501   0.463     0.558      0.471        0.500

Further, as we observe an approximately 8% improvement from regular BERT to BioBERT, we think it might be important for NLP models to be further pre-trained on a domain-specific corpus to obtain better feature representations for cosine similarity. Another possible reason could be that, because these embeddings were pre-trained on standardized tasks, they might be specialized towards those tasks rather than towards representing simple semantic information. This could explain the observation that general text embeddings, e.g., word2vec and doc2vec, perform better than more specialized NLP models, e.g., ELMo and BERT, which were pre-trained to perform tasks such as Q&A and sequence classification. Therefore, we might be able to take full advantage of their potential by reformulating our problem from a simple cosine similarity between queries and documents to, for example, a matching classification, a format closer to what these models were designed for in the first place. That is also the direction we are heading towards in future experiments.
Even though we do not currently have users' feedback for manual evaluation, we did manually inspect the recommendation results for the completeness of our experiments, particularly for those cases where the cited articles did not appear within the top 5 recommendations. We randomly sampled 20 such data series and examined the recommended papers by thoroughly reading through their abstracts, introductions, and methods. We had some interesting observations regarding those cases. For example, for the ‘GSE96174’ data series, even though our top 5 recommendations did not include the existing related article, three of them actually cited and used the data series as relevant research material. Another example is ‘GSE27139’, where our top recommendations were from the same author who submitted the data series, and those articles were extensions of their previous research work. Due to time limitations, we could not check all 13,013 such cases, but we found at least 10 cases (‘GSE96174’, ‘GSE836’, ‘GSE92231’, ‘GSE78579’, ‘GSE96211’, ‘GSE27139’, ‘GSE10903’, ‘GSE105628’, ‘GSE44657’, ‘GSE81888’) with similar situations as mentioned above, where the top 3 recommendations were, to the best of our judgement, associated with the data series in question even though they did not appear in its citations at the time of our experiments. Therefore, we believe that our recommendation system might do even better in a real setting than the evaluations presented here suggest.
We also experimented with re-ranking, where the final ranking score is defined as the original cosine similarity plus a re-ranking score computed as the cosine similarity between the title of the queried dataset and the title of each article. We did not find statistically significant improvements and therefore do not report those results in this paper.
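In sketch form, the re-ranking variant simply adds a title-only similarity term to the original score (equal weighting, as described above; the vector names are illustrative):

    from sklearn.metrics.pairwise import cosine_similarity

    # final score = cos(series full text, article full text) + cos(series title, article title)
    final_score = (cosine_similarity(series_text_vec, article_text_vec)[0, 0]
                   + cosine_similarity(series_title_vec, article_title_vec)[0, 0])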

Discussion
In this work, we developed a scholarly recommendation system to identify and recommend research papers relevant to public datasets. The sources of papers and datasets are PubMed and Gene Expression Omnibus (GEO) series, respectively. Different techniques for representing textual data, ranging from traditional term-frequency based methods and topic modeling to embeddings, were employed and compared. Our results show that embedding models that perform well on their standardized NLP tasks failed to outperform term-frequency based probabilistic methods such as BM25. General embeddings (word2vec and doc2vec) performed better than more specialized embeddings (ELMo and BERT), and domain-specific embeddings (BioBERT) performed better than non-domain-specific embeddings (BERT). In future experiments, we plan to develop a hybrid method combining the strengths of term-frequency approaches and embeddings to maximize their potential in different (heterogeneous vs. homogeneous) problem scenarios. In addition, we plan to engage users in rating our recommendations, use an inter-rater agreement approach to further evaluate the results, and incorporate the feedback to further improve our system. We also hope to combine content-based and collaborative filtering for better recommendations.
Given their usefulness, extending the applications of recommender systems to aid scholars in finding relevant information and resources will significantly enhance research productivity and will ultimately promote data and resource reusability.

References
1. Ali M, Johnson CC, Tang AK. Parallel collaborative filtering for streaming data. University of Texas Austin,
Tech. Rep. 2011 Dec 8:5-7.
2. Bell RM, Koren Y. Lessons from the Netflix prize challenge. Acm Sigkdd Explorations Newsletter. 2007 Dec
1;9(2):75-9.
3. Vaz PC, Martins de Matos D, Martins B, Calado P. Improving a hybrid literary book recommendation system through author ranking. In: Proceedings of the 12th ACM/IEEE-CS Joint Conference on Digital Libraries; 2012 Jun 10. p. 387-388.
4. Li L, Chu W, Langford J, Schapire RE. A contextual-bandit approach to personalized news article recommendation. In: Proceedings of the 19th International Conference on World Wide Web; 2010 Apr 26. p. 661-670.
5. Bollacker KD, Lawrence S, Giles CL. CiteSeer: An autonomous web agent for automatic retrieval and identification of interesting publications. In: Proceedings of the Second International Conference on Autonomous Agents; 1998 May 1. p. 116-123.
6. Achakulvisut T, Acuna DE, Ruangrong T, Kording K. Science Concierge: A fast content-based recommendation
system for scientific publications. PloS one. 2016 Jul 6;11(7):e0158423.
7. Yoneya T, Mamitsuka H. PURE: a PubMed article recommendation system based on content-based filtering.
Genome informatics. 2007;18:267-76.
8. Lin J, Wilbur WJ. PubMed related articles: a probabilistic topic-based model for content similarity. BMC
bioinformatics. 2007 Dec 1;8(1):423.
9. Collins A, Beel J. Document Embeddings vs. Keyphrases vs. Terms: An Online Evaluation in Digital Library
Recommender Systems. arXiv preprint arXiv:1905.11244. 2019 May 27.
10. Hassan HA, Sansonetti G, Gasparetti F, Micarelli A, Beel J. BERT, ELMo, USE and InferSent Sentence Encoders: The Panacea for Research-Paper Recommendation? In: RecSys (Late-Breaking Results); 2019. p. 6-10.
11. About GEO datasets [Internet]. GEO. 2020 [cited 18 August 2020]. Available from: https://www.ncbi.nlm.nih.gov/geo/info/datasets.html.
12. NIH strategic plan for data science [Internet]. National Institutes of Health. 2020 [cited 18 August 2020]. Available from: https://datascience.nih.gov/strategicplan.
13. Hienert D, Sawitzki F, Mayr P. Digital library research in action–supporting information retrieval in sowiport. D-
Lib Magazine. 2015 Mar 4;21(3):4.
14. Kopp O, Breitenbücher U, Müller T. CloudRef - Towards Collaborative Reference Management in the Cloud. In: ZEUS; 2018. p. 63-68.
15. Han J, Kamber M, Pei J. Getting to know your data. Data mining: concepts and techniques. 2011;3(744):39-81.
16. Rajaraman A, Ullman JD. Data mining. In: mining of massive datasets. Cambridge: Cambridge University Press;
2011. p. 1–17.

17. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: Machine learning in Python. the Journal of machine
Learning research. 2011 Nov 1;12:2825-30.
18. Robertson SE, Walker S, Jones S, Hancock-Beaulieu MM, Gatford M. Okapi at TREC-3. Nist special publication
Sp 109 (1995): 109
19. Rehurek R, Sojka P. Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks; 2010.
20. Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. arXiv
preprint arXiv:1301.3781. 2013 Jan 16.
21. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems; 2013. p. 3111-3119.
22. Peters ME, Neumann M, Iyyer M, et al. Deep contextualized word representations. arXiv preprint
arXiv:1802.05365. 2018 Feb 15.
23. Chelba C, Mikolov T, Schuster M, et al. One billion word benchmark for measuring progress in statistical
language modeling. arXiv preprint arXiv:1312.3005. 2013 Dec 11.
24. Abadi M, Agarwal A, Barham P, et al. TensorFlow: Large-scale machine learning on heterogeneous systems.
25. Conneau A, Kiela D, Schwenk H, Barrault L, Bordes A. Supervised learning of universal sentence representations
from natural language inference data. arXiv preprint arXiv:1705.02364. 2017 May 5.
26. Bowman SR, Angeli G, Potts C, Manning CD. A large annotated corpus for learning natural language inference.
arXiv preprint arXiv:1508.05326. 2015 Aug 21.
27. Facebookresearch / InferSent [Internet]. GitHub repository. 2020 [cited 18 August 2020]. Available from: https://github.com/facebookresearch/InferSent.
28. Cer D, Yang Y, Kong SY, et al. Universal sentence encoder. arXiv preprint arXiv:1803.11175. 2018 Mar 29.
29. Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language
understanding. arXiv preprint arXiv:1810.04805. 2018 Oct 11.
30. Reimers N, Gurevych I. Sentence-bert: Sentence embeddings using siamese bert-networks. arXiv preprint
arXiv:1908.10084. 2019 Aug 27.
31. Beltagy I, Lo K, Cohan A. SciBERT: A pretrained language model for scientific text. arXiv preprint
arXiv:1903.10676. 2019 Mar 26.
32. Semantic Scholar | AI-Powered Research Tool [Internet]. Semanticscholar.org. 2020 [cited 18 August 2020]. Available from: https://www.semanticscholar.org/
33. Lee J, Yoon W, Kim S, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. arXiv preprint arXiv:1901.08746. 2019.
34. Liu Y, Ott M, Goyal N, et al. Roberta: A robustly optimized bert pretraining approach. arXiv preprint
arXiv:1907.11692. 2019 Jul 26.
35. Mackenzie J, Benham R, Petri M, Trippas JR, Culpepper JS, Moffat A. CC-News-En: A Large English News
Corpus.
36. Sanh V, Debut L, Chaumond J, Wolf T. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and
lighter. arXiv preprint arXiv:1910.01108. 2019 Oct 2.
37. Craswell N. Mean reciprocal rank. Encyclopedia of database systems. 2009;1703.
38. Ogilvie P, Callan J. The effectiveness of query expansion for distributed information retrieval. In Proceedings of
the tenth international conference on Information and knowledge management, pp. 183-190. 2001
