InPars-Light: Cost-Effective Unsupervised Training of Efficient Rankers

Boytsov, Leonid; Patel, Preksha; Sourabh, Vivek; Nisar, Riddhi; Kundu, Sayani; Ramanathan, Ramya; Nyberg, Eric

arXiv:2301.02998 (cs)

[Submitted on 8 Jan 2023 (v1), last revised 21 Feb 2024 (this version, v2)]

Abstract: We carried out a reproducibility study of InPars, which is a method for unsupervised training of neural rankers (Bonifacio et al., 2022). As a by-product, we developed InPars-light, which is a simple-yet-effective modification of InPars. Unlike InPars, InPars-light uses 7x-100x smaller ranking models and only a freely available language model BLOOM, which -- as we found out -- produced more accurate rankers compared to a proprietary GPT-3 model. On all five English retrieval collections (used in the original InPars study) we obtained substantial (7%-30%) and statistically significant improvements over BM25 (in nDCG and MRR) using only a 30M parameter six-layer MiniLM-30M ranker and a single three-shot prompt. In contrast, in the InPars study only a 100x larger monoT5-3B model consistently outperformed BM25, whereas their smaller monoT5-220M model (which is still 7x larger than our MiniLM ranker) outperformed BM25 only on MS MARCO and TREC DL 2020. In the same three-shot prompting scenario, our 435M parameter DeBERTA v3 ranker was on par with the 7x larger monoT5-3B (average gain over BM25 of 1.3 vs 1.32): in fact, on three out of five datasets, DeBERTA slightly outperformed monoT5-3B. Finally, these good results were achieved by re-ranking only 100 candidate documents compared to 1000 used by Bonifacio et al. (2022). We believe that InPars-light is the first truly cost-effective prompt-based unsupervised recipe to train and deploy neural ranking models that outperform BM25. Our code and data are publicly available at this https URL.
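To make the evaluation terms in the abstract concrete, the following is a minimal Python sketch of the standard MRR and nDCG@k definitions, of re-ranking only the top 100 first-stage candidates, and of a gain over BM25 expressed as a ratio. It is not the authors' code. The function names, the linear-gain form of DCG, the ideal ranking computed only from the returned list, and the ratio reading of "gain over BM25" are all assumptions made for illustration.

```python
import math


def mrr(ranked_relevance):
    """Reciprocal rank of the first relevant item; 0.0 if none is relevant."""
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def dcg(gains, k):
    """Discounted cumulative gain at cutoff k (linear gain: an assumption)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg(ranked_gains, k):
    """nDCG@k. Simplification: the ideal ordering uses only the returned list,
    not every judged document for the query."""
    ideal = dcg(sorted(ranked_gains, reverse=True), k)
    return dcg(ranked_gains, k) / ideal if ideal > 0 else 0.0


def rerank(candidates, score_fn, depth=100):
    """Re-score only the top `depth` first-stage (e.g. BM25) candidates.
    Candidates below `depth` keep their first-stage order."""
    head = sorted(candidates[:depth], key=score_fn, reverse=True)
    return head + candidates[depth:]


def relative_gain(ranker_metric, bm25_metric):
    """Gain over BM25 as a ratio (hypothetical reading of the abstract's figure)."""
    return ranker_metric / bm25_metric
```

With `depth=100`, the neural scorer (`score_fn`, a hypothetical callable) runs on a tenth as many documents per query as with a depth of 1000, which is where the inference cost saving mentioned in the abstract comes from.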

Subjects:	Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2301.02998 [cs.IR]
	(or arXiv:2301.02998v2 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2301.02998

Submission history

From: Leonid Boytsov
[v1] Sun, 8 Jan 2023 08:03:46 UTC (73 KB)
[v2] Wed, 21 Feb 2024 04:20:55 UTC (65 KB)
