Harnessing Retrieval Augmented Generatio
2 Related Work
Yom-Tov et al. [14] present an algorithm to estimate query difficulty. The estimate is based on the agreement between the top results of the full query and the top results of its sub-queries. Difficult queries, in turn, reveal gaps in a content library. The methodology relies on training an estimator on a small dataset. We argue that there are now simpler LLM prompting techniques that do not require training a custom model and that generalise better across multiple domains.

Figure 1: Iteration loop to find knowledge gaps

Our approach utilises AskPandi [12], a Retrieval-Augmented Generation (RAG) system, to mimic user behaviour. AskPandi integrates Bing's web index for data retrieval and GPT as a reasoning engine. After finding an answer, we capitalise on the in-context capabilities [5, 6, 7] of LLMs to generate a series of relevant follow-up questions. This process is guided by the premise that a well-generalised [8] LLM should provide useful
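The iteration loop of Figure 1 can be illustrated with a minimal Python sketch. It is not the AskPandi implementation: the names `find_knowledge_gap`, `answer`, `follow_ups`, and `max_depth` are hypothetical, the RAG system and the LLM are passed in as plain callables, and always following the first suggested question is an assumption made for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class GapResult:
    depth: int        # number of questions asked when the loop stopped
    question: str     # the unanswered question ("" if no gap was found)
    found_gap: bool   # True if an unanswerable question was reached


def find_knowledge_gap(
    seed_question: str,
    answer: Callable[[str], Optional[str]],
    follow_ups: Callable[[str, str], Sequence[str]],
    max_depth: int = 10,
) -> GapResult:
    """Follow a chain of follow-up questions until one cannot be answered.

    `answer` stands in for the RAG system and returns None when no answer
    is found. `follow_ups` stands in for the LLM that proposes follow-up
    questions given the previous question and its answer.
    """
    question = seed_question
    depth = 0
    for depth in range(1, max_depth + 1):
        reply = answer(question)
        if reply is None:
            # The system could not answer: treat this as a knowledge gap.
            return GapResult(depth, question, True)
        candidates = follow_ups(question, reply)
        if not candidates:
            break  # no further questions to explore
        # Assumption: always follow the first suggested question.
        question = candidates[0]
    return GapResult(depth, "", False)


if __name__ == "__main__":
    # Toy stand-ins for the retrieval system and the question generator.
    toy_answers = {"q1": "a1", "q2": "a2"}
    toy_chain = {"q1": ["q2"], "q2": ["q3"]}
    result = find_knowledge_gap(
        "q1",
        answer=toy_answers.get,
        follow_ups=lambda q, a: toy_chain.get(q, []),
    )
    print(result)
```

On the toy data, the loop answers `q1` and `q2` and stops at `q3`, so the gap is reported at depth 3. In a real setting, `answer` would wrap a retrieval-backed LLM call and `follow_ups` a prompt asking for related questions; neither interface is specified in the text.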
Additionally, we discovered that, on average, a knowledge gap is encountered at the fifth level of topic depth. This suggests that the internet may have limitations in providing in-depth information on certain subjects. Our methodology effectively highlights these knowledge gaps, showing a straightforward approach to identifying them in various topics.

6 Applications

Recommending nonexistent content is a powerful tool for revealing knowledge gaps. This approach has a wide range of applications, including:

1. Scientific Discovery: It can pinpoint unexplored areas in research, highlighting future research topics that have yet to be investigated.
2. Educational Enhancement: By identifying missing elements in learning materials, it helps in creating more comprehensive educational resources.
3. Research Development: This method can uncover untapped research opportunities, guiding scholars and scientists towards novel inquiries.
4. Market Analysis: In the business realm, it can reveal product gaps in a catalogue, offering insights for new product development.

It is worth pointing out that we do not have direct access to a web index, which prevents a more rigorous evaluation. Future work could consider the system's ability to predict whether a query is an MCQ (missing content query) [14] given gold-standard labels (perhaps using a TREC-style test collection and removing the relevant documents from the collection for some queries).

REFERENCES

[1] Dmitri Brereton. 2022. Google Search Is Dying. Published on February 15, 2022. [Online]. Available: https://dkb.io/post/google-search-is-dying
[2] Edwin Chen. 2022. Is Google Search Deteriorating? Measuring Google's Search Quality in 2022. Published on January 10, 2022. [Online]. Available: https://www.surgehq.ai/blog/is-google-search-deteriorating-measuring-search-quality-in-2022
[3] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. 2021. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv:2005.11401 [cs.CL].
[4] Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language Models are Unsupervised Multitask Learners. In Proceedings of the 2019 Conference. [Online]. Available: https://api.semanticscholar.org/CorpusID:160025533
[5] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed H. Chi, Quoc Le, and Denny Zhou. 2022. Chain of Thought Prompting Elicits Reasoning in Large Language Models. CoRR, abs/2201.11903. [Online]. Available: https://arxiv.org/abs/2201.11903
[6] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher