RobustQA: Benchmarking the Robustness of Domain Adaptation for Open-Domain Question Answering

Rujun Han, Peng Qi, Yuhao Zhang, Lan Liu, Juliette Burger, William Yang Wang, Zhiheng Huang, Bing Xiang, Dan Roth


Abstract
Open-domain question answering (ODQA) is a crucial task in natural language processing. A typical ODQA system relies on a retriever module to select relevant contexts from a large corpus for a downstream reading comprehension model. Existing ODQA datasets draw mainly on the Wikipedia corpus and are insufficient for studying models’ generalizability across diverse domains, because models are trained and evaluated on the same genre of data. We propose **RobustQA**, a novel benchmark consisting of datasets from 8 different domains, which facilitates the evaluation of ODQA’s domain robustness. To build **RobustQA**, we annotate QA pairs in retrieval datasets with rigorous quality control. We further examine improving QA performance by incorporating unsupervised learning methods with a target-domain corpus and adopting large generative language models. These methods can effectively improve model performance on **RobustQA**. However, experimental results demonstrate a significant gap from in-domain training, suggesting that **RobustQA** is a challenging benchmark for evaluating ODQA domain robustness.
Anthology ID:
2023.findings-acl.263
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
4294–4311
URL:
https://aclanthology.org/2023.findings-acl.263
DOI:
10.18653/v1/2023.findings-acl.263
Cite (ACL):
Rujun Han, Peng Qi, Yuhao Zhang, Lan Liu, Juliette Burger, William Yang Wang, Zhiheng Huang, Bing Xiang, and Dan Roth. 2023. RobustQA: Benchmarking the Robustness of Domain Adaptation for Open-Domain Question Answering. In Findings of the Association for Computational Linguistics: ACL 2023, pages 4294–4311, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
RobustQA: Benchmarking the Robustness of Domain Adaptation for Open-Domain Question Answering (Han et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.263.pdf
Video:
https://aclanthology.org/2023.findings-acl.263.mp4