Study of Question Answering On
Researchers have devoted significant effort to Question Answering systems in the legal domain in recent times. According to the study by Martinez-Gil (2021), Deep Learning models achieved the best results. The most recent research successes in Question Answering and Legal Question Answering (LQA) have come from neural attentive text representations, Few-Shot Learning in the legal domain, and diverse applications of the successful BERT model (Devlin et al., 2018; Martinez-Gil, 2021).

There are various datasets for Question Answering tasks in the general-purpose domain, and SQuAD is the most recognized because of its benchmark results (Rajpurkar et al., 2018). In the legal domain, JEC-QA (Zhong et al., 2020), ResPubliQA (Peñas et al., 2009), and the JRC-ACQUIS Multilingual Parallel Corpus (Steinberger et al., 2006) are well recognized. Among legal-domain Question Answering datasets for software development, PolicyQA (Ahmad et al., 2020) is one of the most well known and is compatible with the SQuAD dataset format. However, studies of how significant general Question Answering benchmarks perform on these domain-specific datasets are limited (Martinez-Gil, 2021).

To the best of our knowledge, this is the first work to study performance on the PolicyQA dataset using only variants of the BERT encoder architecture besides the original BERT. The best results reported so far are those of Ahmad et al. (2020), who used the original BERT model and obtained 29.5 Exact Match (EM) and 56.11 F1. Our study shows that ALBERT is a better encoder and obtains the best results on the PolicyQA dataset.

The PolicyQA dataset is a reading comprehension dataset containing 25,017 reading comprehension style examples curated from an existing corpus of 115 website privacy policies. PolicyQA provides 714 human-annotated questions covering a wide range of privacy practices (Ahmad et al., 2020).

Both datasets are designed for extractive Question Answering, where the answer is a span of text in the passage. In addition, the passage might be unrelated to the question and might not contain the answer.

3.2 Models

We selected a set of the most widely used BERT-related models with outstanding performance on the SQuAD dataset: ALBERT, RoBERTa, and classic BERT. Additionally, we used the DistilBERT model because it is a cheaper and smaller model with competitive capabilities compared to bigger BERT-based models, which makes DistilBERT a feasible choice for inference speed and usability on devices (Sanh et al., 2019). Moreover, we tested the LEGAL-BERT model, a version of the original BERT model trained from scratch on legal documents (Chalkidis et al., 2020). We compared it with the other general-purpose models to see whether a legal-domain BERT could obtain better results on the PolicyQA dataset than models trained on general text.

4 Experiments

We conducted our experiments using the pretrained versions of BERT, ALBERT, RoBERTa, DistilBERT, and LEGAL-BERT. The benchmark in the Question Answering (QA) task is evaluated using Exact Match (EM) and F1.
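All of the compared models treat the task extractively: the encoder produces a start score and an end score for every passage token, and the answer is the highest-scoring valid span. A minimal pure-Python sketch of that span-selection step (the tokens and scores below are hypothetical illustrations, not actual model outputs):

```python
def best_span(start_scores, end_scores, max_len=30):
    """Pick the (start, end) token pair maximizing start+end score,
    subject to start <= end and a maximum span length."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best

# Hypothetical token-level scores for a 6-token passage.
tokens = ["the", "policy", "retains", "data", "for", "days"]
start = [0.1, 0.2, 0.1, 2.5, 0.3, 0.1]
end = [0.0, 0.1, 0.2, 0.4, 0.9, 2.2]
s, e = best_span(start, end)
print(" ".join(tokens[s:e + 1]))  # -> data for days
```

Real implementations run this search over logits from the model head, mask out question tokens, and (for SQuAD v2.0-style data) compare the best span score against a no-answer score.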
Dataset (epochs)         BERT         ALBERT       LEGAL-BERT   RoBERTa      DistilBERT
                         EM     F1    EM     F1    EM     F1    EM     F1    EM     F1
SQuAD v2.0 (5 epochs)    71.6   77.39 73.9   77.91 73.5   77.01 76.9   80.1  65.47  69.27
PolicyQA (5 epochs)      29.5   56.11 28.76  57.36 28.08  54.66 27.23  54.88 25.42  52.34
SQuAD v2.0 (10 epochs)   71.7   75.39 72.71  77.21 71.5   75.30 75.18  74.30 64.79  68.93
PolicyQA (10 epochs)     29.6   57.02 29.7   58.43 28.45  55.01 27.85  54.91 25.81  52.48
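The EM and F1 numbers reported above follow the standard SQuAD-style evaluation: answers are normalized, EM checks string equality, and F1 measures token overlap between prediction and gold answer. A pure-Python sketch of those two metrics (this mirrors the usual SQuAD normalization; it is not necessarily the exact evaluation script used here):

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace
    (the standard SQuAD answer normalization)."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))

def f1_score(pred, gold):
    """Token-level F1 between normalized prediction and gold answer."""
    pred_toks, gold_toks = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The data", "data"))  # 1.0
print(round(f1_score("retains data for 30 days", "data for 30 days"), 3))  # 0.889
```

Over a dataset, both metrics are averaged across questions, taking the maximum score over the gold answers for each question.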
References

Wasi Uddin Ahmad, Jianfeng Chi, Yuan Tian, and Kai-Wei Chang. 2020. PolicyQA: A reading comprehension dataset for privacy policies. arXiv preprint arXiv:2010.02557.

Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, and Ion Androutsopoulos. 2020. LEGAL-BERT: The muppets straight out of law school. arXiv preprint arXiv:2010.02559.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Sanjay K Dwivedi and Vaishali Singh. 2013. Research and reviews in question answering system. Procedia Technology, 10:417–424.

Daniel Jurafsky and James H Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition.

Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. ALBERT: A lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942.

Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. 2020. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4):1234–1240.

Anselmo Peñas, Pamela Forner, Richard Sutcliffe, Álvaro Rodrigo, Corina Forăscu, Iñaki Alegria, Danilo Giampiccolo, Nicolas Moreau, and Petya Osenova. 2009. Overview of ResPubliQA 2009: Question answering evaluation over European legislation. In Workshop of the Cross-Language Evaluation Forum for European Languages, pages 174–196. Springer.

Pranav Rajpurkar, Robin Jia, and Percy Liang. 2018. Know what you don't know: Unanswerable questions for SQuAD. arXiv preprint arXiv:1806.03822.

Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250.

Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.

Ralf Steinberger, Bruno Pouliquen, Anna Widiger, Camelia Ignat, Tomaz Erjavec, Dan Tufis, and Dániel Varga. 2006. The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages. arXiv preprint cs/0609058.

Haoxi Zhong, Chaojun Xiao, Cunchao Tu, Tianyang Zhang, Zhiyuan Liu, and Maosong Sun. 2020. JEC-QA: A legal-domain question answering dataset. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 9701–9708.