Evaluating Document Representations for Content-based Legal Literature Recommendations

Ostendorff, Malte; Ash, Elliott; Ruas, Terry; Gipp, Bela; Moreno-Schneider, Julian; Rehm, Georg

arXiv:2104.13841 (cs)

[Submitted on 28 Apr 2021]

Abstract: Recommender systems assist legal professionals in finding relevant literature to support their cases. Despite their importance for the profession, legal applications do not reflect the latest advances in recommender systems and representation learning research. At the same time, legal recommender systems are typically evaluated in small-scale user studies without any publicly available benchmark datasets, so these studies have limited reproducibility. To address the gap between research and practice, we explore a set of state-of-the-art document representation methods for the task of retrieving semantically related US case law. We evaluate text-based (e.g., fastText, Transformers), citation-based (e.g., DeepWalk, Poincaré), and hybrid methods. In total, we compare 27 methods using two silver standards with annotations for 2,964 documents. The silver standards are newly created from Open Case Book and Wikisource and can be reused under an open license, which facilitates reproducibility. Our experiments show that document representations from averaged fastText word vectors (trained on legal corpora) yield the best results, closely followed by Poincaré citation embeddings. Combining fastText and Poincaré in a hybrid manner further improves the overall result. Besides the overall performance, we analyze the methods with respect to document length, citation count, and the coverage of their recommendations. We make our source code, models, and datasets publicly available at this https URL.
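
As a rough illustration of the ideas named in the abstract (averaged word vectors, a hybrid of text and citation representations, and similarity-based retrieval), the following minimal sketch may help. It is not the authors' implementation: the toy vectors, the document IDs, the helper names, the use of concatenation as the hybrid step, and the use of cosine similarity are all assumptions made for illustration. In particular, real fastText vectors would come from a trained model and Poincaré embeddings live in hyperbolic space; here both are replaced by hand-written two-dimensional lists.

```python
import math

def average_vectors(tokens, word_vectors, dim):
    """Mean of the vectors of known tokens; zero vector if none are known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def hybrid_vector(text_vec, citation_vec):
    """Assumed hybrid: concatenate the normalized text and citation vectors."""
    return l2_normalize(text_vec) + l2_normalize(citation_vec)

def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0

def recommend(seed_id, doc_vectors, k=5):
    """Return the k documents most similar to the seed, excluding the seed."""
    seed = doc_vectors[seed_id]
    scored = [
        (other_id, cosine(seed, vec))
        for other_id, vec in doc_vectors.items()
        if other_id != seed_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other_id for other_id, _ in scored[:k]]

# Hypothetical toy data standing in for trained word and citation embeddings.
WORD_VECTORS = {
    "contract": [1.0, 0.0],
    "breach": [0.8, 0.2],
    "search": [0.0, 1.0],
    "warrant": [0.1, 0.9],
}
DOCS = {
    "case_a": ["contract", "breach"],
    "case_b": ["breach", "contract", "damages"],  # "damages" is out of vocabulary
    "case_c": ["search", "warrant"],
}
CITATION_VECTORS = {
    "case_a": [0.9, 0.1],
    "case_b": [0.8, 0.3],
    "case_c": [-0.2, 0.9],
}

doc_vectors = {
    doc_id: hybrid_vector(
        average_vectors(tokens, WORD_VECTORS, 2), CITATION_VECTORS[doc_id]
    )
    for doc_id, tokens in DOCS.items()
}
top_match = recommend("case_a", doc_vectors, k=1)
```

In this toy setup, `case_a` and `case_b` share both vocabulary and a similar citation vector, so `case_b` is ranked first for the seed `case_a`. Normalizing each part before concatenation is one simple way to keep either signal from dominating; other combination schemes are equally plausible.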

Comments:	Accepted for publication at ICAIL 2021
Subjects:	Computation and Language (cs.CL); Information Retrieval (cs.IR)
Cite as:	arXiv:2104.13841 [cs.CL]
	(or arXiv:2104.13841v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2104.13841

Submission history

From: Malte Ostendorff
[v1] Wed, 28 Apr 2021 15:48:19 UTC (1,136 KB)
