2022 IEEE The 5th International Conference on Artificial Intelligence and Big Data

A Comparison Study of Pre-trained Language Models for Chinese Legal Document Classification
2022 5th International Conference on Artificial Intelligence and Big Data (ICAIBD) | 978-1-6654-9913-2/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICAIBD55127.2022.9820466

Ruyu Qin
School of Artificial Intelligence, University of Chinese Academy of Sciences
Beijing, China
[email protected]

Min Huang*
School of Artificial Intelligence, University of Chinese Academy of Sciences
Beijing, China
[email protected]

Yutong Luo
No.2 High School of East China Normal University
Shanghai, China
[email protected]

* Corresponding author.

Abstract—Legal artificial intelligence (LegalAI), which aims to benefit the legal domain through artificial intelligence technologies, is currently a hot topic. As the basis for various LegalAI tasks such as judgment prediction and similar case matching, the classification of legal documents is an issue that has to be addressed. The majority of current approaches focus on the legal systems of native English-speaking countries. However, both the Chinese language and the Chinese legal system differ significantly from their English counterparts. Pre-trained Language Models (PLMs) have succeeded in NLP and outperform feature-engineering-based machine learning models as well as traditional deep neural network models such as CNNs and RNNs, yet their effectiveness in specific domains, especially the legal domain, needs to be further investigated. Moreover, few studies have compared these PLMs on specific legal tasks. Therefore, in this paper we train several strong PLMs, which differ in their pre-training corpus, on three datasets of Chinese legal documents. Experimental results show that the model pre-trained on the legal corpus is the most effective on all datasets.

Keywords—legal document, document classification, PLM

I. INTRODUCTION
In recent years, along with the in-depth development and continuous innovation in the field of artificial intelligence (AI), the impact of AI technology on industry has become more and more profound; it is no longer limited to the computer field but has penetrated into all walks of life. The legal field, which has accumulated a large amount of data, is very suitable for today's diverse data-driven AI technologies [1]. For this reason, an increasing number of researchers have started to work on Legal Artificial Intelligence (LegalAI).

The core focus of LegalAI is on how to use existing AI technologies to help the judicial community solve certain problems. These problems include automated judgement, similar case matching, judicial Q&A, and other possible applications. The first and most fundamental step in tackling these issues is to address the classification of legal documents, as many legal tasks eventually turn into text classification.

Document classification, another term for text classification when dealing with over-length text, is a relatively basic but essential branch of NLP technology, and is suitable for solving LegalAI tasks such as the legal domain classification and court decision prediction mentioned above.

[Figure 1 content]
Information of plaintiff and defendant: Plaintiff: XXX Co., Ltd. Defendant: Mr. Lu, Mr. Li
Fact Description: The trial found that: October 22, 2014, the plaintiff and the defendant Lu signed a "personal loan/guarantee contract", which agreed that the plaintiff to the defendant Lu borrowed 5 million yuan …
Court opinion: The court believes that the personal loan / guarantee contract signed by the plaintiff and the defendant Lu is legal and valid and should be protected by law. The plaintiff provided the defendant with a loan of 5000000 yuan in accordance with the contract…
Judgement: In conclusion, according to the provisions of paragraph XX of Article XX of the contract law of the people's Republic of China, the judgment is as follows: 1. The defendant Li paid the plaintiff XXX Co., Ltd. the loan principal of 5 million yuan and interest of 1.3 million yuan within 30 days after the effectiveness of this judgment…

Fig. 1. An example of Chinese legal text (judgement document), mainly including four parts: information of plaintiff and defendant, fact description, court opinion and judgement.

A wide range of previous works apply traditional statistical models or machine learning models to the legal field. Some of them [2] use KNN as the classification algorithm, while other works [3, 4] employ a machine learning model like SVM as the classifier. In addition, the concerns of these approaches vary: some [3–5] are dedicated to obtaining higher classification accuracy whereas others [6–8] strive to make the classification results more interpretable, and the majority focus on a single language, while others [9, 10] concentrate on multi-lingual tasks. Recently, deep learning models represented by various types of neural networks, which perform remarkably in a diverse range of NLP tasks thanks to their powerful feature learning capabilities, have also

been utilized for legal document classification, such as CNN [11–15] and RNN [16–20]. Besides, pre-trained language models have also been introduced to the legal domain [21–29].

Compared with English, Chinese has a richer vocabulary as well as more synonyms and near-synonyms. Previous LegalAI methods also concentrate more on the legal systems of native English-speaking countries. Owing to the vigorous promotion of "smart court" construction by the Chinese Supreme People's Court, the publication of judgment documents has created a large volume of judgment documents in a standardized format, which provides a solid foundation for NLP technologies driven by text data. Judgment documents are the records and summaries of the case situation, evidential facts, trial process and trial basis written by judges after completing the trial, and contain a huge amount of valuable information yet to be discovered. The structure of a Chinese legal document is shown in Fig. 1.

As mentioned before, current approaches are mostly based on statistical machine learning with feature engineering, feature extraction based on deep neural networks like CNNs or RNNs, and end-to-end approaches with Transformer-based PLMs. Given the recent advances and the capability of handling extra-long texts without feature engineering, Transformer-based PLMs are known to outperform the majority of other neural network models.

Therefore, in this paper, we choose four representative models, pre-trained on general and legal domain corpora respectively, and compare their performance on three Chinese legal document datasets in terms of classification accuracy.

II. RELATED WORK
Unlike general texts in the finance and journalism fields, on which a large amount of research has been conducted, legal texts are characterized by their considerable length, which makes annotation of the data very time-consuming and laborious. In addition, legal texts involve a lot of legal terminology and are extremely specialized, which makes them very tricky to handle.

Legal document classification is a key task in the application of legal text processing, which identifies the category of a legal text according to the association between the legal text and label information. Different legal text processing tasks can be transformed into different types of text classification problems. The sub-tasks of legal document classification include case type classification, court decision prediction, argument mining, relevant laws analysis and so on.

Literature related to this study is reviewed regarding three aspects: (1) machine learning for legal document classification, (2) deep learning for legal document classification, (3) long-document pre-trained language model.

A. Machine Learning for Legal Document Classification
In the earlier days, feature engineering and machine learning models were commonly used in the legal field. Liu et al. [2] utilize the KNN [30] algorithm to classify 12 common criminal charges in a case-based reasoning system. Boella et al. [3] employ TF-IDF [31] weighting and Information Gain to select features, then train a Support Vector Machine (SVM) [32] classifier to identify the field to which the legal text belongs, attaining an F1-measure of 76%. Aletras et al. [4] use multiple SVMs to classify several semantic features of cases, and then classify them respectively to predict the judgment of the European Court of Human Rights. In [33], texts of legal cases are classified with respect to a given ontology by matching the ontological terms enriched by means of a “wikification” mechanism, which takes advantage of the Wikipedia commonsense knowledge base to expand the ontological terms with additional elements or labels and concepts extracted from the input text, thus returning a number of legal issues, each of which has a corresponding relevance to the text according to its score.

Most of these early works combine feature engineering with statistical machine learning models and use supervised learning methods to train classifiers. They offer relatively good classification performance and interpretability of results, but poor scalability when the text labeling system changes, due to excessive reliance on feature design and manual annotation.

B. Deep Learning for Legal Document Classification
Over the past few years, deep learning-based models, represented by various types of neural networks, have attracted increasing attention from the legal domain with their powerful feature learning capabilities. Xiao et al. [11] introduce Multi-task CNN [34] models to Chinese legal question classification, in which the coarse-grained classification is the main task while the fine-grained classification is the side task. Wei et al. [12] implement a legal document classifier using CNN, and their experimental results show that the performance of the CNN model on a large-scale training set is significantly better than that of SVM. Elnaggar et al. [35] overcome the data sparsity problem in the German legal domain by multi-task transfer learning including multi-label classification. Keeling et al. [13] compare CNN with other popular machine learning algorithms for text classification, which include Logistic Regression (LR), SVM, and Random Forest (RF). It turns out that while some algorithms perform better than others for specific combinations, none of them can outperform the others across all the different combinations of experimental setup. Another work [14], contemporaneous with [13], presents three novel deep neural network-based approaches for Korean legal document classification using CNN and RNN [36] with two different word embedding schemes, and the results demonstrate that the RNN model with Word2Vec embedding achieves the highest classification accuracy. Chalkidis et al. [37] study Extreme Multi-Label Text Classification (XMTC) in EU legislation by experimenting with several neural classifiers and show that BiGRUs [38] with self-attention achieve the best overall performance. Furthermore, a few methods [16–20] build LSTM-based models to classify legal cases. Several works [9, 10] are interested in multi-lingual legal cases and attempt to propose unified frameworks to handle the problem.

As pre-trained language models (PLMs) set new records on many NLP tasks in the generic domain, some researchers try to apply PLMs to the legal field. Some works [21] only use them as a feature extractor to obtain legal text embeddings, and others [22–25] fine-tune PLMs like BERT [39] and RoBERTa [40] on the task of legal document classification to improve the performance. There are also some

researchers [26–29] who choose to pre-train their own PLM on a legal corpus. However, most of these models adopt BERT as the basic encoder and therefore cannot process texts longer than 512 tokens.

C. Long-Document Pre-Trained Language Model
Transformer-based PLMs are the mainstream frameworks and outperform the state-of-the-art traditional deep neural network models in various NLP tasks. The Transformer [41], whose self-attention mechanism allows fully-connected contextual encoding over input tokens, has achieved outstanding performance. However, it suffers from high computational complexity, which grows quadratically with the input sequence length.

Many variants have been proposed to lower the quadratic complexity of the Transformer. Some of them introduce sparse attention, which limits each token to attending to only a subset of the other tokens rather than the whole sequence. For instance, Sparse Transformer [42], Longformer [43], BlockBERT [44] and Big Bird [45] use a sliding-window-based attention pattern. Besides this kind of location-based sparsity, content-based sparse attention is introduced in Reformer [46], which uses Locality-Sensitive Hashing [47] to find the nearest neighbors of query vectors. In order to make full use of the knowledge in the legal field, Lawformer [28] adopts Longformer [43] as its basic encoder, being the first pre-trained language model for long legal documents.
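To make the effect of location-based sparsity concrete, the following minimal Python sketch counts how many query-key pairs are scored under full self-attention versus a plain sliding-window pattern. The function names and the window size of 512 are illustrative assumptions, not details taken from any of the cited models, and the global or random attention components that some of these models add are deliberately left out.

```python
def full_attention_pairs(seq_len: int) -> int:
    """Full self-attention: every token attends to every token."""
    return seq_len * seq_len


def sliding_window_pairs(seq_len: int, window: int) -> int:
    """Sliding-window attention: each token attends only to positions
    within window // 2 steps on either side of itself (itself included)."""
    half = window // 2
    total = 0
    for i in range(seq_len):
        lo = max(0, i - half)            # clip the window at the left edge
        hi = min(seq_len - 1, i + half)  # clip the window at the right edge
        total += hi - lo + 1
    return total


if __name__ == "__main__":
    # Hypothetical window size; chosen only for illustration.
    for n in (512, 4096):
        print(n, full_attention_pairs(n), sliding_window_pairs(n, window=512))
```

With a fixed window, the number of scored pairs grows roughly linearly with the sequence length instead of quadratically, which is what allows longer inputs under a comparable compute budget.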

TABLE I. THE STATISTICS OF DATASETS

Dataset          Subcategory   Avg. Len.   Max. Len.   # Examples   # Classes
CAIL2018         small            422.90      39,586      166,734         133
CAIL2018         big              348.14      41,681    1,857,176         143
CAIL-Long        criminal         916.57      60,281      115,849         201
CAIL-Long        civil           1286.88      34,633      113,656         257
Court-View-Gen   —                251.54        5 61      174,830          51

III. METHODOLOGY AND EXPERIMENTS
Across classification experiments, $|C|$ is the size of the category set, and $D_h$ is the dimension of the hidden states. As shown in Fig. 2, we feed the top-level hidden state $h_{[CLS]} \in \mathbb{R}^{D_h}$ of the [CLS] token to a fully-connected layer ($W_{[CLS]} \in \mathbb{R}^{D_h \times |C|}$), and the output of this layer is the probability of each category. The category with the highest probability is then selected as the label of the input text.

[Figure 2 content, bottom to top: Input x (A Legal Document) → [CLS] x → Pre-trained Language Model → Linear Classifier → probability of each category (0.6, 0.3, …)]
Input x (A Legal Document): After hearing, it was found that the defendant Kang owed the plaintiff Wang a loan principal of 10000 yuan on May 10, 2015, and did not agree on the loan interest and repayment period…

Fig. 2. The Framework of classification process for a legal document.

A. Datasets
We evaluate the performance of the chosen pre-trained models on three datasets from the legal domain. The statistics of all the datasets are summarized in TABLE I, and brief descriptions of these datasets are given below:

• CAIL2018 [48, 49] is a legal judgement prediction dataset provided by the Chinese AI and Law challenge (CAIL), including CAIL-small (the exercise stage dataset) and CAIL-big (the first stage dataset), which differ only in the amount of data. Each case in CAIL2018 consists of two parts, i.e., the fact description and the corresponding judgment result. The latter is refined into 3 representative ones, including relevant law articles, charges, and prison terms. For the legal document classification task, we choose the charge as the label of each case. Besides, charges with low frequency are excluded: charges that appear fewer than 100 times are removed.

• CAIL-Long [28] is also provided by CAIL. Given that the average length of CAIL2018 is much shorter than the length of real-world cases and CAIL2018 contains only criminal cases but neglects civil cases, they built CAIL-Long. Each criminal case is annotated with charges, the relevant laws, and the term of penalty, while each civil case is annotated with the causes of action and the relevant laws. For both criminal and civil cases, we take fact descriptions as inputs and select charges and causes of action as the labels of the two types of cases respectively.

• Court-View-Gen [50] consists of short texts, the majority of which are no longer than 256 tokens, which differs from the two datasets mentioned above. Cases with multiple charges and multiple defendants are omitted in Court-View-Gen. We use this dataset to test whether pre-trained models made for long texts can also perform well on short texts.
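The classification step described at the start of this section (and depicted in Fig. 2) can be sketched in plain Python as follows. This is only an illustrative sketch: the function names are hypothetical, the toy vector and weight matrix in the usage lines are made up, and the softmax normalisation is an assumption about how the layer output becomes a probability distribution, since the text does not spell it out.

```python
import math
from typing import List, Sequence, Tuple


def linear_head(h_cls: Sequence[float],
                weights: Sequence[Sequence[float]]) -> List[float]:
    """Project a D_h-dimensional [CLS] vector to |C| scores.
    `weights` has shape D_h x |C|, matching W_[CLS] in the text."""
    num_classes = len(weights[0])
    return [sum(h * row[c] for h, row in zip(h_cls, weights))
            for c in range(num_classes)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities (assumed normalisation)."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(h_cls: Sequence[float],
            weights: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Return the index of the most probable category and all probabilities."""
    probs = softmax(linear_head(h_cls, weights))
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


if __name__ == "__main__":
    # Toy example with D_h = 2 and |C| = 3; the numbers are invented.
    h = [1.0, 2.0]
    w = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0]]
    print(predict(h, w))
```

In an actual experiment the [CLS] vector would come from the pre-trained encoder and the weight matrix would be learned during fine-tuning; here both are fixed by hand so that the arithmetic is easy to follow.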

B. Pre-trained Models
In our study, we compare the performance of two general-domain pre-trained models with that of two legal-domain pre-trained models on the above datasets:

• Longformer [43]: it uses sliding window attention and global attention on pre-selected input locations to obtain sparse attention, which enables it to process much longer documents under equal computing resources. It is pre-trained on a generic corpus including Books, Stories, Wikipedia and Realnews.

• Big Bird [45]: it consists of three main parts of attention, namely random attention, local attention and global attention, where local attention and global attention are similar to those of Longformer. Random attention means that for each token, a number of tokens are randomly chosen to attend to. The datasets it uses for pre-training also exclude legal texts.

• Lawformer [28]: it utilizes Longformer as its basic encoder and is pre-trained on tens of millions of case documents published by the Chinese government.

• Legal RoBERTa [28]: it is pre-trained on the same legal corpus as Lawformer, continuing from the released RoBERTa-wwm-ext checkpoint.

TABLE II. HYPERPARAMETERS FOR DOCUMENT CLASSIFICATION

Parameter           CAIL2018   CAIL-Long   Court-View-Gen
Num epochs          3          5           5
Max. Seq. Len.      {512, 4096}
Learning rate       1 × 10!"
Loss                Cross Entropy Loss
Hidden layer size   768
Optimizer           Adam
Vocab size          21128
Activation layer    GELU

Values listed once are shared by all three datasets.

C. Experimental Setup
A detailed list of the hyperparameters used in the experiments on CAIL2018, CAIL-Long and Court-View-Gen is provided in TABLE II. The classification performance is evaluated with the overall accuracy and the weighted-average F1 score. These measures are formulated as follows:

$$\mathrm{Accuracy} = \frac{T_P + T_N}{T_P + T_N + F_P + F_N}$$

$$P = \frac{T_P}{T_P + F_P}, \qquad R = \frac{T_P}{T_P + F_N}$$

$$\mathrm{weighted\ F1}_i = 2 \times \frac{P_i \times R_i}{P_i + R_i} \times \frac{\mathrm{sample}_i}{\mathrm{sample}_{all}}$$

$$\mathrm{weighted\ F1} = \frac{1}{n} \sum_{i=1}^{N} \mathrm{weighted\ F1}_i$$

where $T_P$, $T_N$, $F_P$, $F_N$ are true positives, true negatives, false positives and false negatives respectively.

D. Experimental Results
The experimental results on Court-View-Gen, CAIL2018, and CAIL-Long are summarized in TABLE III.

Conclusion 1: a domain-specific long-text PLM shows superiority on long-text datasets.

Lawformer achieves the best performance among the four models in both accuracy and weighted-average F1 score, exceeding the other models by almost 1% on Court-View-Gen, 2% on CAIL2018 and 4% on CAIL-Long. Besides, Lawformer outperforms Legal RoBERTa, which is also pre-trained on the legal corpus, by approximately 4% on CAIL-Long but only nearly 1% on Court-View-Gen. The reason is that the average text length of CAIL-Long is over 900, while that of Court-View-Gen does not surpass the maximum input length of 512 tokens set by BERT. Legal RoBERTa therefore has to truncate any text longer than 512 tokens, whereas Lawformer can handle up to 4096 tokens. Consequently, Lawformer is more effective in processing long text.

Conclusion 2: the classification effectiveness of a PLM decreases when the semantic composition and complexity of the documents increase.

Among the three datasets, only on the civil part of CAIL-Long are the weighted-average F1 scores of all models lower than 0.8. After investigating, we find that Chinese legal documents are divided into three types in terms of cause, i.e., civil, criminal and administrative, with civil cases being more difficult to handle than criminal cases due to their complex merits. In addition, the litigation claims of civil documents are more diversified than those of criminal documents. Furthermore, all datasets used in our experiments are collected from criminal documents, except for the civil part of CAIL-Long. These issues explain why the performance of these models is inferior on the civil part.
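As an illustration of the evaluation measures defined in the Experimental Setup, the following Python sketch computes accuracy and a weighted-average F1 score from lists of gold and predicted labels. It is a minimal sketch under stated assumptions rather than the authors' evaluation script: accuracy is taken as the fraction of correctly labelled documents, and the weighted F1 follows the common support-weighted convention, summing the per-class F1 scaled by sample_i / sample_all with no further normalisation. The function names and the toy labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of documents whose predicted label equals the gold label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Per-class F1, weighted by each class's share of the gold labels
    (assumed reading of the weighted-average F1 in the text)."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_label in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n_label - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * n_label / total
    return score


if __name__ == "__main__":
    # Invented toy labels, two classes with two documents each.
    gold = ["a", "a", "b", "b"]
    pred = ["a", "b", "b", "b"]
    print(accuracy(gold, pred), weighted_f1(gold, pred))
```

For the toy labels, class "a" has precision 1 and recall 0.5, class "b" has precision 2/3 and recall 1, and each class holds half of the documents, so the weighted F1 is 0.5 × 2/3 + 0.5 × 0.8.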

TABLE III. EXPERIMENTAL RESULTS ON DOCUMENT CLASSIFICATION TASK

                Court-View-Gen    CAIL2018 (small)  CAIL2018 (big)    CAIL-Long (criminal)  CAIL-Long (civil)
Model           Acc.     F1       Acc.     F1       Acc.     F1       Acc.     F1           Acc.     F1
Longformer      87.32%   0.8603   85.59%   0.8478   93.55%   0.9263   91.16%   0.9010       76.91%   0.7343
Big Bird        87.66%   0.8646   86.27%   0.8553   93.87%   0.9294   91.45%   0.9015       77.99%   0.7490
Legal RoBERTa   88.21%   0.8701   84.93%   0.8367   92.18%   0.9229   91.98%   0.9098       76.76%   0.7383
Lawformer       89.17%   0.8785   88.34%   0.8814   96.05%   0.9553   95.24%   0.9484       81.95%   0.7958
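The accuracy margins quoted under Conclusion 1 can be checked directly against TABLE III. The short Python sketch below recomputes, for each dataset split, the gap between Lawformer and the best of the other three models. The accuracy values are copied from the table; the assignment of columns to splits (Court-View-Gen first, the civil part of CAIL-Long last) is an assumption inferred from the discussion of the results, and the names used in the code are illustrative.

```python
# Dataset splits in the assumed column order of TABLE III.
SPLITS = [
    "Court-View-Gen",
    "CAIL2018 (small)",
    "CAIL2018 (big)",
    "CAIL-Long (criminal)",
    "CAIL-Long (civil)",
]

# Accuracy in percent, one value per split, copied from TABLE III.
ACCURACY = {
    "Longformer":    [87.32, 85.59, 93.55, 91.16, 76.91],
    "Big Bird":      [87.66, 86.27, 93.87, 91.45, 77.99],
    "Legal RoBERTa": [88.21, 84.93, 92.18, 91.98, 76.76],
    "Lawformer":     [89.17, 88.34, 96.05, 95.24, 81.95],
}


def lawformer_margins() -> dict:
    """Accuracy gap (percentage points) between Lawformer and the best
    of the other models, for every split."""
    margins = {}
    for idx, split in enumerate(SPLITS):
        best_other = max(scores[idx] for name, scores in ACCURACY.items()
                         if name != "Lawformer")
        margins[split] = round(ACCURACY["Lawformer"][idx] - best_other, 2)
    return margins


if __name__ == "__main__":
    for split, gap in lawformer_margins().items():
        print(f"{split}: +{gap} points")
```

Under this column assignment the gaps are about 1 point on Court-View-Gen, about 2 points on both CAIL2018 splits, and between roughly 3 and 4 points on the two CAIL-Long splits.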

From TABLE III, we can also see that Legal RoBERTa performs slightly better than Longformer and Big Bird on the criminal part of CAIL-Long, which is somewhat unexpected. Therefore, it is worth investigating further which performs better: a short-text model pre-trained on the corresponding domain corpus or a long-text model pre-trained on a general domain corpus.

IV. CONCLUSION
In this paper, we compare, on three Chinese legal document datasets, four pre-trained language models pre-trained on corpora of different domains; such models achieve state-of-the-art results compared with traditional machine learning or deep neural network models in various NLP tasks. The experimental results show that Lawformer, pre-trained on long legal documents, outperforms the other models, which confirms the effectiveness of pre-training models on a domain-specific corpus in order to improve the comprehension of texts in that domain.

For future work, we would like to tackle the inferior classification performance on civil cases compared with criminal cases. Given that civil cases are usually more complicated, we intend to refine the classification process by extracting the elemental features in the legal document as well as disputed points and integrating knowledge into pre-trained models.

REFERENCES
[1] H. Zhong, C. Xiao, C. Tu, T. Zhang, Z. Liu, and M. Sun, “How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 2020, pp. 5218–5230, doi: 10.18653/v1/2020.acl-main.466.
[2] C.-L. Liu, C.-T. Chang, and J.-H. Ho, “Case Instance Generation and Refinement for Case-Based Criminal Summary Judgments in Chinese,” J. Inf. Sci. Eng., vol. 20, pp. 783–800, Jul. 2004.
[3] G. Boella, L. Di Caro, and L. Humphreys, “Using classification to support legal knowledge engineers in the Eunomos legal document management system,” Fifth International Workshop on Juris-informatics (JURISIN), Jan. 2011.
[4] N. Aletras, D. Tsarapatsanis, D. Preoţiuc-Pietro, and V. Lampos, “Predicting judicial decisions of the European Court of Human Rights: a Natural Language Processing perspective,” PeerJ Comput. Sci., vol. 2, p. e93, Oct. 2016, doi: 10.7717/peerj-cs.93.
[5] N. Capuano, C. De Maio, S. Salerno, and D. Toti, “A Methodology based on Commonsense Knowledge and Ontologies for the Automatic Classification of Legal Cases,” in Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS14) - WIMS ’14, Thessaloniki, Greece, 2014, pp. 1–6, doi: 10.1145/2611040.2611048.
[6] R. Chhatwal, P. Gronvall, N. Huber-Fliflet, R. Keeling, J. Zhang, and H. Zhao, “Explainable Text Classification in Legal Document Review A Case Study of Explainable Predictive Coding,” in 2018 IEEE International Conference on Big Data (Big Data), 2018, pp. 1905–1911, doi: 10.1109/BigData.2018.8622073.
[7] H. Ye, X. Jiang, Z. Luo, and W. Chao, “Interpretable Charge Predictions for Criminal Cases: Learning to Generate Court Views from Fact Descriptions,” in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, Louisiana, Jun. 2018, pp. 1854–1864, doi: 10.18653/v1/N18-1168.
[8] C.J. Mahoney, J. Zhang, N. Huber-Fliflet, P. Gronvall, and H. Zhao, “A Framework for Explainable Text Classification in Legal Document Review,” in 2019 IEEE International Conference on Big Data (Big Data), 2019, pp. 1858–1867, doi: 10.1109/BigData47090.2019.9005659.
[9] A.-M. Avram, V. Pais, and D.I. Tufis, “PyEuroVoc: A Tool for Multilingual Legal Document Classification with EuroVoc Descriptors,” in Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), Held Online, 2021, pp. 92–101.
[10] I. Chalkidis, M. Fergadiotis, and I. Androutsopoulos, “MultiEURLEX - A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer,” in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Online and Punta Cana, Dominican Republic, 2021, pp. 6974–6996, doi: 10.18653/v1/2021.emnlp-main.559.
[11] G. Xiao, J. Mo, E. Chow, H. Chen, J. Guo, and Z. Gong, “Multi-Task CNN for Classification of Chinese Legal Questions,” in 2017 IEEE 14th International Conference on e-Business Engineering (ICEBE), 2017, pp. 84–90, doi: 10.1109/ICEBE.2017.22.
[12] F. Wei, H. Qin, S. Ye, and H. Zhao, “Empirical Study of Deep Learning for Text Classification in Legal Document Review,” in 2018 IEEE International Conference on Big Data (Big Data), 2018, pp. 3317–3320, doi: 10.1109/BigData.2018.8622157.
[13] R. Keeling, R. Chhatwal, N. Huber-Fliflet, J. Zhang, F. Wei, H. Zhao, Y. Shi, and H. Qin, “Empirical Comparisons of CNN with Other Learning Algorithms for Text Classification in Legal Document Review,” in 2019 IEEE International Conference on Big Data (Big Data), 2019, pp. 2038–2042, doi: 10.1109/BigData47090.2019.9006248.
[14] J. Lee, and H. Lee, “A Comparison Study on Legal Document Classification Using Deep Neural Networks,” in 2019 International Conference on Information and Communication Technology Convergence (ICTC), 2019, pp. 926–928, doi: 10.1109/ICTC46691.2019.8939926.
[15] Q. Han, and D. Snaidauf, “Comparison of Deep Learning Technologies in Legal Document Classification,” in 2021 IEEE International Conference on Big Data (Big Data), 2021, pp. 2701–2704, doi: 10.1109/BigData52589.2021.9671486.
[16] X. Guo, H. Zhang, L. Ye, and S. Li, “RnnTd: An Approach Based on LSTM and Tensor Decomposition for Classification of Crimes in Legal Cases,” in 2019 IEEE Fourth International Conference on Data Science in Cyberspace (DSC), Jun. 2019, pp. 16–22, doi: 10.1109/DSC.2019.00012.
[17] N. Bansal, A. Sharma, and R.K. Singh, “An Evolving Hybrid Deep Learning Framework for Legal Document Classification,” ISI, vol. 24, no. 4, pp. 425–431, Oct. 2019, doi: 10.18280/isi.240410.
[18] L. Wan, G. Papageorgiou, M. Seddon, and M. Bernardoni, “Long-length Legal Document Classification,” arXiv:1912.06905 [cs], Dec. 2019.
[19] B. Clavié, A. Gheewala, P. Briton, M. Alphonsus, R. Laabiyad, and F. Piccoli, “LegaLMFiT: Efficient Short Legal Text Classification with LSTM Language Model Pre-Training,” arXiv:2109.00993 [cs], Sep. 2021.
[20] P. Yang, Y. Wu, T. Cheng, X. Lyu, and Z. Wang, “Segment-Level Sentiment Classification for Online Comments of Legal Cases,” in 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), 2020, pp. 366–370, doi: 10.1109/DASC-PICom-CBDCom-CyberSciTech49142.2020.00071.
[21] Y. Fang, X. Tian, H. Wu, S. Gu, Z. Wang, F. Wang, J. Li, and Y. Weng, “Few-Shot Learning for Chinese Legal Controversial Issues Classification,” IEEE Access, vol. 8, pp. 75022–75034, 2020, doi: 10.1109/ACCESS.2020.2988493.
[22] D. Tuggener, P. von Däniken, T. Peetz, and M. Cieliebak, “LEDGAR: A Large-Scale Multi-label Corpus for Text Classification of Legal Provisions in Contracts,” in Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, 2020, pp. 1235–1241.
[23] Z. Shaheen, G. Wohlgenannt, and E. Filtz, “Large Scale Legal Text Classification Using Transformer Models,” arXiv:2010.12871 [cs], Oct. 2020.
[24] A. Aguiar, R. Silveira, V. Pinheiro, V. Furtado, and J.A. Neto, “Text Classification in Legal Documents Extracted from Lawsuits in Brazilian Courts,” in Intelligent Systems, Cham, 2021, pp. 586–600, doi: 10.1007/978-3-030-91699-2_40.

[25] S. Jayasinghe, L. Rambukkanage, A. Silva, N. de Silva, and A.S. Perera, “Critical Sentence Identification in Legal Cases Using Multi-Class Classification,” in 2021 IEEE 16th International Conference on Industrial and Information Systems (ICIIS), 2021, pp. 146–151, doi: 10.1109/ICIIS53135.2021.9660657.
[26] H. Zhong, Z. Guo, C. Tu, C. Xiao, Z. Liu, and M. Sun, “Legal Judgment Prediction via Topological Learning,” in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, Oct. 2018, pp. 3540–3549, doi: 10.18653/v1/D18-1390.
[27] I. Chalkidis, M. Fergadiotis, P. Malakasiotis, N. Aletras, and I. Androutsopoulos, “LEGAL-BERT: The Muppets straight out of Law School,” in Findings of the Association for Computational Linguistics: EMNLP 2020, Online, 2020, pp. 2898–2904, doi: 10.18653/v1/2020.findings-emnlp.261.
[28] C. Xiao, X. Hu, Z. Liu, C. Tu, and M. Sun, “Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents,” arXiv:2105.03887 [cs], May 2021.
[29] D. Song, A. Vold, K. Madan, and F. Schilder, “Multi-label legal document classification: A deep learning-based approach with label-attention and domain-specific pre-training,” Information Systems, vol. 106, p. 101718, May 2022, doi: 10.1016/j.is.2021.101718.
[30] T. Cover, and P. Hart, “Nearest neighbor pattern classification,” IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 21–27, Jan. 1967, doi: 10.1109/TIT.1967.1053964.
[31] G. Salton, and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing & Management, vol. 24, no. 5, pp. 513–523, Jan. 1988, doi: 10.1016/0306-4573(88)90021-0.
[32] C. Cortes, and V. Vapnik, “Support-vector networks,” Mach Learn, vol. 20, no. 3, pp. 273–297, Sep. 1995, doi: 10.1007/BF00994018.
[33] N. Capuano, C. De Maio, S. Salerno, and D. Toti, “A Methodology based on Commonsense Knowledge and Ontologies for the Automatic Classification of Legal Cases,” in Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS14), New
[38] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhudinov, R. Zemel, and Y. Bengio, “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention,” in Proceedings of the 32nd International Conference on Machine Learning, Jun. 2015, pp. 2048–2057.
[39] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, 2019, pp. 4171–4186, doi: 10.18653/v1/N19-1423.
[40] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, “RoBERTa: A Robustly Optimized BERT Pretraining Approach,” arXiv:1907.11692 [cs], Jul. 2019.
[41] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is All you Need,” in Advances in Neural Information Processing Systems, 2017, vol. 30.
[42] R. Child, S. Gray, A. Radford, and I. Sutskever, “Generating Long Sequences with Sparse Transformers,” arXiv:1904.10509 [cs, stat], Apr. 2019.
[43] I. Beltagy, M.E. Peters, and A. Cohan, “Longformer: The Long-Document Transformer,” arXiv:2004.05150 [cs], Dec. 2020.
[44] J. Qiu, H. Ma, O. Levy, W. Yih, S. Wang, and J. Tang, “Blockwise Self-Attention for Long Document Understanding,” in Findings of the Association for Computational Linguistics: EMNLP 2020, Online, Nov. 2020, pp. 2555–2565, doi: 10.18653/v1/2020.findings-emnlp.232.
[45] M. Zaheer, G. Guruganesh, A. Dubey, J. Ainslie, C. Alberti, S. Ontanon, P. Pham, A. Ravula, Q. Wang, L. Yang, and A. Ahmed, “Big Bird: Transformers for Longer Sequences,” arXiv:2007.14062 [cs, stat], Jan. 2021.
[46] N. Kitaev, L. Kaiser, and A. Levskaya, “Reformer: The Efficient Transformer,” presented at the International Conference on Learning Representations, Sep. 2019.
[47] A. Andoni, P. Indyk, T. Laarhoven, I. Razenshteyn, and L. Schmidt,
York, NY, USA, 2014, pp. 1–6, doi: 10.1145/2611040.2611048. “Practical and Optimal LSH for Angular Distance,” in Advances in
[34] Y. Kim, “Convolutional Neural Networks for Sentence Classification,” in Neural Information Processing Systems, 2015, vol. 28.
Proceedings of the 2014 Conference on Empirical Methods in Natural [48] C. Xiao, H. Zhong, Z. Guo, C. Tu, Z. Liu, M. Sun, Y. Feng, X. Han, Z.
Language Processing (EMNLP), Doha, Qatar, 2014, pp. 1746–1751, doi: Hu, H. Wang, and J. Xu, “CAIL2018: A Large-Scale Legal Dataset for
10.3115/v1/D14-1181. Judgment Prediction,” arXiv:1807.02478 [cs], Jul. 2018.
[35] A. Elnaggar, C. Gebendorfer, I. Glaser, and F. Matthes, “Multi-Task Deep [49] H. Zhong, C. Xiao, Z. Guo, C. Tu, Z. Liu, M. Sun, Y. Feng, X. Han, Z.
Learning for Legal Document Translation, Summarization and Multi- Hu, H. Wang, and J. Xu, “Overview of CAIL2018: Legal Judgment
Label Classification,” in Proceedings of the 2018 Artificial Intelligence Prediction Competition,” arXiv:1810.05851 [cs], Oct. 2018.
and Cloud Computing Conference on ZZZ - AICCC ’18, Tokyo, Japan,
[50] H. Ye, X. Jiang, Z. Luo, and W. Chao, “Interpretable Charge Predictions
2018, pp. 9–15, doi: 10.1145/3299819.3299844.
for Criminal Cases: Learning to Generate Court Views from Fact
[36] G. Tang, M. Müller, A. Rios, and R. Sennrich, “Why Self-Attention? A Descriptions,” in Proceedings of the 2018 Conference of the North
Targeted Evaluation of Neural Machine Translation Architectures,” in American Chapter of the Association for Computational Linguistics:
Proceedings of the 2018 Conference on Empirical Methods in Natural Human Language Technologies, Volume 1 (Long Papers), New Orleans,
Language Processing, Brussels, Belgium, Oct. 2018, pp. 4263–4272, doi: Louisiana, Jun. 2018, pp. 1854–1864, doi: 10.18653/v1/N18-1168.
10.18653/v1/D18-1458.
[37] I. Chalkidis, M. Fergadiotis, P. Malakasiotis, N. Aletras, and I.
Androutsopoulos, “Extreme Multi-Label Legal Text Classification: A
case study in EU Legislation,” arXiv:1905.10892 [cs], May 2019.
