A Comparison Study of Pre-Trained Language Models For Chinese Legal Document Classification
Abstract—Legal artificial intelligence (LegalAI), which aims to benefit the legal domain using artificial intelligence technologies, is a hot topic at the moment. As the basis for various LegalAI tasks such as judgment prediction and similar case matching, the classification of legal documents is an issue that has to be addressed. The majority of current approaches focus on the legal systems of native English-speaking countries. However, both the Chinese language and the Chinese legal system differ significantly from their English counterparts. Given the success of pre-trained language models (PLMs) in NLP and their outperformance of feature-engineering-based machine learning models as well as traditional deep neural network models such as CNNs and RNNs, their effectiveness in specific domains, especially the legal domain, needs to be further investigated. Moreover, few studies have compared these PLMs on specific legal tasks. Therefore, in this paper we train several strong PLMs, which differ in their pre-training corpora, on three datasets of Chinese legal documents. Experimental results show that the model pre-trained on a legal corpus demonstrates high effectiveness on all datasets.

… is an essential branch of NLP technology, and is suitable for solving LegalAI tasks such as legal domain classification and court decision prediction as mentioned above.

Fig. 1. An example of a Chinese legal case document, which consists of three parts:
• Information of plaintiff and defendant: Plaintiff: XXX Co., Ltd.; Defendant: Mr. Lu, Mr. Li.
• Fact description: "The trial found that: on October 22, 2014, the plaintiff and the defendant Lu signed a 'personal loan/guarantee contract', which agreed that the plaintiff would lend the defendant Lu 5 million yuan …"
• Court opinion: "The court believes that the personal loan/guarantee contract signed by the plaintiff and the defendant Lu is legal and valid and should be protected by law. The plaintiff provided the defendant with a loan of 5,000,000 yuan in accordance with the contract …"
researchers [26–29] choose to pre-train their own PLMs on legal corpora. However, most of these models adopt BERT as the basic encoder, and therefore cannot process texts longer than 512 tokens.

C. Long-Document Pre-Trained Language Model

Transformer-based PLMs are the mainstream frameworks that outperform state-of-the-art traditional deep neural network models in various NLP tasks. The Transformer [41], whose self-attention mechanism allows fully-connected contextual encoding over the input tokens, has achieved outstanding performance. However, it suffers from high computational complexity, which grows quadratically with the input sequence length.

To lower the quadratic complexity of the Transformer, many variants have been proposed. Some of them introduce sparse attention, which limits each token to attending to only a subset of the other tokens rather than the whole sequence. For instance, Sparse Transformer [42], Longformer [43], BlockBERT [44] and Big Bird [45] use sliding-window-based attention patterns. Besides this kind of location-based sparsity, content-based sparse attention is introduced in Reformer [46], which uses Locality-Sensitive Hashing [47] to find the nearest neighbors of the query vectors. In order to make full use of the knowledge in the legal field, Lawformer [28] adopts Longformer [43] as its basic encoder, and is the first pre-trained language model for legal long documents.
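To make the complexity contrast concrete, the following minimal PyTorch sketch (our illustration, not code from any of the cited models) compares full self-attention, whose score matrix is n × n, with a sliding-window variant in the spirit of Longformer and Big Bird. For brevity the sketch still materializes the full score matrix before masking; real implementations compute only the banded entries, which is where the memory savings come from.

```python
import torch
import torch.nn.functional as F

def full_attention(q, k, v):
    # q, k, v: (n, d); the score matrix is (n, n), hence O(n^2) time and memory.
    scores = (q @ k.T) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

def sliding_window_attention(q, k, v, window=2):
    # Each token attends only to positions within +/- `window` of itself,
    # i.e. at most 2 * window + 1 neighbors, which is linear in n.
    n = q.shape[0]
    scores = (q @ k.T) / (q.shape[-1] ** 0.5)
    pos = torch.arange(n)
    band = (pos[None, :] - pos[:, None]).abs() <= window  # (n, n) boolean band
    scores = scores.masked_fill(~band, float("-inf"))     # forbid distant tokens
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(16, 8)
print(full_attention(q, k, v).shape)            # torch.Size([16, 8])
print(sliding_window_attention(q, k, v).shape)  # torch.Size([16, 8])
```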
III. METHODOLOGY AND EXPERIMENTS

A. Datasets

… Because the existing dataset covers criminal cases but neglects civil cases, they [28] built CAIL-Long. Each criminal case is annotated with the charges, the relevant laws and the term of penalty, …

Across the classification experiments, $|C|$ is the size of the category set, and $D_h$ is the dimension of the hidden states. As …

(Figure: the classification architecture maps the encoder output to the probability of each category, e.g. 0.6, 0.3, …)
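The output shown in the figure corresponds to a standard linear classification head on top of the encoder. The sketch below is a minimal illustration, assuming Hugging Face transformers and PyTorch; the checkpoint name and $|C| = 10$ are placeholders rather than values from the paper. It pools the [CLS] hidden state of dimension $D_h$ and projects it to $|C|$ logits, which a softmax turns into the probability of each category.

```python
import torch
from transformers import AutoModel, AutoTokenizer

checkpoint = "hfl/chinese-roberta-wwm-ext"  # assumed RoBERTa-wwm-ext checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
encoder = AutoModel.from_pretrained(checkpoint)

num_categories = 10  # |C|, task-dependent placeholder
classifier = torch.nn.Linear(encoder.config.hidden_size, num_categories)  # D_h -> |C|

text = "本院认为,原告与被告签订的借款合同合法有效。"  # a court-opinion-style sentence
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    cls_state = encoder(**inputs).last_hidden_state[:, 0]  # [CLS] token, shape (1, D_h)
    probs = torch.softmax(classifier(cls_state), dim=-1)   # probability of each category
print(probs.shape)  # torch.Size([1, 10])
```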
B. Pre-trained Models

In our study, we compare the performance of two general-domain pre-trained models with that of two legal-domain pre-trained models on the above datasets (a loading sketch follows the list):

• Longformer [43]: it uses sliding-window attention plus global attention on pre-selected input locations to obtain sparse attention, which enables it to process much longer documents under equal computing resources. It is pre-trained on generic corpora including Books, Stories, Wikipedia and RealNews.

• Big Bird [45]: its attention consists of three main parts, namely random attention, local attention and global attention, where the local and global attention are similar to Longformer's. Random attention means that for each token, a number of other tokens are randomly chosen for it to attend to. The corpora it uses for pre-training likewise exclude legal texts.

• Lawformer [28]: it utilizes Longformer as its basic encoder and is pre-trained on tens of millions of case documents published by the Chinese government.

• Legal RoBERTa [28]: it is pre-trained on the same legal corpus as Lawformer, continuing from the released RoBERTa-wwm-ext checkpoint.
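For reference, the four checkpoints can be loaded through the Hugging Face transformers library roughly as follows. This is a hedged sketch: the hub identifiers for Longformer, Big Bird and Lawformer are the public names we believe correspond to these models, and the Legal RoBERTa identifier is a placeholder, since the paper does not give one.

```python
from transformers import AutoModel, AutoTokenizer

# Hub identifiers below are assumptions, not taken from the paper;
# Legal RoBERTa has no public name in the paper, so it is left commented out.
checkpoints = {
    "Longformer": "allenai/longformer-base-4096",
    "Big Bird": "google/bigbird-roberta-base",
    "Lawformer": "thunlp/Lawformer",
    # "Legal RoBERTa": "path/to/legal-roberta",  # hypothetical local path
}

for name, ckpt in checkpoints.items():
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModel.from_pretrained(ckpt)
    # Print the maximum input length each encoder supports; the long-document
    # models accept up to 4096 positions, BERT-style encoders stop at 512.
    print(name, model.config.max_position_embeddings)
```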
TABLE II. HYPERPARAMETERS FOR DOCUMENT CLASSIFICATION

Parameter           CAIL2018      CAIL-Long     Court-View-Gen
Num epochs          3             5             5
Max. Seq. Len.      {512, 4096} (512 for the BERT-style model, 4096 for the long-document models)
Learning rate       1 × 10⁻⁵
Loss                Cross Entropy Loss
Hidden layer size   768
Optimizer           Adam
Vocab size          21128
Activation layer    GELU
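The hyperparameters in TABLE II translate into a conventional fine-tuning loop. Below is a minimal sketch, assuming PyTorch, a model that returns logits (such as the classification head sketched above), an existing data loader, and the reconstructed learning rate of 1 × 10⁻⁵; it illustrates the setup and is not the authors' training code.

```python
import torch

def finetune(model, train_loader, num_epochs=3, lr=1e-5):
    """Generic fine-tuning loop wired with the TABLE II hyperparameters.

    `model` maps a batch of tokenized inputs to logits of shape (batch, |C|);
    `train_loader` yields (inputs, labels) pairs and is assumed to exist.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # Adam optimizer
    loss_fn = torch.nn.CrossEntropyLoss()                    # Cross Entropy Loss
    model.train()
    for _ in range(num_epochs):  # 3 epochs for CAIL2018, 5 for the other datasets
        for inputs, labels in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(**inputs), labels)
            loss.backward()
            optimizer.step()
    return model
```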
C. Experimental Setup

A detailed list of the hyperparameters used in the experiments on CAIL2018, CAIL-Long and Court-View-Gen is provided in TABLE II. The classification performance is evaluated with the overall accuracy and the weighted-average F1 score. All these measures are formulated as follows:

$\mathrm{Accuracy} = \dfrac{T_P + T_N}{T_P + T_N + F_P + F_N}$

$P = \dfrac{T_P}{T_P + F_P}, \qquad R = \dfrac{T_P}{T_P + F_N}$

$\mathrm{weighted}\,F1_i = 2 \times \dfrac{P_i \times R_i}{P_i + R_i} \times \dfrac{\mathrm{sample}_i}{\mathrm{sample}_{\mathrm{all}}}$

$\mathrm{weighted}\,F1 = \sum_{i=1}^{n} \mathrm{weighted}\,F1_i$

where $T_P$, $T_N$, $F_P$ and $F_N$ are the numbers of true positives, true negatives, false positives and false negatives, respectively; $P_i$ and $R_i$ are the precision and recall on category $i$; and $\mathrm{sample}_i / \mathrm{sample}_{\mathrm{all}}$ is the fraction of samples belonging to category $i$, with $n = |C|$ categories in total.
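As a sanity check on these definitions, the following sketch (assuming scikit-learn and NumPy; not the authors' evaluation code) computes accuracy and the weighted-average F1 both directly from the formulas above and via the equivalent library call:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_recall_fscore_support

y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 2])  # toy labels over |C| = 3 categories
y_pred = np.array([0, 1, 1, 1, 2, 2, 2, 2, 0, 2])

# Per-category F1 scaled by the class frequency sample_i / sample_all, then summed.
p, r, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1, 2], zero_division=0
)
weighted_f1_manual = float(np.sum(f1 * support / support.sum()))

print(accuracy_score(y_true, y_pred))                # overall accuracy
print(weighted_f1_manual)                            # from the formulas above
print(f1_score(y_true, y_pred, average="weighted"))  # library equivalent
```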
D. Experimental Results

The experimental results on Court-View-Gen, CAIL2018 and CAIL-Long are summarized in TABLE III.

TABLE III. EXPERIMENTAL RESULTS (EACH CELL: ACCURACY / WEIGHTED-AVERAGE F1)

Model           Court-View-Gen    CAIL2018 (task 1)   CAIL2018 (task 2)   CAIL-Long (criminal)   CAIL-Long (civil)
Longformer      87.32% / 0.8603   85.59% / 0.8478     93.55% / 0.9263     91.16% / 0.9010        76.91% / 0.7343
Big Bird        87.66% / 0.8646   86.27% / 0.8553     93.87% / 0.9294     91.45% / 0.9015        77.99% / 0.7490
Legal RoBERTa   88.21% / 0.8701   84.93% / 0.8367     92.18% / 0.9229     91.98% / 0.9098        76.76% / 0.7383
Lawformer       89.17% / 0.8785   88.34% / 0.8814     96.05% / 0.9553     95.24% / 0.9484        81.95% / 0.7958

Conclusion 1: the domain-specific long-text PLM shows superiority on long-text datasets.

Lawformer clearly achieves the best performance among the four models by a wide margin in both accuracy and weighted-average F1 score, exceeding the other models by almost 1% on Court-View-Gen, 2% on CAIL2018 and 4% on CAIL-Long. Moreover, Lawformer outperforms Legal RoBERTa, although both are pre-trained on the same legal corpus, by approximately 4% on CAIL-Long but only by about 1% on Court-View-Gen. The reason is that the average text length of CAIL-Long is over 900 tokens, while Court-View-Gen documents do not surpass the maximum input length of 512 tokens set by BERT; Legal RoBERTa therefore has to truncate any text longer than 512 tokens, whereas Lawformer can handle up to 4096 tokens. Consequently, Lawformer is more effective at processing long text.

Conclusion 2: the classification effectiveness of PLMs decreases as the semantic composition and complexity of the documents increase.

It can be observed that, among the three datasets, only on the civil part of CAIL-Long are the weighted-average F1 scores of all models lower than 0.8. After investigating, we find that Chinese legal documents are divided into three types in terms of cause, i.e., civil, criminal and administrative, with civil cases being more difficult to handle than criminal cases due to their more complex merits. In addition, the litigation claims in civil documents are more diversified than those in criminal documents. Furthermore, all datasets used in our experiments are collected from criminal documents, except for the civil part of CAIL-Long. These issues explain why the performance of all models is slightly inferior on the civil part.
From TABLE III we can also see that Legal RoBERTa performs even slightly better than Longformer and Big Bird on the criminal part of CAIL-Long, which is somewhat surprising. It is therefore worth investigating further which performs better: a short-text model pre-trained on the corresponding domain corpus, or a long-text model pre-trained on a general-domain corpus.

IV. CONCLUSION

In this paper, we compare four pre-trained language models, built on corpora from different domains, which achieve state-of-the-art results compared with traditional machine learning and deep neural network models in various NLP tasks, on three Chinese legal document datasets. The experimental results show that Lawformer, pre-trained on legal long documents, outperforms the other models, which confirms the effectiveness of pre-training models on domain-specific corpora in order to improve the comprehension of texts in that domain.

For future work, we would like to tackle the inferior classification performance on civil cases relative to criminal cases. Given that civil cases are usually more complicated, we intend to refine the classification process by extracting the elemental features and disputed points in the legal documents and integrating this knowledge into pre-trained models.

REFERENCES

[1] H. Zhong, C. Xiao, C. Tu, T. Zhang, Z. Liu, and M. Sun, "How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence," in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 2020, pp. 5218–5230, doi: 10.18653/v1/2020.acl-main.466.
[2] C.-L. Liu, C.-T. Chang, and J.-H. Ho, "Case Instance Generation and Refinement for Case-Based Criminal Summary Judgments in Chinese," J. Inf. Sci. Eng., vol. 20, pp. 783–800, Jul. 2004.
[3] G. Boella, L. Di Caro, and L. Humphreys, "Using classification to support legal knowledge engineers in the Eunomos legal document management system," in Fifth International Workshop on Juris-informatics (JURISIN), Jan. 2011.
[4] N. Aletras, D. Tsarapatsanis, D. Preoţiuc-Pietro, and V. Lampos, "Predicting judicial decisions of the European Court of Human Rights: a Natural Language Processing perspective," PeerJ Comput. Sci., vol. 2, p. e93, Oct. 2016, doi: 10.7717/peerj-cs.93.
[5] N. Capuano, C. De Maio, S. Salerno, and D. Toti, "A Methodology based on Commonsense Knowledge and Ontologies for the Automatic Classification of Legal Cases," in Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS '14), Thessaloniki, Greece, 2014, pp. 1–6, doi: 10.1145/2611040.2611048.
[6] R. Chhatwal, P. Gronvall, N. Huber-Fliflet, R. Keeling, J. Zhang, and H. Zhao, "Explainable Text Classification in Legal Document Review: A Case Study of Explainable Predictive Coding," in 2018 IEEE International Conference on Big Data (Big Data), 2018, pp. 1905–1911, doi: 10.1109/BigData.2018.8622073.
[7] H. Ye, X. Jiang, Z. Luo, and W. Chao, "Interpretable Charge Predictions for Criminal Cases: Learning to Generate Court Views from Fact Descriptions," in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, Louisiana, Jun. 2018, pp. 1854–1864, doi: 10.18653/v1/N18-1168.
[8] C.J. Mahoney, J. Zhang, N. Huber-Fliflet, P. Gronvall, and H. Zhao, "A Framework for Explainable Text Classification in Legal Document Review," in 2019 IEEE International Conference on Big Data (Big Data), 2019, pp. 1858–1867, doi: 10.1109/BigData47090.2019.9005659.
[9] A.-M. Avram, V. Pais, and D.I. Tufis, "PyEuroVoc: A Tool for Multilingual Legal Document Classification with EuroVoc Descriptors," in Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), Held Online, 2021, pp. 92–101.
[10] I. Chalkidis, M. Fergadiotis, and I. Androutsopoulos, "MultiEURLEX - A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer," in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Online and Punta Cana, Dominican Republic, 2021, pp. 6974–6996, doi: 10.18653/v1/2021.emnlp-main.559.
[11] G. Xiao, J. Mo, E. Chow, H. Chen, J. Guo, and Z. Gong, "Multi-Task CNN for Classification of Chinese Legal Questions," in 2017 IEEE 14th International Conference on e-Business Engineering (ICEBE), 2017, pp. 84–90, doi: 10.1109/ICEBE.2017.22.
[12] F. Wei, H. Qin, S. Ye, and H. Zhao, "Empirical Study of Deep Learning for Text Classification in Legal Document Review," in 2018 IEEE International Conference on Big Data (Big Data), 2018, pp. 3317–3320, doi: 10.1109/BigData.2018.8622157.
[13] R. Keeling, R. Chhatwal, N. Huber-Fliflet, J. Zhang, F. Wei, H. Zhao, Y. Shi, and H. Qin, "Empirical Comparisons of CNN with Other Learning Algorithms for Text Classification in Legal Document Review," in 2019 IEEE International Conference on Big Data (Big Data), 2019, pp. 2038–2042, doi: 10.1109/BigData47090.2019.9006248.
[14] J. Lee, and H. Lee, "A Comparison Study on Legal Document Classification Using Deep Neural Networks," in 2019 International Conference on Information and Communication Technology Convergence (ICTC), 2019, pp. 926–928, doi: 10.1109/ICTC46691.2019.8939926.
[15] Q. Han, and D. Snaidauf, "Comparison of Deep Learning Technologies in Legal Document Classification," in 2021 IEEE International Conference on Big Data (Big Data), 2021, pp. 2701–2704, doi: 10.1109/BigData52589.2021.9671486.
[16] X. Guo, H. Zhang, L. Ye, and S. Li, "RnnTd: An Approach Based on LSTM and Tensor Decomposition for Classification of Crimes in Legal Cases," in 2019 IEEE Fourth International Conference on Data Science in Cyberspace (DSC), Jun. 2019, pp. 16–22, doi: 10.1109/DSC.2019.00012.
[17] N. Bansal, A. Sharma, and R.K. Singh, "An Evolving Hybrid Deep Learning Framework for Legal Document Classification," Ingénierie des Systèmes d'Information, vol. 24, no. 4, pp. 425–431, Oct. 2019, doi: 10.18280/isi.240410.
[18] L. Wan, G. Papageorgiou, M. Seddon, and M. Bernardoni, "Long-length Legal Document Classification," arXiv:1912.06905 [cs], Dec. 2019.
[19] B. Clavié, A. Gheewala, P. Briton, M. Alphonsus, R. Laabiyad, and F. Piccoli, "LegaLMFiT: Efficient Short Legal Text Classification with LSTM Language Model Pre-Training," arXiv:2109.00993 [cs], Sep. 2021.
[20] P. Yang, Y. Wu, T. Cheng, X. Lyu, and Z. Wang, "Segment-Level Sentiment Classification for Online Comments of Legal Cases," in 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), 2020, pp. 366–370, doi: 10.1109/DASC-PICom-CBDCom-CyberSciTech49142.2020.00071.
[21] Y. Fang, X. Tian, H. Wu, S. Gu, Z. Wang, F. Wang, J. Li, and Y. Weng, "Few-Shot Learning for Chinese Legal Controversial Issues Classification," IEEE Access, vol. 8, pp. 75022–75034, 2020, doi: 10.1109/ACCESS.2020.2988493.
[22] D. Tuggener, P. von Däniken, T. Peetz, and M. Cieliebak, "LEDGAR: A Large-Scale Multi-label Corpus for Text Classification of Legal Provisions in Contracts," in Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France, 2020, pp. 1235–1241.
[23] Z. Shaheen, G. Wohlgenannt, and E. Filtz, "Large Scale Legal Text Classification Using Transformer Models," arXiv:2010.12871 [cs], Oct. 2020.
[24] A. Aguiar, R. Silveira, V. Pinheiro, V. Furtado, and J.A. Neto, "Text Classification in Legal Documents Extracted from Lawsuits in Brazilian Courts," in Intelligent Systems, Cham, 2021, pp. 586–600, doi: 10.1007/978-3-030-91699-2_40.
[25] S. Jayasinghe, L. Rambukkanage, A. Silva, N. de Silva, and A.S. Perera, "Critical Sentence Identification in Legal Cases Using Multi-Class Classification," in 2021 IEEE 16th International Conference on Industrial and Information Systems (ICIIS), 2021, pp. 146–151, doi: 10.1109/ICIIS53135.2021.9660657.
[26] H. Zhong, Z. Guo, C. Tu, C. Xiao, Z. Liu, and M. Sun, "Legal Judgment Prediction via Topological Learning," in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, Oct. 2018, pp. 3540–3549, doi: 10.18653/v1/D18-1390.
[27] I. Chalkidis, M. Fergadiotis, P. Malakasiotis, N. Aletras, and I. Androutsopoulos, "LEGAL-BERT: The Muppets straight out of Law School," in Findings of the Association for Computational Linguistics: EMNLP 2020, Online, 2020, pp. 2898–2904, doi: 10.18653/v1/2020.findings-emnlp.261.
[28] C. Xiao, X. Hu, Z. Liu, C. Tu, and M. Sun, "Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents," arXiv:2105.03887 [cs], May 2021.
[29] D. Song, A. Vold, K. Madan, and F. Schilder, "Multi-label legal document classification: A deep learning-based approach with label-attention and domain-specific pre-training," Information Systems, vol. 106, p. 101718, May 2022, doi: 10.1016/j.is.2021.101718.
[30] T. Cover, and P. Hart, "Nearest neighbor pattern classification," IEEE Transactions on Information Theory, vol. 13, no. 1, pp. 21–27, Jan. 1967, doi: 10.1109/TIT.1967.1053964.
[31] G. Salton, and C. Buckley, "Term-weighting approaches in automatic text retrieval," Information Processing & Management, vol. 24, no. 5, pp. 513–523, Jan. 1988, doi: 10.1016/0306-4573(88)90021-0.
[32] C. Cortes, and V. Vapnik, "Support-vector networks," Mach. Learn., vol. 20, no. 3, pp. 273–297, Sep. 1995, doi: 10.1007/BF00994018.
[33] N. Capuano, C. De Maio, S. Salerno, and D. Toti, "A Methodology based on Commonsense Knowledge and Ontologies for the Automatic Classification of Legal Cases," in Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS '14), New York, NY, USA, 2014, pp. 1–6, doi: 10.1145/2611040.2611048.
[34] Y. Kim, "Convolutional Neural Networks for Sentence Classification," in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 2014, pp. 1746–1751, doi: 10.3115/v1/D14-1181.
[35] A. Elnaggar, C. Gebendorfer, I. Glaser, and F. Matthes, "Multi-Task Deep Learning for Legal Document Translation, Summarization and Multi-Label Classification," in Proceedings of the 2018 Artificial Intelligence and Cloud Computing Conference (AICCC '18), Tokyo, Japan, 2018, pp. 9–15, doi: 10.1145/3299819.3299844.
[36] G. Tang, M. Müller, A. Rios, and R. Sennrich, "Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures," in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, Oct. 2018, pp. 4263–4272, doi: 10.18653/v1/D18-1458.
[37] I. Chalkidis, M. Fergadiotis, P. Malakasiotis, N. Aletras, and I. Androutsopoulos, "Extreme Multi-Label Legal Text Classification: A case study in EU Legislation," arXiv:1905.10892 [cs], May 2019.
[38] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhudinov, R. Zemel, and Y. Bengio, "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention," in Proceedings of the 32nd International Conference on Machine Learning, Jun. 2015, pp. 2048–2057.
[39] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota, 2019, pp. 4171–4186, doi: 10.18653/v1/N19-1423.
[40] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, "RoBERTa: A Robustly Optimized BERT Pretraining Approach," arXiv:1907.11692 [cs], Jul. 2019.
[41] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is All you Need," in Advances in Neural Information Processing Systems, 2017, vol. 30.
[42] R. Child, S. Gray, A. Radford, and I. Sutskever, "Generating Long Sequences with Sparse Transformers," arXiv:1904.10509 [cs, stat], Apr. 2019.
[43] I. Beltagy, M.E. Peters, and A. Cohan, "Longformer: The Long-Document Transformer," arXiv:2004.05150 [cs], Dec. 2020.
[44] J. Qiu, H. Ma, O. Levy, W. Yih, S. Wang, and J. Tang, "Blockwise Self-Attention for Long Document Understanding," in Findings of the Association for Computational Linguistics: EMNLP 2020, Online, Nov. 2020, pp. 2555–2565, doi: 10.18653/v1/2020.findings-emnlp.232.
[45] M. Zaheer, G. Guruganesh, A. Dubey, J. Ainslie, C. Alberti, S. Ontanon, P. Pham, A. Ravula, Q. Wang, L. Yang, and A. Ahmed, "Big Bird: Transformers for Longer Sequences," arXiv:2007.14062 [cs, stat], Jan. 2021.
[46] N. Kitaev, L. Kaiser, and A. Levskaya, "Reformer: The Efficient Transformer," presented at the International Conference on Learning Representations, Sep. 2019.
[47] A. Andoni, P. Indyk, T. Laarhoven, I. Razenshteyn, and L. Schmidt, "Practical and Optimal LSH for Angular Distance," in Advances in Neural Information Processing Systems, 2015, vol. 28.
[48] C. Xiao, H. Zhong, Z. Guo, C. Tu, Z. Liu, M. Sun, Y. Feng, X. Han, Z. Hu, H. Wang, and J. Xu, "CAIL2018: A Large-Scale Legal Dataset for Judgment Prediction," arXiv:1807.02478 [cs], Jul. 2018.
[49] H. Zhong, C. Xiao, Z. Guo, C. Tu, Z. Liu, M. Sun, Y. Feng, X. Han, Z. Hu, H. Wang, and J. Xu, "Overview of CAIL2018: Legal Judgment Prediction Competition," arXiv:1810.05851 [cs], Oct. 2018.
[50] H. Ye, X. Jiang, Z. Luo, and W. Chao, "Interpretable Charge Predictions for Criminal Cases: Learning to Generate Court Views from Fact Descriptions," in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, Louisiana, Jun. 2018, pp. 1854–1864, doi: 10.18653/v1/N18-1168.