Polish Court Ruling Classification Using Deep Neural Networks
Abstract
1. Introduction
1.1. Numerical Representation of Words
1.2. Polish Language Embeddings
1.3. Machine Learning Models
1.3.1. Convolutional Neural Networks (CNNs)
1.3.2. Recurrent Neural Networks (RNNs)
Legal Document Classification
Main Objectives of the Presented Work
2. Materials and Methods
2.1. Polish Court Ruling Dataset
2.2. Models for Polish Court Rulings Classification
2.2.1. Baseline Model
2.2.2. Convolutional Models
- conv-max-dense: a convolutional model with max pooling and a hidden dense layer, presented in Figure 4;
- conv-avg-dense: the same as the previous one, but with average pooling;
- conv: a simple convolutional model with average pooling and no hidden dense layer.
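The three convolutional variants differ mainly in how the feature maps produced by the convolution are pooled before classification. The pure-Python sketch below illustrates that step in the spirit of Kim (2014). A one-dimensional convolution with ReLU slides over a sequence of word vectors, and the resulting feature maps are then reduced over time with either max or average pooling. Several details are our assumptions rather than details taken from Figure 4: the filter width, the ReLU activation, pooling over the whole sequence, and the helper names. The dense classifier that would follow the pooled vector is omitted.

```python
from typing import List, Sequence

Vector = List[float]


def conv1d_relu(seq: Sequence[Vector],
                kernels: Sequence[Sequence[Vector]],
                biases: Sequence[float]) -> List[Vector]:
    """Valid (unpadded) 1D convolution with stride 1, followed by ReLU.

    seq:     T word vectors, each of dimension d
    kernels: F filters, each a window of k vectors of dimension d
    biases:  one bias per filter
    returns: T - k + 1 feature vectors, each with F values
    """
    k = len(kernels[0])
    feature_maps = []
    for t in range(len(seq) - k + 1):
        window = seq[t:t + k]
        features = []
        for kernel, bias in zip(kernels, biases):
            # Dot product between the filter and the k word vectors under it.
            s = bias + sum(w * x
                           for k_vec, x_vec in zip(kernel, window)
                           for w, x in zip(k_vec, x_vec))
            features.append(max(0.0, s))  # ReLU (assumed activation)
        feature_maps.append(features)
    return feature_maps


def global_max_pool(feature_maps: Sequence[Vector]) -> Vector:
    """Max over time: keeps the strongest response of each filter."""
    return [max(column) for column in zip(*feature_maps)]


def global_avg_pool(feature_maps: Sequence[Vector]) -> Vector:
    """Average over time: mean response of each filter along the sequence."""
    n = len(feature_maps)
    return [sum(column) / n for column in zip(*feature_maps)]


if __name__ == "__main__":
    seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # toy 2-d "embeddings"
    kernels = [[[1.0, 0.0], [0.0, 1.0]]]                     # one filter of width 2
    fmap = conv1d_relu(seq, kernels, [0.0])
    print(fmap)                   # [[2.0], [1.0], [1.0]]
    print(global_max_pool(fmap))  # [2.0]
    print(global_avg_pool(fmap))  # [1.3333333333333333]
```

Max pooling keeps only each filter's strongest response anywhere in the sequence. Average pooling also reflects how often and how strongly the filter fires across the whole document.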
2.2.3. Recurrent Models
2.2.4. Law References Models
2.3. Software Implementation
2.4. Training Process
2.5. Metrics
3. Results
3.1. Embedding Matrix Selection
3.2. Model Selection
3.3. Legal Document Reference Model Analysis
3.4. Hyper-Parameter Optimization
3.5. Final Model Quality
3.5.1. Performance Analysis
3.5.2. Incorrect Prediction Statistics
3.5.3. Incorrect Prediction Analysis
4. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. In Proceedings of the 1st International Conference on Learning Representations, Scottsdale, AZ, USA, 2–4 May 2013.
- Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed representations of words and phrases and their compositionality. Adv. Neural Inf. Process. Syst. 2013, 3111–3119.
- Mykowiecka, A.; Marciniak, M.; Rychlik, P. Testing word embeddings for Polish. Cogn. Études Cogn. 2017, 17, 1468.
- Word2Vec Polish Models by IPIPAN. Available online: http://dsmodels.nlp.ipipan.waw.pl/ (accessed on 15 January 2022).
- Géron, A. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems; O’Reilly Media: Sebastopol, CA, USA, 2019.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- Kim, Y. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 25–29 October 2014; a Meeting of SIGDAT, a Special Interest Group of the ACL; Moschitti, A., Pang, B., Daelemans, W., Eds.; ACL: Cambridge, MA, USA, 2014; pp. 1746–1751.
- Johnson, R.; Zhang, T. Effective Use of Word Order for Text Categorization with Convolutional Neural Networks. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, CO, USA, 31 May–5 June 2015; Mihalcea, R., Chai, J.Y., Sarkar, A., Eds.; The Association for Computational Linguistics: Cambridge, MA, USA, 2015; pp. 103–112.
- Elbayad, M.; Besacier, L.; Verbeek, J. Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction. In Proceedings of the 22nd Conference on Computational Natural Language Learning, CoNLL 2018, Brussels, Belgium, 31 October–1 November 2018; Korhonen, A., Titov, I., Eds.; Association for Computational Linguistics: Cambridge, MA, USA, 2018; pp. 97–107.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016; Available online: http://www.deeplearningbook.org (accessed on 15 January 2022).
- Pascanu, R.; Mikolov, T.; Bengio, Y. On the difficulty of training Recurrent Neural Networks. In Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA, 17–19 June 2013.
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
- Cho, K.; van Merrienboer, B.; Gülçehre, Ç.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 25–29 October 2014; a Meeting of SIGDAT, a Special Interest Group of the ACL; Moschitti, A., Pang, B., Daelemans, W., Eds.; ACL: Cambridge, MA, USA, 2014; pp. 1724–1734.
- Sulea, O.; Zampieri, M.; Malmasi, S.; Vela, M.; Dinu, L.P.; van Genabith, J. Exploring the Use of Text Classification in the Legal Domain. In Proceedings of the Second Workshop on Automated Semantic Analysis of Information in Legal Texts Co-Located with the 16th International Conference on Artificial Intelligence and Law (ICAIL 2017), London, UK, 16 June 2017; Ashley, K.D., Atkinson, K., Branting, L.K., Francesconi, E., Grabmair, M., Lauritsen, M., Walker, V.R., Wyner, A.Z., Eds.; CEUR-WS.org: London, UK, 2017; Volume 2143.
- Undavia, S.; Meyers, A.; Ortega, J. A Comparative Study of Classifying Legal Documents with Neural Networks. In Proceedings of the 2018 Federated Conference on Computer Science and Information Systems, Poznań, Poland, 9–12 September 2018; pp. 515–522.
- Fernandes, W.P.D.; Silva, L.J.S.; Frajhof, I.Z.; de Almeida, G.F.C.F.; Konder, C.N.; Nasser, R.B.; de Carvalho, G.R.; Barbosa, S.D.J.; Lopes, H.C.V. Appellate Court Modifications Extraction for Portuguese. Artif. Intell. Law 2020, 28, 327–360.
- Wan, L.; Papageorgiou, G.; Seddon, M.; Bernardoni, M. Long-length Legal Document Classification. arXiv 2019, arXiv:1912.06905.
- Waltl, B.; Bonczek, G.; Scepankova, E.; Matthes, F. Semantic types of legal norms in German laws: Classification and analysis using local linear explanations. Artif. Intell. Law 2019, 27, 43–71.
- Noguti, M.Y.; Vellasques, E.; Oliveira, L.S. Legal Document Classification: An Application to Law Area Prediction of Petitions to Public Prosecution Service. In Proceedings of the 2020 International Joint Conference on Neural Networks, Glasgow, UK, 19–24 July 2020; pp. 1–8.
- Ruggeri, F.; Lagioia, F.; Lippi, M.; Torroni, P. Detecting and explaining unfairness in consumer contracts through memory networks. Artif. Intell. Law 2021, 30, 59–92.
- Adhikari, A.; Ram, A.; Tang, R.; Lin, J. Rethinking Complex Neural Network Architectures for Document Classification. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 19 June 2019; Volume 1, pp. 4046–4051.
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
- TensorFlow RNN Performance on GPU. Available online: https://www.tensorflow.org/guide/keras/rnn#performance_optimization_and_cudnn_kernels (accessed on 15 January 2022).
Field | Coverage | Unique Values |
---|---|---|
Case ID | 100% | 144,784 |
Court ID | 100% | 253 |
Title | 100% | 61,321 |
Signature | 100% | 118,619 |
Court | 100% | 253 |
Department | 100% | 136 |
Date | 100% | 2827 |
Judge | 88% | 7274 |
Topic | 74% | 7037 |
Legal basis | 75% | 57,241 |
Thesis | 9% | 10,388 |
Model | Variant | Description | Trainable Parameters |
---|---|---|---|
RNN + dense | lstm-dense-skipg-ns | The baseline model with embeddings trained with Skip-Gram and Negative Sampling (Figure 3) | 43,812 |
RNN + dense | lstm-dense-cbow-hs | The baseline model with embeddings trained with CBOW and Hierarchical Softmax | 43,812 |
RNN + dense | lstm-dense-cbow-ns | The baseline model with embeddings trained with CBOW and Negative Sampling | 43,812 |
RNN + dense | lstm-dense-skipg-hs | The baseline model with embeddings trained with Skip-Gram and Hierarchical Softmax | 43,812 |
RNN + dense | lstm-dense-100 | The baseline model with smaller embeddings (100-dimensional vectors instead of the default 300-dimensional ones), trained with Skip-Gram and NS | 43,812 |
RNN + dense | gru-dense | Similar to the baseline model, but with GRU cells | 33,252 |
RNN + dense | bd-gru-dense | Similar to the baseline model, but with a bidirectional recurrent layer of GRU cells | 66,340 |
Conv + dense | conv-max-dense | Convolutional model with max pooling and a hidden dense layer (Figure 4) | 181,572 |
Conv + dense | conv-avg-dense | Convolutional model with average pooling and a hidden dense layer (Figure 4) | 181,572 |
RNN | lstm | Simple model with a recurrent layer of LSTM cells and no hidden dense layer | 42,756 |
RNN | gru | Simple model with a recurrent layer of GRU cells and no hidden dense layer | 32,196 |
Conv | conv | Simple convolutional model with average pooling and no hidden dense layer | 47,908 |
Dense | dense | Model with average pooling and dense layers | 9764 |
Legal references | gru-dense-law | Similar to the gru-dense model, but with an additional input encoding the legal codes referenced in the rulings (Figure 7) | 33,892 |
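The trainable-parameter column can be checked by hand. The sketch below uses the standard parameter formulas for dense, LSTM and GRU layers. For GRU it assumes the Keras default `reset_after=True`, which gives each gate two bias vectors. It also makes four assumptions about layer sizes:

- 300-dimensional embeddings that are not trained, so the embedding layer contributes no trainable parameters;
- 32 recurrent units;
- a 32-unit hidden dense layer;
- a 4-class output layer.

Under these assumptions it reproduces the counts listed for lstm, gru, the 300-dimensional lstm-dense baselines, gru-dense, bd-gru-dense and dense. The layer sizes are inferred from the counts themselves, not stated in the table.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Fully connected layer: weight matrix plus one bias per output."""
    return n_in * n_out + n_out


def lstm_params(n_in: int, units: int) -> int:
    """LSTM: four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (n_in + units) + units)


def gru_params(n_in: int, units: int) -> int:
    """GRU with Keras' default reset_after=True: three gates, two bias vectors each."""
    return 3 * (units * (n_in + units) + 2 * units)


# Assumed sizes (inferred, not stated in the table); frozen embeddings add nothing.
EMB_DIM, UNITS, HIDDEN, CLASSES = 300, 32, 32, 4

head = dense_params(UNITS, HIDDEN) + dense_params(HIDDEN, CLASSES)  # hidden + output
counts = {
    "lstm": lstm_params(EMB_DIM, UNITS) + dense_params(UNITS, CLASSES),
    "gru": gru_params(EMB_DIM, UNITS) + dense_params(UNITS, CLASSES),
    "lstm-dense-skipg-ns": lstm_params(EMB_DIM, UNITS) + head,
    "gru-dense": gru_params(EMB_DIM, UNITS) + head,
    # Forward and backward GRUs; their concatenated 64-d output feeds the hidden layer.
    "bd-gru-dense": 2 * gru_params(EMB_DIM, UNITS)
    + dense_params(2 * UNITS, HIDDEN) + dense_params(HIDDEN, CLASSES),
    # Average pooling has no weights; only the two dense layers are trainable.
    "dense": dense_params(EMB_DIM, HIDDEN) + dense_params(HIDDEN, CLASSES),
}

for name, n in counts.items():
    print(f"{name}: {n:,}")
```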
Embedding Matrix | Accuracy | F1-Score | Epochs |
---|---|---|---|
CBOW NS | 0.9582 | 0.9319 | 21 |
CBOW HS | 0.9787 | 0.9652 | 18.4 |
Skip-Gram NS | 0.9937 | 0.9902 | 13.8 |
Skip-Gram HS | 0.9934 | 0.9897 | 14.4 |
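Each embedding matrix in this comparison comes from a different pretrained word2vec model. The variants combine CBOW or Skip-Gram with negative sampling or hierarchical softmax, such as the Polish models published by IPIPAN. The sketch below shows one way such a model can be turned into the matrix that initializes an embedding layer, assuming the vectors have already been loaded into a Python dictionary. Reserving row 0 for padding, using zero vectors for words the model does not cover, and the function name are our assumptions.

```python
from typing import Dict, List, Sequence


def build_embedding_matrix(vocab: Sequence[str],
                           vectors: Dict[str, List[float]],
                           dim: int) -> List[List[float]]:
    """Return a (len(vocab) + 1) x dim matrix for an embedding layer.

    Row 0 is reserved for padding; row i holds the pretrained vector of
    vocab[i - 1], or zeros if the pretrained model does not contain that word.
    """
    matrix = [[0.0] * dim]  # padding row
    for word in vocab:
        vec = vectors.get(word)
        if vec is None:
            matrix.append([0.0] * dim)  # word missing from the pretrained model
            continue
        if len(vec) != dim:
            raise ValueError(f"{word!r} has {len(vec)} dimensions, expected {dim}")
        matrix.append([float(v) for v in vec])
    return matrix


if __name__ == "__main__":
    # Toy 2-d vectors standing in for a real 300-d (or 100-d) word2vec model.
    pretrained = {"sąd": [0.1, 0.2], "wyrok": [0.3, 0.4]}
    m = build_embedding_matrix(["sąd", "wyrok", "apelacja"], pretrained, dim=2)
    print(m)  # [[0.0, 0.0], [0.1, 0.2], [0.3, 0.4], [0.0, 0.0]]
```

Using such a matrix as the fixed, non-trainable weights of the embedding layer keeps the pretrained vectors unchanged during training. Only the layers above the embedding layer are then optimized.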
Model | Variant | Accuracy | F1-Score | Epochs | Training Time (relative to lstm-dense-skipg-ns) |
---|---|---|---|---|---|
RNN + dense | lstm-dense-skipg-ns | 0.9937 | 0.9902 | 13.8 | 1 |
RNN + dense | lstm-dense-cbow-hs | 0.9787 | 0.9652 | 18.4 | 1.34 |
RNN + dense | lstm-dense-cbow-ns | 0.9582 | 0.9319 | 21 | 1.62 |
RNN + dense | lstm-dense-skipg-hs | 0.9934 | 0.9897 | 14.4 | 1.06 |
RNN + dense | lstm-dense-100 | 0.9868 | 0.9778 | 14 | 0.34 |
RNN + dense | gru-dense | 0.9939 | 0.9903 | 11.8 | 0.79 |
RNN + dense | bd-gru-dense | 0.9946 | 0.9914 | 9.6 | 0.69 |
Conv + dense | conv-max-dense | 0.9905 | 0.9851 | 3.8 | 0.06 |
Conv + dense | conv-avg-dense | 0.9891 | 0.9832 | 3.4 | 0.05 |
RNN | lstm | 0.9895 | 0.9836 | 7 | 0.47 |
RNN | gru | 0.9939 | 0.9902 | 11.4 | 0.75 |
Conv | conv | 0.9900 | 0.9843 | 4.2 | 0.06 |
Dense | dense | 0.9428 | 0.8989 | 14 | 0.06 |
Legal references | gru-dense-law | 0.9940 | 0.9904 | 9.2 | 0.65 |
Model | Accuracy | F1-Score | Precision | Recall |
---|---|---|---|---|
conv-max-dense | 0.9927 | 0.9886 | 0.9904 | 0.9869 |
gru | 0.9950 | 0.9921 | 0.9932 | 0.9911 |
bd-gru-dense | 0.9949 | 0.9917 | 0.9927 | 0.9907 |
Model | Accuracy | F1-Score | Wrong Predictions | Correct Predictions |
---|---|---|---|---|
conv-max-dense | 0.9924 | 0.9878 | 221 | 28,734 |
gru | 0.9947 | 0.9915 | 153 | 28,802 |
bd-gru-dense | 0.9942 | 0.9906 | 169 | 28,786 |
Actual / Predicted | Civil | Economic | Criminal | Labor |
---|---|---|---|---|
Civil | 13,835 | 27 | 1 | 4 |
Economic | 125 | 2138 | 0 | 2 |
Criminal | 3 | 1 | 5956 | 1 |
Labor | 48 | 4 | 5 | 6805 |
Actual / Predicted | Civil | Economic | Criminal | Labor |
---|---|---|---|---|
Civil | 13,817 | 38 | 0 | 12 |
Economic | 70 | 2193 | 0 | 2 |
Criminal | 3 | 2 | 5955 | 1 |
Labor | 22 | 2 | 1 | 6837 |
Actual / Predicted | Civil | Economic | Criminal | Labor |
---|---|---|---|---|
Civil | 13,835 | 26 | 1 | 5 |
Economic | 91 | 2170 | 0 | 4 |
Criminal | 5 | 0 | 5956 | 0 |
Labor | 31 | 3 | 3 | 6825 |
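The confusion matrices are sufficient to recompute the summary metrics. Their diagonals sum to 28,734, 28,802 and 28,786 correct predictions. These totals match the conv-max-dense, gru and bd-gru-dense rows of the wrong/correct predictions table, so the three matrices appear to belong to those models in that order. The sketch below computes accuracy and a macro-averaged F1-score, defined as the unweighted mean of the per-class F1 scores. Macro averaging is our assumption, but with it the rounded results agree with the reported values (0.9878, 0.9915 and 0.9906).

```python
from typing import Dict, List, Tuple

Matrix = List[List[int]]

# Rows: actual class; columns: predicted class, both in the order
# Civil, Economic, Criminal, Labor. Model names inferred from the diagonal totals.
MATRICES: Dict[str, Matrix] = {
    "conv-max-dense": [[13835, 27, 1, 4], [125, 2138, 0, 2],
                       [3, 1, 5956, 1], [48, 4, 5, 6805]],
    "gru": [[13817, 38, 0, 12], [70, 2193, 0, 2],
            [3, 2, 5955, 1], [22, 2, 1, 6837]],
    "bd-gru-dense": [[13835, 26, 1, 5], [91, 2170, 0, 4],
                     [5, 0, 5956, 0], [31, 3, 3, 6825]],
}


def accuracy(cm: Matrix) -> float:
    """Share of rulings on the diagonal (correctly classified)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


def per_class_scores(cm: Matrix) -> List[Tuple[float, float, float]]:
    """One-vs-rest precision, recall and F1 for every class."""
    n = len(cm)
    scores = []
    for i in range(n):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(n))  # column sum
        actual = sum(cm[i])                          # row sum
        precision, recall = tp / predicted, tp / actual
        scores.append((precision, recall, 2 * precision * recall / (precision + recall)))
    return scores


def macro_f1(cm: Matrix) -> float:
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = per_class_scores(cm)
    return sum(f1 for _, _, f1 in scores) / len(scores)


for name, cm in MATRICES.items():
    print(f"{name}: accuracy={accuracy(cm):.4f}, macro F1={macro_f1(cm):.4f}")
```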
Model | Max Memory [GB] | Full Pipeline Time [s] | Training Time [s] | Training Time per Epoch [s] |
---|---|---|---|---|
conv-max-dense | 2.55 ± 0.03 | 882 ± 11 | 641 ± 7 | 160 ± 2 |
gru | 2.74 ± 0.02 | 10,927 ± 79 | 10,670 ± 92 | 889 ± 8 |
bd-gru-dense | 3.29 ± 0.04 | 10,031 ± 79 | 9771 ± 79 | 1086 ± 9 |
Model | Max Memory [GB] | Full Pipeline Time [s] | Training Time [s] | Training Time per Epoch [s] |
---|---|---|---|---|
conv-max-dense | 4.30 ± 0.01 | 216 ± 2 | 142 ± 1 | 35 ± 0 |
gru | 3.52 ± 0.01 | 18,127 ± 98 | 18,005 ± 100 | 1500 ± 8 |
bd-gru-dense | 3.72 ± 0.01 | 27,668 ± 110 | 27,538 ± 107 | 951 ± 4 |
Model | Model Memory [MB] | Full Pipeline Time [s] | Prediction Time [s] | Prediction Time per Sample [ms] |
---|---|---|---|---|
conv-max-dense | 643 ± 0 | 286 ± 6 | 54 ± 0 | 2 ± 0 |
gru | 703 ± 3 | 339 ± 5 | 95 ± 1 | 3 ± 0 |
bd-gru-dense | 841 ± 2 | 389 ± 12 | 146 ± 1 | 5 ± 0 |
Model | Model Memory [MB] | Full Pipeline Time [s] | Prediction Time [s] | Prediction Time per Sample [ms] |
---|---|---|---|---|
conv-max-dense | 965 ± 0 | 166 ± 3 | 51 ± 0 | 2 ± 0 |
gru | 1008 ± 0 | 166 ± 3 | 51 ± 0 | 2 ± 0 |
bd-gru-dense | 1148 ± 0 | 194 ± 2 | 80 ± 1 | 3 ± 0 |
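The per-sample column is consistent with dividing the total prediction time by the number of evaluated rulings. That number is 28,955, the correct plus wrong predictions reported for each final model. The short check below rounds that quotient to whole milliseconds for both prediction-time tables. Treating the column as this quotient is our reading, not something the tables state.

```python
EVALUATED_RULINGS = 221 + 28_734  # wrong + correct predictions per final model = 28,955

# Total prediction times [s] from the two prediction-time tables, in table order.
PREDICTION_TIME_S = {
    "first table": {"conv-max-dense": 54, "gru": 95, "bd-gru-dense": 146},
    "second table": {"conv-max-dense": 51, "gru": 51, "bd-gru-dense": 80},
}


def per_sample_ms(total_seconds: float, n_samples: int = EVALUATED_RULINGS) -> float:
    """Average prediction time per ruling, in milliseconds."""
    return total_seconds * 1000 / n_samples


for table, times in PREDICTION_TIME_S.items():
    print(table, {model: round(per_sample_ms(t)) for model, t in times.items()})
```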
Paper | Task | Algorithm | F1-Score |
---|---|---|---|
Sulea et al. (2017) | Classification of French Supreme Court rulings into eight categories; trained on over 120,000 rulings. | SVM ensemble | 0.965 |
Noguti et al. (2020) | Classification of petitions to the Public Prosecution Service, written in Portuguese, into eighteen law areas; trained on about 16,000 documents. | SVM / LR / CNN / LSTM / GRU | 0.83 / 0.83 / 0.82 / 0.85 / 0.84 |
Waltl et al. (2019) | Classification of German Civil Code sentences into nine categories; trained on 601 sentences. | SVM / LR | 0.83 / 0.81 |
Undavia et al. (2018) | Classification of US Supreme Court opinions into fifteen categories; trained on over 7500 documents. | CNN | 0.72 |
Fernandes et al. (2020) | Classification of modifications to lower-court decisions proposed by the Brazilian Appellate Court; 3022 documents divided into six categories. | BiLSTM + CRF / BiGRU + CRF / BiLSTM / BiGRU / CRF | 0.948 / 0.917 / 0.890 / 0.878 / 0.860 |
Wan et al. (2019) | Classification of the U.S. Securities and Exchange Commission EDGAR dataset: 28,445 documents in 5 classes. | Splitting into chunks + SVM | 0.981 |
This paper | Classification of Polish court rulings into four categories; the dataset consists of 144,784 records. | conv-max-dense / gru / bd-gru-dense | 0.988 / 0.992 / 0.991 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kostrzewa, Ł.; Nowak, R. Polish Court Ruling Classification Using Deep Neural Networks. Sensors 2022, 22, 2137. https://doi.org/10.3390/s22062137