A Novel Approach On Argument Based Legal Prediction Model Using Machine Learning


Proceedings of the International Conference on Smart Electronics and Communication (ICOSEC 2020)

IEEE Xplore Part Number: CFP20V90-ART; ISBN: 978-1-7281-5461-9

A Novel Approach on Argument based Legal Prediction Model using Machine Learning

Riya Sil
Research Scholar, Dept. of Comp. Sc. & Engg, Adamas University, Kolkata, India
riyasil1802@gmail.com

Abhishek Roy
Assoc. Prof., Dept. of Comp. Sc. & Engg, Adamas University, Kolkata, India
[email protected]

Abstract: “Justice delayed is justice denied”, and this delay of justice is a great bane of the Indian justice system. Every year, countless cases remain pending, awaiting only the final hearing and judicial verdict. Years pass by, keeping the plaintiff waiting for justice. This has been a major issue in the Indian judicial system for years. In this paper, the authors attempt to reduce the problem by decreasing the number of cases before they reach the Court. This is done by helping legal professionals to predict a case outcome from previous records. This paper focuses on cases related to ‘Dowry Death’, i.e. IPC sections 498A and 304B. It aims to support the delivery of justice through argument-based judicial prediction using the Support Vector Machine (SVM) algorithm, and reports the accuracy of that prediction. The model proceeds through the following steps: (i) Hard copies of case files with pronounced judgments related to dowry are collected from trial courts of West Bengal. (ii) The data set is generated manually based on certain parameters such as: (a) ‘Victim Name’; (b) ‘Number of years married (greater than seven years or not)’. In India, these parameters determine the key factors of ‘Dowry’ related cases: if the case is filed within seven years of marriage and the defendant has taken dowry, then the case falls under ‘dowry case’, otherwise not; (c) ‘Dowry taken within seven years of marriage (Yes/No)’; (d) ‘Incident occurred within seven years of marriage’: this parameter shows that if the death has taken place within seven years of marriage, the case falls under ‘Dowry Death’; (e) ‘Postmortem Report (Usual/Unusual Death)’, and many other documented parameters. (iii) A supervised machine learning algorithm, namely the Support Vector Machine (SVM), is used to assist legal judgement through a prediction system. The objective of this paper is to predict whether a person is guilty or not, using a supervised learning approach. The paper reports the performance and accuracy of the model with a standard classifier, i.e. the Support Vector Machine (SVM).

Keywords: Machine Learning, Prediction System, Legal Dataset, Support Vector Machine (SVM).

I. INTRODUCTION
Application of Artificial Intelligence (AI) has helped in the assimilation of human intelligence into machines [1]. AI-based firms are persistent in finding innovative ways of developing technologies [2] that would manage laborious human work in different sectors, enhancing accuracy and efficiency while speeding up the work for better output. The expansion of artificial intelligence in legal firms [3] has led to the emergence of automated service delivery systems. AI has been used in various legal domains and assists in the delivery of justice to its beneficiaries by, for example, (i) helping the lawyer by performing due diligence and research work, (ii) providing additional information and shortcuts through data analytics, and (iii) automating innovative processes in legal work. The unremitting demand and rigorous change in clients’ expectations have led to the adoption of new techniques, which make work faster and easier for legal professionals. The huge population of India and a shortage of judges have led to numerous pending cases. According to a report, in 2019 the Law Minister of India stated that there were more than 43 lakh pending cases across 25 High Courts in the country, of which more than 8 lakh cases [4] had been pending for over a decade. Considering the vast number of accumulating records, besides the recent developments in the COVID-19 pandemic [5], things have worsened over time. At this crucial point in time, enhancement of the legal system using machine learning algorithms [6] is imperative and can help reduce the workload of legal professionals, thus freeing more time to clear the pending cases [7]. Considering these trying times, in which people are to exercise social distancing to curb the spread of COVID-19, this paper aims to revolutionize the process of legal work by making it fully digital and preserving the output for future use. The paper focuses on an argument-based analysis of a legal judgment prediction system that applies a support vector machine algorithm [8] to a judicial dataset of cases related to ‘dowry death’ [9]. Hard copies of case files with pronounced judgments related to dowry are collected from trial courts of West Bengal, the dataset is created manually from the collected documents based on certain parameters, and a support vector machine algorithm (a supervised machine learning algorithm) [10] is used to predict the outcome. The paper reports the classification report and accuracy score obtained with a standard classifier, i.e. the support vector machine [11].

978-1-7281-5461-9/20/$31.00 ©2020 IEEE 487

Authorized licensed use limited to: Auckland University of Technology. Downloaded on October 24,2020 at 15:34:01 UTC from IEEE Xplore. Restrictions apply.

The rest of the paper is organized as follows. Section II discusses the input dataset and its parameters. Section III describes the proposed model using a support vector machine. Section IV presents the performance analysis of the model. Section V concludes the paper and outlines its future scope of work.

II. DATASET
This section describes how the dataset was created by extracting the important features from argument-based legal documents related to ‘dowry cases’ [12]. The parameters are selected on the basis of significant features that act as deciding factors in predicting whether the defendant is guilty or not [13].

A. Generation of Dataset
Before training the machine [14], the dataset needs to be created. A dataset is a collection of data; it can be structured, unstructured or semi-structured. In India, there is no pre-structured argument-based legal judgment dataset [15]; therefore the whole dataset needs to be created from scratch. In this paper, a structured dataset in the form of a Comma-Separated Values (CSV) file [16] has been used, as data can be retrieved from it efficiently. Hard copies of case document files related to ‘dowry death’, i.e. IPC sections 498A and 304B [17], with pronounced judgments are collected from trial courts of West Bengal [18]. As no soft copies of the case files were available, the data from the many hard copies were entered manually into MS-Excel file format. These data are required for training the machine learning model [19]. The process of extracting the features is known as Feature Extraction [20].

Fig. 1 Screenshot of the created dataset

B. Selection of Features
Features [21] act like independent variables that can be fed as input into the machine learning model [22], which can then provide either classification or prediction [23]. In a specific dataset, the features are attributes consisting of values on which the generated output depends. Some of the features or parameters [24] that have been extracted from the hard copies of evidential documents are as follows: (i) ‘Victim Name’; (ii) ‘Number of years married (greater than seven years or not)’: this parameter indicates whether the case falls under ‘dowry case’ [25] or not. If the case is filed within seven years of marriage and the defendant has taken dowry, then the case falls under ‘dowry case’, otherwise not; (iii) ‘Dowry taken within seven years of marriage (Yes/No)’; (iv) ‘Incident taken place within seven years of marriage’: this parameter shows that if the death has taken place within seven years of marriage, the case falls under ‘dowry death case’; (v) ‘Postmortem Report (Usual/Unusual Death)’, and many others. The parameters have been selected according to the significance of the features [26], excluding the unimportant ones. The process of selecting the useful features from a stream of extracted features is known as Feature Selection [27]. Based on these selected features, the model has been trained [28] to predict whether the accused person is guilty or not.

III. PROPOSED MODEL USING SUPPORT VECTOR MACHINE
The dataset that has been created and used in this model [29] for automatic generation of judgment prediction has been explained in detail in the previous section. This section discusses the working principle of the proposed model [30]. The four types of machine learning problems [31] are: (i) supervised learning [32], (ii) unsupervised learning [33], (iii) semi-supervised learning [34], and (iv) reinforcement learning [35]. In this paper, a support vector machine algorithm [36], which falls under supervised machine learning [37], has been used.

A. Support Vector Machine
Support Vector Machines (SVM) [38] have become one of the most powerful tools for solving classification problems [39]. Many algorithms have emerged to solve these problems. Traditional quadratic programming algorithms [40], being one of them, have certain drawbacks. To overcome those drawbacks, SVM [41] has been implemented. SVM [42] is a class of classifiers that classifies a set of data with the help of an N-dimensional hyperplane [43].
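To make the feature encoding of Section II-B and the hyperplane of Section III-A concrete, the following minimal Python sketch turns one case record into a numeric vector and classifies it by the sign of w·x + b. It is an illustration under stated assumptions, not the authors' implementation: the field names, the treatment of exactly seven years as "within seven years", the weights and the bias are all hypothetical, and ‘Victim Name’ is left out of the vector because it is an identifier (whether the authors used it as a model input is not stated).

```python
from dataclasses import dataclass


@dataclass
class CaseRecord:
    """One manually entered case; field names are illustrative, not the authors' CSV headers."""
    years_married: float
    dowry_taken: bool
    death_within_seven_years: bool
    postmortem_unusual: bool


def encode(case):
    """Map a case record to a numeric feature vector (1.0 = yes, 0.0 = no)."""
    return [
        1.0 if case.years_married <= 7 else 0.0,  # assumption: exactly 7 counts as "within"
        1.0 if case.dowry_taken else 0.0,
        1.0 if case.death_within_seven_years else 0.0,
        1.0 if case.postmortem_unusual else 0.0,
    ]


def decision_value(weights, bias, features):
    """Signed score w.x + b; its sign says on which side of the hyperplane the point lies."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def classify(weights, bias, features):
    """Return class 1 on the non-negative side of the hyperplane, class 0 otherwise."""
    return 1 if decision_value(weights, bias, features) >= 0 else 0


# Hypothetical hyperplane parameters, chosen by hand for illustration only.
# A trained SVM would learn them from the labelled dataset.
WEIGHTS = [1.0, 1.0, 1.0, 1.0]
BIAS = -2.5

case_a = CaseRecord(years_married=3, dowry_taken=True,
                    death_within_seven_years=True, postmortem_unusual=True)
case_b = CaseRecord(years_married=12, dowry_taken=False,
                    death_within_seven_years=False, postmortem_unusual=False)

print(encode(case_a), classify(WEIGHTS, BIAS, encode(case_a)))
print(encode(case_b), classify(WEIGHTS, BIAS, encode(case_b)))
```

With four features the separating surface is a hyperplane in four-dimensional space; adding further documented parameters would simply lengthen the vector and raise the dimension N.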

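The classifier creation, fitting, prediction and reporting steps that Section III describes in subsections B to E can be sketched end to end with scikit-learn as follows. This is a hedged illustration only: the six training rows and two test rows are hand-made toy vectors, not the authors' dataset, the three 0/1 columns are an assumed encoding, and the variable names are chosen here for readability. The calls used (svm.SVC, fit, predict, classification_report, accuracy_score) are the standard scikit-learn ones.

```python
from sklearn import svm
from sklearn.metrics import accuracy_score, classification_report

# Toy, hand-made feature vectors (NOT the authors' data). Each row is
# [within_seven_years, dowry_taken, postmortem_unusual] encoded as 0/1.
x_train = [
    [1, 1, 1], [1, 1, 0], [1, 0, 1],   # labelled class 1
    [0, 0, 0], [0, 0, 1], [0, 1, 0],   # labelled class 0
]
t_train = [1, 1, 1, 0, 0, 0]

# Held-out rows with their true labels.
x_test = [[1, 1, 1], [0, 0, 0]]
t_test = [1, 0]

svm_model = svm.SVC()               # classifier creation (default RBF kernel)
svm_model.fit(x_train, t_train)     # fitting inputs into the classifier
t_pred = svm_model.predict(x_test)  # prediction phase

# Classification report and accuracy score for the held-out rows.
print(classification_report(t_test, t_pred))
print("accuracy:", accuracy_score(t_test, t_pred))
```

The report printed by classification_report has the same layout as Table 1 (per-class precision, recall, F1-score and support, followed by accuracy, macro average and weighted average).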

Fig. 2 Support Vector Machine

B. Classifier Creation
In this paper, the SVM module from the Scikit-Learn package of Python has been used. The statement used for creating the classifier is given below:

svmModel = svm.SVC()

C. Fitting inputs into Classifier
The fit() function of the svm.SVC() class has been used to feed the data from the dataset into the created classifier. The statement for feeding the input into the created classifier is as follows:

svmModel.fit(XTrain, TTrain)

Here XTrain denotes the training inputs (name assumed) and TTrain the corresponding training labels.

D. Prediction phase
After the inputs have been fitted to the classifier, the classes predicted by the classifier for the test input data are retrieved. The predict() function of the svm.SVC() class is used for the prediction task. Prediction of the output class is given below:

TPred = svmModel.predict(XTest)

Here XTest denotes the test inputs (name assumed) and TPred the predicted class labels.

E. Classification report and accuracy generation
After the predicted class labels are obtained, a classification report and accuracy score are generated to measure the performance of the classifier.

F. Advantages of an SVM classifier
• When there is a clear margin of separation between the classes of the data, this classifier performs at its best.
• The SVM classifier is appropriate for data in high-dimensional space.
• This classifier is very efficient in terms of memory usage.

IV. PERFORMANCE ANALYSIS OF THE PROPOSED MODEL
The following table shows the classification report and accuracy score of the model.

Table 1: Classification report and Accuracy score of SVM classifier

                    Precision   Recall   F1-Score   Support
Class 0             1.00        0.50     0.67       4
Class 1             0.92        1.00     0.96       23
Accuracy                                 0.93       27
Macro Average       0.96        0.75     0.81       27
Weighted Average    0.93        0.93     0.92       27

With supports of 4 and 23, these figures correspond to 25 of the 27 test cases being classified correctly; the two errors are Class 0 cases predicted as Class 1.

V. CONCLUSION
A prediction system based on a supervised learning algorithm, namely the Support Vector Machine (SVM), applied to a legal data set to assist legal professionals has been discussed. In India, due to a shortage of skilled manpower and infrastructure, beneficiaries have to wait a long time to get their well-deserved justice. As is rightly said, "Justice Delayed Is Justice Denied"; prolonged legal proceedings also lead to various consequences, such as hostility of witnesses, unfitness of the accused on medical grounds, and tampering with or manipulation of evidence. The proposed model will help legal professionals to analyze the desired data set and perform prediction on a case-by-case basis depending on the essential parameters of 'Dowry' death-related cases. The performance and accuracy of the model have been evaluated with a standard classifier (i.e. Support Vector Machine), achieving 93% accuracy. Table 1 states the current status of the proposed model. In future work, additional parameters will be added with the aim of achieving 100% accuracy, so that justice is rightly delivered to all victims and the judicial system becomes more dynamic in approach for the larger benefit of society.

Acknowledgment
This work has been supported by Prof. Arun Kumar Mazumdar, Department of Computer Science and Engineering, IIT Kharagpur through his guidance. Heartiest thanks to the reviewers for their comments and insight.

References
[1] Shao, Z., Yuan, S., & Wang, Y. (2020). Institutional Collaboration and Competition in Artificial Intelligence. IEEE Access, 8, 69734-69741.
[2] Lucci, S., & Kopec, D. (2015). Artificial intelligence in the 21st century. Stylus Publishing, LLC.
[3] Sil, R., Roy, A., Bhushan, B., & Mazumdar, A. K. (2019, October). Artificial Intelligence and Machine Learning based Legal Application: The State-of-the-Art and Future Research Trends. In 2019 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS) (pp. 57-62). IEEE.
[4] Tiwari, R. K., & Singh, A. Digitalization-The New Era of Indian Judiciary.
[5] Hollander, J. E., & Carr, B. G. (2020). Virtually perfect? Telemedicine for COVID-19. New England Journal of Medicine, 382(18), 1679-1681.
[6] Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical bayesian optimization of machine learning algorithms. In Advances in neural information processing systems (pp. 2951-2959).
[7] Shaha, K. K., & Mohanthy, S. (2006). Alleged dowry death: a study of homicidal burns. Medicine, science and the law, 46(2), 105-110.
[8] Burrell, J. (2016). How the machine ‘thinks’: Understanding opacity in machine learning algorithms. Big Data & Society, 3(1), 2053951715622512.
[9] Khan, M. Z., & Ray, R. (1984). Dowry death. Indian Journal of Social Work, 45(3), 303-315.
[10] Liu, Q., He, Q., & Shi, Z. (2008, May). Extreme support vector machine classifier. In Pacific-asia conference on knowledge discovery and data mining (pp. 222-233). Springer, Berlin, Heidelberg.
[11] Huang, X., Shi, L., & Suykens, J. A. (2013). Support vector machine classifier with pinball loss. IEEE transactions on pattern analysis and machine intelligence, 36(5), 984-997.
[12] Mukherjee, R. (1998). Women, Law, and Free Legal Aid in India. Deep & Deep Publications.
[13] Belur, J., Tilley, N., Daruwalla, N., Kumar, M., Tiwari, V., & Osrin, D. (2014). The social construction of ‘dowry deaths’. Social Science & Medicine, 119, 1-9.
[14] Catanzaro, B., Sundaram, N., & Keutzer, K. (2008, July). Fast support vector machine training and classification on graphics processors. In Proceedings of the 25th international conference on Machine learning (pp. 104-111).
[15] Bennett, W. L. (1979). Rhetorical transformation of evidence in criminal trials: Creating grounds for legal judgment. Quarterly Journal of Speech, 65(3), 311-323.
[16] Shafranovich, Y. (2005). Common format and MIME type for comma-separated values (CSV) files.
[17] Agnes, F. (2015). Section 498A, marital rape and adverse propaganda. Economic & Political Weekly, 50(23), 13.
[18] Trivedi, P. K., & Singh, S. (2014). Fallacies of a Supreme Court judgment: Section 498A and the dynamics of acquittals. Economic and Political Weekly, 90-97.
[19] Gu, T., Dolan-Gavitt, B., & Garg, S. (2017). Badnets: Identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733.
[20] Trier, Ø. D., Jain, A. K., & Taxt, T. (1996). Feature extraction methods for character recognition-a survey. Pattern recognition, 29(4), 641-662.
[21] Nevatia, R., & Babu, K. R. (1980). Linear feature extraction and description. Computer Graphics and Image Processing, 13(3), 257-269.
[22] Montavon, G., Rupp, M., Gobre, V., Vazquez-Mayagoitia, A., Hansen, K., Tkatchenko, A., ... & Von Lilienfeld, O. A. (2013). Machine learning of molecular electronic properties in chemical compound space. New Journal of Physics, 15(9), 095003.
[23] Kumar, R. S. S., O'Brien, D. R., Albert, K., & Viljoen, S. (2018). Law and Adversarial Machine Learning. arXiv preprint arXiv:1810.10731.
[24] Yousfi-Monod, M., Farzindar, A., & Lapalme, G. (2010, May). Supervised machine learning for summarizing legal documents. In Canadian Conference on Artificial Intelligence (pp. 51-62). Springer, Berlin, Heidelberg.
[25] Stone, L., & James, C. (1995, March). Dowry, bride-burning, and female power in India. In Women's Studies International Forum (Vol. 18, No. 2, pp. 125-134). Pergamon.
[26] Shah, P., Joshi, S., & Pandey, A. K. (2018, December). Legal clause extraction from contract using machine learning with heuristics improvement. In 2018 4th International Conference on Computing Communication and Automation (ICCCA) (pp. 1-3). IEEE.
[27] Zhang, Y., Xie, R., Wang, J., Leier, A., Marquez-Lago, T. T., Akutsu, T., ... & Song, J. (2019). Computational analysis and prediction of lysine malonylation sites by exploiting informative features in an integrative machine-learning framework. Briefings in bioinformatics, 20(6), 2185-2199.
[28] Hall, M. A., & Smith, L. A. (1998). Practical feature subset selection for machine learning.
[29] James, S. C., Zhang, Y., & O'Donncha, F. (2018). A machine learning framework to forecast wave conditions. Coastal Engineering, 137, 1-10.
[30] Rueping, S. (2010, January). SVM classifier estimation from group probabilities. In ICML.
[31] Mitchell, T. M. (2006). The discipline of machine learning (Vol. 9). Pittsburgh: Carnegie Mellon University, School of Computer Science, Machine Learning Department.
[32] Bhamare, D., Salman, T., Samaka, M., Erbad, A., & Jain, R. (2016, December). Feasibility of supervised machine learning for cloud security. In 2016 International Conference on Information Science and Security (ICISS) (pp. 1-5). IEEE.
[33] Kassambara, A. (2017). Practical guide to cluster analysis in R: Unsupervised machine learning (Vol. 1). Sthda.
[34] Zhu, X., & Goldberg, A. B. (2009). Introduction to semi-supervised learning. Synthesis lectures on artificial intelligence and machine learning, 3(1), 1-130.
[35] Sutton, R. S. (1992). A Special Issue of Machine Learning on Reinforcement Learning. Machine learning, 8.
[36] Suthaharan, S. (2016). Support vector machine. In Machine learning models and algorithms for big data classification (pp. 207-235). Springer, Boston, MA.
[37] Cai, Y. D., Feng, K. Y., Li, Y. X., & Chou, K. C. (2003). Support vector machine for predicting α-turn types. Peptides, 24(4), 629-630.
[38] Wang, W., Men, C., & Lu, W. (2008). Online prediction model based on support vector machine. Neurocomputing, 71(4-6), 550-558.
[39] Chen, H. L., Yang, B., Liu, J., & Liu, D. Y. (2011). A support vector machine classifier with rough set-based feature selection for breast cancer diagnosis. Expert systems with applications, 38(7), 9014-9022.
[40] Sha, F., Saul, L. K., & Lee, D. D. (2003). Multiplicative updates for nonnegative quadratic programming in support vector machines. In Advances in neural information processing systems (pp. 1065-1072).
[41] Su, L. (2009). Optimizing support vector machine learning for semi-arid vegetation mapping by using clustering analysis. ISPRS Journal of Photogrammetry and Remote Sensing, 64(4), 407-413.
[42] Cai, Y. D., Liu, X. J., Xu, X. B., & Zhou, G. P. (2001). Support vector machines for predicting protein structural class. BMC bioinformatics, 2(1), 3.
[43] Olatomiwa, L., Mekhilef, S., Shamshirband, S., Mohammadi, K., Petković, D., & Sudheer, C. (2015). A support vector machine–firefly algorithm-based model for global solar radiation prediction. Solar Energy, 115, 632-644.
