A Novel Approach On Argument Based Legal Prediction Model Using Machine Learning
Abstract — “Justice delayed is justice denied”, and this delay of justice is a great bane for the Indian justice system. Every year, countless cases remain pending merely for the final hearing of the judicial verdict. Years pass by while the plaintiff waits for justice. This has been a major issue in the Indian judicial system for years. In this paper, the authors attempt to reduce the problem by decreasing the number of cases before they reach the Court. This is done by helping legal professionals predict a case outcome from previous records. This paper focuses on cases related to ‘Dowry Death’, i.e. IPC sections 498A and 304B. It aims to support the delivery of justice through argument-based prediction of judgments using the Support Vector Machine (SVM) algorithm, and reports the accuracy obtained. The model proceeds through the following steps: (i) Hard copies of case files with pronounced judgments related to dowry are collected from trial courts of West Bengal. (ii) The dataset is generated manually based on certain parameters, such as: (a) ‘Victim Name’; (b) ‘Number of years married (greater than seven years or not)’. In India, these parameters determine the key factors of ‘Dowry’ related cases. If the case is filed within seven years of marriage and the defendant has taken dowry, then the case falls under ‘dowry case’, else not; (c) ‘Dowry taken within seven years of marriage (Yes/No)’; (d) ‘Incident occurred within seven years of marriage’: this parameter shows that if the death has taken place within seven years of marriage, the case falls under ‘Dowry Death’; (e) ‘Postmortem Report (Usual/Unusual Death)’; and many other documented parameters. (iii) A supervised machine learning algorithm, namely the Support Vector Machine (SVM), is used to assist legal judgment through a prediction system. The objective of this paper is to predict whether a person is guilty or not, using a supervised learning approach. The paper shows the performance and accuracy of the model with a standard classifier, i.e. the Support Vector Machine (SVM).

Keywords—Machine Learning, Prediction System, Legal Dataset, Support Vector Machine (SVM).

I. INTRODUCTION
The application of Artificial Intelligence (AI) has helped in the assimilation of human intelligence into machines [1]. AI-based firms persistently seek innovative ways of developing technologies [2] that take over laborious human work in different sectors, improving accuracy and efficiency while speeding up the work for better output. The expansion of artificial intelligence in legal firms [3] has led to the emergence of an automated service delivery system. It has been used in various legal domains and assists in the delivery of justice to its beneficiaries by, for example, (i) helping lawyers by performing due diligence and research work, (ii) providing additional information and shortcuts through data analytics, and (iii) automating innovative processes in legal work. Unremitting demand and rigorous change in clients’ expectations have led to the adoption of new techniques, which make work faster and easier for legal professionals.

The huge population of India and a shortage of judges have led to numerous pending cases. According to a report, in 2019 the Law Minister of India stated that more than 43 lakh cases were pending across the 25 High Courts in the country, of which more than 8 lakh cases [4] had been pending for over a decade. Considering the vast number of accumulating records, along with the recent developments in the COVID-19 pandemic [5], things have worsened over time. At this crucial point in time, enhancing the legal system using machine learning algorithms [6] is imperative and can help reduce the workload of legal professionals, freeing more time to clear the pending cases [7].

Considering these trying times, in which people must exercise social distancing to curb the spread of COVID-19, this paper aims to revolutionize the process of legal work by making it fully digital and preserving the output for future use. The paper focuses on an argument-based analysis of a legal judgment prediction system that applies a support vector machine algorithm [8] to a judicial dataset of cases related to ‘dowry death’ [9]. Hard copies of case files with pronounced judgments related to dowry are collected from trial courts of West Bengal, the dataset is created manually from the collected documents based on certain parameters, and a support vector machine algorithm (a supervised machine learning algorithm) [10] is used to make the prediction. The paper reports the performance and accuracy score obtained with a standard classifier, i.e. the support vector machine [11].
Authorized licensed use limited to: Auckland University of Technology. Downloaded on October 24,2020 at 15:34:01 UTC from IEEE Xplore. Restrictions apply.
Proceedings of the International Conference on Smart Electronics and Communication (ICOSEC 2020)
IEEE Xplore Part Number: CFP20V90-ART; ISBN: 978-1-7281-5461-9
II. DATASET
This section describes the process by which the dataset was created by extracting the important features from argument-based legal documents related to ‘dowry cases’ [12]. The parameters are selected on the basis of significant features that act as deciding factors in predicting whether the defendant is guilty or not [13].

A. Generation of Dataset
Before training the machine [14], the dataset needs to be created. A dataset is a collection of data; it can be structured, unstructured or semi-structured. In India, there is no pre-structured argument-based legal judgment dataset [15]; therefore the whole dataset had to be created from scratch. In this paper, a structured dataset has been used in the form of a Comma-Separated Values (CSV) file [16], as it allows data to be retrieved efficiently. Hard copies of case document files related to ‘dowry death’, i.e. IPC sections 498A and 304B [17], with pronounced judgments were collected from trial courts of West Bengal [18]. As no soft copies of the case files were available, the data from the many hard copies were structured manually into MS-Excel file format. These data are required for training the machine learning model [19]. The process of extracting the features is known as Feature Extraction [20].

B. Selection of Features
Features [21] act like independent variables that are fed as input into the machine learning model [22], which can provide either classification or prediction [23]. In a specific dataset, the features are attributes consisting of values on which the generated output depends. Some of the features or parameters [24] that have been extracted from the hard copies of evidential documents are as follows: (i) ‘Victim Name’; (ii) ‘Number of years married (Greater than seven years or not)’: this parameter indicates whether the case falls under ‘dowry case’ [25] or not. If the case is filed within seven years of marriage and the defendant has taken dowry, then the case falls under ‘dowry case’, else not; (iii) ‘Dowry taken within seven years of marriage (Yes/No)’; (iv) ‘Incident taken place within seven years of marriage’: this parameter shows that if the death has taken place within seven years of marriage, the case falls under ‘dowry death case’; (v) ‘Postmortem Report (Usual/Unusual Death)’; and many others. The parameters have been selected according to the significance of the features [26], excluding the unimportant ones. The process of selecting useful features from the set of extracted features is known as Feature Selection [27]. Based on these selected features, the model has been trained [28] to predict whether the accused person is guilty or not.

Fig. 1 Screenshot of the created dataset

III. PROPOSED MODEL USING SUPPORT VECTOR MACHINE
The dataset that has been created and used in this model [29] for automatic generation of judgment prediction has been explained in detail in the previous section. This section discusses the working principle of the proposed model [30]. The four types of machine learning problems [31] are: (i) supervised learning [32], (ii) unsupervised learning [33], (iii) semi-supervised learning [34] and (iv) reinforcement learning [35]. In this paper, a support vector machine algorithm [36], which falls under supervised machine learning [37], has been used.

A. Support Vector Machine
Support Vector Machines (SVM) [38] have become one of the most powerful tools for solving classification problems [39]. Many algorithms have emerged to solve these problems. Traditional quadratic programming algorithms [40], being one of them, have certain drawbacks. To overcome those drawbacks, SVM [41] has been implemented. SVM [42] is a class of classifiers that separates a set of data with the help of an N-dimensional hyperplane [43].
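As a rough illustration of how categorical features like those listed in Section II might be turned into numeric input for a Scikit-Learn SVM, consider the sketch below. It is not the authors' code. The rows are synthetic placeholders rather than data from the collected case files, and the column names, the 0/1 encoding and the linear kernel are all assumptions made for the example. ‘Victim Name’ is left out of the sketch because a name carries no numeric signal.

```python
import csv
import io

from sklearn import svm

# Synthetic placeholder records (NOT data from the collected case files).
# Column names are hypothetical shorthand for the features named in the text.
CSV_TEXT = """married_over_7_years,dowry_within_7_years,death_within_7_years,postmortem,guilty
No,Yes,Yes,Unusual,1
No,Yes,Yes,Unusual,1
No,Yes,Yes,Usual,1
No,No,Yes,Usual,0
Yes,No,No,Usual,0
Yes,No,No,Unusual,0
Yes,Yes,No,Usual,0
No,Yes,Yes,Unusual,1
"""

FEATURES = [
    "married_over_7_years",
    "dowry_within_7_years",
    "death_within_7_years",
    "postmortem",
]
# Assumed encoding: map each categorical value to 0/1 so the SVM gets numbers.
ENCODING = {"No": 0, "Yes": 1, "Usual": 0, "Unusual": 1}


def encode(record):
    """Turn one CSV record (a dict of strings) into a numeric feature vector."""
    return [ENCODING[record[name]] for name in FEATURES]


records = list(csv.DictReader(io.StringIO(CSV_TEXT)))
X = [encode(r) for r in records]
y = [int(r["guilty"]) for r in records]

# A linear kernel is chosen here only so that the separating hyperplane is explicit.
model = svm.SVC(kernel="linear")
model.fit(X, y)

# Classify one new, equally hypothetical, case.
new_case = {
    "married_over_7_years": "No",
    "dowry_within_7_years": "Yes",
    "death_within_7_years": "Yes",
    "postmortem": "Unusual",
}
prediction = model.predict([encode(new_case)])[0]
print("Predicted class label:", prediction)
```

With a linear kernel, the fitted `model.coef_` and `model.intercept_` give the weights and offset of the separating hyperplane, which makes it possible to inspect how much each encoded feature contributes to the decision.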
Fig. 2 Support Vector Machine

B. Classifier Creation
In this paper, the SVM module from the Scikit-Learn package of Python has been used. The statement used for creating the classifier is given below:

    svmModel = svm.SVC()

C. Fitting inputs into Classifier
The fit() function of the svm.SVC() class has been used to feed the data from the dataset into the created classifier. The statement for feeding the input into the created classifier is as follows:

    svmModel.fit(XTrain, YTrain)

D. Prediction phase
After the inputs have been fitted into the classifier, the classes predicted by the classifier for the test input data are retrieved. The predict() function of the svm.SVC() class is used for the prediction task. Prediction of the output class is given below:

    YPred = svmModel.predict(XTest)

E. Classification report and accuracy generation
After obtaining the predicted class labels, a classification report and an accuracy score are generated to assess the performance of the classifier.

F. Advantages of an SVM classifier
When there is a clear margin of separation between the classes of the data, this classifier performs at its best.
The SVM classifier is appropriate for data present in high-dimensional space.
This classifier is very efficient in terms of memory usage.

                    Precision   Recall   F1-score   Support
Accuracy                                 0.93       27
Macro Average       0.96        0.75     0.81       27
Weighted Average    0.93        0.93     0.92       27

V. CONCLUSION
A prediction system based on a supervised learning algorithm, namely the Support Vector Machine (SVM), applied to a legal dataset to assist legal professionals has been discussed. In India, due to a shortage of skilled manpower and infrastructure, beneficiaries have to wait a long time to get their well-deserved justice. As it is rightly said, "Justice Delayed Is Justice Denied"; prolonged legal proceedings also lead to various consequences, such as hostility of witnesses, unfitness of the accused on medical grounds, and tampering with or manipulation of evidence. The proposed model will help legal professionals analyze the desired dataset and perform prediction on a case-by-case basis depending on the essential parameters of 'Dowry' death-related cases. The performance and accuracy of the model have been evaluated with a standard classifier (i.e. Support Vector Machine), and 93% accuracy has been achieved with this approach. In this paper, Table 1 states the current status of the proposed model. However, additional parameters are to be added with the aim of achieving 100% accuracy, so that justice is rightly delivered to all victims and the judicial system becomes more dynamic in approach for the larger benefit of society.

Acknowledgment
This work has been supported by Prof. Arun Kumar Mazumdar, Department of Computer Science and Engineering, IIT Kharagpur, through his guidance. Heartfelt thanks to the reviewers for their comments and insight.

References
[1] Shao, Z., Yuan, S., & Wang, Y. (2020). Institutional Collaboration and Competition in Artificial Intelligence. IEEE Access, 8, 69734-69741.
[2] Lucci, S., & Kopec, D. (2015). Artificial intelligence in the 21st century. Stylus Publishing, LLC.
[3] Sil, R., Roy, A., Bhushan, B., & Mazumdar, A. K. (2019, October). Artificial Intelligence and Machine Learning based