
RESOLVING PROPERTY DISPUTES
WITH AI: AN NLP & BERT-POWERED
CHATBOT

A PROJECT REPORT

Submitted by
GOKUL P (724021243012)
REUEL JEHOADA P (724021243036)
YESHWANTH V (724021243049)
YUVAN SHANKAR M (724021243050)

In partial fulfillment for the award of the degree


of
BACHELOR OF TECHNOLOGY
IN
ARTIFICIAL INTELLIGENCE AND DATA SCIENCE
DHAANISH AHMED INSTITUTE OF TECHNOLOGY,
COIMBATORE

ANNA UNIVERSITY: CHENNAI 600 025

MAY 2025
ANNA UNIVERSITY: CHENNAI 600 025

BONAFIDE CERTIFICATE

Certified that this project report “RESOLVING PROPERTY DISPUTES WITH AI: AN NLP & BERT-POWERED CHATBOT” is the bonafide work of “GOKUL P (724021243012), REUEL JEHOADA P (724021243036), YESHWANTH V (724021243049), YUVAN SHANKAR M (724021243050)”, who carried out the project work under my supervision.

SIGNATURE
Dr. MUTHUVEL L, Ph.D.
HEAD OF THE DEPARTMENT,
Department of Artificial Intelligence and Data Science,
Dhaanish Ahmed Institute of Technology,
Coimbatore - 641105

SIGNATURE
Mr. J. MANOJ PRABHAKAR, M.E.
SUPERVISOR,
Department of Computer Science and Engineering,
Dhaanish Ahmed Institute of Technology,
Coimbatore - 641105

Submitted for the project Viva-voce Examination held on .……………………


at Dhaanish Ahmed Institute of Technology, Coimbatore.

INTERNAL EXAMINER EXTERNAL EXAMINER


ACKNOWLEDGEMENT
We, the authors of this project, first of all thank the Almighty and our parents for providing us the strength and knowledge needed for the successful completion of the project.

We would like to record our sincere thanks, indebtedness, and gratitude to our renowned Chairman Mr. Alhaj K. Moosa, Director Mr. K.A. Akbar Basha, and Chief Executive Officer Mr. A. Thameez Ahmed, B.E., M.B.A., for their noteworthy efforts to enhance our professional dexterity and co-curricular excellence.

We gratefully acknowledge our eminent and encouraging Principal, Dr. K.G. Parthiban, M.E., Ph.D., Dhaanish Ahmed Institute of Technology, Coimbatore, for providing all facilities for carrying out this project effectively and efficiently.

We express our sincere thanks to the Head of the Department, Dr. MUTHUVEL L, Ph.D., Dhaanish Ahmed Institute of Technology, Coimbatore, for his constant support in completing the project work.

We take this opportunity to express our sincere thanks to our guide, Mr. J. MANOJ PRABHAKAR, M.E., Department of Computer Science and Engineering, Dhaanish Ahmed Institute of Technology, Coimbatore, for his endless support and encouragement during this project.

We extend our sincere thanks to all our teaching and non-teaching staff
members for helping us.
ABSTRACT
Access to legal information and guidance is often hindered by the high costs associated with professional legal consultation, especially for minor or general legal matters. This project proposes LegalBot, an AI-driven virtual assistant designed to provide accurate, timely, and accessible legal information pertaining to Indian laws. LegalBot leverages advanced Natural Language Processing (NLP) techniques and the Bidirectional Encoder Representations from Transformers (BERT) model to understand user queries and provide relevant legal guidance through an interactive chat interface. The system is trained extensively on Indian legal frameworks to ensure the delivery of precise information and practical recommendations. LegalBot aims to reduce reliance on expensive legal services by offering an affordable alternative for preliminary legal advice, thereby enhancing legal awareness and literacy among the general public. The project demonstrates the potential of AI to democratize access to legal knowledge and facilitate informed decision-making in the legal domain.
TABLE OF CONTENTS
CHAPTER NO.    TITLE    PAGE NO.
ABSTRACT i
LIST OF FIGURES iv
LIST OF TABLES v
LIST OF ABBREVIATIONS vi
1 INTRODUCTION 1-11
1.1 OVERVIEW 2
1.2 PROBLEM STATEMENT 4
1.3 AI CHATBOT 4
1.4 AIM AND OBJECTIVE 10
1.5 SCOPE OF THE PROJECT 10
2 LITERATURE SURVEY 12-14
3 SYSTEM ANALYSIS 15-22
3.1 EXISTING SYSTEM 16
3.1.1 Disadvantages 17
3.2 PROPOSED SYSTEM 18
3.2.1 Advantages 19
3.3 FEASIBILITY STUDY 19
3.3.1 Technical Feasibility 20
3.3.2 Economic Feasibility 20
3.3.3 Operational Feasibility 20
3.4 SYSTEM ARCHITECTURE 21
4 SYSTEM CONFIGURATION 23-25
4.1 HARDWARE SPECIFICATION 24
4.2 SOFTWARE SPECIFICATION 24
4.3 SOFTWARE REQUIREMENTS 24

4.3.1 PYTHON 24
4.3.2 MYSQL 25
4.3.3 WAMPSERVER 25
4.3.4 BOOTSTRAP 4 25
4.3.5 FLASK 24
5 SYSTEM IMPLEMENTATION 26-34
5.1 SYSTEM DESCRIPTION 28
5.2 SYSTEM FLOW 29
5.3 MODULES DESCRIPTION 29
5.4 LEGALBOT RESPONSE PREDICTOR 32
5.5 RECOMMENDATION 33
5.6 END USER 33
6 SYSTEM TESTING 35-39
6.1 TEST CASES 37
6.2 TEST REPORTS 38
7 CONCLUSION AND FUTURE ENHANCEMENT 40-41
7.1 CONCLUSION 41
7.2 FUTURE ENHANCEMENT 41
8 APPENDIX 42-59
8.1 SOURCE CODING 43
8.2 SCREENSHOTS 54
9 REFERENCES 60-62

LIST OF FIGURES

FIGURE NO.    DESCRIPTION    PAGE NO.
1.1 Statue of Law 2
1.2 Customer Service 5
1.3 Segmentation 5
1.4 Tokenization 6
1.5 Stop Words 6
1.6 Stemming 7
1.7 Lemmatization 7
1.8 Part of Speech Tagging 7
1.9 BERT ML Model 8
1.10 Sentence Prediction 9
3.1 Rule Based Chatbot 17
3.2 System Architecture 21
8.1 Program Initialization 54
8.2 Home Page 54
8.3 Admin Login 55
8.4 Admin Page 55
8.5 Adding Lawyer Detail 56
8.6 Upload Dataset 56
8.7 Dataset Training 57
8.8 User Registration 57
8.9 User Login 58
8.10 User Interaction 58
8.11 LegalBot Response 59
8.12 Lawyer Recommendation 59

LIST OF TABLES

TABLE NO.    TITLE    PAGE NO.
5.1 Dataset 30
5.2 TF-IDF vectorization 32

LIST OF ABBREVIATIONS

ACRONYM ABBREVIATION
NLP Natural Language Processing
IPC Indian Penal Code
BERT Bidirectional Encoder Representations from Transformers
TB Test Bug
TCID Test Case Identifier
LB LegalBot
UI User Interface
ML Machine Learning
AI Artificial Intelligence
OS Operating System
DB Database
CMS Case Management System
FAQ Frequently Asked Questions

CHAPTER 1
INTRODUCTION

1.1 OVERVIEW
Law is both a discipline and a profession that deals with the customs,
practices, and rules of conduct within a community, which are recognized as
binding by its members. These rules are enforced by an authoritative controlling
body. The term "law" encompasses various types of rules and principles and
fundamentally serves as an instrument to regulate human conduct and behavior.
From the societal perspective, law represents essential concepts such as justice,
morality, reason, order, and righteousness. From the viewpoint of the legislature,
it includes statutes, acts, rules, regulations, orders, and ordinances. From the
perspective of the judiciary, law comprises rules of court, decrees, judgments,
court orders, and injunctions. Therefore, law is a comprehensive term that
broadly includes statutes, rules, regulations, judicial decisions, and fundamental
concepts like justice, morality, and legal theory. It also extends to specialized
branches such as tort law, jurisprudence, and the core principles that guide the
functioning of the legal system.

Figure 1.1: Statue of Law

Indian Penal Code (IPC)
The Indian Penal Code (IPC) serves as the fundamental legal framework
in India for establishing criminal liability related to specified offenses and
setting exceptions to criminal liability for these offenses. It encompasses a
comprehensive set of laws addressing all substantive aspects of criminal law,
defining crimes, criminal liability, and punishments. The IPC
meticulously defines each offense, incorporating all necessary elements to
constitute the offense. Therefore, the IPC is the legal instrument that delineates
punishable offenses and their associated penalties. It applies to all Indian
citizens and individuals of Indian origin, regardless of location. The IPC is
organized into 23 chapters and consists of 511 sections.
History of Indian Penal Code
The Indian Penal Code has its roots in the times of British rule in India. It
is known to have originated from British legislation regarding its colonial
conquests, dating back to the year 1860. Before the East India Company drafted
the Indian Penal Code, the Mohammedan law was in effect in India.
Mohammedan criminal law applied to both Hindus and Muslims.
 In 1834, the First Law Commission, led by Thomas Babington Macaulay,
drafted the Indian Penal Code under the Charter Act of 1833. It was
submitted to the Governor-General of India Council in 1837 but was revised
again.
 The Code was completed in 1850 and presented to the Legislative Council
in 1856; however, it did not become law immediately due to the Indian
Rebellion of 1857.
 It was finally passed into law on October 6, 1860, after revision by Barnes
Peacock, who later became the first Chief Justice of the Calcutta High Court.
 The Code became effective on January 1, 1862. Macaulay died near the end
of 1859 and did not see his work become law.
 It applied to all British India at that time.
 However, until the 1940s, it did not automatically apply to the Princely
states, which had their own courts and legal systems.
 In 1971, the Law Commission proposed revising the IPC in its 42nd Report,
leading to several changes.
 On September 6, 2018, the Supreme Court of India decriminalized
homosexuality (Section 377 of the IPC).
 Similarly, on September 27, 2018, a five-judge Constitution bench
unanimously struck down Section 497 (the adultery provision).
 The IPC took effect in Jammu and Kashmir on October 31, 2019, following
the Jammu and Kashmir Reorganisation Act of 2019, replacing the state's
Ranbir Penal Code.

1.2 PROBLEM STATEMENT


Law is a system of rules to maintain order and justice, but understanding
complex statutes like the Indian Penal Code (IPC) is difficult without legal
training. Traditional legal help is often costly, slow, and inaccessible, especially
for underserved communities. Legal language and geographic barriers further
limit access. To address this, the LegalBot project uses AI and NLP to provide
an interactive chatbot that delivers instant, accurate legal guidance on IPC-
related queries, empowering users with accessible legal information.

1.3 AI CHATBOT
An AI chatbot is a piece of software that interacts with a human through
written language. It is often embedded in web pages or other digital applications
to answer customer inquiries without the need for human agents, thus providing
affordable, effortless customer service.

Figure 1.2: Customer Service.
An AI chatbot is a computer program that simulates human communication and is widely used in areas such as customer service and sales. Modern chatbots have evolved from basic tools into advanced systems that engage users in a personalized, human-like manner, using machine learning, natural language processing (NLP), and natural language understanding (NLU) to interpret user intent, extract key details, and respond in real time.
1.3.1 Natural Language Processing
Natural Language Processing (NLP) is a branch of Artificial Intelligence
(AI) that enables machines to understand human language. It combines
linguistics and computer science to analyze the rules and structure of language,
developing intelligent systems powered by machine learning and NLP
algorithms that can comprehend, interpret, and extract meaning from both text
and speech.
The steps to perform preprocessing of data in NLP include:
 Segmentation:
You first need to break the entire document down into its constituent sentences. You can do this by segmenting the text at sentence-ending punctuation marks such as full stops.

Figure 1.3: Segmentation

 Tokenizing:
For the algorithm to understand these sentences, you need to present the words of each sentence to it individually. So, you break down each sentence into its constituent words and store them. This is called tokenizing, and each word is called a token.

Figure 1.4: Tokenization

 Removing Stop Words:


You can speed up learning by removing non-essential words that add little
meaning but make sentences sound cohesive. Words like was, in, is, and, the are
called stop words and can be removed.

Figure 1.5: Stop Words


 Stemming:
Stemming is the process of obtaining the word stem of a word; a stem produces new words when affixes are added to it.

Figure 1.6: Stemming
 Lemmatization:
The process of obtaining the Root Stem of a word. Root Stem gives the
new base form of a word that is present in the dictionary and from which the
word is derived. You can also identify the base words for different words based
on the tense, mood, gender, etc.

Figure 1.7: Lemmatization


 Part of Speech Tagging
Now, you must explain the concept of nouns, verbs, articles, and other
parts of speech to the machine by adding these tags to our words. This is called
part-of-speech (POS) tagging.

Figure 1.8: Part of Speech Tagging

 Named Entity Tagging
Next, introduce your machine to pop culture references and common
names by flagging words like movie titles, important personalities, or locations.
This is done by classifying words into subcategories such as person, location,
monetary value, quantity, organization, and movie. This helps identify
keywords in sentences. After these preprocessing steps, the processed data is
fed into a machine learning algorithm like Naive Bayes to build your NLP
application.
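The preprocessing steps above can be illustrated with a short, self-contained Python sketch using the NLTK library (listed later in this report). The sample sentence and the exact function choices are illustrative assumptions, not the project's actual pipeline.

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time downloads of the tokenizer, stop-word list, lemmatizer data and POS tagger.
for pkg in ["punkt", "stopwords", "wordnet", "averaged_perceptron_tagger"]:
    nltk.download(pkg)

text = "The accused was found guilty of theft and sentenced to imprisonment."
tokens = nltk.word_tokenize(text)                                   # tokenization
tokens = [t for t in tokens if t.isalpha()
          and t.lower() not in stopwords.words("english")]          # stop-word removal
stems = [PorterStemmer().stem(t) for t in tokens]                   # stemming
lemmas = [WordNetLemmatizer().lemmatize(t) for t in tokens]         # lemmatization
pos_tags = nltk.pos_tag(tokens)                                     # part-of-speech tagging
print(stems, lemmas, pos_tags, sep="\n")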
1.3.2 BERT
BERT, which stands for Bidirectional Encoder Representations from
Transformers, is a machine learning framework for natural language processing
developed by Google in 2018. It improves contextual understanding of text by
learning to predict words both before and after a given word (bidirectional).
BERT converts words into numerical values, which is essential because
machine learning models process numbers, not words. This transformation
enables training ML models on text data, helping make predictions by
combining text with other data.

Figure 1.9: BERT ML Model


Masked Language Model
Masked Language Model is an NLP task where 15% of words in a text
are replaced with a special token called [MASK]. The model’s job is to predict
the original words hidden by these [MASK] tokens. To reduce the mismatch between pre-training and fine-tuning, the selected tokens are not always replaced with [MASK]; some are replaced with random words or left unchanged. The model includes a classification layer on top of the encoder,
and the output probabilities are calculated using a fully connected layer
followed by a softmax layer.
The BERT loss function considers only the predictions for the masked tokens and ignores the predictions for the non-masked tokens, so the loss is computed over only the 15% of words that were masked.
Next Sentence Prediction
In this NLP task, the model is given two sentences and must predict
whether the second sentence directly follows the first in the original text.
During BERT training, 50% of sentence pairs are actual consecutive sentences
(labeled isNext), and 50% are random, unrelated sentence pairs (labeled
NotNext). Since this is a classification task, the model uses the first token,
called the [CLS] token, to represent the entire input pair for making the
prediction.

Figure 1.10: Sentence Prediction
This model also uses a [SEP] token to separate the two input sentences.
BERT achieved an accuracy of 97%–98% on this task. Training with Next
Sentence Prediction helps the model understand relationships between sentences,
improving its contextual understanding.
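As a small, hedged illustration of masked-word prediction (not part of the project code), a pre-trained BERT model can be queried through the Hugging Face transformers library; the model name and the example sentence are assumptions.

from transformers import pipeline

# Load a pre-trained BERT model with a masked-language-model head.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT predicts the token behind [MASK] from both its left and right context.
for prediction in fill_mask("Theft is punishable with [MASK] under the Indian Penal Code."):
    print(prediction["token_str"], round(prediction["score"], 3))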

1.4 AIM AND OBJECTIVE


Aim
The aim of the project is to develop an AI-powered web application that
provides legal assistance and support to users by classifying offenses, offering
legal advice, and recommending legal professionals.
Objectives
 Develop a user-friendly web interface for smooth user interaction.
 Implement NLP techniques to preprocess and understand user queries.
 Build a BERT-based model to accurately classify offenses from user input.
 Provide detailed information and explanations on predicted IPC sections.
 Offer actionable legal advice tailored to the classified offenses.
 Integrate a recommendation system to connect users with suitable legal
professionals.
 Develop an admin panel for managing datasets, user accounts, and
application operations.
 Ensure scalability, reliability, and security to handle growing user demands
safely.

1.5 SCOPE OF THE PROJECT


The project aims to deliver comprehensive legal assistance through an
AI-powered web platform, covering the following key aspects:
 User Interface Development:
Design and develop a user-friendly, responsive web interface compatible
with various devices and browsers. The interface will enable intuitive query
submission, result display, and easy navigation across screen sizes.
 Natural Language Processing (NLP) Implementation:
Utilize NLP techniques such as tokenization, stopword removal, and
stemming/lemmatization to effectively analyze and normalize user queries.
Libraries like NLTK and SpaCy will be employed to support efficient text
processing.
 Machine Learning Model Construction:
Develop a BERT-based machine learning model trained on datasets
containing IPC sections, offense descriptions, and punishments. The model
will be fine-tuned to accurately classify offenses by understanding the
contextual nuances of user input.
 Information Provision:
Present detailed, structured, and easy-to-understand information about the
predicted IPC sections, including offense descriptions and prescribed
punishments, to assist users in comprehending their legal matters clearly.
 Recommendation System Integration:
Implement a recommendation system that suggests legal professionals
tailored to user queries and geographical location, utilizing user preferences
and location data for personalized and relevant suggestions.
 Deployment and System Maintenance:
Deploy the application on scalable, secure hosting infrastructure. Conduct
thorough testing and validation to ensure system accuracy and reliability.
Ongoing monitoring and maintenance will address issues and implement
updates as required.

CHAPTER 2
LITERATURE SURVEY

2.1 Advanced NLP Models for Technical University Information Chatbots:


Development and Comparative Analysis
Problem:
Engineering aspirants and parents face difficulty in getting accurate,
timely info during college counselling. High query volume overwhelms users
and officials, causing miscommunication and dependence on unofficial sources.
College websites are often confusing and time-consuming to navigate.
Findings:
Neural network-based models, especially sequential models, showed
higher accuracy than TF-IDF and pattern matching. Sequential models
effectively captured context and avoided overfitting. Chatbots with optimizers
performed better. Pattern matching and semantic analysis were crucial for real-time accuracy.

2.2 Adoption of AI-Chatbots to Enhance Student Learning Experience in


Higher Education in India
Problem:
Despite rapid adoption in sectors like food delivery, finance, and e-commerce, the Indian education sector has been slow to implement chatbot
technology for enhancing student learning and communication.
Findings:
The study found that improved student engagement, communication, and
personalized learning drive chatbot adoption in Indian higher education.
Challenges include system integration, data privacy, and low awareness. The
research highlights chatbots’ potential to transform education and helps guide
overcoming these barriers.
2.3 A Conversation-Driven Approach for Chatbot Management
Problem:
Lack of standardized content management in chatbots results in
inefficient maintenance and poor user experience.
Findings:
The CMP improved the Evatalk chatbot’s post-deployment management by
reducing human hand-offs from 44.43% to 30.16% and increasing the
knowledge base by 160%, while keeping user satisfaction stable. Its analysis
phase aligned goals with performance, optimizing chatbot health.

2.4 Xatkit: A Multimodal Low-Code Chatbot Development Framework


Problem:
Existing chatbot frameworks require advanced technical skills, are
difficult to adapt, and often raise development and maintenance costs.
Findings:
Xatkit advances chatbot development by separating chatbot design from
platform-specific details, improving reusability and redeployment. It supports
evolving the NLU engine for better text analysis.

2.5 Legal Solutions - Intelligent Chatbot using Machine Learning


Problem:
Many people struggle to access clear legal advice due to its complexity and a lack of legal expertise or affordable counsel.
Findings:
The research paper introduces an AI chatbot that democratizes legal
support by offering basic legal knowledge, personalized guidance, and real-time
attorney consultations. Powered by advanced NLP and machine learning, it
delivers accurate, timely, and tailored legal information, aiming to make legal
assistance accessible and equitable for all.
CHAPTER 3
SYSTEM ANALYSIS

3.1 EXISTING SYSTEM


The traditional system for accessing legal information and guidance
typically involves consulting professional lawyers, engaging in manual legal
research, or seeking advice from legal experts. Here are some key aspects of the
traditional system:
 Legal Consultation Services
Individuals seeking legal assistance traditionally turn to law firms or
independent lawyers. This involves scheduling appointments, attending
consultations, and incurring fees for professional advice.
 Manual Legal Research
Legal research is often performed manually by individuals, legal
professionals, or law students. This involves searching through legal databases,
books, and documents to identify relevant laws, statutes, and case precedents.
 Legal Libraries and Resources
Access to legal libraries and resources is crucial for comprehensive legal
research. Law libraries house legal texts, case law reporters, and other materials
that individuals can use for reference.
 Lawyer Referral Services
Some jurisdictions offer lawyer referral services where individuals can
request a referral to a lawyer based on their specific legal needs. These services
connect individuals with legal professionals in their area.
 Legal Aid Clinics
Legal aid clinics, often operated by law schools or nonprofit
organizations, provide free or low-cost legal assistance to individuals who
cannot afford traditional legal services.

 Rule based chatbots
A rule-based chatbot is a type of conversational agent or virtual assistant
that operates on a predefined set of rules and decision pathways. Unlike more
advanced AI-powered chatbots, which leverage machine learning and natural
language processing (NLP) techniques to understand and respond to user inputs,
rule-based chatbots follow a fixed set of instructions to interact with users.

Figure 3.1: Rule Based Chatbot


Rule-based chatbots operate on predefined rules set by developers, using
structured, predetermined responses. They often follow decision trees to guide
interactions based on user input and specific conditions.
3.1.1 DISADVANTAGES
 High costs and financial barriers.
 Time-consuming processes and potential delays.
 Limited accessibility, especially in remote areas.
 Complexity of legal language and procedures.
 Dependency on physical resources like legal libraries.
 Potential biases based on socioeconomic factors.
 Limited flexibility with predefined rules.
 Dependence on explicit rules, challenging to update.

 Difficulty handling ambiguous or nuanced language.
 Scalability issues in managing diverse legal scenarios.
 Limited learning capabilities and adaptation over time.
 Challenges in understanding and responding to contextual nuances in
legal queries.

3.2 PROPOSED SYSTEM


The proposed system, "LegalBot," is a comprehensive platform offering
accurate legal guidance and support through various integrated modules tailored
to users' legal needs.
 LawNet Model Integration
At the core of LegalBot is the LawNet Model Integration Module, which
uses advanced techniques like BERT to classify offenses based on user
descriptions and predict relevant IPC sections or legal categories. Powered by
machine learning, it provides users with accurate and insightful legal
classifications.
 LegalBot Chat Interface
Built with Flask-SocketIO, the LegalBot Chat Interface enables real-time,
interactive legal conversations, allowing users to submit queries and receive
instant responses.
 Legal Advice and Assistance
The Legal Advice and Assistance Module offers actionable insights,
guiding users on legal actions, defenses, and strategies to help them navigate
complexities and make informed decisions.
 Multilanguage Translation
The Multilanguage Translation Module translates responses into various
languages, allowing users to access legal help in their preferred language and
enhancing overall accessibility and user experience.

 Advocate and Lawyer Recommendation
The Advocate and Lawyer Recommendation Module connects users with
suitable legal professionals by filtering database entries based on user queries
and location.
3.2.1 ADVANTAGES
 Cost-effective legal assistance for minor issues, reducing reliance on
expensive consultations.
 Time-efficient guidance, minimizing delays in accessing legal support.
 Universal access to legal expertise via digital platforms, overcoming
geographical barriers.
 Natural language interaction for user-friendly conversations, making legal
information accessible.
 Digital knowledge repository, eliminating the need for physical legal
resources.
 Promotion of legal literacy, empowering users with essential legal
knowledge.
 Adaptive learning for continuous improvement based on user interactions.
 Context-aware responses, providing accurate information tailored to user
queries.
 Equitable access to justice, reducing disparities in legal support availability.
 Multilingual support for enhanced accessibility and user experience.
 Accurate offense classification and legal guidance through advanced
machine learning techniques.
3.3 FEASIBILITY STUDY
The feasibility analysis of the LegalBot project evaluated its practicality
and potential for successful execution across various dimensions. Here's an
overview of the feasibility analysis:

3.3.1 Technical Feasibility
 Availability of Technology: The project utilized widely available and well-documented tools like Python, Flask, and TensorFlow.
 System Architecture: The system, including BERT integration for NLP, was technically feasible with existing technologies.
 Scalability: The design supports future scalability to handle growing user demand.
3.3.2 Economic Feasibility
 Cost Estimation: The project's budget covered expenses related to hardware, software, development resources, and operational costs. A detailed cost estimation was performed to ensure financial feasibility.
 Return on Investment (ROI): The potential benefits of the LegalBot system, such as improved efficiency, reduced legal costs, and enhanced user satisfaction, justified the initial investment.
 Cost-Benefit Analysis: A cost-benefit analysis was conducted to assess whether the expected benefits outweighed the project's costs over its lifecycle.
3.3.3 Operational Feasibility
 User Acceptance: Stakeholder buy-in and user acceptance were crucial for the success of the project. User feedback and engagement were actively solicited throughout the development process to ensure alignment with user needs and preferences.
 Integration with Existing Processes: The LegalBot system seamlessly integrated with existing legal workflows and processes to minimize disruption and facilitate adoption by legal professionals and clients.
 Training and Support: Adequate training and support mechanisms were in place to assist users in effectively utilizing the system and addressing any issues that arose.

3.4 SYSTEM ARCHITECTURE

Figure 3.2: System architecture


The system architecture of LegalBot is structured into three primary
modules: Training, Web Interface, and User Interaction, each facilitating a
crucial phase in delivering intelligent legal assistance to citizens.

1. LegalBot Training (Admin Panel)
This module is managed by the Web Admin, who logs in and uploads
datasets related to legal domains. The data then undergoes a comprehensive
training pipeline, which includes:
 Preprocessing: Cleaning and normalizing raw text data.
 Feature Extraction: Identifying key linguistic and semantic features.
 Classification: Categorizing input based on legal intent or domain.
 Build and Train: Training NLP models to understand legal queries
accurately.
2. LegalBot Web
This central component acts as the processing and integration hub,
ensuring seamless communication between training modules, the web interface,
and the chatbot.
3. LegalBot Response Prediction (User Interaction)
Citizens, users, or victims interact directly with the chatbot via a user-
friendly interface. This module handles:
 Intent Recognition: Understanding the purpose of a user query.
 Entity Recognition: Identifying legal entities (e.g., names, dates, case
types).
 Dependency Parsing: Analyzing grammatical structure for better
comprehension.
 Generate Response: Producing contextually accurate legal advice or
guidance.
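As a brief sketch of the entity-recognition and dependency-parsing steps listed above, the spaCy library (named in Chapter 4) could be used as follows; the example sentence and pipeline name are assumptions.

import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline, installed separately
doc = nlp("Ravi filed a property dispute case in Coimbatore on 12 March 2024.")

# Entity recognition: surface legal-style entities such as persons, places and dates.
for ent in doc.ents:
    print(ent.text, ent.label_)

# Dependency parsing: grammatical relation of each token to its head word.
for token in doc:
    print(token.text, token.dep_, token.head.text)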
User Access
 Citizens/users/victims: Can register or log in to interact with LegalBot
and receive legal support.
 Web Admin: Has a dedicated interface to manage datasets and model
training.

CHAPTER 4
SYSTEM CONFIGURATION
4.1 HARDWARE SPECIFICATIONS
 Processor: Dual Intel Xeon or AMD Ryzen for parallel processing.
 RAM: 16GB to 64GB DDR4 ECC for fast data access.
 Storage: RAID-configured 500GB SSDs for improved performance.

4.2 SOFTWARE SPECIFICATIONS


 Operating System: Windows 10 or 11 for development
 Web Server:
o Flask for backend
o Socket.IO for real-time communication
 Database Management System (DBMS): MySQL for data storage
 Programming Languages:
o Python for backend
o HTML, CSS, JavaScript for frontend
 Machine Learning Libraries:
o TensorFlow for model building
o Scikit-learn for data preprocessing
 Text Processing Libraries: NLTK and SpaCy for text preprocessing
 Deployment Tools: WampServer (for Windows) for local development
 Integrated Development Environment (IDE): IDLE
 Text Translation Services: Google Translate API

4.3 SOFTWARE DESCRIPTION


4.3.1 Python (v3.7.9)
High-level language used for backend, AI & ML. Main Libraries Used:
 Flask (v1.1.2) – Web backend

 Pandas (v1.1.3) – Data handling
 NumPy (v1.19.2) – Numerical computing
 Matplotlib (v3.3.2) – Visualization
 Scikit-learn (v0.23.2) – ML models
 NLTK (v3.5) – Text processing
 WordCloud (v1.8.1) – Word cloud creation
 SpeechRecognition (v3.8.1) – Voice input
 gTTS (v2.2.2) – Text to speech
 Googletrans (v3.1.0a0) – Language translation
 Gensim (v3.8.3) – Word2Vec & topic modeling
 OpenCV (cv2 v4.4.0) – Image processing
 Pillow (PIL v8.0.1) – Image handling
4.3.2 MySQL (v5.7)
 Relational database system to store user queries and responses.
 Used With: PhpMyAdmin (via WAMP)
 Login: No password set (localhost)
4.3.3 Wampserver (v3.2.0 64-bit)
 Local server bundle with Apache, PHP, and MySQL for Windows.
 Used For: Running MySQL and managing it via PhpMyAdmin.
4.3.4 Bootstrap (v4.5)
 CSS framework for building responsive UI.
 Used For: Mobile-friendly and structured webpage design.
4.3.5 Flask (v1.1.2)
 Lightweight Python web framework.
 Used For: Routing, HTML rendering, form handling, and session
management.
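The following minimal sketch (with assumed route, event and template names, not the project's actual code) shows how Flask and Flask-SocketIO, listed above, can serve a chat page and push chatbot replies in real time.

from flask import Flask, render_template
from flask_socketio import SocketIO, emit

app = Flask(__name__)
app.config["SECRET_KEY"] = "change-me"   # required for session handling
socketio = SocketIO(app)

@app.route("/")
def index():
    return render_template("chat.html")  # placeholder template name

@socketio.on("user_query")
def handle_query(message):
    # In the real system the query would be classified by the trained model;
    # here the reply is a simple echo used only to show the event round-trip.
    emit("bot_response", {"reply": "You asked: " + str(message)})

if __name__ == "__main__":
    socketio.run(app, debug=True)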

CHAPTER 5
SYSTEM IMPLEMENTATION

The implementation of the LegalBot project involves integrating multiple


components and technologies to deliver efficient legal assistance. The key
implementation steps are as follows:
1. Backend Development
 Use Python and the Flask framework for backend development.
 Implement API endpoints to handle user requests, query processing, and
response generation.
 Set up routes for user authentication, query submission, and
administrative operations.
2. Database Management
 Utilize MySQL as the relational database system for data storage.
 Design a database schema to manage user accounts, datasets,
advocate/lawyer details, and system configurations.
 Implement CRUD (Create, Read, Update, Delete) operations for efficient
data management.
3. Machine Learning Model Integration
 Employ TensorFlow to develop and deploy machine learning models.
 Integrate BERT (Bidirectional Encoder Representations from
Transformers) for offense classification tasks.
 Fine-tune the BERT model on a curated dataset consisting of IPC sections,
descriptions, offenses, and punishments.
4. Text Processing
 Use NLTK (Natural Language Toolkit) for text processing operations
such as tokenization, stopword removal, and stemming/lemmatization.
 Preprocess user queries to optimize input data for analysis and
classification.
5. Multilanguage Translation
 Integrate a multilingual translation service like Google Translate or
Microsoft Translator.
 Enable automatic translation of generated responses based on user
preferences or system settings.
6. Frontend Development
 Develop the user interface using HTML, CSS, JavaScript, and the
Bootstrap framework.
 Design a responsive, user-friendly interface for seamless interaction.
 Implement features including query submission, response display, and
advocate/lawyer recommendations.
7. Real-time Communication
 Use Flask-SocketIO to enable real-time communication between users
and the system.
 Facilitate instant messaging and interactive chat through the LegalBot
interface.
8. Admin Panel
 Develop an admin panel using the Flask framework.
 Implement authentication and role-based access control for admin users.

5.1 SYSTEM DESCRIPTION


The project is a comprehensive legal assistance platform built with
Python and Flask, using MySQL for data storage and TensorFlow for machine
learning. It processes data with Pandas, NumPy, and Scikit-learn, visualizes
with Matplotlib and Seaborn, and handles NLP via NLTK. The frontend uses
Bootstrap, and WampServer supports local development. Key modules include:
User Interface, Text Processing, BERT-based LawNet model for offense
classification, IPC Section Information, Feedback and Improvement, Legal
Advice, Admin Panel, and a real-time Flask-SocketIO LegalBot Chatbot. The
system supports preprocessing, fine-tuning, and evaluation of the LawNet
model, multilingual response generation, and recommends lawyers based on
queries and location.

5.2 SYSTEM FLOW


The project operates through a well-defined flow to ensure seamless user
interaction and effective delivery of legal assistance. Here's an overview of the
system flow:
1. User Access: Users register or log in to the web/chat interface.
2. Query Submission: Legal queries are submitted for analysis.
3. Text Processing: Input is preprocessed (tokenization, stopword removal,
stemming/lemmatization).
4. Offense Classification: A BERT-based LawNet model predicts IPC
sections or legal categories.
5. Response Generation: Legal responses include IPC sections, descriptions,
punishments, and advice.
6. Multilingual Support: Responses are translated to the user’s preferred
language.
7. Result Display: Translated output is shown in the interface.
8. Lawyer Recommendation: Relevant advocates are suggested based on
query and location.
9. Admin Functions: Admins manage data, retrain models, and update system
content.
10. Legal Assistance Delivery: End-to-end support with accurate predictions,
insights, and lawyer referrals.

5.3 MODULES DESCRIPTION


5.3.1 Legalbot Web App

The web app is developed using Python and Flask for the backend and MySQL
for data storage. It uses TensorFlow for ML, Pandas, NumPy, Scikit-learn for
data handling, and NLTK for NLP. Matplotlib and Seaborn handle data
visualization, while Bootstrap ensures a responsive UI. WampServer is used for
local testing.
5.3.2 Legalbot Chatbot Interface
Built using Flask-SocketIO, the chatbot provides real-time, interactive legal
conversations. Users can ask about legal issues, offenses, or IPC sections and
receive immediate responses. It features:
 A simple, responsive chat window
 Query input and history display
5.3.3 Lawnet Model: Build and Train
The LawNet model, based on BERT, undergoes:
 Data preprocessing
 Model fine-tuning and training
 Performance evaluation
5.3.3.1 Dataset Description
Description of IPC Section: In-depth explanation of the respective IPC section,
highlighting the nature of offenses covered.
Offense: Specific details regarding the offense outlined in the IPC section.
Punishment: The prescribed punishment for the offense, inclusive of
potential imprisonment, fines, or a combination thereof.
Section: The section number within the IPC.

Table 5.1: Dataset

5.3.3.2 Preprocessing
The dataset containing IPC sections, descriptions, offenses, punishments, and section numbers can be imported using the Pandas library in Python. The dataset is then cleaned by removing any irrelevant information, handling missing values, and ensuring consistency in formatting, using the following steps:
 Tokenization
Tokenization is the process of breaking down text into individual words or tokens. For example, consider the following description from the dataset: "Unlawful assembly armed with deadly weapon." After tokenization, this description would be split into tokens: ["Unlawful", "assembly", "armed", "with", "deadly", "weapon"].
 Stopword Removal
Stopword removal is the process of filtering out common words, known as stopwords, that do not carry significant meaning and can be safely discarded. For example, consider the description: "The accused was found guilty of theft and sentenced to imprisonment." After stopword removal, common words like "the," "was," "of," and "and" would be filtered out, resulting in: ["accused", "found", "guilty", "theft", "sentenced", "imprisonment"].
 Stemming/Lemmatization
Stemming and lemmatization are techniques used to reduce words to their root or base form. Stemming removes suffixes from words to extract their root and identify the key word. For example, consider the description: "Crimes committed under the influence of alcohol should be dealt with strictly." After stemming or lemmatization, words like "committed" might be reduced to "commit," and "dealt" might be reduced to "deal."
 TF-IDF Vectorization
TF-IDF (Term Frequency-Inverse Document Frequency) vectorization is a technique used to convert text documents into numerical representations, capturing the importance of words in a document relative to a collection of documents. TF-IDF vectorization involves representing each description as a vector of TF-IDF weights for each word in the vocabulary.
"ar
Descri "unla "asse med "dea "wea "accu "fou "gui "th "sente "impriso "cri "comm "influ "alco "de "stric
ption wful" mbly" " dly" pon" sed" nd" lty" eft" nced" nment" mes" itted" ence" hol" alt" tly"

1 0.301 0.301 0.301 0.301 0.301 0 0 0 0 0 0 0 0 0 0 0 0

2 0 0 0 0 0 0.25 0.25 0.25 0.25 0.25 0.25 0 0 0 0 0 0

0.35
3 0 0 0 0 0 0 0 0 0 0 0 0.353 0.353 0.353 0.353 3 0.353

Table 5.2: TF-IDF vectorization
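A compact sketch of the import, cleaning, and TF-IDF steps described above, using pandas and scikit-learn; the file name and column names are assumptions rather than the project's actual dataset schema.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Load the IPC dataset (assumed columns: Description, Offense, Punishment, Section).
df = pd.read_csv("ipc_sections.csv")
descriptions = df["Description"].fillna("").str.lower()

# Tokenization and stop-word removal are handled inside the vectorizer.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(descriptions)

# Each row is the TF-IDF weight vector of one description, as in Table 5.2.
print(tfidf_matrix.shape, len(vectorizer.get_feature_names_out()))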


5.3.3.3 Classification
BERT (Bidirectional Encoder Representations from Transformers) is a powerful language model by Google that understands word context by analyzing both preceding and following words. Widely used in NLP tasks like text classification, BERT can effectively categorize the descriptions in a dataset containing IPC sections, offenses, punishments, and section numbers.
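A minimal fine-tuning sketch of such a BERT classifier using the Hugging Face transformers library; the model name, number of labels, example text, and label index are assumptions, and a real training loop would iterate over the full dataset with an optimizer.

import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=511)          # one class per IPC section (assumed)

texts = ["Unlawful assembly armed with deadly weapon"]
labels = torch.tensor([143])                      # hypothetical class index

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**inputs, labels=labels)          # forward pass returns loss and logits
outputs.loss.backward()                           # one gradient step inside a training loop
print(outputs.logits.shape)                       # (batch_size, num_labels)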
5.3.3.4 Model deployment
Deploying the LawNet model into a LegalBot web app involves
integrating the model into various modules to provide legal assistance and
support to users. This module forms the backbone of the LegalBot's

functionality, enabling it to effectively analyze and classify offenses based on
textual descriptions provided by users. At its core, it integrates the LawNet
model, meticulously built and fine-tuned using the advanced BERT architecture,
into the web application's infrastructure.

5.4 LEGALBOT RESPONSE PREDICTOR


5.4.1 User Input Query Processing
When a user submits a query, the LegalBot Response Predictor Module begins by preprocessing the input text. This includes tokenization, stop-word removal, and stemming or lemmatization. These steps help normalize the input for more accurate analysis.
5.4.2 Prediction
The preprocessed query is then passed to the LawNet model, which is
trained using advanced NLP techniques such as BERT (Bidirectional Encoder
Representations from Transformers). The model, trained on a legal dataset
comprising IPC sections, descriptions, offenses, and punishments, predicts the
most relevant IPC section or legal category based on contextual understanding.
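A hedged sketch of this prediction step: the trained classifier, tokenizer, and a label-to-section mapping are assumed to be available (for example, saved from the training step shown earlier).

def predict_ipc_section(query, model, tokenizer, id2section):
    # Tokenize the preprocessed query and run a single forward pass.
    inputs = tokenizer(query, truncation=True, return_tensors="pt")
    logits = model(**inputs).logits
    predicted_id = int(logits.argmax(dim=-1))
    # Map the predicted class index back to a human-readable IPC section label.
    return id2section[predicted_id]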

5.5 RECOMMENDATION
User Location Retrieval: The module first retrieves the user’s location, either via manual input or IP-based geolocation. This ensures recommendations are relevant to the user's geographic area.
Database Query: Based on the retrieved location, the module queries a database of advocates and lawyers in the vicinity. This database includes details like contact information, expertise, qualifications, and client reviews.
Filtering and Sorting: The results are filtered and sorted using criteria such as specialization, proximity, availability, and reputation. This ensures recommendations match the user’s legal needs.
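The location-based lookup described above could be implemented roughly as follows; the table name, column names, and connection details are assumptions, not the project's actual schema.

import mysql.connector

def recommend_lawyers(location, specialization, limit=5):
    db = mysql.connector.connect(host="localhost", user="root",
                                 password="", database="legalbot")
    cursor = db.cursor(dictionary=True)
    # Filter by area and expertise, then sort by reputation.
    cursor.execute(
        "SELECT name, mobile, expertise, rating FROM lawyers "
        "WHERE location = %s AND expertise = %s "
        "ORDER BY rating DESC LIMIT %s",
        (location, specialization, limit))
    return cursor.fetchall()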

5.6 END USER
5.6.1 Admin Modules
Admin Authentication: Secure login functionality for administrators using system-provided credentials. Ensures access control and protects sensitive administrative operations.
Dataset Management: Enables uploading, updating, and maintaining datasets containing IPC sections, offense descriptions, punishments, and other relevant legal data. Ensures data consistency and integrity for accurate predictions.
LawNet Model Training: Facilitates training of the LawNet model using the uploaded dataset. Admins can configure training parameters (e.g., batch size, learning rate, epochs) and monitor training progress and model accuracy.
Advocate and Lawyer Management: Allows admins to manage the database of legal professionals by adding, updating, or deleting advocate/lawyer records. Details include name, contact, area of expertise, qualifications, and user ratings.
User Management: Enables the administration of end-user accounts. Admins can register new users, update profiles, handle password resets, and delete accounts as necessary.
5.6.2 User Modules
User Registration: New users can create accounts by providing basic information such as name, email, and password. Registration enables access to personalized legal services.
User Authentication: Registered users can securely log in to the platform using their credentials. Ensures privacy and access to personalized system features.
Query Submission: Users can input legal queries or case descriptions through an interactive chat interface. The system processes the input for legal analysis and classification.
Prediction Result: Displays the results of query analysis, including the predicted IPC section, offense type, punishment, and a simplified explanation. Information is retrieved from the NLP classification model.
Lawyer Recommendation: Provides location-based recommendations of advocates or lawyers based on the user’s query. The system uses the lawyer database to suggest suitable professionals with relevant expertise.

CHAPTER 6
SYSTEM TESTING

System testing of the LegalBot project involves a thorough evaluation of


its functionality, performance, reliability, and usability to ensure effective and
accurate legal assistance. The testing process includes the following stages:
1. Unit Testing
 Each component and module is tested individually to verify its
functionality.
 Test cases are designed to cover normal and edge scenarios to uncover
defects.
 Every unit is validated against its defined requirements.
2. Integration Testing
 Combined modules are tested to ensure smooth interaction and data flow.
 Integration tests validate communication, compatibility, and overall
system architecture.
 Any issues in inter-module functioning are identified and resolved.
3. Functional Testing
 All user-defined functionalities are tested to confirm correct behavior.
 Key functions such as query submission, offense classification, and
lawyer recommendation are validated.
 Ensures the system aligns with end-user expectations and predefined
specifications.
4. Performance Testing
 System performance is evaluated under varying conditions and loads.
 Load Testing checks response under normal and peak usage.
 Stress Testing determines the system’s capacity and identifies
performance bottlenecks.
5. Reliability Testing
 Assesses whether the system operates consistently under normal and
adverse scenarios.
 Simulates real-world conditions and verifies fault tolerance.
 Ensures the system can gracefully handle errors and recover effectively.
6. Usability Testing
 The user interface and experience are assessed for ease of use,
intuitiveness, and accessibility.
 Real users interact with the system to identify challenges and provide
feedback.
 Improvements are implemented based on user input to enhance
satisfaction.
7. Compatibility Testing
 Verifies the system’s performance across various devices, operating
systems, and browsers.
 Ensures proper layout, responsiveness, and functionality on different
screen sizes.
 Confirms cross-platform consistency.
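To illustrate the unit-testing stage, a small pytest-style test of a text-cleaning helper is sketched below; the clean_text function here is a toy stand-in, not the project's actual preprocessing routine.

import re

def clean_text(text):
    # Toy stand-in for the preprocessing step: lowercase, strip punctuation,
    # and drop a few stop words.
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    stop_words = {"the", "was", "of", "and", "a", "is", "to"}
    return " ".join(w for w in text.split() if w not in stop_words)

def test_clean_text_removes_stopwords_and_lowercases():
    cleaned = clean_text("The accused WAS found guilty of theft.")
    assert "the" not in cleaned.split()
    assert cleaned == "accused found guilty theft"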
6.1 TEST CASES
1. Test Case ID: LB_TC_002
 Input: User submits a query with ambiguous language.
 Expected Result: System asks for clarification or provides multiple
possible interpretations.
 Actual Result: System prompts user for clarification.
 Status: Pass
2. Test Case ID: LB_TC_005
 Input: Admin uploads a new dataset.
 Expected Result: System successfully processes and integrates the new
dataset without errors.
 Actual Result: System processes the new dataset and updates the
database.
 Status: Pass
3. Test Case ID: LB_TC_006
 Input: User registers a new account.
 Expected Result: System creates a new user account and sends a
confirmation email.
 Actual Result: System successfully creates the user account and sends
the confirmation email.
 Status: Pass
4. Test Case ID: LB_TC_007
 Input: User submits a query requiring legal advice.
 Expected Result: System provides accurate and relevant legal advice
based on the query.
 Actual Result: System offers informative legal advice tailored to the
user's query.
 Status: Pass
5. Test Case ID: LB_TC_008
 Input: Admin updates advocate/lawyer details.
 Expected Result: System reflects the updated information accurately in
the advocate/lawyer database.
 Actual Result: System updates advocate/lawyer details as per admin
input.
 Status: Pass

6.2 TEST REPORT


This report provides an overview of the testing activities carried out on
the LegalBot system. The objective was to validate the system’s functionality,

reliability, and performance against the predefined requirements and quality
standards.
Test Objective: The primary aim was to verify the accuracy of response
predictions, evaluate system responsiveness, and detect any potential issues or
bugs.
Test Scope: Testing encompassed all major modules and features, including
user interaction, query processing, offense classification, response generation,
system behavior under varying loads, and admin functionalities.
Test Environment: Tests were conducted in a controlled environment using
the LegalBot web application deployed on a local server. Tools used included
web browsers (Chrome, Firefox), operating systems (Windows), and a Python
development environment.
Test Result: The system performed well overall. It delivered accurate
predictions and maintained satisfactory responsiveness. No critical bugs
affecting functionality were observed.
Bug Report: No major issues were identified. Minor inconsistencies in the
user interface and occasional error-handling anomalies were found but were
promptly resolved.
In conclusion, the LegalBot system has undergone comprehensive testing,
ensuring its functionality, reliability, and performance meet the desired
standards. The successful completion of testing validates the system's readiness
for deployment and use by end-users.

CHAPTER 7
CONCLUSION AND FUTURE ENHANCEMENT

7.1 CONCLUSION
The LegalBot project simplifies Indian property law by offering an AI-
powered platform for resolving disputes. Using NLP and a BERT-based model,
it interprets queries, predicts relevant IPC sections, and explains offenses and
penalties. Users get legal insights, lawyer recommendations, and multilingual
support through an intuitive interface, while an admin panel ensures smooth
management and updates—empowering users with accessible, informed legal
guidance. It bridges the gap between complex legal systems and the general
public. By streamlining legal processes, LegalBot promotes faster, fairer dispute
resolution.

7.2 FUTURE ENHANCEMENT


Looking ahead, there are several avenues for future enhancement and
expansion of the LegalBot platform:
 Mobile Application Development: Create a mobile app version to offer convenient legal assistance on smartphones and tablets, optimizing the interface and functionality for various screen sizes.
 Collaboration with Legal Professionals: Partner with law firms and legal
experts to integrate their insights and feedback, ensuring the platform
remains practical, accurate, and relevant for both users and practitioners.
 Case Management System: Add a case management feature to help users
track and manage legal proceedings, including document storage, task
management, and calendar reminders for important deadlines.

 Legal Document Analysis: Analyze contracts, agreements, and rulings with models for summarization, clause extraction, and legal entity recognition, offering clear insights and simplifying legal texts.

CHAPTER 8
APPENDIX

8.1 SOURCE CODE


Packages
from flask import Flask, render_template, Response, redirect, request, session, abort, url_for
import os
import base64
import datetime  # needed for datetime.datetime.now() used in register() below
from datetime import date
from random import randint
import re
from flask import send_file
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import csv
import time
import shutil
import json
import mysql.connector
import gensim
from gensim.parsing.preprocessing import remove_stopwords, STOPWORDS
from gensim.parsing.porter import PorterStemmer
from keras.layers import Input, Dense, LSTM, TimeDistributed
User Registration
def register():
    msg = ""
    mycursor = mydb.cursor()
    if request.method == 'POST':
        uname = request.form['uname']
        name = request.form['name']
        mobile = request.form['mobile']
        email = request.form['email']
        location = request.form['location']
        pass1 = request.form['pass']
        now = datetime.datetime.now()
        rdate = now.strftime("%d-%m-%Y")
        mycursor = mydb.cursor()
        mycursor.execute("SELECT count(*) FROM cc_register where uname=%s", (uname, ))
        cnt = mycursor.fetchone()[0]
        if cnt == 0:
            mycursor.execute("SELECT max(id)+1 FROM cc_register")
            maxid = mycursor.fetchone()[0]
            if maxid is None:
                maxid = 1
            uid = str(maxid)
            sql = "INSERT INTO cc_register(id, name, mobile, email, location, uname, pass, otp, status) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)"
            val = (maxid, name, mobile, email, location, uname, pass1, '', '0')
            mycursor.execute(sql, val)  # execute the insert (assumed; the call is missing from the printed listing)
            msg = "success"
            mydb.commit()
Training
# Upload Dataset
def admin():
    msg = ""
    mycursor = mydb.cursor()
    if request.method == 'POST':
        file = request.files['file']
        fn = "datafile.csv"
        file.save(os.path.join("static/upload", fn))
        filename = 'static/upload/datafile.csv'
        data1 = pd.read_csv(filename, header=0)
        data2 = list(data1.values.flatten())
# NLP - Preprocessing
def remove_stopwords(text):
    clean_text = ' '.join([word for word in text.split() if word not in nlp])
    return clean_text

txt = remove_stopwords(msg_input)
stemmer = PorterStemmer()
from wordcloud import STOPWORDS
STOPWORDS.update(['rt', 'mkr', 'didn', 'bc', 'n', 'm',
                  'im', 'll', 'y', 've', 'u', 'ur', 'don', 'p', 't', 's', 'aren', 'kp', 'o', 'kat', 'de', 're', 'amp', 'will'])

def lower(text):
    return text.lower()

def remove_specChar(text):
    return re.sub("#[A-Za-z0-9_]+", ' ', text)

def remove_link(text):
    return re.sub('@\S+|https?:\S+|http?:\S|[^A-Za-z0-9]+', ' ', text)

def remove_stopwords(text):  # redefined here to filter against the wordcloud STOPWORDS set
    return " ".join([word for word in str(text).split() if word not in STOPWORDS])

def stemming(text):
    return " ".join([stemmer.stem(word) for word in text.split()])

def lemmatizer_words(text):
    # `lematizer` is assumed to be a WordNet lemmatizer instance defined elsewhere in the project
    return " ".join([lematizer.lemmatize(word) for word in text.split()])

def cleanTxt(text):
    text = lower(text)
    text = remove_specChar(text)
    text = remove_link(text)
    text = remove_stopwords(text)
    text = stemming(text)
    return text
#BERT-Feature Extraction
def BERT():
super(BERTLM, self).__init__()
self.vocab = vocab
self.embed_dim =embed_dim
self.tok_embed = Embedding(self.vocab.size, embed_dim,
self.vocab.padding_idx)
self.pos_embed = LearnedPositionalEmbedding(embed_dim, device=local_rank)
self.seg_embed = Embedding(2, embed_dim, None)
for i in range(layers):
self.layers.append(TransformerLayer(embed_dim, ff_embed_dim, num_heads,
dropout))
self.emb_layer_norm = LayerNorm(embed_dim)
self.one_more = nn.Linear(embed_dim, embed_dim)
self.nxt_snt_pred = nn.Linear(embed_dim, 1)
self.dropout = dropout
self.device = local_rank
if approx == "none":
self.approx = None
elif approx == "adaptive":
self.approx = nn.AdaptiveLogSoftmaxWithLoss(self.embed_dim,
self.vocab.size, [10000, 20000, 200000])
else:
raise NotImplementedError("%s has not been implemented"%approx)
self.reset_parameters()
def reset_parameters(self):
nn.init.constant_(self.out_proj_bias, 0.)
nn.init.constant_(self.nxt_snt_pred.bias, 0.)
nn.init.constant_(self.one_more.bias, 0.)
def work(self, inp, seg=None, layers=None):
if layers is not None:
tot_layers = len(self.layers)
for x in layers:
if not (-tot_layers <= x < tot_layers):
raise ValueError('layer %d out of range '%x)
layers = [ (x+tot_layers if x <0 else x) for x in layers]
max_layer_id = max(layers)
seq_len, bsz = inp.size()
if seg is None:
seg = torch.zeros_like(inp)
x = self.tok_embed(inp) + self.seg_embed(seg) + self.pos_embed(inp)
x = self.emb_layer_norm(x)
x = F.dropout(x, p=self.dropout, training=self.training)
padding_mask = torch.eq(inp, self.vocab.padding_idx)
xs = []
for layer_id, layer in enumerate(self.layers):
x, _ ,_ = layer(x, self_padding_mask=padding_mask)
xs.append(x)
if layers is not None and layer_id >= max_layer_id:
break
if layers is not None:
x = torch.stack([xs[i] for i in layers])
else:
z = torch.tanh(self.one_more_nxt_snt(x[0]))
return x, z
def forward(self, truth, inp, seg, msk, nxt_snt_flag):
seq_len, bsz = inp.size()
x = self.tok_embed(inp) + self.seg_embed(seg) + self.pos_embed(inp)
x = self.emb_layer_norm(x)
x = F.dropout(x, p=self.dropout, training=self.training)
padding_mask = torch.eq(truth, self.vocab.padding_idx)
if not padding_mask.any():
padding_mask = None
for layer in self.layers:
x, _ ,_ = layer(x, self_padding_mask=padding_mask)
masked_x = x.masked_select(msk.unsqueeze(-1))
masked_x = masked_x.view(-1, self.embed_dim)
gold = truth.masked_select(msk)
y = self.one_more_layer_norm(gelu(self.one_more(masked_x)))
out_proj_weight = self.tok_embed.weight
if self.approx is None:
log_probs = torch.log_softmax(F.linear(y, out_proj_weight, self.out_proj_bias),
-1)
else:
log_probs = self.approx.log_prob(y)
loss = F.nll_loss(log_probs, gold, reduction='mean')
z = torch.tanh(self.one_more_nxt_snt(x[0]))
nxt_snt_pred = torch.sigmoid(self.nxt_snt_pred(z).squeeze(1))
nxt_snt_acc = torch.eq(torch.gt(nxt_snt_pred, 0.5),
nxt_snt_flag).float().sum().item()
50
nxt_snt_loss = F.binary_cross_entropy(nxt_snt_pred, nxt_snt_flag.float(),
reduction='mean')
tot_loss = loss + nxt_snt_loss
_, pred = log_probs.max(-1)
tot_tokens = msk.float().sum().item()
acc = torch.eq(pred, gold).float().sum().item()
return (pred, gold), tot_loss, acc, tot_tokens, nxt_snt_acc, bsz
####
# LSTM classification: seq2seq chatbot utilities (operate on Cornell-style
# "+++$+++"-delimited dialogue files)
import numpy as np
import nltk
from tensorflow.keras.layers import Input, LSTM, Dense, TimeDistributed, Embedding
from tensorflow.keras.models import Model

INPUT_VECTOR_LENGTH = 20
OUTPUT_VECTORLENGTH = 20
minimum_length = 2
maximum_length = 20
sample_size = 30000
WORD_START = 1
WORD_PADDING = 0

def extract_converstionIDs(conversation_lines):
    # Each record lists the utterance ids that make up one conversation
    conversations = []
    for line in conversation_lines[:-1]:
        split_line = line.split(' +++$+++ ')[-1][1:-1].replace("'", "").replace(" ", "")
        conversations.append(split_line.split(','))
    return conversations

def extract_quesans_pairs(linetoID_mapping, conversations):
    # Consecutive utterances become (question, answer) training pairs
    questions = []
    answers = []
    for con in conversations:
        for i in range(len(con) - 1):
            questions.append(linetoID_mapping[con[i]])
            answers.append(linetoID_mapping[con[i + 1]])
    return questions, answers

def transform_text(input_text):
    # Expand common contractions and strip punctuation
    input_text = input_text.lower()
    input_text = re.sub(r"i'm", "i am", input_text)
    input_text = re.sub(r"he's", "he is", input_text)
    input_text = re.sub(r"she's", "she is", input_text)
    input_text = re.sub(r"it's", "it is", input_text)
    input_text = re.sub(r"that's", "that is", input_text)
    input_text = re.sub(r"what's", "what is", input_text)
    input_text = re.sub(r"where's", "where is", input_text)
    input_text = re.sub(r"how's", "how is", input_text)
    input_text = re.sub(r"\'re", " are", input_text)
    input_text = re.sub(r"\'d", " would", input_text)
    input_text = re.sub(r"won't", "will not", input_text)
    input_text = re.sub(r"[-()\"#/@;:<>{}`+=~|]", "", input_text)
    input_text = " ".join(input_text.split())
    return input_text

def filter_ques_ans(clean_questions, clean_answers):
    # Keep only question/answer pairs whose lengths fall inside the allowed range
    short_questions_temp = []
    short_answers_temp = []
    for i, question in enumerate(clean_questions):
        if minimum_length <= len(question.split()) <= maximum_length:
            short_questions_temp.append(question)
            short_answers_temp.append(clean_answers[i])
    short_questions = []
    short_answers = []
    for i, answer in enumerate(short_answers_temp):
        if minimum_length <= len(answer.split()) <= maximum_length:
            short_questions.append(short_questions_temp[i])
            short_answers.append(answer)
    return short_questions, short_answers
def create_vocabulary(tokenized_ques, tokenized_ans):
    # Count word frequencies over all questions and answers
    vocabulary = {}
    for question in tokenized_ques:
        for word in question:
            if word not in vocabulary:
                vocabulary[word] = 1
            else:
                vocabulary[word] += 1
    for answer in tokenized_ans:
        for word in answer:
            if word not in vocabulary:
                vocabulary[word] = 1
            else:
                vocabulary[word] += 1
    return vocabulary

def create_encoding_decoding(vocabulary):
    # Keep only words that occur at least `threshold` times;
    # index 0 is reserved for padding and index 1 for the START token
    threshold = 15
    encoding = {}
    decoding = {1: 'START'}
    vocab_size = 2
    for word, count in vocabulary.items():
        if count >= threshold:
            encoding[word] = vocab_size
            decoding[vocab_size] = word
            vocab_size += 1
    # Out-of-vocabulary words share a single <UNKNOWN> id
    encoding['<UNKNOWN>'] = vocab_size
    decoding[vocab_size] = '<UNKNOWN>'
    vocab_size += 1
    return encoding, decoding, vocab_size

def transform(encoding, data, vector_size=20):
    # Convert tokenised sentences into fixed-length integer vectors
    transformed_data = np.zeros(shape=(len(data), vector_size))
    for i in range(len(data)):
        for j in range(min(len(data[i]), vector_size)):
            try:
                transformed_data[i][j] = encoding[data[i][j]]
            except KeyError:
                transformed_data[i][j] = encoding['<UNKNOWN>']
    return transformed_data

def create_gloveEmbeddings(encoding, size):
    # Build an embedding matrix from a pre-trained 50-dimensional GloVe file
    words = set()
    word_to_vec_map = {}
    with open(GLOVE_MODEL, mode='rt', encoding='utf8') as file:
        for line in file:
            line = line.strip().split()
            word = line[0]
            words.add(word)
            word_to_vec_map[word] = np.array(line[1:], dtype=np.float64)
    embedding_matrix = np.zeros((size, 50))
    for word, index in encoding.items():
        try:
            embedding_matrix[index, :] = word_to_vec_map[word.lower()]
        except KeyError:
            continue
    return embedding_matrix
def create_model(dict_size, embed_layer, hidden_dim):
    # Encoder-decoder (seq2seq) LSTM with a shared GloVe embedding layer
    encoder_inputs = Input(shape=(maximum_length,), dtype='int32')
    encoder_embedding = embed_layer(encoder_inputs)
    encoder_LSTM = LSTM(hidden_dim, return_state=True)
    _, state_h, state_c = encoder_LSTM(encoder_embedding)
    decoder_inputs = Input(shape=(OUTPUT_VECTORLENGTH,), dtype='int32')
    decoder_embedding = embed_layer(decoder_inputs)
    decoder_LSTM = LSTM(hidden_dim, return_state=True, return_sequences=True)
    decoder_outputs, _, _ = decoder_LSTM(decoder_embedding,
                                         initial_state=[state_h, state_c])
    outputs = TimeDistributed(Dense(dict_size, activation='softmax'))(decoder_outputs)
    model = Model([encoder_inputs, decoder_inputs], outputs)
    return model

def prediction_answer(user_input, model):
    # Greedy decoding: feed the model its own previous predictions token by token
    transformed_input = transform_text(user_input)
    input_tokens = [nltk.word_tokenize(transformed_input)]
    input_tokens = [input_tokens[0][::-1]]           # reversing the input sequence
    encoder_input = transform(encoding, input_tokens, 20)
    decoder_input = np.zeros(shape=(len(encoder_input), OUTPUT_VECTORLENGTH))
    decoder_input[:, 0] = WORD_START
    for i in range(1, OUTPUT_VECTORLENGTH):
        pred_output = model.predict([encoder_input, decoder_input]).argmax(axis=2)
        decoder_input[:, i] = pred_output[:, i]
    return pred_output
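# --------------------------------------------------------------------------
# Hypothetical end-to-end wiring of the seq2seq helpers above. This is an
# illustrative sketch only: the file paths, the GloVe location and the
# hyper-parameters are assumptions, not the exact values used in the project.
# --------------------------------------------------------------------------
from tensorflow.keras.initializers import Constant

GLOVE_MODEL = 'static/upload/glove.6B.50d.txt'        # assumed GloVe file path

# Assumed Cornell-style inputs: id_lines holds "lineID +++$+++ ... +++$+++ text"
# records; conversation_lines lists the line-IDs that form each conversation.
with open('static/upload/lines.txt', encoding='utf8', errors='ignore') as f:
    id_lines = f.read().split('\n')
with open('static/upload/conversations.txt', encoding='utf8', errors='ignore') as f:
    conversation_lines = f.read().split('\n')

linetoID_mapping = {}
for line in id_lines:
    parts = line.split(' +++$+++ ')
    if len(parts) == 5:
        linetoID_mapping[parts[0]] = parts[4]

conversations = extract_converstionIDs(conversation_lines)
questions, answers = extract_quesans_pairs(linetoID_mapping, conversations)
clean_questions = [transform_text(q) for q in questions]
clean_answers = [transform_text(a) for a in answers]
short_questions, short_answers = filter_ques_ans(clean_questions, clean_answers)

tokenized_ques = [nltk.word_tokenize(q) for q in short_questions]
tokenized_ans = [nltk.word_tokenize(a) for a in short_answers]
vocabulary = create_vocabulary(tokenized_ques, tokenized_ans)
encoding, decoding, vocab_size = create_encoding_decoding(vocabulary)

embedding_matrix = create_gloveEmbeddings(encoding, vocab_size)
embed_layer = Embedding(vocab_size, 50,
                        embeddings_initializer=Constant(embedding_matrix),
                        trainable=False)
model = create_model(vocab_size, embed_layer, hidden_dim=256)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
# After fitting on the encoded question/answer pairs:
#   reply_ids = prediction_answer('who can claim ancestral property?', model)
#   reply = ' '.join(decoding.get(int(i), '') for i in reply_ids[0] if int(i) > 1)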
8.2 SCREENSHOTS

Figure 8.1: Program initialization
Figure 8.2: Home page
Figure 8.3: Admin Login
Figure 8.4: Admin Page
Figure 8.5: Adding Lawyer Detail
Figure 8.6: Upload Dataset
Figure 8.7: Dataset Training
Figure 8.8: User Registration
Figure 8.9: User Login page
Figure 8.10: User Interaction
Figure 8.11: LegalBot Response
Figure 8.12: Lawyer Recommendation
CHAPTER 9
REFERENCES
1. Ahmad, M., Kamal, A., & Shahzad, W. (2019). A review of chatbots in
customer service. In 2019 3rd International Conference on Computing,
Mathematics and Engineering Technologies (iCoMET) (pp. 1-6). IEEE.
2. Debnath, B., Chakraborty, D., & Mandal, S. K. (2019). Chatbot for e-
learning: A review. In Proceedings of the 2nd International Conference
on Inventive Research in Computing Applications (pp. 186-190). IEEE.
3. Gao, W., & Huang, H. (2019). An intelligent chatbot system for online
customer service. In Proceedings of the 2019 2nd International
Conference on Education and Multimedia Technology (pp. 208-211).
ACM.
4. Goyal, P., Gupta, R., & Goyal, L. M. (2020). A review of chatbot and
natural language processing. International Journal of Advanced
Research in Computer Science, 11(4), 69-75.
5. H. Jin and H. Kim, "Developing a Chatbot Service Model for Customer
Support," in International Journal of Human-Computer Interaction, vol.
36, no. 12, pp. 1188-1195, 2020.
6. Hernandez-Mendez, A., Perez-Meana, H., & Sucar, L. E. (2018).
Natural language processing and chatbots: A survey of current research
and future possibilities. Journal of Computing and Information
Technology, 26(1), 1-18.
7. J. R. Lloyd and C. A. Boyd, "The Application of Chatbots in Learning
Environments: A Review of Recent Research," in Journal of Educational
Technology Development and Exchange, vol. 13, no. 1, pp. 1-14, 2020.
8. Lowe, R., & Pow, N. (2017). The rise of the conversational interface: A
new kid on the block. Computer, 50(8), 58-63.
9. M. H. Hashim, A. Alhamid, M. Aljahdali and A. Albaham, "Chatbot
technology for customer service: a systematic literature review," in
International Journal of Advanced Computer Science and Applications,
vol. 10, no. 6, pp. 305-312, 2019.
10. Muduli, S., & Sharma, S. (2021). Implementation of a conversational
chatbot system for e-commerce. In Intelligent Computing, Information
and Control Systems (pp. 753-760). Springer.
11. P. L. Poon and K. D. Chau, "Designing and Implementing a Chatbot
for Customer Service," in International Journal of Innovation and
Technology Management, vol. 16, no. 5, pp. 1-18, 2019.
12. Rajabi, A., Asgarian, A., & Ebrahimi, M. (2018). A comparative study
of machine learning algorithms for automated response selection in
chatbot systems. In Proceedings of the 9th Workshop on Computational
Approaches to Subjectivity, Sentiment and Social Media Analysis (pp.
45-52).
13. Rashid, S. M., Abdullah, A. H., & Ahmed, M. A. (2019). Development
of a chatbot using natural language processing for customer service.
International Journal of Computer Science and Information Security
(IJCSIS), 17(5), 167.
14. S. Srinivasan and S. Gunasekaran, "Survey on Chatbot Development
and Its Applications," in Journal of Computer Science, vol. 16, no. 11,
pp. 1398-1411, 2020.
15. Saini, V., & Singh, S. (2019). A review on chatbots in customer service
industry. In 2019 6th International Conference on Computing for
Sustainable Global Development (INDIACom) (pp. 313-317). IEEE.
16. Sarker, S., & Rana, S. (2020). AI based chatbot for customer service: A
review. In 2020 IEEE Region 10 Symposium (TENSYMP) (pp. 1774-
1778). IEEE.
17. Singh, A., & Sharma, M. (2020). AI Chatbot: A review of literature. In
2020 2nd International Conference on Innovative Mechanisms for
Industry Applications (ICIMIA) (pp. 23-28). IEEE.
18. Y. Liu, L. Wang and X. Liu, "Designing and Developing a Chatbot for
Customer Service," in Proceedings of the 2019 International
Conference on Computer Science and Artificial Intelligence, pp. 209-
213, 2019.
19. Y. Zhao, X. Zhao, Y. Zhang and C. Liu, "A survey on chatbot design
techniques," in Journal of Network and Computer Applications, vol.
153, pp. 102-117, 2020.
20. Zhang, Y., & Wallace, B. (2017). A sensitivity analysis of (and
practitioners’ guide to) convolutional neural networks for sentence
classification. arXiv preprint arXiv:1510.03820.
International Journal of Innovative Research in Science, Engineering and Technology (IJIRSET)
e-ISSN: 2319-8753 | p-ISSN: 2347-6710 | Volume 14, Issue 5, May 2025 | Impact Factor: 8.699
DOI: 10.15680/IJIRSET.2025.1405141
AI-Driven Resolution of Property Disputes:
An NLP and BERT-Based Chatbot Approach
Gokul P1, Reuel Jehoada P2, Yuvan Shankar M3, Yeshwanth V4, Mr J Manoj Prabhakar 5
UG Scholar, Department of Artificial Intelligence and Data Science, Dhaanish Ahmed Institute of
Technology, Coimbatore, India1
UG Scholar, Department of Artificial Intelligence and Data Science, Dhaanish Ahmed Institute of
Technology, Coimbatore, India2
UG Scholar, Department of Artificial Intelligence and Data Science, Dhaanish Ahmed Institute of
Technology, Coimbatore, India3
UG Scholar, Department of Artificial Intelligence and Data Science, Dhaanish Ahmed Institute of
Technology, Coimbatore, India4
Assistant Professor, Department of Computer Science and Engineering, Dhaanish Ahmed Institute of
Technology, Coimbatore, India5
ABSTRACT: In recent years, there has been growing interest in leveraging artificial intelligence (AI) for
legal decision-making. This trend underscores the increasing engagement of academics and professionals in
exploring AI’s role in legal systems. AI technologies, including machine learning and natural language
processing (NLP), have the capability to analyse vast amounts of legal data, extract meaningful insights, and
enhance decision-making processes. This study aims to develop a robust AI-driven framework to resolve
property disputes by integrating advanced AI methodologies and utilizing real estate legal datasets. By
automating legal document analysis, this approach can enhance decision-making accuracy and reduce the
manual workload of legal professionals. Our research introduces a hybrid ensemble model specifically
designed for property dispute resolution. The model capitalizes on pre-trained embeddings and large
language models to improve legal decision predictions. By harnessing the strengths of pre-existing
embeddings and incorporating sophisticated language models, the proposed system achieves high predictive
accuracy and efficiency in property dispute settlements. Furthermore, we emphasize the model's
interpretability, highlighting its capacity to identify the key factors in legal decision-making. Achieving an
accuracy rate of approximately 83%, our research demonstrates how large language models (LLMs) and
deep learning techniques can effectively forecast legal outcomes.
KEYWORDS: AI-powered Dispute Resolution, BERT (Bidirectional Encoder Representations from
Transformers), Natural Language Processing (NLP)
I. INTRODUCTION
The application of AI in property dispute resolution has gained substantial momentum. Courts, real estate
professionals, and legal practitioners are increasingly adopting AI-powered solutions to enhance efficiency
and accuracy in legal decision-making. Traditional property dispute resolution requires extensive analysis of
legal texts, precedents, and statutory interpretations—a time-consuming and error-prone process. AI-driven
tools, particularly NLP and machine learning, offer promising solutions to streamline and automate these
tasks. This paper presents an innovative AI-based framework that employs large language models and
hybrid ensemble techniques to improve the accuracy of legal outcome predictions.
II. RELATED WORK
Prior studies have explored various AI methodologies for legal decision-making, including text
classification, NLP-based legal analysis, and predictive modelling for property-related court rulings.
Research on AI-assisted dispute resolution has primarily focused on machine learning models like support
vector machines (SVM), deep learning models such as convolutional neural networks (CNNs), and pre-
trained language models like BERT and RoBERTa. While these approaches have significantly improved
legal text analysis, they often lack interpretability and fail to integrate multi-modal data sources effectively.
Our study addresses these limitations by introducing a hybrid ensemble model that combines pre-trained
embeddings with large language models, enhancing predictive accuracy and model explainability.
III. METHODOLOGY
Our AI-driven framework consists of multiple components designed to analyse legal texts and predict
property dispute outcomes. The key steps in our methodology are:
• Data Collection: We compile a dataset from real estate legal cases, covering disputes related to
property ownership, tenancy, and land rights.
• Preprocessing: Legal documents undergo preprocessing steps such as tokenization, stop-word
removal, and embedding transformation using GloVe and transformer-based embeddings.
• Model Architecture: Our hybrid ensemble model integrates multiple language models, including
BERT, ALBERT, RoBERTa, and DistilBERT, combined with pre-trained embeddings.
• Training and Optimization: The model is trained using supervised learning with a cross-entropy
loss function and optimized via the Adam optimizer.
• Prediction and Interpretability: The model generates dispute resolution predictions while
ensuring interpretability by highlighting key legal arguments (an illustrative sketch of this
pipeline is given below).
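To make the architecture and training steps above concrete, the following is a minimal, hypothetical sketch of a soft-voting ensemble built with the Hugging Face transformers library. The checkpoint names, the two-way label set and the probability-averaging scheme are illustrative assumptions, not the exact configuration used in this study.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoints: in practice each would be a copy fine-tuned on the
# property-dispute dataset with a cross-entropy loss and the Adam optimizer.
CHECKPOINTS = ["bert-base-uncased", "albert-base-v2",
               "roberta-base", "distilbert-base-uncased"]
LABELS = ["plaintiff_favoured", "defendant_favoured"]      # illustrative label set

tokenizers, models = [], []
for name in CHECKPOINTS:
    tokenizers.append(AutoTokenizer.from_pretrained(name))
    models.append(AutoModelForSequenceClassification.from_pretrained(
        name, num_labels=len(LABELS)).eval())

def predict_dispute(case_text):
    # Soft voting: average the class probabilities of all ensemble members
    probs = torch.zeros(len(LABELS))
    with torch.no_grad():
        for tok, mdl in zip(tokenizers, models):
            enc = tok(case_text, truncation=True, max_length=512,
                      return_tensors="pt")
            probs += torch.softmax(mdl(**enc).logits, dim=-1).squeeze(0)
    probs /= len(CHECKPOINTS)
    return LABELS[int(probs.argmax())], probs

Soft voting keeps each member model independent, which simplifies tracing a prediction back to the attention patterns of individual transformers when interpretability is analysed.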
IV. EXPERIMENTAL RESULTS
We assess our model’s performance using standard evaluation metrics, including accuracy, precision,
recall, and F1-score. Our results indicate that the proposed approach achieves an accuracy rate of
approximately 83%, surpassing traditional machine learning classifiers and single-model transformer
architectures. Furthermore, we analyse the interpretability of predictions by examining feature importance
scores derived from attention mechanisms in transformer models.
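For reference, a small, hypothetical evaluation snippet of the kind used to obtain such metrics is shown below; y_true and y_pred stand in for the held-out gold labels and the ensemble's predictions and are not the study's actual data.

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# y_true: gold outcomes of the held-out cases; y_pred: ensemble predictions
accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)
print("accuracy=%.3f precision=%.3f recall=%.3f f1=%.3f"
      % (accuracy, precision, recall, f1))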
V. DISCUSSION
Our findings underscore the effectiveness of AI-based approaches in property dispute resolution. The
hybrid ensemble model not only enhances prediction accuracy but also provides valuable insights into the
factors influencing legal decisions. By leveraging pre-trained embeddings and transformer models, our
framework can efficiently process complex legal texts while ensuring model interpret ability. However,
challenges persist in adapting AI models to various legal jurisdictions and mitigating biases in training data.
Future research should explore domain-specific adaptations and address ethical concerns related to AI’s role
in legal systems.
VI. CONCLUSION
This study introduces an AI-powered framework for resolving property disputes, utilizing large language
models and hybrid ensemble techniques. Our model significantly improves legal outcome predictions,
achieving high accuracy while maintaining interpretability. The integration of AI in property law has the
potential to revolutionize real estate dispute resolution, enhancing decision-making efficiency and reducing
the workload on legal professionals. Future research will focus on expanding the dataset, refining model
architectures, and addressing ethical considerations in AI-driven legal decision-making.
REFERENCES
1. N. Aletras, D. Tsarapatsanis, D. Preoţiuc-Pietro, and V. Lampos, "Predicting judicial decisions of the
European court of human rights: A natural language processing perspective," PeerJ Comput. Sci., vol. 2,
p. e93, Oct. 2016.
2. D. M. Katz, M. J. Bommarito, and J. Blackman, "A general approach for predicting the behavior of the
supreme court of the United States," PLoS ONE, vol. 12, no. 4, Apr. 2017.
3. H. Zhong, Z. Guo, C. Tu, C. Xiao, Z. Liu, and M. Sun, "Legal judgment prediction via topological
learning," in Proc. Conf. Empirical Methods Natural Lang. Process., 2018, pp. 3540–3549.
4. R. Ren, J. W. Castro, A. Santos, S. Pérez-Soler, S. T. Acuña and J. de Lara, "Collaborative modelling:
Chatbots or on-line tools? An experimental study", Proc. Eval. Assessment Softw. Eng., pp. 1-9, 2020.
5. S. Pérez-Soler, E. Guerra and J. de Lara, "Collaborative modeling and group decision making using
chatbots in social networks", IEEE Softw., vol. 35, no. 6, pp. 48-54, Nov./Dec. 2018.
6. R. Ren, J. W. Castro, S. T. Acuña and J. de Lara, "Evaluation techniques for chatbot usability: A
systematic mapping study", Int. J. Soft. Eng. Knowl. Eng., vol. 29, no. 11n12, pp. 1673-1702, 2019.
7. D. S. Zwakman, D. Pal, T. Triyason and V. Vanijja, "Usability of voice-based intelligent personal
assistants", Proc. Int. Conf. Inf. Commun. Technol. Convergence, pp. 652-657, 2020.
8. D. S. Zwakman, D. Pal and C. Arpnikanondt, "Usability evaluation of artificial intelligence-based voice
assistants: The case of amazon alexa", SN Comput. Sci., vol. 2, pp. 1-16, 2021.