A PROJECT REPORT
Submitted by
BACHELOR OF ENGINEERING
IN
P.R.ENGINEERING COLLEGE.
MAY 2024
ANNA UNIVERSITY: CHENNAI 600 025
BONAFIDE CERTIFICATE
SIGNATURE SIGNATURE
Vallam,Thanjavur-613403 vallam,Thanjavur-
VIVO-VOCE
Date: Date:
ACKNOWLEDGEMENT
ABSTRACT
Many people find themselves in situations where they require basic legal
information and guidance. However, seeking professional legal advice often
comes with a hefty price tag, making it impractical for minor issues or general
awareness. In a world where technology continues to revolutionize various
aspects of our lives, the legal field is no exception. With the advent of artificial
intelligence (AI), a new tool has emerged to assist individuals in understanding
their rights and navigating the complex realm of laws. This project proposes
Lawyer Bot, an AI-powered virtual assistant that provides accurate details about
Indian laws and sections, offering guidance on what to do in problematic
situations and how the law can help resolve them. Lawyer Bot
employs natural language processing (NLP) and Bidirectional Encoder
Representations from Transformers (BERT) to interact with users in a
conversational manner. Users can simply input their queries into the chat
interface, expressing their concerns or describing the problematic situation they
are facing. Lawyer Bot then uses its training on Indian laws and sections
to provide relevant information and suggest the next steps to take. Lawyer Bot
brings several benefits to individuals seeking legal information. Firstly, it
provides quick and accessible guidance, saving users from the consultation fees
associated with professional legal services for minor matters. Secondly, it raises
awareness about Indian laws and sections, promoting legal literacy among the
general population. Through its user-friendly chat interface, it empowers
individuals to understand their rights and navigate the complexities of the legal
system independently. As technology continues to reshape various aspects of
society, Lawyer Bot stands at the forefront of democratizing legal knowledge,
offering a cost-effective and user-friendly solution for individuals seeking legal
information and guidance in India.
LIST OF ABBREVIATIONS
1 AI ARTIFICIAL INTELLIGENCE
2 DL DEEP LEARNING
LIST OF FIGURES
1 ARCHITECTURE DIAGRAM
2 DATAFLOW DIAGRAM
TABLE OF CONTENTS
ACKNOWLEDGEMENT
ABSTRACT
LIST OF ABBREVIATIONS
1 INTRODUCTION
1.1 Overview
1.2 Problem Statement
1.3 AI Chatbot
1.4 Aim and Objectives
1.5 Scope of the Project
2 LITERATURE SURVEY
2.1 LAW-U
2.2 Legal Solutions
2.3 Crime Awareness
3 EXISTING SYSTEM
3.1 Introduction
3.2 Disadvantages
4 PROPOSED SYSTEM
4.1 Introduction
4.2 Advantages
4.3 Feasibility Study
5 SYSTEM REQUIREMENTS
5.1 Hardware Requirements
5.2 Software Requirements
5.3 Software Description
5.3.1 Python
5.3.2 MySQL
5.3.3 WampServer
5.3.4 Bootstrap 4
5.3.5 Flask
6 SYSTEM DESIGN
6.1 Architecture Diagram
6.2 Data Flow Diagram
6.2.1 Level 0
6.2.2 Level 1
6.2.3 Level 2
6.3 Use Case Diagram
7 SYSTEM IMPLEMENTATION
7.1 System Description
7.2 System Flow
8 MODULE DESCRIPTION
8.1 Web App
8.2 Chatbot Interface
8.3 Build and Train
8.3.1 Dataset Description
8.3.2 Preprocessing
8.3.3 Classification
8.3.4 Model Deployment
8.4 Response Predictor
8.4.1 Query Processing
8.4.2 Prediction
8.5 Recommendation
8.6 End User
8.6.1 Admin Modules
8.6.2 User Modules
9 IMPLEMENTATION AND RESULT
9.1 Test Cases
9.2 Test Report
10 CONCLUSION AND FUTURE ENHANCEMENT
APPENDICES
APPENDIX A
SCREENSHOTS
APPENDIX B
SAMPLE SOURCE CODE
CHAPTER 1
INTRODUCTION
1.1.OVERVIEW
Law is the discipline and profession concerned with the customs, practices, and rules of conduct of
a community that are recognized as binding by the community. The body of rules is enforced
through a controlling authority. The term "law" denotes different kinds of rules and principles.
Law is an instrument that regulates human conduct and behavior. From the point of view of judges,
law means rules of court, decrees, judgments, orders of courts, and injunctions. Law is therefore a
broader term that includes Acts, statutes, rules, regulations, orders, ordinances, justice,
morality, reason, righteousness, rules of court, decrees, judgments, orders of courts, injunctions,
tort, jurisprudence, legal theory, etc.
The Indian Penal Code (IPC) is the principal criminal code of India. It establishes criminal
liability for specified offenses, sets out general exceptions to that liability, and prescribes
the punishment for each offense. The IPC defines each offense precisely, incorporating all the
elements necessary to constitute it. The IPC is therefore the legal instrument that delineates
punishable offenses and their associated penalties. It applies to offenses committed anywhere in
India and, under Section 4, to offenses committed by Indian citizens outside India.
The IPC is organized into 23 chapters and consists of 511 sections.
The Indian Penal Code has its roots in British rule in India; it was enacted by the colonial
government in 1860. Before the IPC, Mohammedan criminal law was applied to both Hindus and Muslims.
In 1834, the First Law Commission, established under the Charter Act of 1833 and chaired by Thomas
Babington Macaulay, began drafting the Indian Penal Code. The draft was submitted to the
Governor-General of India in Council in 1837 and was subsequently revised.
The Code was completed in 1850 and presented to the Legislative Council in 1856; however, it was
not placed on British India's statute book until after the Indian Rebellion of 1857.
It was finally passed into law on October 6, 1860, after a careful revision by Barnes
Peacock, who later became the first Chief Justice of the Calcutta High Court.
The Code became effective on January 1, 1862. Unfortunately, Macaulay died near the end
of 1859 and did not live to see his masterpiece become law.
In its 42nd Report in 1971, the Law Commission proposed revising the IPC, and as a result,
several changes were made to it.
On September 27, 2018, a five-judge Constitution Bench of the Supreme Court unanimously
struck down Section 497 (commonly known as the adultery provision) as unconstitutional.
1.2.PROBLEM STATEMENT
Law refers to a system of rules, regulations, and principles established by a governing authority to
regulate behavior within a society or community. It serves as a framework for maintaining order,
resolving disputes, and promoting justice. Laws are typically enforced by governmental
institutions, such as courts and law enforcement agencies, and violations of laws may result in
penalties or sanctions. In many jurisdictions, understanding the intricacies of the law, particularly
statutes like the Indian Penal Code (IPC), can be challenging for individuals without legal expertise.
Many individuals may encounter legal issues or require clarification on offenses outlined in the
IPC, but they may lack the resources or expertise to navigate the complex legal landscape
effectively. One significant problem is the lack of accessibility and affordability of legal services.
Consulting with lawyers or legal professionals can be costly, making it difficult for individuals
with limited financial resources to obtain necessary legal advice or representation. This financial
barrier often disproportionately affects marginalized or underserved communities, exacerbating
existing inequalities within the legal system. Furthermore, the complexity and opaqueness of legal
language and procedures can pose challenges for individuals without legal expertise. Understanding
legal documents, statutes, and court proceedings requires specialized knowledge and training,
making it difficult for laypeople to navigate the legal system effectively. This lack of clarity and
transparency can contribute to confusion, misinterpretation, and even miscarriages of justice.
Moreover, the traditional legal system may be constrained by geographical limitations, particularly
in rural or remote areas where access to legal services is limited. Individuals residing in these
regions may face additional barriers such as long travel distances to access legal aid or
representation, further hindering their ability to seek timely and effective assistance. To bridge this
gap and empower individuals with legal knowledge and assistance, the LawyerBot project aims to
leverage artificial intelligence (AI) and natural language processing (NLP) techniques to develop
an interactive chatbot platform. This platform will enable users to input legal queries and receive
instant responses, guidance, and recommendations related to IPC sections, offenses, and
punishments. By harnessing machine learning models trained on legal datasets, LawyerBot seeks
to provide accurate and timely information, helping users understand their rights and obligations
effectively.
1.3.AI CHATBOT
An AI chatbot is a piece of software that interacts with a human through written language. It is
often embedded in web pages or other digital applications to answer customer inquiries without
the need for human agents, thus providing affordable, effortless customer service.
Figure 1.3
Natural Language Processing (NLP) is a field of Artificial Intelligence (AI) that makes human
language intelligible to machines. NLP combines the power of linguistics and computer science to
study the rules and structure of language, and create intelligent systems (run on machine learning and
NLP algorithms) capable of understanding, analyzing, and extracting meaning from text and speech.
The steps to perform preprocessing of data in NLP include:
Segmentation:
You first need to break the entire document down into its constituent sentences. You can do this
by splitting the text at sentence-ending punctuation such as full stops, question marks, and
exclamation marks.
Figure 2: Segmentation
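The segmentation step can be illustrated with a minimal rule-based sketch in Python, using only the standard re module. It assumes that a sentence ends at a full stop, question mark, or exclamation mark followed by whitespace and a capital letter or digit. The function name segment_sentences is hypothetical, and abbreviations such as "Sec. 379" will be split incorrectly; a trained splitter such as NLTK's sent_tokenize handles such cases better.

```python
import re

# A boundary is the whitespace after ".", "!" or "?" when the next
# character is an uppercase letter or a digit (a simplifying assumption).
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")


def segment_sentences(text: str) -> list[str]:
    """Split `text` into sentences using the boundary rule above."""
    text = text.strip()
    if not text:
        return []
    return [part.strip() for part in _SENTENCE_BOUNDARY.split(text) if part.strip()]


if __name__ == "__main__":
    query = "My phone was stolen yesterday. The thief ran away! What should I do?"
    for sentence in segment_sentences(query):
        print(sentence)
```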
Tokenizing:
For the algorithm to understand these sentences, you need to extract the words in each sentence and
present them individually to the algorithm. So, you break each sentence down into its constituent
words and store them. This is called tokenizing, and each word is called a token.
Figure 3: Tokenization
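A minimal regular-expression tokenizer in Python illustrates the idea. The token pattern and the function name tokenize are assumptions made for illustration. NLTK's word_tokenize applies more detailed rules but needs its tokenizer models downloaded first.

```python
import re

# A token is either a run of letters/digits (optionally with an internal
# apostrophe, as in "didn't") or a single punctuation character.
_TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?|[^\w\s]")


def tokenize(sentence: str) -> list[str]:
    """Return the tokens of `sentence`, lower-casing the word tokens."""
    return [token.lower() for token in _TOKEN_PATTERN.findall(sentence)]


if __name__ == "__main__":
    print(tokenize("Someone didn't return my bike, what can I do?"))
```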
Figure 4: Stop Words
Stemming:
Stemming is the process of obtaining the stem of a word by stripping its affixes. Adding affixes to a
stem produces new words; for example, "connect" yields "connected" and "connecting".
Figure 5: Stemming
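A simplified suffix-stripping stemmer shows the mechanics. This is not the Porter algorithm, which NLTK provides as PorterStemmer. The suffix list and the function name simple_stem are assumptions chosen only to illustrate the idea.

```python
# Suffixes are checked longest first so that, for example, "-ings" is
# removed before the shorter "-ing" or "-s" gets a chance to match.
_SUFFIXES = ("ations", "ation", "ings", "edly", "ing", "ed", "ly", "es", "s")


def simple_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    word = word.lower()
    for suffix in _SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ["stealing", "cheated", "cheats", "charges", "stolen"]:
        print(w, "->", simple_stem(w))
```

The output shows two typical properties of stemming. The stem need not be a dictionary word ("charges" becomes "charg"), and irregular forms such as "stolen" are left untouched. Lemmatization addresses both of these limitations.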
Lemmatization:
Lemmatization is the process of obtaining the lemma (root form) of a word: the base form that is
present in the dictionary and from which the word is derived. You can also identify the base
words for different word forms based on tense, mood, gender, etc.
Figure 6: Lemmatization
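Lemmatization can be sketched as a dictionary lookup. The tiny lemma table and the function name lemmatize below are hypothetical. A real lemmatizer, such as NLTK's WordNetLemmatizer, relies on a full lexicon (WordNet) and on the word's part of speech.

```python
# Hypothetical mini-dictionary mapping inflected forms to their lemmas.
_LEMMAS = {
    "stole": "steal",
    "stolen": "steal",
    "stealing": "steal",
    "was": "be",
    "were": "be",
    "is": "be",
    "charges": "charge",
    "cases": "case",
    "women": "woman",
    "children": "child",
}


def lemmatize(word: str) -> str:
    """Return the dictionary base form of `word`, or the lower-cased word if unknown."""
    lower = word.lower()
    return _LEMMAS.get(lower, lower)


if __name__ == "__main__":
    print([lemmatize(w) for w in ["Stolen", "was", "charges", "court"]])
```

Unlike a stemmer, this maps "stolen" to "steal" and "charges" to the real word "charge".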
Next, you must explain the concept of nouns, verbs, articles, and other parts of speech to the
machine by adding these tags to the words. This is called part-of-speech (POS) tagging.
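A toy part-of-speech tagger illustrates how tags are attached to tokens. It uses a hypothetical mini-lexicon with Universal POS tag names and falls back on a few suffix rules for unknown words. Practical taggers, such as the one behind NLTK's pos_tag, are trained on annotated corpora instead.

```python
# Hypothetical mini-lexicon using Universal POS tag names.
_LEXICON = {
    "the": "DET", "a": "DET", "an": "DET",
    "i": "PRON", "me": "PRON", "my": "PRON", "he": "PRON", "someone": "PRON",
    "stole": "VERB", "hit": "VERB", "is": "VERB", "was": "VERB",
    "phone": "NOUN", "police": "NOUN", "court": "NOUN",
    "in": "ADP", "at": "ADP", "on": "ADP",
}


def pos_tag(tokens: list[str]) -> list[tuple[str, str]]:
    """Tag each token by lexicon lookup, falling back to simple suffix rules."""
    tagged = []
    for token in tokens:
        word = token.lower()
        if word in _LEXICON:
            tag = _LEXICON[word]
        elif word.isdigit():
            tag = "NUM"
        elif not word.isalnum():
            tag = "PUNCT"
        elif word.endswith(("ed", "ing")):
            tag = "VERB"
        elif word.endswith("ly"):
            tag = "ADV"
        else:
            tag = "NOUN"  # default guess for unknown words
        tagged.append((token, tag))
    return tagged


if __name__ == "__main__":
    print(pos_tag(["Someone", "stole", "my", "phone", "."]))
```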
Next, introduce your machine to pop culture references and everyday names by flagging names of
movies, important personalities, locations, etc. that may occur in the document. You do this by
classifying the words into subcategories such as person, location, monetary value, quantity,
organization, and movie. This is called named entity recognition (NER), and it helps you find the
keywords in a sentence. After performing the preprocessing steps, you then give the resulting data
to a machine learning algorithm such as Naive Bayes to create your NLP application.
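The sketch below shows how preprocessed text might be passed to a classifier such as Naive Bayes. It is a from-scratch multinomial Naive Bayes classifier with add-one smoothing. The training queries and IPC section labels are hypothetical examples, not the project's dataset. In practice, scikit-learn's MultinomialNB would normally be used instead.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = len(labels)
        return self

    def predict(self, tokens):
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # log P(label) + sum over words of log P(word | label), smoothed
            score = math.log(n_docs / self.total_docs)
            for word in tokens:
                score += math.log((counts[word] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, already-preprocessed training queries and section labels.
TRAIN = [
    (["stole", "my", "phone"], "IPC 379 (theft)"),
    (["took", "bike", "without", "permission"], "IPC 379 (theft)"),
    (["wallet", "stolen", "from", "bag"], "IPC 379 (theft)"),
    (["fake", "investment", "scheme", "cheated", "money"], "IPC 420 (cheating)"),
    (["cheated", "online", "payment"], "IPC 420 (cheating)"),
    (["threatened", "to", "kill", "me"], "IPC 506 (criminal intimidation)"),
    (["threatened", "to", "harm", "family"], "IPC 506 (criminal intimidation)"),
]

model = NaiveBayesClassifier().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])

if __name__ == "__main__":
    print(model.predict(["someone", "stole", "phone"]))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.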
1.3.2.BERT
BERT, short for Bidirectional Encoder Representations from Transformers, is a machine learning
(ML) framework for natural language processing. In 2018, Google developed this algorithm to
improve contextual understanding of unlabeled text across a broad range of tasks by learning to
predict text that might come before and after (bidirectional) other text. BERT converts words into
numbers; that is, BERT models transform text into numerical representations that can then be used,
alongside other types of data, to make predictions in an ML model.
In the masked language modeling (MLM) task, 15% of the words in the text are selected, and the model
is trained to predict the original words at those positions. The [MASK] token never appears during
fine-tuning, so always using it would create a mismatch between pre-training and fine-tuning. For
this reason, not every selected word is masked: 80% are replaced with the [MASK] token, 10% with a
random word, and 10% are left unchanged. To make the predictions, a classification layer is added on
top of the encoder output. The probability of each output word is then calculated using a fully
connected layer followed by a softmax layer.
Masked Language Model
When calculating the loss, BERT considers only the predictions for the masked positions and ignores
the predictions for the non-masked ones, so the loss is calculated only for the 15% of selected
words.
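The masking rule and the masked-only loss described above can be sketched in plain Python. This is a toy illustration with hypothetical function names (mask_tokens, masked_lm_loss). It works on word lists and probability dictionaries rather than on WordPiece IDs and tensors. In PyTorch, the same effect is usually achieved by setting the labels of unselected positions to -100, which is the default ignore_index of CrossEntropyLoss.

```python
import math
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """BERT-style masking.

    Each token is selected for prediction with probability `mask_prob`.
    A selected token is replaced by [MASK] 80% of the time, by a random
    vocabulary token 10% of the time, and left unchanged 10% of the time.
    Returns (inputs, labels); labels hold the original token at selected
    positions and None elsewhere.
    """
    rng = rng or random.Random()
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, token in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # not selected: this position is ignored by the loss
        labels[i] = token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)
        # else: keep the original token
    return inputs, labels


def masked_lm_loss(probs, labels):
    """Mean cross-entropy over selected positions only.

    probs[i] maps candidate tokens to the model's predicted probability at
    position i; positions whose label is None are skipped.
    """
    losses = [-math.log(p[label]) for p, label in zip(probs, labels) if label is not None]
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    sentence = ["the", "accused", "stole", "a", "phone"]
    inputs, labels = mask_tokens(sentence, vocab=sentence, rng=random.Random(0))
    print(inputs, labels)
```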
In the next sentence prediction (NSP) task, the model is given two sentences, and its goal is to
predict whether the second sentence follows the first in the original text. During BERT training,
50% of the time the second sentence is the actual next sentence (labelled IsNext). The other 50% of
the time, it is a random sentence that does not follow the first in the original text (labelled
NotNext). Since this is a classification task, the first token of the input is the special [CLS]
token, whose final representation is used for the prediction.
This model also uses a [SEP] token to separate the two sentences passed into the model.
The BERT model obtained an accuracy of 97%-98% on this task. The advantage of training the
model on this task is that it helps the model understand the relationship between sentences.
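The construction of NSP training pairs, including the [CLS] and [SEP] tokens, can be sketched as follows. The function name make_nsp_example is hypothetical, and the sentences are pre-tokenized word lists. Real pipelines use WordPiece tokens and also truncate each pair to a maximum sequence length.

```python
import random


def make_nsp_example(doc_sentences, i, other_doc_sentences, rng):
    """Build one next-sentence-prediction example starting at doc_sentences[i].

    With probability 0.5 sentence B is the real next sentence (IsNext);
    otherwise it is drawn from other documents (NotNext).
    """
    sent_a = doc_sentences[i]
    if rng.random() < 0.5:
        sent_b, label = doc_sentences[i + 1], "IsNext"
    else:
        sent_b, label = rng.choice(other_doc_sentences), "NotNext"
    tokens = ["[CLS]"] + sent_a + ["[SEP]"] + sent_b + ["[SEP]"]
    # Segment ids: 0 for [CLS], sentence A and the first [SEP];
    # 1 for sentence B and the final [SEP].
    segment_ids = [0] * (len(sent_a) + 2) + [1] * (len(sent_b) + 1)
    return tokens, segment_ids, label


if __name__ == "__main__":
    doc = [["my", "phone", "was", "stolen"], ["i", "filed", "a", "complaint"]]
    others = [["the", "court", "adjourned", "the", "hearing"]]
    print(make_nsp_example(doc, 0, others, random.Random(1)))
```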
1.4.AIM AND OBJECTIVE
Aim
The aim of the project is to develop an AI-powered web application that provides legal assistance
and support to users by classifying offenses, offering legal advice, and recommending legal
professionals.
Objectives
The objectives of the LawyerBot project cover several key aspects of providing comprehensive
legal assistance to users through an AI-powered web platform:
User Interface Development: The project involves the creation of a user-friendly web interface
accessible across various devices. This interface will include intuitive designs for query
submission, result display, and navigation. Ensuring responsiveness and compatibility across
different browsers and screen sizes is a priority in this phase.
Machine Learning Model Construction: The development of a machine learning model, based
on the BERT architecture, is essential for accurate offense classification. Training the model on a
dataset comprising IPC sections, offense descriptions, and punishments is a key step. Fine-tuning
the model to accurately classify offenses based on contextual information is also part of this phase.
Information Provision: The system will provide detailed information on predicted IPC sections,
including descriptions, offenses covered, and prescribed punishments. This information will be
presented in a structured and user-friendly format to facilitate easy comprehension and
understanding.
Deployment and System Maintenance: Deploying the system on suitable hosting infrastructure
to ensure scalability, reliability, and security is paramount. Thorough testing and validation will be
conducted to ensure the accuracy and effectiveness of the system. Ongoing monitoring and periodic
maintenance will address any issues and incorporate updates as needed.
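For the Information Provision objective above, a predicted section could be presented in a structured format using a small record type and a formatter. This is a sketch: the IpcSection class and the format_section function are hypothetical names. Only two sections (379 and 420) are filled in, summarised from the Code's punishment clauses; the project's dataset would supply the rest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IpcSection:
    number: str
    offense: str
    punishment: str


# Two illustrative entries; the project's dataset would supply the full set of sections.
IPC_SECTIONS = {
    "379": IpcSection(
        "379",
        "Theft",
        "Imprisonment of either description for up to 3 years, or fine, or both",
    ),
    "420": IpcSection(
        "420",
        "Cheating and dishonestly inducing delivery of property",
        "Imprisonment of either description for up to 7 years, and fine",
    ),
}


def format_section(number: str) -> str:
    """Render a predicted IPC section as a short, structured answer."""
    section = IPC_SECTIONS.get(number)
    if section is None:
        return f"No information available for IPC Section {number}."
    return (
        f"IPC Section {section.number}\n"
        f"Offense: {section.offense}\n"
        f"Punishment: {section.punishment}"
    )


if __name__ == "__main__":
    print(format_section("379"))
```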
CHAPTER 2
LITERATURE SURVEY
2.1. LAW-U: Legal Guidance Through Artificial Intelligence Chatbot for Sexual Violence
Victims and Survivors
Year:2022
Doi: 10.1109/ACCESS.2021.3113172
Problem
Sexual violence remains a persistent global issue, exacerbated by stigmatization and societal norms
that often blame victims rather than perpetrators. In Thailand, cultural conservatism, patriarchy,
power hierarchies, and heteronormativity contribute to biased responses and perceptions of sexual
abuse and harassment. Additionally, the COVID-19 pandemic and lockdown measures have
intensified domestic violence and sexual violence cases.
Objective
The aim of this study is to address the challenges faced by sexual violence survivors in Thailand
by developing LAW-U, an AI chatbot that provides tailored legal guidance based on Thai Supreme
Court decisions related to sexual violence.
Methodology
LAW-U was developed using 182 Thai Supreme Court cases related to Sections 276, 277, 278, and
279 of the Thai Criminal Code. Natural Language Processing (NLP) pipelines were developed to
analyze and understand user input, and mock-up dialogs from Supreme Court decisions were used
to train LAW-U.
Dataset
The dataset used for developing LAW-U consisted of 182 Thai Supreme Court cases related to
Sections 276, 277, 278, and 279 of the Thai Criminal Code. These cases were meticulously selected
to cover a range of scenarios and legal interpretations relevant to sexual violence in Thailand.
Finding
LAW-U's development represents a significant step towards providing support for sexual violence
survivors in Thailand. The chatbot's design prioritizes user anonymity and inclusivity, treating all
users equally regardless of age or gender. Furthermore, LAW-U's unique approach and success in
Thailand could serve as a model for similar initiatives globally, highlighting the potential of AI in
supporting survivors and advocating for their rights.
Year:2023
Doi: 10.1109/ICCEBS58601.2023.10448748
Problem
In the digital age, many individuals face challenges navigating legal complexities due to a lack of
legal expertise or access to legal counsel. This gap in accessible and personalized legal guidance
creates barriers for individuals seeking clarity and assistance with their legal concerns.
Objective
The primary objective of this research paper is to introduce and evaluate an AI chatbot solution
designed to democratize legal resource access. The chatbot aims to empower users by offering
fundamental legal knowledge, personalized instructions tailored to individual legal concerns and
context.
Methodology
Dataset
A dataset containing fundamental legal knowledge, rules, regulations, and guidelines relevant to
various legal domains. This dataset captures the interactions between users and the chatbot,
including user queries, chatbot responses, and feedback, which is essential for evaluating the
chatbot's performance and user satisfaction. Information about qualified attorneys, their expertise,
availability, and consultation details, enabling real-time attorney consultations through the chatbot.
Finding
The research paper introduces an innovative AI chatbot solution designed to revolutionize legal
support services by democratizing legal resource access. The chatbot effectively empowers users
by offering fundamental legal knowledge, personalized instructions, and real-time attorney
consultations. The customizable search feature further enhances the chatbot's capability to provide
tailored legal guidance based on users' individual circumstances. Overall, the chatbot's
comprehensive approach seeks to bridge the gap in accessible legal guidance, providing equal
opportunities for individuals across society to seek clarity and assistance for their legal needs.
Year:2022
Doi: 10.1109/ICCPC55978.2022.10072070
Problem
Crime awareness and crime registration systems often lack efficient platforms for individuals to
report crimes, access information about crime rates, and understand the legal system. This gap in
accessible and user-friendly crime reporting tools hinders timely and effective response by
authorities and leaves individuals uninformed about the legal processes.
Objective
The primary objective of this research paper is to develop and evaluate a chatbot-based web service
with voice recognition capabilities aimed at enhancing crime awareness and crime registration
systems.
Methodology
A chatbot with voice recognition capabilities is developed to serve as an interactive platform for
crime reporting, awareness, and information dissemination. The chatbot guides users through the
process of reporting crimes, gathering information, and collecting verification documents. The
chatbot displays blogs, crime rates, and news related to crime to raise awareness among users about
various types of crimes and their prevalence.
Dataset
A dataset containing information about various types of crimes, crime rates, and crime-related news
and blogs. This dataset captures the interactions between users and the chatbot, including crime
reports, queries, and feedback, which is essential for evaluating the chatbot's performance and user
satisfaction.
Finding
The research paper introduces a chatbot-based web service with voice recognition capabilities
designed to enhance crime awareness and crime registration systems. The chatbot provides a
platform for reporting crimes, disseminating crime-related information, and facilitating
communication between individuals and authorities. By displaying blogs, crime rates, and news
related to crime, the chatbot raises awareness among users about various types of crimes. The
complaint registration system allows users to file complaints quickly and easily, utilizing a custom
named entity recognition model to extract structured information from unstructured complaints,
facilitating more effective comprehension by authorities. Leveraging NLP techniques, the chatbot
processes and analyzes user queries, enhancing its ability to understand and respond to users' needs
effectively. Overall, the chatbot-based web service seeks to provide a quick, user-friendly, and
efficient means for registering complaints and informing individuals about the legal system,
contributing to societal good.
CHAPTER 3
EXISTING SYSTEM
3.1. INTRODUCTION
The traditional system for accessing legal information and guidance typically involves consulting
professional lawyers, engaging in manual legal research, or seeking advice from legal experts. Here
are some key aspects of the traditional system:
Individuals seeking legal assistance traditionally turn to law firms or independent lawyers. This
involves scheduling appointments, attending consultations, and incurring fees for professional
advice.
Legal research is often performed manually by individuals, legal professionals, or law students.
This involves searching through legal databases, books, and documents to identify relevant laws,
statutes, and case precedents.
Legal aid clinics, often operated by law schools or nonprofit organizations, provide free or low-
cost legal assistance to individuals who cannot afford traditional legal services.
EXISTING CHATBOTS
Unlike AI-powered chatbots, which use natural language processing (NLP) techniques to understand
and respond to user inputs, rule-based chatbots follow a fixed set of instructions to interact with
users.
Predefined Rules: The chatbot operates based on explicitly defined rules, which are typically set
by developers or domain experts. These rules dictate the chatbot's behavior and determine how it
responds to user inputs.
Structured Responses: Responses provided by the chatbot are predetermined and follow a
structured format. The chatbot selects appropriate responses from a predefined set of options based
on the user's input and the rules defined for each scenario.
Decision Trees: Rule-based chatbots often use decision trees or flowcharts to guide interactions
with users. These decision pathways outline the sequence of questions and responses based on
various conditions and criteria.
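To make the rule-based approach concrete, a chatbot of this kind can be sketched as a decision tree that is walked using the user's yes/no answers. The questions, replies, and the function name run_rule_based_bot are hypothetical.

```python
# Each node is either a yes/no question with two branches or a final reply (str).
DECISION_TREE = {
    "question": "Was property taken from you?",
    "yes": {
        "question": "Was force or a threat used?",
        "yes": "This may involve robbery. Please contact the police.",
        "no": "This may involve theft. You can file a complaint at the police station.",
    },
    "no": "Please describe your situation to a legal professional.",
}


def run_rule_based_bot(answers):
    """Follow the tree with a sequence of 'yes'/'no' answers and return the reply."""
    node = DECISION_TREE
    replies = iter(answers)
    while isinstance(node, dict):
        answer = next(replies, "").strip().lower()
        if answer not in ("yes", "no"):
            return "Sorry, please answer 'yes' or 'no'."
        node = node[answer]
    return node


if __name__ == "__main__":
    print(DECISION_TREE["question"])
    print(run_rule_based_bot(["yes", "no"]))
```

Free-text input such as "my phone was stolen" falls outside the predefined rules and only receives the fallback reply. This rigidity is what an NLP-based approach is meant to overcome.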
3.2. DISADVANTAGES
Potential biases based on socioeconomic factors.
CHAPTER 4
PROPOSED SYSTEM
4.1. INTRODUCTION
The proposed system of the project, named "LawyerBot," is a comprehensive legal assistance
platform designed to provide users with accurate legal guidance, advice, and support. Here's an
overview of the proposed system:
At the core of the LawyerBot system lies the LawNet Model Integration Module, which integrates
the LawNet model built using advanced techniques like BERT.
The Legal Advice and Assistance Module goes beyond offense classification to provide users with
actionable insights and recommendations. By offering guidance on legal actions, defenses, and
strategies, this module empowers users to navigate legal complexities effectively and make
informed decisions in their legal proceedings.
Multilanguage Translation
The Multilanguage Translation Module translates system responses into multiple languages to cater
to users from diverse linguistic backgrounds. By enhancing accessibility and usability, it enables
users to receive legal assistance in their preferred language, improving overall user experience.
Advocate and Lawyer Recommendation
The Advocate and Lawyer Recommendation Module recommends legal professionals based on
user queries and location. By retrieving details of advocates and lawyers from a database and
filtering them based on user requirements, this module helps users connect with suitable legal
professionals for further assistance and representation.
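The retrieval and filtering performed by the Advocate and Lawyer Recommendation Module can be sketched as a parameterized SQL query. This sketch rests on several assumptions. SQLite from the Python standard library stands in for the project's MySQL database so that the example runs on its own. The table layout and sample rows are hypothetical. Ranking by years of experience is an illustrative choice. With a MySQL driver such as mysql-connector-python, the placeholder would be %s instead of ?.

```python
import sqlite3


def create_demo_db():
    """Build an in-memory table of advocates (hypothetical sample rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE advocates (name TEXT, specialization TEXT, "
        "city TEXT, experience_years INTEGER)"
    )
    conn.executemany(
        "INSERT INTO advocates VALUES (?, ?, ?, ?)",
        [
            ("Advocate A", "criminal", "Thanjavur", 12),
            ("Advocate B", "criminal", "Thanjavur", 5),
            ("Advocate C", "family", "Thanjavur", 9),
            ("Advocate D", "criminal", "Chennai", 20),
        ],
    )
    return conn


def recommend_advocates(conn, specialization, city, limit=3):
    """Return names of advocates matching the legal area and city, most experienced first."""
    rows = conn.execute(
        "SELECT name FROM advocates WHERE specialization = ? AND city = ? "
        "ORDER BY experience_years DESC LIMIT ?",
        (specialization, city, limit),
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    print(recommend_advocates(create_demo_db(), "criminal", "Thanjavur"))
```

Passing the user's values as query parameters, rather than formatting them into the SQL string, also protects the database against SQL injection.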
4.2. ADVANTAGES
Digital knowledge repository, eliminating the need for physical legal resources.
4.3. FEASIBILITY STUDY
The feasibility analysis of the LawyerBot project evaluated its practicality and potential for
successful execution across various dimensions. Here's an overview of the feasibility analysis:
System Architecture: The proposed system architecture, including integration with BERT
for NLP tasks, was technically feasible and could be implemented using existing
technologies.
Cost Estimation: The project's budget covered expenses related to hardware, software,
development resources, and operational costs. A detailed cost estimation was performed to
ensure financial feasibility.
Return on Investment (ROI): The potential benefits of the LawyerBot system, such as
improved efficiency, reduced legal costs, and enhanced user satisfaction, justified the initial
investment.
User Acceptance: Stakeholder buy-in and user acceptance were crucial for
the success of the project.
Integration with Existing Processes: The LawyerBot system seamlessly integrated with
existing legal workflows and processes to minimize disruption and facilitate adoption by
legal professionals and clients.
Training and Support: Adequate training and support mechanisms were in place to assist
users in effectively utilizing the system and addressing any issues that arose.
CHAPTER 5
SYSTEM REQUIREMENTS
5.1. HARDWARE SPECIFICATIONS
The biggest strength of Python is its huge collection of libraries, which can be used for the
following:
Machine Learning
Test frameworks
Multimedia
Tensor Flow
Figure 5.3.1(b) TensorFlow
Pandas
pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation
tool, built on top of the Python programming language. pandas is a Python package that provides
fast, flexible, and expressive data structures designed to make working with "relational" or
"labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for
doing practical, real world data analysis in Python.
Figure 5.3.1(c) pandas
NumPy
NumPy, which stands for Numerical Python, is a library consisting of multidimensional array
objects and a collection of routines for processing those arrays. Using NumPy, mathematical and
logical operations on arrays can be performed.
Matplotlib
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in
Python. Matplotlib makes easy things easy and hard things possible.
Figure 5.3.1(e) matplotlib
Scikit Learn
scikit-learn is a Python module for machine learning built on top of SciPy and is distributed under
the 3-Clause BSD license.
Figure 5.3.1(f) scikit-learn
NLTK:
NLTK is a leading platform for building Python programs to work with human language data. It
provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with
a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and
semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum.
WordCloud
A word cloud (also called a tag cloud or weighted list) is a visual representation of text data. The
items are usually single words, and the importance of each is shown with font size or color. Python
has a wordcloud library for building them.
Figure 5.3.1(h) wordcloud
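The idea that a word's importance is shown through its font size can be sketched with a frequency count and a linear scale. word_font_sizes is a simplified, hypothetical helper. The wordcloud library computes its own layout and sizing, for example through its relative_scaling parameter, and can also accept precomputed frequencies through generate_from_frequencies.

```python
from collections import Counter


def word_font_sizes(tokens, min_size=10, max_size=60, top_n=5):
    """Map the most frequent words to font sizes scaled linearly by frequency."""
    counts = Counter(tokens).most_common(top_n)
    if not counts:
        return {}
    highest = counts[0][1]
    lowest = counts[-1][1]
    span = max(highest - lowest, 1)  # avoid division by zero when all counts are equal
    return {
        word: min_size + (count - lowest) * (max_size - min_size) // span
        for word, count in counts
    }


if __name__ == "__main__":
    print(word_font_sizes(["theft"] * 5 + ["police"] * 3 + ["court"]))
```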
5.3.2 MYSQL
MySQL is a relational database management system based on Structured Query Language (SQL), the most
popular language for accessing and managing the records in a database. MySQL is open-source, free
software under the GNU General Public License. It is developed and supported by Oracle Corporation.
Figure 5.3.2 MySQL
5.3.3.WAMPSERVER
WampServer is a Windows web development environment that lets you create web apps with Apache2,
PHP, and a MySQL database. With an intuitive interface and numerous features, it is a popular
choice among developers around the world.
Figure 5.3.3 WAMPSERVER
• Apache Webserver
• MySQL DB Server
5.3.4 BOOTSTRAP 4
Bootstrap is a powerful front-end framework for faster and easier web development. Bootstrap is a
free and open-source web development framework. It consists of HTML, CSS, and JS-based scripts
for various web design-related functions and components.
Figure 5.3.4 Bootstrap 4
5.3.5 FLASK
Flask is a lightweight Python web framework. It provides the tools, libraries, and technologies
needed to build a web application, which can range from a few web pages, a blog, or a wiki to a
web-based calendar application or a commercial website.
Figure 5.3.5 Flask
CHAPTER 6
SYSTEM DESIGN
6.2. DATAFLOW DIAGRAM
The Dataflow Diagram for the LawyerBot project visualizes the flow of data within the system,
illustrating how information moves between different components and modules. It depicts the
interactions between users, the LawyerBot application, and external data sources, showcasing the
paths data takes as it undergoes processing, analysis, and presentation.
6.2.1. LEVEL 0
The diagram illustrates the fundamental flow of information, indicating that users interact with the
LawyerBot interface to input queries, which are then processed by the system. The processed
queries are then used to generate predictions or recommendations, which are presented back to the
users.
6.2.2. LEVEL 1
At Level 1 of the Dataflow Diagram for LawyerBot, the diagram provides a more detailed view of
the system's data flow by decomposing the main processes into sub processes and depicting the
interactions between them.
Figure 6.2.2 LEVEL 1
6.2.3. LEVEL 2
At Level 2 of the Dataflow Diagram for LawyerBot, the diagram further refines the processes and
sub processes depicted in Level 1, providing a more detailed and comprehensive view of the
system's data flow.
Figure 6.2.3 LEVEL 2
6.3.USE CASE DIAGRAM
The Use Case Diagram for the LawyerBot project illustrates the various interactions between users
and the system. It outlines primary functionalities such as user registration, query submission,
prediction retrieval, and advocate/lawyer recommendation. Each use case represents a specific
action or task that users can perform within the LawyerBot system, facilitating efficient
communication and interaction between users and the application.
Figure 6.3 use case diagram
CHAPTER 7
SYSTEM IMPLEMENTATION
The implementation of the LawyerBot system involves several components and technologies
working together to provide effective legal assistance. The main components are as follows:
1. Backend Development
Implement endpoints to handle user requests, query processing, and response generation.
Set up routes for user authentication, query submission, and admin operations.
2. Database Management
Design a database schema to store user accounts, datasets, advocate/lawyer details, and
system configurations.
3. Text Processing
Use NLTK (Natural Language Toolkit) for text processing tasks such as tokenization,
stop-word removal, and stemming/lemmatization.
4. Multilanguage Translation
Translate generated responses into the desired language(s) based on user preferences or
system settings.
5. Frontend Development
Develop the user interface using HTML, CSS, JavaScript, and the Bootstrap framework.
Design a responsive and intuitive interface for users to interact with the system.
6. Real-time Communication
Use Flask-SocketIO for real-time communication between users and the system.
Enable instant messaging and interaction through the LawyerBot chat interface.
7. Admin Panel
Provide administrators with tools to manage datasets, train the LawNet model, update
advocate/lawyer details, and manage user accounts.
8. Testing and Maintenance
Conduct thorough testing of the system to ensure functionality, performance, and usability.
Perform regular maintenance tasks such as updating dependencies, fixing bugs, and
optimizing performance.
By implementing these components and technologies, the LawyerBot system can effectively
provide legal assistance and support to users, enhancing accessibility and efficiency in navigating
legal complexities.
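A minimal sketch of the database management component is given below. It uses an in-memory SQLite database as a stand-in for the MySQL server used in Appendix B; the `cc_register` columns follow that appendix, while the `advocate` table and its columns are hypothetical. The `register_user` helper (also hypothetical) repeats the duplicate-username check performed by the registration code.

```python
import sqlite3

# In-memory SQLite database used here as a stand-in for the project's MySQL server
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# User accounts: columns mirror the cc_register table used in Appendix B
cur.execute("""
    CREATE TABLE cc_register (
        id INTEGER PRIMARY KEY,
        name TEXT, mobile TEXT, email TEXT, location TEXT,
        uname TEXT UNIQUE NOT NULL,
        pass TEXT NOT NULL,
        otp TEXT, status TEXT
    )
""")

# Hypothetical advocate/lawyer table that a recommendation step could query
cur.execute("""
    CREATE TABLE advocate (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        location TEXT NOT NULL,
        expertise TEXT NOT NULL,
        rating REAL DEFAULT 0
    )
""")


def register_user(name, mobile, email, location, uname, password):
    """Insert a new user unless the username is taken; return True on success."""
    cur.execute("SELECT COUNT(*) FROM cc_register WHERE uname = ?", (uname,))
    if cur.fetchone()[0] > 0:
        return False
    # Omitting id lets SQLite assign the next INTEGER PRIMARY KEY value.
    # Note: a real deployment should store a salted password hash, not the plain password.
    cur.execute(
        "INSERT INTO cc_register (name, mobile, email, location, uname, pass, otp, status) "
        "VALUES (?, ?, ?, ?, ?, ?, '', '0')",
        (name, mobile, email, location, uname, password),
    )
    conn.commit()
    return True
```

Calling `register_user(...)` twice with the same `uname` returns `True` the first time and `False` the second. Letting the database assign `id` also avoids two concurrent registrations computing the same `max(id)+1`.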
The LawyerBot system operates through a well-defined flow to ensure seamless user interaction
and effective delivery of legal assistance. The system flow is outlined below:
1. User Interaction
Users access the LawyerBot web application or chat interface to seek legal assistance
or guidance.
They can register as new users or log in if they already have accounts.
2. Query Submission
Users input their legal queries or descriptions of offenses into the system through the chat
interface or web app.
3. Text Processing
The Text Processing Module preprocesses the user queries to optimize them for analysis.
4. Offense Classification
The preprocessed queries are fed into the LawNet Model Integration Module, which
incorporates the trained LawNet model.
The LawNet model, based on the BERT architecture, analyzes the queries and predicts the
relevant IPC sections or legal categories.
5. Response Generation
The predicted IPC sections or legal categories, along with relevant details such as
descriptions and punishments, are generated as responses.
6. Multilanguage Translation
The generated responses are translated into the desired language(s) using a multilanguage
translation service.
7. Response Display
The translated responses are delivered to the user interface, where they are displayed to the
users.
Users can view the responses and engage further with the system as needed.
8. Recommendation Generation
The system recommends advocates/lawyers relevant to the user's query and geographical location.
9. Admin Operations
Admin users can perform various tasks such as managing datasets, training the LawNet
model, updating advocate/lawyer details, and managing user accounts.
These operations ensure the smooth operation and maintenance of the system.
Through this iterative process, users receive legal assistance tailored to their needs and
preferences: LawyerBot addresses user queries, predicts the relevant IPC sections, provides
their descriptions and punishments, and recommends relevant legal professionals. This
systematic flow is designed to deliver prompt and personalized legal assistance through the
LawyerBot platform, making it easier for users to navigate legal complexities.
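The flow above can be summarised as a chain of functions. The sketch below is a hypothetical skeleton rather than the project's code: each stage (`preprocess`, `classify`, `lookup`, `translate`, `recommend`) is passed in as a plain function, so the LawNet model, the translation service and the lawyer database can be plugged in without changing the orchestration. The function names and the response wording are assumptions.

```python
from typing import Callable, Dict, List


def handle_query(
    query: str,
    preprocess: Callable[[str], str],
    classify: Callable[[str], str],
    lookup: Callable[[str], Dict[str, str]],
    translate: Callable[[str], str],
    recommend: Callable[[str], List[str]],
) -> Dict[str, object]:
    """Run one user query through the LawyerBot flow and return what the UI displays."""
    cleaned = preprocess(query)      # text processing
    section = classify(cleaned)      # offense classification (e.g. the LawNet model)
    details = lookup(section)        # stored description and punishment for the section
    text = (f"IPC Section {section}: {details['description']} "
            f"Punishment: {details['punishment']}")
    return {
        "section": section,
        "response": translate(text),   # multilanguage translation
        "lawyers": recommend(section), # recommendation generation
    }
```

Keeping each stage behind a function boundary means, for example, that the translation stage can simply be an identity function during testing.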
CHAPTER 8
MODULES DESCRIPTION
Dataset Description
Description of IPC Section: In-depth explanation of the respective IPC section, highlighting the
nature of offenses covered.
Offense: Specific details regarding the offense outlined in the IPC section.
Punishment: The prescribed punishment for the offense, inclusive of potential imprisonment, fines,
or a combination thereof.
Figure 8.3.1
8.3.2. Preprocessing
The dataset containing IPC sections, descriptions, offenses, punishments, and section
numbers is imported using the Pandas library in Python, which provides efficient data
structures and functions for data manipulation and analysis.
The dataset is then cleaned by removing irrelevant information, handling missing values, and
ensuring consistency in formatting.
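A minimal sketch of the loading and cleaning step is shown below. It uses Python's built-in `csv` module so that it runs without extra packages; with Pandas the equivalent starting point is `pd.read_csv(...)` followed by `dropna()` and `drop_duplicates()`. The column names `Section`, `Description`, `Offense` and `Punishment` are assumptions about the dataset layout, and `load_and_clean` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, List

FIELDS = ["Section", "Description", "Offense", "Punishment"]  # assumed column names


def load_and_clean(csv_text: str) -> List[Dict[str, str]]:
    """Parse the IPC dataset, trim whitespace, drop incomplete rows and duplicate sections."""
    rows = []
    seen_sections = set()
    for raw in csv.DictReader(io.StringIO(csv_text)):
        # Keep only the expected columns and collapse runs of whitespace in each value
        row = {f: " ".join((raw.get(f) or "").split()) for f in FIELDS}
        if not all(row.values()):            # a value is missing in some column
            continue
        if row["Section"] in seen_sections:  # keep only the first copy of each section
            continue
        seen_sections.add(row["Section"])
        rows.append(row)
    return rows
```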
Text preprocessing is then carried out with NLTK (Natural Language Toolkit) and involves the
following steps:
• Tokenization
• Stopword Removal
• Stemming/Lemmatization
• TF-IDF Vectorization
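To make the steps above concrete, the following sketch implements tokenization, stop-word removal and TF-IDF weighting with the standard library only. The small stop-word list is a placeholder for NLTK's `stopwords` corpus, and stemming/lemmatization (for example NLTK's `PorterStemmer` or `WordNetLemmatizer`) is left out to keep the example self-contained. The weighting is the plain form tf(t, d) * log(N / df(t)); libraries such as scikit-learn apply smoothed variants by default.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Placeholder stop-word list; NLTK's stopwords.words('english') would be used in practice
STOP_WORDS = {"a", "an", "the", "is", "was", "of", "to", "and", "or", "my", "me", "i", "by", "in"}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stopwords(tokens: List[str]) -> List[str]:
    return [t for t in tokens if t not in STOP_WORDS]


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} map per document, with weight = tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (c / len(doc)) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return vectors
```

A term that appears in every document gets weight 0 (since log(N / N) = 0), while rarer, more distinctive terms receive higher weights.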
8.3.3. Classification
Deploying the LawNet model into a LawyerBot web app involves integrating the model into
various modules to provide legal assistance and support to users. This module forms the backbone
of the LawyerBot's functionality, enabling it to effectively analyze and classify offenses based on
textual descriptions provided by users.
Upon receiving a user input query, the LawyerBot Response Predictor Module initiates the
preprocessing of the text data to ensure it is suitable for analysis. This preprocessing phase involves
several steps, including tokenization, which breaks down the query into individual words or tokens,
and removing stop words to eliminate irrelevant words that do not contribute to the query's
meaning.
8.4.2. Prediction
After preprocessing, the query is passed through the trained LawNet model, which serves as
the core component of the response prediction process. The LawNet model is built on BERT
(Bidirectional Encoder Representations from Transformers) and trained on a dataset comprising
IPC sections, descriptions, offenses, and punishments. It predicts the IPC section most
relevant to the query, drawing on the contextual information and semantic understanding
encoded within the pre-trained BERT model.
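The actual LawNet prediction runs the query through the BERT-based model. As a lightweight and clearly simplified stand-in that only shows the shape of this step, the sketch below ranks IPC sections by cosine similarity between bag-of-words vectors of the query and of each section's offense description. This is nearest-neighbour retrieval, not BERT; the `sections` mapping and the helper names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def bag_of_words(text: str) -> Counter:
    """Count lower-cased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def rank_sections(query: str, sections: Dict[str, str]) -> List[Tuple[str, float]]:
    """Return (section, score) pairs sorted from most to least similar to the query."""
    q = bag_of_words(query)
    scores = [(sec, cosine(q, bag_of_words(desc))) for sec, desc in sections.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Because these scores only measure word overlap, synonyms such as "fraud" and "cheating" are missed; capturing that kind of semantic similarity is what the BERT-based LawNet model is meant to add.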
• Response Generation
• Multilanguage Translation
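For response generation, the predicted section is combined with its stored description and punishment. The sketch below takes a list of (section, score) pairs from the prediction step and also covers the ambiguous-query case from the test cases in Chapter 9, where the system should ask for clarification: if the top score is below a threshold, it lists the closest candidates instead. The `min_score` value of 0.2, the `translate` hook (identity by default, standing in for the translation service) and the record layout are assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple


def generate_response(
    ranked: List[Tuple[str, float]],
    records: Dict[str, Dict[str, str]],
    min_score: float = 0.2,
    top_k: int = 3,
    translate: Callable[[str], str] = lambda text: text,
) -> str:
    """Build the chat reply from ranked (section, score) pairs."""
    if not ranked or ranked[0][1] < min_score:
        # Low confidence: ask the user to clarify and list the closest candidates
        reply = "Could you describe the situation in more detail?"
        if ranked:
            candidates = ", ".join(f"Section {sec}" for sec, _ in ranked[:top_k])
            reply += f" Possibly relevant: {candidates}."
        return translate(reply)
    section = ranked[0][0]
    rec = records[section]
    reply = (f"IPC Section {section} ({rec['offense']}): {rec['description']} "
             f"Punishment: {rec['punishment']}")
    return translate(reply)
```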
8.5. Recommendation
• This ensures that the recommendations are tailored to the user's specific geographical
location.
2. Database Query
• This database may include information such as contact details, areas of expertise,
qualifications, and client reviews.
• This ensures that the recommendations provided are relevant and meet the user's specific
requirements.
4. Ranking Algorithm
• This helps ensure that the most suitable and reputable professionals are presented to the
user.
5. Presentation to User
• Admin Authentication
• Dataset Management
• User Management
• User Registration
• User Authentication
• Query Submission
• Prediction Result
• Lawyer Recommendation
These modules collectively enable both admins and end users to interact effectively with the
LawyerBot web application, supporting dataset management, model training, advocate/lawyer
management, user registration, authentication, query submission, prediction result
retrieval, and lawyer recommendations.
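The recommendation steps of Section 8.5 (filter by the user's location, query the advocate database for matching expertise, rank, and present) can be sketched as follows. The `Advocate` record, its fields and the ranking key (rating first, then number of reviews) are hypothetical choices; the report does not specify the ranking algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Advocate:
    name: str
    location: str
    expertise: List[str]  # e.g. IPC sections or practice areas handled
    rating: float         # average client rating
    reviews: int          # number of client reviews


def recommend(advocates: List[Advocate], location: str, area: str,
              top_n: int = 3) -> List[Advocate]:
    """Return up to top_n advocates in the user's location who handle the given area."""
    loc = location.strip().lower()
    # Location filter and expertise match (the "database query" step)
    matches = [a for a in advocates
               if a.location.strip().lower() == loc and area in a.expertise]
    # Ranking: highest rating first; more reviews breaks ties
    matches.sort(key=lambda a: (a.rating, a.reviews), reverse=True)
    return matches[:top_n]
```

Sorting on a (rating, reviews) tuple lets the number of reviews break ties between advocates with equal ratings.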
CHAPTER 9
• Input: User submits a query regarding IPC Section 420 (Cheating and Fraud).
• Expected Result: System correctly identifies the offense and predicts IPC Section 420.
• Status: Pass
• Expected Result: System asks for clarification or provides multiple possible interpretations.
• Status: Pass
• Expected Result: System ensures the confidentiality and security of user data.
• Actual Result: System encrypts and securely handles sensitive user information.
• Status: Pass
• Expected Result: System maintains responsiveness and does not experience downtime or
performance degradation.
• Actual Result: System remains responsive under peak load.
• Status: Pass
• Expected Result: System successfully processes and integrates the new dataset without
errors.
• Actual Result: System processes the new dataset and updates the database.
• Status: Pass
• Expected Result: System creates a new user account and sends a confirmation email.
• Actual Result: System successfully creates the user account and sends the confirmation
email.
• Status: Pass
• Expected Result: System provides accurate and relevant legal advice based on the query.
• Actual Result: System offers informative legal advice tailored to the user's query.
• Status: Pass
• Expected Result: System reflects the updated information accurately in the advocate/lawyer
database.
• Status: Pass
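Functional test cases like the ones above can be automated as a table of inputs and expected sections. The harness below is a sketch: `predict_section` stands for whatever function wraps the LawNet model, and it is replaced here by a hypothetical keyword stub so that the example runs on its own.

```python
from typing import Callable, List, Tuple


def run_cases(cases: List[Tuple[str, str]],
              predict_section: Callable[[str], str]) -> List[Tuple[str, str, str, str]]:
    """Run (query, expected) cases; return (query, expected, actual, status) rows."""
    report = []
    for query, expected in cases:
        actual = predict_section(query)
        status = "Pass" if actual == expected else "Fail"
        report.append((query, expected, actual, status))
    return report


def keyword_stub(query: str) -> str:
    """Hypothetical stand-in for the LawNet model, used only to exercise the harness."""
    q = query.lower()
    if "cheat" in q or "fraud" in q:
        return "420"
    if "murder" in q:
        return "302"
    return "unknown"
```

Each returned row maps onto the Input, Expected Result, Actual Result and Status fields used in the test cases.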
Introduction: The purpose of this test report is to provide an overview of the testing activities
conducted on the LawyerBot system. The testing aims to ensure the system's functionality,
reliability, and performance meet the specified requirements and standards.
Test Objective: The primary objective of the testing is to verify the accuracy of response
predictions, assess system responsiveness, and identify any potential issues or bugs within the
LawyerBot system.
Test Scope: The testing scope encompasses various modules and features of the LawyerBot system,
including user interaction, query processing, response prediction, system performance under load,
and administrative functionalities.
Test Environment: The testing was conducted in a controlled environment using the LawyerBot
web application deployed on a local server. Web browsers (Chrome, Firefox), the Windows
operating system, and a Python development environment were used.
Test Result: Overall, the testing yielded positive results, with the LawyerBot system demonstrating
accurate response predictions and satisfactory performance. No critical issues or bugs affecting the
system's functionality were identified during testing.
Bug Report: Bug reports document issues or anomalies encountered during testing that deviate
from expected behavior. During testing of the LawyerBot system, no significant bugs or critical
issues were encountered. However, minor issues related to user interface inconsistencies and error
handling were noted and addressed promptly.
Test Conclusion: In conclusion, the LawyerBot system has undergone comprehensive testing,
ensuring its functionality, reliability, and performance meet the desired standards. The successful
completion of testing validates the system's readiness for deployment and use by end-users.
Bug ID (BID) | Test Case ID (TCID) | Bug Description | Bug Output | Status
CHAPTER 10
10.1. CONCLUSION
In conclusion, the LawyerBot project is a concerted effort to bridge the gap between
individuals seeking legal guidance and the complexities of the legal system by providing a
user-friendly platform for legal assistance. By leveraging technologies such as natural
language processing (NLP) and machine learning, LawyerBot offers a seamless and intuitive way
to access legal information. Through a user-friendly web interface and NLP techniques, users
can easily submit queries related to legal matters, offenses, or IPC sections. The system's
machine learning model, built on the BERT architecture and trained on a dataset of IPC
sections, classifies offenses and provides detailed information on the predicted IPC
sections, including their descriptions and prescribed punishments. LawyerBot also goes beyond
classification by integrating a recommendation system that suggests legal professionals based
on the user's query and geographical location, which personalizes the experience and
facilitates access to relevant legal expertise. The admin panel allows administrators to
manage datasets, train machine learning models, and oversee user accounts efficiently. In
addition, the emphasis on deployment, system maintenance, and continuous improvement is
intended to keep LawyerBot reliable, scalable, and responsive to user needs over time. In
essence, LawyerBot is an effort to democratize access to legal assistance, giving individuals
actionable insights and supporting informed decision-making in legal matters. Through its
comprehensive feature set and commitment to ongoing refinement, LawyerBot aims to make legal
assistance more accessible to all.
10.2. FUTURE ENHANCEMENT
Looking ahead, there are several avenues for future enhancement and expansion of the LawyerBot
platform:
• Case Management System: Integrate a case management system to help users track and
manage their legal proceedings. This could include features such as document storage, task
management, and calendar reminders for important deadlines.
• Legal Document Analysis: Expand the platform's capabilities to include the analysis of
legal documents such as contracts, agreements, and court rulings. Develop specialized
models for document summarization, clause extraction, and legal entity recognition.
By pursuing these avenues for future enhancement, LawyerBot can continue to evolve and adapt to
meet the changing needs of its users and provide valuable legal assistance and support in a
variety of contexts.
APPENDICES
APPENDIX A
SAMPLE SCREENSHOTS
APPENDIX B
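import datetime
from flask import request

# Registration handler (fragment): read the submitted form fields
name = request.form['name']
mobile = request.form['mobile']
email = request.form['email']
location = request.form['location']
uname = request.form['uname']  # assumed field name: uname is used below but never read in this fragment
pass1 = request.form['pass']
now = datetime.datetime.now()
rdate = now.strftime("%d-%m-%Y")
mycursor = mydb.cursor()  # mydb: MySQL connection created elsewhere in the app
# Register only if the username is not already taken
mycursor.execute("SELECT count(*) FROM cc_register WHERE uname=%s", (uname,))
cnt = mycursor.fetchone()[0]
if cnt == 0:
    # Next id is max(id)+1, or 1 when the table is empty
    mycursor.execute("SELECT max(id)+1 FROM cc_register")
    maxid = mycursor.fetchone()[0]
    if maxid is None:
        maxid = 1
    uid = str(maxid)
    sql = ("INSERT INTO cc_register(id, name, mobile, email, location, uname, pass, otp, status) "
           "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)")
    val = (maxid, name, mobile, email, location, uname, pass1, '', '0')
    mycursor.execute(sql, val)
    mydb.commit()
    msg = "success"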
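Training
# Upload dataset
import os
import pandas as pd
from flask import request

def admin():
    msg = ""
    mycursor = mydb.cursor()  # mydb: MySQL connection created elsewhere in the app
    if request.method == 'POST':
        # Save the uploaded CSV under a fixed name and load it with Pandas
        file = request.files['file']
        fn = "datafile.csv"
        file.save(os.path.join("static/upload", fn))
        filename = 'static/upload/datafile.csv'
        data1 = pd.read_csv(filename, header=0)
        # Flatten all cell values into a single list
        data2 = list(data1.values.flatten())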
# NLP preprocessing
import re
from nltk.stem import PorterStemmer, WordNetLemmatizer
from wordcloud import STOPWORDS

stemmer = PorterStemmer()
lematizer = WordNetLemmatizer()

# Extend the default stop-word set with chat/slang tokens
STOPWORDS.update(['rt', 'mkr', 'didn', 'bc', 'n', 'm',
                  'im', 'll', 'y', 've', 'u', 'ur', 'don',
                  'p', 't', 's', 'aren', 'kp', 'o', 'kat',
                  'de', 're', 'amp', 'will'])

def lower(text):
    return text.lower()

def remove_specChar(text):
    # Remove hashtags such as "#ipc420"
    return re.sub(r"#[A-Za-z0-9_]+", ' ', text)

def remove_link(text):
    # Remove @mentions, URLs and any remaining non-alphanumeric characters
    return re.sub(r'@\S+|https?:\S+|[^A-Za-z0-9]+', ' ', text)

def remove_stopwords(text):
    return " ".join([word for word in str(text).split() if word not in STOPWORDS])

def stemming(text):
    return " ".join([stemmer.stem(word) for word in text.split()])

def lemmatizer_words(text):
    return " ".join([lematizer.lemmatize(word) for word in text.split()])

def cleanTxt(text):
    # Full cleaning pipeline applied to a piece of text
    text = lower(text)
    text = remove_specChar(text)
    text = remove_link(text)
    text = remove_stopwords(text)
    text = stemming(text)
    return text

txt = remove_stopwords(msg_input)  # msg_input: the user's query, defined elsewhere
# BERT feature extraction
import torch
import torch.nn as nn
import torch.nn.functional as F
# Embedding, LearnedPositionalEmbedding, TransformerLayer, LayerNorm and gelu are
# helper components defined in a separate module (not shown in this listing)

class BERTLM(nn.Module):
    # Constructor arguments inferred from the names used in the body
    def __init__(self, local_rank, vocab, embed_dim, ff_embed_dim, num_heads, dropout, layers, approx):
        super(BERTLM, self).__init__()
        self.vocab = vocab
        self.embed_dim = embed_dim
        self.tok_embed = Embedding(self.vocab.size, embed_dim, self.vocab.padding_idx)
        self.pos_embed = LearnedPositionalEmbedding(embed_dim, device=local_rank)
        self.seg_embed = Embedding(2, embed_dim, None)
        self.out_proj_bias = nn.Parameter(torch.Tensor(self.vocab.size))
        # Stack of Transformer encoder layers
        self.layers = nn.ModuleList()
        for i in range(layers):
            self.layers.append(TransformerLayer(embed_dim, ff_embed_dim, num_heads, dropout))
        self.emb_layer_norm = LayerNorm(embed_dim)
        # Heads for masked-LM (one_more*) and next-sentence prediction (nxt_snt*)
        self.one_more = nn.Linear(embed_dim, embed_dim)
        self.one_more_layer_norm = LayerNorm(embed_dim)
        self.one_more_nxt_snt = nn.Linear(embed_dim, embed_dim)
        self.nxt_snt_pred = nn.Linear(embed_dim, 1)
        self.dropout = dropout
        self.device = local_rank
        # Optional adaptive softmax for a faster output layer over a large vocabulary
        if approx == "none":
            self.approx = None
        elif approx == "adaptive":
            self.approx = nn.AdaptiveLogSoftmaxWithLoss(self.embed_dim, self.vocab.size, [10000, 20000, 200000])
        else:
            raise NotImplementedError("%s has not been implemented" % approx)
        self.reset_parameters()

    def reset_parameters(self):
        # Zero the biases and draw weights from a normal distribution with std 0.02
        nn.init.constant_(self.out_proj_bias, 0.)
        nn.init.constant_(self.nxt_snt_pred.bias, 0.)
        nn.init.constant_(self.one_more.bias, 0.)
        nn.init.constant_(self.one_more_nxt_snt.bias, 0.)
        nn.init.normal_(self.nxt_snt_pred.weight, std=0.02)
        nn.init.normal_(self.one_more.weight, std=0.02)
        nn.init.normal_(self.one_more_nxt_snt.weight, std=0.02)
    def work(self, inp, seg=None, layers=None):
        # Return hidden states (optionally from selected layers) and a pooled
        # vector computed from the first ([CLS]) position
        if layers is not None:
            tot_layers = len(self.layers)
            for x in layers:
                if not (-tot_layers <= x < tot_layers):
                    raise ValueError('layer %d out of range ' % x)
            layers = [(x + tot_layers if x < 0 else x) for x in layers]
            max_layer_id = max(layers)
        seq_len, bsz = inp.size()
        if seg is None:
            seg = torch.zeros_like(inp)
        x = self.tok_embed(inp) + self.seg_embed(seg) + self.pos_embed(inp)
        x = self.emb_layer_norm(x)
        x = F.dropout(x, p=self.dropout, training=self.training)
        padding_mask = torch.eq(inp, self.vocab.padding_idx)
        if not padding_mask.any():
            padding_mask = None
        xs = []
        for layer_id, layer in enumerate(self.layers):
            x, _, _ = layer(x, self_padding_mask=padding_mask)
            xs.append(x)
            if layers is not None and layer_id >= max_layer_id:
                break
        if layers is not None:
            x = torch.stack([xs[i] for i in layers])
            z = torch.tanh(self.one_more_nxt_snt(x[:, 0, :, :]))
        else:
            z = torch.tanh(self.one_more_nxt_snt(x[0]))
        return x, z
    def forward(self, truth, inp, seg, msk, nxt_snt_flag):
        seq_len, bsz = inp.size()
        x = self.tok_embed(inp) + self.seg_embed(seg) + self.pos_embed(inp)
        x = self.emb_layer_norm(x)
        x = F.dropout(x, p=self.dropout, training=self.training)
        padding_mask = torch.eq(truth, self.vocab.padding_idx)
        if not padding_mask.any():
            padding_mask = None
        for layer in self.layers:
            x, _, _ = layer(x, self_padding_mask=padding_mask)
        # Masked-LM loss, computed over the masked positions only
        masked_x = x.masked_select(msk.unsqueeze(-1))
        masked_x = masked_x.view(-1, self.embed_dim)
        gold = truth.masked_select(msk)
        y = self.one_more_layer_norm(gelu(self.one_more(masked_x)))
        out_proj_weight = self.tok_embed.weight  # output layer tied to the token embeddings
        if self.approx is None:
            log_probs = torch.log_softmax(F.linear(y, out_proj_weight, self.out_proj_bias), -1)
        else:
            log_probs = self.approx.log_prob(y)
        loss = F.nll_loss(log_probs, gold, reduction='mean')
        # Next-sentence prediction loss from the first ([CLS]) position
        z = torch.tanh(self.one_more_nxt_snt(x[0]))
        nxt_snt_pred = torch.sigmoid(self.nxt_snt_pred(z).squeeze(1))
        nxt_snt_acc = torch.eq(torch.gt(nxt_snt_pred, 0.5), nxt_snt_flag).float().sum().item()
        nxt_snt_loss = F.binary_cross_entropy(nxt_snt_pred, nxt_snt_flag.float(), reduction='mean')
        tot_loss = loss + nxt_snt_loss
        _, pred = log_probs.max(-1)
        tot_tokens = msk.float().sum().item()
        acc = torch.eq(pred, gold).float().sum().item()
# Pair every line of a conversation with the line that follows it
for con in conversations:
    for i in range(len(con) - 1):
        questions.append(linetoID_mapping[con[i]])
        answers.append(linetoID_mapping[con[i + 1]])
return questions, answers
def transform_text(input_text):
    # Lower-case the text, then expand common English contractions
    input_text = input_text.lower()
    input_text = re.sub(r"i'm", "i am", input_text)
    input_text = re.sub(r"he's", "he is", input_text)
    input_text = re.sub(r"she's", "she is", input_text)
    input_text = re.sub(r"it's", "it is", input_text)
    input_text = re.sub(r"that's", "that is", input_text)
    input_text = re.sub(r"what's", "what is", input_text)
    input_text = re.sub(r"where's", "where is", input_text)
    input_text = re.sub(r"how's", "how is", input_text)
    input_text = re.sub(r"\'ll", " will", input_text)
    input_text = re.sub(r"\'ve", " have", input_text)
    input_text = re.sub(r"\'re", " are", input_text)
    input_text = re.sub(r"\'d", " would", input_text)
    input_text = re.sub(r"won't", "will not", input_text)
    input_text = re.sub(r"can't", "cannot", input_text)
    input_text = re.sub(r"n't", " not", input_text)
    input_text = re.sub(r"'til", "until", input_text)
    # Strip punctuation and collapse repeated whitespace
    input_text = re.sub(r"[-()\"#/@;:<>{}`+=~|]", "", input_text)
    input_text = " ".join(input_text.split())
    return input_text
def filter_ques_ans(clean_questions, clean_answers):
    # Keep only question/answer pairs whose lengths lie within
    # [minimum_length, maximum_length] words (both defined elsewhere)
    short_questions_temp = []
    short_answers_temp = []
    for i, question in enumerate(clean_questions):
        if minimum_length <= len(question.split()) <= maximum_length:
            short_questions_temp.append(question)
            short_answers_temp.append(clean_answers[i])
    short_questions = []
    short_answers = []
    for i, answer in enumerate(short_answers_temp):
        if minimum_length <= len(answer.split()) <= maximum_length:
            short_answers.append(answer)
            short_questions.append(short_questions_temp[i])
    return short_questions, short_answers
def create_vocabulary(tokenized_ques, tokenized_ans):
    # Count how often each token occurs across all questions and answers
    vocabulary = {}
    for question in tokenized_ques:
        for word in question:
            if word not in vocabulary:
                vocabulary[word] = 1
            else:
                vocabulary[word] += 1
    for answer in tokenized_ans:
        for word in answer:
            if word not in vocabulary:
                vocabulary[word] = 1
            else:
                vocabulary[word] += 1
    return vocabulary
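def create_encoding_decoding(vocabulary):
    # Only words occurring at least `threshold` times get their own id
    threshold = 15
    count = 0  # number of words that meet the threshold
    for k, v in vocabulary.items():
        if v >= threshold:
            count += 1
    # Id 0 is left for padding (transform() fills unused slots with 0), id 1 for START
    vocab_size = 2
    encoding = {}
    decoding = {1: 'START'}
    for word, freq in vocabulary.items():
        if freq >= threshold:
            encoding[word] = vocab_size
            decoding[vocab_size] = word
            vocab_size += 1
    return encoding, decoding, vocab_size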
def transform(encoding, data, vector_size=20):
    # Map each token sequence to a fixed-length row of word ids (0 = padding)
    transformed_data = np.zeros(shape=(len(data), vector_size))
    for i in range(len(data)):
        for j in range(min(len(data[i]), vector_size)):
            try:
                transformed_data[i][j] = encoding[data[i][j]]
            except KeyError:
                # Words outside the vocabulary map to the <UNKNOWN> id
                transformed_data[i][j] = encoding['<UNKNOWN>']
    return transformed_data
def create_gloveEmbeddings(encoding, size):
    # Load pre-trained 50-dimensional GloVe vectors from the file at GLOVE_MODEL
    words = set()
    word_to_vec_map = {}
    with open(GLOVE_MODEL, mode='rt', encoding='utf8') as file:
        for line in file:
            line = line.strip().split()
            word = line[0]
            words.add(word)
            word_to_vec_map[word] = np.array(line[1:], dtype=np.float64)
    # Row i holds the GloVe vector of the word with id i; words without a vector stay zero
    embedding_matrix = np.zeros((size, 50))
    for word, index in encoding.items():
        try:
            embedding_matrix[index, :] = word_to_vec_map[word.lower()]
        except (KeyError, ValueError):
            continue
    return embedding_matrix
from tensorflow.keras.layers import Input, LSTM, Dense, TimeDistributed
from tensorflow.keras.models import Model

def create_model(dict_size, embed_layer, hidden_dim):
    # Encoder: embed the input sequence and keep the final LSTM states
    encoder_inputs = Input(shape=(maximum_length,), dtype='int32')
    encoder_embedding = embed_layer(encoder_inputs)
    encoder_LSTM = LSTM(hidden_dim, return_state=True)
    encoder_outputs, state_h, state_c = encoder_LSTM(encoder_embedding)
    # Decoder: start from the encoder states and output a word distribution at every step
    decoder_inputs = Input(shape=(maximum_length,), dtype='int32')
    decoder_embedding = embed_layer(decoder_inputs)
    decoder_LSTM = LSTM(hidden_dim, return_state=True, return_sequences=True)
    decoder_outputs, _, _ = decoder_LSTM(decoder_embedding, initial_state=[state_h, state_c])
    outputs = TimeDistributed(Dense(dict_size, activation='softmax'))(decoder_outputs)
    model = Model([encoder_inputs, decoder_inputs], outputs)
    return model
def prediction_answer(user_input, model):
    # Normalise and tokenise the user's query
    transformed_input = transform_text(user_input)
    input_tokens = [nltk.word_tokenize(transformed_input)]
    input_tokens = [input_tokens[0][::-1]]  # reverse the input sequence
    encoder_input = transform(encoding, input_tokens, 20)
    # Decode step by step, starting from the WORD_START token
    decoder_input = np.zeros(shape=(len(encoder_input), OUTPUT_VECTORLENGTH))
    decoder_input[:, 0] = WORD_START
    for i in range(1, OUTPUT_VECTORLENGTH):
        pred_output = model.predict([encoder_input, decoder_input]).argmax(axis=2)
        decoder_input[:, i] = pred_output[:, i]
    return pred_output