
Major Project Report(Edited)


MedGraph Navigator

A project report submitted in partial fulfillment of the requirements for


the award of the degree of

Master of Computer Applications

By

ABHILASH SINGH
(205121004)

DEPARTMENT OF COMPUTER APPLICATIONS


NATIONAL INSTITUTE OF TECHNOLOGY
TIRUCHIRAPPALLI - 620015
JUNE 2024

BONAFIDE CERTIFICATE

This is to certify that the project titled “MedGraph Navigator” is a bonafide record of the
work done by

ABHILASH SINGH (205121004)

in partial fulfilment of the requirements for the award of the degree of Master of Computer
Applications from National Institute of Technology, Tiruchirappalli, during the academic
year 2023-2024 (6th Semester – CA750 Project Work).

Dr. S. Sangeetha Dr. Michael Arock

Internal Guide Head of the Department

Project viva-voce held on …………………………….

Internal Examiner External Examiner

ABSTRACT

The project, named ‘MedGraph Navigator’, is built around medical entities (chemicals and
diseases) and the relationships among them. Understanding these entities and their relations
is important for anyone working or studying in the medical domain. In the project, data about
medical entities and relations is stored in a knowledge graph (graph database), and a web
application takes a medical entity as input from the user and displays results in Q&A and
graph formats by querying the graph database at the back end.

The objective of the project is to make it easier to study and understand medical entities and
their relationships with each other. To achieve this, the project starts with the ‘BioRED’
dataset, which contains medical documents and annotations, extracts data from it, converts it
and stores it in a graph database (OrientDB), so that questions can be answered and
information fetched by running SQL-like queries.

The project has three major steps –

1. Dataset Extraction
   a) BioRED
   b) Model Training

2. Knowledge Base Construction

3. Application Development

The outcome of the project is a web application that takes a medical entity as input and
provides the result in Q&A format and graph format by querying an OrientDB database
(developed as part of the project) that stores the data in a graph model.

ACKNOWLEDGEMENT

Every project, big or small, is successful largely due to the effort of several wonderful people

who have always given their valuable advice or lent a helping hand. I sincerely appreciate the

inspiration, support, and guidance of all those people who have been instrumental in making

this project successful.

I express my deep sense of gratitude to Dr. G. Aghila, Director, National Institute of

Technology, Tiruchirappalli for giving me an opportunity to do this project.

I am grateful to Dr. Michael Arock, Professor, and Head, Department of Computer

Applications, National Institute of Technology, Tiruchirappalli for providing the infrastructure

and facilities to carry out the project.

I express my gratitude to my Project Guide, Dr. S. Sangeetha, Associate Professor, Department
of Computer Applications, National Institute of Technology, Tiruchirappalli, for her support,
for arranging the project on a good schedule, and for assisting me in completing the project.
I would like to thank her for duly evaluating my progress.

I express my sincere and heartfelt gratitude to the Project Evaluation Committee, Department
of Computer Applications, National Institute of Technology, Tiruchirappalli. I am sincerely
thankful for its constant support, care, guidance, and regular interaction throughout my project.

I express my sincere thanks to all the faculty members, and scholars of NIT Trichy for their

critical advice and guidance to develop this project directly or indirectly.

TABLE OF CONTENTS

Title

1. BONAFIDE CERTIFICATE

2. ABSTRACT

3. ACKNOWLEDGEMENTS

4. TABLE OF CONTENTS

5. LIST OF FIGURES

6. CHAPTERS

a) CHAPTER 1: INTRODUCTION

b) CHAPTER 2: PROBLEM STATEMENT

c) CHAPTER 3: PLATFORM

d) CHAPTER 4: REQUIREMENT GATHERING/ANALYSIS

e) CHAPTER 5: METHODOLOGY

f) CHAPTER 6: SYSTEM DESIGN/ANALYSIS

g) CHAPTER 7: SYSTEM DEVELOPMENT AND IMPLEMENTATION

7. APPENDIX

8. REFERENCES

LIST OF FIGURES

NAME OF FIGURE

1. Project Block Diagram

2. Project Workflow

3. Relation Distribution

4. Transformer Architecture

5. BERT Architecture

6. System Workflow 1

7. System Workflow 2

CHAPTER 1

INTRODUCTION

Title - MedGraph Navigator


Studying the medical domain requires studying medical entities such as chemicals and diseases
and the relations among them. It is essential to store and represent knowledge about these
entities and their relations in an efficient and easy-to-understand manner. One way to achieve
this is to build a knowledge graph of medical entities and relations.

About the Project


MedGraph Navigator is a web application that gives end users a simple, user-friendly
interface for exploring medical entities and the relations among them.

The project also includes forming groups of chemicals and diseases based on
commonalities. The grouping allows the knowledge to be represented as a multi-level
knowledge graph, which can be queried to answer questions that require more than one level
of traversal.

MedGraph Navigator presents the medical entities and relations in formats such as questions
and answers and a knowledge graph, to make it easier for users to understand how medical
entities are related to each other.

Objective of the Project


The objective of MedGraph Navigator is to make it easier to study and understand the relations
between medical entities such as chemicals and diseases.

As part of dataset extraction, another objective is to develop a deep learning model that can
predict the relation between two medical entities in a given text so that the given entities and
predicted relations can be used to create more data to be stored in the knowledge base.

Input and Output


The application takes a medical entity name as input and returns output in two formats:
question-and-answer and graph. The output shows how the given medical entity is related to
other medical entities; for example, a chemical may be related to a disease by the ‘induce’
relation. The knowledge graph enables the end user to explore the medical entity and its
relations.

Use Case
The project is primarily designed for people who wish to study medical entities, such as
medical students, doctors and medical research scholars. The project is not intended for the
general public.

Fig – 1: Project Block Diagram

CHAPTER 2
PROBLEM STATEMENT

The problem statement is to ‘Develop a web application that presents medical entities and the
relations among them in an easy-to-understand manner by developing and querying a
knowledge graph’.
A knowledge graph is a form of knowledge representation in which entities are stored as
vertices and the relations among them are stored as edges.
Although a knowledge graph of medical entities and relations stores and presents medical
data efficiently, such a graph is very large and complex, which makes it difficult to visualize
and understand.
The project aims at solving this issue in two ways –
1. Answering some basic questions related to a medical entity by querying a large
knowledge graph.
2. Visualizing a smaller knowledge graph that presents the relations of the given entity with
others.
The project includes two major problem statements –
1. Developing a knowledge graph of medical entities and relations.
2. Developing a web application to interact with the user.

CHAPTER 3
PLATFORM

Hardware Requirements

● Intel i3 10th generation
● 8 GB RAM
● 256 GB SSD

Software Requirements

● Visual Studio Code
● Jupyter Notebook
● Python
● OrientDB

CHAPTER 4
REQUIREMENT GATHERING/ANALYSIS

Overview

The project is a medical application based on a knowledge graph of medical entities and the
relations among them. The application developed as part of the project is required to take a
medical entity (chemical/disease) and answer some questions related to it by querying the
knowledge graph. The application is also required to present the entity and relation
information as a knowledge-graph-like visualization.
Stakeholder Analysis

The stakeholders of the application include medical students, medical research scientists and
doctors. The information presented by the application is not to be used as medical advice; the
application is intended only for medical study and not for anyone seeking medical assistance.
Besides the above-mentioned stakeholders, the developer of the application is also a
stakeholder and is responsible for ensuring that the application fulfils its functional as well
as non-functional requirements.
Functional Requirements

Following are the functional requirements of the application –


1. The user should be able to select the entity type (chemical/disease) and enter the name
of the entity.
2. The application should provide information related to the entity to the end user in a
question-and-answer format.
3. The application should draw a knowledge graph that presents the related entities and
relations in nodes and edges.
4. The application should have an auto-fill/suggest feature for entity names.
Non - Functional Requirements

Following are the non-functional requirements of the application –


1. The application should have a low response time.
2. The application should have a user-friendly interface.

Data Requirements

1. Data should consist of the following details –


a. Medical entities
b. Relation among entities
2. Data should be collected from a trusted source to ensure the accuracy of the data.
3. Medical entities should be grouped together based on their commonalities.
Technical Requirements

The following tech stack is required to develop the application –

1. Python – Python is a high-level, interpreted programming language. It is easy to learn
and use, and it supports functional as well as object-oriented programming. It is widely
used in web development, data analytics, machine learning etc.
2. HTML – HTML is the markup language used to define the structure of web pages. It
provides pre-defined tags such as h1, table, tr, td, img, a, input and form. These tags
create the structure of a web page, which can be styled using CSS.
3. CSS – CSS stands for Cascading Style Sheets. It defines how an HTML element is
displayed, through properties such as colour, background, margin and padding. Several
CSS frameworks are also available.
4. JavaScript – JavaScript is a high-level programming language, often called the
language of web browsers. It is used to add functionality to a web application.
JavaScript also has several widely used frameworks and libraries, such as React and
Angular.
5. Flask – Flask is a lightweight Python framework for developing the server side of web
applications. It is based on two things – 1. the WSGI (Web Server Gateway Interface)
standard 2. the Jinja2 template engine. Its simplicity and ease of use make it one of the
most popular Python web frameworks.
6. OrientDB – OrientDB is a NoSQL database. It supports several data models, such as
graph, document, object and key/value. It is scalable and provides both schema-less
and schema-full modes. It is well suited to graph-like data such as entity-relation data
and social networks.

HTML, CSS and JavaScript are used to develop the front-end of the application. Python and
Flask are used to develop the server side of the application. OrientDB is used as the
knowledge base for the application.
Safety Requirements

Since the application belongs to the medical domain, it is essential to ensure the safety of the
data stored in the knowledge base. It is important to ensure that the data is not altered, as
alteration may lead to the application presenting wrong information to the user.
Security measures such as password-based authentication should be used to protect the
integrity of the data.

CHAPTER 5
METHODOLOGY

Project Workflow

Fig-2: Project Workflow

Phase – 1: Dataset Extraction


Phase – 1(a): BioRED

Data was extracted from BioRED into a .csv file using a Python script. This data is required
to construct the knowledge base for the application. The BioRED dataset is distributed in
JSON and XML formats. It has annotations for medical entities and the relations among them.

BioRED - BioRED is a first-of-its-kind biomedical relation extraction dataset with multiple
entity types (e.g. gene/protein, disease, chemical) and relation pairs (e.g. gene–disease;
chemical–chemical) at the document level, on a set of 600 PubMed abstracts.

BioRED has the following entity types –

● Gene
● Variant
● Disease
● Chemical
● Organism
● Cell Line

Some of the relations present in the BioRED are –

Type 1     Type 2     Relation (Directional)   Relation (Undirectional)

Chemical   Disease    Treatment                Negative_Correlation
Chemical   Disease    Induce                   Positive_Correlation
Chemical   Chemical   Co-treatment             Co-treatment
Chemical   Variant    Resistance               Negative_Correlation
Variant    Disease    Cause                    Cause

The extracted data has four columns – Text, Entity1, Entity2, Relation.

So far, the extracted dataset contains 814 records, each describing a relationship between a
chemical (Entity1) and a disease (Entity2).

Three relations are present in the current dataset –

1. Positive Correlation (Induce)


2. Negative Correlation (Treatment)
3. Association

The data is distributed across relations as follows –

Relation      No. of Records
Treatment     408
Induce        292
Association   114

Fig – 3: Relation Distribution

Phase – 1(b): Model Training

A BERT model was trained for two classes – Positive Correlation and Negative Correlation.
The objective of training the model is to check whether a relation can be established between
two medical entities in a given sentence. A highly accurate model would allow us to generate
more entity pairs by predicting the relation between them. BERT was trained on 584 records.

About BERT

Google created the deep learning model BERT (Bidirectional Encoder Representations from
Transformers) for use in natural language processing (NLP) applications. The following are
the main details of BERT:

Architecture:
BERT is based on the transformer architecture. The transformer model was introduced in the
paper ‘Attention Is All You Need’. Transformer models rely entirely on self-attention
mechanisms: self-attention weighs the significance of different words in a sentence, allowing
the model to capture dependencies across the entire sequence simultaneously. The original
transformer consists of encoder and decoder stacks, where the encoder processes input
sequences and the decoder generates outputs; BERT uses only the encoder stack. Transformers
have been pivotal in language-related tasks such as translation, summarization and question
answering.
Most conventional models process text in only one direction (left to right or right to left).
BERT, on the other hand, takes context from both directions into account.

Fig – 4: Transformer Architecture

Fig – 5: BERT Architecture


For the given task, the BERT model was customized with the following layers –
● BERT
● Dropout
● Linear Layer
● Output (Linear) Layer

Variants: BERT has many variants. Some of these are –
● BERT-Base: 110 million parameters, 768 hidden units, and 12 layers.
● BERT-Large: 340 million parameters, 1024 hidden units, and 24 layers.
● DistilBERT, RoBERTa, and ALBERT: Extensions or optimizations of BERT's
architecture for increased effectiveness or efficiency.
Applications: BERT is used for the following tasks –
● Text classification
● Named Entity Recognition
● Question Answering
● Translation

Result of Model Training

Classification Report -

Confusion Matrix –

Accuracy –

Validation accuracy of the model is 78 %.

Conclusion –

The accuracy obtained is not sufficient for generating predictions on critical data such as
that of the medical domain. For this reason, the application does not use any entity-relation
triples generated by the model in its knowledge base.

Phase – 2: Knowledge Base Construction
The knowledge base is constructed by creating a database in OrientDB and populating it with
the data extracted into the .csv file, using a Python script.

OrientDB is an open-source NoSQL database management system written in Java. It is a
multi-model database supporting graph, document and object models; relationships are
managed as in graph databases, with direct connections between records. It supports
schema-less, schema-full and schema-mixed modes.

While creating the knowledge base, the related chemicals and diseases are grouped into
chemical and disease groups. This grouping helps to answer queries which require a
multi-level knowledge graph. The entities are stored in vertex classes (Chemical, Disease,
ChemGroup, DisGroup) and the relations among these entities are stored in edge classes
(Member, Induce, Treatment, Association).
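
As an illustrative sketch only (not the project's actual script), the schema described above could be written as OrientDB SQL `CREATE CLASS` statements generated from Python. The helper name `schema_commands` is hypothetical; the class names are taken from the text, with vertex classes extending `V` and edge classes extending `E`.

```python
# Vertex and edge class names as listed in the report.
VERTEX_CLASSES = ["Chemical", "Disease", "ChemGroup", "DisGroup"]
EDGE_CLASSES = ["Member", "Induce", "Treatment", "Association"]


def schema_commands():
    """Return OrientDB SQL statements that create the graph schema.

    Hypothetical helper: it only builds the statement strings and does not
    talk to a database.
    """
    commands = [f"CREATE CLASS {name} EXTENDS V" for name in VERTEX_CLASSES]
    commands += [f"CREATE CLASS {name} EXTENDS E" for name in EDGE_CLASSES]
    return commands


if __name__ == "__main__":
    for command in schema_commands():
        print(command)
```

Each generated statement would then be sent to the database through whichever driver is in use.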

Phase – 3: Application Development


● A web-based application has been developed.
● The application takes entity type and name as input from the user and displays the
relations and related entities in two formats -
○ Q & A Format
○ Knowledge Graph Format
● The application provides a suggestion feature that allows the user to quickly fill in
entity names.
● The application supports multi-level queries and presents the information in a multi-
level knowledge graph.

CHAPTER 6
SYSTEM DESIGN/ANALYSIS

The complete application can be understood in terms of three major components –


1. Client Application
2. Flask Server
3. OrientDB Database
The application provides information in two formats –
1. Question and Answer
2. Knowledge Graph
The overall data and control flow of the system is as given below –
1. Question and Answer Format –

○ The user selects the entity type and enters the name of the entity.
○ Upon submission, an API call is made to the Flask server and the user input is
passed to the server.
○ The server runs the function that is mapped to the received API call.
○ The function runs queries on the connected OrientDB database.
○ Upon receiving the result from the database, the server passes it as the response to
the client application.
○ The client application uses the received data to display the information in Q&A
format.

Fig-6: System Workflow 1

2. Knowledge Graph Format

○ The user selects the entity type and enters the name of the entity.
○ Upon submission, an API call is made to the Flask server and the user input is
passed to the server.
○ The server runs the function that is mapped to the received API call.
○ The function runs queries on the connected OrientDB database.
○ Upon receiving the result from the database, the server passes it as the response to
the client application.
○ The client application builds a knowledge graph from the received data and
displays it to the user in a graph-like format.

Fig-7: System Workflow 2

Client Application

The client application (Front-end) is developed using HTML, CSS and JavaScript. The client-
side application has two major components –
1. User Input Form – The user input form allows a user to select the entity type and enter
the name of the entity and submit the data to get results in Q&A or Graph format.
2. Output Components – There are two output components. One component displays the
question-and-answer form of information and the other component displays the output
in graph format.
Flask Server

The server application is developed using Flask, a Python web framework. It receives API
requests from the client application, runs the function mapped to each API call and returns
the result to the client application.
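
The request-to-function mapping described above can be sketched without any framework, so that it runs without a server. This is a simplified stand-in, not the project's Flask code: in Flask the mapping is declared with `@app.route` decorators, whereas here a plain dictionary is used. The names `ROUTES`, `handle` and `run_query` are hypothetical; `run_query` stands for whatever function executes a query on OrientDB and returns rows as dictionaries. The `/getEntities` route and its two queries are the ones listed in Chapter 7.

```python
def get_entities(payload, run_query):
    """Return every chemical and disease name as a single list."""
    chemicals = [row["name"] for row in run_query("select name from Chemical")]
    diseases = [row["name"] for row in run_query("select name from Disease")]
    return chemicals + diseases


# Hypothetical route table: API path -> function that serves it.
ROUTES = {"/getEntities": get_entities}


def handle(path, payload, run_query):
    """Run the function mapped to `path` and return (status, body)."""
    handler = ROUTES.get(path)
    if handler is None:
        return 404, {"error": f"unknown route {path}"}
    return 200, handler(payload, run_query)


if __name__ == "__main__":
    # Stand-in for the database: canned rows keyed by query text.
    canned = {
        "select name from Chemical": [{"name": "aspirin"}],
        "select name from Disease": [{"name": "ulcer"}],
    }
    print(handle("/getEntities", {}, canned.__getitem__))
```

Passing the query runner in as an argument keeps the handler testable with canned rows; a real deployment would bind it to the open database connection.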
OrientDB Database

OrientDB is an open-source NoSQL database management system written in Java. It is a
multi-model database supporting graph, document and object models; relationships are
managed as in graph databases, with direct connections between records. It supports
schema-less, schema-full and schema-mixed modes.

The data is stored in an OrientDB database using the graph model. The medical entities are
stored in vertex classes and the relations among those entities are stored in edge classes.

In OrientDB, a multi-model database, vertex and edge classes are fundamental components for
defining and managing graph structures. These classes allow us to create, store, and query
graph data efficiently.

Vertex Classes

A vertex represents an entity or a node in a graph. Vertex classes in OrientDB are used to
define the types of entities and their properties. Each vertex belongs to a specific vertex class.

Some of the important points related to vertex classes are –

● The base class for all vertex classes is V. All custom vertex classes inherit from this
base class.

● Vertex classes can have properties that describe the attributes of the entities they
represent. For example, a Person vertex class might have properties like name, age,
and email.

● We can define a schema for vertex classes to enforce constraints and ensure data
integrity.

● Vertex classes can inherit properties from other vertex classes, allowing for a
hierarchical structure.

The database schema has the following vertex classes –


1. Chemical – This vertex class represents the medical entities which are chemicals. All
the chemical entities are stored as vertices of this class.
2. Disease – This vertex class represents the medical entities which are diseases. All the
disease entities are stored as vertices of this class.
3. ChemGroup – This class represents a group of chemicals. The chemicals in a chemical
group share a common characteristic: all of them have the same relation with some
disease.
4. DisGroup – This class represents a group of diseases. The diseases in a disease group
share a common characteristic: all of them have the same relation with some chemical.

Vertex Class   No. of Records

Chemical       403
Disease        379
ChemGroup      505
DisGroup       491

All the vertex classes inherit from the V class, which is pre-defined in OrientDB, and can be
viewed as subclasses of it. The V class in this application has a total of 1,778 records.

Edge Classes
An edge represents a relationship or connection between two vertices in a graph. Edge classes
in OrientDB are used to define the types of relationships and their properties. Each edge
belongs to a specific edge class. Some of the important points about edge classes are given
below –
● The base class for all edge classes is E. All custom edge classes inherit from this base
class.
● Edge classes can have properties that describe the attributes of the relationships they
represent. For example, a Friend edge class might have a since property to indicate
when the friendship started.
● We can define a schema for edge classes to enforce constraints and ensure data
integrity.
● Edges have a direction, with a starting vertex (out-vertex) and an ending vertex
(in-vertex).
The application has the following edge classes –
1. Association – This edge class represents the relation ‘Association’ between a chemical
and a disease which are associated with each other in some way.
2. Induce – This edge class represents the relation ‘induce’ between a chemical group and
a disease, where the chemicals of the chemical group can cause the disease, or between
a disease group and a chemical, where the chemical can cause the diseases of the
disease group.
3. Treatment – This edge class represents the relation ‘treatment’ between a chemical
group and a disease, where the chemicals of the chemical group can cure the disease, or
between a disease group and a chemical, where the chemical can cure the diseases of
the disease group.
4. Member – This edge class represents the relation between a chemical and a chemical
group in which the chemical is a member of the chemical group, or the relation between
a disease and a disease group in which the disease is a member of the disease group.

Edge Class    No. of Records

Association   181
Induce        445
Treatment     370
Member        1590

All the edge classes inherit from the E class, which is pre-defined in OrientDB, and can be
viewed as subclasses of it. The E class in this application has a total of 2,586 records.

CHAPTER 7
SYSTEM DEVELOPMENT AND IMPLEMENTATION

Extracting Data from BioRED


The data is extracted using a Python script. The algorithm used in the script to extract data
from BioRED is given below –
1. Read the BioRED data present in JSON format.
2. Create a list of documents present in the data.
3. Create an empty list med_data.
4. Iterate through the list of documents using a for loop. In each iteration of the
loop, do the following –
a. Extract the passage from the document and then extract the text and
annotations from the passage.
b. Extract the relations from the document.
c. For each relation, do the following –
i. Get the id number of both the entities present in that relation.
ii. Get the relation type
iii. Get the entity type and entity text associated with the id number
by iterating through all the annotations.
iv. If entity 1 is a chemical and entity 2 is a disease, create a
dictionary with keys – Text, Entity1, Entity2 and Relation and
append the dictionary to the list ‘med_data’.
5. Convert med_data to a dataframe.
6. Convert the dataframe to a .csv file and save it.
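
The steps above can be sketched as follows. This is a minimal illustration, not the project's script: it assumes the BioC-style JSON layout (documents, passages, annotations and relations, each with an `infons` dictionary) and the type labels `ChemicalEntity` and `DiseaseOrPhenotypicFeature`, neither of which is confirmed in this report. It also uses the standard `csv` module in place of a pandas dataframe, and the function names are hypothetical.

```python
import csv
import json

# Assumed BioRED type labels; adjust if the dataset uses different names.
CHEMICAL = "ChemicalEntity"
DISEASE = "DiseaseOrPhenotypicFeature"
COLUMNS = ["Text", "Entity1", "Entity2", "Relation"]


def load_bioc(path):
    """Step 1: read the BioRED data from a JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def extract_rows(bioc):
    """Steps 2-4: collect chemical-disease relation rows."""
    med_data = []
    for doc in bioc["documents"]:
        # Step 4a: text and annotations of every passage in the document.
        text = " ".join(p["text"] for p in doc["passages"])
        annotations = [a for p in doc["passages"] for a in p["annotations"]]

        # Index annotations by entity identifier (the first mention wins).
        entities = {}
        for ann in annotations:
            for ident in ann["infons"]["identifier"].split(","):
                entities.setdefault(ident, (ann["infons"]["type"], ann["text"]))

        # Steps 4b-4c: resolve both entity ids of each relation.
        for rel in doc["relations"]:
            infons = rel["infons"]
            first = entities.get(infons["entity1"])
            second = entities.get(infons["entity2"])
            if first is None or second is None:
                continue
            if first[0] == CHEMICAL and second[0] == DISEASE:
                med_data.append({
                    "Text": text,
                    "Entity1": first[1],
                    "Entity2": second[1],
                    "Relation": infons["type"],
                })
    return med_data


def save_csv(rows, path):
    """Steps 5-6: write the rows to a .csv file."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

Building the identifier index once per document avoids re-scanning all annotations for every relation, which is what step 4(c)(iii) would otherwise do.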

Grouping of Chemicals and Diseases


The data extracted in the previous step has chemicals, diseases and the relations between
them. However, it is also necessary to create groups of these entities based on common
relations with another entity. The grouping is done by a Python script. The algorithm used in
the script is given below –
1. Read the data in .csv file and create a pandas dataframe.
2. Create two sets – chemicals and diseases.
3. Store all the values in the ‘Entity1’ column of the dataframe in the set named ‘chemicals’
and those in the ‘Entity2’ column of the dataframe in the set named ‘diseases’.
4. Create two dictionaries – ChemG and DisG to store chemical/disease groups and
members of those groups in key-value pairs.
5. Create two lists – Dis_Group_Relations and Chem_Group_Relations to store list of
dictionaries where each dictionary contains three keys – Chemical/Disease, Relation,
Group.
6. For each chemical in chemicals, do the following –
a. For each relation, do the following –
i. Get all the values of Entity2 from the dataframe where Entity1 is the given
chemical and Relation is the given relation, and store them in a list ‘arr’.
ii. Create a name of a group that serves as the disease group name.
iii. In DisG dictionary, create the key named as the ‘name of group’ and add
‘arr’ as value.
iv. In list ‘Dis_Group_Relations’, append a dictionary with the following
keys – Chemical, Relation, Group.
7. For each disease in diseases, do the following –
a. For each relation, do the following –
i. Get all the values of Entity1 from the dataframe where Entity2 is the given
disease and Relation is the given relation, and store them in a list ‘arr’.
ii. Create a name of a group that serves as the chemical group name.
iii. In ChemG dictionary, create the key named as the ‘name of group’ and
add ‘arr’ as value.
iv. In list ‘Chem_Group_Relations’, append a dictionary with the following
keys – Disease, Relation, Group.
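
A compact sketch of the grouping algorithm is given below. It works on plain lists of dictionaries rather than a pandas dataframe so that it has no third-party dependencies. Two details are assumptions, since the report does not state them: the group naming scheme (`DisG_1`, `ChemG_1`, ...) and the skipping of chemical/relation or disease/relation pairs that have no matching rows. The function names `build_groups` and `group_entities` are hypothetical.

```python
def build_groups(rows, source_col, target_col, source_label, prefix):
    """Group the targets that share a (source entity, relation) pair.

    Returns the group dictionary and the list of group relations.
    """
    groups = {}
    group_relations = []
    sources = sorted({row[source_col] for row in rows})
    relations = sorted({row["Relation"] for row in rows})
    for source in sources:
        for relation in relations:
            arr = sorted({
                row[target_col] for row in rows
                if row[source_col] == source and row["Relation"] == relation
            })
            if not arr:
                continue  # assumption: empty groups are not created
            name = f"{prefix}_{len(groups) + 1}"  # assumed naming scheme
            groups[name] = arr
            group_relations.append(
                {source_label: source, "Relation": relation, "Group": name}
            )
    return groups, group_relations


def group_entities(rows):
    """Steps 6 and 7: disease groups per chemical, chemical groups per disease."""
    DisG, Dis_Group_Relations = build_groups(
        rows, "Entity1", "Entity2", "Chemical", "DisG")
    ChemG, Chem_Group_Relations = build_groups(
        rows, "Entity2", "Entity1", "Disease", "ChemG")
    return ChemG, DisG, Chem_Group_Relations, Dis_Group_Relations


if __name__ == "__main__":
    rows = [
        {"Entity1": "aspirin", "Entity2": "headache", "Relation": "Treatment"},
        {"Entity1": "aspirin", "Entity2": "fever", "Relation": "Treatment"},
        {"Entity1": "aspirin", "Entity2": "ulcer", "Relation": "Induce"},
    ]
    for part in group_entities(rows):
        print(part)
```

The same helper serves both directions because steps 6 and 7 differ only in which column plays the role of the source entity.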

Constructing the Knowledge Base


An OrientDB database is constructed to serve as the knowledge base for the application. This
is done by running a Python script. The algorithm used in the script is given below –
1. Import pyorient for OrientDB operations.
2. Import csv and pandas for CSV file handling.
3. Connect to OrientDB server.
4. Authenticate with the server using root credentials.
5. Create a new graph database named 'meddb'.

6. Open the created database.
7. Create classes for vertices (Chemical, Disease, ChemGroup, DisGroup) and edges
(Induce, Treatment, Association, Member).
8. Read the Medical.csv file using pandas into a DataFrame with columns: Entity1,
Entity2, Relation.
9. Initialize empty lists for chemicals and diseases.
10. Iterate over the CSV file rows to populate the chemicals and diseases lists.
11. Remove duplicates from both lists.
12. Run the algorithm for ‘Grouping of chemicals and diseases’ as mentioned above and
create dictionaries ChemG and DisG to store name of group and its members and lists
Dis_Group_Relations and Chem_Group_Relations to store list of dictionaries where
each dictionary contains three keys – Chemical/Disease, Relation, Group.
13. Create vertices of chemical class for each unique chemical.
14. Create vertices of disease class for each unique disease.
15. For each chemical group in ChemG, do the following –
a. Create vertex of ChemGroup class with name as the name of chemical group.
b. Create edge of member class from each chemical vertex that is a member of
the given chemical group.
16. For each disease group in DisG, do the following –
a. Create vertex of DisGroup class with name as the name of disease group.
b. Create edge of member class from each disease vertex that is a member of the
given disease group.
17. For each entry in Chem_Group_Relations, create an edge of the specified relation
(Value of the ‘Relation’ key in that entry) type from the ChemGroup (Value of the
‘Group’ key in that entry) vertex to the Disease (Value of the ‘Disease’ key in that
entry) vertex.
18. For each entry in Dis_Group_Relations, create an edge of the specified relation type
(Value of the ‘Relation’ key in that entry) from the Chemical (Value of the
‘Chemical’ key in that entry) vertex to the DisGroup (Value of the ‘Group’ key in that
entry) vertex.
19. Close the database connection.
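
Steps 13 to 18 can be illustrated by generating the OrientDB SQL statements they imply. The sketch below is an assumption-laden illustration rather than the project's code: it only builds `CREATE VERTEX` and `CREATE EDGE` strings and leaves connection handling (steps 3 to 6 and 19) to the driver. The helper names are hypothetical, the backslash escaping in `quote` is a simplification, and the edges carry no properties here, whereas the queries listed later in this chapter filter edges by properties such as `chemical` and `chemgroup`.

```python
def quote(value):
    """Wrap a value in single quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def vertex(cls, name):
    """Statement creating one vertex of class `cls` with a name property."""
    return f"CREATE VERTEX {cls} SET name = {quote(name)}"


def edge(edge_cls, from_cls, from_name, to_cls, to_name):
    """Statement creating one edge between two vertices looked up by name."""
    return (
        f"CREATE EDGE {edge_cls} "
        f"FROM (SELECT FROM {from_cls} WHERE name = {quote(from_name)}) "
        f"TO (SELECT FROM {to_cls} WHERE name = {quote(to_name)})"
    )


def build_commands(chemicals, diseases, ChemG, DisG,
                   Chem_Group_Relations, Dis_Group_Relations):
    """Return the statements for steps 13-18 in order."""
    commands = [vertex("Chemical", c) for c in chemicals]      # step 13
    commands += [vertex("Disease", d) for d in diseases]       # step 14
    for group, members in ChemG.items():                       # step 15
        commands.append(vertex("ChemGroup", group))
        commands += [edge("Member", "Chemical", m, "ChemGroup", group)
                     for m in members]
    for group, members in DisG.items():                        # step 16
        commands.append(vertex("DisGroup", group))
        commands += [edge("Member", "Disease", m, "DisGroup", group)
                     for m in members]
    for entry in Chem_Group_Relations:                         # step 17
        commands.append(edge(entry["Relation"], "ChemGroup", entry["Group"],
                             "Disease", entry["Disease"]))
    for entry in Dis_Group_Relations:                          # step 18
        commands.append(edge(entry["Relation"], "Chemical", entry["Chemical"],
                             "DisGroup", entry["Group"]))
    return commands
```

Generating the statements separately from executing them makes it possible to inspect or log the full load before any data is written.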

Development of Server Application
The server application is developed using the Flask framework for Python. The server-side
code does the following –
1. Connects to the OrientDB server using the root user credentials.
2. Opens the database named ‘meddb’.
3. Creates an instance of the Flask application and enables CORS.
4. Defines the following routes –
A. ‘/’
This route is the route of the home page of the application. The server
runs a function that renders the ‘index.html’ file when this route is
called.
B. ‘/getEntities’
The server runs a function that gets the list of all chemicals and
diseases from the Chemical and Disease classes of the OrientDB database
and returns them as a single list.
The following queries are used to fetch the data –
select name from Chemical
select name from Disease
C. ‘/getdata’
The server runs a function that does the following –
i. Extract entity type (ent_type) and entity name (ent_name) from
the POST request JSON data.
ii. Initialize a response dictionary with keys for type, name, induce,
treatment, association, related, and groups.
iii. Depending on the entity type (chemical or disease):
For chemical type:
a) Fetch and store chemical group memberships, induced
diseases, treatments, associations, and related chemicals.
b) The following queries are executed by the server to fetch the
data –
select * from Member where chemical = '{ent_name}'

select * from Induce where chemgroup in (select chemgroup
from Member where chemical = '{ent_name}')

select * from Treatment where chemgroup in (select chemgroup
from Member where chemical = '{ent_name}')

select * from Association where chemgroup in (select chemgroup
from Member where chemical = '{ent_name}')

select * from Member where chemgroup in {chem_groups}

For disease type:
a) Fetch and store disease group memberships, inducing chemicals,
treatments, associations, and related diseases.
b) The following queries are executed by the server to fetch the data –
select * from Member where disease = '{ent_name}'

select * from Induce where disgroup in (select disgroup
from Member where disease = '{ent_name}')

select * from Treatment where disgroup in (select disgroup
from Member where disease = '{ent_name}')

select * from Association where disgroup in (select disgroup
from Member where disease = '{ent_name}')

select * from Member where disgroup in {dis_groups}

iv. Return the populated response dictionary.

D. ‘/getgraph’
i. Extract entity type (ent_type) and entity name (ent_name) from the
POST request JSON data.
ii. Initialize a response dictionary with keys for name and groups.
iii. Depending on the entity type (chemical or disease):
For chemical type:
Fetch from OrientDB and store chemical group memberships and
related information (induced diseases, treatments, associations).
The following query is run to get the list of all chemical groups the
chemical belongs to –
select * from Member where chemical = '{ent_name}'
The following queries are executed by the server for each chemical
group the chemical belongs to –
select * from Member where chemgroup = '{group}'
select * from Induce where chemgroup = '{group}'
select * from Treatment where chemgroup = '{group}'
select * from Association where chemgroup = '{group}'
For disease type:
Fetch and store disease group memberships and related information
(inducing chemicals, treatments, associations).
The following query is run to get the list of all disease groups the disease
belongs to –
select * from Member where disease = '{ent_name}'
The following queries are executed by the server for each disease group
the disease belongs to –
select * from Member where disgroup = '{group}'
select * from Induce where disgroup = '{group}'
select * from Treatment where disgroup = '{group}'
select * from Association where disgroup = '{group}'
iv. Return the populated response dictionary.
5. Run the server application.
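The query logic of the ‘/getdata’ route can be sketched independently of Flask and of the database driver. The sketch below is illustrative only: `run_query` is a hypothetical callable standing in for the OrientDB client call and is assumed to return a list of dictionaries, the function names are invented for this example, and the quote escaping in `sql_literal` is an assumption. It builds the same queries listed for the route, but escapes the entity name before placing it in the SQL text, since the name comes directly from the POST request.

```python
GROUP_FIELD = {"chemical": "chemgroup", "disease": "disgroup"}
RELATION_CLASSES = (("induce", "Induce"), ("treatment", "Treatment"),
                    ("association", "Association"))


def sql_literal(value: str) -> str:
    """Wrap a value in single quotes, escaping backslashes and quotes (assumed escaping style)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_queries(ent_type: str, ent_name: str) -> dict:
    """Return the membership and relation queries for a chemical or a disease."""
    group_field = GROUP_FIELD[ent_type]
    name = sql_literal(ent_name)
    subquery = f"(select {group_field} from Member where {ent_type} = {name})"
    queries = {"groups": f"select * from Member where {ent_type} = {name}"}
    for key, edge_class in RELATION_CLASSES:
        queries[key] = f"select * from {edge_class} where {group_field} in {subquery}"
    return queries


def related_query(ent_type: str, groups: list) -> str:
    """Query for all members of the groups the entity belongs to."""
    items = ", ".join(sql_literal(group) for group in groups)
    return f"select * from Member where {GROUP_FIELD[ent_type]} in [{items}]"


def get_data(ent_type: str, ent_name: str, run_query) -> dict:
    """Populate the response dictionary; run_query(sql) is a hypothetical DB call."""
    response = {"type": ent_type, "name": ent_name}
    for key, sql in build_queries(ent_type, ent_name).items():
        response[key] = run_query(sql)
    groups = [row[GROUP_FIELD[ent_type]] for row in response["groups"]]
    # Skip the last query when the entity belongs to no group.
    response["related"] = run_query(related_query(ent_type, groups)) if groups else []
    return response
```

In a Flask view, `get_data` would be called with the `ent_type` and `ent_name` values read from the request JSON, and its return value passed back as the JSON response. Passing `run_query` in as an argument also lets the function be exercised with a stub, without a running database.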

Development of Client Application


The client application is developed using HTML, CSS and JavaScript. HTML and CSS
define the layout and style of the application, while JavaScript adds the functionality.
The JavaScript code provides two features –
1. Display results in Q&A format –
To display results in Q&A form, the JavaScript code uses the following algorithm –
a) Get entity type and name from input field.
b) Create a request object with entity name and type as request body.
c) Make an API call to the flask server by sending a POST request.
d) The response obtained from the server has the following keys – name, type, induce,
treatment, association and related – with their corresponding values.
e) Get the container element.
f) For each key other than name and type, do the following –
a. Dynamically create a ‘p’ element and set its inner HTML to the
question corresponding to the key, based on the type of entity.
b. Dynamically create a ‘p’ element and set its inner HTML to the data
corresponding to the key.
c. Append both elements to the container element.
2. Display results in graph format –
To display results in graph form, the JavaScript code uses the following algorithm –
a) Get entity type and name from input field.
b) Create a request object with entity name and type as request body.
c) Make an API call to the Flask server by sending a POST request.
d) The response obtained from the server has the following keys – name, groups,
groupinduce, grouptreatment, groupassociation – with their corresponding values.
e) Get the container element.
f) Create an array nodeList and insert all the vertex/node elements which are
created using the response obtained from server. Vertex/node elements are
JavaScript objects with id and label and are created for each chemical, disease
and group.
g) Create an array edgeList and insert all the edge elements which are created
using the response obtained from server. Edge elements are JavaScript objects
with from, to and label keys and are created for each relation among chemicals,
diseases and groups.
h) Create two vis.DataSet objects by using nodeList and edgeList.
i) Create a network by using the two DataSet objects.
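Steps f) and g) of the graph algorithm are a plain data transformation, so they can be sketched outside the browser. The client itself is written in JavaScript; Python is used here only to match the server code, and the sketch is not the project's implementation. The shape of the response is an assumption: `groups` is taken to map each group name to a list of member names, and `groupinduce`, `grouptreatment` and `groupassociation` are taken to map each group name to a list of related entity names. The function name `build_graph` is hypothetical.

```python
def build_graph(response: dict):
    """Return (node_list, edge_list) in the id/label and from/to/label shape."""
    node_ids = {}
    node_list = []
    edge_list = []

    def node_id(label: str) -> int:
        # Create each node only once, so an entity shared by several groups
        # appears as a single vertex in the graph.
        if label not in node_ids:
            node_ids[label] = len(node_ids) + 1
            node_list.append({"id": node_ids[label], "label": label})
        return node_ids[label]

    # Membership edges: entity -> group
    for group, members in response["groups"].items():
        for member in members:
            edge_list.append({"from": node_id(member), "to": node_id(group),
                              "label": "member"})

    # Relation edges: group -> related entity
    for key, label in (("groupinduce", "induce"),
                       ("grouptreatment", "treatment"),
                       ("groupassociation", "association")):
        for group, targets in response.get(key, {}).items():
            for target in targets:
                edge_list.append({"from": node_id(group), "to": node_id(target),
                                  "label": label})
    return node_list, edge_list
```

The two lists correspond to nodeList and edgeList in the steps above. Note that this sketch identifies nodes by label alone, so a chemical and a disease with the same name would be merged into one node.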

Screenshots

APPENDIX

Steps to run the Application

Install OrientDB and set the environment variable ORIENTDB_HOME.
Open command prompt.
Run the following sequence of commands –
cd %ORIENTDB_HOME%\bin
server.bat (It will start the OrientDB server)
Go to the project directory and open a terminal.
Run – python meddb.py (It will create and populate the database)
Run – python server.py (It will start the Flask server)
Open a web browser and enter ‘http://127.0.0.1:5000’

REFERENCES

1. https://orientdb.org/
2. https://flask.palletsprojects.com/en/3.0.x/
3. https://machinelearningmastery.com/the-transformer-model/
4. https://www.techtarget.com/searchenterpriseai/definition/BERT-language-model#:~:text=BERT%20language%20model%20is%20an,surrounding%20text%20to%20establish%20context.
5. https://academic.oup.com/bib/article/23/5/bbac282/6645993
6. https://www.analyticsvidhya.com/blog/2022/11/comprehensive-guide-to-bert/
