Dec 19, 2023

Graph Neural Network Framework for Web-Based Prediction of Protein-Ligand Docking Scores across multiple organs
DOI: dx.doi.org/10.17504/protocols.io.j8nlkoy9xv5r/v1

Anagha S Setlur¹, Vidya Niranjan¹, Arjun Balaji², Chandrashekar K¹

¹Department of Biotechnology, RV College of Engineering, Bangalore 560059, affiliated to Visvesvaraya Technological University (VTU), Belagavi 590018;
²Department of Electronics & Telecommunication, RV College of Engineering, Bangalore 560059, affiliated to Visvesvaraya Technological University (VTU), Belagavi 590018

Vidya Niranjan
R V College of Engineering


Protocol Citation: Anagha S Setlur, Vidya Niranjan, Arjun Balaji, Chandrashekar K 2023. Graph Neural Network Framework for Web-Based Prediction of Protein-Ligand Docking Scores across multiple organs. protocols.io https://dx.doi.org/10.17504/protocols.io.j8nlkoy9xv5r/v1

License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Protocol status: Working

We use this protocol and it's working.

Created: December 15, 2023

Last Modified: December 19, 2023

Protocol Integer ID: 92373

Keywords: QSAR, machine learning and deep learning, graph convolution networks, graph neural networks, data pre-processing,
human organs, web-based predictions

protocols.io | https://dx.doi.org/10.17504/protocols.io.j8nlkoy9xv5r/v1 December 19, 2023


Abstract
Estimating the docking score between proteins and drugs is central to structure-based drug design. This project explores the application of Graph Neural Networks (GNNs) to molecular property prediction from SMILES representations; the trained models are then deployed on a web-based platform for broader accessibility and use. The primary dataset used in this study contains molecules identified by MolPort IDs together with their docking scores, which are critical for assessing molecular interactions. A significant part of this project is data preprocessing, in which each molecule, initially represented as a SMILES string, is converted into a graph. Effective molecular representation learning is pivotal for molecular property prediction. The models are then evaluated on several performance metrics and deployed on the web-based platform.

Guidelines
QSAR modeling should first be performed for each protein in each organ. The analytical dataset should preferably contain the ligand IDs and their SMILES structures as columns.

Safety warnings

NA

Ethics statement
None.

Before start
Check that your system can run the data pre-processing and the GCN/hybrid GCN models.



IMPORTING LIBRARIES

1 Import all necessary libraries


Install and import all the libraries needed for data preprocessing as well as for model training and evaluation. A screenshot of the required imports is provided below.

Importing required libraries
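The screenshot of imports is not reproduced in this copy. A minimal sketch for confirming the environment before starting is to check which packages can be imported. RDKit and Streamlit are named in this protocol; the other entries (`numpy`, `pandas`, `torch`, `torch_geometric`, `sklearn`) are assumptions about a typical GNN stack, not the authors' actual list.

```python
import importlib.util

# Hypothetical dependency list: rdkit and streamlit are named in this protocol,
# the remaining packages are assumptions about a typical GNN setup.
REQUIRED_PACKAGES = ["rdkit", "numpy", "pandas", "torch", "torch_geometric", "sklearn", "streamlit"]


def check_packages(names):
    """Return {package_name: True/False} depending on whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_packages(REQUIRED_PACKAGES).items():
        print(f"{name:16s} {'OK' if ok else 'MISSING'}")
```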

DATASET CREATION

2 In this protocol, Quantitative Structure-Activity Relationship (QSAR) data generated with Schrödinger Maestro were used to create the dataset. QSAR models were first generated for specific proteins using a set of ligands from MolPort.
Taking the brain protein O14672 as an example, Y(Obs) is the docking score. This dataset contains the MolPort IDs and the docking scores obtained from the QSAR modeling data.



2.1 Creation of analytical dataset

An analytical dataset was then created by combining this dataset with a second dataset containing the MolPort IDs and their SMILES strings.
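A minimal sketch of this step, assuming both datasets are exported as CSV files and joined on the MolPort ID. The column names `MolPort ID`, `Y(Obs)` and `SMILES`, the function `build_analytical_dataset`, and the example IDs and scores are hypothetical placeholders; ligands present in only one file are dropped (an inner join).

```python
import csv
import io


def build_analytical_dataset(qsar_rows, smiles_rows, id_col="MolPort ID",
                             score_col="Y(Obs)", smiles_col="SMILES"):
    """Inner-join docking scores and SMILES strings on the MolPort ID.

    Column names are hypothetical placeholders; adjust them to the real exports.
    """
    smiles_by_id = {row[id_col].strip(): row[smiles_col].strip() for row in smiles_rows}
    dataset = []
    for row in qsar_rows:
        mol_id = row[id_col].strip()
        if mol_id in smiles_by_id:  # keep only ligands that have a SMILES string
            dataset.append({
                "molport_id": mol_id,
                "smiles": smiles_by_id[mol_id],
                "docking_score": float(row[score_col]),
            })
    return dataset


# Tiny made-up inputs for illustration only (placeholder IDs and scores).
qsar_csv = "MolPort ID,Y(Obs)\nMolPort-000-000-001,-7.2\nMolPort-000-000-002,-5.9\n"
smiles_csv = "MolPort ID,SMILES\nMolPort-000-000-001,CCO\nMolPort-000-000-003,c1ccccc1\n"

rows = build_analytical_dataset(csv.DictReader(io.StringIO(qsar_csv)),
                                csv.DictReader(io.StringIO(smiles_csv)))
print(rows)
```

With pandas available, `pandas.merge(..., on="MolPort ID", how="inner")` would perform the same join on DataFrames.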



The following is performed to prepare an analytical dataset:

The processed dataset looks as follows:

DATA PRE-PROCESSING



3 SMILES to graph conversion
Data preprocessing is a pivotal step in this workflow. Each molecule, represented by a SMILES string, is converted into a graph with atoms as nodes and chemical bonds as edges. This graph representation is what allows the GNN to interpret molecular structures accurately.
Feature Representation
● Atom Features: Each atom is represented by a one-hot encoded feature vector indicating the atom type. The model considers four atom types (C, O, N, B), giving a 4-dimensional feature vector for each atom.
● Bond Features: Bonds are characterized by their type (single, double, triple, aromatic) and whether they are part of a ring structure. Each bond is therefore represented by a 5-dimensional feature vector (four bond-type entries plus one ring flag).

| Feature | Dimensions |
|---|---|
| One-hot encoding of atom types (C, O, N, B) | 4 |
| Edge features for bond types (single, double, triple, aromatic) | 4 |
| Edge feature for bond presence in a ring structure | 1 |
| Atom feature for atom presence in a ring structure | 1 |
| Bond indices for atom connectivity | 2 per bond |
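The feature layout described above can be sketched in plain Python, independently of any chemistry toolkit. This sketch follows the bullet list (4-dimensional atom vectors, 5-dimensional bond vectors). Mapping elements outside (C, O, N, B) to an all-zero vector is an assumption, since the protocol does not say how other elements are handled; the function names are hypothetical.

```python
# Feature sizes follow the protocol text: 4-dim one-hot atom type,
# 4-dim one-hot bond type + 1 ring flag = 5-dim bond vector.
ATOM_TYPES = ["C", "O", "N", "B"]
BOND_TYPES = ["SINGLE", "DOUBLE", "TRIPLE", "AROMATIC"]


def one_hot(value, choices):
    """Return a one-hot list; values not in `choices` give all zeros (assumption)."""
    return [1.0 if value == choice else 0.0 for choice in choices]


def atom_features(symbol):
    """4-dim one-hot vector over the atom types (C, O, N, B)."""
    return one_hot(symbol, ATOM_TYPES)


def bond_features(bond_type, in_ring):
    """4-dim one-hot bond type followed by a 1-dim ring-membership flag."""
    return one_hot(bond_type, BOND_TYPES) + [1.0 if in_ring else 0.0]


# Ethanol (SMILES "CCO") written out by hand: atoms C, C, O and two single bonds.
x = [atom_features(s) for s in ["C", "C", "O"]]
e = [bond_features("SINGLE", False), bond_features("SINGLE", False)]
print(len(x[0]), len(e[0]))  # prints: 4 5
```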

3.1 Using the RDKit library for feature representation

These features are computed with the RDKit library. The featurization function converts a SMILES string into a molecular graph, one-hot encoding the atom types and representing each bond by its type and ring membership.



Using RDKit for feature representation
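The RDKit function itself appears only as a screenshot. A hedged sketch of an equivalent conversion uses RDKit's `Chem.MolFromSmiles` and its standard atom and bond accessors; it requires RDKit to be installed (for example with `pip install rdkit`). The name `smiles_to_graph`, the list-based output, and storing each bond in both directions are assumptions. The table's atom ring flag could be appended to each atom vector with `atom.IsInRing()` if required.

```python
from rdkit import Chem

ATOM_TYPES = ["C", "O", "N", "B"]
BOND_TYPES = [
    Chem.BondType.SINGLE,
    Chem.BondType.DOUBLE,
    Chem.BondType.TRIPLE,
    Chem.BondType.AROMATIC,
]


def smiles_to_graph(smiles):
    """Convert a SMILES string into (node_features, edge_index, edge_features).

    Each bond is stored in both directions (an assumption commonly used for
    message passing); the protocol does not specify directionality.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")

    # 4-dim one-hot atom type; hydrogens are implicit and therefore not nodes.
    node_features = [
        [1.0 if atom.GetSymbol() == t else 0.0 for t in ATOM_TYPES]
        for atom in mol.GetAtoms()
    ]

    edge_index, edge_features = [], []
    for bond in mol.GetBonds():
        i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        feat = [1.0 if bond.GetBondType() == t else 0.0 for t in BOND_TYPES]
        feat.append(1.0 if bond.IsInRing() else 0.0)  # ring-membership flag
        edge_index += [(i, j), (j, i)]
        edge_features += [feat, feat]
    return node_features, edge_index, edge_features


print(smiles_to_graph("CCO"))
```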

MODEL TRAINING AND EVALUATION

4 Model definition and training

Define the models and train them with early stopping and appropriate hyperparameters.
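Early stopping can be sketched as a small helper that tracks the best validation loss seen so far. The class name `EarlyStopping` and the `patience` and `min_delta` values are hypothetical; the protocol does not state which stopping criterion or settings were used, and the loss sequence below is made up.

```python
class EarlyStopping:
    """Stop training when the validation loss has not improved for `patience` epochs.

    `patience` and `min_delta` are hypothetical defaults, not the authors' settings.
    """

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.epochs_without_improvement = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Example with a made-up sequence of validation losses.
stopper = EarlyStopping(patience=3)
losses = [1.00, 0.80, 0.70, 0.72, 0.71, 0.75, 0.60]
for epoch, loss in enumerate(losses):
    if stopper.step(epoch, loss):
        print(f"stopping at epoch {epoch}; best epoch {stopper.best_epoch}")
        break
```

In a real training loop, the model weights would typically be saved whenever `best_epoch` changes, so that the best checkpoint can be restored after stopping.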

4.1 MODEL 1- GRAPH CONVOLUTION NETWORK (GCN)


The first model we explore is a Graph Convolution Network (GCN) with 2 convolution layers.
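The original GCN code is not reproduced in this copy. As a framework-free illustration, the sketch below applies the standard symmetric-normalised GCN propagation rule, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), for two layers, followed by mean pooling and a linear head that outputs a docking score. The hidden size, ReLU activation, mean pooling, function names and random initialisation are assumptions, and training (for example minimising mean squared error) is omitted.

```python
import numpy as np


def normalized_adjacency(num_nodes, edge_index):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected graph given as (i, j) pairs."""
    a = np.eye(num_nodes)                      # self-loops
    for i, j in edge_index:
        a[i, j] = a[j, i] = 1.0                # symmetric adjacency
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_predict(x, edge_index, params):
    """Two GCN layers (ReLU), mean pooling over atoms, then a linear regression head."""
    a_hat = normalized_adjacency(x.shape[0], edge_index)
    h = np.maximum(a_hat @ x @ params["w1"], 0.0)   # graph convolution 1
    h = np.maximum(a_hat @ h @ params["w2"], 0.0)   # graph convolution 2
    graph_vector = h.mean(axis=0)                    # permutation-invariant readout
    return float(graph_vector @ params["w_out"] + params["b_out"])


def init_params(in_dim=4, hidden=16, seed=0):
    """Random, untrained weights; in practice these would be learned."""
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(scale=0.5, size=(in_dim, hidden)),
        "w2": rng.normal(scale=0.5, size=(hidden, hidden)),
        "w_out": rng.normal(scale=0.5, size=hidden),
        "b_out": 0.0,
    }


# Ethanol (CCO): atoms C, C, O with the 4-dim one-hot encoding (C, O, N, B).
x = np.array([[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
edges = [(0, 1), (1, 2)]
print(gcn_predict(x, edges, init_params()))
```

Mean pooling makes the prediction independent of atom ordering, so relabelling the atoms of a molecule does not change its predicted score.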



4.2 MODEL 2- HYBRID GCN
The second model we explore is a hybrid GCN model:
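The conclusion of this protocol describes the hybrid model as a GCN combined with an attention mechanism that uses positional encoding. The exact architecture is not given, so the sketch below shows only one plausible arrangement, and every design choice in it is an assumption: one GCN layer, Transformer-style sinusoidal positional encoding indexed by atom order in the SMILES string, single-head scaled dot-product self-attention, mean pooling, and a linear head. All names and weights are hypothetical.

```python
import numpy as np


def normalized_adjacency(num_nodes, edge_index):
    """D^-1/2 (A + I) D^-1/2 for an undirected graph."""
    a = np.eye(num_nodes)
    for i, j in edge_index:
        a[i, j] = a[j, i] = 1.0
    d = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d[:, None] * d[None, :]


def positional_encoding(num_positions, dim):
    """Sinusoidal encoding indexed by atom order in the SMILES (an assumption)."""
    pos = np.arange(num_positions)[:, None]
    i = np.arange(dim)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))


def self_attention(h, wq, wk, wv):
    """Single-head scaled dot-product attention; returns outputs and weights."""
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ v, weights


def hybrid_predict(x, edge_index, params):
    """GCN layer, add positional encoding, self-attention, mean pool, linear head."""
    a_hat = normalized_adjacency(x.shape[0], edge_index)
    h = np.maximum(a_hat @ x @ params["w_gcn"], 0.0)
    h = h + positional_encoding(*h.shape)
    h, _ = self_attention(h, params["wq"], params["wk"], params["wv"])
    return float(h.mean(axis=0) @ params["w_out"] + params["b_out"])


rng = np.random.default_rng(0)
hidden = 8
params = {  # random, untrained weights for illustration
    "w_gcn": rng.normal(scale=0.5, size=(4, hidden)),
    "wq": rng.normal(scale=0.5, size=(hidden, hidden)),
    "wk": rng.normal(scale=0.5, size=(hidden, hidden)),
    "wv": rng.normal(scale=0.5, size=(hidden, hidden)),
    "w_out": rng.normal(scale=0.5, size=hidden),
    "b_out": 0.0,
}
x = np.array([[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)  # ethanol, CCO
print(hybrid_predict(x, [(0, 1), (1, 2)], params))
```

Because the positional encoding is indexed by atom order, the output of this sketch depends on how atoms are ordered in the SMILES string; using canonical SMILES keeps that ordering consistent.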



4.3 5-fold cross-validation
5-fold cross-validation was used during training to improve the robustness and reliability of the models. This method ensures a comprehensive evaluation by systematically partitioning the data into distinct subsets for training and validation.
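5-fold partitioning can be sketched without external libraries: shuffle the sample indices once, split them into five nearly equal folds, and use each fold once for validation while training on the other four. The function name `k_fold_indices` and the fixed seed are illustrative assumptions; scikit-learn's `KFold` (with `shuffle=True`) provides similar behaviour.

```python
import random


def k_fold_indices(n_samples, k=5, seed=42):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    if k < 2 or n_samples < k:
        raise ValueError("need k >= 2 and at least k samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(23, k=5))
print([len(val) for _, val in folds])
```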

The models' performance was further evaluated using Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE), which provide insight into predictive accuracy and overall performance.
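For true docking scores y and predictions y_pred over n molecules, RMSE = sqrt(mean((y - y_pred)^2)) and MAE = mean(|y - y_pred|). A minimal sketch of both metrics follows, with made-up example values. RMSE penalises large errors more heavily than MAE, so reporting both helps reveal occasional large mispredictions.

```python
import math


def _check(y_true, y_pred):
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(y_true, y_pred):
    """Root mean squared error."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up docking scores for illustration.
y_true = [-7.0, -6.0, -8.0]
y_pred = [-6.5, -6.0, -9.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred))
```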

PICKING THE BEST MODEL AND UPLOADING IN REPOSITORY

5 The best-performing model was selected and its weights were saved. These weights were then uploaded to the Streamlit repository.
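Selecting and saving the best model can be sketched as follows, assuming "best" means the lowest mean validation RMSE across folds and that weights are stored as NumPy arrays in an `.npz` file. The function names, the placeholder scores, and the file naming are hypothetical, and how the saved file is then added to the Streamlit repository is not shown.

```python
import os
import tempfile

import numpy as np


def select_best(cv_results, metric="rmse"):
    """Return the model name with the lowest mean validation metric across folds."""
    return min(cv_results, key=lambda name: np.mean([fold[metric] for fold in cv_results[name]]))


def save_weights(path, params):
    """Save a dict of NumPy arrays/scalars to a single .npz file."""
    np.savez(path, **params)


def load_weights(path):
    """Load weights saved by save_weights back into a plain dict."""
    with np.load(path) as data:
        return {key: data[key] for key in data.files}


# Placeholder cross-validation scores (made up for illustration).
cv_results = {
    "gcn": [{"rmse": 1.10}, {"rmse": 0.95}],
    "hybrid_gcn": [{"rmse": 0.90}, {"rmse": 0.98}],
}
best = select_best(cv_results)
weights = {"w_out": np.arange(4, dtype=float), "b_out": np.array(0.5)}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, f"{best}_weights.npz")
    save_weights(path, weights)
    restored = load_weights(path)
print(best, restored["w_out"])
```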

These same steps were repeated across different proteins, datasets and models to
integrate all models from each human organ into a single platform.
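Integrating the models from every organ into one platform implies some lookup from the user's choices to a trained model. A minimal sketch, assuming models are keyed by (organ, protein, model name): `ModelRegistry` and the dummy predictors returning fixed scores are hypothetical stand-ins for the trained GCN and hybrid GCN models, and O14672 is used only because it is the brain example named in this protocol.

```python
from typing import Callable, Dict, Tuple

Predictor = Callable[[str], float]  # takes a SMILES string, returns a docking score


class ModelRegistry:
    """Map (organ, protein, model_name) to a predictor (hypothetical design)."""

    def __init__(self):
        self._models: Dict[Tuple[str, str, str], Predictor] = {}

    def register(self, organ, protein, model_name, predictor):
        self._models[(organ, protein, model_name)] = predictor

    def options(self, organ):
        """List the (protein, model_name) pairs available for one organ."""
        return sorted((p, m) for (o, p, m) in self._models if o == organ)

    def predict(self, organ, protein, model_name, smiles):
        key = (organ, protein, model_name)
        if key not in self._models:
            raise KeyError(f"No model registered for {key}")
        return self._models[key](smiles)


registry = ModelRegistry()
# Dummy predictors stand in for trained models (hypothetical fixed outputs).
registry.register("brain", "O14672", "GCN", lambda smiles: -6.0)
registry.register("brain", "O14672", "Hybrid GCN", lambda smiles: -6.5)
print(registry.options("brain"))
print(registry.predict("brain", "O14672", "Hybrid GCN", "CCO"))
```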

CONCLUSION

6 This protocol outlines the steps required to integrate all predicted QSAR data from each organ into a single, all-in-one platform covering all human organs and their associated proteins, enabling users to provide a SMILES structure and obtain a predicted docking score from the integrated models. Data pre-processing is the primary step in this protocol: an analytical dataset is created and its SMILES strings are converted into graphs. Model 1 is a graph convolution network (GCN), an advanced deep learning technique that maps high-dimensional graph data to a low-dimensional representation and relates it to the target variable (in this case, the docking score). Model 2, the hybrid model, adds an attention mechanism with positional encoding on top of the traditional GCN. The web application lets users choose which model to use for their prediction. This protocol thus enables direct binding affinity predictions of small molecules against important proteins in human organs, thereby providing overall safety information on those small molecules.

ACKNOWLEDGEMENTS

7 The authors thank Mr. Akshay Uttarkar for providing inputs throughout.




