Graph Neural Network Framework for Web-Based Prediction of Protein-Ligand Docking Scores across multiple organs
Vidya Niranjan
R V College of Engineering
DOI: dx.doi.org/10.17504/protocols.io.j8nlkoy9xv5r/v1
Protocol Citation: Anagha S Setlur, Vidya Niranjan, Arjun Balaji, Chandrashekar K 2023. Graph Neural Network Framework for Web-
Based Prediction of Protein-Ligand Docking Scores across multiple organs. protocols.io
https://dx.doi.org/10.17504/protocols.io.j8nlkoy9xv5r/v1
License: This is an open access protocol distributed under the terms of the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Keywords: QSAR, machine learning and deep learning, graph convolution networks, graph neural networks, data pre-processing,
human organs, web-based predictions
Guidelines
QSAR modeling should first be performed for each protein under each organ. The analytical dataset should
preferably contain the ligand IDs and the SMILES strings as columns.
Safety warnings
NA
Ethics statement
None.
Before start
Check that the system is able to run the data pre-processing and the GCN/hybrid GCN models.
DATASET CREATION
2 In the present scenario, Quantitative Structure-Activity Relationship (QSAR) data generated
with Schrödinger Maestro were used for dataset creation. QSAR models were first
generated for specific proteins using a set of ligands taken from MolPort.
As an example, take the protein O14672 under Brain. Here, Y(Obs) is the docking score. This
dataset contains the MolPort IDs and the docking scores obtained from the QSAR modeling data.
Using a second dataset containing the MolPort IDs and the SMILES strings, an analytical
dataset was created.
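
The combination of the two datasets can be sketched as a join on the ligand ID. This is a minimal illustration using only the Python standard library, not the authors' actual code. The column names (`molport_id`, `y_obs`, `smiles`), the sample IDs and scores, and the helper `build_analytical_dataset` are all assumptions made for the example.

```python
import csv
import io

# Hypothetical sample data: IDs, scores and column names are invented for illustration.
SCORES_CSV = """molport_id,y_obs
MolPort-000-000-001,-5.2
MolPort-000-000-002,-6.8
MolPort-000-000-003,-4.1
"""

SMILES_CSV = """molport_id,smiles
MolPort-000-000-001,CCO
MolPort-000-000-002,c1ccccc1
MolPort-000-000-004,CC(=O)O
"""


def build_analytical_dataset(scores_text, smiles_text):
    """Inner-join docking scores and SMILES strings on the ligand ID."""
    smiles_by_id = {
        row["molport_id"]: row["smiles"]
        for row in csv.DictReader(io.StringIO(smiles_text))
    }
    dataset = []
    for row in csv.DictReader(io.StringIO(scores_text)):
        ligand_id = row["molport_id"]
        if ligand_id in smiles_by_id:  # keep only ligands present in both tables
            dataset.append({
                "molport_id": ligand_id,
                "smiles": smiles_by_id[ligand_id],
                "docking_score": float(row["y_obs"]),
            })
    return dataset


analytical = build_analytical_dataset(SCORES_CSV, SMILES_CSV)
for record in analytical:
    print(record)
```

Ligands that appear in only one of the two files are dropped here, so every remaining row has both a SMILES string and a docking score. Whether unmatched ligands should instead be kept and flagged is a design choice the protocol does not specify.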
DATA PRE-PROCESSING
[Figure: panels A and B. Table: Feature, Dimensions.]
The model's performance was further evaluated using metrics like Root Mean Squared Error
(RMSE) and Mean Absolute Error (MAE), providing insights into its predictive accuracy and
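
For reference, the two metrics named above can be computed as in the short sketch below. The observed and predicted docking scores are invented values used only to show the arithmetic, and the function names `rmse` and `mae` are chosen for this example.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted scores."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mae(observed, predicted):
    """Mean absolute error between observed and predicted scores."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Invented docking scores, for illustration only.
observed_scores = [-5.0, -6.0, -7.0, -8.0]
predicted_scores = [-5.5, -6.0, -6.0, -8.5]

rmse_value = rmse(observed_scores, predicted_scores)
mae_value = mae(observed_scores, predicted_scores)
print(f"RMSE = {rmse_value:.4f}, MAE = {mae_value:.4f}")
```

RMSE penalises large individual errors more heavily than MAE, so reporting both gives a sense of whether the error is spread evenly or dominated by a few poorly predicted ligands.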
5 The best-performing model was selected and its weights were saved. These weights were
then uploaded to the Streamlit repository.
The same steps were repeated across different proteins, datasets and models to
integrate all models from each human organ into a single platform.
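
A minimal sketch of the select-and-save step is given below. The protocol does not state the selection criterion or the file format, so two assumptions are made here: the model with the lowest validation RMSE is taken as the best, and the weights are written to a JSON file. The model names, RMSE values, weight matrices and helper functions are all hypothetical.

```python
import json
import os
import tempfile

# Hypothetical validation results: names, errors and weights are invented.
candidates = {
    "gcn": {"rmse": 0.61, "weights": [[0.1, -0.2], [0.3, 0.4]]},
    "hybrid_gcn": {"rmse": 0.48, "weights": [[0.2, -0.1], [0.5, 0.3]]},
}


def select_best(results):
    """Return the name of the candidate with the lowest RMSE (assumed criterion)."""
    return min(results, key=lambda name: results[name]["rmse"])


def save_weights(weights, path):
    """Write the weights to a JSON file (assumed format)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(weights, handle)


best_name = select_best(candidates)
weights_path = os.path.join(tempfile.mkdtemp(), f"{best_name}_weights.json")
save_weights(candidates[best_name]["weights"], weights_path)
print(f"Saved weights of '{best_name}' to {weights_path}")
```

In practice a deep learning framework's own serialisation would be used in place of JSON. The point of the sketch is only that one weight file per protein and model is produced, which the web application can then load.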
CONCLUSION
6 This protocol outlines the steps required to integrate the predicted QSAR data from each organ
into a single, all-in-one platform covering all human organs and their associated proteins, so
that users can provide a SMILES structure and obtain a predicted docking score after
mapping with the integrated models. Data pre-processing is the primary step in this
protocol, followed by creation of the analytical dataset for conversion into graphs. Model 1
is a graph convolution network (GCN), an advanced deep learning technique in which
high-dimensional data are converted to low-dimensional data and the
graphs are correlated to the target variable (in this case, docking scores). Model 2, the
hybrid model, adds an attention mechanism that
employs positional encoding alongside the traditional GCN. The web application allows users
to choose which model to use for their prediction. This protocol allows direct
binding affinity predictions of small molecules to important proteins in the human organs,
thereby providing overall safety information on the small molecules.
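
To make the graph convolution idea concrete, the sketch below applies one GCN layer to a toy molecular graph and pools the result into a single graph-level vector. It uses the commonly used symmetric normalisation D^-1/2 (A + I) D^-1/2 of the adjacency matrix and is not confirmed to be the architecture used in this protocol. The hand-built graph for ethanol (SMILES "CCO"), the two-element atom features and the weight values are assumptions chosen for illustration.

```python
import math

# Toy molecular graph for ethanol (SMILES "CCO"): atoms C0-C1-O2, hydrogens omitted.
EDGES = [(0, 1), (1, 2)]
# One-hot node features [is_carbon, is_oxygen]: an assumed, deliberately tiny feature set.
FEATURES = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
# Arbitrary illustrative weights (2 input features -> 2 hidden features).
WEIGHTS = [[0.5, -0.3], [0.2, 0.8]]


def normalized_adjacency(num_nodes, edges):
    """Return D^-1/2 (A + I) D^-1/2 as a dense list of lists."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_nodes)] for i in range(num_nodes)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0  # undirected bond
    degree = [sum(row) for row in adj]  # degrees include the self-loop
    return [
        [adj[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def matmul(a, b):
    """Plain dense matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(a_hat, features, weights):
    """One graph convolution: ReLU(A_hat @ X @ W)."""
    hidden = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, value) for value in row] for row in hidden]


def mean_pool(node_embeddings):
    """Average the node embeddings into a single graph-level vector."""
    count = len(node_embeddings)
    return [sum(column) / count for column in zip(*node_embeddings)]


a_hat = normalized_adjacency(len(FEATURES), EDGES)
hidden = gcn_layer(a_hat, FEATURES, WEIGHTS)
graph_vector = mean_pool(hidden)
print(graph_vector)
```

In a full model, several such layers would be stacked and the pooled graph vector passed to a regression head that outputs the docking score. The attention mechanism and positional encoding of the hybrid model are not shown here because the protocol does not describe their form.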
ACKNOWLEDGEMENTS
7 The authors thank Mr. Akshay Uttarkar for providing inputs throughout.
3. Schrödinger Release 2023-4: DeepAutoQSAR, Schrödinger, LLC, New York, NY, 2023.
4. https://www.molport.com/shop/index
5. Wu F, Souza A, Zhang T, Fifty C, Yu T, Weinberger K. Simplifying graph convolutional networks. International conference
on machine learning 2019 May 24 (pp. 6861-6871). PMLR.
6. Javeed A. A hybrid attention mechanism for multi-target entity relation extraction using graph neural networks. Machine
Learning with Applications. 2023 Mar 15;11:100444.