POAP Docking Protocol

The document describes POAP, a GNU Parallel based pipeline that enables highly optimized parallelization of Open Babel and AutoDock suite for high throughput virtual screening. POAP features include: 1) A unique ligand preparation module that offers extensive parameterization and conformer generation options along with parallelization and error handling. 2) Multi-receptor docking capability for comparative virtual screening and drug repurposing studies. 3) High scalability and optimal CPU utilization, leading to significant reduction in computational time for large virtual screening tasks.

Accepted Manuscript

Title: POAP: A GNU Parallel based multithreaded pipeline of Open Babel and AutoDock suite for boosted High Throughput Virtual Screening

Authors: A. Samdani, Umashankar Vetrivel

PII: S1476-9271(17)30575-3
DOI: https://doi.org/10.1016/j.compbiolchem.2018.02.012
Reference: CBAC 6795

To appear in: Computational Biology and Chemistry

Received date: 16-8-2017


Revised date: 26-12-2017
Accepted date: 14-2-2018

Please cite this article as: Samdani, A., Vetrivel, Umashankar, POAP: A GNU
Parallel based multithreaded pipeline of Open Babel and AutoDock suite for
boosted High Throughput Virtual Screening. Computational Biology and Chemistry
https://doi.org/10.1016/j.compbiolchem.2018.02.012

This is a PDF file of an unedited manuscript that has been accepted for publication.
As a service to our customers we are providing this early version of the manuscript.
The manuscript will undergo copyediting, typesetting, and review of the resulting proof
before it is published in its final form. Please note that during the production process
errors may be discovered which could affect the content, and all legal disclaimers that
apply to the journal pertain.
POAP: A GNU Parallel based multithreaded pipeline of Open Babel and AutoDock suite for boosted High Throughput Virtual Screening

Samdani A1,2, Umashankar Vetrivel1*

1 Centre for Bioinformatics, Kamalnayan Bajaj Institute for Research in Vision and Ophthalmology, Vision Research Foundation, Sankara Nethralaya, Chennai - 600 006, Tamil Nadu, India.
2 School of Chemical and Biotechnology, SASTRA University, Thanjavur, India.

* Corresponding author Email: [email protected], [email protected]

Dr.V.Umashankar,
HOD & Principal Scientist
Centre for Bioinformatics,
Kamalnayan Bajaj Institute for
Research in Vision and Ophthalmology,
Vision Research Foundation,
Sankara Nethralaya,
Chennai - 600 006,
Tamil Nadu, India.

Running title: Parallelized Open Babel & AutoDock suite Pipeline

Graphical abstract

Highlights:

• POAP is a GNU Parallel based pipeline that enables optimally parallelized HTVS run by coupling Open Babel and AutoDock suite
• The Ligand preparation option in POAP is unique offering extensive parameter feeding and well optimized parallelization
• POAP provides high scalability and optimal usage of CPU cores leading to significant reduction of computational time
• POAP features multi receptor docking and comparative analysis enabling drug repurposing studies

Abstract

High throughput virtual screening plays a crucial role in hit identification during the drug discovery process. With the rapid growth of chemical libraries, the virtual screening process becomes computationally challenging, thereby creating a demand for efficiently parallelized software pipelines. Here we present a GNU Parallel based pipeline, POAP, that is programmed to run Open Babel and the AutoDock suite under highly optimized parallelization. The ligand preparation module is a unique feature of POAP, as it offers extensive options for geometry optimization, conformer generation and parallelization, and also quarantines erroneous datasets for seamless operation. POAP also features multi receptor docking that can be utilized for comparative virtual screening and drug repurposing studies. As demonstrated using different structural datasets, POAP proves to be an efficient pipeline that enables high scalability, seamless operability, dynamic file handling and optimal utilization of CPUs for computationally demanding tasks. POAP is distributed freely under the GNU GPL license and can be downloaded at https://github.com/inpacdb/POAP

Keywords

GNU Parallel; AutoDock; Open Babel; Virtual screening; Parallel processing; ligand preparation

1. Introduction

Drug discovery and development has undergone phenomenal changes over the years. Computer aided drug design strategies like molecular modelling and structure-based virtual screening have been the reason for this progressive change. Moreover, discovery of new therapeutic moieties in a swift and cost effective manner is the need of the hour, and can only be achieved by implementing robust computational approaches. Continual advancement in the application of computational methods to biological and chemical space has strongly influenced the modern drug development chain (Rahman et al., 2012). Computational tools are widely applied for predicting hit molecules against the target of interest, and many such predictions have proven to be highly accurate on experimental validation (Kuhn et al., 2016; Sliwoski et al., 2014; Xia, 2017). In the current scenario, structure based drug design involving structural refinement, molecular docking and virtual screening has become an indispensable part of the drug discovery process (Ferreira et al., 2015; Kalyani G, 2013; Śledź and Caflisch, 2017).

The “Open source” concept has revolutionized the software industry worldwide. A ten-point standard was announced by the Open Source Initiative to define the term “open source” (Årdal and Røttingen, 2012). Among these ten points, three are considered to be significant: access to source code, free redistribution and creation of derived works (Årdal and Røttingen, 2012). Open source drug design software like AutoDock and Open Babel has played a major role in accelerating the drug discovery process (Umashankar, V. G. S., & Gurunathan, S., 2015) and also abides well by the key standards of the Open Source Initiative. However, the complete potential of these open source tools in high throughput drug discovery can only be unleashed by means of massive parallelization and proper placement in a computational pipeline. In general, efficiently parallelized and guided workflows are available only in expensive, commercially licensed software. Thus, an efficiently parallelized open source virtual screening pipeline will attract many scientists with limited resources to pursue virtual screening of large chemical libraries in an efficient way.

Recent benchmarking studies on free and commercial docking tools have shown AutoDock Vina to be an optimal performer in identifying the best ligand bound pose (Wang et al., 2016). Virtual screening of ligands using AutoDock and AutoDock Vina is extensively used for lead identification against target proteins. Many useful tools like PyRx, Raccoon, DOVIS, VSdocker, AUDocker LE and PyMOL plugins are available for performing virtual screening studies with AutoDock and AutoDock Vina (Chen, 2015, 2015; Lill and Danielson, 2011; Prakhov et al., 2010; Sandeep et al., 2011; Zhang et al., 2008). However, there is a need for pipelines that efficiently utilize the simple yet powerful GNU Parallel for parallelizing the complete virtual screening workflow, right from ligand preparation to post docking analysis. In particular, there is a dearth of open source pipelines which can handle the ligand preparation process in a parallelized manner. Inverse docking and multiple protein docking protocols have been proven to be powerful methods to assign the targets for the ligand of interest (Li et al., 2006; Medina-Franco et al., 2013). These protocols demand well parallelized pipelines capable of handling huge datasets.

Though there have been highly appreciable attempts to develop these sorts of open source pipelines, there are concerns with installation, configuration and guided workflow. Moreover, many such pipelines have not attempted to fully utilize the highly efficient GNU Parallel tool to parallelize the virtual screening process from ligand preparation through to post docking analysis.

Hence, this study attempts to develop a parallelized virtual screening pipeline, the Parallelized Open Babel & AutoDock suite Pipeline (POAP), which integrates popular tools like Open Babel, AutoDock, AutoDock Vina and AutoDockZN in an easily configurable bash shell based text interface. POAP offers modules for ligand preparation, single receptor virtual screening, multiple receptor virtual screening and consensus scoring. All these modules are engineered to run in a GNU Parallel based multi CPU environment. In POAP, well optimized dynamic file handling is also implemented, thereby enabling optimal RAM usage, quarantining of erroneous ligand datasets to facilitate unperturbed operation of the workflow, and structured accessibility of input, output and intermediary files. The developed pipeline demonstrates the effective usage of the GNU Parallel tool in the development of a complete virtual screening workflow.

2. Materials and Methods

POAP was developed in the bash programming language, integrating the most popular tools: Open Babel-2.4.0 for ligand optimization; AutoDock-4.2.6, AutoDock Vina-1.1.2 and AutoDockZn for virtual screening; and scripts from MGLTOOLS-1.5.7 (http://mgltools.scripps.edu/). The parallelized execution of the jobs was achieved by utilizing the GNU Parallel tool.

2.1 GNU Parallel


GNU Parallel is a command line tool that can be used to run jobs in parallel. It contains most of the options which are present in xargs. It is widely used to run the same command repeatedly over a given set of inputs. GNU Parallel executes jobs in parallel depending on the number of CPU threads assigned by the user. The usage of this tool enables complete and powerful utilization of CPU resources. Efficient file handling and parallel execution of processes make GNU Parallel a preferred tool (O. Tange, 2011).
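As a rough illustration of the job model described above (and not of GNU Parallel itself), the following Python sketch runs one command per input with a fixed number of concurrent jobs. The function name run_jobs is hypothetical, and the "{}" placeholder in the command template merely mirrors GNU Parallel's default replacement string; the stand-in command is the Python interpreter rather than a docking tool.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_jobs(template, inputs, jobs):
    """Run one command per input, at most `jobs` at a time.

    `template` is a list of arguments; every "{}" inside an argument is
    replaced by the current input. Returns (input, return code, stdout)
    tuples in input order.
    """
    def run_one(item):
        cmd = [arg.replace("{}", item) for arg in template]
        done = subprocess.run(cmd, capture_output=True, text=True)
        return item, done.returncode, done.stdout.strip()

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_one, inputs))


if __name__ == "__main__":
    # Stand-in command: the Python interpreter upper-casing its argument.
    template = [sys.executable, "-c",
                "import sys; print(sys.argv[1].upper())", "{}"]
    for item, code, out in run_jobs(template, ["lig1", "lig2", "lig3"], jobs=2):
        print(item, code, out)
```

Threads are sufficient here because each worker only waits on an external process; the number of workers plays the role of the user-assigned job count.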

2.2 Development of Ligand Preparation Module

Open Babel is a freely accessible cheminformatics tool that is used for ligand optimization, format conversion, 3D co-ordinate generation, feature based filtration etc. (O'Boyle et al., 2011b; O'Boyle et al., 2011a). However, this tool handles only one ligand at a time, committed to a single CPU. Hence, in the Ligand preparation module developed, Open Babel is parallelized using appropriate GNU Parallel flags for conversion of the ligands in SMILES or 2D co-ordinate files to a 3D co-ordinate file (pdbqt format). Moreover, ligand conformer generation methods such as Genetic Algorithm, Random rotor search, Weighted rotor search, Obconformer and Confab can also be invoked to produce the number of conformations chosen by the user. Furthermore, this module also provides options for fixing stereochemical errors and for energy minimization of the ligands with conjugate gradient or steepest descent methods.

By default, ligands with erroneous data are also processed by Open Babel, which results in archiving of erroneous, incomplete files. This might disrupt the ligand preparation workflow. Thus, during the ligand preparation process, these types of ligand data are identified and quarantined. At every step in the ligand preparation module, the user is prompted to enter the desired values, enabling a modular run. Detailed flowchart of the Ligand preparation pipeline is shown in Fig.1.

Fig.1: Detailed flowchart of Ligand preparation module. The dotted red line indicates the quarantining of erroneous ligands during the preparation process.

2.3 Development of Virtual Screening Module

AutoDock is a popular and widely used software for Protein-Ligand docking. It commits only to a single CPU per docking run. AutoDock implements the Lamarckian Genetic Algorithm and empirical free energy scoring to calculate the ligand binding energy (Morris et al., 2009). AutoDock Vina is a successor to AutoDock that uses a gradient-based local optimization procedure when searching and scoring ligand binding poses. It also features multi-threading capability and more accurate prediction of the ligand binding energy, thus making it a preferred tool for multiple ligand screening processes (Trott and Olson, 2010). However, multiple CPU usage of AutoDock Vina is dependent upon the level of exhaustiveness set during the search. POAP offers the flexibility of choosing the maximum number of CPUs at the desired level of exhaustiveness, implemented through GNU Parallel. This feature enables optimal and maximal resource planning during the AutoDock Vina run. Recently, AutoDockZN, a charge-independent, directional model, has been released, which can be used for docking of ligands to zinc metalloproteins (Santos-Martins et al., 2014). AutoDockZN is also parallelized in POAP, in the same manner as described for AutoDock.

2.4 Dynamic file handling for reducing hard disk space and faster execution

In general, when AutoDock is run for single protein-ligand docking, the user needs to run autogrid separately to map the ligand atom types listed in the grid parameter file (gpf). During this step, ligands with unidentified atom types will not be processed. This becomes an important concern during a parallelized run of AutoDock. To address this concern, the prepared ligand datasets are crosschecked against the atom types supported by AutoDock through an automated script, so that ligands with unidentified atom types are also quarantined before the execution of autogrid. This feature is extremely useful for seamless grid preparation during AutoDock based virtual screening. Moreover, in order to reduce time and hard disk space occupancy, all the map files representing the entire ligand dataset are directed to a common hub directory. Otherwise, map files would be created in separate directories for each ligand, leading to redundant map files for identical atom types and thereby consuming huge disk space and computational time. Hence, during the execution of this module, the non-redundant map files are kept in a central hub for performing the docking process. Further, to enable easier visualization and interpretation of data using MGLTOOLS, the atom map files and .dlg files are directed to the working directory in an automated manner. Detailed flowchart of the developed Virtual screening pipeline is shown in Fig.2.
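The atom type check and the shared map directory can be pictured with the following Python sketch. It assumes that the AutoDock atom type is the last whitespace-separated field of each ATOM/HETATM record in a pdbqt file. The SUPPORTED set is a deliberately small illustrative subset rather than AutoDock's full parameter table, and the function names and ligand records are hypothetical.

```python
def atom_types(pdbqt_text):
    """Return the set of atom types found in ATOM/HETATM records."""
    types = set()
    for line in pdbqt_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            fields = line.split()
            if fields:
                types.add(fields[-1])  # assumed: atom type is the last field
    return types


def plan_maps(ligands, supported):
    """Split ligands into accepted/quarantined and list the maps needed once.

    `ligands` maps a ligand name to its pdbqt text.
    """
    accepted, quarantined, needed = [], [], set()
    for name, text in sorted(ligands.items()):
        types = atom_types(text)
        if types and types <= supported:
            accepted.append(name)
            needed |= types  # identical atom types share a single map
        else:
            quarantined.append(name)
    return accepted, quarantined, sorted(needed)


# Illustrative subset only; not the complete AutoDock atom type list.
SUPPORTED = {"C", "A", "N", "NA", "OA", "HD"}

LIGANDS = {
    "lig_a": "ATOM 1 C1 LIG 1 0.0 0.0 0.0 0.00 0.00 0.100 C\n"
             "ATOM 2 O1 LIG 1 1.2 0.0 0.0 0.00 0.00 -0.300 OA\n",
    "lig_b": "ATOM 1 C1 LIG 1 0.0 0.0 0.0 0.00 0.00 0.100 C\n"
             "ATOM 2 X1 LIG 1 1.5 0.0 0.0 0.00 0.00 0.000 Xx\n",
    "lig_c": "ATOM 1 C1 LIG 1 0.0 0.0 0.0 0.00 0.00 0.100 C\n"
             "ATOM 2 N1 LIG 1 1.4 0.0 0.0 0.00 0.00 -0.200 N\n",
}

if __name__ == "__main__":
    print(plan_maps(LIGANDS, SUPPORTED))
```

In this toy input the carbon map is required by two accepted ligands but appears only once in the list of maps to compute, which is the saving the common hub directory is meant to provide.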

Fig.2: Detailed flowchart of Virtual screening module.

2.5 Execution of Ligand Preparation Module

To prepare the ligand datasets in the required format, the ligand preparation script “POAP_lig.bash” needs to be executed. To start this module in interactive mode, the -s flag should be supplied. During the initial step of ligand preparation, the user needs to specify the directory path of the ligands that are to be processed. Currently, POAP supports sdf, mol2, smi, mol, sd, sy2, ml2 and pdb formats. The ligand datasets may be provided either as a single compressed file or as individual files. Next, the user must specify the number of jobs to be run in parallel. This can be chosen according to the available computing resources. This module also offers the option to choose the desired forcefield (Ghemical/GAFF/MMFF94/MMFF94s/UFF) for ligand optimization and 3D conversion. Moreover, the user will also be prompted to select the method for ligand conformer search, which includes Genetic Algorithm, Random rotor search, Weighted rotor search, Obconformer and Confab. Here, the user needs to specify the search method, the number of conformations to be generated and the RMSD or energy cut-offs used to filter the ligand conformations. This module also enables the user to skip this step and proceed directly to the energy minimization step. In the minimization parameter feeding interface, the user needs to specify the choice of algorithm, number of steps and other cut-off values along with the desired output file format.

After fetching all the inputs from the user, the ligand preparation module will run the jobs in parallel. During the parallelized run, ligands with erroneous data will be quarantined and excluded. This quarantining process is triggered by the detection of ERROR messages emitted by obabel or by the prepare_ligand4.py script during pdbqt conversion. The ligands for which error messages were emitted will be automatically quarantined in the respective process folders. In some cases, Open Babel will generate an empty file without any error message; POAP is also programmed to dynamically identify and quarantine these types of erroneous ligands. POAP also screens the ligand datasets for AutoDock atom types and quarantines ligands with unknown atom types. Only the ligands passing all these checks will be directed to the output directory. If an optimized ligand dataset is already available, the user can proceed directly to pdbqt conversion using the -pdbqt flag when launching POAP_lig.bash. Similarly, various shortcut flags like -three for 3D conversion, -conf for conformer generation, -min for minimization etc. can be used when launching the script to skip the interactive mode. Please refer to the detailed operating manual and tutorials for more details.
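The two quarantine triggers described above (an ERROR message, or an empty output file produced without any message) can be sketched in Python as follows. This is an illustration under stated assumptions, not POAP's bash implementation: each ligand is assumed to have one output file and, optionally, a captured log named after it, and the directory layout and the function name quarantine_failed are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def quarantine_failed(out_dir, log_dir, quarantine_dir):
    """Move outputs that are empty or whose log mentions ERROR.

    Returns the sorted names of the quarantined output files.
    """
    out_dir, log_dir, quarantine_dir = map(Path, (out_dir, log_dir, quarantine_dir))
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for out_file in sorted(out_dir.iterdir()):
        if not out_file.is_file():
            continue
        log_file = log_dir / (out_file.stem + ".log")  # assumed naming scheme
        has_error = log_file.exists() and "ERROR" in log_file.read_text()
        is_empty = out_file.stat().st_size == 0
        if has_error or is_empty:
            shutil.move(str(out_file), str(quarantine_dir / out_file.name))
            moved.append(out_file.name)
    return moved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        out, logs = root / "out", root / "logs"
        out.mkdir()
        logs.mkdir()
        (out / "good.pdbqt").write_text("ATOM\n")
        (out / "empty.pdbqt").write_text("")  # silent failure case
        (out / "bad.pdbqt").write_text("ATOM\n")
        (logs / "bad.log").write_text("ERROR: could not parse input\n")
        print(quarantine_failed(out, logs, root / "quarantine"))
```

Checking the file size separately from the log matters because, as noted above, an empty output can appear without any accompanying error message.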

2.6 Execution of Virtual screening Module

To perform parallelized virtual screening with the AutoDock suite, POAP_vs.bash should be executed in the bash terminal. To start the screening in interactive mode, the -s flag should be added when executing the script. In the initial step of ligand assignment, the user needs to specify the directory path of the prepared ligands in pdbqt format. If the user intends to use AutoDock Vina, the directory path to the configuration file also needs to be provided. For AutoDock and AutoDockZn, the user needs to provide the path to the reference .gpf and .dpf files containing details on customized grid and docking parameters, respectively. Specifically, in the case of AutoDockZn, the forcefield in .dat format and the script files (download from http://autodock.scripps.edu/resources/autodockzn-forcefield) need to be kept in the directory where the .gpf and .dpf files are located. Further, in interactive mode, the user needs to specify the number of CPU threads to be utilized for running a single AutoDock Vina job, whilst in the case of AutoDock and AutoDockZn, the number of jobs to be run in parallel needs to be specified. The user also needs to provide the number of top hits for which the protein-ligand complexes are to be generated as pdb files. On completion of all these steps, virtual screening will be executed in parallel. Finally, the results of virtual screening will be tabulated in the toplist.txt file containing different energy scores, theoretical pI values etc. Moreover, the docked complexes of the top hits will be written as pdb files, and can be retrieved from the Results folder in the working directory.

2.7 Multi receptor Virtual screening

The Virtual screening module also provides a very useful option for screening multiple ligands against multiple receptors in a parallelized manner. This option allows the user to choose a set of proteins with different or similar active sites and perform virtual screening of chemical libraries in a single run. The results of multiple receptor virtual screening are provided as a tab delimited file with docking scores of all ligands vs. all receptors. Moreover, these results are automatically sorted according to binding score similarity across the receptors, with the respective standard deviation values. This feature enables the user to explore ligands that target multiple receptors, and also aids in shortlisting ligands with off target effects.
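One way to read the sorting rule above is to rank ligands by how little their docking scores vary across receptors. The Python sketch below takes that reading as an assumption; POAP's actual ordering, its choice of standard deviation formula and its output columns may differ. The ligand names, receptor labels and scores are invented for illustration, and the population standard deviation is an arbitrary choice.

```python
from statistics import mean, pstdev


def rank_by_similarity(scores):
    """Rank ligands by the spread of their scores across receptors.

    `scores` maps ligand -> {receptor: docking score}. Returns a list of
    (ligand, mean score, standard deviation), lowest deviation first,
    with ties broken by the more negative (better) mean score.
    """
    rows = []
    for ligand, per_receptor in scores.items():
        values = list(per_receptor.values())
        rows.append((ligand, mean(values), pstdev(values)))
    return sorted(rows, key=lambda row: (row[2], row[1]))


if __name__ == "__main__":
    # Invented scores in kcal/mol for two hypothetical receptors.
    demo = {
        "ligB": {"R1": -10.0, "R2": -6.0},
        "ligA": {"R1": -9.0, "R2": -9.0},
    }
    for ligand, avg, sd in rank_by_similarity(demo):
        print(f"{ligand}\t{avg:.2f}\t{sd:.2f}")  # tab delimited rows
```

A ligand with a low deviation and a strongly negative mean binds all receptors similarly well, which is the pattern of interest for multi-target or off-target analysis.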

2.9 Validation of POAP for performance using sample datasets

2.9.1 Ligand preparation module

In order to evaluate the efficiency of the Ligand preparation module, four ligand datasets in SDF format were chosen: FDA approved drugs from DrugBank (1,288 ligands) (Wishart et al., 2017), and Myriascreen-II (10,000 ligands), Natural Derivative Library (NDL) (3,040 ligands) and Extended Flavonoids Derivative (EFD) (4,053 ligands) from the Timtec database (http://www.timtec.net/). The dataset of FDA approved drugs comprised 2,073 ligands, including inorganic ligands, metallic ligands and other ligands with atom types which are not supported by AutoDock. Hence, after excluding all these ligands, the resulting dataset of 1,288 ligands was processed by the Ligand preparation module of POAP. As these datasets were in 2D SDF format, 3D conversion was performed with the medium speed option. Ligand conformers were generated with weighted rotor conformational search, with the number of conformers set to 50, among which the lowest energy conformer was picked for further processing. Energy minimization of the lowest energy conformer was carried out with 5000 steps of conjugate gradient minimization using the MMFF94 force field. The distance cut-offs for van der Waals, electrostatic and non-bonded interactions were set to default. The pdbqt conversion of the energy minimized structure was performed by calling the prepare_ligand4.py script available in MGLTOOLS-1.5.7. To compare the efficiency of the Ligand preparation module, a single CPU committed script was run retaining the parameters discussed above. Thus, the ligand preparations were carried out both with the parallelized pipeline and in GNU Parallel disabled mode. The test runs, from ligand preparation to virtual screening, were carried out in an Ubuntu-12.04 64-bit environment installed on a T5510 DELL workstation with an Intel Xeon(R) CPU E5-2620V2, 2.10 GHz clock speed (12 Cores, 24 threads) and 62.9GB RAM. The software integrated in the pipelines includes Open Babel-2.4.1, MGLTOOLS-1.5.7, AutoDock-4.2.6, AutoDock Vina-1.1.2 and GNU Parallel-20170122. In parallel mode, 24 jobs were allocated to be executed in parallel at a given time, based on the available CPU threads. The log files of the parallelized and serial modes were analysed to assess the performance in terms of speedup ratio. The log details from parallelized preparation of the FDA approved dataset from DrugBank were used for measuring the parallelization efficiency with Amdahl’s Law, as serial execution with the other, larger datasets (Myriascreen-II, NDL, and EFD) would lead to enormous computational time with the limited hardware.

2.9.2 Protein Preparation


To demonstrate the speedup efficiency and drug repurposing applicability of POAP, crystal structures of four popular and diverse drug targets (N=4) were chosen: Human ROCK I (Rho associated protein kinase I) (PDB ID: 2ETR, 2.6 Å), which is a potential target implicated in many diseases like cancer, atherosclerosis, glaucoma etc. (Riento and Ridley, 2003); HTH-type transcriptional regulator (EthR) (PDB ID: 5J1U, 1.8 Å) of M. tuberculosis, which is responsible for modulating ethionamide (ETH) drug resistance (Willand et al., 2009); Pks13 (Polyketide synthase) of M. tuberculosis (Thioesterase domain: PDB ID: 5V3X, 1.94 Å), involved in mycolic acid synthesis (Aggarwal et al., 2017; Takayama et al., 2005); and PqsA (Anthranilate-coenzyme A ligase) (PDB ID: 5OE3, 1.43 Å) of Pseudomonas aeruginosa, which is involved in quorum sensing through PQS signalling (Ji et al., 2016; Lesic et al., 2007).

Further, to demonstrate the docking enrichment analysis, a different set of proteins (N=3) from the DUD-E database was chosen, comprising ROCK1 together with two additional targets: FAK1 (Focal Adhesion Kinase) (PDB ID: 3BZ3, 2.2 Å), which is a prominent target in many metastatic cancers; and FABP4 (Fatty Acid Binding protein) (PDB ID: 2NNQ, 1.8 Å), a well-documented target playing a key role in modulation of diabetes, insulin resistance and atherosclerosis. The choice of these proteins was based on the availability of actives and decoys in the DUD-E database.

As all these structures were co-crystallized with ligands, the chains harbouring the ligands were chosen and optimized by addition of hydrogen atoms, fixing of missing side chains, and overall geometry optimization using the WHAT IF server. Later, each optimized structure was prepared for docking using default options in MGLTOOLS-1.5.7. Here, the non-polar hydrogens were merged, Gasteiger charges were added, AutoDock atom types were defined and the structure was saved in pdbqt format. The receptor grids for performing docking were set around the active site regions.

2.9.3 Parallelized run of AutoDock Vina

The directory paths leading to the working directory, the AutoDock Vina configuration file, the prepared FDA ligand dataset from DrugBank in pdbqt format and the structural coordinates of the protein datasets (ROCK1, EthR, Pks13 and PqsA) in pdbqt format were provided as input when invoking the virtual screening module in interactive mode. Moreover, the docking parameters for running a single job were assigned as exhaustiveness of 8, number of modes set to 20 and number of CPUs set to 8. Hence, these parameters will run 3 AutoDock Vina jobs in parallel, with the 24 CPU threads divided into 8 threads per job. To cross validate the modified parallel job execution, AutoDock Vina was also run with default parameters, retaining the same exhaustiveness as above.
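The job count quoted above follows from integer division of the available threads by the threads reserved per AutoDock Vina job. A small Python sketch of that arithmetic is given below; the function name is illustrative and the rule of never oversubscribing threads is an assumption about the intent.

```python
def concurrent_vina_jobs(total_threads, cpus_per_job):
    """Number of Vina jobs that fit side by side without oversubscription."""
    if total_threads < 1 or cpus_per_job < 1:
        raise ValueError("thread counts must be positive")
    # At least one job always runs, even if it asks for more than is available.
    return max(1, total_threads // cpus_per_job)


if __name__ == "__main__":
    print(concurrent_vina_jobs(24, 8))  # 24 threads, 8 per job -> 3
```

With 24 threads and 8 CPUs per job this gives the 3 simultaneous jobs used in the test run; 16 threads would give 2 jobs.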

2.9.4 Parallelized run of AutoDock

The directory paths containing the prepared FDA ligand datasets from DrugBank in pdbqt format, the structural coordinates of the protein datasets (ROCK1, EthR, Pks13 and PqsA) in pdbqt format, the working directory and the .gpf and .dpf files were provided as input in interactive mode. Further, the parallelized virtual screening using AutoDock was initiated, wherein the initial grid calculation for all the ligand atom types was performed, followed by execution of the ligand docking runs as 24 parallel jobs. To cross validate the parallel job execution, the docking runs were also performed in GNU Parallel disabled mode.

2.9.5 Speedup Ratio Calculation:

The time taken for completion of the parallelized jobs of ligand preparation and virtual screening was captured from the POAP log files. Further, the time values from the log files were converted to speedup ratios using the equation (Karmani et al., 2011):

$$\text{speedup} = \frac{\text{Time on 1 processor}}{\text{Time on } P \text{ processors}} \qquad \text{(Eqn 1)}$$

The speedup ratios for all the above mentioned parallelized tasks (N=4 per parallelized module) were averaged and the corresponding standard deviations were calculated.
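Eqn 1 and the averaging step can be sketched in Python as follows. The timings in the usage example are invented placeholders rather than measurements from this study, and the use of the sample standard deviation is an assumption, since the text does not state which form was used.

```python
from statistics import mean, stdev


def speedup(time_serial, time_parallel):
    """Eqn 1: time on 1 processor divided by time on P processors."""
    if time_serial <= 0 or time_parallel <= 0:
        raise ValueError("times must be positive")
    return time_serial / time_parallel


def summarize(pairs):
    """Mean and sample standard deviation of speedups.

    `pairs` is a sequence of (serial time, parallel time) tuples in the
    same unit, one tuple per task.
    """
    ratios = [speedup(serial, parallel) for serial, parallel in pairs]
    return mean(ratios), stdev(ratios)


if __name__ == "__main__":
    # Invented timings in minutes for four hypothetical tasks (N=4).
    timings = [(100, 10), (200, 25), (90, 10), (120, 10)]
    average, deviation = summarize(timings)
    print(f"mean speedup = {average:.2f}, sd = {deviation:.2f}")
```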

2.9.6 Evaluation of Parallelization efficiency of POAP with Amdahl’s Law:

The efficiency of parallelization in POAP was evaluated by direct measurement of the observed speedup and comparison with the ideal speedup based on Amdahl’s law. For these evaluations, the ligand preparation and virtual screening runs were performed with varying CPU allocations. As POAP runs only in parallelized mode and does not contain any serial execution scripts, the theoretical ideal speedup value for Amdahl’s law was derived by substituting the serial fraction f with 0 in Amdahl’s equation (Eqn 2) (Amdahl GM 1967; Karmani et al., 2011). To calculate the observed speedup ratio for the ligand preparation module, the FDA approved ligand dataset (1,288) from DrugBank was used. Initially, the ligands were processed in serial mode with a single CPU core. Following this, parallelized runs with 4, 8, 12, 16, 20 and 24 CPUs were performed. In a similar fashion, the speedup ratio for parallelized virtual screening with AutoDock was calculated with ROCK1 vs. the FDA approved ligand dataset from DrugBank.

Amdahl’s Law was applied using the equation:

$$\text{speedup} = \frac{1}{f + (1 - f)/P} \qquad \text{(Eqn 2)}$$

where f is the serial fraction of the code and P is the number of processors.
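A short Python sketch of Eqn 2 shows why setting f = 0 yields an ideal speedup equal to P, and how an observed speedup can be expressed as a fraction of that ideal. The efficiency helper and the observed value in the test are illustrative additions, not quantities reported in this study.

```python
def amdahl_speedup(serial_fraction, processors):
    """Eqn 2: 1 / (f + (1 - f) / P)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def efficiency(observed_speedup, processors):
    """Observed speedup as a fraction of the ideal (f = 0) speedup."""
    return observed_speedup / amdahl_speedup(0.0, processors)


if __name__ == "__main__":
    # CPU allocations used in the evaluation; with f = 0 the ideal equals P.
    for p in (4, 8, 12, 16, 20, 24):
        print(p, round(amdahl_speedup(0.0, p), 6))
```

For comparison, a serial fraction of only 0.1 would already cap the speedup on 24 processors at about 7.3, so the f = 0 curve is the most demanding reference against which to compare observed speedups.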

2.9.7 Speedup ratio comparison for AutoDock Vina

AutoDock Vina is inherently multithreaded to run in parallel mode; hence, instead of Amdahl’s calculation, a comparative speedup ratio analysis between the default AutoDock Vina run and the POAP mediated AutoDock Vina run was performed. Here, the time values from the ROCK1 vs. FDA approved ligands virtual screening were used for the comparative speedup ratio analysis. For the default AutoDock Vina run, four sets of evaluations were performed, wherein the first set involved a direct serial run with no explicit mention of CPU allocation, and the other three involved CPU allocations of 8, 16 and 24 with the --cpu flag of AutoDock Vina. In the POAP mediated run, every single AutoDock Vina job was restricted to 8 CPUs, as the hardware used provided only 24 CPU threads; hence, 8 CPUs for a single job, 16 CPUs for two jobs and 24 CPUs for three jobs were run in parallel. For all the AutoDock Vina executions, an exhaustiveness value of 8 was applied globally. Finally, the speedup ratios of the default AutoDock Vina and POAP mediated runs were compared to evaluate the efficiency.

2.9.8 Redocking and RMSD calculation:


The crystal structures of ROCK1, EthR, PqsA and Pks13 in complex with their respective ligands were processed for redocking with MGLTOOLS-1.5.7. The receptor grids for all the proteins were set around the actual binding site observed in the crystallized form. Redocking studies were performed with AutoDock Vina retaining the default parameters. Finally, the RMSD (Root Mean Square Deviation) on structural superposition of the co-crystallized forms with the corresponding redocked forms was calculated to infer the predictive accuracy. RMSD values were calculated using rmsd.py available in the Schrödinger Maestro script library. This redocking analysis was performed so that the binding energy score could serve as a positive control for demonstrating the drug repurposing applicability of POAP.
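For readers unfamiliar with the measure, the Python sketch below computes a plain RMSD between two coordinate sets whose atoms are listed in the same order. It does not reproduce the Schrödinger rmsd.py script: no superposition, atom matching or symmetry correction is performed, and the coordinates in the example are invented.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered coordinate lists.

    Each list holds (x, y, z) tuples in ångströms, with atom i of one list
    corresponding to atom i of the other.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    redocked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]  # shifted by 1 Å along z
    print(rmsd(crystal, redocked))
```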

2.9.9 Multi receptor virtual screening and drug repurposing:

To demonstrate the applicability of the multi receptor virtual screening module of POAP in drug repurposing, the ROCK1, EthR, Pks13 and PqsA structures were prepared, and their respective docking configuration files were kept in a common folder with the corresponding protein names. The directory paths containing the prepared FDA datasets from DrugBank and the configuration files were given as input for multi receptor virtual screening. This module was demonstrated only with AutoDock Vina, with an exhaustiveness value of 8 and the number of CPUs per job set to 8. The top 10 hits for each protein were ranked based on the binding energy. Further, these hits were analysed for intermolecular interactions using the Protein-Ligand Interaction Profiler (Salentin et al., 2015). Finally, a comparative analysis of the binding energy and intermolecular interactions of the redocked complexes vs. the top hits was performed, so as to shortlist the FDA ligands from DrugBank that could be repurposed for targeting the protein of interest.


2.10 Docking Enrichment analysis:

To demonstrate the docking enrichment analysis of POAP based virtual screening, the crystal structures of three prominent protein targets were chosen: FAK1 (PDB ID: 3BZ3, 2.2 Å), which is targeted in many metastatic cancer types (Lin et al., 2017); ROCK1 (PDB ID: 2ETR, 2.6 Å), a potential target in many cancers, glaucoma, atherosclerosis and vascular diseases (Defert and Boland, 2017); and FABP4 (PDB ID: 2NNQ, 1.8 Å), a well-documented target for diabetes and atherosclerosis (Floresta et al., 2017). The active ligands and decoys corresponding to these targets were retrieved from the DUD-E dataset (http://dude.docking.org/) (Mysinger et al., 2012) for performing the enrichment analysis. To proceed with POAP based virtual screening, the protein structures were prepared using MGLTools-1.5.7. Since the ligand datasets from DUD-E were in mol2 format, these were converted to pdbqt format using the POAP_lig.bash script. Further, virtual screening of these prepared ligands vs. the chosen protein targets was performed using the POAP_vs.bash script. For the virtual screening runs, area under curve (AUC) and Enrichment Factor (EF) calculations were performed to validate how efficiently actives were discriminated from decoys. The AUC for the Receiver Operating Characteristic (ROC) curve was calculated using the Rocker tool (http://users.jyu.fi/~satalatt/cgi-bin/rocker.cgi), wherein the ligand binding energy scores were given as input (Lätti et al., 2016). A stringent EF cutoff of 1% was applied to determine the percentage of actives picked among the top-ranked ligands.
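The two enrichment metrics can be illustrated with a small sketch. This is not the Rocker implementation; it is a minimal Python example that assumes lower (more negative) binding energies rank first, computes the ROC AUC as the fraction of active/decoy pairs in which the active scores better (ties counted as half), and reports early recognition as the percentage of all actives found in the top fraction of the ranked list. The function names and the scores are made up for illustration.

```python
import math


def roc_auc(energies, is_active):
    """AUC as the probability that a random active outranks a random decoy."""
    actives = [e for e, a in zip(energies, is_active) if a]
    decoys = [e for e, a in zip(energies, is_active) if not a]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a < d:        # lower energy means better rank
                wins += 1.0
            elif a == d:     # ties count as half
                wins += 0.5
    return wins / (len(actives) * len(decoys))


def actives_recovered_percent(energies, is_active, fraction=0.01):
    """Percentage of all actives found in the top `fraction` of the ranking."""
    ranked = sorted(zip(energies, is_active), key=lambda pair: pair[0])
    n_top = max(1, math.ceil(fraction * len(ranked)))
    found = sum(1 for _, active in ranked[:n_top] if active)
    total = sum(1 for active in is_active if active)
    return 100.0 * found / total


# Made-up scores (kcal/mol) for two actives and two decoys.
scores = [-10.0, -9.0, -8.0, -7.0]
labels = [True, False, True, False]
print(roc_auc(scores, labels), actives_recovered_percent(scores, labels, 0.25))
```

The pairwise AUC is quadratic in dataset size, which is adequate for a sketch; a rank-based formulation would be preferable for full DUD-E sets.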

3. Results and Discussion

3.1 Speedup performance of Ligand Preparation Module in POAP:

The efficiency of the ligand preparation module was validated using four (N=4) different ligand databases: FDA approved drugs from DrugBank, Myriascreen-II, NDL and EFD. The validation runs completed without interruption and generated ligand datasets with optimized geometry suitable for virtual screening. Open Babel reported errors for a few ligands during 2D to 3D conversion, conformer generation, minimization and pdbqt conversion. POAP quarantined these ligands from the rest of the ligand datasets, so that the parallel runs proceeded undisturbed during the ligand conversion process. The runs resulted in optimized ligand datasets in pdbqt file format, archived in the pdbqt folder of the working directory. The time values of each parallelized run were obtained from the corresponding log files. These values were converted to seconds, from which the speedup factor was calculated using Eqn 1. Subsequently, the speedup factors from these four datasets (N=4) were averaged, and the standard deviation (SD) across these runs was calculated. The parallelized mode conferred an average speedup of 14.788 over the serial mode, with a low SD of 0.148 (Fig.3), demonstrating the efficiency and reliability of this module.
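The averaging step can be sketched as below. This is a minimal Python illustration under stated assumptions: the speedup factor is taken as serial time divided by parallel time (the usual definition; Eqn 1 itself is given in the methods), the log times are assumed to be "HH:MM:SS" strings, and the sample standard deviation is used, since the text does not state which estimator was applied. The function names and timings are hypothetical.

```python
from statistics import mean, stdev


def hms_to_seconds(stamp: str) -> int:
    """Convert an 'HH:MM:SS' string into seconds."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def speedup_summary(serial_times, parallel_times):
    """Mean and sample SD of per-dataset speedup factors (serial / parallel)."""
    factors = [
        hms_to_seconds(s) / hms_to_seconds(p)
        for s, p in zip(serial_times, parallel_times)
    ]
    return mean(factors), stdev(factors)


# Made-up timings for two datasets.
avg, sd = speedup_summary(["00:01:40", "00:03:20"], ["00:00:10", "00:00:10"])
print(f"mean speedup {avg:.3f}, SD {sd:.3f}")
```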



3.2 Speedup performance of Virtual screening Module in POAP

The speedup performance of POAP based AutoDock and AutoDock Vina parallelization was demonstrated by virtual screening of FDA drugs from DrugBank vs. four proteins, namely ROCK1, EthR, Pks13 and PqsA. During AutoDock based virtual screening, 24 jobs were executed in parallel for each protein, and the corresponding time values were obtained and processed as described in section 3.1. The parallelized AutoDock run showed a speedup of 12.464 over the serial mode (N=4, SD=0.081). In the case of AutoDock Vina, virtual screening was parallelized to run three jobs at a time, with 8 CPU threads per job, which resulted in a speedup of 2.397 over the default mode (N=4, SD=0.066) (Fig.3).

Fig.3: Speedup ratio comparison for the POAP based runs of AutoDock, AutoDock Vina and Open Babel vs. corresponding serial execution.

3.3 Performance analysis of parallelization in POAP with Amdahl’s Law


The speedup ratios for ligand preparation (FDA DrugBank database) and AutoDock (virtual screening of FDA drugs from DrugBank against ROCK1) are shown in Fig.4. The speedup increases steadily with the number of processors. Beyond 8 processors the speedup ratio drops below the theoretical ideal value, but a sub-linear increase was still observed as further processors were added. Hence, the performance continues to scale with the number of CPUs.
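The gap between ideal and observed speedup can be related to Amdahl's law, S(n) = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of the work and n the number of processors. The sketch below is an illustration only, not the analysis script used in the study; the function names are hypothetical, and the input is the averaged AutoDock speedup reported in section 3.2 for 24 parallel jobs.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Theoretical speedup S(n) = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0 or processors < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / processors)


def parallel_fraction_from_speedup(speedup: float, processors: int) -> float:
    """Invert Amdahl's law: p = (1 - 1/S) / (1 - 1/n), valid for n > 1."""
    if processors <= 1 or speedup <= 0:
        raise ValueError("need n > 1 and S > 0")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / processors)


# Averaged AutoDock speedup reported for 24 parallel jobs.
p = parallel_fraction_from_speedup(12.464, 24)
efficiency = 12.464 / 24
print(f"estimated parallel fraction {p:.3f}, parallel efficiency {efficiency:.3f}")
```

Under this simple model, a speedup of about 12.5 on 24 processors corresponds to a parallel efficiency near 0.52, which is consistent with the sub-linear trend described above.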

Fig.4: Parallel performance comparison for AutoDock and Open Babel speedup ratio.

3.4 Speedup ratio analysis of default vs. POAP mediated AutoDock Vina run

Speedup ratio analysis of AutoDock Vina in default mode vs. POAP triggered mode indicates that increasing the number of processors (8, 16, 24) through the inbuilt --cpu flag of AutoDock Vina does not affect the speedup factor, which remains similar to that of the default mode (Fig.5). In contrast, with POAP mediated CPU allocation, a significant increase in the speedup ratio was observed (Fig.5). Since the hardware had 24 cores, one job on 8 cores, two jobs on 16 cores and three jobs on 24 cores were run in parallel through POAP. This indicates that running AutoDock Vina through POAP yields higher parallelization efficiency than relying on the --cpu flag alone.


Fig.5: Speedup ratio comparison between default and POAP mediated run of AutoDock Vina.

3.5 Enrichment analysis:

ROC curve analysis helps to assess how successfully active compounds are identified from datasets containing inactive compounds (Fawcett, 2006). The area under the curve (AUC) of the ROC provides information on the enrichment of active compounds in a virtual screening run. Generally, AUC values greater than 0.5 indicate that actives are enriched over the larger decoy sets better than by random selection (Empereur-Mot et al., 2016). Both AutoDock and AutoDock Vina were able to discriminate the active compounds from the large dataset of inactives. On comparing the AUC values of AutoDock (AD) and AutoDock Vina (ADV) from the virtual screening results of the three proteins, FAK1 (AD=0.60, ADV=0.78), ROCK1 (AD=0.78, ADV=0.73) and FABP4 (AD=0.70, ADV=0.81) (Fig.6), it could be inferred that AutoDock Vina is a slightly better discriminator of actives than AutoDock for two of the three targets. The Enrichment Factor provides information on the early recognition of active compounds among the ranked list. Except for ROCK1, AutoDock Vina identified a higher percentage of actives within the top 1% of the dataset than AutoDock: FAK1 (AD=0.88%, ADV=17.59%), ROCK1 (AD=7.86%, ADV=4.91%) and FABP4 (AD=5.28%, ADV=35.23%) (Table.S1).

Fig.6: ROC curve for the Docking Enrichment analysis: (a) AutoDock (b) AutoDock Vina. Colour representation: FAK1 - blue; ROCK1 - red; FABP4 - green.

3.6 Multi receptor docking and Drug Repurposing:

In order to demonstrate the applicability of the AutoDock Vina based multi receptor docking module of POAP in drug repurposing, FDA drugs from DrugBank were screened against four targets, namely ROCK1, EthR, Pks13 and PqsA. Prior to virtual screening, the corresponding co-crystallized ligands (Y27 for ROCK1, P93 for EthR, I28 for Pks13 and 3UK for PqsA) were redocked to these proteins using AutoDock Vina, and the respective binding energies on redocking were noted. During the redocking process, the number of allowed torsional rotations for the co-crystallized ligands was set to 0, so as to conserve the native conformation. Further, the redocked complexes were compared to the respective co-crystallized forms by structural superposition, and the corresponding ligand RMSD values were tabulated. This was performed to assess the predictive accuracy of the docking method. The ligand RMSD values were found to be in the range of 0.16 Å to 0.48 Å, suggesting the reliability of the method adopted. The protein-ligand interactions in the redocked complexes were analysed using Protein-Ligand Interaction Profiler (PLIP) (Salentin et al., 2015). The binding energies and protein-ligand interaction details of the redocked complexes were used as reference for identifying newer hits that could be repurposed from the FDA datasets. Further, multi receptor docking was performed using POAP, which resulted in a tab delimited file containing the binding energies of all the ligands in the FDA dataset vs. the four proteins studied. Here, the ligands were sorted according to the maximal binding energy exhibited across all the proteins. The file also featured the standard deviation of the binding energies of each ligand vs. all the proteins. Finally, the probable repurposable ligands from the FDA datasets for each protein were concluded based on the top scoring ligands with desirable intermolecular interactions.
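The structure of such a summary table can be sketched as follows. This is an illustration only and not POAP's own output routine: it assumes that "maximal binding energy" means the most favourable (most negative) score of a ligand across the receptors, uses the sample standard deviation, and works on made-up ligand names, receptor names and energies. The function names and column headers are hypothetical.

```python
from statistics import stdev


def summarize(energies):
    """Return (ligand, best_energy, sd) rows sorted by most negative energy."""
    rows = []
    for ligand, per_protein in energies.items():
        values = list(per_protein.values())
        rows.append((ligand, min(values), stdev(values)))
    rows.sort(key=lambda row: row[1])
    return rows


def to_tsv(rows):
    """Render the summary rows as tab delimited text."""
    header = "ligand\tbest_energy\tsd"
    lines = [f"{name}\t{best:.1f}\t{sd:.3f}" for name, best, sd in rows]
    return "\n".join([header] + lines)


# Made-up binding energies (kcal/mol) for two ligands against two receptors.
example = {
    "ligand_A": {"protein_1": -7.0, "protein_2": -9.0},
    "ligand_B": {"protein_1": -10.0, "protein_2": -10.0},
}
table = summarize(example)
print(to_tsv(table))
```

A low standard deviation together with a strongly negative best energy points to a ligand that binds all receptors similarly, whereas a high value suggests selectivity for one target.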

ROCK1 vs. FDA dataset

Redocking of Y27 to ROCK1 gave a binding energy of -7.5 kcal/mol, and the redocked pose remained close to the co-crystallized form (RMSD 0.48 Å), retaining the key interactions: H-bonds with M156 in the hinge region and D216 of the activation loop, and hydrophobic interactions with the glycine-rich loop residues I82 and V90, thereby indicating the correct binding orientation at the ATP binding site (Jacobs et al., 2006). Based on the comparative analysis of binding energy and residue interactions with Y27, two top ranking hits, namely DB06210 (Eltrombopag) and DB09280 (Lumacaftor), were found to have lower binding energies of -10.7 kcal/mol and -10.5 kcal/mol, respectively (Fig.S1). These two ligands were found to form H-bonds with N203, D216 and K105 of the activation loop, and also π-cation cum salt bridge interactions with K105. It should be noted that all these residues together play a key role in the coordination of Mg2+ and ATP during catalytic activity (Jacobs et al., 2006). These two ligands also formed H-bonded and hydrophobic interactions with residues in the activation loop, the glycine-rich loop and the hinge region (Table.S3). Moreover, DB09280 (Lumacaftor) formed a halogen bond with M156 of the hinge region, which plays an important role in ATP binding in the pocket. Hence, based on these inferences, DB06210, which is used to stimulate platelet production, and DB09280, which is used to treat cystic fibrosis (CF), could be validated for repurposing as inhibitors in ROCK1 activity related diseases.

EthR vs. FDA dataset

Redocking of P93 with EthR gave a binding energy of -8.9 kcal/mol, and the redocked pose remained structurally close to the co-crystallized form (RMSD 0.25 Å), holding the key interactions: H-bonded interactions with residues N176 and N179, hydrophobic interactions with the residues spanning the tunnel region (F110, W138, W145, W207), and π-π stacking with F110. The hydrophobic residues lining the tunnel region play a key role in orienting the HTH motif during DNA binding of this transcription repressor (Nikiforov et al., 2017). Based on the binding energies and key residue interactions, the top two hit compounds, DB0079 (Sulfasalazine, -11.4 kcal/mol) and DB00450 (Droperidol, -10.7 kcal/mol), were found to be highly promising (Fig.S2). These ligands formed H-bonds with N176 and N179, which are key residues for targeting EthR, matching the interactions of P93. Moreover, these ligands were also found to form hydrophobic and π-π stacking interactions with residues in subpockets I, II and III, which are shown to interact with most of the potential combinatorial drugs targeting EthR (Nikiforov et al., 2016) (Table.S4). Taken together, these two hits were found to be highly promising, as they interact with the druggable hotspot regions of EthR. Sulfasalazine is used in the treatment of Crohn's disease and rheumatoid arthritis, whilst Droperidol is used to treat nausea and vomiting. These drugs could be further validated for repurposing efficacy towards boosting ethionamide bioactivation in Mycobacterium tuberculosis (Surade et al., 2014).



Pks13 vs. FDA dataset:



Redocking of I28 (TAM1) with Pks13 gave a binding energy of -13.3 kcal/mol, and the redocked pose remained structurally close to the corresponding co-crystallized structure (RMSD of 0.16 Å), retaining the key intermolecular interactions: hydrophobic and π-π stacking interactions with F1670, a salt bridge interaction with D1644, and hydrophobic interactions with residues Y1637, N1640, Y1663, A1667, F1670 and T1674, which line the substrate binding groove (Fig.S3). Based on binding energies and residue interactions, the top three hits from the FDA datasets, DB09280 (Lumacaftor, used in cystic fibrosis treatment; -11.8 kcal/mol), DB00972 (Azelastine, used in the treatment of allergic and non-allergic rhinitis; -11.7 kcal/mol) and DB06210 (Eltrombopag, used to stimulate platelet count; -11.7 kcal/mol), were shortlisted as potential inhibitors of Pks13 (Fig.S3). All three ligands formed hydrophobic and π-π stacking interactions with F1670, similar to TAM1. Moreover, these three ligands showed hydrophobic interactions with residues in the substrate binding groove, similar to TAM1 (Table.S5). Based on these inferences, these three hits could be considered as potential repurposed inhibitors of Pks13 targeting mycolic acid synthesis in Mycobacterium tuberculosis (Aggarwal et al., 2017).

PqsA vs. FDA dataset:

The redocking analysis of anthraniloyl-AMP (3UK) to PqsA gave a binding energy of -14.6 kcal/mol, and the redocked pose was structurally close to the co-crystallized form (RMSD of 0.29 Å), retaining the key intermolecular interactions: H-bonds with residues G279, G300, G302 and T304, and hydrophobic interactions with another set of residues that play a key role in substrate binding (Fig.S4). From the virtual screening run, DB09074 (Olaparib, used in the treatment of ovarian cancer) and DB06817 (Raltegravir, used in the treatment of HIV-1 infection) were found to be stable binders, with binding energies of -10.4 and -10.2 kcal/mol, respectively. These compounds were found to fit well into the active site region (anthraniloyl-AMP and substrate binding site) through H-bonded interactions with G279, G300 and G302, and hydrophobic interactions with F209 and Y211. Moreover, these ligands also formed π-π stacking interactions with H308, mimicking the interaction pattern of anthraniloyl-AMP (Table.S6). On cumulative analysis, it could be inferred that these ligands are potential inhibitors of PqsA, and could be validated for repurposing towards targeting the PQS mediated quorum signalling pathway in Pseudomonas aeruginosa infections (Witzgall et al., 2017).

4. Conclusion:

POAP is distinctive in utilizing the potential of GNU Parallel for the parallelization of ligand preparation by Open Babel and virtual screening using the AutoDock suite. It features a unique and important function of quarantining erroneous ligands, which is essential for an unperturbed parallelized run. The efficiency of the POAP modules in handling different datasets has been demonstrated in this study. POAP is distributed freely under the GNU GPL license with an extensive manual and supporting tutorials, thereby enabling ease of use. In future versions, it is intended to include advanced parallelization involving splitting of sub jobs, and inbuilt modules for drug enrichment analysis and parsing of intermolecular interactions. POAP will be of significant use to those who intend to use open source based high throughput drug discovery methods.

Conflict of interest

The authors declare that there are no conflicts of interest.

Acknowledgements

We would like to acknowledge the Department of Biotechnology (DBT), Ministry of Science and Technology, Government of India, for providing financial assistance through the DBT-JRF Fellowship [DBT/2015/VRF/363] to Samdani for carrying out this work. The authors also thank the DBT Rapid Grant for Young Investigator (RGYI) scheme [BT/PR6476/GBD/27/496/2013, 05/09/2013] for the hardware support.

References

Aggarwal, A., Parai, M.K., Shetty, N., Wallis, D., Woolhiser, L., Hastings, C., Dutta, N.K., Galaviz, S., Dhakal, R.C., Shrestha, R., Wakabayashi, S., Walpole, C., Matthews, D., Floyd, D., Scullion, P., Riley, J., Epemolu, O., Norval, S., Snavely, T., Robertson, G.T., Rubin, E.J., Ioerger, T.R., Sirgel, F.A., van der Merwe, R., van Helden, P.D., Keller, P., Böttger, E.C., Karakousis, P.C., Lenaerts, A.J., Sacchettini, J.C., 2017. Development of a Novel Lead that Targets M. tuberculosis Polyketide Synthase 13. Cell 170 (2), 249-259.e25. 10.1016/j.cell.2017.06.025.

Amdahl, G.M., 1967. Validity of the single processor approach to achieving large scale computing capabilities, in: the April 18-20, 1967, spring joint computer conference, Atlantic City, New Jersey, p. 483.

Årdal, C., Røttingen, J.-A., 2012. Open source drug discovery in practice: a case study. PLoS neglected tropical diseases 6 (9), e1827. 10.1371/journal.pntd.0001827.

Chen, Y.-C., 2015. Beware of docking! Trends in pharmacological sciences 36 (2), 78–95. 10.1016/j.tips.2014.12.001.

Defert, O., Boland, S., 2017. Rho kinase inhibitors: a patent review (2014 - 2016). Expert opinion on therapeutic patents 27 (4), 507–515. 10.1080/13543776.2017.1272579.

Empereur-Mot, C., Zagury, J.-F., Montes, M., 2016. Screening Explorer-An Interactive Tool for the Analysis of Screening Results. Journal of chemical information and modeling 56 (12), 2281–2286. 10.1021/acs.jcim.6b00283.

Fawcett, T., 2006. An introduction to ROC analysis. Pattern Recognition Letters 27 (8), 861–874. 10.1016/j.patrec.2005.10.010.

Ferreira, L.G., Dos Santos, R.N., Oliva, G., Andricopulo, A.D., 2015. Molecular docking and structure-based drug design strategies. Molecules (Basel, Switzerland) 20 (7), 13384–13421. 10.3390/molecules200713384.

Floresta, G., Pistarà, V., Amata, E., Dichiara, M., Marrazzo, A., Prezzavento, O., Rescifina, A., 2017. Adipocyte fatty acid binding protein 4 (FABP4) inhibitors. A comprehensive systematic review. European journal of medicinal chemistry 138, 854–873. 10.1016/j.ejmech.2017.07.022.

Jacobs, M., Hayakawa, K., Swenson, L., Bellon, S., Fleming, M., Taslimi, P., Doran, J., 2006. The structure of dimeric ROCK I reveals the mechanism for ligand selectivity. The Journal of biological chemistry 281 (1), 260–268. 10.1074/jbc.M508847200.

Ji, C., Sharma, I., Pratihar, D., Hudson, L.L., Maura, D., Guney, T., Rahme, L.G., Pesci, E.C., Coleman, J.P., Tan, D.S., 2016. Designed Small-Molecule Inhibitors of the Anthranilyl-CoA Synthetase PqsA Block Quinolone Biosynthesis in Pseudomonas aeruginosa. ACS chemical biology 11 (11), 3061–3067. 10.1021/acschembio.6b00575.

Kalyani G, 2013. A review on drug designing, methods, its applications and prospects. Int J Pharm Res Dev 5, 15–30.

Karmani, R.K., Agha, G., Squillante, M.S., Seiferas, J., Brezina, M., Hu, J., Tuminaro, R., Sanders, P., Träffe, J.L., Geijn, R.A., Träff, J.L., Sander, M.B., Gustafson, J.L., Dror, R.O., Young, C., Shaw, D.E., Lin, C., Lee, J.-K., Chang, R.-G., Kuan, C.-B., Kollias, G., Grama, A.Y., Li, Z., Whaley, R.C., Vuduc, R.W., 2011. Amdahl’s Law, in: Padua, D. (Ed.), Encyclopedia of Parallel Computing. Springer US, Boston, MA, pp. 53–60.

Kuhn, B., Guba, W., Hert, J., Banner, D., Bissantz, C., Ceccarelli, S., Haap, W., Körner, M., Kuglstatter, A., Lerner, C., Mattei, P., Neidhart, W., Pinard, E., Rudolph, M.G., Schulz-Gasch, T., Woltering, T., Stahl, M., 2016. A Real-World Perspective on Molecular Design. Journal of medicinal chemistry 59 (9), 4087–4102. 10.1021/acs.jmedchem.5b01875.

Lätti, S., Niinivehmas, S., Pentikäinen, O.T., 2016. Rocker: Open source, easy-to-use tool for AUC and enrichment calculations and ROC visualization. Journal of cheminformatics 8 (1), 45. 10.1186/s13321-016-0158-y.

Lesic, B., Lépine, F., Déziel, E., Zhang, J., Zhang, Q., Padfield, K., Castonguay, M.-H., Milot, S., Stachel, S., Tzika, A.A., Tompkins, R.G., Rahme, L.G., 2007. Inhibitors of pathogen intercellular signals as selective anti-infective compounds. PLoS pathogens 3 (9), 1229–1239. 10.1371/journal.ppat.0030126.

Li, H., Gao, Z., Kang, L., Zhang, H., Yang, K., Yu, K., Luo, X., Zhu, W., Chen, K., Shen, J., Wang, X., Jiang, H., 2006. TarFisDock: a web server for identifying drug targets with docking approach. Nucleic acids research 34 (Web Server issue), W219-24. 10.1093/nar/gkl114.

Lill, M.A., Danielson, M.L., 2011. Computer-aided drug design platform using PyMOL. Journal of computer-aided molecular design 25 (1), 13–19. 10.1007/s10822-010-9395-8.

Lin, V.T.G., Pruitt, H.C., Samant, R.S., Shevde, L.A., 2017. Developing Cures: Targeting Ontogenesis in Cancer. Trends in cancer 3 (2), 126–136. 10.1016/j.trecan.2016.12.007.

Medina-Franco, J.L., Giulianotti, M.A., Welmaker, G.S., Houghten, R.A., 2013. Shifting from the single to the multitarget paradigm in drug discovery. Drug discovery today 18 (9-10), 495–501. 10.1016/j.drudis.2013.01.008.

Morris, G.M., Huey, R., Lindstrom, W., Sanner, M.F., Belew, R.K., Goodsell, D.S., Olson, A.J., 2009. AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility. Journal of computational chemistry 30 (16), 2785–2791. 10.1002/jcc.21256.

Mysinger, M.M., Carchia, M., Irwin, J.J., Shoichet, B.K., 2012. Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking. Journal of medicinal chemistry 55 (14), 6582–6594. 10.1021/jm300687e.

Nikiforov, P.O., Blaszczyk, M., Surade, S., Boshoff, H.I., Sajid, A., Delorme, V., Deboosere, N., Brodin, P., Baulard, A.R., Barry, C.E., Blundell, T.L., Abell, C., 2017. Fragment-Sized EthR Inhibitors Exhibit Exceptionally Strong Ethionamide Boosting Effect in Whole-Cell Mycobacterium tuberculosis Assays. ACS chemical biology 12 (5), 1390–1396. 10.1021/acschembio.7b00091.

Nikiforov, P.O., Surade, S., Blaszczyk, M., Delorme, V., Brodin, P., Baulard, A.R., Blundell, T.L., Abell, C., 2016. A fragment merging approach towards the development of small molecule inhibitors of Mycobacterium tuberculosis EthR for use as ethionamide boosters. Organic & biomolecular chemistry 14 (7), 2318–2326. 10.1039/c5ob02630j.

Tange, O., 2011. GNU Parallel - The Command-Line Power Tool. ;login: The USENIX Magazine, 42–47.

O'Boyle, N.M., Banck, M., James, C.A., Morley, C., Vandermeersch, T., Hutchison, G.R., 2011a. Open Babel: An open chemical toolbox. Journal of cheminformatics 3, 33. 10.1186/1758-2946-3-33.

O'Boyle, N.M., Vandermeersch, T., Flynn, C.J., Maguire, A.R., Hutchison, G.R., 2011b. Confab - Systematic generation of diverse low-energy conformers. Journal of cheminformatics 3, 8. 10.1186/1758-2946-3-8.

Prakhov, N.D., Chernorudskiy, A.L., Gainullin, M.R., 2010. VSDocker: a tool for parallel high-throughput virtual screening using AutoDock on Windows-based computer clusters. Bioinformatics (Oxford, England) 26 (10), 1374–1375. 10.1093/bioinformatics/btq149.

Rahman, M.M., Karim, M.R., Ahsan, M.Q., Khalipha, A.B.R., Chowdhury, M.R., Saifuzzaman, M., 2012. Use of computer in drug design and drug discovery: A review. Int. J. Pharma Life Sci. 1 (2). 10.3329/ijpls.v1i2.12955.

Riento, K., Ridley, A.J., 2003. Rocks: multifunctional kinases in cell behaviour. Nature reviews. Molecular cell biology 4 (6), 446–456. 10.1038/nrm1128.

Salentin, S., Schreiber, S., Haupt, V.J., Adasme, M.F., Schroeder, M., 2015. PLIP: fully automated protein-ligand interaction profiler. Nucleic acids research 43 (W1), W443-7. 10.1093/nar/gkv315.

Sandeep, G., Nagasree, K.P., Hanisha, M., Kumar, M.M.K., 2011. AUDocker LE: A GUI for virtual screening with AUTODOCK Vina. BMC research notes 4, 445. 10.1186/1756-0500-4-445.

Santos-Martins, D., Forli, S., Ramos, M.J., Olson, A.J., 2014. AutoDock4(Zn): an improved AutoDock force field for small-molecule docking to zinc metalloproteins. Journal of chemical information and modeling 54 (8), 2371–2379. 10.1021/ci500209e.

Śledź, P., Caflisch, A., 2017. Protein structure-based drug design: from docking to molecular dynamics. Current opinion in structural biology 48, 93–102. 10.1016/j.sbi.2017.10.010.

Sliwoski, G., Kothiwale, S., Meiler, J., Lowe, E.W., 2014. Computational methods in drug discovery. Pharmacological reviews 66 (1), 334–395. 10.1124/pr.112.007336.

Surade, S., Ty, N., Hengrung, N., Lechartier, B., Cole, S.T., Abell, C., Blundell, T.L., 2014. A structure-guided fragment-based approach for the discovery of allosteric inhibitors targeting the lipophilic binding site of transcription factor EthR. The Biochemical journal 458 (2), 387–394. 10.1042/BJ20131127.

Takayama, K., Wang, C., Besra, G.S., 2005. Pathway to synthesis and processing of mycolic acids in Mycobacterium tuberculosis. Clinical microbiology reviews 18 (1), 81–101. 10.1128/CMR.18.1.81-101.2005.

Trott, O., Olson, A.J., 2010. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. Journal of computational chemistry 31 (2), 455–461. 10.1002/jcc.21334.

Umashankar, V. G. S., & Gurunathan, S., 2015. Drug discovery: an appraisal. Int J Pharm Pharmaceutical Sci 7, 59–66.

Wang, Z., Sun, H., Yao, X., Li, D., Xu, L., Li, Y., Tian, S., Hou, T., 2016. Comprehensive evaluation of ten docking programs on a diverse set of protein-ligand complexes: the prediction accuracy of sampling power and scoring power. Physical chemistry chemical physics : PCCP 18 (18), 12964–12975. 10.1039/c6cp01555g.

Willand, N., Dirié, B., Carette, X., Bifani, P., Singhal, A., Desroses, M., Leroux, F., Willery, E., Mathys, V., Déprez-Poulain, R., Delcroix, G., Frénois, F., Aumercier, M., Locht, C., Villeret, V., Déprez, B., Baulard, A.R., 2009. Synthetic EthR inhibitors boost antituberculous activity of ethionamide. Nature medicine 15 (5), 537–544. 10.1038/nm.1950.

Wishart, D.S., Feunang, Y.D., Guo, A.C., Lo, E.J., Marcu, A., Grant, J.R., Sajed, T., Johnson, D., Li, C., Sayeeda, Z., Assempour, N., Iynkkaran, I., Liu, Y., Maciejewski, A., Gale, N., Wilson, A., Chin, L., Cummings, R., Le, D., Pon, A., Knox, C., Wilson, M., 2017. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic acids research. 10.1093/nar/gkx1037.

Witzgall, F., Ewert, W., Blankenfeldt, W., 2017. Structures of the N-Terminal Domain of PqsA in Complex with Anthraniloyl- and 6-Fluoroanthraniloyl-AMP: Substrate Activation in Pseudomonas Quinolone Signal (PQS) Biosynthesis. Chembiochem : a European journal of chemical biology 18 (20), 2045–2055. 10.1002/cbic.201700374.

Xia, X., 2017. Bioinformatics and Drug Discovery. Current topics in medicinal chemistry 17 (15), 1709–1726. 10.2174/1568026617666161116143440.

Zhang, S., Kumar, K., Jiang, X., Wallqvist, A., Reifman, J., 2008. DOVIS: an implementation for high-throughput virtual screening using AutoDock. BMC bioinformatics 9, 126. 10.1186/1471-2105-9-126.

Figure legends:

Fig.1: Detailed flowchart of Ligand preparation module. The dotted red line indicates the quarantining of erroneous ligands during the preparation process.

Fig.2: Detailed flowchart of Virtual screening module.

Fig.3: Speedup ratio comparison for the POAP based runs of AutoDock, AutoDock Vina and Open Babel vs. corresponding serial execution.

Fig.4: Parallel performance comparison for AutoDock and Open Babel speedup ratio.

Fig.5: Speedup ratio comparison between default and POAP mediated run of AutoDock Vina.

Fig.6: ROC curve for the Docking Enrichment analysis: (a) AutoDock (b) AutoDock Vina. Colour representation: FAK1 - blue; ROCK1 - red; FABP4 - green.