A Self-Attention Based Message Passing Neural Network for Predicting Molecular Lipophilicity and Aqueous Solubility
Abstract
Efficient and accurate prediction of molecular properties, such as lipophilicity and solubility, is highly desirable for rational compound design in the chemical and pharmaceutical industries. To this end, we build and apply a graph-neural-network framework called self-attention-based message-passing neural network (SAMPN) to study the relationship between chemical properties and structures in an interpretable way. The main advantages of SAMPN are that it directly uses chemical graphs and breaks the black-box mold of many machine/deep learning methods. Specifically, its attention mechanism indicates the degree to which each atom of the molecule contributes to the property of interest, and these results are easily visualized. Further, SAMPN outperforms random forests and the deep learning framework MPN from Deepchem. In addition, another formulation of SAMPN (Multi-SAMPN) can simultaneously predict multiple chemical properties with higher accuracy and efficiency than other models that predict one specific chemical property. Moreover, SAMPN can generate chemically visible and interpretable results, which can help researchers discover new pharmaceuticals and materials. The source code of the SAMPN prediction pipeline is freely available at GitHub (https://github.com/tbwxmu/SAMPN).
Keywords: Message passing network, Attention mechanism, Deep learning, Lipophilicity, Aqueous solubility
molecular structure into a series of binary digits (a bit vector) [10] based on substructures that may or may not be pre-defined, depending on the class of fingerprints being used. For example, extended-connectivity fingerprints (ECFP) split one molecule into many substructures (not pre-defined) and encode all of them into one bit vector with different identifiers [11]. Alternatively, bit vectors may be extended into count vectors that indicate the number of occurrences of each substructure in the molecule, not just its presence or absence.

Compared to the previously mentioned traditional methods, artificial neural networks (ANNs) have become increasingly popular for predicting molecular properties. For example, a three-layered ANN with E-state indices was used to predict the aqueous solubility of organic molecules [15]. More recently, graph-based networks were applied to predict lipophilicity and solubility [16]. These network-based models have shown impressive results and have contributed to the development of new methods.

Fixed fingerprint feature-extraction rules are useful for accurately reflecting underlying chemical substructures, though they may not be the best-suited representation for all tasks. Hence, researchers have to spend much time and effort carefully determining which features are most relevant to their models. This is especially problematic when physical features are used, which may require advanced variable-selection techniques or a high level of empirical knowledge. In contrast, some deep learning networks based on the simplified molecular input line entry system (SMILES) [12] can learn the molecular features automatically [13, 14]. However, this may cause the model to focus on the SMILES grammar rather than the implied molecular structure. This limitation of SMILES-based deep learning models is hard to avoid, as the SMILES representation is not designed to capture molecular similarity: molecules with similar chemical structures can generally be encoded into very different SMILES strings. Even for the same molecular structure, there are often non-unique SMILES strings, as Fig. 1a displays. Though the process of generating canonical SMILES is well known, it is inconsistent among chemical toolkits. For example, the 'canonical' SMILES code for caffeine is CN1C=NC2=C1C(=O)N(C)C(=O)N2C according to RDKit, Cn1cnc2c1c(=O)n(C)c(=O)n2C according to Open Babel, and CN1C=NC2=C1C(=O)N(C(=O)N2C)C according to PubChem.

Fig. 1 Conversion of a chemical structure into a mathematical graph. a A chemical structure usually has a unique graph but multiple SMILES strings. b Relationship list between node indices and edge indices, converted from the chemical graph. c The lists of Node2Edge, Edge2Node, Edge2Revedge and Node2NeiNode, derived from (b)
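This toolkit dependence is easy to reproduce. Below is a minimal sketch (assuming only that RDKit is installed) showing that the three vendor strings quoted above collapse to a single canonical form once re-canonicalized by one toolkit, i.e., they differ only in notation, not in structure:

```python
from rdkit import Chem

# Three 'canonical' SMILES for caffeine, as emitted by different toolkits
# (RDKit, Open Babel and PubChem respectively, quoted in the text above)
variants = [
    "CN1C=NC2=C1C(=O)N(C)C(=O)N2C",
    "Cn1cnc2c1c(=O)n(C)c(=O)n2C",
    "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
]

# RDKit parses all three strings to the same molecular graph and then
# re-canonicalizes them to a single SMILES string
canonical = {Chem.MolToSmiles(Chem.MolFromSmiles(s)) for s in variants}
print(canonical)  # a set with exactly one element
```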
Using the natural chemical graph instead of the SMILES representation may be more suitable for chemical property prediction. Briefly, a graph consists of nodes and of edges that connect nodes to one another. Analogously, a chemical graph considers atoms as nodes and bonds as the edges connecting atoms to one another. Our formulation considers these edges as bidirectional, meaning that the bond connecting atom A to atom B is the same as the bond connecting atom B to atom A. An example chemical graph can be seen in Fig. 1a.

Essential chemical properties such as molecular validity are more easily represented in two-dimensional chemical graphs than in linear SMILES. Unlike SMILES codes, chemical graphs are invariant to molecule permutations, i.e., one molecular structure has one graph but multiple SMILES representations. Recently, graph-based deep learning models have been reported in QSAR and QSPR studies [7, 17–21]. However, according to these references, the predictions are difficult to interpret, since most neural networks act as black boxes [22].

In this paper, we describe a self-attention-based message-passing neural network (SAMPN) model, a modification of Deepchem's state-of-the-art MPN [16]. It directly learns the most relevant features of each QSAR/QSPR task during the learning process and assigns a degree of importance to substructures, improving the interpretability of the predictions. Our SAMPN graph network utilizes the chemical graph structure described above, where each edge is derived from a chemical bond and each atom is a node. Both our message passing neural network (MPN) and our SAMPN model can be used as multi-target models (Multi-MPN or Multi-SAMPN), which can learn not only the relationship between chemical structures and properties, but also the relationships among intrinsic attributes of molecules. To demonstrate our computational methods, we chose lipophilicity and aqueous solubility as the target properties, as they are important chemical descriptors that pervade every aspect of bioactivity, drug metabolism and pharmacokinetic (DMPK) profiles [23].

To our knowledge, this is the first time that a model like SAMPN has been used to predict chemical properties from experimental data for QSPR studies. The results from our experiments demonstrate that our SAMPN network yields superior performance relative to traditional ML-based models and previous deep-learning models (i.e., Deepchem's MPN [16]). Furthermore, the predictions of SAMPN are easily understood and visualized, since the integrated attention mechanism can color the atoms of a molecule based on their contributions to the property of interest.

Methods and materials
Datasets and data process
Datasets of molecular lipophilicity and aqueous solubility were used for developing and testing our method. Lipophilicity is usually quantified by the n-octanol/water partition coefficient P, preferentially displayed in logarithmic form as logP. The raw lipophilicity data were downloaded from CHEMBL3301361, deposited by AstraZeneca [24], and include 4200 molecules. Aqueous solubility is the saturated concentration of a chemical in the aqueous phase; it is usually expressed in log(mol/L) and represented as logS. This dataset was downloaded from the online chemical database and modeling environment (OCHEM) [25] and includes 1311 experimental records. The dataset distributions are plotted in Additional file 1: Fig. S1.

As both datasets are small relative to the typical size requirements of deep learning models, we used tenfold stratified cross-validation [13, 23, 35], where each dataset was randomly split into a training set and a validation set (80% and 10%, respectively) for parameter selection and a test set (10%) for model comparisons. We then repeated all experiments three times with different random seeds. This process ensures that the model does not simply memorize the training data and is capable of generalizing to new molecules.
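As an illustration, the split-and-repeat protocol can be sketched as follows. This is a simplified sketch, not the repository code: it uses a plain random split in place of the full stratified procedure, and the file name is hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def split_80_10_10(df: pd.DataFrame, seed: int):
    """80% train / 10% validation / 10% test, as described above."""
    train, rest = train_test_split(df, test_size=0.2, random_state=seed)
    valid, test = train_test_split(rest, test_size=0.5, random_state=seed)
    return train, valid, test

# hypothetical input file with the two columns kept after preprocessing
df = pd.read_csv("lipophilicity.csv")  # columns: 'smiles', 'experimental value'
for seed in (0, 1, 2):  # repeat every experiment with different random seeds
    train, valid, test = split_80_10_10(df, seed)
```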
For the initial data preprocessing, duplicate molecules were removed so that each chemical structure in the data was unique, keeping the maximum of the related property values. Molecules unrecognized by RDKit (version 2019.3) [26], a cheminformatics toolkit implemented in Python, were also deleted. Only two columns ('smiles' and 'experimental value') were kept as the input data to our models. Each downloaded SMILES representation was then converted into a directed graph before training the SAMPN model with the MPN encoder, which was adapted from Deepchem and Chemprop [27, 28]. The directed graphs are mainly composed of the index lists of nodes and edges shown in Fig. 1c. Take the substructure N–C as an example: the chemical bond between the N and C atoms yields two directed edges (C:0 → N:1 and N:0 → C:1). The number of nodes is equal to the number of atoms, and the number of edges is always double the number of bonds, since we consider edges to be bidirectional.
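The edge-doubling rule can be made concrete with RDKit. The sketch below is a simplified stand-in for the repository's featurizer: every bond reported by RDKit is expanded into a pair of opposing directed edges.

```python
from rdkit import Chem

def smiles_to_directed_edges(smiles: str):
    """Expand each chemical bond into two directed edges."""
    mol = Chem.MolFromSmiles(smiles)
    edges = []
    for bond in mol.GetBonds():
        a, b = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        edges.append((a, b))  # e.g. C:0 -> N:1
        edges.append((b, a))  # and the reverse edge N:1 -> C:0
    return edges

print(smiles_to_directed_edges("CN"))  # methylamine: [(0, 1), (1, 0)]
# the edge count is always 2 * mol.GetNumBonds(), matching the text above
```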
Message passing network encoder
Instead of manually selected features, using molecular graph structures directly was first reported in 1994 [29]. In recent years, graph-based methods have been used to analyze various aspects of chemical systems [14, 30] and compared with fingerprints [31]. Graph-based models provide a natural way to describe chemical molecules, where atoms in the molecule are equivalent to nodes and chemical bonds to edges in a graph. The message-passing network is a variant of these graph-theoretical approaches that gradually merges information from distant atoms by extending radially through bonds, as displayed in Fig. 2. The passed messages are used to encode all substructures of a molecule by an adaptive learning approach, which extracts representations of molecules suited to the target predictions.

Fig. 2 Representation of the SAMPN architecture. The main part of the MPN encoder converts the neighbor features to a molecule matrix, followed by a self-attention layer and fully connected networks to make a final prediction

The message passing network encoder works as follows in Eqs. (1–3). The passing message M from atom x to atom y in the d-th iteration (message passing depth) is calculated as:

M_{xy}^{d=1} = \mathrm{Re}\left( W_{inp} \cdot f_x f_{xy} \right)    (1)

M_{xy}^{d>1} = \mathrm{Re}\left( W_{inp} \cdot f_x f_{xy} + W_h \sum_{z \in N(x) \setminus y} M_{zx}^{d-1} \right)    (2)

Here, Re is the activation function (ReLU), and W_{inp} and W_h are learned weight matrices. As we use an edge-dependent neural network to pass messages, the node feature f_x is concatenated with the edge feature f_{xy} to form the merged node–edge feature f_x f_{xy}. The node feature f_x is derived from atom type, formal charge, valence and aromaticity; similarly, the edge feature f_{xy} is derived from bond order, ring status and connection direction. The definitions of the node features f_x and edge features f_{xy} are displayed in Table 1.

Table 1 Descriptions of node and edge features

  Attribute          Description                                                             Dimension
  Node
    Atom type        All currently known chemical elements                                   118
    Degree           Number of heavy atom neighbors                                          6
    Formal charge    Charge assigned to an atom (−2, −1, 0, 1, 2)                            5
    Chirality label  R, S, unspecified and unrecognized type of chirality                    4
    Hybridization    sp, sp2, sp3, sp3d, or sp3d2                                            5
    Aromaticity      Aromatic atom or not                                                    1
  Edge
    Bond type        Single, double, triple, or aromatic                                     4
    Ring             Whether the bond is in a ring                                           1
    Bond stereo      Nature of the bond's stereochemistry (none, any, Z, E, cis, or trans)   6

The initial message M_{xy}^{d=1}, which x sends to y, is generated from the merged node–edge feature f_x f_{xy} by a neural network as described in Eq. (1). In a chemical graph, the atoms form the node set, x ∈ V, and the bonds form the edge set, (x, y) ∈ E. Each edge has its own direction in the SAMPN model. N(x) and N(y) stand for the sets of neighbor nodes of x and y, respectively, and z ∈ N(x)\y means the neighbors of x excluding y. Node x is allowed to send a message to a neighbor node y only after node x has received messages from all its neighbor nodes except y. We use a skip connection in the message passing steps, as in Fig. 2 (displayed between the neighbor features and self-features). This skip connection allows a message to travel a very long distance without the vanishing-gradient problem during backpropagation. The generated messages are exchanged and updated based on the merged node–edge feature and the previous message passing step, as Eq. (2) defines.

The latent vector h_y of each node (take Node 2's latent vector h_2 in Fig. 2 as an example) is obtained by aggregating its neighbor messages after the message-passing process, as in Eq. (3):

h_y = \mathrm{Re}\left( W_o \left( W_{ah} \cdot f_y + \sum_{z \in N(y)} M_{zy}^{d} \right) \right)    (3)
where h_y captures the local chemical-structure features based on the passing depth, and W_o and W_{ah} are learned weight matrices. More detailed information on the SAMPN algorithm can be found in Additional file 1: Table S1 in the Supporting Materials. Applying Eqs. (1–3) to a chemical graph generates the final graph representation G = {h_1 … h_i … h_n}, which is combined with the self-attention mechanism and fully-connected neural networks to make the final prediction.
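For concreteness, a loop-based PyTorch sketch of Eqs. (1–3) follows. The class name and dimensions are illustrative; the actual SAMPN encoder (adapted from Deepchem/Chemprop) vectorizes the neighbor sums with the index lists of Fig. 1c instead of Python loops.

```python
import torch
import torch.nn as nn

class MPNEncoderSketch(nn.Module):
    """Minimal sketch of Eqs. (1)-(3): messages flow along directed edges."""

    def __init__(self, node_dim, edge_dim, hidden, depth):
        super().__init__()
        self.W_inp = nn.Linear(node_dim + edge_dim, hidden, bias=False)
        self.W_h = nn.Linear(hidden, hidden, bias=False)
        self.W_ah = nn.Linear(node_dim, hidden, bias=False)
        self.W_o = nn.Linear(hidden, hidden, bias=False)
        self.depth = depth

    def forward(self, f_node, f_edge, edges):
        # edges: list of directed pairs (x, y); f_edge[k] belongs to edges[k]
        src = torch.tensor([x for x, _ in edges])
        inp = self.W_inp(torch.cat([f_node[src], f_edge], dim=1))
        m = torch.relu(inp)                                      # Eq. (1)
        for _ in range(self.depth - 1):
            agg = torch.zeros_like(m)
            for k, (x, y) in enumerate(edges):
                # sum incoming messages z -> x over z in N(x) \ y
                for j, (z, w) in enumerate(edges):
                    if w == x and z != y:
                        agg[k] = agg[k] + m[j]
            m = torch.relu(inp + self.W_h(agg))                  # Eq. (2)
        h = []
        for y in range(f_node.size(0)):
            msg = torch.zeros(m.size(1))
            for j, (_, w) in enumerate(edges):
                if w == y:
                    msg = msg + m[j]
            h.append(torch.relu(self.W_o(self.W_ah(f_node[y]) + msg)))  # Eq. (3)
        return torch.stack(h)  # G = {h_1 ... h_n}, one row per atom
```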
Self-attention mechanism
Directly combining all the hidden states of the nodes into a single vector may not make the differences among the learned features explainable [32]. A better way is to apply the attention mechanism to obtain a context vector for the target node by focusing on its neighbors and local environment. Take Node 2 as an example (the blue node in Fig. 2): after several message passing steps, Node 2 has hidden state h_2, which represents the substructure centered at Atom 2. Meanwhile, all the other nodes undergo the same process, and h_n represents the substructure centered at Atom n. Since different substructures contribute differently to the molecular property, we can use the attention mechanism to capture the different influences of the substructures on the target molecular property.

A self-attention layer is then added to identify the relationship between each substructure's contribution and the target property of a molecule. A dot-product attention algorithm was implemented that takes the whole molecular graph representation G as input. The self-attention-weighted molecule graph embedding is formed as follows:

W_{att} = \mathrm{softmax}\left( G \cdot G^{T} \right)    (4)

E_G = W_{att} \cdot G    (5)

where W_{att} is the self-attention score matrix that implicitly indicates the contribution of each local chemical graph to the target property. As G = {h_1 … h_i … h_n}, the i-th row of W_{att} holds the attention weights between the i-th atom and the rest of the atoms in the molecule. E_G is the attentive embedding matrix, where each row corresponds to the attention-weighted hidden vector of a node. Then, global average pooling is applied to the sum of G and E_G to obtain the molecule latent vector, as Fig. 2 shows in the purple rectangle. Finally, the latent vector is fed through several layers of fully connected networks for the target property prediction.
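Equations (4–5) and the pooling step amount to a few tensor operations. A minimal sketch (the hidden size of 384 is only an example):

```python
import torch

def attentive_molecule_vector(G: torch.Tensor) -> torch.Tensor:
    """Sketch of Eqs. (4)-(5) plus the global pooling described above.
    G holds one row h_i per atom (the output of the MPN encoder)."""
    W_att = torch.softmax(G @ G.T, dim=-1)  # Eq. (4): self-attention scores
    E_G = W_att @ G                         # Eq. (5): attentive embedding
    # global average pooling over the sum of G and E_G gives the molecule
    # latent vector that is fed to the fully connected layers
    return (G + E_G).mean(dim=0)

mol_vec = attentive_molecule_vector(torch.randn(9, 384))  # 9 atoms
print(mol_vec.shape)  # torch.Size([384])
```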
Model training and hyperparameter optimization
The code for the MPN encoder was mainly adapted from Deepchem and Chemprop [27, 28]. Both the MPN encoder and the self-attention mechanism were implemented in Python with PyTorch version 1.0, an open-source framework for deep learning [33]. The MPN, Multi-MPN, SAMPN and Multi-SAMPN models were trained with the Adam optimizer using the same learning-rate schedule as in [34].

Multiple metrics were used to evaluate the performance of our models: mean absolute error (MAE), root mean squared error (RMSE), mean squared error (MSE), coefficient of determination (R2) and Pearson correlation coefficient (PC). Lower values of MAE, MSE and RMSE indicate better predictive performance; conversely, higher values of PC and R2 indicate better models or better fits to the data. While some of these metrics tell the same story, including all of them may provide a rich benchmark for future studies.

A grid-search algorithm was used to adjust the hyperparameters with the Hyperopt package, version 0.1.2 [35]. Table 2 shows the hyperparameters to be optimized and the search space. We chose RMSE on the validation set as the metric for finding the most suitable combination of hyperparameters within the search space. In the lipophilicity-QSPR task, one of the best combinations of hyperparameters was {'activation': 'ReLU'; 'depth': 4; 'dropout': 0.25; 'layers of fully connected networks': 2; 'hidden size': 384}. All the message passing neural network models (MPN, SAMPN, Multi-MPN and Multi-SAMPN) used these hyperparameters to test the final performance with tenfold stratified cross-validation on the whole dataset.

Table 2 Hyperparameter optimization for MPN and SAMPN

  Hyperparameter                       Range (interval)
  Activation function                  Tanh, ELU, LeakyReLU, ReLU, PReLU, SELU
  Steps of message passing             2–6 (1)
  Graph embedding size                 32–512 (32)
  Dropout rate                         0.0–0.4 (0.05)
  Layers of fully connected network    1–3 (1)
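With Hyperopt, the search over Table 2 can be sketched as below. The objective shown is a dummy stand-in: in the real pipeline it would train a SAMPN model with the sampled hyperparameters and return the validation-set RMSE.

```python
from hyperopt import fmin, tpe, hp

# search space mirroring Table 2
space = {
    "activation": hp.choice("activation",
                            ["Tanh", "ELU", "LeakyReLU", "ReLU", "PReLU", "SELU"]),
    "depth": hp.quniform("depth", 2, 6, 1),
    "hidden_size": hp.quniform("hidden_size", 32, 512, 32),
    "dropout": hp.quniform("dropout", 0.0, 0.4, 0.05),
    "ffn_layers": hp.quniform("ffn_layers", 1, 3, 1),
}

def objective(params):
    # hypothetical stand-in: train SAMPN with `params` and return the
    # validation RMSE; a toy quadratic keeps this sketch runnable
    return (params["depth"] - 4) ** 2

best = fmin(objective, space, algo=tpe.suggest, max_evals=50)
print(best)  # best hyperparameter combination found
```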
In addition to using the published results from Deepchem's MPN, we also built a pure MPN model to establish a baseline without self-attention, keeping all other configurations the same as in SAMPN. To compare single-task and multi-target deep learning networks, we built Multi-MPN and Multi-SAMPN. The multi-target models used a merged molecule dataset from 'Lipophilicity' and 'Water Solubility', as described in the Supporting Materials. All the parameters used were kept the same between MPN and SAMPN.

Random forest
To compare our SAMPN method with traditional machine learning methods, we chose a random forest model as the baseline. Random forest (RF) [36] is a supervised learning algorithm built on an ensemble of decision trees generated from a bootstrapped (bagged) sampling of compounds and features. It is widely used in traditional structure–property relationship research [37] and has been considered a "gold standard" owing to its robustness, ease of use and high prediction accuracy [38]. Here, ECFP with a fixed length of 1024 [12] was used with the RF model, as implemented in scikit-learn [40].

Table 3 Models' performance (root-mean-square error) on the lipophilicity and water solubility datasets

  Dataset (size)           Model                   RMSE
  Lipophilicity (4200)     RF                      0.824 ± 0.041
                           MPN (Deepchem)^a        0.630 ± 0.059
                           MPN (Deepchem)^b        0.652 ± 0.061
                           MPN                     0.630 ± 0.059
                           SAMPN                   0.579 ± 0.036
                           Multi-MPN               0.594 ± 0.039
                           Multi-SAMPN             0.571 ± 0.032
  Water solubility (1311)  RF                      1.096 ± 0.092
                           MPN (Deepchem-1128)^a   0.580 ± 0.030
                           MPN (Deepchem)^b        0.676 ± 0.022
                           MPN                     0.694 ± 0.050
                           SAMPN                   0.688 ± 0.057
                           Multi-MPN               0.674 ± 0.074
                           Multi-SAMPN             0.661 ± 0.063

Italics represent the best performance in the results
^a Values were reported in [16]. In the lipophilicity prediction, we use the same dataset as Deepchem; in the water solubility prediction, our dataset is larger than the one Deepchem used (1128 molecules)
^b Values were calculated from the same data with the same stratified cross-validation protocol as in our work
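A minimal sketch of such an RF baseline follows; the two molecules and their logP values are illustrative stand-ins for the training data.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestRegressor

def ecfp(smiles: str, n_bits: int = 1024) -> np.ndarray:
    """1024-bit ECFP (Morgan fingerprint, radius 2)."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=n_bits)
    return np.array(fp)

# hypothetical training data (SMILES and measured logP)
train_smiles = ["CCO", "c1ccccc1O"]
train_logp = [-0.31, 1.46]

X = np.stack([ecfp(s) for s in train_smiles])
rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(X, train_logp)
print(rf.predict(ecfp("CCN").reshape(1, -1)))  # predict a new molecule
```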
Fig. 3 Models’ performance on lipophilicity (a, c) and aqueous solubility (b, d) with the same tenfold stratified cross-validation. Error bars represent
standard deviations
As Fig. 3 and Table 3 show, Multi-SAMPN achieved better prediction performance than the single-target SAMPN or MPN. Although our case indicates that a multi-target model performs better than a single-target model, more studies are required to show whether this is general, since our case used only one lipophilicity and one water solubility dataset.

Visualize the attention
While higher prediction accuracy is always desirable, the ability to interpret a QSPR model is also important. Model comparison and interpretation can be facilitated by a visualization technique, making it possible to identify the learned features that drive compound property predictions. In the SAMPN model, we can obtain the attention weight scores from the self-attention mechanism. For a specific molecule, we compute the difference between each atom's weight score and the average attention weight of the molecule. We define this difference as the attention coefficient of the atom; these attention coefficients are very useful for gaining insight into which parts of a molecule increase the target molecular property and which decrease it.

By using heat-map coloring on each molecule (such as in Fig. 4a–f), it is easy to see which parts of the molecule play a more important role in its lipophilicity or water solubility. The lipophilicity and solubility heat maps are helpful for chemists seeking to optimize the lipophilicity and solubility of a particular molecule. Consider Fig. 4b, a depiction of 1H-indazole after applying our model. This molecule has a relatively high lipophilicity, as it has a large π-electron-conjugated system in its fused aromatic ring. However, the nitrogen-containing section of the molecule displays strong anti-lipophilic properties relative to the rest of the molecule. This may, in part, be due to nitrogen's contribution (as 'N' or 'NH') to a hydrogen-bonding network with its surroundings. Thus, altering 1H-indazole to disrupt that potential network may increase the molecule's lipophilicity. To test this hypothesis, we used SAMPN to predict the lipophilicity of benzo[d]isothiazole (Additional file 1: Fig. S2), the molecule made by exchanging the 'NH' of 1H-indazole with 'S' (sulfur). As expected, this change did increase the molecule's lipophilicity. Another example is the primary amine group in Fig. 4f, which can easily form hydrogen bonds with water molecules; this is reflected in red as a predicted soluble feature.

Fig. 4 Heat map molecule coloring on lipophilicity (a–c) and solubility (d–f). a–c Red indicates a predicted anti-lipophilic feature and blue indicates a predicted lipophilic feature. d–f Red indicates a predicted soluble feature and blue indicates a predicted anti-soluble feature
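One way to render such heat maps is RDKit's similarity-map utility. The sketch below (written against RDKit 2019-era releases, whose drawing API this matches) uses random numbers in place of the attention weights that a trained SAMPN would supply:

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem.Draw import SimilarityMaps

def color_by_attention(mol, att_weights):
    """Color each atom by its attention coefficient: the atom's weight
    minus the molecule's mean attention weight, as defined above."""
    coeffs = np.asarray(att_weights) - float(np.mean(att_weights))
    return SimilarityMaps.GetSimilarityMapFromWeights(mol, coeffs.tolist())

mol = Chem.MolFromSmiles("c1ccc2[nH]ncc2c1")  # 1H-indazole (cf. Fig. 4b)
weights = np.random.rand(mol.GetNumAtoms())   # stand-in for SAMPN attention
fig = color_by_attention(mol, weights)        # matplotlib figure with heat map
```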
Supplementary information
Supplementary information accompanies this paper at https://doi.org/10.1186/s13321-020-0414-z.
References
1. Hansen K, Biegler F, Ramakrishnan R, Pronobis W, Von Lilienfeld OA, Müller K-R, Tkatchenko A (2015) Machine learning predictions of molecular properties: accurate many-body potentials and non-locality in chemical space. J Phys Chem Lett 6:2326–2331
2. Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M, Dearden J, Gramatica P, Martin YC, Todeschini R (2014) QSAR modeling: where have you been? Where are you going to? J Med Chem 57:4977–5010
3. Chen H, Engkvist O, Wang Y, Olivecrona M, Blaschke T (2018) The rise of deep learning in drug discovery. Drug Discov Today 23:1241–1250
4. Le T, Epa VC, Burden FR, Winkler DA (2012) Quantitative structure–property relationship modeling of diverse materials properties. Chem Rev 112:2889–2919
5. Gómez-Bombarelli R, Aguilera-Iparraguirre J, Hirzel TD, Duvenaud D, Maclaurin D, Blood-Forsythe MA, Chae HS, Einzinger M, Ha D-G, Wu T (2016) Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach. Nat Mater 15:1120
6. Mannodi-Kanakkithodi A, Pilania G, Huan TD, Lookman T, Ramprasad R (2016) Machine learning strategy for accelerated design of polymer dielectrics. Sci Rep 6:20952
7. Feinberg EN, Sheridan R, Joshi E, Pande VS, Cheng AC (2019) Step change improvement in ADMET prediction with PotentialNet deep featurization. arXiv preprint arXiv:1903.11789
8. Ju S, Shiga T, Feng L, Hou Z, Tsuda K, Shiomi J (2017) Designing nanostructures for phonon transport via Bayesian optimization. Phys Rev X 7:021024
9. Hansch C, Maloney PP, Fujita T, Muir RM (1962) Correlation of biological activity of phenoxyacetic acids with Hammett substituent constants and partition coefficients. Nature 194:178
10. Riniker S, Landrum GA (2013) Open-source platform to benchmark fingerprints for ligand-based virtual screening. J Cheminform 5:26
11. Rogers D, Hahn M (2010) Extended-connectivity fingerprints. J Chem Inf Model 50:742–754
12. Weininger D (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 28:31–36
13. Olivecrona M, Blaschke T, Engkvist O, Chen H (2017) Molecular de-novo design through deep reinforcement learning. J Cheminform 9:48
14. Li X, Yan X, Gu Q, Zhou H, Wu D, Xu J (2019) DeepChemStable: chemical stability prediction with an attention-based graph convolution network. J Chem Inf Model 59:1044–1049
15. Tetko IV, Tanchuk VY, Kasheva TN, Villa AE (2001) Estimation of aqueous solubility of chemical compounds using E-state indices. J Chem Inf Comput Sci 41:1488–1493
16. Wu Z, Ramsundar B, Feinberg EN, Gomes J, Geniesse C, Pappu AS, Leswing K, Pande V (2018) MoleculeNet: a benchmark for molecular machine learning. Chem Sci 9:513–530
17. Réti T, Sharafdini R, Dregelyi-Kiss A, Haghbin H (2018) Graph irregularity indices used as molecular descriptors in QSPR studies. MATCH Commun Math Comput Chem 79:509–524
18. Sarkar D, Sharma S, Mukhopadhyay S, Bothra AK (2016) QSAR studies of FabH inhibitors using graph theoretical & quantum chemical descriptors. Pharmacophore 7
19. Shao Z, Hirayama Y, Yamanishi Y, Saigo H (2015) Mining discriminative patterns from graph data with multiple labels and its application to quantitative structure–activity relationship (QSAR) models. J Chem Inf Model 55:2519–2527
20. Wang X, Li Z, Jiang M, Wang S, Zhang S, Wei Z (2019) Molecule property prediction based on spatial graph embedding. J Chem Inf Model 59:3817–3828
21. Liu K, Sun X, Jia L, Ma J, Xing H, Wu J, Gao H, Sun Y, Boulnois F, Fan J (2019) Chemi-Net: a molecular graph convolutional network for accurate drug property prediction. Int J Mol Sci 20:3389
22. Goulon A, Picot T, Duprat A, Dreyfus G (2007) Predicting activities without computing descriptors: graph machines for QSAR. SAR QSAR Environ Res 18:141–153
23. Arnott JA, Planey SL (2012) The influence of lipophilicity in drug discovery and design. Expert Opin Drug Discov 7:863–875
24. AstraZeneca (2016) Experimental in vitro DMPK and physicochemical data on a set of publicly disclosed compounds. https://doi.org/10.6019/Chembl3301361
25. Sushko I, Novotarskyi S, Körner R, Pandey AK, Rupp M, Teetz W, Brandmaier S, Abdelaziz A, Prokopenko VV, Tanchuk VY et al (2011) Online chemical modeling environment (OCHEM): web platform for data storage, model development and publishing of chemical information. J Comput Aided Mol Des 25:533–554
26. Landrum G (2006) RDKit: open-source cheminformatics
27. Ramsundar B, Eastman P, Walters P, Pande V (2019) Deep learning for the life sciences: applying deep learning to genomics, microscopy, drug discovery, and more. O'Reilly Media, Newton
28. Yang K, Swanson K, Jin W, Coley CW, Eiden P, Gao H, Guzman-Perez A, Hopper T, Kelley B, Mathea M (2019) Analyzing learned molecular representations for property prediction. J Chem Inf Model 59:3370–3388
29. Kireev DB (1995) ChemNet: a novel neural network based method for graph/property mapping. J Chem Inf Comput Sci 35:175–180
30. Coley CW, Jin W, Rogers L, Jamison TF, Jaakkola TS, Green WH, Barzilay R, Jensen KF (2019) A graph-convolutional neural network model for the prediction of chemical reactivity. Chem Sci 10:370–377
31. Kearnes S, McCloskey K, Berndl M, Pande V, Riley P (2016) Molecular graph convolutions: moving beyond fingerprints. J Comput Aided Mol Des 30:595–608
32. Duvenaud DK, Maclaurin D, Iparraguirre J, Bombarell R, Hirzel T, Aspuru-Guzik A, Adams RP (2015) Convolutional networks on graphs for learning molecular fingerprints. In: Advances in neural information processing systems, pp 2224–2232
33. Paszke A, Gross S, Chintala S, Chanan G (2017) PyTorch: tensors and dynamic neural networks in Python with strong GPU acceleration
34. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5998–6008
35. Bergstra J, Komer B, Eliasmith C, Yamins D, Cox DD (2015) Hyperopt: a Python library for model selection and hyperparameter optimization. Comput Sci Discov 8:014008
36. Breiman L (2001) Random forests. Mach Learn 45:5–32
37. Polishchuk P (2017) Interpretation of quantitative structure–activity relationship models: past, present, and future. J Chem Inf Model 57:2618–2639
38. Ma J, Sheridan RP, Liaw A, Dahl GE, Svetnik V (2015) Deep neural nets as a method for quantitative structure–activity relationships. J Chem Inf Model 55:263–274
39. Oliphant TE (2007) Python for scientific computing. Comput Sci Eng 9:10–20
40. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830