Meta Learning With Graph Attention Networks for Low-Data Drug Discovery

Qiujie Lv, Guanxing Chen, Ziduo Yang, Weihe Zhong, and Calvin Yu-Chian Chen

IEEE Transactions on Neural Networks and Learning Systems, March 2023. DOI: 10.1109/TNNLS.2023.3250324
Abstract— Finding candidate molecules with favorable pharmacological activity, low toxicity, and proper pharmacokinetic properties is an important task in drug discovery. Deep neural networks have made impressive progress in accelerating and improving drug discovery. However, these techniques rely on a large amount of labeled data to form accurate predictions of molecular properties. At each stage of the drug discovery pipeline, usually only a few biological data points on candidate molecules and derivatives are available, indicating that the application of deep neural networks to low-data drug discovery is still a formidable challenge. Here, we propose a meta learning architecture with a graph attention network, Meta-GAT, to predict molecular properties in low-data drug discovery. The GAT captures the local effects of atomic groups at the atom level through the triple attentional mechanism and implicitly captures the interactions between different atomic groups at the molecular level. GAT is used to perceive the molecular chemical environment and connectivity, thereby effectively reducing sample complexity. Meta-GAT further develops a meta learning strategy based on bilevel optimization, which transfers meta knowledge from other attribute prediction tasks to low-data target tasks. In summary, our work demonstrates how meta learning can reduce the amount of data required to make meaningful predictions of molecules in low-data scenarios. Meta learning is likely to become the new learning paradigm in low-data drug discovery. The source code is publicly available at https://github.com/lol88/Meta-GAT.

Index Terms— Drug discovery, few examples, graph attention network, meta learning, molecular property.

I. INTRODUCTION

… virtual screening technologies and high-throughput omics technologies, researchers can integrate the relevant knowledge of computational chemistry, physics, and structural biology to effectively screen and design molecular compounds [3], [4], [5], [6], [7]. The key issue of drug discovery is the screening and optimization of candidate molecules, which must meet a series of criteria: the compound needs to have suitable potential for biological targets and exhibit good physicochemical properties; absorption, distribution, metabolism, excretion, and toxicity (ADMET); water solubility; and mutagenicity [8], [9]. However, there are usually only a few validated leads and derivatives that can be used for lead optimization [10], [11]. Also, due to possible toxicity, low activity, and low solubility, there are often only a few real biological data on candidate molecules and analog molecules. The accuracy of the physicochemical properties of candidate molecules directly affects the results of the drug development process. Therefore, researchers have paid more and more attention to accurately predicting the physicochemical properties of candidate molecules with low data.

In the past few years, deep learning technology has been implemented to accelerate and improve the drug discovery process [12], [13], [14], [15], and some key advances have been made in molecular property prediction [16], [17], [18], [19], [20], side effect prediction [21], [22], [23], and virtual screening [24], [25]. In particular, the graph neural network (GNN), which can learn the information contained in the nodes …
Biswas et al. [38] developed UniRep for protein engineering to efficiently use resource-intensive high-fidelity assays without sacrificing throughput, with subsequent low-N supervision identifying improvements to the activity of interest. Liu et al. [39] from the Chinese Academy of Sciences established a complete and effective screening method for disease target markers based on few examples (or even one sample). Lin et al. [40] proposed a prototypical graph contrastive learning (PGCL) method for learning graph representations, which improved the results of molecular property prediction. Yu and Tran [25] proposed an XGBoost-based fitted Q iteration algorithm that uses fewer training data to find the optimal structured treatment interruption (STI) strategies for HIV patients. Related studies have also explored drug virtual screening and combination drug prediction based on few-example learning methods [41], [42]. The abovementioned work represents useful attempts to apply meta learning to few-sample learning problems, indicating that the meta learning method has the potential to be a useful tool in drug discovery and other bioinformatics research fields.

Meta learning uses meta knowledge to reduce the requirement on sample complexity, thus solving the core problem of minimizing the risk of unreliable experience. However, molecular structure is usually composed of interactions between atoms and complex electronic configurations. Even small changes in the molecular structure may lead to completely opposite molecular properties. To learn the complexity of molecular structure, the model must extract both the local environmental influence of neighboring atoms on the central atom and the rich nonlocal information contained between pairs of atoms that are topologically far apart. Therefore, meta learning for low-data drug discovery is highly dependent on the structure of the network and needs to be redesigned for widely varying tasks.

Meta learning has made some representative attempts to predict molecular properties. Altae-Tran et al. [43] introduced an iteratively refined long short-term memory (IterRefLSTM) architecture that generates dually evolved embeddings for one-shot learning. Adler et al. [44] proposed cross-domain Hebbian ensemble few-shot learning (CHEF), which achieves representation fusion by an ensemble of Hebbian learners acting on different layers of a deep neural network. The meta-molecular graph neural network (Meta-MGNN) leverages a pretrained GNN and introduces additional self-supervised tasks, such as bond reconstruction and atom-type prediction, to be jointly optimized with the molecular property prediction tasks [45]. Meta-MGNN and CHEF obtain meta knowledge through pretraining on a large-scale molecular corpus and additional self-supervised model parameters. IterRefLSTM trains a memory-augmented model, which restricts the model structure and can only be used in specific domain scenarios. How to represent molecular features effectively and how to capture common knowledge between different tasks remain great challenges for meta learning.

In this work, we propose a meta learning architecture based on a graph attention network, Meta-GAT, to predict the biochemical properties of molecules in low-data drug discovery. The graph attention network captures the local effects of atomic groups at the atomic level through the triple attentional mechanism, so that the GAT can learn the influence of an atom group on the properties of the compound. At the molecular level, GAT treats the entire molecule as a supervirtual node that connects every atom in a molecule, implicitly capturing the interactions between different atomic groups. The gated recurrent unit (GRU) hierarchical model mainly focuses on abstracting or transferring limited molecular information into higher-level feature vectors or meta knowledge, improving the ability of the GAT to perceive the chemical environment and connectivity in molecules, thereby efficiently reducing sample complexity. This is very important for low-data drug discovery. Meta-GAT benefits from meta knowledge and further develops a meta learning strategy based on bilevel optimization, which transfers meta knowledge from other attribute prediction tasks to low-data target tasks, allowing the model to quickly adapt to molecular attribute prediction with few examples. Meta-GAT achieves accurate prediction of new molecular properties from few examples on multiple public benchmark datasets. These advantages indicate that Meta-GAT is likely to become a viable option for low-data drug discovery. In addition, the Meta-GAT code and data are open source at https://github.com/lol88/Meta-GAT, so that the results can be easily replicated.

Our contributions can be summarized as follows.
1) We create a chemical tool to predict multiple physiological properties of new molecules that are unseen by the model. This tool could push the boundaries of molecular representation for low-data drug discovery.
2) The proposed Meta-GAT captures the local effects of atomic groups at the atomic level through the triplet attentional mechanism and can also model global effects of molecules at the molecular level.
3) We propose a meta learning strategy to selectively update parameters within each task through a bilevel optimization, which is particularly helpful to capture the generic knowledge shared across different tasks.
4) Meta-GAT demonstrates how meta learning can reduce the amount of data required to make meaningful predictions of molecules in low-data drug discovery.

II. METHODS

In this section, we first briefly introduce the mathematical formalism of Meta-GAT and then introduce the meta learning strategy and the graph attention network structure. Finally, the parameters and details of the model training are given. Fig. 1 shows the overall architecture of Meta-GAT for low-data drug discovery.
Fig. 1. Meta learning framework for few-example molecular property prediction. The blue box and the orange box represent the data flow in the training phase and the test phase, respectively.
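Throughout Section II, each molecule x enters the model as a graph carrying nine atom features and four bond features (see the graph attention network subsection). A rough RDKit-based sketch of this kind of SMILES-to-graph featurization is shown below; the exact feature set and tensor layout are illustrative assumptions, not the authors' released code.

```python
import torch
from rdkit import Chem

def atom_features(atom):
    # Illustrative atom descriptors (the paper uses nine; this exact set is an assumption).
    return [
        atom.GetAtomicNum(),
        atom.GetDegree(),
        atom.GetFormalCharge(),
        atom.GetTotalNumHs(),
        int(atom.GetIsAromatic()),
        int(atom.IsInRing()),
        0.01 * atom.GetMass(),
    ]

def bond_features(bond):
    # Illustrative bond descriptors (the paper uses four; this exact set is an assumption).
    return [
        bond.GetBondTypeAsDouble(),
        int(bond.GetIsConjugated()),
        int(bond.IsInRing()),
        int(bond.GetStereo() != Chem.BondStereo.STEREONONE),
    ]

def smiles_to_graph(smiles):
    """Turn a SMILES string into (atom features v, bond features e, edge index)."""
    mol = Chem.MolFromSmiles(smiles)
    v = torch.tensor([atom_features(a) for a in mol.GetAtoms()], dtype=torch.float)
    tgt, nbr, e = [], [], []
    for bond in mol.GetBonds():
        i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        bf = bond_features(bond)
        # Add both directions so every atom can attend over all of its neighbors.
        tgt += [i, j]
        nbr += [j, i]
        e += [bf, bf]
    edge_index = torch.tensor([tgt, nbr], dtype=torch.long)  # rows: (target i, neighbor j)
    return v, torch.tensor(e, dtype=torch.float), edge_index

# Example: caffeine
v, e, edge_index = smiles_to_graph("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")
```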
A. Problem Formulations

Consider several common drug discovery tasks T, such as predicting the toxicity and side effects of new molecules; x is the compound molecule to be measured, and the label y is the binary experimental label (positive/negative) of the molecular property. Suppose that all the potential laws considered by the model form the hypothesis space H, and h is the optimal hypothesis from x to y. The expected risk R(h) represents the prediction ability of the decision model over all samples. The empirical risk R(h_I) represents the predictive ability of the model on the training set, computed as the average value of the loss function, where I is the number of samples in the training set. The empirical risk R(h_I) is used to estimate the expected risk R(h). In real-world applications, only a few examples are available for a property prediction task of a new molecule, that is, I → few. According to empirical risk minimization theory, if only a few training samples can be provided, the empirical risk R(h_I) is far from a good approximation of the expected risk R(h), and the obtained empirical risk minimizer is unreliable [46]. The learning challenge is to obtain a reliable empirical risk minimizer from a few examples, such that R(h_I) approaches the optimal R(h), as shown in the following equation:

$$\mathbb{E}\left[R\left(h_{I \rightarrow \text{few}}\right) - R(h)\right] = 0. \qquad (1)$$

Empirical risk minimization is closely related to sample complexity. Sample complexity refers to the number of training samples required to minimize the empirical risk R(h_I). According to Vapnik–Chervonenkis (VC) theory, when samples are insufficient, H needs lower complexity so that the few examples provided are sufficient as compensation. We use meta knowledge w to reduce the complexity of learning samples, thus solving the core problem of minimizing the risk of unreliable experience.

B. Meta Learning

Meta learning, also known as learning to learn, means learning a learning experience by systematically observing how the model performs on a wide range of learning tasks. This learning experience is called meta knowledge w. The goal of meta learning is to find the w shared across different tasks, so that the model can quickly generalize to new tasks that contain only a few examples with supervised information. The difference between meta learning and transfer learning is that transfer learning usually fits the distribution of one dataset, while meta learning fits the distribution of multiple similar tasks. Therefore, the training samples of meta learning are a series of tasks.

Model-agnostic meta-learning (MAML) [47] is used as the base meta learning algorithm for the Meta-GAT framework. Meta-GAT selectively updates parameters within each task through a bilevel optimization and transfers meta knowledge to new tasks with few labeled samples, as shown in Fig. 1. Bilevel optimization means that one optimization contains another optimization as a constraint. In the inner-level optimization, we hope to learn a general meta knowledge w from the support sets of the training tasks, so that the loss of different tasks can be as small as possible. The inner-level optimization phase can be formalized as shown in (3). In the outer-level optimization, Meta-GAT calculates the gradient with respect to the optimal parameters on the query set of each task and minimizes the total loss over all training tasks to optimize the parameter w, thereby reducing the expected loss of the training tasks, as shown in (2). Algorithm 1 shows the specific algorithm details

$$w^{*} = \underset{w}{\operatorname{argmin}} \sum_{i=1}^{M} \mathcal{L}^{\text{meta}}_{f_{\theta}}\left(\theta^{*(i)}(w),\, D^{q(i)}_{\text{train}}\right) \qquad (2)$$

$$\theta^{*(i)}(w) = \underset{\theta}{\operatorname{argmin}}\; \mathcal{L}^{\text{task}}_{f_{\theta}}\left(\theta, w,\, D^{s(i)}_{\text{train}}\right) \qquad (3)$$

where $\mathcal{L}^{\text{meta}}$ and $\mathcal{L}^{\text{task}}$ refer to the outer and inner objectives, respectively, and i indexes the ith training task.

Specifically, first, the training tasks T_train and test tasks T_test are extracted from a set of multiple drug discovery tasks T, where each task has a support set D^s and a query set D^q. Meta-GAT uses a large number of training tasks T_train to fit the distribution of multiple similar tasks T. Second, Meta-GAT sequentially iterates over a batch of training tasks, learns task-specific parameters, and tries to minimize the loss using …
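To make the bilevel update in (2) and (3) concrete, the following is a minimal MAML-style training loop in PyTorch. A small MLP on fixed-length molecular descriptors stands in for the GAT encoder, and sample_tasks is a toy stand-in for the support/query episode sampler; none of this is the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def make_params(in_dim, hidden=128):
    # Meta parameters w shared across tasks (a 2-layer MLP stand-in for the GAT encoder).
    return {
        "w1": (0.01 * torch.randn(in_dim, hidden)).requires_grad_(),
        "b1": torch.zeros(hidden, requires_grad=True),
        "w2": (0.01 * torch.randn(hidden, 1)).requires_grad_(),
        "b2": torch.zeros(1, requires_grad=True),
    }

def forward(p, x):
    h = F.relu(x @ p["w1"] + p["b1"])
    return (h @ p["w2"] + p["b2"]).squeeze(-1)

def inner_adapt(p, x_s, y_s, inner_lr=0.1, steps=1):
    # Inner level (3): task-specific parameters adapted on the support set.
    adapted = dict(p)
    for _ in range(steps):
        loss = F.binary_cross_entropy_with_logits(forward(adapted, x_s), y_s)
        grads = torch.autograd.grad(loss, list(adapted.values()), create_graph=True)
        adapted = {k: v - inner_lr * g for (k, v), g in zip(adapted.items(), grads)}
    return adapted

def sample_tasks(n_tasks=4, n_support=10, n_query=16, in_dim=2048):
    # Toy episode sampler; in practice each task is one assay with its own support/query split.
    for _ in range(n_tasks):
        yield (torch.randn(n_support, in_dim), torch.randint(0, 2, (n_support,)).float(),
               torch.randn(n_query, in_dim), torch.randint(0, 2, (n_query,)).float())

meta = make_params(in_dim=2048)                  # e.g., fingerprint-length inputs
opt = torch.optim.Adam(list(meta.values()), lr=1e-3)

for step in range(100):                          # outer loop over batches of training tasks
    opt.zero_grad()
    meta_loss = 0.0
    for x_s, y_s, x_q, y_q in sample_tasks():
        adapted = inner_adapt(meta, x_s, y_s)
        # Outer level (2): loss of the adapted model on the task's query set.
        meta_loss = meta_loss + F.binary_cross_entropy_with_logits(forward(adapted, x_q), y_q)
    meta_loss.backward()                         # gradients flow back to w through (3)
    opt.step()
```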
… and provide insight into the edge characteristics of molecular bonds.

D. Graph Attention Network

GNNs have made substantial progress in the field of chemical informatics. They have an extraordinary capacity to learn the intricate relationships between structures and properties [16], [48], [49], [50]. The attention mechanism has proved its outstanding performance in predicting molecular properties. Molecular structure involves the spatial positions of atoms and the types of chemical bonds. Topologically adjacent nodes in molecules have a greater chance of interacting with each other. In some cases, they can also form functional groups that determine the chemical properties of the molecule. In addition, pairs of atoms that are topologically far apart may also have significant interactions, such as intramolecular hydrogen bonds. Our graph attention network extracts insights on molecular structure and features from both local and global perspectives, as shown in Fig. 2. GAT captures the local effects of atomic groups at the atomic level through the attentional mechanism and can also model global effects of molecules at the molecular level.

The molecule G = (v, e) can be defined as a graph composed of a set of atoms (nodes) v and a set of bonds (edges) e. Similar to previous studies, we encode chemical information, including nine atomic features and four bond features, into the molecular graph as the input of the graph attention network. For the local environment within the molecule, previous graph networks only aggregate the neighbor nodes' information, which may lead to insufficient edge (bond) information extraction. Our GAT gradually aggregates the triplet embedding of target node v_i, neighbor node v_j, and edge e_ij through the triple attention mechanism.

Specifically, GAT first performs a linear transformation and nonlinear activation on the neighbor nodes' state vectors v_i, v_j and their edge hidden states e_ij to align these vectors to the same dimension, and concatenates them into triplet embedding vectors. Then, h_ij is normalized by the softmax function over all neighbor nodes to obtain the attention weights a_ij. Finally, the node hidden state and edge hidden state are elementwise multiplied by the neighbor node representation, and the information of the neighbors (including neighbor nodes and edges) is aggregated according to the attention weights to obtain the context state c_i of atom i. The formulas are shown below:

$$h_{ij} = \text{LeakyReLU}\left(W \cdot \left[v_i, e_{ij}, v_j\right]\right) \qquad (7)$$

$$a_{ij} = \text{softmax}\left(h_{ij}\right) = \frac{\exp\left(h_{ij}\right)}{\sum_{j \in N(i)} \exp\left(h_{ij}\right)} \qquad (8)$$

$$c_i = \sum_{j \in N(i)} a_{ij} \cdot W \cdot \left[e_{ij}, v_j\right] \qquad (9)$$

where N(i) is the set of neighbor nodes of node i and W is a trainable weight matrix. Then, a GRU is used as the message transfer function to fuse messages with a farther radius and generate a new context state, as shown in Fig. 2 (bottom left). As the time step t increases, messages of nodes and edges in the range centered on node i, whose radius increases with t, are collected successively to generate the new states h_i^t, which are computed by

$$h_i^{t} = \text{GRU}\left(h_i^{t-1}, c_i^{t-1}\right). \qquad (10)$$

In order to include more global information from the molecule, GAT aggregates the atomic-level representation through the readout function, which treats the entire molecule as a supervirtual node that connects every atom in the molecule. We use the bidirectional GRU (BiGRU) with attention to …
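A compact PyTorch sketch of the atom-level step in (7)-(10) is given below: triplet attention scores over (target atom, bond, neighbor atom), a per-atom softmax, attention-weighted aggregation, and a GRU state update. Layer sizes, the scatter-style softmax, and the number of time steps are illustrative assumptions rather than the released Meta-GAT code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TripletAttentionLayer(nn.Module):
    """Sketch of the atom-level aggregation in (7)-(10): attention over
    (target atom, bond, neighbor atom) triplets followed by a GRU update."""

    def __init__(self, node_dim, edge_dim, hidden):
        super().__init__()
        self.node_proj = nn.Linear(node_dim, hidden)
        self.edge_proj = nn.Linear(edge_dim, hidden)
        self.score = nn.Linear(3 * hidden, 1)         # h_ij in (7)
        self.message = nn.Linear(2 * hidden, hidden)  # W·[e_ij, v_j] in (9)
        self.gru = nn.GRUCell(hidden, hidden)         # (10)

    def forward(self, v, e, edge_index, t_steps=2):
        # v: [N, node_dim] atom features, e: [E, edge_dim] bond features,
        # edge_index: [2, E] listing (target i, neighbor j) for every directed bond.
        i, j = edge_index
        h = torch.relu(self.node_proj(v))             # initial atom states h_i^0
        eh = torch.relu(self.edge_proj(e))
        for _ in range(t_steps):
            # (7) triplet score for every directed edge
            s = F.leaky_relu(self.score(torch.cat([h[i], eh, h[j]], dim=-1))).squeeze(-1)
            # (8) softmax over the neighbors N(i) of each target atom
            num = (s - s.max()).exp()
            denom = torch.zeros(h.size(0), device=s.device).index_add_(0, i, num) + 1e-9
            a = num / denom[i]
            # (9) attention-weighted aggregation of [bond, neighbor] messages
            m = self.message(torch.cat([eh, h[j]], dim=-1))
            c = torch.zeros_like(h).index_add_(0, i, a.unsqueeze(-1) * m)
            # (10) GRU update of the atom states with the new context
            h = self.gru(c, h)
        return h  # atom-level representations, later pooled by the BiGRU readout

# e.g., with the tensors from the featurization sketch earlier:
# h = TripletAttentionLayer(node_dim=v.size(1), edge_dim=e.size(1), hidden=64)(v, e, edge_index)
```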
TABLE III: Scores for Consistency Checks on the Tox21 Dataset Using Kappa and Paired Wilcoxon Tests
TABLE IV: Scores for Consistency Checks on the SIDER Dataset Using Kappa and Paired Wilcoxon Tests
TABLE V: Scores for Consistency Checks on the MUV Dataset Using Kappa and Paired Wilcoxon Tests
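The consistency checks behind Tables III-V come down to comparing two models' predictions on the same tasks with Cohen's kappa and a paired Wilcoxon signed-rank test. A minimal sketch with scipy and scikit-learn, on placeholder data:

```python
import numpy as np
from scipy.stats import wilcoxon
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)

# Placeholder outputs of two models evaluated on the same data.
pred_a = rng.integers(0, 2, size=200)                         # binarized predictions, model A
pred_b = np.where(rng.random(200) < 0.1, 1 - pred_a, pred_a)  # model B, agreeing ~90% of the time
auc_a = rng.uniform(0.70, 0.90, size=20)                      # per-task ROC-AUC, model A
auc_b = auc_a + rng.normal(0.01, 0.02, size=20)               # per-task ROC-AUC, model B

kappa = cohen_kappa_score(pred_a, pred_b)   # agreement between the two models beyond chance
stat, p_value = wilcoxon(auc_a, auc_b)      # paired test on the per-task scores
print(f"Cohen's kappa = {kappa:.3f}, Wilcoxon p-value = {p_value:.4f}")
```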
TABLE VI: Comparison of Predictive Performances (MAE) on the QM9 Dataset Quantum Properties. Note That for MAE, a Lower Value Indicates Better Performance
TABLE VII: ROC-AUC Scores of Models Trained on Tox21 and Tested on SIDER
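The headline numbers in Tables VI and VII are standard metrics: MAE for the QM9 regression targets (lower is better) and ROC-AUC for the classification benchmarks. A short sketch of how such scores are computed from query-set predictions, using scikit-learn on placeholder arrays:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, roc_auc_score

# Placeholder query-set predictions for one classification task (e.g., a Tox21 assay).
y_true = np.array([0, 1, 1, 0, 1])
y_score = np.array([0.2, 0.8, 0.6, 0.4, 0.9])      # predicted probabilities
print("ROC-AUC:", roc_auc_score(y_true, y_score))

# Placeholder predictions for one regression target (e.g., a QM9 quantum property).
y_true_reg = np.array([1.20, 0.95, 1.40])
y_pred_reg = np.array([1.15, 1.00, 1.30])
print("MAE:", mean_absolute_error(y_true_reg, y_pred_reg))
```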
In addition, we conducted two visualization experiments, on the atom similarity matrix and on the attention weights, to rationalize Meta-GAT. We obtained the similarity coefficient between atom pairs by calculating the Pearson correlation coefficient of their feature vectors and plotted the heatmaps of the atomic similarity matrices for six molecules, as shown in Fig. 10. Taking the molecular structure of Dipyrone as an example, the atoms in Dipyrone are clearly separated into three clusters: a benzene (atoms 0–5), an aminomethanesulfonic acid (atoms 6–13), and a pyrazolidone (atoms 14–20). The first impression of the visual pattern in the heat map for the compound iodoantipyrine may show some degree of chaos, which is caused by the disorder of the atom numbers in SMILES. Combining atoms 0–6, atom N13, and atom C14 of iodoantipyrine, the atoms in iodoantipyrine are clearly divided into two clusters. The visual pattern of these heat maps strongly agrees with our chemical intuition regarding these molecular structures.

Fig. 10. Heatmap of atomic similarity matrices for six molecules.

Fig. 11. Attention weights learned from Meta-GAT are used to highlight each atom in nine molecules in the toxicity prediction task on the Tox21 dataset.
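The atomic similarity heatmaps of Fig. 10 are obtained by correlating learned atom-level feature vectors. A minimal numpy/matplotlib sketch, with a random matrix standing in for the embeddings Meta-GAT produces for one molecule:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder for the learned atom-level feature vectors of one molecule
# (e.g., 21 atoms for Dipyrone), shape [n_atoms, feature_dim].
atom_embeddings = np.random.default_rng(0).normal(size=(21, 64))

# Pearson correlation between every pair of atom feature vectors.
similarity = np.corrcoef(atom_embeddings)   # shape [n_atoms, n_atoms]

plt.imshow(similarity, cmap="viridis", vmin=-1, vmax=1)
plt.colorbar(label="Pearson correlation")
plt.xlabel("atom index (SMILES order)")
plt.ylabel("atom index (SMILES order)")
plt.title("Atomic similarity matrix")
plt.show()
```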
IV. CONCLUSION

Drug discovery is the process of discovering new molecules' properties and identifying the useful molecules as new drugs after optimization. In the initial stage of optimization of candidate molecules, due to low solubility or possible toxicity, new molecules or analog molecules do not have many records of real physicochemical properties and biological activities. Therefore, the key problem of AI-assisted drug discovery is learning from few examples. Here, we propose a meta learning method based on a graph attention network, Meta-GAT, which uses the graph attention network to extract the interactions of atom pairs and the edge features of bonds in molecules. Also, the meta learning algorithm trains a well-initialized parameter set through multiple prediction tasks and, on this basis, performs one or more steps of gradient adjustment to quickly adapt to a new task with only few data. Meta-GAT achieves SOTA performance on multiple public benchmark datasets, indicating that it can adapt to new tasks faster than other models. This algorithm is expected to fundamentally solve the problem of few samples in drug discovery. We have demonstrated that Meta-GAT can provide a powerful impetus for low-data drug discovery. The development of meta learning is an important direction of AI-assisted drug discovery. It is believed that this new learning paradigm can be applied in the field of drug discovery in the future.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their valuable suggestions.

REFERENCES

[1] H. Dowden and J. Munro, "Trends in clinical success rates and therapeutic focus," Nature Rev. Drug Discovery, vol. 18, no. 7, pp. 495–497, 2019.
[2] L. Wang et al., "Accurate and reliable prediction of relative ligand binding potency in prospective drug discovery by way of a modern free-energy calculation protocol and force field," J. Amer. Chem. Soc., vol. 137, no. 7, pp. 2695–2703, Feb. 2015.
[3] G. Sliwoski, S. Kothiwale, J. Meiler, and E. W. Lowe, "Computational methods in drug discovery," Pharmacological Rev., vol. 66, no. 1, pp. 334–395, 2014.
[4] Z. Yang, W. Zhong, L. Zhao, and C. Y.-C. Chen, "ML-DTI: Mutual learning mechanism for interpretable drug–target interaction prediction," J. Phys. Chem. Lett., vol. 12, no. 17, pp. 4247–4261, 2021.
[5] J.-Q. Chen, H.-Y. Chen, W.-J. Dai, Q.-J. Lv, and C. Y.-C. Chen, "Artificial intelligence approach to find lead compounds for treating tumors," J. Phys. Chem. Lett., vol. 10, no. 15, pp. 4382–4400, Aug. 2019.
[6] J.-Y. Li, H.-Y. Chen, W.-J. Dai, Q.-J. Lv, and C. Y.-C. Chen, "Artificial intelligence approach to investigate the longevity drug," J. Phys. Chem. Lett., vol. 10, no. 17, pp. 4947–4961, Sep. 2019.
[7] C. Y. Lee and Y.-P.-P. Chen, "New insights into drug repurposing for COVID-19 using deep learning," IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 11, pp. 4770–4780, Nov. 2021.
[8] M. J. Waring et al., "An analysis of the attrition of drug candidates from four major pharmaceutical companies," Nature Rev. Drug Discovery, vol. 14, no. 7, pp. 475–486, Jul. 2015.
[9] J. Wenzel, H. Matter, and F. Schmidt, "Predictive multitask deep neural network models for ADME-Tox properties: Learning from large data sets," J. Chem. Inf. Model., vol. 59, no. 3, pp. 1253–1268, Mar. 2019.
[10] J. Ma, R. P. Sheridan, A. Liaw, G. E. Dahl, and V. Svetnik, "Deep neural nets as a method for quantitative structure–activity relationships," J. Chem. Inf. Model., vol. 55, no. 2, pp. 263–274, 2015.
[11] R. S. Simões, V. G. Maltarollo, P. R. Oliveira, and K. M. Honorio, "Transfer and multi-task learning in QSAR modeling: Advances and challenges," Frontiers Pharmacol., vol. 9, p. 74, Feb. 2018.
[12] C. Li et al., "Geometry-based molecular generation with deep constrained variational autoencoder," IEEE Trans. Neural Netw. Learn. Syst., early access, 2022, doi: 10.1109/TNNLS.2022.3147790.
[13] C. Ji, Y. Zheng, R. Wang, Y. Cai, and H. Wu, "Graph polish: A novel graph generation paradigm for molecular optimization," IEEE Trans. Neural Netw. Learn. Syst., early access, Sep. 14, 2021, doi: 10.1109/TNNLS.2021.3106392.
[14] P. Schneider et al., "Rethinking drug design in the artificial intelligence era," Nature Rev. Drug Discovery, vol. 19, no. 5, pp. 353–364, May 2020.
[15] X. Jing and J. Xu, "Fast and effective protein model refinement using deep graph neural networks," Nature Comput. Sci., vol. 1, no. 7, pp. 462–469, Jul. 2021.
[16] Z. Xiong et al., "Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism," J. Medicinal Chem., vol. 63, no. 16, pp. 8749–8760, Aug. 2019.
[17] Q. Lv, G. Chen, L. Zhao, W. Zhong, and C. Yu-Chian Chen, "Mol2Context-vec: Learning molecular representation from context awareness for drug discovery," Briefings Bioinf., vol. 22, no. 6, Nov. 2021, Art. no. bbab317.
[18] L. A. Bugnon, C. Yones, D. H. Milone, and G. Stegmayer, "Deep neural architectures for highly imbalanced data in bioinformatics," IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 8, pp. 2857–2867, Aug. 2020.
[19] J. Song et al., "Local–global memory neural network for medication prediction," IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 4, pp. 1723–1736, Apr. 2021.
[20] R. Huang, X. Tan, and Q. Xu, "Learning to learn variational quantum algorithm," IEEE Trans. Neural Netw. Learn. Syst., early access, Feb. 28, 2022, doi: 10.1109/TNNLS.2022.3151127.
[21] Y. Yamanishi, E. Pauwels, and M. Kotera, "Drug side-effect prediction based on the integration of chemical and biological spaces," J. Chem. Inf. Model., vol. 52, no. 12, pp. 3284–3292, Dec. 2012.
[22] Á. Duffy et al., "Tissue-specific genetic features inform prediction of drug side effects in clinical trials," Sci. Adv., vol. 6, no. 37, Sep. 2020, Art. no. eabb6242.
[23] G. Yu, Y. Xing, J. Wang, C. Domeniconi, and X. Zhang, "Multiview multi-instance multilabel active learning," IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 9, pp. 4311–4321, Sep. 2022.
[24] A. Morro et al., "A stochastic spiking neural network for virtual screening," IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 4, pp. 1371–1375, Apr. 2018.
[25] Y. Yu and H. Tran, "An XGBoost-based fitted Q iteration for finding the optimal STI strategies for HIV patients," IEEE Trans. Neural Netw. Learn. Syst., early access, Jun. 2, 2022, doi: 10.1109/TNNLS.2022.3176204.
[26] K. V. Chuang, L. M. Gunsalus, and M. J. Keiser, "Learning molecular representations for medicinal chemistry: Miniperspective," J. Medicinal Chem., vol. 63, no. 16, pp. 8705–8722, Aug. 2020.
[27] M. Sun, S. Zhao, C. Gilvary, O. Elemento, J. Zhou, and F. Wang, "Graph convolutional networks for computational drug development and discovery," Briefings Bioinf., vol. 21, no. 3, pp. 919–935, May 2020.
[28] D. Duvenaud et al., "Convolutional networks on graphs for learning molecular fingerprints," in Proc. Adv. Neural Inf. Process. Syst., Annu. Conf. Neural Inf. Process. Syst. Montreal, QC, Canada: Curran Associates, Inc., Dec. 2015, pp. 2224–2232.
[29] P. Li et al., "TrimNet: Learning molecular representation from triplet messages for biomedicine," Briefings Bioinf., vol. 22, no. 4, Jul. 2021, Art. no. bbaa266.
[30] Q.-J. Lv et al., "A multi-task group bi-LSTM networks application on electrocardiogram classification," IEEE J. Transl. Eng. Health Med., vol. 8, pp. 1–11, 2020.
[31] C. Cai et al., "Transfer learning for drug discovery," J. Medicinal Chem., vol. 63, no. 16, pp. 8683–8694, 2020.
[32] S. Guo, L. Xu, C. Feng, H. Xiong, Z. Gao, and H. Zhang, "Multi-level semantic adaptation for few-shot segmentation on cardiac image sequences," Med. Image Anal., vol. 73, Oct. 2021, Art. no. 102170.
[33] M. Huisman, J. N. Van Rijn, and A. Plaat, "A survey of deep meta-learning," Artif. Intell. Rev., vol. 54, pp. 1–59, Aug. 2021.
[34] A. Banino et al., "Vector-based navigation using grid-like representations in artificial agents," Nature, vol. 557, no. 7705, pp. 429–433, May 2018.
[35] T. Hospedales, A. Antoniou, P. Micaelli, and A. Storkey, "Meta-learning in neural networks: A survey," 2020, arXiv:2004.05439.
[36] J. Vanschoren, "Meta-learning: A survey," 2018, arXiv:1810.03548.
[37] J. X. Wang et al., "Prefrontal cortex as a meta-reinforcement learning system," Nature Neurosci., vol. 21, no. 6, pp. 860–868, May 2018.
[38] S. Biswas, G. Khimulya, E. C. Alley, K. M. Esvelt, and G. M. Church, "Low-N protein engineering with data-efficient deep learning," Nature Methods, vol. 18, no. 4, pp. 389–396, Apr. 2021.
[39] R. Liu, X. Yu, X. Liu, D. Xu, K. Aihara, and L. Chen, "Identifying critical transitions of complex diseases based on a single sample," Bioinformatics, vol. 30, no. 11, pp. 1579–1586, Jun. 2014.
[40] S. Lin et al., "Prototypical graph contrastive learning," IEEE Trans. Neural Netw. Learn. Syst., early access, Jul. 27, 2022, doi: 10.1109/TNNLS.2022.3191086.
[41] Y. Sun et al., "Combining genomic and network characteristics for extended capability in predicting synergistic drugs for cancer," Nature Commun., vol. 6, no. 1, pp. 1–10, Sep. 2015.
[42] Q. Liu, H. Zhou, L. Liu, X. Chen, R. Zhu, and Z. Cao, "Multi-target QSAR modelling in the analysis and design of HIV-HCV co-inhibitors: An in-silico study," BMC Bioinf., vol. 12, no. 1, pp. 1–20, Dec. 2011.
[43] H. Altae-Tran, B. Ramsundar, A. S. Pappu, and V. Pande, "Low data drug discovery with one-shot learning," ACS Central Sci., vol. 3, no. 4, pp. 283–293, 2017.
[44] T. Adler et al., "Cross-domain few-shot learning by representation fusion," 2020, arXiv:2010.06498.
[45] Z. Guo et al., "Few-shot graph learning for molecular property prediction," in Proc. Web Conf., J. Leskovec, M. Grobelnik, M. Najork, J. Tang, and L. Zia, Eds., Ljubljana, Slovenia, Apr. 2021, pp. 2559–2567.
[46] Y. Wang, Q. Yao, J. T. Kwok, and L. M. Ni, "Generalizing from a few examples: A survey on few-shot learning," ACM Comput. Surv., vol. 53, no. 3, pp. 1–34, 2020.
[47] C. Finn, P. Abbeel, and S. Levine, "Model-agnostic meta-learning for fast adaptation of deep networks," in Proc. 34th Int. Conf. Mach. Learn., vol. 70, Sydney, NSW, Australia, Aug. 2017, pp. 1126–1135.
[48] D. Jiang et al., "Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models," J. Cheminformatics, vol. 13, no. 1, pp. 1–23, Feb. 2021.
[49] R. Winter, F. Montanari, F. Noé, and D.-A. Clevert, "Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations," Chem. Sci., vol. 10, no. 6, pp. 1692–1701, Jul. 2019.
[50] J. Cui, B. Yang, B. Sun, X. Hu, and J. Liu, "Scalable and parallel deep Bayesian optimization on attributed graphs," IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 1, pp. 103–116, Jan. 2020.
[51] Z. Wu et al., "MoleculeNet: A benchmark for molecular machine learning," Chem. Sci., vol. 9, no. 2, pp. 513–530, 2018.
[52] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2014, arXiv:1412.6980.
[53] F. Fabris, A. Doherty, D. Palmer, J. P. De Magalhães, and A. A. Freitas, "A new approach for interpreting random forest models and its application to the biology of ageing," Bioinformatics, vol. 34, no. 14, pp. 2449–2456, Jul. 2018.
[54] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in Proc. 5th Int. Conf. Learn. Represent. (ICLR), Toulon, France, Apr. 2017, pp. 1–14.
[55] G. Koch et al., "Siamese neural networks for one-shot image recognition," in Proc. ICML Deep Learn. Workshop, vol. 2, Lille, France, 2015, pp. 1–30.
[56] J. Kim, T. Kim, S. Kim, and C. D. Yoo, "Edge-labeling graph neural network for few-shot learning," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR). Long Beach, CA, USA: Computer Vision Foundation, Jun. 2019, pp. 11–20.
[57] W. Hu et al., "Strategies for pre-training graph neural networks," in Proc. 8th Int. Conf. Learn. Represent. (ICLR), Addis Ababa, Ethiopia, Apr. 2020, pp. 1–22.
[58] J. Snell, K. Swersky, and R. Zemel, "Prototypical networks for few-shot learning," in Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017, pp. 1–11.
[59] Y. Song, S. Zheng, Z. Niu, Z.-H. Fu, Y. Lu, and Y. Yang, "Communicative representation learning on attributed molecular graphs," in Proc. 29th Int. Joint Conf. Artif. Intell., C. Bessiere, Ed., Jul. 2020, pp. 2831–2838, doi: 10.24963/IJCAI.2020/392.
[60] S. Kearnes, K. McCloskey, M. Berndl, V. Pande, and P. Riley, "Molecular graph convolutions: Moving beyond fingerprints," J. Comput.-Aided Mol. Des., vol. 30, no. 8, pp. 595–608, Aug. 2016.
[61] A. Mayr, G. Klambauer, T. Unterthiner, and S. Hochreiter, "DeepTox: Toxicity prediction using deep learning," Frontiers Environ. Sci., vol. 3, p. 80, Feb. 2016.
[62] L. Maziarka, T. Danel, S. Mucha, K. Rataj, J. Tabor, and S. Jastrzebski, "Molecule attention transformer," 2020, arXiv:2002.08264.
[63] X. Li and D. Fourches, "Inductive transfer learning for molecular activity prediction: Next-gen QSAR models with MolPMoFiT," J. Cheminformatics, vol. 12, no. 1, pp. 1–15, Dec. 2020.
[64] S. Liu, M. F. Demirel, and Y. Liang, "N-gram graph: Simple unsupervised representation for graphs, with applications to molecules," in Proc. Adv. Neural Inf. Process. Syst., vol. 32, 2019, pp. 8464–8476.
[65] M. Kuhn, I. Letunic, L. J. Jensen, and P. Bork, "The SIDER database of drugs and side effects," Nucleic Acids Res., vol. 44, no. D1, pp. D1075–D1079, Jan. 2016.
[66] S. G. Rohrer and K. Baumann, "Maximum unbiased validation (MUV) data sets for virtual screening based on PubChem bioactivity data," J. Chem. Inf. Model., vol. 49, no. 2, pp. 169–184, Feb. 2009.
[67] L. Van Der Maaten and G. Hinton, "Visualizing data using t-SNE," J. Mach. Learn. Res., vol. 9, pp. 2579–2605, Nov. 2008.

Qiujie Lv is currently pursuing the Ph.D. degree with the Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong, China. His research interests include graph neural networks, drug discovery, artificial intelligence, and bioinformatics.

Guanxing Chen is currently pursuing the Ph.D. degree with the Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong, China. His research interests include explainable artificial intelligence, drug discovery, deep learning, biosynthesis, and vaccine design.

Ziduo Yang is currently pursuing the Ph.D. degree with the Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong, China. His main research interests include explainable graph neural networks, computer vision, reinforcement learning, and chemoinformatics.

Weihe Zhong is currently pursuing the Ph.D. degree with the Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong, China. His main research interests include graph neural networks, chemoinformatics, and drug discovery.

Calvin Yu-Chian Chen is currently the Director of the Artificial Intelligent Medical Center and a Professor with the School of Intelligent Systems Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen, Guangdong, China. He also serves as an Advisor at China Medical University Hospital, Taichung, China, and Asia University, Taichung, and as a Guest Professor at the Massachusetts Institute of Technology (MIT), Cambridge, MA, USA, and the University of Pittsburgh, Pittsburgh, PA, USA. He has published more than 300 SCI articles, with an H-index of more than 47. In 2020–2023, he was a highly cited candidate in the field of computer science and technology. In 2021–2023, he was also selected among the world's top 100 000 scientists, and in 2018–2023, among the world's top 2% of scientists. He has built several artificial intelligence medical systems for hospitals, including various pathological image processing, MRI image processing, and big data modeling systems. He also built the world's largest traditional Chinese medicine database (http://TCMBank.cn/). His laboratory's general research interests include developing structured machine learning techniques for computer vision tasks, investigating how to exploit human commonsense, and incorporating it to develop advanced artificial intelligence systems.