
MolGAN: An implicit generative model for small molecular graphs

Nicola De Cao 1   Thomas Kipf 1

1 Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands. Correspondence to: Nicola De Cao <[email protected]>.

Presented at the ICML 2018 workshop on Theoretical Foundations and Applications of Deep Generative Models, Stockholm, Sweden, PMLR 80, 2018. Copyright 2018 by the author(s).

Abstract

Deep generative models for graph-structured data offer a new angle on the problem of chemical synthesis: by optimizing differentiable models that directly generate molecular graphs, it is possible to side-step expensive search procedures in the discrete and vast space of chemical structures. We introduce MolGAN, an implicit, likelihood-free generative model for small molecular graphs that circumvents the need for expensive graph matching procedures or node ordering heuristics of previous likelihood-based methods. Our method adapts generative adversarial networks (GANs) to operate directly on graph-structured data. We combine our approach with a reinforcement learning objective to encourage the generation of molecules with specific desired chemical properties. In experiments on the QM9 chemical database, we demonstrate that our model is capable of generating close to 100% valid compounds. MolGAN compares favorably both to recent proposals that use string-based (SMILES) representations of molecules and to a likelihood-based method that directly generates graphs, albeit being susceptible to mode collapse.

Figure 1. Schema of MolGAN. A vector z is sampled from a prior and passed to the generator, which outputs the graph representation of a molecule. The discriminator classifies whether the molecular graph comes from the generator or the dataset. The reward network tries to estimate the reward for the chemical properties of a particular molecule, provided by an external software.

1. Introduction

Finding new chemical compounds with desired properties is a challenging task with important applications such as de novo drug design (Schneider & Fechner, 2005). The space of synthesizable molecules is vast and search in this space proves to be very difficult, mostly owing to its discrete nature.

Recent progress in the development of deep generative models has spawned a range of promising proposals to address this issue. Most works in this area (Gómez-Bombarelli et al., 2016; Kusner et al., 2017; Guimaraes et al., 2017; Dai et al., 2018) make use of a so-called SMILES representation (Weininger, 1988) of molecules: a string-based representation derived from molecular graphs. Recurrent neural networks (RNNs) are ideal candidates for these representations and, consequently, most recent works follow the recipe of applying RNN-based generative models to this type of encoding. String-based representations of molecules, however, have certain disadvantages: RNNs have to spend capacity on learning both the syntactic rules and the order ambiguity of the representation. Besides, this approach is not applicable to generic (non-molecular) graphs.

Since SMILES strings are generated from a graph-based representation of molecules, working directly in the original graph space has the benefit of removing this additional overhead. With recent progress in the area of deep learning on graphs (Bronstein et al., 2017; Hamilton et al., 2017), training deep generative models directly on graph representations becomes a feasible alternative that has been explored in a range of recent works (Kipf & Welling, 2016; Johnson, 2017; Grover et al., 2019; Li et al., 2018b; Simonovsky & Komodakis, 2018; You et al., 2018).
Likelihood-based methods for molecular graph generation (Li et al., 2018b; Simonovsky & Komodakis, 2018), however, either require providing a fixed (or randomly chosen) ordered representation of the graph or an expensive graph matching procedure to evaluate the likelihood of a generated molecule, as the evaluation of all possible node orderings is prohibitive already for graphs of small size.

In this work, we sidestep this issue by utilizing implicit, likelihood-free methods, in particular a generative adversarial network (GAN) (Goodfellow et al., 2014) that we adapt to work directly on graph representations. We further utilize a reinforcement learning (RL) objective similar to ORGAN (Guimaraes et al., 2017) to encourage the generation of molecules with particular properties.

Our molecular GAN (MolGAN) model (outlined in Figure 1) is the first to address the generation of graph-structured data in the context of molecular synthesis using GANs (Goodfellow et al., 2014). The generative model of MolGAN predicts discrete graph structure at once (i.e., non-sequentially) for computational efficiency, although sequential variants are possible in general. MolGAN further utilizes a permutation-invariant discriminator and reward network (for RL-based optimization towards desired chemical properties) based on graph convolution layers (Bruna et al., 2014; Duvenaud et al., 2015; Kipf & Welling, 2017; Schlichtkrull et al., 2017) that both operate directly on graph-structured representations.

2. Background

2.1. Molecules as graphs

Most previous deep generative models for molecular data (Gómez-Bombarelli et al., 2016; Kusner et al., 2017; Guimaraes et al., 2017; Dai et al., 2018) resort to generating SMILES representations of molecules. The SMILES syntax, however, is not robust to small changes or mistakes, which can result in the generation of invalid or drastically different structures. Grammar VAEs (Kusner et al., 2017) alleviate this problem by constraining the generative process to follow a particular grammar.

Operating directly in the space of graphs has recently been shown to be a viable alternative for generative modeling of molecular data (Li et al., 2018b; Simonovsky & Komodakis, 2018), with the added benefit that all generated outputs are valid graphs (but not necessarily valid molecules).

We consider that each molecule can be represented by an undirected graph G with a set of edges E and nodes V. Each atom corresponds to a node v_i ∈ V that is associated with a T-dimensional one-hot vector x_i indicating the type of the atom. We further represent each atomic bond as an edge (v_i, v_j) ∈ E associated with a bond type y ∈ {1, ..., Y}. For a molecular graph with N nodes, we can summarize this representation in a node feature matrix X = [x_1, ..., x_N]^T ∈ R^{N×T} and an adjacency tensor A ∈ R^{N×N×Y}, where A_ij ∈ R^Y is a one-hot vector indicating the type of the edge between i and j.
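The following is a minimal sketch (ours, not from the paper) of this one-hot encoding for a toy molecule; the atom and bond type orderings, the explicit "no bond" channel, and the encode_molecule helper are illustrative assumptions only.

import numpy as np

# Assumed orderings (illustrative): T = 5 atom types, Y = 4 bond types.
ATOM_TYPES = ["C", "O", "N", "F", "PAD"]             # index into the T axis of X
BOND_TYPES = ["none", "single", "double", "triple"]  # index into the Y axis of A

def encode_molecule(atoms, bonds, n_max=9):
    """Build node feature matrix X (n_max x T) and adjacency tensor A (n_max x n_max x Y).

    atoms: list of atom symbols, e.g. ["C", "O"]
    bonds: list of (i, j, bond_type) tuples, e.g. [(0, 1, "double")]
    """
    T, Y = len(ATOM_TYPES), len(BOND_TYPES)
    X = np.zeros((n_max, T))
    A = np.zeros((n_max, n_max, Y))
    A[:, :, BOND_TYPES.index("none")] = 1.0          # default: no bond between any pair
    for i in range(n_max):
        symbol = atoms[i] if i < len(atoms) else "PAD"
        X[i, ATOM_TYPES.index(symbol)] = 1.0
    for i, j, b in bonds:
        A[i, j] = A[j, i] = 0.0                      # undirected graph: keep A symmetric
        A[i, j, BOND_TYPES.index(b)] = 1.0
        A[j, i, BOND_TYPES.index(b)] = 1.0
    return X, A

# Formaldehyde without explicit hydrogens: one C=O double bond.
X, A = encode_molecule(["C", "O"], [(0, 1, "double")])
print(X.shape, A.shape)  # (9, 5) (9, 9, 4)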
2.2. Implicit vs. likelihood-based methods

Likelihood-based methods such as the variational auto-encoder (VAE) (Kingma & Welling, 2014; Rezende et al., 2014) typically allow for easier and more stable optimization than implicit generative models such as a GAN (Goodfellow et al., 2014). When generating graph-structured data, however, we wish to be invariant to reordering of nodes in the (ordered) matrix representation of the graph, which requires us to either perform a prohibitively expensive graph matching procedure (Simonovsky & Komodakis, 2018) or to evaluate the likelihood for all possible node permutations explicitly.

By resorting to implicit generative models, in particular to the GAN framework, we circumvent the need for an explicit likelihood. While the discriminator of the GAN can be made invariant to node ordering by utilizing graph convolutions (Bruna et al., 2014; Duvenaud et al., 2015; Kipf & Welling, 2017) and a node aggregation operator (Li et al., 2016), the generator still has to decide on a specific node ordering when generating a graph. Since we do not provide a likelihood, however, the generator is free to choose any suitable ordering for the task at hand. We provide a brief introduction to GANs in the following.

Generative adversarial networks  GANs (Goodfellow et al., 2014) are implicit generative models in the sense that they allow for inference of model parameters without requiring one to specify a likelihood.

A GAN consists of two main components: a generative model Gθ, which learns a map from a prior to the data distribution to sample new data-points, and a discriminative model Dφ, which learns to classify whether samples came from the data distribution rather than from Gθ. Those two models are implemented as neural networks and trained simultaneously with stochastic gradient descent (SGD). Gθ and Dφ have different objectives, and they can be seen as two players in a minimax game

    min_θ max_φ  E_{x ∼ p_data(x)} [log Dφ(x)] + E_{z ∼ p_z(z)} [log(1 − Dφ(Gθ(z)))] ,    (1)

where Gθ tries to generate samples to fool the discriminator and Dφ tries to differentiate samples correctly. To prevent undesired behaviour such as mode collapse (Salimans et al., 2016) and to stabilize learning, we use minibatch discrimination (Salimans et al., 2016) and improved WGAN (Gulrajani et al., 2017), an alternative and more stable GAN model that minimizes a better suited divergence.
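As an illustration only (not code from the paper), Eq. (1) translates into the following per-batch losses; the two-player game is realized by alternating gradient steps on Dφ and Gθ, and D and G here stand for any discriminator and generator networks producing logits and samples, respectively.

import torch
import torch.nn.functional as F

def discriminator_loss(D, G, x_real, z):
    """Dphi maximizes log D(x) + log(1 - D(G(z))); we minimize the negation."""
    d_real = D(x_real)                 # logits for real samples
    d_fake = D(G(z).detach())          # detach: no gradient into the generator here
    return -(F.logsigmoid(d_real).mean() + torch.log1p(-torch.sigmoid(d_fake)).mean())

def generator_loss(D, G, z):
    """Gtheta minimizes log(1 - D(G(z))), i.e. tries to fool the discriminator."""
    d_fake = D(G(z))
    return torch.log1p(-torch.sigmoid(d_fake)).mean()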
Improved WGAN  WGANs (Arjovsky et al., 2017) minimize an approximation of the Earth Mover (EM) distance (also known as Wasserstein-1 distance) defined between two probability distributions. Formally, the Wasserstein distance between p and q, using the Kantorovich-Rubinstein duality, is

    D_W[p||q] = (1/K) sup_{||f||_L < K}  E_{x ∼ p(x)}[f(x)] − E_{x ∼ q(x)}[f(x)] ,    (2)

where in the case of WGAN, p is the empirical distribution and q is the generator distribution. Note that the supremum is over all the K-Lipschitz functions for some K > 0.

Gulrajani et al. (2017) introduce a gradient penalty as an alternative soft constraint on the 1-Lipschitz continuity, as an improvement upon the gradient clipping scheme from the original WGAN. The loss with respect to the generator remains the same as in WGAN, but the loss function with respect to the discriminator is modified to be

    L(x^(i), Gθ(z^(i)); φ) = −Dφ(x^(i)) + Dφ(Gθ(z^(i)))            [original WGAN loss]
                             + α ( ||∇_{x̂^(i)} Dφ(x̂^(i))|| − 1 )^2 ,   [gradient penalty]    (3)

where α is a hyperparameter (we use α = 10 as in the original paper) and x̂^(i) is a sampled linear combination between x^(i) ∼ p_data(x) and Gθ(z^(i)) with z^(i) ∼ p_z(z), thus x̂^(i) = ε x^(i) + (1 − ε) Gθ(z^(i)) with ε ∼ U(0, 1).
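A minimal PyTorch-style sketch of Eq. (3) (illustrative, not the authors' code). A sample is treated as a single tensor for simplicity; in MolGAN the same interpolation would be applied to both X and A, and x_fake is assumed to be detached from the generator for the critic update.

import torch

def wgan_gp_critic_loss(D, x_real, x_fake, alpha=10.0):
    """Improved WGAN critic loss: WGAN term plus gradient penalty (Eq. 3)."""
    wgan_term = -D(x_real).mean() + D(x_fake).mean()

    # x_hat = eps * x_real + (1 - eps) * x_fake, with eps ~ U(0, 1) per sample
    eps = torch.rand(x_real.size(0), *([1] * (x_real.dim() - 1)), device=x_real.device)
    x_hat = (eps * x_real + (1.0 - eps) * x_fake).requires_grad_(True)

    grad = torch.autograd.grad(D(x_hat).sum(), x_hat, create_graph=True)[0]
    grad_norm = grad.flatten(start_dim=1).norm(2, dim=1)
    penalty = alpha * ((grad_norm - 1.0) ** 2).mean()
    return wgan_term + penalty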
2.3. Deterministic policy gradients

A GAN generator learns a transformation from a prior distribution to the data distribution. Thus, generated samples resemble data samples. However, in de novo drug design methods, we are not only interested in generating chemically valid compounds, but we want them to have some useful property (e.g., to be easily synthesizable). Therefore, we also optimize the generation process towards some non-differentiable metrics using reinforcement learning.

In reinforcement learning, a stochastic policy is represented by πθ(s) = pθ(a|s), which is a parametric probability distribution in θ that selects a categorical action a conditioned on a state s. Conversely, a deterministic policy is represented by µθ(s) = a, which deterministically outputs an action.

In initial experiments, we explored using REINFORCE (Williams, 1992) in combination with a stochastic policy that models graph generation as a set of categorical choices (actions). However, we found that it converged poorly due to the high-dimensional action space when generating graphs at once. We instead base our method on a deterministic policy gradient algorithm, which is known to perform well in high-dimensional action spaces (Silver et al., 2014). In particular, we employ a simplified version of deep deterministic policy gradient (DDPG) introduced by Lillicrap et al. (2016), an off-policy actor-critic algorithm that uses deterministic policy gradients to maximize an approximation of the expected future reward.

In our case, the policy is the GAN generator Gθ, which takes a sample z from the prior as input, instead of an environmental state s, and outputs a molecular graph as an action (a = G). Moreover, we do not model episodes, so there is no need to assess the quality of a state-action combination since it only depends on the graph G. Therefore, we introduce a learnable and differentiable approximation of the reward function R̂ψ(G) that predicts the immediate reward, and we train it via a mean squared error objective based on the real reward provided by an external system (e.g., the synthesizability score of a molecule). Then, we train the generator to maximize the predicted reward via R̂ψ(G) which, being differentiable, provides a gradient to the policy towards the desired metric.
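A sketch of this two-step scheme under assumed interfaces (reward_net, generator, and an external_reward function standing in for, e.g., an RDKit-based score); it is illustrative only and not the authors' implementation.

import torch
import torch.nn.functional as F

def reward_network_step(reward_net, graphs, external_reward, optimizer_r):
    """Fit the differentiable reward approximation R_psi to the external scores (MSE)."""
    targets = torch.tensor([external_reward(g) for g in graphs])  # zero for invalid graphs
    predictions = reward_net(graphs).squeeze(-1)
    loss = F.mse_loss(predictions, targets)
    optimizer_r.zero_grad()
    loss.backward()
    optimizer_r.step()
    return loss.item()

def generator_rl_step(generator, reward_net, z, optimizer_g):
    """DDPG-style update: ascend the predicted reward of generated graphs."""
    loss = -reward_net(generator(z)).mean()   # maximize R_psi(G_theta(z))
    optimizer_g.zero_grad()
    loss.backward()
    optimizer_g.step()
    return loss.item()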
3. Model

The MolGAN architecture (Figure 2) consists of three main components: a generator Gθ, a discriminator Dφ and a reward network R̂ψ.

Figure 2. Outline of MolGAN. From left: the generator takes a sample from a prior distribution and generates a dense adjacency tensor A and an annotation matrix X. Subsequently, sparse and discrete à and X̃ are obtained from A and X respectively via categorical sampling. The combination of à and X̃ represents an annotated molecular graph which corresponds to a specific chemical compound. Finally, the graph is processed by both the discriminator and reward networks that are invariant to node order permutations and based on Relational-GCN (Schlichtkrull et al., 2017) layers.

The generator takes a sample from a prior distribution and generates an annotated graph G representing a molecule. Nodes and edges of G are associated with annotations denoting atom type and bond type, respectively. The discriminator takes both samples from the dataset and the generator and learns to distinguish them. Both Gθ and Dφ are trained using improved WGAN such that the generator learns to match the empirical distribution and eventually outputs valid molecules.

The reward network is used to approximate the reward function of a sample and to optimize molecule generation towards non-differentiable metrics using reinforcement learning. Dataset and generated samples are inputs of R̂ψ but, differently from the discriminator, it assigns scores to them (e.g., how likely the generated molecule is to be soluble in water). The reward network learns to assign a reward to each molecule to match a score provided by an external software1. Notice that, when MolGAN outputs a non-valid molecule, it is not possible to assign a reward since the graph is not even a compound. Thus, for invalid molecular graphs, we assign zero rewards.

1 We used the RDKit Open-Source Cheminformatics Software: http://www.rdkit.org.

The discriminator is trained using the WGAN objective while the generator uses a linear combination of the WGAN loss and the RL loss:

    L(θ) = λ · L_WGAN(θ) + (1 − λ) · L_RL(θ) ,    (4)

where λ ∈ [0, 1] is a hyperparameter that regulates the trade-off between the two components.
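As a sketch (not the authors' code), the combined generator objective of Eq. (4) is a single scalar mixing the two terms; discriminator and reward_net are assumed to return one score per graph, as in the sketches above.

def molgan_generator_loss(generator, discriminator, reward_net, z, lam):
    """Eq. (4): L(theta) = lam * L_WGAN + (1 - lam) * L_RL, with lam in [0, 1]."""
    graphs = generator(z)
    l_wgan = -discriminator(graphs).mean()   # WGAN generator term: fool the critic
    l_rl = -reward_net(graphs).mean()        # RL term: maximize the predicted reward
    return lam * l_wgan + (1.0 - lam) * l_rl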
3.1. Generator

Gθ(z) takes D-dimensional vectors z ∈ R^D sampled from a standard normal distribution z ∼ N(0, I) and outputs graphs. While recent works have shown that it is feasible to generate graphs of small size by using an RNN-based generative model (Johnson, 2017; You et al., 2018; Li et al., 2018a;b), we, for simplicity, utilize a generative model that predicts the entire graph at once using a simple multi-layer perceptron (MLP), as similarly done in Simonovsky & Komodakis (2018). While this limits our study to graphs of a pre-chosen maximum size, we find that it is significantly faster and easier to optimize.

We restrict the domain to graphs with a limited number of nodes and, for each z, Gθ outputs two continuous and dense objects: X ∈ R^{N×T} that defines atom types and A ∈ R^{N×N×Y} that defines bond types (see Section 2.1). Both X and A have a probabilistic interpretation since each node and edge type is represented with probabilities of categorical distributions over types. To generate a molecule we obtain discrete, sparse objects X̃ and à via categorical sampling from X and A, respectively. We overload notation and also represent samples from the dataset with binary X̃ and Ã.

As this discretization process is non-differentiable, we explore three model variations to allow for gradient-based training: we can i) use the continuous objects X and A directly during the forward pass (i.e., X̃ = X and à = A), ii) add Gumbel noise to X and A before passing them to Dφ and R̂ψ in order to make the generation stochastic while still forwarding continuous objects (i.e., X̃_ij = X_ij + Gumbel(µ = 0, β = 1) and Ã_ijy = A_ijy + Gumbel(µ = 0, β = 1)), or iii) use a straight-through gradient based on categorical reparameterization with the Gumbel-Softmax (Jang et al., 2017; Maddison et al., 2017), that is, we use a sample from a categorical distribution during the forward pass (i.e., X̃_i = Cat(X_i) and Ã_ij = Cat(A_ij)) and the continuous relaxed values (i.e., the original X and A) in the backward pass.
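Variant iii) can be sketched with the straight-through estimator as follows (illustrative only); x_logits stands for the N × T logits of one generated molecule's node types, and the same treatment applies to the edge logits in A.

import torch
import torch.nn.functional as F

def straight_through_categorical(x_logits):
    """Forward: one-hot sample from Cat(softmax(x_logits)); backward: gradient of the soft probabilities."""
    probs = F.softmax(x_logits, dim=-1)
    index = torch.multinomial(probs, num_samples=1)           # one categorical sample per row
    hard = torch.zeros_like(probs).scatter_(-1, index, 1.0)   # one-hot version of the sample
    # (hard - probs) is treated as a constant, so gradients flow through probs only.
    # torch.nn.functional.gumbel_softmax(x_logits, hard=True) offers a similar relaxation.
    return (hard - probs).detach() + probs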
3.2. Discriminator and reward network

Both the discriminator Dφ and the reward network R̂ψ receive a graph as input, and they each output a scalar value. We choose the same architecture for both networks but do not share parameters between them. A series of graph convolution layers convolve node signals X̃ using the graph adjacency tensor Ã. We base our model on Relational-GCN (Schlichtkrull et al., 2017), a convolutional network for graphs with support for multiple edge types. At every layer, feature representations of nodes are convolved/propagated according to

    h'_i^(ℓ+1) = f_s^(ℓ)(h_i^(ℓ), x_i) + Σ_{j=1}^{N} Σ_{y=1}^{Y} (Ã_ijy / |N_i|) f_y^(ℓ)(h_j^(ℓ), x_j) ,
    h_i^(ℓ+1)  = tanh(h'_i^(ℓ+1)) ,    (5)

where h_i^(ℓ) is the signal of node i at layer ℓ and f_s^(ℓ) is a linear transformation function that acts as a self-connection between layers. We further utilize an edge type-specific affine function f_y^(ℓ) for each layer. N_i denotes the set of neighbors for node i. The normalization factor 1/|N_i| ensures that activations are on a similar scale irrespective of the number of neighbors.
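A dense, batched sketch of the propagation rule in Eq. (5), illustrative only: the conditioning on the initial annotations x_i is omitted for brevity, and the neighbor count is derived from the edge-type adjacency with the "no bond" channel assumed to be excluded.

import torch
import torch.nn as nn

class RelationalGCNLayer(nn.Module):
    """One propagation step of Eq. (5): self-connection plus per-relation message passing."""

    def __init__(self, in_dim, out_dim, num_relations):
        super().__init__()
        self.self_lin = nn.Linear(in_dim, out_dim)                       # f_s
        self.rel_lins = nn.ModuleList(
            [nn.Linear(in_dim, out_dim) for _ in range(num_relations)]   # f_y per edge type
        )

    def forward(self, h, adj):
        # h:   (batch, N, in_dim) node signals
        # adj: (batch, Y, N, N) edge-type-specific adjacency ("no bond" channel excluded)
        out = self.self_lin(h)
        degree = adj.sum(dim=(1, 3)).clamp(min=1).unsqueeze(-1)          # |N_i| per node
        for y, lin in enumerate(self.rel_lins):
            out = out + adj[:, y] @ lin(h) / degree
        return torch.tanh(out)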

After several layers of propagation via graph convolutions, following Li et al. (2016) we aggregate node embeddings into a graph-level representation vector as

    h'_G = Σ_{v∈V} σ(i(h_v^(L), x_v)) ⊙ tanh(j(h_v^(L), x_v)) ,
    h_G  = tanh(h'_G) ,    (6)

where σ(x) = 1/(1 + exp(−x)) is the logistic sigmoid function, i and j are MLPs with a linear output layer, and ⊙ denotes element-wise multiplication. Then, h_G is a vector representation of the graph G and it is further processed by an MLP to produce a graph-level scalar output ∈ (−∞, +∞) for the discriminator and ∈ (0, 1) for the reward network.
for the discriminator and ∈ (0, 1) for the reward network. sequential generative model operating on SMILES represen-
tations, optimizing towards several chemical properties with
4. Related work an RL objective (see Section 5.2). We also compare our
model against variational autoencoding methods (Section
Objective-Reinforced Generative Adversarial Networks 5.3) such as CharacterVAE (Gómez-Bombarelli et al., 2016),
(ORGAN) by Guimaraes et al. (2017) is the closest re- GrammarVAE (Kusner et al., 2017), as well as a recent
lated work to ours. Their model relies on SeqGAN (Yu graph-based generative model: GraphVAE (Simonovsky &
et al., 2017) to adversarially learn to output sequences while Komodakis, 2018).
optimizing towards chemical metrics with REINFORCE
(Williams, 1992). The main differences from our approach Dataset In all experiments, we used QM9 (Ramakrishnan
is that they model sequences of SMILES as molecular rep- et al., 2014) a subset of the massive 166.4 billion molecules
resentations instead of graphs, and their RL component uses GDB-17 chemical database (Ruddigkeit et al., 2012). QM9
REINFORCE while we use DDPG. Segler et al. (2018) also contains 133,885 organic compounds up to 9 heavy atoms:
employs RL for drug discovery by searching retrosynthetic carbon (C), oxygen (O), nitrogen (N) and fluorine (F).
routes using Monte Carlo Tree Search (MCTS) in combina-
tion with an expansion policy network. Generator architecture The generator architecture is
Several other works have explored training generative mod- fixed for all experiments. We use N = 9 as the maxi-
els on SMILES representations of molecules: CharacterVAE mum number of nodes, T = 5 as the number of atom types
(Gómez-Bombarelli et al., 2016) is the first such model that (C, O, N, F, and one padding symbol), and Y = 4 as the
is based on a VAE with recurrent encoder and decoder net- number of bond types (single, double, triple and no bond).
works. GrammarVAE (Kusner et al., 2017) and SDVAE These dimensionalities are enough to cover all molecules in
(Dai et al., 2018) constrain the decoding process to follow QM9. The generator takes a 32-dimensional vector sampled
particular syntactic and semantic rules. from a standard normal distribution z ∼ N (0, I) and pro-
cess it with a 3-layer MLP of [128, 256, 512] hidden units
A related line of research considers training deep genera- respectively, with tanh as activation functions. Eventually,
tive models to output graph-structured data directly. Sev- the last layer is linearly projected to match X and A dimen-
eral works explored auto-encoder architectures utilizing
graph convolutions for link prediction within graphs (Kipf PD with a softmax
sions and normalized in their last dimension
operation (softmax(x)i = exp(xi )/ i=1 exp(xi )).
& Welling, 2016; Grover et al., 2019; Davidson et al., 2018).
Johnson (2017); Li et al. (2018b); You et al. (2018); Li et al. Discriminator and reward network architecture Both
(2018a) on the other hand developed likelihood-based meth- networks use a RelationalGCN encoder (see Eq. 5) with two
ods to directly output graphs of arbitrary size in a sequential layers and [64, 32] hidden units, respectively, to process the
manner. Several related works have explored extending input graphs. Subsequently, we compute a 128-dimensional
VAEs to generate graphs directly, examples include the graph-level representation (see Eq. 6) further processed by
GraphVAE (Simonovsky & Komodakis, 2018), Junction a 2-layer MLP of dimensions [128, 1] and with tanh as
Tree VAE (Jin et al., 2018) and the NeVAE (Samanta et al., hidden layer activation function. In the reward network, we
2018) model. further use a sigmoid activation function on the output.
For link prediction within graphs, a range of adversarial
methods have been introduced in the literature (Minervini Evaluation measures We measure the following statis-
et al., 2017; Wang et al., 2018; Bojchevski et al., 2018). This tics as defined in Samanta et al. (2018): validity, novelty,
class of models, however, is not suitable to generate molec- and uniqueness. Validity is defined as the ratio between
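The following sketch (ours, not the authors' code) instantiates a generator with the layer sizes given above; the symmetrization of A and everything not stated in the text are assumptions for illustration.

import torch
import torch.nn as nn

N, T, Y, Z_DIM = 9, 5, 4, 32   # max nodes, atom types, bond types, latent size (from the text)

class MolGANGenerator(nn.Module):
    """MLP generator: z -> dense annotation matrix X (N x T) and adjacency tensor A (N x N x Y)."""

    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(Z_DIM, 128), nn.Tanh(),
            nn.Linear(128, 256), nn.Tanh(),
            nn.Linear(256, 512), nn.Tanh(),
        )
        self.to_x = nn.Linear(512, N * T)
        self.to_a = nn.Linear(512, N * N * Y)

    def forward(self, z):
        h = self.mlp(z)
        x = torch.softmax(self.to_x(h).view(-1, N, T), dim=-1)
        a = self.to_a(h).view(-1, N, N, Y)
        a = torch.softmax((a + a.transpose(1, 2)) / 2, dim=-1)   # symmetrize: undirected graph (assumption)
        return x, a

z = torch.randn(16, Z_DIM)
X, A = MolGANGenerator()(z)   # X: (16, 9, 5), A: (16, 9, 9, 4)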
Evaluation measures  We measure the following statistics as defined in Samanta et al. (2018): validity, novelty, and uniqueness. Validity is defined as the ratio between the number of valid and all generated molecules. Novelty measures the ratio between the set of valid samples that are not in the dataset and the total number of valid samples. Finally, uniqueness is defined as the ratio between the number of unique samples and valid samples, and it measures the degree of variety during sampling.
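These three ratios can be computed with RDKit roughly as follows (a sketch, assuming the generated molecules are available as SMILES strings; invalid strings fail RDKit parsing).

from rdkit import Chem

def evaluation_measures(generated_smiles, dataset_smiles):
    """Return (validity, uniqueness, novelty) as fractions in [0, 1]."""
    valid = []
    for s in generated_smiles:
        mol = Chem.MolFromSmiles(s)               # returns None for invalid molecules
        if mol is not None:
            valid.append(Chem.MolToSmiles(mol))   # canonical SMILES for comparison
    validity = len(valid) / len(generated_smiles)
    uniqueness = len(set(valid)) / len(valid) if valid else 0.0
    known = {Chem.MolToSmiles(Chem.MolFromSmiles(s)) for s in dataset_smiles}
    novelty = len([s for s in valid if s not in known]) / len(valid) if valid else 0.0
    return validity, uniqueness, novelty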
graphs. Conversely, it appears that λ does not mainly affect
Training  In all experiments, we use a batch size of 32 and train using the Adam (Kingma & Ba, 2015) optimizer with a learning rate of 10^-3. For each setting, we employ a grid search over dropout rates ∈ {0.0, 0.1, 0.25} (Srivastava et al., 2014) as well as over the discretization variations (as described in Section 3.1). We always report the results of the best model depending on what we are optimizing for (e.g., when optimizing solubility we report the model with the highest solubility score; when no metric is optimized we report the model with the highest sum of individual scores).

Although the use of WGAN should prevent, to some extent, undesired behaviors like mode collapse (Salimans et al., 2016), we notice that our models suffer from that problem. We leave addressing this issue to future work. As a simple countermeasure, we employ early stopping, evaluating every 10 epochs, to avoid completely collapsed modes. In particular, we use the unique score to measure the degree of collapse of our models, since it intrinsically indicates how much variety there is in the generation process. We set an arbitrary threshold of 2% under which we consider a model to be collapsed and stop training.

During early stages of our work, we noticed that the reward network needs several epochs of pretraining before being used to propagate the gradient to the generator, otherwise the generator easily diverges. We think this happens because, at the beginning of training, R̂ψ does not predict the reward accurately and therefore does not optimize the generator well. Therefore, in each experiment, we train the generator for the first half of the epochs without the RL component, using the WGAN objective only. We train the reward network during these epochs, but no RL loss is used to train the generator. For the second half of the epochs we use the combined loss in Equation 4.
5.1. Effect of λ

As in Guimaraes et al. (2017), the λ hyperparameter controls the trade-off between maximizing the desired objective and regulating the generator to match the data distribution. We study the effects of λ ∈ {0.0, 0.01, 0.05, 0.1, 0.5, 1.0} on the solubility metric (see Section 5.2 for more details). We train for 300 epochs (150 of which for pretraining) on the 5k subset of QM9 used in Guimaraes et al. (2017). We use the best λ parameter – determined via the model with the maximum sum of valid, unique, novel, and solubility scores – in all other experiments (Sections 5.2 and 5.3) without doing any further search.

Results  We report results in Table 1. We observe a clear trend towards higher validity scores for lower values of λ. This is likely due to the implicit optimization of valid molecules, since invalid ones receive zero reward during training. Therefore, if the RL loss component is strong, the generator is optimized to generate mostly valid molecular graphs. Conversely, it appears that λ does not mainly affect the unique and novel scores. Notice that these scores are not optimized, neither directly nor indirectly, and therefore they are a result of model architecture, hyperparameters, and training procedure. Indeed, the unique score is always close to 2% (which is our threshold), indicating that models appear to collapse (even in the RL-only case) if we do not apply early stopping.

We also run λ = 0 without starting from a pretrained model. We observe that it succeeds in optimizing toward the desired metrics, but it collapses, outputting very few samples (i.e., a low unique score). This behavior may indicate that pretraining is fundamental for matching the data distribution before using RL, since the GAN acts as a regularizer towards diversity.

Since λ controls the trade-off between the WGAN and RL losses, it is not surprising that λ = 0 (i.e., only RL in the second half of training) results in the highest valid and solubility scores compared to other values. The λ value with the highest sum of scores is λ = 0. We use this value for subsequent experiments.

Algorithm            Valid    Unique   Novel    Sol.
λ = 0 (full RL)*     100.0    0.03     100.0    0.98
λ = 0 (full RL)      99.8     2.3      97.9     0.86
λ = 0.01             98.2     2.2      98.1     0.74
λ = 0.05             92.2     2.7      95.0     0.67
λ = 0.1              87.3     3.2      87.2     0.56
λ = 0.5              86.6     2.1      87.5     0.48
λ = 1 (no RL)        87.7     2.9      97.7     0.54

Table 1. Comparison of different combinations of RL and GAN objectives on the small 5k dataset after GAN-based pretraining for 150 epochs. All values are reported in percentages except for the solubility score. * indicates no GAN-based pretraining and Sol. indicates Solubility.

5.2. Objectives optimization

Similarly to the previous experiment, we train our model for 300 epochs on the 5k QM9 subset while optimizing the same objectives as Guimaraes et al. (2017) to compare against their work. Moreover, we also report results on the full dataset trained for 30 epochs (note that the full dataset is 20 times larger than the subset). All scores are normalized to lie within [0, 1]. We assign a score of zero to invalid compounds (i.e., implicitly we are also optimizing a validity score). We choose to optimize the following objectives, which represent qualities typically desired for drug discovery:
Druglikeness: how likely a compound is to be a drug. The Quantitative Estimate of Druglikeness (QED) score quantifies compound quality with a weighted geometric mean of desirability scores capturing the underlying data distribution of several drug properties (Bickerton et al., 2012).

Solubility: the degree to which a molecule is hydrophilic. The log octanol-water partition coefficient (logP) is defined as the logarithm of the ratio of the concentrations of a solute between two solvents (Comer & Tam, 2001).

Synthesizability: this measure quantifies how easy a molecule is to synthesize. The Synthetic Accessibility score (Ertl & Schuffenhauer, 2009) is a method to estimate the ease of synthesis in a probabilistic way.

We also measure, without optimizing for it, a diversity score which indicates how likely a molecule is to be diverse with respect to the dataset. This measure compares sub-structures between samples and a random subset from the dataset, indicating how many repetitions there are.

For evaluation, we report average scores from 6400 sampled compounds as in Guimaraes et al. (2017). Additionally, we re-run the experiments from Guimaraes et al. (2017) to compute unique scores and execution time, since these are not reported. Differently from ORGAN, to optimize for all objectives we do not alternate between optimizing them individually during training, which in our case is not possible since the reward network is specific to a single type of reward. Thus, we instead optimize a joint reward which we define as the product (to lie within [0, 1]) of all objectives.
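These three properties can be computed per molecule with RDKit roughly as follows (a sketch; the SA-score helper is assumed to be importable from RDKit's Contrib directory, and the normalization of raw values to [0, 1] used in the experiments is omitted here).

import os, sys
from rdkit import Chem
from rdkit.Chem import QED, Crippen, RDConfig

sys.path.append(os.path.join(RDConfig.RDContribDir, "SA_Score"))
import sascorer  # Ertl & Schuffenhauer (2009) synthetic accessibility score

def chemical_scores(smiles):
    """Return raw (QED, logP, SA) scores for a molecule, or None if it is invalid."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None                     # invalid molecules receive zero reward instead
    return QED.qed(mol), Crippen.MolLogP(mol), sascorer.calculateScore(mol)

print(chemical_scores("CCO"))  # ethanol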

Objective            Algorithm      Valid (%)  Unique (%)  Time (h)  Diversity  Druglikeliness  Synthesizability  Solubility
Druglikeliness       ORGAN          88.2       69.4*       9.63*     0.55       0.52            0.32              0.35
                     OR(W)GAN       85.0       8.2*        10.06*    0.95       0.60            0.54              0.47
                     Naive RL       97.1       54.0*       9.39*     0.80       0.57            0.53              0.50
                     MolGAN         99.9       2.0         1.66      0.95       0.61            0.68              0.52
                     MolGAN (QM9)   100.0      2.2         4.12      0.97       0.62            0.59              0.53
Synthesizability     ORGAN          96.5       45.9*       8.66*     0.92       0.51            0.83              0.45
                     OR(W)GAN       97.6       30.7*       9.60*     1.00       0.20            0.75              0.84
                     Naive RL       97.7       13.6*       10.60*    0.96       0.52            0.83              0.46
                     MolGAN         99.4       2.1         1.04      0.75       0.52            0.90              0.67
                     MolGAN (QM9)   100.0      2.1         2.49      0.95       0.53            0.95              0.68
Solubility           ORGAN          94.7       54.3*       8.65*     0.76       0.50            0.63              0.55
                     OR(W)GAN       94.1       20.8*       9.21*     0.90       0.42            0.66              0.54
                     Naive RL       92.7       100.0*      10.51*    0.75       0.49            0.70              0.78
                     MolGAN         99.8       2.3         0.58      0.97       0.45            0.42              0.86
                     MolGAN (QM9)   99.8       2.0         1.62      0.99       0.44            0.22              0.89
All/Alternated       ORGAN          96.1       97.2*       10.2*     0.92       0.52            0.71              0.53
All/Simultaneously   MolGAN         97.4       2.4         2.12      0.91       0.47            0.84              0.65
All/Simultaneously   MolGAN (QM9)   98.0       2.3         5.83      0.93       0.51            0.82              0.69

Table 2. Gray cells indicate directly optimized objectives. Baseline results are taken from Guimaraes et al. (2017) (Table 1) and * indicates results reproduced by us using the code provided by the authors.

Results  Results are reported in Table 2. Qualitative samples are provided in the Appendix (Figure 3). We observe that MolGAN models always converge to very high validity outputs (> 97%) at the end of training. This is coherent with the previous experiment, since also here there is an implicit optimization of validity. Moreover, in all single-metric settings, our models beat the ORGAN models in terms of valid scores as well as in all three objective scores we optimize for.

We argue that this should be mainly due to two factors: i) intuitively, it should be easier to optimize a molecular graph predicted as a single sample than to optimize an RNN model that outputs a sequence of characters, and ii) using the deterministic policy gradient instead of REINFORCE effectively provides a better gradient and improves the sampling procedure towards the metrics while penalizing invalid graphs.

Training on the full QM9 dataset for 10 times fewer epochs further improves results in almost all scores. During training, our algorithm observes more different samples, and therefore it can learn well with much fewer iterations. Moreover, it can observe molecules with more diverse structures and properties.

As previously observed in Section 5.1, also in this experiment the unique score is always close to 2%, confirming our hypothesis that our models are susceptible to mode collapse. This is not the case for the ORGAN baseline. During sampling, ORGAN generates sequences of at most 51 characters, which allows it to generate larger molecules, whereas our model is (by choice) constrained to generate up to 9 atoms. This explains the difference in unique score, since the
chance of generating different molecules in a smaller space is much lower. Notice also that in ORGAN the RL component relies on REINFORCE, and the unique score is optimized by penalizing non-unique outputs, which we do not do.

In terms of training time, our model outperforms ORGAN by a large margin when training on the 5k dataset (at least ∼5 times faster in each setting), as we do not rely on sequential generation or discrimination. Both ORGAN and MolGAN have a comparable number of parameters, with the latter being approximately 20% larger.

5.3. VAE Baselines

In this experiment, we compare MolGAN against recent likelihood-based methods that utilize VAEs. We report a comparison with CharacterVAE (Gómez-Bombarelli et al., 2016), GrammarVAE (Kusner et al., 2017), and GraphVAE (Simonovsky & Komodakis, 2018). Here we train using the complete QM9 dataset. Naturally, we compare only on metrics that measure the quality of the generative process, since the likelihood is not computed directly in MolGAN. Moreover, we do not optimize any particular chemical property except validity (i.e., we do not optimize any metric described above, but we do optimize towards chemically valid compounds). The final evaluation scores are an average over 10^4 random samples. The number of samples differs from the previous experiment to be in line with the setting in Simonovsky & Komodakis (2018).

Algorithm          Valid    Unique   Novel
CharacterVAE       10.3     67.5     90.0
GrammarVAE         60.2     9.3      80.9
GraphVAE           55.7     76.0     61.6
GraphVAE/imp       56.2     42.0     75.8
GraphVAE NoGM      81.0     24.1     61.0
MolGAN             98.1     10.4     94.2

Table 3. Comparison with different algorithms on QM9. Values are reported in percentages. Baseline results are taken from Simonovsky & Komodakis (2018).

Results  Results are reported in Table 3. Training on the full QM9 dataset (without optimizing any metric except validity) results in a model with a higher unique score compared to the ones in Section 5.2.

Though the unique score of MolGAN is slightly higher compared to GrammarVAE, the other baselines are superior in terms of this score. Even though here we do not consider our model to be collapsed, such a low score confirms our hypothesis that our model is prone to mode collapse. On the other hand, we observe significantly higher validity scores compared to the VAE-based baselines. To verify that sampled unique molecules are (mostly) novel and not simply memorized from the dataset, we additionally measure how many of the unique molecules are also novel for our model. This score is 97%, indicating that almost all unique molecules are indeed novel and MolGAN does not suffer from such problems.

Differently from our approach, VAEs optimize the evidence lower bound (ELBO) and there is no explicit nor implicit optimization of output validity. Moreover, since a part of the ELBO maximizes reconstruction of the observations, the novelty of the sampling process is not expected to be high since it is not optimized. However, in all reported methods novelty is > 60% and, in the case of CharacterVAE, 90%. Though CharacterVAE can achieve a high novelty score, it underperforms in terms of validity. MolGAN, on the other hand, achieves both high validity and novelty scores.

6. Conclusions

In this work, we have introduced MolGAN: an implicit generative model for molecular graphs of small size. Through joint training with a GAN and an RL objective, our model is capable of generating molecular graphs with both higher validity and novelty than previous comparable VAE-based generative models, while not requiring a permutation-dependent likelihood function. Compared to a recent SMILES-based sequential GAN model for molecular generation, MolGAN can achieve higher chemical property scores (such as solubility) while allowing for at least ∼5x faster training time.

A central limitation of our current formulation of MolGANs is their susceptibility to mode collapse: neither the GAN nor the RL objective encourages the generation of diverse outputs, whereby the model tends to be pulled towards a solution that involves only little sample variability. This ultimately results in the generation of only a handful of different molecules if training is not stopped early.

We think that this issue can be addressed in future work, for example via careful design of reward functions or some form of pretraining. The MolGAN framework, taken together with established benchmark datasets for chemical synthesis, offers a new test bed for improvements on GAN stability with respect to the issue of mode collapse. We believe that insights gained from such evaluations will be valuable to the community even outside of the scope of generating molecular graphs. Lastly, it will be promising to explore alternative generative architectures within the MolGAN framework, such as recurrent graph-based generative models (Johnson, 2017; Li et al., 2018b; You et al., 2018), as our current one-shot prediction of the adjacency tensor is most likely feasible only for graphs of small size.
Acknowledgements

The authors would like to thank Luca Falorsi, Tim R. Davidson, Herke van Hoof and Max Welling for helpful discussions and feedback. T.K. is supported by SAP SE Berlin.

References

Arjovsky, M., Chintala, S., and Bottou, L. Wasserstein generative adversarial networks. In ICML, 2017.

Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S., and Hopkins, A. L. Quantifying the chemical beauty of drugs. Nature Chemistry, 4(2):90, 2012.

Bojchevski, A., Shchur, O., Zügner, D., and Günnemann, S. NetGAN: Generating graphs via random walks. In ICML, 2018.

Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A., and Vandergheynst, P. Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.

Bruna, J., Zaremba, W., Szlam, A., and LeCun, Y. Spectral networks and locally connected networks on graphs. In ICLR, 2014.

Comer, J. and Tam, K. Lipophilicity profiles: theory and measurement. Wiley-VCH: Zürich, Switzerland, 2001.

Dai, H., Tian, Y., Dai, B., Skiena, S., and Song, L. Syntax-directed variational autoencoder for molecule generation. In ICML, 2018.

Davidson, T. R., Falorsi, L., De Cao, N., Kipf, T., and Tomczak, J. M. Hyperspherical variational auto-encoders. In UAI, 2018.

Duvenaud, D., Maclaurin, D., Aguilera-Iparraguirre, J., Gómez-Bombarelli, R., Hirzel, T., Aspuru-Guzik, A., and Adams, R. P. Convolutional networks on graphs for learning molecular fingerprints. In NIPS, 2015.

Ertl, P. and Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. Journal of Cheminformatics, 1(1):8, 2009.

Gómez-Bombarelli, R., Wei, J. N., Duvenaud, D., Hernández-Lobato, J. M., Sánchez-Lengeling, B., Sheberla, D., Aguilera-Iparraguirre, J., Hirzel, T. D., Adams, R. P., and Aspuru-Guzik, A. Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 2016.

Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. C., and Bengio, Y. Generative adversarial nets. In NIPS, 2014.

Grover, A., Zweig, A., and Ermon, S. Graphite: Iterative generative modeling of graphs. In ICML, 2019.

Guimaraes, G. L., Sanchez-Lengeling, B., Farias, P. L. C., and Aspuru-Guzik, A. Objective-reinforced generative adversarial networks (ORGAN) for sequence generation models. arXiv preprint arXiv:1705.10843, 2017.

Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. C. Improved training of Wasserstein GANs. In NIPS, 2017.

Hamilton, W. L., Ying, R., and Leskovec, J. Representation learning on graphs: Methods and applications. arXiv preprint arXiv:1709.05584, 2017.

Jang, E., Gu, S., and Poole, B. Categorical reparameterization with Gumbel-Softmax. In ICLR, 2017.

Jin, W., Barzilay, R., and Jaakkola, T. S. Junction tree variational autoencoder for molecular graph generation. In ICML, 2018.

Johnson, D. D. Learning graphical state transitions. In ICLR, 2017.

Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. In ICLR, 2015.

Kingma, D. P. and Welling, M. Auto-encoding variational Bayes. In ICLR, 2014.

Kipf, T. N. and Welling, M. Variational graph auto-encoders. In NIPS Bayesian Deep Learning Workshop, 2016.

Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.

Kusner, M. J., Paige, B., and Hernández-Lobato, J. M. Grammar variational autoencoder. In ICML, 2017.

Li, Yibo, Zhang, L., and Liu, Z. Multi-objective de novo drug design with conditional graph generative model. arXiv preprint arXiv:1801.07299, 2018a.

Li, Yujia, Tarlow, D., Brockschmidt, M., and Zemel, R. S. Gated graph sequence neural networks. In ICLR, 2016.

Li, Yujia, Vinyals, O., Dyer, C., Pascanu, R., and Battaglia, P. Learning deep generative models of graphs. arXiv preprint arXiv:1803.03324, 2018b.

Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. Continuous control with deep reinforcement learning. In ICLR, 2016.

Maddison, C. J., Mnih, A., and Teh, Y. W. The Concrete distribution: A continuous relaxation of discrete random variables. In ICLR, 2017.

Minervini, P., Demeester, T., Rocktäschel, T., and Riedel, S. Adversarial sets for regularising neural link predictors. In UAI, 2017.

Ramakrishnan, R., Dral, P. O., Rupp, M., and von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data, 1:140022, 2014.

Rezende, D. J., Mohamed, S., and Wierstra, D. Stochastic backpropagation and approximate inference in deep generative models. In ICML, 2014.

Ruddigkeit, L., van Deursen, R., Blum, L. C., and Reymond, J.-L. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. Journal of Chemical Information and Modeling, 52(11):2864–2875, 2012.

Salimans, T., Goodfellow, I. J., Zaremba, W., Cheung, V., Radford, A., and Chen, X. Improved techniques for training GANs. In NIPS, 2016.

Samanta, B., De, A., Ganguly, N., and Gomez-Rodriguez, M. Designing random graph models using variational autoencoders with applications to chemical design. arXiv preprint arXiv:1802.05283, 2018.

Schlichtkrull, M., Kipf, T. N., Bloem, P., van den Berg, R., Titov, I., and Welling, M. Modeling relational data with graph convolutional networks. arXiv preprint arXiv:1703.06103, 2017.

Schneider, G. and Fechner, U. Computer-based de novo design of drug-like molecules. Nature Reviews Drug Discovery, 4(8):649, 2005.

Segler, M. H. S., Preuss, M., and Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature, 555(7698):604, 2018.

Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., and Riedmiller, M. A. Deterministic policy gradient algorithms. In ICML, 2014.

Simonovsky, M. and Komodakis, N. GraphVAE: Towards generation of small graphs using variational autoencoders. arXiv preprint arXiv:1802.03480, 2018.

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.

Wang, H., Wang, J., Wang, J., Zhao, M., Zhang, W., Zhang, F., Xie, X., and Guo, M. GraphGAN: Graph representation learning with generative adversarial nets. In AAAI, 2018.

Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences, 28(1):31–36, 1988.

Williams, R. J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. In Reinforcement Learning, pp. 5–32. Springer, 1992.

You, J., Ying, R., Ren, X., Hamilton, W. L., and Leskovec, J. GraphRNN: A deep generative model for graphs. In ICML, 2018.

Yu, L., Zhang, W., Wang, J., and Yu, Y. SeqGAN: Sequence generative adversarial nets with policy gradient. In AAAI, 2017.
[Figure 3: grids of molecule drawings with their QED scores; panel (a) QM9 samples, panel (b) MolGAN (QED) samples.]

Figure 3. Samples from the QM9 dataset (left) and MolGAN trained to optimize druglikeliness (QED) on the 5k QM9 subset (right). We also report their relative QED scores.
