Moflow
Research Track Paper KDD '20, August 23–27, 2020, Virtual Event, USA
first of its kind which not only generates molecular graphs efficiently by invertible mapping at one shot, but also has a chemical validity guarantee. More specifically, to capture the combinatorial atom-and-bond structures of molecular graphs, we propose a variant of the Glow model [13] to generate bonds (multi-type edges, e.g., single, double and triple bonds), a novel graph conditional flow to generate atoms (multi-type nodes, e.g., C, N, etc.) given bonds by leveraging graph convolutions, and finally assemble atoms and bonds into a valid molecular graph which follows bond-valence constraints. We illustrate our modelling framework in Figure 1. Our MoFlow is trained by exact and tractable likelihood estimation, and one-pass inference and generation can be efficiently utilized for molecular graph optimization.

We validate our MoFlow through a wide range of experiments from molecular graph generation, reconstruction, visualization to optimization. As baselines, we compare the state-of-the-art VAE-based model [12], autoregressive-based models [25, 33], and all three flow-based models [10, 20, 29]. As for memorizing input data, MoFlow achieves 100% reconstruction rate. As for exploring the unknown chemical space, MoFlow outperforms the above models by generating more novel, unique and valid molecules (as demonstrated by the N.U.V. scores in Tables 2 and 3). MoFlow generates 100% chemically-valid molecules when sampling from prior distributions. Furthermore, even without validity correction, MoFlow still generates many more valid molecules than existing models (validity-without-check scores in Tables 2 and 3). For example, the state-of-the-art autoregressive-flow-based model GraphAF [29] achieves 67% and 68% validity-without-check scores for the two datasets while MoFlow achieves 96% and 82% respectively, thanks to its capability of capturing the chemical structures in a holistic way. As for chemical property optimization, MoFlow can find many more novel molecules with top drug-likeness scores than existing models (Table 4 and Figure 5). As for constrained property optimization, MoFlow finds novel and optimized molecules with the best similarity scores and second best property improvement (Table 5).

It is worthwhile to highlight our contributions as follows:
• Novel MoFlow model: our MoFlow is one of the first flow-based graph generative models which not only generates molecular graphs at one shot by invertible mapping but also has a validity guarantee. To capture the combinatorial atom-and-bond structures of molecular graphs, we propose a variant of the Glow model for bonds (edges) and a novel graph conditional flow for atoms (nodes) given bonds, and then assemble them into valid molecular graphs.
• State-of-the-art performance: our MoFlow achieves many state-of-the-art results w.r.t. molecular graph generation, reconstruction, optimization, etc., and at the same time our one-shot inference and generation are very efficient, which implies its potential in deep exploration of the huge chemical space for drug discovery.

The outline of this paper is: survey (Sec. 2), proposed method (Sec. 3 and 4), experiments (Sec. 5), and conclusions (Sec. 6). In order to promote reproducibility, our codes and datasets are open-sourced at https://github.com/calvin-zcx/moflow.

2 RELATED WORK
Molecular Generation. Different deep generative frameworks are proposed for generating molecular SMILES or molecular graphs. Among the variational autoencoder (VAE)-based models [4, 5, 12, 15, 18, 19, 30], the JT-VAE [12] generates valid tree-structured molecules by first generating a tree-structured scaffold of chemical substructures and then assembling substructures according to the generated scaffold. The MolGAN [6] is a generative adversarial network (GAN)-based model but shows very limited performance in generating valid and unique molecules. The autoregressive-based models generate molecules in a sequential manner with a validity check at each generation step. For example, the MolecularRNN [25] sequentially generates each character of SMILES and the GCPN [33] sequentially generates each atom/bond in a molecular graph. In this paper, we explore a different deep generative framework, namely the normalizing flow models [7, 13, 20], for molecular graph generation, which have the potential to memorize and reconstruct all the training data and generalize to generating more valid, novel and unique molecules.

Flow-based Models. The (normalizing) flow-based models try to learn mappings between complex distributions and simple prior distributions through invertible neural networks. Such a framework has the merits of exact and tractable likelihood estimation for training, efficient one-pass inference and sampling, and invertible mapping and thus reconstruction of all the training data. Examples include NICE [7], RealNVP [8], Glow [13] and GNF [17], which show promising results in generating images or even graphs [17]. See the latest reviews in [14, 22] and more technical details in Section 3.

To the best of our knowledge, there are three flow-based models for molecular graph generation. The GraphAF [29] is an autoregressive flow-based model which achieves state-of-the-art performance in molecular graph generation. The GraphAF generates molecular graphs in a sequential manner with a validity check when adding any new atom or bond. The GraphNVP [20] and GRF [10] are proposed for molecular graph generation in a one-shot manner. However, they have no guarantee for chemical validity and thus show very limited performance in generating valid and novel molecular graphs.

Our MoFlow is the first of its kind which not only generates molecular graphs efficiently by invertible mapping at one shot but also has a validity guarantee. In order to capture the atom-and-bond composition of molecules, we propose a variant of the Glow [13] model for bonds and a novel graph conditional flow for atoms given bonds, and then combine them with a post-hoc validity correction. Our MoFlow achieves many state-of-the-art results thanks to capturing the chemical structures in a holistic way, and our one-shot inference and generation are more efficient than sequential models.

3 MODEL PRELIMINARY
The flow framework. The flow-based models aim to learn a sequence of invertible transformations $f_\Theta = f_L \circ \cdots \circ f_1$ between complex high-dimensional data $X \sim P_X(X)$ and $Z \sim P_Z(Z)$ in a latent space with the same number of dimensions, where the latent distribution $P_Z(Z)$ is easy to model (e.g., strong independence assumptions hold in such a latent space). The potentially complex data in the original space can be modelled by the change of variable
$$\arg\max_{\theta_B,\,\theta_{A|B}} \; \mathbb{E}_{M=(A,B)\sim p_{\mathcal{M}\text{-data}}} \left[ \log P_{A|B}(A|B;\theta_{A|B}) + \log P_B(B;\theta_B) \right] \tag{6}$$

Our model thus consists of two parts, namely a graph conditional flow for atoms to learn the atom matrix conditional on the bond tensors and a flow for bonds to learn bond tensors. We further learn a mapping between the learned latent vectors and molecular properties to regress the graph-based molecular properties, and to guide the generation of optimized molecular graphs.

4.2 Graph Conditional Flow for Atoms
Given a bond tensor $B \in \mathcal{B} \subset \mathbb{R}^{c\times n\times n}$, the goal of the atom flow is to generate the right atom-type matrix $A \in \mathcal{A} \subset \mathbb{R}^{n\times k}$ to assemble valid molecules $M = (A, B) \in \mathcal{M} \subset \mathbb{R}^{n\times k + c\times n\times n}$. We first define the B-conditional flow and the graph conditional flow $f_{A|B}$, which transforms $A$ given $B$ into the conditional latent variable $Z_{A|B} = f_{A|B}(A|B)$ that follows an isotropic Gaussian $P_{Z_{A|B}}$. We can get the conditional probability of atom features given the bond graphs, $P_{A|B}$, by a conditional version of the change of variable formula.

Figure 2: Graph conditional flow $f_{A|B}$ for the atom matrix. We show the details of one invertible graph coupling layer and a multiscale structure consisting of a cascade of L such graph coupling layers. The graphnorm is computed only once.

4.2.1 B-Conditional Flow and Graph Conditional Flow.
Definition 4.1. B-conditional flow: A B-conditional flow $Z_{A|B}|B = f_{A|B}(A|B)$ is an invertible and dimension-kept mapping and there exists a reverse transformation $f^{-1}_{A|B}(Z_{A|B}|B) = A|B$, where $f_{A|B}$ and $f^{-1}_{A|B}: \mathcal{A}\times\mathcal{B} \mapsto \mathcal{A}\times\mathcal{B}$.

The condition $B \in \mathcal{B}$ is kept fixed during the transformation. Under the independence assumption of $A$ and $B$, the Jacobian of $f_{A|B}$ is:

$$\frac{\partial f_{A|B}}{\partial (A,B)} = \begin{bmatrix} \dfrac{\partial f_{A|B}}{\partial A} & \dfrac{\partial f_{A|B}}{\partial B} \\ 0 & \mathbb{1}_B \end{bmatrix}, \tag{7}$$

the determinant of this Jacobian is $\det \frac{\partial f_{A|B}}{\partial (A,B)} = \det \frac{\partial f_{A|B}}{\partial A}$, and thus the conditional version of the change of variable formula in the form of log-likelihood is:

$$\log P_{A|B}(A|B) = \log P_{Z_{A|B}}(Z_{A|B}) + \log \left| \det \frac{\partial f_{A|B}}{\partial A} \right|. \tag{8}$$

Definition 4.2. Graph conditional flow: A graph conditional flow is a B-conditional flow $Z_{A|B}|B = f_{A|B}(A|B)$ where $B \in \mathcal{B} \subset \mathbb{R}^{c\times n\times n}$ is the adjacency tensor for edges with $c$ types and $A \in \mathcal{A} \subset \mathbb{R}^{n\times k}$ is the feature matrix of the corresponding $n$ nodes.

4.2.2 Graph coupling layer. We construct the aforementioned invertible mappings $f_{A|B}$ and $f^{-1}_{A|B}$ by the scheme of the affine coupling layer. Different from the traditional affine coupling layer, our coupling transformation relies on graph convolution [31] and thus we name such a coupling transformation a graph coupling layer.

For each graph coupling layer, we split the input $A \in \mathbb{R}^{n\times k}$ into two parts $A = (A_1, A_2)$ along the $n$ row dimension, and we get the output $Z_{A|B} = (Z_{A_1|B}, Z_{A_2|B}) = f_{A|B}(A|B)$ as follows:

$$\begin{aligned} Z_{A_1|B} &= A_1 \\ Z_{A_2|B} &= A_2 \odot \mathrm{Sigmoid}(S_\Theta(A_1|B)) + T_\Theta(A_1|B) \end{aligned} \tag{9}$$

where $\odot$ is the element-wise product. We design the scale function $S_\Theta$ and the transformation function $T_\Theta$ in each graph coupling layer by incorporating graph convolution structures. The bond tensor $B \in \mathbb{R}^{c\times n\times n}$ keeps a fixed value while the atom matrix $A$ is transformed. We also apply the masked convolution idea in [8] to the graph convolution in the graph coupling layer. Here, we adopt Relational Graph Convolutional Networks (R-GCN) [28] to build the graph convolution layer graphconv as follows:

$$\mathrm{graphconv}(A_1) = \sum_{i=1}^{c} \hat{B}_i (M \odot A) W_i + (M \odot A) W_0 \tag{10}$$

where $\hat{B}_i = D^{-1} B_i$ is the normalized adjacency matrix at channel $i$, $D = \sum_{c,i} B_{c,i,j}$ is the sum of the in-degree over all the channels for each node, and $M \in \{0,1\}^{n\times k}$ is a binary mask to select a partition $A_1$ from $A$. Because the bond graph is fixed throughout the graph coupling layers, the graph normalization, denoted as graphnorm, is computed only once.

We use multiple stacked graphconv -> BatchNorm1d -> ReLU layers with a multi-layer perceptron (MLP) output layer to build the graph scale function $S_\Theta$ and the graph transformation function $T_\Theta$. Moreover, instead of using the exponential function for $S_\Theta$ as discussed in Sec. 3, we adopt the Sigmoid function for the sake of the numerical stability of cascading multiple flow layers. The reverse mapping of the graph coupling layer $f^{-1}_{A|B}$ is:

$$\begin{aligned} A_1 &= Z_{A_1|B} \\ A_2 &= \left( Z_{A_2|B} - T_\Theta(Z_{A_1|B}|B) \right) / \mathrm{Sigmoid}(S_\Theta(Z_{A_1|B}|B)). \end{aligned} \tag{11}$$

The logarithm of the Jacobian determinant of each graph coupling layer can be efficiently computed by:

$$\log \left| \det \left( \frac{\partial f_{A|B}}{\partial A} \right) \right| = \sum_j \log \mathrm{Sigmoid}(S_\Theta(A_1|B))_j \tag{12}$$

where $j$ iterates over each element. In principle, we can use arbitrarily complex graph convolution structures for $S_\Theta$ and $T_\Theta$, since computing the above Jacobian determinant of $f_{A|B}$ does not involve computing the Jacobian of $S_\Theta$ or $T_\Theta$.

4.2.3 Actnorm for 2-dimensional matrix. For the sake of numerical stability, we design a variant of the invertible actnorm layer [13] for the 2-dimensional atom matrix, denoted as actnorm2D (activation normalization for 2D matrix), to normalize each row, namely the feature dimension for each node, over a batch of 2-dimensional atom matrices. Given the mean $\mu \in \mathbb{R}^{n\times 1}$ and the standard deviation
• Constrained property optimization (Sec. 5.4): Can our MoFlow generate novel molecular graphs with the optimized properties and at the same time keep the chemical similarity as much as possible?

Baselines. We compare our MoFlow with: a) the state-of-the-art VAE-based method JT-VAE [12], which captures the chemical validity by encoding and decoding a tree-structured scaffold of molecular graphs; b) the state-of-the-art autoregressive models GCPN [33] and MolecularRNN (MRNN) [25] with reinforcement learning for property optimization, which generate molecules in a sequential manner; c) the flow-based methods GraphNVP [20] and GRF [10], which generate molecules at one shot, and the state-of-the-art autoregressive-flow-based model GraphAF [29], which generates molecules in a sequential way.

Datasets. We use two datasets, QM9 [26] and ZINC250K [11], for our experiments and summarize them in Table 1. The QM9 contains 133,885 molecules with a maximum of 9 atoms in 4 different types, and the ZINC250K has 249,455 drug-like molecules with a maximum of 38 atoms in 9 different types. The molecules are kekulized by the chemical software RDKit [16] and the hydrogen atoms are removed. There are three types of edges, namely single, double, and triple bonds, for all molecules. Following the pre-processing procedure in [20], we encode each atom and bond by one-hot encoding, pad the molecules which have fewer than the maximum number of atoms with a virtual atom, augment the adjacency tensor of each molecule by a virtual edge channel representing no bonds between atoms, and dequantize [8, 20] the discrete one-hot-encoded data by adding uniform random noise U[0, 0.6] for each dimension, leading to atom matrix A ∈ R^{9×5} and bond tensor B ∈ R^{4×9×9} for QM9, and A ∈ R^{38×10} and B ∈ R^{4×38×38} for ZINC250K.

Table 1: Statistics of the datasets.

            #Mol. Graphs   Max. #Nodes   #Node Types   #Edge Types
QM9         133,885        9             4+1           3+1
ZINC250K    249,455        38            9+1           3+1

MoFlow Setup. To be comparable with the one-shot-flow baseline GraphNVP [20], for the ZINC250K, we adopt 10 coupling layers and 38 graph coupling layers for the bonds' Glow and the atoms' graph conditional flow respectively. We use two 3×3 convolution layers with 512, 512 hidden dimensions in each coupling layer. For each graph coupling layer, we set one relational graph convolution layer with 256 dimensions followed by a two-layer multilayer perceptron with 512, 64 hidden dimensions. As for the QM9, we adopt 10 coupling layers and 27 graph coupling layers for the bonds' Glow and the atoms' graph conditional flow respectively. There are two 3×3 convolution layers with 128, 128 hidden dimensions in each coupling layer, and one graph convolution layer with 64 dimensions followed by a two-layer multilayer perceptron with 128, 64 hidden dimensions in each graph coupling layer. As for the optimization experiments, we further train a regression model to map the latent embeddings to different property scalars (discussed in Sec. 5.3 and 5.4) by a multi-layer perceptron with the structure 18-dim linear layer -> ReLU -> 1-dim linear layer. For each dataset, we use the same trained model for all the following experiments.

Empirical Running Time. Following the above setup, we implemented our MoFlow in PyTorch 1.3.1 and trained it with the Adam optimizer with learning rate 0.001, batch size 256, and 200 epochs for both datasets on 1 GeForce RTX 2080 Ti GPU and 16 CPU cores. Our MoFlow finished 200-epoch training within 22 hours (6.6 minutes/epoch) for ZINC250K and 3.3 hours (0.99 minutes/epoch) for QM9. Thanks to efficient one-pass inference/embedding, our MoFlow takes a negligible 7 minutes to learn an additional regression layer trained in 3 epochs for optimization experiments on ZINC250K. In comparison, on the ZINC250K dataset, GraphNVP [20] costs 38.4 hours (11.5 minutes/epoch) with our PyTorch implementation under the same configurations, and the estimated total running time of GraphAF [29] is 124 hours (24 minutes/epoch), which consists of the reported 4 hours for a generation model trained for 10 epochs and an estimated 120 hours for another optimization model trained for 300 epochs. The reported running time of JT-VAE [12] is roughly 24 hours in [33].

5.1 Generation and Reconstruction
Setup. In this task, we evaluate our MoFlow's capability of generating novel, unique and valid molecular graphs, and whether our MoFlow can reconstruct input molecular graphs from their latent representations. We adopted the widely-used metrics, including: Validity, which is the percentage of chemically valid molecules in all the generated molecules; Uniqueness, which is the percentage of unique valid molecules in all the generated molecules; Novelty, which is the percentage of generated valid molecules which are not in the training dataset; and Reconstruction rate, which is the percentage of molecules in the input dataset which can be reconstructed from their latent representations. Besides, because the novelty score also accounts for the potentially duplicated novel molecules, we propose a new metric N.U.V., which is the percentage of Novel, Unique, and Valid molecules in all the generated molecules. We also compare the validity of ablation models when not using validity check or validity correction, denoted as Validity w/o check in [29].

The prior distribution of the latent space follows a spherical multivariate Gaussian distribution N(0, (tσ)^2 I), where σ is the learned standard deviation and the hyper-parameter t is the temperature for the reduced-temperature generative model [13, 20, 23]. We use t = 0.85 in the generation for both QM9 and ZINC250K datasets, and t = 0.6 for the ablation study without validity correction. To be comparable with the state-of-the-art baseline GraphAF [29], we generate 10,000 molecules, i.e., we sample 10,000 latent vectors from the prior and then decode them by the reverse transformation of our MoFlow. We report the mean and standard deviation of results over 5 runs. As for the reconstruction, we encode all the molecules from the training dataset into latent vectors by the encoding transformation of our MoFlow and then reconstruct input molecules from these latent vectors by the reverse transformation of MoFlow.

Results. Table 2 and Table 3 show that our MoFlow outperforms the state-of-the-art models on all the six metrics for both QM9 and ZINC250K datasets. Thanks to the invertible characteristic of the flow-based models, our MoFlow builds a one-to-one mapping from the input molecule M to its corresponding latent vector Z, enabling 100% reconstruction rate as shown in Table 2 and Table 3. In contrast, the VAE-based method JT-VAE and the autoregressive-based
methods GCPN and MRNN can't reconstruct all the input molecules. Compared with the one-shot flow-based models GraphNVP and GRF, by incorporating the validity correction mechanism, our MoFlow achieves 100% validity, leading to significant improvements of the validity score and N.U.V. score for both datasets. Specifically, the N.U.V. scores of MoFlow are 2 and 3 times as large as the N.U.V. scores of GraphNVP and GRF respectively in Table 2. Even without validity correction, our MoFlow still outperforms the validity scores of GraphNVP and GRF by a large margin. Compared with the autoregressive flow-based model GraphAF, we find our MoFlow outperforms GraphAF by an additional 16% and 0.8% with respect to N.U.V. scores for QM9 and ZINC respectively, indicating that our MoFlow generates more novel, unique and valid molecules. Indeed, MoFlow achieves better uniqueness and novelty scores compared with GraphAF for both datasets. Moreover, our MoFlow without validity correction still outperforms GraphAF without the validity check by a large margin w.r.t. the validity score (validity w/o check in Table 2 and Table 3) for both datasets, implying the superiority of capturing the molecular structures in a holistic way by our MoFlow over autoregressive models, which capture them in a sequential way.

In conclusion, our MoFlow not only memorizes and reconstructs all the training molecular graphs, but also generates more novel, unique and valid molecular graphs than existing models, indicating that our MoFlow learns a strict superset of the training data and explores the unknown chemical space better.

5.2 Visualizing Continuous Latent Space
Setup. We examine the learned latent space of our MoFlow, denoted as f, by visualizing the decoded molecular graphs from a neighborhood of a latent vector in the latent space. Similar to [12, 15], we encode a seed molecule M into Z = f(M) and then grid search two random orthogonal directions with unit vectors X and Y based on Z; we then get a new latent vector by Z′ = Z + λ_X ∗ X + λ_Y ∗ Y, where λ_X and λ_Y are the searching steps. Different from VAE-based models, our MoFlow gets decoded molecules efficiently by the one-pass inverse transformation M′ = f^{-1}(Z′). In contrast, the VAE-based models such as JT-VAE need to decode each latent vector 10 to 100 times, and autoregressive-based models like GCPN, MRNN and GraphAF need to generate a molecule sequentially. Furthermore, we measure the chemical similarity between each neighboring molecule and the centering molecule. We choose the Tanimoto index [2] as the chemical similarity metric and indicate the similarity values by a heatmap. We further visualize a linear interpolation between two molecules to show their changing trajectory, similar to the interpolation case between images [13].

Results. We show the visualization of the latent space in Figure 4. We find the latent space is very smooth and the interpolations between two latent points only change a molecular graph a little bit. Quantitatively, we find that the chemical similarity between molecules largely corresponds to the Euclidean distance between their latent vectors, implying that our MoFlow embeds similar molecular graph structures into similar latent embeddings. Searching in such a continuous latent space learnt by our MoFlow is the basis for molecular property optimization and constrained optimization as discussed in the following sections.

5.3 Property Optimization
Setup. The property optimization task aims at generating novel molecules with the best Quantitative Estimate of Druglikeness (QED) scores [3], which measure the drug-likeness of generated molecules. Following the previous works [25, 33], we report the best property scores of novel molecules discovered by each method.

We use the pre-trained MoFlow, denoted as f, in the generation experiment to encode a molecule M and get the molecular embedding Z = f(M), and further train a multilayer perceptron to regress the embedding Z of the molecules to their property values y. We then search for the best molecules by the gradient ascent method, namely Z′ = Z + λ ∗ dy/dZ, where λ is the length of the search step. We run the above gradient ascent method for K steps. We decode the new embedding Z′ in the latent space to the discovered molecule by the reverse mapping M′ = f^{-1}(Z′). The molecule M′ is novel if M′ doesn't exist in the training dataset.

Results. We report the discovered novel molecules sorted by their QED scores in Table 4. We find previous methods can only find very few molecules with the best QED score (= 0.948). In contrast, our MoFlow finds many more novel molecules which have the
Figure 4: Visualization of learned latent space by our MoFlow. Top: Visualization of the grid neighbors of a seed molecule in the center, which serves as the baseline for measuring similarity. Bottom: Interpolation between two seed molecular graphs, where the left one is the baseline molecule for measuring similarity. Seed molecules are highlighted in red boxes and they are randomly selected from ZINC250K.
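The latent-space exploration of Sec. 5.2 (a grid Z′ = Z + λ_X ∗ X + λ_Y ∗ Y around a seed embedding, and a linear interpolation between two embeddings) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' code: the latent dimension, step size and grid radius are hypothetical values, the seed embedding is a random stand-in for Z = f(M), and decoding each point with f^{-1} is left out.

```python
import numpy as np

def random_orthogonal_directions(dim, rng):
    """Two random unit vectors in R^dim that are orthogonal to each other."""
    x = rng.standard_normal(dim)
    x /= np.linalg.norm(x)
    y = rng.standard_normal(dim)
    y -= y.dot(x) * x              # Gram-Schmidt: remove the component along x
    y /= np.linalg.norm(y)
    return x, y

def latent_grid(z, x, y, step, radius):
    """Grid of points Z' = Z + lx * X + ly * Y, with the seed z at the center."""
    offsets = step * np.arange(-radius, radius + 1)
    return z + offsets[:, None, None] * x + offsets[None, :, None] * y

def interpolate(z_start, z_end, num_points):
    """Linear interpolation between two latent vectors, endpoints included."""
    alphas = np.linspace(0.0, 1.0, num_points)[:, None]
    return (1.0 - alphas) * z_start + alphas * z_end

rng = np.random.default_rng(0)
z_seed = rng.standard_normal(16)   # hypothetical stand-in for Z = f(M)
z_end = rng.standard_normal(16)    # hypothetical second seed embedding
x_dir, y_dir = random_orthogonal_directions(z_seed.size, rng)
grid = latent_grid(z_seed, x_dir, y_dir, step=0.5, radius=4)   # shape (9, 9, 16)
path = interpolate(z_seed, z_end, num_points=10)               # shape (10, 16)
```

Each entry of `grid` and `path` would then be decoded in one pass by the inverse flow, and the decoded molecule compared with the seed by the Tanimoto index.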
[...] of-the-art VAE model JT-VAE, our MoFlow achieves much higher similarity score and property improvement, implying that our [...] ular embedding. Compared with the state-of-the-art reinforcement learning based method GCPN and GraphAF which is good at generating molecules step-by-step with targeted property rewards, our model MoFlow achieves the best similarity scores and the second [...]

Figure 5: Illustration of discovered novel molecules with the best druglikeness QED scores (each of the ten molecules shown has a QED score of 0.948).
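The gradient ascent search of Sec. 5.3, Z′ = Z + λ ∗ dy/dZ repeated for K steps on a regressor with the linear -> ReLU -> linear structure from the MoFlow setup, can be sketched as follows. This is an illustrative sketch only: the class `TinyRegressor`, its random (untrained) weights, the latent dimension, and the values of λ (`step_size`) and K (`num_steps`) are assumptions made for the example, and the final decoding M′ = f^{-1}(Z′) and novelty check are omitted.

```python
import numpy as np

class TinyRegressor:
    """y = w2 . relu(W1 z + b1) + b2, i.e. linear -> ReLU -> linear (hypothetical weights)."""

    def __init__(self, latent_dim, hidden_dim=18, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.standard_normal((hidden_dim, latent_dim)) / np.sqrt(latent_dim)
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.standard_normal(hidden_dim) / np.sqrt(hidden_dim)
        self.b2 = 0.0

    def predict(self, z):
        hidden = np.maximum(self.W1 @ z + self.b1, 0.0)
        return float(self.w2 @ hidden + self.b2)

    def grad(self, z):
        """Analytic dy/dZ: only hidden units with positive pre-activation contribute."""
        active = (self.W1 @ z + self.b1) > 0.0
        return self.W1.T @ (self.w2 * active)

def gradient_ascent(z, regressor, step_size, num_steps):
    """Apply Z' = Z + step_size * dy/dZ for num_steps steps; return all visited points."""
    trajectory = [z.copy()]
    for _ in range(num_steps):
        z = z + step_size * regressor.grad(z)
        trajectory.append(z.copy())
    return trajectory

regressor = TinyRegressor(latent_dim=8)
z0 = np.random.default_rng(1).standard_normal(8)   # stand-in for Z = f(M)
trajectory = gradient_ascent(z0, regressor, step_size=0.1, num_steps=5)
```

In the setting described in the paper, every point of the trajectory would be decoded by the inverse flow and kept only if the resulting molecule does not appear in the training set.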
APPENDIX:
A INFERENCE AND GENERATION
We summarize the inference (encoding) and generation (decoding) of molecular graphs by our MoFlow in Algorithm 1 and Algorithm 2 respectively. We visualize the overall framework in Figure 1. As shown in the algorithms, our MoFlow has the merits of exact likelihood estimation/training, one-pass inference, invertible and one-pass generation, and a chemical validity guarantee.
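The invertibility and exact log-determinant behind one-pass inference and generation can be illustrated with a single graph coupling layer in the spirit of Sec. 4.2.2 (Eqs. 9 to 12). The NumPy sketch below is a simplified toy, not the authors' implementation: it uses one graph convolution followed by ReLU and a linear map for the scale and shift functions (instead of stacked graphconv -> BatchNorm1d -> ReLU blocks with an MLP), random untrained weights, a row mask that keeps the first half of the nodes, and a random symmetric one-hot bond tensor with QM9-sized shapes. All class and function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def graphnorm(B):
    """B_hat_i = D^{-1} B_i with D the degree summed over all channels (computed once)."""
    degree = B.sum(axis=(0, 1))                 # shape (n,); B is symmetric in this toy
    return B / degree[None, :, None]

def graphconv(B_hat, H, W, W0):
    """sum_i B_hat_i H W_i + H W_0, where H plays the role of M * A in Eq. 10."""
    return np.einsum("cij,jk,ckh->ih", B_hat, H, W) + H @ W0

class GraphCouplingLayer:
    def __init__(self, n, k, c, hidden, rng):
        self.mask = np.zeros((n, k))
        self.mask[: n // 2] = 1.0               # rows passed through unchanged (A1)
        self.W = 0.1 * rng.standard_normal((c, k, hidden))
        self.W0 = 0.1 * rng.standard_normal((k, hidden))
        self.Ws = 0.1 * rng.standard_normal((hidden, k))
        self.Wt = 0.1 * rng.standard_normal((hidden, k))

    def _scale_shift(self, A1, B_hat):
        h = np.maximum(graphconv(B_hat, A1, self.W, self.W0), 0.0)
        return sigmoid(h @ self.Ws), h @ self.Wt

    def forward(self, A, B_hat):
        A1 = self.mask * A
        s, t = self._scale_shift(A1, B_hat)
        Z = A1 + (1.0 - self.mask) * (A * s + t)            # Eq. 9
        log_det = np.sum((1.0 - self.mask) * np.log(s))     # Eq. 12
        return Z, log_det

    def inverse(self, Z, B_hat):
        Z1 = self.mask * Z
        s, t = self._scale_shift(Z1, B_hat)
        return Z1 + (1.0 - self.mask) * (Z - t) / s         # Eq. 11

rng = np.random.default_rng(0)
n, k, c = 9, 5, 4                                # QM9-sized shapes from Sec. 5
types = rng.integers(0, c, size=(n, n))
types = np.triu(types) + np.triu(types, 1).T     # symmetric bond-type map
B = np.eye(c)[types].transpose(2, 0, 1)          # one-hot bond tensor, shape (c, n, n)
A = rng.random((n, k))                           # toy atom matrix
layer = GraphCouplingLayer(n, k, c, hidden=8, rng=rng)
B_hat = graphnorm(B)
Z, log_det = layer.forward(A, B_hat)             # encoding (inference)
A_rec = layer.inverse(Z, B_hat)                  # decoding (generation)
```

Because the scale and shift depend only on the untouched rows and on the fixed bond tensor, `inverse` recovers `A` exactly up to floating-point error, and the log-determinant needs no Jacobian of the convolution itself.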