Classical molecular dynamics (MD) simulations will be able to reach sampling in the second timescale within five years, producing petabytes of simulation data at current force field accuracy. Notwithstanding this, MD will still be in the regime of low-throughput, high-latency predictions with average accuracy. We envisage that machine learning (ML) will be able to solve both the accuracy and the time-to-prediction problems by learning predictive models from expensive simulation data. The synergies between classical simulations, quantum simulations and ML methods, such as artificial neural networks, have the potential to drastically reshape the way we make predictions in computational structural biology and drug discovery.

Addresses
1 Computational Biophysics Laboratory (GRIB-IMIM), Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), Doctor Aiguader 88, 08003 Barcelona, Spain
2 Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, Barcelona 08010, Spain

Corresponding author: De Fabritiis, Gianni ([email protected])

Current Opinion in Structural Biology 2018, 49:139–144
This review comes from a themed issue on Theory and simulation
Edited by Robert Best and Kresten Lindorff-Larsen
For a complete overview see the Issue and the Editorial
Available online 21st February 2018
https://doi.org/10.1016/j.sbi.2018.02.004
0959-440X/© 2018 Elsevier Ltd. All rights reserved.

… compute, but use approximations that forfeit accuracy. The extent to which these limitations affect the validity of the results depends on the system and the biological question at hand. Quantum mechanics (QM) calculations can provide an accurate description of a molecule, but they are computationally demanding and very limited in terms of sampling. Ideally, one would like to simulate at quantum-level accuracy, which describes the physics and chemistry precisely, but in the sampling regime of current classical simulations.

The first simulation of protein dynamics dates from 1977 and consisted of a 9.2 ps trajectory of the bovine pancreatic trypsin inhibitor (BPTI) in vacuum [5]. In 2010, Shaw et al. [6] reported a 1 ms trajectory of the same protein in explicit solvent, roughly a 100-million-fold increase in trajectory length compared to the first simulation. In about 30 years, MD simulations have increased their sampling capabilities by 8 orders of magnitude, with increasingly accurate force fields [2–4]. In the last 10 years, MD has evolved from single simulations [7–9] to high-throughput molecular dynamics studies [10–16], where hundreds of microseconds of simulation are computed in independent trajectories to obtain converged statistics. Several software and hardware innovations are steadily decreasing the computational cost of molecular simulations. These include MD codes for GPUs [17–20], distributed computing projects like Folding@home [21] and GPUGRID [22], and special-purpose supercomputers like ANTON [23]. Additionally, adaptive sampling schemes have introduced more efficient ways to sample conformational space, decreasing the amount of simulation needed [24–26].
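The ~100-million-fold figure follows directly from the two trajectory lengths quoted above: 9.2 ps in 1977 and 1 ms in 2010. The only symbols introduced here, $t_{1977}$ and $t_{2010}$, are labels for those two lengths.

```latex
\frac{t_{2010}}{t_{1977}}
  = \frac{1\ \mathrm{ms}}{9.2\ \mathrm{ps}}
  = \frac{10^{-3}\ \mathrm{s}}{9.2 \times 10^{-12}\ \mathrm{s}}
  \approx 1.1 \times 10^{8}
```

A ratio of about $10^{8}$ corresponds to the 8 orders of magnitude of sampling increase described in the text.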
Figure 1

[Figure 1 image not reproduced. Recoverable panel labels: (a) simulation length (ms), MD; (b) molecule dihedral, neural network; (c) PDBBind "general set" of 17,900 protein–ligand structures with annotated affinity; available structure, available affinity.]

Overview of a combined simulation and machine learning approach. (a) MD data generation is expected to reach the second aggregated timescale by 2022, with output files totalling several petabytes. This projection is based on the trend in maximum aggregated simulation time per paper per year using the ACEMD software. Chart adapted from [27]. Referenced publications correspond to [12,13,15,29,56–58]. (b) A first example of ML replacing QM: predicting dihedral energies with a neural network trained on QM simulations. (c) An example of data augmentation by MD: augmenting protein–ligand binding poses for a set of protein–ligand pairs with unknown binding mode, and augmenting binding affinity data for a set of resolved protein–ligand complex structures with unknown affinities.
Machine learning applied to structural biology

ML approaches are not new in simulation analysis. For instance, the common analysis pipeline for MD simulations involves dimensionality reduction [28–33] and clustering algorithms.

In the last few years, ML applications have grown exponentially. One of the main factors driving this growth is the broad popularization of a particular type of ML called deep neural networks [34,35]. An artificial neural network (NN) is a simple mathematical framework organized in layers. Each layer performs a matrix multiplication followed by a nonlinear function of the input variables $\mathbf{x}$. The output of a single neuron $f$ in each layer is given by $f = \varphi(\mathbf{w}^{\top}\mathbf{x} + b)$, where $\mathbf{w}$ are learnable weights, $b$ is a bias and $\varphi$ is some nonlinear function. NNs can have from several to hundreds of nested layers, in which case they are called "deep". Given enough parameters, a NN is capable of approximating any continuous function [36,37].

The application of NN models in computational biology is steadily increasing [38]. For instance, the Merck molecular activity challenge demonstrated the potential of deep neural network models in drug discovery [39]. DeepTox [40] is a deep learning-based model for toxicity prediction of compounds that won the Tox21 toxicology prediction challenge in 2014 by a large margin. Variational autoencoders [41] are a generative flavor of deep NNs. They were recently applied to convert discrete representations of molecules to and from a multidimensional continuous representation [42], allowing for efficient search and optimization through open-ended spaces of chemical compounds. Autoencoders have also been used for dimensionality reduction in MD [43–45]. VAMPnets [46] fit a Markov state model from system-specific molecular simulation data. NNs have also been used to reproduce the free-energy surface of molecules [47]. Deep convolutional neural networks (CNNs) [48] have become increasingly popular due to their performance in machine vision. We and others have exploited this property in structural biology by treating proteins as 3D images. CNNs have been used for ligand binding site detection [49], ligand pose prediction [50], ligand active/inactive classification [51], ligand binding affinity prediction [52] and protein design [53]. Also, the DeepChem software [54] and the MoleculeNet benchmark [55] provide multiple featurization algorithms and access to relevant QSAR prediction datasets.
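The layer-wise computation described above, $f = \varphi(\mathbf{w}^{\top}\mathbf{x} + b)$ applied neuron by neuron and nested layer after layer, can be sketched in plain Python. This is a minimal illustration only, under several assumptions. ReLU stands in for $\varphi$, and the output layer is linear. The helper names (`neuron`, `dense_layer`, `forward`, `random_layer`) and the Gaussian initialisation are hypothetical choices, not taken from any of the cited models.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weights W, biases b)


def relu(z: float) -> float:
    """Assumed nonlinearity phi (ReLU); any nonlinear function could be used."""
    return max(0.0, z)


def identity(z: float) -> float:
    """Linear activation, assumed here for the output layer (regression)."""
    return z


def neuron(w: Sequence[float], x: Sequence[float], b: float,
           phi: Callable[[float], float] = relu) -> float:
    """Output of a single neuron: f = phi(w^T x + b)."""
    return phi(sum(wi * xi for wi, xi in zip(w, x)) + b)


def dense_layer(W: Matrix, b: Vector, x: Vector,
                phi: Callable[[float], float] = relu) -> Vector:
    """One layer: each row of W is one neuron, i.e. phi(W x + b) elementwise."""
    return [neuron(w_row, x, b_j, phi) for w_row, b_j in zip(W, b)]


def forward(layers: List[Layer], x: Vector) -> Vector:
    """Nest the layers: hidden layers use ReLU, the last layer is linear."""
    for i, (W, b) in enumerate(layers):
        phi = identity if i == len(layers) - 1 else relu
        x = dense_layer(W, b, x, phi)
    return x


def random_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Hypothetical Gaussian initialisation scaled by 1/sqrt(n_in), zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    W = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


if __name__ == "__main__":
    rng = random.Random(0)
    # A small "deep" network: 3 inputs -> 8 hidden -> 8 hidden -> 1 output.
    net = [random_layer(3, 8, rng), random_layer(8, 8, rng), random_layer(8, 1, rng)]
    print(forward(net, [0.1, -0.4, 2.0]))
```

Stacking more `(W, b)` pairs in the `net` list is all that makes such a network "deep". In practice the weights would be learned by gradient-based training rather than drawn at random, typically with a dedicated library.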
Case 1: ML models to predict quantum forces using QM simulation data

One important application of ML is in quantum mechanics. QM simulations are notoriously computationally expensive and, depending on the level of theory, scale poorly with the number of atoms in the system [59]. It is therefore not surprising that there have been efforts to interpolate the QM many-body potential with NNs to obtain a predictive model of QM forces.

Many studies on approximating QM with NNs were performed before the recent NN resurgence. In particular, Behler et al. [60–62] contributed significantly to the field for small molecules and to ways of fitting quantum observables, for example, infrared spectra [63]. The initial effort went into designing usable symmetry functions for the learned potentials. These functions impose basic physical principles: translational and rotational [64] invariance, and invariance to atom reordering. Transferability, however, was limited until recently [65].

In [65–68], NNs are trained on QM simulation data to generate the potential energy surface and forces for small molecules. These models generalize to unseen molecules, and some preliminary tests on proteins have been reported. As in MD force fields, the forces are true derivatives of the interpolated potential energy surface, obtained from the gradients of the NN, and can be used to run dynamics. This guarantees that the forces produced by the NNs yield a conservative field [69]. The QM energy potential is therefore learned with the accuracy of first-principles methods, using generated datasets covering many molecules. The computational cost of generating the datasets is of course very large. Once the NN is trained, however, inference is many orders of magnitude faster than the QM computational model and comparable in cost to standard classical MD (Figure 1b).

Case 2: ML models to predict binding affinities using MD simulation data

MD software for GPUs has made simulations of full protein–ligand binding processes possible, allowing the prediction of thermodynamic and kinetic properties [12,13]. At the moment, a trade-off between accuracy and sampling restricts the applicability of MD compared to other methods commonly used in drug design, such as docking, which is less accurate but significantly faster. Even if the sampling problem is solved by brute force, MD does not currently guarantee correct results, because of the approximations in the force fields. This last point could be mitigated in the future by the use of QM/ML force fields.

Recently, we explored the use of machine vision NN models for binding affinity prediction. KDEEP [52] is an ML model for predicting binding affinities that consists of a CNN trained on the PDBBind database [70]. The method shows a good overall mean correlation (0.82) when tested against PDBBind's core set. This set contains several targets clustered by sequence similarity, in order to define a representative, non-redundant subset of proteins. For a few of these protein clusters, however, the correlation disappears or even becomes negative. This might be explained by a lack of training data for specific protein pockets, which ultimately leads to poor generalization in these cases.

KDEEP is structure-based, that is, it requires labelled data in the form of the structure of the protein–ligand complex and its affinity. One way to address this limitation would be to extend the available training datasets by obtaining new affinity data or structures, either experimentally or computationally (Figure 1c). Experiments are of course a possibility, viable for pharmaceutical companies and some academic groups. Here we prefer to look at the computational options. These can be more easily automated in active learning methods and are likely to become exponentially cheaper in the future.

A synergy between MD and ML could improve the accuracy of predictive NN models while delivering predictions several orders of magnitude faster than simulations. This level of performance is needed for large prediction studies in drug design where thousands of molecules need to be evaluated, such as virtual screening. For training KDEEP, the two most popular binding affinity databases are PDBBind [70] and Binding MOAD [71]. PDBBind's latest release (v2017) screened the 124 962 structures in the PDB [72] (as of 1 January 2017). It identified 59 805 valid molecular complex structures in four main categories: protein–small ligand, nucleic acid–small ligand, protein–nucleic acid and protein–protein complexes. From this set of structures, the PDBBind authors defined the general set. It provides binding affinity data ($K_D$/$K_I$ and IC$_{50}$) for a total of 17 900 biomolecular complexes in the PDB: protein–ligand (14 761), nucleic acid–ligand (121), protein–nucleic acid (837) and protein–protein complexes (2181). The other dataset, Binding MOAD, contains binding information for 9142 structures, 6862 of which overlap with PDBBind. This makes a total of 20 065 co-crystal structures with binding data, out of the 59 805 complex structures detected in the PDB. A naive example of synergy would be to use simulations to add affinity data for the remaining 39 740 structures. This, however, would be very expensive and arguably impractical in a prediction study. Yet when generating a database for training NN models, it only needs to be performed once, possibly at very high accuracy using QM/ML-based force fields. The resulting NN would then be used for predictions. Another possible example comes from the BindingDB dataset [73]. It contains about 1 419 347 binding data points for 7000 proteins and over 635 301 drug-like molecules, but for most of the protein–ligand pairs there is no co-crystal structural
information. To fill this gap, simulations could be used to predict ligand binding poses. As a rough estimate, suppose approximately 10 μs of simulation are needed to obtain the binding pose of a protein–ligand pair using adaptive sampling methods [26]. Then 1 s of aggregate simulation time would yield 100 000 new predicted protein–ligand structures (1 s / 10 μs = 10^5) over the course of one year [27]. This augmented dataset, built at high computational and time cost, would then be used to learn fast predictive models, for example, KDEEP.

Discussion
In this article we illustrate how data generated by simulations might be used to develop new and better predictive ML models. Dataset generation does not need fast return times, so more accurate but slower simulation methods can be used, while ML provides fast predictions. One existing example of this approach is the use of QM simulations of biomolecules to generate data for learning a NN QM potential, a paradigm that could improve on classical force fields in the near future. A further possible example, built upon the experience gained with KDEEP, uses simulations as a data augmentation tool and delegates binding affinity prediction to ML-based methods.

Acknowledgements
The authors thank Acellera Ltd. for funding. G.D.F. acknowledges support from MINECO (BIO2017-82628-P) and FEDER, as well as 'Unidad de Excelencia María de Maeztu', funded by MINECO (MDM-2014-0370). The authors also acknowledge the European Union's Horizon 2020 research and innovation programme under grant agreement No 675451 (CompBioMed project).

References and recommended reading
Papers of particular interest, published within the period of review, have been highlighted as:
of special interest
of outstanding interest

1. Freddolino PL, Harrison CB, Liu Y, Schulten K: Challenges in protein-folding simulations. Nat Phys 2010, 6:751-758.
2. Beauchamp KA, Lin YS, Das R, Pande VS: Are protein force fields getting better? A systematic benchmark on 524 diverse NMR measurements. J Chem Theory Comput 2012, 8:1409-1414.
3. Lindorff-Larsen K, Maragakis P, Piana S, Eastwood MP, Dror RO, Shaw DE: Systematic validation of protein force fields against experimental data. PLoS ONE 2012, 7.
4. Piana S, Klepeis JL, Shaw DE: Assessing the accuracy of physical models used in protein-folding simulations: quantitative evidence from long molecular dynamics simulations. Curr Opin Struct Biol 2014, 24:98-105.
5. McCammon JA, Gelin BR, Karplus M: Dynamics of folded proteins. Nature 1977, 267:585-590.
6. Shaw DE, Maragakis P, Lindorff-Larsen K, Piana S, Dror RO, Eastwood MP, Bank JA, Jumper JM, Salmon JK, Shan Y, Wriggers W: Atomic-level characterization of the structural dynamics of proteins. Science 2010, 330:341-346.
7. Duan Y: Pathways to a protein folding intermediate observed in a 1-microsecond simulation in aqueous solution. Science 1998, 282:740-744.
8. Grossfield A, Pitman MC, Feller SE, Soubias O, Gawrisch K: Internal hydration increases during activation of the G-protein-coupled receptor rhodopsin. J Mol Biol 2008, 381:478-486.
9. Dror RO, Arlow DH, Borhani DW, Jensen MO, Piana S, Shaw DE: Identification of two distinct inactive conformations of the β2-adrenergic receptor reconciles structural and biochemical observations. Proc Natl Acad Sci U S A 2009, 106:4689-4694.
10. Snow CD, Zagrovic B, Pande VS: The Trp cage: folding kinetics and unfolded state topology via molecular dynamics simulations. J Am Chem Soc 2002, 124:14548-14549.
11. Noé F, Schütte C, Vanden-Eijnden E, Reich L, Weikl TR: Constructing the equilibrium ensemble of folding pathways from short off-equilibrium simulations. Proc Natl Acad Sci U S A 2009, 106:19011-19016.
12. Buch I, Giorgino T, De Fabritiis G: Complete reconstruction of an enzyme-inhibitor binding process by molecular dynamics simulations. Proc Natl Acad Sci U S A 2011, 108:10184-10189.
13. Ferruz N, Harvey MJ, Mestres J, De Fabritiis G: Insights from fragment hit binding assays by molecular simulations. J Chem Inf Model 2015, 55:2200-2205.
14. Pan AC, Xu H, Palpant T, Shaw DE: Quantitative characterization of the binding and unbinding of millimolar drug fragments with molecular dynamics simulations. J Chem Theory Comput 2017, 13:3372-3377.
15. Stanley N, Pardo L, De Fabritiis G: The pathway of ligand entry from the membrane bilayer to a lipid G protein-coupled receptor. Sci Rep 2016, 6:22639.
16. Plattner N, Doerr S, De Fabritiis G, Noé F: Complete protein–protein association kinetics in atomic detail revealed by molecular dynamics simulations and Markov modelling. Nat Chem 2017.
Plattner et al. managed to simulate barnase–barstar protein–protein association.
17. Friedrichs MS, Eastman P, Vaidyanathan V, Houston M, Legrand S, Beberg AL, Ensign DL, Bruns CM, Pande VS: Accelerating molecular dynamic simulation on graphics processing units. J Comput Chem 2009, 30:864-872.
18. Harvey MJ, De Fabritiis G: An implementation of the smooth particle mesh Ewald method on GPU hardware. J Chem Theory Comput 2009, 5:2371-2377.
19. Harvey MJ, Giupponi G, De Fabritiis G: ACEMD: accelerating biomolecular dynamics in the microsecond time scale. J Chem Theory Comput 2009, 5:1632-1639.
20. Eastman P, Swails J, Chodera JD, McGibbon RT, Zhao Y, Beauchamp KA, Wang LP, Simmonett AC, Harrigan MP, Stern CD et al.: OpenMM 7: rapid development of high performance algorithms for molecular dynamics. PLoS Comput Biol 2017, 13.
21. Shirts M, Pande VS: COMPUTING: screen savers of the world unite! Science 2000, 290:1903-1904.
22. Buch I, Harvey MJ, Giorgino T, Anderson DP, De Fabritiis G: High-throughput all-atom molecular dynamics simulations using distributed computing. J Chem Inf Model 2010, 50:397-403.
23. Shaw DE, Chao JC, Eastwood MP, Gagliardo J, Grossman JP, Ho CR, Ierardi DJ, Kolossváry I, Klepeis JL, Layman T et al.: Anton, a special-purpose machine for molecular dynamics simulation. Commun ACM 2008, 51:91.
24. Singhal N, Pande VS: Error analysis and efficient sampling in Markovian state models for molecular dynamics. J Chem Phys 2005, 123.
25. Hinrichs NS, Pande VS: Calculation of the distribution of eigenvalues and eigenvectors in Markovian state models for molecular dynamics. J Chem Phys 2007, 126.
26. Doerr S, De Fabritiis G: On-the-fly learning and sampling of ligand binding by high-throughput molecular simulations. J Chem Theory Comput 2014, 10:2064-2069.
27. Martínez-Rosell G, Giorgino T, Harvey MJ, De Fabritiis G: Drug discovery and molecular dynamics: methods, applications and perspective beyond the second timescale. Curr Top Med Chem 2017, 17:2617-2625.
28. Noé F, Nüske F: A variational approach to modeling slow processes in stochastic dynamical systems. Multiscale Model Simul 2013, 11:635-655.
29. Pérez-Hernández G, Paul F, Giorgino T, De Fabritiis G, Noé F: Identification of slow molecular order parameters for Markov model construction. J Chem Phys 2013, 139.
30. Schwantes CR, Pande VS: Improvements in Markov State Model construction reveal many non-native interactions in the folding of NTL9. J Chem Theory Comput 2013, 9:2000-2009.
31. Amadei A, Linssen ABM, Berendsen HJC: Essential dynamics of proteins. Proteins Struct Funct Bioinform 1993, 17:412-425.
32. Lange OF, Grubmüller H: Can principal components yield a dimension reduced description of protein dynamics on long time scales? J Phys Chem B 2006, 110:22842-22852.
33. David CC, Jacobs DJ: Principal component analysis: a method for determining the essential dynamics of proteins. Methods Mol Biol 2014, 1084:193-226.
34. LeCun Y, Bengio Y, Hinton G: Deep learning. Nature 2015, 521:436-444.
35. Schmidhuber J: Deep learning in neural networks: an overview. Neural Netw 2015, 61:85-117.
36. Hornik K: Approximation capabilities of multilayer feedforward networks. Neural Netw 1991, 4:251-257.
37. Andoni A, Panigrahy R, Valiant G, Zhang L: Learning polynomials with neural networks. In Proceedings of the 31st International Conference on Machine Learning, vol 330. 2014:1-9.
38. Angermueller C, Pärnamaa T, Parts L, Stegle O: Deep learning for computational biology. Mol Syst Biol 2016, 12:878.
39. Dahl GE, Jaitly N, Salakhutdinov R: Multi-task Neural Networks for QSAR Predictions. arXiv; 2014:1-21.
40. Mayr A, Klambauer G, Unterthiner T, Hochreiter S: DeepTox: toxicity prediction using deep learning. Front Environ Sci 2016, 3.
41. Kingma DP, Welling M: Auto-Encoding Variational Bayes. arXiv; 2013:1-14.
42. Gómez-Bombarelli R, Duvenaud D, Hernández-Lobato JM, Aguilera-Iparraguirre J, Hirzel TD, Adams RP, Aspuru-Guzik A: Automatic Chemical Design Using a Data-driven Continuous Representation of Molecules. arXiv; 2016:1-28.
43. Wehmeyer C, Noé F: Time-lagged Autoencoders: Deep Learning of Slow Collective Variables for Molecular Kinetics. arXiv; 2017:1-8.
44. Doerr S, Ariz-Extreme I, Harvey MJ, De Fabritiis G: Dimensionality Reduction Methods for Molecular Simulations. arXiv; 2017:1-11.
45. Hernández CX, Wayment-Steele HK, Sultan MM, Husic BE, Pande VS: Variational Encoding of Complex Dynamics. arXiv; 2017:1-12.
46. Mardt A, Pasquali L, Wu H, Noé F: VAMPnets: Deep Learning of Molecular Kinetics. arXiv; 2017:1-14.
47. Schneider E, Dai L, Topper RQ, Drechsel-Grau C, Tuckerman ME: Stochastic neural network approach for learning high-dimensional free energy surfaces. Phys Rev Lett 2017, 119.
48. Krizhevsky A, Sutskever I, Hinton GE: ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 2012, 60:84-90.
49. Jiménez J, Doerr S, Martínez-Rosell G, Rose AS, De Fabritiis G: DeepSite: protein-binding site predictor using 3D-convolutional neural networks. Bioinformatics 2017, 33:3036-3042.
DeepSite is a deep convolutional neural network trained on protein structural data, treating proteins as 3D images. The network predicts the presence of druggable pockets and outperforms the state of the art.
50. Ragoza M, Hochuli J, Idrobo E, Sunseri J, Koes DR: Protein–ligand scoring with convolutional neural networks. J Chem Inf Model 2017, 57:942-957.
51. Wallach I, Dzamba M, Heifets A: AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery. arXiv; 2015:1-11.
52. Jiménez J, Škalič M, Martínez-Rosell G, De Fabritiis G: KDEEP: protein–ligand absolute binding affinity prediction via 3D-convolutional neural networks. J Chem Inf Model 2018:1-26 http://dx.doi.org/10.1021/acs.jcim.7b00650. (in press).
KDEEP is a deep convolutional neural network trained on the PDBBind dataset, treating proteins as 3D images, to predict protein–ligand binding affinity.
53. Torng W, Altman RB: 3D deep convolutional neural networks for amino acid environment similarity analysis. BMC Bioinform 2017, 18.
54. DeepChem: DeepChem, a Python library democratizing deep learning for science. http://www.deepchem.io (accessed 21.09.17).
DeepChem is a Python library that aims to provide a high-quality open-source tool for deep learning applied to computational chemistry. It makes deep neural networks more accessible for drug discovery, materials science, quantum chemistry and biology.
55. Wu Z, Ramsundar B, Feinberg EN, Gomes J, Geniesse C, Pappu AS, Leswing K, Pande V: MoleculeNet: A Benchmark for Molecular Machine Learning. arXiv; 2017:1-39.
56. Sadiq SK, De Fabritiis G: Explicit solvent dynamics and energetics of HIV-1 protease flap opening and closing. Proteins Struct Funct Bioinform 2010, 78:2873-2885.
57. Sadiq SK, Noé F, De Fabritiis G: Kinetic characterization of the critical step in HIV-1 protease maturation. Proc Natl Acad Sci U S A 2012, 109:20449-20454.
58. Stanley N, Esteban-Martín S, De Fabritiis G: Kinetic modulation of a disordered protein domain by phosphorylation. Nat Commun 2014, 5.
59. Carleo G, Troyer M: Solving the quantum many-body problem with artificial neural networks. Science 2017, 355:602-606.
60. Behler J, Parrinello M: Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys Rev Lett 2007, 98.
One of the first contributions on learning the potential energy surface of molecules using neural networks.
61. Behler J: Atom-centered symmetry functions for constructing high-dimensional neural network potentials. J Chem Phys 2011, 134:074106.
62. Behler J: Constructing high-dimensional neural network potentials: a tutorial review. Int J Quantum Chem 2015, 115:1032-1050.
63. Gastegger M, Behler J, Marquetand P: Machine learning molecular dynamics for the simulation of infrared spectra. Chem Sci 2017, 8:6924-6935.
64. Boomsma W, Frellsen J: Spherical convolutions and their application in molecular modelling. Neural Inf Process Syst (NIPS) 2017.
65. Smith JS, Isayev O, Roitberg AE: ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem Sci 2017, 8:3192-3203.
In this paper the authors present ANI-1, a neural network trained on QM simulation data to generate the potential energy surface and forces for small molecules.
66. Yao K, Herr JE, Toth DW, Mcintyre R, Parkhill J: The TensorMol-0.1 Model Chemistry: A Neural Network Augmented with Long-Range Physics. arXiv; 2017:1-8.
TensorMol is a neural network potential trained on quantum mechanics simulations that can generate the potential energy surface and forces of small molecules.
67. Zhang L, Han J, Wang H, Car R, E W: Deep Potential Molecular Dynamics: A Scalable Model with the Accuracy of Quantum Mechanics. arXiv; 2017:1-22.
68. Faber FA, Hutchison L, Huang B, Gilmer J, Schoenholz SS, Dahl GE, Vinyals O, Kearnes S, Riley PF, Von Lilienfeld OA: Prediction errors of molecular machine learning models lower than hybrid DFT error. J Chem Theory Comput 2017, 13:5255-5264.
69. Chmiela S, Tkatchenko A, Sauceda HE, Poltavsky I, Schütt KT, Müller K-R: Machine learning of accurate energy-conserving molecular force fields. Sci Adv 2017, 3.
70. Liu Z, Su M, Han L, Liu J, Yang Q, Li Y, Wang R: Forging the basis for developing protein–ligand interaction scoring functions. Acc Chem Res 2017, 50:302-309.
71. Ahmed A, Smith RD, Clark JJ, Dunbar JB Jr, Carlson HA: Recent improvements to Binding MOAD: a resource for protein–ligand binding affinities and structures. Nucleic Acids Res 2015, 43:D465-D469.
72. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28:235-242.
73. Gilson MK, Liu T, Baitaluk M, Nicola G, Hwang L, Chong J: BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic Acids Res 2016, 44:D1045-D1053.