0% found this document useful (0 votes)

42 views11 pages

2019 - Greydanus Hamiltonian Neural Networks Paper

The document discusses Hamiltonian neural networks, which are neural networks inspired by Hamiltonian mechanics. Hamiltonian mechanics expresses physical systems using quantities like position and momentum that are related by a Hamiltonian function. The key idea is to parameterize the Hamiltonian function with a neural network and learn it directly from data, allowing the network to learn and respect exact conservation laws. The document evaluates Hamiltonian neural networks on problems involving conservation of energy, finding they train faster and generalize better than regular neural networks.

Uploaded by

Roberth Asencios Huamán

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

42 views11 pages

2019 - Greydanus Hamiltonian Neural Networks Paper

Uploaded by

Roberth Asencios Huamán

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 11

Hamiltonian Neural Networks

Sam Greydanus Misko Dzamba Jason Yosinski

Google Brain PetCube Uber AI Labs
[email protected] [email protected] [email protected]

Abstract
Even though neural networks enjoy widespread use, they still struggle to learn the
basic laws of physics. How might we endow them with better inductive biases? In
this paper, we draw inspiration from Hamiltonian mechanics to train models that
learn and respect exact conservation laws in an unsupervised manner. We evaluate
our models on problems where conservation of energy is important, including the
two-body problem and pixel observations of a pendulum. Our model trains faster
and generalizes better than a regular neural network. An interesting side effect is
that our model is perfectly reversible in time.

Ideal mass-spring system Baseline NN Prediction

Noisy observations Hamiltonian NN Prediction

Figure 1: Learning the Hamiltonian of a mass-spring system. The variables q and p correspond to
position and momentum coordinates. As there is no friction, the baseline’s inner spiral is due to
model errors. By comparison, the Hamiltonian Neural Network learns to exactly conserve a quantity
that is analogous to total energy.

1 Introduction
Neural networks have a remarkable ability to learn and generalize from data. This lets them excel
at tasks such as image classification [21], reinforcement learning [45, 26, 37], and robotic dexterity
[1, 22]. Even though these tasks are diverse, they all share the same underlying physical laws. For
example, a notion of gravity is important for reasoning about objects in an image, training an RL agent
to walk, or directing a robot to manipulate objects. Based on this observation, researchers have become
increasingly interested in finding physics priors that transfer across tasks [43, 34, 17, 10, 6, 40].

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.
Untrained neural networks do not have physics priors; they learn approximate physics knowledge
directly from data. This generally prevents them from learning exact physical laws. Consider the
frictionless mass-spring system shown in Figure 1. Here the total energy of the system is being
conserved. More specifically, this particular system conserves a quantity proportional to q 2 + p2 ,
where q is the position and p is the momentum of the mass. The baseline neural network in Figure 1
learns an approximation of this conservation law, and yet the approximation is imperfect enough that
a forward simulation of the system drifts over time to higher or lower energy states. Can we define a
class of neural networks that will precisely conserve energy-like quantities over time?
In this paper, we draw inspiration from Hamiltonian mechanics, a branch of physics concerned with
conservation laws and invariances, to define Hamiltonian Neural Networks, or HNNs. We begin with
an equation called the Hamiltonian, which relates the state of a system to some conserved quantity
(usually energy) and lets us simulate how the system changes with time. Physicists generally use
domain-specific knowledge to find this equation, but here we try a different approach:

Instead of crafting the Hamiltonian by hand, we propose parameterizing it with a

neural network and then learning it directly from data.

Since almost all physical laws can be expressed as conservation laws, our approach is quite general
[27]. In practice, our model trains quickly and generalizes well1 . Figure 1, for example, shows the
outcome of training an HNN on the same mass-spring system. Unlike the baseline model, it learns to
conserve an energy-like quantity.

2 Theory

Predicting dynamics. The hallmark of a good physics model is its ability to predict changes in a
system over time. This is the challenge we now turn to. In particular, our goal is to learn the dynamics
of a system using a neural network. The simplest way of doing this is by predicting the next state
of a system given the current one. A variety of previous works have taken this path and produced
excellent results [41, 14, 43, 34, 17, 6]. There are, however, a few problems with this approach.
The first problem is its notion of discrete “time steps” that connect neighboring states. Since time is
actually continuous, a better approach would be to express dynamics as a set of differential equations
and then integrate them from an initial state at t0 to a final state at t1 . Equation 1 shows how this
might be done, letting S denote the time derivatives of the coordinates of the system2 . This approach
has been under-explored so far, but techniques like Neural ODEs take a step in the right direction [7].
Z t1
(q1 , p1 ) = (q0 , p0 ) + S(q, p) dt (1)
t0

The second problem with existing methods is that they tend not to learn exact conservation laws
or invariant quantities. This often causes them to drift away from the true dynamics of the system
as small errors accumulate. The HNN model that we propose ameliorates both of these problems.
To see how it does this — and to situate our work in the proper context — we first briefly review
Hamiltonian mechanics.
Hamiltonian Mechanics. William Hamilton introduced Hamiltonian mechanics in the 19th century
as a mathematical reformulation of classical mechanics. Its original purpose was to express classical
mechanics in a more unified and general manner. Over time, though, scientists have applied it to
nearly every area of physics from thermodynamics to quantum field theory [29, 32, 39].
In Hamiltonian mechanics, we begin with a set of coordinates (q, p). Usually, q = (q1 , ..., qN )
represents the positions of a set of objects whereas p = (p1 , ..., pN ) denotes their momentum. Note
how this gives us N coordinate pairs (q1 , p1 )...(qN , pN ). Taken together, they offer a complete
description of the system. Next, we define a scalar function, H(q, p) called the Hamiltonian so that
dq @H dp @H
= , = . (2)
dt @p dt @q
1
We make our code available at github.com/greydanus/hamiltonian-nn.
2
Any coordinates that describe the state of the system. Later we will use position and momentum (p, q).

2
Equation 2 tells us that moving coordinates in the direction SH = @H @p , @q gives us the time
@H

evolution of the system. We can think of S as a vector field over the inputs of H. In fact, it is a
special kind of vector field called a “symplectic gradient”. Whereas moving in the direction of the
gradient of H changes the output as quickly as possible, moving in the direction of the symplectic
gradient keeps the output exactly constant. Hamilton used this mathematical framework to relate
the position and momentum vectors (q, p) of a system to its total energy Etot = H(q, p). Then,
he found SH using Equation 2 and obtained the dynamics of the system by integrating this field
according to Equation 1. This is a powerful approach because it works for almost any system where
the total energy is conserved.
Hamiltonian mechanics, like Newtonian mechanics, can predict the motion of a mass-spring system
or a single pendulum. But its true strengths only become apparent when we tackle systems with
many degrees of freedom. Celestial mechanics, which are chaotic for more than two bodies, are a
good example. A few other examples include many-body quantum systems, fluid simulations, and
condensed matter physics [29, 32, 39, 33, 9, 12].
Hamiltonian Neural Networks. In this paper, we propose learning a parametric function for H
instead of SH . In doing so, we endow our model with the ability to learn exactly conserved quantities
from data in an unsupervised manner. During the forward pass, it consumes a set of coordinates and
outputs a single scalar “energy-like” value. Then, before computing the loss, we take an in-graph
gradient of the output with respect to the input coordinates (Figure A.1). It is with respect to this
gradient that we compute and optimize an L2 loss (Equation 3).
@H✓ @q @H✓ @p
LHN N = + + (3)
@p @t 2 @q @t 2
For a visual comparison between this approach and the baseline, refer to Figure 1 or Figure 1(b).
This training procedure allows HNNs to learn conserved quantities analogous to total energy straight
from data. Apart from conservation laws, HNNs have several other interesting and potentially useful
properties. First, they are perfectly reversible in that the mapping from (q, p) at one time to (q, p)
at another time is bijective. Second, we can manipulate the HNN-conserved quantity (analogous
to total energy) by integrating along the gradient of H, giving us an interesting counterfactual tool
(e.g. “What would happen if we added 1 Joule of energy?”). We’ll discuss these properties later in
Section 6.

3 Learning a Hamiltonian from Data

Optimizing the gradients of a neural network is a rare approach. There are a few previous works
which do this [42, 35, 28], but their scope and implementation details diverge from this work and
from one another. With this in mind, our first step was to investigate the empirical properties of
HNNs on three simple physics tasks.
Task 1: Ideal Mass-Spring. Our first task was to model the dynamics of the frictionless mass-spring
system shown in Figure 1. The system’s Hamiltonian is given in Equation 4 where k is the spring
constant and m is the mass constant. For simplicity, we set k = m = 1. Then we sampled initial
coordinates with total energies uniformly distributed between [0.2, 1]. We constructed training and
test sets of 25 trajectories each and added Gaussian noise with standard deviation 2 = 0.1 to every
data point. Each trajectory had 30 observations; each observation was a concatenation of (q, p).
1 2 p2
H= kq + (4)
2 2m
Task 2: Ideal Pendulum. Our second task was to model a frictionless pendulum. Pendulums are
nonlinear oscillators so they present a slightly more difficult problem. Writing the gravitational
constant as g and the length of the pendulum as l, the general Hamiltonian is
l 2 p2
H = 2mgl(1 cos q) + (5)
2m
Once again we set m = l = 1 for simplicity. This time, we set g = 3 and sampled initial coordinates
with total energies in the range [1.3, 2.3]. We chose these numbers in order to situate the dataset along
the system’s transition from linear to nonlinear dynamics. As with Task 1, we constructed training
and test sets of 25 trajectories each and added the same amount of noise.

3
Task 3: Real Pendulum. Our third task featured the position and momentum readings from a real
pendulum. We used data from a Science paper by Schmidt & Lipson [35] which also tackled the
problem of learning conservation laws from data. This dataset was noisier than the synthetic ones and
it did not strictly obey any conservation laws since the real pendulum had a small amount of friction.
Our goal here was to examine how HNNs fared on noisy and biased real-world data.

3.1 Methods

In all three tasks, we trained our models with a learning rate of 10 3 and used the Adam optimizer
[20]. Since the training sets were small, we set the batch size to be the total number of examples.
On each dataset we trained two fully-connected neural networks: the first was a baseline model that,
given a vector input (q, p) output the vector (@q/@t, @p/@t) directly. The second was an HNN
that estimated the same vector using the derivative of a scalar quantity as shown in Equation 2 (also
see Figure A.1). Where possible, we used analytic time derivatives as the targets. Otherwise, we
calculated finite difference approximations. All of our models had three layers, 200 hidden units, and
tanh activations. We trained them for 2000 gradient steps and evaluated them on the test set.
We logged three metrics: L2 train loss, L2 test loss, and mean squared error (MSE) between the true
and predicted total energies. To determine the energy metric, we integrated our models according
to Equation 1 starting from a random test point. Then we used MSE to measure how much a given
model’s dynamics diverged from the ground truth. Intuitively, the loss metrics measure our model’s
ability to fit individual data points while the energy metric measures its stability and conservation
of energy over long timespans. To obtain dynamics, we integrated our models with the fourth-order
Runge-Kutta integrator in scipy.integrate.solve_ivp and set the error tolerance to 10 9 [30].

3.2 Results

Predictions MSE between coordinates Total HNN-conserved quantity Total energy

Ideal mass-spring
Ideal pendulum
Real pendulum

Figure 2: Analysis of models trained on three simple physics tasks. In the first column, we observe
that the baseline model’s dynamics gradually drift away from the ground truth. The HNN retains a
high degree of accuracy, even obscuring the black baseline in the first two plots. In the second column,
the baseline’s coordinate MSE error rapidly diverges whereas the HNN’s does not. In the third
column, we plot the quantity conserved by the HNN. Notice that it closely resembles the total energy
of the system, which we plot in the fourth column. In consequence, the HNN roughly conserves total
energy whereas the baseline does not.

4
We found that HNNs train as quickly as baseline models and converge to similar final losses. Table 1
shows their relative performance over the three tasks. But even as HNNs tied with the baseline on
on loss, they dramatically outperformed it on the MSE energy metric. Figure 2 shows why this is
the case: as we integrate the two models over time, various errors accumulate in the baseline and it
eventually diverges. Meanwhile, the HNN conserves a quantity that closely resembles total energy
and diverges more slowly or not at all.
It’s worth noting that the quantity conserved by the HNN is not equivalent to the total energy; rather,
it’s something very close to the total energy. The third and fourth columns of Figure 2 provide a
useful comparison between the HNN-conserved quantity and the total energy. Looking closely at the
spacing of the y axes, one can see that the HNN-conserved quantity has the same scale as total energy,
but differs by a constant factor. Since energy is a relative quantity, this is perfectly acceptable3 .
The total energy plot for the real pendulum shows another interesting pattern. Whereas the ground
truth data does not quite conserve total energy, the HNN roughly conserves this quantity. This, in fact,
is a fundamental limitation of HNNs: they assume a conserved quantity exists and thus are unable to
account for things that violate this assumpation, such as friction. In order to account for friction, we
would need to model it separately from the HNN.

4 Modeling Larger Systems

Having established baselines on a few simple tasks, our next step was to tackle a larger system
involving more than one pair of (p, q) coordinates. One well-studied problem that fits this description
is the two-body problem, which requires four (p, q) pairs.
|pCM |2 |p1 |2 + |p2 |2 m1 m2
H= + +g (6)
m1 + m2 2µ |q1 q2 |2
Task 4: Two-body problem. In the two-body problem, point particles interact with one another via
an attractive force such as gravity. Once again, we let g be the gravitational constant and m represent
mass. Equation 6 gives the Hamiltonian of the system where µ is the reduced mass and pCM is the
momentum of the center of mass. As in previous tasks, we set m1 = m2 = g = 1 for simplicity.
Furthermore, we restricted our experiments to systems where the momentum of the center of mass
was zero. Even so, with eight degrees of freedom (given by the x and y position and momentum
coordinates of the two bodies) this system represented an interesting challenge.

4.1 Methods

Our first step was to generate a dataset of 1000 near-circular, two-body trajectories. We initialized
every trajectory with center of mass zero, total momentum zero, and radius r = kq2 q1 k in the
range [0.5, 1.5]. In order to control the level of numerical stability, we chose initial velocities that
gave perfectly circular orbits and then added Gaussian noise to them. We found that scaling this noise
by a factor of 2 = 0.05 produced trajectories with a good balance between stability and diversity.
We used fourth-order Runge-Kutta integration to find 200 trajectories of 50 observations each and
then performed an 80/20% train/test set split over trajectories. Our models and training procedure
were identical to those described in Section 3 except this time we trained for 10,000 gradient steps
and used a batch size of 200.

4.2 Results

The HNN model scaled well to this system. The first row of Figure 3 suggests that it learned to
conserve a quantity nearly equal to the total energy of the system whereas the baseline model did not.
The second row of Figure 3 gives a qualitative comparison of trajectories. After one orbit, the
baseline dynamics have completely diverged from the ground truth whereas the HNN dynamics have
only accumulated a small amount of error. As we continue to integrate up to t = 50 and beyond
(Figure B.1), both models diverge but the HNN does so at a much slower rate. Even as the HNN
3
To see why energy is relative, imagine a cat that is at an elevation of 0 m in one reference frame and 1 m in
another. Its potential energy (and total energy) will differ by a constant factor depending on frame of reference.

5
diverges from the ground truth orbit, its total energy remains stable rather than decaying to zero or
spiraling to infinity. We report quantitative results for this task in Table 1. Both train and test losses
of the HNN model were about an order of magnitude lower than those of the baseline. The HNN did
a better job of conserving total energy, with an energy MSE that was several orders of magnitude
below the baseline.
Having achieved success on the two-body Ground truth Hamiltonian NN Baseline NN
problem, we ran the same set of experi-
ments on the chaotic three-body problem.
We show preliminary results in Appendix

Energy
B where once again the HNN outperforms
its baseline by a considerable margin. We
opted to focus on the two-body results
here because the three-body results still Time

need improvement.

Trajectory
y

5 Learning
a Hamiltonian from Pixels
x

One of the key strengths of neural net-

works is that they can learn abstract repre- Figure 3: Analysis of an example 2-body trajectory. The
sentations directly from high-dimensional dynamics of the baseline model do not conserve total
data such as pixels or words. Having energy and quickly diverge from ground truth. The HNN,
trained HNN models on position and mo- meanwhile, approximately conserves total energy and
mentum coordinates, we were eager to see accrues a small amount of error after one full orbit.
whether we could train them on arbitrary
coordinates like the latent vectors of an autoencoder.
Task 5: Pixel Pendulum. With this in mind, we constructed a dataset of pixel observations of a
pendulum and then combined an autoencoder with an HNN to model its dynamics. To our knowledge
this is the first instance of a Hamiltonian learned directly from pixel data.

5.1 Methods

In recent years, OpenAI Gym has been widely adopted by the machine learning community as a
means for training and evaluating reinforcement learning agents [5]. Some works have even trained
world models on these environments [15, 16]. Seeing these efforts as related and complimentary to
our work, we used OpenAI Gym’s Pendulum-v0 environment in this experiment.
First, we generated 200 trajectories of 100 frames each4 . We required that the maximum absolute
displacement of the pendulum arm be ⇡6 radians. Starting from 400 x 400 x 3 RGB pixel observations,
we cropped, desaturated, and downsampled them to 28 x 28 x 1 frames and concatenated each frame
with its successor so that the input to our model was a tensor of shape batch x 28 x 28 x 2. We
used two frames so that velocity would be observable from the input. Without the ability to observe
velocity, an autoencoder without recurrence would be unable to ascertain the system’s full state space.
In designing the autoencoder portion of the model, our main objective was simplicity and trainability.
We chose to use fully-connected layers in lieu of convolutional layers because they are simpler.
Furthermore, convolutional layers sometimes struggle to extract even simple position information
[23]. Both the encoder and decoder were composed of four fully-connected layers with relu
activations and residual connections. We used 200 hidden units on all layers except the latent vector
z, where we used two units. As for the HNN component of this model, we used the same architecture
and parameters as described in Section 3. Unless otherwise specified, we used the same training
procedure as described in Section 4.1. We found that using a small amount of weight decay, 10 5 in
this case, was beneficial.
Losses. The most notable difference between this experiment and the others was the loss function.
This loss function was composed of three terms: the first being the HNN loss, the second being a clas-
sic autoencoder loss (L2 loss over pixels), and the third being an auxiliary loss on the autoencoder’s
4
Choosing the “no torque” action at every timestep.

6
latent space:
LCC = ztp (ztq zqt+1 ) 2
(7)
The purpose of the auxiliary loss term, given in Equation 7, was to make the second half of z,
which we’ll label zp , resemble the derivatives of the first half of z, which we’ll label zq . This loss
encouraged the latent vector (zq , zp ) to have roughly same properties as canonical coordinates (q, p).
These properties, measured by the Poisson bracket relations, are necessary for writing a Hamiltonian.
We found that the auxiliary loss did not degrade the autoencoder’s performance. Furthermore, it is
not domain-specific and can be used with any autoencoder with an even-sized latent space.

Figure 4: Predicting the dynamics of the pixel pendulum. We train an HNN and its baseline to predict
dynamics in the latent space of an autoencoder. Then we project to pixel space for visualization. The
baseline model rapidly decays to lower energy states whereas the HNN remains close to ground truth
even after hundreds of frames. It mostly obscures the ground truth line in the bottom plot.

5.2 Results

Unlike the baseline model, the HNN learned to conserve a scalar quantity analogous to the total
energy of the system. This enabled it to predict accurate dynamics for the system over much longer
timespans. Figure 4 shows a qualitative comparison of trajectories predicted by the two models. As in
previous experiments, we computed these dynamics using Equation 2 and a fourth-order Runge-Kutta
integrator. Unlike previous experiments, we performed this integration in the latent space of the
autoencoder. Then, after integration, we projected to pixel space using the decoder network. The
HNN and its baseline reached comparable train and test losses, but once again, the HNN dramatically
outperformed the baseline on the energy metric (Table 1).

Table 1: Quantitative results across all five tasks. Whereas the HNN is competitive with the baseline
on train/test loss, it dramatically outperforms the baseline on the energy metric. All values are
multiplied by 103 unless noted otherwise. See Appendix A for a note on train/test split for Task 3.
Train loss Test loss Energy
Task Baseline HNN Baseline HNN Baseline HNN
1: Ideal mass-spring 37 ± 2 37 ± 2 37 ± 2 36 ± 2 170 ± 20 .38 ± .1
2: Ideal pendulum 33 ± 2 33 ± 2 35 ± 2 36 ± 2 42 ± 10 25 ± 5
3: Real pendulum 2.7 ± .2 9.2 ± .5 2.2 ± .3 6.0 ± .6 390 ± 7 14 ± 5
4: Two body (⇥106 ) 33 ± 1 3.0 ± .1 30 ± .1 2.8 ± .1 6.3e4 ± 3e4 39 ± 5
5: Pixel pendulum 18 ± .2 19 ± .2 17 ± .3 18 ± .3 9.3 ± 1 .15 ± .01

7
6 Useful properties of HNNs
While the main purpose of HNNs is to endow neural networks with better physics priors, in this
section we ask what other useful properties these models might have.
Adding and removing energy. So far, we have seen that
integrating the symplectic gradient of the Hamiltonian can
give us the time evolution of a system but we have not
tried following the Riemann gradient RH = @H @q , @p .
@H

Intuitively, this corresponds to adding or removing some

of the HNN-conserved quantity from the system. It’s
especially interesting to alternate between integrating RH
and SH . Figure 5 shows how we can take advantage of
this effect to “bump” the pendulum to a higher energy
level. We could imagine using this technique to answer
counterfactual questions e.g. “What would have happened
if we applied a torque?”
Perfect reversibility. As neural networks have grown in Figure 5: Visualizing integration in
size, the memory consumption of transient activations, the the latent space of the Pixel Pendulum
intermediate activations saved for backpropagation, has model. We alternately integrate SH at
become a notable bottleneck. Several works propose semi- low energy (blue circle), RH (purple
reversible models that construct one layer’s activations line), and then SH at higher energy (red
from the activations of the next [13, 25, 19]. Neural ODEs circle).
also have this property [7]. Many of these models are only approximately reversible: their mappings
are not quite bijective. Unlike those methods, our approach is guaranteed to produce trajectories that
are perfectly reversible through time. We can simply refer to a result from Hamiltonian mechanics
called Liouville’s Theorem: the density of particles in phase space is constant. What this implies is
that any mapping (q0 , p0 ) ! (q1 , p1 ) is bijective/invertible.

7 Related work
Learning physical laws from data. Schmidt & Lipson [35] used a genetic algorithm to search a
space of mathematical functions for conservation laws and recovered the Lagrangians and Hamiltoni-
ans of several real systems. We were inspired by their approach, but used a neural neural network to
avoid constraining our search to a set of hand-picked functions. Two recent works are similar to this
paper in that the authors sought to uncover physical laws from data using neural networks [18, 4].
Unlike our work, they did not explicitly parameterize Hamiltonians.
Physics priors for neural networks. A wealth of previous works have sought to furnish neural
networks with better physics priors. Many of these works are domain-specific: the authors used
domain knowledge about molecular dynamics [31, 38, 8, 28], quantum mechanics [36], or robotics
[24] to help their models train faster or generalize. Others, such as Interaction Networks or Relational
Networks were meant to be fully general [43, 34, 2]. Here, we also aimed to keep our approach fully
general while introducing a strong and theoretically-motivated prior.
Modeling energy surfaces. Physicists, particularly those studying molecular dynamics, have seen
success using neural networks to model energy surfaces [3, 11, 36, 44]. In particular, several works
have shown dramatic computation speedups compared to density functional theory [31, 38, 8].
Molecular dynamics researchers integrate the derivatives of energy in order to obtain dynamics,
just as we did in this work. A key difference between these approaches and our own is that 1) we
emphasize the Hamiltonian formalism 2) we optimize the gradients of our model (though some works
do optimize the gradients of a molecular dynamics model [42, 28]).

8 Discussion
Whereas Hamiltonian mechanics is an old and well-established theory, the science of deep learning is
still in its infancy. Whereas Hamiltonian mechanics describes the real world from first principles,
deep learning does so starting from data. We believe that Hamiltonian Neural Networks, and models
like them, represent a promising way of bringing together the strengths of both approaches.

8
9 Acknowledgements
Sam Greydanus would like to thank the Google AI Residency Program for providing extraordinary
mentorship and resources. The authors would like to thank Nic Ford, Trevor Gale, Rapha Gontijo
Lopes, Keren Gu, Ben Caine, Mark Woodward, Stephan Hoyer, Jascha Sohl-Dickstein, and many
others for insightful conversations and support.
Special thanks to James and Judy Greydanus for their feedback and support from beginning to end.

References
[1] Andrychowicz, M., Baker, B., Chociej, M., Jozefowicz, R., McGrew, B., Pachocki, J., Petron,
A., Plappert, M., Powell, G., Ray, A., et al. Learning dexterous in-hand manipulation. arXiv
preprint arXiv:1808.00177, 2018.
[2] Battaglia, P., Pascanu, R., Lai, M., Rezende, D. J., et al. Interaction networks for learning
about objects, relations and physics. In Advances in neural information processing systems, pp.
4502–4510, 2016.
[3] Behler, J. Neural network potential-energy surfaces in chemistry: a tool for large-scale simula-
tions. Physical Chemistry Chemical Physics, 13(40):17930–17955, 2011.
[4] Bondesan, R. and Lamacraft, A. Learning symmetries of classical integrable systems. arXiv
preprint arXiv:1906.04645, 2019.
[5] Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., and Zaremba,
W. Openai gym. arXiv preprint arXiv:1606.01540, 2016.
[6] Chang, M. B., Ullman, T., Torralba, A., and Tenenbaum, J. B. A compositional object-based
approach to learning physical dynamics. arXiv preprint arXiv:1612.00341, 2016.
[7] Chen, T. Q., Rubanova, Y., Bettencourt, J., and Duvenaud, D. K. Neural ordinary
differential equations. pp. 6571–6583, 2018. URL http://papers.nips.cc/paper/
7892-neural-ordinary-differential-equations.pdf.
[8] Chmiela, S., Tkatchenko, A., Sauceda, H. E., Poltavsky, I., Schütt, K. T., and Müller, K.-R.
Machine learning of accurate energy-conserving molecular force fields. Science advances, 3(5):
e1603015, 2017.
[9] Cohen-Tannoudji, C., Dupont-Roc, J., and Grynberg, G. Photons and atoms-introduction to
quantum electrodynamics. Photons and Atoms-Introduction to Quantum Electrodynamics, by
Claude Cohen-Tannoudji, Jacques Dupont-Roc, Gilbert Grynberg, pp. 486. ISBN 0-471-18433-
0. Wiley-VCH, February 1997., pp. 486, 1997.
[10] de Avila Belbute-Peres, F., Smith, K., Allen, K., Tenenbaum, J., and Kolter, J. Z. End-to-end
differentiable physics for learning and control. In Advances in Neural Information Processing
Systems, pp. 7178–7189, 2018.
[11] Gastegger, M. and Marquetand, P. High-dimensional neural network potentials for organic
reactions and an improved training algorithm. Journal of chemical theory and computation, 11
(5):2187–2198, 2015.
[12] Girvin, S. M. and Yang, K. Modern condensed matter physics. Cambridge University Press,
2019.
[13] Gomez, A. N., Ren, M., Urtasun, R., and Grosse, R. B. The reversible residual network:
Backpropagation without storing activations. In Advances in neural information processing
systems, pp. 2214–2224, 2017.
[14] Grzeszczuk, R. NeuroAnimator: fast neural network emulation and control of physics-based
models. University of Toronto.
[15] Ha, D. and Schmidhuber, J. Recurrent world models facilitate policy evolution. In Advances in
Neural Information Processing Systems, pp. 2450–2462, 2018.

9
[16] Hafner, D., Lillicrap, T., Fischer, I., Villegas, R., Ha, D., Lee, H., and Davidson, J. Learning
latent dynamics for planning from pixels. arXiv preprint arXiv:1811.04551, 2018.
[17] Hamrick, J. B., Allen, K. R., Bapst, V., Zhu, T., McKee, K. R., Tenenbaum, J. B., and Battaglia,
P. W. Relational inductive bias for physical construction in humans and machines. arXiv
preprint arXiv:1806.01203, 2018.
[18] Iten, R., Metger, T., Wilming, H., Del Rio, L., and Renner, R. Discovering physical concepts
with neural networks. arXiv preprint arXiv:1807.10300, 2018.
[19] Jacobsen, J.-H., Smeulders, A., and Oyallon, E. i-revnet: Deep invertible networks. arXiv
preprint arXiv:1802.07088, 2018.
[20] Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. International Conference
on Learning Representations, 2014.
[21] Krizhevsky, A., Sutskever, I., and Hinton, G. Imagenet classification with deep convolutional
neural networks. In Advances in Neural Information Processing Systems 25, pp. 1106–1114,
2012.
[22] Levine, S., Pastor, P., Krizhevsky, A., Ibarz, J., and Quillen, D. Learning hand-eye coordination
for robotic grasping with deep learning and large-scale data collection. The International
Journal of Robotics Research, 37(4-5):421–436, 2018.
[23] Liu, R., Lehman, J., Molino, P., Such, F. P., Frank, E., Sergeev, A., and Yosinski, J. An intriguing
failing of convolutional neural networks and the coordconv solution. In Advances in Neural
Information Processing Systems, pp. 9605–9616, 2018.
[24] Lutter, M., Ritter, C., and Peters, J. Deep lagrangian networks: Using physics as model prior
for deep learning. International Conference on Learning Representations, 2019.
[25] MacKay, M., Vicol, P., Ba, J., and Grosse, R. B. Reversible recurrent neural networks. In
Advances in Neural Information Processing Systems, pp. 9029–9040, 2018.
[26] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller,
M. Playing Atari with Deep Reinforcement Learning. ArXiv e-prints, December 2013.
[27] Noether, E. Invariant variation problems. Transport Theory and Statistical Physics, 1(3):
186–207, 1971.
[28] Pukrittayakamee, A., Malshe, M., Hagan, M., Raff, L., Narulkar, R., Bukkapatnum, S., and
Komanduri, R. Simultaneous fitting of a potential-energy surface and its corresponding force
fields using feedforward neural networks. The Journal of chemical physics, 130(13):134101,
2009.
[29] Reichl, L. E. A modern course in statistical physics. AAPT, 1999.
[30] Runge, C. Über die numerische auflösung von differentialgleichungen. Mathematische Annalen,
46(2):167–178, 1895.
[31] Rupp, M., Tkatchenko, A., Müller, K.-R., and Von Lilienfeld, O. A. Fast and accurate modeling
of molecular atomization energies with machine learning. Physical review letters, 108(5):
058301, 2012.
[32] Sakurai, J. J. and Commins, E. D. Modern quantum mechanics, revised edition. AAPT, 1995.
[33] Salmon, R. Hamiltonian fluid mechanics. Annual review of fluid mechanics, 20(1):225–256,
1988.
[34] Santoro, A., Raposo, D., Barrett, D. G., Malinowski, M., Pascanu, R., Battaglia, P., and Lillicrap,
T. A simple neural network module for relational reasoning. In Advances in neural information
processing systems, pp. 4967–4976, 2017.
[35] Schmidt, M. and Lipson, H. Distilling free-form natural laws from experimental data. Science,
324(5923):81–85, 2009.

10
[36] Schütt, K. T., Arbabzadah, F., Chmiela, S., Müller, K. R., and Tkatchenko, A. Quantum-
chemical insights from deep tensor neural networks. Nature communications, 8:13890, 2017.
[37] Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T.,
Baker, L., Lai, M., Bolton, A., et al. Mastering the game of go without human knowledge.
Nature, 550(7676):354, 2017.
[38] Smith, J. S., Isayev, O., and Roitberg, A. E. Ani-1: an extensible neural network potential with
dft accuracy at force field computational cost. Chemical science, 8(4):3192–3203, 2017.
[39] Taylor, J. R. Classical mechanics. University Science Books, 2005.
[40] Tenenbaum, J. B., De Silva, V., and Langford, J. C. A global geometric framework for nonlinear
dimensionality reduction. science, 290(5500):2319–2323, 2000.
[41] Tompson, J., Schlachter, K., Sprechmann, P., and Perlin, K. Accelerating eulerian fluid
simulation with convolutional networks. In Proceedings of the 34th International Conference
on Machine Learning-Volume 70, pp. 3424–3433. JMLR. org, 2017.
[42] Wang, J., Olsson, S., Wehmeyer, C., Perez, A., Charron, N. E., de Fabritiis, G., Noe, F., and
Clementi, C. Machine learning of coarse-grained molecular dynamics force fields. ACS Central
Science, 2018.
[43] Watters, N., Zoran, D., Weber, T., Battaglia, P., Pascanu, R., and Tacchetti, A. Visual interaction
networks: Learning a physics simulator from video. In Advances in neural information
processing systems, pp. 4539–4547, 2017.
[44] Yao, K., Herr, J. E., Toth, D. W., Mckintyre, R., and Parkhill, J. The tensormol-0.1 model
chemistry: A neural network augmented with long-range physics. Chemical science, 9(8):
2261–2269, 2018.
[45] Yosinski, J., Clune, J., Hidalgo, D., Nguyen, S., Zagal, J. C., and Lipson, H. Evolving robot gaits
in hardware: the hyperneat generative encoding vs. parameter optimization. In Proceedings of
the 20th European Conference on Artificial Life, pp. 890–897, August 2011.

Artificial Intelligence and Deep Learning Techniques For Inverse Kinematics and Dynamics of Serial Robotic Manipulators: A Survey
No ratings yet
Artificial Intelligence and Deep Learning Techniques For Inverse Kinematics and Dynamics of Serial Robotic Manipulators: A Survey
24 pages
Deep Physical Neural Networks Enabled by A Backpro
No ratings yet
Deep Physical Neural Networks Enabled by A Backpro
60 pages
Statistical Physics Analysis of Graph Neural Networks: Approaching Optimality in The Contextual Stochastic Block Model
No ratings yet
Statistical Physics Analysis of Graph Neural Networks: Approaching Optimality in The Contextual Stochastic Block Model
36 pages
20 StatMechDeep
No ratings yet
20 StatMechDeep
30 pages
Machine-Learning Non-Conservative Dynamics For New-Physics Detection
No ratings yet
Machine-Learning Non-Conservative Dynamics For New-Physics Detection
17 pages
Lagrangian Neural Networks
No ratings yet
Lagrangian Neural Networks
9 pages
Artificial Neural Network Thesis Topics
100% (2)
Artificial Neural Network Thesis Topics
4 pages
Hamiltonian Generative Networks
No ratings yet
Hamiltonian Generative Networks
19 pages
Shaan Desai Pinns
No ratings yet
Shaan Desai Pinns
153 pages
ORT Metriplectic Neural Networks Thermodynamics Informed Machine Learning of Complex Physical Systems
No ratings yet
ORT Metriplectic Neural Networks Thermodynamics Informed Machine Learning of Complex Physical Systems
9 pages
Learning To Simulate Complex Physics With Graph Networks
No ratings yet
Learning To Simulate Complex Physics With Graph Networks
23 pages
Operationally Meaningful Representations of Physical Systems in Neural Networks
No ratings yet
Operationally Meaningful Representations of Physical Systems in Neural Networks
24 pages
DL Report
No ratings yet
DL Report
6 pages
Scirobotics
No ratings yet
Scirobotics
10 pages
Feasibility Study On The Use of Dynamic Neural Networks (DNN'S) For Modeling A Variable Displacement Load Sensing Pump
No ratings yet
Feasibility Study On The Use of Dynamic Neural Networks (DNN'S) For Modeling A Variable Displacement Load Sensing Pump
8 pages
NeurIPS 2021 Redesigning The Transformer Architecture With Insights From Multi Particle Dynamical Systems Paper
No ratings yet
NeurIPS 2021 Redesigning The Transformer Architecture With Insights From Multi Particle Dynamical Systems Paper
14 pages
Machine Learning Topological Invariants With Neural Networks
No ratings yet
Machine Learning Topological Invariants With Neural Networks
6 pages
7 Differentiable Physics Simulat
No ratings yet
7 Differentiable Physics Simulat
5 pages
A Machine Learning Approach To Simulate Exible Body Dynamics
No ratings yet
A Machine Learning Approach To Simulate Exible Body Dynamics
27 pages
Learning Poisson Systems and Trajectories of Autonomous Systems Via Poisson Neural Networks
No ratings yet
Learning Poisson Systems and Trajectories of Autonomous Systems Via Poisson Neural Networks
12 pages
Physics Informed Neural Network Theory and Applications
No ratings yet
Physics Informed Neural Network Theory and Applications
44 pages
Linearly Constrained Neural Networks
No ratings yet
Linearly Constrained Neural Networks
31 pages
2002 09405 PDF
No ratings yet
2002 09405 PDF
19 pages
Learning To Simulate Complex Physics With Graph Networks
No ratings yet
Learning To Simulate Complex Physics With Graph Networks
10 pages
A Feedforward Neural Network Framework For Approximating The Solutions To Nonlinear Ordinary Differential Equations
No ratings yet
A Feedforward Neural Network Framework For Approximating The Solutions To Nonlinear Ordinary Differential Equations
13 pages
Hamiltonian 3
No ratings yet
Hamiltonian 3
16 pages
3 - Thermodynamic Neural Network
No ratings yet
3 - Thermodynamic Neural Network
24 pages
Discovering Physical Concepts With Neural Networks
No ratings yet
Discovering Physical Concepts With Neural Networks
21 pages
Hamiltonian 2
No ratings yet
Hamiltonian 2
10 pages
UC Merced Previously Published Works
No ratings yet
UC Merced Previously Published Works
40 pages
Statistical Mechanics of Deep Learning
No ratings yet
Statistical Mechanics of Deep Learning
30 pages
Symplectic Ode-Net - Learning Hamiltonian Dynamics With Control
No ratings yet
Symplectic Ode-Net - Learning Hamiltonian Dynamics With Control
17 pages
FRJ Paper
No ratings yet
FRJ Paper
9 pages
Deep Physical Neural Networks Trained With Backpropagation. Nature 2022, P L Mcmohan
No ratings yet
Deep Physical Neural Networks Trained With Backpropagation. Nature 2022, P L Mcmohan
11 pages
Neural Networks
No ratings yet
Neural Networks
23 pages
Neural Manifold Operators For Learning The Evolution of Physical
No ratings yet
Neural Manifold Operators For Learning The Evolution of Physical
11 pages
Symplectically Integrated Neural Networks
No ratings yet
Symplectically Integrated Neural Networks
12 pages
Improving Physics-Informed Neural Networks With Meta-Learned Optimization
No ratings yet
Improving Physics-Informed Neural Networks With Meta-Learned Optimization
26 pages
OINN Configurations For Dynamic Systems
No ratings yet
OINN Configurations For Dynamic Systems
15 pages
2022 Predicting Parametric Spatiotemporal Dynamics by Multi-Resolution PDE Structure-Preserved Deep Learning
No ratings yet
2022 Predicting Parametric Spatiotemporal Dynamics by Multi-Resolution PDE Structure-Preserved Deep Learning
51 pages
GNN 954759
No ratings yet
GNN 954759
6 pages
Why Are Graph Neural Networks Effective For EDA Problems
No ratings yet
Why Are Graph Neural Networks Effective For EDA Problems
8 pages
Solving Schrodinger Equations Using Physically Constrained Neural Network
No ratings yet
Solving Schrodinger Equations Using Physically Constrained Neural Network
7 pages
Accepted Manuscript: Journal of Computational Physics
No ratings yet
Accepted Manuscript: Journal of Computational Physics
47 pages
Deep Lagrangian Networks
No ratings yet
Deep Lagrangian Networks
17 pages
MFAIMFG Technical Report 2
No ratings yet
MFAIMFG Technical Report 2
5 pages
A Review of Physics-Informed Machine Learning in F
No ratings yet
A Review of Physics-Informed Machine Learning in F
21 pages
Inverted Pendulum
No ratings yet
Inverted Pendulum
13 pages
Introduction Neural
No ratings yet
Introduction Neural
13 pages
Yagawa-Oishi2021 Book ComputationalMechanicsWithNeur
No ratings yet
Yagawa-Oishi2021 Book ComputationalMechanicsWithNeur
233 pages
PINN Gentle Introduction
No ratings yet
PINN Gentle Introduction
26 pages
Physics-Enhanced Neural Networks in The Small Data Regime: Equal Contribution
No ratings yet
Physics-Enhanced Neural Networks in The Small Data Regime: Equal Contribution
6 pages
Physics-Informed Neural Networks
No ratings yet
Physics-Informed Neural Networks
22 pages
A Survey of Randomized Algorithms For Training Neural Networks
No ratings yet
A Survey of Randomized Algorithms For Training Neural Networks
10 pages
Information Sciences: Le Zhang, P.N. Suganthan
No ratings yet
Information Sciences: Le Zhang, P.N. Suganthan
3 pages
Journal of Computational Physics: M. Raissi, P. Perdikaris, G.E. Karniadakis
No ratings yet
Journal of Computational Physics: M. Raissi, P. Perdikaris, G.E. Karniadakis
22 pages
Tunnel Lining Analysis and Design Using Staad Pro
No ratings yet
Tunnel Lining Analysis and Design Using Staad Pro
4 pages
Demystifying Deep Convolutional Neural Networks - Adam Harley (2014) CNN PDF
No ratings yet
Demystifying Deep Convolutional Neural Networks - Adam Harley (2014) CNN PDF
27 pages
Physics-Guided Physics-Informed and Physics-Encode
No ratings yet
Physics-Guided Physics-Informed and Physics-Encode
37 pages
Sum of An Arithmetic Sequence
100% (1)
Sum of An Arithmetic Sequence
2 pages
Chapter 1 Managers and You in The Workplace
100% (1)
Chapter 1 Managers and You in The Workplace
75 pages
Maneb Jce Mathematics 2012 Past Paper1719321067
No ratings yet
Maneb Jce Mathematics 2012 Past Paper1719321067
4 pages
AAnalyst 300 Data Sheet
No ratings yet
AAnalyst 300 Data Sheet
2 pages
Civ2212 Soil Mechanics Assignment No.2 Shear Strength of Soils
No ratings yet
Civ2212 Soil Mechanics Assignment No.2 Shear Strength of Soils
2 pages
01 Slides
No ratings yet
01 Slides
109 pages
Tajneen Islam - Write-Up 3 Soils
No ratings yet
Tajneen Islam - Write-Up 3 Soils
9 pages
Text To Image Survey
No ratings yet
Text To Image Survey
40 pages
The Prodigy of Welding: By: Brandy Ratliff Graduation Project 2010
No ratings yet
The Prodigy of Welding: By: Brandy Ratliff Graduation Project 2010
16 pages
Learn About Ecosystems - Lesson Plan
No ratings yet
Learn About Ecosystems - Lesson Plan
2 pages
Indian Meterology Pilot Mantras
No ratings yet
Indian Meterology Pilot Mantras
6 pages
Website Template For MSC by Coursework - ODL MSC Process Safety
No ratings yet
Website Template For MSC by Coursework - ODL MSC Process Safety
19 pages
Myp1 Teacher Layout
No ratings yet
Myp1 Teacher Layout
5 pages
Module#2 Assignment - Mendoza, Oliver R
No ratings yet
Module#2 Assignment - Mendoza, Oliver R
17 pages
Unit - 2 TC
No ratings yet
Unit - 2 TC
23 pages
Unit2.5 Compoundsand Solutions
No ratings yet
Unit2.5 Compoundsand Solutions
17 pages
WORK
No ratings yet
WORK
17 pages
Fundamental Principles of Counting - 073819
No ratings yet
Fundamental Principles of Counting - 073819
6 pages
Characterization of Microbial and Prebiotic of Bread
No ratings yet
Characterization of Microbial and Prebiotic of Bread
33 pages
10TH B Test Series 2024-2025 1ST Round Front Page
No ratings yet
10TH B Test Series 2024-2025 1ST Round Front Page
2 pages
Review-Midterm-1-Eng 6
No ratings yet
Review-Midterm-1-Eng 6
6 pages
Daforest Techniques Grid Teachit 112456
No ratings yet
Daforest Techniques Grid Teachit 112456
2 pages
Computatıonal Fluıd Dynamıcs Based Desıgn and Investıgatıon of Nose Cone Aerodynamıcs of Formula Style Student Desıgned Racecar
No ratings yet
Computatıonal Fluıd Dynamıcs Based Desıgn and Investıgatıon of Nose Cone Aerodynamıcs of Formula Style Student Desıgned Racecar
7 pages
Memento CRT
No ratings yet
Memento CRT
4 pages
SS Specimen Papers (2267, 227X) - With Marking Points
No ratings yet
SS Specimen Papers (2267, 227X) - With Marking Points
8 pages
Resume Math
No ratings yet
Resume Math
4 pages
5 Explain How A Series of Chapters, Scenes, or Stanzas Fits Together To Provide The Overall Structure of A Particular Story, Drama, or Poem
No ratings yet
5 Explain How A Series of Chapters, Scenes, or Stanzas Fits Together To Provide The Overall Structure of A Particular Story, Drama, or Poem
3 pages
Ministry of Resin Exposure Times - Durable Grey
No ratings yet
Ministry of Resin Exposure Times - Durable Grey
1 page
List of Classified HKAL Chemistry Exam Questions: A S M O
No ratings yet
List of Classified HKAL Chemistry Exam Questions: A S M O
2 pages
Analytical Methods of Optimization
From Everand
Analytical Methods of Optimization
D. F. Lawden
No ratings yet
Fundamentals of physics
From Everand
Fundamentals of physics
Alessio Mangoni
2/5 (1)

2019 - Greydanus Hamiltonian Neural Networks Paper

Uploaded by

2019 - Greydanus Hamiltonian Neural Networks Paper

Uploaded by

Hamiltonian Neural Networks

Sam Greydanus Misko Dzamba Jason Yosinski

Ideal mass-spring system Baseline NN Prediction

Noisy observations Hamiltonian NN Prediction

Instead of crafting the Hamiltonian by hand, we propose parameterizing it with a

3 Learning a Hamiltonian from Data

Predictions MSE between coordinates Total HNN-conserved quantity Total energy

4 Modeling Larger Systems

One of the key strengths of neural net-

Intuitively, this corresponds to adding or removing some

You might also like