
Published as a Conference Paper at ICLR 2020

SYMPLECTIC ODE-NET: LEARNING HAMILTONIAN DYNAMICS WITH CONTROL

Yaofeng Desmond Zhong†, Biswadip Dey‡, and Amit Chakraborty‡
†Princeton University, ‡Siemens Corporation, Corporate Technology
[email protected], (biswadip.dey, amit.chakraborty)@siemens.com

arXiv:1909.12077v4 [cs.LG] 30 Apr 2020

ABSTRACT
In this paper, we introduce Symplectic¹ ODE-Net (SymODEN), a deep learning framework which
can infer the dynamics of a physical system, given by an ordinary differential equation (ODE), from
observed state trajectories. To achieve better generalization with fewer training samples, SymODEN
incorporates appropriate inductive bias by designing the associated computation graph in a physics-
informed manner. In particular, we enforce Hamiltonian dynamics with control to learn the un-
derlying dynamics in a transparent way, which can then be leveraged to draw insight about rele-
vant physical aspects of the system, such as mass and potential energy. In addition, we propose a
parametrization which can enforce this Hamiltonian formalism even when the generalized coordi-
nate data is embedded in a high-dimensional space or we can only access velocity data instead of
generalized momentum. This framework, by offering interpretable, physically-consistent models for
physical systems, opens up new possibilities for synthesizing model-based control strategies.

¹We use the word Symplectic to emphasize that the learned dynamics endows the underlying space with a symplectic structure (Arnold et al., 2001).

1 INTRODUCTION
In recent years, deep neural networks (Goodfellow et al., 2016) have become very accurate and widely used in many
application domains, such as image recognition (He et al., 2016), language comprehension (Devlin et al., 2019), and
sequential decision making (Silver et al., 2017). To learn underlying patterns from data and enable generalization
beyond the training set, the learning approach incorporates appropriate inductive bias (Haussler, 1988; Baxter, 2000)
by promoting representations which are simple in some sense. It typically manifests itself via a set of assumptions,
which in turn can guide a learning algorithm to pick one hypothesis over another. The success in predicting an outcome
for previously unseen data then depends on how well the inductive bias captures the ground reality. Inductive bias can
be introduced as the prior in a Bayesian model, or via the choice of computation graphs in a neural network.
In a variety of settings, especially in physical systems, wherein laws of physics are primarily responsible for shaping
the outcome, generalization in neural networks can be improved by leveraging underlying physics for designing the
computation graphs. Here, by leveraging a generalization of the Hamiltonian dynamics, we develop a learning frame-
work which exploits the underlying physics in the associated computation graph. Our results show that incorporation
of such physics-based inductive bias offers insight about relevant physical properties of the system, such as inertia,
potential energy, and the total conserved energy. These insights, in turn, enable more accurate prediction of future behavior
and improvement in out-of-sample behavior. Furthermore, learning a physically-consistent model of the underlying
dynamics can subsequently enable usage of model-based controllers which can provide performance guarantees for
complex, nonlinear systems. In particular, insight about kinetic and potential energy of a physical system can be lever-
aged to synthesize appropriate control strategies, such as the method of controlled Lagrangian (Bloch et al., 2001) and
interconnection & damping assignment (Ortega et al., 2002), which can reshape the closed-loop energy landscape to
achieve a broad range of control objectives (regulation, tracking, etc.).

RELATED WORK
Physics-based Priors for Learning in Dynamical Systems: The last few years have witnessed a significant interest
in incorporating physics-based priors into deep learning frameworks. Such approaches, in contrast to more rigid
parametric system identification techniques (Söderström & Stoica, 1988), use neural networks to approximate the
state-transition dynamics and therefore are more expressive. Sanchez-Gonzalez et al. (2018), by representing the
causal relationships in a physical system as a directed graph, use a recurrent graph network to infer latent space
dynamics of robotic systems. Lutter et al. (2019) and Gupta et al. (2019) leverage Lagrangian mechanics to learn
the dynamics of kinematic structures from time-series data of position, velocity, and acceleration. A more recent
(concurrent) work by Greydanus et al. (2019) uses Hamiltonian mechanics to learn the dynamics of autonomous,
energy-conserved mechanical systems from time-series data of position, momentum, and their derivatives. A key
difference between these approaches and the proposed one is that our framework does not require any information
about higher-order derivatives (e.g., acceleration) and can incorporate external control into the Hamiltonian formalism.

Neural Networks for Dynamics and Control: Inferring underlying dynamics from time-series data plays a critical
role in controlling closed-loop response of dynamical systems, such as robotic manipulators (Lillicrap et al., 2015)
and building HVAC systems (Wei et al., 2017). Although the use of neural networks towards identification and control
of dynamical systems dates back more than three decades (Narendra & Parthasarathy, 1990), recent advances
in deep neural networks have led to renewed interest in this domain. Watter et al. (2015) learn dynamics with control
from high-dimensional observations (raw image sequences) using a variational approach and synthesize an iterative
LQR controller to control physical systems by imposing a locally linear constraint. Karl et al. (2016) and Krishnan
et al. (2017) adopt a variational approach and use recurrent architectures to learn state-space models from noisy
observations. SE3-Nets (Byravan & Fox, 2017) learn SE(3) transformations of rigid bodies from point cloud data.
Ayed et al. (2019) use partial information about the system state to learn a nonlinear state-space model. However, this
body of work, while attempting to learn state-space models, does not take physics-based priors into consideration.

CONTRIBUTION
The main contribution of this work is two-fold. First, we introduce a learning framework called Symplectic ODE-
Net (SymODEN) which encodes a generalization of the Hamiltonian dynamics. This generalization, by adding an
external control term to the standard Hamiltonian dynamics, allows us to learn the system dynamics which conforms
to Hamiltonian dynamics with control. With the learned structured dynamics, we are able to synthesize controllers
to control the system to track a reference configuration. Moreover, by encoding the structure, we can achieve better
predictions with smaller network sizes. Second, we take one step forward in combining the physics-based prior and
the data-driven approach. Previous approaches (Lutter et al., 2019; Greydanus et al., 2019) require data in the form
of generalized coordinates and their derivatives up to the second order. However, a large number of physical systems
accommodate generalized coordinates which are non-Euclidean (e.g., angles), and such angle data is often obtained
in the embedded form, i.e., (cos q, sin q), instead of the coordinate q itself. The underlying reason is that an angular
coordinate lies on S¹ instead of R¹. In contrast to previous approaches which do not address this aspect, SymODEN
has been designed to work with angle data in the embedded form. Additionally, we leverage differentiable ODE
solvers to avoid the need for estimating second-order derivatives of generalized coordinates. Code for the SymODEN
framework and experiments is available at https://github.com/d-biswa/Symplectic-ODENet.

2 PRELIMINARY CONCEPTS

2.1 HAMILTONIAN DYNAMICS

Lagrangian dynamics and Hamiltonian dynamics are both reformulations of Newtonian dynamics. They provide
novel insights into the laws of mechanics. In these formulations, the configuration of a system is described by its
generalized coordinates. Over time, the configuration point of the system moves in the configuration space, tracing
out a trajectory. Lagrangian dynamics describes the evolution of this trajectory, i.e., the equations of motion, in the
configuration space. Hamiltonian dynamics, however, tracks the change of system states in the phase space, i.e., the
product space of generalized coordinates q = (q1 , q2 , ..., qn ) and generalized momenta p = (p1 , p2 , ..., pn ). In other
words, Hamiltonian dynamics treats q and p on an equal footing. This not only provides symmetric equations of
motion but also leads to a whole new approach to classical mechanics (Goldstein et al., 2002). Hamiltonian dynamics
is also widely used in statistical and quantum mechanics.
In Hamiltonian dynamics, the time-evolution of a system is described by the Hamiltonian H(q, p), a scalar function
of generalized coordinates and momenta. Moreover, in almost all physical systems, the Hamiltonian is the same as the
total energy and hence can be expressed as

$$H(q, p) = \frac{1}{2} p^T M^{-1}(q)\, p + V(q), \tag{1}$$

where the mass matrix M(q) is symmetric positive definite and V(q) represents the potential energy of the system.
Correspondingly, the time-evolution of the system is governed by

$$\dot{q} = \frac{\partial H}{\partial p}, \qquad \dot{p} = -\frac{\partial H}{\partial q}, \tag{2}$$


where we have dropped the explicit dependence on q and p for brevity of notation. Moreover, since

$$\dot{H} = \left(\frac{\partial H}{\partial q}\right)^T \dot{q} + \left(\frac{\partial H}{\partial p}\right)^T \dot{p} = 0, \tag{3}$$

the total energy is conserved along a trajectory of the system. The RHS of Equation (2) is called the symplectic gradient
(Rowe et al., 1980) of H, and Equation (3) shows that moving along the symplectic gradient keeps the Hamiltonian
constant.
In this work, we consider a generalization of the Hamiltonian dynamics which provides a means to incorporate external control (u), such as force and torque. As external control is usually affine and only influences changes in the generalized momenta, we can express this generalization as

$$\begin{bmatrix}\dot{q}\\ \dot{p}\end{bmatrix} = \begin{bmatrix}\frac{\partial H}{\partial p}\\ -\frac{\partial H}{\partial q}\end{bmatrix} + \begin{bmatrix}0\\ g(q)\end{bmatrix} u, \tag{4}$$

where the input matrix g(q) is typically assumed to have full column rank. For u = 0, the generalized dynamics reduces to the classical Hamiltonian dynamics (2) and the total energy is conserved; however, when u ≠ 0, the system has a dissipation-free energy exchange with the environment.
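To make Equation (4) concrete, the following minimal sketch (plain NumPy/SciPy, not from the paper's codebase) simulates the controlled Hamiltonian dynamics for the single pendulum used later in Task 1 (Section 4.2), where H(q, p) = 1.5p² + 5(1 − cos q) and g(q) = 1; the constant control value is an arbitrary illustrative choice.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Task 1 pendulum (Section 4.2): H(q, p) = 1.5 p^2 + 5 (1 - cos q), g(q) = 1
H = lambda q, p: 1.5 * p**2 + 5.0 * (1.0 - np.cos(q))

def field(t, x, u):
    """RHS of Equation (4): q̇ = ∂H/∂p = 3p,  ṗ = -∂H/∂q + g(q) u = -5 sin q + u."""
    q, p = x
    return [3.0 * p, -5.0 * np.sin(q) + u]

t_eval = np.linspace(0.0, 2.0, 41)
unforced = solve_ivp(field, (0, 2), [np.pi / 4, 0.0], args=(0.0,), t_eval=t_eval, rtol=1e-9)
forced = solve_ivp(field, (0, 2), [np.pi / 4, 0.0], args=(1.0,), t_eval=t_eval, rtol=1e-9)

# Equation (3): with u = 0 the Hamiltonian is conserved; with u != 0 it is not
print(np.ptp(H(*unforced.y)))  # ~0 (energy conserved along the trajectory)
print(np.ptp(H(*forced.y)))    # > 0 (energy exchanged with the environment)
```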

2.2 CONTROL VIA ENERGY SHAPING


Once we have learned the dynamics of a system, the learned model can be used to synthesize a controller for driving the system to a reference configuration q⋆. As the proposed approach offers insight about the energy associated with a system, it is a natural choice to exploit this information for synthesizing controllers via energy shaping (Ortega et al., 2001). As energy is a fundamental aspect of physical systems, reshaping the associated energy landscape enables us to specify a broad range of control objectives and synthesize nonlinear controllers with provable performance guarantees.

If rank(g(q)) = dim(q), the system is fully-actuated and we have control over every dimension of “acceleration” in ṗ. For such fully-actuated systems, a controller u(q, p) = β(q) + v(p) can be synthesized via potential energy shaping β(q) and damping injection v(p). For completeness, we restate this procedure (Ortega et al., 2001) using our notation. As the name suggests, the goal of potential energy shaping is to synthesize β(q) such that the closed-loop system behaves as if its time-evolution is governed by a desired Hamiltonian H_d. With this, we have

$$\begin{bmatrix}\dot{q}\\ \dot{p}\end{bmatrix} = \begin{bmatrix}\frac{\partial H}{\partial p}\\ -\frac{\partial H}{\partial q}\end{bmatrix} + \begin{bmatrix}0\\ g(q)\end{bmatrix}\beta(q) = \begin{bmatrix}\frac{\partial H_d}{\partial p}\\ -\frac{\partial H_d}{\partial q}\end{bmatrix}, \tag{5}$$

where the difference between the desired Hamiltonian and the original one lies in their potential energy terms, i.e.,

$$H_d(q, p) = \frac{1}{2} p^T M^{-1}(q)\, p + V_d(q). \tag{6}$$
In other words, β(q) shapes the potential energy such that the desired Hamiltonian H_d(q, p) has a minimum at (q⋆, 0). Then, by substituting Equation (1) and Equation (6) into Equation (5), we get

$$\beta(q) = g^T (g g^T)^{-1}\left(\frac{\partial V}{\partial q} - \frac{\partial V_d}{\partial q}\right). \tag{7}$$

Thus, with potential energy shaping, we ensure that the system has the lowest energy at the desired reference configuration. Furthermore, to ensure that trajectories actually converge to this configuration, we add an additional damping term² given by

$$v(p) = -g^T (g g^T)^{-1} (K_d\, p). \tag{8}$$
However, for underactuated systems, potential energy shaping alone cannot³ drive the system to a desired configuration. We also need kinetic energy shaping for this purpose (Chang et al., 2002).

Remark. If the desired potential energy is chosen to be a quadratic of the form

$$V_d(q) = \frac{1}{2}(q - q^\star)^T K_p (q - q^\star), \tag{9}$$

the external forcing term can be expressed as

$$u = g^T (g g^T)^{-1}\left(\frac{\partial V}{\partial q} - K_p (q - q^\star) - K_d\, p\right). \tag{10}$$

This can be interpreted as a PD controller with an additional energy compensation term.⁴
²If we have access to q̇ instead of p, we use q̇ instead in Equation (8).
³As g gᵀ is not invertible, we cannot solve the matching condition given by Equation (7).
⁴Please refer to Appendix B for more details.
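For concreteness, here is a minimal sketch of the controller in Equation (10) for a fully-actuated single pendulum with V(q) = 5(1 − cos q) and g(q) = 1 (the Task 1 system of Section 4.2); the gains K_p, K_d and the setpoint are illustrative choices, not values prescribed by the paper.

```python
import numpy as np

def energy_shaping_controller(q, p, q_star, Kp, Kd):
    """Equation (10): u = g^T (g g^T)^{-1} (dV/dq - Kp (q - q_star) - Kd p).

    For this single pendulum, V(q) = 5 (1 - cos q) and g(q) = 1, so the
    pseudo-inverse factor g^T (g g^T)^{-1} reduces to 1.
    """
    dVdq = 5.0 * np.sin(q)                      # energy compensation term
    return dVdq - Kp * (q - q_star) - Kd * p    # PD terms plus compensation

# Drive the pendulum toward the upright position q_star = pi
# (gain values below are illustrative)
u = energy_shaping_controller(q=0.1, p=0.0, q_star=np.pi, Kp=4.0, Kd=2.0)
print(u)
```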


3 SYMPLECTIC ODE-NET
In this section, we introduce the network architecture of Symplectic ODE-Net. In Subsection 3.1, we show how to
learn an ordinary differential equation with a constant control term. In Subsection 3.2, we assume we have access to
generalized coordinate and momentum data and derive the network architecture. In Subsection 3.3, we take one step
further to propose a data-driven approach to deal with data of embedded angle coordinates. In Subsection 3.4, we put
together the line of reasoning introduced in the previous two subsections to propose SymODEN for learning dynamics
on the hybrid space Rn × Tm .

3.1 TRAINING NEURAL ODE WITH CONSTANT FORCING

Now we focus on the problem of learning an ordinary differential equation (ODE) from time series data. Consider an ODE ẋ = f(x). Assume we do not know the analytical expression of the right-hand side (RHS) and we approximate it with a neural network. If we have time series data X = (x_{t₀}, x_{t₁}, ..., x_{tₙ}), how can we learn f(x) from the data? Chen et al. (2018) introduced Neural ODE, differentiable ODE solvers with O(1)-memory backpropagation. With Neural ODE, we make predictions by approximating the RHS function using a neural network f_θ and feeding it into an ODE solver:

$$\hat{x}_{t_1}, \hat{x}_{t_2}, ..., \hat{x}_{t_n} = \mathrm{ODESolve}(x_{t_0}, f_\theta, t_1, t_2, ..., t_n)$$

We can then construct the loss function L = ‖X − X̂‖₂² and update the weights θ by backpropagating through the ODE solver.
In theory, we can learn f_θ in this way. In practice, however, the neural net is hard to train if n is large: with a bad initial estimate of f_θ, the prediction error would in general be large, and although |x_{t₁} − x̂_{t₁}| might be small, x̂_{tₙ} would be far from x_{tₙ} as error accumulates. In fact, the prediction error of x̂_{tₙ} is not as important as that of x̂_{t₁}; in other words, we should weight data points in a short time horizon more than the rest of the data points. In order to address this and better utilize the data, we introduce the time horizon τ as a hyperparameter and predict x_{t_{i+1}}, x_{t_{i+2}}, ..., x_{t_{i+τ}} from the initial condition x_{t_i}, where i = 0, ..., n − τ.
One challenge in leveraging Neural ODE to learn state-space models is the incorporation of the control term into the dynamics. Equation (4) has the form ẋ = f(x, u) with x = (q, p). A function of this form cannot be fed into Neural ODE directly since the domain and range of f have different dimensions. In general, if our data consist of trajectories of (x, u)_{t₀,...,tₙ} where u remains the same in a trajectory, we can leverage the augmented dynamics

$$\begin{bmatrix}\dot{x}\\ \dot{u}\end{bmatrix} = \begin{bmatrix}f_\theta(x, u)\\ 0\end{bmatrix} = \tilde{f}_\theta(x, u). \tag{11}$$

With Equation (11), we can match the input and output dimension of f̃θ , which enables us to feed it into Neural ODE.
The idea here is to use different constant external forcing to get the system responses and use those responses to train
the model. With a trained model, we can apply a time-varying u to the dynamics ẋ = fθ (x, u) and generate estimated
trajectories. When we synthesize the controller, u remains constant in each integration step. As long as our model
interpolates well among different values of constant u, we could get good estimated trajectories with a time-varying
u. The problem is then how to design the network architecture of f̃θ , or equivalently fθ such that we can learn the
dynamics in an efficient way.
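The following minimal sketch illustrates this training loop with the augmented dynamics of Equation (11), using the torchdiffeq package released with Chen et al. (2018); the network sizes, the batch of initial conditions, and the targets are placeholders for real trajectory data.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # differentiable ODE solver (Chen et al., 2018)

class AugmentedDynamics(nn.Module):
    """f̃_θ of Equation (11): (ẋ, u̇) = (f_θ(x, u), 0) under constant forcing."""
    def __init__(self, x_dim, u_dim, hidden=64):
        super().__init__()
        self.f_theta = nn.Sequential(
            nn.Linear(x_dim + u_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, x_dim))
        self.u_dim = u_dim

    def forward(self, t, xu):
        x_dot = self.f_theta(xu)
        u_dot = torch.zeros_like(xu[..., -self.u_dim:])  # u stays constant
        return torch.cat([x_dot, u_dot], dim=-1)

model = AugmentedDynamics(x_dim=2, u_dim=1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
xu0 = torch.randn(16, 3)            # placeholder batch of initial (x, u)
traj = torch.randn(4, 16, 3)        # placeholder ground truth, tau = 3 steps
t = torch.linspace(0.0, 0.15, 4)    # 3 steps of 0.05 s each

pred = odeint(model, xu0, t, method="rk4")  # shape: (len(t), batch, x_dim + u_dim)
loss = ((pred - traj) ** 2).mean()
loss.backward()                              # backpropagate through the solver
optimizer.step()
```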

3.2 LEARNING FROM GENERALIZED COORDINATE AND MOMENTUM

Suppose we have trajectory data consisting of (q, p, u)_{t₀,...,tₙ}, where u remains constant in a trajectory. If we have the prior knowledge that the unforced dynamics of q and p is governed by Hamiltonian dynamics, we can use three neural nets – M⁻¹_{θ₁}(q), V_{θ₂}(q) and g_{θ₃}(q) – as function approximators to represent the inverse of the mass matrix, the potential energy and the input matrix. Thus,

$$f_\theta(q, p, u) = \begin{bmatrix}\frac{\partial H_{\theta_1,\theta_2}}{\partial p}\\ -\frac{\partial H_{\theta_1,\theta_2}}{\partial q}\end{bmatrix} + \begin{bmatrix}0\\ g_{\theta_3}(q)\end{bmatrix} u \tag{12}$$

where

$$H_{\theta_1,\theta_2}(q, p) = \frac{1}{2} p^T M^{-1}_{\theta_1}(q)\, p + V_{\theta_2}(q) \tag{13}$$
The partial derivatives in the expression can be taken care of by automatic differentiation. By putting the designed f_θ(q, p, u) into Neural ODE, we obtain a systematic way of adding the prior knowledge of Hamiltonian dynamics into end-to-end learning.
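As a sketch of how Equations (12)–(13) can be realized with automatic differentiation for a 1-DOF system (the actual architectures are listed in Appendix A; the hidden sizes here are placeholders):

```python
import torch
import torch.nn as nn

class SymODENField(nn.Module):
    """Sketch of f_θ in Equation (12) for a 1-DOF system, built from three nets.

    M_inv is unconstrained here; Section 3.5 explains how to make it
    positive definite via a Cholesky-style parametrization.
    """
    def __init__(self, hidden=64):
        super().__init__()
        self.M_inv = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.V = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.g = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, q, p, u):
        # Treat q, p as inputs so we can differentiate H with respect to them
        q = q.detach().requires_grad_(True)
        p = p.detach().requires_grad_(True)
        # Equation (13): H = 0.5 * p * M^{-1}(q) * p + V(q)
        H = 0.5 * p * self.M_inv(q) * p + self.V(q)
        # Automatic differentiation yields the symplectic gradient
        dHdq, dHdp = torch.autograd.grad(H.sum(), (q, p), create_graph=True)
        # Equation (12): q̇ = ∂H/∂p,  ṗ = -∂H/∂q + g(q) u
        return dHdp, -dHdq + self.g(q) * u

# q, p, u: batches of shape (batch, 1)
field = SymODENField()
q_dot, p_dot = field(torch.randn(8, 1), torch.randn(8, 1), torch.zeros(8, 1))
```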


3.3 LEARNING FROM EMBEDDED ANGLE DATA

In the previous subsection, we assumed data in the form of (q, p, u)_{t₀,...,tₙ}. In many physical system models, the state variables involve angles which reside in the interval [−π, π). In other words, each angle resides on the manifold S¹. From a data-driven perspective, the data that respects the geometry is the two-dimensional embedding (cos q, sin q). Furthermore, the generalized momentum data is usually not available; instead, the velocity is often available. For example, in the OpenAI Gym (Brockman et al., 2016) Pendulum-v0 task, the observation is (cos q, sin q, q̇).

From a theoretical perspective, however, the angle itself is often used instead of the 2D embedding, the reason being that both the Lagrangian and the Hamiltonian formulations are derived using generalized coordinates. Using an independent generalized coordinate system makes it easier to solve for the equations of motion.

In this subsection, we take the data-driven standpoint and develop an angle-aware method to accommodate the underlying manifold structure. We assume all the generalized coordinates are angles and the data comes in the form of (x₁(q), x₂(q), x₃(q̇), u)_{t₀,...,tₙ} = (cos q, sin q, q̇, u)_{t₀,...,tₙ}. We aim to incorporate our theoretical prior – Hamiltonian dynamics – into the data-driven approach. The goal is to learn the dynamics of x₁, x₂ and x₃. Noticing p = M(x₁, x₂)q̇, we can write down the derivatives of x₁, x₂ and x₃:
$$\dot{x}_1 = -\sin q \circ \dot{q} = -x_2 \circ \dot{q}, \qquad \dot{x}_2 = \cos q \circ \dot{q} = x_1 \circ \dot{q}, \qquad \dot{x}_3 = \frac{d}{dt}\big(M^{-1}(x_1, x_2)\, p\big) = \frac{d}{dt}\big(M^{-1}(x_1, x_2)\big)p + M^{-1}(x_1, x_2)\,\dot{p}, \tag{14}$$
where “◦” represents the elementwise (Hadamard) product. We assume q and p evolve with the generalized Hamiltonian dynamics of Equation (4). Here the Hamiltonian H(x₁, x₂, p) is a function of x₁, x₂ and p instead of q and p:

$$\dot{q} = \frac{\partial H}{\partial p} \tag{15}$$

$$\dot{p} = -\frac{\partial H}{\partial q} + g(x_1, x_2)u = -\frac{\partial x_1}{\partial q}\circ\frac{\partial H}{\partial x_1} - \frac{\partial x_2}{\partial q}\circ\frac{\partial H}{\partial x_2} + g(x_1, x_2)u = \sin q \circ \frac{\partial H}{\partial x_1} - \cos q \circ \frac{\partial H}{\partial x_2} + g(x_1, x_2)u = x_2 \circ \frac{\partial H}{\partial x_1} - x_1 \circ \frac{\partial H}{\partial x_2} + g(x_1, x_2)u \tag{16}$$
Then the right-hand side of Equation (14) can be expressed as a function of the state variables and control (x₁, x₂, x₃, u). Thus, it can be fed into the Neural ODE. We use three neural nets – M⁻¹_{θ₁}(x₁, x₂), V_{θ₂}(x₁, x₂) and g_{θ₃}(x₁, x₂) – as function approximators. Substituting Equation (15) and Equation (16) into Equation (14), the RHS serves as f_θ(x₁, x₂, x₃, u):⁵

$$f_\theta(x_1, x_2, x_3, u) = \begin{bmatrix} -x_2 \circ \frac{\partial H_{\theta_1,\theta_2}}{\partial p} \\ x_1 \circ \frac{\partial H_{\theta_1,\theta_2}}{\partial p} \\ \frac{d}{dt}\big(M^{-1}_{\theta_1}(x_1, x_2)\big)p + M^{-1}_{\theta_1}(x_1, x_2)\Big(x_2 \circ \frac{\partial H_{\theta_1,\theta_2}}{\partial x_1} - x_1 \circ \frac{\partial H_{\theta_1,\theta_2}}{\partial x_2} + g_{\theta_3}(x_1, x_2)u\Big) \end{bmatrix} \tag{17}$$

where

$$H_{\theta_1,\theta_2}(x_1, x_2, p) = \frac{1}{2} p^T M^{-1}_{\theta_1}(x_1, x_2)\, p + V_{\theta_2}(x_1, x_2) \tag{18}$$

$$p = M_{\theta_1}(x_1, x_2)\, x_3 \tag{19}$$

⁵In Equation (17), the derivative of M⁻¹_{θ₁}(x₁, x₂) can be expanded using the chain rule and expressed as a function of the states.
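A small numerical illustration of the chain-rule relations in Equation (14) for a single embedded angle (plain NumPy, illustration only):

```python
import numpy as np

# Embedded pendulum state: x1 = cos q, x2 = sin q, x3 = q_dot
q, q_dot = 0.8, -0.3
x1, x2, x3 = np.cos(q), np.sin(q), q_dot

# Equation (14): the embedding evolves by the chain rule
x1_dot = -x2 * q_dot    # d/dt cos q = -sin q * q̇
x2_dot = x1 * q_dot     # d/dt sin q =  cos q * q̇

# The original angle can be recovered for inspection via atan2
assert np.isclose(np.arctan2(x2, x1), q)
```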

3.4 LEARNING ON HYBRID SPACES Rⁿ × Tᵐ

In Subsection 3.2, we treated the generalized coordinates as translational coordinates. In Subsection 3.3, we developed an angle-aware method to better deal with embedded angle data. In most physical systems, these two types of coordinates coexist. For example, robotic systems are usually modelled as interconnected rigid bodies: the positions of joints or centers of mass are translational coordinates, and the orientations of each rigid body are angular coordinates. In other words, the generalized coordinates lie on Rⁿ × Tᵐ, where Tᵐ denotes the m-torus, with T¹ = S¹ and T² = S¹ × S¹. In this subsection, we put together the architectures of the previous two subsections. We assume the generalized coordinates are q = (r, φ) ∈ Rⁿ × Tᵐ and the data comes in the form of
(x₁, x₂, x₃, x₄, x₅, u)_{t₀,...,tₙ} = (r, cos φ, sin φ, ṙ, φ̇, u)_{t₀,...,tₙ}. With a similar line of reasoning, we use three neural nets – M⁻¹_{θ₁}(x₁, x₂, x₃), V_{θ₂}(x₁, x₂, x₃) and g_{θ₃}(x₁, x₂, x₃) – as function approximators. We have
 
$$p = M_{\theta_1}(x_1, x_2, x_3)\begin{bmatrix}x_4\\ x_5\end{bmatrix} \tag{20}$$

$$H_{\theta_1,\theta_2}(x_1, x_2, x_3, p) = \frac{1}{2} p^T M^{-1}_{\theta_1}(x_1, x_2, x_3)\, p + V_{\theta_2}(x_1, x_2, x_3) \tag{21}$$

With Hamiltonian dynamics, we have

$$\dot{q} = \begin{bmatrix}\dot{r}\\ \dot{\phi}\end{bmatrix} = \frac{\partial H_{\theta_1,\theta_2}}{\partial p} \tag{22}$$

$$\dot{p} = \begin{bmatrix} -\frac{\partial H_{\theta_1,\theta_2}}{\partial x_1} \\ x_3 \circ \frac{\partial H_{\theta_1,\theta_2}}{\partial x_2} - x_2 \circ \frac{\partial H_{\theta_1,\theta_2}}{\partial x_3} \end{bmatrix} + g_{\theta_3}(x_1, x_2, x_3)\, u \tag{23}$$

Then

$$\begin{bmatrix}\dot{x}_1\\ \dot{x}_2\\ \dot{x}_3\\ \dot{x}_4\\ \dot{x}_5\end{bmatrix} = \begin{bmatrix}\dot{r}\\ -x_3 \circ \dot{\phi}\\ x_2 \circ \dot{\phi}\\ \frac{d}{dt}\big(M^{-1}_{\theta_1}(x_1, x_2, x_3)\big)p + M^{-1}_{\theta_1}(x_1, x_2, x_3)\,\dot{p}\end{bmatrix} = f_\theta(x_1, x_2, x_3, x_4, x_5, u) \tag{24}$$

where ṙ and φ̇ come from Equation (22). Now we obtain an f_θ which can be fed into Neural ODE. Figure 1 shows the flow of the computation graph based on Equations (20)–(24).

[Figure 1: The computation graph of SymODEN. Blue arrows indicate neural network parametrization. Red arrows indicate automatic differentiation. For a given (x, u), the computation graph outputs an f_θ(x, u) which follows Hamiltonian dynamics with control. The function itself is an input to the Neural ODE to generate estimates of the states at each time step. Since all the operations are differentiable, the weights of the neural networks can be updated by backpropagation.]

3.5 POSITIVE DEFINITENESS OF THE MASS MATRIX

In real physical systems, the mass matrix M is positive definite, which ensures positive kinetic energy for any non-zero velocity. The positive definiteness of M implies the positive definiteness of M⁻¹_{θ₁}. Thus, we impose this constraint in the network architecture by setting M⁻¹_{θ₁} = L_{θ₁}L^T_{θ₁}, where L_{θ₁} is a lower-triangular matrix. In practice, positive definiteness is ensured by adding a small constant ε to the diagonal elements of M⁻¹_{θ₁}. This not only makes M_{θ₁} invertible, but also stabilizes training.
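A minimal sketch of this parametrization (the network size and the value of ε are illustrative):

```python
import torch
import torch.nn as nn

class MassMatrixInverse(nn.Module):
    """Parametrize M^{-1}_θ1(q) = L_θ1 L_θ1^T + eps*I so it is positive definite."""
    def __init__(self, q_dim, hidden=64, eps=0.01):
        super().__init__()
        n_entries = q_dim * (q_dim + 1) // 2      # entries of a lower-triangular L
        self.net = nn.Sequential(nn.Linear(q_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, n_entries))
        self.q_dim, self.eps = q_dim, eps

    def forward(self, q):
        batch = q.shape[0]
        L = torch.zeros(batch, self.q_dim, self.q_dim)
        idx = torch.tril_indices(self.q_dim, self.q_dim)
        L[:, idx[0], idx[1]] = self.net(q)        # fill the lower triangle
        eye = torch.eye(self.q_dim).expand(batch, -1, -1)
        # L L^T is positive semidefinite; adding eps*I makes it positive definite
        return L @ L.transpose(1, 2) + self.eps * eye
```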

4 EXPERIMENTS

4.1 EXPERIMENTAL SETUP

We use the following four tasks to evaluate the performance of the Symplectic ODE-Net model: (i) Task 1: a pendulum with generalized coordinate and momentum data (learning on R¹); (ii) Task 2: a pendulum with embedded angle data (learning on S¹); (iii) Task 3: a CartPole system (learning on R¹ × S¹); and (iv) Task 4: an Acrobot (learning on T²).


Model Variants. Besides the Symplectic ODE-Net model derived above, we consider a variant that approximates the Hamiltonian using a fully connected neural net H_{θ₁,θ₂}. We call it Unstructured Symplectic ODE-Net (Unstructured SymODEN) since this model does not exploit the structure of the Hamiltonian (1).

Baseline Models. In order to show that we can learn the dynamics better with fewer parameters by leveraging prior knowledge, we set up baseline models for all four experiments. For the pendulum with generalized coordinate and momentum data, the naive baseline model approximates Equation (12) – f_θ(x, u) – by a fully connected neural net. For all the other experiments, which involve embedded angle data, we set up two different baseline models: the naive baseline approximates f_θ(x, u) by a fully connected neural net; it does not respect the fact that the coordinate pair, cos φ and sin φ, lies on Tᵐ. Thus, we also set up the geometric baseline model, which approximates q̇ and ṗ with a fully connected neural net. This ensures that the angle data evolves on Tᵐ.⁶

⁶For more information on model details, please refer to Appendix A.
Data Generation. For all tasks, we randomly generate initial conditions of states and subsequently combine them with 5 different constant control inputs, i.e., u = −2.0, −1.0, 0.0, 1.0, 2.0, to produce the initial conditions and inputs required for simulation. The simulators integrate the corresponding dynamics for 20 time steps to generate trajectory data, which is then used to construct the training set. The simulators differ across tasks. For Task 1, we integrate the true generalized Hamiltonian dynamics with a time interval of 0.05 seconds to generate trajectories. All the other tasks deal with embedded angle data and velocity directly, so we use OpenAI Gym (Brockman et al., 2016) simulators to generate trajectory data. One drawback of using OpenAI Gym is that not all environments use the Runge-Kutta method (RK4) to carry out the integration. OpenAI Gym favors other numerical schemes over RK4 because of speed, but it is harder to learn the dynamics from inaccurate data. For example, if we plot the total energy as a function of time from data generated by the Pendulum-v0 environment with zero action, we see that the total energy oscillates around a constant by a significant amount, even though the total energy should be conserved. Thus, for Task 2 and Task 3, we use Pendulum-v0 and CartPole-v1, respectively, and replace the numerical integrator of the environments with RK4. For Task 4, we use the Acrobot-v1 environment, which already uses RK4. We also change the action space of Pendulum-v0, CartPole-v1 and Acrobot-v1 to a continuous space with a large enough bound.
Model Training. In all the tasks, we train our model using the Adam optimizer (Kingma & Ba, 2014) for 1000 epochs. We set a time horizon τ = 3, and choose “RK4” as the numerical integration scheme in Neural ODE. We vary the size of the training set by doubling from 16 initial state conditions to 1024 initial state conditions. Each initial state condition is combined with five constant controls u = −2.0, −1.0, 0.0, 1.0, 2.0 to produce the initial conditions for simulation. Each trajectory is generated by integrating the dynamics 20 time steps forward. We set the size of the mini-batches to the number of initial state conditions. We log the train error per trajectory and the prediction error per trajectory in each case for all the tasks. The train error per trajectory is the mean squared error (MSE) between the estimated trajectory and the ground truth over 20 time steps. To evaluate the performance of each model in terms of long-term prediction, we construct the metric of prediction error per trajectory by using the same initial state conditions as in the training set with a constant control of u = 0.0, integrating 40 time steps forward, and calculating the MSE over 40 time steps. The reason for using only the unforced trajectories is that a constant nonzero control might cause the velocity to keep increasing or decreasing over time, and large absolute values of velocity are of little interest for synthesizing controllers.

4.2 TASK 1: PENDULUM WITH GENERALIZED COORDINATE AND MOMENTUM DATA

In this task, we use the model described in Section 3.2 and present the predicted trajectories of the learned models as well as the learned functions of SymODEN. We also point out the drawback of treating the angle data as a Cartesian coordinate. The dynamics of this task has the following form:

$$\dot{q} = 3p, \qquad \dot{p} = -5\sin q + u \tag{25}$$

with Hamiltonian H(q, p) = 1.5p² + 5(1 − cos q). In other words, M⁻¹(q) = 3, V(q) = 5(1 − cos q) and g(q) = 1.

[Figure 2: Sample trajectories and learned functions of Task 1. Panels: predicted (q, p) trajectories of the Naive Baseline, Unstructured SymODEN and SymODEN against the ground truth, and the learned g_{θ₃}(q), M⁻¹_{θ₁}(q) and V_{θ₂}(q) against the ground truth.]

In Figure 2, the ground truth is an unforced trajectory, which is energy-conserved. The
prediction trajectory of the baseline model does not conserve energy, while both SymODEN and its unstructured variant predict energy-conserving trajectories. For SymODEN, the learned g_{θ₃}(q) and M⁻¹_{θ₁}(q) match the ground truth well. V_{θ₂}(q) differs from the ground truth by a constant. This is acceptable since the potential energy is a relative notion; only the derivative of V_{θ₂}(q) plays a role in the dynamics.

Here we treat q as a variable in R¹ and our training set contains initial conditions of q ∈ [−π, 3π]. The learned functions do not extrapolate well outside this range, as we can see from the left part of the M⁻¹_{θ₁}(q) and V_{θ₂}(q) plots. We address this issue by working directly with embedded angle data, which leads us to the next subsection.

4.3 TASK 2: PENDULUM WITH EMBEDDED DATA

In this task, the dynamics is the same as Equation (25) but the training data are generated by the OpenAI Gym simulator, i.e., we use embedded angle data and assume we only have access to q̇ instead of p. We use the model described in Section 3.3 and synthesize an energy-based controller (Section 2.2). Without true p data, the learned functions match the ground truth up to a scaling β, as shown in Figure 3. To explain the scaling, let us look at the following dynamics:

$$\dot{q} = p/\alpha, \qquad \dot{p} = -15\alpha \sin q + 3\alpha u \tag{26}$$

with Hamiltonian H = p²/(2α) + 15α(1 − cos q). If we only look at the dynamics of q, we have q̈ = −15 sin q + 3u, which is independent of α. If we don't have access to the generalized momentum p, our trained neural network may converge to a Hamiltonian with an α_e which differs from the true value, α_t = 1/3, in this task. With the scaling β = α_t/α_e = 0.357, the learned functions match the ground truth. Even though we are not learning the true α_t, we can still perform prediction and control since we are learning the dynamics of q correctly.

[Figure 3: Without true generalized momentum data, the learned g_{θ₃}(q), M⁻¹_{θ₁}(q) and V_{θ₂}(q) match the ground truth up to a scaling. Here β = 0.357.]

We let V_d = −V_{θ₂}(q); then the desired Hamiltonian has minimum energy when the pendulum rests at the upward position. For the damping injection, we let K_d = 3. Then, from Equations (7) and (8), the controller we synthesize is

$$u(\cos q, \sin q, \dot{q}) = g^{-1}_{\theta_3}(\cos q, \sin q)\left(2\left(-\sin q\,\frac{\partial V_{\theta_2}}{\partial \cos q} + \cos q\,\frac{\partial V_{\theta_2}}{\partial \sin q}\right) - 3\dot{q}\right) \tag{27}$$

Only SymODEN, out of all the models we consider, provides the learned potential energy which is required to synthesize the controller. Figure 4 shows how the states evolve when the controller is fed into the OpenAI Gym simulator. We can successfully control the pendulum into the inverted position using the controller based on the learned model, even though the absolute maximum control |u| = 7.5 is more than three times larger than the absolute maximum u in the training set, which is 2.0. This shows SymODEN extrapolates well.

[Figure 4: Time-evolution of the state variables (cos q, sin q, q̇) when the closed-loop control input u(cos q, sin q, q̇) is governed by Equation (27). The thin black lines show the expected results.]
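A sketch of how the controller in Equation (27) can be evaluated from the learned potential V_{θ₂} with automatic differentiation (the nets below are untrained placeholders standing in for the learned V_{θ₂} and g_{θ₃}; architectures are listed in Appendix A):

```python
import torch
import torch.nn as nn

# Placeholder nets standing in for the learned V_θ2 and g_θ3
V = nn.Sequential(nn.Linear(2, 50), nn.Tanh(), nn.Linear(50, 1))
g = nn.Sequential(nn.Linear(2, 200), nn.Tanh(), nn.Linear(200, 1))

def controller(q, q_dot, Kd=3.0):
    """Equation (27): potential energy shaping with V_d = -V_θ2, plus damping injection."""
    x = torch.stack([torch.cos(q), torch.sin(q)]).unsqueeze(0).requires_grad_(True)
    dVdx = torch.autograd.grad(V(x).sum(), x)[0]        # (∂V_θ2/∂cos q, ∂V_θ2/∂sin q)
    dVdq = -torch.sin(q) * dVdx[0, 0] + torch.cos(q) * dVdx[0, 1]  # chain rule
    return (2.0 * dVdq - Kd * q_dot) / g(x.detach())[0, 0]         # scalar g, so g^{-1} = 1/g

u = controller(q=torch.tensor(0.3), q_dot=torch.tensor(0.0))
```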

4.4 TASK 3: CARTPOLE SYSTEM

The CartPole system is underactuated, and synthesizing a controller to balance the pole from an arbitrary initial condition requires trajectory optimization or kinetic energy shaping. We show that we can learn its dynamics and perform prediction in Section 4.6. We also train SymODEN on a fully-actuated version of the CartPole system (see Appendix E). The corresponding energy-based controller can bring the pole to the inverted position while driving the cart to the origin.

4.5 TASK 4: ACROBOT

The Acrobot is an underactuated double pendulum. As this system exhibits chaotic motion, it is not possible to predict
its long-term behavior. However, Figure 6 shows that SymODEN can provide reasonably good short-term prediction.
We also train SymODEN in a fully-actuated version of the Acrobot and show that we can control this system to reach
the inverted position (see Appendix E).


[Figure 5: Train error per trajectory and prediction error per trajectory for all 4 tasks (Pendulum, Pendulum (embed), CartPole, Acrobot) with different numbers of training trajectories, comparing the Naive Baseline, Geometric Baseline, Unstructured SymODEN and SymODEN. The horizontal axis shows the number of initial state conditions (16, 32, 64, 128, 256, 512, 1024) in the training set. Both axes are in log scale.]
[Figure 6: Mean squared error and total energy of test trajectories for the four tasks. SymODEN works best in terms of both MSE and total energy. Since SymODEN has learned the Hamiltonian and discovered the conservation from data, the predicted trajectories match the ground truth. The ground-truth energy in all four tasks stays constant.]

4.6 RESULTS

In this subsection, we show the train error, prediction error, as well as the MSE and total energy of a sample test
trajectory for all the tasks. Figure 5 shows the variation in train error and prediction error with changes in the number
of initial state conditions in the training set. We can see that SymODEN yields better generalization in every task. In
Task 3, although the Geometric Baseline Model yields lower train error in comparison to the other models, SymODEN
generates more accurate predictions, indicating overfitting in the Geometric Baseline Model. By incorporating the
physics-based prior of Hamiltonian dynamics, SymODEN learns dynamics that obeys physical laws and thus provides
better predictions. In most cases, SymODEN trained with a smaller training dataset performs better than other models
in terms of the train and prediction error, indicating that better generalization can be achieved even with fewer training
samples.
Figure 6 shows the evolution of MSE and total energy along a trajectory with a previously unseen initial condition. For all the tasks, the MSE of the baseline models diverges faster than that of SymODEN. Unstructured SymODEN performs well in all tasks except Task 3. As for the total energy, in Task 1 and Task 2, SymODEN and Unstructured SymODEN conserve total energy by oscillating around a constant value. In these models, the Hamiltonian itself is learned and the predicted future states stay around a level set of the Hamiltonian. The baseline models, however, fail to find the conservation, and their estimates of future states drift away from the initial Hamiltonian level set.


5 CONCLUSION
Here we have introduced Symplectic ODE-Net, which provides a systematic way to incorporate prior knowledge of Hamiltonian dynamics with control into a deep learning framework. We show that SymODEN achieves better prediction with fewer training samples by learning an interpretable, physically consistent state-space model. SymODEN can work with embedded angle data or when we only have access to velocity instead of generalized momentum. Future work will incorporate a broader class of physics-based priors, such as the port-Hamiltonian system formulation, to learn the dynamics of a larger class of physical systems, and will explore other types of embedding, such as embedded 3D orientations. Another interesting direction could be to combine energy-shaping control (potential as well as kinetic energy shaping) with interpretable end-to-end learning frameworks.

ACKNOWLEDGMENTS
This research was inspired by the ideas and plans articulated by N. E. Leonard and A. Majumdar, Princeton University,
in their ONR grant #N00014-18-1-2873. The research was primarily carried out during Y. D. Zhong’s internship at
Siemens Corporation, Corporate Technology. Pre- and post-internship, Y. D. Zhong’s work was supported by ONR
grant #N00014-18-1-2873.

REFERENCES
Vladimir I. Arnold, Alexander B. Givental, and Sergei P. Novikov. Symplectic geometry. In Dynamical systems IV,
pp. 1–138. Springer, 2001.
Ibrahim Ayed, Emmanuel de Bézenac, Arthur Pajot, Julien Brajard, and Patrick Gallinari. Learning dynamical systems
from partial observations. arXiv:1902.11136, 2019.
Jonathan Baxter. A model of inductive bias learning. Journal of Artificial Intelligence Research, 12:149–198, 2000.
Anthony M. Bloch, Naomi E. Leonard, and Jerrold E. Marsden. Controlled Lagrangians and the stabilization of Euler–Poincaré mechanical systems. International Journal of Robust and Nonlinear Control, 11(3):191–214, 2001.
Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech
Zaremba. OpenAI Gym. arXiv:1606.01540, 2016.
Arunkumar Byravan and Dieter Fox. SE3-Nets: Learning rigid body motion using deep neural networks. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 173–180. IEEE, 2017.
Dong E. Chang, Anthony M. Bloch, Naomi E. Leonard, Jerrold E. Marsden, and Craig A. Woolsey. The equivalence of controlled Lagrangian and controlled Hamiltonian systems. ESAIM: Control, Optimisation and Calculus of Variations, 8:393–422, 2002.
Tian Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David K. Duvenaud. Neural ordinary differential equations. In
Advances in Neural Information Processing Systems 31, pp. 6571–6583. 2018.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186, 2019.
Herbert Goldstein, Charles Poole, and John Safko. Classical mechanics, 2002.
Ian Goodfellow, Aaron Courville, and Yoshua Bengio. Deep learning, volume 1. MIT Press, 2016.
Sam Greydanus, Misko Dzamba, and Jason Yosinski. Hamiltonian Neural Networks. arXiv:1906.01563, 2019.
Jayesh K. Gupta, Kunal Menda, Zachary Manchester, and Mykel J. Kochenderfer. A general framework for structured
learning of mechanical systems. arXiv:1902.08705, 2019.
David Haussler. Quantifying inductive bias: AI learning algorithms and Valiant’s learning framework. Artificial
Intelligence, 36(2):177–221, 1988.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceed-
ings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, 2016.


Maximilian Karl, Maximilian Soelch, Justin Bayer, and Patrick van der Smagt. Deep variational bayes filters: Unsu-
pervised learning of state space models from raw data. arXiv:1605.06432, 2016.
Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. arXiv:1412.6980, 2014.
Rahul G. Krishnan, Uri Shalit, and David Sontag. Structured inference networks for nonlinear state space models. In
Thirty-First AAAI Conference on Artificial Intelligence, 2017.
Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and
Daan Wierstra. Continuous control with deep reinforcement learning. arXiv:1509.02971, 2015.
Michael Lutter, Christian Ritter, and Jan Peters. Deep lagrangian networks: Using physics as model prior for deep
learning. In 7th International Conference on Learning Representations (ICLR), 2019.
Kumpati S. Narendra and Kannan Parthasarathy. Identification and control of dynamical systems using neural net-
works. IEEE Transactions on Neural Networks, 1(1):4–27, 1990.
Romeo Ortega, Arjan J. Van Der Schaft, Iven Mareels, and Bernhard Maschke. Putting energy back in control. IEEE
Control Systems Magazine, 21(2):18–33, 2001.
Romeo Ortega, Arjan J. Van Der Schaft, Bernhard Maschke, and Gerardo Escobar. Interconnection and damping assignment passivity-based control of port-controlled Hamiltonian systems. Automatica, 38(4):585–596, 2002.
David J. Rowe, Arthur Ryman, and George Rosensteel. Many-body quantum mechanics as a symplectic dynamical
system. Physical Review A, 22(6):2362, 1980.
Alvaro Sanchez-Gonzalez, Nicolas Heess, Jost T. Springenberg, Josh Merel, Martin Riedmiller, Raia Hadsell, and
Peter Battaglia. Graph networks as learnable physics engines for inference and control. In International Conference
on Machine Learning (ICML), pp. 4467–4476, 2018.
David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of Go without human knowledge. Nature, 550(7676):354, 2017.
Torsten Söderström and Petre Stoica. System identification. Prentice-Hall, Inc., 1988.
Manuel Watter, Jost Springenberg, Joschka Boedecker, and Martin Riedmiller. Embed to control: A locally linear latent dynamics model for control from raw images. In Advances in Neural Information Processing Systems 28, pp. 2746–2754, 2015.
Tianshu Wei, Yanzhi Wang, and Qi Zhu. Deep Reinforcement Learning for Building HVAC Control. In Proceedings
of the 54th Annual Design Automation Conference (DAC), pp. 22:1–22:6, 2017.

Appendices
A EXPERIMENT IMPLEMENTATION DETAILS
The architectures used for our experiments are shown below. For all the tasks, SymODEN has the lowest number of total parameters. To ensure that the learned function is smooth, we use the Tanh activation function instead of ReLU. As we have differentiation in the computation graph, non-smooth activation functions would lead to discontinuities in the derivatives. This, in turn, would result in an ODE with a discontinuous RHS, which is not desirable. All the architectures shown below are fully connected neural networks. The first number indicates the dimension of the input layer. The last number indicates the dimension of the output layer. The dimensions of the hidden layers are shown in the middle, along with the activation functions.
Task 1: Pendulum

• Input: 2 state dimensions, 1 action dimension
• Baseline Model (0.36M parameters): 2 - 600Tanh - 600Tanh - 2Linear
• Unstructured SymODEN (0.20M parameters):
  – H_{θ₁,θ₂}: 2 - 400Tanh - 400Tanh - 1Linear
  – g_{θ₃}: 1 - 200Tanh - 200Tanh - 1Linear
• SymODEN (0.13M parameters):
  – M⁻¹_{θ₁}: 1 - 300Tanh - 300Tanh - 1Linear
  – V_{θ₂}: 1 - 50Tanh - 50Tanh - 1Linear
  – g_{θ₃}: 1 - 200Tanh - 200Tanh - 1Linear

Task 2: Pendulum with embedded data

• Input: 3 state dimensions, 1 action dimension
• Naive Baseline Model (0.65M parameters): 4 - 800Tanh - 800Tanh - 3Linear
• Geometric Baseline Model (0.46M parameters):
  – M⁻¹_{θ₁} = L_{θ₁}L^T_{θ₁}, where L_{θ₁}: 2 - 300Tanh - 300Tanh - 300Tanh - 1Linear
  – approximate (q̇, ṗ): 4 - 600Tanh - 600Tanh - 2Linear
• Unstructured SymODEN (0.39M parameters):
  – M⁻¹_{θ₁} = L_{θ₁}L^T_{θ₁}, where L_{θ₁}: 2 - 300Tanh - 300Tanh - 300Tanh - 1Linear
  – H_{θ₂}: 3 - 500Tanh - 500Tanh - 1Linear
  – g_{θ₃}: 2 - 200Tanh - 200Tanh - 1Linear
• SymODEN (0.14M parameters):
  – M⁻¹_{θ₁} = L_{θ₁}L^T_{θ₁}, where L_{θ₁}: 2 - 300Tanh - 300Tanh - 300Tanh - 1Linear
  – V_{θ₂}: 2 - 50Tanh - 50Tanh - 1Linear
  – g_{θ₃}: 2 - 200Tanh - 200Tanh - 1Linear

Task 3: CartPole

• Input: 5 state dimensions, 1 action dimension
• Naive Baseline Model (1.01M parameters): 6 - 1000Tanh - 1000Tanh - 5Linear
• Geometric Baseline Model (0.82M parameters):
  – M⁻¹_{θ₁} = L_{θ₁}L^T_{θ₁}, where L_{θ₁}: 3 - 400Tanh - 400Tanh - 400Tanh - 3Linear
  – approximate (q̇, ṗ): 6 - 700Tanh - 700Tanh - 4Linear
• Unstructured SymODEN (0.67M parameters):
  – M⁻¹_{θ₁} = L_{θ₁}L^T_{θ₁}, where L_{θ₁}: 3 - 400Tanh - 400Tanh - 400Tanh - 3Linear
  – H_{θ₂}: 5 - 500Tanh - 500Tanh - 1Linear
  – g_{θ₃}: 3 - 300Tanh - 300Tanh - 2Linear
• SymODEN (0.51M parameters):
  – M⁻¹_{θ₁} = L_{θ₁}L^T_{θ₁}, where L_{θ₁}: 3 - 400Tanh - 400Tanh - 400Tanh - 3Linear
  – V_{θ₂}: 3 - 300Tanh - 300Tanh - 1Linear
  – g_{θ₃}: 3 - 300Tanh - 300Tanh - 2Linear

Task 4: Acrobot

• Input: 6 state dimensions, 1 action dimension
• Naive Baseline Model (1.46M parameters): 7 - 1200Tanh - 1200Tanh - 6Linear
• Geometric Baseline Model (0.97M parameters):
  – M⁻¹_{θ₁} = L_{θ₁}L^T_{θ₁}, where L_{θ₁}: 4 - 400Tanh - 400Tanh - 400Tanh - 3Linear
  – approximate (q̇, ṗ): 7 - 800Tanh - 800Tanh - 4Linear
• Unstructured SymODEN (0.78M parameters):
  – M⁻¹_{θ₁} = L_{θ₁}L^T_{θ₁}, where L_{θ₁}: 4 - 400Tanh - 400Tanh - 400Tanh - 3Linear
  – H_{θ₂}: 6 - 600Tanh - 600Tanh - 1Linear
  – g_{θ₃}: 4 - 300Tanh - 300Tanh - 2Linear
• SymODEN (0.51M parameters):
  – M⁻¹_{θ₁} = L_{θ₁}L^T_{θ₁}, where L_{θ₁}: 4 - 400Tanh - 400Tanh - 400Tanh - 3Linear
  – V_{θ₂}: 4 - 300Tanh - 300Tanh - 1Linear
  – g_{θ₃}: 4 - 300Tanh - 300Tanh - 2Linear

B SPECIAL CASE OF ENERGY-BASED CONTROLLER: PD CONTROLLER WITH ENERGY COMPENSATION
The energy-based controller has the form u(q, p) = β(q) + v(p), where the potential energy shaping term β(q) and the damping injection term v(p) are given by Equation (7) and Equation (8), respectively.

If the desired potential energy V_d(q) is given by a quadratic, as in Equation (9), then

$$\beta(q) = g^T (g g^T)^{-1}\left(\frac{\partial V}{\partial q} - \frac{\partial V_d}{\partial q}\right) = g^T (g g^T)^{-1}\left(\frac{\partial V}{\partial q} - K_p (q - q^\star)\right), \tag{28}$$

and the controller can be expressed as

$$u(q, p) = \beta(q) + v(p) = g^T (g g^T)^{-1}\left(\frac{\partial V}{\partial q} - K_p (q - q^\star) - K_d\, p\right). \tag{29}$$

The corresponding external forcing term is then given by

$$g(q)\, u = \frac{\partial V}{\partial q} - K_p (q - q^\star) - K_d\, p, \tag{30}$$

which is the same as Equation (10) in the main body of the paper. The first term in this external forcing provides an energy compensation, whereas the second term and the last term are proportional and derivative control terms, respectively. Thus, this control can be perceived as a PD controller with an additional energy compensation.

C ABLATION STUDY OF THE DIFFERENTIABLE ODE SOLVER
In Hamiltonian Neural Networks (HNN), Greydanus et al. (2019) incorporate the Hamiltonian structure into learning by minimizing the difference between the symplectic gradients and the true gradients. When the true gradients are not available, which is often the case, the authors suggest using finite difference approximations. In SymODEN, true gradients or gradient approximations are not necessary, since we integrate the estimated gradient using a differentiable ODE solver and set up the loss function with the integrated values. Here we perform an ablation study of the differentiable ODE solver.

Both HNN and Unstructured SymODEN approximate the Hamiltonian by a neural network, and the main difference is the differentiable ODE solver, so we compare the performance of HNN and Unstructured SymODEN. We set the time horizon τ = 1 since it naturally corresponds to the finite difference estimate of the gradient; a larger τ would correspond to higher-order estimates of gradients. Since there is no angle-aware design in HNN, we use Task 1 to compare the performance of these two models.

We generate 25 training trajectories, each of which contains 45 time steps. This is consistent with the HNN paper. In the HNN paper (Greydanus et al., 2019), the initial conditions of the trajectories are generated randomly in an annulus, whereas in this paper, we generate the initial state conditions uniformly in a reasonable range in each state dimension. We suspect the authors of HNN chose the annulus data generation because they do not have an angle-aware design. Take the pendulum for example: none of the training and test trajectories they generate pass the inverted position. If they make a prediction on a trajectory with a large enough initial speed, the angle would go over ±2π, ±4π, etc. in the long run. Since these values are away from the region where the model was trained, we can expect the prediction to be poor. In fact, this motivated us to design the angle-aware SymODEN in Section 3.3. In this ablation study, we generate the training data in both ways.
Table 1 shows the train error and the prediction error per trajectory of the two models. We can see that Unstructured SymODEN performs better than HNN. This is an expected result. To see why, assume the training loss per time step of HNN is similar to that of Unstructured SymODEN. Since the training loss of HNN is on the symplectic gradient, the error accumulates while integrating the symplectic gradient to get the estimated state values, and the MSE of the state values would likely be one order of magnitude greater than that of Unstructured SymODEN. Figure 7 shows the MSE and total energy of a particular trajectory. It is clear that the MSE of Unstructured SymODEN is lower than that of HNN. That the MSE of HNN periodically touches zero does not mean it has a good prediction at those time steps: since the trajectories in the phase space are closed circles, those zeros mean the predicted trajectory of HNN lags behind (or runs ahead of) the true trajectory by one or more circles. Also, the energy of the HNN trajectory drifts instead of staying constant, probably because the finite difference approximation is not accurate enough.

[Figure 7: MSE and total energy of a sample test trajectory. Left two panels: the training data for the models are randomly generated in an annulus, as in HNN. Right two panels: the training data are randomly generated in a rectangle, the same way we use in SymODEN.]
Table 1: Train error and prediction error per trajectory of Unstructured SymODEN and HNN. The train error per trajectory is the sum of the MSE over all 45 timesteps, averaged over the 25 training trajectories. The prediction error per trajectory is the sum of the MSE over 90 timesteps in a trajectory.

                         annulus training data           rectangle training data
Models                   train error  prediction error   train error  prediction error
Unstructured SymODEN     56.59        440.78             502.60       4363.87
HNN                      290.67       564.16             5457.80      26209.17

D EFFECTS OF THE TIME HORIZON τ

Incorporating the differentiable ODE solver also introduces two hyperparameters: the solver type and the time horizon τ. As for the solver type, the Euler solver is not accurate enough for our tasks. The adaptive solver “dopri5” leads to similar train, test and prediction errors as the RK4 solver, but requires more time during training. Thus, in our experiments, we choose RK4.

The time horizon τ is the number of points we use to construct the loss function. Table 2 shows the train error, test error and prediction error per trajectory in Task 2 when τ is varied from 1 to 5. We can see that longer time horizons lead to better models. This is expected, since long time horizons penalize poor long-term predictions. We also observe in our experiments that longer time horizons require more time to train the models.
Table 2: Train error, test error and prediction error per trajectory of Task 2

Time Horizon        τ = 1    τ = 2    τ = 3    τ = 4    τ = 5
Train Error         0.744    0.136    0.068    0.033    0.017
Test Error          0.579    0.098    0.052    0.024    0.012
Prediction Error    3.138    0.502    0.199    0.095    0.048

E FULLY-ACTUATED CARTPOLE AND ACROBOT

The CartPole and the Acrobot are underactuated systems. Incorporating the control of underactuated systems into the end-to-end learning framework is future work. Here we train SymODEN on fully-actuated versions of the CartPole and the Acrobot and synthesize controllers based on the learned models.


For the fully-actuated CartPole, Figure 8 shows snapshots of a controlled trajectory of the system with an initial condition where the pole is below the horizon. Figure 9 shows the time series of state variables and control inputs. We can successfully learn the dynamics and control the pole to the inverted position and the cart to the origin.

[Figure 8: Snapshots of a controlled trajectory of the fully-actuated CartPole system with a 0.3 s time interval.]

[Figure 9: Time series of state variables and control inputs of the controlled trajectory shown in Figure 8. Black reference lines indicate the expected values in the end.]
For the fully-actuated Acrobot, Figure 10 shows snapshots of a controlled trajectory. Figure 11 shows the time series of state variables and control inputs. We can successfully control the Acrobot from the downward position to the upward position, though the final value of q2 is a little away from zero. Taking into account that the dynamics has been learned with only 64 different initial state conditions, it is most likely that the upward position did not show up in the training data.

[Figure 10: Snapshots of a controlled trajectory of the fully-actuated Acrobot system with a 1 s time interval.]

[Figure 11: Time series of state variables and control inputs of the controlled trajectory shown in Figure 10. Black reference lines indicate the expected values in the end.]

F TEST ERRORS OF THE TASKS

Here we show statistics of the train, test, and prediction errors per trajectory for all four tasks. The train errors are based on 64 initial state conditions and 5 constant inputs. The test errors are based on 64 previously unseen initial state conditions and the same 5 constant inputs. Each trajectory in the train and test sets contains 20 steps. The prediction error is based on the same 64 initial state conditions (from training) and zero inputs.


Table 3: Train, test and prediction errors of the four tasks

                         Naive            Geometric        Unstructured
                         Baseline         Baseline         SymODEN          SymODEN
Task 1: Pendulum
Model parameters         0.36M            N/A              0.20M            0.13M
Train error              30.82 ± 43.45    N/A              0.89 ± 2.76      1.50 ± 4.17
Test error               40.99 ± 56.28    N/A              2.74 ± 9.94      2.34 ± 5.79
Prediction error         37.87 ± 117.02   N/A              17.17 ± 71.48    23.95 ± 66.61
Task 2: Pendulum (embed)
Model parameters         0.65M            0.46M            0.39M            0.14M
Train error              2.31 ± 3.72      0.59 ± 1.634     1.76 ± 3.69      0.067 ± 0.276
Test error               2.18 ± 3.59      0.49 ± 1.762     1.41 ± 2.82      0.052 ± 0.241
Prediction error         317.21 ± 521.46  14.31 ± 29.54    3.69 ± 7.72      0.20 ± 0.49
Task 3: CartPole
Model parameters         1.01M            0.82M            0.67M            0.51M
Train error              15.53 ± 22.52    0.45 ± 0.37      4.84 ± 4.42      1.78 ± 1.81
Test error               25.42 ± 38.49    1.20 ± 2.67      6.90 ± 8.66      1.89 ± 1.81
Prediction error         332.44 ± 245.24  52.26 ± 73.25    225.22 ± 194.24  11.41 ± 16.06
Task 4: Acrobot
Model parameters         1.46M            0.97M            0.78M            0.51M
Train error              2.04 ± 2.90      2.07 ± 3.72      1.32 ± 2.08      0.25 ± 0.39
Test error               5.62 ± 9.29      5.12 ± 7.25      3.33 ± 6.00      0.28 ± 0.48
Prediction error         64.61 ± 145.20   26.68 ± 34.90    9.72 ± 16.58     2.07 ± 5.26
