Training Neural ODEs Using Fully Discretized Simultaneous Optimization
and use IPOPT—a solver for large-scale nonlinear optimization—to simultaneously optimize
collocation coefficients and neural network parameters. Using the Van der Pol Oscillator as
a case study, we demonstrate faster convergence compared to traditional training methods.
Furthermore, we introduce a decomposition framework utilizing Alternating Direction Method
of Multipliers (ADMM) to effectively coordinate sub-models among data batches. Our results
show significant potential for (collocation-based) simultaneous Neural ODE training pipelines.
scheduling and control applications owing to their flexibility and representation power, e.g., as scale-bridging models (Tsay and Baldea, 2019). Neural networks can have various architectures and consist of learnable weights and biases that are optimized during training by minimizing a specified loss function. This is commonly achieved using gradient-based algorithms, which iteratively update the model parameters by taking steps in the negative direction of the gradient of the loss function (Sarker, 2021).

Neural Ordinary Differential Equations (Neural ODEs) (Chen et al., 2018) bridge neural networks with dynamical systems modeling, leveraging existing knowledge of ODEs. These models extend traditional networks to model unknown continuous-time dynamics by parameterizing the evolution of system states as a differential equation:

    dx(t)/dt = fθ(x(t), t),    (1)

where x(t) represents the state at time t, and fθ is the neural network parameterized by θ. Compared to more standard recurrent or convolutional neural networks, Neural ODEs are flexible and can incorporate arbitrary time spacings. The framework has found numerous applications, including process control (Luo et al., 2023), reaction modeling (Sorourifar et al., 2023), and parameter estimation (Bradley and Boukouvala, 2021; Dua and Dua, 2012).

⋆ Support from a BASF/Royal Academy of Engineering Senior Research Fellowship is gratefully acknowledged.

Predictions from the model (1) are obtained by solving the corresponding initial value problem (IVP):

    x(T) = x(t0) + ∫_{t0}^{T} fθ(x(t), t) dt = ODESolve(x(t0), fθ, t0, T),    (2)

where ODESolve is a numerical IVP solver, and t0, T are the beginning and end of the integration interval, respectively. Training is typically based on the accuracy of the predictions x(T), requiring the numerical solution of (2) and backpropagation of gradients through the IVP solver at every iteration. These requirements lead to long training times of Neural ODEs (Lehtimäki et al., 2024).

Given the above, this work uses spectral numerical methods, specifically collocation, for the time integration of differential equations in Neural ODE training. Spectral methods offer several advantages: they are global as they approximate over the entire domain, display exponential convergence for smooth problems, and have better accuracy with a small number of points (Boyd, 2000). Spectral methods remain less explored than sequential methods in the context of Neural ODEs and have to date mostly been limited to approximating derivative targets (Roesch et al., 2021) or to non-simultaneous training (Quaglino et al., 2020). The novelty of this paper is that we show collocation can be employed in a simultaneous optimization approach, i.e., the system dynamics are solved as equality constraints rather than by iterative simulation, for fast and stable Neural ODE training. Furthermore, we show that the proposed method may produce more parsimonious models and is amenable to batching via ADMM.
2. NEURAL ODES FOR TIME SERIES

Neural ODEs can be applied in various contexts, e.g., in generative modeling or as implicit layers in larger models. We focus on the basic, control-relevant setting of modeling time-series data comprising observations {yi, ti}_{i=0}^{N−1}, where yi ∈ R^d is the data vector at time ti, and T = t_{N−1} is the end of the time (and integration) interval. Our objective is to learn a parametric ODE model that, when integrated from an initial condition, results in a continuous trajectory y(t) approximating the observed data:

    y′(t) = fθ(y(t), t),  y(t0) = y0,  t ≥ t0,    (3)

where y′(t) is the time derivative of the system state y(t) and fθ is a neural network parameterized by θ. A trained Neural ODE model may also predict beyond the observed data interval [t0, T], provided the ODE solution remains valid and fθ satisfies conditions such as Lipschitz continuity.

During training, the solution Ŷ is computed by numerically solving the IVP:

    Ŷ = ODESolve(fθ, y0, t) ∈ R^{N×d},    (4)

where t = {ti}_{i=0}^{N−1} is the vector of time points matching the observations. The model parameters, θ, are learned by minimizing a loss function that captures the discrepancy between the predictions and observed data. The mean squared error (MSE) loss function is a common choice for continuous-output regression:

    Lθ(Ŷ, Y) = (1/N) ∥Ŷ − Y∥_F²,    (5)

where Y ∈ R^{N×d} is the matrix of observed values and ∥·∥_F is the Frobenius norm. In summary, the goal of Neural ODE training can be formulated as computing θ such that (3) holds while minimizing (5).
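To make the notation concrete, the following minimal NumPy sketch shows one possible fθ (a single-hidden-layer tanh network taking the state and time as inputs) together with the loss (5). The architecture, sizes, and names are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, hidden = 2, 16   # illustrative state dimension and hidden width

# theta: weights and biases of a one-hidden-layer tanh network
theta = {
    "W1": 0.1 * rng.standard_normal((hidden, d + 1)),  # +1 input for time t
    "b1": np.zeros(hidden),
    "W2": 0.1 * rng.standard_normal((d, hidden)),
    "b2": np.zeros(d),
}

def f_theta(y, t, theta):
    """Neural network right-hand side f_theta(y(t), t) from Eq. (3)."""
    z = np.concatenate([y, [t]])
    h = np.tanh(theta["W1"] @ z + theta["b1"])
    return theta["W2"] @ h + theta["b2"]

def mse_loss(Y_hat, Y):
    """Eq. (5): (1/N) times the squared Frobenius norm of the prediction error."""
    return np.sum((Y_hat - Y) ** 2) / Y.shape[0]
```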
2.1 Sequential ODE Solvers

In a typical Neural ODE training pipeline, we ensure (3) holds by solving the ODE system in every iteration. In this sequential approach, the solver computes Ŷ iteratively, e.g., by time stepping, until the end of the interval is reached at T. The simplest example is Euler's method, while more commonly used schemes are the Runge-Kutta methods, used as default solvers in torchdiffeq within PyTorch and Diffrax within JAX. We can generalize a step of a sequential numerical scheme as:

    y(t + h) = y(t) + h · Φ(f, y(t), t, h),

where h is the step size, f is the derivative function, and Φ denotes the (typically explicit) function that approximates the change in y over the interval from t to t + h.
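For concreteness, a minimal NumPy sketch of the step map Φ, instantiated as explicit Euler and classical RK4 and rolled out over the observation times to form Ŷ as in (4). The toy right-hand side f stands in for the neural network fθ; all names are illustrative.

```python
import numpy as np

def f(y, t):
    # toy right-hand side standing in for the neural network f_theta(y, t)
    return np.array([y[1], -y[0]])

def phi_euler(f, y, t, h):
    # explicit Euler: Phi(f, y, t, h) = f(y, t)
    return f(y, t)

def phi_rk4(f, y, t, h):
    # classical fourth-order Runge-Kutta increment function
    k1 = f(y, t)
    k2 = f(y + 0.5 * h * k1, t + 0.5 * h)
    k3 = f(y + 0.5 * h * k2, t + 0.5 * h)
    k4 = f(y + h * k3, t + h)
    return (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

def odesolve(f, y0, ts, phi=phi_rk4):
    """Sequential time stepping y(t+h) = y(t) + h * Phi(f, y(t), t, h),
    returning one row per observation time (cf. Eq. (4))."""
    Y = [np.asarray(y0, dtype=float)]
    for t0, t1 in zip(ts[:-1], ts[1:]):
        h = t1 - t0
        Y.append(Y[-1] + h * phi(f, Y[-1], t0, h))
    return np.stack(Y)

ts = np.linspace(0.0, 10.0, 101)
Y_hat = odesolve(f, [0.0, 1.0], ts)        # shape (101, 2)
```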
While this framework enables using tailored simulation methods for (3), using sequential ODE solvers for training poses several challenges. First, numerical errors can accumulate at each integration step, resulting in substantial global errors. Although adaptive solvers help control these errors, they add computational overhead and may still behave unpredictably on unseen data. Second, in addition to simulation CPU times, storing intermediate solutions for backpropagation requires significant memory. While the adjoint method (Chen et al., 2018) ensures a constant memory cost, it can substantially prolong training time.

3. SPECTRAL METHODS

As an alternative to the above, spectral numerical methods approximate the ODE solution as a linear combination of basis functions, e.g., trigonometric or orthogonal polynomials. The function coefficients are fitted over the integration domain, offering high accuracy and convergence rates for smooth problems (Boyd, 2000).

3.1 Collocation with Lagrange Interpolation

Collocation is a class of spectral methods in which an ODE is enforced at a set of discrete points, termed the collocation grid, which we introduce as ξ = {ξi}_{i=0}^{N−1} in [t0, T]. Under this framework, the approximate solution is

    ỹ(t) = Σ_{i=0}^{N−1} βi ϕi(t),

where {ϕi(t)}_{i=0}^{N−1} represent the basis functions, {βi}_{i=0}^{N−1} are the coefficients to be determined, and t ∈ [t0, T].

We employ the barycentric form of Lagrange polynomials as basis functions due to its numerical stability and adaptability to diverse functions, including non-periodic behaviors (Berrut and Trefethen, 2004). The use of Lagrange polynomials also offers an implementation simplification due to the interpolation property, which ensures that the coefficients coincide with the true state values at the collocation grid, such that βi = yi at each collocation point. As a result, the approximated solution becomes:

    ỹ(t) = Σ_{i=0}^{N−1} yi ℓi(t),    (6)

where {ℓi(t)}_{i=0}^{N−1} are the Lagrange basis functions and {yi}_{i=0}^{N−1} are the unknown coefficients, which are also the state values at each point ξi. We assume y to be unknown, given the presence of noise in real-life systems. By treating all yi as coefficients (Berrut and Trefethen, 2004) and differentiating the interpolation formula (6), we obtain:

    ỹ′(t) = Σ_{i=0}^{N−1} yi ℓ′i(t).    (7)

Substituting (6) and (7) into the Neural ODE (3) and evaluating at each collocation point ξi yields:

    Σ_{j=0}^{N−1} yj ℓ′j(ξi) = fθ(Σ_{j=0}^{N−1} yj ℓj(ξi), ξi),  i = 0, . . . , N − 1.    (8)

This results in a system of nonlinear equations with respect to the unknowns {yj}_{j=0}^{N−1}. The system can be expressed in matrix form:

    DY = Fθ(Y, ξ),    (9)

where:
• D ∈ R^{N×N} is the differentiation matrix with elements Dij = ℓ′j(ξi).
• Y = [y0, . . . , y_{N−1}]⊤ ∈ R^{N×d} is the matrix of unknown coefficients (true state values).
• Fθ(Y, ξ) = [fθ(ỹ(ξ0), ξ0), . . . , fθ(ỹ(ξ_{N−1}), ξ_{N−1})]⊤ ∈ R^{N×d} contains the Neural ODE evaluated at each collocation point.
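In code, satisfying (9) amounts to driving a residual to zero. A minimal sketch (the differentiation matrix D is constructed below; f_theta is assumed to be any vector field of the form f(y, t)):

```python
import numpy as np

def collocation_residual(Y, xi, D, f_theta):
    """Residual of Eq. (9): D @ Y - F_theta(Y, xi), with shape (N, d).

    Y  : (N, d) candidate state values at the collocation points,
    xi : (N,) collocation grid, D : (N, N) differentiation matrix."""
    F = np.stack([f_theta(Y[i], xi[i]) for i in range(len(xi))])
    return D @ Y - F
```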
Using the barycentric formula (Berrut and Trefethen, 2004), the differentiation matrix is defined as:

    Dij = (wj / wi) · 1 / (ξi − ξj),  if i ≠ j,
    Dii = − Σ_{k=0, k≠i}^{N−1} Dik,

where the weights are computed as:

    wi = 1 / ∏_{k=0, k≠i}^{N−1} (ξi − ξk).

The selection of the collocation grid significantly impacts the accuracy of the method. To mitigate errors caused by Runge's phenomenon, Chebyshev nodes of the second kind in [−1, 1] are often used:

    ξi = cos(iπ / (N − 1)),  i = 0, . . . , N − 1.    (10)

We refer the interested reader to Young (2019) for a comprehensive discussion of collocation grids.
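A minimal NumPy sketch of these quantities (Chebyshev nodes on [−1, 1], barycentric weights, and the differentiation matrix), with a small polynomial self-check. The affine mapping of the grid to [t0, T] is omitted, and the function names are illustrative.

```python
import numpy as np

def cheb_nodes(N):
    """Chebyshev nodes of the second kind on [-1, 1], Eq. (10)."""
    return np.cos(np.arange(N) * np.pi / (N - 1))

def barycentric_weights(xi):
    """w_i = 1 / prod_{k != i} (xi_i - xi_k)."""
    N = len(xi)
    w = np.empty(N)
    for i in range(N):
        w[i] = 1.0 / np.prod(np.delete(xi[i] - xi, i))
    return w

def differentiation_matrix(xi):
    """D_ij = (w_j / w_i) / (xi_i - xi_j) for i != j; D_ii = -sum_{k != i} D_ik."""
    w = barycentric_weights(xi)
    N = len(xi)
    D = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if i != j:
                D[i, j] = (w[j] / w[i]) / (xi[i] - xi[j])
        D[i, i] = -np.sum(D[i, :])   # off-diagonal row sum (diagonal still zero)
    return D

# Self-check: Lagrange interpolation differentiates low-order polynomials exactly
xi = cheb_nodes(8)
D = differentiation_matrix(xi)
print(np.allclose(D @ xi**3, 3 * xi**2))   # True
```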
4. PROPOSED METHODOLOGY

So far, we have converted the continuous ODE problem into a discrete algebraic system (9) by incorporating collocation and Lagrange interpolation. Our goal is to minimize the loss function (5) while enforcing that the collocation-estimated derivatives, computed as DY, match the neural-network-predicted derivatives Fθ(Y, ξ) at the collocation points. The challenge arises because both the true state values Y and the parameters θ of the neural network are unknown.

4.1 Simultaneous Approach

Our proposed approach is to incorporate the collocation system (9) as equality constraints in a single nonlinear optimization framework, where the objective function captures the discrepancy between the observed and optimized states, e.g., the MSE. By solving for Y and θ simultaneously, we effectively train the neural network from observed data while enforcing the collocation constraints.

The simultaneous approach in the context of collocation-based dynamic optimization is further explored in the works of Tjoa and Biegler (1991) and Kameswaran and Biegler (2006), where it is applied to the problem of parameter estimation in differential equation systems.

4.2 Implementation

To implement this methodology, we utilize the Interior Point OPTimizer (IPOPT), which is well-suited for solving continuous, large-scale nonlinear optimization problems (Biegler and Zavala, 2009). For the software implementation, we call IPOPT through the open-source Pyomo algebraic modeling language. Recent research (Ceccon et al., 2022) demonstrates how neural networks can be represented as constraints within the Pyomo framework.

We initialize two optimization variable groups within Pyomo: state variables and neural network parameters.

• State Variables, Y∗, represent the system's state at the collocation time points. To expedite training, the state variables can also be initialized using smoothed observed data. These variables aim to approximate the true values Y from the observed values Yobs.
• Neural Network Parameters, θ, include the weights and biases of the neural network.

The objective function captures the difference between the observed data and the state variables approximated by the collocation equations, instead of the output of ODESolve as in (4). We express the objective function as a combination of the MSE loss and regularization terms:

    L(Y∗, Yobs) = (1/N) ∥Y∗ − Yobs∥_F² + λ ∥θ∥_2²,    (11)

where
• Y∗ ∈ R^{N×d} is the matrix of estimated state variables (the variables being optimized).
• Yobs ∈ R^{N×d} is the matrix of observed values.
• θ is the vector of neural network parameters.
• ∥·∥_2 and ∥·∥_F are the Euclidean (L2) and Frobenius norms, respectively.
• λ is the regularization parameter.

We enforce consistency between the neural network and the derivative of Y∗ at each collocation point ξi:

    Σ_{j=0}^{N−1} y∗j ℓ′j(ξi) = fθ(y∗i, ξi).

Here, ℓ′j(ξi) is the element Dij of the differentiation matrix, so the left-hand side approximates the derivative of the optimized state values. The right-hand side is the output of the neural network.

4.3 Problem Formulation

We formulate the optimization problem as follows:

    min_{Y∗, θ}  L(Y∗, Yobs)
    subject to:
      equality constraints:  DY∗ = Fθ(Y∗, ξ),
      bounds:  y∗_L ≤ Y∗ ≤ y∗_U,  θ_L ≤ θ ≤ θ_U,

where:
• L is the loss function described in (11).
• C : DY∗ = Fθ(Y∗, ξ) is the matrix of equality constraints.
• Y∗ and θ represent the decision variables.
• y∗_L, y∗_U and θ_L, θ_U are the respective lower and upper bounds.

After the model is solved to optimality, the neural network can be used as the RHS of an ODE in a sequential or collocation-based solver in the post-training context.
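The formulation above can be prototyped directly in Pyomo. The sketch below is a minimal, hypothetical implementation under several assumptions: the network is a single-hidden-layer tanh MLP written out as Pyomo expressions (rather than via the approach of Ceccon et al. (2022)), variable bounds are omitted, and all names (build_model, hidden, lam, etc.) are illustrative rather than the paper's actual code.

```python
import numpy as np
import pyomo.environ as pyo

def build_model(xi, Y_obs, D, Y_init=None, hidden=8, lam=1e-4):
    """Simultaneous collocation training of a small Neural ODE (sketch).

    xi: (N,) collocation points, Y_obs: (N, d) observations,
    D: (N, N) barycentric differentiation matrix,
    Y_init: optional (N, d) smoothed data (e.g., LOESS) to initialize Y*.
    """
    N, d = Y_obs.shape
    Y_init = Y_obs if Y_init is None else Y_init
    rng = np.random.default_rng(0)

    m = pyo.ConcreteModel()
    m.I = pyo.RangeSet(0, N - 1)       # collocation points
    m.S = pyo.RangeSet(0, d - 1)       # state dimensions
    m.H = pyo.RangeSet(0, hidden - 1)  # hidden units
    m.K = pyo.RangeSet(0, d)           # network inputs: d states + time

    # State variables Y*, initialized from (smoothed) observations
    m.y = pyo.Var(m.I, m.S, initialize=lambda m, i, s: float(Y_init[i, s]))

    # Neural network parameters theta: one hidden tanh layer, inputs (y_i, xi_i)
    m.W1 = pyo.Var(m.H, m.K, initialize=lambda m, h, k: float(rng.normal(scale=0.1)))
    m.b1 = pyo.Var(m.H, initialize=0.0)
    m.W2 = pyo.Var(m.S, m.H, initialize=lambda m, s, h: float(rng.normal(scale=0.1)))
    m.b2 = pyo.Var(m.S, initialize=0.0)

    def f_theta(m, i, s):
        hid = [pyo.tanh(sum(m.W1[h, k] * m.y[i, k] for k in m.S)
                        + m.W1[h, d] * float(xi[i]) + m.b1[h]) for h in m.H]
        return sum(m.W2[s, h] * hid[h] for h in m.H) + m.b2[s]

    # Collocation equality constraints: (D Y*)_{is} = f_theta(y*_i, xi_i)_s
    m.colloc = pyo.Constraint(
        m.I, m.S,
        rule=lambda m, i, s: sum(D[i, j] * m.y[j, s] for j in m.I) == f_theta(m, i, s))

    # Objective (11): MSE between Y* and the observations plus L2 regularization
    reg = (sum(m.W1[h, k] ** 2 for h in m.H for k in m.K)
           + sum(m.b1[h] ** 2 for h in m.H)
           + sum(m.W2[s, h] ** 2 for s in m.S for h in m.H)
           + sum(m.b2[s] ** 2 for s in m.S))
    m.obj = pyo.Objective(
        expr=sum((m.y[i, s] - float(Y_obs[i, s])) ** 2 for i in m.I for s in m.S) / N
             + lam * reg,
        sense=pyo.minimize)
    return m

# m = build_model(xi, Y_obs, D)
# pyo.SolverFactory("ipopt").solve(m, tee=True)   # simultaneous solve with IPOPT
```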
4.4 Alternating Direction Method of Multipliers (ADMM)

One potential disadvantage of the above simultaneous framework is that the entire dataset must be handled in a single optimization problem, while many training pipelines divide data into batches to alleviate the computational or memory burden. We propose using the Alternating Direction Method of Multipliers (ADMM) to enable multi-batching by coordinating the training of separate submodels. For two 'batches,' the problem can be written as:

    min_{θ1, θ2}  L1(θ1, Y1) + L2(θ2, Y2)
    s.t.  θ1 = θ2,

where Y1, Y2 are the batches of data, θ1, θ2 are vectors containing the parameters of each sub-model, and L1, L2 are the loss functions. Notice that the 'linking' constraints θ1 = θ2 enforce a consensus model between the two data batches. ADMM decomposes the above problem without the linking constraints by updating the optimization parameters and a dual variable (Lagrange multiplier) in an iterative manner (Boyd et al., 2010). Without the constraints, the problem is effectively decomposed into independent subproblems min_{θi} Li(θi, Yi). The loss functions for the subproblems are reformulated as follows:

    L_ADMM,i = Li(θi, Yi) + (ρ/2) ∥θi − θ̄^(k) + ui/ρ∥²,  for i = 1, 2,

where θi are the parameters of submodel i, θ̄^(k) are the consensus parameters at the k-th ADMM iteration, ρ is a scalar penalty strength, and ui are the dual variables associated with subproblem i. For two submodels, the consensus weights in the k-th iteration are given by:

    θ̄^(k) = (θ1^(k) + θ2^(k)) / 2.

The dual variables for each submodel i are updated in each iteration using:

    ui^(k+1) = ui^(k) + ρ (θi^(k) − θ̄^(k)).
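A compact sketch of the resulting consensus ADMM loop. Here train_batch is a hypothetical helper that solves one augmented subproblem (e.g., the IPOPT model above with the penalty term added to its objective) and returns the optimal θi as a flat parameter vector.

```python
import numpy as np

def admm_consensus(batches, train_batch, n_params, rho=1.0, iters=20):
    """Consensus ADMM over data batches (sketch).

    train_batch(Y_i, theta_bar, u_i, rho) is assumed to minimize
    L_i(theta, Y_i) + rho/2 * ||theta - theta_bar + u_i / rho||^2
    and return the optimizer theta_i as a 1-D array of length n_params."""
    theta_bar = np.zeros(n_params)
    u = [np.zeros(n_params) for _ in batches]
    for k in range(iters):
        # 1) local updates: independent subproblems (parallelizable)
        thetas = [train_batch(Y_i, theta_bar, u_i, rho)
                  for Y_i, u_i in zip(batches, u)]
        # 2) consensus update: average of the submodel parameters
        theta_bar = np.mean(thetas, axis=0)
        # 3) dual updates: u_i <- u_i + rho * (theta_i - theta_bar)
        u = [u_i + rho * (theta_i - theta_bar)
             for u_i, theta_i in zip(u, thetas)]
    return theta_bar
```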
2010). For the simultaneous approach, the state variables Y∗ are initialized to the values of locally weighted polynomial regression (Cleveland and Devlin, 1988).

5.1 Case Study: Van der Pol Oscillator

The forced Van der Pol Oscillator is a 2-D ODE system that can be represented as two coupled first-order equations:

    u′ = v,                              u0 = 0,
    v′ = µ(1 − u²)v − u + A cos(ωt),     v0 = 1,

where u is the displacement, v is the velocity, ω is the angular frequency, µ is the damping parameter, and A is the external periodic force. For our experiments, we set the initial conditions as u0 = 0 and v0 = 1. The remaining parameters are chosen as µ = 1, A = 1 and ω = 1.

Training and Inference Procedure. After training with our proposed collocation-based framework, the learned Neural ODE is used in a standard ODE solver (JAX Diffrax) for forward simulation. Figure 1 illustrates the prediction on both training and test ranges, showing that the collocation-trained Neural ODE captures the underlying dynamics effectively.
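As an illustration of this inference step, a minimal Diffrax sketch that forward-simulates a vector field over a time grid. Here the true forced Van der Pol dynamics are used as a stand-in for the learned fθ, which would simply replace vdp below; the parameter values follow the case study, and everything else (time grid, step size) is an illustrative assumption.

```python
import jax.numpy as jnp
import diffrax

def vdp(t, y, args):
    # forced Van der Pol dynamics; a trained f_theta would be dropped in here
    mu, A, omega = args
    u, v = y
    return jnp.array([v, mu * (1.0 - u**2) * v - u + A * jnp.cos(omega * t)])

ts = jnp.linspace(0.0, 20.0, 200)
sol = diffrax.diffeqsolve(
    diffrax.ODETerm(vdp),
    diffrax.Tsit5(),
    t0=ts[0], t1=ts[-1], dt0=0.01,
    y0=jnp.array([0.0, 1.0]),            # u0 = 0, v0 = 1
    args=(1.0, 1.0, 1.0),                # mu = 1, A = 1, omega = 1
    saveat=diffrax.SaveAt(ts=ts),
)
trajectory = sol.ys                       # shape (200, 2): u(t) and v(t)
```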