
Learning and Verifying Maximal Taylor-Neural Lyapunov functions

Matthieu Barreau    Nicola Bastianello

This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation, and by the European Union’s Horizon Research and Innovation Actions program under grant agreement No. 101070162. The authors are with Digital Futures, and the Division of Decision and Control Systems, KTH Royal Institute of Technology, Stockholm, Sweden. {barreau | nicolba}@kth.se.
Abstract

We introduce a novel neural network architecture, termed Taylor-neural Lyapunov functions, designed to approximate Lyapunov functions with formal certification. This architecture innovatively encodes local approximations and extends them globally by leveraging neural networks to approximate the residuals. Our method recasts the problem of estimating the largest region of attraction—specifically for maximal Lyapunov functions—into a learning problem, ensuring convergence around the origin through robust control theory. Physics-informed machine learning techniques further refine the estimation of the largest region of attraction. Remarkably, this method is versatile, operating effectively even without simulated data points. We validate the efficacy of our approach by providing numerical certificates of convergence across multiple examples. Our proposed methodology not only competes closely with state-of-the-art approaches, such as sum-of-squares and LyZNet, but also achieves comparable results even in the absence of simulated data. This work represents a significant advancement in control theory, with broad potential applications in the design of stable control systems and beyond.

Index Terms

Stability of nonlinear systems, Neural networks, Robust control, Machine learning, Region of attraction

1 Introduction

Dynamical systems apply to many engineering technologies and natural phenomena [1], and thus the analysis of their properties provides key insights. The most fundamental of these properties is stability, which ensures the evolution of a dynamic system towards an equilibrium state. The predominant paradigm in stability analysis is the Lyapunov approach, which seeks to identify an energy function for the system [2]. This kind of stability certificate has been demonstrated to be a valuable tool due to its versatility, as it can be readily applied to a range of contexts, including controlled systems [3], performance certification [4], high- or infinite-dimensional systems [5], and discrete-time systems [2].

Nevertheless, discovering a Lyapunov function for a general system represents a significant challenge, as evidenced by decades of literature on the subject. In the context of linear, time-invariant systems, it is well established that the existence of a quadratic Lyapunov function is equivalent to global exponential stability [6]. Furthermore, the determination of a quadratic Lyapunov function is equivalent to the resolution of a linear matrix inequality, for which there exist efficient numerical solvers [7]. In general, for non-linear systems, the Lyapunov function is not quadratic, and there is then no general procedure [2].

Furthermore, the stability of a dynamical system may be constrained to a limited region around an equilibrium, called a region of attraction. This region includes all initial states that will evolve towards the given equilibrium and may not coincide with the entire state space. Consequently, an additional challenge is to compute a Lyapunov function that leads to the largest region of attraction. Such a function is known as a maximal Lyapunov function [8].

For polynomial systems, sum-of-squares techniques have been investigated for estimating a maximal Lyapunov function in [9, 10, 11]. However, this approach suffers from numerical errors when dealing with high-dimensional systems and does not accurately approximate the region of attraction for stiff systems [12]. For a more general class of systems, rational Lyapunov functions have been considered in [8, 13, 14] together with an algorithm to find a maximal Lyapunov function. However, rational Lyapunov functions suffer from the lack of efficient numerical tools. Considering quadratic functions, robust theory encapsulates non-linearities in a cone to compute an inner estimate of the region of attraction [15]. This approach is quite conservative and leads to a poor estimate of the maximal region of attraction for complex systems.

In this work, we plan to take advantage of the physics-informed machine learning paradigm [16, 17]. The idea is to approximate a solution to a differential equation by expressing it as a dynamical constraint in the learning problem. This approach has been proven to be successful in many applications [18, 19, 20], and it has been recently applied to Lyapunov functions in [21, 22] for instance. As investigated in [12], the largest region of attraction will be estimated using Zubov’s theory [23]. More specifically, we will translate part of the methodology described in [23, p91] with series expansion to the neural network case.

1.1 Main contributions

The current literature on neural Lyapunov functions does not rely on physics biases to improve the convergence properties. In this paper, we propose the following contributions:

  • A Taylor-based neural network as a universal approximation for Lyapunov functions (discrepancy bias);

  • A new loss function to apply constraints on a null set (learning bias);

  • A new training algorithm using the Taylor decomposition to enforce local stability (inductive bias);

  • A new sampling methodology to certify that the neural approximation of the Lyapunov function is a Lyapunov function.

We claim that these modifications enable us to discard the use of simulated data (observation bias) and improve the robustness of the algorithm, which means that the training algorithm more often converges to the optimal solution, independently of the initialization.

1.2 Background and related work

As first noted by [24], highlighted more recently by [25], and discussed in the recent survey [26], it is possible to construct Lyapunov functions that are neural networks. The seminal work by [27] led to the non-convex optimization problem that the neural network Lyapunov function must satisfy. It also showed the approximation capabilities of neural networks, but it did not address the region of attraction or the robustness of the training. In 2019, [28] focused on learning Lyapunov functions with a specific architecture to enforce some properties of the Lyapunov function. However, they used this knowledge for an enhanced learning of stable dynamical systems.

The key breakthrough was with the Physics-Informed Machine Learning framework [16] which incorporates physical priors in the form of a dynamical constraint. This was later discussed and enlarged by [17]. This new framework fits perfectly with the Lyapunov methodology. In fact, a Lyapunov function must satisfy constraints expressed in terms of differential inequalities. Such an approach has been investigated in many papers in the last three years.

The authors of [29] proposed to use a neural Lyapunov function to derive a control law for the system with a provable guarantee of stability. The control was obtained as the solution to the constrained optimization problem. The safety is ensured by, first, estimating the region of attraction and, secondly, by using a falsifier which penalizes the outer estimate of the region of attraction. The region of attraction is, however, computed using a regularization agent, leading to a result that is highly sensitive to the hyper-parameter, not guaranteed to converge, and often conservative.

The lack of formal guarantees has been investigated more thoroughly by [30]. They generate Lyapunov neural networks using symbolic computations which offer a trade-off between analytical and numerical methods. The method also relies on training with a verifier that builds counter-examples to make the learning more robust. However, the method works only for globally asymptotically stable systems, which means that the region of attraction is $\mathbb{R}^{n}$. This is a very restrictive assumption since many nonlinear systems have several equilibrium points and thus are not globally asymptotically stable.

Control has been investigated by [31] and [21]. They both use neural networks to estimate a control Lyapunov function, which leads to a stable system under the designed control law. The first one focuses on safety in robotic applications while the second one provides a better neural network architecture that enforces positive definiteness. Finally, [22, 32] worked on a similar topic, trying to combine all the ideas previously cited into one. They estimated the region of attraction of an equilibrium of a partially unknown nonlinear autonomous system using satisfiability modulo theories as a verifier. All these works obtain a rather conservative estimate of the region of attraction, and the obtained estimate is not robust across several trainings. Similar topics with the same conclusions have been investigated in a discrete-time context by [33, 34, 35, 36] to cite a few.

Recent work by [12] aims at learning a neural Lyapunov function using Zubov’s theorem to maximize the region of attraction [23]. Consequently, the learning is more robust and almost always estimates the true region of attraction. However, as done by [33], a simulator is needed to determine whether some initial states lead to an unstable equilibrium point. This knowledge is used as data to enhance learning. In the case of controller synthesis, for example, we often cannot simulate the system. This highlights the need for a pure learning procedure for the maximum region of attraction of a general nonlinear dynamical system.

From the previous papers, it appears that very little work was done on the training algorithm. The introduction of a falsifier or verifier was the only addition to certify the training a posteriori. Consequently, current works are very sensitive to initialization. Our main contribution is to improve the training algorithm designed by [21] to introduce a robust estimation of the region of attraction. Moreover, the neural architecture has been changed such that we can derive a universal Lyapunov function approximation theorem. Similarly to the work by [12], we use Zubov’s theorem to maximize the region of attraction but we do not use an external simulator to get some additional data.

1.3 Organization

The organization of the paper is as follows. In Section 2, some preliminaries are given that lead to the formulation of the problem. Section 3 focuses on the construction of Taylor-Neural Lyapunov functions. Section 4 focuses on the efficient learning of such functions. Section 5 explores the certification aspect. Section 6 is devoted to simulations and discussion. Section 7 concludes the article.

1.4 Notation

Throughout the paper, $\mathbb{R}$ refers to the set of real numbers, $C^{3}(I,J)$ is the set of three-times differentiable functions from $I$ to $J$. For $x=\left[x_{1}\ x_{2}\right]^{\top}\in\mathbb{R}^{2}$ and $\psi:\mathbb{R}\to\mathbb{R}$, we use the notation $\psi.(x)=\left[\psi(x_{1})\ \psi(x_{2})\right]^{\top}$, referring to the element-wise operation. For $x\in\mathbb{R}$, we define the rectified linear function as $x_{+}=\max(0,x)$.
For $x,y\in\mathbb{R}^{n}$, the Euclidean scalar product is written as $\langle x,y\rangle$, the $L^{2}$ norm is defined as $\|x\|_{2}=\sqrt{\langle x,x\rangle}$ and the infinity norm is $\|x\|_{\infty}=\max_{i}|x_{i}|$. For two square symmetric matrices $A$ and $B$ of the same size, $A\prec B$ means that $A-B$ has strictly negative eigenvalues. For a discrete set $\mathcal{A}$, $\left|\mathcal{A}\right|$ refers to its cardinality.

2 Preliminaries and Problem statement

This section formalizes the problem and introduces the working assumptions.

2.1 Problem formulation

We consider the following dynamical system:

$$\left\{\begin{array}{ll}\dot{x}(t)=f(x(t)),&t\geq 0,\\ x(0)=x_{0}\in\mathcal{D}=(-1,1)^{n}&\end{array}\right. \tag{1}$$

where $n\in\mathbb{N}\backslash\{0\}$ is the dimension of the system, $f:\mathbb{R}^{n}\to\mathbb{R}^{n}$ is Lipschitz continuous and possibly non-linear. Under these conditions, there exists a unique solution to the previous problem which is forward complete [37].

We assume without loss of generality that the origin $0$ is an equilibrium point of $f$, i.e. $f(0)=0$. We are interested in showing the asymptotic stability of the origin as defined in [38, Definition 1.3] and recalled below.

Definition 1

The origin of (1) is said to be locally asymptotically stable in the open and connected set $\mathcal{R}\subseteq\mathcal{D}$ containing the origin if for each $\varepsilon>0$ there exists $\delta>0$ such that

$$\forall x_{0}\in\mathcal{R},\quad\|x_{0}\|\leq\delta\quad\Rightarrow\quad\forall t>0,\quad\|x(t)\|\leq\varepsilon,$$

and $\|x(t)\|\xrightarrow[t\to\infty]{}0$.

$\mathcal{R}$ is called a region of attraction of (1) around the origin.

First, let us define a Lyapunov function similar to that proposed in [2, Theorem 3.1].

Definition 2

A continuously differentiable function $V:\mathcal{D}\to\mathbb{R}^{+}$ is said to be a local Lyapunov function for $f$ if

\begin{align}
 & V(0)=0, \tag{2a}\\
\forall x\in\mathcal{D}\backslash\{0\},\quad & V(x)>0, \tag{2b}\\
\forall x\in\mathcal{D}\backslash\{0\},\quad V(x)\leq 1,\quad & \frac{\partial V}{\partial x}(x)\cdot f(x)<0. \tag{2c}
\end{align}

The Lyapunov direct method [2, Theorem 3.1] provides a way to demonstrate asymptotic stability in a region of attraction.

Theorem 1

If there exists a local Lyapunov function $V$ then the origin is asymptotically stable and a region of attraction is $\mathcal{R}(V)=\left\{x\in\mathcal{D}\ |\ V(x)<1\right\}$.
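As an illustration of Definition 2 and Theorem 1, the sketch below checks conditions (2a) to (2c) on a grid for a scalar example. The dynamics `f(x) = -x + x^3` and the candidate `V(x) = x^2` are hypothetical choices made for this example only; they do not come from the paper. A grid check of this kind is a sanity test, not a formal certificate.

```python
import numpy as np

def f(x):
    # Hypothetical scalar dynamics with equilibria at -1, 0 and 1.
    return -x + x**3

def V(x):
    # Hypothetical candidate, scaled so that V = 1 at |x| = 1.
    return x**2

def dV(x):
    # Derivative of V with respect to x.
    return 2.0 * x

# Sample the open domain D = (-1, 1); an even number of symmetric points
# guarantees that the origin itself is not a sample.
xs = np.linspace(-0.999, 0.999, 2000)

vanishes_at_origin = V(0.0) == 0.0                       # condition (2a)
positive = bool(np.all(V(xs) > 0.0))                     # condition (2b)
inside = V(xs) <= 1.0                                    # where (2c) applies
decreasing = bool(np.all(dV(xs[inside]) * f(xs[inside]) < 0.0))  # (2c)

print(vanishes_at_origin, positive, decreasing)
```

For this example, the sublevel set $\{x \in \mathcal{D} \mid V(x) < 1\}$ is the whole interval $(-1, 1)$, which is the region of attraction suggested by Theorem 1 if the sampled conditions hold everywhere.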

The existence of a region of attraction is guaranteed by the following assumption on $f$.

Assumption 1

Assume that $f$ in (1) can be written as

$$f(x)=Ax+o(\|x\|) \tag{3}$$

such that $A$ has all eigenvalues with strictly negative real parts.

The Lyapunov indirect method [38, Theorem 12.6] then concludes that there exists a local Lyapunov function $V\in C^{\infty}(\mathcal{D},\mathbb{R}^{+})$ for (1). Consequently, the set

$$\mathcal{V}=\Bigg\{V\in C^{\infty}(\mathcal{D},\mathbb{R}^{+})\ \Big|\ \forall i\in\{1,\dots,n\},\ \frac{\partial V}{\partial x_{i}}(0)\neq 0,\ \text{ and }V\text{ is a Lyapunov function for (1)}\Bigg\}$$

is not empty.

Let the map $\mu$ be defined by

$$\mu:\begin{array}[t]{rcl}\mathcal{V}&\to&\mathbb{R}^{+}\\ V&\mapsto&\int_{\mathcal{R}(V)}1\end{array}.$$

The function $\mu$ is the volume of the region of attraction related to a Lyapunov function $V$. This defines an ordering of the Lyapunov functions, where $V_{1}\preceq V_{2}$ is equivalent to $\mu(V_{1})\leq\mu(V_{2})$ for any $V_{1},V_{2}\in\mathcal{V}$. Since for any $V\in\mathcal{V}$ we get $\mathcal{R}(V)\subseteq\bar{\mathcal{D}}$, $\mu$ is upper-bounded by the volume of $\bar{\mathcal{D}}$, that is $2^{n}$. The following optimization problem is well-defined:

$$V^{*}\in\mathcal{V}^{*}=\operatorname*{arg\,sup}_{V\in\mathcal{V}}\mu(V).$$
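The volume $\mu(V)$ rarely has a closed form, but it can be estimated by sampling. The following sketch uses plain Monte Carlo integration over $\mathcal{D}=(-1,1)^{n}$; the function name `estimate_mu`, the sample size, and the candidate `V(x) = \|x\|_2^2` are assumptions made for illustration and are not the procedure used in the paper.

```python
import numpy as np

def V(x):
    # Hypothetical candidate: V(x) = ||x||_2^2, so R(V) is the open unit disc.
    return np.sum(x**2, axis=1)

def estimate_mu(V, n, num_samples=200_000, seed=0):
    """Monte Carlo estimate of the volume of R(V) = {x in D : V(x) < 1}."""
    rng = np.random.default_rng(seed)
    samples = rng.uniform(-1.0, 1.0, size=(num_samples, n))  # uniform on D
    fraction_inside = np.mean(V(samples) < 1.0)
    volume_of_D = 2.0**n  # D = (-1, 1)^n has side length 2
    return fraction_inside * volume_of_D

mu_hat = estimate_mu(V, n=2)
print(mu_hat)  # should be close to pi, the area of the unit disc
```

Comparing such estimates for two candidates gives a practical, if noisy, way to evaluate the ordering $V_{1}\preceq V_{2}$.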

Following the original work of Zubov [23] and the later work on maximal Lyapunov functions in [8], we define

$$\mathcal{V}_{Z}=\Big\{V\in\mathcal{V}\ |\ \forall x\in\partial\mathcal{R}(V),\quad\frac{\partial V}{\partial x}(x)\cdot f(x)=0\Big\} \tag{4}$$

and then get the following set inclusion:

$$\mathcal{V}_{Z}\subseteq\mathcal{V}^{*}.$$
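To make the boundary condition in (4) concrete, the sketch below evaluates $\frac{\partial V}{\partial x}\cdot f$ on the boundary of $\mathcal{R}(V)$ for two candidates. The scalar dynamics `f(x) = -x + x^3` and both candidates are hypothetical and chosen only because the boundaries are known in closed form. The example only checks the boundary condition; it does not verify the other requirements for membership in $\mathcal{V}$.

```python
def f(x):
    # Hypothetical scalar dynamics; the origin attracts every x in (-1, 1).
    return -x + x**3

def orbital_derivative(dV, x):
    # dV/dx(x) * f(x), the derivative of V along trajectories.
    return dV(x) * f(x)

# Candidate 1: V1(x) = x^2, so R(V1) = (-1, 1) with boundary {-1, 1}.
dV1 = lambda x: 2.0 * x
# Candidate 2: V2(x) = 4 x^2, so R(V2) = (-0.5, 0.5) with boundary {-0.5, 0.5}.
dV2 = lambda x: 8.0 * x

boundary_values_1 = [orbital_derivative(dV1, x) for x in (-1.0, 1.0)]
boundary_values_2 = [orbital_derivative(dV2, x) for x in (-0.5, 0.5)]

print(boundary_values_1)  # vanishes: the boundary condition in (4) holds
print(boundary_values_2)  # strictly negative: the estimate can still grow
```

The conservative candidate `V2` still decreases strictly on the boundary of its sublevel set, which signals that its region of attraction estimate is not the largest one.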

We can now state the problem addressed in this article.

Problem 2.2.

We want to find a $C^{\infty}$ approximation of a Lyapunov function $V^{*}$ leading to the largest region of attraction for the dynamical system (1) in terms of volume, i.e.

$$V^{*}\in\mathcal{V}_{Z}.$$

2.2 General remarks

Note that the assumption $x_{0}\in\mathcal{D}$ made when defining the system in (1) is not restrictive, as any bounded open interval can be rescaled and shifted to $(-1,1)$.

In the case of $x_{0}\in\mathbb{R}^{n}$, one can consider the non-linear transformation $\tanh$ to map $\mathbb{R}$ to $(-1,1)$. However, global asymptotic stability ($\mathcal{R}(V)=\mathcal{D}$) requires radial unboundedness of the Lyapunov function ($V\to\infty$ when $x\to\partial\mathcal{D}$) [2]. As in [8], define a maximal Lyapunov function as $V_{m}(x)=-\log(1-V(x))$. If $V$ is a local Lyapunov function as defined previously then $V_{m}$ is a Lyapunov function and $V_{m}(\mathcal{D}\backslash\mathcal{R}(V))=\infty$. Consequently, the radial unboundedness of $V_{m}$ is equivalent to

$$\forall x\in\partial\mathcal{D},\quad V(x)\geq 1.$$

This condition can be added to ensure global stability, but this point will not be discussed further in this article.
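The effect of the transformation $V_{m}(x)=-\log(1-V(x))$ can be seen on a small numerical example. The sketch below assumes the hypothetical local Lyapunov function `V(x) = x^2` on $(-1,1)$ and sets $V_{m}$ to infinity wherever $V(x)\geq 1$, which is one possible convention for the points outside $\mathcal{R}(V)$.

```python
import math

def V(x):
    # Hypothetical scaled local Lyapunov function with R(V) = (-1, 1).
    return x**2

def V_m(x):
    """Maximal Lyapunov function V_m = -log(1 - V), infinite outside R(V)."""
    v = V(x)
    if v >= 1.0:
        return math.inf
    return -math.log1p(-v)  # log1p keeps precision when v is small

points = (0.0, 0.5, 0.9, 0.99, 0.999)
values = [V_m(x) for x in points]

for x, value in zip(points, values):
    print(f"x = {x:5.3f}  V = {V(x):.6f}  V_m = {value:.4f}")
```

The values of $V_{m}$ grow without bound as $V$ approaches $1$, while $V$ itself stays bounded, which is why the bounded function $V$ is the more convenient target for a learning problem.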

Concerning the definition of the region of attraction, it differs slightly from the classical ones in [38, Section 12.2] or [2, Section 3.1], where $\mathcal{R}_{d}=\{x\in\mathcal{D}\ |\ V(x)<d\}$. The proposed version decreases the number of parameters by scaling the local Lyapunov function such that $V=d=1$ on the boundary of the region of attraction.

Regarding Assumption 1, it is not very restrictive. Indeed, any analytic function $f$ will admit such a decomposition [39]. The constant term can be removed by an appropriate change of variable such that the origin becomes an equilibrium point for (1). If at least one eigenvalue of $A$ has a strictly positive real part, then the equilibrium point is not asymptotically stable [38, Theorem 12.2]. However, if there is an eigenvalue on the imaginary axis, the equilibrium point might still be asymptotically stable [38, Example 12.1]. We do not deal with these corner cases in this article.
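In practice, Assumption 1 can be checked by linearizing $f$ at the origin and inspecting the eigenvalues of $A$. The sketch below does so with central finite differences; the time-reversed Van der Pol oscillator, the helper names `jacobian_at_origin` and `is_hurwitz`, and the step size are assumptions chosen for illustration.

```python
import numpy as np

def f(x):
    # Hypothetical example: time-reversed Van der Pol oscillator.
    x1, x2 = x
    return np.array([-x2, x1 - (1.0 - x1**2) * x2])

def jacobian_at_origin(f, n, eps=1e-6):
    """Central finite-difference approximation of A = df/dx at x = 0."""
    A = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        A[:, j] = (f(e) - f(-e)) / (2.0 * eps)
    return A

def is_hurwitz(A):
    """True if every eigenvalue of A has a strictly negative real part."""
    return bool(np.all(np.linalg.eigvals(A).real < 0.0))

A = jacobian_at_origin(f, n=2)
print(A)
print(is_hurwitz(A))
```

When symbolic derivatives or automatic differentiation are available, they are preferable to finite differences, since eigenvalues close to the imaginary axis are sensitive to approximation errors.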

3 Taylor-Neural Lyapunov functions

Finding a Lyapunov function $V$ in the set $\mathcal{V}_{Z}$ is challenging: it boils down to solving equations (2) together with $\left.\frac{\partial V}{\partial x}\cdot f\right|_{\partial\mathcal{R}(V)}=0$. This system of equations is generally numerically intractable. Similar to robust control theory, where the problem is conservatively relaxed by considering quadratic Lyapunov functions, we introduce Taylor-neural Lyapunov functions as universal approximations of maximal Lyapunov functions.

First, let us pick a Lyapunov function $V^{*}$ in $\mathcal{V}_{Z}$. Using Taylor expansion in several variables [40, Theorem 5.4], we get that for any $x\in\mathcal{D}$:

$$\begin{aligned}&V^{*}(x)=V^{*}(0)+\nabla V^{*}(0)\cdot x+\frac{1}{2}x^{\top}H^{*}x\\ &+\sum_{\substack{i_{1}+\cdots+i_{n}=3,\\ i_{k}\geq 0}}\underbrace{\int_{0}^{1}\frac{(1-t)^{2}}{2}\frac{\partial^{3}V^{*}}{\partial x_{1}^{i_{1}}\cdots\partial x_{n}^{i_{n}}}(t\cdot x)\,dt}_{R_{i_{1},\dots,i_{n}}(x)}\prod_{k=1}^{n}x_{k}^{i_{k}},\end{aligned} \tag{5}$$

where $H^{*}\neq 0$ is the Hessian of $V^{*}$ evaluated at $0$, $\nabla V^{*}$ is the gradient of $V^{*}$ and $R_{i_{1},\dots,i_{n}}\in C^{3}(\mathcal{D},\mathbb{R})$.

The following results are classical and related to the indirect Lyapunov method.

Lemma 3.3.
  1. $V^{*}(0)=\nabla V^{*}(0)=0$ and $H^{*}$ is a symmetric positive definite matrix.

  2. $A^{\top}H^{*}+H^{*}A\prec 0$.

Proof 3.4.
  1. Since $V\in C^{3}(\mathcal{D},\mathbb{R})$, $H^{*}$ is a symmetric matrix and the $R_{i_{1},\dots,i_{n}}$ are bounded. From (2a), we get that $V^{*}(0)=0$. Evaluated in a neighborhood of the origin, equation (2b) implies $\nabla V^{*}(0)=0$. Equations (2a) and (2b) then imply that $H^{*}$ is positive definite.

  2. Differentiating (5) and using item 1, we get:

    \[ \frac{\partial V^{*}}{\partial x}(x)=H^{*}x+o(\|x\|). \]

    Consequently, the time derivative along the trajectories of (1) is:

    \[ \begin{aligned} \frac{\partial V^{*}}{\partial x}(x)\cdot f(x) &= x^{\top}H^{*}Ax+o(\|x\|^{2}) \\ &= \frac{1}{2}x^{\top}\left(A^{\top}H^{*}+H^{*}A\right)x+o(\|x\|^{2}). \end{aligned} \]

    The second equality uses the symmetry of $H^{*}$, which gives $x^{\top}H^{*}Ax=x^{\top}A^{\top}H^{*}x$. Equation (2c) evaluated in a neighborhood of the origin then implies $A^{\top}H^{*}+H^{*}A\prec 0$.
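The two conclusions of the lemma can be checked numerically on a small case. The sketch below is only an illustration: the matrix `A` is a hypothetical stable linearization and `H` a hand-picked symmetric candidate, neither taken from the paper. Definiteness is tested with Sylvester's criterion, which applies here because both matrices are symmetric and of size $2\times 2$.

```python
def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    """Plain matrix product for small dense matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lyapunov_lhs(A, H):
    """Return A^T H + H A."""
    AtH = matmul(transpose(A), H)
    HA = matmul(H, A)
    n = len(A)
    return [[AtH[i][j] + HA[i][j] for j in range(n)] for i in range(n)]


def is_positive_definite_2x2(M):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    return M[0][0] > 0 and M[0][0] * M[1][1] - M[0][1] * M[1][0] > 0


def is_negative_definite_2x2(M):
    """M is negative definite if and only if -M is positive definite."""
    return is_positive_definite_2x2([[-v for v in row] for row in M])


# Hypothetical example (not from the paper): a damped oscillator linearization.
A = [[0.0, 1.0], [-1.0, -1.0]]
H = [[1.5, 0.5], [0.5, 1.0]]

M = lyapunov_lhs(A, H)
H_is_pd = is_positive_definite_2x2(H)
M_is_nd = is_negative_definite_2x2(M)
```

For this particular pair, `M` is the negative identity, so `H` satisfies both properties stated in the lemma.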

Using the previous lemma, we get:

\[ V^{*}(x)=\frac{1}{2}x^{\top}H^{*}x+\sum_{\substack{i_{1}+\cdots+i_{n}=3,\\ i_{k}\geq 0}}R_{i_{1},\dots,i_{n}}(x)\prod_{k=1}^{n}x_{k}^{i_{k}}. \tag{6} \]

We now introduce the notion of neural network residual.

Definition 3.5.

The neural network residual $\hat{R}_{N}$, where $N>0$ is the number of neurons per layer, is defined as

\[ \hat{R}_{N}(x)=W_{N_{l}+1}H_{W_{N_{l}},b_{N_{l}}}\circ\cdots\circ H_{W_{0},b_{0}}(x)+b_{N_{l}+1} \tag{7} \]

where $N_{l}\in\mathbb{N}\backslash\{0\}$ is the number of hidden layers, the weights are $W_{0}\in\mathbb{R}^{N\times n}$, $W_{i}\in\mathbb{R}^{N\times N}$ and $W_{N_{l}+1}\in\mathbb{R}^{n^{3}\times N}$, the biases are $b_{i}\in\mathbb{R}^{N}$, and

\[ H_{W,b}(x)=\psi.(Wx+b). \]

The parameters of the neural network are packed into the tensor $\Theta=\left\{(W_{i},b_{i})\right\}_{i=0,\dots,N_{l}+1}$.

The activation function $\psi$ is of class $C^{\infty}(\mathbb{R},\mathbb{R})$ and is bounded, together with its derivatives up to order $3$.
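A minimal sketch of the forward pass in (7) may help fix the shapes. It is written in plain Python under several assumptions that are not in the paper: $\psi.$ is read as the element-wise application of $\psi$, the activation is `tanh` (which is smooth with bounded derivatives), the initialization is a simple uniform draw, and the output bias is sized to match the output dimension. The helper names `affine`, `hidden_map`, `residual` and `init_theta` are hypothetical.

```python
import math
import random


def affine(W, b, x):
    """Return W x + b for a weight matrix W (list of rows) and bias b."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def hidden_map(W, b, x, psi=math.tanh):
    """H_{W,b}(x) = psi.(W x + b), with psi applied element-wise."""
    return [psi(z) for z in affine(W, b, x)]


def residual(theta, x):
    """Evaluate (7): hidden maps i = 0..N_l followed by an affine output layer."""
    *hidden, (W_out, b_out) = theta
    h = list(x)
    for W, b in hidden:
        h = hidden_map(W, b, h)
    return affine(W_out, b_out, h)


def init_theta(n, N, N_l, n_out, seed=0):
    """Random parameters with the layer shapes of the definition.

    Uniform initialization and zero biases are assumptions of this sketch.
    """
    rng = random.Random(seed)

    def layer(rows, cols):
        W = [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]
        return W, [0.0] * rows

    theta = [layer(N, n)]                        # W_0, b_0
    theta += [layer(N, N) for _ in range(N_l)]   # W_1 .. W_{N_l}
    theta.append(layer(n_out, N))                # W_{N_l+1}, b_{N_l+1}
    return theta


# Example: state dimension n = 2, so the output has n^3 = 8 components.
theta = init_theta(n=2, N=8, N_l=2, n_out=2 ** 3)
r = residual(theta, [0.3, -0.1])
```

With zero biases and `tanh`, the residual vanishes at the origin in this sketch. That is a property of this particular initialization, not a requirement stated in the definition.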

A Taylor-neural Lyapunov function can then be proposed as

\[ \hat{V}_{N}(x)=\frac{1}{2}x^{\top}Px+\sum_{i\in\mathcal{I}}\underbrace{\left\langle\hat{R}_{N}(x),e_{i}\right\rangle}_{\hat{R}_{N}^{(i)}(x)}\prod_{k=1}^{n}x_{k}^{i_{k}} \tag{8} \]

where $P\succ 0$ is such that $A^{\top}P+PA\prec 0$, $\mathcal{I}=\left\{(i_{1},\dots,i_{n})\in\{0,1,2,3\}^{n}\ |\ \sum_{k=1}^{n}i_{k}=3\right\}$ and $\{e_{i}\}_{i\in\mathcal{I}}$ is a given basis of $\mathbb{R}^{n^{3}}$.
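To make the structure of (8) concrete, the following sketch enumerates the index set $\mathcal{I}$ and evaluates $\hat{V}_{N}$ for a user-supplied residual. It assumes the canonical basis, so that $\langle\hat{R}_{N}(x),e_{i}\rangle$ simply selects one component, and it gives the residual exactly one output per multi-index; both choices are simplifications made for this illustration. The function names and the two stand-in residuals are hypothetical.

```python
import math
from itertools import product


def multi_indices(n, order=3):
    """All (i_1, ..., i_n) with entries in {0..order} that sum to `order`."""
    return [idx for idx in product(range(order + 1), repeat=n) if sum(idx) == order]


def monomial(x, idx):
    """Return prod_k x_k ** i_k."""
    return math.prod(xk ** ik for xk, ik in zip(x, idx))


def taylor_neural_V(x, P, residual):
    """Evaluate (8): quadratic part plus residual-weighted cubic monomials."""
    n = len(x)
    quad = 0.5 * sum(x[a] * P[a][b] * x[b] for a in range(n) for b in range(n))
    r = residual(x)
    cubic = sum(ri * monomial(x, idx) for ri, idx in zip(r, multi_indices(n)))
    return quad + cubic


# Stand-in residuals (assumptions): four outputs, one per multi-index for n = 2.
P = [[1.0, 0.0], [0.0, 1.0]]
zero_residual = lambda x: [0.0, 0.0, 0.0, 0.0]
const_residual = lambda x: [1.0, 0.0, 0.0, 0.0]

indices_2d = multi_indices(2)
v_quadratic = taylor_neural_V([1.0, 2.0], P, zero_residual)
v_with_cubic = taylor_neural_V([1.0, 2.0], P, const_residual)
```

Because every residual term is multiplied by a cubic monomial, the value, gradient and Hessian of $\hat{V}_{N}$ at the origin are fixed by $P$ alone, whatever the network outputs.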

Proposition 3.6.

Under Assumption 1, for any $\varepsilon\in(0,1)$, there exist $N>0$ and $\Theta$ such that $\hat{V}_{N}$ is a Lyapunov function and

\[ \left\{x\in\mathcal{D}\ |\ V^{*}(x)\leq 1-\varepsilon\right\}\subseteq\mathcal{R}(\hat{V}_{N})\subseteq\mathcal{R}(V^{*}). \tag{9} \]
Proof 3.7.

We provide a formal proof in Appendix .1 and briefly outline it here. Select one $V^{*}$ in $\mathcal{V}_{F}$ and write it as in (5). For $P=H^{*}$, the proposed Taylor-neural Lyapunov function in (8) is quadratic and as close to $V^{*}$ as desired in a sufficiently small neighborhood $\mathcal{N}$ around the origin. Outside of $\mathcal{N}$, one can choose $\hat{R}_{N}^{(i)}$ to be as close to $R_{i}$ as desired provided a sufficient number of neurons [41, Theorem 4], so that the region of attraction is approximated as well as desired and $\hat{V}_{N}$ is a Lyapunov function.

Remark 3.8.

Note that none of the previous works on the subject [29, 30, 27, 31, 21, 22, 12] could guarantee that the proposed neural network approximation is a Lyapunov function. In [21], the authors pointed this out by noticing that a neural network approximation of a Lyapunov function usually does not have a negative time derivative everywhere around the origin. Using a third-order Taylor expansion prevents this phenomenon, leading to the previous proposition.

We have used here a physics-informed machine learning approach, since we introduce a neural network approximation of a Lyapunov function and the constraints in (2) translate into partial differential equations. The next section focuses on the formulation of the optimization problem.

4 Learning a Taylor-Neural Lyapunov function

In this section, we discuss how to learn a Taylor-Neural Lyapunov function. We formulate the training problem as a suitable constrained optimization problem, encoding the properties of Lyapunov functions, and we propose an efficient solver for the resulting problem.

4.1 Optimization problem formulation

The Taylor-neural Lyapunov function proposed in (8) does not enforce any of the properties (2a)-(2c) of a Lyapunov function. We want to encode as many constraints as possible in the neural network architecture to minimize the work left to the learning procedure. To that end, we introduce the following slightly modified function for $\gamma\neq 0$:

\[ \tilde{V}_{N,\gamma,\varepsilon}(x)=\min\left\{1,\left|\hat{V}_{N}(x)\right|_{\varepsilon}\right\}+\gamma^{2}\|x\|^{2}, \tag{10} \]

where $|x|_{\varepsilon}=x\tanh\left(x\varepsilon^{-1}\right)$ is a smooth approximation of the absolute value such that, for any $x\in\mathbb{R}$, $\lim_{\varepsilon\to 0}|x|_{\varepsilon}=|x|$. Note that $\tilde{V}_{N,\gamma,\varepsilon}$ can be chosen as close to $\hat{V}_{N}$ as desired.
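The construction in (10) is short enough to write out directly. The sketch below is an illustration under assumptions: `V_hat` is a stand-in quadratic function rather than a trained Taylor-neural function, and the values of `gamma` and `eps` are arbitrary.

```python
import math


def smooth_abs(v, eps):
    """Smooth approximation of |v|: v * tanh(v / eps)."""
    return v * math.tanh(v / eps)


def V_tilde(x, V_hat, gamma, eps):
    """Evaluate (10): saturated smooth absolute value plus a quadratic term."""
    squared_norm = sum(xi * xi for xi in x)
    return min(1.0, smooth_abs(V_hat(x), eps)) + gamma ** 2 * squared_norm


# Stand-in for the Taylor-neural function (assumption of this sketch).
V_hat = lambda x: 0.5 * sum(xi * xi for xi in x)

at_origin = V_tilde([0.0, 0.0], V_hat, gamma=0.01, eps=0.01)
far_away = V_tilde([10.0, 0.0], V_hat, gamma=0.01, eps=0.01)
```

The saturation at 1 caps the first term, while the $\gamma^{2}\|x\|^{2}$ term keeps the function strictly positive away from the origin even where $\hat{V}_{N}$ vanishes.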

Remark 4.9.

From now on, to ease the reading, we use the abuse of notation $\tilde{V}=\tilde{V}_{N,\gamma,\varepsilon}$.

For $\gamma>0$, $\tilde{V}$ is positive definite by construction, as required by (2a) and (2b). For $\tilde{V}$ to be a Lyapunov function, i.e. $\tilde{V}\in\mathcal{V}$, it remains to satisfy (2c). Since we also want to approximate the largest region of attraction, we want $\tilde{V}\in\mathcal{V}_{Z}$. Combining these two facts leads to the formulation:

\[ \forall x\in\mathcal{R}(\tilde{V}),\quad\frac{\partial\tilde{V}}{\partial x}(x)\cdot f(x)=-\left(1-\tilde{V}(x)\right)\phi(x) \tag{11} \]

where $\phi$ is positive definite.

The positive definiteness of $\phi$ in $\mathcal{D}$ is equivalent to the existence of $\beta\neq 0$ such that $\phi(x)\geq\beta^{2}\|x\|^{2}$ for all $x\in\mathcal{D}$. Equality (11) then translates into

\[ \begin{split}&\exists\beta\neq 0,\quad\forall x\in\mathcal{R},\\ &\quad DV_{\beta}(x)=\frac{\partial\tilde{V}}{\partial x}(x)\cdot f(x)+\beta^{2}\left(1-\tilde{V}(x)\right)\|x\|^{2}\leq 0\end{split} \tag{12} \]

where, to ease the writing, we use $\mathcal{R}=\mathcal{R}(\tilde{V})$.
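As a sanity check of (11) and (12), consider a toy system that is not taken from the paper: $f(x)=-(1-\|x\|^{2})x$, whose region of attraction is the open unit ball, together with the candidate $V(x)=\|x\|^{2}$. This candidate is not of the form (10); it only illustrates the structure of the two relations.

```latex
\begin{aligned}
\frac{\partial V}{\partial x}(x)\cdot f(x)
  &= 2x^{\top}\left(-(1-\|x\|^{2})\,x\right)
   = -\left(1-V(x)\right)\,2\|x\|^{2}, \\
DV_{\beta}(x)
  &= -2\left(1-\|x\|^{2}\right)\|x\|^{2}
     + \beta^{2}\left(1-\|x\|^{2}\right)\|x\|^{2}
   = -\left(2-\beta^{2}\right)\left(1-\|x\|^{2}\right)\|x\|^{2}.
\end{aligned}
```

Hence (11) holds with $\phi(x)=2\|x\|^{2}$, inequality (12) holds on $\{\|x\|<1\}$ for any $\beta$ with $0<\beta^{2}\leq 2$, and $DV_{0}$ vanishes on the boundary $\|x\|=1$.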

Remark 4.10.

A similar equality appears in [12], whose authors state that equality constraints are much better handled by training algorithms. However, in their case, $\phi$ must be chosen as a neural network, which prevents the equality from being strictly enforced for all $x\in\mathcal{R}$ since the left-hand side can never equal the right-hand side.

Since (12) requires $DV_{\beta}\leq 0$ on $\mathcal{R}$, its positive part vanishes there, and integrating leads to

\[ \int_{\mathcal{R}}\Big[DV_{\beta}(x)\Big]_{+}^{2}dx=0. \tag{13} \]

Any Lyapunov function in $\mathcal{V}$ will satisfy (13). To ensure that $\tilde{V}_{N}\in\mathcal{V}_{F}$, that is, to maximize the region of attraction, one needs to add the objective that $\frac{\partial\tilde{V}_{N}}{\partial x}(x)\cdot f(x)=0$ for $x\in\partial\mathcal{R}$. This leads to the following optimization problem:

\[ P^{*},\Theta^{*}=\begin{array}[t]{cl}\displaystyle\operatorname*{arg\,min}_{P,\Theta,\beta\neq 0,\gamma\neq 0}&\displaystyle\int_{\partial\mathcal{R}}DV_{0}(s)^{2}\ ds\\ \text{s.t.}&\displaystyle\int_{\mathcal{R}}\Big[DV_{\beta}(x)\Big]_{+}^{2}dx=0.\end{array} \tag{14} \]

4.2 Numerical solution to the constrained optimization problem

The constrained optimization problem (14) is numerically intractable because of the integrals and the dynamical constraint. Therefore, we propose the training method described in Algorithm 1, which consists of the following steps:

  1. sample the integrals in the loss and in the constraint,

  2. formulate the problem as a Lagrangian optimization problem defined on the sampled points,

  3. apply a primal-dual strategy.

These steps lead to a practical and numerically efficient algorithm for solving (14).

Algorithm 1 Training a Taylor-neural Lyapunov function
Input: $N_{\textrm{epoch}},N_{\lambda},N_{1},N_{2},\alpha_{\lambda},\alpha_{v},\alpha_{\eta},\xi$
$P_{0},\Theta_{0},\gamma_{0},\beta_{0}\leftarrow I,\textrm{Xavier}(),0.01,1.0$
$\lambda_{0},\lambda_{1}\leftarrow 0.0,1.0$
Sample $N_{0}$ points from $\mathcal{D}_{0}$, $N_{1}$ points from $\mathcal{D}_{1}$
for $k=1\dots N_{\textrm{epoch}}$ do
     Update primal using (21)
     Update $\eta_{k}$ using (26)
     $P_{k+1}\leftarrow\text{Proj}_{\mathcal{C}(\gamma_{k+1})}(P_{k+1})$ $\triangleright$ Using the SDP (24)
     if $k \bmod N_{\lambda}$ is $0$ then
         Update dual using (22)
         Resample $\mathcal{D}_{1}$
     end if
     if Stopping criteria then
         Break
     end if
end for
Output: $P_{k},\Theta_{k},\gamma_{k},\beta_{k}$

4.2.1 Sampling of the integrals

The constraint is an integral over part of the domain $\mathcal{D}$. It is classical [17] to sample the whole domain uniformly using, for instance, Latin hypercube sampling. Drawing $N_{1}$ points from the uniform distribution, we get the discrete set $\mathcal{D}_{1}$. The novelty here is to consider only a part of this domain, leading to the following approximation:

\[ \int_{\mathcal{R}}\Big[DV_{\beta}(x)\Big]_{+}^{2}dx\ \simeq\ \frac{1}{\left|\bar{\mathcal{R}}\right|}\sum_{x\in\bar{\mathcal{R}}}\Big[DV_{\beta}(x)\Big]_{+}^{2} \tag{15} \]

where $\bar{\mathcal{R}}=\bar{\mathcal{R}}_{\mathcal{D}_{1}}=\left\{x\in\mathcal{D}_{1}\ |\ \tilde{V}(x)<1\right\}$ and $\left|\bar{\mathcal{R}}\right|$ is the cardinality of $\bar{\mathcal{R}}$.
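The sampled constraint (15) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the Latin hypercube sampler is a basic pure-Python version, the domain is taken as the square $[-2,2]^{2}$, and the dynamics and candidate function are the toy pair $f(x)=-(1-\|x\|^{2})x$, $V(x)=\|x\|^{2}$, which stands in for $\tilde{V}$. All function names are hypothetical.

```python
import random


def latin_hypercube(num_points, bounds, seed=0):
    """Basic Latin hypercube design: one point per stratum in every coordinate."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        strata = list(range(num_points))
        rng.shuffle(strata)
        width = (high - low) / num_points
        columns.append([low + (s + rng.random()) * width for s in strata])
    return [list(p) for p in zip(*columns)]


def f(x):
    """Toy dynamics (assumption): region of attraction is the open unit ball."""
    scale = 1.0 - sum(v * v for v in x)
    return [-scale * v for v in x]


def V(x):
    """Toy candidate standing in for V tilde."""
    return sum(v * v for v in x)


def grad_V(x):
    return [2.0 * v for v in x]


def DV(x, beta):
    """DV_beta(x) = grad V(x) . f(x) + beta^2 (1 - V(x)) ||x||^2, as in (12)."""
    lie_derivative = sum(g * fx for g, fx in zip(grad_V(x), f(x)))
    return lie_derivative + beta ** 2 * (1.0 - V(x)) * sum(v * v for v in x)


def constraint_loss(points, beta):
    """Right-hand side of (15): mean squared positive part over {V < 1}."""
    inside = [x for x in points if V(x) < 1.0]
    if not inside:
        return 0.0
    return sum(max(0.0, DV(x, beta)) ** 2 for x in inside) / len(inside)


D1 = latin_hypercube(200, [(-2.0, 2.0), (-2.0, 2.0)])
loss_feasible = constraint_loss(D1, beta=1.0)   # beta^2 <= 2: constraint holds
loss_violated = constraint_loss(D1, beta=2.0)   # beta^2 > 2: constraint fails
```

Only the samples with $V(x)<1$ contribute, which is the restriction to $\bar{\mathcal{R}}_{\mathcal{D}_{1}}$ described above. Since this set depends on the current function, it changes as the parameters are updated.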

Concerning the objective, one can proceed similarly by sampling points on the boundary $\partial\mathcal{R}$. We first draw $N_{0}$ points from $\partial\mathcal{D}$ to obtain the discrete set $\mathcal{D}_{0}$. Then we scale these points so that they fall on the boundary of the set $\mathcal{R}\cap\mathcal{D}$, i.e., for each point $x_{i}$ in $\mathcal{D}_{0}$, we create the variable $\eta_{i}\in(0,1]$ such that $\eta_{i}x_{i}\in\partial\mathcal{R}$. We then get

\[ \overline{\partial\mathcal{R}}_{\mathcal{D}_{0}}=\left\{\eta_{i}x_{i}\ |\ x_{i}\in\mathcal{D}_{0}\right\}. \]

With the previous definitions, $\overline{\partial\mathcal{R}}_{\mathcal{D}_{0}}\subset\partial\mathcal{R}$ and we get the following approximation of the integral:

\[ \int_{\partial\mathcal{R}}DV_{0}(s)^{2}\ ds\ \simeq\ \frac{1}{\left|\overline{\partial\mathcal{R}}\right|}\sum_{s\in\overline{\partial\mathcal{R}}}DV_{0}(s)^{2} \tag{16} \]
Remark 4.11.

The previous approximation is correct if the set $\mathcal{R}(\tilde{V}_{N})$ is a star domain at the origin (a set $A$ is a star domain at $x_{0}$ if, for all $x\in A$, the line segment $[x_{0},x]$ lies in $A$). Otherwise, there might not be a unique $\eta$ for each point, and consequently, the sampling will not be uniform at the boundary.
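The scaling of boundary samples can be illustrated as follows. In this sketch, each $\eta_{i}$ is found by bisection on the ray through $x_{i}$, which is a stand-in chosen for simplicity and relies on the star-domain property for uniqueness; the paper instead treats $\eta$ as a variable updated during training using (26). The toy pair $f(x)=-(1-\|x\|^{2})x$, $V(x)=\|x\|^{2}$ and the square domain are assumptions, and all function names are hypothetical.

```python
import math


def V(x):
    """Toy candidate standing in for V tilde; its unit level set is the unit circle."""
    return sum(v * v for v in x)


def DV0(x):
    """grad V(x) . f(x) for the toy dynamics f(x) = -(1 - ||x||^2) x."""
    scale = 1.0 - V(x)
    return sum(2.0 * v * (-scale * v) for v in x)


def boundary_scale(x, level_fn, tol=1e-10):
    """Find eta in (0, 1] with level_fn(eta * x) = 1 by bisection.

    Assumes level_fn(0) < 1 <= level_fn(x) and a star-shaped sublevel set,
    so that the crossing along the ray is unique.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if level_fn([mid * v for v in x]) < 1.0:
            lo = mid
        else:
            hi = mid
    return hi


def sample_box_boundary(num_points, half_width):
    """Hypothetical D_0: equally spaced directions projected onto a square boundary."""
    points = []
    for k in range(num_points):
        angle = 2.0 * math.pi * k / num_points
        c, s = math.cos(angle), math.sin(angle)
        scale = half_width / max(abs(c), abs(s))
        points.append([scale * c, scale * s])
    return points


D0 = sample_box_boundary(16, 2.0)
etas = [boundary_scale(x, V) for x in D0]
boundary_points = [[eta * v for v in x] for eta, x in zip(etas, D0)]

# Sampled objective (16): mean of DV_0 squared over the scaled points.
objective = sum(DV0(s) ** 2 for s in boundary_points) / len(boundary_points)
```

For the toy pair, the scaled points land on the unit circle and the sampled objective is numerically zero, which is the situation the objective in (14) aims for.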

A sampled version of (14) is then:

\[ P^{*},\Theta^{*}=\begin{array}[t]{cl}\displaystyle\operatorname*{arg\,min}_{P,\Theta,\beta\neq 0,\gamma\neq 0}&\displaystyle\frac{1}{\left|\overline{\partial\mathcal{R}}_{\mathcal{D}_{0}}\right|}\sum_{s\in\overline{\partial\mathcal{R}}_{\mathcal{D}_{0}}}DV_{0}(s)^{2}\\ \text{s.t.}&\displaystyle\frac{1}{\left|\bar{\mathcal{R}}_{\mathcal{D}_{1}}\right|}\sum_{x\in\bar{\mathcal{R}}_{\mathcal{D}_{1}}}\Big[DV_{\beta}(x)\Big]_{+}^{2}=0\end{array} \tag{17} \]

for any discrete $\mathcal{D}_{0}\subseteq\mathcal{D}$ and $\mathcal{D}_{1}\subseteq\mathcal{D}$.

4.2.2 Lagrangian formulation

The optimization problem in (17) is a learning problem under constraints. Some techniques to solve them are explored in [42] and we will follow the strategy mentioned in [43, 44]. Let first define the following costs:

0(P,Θ,γ)=1|¯𝒟0|s¯𝒟0DV0(s)2,1(P,Θ,γ,β)=1|¯𝒟1|x¯𝒟1[DVβ(x)]+2.subscript0𝑃Θ𝛾1subscript¯subscript𝒟0subscript𝑠subscript¯subscript𝒟0𝐷subscript𝑉0superscript𝑠2subscript1𝑃Θ𝛾𝛽1subscript¯subscript𝒟1subscript𝑥subscript¯subscript𝒟1superscriptsubscriptdelimited-[]𝐷subscript𝑉𝛽𝑥2\begin{array}[]{l}\displaystyle\mathcal{L}_{0}(P,\Theta,\gamma)=\frac{1}{\left% |\bar{\partial\mathcal{R}}_{\mathcal{D}_{0}}\right|}\sum_{s\in\bar{\partial% \mathcal{R}}_{\mathcal{D}_{0}}}DV_{0}(s)^{2},\\ \displaystyle\mathcal{L}_{1}(P,\Theta,\gamma,\beta)=\frac{1}{\left|\bar{% \mathcal{R}}_{\mathcal{D}_{1}}\right|}\sum_{x\in\bar{\mathcal{R}}_{\mathcal{D}% _{1}}}\Big{[}DV_{\beta}(x)\Big{]}_{+}^{2}.\end{array}start_ARRAY start_ROW start_CELL caligraphic_L start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( italic_P , roman_Θ , italic_γ ) = divide start_ARG 1 end_ARG start_ARG | over¯ start_ARG ∂ caligraphic_R end_ARG start_POSTSUBSCRIPT caligraphic_D start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT | end_ARG ∑ start_POSTSUBSCRIPT italic_s ∈ over¯ start_ARG ∂ caligraphic_R end_ARG start_POSTSUBSCRIPT caligraphic_D start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_D italic_V start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( italic_s ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT , end_CELL end_ROW start_ROW start_CELL caligraphic_L start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_P , roman_Θ , italic_γ , italic_β ) = divide start_ARG 1 end_ARG start_ARG | over¯ start_ARG caligraphic_R end_ARG start_POSTSUBSCRIPT caligraphic_D start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT | end_ARG ∑ start_POSTSUBSCRIPT italic_x ∈ over¯ start_ARG caligraphic_R end_ARG start_POSTSUBSCRIPT caligraphic_D start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_D italic_V start_POSTSUBSCRIPT italic_β end_POSTSUBSCRIPT ( italic_x ) ] start_POSTSUBSCRIPT + end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT . end_CELL end_ROW end_ARRAY (18)

We use the Lagrange multipliers $\lambda_{0},\lambda_{1}>0$. The extended cost is expressed as:

$$\mathcal{L}_{\lambda_{0},\lambda_{1}}(P,\Theta,\gamma,\beta)=\lambda_{0}\mathcal{L}_{0}(P,\Theta,\gamma)+\lambda_{1}\mathcal{L}_{1}(P,\Theta,\gamma,\beta). \tag{19}$$
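To make the two costs concrete, the following minimal sketch evaluates (18) and (19) from lists of already-computed derivative values. The function names and the numerical values of $DV_0$ and $DV_\beta$ are hypothetical and only serve as an illustration; in practice these values would come from automatic differentiation of the network.

```python
def boundary_cost(dv0_boundary):
    """L0 in (18): mean of DV_0(s)^2 over the sampled boundary points."""
    return sum(d * d for d in dv0_boundary) / len(dv0_boundary)


def decrease_cost(dvbeta_interior):
    """L1 in (18): mean of the squared positive part [DV_beta(x)]_+^2."""
    return sum(max(d, 0.0) ** 2 for d in dvbeta_interior) / len(dvbeta_interior)


def extended_cost(lam0, lam1, dv0_boundary, dvbeta_interior):
    """Weighted sum (19) of the two costs."""
    return lam0 * boundary_cost(dv0_boundary) + lam1 * decrease_cost(dvbeta_interior)


# Hypothetical derivative values on a boundary sample and an interior sample.
dv0 = [0.5, -0.5]
dvb = [-1.0, 2.0, -3.0, 0.0]

l0 = boundary_cost(dv0)                    # every point contributes
l1 = decrease_cost(dvb)                    # only the point with DV_beta > 0 contributes
total = extended_cost(2.0, 3.0, dv0, dvb)
```

Only the positive values of $DV_\beta$ are penalized in $\mathcal{L}_1$, whereas $\mathcal{L}_0$ penalizes any nonzero value of $DV_0$ on the boundary sample.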

Problem (14) is then equivalent to solving

$$P^{*},\Theta^{*}=\operatorname*{arg\,min}_{\substack{P,\Theta,\\ \beta\neq 0,\gamma\neq 0}}\ \max_{\lambda_{1}}\ \mathcal{L}_{\lambda_{0},\lambda_{1}}(P,\Theta,\gamma,\beta). \tag{20}$$

Since $\mathcal{L}_{1}\geq 0$, the inner maximization sends $\lambda_{1}$ to $\infty$ whenever $\mathcal{L}_{1}>0$. A solution to the previous problem therefore corresponds to $\lambda_{1}=\infty$, which enforces $\Big[DV_{\beta}(x)\Big]_{+}=0$ for all $x\in\mathcal{R}$. As discussed in the next section, this rephrased problem can be solved using primal-dual optimization.

4.2.3 Training algorithm

The training algorithm is divided into several parts, each contributing to the overall robustness.

The primal-dual algorithm

The training algorithm is based on primal-dual optimization [45]. A similar training scheme has been used and investigated in [43, 46] for physics-informed machine learning problems and has shown great potential to improve the robustness of the training algorithm (i.e. decrease the sensitivity to the initialization). The main idea of this algorithm is to alternate between solving the $\min$ and $\max$ problems. The primal problem is expressed in terms of the primal variables $v_{k}=(P_{k},\ \Theta_{k},\ \gamma_{k},\ \beta_{k})$, and a first-order algorithm is generally written as:

$$v_{k+1}=v_{k}-\alpha_{v}\left(\nabla_{P},\nabla_{\Theta},\nabla_{\gamma},\nabla_{\beta}\right)\cdot\mathcal{L}_{\lambda_{0},\lambda_{1}}(P_{k},\Theta_{k},\gamma_{k},\beta_{k}) \tag{21}$$

where $P_{0},\Theta_{0},\beta_{0},\gamma_{0}$ are initial random values and $\alpha_{v}$ is the primal learning rate.

Remark 4.12.

Note that the update (21) can also be modified to include momentum and increase the robustness of the training algorithm (see Adam [47]).

The dual problem aims at approximating the solution to the $\max$ problem. Let the dual variables be $\bm{\lambda}_{k}=\begin{bmatrix}\lambda_{0}(k)&\lambda_{1}(k)\end{bmatrix}^{\top}$, which leads to the following first-order optimization scheme:

$$\begin{array}{rl}\bm{\lambda}_{k+1}&=\bm{\lambda}_{k}+\alpha_{\lambda}\nabla_{\bm{\lambda}}\mathcal{L}_{\lambda_{0},\lambda_{1}}(P,\Theta,\gamma,\beta)\\ &\displaystyle=\bm{\lambda}_{k}+\alpha_{\lambda}\begin{bmatrix}\mathcal{L}_{0}(P,\Theta,\gamma)\\ \mathcal{L}_{1}(P,\Theta,\gamma,\beta)\end{bmatrix}\end{array} \tag{22}$$

with $\alpha_{\lambda}$ being the dual learning rate.

Remark 4.13.

In light of curriculum learning [48], it has been shown that the constraint containing derivatives usually brings complexity into the original optimization problem [49]. Consequently, a solution to get more robust training is to start with $\lambda_{0}=0,\lambda_{1}=1$ and increase their values. Thus, the objective will be taken into account at a later stage, putting the focus on the constraint first. Note that $\bm{\lambda}$ is nondecreasing since both $\mathcal{L}_{0}$ and $\mathcal{L}_{1}$ are nonnegative.
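The alternation between the primal update (21) and the dual update (22) can be sketched on a toy problem. Everything in this example is an assumption made for illustration: the two-dimensional primal variable `v`, the stand-in costs `cost0` and `cost1`, and the learning rates. It is not the Lyapunov training problem itself, only the same update structure with the initialization $\lambda_0=0$, $\lambda_1=1$ mentioned above.

```python
def cost0(v):
    """Toy stand-in for L0 (objective): zero when v[0] + v[1] = 1."""
    return (v[0] + v[1] - 1.0) ** 2


def cost1(v):
    """Toy stand-in for L1 (constraint): squared positive part of v[0] - 0.25."""
    return max(v[0] - 0.25, 0.0) ** 2


def grad_extended_cost(v, lam):
    """Gradient with respect to v of lam[0] * cost0(v) + lam[1] * cost1(v)."""
    e = 2.0 * lam[0] * (v[0] + v[1] - 1.0)
    d = 2.0 * lam[1] * max(v[0] - 0.25, 0.0)
    return [e + d, e]


def primal_dual(v, lam, alpha_v=0.05, alpha_lam=0.1, iters=2000):
    v, lam = list(v), list(lam)
    for _ in range(iters):
        g = grad_extended_cost(v, lam)
        # Primal descent step, as in (21).
        v = [vi - alpha_v * gi for vi, gi in zip(v, g)]
        # Dual ascent step, as in (22): each multiplier grows by its own cost.
        lam = [lam[0] + alpha_lam * cost0(v), lam[1] + alpha_lam * cost1(v)]
    return v, lam


# Start with lambda_0 = 0 and lambda_1 = 1 so that the constraint dominates first.
v_star, lam_star = primal_dual([1.0, 1.0], [0.0, 1.0])
```

In this toy setting both costs can vanish simultaneously, so the multipliers stop growing once the iterate is feasible and optimal.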

Projection subroutine

The advantage of Taylor-neural Lyapunov functions lies in their explainability locally around the origin. Since $\hat{V}_{N,\gamma}(x)=x^{\top}(P+\gamma^{2}I)x+o(\|x\|^{2})$, using Lemma 3.3, the following condition should hold:

$$\exists\varepsilon>0,\ \forall x\in\mathbb{R}^{n},\quad\|x\|^{2}<\varepsilon\ \Rightarrow\ x^{\top}\left[A^{\top}(P+\gamma^{2}I)+(P+\gamma^{2}I)A\right]x\leq 0.$$

The previous inequality implies that the matrix $P$ must belong to the positive cone

$$\mathcal{C}(\gamma)=\left\{P\in\mathbb{S}^{n}_{+}\ |\ A^{\top}(P+\gamma^{2}I)+(P+\gamma^{2}I)A\prec 0\right\}.$$
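For a given numerical matrix, membership in $\mathcal{C}(\gamma)$ can be tested with two definiteness checks. The sketch below uses a plain Cholesky attempt as the definiteness test and, as a simplifying assumption, requires $P\succ 0$ strictly; the helper names and the example matrices are hypothetical.

```python
def is_pos_def(M, tol=1e-12):
    """Try a Cholesky factorization; it succeeds iff the symmetric M is positive definite."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def in_cone(P, A, gamma):
    """Check P > 0 and A^T (P + gamma^2 I) + (P + gamma^2 I) A < 0."""
    n = len(P)
    Q = [[P[i][j] + (gamma ** 2 if i == j else 0.0) for j in range(n)] for i in range(n)]
    AtQ, QA = matmul(transpose(A), Q), matmul(Q, A)
    neg_lyap = [[-(AtQ[i][j] + QA[i][j]) for j in range(n)] for i in range(n)]
    return is_pos_def(P) and is_pos_def(neg_lyap)


identity = [[1.0, 0.0], [0.0, 1.0]]
A_stable = [[-1.0, 0.0], [0.0, -2.0]]     # hypothetical Hurwitz linearization
A_unstable = [[1.0, 0.0], [0.0, -2.0]]    # hypothetical unstable linearization
```

With these example matrices, `in_cone(identity, A_stable, 0.5)` holds, while replacing `A_stable` by `A_unstable` makes the Lyapunov inequality fail.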

Checking if a matrix $P$ belongs to this positive cone is a semi-definite program. However, after an update step (21), the obtained $P_{k+1}$ does not necessarily belong to $\mathcal{C}(\gamma_{k+1})$. To enforce this property, a projection onto the positive cone is needed, defined as

$$\text{Proj}_{\mathcal{C}(\gamma_{k})}(P_{k})=\operatorname*{arg\,min}_{\hat{P}\in\mathcal{C}(\gamma_{k})}\|P_{k}-\hat{P}\|^{2}, \tag{23}$$

where the norm on the symmetric definite matrix cone is defined as the spectral radius. However, in this form the problem is not a semidefinite program. Using the Schur complement [7, Section 2.1] leads to the following equivalent formulation for $\alpha>0$:

$$\begin{array}{l}\left(P_{k}-\hat{P}\right)^{\top}\left(P_{k}-\hat{P}\right)\preceq\alpha I\quad\Leftrightarrow\\ \hfill M_{P_{k}}(\alpha,\hat{P})=\begin{bmatrix}\alpha I&P_{k}-\hat{P}\\ P_{k}-\hat{P}&I\end{bmatrix}\succeq 0.\end{array}$$
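The equivalence can be unpacked in a few lines. The following worked derivation is a sketch that introduces the shorthand $D$ (not used in the original text) and relies on the fact that both $P_k$ and $\hat{P}$ are symmetric, so that the spectral radius of $D$ coincides with its spectral norm.

```latex
\begin{aligned}
D &:= P_{k}-\hat{P} = D^{\top}, \\
\begin{bmatrix}\alpha I & D\\ D & I\end{bmatrix}\succeq 0
  &\iff \alpha I - D\,I^{-1}D^{\top}\succeq 0
  && \text{(Schur complement of the block } I\succ 0\text{)} \\
  &\iff D^{\top}D\preceq\alpha I \\
  &\iff \lambda_{\max}\!\left(D^{2}\right)=\rho(D)^{2}\leq\alpha .
\end{aligned}
```

Minimizing $\alpha$ under this constraint therefore minimizes the squared spectral radius of $P_{k}-\hat{P}$, which is the objective in (23).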

Matrix $M_{P_{k}}$ is linear in each of its variables. The projection can be rewritten into a linear matrix inequality problem and thus solved efficiently using e.g. cvxpy [50] since it is a semi-definite program (SDP):

$$\text{Proj}_{\mathcal{C}(\gamma)}(P)=\begin{array}[t]{cl}\displaystyle\operatorname*{arg\,min}_{\hat{P}\in\mathcal{C}(\gamma)}\min_{\alpha}&\alpha\\ \text{s.t.}&M_{P}(\alpha,\hat{P})\succeq 0.\end{array} \tag{24}$$

Boundary estimate subroutine

To correctly estimate the boundary cost (16), which is related to the largest region of attraction, one must choose the correct parameters $\left\{\eta_{i}\right\}_{i}$ such that, for a given sampling $\left\{x_{i}\right\}_{i}=\mathcal{D}_{0}\subset\partial\mathcal{D}$, we get $\eta_{i}x_{i}\in\partial\mathcal{R}$. We use a first-order optimization scheme, defined as follows for $\xi>0$:

$$g_{x}(\eta)=\left\{\begin{array}{ll}1-\tilde{V}(\eta x)&\text{ if }\eta x\in\bar{\mathcal{R}},\\ -\xi\eta&\text{ otherwise.}\end{array}\right. \tag{25}$$

$$\left\{\begin{array}{l}\eta_{i}(k+1)=\eta_{i}(k)+\alpha_{\eta}(k)g_{x_{i}}(\eta_{i}(k)),\\ \eta_{i}(0)=1.\end{array}\right. \tag{26}$$

Lemma 4.14.

Let $\xi>0$, $\alpha_{\eta}\in(0,\bar{\alpha}]$ such that $\xi\bar{\alpha}\in(0,1)$.

If there exists a unique $\delta_{i}>0$ such that $\delta_{i}x_{i}\in\partial\mathcal{R}\setminus\partial\mathcal{D}$, then the following holds:

$$\exists K>0,\ \forall k>K,\quad\frac{\eta_{i}(k)-\delta_{i}}{\bar{\alpha}}\in[-\xi\delta_{i},1].$$

Proof 4.15.

Let $x_{i}\in\partial\mathcal{D}$. If there exists a unique $\delta_{i}\in(0,1)$ such that $\delta_{i}x_{i}\in\partial\mathcal{R}\setminus\partial\mathcal{D}$, then the function $g_{x_{i}}$ can be equivalently written as:

$$g_{x_{i}}(\eta_{i})=\left\{\begin{array}{ll}1-\tilde{V}(\eta_{i}x_{i})&\text{ if }\eta_{i}\leq\delta_{i},\\ -\xi\eta_{i}&\text{ otherwise.}\end{array}\right.$$

Since $\eta_{i}(0)=1>\delta_{i}$ and $1-\xi\bar{\alpha}\in(0,1)$, the sequence decreases toward $0$ as long as $\eta_{i}(k)>\delta_{i}$. The smallest attainable value in that case is $\inf_{\eta_{i}>\delta_{i}}(1-\xi\bar{\alpha})\eta_{i}=(1-\xi\bar{\alpha})\delta_{i}$. Since $\tilde{V}(\eta_{i}x_{i})\leq 1$, $g_{x_{i}}(\eta_{i})$ is nonnegative when $\eta_{i}\leq\delta_{i}$. Consequently, $\eta_{i}(k)\geq(1-\xi\bar{\alpha})\delta_{i}$ at any $k$.

Since $\eta_{i}$ behaves as a geometric sequence with a common ratio $1-\xi\bar{\alpha}\in(0,1)$ while it stays above $\delta_{i}$, there exists $K>0$ such that $\eta_{i}(K)>\delta_{i}$ and $\eta_{i}(K+1)\leq\delta_{i}$. After this $K$, the largest attainable value is $\sup_{\eta_{i}\leq\delta_{i}}\eta_{i}+\alpha_{\eta}(k)\left(1-\tilde{V}(\eta_{i}x_{i})\right)\leq\delta_{i}+\bar{\alpha}$ since $0\leq\tilde{V}\leq 1$.

Combining these two facts leads to

$$\forall k>K,\quad\eta_{i}(k)\in[(1-\xi\bar{\alpha})\delta_{i},\ \delta_{i}+\bar{\alpha}] \tag{27}$$

which concludes the proof.
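The behavior described by (27) can be observed on a one-dimensional toy example along a single ray. This is only a sketch under stated assumptions: the boundary location `delta`, the function `v_tilde` (chosen so that $0\leq\tilde{V}\leq 1$ inside the region), the constant step `alpha`, and the use of `eta <= delta` as the membership test are all hypothetical choices for illustration.

```python
def g(eta, delta, xi, v_tilde):
    """Direction (25) along one ray, with eta <= delta as the membership test."""
    return 1.0 - v_tilde(eta) if eta <= delta else -xi * eta


def boundary_iterates(delta, xi, alpha, v_tilde, steps):
    """Run the update (26) with a constant step and return all iterates."""
    eta = 1.0                      # eta_i(0) = 1
    history = [eta]
    for _ in range(steps):
        eta = eta + alpha * g(eta, delta, xi, v_tilde)
        history.append(eta)
    return history


delta, xi, alpha = 0.5, 1.0, 0.05  # xi * alpha lies in (0, 1)


def v_tilde(eta):
    """Hypothetical profile of V along the ray; it stays below 1 for eta <= delta."""
    return 0.8 * (eta / delta) ** 2


hist = boundary_iterates(delta, xi, alpha, v_tilde, steps=200)
tail = hist[20:]                   # iterates after the first crossing of delta
lo, hi = (1.0 - xi * alpha) * delta, delta + alpha
```

After the first crossing of `delta`, the iterates in `tail` oscillate around the boundary while staying inside the interval `[lo, hi]` given by (27).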

The previous lemma ensures that we can approximate the boundary of $\mathcal{R}$ arbitrarily closely provided that the learning rate $\alpha_{\eta}$ is sufficiently small. In some specific cases, we can even prove that $\lim_{k\to\infty}\tilde{V}(\eta_{i}(k)x_{i})=1$.

Proposition 4.16.

Under the same conditions as in Lemma 4.14 together with

  1. $\frac{\partial\tilde{V}}{\partial x_{i}}(\delta_{i})\neq 0$;

  2. $\alpha_{\eta}$ is strictly decreasing with $\lim_{k\to\infty}\alpha_{\eta}(k)=0$ and $\sum_{k}\alpha_{\eta}(k)$ is diverging,

then the following holds:

$$\lim_{k\to\infty}\eta_{i}(k)=\delta_{i}.$$

Proof 4.17.

A sketch of the proof is as follows. Lemma 4.14 applies, but because of the divergence of the series $\sum\alpha_{\eta}$ together with 1), there will be $K_{2}>K$ such that $\eta_{i}(K_{2})>\delta_{i}$. We can apply Lemma 4.14 again, but $\bar{\alpha}$ has decreased and consequently the convergence interval (27) is tighter around $\delta_{i}$. Since $\alpha_{\eta}$ strictly decreases to $0$, $\eta_{i}$ can be made as close as desired to $\delta_{i}$.
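The effect of a diminishing learning rate can be illustrated on the same kind of one-dimensional toy ray. The schedule $\alpha_\eta(k)=0.2/\sqrt{k+1}$ is an assumption chosen here because it is strictly decreasing, tends to $0$, and has a divergent sum; `delta` and `v_tilde` are hypothetical as well.

```python
import math


def v_tilde(eta, delta):
    """Hypothetical profile of V along the ray, with nonzero slope at eta = delta."""
    return 0.8 * (eta / delta) ** 2


def estimate_boundary(delta, xi, steps):
    """Run (26) with the diminishing step alpha(k) = 0.2 / sqrt(k + 1)."""
    eta = 1.0
    for k in range(steps):
        alpha_k = 0.2 / math.sqrt(k + 1)
        if eta <= delta:
            direction = 1.0 - v_tilde(eta, delta)   # inside: push outward
        else:
            direction = -xi * eta                   # outside: pull inward
        eta = eta + alpha_k * direction
    return eta


delta, xi = 0.5, 1.0
eta_short = estimate_boundary(delta, xi, steps=50)
eta_long = estimate_boundary(delta, xi, steps=20000)
```

The oscillation around `delta` shrinks with the step size, so `eta_long` ends up much closer to `delta` than the constant-step bound (27) would guarantee for the initial step.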

Resampling subroutine

Inequality (15) holds if the law of large numbers applies, which requires $\left|\bar{\mathcal{R}}\right|$ to be large. This might impact the training time and the efficiency of the solver [51]. That is why we can consider resampling regularly during the training with fewer points [52]. This has two advantages:

  1. it keeps the computational burden low,

  2. each resampling will bring new gradient information for (21) and prevent redundancy (see [53, Section 8.1.3]).
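A periodic resampling schedule of this kind can be sketched as a generator that yields the current batch at each training iteration. The uniform sampling in a box, the function names, and the parameters `resample_every` and `batch_size` are hypothetical choices for illustration; the actual sampling domain and distribution may differ.

```python
import random


def sample_box(num_points, lower, upper, rng):
    """Draw points uniformly in the box [lower, upper] (stand-in for the sampled region)."""
    return [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
            for _ in range(num_points)]


def training_batches(num_iters, resample_every, batch_size, lower, upper, seed=0):
    """Yield (iteration, batch); the batch is redrawn every `resample_every` iterations."""
    rng = random.Random(seed)
    batch = sample_box(batch_size, lower, upper, rng)
    for k in range(num_iters):
        if k > 0 and k % resample_every == 0:
            # Fresh points of the same (small) size bring new gradient information.
            batch = sample_box(batch_size, lower, upper, rng)
        yield k, batch


batches = list(training_batches(10, 5, 4, [-1.0, -1.0], [1.0, 1.0]))
```

The batch size stays constant, so the cost per iteration does not grow, while the points used in the primal update change every `resample_every` iterations.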

Remark 4.18.

Note that we do not resample $\mathcal{D}_{0}$ because each original $x_{i}\in\mathcal{D}_{0}$ is associated with a parameter $\eta_{i}$.

Stopping criteria

Since $\tilde{V}_{N,\gamma}$ is a universal approximator of the optimal Lyapunov function, the optimal solution has $\mathcal{L}_{\lambda_{0},\lambda_{1}}(P,\Theta,\gamma,\beta)=0$ for any sampling and any $\lambda_{0},\lambda_{1}>0$. We stop the training when $\mathcal{L}_{\lambda_{0},\lambda_{1}}(P,\Theta,\gamma,\beta)<\varepsilon$ for multiple different samplings, where $\varepsilon$ is the machine precision.

We might want to stop the training early if the algorithm has converged to a suboptimal solution. This is indicated by a slow variation of both $\mathcal{L}_{0}$ and $\mathcal{L}_{1}$. A possibility is then to add a refinement step, which consists of freezing some variables and updating the others with $\lambda_{0}=0$ and $\lambda_{1}=1$. This steers the training toward a valid Lyapunov function at the expense of optimality.

Once the training has stopped, the Lyapunov function can be verified on multiple different samplings. If there is one point $s^{*}\in\mathcal{D}$ such that $DV_{0}(s^{*})>0$, then one can rescale the Lyapunov function to exclude this point, i.e.

$$\tilde{V}\leftarrow\tilde{V}(s^{*})^{-1}\cdot\tilde{V}.$$
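The rescaling step amounts to dividing the function by its value at the violating point. The sketch below assumes that the region estimate is the sublevel set where the function is below $1$, so that after rescaling the point $s^{*}$ sits exactly on the level $1$ and is excluded; the quadratic `v_toy` and the point `s_star` are hypothetical.

```python
def rescale(v_tilde, s_star):
    """Return x -> V(x) / V(s_star), which takes the value 1 at s_star."""
    scale = v_tilde(s_star)
    if scale <= 0.0:
        raise ValueError("V(s_star) must be positive to rescale")
    return lambda x: v_tilde(x) / scale


def v_toy(x):
    """Hypothetical candidate function used only for illustration."""
    return x[0] ** 2 + x[1] ** 2


s_star = [0.5, 0.0]                 # hypothetical point where DV_0 > 0 was detected
v_new = rescale(v_toy, s_star)
```

Points with a smaller value than `s_star` under the original function remain strictly inside the rescaled sublevel set.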

5 Verifying a Taylor-Neural Lyapunov function

The previous algorithm is not guaranteed to converge to a valid Lyapunov function, which means that there might exist a point $s\in\mathcal{R}(\tilde{V})$ that violates the constraint, i.e. for which $DV_{0}(s)>0$. Existing works such as [12, 33, 31] use a verifier to ensure that the optimized neural Lyapunov function is indeed a Lyapunov function. This does not extend straightforwardly to our case. In this section, we will instead derive conditions on the sampling $\bar{\mathcal{R}}$ which guarantee that $DV_{0}<0$ on a compact set strictly included in $\mathcal{R}$.

5.1 Lipschitz continuity of $DV_{0}$

We first show that $DV_{0}$ has bounded variations in the following two lemmas.

Lemma 5.19.

The function $\partial_{x}\tilde{V}$ is Lipschitz continuous on $\mathcal{D}$, i.e. for any $i\in\{1,\dots,n\}$

$$\forall x,y\in\mathcal{D},\quad|\partial_{x_{i}}\tilde{V}(x)-\partial_{x_{i}}\tilde{V}(y)|\leq L_{\partial V}\|x-y\|$$

Proof 5.20.

Since $\partial_{x_{i}}\tilde{V}$ is continuous and almost everywhere differentiable for any $i$, its derivative $\partial_{x}\partial_{x_{i}}\tilde{V}$ is bounded. Therefore, $\partial_{x_{i}}\tilde{V}$ is Lipschitz continuous, and $L_{\partial V}$ is the maximum over all $i\in\{1,\dots,n\}$ of the norm of this derivative on $\mathcal{D}$.

Some works focus on computing the Lipschitz constant of neural networks [54, 55]. However, in our case, $L_{\partial V}$ is the Lipschitz constant of the derivative of a neural network. Considering the hyperbolic tangent as the activation function, a symbolic upper bound of $L_{\partial V}$ can be derived manually, but this goes beyond the scope of this paper.

Lemma 5.21.

Let $f$ be Lipschitz continuous on $\mathcal{D}$ with Lipschitz constant $L_{f}$. The function $DV_{0}$ is Lipschitz continuous on $\mathcal{D}$ with Lipschitz constant

$$L_{DV}=\sqrt{n\left(L_{\partial V}^{2}\|f\|_{\infty}^{2}+L_{f}^{2}\|\partial_{x}\tilde{V}\|_{\infty}^{2}\right)}.$$
Proof 5.22.

Since $f$ is Lipschitz continuous on $\mathcal{D}$, it is also bounded, and the same holds for $\partial_{x}\tilde{V}$. The result of this lemma comes from the following inequality:

$$\begin{aligned}\forall x,y\in\mathcal{D},\quad|DV_{0}(x)-DV_{0}(y)|^{2}&\leq\sum_{i=1}^{n}\left|\left(\partial_{x_{i}}\tilde{V}(x)-\partial_{x_{i}}\tilde{V}(y)\right)f_{i}(x)+\partial_{x_{i}}\tilde{V}(y)\left(f_{i}(x)-f_{i}(y)\right)\right|^{2}\\&\leq\sum_{i=1}^{n}\left(L_{\partial V}^{2}\|f\|_{\infty}^{2}+L_{f}^{2}\|\partial_{x}\tilde{V}\|_{\infty}^{2}\right)\|x-y\|^{2}.\end{aligned}$$

Using the definition of Lipschitz continuity leads to the expression of $L_{DV}$ given above.

One can note that $L_{DV}$ depends both on $L_{\partial V}$ and on the maximum of $\partial_{x}\tilde{V}$, and that it also increases with the dimension $n$.
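To make the dependence on the dimension concrete, the following minimal sketch evaluates the expression of $L_{DV}$ from Lemma 5.21. The function name and the numerical bounds are hypothetical placeholders chosen for illustration; in practice, $L_{\partial V}$, $\|f\|_{\infty}$, $L_{f}$ and $\|\partial_{x}\tilde{V}\|_{\infty}$ would have to be estimated for the system and network at hand.

```python
import math


def lipschitz_constant_dv(n, lip_grad_v, f_sup, lip_f, grad_v_sup):
    """Evaluate L_DV = sqrt(n * (L_dV^2 * |f|_inf^2 + L_f^2 * |dV|_inf^2)).

    All arguments are assumed to be upper bounds valid on the domain D.
    """
    return math.sqrt(n * (lip_grad_v**2 * f_sup**2 + lip_f**2 * grad_v_sup**2))


# Hypothetical bounds, used only to show the sqrt(n) growth with the dimension.
for n in (2, 4, 8):
    print(n, lipschitz_constant_dv(n, lip_grad_v=1.0, f_sup=1.0,
                                   lip_f=1.0, grad_v_sup=1.0))
```

With all other bounds fixed, the constant grows like $\sqrt{n}$.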

5.2 Local and global certifications

Based on the work in [56], one can estimate the size of the balls around each sampling point in which $DV_{0}$ is negative. The following lemma provides such an estimate.

Lemma 5.23.

Assume $\mathcal{L}_{1}=0$ and let $x_{i}\in\bar{\mathcal{R}}\setminus\{0\}$. Then, for any $x\in\mathcal{B}_{x_{i}}^{2}\left(-L_{DV}^{-1}DV_{0}(x_{i})\right)$, we have $DV_{0}(x)<0$.

Proof 5.24.

Since, for a given sampling, $\mathcal{L}_{1}=0$ and $DV_{\beta}(x_{i})<0$, we have $DV_{0}(x_{i})<0$. Let $x\in\mathcal{D}$; then the following holds:

$$\begin{array}{rl}DV_{0}(x)&\leq DV_{0}(x_{i})+\left|\partial_{x}\tilde{V}(x)\cdot f(x)-\partial_{x}\tilde{V}(x_{i})\cdot f(x_{i})\right|\\&\leq DV_{0}(x_{i})+L_{DV}\|x-x_{i}\|.\end{array}$$

The right-hand side is negative whenever $\|x-x_{i}\|<-L_{DV}^{-1}DV_{0}(x_{i})$, which leads to the main statement of this lemma.

The previous lemma gives a local result. We would like instead to find a sampling that guarantees that the union of all balls is a cover of the region of attraction. To that end, we introduce the following definition.

Definition 5.25.

A sampling $\{x_{i_{1},\dots,i_{n}}\}\subset\mathcal{D}$ is said to be uniform with parameters $\Delta_{0},\Delta_{x}$ when

$$x_{i_{1},\dots,i_{n}}=\left(2i_{1}\Delta_{x},\cdots,2i_{n}\Delta_{x}\right)$$

for $i_{j}\in\left\{-N,\dots,-\left\lceil\frac{\Delta_{0}}{2\Delta_{x}}\right\rceil,\left\lceil\frac{\Delta_{0}}{2\Delta_{x}}\right\rceil,\dots,N\right\}$ with $N=\left\lceil\frac{1}{2\Delta_{x}}-1\right\rceil$.
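As an illustration, the sketch below enumerates the points of such a sampling. It reads the index set of Definition 5.25 literally, applying it to every coordinate; the function name and the chosen values of $\Delta_{0}$ and $\Delta_{x}$ are assumptions made for the example only.

```python
import itertools
import math


def uniform_sampling(n, delta_0, delta_x):
    """Enumerate the grid points x = (2*i_1*delta_x, ..., 2*i_n*delta_x).

    Each index i_j ranges over {-N, ..., -k_min, k_min, ..., N} with
    N = ceil(1 / (2*delta_x) - 1) and k_min = ceil(delta_0 / (2*delta_x)).
    """
    n_max = math.ceil(1.0 / (2.0 * delta_x) - 1.0)
    k_min = math.ceil(delta_0 / (2.0 * delta_x))
    indices = [k for k in range(-n_max, n_max + 1) if abs(k) >= k_min]
    return [tuple(2.0 * k * delta_x for k in idx)
            for idx in itertools.product(indices, repeat=n)]


# Hypothetical parameters: two dimensions, delta_0 = 0.5, delta_x = 0.125.
points = uniform_sampling(n=2, delta_0=0.5, delta_x=0.125)
print(len(points), points[:4])
```

The number of points grows exponentially with $n$, which is one reason why the grid spacing matters in practice.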

Note that $\bigcup_{x_{i}\in\mathcal{D}_{s}}\mathcal{B}_{x_{i}}^{\infty}\left(\Delta_{x}\right)$ is a cover of $\mathcal{D}\setminus\mathcal{B}_{0}^{\infty}\left(\Delta_{0}\right)$. To find a region of attraction, we must find a level set of the Lyapunov function $\tilde{V}$. For any $c\in(0,1)$, we define the following subset of the region of attraction:

$$\mathcal{R}_{c}(\tilde{V})=\left\{x\in\mathcal{D}\ |\ \tilde{V}(x)\leq c\right\}\subset\mathcal{R}(\tilde{V}).$$

The largest certified region of attraction based on a uniform sampling is estimated in the following proposition.

Proposition 5.26.

Let $\delta>0$ be such that $\mathcal{R}_{\delta}(\tilde{V})$ is a region of attraction. Consider a uniform sampling $\mathcal{D}_{s}=\left\{x_{i}\right\}_{i}$ of parameters $\Delta_{0},\Delta_{x}$ such that

  • $\mathcal{B}_{0}^{\infty}(\Delta_{0})\subset\mathcal{R}_{\delta}(\tilde{V})$;

  • $\displaystyle\Delta_{x}\leq-\frac{\bar{DV}_{0}}{L_{DV}\sqrt{n}}$,

where

$$\bar{DV}_{0}=\max_{\begin{subarray}{c}x\in\mathcal{D}_{s}\setminus\mathcal{B}_{0}^{\infty}(\Delta_{0})\\ \tilde{V}(x)<1\end{subarray}}DV_{0}(x).$$

If $\mathcal{L}_{1}=0$ on $\mathcal{D}_{s}$, then the set $\mathcal{R}_{c^{*}}(\tilde{V})$ is a certified region of attraction with

$$c^{*}=\begin{array}[t]{cl}\displaystyle\max&c\\ \text{s.t.}&\displaystyle\mathcal{R}_{c}(\tilde{V})\subseteq\mathcal{R}_{\mathcal{D}_{s}},\end{array}\tag{28}$$

where

$$\mathcal{R}_{\mathcal{D}_{s}}=\bigcup_{\begin{subarray}{c}x_{i}\in\mathcal{D}_{s}\\ \tilde{V}(x_{i})<1\end{subarray}}\mathcal{B}_{x_{i}}^{\infty}\left(\Delta_{x}\right)\cup\mathcal{B}_{0}^{\infty}(\Delta_{0}).$$
Proof 5.27.

First of all, the approximation properties of Taylor-neural Lyapunov functions together with projection (24) ensure the existence of $\delta>0$ as introduced in the proposition. Consequently, $DV_{0}$ is negative definite on $\mathcal{B}_{0}^{\infty}(\Delta_{0})$.

For any $x\in\mathcal{R}_{\mathcal{D}_{s}}$ such that $\|x\|\geq\Delta_{0}$, there exists $x_{i}\in\mathcal{D}_{s}$ such that

$$\|x-x_{i}\|_{\infty}\leq\Delta_{x}.$$

Consequently, we get $\|x-x_{i}\|_{2}\leq\sqrt{n}\Delta_{x}$, which in turn results in

$$\|x-x_{i}\|_{2}\leq-\frac{\bar{DV}_{0}(s)}{L_{DV}}\leq-\frac{DV_{0}(x_{i})}{L_{DV}}.$$

Since $\mathcal{L}_{1}=0$ on $\mathcal{D}_{s}$, from Lemma 5.23, we get $DV_{0}<0$ on $\mathcal{R}_{\mathcal{D}_{s}}$.

The largest region of attraction in $\mathcal{R}_{\mathcal{D}_{s}}$ is then obtained by solving the optimization problem (28).
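The second condition of Proposition 5.26 can be checked numerically once $DV_{0}$ has been evaluated on the sampling. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and the caller is assumed to pass only the values $DV_{0}(x_{i})$ at the points $x_{i}\in\mathcal{D}_{s}\setminus\mathcal{B}_{0}^{\infty}(\Delta_{0})$ with $\tilde{V}(x_{i})<1$, together with a valid bound $L_{DV}$.

```python
import math


def spacing_is_certified(dv0_samples, lip_dv, n, delta_x):
    """Check delta_x <= -max_i DV_0(x_i) / (L_DV * sqrt(n)).

    dv0_samples: values DV_0(x_i) on the retained sampling points
                 (filtering is assumed to be done by the caller).
    """
    dv0_bar = max(dv0_samples)
    if dv0_bar >= 0.0:
        # A non-negative derivative at a sample rules out certification.
        return False
    return delta_x <= -dv0_bar / (lip_dv * math.sqrt(n))


# Hypothetical values: the worst sample is -0.2, L_DV = 2, n = 4.
print(spacing_is_certified([-0.5, -0.2, -0.8], lip_dv=2.0, n=4, delta_x=0.04))
```

The sample with the derivative closest to zero dictates the admissible spacing for the whole grid, which is why the bound is conservative.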

Since $\mathcal{R}_{\mathcal{D}_{s}}$ is an open set, we must have $c^{*}<1$ and $\mathcal{R}_{\mathcal{D}_{s}}\subset\mathcal{R}(\tilde{V})$. Therefore, it is impossible to confirm that the entire set $\mathcal{R}(\tilde{V})$ is indeed a region of attraction. Moreover, since $\mathcal{L}_{1}=0$ on $\mathcal{D}_{s}$, we get the following:

$$\Delta_{x}\leq\frac{\beta(1-c^{*})\Delta_{0}^{2}}{L_{DV}\sqrt{n}}.$$

Consequently, the sampling must have a finer grain if

  1. $c^{*}$ is closer to $1$: the certified region of attraction is larger;

  2. $\beta$ is smaller: the constraint (12) is close to violation;

  3. $L_{DV}$ is large: the Lyapunov function $\tilde{V}$ has very fast variations or the system is stiff ($L_{f}$ and $\|f\|_{\infty}$ are large);

  4. $\Delta_{0}$ is small: we cannot find a large region of attraction around the origin;

  5. $n$ is large.
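The trade-off listed above can be explored numerically. The sketch below simply evaluates the upper bound on $\Delta_{x}$ stated before the list; the function name and all numerical values are hypothetical and serve only to show how the admissible spacing shrinks as $c^{*}$ approaches $1$.

```python
import math


def max_grid_spacing(beta, c_star, delta_0, lip_dv, n):
    """Evaluate beta * (1 - c_star) * delta_0^2 / (L_DV * sqrt(n))."""
    return beta * (1.0 - c_star) * delta_0**2 / (lip_dv * math.sqrt(n))


# Hypothetical constants; only c_star varies.
for c_star in (0.5, 0.9, 0.99):
    bound = max_grid_spacing(beta=1.0, c_star=c_star, delta_0=0.5,
                             lip_dv=2.0, n=2)
    print(c_star, bound)
```

Since the number of grid points per axis scales like $1/\Delta_{x}$, a smaller bound translates into a larger sampling.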

The previous proposition also indicates how fine the sampling should be during training to obtain a meaningful region of attraction. In practice, it highlights that a finer sampling is needed close to the boundary of the region of attraction and around the origin.

There are still some computational concerns regarding optimization problem (28):

  1. Finding the largest level set included in $\mathcal{R}_{\mathcal{D}_{s}}$ might be a challenge in high dimension, and one can consider a greedy algorithm to estimate it.

  2. Finding $\Delta_{0}$ requires estimating a non-optimal region of attraction. Due to the projection operation (24), one can find such $\delta$ using reasoning similar to that conducted in Appendix .1.

6 Simulations and discussion

In this section, we present and discuss the results of applying our proposed method to different systems, both globally and locally stable. In addition, we compare our solution with state-of-the-art alternatives. We conclude by discussing the robustness of the training algorithm to different initializations to evaluate the consistency of the solutions it provides. To highlight the robustness of the method, we chose the same hyperparameters for Algorithm 1 throughout this section. These parameters are listed in Table 1.

| Parameter | $N_{\lambda}$ | $N_{1}$ | $N_{2}$ | $\alpha_{v}$ | $\alpha_{\lambda}$ | $\alpha_{\eta}$ | $\xi$ |
|---|---|---|---|---|---|---|---|
| Value | $100$ | $2000$ | $2000$ | $10^{-2}$ | $10^{-1}$ | $10^{-2}$ | $10^{-2}$ |

Table 1: Hyperparameters used for the simulations.

6.1 Simulations

6.1.1 Globally stable system

We first consider the following system:

$$\left\{\begin{array}{l}\dot{x}_{1}(t)=-3x_{1}(t)+0.1\sin(x_{2})x_{2},\\ \dot{x}_{2}(t)=-15x_{2}(t).\end{array}\right.\tag{29}$$

We can rewrite it as $\dot{x}=A(x)x$, where $A$ belongs to the polytope $[A_{-1},A_{1}]$ with $A_{i}=\left[\begin{matrix}-3&0.1i\\ 0&-15\end{matrix}\right]$. This system is globally stable because there exists a quadratic Lyapunov function common to all $A\in[A_{-1},A_{1}]$:

$$V_{quad}(x)=x^{\top}\left[\begin{matrix}2.5&0.55\\ 0.55&0.4\end{matrix}\right]x.$$
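The claim that $V_{quad}$ is a common quadratic Lyapunov function can be checked directly. Writing $P$ for the matrix of $V_{quad}$, it suffices to verify that $P$ is positive definite and that $A_{i}^{\top}P+PA_{i}$ is negative definite at the two vertices $i=\pm 1$, since this expression is affine in the uncertain entry $0.1\sin(x_{2})$. The sketch below does this with Sylvester's criterion for $2\times 2$ symmetric matrices; the helper names are hypothetical.

```python
P = [[2.5, 0.55], [0.55, 0.4]]


def vertex(i):
    """Vertex matrix A_i of the polytope, for i in [-1, 1]."""
    return [[-3.0, 0.1 * i], [0.0, -15.0]]


def lyapunov_matrix(a, p):
    """Return Q = A^T P + P A for square matrices given as nested lists."""
    n = len(a)
    at_p = [[sum(a[k][i] * p[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
    p_a = [[sum(p[i][k] * a[k][j] for k in range(n)) for j in range(n)]
           for i in range(n)]
    return [[at_p[i][j] + p_a[i][j] for j in range(n)] for i in range(n)]


def is_positive_definite_2x2(m):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return m[0][0] > 0 and det > 0


def is_negative_definite_2x2(m):
    return is_positive_definite_2x2([[-v for v in row] for row in m])


print("P > 0:", is_positive_definite_2x2(P))
for i in (-1, 1):
    q = lyapunov_matrix(vertex(i), P)
    print("i =", i, "Q < 0:", is_negative_definite_2x2(q))
```

Both vertex conditions hold, so by convexity the same inequality holds for every $A$ in the polytope.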

We use the method described in this paper with the hyperparameters in Table 1 and a maximum number of epochs of $3000$. The result of one training is displayed in Figure 1.

We notice that the early-stopping conditions are always reached. The difficulty with globally stable systems is that the condition on $DV_{0}$ on $\partial\mathcal{R}$ cannot be enforced. Compared to other works [12, 21], our methodology is capable of finding regions of attraction that cover the whole domain $\mathcal{D}$.

Figure 1: Taylor-neural Lyapunov function for globally stable system (29). The estimated region of attraction is colored. Arrows indicate the flow of the original system.

6.1.2 Locally stable equilibrium point

The second system considered is the model of a generator [38, Example 11.2], described as follows:

$$\left\{\begin{array}{l}\dot{x}_{1}(t)=x_{2}(t),\\ \dot{x}_{2}(t)=-\sin(x_{1}(t))-5x_{2}(t).\end{array}\right.\tag{30}$$

Since this system has multiple equilibrium points, it is not globally stable. It is locally stable because it satisfies Assumption 1. The following Lyapunov function is provided in [38, Example 12.6]:

$$V_{loc}(x_{1},x_{2})=0.5x_{2}^{2}+1-\cos(x_{1}),$$

which gives a rather conservative region of attraction.
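For reference, a short computation (not taken from the source) of the derivative of $V_{loc}$ along the trajectories of (30) reads

$$\dot{V}_{loc}(x_{1},x_{2})=\sin(x_{1})\,\dot{x}_{1}+x_{2}\,\dot{x}_{2}=\sin(x_{1})x_{2}+x_{2}\left(-\sin(x_{1})-5x_{2}\right)=-5x_{2}^{2}\leq 0.$$

The derivative is only negative semidefinite, since it vanishes on the whole line $x_{2}=0$, so an invariance argument is needed on top of this inequality to conclude on convergence.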

We run the algorithm with the same hyperparameters as previously and a maximum number of epochs of $3000$. The result is displayed in Figure 2. This time, the stopping conditions were never reached. The blue dots, which correspond to the boundary estimate $\{\eta_{i}x_{i}\}_{i}$, are very close to the real boundary. From the flow arrows, the estimated region of attraction appears very close to the real one (and much larger than the one obtained using $V_{loc}$). The level lines are not elliptical, which implies that the Lyapunov function is not quadratic. This indicates that higher-order terms in the Taylor decomposition have been learned successfully.

Figure 2: Taylor-neural Lyapunov function for locally stable system (30). The region of attraction is the colored area. Blue dots refer to the sampling points at the boundary $\{\eta_{i}x_{i}\}_{i}$. Arrows indicate the flow of the original system.

6.1.3 Van der Pol oscillator

The last example considers a special case of Van der Pole oscillator [57]:

{x˙1(t)=x2(t),x˙2(t)=x1(t)μ(1x1(t)2)x2(t).casessubscript˙𝑥1𝑡subscript𝑥2𝑡subscript˙𝑥2𝑡subscript𝑥1𝑡𝜇1subscript𝑥1superscript𝑡2subscript𝑥2𝑡\left\{\begin{array}[]{l}\dot{x}_{1}(t)=-x_{2}(t),\\ \dot{x}_{2}(t)=x_{1}(t)-\mu(1-x_{1}(t)^{2})x_{2}(t).\end{array}\right.{ start_ARRAY start_ROW start_CELL over˙ start_ARG italic_x end_ARG start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_t ) = - italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t ) , end_CELL end_ROW start_ROW start_CELL over˙ start_ARG italic_x end_ARG start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t ) = italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_t ) - italic_μ ( 1 - italic_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_t ) start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) italic_x start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_t ) . end_CELL end_ROW end_ARRAY (31)

We investigate the case $\mu=1$. This system has a polynomial structure, which makes the use of SOS Lyapunov functions possible [11]. The region of attraction has a nonconvex shape, and the system becomes stiffer as $\mu$ increases.
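As a quick sanity check on system (31), the following minimal sketch evaluates the linearization at the origin for $\mu=1$ and confirms that both eigenvalues have negative real part, so the origin is locally asymptotically stable. The function names are hypothetical and introduced only for illustration; the Jacobian is computed by hand from (31).

```python
import cmath


def van_der_pol(x1, x2, mu=1.0):
    """Right-hand side of system (31)."""
    return -x2, x1 - mu * (1.0 - x1 ** 2) * x2


def jacobian_at_origin(mu=1.0):
    """Jacobian of (31) evaluated at the origin: [[0, -1], [1, -mu]]."""
    return [[0.0, -1.0], [1.0, -mu]]


def eigenvalues_2x2(a):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + disc) / 2.0, (trace - disc) / 2.0


eigs = eigenvalues_2x2(jacobian_at_origin(mu=1.0))
# The origin is locally asymptotically stable if all real parts are negative.
is_locally_stable = all(e.real < 0 for e in eigs)
print(eigs, is_locally_stable)
```

For $\mu=1$ the characteristic polynomial is $\lambda^{2}+\lambda+1$, so the eigenvalues form a complex-conjugate pair with real part $-1/2$.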

We trained the Taylor-neural Lyapunov function using the same set of hyperparameters with a maximum of $2000$ epochs. The result is shown in Figure 3. Similarly to the previous example, the optimal region of attraction is well estimated. The obtained result is very close to the SOS result (dashed line in the figure). The evolution of the training loss is shown in Figure 4. One can see that, without an early stop around $2000$ epochs, the loss exhibits spikes, which indicate poor fitting. This shows that it is important to monitor the loss during training and that it might be necessary to enforce early stopping for better convergence.
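One possible way to monitor the loss and stop early is a patience rule, sketched below. This is an illustrative assumption and not the stopping condition used in our experiments; the function name and the parameters `patience` and `min_delta` are hypothetical.

```python
def best_epoch_with_patience(losses, patience=50, min_delta=0.0):
    """Scan a recorded loss curve and return (stop_epoch, best_epoch).

    Training is considered stopped at the first epoch that lies `patience`
    epochs past the last improvement larger than `min_delta`.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(losses):
        if loss < best_loss - min_delta:
            # New best value: remember it and keep training.
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop here.
            return epoch, best_epoch
    return len(losses) - 1, best_epoch


# Toy loss curve with a spike after epoch 3.
curve = [1.0, 0.5, 0.2, 0.1, 5.0, 4.0, 3.0]
stop_epoch, kept_epoch = best_epoch_with_patience(curve, patience=2)
print(stop_epoch, kept_epoch)
```

In practice, one would keep the weights saved at `kept_epoch` rather than those of the last epoch, so that a late spike in the loss does not degrade the final candidate.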

Figure 3: Taylor-neural Lyapunov function for the Van der Pol oscillator with $\mu=1$. The region of attraction is the colored area. Blue dots refer to the sampling points at the boundary $\{\eta_{i}x_{i}\}_{i}$. Arrows indicate the flow of the original system. The dashed line is the region of attraction obtained using SOS.
Figure 4: Training loss for the Taylor-neural Lyapunov function for the Van der Pol oscillator with $\mu=1$.
| | LyzNet [12] | SOS [11] | This method |
|---|---|---|---|
| RoA coverage | $95.64\%$ | $94.17\%$ | $90.44\%$ |
| System dimensions | Potentially high | Low | Intermediate |
| System characteristic | Strictly stable | Polynomial | Strictly stable |
| Data | Simulated | None | None |

Table 2: Performance of the training algorithm compared to other algorithms on the Van der Pol oscillator with $\mu=1$.

We compare our results with SOS [11] and the LyzNet method developed in [12]; the results are presented in Table 2. A high-resolution estimate of the largest region of attraction is obtained using numerical integration: if the trajectory is close to the origin after some time, we consider that the initial point is part of the region of attraction. We computed $3911$ points uniformly spread in the region of attraction based on this criterion. We then checked whether these points were part of the region of attraction for the SOS formulation, LyzNet, and our method. The best-performing method in this example is LyzNet, closely followed by SOS; our method trails SOS by approximately $4$ percentage points. Our method still gives a region of attraction very close to the optimal one ($90.44\%$). However, LyzNet is much slower and uses external data generated by a simulator, which significantly restricts the interest of the method. Indeed, using an external simulator brings no guarantee on whether a point is stable or not. Moreover, it involves spending time simulating points.
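The coverage criterion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to produce Table 2: the integrator (fixed-step RK4), the horizon, the tolerances, the coarse grid, and the unit disk used as a stand-in for a certified estimate are all hypothetical choices.

```python
import math

MU = 1.0  # value of mu studied in this section


def vdp(x):
    """Right-hand side of system (31)."""
    x1, x2 = x
    return (-x2, x1 - MU * (1.0 - x1 * x1) * x2)


def rk4_step(f, x, h):
    """One classical Runge-Kutta step of size h for a planar system."""
    k1 = f(x)
    k2 = f((x[0] + 0.5 * h * k1[0], x[1] + 0.5 * h * k1[1]))
    k3 = f((x[0] + 0.5 * h * k2[0], x[1] + 0.5 * h * k2[1]))
    k4 = f((x[0] + h * k3[0], x[1] + h * k3[1]))
    return (
        x[0] + h / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]),
        x[1] + h / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]),
    )


def converges(x0, horizon=30.0, h=0.01, tol=1e-2, escape=10.0):
    """Label x0 as attracted if its trajectory comes within tol of the origin."""
    x = x0
    for _ in range(int(horizon / h)):
        x = rk4_step(vdp, x, h)
        r = math.hypot(x[0], x[1])
        if r < tol:
            return True
        if r > escape:  # trajectory left the region of interest
            return False
    return False  # undecided within the horizon: counted as not attracted


def coverage(reference_points, in_estimate):
    """Fraction of reference points that the estimate also contains."""
    hits = sum(1 for p in reference_points if in_estimate(p))
    return hits / len(reference_points)


# Coarse grid on [-3, 3]^2; the reference set keeps the attracted points.
grid = [(-3.0 + 0.75 * i, -3.0 + 0.75 * j) for i in range(9) for j in range(9)]
reference = [p for p in grid if converges(p)]


def in_unit_disk(p):
    """Placeholder estimate of the region of attraction (assumption)."""
    return math.hypot(p[0], p[1]) <= 1.0


cov = coverage(reference, in_unit_disk)
print(f"{len(reference)} reference points, coverage = {cov:.2%}")
```

Replacing `in_unit_disk` with the membership test of a learned sublevel set gives the kind of percentage reported in the first row of Table 2. As noted above, such a simulation-based reference carries no formal guarantee; it only serves as a benchmark.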

Note that we also tried the algorithm for larger values of $\mu$, but the stiffness of the system makes the algorithm diverge.

6.2 Robustness analysis

Investigating how the training algorithm differs when initialized differently is of tremendous importance. A robust training algorithm will almost always produce the same region of attraction, regardless of the initialization.

To evaluate robustness, the training algorithm is run $10$ times with different initial states, and the obtained regions of attraction are denoted $\mathcal{R}_{i}$ for $i\in\{1,\dots,10\}$. Results are reported in Table 3. The numbers in the shared volume column indicate the probability that a point that is part of one learned region of attraction belongs to another learned region of attraction. The IoU column refers to the intersection over the union (also called the Jaccard similarity index). This is the percentage of points in the largest region of attraction $\cup_{i=1}^{10}\mathcal{R}_{i}$ which belongs to the smallest region of attraction $\cap_{i=1}^{10}\mathcal{R}_{i}$.
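The two metrics can be sketched as follows when each learned region is represented by the set of indices of the sample points it contains. The pooling over ordered pairs used for the shared volume is one possible reading of the definition above and should be taken as an assumption; the function names are hypothetical.

```python
from itertools import permutations


def iou(regions):
    """Intersection over union of several regions given as sets of point ids."""
    union = set().union(*regions)
    inter = set(regions[0]).intersection(*regions[1:])
    return len(inter) / len(union)


def shared_volume(regions):
    """Empirical probability that a point of one region lies in another one.

    Pooled over all ordered pairs (R_a, R_b) with a != b (assumption).
    """
    num = 0
    den = 0
    for a, b in permutations(regions, 2):
        num += len(a & b)
        den += len(a)
    return num / den


# Toy example with three regions over sample points labelled 0..4.
regions = [{0, 1, 2, 3}, {1, 2, 3}, {1, 2, 3, 4}]
print(iou(regions), shared_volume(regions))
```

In this toy example, the intersection has three points and the union has five, so the IoU is $3/5$, while the shared volume is $18/22$. The IoU is the more demanding metric, since a single small region shrinks the intersection for all runs.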

The globally stable system (29) shows high percentages for both metrics, indicating a very robust algorithm. This can be explained easily since there exist many Lyapunov functions that will return the optimal region of attraction in a bounded domain. Another explanation comes from the verification section, since variations in the region of attraction are close to the boundary of $\mathcal{D}$.

The locally stable system (30) has a more challenging region of attraction. However, the percentages are still relatively high, which indicates a good convergence of the algorithm. Looking more carefully at the plots shows convergence to suboptimal regions of attraction in some rare cases. This significantly decreases the shared and common volumes.

The last example has a more complex region of attraction. Without any surprise, it is harder to learn, but the region of attraction (even if often suboptimal) is relatively consistent over the tries. One region of attraction was relatively small, significantly impacting the percentage of shared volumes. To mitigate this issue, the solution might be to look at the training losses and stop the training when the loss is relatively low, and this time might be reached at different epochs for different initializations. Another issue comes from some non-connected points which are also identified as stable. A mitigation strategy could be to consider “ensemble learning”, where the final region of attraction is the intersection of multiple regions of attraction, making the process more stable but also more conservative. However, considering the large common volume in all cases, this technique would probably only remove corner-case points and keep a good estimate of the region of attraction.

| Example | Shared volume | IoU |
|---|---|---|
| System (29) | $99.9\%$ | $99.5\%$ |
| System (30) | $93.1\%$ | $74.3\%$ |
| System (31) | $88.5\%$ | $58.1\%$ |

Table 3: Robustness of the training algorithm. IoU refers to the intersection over the union. The higher the percentages, the more robust the training algorithm is.

7 Conclusion and perspectives

This paper proposes a new class of neural networks which rely on Taylor expansion. The approximation capabilities of such neural networks have been proven for candidate Lyapunov functions. The classical training algorithm based on gradient descent has been adapted to this case to provide certification that the obtained candidate Lyapunov function is indeed valid around the origin. The paper also addresses the issue of estimating the largest region of attraction, leveraging Zubov’s theorem. The efficiency of the approach has been demonstrated in the estimation of the largest region of attraction, where results comparable to the state of the art have been obtained on some examples. An extension to the numerical certification of the obtained Lyapunov function has also been discussed.

However, numerical experiments have shown that some improvement is possible in estimating the region of attraction. In some cases, the very poor convergence of the algorithms suggests that a better initialization procedure should be investigated. One solution would be to consider state-of-the-art algorithms in robust control, for example. In the same vein, alternative solutions for maximizing the region of attraction should be considered to mitigate the convergence to a bad local minimum. Research from the machine learning side can provide some insights.

The proposed method opens the way for new research directions. The capacity to estimate the largest region of attraction without any data introduces a fundamental change compared to other methodologies. The extension to controller synthesis is one of the most promising research directions, but the consideration of much more complex systems (of infinite dimension or uncertain, for instance) is another avenue. The investigation of other sampling strategies to ensure a better convergence is left for future research.

.1 Proof of Proposition 3.6

Let us pick one $V^{*}=V_{2}+V_{3}$ from (5), where $V_{2}=\frac{1}{2}x^{\top}H^{*}x$ for $x\in\mathcal{D}$, and set $P=H^{*}\succ 0$ (Lemma 3.3).

Approximation of the region of attraction

Since $\hat{V}_{N}\in\mathcal{V}$, we get $\mathcal{R}(\hat{V}_{N})\subseteq\mathcal{R}(V^{*})$. The universal approximation theorem [41, Theorem 4] states that for any $\varepsilon_{1}\leq\varepsilon$, there exist $N>0$, weights and biases such that

$$\forall i\in\mathcal{I},\quad\sup_{x\in\mathcal{D}}\left|\hat{R}_{N}^{(i)}(x)-R_{i}(x)\right|\leq n^{-3}\varepsilon_{1}.\tag{32}$$

Noting that $\mathcal{D}\subset[-1,1]^{n}$, equations (6) and (8) lead to: $\forall x\in\mathcal{D}$

$$|V^{*}(x)-\hat{V}_{N}(x)|=\left|\sum_{i\in\mathcal{I}}\left(R_{i}(x)-\hat{R}_{N}^{(i)}(x)\right)\prod_{k=1}^{n}x_{k}^{i_{k}}\right|\leq\varepsilon_{1}.\tag{33}$$

Consequently, the left inclusion in (9) holds.
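The bound in (33) can be spelled out as follows. This is a sketch under the assumption, not restated in this appendix, that the index set $\mathcal{I}$ contains at most $n^{3}$ multi-indices; it uses (32) and $|x_{k}|\leq 1$ on $\mathcal{D}\subset[-1,1]^{n}$:

$$\left|\sum_{i\in\mathcal{I}}\left(R_{i}(x)-\hat{R}_{N}^{(i)}(x)\right)\prod_{k=1}^{n}x_{k}^{i_{k}}\right|\leq\sum_{i\in\mathcal{I}}\left|R_{i}(x)-\hat{R}_{N}^{(i)}(x)\right|\prod_{k=1}^{n}|x_{k}|^{i_{k}}\leq|\mathcal{I}|\,n^{-3}\varepsilon_{1}\leq\varepsilon_{1}.$$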

Positive-definiteness

Since the activation function $\psi\in C^{\infty}(\mathbb{R},\mathbb{R})$, then $\hat{R}_{N}^{(i)}$ is bounded. Consequently, we get $\hat{V}_{N}(0)=0$.

Note that $V_{3}=o(V_{2})$ and $V_{2}=O(\|x\|^{2})$; consequently, there is $\chi>0$ such that $\forall x\in\mathcal{D},\ \|x\|\leq\chi$

$$3|V_{3}(x)|\leq V_{2}(x)\ \text{ and }\ 3\left|\sum_{i\in\mathcal{I}}\prod_{k=1}^{n}x_{k}^{i_{k}}\right|\leq V_{2}(x).\tag{34}$$

Let $V_{\chi}=\inf_{\|x\|\geq\chi,x\in\mathcal{D}}V^{*}(x)>0$ and $\bar{\varepsilon}=V_{\chi}(n^{3}+1)^{-1}>0$. The universal approximation theorem [41, Theorem 4] states that for any $\varepsilon_{2}\leq\min(1,\bar{\varepsilon})$, there exist $N>0$, weights and biases such that

$$\forall i\in\mathcal{I},\quad\sup_{x\in\mathcal{D}}\left|\hat{R}_{N}^{(i)}(x)-R_{i}(x)\right|\leq n^{3}\varepsilon_{2}.\tag{35}$$

Using inequality (35) in (33) leads to $\forall x\in\mathcal{D},\ \|x\|\geq\chi$:

$$\hat{V}_{N}(x)\geq V^{*}(x)-\left|\hat{V}_{N}(x)-V^{*}(x)\right|\geq V_{\chi}(x)-n^{3}\varepsilon_{2}\geq\varepsilon_{2}.$$

Using (34) and (35), we get $\forall x\in\mathcal{D},\ \|x\|\leq\chi$:

$$\begin{array}{rl}\hat{V}_{N}(x)\!\!\!\!&\displaystyle=V_{2}(x)+\hat{V}_{3}(x)\geq V_{2}(x)-|\hat{V}_{3}(x)|\\ &\displaystyle\geq V_{2}(x)-|V_{3}(x)|-\varepsilon_{2}\left|\sum_{i\in\mathcal{I}}\prod_{k=1}^{n}x_{k}^{i_{k}}\right|\\ &\displaystyle\geq\frac{2}{3}V_{2}(x)-\varepsilon_{2}\left|\sum_{i\in\mathcal{I}}\prod_{k=1}^{n}x_{k}^{i_{k}}\right|\geq\frac{1}{3}V_{2}(x).\end{array}$$

Then, the inequality (2b) holds.
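The last steps of the two chains of inequalities above can be made explicit, assuming $\varepsilon_{2}>0$ as is implicit in the statement. For $\|x\|\geq\chi$, the choice $\varepsilon_{2}\leq\bar{\varepsilon}=V_{\chi}(n^{3}+1)^{-1}$ gives

$$(n^{3}+1)\,\varepsilon_{2}\leq V_{\chi}\quad\Longrightarrow\quad V_{\chi}-n^{3}\varepsilon_{2}\geq\varepsilon_{2}>0.$$

For $\|x\|\leq\chi$, the two bounds in (34) together with $\varepsilon_{2}\leq 1$ give

$$V_{2}(x)-|V_{3}(x)|\geq\frac{2}{3}V_{2}(x)\quad\text{and}\quad\varepsilon_{2}\left|\sum_{i\in\mathcal{I}}\prod_{k=1}^{n}x_{k}^{i_{k}}\right|\leq\left|\sum_{i\in\mathcal{I}}\prod_{k=1}^{n}x_{k}^{i_{k}}\right|\leq\frac{1}{3}V_{2}(x).$$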

Negative time-derivative

Note first that

$$\forall x\in\mathcal{D},\quad\frac{\partial\hat{V}_{N}}{\partial x}(x)=\frac{\partial V_{2}}{\partial x}(x)+o(\|x\|).$$

Consequently, similarly to the proof of Lemma 3.3, we get:

$$\forall x\in\mathcal{D},\quad\frac{\partial\hat{V}_{N}}{\partial x}(x)\cdot f(x)=x^{\top}\left(A^{\top}H^{*}+H^{*}A\right)x+o(\|x\|^{2}).$$

Since the universal approximation theorem [41, Theorem 4] also holds for the first derivative, for any $\varepsilon_{3}>0$, there exist $N$, weights, and biases such that

$$\forall i\in\mathcal{I},\quad\begin{aligned}&\sup_{x\in\mathcal{D}}\left|\hat{R}_{N}^{(i)}(x)-R_{i}(x)\right|\leq\varepsilon_{3},\\ &\sup_{x\in\mathcal{D}}\left|\frac{\partial\hat{R}_{N}^{(i)}}{\partial x}(x)-\frac{\partial R_{i}}{\partial x}(x)\right|\leq\varepsilon_{3}.\end{aligned}$$

Using Lemma 3.3.2, a reasoning similar to that of the previous subsection implies that $\frac{\partial\hat{V}_{N}}{\partial x}\cdot f<0$ on $\mathcal{D}\backslash\{0\}$.

Conclusion

For $\varepsilon_{4}=\min(\varepsilon_{1},\varepsilon_{2},\varepsilon_{3})$, there exist $N$, weights and biases such that the universal approximation theorem holds for $\varepsilon_{4}$; then $\hat{V}_{N}$ is a Lyapunov function. By optimality of the region of attraction, the right part of the inclusion (9) holds.

References

  • [1] K. J. Åström and R. Murray, Feedback systems: an introduction for scientists and engineers.   Princeton university press, 2021.
  • [2] H. K. Khalil, Nonlinear Systems, ser. Pearson Education.   Prentice Hall, 2002.
  • [3] E. D. Sontag, Mathematical control theory: deterministic finite dimensional systems.   Springer Science & Business Media, 2013, vol. 6.
  • [4] J. Veenman, C. W. Scherer, and H. Köroğlu, “Robust stability and performance analysis based on integral quadratic constraints,” European Journal of Control, vol. 31, pp. 1–32, 2016.
  • [5] R. F. Curtain and H. Zwart, An introduction to infinite-dimensional linear systems theory.   Springer Science & Business Media, 2012, vol. 21.
  • [6] A. M. Lyapunov, “The general problem of the stability of motion,” International journal of control, vol. 55, no. 3, pp. 531–534, 1992.
  • [7] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan, Linear matrix inequalities in system and control theory.   SIAM, 1994.
  • [8] A. Vannelli and M. Vidyasagar, “Maximal lyapunov functions and domains of attraction for autonomous nonlinear systems,” Automatica, vol. 21, no. 1, pp. 69–80, 1985.
  • [9] W. Tan and A. Packard, “Stability region analysis using polynomial and composite polynomial lyapunov functions and sum-of-squares programming,” IEEE Transactions on Automatic Control, vol. 53, no. 2, pp. 565–571, 2008.
  • [10] M. Jones and M. M. Peet, “Converse lyapunov functions and converging inner approximations to maximal regions of attraction of nonlinear systems,” in 2021 60th IEEE Conference on Decision and Control (CDC).   IEEE, 2021, pp. 5312–5319.
  • [11] D. Henrion and M. Korda, “Convex computation of the region of attraction of polynomial control systems,” IEEE Transactions on Automatic Control, vol. 59, no. 2, pp. 297–312, 2013.
  • [12] J. Liu, Y. Meng, M. Fitzsimmons, and R. Zhou, “Physics-Informed Neural Network Lyapunov Functions: PDE Characterization, Learning, and Verification,” arXiv preprint arXiv:2312.09131, 2023.
  • [13] G. Chesi, “Rational lyapunov functions for estimating and controlling the robust domain of attraction,” Automatica, vol. 49, no. 4, pp. 1051–1057, 2013.
  • [14] G. Valmorbida and J. Anderson, “Region of attraction estimation using invariant sets and rational lyapunov functions,” Automatica, vol. 75, pp. 37–45, 2017.
  • [15] S. Tarbouriech, G. Garcia, J. M. G. da Silva Jr, and I. Queinnec, Stability and stabilization of linear systems with saturating actuators.   Springer Science & Business Media, 2011.
  • [16] M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations,” Journal of Computational physics, 2019.
  • [17] G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang, “Physics-informed machine learning,” Nature Reviews Physics, 2021.
  • [18] S. Cai, Z. Mao, Z. Wang, M. Yin, and G. E. Karniadakis, “Physics-informed neural networks (pinns) for fluid mechanics: A review,” Acta Mechanica Sinica, vol. 37, no. 12, pp. 1727–1738, 2021.
  • [19] G. Kissas, Y. Yang, E. Hwuang, W. R. Witschey, J. A. Detre, and P. Perdikaris, “Machine learning in cardiovascular flows modeling: Predicting arterial blood pressure from non-invasive 4d flow mri data using physics-informed neural networks,” Computer Methods in Applied Mechanics and Engineering, vol. 358, p. 112623, 2020.
  • [20] Y. Bai, T. Chaolu, and S. Bilige, “The application of improved physics-informed neural network (ipinn) method in finance,” Nonlinear Dynamics, vol. 107, no. 4, pp. 3655–3667, 2022.
  • [21] N. Gaby, F. Zhang, and X. Ye, “Lyapunov-net: A deep neural network architecture for lyapunov function approximation,” in 2022 IEEE 61st Conference on Decision and Control (CDC), 2022.
  • [22] R. Zhou, T. Quartz, H. De Sterck, and J. Liu, “Neural lyapunov control of unknown nonlinear systems with stability guarantees,” Advances in Neural Information Processing Systems, vol. 35, 2022.
  • [23] V. I. Zubov, Methods of AM Lyapunov and their application.   US Atomic Energy Commission, 1961, vol. 4439.
  • [24] Y. Long and M. M. Bayoumi, “Feedback stabilization: control lyapunov functions modelled by neural networks,” in Proceedings of 32nd IEEE Conference on Decision and Control, vol. 3, 1993.
  • [25] T. X. Nghiem, J. Drgoňa, C. Jones, et al., “Physics-informed machine learning for modeling and control of dynamical systems,” in American Control Conference (ACC), 2023.
  • [26] C. Dawson, S. Gao, and C. Fan, “Safe control with learned certificates: A survey of neural Lyapunov, barrier, and contraction methods for robotics and control,” IEEE Transactions on Robotics, 2023.
  • [27] L. Grüne, “Computing lyapunov functions using deep neural networks,” Journal of Computational Dynamics, vol. 8, no. 2, pp. 131–152, 2021.
  • [28] Z. J. Kolter and G. Manek, “Learning stable deep dynamics models,” Advances in neural information processing systems, vol. 32, 2019.
  • [29] Y.-C. Chang, N. Roohi, and S. Gao, “Neural Lyapunov control,” Advances in neural information processing systems, vol. 32, 2019.
  • [30] A. Abate, D. Ahmed, M. Giacobbe, and A. Peruffo, “Formal synthesis of lyapunov neural networks,” IEEE Control Systems Letters, 2020.
  • [31] C. Dawson, Z. Qin, S. Gao, and C. Fan, “Safe nonlinear control using robust neural Lyapunov-barrier functions,” in Conference on Robot Learning.   PMLR, 2022, pp. 1724–1735.
  • [32] A. Abate, D. Ahmed, A. Edwards, M. Giacobbe, and A. Peruffo, “FOSSIL: a software tool for the formal synthesis of lyapunov functions and barrier certificates using neural networks,” in Proceedings of the 24th International Conference on Hybrid Systems: Computation and Control, 2021, pp. 1–11.
  • [33] S. M. Richards, F. Berkenkamp, and A. Krause, “The Lyapunov neural network: Adaptive stability certification for safe learning of dynamical systems,” in Conference on Robot Learning.   PMLR, 2018, pp. 466–476.
  • [34] A. Mehrjou, M. Ghavamzadeh, and B. Schölkopf, “Neural lyapunov redesign,” arXiv preprint arXiv:2006.03947, 2020.
  • [35] H. Dai, B. Landry, L. Yang, M. Pavone, and R. Tedrake, “Lyapunov-stable neural-network control,” arXiv preprint arXiv:2109.14152, 2021.
  • [36] S. Mukherjee, J. Drgoňa, A. Tuor, M. Halappanavar, and D. Vrabie, “Neural Lyapunov differentiable predictive control,” in 2022 IEEE 61st Conference on Decision and Control (CDC).   IEEE, 2022, pp. 2097–2104.
  • [37] D. Angeli and E. D. Sontag, “Forward completeness, unboundedness observability, and their lyapunov characterizations,” Systems & Control Letters, vol. 38, no. 4-5, pp. 209–217, 1999.
  • [38] T. Glad and L. Ljung, Control theory.   CRC press, 2000.
  • [39] S. Krantz and H. Parks, A Primer of Real Analytic Functions, ser. A Primer of Real Analytic Functions.   Birkhäuser Boston, 2002.
  • [40] R. Coleman, Calculus on normed vector spaces.   Springer Science & Business Media, 2012.
  • [41] K. Hornik, “Approximation capabilities of multilayer feedforward networks,” Neural networks, vol. 4, no. 2, pp. 251–257, 1991.
  • [42] L. Lu, R. Pestourie, W. Yao, Z. Wang, F. Verdugo, and S. G. Johnson, “Physics-informed neural networks with hard constraints for inverse design,” SIAM Journal on Scientific Computing, vol. 43, no. 6, pp. B1105–B1132, 2021.
  • [43] M. Barreau, M. Aguiar, J. Liu, and K. H. Johansson, “Physics-informed learning for identification and state reconstruction of traffic density,” in 2021 60th IEEE Conference on Decision and Control (CDC), 2021, pp. 2653–2658.
  • [44] M. L. Delle Monache, C. Pasquale, M. Barreau, and R. Stern, “New frontiers of freeway traffic control and estimation,” in 2022 IEEE 61st Conference on Decision and Control (CDC).   IEEE, 2022, pp. 6910–6925.
  • [45] M. X. Goemans and D. P. Williamson, “The primal-dual method for approximation algorithms and its application to network design problems,” Approximation algorithms for NP-hard problems, pp. 144–191, 1997.
  • [46] M. L. Delle Monache, C. Pasquale, M. Barreau, and R. Stern, “New frontiers of freeway traffic control and estimation,” in 2022 IEEE 61st Conference on Decision and Control (CDC), 2022, pp. 6910–6925.
  • [47] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in International Conference on Learning Representations (ICLR), San Diego, CA, USA, 2015.
  • [48] Y. Bengio, J. Louradour, R. Collobert, and J. Weston, “Curriculum learning,” in Proceedings of the 26th annual international conference on machine learning, 2009, pp. 41–48.
  • [49] P. Rathore, W. Lei, Z. Frangella, L. Lu, and M. Udell, “Challenges in training pinns: A loss landscape perspective,” arXiv preprint arXiv:2402.01868, 2024.
  • [50] S. Diamond and S. Boyd, “CVXPY: A Python-embedded modeling language for convex optimization,” Journal of Machine Learning Research, vol. 17, no. 83, pp. 1–5, 2016.
  • [51] M. Münzer and C. Bard, “A curriculum-training-based strategy for distributing collocation points during physics-informed neural network training,” arXiv preprint arXiv:2211.11396, 2022.
  • [52] A. Daw, J. Bu, S. Wang, P. Perdikaris, and A. Karpatne, “Mitigating propagation failures in physics-informed neural networks using retain-resample-release (r3) sampling,” arXiv preprint arXiv:2207.02338, 2022.
  • [53] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning.   MIT Press, 2016.
  • [54] M. Fazlyab, A. Robey, H. Hassani, M. Morari, and G. Pappas, “Efficient and accurate estimation of Lipschitz constants for deep neural networks,” Advances in neural information processing systems, vol. 32, 2019.
  • [55] Y. Ebihara, X. Dai, V. Magron, D. Peaucelle, and S. Tarbouriech, “Local Lipschitz constant computation of relu-fnns: Upper bound computation with exactness verification,” arXiv preprint arXiv:2310.11104, 2023.
  • [56] M. Mohri, A. Rostamizadeh, and A. Talwalkar, Foundations of Machine Learning.   MIT press, 2018.
  • [57] B. Van der Pol, “A theory of the amplitude of free and forced triode vibrations,” Radio Review, vol. 1, pp. 701–710, 1920.