Xiao et al. - 2023 - BarrierNet: Differentiable Control Barrier Functions for Learning of Safe Robot Control
Authorized licensed use limited to: Zhejiang University. Downloaded on October 11,2024 at 13:43:17 UTC from IEEE Xplore. Restrictions apply.
2290 IEEE TRANSACTIONS ON ROBOTICS, VOL. 39, NO. 3, JUNE 2023
III. BACKGROUND
In this section, we briefly introduce CBFs and refer interested
readers to [1] for more details. Intuitively, CBFs are a means
to translate state constraints to control constraints under affine
dynamics. The controls that satisfy those constraints can then be efficiently computed by solving a QP. We start with the definition
of class K functions.

Definition 1 (Class K function [47]): A continuous function α : [0, a) → [0, ∞), a > 0, is said to belong to class K if it is strictly increasing and α(0) = 0. A continuous function β : R → R is said to belong to the extended class K if it is strictly increasing and β(0) = 0.

Consider an affine control system of the form

ẋ = f(x) + g(x)u   (1)

where x ∈ R^n, f : R^n → R^n, and g : R^n → R^{n×q} are locally Lipschitz, and u ∈ U ⊂ R^q, where U denotes a control constraint set. Each of the control component bounds is assumed to be independent of the others (since we consider low-speed experiments in this work, such as autonomous driving), but the CBF-based method can still work when the control bounds are coupled, which can be written as different forms of input constraints in the QP.

Definition 2: A set C ⊂ R^n is forward invariant for system (1) if its solutions for some u ∈ U starting at any x(0) ∈ C satisfy x(t) ∈ C, ∀t ≥ 0.

Definition 3 (Relative degree): The relative degree of a (sufficiently many times) differentiable function b : R^n → R with respect to system (1) is the number of times it needs to be differentiated along its dynamics until any component of the control u explicitly shows up in the corresponding derivative.

For systems with multiple control inputs, existing CBF methods may fail due to inconsistent relative degrees across control components. In such cases, we may define a relative degree set, which can be addressed by integral CBFs [48] that can make the desired (all) control components show up in the derivative. Since the function b is used to define a (safety) constraint b(x) ≥ 0, we will refer to the relative degree of b as the relative degree of the constraint. For a constraint b(x) ≥ 0 with relative degree m, b : R^n → R, and ψ_0(x) := b(x), we define a sequence of functions ψ_i : R^n → R, i ∈ {1, . . . , m}

ψ_i(x) := ψ̇_{i−1}(x) + α_i(ψ_{i−1}(x)), i ∈ {1, . . . , m}   (2)

where α_i(·), i ∈ {1, . . . , m} denotes an (m − i)th order differentiable class K function.

We further define a sequence of sets C_i, i ∈ {1, . . . , m} associated with (2) in the form

C_i := {x ∈ R^n : ψ_{i−1}(x) ≥ 0}, i ∈ {1, . . . , m}.   (3)

Definition 4 (HOCBF [12]): Let C_1, . . . , C_m be defined by (3) and ψ_1(x), . . . , ψ_m(x) be defined by (2). A function b : R^n → R is an HOCBF of relative degree m for system (1) if there exist (m − i)th order differentiable class K functions α_i, i ∈ {1, . . . , m − 1} and a class K function α_m such that

sup_{u∈U} [L_f^m b(x) + L_g L_f^{m−1} b(x) u + O(b(x)) + α_m(ψ_{m−1}(x))] ≥ 0   (4)

for all x ∈ C_1 ∩ . . . ∩ C_m. In (4), L_f^m (L_g) denotes Lie derivatives along f (g) m (one) times, and O(b(x)) = Σ_{i=1}^{m−1} L_f^i (α_{m−i} ∘ ψ_{m−i−1})(x). Further, b(x) is such that L_g L_f^{m−1} b(x) ≠ 0 on the boundary of the set C_1 ∩ . . . ∩ C_m.

The HOCBF is a general form of the relative degree one CBF [2] (setting m = 1 reduces the HOCBF to the common CBF form). We can define α_i(·), i ∈ {1, . . . , m} in Definition 4 to be extended class K functions to ensure robustness of an HOCBF to perturbations [2].

Theorem 1 ([12]): Given an HOCBF b(x) from Definition 4 with the associated sets C_1, . . . , C_m defined by (3), if x(0) ∈ C_1 ∩ . . . ∩ C_m, then any Lipschitz continuous controller u(t) that satisfies the constraint in (4), ∀t ≥ 0, renders C_1 ∩ . . . ∩ C_m forward invariant for system (1).

We provide a summary of notations in Table I.

IV. PROBLEM FORMULATION

We make the following assumptions in this work:
Assumption 1:
1) All measurements are assumed to be precise and reliable, and there are no occluded objects.
2) The environment model is assumed to be accessible without time delay.
3) The system is not subject to any disturbances.
4) The system dynamics (1) are assumed to accurately match the real system.
5) The robot moves at low speed, such that the control bounds are independent of each other.
The above assumptions can be addressed using the online verification method [22], but this approach tends to be computationally expensive. These assumptions can also be addressed using the efficient event-triggered control framework [29], in which the issues of unknown dynamics, disturbances, and measurement uncertainties are considered. This event-triggered framework can be directly applied to the proposed BarrierNet, and thus, we only focus on the theoretical foundations in this work. For reliability of perception and occluded objects, we may consider multiple sensors or infer from other observed participants, and this will be further studied in future work.

Here, we formally define the learning problem for safety-critical control.

Problem 1: Given:
1) a system with known affine dynamics in the form of (1);
2) a state-feedback nominal controller h*(x) = u (such as a model predictive controller) that is taken as the training label;
3) a set of safety constraints b_j(x) ≥ 0, j ∈ S (where b_j is continuously differentiable and S is a constraint set);
4) control bounds u_min ≤ u ≤ u_max;
5) a neural network controller h(x|θ) = u parameterized by θ;
our goal is to find the optimal parameters

θ* = arg min_θ E_x[l(h*(x), h(x|θ))]

while guaranteeing the satisfaction of the safety constraints in 3) and the control bounds in 4). E(·) is the expectation and l(·, ·) denotes a similarity measure.

Problem 1 defines policy distillation with safety guarantees. The safety constraints can be predefined by users, or they can be learned [33], [34].

V. BARRIERNET

We introduce BarrierNet, a CBF-based neural network controller with parameters trainable via backpropagation. We define the safety guarantees of a neural network controller as follows.

Definition 5 (Safety guarantees): A neural network controller has safety guarantees for system (1) if its outputs (controls) satisfy the control bounds 4) in Problem 1 and drive system (1) such that b_j(x(t)) ≥ 0, ∀t ≥ 0, ∀j ∈ S.

BarrierNet addresses several limitations of existing methods. The HOCBF method provides safety guarantees for control systems with arbitrary relative degrees, but in a conservative way. In other words, the satisfaction of the HOCBF constraint (4) is only a sufficient condition for the satisfaction of the original safety constraint b(x) ≥ 0. This conservativeness of the HOCBF method significantly limits the system's performance. For example, conservativeness may drive the system much further away from obstacles than necessary. Our first motivation for BarrierNet is to address this conservativeness.

More specifically, an HOCBF constraint is always a hard constraint in order to guarantee safety. This may adversely affect the performance of the system. For instance, if a class K function α_i in the HOCBF is too steep (like a step function), then the HOCBF is the least conservative. However, such an HOCBF constraint (4) becomes active only near the unsafe set boundary, requiring a large control input effort. Hence, there might not exist a feasible control that satisfies the HOCBF constraint (4) and the control bound at the same time. If, on the other hand, α_i is too flat, then the HOCBF is overconservative, as the HOCBF constraint becomes active when the system is far from the unsafe set boundary. Ideally, we wish to find an α_i that has the steepest slope while keeping the HOCBF constraint compatible with the control bound when it becomes active. In such cases, we call an HOCBF nonconservative. The current approaches [30], [31] only consider the feasibility issue of the CBF method without considering this conservativeness.

BarrierNet learns an HOCBF without losing the safety guarantees. In addition, we also incorporate the HOCBF in a differentiable optimization layer to allow the tuning of its parameters from data. Given a safety requirement b(x) ≥ 0 with relative degree m for system (1), we redefine the sequence of CBFs in (2) as

ψ_i(x, z, z_d) := ψ̇_{i−1}(x, z, z_d) + p_i(z) α_i(ψ_{i−1}(x, z, z_d)), i ∈ {1, . . . , m}   (5)

where ψ_0(x, z) = b(x). The variable z ∈ R^d is the input of the neural network (d ∈ N is the dimension of the features), and it is assumed that the relative degree of each component in z is not lower than that of the safety constraint (this makes sure that no control will appear in the derivatives of z during the construction of an HOCBF), which is reasonable since z often comes from high-dimensional sensory measurements like images; otherwise, its derivative can be omitted, as shown later. z_d = (z^(1), . . . , z^(m−1)) ∈ R^{(m−1)d} denotes the derivatives of the input z. p_i : R^d → R_{>0}, i ∈ {1, . . . , m} are the outputs of the previous layer (e.g., a CNN, LSTM, or MLP, as shown in Fig. 12) or trainable parameters themselves (i.e., p_i is independent of z), where R_{>0} denotes the set of positive scalars. Note that it is possible to absorb p_i(z) into the class K function α_i for notational simplicity, but it may raise the question of how to learn a class K function. We adopt the most general and flexible formulation that allows an arbitrary class K function definition. If p_i, i ∈ {1, . . . , m} are just trainable parameters, then the above learnable CBFs do not involve z_d. The above formulation is similar to AdaCBFs [32] and can still guarantee safety, but it is trainable and does not require us to design auxiliary dynamics for p_i (which an AdaCBF does require), a challenging aspect of the existing AdaCBF method. Then, we have a similar HOCBF constraint [called a differentiable CBF (dCBF)] as in Definition 4 of the form

L_f^m b(x) + [L_g L_f^{m−1} b(x)]u + O(b(x), z, z_d) + p_m(z) α_m(ψ_{m−1}(x, z, z_d)) ≥ 0.   (6)
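As a concrete illustration of (5) and (6), the following minimal numerical sketch (ours, not from the article) uses a 1-D double integrator with b(x) = pos − d, linear class K functions, and z_d = 0; a softplus output layer stands in for the upstream network that emits p_1, p_2, and all names and values are illustrative:

```python
import numpy as np

def penalties(z, W1, W2):
    """Positive penalties p_1(z), p_2(z) via a softplus output layer, standing
    in for the previous network layer that emits the p_i in (5)."""
    hidden = np.tanh(W1 @ z)
    return np.log1p(np.exp(W2 @ hidden))  # softplus keeps p_i > 0

def dcbf_lower_bound(pos, vel, p1, p2, d=1.0):
    """For b(x) = pos - d under a 1-D double integrator (relative degree m = 2),
    linear class-K functions, and z_d = 0, the dCBF constraint (6) reduces to
      u + (p1 + p2) * vel + p1 * p2 * (pos - d) >= 0."""
    return -(p1 + p2) * vel - p1 * p2 * (pos - d)

def rollout(p1, p2, pos=5.0, vel=0.0, u_des=-4.0, dt=1e-3, T=6.0):
    """Clip a desired control to the dCBF bound and track min b along the run."""
    b_min = pos - 1.0
    for _ in range(int(T / dt)):
        u = max(u_des, dcbf_lower_bound(pos, vel, p1, p2))
        pos, vel = pos + vel * dt, vel + u * dt
        b_min = min(b_min, pos - 1.0)
    return b_min
```

With the constraint active, larger penalties make the bound less restrictive far from the boundary (the nonconservative regime discussed above), while any positive p_1, p_2 keep b ≥ 0 up to discretization error.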
Fig. 3. BarrierNet structure. F serves as a reference output (control), and all the other parameters could be either trainable (parameters of the BarrierNet) or depend on the previous layers (inputs of the BarrierNet). Note that we can always compute the reference control as H^{−1}F using the reference F.

Given learnable dCBFs defined by (5) and (6) that are not conservative, we can incorporate them as a differentiable optimization layer, which is our BarrierNet.

Definition 6 (BarrierNet): A BarrierNet is composed of neurons of the form

u*(t) = arg min_{u(t)} (1/2) u(t)^T H(z|θ_h) u(t) + F^T(z|θ_f) u(t)   (7)

s.t.

L_f^m b_j(x) + [L_g L_f^{m−1} b_j(x)]u + O(b_j(x), (z, z_d)|θ_p) + p_m(z|θ_{p_m}) α_m(ψ_{m−1}(x, (z, z_d)|θ_p)) ≥ 0, j ∈ S
u_min ≤ u ≤ u_max, t = kΔt + t_0   (8)

where H(z|θ_h) ∈ R^{q×q} is positive definite, H^{−1}(z|θ_h)F(z|θ_f) ∈ R^q could be interpreted as a reference control (the output of previous network layers), θ_h, θ_f, θ_p = (θ_{p_1}, . . . , θ_{p_m}) are trainable parameters, and Δt > 0 is the discretization time.

At a high level, this formulation produces a control close to that of the nominal controller while satisfying the dCBF constraints, guaranteeing safety with minimal deterioration in performance. The inequality (8) in Definition 6 guarantees each safety constraint b_j(x) ≥ 0, ∀j ∈ S through the parameterized functions p_i, i ∈ {1, . . . , m}. Based on Definition 6, for instance, if we have ten controlled agents, we need ten BarrierNet neurons presented by (7) to ensure the safety of each agent. This implies that BarrierNet can be extended to multiagent settings.

In Definition 6, we make H(z|θ_h) parameterized and dependent on the network input z, but H can also consist of directly trainable parameters that do not depend on the previous layer (i.e., we have H). The same applies to p_i, i ∈ {1, . . . , m}. The trainable parameters are θ = {θ_h, θ_f, θ_p} (or θ = {H, θ_f, p_i, ∀i ∈ {1, . . . , m}} if H and p_i do not depend on the previous layer). The solution u* is the output of the neuron. The whole structure of a BarrierNet is shown in Fig. 3. The BarrierNet is differentiable with respect to its parameters [38]. We describe the forward and backward passes as follows.

A. Forward Pass

The forward step of a BarrierNet is to solve the QP in Definition 6. The inputs of a BarrierNet include environmental features z (such as the location and speed of an obstacle) that can be provided directly or from a tracking network if raw sensory inputs are used. BarrierNet also takes as inputs the system states x as feedback, as shown in Fig. 1. The outputs are the solutions of the QP (the resultant controls).

B. Backward Pass

The main task of BarrierNet is to provide controls while always ensuring safety. Suppose ℓ denotes some loss function (a similarity measure). Using the Lagrangian of the QP followed by applying the Karush–Kuhn–Tucker conditions, as introduced in [38], we can find the loss gradient with respect to all the parameters. Specifically, let λ denote the dual variables on the HOCBF constraints, let D(·) create a diagonal matrix from a vector, and let u*, λ* denote the optimal solutions of u and λ, respectively. We can first get d_u and d_λ in the form

[d_u; d_λ] = −[H, G^T D(λ*); G, D(Gu* − h)]^{−1} [(∂ℓ/∂u*)^T; 0]   (9)

where G, h are concatenated from G_j, h_j, j ∈ S, with

G_j = −L_g L_f^{m−1} b_j(x)
h_j = L_f^m b_j(x) + O(b_j(x), z, z_d) + p_m(z) α_m(ψ_{m−1}(x, z, z_d)).   (10)

Since the control bounds in (7) are not trainable, they are not included in G, h.

Then, the relevant gradients with respect to all the BarrierNet parameters can be given by¹

∇_H ℓ = (1/2)(d_u u^T + u d_u^T), ∇_F ℓ = d_u
∇_G ℓ = D(λ*)(d_λ u^T + λ d_u^T), ∇_h ℓ = −D(λ*) d_λ.   (11)

In the above equations, ∇_G ℓ is not applicable in a BarrierNet, as G is determined by the corresponding HOCBF. ∇_h ℓ is also not directly related to the input of a BarrierNet. Nevertheless, we have ∇_{p_i} ℓ = ∇_{h_j} ℓ ∇_{p_i} h_j, i ∈ {1, . . . , m}, j ∈ S, where ∇_{h_j} ℓ is given by ∇_h ℓ in (11) and ∇_{p_i} h_j is obtained by taking the partial derivative of h_j in (10).

The following theorem characterizes the safety guarantees of a BarrierNet.

Theorem 2: If p_i(z), i ∈ {1, . . . , m} are differentiable (and, further, the relative degree of each component in z is not lower than that of the safety constraint) or p_i, i ∈ {1, . . . , m} are trainable parameters, then a BarrierNet composed of neurons as in Definition 6 guarantees the safety of system (1).

Proof: If p_i, i ∈ {1, . . . , m} are trainable parameters (i.e., p_i is independent of z), then the dCBF (6) is just a regular HOCBF whose parameters are optimally determined by the training data, and thus, safety is guaranteed for system (1) by Theorem 1.

¹ Note that the gradients with respect to the parameters θ_h, θ_f, and θ_p can be obtained using the chain rule.
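The forward and backward passes of (7)–(11) can be sketched in numpy for a single dCBF constraint with inactive control bounds (assumptions we make here so the QP reduces to one half-space; a full implementation would use a batched differentiable QP layer as in [38]):

```python
import numpy as np

def qp_forward(H, F, G, h):
    """Solve min 0.5 u^T H u + F^T u  s.t.  G u <= h, by checking whether the
    unconstrained optimum is feasible and otherwise solving the equality KKT
    system (a sketch sufficient for a single constraint)."""
    u = np.linalg.solve(H, -F)
    lam = np.zeros(G.shape[0])
    if (G @ u > h).any():  # constraint active
        n, m = H.shape[0], G.shape[0]
        K = np.block([[H, G.T], [G, np.zeros((m, m))]])
        sol = np.linalg.solve(K, np.concatenate([-F, h]))
        u, lam = sol[:n], sol[n:]
    return u, lam

def qp_backward(H, G, h, u, lam, dl_du):
    """Differentials via the KKT system (9) and gradients via (11)."""
    n, m = H.shape[0], G.shape[0]
    M = np.block([[H, G.T * lam], [G, np.diag(G @ u - h)]])
    d = np.linalg.solve(M, -np.concatenate([dl_du, np.zeros(m)]))
    du, dlam = d[:n], d[n:]
    grad_F = du
    grad_H = 0.5 * (np.outer(du, u) + np.outer(u, du))
    grad_h = -lam * dlam
    return grad_F, grad_H, grad_h
```

For instance, with H = I, F = (1, 1), and the constraint u_1 ≤ −2, the gradients returned for ℓ = (1/2)||u*||² match finite differences of the solution map.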
If p_i(z), i ∈ {1, . . . , m} are differentiable and the relative degree of each component in z is not lower than that of the safety constraint (no control will appear in the derivatives of z during the construction of an HOCBF), it follows from Theorem 1 that each ψ_i(x, z, z_d) in (5) is a valid CBF. Starting from ψ_m(x, z, z_d) ≥ 0 [the non-Lie-derivative form of each HOCBF constraint (6)], we can show that ψ_{m−1}(x, z, z_d) ≥ 0 is guaranteed to be satisfied following Theorem 1. Recursively, we can show that ψ_0(x, z) ≥ 0 is guaranteed to be satisfied. As b(x) = ψ_0(x, z) following (5), system (1) is guaranteed to be safe in a BarrierNet.

Consider the case where p_i(z), i ∈ {1, . . . , m} depend on the input. In order to guarantee that p_i(z), i ∈ {1, . . . , m} are continuously differentiable and positive, we can choose differentiable activation functions for the previous layers, such as sigmoid functions. The derivatives of the penalty functions p_i(z), i ∈ {1, . . . , m}, i.e., the derivatives of z, may be hard to evaluate. For instance, if z contains the pixels of an image, then its derivative is inaccurate when the camera sampling frequency is low. Nevertheless, we have the following corollary to show the safety guarantees of the BarrierNet.

Corollary 1: If z_d = 0, the class K functions α_i(·), i ∈ {1, . . . , m} in (5) are linear functions, and p_i(z), i ∈ {1, . . . , m} are such that x ∈ C_1 ∩ . . . ∩ C_m, then a BarrierNet composed of neurons as in Definition 6 guarantees the safety of system (1).

Proof: When the class K functions α_i(·), i ∈ {1, . . . , m} in (5) are linear functions, the dCBF constraint (8) can be rewritten as

L_f^m b_j(x) + [L_g L_f^{m−1} b_j(x)]u + Σ_{r=1}^{m} [ Σ_{1≤k_1<k_2<...<k_r≤m} Π_{l=1}^{r} p_{k_l}(z) + P_r(ṗ) ] b_j^{(m−r)}(x) ≥ 0   (12)

where b_j^{(0)}(x) = b_j(x) and P_r(ṗ), r ∈ {1, . . . , m} are polynomials of the derivatives of p_i(z), i ∈ {1, . . . , m}, such that z_d = 0 implies P_r(ṗ) = 0, ∀r ∈ {1, . . . , m}. Note that setting z_d = 0 is equivalent to setting p_i to be piecewise constant among all discretized time intervals, where the value of p_i within each time interval is determined by the value of the observation z at the beginning of that interval. If z depends on x, then in order to make the control u not show up in z_d [equivalently, in P_r(ṗ)], we may use an observation z such that p_i(z) has a relative degree that is not lower than that of the safety constraint.

According to the exponential CBF [3], a system is guaranteed to be safe in terms of the constraint b_j(x) ≥ 0 if the polynomial

s^m + Σ_{r=0}^{m−1} l_r s^r = 0   (13)

corresponding to

L_f^m b_j(x) + [L_g L_f^{m−1} b_j(x)]u + Σ_{r=0}^{m−1} l_r b_j^{(r)}(x) ≥ 0   (14)

has only negative real roots and p_i(z), i ∈ {1, . . . , m} are such that x ∈ C_1 ∩ . . . ∩ C_m.

As shown in (12), if z_d = 0, then P_r(ṗ) = 0, i.e., (12) can be rewritten as

L_f^m b_j(x) + [L_g L_f^{m−1} b_j(x)]u + Σ_{r=1}^{m} [ Σ_{1≤k_1<k_2<...<k_r≤m} Π_{l=1}^{r} p_{k_l}(z) ] b_j^{(m−r)}(x) ≥ 0.   (15)

Combining (15) and (14), and by Vieta's formulas, we have that −p_i(z), i ∈ {1, . . . , m} (p_i(z) > 0) are the negative real roots of the polynomial (13). Therefore, the condition (15) implies the satisfaction of b_j(x) ≥ 0, and the BarrierNet composed of neurons as in Definition 6 guarantees the safety of system (1).

In order to get piecewise-constant penalties p_i(z) (i.e., z_d = 0) in dCBFs, we can determine the value of p_i(z) and make it a constant before the construction of the dCBFs, which avoids taking the derivatives of p_i(z). If we do wish to consider z_d in the BarrierNet, then the uncertainty of z_d can be addressed by considering the intersampling effect (i.e., constraint satisfaction between discrete time instants) [28], [29] when the bounds of z_d are known. We have shown the safety guarantees of a BarrierNet with linear class K functions when z_d = 0 in Corollary 1. We can also infer that safety is still guaranteed for other types of class K functions in a BarrierNet. p_i(z) can be truncated to satisfy x ∈ C_1 ∩ . . . ∩ C_m, as discussed in [12]. If no such p_i(z) exists (such as due to uncertainties), then we can define all class K functions to be extended class K functions [such as the linear functions shown in (15)] to achieve robust control [2].

Remark 1 (Adaptivity of the BarrierNet): The HOCBF constraints in a BarrierNet are regulated by the trainable penalty functions without losing safety guarantees. The penalty functions are environment-dependent. Their features can be calculated from upstream networks. The adaptive property of the HOCBFs provides the adaptiveness of the BarrierNet. As a result, BarrierNet is able to generate safe controls while avoiding overly conservative behavior.

Remark 2 (Feasibility guarantees of a BarrierNet): Due to the existence of the control bound in a BarrierNet, the dCBF-based QP (7) may become infeasible because of a possible conflict between the control bound and the dCBF constraints. In order to address this, we may require that the nominal controller provide ground truth (i.e., control labels) that strictly satisfies the safety constraints and the control bounds. Then, during the training of the BarrierNet layer, we can relax/remove the control bounds. After the neural network converges, the differentiable QPs are feasible when we add the control bounds back in testing or implementation. However, there is still the possibility that the QP could be infeasible, as the BarrierNet may receive some inputs that it has not seen before. However, we can find sufficient conditions for feasibility, as shown in [31]. Briefly, this approach finds a feasibility constraint on the state of the system along with the penalties p_i(z), i ∈ {1, . . . , m}, and then enforces this
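The coefficient pattern used in the proof of Corollary 1 can be checked numerically: with constant penalties, each application of (5) multiplies the characteristic polynomial by (s + p_i), so the coefficient on b^(m−r) in (15) is the rth elementary symmetric polynomial of p_1, . . . , p_m, and the roots are −p_i, as Vieta's formulas state. A small self-contained check with illustrative penalty values:

```python
from itertools import combinations
from math import prod

def psi_coefficients(p):
    """Coefficients c with psi_m = sum_k c[k] * b^(k), from the recursion
    psi_i = d/dt psi_{i-1} + p_i * psi_{i-1} with constant p_i (z_d = 0) and
    linear class-K functions: each step multiplies the polynomial by (s + p_i)."""
    c = [1.0]  # psi_0 = b
    for pi in p:
        new = [0.0] * (len(c) + 1)
        for k, ck in enumerate(c):
            new[k + 1] += ck   # the d/dt term shifts b^(k) to b^(k+1)
            new[k] += pi * ck  # the p_i * psi_{i-1} term
        c = new
    return c

def elementary_symmetric(p, r):
    """e_r(p) = sum over 1 <= k_1 < ... < k_r <= m of prod p_{k_l}, as in (15)."""
    return sum(prod(combo) for combo in combinations(p, r))
```

For p = (2, 3, 5), the expanded constraint coefficients correspond to s³ + 10s² + 31s + 30, whose roots are exactly −2, −3, −5.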
Fig. 5. Control and penalty function p1 (z) from the BarrierNet when training
with the optimal controller. The blue curves (labeled as implementation) are the
vehicle control when we apply the BarrierNet to drive the vehicle dynamics to
pass through the CZ.
(x − x_o)^2 + (y − y_o)^2 ≥ R^2   (21)
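Under double-integrator dynamics, constraint (21) has relative degree two, and a sketch of the corresponding HOCBF safety filter can be written in closed form (a sketch of ours with illustrative obstacle parameters and constant penalties k1, k2; the single half-space constraint makes a general QP solver unnecessary):

```python
import numpy as np

XO, YO, R = 3.0, 0.0, 1.0  # illustrative obstacle center and radius

def b(x):
    px, vx, py, vy = x
    return (px - XO) ** 2 + (py - YO) ** 2 - R ** 2

def hocbf_terms(x):
    """Lie derivatives of b along the double-integrator dynamics
    (px', vx', py', vy') = (vx, u1, vy, u2)."""
    px, vx, py, vy = x
    Lf_b = 2 * (px - XO) * vx + 2 * (py - YO) * vy
    L2f_b = 2 * vx ** 2 + 2 * vy ** 2
    LgLf_b = np.array([2 * (px - XO), 2 * (py - YO)])
    return Lf_b, L2f_b, LgLf_b

def safe_control(x, u_ref, k1=1.0, k2=1.0):
    """Closest control to u_ref satisfying the relative-degree-2 HOCBF
    constraint L2f_b + LgLf_b . u + (k1 + k2) Lf_b + k1 k2 b >= 0
    (closed-form projection onto a half-space)."""
    Lf_b, L2f_b, a = hocbf_terms(x)
    c = L2f_b + (k1 + k2) * Lf_b + k1 * k2 * b(x)
    slack = a @ u_ref + c
    if slack >= 0:
        return u_ref
    return u_ref - a * slack / (a @ a)
```

For a robot at (1, 0) moving toward the obstacle with unit speed, the filter leaves a safe reference control untouched and otherwise applies the minimal braking that keeps the HOCBF constraint satisfied.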
Fig. 8. Controls and trajectories from the FC, DFB, and BarrierNet under the training obstacle size R = 6 m. The results refer to the case in which the trained FC/DFB/BarrierNet controller is used to drive a robot to its destination. Safety is guaranteed by both the DFB and BarrierNet models, but not by the FC model. The DFB tends to be more conservative, such that its trajectories/controls stay away from the ground truth, as its CBF parameters are not adaptive. The varying penalty functions allow the generation of the desired control signals and trajectories (given by training labels) and demonstrate the adaptivity of the BarrierNet with safety guarantees. (a) Control u_1. (b) Control u_2. (c) Penalty functions. (d) Robot trajectories.
When we increase the obstacle size during implementation (i.e., when the trained BarrierNet/DFB/FC controller is used to drive a robot to its destination), the controls u_1, u_2 from the BarrierNet and DFB deviate from the ground truth, as shown in Fig. 9(a) and (b). This is due to the fact that the BarrierNet and DFB will always ensure safety first. Therefore, safety is always guaranteed in the BarrierNet and DFB, as shown by the solid and dashed curves in Fig. 10(a). Both the BarrierNet and DFB show some adaptivity to the size change of the obstacle, while the FC controller cannot adapt to the size change of the obstacle. Thus, the safety constraint (21) will be violated, as shown by the dotted curves in Fig. 10(a).

The difference between the DFB and the proposed BarrierNet is in the performance. In Fig. 10(b), we show all the trajectories from the BarrierNet, DFB, and FC controllers under different obstacle sizes. Collisions are avoided under the BarrierNet and DFB controllers, as shown by all the solid and dashed trajectories and the corresponding obstacle boundaries in Fig. 10(b). However, as shown in Fig. 10(b), the trajectories from the BarrierNet (solid) can stay closer to the ground truth (red-solid) than the ones from the DFB (dashed) when R = 6 m (and for other R values). This is due to the fact that the CBFs in the DFB may not be properly defined, such that the CBF constraint becomes active too early when the robot gets close to the obstacle. It is important to note that the robot does not have to stay close to the obstacle boundary under the BarrierNet controller; this depends entirely on the ground truth. The definitions of the CBFs in the proposed BarrierNet depend on the environment (network input), and thus, they are adaptive and without conservativeness.

The profiles of the penalty functions p_1(z) and p_2(z) in the BarrierNet are shown in Fig. 8(c). The values of the penalty functions vary as the robot approaches the obstacle and gets to its destination, which shows the adaptivity of the BarrierNet in the sense that, with the varying penalty functions, a BarrierNet can produce the desired control signals given by labels (ground truth). This is due to the fact that the varying penalty functions change the slope of the class K functions in the HOCBF constraint without losing safety guarantees.

C. Three-Dimensional Robot Navigation

1) Experiment Setup: We consider a robot navigation problem with obstacle avoidance in 3-D space. In this case, we consider complicated superquadratic safety constraints. The robot navigates according to the double-integrator dynamics. The state of the robot is x = (p_x, v_x, p_y, v_y, p_z, v_z) ∈ R^6, in which the components denote the position and speed along the x, y, z axes. The three control inputs u_1, u_2, and u_3 are the accelerations along the x, y, and z axes, respectively.

2) BarrierNet Design: The robot is required to avoid a superquadratic obstacle in its path, i.e., the state of the robot should satisfy

(p_x − x_o)^4 + (p_y − y_o)^4 + (p_z − z_o)^4 ≥ R^4   (25)

where (x_o, y_o, z_o) ∈ R^3 denotes the location of the obstacle, and R > 0 is the half-length of the superquadratic obstacle.

The goal is to minimize the control input effort subject to the safety constraint (25) as the robot approaches its destination. The relative degree of the safety constraint (25) is two with respect to the dynamics; thus, we use an HOCBF b(x) = (p_x − x_o)^4 + (p_y − y_o)^4 + (p_z − z_o)^4 − R^4 to enforce it. Any control input u should satisfy the HOCBF constraint (4), which in this case (choosing α_1, α_2 in Definition 4 as linear functions) is

−L_g L_f b(x)u ≤ L_f^2 b(x) + (p_1(z) + p_2(z))L_f b(x) + (ṗ_1(z) + p_1(z)p_2(z))b(x)   (26)
where

L_g L_f b(x) = [4(p_x − x_o)^3, 4(p_y − y_o)^3, 4(p_z − z_o)^3]
L_f^2 b(x) = 12(p_x − x_o)^2 v_x^2 + 12(p_y − y_o)^2 v_y^2 + 12(p_z − z_o)^2 v_z^2
L_f b(x) = 4(p_x − x_o)^3 v_x + 4(p_y − y_o)^3 v_y + 4(p_z − z_o)^3 v_z.   (27)

In the above equations, z = x is the input to the model, and p_1(z), p_2(z) are the trainable penalty functions. ṗ_1(z) is also set to 0, as in the 2-D navigation case.

The cost in the neuron of the BarrierNet is given by

min_u (u_1 − f_1(z))^2 + (u_2 − f_2(z))^2 + (u_3 − f_3(z))^2   (28)

where f_1(z), f_2(z), and f_3(z) are reference controls provided by the upstream network (the outputs of the FC network).

3) Results and Discussion: The training data are obtained by solving a fine-tuned CBF controller introduced in [12]. We compare the FC model with our proposed BarrierNet. The training and testing results are shown in Fig. 11. The controls from the BarrierNet have some errors with respect to the ground truth, and this is due to the complicated safety constraint (25). We can improve the tracking accuracy with deeper BarrierNet models (not the focus of this article). Nevertheless, the implementation trajectory under the BarrierNet controller is close to the ground truth, as shown in Fig. 11(b).

The robot is guaranteed to be collision-free from the obstacle under the BarrierNet controller, as shown by the solid-blue line in Fig. 11(b), while the robot under the FC may collide with the obstacle, as there is no safety guarantee, as shown by the dotted-blue line in Fig. 11(b). The barrier function in Fig. 11(a) also demonstrates the safety guarantees of the BarrierNet, but not of the FC model.

Fig. 9. Controls from the BarrierNet and DFB under different obstacle sizes. The BarrierNet and DFB are trained under the obstacle size R = 6 m. The results refer to the case in which the trained BarrierNet/DFB controller is used to drive a robot to its destination. When we increase the obstacle size during implementation, the outputs (controls of the robot) of the BarrierNet and the DFB adjust accordingly in order to guarantee safety, as shown by the blue and cyan curves. However, the BarrierNet tends to be less conservative in unseen situations. (a) Control u_1. (b) Control u_2.

Fig. 10. Safety metrics for the BarrierNet, the DFB, and the FC network. The BarrierNet, the DFB, and the FC network are trained under the obstacle size R = 6 m. b(x) ≥ 0 implies a safety guarantee. The trajectories under the FC controller coincide, as the FC cannot adapt to the size change of the obstacle. (a) HOCBF b(x) profiles under different obstacle sizes. (b) Robot trajectories under different obstacle sizes.

VII. BARRIERNET FOR VISION-BASED AUTONOMOUS DRIVING

In this section, we use the proposed BarrierNet methodology to achieve safety in a complex learning system: vision-based end-to-end control of lane following for autonomous driving. We first discuss where the two "ends" should be defined for learning a good model based on limited data.

An end-to-end learning system is hard to interpret, especially in the setting of inferring control from high-dimensional sensor
sorted obstacles. The sorted covering approach can make sure that the vehicle may leave the road in order to avoid collision with obstacles. In this setting, although we may have redundant differentiable HOCBFs in terms of obstacle avoidance, these HOCBFs always play an important role in guiding the vehicle, either in lane keeping or obstacle avoidance.
The whole process for vision-based end-to-end autonomous driving includes the following:
1) generating training data and control labels;
2) training the model using supervised learning;
3) estimating the vehicle state and obstacle state in real time during testing;
4) propagating the model forward with front-view images as inputs to generate safe controls for the ego vehicle.
We summarize the algorithm for end-to-end autonomous driving in Algorithm 2.
4) Multiple and Active Obstacles: The BarrierNet can handle multiple obstacles by adding the corresponding dCBF constraints to the differentiable QP. Each of the two road boundaries could be viewed as an obstacle in autonomous driving. The BarrierNet can also work for dynamic obstacles (such as other active vehicles, pedestrians, etc.), but the safety guarantees for dynamic obstacles in the BarrierNet require additional state estimates (such as moving speed) for those obstacles. In fact, image-based state estimation is usually hard for states other than positions (such as speed, acceleration, etc.). Thus, we may need to use other sensors (such as light detection and ranging) to better estimate the states of dynamic obstacles and conduct multisensor fusion. The uncertainty of surrounding agents can be considered in the dCBFs using the uncertainty bound if it is known, as shown in [52]. The conservativeness of this approach can be addressed using the adaptive event-driven approach for CBFs [56].

VIII. VISION-BASED AUTONOMOUS DRIVING EXPERIMENTS

In this section, we show experiments with the proposed vision-based end-to-end autonomous driving framework in both sim-to-real environments and on a full-scale autonomous vehicle. We start by introducing the hardware platform and data collection, followed by implementation details of the proposed model. We then present extensive analysis in the sim-to-real environment virtual image synthesis and transformation for autonomy (VISTA) [55]. Finally, we showcase results with real-car deployment.

A. Hardware Setup and Real-World Data Collection

We deploy our models onboard a full-scale autonomous vehicle (2019 Lexus RX 450H) equipped with an NVIDIA 2080Ti GPU and an AMD Ryzen 7 3800X 8-Core Processor. We use a red, green, and blue (RGB) camera BFS-PGE-23S3C-CS as the primary perception sensor, which runs at 30 Hz, with a resolution of 960×600 and a 130° horizontal field of view. Other onboard sensors include inertial measurement sensors and wheel encoders to measure steering feedback and odometry. Also, we use a differential global positioning system (dGPS) for evaluation purposes. To run the data-driven simulation VISTA [55], we collect real-world data from a wide range of environments, including different times of day, weather conditions, and seasons of the year. The entire dataset consists of roughly 2 h of driving data, which is further augmented with our training dataset generation pipeline using VISTA.

B. Synthetic Training Dataset Generation

We train our model with guided policy learning, which has been shown to improve effectiveness for direct model transfer to real-car deployment (other techniques, such as conformance checking [27], can also be used to achieve sim-to-real transfer). The data generation process is as follows: 1) in VISTA, randomly initialize both the ego- and ado-car with different configurations, such as relative poses, geographical locations associated with the real dataset, and the appearance of the vehicle; 2) run an optimal controller with access to privileged information to steer the ego-vehicle and collect ground-truth control outputs with corresponding states; 3) collect RGB images at viewpoints along the trajectories. We choose nonlinear model predictive control (NMPC) as the privileged (nominal) controller. While NMPC is usually computationally expensive and hard to solve, it is tractable offline and, with jerk u^jerk and steering acceleration u^steer as controls, provides smooth acceleration a and steering rate ω, which are used as learning targets in BarrierNet. The vehicle dynamics of NMPC and BarrierNet (1) are defined with respect to a reference trajectory [57]. The model measures the along-trajectory distance s ∈ R and the lateral distance d ∈ R of the vehicle center of gravity (CoG) with respect to the closest point on the reference trajectory

$$
\underbrace{\begin{bmatrix} \dot s \\ \dot d \\ \dot\mu \\ \dot v \\ \dot a \\ \dot\delta \\ \dot\omega \end{bmatrix}}_{\dot x}
= \underbrace{\begin{bmatrix} \frac{v\cos(\mu+\beta)}{1-d\kappa} \\ v\sin(\mu+\beta) \\ \frac{v\sin\beta}{l_r}-\kappa\frac{v\cos(\mu+\beta)}{1-d\kappa} \\ a \\ 0 \\ \omega \\ 0 \end{bmatrix}}_{f(x)}
+ \underbrace{\begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{bmatrix}}_{g(x)}
\begin{bmatrix} u^{\mathrm{jerk}} \\ u^{\mathrm{steer}} \end{bmatrix}
\qquad (29)
$$

where μ is the vehicle local heading error, determined by the difference between the global vehicle heading θ ∈ R and the tangent angle φ ∈ R of the closest point on the reference trajectory (i.e., θ = φ + μ), as shown in Fig. 14; v, a denote the vehicle linear speed and acceleration, respectively; δ, ω denote the steering angle and steering rate, respectively; κ is the curvature of the reference trajectory at the closest point; l_r is the length of the vehicle from the tail to the CoG; and u^jerk, u^steer denote the two control inputs for jerk and steering acceleration (in the nominal controller), respectively. Here, β = arctan(l_r/(l_r + l_f) tan δ), where l_f is the length of the vehicle from the head to the CoG. We set the receding horizon of the NMPC to 20 time steps during data sampling, and it is implemented in a virtual simulation environment in MATLAB. We augment the real-world dataset using VISTA and NMPC with synthetic obstacle avoidance and lane following data. In total, the training dataset has around 400 k images.
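As a sanity check on the curvilinear model (29), the dynamics can be rolled out numerically. The sketch below is our own illustration; the geometry parameters l_r, l_f and the step size dt are placeholder values, not those of the test vehicle. It implements the right-hand side of (29) and a forward-Euler rollout:

```python
import numpy as np

def vehicle_dynamics(x, u, kappa=0.0, l_r=1.7, l_f=1.2):
    """RHS of (29): x = [s, d, mu, v, a, delta, omega], u = [u_jerk, u_steer].
    kappa is the reference-path curvature; l_r, l_f are illustrative lengths."""
    s, d, mu, v, a, delta, omega = x
    beta = np.arctan(l_r / (l_r + l_f) * np.tan(delta))  # slip angle at the CoG
    return np.array([
        v * np.cos(mu + beta) / (1.0 - d * kappa),                       # s_dot
        v * np.sin(mu + beta),                                           # d_dot
        v * np.sin(beta) / l_r
            - kappa * v * np.cos(mu + beta) / (1.0 - d * kappa),         # mu_dot
        a,                                                               # v_dot
        u[0],                                                            # a_dot = jerk input
        omega,                                                           # delta_dot
        u[1],                                                            # omega_dot = steering accel input
    ])

def euler_rollout(x0, controls, dt=0.05):
    """Forward-Euler integration of the dynamics for a control sequence."""
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for u in controls:
        x = x + dt * vehicle_dynamics(x, u)
        traj.append(x.copy())
    return np.array(traj)

# Constant jerk, no steering: speed grows while the vehicle tracks the path.
traj = euler_rollout([0, 0, 0, 5.0, 0, 0, 0], [np.array([1.0, 0.0])] * 40)
```

With zero steering and zero curvature, β = 0, so the lateral distance d stays at zero while s and v grow, matching the first and fourth rows of (29).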
Fig. 15. Lane following probabilistic comparisons of deviation from the lane
center in a BarrierNet with/without lane keeping CBFs.
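The gap that Fig. 15 visualizes comes from a lane-keeping CBF acting on the lateral deviation d. As a toy, self-contained illustration (a single-integrator lateral model and barrier of our own choosing, not the paper's HOCBF formulation), a barrier b(d) = d_max² − d² and its one-constraint safety filter can be written as:

```python
D_MAX = 1.5  # half lane width in meters; illustrative value

def b(d):
    """Lane-keeping barrier: nonnegative while the vehicle is inside the lane."""
    return D_MAX**2 - d**2

def cbf_filter(d, v_lat_ref, alpha=2.0):
    """Minimally modify a reference lateral velocity so that
    b_dot + alpha * b >= 0 holds for the toy dynamics d_dot = v_lat.
    Since b_dot = -2 d v_lat, the constraint is a_coef * v_lat >= rhs."""
    a_coef = -2.0 * d           # coefficient of v_lat in the CBF condition
    rhs = -alpha * b(d)         # rhs < 0 whenever the state is inside the lane
    if a_coef == 0.0 or a_coef * v_lat_ref >= rhs:
        return v_lat_ref        # reference already satisfies the condition
    return rhs / a_coef         # closed-form projection (single constraint)
```

For example, cbf_filter(1.0, 3.0) clamps a 3 m/s reference toward the boundary down to 1.25 m/s when the vehicle is already 1 m off-center, while references pointing back to the lane center pass through unchanged; this push-back toward d = 0 is the with-CBF behavior summarized in Fig. 15.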
TABLE III
CRASH RATE AND CLEARANCE WITH/WITHOUT BARRIERNET, USING OR NOT
USING GROUND TRUTH OBSTACLE INFORMATION
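The two metrics in Table III can be computed from the logged closest-approach distance of each trial; one possible formulation (our own, for illustration; the paper does not spell out the exact computation) is:

```python
import numpy as np

def crash_rate(min_distances, vehicle_radius):
    """Fraction of trials in which the ego ever came closer to the obstacle
    center than the (disk-covering) vehicle radius."""
    d = np.asarray(min_distances, dtype=float)
    return float(np.mean(d < vehicle_radius))

def mean_clearance(min_distances, vehicle_radius):
    """Average closest approach beyond the vehicle radius, over non-crash trials."""
    d = np.asarray(min_distances, dtype=float)
    safe = d[d >= vehicle_radius]
    return float(np.mean(safe - vehicle_radius)) if safe.size else 0.0

# Four trials: closest approach to the obstacle center in each run.
dists = [2.0, 0.8, 1.5, 2.5]
rate = crash_rate(dists, vehicle_radius=1.0)       # one of four runs crashed
clear = mean_clearance(dists, vehicle_radius=1.0)  # averaged over the three safe runs
```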
Fig. 17. Penalty p1(z), p2(z) variation in a dCBF (BarrierNet) when approaching an obstacle under two (different) trained BarrierNets. The relative degree of the safety constraint is two, and thus we have two CBF parameters in one CBF. The segments inside the dotted boxes denote intervals when the ego vehicle is near the obstacle. The box sizes are different as the ego has different speeds when passing the obstacle.
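To see concretely how the two penalties p1, p2 enter, consider a toy double integrator (ẋ = v, v̇ = u) approaching a static obstacle at x_obs, with b(x) = x_obs − x and linear class-K functions. The HOCBF condition then reads b̈ + (p1 + p2)ḃ + p1 p2 b ≥ 0, i.e., −u − (p1 + p2)v + p1 p2 (x_obs − x) ≥ 0, which yields an explicit acceleration bound. This is our own minimal illustration, not the paper's full dCBF layer:

```python
def hocbf_accel_bound(x, v, x_obs, p1, p2):
    """Upper bound on acceleration implied by the relative-degree-2 HOCBF
    b = x_obs - x for a double integrator (x_dot = v, v_dot = u):
    -u - (p1 + p2) * v + p1 * p2 * (x_obs - x) >= 0."""
    return p1 * p2 * (x_obs - x) - (p1 + p2) * v

def dcbf_filter(u_ref, x, v, x_obs, p1, p2):
    """Clamp the reference acceleration to the HOCBF bound; larger penalties
    let the ego approach more closely before braking is enforced."""
    return min(u_ref, hocbf_accel_bound(x, v, x_obs, p1, p2))
```

For an ego at x = 0 moving at v = 5 toward an obstacle at x_obs = 4, the filter brakes at −6 with p1 = p2 = 1 but only at −4 with p1 = p2 = 2: larger penalties relax the bound near the obstacle, which is the conservativeness trade-off that the trained penalties in Fig. 17 modulate.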
Fig. 18. Illustration of real-car experiments. We show a driver's view with a real car as an obstacle in front of it (middle) and with a car in front of it using AR (right).
Fig. 19. Two cases of experimental vehicle trajectories in lane keeping with/without lane keeping CBFs in the BarrierNet. Tire slipping happened on the icy road.

Fig. 20. Two cases of experimental vehicle trajectories in obstacle avoidance with/without BarrierNet. In the left case, the heavy snow by the road is preventing the vehicle from getting back to the road due to tire slipping, and thus the vehicle recovers slowly even when the steering wheel is at its left limit.

site (covariance < 1 cm), we provide qualitative analysis with a side-by-side comparison between models with and without BarrierNet.
1) BarrierNet in Challenging Sharp Turns: In Fig. 19, we demonstrate the driving trajectories of BarrierNet with and without lane-keeping CBFs in sharp left and right turns. We show the footprint of the vehicle through time and indicate the forward direction with arrows. Without lane-keeping CBFs (red), the car is more prone to go off-road, while roughly correct estimates of the deviation from the lane center (d) impose an additional layer of safety with lane-keeping CBFs (blue).
2) Obstacle Avoidance in Real World: We also did experiments on the autonomous car in obstacle avoidance, as shown in Fig. 20. The first example (left) demonstrates that with reasonable reference control (both models successfully avoid the obstacle), the model with obstacle avoidance dCBFs (blue) creates more clearance to achieve better safety. The second example (right) highlights the effectiveness of BarrierNet (blue) when the reference control (red) fails to avoid the front car and requires correction from activated dCBF constraints.
3) Limitations: The proposed BarrierNet is subject to several challenges in vision-based driving, which motivate future work as follows: 1) CBFs for all kinds of traffic participants are constructed using a disk covering approach; however, how to efficiently construct CBFs is still challenging (including CBFs for lane keeping) when there are many participants; 2) occluded obstacles introduce uncertainties to the safety of the ego vehicle; 3) safety guarantees under dynamic obstacles are challenging since we also need to know obstacle dynamics and states.

IX. CONCLUSION

In this article, we proposed BarrierNet, a differentiable HOCBF layer that is trainable and guarantees safety with respect to user-defined safe sets. BarrierNet can be integrated with any upstream neural network controller to provide a safety layer. In our experiments, we show that the proposed BarrierNet can guarantee safety while addressing the conservativeness that CBFs induce. A potential future avenue of research emerging from this work will be to simultaneously learn the system dynamics and unsafe sets with BarrierNets. This can be enabled using the expressive class of continuous-time neural network models [58], [59], [60].

ACKNOWLEDGMENT

The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the United States Air Force or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for government purposes, notwithstanding any copyright notation herein.

REFERENCES
[1] A. D. Ames, J. W. Grizzle, and P. Tabuada, "Control barrier function based quadratic programs with application to adaptive cruise control," in Proc. IEEE 53rd Conf. Decis. Control, 2014, pp. 6271–6278.
[2] A. D. Ames, X. Xu, J. W. Grizzle, and P. Tabuada, "Control barrier function based quadratic programs for safety critical systems," IEEE Trans. Autom. Control, vol. 62, no. 8, pp. 3861–3876, Aug. 2017.
[3] Q. Nguyen and K. Sreenath, "Exponential control barrier functions for enforcing high relative-degree safety-critical constraints," in Proc. Amer. Control Conf., 2016, pp. 322–328.
[4] L. Wang, E. A. Theodorou, and M. Egerstedt, "Safe learning of quadrotor dynamics using barrier certificates," in Proc. IEEE Int. Conf. Robot. Automat., 2018, pp. 2460–2465.
[5] A. Taylor, A. Singletary, Y. Yue, and A. Ames, "Learning for safety-critical control with control barrier functions," in Proc. Learn. Dyn. Control, 2020, pp. 708–717.
[6] J. Choi, F. Castañeda, C. J. Tomlin, and K. Sreenath, "Reinforcement learning for safety-critical control under model uncertainty, using control Lyapunov functions and control barrier functions," in Proc. Robot.: Sci. Syst., 2020.
[7] A. J. Taylor, A. Singletary, Y. Yue, and A. D. Ames, "A control barrier perspective on episodic learning via projection-to-state safety," IEEE Contr. Syst. Lett., vol. 5, no. 3, pp. 1019–1024, Jul. 2021.
[8] X. Xu, P. Tabuada, J. W. Grizzle, and A. D. Ames, "Robustness of control barrier functions for safety critical control," IFAC-PapersOnLine, vol. 48, no. 27, pp. 54–61, 2015.
[9] T. Gurriet, A. Singletary, J. Reher, L. Ciarletta, E. Feron, and A. Ames, "Towards a framework for realizable safety critical control through active set invariance," in Proc. ACM/IEEE 9th Int. Conf. Cyber-Phys. Syst., 2018, pp. 98–106.
[10] A. J. Taylor and A. D. Ames, "Adaptive safety with control barrier functions," in Proc. Amer. Control Conf., 2020, pp. 1399–1405.
[11] N. Csomay-Shanklin, R. K. Cosner, M. Dai, A. J. Taylor, and A. D. Ames, "Episodic learning for safe bipedal locomotion with control barrier functions and projection-to-state safety," in Proc. Learn. Dyn. Control, 2021, pp. 1041–1053.
[12] W. Xiao and C. Belta, "Control barrier functions for systems with high relative degree," in Proc. IEEE 58th Conf. Decis. Control, 2019, pp. 474–479.
[13] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[14] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
[15] A. Vaswani et al., "Attention is all you need," in Proc. Int. Conf. Neural Inf. Process. Syst., 2017, pp. 5998–6008.
[16] M. Lechner and R. Hasani, "Learning long-term dependencies in irregularly-sampled time series," 2020, arXiv:2006.04418.
[17] R. Hasani et al., "Closed-form continuous-time neural networks," Nature Mach. Intell., 2022, pp. 1–12.
[18] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, pp. 533–536, 1986.
[19] A. E. Bryson and Y.-C. Ho, Applied Optimal Control. Waltham, MA, USA: Ginn Blaisdell, 1969.
[20] J. B. Rawlings, D. Q. Mayne, and M. M. Diehl, Model Predictive Control: Theory, Computation, and Design, 2nd ed. Madison, WI, USA: Nob Hill Publishing, 2017.
[21] S. Mitsch, K. Ghorbal, and A. Platzer, "On provably safe obstacle avoidance for autonomous robotic ground vehicles," in Proc. Robot.: Sci. Syst., 2013.
[22] C. Pek, S. Manzinger, M. Koschi, and M. Althoff, "Using online verification to prevent autonomous vehicles from causing accidents," Nature Mach. Intell., vol. 2, no. 9, pp. 518–528, 2020.
[23] M. Althoff and J. M. Dolan, "Online verification of automated road vehicles using reachability analysis," IEEE Trans. Robot., vol. 30, no. 4, pp. 903–918, Aug. 2014.
[24] S. M. LaValle and J. J. Kuffner Jr., "Randomized kinodynamic planning," Int. J. Robot. Res., vol. 20, no. 5, pp. 378–400, 2001.
[25] P. E. Hart, N. J. Nilsson, and B. Raphael, "A formal basis for the heuristic determination of minimum cost paths," IEEE Trans. Syst. Sci. Cybern., vol. 4, no. 2, pp. 100–107, Jul. 1968.
[26] K. Ota et al., "Deep reactive planning in dynamic environments," in Proc. Conf. Robot Learn., 2021, pp. 1943–1957.
[27] H. Roehm, J. Oehlerking, M. Woehrle, and M. Althoff, "Model conformance for cyber-physical systems: A survey," ACM Trans. Cyber-Phys. Syst., vol. 3, no. 3, pp. 1–26, 2019.
[28] G. Yang, C. Belta, and R. Tron, "Self-triggered control for safety critical systems using control barrier functions," in Proc. Amer. Control Conf., 2019, pp. 4454–4459.
[29] W. Xiao, C. Belta, and C. G. Cassandras, "Event-triggered safety-critical control for systems with unknown dynamics," in Proc. IEEE 60th Conf. Decis. Control, 2021, pp. 540–545.
[30] J. Breeden and D. Panagou, "High relative degree control barrier functions under input constraints," in Proc. IEEE 60th Conf. Decis. Control, 2021, pp. 6119–6124.
[31] W. Xiao, C. Belta, and C. G. Cassandras, "Sufficient conditions for feasibility of optimal control problems using control barrier functions," Automatica, vol. 135, 2022, Art. no. 109960.
[32] W. Xiao, C. Belta, and C. G. Cassandras, "Adaptive control barrier functions," IEEE Trans. Autom. Control, vol. 67, no. 5, pp. 2267–2281, May 2022, doi: 10.1109/TAC.2021.3074895.
[33] A. Robey et al., "Learning control barrier functions from expert demonstrations," in Proc. IEEE 59th Conf. Decis. Control, 2020, pp. 3717–3724.
[34] M. Srinivasan, A. Dabholkar, S. Coogan, and P. A. Vela, "Synthesis of control barrier functions using a supervised machine learning approach," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2020, pp. 7139–7145.
[35] B. T. Lopez, J. J. E. Slotine, and J. P. How, "Robust adaptive control barrier functions: An adaptive and data-driven approach to safety," IEEE Contr. Syst. Lett., vol. 5, no. 3, pp. 1031–1036, Jul. 2021.
[36] S. Yaghoubi, G. Fainekos, and S. Sankaranarayanan, "Training neural network controllers using control barrier functions in the presence of disturbances," in Proc. IEEE 23rd Int. Conf. Intell. Transp. Syst., 2020, pp. 1–6.
[37] M. A. Pereira, Z. Wang, I. Exarchos, and E. A. Theodorou, "Safe optimal control using stochastic barrier functions and deep forward-backward SDEs," in Proc. Conf. Robot Learn., 2020, pp. 1783–1801.
[38] B. Amos and J. Z. Kolter, "OptNet: Differentiable optimization as a layer in neural networks," in Proc. 34th Int. Conf. Mach. Learn., 2017, vol. 70, pp. 136–145.
[39] B. Amos, I. D. J. Rodriguez, J. Sacks, B. Boots, and J. Z. Kolter, "Differentiable MPC for end-to-end planning and control," in Proc. 32nd Int. Conf. Neural Inf. Process. Syst., 2018, pp. 8299–8310.
[40] P.-F. Massiani, S. Heim, and S. Trimpe, "On exploration requirements for learning safety constraints," in Proc. Learn. Dyn. Control, 2021, pp. 905–916.
[41] S. Gruenbacher et al., "GoTube: Scalable stochastic verification of continuous-depth models," in Proc. AAAI Conf. Artif. Intell., 2022, vol. 36, no. 6, pp. 6755–6764.
[42] S. Grunbacher, R. Hasani, M. Lechner, J. Cyranka, S. A. Smolka, and R. Grosu, "On the verification of neural ODEs with stochastic guarantees," in Proc. AAAI Conf. Artif. Intell., 2021, vol. 35, pp. 11525–11535.
[43] J. V. Deshmukh, J. P. Kapinski, T. Yamaguchi, and D. Prokhorov, "Learning deep neural network controllers for dynamical systems with safety guarantees: Invited paper," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Des., 2019, pp. 1–7.
[44] J. Ferlez, M. Elnaggar, Y. Shoukry, and C. Fleming, "ShieldNN: A provably safe NN filter for unsafe NN controllers," 2020, arXiv:2006.09564.
[45] W. Jin, Z. Wang, Z. Yang, and S. Mou, "Neural certificates for safe control policies," 2020, arXiv:2006.08465.
[46] H. Zhao, X. Zeng, T. Chen, and J. Woodcock, "Learning safe neural network controllers with barrier certificates," Form. Asp. Comput., vol. 33, pp. 437–455, 2021.
[47] H. K. Khalil, Nonlinear Systems, 3rd ed. Englewood Cliffs, NJ, USA: Prentice-Hall, 2002.
[48] W. Xiao, C. G. Cassandras, C. A. Belta, and D. Rus, "Control barrier functions for systems with multiple control inputs," in Proc. Amer. Control Conf., 2022, pp. 2221–2226.
[49] W. Xiao et al., "Rule-based optimal control for autonomous driving," in Proc. ACM/IEEE 12th Int. Conf. Cyber-Phys. Syst., 2021, pp. 143–154.
[50] Y. Ye and E. Tse, "An extension of Karmarkar's projective algorithm for convex quadratic programming," Math. Program., vol. 44, no. 1, pp. 157–179, 1989.
[51] W. Xiao and C. G. Cassandras, "Decentralized optimal merging control for connected and automated vehicles with safety constraint guarantees," Automatica, vol. 123, 2021, Art. no. 109333.
[52] W. Xiao, C. G. Cassandras, and C. Belta, "Bridging the gap between optimal trajectory planning and safety-critical control with applications to autonomous vehicles," Automatica, vol. 129, 2021, Art. no. 109592.
[53] A. Amini et al., "Learning robust control policies for end-to-end autonomous driving from data-driven simulation," IEEE Robot. Automat. Lett., vol. 5, no. 2, pp. 1143–1150, Apr. 2020.
[54] T.-H. Wang, A. Amini, W. Schwarting, I. Gilitschenski, S. Karaman, and D. Rus, "Learning interactive driving policies via data-driven simulation," in Proc. Int. Conf. Robot. Automat., 2022, pp. 7745–7752.
[55] A. Amini et al., "VISTA 2.0: An open, data-driven simulator for multimodal sensing and policy learning for autonomous vehicles," in Proc. Int. Conf. Robot. Automat., 2022, pp. 2419–2426.
[56] W. Xiao, C. Belta, and C. G. Cassandras, "Event-triggered control for safety-critical systems with unknown dynamics," IEEE Trans. Autom. Control, 2022, early access, doi: 10.1109/TAC.2022.3202088.
[57] A. Rucco, G. Notarstefano, and J. Hauser, "An efficient minimum-time trajectory generation strategy for two-track car vehicles," IEEE Trans. Control Syst. Technol., vol. 23, no. 4, pp. 1505–1519, Jul. 2015.
[58] R. T. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud, "Neural ordinary differential equations," in Proc. 32nd Int. Conf. Neural Inf. Process. Syst., 2018, pp. 6572–6583.
[59] R. Hasani, M. Lechner, A. Amini, D. Rus, and R. Grosu, "Liquid time-constant networks," in Proc. AAAI Conf. Artif. Intell., 2021, vol. 35, pp. 7657–7666.
[60] C. Vorbach, R. Hasani, A. Amini, M. Lechner, and D. Rus, "Causal navigation by continuous-time neural networks," in Proc. Adv. Neural Inf. Process. Syst., vol. 34, 2021, pp. 12425–12440.

Makram Chahine received the Diplôme d'Ingénieur in applied mathematics from École Centrale Paris, Rennes, France, and the M.Sc. degree in aerospace engineering from the Georgia Institute of Technology, Atlanta, GA, USA, both in 2019. Since 2021, he has been working toward the Ph.D. degree in electrical engineering and computer science with the Massachusetts Institute of Technology, Cambridge, MA, USA.
His research interests include autonomous robots, artificial intelligence, and complex interacting systems.