IEEE TRANSACTIONS ON ROBOTICS, VOL. 39, NO. 3, JUNE 2023

BarrierNet: Differentiable Control Barrier Functions for Learning of Safe Robot Control

Wei Xiao, Member, IEEE, Tsun-Hsuan Wang, Student Member, IEEE, Ramin Hasani, Makram Chahine, Alexander Amini, Member, IEEE, Xiao Li, Member, IEEE, and Daniela Rus, Fellow, IEEE

Abstract—Many safety-critical applications of neural networks, such as robotic control, require safety guarantees. This article introduces a method for ensuring the safety of learned models for control using differentiable control barrier functions (dCBFs). dCBFs are end-to-end trainable and guarantee safety. They improve over classical control barrier functions (CBFs), which are usually overly conservative. Our dCBF solution relaxes the CBF definitions by: 1) using environmental dependencies; and 2) embedding them into differentiable quadratic programs. These novel safety layers are called a BarrierNet. They can be used in conjunction with any neural network-based controller and are trained by gradient descent. With BarrierNet, the safety constraints of a neural controller become adaptable to changing environments. We evaluate BarrierNet on several problems: 1) robot traffic merging; 2) robot navigation in 2-D and 3-D spaces; and 3) end-to-end vision-based autonomous driving in a sim-to-real environment and in physical experiments; and we demonstrate its effectiveness compared to state-of-the-art approaches.

Index Terms—Control barrier function (CBF), neural networks, robot learning, safety guarantees.

Fig. 1. Safety-guaranteed learning system with the proposed BarrierNet controller for safety-critical systems. The BarrierNet learns adaptively changing dCBFs and other parameters together with differentiable QPs (dQPs), and it guarantees safety, enforced by the dCBFs, in a non-overly-conservative way.

Manuscript received 8 October 2022; revised 21 December 2022; accepted 16 February 2023. Date of publication 21 March 2023; date of current version 7 June 2023. This paper was recommended for publication by Associate Editor Rafael Murrieta-Cid and Editor Wolfram Burgard upon evaluation of the reviewers' comments. This work was supported in part by Capgemini Engineering, in part by the United States Air Force Research Laboratory and the United States Air Force Artificial Intelligence Accelerator and was accomplished under Cooperative Agreement Grant FA8750-19-2-1000, and in part by the Office of Naval Research (ONR) under Grant N00014-18-1-2830. (Wei Xiao and Tsun-Hsuan Wang contributed equally to this work.) (Corresponding author: Wei Xiao.)

The authors are with the Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]).

Code: https://github.com/Weixy21/BarrierNet

This article has supplementary material provided by the authors and color versions of one or more figures available at https://doi.org/10.1109/TRO.2023.3249564.

Digital Object Identifier 10.1109/TRO.2023.3249564

1552-3098 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

THE deployment of learning systems in decision-critical applications, such as autonomous vehicle control, requires safety guarantees, because one simple control mistake can lead to catastrophic outcomes. Many key aspects of the perception and navigation of robots can be expressed as state-of-the-art (SOTA) learned models. While the deep-learning methodology has brought significant advancements in the development of reliable complex systems, it is still challenging to guarantee the safety of the resulting controller. This is because SOTA models trained offline do not perform well online, and they have trouble with previously unseen environments and situations.

In this article, we use insights from model-based safety controllers [1], [2] to equip end-to-end learning systems with safety guarantees. We propose a new algorithm for synthesizing safe neural controllers end-to-end by defining a novel instance of control barrier functions (CBFs) that is differentiable. CBFs are popular methods for guaranteeing the safety of model-based controllers when the system dynamics are known. A large body of work has studied variants of CBFs [3], [4], [5], [6], [7] and their characteristics under increasing uncertainty [8], [9], [10]. However, as uncertainty increases, CBFs cause the system's behavior to be excessively conservative [11].

In this article, we address the over-conservativeness of CBFs by replacing the set of hard constraints in high-order CBFs (HOCBFs) [12] for arbitrary-relative-degree systems with a set of differentiable constraints. We do so without loss of safety guarantees. We obtain a versatile safety-guaranteed barrier layer, which we term a BarrierNet, that can be combined with any deep learning system [13], [14], [15], [16], [17]. BarrierNet can be trained end-to-end via reverse-mode automatic differentiation [18] (see Fig. 1).

BarrierNet allows the safety constraints of a neural controller to be adaptable to changing environments. A canonical application of BarrierNet is end-to-end vision-based autonomous driving (see Figs. 1 and 2). In this example, the BarrierNet is trained in a sim-to-real environment and outputs acceleration and steering commands to navigate the vehicle along the center lane while avoiding obstacles. The sim-to-real environment can ensure safety during training and allows us to directly deploy the neural controller to a real vehicle. The system and environmental observations are inputs to the upstream network, whose outputs serve as arguments to the BarrierNet layer. Finally, BarrierNet


outputs the controls that guarantee collision avoidance. In contrast to existing work, our proposed architecture is end-to-end trainable, including the BarrierNet for neural networks.

Fig. 2. Sim-to-real learning and deployment with the proposed BarrierNet controller for end-to-end vision-based autonomous driving. The model is learned in a simulator with vehicle dynamics and photo-realistic sensor synthesis and directly deployed on a full-scale autonomous vehicle.

Contributions: This article contributes the following.
1) We propose BarrierNet, a novel trainable and interpretable layer that can be added to neural networks and is built by leveraging the definition of higher-order CBFs; BarrierNet provides safety guarantees for general control problems with neural network controllers, and it can be applied to general robot navigation and manipulation tasks.
2) We resolve the over-conservativeness of CBFs by introducing differentiable constraints in their definition (cf. Fig. 1), which makes the CBF parameters learnable from data. This allows end-to-end training along with a given learning system.
3) We design BarrierNet such that the CBF parameters are adaptive to changes in environmental conditions and can be learned from data.
4) We train, evaluate, and verify BarrierNet on the following robot-centric problems: a) traffic merging for autonomous vehicles; b) robot navigation with obstacles in 2-D and 3-D; and c) end-to-end vision-based autonomous driving; we also deploy and evaluate the learned BarrierNet model on a full-scale autonomous vehicle for safe lane following and obstacle avoidance with physically parked vehicles and augmented reality (AR) dynamic obstacles.

The rest of this article is organized as follows. In Sections II and III, we provide the related work and the necessary background to construct our theory. We formulate the problem in Section IV and introduce BarrierNets in Section V. Section VI includes our experimental evaluation on simple examples. The proposed BarrierNet is extended to learning for vision-based end-to-end autonomous driving in Section VII, with real-car experiments given in Section VIII. Finally, Section IX concludes this article.

II. RELATED WORK

This article builds on a large body of work in machine learning, safe planning, optimization, model-based control, and CBFs.

A. Safe Planning and Control

Planning and control methods, such as Hamiltonian analysis [19], model predictive control [20], theorem proving [21], reachability analysis [22], [23], rapidly-exploring randomized trees [24], A* [25], and reactive planning [26], are widely used in safety-critical systems. Conformance checking [27] can be used to bridge the gap between simulations and real systems, while sensor failures can be addressed using reachability analysis [23]. Compared to CBFs, these planning and control methods are usually computationally expensive. Further, these planning methods usually require many hand-tuned parameters and much environmental information. By comparison, the BarrierNet method described in this article learns its parameters and only requires environment observations (such as front-view images). Moreover, BarrierNet can be combined as an additional learnable layer with any learning system, and as a higher-frequency safety control module with any planning method.

1) CBF and Learning: A large body of work has studied CBF-based safety guarantees [3], [4], [5], [6], [7] and their characteristics under increasing model uncertainty [8], [9], [10]. Many existing works [2], [3], [28] combine CBFs for systems with quadratic costs to form optimization problems. Time is discretized, and an optimization problem with constraints given by the CBFs [inequalities of the form (4)] is solved at each time step. The inter-sampling effect is considered in [28] and [29]. Replacing CBFs by HOCBFs allows us to handle constraints with arbitrary relative degrees [12]. In this prior work, as the uncertainty of models increases, CBFs make the system's behavior excessively conservative [11]. Feasibility guarantees of the CBF method under control input constraints have been studied in [30] and [31]. However, these approaches did not consider the conservativeness of the CBF method.

The recently proposed adaptive CBFs (AdaCBFs) [32] addressed the conservativeness of the HOCBF method by multiplying the class K functions of an HOCBF with some penalty functions. These penalty functions are HOCBFs themselves and are guaranteed to be nonnegative. This is motivated by the fact that the main conservativeness of the HOCBF method comes from the class K functions. By multiplying (relaxing) the class K functions with some penalty functions, it has been shown that the satisfaction of the AdaCBF constraint is a necessary and sufficient condition for the satisfaction of the original safety constraint b(x) ≥ 0 [32]. This is conditioned on designing proper auxiliary dynamics for all the penalty functions based on task specifics. However, the design of such auxiliary dynamics remains a challenge, which we study in this article. In other words, we use the training data to determine the variation of the penalty functions in the proposed BarrierNet.

Supervised learning techniques have been proposed to learn safe set definitions from demonstrations [33] and sensor data [34], which are then enforced by CBFs. The authors in [5] used data to learn the system dynamics for CBFs. In a similar setting, [35] used adaptive control approaches to estimate the unknown system parameters in CBFs. In [36], neural network controllers are trained using CBFs in the presence of disturbances. Learning involving uncertainties with CBFs has been studied in [5] and [6], which could enable robust control for safety guarantees. These prior works address learning safe sets and dynamics, whereas we focus on the design of environment-dependent and trainable CBFs, and we can directly apply these learning methods to the proposed BarrierNet.
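The discretized CBF-QP control loop discussed above can be made concrete with a minimal sketch (not the implementation from this article): for a one-dimensional single integrator ẋ = u with safety constraint b(x) = x − x_min ≥ 0, the CBF condition ḃ + α·b ≥ 0 is a single affine constraint on u, so the quadratic program at each time step reduces to a closed-form projection of the nominal control. The gain `alpha`, the nominal controller, and all numeric values below are illustrative placeholders.

```python
# Minimal sketch of a discretized CBF-QP safety filter for xdot = u with
# b(x) = x - x_min >= 0 and a linear class-K function alpha * b.
# The CBF condition bdot + alpha*b >= 0 becomes u >= -alpha*(x - x_min);
# with cost 0.5*(u - u_ref)^2, the QP solution is a simple projection.
# All gains and the nominal controller are illustrative placeholders.

def cbf_qp_filter(x, u_ref, x_min=0.0, alpha=1.0):
    """Closed-form solution of min 0.5*(u - u_ref)^2 s.t. u >= -alpha*b(x)."""
    lower = -alpha * (x - x_min)
    return max(u_ref, lower)

def simulate(x0=1.0, x_min=0.0, dt=0.01, steps=500):
    """Discretized control loop: solve the CBF-QP at each time step."""
    x = x0
    for _ in range(steps):
        u_ref = -2.0 * x           # nominal controller pushing toward the boundary
        u = cbf_qp_filter(x, u_ref, x_min)
        x += dt * u                # forward-Euler step of xdot = u
    return x

final_x = simulate()
print(final_x)  # positive: the filter keeps x above x_min
```

Note how the filter only overrides the nominal control when the constraint is active; this is the per-time-step QP structure that the differentiable layer in Section V builds upon.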

2) Optimization-Based Safety Frameworks: Recent advances in differentiable optimization methods show promising directions for safety-guaranteed neural network controllers [37], [38], [39], [40], [41], [42]. In [38], a differentiable quadratic program (QP) layer, called OptNet, was introduced (and used as a filter for safe control in [37]). In [43], [44], [45], and [46], safety-guaranteed neural network controllers have been learned through verification-in-the-loop training. By comparison, BarrierNet is computationally efficient, easily scalable, general for control problems, and adaptive to environmental changes.

TABLE I
NOTATION

III. BACKGROUND

In this section, we briefly introduce CBFs and refer interested readers to [1] for more details. Intuitively, CBFs are a means to translate state constraints into control constraints under affine dynamics. The controls that satisfy those constraints can then be efficiently found by formulating a QP. We start with the definition of class K functions.

Definition 1 (Class K function [47]): A continuous function α : [0, a) → [0, ∞), a > 0, is said to belong to class K if it is strictly increasing and α(0) = 0. A continuous function β : R → R is said to belong to extended class K if it is strictly increasing and β(0) = 0.

Consider an affine control system of the form

ẋ = f(x) + g(x)u    (1)

where x ∈ R^n, f : R^n → R^n, and g : R^n → R^{n×q} are locally Lipschitz, and u ∈ U ⊂ R^q, where U denotes a control constraint set. Each of the control component bounds is assumed to be independent of the others (since we consider low-speed experiments in this work, such as autonomous driving), but the CBF-based method can still work when the control bounds are coupled, in which case they can be written as different forms of input constraints in the QP.

Definition 2: A set C ⊂ R^n is forward invariant for system (1) if its solutions for some u ∈ U starting at any x(0) ∈ C satisfy x(t) ∈ C, ∀t ≥ 0.

Definition 3 (Relative degree): The relative degree of a (sufficiently many times) differentiable function b : R^n → R with respect to system (1) is the number of times it needs to be differentiated along the dynamics until any component of the control u explicitly shows up in the corresponding derivative.

For systems with multiple control inputs, existing CBF methods may fail due to inconsistent relative degrees across control components. In such cases, we may define a relative degree set, which can be addressed by integral CBFs [48] that can make the desired (or all) control components show up in the derivative. Since the function b is used to define a (safety) constraint b(x) ≥ 0, we will refer to the relative degree of b as the relative degree of the constraint. For a constraint b(x) ≥ 0 with relative degree m, b : R^n → R, and ψ_0(x) := b(x), we define a sequence of functions ψ_i : R^n → R, i ∈ {1, . . . , m}:

ψ_i(x) := ψ̇_{i−1}(x) + α_i(ψ_{i−1}(x)), i ∈ {1, . . . , m}    (2)

where α_i(·), i ∈ {1, . . . , m}, denotes an (m − i)th order differentiable class K function.

We further define a sequence of sets C_i, i ∈ {1, . . . , m}, associated with (2) in the form

C_i := {x ∈ R^n : ψ_{i−1}(x) ≥ 0}, i ∈ {1, . . . , m}.    (3)

Definition 4 (HOCBF [12]): Let C_1, . . . , C_m be defined by (3) and ψ_1(x), . . . , ψ_m(x) be defined by (2). A function b : R^n → R is an HOCBF of relative degree m for system (1) if there exist (m − i)th order differentiable class K functions α_i, i ∈ {1, . . . , m − 1}, and a class K function α_m such that

sup_{u∈U} [ L_f^m b(x) + L_g L_f^{m−1} b(x) u + O(b(x)) + α_m(ψ_{m−1}(x)) ] ≥ 0    (4)

for all x ∈ C_1 ∩ · · · ∩ C_m. In (4), L_f^m (L_g) denotes Lie derivatives along f (g) m (one) times, and O(b(x)) = Σ_{i=1}^{m−1} L_f^i (α_{m−i} ∘ ψ_{m−i−1})(x). Further, b(x) is such that L_g L_f^{m−1} b(x) ≠ 0 on the boundary of the set C_1 ∩ · · · ∩ C_m.

The HOCBF is a general form of the relative-degree-one CBF [2] (setting m = 1 reduces the HOCBF to the common CBF form). We can define α_i(·), i ∈ {1, . . . , m}, in Definition 4 to be extended class K functions to ensure robustness of an HOCBF to perturbations [2].

Theorem 1 ([12]): Given an HOCBF b(x) from Definition 4 with the associated sets C_1, . . . , C_m defined by (3), if x(0) ∈ C_1 ∩ · · · ∩ C_m, then any Lipschitz continuous controller u(t) that satisfies the constraint in (4) for all t ≥ 0 renders C_1 ∩ · · · ∩ C_m forward invariant for system (1).

We provide a summary of notation in Table I.

IV. PROBLEM FORMULATION

We make the following assumptions in this work.

Assumption 1:
1) All measurements are assumed to be precise and reliable, and there are no occluded objects.
2) The environment model is assumed to be accessible without time delay.
3) The system is not subject to any disturbances.
4) The system dynamics (1) are assumed to accurately match the real system.


5) The robot moves at low speed, such that the control bounds are independent of each other.

The above assumptions can be addressed using the online verification method [22], but this approach tends to be computationally expensive. These assumptions can also be addressed using the efficient event-triggered control framework [29], in which the issues of unknown dynamics, disturbances, and measurement uncertainties are considered. This event-triggered framework can be directly applied to the proposed BarrierNet, and thus we only focus on the theoretical foundations in this work. For reliability of perception and occluded objects, we may consider multiple sensors or infer from other observed participants, and this will be further studied in future work.

Here, we formally define the learning problem for safety-critical control.

Problem 1: Given:
1) a system with known affine dynamics in the form of (1);
2) a state-feedback nominal controller h*(x) = u (such as a model predictive controller) that is taken as the training label;
3) a set of safety constraints b_j(x) ≥ 0, j ∈ S (where b_j is continuously differentiable and S is a constraint set);
4) control bounds u_min ≤ u ≤ u_max;
5) a neural network controller h(x|θ) = u parameterized by θ;
our goal is to find the optimal parameters

θ* = arg min_θ E_x[ l(h*(x), h(x|θ)) ]

while guaranteeing the satisfaction of the safety constraints in 3) and the control bounds in 4). Here, E(·) is the expectation and l(·, ·) denotes a similarity measure.

Problem 1 defines policy distillation with safety guarantees. The safety constraints can be predefined by users, or they can be learned [33], [34].

V. BARRIERNET

We introduce BarrierNet, a CBF-based neural network controller with parameters trainable via backpropagation. We define the safety guarantees of a neural network controller as follows.

Definition 5 (Safety guarantees): A neural network controller has safety guarantees for system (1) if its outputs (controls) satisfy the control bounds 4) in Problem 1 and drive system (1) such that b_j(x(t)) ≥ 0, ∀t ≥ 0, ∀j ∈ S.

BarrierNet addresses several limitations of existing methods. The HOCBF method provides safety guarantees for control systems with arbitrary relative degrees, but in a conservative way. In other words, the satisfaction of the HOCBF constraint (4) is only a sufficient condition for the satisfaction of the original safety constraint b(x) ≥ 0. This conservativeness of the HOCBF method significantly limits the system's performance. For example, conservativeness may drive the system much further away from obstacles than necessary. Our first motivation for BarrierNet is to address this conservativeness.

More specifically, an HOCBF constraint is always a hard constraint in order to guarantee safety. This may adversely affect the performance of the system. For instance, if a class K function α_i in the HOCBF is too steep (like a step function), then it is the least conservative. However, such an HOCBF constraint (4) becomes active only near the unsafe set boundary, requiring a large control input effort. Hence, there might not exist a feasible control that satisfies the HOCBF constraint (4) and the control bound at the same time. If, on the other hand, α_i is too flat, then the HOCBF is over-conservative, as the HOCBF constraint becomes active when the system is still far from the unsafe set boundary. Ideally, we wish to find an α_i that has the steepest slope while keeping the HOCBF constraint compatible with the control bound when it becomes active. In such cases, we call an HOCBF nonconservative. The current approaches [30], [31] only consider the feasibility issue of the CBF method without considering this conservativeness.

BarrierNet learns an HOCBF without losing the safety guarantees. In addition, we also incorporate the HOCBF in a differentiable optimization layer to allow the tuning of its parameters from data. Given a safety requirement b(x) ≥ 0 with relative degree m for system (1), we redefine the sequence of CBFs in (2) as

ψ_i(x, z, z_d) := ψ̇_{i−1}(x, z, z_d) + p_i(z) α_i(ψ_{i−1}(x, z, z_d)), i ∈ {1, . . . , m}    (5)

where ψ_0(x, z) = b(x). The variable z ∈ R^d is the input of the neural network (d ∈ N is the dimension of the features), and it is assumed that the relative degree of each component in z is not lower than that of the safety constraint (this makes sure that no control will appear in the derivatives of z during construction of an HOCBF), which is reasonable since z often comes from high-dimensional sensory measurements like images; otherwise, its derivative can be omitted, as shown later. z_d = (z^{(1)}, . . . , z^{(m−1)}) ∈ R^{(m−1)d} denotes the derivatives of the input z. p_i : R^d → R_{>0}, i ∈ {1, . . . , m}, are the outputs of the previous layer (e.g., a CNN, LSTM, or MLP, as shown in Fig. 12) or trainable parameters themselves (i.e., p_i is independent of z), where R_{>0} denotes the set of positive scalars. Note that it is possible to absorb p_i(z) into the class K function α_i for notational simplicity, but this would raise the question of how to learn a class K function; we adopt the most general and flexible formulation, which allows an arbitrary class K function definition. If p_i, i ∈ {1, . . . , m}, are just trainable parameters, then the above learnable CBFs do not involve z_d. The above formulation is similar to AdaCBFs [32] and can still guarantee safety, but it is trainable and does not require us to design auxiliary dynamics for p_i (as an AdaCBF does), which is a challenging aspect of the existing AdaCBF method. Then, we have a similar HOCBF constraint [called a differentiable CBF (dCBF)] as in Definition 4 of the form

L_f^m b(x) + L_g L_f^{m−1} b(x) u + O(b(x), z, z_d) + p_m(z) α_m(ψ_{m−1}(x, z, z_d)) ≥ 0.    (6)


Fig. 3. BarrierNet structure. F serves as a reference output (control), and all the other parameters can either be trainable (parameters of the BarrierNet) or depend on the previous layers (inputs of the BarrierNet). Note that we can always compute the reference control as H^{−1}F using the reference F.

Given learnable dCBFs defined by (5) and (6) that are not conservative, we can incorporate them as a differentiable optimization layer, which is our BarrierNet.

Definition 6 (BarrierNet): A BarrierNet is composed of neurons of the form

u*(t) = arg min_{u(t)} (1/2) u(t)^T H(z|θ_h) u(t) + F^T(z|θ_f) u(t)    (7)

s.t.

L_f^m b_j(x) + [L_g L_f^{m−1} b_j(x)] u + O(b_j(x), (z, z_d)|θ_p) + p_m(z|θ_{p_m}) α_m(ψ_{m−1}(x, (z, z_d)|θ_p)) ≥ 0, j ∈ S

u_min ≤ u ≤ u_max, t = kΔt + t_0    (8)

where H(z|θ_h) ∈ R^{q×q} is positive definite, H^{−1}(z|θ_h)F(z|θ_f) ∈ R^q can be interpreted as a reference control (the output of the previous network layers), θ_h, θ_f, θ_p = (θ_{p_1}, . . . , θ_{p_m}) are trainable parameters, and Δt > 0 is the discretization time.

At a high level, this formulation produces a control close to that of the nominal controller while satisfying the dCBF constraints, guaranteeing safety with minimal deterioration in performance. The inequality (8) in Definition 6 guarantees each safety constraint b_j(x) ≥ 0, ∀j ∈ S, through the parameterized functions p_i, i ∈ {1, . . . , m}. Based on Definition 6, for instance, if we have ten controlled agents, we need ten BarrierNet neurons of the form (7) to ensure the safety of each agent. This implies that BarrierNet can be extended to multiagent settings.

In Definition 6, we make H(z|θ_h) parameterized and dependent on the network input z, but H can also consist of directly trainable parameters that do not depend on the previous layer (i.e., we just have H). The same applies to p_i, i ∈ {1, . . . , m}. The trainable parameters are θ = {θ_h, θ_f, θ_p} (or θ = {H, θ_f, p_i, ∀i ∈ {1, . . . , m}} if H and p_i do not depend on the previous layer). The solution u* is the output of the neuron. The whole structure of a BarrierNet is shown in Fig. 3. The BarrierNet is differentiable with respect to its parameters [38]. We describe the forward and backward passes as follows.

A. Forward Pass

The forward step of a BarrierNet is to solve the QP in Definition 6. The inputs of a BarrierNet include environmental features z (such as the location and speed of an obstacle) that can be provided directly or by a tracking network if raw sensory inputs are used. BarrierNet also takes as input the system state x as feedback, as shown in Fig. 1. The outputs are the solutions of the QP (the resultant controls).

B. Backward Pass

The main task of BarrierNet is to provide controls while always ensuring safety. Suppose ℓ denotes some loss function (a similarity measure). Using the Lagrangian of the QP followed by applying the Karush–Kuhn–Tucker conditions, as introduced in [38], we can find the loss gradient with respect to all the parameters. Specifically, let λ denote the dual variables on the HOCBF constraints, let D(·) create a diagonal matrix from a vector, and let u*, λ* denote the optimal solutions for u and λ, respectively. We can first get d_u and d_λ in the form

[d_u; d_λ] = [ H, G^T D(λ*); G, D(Gu* − h) ]^{−1} [ (∂ℓ/∂u*)^T; 0 ]    (9)

where G, h are concatenations of G_j, h_j, j ∈ S, with

G_j = −L_g L_f^{m−1} b_j(x)
h_j = L_f^m b_j(x) + O(b_j(x), z, z_d) + p_m(z) α_m(ψ_{m−1}(x, z, z_d)).    (10)

Since the control bounds in (7) are not trainable, they are not included in G, h.

Then, the relevant gradients with respect to all the BarrierNet parameters are given by¹

∇_H ℓ = (1/2)(d_u u^T + u d_u^T),  ∇_F ℓ = d_u
∇_G ℓ = D(λ*)(d_λ u^T + λ d_u^T),  ∇_h ℓ = −D(λ*) d_λ.    (11)

In the above equations, ∇_G ℓ is not applicable in a BarrierNet, as G is determined by the corresponding HOCBF. ∇_h ℓ is also not directly related to the input of a BarrierNet. Nevertheless, we have ∇_{p_i} ℓ = ∇_{h_j} ℓ · ∇_{p_i} h_j, i ∈ {1, . . . , m}, j ∈ S, where ∇_{h_j} ℓ is given by ∇_h ℓ in (11) and ∇_{p_i} h_j is given by taking the partial derivative of h_j in (10).

The following theorem characterizes the safety guarantees of a BarrierNet.

Theorem 2: If p_i(z), i ∈ {1, . . . , m}, are differentiable (and, further, the relative degree of each component in z is not lower than that of the safety constraint) or p_i, i ∈ {1, . . . , m}, are parameters to be trained, then a BarrierNet composed of neurons as in Definition 6 guarantees the safety of system (1).

Proof: If p_i, i ∈ {1, . . . , m}, are trainable parameters (i.e., p_i is independent of z), then the dCBF (6) is just a regular HOCBF whose parameters are optimally determined by the training data, and thus safety is guaranteed for system (1) by Theorem 1.

¹Note that the gradients with respect to the parameters θ_h, θ_f, and θ_p can be obtained using the chain rule.


If p_i(z), i ∈ {1, . . . , m}, are differentiable and the relative degree of each component in z is not lower than that of the safety constraint (no control will appear in the derivatives of z during construction of an HOCBF), it follows from Theorem 1 that each ψ_i(x, z, z_d) in (5) is a valid CBF. Starting from ψ_m(x, z, z_d) ≥ 0 [the non-Lie-derivative form of each HOCBF constraint (6)], we can show that ψ_{m−1}(x, z, z_d) ≥ 0 is guaranteed to be satisfied following Theorem 1. Recursively, we can show that ψ_0(x, z) ≥ 0 is guaranteed to be satisfied. As b(x) = ψ_0(x, z) following (5), we have that system (1) is safety guaranteed in a BarrierNet. ∎

Consider the case where p_i(z), i ∈ {1, . . . , m}, depend on the input. In order to guarantee that p_i(z), i ∈ {1, . . . , m}, are continuously differentiable and positive, we can choose differentiable activation functions for the previous layers, such as sigmoid functions. The derivatives of the penalty functions p_i(z), i ∈ {1, . . . , m}, i.e., the derivatives of z, may be hard to evaluate. For instance, if z contains the pixels of an image, then its derivative is inaccurate when the camera sampling frequency is low. Nevertheless, we have the following corollary to show the safety guarantees of the BarrierNet.

Corollary 1: If z_d = 0, the class K functions α_i(·), i ∈ {1, . . . , m}, in (5) are linear functions, and p_i(z), i ∈ {1, . . . , m}, are such that x ∈ C_1 ∩ · · · ∩ C_m, then a BarrierNet composed of neurons as in Definition 6 guarantees the safety of system (1).

Proof: When the class K functions α_i(·), i ∈ {1, . . . , m}, in (5) are linear functions, the dCBF constraint (8) can be rewritten as

L_f^m b_j(x) + [L_g L_f^{m−1} b_j(x)] u + Σ_{r=1}^{m} Σ_{1≤k_1<k_2<···<k_r≤m} ( Π_{l=1}^{r} p_{k_l}(z) + P_r(p) ) b_j^{(m−r)}(x) ≥ 0    (12)

where b_j^{(0)}(x) = b_j(x) and P_r(p), r ∈ {1, . . . , m}, are polynomials of the derivatives of p_i(z), i ∈ {1, . . . , m}, such that z_d = 0 implies P_r(p) = 0, ∀r ∈ {1, . . . , m}. Note that setting z_d = 0 is equivalent to setting p_i to be piecewise constant among all discretized time intervals, where the value of p_i within each time interval is determined by the value of the observation z at the beginning of that interval. If z is dependent on x, then in order to make the control u not show up in z_d [equivalently, in P_r(p)], we may use an observation z such that p_i(z) has a relative degree that is not lower than that of the safety constraint.

According to the exponential CBF [3], a system is safety guaranteed if the polynomial (13) has only negative real roots and p_i(z), i ∈ {1, . . . , m}, are such that x ∈ C_1 ∩ · · · ∩ C_m.

As shown in (12), if z_d = 0, then P_r(p) = 0, i.e., (12) can be rewritten as

L_f^m b_j(x) + [L_g L_f^{m−1} b_j(x)] u + Σ_{r=1}^{m} Σ_{1≤k_1<k_2<···<k_r≤m} ( Π_{l=1}^{r} p_{k_l}(z) ) b_j^{(m−r)}(x) ≥ 0.    (15)

Combining (15) and (14), and by Vieta's formulas, we have that −p_i(z), i ∈ {1, . . . , m} (with p_i(z) > 0), are the negative real roots of the polynomial (13). Therefore, the condition (15) implies the satisfaction of b_j(x) ≥ 0, and the BarrierNet composed of neurons as in Definition 6 guarantees the safety of system (1). ∎

In order to get piecewise-constant penalties p_i(z) (i.e., z_d = 0) in the dCBFs, we can determine the value of p_i(z) and make it a constant before the construction of the dCBFs, which avoids taking the derivatives of p_i(z). If we do wish to consider z_d in the BarrierNet, then the uncertainty of z_d can be addressed by considering the inter-sampling effect (i.e., constraint satisfaction between discrete time instants) [28], [29] when the bounds of z_d are known. We have shown the safety guarantees of a BarrierNet with linear class K functions when z_d = 0 in Corollary 1. We can also infer that safety is still guaranteed for other types of class K functions in a BarrierNet. p_i(z) can be truncated to satisfy x ∈ C_1 ∩ · · · ∩ C_m, as discussed in [12]. If no such p_i(z) exists (such as due to uncertainties), then we can define all class K functions to be extended class K functions [such as the linear functions shown in (15)] to achieve robust control [2].

Remark 1 (Adaptivity of the BarrierNet): The HOCBF constraints in a BarrierNet are regulated by the trainable penalty functions without losing safety guarantees. The penalty functions are environment-dependent, and their features can be calculated from upstream networks. This adaptive property of the HOCBFs provides the adaptiveness of the BarrierNet. As a result, BarrierNet is able to generate safe controls while avoiding overly conservative behavior.

Remark 2 (Feasibility guarantees of a BarrierNet): Due to the existence of the control bounds in a BarrierNet, the dCBF-based QP (7) may become infeasible because of a possible conflict between the control bounds and the dCBF constraints. In order to address this, we may require that the nominal controller provide ground truth (i.e., control labels) that strictly satisfies the safety constraints and the control bounds. Then, during the
guaranteed in terms of the constraint bj (x) ≥ 0 if the polynomial training of the BarrierNet layer, we can relax/remove the control
bounds. After the neural network converges, the differentiable
m−1
QPs are feasible when we add control bounds in the testing
s +
m
lr s r = 0 (13)
or implementation. However, there is still the possibility that
r=0
the QP could be infeasible as the BarrierNet may have some
corresponding to inputs that it had not seen before. However, we can find sufficient
m−1 conditions of feasibility, as shown in [31]. Briefly, this approach
(r) finds a feasibility constraint on the state of the system along
f bj (x) + [Lg Lf
Lm bj (x)]u + lr bj (x) ≥ 0
m−1
(14)
r=0 with the penalties pi (z), i ∈ {1, . . . , m}, and then enforces this

XIAO et al.: BarrierNet: DIFFERENTIABLE CONTROL BARRIER FUNCTIONS FOR LEARNING OF SAFE ROBOT CONTROL 2295
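To make the Vieta's-formulas step in the proof of Corollary 1 above concrete, the following sketch (plain Python with NumPy; the penalty values are made up for illustration) builds the coefficients l_r of the polynomial (13) as elementary symmetric functions of penalties p_i > 0 and confirms that its roots are exactly the negative reals −p_i:

```python
import numpy as np
from itertools import combinations
from math import prod

def vieta_coeffs(p):
    """Coefficients l_0, ..., l_{m-1} of s^m + sum_r l_r s^r in (13) chosen
    so that the roots are -p_i: by Vieta's formulas, l_{m-r} is the r-th
    elementary symmetric polynomial of the p_i."""
    m = len(p)
    l = [0.0] * m
    for r in range(1, m + 1):
        l[m - r] = float(sum(prod(c) for c in combinations(p, r)))
    return l

p = [2.0, 5.0]                      # hypothetical penalties p_i(z) > 0
l = vieta_coeffs(p)                 # coefficients of s^2 + 7s + 10
roots = np.roots([1.0] + l[::-1])   # highest-order coefficient first
# the polynomial (13) then has exactly the negative real roots -p_i
```

Because all p_i are positive, every root is negative and real, which is the exponential-CBF condition the proof relies on.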
Algorithm 1: Construction and Training of BarrierNet. [The algorithm listing did not survive extraction.]

Fig. 4. Traffic merging problem. A collision may happen at the MP as well as everywhere within the CZ.

feasibility constraint using another CBF. For instance, one of the sufficient conditions for the feasibility guarantee of the adaptive cruise control problem shown in [31] is

$$v \le v_p + \frac{c_d g\,(p_1 + p_2)}{p_1 p_2}$$

where v, v_p denote the velocities of the ego and the preceding vehicles, respectively, and c_d g > 0 denotes the maximum braking of the ego vehicle. However, it is still possible that no feasible solutions can be found under complex specifications. In such cases, we assign priorities to the specifications and make sure that the most important specifications (such as no collision with pedestrians in driving) are always satisfied while relaxing the less important specifications (such as lane keeping). The relaxation of less important specifications does not affect how we obtain CBFs, as it is done by relaxing the corresponding CBF constraint (i.e., replacing the 0 on the right-hand side of (8) by a slack variable and also minimizing the slack variable in the cost). The CBFs can be obtained using the disk-covering approach [49], which has been shown to work for all kinds of traffic participants. The CBF method has been shown to work for complex specifications [49] and is thus applicable to the BarrierNet using the priority structure.

Remark 3: The complexity of the BarrierNet is O(d³) using the interior point method [50], where d is the dimension of the decision variable (control input) in the QP. Empirically, the number of constraints does not significantly affect the complexity. Moreover, the batch-QP solving method introduced in [38] makes the BarrierNet tractable to train on large datasets using batch training. Therefore, the proposed BarrierNet is scalable to more inputs and constraints and can satisfy the requirements of actual deployments.

The process of constructing and training a BarrierNet includes the following.
1) Construct a learnable HOCBF by (5) that enforces each of the safety requirements.
2) Construct the parameterized BarrierNet by (7).
3) Get the training dataset using the nominal controller.
4) Train the BarrierNet using error backpropagation.
We summarize the algorithm for the BarrierNet in Algorithm 1.

VI. EXPERIMENTS AND DISCUSSIONS

In this section, we present three case studies (a traffic merging control problem and robot navigation problems in 2-D and 3-D) to verify the effectiveness of our proposed BarrierNet.

A. Traffic Merging Control

1) Experiment Setup: The traffic merging problem arises when traffic must be joined from two different roads, usually associated with a main lane and a merging lane, as shown in Fig. 1. We consider the case where all traffic consists of controlled autonomous vehicles (CAVs) that arrive randomly at the origins (O and O′) and join at the merging point (MP) M, where a lateral collision may occur. The segment from the origin to the MP M has length L for both lanes and is called the control zone (CZ). CAVs do not overtake each other in the CZ, as each road consists of a single lane. A coordinator is associated with the MP, whose function is to maintain a first-in-first-out (FIFO) queue of CAVs based on their arrival time at the CZ. The coordinator also enables real-time communication among the CAVs that are in the CZ, including the last one leaving the CZ. The FIFO assumption, imposed so that CAVs cross the MP in their order of arrival, is made for simplicity and often to ensure fairness.

2) Notation: x_k, v_k, and u_k denote the along-lane position, speed, and acceleration (control) of CAV k, respectively. t_k^0 and t_k^m denote the arrival times of CAV k at the origin and the MP, respectively. z_{k,kp} denotes the along-lane distance between CAV k and its preceding CAV kp, as shown in Fig. 4.

Our goal is to jointly minimize all vehicles' travel time and energy consumption in the CZ. Written as an objective function, we have

$$\min_{u_k(t)} \; \beta\,(t_k^m - t_k^0) + \int_{t_k^0}^{t_k^m} \frac{1}{2} u_k^2(t)\,dt \qquad (16)$$

where u_k is the vehicle's control (acceleration), and β > 0 is a weight controlling the relative magnitude of travel time and energy consumption. We assume double-integrator dynamics for all vehicles.

Each vehicle k should satisfy the following rear-end safety constraint if its preceding vehicle kp is in the same lane:

$$z_{k,k_p}(t) \ge \varphi v_k(t) + \delta, \quad \forall t \in [t_k^0, t_k^m] \qquad (17)$$
2296 IEEE TRANSACTIONS ON ROBOTICS, VOL. 39, NO. 3, JUNE 2023
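The four construction-and-training steps of Algorithm 1 can be illustrated end to end on a deliberately tiny example. The sketch below is not the paper's implementation: it uses a scalar toy system ẋ = u with CBF b(x) = x ≥ 0, for which the dCBF-QP layer has the closed form u = max(f, −p·x), so the backward pass through the QP can be written by hand; the nominal controller supplying the labels and all gains are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def qp_layer(f, p, x):
    """Scalar BarrierNet neuron for toy dynamics xdot = u with CBF
    b(x) = x >= 0: the QP  min_u (u - f)^2  s.t.  u >= -p*x  has the
    closed-form solution u = max(f, -p*x)."""
    return np.maximum(f, -p * x)

def grad_step(f, p, x, labels, lr):
    """Hand-derived backpropagation through the clipped QP (step 4)."""
    u = qp_layer(f, p, x)
    e = 2.0 * (u - labels)             # dLoss/du for the squared error
    active = u > f                     # dCBF constraint active: u = -p*x
    df = np.where(active, 0.0, e).mean()
    dp = np.where(active, -x * e, 0.0).mean()
    return f - lr * df, p - lr * dp

# Step 3: labels from a hypothetical nominal controller (safe for p = 0.8).
xs = rng.uniform(0.1, 2.0, 256)
labels = np.maximum(-1.0, -0.8 * xs)

f, p = 0.0, 3.0                        # steps 1-2: the learnable layer above
loss0 = np.mean((qp_layer(f, p, xs) - labels) ** 2)
for _ in range(3000):                  # step 4: train by gradient descent
    f, p = grad_step(f, p, xs, labels, lr=0.2)
loss1 = np.mean((qp_layer(f, p, xs) - labels) ** 2)
```

Even in this reduced setting, the loss is driven down by adapting both the reference f and the penalty p through the QP solution, which is the mechanism Algorithm 1 relies on, while every emitted control still satisfies u ≥ −p·x by construction.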
Fig. 5. Control and penalty function p1(z) from the BarrierNet when training with the optimal controller. The blue curves (labeled as implementation) are the vehicle control when we apply the BarrierNet to drive the vehicle dynamics to pass through the CZ.

where z_{k,kp} = x_{kp} − x_k denotes the along-lane distance between k and kp, φ is the reaction time (usually taken to be 1.8 s), and δ ≥ 0.

The traffic merging problem is to find an optimal control that minimizes (16) subject to (17). We assume that vehicle k has access only to the information of its immediate neighbors from the coordinator (shown in Fig. 4), such as the preceding vehicle kp. This merging problem can be solved analytically by optimal control methods [51], but at the cost of extensive computation, and the solution becomes complicated when one or more constraints become active in an optimal trajectory; hence, it is possibly prohibitive for real-time implementation. In the worst case, the safety constraint may recursively become active (such as in heavy traffic), which significantly increases the computation time.

Fig. 6. Safety comparison (under ten trained models) between the BarrierNet and an FC network when training using the optimal/OCBF controller (δ = 0). If z_{k,kp}/v_k is above the line φ = 1.8, then safety is guaranteed. We observe that only neural network agents equipped with BarrierNet satisfy this condition. (a) Training using the optimal controller. (b) Training using the OCBF controller.

3) BarrierNet Design: We enforce the safety constraint (17) by a CBF b(z_{k,kp}, v_k) = z_{k,kp}(t) − φ v_k(t) − δ, and any control input u_k should satisfy the CBF constraint (4), which in this case (choosing α1 as a linear function in Definition 4) is

$$\varphi u_k(t) \le v_{k_p}(t) - v_k(t) + p_1(z)\big(z_{k,k_p}(t) - \varphi v_k(t) - \delta\big) \qquad (18)$$

where v_k is the speed of vehicle k and z = (x_{kp}, v_{kp}, x_k, v_k) is the input of the neural network model (to be designed later). p1(z) is called a penalty in the CBF; it addresses the conservativeness of the CBF method. The cost in the neuron of the BarrierNet is given by

$$\min_{u_k} \; (u_k - f_1(z))^2 \qquad (19)$$

where f1(z) is a reference to be trained [the output of the fully connected (FC) network]. We then create a neural network model whose structure is composed of an FC network (an input layer and two hidden layers) followed by a BarrierNet. The input of the FC network is z, and its outputs are the penalty p1(z) and the reference f1(z), while the input of the BarrierNet is the penalty p1(z) and the reference f1(z), and its output is applied to control a vehicle k in the CZ.

4) Results and Discussion: To get the training dataset, we solve an optimal or joint optimal control and barrier function (OCBF) [52] controller offline. The solutions of an optimal or OCBF controller are taken as labels. The training results with the optimal controller and the OCBF controller are shown in Figs. 5–7.

In an optimal controller, the original safety constraint is active after around 6 s, as shown in Fig. 5. Therefore, the sampled trajectory is on the safety boundary, and the inter-sampling effect becomes important in this case. Since we do not consider the inter-sampling effect in this article, the safety metric of the BarrierNet might go below the lower bound φ = 1.8, as shown by the red curves in Fig. 6(a). However, due to the Lyapunov property of the CBF, the safety metric will always stay close to the lower bound φ = 1.8. The solutions for ten trained models are also consistent. In an FC network, the safety metrics vary under different trained models, and the safety constraint might be violated, as shown by the blue curves in Fig. 6(a).

In an OCBF controller, the original safety constraint is not active, and thus, the inter-sampling effect is not sensitive. As
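Because the merging problem has a single control and a single dCBF constraint, the QP (19) subject to (18) reduces to clipping the learned reference, which can be sketched as follows (plain Python; the numbers and the sign convention ż_{k,kp} = v_{kp} − v_k are assumptions for illustration):

```python
PHI, DELTA = 1.8, 0.0   # reaction time and standstill distance from (17)

def merging_neuron(z, f1, p1):
    """Closed-form BarrierNet neuron for the merging problem: the scalar
    QP (19) under the single dCBF constraint (18) clips the learned
    reference f1(z).  Assumes z = (x_kp, v_kp, x_k, v_k) and the sign
    convention zdot_{k,kp} = v_kp - v_k."""
    x_kp, v_kp, x_k, v_k = z
    b = (x_kp - x_k) - PHI * v_k - DELTA    # CBF for constraint (17)
    rhs = (v_kp - v_k) + p1 * b             # right-hand side of (18)
    return min(f1, rhs / PHI)               # clip the reference control

# Ego closing in on the preceding CAV: the reference acceleration is clipped.
z = (30.0, 12.0, 0.0, 14.0)                 # (x_kp, v_kp, x_k, v_k)
u = merging_neuron(z, f1=1.0, p1=0.5)
```

When the reference already satisfies (18), it passes through unchanged; only unsafe references are modified, which is why a well-trained p1(z) avoids conservative behavior.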
Fig. 7. Control and penalty function p1(z) from the BarrierNet when training with the OCBF controller. The blue curves (labeled as implementation) are the vehicle control when we apply the BarrierNet to drive the vehicle dynamics to pass through the CZ.

TABLE II. Comparisons between BarrierNet, FC, optimal controller, and OCBF controller. [The table entries did not survive extraction.]

shown in Fig. 6(b), safety is always guaranteed in a BarrierNet under ten trained models, while in an FC network the safety constraint may be violated, as there are no guarantees.

We present the penalty functions when training with the optimal controller and the OCBF controller in Figs. 5 and 7, respectively. The penalty function p1(z) decreases when the CBF constraint becomes active. This shows the adaptivity of the BarrierNet. This behavior is similar to the AdaCBF, but in the BarrierNet, we do not need to design auxiliary dynamics for the penalty functions. Therefore, the BarrierNet is simpler than the AdaCBF. Finally, we present a comprehensive comparison between the BarrierNet, the FC network, the optimal controller, and the OCBF controller in Table II.

B. Two-Dimensional Robot Navigation

1) Experiment Setup: We consider a robot navigation problem with obstacle avoidance. In this case, we consider nonlinear dynamics with two control inputs and nonlinear safety constraints. The robot navigates according to the following unicycle model for a wheeled mobile robot:

$$\begin{bmatrix} \dot x \\ \dot y \\ \dot\theta \\ \dot v \end{bmatrix} = \begin{bmatrix} v\cos\theta \\ v\sin\theta \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} \qquad (20)$$

where x := (x, y, θ, v) and u = (u1, u2); x, y denote the robot's 2-D coordinates, θ denotes the heading angle of the robot, v denotes the linear speed, and u1, u2 denote the two control inputs for turning and acceleration.

2) BarrierNet Design: The robot is required to avoid a circular obstacle in its path, i.e., the state of the robot should satisfy

$$(x - x_o)^2 + (y - y_o)^2 \ge R^2 \qquad (21)$$

where (x_o, y_o) ∈ ℝ² denotes the location of the obstacle, and R > 0 is the radius of the obstacle.

The goal is to minimize the control input effort subject to the safety constraint (21) as the robot approaches its destination. The relative degree of the safety constraint (21) is 2 with respect to the dynamics (20); thus, we use an HOCBF b(x) = (x − x_o)² + (y − y_o)² − R² to enforce it. Any control input u should satisfy the HOCBF constraint (4), which in this case (choosing α1, α2 in Definition 4 as linear functions) is

$$-L_g L_f b(x)\,u \le L_f^2 b(x) + \big(p_1(z) + p_2(z)\big) L_f b(x) + \big(\dot p_1(z) + p_1(z) p_2(z)\big)\, b(x) \qquad (22)$$

where

$$L_g L_f b(x) = \big[-2(x - x_o) v \sin\theta + 2(y - y_o) v \cos\theta, \;\; 2(x - x_o)\cos\theta + 2(y - y_o)\sin\theta\big]$$
$$L_f^2 b(x) = 2v^2$$
$$L_f b(x) = 2(x - x_o) v \cos\theta + 2(y - y_o) v \sin\theta. \qquad (23)$$

In the above equations, z = (x, x_d) is the input to the model, where x_d ∈ ℝ² is the location of the destination, and p1(z), p2(z) are the trainable penalty functions. ṗ1(z) is set to 0.

The cost in the neuron of the BarrierNet is given by

$$\min_{u} \; (u_1 - f_1(z))^2 + (u_2 - f_2(z))^2 \qquad (24)$$

where f1(z) and f2(z) are reference controls provided by the upstream network (the outputs of the FC network).

3) Results and Discussion: The training data are obtained by solving the CBF controller introduced in [12], and we generate 100 trajectories with different destinations as the training dataset. We compare the FC model, the deep forward-backward model (DFB) [37], which is equivalent to taking the CBF-based QP as a safety filter, and our proposed BarrierNet. The training and testing results are shown in Fig. 8(a)-(d). All the models are trained for obstacle size R = 6 m. The controls from the BarrierNet stay very close to the ground truth, while there are jumps in the controls from the DFB when the robot gets close to the obstacle, which shows the conservativeness of the DFB, as shown by the blue solid (BarrierNet) and blue dashed (DFB) curves in Fig. 8(a) and (b). The robot trajectory (dashed blue) from the DFB stays far away from the ground truth in Fig. 8(d), which again shows its conservativeness. The robot from the FC will collide with the obstacle, as there is no safety guarantee, as shown by the dotted-blue line in Fig. 8(d).
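The Lie derivatives in (23) can be sanity-checked numerically: moving the state along the drift term of (20) and taking a central difference of b reproduces L_f b. A minimal sketch with a hypothetical obstacle:

```python
import numpy as np

xo, yo, R = 5.0, 0.0, 2.0    # hypothetical obstacle for constraint (21)

def b(s):
    """HOCBF candidate b(x) = (x - xo)^2 + (y - yo)^2 - R^2."""
    x, y, th, v = s
    return (x - xo) ** 2 + (y - yo) ** 2 - R ** 2

def f_drift(s):
    """Drift term of the unicycle model (20)."""
    x, y, th, v = s
    return np.array([v * np.cos(th), v * np.sin(th), 0.0, 0.0])

def Lf_b(s):
    """Analytic first Lie derivative from (23)."""
    x, y, th, v = s
    return 2 * (x - xo) * v * np.cos(th) + 2 * (y - yo) * v * np.sin(th)

s = np.array([0.0, 1.0, 0.3, 2.0])   # a safe state: b(s) > 0
eps = 1e-6
# central difference of b along the drift direction reproduces L_f b
num = (b(s + eps * f_drift(s)) - b(s - eps * f_drift(s))) / (2 * eps)
```

The same finite-difference pattern can be used to check L_f² b and L_g L_f b when deriving HOCBF constraints for new dynamics.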
Fig. 8. Controls and trajectories from the FC, DFB, and BarrierNet under the training obstacle size R = 6 m. The results refer to the case that the trained FC/DFB/BarrierNet controller is used to drive a robot to its destination. Safety is guaranteed in both the DFB and BarrierNet models but not in the FC model. The DFB tends to be more conservative, such that the trajectories/controls stay away from the ground truth, as its CBF parameters are not adaptive. The varying penalty functions allow the generation of desired control signals and trajectories (given by training labels) and demonstrate the adaptivity of the BarrierNet with safety guarantees. (a) Control u1. (b) Control u2. (c) Penalty functions. (d) Robot trajectories.

When we increase the obstacle size during implementation (i.e., the trained BarrierNet/DFB/FC controller is used to drive a robot to its destination), the controls u1, u2 from the BarrierNet and DFB deviate from the ground truth, as shown in Fig. 9(a) and (b). This is because the BarrierNet and DFB always ensure safety first. Therefore, safety is always guaranteed in the BarrierNet and DFB, as shown by the solid and dashed curves in Fig. 10(a). Both the BarrierNet and DFB show some adaptivity to the size change of the obstacle, while the FC controller cannot adapt to the size change of the obstacle; thus, the safety constraint (21) will be violated, as shown by the dotted curves in Fig. 10(a).

The difference between the DFB and the proposed BarrierNet is in the performance. In Fig. 10(b), we show all the trajectories from the BarrierNet, DFB, and FC controllers under different obstacle sizes. Collisions are avoided under the BarrierNet and DFB controllers, as shown by all the solid and dashed trajectories and the corresponding obstacle boundaries in Fig. 10(b). However, as shown in Fig. 10(b), the trajectories from the BarrierNet (solid) stay closer to the ground truth (red solid) than the ones from the DFB (dashed) when R = 6 m (and for other R values). This is because the CBFs in the DFB may not be properly defined, such that the CBF constraint becomes active too early when the robot gets close to the obstacle. It is important to note that the robot does not have to stay close to the obstacle boundary under the BarrierNet controller; this depends entirely on the ground truth. The definitions of the CBFs in the proposed BarrierNet depend on the environment (network input), and thus, they are adaptive and not conservative.

The profiles of the penalty functions p1(z) and p2(z) in the BarrierNet are shown in Fig. 8(c). The values of the penalty functions vary when the robot approaches the obstacle and gets to its destination, and this shows the adaptivity of the BarrierNet in the sense that, with the varying penalty functions, a BarrierNet can produce desired control signals given by labels (ground truth). This is due to the fact that the varying penalty functions change the slope of the class K functions in the HOCBF constraint without losing safety guarantees.

C. Three-Dimensional Robot Navigation

1) Experiment Setup: We consider a robot navigation problem with obstacle avoidance in 3-D space. In this case, we consider complicated superquadratic safety constraints. The robot navigates according to double-integrator dynamics. The state of the robot is x = (p_x, v_x, p_y, v_y, p_z, v_z) ∈ ℝ⁶, in which the components denote the position and speed along the x, y, z axes. The three control inputs u1, u2, and u3 are the accelerations along the x, y, and z axes, respectively.

2) BarrierNet Design: The robot is required to avoid a superquadratic obstacle in its path, i.e., the state of the robot should satisfy

$$(p_x - x_o)^4 + (p_y - y_o)^4 + (p_z - z_o)^4 \ge R^4 \qquad (25)$$

where (x_o, y_o, z_o) ∈ ℝ³ denotes the location of the obstacle, and R > 0 is the half-length of the superquadratic obstacle.

The goal is to minimize the control input effort subject to the safety constraint (25) as the robot approaches its destination. The relative degree of the safety constraint (25) is two with respect to the dynamics; thus, we use an HOCBF b(x) = (p_x − x_o)⁴ + (p_y − y_o)⁴ + (p_z − z_o)⁴ − R⁴ to enforce it. Any control input u should satisfy the HOCBF constraint (4), which in this case (choosing α1, α2 in Definition 4 as linear functions) is

$$-L_g L_f b(x)\,u \le L_f^2 b(x) + \big(p_1(z) + p_2(z)\big) L_f b(x) + \big(\dot p_1(z) + p_1(z) p_2(z)\big)\, b(x) \qquad (26)$$
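Since (22) and (26) each contribute a single affine inequality in u, the neuron's QP with a cost of the form (24) reduces, in the one-constraint case, to a Euclidean projection with a closed form. A sketch with made-up numbers:

```python
import numpy as np

def dcbf_qp(f_ref, a, c):
    """Neuron QP with a single affine dCBF constraint, as in (24) with
    (22) or (26):  min_u ||u - f_ref||^2  s.t.  a @ u <= c.  With one
    constraint the minimizer is the projection onto the half-space."""
    viol = a @ f_ref - c
    if viol <= 0:
        return f_ref.copy()                  # reference already safe
    return f_ref - (viol / (a @ a)) * a      # project onto the boundary

f_ref = np.array([1.0, 2.0])                 # learned references (f1, f2)
a = np.array([0.0, 1.0])                     # stand-in for -Lg Lf b(x)
u = dcbf_qp(f_ref, a, 1.5)                   # second control clipped to 1.5
```

With several dCBF constraints (or control bounds) this closed form no longer applies, and a generic differentiable QP solver is used instead, as in the BarrierNet layer.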
Fig. 9. Controls from the BarrierNet and DFB under different obstacle sizes. The BarrierNet and DFB are trained under the obstacle size R = 6 m. The results refer to the case that the trained BarrierNet/DFB controller is used to drive a robot to its destination. When we increase the obstacle size during implementation, the outputs (controls of the robot) of the BarrierNet and the DFB adjust accordingly in order to guarantee safety, as shown by the blue and cyan curves. However, the BarrierNet tends to be less conservative in unseen situations. (a) Control u1. (b) Control u2.

Fig. 10. Safety metrics for the BarrierNet, the DFB, and the FC network. The BarrierNet, the DFB, and the FC network are trained under the obstacle size R = 6 m. b(x) ≥ 0 implies a safety guarantee. The trajectories under the FC controller coincide, as the FC cannot adapt to the size change of the obstacle. (a) HOCBF b(x) profiles under different obstacle sizes. (b) Robot trajectories under different obstacle sizes.

where

$$L_g L_f b(x) = \big[4(p_x - x_o)^3,\; 4(p_y - y_o)^3,\; 4(p_z - z_o)^3\big]$$
$$L_f^2 b(x) = 12(p_x - x_o)^2 v_x^2 + 12(p_y - y_o)^2 v_y^2 + 12(p_z - z_o)^2 v_z^2$$
$$L_f b(x) = 4(p_x - x_o)^3 v_x + 4(p_y - y_o)^3 v_y + 4(p_z - z_o)^3 v_z. \qquad (27)$$

In the above equations, z = x is the input to the model, and p1(z), p2(z) are the trainable penalty functions. ṗ1(z) is also set to 0, as in the 2-D navigation case.

The cost in the neuron of the BarrierNet is given by

$$\min_{u} \; (u_1 - f_1(z))^2 + (u_2 - f_2(z))^2 + (u_3 - f_3(z))^2 \qquad (28)$$

where f1(z), f2(z), and f3(z) are reference controls provided by the upstream network (the outputs of the FC network).

3) Results and Discussion: The training data are obtained by solving a fine-tuned CBF controller introduced in [12]. We compare the FC model with our proposed BarrierNet. The training and testing results are shown in Fig. 11. The controls from the BarrierNet have some errors with respect to the ground truth, and this is due to the complicated safety constraint (25). We can improve the tracking accuracy with deeper BarrierNet models (not the focus of this article). Nevertheless, the implementation trajectory under the BarrierNet controller is close to the ground truth, as shown in Fig. 11(b).

The robot is guaranteed to be collision-free from the obstacle under the BarrierNet controller, as shown by the solid-blue line in Fig. 11(b), while the robot from the FC may collide with the obstacle, as there is no safety guarantee, as shown by the dotted-blue line in Fig. 11(b). The barrier function in Fig. 11(a) also demonstrates the safety guarantees of the BarrierNet, but not of the FC model.

VII. BARRIERNET FOR VISION-BASED AUTONOMOUS DRIVING

In this section, we use the proposed BarrierNet methodology to achieve safety in a complex learning system: vision-based end-to-end control of lane following for autonomous driving. We first discuss where the two "ends" should be defined for learning a good model based on limited data.

An end-to-end learning system is hard to interpret, especially in the setting of inferring control from high-dimensional sensor

Authorized licensed use limited to: Zhejiang University. Downloaded on October 11,2024 at 13:43:17 UTC from IEEE Xplore. Restrictions apply.
Fig. 12. Multilevel interpretable end-to-end autonomous driving framework with dCBFs. The entire pipeline is end-to-end differentiable. Each depth of the model learns different vehicle and environment information. The outputs of the BarrierNet are high-relative-degree controls.

Fig. 11. HOCBFs and trajectories from the FC and BarrierNet. The results refer to the case that the trained FC/BarrierNet controller is used to drive a robot to its destination. Safety is guaranteed in the BarrierNet model but not in the FC model. (a) HOCBF b(x) profiles with respect to the two approaching obstacles. (b) Robot trajectories. The initial position is at the lower-left corner.

measurements such as images. However, adding the BarrierNet layer in an end-to-end learning system provides safety guarantees.

A. Interpretable End-to-end Design

1) System's Inputs Setup: We note that in human driving, the majority of the information human drivers rely on comes from the front vision view. We define the input of the end-to-end architecture to be front-view images, which contain enough information for executing safe driving.

2) System's Outputs Setup: At the output end, where the BarrierNet is implemented, high-relative-degree control variables (such as acceleration, jerk, steering rate, or steering acceleration) are generated for driving the vehicle. The first advantage of using high-relative-degree control variables is to ensure the smoothness of the vehicle states (such as speed), which results in smooth maneuvers for passenger comfort. Another advantage is to ensure that the controller achieves accurate maneuvering given the physical inertia of the vehicle. If we take the vehicle speed as one of the controls, and the controller requires the speed to suddenly change to a very different value, the vehicle powertrain system will fail to respond. In this case, the vehicle control becomes inaccurate, affecting the performance and even the safety of the vehicle.

3) Challenges: An end-to-end autonomous driving system with high-relative-degree control variables requires a very large and diverse training dataset. This is because more vehicle state variables are involved in higher-relative-degree controls, and thus, the vehicle may have different states under the same observation. This will cause confusion in the training process. If the training dataset is not large and diverse enough, the poor generalization of the trained controller (from open loop to closed loop, or from sim to real) will make the controller fail to achieve its task. Thus, we propose a multilevel interpretable model under a limited training set, as described next.

In order to make the model interpretable, we take training loss outputs at different depths of the model. Following the convolutional neural network (CNN) or long short-term memory (LSTM), we may take part of the neurons as the loss outputs for the locations of the vehicle itself and the obstacles; this way, we train the neurons at this level to learn position information. At a deeper level, we may take part of the neurons as another loss output for the speed and steering angle of the vehicle, training these neurons to learn speed and steering-angle information. By adding derivative layers following these neurons, we obtain the acceleration and steering-rate information for the vehicle. The acceleration and steering rate can be taken as reference controls in the BarrierNet, which is also trainable, and we take the output of the BarrierNet as the final loss output in the training process. In this framework, different depths of neurons learn different vehicle information, which makes the whole structure consistent with the vehicle's physical dynamics. This architecture makes the neural network model interpretable. We present the model structure in Fig. 12.

B. End-to-end Policies With BarrierNet

In this subsection, we explain how we can augment a BarrierNet to an end-to-end driving policy, and we propose solutions for challenges that arise in autonomous driving with BarrierNet.

1) Policy Learning: The neural policy takes in front-view images and extracts deep features based on a CNN. We then

Authorized licensed use limited to: Zhejiang University. Downloaded on October 11,2024 at 13:43:17 UTC from IEEE Xplore. Restrictions apply.
use an LSTM to process the features at each timestamp for temporal reasoning. The output of the LSTM (z in Definition 6) is then sent to multiple branches of multilayer perceptrons (MLPs) to predict the different parameters in the BarrierNet, including H(z), F(z), and pi(z). All parameters are then fed to the differentiable optimization layer to solve for the final controls that satisfy the safety constraints enforced by the dCBFs. Note that while the final controls are the acceleration a and the steering rate ω, we supervise the reference control F(z) with the velocity v and the steering angle δ and compute their derivatives, for the reasons discussed in Section VII-A. We bound the derivatives of v, δ to stabilize learning. The training objectives for the policy are the mean-squared errors on speed and steering angle (loss 2) and on acceleration and steering rate (loss 3), as shown in Fig. 12. End-to-end driving policies trained by imitation learning alone (with supervision from real human-driving data) have been shown to be less robust [53]. To tackle this issue, we leverage a data-driven simulation [54], [55] to augment the real human-driving dataset with more diverse data locally around the original dataset. More implementation details can be seen in Section VIII-B.

Fig. 13. Large disk covering approach for obstacle avoidance. Collisions can be avoided if the center of the vehicle never enters the disks (for a correct design of disk locations and sizes). Sorted disks are used to cover the corresponding sorted obstacles as they present themselves.

Algorithm 2: Vision-Based End-to-End Autonomous Driving With BarrierNet. [The algorithm listing did not survive extraction.]
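The derivative-and-bound step applied to the supervised outputs v, δ can be sketched as a finite-difference layer with clipping (the time step and the bound are design assumptions, not values from the paper):

```python
import numpy as np

def derivative_layer(seq, dt, bound):
    """Finite-difference layer: turns per-frame predictions (e.g., speed v
    or steering angle delta) into rate controls (a or omega), clipped to a
    bound to stabilize learning."""
    rates = np.diff(seq, axis=0) / dt
    return np.clip(rates, -bound, bound)

v_pred = np.array([10.0, 10.3, 10.5])    # hypothetical speed-head outputs [m/s]
acc = derivative_layer(v_pred, dt=0.1, bound=2.0)
```

Because the clip is elementwise and piecewise linear, gradients still flow through the unbounded region during training.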
2) State Estimation: Besides the dCBF parameters, the states of the vehicle are also required in the BarrierNet. Based on the notation in the vehicle dynamics (29), the required states include the lateral displacement from the lane center d, the local heading error μ, the relative progress between the ego car and the obstacle Δs, and the displacement from the lane center of the obstacle d_obs. We use the same CNN + LSTM architecture followed by multiple branches, each predicting a state with a mean-squared-error loss as the training objective. We cap the loss on Δs, d_obs when the obstacle is absent or too far away, to ensure that the states can be reasonably predicted.

3) Safety With an Unknown Number of Constraints: One of the challenges in autonomous driving with BarrierNet is that we have to define the exact number of HOCBFs when designing the BarrierNet layer, as it connects with the previous layers. However, a vehicle may encounter a time-varying number of obstacles (constraints) in a complex environment. In order to address this problem, we proceed as follows.

Suppose N ∈ ℕ denotes the maximum number of obstacles (such as other vehicles) a vehicle may encounter in driving. We cover each of the obstacles with an off-the-center disk, as shown in Fig. 13. The deviation direction of the disk depends on the direction of the obstacle with respect to the lane center. In this manner, we may use large disks to cover obstacles while making sure that the ego vehicle will not be overly conservative in driving through. We could use multiple small disks to cover a single obstacle; however, this increases the number of safety constraints required. Another advantage of using a large off-the-center disk is that it ensures the smoothness of the vehicle trajectory and avoids getting stuck in local traps that may appear with small disks. In this setting, we only have N safety constraints, one for each obstacle. We sort them in a specific order in connection with the previous layer and enforce them using the above differentiable HOCBFs. The order of the obstacles is important since they are enforced by dCBFs, and the dCBFs are connected to the previous layers. We need to generate different adaptive parameters for the dCBFs corresponding to obstacles whose distances with respect to the ego vehicle are different. Thus, the connection to the corresponding dCBFs (i.e., the order of obstacles) is crucial.

When there are no actual obstacles on the road, as in the case depicted on the left-hand side of Fig. 13, we just move the covering disks off the road. It is important to note that we have additional lane-keeping constraints [such as (30)], and thus, these disks are not used to approximate the lane boundaries. Instead, they are placed in a way such that the road is not blocked. These disks move along the road, at the same speed, as the vehicle progresses. While the vehicle drives on the road, these disks do not affect its motion, as the corresponding HOCBF constraints are not activated.

When there are one or more obstacles on the road, as in the case depicted on the right-hand side of Fig. 13, we first sort the obstacles according to their distance with respect to the ego vehicle. Then, we use the sorted disks to cover the corresponding

sorted obstacles. The sorted covering approach can make sure that the vehicle may leave the road in order to avoid collision with obstacles. In this setting, although we may have redundant differentiable HOCBFs in terms of obstacle avoidance, these HOCBFs always play an important role in guiding the vehicle, either in lane keeping or obstacle avoidance.

The whole process for vision-based end-to-end autonomous driving includes the following:
1) generating training data and control labels;
2) training the model using supervised learning;
3) estimating the vehicle state and obstacle state in real time during testing;
4) forward propagation of the model with front-view images as inputs to generate safe controls for the ego vehicle.
We summarize the algorithm for end-to-end autonomous driving in Algorithm 2.

4) Multiple and Active Obstacles: The BarrierNet can handle multiple obstacles by adding the corresponding dCBF constraints to the differentiable QP. Each of the two road boundaries could be viewed as an obstacle in autonomous driving. The BarrierNet can also work for dynamic obstacles (such as other active vehicles, pedestrians, etc.), but the safety guarantees for dynamic obstacles in the BarrierNet require additional state estimation (such as of the moving speed) for those obstacles. In fact, image-based state estimation is usually hard for states other than positions (such as speed, acceleration, etc.). Thus, we may need to use other sensors (such as light detection and ranging) to better estimate the states of dynamic obstacles and to conduct multisensor fusion. The uncertainty of surrounding agents can be considered in the dCBFs using the uncertainty bound if it is known, as shown in [52]. The conservativeness of this approach can be addressed using the adaptive event-driven approach for CBFs [56].

VIII. VISION-BASED AUTONOMOUS DRIVING EXPERIMENTS

In this section, we show experiments with the proposed vision-based end-to-end autonomous driving framework both in sim-to-real environments and on a full-scale autonomous vehicle. We start by introducing the hardware platform and data collection, followed by implementation details of the proposed model. We then demonstrate an extensive analysis in the sim-to-real environment virtual image synthesis and transformation for autonomy (VISTA) [55]. Finally, we showcase results with real-car deployment.

A. Hardware Setup and Real-World Data Collection

We deploy our models onboard a full-scale autonomous vehicle (2019 Lexus RX 450H) equipped with an NVIDIA 2080Ti GPU and an AMD Ryzen 7 3800X 8-core processor. We use a red, green, and blue (RGB) camera (BFS-PGE-23S3C-CS) as the primary perception sensor, which runs at 30 Hz with a resolution of 960×600 and a 130° horizontal field of view. Other onboard sensors include inertial measurement sensors and wheel encoders to measure steering feedback and odometry. We also use a differential global positioning system (dGPS) for evaluation purposes. To run the data-driven simulation VISTA [55], we collect real-world data from a wide range of environments, including different times of day, weather conditions, and seasons of the year. The entire dataset consists of roughly 2 h of driving data, which is further augmented with our training dataset generation pipeline using VISTA.

B. Synthetic Training Dataset Generation

We train our model with guided policy learning, which has been shown to improve effectiveness for direct model transfer to real-car deployment (other techniques, such as conformance checking [27], can also be used to achieve sim-to-real transfer). The data generation process is as follows: 1) in VISTA, randomly initialize both the ego- and ado-car with different configurations, such as relative poses, geographical locations associated with the real dataset, and the appearance of the vehicle; 2) run an optimal controller with access to privileged information to steer the ego-vehicle and collect ground-truth control outputs with the corresponding states; and 3) collect RGB images at viewpoints along the trajectories. We choose nonlinear model predictive control (NMPC) as the privileged (nominal) controller. While NMPC is usually computationally expensive and hard to solve, it is tractable offline and, with jerk $u_{\mathrm{jerk}}$ and steering acceleration $u_{\mathrm{steer}}$ as controls, provides smooth acceleration $a$ and steering rate $\omega$, which are used as the learning targets in BarrierNet. The vehicle dynamics of the NMPC and the BarrierNet (1) are defined with respect to a reference trajectory [57], which measures the along-trajectory distance $s \in \mathbb{R}$ and the lateral distance $d \in \mathbb{R}$ of the vehicle center of gravity (CoG) with respect to the closest point on the reference trajectory:

$$
\underbrace{\begin{bmatrix} \dot{s}\\ \dot{d}\\ \dot{\mu}\\ \dot{v}\\ \dot{a}\\ \dot{\delta}\\ \dot{\omega} \end{bmatrix}}_{\dot{x}}
=
\underbrace{\begin{bmatrix} \frac{v\cos(\mu+\beta)}{1-d\kappa}\\ v\sin(\mu+\beta)\\ \frac{v\sin\beta}{l_r}-\kappa\frac{v\cos(\mu+\beta)}{1-d\kappa}\\ a\\ 0\\ \omega\\ 0 \end{bmatrix}}_{f(x)}
+
\underbrace{\begin{bmatrix} 0&0\\ 0&0\\ 0&0\\ 0&0\\ 1&0\\ 0&0\\ 0&1 \end{bmatrix}}_{g(x)}
\underbrace{\begin{bmatrix} u_{\mathrm{jerk}}\\ u_{\mathrm{steer}} \end{bmatrix}}_{u}
\tag{29}
$$

where $\mu$ is the vehicle local heading error determined by the difference between the global vehicle heading $\theta \in \mathbb{R}$ and the tangent angle $\phi \in \mathbb{R}$ of the closest point on the reference trajectory (i.e., $\theta = \phi + \mu$), as shown in Fig. 14; $v, a$ denote the vehicle linear speed and acceleration, respectively; $\delta, \omega$ denote the steering angle and steering rate, respectively; $\kappa$ is the curvature of the reference trajectory at the closest point; $l_r$ is the length of the vehicle from the tail to the CoG; and $u_{\mathrm{jerk}}, u_{\mathrm{steer}}$ denote the two control inputs for jerk and steering acceleration (in the nominal controller), respectively. $\beta = \arctan\!\big(\frac{l_r}{l_r+l_f}\tan\delta\big)$, where $l_f$ is the length of the vehicle from the head to the CoG. We set the receding horizon of the NMPC to 20 time steps during data sampling, and it is implemented in a virtual simulation environment in MATLAB. We augment the real-world dataset using VISTA and NMPC with synthetic obstacle avoidance and lane-following data. In total, the training dataset has around 400 k images.
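As a concrete (and deliberately simplified) numerical sketch of how controls interact with the model in (29): below, the drift $f(x)$ and control matrix $g(x)$ are integrated with forward Euler, and a reference control is filtered through a single-halfspace projection, a closed-form stand-in for the differentiable-QP safety layer (the actual BarrierNet solves a full QP with learned penalties instead). The vehicle lengths, time step, and constraint row here are illustrative assumptions:

```python
import numpy as np

L_R, L_F = 1.5, 1.5          # tail/head-to-CoG lengths (assumed values)

def f(x, kappa=0.0):
    """Drift term of (29); x = [s, d, mu, v, a, delta, omega]."""
    s, d, mu, v, a, delta, omega = x
    beta = np.arctan(L_R / (L_R + L_F) * np.tan(delta))
    c, sn = np.cos(mu + beta), np.sin(mu + beta)
    return np.array([v * c / (1 - d * kappa),
                     v * sn,
                     v * np.sin(beta) / L_R - kappa * v * c / (1 - d * kappa),
                     a, 0.0, omega, 0.0])

def g():
    """Control matrix of (29): u_jerk drives a-dot, u_steer drives omega-dot."""
    G = np.zeros((7, 2))
    G[4, 0] = 1.0
    G[6, 1] = 1.0
    return G

def step(x, u, dt=0.01):
    return x + dt * (f(x) + g() @ u)     # forward Euler, illustration only

def filter_control(u_ref, a_row, b_rhs):
    """Project u_ref onto {u : a_row @ u <= b_rhs} (one affine constraint).
    With a single constraint this is the closed-form QP solution."""
    viol = a_row @ u_ref - b_rhs
    return u_ref if viol <= 0 else u_ref - viol * a_row / (a_row @ a_row)
```

With several simultaneous constraints (multiple obstacle disks plus the lane), the projection no longer has a closed form, which is exactly where the batched differentiable QP solver of the BarrierNet comes in.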

Fig. 14. Coordinates of ego w.r.t a reference trajectory.

BarrierNet Design: The vehicle is required to stay within the lane and avoid obstacles along its path, i.e., the state of the vehicle should satisfy

$$
|d| \le d_{lf}, \qquad \Delta s^2 + (d - d_{obs})^2 \ge r_D^2
\tag{30}
$$

where $d_{lf}$ is a preset bound on how close the car needs to stay to the lane center. If the road width changes, we can replace $d_{lf}$ by a variable that is either known or predicted from the image observation. $\Delta s$ is the relative progress between the ego-car and the obstacle, $r_D$ is the disk size, and $d_{obs}$ is the lateral displacement of the obstacle from the lane center.

The goal is to minimize the control input effort subject to the safety constraint (30) as the robot approaches its destination. The relative degree of the safety constraints (30) is two with respect to the dynamics; thus, we use three HOCBFs $b_{lf}^{\mathrm{left}} = d_{lf} - d$, $b_{lf}^{\mathrm{right}} = d_{lf} + d$, and $b_{obs} = \Delta s^2 + (d - d_{obs})^2 - r_D^2$ to enforce them. Any control input $u$ should satisfy the HOCBF constraint (4), which in this case (choosing $\alpha_1, \alpha_2$ in Definition 4 as linear functions) is

$$
-L_gL_f b(x)u \le L_f^2 b(x) + (p_1(z) + p_2(z))L_f b(x) + (\dot p_1(z) + p_1(z)p_2(z))b(x)
\tag{31}
$$

where

$$
L_gL_f b_{lf}^{\mathrm{left}}(x) = \left[-s_{\mu,\beta},\; -\frac{v c_{\mu,\beta}\, l_{rf}}{\cos^2\delta\left(1+(l_{rf}\tan\delta)^2\right)}\right],\quad
L_f^2 b_{lf}^{\mathrm{left}}(x) = -v c_{\mu,\beta}\,\dot\mu,\quad
L_f b_{lf}^{\mathrm{left}}(x) = -v s_{\mu,\beta}
\tag{32}
$$

$$
L_gL_f b_{lf}^{\mathrm{right}}(x) = \left[s_{\mu,\beta},\; \frac{v c_{\mu,\beta}\, l_{rf}}{\cos^2\delta\left(1+(l_{rf}\tan\delta)^2\right)}\right],\quad
L_f^2 b_{lf}^{\mathrm{right}}(x) = v c_{\mu,\beta}\,\dot\mu,\quad
L_f b_{lf}^{\mathrm{right}}(x) = v s_{\mu,\beta}
\tag{33}
$$

$$
L_gL_f b_{obs}(x) = \left[\frac{2\Delta s\, c_{\mu,\beta}}{1-d\kappa} + 2(d-d_{obs})s_{\mu,\beta},\;
\left(-\frac{2\Delta s\, v s_{\mu,\beta}}{1-d\kappa} + 2(d-d_{obs})v c_{\mu,\beta}\right)\frac{l_{rf}}{\cos^2\delta\left(1+(l_{rf}\tan\delta)^2\right)}\right]
$$
$$
L_f^2 b_{obs}(x) = 2\left(\frac{v c_{\mu,\beta}}{1-d\kappa}\right)^2 + 2(v s_{\mu,\beta})^2 - \frac{2\Delta s\, v s_{\mu,\beta}\,\dot\mu}{1-d\kappa} + \frac{2\Delta s\,\kappa v^2 s_{\mu,\beta} c_{\mu,\beta}}{(1-d\kappa)^2} + 2(d-d_{obs})v c_{\mu,\beta}\,\dot\mu
$$
$$
L_f b_{obs}(x) = \frac{2\Delta s\, v c_{\mu,\beta}}{1-d\kappa} + 2(d-d_{obs})v s_{\mu,\beta}
\tag{34}
$$

where $s_{\mu,\beta} = \sin(\mu+\beta)$, $c_{\mu,\beta} = \cos(\mu+\beta)$, $l_{rf} = \frac{l_r+l_f}{2}$, $\Delta s = s - s_{obs}$, and $\dot\mu = \frac{v\sin\beta}{l_r} - \kappa\frac{v c_{\mu,\beta}}{1-d\kappa}$.

In the above equations, $z$ is the deep feature extracted from the front-view images, and $p_1(z)$ and $p_2(z)$ are the trainable penalty functions. $\dot p_1(z)$ is also set to 0, as in the 2-D navigation case.

The cost in the neuron of the BarrierNet is given by

$$
\min_{u}\;(u_1 - f_1(z))^2 + (u_2 - f_2(z))^2
\tag{35}
$$

where $f_1(z)$ and $f_2(z)$ are the reference controls provided by the upstream network (the outputs of the CNN + LSTM).

C. Evaluation in Sim-to-Real Environments

Open-loop control error (i.e., the difference between the predicted and ground-truth control) has been shown to be a poor indicator of the performance of a driving policy since it only measures error around ground-truth trajectories and ignores accumulated errors that gradually drift the vehicle into out-of-distribution regions. Hereby, we present closed-loop testing results in the sim-to-real environment VISTA [55].

Fig. 15. Lane following probabilistic comparisons of deviation from the lane center in a BarrierNet with/without lane keeping CBFs.

1) Lane Keeping as Safety Constraints: In Fig. 15, we show the probability of the ego-vehicle deviating from the lane center by more than 1 m. We run 1000 episodes with a maximum of 200 steps if the vehicle does not crash (off-lane by more than 2 m) prematurely. In each episode, the vehicle is randomly initialized at a point in the trace, and we compute the average deviation at every point to ensure a sufficiently large sample size for the statistics. The model with lane-keeping CBFs achieves significantly better performance since they encourage the autonomous vehicle to stay close to the lane center by decreasing the boundary values due to the Lyapunov property of CBFs.

2) Obstacle Avoidance: In Table III, we show the crash rate and minimal clearance of models with or without BarrierNet and with or without access to ground-truth states. Minimal clearance is computed as the closest distance between polygons

of ego- and ado-car within an episode. The introduction of obstacle avoidance dCBFs significantly reduces the crash rate and increases the clearance. The remaining failures mainly come from the imprecise or even erroneous state and obstacle information inferred from the front-view camera only. With access to ground-truth information (an ideal state estimator), the crash rate is close to, yet not, zero. This might be due to misaligned dynamics and inter-sampling effects, which have been extensively studied in the CBF literature [5], [32].

TABLE III
CRASH RATE AND CLEARANCE WITH/WITHOUT BARRIERNET, USING OR NOT USING GROUND TRUTH OBSTACLE INFORMATION

3) BarrierNet Provides Safe Maneuvers: In Fig. 16, we benchmark different learning systems, end-to-end learning (e2e-learning) and BarrierNet, to provide driving trajectories under the same configuration except for an arbitrary initial pose on the road. We show ten variants of initial states for each model. We also compare the proposed BarrierNet with model predictive control (MPC). The MPC method employs accurate ground-truth vehicle state and obstacle information, and thus, the resulting trajectories are consistent. It can be observed that trajectories from the three models mostly align with each other in the beginning as the ego-vehicle starts with different lateral displacements from the lane center and tries to recover. Then, the three sets of trajectories diverge while approaching the obstacle. This is the consequence of correction from the activated dCBFs over the reference unsafe control. With BarrierNet, safety is guaranteed. The trajectories from the BarrierNet are not as consistent as (but are still close to) the MPC ones since the BarrierNet infers the vehicle state and obstacle information from images, which induces uncertainties. On the other hand, the MPC is very computationally expensive (MPC computation time 0.872 s versus BarrierNet computation time 0.004 s on the same machine). Although linearization is possible in MPC to make it more efficient, this may make it lose its guarantees. The computation times of BarrierNet and e2e-learning are close.

Fig. 16. Vehicle trajectories in obstacle avoidance using MPC, end-to-end learning (e2e-learning), and the proposed BarrierNet in VISTA. The MPC method employs accurate ground truth vehicle state and obstacle information, while e2e-learning and BarrierNet do not have such information and infer it from images. The e2e-learning method fails to guarantee safety.

Fig. 17. Penalty p1(z), p2(z) variation in a dCBF (BarrierNet) when approaching an obstacle under two (different) trained BarrierNets. The relative degree of the safety constraint is two, and thus we have two CBF parameters in one CBF. The segments inside the dotted boxes denote intervals when the ego vehicle is near the obstacle. The box sizes are different as the ego has different speeds when passing the obstacle.

4) BarrierNet With Different Profiles: We also notice that the BarrierNet may learn different CBF parameters when the ego vehicle approaches an obstacle. In Fig. 17, we present two possible variations of the penalty functions p1(z) and p2(z) when the ego vehicle is around an obstacle. The penalty functions p1(z) and p2(z) adapt to the obstacle when the ego vehicle is close to it, and they recover to some values when the ego leaves the obstacle. This shows the flexibility of the BarrierNet. Another observation is that the outputs of the BarrierNet tend to deviate from the reference controls (from the previous LSTM layer) when the ego vehicle is close to the obstacle. This shows the safety guarantee property of CBFs. In order to avoid this deviation, we need to improve the learned model with better reference controls and CBF parameters.

D. Physical Autonomous Car Experiments

To verify the effectiveness of the proposed vision-based end-to-end framework with dCBFs, we deploy the trained models on a full-scale autonomous driving car. The experiments are conducted in a test site with a rural road type. We mainly test the algorithm with AR (augmented reality) and only perform a minimal experiment with real-car obstacles for safety reasons. We use a precollected map of the test site and vehicle poses from the dGPS to place virtual obstacles in front of the ego-vehicle on the road with AR. Note that the tested models still use vision inputs only to steer the autonomous vehicle without any access to the ground-truth state. Fig. 18 is an illustration of the real-car experimental setup. Another thing worth mentioning is that the scene was covered with snow at the time we conducted the real-car experiments. The icy road surface at the track and the heavy snow at the side of the road introduce tire slippage and pose additional challenges to our self-driving system. Also, the reflection of sunlight on the ice makes it hard to recognize road boundaries, even for human judgment. With high-precision dGPS in the

site (covariance < 1 cm), we provide a qualitative analysis with a side-by-side comparison between models with and without BarrierNet.

Fig. 18. Illustration of real-car experiments. We show a driver's view with a real car as obstacle in front of it (middle) and with a car in front of it using AR (right).

Fig. 19. Two cases of experimental vehicle trajectories in lane keeping with/without lane keeping CBFs in the BarrierNet. Tire slipping happened on the icy road.

1) BarrierNet in Challenging Sharp Turns: In Fig. 19, we demonstrate the driving trajectories of BarrierNet with and without lane-keeping CBFs in sharp left and right turns. We show the footprint of the vehicle through time and indicate the forward direction with arrows. Without lane-keeping CBFs (red), the car is more prone to go off-road, while roughly correct estimates of the deviation from the lane center (d) impose an additional layer of safety with lane-keeping CBFs (blue).

Fig. 20. Two cases of experimental vehicle trajectories in obstacle avoidance with/without BarrierNet. In the left case, the heavy snow by the road is preventing the vehicle from getting back to the road due to tire slipping, and thus the vehicle recovers slowly even when the steering wheel is at its left limit.

2) Obstacle Avoidance in Real World: We also conducted obstacle avoidance experiments on the autonomous car, as shown in Fig. 20. The first example (left) demonstrates that with reasonable reference control (both models successfully avoid the obstacle), the model with obstacle avoidance dCBFs (blue) creates more clearance to achieve better safety. The second example (right) highlights the effectiveness of BarrierNet (blue) when the reference control (red) fails to avoid the front car and requires correction from the activated dCBF constraints.

3) Limitations: The proposed BarrierNet is subject to several challenges in vision-based driving, which motivate further future work as follows: 1) CBFs for all kinds of traffic participants are constructed using a disk-covering approach; however, how to efficiently construct CBFs is still challenging (including CBFs for lane keeping) when there are many participants; 2) occluded obstacles introduce uncertainties to the safety of the ego vehicle; and 3) safety guarantees under dynamic obstacles are challenging since we also need to know the obstacle dynamics and states.

IX. CONCLUSION

In this article, we proposed BarrierNet, a differentiable HOCBF layer that is trainable and guarantees safety with respect to user-defined safe sets. BarrierNet can be integrated with any upstream neural network controller to provide a safety layer. In our experiments, we showed that the proposed BarrierNet can guarantee safety while addressing the conservativeness that CBFs induce. A potential future avenue of research emerging from this work will be to simultaneously learn the system dynamics and unsafe sets with BarrierNets. This can be enabled using the expressive class of continuous-time neural network models [58], [59], [60].

ACKNOWLEDGMENT

The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the United States Air Force or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for government purposes, notwithstanding any copyright notation herein.

REFERENCES

[1] A. D. Ames, J. W. Grizzle, and P. Tabuada, "Control barrier function based quadratic programs with application to adaptive cruise control," in Proc. IEEE 53rd Conf. Decis. Control, 2014, pp. 6271–6278.
[2] A. D. Ames, X. Xu, J. W. Grizzle, and P. Tabuada, "Control barrier function based quadratic programs for safety critical systems," IEEE Trans. Autom. Control, vol. 62, no. 8, pp. 3861–3876, Aug. 2017.

[3] Q. Nguyen and K. Sreenath, “Exponential control barrier functions for [29] W. Xiao, C. Belta, and C. G. Cassandras, “Event-triggered safety-critical
enforcing high relative-degree safety-critical constraints,” in Proc. Amer. control for systems with unknown dynamics,” in Proc. IEEE 60th Conf.
Control Conf., 2016, pp. 322–328. Decis. Control, 2021, pp. 540–545.
[4] L. Wang, E. A. Theodorou, and M. Egerstedt, “Safe learning of quadrotor [30] J. Breeden and D. Panagou, “High relative degree control barrier functions
dynamics using barrier certificates,” in Proc. IEEE Int. Conf. Robot. under input constraints,” in Proc. IEEE 60th Conf. Decis. Control, 2021,
Automat., 2018, pp. 2460–2465. pp. 6119–6124.
[5] A. Taylor, A. Singletary, Y. Yue, and A. Ames, “Learning for safety-critical [31] W. Xiao, C. Belta, and C. G. Cassandras, “Sufficient conditions for
control with control barrier functions,” in Proc. Learn. Dyn. Control, 2020, feasibility of optimal control problems using control barrier functions,”
pp. 708–717. Automatica, vol. 135, 2022, Art. no. 109960.
[6] J. Choi, F. Castañeda, C. J. Tomlin, and K. Sreenath, “Reinforcement [32] W. Xiao, C. Belta, and C. G. Cassandras, “Adaptive control barrier func-
learning for safety-critical control under model uncertainty, using control tions,” in IEEE Trans. Autom. Control, vol. 67, no. 5, pp. 2267–2281,
Lyapunov functions and control barrier functions,” in Proc. Robot.: Sci. May 2022, doi: 10.1109/TAC.2021.3074895.
Syst., 2020. [33] A. Robey et al., “Learning control barrier functions from expert demon-
[7] A. J. Taylor, A. Singletary, Y. Yue, and A. D. Ames, “A control barrier strations,” in Proc. IEEE 59th Conf. Decis. Control, 2020, pp. 3717–3724.
perspective on episodic learning via projection-to-state safety,” IEEE [34] M. Srinivasan, A. Dabholkar, S. Coogan, and P. A. Vela, “Synthesis of
Contr. Syst. Lett., vol. 5, no. 3, pp. 1019–1024, Jul. 2021. control barrier functions using a supervised machine learning approach,”
[8] X. Xu, P. Tabuada, J. W. Grizzle, and A. D. Ames, “Robustness of control in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2020, pp. 7139–7145.
barrier functions for safety critical control,” IFAC-PapersOnLine, vol. 48, [35] B. T. Lopez, J. J. E. Slotine, and J. P. How, “Robust adaptive control barrier
no. 27, pp. 54–61, 2015. functions: An adaptive and data-driven approach to safety,” IEEE Contr.
[9] T. Gurriet, A. Singletary, J. Reher, L. Ciarletta, E. Feron, and A. Ames, Syst. Lett., vol. 5, no. 3, pp. 1031–1036, Jul. 2021.
“Towards a framework for realizable safety critical control through active [36] S. Yaghoubi, G. Fainekos, and S. Sankaranarayanan, “Training neural
set invariance,” in Proc. ACM/IEEE 9th Int. Conf. Cyber- Phys. Syst., 2018, network controllers using control barrier functions in the presence of
pp. 98–106. disturbances,” in Proc. IEEE 23rd Int. Conf. Intell. Transp. Syst., 2020,
[10] A. J. Taylor and A. D. Ames, “Adaptive safety with control barrier pp. 1–6.
functions,” in Proc. Amer. Control Conf., 2020, pp. 1399–1405. [37] M. A. Pereira, Z. Wang, I. Exarchos, and E. A. Theodorou, “Safe optimal
[11] N. Csomay-Shanklin, R. K. Cosner, M. Dai, A. J. Taylor, and A. D. control using stochastic barrier functions and deep forward-backward
Ames, “Episodic learning for safe bipedal locomotion with control barrier SDEs,” in Proc. Conf. Robot Learn., 2020, pp. 1783–1801.
functions and projection-to-state safety,” in Proc. Learn. Dyn. Control, [38] B. Amos and J. Z. Kolter, “Optnet: Differentiable optimization as a layer
2021, pp. 1041–1053. in neural networks,” in Proc. 34th Int. Conf. Mach. Learn., 2017, vol. 70,
[12] W. Xiao and C. Belta, “Control barrier functions for systems with pp. 136–145.
high relative degree,” in Proc. IEEE 58th Conf. Decis. Control, 2019, [39] B. Amos, I. D. J. Rodriguez, J. Sacks, B. Boots, and J. Z. Kolter, “Dif-
pp. 474–479. ferentiable MPC for end-to-end planning and control,” in Proc. 32nd Int.
[13] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Conf. Neural Inf. Process. Syst., 2018, pp. 8299–8310.
Comput., vol. 9, no. 8, pp. 1735–1780, 1997. [40] P.-F. Massiani, S. Heim, and S. Trimpe, “On exploration requirements
[14] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image for learning safety constraints,” in Proc. Learn. Dyn. Control, 2021,
recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 905–916.
pp. 770–778. [41] S. Gruenbacher et al., “GoTube: Scalable stochastic verification of
[15] A. Vaswani et al., “Attention is all you need,” in Proc. Int. Conf. Neural continuous-depth models,” in Proc. AAAI Conf. Artif. Intell., vol. 36, no. 6,
Inf. Process. Syst., 2017, pp. 5998–6008. 2022, pp. 6755–6764.
[16] M. Lechner and R. Hasani, “Learning long-term dependencies in [42] S. Grunbacher, R. Hasani, M. Lechner, J. Cyranka, S. A. Smolka, and R.
irregularly-sampled time series,” 2020, arXiv:2006.04418. Grosu, “On the verification of neural odes with stochastic guarantees,” in
[17] R. Hasani et al., “Closed-form continuous-time neural networks,” Proc. AAAI Conf. Artif. Intell., 2021, vol. 35, pp. 11525–11535.
Nature Mach. Intell., Nature Publishing Group UK London, 2022, [43] J. V. Deshmukh, J. P. Kapinski, T. Yamaguchi, and D. Prokhorov, “Learn-
pp. 1–12. ing deep neural network controllers for dynamical systems with safety
[18] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representa- guarantees: Invited paper,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided
tions by back-propagating errors,” Nature, vol. 323, no. 6088, pp. 533–536, Des., 2019, pp. 1–7.
1986. [44] J. Ferlez, M. Elnaggar, Y. Shoukry, and C. Fleming, “ShielDNN: A prov-
[19] A. E. Bryson and Y.-C. Ho, Applied Optimal Control. Waltham, MA, ably safe NN filter for unsafe NN controllers,” 2020, arXiv:2006.09564.
USA:Ginn Blaisdell, 1969. [45] W. Jin, Z. Wang, Z. Yang, and S. Mou, “Neural certificates for safe control
[20] J. B. Rawlings, D. Q. Mayne, and M. M. Diehl, Model Predictive Control: policies,” 2020, arXiv:2006.08465.
Theory, Computation, and Design, 2nd ed. Madison, WI, USA: Nob Hill [46] H. Zhao, X. Zeng, T. Chen, and J. Woodcock, “Learning safe neural
Publishing, 2017. network controllers with barrier certificates,” Form Asp Comput., vol. 33,
[21] S. Mitsch, K. Ghorbal, and A. Platzer, “On provably safe obstacle avoid- pp. 437–455, 2021.
ance for autonomous robotic ground vehicles,” in Proc. Robot.: Sci. Syst., [47] H. K. Khalil, Nonlinear Systems, 3rd ed. Englewood Cliffs, NJ, USA:
2013. Prentice-Hall, 2002.
[22] C. Pek, S. Manzinger, M. Koschi, and M. Althoff, “Using online verifi- [48] W. Xiao, C. G. Cassandras, C. A. Belta, and D. Rus, “Control barrier
cation to prevent autonomous vehicles from causing accidents,” Nature functions for systems with multiple control inputs,” in Proc. Amer. Control
Mach. Intell., vol. 2, no. 9, pp. 518–528, 2020. Conf., 2022, pp. 2221–2226.
[23] M. Althoff and J. M. Dolan, “Online verification of automated road [49] W. Xiao et al., “Rule-based optimal control for autonomous driving,” in
vehicles using reachability analysis,” IEEE Trans. Robot., vol. 30, no. 4, Proc. ACM/IEEE 12th Int. Conf. Cyber- Phys. Syst., 2021, pp. 143–154.
pp. 903–918, Aug. 2014. [50] Y. Ye and E. Tse, “An extension of Karmarkar’s projective algorithm
[24] S. M. LaValle, J. Kuffner, and J. James, “Randomized kinody- for convex quadratic programming,” Math. Program., vol. 44, no. 1,
namic planning,” Int. J. Robot. Res., vol. 20, no. 5, pp. 378–400, pp. 157–179, 1989.
2001. [51] W. Xiao and C. G. Cassandras, “Decentralized optimal merging control
[25] P. E. Hart, N. J. Nilsson, and B. Raphael, “A formal basis for the heuristic for connected and automated vehicles with safety constraint guarantees,”
determination of minimum cost paths,” IEEE Trans. Syst. Sci. Cybern., Automatica, vol. 123, 2021, Art. no. 109333.
vol. 4, no. 2, pp. 100–107, Jul. 1968. [52] W. Xiao, C. G. Cassandras, and C. Belta, “Bridging the gap between
[26] K. Ota et al., “Deep reactive planning in dynamic environments,” in Proc. optimal trajectory planning and safety-critical control with applications
Conf. Robot Learn., 2021, pp. 1943–1957. to autonomous vehicles,” Automatica, vol. 129, 2021, Art. no. 109592.
[27] H. Roehm, J. Oehlerking, M. Woehrle, and M. Althoff, “Model confor- [53] A. Amini et al., “Learning robust control policies for end-to-end au-
mance for cyber-physical systems: A survey,” ACM Trans. Cyber- Phys. tonomous driving from data-driven simulation,” IEEE Robot. Automat.
Syst., vol. 3, no. 3, pp. 1–26, 2019. Lett., vol. 5, no. 2, pp. 1143–1150, Apr. 2020.
[28] G. Yang, C. Belta, and R. Tron, “Self-triggered control for safety critical [54] T.-H. Wang, A. Amini, W. Schwarting, I. Gilitschenski, S. Karaman, and
systems using control barrier functions,” in Proc. Amer. Control Conf., D. Rus, “Learning interactive driving policies via data-driven simulation,”
2019, pp. 4454–4459. in Proc. Int. Conf. Robot. Automat., 2022, pp. 7745–7752.

[55] A. Amini et al., "Vista 2.0: An open, data-driven simulator for multimodal sensing and policy learning for autonomous vehicles," in Proc. Int. Conf. Robot. Automat., 2022, pp. 2419–2426.
[56] W. Xiao, C. Belta, and C. G. Cassandras, "Event-triggered control for safety-critical systems with unknown dynamics," IEEE Trans. Autom. Control, 2022, early access, doi: 10.1109/TAC.2022.3202088.
[57] A. Rucco, G. Notarstefano, and J. Hauser, "An efficient minimum-time trajectory generation strategy for two-track car vehicles," IEEE Trans. Control Syst. Technol., vol. 23, no. 4, pp. 1505–1519, Jul. 2015.
[58] R. T. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud, "Neural ordinary differential equations," in Proc. 32nd Int. Conf. Neural Inf. Process. Syst., 2018, pp. 6572–6583.
[59] R. Hasani, M. Lechner, A. Amini, D. Rus, and R. Grosu, "Liquid time-constant networks," in Proc. AAAI Conf. Artif. Intell., 2021, vol. 35, pp. 7657–7666.
[60] C. Vorbach, R. Hasani, A. Amini, M. Lechner, and D. Rus, "Causal navigation by continuous-time neural networks," in Proc. Adv. Neural Inf. Process. Syst., vol. 34, 2021, pp. 12425–12440.

Makram Chahine received the Diplôme d'Ingénieur in applied mathematics from École Centrale Paris, Rennes, France, and the M.Sc. degree in aerospace engineering from the Georgia Institute of Technology, Atlanta, GA, USA, both in 2019. Since 2021, he has been working toward the Ph.D. degree in electrical engineering and computer science with the Massachusetts Institute of Technology, Cambridge, MA, USA.
His research interests include autonomous robots, artificial intelligence, and complex interacting systems.
Wei Xiao (Member, IEEE) received the B.Sc. degree in mechanical engineering and automation from the University of Science and Technology Beijing, Beijing, China, the M.Sc. degree in robotics from the Chinese Academy of Sciences (Institute of Automation), Beijing, China, and the Ph.D. degree in systems engineering from Boston University, Boston, MA, USA, in 2013, 2016, and 2021, respectively.
He is currently a Postdoctoral Associate with the Massachusetts Institute of Technology, Cambridge, MA, USA. His current research interests include control theory and machine learning, with particular emphasis on robotics and traffic control.
Dr. Xiao was the recipient of an Outstanding Student Paper Award at the 2020 IEEE Conference on Decision and Control.

Alexander Amini (Member, IEEE) received the B.S., M.S., and Ph.D. degrees from the Massachusetts Institute of Technology (MIT), Cambridge, MA, USA, in 2017, 2018, and 2022, respectively, all in electrical engineering and computer science, with a minor in mathematics.
He is a Postdoctoral Research Associate with MIT, in the Computer Science and Artificial Intelligence Laboratory (CSAIL), with Prof. Daniela Rus. His research interests include developing the science and engineering of autonomy and its applications to safe decision making for autonomous agents. His work has spanned learning end-to-end control (i.e., perception-to-actuation) of autonomous systems, formulating confidence of neural networks, mathematical modeling of human mobility, as well as building complex inertial refinement systems.
Mr. Amini is a recipient of the NSF Graduate Research Fellowship.
Tsun-Hsuan Wang (Student Member, IEEE) re-


ceived the B.Sc. and M.Sc. degrees from National
Tsing Hua University, Hsinchu City, Taiwan, in 2017
and 2020, respectively, both in electrical engineering. Xiao Li (Member, IEEE) received the Ph.D. degree
He received the Ph.D. degree in electrical engineer- in mechanical engineering from Boston University,
ing and computer science from the Computer Sci- Boston, MA, USA, in 2019.
ence and Artificial Intelligence Laboratory (CSAIL), He is currently a Postdoctoral Associate with the
Massachusetts Institute of Technology (MIT), Cam- Massachusetts Institute of Technology (MIT), Cam-
bridge, MA, USA. bridge, MA, USA Computer Science and Artificial
His research interests include simulation for robot Intelligence Lab (CSAIL). His research interests in-
learning, intersection between control theory and ma- clude reinforcement learning, trajectory prediction,
chine learning, and structures and explainability in neural policies, on the and generation with applications in robotic manipu-
application of autonomous driving, flight control, and soft robotics. lation and autonomous driving.

Ramin Hasani received the Ph.D. degree with dis-


tinction in computer science from the Vienna Univer-
sity of Technology (TU Wien), Vienna, Austria, in
2020. Daniela Rus (Fellow, IEEE) received the Ph.D. de-
He was a Postdoctoral Associate with the gree in computer science from Cornell University,
Computer Science and Artificial Intelligence Lab Ithaca, NY, USA, in 1993.
(CSAIL), Massachusetts Institute of Technology She was a Professor with the Department of Com-
(MIT), Cambridge, MA, USA, leading research puter Science, Dartmouth College, Hanover, NH,
on modeling intelligence and sequential decision- USA. She is currently the Andrew (1956) and Erna
making, with Prof. Daniela Rus. He is currently a Viterbi Professor of Electrical Engineering and Com-
Principal AI and Machine Learning Scientist with the puter Science and the Director of the Computer Sci-
Vanguard Group and a Research Affiliate with the CSAIL, MIT. His research ence and Artificial Intelligence Laboratory with the
interests include robust deep learning and decision-making in complex dynam- Massachusetts Institute of Technology (MIT), Cam-
ical systems. bridge, MA, USA. Her research interests include
Dr. Hasani was the recipient of the HPC Innovation Excellence Award in robotics, mobile computing, and big data. The key focus of her research is
2022 and was nominated for the TÜV Austria Dissertation Award in 2020 for to develop the science of networked/distributed/collaborative robotics.
his Ph.D. dissertation and continued research on Liquid Neural Networks, which Dr. Rus is a Class of 2002 MacArthur Fellow, a Fellow of ACM and AAAI,
got recognized internationally. and a Member of the National Academy of Engineering.