Constrained Iterative LQR
Constrained Iterative LQR
net/publication/323789278
CITATIONS READS
53 3,805
3 authors:
Masayoshi Tomizuka
University of California, Berkeley
1,033 PUBLICATIONS 28,403 CITATIONS
SEE PROFILE
Some of the authors of this publication are also working on these related projects:
All content following this page was uploaded by Jianyu Chen on 07 November 2018.
Abstract—There exist a lot of challenges in trajectory and sampling-based methods are planned in a discretized
planning for autonomous driving: 1) Needs of both spatial space (either in state space or in action space), which leads to
and temporal planning for highly dynamic environments; 2) non-smooth trajectories. And the resolution of the discretized
Nonlinear vehicle models and non-convex collision avoidance
constraints. 3) High computational efficiency for real-time space will largely influence the computational efficiency due
implementation. Iterative Linear Quadratic Regulator (ILQR) to the ”curse of dimensionality”.
is an algorithm which solves predictive optimal control problem The optimization-based method formulates motion plan-
with nonlinear system very efficiently. However, it can not deal ning as a mathematical optimization problem. The plan-
with constraints. In this paper, the Constrained Iterative LQR ning results are continuous, optimal, and spatiotemporal.
(CILQR) is proposed to handle the constraints in ILQR. Then
an on road driving problem is formulated. Simulation case However, since the vehicle kinematics model is nonlinear,
studies show the capability of the CILQR algorithm to solve and the collision avoidance constraints are non-convex, the
the on road driving motion planning problem. problem contains nonlinear equality constraints and non-
convex inequality constraints, making it very difficult and in-
I. I NTRODUCTION efficient to solve the optimization problem. The state-of-the-
Motion Planning is one of the most important modules art work at this field is done by Mercedes-Benz [8], which
in autonomous driving. A typical motion planner for au- utilized Sequential Quadratic Programming (SQP) to solve
tonomous vehicles takes into account the information from the nonlinear and non-convex problem, but the computation
the environment (e.g. motions of the host vehicle and the sur- time is around 0.5s which should be shortened. Some works
rounding vehicles), and plans a trajectory for the vehicle to decouple the spatial and temporal planning [6], [7], which
execute. Although motion planning has been investigated for are computational efficient. However, the temporal planning
decades, it is still challenging to design a motion planner for are only locally allowed based on the spatial planning, which
autonomous vehicles satisfying the following requirements is not real spatiotemporal.
simultaneously: (1) The computation needs to be in real Iterative Linear Quadratic Regulator(ILQR) [9], [10] is
time to interact with the dynamic environment and deal with an algorithm based on Linear Quadratic Regulator(LQR)
emergencies. (2) The planning needs to be spatiotemporal that can solve the unconstrained optimal control problems
to deal with dynamic obstacles. (3) The planner needs to subject to nonlinear system dynamics. Its most important ad-
be able to deal with complex constraints such as collision vantage is time efficiency, which makes it a potential method
avoidance. to solve the real time issue in complicated optimization prob-
The autonomous driving motion planning techniques can lems. However, very few literatures consider constraints in
be roughly divided into three categories: Search-based, ILQR, while constraints are inevitable in autonomous driving
sampling-based and optimization-based methods [1], [2]. motion planning. The control-limited differential dynamic
Search-based methods often refer to A* [3] search or D* [4] programming (DDP) was proposed in [11], which solves the
search. Their planning results are globally optimal, but the nonlinear optimal control problem under constraints, but the
algorithms are inefficient when dealing with moving obsta- form of constraint is limited to a lower and upper bound
cles and are difficult to do spatiotemporal planning. They of the control input. Extended LQR [12], [13] achieves
are also not suitable for on-road driving motion planning collision avoidance, but it only deals with circle obstacles by
because it is hard to set a goal and find a good heuristic penalizing the distance from the center of the host robot to
function, which greatly impacts the computation time of the center of the obstacle. In autonomous driving, however,
A* search and therefore its capability to be real-time. The the shape of obstacles can be more complex. To the best of
sampling-based methods are mainly represented by Rapidly- our knowledge, no work has studied how to handle a general
exploring Random Tree (RRT) [5]. The main disadvantage of form of constraints using ILQR. Therefore, no one has used
RRT is its inefficiency when doing collision check. Besides, ILQR to deal with the general on-road autonomous driving
its planning results are suboptimal. Both the search-based motion planning with complex constraints.
In this paper, the Constrained Iterative LQR (CILQR) is
*J.Chen, W. Zhan and M. Tomizuka are with the Department of Mechani-
cal Engineering, University of California, Berkeley, CA 94720 USA (e-mail: proposed to handle the general form of constraints for ILQR.
[email protected], [email protected], [email protected]). LQR is solved by Dynamic Programming (DP) which is very
efficient, but it can only deal with unconstrained problems. B. Iterative LQR (ILQR)
To take advantage of the efficiency of DP, the constraints are
transformed to the cost function by adding a barrier function. Problem 2 (The Standard ILQR Problem) The Iterative
Nonlinear and non-convex constraints will be linearized. LQR algorithm can deal with the following nonlinear system
The barrier function is then quadratized and added to the motion planning problem
original cost function. Then the CILQR algorithm is applied
min J = 12 xTN QN xN + xTN pN + qN
u0 ,...,uN −1
to a general on road autonomous driving motion planning x0 ,...,xN
problem, where a new general representation of the obstacles N −1 (2)
1 T 1 T
is developed using ellipse. The centerline is approximated + 2 xk Qxk + xk p + 2 uk Ruk + uk r
T T
+q
k=0
using polynomial, enabling a continuously differentiable
distance function for better convergence. s.t. xk+1 = f (xk , uk ) , k = 0, 1, ..., N − 1 (3a)
The remainder of the paper is organized as follows. x0 = x start
(3b)
Section II introduces the Constrained Iterative LQR algo-
rithm. Section III shows how the algorithm is applied to where QN , pN , qN , Q, p, R, r, q are parameters describing
a general on road autonomous driving motion planning the quadratic cost function. The dynamics equation (3a) can
problem. Section IV gives simulation results of some typical be nonlinear. Start with an initial control input sequence,
scenarios. Finally, section V concludes the paper. such as uk = 0, the algorithm iterates with two steps:
Step 1 (Forward Simulation). The control sequence at the
II. C ONSTRAINED I TERATIVE LQR current iteration uk is passed to the real system dynamic-
s (3a) and get the state sequence xk for this iteration.
A. Problem Statement
Step 2 (Backward Optimization). First the nonlinear system
Problem 1 (A General Motion Planning Problem). A con- (3a) is linearized around xk and uk :
strained receding horizon motion planning problem can be
formulated as: δxk+1 = Ak δxk + Bk δuk (4)
N −1
where Ak = Dx f (xk , uk ) and Bk = Du f (xk , uk ) are
min
u0 ,...,uN −1
J = cN (xN ) + cuk (xk ) +cuk (uk ) the first derivatives of the original dynamic function. Now
x0 ,...,xN k=0
under this iteration and with the current linearized system,
s.t. xk+1 = f (xk , uk ) , k = 0, 1, ..., N − 1 (1a) the problem becomes a standard LQR problem:
x0 = x start
(1b) Problem 3 (The Standard LQR Problem)
2
2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC)
where ∇cxk (xk ) is the first order derivative of fkx on xk . where (13a) is the system dynamic equation, which is non-
linear. The collision avoidance constraints are the compli-
step 3 (Barrier Function Shaping). The linearized constraint mentary set of the space occupied by the obstacles, denoted
function fkx will be shaped by a barrier function: as xk ∈ C/∪Oj (k), where C is the whole space, Oj (k) is
the space occupied by obstacle j at time step k. Since Oj (k)
bxk = q1 exp (q2 fkx ) (9)
is usually convex, (13c) is usually a non-convex inequality
where q1 and q2 are parameters to tune the shape of constraint.
the barrier function. The use of the barrier function is to Problem 4 is a subproblem of problem 1. The CILQR
approximate the indicator function, which transforms the algorithm is applied to solve it. The solution will be a
constrained problem into an unconstrained problem. The sequence of optimal control commands u0 , u1 , ..., uN −1 , and
exponential barrier function (9) is twice differentiable and a planned trajectory x0 , x1 , ..., xN .
its derivatives are easy to compute.
step 4 (Constraint Function Quadratization). The shaped B. Vehicle Model
constraint function will be quadratized:
The vehicle kinematic model xshown in Fig. 1 is used in
bxk (xk + δxk ) ≈ δxTk ∇2 bxk (xk ) δxk +δxTk ∇bxk (xk )+bxk (xk ) T
this paper. The state is x = p y
p v θ , where px
(10) y
and p represent the two-dimensional position of the vehicle,
By applying the chain rule, we have:
v is the vehicle velocity and θ represents the yaw angle. The
∇bxk (xk ) = q1 q2 exp (q2 cxk (xk )) ∇cxk (xk ) (11) vehicle dynamics is:
3
2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC)
⎡ ⎤ ⎡ ⎤
N
v cos θ 0 0 J= yawr
+ cof f
k + ck
cacc + cvel (20)
⎢ v sin θ ⎥ ⎢ 0 0 ⎥
k k
ẋ = ⎢ ⎥+⎢ ⎥ v̇ = g (x, u) (14) k=0
⎣ 0 ⎦ ⎣ 1 0 ⎦ θ̇
where the four terms represent penalties for acceleration,
0 0 1
yaw rate, offset from reference and velocity difference. We
Note that there are nonlinear terms in the dynamic equa- will describe below the details of how they are computed:
tions and that θ̇ = Lv tan δ where δ is the steering angle of 1) acceleration: The term:
the front wheels.
1 0
cacc = w u T
uk (21)
k acc k
0 0
is the penalty for acceleration, which is highly related
to
passenger comfort. A lager acceleration than 1m s2 will
cause significant uncomfort [14]. Adding this term to the
objective function can improve passenger comfort and tuning
the weight wacc can change the level of comfort.
Fig. 1: Vehicle Kinematic Model 2) Yaw Rate: The term:
yawr 0 0
Since we are discussing in the discrete-time domain, ck = wyawr ukT
uk (22)
0 1
this model needs to be transformed to discrete-
time form. Once xk and uk is determined, we is the penalty for yaw rate. It penalizes the rapid changes in
have for t ∈ [Ts k, Ts (k + 1)], x (Ts k) = xk = vehicle direction and the lateral acceleration which is related
x T T
pk pyk vk θk , uk = v̇k θ̇k . So to driving safety and comfort. Tuning the weight wyawr can
T change how much we concern the above issues.
ẋ (t) = v (t) cos θ (t) v (t) sin θ (t) v̇k θ̇k , where
3) Offset to Reference: The term cof f
penalizes the
v (t) = vk + v̇k (t − Ts k) and θ (t) = θk + θ̇k (t − Ts k). k
distance to the reference. It is worth thinking about how
Substituting v (t) and θ (t) we get: we define the reference and the distance. A widely used
method is to give a sequence of reference points xr =
ṗx (t) = (vk + v̇k (t − Ts k)) cos (θk + θk (t − Ts k)) [xr0 , xr1 , ..., xrN ] where the lower subscript represents the time
ṗy (t) = (vk + v̇k (t − Ts k)) sin (θk + θk (t − Ts k)) stamp. The term can be then formulated as
(15)
After integrating we have: cof f T
= (xk − xrk ) Qrk (xk − xrk ) (23)
k
t
px (t) = pxk + Ts k ṗx (t) dτ However, this formulation may result in some problematic
t (16)
py (t) = pyk + Ts k ṗy (t) dτ scenarios. For example, as shown in Fig. 2, the vehicle is
on a curve road and it needs to stop before the line A. The
So the states in the next time step are pxk+1 = reference xri are points on the road centerline. If we set the
p ((k + 1) Ts ) and pyk+1 = py ((k + 1) Ts ) when θ̇k = 0.
x
cost as (23), the planned trajectory will be similar to xi
For θ̇k = 0 we have: in the figure. Note xi , xi+1 and xi+2 significantly deviate
from the centerline because they are ”pulled” towards the
pxk+1 = pxk + cos θk vk Ts + v̇2k Ts2 corresponding reference points xri , xri+1 and xri+2 .
(17)
pyk+1 = pyk + sin θk vk Ts + v̇2k Ts2
and:
vk+1 = vk + v̇k Ts
(18)
θk+1 = θk + θ̇k Ts
(17) and (18) can be packaged to the discrete system
dynamic model:
T
xk+1 = pxk+1 pyk+1 vk+1 θk+1 = f (xk , uk )
(19)
C. Objective Function
The objective function can be decoupled into four terms: Fig. 2: Problematic Scenario on Curve Road
4
2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC)
The above constraints are linear and can follow from step
Fig. 3: The Distance Function To The Polynomial Curve 3 in the CILQR algorithm.
4) Obstacle Avoidance Constraint: The obstacle avoid-
There are two main advantages to use this method. First, ance constraint considers the moving and static obstacles on
the polynomial is continuously differentiable and its tangent the road. For example, how to overtake the front vehicle
line is easy to compute. Second, in practice the commercial safely, and how to avoid static obstacles such as parking
sensors often fit the lanes as polynomials, e.g, mobileye fits vehicles on the roadside. It is difficult to deal with because
the lanes as 3rd order polynomials. the constraints are non-convex, as the feasible state space is
4) Velocity Difference: cvel is the cost that helps us the space outside the convex obstacle (such as a rectangle
k
maintain a relatively high speed toachieve representing a car). Here we ignore the vertical axis and only
efficiency.
As
vk − vdes 2 , where consider the planar problem.
k = wvel
shown in Fig. 3, we set cvel
vk is the velocity along the direction of the tangent lk , and In this paper, we model all the obstacles as ellipse,
vdes is the desired speed which is a scalar. as shown in Fig. 5. The long-axis and short-axis of the
ellipse can be set according to the obstacle’s shape (e.g.
D. Constraints the vehicle’s length and width) and it’s velocity. For exam-
Besides some objectives to optimize, there are also some ple, for a car, the long-axis of the ellipse may be set as
constraints to obey when driving. Here four kinds of con- a = l + vtsaf e + ssaf e , and the short-axis be b = w + ssaf e ,
straints are considered: the acceleration constraint, yaw where l is the vehicle length, w is the vehicle width, v is
rate constraint, boundary constraint and obstacle avoidance the vehicle velocity, tsaf e is the safe headway and ssaf e is
constraint. a safety margin. Then combining with the yaw angle of the
1) Acceleration Constraint: The acceleration needs to be vehicle, we can write the ellipse analytically:
bounded according to the limit of engine force and braking
T
force. This term is: (x − xO ) P (x − xO ) = 1 (26)
5
2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC)
6
2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC)
V. C ONCLUSION
B. Case 2: Lane Change and Car Following In this paper, the constrained ILQR algorithm is proposed
to handle the constraints for ILQR. The constraints are
linearized, shaped by barrier function, and then quadratized
Lane changing is one of the most common maneuver
to be placed into the cost function of ILQR. A general on
in both urban driving and highway driving. Usually there
road driving motion planning problem is then solved by
will be a front vehicle in the target lane and the vehicle
CILQR algorithm, where we considered the distance to the
needs to do adaptive cruise control (ACC) after finishing
polynomial reference line and the ellipse representation of
the lane change. One dangerous situation when executing
obstacles. Case studies show our algorithm works properly
lane changing is when there exists a vehicle besides the
in several on road driving scenarios.
host vehicle in the target lane. We created this dangerous
scenario. As shown in Fig. 9, O10 is the first obstacle at R EFERENCES
t = 0, O1T is the first obstacle at t = T , and O20 is the second [1] Gonzlez, David, et al. A review of motion planning techniques for
obstacle at t = 0, O2T is the second obstacle at t = T . automated vehicles. IEEE Transactions on Intelligent Transportation
Systems 17.4 (2016): 1135-1145.
In the planned trajectory of the host vehicle, the vehicle [2] Paden, Brian, et al. A Survey of Motion Planning and Control
first accelerates and passes around the first obstacle to Techniques for Self-driving Urban Vehicles. IEEE Transactions on
change the lane. After finishing lane changing, it decelerates Intelligent Vehicles 1.1 (2016): 33-55.
[3] Dolgov, Dmitri, et al. Practical search techniques in path planning for
to do ACC to its front vehicle. The runtime of this case is autonomous driving. Ann Arbor 1001 (2008): 48105.
0.19s. [4] Ferguson, Dave, and Anthony Stentz. Field D*: An interpolation-
based path planner and replanner. Robotics research. Springer Berlin
Heidelberg, 2007. 239-253.
[5] Kuffner, James J., and Steven M. LaValle. RRT-connect: An efficient
approach to single-query path planning. Robotics and Automation,
2000. Proceedings. ICRA’00. IEEE International Conference on. Vol.
2. IEEE, 2000.
[6] Gu, Tianyu, et al. Tunable and stable real-time trajectory planning for
urban autonomous driving. Intelligent Robots and Systems (IROS),
2015 IEEE/RSJ International Conference on. IEEE, 2015.
[7] Gu, Tianyu, John M. Dolan, and Jin-Woo Lee. Runtime-bounded
tunable motion planning for autonomous driving. Intelligent Vehicles
Symposium (IV), 2016 IEEE. IEEE, 2016.
[8] Ziegler, Julius, et al. Trajectory planning for Bertha - A local, continu-
ous method. Intelligent Vehicles Symposium Proceedings, 2014 IEEE.
Fig. 9: The Lane Change Scenario [9] Li, Weiwei, and Emanuel Todorov. Iterative linear quadratic regulator
design for nonlinear biological movement systems. ICINCO (1). 2004.
[10] Todorov, Emanuel, and Weiwei Li. A generalized iterative LQG
method for locally-optimal feedback control of constrained nonlinear
C. Case 3: A Mixed Scenario stochastic systems. Proceedings of the 2005, American Control Con-
ference, 2005.. IEEE, 2005.
[11] Tassa, Yuval, Nicolas Mansard, and Emo Todorov. Control-limited
differential dynamic programming. 2014 IEEE International Conference
In this scenario, the goal of the host vehicle is to overtake on Robotics and Automation (ICRA). IEEE, 2014.
the slow front vehicle O1 while avoiding the coming vehicle [12] van den Berg, Jur. Extended LQR: locally-optimal feedback control
O2 , and then stop at position x = 50m. As usual, the sub- for systems with non-linear dynamics and non-quadratic cost. Robotics
Research. Springer International Publishing, 2016. 39-56.
script represents the index of the obstacle and the superscript [13] van den Berg, Jur. Iterated LQR smoothing for locally-optimal feed-
represents the time. Fig. 10 shows the planned trajectory of back control of systems with non-linear dynamics and non-quadratic
the host vehicle. From the trajectory, we can see that the host cost. 2014 American Control Conference. IEEE, 2014.
[14] Gonzlez, David, et al. Speed profile generation based on quintic
vehicle first accelerates to overtake the front vehicle before Bzier curves for enhanced passenger comfort. Intelligent Transportation
encountering the coming vehicle, and then decelerates to Systems (ITSC), 2016 IEEE 19th International Conference on. IEEE,
stop at the stop line. The runtime of this case is 0.2s. 2016.