
Interpretable Goal-based Prediction and Planning for Autonomous Driving

Stefano V. Albrecht∗†, Cillian Brewitt∗†, John Wilhelm∗†, Balint Gyevnar∗†, Francisco Eiras∗‡, Mihai Dobre∗, Subramanian Ramamoorthy∗†
∗Five AI Ltd., UK, {firstname.lastname}@five.ai
†School of Informatics, University of Edinburgh, UK
‡Department of Engineering Science, University of Oxford, UK

arXiv:2002.02277v3 [cs.RO] 15 Mar 2021

Abstract— We propose an integrated prediction and planning system for autonomous driving which uses rational inverse planning to recognise the goals of other vehicles. Goal recognition informs a Monte Carlo Tree Search (MCTS) algorithm to plan optimal maneuvers for the ego vehicle. Inverse planning and MCTS utilise a shared set of defined maneuvers and macro actions to construct plans which are explainable by means of rationality principles. Evaluation in simulations of urban driving scenarios demonstrates the system's ability to robustly recognise the goals of other vehicles, enabling our vehicle to exploit non-trivial opportunities to significantly reduce driving times. In each scenario, we extract intuitive explanations for the predictions which justify the system's decisions.

I. INTRODUCTION

The ability to predict the intentions and driving trajectories of other vehicles is a key problem for autonomous driving [1]. This problem is significantly complicated by the need to make fast and accurate predictions based on limited observation data which originate from coupled multi-agent interactions. To make prediction tractable in such conditions, a standard approach in autonomous driving research is to assume that vehicles use one of a finite number of distinct high-level maneuvers, such as lane-follow, lane-change, turn, stop, etc. [2], [3], [4], [5], [6], [7]. A classifier of some type is used to detect a vehicle's currently executed maneuver based on its observed driving trajectory. The limitation of such methods is that they only detect the current maneuver of other vehicles, hence planners using such predictions are effectively limited to the timescales of the detected maneuvers. An alternative approach is to specify a finite set of possible goals for each other vehicle (such as road exit points) and to plan a full trajectory to each goal from the vehicle's observed local state [8], [9], [10]. While this approach can generate longer-term predictions, a limitation is that the generated trajectories must be matched relatively closely by a vehicle in order to yield high-confidence predictions of the vehicle's goals.

Recent methods based on deep learning have shown promising results for trajectory prediction in autonomous driving [11], [12], [13], [14], [15], [16], [17]. Prediction models are trained on large datasets that are becoming available through data gathering campaigns involving sensorised vehicles (e.g. video, lidar, radar). Reliable prediction over several-second horizons remains a hard problem, in part due to the difficulties in capturing the coupled evolution of traffic. In our view, one of the most significant limitations of this class of methods (though see recent progress [18]) is the difficulty in extracting interpretable predictions in a form that is amenable to efficient integration with planning methods that effectively represent multi-dimensional and hierarchical task objectives.

Our starting point is that in order to predict the future maneuvers of a vehicle, we must reason about why – that is, to what end – the vehicle performed its past maneuvers, which will yield clues as to its intended goal [19]. Knowing the goals of other vehicles enables prediction of their future maneuvers and trajectories, which facilitates planning over extended timescales. We show in our work (illustrated in Figure 2) how such reasoning can help to address the problem of overly-conservative autonomous driving [20]. Further, to the extent that our predictions are structured around the interpretation of observed trajectories in terms of high-level maneuvers, the goal recognition process lends itself to intuitive interpretation for the purposes of system analysis and debugging, at a level of detail suggested in Figure 2. As we develop towards making our autonomous systems more trustworthy [21], these notions of interpretation and the ability to justify (explain) the system's decisions are key [22].

To this end, we propose Interpretable Goal-based Prediction and Planning (IGP2) which leverages the computational advantages of using a finite space of maneuvers, but extends the approach to planning and prediction of sequences (i.e., plans) of maneuvers. We achieve this via a novel integration of rational inverse planning [23], [24] to recognise the goals of other vehicles, with Monte Carlo Tree Search (MCTS) [25] to plan optimal maneuvers for the ego vehicle. Inverse planning and MCTS utilise a shared set of defined maneuvers to construct plans which are explainable by means of rationality principles, i.e. plans are optimal with respect to given metrics. We evaluate IGP2 in simulations of diverse urban driving scenarios, showing that (1) the system robustly recognises the goals of other vehicles, even if significant parts of a vehicle's trajectory are occluded, (2) goal recognition enables our vehicle to exploit opportunities to improve driving efficiency as measured by driving time compared to other prediction baselines, and (3) we are able to extract intuitive explanations for the predictions to justify the system's decisions.

In summary, our contributions are:
• A method for goal recognition and multi-modal trajectory prediction via rational inverse planning.
• Integration of goal recognition with MCTS planning to generate optimised plans for the ego vehicle.
• Evaluation in simulated urban driving scenarios showing accurate goal recognition, improved driving efficiency, and ability to interpret the predictions and ego plans.

S.A. is supported by a Royal Society Industry Fellowship. C.B., J.W., B.G. were interns at Five AI with partial financial support from the Royal Society and UKRI. IGP2 code: https://github.com/uoe-agents/IGP2

II. PRELIMINARIES AND PROBLEM DEFINITION

Let I be the set of vehicles in the local neighbourhood of the ego vehicle (including itself). At time t, each vehicle i ∈ I is in a local state s^i_t ∈ S^i, receives a local observation o^i_t ∈ O^i, and can choose an action a^i_t ∈ A^i. We write s_t ∈ S = ×_i S^i for the joint state and s_{a:b} for the tuple (s_a, ..., s_b), and similarly for o_t ∈ O, a_t ∈ A. Observations depend on the joint state via p(o^i_t | s_t), and actions depend on the observations via p(a^i_t | o^i_{1:t}). In our system, a local state contains a vehicle's pose, velocity, and acceleration (we use the terms velocity and speed interchangeably); an observation contains the poses and velocities of nearby vehicles; and an action controls the vehicle's steering and acceleration. The probability of a sequence of joint states s_{1:n} is given by

    p(s_{1:n}) = \prod_{t=1}^{n-1} \int_{O} \int_{A} p(o_t \mid s_t) \, p(a_t \mid o_{1:t}) \, p(s_{t+1} \mid s_t, a_t) \; do_t \, da_t    (1)

where p(s_{t+1} | s_t, a_t) defines the joint vehicle dynamics, and we assume independent local observations and actions, p(o_t | s_t) = \prod_i p(o^i_t | s_t) and p(a_t | o_{1:t}) = \prod_i p(a^i_t | o^i_{1:t}). Vehicles react to other vehicles via their observations o^i_{1:n}.
We define the planning problem as finding an optimal policy π* which selects the actions for the ego vehicle, ε, to achieve a specified goal, G^ε, while optimising the driving trajectory via a defined reward function. Here, a policy is a function π : (O^ε)* ↦ A^ε which maps an observation sequence o^ε_{1:n} to an action a^ε_t. (State filtering [26] is outside the scope of this work.) A goal can be any subset of local states, G^ε ⊂ S^ε. In this paper, we focus on goals that specify target locations and "stopping goals" which specify a target location and zero velocity. Formally, define

    \Omega_n = \{ s_{1:n} \mid s^{\varepsilon}_n \in G^{\varepsilon} \,\wedge\, \forall m < n : s^{\varepsilon}_m \notin G^{\varepsilon} \}    (2)

where s^ε_n ∈ G^ε means that s^ε_n satisfies G^ε. The second condition in (2) ensures that \sum_{n=1}^{\infty} \int_{\Omega_n} p(s_{1:n}) \, ds_{1:n} \leq 1 for any policy π, which is needed for soundness of the sum in (3). The problem is to find π* such that

    \pi^* \in \arg\max_{\pi} \sum_{n=1}^{\infty} \int_{\Omega_n} p(s_{1:n}) \, R^{\varepsilon}(s_{1:n}) \; ds_{1:n}    (3)

where R^i(s_{1:n}) is vehicle i's reward for s_{1:n}. We define R^i as a weighted sum of reward elements based on trajectory execution time, longitudinal and lateral jerk, path curvature, and safety distance to the leading vehicle.
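The weights and exact feature definitions of this reward are not specified in this section. The following is a minimal Python sketch of such a weighted-sum trajectory reward; the feature extractors, weights, and defaults are illustrative assumptions only (lateral jerk is omitted for brevity), not the values used by the system.

import numpy as np

def trajectory_reward(xs, ys, speeds, dt,
                      w_time=1.0, w_jerk=0.1, w_curv=0.1, w_gap=0.5,
                      gaps_to_leader=None, min_safe_gap=5.0):
    """Scalar reward for a trajectory; larger is better (all terms are penalties)."""
    xs, ys, speeds = map(np.asarray, (xs, ys, speeds))
    exec_time = len(xs) * dt                                  # trajectory execution time

    accel = np.gradient(speeds, dt)                           # longitudinal acceleration
    jerk_cost = np.mean(np.abs(np.gradient(accel, dt)))       # longitudinal jerk

    dx, dy = np.gradient(xs), np.gradient(ys)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    denom = (dx**2 + dy**2) ** 1.5 + 1e-9
    curv_cost = np.mean(np.abs(dx * ddy - dy * ddx) / denom)  # path curvature

    gap_cost = 0.0
    if gaps_to_leader is not None:                            # safety distance to leading vehicle
        gap_cost = np.mean(np.maximum(0.0, min_safe_gap - np.asarray(gaps_to_leader)))

    return -(w_time * exec_time + w_jerk * jerk_cost
             + w_curv * curv_cost + w_gap * gap_cost)

# Example with a made-up straight, constant-speed trajectory:
t = np.arange(0.0, 5.0, 0.1)
print(trajectory_reward(xs=10.0 * t, ys=0.0 * t, speeds=np.full_like(t, 10.0), dt=0.1))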
III. IGP2: INTERPRETABLE GOAL-BASED PREDICTION AND PLANNING

Our general approach relies on two assumptions: (1) each vehicle seeks to reach some (unknown) goal from a set of possible goals, and (2) each vehicle follows a plan generated from a finite library of defined maneuvers.

Fig. 1: IGP2 system overview.

Figure 1 provides an overview of the components in our proposed IGP2 system. At a high level, IGP2 approximates the optimal ego policy π* as follows: For each non-ego vehicle, generate its possible goals and inversely plan for that vehicle to each goal. The resulting goal probabilities and predicted trajectories for each non-ego vehicle inform the simulation process of a Monte Carlo Tree Search (MCTS) algorithm, to generate an optimal maneuver plan for the ego vehicle toward its current goal. In order to keep the required search depth in inverse planning and MCTS shallow (and thus efficient), both plan over a shared set of macro actions which flexibly concatenate maneuvers using context information. We detail these components in the following sections.

A. Maneuvers

We assume that at any time, each vehicle is executing one of the following maneuvers: lane-follow, lane-change-left/right, turn-left/right, give-way, stop. Each maneuver ω specifies applicability and termination conditions. For example, lane-change-left is only applicable if there is a lane in the same driving direction to the left of the vehicle, and terminates once the vehicle has reached the new lane and its orientation is aligned with the lane. Some maneuvers have free parameters, e.g. lane-follow has a parameter to specify when to terminate. If applicable, a maneuver specifies a local trajectory ŝ^i_{1:n} to be followed by the vehicle, which includes a reference path in the global coordinate frame and target velocities along the path. For convenience in exposition, we assume that ŝ^i uses the same representation and indexing as s^i, but in general this does not have to be the case (for example, ŝ may be indexed by longitudinal position rather than time, which can be interpolated to time indices). In our system, the reference path is generated via a Bezier spline function fitted to a set of points extracted from the road topology, and target velocities are set using domain heuristics similar to [27].
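As an illustration of the maneuver interface just described (applicability check, termination check, free parameters), here is a minimal Python sketch. The class and field names are hypothetical and heavily simplified relative to the released IGP2 code; the road-layout context is reduced to a couple of lane identifiers.

from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class VehicleState:
    x: float
    y: float
    heading: float
    speed: float
    lane_id: int
    left_lane_id: int | None  # id of a same-direction lane to the left, if any

class Maneuver(ABC):
    """Sketch of a maneuver: applicability and termination conditions.
    (Trajectory generation along a reference path is omitted here.)"""

    @abstractmethod
    def applicable(self, state: VehicleState) -> bool: ...

    @abstractmethod
    def terminated(self, state: VehicleState) -> bool: ...

class LaneChangeLeft(Maneuver):
    def __init__(self, state: VehicleState):
        self.target_lane = state.left_lane_id

    def applicable(self, state: VehicleState) -> bool:
        # Only applicable if a same-direction lane exists to the left.
        return state.left_lane_id is not None

    def terminated(self, state: VehicleState) -> bool:
        # Terminates once the vehicle occupies the target lane
        # (the orientation-alignment check is omitted in this sketch).
        return state.lane_id == self.target_lane

s = VehicleState(x=0.0, y=0.0, heading=0.0, speed=8.0, lane_id=2, left_lane_id=1)
m = LaneChangeLeft(s)
print(m.applicable(s), m.terminated(s))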
B. Macro Actions

Macro actions specify common sequences of maneuvers and automatically set the free parameters (if any) in maneuvers based on context information such as road layout. Table I specifies the macro actions used in our system. The applicability condition of a macro action is given by the applicability condition of the first maneuver in the macro action as well as optional additional conditions. The termination condition of a macro action is given by the termination condition of the last maneuver in the macro action.
Macro action        | Additional applicability condition                                | Maneuver sequence (maneuver parameters in brackets)
Continue            | —                                                                 | lane-follow (end of visible lane)
Continue next exit  | Must be in roundabout and not in outer-lane                       | lane-follow (next exit point)
Change left/right   | There is a lane to the left/right                                 | lane-follow (until target lane clear), lane-change-left/right
Exit left/right     | Exit point on same lane ahead of car and in correct direction     | lane-follow (exit point), give-way (relevant lanes), turn-left/right
Stop                | There is a stopping goal ahead of the car on the current lane     | lane-follow (close to stopping point), stop
TABLE I: Macro actions used in our system. Each macro action concatenates one or more maneuvers and automatically sets their parameters.
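To make the composition in Table I concrete, below is a hedged sketch of how a macro action might chain maneuvers and fill in their free parameters from road-layout context. The function name, the context keys, and the example values are assumptions made for illustration only and are not taken from the paper or its codebase.

def exit_right_macro(context):
    """Build the 'Exit right' macro action of Table I as a parameterised maneuver sequence."""
    # Additional applicability condition: an exit point on the same lane,
    # ahead of the car, and in the correct direction.
    exit_point = context.get("exit_point_ahead_same_lane")
    if exit_point is None:
        return None  # macro action not applicable here

    return [
        ("lane-follow", {"terminate_at": exit_point}),
        ("give-way",    {"monitored_lanes": context["oncoming_lanes"]}),
        ("turn-right",  {"junction": context["junction_id"]}),
    ]

# Example usage with hypothetical context values:
macro = exit_right_macro({
    "exit_point_ahead_same_lane": (41.5, 12.0),
    "oncoming_lanes": [3, 4],
    "junction_id": 7,
})
print(macro)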

C. Velocity Smoothing

To obtain a feasible trajectory across maneuvers for vehicle i, we define a velocity smoothing operation which optimises the target velocities in a given trajectory ŝ^i_{1:n}. Let x̂_t be the longitudinal position on the reference path at ŝ^i_t and v̂_t its target velocity, for 1 ≤ t ≤ n. We define κ : x → v as the piecewise linear interpolation of target velocities between points x̂_t. Given the time elapsed between two time steps, ∆t; the maximum velocity and acceleration, v_max / a_max; and setting x_1 = x̂_1, v_1 = v̂_1, we define velocity smoothing as

    \min_{x_{2:n},\, v_{2:n}} \;\; \sum_{t=1}^{n} \| v_t - \kappa(x_t) \|^2 + \lambda \sum_{t=1}^{n-1} \| v_{t+1} - v_t \|^2
    \text{s.t.} \quad x_{t+1} = x_t + v_t \Delta t, \qquad 0 < v_t < v_{\max}, \quad v_t \leq \kappa(x_t), \qquad |v_{t+1} - v_t| < a_{\max} \Delta t    (4)

where λ > 0 is the weight given to the acceleration part of the optimisation objective. Eq. (4) is a nonlinear non-convex optimisation problem which can be solved, e.g., using a primal-dual interior point method (we use IPOPT [28]). From the solution of the problem, (x_{2:n}, v_{2:n}), we interpolate to obtain the achievable velocities at the original points x̂_t.
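The sketch below sets up a small version of Eq. (4) in Python. It is only an illustration of the problem structure: SciPy's SLSQP is used as a stand-in for the IPOPT solver mentioned above, the position variables x are eliminated via x_{t+1} = x_t + v_t ∆t, and the numerical values are made up.

import numpy as np
from scipy.optimize import minimize

def smooth_velocities(x_hat, v_hat, dt=0.1, v_max=15.0, a_max=3.0, lam=10.0):
    n = len(x_hat)
    kappa = lambda x: np.interp(x, x_hat, v_hat)   # piecewise-linear target velocities

    def positions(v):
        # x_1 = x_hat[0]; x_{t+1} = x_t + v_t * dt
        return x_hat[0] + np.concatenate(([0.0], np.cumsum(v[:-1] * dt)))

    def full(v_rest):
        return np.concatenate(([v_hat[0]], v_rest))  # v_1 is fixed to v_hat[0]

    def objective(v_rest):
        v = full(v_rest)
        x = positions(v)
        return np.sum((v - kappa(x)) ** 2) + lam * np.sum(np.diff(v) ** 2)

    def below_target(v_rest):                       # v_t <= kappa(x_t)
        v = full(v_rest)
        return kappa(positions(v)) - v

    def accel_upper(v_rest):                        # v_{t+1} - v_t <= a_max * dt
        return a_max * dt - np.diff(full(v_rest))

    def accel_lower(v_rest):                        # v_t - v_{t+1} <= a_max * dt
        return a_max * dt + np.diff(full(v_rest))

    res = minimize(objective, x0=np.asarray(v_hat[1:]), method="SLSQP",
                   bounds=[(1e-3, v_max)] * (n - 1),
                   constraints=[{"type": "ineq", "fun": below_target},
                                {"type": "ineq", "fun": accel_upper},
                                {"type": "ineq", "fun": accel_lower}])
    v = full(res.x)
    # Interpolate the achievable velocities back onto the original points x_hat
    # (values beyond the smoothed horizon are clamped by np.interp).
    return np.interp(x_hat, positions(v), v)

# Example: smooth a target profile with an abrupt drop in target velocity.
x_hat = np.linspace(0.0, 50.0, 26)
v_hat = np.where(x_hat < 30.0, 10.0, 3.0)
print(smooth_velocities(x_hat, v_hat))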
D. Goal Recognition

We assume that each non-ego vehicle i seeks to reach one of a finite number of possible goals G^i ∈ 𝒢^i, using plans constructed from our defined macro actions. We use the framework of rational inverse planning [23], [24] to compute a Bayesian posterior distribution over i's goals at time t

    p(G^i \mid s_{1:t}) \propto L(s_{1:t} \mid G^i) \, p(G^i)    (5)

where L(s_{1:t} | G^i) is the likelihood of i's observed trajectory assuming its goal is G^i, and p(G^i) specifies the prior probability of G^i.

The likelihood is a function of the reward difference between two plans: the reward r̂ of the optimal trajectory from i's initial observed state s^i_1 to goal G^i after velocity smoothing, and the reward r̄ of the trajectory which follows the observed trajectory until time t and then continues optimally to goal G^i, with smoothing applied only to the trajectory after t. The likelihood is defined as

    L(s_{1:t} \mid G^i) = \exp\big(\beta \, (\bar{r} - \hat{r})\big)    (6)

where β is a scaling parameter (we use β = 1). This likelihood definition assumes that vehicles drive approximately rationally (i.e., optimally) to achieve their goals while allowing for some deviation. If a goal is infeasible, we set its probability to zero. Algorithm 1 shows the pseudocode for our goal recognition algorithm, with further details in the subsections below.

1) Goal Generation: A heuristic function is used to generate a set of possible goals 𝒢^i for vehicle i based on its location and context information such as road layout. In our system, we include goals for the visible end of the current road and connecting roads (bounded by the ego vehicle's view region). In addition to such static goals, it is also possible to add dynamic goals which depend on current traffic. For example, in the dense merging scenario shown in Figure 2d, stopping goals are dynamically added to model a vehicle's intention to allow the ego vehicle to merge in front.

2) Maneuver Detection: Maneuver detection is used to detect the currently executed maneuver of a vehicle (at time t), allowing inverse planning to complete the maneuver before planning onward. We assume a module which computes probabilities over current maneuvers, p(ω^i), for each vehicle i. One option is Bayesian changepoint detection (e.g. [29]). The details of maneuver detection are outside the scope of our paper and in our experiments we use a simulated detector (cf. Sec. IV-B). As different current maneuvers may hint at different goals, we perform inverse planning for each possible current maneuver for which p(ω^i) > 0. Thus, each current maneuver produces its associated posterior probabilities over goals, denoted by p(G^i | s_{1:t}, ω^i).

3) Inverse Planning: Inverse planning is done using A* search [30] over macro actions. A* starts after completing the current maneuver ω^i, which produces the initial trajectory ŝ_{1:τ}. Each search node q corresponds to a state s ∈ S, with the initial node at state ŝ_τ, and macro actions are filtered by their applicability conditions applied to s. A* chooses the next macro action leading to a node q′ which has lowest estimated total cost¹ to goal G^i, given by f(q′) = l(q′) + h(q′). The cost l(q′) to reach node q′ is given by the driving time from i's location in the initial search node to its location in q′, following the trajectories returned by the macro actions leading to q′. A* uses the assumption that all other vehicles not planned for use a constant-velocity lane-following model after their observed trajectories. We do not check for collisions during inverse planning. The cost heuristic h(q′) to estimate the remaining cost from q′ to goal G^i is given by the driving time from i's location in q′ to the goal via a straight line at speed limit. This definition of h(q′) is admissible as per A* theory, which ensures that the search returns an optimal plan. After the optimal plan is found, we extract the complete trajectory ŝ^i_{1:n} from the maneuvers in the plan and the initial segment ŝ_{1:τ}.

¹Here we use the term "cost" in keeping with standard A* terminology and to differentiate from the reward function defined in Sec. II.
Algorithm 1 Goal recognition algorithm
Input: vehicle i, current maneuver ω^i, observations s_{1:t}
Returns: goal probabilities p(G^i | s_{1:t}, ω^i)
 1: Generate possible goals G^i ∈ 𝒢^i from state s^i_t
 2: Set prior probabilities p(G^i) (e.g. uniform)
 3: for all G^i ∈ 𝒢^i do
 4:   ŝ^i_{1:n} ← A*(ω^i) from ŝ^i_1 = s^i_1 to G^i
 5:   Apply velocity smoothing to ŝ^i_{1:n}
 6:   r̂ ← reward R^i(ŝ^i_{1:n})
 7:   s̄^i_{1:m} ← A*(ω^i) from s̄^i_t to G^i, with s̄^i_{1:t} = s^i_{1:t}
 8:   Apply velocity smoothing to s̄^i_{t+1:m}
 9:   r̄ ← reward R^i(s̄^i_{1:m})
10:   L(s_{1:t} | G^i, ω^i) ← exp(β(r̄ − r̂))
11: Return p(G^i | s_{1:t}, ω^i) ∝ L(s_{1:t} | G^i, ω^i) p(G^i)
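The following is a compact Python sketch of Algorithm 1 and Eqs. (5)–(6). The two callables plan_optimal_trajectory and reward are placeholders standing in for the A* planner of Sec. III-D.3 and the reward of Sec. II; their exact signatures are assumptions made for this sketch.

import math

def goal_posteriors(goals, observed_traj, plan_optimal_trajectory, reward,
                    priors=None, beta=1.0):
    """Sketch of Algorithm 1: Bayesian goal posterior from reward differences.

    plan_optimal_trajectory(start_traj, goal) is assumed to return a smoothed
    continuation trajectory from the end of start_traj to the goal (or None if
    the goal is infeasible); reward(traj) returns the scalar reward of a trajectory.
    """
    if priors is None:
        priors = {g: 1.0 / len(goals) for g in goals}

    likelihoods = {}
    for g in goals:
        # r_hat: reward of the optimal plan from the initially observed state s^i_1.
        opt = plan_optimal_trajectory(observed_traj[:1], g)
        # r_bar: reward of the observed trajectory continued optimally to the goal.
        cont = plan_optimal_trajectory(observed_traj, g)
        if opt is None or cont is None:
            likelihoods[g] = 0.0          # infeasible goal gets probability zero
            continue
        r_hat = reward(observed_traj[:1] + opt)
        r_bar = reward(observed_traj + cont)
        likelihoods[g] = math.exp(beta * (r_bar - r_hat))   # Eq. (6)

    # Eq. (5): posterior proportional to likelihood times prior.
    unnorm = {g: likelihoods[g] * priors[g] for g in goals}
    z = sum(unnorm.values()) or 1.0
    return {g: p / z for g, p in unnorm.items()}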
4) Trajectory Prediction: Our system predicts multiple plausible trajectories for a given vehicle and goal. This is required because there are situations in which different trajectories may be (near-)optimal but may lead to different predictions which could require different behaviour on the part of the ego vehicle. We run A* search for a fixed amount of time and let it compute a set of plans with associated rewards (up to some fixed number of plans). Any time A* search finds a node that reaches the goal, the corresponding plan is added to the set of plans. Given a set of smoothed trajectories {ŝ^{i,k}_{1:n} | ω^i, G^i}_{k=1..K} to goal G^i with initial maneuver ω^i and associated rewards r_k = R^i(ŝ^{i,k}_{1:n}), we compute a distribution over the trajectories via a Boltzmann distribution

    p(\hat{s}^{\,i,k}_{1:n}) \propto \exp(\gamma \, r_k)    (7)

where γ is a scaling parameter (we use γ = 1). Similar to Eq. (6), Eq. (7) encodes the assumption that trajectories which are closer to optimal are more likely.
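As a small illustration of Eq. (7), the sketch below turns a list of trajectory rewards into a normalised Boltzmann distribution and samples one trajectory index from it (as done later inside MCTS). The reward values are made up.

import math
import random

def boltzmann_distribution(rewards, gamma=1.0):
    """p_k ∝ exp(gamma * r_k), computed with a max-shift for numerical stability."""
    m = max(rewards)
    weights = [math.exp(gamma * (r - m)) for r in rewards]
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical rewards of three near-optimal predicted trajectories.
rewards = [-12.4, -12.9, -15.0]
probs = boltzmann_distribution(rewards)
sampled_index = random.choices(range(len(probs)), weights=probs, k=1)[0]
print(probs, sampled_index)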
E. Ego Vehicle Planning

To compute an optimal plan for the ego vehicle, we use the goal probabilities and predicted trajectories to inform a Monte Carlo Tree Search (MCTS) algorithm [25] (see Algorithm 2). The algorithm performs a number of closed-loop simulations ŝ_{t:n}, starting in the current state ŝ_t = s_t, down to some fixed search depth or until a goal state is reached. At the start of each simulation, for each non-ego vehicle, we first sample a current maneuver, then goal, and then trajectory for the vehicle using the associated probabilities (cf. Section III-D). Each node q in the search tree corresponds to a state s ∈ S and macro actions are filtered by their applicability conditions applied to s. After selecting a macro action µ using some exploration technique (we use UCB1 [31]), the state in the current search node is forward-simulated based on the trajectory generated by the macro action µ and the sampled trajectories of non-ego vehicles, resulting in a partial trajectory ŝ_{τ:ι} and a new search node q′ with state ŝ_ι. Forward-simulation of trajectories uses a combination of proportional control and adaptive cruise control (based on IDM [32]) to control a vehicle's acceleration and steering. Termination conditions of maneuvers are monitored in each time step based on the vehicle's observations. Collision checking is performed on ŝ_{τ:ι} to check whether the ego vehicle collided, in which case we set the reward to r ← r_coll which is back-propagated using (8), where r_coll is a method parameter. Otherwise, if the new state ŝ_ι achieves the ego goal G^ε, we compute the reward for back-propagation as r ← R^ε(ŝ_{t:n}). If the search reached its maximum depth d_max without colliding or achieving the goal, we set r ← r_term which can be a constant or based on heuristic reward estimates similar to A* search.

The reward r is back-propagated through search branches (q, µ, q′) that generated the simulation, using a 1-step off-policy update function (similar to Q-learning [33])

    Q(q, \mu) \leftarrow Q(q, \mu) + \begin{cases} \delta^{-1} [\, r - Q(q, \mu) \,] & \text{if } q \text{ is a leaf node} \\ \delta^{-1} [\, \max_{\mu'} Q(q', \mu') - Q(q, \mu) \,] & \text{else} \end{cases}    (8)

where δ is the number of times that macro action µ has been selected in q. After the simulations are completed, the algorithm selects the best macro action for execution in s_t from the root node, arg max_µ Q(root, µ).

Algorithm 2 Monte Carlo Tree Search algorithm
Returns: optimal maneuver for ego vehicle ε in state s_t
Perform K simulations:
 1: Search node q.s ← s_t (root node)
 2: Search depth d ← 0
 3: for all i ∈ I \ {ε} do
 4:   Sample current maneuver ω^i ∼ p(ω^i)
 5:   Sample goal G^i ∼ p(G^i | s_{1:t}, ω^i)
 6:   Sample trajectory ŝ^i_{1:n} ∈ {ŝ^{i,k}_{1:n} | ω^i, G^i} with p(ŝ^{i,k}_{1:n})
 7: while d < d_max do
 8:   Select macro action µ for ε applicable in q.s
 9:   ŝ_{τ:ι} ← Simulate µ until it terminates, with non-ego vehicles following their sampled trajectories ŝ^i_{1:n}
10:   r ← ∅
11:   if ego vehicle collides during ŝ_{τ:ι} then
12:     r ← r_coll
13:   else if ŝ^ε_ι achieves ego goal G^ε then
14:     r ← R^ε(ŝ_{t:n})
15:   else if d = d_max − 1 then
16:     r ← r_term
17:   if r ≠ ∅ then
18:     Use (8) to backprop r along search branches (q, µ, q′) that generated the simulation
19:     Start next simulation
20:   q′.s ← ŝ_ι; q ← q′; d ← d + 1
Return maneuver for ε in s_t: µ ∈ arg max_µ Q(root, µ)
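Below is a minimal Python sketch of the backup rule in Eq. (8) and the final action selection, using a plain dict-of-dicts Q table. Node and macro-action identifiers are placeholders; the selection policy (UCB1) and the forward simulation are not shown.

from collections import defaultdict

# Q[node][macro_action] -> value estimate; N[node][macro_action] -> selection count (delta).
Q = defaultdict(lambda: defaultdict(float))
N = defaultdict(lambda: defaultdict(int))

def backprop(trace, r):
    """Apply Eq. (8) along the branches (q, mu, q_next) that generated one simulation.

    trace is the list of (node, macro_action, next_node) visited, root first;
    r is the simulation reward (r_coll, r_term, or R^eps of the ego trajectory).
    """
    for depth, (q, mu, q_next) in enumerate(reversed(trace)):
        N[q][mu] += 1
        delta = N[q][mu]
        if depth == 0:   # deepest branch: update directly towards the reward r
            target = r
        else:            # otherwise bootstrap from the best value at the child node
            target = max(Q[q_next].values())
        Q[q][mu] += (target - Q[q][mu]) / delta

def best_macro_action(root):
    """After K simulations, execute arg max_mu Q(root, mu)."""
    return max(Q[root], key=Q[root].get)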
(a) Scenario 1 (S1)  (b) Scenario 2 (S2)  (c) Scenario 3 (S3)  (d) Scenario 4 (S4)
Fig. 2: IGP2 in 4 test scenarios. Ego vehicle shown in blue. Bar plots show goal probabilities for non-ego vehicles. For each goal, up to two of the most probable predicted trajectories to goal are shown with thickness proportional to probability. (a) S1: Ego's goal is the blue goal. Vehicle V1 is on the ego's road; V1 changes from the left to the right lane, biasing the ego prediction towards the belief that V1 will exit, since a lane change would be irrational if V1's goal was to go east. As exiting will require a significant slowdown, the ego decides to switch lanes to avoid being slowed down too. (b) S2: Ego's goal is the blue goal. Vehicle V1 is approaching the junction from the east and vehicle V2 from the west. As V1 approaches the junction, slows down and waits to take a turn, the ego's belief that V1 will turn right increases significantly, since it would be irrational to stop if the goal was to turn left or go straight. Since the ego recognised V1's goal is to go north, it predicts that V1 will wait until V2 has passed, giving the ego an opportunity to enter the road. (c) S3: Ego's goal is the green goal. As V1 changes from the inside to the outside lane of the roundabout and decreases its speed, it significantly biases the ego prediction towards the belief that V1 will take the south exit since that is the rational course of action for that goal. This encourages the ego to enter the roundabout while V1 is still in the roundabout. (d) S4: Ego's goal is the purple goal. With two vehicles stopped at the junction at a traffic light, vehicle V1 is approaching them from behind, and vehicle V2 is crossing in the opposite direction. When V1 reaches zero velocity, the goal generation function adds a stopping goal (orange) for V1 in its current position, shifting the goal distribution towards it since stopping is not rational for the north/west goals. The interpretation is that V1 wants the ego to merge in front of V1, which the ego then does.

IV. EVALUATION

We evaluate IGP2 in simulations of diverse urban driving scenarios, showing that: (1) our inverse planning method robustly recognises the goals of non-ego vehicles; (2) goal recognition leads to improved driving efficiency measured by driving time; and (3) intuitive explanations for the predictions can be extracted to justify the system's decisions. (Video showing IGP2 in action: https://www.five.ai/igp2.)

A. Scenarios

We use two sets of scenario instances. For in-depth analysis of goal recognition and planning, we use four defined local interaction scenarios shown in Figure 2. For each of these scenarios, we generate 100 instances with randomly offset initial longitudinal positions (∼ [−10, +10] meters) and initial speeds sampled from the range [5, 10] m/s for each vehicle including the ego vehicle. Here the ego vehicle observes the whole scenario. To further assess IGP2's ability to complete full routes with random traffic, we use the two random town layouts shown in Figure 3. Each town spans an area of 0.16 square kilometers and consists of roads, crossings, and roundabouts with 2–4 lanes each. Each junction has one defined priority road. The ego vehicle's observation radius in towns is 50 meters. Non-ego vehicles are spawned within 25 meters outside the ego observation radius, with random road, lane, speed, and goal. The total number of non-ego vehicles within the ego radius and spawning radius is kept at 8 to maintain a consistent medium-to-high level of traffic. In each town we generate 10 instances by choosing random routes for the ego vehicle to complete. The ego vehicle's goal is continually updated to be the outermost point on the route within the ego observation radius. In all simulations, the non-ego vehicles use manual heuristics to select from the maneuvers in Section III-A to reach their goals. All vehicles use independent proportional controllers for acceleration and steering, and IDM [32] for automatic distance-keeping. Vehicle motion is simulated using a kinematic bicycle model.

Fig. 3: Town 1 and Town 2 layouts.

B. Algorithms & Parameters

We compare the following algorithms in scenarios S1–S4. IGP2: the full system using goal recognition and MCTS. IGP2-MAP: like IGP2, but MCTS uses only the most probable goal and trajectory for each vehicle. CVel: MCTS without goal recognition, replaced by constant-velocity lane-following prediction after completion of the current maneuver. CVel-Avg: like CVel, but uses velocity averaged over the past 2 seconds. Cons: like CVel, but using a conservative give-way maneuver which always waits until all oncoming vehicles on priority lanes have passed. In the town scenarios we focus on IGP2 and Cons, and additionally compare to SH-CVel which works similarly to MPDM [5]: it simulates each macro action followed by a default Continue macro action, using CVel prediction for non-ego vehicles, then choosing the macro action with maximum estimated reward. (SH stands for "short horizon" as the search depth is effectively limited to 1.)

We simulate noisy maneuver detection (cf. Sec. III-D.2) by giving 0.9 probability to the currently executed maneuver of the non-ego vehicle and the rest uniformly to other maneuvers. Prior probabilities over non-ego goals are uniform. A* computes up to two predicted trajectories for each non-ego vehicle and goal. MCTS is run at a frequency of 1 Hz, performs K = 30 simulations with a maximum search depth of d_max = 5, and uses r_coll = r_term = −1. We set λ = 10 for velocity smoothing (cf. Eq. (4)).
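The simulated noisy detector described above can be written in a few lines; the helper name is illustrative, the maneuver labels are simply the set from Sec. III-A, and the 0.9/uniform split is the value stated in the text.

MANEUVERS = ["lane-follow", "lane-change-left", "lane-change-right",
             "turn-left", "turn-right", "give-way", "stop"]

def simulated_detector(true_maneuver, p_correct=0.9):
    """p(omega^i): 0.9 on the currently executed maneuver, remainder spread uniformly."""
    p_other = (1.0 - p_correct) / (len(MANEUVERS) - 1)
    return {m: (p_correct if m == true_maneuver else p_other) for m in MANEUVERS}

print(simulated_detector("give-way"))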
C. Results

1) Goal probabilities: Figure 4 shows the average probability over time assigned to the true goal in scenarios S1–S4.
In all tested scenario instances we observe that the probability increases with growing evidence and at different rates depending on random scenario initialisation. Snapshots of goal probabilities (shown as bar plots) associated with the non-ego's most probable current maneuver can be seen in Figure 2. We also tested the method's robustness to missing segments in the observed trajectory of a vehicle. In scenarios S1 and S3 we removed the entire lane-change maneuver from the observed trajectory (but keeping the short lane-follow segment before the lane change). To deal with occlusion, we applied A* search before the beginning of each missing segment to reach the beginning of the next observed segment, thereby "filling the gaps" in the trajectory. Afterwards we applied velocity smoothing to the reconstructed trajectory. The results are shown as dashed lines in Figure 4, showing that even under significant occlusion the method is able to correctly recognise a vehicle's goal.

Fig. 4: Average probability given to true goal of selected vehicles in scenarios S1–S4. Note: lines for S1/S3 are shorter than indicated in Tab. II since possible vehicle goals change after exit points are reached and we only show lines for initial possible goals.

           | S1          | S2           | S3           | S4
IGP2       | 5.97 ± .02  | 7.24 ± .05   | 8.54 ± .05   | 10.83 ± .03
IGP2-MAP   | 5.99 ± .02  | 7.23 ± .05   | 8.36 ± .06   | 10.40 ± .03
CVel       | 6.04 ± .03  | 9.80 ± .17   | 10.49 ± .09  | 12.83 ± .03
CVel-Avg   | 6.01 ± .02  | 11.31 ± .17  | 10.49 ± .09  | 13.59 ± .02
Cons       | 6.01 ± .02  | 12.89 ± .03  | 10.90 ± .04  | 16.78 ± .02
TABLE II: Average driving time (seconds) required to complete scenario instances from S1–S4, with standard error.
2) Driving times: Table II shows the average driving times required of each algorithm in scenarios S1–S4. Goal recognition enabled IGP2 and IGP2-MAP to reduce their driving times. (S1) All algorithms change lanes to avoid being slowed down by V1, leading to the same driving times; however, IGP2 and IGP2-MAP initiate the lane change before all other algorithms by recognising V1's intended goal. (S2) Cons waits for V1 to clear the lane, which in turn must wait for V2 to pass. IGP2 and IGP2-MAP anticipate this behaviour, allowing them to enter the road earlier. CVel and CVel-Avg wait for V1 to reach near-zero velocity. (S3) IGP2 and IGP2-MAP are able to enter early as they recognise V1's goal to exit the roundabout, while CVel, CVel-Avg, and Cons wait for V1 to exit. (S4) Cons waits until V1 decides to close the gap, after which the ego can enter the road. IGP2 and IGP2-MAP recognise V1's goal and merge in front.

IGP2-MAP achieved shorter driving times than IGP2 on some scenario instances (such as S3 and S4). This is because IGP2-MAP commits to the most-likely goal and trajectory of other vehicles, while IGP2 also considers residual uncertainty about goals and trajectories which may lead MCTS to select more cautious actions in some situations. The limitation of IGP2-MAP can be seen when simulating unexpected (irrational) behaviours in other vehicles. To test this, we compared IGP2 and IGP2-MAP on instances from S3 and S4 which were modified such that V1, after slowing down, suddenly accelerates and continues straight (rather than exiting as in S3, or stopping as in S4). In these cases we observed a 2-3% collision rate for IGP2-MAP (in all collisions, V1 collided into the ego) while IGP2 produced no collisions. These results show that IGP2 exhibits safer driving than IGP2-MAP by accounting for uncertainty over goals and trajectories.

Figure 5 shows the driving times of IGP2 and Cons for the routes in the two towns. Both algorithms completed all of the routes. Goal recognition allowed IGP2 to reduce its driving times substantially by exploiting multiple opportunities for proactive lane changes and road/junction entries. In contrast, Cons exhibited more conservative driving and often waited considerably longer at junctions or before taking a turn until traffic cleared up. SH-CVel was unable to complete any of the given routes, as its short planning horizon often caused it to take a wrong turn (thus failing the instance).

Fig. 5: Driving times (seconds) of IGP2 and Cons for 10 routes in Town 1 and Town 2.

3) Interpretability: We are able to extract intuitive explanations for the predictions and decisions made by IGP2. The explanations are given in the caption of Figure 2.

V. CONCLUSION

We proposed an autonomous driving system, IGP2, which integrates planning and prediction over extended horizons by reasoning about the goals of other vehicles via rational inverse planning. Evaluation in diverse urban driving scenarios showed that IGP2 robustly recognises the goals of non-ego vehicles, resulting in improved driving efficiency while allowing for intuitive interpretations of the predictions to explain the system's decisions. IGP2 is general in that it uses relatively standard planning techniques that could be replaced with other techniques (e.g. POMDP-based planners [34]), and the general principles underlying our approach could be applied to other domains in which mobile robots interact with other robots/humans. Important future directions include goal recognition in the presence of occluded objects which can be seen by the non-ego vehicle but not the ego vehicle, and accounting for human irrational biases [35], [36].
REFERENCES

[1] W. Schwarting, J. Alonso-Mora, and D. Rus, "Planning and decision-making for autonomous vehicles," Annual Review of Control, Robotics, and Autonomous Systems, vol. 1, pp. 187–210, 2018.
[2] C. Dong, J. M. Dolan, and B. Litkouhi, "Smooth behavioral estimation for ramp merging control in autonomous driving," in IEEE Intelligent Vehicles Symposium. IEEE, 2018, pp. 1692–1697.
[3] C. Hubmann, J. Schulz, M. Becker, D. Althoff, and C. Stiller, "Automated driving in uncertain environments: Planning with interaction and uncertain maneuver prediction," IEEE Transactions on Intelligent Vehicles, vol. 3, no. 1, pp. 5–17, 2018.
[4] B. Zhou, W. Schwarting, D. Rus, and J. Alonso-Mora, "Joint multi-policy behavior estimation and receding-horizon trajectory planning for automated urban driving," in IEEE International Conference on Robotics and Automation. IEEE, 2018, pp. 2388–2394.
[5] E. Galceran, A. Cunningham, R. Eustice, and E. Olson, "Multipolicy decision-making for autonomous driving via changepoint-based behavior prediction: Theory and experiment," Autonomous Robots, vol. 41, no. 6, pp. 1367–1382, 2017.
[6] C. Hubmann, M. Becker, D. Althoff, D. Lenz, and C. Stiller, "Decision making for autonomous driving considering interaction and uncertain prediction of surrounding vehicles," in IEEE Intelligent Vehicles Symposium (IV). IEEE, 2017, pp. 1671–1678.
[7] W. Song, G. Xiong, and H. Chen, "Intention-aware autonomous driving decision-making in an uncontrolled intersection," Mathematical Problems in Engineering, vol. 2016, 2016.
[8] B. D. Ziebart, N. Ratliff, G. Gallagher, C. Mertz, K. Peterson, J. A. Bagnell, M. Hebert, A. K. Dey, and S. Srinivasa, "Planning-based prediction for pedestrians," in Intelligent Robots and Systems, 2009, pp. 3931–3936.
[9] J. Hardy and M. Campbell, "Contingency planning over probabilistic obstacle predictions for autonomous road vehicles," IEEE Transactions on Robotics, vol. 29, no. 4, pp. 913–929, 2013.
[10] T. Bandyopadhyay, K. S. Won, E. Frazzoli, D. Hsu, W. S. Lee, and D. Rus, "Intention-aware motion planning," in Algorithmic Foundations of Robotics X. Springer, 2013, pp. 475–491.
[11] H. Zhao, J. Gao, T. Lan, C. Sun, B. Sapp, B. Varadarajan, Y. Shen, Y. Shen, Y. Chai, C. Schmid, C. Li, and D. Anguelov, "TNT: Target-driven trajectory prediction," in Conference on Robot Learning, 2020.
[12] N. Rhinehart, R. McAllister, K. Kitani, and S. Levine, "PRECOG: Prediction conditioned on goals in visual multi-agent settings," in Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 2821–2830.
[13] Y. Chai, B. Sapp, M. Bansal, and D. Anguelov, "MultiPath: Multiple probabilistic anchor trajectory hypotheses for behavior prediction," in Conference on Robot Learning, 2019.
[14] Y. Xu, T. Zhao, C. Baker, Y. Zhao, and Y. N. Wu, "Learning trajectory prediction with continuous inverse optimal control via Langevin sampling of energy-based models," arXiv preprint arXiv:1904.05453, 2019.
[15] S. Casas, W. Luo, and R. Urtasun, "IntentNet: Learning to predict intention from raw sensor data," in Conference on Robot Learning, 2018, pp. 947–956.
[16] N. Lee, W. Choi, P. Vernaza, C. B. Choy, P. H. Torr, and M. Chandraker, "DESIRE: Distant future prediction in dynamic scenes with interacting agents," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 336–345.
[17] M. Wulfmeier, D. Z. Wang, and I. Posner, "Watch this: Scalable cost-function learning for path planning in urban environments," in 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2016, pp. 2089–2095.
[18] A. Sadat, S. Casas, M. Ren, X. Wu, P. Dhawan, and R. Urtasun, "Perceive, predict, and plan: Safe motion planning through interpretable semantic representations," in European Conference on Computer Vision. Springer, 2020, pp. 414–430.
[19] S. V. Albrecht and P. Stone, "Autonomous agents modelling other agents: A comprehensive survey and open problems," Artificial Intelligence, vol. 258, pp. 66–95, 2018.
[20] J. Stewart, "Why people keep rear-ending self-driving cars," WIRED magazine, 2020, https://www.wired.com/story/self-driving-car-crashes-rear-endings-why-charts-statistics/ (Accessed: 2020-10-31).
[21] P. Koopman, R. Hierons, S. Khastgir, J. Clark, M. Fisher, R. Alexander, K. Eder, P. Thomas, G. Barrett, P. H. Torr, A. Blake, S. Ramamoorthy, and J. McDermid, "Certification of highly automated vehicles for use on UK roads: Creating an industry-wide framework for safety," 2019, FiveAI.
[22] M. Gadd, D. De Martini, L. Marchegiani, P. Newman, and L. Kunze, "Sense–Assess–eXplain (SAX): Building trust in autonomous vehicles in challenging real-world driving scenarios," in 2020 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2020, pp. 150–155.
[23] M. Ramírez and H. Geffner, "Probabilistic plan recognition using off-the-shelf classical planners," in 24th AAAI Conference on Artificial Intelligence, 2010, pp. 1121–1126.
[24] C. Baker, R. Saxe, and J. Tenenbaum, "Action understanding as inverse planning," Cognition, vol. 113, no. 3, pp. 329–349, 2009.
[25] C. Browne, E. Powley, D. Whitehouse, S. Lucas, P. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis, and S. Colton, "A survey of Monte Carlo tree search methods," IEEE Transactions on Computational Intelligence and AI in Games, vol. 4, no. 1, pp. 1–43, 2012.
[26] S. V. Albrecht and S. Ramamoorthy, "Exploiting causality for selective belief filtering in dynamic Bayesian networks," Journal of Artificial Intelligence Research, vol. 55, pp. 1135–1178, 2016.
[27] C. Gámez Serna and Y. Ruichek, "Dynamic speed adaptation for path tracking based on curvature information and speed limits," Sensors, vol. 17, no. 1383, 2017.
[28] A. Wächter and L. Biegler, "On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming," Mathematical Programming, vol. 106, no. 1, pp. 25–57, 2006.
[29] S. Niekum, S. Osentoski, C. Atkeson, and A. Barto, "Online Bayesian changepoint detection for articulated motion models," in IEEE International Conference on Robotics and Automation. IEEE, 2015.
[30] P. Hart, N. Nilsson, and B. Raphael, "A formal basis for the heuristic determination of minimum cost paths," IEEE Transactions on Systems Science and Cybernetics, vol. 4, pp. 100–107, July 1968.
[31] P. Auer, N. Cesa-Bianchi, and P. Fischer, "Finite-time analysis of the multiarmed bandit problem," Machine Learning, vol. 47, no. 2-3, pp. 235–256, 2002.
[32] M. Treiber, A. Hennecke, and D. Helbing, "Congested traffic states in empirical observations and microscopic simulations," Physical Review E, vol. 62, no. 2, p. 1805, 2000.
[33] C. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[34] A. Somani, N. Ye, D. Hsu, and W. S. Lee, "DESPOT: Online POMDP planning with regularization," in Advances in Neural Information Processing Systems, 2013, pp. 1772–1780.
[35] M. Kwon, E. Biyik, A. Talati, K. Bhasin, D. P. Losey, and D. Sadigh, "When humans aren't optimal: Robots that collaborate with risk-aware humans," in Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction, 2020.
[36] Y. Hu, L. Sun, and M. Tomizuka, "Generic prediction architecture considering both rational and irrational driving behaviors," in 2019 IEEE Intelligent Transportation Systems Conference (ITSC). IEEE, 2019, pp. 3539–3546.
