

DeepGame-TP: Integrating Dynamic Game Theory and Deep Learning for Trajectory Planning

Giovanni Lucente∗†, Mikkel Skov Maarssoe∗, Anas Abulehia∗, Sanath Himasekhar Konthala∗, Reza Dariani∗, and Julian Schindler∗

∗ Institute of Transportation Systems, German Aerospace Center (DLR), Lilienthalplatz 7, 38108 Braunschweig, Germany
† Fakultät Verkehrs- und Maschinensysteme, TU Berlin, Straße des 17. Juni 135, 10623 Berlin, Germany

CORRESPONDING AUTHOR: Giovanni Lucente (e-mail: [email protected]).
This work was supported by the German Aerospace Center (DLR).

ABSTRACT Trajectory planning for automated vehicles in traffic has been a challenging task and a hot topic in recent research. The need for flexibility, transparency, interpretability and predictability poses challenges in deploying data-driven approaches in this safety-critical application. This paper proposes DeepGame-TP, a game-theoretical trajectory planner that uses deep learning to model each agent's cost function and adjust it based on observed behavior. In particular, an LSTM network predicts each agent's desired speed, forming a penalizing term in the cost function that reflects aggressiveness. Experiments demonstrate significant advantages of this framework, highlighting the adaptability of DeepGame-TP in intersection, overtaking and merging scenarios. It effectively avoids dangerous situations that could arise from incorrect cost function estimates. The approach is suitable for real-time applications, solving the Generalized Nash Equilibrium Problem (GNEP) in scenarios with up to three vehicles in under 100 milliseconds on average.

INDEX TERMS Dynamic Game, Deep Learning, Generalized Nash Equilibrium, LSTM, Trajectory Planning

I. INTRODUCTION
In recent years, many successful data-driven approaches have been applied to the field of automated driving, particularly in perception, prediction and planning tasks. Recent advances in deep learning techniques, such as transformers, large language models, and generative AI, have shown great potential in image and speech recognition and generation. The research community is now rushing to apply these techniques to automated driving, hoping to achieve the same success seen in other fields. While the application of deep learning techniques in perception is considered state-of-the-art, the discussion about which approach should be considered standard for planning is still ongoing and a hot topic in the literature.
It is possible to identify three main challenges in the planning task: the mutual influence between prediction and planning, the computational time constraint for online applications and the requirement for interpretability and transparency, which are necessary for a safety application like driving automation.
Data-driven approaches have shown great potential in the trajectory prediction task, particularly in interactive multi-agent scenarios where the reciprocal influence between traffic participants is crucial. Nevertheless, prediction alone is insufficient for effective planning. Separating prediction from planning can lead to suboptimal behavior, where planning merely reacts passively to predictions, or to inaccurate estimates, as the actions of the ego vehicle also affect other traffic participants.

For this reason, many works in the literature prefer to consider prediction and planning as a single task.
Another crucial challenge for trajectory planning algorithms is the stringent computational time constraint required for real-time online applications. These algorithms must process data and generate accurate plans quickly (generally within 100 ms) to ensure the responsiveness and safety of automated driving systems.
Deploying an Advanced Driver Assistance System (ADAS) in the real world requires a thorough understanding of its functionality. In this context, every algorithm used in safety applications must be interpretable and transparent. Good performance on a test set does not necessarily mean the solution will work well in real-world conditions, as testing sets often fail to capture the full range of deployment scenarios. Furthermore, the black-box nature of some systems prevents users from addressing issues, even when they are aware of them. This is particularly problematic for data-driven approaches, which are often designed as black boxes. Therefore, transparent AI has become a major research focus, with many solutions being proposed to improve understanding and trust in these systems [1].
To tackle this challenge, this paper presents DeepGame-TP, a game-theoretic trajectory planner for multi-agent traffic scenarios that incorporates deep learning while preserving transparency in its approach. DeepGame-TP utilizes an Augmented Lagrangian Trust-Region solver to find Generalized Nash Equilibria (GNE) in dynamic games. The agents' behavior is modeled in the cost function by adding a penalty for the distance between the future speed and the desired speed profile, predicted through a Long-Short Term Memory (LSTM) neural network trained on the NGSIM dataset [2].
The contributions of this paper are:
1) Definition of a framework that integrates deep learning into a game-theoretic approach for trajectory planning. An LSTM neural network predicts the desired longitudinal behavior of each actor, which is then incorporated into the cost function to compute the Generalized Nash Equilibrium (GNE).
2) Definition of a straightforward approach to characterize the behavior of agents in a traffic environment. The predicted desired speed indicates the driver's level of aggressiveness.
3) Definition of a Trust Region solver based on an Augmented Lagrangian formulation, enabling real-time computation of the GNE and facilitating online applications.

II. RELATED WORKS
Trajectory planning in traffic has been tackled using various approaches to address the complexity of this task. The challenges to face include optimal control in a multi-agent environment, real-time applications and modeling the behavior of different agents. It is possible to categorize the approaches into model-based and learning-based methods.
Game Theory is the traditional framework for modeling multi-agent environments and identifying optimal policies within them. The availability of large datasets and simulation environments allows us to bypass explicit modeling of agent interactions, instead delegating this task to learned models. Common learning-based methods are Reinforcement Learning, Imitation Learning and, more recently, Generative Models.
Reinforcement Learning (RL) algorithms enable an agent to act optimally in an environment by continuously interacting with it and collecting feedback on its behavior. The primary limitations of applying RL to trajectory planning for autonomous driving arise from the complexity of accurately representing real-world driving environments in simulation, from the high-dimensional state and action spaces inherent in trajectory planning tasks and from the challenge of designing an appropriate reward function. An agent that has learned an effective policy in a simulated environment does not guarantee optimal performance in reality. The simulated environment and the curse of dimensionality, resulting in sample inefficiency, pose challenges in generalizing the policy to all possible scenarios. Consequently, the agent's behavior can become unpredictable in scenarios not encountered during training, with an additional risk of overfitting. Another challenge is defining a reward function that consistently leads to an optimal policy in all situations, without causing suboptimal or unsafe behavior. For these reasons, RL still suffers from long training times and poor performance [3]. Recent works that apply RL to trajectory planning for automated driving and propose solutions for these problems are [4]–[6]. In [4], the authors propose a simulation framework to generate driving scenarios from a real-world dataset for training. The trained RL algorithm is then tested in a non-signalized T-junction and a non-signalized lane merge intersection. In [5], the authors propose a hierarchical framework for trajectory planning. Instead of directly mapping sensor information to low-level control signals, RL is applied only to the subtask of choosing a desired state. Subsequently, a low-level planner is used to generate and track the trajectory according to the chosen state. This is a common tendency in the literature: since RL does not always provide effective end-to-end planning performance, it is often used to solve a smaller part of the planning problem [3]. In [6], the authors face the problem of sparse rewards, which deteriorate the performance of RL. They define a new reward function by leveraging field approximations, which is demonstrated to yield dense rewards. As these examples illustrate, defining an appropriate reward function remains a challenge for the performance of RL algorithms.
Imitation Learning (IL) provides a framework where defining a reward function is no longer necessary. Instead, the agent learns directly from examples provided by an expert, mapping observations to actions [7]. The literature presents different examples of IL application in the field of automated vehicles, particularly employing end-to-end approaches [8].

It is possible to divide IL techniques into Behavioral Cloning (BC), Inverse Reinforcement Learning (IRL) and Adversarial Imitation Learning (AIL). In Behavioral Cloning (BC), a model is trained through supervised learning to map the state of the environment to the corresponding expert action. The main advantage of BC is its simplicity; it does not require knowledge of the environment's dynamics, as it relies solely on expert demonstrations. The most well-known limitation of this approach is the covariate shift problem [7], [9], which occurs when the state distribution during training differs from that during testing. This problem arises because the agent is trained on states generated by the expert policy but is tested on states influenced by its own policy. As a result, the agent may encounter traffic situations it never encountered during training, potentially leading to safety issues. Inverse Reinforcement Learning (IRL) is an alternative approach to IL, where the agent infers the underlying reward function or policy from expert demonstrations. Once the reward function is inferred, it is used to learn an optimal decision-making model through RL. IRL is less sensitive to the covariate shift problem because the state distribution during both training and testing is generated by the agent, ensuring consistency. However, major limitations of IRL include the difficulty of inferring the reward function and the high computational cost during training. These challenges are particularly pronounced for complex tasks and rich state-action spaces, leading to the curse of dimensionality. A policy can indeed be optimal for an infinite number of reward functions [7]. Recently, some works have deployed IRL for trajectory planning in traffic [10]–[12]. In these works, the application of IRL is limited to a score module that evaluates already planned trajectories, as in [10], [11], or to a Personalized Adaptive Cruise Control (P-ACC) that learns the driver's car-following preferences from historical data, as in [12]. Adversarial Imitation Learning (AIL) is an imitation learning strategy that involves a competitive game between an agent and an adversary (discriminator). The agent generates trajectories aimed at emulating those of the expert, while the adversary endeavors to distinguish the agent's generated trajectories from the original ones provided by the expert. Recent works have shown the potential of AIL [13]–[15]; however, a common issue with these approaches is that the driving policy may not perform well in situations that differ from those encountered during training [16].
Recent advancements in generative models have sparked increased interest within the research community in applying these models to automated driving, particularly in trajectory planning and prediction. However, most approaches primarily focus on predicting trajectories and generating traffic scenarios based on specific inputs, making examples of applications in trajectory planning rare. In trajectory planning, the generation process must be conditioned on inputs that specify the long-term objective of the planned trajectory, making the overall design and training process more complex compared to trajectory prediction. The most used approaches for trajectory prediction are Generative Adversarial Networks (GAN) [17], [18], Variational Autoencoders (VAE) [19] and Diffusion Models [20]–[22]. Diffusion models have proven to be highly effective in prediction tasks. However, they involve numerous computationally expensive denoising steps and sampling operations, making them less suitable for real-time, safety-critical applications [20]. Nevertheless, the most critical issue in the application of data-driven approaches to safety tasks is the lack of transparency of neural-network-based architectures. The output is not predictable, and there are limitations in detecting, understanding and fixing issues. This has recently become a major topic in research [1].
Game theory is the traditional framework used to model multi-agent environments and define optimal policies within them. The traffic problem is often framed as a dynamic game [23]–[26], where the objective is to solve the Generalized Nash Equilibrium Problem (GNEP). A Generalized Nash Equilibrium (GNE) is a type of Nash Equilibrium (NE) where players are interconnected through shared state constraints [27], such as collision avoidance. In [23], the authors introduce an augmented Lagrangian algorithm to solve GNEPs for trajectory optimization problems. The proposed solver is based on a quasi-Newton root-finding algorithm to satisfy the first-order optimality conditions, with constraints enforced using an augmented Lagrangian formulation. The algorithm is tested in highly interactive scenarios, like intersection and merging. Nonetheless, there is no distinction in the agents' objective functions. In [24], [26], two different methods for estimating and differentiating the agents' objective functions are presented. In [26], the authors propose an inverse optimal control algorithm that is able to estimate the other agents' objective functions in real time. From the normal distribution of the objective function parameters, sigma points are sampled. For each of these points, the GNEP is solved, resulting in a set of predicted trajectories. Upon receiving a new system measurement, an Unscented Kalman Filter updates the parameter distribution, making the sigma points with better predictive performance more likely. This approach requires solving a GNEP for each sigma point, where the number of sigma points is linearly proportional to the number of agents. In [24], Social Value Orientation (SVO) is employed to characterize agents' behavior in the dynamic game, enhancing prediction accuracy when computing the NE. SVO quantifies the extent of an agent's selfishness or altruism. The utility function for each agent is defined as a combination of its own rewards and those of other agents, weighted by the SVO angular preference. The reward functions are learned through IRL from the NGSIM driving data. The likelihood of candidate SVOs is computed by evaluating a Gaussian kernel on the distance between predicted and actual trajectories. The features used to compute the reward function, however, are not always observable in realistic scenarios, which poses challenges for applying this approach to a real trajectory planner.


Generally, methods that estimate the cost functions of other agents online require solving prediction sub-problems to evaluate candidate parameters, as seen in the two examples cited above. This process can degrade computational performance and affect online applicability. The implementation of learned models for this task, which avoids the need to solve sub-problems, has received limited exploration in the literature.

III. PROBLEM STATEMENT
The traffic environment is modeled as a multi-player dynamic game; the proposed algorithm solves the GNEP without distinguishing between the automated vehicle and the other vehicles present in the traffic scenario. The time horizon is discretized with N steps, and the state and input sizes are denoted by n_X and n_U respectively. The state and the control input of agent i at time step k are denoted by x^i_k and u^i_k. The state of each agent at time step k is composed of x^i_k = [x, y, ψ, v]^T_k, i.e. the cartesian coordinates, heading and speed, while the input is composed of u^i_k = [δ, F]^T_k, i.e. the steering angle and the longitudinal force.
Consider the GNEP with M players. Each player i decides over its control input sequence, denoted by U^i = [(u^i_1)^T, ..., (u^i_{N−1})^T]^T ∈ R^{n_U(N−1)}, that generates a trajectory denoted by X^i = [(x^i_2)^T, ..., (x^i_N)^T]^T ∈ R^{n_X(N−1)}, according to the dynamic model:

    x_{k+1} = f(x_k, u_k)                                              (1)

The cost function of agent i is denoted by J^i(X^i, U^i): R^{n_U(N−1)} → R; it depends on the trajectory and on the control input sequence. The goal of each agent is to minimize the cost function without violating the constraints. The trajectory of the system, which aggregates the agents' trajectories, is indicated by X = [X^1, ..., X^M]^T ∈ R^{n_X(N−1)M}, and the aggregate input sequence of the system is denoted by U = [U^1, ..., U^M]^T ∈ R^{n_U(N−1)M}. The vector of all the players' strategies except that of player i is denoted by U^{−i}. The problem is formalized as a set of optimization problems:

    min_{X^i, U^i}  J^i(X^i, U^i)
    s.t.  D^i(X^i, U^i) = 0                                            (2)
          C^i(X, U) < 0

This set of M problems constitutes a GNEP, since they are coupled through the inequality constraints C^i(X, U) < 0, which depend on the control input sequence of every vehicle. This coupling comes from the collision avoidance constraints. The equality constraints D^i(X^i, U^i) = 0 represent the dynamic model that links the control input sequence U^i with the trajectory X^i. However, in the practical implementation of the optimization problem, the equality constraints are ignored, since the dynamic model is imposed through the integration function X^i = F(U^i, x^i_0), where x^i_0 is the state of vehicle i at time 0. The optimization problem is then simplified:

    min_{U^i}  J^i(U^i)
    s.t.  C^i(U) < 0                                                   (3)

The solution to this set of optimization problems, U, is an open-loop generalized Nash equilibrium. The control signal U^i of agent i is the best response to the other agents' strategies U^{−i} given the initial state of the system x_0 = [x^1_0, ..., x^M_0]^T. It is open-loop since the control input sequence U is a function of time, and not a function of the state of the system at time step k, x_k. Nonetheless, if the open-loop game is repeatedly resolved online, as new information is obtained, the solution constitutes a policy that is closed-loop in the model-predictive control sense [23].
In the following subsections we present the augmented Lagrangian formulation of the problem, the Trust Region solver, and how the cost function and the inequality constraints are practically implemented.

A. AUGMENTED LAGRANGIAN FORMULATION
The constrained optimization problems are transformed into unconstrained optimization problems through the augmented Lagrangian formulation. The augmented Lagrangian for each agent i is then defined as:

    L^i(U) = J^i(U^i) + λ^{i T} C^i(U) + (1/2) C^i(U)^T I_μ C^i(U)     (4)

where J^i is the cost function, λ^i is the vector of Lagrangian multipliers associated with the inequality constraints C^i, and I_μ is the penalty weight matrix, whose elements are:

    I_{μ,jq} = μ   if j = q and C^i_j ≥ 0
    I_{μ,jq} = 0   if j ≠ q or C^i_j < 0                               (5)

where μ is the penalty weight for the quadratic term, which, along with the Lagrange multiplier, is increased if the corresponding constraint is violated. The updating rules are:

    λ^{i (k+1)}_j ← max(0, λ^{i (k)}_j + μ^{(k)} C^{i (k)}_j)
    μ^{(k+1)} ← γ μ^{(k)},   γ > 1                                     (6)

The strategy U^i is considered optimal for agent i, as it minimizes J^i under the inequality constraints C^i, if:

    ∇_{U^i} L^i(U) = 0                                                 (7)

If equation (7) holds ∀i, that is, for each agent in the scenario, then the solution U is a GNE. The unconstrained minimization problem is solved using a Trust Region algorithm, which is explained in the following subsection.
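As an illustration of equations (4)–(6), the following Python sketch evaluates the augmented Lagrangian of one agent and applies the multiplier and penalty updates. It is a minimal sketch, not the authors' implementation: cost_fn and constraint_fn are hypothetical stand-ins for J^i and C^i, and the rollout X^i = F(U^i, x^i_0) is assumed to happen inside them.

```python
import numpy as np

def augmented_lagrangian(U_i, U_all, lam, mu, cost_fn, constraint_fn):
    """Per-agent augmented Lagrangian of equation (4):
    L^i(U) = J^i(U^i) + lam^T C^i(U) + 0.5 * C^i(U)^T I_mu C^i(U)."""
    C = constraint_fn(U_all)                  # inequality constraints C^i(U), feasible when < 0
    active = (C >= 0.0).astype(float)         # I_mu carries mu only for violated constraints (eq. 5)
    return cost_fn(U_i) + lam @ C + 0.5 * mu * np.sum(active * C**2)

def update_multipliers(lam, mu, C, gamma=10.0):
    """Dual and penalty updates of equation (6); gamma > 1 is an assumed value."""
    lam_new = np.maximum(0.0, lam + mu * C)   # keep multipliers non-negative
    return lam_new, gamma * mu
```

These two functions correspond to the evaluation of L^i and to the update step (line 33) of Algorithm 1, presented in the next subsection.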

4 VOLUME ,
B. TRUST REGION SOLVER
In unconstrained optimization, Trust Region methods define a region around the current iterate within which a quadratic model is trusted to be a good approximation of the objective function. Within this region, the algorithm seeks the approximate minimizer of the model. Trust Region methods determine both the direction and the step size simultaneously. If a step is not acceptable, the region's size is reduced, and a new minimizer is sought. The size of the trust region is crucial for the efficiency of each step: a small region results in small steps, while a large region may lead to a minimizer of the model that is far from the function's minimizer. The size of the trust region is adjusted based on the algorithm's performance in the previous iteration [28].
Algorithm 1 presents the details of the Trust Region solver, and Table 1 shows the parameters of the algorithm. Some key points should be highlighted:
– The Trust Region method is typically used to solve a single unconstrained optimization problem. In this work, however, it is employed to solve a generalized Nash equilibrium, involving multiple optimization problems that are coupled through inequality constraints.
– In the algorithm, there is an initial loop over the agents to solve the sub-problems, followed by a second loop to check whether the actual reduction in the cost function is close to the predicted one. Having two separate loops instead of a single combined loop ensures that all agents are treated equally, preventing any strategic advantage for the agents that appear earlier in the list.
– The calculation of the Hessian matrix for the sub-problem is a critical point for computational complexity and online feasibility. To mitigate this complexity, in this implementation the Hessian matrix is estimated using the Symmetric Rank-One (SR1) method (Algorithm 2).

Algorithm 1: Trust Region Algorithm
Input: Initial system state x_0 = [x^1_0, ..., x^M_0]^T, Lagrange functions L^i(U), initial guess U_0, initial trust region radius ∆_0, tolerance ϵ, acceptance threshold η, maximum number of iterations k_max
Output: Agents' optimal control sequence U* = [U*^1, ..., U*^M]^T
1   Initialize: k ← 0, U_k ← U_0, ∆^i_k ← ∆_0, H^i_k ← I;
2   while not converged and k < k_max do
3       ∇L(U_k) ← [∇_{U^1_k} L^1, ..., ∇_{U^M_k} L^M]^T;
4       L(U_k) ← [L^1(U^1_k), ..., L^M(U^M_k)]^T;
5       for each agent i do
6           s^i ← argmin_{s^i} s^{i T} ∇L^i(U_k) + (1/2) s^{i T} H^i_k s^i   s.t. ∥s^i∥ ≤ ∆^i_k;
7       end
8       s ← [s^1, ..., s^M]^T;
9       for each agent i do
10          δL^i ← L^i(U_k) − L^i(U_k + s);
11          δL̂^i ← −(s^{i T} ∇L^i(U_k) + (1/2) s^{i T} H^i_k s^i);
12          ρ^i_k ← δL^i / δL̂^i;
13          if ρ^i_k > η then
14              U^i_{k+1} ← U^i_k + s^i;
15          else
16              U^i_{k+1} ← U^i_k;
17          end
18          if ρ^i_k > 0.75 then
19              if ∥s^i∥ > 0.8 ∆^i_k then
20                  ∆^i_{k+1} ← 2.0 ∆^i_k;
21              end
22          end
23          if ρ^i_k < 0.1 then
24              ∆^i_{k+1} ← 0.5 ∆^i_k;
25          end
26          δ∇L^i ← ∇_{U^i_k} L^i(U^i_k + s^i) − ∇_{U^i_k} L^i(U^i_k);
27          H^i_{k+1} ← SR1(H^i_k, δ∇L^i, s^i);
28      end
29      if ∥∇L(U_{k+1})∥ ≤ ϵ then
30          converged;
31      end
32      C(U_{k+1}) ← [C^1(U_{k+1}), ..., C^M(U_{k+1})]^T;
33      λ^{(k+1)}, μ^{(k+1)} ← update(λ^{(k)}, μ^{(k)}, C(U_{k+1}));
34      k ← k + 1;
35  end
36  return U_{k+1}

Algorithm 2: Symmetric Rank 1 (SR1) Hessian update
Input: Initial Hessian H_k, gradient difference y = ∇f(x + ∆x) − ∇f(x), step ∆x, parameter r ∈ (0, 1)
Output: Hessian update H_{k+1}
1   if |∆x^T (y − H∆x)| ≥ r ∥∆x∥ ∥y − H∆x∥ then
2       H_{k+1} = H_k + (y − H∆x)(y − H∆x)^T / ((y − H∆x)^T ∆x);
3   else
4       H_{k+1} = H_k;
5   end
6   return H_{k+1}

TABLE 1: Parameters of the Trust Region algorithm
    ∆^i_0 = 1.0,   ϵ = 1e−2 · M,   η = 1e−4,   k_max = 25
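For illustration, the acceptance and radius rules of Algorithm 1 (lines 13–25) and the SR1 update of Algorithm 2 translate into a few lines of Python. This is a sketch with variable names chosen to match the pseudocode, not the authors' code, and the skipping threshold r is an assumed default.

```python
import numpy as np

def sr1_update(H, y, dx, r=1e-8):
    """Symmetric Rank-One Hessian update (Algorithm 2).
    H: current Hessian estimate, y: gradient difference, dx: step."""
    res = y - H @ dx
    # Line 1 of Algorithm 2: skip the update when the denominator is unsafe.
    if abs(dx @ res) >= r * np.linalg.norm(dx) * np.linalg.norm(res):
        H = H + np.outer(res, res) / (res @ dx)
    return H

def accept_and_resize(rho, step_norm, delta, eta=1e-4):
    """Acceptance test and trust-region radius update (Algorithm 1, lines 13-25)."""
    accepted = rho > eta                         # accept the step only if the actual reduction is sufficient
    if rho > 0.75 and step_norm > 0.8 * delta:   # model agrees well and the step is near the boundary
        delta *= 2.0
    if rho < 0.1:                                # poor agreement: shrink the trusted region
        delta *= 0.5
    return accepted, delta
```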
C. LSTM NEURAL NETWORK FOR BEHAVIOR PREDICTION
The GNE simultaneously represents the planned trajectory for the controlled vehicle and the predicted trajectories for the other vehicles. To ensure a reliable model of the agents, and consequently a reliable prediction and optimal strategy, the agents' behavior must be understood and accurately represented in the cost function they aim to minimize. In this work, the cost function component that gives a representation of the agent behavior is the desired speed, predicted through a Long-Short Term Memory (LSTM) neural network. A higher desired speed leads to more aggressive behaviors such as overtaking, taking priority when entering unsignalized intersections, or maintaining a lower time headway.
The cost function for agent i is the following:

    J^i(U^i) = ( (q_f/2) Σ_{k=0}^{N−1} [ α_1 ∥x̃_k − x̃^C_k∥² + α_2 (∥cos(ψ_k) − cos(ψ^C_k)∥² + ∥sin(ψ_k) − sin(ψ^C_k)∥²) + α_3 ∥v_k − v̂_k∥² + α_4 ∥F_k∥² ] )²     (8)

where x̃_k and ψ_k are the cartesian coordinates and the heading of the trajectory at time step k, x̃^C_k and ψ^C_k are the cartesian coordinates and the heading of the closest point on the center lane from the point x̃_k, and F_k denotes the longitudinal force input. The variable v̂_k indicates the desired speed at time step k. For non-controllable vehicles, this speed is predicted by the LSTM network, while for the ego vehicle it is specified by the user. The coefficients q_f, α_1, α_2, α_3, α_4 have been empirically tuned to obtain a natural behavior and to avoid numerical instability (q_f = 1e−2, α_1 = 1e−1, α_2 = 1e2, α_3 = 1.0, α_4 = 1.0). The first two terms in the cost function penalize the deviation from the center line, in terms of cartesian distance and heading, the third term penalizes the deviation from the predicted longitudinal behavior, and the last term penalizes the longitudinal force input.
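Evaluated numerically, the cost of equation (8) for one agent reduces to a few array operations. The sketch below assumes the trajectory has already been rolled out and matched to its closest center-line points; the array layout is hypothetical, while the weights are the values reported above.

```python
import numpy as np

def agent_cost(xy, psi, v, F, xy_c, psi_c, v_des,
               qf=1e-2, a1=1e-1, a2=1e2, a3=1.0, a4=1.0):
    """Cost of equation (8). xy, psi, v, F: rolled-out trajectory of one agent over N
    steps (positions Nx2, heading, speed, longitudinal force). xy_c, psi_c: closest
    center-line points. v_des: desired speed (LSTM prediction, or user-defined for the ego)."""
    lateral = a1 * np.sum((xy - xy_c) ** 2, axis=1)        # deviation from the center line
    heading = a2 * ((np.cos(psi) - np.cos(psi_c)) ** 2
                    + (np.sin(psi) - np.sin(psi_c)) ** 2)  # heading deviation
    speed = a3 * (v - v_des) ** 2                          # deviation from the desired speed
    effort = a4 * F ** 2                                   # longitudinal force penalty
    return (0.5 * qf * np.sum(lateral + heading + speed + effort)) ** 2
```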


Figure 1 illustrates the architecture of the network used to predict the desired speed, along with its inputs and outputs. The network is composed of a first LSTM layer with 32 units, a 1D convolutional layer with 32 units, a max pooling layer with a pool size of 2, and a final dense layer with 182 neurons. Further details can be found in [29].

FIGURE 1: Architecture of the neural network for desired speed prediction.

The input captures the longitudinal behavior of a vehicle over a 3-second time window, while the output describes the vehicle's behavior over a 9-second period, including the observed 3 seconds and the subsequent 6 seconds. The input features include progress, speed, and acceleration, and the output features consist of progress and speed. These features are sampled every 0.1 seconds, resulting in 30 time steps for the input and 90 time steps for the output. The model has been trained on the NGSIM dataset [2], which includes two freeway segments and two arterial segments. The model achieves state-of-the-art performance, particularly within a 4-s prediction horizon, while maintaining good performance for longer-term predictions [29]. Moreover, the simple input structure enables application in scenarios where only the target vehicle is observable, without requiring data from surrounding vehicles. This is common in situations without V2I or V2V communication. In contrast, other models depend on the history of all surrounding vehicles, which may not always be accessible.
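For readers who want to reproduce the predictor, a minimal Keras-style sketch of the architecture described above is given here. The layer sizes follow the text (LSTM with 32 units, 1D convolution with 32 filters, max pooling with pool size 2, dense layer with 182 neurons), while the kernel size, activations and training configuration are assumptions; see [29] for the exact setup.

```python
from tensorflow.keras import layers, models

def build_desired_speed_predictor(input_steps=30, input_features=3, output_units=182):
    """Sketch of the desired-speed network: 3 s of progress/speed/acceleration in,
    a flattened longitudinal profile out (layer sizes as described in the text)."""
    inputs = layers.Input(shape=(input_steps, input_features))
    x = layers.LSTM(32, return_sequences=True)(inputs)                          # LSTM layer, 32 units
    x = layers.Conv1D(32, kernel_size=3, padding="same", activation="relu")(x)  # 1D convolution, 32 filters (kernel size assumed)
    x = layers.MaxPooling1D(pool_size=2)(x)                                     # max pooling, pool size 2
    x = layers.Flatten()(x)
    outputs = layers.Dense(output_units)(x)                                     # final dense layer with 182 neurons
    return models.Model(inputs, outputs)

model = build_desired_speed_predictor()
model.compile(optimizer="adam", loss="mse")   # optimizer and loss are assumptions, not from the paper
```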
Predicting the future desired speed has two key benefits:
– It enables a realistic representation of agents in the traffic scenario by tailoring the cost function to the observed behavior.
– It facilitates the convergence of the algorithm to realistic GNE. For example, if a vehicle is slowing down to match the speed of the vehicle in front, it is likely intending to follow rather than overtake. The cost function is then adjusted online to reflect this behavior, ensuring convergence to the appropriate GNE.

D. CONSTRAINTS
In the GNEP presented in equation (3), only inequality constraints are considered. This is because the only relevant equality constraints for the trajectory planning problem, the dynamic constraints, are enforced through the integration function X^i = F(U^i, x^i_0), where x^i_0 represents the initial state of vehicle i. The constraints for the GNEP of agent i are:
– Constraints on the inputs:

    δ_min ≤ δ^i_k ≤ δ_max   ∀k
    F_min ≤ F^i_k ≤ F_max   ∀k                                         (9)

  where δ^i_k and F^i_k are the steering angle and the longitudinal force of agent i at time step k.
– Constraints to stay in the lane:

    ∥x̃^i_k − x̃^C_k∥² ≤ r_lim   ∀k                                     (10)

  where x̃^i_k are the cartesian coordinates of agent i at time step k and x̃^C_k are the cartesian coordinates of the closest point on the center line.
– Constraints for collision avoidance:

    x̃^i_k ∉ Ω(x̃^j_k, ψ^j_k)   ∀k, ∀j ≠ i                              (11)

  where Ω(x̃^j_k, ψ^j_k) represents the area of an ellipse centered on agent j at time step k and rotated in the direction of its heading ψ^j_k.
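The collision-avoidance condition of equation (11) can be evaluated, for instance, by expressing the position of agent i in the ellipse frame of agent j. The sketch below is illustrative only: the semi-axes a and b of Ω are parameters that the text does not specify.

```python
import numpy as np

def outside_ellipse(p_i, p_j, psi_j, a, b):
    """Check of equation (11): True when the point p_i of agent i lies outside the
    ellipse Omega centered on agent j with heading psi_j.
    a, b: semi-axes along and across the heading (assumed parameters)."""
    dx, dy = p_i[0] - p_j[0], p_i[1] - p_j[1]
    lon = np.cos(psi_j) * dx + np.sin(psi_j) * dy    # relative position along the heading
    lat = -np.sin(psi_j) * dx + np.cos(psi_j) * dy   # relative position across the heading
    return (lon / a) ** 2 + (lat / b) ** 2 > 1.0
```

In the inequality format of equation (3), the same condition can be written as C = 1 − (lon/a)² − (lat/b)² < 0 for every time step k and every pair of agents i ≠ j.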
IV. CASE STUDIES AND DISCUSSION
To evaluate the approach of DeepGame-TP, demonstrate its flexibility, and verify the effectiveness of using a learning-based model in cost function estimation, three scenarios are tested and analyzed:
– Intersection scenario
– Overtaking scenario
– Merging scenario
In each scenario, DeepGame-TP is compared with a baseline approach where the future desired speed, estimated by the LSTM network, is replaced by the maximum speed, thereby omitting the learning-based module. For the comparison, the following KPIs of the controlled vehicle are considered:
– Minimum distance from the closest vehicle (measured between the centers of gravity)
– Average jerk
– Maximum jerk
– Average speed
– Minimum acceleration
– Maximum acceleration
The simulation environment is Automated Driving Open Research (ADORe), an open source modular software library and toolkit for decision making, planning, control and simulation of automated vehicles, developed by the Institute of Transportation Systems of the German Aerospace Center (DLR). In the following subsections, each scenario is analyzed separately, with an additional subsection dedicated to real-time computational performance.

A. INTERSECTION
In the intersection scenario depicted in figure 2, the ego vehicle is the white one, which needs to turn left, while the red vehicle continues straight, creating a potential collision risk. The red vehicle is controlled by a simple Intelligent Driver Model (IDM). Two cases are tested: in the first case, the red vehicle proceeds at a lower speed, allowing the ego vehicle to enter the intersection before it ("proceed case"). In the second case, the red vehicle proceeds at a higher speed, forcing the ego vehicle to complete its turn after the red vehicle has passed ("yield case"). Therefore, the aim is to test the difference in using the LSTM network to predict the desired speed when the ego vehicle can adopt a more aggressive approach versus when it needs to be more cautious. In the two cases tested, if the LSTM network is not used, the desired speed is replaced by the maximum speed in the cost function. A PID controller is used to follow the planned trajectories.

FIGURE 2: Intersection scenario.

TABLE 2: Performance of DeepGame-TP (DeepG.) using the LSTM module and without it (baseline) in the yield case. The table shows the average value and the standard deviation of 10 runs, for each KPI.

              Min Dist    Avg Jerk    Max Jerk     Avg Vel     Min Acc      Max Acc
              [m]         [m/s³]      [m/s³]       [m/s]       [m/s²]       [m/s²]
    DeepG.    3.4 ± 0.6   2.3 ± 0.4   23 ± 4       3.9 ± 0.1   −1.9 ± 0.2   2.0 ± 0.01
    Baseline  3.4 ± 0.2   1.5 ± 0.4   20.0 ± 0.2   4.0 ± 0.1   −0.9 ± 0.1   2.0 ± 0.01

TABLE 3: Performance of DeepGame-TP (DeepG.) using the LSTM module and without it (baseline) in the proceed case. The table shows the average value and the standard deviation of 10 runs, for each KPI.

              Min Dist    Avg Jerk    Max Jerk     Avg Vel     Min Acc      Max Acc
              [m]         [m/s³]      [m/s³]       [m/s]       [m/s²]       [m/s²]
    DeepG.    3.6 ± 0.6   1.4 ± 0.4   20.0 ± 0.1   3.9 ± 0.1   −0.7 ± 0.2   2.0 ± 0.01
    Baseline  1.3 ± 0.9   3.8 ± 0.3   27 ± 6       3.4 ± 0.2   −2.0 ± 0.1   2.0 ± 0.01

The results shown in tables 2 and 3 lead to some considerations:
– In the yield case, there is no significant difference between using or not using the LSTM module. The red vehicle accelerates to its maximum speed, taking priority over the ego vehicle. Therefore, treating the maximum speed as the desired speed is appropriate and results in a more cautious and comfortable behavior, as indicated by the KPI for minimum acceleration.
– In the proceed case, there is a clear advantage to using the LSTM network to predict the desired speed. Here, the red vehicle does not accelerate to the maximum speed but instead allows the ego vehicle to enter the intersection first. With the LSTM network, this intention is correctly understood, and the ego vehicle acts accordingly. Without the LSTM network, the red vehicle is predicted to accelerate, causing confusion for the ego vehicle, which attempts to enter the intersection after the red vehicle has passed. This discrepancy in speed prediction leads to a collision, as evidenced by the minimum distance KPI.

B. OVERTAKING
Figure 3 shows the simulated overtaking scenario. In this scenario, the ego vehicle has a desired speed equal to the maximum allowed, while the red vehicle proceeds at a much lower speed. For this reason, the GNE converges to an overtaking maneuver. The ego vehicle is controlled by DeepGame-TP, while the red vehicle is controlled by an IDM.


FIGURE 3: Overtaking scenario.

TABLE 4: Performance of DeepGame-TP (DeepG.) using the LSTM module and without it (baseline) in the overtaking scenario. The table shows the average value and the standard deviation of 10 runs, for each KPI.

              Min Dist    Avg Jerk    Max Jerk   Avg Vel     Min Acc      Max Acc
              [m]         [m/s³]      [m/s³]     [m/s]       [m/s²]       [m/s²]
    DeepG.    2.8 ± 0.1   1.6 ± 0.4   21 ± 2     6.7 ± 0.2   −1.5 ± 0.3   2.0 ± 0.01
    Baseline  2.3 ± 0.3   1.3 ± 0.4   21 ± 4     6.7 ± 0.2   −2.0 ± 0.2   2.0 ± 0.01

Table 4 presents the KPI results. It is evident that without the LSTM module, and thus without an accurate estimate of the vehicle's future trajectory, the risk of collision significantly increases. This is more clearly illustrated in figure 4, which shows the trajectories in both scenarios, and in figure 5, which displays the distance between the vehicles over time. From these two figures, the following observations can be made:
– With the LSTM module, the overtaking maneuver is significantly smoother. During the bumper-to-bumper phase, the distance is maintained, and the minimum distance is achieved in the final phase of the maneuver, when the vehicles are side by side. This results in a minimum distance of 2.8 meters between the centers of gravity, which is acceptable (approximately 1 meter of door-to-door distance).
– Without the LSTM module, the overtaking maneuver is abrupt. The front vehicle is predicted to accelerate to the maximum allowed speed, which is an incorrect estimate. This brings the ego vehicle too close to the front vehicle, causing the overtaking maneuver to be initiated with significant delay and urgency. The minimum distance is reached not only during the side-by-side phase but also during the bumper-to-bumper phase, leading to a collision. A distance of approximately 2.3 meters between the centers of gravity is indeed insufficient for maintaining adequate bumper-to-bumper space.

FIGURE 4: Example of trajectories for the overtaking scenario: (a) using DeepGame-TP; (b) without the LSTM module (baseline). The ego vehicle trajectory is in orange.

FIGURE 5: Distance between the ego vehicle and the other vehicle over time during the overtaking scenario: (a) using DeepGame-TP; (b) without the LSTM module (baseline).

C. MERGING
Figure 6 illustrates the merging scenario used to test DeepGame-TP. In this scenario, the ego vehicle merges into the left lane where two other vehicles are present. The configuration of the scene allows the ego vehicle to choose between merging ahead of the two cars by accelerating or merging between them by decelerating. The two vehicles are controlled by an IDM with their target speed set to the maximum speed.

FIGURE 6: Merging scenario.

TABLE 5: Performance of DeepGame-TP (DeepG.) using the LSTM module and without it (baseline) in the merging scenario. The table shows the average value and the standard deviation of 10 runs, for each KPI.

              Min Dist    Avg Jerk    Max Jerk   Avg Vel     Min Acc      Max Acc
              [m]         [m/s³]      [m/s³]     [m/s]       [m/s²]       [m/s²]
    DeepG.    2.8 ± 0.3   1.9 ± 0.7   21 ± 2     5.1 ± 0.3   −0.8 ± 0.3   2.0 ± 0.01
    Baseline  2.7 ± 0.4   1.7 ± 0.8   21 ± 3     5.3 ± 0.2   −0.5 ± 0.2   2.0 ± 0.01

In scenarios where vehicles are traveling near the maximum speed, selecting this speed as the desired speed in the cost function for the other agents is a reasonable assumption. This explains why there is no significant performance difference between DeepGame-TP and the version without the LSTM module, as shown in table 5. This scenario verifies whether the algorithm can solve the GNEP with three vehicles in real time. The next subsection examines the computational performance of the algorithm, demonstrating its potential for online application.

D. REAL TIME COMPUTATIONAL PERFORMANCE
In this section, considerations on execution time and online applicability are presented. Figure 7 shows histograms of the execution time for each episode. Various strategies have been implemented to reduce computational time:
– The initial guess U_0 in the Trust Region solver (Algorithm 1) is the solution from the previous time step. This accounts for the bimodal distribution observed in the histograms in figure 7. The first peak at low execution times occurs when the traffic situation remains similar to the previous time step, meaning the initial guess is close to the actual GNEP solution. The second peak at higher execution times arises when the previous time step's solution is no longer valid for the current situation, requiring more time to solve the GNEP.
– The computation of the gradient ∇L has been parallelized across multiple cores.
– The number of integration nodes (N) has been set to 12, with the integration time step configured to 0.5 seconds.
Table 6 shows the solve time of DeepGame-TP compared to ALGAMES and LUCIDGames [23], [26], in comparable scenarios. Given the available computational power, DeepGame-TP is comparable to the state of the art in terms of real-time applicability. Figure 7 shows that the computational time is always below 100 milliseconds for each scenario with two agents. However, in the merging scenario with three agents, the computational time increases to between 100 and 150 milliseconds for about 30% of the run time. In this work, all the experiments have been executed on an 8-core processor (Intel® Core™ i7-11850H).

TABLE 6: Comparison of computational times between DeepGame-TP (DeepG.) and other algorithms.

                   intersection    overtaking     merging        # cores
                   (2 players)     (2 players)    (3 players)
    DeepG.         46 ± 27 [ms]    58 ± 22 [ms]   52 ± 48 [ms]   8
    ALGAMES [23]   50 ± 11 [ms]    -              89 ± 14 [ms]   -
    LUCID [26]     -               -              26 ± 37 [ms]   16
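As a rough illustration of the first two strategies in the list above (warm-starting the solver with the previous solution and evaluating the per-agent gradients in parallel), consider the following sketch; solve_gnep, grad_L_i and observe are hypothetical stand-ins for the solver, the gradient of each agent's augmented Lagrangian, and the state estimation interface.

```python
from concurrent.futures import ProcessPoolExecutor

def parallel_gradients(grad_L_i, U, num_agents):
    """Evaluate grad_{U^i} L^i(U) for all agents in parallel, one task per agent.
    grad_L_i is a hypothetical, picklable helper taking (agent index, U)."""
    with ProcessPoolExecutor() as pool:
        return list(pool.map(grad_L_i, range(num_agents), [U] * num_agents))

def run_planner(solve_gnep, observe, U_init, cycles):
    """Warm-started re-planning loop: the previous GNEP solution is reused as the
    initial guess U_0 of Algorithm 1 at every planning cycle (N = 12, dt = 0.5 s)."""
    U, plans = U_init, []
    for _ in range(cycles):
        x0 = observe()                   # latest measured states of all agents
        U = solve_gnep(x0, U0=U)         # warm start with the previous solution
        plans.append(U)
    return plans
```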


FIGURE 7: Histograms of the execution times for each scenario during a run: (a) intersection scenario; (b) overtaking scenario; (c) merging scenario.

V. CONCLUSIONS
This work introduces DeepGame-TP, a trajectory planner for multi-agent traffic environments that solves the Generalized Nash Equilibrium Problem using the Augmented Lagrangian formulation and a Trust Region solver. The speed component of each agent's cost function is learned by an LSTM network, which predicts the desired speed profile for the next 6 seconds. Case studies demonstrate that the learning-based cost function approach of DeepGame-TP outperforms the traditional approach, where the desired speed is fixed at the maximum speed, especially in intersection and overtaking scenarios. DeepGame-TP enables the ego vehicle to adapt to observed behaviors, understanding the longitudinal aggressiveness of agents and adjusting their cost functions accordingly. As a result, DeepGame-TP offers a transparent approach to trajectory planning in highly interactive scenarios such as intersections, overtaking, and merging, using deep learning to model the agents' cost function. Simulation campaigns demonstrate the approach's flexibility, as it is not restricted to any specific topology, and its potential for real-time application, with the ability to handle scenarios involving up to three agents within 150 milliseconds. Further advancements could be made by exploring the use of learning-based models to enhance understanding and modeling within game-theoretic frameworks.

REFERENCES
[1] T. Räuker, A. Ho, S. Casper, and D. Hadfield-Menell, "Toward transparent AI: A survey on interpreting the inner structures of deep neural networks," 2023. [Online]. Available: https://arxiv.org/abs/2207.13243
[2] U.S. Department of Transportation Federal Highway Administration, "Next Generation Simulation (NGSIM) Vehicle Trajectories and Supporting Data," [Dataset], provided by ITS DataHub through data.transportation.gov, 2016. [Online]. Available: http://doi.org/10.21949/1504477
[3] K. R. Williams, R. Schlossman, D. Whitten, J. Ingram, S. Musuvathy, J. Pagan, K. A. Williams, S. Green, A. Patel, A. Mazumdar, and J. Parish, "Trajectory planning with deep reinforcement learning in high-level action spaces," IEEE Transactions on Aerospace and Electronic Systems, vol. 59, no. 3, pp. 2513–2529, 2023.
[4] E. Zhang, R. Zhang, and N. Masoud, "Predictive trajectory planning for autonomous vehicles at intersections using reinforcement learning," Transportation Research Part C: Emerging Technologies, vol. 149, p. 104063, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0968090X23000529
[5] Z. Wang, J. Tu, and C. Chen, "Reinforcement learning based trajectory planning for autonomous vehicles," 2021 China Automation Congress (CAC), pp. 7995–8000, 2021.
[6] Z. Li, K. You, J. Sun, and G. Wang, "Informative trajectory planning using reinforcement learning for minimum-time exploration of spatiotemporal fields," IEEE Transactions on Neural Networks and Learning Systems, pp. 1–11, 2023.
[7] M. Zare, P. M. Kebria, A. Khosravi, and S. Nahavandi, "A survey of imitation learning: Algorithms, recent developments, and challenges," 2023. [Online]. Available: https://arxiv.org/abs/2309.02473
[8] L. Le Mero, D. Yi, M. Dianati, and A. Mouzakitis, "A survey on imitation learning techniques for end-to-end autonomous vehicles," IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 9, pp. 14128–14147, 2022.
[9] F. Codevilla, E. Santana, A. Lopez, and A. Gaidon, "Exploring the limitations of behavior cloning for autonomous driving," in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 9328–9337.
[10] T. Phan-Minh, F. Howington, T.-S. Chu, M. S. Tomov, R. E. Beaudoin, S. U. Lee, N. Li, C. Dicle, S. Findler, F. Suarez-Ruiz, B. Yang, S. Omari, and E. M. Wolff, "DriveIRL: Drive in real life with inverse reinforcement learning," in 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023, pp. 1544–1550.
[11] Z. Huang, H. Liu, J. Wu, and C. Lv, "Conditional predictive behavior planning with inverse reinforcement learning for human-like autonomous driving," IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 7, pp. 7244–7258, 2023.
[12] Z. Zhao, Z. Wang, K. Han, R. Gupta, P. Tiwari, G. Wu, and M. J. Barth, "Personalized car following for autonomous driving with inverse reinforcement learning," in 2022 International Conference on Robotics and Automation (ICRA), 2022, pp. 2891–2897.
[13] G. C. Karl Couto and E. A. Antonelo, "Generative adversarial imitation learning for end-to-end autonomous driving on urban environments," in 2021 IEEE Symposium Series on Computational Intelligence (SSCI), 2021, pp. 1–7.
[14] R. Bhattacharyya, B. Wulfe, D. J. Phillips, A. Kuefler, J. Morton, R. Senanayake, and M. J. Kochenderfer, "Modeling human driving behavior through generative adversarial imitation learning," IEEE Transactions on Intelligent Transportation Systems, vol. 24, no. 3, pp. 2874–2887, 2023.
[15] A. Jamgochian, E. Buehrle, J. Fischer, and M. J. Kochenderfer, "SHAIL: Safety-aware hierarchical adversarial imitation learning for autonomous driving in urban environments," in 2023 IEEE International Conference on Robotics and Automation (ICRA), 2023, pp. 1530–1536.

[16] A. Plebe, H. Svensson, S. Mahmoud, and M. Da Lio, "Human-inspired autonomous driving: A survey," Cognitive Systems Research, vol. 83, p. 101169, 2024. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1389041723001031
[17] C.-C. Hsu, L.-W. Kang, S.-Y. Chen, I.-S. Wang, C.-H. Hong, and C.-Y. Chang, "Deep learning-based vehicle trajectory prediction based on generative adversarial network for autonomous driving applications," Multimedia Tools and Applications, vol. 82, Sep. 2022.
[18] L. Zhao, Y. Liu, A. Y. Al-Dubai, A. Y. Zomaya, G. Min, and A. Hawbani, "A novel generation-adversarial-network-based vehicle trajectory prediction method for intelligent vehicular networks," IEEE Internet of Things Journal, vol. 8, no. 3, pp. 2066–2077, 2021.
[19] X. Chen, J. Xu, R. Zhou, W. Chen, J. Fang, and C. Liu, "TrajVAE: A variational autoencoder model for trajectory generation," Neurocomputing, vol. 428, pp. 332–339, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0925231220312017
[20] Y. Choi, R. C. Mercurius, S. M. A. Shabestary, and A. Rasouli, "DICE: Diverse diffusion model with scoring for trajectory prediction," 2023. [Online]. Available: https://arxiv.org/abs/2310.14570
[21] I. Bae, Y.-J. Park, and H.-G. Jeon, "SingularTrajectory: Universal trajectory predictor using diffusion model," 2024. [Online]. Available: https://arxiv.org/abs/2403.18452
[22] K. Chen, X. Chen, Z. Yu, M. Zhu, and H. Yang, "EquiDiff: A conditional equivariant diffusion model for trajectory prediction," 2023. [Online]. Available: https://arxiv.org/abs/2308.06564
[23] S. Le Cleac'h, M. Schwager, and Z. Manchester, "ALGAMES: a fast augmented Lagrangian solver for constrained dynamic games," Autonomous Robots, vol. 46, no. 1, pp. 201–215, Jan. 2022. [Online]. Available: https://doi.org/10.1007/s10514-021-10024-7
[24] W. Schwarting, A. Pierson, J. Alonso-Mora, S. Karaman, and D. Rus, "Social behavior for autonomous vehicles," Proceedings of the National Academy of Sciences, vol. 116, no. 50, pp. 24972–24978, 2019. [Online]. Available: https://www.pnas.org/doi/abs/10.1073/pnas.1820676116
[25] M. Bhatt, Y. Jia, and N. Mehr, "Efficient constrained multi-agent trajectory optimization using dynamic potential games," pp. 7303–7310, Jan. 2023. [Online]. Available: https://doi.org/10.1109/IROS55552.2023.10342328
[26] S. Le Cleac'h, M. Schwager, and Z. Manchester, "LUCIDGames: Online unscented inverse dynamic games for adaptive trajectory prediction and planning," IEEE Robotics and Automation Letters, pp. 1–1, Apr. 2021.
[27] F. Facchinei and C. Kanzow, "Generalized Nash equilibrium problems," Annals of Operations Research, vol. 175, no. 1, pp. 177–211, Mar. 2010. [Online]. Available: https://doi.org/10.1007/s10479-009-0653-x
[28] N. Andrei, The Trust-Region Method. Cham: Springer International Publishing, 2022, pp. 331–353. [Online]. Available: https://doi.org/10.1007/978-3-031-08720-28
[29] G. Lucente, M. S. Maarssoe, I. Kahl, and J. Schindler, "Deep learning algorithms for longitudinal driving behavior prediction: A comparative analysis of convolutional neural network and long–short-term memory models," SAE International Journal of Connected and Automated Vehicles, 2024. [Online]. Available: https://doi.org/10.4271/12-07-04-0025

Giovanni Lucente received the B.S. degree in mechanical engineering from Politecnico di Milano, Italy, in 2017 and completed his M.S. degree in mechanical engineering from Politecnico di Milano in 2020. With previous working experience in automotive companies, he is currently a Ph.D. student at the Institute of Transportation Systems of the German Aerospace Center (DLR) in Braunschweig. His research interests revolve around decision-making processes and cooperation between automated and connected vehicles in urban traffic environments.

Mikkel Skov Maarssoe was born in Odense, Denmark, in 1997. He received a B.S. degree in robotics engineering in 2021 and an M.S. degree in robotics engineering focused on advanced robotics technology in 2023 at the University of Southern Denmark (SDU) in Odense, Denmark. Between 2019 and 2023 he worked as an ambassador for the robotics engineering education at SDU, and has among other teaching roles been an instructor in the course embodied artificial intelligence in 2022 at SDU. He is currently working as a researcher in autonomous driving at the German Aerospace Center (DLR), with a focus on computer science, control engineering, and software development.

Anas Abulehia received a Bachelor's degree in Mechatronics Engineering from Palestine Polytechnic University, Palestinian Territories, in 2016. He later earned a Master of Engineering in Embedded Systems from Fachhochschule Dortmund in 2023. With extensive experience as a mechatronics engineer, he is currently pursuing a Ph.D. at the Institute of Transportation Systems of the German Aerospace Center (DLR) in Braunschweig. His research interests include autonomous systems, vehicles and connected vehicles, and transportation.

Sanath Himasekhar Konthala received a B.S. degree in mechanical engineering in 2019 and an M.S. degree in automotive engineering from FH Aachen in 2023. He is currently a research assistant at the German Aerospace Center (DLR), focusing on autonomous driving. His research interests include trajectory planning and control, as well as cooperative automated driving.

Prof. Dr.-Ing. Reza Dariani is currently a Professor for Signals and Systems at Hochschule Merseburg. He received his B.S. degree in Power Electrical Engineering in 2008 from the University of Saveh, Iran. In 2010, he obtained an M.S. degree in Electronic, Electrical and Automatic Engineering from the University of Reims, France, followed by another M.S. degree in Mechatronics from the University of Strasbourg, France, in 2011. He completed his Ph.D. in Trajectory Planning for Autonomous Vehicles at Otto-von-Guericke Universität Magdeburg, Germany. Since 2016, he has been working as a researcher at the Institute of Transportation Systems of the German Aerospace Center in Braunschweig, Germany.

Julian Schindler has been working since 2006 at the Institute of Transportation Systems at the German Aerospace Center (DLR-ITS) in Braunschweig, Germany. As a computer scientist, he was first responsible for the software architecture of several driving simulators. Working on the ergonomic design of vehicle automation functions and the cooperation between driver and vehicle, he participated in several national and international projects, such as CityMobil (FP6), interactIVe, HAVEit and ISiPADAS (FP7). Since cooperation afterwards also included other agents, e.g. infrastructure or VRUs, he focused on that topic, coordinating e.g. H2020-MAVEN and H2020-TransAID. Since 2017, he has also been leading the group "System-Automation & Integration" at DLR-ITS.
