A Survey On Trajectory-Prediction Methods For Autonomous Driving

This article provides a comprehensive review of trajectory prediction methods for autonomous vehicles over the last two decades. It begins by explaining the importance of trajectory prediction for autonomous vehicle safety and outlines the problem formulation. The review then classifies and elaborately introduces popular trajectory prediction methods including physics-based methods, classic machine learning methods, deep learning methods, and reinforcement learning methods. It evaluates the performance of each type of method and discusses potential future research directions.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIV.2022.3167103, IEEE Transactions on Intelligent Vehicles.

A Survey on Trajectory-Prediction Methods for Autonomous Driving

Yanjun Huang, Jiatong Du, Ziru Yang, Zewei Zhou, Lin Zhang, Hong Chen∗, Senior Member, IEEE

Abstract: To drive safely in a dynamic environment, autonomous vehicles should be able to predict the future states of nearby traffic participants, especially surrounding vehicles, much as human drivers do through predictive driving. For this reason, researchers have devoted themselves to the field of trajectory prediction and proposed different methods. This paper provides a comprehensive and comparative review of trajectory-prediction methods proposed over the last two decades for autonomous driving. It starts with the problem formulation and algorithm classification. Then, the popular methods based on physics, classic machine learning, deep learning, and reinforcement learning are introduced and analyzed in detail. Finally, this paper evaluates the performance of each kind of method and outlines potential research directions to guide readers in this field.

Index Terms: Autonomous driving, trajectory prediction, machine learning, deep learning, reinforcement learning

Fig. 1: The impact of trajectory prediction. (Diagram labels: Perception, Prediction, The predicted trajectory, Potential Dangers, Planning and Control.)

Authors Affiliation: School of Automotive Studies, Tongji University, Shanghai 201804, China. Corresponding author: Hong Chen (email: chen-[email protected]).

2379-8858 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

Autonomous driving is attracting increasing attention from both academia and industry [1] because of its promise to solve many long-standing transportation challenges related to safety, congestion, energy saving, and so on [2], [3]. In recent years, we have witnessed the rapid development of perception, planning, and control systems for autonomous vehicles (AVs). However, mass production of AVs will become a reality only if the safety of autonomous driving is verified. To further improve safety, one of the key technologies is for AVs to predict the future states of the surrounding environment in real time, as human drivers do.

When a human drives a vehicle, he or she usually observes the surrounding traffic participants and predicts their future states before initiating a new driving maneuver, e.g., acceleration or lane change. Future states of traffic participants can be represented by future trajectories, which are used to detect potential dangers in advance and to design decision-making or planning algorithms, as shown in Fig. 1. However, accurate prediction is difficult because of the diverse maneuvers of traffic participants, the complex interactions between traffic participants and environments, the uncertainty of sensory information, and the computational burden and computing-time requirements of AVs. How to accurately predict the future trajectories of traffic participants is therefore drawing much attention and has become one of the key points for improving the safety of autonomous driving.

Many researchers are devoted to the field of trajectory prediction and have proposed a number of useful methods. Several review papers have discussed parts of the trajectory-prediction literature. Lefèvre et al. [4] present a survey of methods for motion prediction and risk assessment for AVs published before 2014; most of these methods are classical but out of date. Mohammad et al. [5] review behavior-prediction methods at intersections based on drivers' maneuvers. A review of deep learning-based approaches focusing on vehicle behavior analysis was presented in 2019 by Mozaffari et al. [6]; it describes different criteria for classifying only a part of the popular methods based on input and output information, and it does not cover some of the latest published methods. Two recent publications [7], [8] similarly focus on trajectory prediction for AVs, but Ref. [7] reviews tracking and trajectory prediction and only covers methods using deep learning and methods using stochastic techniques, and Ref. [8] only presents deep learning methods. Other surveys [9], [10] use vision information to detect anomalous behavior, and Refs. [11], [12] survey human motion prediction, which is clearly different from the topic of this study.

Thus, this survey comprehensively reviews trajectory-prediction methods for AVs proposed over the last two decades. We select representative heuristic and state-of-the-art trajectory-prediction methods from this period to compare and summarize. Note that the historical trajectory information used in prediction methods can be obtained from the perception system [13] and from vehicle-to-everything (V2X) communication [14], and vision-based methods are not the focus of this review. Since traffic participants, e.g., surrounding vehicles, directly impact the ego vehicle, this paper mainly focuses on trajectory-prediction methods for vehicles. As shown in Fig. 2, this paper reviews physics-based methods, classic machine learning-based methods, deep learning-based methods, and reinforcement learning-based methods, respectively.

Fig. 2: The taxonomy of trajectory-prediction models for AVs. (Input: contextual factors, i.e., physics-related, road-related, and interaction-related factors. Output types: unimodal trajectory, multimodal trajectory, and intention. Methods: physics-based (Single Trajectory, Kalman Filtering, Monte Carlo); classic machine learning-based (Gaussian Process, Support Vector Machine, Hidden Markov Model, Dynamic Bayesian Network); deep learning-based (Sequential Network, GNN, Generative Model); reinforcement learning-based (IRL, GAIL, DIRL).)

The main contributions of this work can be summarized as follows:
1) The popular trajectory-prediction methods for AVs based on physics, classic machine learning, deep learning, and reinforcement learning are reviewed in detail.
2) The metrics and datasets for evaluating the performance of methods are summarized in detail.
3) The pros and cons of each method are discussed, and potential research directions are outlined.

The rest of this paper is arranged as follows: In Section II, the problem of trajectory prediction is described and the methods used are classified according to different criteria. Sections III, IV, V, and VI review physics-based methods, classic machine learning-based methods, deep learning-based methods, and reinforcement learning-based methods, respectively. Section VII summarizes the datasets and metrics for trajectory prediction and compares some methods based on the NGSIM dataset. Section VIII summarizes the pros and cons of each method and puts forward some possible future directions. The key conclusions are presented in Section IX.

II. PROBLEM FORMULATION AND CLASSIFICATION OF TRAJECTORY-PREDICTION METHODS

In this section, the problem of trajectory prediction is described and the existing methods are classified based on different criteria.

A. Problem Formulation of Trajectory Prediction

Trajectory-prediction problems can be expressed as using the past states of traffic participants in a given scene to estimate their future states. The historical states of traffic participants, e.g., vehicles, observed by the AVs or roadside units are

$$X = \{p^{1}, p^{2}, \cdots, p^{t_h}\}, \tag{1}$$

where $p^{t}$ ($t \in \{1, 2, \ldots, t_h\}$) represents the states at time step $t$; $t_h$ is the length of the historical trajectory, and $p^{t_h}$ denotes the states of traffic vehicles at the current time. In most trajectory-prediction methods, $p^{t}$ only contains the coordinate information of the vehicles, defined as:

$$p^{t} = \{x_0^{t}, y_0^{t}, x_1^{t}, y_1^{t}, \cdots, x_n^{t}, y_n^{t}\}, \tag{2}$$

where $n$ is the number of traffic vehicles detected by the ego vehicle, and $(x_j^{t}, y_j^{t})$ are the coordinates of vehicle $j$ at time step $t$. $X$ is the input of the prediction model, and the vehicle trajectory over a horizon of $t_f$ time steps is predicted. For other methods, $p^{t}$ may also contain information such as velocity, acceleration, orientation, etc. The output of the model is defined as:

$$Y = \{p^{t_h+1}, p^{t_h+2}, \cdots, p^{t_h+t_f}\}. \tag{3}$$

Regard the trajectory-prediction model as a function $F$. Some methods directly output future trajectories, that is, $Y = F(X)$. Others generate intermediate results $M$, from which $Y$ is generated: $M = F_1(X)$, $Y = F_2(M)$. Note that $M$ can be maneuvers generated by maneuver-based methods, reward functions generated by reinforcement learning-based methods, etc.

B. Classification of Trajectory-Prediction Methods

The classification of trajectory-prediction methods for AVs, together with their inputs and outputs, is shown in Fig. 2. In addition, Fig. 3 shows the input and output factors of trajectory prediction.

1) Prediction Methods: According to their modeling approaches, the prediction methods of the last two decades can be divided into four groups: physics-based methods, classic machine learning-based methods, deep learning-based methods, and reinforcement learning-based methods, as shown in Fig. 2.

2) Contextual Factors: Since trajectory-prediction methods usually need to model the future states of traffic participants based on their historical trajectories in the current environment, some factors should be considered. This study divides these factors into three categories: physics-related factors, road-related factors, and interaction-related factors.
a) Physics-related factors refer to the dynamics and kinematics factors of vehicles.
b) Road-related factors include the modeling of the map information and the corresponding traffic rules.
c) Interaction-related factors include the social regulations and the inter-dependencies between vehicles' maneuvers.

3) Output Types: Trajectory-prediction methods need to provide the future trajectories of traffic participants, which can be unimodal or multimodal. In addition, some methods also provide the behavior intention of traffic participants.

Fig. 3: The input and output factors of trajectory prediction. (Scene annotations. Contextual factors: physics-related factors (dynamics and kinematics factors); road-related factors (lane information, traffic light signal, traffic rules, road conditions); interaction-related factors (social regulations, inter-dependencies between vehicles). Output types: unimodal trajectory, multimodal trajectory, intention.)
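As an illustration of the notation in Section II-A, the sketch below lays out $X$ and $Y$ as arrays and contrasts a one-stage model $Y = F(X)$ with a two-stage model $M = F_1(X)$, $Y = F_2(M)$. It is only a minimal sketch under stated assumptions: the array layout `(time, vehicle, xy)`, the function names, the lateral threshold, and the toy extrapolation rules are hypothetical and are not taken from any surveyed method. The toy second stage also reads the last observed positions from $X$, which the compact notation $Y = F_2(M)$ leaves implicit.

```python
import numpy as np

def split_track(track, t_h, t_f):
    """Split a (T, n+1, 2) array of (x, y) positions into history X and future Y."""
    X = track[:t_h]                 # p^1 ... p^{t_h}
    Y = track[t_h:t_h + t_f]        # p^{t_h+1} ... p^{t_h+t_f}
    return X, Y

def f_direct(X, t_f):
    """One-stage toy model Y = F(X): extrapolate the last observed displacement."""
    step = X[-1] - X[-2]                        # (n+1, 2)
    k = np.arange(1, t_f + 1)[:, None, None]    # (t_f, 1, 1)
    return X[-1] + k * step                     # (t_f, n+1, 2)

def f1_maneuver(X, threshold=0.5):
    """First stage M = F1(X): -1 / 0 / +1 per vehicle from net lateral (y) motion.
    The threshold (in the units of y) is an assumed value."""
    dy = X[-1, :, 1] - X[0, :, 1]
    return np.sign(dy) * (np.abs(dy) > threshold)

def f2_trajectory(X, M, t_f, lateral_step=0.2):
    """Second stage Y = F2(M): keep the last longitudinal step and
    shift laterally by an assumed fixed amount per step, signed by the maneuver."""
    k = np.arange(1, t_f + 1)[:, None]                   # (t_f, 1)
    x = X[-1, :, 0] + k * (X[-1, :, 0] - X[-2, :, 0])    # (t_f, n+1)
    y = X[-1, :, 1] + k * lateral_step * M               # (t_f, n+1)
    return np.stack([x, y], axis=-1)                     # (t_f, n+1, 2)

# Two vehicles over 8 time steps: one drives straight, one drifts laterally.
t = np.arange(8.0)
track = np.zeros((8, 2, 2))
track[:, 0, 0] = t
track[:, 1, 0] = 2.0 * t
track[:, 1, 1] = 0.3 * t

X, Y = split_track(track, t_h=5, t_f=3)
Y_direct = f_direct(X, t_f=3)
M = f1_maneuver(X)
Y_staged = f2_trajectory(X, M, t_f=3)
```

Both variants return an array with the same layout as $Y$, so they can be compared against the ground-truth future with the same metric.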

Therefore, prediction algorithms can be divided into the following three categories according to the output type.
a) Unimodal trajectory. Prediction methods output one future trajectory for a single traffic participant or for each of multiple traffic participants.
b) Multimodal trajectory. Prediction methods generate multimodal future trajectories for traffic participants, together with the probability of each future trajectory.
c) Intention. Prediction methods produce behavior intentions to assist in prediction. Intention can be part of the final output, or just an intermediate step in the method.

In the following sections, we introduce different prediction methods and analyze them according to the above classification approaches.

III. PHYSICS-BASED METHODS

The physics-based methods employ the vehicle's dynamics or kinematics models. Typically, they include the Single Trajectory methods, the Kalman Filtering methods, and the Monte Carlo methods, as shown in Fig. 4.

A. Physics Models

Physics models comprise dynamics models and kinematics models. Dynamics models can become very complex and include many inherent parameters, but complex dynamics models bring only small gains in predictive accuracy while introducing an extra computational burden, so a simple dynamics model is preferred for trajectory prediction. In the prediction task, the vehicle is usually regarded as a bicycle model driven by the front wheels [17]–[19].

Thanks to their simple structure, kinematics models are used more often than dynamics models. The commonly used ones include the Constant Velocity (CV) and Constant Acceleration (CA) models [15], [20], [21], the Constant Turn Rate and Velocity (CTRV) and Constant Turn Rate and Acceleration (CTRA) models [22], [23], the Constant Steering Angle and Velocity (CSAV) and Constant Steering Angle and Acceleration (CSAA) models [24], etc.

B. Single Trajectory Methods

A simple method to predict a vehicle's trajectory is to directly apply the vehicle's current state to the physics model. This method applies to both the dynamics model [17]–[19], [25] and the kinematics model [22], [26], [27]. In [25], the linear bicycle model is used for collision avoidance, while Lytrivis et al. [22] and Miller et al. [26] use the CTRA model and the CV model, respectively. The advantage of this method lies in its high computational efficiency, and it is suitable for less constrained applications. However, it cannot consider road-related factors or the uncertainty of the current state, which makes it unreliable for long-term prediction.

C. Kalman Filtering Methods

Single Trajectory methods assume that the states of vehicles are perfectly known and free of noise. In contrast, the Kalman Filtering (KF) methods can handle such noise: they model the uncertainty of the vehicle's current state and of its physics model by a Gaussian distribution [28]. The prediction and update steps are combined into a loop, so that the mean and covariance matrix of the vehicle state can be obtained for each future time step, which yields an average trajectory with its associated uncertainty [15], [24].

Fig. 4: Illustrations of physics-based methods: (a) the Single Trajectory methods, (b) the Kalman Filtering methods (based on [15]), (c) the Monte Carlo methods (based on [16]). (Panel labels: states $S_{t-n}$, $S_t$, $S_{t+1}$, $S_{t+m}$; transition $f(S_t) + w(t)$; potential future trajectories.)

Compared with Single Trajectory methods, KF methods have the advantage of considering the uncertainty of the predicted trajectory. However, a unimodal Gaussian distribution is not enough to represent different maneuvers, so Kaempchen et al. [28] propose the Interacting Multiple Model (IMM) to output multimodal trajectories. The Switched Kalman Filter (SKF) [29] relies on a set of Kalman filters that describe the physical models of the vehicle and switches between them [28], [30]. Zhang et al. [31] propose a method based on vehicle-to-vehicle communication and KF, enabling the host vehicle to predict the trajectories of remote vehicles for obstacle avoidance. Recently, Lefkopoulos et al. [32] present a novel method called Interacting Multiple Model Kalman Filter (IMM-KF), which takes interaction-related factors into consideration. The proposed method uses the physics-based model to predict the trajectories of traffic participants for multiple seconds.

D. Monte Carlo Methods

In general, an analytical expression for the predicted state distribution is unknown unless the model is assumed to be linear or Gaussian. The Monte Carlo method can approximate the state distribution by simulation. It randomly samples the input variables and applies the physics model to generate potential future trajectories. To ensure the feasibility of a maneuver, the generated trajectory samples can be filtered so that only those with a lateral acceleration below the allowable lateral acceleration are kept [16], or the vehicle's physical limitations can be considered in the physics model so that the input of the model is more realistic [33]. The Monte Carlo method can be used to predict the trajectories of traffic participants from a completely known state or from an uncertain state estimated by a filtering algorithm. Okamoto et al. [34] present a maneuver-based model that applies the Monte Carlo method to predict future trajectories from the identified maneuver. Wang et al. [35] use the Monte Carlo method to predict trajectories and utilize MPC to optimize the reference trajectories.

E. Summary

Physics-based methods utilize physics models to accomplish trajectory prediction with relatively low computational resources. Based on the classification approaches in Sections II-B2 and II-B3, this paper classifies physics-based methods as shown in Table I. Physics-based methods are the first and simplest methods used by researchers. Although the accuracy of these methods is relatively low, more and more models use the idea of physics-based models to improve accuracy. Physics-based methods produce more accurate results when the movement of vehicles can be accurately described by kinematics or dynamics models, but the physical model of a traffic participant is constantly changing, so most of these methods are only suitable for short-term prediction (no more than 1 s). The use of one or more physical models can obtain the future trajectories of traffic participants quickly, but the choice of physical model and the switching between models introduce an obvious prediction error. One way to address this problem is to take interaction-related factors into consideration, as in IMM-KF [32]. To reach the state-of-the-art level, physics-based methods may need to be combined with learning-based methods, such as Ref. [36], which uses a learning-based discriminator to extract interaction information and generate model-based trajectories.

TABLE I: Summary of physics-based methods.

| Category | Item | Single Trajectory Methods | Kalman Filtering Methods | Monte Carlo Methods |
| --- | --- | --- | --- | --- |
| Contextual Factors | Physics-related Factors | [17]–[19], [22], [25]–[27] | [15], [24], [28]–[32] | [16], [33]–[35] |
| Contextual Factors | Road-related Factors | [22] | - | [35] |
| Contextual Factors | Interaction-related Factors | - | [32] | [34] |
| Output Types | Unimodal Trajectory | [17]–[19], [22], [25]–[27] | [15], [24], [31] | - |
| Output Types | Multimodal Trajectory | - | [28]–[30], [32] | [16], [33]–[35] |
| Output Types | Intention | - | [32] | [34] |

IV. CLASSIC MACHINE LEARNING-BASED METHODS

Unlike physics-based methods, which use several physics models, machine learning-based methods apply data-driven models to predict trajectories. According to the literature, classic machine learning-based methods for trajectory prediction of AVs include Gaussian Process (GP), Support Vector Machine (SVM), Hidden Markov Model (HMM), Dynamic Bayesian Network (DBN), K-Nearest Neighbors (KNN), Decision Tree, and so on. Since the most commonly used classic machine learning methods are GP, SVM, HMM, and DBN, this section mainly introduces these methods.

A. Gaussian Process

The prototype trajectory method is one of the maneuver-based methods; it divides vehicles' trajectories into a collection of several types of prototype trajectories. The model measures the similarity between the historical trajectory and the prototype set to predict the possible trajectory. Gaussian Process (GP) [37] is an effective tool for the prototype trajectory method [38]–[40].

When GP is applied to trajectory prediction, trajectories are regarded as samples of a GP, sampled along the time axis. Each sample is represented by N discrete points and is thus mapped to an N-dimensional space, where it is assumed to follow an N-dimensional Gaussian distribution. Therefore, the main task of the GP model at the modeling stage is to determine the parameters of the GP from the samples. In [41], HMM is used to estimate likely behaviors, and then GP is employed to predict the trajectories. GP can also be used to model interaction-related factors: Trautman et al. [42] use GP for joint collision avoidance to solve the frozen robot problem. Guo et al. [43] apply GP and the Dirichlet process (DP) to define motion processes and apply a non-parametric Bayesian network to extract potential motion patterns.

For methods based on prototype trajectories, each trajectory can be represented by the prototype set through training. Therefore, the main difference between these methods is how the prototype trajectories are constructed. Govea et al. [44] obtain the prototype trajectories by statistically calculating the mean and the variance of all trajectory samples. Hermes et al. [45] divide the sample trajectories into several subsets and obtain several prototype trajectories after training to reflect changes in vehicle movement. However, it is difficult to generalize these models to other scenes because methods based on trajectory samples are only trained for specific scenarios.

B. Support Vector Machine

Support Vector Machine (SVM) can learn and recognize a driver's maneuver in a complex environment. The key of SVM is to find the support vectors that meet the classification requirements and to determine the optimal hyperplane, which maximizes the margin between the classified data. When applied to the trajectory-prediction problem, driving maneuvers are usually divided into several categories: turning left, turning right, keeping straight, etc. SVM then uses a kernel function to map the input data to a high-dimensional space and performs linear classification in that space to find the driving maneuver and thereby predict the trajectory.

Mandalia et al. [46] first apply SVM to identify lane-changing maneuvers, using characteristics such as steering wheel angle, position, and acceleration for identification. Since SVM can output classification probabilities, Kumar et al. [47] propose a layered architecture combining SVM and Bayesian filtering to identify lane-changing maneuvers and obtain more accurate identification results. In [48], [49], SVM is used to identify the maneuvers of traffic participants. Accordingly, SVM can identify vehicles' maneuvers, but the maneuvers need to be defined in advance, and the preset maneuvers will also impact the final prediction results.

C. Hidden Markov Model

SVM is effective in classification problems, but it is not as effective as the Hidden Markov Model (HMM) in trajectory prediction. HMM is one of the most popular classic machine learning-based trajectory-prediction methods. HMM is also a maneuver-based method and builds on the Markov chain. A Markov chain is a process over a finite set of states in which the state of the system at time t + 1 is only related to the state at the previous time t, and the state transition probability does not depend on time. The mathematical expression is:

$$P(S_{n+1} = s \mid S_1 = s_1, S_2 = s_2, \cdots, S_n = s_n) = P(S_{n+1} = s \mid S_n = s_n). \tag{4}$$

In real life, we can only observe the states that are exposed on the surface, while the underlying hidden states have no direct representation. Therefore, it is necessary to establish a Markov process with hidden states and to infer the essential states of events from the observable states, which are related to the hidden states through probabilities. This is the so-called Hidden Markov Model. HMM is represented by (S, O, A, B, π) [50], as shown in Fig. 5:
• S = {S_1, S_2, · · · , S_N} represents the hidden state sequence.
• O = {O_1, O_2, · · · , O_M} represents the observation sequence.
• A represents the transition probability matrix between hidden states.
• B is the output matrix, representing the probability of each hidden state producing each output (observed) state.
• π is the initial probability matrix, representing the initial probability distribution over the hidden states.

When HMM is used in trajectory prediction, the historical states of traffic participants are represented by the observation sequence O, and HMM solves for the most likely future observation sequence. Holger et al. [52] use the steering angle and global coordinates as the input of HMM to predict the driver's maneuvers. Based on HMM, Qiao et al. [53] propose an algorithm called HMTP* that selects parameters adaptively to simulate real scenes with dynamically changing speed. In [51], HMM combined with Fuzzy Logic is used for driver maneuver prediction. Besides, HMM can be integrated into decision-making and planning systems. In [54], HMM is used for trajectory prediction and risk assessment, and the results are fed into the decision-making and planning system.

Fig. 5: Illustration of Hidden Markov Model (based on [51]). (Hidden states S_1, S_2, S_3 connected by transition probabilities a_ij; observations O_1, O_2, ..., O_M linked to the hidden states by output probabilities b_ij.)
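To make the roles of A, B, and π concrete, the sketch below filters an observation sequence with the forward algorithm and then propagates the resulting belief one step ahead to obtain a distribution over the next observation. It is a minimal sketch with assumed values: the two hidden states (read here as "lane keeping" and "lane changing"), the two observation symbols ("small" and "large" lateral motion), and every probability in A, B, and π are hypothetical and chosen only for illustration.

```python
import numpy as np

def forward_filter(obs, A, B, pi):
    """Forward algorithm: P(hidden state at the last step | all observations)."""
    alpha = pi * B[:, obs[0]]
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # transition, then weight by the observation
        alpha = alpha / alpha.sum()     # normalize to keep a proper distribution
    return alpha

def predict_next_observation(obs, A, B, pi):
    """Distribution over the next observation symbol, given the history."""
    belief = forward_filter(obs, A, B, pi)
    next_state = belief @ A             # Markov property: one-step transition
    return next_state @ B               # marginalize over the hidden states

# Hypothetical model: state 0 = lane keeping, state 1 = lane changing;
# observation 0 = small lateral motion, observation 1 = large lateral motion.
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.8, 0.2],
              [0.3, 0.7]])
pi = np.array([0.5, 0.5])

obs = [1, 1, 1]                         # three steps of large lateral motion
belief = forward_filter(obs, A, B, pi)
next_obs = predict_next_observation(obs, A, B, pi)
```

With these assumed parameters, repeated observations of large lateral motion shift the belief toward the lane-changing state, and the predicted next observation follows that belief.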

Although traditional HMM methods have achieved great success in predicting drivers' maneuvers, they do not consider the impact of interaction-related factors in the prediction process, so their prediction results are not accurate enough in actual traffic scenes. Deo et al. [55] propose a vehicle trajectory-prediction model based on HMM and Variational Gaussian Mixture Models (GMM) that considers interaction-related factors. The vehicle interaction information is obtained by finding the optimal solution of an energy function. Zhang et al. [56] propose a GMM-HMM maneuver-prediction model based on game theory that considers interaction-related factors.

D. Dynamic Bayesian Network

To improve the accuracy of trajectory prediction, the prediction model should consider at least both vehicle states and the interaction effect between traffic participants. The Dynamic Bayesian Network (DBN) mentioned by Koller et al. [57] can model such interactions. DBN is a maneuver-based method that uses the Bayesian Network and considers the time sequence. The basic concepts and probabilistic inference of DBN are the same as those of Bayesian Networks. The difference is that Bayesian Networks describe static systems, while Kevin et al. [58] introduce the concept of time templates to solve timing issues in probabilistic models. A time segment refers to a time template materialized according to DBN, which discretizes continuous time into countable points with a preset time granularity.

Generally, the preset time granularity should be consistent with the actual state acquisition frequency, so DBN is trained with the sensor sampling period as the time segment. Besides, a DBN needs to be converted into a Bayesian Network before the inference and learning methods of Bayesian Networks can be directly applied. Common inference methods for Bayesian Networks include the Variable Elimination Method, the Clique Tree Algorithm, and the Sampling Algorithm. The learning methods for Bayesian Networks include Maximum Likelihood Estimation, Bayesian Estimation, the EM algorithm, etc. Special inference methods also exist for DBNs with high complexity, such as the forward and backward inference method [65].

The architecture of DBN includes a behavior layer, a hidden layer, and an observation layer, as shown in Fig. 6. The behavior layer represents the network's input information, and the observation layer represents the driver's maneuver. Using this architecture, Gindele et al. [59] model the driving maneuvers of multiple vehicles. The input information includes all vehicle states, vehicle interaction relationships, road structures, observation states, etc. Schreier et al. [60] apply DBN to judge driving maneuvers and utilize the kinematics model corresponding to each driving maneuver to predict the trajectory. In [61], the vehicle maneuver is predicted by game theory, and the vehicle motion is then judged by a DBN that considers the interaction-related factors. He et al. [62] use DBN to identify vehicle-following and lane-change maneuvers, and predict the trajectory of the lane-change maneuver. In [63], DBN is designed to consider physics-related factors, road-related factors, and interaction-related factors. Li et al. [64] combine DBN with end-to-end models to predict pedestrian trajectories: DBN is used to extract traffic participants' characteristics and dynamics information, and the end-to-end models treat prediction as a sequential generation problem to generate the predicted trajectory.

Fig. 6: Illustration of Dynamic Bayesian Network (based on [62]). (Behavior layer, hidden layer H, and observation layer, shown for time steps t and t+1; node labels V, ∆V, x, ∆x, d_0, ∅_0, d_1, ∅_1.)

DBN models the effect of interaction between traffic participants when applied to trajectory prediction and performs well among classic machine learning-based methods. As maneuver-based methods, DBN models obtain high recognition performance and have been used in several real-world tests [66]. However, DBN still faces errors in going from recognized maneuvers to generated trajectories. Many methods can only judge two or three maneuvers, such as lane-keeping and lane-changing, and the generalization ability of the model is not strong.

E. Summary

In summary, the classic machine learning-based methods determine the probability distribution by mining data features, and they can be classified as shown in Table II. The classic machine learning-based methods provide new ideas for trajectory prediction, which promote the development of learning-based methods. As more factors are considered, the accuracy of these methods keeps increasing. Most of these methods are maneuver-based methods, which predict future trajectories by first judging maneuvers. In these methods, the maneuvers usually need to be provided or identified in advance.

V. DEEP LEARNING-BASED METHODS

Most traditional prediction methods are only suitable for simple prediction scenes and short-term prediction tasks. Recently, trajectory-prediction methods based on deep learning have become increasingly popular because they can consider not only the physics-related and road-related factors but also the interaction-related factors, and they adapt to more complex scenes. A general description of these methods is shown in Fig. 7. In the following, this paper summarizes the currently popular deep learning-based trajectory-prediction methods for AVs.

A. Sequential Network

The sequential network is used to extract features of the historical trajectory and can also be used as the output layer. Sequential networks for deep learning-based trajectory prediction mainly include the Recurrent Neural Network (RNN), the Convolutional Neural Network (CNN), and the Attention Mechanism (AM).

TABLE II: Summary of classic machine learning-based methods.

| Category | Item | GP | SVM | HMM | DBN |
| --- | --- | --- | --- | --- | --- |
| Contextual Factors | Physics-related Factors | [38]–[45] | [46]–[49] | [41], [50]–[56] | [59]–[64] |
| Contextual Factors | Road-related Factors | [38]–[41], [43] | [46]–[49] | [41], [52], [54]–[56] | [59], [60], [62], [63] |
| Contextual Factors | Interaction-related Factors | [42], [43] | - | [55], [56] | [59], [61]–[64] |
| Output Types | Unimodal Trajectory | - | [46]–[49] | [50], [53] | [61], [62] |
| Output Types | Multimodal Trajectory | [38]–[45] | - | [41], [54]–[56] | [59], [60], [63], [64] |
| Output Types | Intention | [38]–[41], [43] | [46]–[49] | [41], [50]–[53], [55], [56] | [60]–[64] |

[Diagram: Historical Trajectory → Feature extraction and regression → Predicted Trajectory]

Fig. 7: Description of deep learning-based methods.
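The pipeline in Fig. 7 (historical trajectory, feature extraction and regression, predicted trajectory) can be illustrated with a minimal sketch in pure Python. It uses a vanilla recurrent cell as the feature extractor and a linear layer as the regressor. The weights are random and untrained, and all names (`rnn_step`, `predict_trajectory`, `init_params`), the hidden size, and the sample history are hypothetical choices made for illustration only; none of them come from the surveyed methods.

```python
import math
import random


def rnn_step(x, h, W_in, W_h, b):
    """One vanilla RNN step: h_new = tanh(W_in x + W_h h + b)."""
    return [
        math.tanh(
            sum(W_in[i][j] * x[j] for j in range(len(x)))
            + sum(W_h[i][k] * h[k] for k in range(len(h)))
            + b[i]
        )
        for i in range(len(h))
    ]


def predict_trajectory(history, params, horizon):
    """Encode the (x, y) history, then regress per-step offsets for the future."""
    hidden = len(params["b"])
    h = [0.0] * hidden
    for x in history:  # feature extraction over the historical trajectory
        h = rnn_step(x, h, params["W_in"], params["W_h"], params["b"])
    x_last, y_last = history[-1]
    future = []
    for t in range(horizon):  # linear regression of (dx, dy) for each future step
        dx = sum(params["W_out"][2 * t][k] * h[k] for k in range(hidden))
        dy = sum(params["W_out"][2 * t + 1][k] * h[k] for k in range(hidden))
        x_last, y_last = x_last + dx, y_last + dy
        future.append((x_last, y_last))
    return future


def init_params(hidden=8, horizon=5, seed=0):
    """Random, untrained weights (assumption: illustration only)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_in": mat(hidden, 2),
        "W_h": mat(hidden, hidden),
        "b": [0.0] * hidden,
        "W_out": mat(2 * horizon, hidden),
    }


# Hypothetical history of four observed (x, y) positions.
history = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.2), (3.0, 0.3)]
params = init_params()
future = predict_trajectory(history, params, horizon=5)
print(len(future), "predicted positions")
```

In a real model the weights would be learned by minimizing a regression loss against ground-truth future positions, and the vanilla cell would typically be replaced by a gated cell such as LSTM or GRU.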

1) Recurrent Neural Network: Unlike the classic machine learning-based methods and CNN, which effectively process spatial information, RNN is designed to handle temporal information [67], [68]. It stores information from the previous time steps and uses the hidden states together with the input to determine the output, as shown in Fig. 8. In practice, however, when the number of time steps is large, the gradient of the RNN tends to vanish or explode. Gated RNNs, e.g., the Long Short-Term Memory Network (LSTM) and the Gated Recurrent Unit (GRU), can alleviate this problem. Trajectory prediction models using RNN can be divided into single RNN models and multiple RNN models.

Fig. 8: Illustration of Recurrent Neural Network. [Diagram: Historical Trajectory Features → chain of RNN Cells → Future Trajectories.]

A single RNN is utilized for maneuver-based and single-modal trajectory prediction, or applied to other auxiliary models to support more complex functions, such as interaction-aware prediction. LSTM is used in [69]–[71] as a sequence classifier to predict vehicles’ maneuvers. To this end, the LSTM cells extract vehicle features, and the hidden state of the last cell is fed to the output layer to predict maneuvers. In [69], [70], the input is first fed to a fully connected layer to extract features, which are then passed to a three-layer LSTM; in [71], two LSTM layers without embedding are used. Altché et al. [72] use a single-layer LSTM to predict the trajectory of the target vehicle. To predict the maneuver-based trajectory, Ding et al. [73] use an LSTM encoder that encodes the states of the target vehicle to predict its maneuver, and trajectory prediction is achieved by using the predicted maneuver and map information. Finally, based on interaction-related factors, traffic rules (such as red lights), and map information, that work uses nonlinear optimization methods to optimize the initial future trajectory. To predict multi-modal trajectories, Zyner et al. [74] adopt a weighted Gaussian Mixture Model (GMM) whose parameters are obtained by a three-layer encoder-decoder LSTM; the predicted trajectories are then clustered using the mode with the highest probability. Hyeon et al. [75] utilize an encoder-decoder LSTM architecture. The LSTM encoder encodes the historical trajectory features, and the LSTM decoder finds the K most likely future trajectories through the beam search algorithm. Xing et al. [76] predict trajectories for the first vehicle in the fleet, using GMM to distinguish driving styles and an LSTM with a fully connected regression layer to analyze sequence data and driving styles to predict the vehicle trajectory. By calculating the distance between the vehicle and the centerline, Chang et al. [77] propose an LSTM encoder-decoder baseline that takes into account map information and social information, and compare it with the Nearest Neighbor (NN) regression method. Considering lane information, Kawasaki et al. [78] combine LSTM with KF for multi-modal trajectory prediction.

With the development of neural networks, architectures built from several groups of RNNs have become widely used. Two groups of LSTM networks are used by Dai et al. [79] to predict the target vehicle’s trajectory. One group is used to model the trajectory of surrounding vehicles, and the other group is used to model the interaction between surrounding vehicles. Ding et al. [80] present a set of GRU encoders to describe the pairwise interactions between vehicles. Xin et al. [81] use one LSTM to predict the target lane of the target vehicle, and another LSTM to predict the trajectory according to the target vehicle’s state and the predicted target lane. To predict the multi-modal trajectory, Deo et al. [82] propose six different LSTM decoders, each related to one of six specific maneuvers. The encoder LSTM encodes the features of the historical trajectory. A one-hot vector that represents the specific maneuver of the vehicle connects the encoder and the decoder. The decoder LSTM predicts bivariate Gaussian distribution parameters to output the future trajectory, and the model also predicts the probability of each of the six maneuvers. In [83], five RNNs and three fully connected

layers are used to process the input data and output the three coefficients of the cubic polynomial, representing the future trajectory of the target vehicle. Tang et al. [84] use rigorous mathematical modeling from a probabilistic perspective to construct an MFP model with an end-to-end structure. The model contains a group of RNNs that share parameters in parallel, forming a dynamic state encoder based on the attention mechanism. Each encoder RNN represents a vehicle’s trajectory by aggregating the history information and automatically learns multi-modal information through discrete latent codes. Then the decoder RNNs calculate the probability of the multi-modal trajectories to obtain the predicted trajectory. In [85], multi-modal trajectories are generated based on a multi-head attention layer, using LSTM as encoder-decoder with two attention layers in the middle. In [86], a recurrent attention and interaction model is presented to predict trajectories of pedestrians. Zhang et al. [87] propose a multiple LSTM-based framework that combines intention prediction and trajectory prediction. The intention of the vehicle at intersections is predicted by one LSTM model, and the trajectory is predicted by another LSTM-based prior trajectories model.

2) Convolutional Neural Network: Recently, CNN has achieved success in many tasks, such as computer vision [88], [89] and machine translation [90]. Nikhil et al. [91] argue that CNN is better suited than RNN to trajectory prediction because trajectories have strong spatio-temporal continuity. They apply a sequence-to-sequence structure that takes the historical trajectory as input, implements temporal continuity by stacking convolutional layers after a fully connected layer, and outputs the future trajectory through a fully connected layer. Experiments show that this CNN-based network runs faster. The general method of using CNN to process trajectory information is shown in Fig. 9.

Fig. 9: Illustration of Convolutional Neural Network for trajectory prediction. [Diagram: padded Historical Trajectory Features → CNN kernels → Future Trajectories.]

However, most methods applying only the CNN framework use bird’s-eye-view images as the input. In [92], a set of possible future trajectories is generated from the vehicle state (velocity, acceleration, and yaw rate) and the raster image, and the trajectory with the highest probability is selected as the future trajectory by analyzing semantic features. Cui et al. [93] make progress by embedding a kinematic bicycle model into the CNN [94] for trajectory prediction, which also operates on raster maps. In [95], trajectory prediction of vulnerable road users (VRUs) is presented using a novel fast CNN architecture and context rasterization techniques [96].

In addition, some other neural networks are applied to trajectory prediction within the CNN framework. In [97], CNN is applied to the rasterized image, while TCN is used to capture historical trajectory features, which are concatenated with the raster feature and the current state. Zhang et al. [98] employ TCN to predict lane-change maneuvers and trajectories. In [99], a persistent Memory Augmented Neural Network (MANN) is used for trajectory prediction: CNN is applied to understand the scene image, and scene features and trajectory features are processed in the MANN to generate multimodal trajectories. In recent years, new methods have used CNN for trajectory prediction and achieved state-of-the-art results. The first [100] uses CNN to output a heatmap to represent the agent’s possible future. The second [101] introduces the point cloud learning method into trajectory prediction to capture both spatial and temporal information.

3) Convolutional and Recurrent Neural Network: RNN is able to extract temporal features, which makes it well suited to processing time-series information, whereas CNN is capable of extracting spatial features, including the interaction-related factors between traffic participants. This has inspired some researchers to combine RNN and CNN to process the temporal and spatial information for trajectory prediction. Deo et al. [102] use an LSTM encoder to extract the temporal information of surrounding vehicles, and then feed it into a social pooling layer [103] to form a social tensor. In this study, the social pooling layer captures interaction-related factors between vehicles after spatial rasterization, and the social tensor is then sent to a set of CNNs to learn the spatial correlation of vehicles. Finally, six LSTM decoders are used to generate distributions for six specific maneuvers, formed by combining three lateral maneuvers (left lane change, right lane change, and keep lane) with two longitudinal maneuvers (brake, normal speed). The model then finds the maneuver with the highest probability and predicts its future trajectory. Chandra et al. [104] propose a model called TraPHic based on the CNN-LSTM hybrid network to predict traffic participants’ trajectories. The model feeds the state and the surrounding objects of the main vehicle into the CNN-LSTM networks to obtain their features, then connects these features to the LSTM decoder to obtain the predicted trajectory of the main vehicle; however, this algorithm only predicts the trajectory of one object per operation. Xie et al. [105] also use the CNN-LSTM framework. They use "box" to detect and eliminate outliers in the vehicle trajectory to obtain valid trajectory data, which are fed into the convolutional layer and the maximum pooling layer to extract interaction-aware features; these features are then fed into an LSTM and a fully connected layer for prediction. The hyperparameters of the model are optimized by the Grid Search (GS) algorithm.

To better predict the trajectory, researchers introduce High Definition (HD) map information to make the predicted trajectory closer to the real trajectory [106]. HD maps generally include raster maps and vector maps, which contain semantic information about the road and can indicate line segments. Some methods employ CNN to extract the scene context information from the raster maps to take into account the road-related factors and the interaction-related factors. Because methods of using CNN to process raster maps belong to the

category of perception systems, which is not the focus, this paper only briefly explains the process of trajectory prediction after semantic features are obtained from raster maps. Classic algorithms include DESIRE [107], which uses stochastic 1-step policies. Hong et al. [108] encode semantic features with ConvNets to predict vehicle behaviors. Based on ConvNets, Chai et al. [109] find trajectory anchors through unsupervised learning and use GMM and semantic features to train their model. Apart from raster maps, the processing methods for vector maps are presented in Section V-B.

4) Attention Mechanism: The attention mechanism allows humans to use limited attention resources to quickly filter high-value information out of a large amount of information. The attention mechanism (AM) in deep learning mimics this way of thinking and is widely used in various types of deep learning tasks such as Natural Language Processing (NLP), image classification, and speech recognition [110]–[112]. AM is commonly used in the trajectory prediction task [113]–[115]. In [116], multi-head attention is used to extract the lane and vehicle attention to output the distribution of the future trajectories. In [117], AM models the interactions between traffic participants by extracting attention from LSTM encoders, and in [118] each attention head models a possible way of interaction between the target and the combined context features. Vaswani et al. [119] propose the Transformer model, which relies on attention mechanisms to complete sequence machine translation tasks without using RNN. RNN remains constrained by sequential computation, while the attention mechanism can process sequential data in parallel. Since the Transformer model has achieved excellent results in machine translation, researchers apply it to the trajectory prediction task, as shown in Fig. 10. In [120], the sequential step-by-step model of LSTMs and attention-only models, including the Transformer (TF) and the larger Bidirectional Transformer (BERT), are compared for predicting the future trajectories of pedestrians. The comparison shows that the TF-based model performs better, especially in long-term prediction, and can also cope with missing input observations, which are common in real sensor data. In addition to modeling the trajectory sequences, TF can also model the interaction between traffic participants and the environment [121]–[123]. Liu et al. [124] apply stacked transformers as the backbone, which integrate environmental information into trajectory proposals to predict the future trajectory. It can be seen that the transformer-based model has advantages in processing time-series data.

Fig. 10: Illustration of Transformer (based on [120]) for trajectory prediction. [Diagram: history states S1 ... Sh and ground-truth future states Sh+1 ... Sh+f pass through linear layers, positional encoding, and N× stacked blocks of multi-head attention, add & norm, and feed-forward layers; a final linear layer outputs the predicted future states Sh+1 ... Sh+f.]

B. Graph Neural Network

When it comes to prediction methods considering interaction-related factors, each object in the environment can be regarded as a node to form a graph. Although some methods using RNN and CNN have achieved great success in extracting features from Euclidean spatial data, the data in many practical application scenes are generated from non-Euclidean spaces. When classic deep learning-based methods are applied to non-Euclidean spatial data, their performance is still unsatisfactory. Usually, each scene can be viewed as an irregular graph, and each graph has a variable number of unordered nodes, as shown in Fig. 11. The number of adjacent nodes of each node in the graph varies, so some important operations, such as convolution, which are easy to calculate on an image, are no longer suitable for direct use on the graph. Still, each node in the graph has edges related to other nodes. This information can be used to capture the interdependence between objects. Therefore, Graph Neural Network (GNN) is very suitable for vehicle trajectory prediction problems based on interaction-related factors [125]. Diehl et al. [126] confirm this idea. They use two popular graph networks, Graph Convolutional Network (GCN) and Graph Attention Network (GAT), for trajectory prediction based on interaction-related factors and demonstrate their effectiveness.

Fig. 11: Description of Graph Neural Network. [Diagram labels: States of Traffic Participants, GNN operation, States of Targets.]

As for road-related factors, using CNN to process raster maps imposes a heavy computational burden and easily loses information. In contrast, vector maps use polylines with multiple control points and their attributes to represent structured road information. These polylines form groups of vectors that can be used as nodes in GNN, which has been widely used in trajectory prediction. In the following, this paper will introduce

the vehicle trajectory prediction methods based on GNN.

1) Graph Convolutional Network: Graph Convolutional Network (GCN) is the most popular graph neural network method. The graph convolutional network extends the convolution operation from traditional image data processing to graph data processing. The core idea is to learn a mapping function that extracts interaction-aware features from the features of the nodes in the graph and the features of their neighbors.

Among spatial-based graph convolutional networks, a GCN-based trajectory prediction model called GRIP is proposed by Li et al. [127], which treats each vehicle as a node at each sampling time and considers the interaction-related factors. If two nodes represent the same vehicle and their sampling times are adjacent, an edge exists between the two nodes, representing the temporal relationship. If two nodes at the same time represent two vehicles, and the distance between the two vehicles is less than a fixed value, an edge exists between the two nodes, representing the spatial relationship and the interaction state of these objects. GRIP uses a GCN model composed of several convolutional layers and graph operations to model the graph network. The output of GCN is fed to the LSTM encoder-decoder to predict the trajectory of surrounding vehicles. Although GRIP improves considerably over the popular models at that time, it uses a fixed graph network to represent the interaction-related factors between traffic participants, so its generalization ability in complex scenes needs to be improved. Therefore, Li et al. [128] propose an upgraded version of GRIP, called GRIP++, which uses both fixed and dynamic graph networks to predict the trajectory of traffic participants. This method has higher accuracy than GRIP, and at the end of 2019 it ranked first in the Baidu Apolloscape dataset [129] ranking. Besides, GRIP uses an LSTM encoder-decoder, while GRIP++ uses a GRU encoder-decoder. Jeon et al. [130] propose a SCALE-Net model, which can predict the trajectories of any number of surrounding vehicles while keeping the performance. SCALE-Net uses an edge-enhanced graph convolutional network (EGCN) [131] to learn edge features in the traffic flow. For each moment, each vehicle is a node, and the node state is $X_l = [x, y, v_x, v_y, \theta]$ (the x coordinate, y coordinate, speed in the x-direction, speed in the y-direction, and heading angle, respectively), and the edges between nodes are represented by $\Delta X = |X_m - X_l|$, a multi-dimensional state. The built graph is processed by the multi-layer EGCN algorithm. At the next moment, the graph model is rebuilt and the EGCN runs again. The output of EGCN is processed by a sequence model, which consists of an LSTM encoder-decoder followed by a five-layer multi-layer perceptron (MLP). GCN is also used in pedestrian trajectory prediction tasks. For example, Mohamed et al. [132] model pedestrian trajectories as spatio-temporal graphs to replace clustering layers. The edges of the graph represent the interaction-related factors between pedestrians. To avoid the drawbacks of recurrent units, the model uses GCN and a temporal convolutional network (TCN) to operate on the spatio-temporal graph such that the model can predict the entire sequence at one time.

All of the above methods use spatial-based graph convolutional networks, but some papers use spectral-based graph convolutional networks. Chandra et al. [133] use a two-layer GNN-LSTM structure to solve the trajectory prediction problem. The first layer uses an LSTM encoder-decoder to predict future trajectories of traffic participants, and the second layer models the interaction-related factors of traffic participants through a weighted dynamic geometric graph network (DGG) [134]. The spectrum of the graph is extracted by specific regularization of the eigenvalues after the LSTM encoder-decoder, and the spectrum sequence is fed into the LSTM network at the first layer to complete the prediction task. Zhao et al. [135] propose a spectral-based GCN network that can share information among all vehicles in the scene to consider the change of the surrounding vehicles and adapt to the environment.

2) Graph Neural Network using Vector Maps: Benz [136] first applies HD maps to trajectory prediction, and executes map topology based on the lane information associated with the vehicle to obtain its future trajectory along the lane. However, it does not consider interaction-related factors. Since the Argoverse dataset [77] with vector maps was proposed, researchers have used GNN to obtain the interaction features between vehicles, and between vehicles and maps, to improve the accuracy of trajectory prediction. Taking vehicles and vector maps in the scene as nodes, Gao et al. [137] propose VectorNet, which uses GNN to achieve trajectory prediction. Liang et al. [138] use CNN to extract vehicle features and GCN to extract lane features from vector maps, and then combine these two features for trajectory prediction. Using VectorNet to extract map features, Zhao et al. [139] propose a target-driven method called TNT, which defines sparse goal anchors and selects the best trajectory to the target, and DenseTNT [140] estimates dense goal candidates and gets better results than TNT. Zeng et al. [141] use LaneRCNN to obtain the representation of each participant’s local lane map to encode their past trajectory and local map topology, and complete the interaction of the local lane map through the interaction module.

3) Other Graph Neural Network: The attention mechanism has now been widely used in sequence-based tasks. Its advantage is that it can amplify the impact of the most important part of the data. Veličković et al. [142] propose the Graph Attention Network (GAT). When aggregating feature information, GAT uses an attention mechanism to determine the weights between nodes. Huang et al. [143] apply GAT to trajectory prediction. The model first uses an LSTM encoder to encode trajectories of traffic participants, then uses GAT to calculate the attention weight for each traffic participant and forms the interaction information of each participant at this moment by a weighted average of these states. Finally, the model uses an LSTM decoder to generate predicted trajectories.

Besides, Zhang et al. [144] propose a social graph network applied to trajectory prediction. To effectively capture the social behaviors of traffic participants, the team uses a directed graph, which is dynamically constructed based on real-time position and speed direction. Based on the social graph, the LSTM network is constructed to collect social effects and trained by samples to generate end-oriented and interaction-aware representations. For the uncertainty of the future trajectory, the network uses a time stochastic method

to sequentially learn the uncertainty in social interaction to form a prior model, then samples the prior model and uses a layered LSTM to decode step by step to generate the predicted trajectory. Recently, a graph-structured recurrent model named Trajectron++ is proposed in [145] to produce dynamically-feasible future trajectories, which represents a scene as a directed spatio-temporal graph and is designed to be tightly integrated with the planning system for AVs.

C. Generative Model

In the task of trajectory prediction, the multi-modality of the trajectory brings uncertainty and challenges to the research. To explain the inherent multi-modal distribution, some researchers use generative models to generate multi-modal trajectories. Generative models for trajectory prediction include the Generative Adversarial Network (GAN) and the Conditional Variational Auto Encoder (CVAE).

Fig. 12: Description of Generative Adversarial Network. [Diagram: History States → Generator → Predicted States → Discriminator → Real/False.]

1) Generative Adversarial Network: Generative Adversarial Network (GAN) was first presented by Ian Goodfellow [146] in 2014. With superior performance, it quickly became a major research hotspot in less than two years. GAN is essentially a generative model, which is mainly composed of two parts, namely the Generator and the Discriminator. The generator is utilized to generate a random sample similar to the real sample, and the discriminator is used for determining whether the data is true or false. Through the continuous game evolution of the generator and discriminator, GAN can obtain a generator with higher quality and a discriminator with stronger judgment ability.

When applying GAN to trajectory prediction, the generator is utilized to generate the predicted trajectory, and the discriminator is utilized to judge whether the predicted trajectory is correct, as shown in Fig. 12. A typical application is SGAN, in which Gupta et al. [147] use GAN for pedestrian trajectory prediction. The generator uses an LSTM encoder, pooling module, and LSTM decoder to generate the predicted trajectory, and the discriminator uses LSTM to determine whether the predicted trajectory is reasonable. In the model, the pooling module is social pooling, whose purpose is to help consider all pedestrians and reduce computation. Unlike the social pooling proposed in [102], the pooling module used here deals with the interaction between pedestrians. Based on SGAN, Yang et al. [148] design a pedestrian trajectory prediction model focusing on how to more effectively extract interaction-related factors and generate a variety of feasible trajectories, which adds a latent variable predictor on the basis of SGAN to estimate latent variables. Li et al. [149] use an Environmental Attention Mechanism (EAM) for deep feature extraction, and then feed it into the GRU generator for trajectory prediction. Sadeghian et al. [150] propose a GAN-based model for predicting the trajectories of pedestrians and vehicles, considering all vehicles’ impact and the interaction between them. A feature extractor is applied, which uses CNN to extract features from the scene and an attention mechanism to consider the interaction-related factors. Hegde et al. [151] use the vehicle’s coordinate information for the GAN network to predict the vehicle trajectory. Zhao et al. [152] propose a multi-agent tensor fusion network, MATF-GAN, which can preserve spatial structure information. The architecture combines the strengths of agent-oriented and spatial structure-oriented trajectory prediction methods and learns to represent relevant information about social interactions and physical constraints of the scene through end-to-end training. Wang et al. [153] propose a TS-GAN model, which uses a self-created convolutional social mechanism and a recurrent social mechanism to extract vehicle spatial and temporal information in the GAN network. Song et al. [36] use the vehicle state and the vector map information to generate model-based multi-modal trajectories, and use the learning-based discriminator to extract vehicle interaction information and obtain the optimal trajectories.

2) Conditional Variational Auto Encoder: The Auto Encoder (AE) compresses data into a low-dimensional vector representation through the encoder and uses a decoder to decode the low-dimensional vector to obtain a reconstructed output. AE aims to minimize reconstruction errors. However, AE is criticized for simply "remembering" data, and its ability to generate data is poor. Kingma et al. [154] propose a Variational Auto Encoder (VAE) framework that uses neural networks to parameterize the distribution in variational inference, thereby improving the generation ability of the model. In [155], a Conditional VAE (CVAE) is proposed to complete structured prediction tasks. For trajectory prediction, combining CVAE and RNN variants into an encoder-decoder form is an effective way to generate trajectories [75], [82], [84]. Some methods that use raw sensor data as input also use CVAE for multi-modal trajectory prediction [108], [156], [157]. These methods of using CVAE as the network framework have been mentioned above, and will not be repeated in this section.

D. Summary

In summary, deep learning-based trajectory prediction methods for AVs are classified in Table III. More and more researchers apply deep learning-based methods to spatial and temporal prediction problems like trajectory prediction and achieve state-of-the-art results. Thus, we summarize the mainstream methods based on deep learning and give the state encoder, context encoder, interaction module, decoder, and summary description of these methods, as shown in Table IV. Using sequential networks to extract historical trajectory features, processing trajectory features through different network structures, extracting interaction information of traffic participants and road information, and using sequential networks to obtain the final predicted future trajectory has become the mainstream research direction of trajectory prediction. Deep

TABLE III: Summary of deep learning-based methods.

Deep Learning-based Methods | Sequential Network | GNN | Generative Model
Physics-based Factors | [69]–[87], [91]–[102], [104]–[109], [116]–[118], [120]–[124], [145] | [126]–[128], [130], [132], [133], [135], [137]–[141], [143], [144] | [36], [75], [82], [84], [108], [147]–[153], [156], [157]
Contextual Factors: Road-related Factors | [69]–[75], [77], [78], [80]–[84], [86], [87], [92]–[102], [106]–[109], [116]–[118], [121]–[124], [145] | [132], [137]–[141], [143] | [36], [75], [82], [84], [108], [148]–[150], [152], [153], [156], [157]
Contextual Factors: Interaction-related Factors | [73], [76], [77], [79]–[82], [84]–[87], [92], [100]–[102], [104]–[109], [116]–[118], [121]–[124], [145] | [126]–[128], [130], [132], [133], [135], [137]–[141], [143], [144] | [36], [82], [84], [107], [108], [147]–[153], [157]
Output Types: Unimodal Trajectory | [72], [73], [79], [81], [83], [91], [95], [96], [98], [104], [105], [120] | [126]–[128], [130], [133], [135], [137], [144] | -
Output Types: Multimodal Trajectory | [74]–[78], [82], [84]–[87], [92]–[94], [97], [99]–[102], [106]–[109], [116]–[118], [120]–[124], [145] | [132], [138]–[141], [143] | [36], [75], [82], [84], [108], [147]–[153], [156], [157]
Output Types: Intention | [69]–[71], [73], [74], [76], [80]–[82], [84], [86], [87], [92], [98], [102], [108], [109] | [133], [135], [144] | [82], [84], [108], [147], [151], [153]

TABLE IV: The mainstream approaches for deep learning-based methods.

Classification | Methods | Year | State Encoder | Context Encoder | Interaction Module | Decoder | Description
RNN | MFP [84] | 2019 | RNN | CNN | Radial Basis Function | RNN | Learn latent variables to model the multimodal trajectories
CNN | CoverNet [92] | 2020 | CNN | CNN | - | Trajectory Set Generator | Apply the raster image as input
CNN | HOME [100] | 2021 | 1D-CNN, GRU | CNN | Self-Attention | CNN | Output 2D topview heatmap
CNN | TPCN [101] | 2021 | PointNet++ [158] | PointNet++ [158] | Joint Learning | Displacement Prediction | Use point cloud learning-based methods
CNN and RNN | DESIRE [107] | 2017 | GRU | CNN | Social Pooling [103] | GRU | Use deep IOC framework to encode
CNN and RNN | CS-LSTM [102] | 2018 | LSTM | - | Social Pooling [103] | LSTM | Six LSTM decoders to generate distributions of six specific maneuvers
Attention Mechanism | MHA-JAM [118] | 2021 | LSTM | CNN | Attention Head | LSTM | Each attention head to generate a distinct future trajectory to address multimodality
Attention Mechanism | mmTransformer [124] | 2021 | Transformer | VectorNet | Transformer | MLP | Stacked Transformers to refine a set of fixed proposals
GNN | VectorNet [137] | 2020 | PointNet [159] | PointNet [159] | GNN | MLP | Operate on the vectorized HD maps and trajectories
GNN | DenseTNT [140] | 2021 | VectorNet | VectorNet | VectorNet | Goal Set Predictor | Directly output a set of trajectories from dense goal candidates
Generative Model | TS-GAN [153] | 2020 | LSTM | - | Auto-Encoder Social Convolution | LSTM | Incorporate GAN into modeling spatial and temporal information
Generative Model | PRIME [36] | 2021 | LSTM | 1D-CNN, LSTM | Self-Attention | Model-based Generator | Model-based generator and learning-based evaluator

learning-based methods have reached state-of-the-art results in trajectory prediction tasks and can predict over longer horizons than physics-based methods and classic machine learning-based methods. At present, more and more autonomous vehicle trials use deep learning-based methods to predict the future trajectory of traffic participants.

VI. REINFORCEMENT LEARNING-BASED METHODS
In recent years, the rapid development of reinforcement learning (RL) has provided a new way to understand high-dimensional complex policies [160]–[162], which offers new ideas for trajectory prediction tasks of AVs [163], [164]. When RL is used in the field of trajectory prediction for AVs, most methods use the Markov decision process (MDP) [165] to maximize the expected cumulative reward. An MDP is a tuple $(S, A, P, R, \gamma)$, where $S$ is a finite set of states, $A$ is a finite set of actions, $P$ is a state transition probability matrix, $P^{a}_{ss'} = \mathbb{P}[S_{t+1} = s' \mid S_t = s, A_t = a]$, $R$ is a reward function, $R^{a}_{s} = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a]$, and $\gamma$ is a discount factor. To find the best decision process over all policies, the optimal state-value function $v_*(s)$ and the optimal action-value function $q_*(s, a)$ can be calculated as

$$
\begin{aligned}
v_*(s) &= \max_{a} \Big( R^{a}_{s} + \gamma \sum_{s' \in S} P^{a}_{ss'} \, v_*(s') \Big), \\
q_*(s, a) &= R^{a}_{s} + \gamma \sum_{s' \in S} P^{a}_{ss'} \max_{a'} q_*(s', a').
\end{aligned}
\tag{5}
$$

Using MDP, the RL-based methods can be classified as Inverse Reinforcement Learning (IRL) methods, Generative Adversarial Imitation Learning (GAIL) methods, and Deep IRL (DIRL) methods, which will be discussed below.

Fig. 13: Description of (a) RL and (b) IRL. [Diagram labels: environment, demonstration of the expert, reward function R, actor π.]
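The Bellman optimality recursion of Eq. (5) can be solved numerically by value iteration. The following is a minimal sketch on a hypothetical two-state, two-action MDP; the function name `value_iteration`, the toy transition and reward tables, and the choice of γ = 0.5 are illustrative assumptions and are not taken from any of the surveyed works.

```python
def value_iteration(P, R, gamma, tol=1e-10, max_iter=10_000):
    """Iterate the Bellman optimality backup until the values stop changing.

    P[a][s][s2] is the transition probability P^a_{ss'} and R[s][a] is the
    expected reward R^a_s (toy tables, assumed for illustration).
    """
    n_states = len(R)
    n_actions = len(R[0])

    def backup(v):
        # q(s, a) = R^a_s + gamma * sum_{s'} P^a_{ss'} * v(s')
        return [
            [
                R[s][a] + gamma * sum(P[a][s][s2] * v[s2] for s2 in range(n_states))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]

    v = [0.0] * n_states
    for _ in range(max_iter):
        q = backup(v)
        v_new = [max(row) for row in q]  # v(s) = max_a q(s, a)
        delta = max(abs(x - y) for x, y in zip(v_new, v))
        v = v_new
        if delta < tol:
            break
    return v, backup(v)


# Hypothetical MDP: action 0 keeps the current state, action 1 switches state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
# Only staying in state 1 is rewarded.
R = [
    [0.0, 0.0],  # state 0: stay, switch
    [1.0, 0.0],  # state 1: stay, switch
]

v_star, q_star = value_iteration(P, R, gamma=0.5)
```

For this toy problem the fixed point can be checked by hand: staying in state 1 gives 1 / (1 − 0.5) = 2, and the best choice in state 0 is to switch, giving 0.5 × 2 = 1. IRL methods, discussed next, address the case where the reward table is not given and must be learned from demonstrations.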

A. Inverse Reinforcement Learning
Usually, an MDP assumes that the reward function is already provided. However, driver behavior is so complicated that manually specifying the weights of the reward function is inappropriate [166], [167]. IRL learns the reward function from expert demonstrations (trajectories) to generate the corresponding optimal driving policy, as shown in Fig. 13.
We divide IRL into maximum margin-based and maximum entropy-based methods according to the way the weights of the reward function are learned. Maximum margin-based methods optimize the reward function weights by minimizing the difference in feature expectations between the expert demonstration and the predicted trajectory. In [168], the structured maximum margin is applied to learn mappings from features to rewards, and the resulting optimal policies in the MDP are used to imitate the expert's behavior. SCIRL, proposed in [169], does not solve the direct RL problem but estimates the feature expectation of expert policies through structured classification. Silver et al. [170] use the maximum margin planning framework to learn reward functions and driving maneuvers for AVs. However, most margin-based methods are ambiguous in the matching of feature expectations, because some degenerate solutions can also satisfy the optimal policy of the expert demonstration.
Maximum entropy-based methods are more popular because they can use multiple reward functions to explain the ambiguity of experts' behavior [171]. Most of them are based on a linear mapping and can be formulated as

$$
r(\Phi(s)) = \theta^{\top} \Phi(s), \tag{6}
$$

where $r$ is the approximation of the reward function, $\Phi$ is a function that outputs the features of the state $s$, and the weight $\theta$ is acquired by training. Several works apply maximum entropy-based IRL (MaxEnt-IRL) to behavior prediction for AVs. In [172], acceptability-dependent behavior models are learned from expert trajectories using MaxEnt-IRL to generate stochastic behavior, and the optimum behavior model is then chosen by maximizing the social acceptability. Sharifzadeh et al. [173] leverage IRL with Deep Q-Networks (DQN) to extract the rewards in large state spaces. In [174], interaction-related factors are considered to accomplish probabilistic prediction for AVs, and the distribution of future trajectories is formulated by driving maneuvers. A spatiotemporal state lattice is proposed in [175] to model driver behavior from expert demonstrations.
Besides, some MaxEnt-IRL methods utilize sampled trajectories to accomplish prediction tasks. In [176], candidate trajectories are sampled first, and the one with the minimal cost is selected as the predicted trajectory. Wu et al. [177] propose a method to learn the reward functions in the continuous domain by using a speed profile sampler to estimate the partition function. In [178], state sequences are sampled from the MaxEnt policy and fed to an attention-based trajectory generator to generate valued future trajectories. Xin et al. [179] use randomly pre-sampled policies in a sub-space to approximate the optimal policy and thus reduce computational costs. In [180], an inverse optimal control (IOC) method using Langevin sampling is proposed to learn the cost function of other vehicles in an energy-based generative model. Based on the decision-making mechanism, reward functions are learned using a polynomial trajectory sampler with discrete latent driving intentions in [181].

B. Generative Adversarial Imitation Learning
Ho et al. [191] proposed GAIL in 2016, which uses the GAN approach to perform imitation learning in RL. Instead of learning the reward function from experts' demonstrations as in IRL, GAIL directly extracts policies from data. As in a GAN, the core idea of GAIL is that the generator produces a trajectory as similar to the expert trajectory as possible, while the discriminator tries to judge as accurately as possible whether a trajectory is an expert trajectory.
Many articles use GAIL for trajectory prediction for AVs. Kuefler et al. [182] extend GAIL to the optimization of RNNs to reproduce human driver behaviors, with policies and actions evaluated by the discriminator. Li et al. [183] apply the information maximization theorem to extract the latent structure underlying expert demonstrations. In [184], a parameter-sharing extension of GAIL is proposed to model the interaction between multiple agents, which can also provide agents with domain-specific knowledge. To overcome the shortcoming of GAIL that it only models the next state using the current state, Choi et al. [185] propose a method that incorporates a partially observable Markov decision process (POMDP) into the GAIL framework, and the model is trained using the reward function from the discriminator.

C. Deep Inverse Reinforcement Learning
Since the prediction problem is nonlinear, it is necessary to use a nonlinear mapping for generalizable function approximation. In [192], the deep inverse reinforcement learning (DIRL) framework is proposed to approximate complex and nonlinear reward functions, which can be expressed as

$$
r(\Phi(s)) = f(\theta, \Phi(s)), \tag{7}
$$

where $f$ is a nonlinear function. In [192], a fully convolutional neural network (FCN) is applied in IRL for reward approximation. Some DIRL methods take historical trajectories as input. You et al. [186] consider the driving style and the road geometry: the authors first use RL to design the MDP, then learn the optimal driving policy with IRL, and use a deep neural network (DNN) to approximate the reward function. In [164], trajectories of traffic participants are encoded by an LSTM and the reward network is learned by an FCN.
Currently, more DIRL-based methods directly use raw perception data. Wulfmeier et al. [187] apply an FCN to map lidar data to traversability maps. The network is pre-trained to regress to a manual prior cost map, and the initialized weights are then fine-tuned by the maximum entropy DIRL network. In [188], the driving behavior is modeled by DIRL using camera images, where a CNN extracts the associated state features. Zhu et al. [189] use an RL ConvNet and a state visiting frequency (SVF) ConvNet to encode the vehicle's

TABLE V: Summary of reinforcement learning-based methods.

Reinforcement Learning-based Methods | IRL | GAIL | DIRL
Physics-based Factors | [168]–[171], [173]–[181] | [182]–[185] | [164], [186]–[190]
Contextual Factors: Road-related Factors | [168], [170], [173]–[181] | [182], [184], [185] | [186], [187], [189], [190]
Contextual Factors: Interaction-related Factors | [171], [173], [174], [177], [180], [181] | [184] | [187], [190]
Output Types: Unimodal Trajectory | [168], [173], [175], [177] | [182]–[185] | [164], [186], [187], [189], [190]
Output Types: Multimodal Trajectory | [169]–[171], [174], [176], [178]–[181] | - | -
Output Types: Intention | [170], [171], [173]–[176], [181] | [182]–[184] | [164], [186]–[190]

kinematics and obtain the weights of the reward function by back-propagating the loss gradient [193] between the expert SVF from expert demonstrations and the policy SVF from lidar data. In [190], a convolutional LSTM considering inertial, environmental, and social factors is proposed to extract the feature map from lidar and trajectory data, which is incorporated into the output reward map to predict the traversability map.

D. Summary
In summary, reinforcement learning-based trajectory prediction methods for AVs are classified in Table V. Such methods use MDP to maximize the expected cumulative reward and generate optimal driving policies by learning from expert demonstrations, and most of them are planning-based methods. Combined with deep learning networks, these methods can better exploit expert demonstrations and consider more factors. However, most of them are computationally intensive and require long training periods.

VII. EVALUATION
The appearance of a variety of datasets has facilitated the performance of learning-based prediction algorithms. Therefore, it is necessary to choose suitable metrics to evaluate the performance of each algorithm. This section first introduces several datasets, then introduces the performance-evaluation metrics, and finally compares the performance of the aforementioned works using different methods on the same NGSIM dataset [194].

A. Datasets
To evaluate the quality of a trajectory prediction model, the predicted trajectory is usually compared with the ground truth trajectory, which is obtained from various datasets. These datasets are collected by sensors, such as lidar and cameras, and manually annotated or automatically generated to produce sequences of vehicles' movements.
The popular datasets used in trajectory prediction are summarized in Table VI. This paper introduces the datasets in reverse chronological order and lists the typical methods that use each dataset for trajectory prediction. Most of the methods mentioned in this paper take trajectories as input, and some also use vehicle states or map information. However, since most trajectories in these datasets are obtained by learning methods from images or point clouds, some models directly use images or point clouds as input for end-to-end trajectory prediction.

B. Evaluation Metrics
Several evaluation metrics are usually used for vehicle trajectory prediction.
1) Root Mean Squared Error (RMSE): RMSE calculates the square root of the average of the squared prediction error:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( Y^{t}_{\mathrm{pred}} - Y^{t}_{\mathrm{GT}} \right)^{2}}, \tag{8}
$$

where $n$ is the number of data samples in the prediction horizon, and $Y^{t}_{\mathrm{pred}}$ and $Y^{t}_{\mathrm{GT}}$ are the predicted result and the ground truth trajectory at sample time $t$, respectively. RMSE is sensitive to large prediction errors and is one of the commonly used metrics for trajectory prediction.
2) Negative Log Likelihood (NLL): For a modeled trajectory distribution $f(Y)$:

$$
\mathrm{NLL} = -\log f(Y), \tag{9}
$$

where $Y$ represents the ground truth trajectory. The NLL value is not a physical quantity. RMSE is used to calculate the models' average error, while NLL is more focused on determining the correctness of the trajectory in maneuver-based models.
3) Average displacement error (ADE): The average L2 distance between the predicted trajectory and the ground truth:

$$
\mathrm{ADE} = \frac{1}{N_p \times T} \sum_{i=1}^{N_p} \sum_{t=1}^{T} \left\| Y^{t}_{\mathrm{pred}}[i] - Y^{t}_{\mathrm{GT}}[i] \right\|_{2}, \tag{10}
$$

where $N_p$ is the number of predicted objects and $T$ is the number of prediction time steps. For multimodal prediction, minimum ADE (mADE) is usually used to indicate the minimum value of ADE over $K$ predictions.
4) Final displacement error (FDE): The L2 distance between the final predicted results and the corresponding ground truth positions:

$$
\mathrm{FDE} = \frac{1}{N_p} \sum_{i=1}^{N_p} \left\| Y^{T}_{\mathrm{pred}}[i] - Y^{T}_{\mathrm{GT}}[i] \right\|_{2}, \tag{11}
$$

where $Y^{T}_{\mathrm{pred}}$ and $Y^{T}_{\mathrm{GT}}$ are the predicted result and the ground truth at the final time step $T$, respectively. For multimodal prediction, minimum FDE (mFDE) is usually used to indicate the minimum value of FDE over $K$ predictions.
5) Miss Rate (MR): Based on the L2 distance of the final position, the ratio of cases where the predicted trajectory

TABLE VI: Datasets for AVs utilized in trajectory prediction.

Dataset | Year | Agents | Sensors | Scene | Duration and tracking quantity | Data type | Typical methods
NuScenes [195] | 2020 | vehicles, pedestrians, cyclists | lidar, camera | urban | 1000 driving scenes | trajectories, HD map | MHA-JAM [118], Trajectron++ [145]
Waymo Open Dataset [196] | 2020 | vehicles, pedestrians, cyclists | lidar, camera | urban | 103354, 20s 10Hz segments | trajectories, HD map | DenseTNT [140], Scene Transformer [123]
Lyft Level 5 [197] | 2020 | vehicles, pedestrians, cyclists | lidar, camera | urban | 1000+ hours, 16K miles of data from 23 vehicles | trajectories, HD map | Graph-LSTM [133]
Argoverse [77], [198] | 2019 | vehicles | lidar, camera | urban | 324,557 interesting vehicle trajectories, 1000 driving hours | trajectories, HD map | VectorNet [137], LaneRCNN [141]
INTERACTION [199] | 2019 | vehicles, pedestrians | drone, camera | urban, highway | 11 locations, 40000 vehicles | trajectories, HD map | IPTM [87]
HighD [200] | 2018 | vehicles | drone | highway | 110500 vehicles, 147 driven hours | trajectories, lane | MHA-LSTM [117]
Apolloscape [129] | 2018 | vehicles, pedestrians, cyclists | lidar, camera | urban | 1000km trajectories | trajectories | GRIP [127]
KITTI [201] | 2013 | vehicles, pedestrians, cyclists | lidar, camera | urban, highway | 50 sequences | image, point cloud | DESIRE [107], MANTRA [99]
NGSIM [194] | 2006 | vehicles | camera | highway | 90 min recording of two highways | trajectories, lane | CS-LSTM [102], TS-GAN [153]

is not within 2.0 meters of the ground truth.

When the prediction results are multimodal, assuming that the prediction results are K likely future trajectories, ADE, FDE, and MR are judged according to the optimal future trajectory, and they are recorded as ADE_K, FDE_K, and MR_K, respectively.
6) Computation Time: Computation time is very important for the on-board performance of a method. The computing power of autonomous vehicles is limited, but trajectory prediction models are generally complex and require substantial computational resources. To achieve higher levels of autonomous driving, the computation of each module must be fast enough to keep the delay as small as possible. Therefore, the real-time performance or computational cost is very important for the model.
7) Prediction Horizon: Prediction horizon refers to the number of future time steps that can be predicted by the model. Generally, the longer the prediction horizon is, the lower the accuracy will be in a dynamic or even stochastic driving environment. However, to meet the requirements of the planning and control system, prediction results covering a certain period of time must be fed into the system, so the prediction horizon should not be too short and should be consistent with the other modules.

C. Performance of Different Methods
For real-world autonomous driving, accuracy is one of the most important metrics for trajectory prediction methods. To allow readers to better compare the various methods and their accuracy, this paper compares the performance of trajectory-prediction methods on highway and urban scenes separately. In Table VII, methods based on the NGSIM I-80 and US-101 highway driving datasets [194] are compared using RMSE, while we use minADE, minFDE, and MR to compare methods based on Argoverse [77] in Table VIII, which is recorded under urban conditions with a prediction horizon of 3 seconds. It can be seen from Tables VII and VIII that the longer the prediction time, the lower the prediction accuracy, and that most learning-based methods surpass conventional methods. Besides, multimodal prediction is more consistent with the human cognitive process and is more accurate than unimodal prediction. GNN performs well in Table VII thanks to its ability to capture structured road features, so some state-of-the-art methods use GNN to encode HD map information and complete trajectory prediction. At present, most of the latest trajectory prediction methods use deep learning, but for AVs to carry out safer planning and control, trajectory-prediction methods need to be more accurate.

D. Applications
Since trajectory prediction plays an important role in ensuring the safety of AVs, major autonomous driving teams have embedded the trajectory prediction module into the development of AVs above the L4 level. However, due to the confidentiality of the software, many autonomous driving manufacturers have not disclosed the specific algorithms they use, so this section only summarizes the trajectory prediction methods that autonomous driving teams have clearly announced. Early real-world studies use physics-based methods for trajectory prediction [23]. Next, BMW uses Dynamic Bayesian Networks to determine the driving intentions of surrounding vehicles and performs experiments on highways [204]. The pioneer IV autonomous vehicle of the University of Science and Technology of China uses a knowledge-driven approach to obtain the future lane of the predicted vehicle and then uses LSTM to predict its future trajectory [205]. For the Baidu Apollo autonomous vehicle [206], a new model named Inter-TNT, based on the advanced method TNT [139], is introduced as the prediction module.

TABLE VII: Comparison of trajectory prediction methods for AVs based on the highway driving dataset NGSIM. Values are RMSE (m) at prediction horizons of 1 s to 5 s.

Classification | Models | 1 s | 2 s | 3 s | 4 s | 5 s
Single Trajectory methods | Constant Velocity [102] | 0.73 | 1.78 | 3.13 | 4.78 | 6.68
Kalman Filtering methods | IMM-KF [32] | 0.58 | 1.36 | 2.28 | 3.37 | 4.55
HMM | C-VGMM+VIM [55], [152] | 0.66 | 1.56 | 2.75 | 4.24 | 5.99
RNN | M-LSTM [82] | 0.58 | 1.26 | 2.12 | 3.24 | 4.66
RNN | MFP-1 [84] | 0.54 | 1.16 | 1.90 | 2.78 | 3.83
CNN and RNN | CS-LSTM(M) [102] | 0.62 | 1.29 | 2.13 | 3.20 | 4.52
Attention Mechanism | MHA-LSTM [117] | 0.41 | 1.01 | 1.74 | 2.67 | 3.83
GNN | GRIP++ [128] | 0.38 | 0.89 | 1.45 | 2.14 | 2.94
GNN | GISNet [135] | 0.33 | 0.83 | 1.42 | 2.14 | 3.23
Generative Model | MATF-GAN [152] | 0.66 | 1.34 | 2.08 | 2.97 | 4.13
Generative Model | TS-GAN [153] | 0.60 | 1.24 | 1.95 | 2.78 | 3.72
IRL | L-IRL [164], [202] | 1.12 | 2.29 | 2.31 | 3.38 | 4.45
GAIL | GAIL-GRU [164], [182] | 0.69 | 1.51 | 2.55 | 3.65 | 4.71
DIRL | MEDIRL [164], [187] | 1.35 | 2.57 | 2.83 | 3.69 | 4.88
DIRL | DN-IRL [164], [203] | 0.54 | 1.02 | 1.91 | 2.43 | 3.76
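To make the RMSE figures in Table VII more concrete, the sketch below applies a constant-velocity extrapolation, the simplest baseline in the table, to a single hypothetical track and scores it with Eq. (8). The function names, the toy coordinates, and the choice to measure the per-sample error as a Euclidean distance are assumptions for illustration; this is not the evaluation protocol of the cited works, which aggregate errors over a whole dataset at each horizon.

```python
import math


def constant_velocity_predict(history, n_steps):
    """Extrapolate the last observed displacement for n_steps future samples.

    history is a list of (x, y) positions at a fixed sampling interval.
    """
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0  # displacement per sample
    return [(x1 + k * vx, y1 + k * vy) for k in range(1, n_steps + 1)]


def rmse(pred, gt):
    """Eq. (8), with the per-sample error taken as Euclidean distance (assumption)."""
    squared = [(px - gx) ** 2 + (py - gy) ** 2 for (px, py), (gx, gy) in zip(pred, gt)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical track: the vehicle drives straight, then drifts sideways by 1 m.
history = [(0.0, 0.0), (1.0, 0.0)]
ground_truth = [(2.0, 0.0), (3.0, 0.0), (4.0, 1.0)]

prediction = constant_velocity_predict(history, n_steps=3)
error = rmse(prediction, ground_truth)
```

Because the baseline cannot anticipate the lateral drift, the whole error comes from the last sample, which mirrors how the constant-velocity row in Table VII degrades as the horizon grows.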

TABLE VIII: Comparison of trajectory prediction methods for AVs based on Argoverse under urban conditions.

Classification | Models | minFDE (K=6) | minADE (K=6) | MR (K=6) | minFDE (K=1) | minADE (K=1) | MR (K=1)
Physics-based | CV [77] | 7.57 | 3.39 | 0.82 | 7.89 | 3.53 | 0.84
Classic Machine Learning-based | NN+map [77] | 4.03 | 2.08 | 0.58 | 8.12 | 3.65 | 0.84
RNN | LSTM+map [77] | 5.44 | 2.34 | 0.69 | 6.81 | 2.96 | 0.81
RNN | Jean [85] | 1.49 | 0.93 | 0.19 | 4.18 | 1.86 | 0.63
Attention Mechanism | SceneTransformer [123] | 1.23 | 0.80 | 0.13 | - | - | -
Attention Mechanism | mmTransformer [124] | 1.34 | 0.84 | 0.15 | - | - | -
GNN | LaneGCN [138] | 1.36 | 0.87 | 0.16 | 3.78 | 1.71 | 0.59
GNN | DenseTNT [140] | 1.45 | 0.93 | 0.11 | - | - | -
GNN | LaneRCNN [141] | 1.45 | 0.90 | 0.12 | 3.69 | 1.69 | 0.57
Generative Model | PRIME [36] | 1.56 | 1.22 | 0.12 | 3.82 | 1.91 | 0.59

Notes: minADE and minFDE are in meters. For MR, the threshold for the endpoint error is 2 m.
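The multimodal metrics reported in Table VIII can be sketched as follows for K candidate trajectories per agent. This is a minimal illustration, assuming that minADE and minFDE are each minimized independently over the K modes and that a case counts as a miss when the best endpoint error exceeds 2.0 m; benchmark implementations may instead select a single best mode, so the helper names and conventions here are hypothetical.

```python
import math


def displacement_errors(pred, gt):
    """Per-step L2 distance between one predicted trajectory and the ground truth."""
    return [math.dist(p, g) for p, g in zip(pred, gt)]


def min_ade_fde(modes, gt):
    """Return (minADE, minFDE) over the K candidate trajectories in modes."""
    ades, fdes = [], []
    for pred in modes:
        errors = displacement_errors(pred, gt)
        ades.append(sum(errors) / len(errors))  # average over time steps
        fdes.append(errors[-1])                 # error at the final time step
    return min(ades), min(fdes)


def miss_rate(all_modes, all_gt, threshold=2.0):
    """Fraction of agents whose best endpoint error exceeds the threshold (in meters)."""
    misses = sum(
        1 for modes, gt in zip(all_modes, all_gt) if min_ade_fde(modes, gt)[1] > threshold
    )
    return misses / len(all_gt)


# Hypothetical two-step ground truth and two candidate modes.
gt = [(1.0, 0.0), (2.0, 0.0)]
straight = [(1.0, 0.0), (2.0, 0.0)]  # matches the ground truth exactly
turn = [(1.0, 1.0), (2.0, 3.0)]      # off by 1 m, then 3 m

both_modes = min_ade_fde([straight, turn], gt)
turn_only = min_ade_fde([turn], gt)
mr = miss_rate([[straight, turn], [turn]], [gt, gt])
```

With both modes available the best candidate is exact, while the agent that only receives the turning mode ends 3 m from the ground truth and is counted as a miss, which illustrates why the K=6 columns in Table VIII are consistently better than the K=1 columns.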

With the advancement of autonomous driving technology, more and more advanced and complex trajectory prediction methods will be applied to real vehicles.

VIII. DISCUSSION AND DIRECTIONS
This section discusses the advantages and disadvantages of the different categories of trajectory prediction methods and outlines potential research directions to guide readers in this field.

A. Discussion
This section discusses the performance of the trajectory prediction methods in terms of accuracy, computation time, prediction horizon, etc., analyzes their practical applications in AVs, and gives a summary in Table IX. Note that we use short-term and long-term prediction to refer to prediction horizons of no more than 1 second and no less than 3 seconds, respectively.
1) Physics-based Methods: they are suitable for vehicle movements that can be accurately described by kinematics or dynamics models. Given a suitable physics model, these methods can be applied to a variety of scenarios at a small computational cost, in a short time, and without training. However, the prediction results of such models depend heavily on the inputs and the model selection. The inputs are closely related to human or machine drivers and are influenced by the driving environment or the interactions with other participants. Therefore, without the capability to describe such factors, physics-based models are limited to short-term prediction and static scenes. Because of their simplicity and fast response, these methods can be easily used in real applications for AVs, such as collision risk analysis.
2) Classic Machine Learning-based Methods: compared with physics-based methods, this type of method is able to consider more factors, and its accuracy is relatively high over a longer prediction length at a higher computing cost. Most of these methods are maneuver-based methods, which predict the trajectory with the maneuver known as a prior. However, vehicle maneuvers of human drivers are usually diverse and vary greatly in different scenarios, so the generalization ability of these methods is poor. In real applications for AVs, such methods are used in scenarios such as lane change studies, leveraging their advantages in maneuver recognition.
3) Deep Learning-based Methods: traditional trajectory prediction methods for AVs are only suitable for simple scenes and short-term prediction, but deep learning-based methods can make accurate predictions over a longer prediction horizon. By using RNN, CNN, GNN, and other networks for feature extraction, interaction-related factors and map information are considered. As a result, these methods can adapt to more complex environments and longer time horizons. Deep learning-based methods require a large amount of data for training. Besides, as more factors are considered and the number of network layers increases, the computing cost and time increase sharply. Such methods can naturally generate multimodal trajectories, which is consistent with the diversity of vehicles' maneuvers. In real applications for AVs, it is necessary to reach a balance between calculation time and

TABLE IX: The performance of the trajectory prediction methods.

Methods | Accuracy | Prediction Horizon | Computation Cost | Applications
Physics-based | High in short-term prediction, low in other prediction horizons | Short | Small | Collision risk analysis
Classic Machine Learning-based | Good at recognizing maneuvers but generalization ability is poor | Medium | Medium | Maneuver recognition
Deep Learning-based | High in considering some factors | Long | Relatively high | More and more applied in real-world
Reinforcement Learning-based | Relatively high, prediction methods are relatively few | Long | High | More applied in planning

Fig. 14: Illustration of Potential Research Directions. [Diagram branches: more information, advanced algorithms, integration, robustness, benchmark.]

model complexity to ensure the real-time performance and safety of AVs. At present, more and more real-world trials use these methods to predict the future trajectory of traffic participants.
4) Reinforcement Learning-based Methods: they imitate the human decision-making process and obtain the reward function by learning from expert demonstrations to generate the corresponding optimal driving policy. They can continuously evolve through learning and adapt to complex environments and long prediction horizons. Such methods probably generate more accurate trajectories than deep learning methods over a longer time domain. However, most of these methods are computationally expensive in their recovery of an expert cost function and require long training times. In real applications for AVs, reinforcement learning-based trajectory prediction methods are more often applied to trajectory planning, taking advantage of their strengths in the decision-making process.

B. Potential Research Directions
With the continuous advancement of autonomous driving technology, the importance of trajectory prediction has received more and more attention. Trajectory prediction has developed from the traditional Kalman filter methods to learning-based methods, which can handle more complex scenes. After summarizing the methods of the past two decades, this paper outlines the potential research directions as shown in Fig. 14 and discusses them as follows.
1) Inclusion of more information: Methods that consider interaction-aware factors and map information are more suitable for real application scenes and are currently one of the most popular development directions. However, much more information needs to be considered in addition to the interaction-related factors. For example, most of the current methods do not consider the constraints imposed by explicit traffic rules, but in real scenes, traffic rules can reshape the maneuvers or even the trajectories of vehicles. Similarly, information such as traffic lights and road signs can also be used as reliable inputs for prediction. In addition, other useful audio-visual information, such as vehicle turn signals and vehicle horns, can be used as references for prediction. In the future, researchers are encouraged to use more information for trajectory prediction.
2) Introduction of more advanced algorithms: Just like the outstanding achievements of the Transformer model in the field of NLP [119], introducing more advanced algorithms can achieve higher prediction accuracy under

the same input data. The current algorithm achieves high based methods, the classic machine learning-based meth-
accuracy by adding HD maps, considering interaction- ods, the deep learning-based methods, and the reinforcement
related factors, and generating the multimodal trajectory learning-based methods. The performance of each kind of
that conforms to the multi-modality of human inten- method and the opportunities for applying it to real-world au-
tions. In addition, more advanced algorithms need to be tonomous driving are discussed. Recent advances in trajectory
continuously proposed to further improve the ability of prediction for AVs are encouraging, but it still faces various
trajectory prediction algorithms with new structures and challenges and has potential research directions in the future
training methods. With the continuous iterative upgrade which we have outlined to guide readers in this field.
of the autonomous driving system, it has become the Safety is crucial for autonomous driving. To break through
general trend to improve the predictive ability of AVs the bottleneck of AVs and ensure their safety, AVs need to
and meet the safety requirements of autonomous driving predict their surroundings just like human drivers. We hope
through more advanced algorithms. our survey will improve the application of prediction systems
3) Integration other key technologies of AVs: The effec- in AVs and stimulate further research along the directions
tiveness of the whole system can be greatly improved discussed.
when the trajectory prediction results are considered for
decision making, trajectory planning, and motion control.
Take the motion control system as an example, most of ACKNOWLEDGEMENTS
the current motion control systems regard the movement The authors would like to thank the National Key R&D
of traffic participants as uniform linear motion, which is Program of China under Grant No2020AAA0108100 for fund-
quite different from the real trajectory of traffic partici- ing, Science and Technology Commission of Shanghai Munic-
pants. When the trajectory prediction model is integrated, ipality under Grant 21ZR1465900 and thank the anonymous
the local decision-making planning control system can reviewers for their valuable suggestions.
better cope with the environment’s changes and improve
the safety of autonomous driving.
4) Improvement of model robustness: Most of the datasets R EFERENCES
are semi-automatically annotated and the ground truth [1] Y. Ma, Z. Wang, H. Yang, and L. Yang, “Artificial intelligence
trajectories have measurement noises. In real applications applications in the development of autonomous vehicles: a survey,”
for AVs, various noises exist in the perception system, IEEE/CAA Journal of Automatica Sinica, vol. 7, no. 2, pp. 315–329,
2020.
include tracking errors, location errors, map errors, etc., [2] F.-Y. Wang, “Metavehicles in the metaverse: Moving to a new phase
which will bring deviations and uncertainty. Therefore, for intelligent vehicles and smart mobility,” IEEE Transactions on
robustness should be considered to improve the anti- Intelligent Vehicle, vol. 7, no. 1, pp. 1–5, 2022.
[3] D. Cao, X. Wang, L. Li, C. Lv, X. Na, Y. Xing, X. Li, Y. Li, Y. Chen,
perturbation ability of the real application for AVs. In addition to the displacement metrics (such as ADE and FDE), probabilistic metrics (NLL, mADE, mFDE) should also be reported, to improve the credibility of the method and make the model more applicable to real-world autonomous driving.

5) Establishment of a benchmark: A benchmark is needed, with a standard, unified metric and a dataset that provides maps and covers more complex environments. This benchmark should support long-term, multi-modal prediction in scenes with obstacle avoidance and non-convex constraints, and should allow history horizons of different lengths to be used to predict future trajectories over different prediction horizons. In addition, a test set is needed for running inference with the trained model, so that computation time can be compared on a unified basis. Moreover, since perception and tracking are not always accurate in real AV applications, the benchmark dataset should include test sets with inaccurate ground-truth values, so that it better reflects real applications.
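The metrics named in item 4 can be illustrated with a minimal sketch. It assumes that mADE and mFDE denote the minimum ADE and FDE over a set of predicted modes, and that NLL is evaluated under an isotropic 2-D Gaussian at each time step; both are illustrative assumptions rather than definitions taken from this survey. The function names and the toy trajectories are hypothetical.

```python
import math

def ade(pred, truth):
    """Average displacement error: mean Euclidean distance over all time steps."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)

def fde(pred, truth):
    """Final displacement error: Euclidean distance at the last time step."""
    return math.dist(pred[-1], truth[-1])

def min_ade(modes, truth):
    """Assumed meaning of mADE: the best ADE among the predicted modes."""
    return min(ade(m, truth) for m in modes)

def min_fde(modes, truth):
    """Assumed meaning of mFDE: the best FDE among the predicted modes."""
    return min(fde(m, truth) for m in modes)

def gaussian_nll(means, sigmas, truth):
    """Mean negative log-likelihood of the ground truth under one isotropic
    2-D Gaussian per time step (an assumed, simplified output distribution)."""
    total = 0.0
    for (mx, my), s, (tx, ty) in zip(means, sigmas, truth):
        sq_dist = (tx - mx) ** 2 + (ty - my) ** 2
        total += math.log(2.0 * math.pi * s * s) + sq_dist / (2.0 * s * s)
    return total / len(truth)

# Toy data: a straight ground-truth path and two hypothetical predicted modes.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
mode_a = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
mode_b = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)]

print(ade(mode_a, truth), fde(mode_a, truth))
print(min_ade([mode_a, mode_b], truth), min_fde([mode_a, mode_b], truth))
print(gaussian_nll(mode_b, [1.0, 1.0, 1.0], truth))
```

Displacement metrics score a single trajectory, whereas the minimum-over-modes and likelihood variants reward a model for covering the true future with at least one mode or with a well-calibrated distribution, which is why the two families are usually reported together.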

IX. CONCLUSION

In this paper, a thorough analysis of the trajectory-prediction problem for AVs is presented and a taxonomy of trajectory-prediction methods is proposed. Trajectory-prediction methods for AVs are comprehensively reviewed, which include the physics-

[13] W. Wang, R. Yu, Q. Huang, and U. Neumann, “Sgpn: Similarity [35] Y. Wang, Z. Liu, Z. Zuo, Z. Li, L. Wang, and X. Luo, “Trajectory
group proposal network for 3d point cloud instance segmentation,” in planning and safety assessment of autonomous vehicles based on
Proceedings of the IEEE Conference on Computer Vision and Pattern motion prediction and model predictive control,” IEEE Transactions
Recognition, 2018, pp. 2569–2578. on Vehicular Technology, vol. 68, no. 9, pp. 8546–8556, 2019.
[14] S. Chen, J. Hu, Y. Shi, Y. Peng, J. Fang, R. Zhao, and L. Zhao, [36] H. Song, D. Luan, W. Ding, M. Y. Wang, and Q. Chen, “Learning to
“Vehicle-to-everything (v2x) services supported by lte-based systems predict vehicle trajectories with model-based planning,” arXiv preprint
and 5g,” IEEE Communications Standards Magazine, vol. 1, no. 2, pp. arXiv:2103.04027, 2021.
70–76, 2017. [37] C. E. Rasmussen, “Gaussian processes in machine learning,” in Summer
[15] S. Ammoun and F. Nashashibi, “Real time trajectory prediction for school on machine learning. Springer, 2003, pp. 63–71.
collision risk estimation between vehicles,” in Intelligent Computer [38] J. Joseph, F. Doshi velez, and N. Roy, “A bayesian nonparametric
Communication and Processing, 2009. ICCP 2009. IEEE 5th Inter- approach to modeling mobility patterns.” 01 2010.
national Conference on, 2009. [39] J. Joseph, F. Doshi-Velez, A. S. Huang, and N. Roy, “A bayesian
[16] A. Broadhurst, S. Baker, and T. Kanade, “Monte carlo road safety nonparametric approach to modeling motion patterns,” Autonomous
reasoning,” in Intelligent Vehicles Symposium, 2005. Proceedings. Robots, vol. 31, no. 4, pp. 383–400, 2011.
IEEE, 2005. [40] Q. Tran and J. Firl, “Online maneuver recognition and multimodal
[17] C. F. Lin and A. G. Ulsoy, “Vehicle dynamics and external disturbance trajectory prediction for intersection assistance using non-parametric
estimation for vehicle path prediction,” IEEE Transactions on Control regression,” in 2014 ieee intelligent vehicles symposium proceedings.
Systems Technology, vol. 8, no. 3, pp. 508–518, 2000. IEEE, 2014, pp. 918–923.
[18] R. Pepy, A. Lambert, and H. Mounier, “Reducing navigation errors [41] C. Laugier, I. E. Paromtchik, M. Perrollaz, M. Yong, J.-D. Yoder,
by planning with realistic vehicle model,” in Intelligent Vehicles C. Tay, K. Mekhnacha, and A. Nègre, “Probabilistic analysis of
Symposium, 2006. dynamic scenes and collision risks assessment to improve driving
[19] N. Kaempchen, B. Schiele, and K. Dietmayer, “Situation assessment safety,” IEEE Intelligent Transportation Systems Magazine, vol. 3,
of an autonomous emergency brake for arbitrary vehicle-to-vehicle no. 4, pp. 4–19, 2011.
collision scenarios,” IEEE Transactions on Intelligent Transportation [42] P. Trautman and A. Krause, “Unfreezing the robot: Navigation in dense,
Systems, vol. 10, no. 4, pp. 678–687, 2009. interacting crowds,” in 2010 IEEE/RSJ International Conference on
[20] R. Schubert, E. Richter, and G. Wanielik, “Comparison and evalua- Intelligent Robots and Systems. IEEE, 2010, pp. 797–803.
tion of advanced motion models for vehicle tracking,” in 2008 11th [43] Y. Guo, V. V. Kalidindi, M. Arief, W. Wang, J. Zhu, H. Peng, and
International Conference on Information Fusion, 2008. D. Zhao, “Modeling multi-vehicle interaction scenarios using gaussian
[21] A. Polychronopoulos, M. Tsogas, A. J. Amditis, and L. Andreone, random field,” in 2019 IEEE Intelligent Transportation Systems Con-
“Sensor fusion for predicting vehicles’ path for collision avoidance ference (ITSC), 2019, pp. 3974–3980.
systems,” IEEE Transactions on Intelligent Transportation Systems, [44] D. A. V. Govea and T. Fraichard, “Motion prediction for moving
vol. 8, no. 3, pp. 549–562, 2007. objects: a statistical approach,” in IEEE International Conference on
[22] P. Lytrivis, G. Thomaidis, and A. Amditis, “Cooperative path prediction Robotics & Automation, 2004.
in vehicular environments,” in Intelligent Transportation Systems, 2008. [45] C. Hermes, C. Wohler, K. Schenk, and F. Kummert, “Long-term vehicle
ITSC 2008. 11th International IEEE Conference on, 2008. motion prediction,” in Intelligent Vehicles Symposium, 2010.
[23] A. Barth and U. Franke, “Where will the oncoming vehicle be the next [46] H. Mandalia and D. Salvucci, “Using support vector machines for lane-
second?” in Intelligent Vehicles Symposium, 2008. change detection,” Proceedings of the Human Factors and Ergonomics
[24] T. Batz, K. Watson, and J. Beyerer, “Recognition of dangerous situ- Society Annual Meeting, vol. 49, 09 2005.
ations within a cooperative group of vehicles,” in Intelligent Vehicles [47] P. Kumar, M. Perrollaz, S. Lefèvre, and C. Laugier, “Learning-based
Symposium, 2009. approach for online lane change intention prediction,” in IEEE Intelli-
[25] M. Brannstrom, E. Coelingh, and J. Sjoberg, “Model-based threat as- gent Vehicles Symposium, 2013.
sessment for avoiding arbitrary vehicle collisions,” IEEE Transactions [48] G. Aoude and J. How, “Using support vector machines and bayesian
on Intelligent Transportation Systems, vol. 11, no. 3, pp. 658–669, filtering for classifying agent intentions at road intersections,” 09 2009.
2010. [49] G. S. Aoude, B. D. Luders, K. K. H. Lee, D. S. Levine, and J. P. How,
[26] R. Miller and Q. Huang, “An adaptive peer-to-peer collision warning “Threat assessment design for driver assistance system at intersections,”
system,” in IEEE Vehicular Technology Conference, 2002. in 13th International IEEE Conference on Intelligent Transportation
[27] J. Hillenbrand, A. M. Spieker, and K. Kroschel, “A multilevel collision Systems, 2010, pp. 1855–1862.
mitigation approach—its situation assessment, decision making, and [50] S. Gambs, M.-O. Killijian, and M. Nunez del Prado Cortez, “Next
performance tradeoffs,” IEEE Transactions on Intelligent Transporta- place prediction using mobility markov chains,” 04 2012.
tion Systems, vol. 7, pp. 528–540, 2006. [51] Q. Deng and D. Söffker, “Improved driving behaviors prediction
[28] N. Kaempchen, K. Weiss, M. Schaefer, and K. C. J. Dietmayer, “Imm based on fuzzy logic-hidden markov model (fl-hmm),” in 2018 IEEE
object tracking for high dynamic driving maneuvers,” in Intelligent Intelligent Vehicles Symposium (IV), 2018, pp. 2003–2008.
Vehicles Symposium, 2004. [52] H. Berndt, J. Emmert, and K. Dietmayer, “Continuous driver inten-
[29] B. Jin, J. Bo, S. Tao, H. Liu, and G. Liu, “Switched kalman filter- tion recognition with hidden markov models,” in International IEEE
interacting multiple model algorithm based on optimal autoregressive Conference on Intelligent Transportation Systems, 2008.
model for manoeuvring target tracking,” Iet Radar Sonar & Navigation, [53] S. Qiao, D. Shen, X. Wang, N. Han, and W. Zhu, “A self-adaptive
vol. 9, no. 2, pp. 199–209, 2015. parameter selection trajectory prediction approach via hidden markov
[30] H. Dyckmanns, R. Matthaei, M. Maurer, B. Lichte, and D. Stuker, models,” IEEE Transactions on Intelligent Transportation Systems,
“Object tracking in urban intersections based on active use of a priori vol. 16, no. 1, pp. 284–296, 2015.
knowledge: Active interacting multi model filter,” in Intelligent Vehicles [54] Y. Wang, C. Wang, W. Zhao, and C. Xu, “Decision-making and
Symposium, 2011. planning method for autonomous vehicles based on motivation and
[31] Zhang, Ruifeng, Cao, Libo, Tan, Jianjie, Bao, and Shan, “A method for risk assessment,” IEEE Transactions on Vehicular Technology, vol. 70,
connected vehicle trajectory prediction and collision warning algorithm no. 1, pp. 107–120, 2021.
based on v2v communication,” International journal of crashworthi- [55] N. Deo, A. Rangesh, and M. M. Trivedi, “How would surround vehicles
ness, 2017. move? a unified framework for maneuver classification and motion
[32] V. Lefkopoulos, M. Menner, A. Domahidi, and M. N. Zeilinger, prediction,” IEEE Transactions on Intelligent Vehicles, pp. 129–140,
“Interaction-aware motion prediction for autonomous driving: A mul- 2018.
tiple model kalman filtering scheme,” IEEE Robotics and Automation [56] S. Zhang, Y. Zhi, R. He, and J. Li, “Research on traffic vehicle behavior
Letters, vol. 6, no. 1, pp. 80–87, 2021. prediction method based on game theory and hmm,” IEEE Access,
[33] M. Althoff and A. Mergel, “Comparison of markov chain abstraction vol. 8, pp. 30 210–30 222, 2020.
and monte carlo simulation for the safety assessment of autonomous [57] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles
cars,” IEEE Transactions on Intelligent Transportation Systems, vol. 12, and Techniques. MIT Press, 2009.
no. 4, pp. 1237–1247, 2011. [58] K. Murphy, “Dynamic bayesian networks: Representation, inference
[34] K. Okamoto, K. Berntorp, and S. Di Cairano, “Driver intention-based and learning,” Ph.D. dissertation, 01 2002.
vehicle threat assessment using random forests and particle filtering,” [59] T. Gindele, S. Brechtel, and R. Dillmann, “Learning driver behavior
IFAC-PapersOnLine, vol. 50, pp. 13 860–13 865, 07 2017. models from traffic observations for decision making and planning,”

Intelligent Transportation Systems Magazine, IEEE, vol. 7, no. 1, pp. [82] N. Deo and M. M. Trivedi, “Multi-modal trajectory prediction of
69–79, 2015. surrounding vehicles with maneuver based lstms,” in 2018 IEEE
[60] M. Schreier, V. Willert, and J. Adamy, “An integrated approach Intelligent Vehicles Symposium (IV), 2018, pp. 1179–1184.
to maneuver-based trajectory prediction and criticality assessment in [83] K. Min, D. Kim, J. Park, and K. Huh, “Rnn-based path prediction of
arbitrary road environments,” IEEE Transactions on Intelligent Trans- obstacle vehicles with deep ensemble,” IEEE Transactions on Vehicular
portation Systems, vol. 17, no. 10, pp. 2751–2766, 2016. Technology, vol. 68, no. 10, pp. 10 252–10 256, 2019.
[61] M. Bahram, A. Lawitzky, J. Friedrichs, M. Aeberhard, and D. Wollherr, [84] C. Tang and R. R. Salakhutdinov, “Multiple futures prediction,” Ad-
“A game-theoretic approach to replanning-aware interactive scene vances in Neural Information Processing Systems, vol. 32, pp. 15 424–
prediction and planning,” IEEE Transactions on Vehicular Technology, 15 434, 2019.
vol. 65, no. 6, pp. 3981–3992, 2015. [85] J. Mercat, T. Gilles, N. El Zoghby, G. Sandou, D. Beauvois, and
[62] G. He, X. Li, Y. Lv, B. Gao, and H. Chen, “Probabilistic intention G. P. Gil, “Multi-head attention for multi-modal joint vehicle motion
prediction and trajectory generation based on dynamic bayesian net- forecasting,” in 2020 IEEE International Conference on Robotics and
works,” in 2019 Chinese Automation Congress (CAC), 2019. Automation (ICRA). IEEE, 2020, pp. 9638–9644.
[63] J. Li, B. Dai, X. Li, X. Xu, and D. Liu, “A dynamic bayesian [86] X. Li, Y. Liu, K. Wang, and F.-Y. Wang, “A recurrent attention
network for vehicle maneuver prediction in highway driving scenarios: and interaction model for pedestrian trajectory prediction,” IEEE/CAA
Framework and verification,” Electronics, vol. 8, no. 1, pp. 40–, 2019. Journal of Automatica Sinica, vol. 7, no. 5, pp. 1361–1370, 2020.
[64] Y. Li, X. Y. Lu, J. Wang, and K. Li, “Pedestrian trajectory predic- [87] T. Zhang, W. Song, M. Fu, Y. Yang, and M. Wang, “Vehicle motion
tion combining probabilistic reasoning and sequence learning,” IEEE prediction at intersections based on the turning intention and prior
Transactions on Intelligent Vehicles, vol. 5, no. 3, pp. 461–474, 2020. trajectories model,” IEEE/CAA Journal of Automatica Sinica, 2021.
[65] T. Heskes and O. Zoeter, “Expectation propagation for approximate [88] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification
inference in dynamic bayesian networks,” Proceedings UAI-2002, 12 with deep convolutional neural networks,” Advances in neural infor-
2012. mation processing systems, vol. 25, pp. 1097–1105, 2012.
[66] G. Weidl, A. L. Madsen, D. Kasper, and G. Breuel, “Optimizing [89] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
bayesian networks for recognition of driving maneuvers to meet the recognition,” in Proceedings of the IEEE conference on computer vision
automotive requirements,” in 2014 IEEE International Symposium on and pattern recognition, 2016, pp. 770–778.
Intelligent Control (ISIC). IEEE, 2014, pp. 1626–1631. [90] J. Gehring, M. Auli, D. Grangier, D. Yarats, and Y. N. Dauphin, “Con-
[67] A. Graves, “Generating sequences with recurrent neural networks,” volutional sequence to sequence learning,” in International Conference
arXiv preprint arXiv:1308.0850, 2013. on Machine Learning. PMLR, 2017, pp. 1243–1252.
[68] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning [91] N. Nikhil and B. Tran Morris, “Convolutional neural network for
with neural networks,” in Advances in neural information processing trajectory prediction,” in Proceedings of the European Conference on
systems, 2014, pp. 3104–3112. Computer Vision (ECCV) Workshops, 2018, pp. 0–0.
[69] Alex, Zyner, Stewart, Worrall, Eduardo, and Nebot, “A recurrent [92] T. Phan-Minh, E. C. Grigore, F. A. Boulton, O. Beijbom, and E. M.
neural network solution for predicting driver intention at unsignalized Wolff, “Covernet: Multimodal behavior prediction using trajectory
intersections,” IEEE Robotics & Automation Letters, 2018. sets,” in Proceedings of the IEEE/CVF Conference on Computer Vision
[70] A. Zyner, S. Worrall, J. Ward, and E. Nebot, “Long short term memory and Pattern Recognition, 2020, pp. 14 074–14 083.
for driver intent prediction,” in Intelligent Vehicles Symposium, 2017. [93] H. Cui, T. Nguyen, F.-C. Chou, T.-H. Lin, J. Schneider, D. Bradley,
[71] D. J. Phillips, T. A. Wheeler, and M. J. Kochenderfer, “Generalizable and N. Djuric, “Deep kinematic models for kinematically feasible
intention prediction of human drivers at intersections,” in 2017 IEEE vehicle trajectory predictions,” in 2020 IEEE International Conference
Intelligent Vehicles Symposium (IV), 2017. on Robotics and Automation (ICRA). IEEE, 2020, pp. 10 563–10 569.
[72] F. Altché and A. D. L. Fortelle, “An lstm network for highway [94] H. Cui, V. Radosavljevic, F.-C. Chou, T.-H. Lin, T. Nguyen, T.-K.
trajectory prediction,” in 2017 IEEE 20th International Conference on Huang, J. Schneider, and N. Djuric, “Multimodal trajectory predictions
Intelligent Transportation Systems (ITSC), 2017. for autonomous driving using deep convolutional networks,” in 2019
[73] W. Ding and S. Shen, “Online vehicle trajectory prediction using International Conference on Robotics and Automation (ICRA). IEEE,
policy anticipation network and optimization-based context reasoning,” 2019, pp. 2090–2096.
in 2019 International Conference on Robotics and Automation (ICRA). [95] F.-C. Chou, T.-H. Lin, H. Cui, V. Radosavljevic, T. Nguyen, T.-K.
IEEE, 2019, pp. 9610–9616. Huang, M. Niedoba, J. Schneider, and N. Djuric, “Predicting motion
[74] A. Zyner, S. Worrall, and E. Nebot, “Naturalistic driver intention and of vulnerable road users using high-definition maps and efficient
path prediction using recurrent neural networks,” IEEE transactions on convnets,” in 2020 IEEE Intelligent Vehicles Symposium (IV). IEEE,
intelligent transportation systems, vol. 21, no. 4, pp. 1584–1594, 2019. 2020, pp. 1655–1662.
[75] S. H. Park, B. Kim, C. M. Kang, C. C. Chung, and J. W. [96] N. Djuric, V. Radosavljevic, H. Cui, T. Nguyen, F.-C. Chou, T.-H.
Choi, “Sequence-to-sequence prediction of vehicle trajectory via lstm Lin, N. Singh, and J. Schneider, “Uncertainty-aware short-term motion
encoder-decoder architecture,” in 2018 IEEE Intelligent Vehicles Sym- prediction of traffic actors for autonomous driving,” in Proceedings of
posium (IV). IEEE, 2018, pp. 1672–1678. the IEEE/CVF Winter Conference on Applications of Computer Vision,
[76] Y. Xing, C. Lv, and D. Cao, “Personalized vehicle trajectory prediction 2020, pp. 2095–2104.
based on joint time-series modeling for connected vehicles,” IEEE [97] J. Strohbeck, V. Belagiannis, J. Müller, M. Schreiber, M. Herrmann,
Transactions on Vehicular Technology, vol. 69, no. 2, pp. 1341–1352, D. Wolf, and M. Buchholz, “Multiple trajectory prediction with deep
2020. temporal and spatial convolutional neural networks,” in 2020 IEEE/RSJ
[77] M.-F. Chang, J. Lambert, P. Sangkloy, J. Singh, S. Bak, A. Hartnett, International Conference on Intelligent Robots and Systems (IROS).
D. Wang, P. Carr, S. Lucey, D. Ramanan et al., “Argoverse: 3d tracking IEEE, 2020, pp. 1992–1998.
and forecasting with rich maps,” in Proceedings of the IEEE/CVF [98] Y. Zhang, Y. Zou, J. Tang, and J. Liang, “A lane-changing predic-
Conference on Computer Vision and Pattern Recognition, 2019, pp. tion method based on temporal convolution network,” arXiv preprint
8748–8757. arXiv:2011.01224, 2020.
[78] A. Kawasaki and A. Seki, “Multimodal trajectory predictions for [99] F. Marchetti, F. Becattini, L. Seidenari, and A. D. Bimbo, “Mantra:
urban environments using geometric relationships between a vehicle Memory augmented networks for multiple trajectory prediction,” in
and lanes,” in 2020 IEEE International Conference on Robotics and Proceedings of the IEEE/CVF Conference on Computer Vision and
Automation (ICRA). IEEE, 2020, pp. 9203–9209. Pattern Recognition, 2020, pp. 7143–7152.
[79] S. Dai, L. Li, and Z. Li, “Modeling vehicle interactions via modified [100] T. Gilles, S. Sabatini, D. Tsishkou, B. Stanciulescu, and F. Moutarde,
lstm models for trajectory prediction,” IEEE Access, pp. 38 287–38 296, “Home: Heatmap output for future motion estimation,” in 2021 IEEE
2019. International Intelligent Transportation Systems Conference (ITSC).
[80] W. Ding, J. Chen, and S. Shen, “Predicting vehicle behaviors over IEEE, 2021, pp. 500–507.
an extended horizon using behavior interaction network,” in 2019 [101] M. Ye, T. Cao, and Q. Chen, “Tpcn: Temporal point cloud networks
International Conference on Robotics and Automation (ICRA), 2019. for motion forecasting,” in Proceedings of the IEEE/CVF Conference
[81] L. Xin, P. Wang, C. Y. Chan, J. Chen, and B. Cheng, “Intention-aware on Computer Vision and Pattern Recognition, 2021, pp. 11 318–11 327.
long horizon trajectory prediction of surrounding vehicles using dual [102] N. Deo and M. M. Trivedi, “Convolutional social pooling for vehicle
lstm networks,” in 2018 IEEE International Conference on Intelligent trajectory prediction,” in 2018 IEEE/CVF Conference on Computer
Transportation Systems (ITSC), 2018. Vision and Pattern Recognition Workshops (CVPRW), 2018.

[103] A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, and S. Savarese, Conference on Computer Vision and Pattern Recognition, 2021, pp.
“Social lstm: Human trajectory prediction in crowded spaces,” in 7577–7586.
2016 IEEE Conference on Computer Vision and Pattern Recognition [125] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, “A
(CVPR), 2016. comprehensive survey on graph neural networks,” IEEE Transactions
[104] R. Chandra, U. Bhattacharya, A. Bera, and D. Manocha, “Traphic: on Neural Networks and Learning Systems, 2019.
Trajectory prediction in dense and heterogeneous traffic using weighted [126] F. Diehl, T. Brunner, M. T. Le, and A. Knoll, “Graph neural networks
interactions,” in Proceedings of the IEEE/CVF Conference on Com- for modelling traffic participant interaction,” in 2019 IEEE Intelligent
puter Vision and Pattern Recognition, 2019, pp. 8483–8492. Vehicles Symposium (IV), 2019.
[105] G. Xie, A. Shangguan, R. Fei, W. Ji, and X. Hei, “Motion trajectory [127] X. Li, X. Ying, and M. C. Chuah, “Grip: Graph-based interaction-aware
prediction based on a cnn-lstm sequential model,” Sciece China. trajectory prediction,” in 2019 IEEE Intelligent Transportation Systems
Information Sciences, vol. 63, no. 11, 2020. Conference (ITSC). IEEE, 2019, pp. 3960–3966.
[106] S. Casas, C. Gulino, S. Suo, and R. Urtasun, “The importance of [128] ——, “Grip++: Enhanced graph-based interaction-aware trajectory
prior knowledge in precise multimodal prediction,” in 2020 IEEE/RSJ prediction for autonomous driving,” arXiv preprint arXiv:1907.07792,
International Conference on Intelligent Robots and Systems (IROS). 2019.
IEEE, 2020, pp. 2295–2302. [129] X. Huang, P. Wang, X. Cheng, D. Zhou, Q. Geng, and R. Yang, “The
[107] N. Lee, W. Choi, P. Vernaza, C. B. Choy, P. H. Torr, and M. Chandraker, apolloscape open dataset for autonomous driving and its application,”
“Desire: Distant future prediction in dynamic scenes with interacting in 2018 IEEE/CVF Conference on Computer Vision and Pattern
agents,” in Proceedings of the IEEE Conference on Computer Vision Recognition Workshops (CVPRW), 2018.
and Pattern Recognition, 2017, pp. 336–345. [130] H. Jeon, J. Choi, and D. Kum, “Scale-net: Scalable vehicle trajectory
[108] J. Hong, B. Sapp, and J. Philbin, “Rules of the road: Predicting prediction network under random number of interacting vehicles via
driving behavior with a convolutional model of semantic interactions,” edge-enhanced graph convolutional neural network,” arXiv, 2020.
in Proceedings of the IEEE/CVF Conference on Computer Vision and [131] L. Gong and Q. Cheng, “Exploiting edge features in graph neural
Pattern Recognition, 2019, pp. 8454–8462. networks,” in 2019 IEEE/CVF Conference on Computer Vision and
[109] Y. Chai, B. Sapp, M. Bansal, and D. Anguelov, “Multipath: Multiple Pattern Recognition (CVPR), 2018.
probabilistic anchor trajectory hypotheses for behavior prediction,” [132] A. Mohamed, K. Qian, M. Elhoseiny, and C. Claudel, “Social-stgcnn:
arXiv preprint arXiv:1910.05449, 2019. A social spatio-temporal graph convolutional neural network for human
[110] D. Hu, “An introductory survey on attention mechanisms in nlp prob- trajectory prediction,” in 2020 IEEE/CVF Conference on Computer
lems,” in Proceedings of SAI Intelligent Systems Conference. Springer, Vision and Pattern Recognition (CVPR), 2020.
2019, pp. 432–448. [133] R. Chandra, T. Guan, S. Panuganti, T. Mittal, U. Bhattacharya, A. Bera,
[111] V. Mnih, N. Heess, A. Graves et al., “Recurrent models of visual and D. Manocha, “Forecasting trajectory and behavior of road-agents
attention,” in Advances in neural information processing systems, 2014, using spectral clustering in graph-lstms,” IEEE Robotics and Automa-
pp. 2204–2212. tion Letters, vol. 5, no. 3, pp. 4882–4890, 2020.
[112] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by [134] B. M. Waxman, “Routing of multipoint connections,” IEEE
jointly learning to align and translate,” arXiv preprint arXiv:1409.0473, J.select.areas Commun, vol. 6, no. 9, pp. 1617–1622, 1988.
2014. [135] Z. Zhao, H. Fang, Z. Jin, and Q. Qiu, “Gisnet: Graph-based information
[113] D. Varshneya and G. Srinivasaraghavan, “Human trajectory predic- sharing network for vehicle trajectory prediction,” in 2020 International
tion using spatially aware deep attention models,” arXiv preprint Joint Conference on Neural Networks (IJCNN). IEEE, 2020, pp. 1–7.
arXiv:1705.09436, 2017. [136] J. Ziegler, P. Bender, M. Schreiber, H. Lategahn, T. Strauss, C. Stiller,
[114] S. Haddad, M. Wu, H. Wei, and S. K. Lam, “Situation-aware pedes- T. Dang, U. Franke, N. Appenrodt, C. G. Keller et al., “Making bertha
trian trajectory prediction with spatio-temporal attention model,” arXiv drive—an autonomous journey on a historic route,” IEEE Intelligent
preprint arXiv:1902.05437, 2019. transportation systems magazine, vol. 6, no. 2, pp. 8–20, 2014.
[115] T. Fernando, S. Denman, S. Sridharan, and C. Fookes, “Soft+ hardwired [137] J. Gao, C. Sun, H. Zhao, Y. Shen, D. Anguelov, C. Li, and C. Schmid,
attention: An lstm framework for human trajectory prediction and “Vectornet: Encoding hd maps and agent dynamics from vectorized
abnormal event detection,” Neural networks, vol. 108, pp. 466–478, representation,” in Proceedings of the IEEE/CVF Conference on Com-
2018. puter Vision and Pattern Recognition, 2020, pp. 11 525–11 533.
[116] H. Kim, D. Kim, G. Kim, J. Cho, and K. Huh, “Multi-head atten- [138] M. Liang, B. Yang, R. Hu, Y. Chen, R. Liao, S. Feng, and R. Urtasun,
tion based probabilistic vehicle trajectory prediction,” in 2020 IEEE “Learning lane graph representations for motion forecasting,” in Euro-
Intelligent Vehicles Symposium (IV). IEEE, 2020, pp. 1720–1725. pean Conference on Computer Vision. Springer, 2020, pp. 541–556.
[117] K. Messaoud, I. Yahiaoui, A. Verroust-Blondet, and F. Nashashibi, [139] H. Zhao, J. Gao, T. Lan, C. Sun, B. Sapp, B. Varadarajan, Y. Shen,
“Attention based vehicle trajectory prediction,” IEEE Transactions on Y. Shen, Y. Chai, C. Schmid et al., “Tnt: Target-driven trajectory
Intelligent Vehicles, vol. 6, no. 1, pp. 175–185, 2020. prediction,” in Conference on Robot Learning (CoRL), 2020, pp. 895–
[118] K. Messaoud, N. Deo, M. M. Trivedi, and F. Nashashibi, “Trajectory 904.
prediction for autonomous driving based on multi-head attention with [140] J. Gu, C. Sun, and H. Zhao, “Densetnt: End-to-end trajectory prediction
joint agent-map representation,” in 2021 IEEE Intelligent Vehicles from dense goal sets,” in Proceedings of the IEEE/CVF International
Symposium (IV). IEEE, 2021, pp. 165–170. Conference on Computer Vision, 2021, pp. 15 303–15 312.
[119] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. [141] W. Zeng, M. Liang, R. Liao, and R. Urtasun, “Lanercnn: Dis-
Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in tributed representations for graph-centric motion forecasting,” in 2021
Advances in neural information processing systems, 2017, pp. 5998– IEEE/RSJ International Conference on Intelligent Robots and Systems
6008. (IROS). IEEE, 2021, pp. 532–539.
[120] F. Giuliari, I. Hasan, M. Cristani, and F. Galasso, “Transformer [142] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and
networks for trajectory forecasting,” in 2020 25th International Confer- Y. Bengio, “Graph attention networks,” in 6th International Conference
ence on Pattern Recognition (ICPR). IEEE, 2021, pp. 10 335–10 342. on Learning Representations (ICLR), 2018.
[121] Z. Huang, X. Mo, and C. Lv, “Multi-modal motion prediction with [143] Y. Huang, H. Bi, Z. Li, T. Mao, and Z. Wang, “Stgat: Modeling
transformer-based neural network for autonomous driving,” arXiv spatial-temporal interactions for human trajectory prediction,” in 2019
preprint arXiv:2109.06446, 2021. International Conference in Computer Vision, 2019.
[122] L. L. Li, B. Yang, M. Liang, W. Zeng, M. Ren, S. Segal, and [144] L. Zhang, Q. She, and P. Guo, “Stochastic trajectory prediction with
R. Urtasun, “End-to-end contextual perception and prediction with social graph network,” 07 2019.
interaction transformer,” in 2020 IEEE/RSJ International Conference [145] T. Salzmann, B. Ivanovic, P. Chakravarty, and M. Pavone, “Trajec-
on Intelligent Robots and Systems (IROS). IEEE, 2020, pp. 5784– tron++: Dynamically-feasible trajectory forecasting with heterogeneous
5791. data,” in European Conference on Computer Vision. Springer, 2020,
[123] J. Ngiam, B. Caine, V. Vasudevan, Z. Zhang, H.-T. L. Chiang, J. Ling, pp. 683–700.
R. Roelofs, A. Bewley, C. Liu, A. Venugopal et al., “Scene transformer: [146] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,
A unified multi-task model for behavior prediction and planning,” in he S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,”
International Conference on Learning Representations (ICLR), 2021. Advances in neural information processing systems, vol. 27, 2014.
[124] Y. Liu, J. Zhang, L. Fang, Q. Jiang, and B. Zhou, “Multimodal motion [147] A. Gupta, J. Johnson, L. Fei-Fei, S. Savarese, and A. Alahi, “Social gan:
prediction with stacked transformers,” in Proceedings of the IEEE/CVF Socially acceptable trajectories with generative adversarial networks,”

in 2018 IEEE/CVF Conference on Computer Vision and Pattern [170] D. Silver, J. A. Bagnell, and A. Stentz, “Learning autonomous driving
Recognition (CVPR), 2018. styles and maneuvers from expert demonstration,” in Experimental
[148] B. Yang, G. Yan, P. Wang, C.-y. Chan, X. Liu, and Y. Chen, “Tppo: Robotics. Springer, 2013, pp. 371–386.
A novel trajectory predictor with pseudo oracle,” 02 2020. [171] B. D. Ziebart, A. L. Maas, J. A. Bagnell, A. K. Dey et al., “Maximum
[149] J. Li, H. Ma, and M. Tomizuka, “Conditional generative neural system entropy inverse reinforcement learning.” in Aaai, vol. 8. Chicago, IL,
for probabilistic trajectory prediction,” in 2019 IEEE/RSJ International USA, 2008, pp. 1433–1438.
Conference on Intelligent Robots and Systems (IROS), 2019. [172] M. Herman, V. Fischer, T. Gindele, and W. Burgard, “Inverse rein-
[150] A. Sadeghian, V. Kosaraju, A. Sadeghian, N. Hirose, H. Rezatofighi, forcement learning of behavioral models for online-adapting navigation
and S. Savarese, “Sophie: An attentive gan for predicting paths compli- strategies,” in 2015 IEEE international conference on robotics and
ant to social and physical constraints,” in 2019 IEEE/CVF Conference automation (ICRA). IEEE, 2015, pp. 3215–3222.
on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 1349– [173] S. Sharifzadeh, I. Chiotellis, R. Triebel, and D. Cremers, “Learning to
1358. drive using inverse reinforcement learning and deep q-networks,” arXiv
[151] C. Hegde, S. Dash, and P. Agarwal, “Vehicle trajectory prediction using preprint arXiv:1612.03653, 2016.
gan,” in 2020 Fourth International Conference on I-SMAC (IoT in [174] L. Sun, W. Zhan, and M. Tomizuka, “Probabilistic prediction of interac-
Social, Mobile, Analytics and Cloud) (I-SMAC), 2020, pp. 502–507. tive driving behavior via hierarchical inverse reinforcement learning,”
[152] T. Zhao, Y. Xu, M. Monfort, W. Choi, C. Baker, Y. Zhao, Y. Wang, in 2018 21st International Conference on Intelligent Transportation
and Y. N. Wu, “Multi-agent tensor fusion for contextual trajectory Systems (ITSC). IEEE, 2018, pp. 2111–2117.
prediction,” in 2019 IEEE/CVF Conference on Computer Vision and [175] D. S. González, O. Erkent, V. Romero-Cano, J. Dibangoye, and
Pattern Recognition (CVPR), 2019. C. Laugier, “Modeling driver behavior from demonstrations in dynamic
[153] Y. Wang, S. Zhao, R. Zhang, X. Cheng, and L. Yang, “Multi-vehicle environments using spatiotemporal lattices,” in 2018 IEEE Interna-
collaborative learning for trajectory prediction with spatio-temporal tional Conference on Robotics and Automation (ICRA). IEEE, 2018,
tensor fusion,” IEEE Transactions on Intelligent Transportation Sys- pp. 3384–3390.
tems, vol. PP, no. 99, pp. 1–13, 2020. [176] D. Xu, Z. Ding, X. He, H. Zhao, M. Moze, F. Aioun, and F. Guillemard,
[154] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” arXiv “Learning from naturalistic driving data for human-like autonomous
preprint arXiv:1312.6114, 2013. highway driving,” IEEE Transactions on Intelligent Transportation
[155] K. Sohn, H. Lee, and X. Yan, “Learning structured output represen- Systems, 2020.
tation using deep conditional generative models,” Advances in neural [177] Z. Wu, L. Sun, W. Zhan, C. Yang, and M. Tomizuka, “Efficient
information processing systems, vol. 28, pp. 3483–3491, 2015. sampling-based maximum entropy inverse reinforcement learning with
[156] N. Rhinehart, K. M. Kitani, and P. Vernaza, “R2p2: A reparameterized application to autonomous driving,” IEEE Robotics and Automation
pushforward policy for diverse, precise generative path forecasting,” in Letters, vol. 5, no. 4, pp. 5355–5362, 2020.
Proceedings of the European Conference on Computer Vision (ECCV), [178] N. Deo and M. M. Trivedi, “Trajectory forecasts in unknown
2018, pp. 772–788. environments conditioned on grid-based plans,” arXiv preprint
[157] S. Casas, C. Gulino, S. Suo, K. Luo, R. Liao, and R. Urtasun, “Implicit arXiv:2001.00735, 2020.
latent variable model for scene-consistent motion forecasting,” in [179] L. Xin, S. E. Li, P. Wang, W. Cao, B. Nie, C.-Y. Chan, and
Proceedings of the European Conference on Computer Vision (ECCV). B. Cheng, “Accelerated inverse reinforcement learning with randomly
Springer, 2020. pre-sampled policies for autonomous driving reward design,” in 2019
[158] C. R. Qi, L. Yi, H. Su, and L. J. Guibas, “Pointnet++: Deep hierarchical IEEE Intelligent Transportation Systems Conference (ITSC). IEEE,
feature learning on point sets in a metric space,” Advances in neural 2019, pp. 2757–2764.
information processing systems, vol. 30, 2017. [180] Y. Xu, T. Zhao, C. Baker, Y. Zhao, and Y. N. Wu, “Learning trajec-
[159] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “Pointnet: Deep learning tory prediction with continuous inverse optimal control via langevin
on point sets for 3d classification and segmentation,” in Proceedings sampling of energy-based models,” arXiv preprint arXiv:1904.05453,
of the IEEE conference on computer vision and pattern recognition, 2019.
2017, pp. 652–660. [181] Z. Huang, J. Wu, and C. Lv, “Driving behavior modeling using
[160] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. naturalistic human driving data with inverse reinforcement learning,”
Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski IEEE Transactions on Intelligent Transportation Systems, 2021.
et al., “Human-level control through deep reinforcement learning,” [182] A. Kuefler, J. Morton, T. Wheeler, and M. Kochenderfer, “Imitating
nature, vol. 518, no. 7540, pp. 529–533, 2015. driver behavior with generative adversarial networks,” in 2017 IEEE
[161] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Intelligent Vehicles Symposium (IV). IEEE, 2017, pp. 204–211.
Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, [183] Y. Li, J. Song, and S. Ermon, “Infogail: Interpretable imitation learning
M. Lanctot et al., “Mastering the game of go with deep neural networks from visual demonstrations,” in Proceedings of the 31st International
and tree search,” nature, vol. 529, no. 7587, pp. 484–489, 2016. Conference on Neural Information Processing Systems, 2017, pp.
[162] B. Hjaltason, “Predicting vehicle trajectories with inverse reinforcement 3815–3825.
learning,” 2019. [184] R. Bhattacharyya, B. Wulfe, D. Phillips, A. Kuefler, J. Morton,
[163] B. R. Kiran, I. Sobh, V. Talpaert, P. Mannion, A. A. Al Sallab, S. Yo- R. Senanayake, and M. Kochenderfer, “Modeling human driving behav-
gamani, and P. Pérez, “Deep reinforcement learning for autonomous ior through generative adversarial imitation learning,” arXiv preprint
driving: A survey,” IEEE Transactions on Intelligent Transportation arXiv:2006.06412, 2020.
Systems, 2021. [185] S. Choi, J. Kim, and H. Yeo, “Trajgail: Generating urban vehicle trajec-
[164] T. Fernando, S. Denman, S. Sridharan, and C. Fookes, “Deep inverse tories using generative adversarial imitation learning,” Transportation
reinforcement learning for behavior prediction in autonomous driving: Research Part C: Emerging Technologies, vol. 128, p. 103091, 2021.
Accurate forecasts of vehicle motion,” IEEE Signal Processing Maga- [186] C. You, J. Lu, D. Filev, and P. Tsiotras, “Advanced planning for
zine, vol. 38, no. 1, pp. 87–96, 2020. autonomous vehicles using reinforcement learning and deep inverse
[165] R. Bellman, “A markovian decision process,” Journal of mathematics reinforcement learning,” Robotics and Autonomous Systems, vol. 114,
and mechanics, vol. 6, no. 5, pp. 679–684, 1957. pp. 1–18, 2019.
[166] P. Wang, C.-Y. Chan, and A. de La Fortelle, “A reinforcement learning [187] M. Wulfmeier, D. Rao, D. Z. Wang, P. Ondruska, and I. Posner,
based approach for automated lane change maneuvers,” in 2018 IEEE “Large-scale cost function learning for path planning using deep
Intelligent Vehicles Symposium (IV). IEEE, 2018, pp. 1379–1384. inverse reinforcement learning,” The International Journal of Robotics
[167] Y. Guan, S. E. Li, J. Duan, W. Wang, and B. Cheng, “Markov Research, vol. 36, no. 10, pp. 1073–1087, 2017.
probabilistic decision making of self-driving cars in highway with [188] Q. Zou, H. Li, and R. Zhang, “Inverse reinforcement learning via neural
random traffic flow: a simulation study,” Journal of Intelligent and network in driver behavior modeling,” in 2018 IEEE Intelligent Vehicles
Connected Vehicles, 2018. Symposium (IV). IEEE, 2018, pp. 1245–1250.
[168] N. D. Ratliff, J. A. Bagnell, and M. A. Zinkevich, “Maximum margin [189] Z. Zhu, N. Li, R. Sun, D. Xu, and H. Zhao, “Off-road autonomous
planning,” in Proceedings of the 23rd international conference on vehicles traversability analysis and trajectory planning based on deep
Machine learning, 2006, pp. 729–736. inverse reinforcement learning,” in 2020 IEEE Intelligent Vehicles
[169] E. Klein, M. Geist, B. Piot, and O. Pietquin, “Inverse reinforcement Symposium (IV). IEEE, 2020, pp. 971–977.
learning through structured classification,” in NIPS 2012, 2012, pp. [190] C. Jung and D. H. Shim, “Incorporating multi-context into the
1–9. traversability map for urban autonomous driving using deep inverse

Yanjun Huang is a Professor at the School of Automotive Studies, Tongji University. He received his Ph.D. degree in 2016 from the Department of MME at the University of Waterloo. His research interests mainly lie in improving vehicle performance in terms of safety, energy saving, and intelligence by using advanced control and learning methods. He has published several books and over 60 papers in journals and conferences. He is the recipient of the IEEE VTS 2019 Best Land Transportation Paper Award and the 2018 Best Paper of Automotive Innovation, among others. He serves as AE or EBM of IET Intelligent Transport System, SAE Int. J. of Commercial Vehicles, Int. J. of Autonomous Vehicle system, etc.

Jiatong Du received the B.S. degree from the School of Automotive Studies, Tongji University, Shanghai, China, in 2020. He is currently pursuing the Ph.D. degree with the School of Automotive Studies, Tongji University, Shanghai, China. His research interests include machine learning, deep learning, trajectory prediction, trajectory planning, intelligent transportation systems, and autonomous vehicles.

Ziru Yang received the B.S. degree from the School of Automotive Studies at Tongji University in 2021. She is currently pursuing the M.S. degree with the School of Automotive Studies, Tongji University, Shanghai, China. Her research interests include trajectory prediction and planning, cooperative vehicle infrastructure systems, and autonomous vehicles.

Zewei Zhou received the B.S. degree in vehicle engineering from Chongqing University in 2020. He is currently pursuing the M.S. degree with the School of Automotive Studies, Tongji University, Shanghai, China. His research interests include machine learning, trajectory prediction, trajectory planning, intelligent transportation systems, and autonomous vehicles.

Lin Zhang received the Ph.D. degree in automotive engineering from Jilin University in 2019. He is currently a Post-Doctoral Fellow with the School of Automotive Studies, Tongji University. His research interests include vehicle control in terms of safety, energy saving, and intelligence, including vehicle dynamics and control, HEV control, and trajectory planning.

Hong Chen (M’02–SM’12) received the B.S. and M.S. degrees in process control from Zhejiang University, China, in 1983 and 1986, respectively, and the Ph.D. degree in system dynamics and control engineering from the University of Stuttgart, Germany, in 1997. In 1986, she joined Jilin University of Technology, China. From 1993 to 1997, she was a Wissenschaftlicher Mitarbeiter with the Institut fuer Systemdynamik und Regelungstechnik, University of Stuttgart. Since 1999, she has been a professor at Jilin University and hereafter a Tang Aoqing professor. Recently, she joined Tongji University as a distinguished professor. Her current research interests include model predictive control, nonlinear control, and applications in mechatronic systems, e.g., automotive systems.

