

Optical Flow Based Detection and Tracking of


Moving Objects for Autonomous Vehicles
MReza Alipour Sormoli, Mehrdad Dianati, Senior Member, IEEE, Sajjad Mozaffari, and Roger Woodman

Abstract—Accurate velocity estimation of surrounding moving objects and their trajectories are critical elements of perception systems in Automated/Autonomous Vehicles (AVs), with a direct impact on their safety. These are non-trivial problems due to the diverse types and sizes of such objects and their dynamic and random behaviour. Recent point cloud based solutions often use Iterative Closest Point (ICP) techniques, which are known to have certain limitations. For example, their computational costs are high due to their iterative nature, and their estimation error often deteriorates as the relative velocities of the target objects increase (>2 m/sec). Motivated by such shortcomings, this paper first proposes a novel Detection and Tracking of Moving Objects (DATMO) technique for AVs based on an optical flow technique, which is proven to be computationally efficient and highly accurate for such problems. This is achieved by representing the driving scenario as a vector field and applying vector calculus theories to ensure spatiotemporal continuity. We also report the results of a comprehensive performance evaluation of the proposed DATMO technique, carried out in this study using synthetic and real-world data. The results of this study demonstrate the superiority of the proposed technique, compared to the DATMO techniques in the literature, in terms of estimation accuracy and processing time over a wide range of relative velocities of moving objects. Finally, we evaluate and discuss the sensitivity of the estimation error of the proposed DATMO technique to various system and environmental parameters, as well as the relative velocities of the moving objects.

Index Terms—Autonomous vehicles, optical flow, LiDAR, point cloud, DATMO, MODT, state estimation.

(Mehrdad Dianati holds part-time professorial posts at the School of Electronics, Electrical Engineering and Computer Science (EEECS), Queen's University of Belfast and WMG at the University of Warwick. The other authors are with WMG, University of Warwick. E-mail: {mreza.alipour, m.dianati, sajjad.mozaffari, r.woodman}@warwick.ac.uk, [email protected]. Corresponding author: [email protected])

I. INTRODUCTION

ACCURATE, reliable and fast perception of the surrounding environment is one of the most important technical challenges in the safe deployment of Autonomous/Automated Vehicle (AV) technologies. This problem includes the detection of surrounding objects and the estimation of their states, i.e., their position and velocity. A particularly important element of this problem is associated with the Detection and Tracking of Moving Objects (DATMO), a.k.a. Moving Object Detection and Tracking (MODT) in some studies in the literature [1]. There is a wide range of DATMO techniques in the literature tailored for AVs that use camera perception sensors [2]–[4], LiDAR perception sensors [5]–[7] and Radar perception sensors [8]. LiDAR perception sensors are particularly popular for AVs as they inherently provide a wide field of view (FOV) and robust point clouds that can be used for highly accurate range estimations. Therefore, in this study, we focus on developing a DATMO technique primarily tailored for LiDAR perception sensors. However, we believe such a technique can be easily adapted to any type of perception sensor, such as depth cameras, that provides a point cloud output.

When it comes to designing a DATMO technique, minimising its processing cost/time and estimation error are two significant challenges. If an AV uses a LiDAR perception sensor, the velocity of the surrounding moving objects can be calculated by corresponding two consecutive point cloud scans. In traditional approaches, an object detection function is used as the first processing stage to identify objects in two consecutive LiDAR scans. Then, a tracking algorithm is applied to compute the velocity for the objects of interest [9]. Although these techniques are computationally efficient, their accuracy depends on the accuracy of the underlying object detection algorithms. While enhancing various elements of these techniques, for example by utilising the geometric models of the moving objects in the detection process, can improve the performance of this category of DATMO techniques, such techniques do not perform well in many scenarios [21]. For example, if a vehicle travels at a speed of 90 km/hr on a highway, the lateral velocity estimation error of such techniques can exceed 2 m/sec. This is not regarded as an acceptable input to the planning modules that determine the cut-in/cut-out intentions of vulnerable road users [22].

The problem can be exacerbated because the geometric models of the moving objects can vary significantly for various road users. This can have a direct impact on the estimation errors of the aforementioned DATMO techniques. To address these problems, a different category of DATMO techniques has emerged in the literature [23]–[26]. These techniques often use point cloud registration algorithms such as Iterative Closest Points (ICP) [22] and track all moving points in the point cloud [23]; hence, they are more accurate in velocity estimation. However, these DATMO techniques are computationally expensive because of their iterative nature [28]. Furthermore, the performance of the underlying point cloud registration methods can deteriorate when the deviation between two consecutive point cloud scans increases. For instance, if the relative speed of the Ego Vehicle (EV) and a target object of interest is high (e.g., 12 km/hr), the dislocation of the consecutive LiDAR scans is usually large, which can result in a large error in ICP-based point cloud registration algorithms [21]. This can result in poor performance of the latter category of DATMO techniques.

Motivated by the above challenges, in this paper, a novel DATMO technique is proposed for AVs that use LiDAR perception sensors.

TABLE I
A brief review of the LiDAR-based DATMO methods in the literature.
(L: learning-based; ICP: iterative closest point method; G: grid-based representation)

Detection-Based Tracking / Model-Based ([1]L, [9]):
  Advantages: detection and tracking are independent; different sensors can be fused at the detection level.
  Disadvantages: tracking performance relies on detection/classification; difficult to detect and track objects with unknown geometries.

Detection-Based Tracking / Model-Free ([10], [11], [12]G,L, [13]L, [14]G):
  Advantages: tracking objects with different geometries.
  Disadvantages: more false negatives (FNs) in detection and tracking.

Direct Tracking / Model-Based ([15], [16], [17]G, [18]):
  Advantages: the sensor's physical model is considered, giving a more accurate DATMO; the geometry of the objects is tracked, giving accurate state estimation.
  Disadvantages: tracking performance decreases for objects with different shapes.

Direct Tracking / Model-Free (Point-Based) ([7]G, [19]G, [20], [21]ICP, [22]ICP, [23]ICP, [24]ICP, [25]L, [26], [27]G,ICP+L):
  Advantages: all scanned points are tracked; DATMO performance does not rely on geometric shape.
  Disadvantages: high computational cost; the higher the relative velocity of the moving objects, the lower the tracking performance.

The proposed technique is inspired by the optical flow algorithm [29]. In our approach, the 3D LiDAR scans are initially converted to 2.5D motion grids (Fig. 3), inspired by [19]; then, the 2D velocity of each cell in the grid is estimated by comparing two consecutive LiDAR scans. In the next step, a series of grid mask filters, such as temporal and rigid-body continuity filters, are applied to eliminate false positive detections. The LiDAR points are classified based on their velocity vectors, and each class is associated with a moving object. Finally, a Kalman Filter is used to track the velocity and position of the detected moving objects, considering their dynamic model. The main contributions of this paper are summarized as follows:

• Adopting the optical flow technique to process the 3D point cloud data instead of complex ICP algorithms. This is used to generate a grid-based velocity vector field representing a dynamic driving environment.
• Introducing a two-layer filter applied to the velocity vector field, eliminating false positives and erroneous vectors. These filters are designed based on the spatial continuity of the vector field (rigid-body assumption) and temporal propagation to improve the estimation performance.
• Introducing a novel error model for velocity estimation as a function of the configuration set of target vehicles (TVs) w.r.t. the ego vehicle (EV). This offers further insights to be incorporated into downstream modules such as motion planning/prediction in the autonomous vehicle framework.

The performance of the proposed technique is evaluated in two steps against ICP-based methods, such as those in [21], [27], and the model-free [14] and model-based [18] indirect tracking methods. First, the compared DATMO techniques are evaluated on a synthetic dataset generated in the MATLAB scenario designer, where various driving contexts are considered. In the next step, the KITTI tracking dataset from real driving scenarios [30] is used. Each of the synthetic and KITTI datasets serves a different purpose in our study. The synthetic dataset enables generating a wide range of driving scenarios and various target vehicles (shape, velocity, dimension, etc.), which is not practically feasible in real-world data collection campaigns. The flexibility of this type of dataset becomes even more important when it comes to analysing error sensitivity to different factors that are easy to change or sweep in synthetically generated scenarios. On the other hand, testing the proposed DATMO technique with data collected by contemporary sensors in real-world conditions helped us validate its performance in the real world. Comparing the estimation error distributions shows that the proposed DATMO outperforms the state of the art in both speed and yaw angle estimation. Moreover, the computational cost (without parallel calculations) shows improvements of about 10%, while parallelising the proposed method is readily achievable and could improve this metric even more significantly. The proposed error sensitivity analysis also revealed a meaningful correlation between the configuration of the TV and the estimation error, from which researchers could benefit to develop motion planning/prediction algorithms.

The rest of the paper is organized as follows. An overview of the existing related work in the literature is given in Section II. The system model and problem formulation are given in Section III. The proposed DATMO method is explained in Section IV. The performance evaluation methodology and results are discussed in Sections V and VI. Finally, the key findings and conclusions of the study are given in Section VII.

II. RELATED WORKS

In order to review the available point cloud based DATMO/MODT approaches in the literature, in this paper they are categorized into two main classes: 1) detection-based methods, also known as traditional methods [31]; 2) direct tracking methods, which are further divided into model-based and point-based approaches.

Fig. 1. High-level schematic system diagram of the optical flow based DATMO for AVs. ω, vx, and vy are all n × m matrices. The same colour code (blocks and signals) is used where blocks are expanded and explained in different sections.

A. Detection-Based Tracking

The detection-based or indirect algorithms track abstracted objects, patterns, bounding boxes, or clusters [10], [11] by applying different filters such as variants of the Kalman filter or the particle filter. Therefore, the tracking performance of these methods relies on both the classification algorithms (or pattern recognition) and the filter structure [9]. A great number of studies have aimed to improve the object tracking task by developing enhanced classification/clustering steps (before tracking) using learning-based [12], [13] or geometric model estimation [1] algorithms, but all are still classified under the first category of DATMO methods. There are other studies in this category, such as [14], focused on reducing the computational complexity by applying the classification and subsequent tracking only to the moving points. In a detection-based approach, the detection and tracking steps are independent, and various sensor data could be used in the detection algorithm without changing the tracking part.

B. Direct Tracking

In the direct tracking methods, the sensor model and/or the object's geometric model is used to estimate corresponding points in space without prior detection. This method can be further divided into model-based and model-free (point-based) approaches.

In model-based direct tracking DATMO algorithms, prior knowledge about the geometric shape and dynamic model of the moving vehicles is used to track the states and the geometric shape of the objects [15], [16] without detecting the objects first [9], [18]. Tracking the geometry helps to predict the dynamic properties with higher precision and to discard tracked objects with strange shapes or geometry changes. However, the tracking accuracy declines for moving objects with different shapes and geometries, like cyclists or pedestrians.

The second subclass of the direct tracking approach (point-based tracking) is geometric model-free, in which every point is tracked in consecutive LiDAR scans. These scanned point clouds can be used directly or represented in the form of a 2D/3D grid space before being used in a grid-based tracking algorithm [19]. The key advantage of point-based DATMO stems from the fact that there is no assumption about the geometric shape of the object, and the objects are classified/detected based on tracking the corresponding scanned points on them. But tracking all points makes the computation process expensive and limits the method in terms of the maximum number of moving objects in a scene [20]. To overcome this challenge, before tracking, the scanned points are divided into static and moving categories by generating a static obstacle map (SOM) [21] or by filtering objects of interest with the help of deep learning methods [27].

Point cloud registration (PCR) algorithms are widely used in model-free DATMO methods. After clustering the point clouds in consecutive scans, corresponding clusters are detected, and PCR algorithms such as the iterative closest point (ICP) [32] are applied to each set (two clusters from the same object in different time steps) to calculate the precise relative motion [21]–[26]. Although a low standard deviation of error has been reported for tracking the velocity (0.4 m/sec) and orientation (1.81 deg) of the moving objects [21], these methods suffer from a number of considerable drawbacks. First of all, the computational time is not deterministic and depends on the number of moving objects. Secondly, the performance of the ICP algorithm highly depends on the initial conditions, and the performance deteriorates when the relative velocity of the moving objects (with respect to the EV) increases. Finally, because the PCR algorithms are based on an iterative optimization process, parallelizing these algorithms is not simple and straightforward. The various methods reviewed in this section are summarized along with their advantages/disadvantages in Table I.

III. SYSTEM MODEL AND PROBLEM DEFINITION

We consider a system consisting of an EV equipped with LiDAR sensors and multiple target vehicles (TVs), such as cars, vans (or other large vehicles with different shapes), and bikers, on a segment of a road as shown in Fig. 2. The estimation of the speed and direction of motion of all TVs is desired.

As illustrated in Fig. 1, the input of the DATMO pipeline is a 3D point cloud (P^l) generated by a raycast LiDAR, and the desired output is a set of 2D velocity vectors ({v̂1^t, v̂2^t, ..., v̂o^t}), each belonging to a unique track (the trace of velocity vectors over a certain period of time). It should be noted that "o" is the number of moving objects at time "t" (stationary objects are not included). Unlike [21], in this study, we do not assume a small relative velocity for the moving targets. Moreover, the update rate of the LiDAR sensor is presumed to be 10 Hz; so, in order to avoid losing data, the proposed algorithm should be able to calculate the desired output (within a radius of 120 m) in less than 100 ms (regardless of the number of moving objects). However, we assume that the ego vehicle and the surrounding objects move in the horizontal plane (xy in Fig. 3). Therefore, the velocity in the vertical direction (z) is ignored and not reported in the estimated output. The estimated velocities are in the local coordinate system attached to the EV.

Fig. 2. Ego vehicle (EV) and target vehicles (TVs), including cars, vans, and bikers moving with different velocities.

IV. PROPOSED METHOD

The proposed DATMO is illustrated using a block diagram of processes as shown in Fig. 1. The algorithm between the input and output signals is divided into four main processes, summarized as follows:

(A) The 3D LiDAR sensor data are converted to a 2.5D grayscale grid map.
(B) A velocity vector field is generated using the optical flow algorithm.
(C) The false positive estimations are eliminated by a filtering mask calculated based on the continuum property of the velocity vector field for rigid-body motion.
(D) Finally, all measured information and the dynamic model are fused in an Extended Kalman Filter (EKF).

The core process is the optical flow calculation, and the rest are either for preparing input data for this step or for post-processing the generated vector flow to filter and track the true positives. In the following subsections, each process in the pipeline is described in detail.

A. Point Cloud to Bird's Eye View Conversion Process

The optical flow algorithm is the main component of the proposed method, and this process requires 2D grayscale images to calculate the velocity vector field. However, the input data from the LiDAR is a 3D scattered point cloud. Similar to [19], a conversion block is utilized for mapping the point cloud input to a bird's eye view grid, which is also known as a 2.5D grid map (Fig. 3). The input signal fed into the conversion process is a point cloud containing L points, each with three coordinates and no intensity data; e.g., the l-th point is represented by P^l = {px^l, py^l, pz^l}. The output of the conversion process (the input of the vector flow generator) is a grayscale 2D image, which is called a 2.5D grid map in this paper. In other words, the output is an n × m matrix in which cells are normalized between 0 and 255. Each cell is referred to by an ij pair, where i (j) is an integer between 1 and N (M). Moreover, the centre of each cell in the grid also has a coordinate (Gx_ij, Gy_ij). Based on the grid's resolution, all cells have the same dimensions of w and h in the x and y directions, respectively. A value is assigned to each cell of this grid space based on the height of the corresponding points in the point cloud data, i.e. the points with the same x and y values (see Eq. 2). This value is calculated from a linear combination of the mean and standard deviation of the height of the corresponding points projected on the horizontal plane. This concept is illustrated in Fig. 3, and the value assigned to cell ij is obtained as follows:

Gij = (1/hmax) [a · µ(Pz^g) + b · σ(Pz^g)]    (1)

where a and b are constant weights, and hmax is a normalizing constant. µ(·) and σ(·) are the mean and standard deviation functions, respectively. The superscript g refers to the set of integers 1, 2, ..., L that satisfies the following condition:

Gx_ij − w/2 ≤ px^g < Gx_ij + w/2
Gy_ij − h/2 ≤ py^g < Gy_ij + h/2    (2)

Fig. 3. 3D point cloud conversion to a 2.5D bird's eye view grid. Darker cells correspond to higher values of Gij, and Gij = 0 in white cells.

Based on this definition, Gij = 0 means that there is no point in the point cloud corresponding to cell ij above ground.
Lower values (relative to a threshold) show that the points are from the ground [19]. On the other hand, higher values are assigned to points on vertical planes (larger height standard deviation) or on horizontal planes with high z components (larger average height).
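To make this conversion step concrete, the following is a minimal Python sketch of Eqs. 1 and 2, assuming the LiDAR scan is a NumPy array of shape (L, 3); the grid extents, the weights a and b, and hmax here are illustrative values, not the paper's tuned parameters.

```python
# Sketch of the point-cloud-to-2.5D-grid conversion (Eqs. 1 and 2).
# Assumptions: points is an (L, 3) array in the EV frame; grid extents,
# a, b, and h_max are illustrative, not the paper's values.
import numpy as np

def point_cloud_to_grid(points, cell=0.17, x_range=(-15.0, 80.0),
                        y_range=(-25.0, 25.0), a=1.0, b=1.0, h_max=3.0):
    n = int((x_range[1] - x_range[0]) / cell)
    m = int((y_range[1] - y_range[0]) / cell)
    grid = np.zeros((n, m), dtype=np.float32)

    # Bin each point into a grid cell (the membership condition of Eq. 2).
    i = np.floor((points[:, 0] - x_range[0]) / cell).astype(int)
    j = np.floor((points[:, 1] - y_range[0]) / cell).astype(int)
    valid = (i >= 0) & (i < n) & (j >= 0) & (j < m)
    i, j, z = i[valid], j[valid], points[valid, 2]

    # Combine mean and std of the heights in each occupied cell (Eq. 1).
    flat = i * m + j
    for cell_id in np.unique(flat):
        zs = z[flat == cell_id]
        grid[cell_id // m, cell_id % m] = (a * zs.mean() + b * zs.std()) / h_max

    # Scale to an 8-bit grayscale image for the optical flow stage.
    return np.clip(grid * 255.0, 0, 255).astype(np.uint8)
```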

B. Motion Vector Flow Generation Using the Farneback Optical Flow Algorithm

A dense optical flow algorithm is used to calculate the velocity (linear and angular) of each cell of the grid map generated in the previous step, using two consecutive frames of the converted 2.5D grayscale grid map (Fig. 4). The output signal of this process carries three n × m matrices containing the linear and angular velocities of each cell in the grid: two matrices for the linear velocities in the x and y directions (vx and vy, respectively), and one for the angular yaw rate in the z direction (ω).

There are several optical flow algorithms available in the literature for estimating the per-pixel motion between two consecutive images [33]. In this study, we need a dense vector flow (not sparse) to calculate the velocity of all occupied cells with high accuracy and low computational cost. Although any dense optical flow algorithm could be used in the proposed DATMO framework, the well-known Gunnar Farnebäck optical flow generator [29] has been used here. This algorithm satisfies the requirements (accuracy-cost trade-off) more efficiently compared to other methods [34]. However, the Farneback algorithm employs an expanded polynomial transformation of the adjacent cells' brightness to estimate the dense velocity distribution for each grid cell [29], and this may produce non-zero velocity estimates for unoccupied cells in the neighbourhood of the occupied cells. This challenge is addressed in the next part (the filtering and masking process).

Fig. 4. Optical flow based velocity vector field generation process. Grayscale brightness refers to the occupied cells which contain LiDAR scanned points. ⊗ and ⊙ show the angular velocity vectors perpendicular to the motion plane in −z and +z, respectively. NOTE: for reading the system diagrams used in this paper, the signals are expanded (rectangles with dashed line frames) to illustrate the data carried between processing blocks.

The optical flow algorithm calculates the linear velocity distribution; however, the angular velocity is also required for accurate state tracking of the vehicles (Section IV-D). Based on vector field theory [35] and the rigid body assumption for each moving object in the scene, the angular velocity of each cell (in the z direction) is obtained by Eq. 3 (see [36] and the Appendix):

ω = 0.5 (∇ × v)    (3)

where v is the linear velocity vector field, and ∇× is the curl operator.

Therefore, the output of the vector flow generation process includes the angular velocity in the z direction, in addition to the 2D linear velocity in the x and y directions.

Fig. 6. Expanded schematic system diagram of the optical flow based DATMO for AVs. NOTE: for reading the system diagrams used in this paper, the signals are expanded (rectangles with dashed line frames) to illustrate the data carried between processing blocks. MMR stands for memory.

C. Masking and Filtering the Vector Field

Due to the dense nature of the vector field obtained by the Farneback optical flow algorithm, the generated vector field contains false positive values for cells that are not occupied or do not belong to moving objects (static). The masking process filters out the undesirable false positives and provides the final estimated velocity vectors, so the output of this step is a subset of its input. In this section, the mask is obtained in two steps, preparing the final vector field for the tracking process.

1) Vector Field Propagation Mask: The first masking layer for the vector field is based on temporal filtering, which is called propagation in this study. The propagation of the vector field in time step k is obtained by changing the (x, y) position of the velocity vectors in the 2D plane according to the linear velocity values in time step (k−1). The propagation is calculated using Eq. 4:

ṽ^k_{i′j′} = v̂^{k−1}_{ij};  i′ = ⌊i + v̂x^{k−1} dt + w/2⌋,  j′ = ⌊j + v̂y^{k−1} dt + h/2⌋    (4)

where ṽ is the propagated vector field, and dt is the time increment. The value inside ⌊·⌋ is rounded down to the nearest integer.

Fig. 5. Vector field propagation mask in time step k.

As shown in Fig. 5, the propagated vector field of time step (k−1) is compared with the vector field calculated in the same time step, and the masking matrix (Mp) is obtained using Eq. 5:

(Mp)^k_{ij} = 1 if ‖ṽ^k_{ij} − v̂^k_{ij}‖ ≤ αp, and 0 otherwise    (5)

In this equation, αp is a constant threshold close to zero. The final boolean masking matrix is calculated by multiplying the masks calculated in the two layers: Mask = Mc × Mp. Applying this filtering mask to the vector field at each time step filters out the undesirable false positive vectors.

2) Rigid-Body Continuity Mask: In this study, it has been assumed that the moving objects are rigid, i.e. different parts of a single object have zero relative motion. Therefore, the linear (v) and angular (ω) velocity vector fields should satisfy the continuity conditions of Eq. 6 [35]:

∆V = 0
∇(∇ × V) = 0    (6)

The first part of Eq. 6 expresses continuity in the linear velocity, i.e. the objects can neither tear apart nor implode, while the second part refers to the fact that all points on a single object should rotate with the same angular velocity. The results of both operations are 2D matrices, so the estimated values for the cells that do not satisfy the condition (not exactly equal to zero but below a certain threshold αcont) are set to zero. The resulting mask from this procedure is referred to as (Mc) in the rest of the text.
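A minimal sketch of the two masking layers follows, assuming the velocity fields are stored as n × m NumPy arrays with velocities in m/s. The thresholds αp and αcont are illustrative, and the first condition of Eq. 6 is implemented here as a zero-divergence check (the "cannot tear apart nor implode" reading).

```python
# Sketch of the two-layer mask of Section IV-C. Assumptions: row index i
# pairs with vx and column index j with vy as in Eq. 4; thresholds are
# illustrative values.
import numpy as np

def propagation_mask(vx_prev, vy_prev, vx, vy, cell=0.17, dt=0.1, alpha_p=0.5):
    n, m = vx.shape
    ii, jj = np.indices((n, m))
    # Propagate each cell along its own velocity (Eq. 4); velocities in m/s
    # are converted to cells, and the +0.5 realizes the rounding of Eq. 4.
    i2 = np.floor(ii + vx_prev * dt / cell + 0.5).astype(int)
    j2 = np.floor(jj + vy_prev * dt / cell + 0.5).astype(int)
    ok = (i2 >= 0) & (i2 < n) & (j2 >= 0) & (j2 < m)
    vtx = np.zeros_like(vx); vty = np.zeros_like(vy)
    vtx[i2[ok], j2[ok]] = vx_prev[ok]
    vty[i2[ok], j2[ok]] = vy_prev[ok]
    # Keep cells whose propagated and freshly estimated vectors agree (Eq. 5).
    return np.hypot(vtx - vx, vty - vy) <= alpha_p

def continuity_mask(vx, vy, omega, cell=0.17, alpha_cont=0.5):
    # Rigid-body continuity (Eq. 6): the divergence of v vanishes and the
    # angular velocity is spatially constant on each rigid object.
    div = np.gradient(vx, cell, axis=0) + np.gradient(vy, cell, axis=1)
    grad_w = np.hypot(np.gradient(omega, cell, axis=0),
                      np.gradient(omega, cell, axis=1))
    return (np.abs(div) < alpha_cont) & (grad_w < alpha_cont)
```

The final mask is then the element-wise product of the two boolean matrices, as in Mask = Mc × Mp above.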

D. Tracking

The resulting vector field is used to detect moving objects and estimate their velocities. The tracking process output is the final estimated state of the objects (linear and angular velocities), augmented with a unique ID. As illustrated in Fig. 1, x̄, x̂, and x̃ are the masked, estimated, and propagated values of a variable x, respectively, while the superscript shows the time step of the variable. An Extended Kalman Filter (EKF) is designed to use the vector field data as measurements and the dynamic model of Eq. 7 (constant linear/angular acceleration) as the prediction model to estimate the state (Xn) of the moving objects. Every estimated position and velocity is assigned to either an existing or a new track with a unique ID via Global Nearest Neighbour (GNN) [21]:

Ẋn = f(Xn, U), with U = [v, ω]ᵀ:
ẋn = vn cos θn − v + ln ω
ẏn = vn sin θn − ln ω
θ̇n = ωn − ω    (7)
v̇n = ka
ω̇n = kα

Fig. 7. Tracking process system diagram.
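As an illustration of how the prediction model of Eq. 7 can be used, the following is a simple Euler-discretized prediction step, assuming one tracked object with state Xn = [xn, yn, θn, vn, ωn] in the EV-attached frame and EV inputs U = (v, ω); treating the accelerations ka and kα as zero-mean process noise, and the dt value, are assumptions of this sketch.

```python
# Sketch of the EKF prediction step for the dynamic model of Eq. 7
# (constant linear/angular acceleration). l_n is the object's distance
# to the EV; k_a and k_alpha enter through the process-noise covariance.
import numpy as np

def predict_state(X, U, l_n, dt=0.1):
    x, y, th, v_n, w_n = X
    v, w = U
    Xdot = np.array([
        v_n * np.cos(th) - v + l_n * w,  # relative x-velocity (Eq. 7)
        v_n * np.sin(th) - l_n * w,      # relative y-velocity
        w_n - w,                         # relative yaw rate
        0.0,                             # v_n: acceleration k_a as noise
        0.0])                            # omega_n: acceleration k_alpha as noise
    return X + Xdot * dt                 # Euler-discretized prediction
```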

1) Measurements in EKF and Updating Tracks: In the Kalman filter structure, three measurements are used for each moving object, i.e. two linear velocities and one angular velocity in the z direction. As illustrated in Fig. 7, these measurements are calculated by clustering the masked velocity vector fields {v̄x, v̄y, ω̄} provided by the optical flow and taking the mean value of each cluster. In the proposed approach, the Euclidean distance is utilized for clustering the vectors, and the mean position and velocity are fed into the EKF algorithm to estimate the state vector for each moving object using the motion dynamics of Eq. 7.

All clustered points should be either assigned to an existing track or initialised on a new track. Similar to [21], in our approach, the clusters are assigned to the predicted tracks via GNN. Each cluster is assigned to at most one track based on a 4D feature vector [xm, ym, λ1, λ2] containing the mean position and shape of the cluster (independent of the orientation) in the motion plane. The two components of the feature vector describing the shape of a cluster are the eigenvalues of the covariance matrix of the points in the cluster (λ1, λ2). So, a cluster is assigned to a track if the Euclidean distance between their feature vectors is less than a threshold γ.

The final step in managing the tracks is confirming and/or deleting tracks. Each of these two procedures is controlled by a 2D integer vector. A track is confirmed when M1 measurements/detections are assigned to it in the last N1 updates (M1 < N1). Similarly, a confirmed track is deleted if, in the last N2 consecutive updates, no measurement is assigned to it M2 times (M2 < N2). It should be noted that the coordinate system used in this section is attached to the EV, with the configuration shown in Fig. 8.

The interaction between the different processes is depicted in the assembled system diagram of Fig. 6. This system diagram is a detailed version of Fig. 1.

Fig. 8. Target vehicle's (TV) configuration (position, orientation, and velocity) relative to the ego vehicle (EV).
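A sketch of the cluster-to-track assignment described above follows, assuming clusters are given as 2D point sets; the greedy nearest-neighbour loop stands in for a full GNN solver, and γ is an illustrative threshold.

```python
# Sketch of the 4D-feature cluster-to-track assignment of Section IV-D.
# Assumptions: each cluster has at least two points so the covariance is
# defined; gamma is illustrative; greedy loop approximates GNN.
import numpy as np

def cluster_feature(points_xy):
    # 4D feature [xm, ym, lambda1, lambda2]: mean position plus the
    # eigenvalues of the cluster covariance (orientation-independent shape).
    mean = points_xy.mean(axis=0)
    lam = np.sort(np.linalg.eigvalsh(np.cov(points_xy.T)))[::-1]
    return np.array([mean[0], mean[1], lam[0], lam[1]])

def assign_clusters(track_features, cluster_features, gamma=2.0):
    # Each cluster goes to at most one track when the feature distance
    # falls below gamma; unmatched clusters initialise new tracks.
    d = np.linalg.norm(cluster_features[:, None, :] -
                       track_features[None, :, :], axis=2)
    pairs, used = [], set()
    for c in np.argsort(d.min(axis=1)):       # most confident clusters first
        t = int(np.argmin(d[c]))
        if d[c, t] < gamma and t not in used:
            pairs.append((c, t))
            used.add(t)
    return pairs
```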
V. PERFORMANCE EVALUATION

An experimental test is designed to evaluate and verify the performance of the designed DATMO algorithm. Two main objectives are targeted in this section: comparing the proposed DATMO with state-of-the-art (SOTA) methods, and obtaining an error model for the estimation accuracy.

First and foremost, we statistically compare the performance of the DATMO method with the SOTA geometric model-free approach (GMFA) developed in [21], which has been proven to be more efficient than the geometric model-based tracking (MBT) method proposed in [37]. The proposed method is further compared against the SOTA model-free [14] and model-based [18] indirect tracking methods as well. The GMFA algorithm is regenerated and evaluated, while the quantitative performance of the other methods is obtained from [14] and [18]. Therefore, the latter experiments are consistent with those specified in these studies.

In addition, we further investigate how the state estimation error changes as a function of the driving environment, i.e. the configuration of the EV and TV. Regarding these objectives, the experimental evaluation is conducted in two different steps. Initially, synthetic data is generated to evaluate the algorithm in various custom situations, and in the next step, the algorithm is tested on the real-world KITTI dataset.

A. Datasets

1) Synthetic Data Generation and Simulation: In order to evaluate the proposed method for estimating the state of the target vehicles in diverse possible configurations, generating a synthetic dataset is essential. In addition, the estimation error is calculated more accurately in simulation compared with real-world datasets such as KITTI, for which the ground truth of the objects' velocity has not been provided directly. In this study, the TV's configuration space is defined by three variables (Fig. 8): distance to the EV (ln), relative orientation (βn), and relative velocity (∆υn = υn − υ). The aim is to design scenarios covering all possible configurations for investigating the meaningful relations between the estimation error and these three variables, in addition to assessing the estimation accuracy. Therefore, the flexibility in changing different configurations provided by synthetic datasets is another reason that justifies utilising this type of dataset.

The driving scenario designer toolbox in MATLAB is used to generate synthetic scenarios and add a LiDAR sensor to collect point cloud data. As illustrated in Fig. 9-(b), three different types of TVs are simulated in the synthetic scenarios, including a sedan, a van, and a cyclist. Moreover, to fully account for the EV motion effect, a nonzero road curvature is considered to avoid a zero yaw rate for the EV. The LiDAR sensor parameters are adjusted according to those of the KITTI dataset collection sensor. The point cloud data from a simulated scene is plotted in Fig. 9-(c).

In order to cover all possible cases for the n-th target vehicle configurations ({ln, βn, ∆υn}), each scenario contains a target vehicle (sedan, van, or cyclist) moving on the same multi-lane road in which the EV moves in one of the lanes (with a speed of 20 m/s). TVs move with 10 different speeds (10 to 40 with a step of 2 m/s) in a lane and drive in two modes: first, keeping the same lane, and second, overtaking back and forth between two lanes with trajectories defined by the two parameters s and n = w/2 shown in Fig. 9-(a).
Fig. 9. Synthetic primary scenario generated in the MATLAB scenario designer: trajectory design parameters for different TVs (a), a 3D meshed object in the scene (b), and a bird's eye view scanned point cloud excluding ground points in the EV coordinate system (c).

Fig. 10. Proposed DATMO algorithm results for the KITTI dataset: tracking IDs and velocity vectors plotted on the front camera image (left), and a bird's eye view scanned point cloud excluding ground points (right).

In the case of changing lanes/overtaking, two values of 2 and 4 seconds are used for s (assuming constant speed). Finally, the lateral offset of the TV's start lane from the EV's lane varies from −80 m to 80 m (with a step of 1 m). The cyclists' trajectories include only lane-keeping, i.e. without any lateral motion. There is only one TV in each generated scenario to prevent occlusion; although three TVs are depicted in Fig. 9, this is a combination of three scenarios shown together to be more informative.

We refer to all the synthetic scenarios described above as primary scenarios. To further compare the performance metrics against the model-free [14] and model-based [18] indirect tracking methods, the simulation scenarios designed in [14] are replicated (secondary scenarios). In these scenarios, the TV moves with a speed of 6 m/sec along i) straight right-angled, ii) right turn, and iii) circular paths (see [14] for details).

2) KITTI Dataset: The final evaluation is conducted using the real-world KITTI tracking dataset for the multi-object tracking task. Besides the ground truth labels, only the LiDAR data from this dataset is used in the current study for the estimation task; however, the colour images of each frame are also used to plot the velocity vectors in image coordinates (Fig. 10, left) using the transformation matrix (velo-to-cam). Moreover, since there is no ground truth label for the velocity of objects in the driving environment, it is obtained by tracking the centres of the 3D bounding boxes. In the KITTI dataset, the bounding box coordinates are provided in the camera frame, whereas the estimated velocity values are obtained in the LiDAR coordinate system. Therefore, the calculated velocities are transformed to the camera coordinate system (T_velo-to-cam) to calculate the estimation error.

Since we need to compare the estimation results with the ground truth labels, the training sequences are used for the velocity and yaw angle error calculations; however, both the training and testing sequences are used to obtain the detection performance. Adopted from [21], the point cloud data closer than 25 m in the lateral direction (left and right), and 80 m and 15 m in the longitudinal front and rear directions, respectively, are considered; the rest of the data and labels out of this range are discarded in the evaluation process.

B. Evaluation Metrics

Following the previous studies [21], [22], the velocity vector estimation accuracy is evaluated by the speed and angle (θ) errors with respect to the ground truth (GT) values. The estimation errors for the o-th target object at time t are calculated in Eq. 8:

δvo(t) = vo^GT − ‖v̂o^t‖
δθo(t) = θo^GT − θ̂o^t    (8)

In order to compare the DATMO performance throughout all data points, the standard deviation (σ) of the error distribution over all time steps and target moving objects is used in Table II. Moreover, as in [21], the detection performance is also evaluated by the Precision and Recall defined in Eq. 9:

Pr = TP/(TP + FP)
Re = TP/(TP + FN)    (9)

The last metric to quantify the estimation performance is the time each algorithm takes to process an instance of the LiDAR scan to detect and estimate the state of the moving objects.
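The following is a small sketch of Eqs. 8 and 9, assuming per-frame ground-truth and estimated 2D velocity vectors are available for each tracked object; the wrapping of the angle error to (−180, 180] deg is an implementation detail not specified in the text.

```python
# Sketch of the evaluation metrics of Eqs. 8 and 9.
# Assumptions: v_gt and v_est are 2D velocity vectors for one object at
# one time step; tp/fp/fn are detection counts over a sequence.
import numpy as np

def speed_and_angle_error(v_gt, v_est):
    dv = np.linalg.norm(v_gt) - np.linalg.norm(v_est)           # Eq. 8, speed
    dtheta = np.degrees(np.arctan2(v_gt[1], v_gt[0]) -
                        np.arctan2(v_est[1], v_est[0]))          # Eq. 8, angle
    return dv, (dtheta + 180.0) % 360.0 - 180.0                  # wrap angle

def precision_recall(tp, fp, fn):
    return tp / (tp + fp), tp / (tp + fn)                        # Eq. 9
```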

C. Results

The evaluation results are divided into three sections. Firstly, a stochastic comparative analysis is conducted with the model-free direct tracking methods in [21] (GMFA) and [27]. Secondly, simulation results compare the proposed method with the model-free and model-based indirect tracking methods developed in [14] and [18]. Lastly, the effects of the continuity filters (Section IV-C) are presented in an ablation study.

The grid size in the proposed algorithms is 0.17 × 0.17 m, and the Farneback optical flow algorithm used in the proposed method is taken from the OpenCV computer vision library with the following settings: NumPyramidLevels = 3, PyramidScale = 0.5, NumIterations = 3, NeighborhoodSize = 3, and FilterSize = 11.

1) Comparison with Direct Tracking Methods: The detection and state estimation results for 21 sequences of the labelled KITTI training dataset and more than 800 synthetic driving scenarios (primary) are presented in this section. In both sets of these datasets, the detection and state estimation performance is evaluated for cyclists, sedans, and vans (or bigger vehicles such as trucks or buses in the KITTI dataset); pedestrians are ignored in this study. As a sample, the results of moving object tracking in both the KITTI and synthetic datasets are illustrated in the left and right columns of Fig. 11, respectively, for the GMFA and the proposed approaches (dashed red and solid green, respectively). The discontinuation of the dashed red line in the left column plot shows that the GMFA could not track the object from approximately 71 sec onward. The top row in this figure shows the speed estimations, whereas the bottom row depicts the yaw angle estimation results for a moving object in one sequence. The estimated values are reported as relative values, i.e. measured in the EV coordinate system.

Fig. 11. Comparative results of the proposed and GMFA algorithms for the synthetic and KITTI datasets.

In order to compare the GMFA and the proposed approaches, the estimation error distribution of all sequences for the two datasets is obtained. This distribution contains the estimation error of all time steps throughout all sequences. The speed and yaw angle estimation error distributions are shown in the top and bottom rows of Fig. 12, respectively. In this figure, the performance of both the GMFA (red) and the proposed (green) methods is illustrated for the KITTI and synthetic datasets separately in the left and right columns, respectively.
Furthermore, for each distribution, a normal distribution function has been fitted, with its standard deviation printed in the top left for both methods using the same colour codes. Similar to [21], the standard deviation values are used to compare the accuracy of the DATMO methods.

Fig. 12. Comparative error distribution of the proposed and GMFA [21] algorithms for the primary synthetic and KITTI datasets.

TABLE II
Experimental evaluation results of the proposed method and GMFA [21] for both the synthetic simulation and the KITTI dataset

Dataset      ‖∆υ‖ [m/s]   Method     Precision (%)   Recall (%)   σv [m/s]   σa [deg]   Time [ms]
Simulation   ≤ 1          GMFA       94.3            93.6         0.28       1.12       147
                          Proposed   94.1            93.8         0.23       1.37       139
             > 1          GMFA       88.9            87.3         2.28       3.81       155
                          Proposed   89.2            88.3         1.48       0.63       147
KITTI        ≤ 1          GMFA       89.7            88.7         0.48       18.12      196
                          Proposed   88.8            89.1         0.33       10.97      171
             > 1          GMFA       87.5            86.8         0.78       34.21      201
                          Proposed   88.6            88.2         0.56       16.71      183

Finally, the detection and estimation results of the two methods and two datasets are summarized in Table II. The precision and recall metrics evaluate moving object detection, while the standard deviation and time columns report the state estimation accuracy and computational cost, respectively. The results are further reported for two different ranges of relative velocity (‖∆υ‖ ≤ 1 and ‖∆υ‖ > 1 m/s), because the GMFA method of [21] was developed for detecting and tracking moving objects with "low relative speed". Therefore, in order to check whether the GMFA method is replicated properly, the estimation errors for low relative speeds (‖∆υ‖ ≤ 1 m/s) should be less than what was reported in [21]. It should be noted that since there is no exact velocity ground truth label for the KITTI tracking dataset, the calculated error for this dataset, even for low speeds, is not directly comparable with the values reported in [21]; we use the replicated GMFA algorithm instead, to compare only the final estimation error with the proposed approach's performance. Moreover, the processing time reported in this table is the average time the computing unit (Intel Core(TM) i7-7600 CPU @ 2.80GHz) needs for each cycle, excluding the first step of each sequence, which needs extra initialization time. The breakdown of the computational complexity of the different processes within the framework is given in Table VI. Since we believe that the core optical flow process in the proposed method could be parallelized using off-the-shelf tools, this process was implemented on both a CPU and a GPU (GeForce RTX 2080 Ti) for the simulation scenarios. The results indicate an 80% improvement in processing time for the GPU compared to the CPU.

The estimation error and computational complexity comparison with other model-free and learning-based motion estimation methods is summarised in Table III. The performance metrics of the other methods and the evaluation conditions are adopted from [27]. The results are based on the KITTI tracking dataset, sequences 0000, 0005, and 0010, and the objects within a radius of 50 m are considered. The results represented in Table III include other learning-based motion estimation methods as additional references. The comparison suggests that although our method's performance in terms of speed accuracy is comparable to that of [27], there is a significant improvement in computational complexity. This is attributed to the iterative nature of the ICP method used in [27]. Moreover, all other methods in this table are data-driven and will need retraining for different situations or sensor configurations; otherwise, their performance may decline, while our method is deterministic and does not need training.

TABLE III
Comparison with other model-free and learning-based motion estimation methods

Ref                Speed Error [m/sec]   Time [ms]   Training needed
Wang et al. [38]   1.69                  80          Yes
Liu et al. [39]    4.37                  -           Yes
Li et al. [27]     0.42                  240         Partially
Proposed           0.44                  142         No

TABLE IV
Comparing the mean and max estimation errors against the model-free [14] and model-based [18] indirect tracking methods. Results obtained from the secondary scenarios in [14].

Secondary scenario   Method             Speed [m/sec]        Direction [deg]
                                        mean     max         mean     max
i                    Wang et al. [14]   0.25     0.43        0.52     1.49
                     Zhang et al. [18]  0.38     0.59        1.23     1.93
                     Proposed           0.09     0.31        0.18     0.51
ii                   Wang et al. [14]   0.40     0.70        0.80     2.53
                     Zhang et al. [18]  0.52     1.00        1.53     4.02
                     Proposed           0.21     0.47        0.83     1.70
iii                  Wang et al. [14]   0.29     0.40        1.65     2.50
                     Zhang et al. [18]  0.43     0.84        2.25     5.09
                     Proposed           0.19     0.44        0.41     1.82

2) Comparison with Indirect Tracking Methods: Table IV presents the secondary simulation results for comparison with indirect tracking methods.
Like the simulation scenarios used in this comparison, the performance metrics and values for the other methods in this table are sourced from [14]. As anticipated, the proposed method outperforms both indirect tracking methods, as they compute velocity based on macroscopically classified point clouds. In contrast, our framework employs the optical flow algorithm to calculate the velocity field at a microscopic level (grid-based) before the classification and EKF tracking processes take place.

3) Ablation Study: The impact of the two-layer filters applied to the velocity vector field is investigated in this section through an ablation experiment. The continuity and propagation filters (see Section IV-C) are disabled individually to quantify their effect through the changes in the performance metrics. Only the KITTI dataset is used for this experiment, and the results are summarized in Table V.

TABLE V
Ablation study for the two-layer filter applied to the velocity vector field

#   Propagation Filter   Continuity Filter   σv [m/sec]   σa [deg]
1   ✗                    ✗                   2.11         0.93
2   ✓                    ✗                   1.45         0.70
3   ✓                    ✓                   1.38         0.64

Based on the findings presented in Table V, the performance metrics improve with the inclusion of both filters, with the notably more significant enhancement attributed to the propagation filter.

TABLE VI
Processing time for the different components of the proposed method

Process:    Data Parsing   3D-to-2D Conversion   Optical Flow (GPU/CPU)   GNN Tracker   Total (GPU/CPU)
Time [ms]:  8.1            4.4                   4.6/119                  10.8          27.9/142.3

D. Estimation Error Sensitivity to the TV's Configuration

After validating the proposed DATMO approach and comparing it with the state-of-the-art method, the sensitivity of the speed estimation error to changes in the TV's configuration is explored in this section. This would help other researchers who use this tracking method (for motion planning and control) to consider an error model. Two elements of the TV configuration (βn, ln) are used as variables in this section. In other words, we want to investigate how the estimation error changes with the distance (ln) and orientation (βn) of the TV. The synthetic data is used to sweep these variables, and the proposed algorithm is applied to measure the estimation error for each case. The result of this sensitivity analysis is presented in a 3D plot and three 2D plots (three views of the same 3D plot) in Fig. 13. In this figure, a heatmap colour correlated with the absolute speed estimation error (eυ) is used to visualize the error value, particularly in the top-view plot (red and blue colours corresponding to high and low absolute error, respectively).

Fig. 13. Error model as a function of the orientation and distance of the TV with respect to the EV for ∆υ = 10 m/s. The heatmap colour corresponds to the absolute error value on the eυ axis.

VI. DISCUSSION

The results presented in Section V-C are further discussed in detail in this section. The comparative data reported in Table II and Fig. 12 show the superior performance of the proposed method compared with the GMFA approach. But before comparing the two approaches, we need to validate the regenerated GMFA algorithm. Since this algorithm was originally developed for low relative speeds and has been validated on an autonomous vehicle platform, the performance of the regenerated algorithm on the low-speed synthetic dataset should surpass the values reported in [21] (the standard deviations of the speed and yaw angle errors are 0.40 m/s and 1.81 deg, respectively). According to Table II (first row), the GMFA result for the low-speed simulation outperforms these outcomes. Therefore, the regenerated GMFA algorithm is reliable enough to be tested as the baseline with other datasets, such as KITTI or the high relative speed synthetic datasets.

In detecting the moving objects, the precision and recall values show almost similar performance for both compared methods (increased by only 1% in the proposed approach). However, the state estimation accuracy shows more than 34% and 50% improvements in the standard deviation of the estimated speed and yaw angle error distributions, respectively. The fitted normal distributions along with the standard deviations for both the synthetic and KITTI datasets are shown in Fig. 12.

Moreover, the measured processing times show an average of ∼8% improvement for the proposed method compared with the GMFA. The computation time for the synthetic data shows less improvement compared to the KITTI dataset (5% vs 10.5%).
Since each sequence of the synthetic scenarios contains only one moving object, and the GMFA is based on point cloud registration, for which the processing time depends on the number of detected moving point clusters, the computational effort is more consistent and lower than that of the KITTI scenarios, in which the number of moving objects is more than one in most sequences. All computations in this study were done on a CPU without parallel processing, whereas the proposed method, which is based on an optical flow algorithm, has the potential to be implemented on a GPU to accelerate the computations. This is another advantage of this approach over point cloud registration-based methods such as the GMFA, which rely on optimization, making the computations more difficult to parallelize. Recently, contemporary GPUs have dedicated hardware that accelerates optical flow algorithms by up to 10 times [40]. Therefore, using parallel computation will accelerate the processing even further for the proposed approach.

Overall, the comparison results indicate that the proposed method's performance in state estimation and computational cost compares favourably with the state-of-the-art method (GMFA); as the last part of analysing the results, the error sensitivity of the proposed method is considered. The estimation error sensitivity to the configuration of the TV, illustrated in Fig. 13, shows that the error magnitude is more sensitive to the orientation of the TV when the target vehicle is located at farther distances (ln > 45 m). The way the error value changes with respect to the orientation of the TV (βn) is also interesting. The error increases at three specific orientations: βn = 0, 90, 180 deg. Regarding Fig. 8, the first (βn = 0 deg) and last (βn = 180 deg) orientations correspond to configurations in which the TV faces or backs onto the EV, whereas the second orientation (βn = 90 deg) is the case in which the TV's side is toward the EV, i.e. the LiDAR sensor location. One possible reason for this correlation could be the fact that in these configurations the scanned point cloud is no longer scattered in 3D space but lies mostly on a 2D plane. For instance, at βn = 90 deg, most of the scanned points are from the side of the vehicle. However, to elaborate further on the error model and consider all the involved factors, more research is required in future studies.

VII. CONCLUSION

In this study, a novel DATMO technique was proposed using the Farneback optical flow algorithm. This study revealed the promising potential of this approach in terms of accuracy and processing costs. Similar to traditional GMFA techniques, the optical-flow-based technique proposed and studied in this paper demonstrated good resilience against variations in object sizes in driving scenes. The analysis of the error sensitivity to the configuration of the target vehicle in this study revealed meaningful correlations which could be used in the future for error modelling. Our results showed that the error values increase when the TV moves in the radial (βn = 0, 180 deg) and tangential (βn = 90 deg) directions at distances farther than 50 m. It shall be noted that small-sized objects such as pedestrians were not covered in our study. Further studies could explore estimating the state of pedestrians by reducing the grid size and implementing the algorithm using parallel computing to calculate the optical flow.

ACKNOWLEDGMENT

This research is sponsored by the Centre for Doctoral Training to Advance the Deployment of Future Mobility Technologies (CDT FMT) at the University of Warwick.

APPENDIX
DERIVING ANGULAR VELOCITY FROM THE VELOCITY VECTOR FIELD

Fig. 14. Velocity of points on a moving rigid body (vehicle).

Assuming rigid body motion, the angular velocity can be obtained from the velocity vector field. If {î, ĵ, k̂} are the unit vectors in {x, y, z}, respectively, and considering the notation used in Fig. 14, the velocity of a point for planar motion is derived as follows:

v = vc + ω × r
  = vc + ω × (R − Rc)
  = (vc − ω × Rc) + ω × R

Rewriting this equation by substituting R = xî + yĵ and Vc = vc − ω × Rc = Vcx î + Vcy ĵ:

v = Vc + (−ωy î + ωx ĵ)
  = (Vcx − ωy) î + (Vcy + ωx) ĵ

By applying the curl operator to both sides, the angular velocity is obtained from the curl of the vector field v. It should be noted that the rigid body assumption makes the curl independent of the position and linear velocity of the centre c:

∇ × v = [∂(Vcy + ωx)/∂x − ∂(Vcx − ωy)/∂y] k̂ = 2ω k̂
⇒ ω = 0.5 (∇ × v)
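As a quick numerical illustration of this result, the following sketch evaluates ω = 0.5(∇ × v) on a synthetic planar rigid-motion field; the ω and Vc values are assumed for the example.

```python
# Numerical sanity check of the appendix result omega = 0.5*(curl v)_z
# on a synthetic rigid-rotation field v = Vc + omega x R (assumed values).
import numpy as np

omega_true = 0.7                       # rad/s, assumed
x, y = np.meshgrid(np.linspace(-5, 5, 201), np.linspace(-5, 5, 201))
vx = 1.0 - omega_true * y              # Vcx - omega*y, with Vcx = 1.0
vy = -2.0 + omega_true * x             # Vcy + omega*x, with Vcy = -2.0
curl_z = (np.gradient(vy, x[0], axis=1) -
          np.gradient(vx, y[:, 0], axis=0))
print(0.5 * curl_z.mean())             # ~0.7, independent of Vc as derived
```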

REFERENCES

[1] M. Sualeh and G.-W. Kim, "Dynamic multi-lidar based multiple object detection and tracking," Sensors, vol. 19, no. 6, p. 1474, 2019.
[2] M. Y. Abbass, K.-C. Kwon, N. Kim, S. A. Abdelwahab, F. E. A. El-Samie, and A. A. Khalaf, "A survey on online learning for visual tracking," The Visual Computer, vol. 37, no. 5, pp. 993–1014, 2021.
[3] C. Premachandra, S. Ueda, and Y. Suzuki, "Detection and tracking of moving objects at road intersections using a 360-degree camera for driver assistance and automated driving," IEEE Access, vol. 8, pp. 135652–135660, 2020.
[4] S. Sivaraman and M. M. Trivedi, "Looking at vehicles on the road: A survey of vision-based vehicle detection, tracking, and behavior analysis," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 4, pp. 1773–1795, 2013.
[5] M. Kusenbach, M. Himmelsbach, and H.-J. Wuensche, "A new geometric 3d lidar feature for model creation and classification of moving objects," in 2016 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2016, pp. 272–278.
[6] A. Börcs, B. Nagy, and C. Benedek, "Instant object detection in lidar point clouds," IEEE Geoscience and Remote Sensing Letters, vol. 14, no. 7, pp. 992–996, 2017.
[7] S. Steyer, G. Tanzmeister, and D. Wollherr, "Grid-based environment estimation using evidential mapping and particle tracking," IEEE Transactions on Intelligent Vehicles, vol. 3, no. 3, pp. 384–396, 2018.
[8] M. C. Hutchison, J. A. Pautler, and M. A. Smith, "Traffic light signal system using radar-based target detection and tracking," Oct. 26 2010, US Patent 7,821,422.
[9] Y. Ye, L. Fu, and B. Li, "Object detection and tracking using multi-layer laser for autonomous urban driving," in 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2016, pp. 259–264.
[10] L. Spinello, R. Triebel, and R. Siegwart, "Multiclass multimodal detection and tracking in urban environments," The International Journal of Robotics Research, vol. 29, no. 12, pp. 1498–1515, 2010.
[11] B. Douillard, D. Fox, F. Ramos et al., "Laser and vision based outdoor object mapping," in Robotics: Science and Systems, vol. 8, 2008.
[12] M. Himmelsbach, A. Mueller, T. Lüttel, and H.-J. Wünsche, "Lidar-based 3d object perception," in Proceedings of the 1st International Workshop on Cognition for Technical Systems, vol. 1, 2008.
[13] S. Shi, C. Guo, L. Jiang, Z. Wang, J. Shi, X. Wang, and H. Li, "Pv-rcnn: Point-voxel feature set abstraction for 3d object detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 10529–10538.
[14] H. Wang and B. Liu, "Detection and tracking dynamic vehicles for autonomous driving based on 2-d point scans," IEEE Systems Journal, 2022.
[15] A. Petrovskaya and S. Thrun, "Model based vehicle detection and tracking for autonomous urban driving," Autonomous Robots, vol. 26, no. 2, pp. 123–139, 2009.
[16] J. An and E. Kim, "Novel vehicle bounding box tracking using a low-end 3d laser scanner," IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 6, pp. 3403–3419, 2020.
[17] S. Steyer, C. Lenk, D. Kellner, G. Tanzmeister, and D. Wollherr, "Grid-based object tracking with nonlinear dynamic state and shape estimation," IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 7, pp. 2874–2893, 2019.
[18] X. Zhang, W. Xu, C. Dong, and J. M. Dolan, "Efficient l-shape fitting for vehicle detection using laser scanners," in 2017 IEEE Intelligent Vehicles Symposium (IV). IEEE, 2017, pp. 54–59.
[19] A. Asvadi, P. Peixoto, and U. Nunes, "Detection and tracking of moving objects using 2.5d motion grids," in 2015 IEEE 18th International Conference on Intelligent Transportation Systems. IEEE, 2015, pp. 788–793.
[20] R. Kaestner, J. Maye, Y. Pilat, and R. Siegwart, "Generative object detection and tracking in 3d range data," in 2012 IEEE International Conference on Robotics and Automation. IEEE, 2012, pp. 3075–3081.
[21] H. Lee, J. Yoon, Y. Jeong, and K. Yi, "Moving object detection and tracking based on interaction of static obstacle map and geometric model-free approach for urban autonomous driving," IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 6, pp. 3275–3284, 2020.
[22] H. Lee, H. Lee, D. Shin, and K. Yi, "Moving objects tracking based on geometric model-free approach with particle filter using automotive lidar," IEEE Transactions on Intelligent Transportation Systems, 2022.
[23] F. Moosmann and C. Stiller, "Joint self-localization and tracking of generic objects in 3d range data," in 2013 IEEE International Conference on Robotics and Automation. IEEE, 2013, pp. 1146–1152.
[24] A. Dewan, T. Caselitz, G. D. Tipaldi, and W. Burgard, "Motion-based detection and tracking in 3d lidar scans," in 2016 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2016, pp. 4508–4513.
[25] J. Groß, A. Ošep, and B. Leibe, "Alignnet-3d: Fast point cloud registration of partially observed objects," in 2019 International Conference on 3D Vision (3DV). IEEE, 2019, pp. 623–632.
[26] J. Kim, H. Lee, and K. Yi, "Online static probability map and odometry estimation using automotive lidar for urban autonomous driving," in 2021 IEEE International Intelligent Transportation Systems Conference (ITSC). IEEE, 2021, pp. 2674–2681.
[27] J. Li, X. Huang, and J. Zhan, "High-precision motion detection and tracking based on point cloud registration and radius search," IEEE Transactions on Intelligent Transportation Systems, 2023.
[28] E. Arnold, S. Mozaffari, and M. Dianati, "Fast and robust registration of partially overlapping point clouds," IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 1502–1509, 2021.
[29] G. Farnebäck, "Two-frame motion estimation based on polynomial expansion," in Scandinavian Conference on Image Analysis. Springer, 2003, pp. 363–370.
[30] A. Geiger, P. Lenz, and R. Urtasun, "Are we ready for autonomous driving? The kitti vision benchmark suite," in Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[31] A. Petrovskaya, M. Perrollaz, L. Oliveira, L. Spinello, R. Triebel, A. Makris, J.-D. Yoder, C. Laugier, U. Nunes, and P. Bessière, "Awareness of road scene participants for autonomous driving," Handbook of Intelligent Vehicles, pp. 1383–1432, 2012.
[32] P. J. Besl and N. D. McKay, "Method for registration of 3-d shapes," in Sensor Fusion IV: Control Paradigms and Data Structures, vol. 1611. SPIE, 1992, pp. 586–606.
[33] D. Fortun, P. Bouthemy, and C. Kervrann, "Optical flow modeling and computation: A survey," Computer Vision and Image Understanding, vol. 134, pp. 1–21, 2015.
[34] J. Tanaś and A. Kotyra, "Comparison of optical flow algorithms performance on flame image sequences," in Photonics Applications in Astronomy, Communications, Industry, and High Energy Physics Experiments 2017, vol. 10445. SPIE, 2017, pp. 243–249.
[35] K. M. Urwin, Advanced Calculus and Vector Field Theory. Elsevier, 2014.
[36] J. Casey, "A treatment of rigid body dynamics," Journal of Applied Mechanics, vol. 50, pp. 905–907, 1983.
[37] H. Cho, Y.-W. Seo, B. V. Kumar, and R. R. Rajkumar, "A multi-sensor fusion system for moving object detection and tracking in urban driving environments," in 2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014, pp. 1836–1843.
[38] Q. Wang, J. Chen, J. Deng, X. Zhang, and K. Zhang, "Simultaneous pose estimation and velocity estimation of an ego vehicle and moving obstacles using lidar information only," IEEE Transactions on Intelligent Transportation Systems, vol. 23, no. 8, pp. 12121–12132, 2021.
[39] X. Liu, C. R. Qi, and L. J. Guibas, "Flownet3d: Learning scene flow in 3d point clouds," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 529–537.
[40] A. Medhekar, V. Chiluka, and A. Patait, "Accelerate opencv: Optical flow algorithms with nvidia turing gpus," https://developer.nvidia.com/blog/opencv-optical-flow-algorithms-with-nvidia-turing-gpus/, accessed: 2019-12-05.

Mohammadreza Alipour Sormoli received the M.Sc. degree from the Amirkabir University of Technology (Tehran Polytechnic) in 2017. He worked as a research assistant at Koc University and is currently working toward the PhD degree in autonomous driving technology at the University of Warwick (WMG). His research interests include robotics, mechatronics, control and dynamics of autonomous systems.

Mehrdad Dianati (Senior Member, IEEE) is a professor of connected and cooperative autonomous
vehicles at WMG, the University of Warwick and
the School of EEECS at the Queen’s University
of Belfast. He has been involved in a number of
national and international projects as the project
leader and the work-package leader in recent years.
Prior to academia, he worked in the industry for
more than nine years as a Senior Software/Hardware
Developer and the Director of Research and Devel-
opment. He frequently provides voluntary services to
the research community in various editorial roles; for example, he has served
as an Associate Editor for the IEEE Transactions On Vehicular Technology.
He is the Field Chief Editor of Frontiers in Future Transportation.

Sajjad Mozaffari received the B.Sc. and M.Sc. degrees in electrical engineering from the Univer-
sity of Tehran, Tehran, Iran, in 2015 and 2018,
respectively. He is currently working toward the
PhD degree with the Warwick Manufacturing Group,
University of Warwick, Coventry, U.K. His research
interests include machine learning, computer vision,
and connected and autonomous vehicles.

Roger Woodman is an Assistant Professor and Human Factors research lead at WMG, University of
Warwick. He received his PhD from Bristol Robotics
Laboratory and has more than 20 years of experi-
ence working in industry and academia. Among his
research interests, are trust and acceptance of new
technology with a focus on self-driving vehicles,
shared mobility, and human-machine interfaces. He
has several scientific papers published in the field
of connected and autonomous vehicles. He lectures
on the topic of Human Factors of Future Mobility
and is the Co-director of the Centre for Doctoral Training, training doctoral
researchers in the areas of intelligent and electrified mobility systems.
